Vanilla GAN: A Brief, Minimal Implementation

by Jayeol Chun

Setup
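
The import cell is not shown in this excerpt; a minimal setup matching the rest of the notebook would look roughly like this (written against the TensorFlow 1.x API; img_folder is a placeholder for the output directory used later when saving sample grids):

import numpy as np
import tensorflow as tf                    # TensorFlow 1.x API (tf.placeholder, tf.Session, ...)
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
from tensorflow.examples.tutorials.mnist import input_data

img_folder = 'out'  # hypothetical path; the original notebook defines its own output directory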

MNIST

In [3]:
mnist = input_data.read_data_sets('/home/caide-lp-009/Documents/Data/MNIST', one_hot=True)
Extracting /home/caide-lp-009/Documents/Data/MNIST/train-images-idx3-ubyte.gz
Extracting /home/caide-lp-009/Documents/Data/MNIST/train-labels-idx1-ubyte.gz
Extracting /home/caide-lp-009/Documents/Data/MNIST/t10k-images-idx3-ubyte.gz
Extracting /home/caide-lp-009/Documents/Data/MNIST/t10k-labels-idx1-ubyte.gz
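
A quick sanity check of what the reader returns (a sketch, not one of the original cells): next_batch yields flattened images scaled to [0, 1] and, since one_hot=True, one-hot labels.

images, labels = mnist.train.next_batch(5)
print images.shape                # (5, 784): 28x28 images flattened into vectors
print labels.shape                # (5, 10): one-hot digit labels (one_hot=True above)
print images.min(), images.max()  # pixel values lie in [0.0, 1.0]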

Constants Initialization

  • what each constant controls will become clear later
In [4]:
batch = 64        # mini-batch size
z_dim = 64        # dimensionality of the noise vector fed to the generator
k_steps = 1       # number of discriminator updates per generator update
num_iter = 100000 # total number of training iterations
every = 2000      # print losses and save sample images every `every` iterations

Image Properties

In [5]:
img_h, img_w, img_c = 28, 28, 1  # MNIST images are 28x28 and grayscale (single channel)
img_dim = img_h*img_w            # flattened image length: 784

Utility Functions

Initial Noise Vector for Z

  • The noise prior does not have to be uniform; it just has to be a fixed, easy-to-sample distribution
In [6]:
def random_uniform(size):
    return np.random.uniform(-1., 1., size)
In [7]:
def gaussian_noise_layer(input_layer, std=.2):
    """https://stackoverflow.com/questions/41174769/additive-gaussian-noise-in-tensorflow
    Note: despite the name, this is used below as a Gaussian weight initializer:
    `input_layer` is a shape, and a tensor of that shape is sampled from N(0, std**2)."""
    return tf.random_normal(shape=input_layer, mean=0.0, stddev=std, dtype=tf.float32)

Misc. (skip!)

In [8]:
def define_variable(shape, bias=False, name=None):
    # biases start at zero; weights start as small Gaussian noise (stddev 0.2)
    if bias:
        return tf.Variable(tf.zeros(shape), name=name)
    return tf.Variable(gaussian_noise_layer(shape), name=name)
In [9]:
def plot(samples):
    """https://github.com/wiseodd/generative-models/blob/master/GAN/vanilla_gan/gan_tensorflow.py#L57"""
    fig = plt.figure(figsize=(4,4))
    gs = gridspec.GridSpec(4,4)
    gs.update(wspace=0.05, hspace=0.05)
    for i, sample in enumerate(samples):
        ax=plt.subplot(gs[i])
        plt.axis('off')
        ax.set_xticklabels([])
        ax.set_yticklabels([])
        ax.set_aspect('equal')
        plt.imshow(sample.reshape(28,28), cmap='Greys_r')
    return fig

Discriminator:

  • given an image, classify whether it is fake or real (0 vs. 1)
  • hence the output dimension of 1 (one score for each image in the mini-batch)
In [10]:
# two-layer MLP discriminator; the hidden width happens to equal `batch` (64) but is a separate hyperparameter
D_W1 = define_variable(shape=[img_dim, batch])
D_b1 = define_variable(shape=[batch], bias=True)
D_W2 = define_variable(shape=[batch, 1])
D_b2 = define_variable(shape=[1], bias=True)
In [11]:
def D(x):
    hid = tf.nn.relu(tf.matmul(x, D_W1) + D_b1)
    logit = tf.matmul(hid, D_W2)+D_b2
    return logit, tf.nn.sigmoid(logit)

Generator

  • given noise (in some pre-defined shape), generate an image
  • hence the output dimension of img_dim (one image for each noise vector in the mini-batch)
In [12]:
# two-layer MLP generator: noise (z_dim) -> hidden (64 units, reusing `batch`) -> flattened image (img_dim)
G_W1 = define_variable(shape=[z_dim, batch])
G_b1 = define_variable(shape=[batch], bias=True)
G_W2 = define_variable(shape=[batch, img_dim])
G_b2 = define_variable(shape=[img_dim], bias=True)
In [13]:
def G(z):
    hid = tf.nn.relu(tf.matmul(z, G_W1) + G_b1)
    logit = tf.matmul(hid, G_W2)+G_b2
    return logit, tf.nn.sigmoid(logit)
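
A quick static shape check (a sketch, not one of the original cells; the throwaway placeholders below exist only to inspect shapes): the generator should map noise of width z_dim to flattened images of width img_dim, and the discriminator should map an image to a single score.

_z_check = tf.placeholder(tf.float32, shape=[None, z_dim])    # hypothetical placeholder, only for this check
_x_check = tf.placeholder(tf.float32, shape=[None, img_dim])  # hypothetical placeholder, only for this check
print G(_z_check)[1].get_shape()  # (?, 784): one flattened image per noise vector
print D(_x_check)[0].get_shape()  # (?, 1): one real/fake logit per image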

Main Execution

In [14]:
X = tf.placeholder(tf.float32, shape=[None, img_dim], name='X') # for batches of data (real)
Z = tf.placeholder(tf.float32, shape=[None, z_dim], name='Z') # for noise
In [15]:
# generate a fake image: `fake` is a (logit, sigmoid image) pair returned by G
fake = G(Z)
In [16]:
D_logit_real, D_real = D(X) # Performance of Discriminator on real image
D_logit_fake, D_fake = D(fake[1]) # Performance of Discriminator on fake image

Training Details:

  • train D by:
    • maximizing $\log D(x) + \log(1 - D(G(z)))$
  • train G by:
    • minimizing $\log(1 - D(G(z)))$
  • BUT:
    • early on, $D$ is likely to perform better and reach (near-)optimality faster (we kind of want it to)
    • in the extreme case, $\log(1 - D(G(z)))$ saturates and $G$ receives almost no gradient, so $G$ stops training (and can collapse)
  • As a workaround:
    • maximize $\log D(G(z))$ instead of minimizing $\log(1 - D(G(z)))$!
    • Several authors since 2014 have questioned whether this is a principled solution or just a heuristic patch, including the original author, who calls it 'heuristically motivated' (Goodfellow et al., 2014; Arjovsky and Bottou, 2017). The full minimax objective is written out after this list.
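
For reference, the two objectives above are the two sides of the single minimax value function from Goodfellow et al. (2014), which $D$ maximizes and $G$ minimizes:

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]$$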
In [17]:
# based on the original paper (minimax formulation)
D_loss = - tf.reduce_mean(tf.log(D_real) + tf.log(1. - D_fake))
G_loss = tf.reduce_mean(tf.log(1. - D_fake)) # saturates early in training, so G gets almost no gradient
In [18]:
# Non-saturating alternative suggested in the original paper: maximize log D(G(z))
G_loss = -tf.reduce_mean(tf.log(D_fake))
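
As a side note (not used in the rest of this notebook), the same losses are often written directly on the raw logits via tf.nn.sigmoid_cross_entropy_with_logits, which avoids taking tf.log of a sigmoid output and is numerically more stable. A sketch:

# sketch: equivalent losses on raw logits (more stable than log(sigmoid(.)))
D_loss_stable = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=D_logit_real, labels=tf.ones_like(D_logit_real)))    # real images should score 1
D_loss_stable += tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=D_logit_fake, labels=tf.zeros_like(D_logit_fake)))   # fake images should score 0
G_loss_stable = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=D_logit_fake, labels=tf.ones_like(D_logit_fake)))    # non-saturating G loss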

Set up for Training procedure

In [19]:
# each optimizer updates only its own network's parameters, hence the explicit var_list
D_step = tf.train.AdamOptimizer().minimize(D_loss, var_list=[D_W1, D_W2, D_b1, D_b2])
G_step = tf.train.AdamOptimizer().minimize(G_loss, var_list=[G_W1, G_W2, G_b1, G_b2])
In [20]:
config = tf.ConfigProto()
config.gpu_options.allow_growth=True
sess = tf.Session(config=config)
sess.run(tf.global_variables_initializer())
In [22]:
for i in range(num_iter):
    if (i+1)%every==0:
        # check fake image quality every `every` iterations
        print '{}th Train Iteration'.format(i+1)
        _, sample = sess.run(fake, feed_dict={Z:random_uniform((16, z_dim))})
        fig = plot(sample)
        plt.savefig(img_folder+'/{}.png'.format(str(i+1).zfill(3)), bbox_inches='tight')
        plt.show(); plt.close(fig)

    mini_batch, _ = mnist.train.next_batch(batch_size=batch)
    
    for _ in range(k_steps):
        """[NOTE] Author alternates between $k steps of optimizing D and one step of optimizing G in the original paper,
        but retracts this statement two years later and suggests simultaneous training (k_steps=1)"""
        _, D_train_loss = sess.run([D_step, D_loss], feed_dict={X:mini_batch, Z:random_uniform((batch, z_dim))})
    _, G_train_loss = sess.run([G_step, G_loss], feed_dict={Z:random_uniform((batch, z_dim))})
    
    if (i+1)%every==0:
        print 'D loss: {:.4}'.format(D_train_loss)
        print 'G loss: {:.4}\n'.format(G_train_loss)
2000th Train Iteration
D loss: 0.02843
G loss: 7.037

4000th Train Iteration
D loss: 0.1405
G loss: 5.002

6000th Train Iteration
D loss: 0.4166
G loss: 4.694

8000th Train Iteration
D loss: 0.5461
G loss: 3.235

10000th Train Iteration
D loss: 0.7113
G loss: 2.165

12000th Train Iteration
D loss: 0.6883
G loss: 1.973

14000th Train Iteration
D loss: 0.8421
G loss: 1.944

16000th Train Iteration
D loss: 1.038
G loss: 1.841

18000th Train Iteration
D loss: 0.9338
G loss: 1.803

20000th Train Iteration
D loss: 0.969
G loss: 1.36

22000th Train Iteration
D loss: 0.9636
G loss: 1.225

24000th Train Iteration
D loss: 0.9372
G loss: 1.371

26000th Train Iteration
D loss: 1.107
G loss: 1.201

28000th Train Iteration
D loss: 1.084
G loss: 1.323

30000th Train Iteration
D loss: 1.143
G loss: 1.06

32000th Train Iteration
D loss: 1.015
G loss: 1.387

34000th Train Iteration
D loss: 0.9068
G loss: 1.302

36000th Train Iteration
D loss: 0.9172
G loss: 1.443

38000th Train Iteration
D loss: 1.108
G loss: 1.255

40000th Train Iteration
D loss: 0.7789
G loss: 1.566

42000th Train Iteration
D loss: 1.157
G loss: 1.661

44000th Train Iteration
D loss: 0.8797
G loss: 1.414

46000th Train Iteration
D loss: 0.9088
G loss: 1.399

48000th Train Iteration
D loss: 0.9141
G loss: 1.419

50000th Train Iteration
D loss: 1.014
G loss: 1.738

52000th Train Iteration
D loss: 0.798
G loss: 1.531

54000th Train Iteration
D loss: 0.7946
G loss: 1.58

56000th Train Iteration
D loss: 0.7549
G loss: 1.662

58000th Train Iteration
D loss: 0.9223
G loss: 1.665

60000th Train Iteration
D loss: 0.9177
G loss: 1.967

62000th Train Iteration
D loss: 0.8004
G loss: 1.818

64000th Train Iteration
D loss: 0.8622
G loss: 1.589

66000th Train Iteration
D loss: 0.7848
G loss: 2.069

68000th Train Iteration
D loss: 0.8556
G loss: 1.969

70000th Train Iteration
D loss: 0.8465
G loss: 1.943

72000th Train Iteration
D loss: 0.728
G loss: 1.776

74000th Train Iteration
D loss: 0.9248
G loss: 1.77

76000th Train Iteration
D loss: 0.7544
G loss: 2.161

78000th Train Iteration
D loss: 0.6975
G loss: 1.587

80000th Train Iteration
D loss: 0.7943
G loss: 1.912

82000th Train Iteration
D loss: 0.81
G loss: 1.799

84000th Train Iteration
D loss: 0.7834
G loss: 2.078

86000th Train Iteration
D loss: 0.6925
G loss: 2.203

88000th Train Iteration
D loss: 0.6276
G loss: 1.775

90000th Train Iteration
D loss: 0.6171
G loss: 1.881

92000th Train Iteration
D loss: 0.6116
G loss: 2.093

94000th Train Iteration
D loss: 0.8284
G loss: 1.935

96000th Train Iteration
D loss: 0.7823
G loss: 2.258

98000th Train Iteration
D loss: 0.7754
G loss: 2.555

100000th Train Iteration
D loss: 0.7803
G loss: 1.812

Thank you!

References: