In assignment 1, we used linear regression for classification: $$y(\mathbf{x}, \mathbf{w}) = \mathbf{x}^T \mathbf{w} + b$$
We will now consider a linear model for classification. Note that the model is linear in the parameters.
$$y(\mathbf{x}, \mathbf{w}) = \sigma (\mathbf{x}^T \mathbf{w} + b)$$
where
$$ \sigma(x) = {1 \over {1 + e^{-x}}}$$
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
%pylab inline
import warnings
warnings.filterwarnings('ignore')
Populating the interactive namespace from numpy and matplotlib
with np.load("TINY_MNIST.npz") as data:
    x_train, t_train = data["x"], data["t"]
    x_eval, t_eval = data["x_eval"], data["t_eval"]
import nn_utils as nn
nn.show_images(x_train[:200], (10, 20), scale=1)
nn.show()
#Placeholders
X = tf.placeholder("float", shape=(None, 64))
Y = tf.placeholder("float", shape=(None, 1))
#Variables
W = tf.Variable(np.random.randn(64, 1).astype("float32"), name="weight")
b = tf.Variable(np.random.randn(1).astype("float32"), name="bias")
X.get_shape()
TensorShape([Dimension(None), Dimension(64)])
Recall the model:
$$y(\mathbf{x}, \mathbf{w}) = \sigma (\mathbf{x}^T \mathbf{w} + b)$$
logits = tf.add(tf.matmul(X, W), b)
output = tf.nn.sigmoid(logits)
print output.get_shape()
TensorShape([Dimension(None), Dimension(1)])
Cross-Entropy cost = $-t\,\text{log}(y) - (1 - t)\,\text{log}(1 - y)$
In terms of the logit $x$, with $y = \sigma(x)$:
Cross-Entropy cost = $-t\,\text{log}(\sigma(x)) - (1 - t)\,\text{log}(1 - \sigma(x))$
where $\sigma(x) = {1 \over {1 + e^{-x}}}$
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))
sigmoid(100)
1.0
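This is the root of the problem: in float64, $e^{-100} \approx 3.7 \times 10^{-44}$ is far below machine epsilon (about $2.2 \times 10^{-16}$), so the sigmoid saturates to exactly 1.0 and $1 - \sigma(100)$ becomes exactly 0.0. A quick standalone check (not part of the original notebook):

```python
import numpy as np

# sigmoid(100) rounds to exactly 1.0 because e^{-100} is far smaller
# than float64 machine epsilon, so 1 + e^{-100} == 1.0 exactly
s = 1.0 / (1.0 + np.exp(-100.0))
print(s)        # 1.0 exactly
print(1.0 - s)  # 0.0 exactly -- and log(0.0) is -inf

with np.errstate(divide="ignore"):
    print(np.log(1.0 - s))  # -inf
```

No implementation of the sigmoid avoids this; the fix has to work in logit space, as the cells below show.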
#This doesn't work!
def xentropy(x, t):
    return t*-np.log(sigmoid(x)) + (1-t)*-np.log(1.0 - sigmoid(x))
print xentropy(10, 1)
print xentropy(-1000, 0)
print xentropy(1000, 0)
print xentropy(-1000, 1)
4.53988992168e-05
nan
inf
inf
#This kind of works!
def hacky_xentropy(x, t):
    return t*-np.log(1e-15 + sigmoid(x)) + (1-t)*-np.log(1e-15 + 1.0 - sigmoid(x))
print hacky_xentropy(1000, 1)
print hacky_xentropy(-1000, 0)
print hacky_xentropy(1000, 0)
print hacky_xentropy(-1000, 1)
-1.11022302463e-15
-1.11022302463e-15
34.4342154767
34.5387763949
#This kind of works!
def another_hacky_xentropy(x, t):
    return -np.log(t*sigmoid(x) + (1-t)*(1-sigmoid(x)))
print another_hacky_xentropy(1000, 1)
print another_hacky_xentropy(-1000, 0)
print another_hacky_xentropy(1000, 0)
print another_hacky_xentropy(-1000, 1)
-0.0
-0.0
inf
inf
Cross-Entropy = $x - x t + \text{log}(1 + e^{-x}) = \text{max}(x, 0) - x t + \text{log}(1 + e^{-|x|})$
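This identity follows directly from the definition. Substituting $-\text{log}(\sigma(x)) = \text{log}(1 + e^{-x})$ and $-\text{log}(1 - \sigma(x)) = x + \text{log}(1 + e^{-x})$ into the cross-entropy gives
$$-t\,\text{log}(\sigma(x)) - (1 - t)\,\text{log}(1 - \sigma(x)) = t\,\text{log}(1 + e^{-x}) + (1 - t)\left(x + \text{log}(1 + e^{-x})\right) = x - x t + \text{log}(1 + e^{-x})$$
The second form replaces $e^{-x}$ with $e^{-|x|}$ (compensating with the $\text{max}(x, 0)$ term), so the exponential can never overflow, whatever the sign of $x$.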
def good_xentropy(x, t):
    return np.maximum(x, 0) - x * t + np.log(1 + np.exp(-np.abs(x)))
print good_xentropy(1000, 1)
print good_xentropy(-1000, 0)
print good_xentropy(1000, 0)
print good_xentropy(-1000, 1)
0.0
0.0
1000.0
1000.0
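As a sanity check (not in the original notebook), the stable formula is just $\text{log}(1 + e^{x}) - x t$, which numpy already computes stably via `np.logaddexp`; the two agree on both extreme and moderate logits:

```python
import numpy as np

def good_xentropy(x, t):
    return np.maximum(x, 0) - x * t + np.log(1 + np.exp(-np.abs(x)))

# log(1 + e^x) == logaddexp(0, x), evaluated stably by numpy
for x, t in [(1000, 1), (-1000, 0), (2.5, 1), (-2.5, 0), (0.0, 1)]:
    assert np.allclose(good_xentropy(x, t), np.logaddexp(0, x) - x * t)
print("formulas agree")
```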
x = np.arange(-10, 10, 0.1)
y = [good_xentropy(i, 1) for i in x]
plt.plot(x, y)
plt.grid(); plt.xlabel("logit"); plt.ylabel("Cross-Entropy")
def svm_cost(x):
    return -x + 1 if x < 1 else 0
x = np.arange(-10, 10, 0.1)
y = [svm_cost(i) for i in x]
plt.plot(x, y)
plt.grid(); plt.xlabel("logit"); plt.ylabel("Hinge Loss")
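For arrays of logits, the scalar `svm_cost` above can also be written in vectorized form; it is the standard hinge loss $\text{max}(0, 1 - x)$ (a small addition, not from the original notebook):

```python
import numpy as np

def hinge(x):
    # hinge loss: zero once the logit clears the margin (x >= 1),
    # linear penalty 1 - x otherwise
    return np.maximum(0.0, 1.0 - x)

x = np.array([-2.0, 0.0, 0.5, 1.0, 3.0])
print(hinge(x))  # 3.0, 1.0, 0.5, 0.0, 0.0
```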
cost_batch = tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, targets=Y)
cost = tf.reduce_mean(cost_batch)
print logits.get_shape()
print cost.get_shape()
TensorShape([Dimension(None), Dimension(1)])
TensorShape([])
norm_w = tf.nn.l2_loss(W)
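For reference, `tf.nn.l2_loss(W)` computes half the sum of squared entries, $\sum_i w_i^2 / 2$, with no square root; a numpy sketch of the same quantity:

```python
import numpy as np

def l2_loss(w):
    # same quantity as tf.nn.l2_loss: sum(w**2) / 2
    return np.sum(w ** 2) / 2.0

w = np.array([3.0, 4.0])
print(l2_loss(w))  # 12.5, i.e. (9 + 16) / 2
```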
"This is logistic regression on noisy moons dataset from sklearn which shows the smoothing effects of momentum based techniques (which also results in over shooting and correction). The error surface is visualized as an average over the whole dataset empirically, but the trajectories show the dynamics of minibatches on noisy data. The bottom chart is an accuracy plot." (Image by Alec Radford)
![Momentum](http://2.bp.blogspot.com/-q6l20Vs4P_w/VPmIC7sEhnI/AAAAAAAACC4/g3UOUX2r_yA/s1600/s25RsOr%2B-%2BImgur.gif =100x20)
optimizer = tf.train.MomentumOptimizer(learning_rate=1.0, momentum=0.99)
# optimizer = tf.train.GradientDescentOptimizer(learning_rate=1.0)
train_op = optimizer.minimize(cost)
#a hack for binary thresholding
pred = tf.greater(output, 0.5)
pred_float = tf.cast(pred, "float")
#accuracy
correct_prediction = tf.equal(pred_float, Y)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
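The thresholding and accuracy ops above can be replicated in plain numpy; the outputs and labels here are made-up illustrative values, not taken from the dataset:

```python
import numpy as np

# hypothetical sigmoid outputs and binary labels, for illustration only
outputs = np.array([[0.9], [0.2], [0.6], [0.4]])
labels  = np.array([[1.0], [0.0], [0.0], [1.0]])

pred = (outputs > 0.5).astype("float32")  # binary thresholding at 0.5
accuracy = np.mean(pred == labels)        # fraction of matching predictions
print(accuracy)  # 0.5: first two predictions correct, last two wrong
```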
sess = tf.InteractiveSession()
init = tf.initialize_all_variables()
sess.run(init)
for epoch in range(2000):
    for i in xrange(8):
        x_batch = x_train[i * 100: (i + 1) * 100]
        y_batch = t_train[i * 100: (i + 1) * 100]
        cost_np, _ = sess.run([cost, train_op],
                              feed_dict={X: x_batch, Y: y_batch})
    #Display logs per epoch step
    if epoch % 50 == 0:
        cost_train, accuracy_train = sess.run([cost, accuracy],
                                              feed_dict={X: x_train, Y: t_train})
        cost_eval, accuracy_eval, norm_w_np = sess.run([cost, accuracy, norm_w],
                                                       feed_dict={X: x_eval, Y: t_eval})
        print ("Epoch:%04d, cost=%0.9f, Train Accuracy=%0.4f, Eval Accuracy=%0.4f, Norm of Weights=%0.4f" %
               (epoch+1, cost_train, accuracy_train, accuracy_eval, norm_w_np))
Epoch:0001, cost=1.221206784, Train Accuracy=0.6900, Eval Accuracy=0.6650, Norm of Weights=78.0369
Epoch:0051, cost=0.108576626, Train Accuracy=0.9925, Eval Accuracy=0.9525, Norm of Weights=21902.0879
Epoch:0101, cost=0.039827708, Train Accuracy=0.9962, Eval Accuracy=0.9400, Norm of Weights=21526.6953
Epoch:0151, cost=0.016990982, Train Accuracy=0.9975, Eval Accuracy=0.9400, Norm of Weights=21287.8965
Epoch:0201, cost=0.008619667, Train Accuracy=0.9987, Eval Accuracy=0.9375, Norm of Weights=21630.7266
Epoch:0251, cost=0.006221656, Train Accuracy=1.0000, Eval Accuracy=0.9375, Norm of Weights=22097.6465
Epoch:0301, cost=0.005005265, Train Accuracy=1.0000, Eval Accuracy=0.9375, Norm of Weights=22532.3867
Epoch:0351, cost=0.004219066, Train Accuracy=1.0000, Eval Accuracy=0.9375, Norm of Weights=22931.1543
Epoch:0401, cost=0.003665265, Train Accuracy=1.0000, Eval Accuracy=0.9375, Norm of Weights=23299.3828
Epoch:0451, cost=0.003253295, Train Accuracy=1.0000, Eval Accuracy=0.9350, Norm of Weights=23642.1973
Epoch:0501, cost=0.002933940, Train Accuracy=1.0000, Eval Accuracy=0.9325, Norm of Weights=23963.8164
Epoch:0551, cost=0.002679895, Train Accuracy=1.0000, Eval Accuracy=0.9350, Norm of Weights=24267.5586
Epoch:0601, cost=0.002475345, Train Accuracy=1.0000, Eval Accuracy=0.9350, Norm of Weights=24556.2324
Epoch:0651, cost=0.002309947, Train Accuracy=1.0000, Eval Accuracy=0.9350, Norm of Weights=24832.2344
Epoch:0701, cost=0.002175402, Train Accuracy=1.0000, Eval Accuracy=0.9350, Norm of Weights=25097.5879
Epoch:0751, cost=0.002064377, Train Accuracy=1.0000, Eval Accuracy=0.9350, Norm of Weights=25353.9160
Epoch:0801, cost=0.001970768, Train Accuracy=1.0000, Eval Accuracy=0.9350, Norm of Weights=25602.5332
Epoch:0851, cost=0.001890046, Train Accuracy=1.0000, Eval Accuracy=0.9350, Norm of Weights=25844.4199
Epoch:0901, cost=0.001819121, Train Accuracy=1.0000, Eval Accuracy=0.9350, Norm of Weights=26080.3594
Epoch:0951, cost=0.001755936, Train Accuracy=1.0000, Eval Accuracy=0.9350, Norm of Weights=26310.9219
Epoch:1001, cost=0.001699073, Train Accuracy=1.0000, Eval Accuracy=0.9325, Norm of Weights=26536.5977
Epoch:1051, cost=0.001647520, Train Accuracy=1.0000, Eval Accuracy=0.9325, Norm of Weights=26757.8105
Epoch:1101, cost=0.001600507, Train Accuracy=1.0000, Eval Accuracy=0.9325, Norm of Weights=26974.9258
Epoch:1151, cost=0.001557441, Train Accuracy=1.0000, Eval Accuracy=0.9325, Norm of Weights=27188.2422
Epoch:1201, cost=0.001517841, Train Accuracy=1.0000, Eval Accuracy=0.9325, Norm of Weights=27398.0293
Epoch:1251, cost=0.001481280, Train Accuracy=1.0000, Eval Accuracy=0.9325, Norm of Weights=27604.5820
Epoch:1301, cost=0.001447390, Train Accuracy=1.0000, Eval Accuracy=0.9325, Norm of Weights=27808.0020
Epoch:1351, cost=0.001415835, Train Accuracy=1.0000, Eval Accuracy=0.9325, Norm of Weights=28008.5566
Epoch:1401, cost=0.001386318, Train Accuracy=1.0000, Eval Accuracy=0.9325, Norm of Weights=28206.3594
Epoch:1451, cost=0.001358589, Train Accuracy=1.0000, Eval Accuracy=0.9300, Norm of Weights=28401.5312
Epoch:1501, cost=0.001332433, Train Accuracy=1.0000, Eval Accuracy=0.9300, Norm of Weights=28594.2266
Epoch:1551, cost=0.001307678, Train Accuracy=1.0000, Eval Accuracy=0.9300, Norm of Weights=28784.4531
Epoch:1601, cost=0.001284176, Train Accuracy=1.0000, Eval Accuracy=0.9300, Norm of Weights=28972.3613
Epoch:1651, cost=0.001261803, Train Accuracy=1.0000, Eval Accuracy=0.9300, Norm of Weights=29158.0625
Epoch:1701, cost=0.001240452, Train Accuracy=1.0000, Eval Accuracy=0.9300, Norm of Weights=29341.5488
Epoch:1751, cost=0.001220033, Train Accuracy=1.0000, Eval Accuracy=0.9300, Norm of Weights=29522.9199
Epoch:1801, cost=0.001200472, Train Accuracy=1.0000, Eval Accuracy=0.9300, Norm of Weights=29702.2598
Epoch:1851, cost=0.001181695, Train Accuracy=1.0000, Eval Accuracy=0.9300, Norm of Weights=29879.6602
Epoch:1901, cost=0.001163654, Train Accuracy=1.0000, Eval Accuracy=0.9300, Norm of Weights=30055.0801
Epoch:1951, cost=0.001146289, Train Accuracy=1.0000, Eval Accuracy=0.9300, Norm of Weights=30228.6738
As you can see, when the data is linearly separable, the norm of $W$ grows without bound. (Can you explain why?)
Exercise: add L2 regularization to the above code to prevent this from happening (it takes only one line of code, thanks to awesome TensorFlow!).