In [4]:
# %load /Users/facai/Study/book_notes/preconfig.py
%matplotlib inline

import matplotlib.pyplot as plt
import seaborn as sns
sns.set(color_codes=True)
#sns.set(font='SimHei')
plt.rcParams['axes.grid'] = False

#from IPython.display import SVG
def show_image(filename, figsize=None, res_dir=True):
    """Display an image file; by default look it up under ./res/."""
    if figsize:
        plt.figure(figsize=figsize)

    if res_dir:
        filename = './res/{}'.format(filename)

    plt.imshow(plt.imread(filename))

Chapter 7 Regularization for Deep Learning

The best fitting model is a large model that has been regularized appropriately.

7.1 Parameter Norm Penalties

\begin{equation} \tilde{J}(\theta; X, y) = J(\theta; X, y) + \alpha \Omega(\theta) \end{equation}

where $\Omega(\theta)$ is a parameter norm penalty.

Typically, one penalizes only the weights of the affine transformation at each layer and leaves the biases unregularized.

7.1.1 $L^2$ Parameter Regularization
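
The $L^2$ penalty $\frac{\alpha}{2} w^\top w$ contributes $\alpha w$ to the gradient, so each gradient step multiplicatively shrinks the weights ("weight decay"). A minimal sketch; the names `grad_J` and `eps` are illustrative, not from the book's code:

In [ ]:
# Sketch: L2 penalty acts as weight decay (illustrative names).
import numpy as np

def sgd_step_with_l2(w, grad_J, alpha=1e-2, eps=1e-1):
    # Regularized objective: J(w) + (alpha / 2) * w.T @ w
    # Its gradient: grad_J + alpha * w
    grad = grad_J + alpha * w
    # Equivalent view: shrink w by (1 - eps * alpha), then follow grad_J.
    return w - eps * grad

w = np.array([1.0, -2.0, 3.0])
print(sgd_step_with_l2(w, grad_J=np.zeros_like(w)))  # pure decay: w * (1 - eps * alpha)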

7.1.2 $L^1$ Regularization

The sparsity property induced by $L^1$ regularization (many weights become exactly zero) => feature selection
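
The sparsity can be seen in the soft-thresholding solution of a one-dimensional quadratic-plus-$L^1$ problem; a sketch, with `alpha` the penalty strength (illustrative name):

In [ ]:
# Sketch: L1 penalty zeroes out small coefficients (soft thresholding).
import numpy as np

def soft_threshold(w, alpha):
    # argmin_x  0.5 * (x - w)**2 + alpha * |x|
    return np.sign(w) * np.maximum(np.abs(w) - alpha, 0.0)

w = np.array([0.05, -0.3, 1.2, -0.01])
print(soft_threshold(w, alpha=0.1))  # small entries become exactly 0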

7.2 Norm Penalties as Constrained Optimization

Constrain $\Omega(\theta)$ to be less than some constant $k$ by constructing a generalized Lagrange function:

\begin{equation} \mathcal{L}(\theta, \alpha; X, y) = J(\theta; X, y) + \alpha(\Omega(\theta) - k) \end{equation}

In practice, the column norm limitation is always implemented as an explicit constraint with reprojection.
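
A sketch of reprojection: after each gradient step, rescale any weight column whose norm exceeds $k$ (function name is illustrative):

In [ ]:
# Sketch: explicit max-norm constraint via reprojection of weight columns.
import numpy as np

def project_columns(W, k=1.0):
    norms = np.linalg.norm(W, axis=0, keepdims=True)
    scale = np.minimum(1.0, k / np.maximum(norms, 1e-12))
    return W * scale  # columns with norm <= k are left untouched

W = np.array([[3.0, 0.1],
              [4.0, 0.2]])
print(np.linalg.norm(project_columns(W, k=1.0), axis=0))  # -> [1.0, ~0.22]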

7.3 Regularization and Under-Constrained Problems

The regularized matrix (e.g., $X^\top X + \alpha I$ in linear regression) is guaranteed to be invertible.
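
A quick check: $X^\top X$ is singular when features are linearly dependent, but $X^\top X + \alpha I$ is invertible for any $\alpha > 0$.

In [ ]:
# Sketch: regularization makes an underdetermined normal equation solvable.
import numpy as np

X = np.array([[1.0, 2.0],
              [2.0, 4.0]])          # second column = 2 * first -> X.T @ X singular
alpha = 0.1
A = X.T @ X + alpha * np.eye(2)     # regularized matrix
print(np.linalg.matrix_rank(X.T @ X), np.linalg.matrix_rank(A))  # 1 vs 2
print(np.linalg.solve(A, X.T @ np.array([1.0, 2.0])))            # now solvable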

7.4 Dataset Augmentation

Create fake data (a minimal sketch follows the list):

  • transform the inputs (e.g., translate, flip, crop images)
  • inject noise
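
A minimal augmentation sketch on an image array (assumed H×W with values in [0, 1]; the `noise_std` parameter is illustrative):

In [ ]:
# Sketch: simple dataset augmentation -- random horizontal flip plus noise.
import numpy as np

def augment(image, rng, noise_std=0.05):
    if rng.random() < 0.5:
        image = image[:, ::-1]                    # random horizontal flip
    image = image + rng.normal(0.0, noise_std, image.shape)
    return np.clip(image, 0.0, 1.0)               # keep valid pixel range

rng = np.random.default_rng(0)
fake_image = rng.random((4, 4))
print(augment(fake_image, rng).shape)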

7.5 Noise Robustness

  • add noise to the input data
  • add noise to the weights (Bayesian view: a distribution over the weights):
    equivalent to an additional regularization term.
  • add noise to the output targets: label smoothing (sketched below)
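
The book's formulation of label smoothing for a $k$-class softmax replaces the hard targets 0 and 1 with $\epsilon/(k-1)$ and $1-\epsilon$, respectively; a sketch:

In [ ]:
# Sketch: label smoothing -- soften hard 0/1 targets for a k-class softmax.
import numpy as np

def smooth_labels(one_hot, eps=0.1):
    k = one_hot.shape[-1]
    # 1 -> 1 - eps,   0 -> eps / (k - 1)
    return one_hot * (1.0 - eps) + (1.0 - one_hot) * eps / (k - 1)

y = np.eye(4)[[0, 2]]        # two one-hot examples, k = 4 classes
print(smooth_labels(y))      # each row still sums to 1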

7.6 Semi-Supervised Learning

Goal: learn a representation so that examples from the same class have similar representations.

7.7 Multi-Task Learning

  1. Task-specific parameters
  2. Generic parameters
In [5]:
show_image("fig7_2.png")

7.8 Early Stopping

Run training until the validation set error has not improved for some amount of time.

Use the parameters with the lowest validation set error seen during the whole training run.

In [6]:
show_image("fig7_3.png", figsize=[10, 8])

7.9 Parameter Tying and Parameter Sharing

  • regularize the parameters of one model (supervised) to be close to those of another model (unsupervised), as sketched below
  • force sets of parameters to be equal: parameter sharing => convolutional neural networks.
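
A sketch of the tying penalty $\Omega(w^{(A)}, w^{(B)}) = \|w^{(A)} - w^{(B)}\|_2^2$, which pulls the two parameter sets toward each other (names illustrative):

In [ ]:
# Sketch: parameter tying -- penalize distance between two models' weights.
import numpy as np

def tying_penalty(w_a, w_b, lam=1e-2):
    return lam * np.sum((w_a - w_b) ** 2)       # lam * ||w_a - w_b||^2

w_supervised = np.array([0.5, -1.0, 2.0])
w_unsupervised = np.array([0.4, -0.8, 2.1])
print(tying_penalty(w_supervised, w_unsupervised))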

7.10 Sparse Representations

Place a penalty on the activations of the units in a neural network, encouraging them to be sparse.
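
A sketch: add $\lambda \|h\|_1$ on the hidden activations $h$ (not on the weights) to the loss; names are illustrative.

In [ ]:
# Sketch: sparse-representation penalty on activations, not on weights.
import numpy as np

def loss_with_sparse_activations(base_loss, h, lam=1e-3):
    return base_loss + lam * np.sum(np.abs(h))  # encourages many h_i == 0

h = np.array([0.0, 0.7, 0.0, -0.2])             # hidden-layer activations
print(loss_with_sparse_activations(1.5, h))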

7.11 Bagging and Other Ensemble Methods

7.12 Dropout

It is usually necessary to increase the size of the model when using dropout.

When extremely few labeled training examples are available, dropout is less effective.
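
A sketch of inverted dropout (rescaling at train time so no change is needed at test time); the `keep_prob` name is illustrative:

In [ ]:
# Sketch: inverted dropout -- zero units at train time, rescale survivors.
import numpy as np

def dropout(h, keep_prob=0.8, rng=None, train=True):
    if not train:
        return h                                 # identity at test time
    rng = rng or np.random.default_rng()
    mask = rng.random(h.shape) < keep_prob       # keep each unit w.p. keep_prob
    return h * mask / keep_prob                  # rescale to preserve expectation

h = np.ones((2, 5))
print(dropout(h, keep_prob=0.8, rng=np.random.default_rng(0)))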

7.13 Adversarial Training

In [8]:
show_image("fig7_8.png", figsize=[10, 8])

7.14 Tangent Distance, Tangent Prop, and Manifold Tangent Classifier