
Deep Learning with PyTorch (Introduction)

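PyTorch is a Python-first computing package that serves two main needs:

A replacement for NumPy that provides GPU acceleration
A deep learning framework that provides maximum flexibility and speed

Goals of this section

Understand PyTorch's core data structures, Tensor and Variable, and the high-level neural network interface (nn).
Train a neural network for image classification (MNIST + CIFAR10)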

In [9]:

import numpy as np
import torch
import torch.autograd as autograd
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable

torch.manual_seed(1)

Out[9]:

<torch._C.Generator at 0x7f34ac1fdcc0>

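Torch's tensor library (data structures)

Tensors are similar to NumPy's ndarrays (multi-dimensional arrays), but a PyTorch Tensor can also be placed on a GPU to accelerate computation.

A 1D tensor is a vector
A 2D tensor is a matrix
A 3D tensor is usually just called a tensor
Tensors with four or more dimensions are simply higher-dimensional arrays of numbers

Tensors are the basic data type for computation. As with any data type, learning it comes down to four operations: create, delete, update, and query.

See the PyTorch documentation for the full list of tensor operations.

Create: creating tensors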

In [ ]:

# Create a torch.Tensor object with the given data.  It is a 1D vector
V_data = [1., 2., 3.]
V = torch.Tensor(V_data) # a vector
print(V)
# Index into V and get a scalar
print(V[0])


# Creates a matrix
M_data = [[1., 2., 3.], [4., 5., 6]]
M = torch.Tensor(M_data)
print(M)
# Index into M and get a vector
print(M[0])


# Create a 3D tensor of size 2x2x2.
T_data = [[[1.,2.], [3.,4.]],
          [[5.,6.], [7.,8.]]]
T = torch.Tensor(T_data)
print(T)
# Index into T and get a matrix
print(T[0])

print(
torch.randn((3, 4, 5)), # a random 3x4x5 tensor
torch.eye(5),
torch.ones(5),
torch.zeros(4,4),
torch.from_numpy(np.array([1, 2, 3])),
torch.linspace(start=-10, end=10, steps=5),
torch.logspace(start=-10, end=10, steps=5)
)

print(
torch.ones_like(T),
torch.zeros_like(T),
) # this raised an error on the PyTorch build used here; releases that provide ones_like/zeros_like run it fine

Creating GPU tensors: CUDA Tensors

A Tensor can be moved to the GPU with the .cuda() method to benefit from GPU acceleration.

In [ ]:

x = torch.Tensor([1., 2., 3.])
y = torch.Tensor([4., 5., 6.])
# Run on the GPU only if one is available
if torch.cuda.is_available():
    x = x.cuda()
    y = y.cuda()
    print(x + y)
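The same idea is often written so that the code runs unchanged with or without a GPU. The following is a minimal sketch that assumes PyTorch 0.4 or later; torch.device and .to() are not used elsewhere in this notebook, and the tensor values are arbitrary.

```python
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.tensor([1., 2., 3.]).to(device)
y = torch.tensor([4., 5., 6.]).to(device)

z = x + y          # computed on whichever device holds x and y
z_cpu = z.cpu()    # bring the result back to host memory
print(device, z_cpu)
```

Keeping the device in one variable means the rest of the script never has to branch on GPU availability.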

Delete: removing data

In [ ]:

# There is no in-place delete: build a new tensor that leaves out the unwanted elements
# torch.split()
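Building a new tensor without the unwanted elements can be done in several ways. The sketch below shows three of them on an arbitrary example tensor; the variable names are chosen for illustration only.

```python
import torch

t = torch.tensor([10., 20., 30., 40., 50.])

# Option 1: drop index 2 by concatenating the slices on either side of it.
without_idx2 = torch.cat([t[:2], t[3:]])

# Option 2: keep only the elements selected by a boolean mask.
mask = t != 30.
masked = t[mask]

# torch.split cuts a tensor into chunks of the given size along a dimension.
chunks = torch.split(t, 2)   # chunk sizes 2, 2, 1

print(without_idx2, masked, [c.tolist() for c in chunks])
```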

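Update: tensor arithmetic, concatenation, reshaping, and conversion (Operations with Tensors)

Arithmetic (several ways to write an addition)

Note: methods whose names end with _ modify the tensor in place.

For example, x.copy_(y) and x.t_() change x.

In contrast, x.t() returns a new tensor and leaves x unchanged.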

In [ ]:

x = torch.Tensor([ 1., 2., 3. ])
y = torch.Tensor([ 4., 5., 6. ])
z = x + y # the result is still a torch.Tensor #----------------0
# Call the add() function
z = torch.add(x, y)              #----------------1
# Write the result of the addition into z
z = torch.Tensor(3)
torch.add(x, y, out=z)           #----------------2
# In-place versus out-of-place addition
y.add(x) # out-of-place addition, y is unchanged   #----------------3
print('y=',y)
y.add_(x) # in-place addition, y is modified       #----------------4
print('y=',y)
print('z=',z)

Concatenation (along a given dimension)

In [ ]:

# By default, it concatenates along the first axis (concatenates rows)
x_1 = torch.randn(2, 5)
y_1 = torch.randn(3, 5)
z_1 = torch.cat([x_1, y_1]) # same as torch.cat([x_1, y_1], 0): concatenate along dimension 0
print('z_1=',z_1)

# Concatenate columns:
x_2 = torch.randn(2, 3)
y_2 = torch.randn(2, 5)
z_2 = torch.cat([x_2, y_2], 1) # second arg specifies which axis to concat along (here dimension 1)
print('z_2=',z_2)

# If your tensors are not compatible, torch will complain.  Uncomment to see the error
# torch.cat([x_1, x_2])  # the shapes do not line up, so this raises an error

Reshaping (reshape)

In [ ]:

x = torch.randn(2, 3, 4)
print('x=',x)
print(x.view(2, 12)) # Reshape to 2 rows, 12 columns
print(x.view(2, -1)) # Same as above.  If one of the dimensions is -1, its size can be inferred
# The size of a -1 dimension is inferred automatically; here -1 stands for 3x4=12
print(x.view(1, -1))
# Here -1 stands for 2x3x4=24

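Conversion (to and from NumPy)

A CPU Tensor and the NumPy array it is converted to or from share the same memory, so the conversion is fast and uses almost no extra resources.

This also means that changing one changes the other.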

In [ ]:

# 4.1 torch Tensor -> numpy Array
a = torch.ones(5)
b = a.numpy()
print('a=',a,'b=',b)
a.add_(1) # methods ending in `_` are in-place and modify the tensor itself
print('a=',a,'b=',b) # the tensor changed and so did the NumPy array, because they share memory
print('\n\n')
# 4.2 numpy Array -> torch Tensor
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print('a=',a,'b=',b) # the array changed and so did the tensor, because they share memory
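When the shared memory is not wanted, copying the data first breaks the link. A minimal sketch, using clone() as one way to make that copy (the variable names are illustrative):

```python
import numpy as np
import torch

a = torch.ones(3)
shared = a.numpy()                 # shares memory with a
independent = a.clone().numpy()    # clone() copies the data first

a.add_(1)                          # in-place update of the tensor

print(shared)        # follows the tensor
print(independent)   # keeps the old values
```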

Query: tensor attributes and the data inside a tensor

NOTE: torch.Size is a subclass of tuple, so it supports all tuple operations, for example x.size()[0]

In [ ]:

print(
    'T.size()=', T.size(),
    'T.size()[1]=', T.size()[1],
    'T.sort()=', T.sort(),
    'T.sign()=', T.sign(),
    
# Tensors support NumPy-like indexing (standard numpy-like indexing with all bells and whistles)
    'T[:,1]=', T[:,1]
)

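Computation Graphs and Automatic Differentiation (the core)

Autograd: automatic differentiation

autograd can automatically differentiate every operation performed on Tensors. It uses a define-by-run mechanism (as opposed to define-and-run): the backward pass is determined by how your code actually runs, which sounds abstract but simply means that every iteration may execute a different sequence of operations. This contrasts with the static-graph style of TensorFlow, where a graph is defined up front and then executed repeatedly without change.

Variable

autograd.Variable is the central class of autograd. It is a thin wrapper around a Tensor and supports nearly all Tensor operations, so almost anything you can do with a Tensor you can do with a Variable. Once Tensors are wrapped in Variables, you can call .backward() on the result of a computation to run backpropagation and compute all gradients automatically.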

In [ ]:

import torch.autograd as autograd
from torch.autograd import Variable
# Variables wrap tensor objects
# autograd.Variable is the central class of autograd; it is a thin wrapper around a Tensor
x = autograd.Variable( torch.Tensor([1., 2., 3]), requires_grad=True )
# You can access the data with the .data attribute
print(x.data)

# You can also do all the same operations you did with tensors with Variables.
y = autograd.Variable( torch.Tensor([4., 5., 6]), requires_grad=True )
z = x + y
print(z.data)

# BUT z knows something extra: the function that created it, which is what lets .backward() compute gradients
print(z.grad_fn)

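The .data attribute gives access to the Tensor wrapped by a Variable, and .grad gives access to the corresponding gradient (itself a Variable, not a Tensor).

Note: .grad has the same shape as .data, and .grad is accumulated: every backward pass adds the new gradient to the one already stored, so zeroing the gradients (zero_grad) between passes is necessary.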

In [ ]:

x = Variable(torch.ones(2, 2), requires_grad = True)
y = x.mean()
y.backward()
x.grad # y = x.mean-> y = 0.25 * (x[0][0] + x[0][1] + x[1][0] + x[1][1]) 

In [ ]:

# .grad accumulates with every call to y.backward()
print('x.data=', x.data)
y.backward()
print('x.grad=', x.grad)
y.backward()
print('x.grad=', x.grad)

# Zero x.grad.data first, then call y.backward(): no accumulation
x.grad.data.zero_()
y.backward()
print('x.grad=', x.grad)
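The accumulation behaviour can be checked with concrete numbers. The sketch below assumes PyTorch 0.4 or later, where Tensor and Variable are merged and a tensor created with requires_grad=True can be used directly; the names first, accumulated, and after_reset are illustrative.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)

x.mean().backward()          # d(mean)/dx = 0.25 for every entry
first = x.grad.clone()

x.mean().backward()          # a second pass is added to the stored gradient
accumulated = x.grad.clone()

x.grad.zero_()               # reset before the next pass
x.mean().backward()
after_reset = x.grad.clone()

print(first, accumulated, after_reset)
```

Each call builds a fresh graph from x, so no retain_graph argument is needed here.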

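Function: how the dynamic graph is implemented

The other important class in autograd is Function. It is useful when you implement your own operations with automatic differentiation (note: in practice, prefer nn.Module wherever possible).

Variable and Function are interconnected and together build an acyclic graph that records the history of the computation (in graph terms, every node remembers which node, or which edge, points to it). This information is stored in the .grad_fn attribute (called .creator in early releases), which references the Function object whose output is this Variable.

A Variable created directly by the user has no such Function, so its .grad_fn is None; for example, assert Variable(torch.Tensor(3, 4)).grad_fn is None.

Note: for background on computation graphs and backpropagation, http://colah.github.io/posts/2015-08-Backprop/ is highly recommended.

If a Variable is a scalar (it holds a single number rather than a vector), you can call .backward() without arguments and the gradient argument defaults to 1. Otherwise you must pass a gradient with the same shape as the Variable.

$\frac{d z}{d x} = \frac{d z}{d y} \frac{d y}{d x}$

Suppose the Variable is y. Backpropagation computes dz/dx, and the argument passed to .backward is dz/dy.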
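As a small illustration of passing dz/dy explicitly, the sketch below uses y = 2x, so dy/dx = 2 and x.grad should equal 2 * dz/dy. It assumes PyTorch 0.4 or later, and the values in dz_dy are hypothetical.

```python
import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)
y = x * 2                      # non-scalar output, dy/dx = 2

# Hypothetical upstream gradient dz/dy, chosen only for illustration.
dz_dy = torch.tensor([0.1, 1.0, 10.0])

y.backward(dz_dy)              # stores dz/dx = dz/dy * dy/dx in x.grad
print(x.grad)
```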

In [ ]:

x = autograd.Variable(torch.ones(2,2), requires_grad = True)
print('x=', x)
y = x + 2
print('y=', y)
print('x.grad_fn=', x.grad_fn)
print('y.grad_fn=', y.grad_fn)

z = y * y * 3
print('z=', z)

out = z.mean()
print('out=', out)
# let's backprop now
out.backward() # for a scalar, out.backward() is equivalent to passing a gradient of 1.0
# print the gradient d(out)/dx
print('x.grad=', x.grad)

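Every entry of the resulting matrix is 4.5, which is easy to verify on paper. Call the out Variable $o$.
Then $\begin{align} o &= \frac{1}{4}\sum_i z_i\\ z_i &= 3(x_i+2)^2\\ z_i\bigr\rvert_{x_i=1} &= 27\\ \frac{\partial o}{\partial x_i} &= \frac{3}{2}(x_i+2)\\ \frac{\partial o}{\partial x_i}\bigr\rvert_{x_i=1} &= \frac{9}{2} = 4.5 \end{align}$ .

By default, a backward pass frees all the internal buffers held by the graph, so if you want to backpropagate twice through some part of the graph you need to pass retain_graph=True (named retain_variables in early releases) to the first backward call.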

In [ ]:

x = Variable(torch.ones(2, 2), requires_grad=True)
y = x + 2
y.backward(torch.ones(2, 2), retain_graph=True)
# print the gradient accumulated in x.grad
print('x.grad=', x.grad)
z = y * y
print('z=', z)

gradient = torch.randn(2, 2)

# this would fail if we didn't specify that we want to retain the graph
y.backward(gradient)

print('x.grad=', x.grad)
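The effect of retain_graph is easiest to see with an operation that stores its inputs for the backward pass, such as a multiplication. The following sketch assumes PyTorch 0.4 or later; the variable names are illustrative.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = (x * x).sum()                 # multiplication saves x for its backward pass

y.backward(retain_graph=True)     # keep the buffers for a second pass
y.backward()                      # works, and the gradients accumulate
grad_after_two = x.grad.clone()   # 2*x from each of the two passes

# Without retain_graph the buffers are freed after the first pass.
z = (x * x).sum()
z.backward()
try:
    z.backward()
    second_pass_failed = False
except RuntimeError:
    second_pass_failed = True

print(grad_after_two)
print("second backward failed:", second_pass_failed)
```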

With autograd you can do many crazy things:

More documentation on Variable and Function: pytorch.org/docs/autograd.html

In [ ]:

x = torch.randn(3)
x = Variable(x, requires_grad = True)
y = x * 2
while y.data.norm() < 1000:
    y = y * 2
print('y=',y)
gradients = torch.FloatTensor([0.1, 1.0, 0.0001])
y.backward(gradients)
print('x.grad=', x.grad)
print('y=',y)

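Neural Network

The torch.nn package is used to build neural networks.

nn is built on top of autograd to define and run neural networks. Its interface resembles the nn package of Lua Torch, but the implementation is almost entirely different.

An nn.Module can be thought of as a network: it contains a number of layers, and calling its forward(input) returns the output of the forward pass.

Let's practice with LeNet + MNIST:

This is a basic feed-forward network: it takes the input, passes it through one layer after another, and finally produces the output.

The procedure is as follows (most neural network training follows this pattern):

Define the network structure and its parameters
Iterate over the data
- Preprocess the data and feed it to the network
- Compute the loss (how far the output is from the target)
- Backpropagate the gradients (errors)
- Update the network parameters (with the most basic SGD rule: weight = weight - learning_rate * gradient)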
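The steps above can be sketched on a toy problem before moving to a real network. This example fits a single linear unit to synthetic data generated from y = 3x + 1; the data, the learning rate, and the number of steps are assumptions made for illustration, and the code assumes PyTorch 0.4 or later.

```python
import torch

torch.manual_seed(0)

# Toy data for illustration only: targets follow y = 3x + 1.
inputs = torch.linspace(-1, 1, steps=20).unsqueeze(1)
targets = 3 * inputs + 1

# 1. Define the parameters.
weight = torch.zeros(1, 1, requires_grad=True)
bias = torch.zeros(1, requires_grad=True)
learning_rate = 0.1

losses = []
for step in range(200):                        # 2. iterate over the data
    outputs = inputs @ weight + bias            #    forward pass
    loss = ((outputs - targets) ** 2).mean()    #    compute the loss
    loss.backward()                             #    backpropagate
    with torch.no_grad():                       #    weight = weight - lr * gradient
        weight -= learning_rate * weight.grad
        bias -= learning_rate * bias.grad
        weight.grad.zero_()                     #    clear the accumulated gradients
        bias.grad.zero_()
    losses.append(loss.item())

print(losses[0], losses[-1], weight.item(), bias.item())
```

The loss should shrink as the loop runs, and weight and bias should approach 3 and 1.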

Defining the network

In [ ]:

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5) # 1 input channel (grayscale image), 6 output channels, 5x5 kernel
        self.conv2 = nn.Conv2d(6, 16, 5) # 6 input channels (the previous layer outputs 6 channels)
        self.fc1   = nn.Linear(16*5*5, 120) # an affine layer, i.e. y = Wx + b
        self.fc2   = nn.Linear(120, 84) # same as above
        self.fc3   = nn.Linear(84, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # Max pooling over a (2, 2) window; it can also be written as just 2, as below
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        # If the size is a square you can only specify a single number
        x = x.view(-1, self.num_flat_features(x))
        # x = x.view(x.size()[0], -1) # this should work as well
        # if x.sum()>100: return x

        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        # if x[0]>0 : return 1

        return x

    # x has size N*C*H*W
    # N is the batch size; we want the feature size, i.e. the number of values per sample
    def num_flat_features(self, x):
        size = x.size()[1:] # all dimensions except the batch dimension

        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net = Net()

In [ ]:

# I actually prefer this style:
class MyNet(nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()
        self.main = nn.Sequential(
            # input is (nc) x 64 x 64
            nn.Conv2d(2, 3 * 8, 4, 2, 1, bias=False), # output channels must equal the next layer's input channels (3 * 8)
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(3 * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid()
        )
    def forward(self, input):
        return self.main(input)

In [ ]:

# Or, even simpler:
my_net = nn.Sequential(
            # input is (nc) x 64 x 64
            nn.Conv2d(1, 3 * 8, 4, 2, 1, bias=False), # output channels must equal the next layer's input channels (3 * 8)
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(3 * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid()
        )

In [ ]:

print(net)
print(MyNet())
print(my_net)

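Once forward is defined in a subclass of nn.Module, the backward function is implemented automatically (by autograd).

Inside forward you can use any function that Tensors support, as well as if statements, for loops, print, logging, and so on. You can write it exactly as you would write ordinary Python.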
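To make the point about ordinary Python control flow concrete, here is a small sketch of a module whose computation depends on its input. DynamicNet and dyn_net are hypothetical names, the layer sizes are arbitrary, and the code assumes PyTorch 0.4 or later.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicNet(nn.Module):
    """Hypothetical module whose depth depends on the input."""

    def __init__(self):
        super(DynamicNet, self).__init__()
        self.hidden = nn.Linear(4, 4)
        self.out = nn.Linear(4, 1)

    def forward(self, x):
        # Ordinary Python loop: apply the hidden layer once per row of
        # the batch, so different batch sizes build different graphs.
        for _ in range(x.size(0)):
            x = F.relu(self.hidden(x))
        if x.sum().item() > 100:      # data-dependent branch
            x = x / 10
        return self.out(x)


dyn_net = DynamicNet()
for batch_size in (1, 3):
    y = dyn_net(torch.randn(batch_size, 4))
    y.sum().backward()                # autograd follows whatever path ran
    print(batch_size, y.shape)
```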

The learnable parameters are returned by net.parameters().

In [ ]:

params = list(net.parameters())
print(len(params))
print(params[0].size()) # conv1's .weight

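Remember: in the PyTorch version used here, the inputs and outputs of forward are autograd.Variable objects, because only Variables support automatic differentiation and plain Tensors do not. (Since PyTorch 0.4 the two classes are merged, and a Tensor can be passed directly.)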

In [ ]:

input = Variable(torch.randn(1, 1, 32, 32))
out = net(input)
print('out=', out)
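#net.zero_grad() # zero the gradients of all parameters
out.backward(torch.ones(1, 10), retain_graph=True) # backpropagate
# out.backward(torch.randn(1, 10)) # what happens on a second backward pass without zeroing? The gradients accumulate (using ones instead of randn makes the effect easier to see)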
print('out=', out)

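NOTE: torch.nn only supports mini-batches

It does not accept a single sample as input; the input is always a batch. If you do need to feed a single sample, use input.unsqueeze(0) to wrap it as a batch that contains one sample.

For example, nn.Conv2d expects a 4D Tensor of shape nSamples x nChannels x Height x Width. nSamples can be 1, but the shape cannot be just nChannels x Height x Width.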
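A minimal sketch of wrapping one sample as a batch with unsqueeze(0); the image here is random data, and the layer sizes follow the conv1 layer of the network above.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 6, 5)

single_image = torch.randn(1, 32, 32)    # nChannels x Height x Width
batch = single_image.unsqueeze(0)        # add a leading batch dimension of size 1

output = conv(batch)
print(batch.shape, output.shape)
```

A 5x5 kernel without padding shrinks each spatial dimension from 32 to 28.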

A Module exposes many attributes through which you can inspect its weights, parameters, and so on.

In [ ]:

print(net)

for param in net.parameters():
     print(type(param.data), param.size())
     print(list(param.data)) 

print(net.state_dict().keys())
# the keys of the parameters

for key in net.state_dict(): # the model parameters
    print(key, 'corresponds to', list(net.state_dict()[key]))

Loss functions

Definition: A loss function takes the (output, target) pair of inputs, and computes a value that estimates how far away the output is from the target.

The nn package provides several different loss functions.

The simplest one is nn.MSELoss, which computes the mean squared error.

In [ ]:

output = net(input)
# target = Variable(torch.range(1, 10))  # a dummy target, for example
# torch.range is deprecated in favor of torch.arange and will be removed in 0.3.
# arange generates values in [start; end), not [start; end].
target = Variable(torch.arange(1, 11).float().view(1, -1))  # a dummy float target with the same shape as output
criterion = nn.MSELoss()
loss = criterion(output, target)
loss
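What nn.MSELoss computes can be checked by hand on a few numbers. The values below are hypothetical and chosen so the arithmetic is easy to follow: the squared differences are 0.25, 0.25, and 4, whose mean is 1.5.

```python
import torch
import torch.nn as nn

output = torch.tensor([[0.5, 1.5, 2.0]])
target = torch.tensor([[1.0, 1.0, 4.0]])

loss = nn.MSELoss()(output, target)
manual = ((output - target) ** 2).mean()   # mean of the squared differences

print(loss.item(), manual.item())
```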

If you now trace loss backward through its .grad_fn attribute, you will see a computation graph that looks like this:

input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d  
      -> view -> linear -> relu -> linear -> relu -> linear 
      -> MSELoss
      -> loss

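So when we call loss.backward(), the graph that was built dynamically during the forward pass is differentiated, and every Parameter in the graph has its gradient computed and added to the gradient it already stores (which is why zero_grad is necessary).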

In [ ]:

# For illustration, let us follow a few steps backward
print(loss.grad_fn) # MSELoss
print(loss.grad_fn.next_functions[0][0]) # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0]) # ReLU/Threshold

In [ ]:

# Run .backward() and compare the gradients before and after the call
# now we shall call loss.backward(), and have a look at conv1's bias gradients before and after the backward.
net.zero_grad() # zeroes the gradient buffers of all parameters
print('conv1.bias.grad -- before backward')
print(net.conv1.bias.grad)
loss.backward() # this cell can only be run once; running it again raises an error asking for retain_graph=True
print('conv1.bias.grad -- after backward')
print(net.conv1.bias.grad)

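NOTE: the nn package contains a large number of functions and utilities used in neural networks; see the detailed documentation at http://pytorch.org/docs/nn.html

Optimizers

The simplest update rule is stochastic gradient descent (SGD):

weight = weight - learning_rate * gradient

It is easy to implement:

learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate) # in-place update

torch.optim contains many commonly used optimization methods, such as RMSProp and Adam, and they are very easy to use. Using their source code as a reference, implementing your own optimizer is also quite simple.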

In [ ]:

import torch.optim as optim
# create your optimizer
optimizer = optim.SGD(net.parameters(), lr = 0.01)

# in your training loop:
optimizer.zero_grad() # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step() # Does the update
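To see that optim.SGD without momentum applies exactly the rule weight = weight - learning_rate * gradient, the sketch below updates two identical copies of a small model, one through the optimizer and one by hand, and compares the results. The model, data, and names are hypothetical, and the code assumes PyTorch 0.4 or later.

```python
import copy
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
learning_rate = 0.01

model_a = nn.Linear(3, 2)
model_b = copy.deepcopy(model_a)        # identical starting weights

inputs = torch.randn(5, 3)
targets = torch.randn(5, 2)
criterion = nn.MSELoss()

# Model A: let optim.SGD apply the update.
optimizer = optim.SGD(model_a.parameters(), lr=learning_rate)
optimizer.zero_grad()
criterion(model_a(inputs), targets).backward()
optimizer.step()

# Model B: apply weight = weight - learning_rate * gradient by hand.
model_b.zero_grad()
criterion(model_b(inputs), targets).backward()
with torch.no_grad():
    for p in model_b.parameters():
        p -= learning_rate * p.grad

same = all(torch.allclose(pa, pb)
           for pa, pb in zip(model_a.parameters(), model_b.parameters()))
print("updates match:", same)
```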

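Data loading and preprocessing

Generally, when you have to deal with image, text, audio, or video data, you can use standard Python packages to load the data into a NumPy array and then convert that array into a torch.Tensor.

For images, packages such as Pillow and OpenCV are useful
For audio, packages such as scipy and librosa
For text, either raw Python or Cython based loading, or NLTK and SpaCy are useful.

For vision in particular, the most convenient option is the torchvision package, which provides loaders for common image datasets such as Imagenet, CIFAR10, and MNIST, together with common data transformations. This makes data loading very convenient and avoids writing boilerplate code.

Let's look at CIFAR10. It has 10 classes: 'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'. The images are of size 3x32x32, i.e. 3-channel color images of 32x32 pixels.

Exercise: training an image classifier

The steps are:

Load and preprocess CIFAR10 using torchvision
Define the network
Define the loss function
Train the network (and update its parameters)
Test the network

Loading and preprocessing CIFAR10

ImageFolder

ImageFolder assumes the images are arranged in folders as follows: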

root/good/xxx.png
root/good/xxy.png
root/good/xxz.png

root/bad/123.png
root/bad/nsdf3.png
root/bad/asd932_.png

They can be loaded with code such as the following:

val_dataset = MyImageFolder('/root',
                transform=transforms.Compose([transforms.Scale(opt.image_size),
                                             # transforms.Lambda(lambda image:image.rotate(random.randint(0,359))),
                                              transforms.ToTensor(),
                                              transforms.Normalize([0.5]*3,[0.5]*3)
                                             ]), loader=my_loader)

val_dataloader=torch.utils.data.DataLoader(val_dataset,opt.batch_size,True,num_workers=opt.workers, collate_fn=my_collate)

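Here my_loader loads the image at a given path into memory, and my_collate checks the data loaded by the dataset (MyImageFolder, my_loader, my_collate, and opt are user-defined and not shown here). transforms contains two broad kinds of operations:

Operations on PIL Image objects
Operations on Torch Tensor objects

You can also use transforms.Lambda to apply an arbitrary function.

For example, to apply a random rotation to each image every time it is loaded: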

val_dataset = MyImageFolder('/root',
                transform=transforms.Compose([transforms.Scale(opt.image_size),
                                              transforms.Lambda(lambda image:image.rotate(random.randint(0,359))),
                                              transforms.ToTensor(),
                                              transforms.Normalize([0.5]*3,[0.5]*3)
                                             ]), loader=my_loader)

val_dataloader=torch.utils.data.DataLoader(val_dataset,opt.batch_size,True,num_workers=opt.workers, collate_fn=my_collate)

In [8]:

import torchvision
import torchvision.transforms as transforms

In [10]:

# The output of torchvision datasets are PILImage images of range [0, 1].
# We use a transform to convert them to Tensors of normalized range [-1, 1]

transform=transforms.Compose([transforms.ToTensor(),
                              transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
                             ])
trainset = torchvision.datasets.CIFAR10(root='./Data/', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, 
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./Data/', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4, 
                                          shuffle=False, num_workers=2)
classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

Files already downloaded and verified
Files already downloaded and verified
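The arithmetic behind Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) is (x - mean) / std per channel, which maps [0, 1] to [-1, 1]. The sketch below reproduces it with plain tensor operations on a few hypothetical pixel values, without using torchvision.

```python
import torch

# Hypothetical pixel values in [0, 1], as ToTensor would produce.
pixels = torch.tensor([0.0, 0.25, 0.5, 1.0])
mean, std = 0.5, 0.5

normalized = (pixels - mean) / std      # what Normalize computes per channel
restored = normalized * std + mean      # the inverse mapping

print(normalized, restored)
```

The inverse mapping is the same computation as the img / 2 + 0.5 step used to unnormalize images before displaying them.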

Let's look at a few images

In [ ]:

# functions to show an image
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
def imshow(img):
    img = img / 2 + 0.5 # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1,2,0)))

In [ ]:

# show some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)
print(images.size())
# print images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s'%classes[labels[j]] for j in range(4)))

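Defining the CNN

Exercise: copy the LeNet + MNIST network from above and change the first argument to 3 channels (MNIST images are grayscale with 1 channel, while CIFAR10 images are color with 3 channels).

Hint: You only have to change the first layer, change the number 1 to be 3.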

In [ ]:

class Net0(nn.Module):
    def __init__(self):
        super(Net0, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1   = nn.Linear(16*5*5, 120)
        self.fc2   = nn.Linear(120, 84)
        self.fc3   = nn.Linear(84, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2)) 
        x = F.max_pool2d(F.relu(self.conv2(x)), 2) 
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
    
    def num_flat_features(self, x):
        size = x.size()[1:] 

        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net0 = Net0()

In [ ]:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool  = nn.MaxPool2d(2,2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1   = nn.Linear(16*5*5, 120)
        self.fc2   = nn.Linear(120, 84)
        self.fc3   = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16*5*5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()

Defining the loss function and optimizer

In [ ]:

from torch import optim
criterion = nn.CrossEntropyLoss() # use a Classification Cross-Entropy loss
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
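nn.CrossEntropyLoss takes raw class scores and integer labels, and combines a log-softmax with a negative log-likelihood. The sketch below checks this by hand on hypothetical scores for two samples and three classes; it assumes PyTorch 0.4 or later.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical scores for a batch of 2 samples and 3 classes.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
labels = torch.tensor([0, 2])

loss = nn.CrossEntropyLoss()(logits, labels)

# The same value by hand: log-softmax, pick the true class, then average.
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(2), labels].mean()

print(loss.item(), manual.item())
```

This is why the network's last layer outputs raw scores and has no softmax of its own.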

Training the network

Write a for loop that repeatedly:

Feeds in the data
Computes the loss
Updates the parameters

In [ ]:

for epoch in range(2): # loop over the dataset multiple times 
    
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0): # this loop ends once every sample has been fed in once, hence the outer for epoch loop
        # get the inputs
        inputs, labels = data
        
        # wrap them in Variable
        inputs, labels = Variable(inputs), Variable(labels)
        
        # zero the parameter gradients
        optimizer.zero_grad()
        
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()        
        optimizer.step()
        
        # print statistics
        running_loss += loss.item() # .item() returns the Python number (PyTorch 0.4+); loss.data[0] only works on older releases
        if i % 2000 == 1999: # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' % (epoch+1, i+1, running_loss / 2000))
            running_loss = 0.0
print('Finished Training')

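We trained for 2 epochs (every image was fed through training twice).

Now let's check whether the network has learned anything: feed test images into the network, compute the predicted labels, and compare them with the true labels.

First, let's look at a few images from the test set.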

In [ ]:

dataiter = iter(testloader)
images, labels = next(dataiter)

# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s'%classes[labels[j]] for j in range(4)))

Computing the labels

In [ ]:

# Compute the score (energy) of each image for every class
outputs = net(Variable(images))

# the outputs are energies for the 10 classes. 
# Higher the energy for a class, the more the network 
# thinks that the image is of the particular class

# Take the class with the highest score
_, predicted = torch.max(outputs.data, 1)
print('Predicted: ', ' '.join('%5s'% classes[predicted[j]] for j in range(4)))

Not bad, at least better than random guessing. Next, let's look at the accuracy on the whole test set.

In [ ]:

correct = 0
total = 0
for data in testloader:
    images, labels = data
    outputs = net(Variable(images))
    _, predicted = torch.max(outputs.data, 1)
    total += labels.size(0)
    correct += (predicted == labels).sum()

print('Accuracy of the network on the 10000 test images: %d %%' % (100 * correct / total))

This is far better than random guessing (10% accuracy), so the network has learned something.

In [ ]:

class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
for data in testloader:
    images, labels = data
    outputs = net(Variable(images))
    _, predicted = torch.max(outputs.data, 1)
    c = (predicted == labels).squeeze()
    for i in range(4):
        label = labels[i]
        class_correct[label] += c[i]
        class_total[label] += 1

In [ ]:

for i in range(10):
    print('Accuracy of %5s : %2d %%' % (classes[i], 100 * class_correct[i] / class_total[i]))

The classic architectures are already defined according to their papers and are ready to use

In [ ]:

import torchvision.models as models
alexnet = models.alexnet(pretrained=True)  # the model is defined as in the paper and comes with pretrained weights
# this makes fine-tuning and feature extraction straightforward

Training on the GPU

Just as we moved a Tensor from the CPU to the GPU earlier, a model can be moved just as easily. This converts all of the model's parameters and buffers to CUDA tensors.

In [ ]:

net.cuda()

In [ ]:

import torch

### tensor example
x_cpu = torch.randn(10, 20)
w_cpu = torch.randn(20, 10)
# direct transfer to the GPU
x_gpu = x_cpu.cuda()
w_gpu = w_cpu.cuda()
result_gpu = x_gpu @ w_gpu
# get back from GPU to CPU
result_cpu = result_gpu.cpu()

### model example
model = model.cuda()
# train step
inputs = Variable(inputs.cuda())
outputs = model(inputs)
# get back from GPU to CPU
outputs = outputs.cpu()

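If you do not see a large speedup over the CPU, don't worry: this network is simply very small.

Exercise: increase the depth and width of the network and see how the speedup changes (the second argument of the first nn.Conv2d and the first argument of the second nn.Conv2d must be equal; you know why).

Because we sometimes want to run the same model on both the CPU and the GPU without changing the code, a small wrapper is useful: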

In [ ]:

class Trainer:
    def __init__(self, model, use_cuda=False, gpu_idx=0):
        self.use_cuda = use_cuda
        self.gpu_idx = gpu_idx
        self.model = self.to_gpu(model)

    def to_gpu(self, tensor):
        if self.use_cuda:
            return tensor.cuda(self.gpu_idx)
        else:
            return tensor

    def from_gpu(self, tensor):
        if self.use_cuda:
            return tensor.cpu()
        else:
            return tensor

    def train(self, inputs):
        inputs = self.to_gpu(inputs)
        outputs = self.model(inputs)
        outputs = self.from_gpu(outputs)
        return outputs

Final architecture

Here is some pseudocode that summarizes the overall structure:

In [ ]:

class ImagesDataset(torch.utils.data.Dataset):
    pass

class Net(nn.Module):
    pass

model = Net()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
criterion = torch.nn.MSELoss()

dataset = ImagesDataset(path_to_images)
data_loader = torch.utils.data.DataLoader(dataset, batch_size=10)

train = True
for epoch in range(epochs):
    if train:
        scheduler.step()  # step the StepLR object created above, not the lr_scheduler module

    for inputs, labels in data_loader:
        inputs = Variable(to_gpu(inputs))
        labels = Variable(to_gpu(labels))

        outputs = model(inputs)
        loss = criterion(outputs, labels)
        if train:
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    if not train:
        save_best_model(epoch_validation_accuracy)


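Next (further study)

How to fine-tune

How to initialize parameters
How to save and load models
How to set different learning rates for different layers
How to freeze the learning of certain layers
How to extract features from a given layer

How to initialize parameters (using torch.nn.init)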

In [ ]:

from torch.nn import init

def initNetParams(net):
    '''Init net parameters.'''
    for m in net.modules():
        if isinstance(m, nn.Conv2d):
            init.xavier_uniform_(m.weight)
            # bias is None when the layer was built with bias=False
            if m.bias is not None:
                init.constant_(m.bias, 0)
        elif isinstance(m, nn.BatchNorm2d):
            init.constant_(m.weight, 1)
            init.constant_(m.bias, 0)
        elif isinstance(m, nn.Linear):
            init.normal_(m.weight, std=1e-3)
            if m.bias is not None:
                init.constant_(m.bias, 0)

initNetParams(net)

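How to save and load models (torch.save(), torch.load('.pth'))

Saving a ConvNet

torch.save() can store the network structure and the model parameters in two ways:

Save the entire network, both structure and parameters: the object saved is the network net;
Save only the trained parameters: the object saved is net.state_dict().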

In [ ]:

torch.save(net, 'net.pkl')  # save the entire network: structure and parameters
torch.save(net.state_dict(), 'net_params.pkl') # save only the model parameters

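Loading a ConvNet

There are two matching ways to load.

For the first way (the entire network), torch.load('.pth') directly returns a new network object.
For the second way (parameters only), first construct the corresponding network, then restore the parameters with net.load_state_dict(torch.load('.pth')).

For large networks, the first way takes more time and more storage space.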

In [ ]:

# Save and load the entire model
torch.save(model_object, 'model.pth')  
model = torch.load('model.pth')  

# Save and load only the model parameters
torch.save(model_object.state_dict(), 'params.pth')  
model_object.load_state_dict(torch.load('params.pth')) 
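The parameters-only round trip can be tried without touching the disk. In the sketch below an in-memory io.BytesIO buffer stands in for the .pth file, and the small nn.Linear model is a hypothetical stand-in for a real network.

```python
import io
import torch
import torch.nn as nn

torch.manual_seed(0)
source = nn.Linear(4, 2)

# Save only the parameters; a BytesIO buffer stands in for a file on disk.
buffer = io.BytesIO()
torch.save(source.state_dict(), buffer)
buffer.seek(0)

# Rebuild the same architecture, then restore the parameters into it.
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(buffer))

same = all(torch.equal(a, b)
           for a, b in zip(source.state_dict().values(),
                           restored.state_dict().values()))
print("parameters match:", same)
```

Because only tensors are stored, the code that defines the architecture has to be available when loading.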

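How to set different learning rates for different layers (pass dicts to the Optimizer)

An Optimizer also supports per-parameter options. To use them, pass an iterable of dicts instead of an iterable of Variables. Each dict defines a separate parameter group and must contain a params key whose value is a list of parameters. The other keys should match the keyword arguments accepted by the optimizer and are used as the optimization options for that group.

Note:

You can still pass options as keyword arguments. They are used as defaults in the groups that do not override them. This is useful when you want to vary a single option while keeping all the others consistent across parameter groups.

For example, this is very useful when you want to specify per-layer learning rates: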

In [ ]:

optim.SGD([{'params': model.base.parameters()},
           {'params': model.classifier.parameters(), 'lr': 1e-3}
          ], lr=1e-2, momentum=0.9)

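This means that model.base's parameters use the default learning rate of 1e-2, model.classifier's parameters use a learning rate of 1e-3, and a momentum of 0.9 is used for all parameters.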
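How the options are resolved can be inspected through optimizer.param_groups. The sketch below builds a hypothetical TwoPartModel with base and classifier submodules (the layer sizes are arbitrary) and prints the learning rate and momentum of each group.

```python
import torch.nn as nn
import torch.optim as optim


class TwoPartModel(nn.Module):
    """Hypothetical model with a base/classifier split."""

    def __init__(self):
        super(TwoPartModel, self).__init__()
        self.base = nn.Linear(8, 4)
        self.classifier = nn.Linear(4, 2)

    def forward(self, x):
        return self.classifier(self.base(x))


model = TwoPartModel()
optimizer = optim.SGD([{'params': model.base.parameters()},
                       {'params': model.classifier.parameters(), 'lr': 1e-3}],
                      lr=1e-2, momentum=0.9)

# Each dict became one parameter group with its own resolved options.
for group in optimizer.param_groups:
    print(group['lr'], group['momentum'])
```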

How to freeze the learning of certain layers (TODO)
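One common approach, sketched here as an assumption rather than as this notebook's own method, is to set requires_grad to False on the parameters of the layers to freeze and to hand only the remaining parameters to the optimizer. The two-layer model is hypothetical, and the code assumes PyTorch 0.4 or later.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Hypothetical two-layer model; the first layer will be frozen.
model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))

for param in model[0].parameters():
    param.requires_grad = False          # no gradient is computed for these

# Hand only the trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = optim.SGD(trainable, lr=1e-2)

frozen_before = model[0].weight.clone()
loss = model(torch.randn(5, 8)).sum()
loss.backward()
optimizer.step()

print(torch.equal(frozen_before, model[0].weight))   # frozen layer unchanged
print(model[0].weight.grad is None)                   # and it received no gradient
```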

How to extract features from a given layer (TODO)
