
Python Machine Learning, 3rd Edition (Korean edition: 머신 러닝 교과서 3판)

Chapter 12 - Implementing a Multilayer Artificial Neural Network from Scratch

Use the links below to view this notebook in the Jupyter notebook viewer (nbviewer.jupyter.org) or run it on Google Colab (colab.research.google.com).

View in the Jupyter notebook viewer

Run on Google Colab

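Contents

Modeling complex functions with artificial neural networks
- Single-layer neural network recap
- Multilayer neural network architecture
- Computing activation outputs via forward propagation
Classifying handwritten digits
- Obtaining the MNIST dataset
- Implementing a multilayer perceptron
Training an artificial neural network
- Computing the logistic cost function
- Understanding the backpropagation algorithm
- Training neural networks via backpropagation
Convergence of neural networks
A few remarks on the neural network implementation
Summary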

In [1]:

from IPython.display import Image

Modeling complex functions with artificial neural networks

...

Single-layer neural network recap

In [2]:

Image(url='https://git.io/JLdrS', width=600)

Out[2]:

Multilayer neural network architecture

In [3]:

Image(url='https://git.io/JLdrx', width=600)

Out[3]:

In [4]:

Image(url='https://git.io/JLdrp', width=500)

Out[4]:

Computing activation outputs via forward propagation

In [5]:

Image(url='https://git.io/JLdoe', width=500)

Out[5]:
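As a minimal sketch of forward propagation through one hidden layer and one output layer, the block below mirrors the four steps used later in `NeuralNetMLP._forward` (net input, sigmoid activation, net input, sigmoid activation). The function names `sigmoid` and `forward` and all `*_demo` arrays are illustrative assumptions, not part of the notebook, and the layer sizes are arbitrary.

```python
import numpy as np


def sigmoid(z):
    """Logistic function; clipping the input avoids overflow in np.exp."""
    return 1.0 / (1.0 + np.exp(-np.clip(z, -250, 250)))


def forward(X, w_h, b_h, w_out, b_out):
    """Forward pass: input -> hidden (sigmoid) -> output (sigmoid)."""
    z_h = X @ w_h + b_h            # [n_samples, n_hidden]
    a_h = sigmoid(z_h)
    z_out = a_h @ w_out + b_out    # [n_samples, n_classes]
    a_out = sigmoid(z_out)
    return z_h, a_h, z_out, a_out


# Toy sizes (assumed for illustration): 4 samples, 3 features,
# 5 hidden units, 2 output units.
rng_demo = np.random.RandomState(1)
X_demo = rng_demo.normal(size=(4, 3))
w_h_demo = rng_demo.normal(scale=0.1, size=(3, 5))
b_h_demo = np.zeros(5)
w_out_demo = rng_demo.normal(scale=0.1, size=(5, 2))
b_out_demo = np.zeros(2)

_, a_h_demo, _, a_out_demo = forward(X_demo, w_h_demo, b_h_demo,
                                     w_out_demo, b_out_demo)
print(a_h_demo.shape, a_out_demo.shape)
```

Each row of `a_out_demo` holds one activation per output unit, so the predicted class for a sample is the column index of its largest value.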

To load the MNIST data with scikit-learn instead, uncomment the following code and run it.

In [6]:

"""
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split


X, y = fetch_openml('mnist_784', version=1, return_X_y=True)
y = y.astype(int)
X = ((X / 255.) - .5) * 2
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=10000, random_state=123, stratify=y)
"""


Classifying handwritten digits

...

Obtaining the MNIST dataset

The MNIST dataset is publicly available at http://yann.lecun.com/exdb/mnist/ and consists of the following four parts.

Training set images: train-images-idx3-ubyte.gz (9.9 MB, 47 MB unzipped, 60,000 examples)
Training set labels: train-labels-idx1-ubyte.gz (29 KB, 60 KB unzipped, 60,000 labels)
Test set images: t10k-images-idx3-ubyte.gz (1.6 MB, 7.8 MB unzipped, 10,000 examples)
Test set labels: t10k-labels-idx1-ubyte.gz (5 KB, 10 KB unzipped, 10,000 labels)

The code in this section loads both the training set and the test set, so download all four files.

Once the files are downloaded, the decompression cell below unpacks them.

In [7]:

# Run the following commands when using Colab.
!wget https://github.com/rickiepark/python-machine-learning-book-3rd-edition/raw/master/ch12/train-images-idx3-ubyte.gz
!wget https://github.com/rickiepark/python-machine-learning-book-3rd-edition/raw/master/ch12/train-labels-idx1-ubyte.gz
!wget https://github.com/rickiepark/python-machine-learning-book-3rd-edition/raw/master/ch12/t10k-images-idx3-ubyte.gz
!wget https://github.com/rickiepark/python-machine-learning-book-3rd-edition/raw/master/ch12/t10k-labels-idx1-ubyte.gz

--2023-11-10 05:26:56--  https://github.com/rickiepark/python-machine-learning-book-3rd-edition/raw/master/ch12/train-images-idx3-ubyte.gz
Resolving github.com (github.com)... 140.82.113.4
Connecting to github.com (github.com)|140.82.113.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/rickiepark/python-machine-learning-book-3rd-edition/master/ch12/train-images-idx3-ubyte.gz [following]
--2023-11-10 05:26:56--  https://raw.githubusercontent.com/rickiepark/python-machine-learning-book-3rd-edition/master/ch12/train-images-idx3-ubyte.gz
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9912422 (9.5M) [application/octet-stream]
Saving to: ‘train-images-idx3-ubyte.gz’

train-images-idx3-u 100%[===================>]   9.45M  --.-KB/s    in 0.1s    

2023-11-10 05:26:56 (83.3 MB/s) - ‘train-images-idx3-ubyte.gz’ saved [9912422/9912422]

--2023-11-10 05:26:56--  https://github.com/rickiepark/python-machine-learning-book-3rd-edition/raw/master/ch12/train-labels-idx1-ubyte.gz
Resolving github.com (github.com)... 140.82.112.4
Connecting to github.com (github.com)|140.82.112.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/rickiepark/python-machine-learning-book-3rd-edition/master/ch12/train-labels-idx1-ubyte.gz [following]
--2023-11-10 05:26:56--  https://raw.githubusercontent.com/rickiepark/python-machine-learning-book-3rd-edition/master/ch12/train-labels-idx1-ubyte.gz
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 28881 (28K) [application/octet-stream]
Saving to: ‘train-labels-idx1-ubyte.gz’

train-labels-idx1-u 100%[===================>]  28.20K  --.-KB/s    in 0.003s  

2023-11-10 05:26:56 (10.9 MB/s) - ‘train-labels-idx1-ubyte.gz’ saved [28881/28881]

--2023-11-10 05:26:57--  https://github.com/rickiepark/python-machine-learning-book-3rd-edition/raw/master/ch12/t10k-images-idx3-ubyte.gz
Resolving github.com (github.com)... 140.82.114.3
Connecting to github.com (github.com)|140.82.114.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/rickiepark/python-machine-learning-book-3rd-edition/master/ch12/t10k-images-idx3-ubyte.gz [following]
--2023-11-10 05:26:57--  https://raw.githubusercontent.com/rickiepark/python-machine-learning-book-3rd-edition/master/ch12/t10k-images-idx3-ubyte.gz
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1648877 (1.6M) [application/octet-stream]
Saving to: ‘t10k-images-idx3-ubyte.gz’

t10k-images-idx3-ub 100%[===================>]   1.57M  --.-KB/s    in 0.07s   

2023-11-10 05:26:57 (21.0 MB/s) - ‘t10k-images-idx3-ubyte.gz’ saved [1648877/1648877]

--2023-11-10 05:26:57--  https://github.com/rickiepark/python-machine-learning-book-3rd-edition/raw/master/ch12/t10k-labels-idx1-ubyte.gz
Resolving github.com (github.com)... 140.82.114.3
Connecting to github.com (github.com)|140.82.114.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/rickiepark/python-machine-learning-book-3rd-edition/master/ch12/t10k-labels-idx1-ubyte.gz [following]
--2023-11-10 05:26:57--  https://raw.githubusercontent.com/rickiepark/python-machine-learning-book-3rd-edition/master/ch12/t10k-labels-idx1-ubyte.gz
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4542 (4.4K) [application/octet-stream]
Saving to: ‘t10k-labels-idx1-ubyte.gz’

t10k-labels-idx1-ub 100%[===================>]   4.44K  --.-KB/s    in 0s      

2023-11-10 05:26:57 (55.3 MB/s) - ‘t10k-labels-idx1-ubyte.gz’ saved [4542/4542]

In [8]:

# Decompress the MNIST data files

import sys
import gzip
import shutil
import os

if (sys.version_info > (3, 0)):
    writemode = 'wb'
else:
    writemode = 'w'

zipped_mnist = [f for f in os.listdir() if f.endswith('ubyte.gz')]
for z in zipped_mnist:
    with gzip.GzipFile(z, mode='rb') as decompressed, open(z[:-3], writemode) as outfile:
        outfile.write(decompressed.read())

If the code cell above raises an error:

Decompress the files from a terminal with the Unix/Linux gzip command instead. For example, run the following in the MNIST download directory.

gzip -d *ubyte.gz

On Microsoft Windows, you can use your preferred decompression tool. The images are stored in byte format, so the function defined below reads them into NumPy arrays, which are then used to train the MLP model.

If you do not use gzip, make sure the resulting files are named as follows.

train-images-idx3-ubyte
train-labels-idx1-ubyte
t10k-images-idx3-ubyte
t10k-labels-idx1-ubyte

If decompression produces a name such as train-images.idx3-ubyte (some tools guess a file extension), rename the file to train-images-idx3-ubyte before running the code below.
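The renaming step can also be scripted. The following is a minimal sketch, assuming the only naming problem is a dot in place of the hyphen before `idx`; the helper `fix_mnist_filenames` and the list `EXPECTED_FILES` are hypothetical names introduced here, not part of the notebook.

```python
import os

# File names that the loading code expects (taken from the list above).
EXPECTED_FILES = ['train-images-idx3-ubyte', 'train-labels-idx1-ubyte',
                  't10k-images-idx3-ubyte', 't10k-labels-idx1-ubyte']


def fix_mnist_filenames(path='.'):
    """Rename 'name.idx3-ubyte' style files to 'name-idx3-ubyte'.

    Returns the list of expected files that are still missing afterwards.
    """
    for expected in EXPECTED_FILES:
        target = os.path.join(path, expected)
        # 'train-images-idx3-ubyte' -> 'train-images.idx3-ubyte'
        stem, suffix = expected.rsplit('-idx', 1)
        dotted = os.path.join(path, stem + '.idx' + suffix)
        if not os.path.exists(target) and os.path.exists(dotted):
            os.rename(dotted, target)
    return [f for f in EXPECTED_FILES
            if not os.path.exists(os.path.join(path, f))]


# An empty list means all four files are in place.
print(fix_mnist_filenames('.'))
```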

In [9]:

import os
import struct
import numpy as np

def load_mnist(path, kind='train'):
    """Load MNIST data from `path`"""
    labels_path = os.path.join(path,
                               '%s-labels-idx1-ubyte' % kind)
    images_path = os.path.join(path,
                               '%s-images-idx3-ubyte' % kind)

    with open(labels_path, 'rb') as lbpath:
        magic, n = struct.unpack('>II',
                                 lbpath.read(8))
        labels = np.fromfile(lbpath,
                             dtype=np.uint8)

    with open(images_path, 'rb') as imgpath:
        magic, num, rows, cols = struct.unpack(">IIII",
                                               imgpath.read(16))
        images = np.fromfile(imgpath,
                             dtype=np.uint8).reshape(len(labels), 784)
        images = ((images / 255.) - .5) * 2

    return images, labels
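To make the two key steps of `load_mnist` concrete (unpacking the big-endian header with `struct` and rescaling pixels from [0, 255] to [-1, 1]), here is a small sketch on a synthetic in-memory buffer rather than the real files. The buffer contents and all `*_demo` names are made up for illustration; 2051 is the magic number of MNIST image files.

```python
import struct

import numpy as np

# Synthetic IDX3 buffer: 16-byte header (magic, count, rows, cols)
# followed by two 2x2 "images" stored as unsigned bytes.
header_demo = struct.pack('>IIII', 2051, 2, 2, 2)
pixels_demo = np.array([0, 255, 128, 64, 255, 0, 32, 16], dtype=np.uint8)
raw_demo = header_demo + pixels_demo.tobytes()

# '>IIII' reads four big-endian unsigned 32-bit integers.
magic_demo, num_demo, rows_demo, cols_demo = struct.unpack('>IIII',
                                                           raw_demo[:16])

# Skip the header and flatten each image into one row.
images_demo = np.frombuffer(raw_demo, dtype=np.uint8, offset=16)
images_demo = images_demo.reshape(num_demo, rows_demo * cols_demo)

# Same scaling as load_mnist: 0 maps to -1 and 255 maps to 1.
scaled_demo = ((images_demo / 255.) - .5) * 2

print(magic_demo, num_demo, rows_demo, cols_demo)
print(scaled_demo.min(), scaled_demo.max())
```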

In [10]:

!ls

sample_data		   t10k-labels-idx1-ubyte     train-images-idx3-ubyte.gz
t10k-images-idx3-ubyte	   t10k-labels-idx1-ubyte.gz  train-labels-idx1-ubyte
t10k-images-idx3-ubyte.gz  train-images-idx3-ubyte    train-labels-idx1-ubyte.gz

In [11]:

X_train, y_train = load_mnist('', kind='train')
print('Rows: %d, columns: %d' % (X_train.shape[0], X_train.shape[1]))

Rows: 60000, columns: 784

In [12]:

X_test, y_test = load_mnist('', kind='t10k')
print('Rows: %d, columns: %d' % (X_test.shape[0], X_test.shape[1]))

Rows: 10000, columns: 784

Plot the first image of each class:

In [13]:

import matplotlib.pyplot as plt

fig, ax = plt.subplots(nrows=2, ncols=5, sharex=True, sharey=True)
ax = ax.flatten()
for i in range(10):
    img = X_train[y_train == i][0].reshape(28, 28)
    ax[i].imshow(img, cmap='Greys')

ax[0].set_xticks([])
ax[0].set_yticks([])
plt.tight_layout()
# plt.savefig('images/12_5.png', dpi=300)
plt.show()

Plot 25 samples of the digit 7:

In [14]:

fig, ax = plt.subplots(nrows=5, ncols=5, sharex=True, sharey=True)
ax = ax.flatten()
for i in range(25):
    img = X_train[y_train == 7][i].reshape(28, 28)
    ax[i].imshow(img, cmap='Greys')

ax[0].set_xticks([])
ax[0].set_yticks([])
plt.tight_layout()
# plt.savefig('images/12_6.png', dpi=300)
plt.show()

In [15]:

import numpy as np

np.savez_compressed('mnist_scaled.npz',
                    X_train=X_train,
                    y_train=y_train,
                    X_test=X_test,
                    y_test=y_test)

In [16]:

mnist = np.load('mnist_scaled.npz')
mnist.files

Out[16]:

['X_train', 'y_train', 'X_test', 'y_test']

In [17]:

X_train, y_train, X_test, y_test = [mnist[f] for f in ['X_train', 'y_train',
                                    'X_test', 'y_test']]

del mnist

X_train.shape

Out[17]:

(60000, 784)

Implementing a multilayer perceptron
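Before training, the `fit` method of the `NeuralNetMLP` class in the next cell converts integer class labels into one-hot rows with its `_onehot` method. The sketch below reproduces that loop as a standalone function and compares it with a vectorized alternative that indexes an identity matrix. The names `onehot_loop`, `onehot_vectorized`, and `y_demo` are illustrative and not part of the notebook, and the `np.eye` variant is an alternative, not the notebook's method.

```python
import numpy as np


def onehot_loop(y, n_classes):
    """Same logic as NeuralNetMLP._onehot, written as a plain function."""
    onehot = np.zeros((n_classes, y.shape[0]))
    for idx, val in enumerate(y.astype(int)):
        onehot[val, idx] = 1.
    return onehot.T


def onehot_vectorized(y, n_classes):
    """Pick row y[i] of the identity matrix for every sample i."""
    return np.eye(n_classes)[y.astype(int)]


y_demo = np.array([0, 2, 1, 2])
print(onehot_vectorized(y_demo, 3))

# Both versions produce an array of shape (n_samples, n_classes).
assert np.array_equal(onehot_loop(y_demo, 3), onehot_vectorized(y_demo, 3))
```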

In [18]:

import numpy as np
import sys


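class NeuralNetMLP(object):
    """Feedforward neural network / multilayer perceptron classifier.

    Parameters
    ------------
    n_hidden : int (default: 30)
        Number of hidden units.
    l2 : float (default: 0.)
        Lambda value for L2 regularization.
        No regularization if l2=0. (default)
    epochs : int (default: 100)
        Number of passes over the training set.
    eta : float (default: 0.001)
        Learning rate.
    shuffle : bool (default: True)
        Whether to shuffle the training set every epoch.
        If True, the order of the training data is shuffled.
    minibatch_size : int (default: 1)
        Number of training examples per mini-batch.
    seed : int (default: None)
        Random seed for initializing weights and shuffling.

    Attributes
    -----------
    eval_ : dict
      Dictionary collecting the cost, training accuracy,
      and validation accuracy for each epoch during training.

    """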
    def __init__(self, n_hidden=30,
                 l2=0., epochs=100, eta=0.001,
                 shuffle=True, minibatch_size=1, seed=None):

        self.random = np.random.RandomState(seed)
        self.n_hidden = n_hidden
        self.l2 = l2
        self.epochs = epochs
        self.eta = eta
        self.shuffle = shuffle
        self.minibatch_size = minibatch_size

    def _onehot(self, y, n_classes):
        """Encode labels into a one-hot representation

        Parameters
        ------------
        y : array, shape = [n_samples]
            Target values.
        n_classes : int
            Number of classes

        Returns
        -----------
        onehot : array, shape = (n_samples, n_labels)

        """
        onehot = np.zeros((n_classes, y.shape[0]))
        for idx, val in enumerate(y.astype(int)):
            onehot[val, idx] = 1.
        return onehot.T

    def _sigmoid(self, z):
        """Compute the logistic function (sigmoid)"""
        return 1. / (1. + np.exp(-np.clip(z, -250, 250)))

    def _forward(self, X):
        """Compute the forward propagation step"""

        # Step 1: net input of the hidden layer
        # [n_samples, n_features] dot [n_features, n_hidden]
        # -> [n_samples, n_hidden]
        z_h = np.dot(X, self.w_h) + self.b_h

        # Step 2: activation of the hidden layer
        a_h = self._sigmoid(z_h)

        # Step 3: net input of the output layer
        # [n_samples, n_hidden] dot [n_hidden, n_classlabels]
        # -> [n_samples, n_classlabels]
        z_out = np.dot(a_h, self.w_out) + self.b_out

        # Step 4: activation of the output layer
        a_out = self._sigmoid(z_out)

        return z_h, a_h, z_out, a_out

    def _compute_cost(self, y_enc, output):
        """Compute the cost function

        Parameters
        ----------
        y_enc : array, shape = (n_samples, n_labels)
            One-hot encoded class labels
        output : array, shape = [n_samples, n_output_units]
            Activation of the output layer (forward propagation)

        Returns
        ---------
        cost : float
            Regularized cost

        """
        L2_term = (self.l2 *
                   (np.sum(self.w_h ** 2.) +
                    np.sum(self.w_out ** 2.)))

        term1 = -y_enc * (np.log(output))
        term2 = (1. - y_enc) * np.log(1. - output)
        cost = np.sum(term1 - term2) + L2_term

        # On other datasets, activations can become extreme (very close to 0 or 1).
        # The expressions above then evaluate log(0): NumPy emits a RuntimeWarning
        # and returns -inf, so the cost can become inf or nan.
        # To avoid this, add a small constant to the activation values
        # passed to the log function.
        #
        # For example:
        #
        # term1 = -y_enc * (np.log(output + 1e-5))
        # term2 = (1. - y_enc) * np.log(1. - output + 1e-5)

        return cost

    def predict(self, X):
        """Predict class labels

        Parameters
        -----------
        X : array, shape = [n_samples, n_features]
            Input layer with original features

        Returns
        ----------
        y_pred : array, shape = [n_samples]
            Predicted class labels

        """
        z_h, a_h, z_out, a_out = self._forward(X)
        y_pred = np.argmax(z_out, axis=1)
        return y_pred

    def fit(self, X_train, y_train, X_valid, y_valid):
        """Learn weights from the training data

        Parameters
        -----------
        X_train : array, shape = [n_samples, n_features]
            Input layer with original features
        y_train : array, shape = [n_samples]
            Target class labels
        X_valid : array, shape = [n_samples, n_features]
            Sample features used for validation during training
        y_valid : array, shape = [n_samples]
            Sample labels used for validation during training

        Returns
        ----------
        self

        """
        n_output = np.unique(y_train).shape[0]  # number of class labels
        n_features = X_train.shape[1]

        ########################
        # Weight initialization
        ########################

        # Weights for input layer -> hidden layer
        self.b_h = np.zeros(self.n_hidden)
        self.w_h = self.random.normal(loc=0.0, scale=0.1,
                                      size=(n_features, self.n_hidden))

        # Weights for hidden layer -> output layer
        self.b_out = np.zeros(n_output)
        self.w_out = self.random.normal(loc=0.0, scale=0.1,
                                        size=(self.n_hidden, n_output))

        epoch_strlen = len(str(self.epochs))  # for progress formatting
        self.eval_ = {'cost': [], 'train_acc': [], 'valid_acc': []}

        y_train_enc = self._onehot(y_train, n_output)

        # Iterate over training epochs
        for i in range(self.epochs):

            # Iterate over mini-batches
            indices = np.arange(X_train.shape[0])

            if self.shuffle:
                self.random.shuffle(indices)

            for start_idx in range(0, indices.shape[0] - self.minibatch_size +
                                   1, self.minibatch_size):
                batch_idx = indices[start_idx:start_idx + self.minibatch_size]

                # Forward propagation
                z_h, a_h, z_out, a_out = self._forward(X_train[batch_idx])

                ##################
                # Backpropagation
                ##################

                # [n_examples, n_classlabels]
                delta_out = a_out - y_train_enc[batch_idx]

                # [n_examples, n_hidden]
                sigmoid_derivative_h = a_h * (1. - a_h)

                # [n_examples, n_classlabels] dot [n_classlabels, n_hidden]
                # -> [n_examples, n_hidden]
                delta_h = (np.dot(delta_out, self.w_out.T) *
                           sigmoid_derivative_h)

                # [n_features, n_examples] dot [n_examples, n_hidden]
                # -> [n_features, n_hidden]
                grad_w_h = np.dot(X_train[batch_idx].T, delta_h)
                grad_b_h = np.sum(delta_h, axis=0)

                # [n_hidden, n_examples] dot [n_examples, n_classlabels]
                # -> [n_hidden, n_classlabels]
                grad_w_out = np.dot(a_h.T, delta_out)
                grad_b_out = np.sum(delta_out, axis=0)

                # Regularization and weight updates
                delta_w_h = (grad_w_h + self.l2*self.w_h)
                delta_b_h = grad_b_h  # bias is not regularized
                self.w_h -= self.eta * delta_w_h
                self.b_h -= self.eta * delta_b_h

                delta_w_out = (grad_w_out + self.l2*self.w_out)
                delta_b_out = grad_b_out  # bias is not regularized
                self.w_out -= self.eta * delta_w_out
                self.b_out -= self.eta * delta_b_out

            #############
            # Evaluation
            #############

            # Evaluate after each epoch during training
            z_h, a_h, z_out, a_out = self._forward(X_train)

            cost = self._compute_cost(y_enc=y_train_enc,
                                      output=a_out)

            y_train_pred = self.predict(X_train)
            y_valid_pred = self.predict(X_valid)

            # `np.float` is deprecated as of NumPy 1.20, so use the built-in `float` instead.
            train_acc = ((np.sum(y_train == y_train_pred)).astype(float) /
                         X_train.shape[0])
            valid_acc = ((np.sum(y_valid == y_valid_pred)).astype(float) /
                         X_valid.shape[0])

            sys.stderr.write('\r%0*d/%d | Cost: %.2f '
                             '| Train/Valid Acc.: %.2f%%/%.2f%% ' %
                             (epoch_strlen, i+1, self.epochs, cost,
                              train_acc*100, valid_acc*100))
            sys.stderr.flush()

            self.eval_['cost'].append(cost)
            self.eval_['train_acc'].append(train_acc)
            self.eval_['valid_acc'].append(valid_acc)

        return self

In [19]:

n_epochs = 200

In [20]:

nn = NeuralNetMLP(n_hidden=100,
                  l2=0.01,
                  epochs=n_epochs,
                  eta=0.0005,
                  minibatch_size=100,
                  shuffle=True,
                  seed=1)

nn.fit(X_train=X_train[:55000],
       y_train=y_train[:55000],
       X_valid=X_train[55000:],
       y_valid=y_train[55000:])

200/200 | Cost: 5065.78 | Train/Valid Acc.: 99.28%/97.98%

Out[20]:

<__main__.NeuralNetMLP at 0x793aad8f1300>

In [21]:

import matplotlib.pyplot as plt

plt.plot(range(nn.epochs), nn.eval_['cost'])
plt.ylabel('Cost')
plt.xlabel('Epochs')
# plt.savefig('images/12_07.png', dpi=300)
plt.show()

In [22]:

plt.plot(range(nn.epochs), nn.eval_['train_acc'],
         label='Training')
plt.plot(range(nn.epochs), nn.eval_['valid_acc'],
         label='Validation', linestyle='--')
plt.ylabel('Accuracy')
plt.xlabel('Epochs')
plt.legend(loc='lower right')
# plt.savefig('images/12_08.png', dpi=300)
plt.show()

In [23]:

y_test_pred = nn.predict(X_test)
# `np.float` is deprecated as of NumPy 1.20, so use the built-in `float` instead.
acc = (np.sum(y_test == y_test_pred)
       .astype(float) / X_test.shape[0])

print('Test accuracy: %.2f%%' % (acc * 100))

Test accuracy: 97.54%

In [24]:

miscl_img = X_test[y_test != y_test_pred][:25]
correct_lab = y_test[y_test != y_test_pred][:25]
miscl_lab = y_test_pred[y_test != y_test_pred][:25]

fig, ax = plt.subplots(nrows=5, ncols=5, sharex=True, sharey=True)
ax = ax.flatten()
for i in range(25):
    img = miscl_img[i].reshape(28, 28)
    ax[i].imshow(img, cmap='Greys', interpolation='nearest')
    ax[i].set_title('%d) t: %d p: %d' % (i+1, correct_lab[i], miscl_lab[i]))

ax[0].set_xticks([])
ax[0].set_yticks([])
plt.tight_layout()
# plt.savefig('images/12_09.png', dpi=300)
plt.show()

Training an artificial neural network

...

Computing the logistic cost function

In [25]:

Image(url='https://git.io/JLdov', width=300)

Out[25]:
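As a small worked example, the sketch below evaluates the unregularized part of the cost that `_compute_cost` computes, summed over samples and output units. It clips the activations away from 0 and 1 before taking the log, which is one possible guard against `log(0)` and an assumption of this sketch (the class itself does not clip). The function `logistic_cost`, the `eps` parameter, and the demo arrays are illustrative names.

```python
import numpy as np


def logistic_cost(y_enc, output, eps=1e-12):
    """Logistic cost summed over samples and output units (no L2 term)."""
    # Keep activations inside (0, 1) so that np.log never sees 0.
    output = np.clip(output, eps, 1. - eps)
    term1 = -y_enc * np.log(output)
    term2 = (1. - y_enc) * np.log(1. - output)
    return np.sum(term1 - term2)


# Two samples, two classes, one-hot targets.
y_enc_demo = np.array([[1., 0.],
                       [0., 1.]])
# Reasonable activations: cost is -2*log(0.9) - 2*log(0.8).
a_good_demo = np.array([[0.9, 0.1],
                        [0.2, 0.8]])
# Fully saturated activations would give nan without clipping.
a_saturated_demo = np.array([[1., 0.],
                             [0., 1.]])

cost_good = logistic_cost(y_enc_demo, a_good_demo)
cost_saturated = logistic_cost(y_enc_demo, a_saturated_demo)
print(cost_good, cost_saturated)
```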

Understanding the backpropagation algorithm

...

Training neural networks via backpropagation

In [26]:

Image(url='https://git.io/JLdoa', width=400)

Out[26]:

In [27]:

Image(url='https://git.io/JLdoz', width=500)

Out[27]:
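One way to check backpropagation equations is to compare them with finite-difference gradients. The sketch below does this for a tiny network, using the same gradient expressions as the `fit` method (`delta_out = a_out - y`, then the error propagated through the sigmoid derivative) but without the L2 term. All function names, the network sizes, and the `*_gc` variables are assumptions made for this illustration.

```python
import numpy as np


def sigmoid(z):
    return 1. / (1. + np.exp(-z))


def cost_fn(X, Y, w_h, b_h, w_out, b_out):
    """Unregularized logistic cost of a one-hidden-layer network."""
    a_h = sigmoid(X @ w_h + b_h)
    a_out = sigmoid(a_h @ w_out + b_out)
    return -np.sum(Y * np.log(a_out) + (1. - Y) * np.log(1. - a_out))


def backprop(X, Y, w_h, b_h, w_out, b_out):
    """Analytic gradients with respect to w_h and w_out."""
    a_h = sigmoid(X @ w_h + b_h)
    a_out = sigmoid(a_h @ w_out + b_out)
    delta_out = a_out - Y                                # [n, n_classes]
    delta_h = (delta_out @ w_out.T) * a_h * (1. - a_h)   # [n, n_hidden]
    return X.T @ delta_h, a_h.T @ delta_out


def numerical_grad(f, w, eps=1e-5):
    """Central differences; w is perturbed in place and then restored."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        old = w[idx]
        w[idx] = old + eps
        plus = f()
        w[idx] = old - eps
        minus = f()
        w[idx] = old
        grad[idx] = (plus - minus) / (2. * eps)
    return grad


rng_gc = np.random.RandomState(0)
X_gc = rng_gc.normal(size=(5, 4))
Y_gc = np.eye(3)[rng_gc.randint(0, 3, size=5)]   # one-hot targets
w_h_gc, b_h_gc = rng_gc.normal(scale=0.1, size=(4, 6)), np.zeros(6)
w_out_gc, b_out_gc = rng_gc.normal(scale=0.1, size=(6, 3)), np.zeros(3)

grad_w_h_gc, grad_w_out_gc = backprop(X_gc, Y_gc, w_h_gc, b_h_gc,
                                      w_out_gc, b_out_gc)
f_gc = lambda: cost_fn(X_gc, Y_gc, w_h_gc, b_h_gc, w_out_gc, b_out_gc)
max_err_h = np.max(np.abs(grad_w_h_gc - numerical_grad(f_gc, w_h_gc)))
max_err_out = np.max(np.abs(grad_w_out_gc - numerical_grad(f_gc, w_out_gc)))
print(max_err_h, max_err_out)
```

Both maximum errors should be tiny compared with the gradient values; a large discrepancy would point to a mistake in the analytic expressions.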

Convergence of neural networks

In [28]:

Image(url='https://git.io/JLdoK', width=500)

Out[28]:
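The `fit` method defined earlier updates the weights once per mini-batch rather than once per epoch. The sketch below isolates its index-slicing logic to show how many updates an epoch performs and that leftover samples that do not fill a whole batch are skipped. The generator `minibatch_indices` is a hypothetical helper written for this illustration.

```python
import numpy as np


def minibatch_indices(n_samples, minibatch_size, rng=None):
    """Yield index arrays the same way the loop in fit slices them."""
    indices = np.arange(n_samples)
    if rng is not None:
        rng.shuffle(indices)   # in-place shuffle, as in fit
    for start in range(0, n_samples - minibatch_size + 1, minibatch_size):
        yield indices[start:start + minibatch_size]


# 10 samples with batch size 4: two full batches, two samples skipped.
batches_demo = list(minibatch_indices(10, 4, np.random.RandomState(1)))
print(len(batches_demo), [len(b) for b in batches_demo])

# Settings used in the notebook: 55,000 training samples, batch size 100.
n_updates_demo = len(list(minibatch_indices(55000, 100)))
print(n_updates_demo)
```

With a shuffled index array, the skipped samples differ from epoch to epoch, so every sample is still used over the course of training.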

...

Summary

...