6 Parallel Processing

An introduction to using GPUs with TensorFlow.

6.1 GPU Execution Environment

Requires CUDA Toolkit 7.0 and cuDNN 6.5 v2.

How to refer to devices in TensorFlow

"/cpu:0": the CPU of the server.
"/gpu:0": the first GPU of the server.
"/gpu:1": the second GPU of the server, and so on for 2, 3, 4, ...

In [1]:

import tensorflow as tf
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2,3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3,2], name='b')
c = tf.matmul(a, b)

sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(c))

[[ 22.  28.]
 [ 49.  64.]]

Code explanation

log_device_placement

If True, logs which device each operation was assigned to.

In [ ]:

with tf.device('/cpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2,3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3,2], name='b')
    c = tf.matmul(a, b)

sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(c))

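Code explanation

with tf.device('/cpu:0'):

Forces the enclosed operations to run on a specific device.

Every operation inside the with context runs on the same device.

The cell above pins the operations for a, b, and c to the CPU. With tf.device('/gpu:2') they would run on the third GPU instead.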
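Device indices are zero-based, which is easy to get wrong when counting GPUs. The following is a minimal plain-Python sketch; `device_names` is a hypothetical helper, not part of TensorFlow, and only builds the name strings described in section 6.1.

```python
def device_names(num_gpus):
    """Return the CPU name followed by one name per GPU, numbered from zero."""
    return ["/cpu:0"] + ["/gpu:%d" % i for i in range(num_gpus)]

names = device_names(3)
print(names)      # ['/cpu:0', '/gpu:0', '/gpu:1', '/gpu:2']
print(names[3])   # '/gpu:2' is the third GPU
```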

6.2 Parallel Processing on Multiple GPUs

A model can be built so that its work is spread across several GPUs.

In [ ]:

c = []
for d in ['/gpu:2', '/gpu:3']:
    # Place one matrix multiplication on each GPU in turn.
    with tf.device(d):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2,3], name='a')
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3,2], name='b')
        c.append(tf.matmul(a, b))
# Sum the per-GPU results on the CPU, once, after the loop.
with tf.device('/cpu:0'):
    sum = tf.add_n(c)

sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(sum))

Code explanation

tf.device assigns one multiplication to each of the two GPUs.

The CPU then adds the results together.
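The same pattern (compute partial results on separate workers, then combine them in one place) can be tried without any GPU. The sketch below is a plain-Python analogy, not TensorFlow code: two threads stand in for the two GPUs, and the helpers `matmul` and `add_n` are hypothetical stand-ins written for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]

def add_n(matrices):
    """Element-wise sum of same-shaped matrices, in the spirit of tf.add_n."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*matrices)]

a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]      # shape [2, 3]
b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]    # shape [3, 2]

# Two workers stand in for the two GPUs; each computes one product.
with ThreadPoolExecutor(max_workers=2) as pool:
    partials = list(pool.map(lambda _: matmul(a, b), range(2)))

# The final sum plays the role of the CPU-side tf.add_n.
total = add_n(partials)
print(total)  # [[44.0, 56.0], [98.0, 128.0]]
```

Each partial product equals the [[22, 28], [49, 64]] result from section 6.1, so the combined value is twice that matrix.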

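6.3 GPU Code Example

With n set to 10, this example computes Aⁿ + Bⁿ on one GPU and on two GPUs,

and compares the elapsed times using the datetime package.

Running this example requires a system with at least two GPUs.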

In [ ]:

import numpy as np
import tensorflow as tf
import datetime

Load the required libraries.

In [ ]:

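# np.random.rand needs integer dimensions, so use 10000 rather than the float 1e4.
A = np.random.rand(10000, 10000).astype('float32')
B = np.random.rand(10000, 10000).astype('float32')

Create two matrices filled with random values.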
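It helps to know how large these matrices are before running the example. The following back-of-the-envelope sketch assumes dense 10000 x 10000 float32 matrices and the usual estimate of about 2·N³ floating-point operations for one dense matrix product. The constant names are chosen here for illustration.

```python
N = 10_000               # side length of each square matrix
BYTES_PER_FLOAT32 = 4

# Memory needed to hold one matrix.
matrix_bytes = N * N * BYTES_PER_FLOAT32
print(matrix_bytes / 2**20)   # size in MiB, roughly 381.5

# Approximate floating-point operations for one N x N matrix product.
flops_per_matmul = 2 * N**3
print(flops_per_matmul)       # 2000000000000
```

If memory is tight, a smaller side length keeps the example structurally the same while reducing both numbers.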

In [ ]:

n = 10

Set n to 10. The larger this value, the longer the computation takes.

In [ ]:

c1 = []
c2 = []

Create two lists to hold the results.

In [ ]:

def matpow(M, n):
    # Base case: M to the first power is M itself.
    if n <= 1:
        return M
    else:
        return tf.matmul(M, matpow(M, n-1))

Define the matpow() function, which computes a matrix power.
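The base case of the recursion decides which power is actually computed. With `n <= 1` the recursion performs n - 1 multiplications and returns M to the n-th power. A base case of `n < 1` would perform one more multiplication and return M to the (n+1)-th power. The sketch below checks this on a small matrix in plain Python, with a hypothetical `matmul` helper standing in for `tf.matmul`.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows (stand-in for tf.matmul)."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]

def matpow(M, n):
    # Same recursive structure, with the base case at n <= 1.
    if n <= 1:
        return M
    return matmul(M, matpow(M, n - 1))

# For this matrix, the n-th power is [[1, n], [0, 1]], which is easy to check.
M = [[1, 1], [0, 1]]
print(matpow(M, 1))    # [[1, 1], [0, 1]]
print(matpow(M, 10))   # [[1, 10], [0, 1]]
```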

In [ ]:

with tf.device('/gpu:0'):
    a = tf.constant(A)
    b = tf.constant(B)
    c1.append(matpow(a, n))
    c1.append(matpow(b, n))

with tf.device('/cpu:0'):
    sum = tf.add_n(c1)

t1_1 = datetime.datetime.now()

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    sess.run(sum)
    
t2_1 = datetime.datetime.now()

Code that runs the computation on a single GPU.

In [ ]:

with tf.device('/gpu:0'):
    a = tf.constant(A)
    c2.append(matpow(a, n))

with tf.device('/gpu:1'):    
    b = tf.constant(B)
    c2.append(matpow(b, n))    

with tf.device('/cpu:0'):
    sum = tf.add_n(c2)

t1_2 = datetime.datetime.now()

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    sess.run(sum)
    
t2_2 = datetime.datetime.now()

Code that runs the computation on two GPUs.

In [ ]:

print("Single GPU computation time: " + str(t2_1 - t1_1))
print("Multi GPU computation time: " + str(t2_2 - t1_2))

Print the time each computation took.
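Subtracting two datetime values gives a timedelta, which can be converted to seconds to express the comparison as a single ratio. The following is a minimal sketch. `speedup` is a hypothetical helper, and the timestamps are made up for illustration, not measured results.

```python
import datetime

def speedup(t1_single, t2_single, t1_multi, t2_multi):
    """Return how many times faster the multi-GPU run was than the single-GPU run."""
    single = (t2_single - t1_single).total_seconds()
    multi = (t2_multi - t1_multi).total_seconds()
    return single / multi

# Made-up timestamps: a 30-second run compared with a 20-second run.
start = datetime.datetime(2016, 1, 1, 12, 0, 0)
ratio = speedup(start, start + datetime.timedelta(seconds=30),
                start, start + datetime.timedelta(seconds=20))
print(ratio)  # 1.5
```

With the variables from the cells above, the call would be `speedup(t1_1, t2_1, t1_2, t2_2)`.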

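6.4 Distributed Version of TensorFlow

The distributed runtime of TensorFlow:

https://github.com/tensorflow/tensorflow/tree/master/tensorflow/core/distributed_runtime

It uses gRPC, a high-performance open-source RPC framework, for communication between processes.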
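In distributed TensorFlow, a cluster is described as a mapping from job names to lists of host:port addresses, one per task, and `tf.train.ClusterSpec` accepts a dictionary of this shape. The sketch below shows only that description in plain Python. The host names, ports, and the `task_address` helper are hypothetical and chosen for illustration.

```python
# Hypothetical hosts and ports; replace them with real addresses.
cluster = {
    "ps": ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
}

def task_address(cluster, job_name, task_index):
    """Return the host:port string for one task in the cluster description."""
    return cluster[job_name][task_index]

print(task_address(cluster, "worker", 1))  # worker1.example.com:2222
```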

Example source: https://github.com/jorditorresBCN/FirstContactWithTensorFlow/blob/master/MultiGPU.py