
How to simply use tf.data

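An example of how to use the tf.data package. The example data is generated with NumPy: X plays the role of the data and y the role of the target, and both are fed through the modules and functions of tf.data. The scenario assumes that the model is evaluated on the validation data at the end of every epoch.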

Template

An example of using the functions and methods below as the data pipeline when a model is trained inside a for loop.

Dataset class
- Create instances of the Dataset class with tf.data.Dataset.from_tensor_slices
  - tr_data, the Dataset instance for the train data
  - val_data, the Dataset instance for the validation data
- Use the following methods to set what training requires
  - the shuffle method of the instance, for shuffling
  - the batch method of the instance, to set the batch size
  - the repeat method is not used, because the for loop controls the number of epochs
Iterator class
- Create instances of the Iterator class from the Dataset instances with the make_initializable_iterator method
  - tr_iterator, the Iterator instance for the train data
  - val_iterator, the Iterator instance for the validation data
  - Caution: when the Iterator instance is created with make_initializable_iterator, do not fix random_seed
    - With a fixed random_seed, the mini-batch at each step is composed of exactly the same examples in every epoch
- Create an anonymous iterator with tf.data.Iterator.from_string_handle
  - Pass a tf.placeholder as the string_handle argument
    - The value fed to it selects whether tr_iterator or val_iterator is used
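
The seed caveat in the list above can be illustrated without TensorFlow. The following is a NumPy-only analogy, not tf.data itself: `epoch_batches` is a hypothetical helper, and reseeding at the start of each epoch stands in for re-initializing an iterator whose shuffle seed is fixed.

```python
import numpy as np

def epoch_batches(n_samples, batch_size, seed=None):
    """Return one epoch of shuffled index mini-batches (hypothetical helper)."""
    rng = np.random.RandomState(seed)  # seed=None draws fresh entropy on every call
    order = rng.permutation(n_samples)
    return [order[i:i + batch_size].tolist() for i in range(0, n_samples, batch_size)]

# Fixed seed: every epoch replays exactly the same mini-batches.
fixed_epochs = [epoch_batches(8, 2, seed=0) for _ in range(3)]
print(fixed_epochs[0] == fixed_epochs[1] == fixed_epochs[2])

# No seed: each epoch is shuffled independently, so the batches generally differ.
free_epochs = [epoch_batches(8, 2) for _ in range(3)]
print(free_epochs)
```

With the fixed seed, step k of every epoch sees the same examples, which is the behaviour the caution warns against.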

Setup

In [1]:

from __future__ import absolute_import, division, print_function
import numpy as np
import tensorflow as tf

print(tf.__version__)

1.12.0

In [2]:

# Create an arbitrary dataset with 12 examples in total
X = np.c_[np.arange(12), np.arange(12)]
y = np.arange(12)

print(X.shape, y.shape)

(12, 2) (12,)

In [3]:

# Split the data above into train and validation sets
X_tr = X[:8]
y_tr = y[:8]

X_val = X[8:]
y_val = y[8:]

print(X_tr.shape, y_tr.shape)
print(X_val.shape, y_val.shape)

(8, 2) (8,)
(4, 2) (4,)

Template

In [4]:

n_epoch = 3
batch_size = 2
total_steps = X_tr.shape[0] // batch_size
print('epoch : {}, batch_size : {}, total_steps : {}'.format(n_epoch, batch_size, total_steps))

epoch : 3, batch_size : 2, total_steps : 4
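
Here 8 examples divide evenly by the batch size of 2, so the step count is exact. When the sizes do not divide evenly, the number of steps depends on whether the smaller final batch is kept. A minimal sketch, assuming the default `drop_remainder=False` of the batch method, under which the final partial batch is kept; `steps_per_epoch` is a hypothetical helper name, not part of tf.data.

```python
import math

def steps_per_epoch(n_samples, batch_size, drop_remainder=False):
    """Number of mini-batches in one pass over n_samples (hypothetical helper)."""
    if drop_remainder:
        return n_samples // batch_size  # partial final batch is discarded
    return int(math.ceil(n_samples / float(batch_size)))  # partial final batch is kept

print(steps_per_epoch(8, 2))                       # 8 examples, batch size 2
print(steps_per_epoch(9, 2))                       # four full batches plus one of size 1
print(steps_per_epoch(9, 2, drop_remainder=True))  # the batch of size 1 is dropped
```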

In [5]:

tr_data = tf.data.Dataset.from_tensor_slices((X_tr, y_tr)) # the 0th dimensions of X_tr and y_tr must have the same size
tr_data = tr_data.shuffle(buffer_size = 30)
tr_data = tr_data.batch(batch_size = batch_size)

val_data = tf.data.Dataset.from_tensor_slices((X_val, y_val))
val_data = val_data.batch(batch_size = batch_size)

print(tr_data)
print(val_data)

<BatchDataset shapes: ((?, 2), (?,)), types: (tf.int64, tf.int64)>
<BatchDataset shapes: ((?, 2), (?,)), types: (tf.int64, tf.int64)>
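
The `?` in the printed shapes is the batch dimension, which is left unknown because the final batch may be smaller than batch_size. What the two pipelines yield can be sketched with plain NumPy. This is an analogy only, not how tf.data is implemented; `slice_and_batch`, `X_demo` and `y_demo` are hypothetical names.

```python
import numpy as np

def slice_and_batch(features, targets, batch_size):
    """Yield (features, targets) mini-batches cut along axis 0 (hypothetical helper)."""
    # The i-th rows are paired up, so the 0th dimensions must match.
    assert features.shape[0] == targets.shape[0]
    for start in range(0, features.shape[0], batch_size):
        yield features[start:start + batch_size], targets[start:start + batch_size]

X_demo = np.c_[np.arange(4), np.arange(4)]  # shape (4, 2)
y_demo = np.arange(4)                        # shape (4,)
batches = list(slice_and_batch(X_demo, y_demo, batch_size=2))
for X_b, y_b in batches:
    print(X_b.shape, y_b.shape)
```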

In [6]:

tr_iterator = tr_data.make_initializable_iterator()
val_iterator = val_data.make_initializable_iterator()

handle = tf.placeholder(dtype = tf.string)

In [7]:

iterator = tf.data.Iterator.from_string_handle(string_handle = handle,
                                               output_shapes = tr_iterator.output_shapes,
                                               output_types = tr_iterator.output_types)
X_mb, y_mb = iterator.get_next()

In [8]:

# The n_tr_step and n_val_step variables are only there to check the number of steps
sess_config = tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True))
sess = tf.Session(config = sess_config)
tr_handle, val_handle = sess.run([tr_iterator.string_handle(), val_iterator.string_handle()])

for epoch in range(n_epoch):
    
    print('epoch : {} training start'.format(epoch + 1))
    sess.run(tr_iterator.initializer) # re-initialize tr_iterator at the start of every epoch
    n_tr_step = 0
    
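    while True:
        try:
            X_tmp, y_tmp = sess.run([X_mb, y_mb], feed_dict = {handle : tr_handle})
            n_tr_step += 1 # count a step only after a batch was actually fetched
            print('step : {}'.format(n_tr_step))
            print(X_tmp, y_tmp)
        
        except tf.errors.OutOfRangeError: # raised once the iterator is exhausted
            print('epoch : {} training finished'.format(epoch + 1))
            break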

    print('at epoch : {}, validation start'.format(epoch + 1))        
    sess.run(val_iterator.initializer)
    n_val_step = 0
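    while True:
        try:
            X_tmp, y_tmp = sess.run([X_mb, y_mb], feed_dict = {handle : val_handle})
            n_val_step += 1 # count a step only after a batch was actually fetched
            
            print('step : {}'.format(n_val_step))
            print(X_tmp, y_tmp)
        except tf.errors.OutOfRangeError: # raised once the iterator is exhausted
            print('validation finished')
            break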

epoch : 1 training start
step : 1
[[0 0]
 [1 1]] [0 1]
step : 2
[[4 4]
 [3 3]] [4 3]
step : 3
[[5 5]
 [2 2]] [5 2]
step : 4
[[7 7]
 [6 6]] [7 6]
epoch : 1 training finished
at epoch : 1, validation start
step : 1
[[8 8]
 [9 9]] [8 9]
step : 2
[[10 10]
 [11 11]] [10 11]
validation finished
epoch : 2 training start
step : 1
[[3 3]
 [5 5]] [3 5]
step : 2
[[4 4]
 [0 0]] [4 0]
step : 3
[[1 1]
 [2 2]] [1 2]
step : 4
[[7 7]
 [6 6]] [7 6]
epoch : 2 training finished
at epoch : 2, validation start
step : 1
[[8 8]
 [9 9]] [8 9]
step : 2
[[10 10]
 [11 11]] [10 11]
validation finished
epoch : 3 training start
step : 1
[[4 4]
 [3 3]] [4 3]
step : 2
[[6 6]
 [1 1]] [6 1]
step : 3
[[0 0]
 [2 2]] [0 2]
step : 4
[[7 7]
 [5 5]] [7 5]
epoch : 3 training finished
at epoch : 3, validation start
step : 1
[[8 8]
 [9 9]] [8 9]
step : 2
[[10 10]
 [11 11]] [10 11]
validation finished