You can use the links below to view this notebook in the Jupyter Notebook Viewer (nbviewer.jupyter.org) or run it in Google Colab (colab.research.google.com).
View in Jupyter Notebook Viewer | Run in Google Colab
Loading datasets from the tensorflow_datasets library
from IPython.display import Image
Image(url='https://git.io/JL5iw', width=800)
Image(url='https://git.io/JL5io', width=500)
#! pip install tensorflow
import tensorflow as tf
print('TensorFlow version:', tf.__version__)
import numpy as np
np.set_printoptions(precision=3)
TensorFlow version: 2.14.0
! python -c 'import tensorflow as tf; print(tf.__version__)'
2023-11-10 05:38:23.439759: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-10 05:38:23.439825: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-10 05:38:23.439877: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-11-10 05:38:26.479280: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2.14.0
a = np.array([1, 2, 3], dtype=np.int32)
b = [4, 5, 6]
t_a = tf.convert_to_tensor(a)
t_b = tf.convert_to_tensor(b)
print(t_a)
print(t_b)
tf.Tensor([1 2 3], shape=(3,), dtype=int32)
tf.Tensor([4 5 6], shape=(3,), dtype=int32)
tf.is_tensor(a), tf.is_tensor(t_a)
(False, True)
t_ones = tf.ones((2, 3))
t_ones.shape
TensorShape([2, 3])
t_ones.numpy()
array([[1., 1., 1.],
       [1., 1., 1.]], dtype=float32)
const_tensor = tf.constant([1.2, 5, np.pi], dtype=tf.float32)
print(const_tensor)
tf.Tensor([1.2 5. 3.142], shape=(3,), dtype=float32)
Unlike tf.constant, the tf.convert_to_tensor function can also accept a tf.Variable object (which will be covered shortly) as input, as the short sketch below shows.
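A minimal sketch, assuming eager execution as in the rest of this notebook:
v = tf.Variable([1.0, 2.0, 3.0])
t_v = tf.convert_to_tensor(v)  # works for Variable inputs as well
print(t_v)  # tf.Tensor([1. 2. 3.], shape=(3,), dtype=float32)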
You can also create tensors with the tf.fill and tf.one_hot functions. The tf.fill function creates a tensor filled with a single scalar value: the first argument is the tensor shape, just like tf.ones, and the second argument is the scalar value to fill with. For example, the following call produces the same result as tf.ones((2, 3)).
tf.fill((2, 3), 1)
<tf.Tensor: shape=(2, 3), dtype=int32, numpy=
array([[1, 1, 1],
       [1, 1, 1]], dtype=int32)>
When creating large tensors, tf.fill is more efficient than tf.ones.
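As a rough, illustrative check (the shape and repetition count below are arbitrary choices, and timings depend on your hardware):
import timeit
shape = (4000, 4000)
# compare wall-clock time of creating the same tensor 100 times each way
print('tf.ones:', timeit.timeit(lambda: tf.ones(shape), number=100))
print('tf.fill:', timeit.timeit(lambda: tf.fill(shape, 1.0), number=100))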
The tf.one_hot function is a convenient way to create a one-hot encoded matrix. Its first argument is the list of indices marking the one-hot positions, and its second argument is the length of each one-hot vector. The resulting matrix has shape (length of the first argument $\times$ second argument). For example, the following code creates a (3 $\times$ 4) one-hot matrix.
tf.one_hot([0, 1, 2], 4)
<tf.Tensor: shape=(3, 4), dtype=float32, numpy=
array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.]], dtype=float32)>
Starting with TensorFlow 2.4, a NumPy-compatible API is available under the tf.experimental.numpy module. It is based on NumPy 1.16 and may change in future releases. See https://www.tensorflow.org/api_docs/python/tf/experimental/numpy for the full API list and https://www.tensorflow.org/guide/tf_numpy for a description of the TensorFlow NumPy API.
t_a_new = tf.cast(t_a, tf.int64)
print(t_a_new.dtype)
<dtype: 'int64'>
t = tf.random.uniform(shape=(3, 5))
t_tr = tf.transpose(t)
print(t.shape, ' --> ', t_tr.shape)
(3, 5) --> (5, 3)
t = tf.zeros((30,))
t_reshape = tf.reshape(t, shape=(5, 6))
print(t_reshape.shape)
(5, 6)
t = tf.zeros((1, 2, 1, 4, 1))
t_sqz = tf.squeeze(t, axis=(2, 4))
print(t.shape, ' --> ', t_sqz.shape)
(1, 2, 1, 4, 1) --> (1, 2, 4)
tf.random.set_seed(1)
t1 = tf.random.uniform(shape=(5, 2),
minval=-1.0,
maxval=1.0)
t2 = tf.random.normal(shape=(5, 2),
mean=0.0,
stddev=1.0)
t3 = tf.multiply(t1, t2).numpy()
print(t3)
[[-0.27  -0.874]
 [-0.017 -0.175]
 [-0.296 -0.139]
 [-0.727  0.135]
 [-0.401  0.004]]
t4 = tf.math.reduce_mean(t1, axis=0)
print(t4)
tf.Tensor([0.09 0.207], shape=(2,), dtype=float32)
t5 = tf.linalg.matmul(t1, t2, transpose_b=True)
print(t5.numpy())
[[-1.144  1.115 -0.87  -0.321  0.856]
 [ 0.248 -0.191  0.25  -0.064 -0.331]
 [-0.478  0.407 -0.436  0.022  0.527]
 [ 0.525 -0.234  0.741 -0.593 -1.194]
 [-0.099  0.26   0.125 -0.462 -0.396]]
t6 = tf.linalg.matmul(t1, t2, transpose_a=True)
print(t6.numpy())
[[-1.711  0.302]
 [ 0.371 -1.049]]
norm_t1 = tf.norm(t1, ord=2, axis=1).numpy()
print(norm_t1)
[1.046 0.293 0.504 0.96 0.383]
np.sqrt(np.sum(np.square(t1), axis=1))
array([1.046, 0.293, 0.504, 0.96 , 0.383], dtype=float32)
NumPy functions call an object's __array__() method before processing their input arguments, which lets NumPy-compatible objects, such as a pandas Series, be used with the NumPy API. Tensors implement this method as well, so they can be passed directly to NumPy functions, as the short example below shows.
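A minimal sketch, assuming pandas is available (it is preinstalled in Colab):
import pandas as pd
s = pd.Series([1.0, 2.0, 3.0])
t = tf.constant([1.0, 2.0, 3.0])
print(np.sum(s))  # 6.0 -- the Series is converted through its __array__() method
print(np.sum(t))  # 6.0 -- the tensor is converted the same way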
Many mathematical functions are available at the top level, for example tf.multiply(), tf.reduce_mean(), tf.reduce_sum(), and tf.matmul(). Since Python 3.5, the @ operator can also be used for matrix multiplication. For example, the following computation produces the same result as above.
t1 @ tf.transpose(t2)
<tf.Tensor: shape=(5, 5), dtype=float32, numpy=
array([[-1.144,  1.115, -0.87 , -0.321,  0.856],
       [ 0.248, -0.191,  0.25 , -0.064, -0.331],
       [-0.478,  0.407, -0.436,  0.022,  0.527],
       [ 0.525, -0.234,  0.741, -0.593, -1.194],
       [-0.099,  0.26 ,  0.125, -0.462, -0.396]], dtype=float32)>
As noted above, TensorFlow 2.4 and later provide a NumPy-compatible API under the tf.experimental.numpy module (see https://www.tensorflow.org/api_docs/python/tf/experimental/numpy for the full API list and https://www.tensorflow.org/guide/tf_numpy for an overview). With the TensorFlow NumPy API, matrix multiplication can be written even more simply.
import tensorflow.experimental.numpy as tnp
tnp.experimental_enable_numpy_behavior()
tn1 = tnp.array(t1)
tn2 = tnp.array(t2)
print(tnp.dot(tn1, tn2.T))
tf.Tensor(
[[-1.144  1.115 -0.87  -0.321  0.856]
 [ 0.248 -0.191  0.25  -0.064 -0.331]
 [-0.478  0.407 -0.436  0.022  0.527]
 [ 0.525 -0.234  0.741 -0.593 -1.194]
 [-0.099  0.26   0.125 -0.462 -0.396]], shape=(5, 5), dtype=float32)
For the full set of TensorFlow math functions, see https://www.tensorflow.org/api_docs/python/tf/math; the linear algebra functions are listed at https://www.tensorflow.org/api_docs/python/tf/linalg.
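A small, hand-picked sample from these modules (an illustrative sketch, not part of the original code):
x = tf.constant([[4.0, 2.0],
                 [1.0, 3.0]])
print(tf.math.log(x))    # element-wise natural logarithm
print(tf.linalg.det(x))  # determinant: 4*3 - 2*1 = 10.0
print(tf.linalg.inv(x))  # matrix inverse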
tf.random.set_seed(1)
t = tf.random.uniform((6,))
print(t.numpy())
t_splits = tf.split(t, 3)
[item.numpy() for item in t_splits]
[0.165 0.901 0.631 0.435 0.292 0.643]
[array([0.165, 0.901], dtype=float32), array([0.631, 0.435], dtype=float32), array([0.292, 0.643], dtype=float32)]
tf.random.set_seed(1)
t = tf.random.uniform((5,))
print(t.numpy())
t_splits = tf.split(t, num_or_size_splits=[3, 2])
[item.numpy() for item in t_splits]
[0.165 0.901 0.631 0.435 0.292]
[array([0.165, 0.901, 0.631], dtype=float32), array([0.435, 0.292], dtype=float32)]
A = tf.ones((3,))
B = tf.zeros((2,))
C = tf.concat([A, B], axis=0)
print(C.numpy())
[1. 1. 1. 0. 0.]
A = tf.ones((3,))
B = tf.zeros((3,))
S = tf.stack([A, B], axis=1)
print(S.numpy())
[[1. 0.]
 [1. 0.]
 [1. 0.]]
a = [1.2, 3.4, 7.5, 4.1, 5.0, 1.0]
ds = tf.data.Dataset.from_tensor_slices(a)
print(ds)
<_TensorSliceDataset element_spec=TensorSpec(shape=(), dtype=tf.float32, name=None)>
for item in ds:
    print(item)
tf.Tensor(1.2, shape=(), dtype=float32)
tf.Tensor(3.4, shape=(), dtype=float32)
tf.Tensor(7.5, shape=(), dtype=float32)
tf.Tensor(4.1, shape=(), dtype=float32)
tf.Tensor(5.0, shape=(), dtype=float32)
tf.Tensor(1.0, shape=(), dtype=float32)
ds_batch = ds.batch(3)
for i, elem in enumerate(ds_batch, 100):
    print('batch {}:'.format(i), elem.numpy())
batch 100: [1.2 3.4 7.5]
batch 101: [4.1 5. 1. ]
tf.random.set_seed(1)
t_x = tf.random.uniform([4, 3], dtype=tf.float32)
t_y = tf.range(4)
ds_x = tf.data.Dataset.from_tensor_slices(t_x)
ds_y = tf.data.Dataset.from_tensor_slices(t_y)
ds_joint = tf.data.Dataset.zip((ds_x, ds_y))
for example in ds_joint:
    print(' x: ', example[0].numpy(),
          ' y: ', example[1].numpy())
 x:  [0.165 0.901 0.631]  y:  0
 x:  [0.435 0.292 0.643]  y:  1
 x:  [0.976 0.435 0.66 ]  y:  2
 x:  [0.605 0.637 0.614]  y:  3
## Method 2:
ds_joint = tf.data.Dataset.from_tensor_slices((t_x, t_y))
for example in ds_joint:
    print(' x: ', example[0].numpy(),
          ' y: ', example[1].numpy())
 x:  [0.165 0.901 0.631]  y:  0
 x:  [0.435 0.292 0.643]  y:  1
 x:  [0.976 0.435 0.66 ]  y:  2
 x:  [0.605 0.637 0.614]  y:  3
ds_trans = ds_joint.map(lambda x, y: (x*2-1.0, y))
for example in ds_trans:
    print(' x: ', example[0].numpy(),
          ' y: ', example[1].numpy())
 x:  [-0.67  0.803  0.262]  y:  0
 x:  [-0.131 -0.416  0.285]  y:  1
 x:  [ 0.952 -0.13   0.32 ]  y:  2
 x:  [0.21  0.273 0.229]  y:  3
tf.random.set_seed(1)
ds = ds_joint.shuffle(buffer_size=len(t_x))
for example in ds:
    print(' x: ', example[0].numpy(),
          ' y: ', example[1].numpy())
 x:  [0.976 0.435 0.66 ]  y:  2
 x:  [0.435 0.292 0.643]  y:  1
 x:  [0.165 0.901 0.631]  y:  0
 x:  [0.605 0.637 0.614]  y:  3
ds = ds_joint.batch(batch_size=3,
drop_remainder=False)
batch_x, batch_y = next(iter(ds))
print('Batch x: \n', batch_x.numpy())
print('Batch y: ', batch_y.numpy())
Batch x: 
 [[0.165 0.901 0.631]
 [0.435 0.292 0.643]
 [0.976 0.435 0.66 ]]
Batch y:  [0 1 2]
ds = ds_joint.batch(3).repeat(count=2)
for i, (batch_x, batch_y) in enumerate(ds):
    print(i, batch_x.shape, batch_y.numpy())
0 (3, 3) [0 1 2] 1 (1, 3) [3] 2 (3, 3) [0 1 2] 3 (1, 3) [3]
ds = ds_joint.repeat(count=2).batch(3)
for i, (batch_x, batch_y) in enumerate(ds):
    print(i, batch_x.shape, batch_y.numpy())
0 (3, 3) [0 1 2] 1 (3, 3) [3 0 1] 2 (2, 3) [2 3]
tf.random.set_seed(1)
## Order 1: shuffle -> batch -> repeat
ds = ds_joint.shuffle(4).batch(2).repeat(3)
for i, (batch_x, batch_y) in enumerate(ds):
    print(i, batch_x.shape, batch_y.numpy())
0 (2, 3) [2 1] 1 (2, 3) [0 3] 2 (2, 3) [0 3] 3 (2, 3) [1 2] 4 (2, 3) [3 0] 5 (2, 3) [1 2]
tf.random.set_seed(1)
## Order 1: shuffle -> batch -> repeat
ds = ds_joint.shuffle(4).batch(2).repeat(20)
for i, (batch_x, batch_y) in enumerate(ds):
    print(i, batch_x.shape, batch_y.numpy())
0 (2, 3) [2 1] 1 (2, 3) [0 3] 2 (2, 3) [0 3] 3 (2, 3) [1 2] 4 (2, 3) [3 0] 5 (2, 3) [1 2] 6 (2, 3) [1 3] 7 (2, 3) [2 0] 8 (2, 3) [1 2] 9 (2, 3) [3 0] 10 (2, 3) [3 0] 11 (2, 3) [2 1] 12 (2, 3) [3 0] 13 (2, 3) [1 2] 14 (2, 3) [3 0] 15 (2, 3) [2 1] 16 (2, 3) [2 3] 17 (2, 3) [0 1] 18 (2, 3) [1 2] 19 (2, 3) [0 3] 20 (2, 3) [0 1] 21 (2, 3) [2 3] 22 (2, 3) [3 2] 23 (2, 3) [0 1] 24 (2, 3) [3 0] 25 (2, 3) [1 2] 26 (2, 3) [1 3] 27 (2, 3) [2 0] 28 (2, 3) [2 1] 29 (2, 3) [0 3] 30 (2, 3) [2 3] 31 (2, 3) [0 1] 32 (2, 3) [3 1] 33 (2, 3) [2 0] 34 (2, 3) [3 2] 35 (2, 3) [1 0] 36 (2, 3) [3 0] 37 (2, 3) [2 1] 38 (2, 3) [0 2] 39 (2, 3) [3 1]
tf.random.set_seed(1)
## Order 2: batch -> shuffle -> repeat
ds = ds_joint.batch(2).shuffle(4).repeat(3)
for i, (batch_x, batch_y) in enumerate(ds):
    print(i, batch_x.shape, batch_y.numpy())
0 (2, 3) [0 1] 1 (2, 3) [2 3] 2 (2, 3) [0 1] 3 (2, 3) [2 3] 4 (2, 3) [2 3] 5 (2, 3) [0 1]
tf.random.set_seed(1)
## Order 2: batch -> shuffle -> repeat
ds = ds_joint.batch(2).shuffle(4).repeat(20)
for i, (batch_x, batch_y) in enumerate(ds):
    print(i, batch_x.shape, batch_y.numpy())
0 (2, 3) [0 1] 1 (2, 3) [2 3] 2 (2, 3) [0 1] 3 (2, 3) [2 3] 4 (2, 3) [2 3] 5 (2, 3) [0 1] 6 (2, 3) [2 3] 7 (2, 3) [0 1] 8 (2, 3) [2 3] 9 (2, 3) [0 1] 10 (2, 3) [2 3] 11 (2, 3) [0 1] 12 (2, 3) [2 3] 13 (2, 3) [0 1] 14 (2, 3) [2 3] 15 (2, 3) [0 1] 16 (2, 3) [0 1] 17 (2, 3) [2 3] 18 (2, 3) [2 3] 19 (2, 3) [0 1] 20 (2, 3) [0 1] 21 (2, 3) [2 3] 22 (2, 3) [2 3] 23 (2, 3) [0 1] 24 (2, 3) [2 3] 25 (2, 3) [0 1] 26 (2, 3) [2 3] 27 (2, 3) [0 1] 28 (2, 3) [0 1] 29 (2, 3) [2 3] 30 (2, 3) [0 1] 31 (2, 3) [2 3] 32 (2, 3) [2 3] 33 (2, 3) [0 1] 34 (2, 3) [2 3] 35 (2, 3) [0 1] 36 (2, 3) [2 3] 37 (2, 3) [0 1] 38 (2, 3) [0 1] 39 (2, 3) [2 3]
tf.random.set_seed(1)
## Order 3: batch -> repeat -> shuffle
ds = ds_joint.batch(2).repeat(3).shuffle(4)
for i, (batch_x, batch_y) in enumerate(ds):
    print(i, batch_x.shape, batch_y.numpy())
0 (2, 3) [0 1] 1 (2, 3) [0 1] 2 (2, 3) [2 3] 3 (2, 3) [2 3] 4 (2, 3) [0 1] 5 (2, 3) [2 3]
# If you are running in Colab, run the following code to download the images.
!mkdir cat_dog_images
!wget https://raw.githubusercontent.com/rickiepark/python-machine-learning-book-3rd-edition/master/ch13/cat_dog_images/cat-01.jpg
!wget https://raw.githubusercontent.com/rickiepark/python-machine-learning-book-3rd-edition/master/ch13/cat_dog_images/cat-02.jpg
!wget https://raw.githubusercontent.com/rickiepark/python-machine-learning-book-3rd-edition/master/ch13/cat_dog_images/cat-03.jpg
!wget https://raw.githubusercontent.com/rickiepark/python-machine-learning-book-3rd-edition/master/ch13/cat_dog_images/dog-01.jpg
!wget https://raw.githubusercontent.com/rickiepark/python-machine-learning-book-3rd-edition/master/ch13/cat_dog_images/dog-02.jpg
!wget https://raw.githubusercontent.com/rickiepark/python-machine-learning-book-3rd-edition/master/ch13/cat_dog_images/dog-03.jpg
!mv *.jpg cat_dog_images/
2023-11-10 05:38:43 (9.25 MB/s) - ‘cat-01.jpg’ saved [139707/139707]
2023-11-10 05:38:43 (7.50 MB/s) - ‘cat-02.jpg’ saved [152296/152296]
2023-11-10 05:38:43 (6.57 MB/s) - ‘cat-03.jpg’ saved [122677/122677]
2023-11-10 05:38:44 (7.73 MB/s) - ‘dog-01.jpg’ saved [135505/135505]
2023-11-10 05:38:44 (8.44 MB/s) - ‘dog-02.jpg’ saved [258155/258155]
2023-11-10 05:38:44 (9.34 MB/s) - ‘dog-03.jpg’ saved [213035/213035]
import pathlib
imgdir_path = pathlib.Path('cat_dog_images')
file_list = sorted([str(path) for path in imgdir_path.glob('*.jpg')])
print(file_list)
['cat_dog_images/cat-01.jpg', 'cat_dog_images/cat-02.jpg', 'cat_dog_images/cat-03.jpg', 'cat_dog_images/dog-01.jpg', 'cat_dog_images/dog-02.jpg', 'cat_dog_images/dog-03.jpg']
import matplotlib.pyplot as plt
import os
fig = plt.figure(figsize=(10, 5))
for i, file in enumerate(file_list):
    img_raw = tf.io.read_file(file)
    img = tf.image.decode_image(img_raw)
    print('Image size: ', img.shape)
    ax = fig.add_subplot(2, 3, i+1)
    ax.set_xticks([]); ax.set_yticks([])
    ax.imshow(img)
    ax.set_title(os.path.basename(file), size=15)
# plt.savefig('images/13_1.png', dpi=300)
plt.tight_layout()
plt.show()
Image size:  (900, 1200, 3)
Image size:  (900, 1200, 3)
Image size:  (900, 742, 3)
Image size:  (800, 1200, 3)
Image size:  (800, 1200, 3)
Image size:  (900, 1200, 3)
labels = [1 if 'dog' in os.path.basename(file) else 0
for file in file_list]
print(labels)
[0, 0, 0, 1, 1, 1]
ds_files_labels = tf.data.Dataset.from_tensor_slices(
(file_list, labels))
for item in ds_files_labels:
    print(item[0].numpy(), item[1].numpy())
b'cat_dog_images/cat-01.jpg' 0
b'cat_dog_images/cat-02.jpg' 0
b'cat_dog_images/cat-03.jpg' 0
b'cat_dog_images/dog-01.jpg' 1
b'cat_dog_images/dog-02.jpg' 1
b'cat_dog_images/dog-03.jpg' 1
def load_and_preprocess(path, label):
    image = tf.io.read_file(path)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.resize(image, [img_height, img_width])
    image /= 255.0
    return image, label
img_width, img_height = 120, 80
ds_images_labels = ds_files_labels.map(load_and_preprocess)
fig = plt.figure(figsize=(10, 5))
for i, example in enumerate(ds_images_labels):
    print(example[0].shape, example[1].numpy())
    ax = fig.add_subplot(2, 3, i+1)
    ax.set_xticks([]); ax.set_yticks([])
    ax.imshow(example[0])
    ax.set_title('{}'.format(example[1].numpy()),
                 size=15)
plt.tight_layout()
# plt.savefig('images/13_2.png', dpi=300)
plt.show()
(80, 120, 3) 0
(80, 120, 3) 0
(80, 120, 3) 0
(80, 120, 3) 1
(80, 120, 3) 1
(80, 120, 3) 1
Loading datasets from the tensorflow_datasets library
The tensorflow-datasets library is already installed in Colab, but upgrade it to the latest version. After installing, you need to restart the kernel (or the Colab runtime).
!pip install --upgrade tensorflow-datasets
Requirement already satisfied: tensorflow-datasets in /usr/local/lib/python3.10/dist-packages (4.9.3) Requirement already satisfied: absl-py in /usr/local/lib/python3.10/dist-packages (from tensorflow-datasets) (1.4.0) Requirement already satisfied: array-record in /usr/local/lib/python3.10/dist-packages (from tensorflow-datasets) (0.5.0) Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from tensorflow-datasets) (8.1.7) Requirement already satisfied: dm-tree in /usr/local/lib/python3.10/dist-packages (from tensorflow-datasets) (0.1.8) Requirement already satisfied: etils[enp,epath,etree]>=0.9.0 in /usr/local/lib/python3.10/dist-packages (from tensorflow-datasets) (1.5.2) Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from tensorflow-datasets) (1.23.5) Requirement already satisfied: promise in /usr/local/lib/python3.10/dist-packages (from tensorflow-datasets) (2.3) Requirement already satisfied: protobuf>=3.20 in /usr/local/lib/python3.10/dist-packages (from tensorflow-datasets) (3.20.3) Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from tensorflow-datasets) (5.9.5) Requirement already satisfied: requests>=2.19.0 in /usr/local/lib/python3.10/dist-packages (from tensorflow-datasets) (2.31.0) Requirement already satisfied: tensorflow-metadata in /usr/local/lib/python3.10/dist-packages (from tensorflow-datasets) (1.14.0) Requirement already satisfied: termcolor in /usr/local/lib/python3.10/dist-packages (from tensorflow-datasets) (2.3.0) Requirement already satisfied: toml in /usr/local/lib/python3.10/dist-packages (from tensorflow-datasets) (0.10.2) Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from tensorflow-datasets) (4.66.1) Requirement already satisfied: wrapt in /usr/local/lib/python3.10/dist-packages (from tensorflow-datasets) (1.14.1) Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from etils[enp,epath,etree]>=0.9.0->tensorflow-datasets) (2023.6.0) Requirement already satisfied: importlib_resources in /usr/local/lib/python3.10/dist-packages (from etils[enp,epath,etree]>=0.9.0->tensorflow-datasets) (6.1.1) Requirement already satisfied: typing_extensions in /usr/local/lib/python3.10/dist-packages (from etils[enp,epath,etree]>=0.9.0->tensorflow-datasets) (4.5.0) Requirement already satisfied: zipp in /usr/local/lib/python3.10/dist-packages (from etils[enp,epath,etree]>=0.9.0->tensorflow-datasets) (3.17.0) Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests>=2.19.0->tensorflow-datasets) (3.3.2) Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests>=2.19.0->tensorflow-datasets) (3.4) Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests>=2.19.0->tensorflow-datasets) (2.0.7) Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests>=2.19.0->tensorflow-datasets) (2023.7.22) Requirement already satisfied: six in /usr/local/lib/python3.10/dist-packages (from promise->tensorflow-datasets) (1.16.0) Requirement already satisfied: googleapis-common-protos<2,>=1.52.0 in /usr/local/lib/python3.10/dist-packages (from tensorflow-metadata->tensorflow-datasets) (1.61.0)
import tensorflow_datasets as tfds
print(len(tfds.list_builders()))
print(tfds.list_builders()[:5])
1210 ['abstract_reasoning', 'accentdb', 'aeslc', 'aflw2k3d', 'ag_news_subset']
## Run the following command to get the full list
tfds.list_builders()
['abstract_reasoning', 'accentdb', 'aeslc', 'aflw2k3d', 'ag_news_subset', 'ai2_arc', 'ai2_arc_with_ir', 'amazon_us_reviews', 'anli', 'answer_equivalence', 'arc', 'asqa', 'asset', 'assin2', 'bair_robot_pushing_small', 'bccd', 'beans', 'bee_dataset', 'beir', 'big_patent', 'bigearthnet', 'billsum', 'binarized_mnist', 'binary_alpha_digits', 'ble_wind_field', 'blimp', 'booksum', 'bool_q', 'bot_adversarial_dialogue', 'bucc', 'c4', 'c4_wsrs', 'caltech101', 'caltech_birds2010', 'caltech_birds2011', 'cardiotox', 'cars196', 'cassava', 'cats_vs_dogs', 'celeb_a', 'celeb_a_hq', 'cfq', 'cherry_blossoms', 'chexpert', 'cifar10', 'cifar100', 'cifar100_n', 'cifar10_1', 'cifar10_corrupted', 'cifar10_h', 'cifar10_n', 'citrus_leaves', 'cityscapes', 'civil_comments', 'clevr', 'clic', 'clinc_oos', 'cmaterdb', 'cnn_dailymail', 'coco', 'coco_captions', 'coil100', 'colorectal_histology', 'colorectal_histology_large', 'common_voice', 'conll2002', 'conll2003', 'controlled_noisy_web_labels', 'coqa', 'corr2cause', 'cos_e', 'cosmos_qa', 'covid19', 'covid19sum', 'crema_d', 'criteo', 'cs_restaurants', 'curated_breast_imaging_ddsm', 'cycle_gan', 'd4rl_adroit_door', 'd4rl_adroit_hammer', 'd4rl_adroit_pen', 'd4rl_adroit_relocate', 'd4rl_antmaze', 'd4rl_mujoco_ant', 'd4rl_mujoco_halfcheetah', 'd4rl_mujoco_hopper', 'd4rl_mujoco_walker2d', 'dart', 'databricks_dolly', 'davis', 'deep1b', 'deep_weeds', 'definite_pronoun_resolution', 'dementiabank', 'diabetic_retinopathy_detection', 'diamonds', 'div2k', 'dmlab', 'doc_nli', 'dolphin_number_word', 'domainnet', 'downsampled_imagenet', 'drop', 'dsprites', 'dtd', 'duke_ultrasound', 'e2e_cleaned', 'efron_morris75', 'emnist', 'eraser_multi_rc', 'esnli', 'eurosat', 'fashion_mnist', 'flic', 'flores', 'food101', 'forest_fires', 'fuss', 'gap', 'geirhos_conflict_stimuli', 'gem', 'genomics_ood', 'german_credit_numeric', 'gigaword', 'glove100_angular', 'glue', 'goemotions', 'gov_report', 'gpt3', 'gref', 'groove', 'grounded_scan', 'gsm8k', 'gtzan', 'gtzan_music_speech', 'hellaswag', 'higgs', 'hillstrom', 'horses_or_humans', 'howell', 'i_naturalist2017', 'i_naturalist2018', 'i_naturalist2021', 'imagenet2012', 'imagenet2012_corrupted', 'imagenet2012_fewshot', 'imagenet2012_multilabel', 'imagenet2012_real', 'imagenet2012_subset', 'imagenet_a', 'imagenet_lt', 'imagenet_pi', 'imagenet_r', 'imagenet_resized', 'imagenet_sketch', 'imagenet_v2', 'imagenette', 'imagewang', 'imdb_reviews', 'irc_disentanglement', 'iris', 'istella', 'kddcup99', 'kitti', 'kmnist', 'laion400m', 'lambada', 'lfw', 'librispeech', 'librispeech_lm', 'libritts', 'ljspeech', 'lm1b', 'locomotion', 'lost_and_found', 'lsun', 'lvis', 'malaria', 'math_dataset', 'math_qa', 'mctaco', 'media_sum', 'mlqa', 'mnist', 'mnist_corrupted', 'movie_lens', 'movie_rationales', 'movielens', 'moving_mnist', 'mrqa', 'mslr_web', 'mt_opt', 'mtnt', 'multi_news', 'multi_nli', 'multi_nli_mismatch', 'natural_instructions', 'natural_questions', 'natural_questions_open', 'newsroom', 'nsynth', 'nyu_depth_v2', 'ogbg_molpcba', 'omniglot', 'open_images_challenge2019_detection', 'open_images_v4', 'openbookqa', 'opinion_abstracts', 'opinosis', 'opus', 'oxford_flowers102', 'oxford_iiit_pet', 'para_crawl', 'pass', 'patch_camelyon', 'paws_wiki', 'paws_x_wiki', 'penguins', 'pet_finder', 'pg19', 'piqa', 'places365_small', 'placesfull', 'plant_leaves', 'plant_village', 'plantae_k', 'protein_net', 'q_re_cc', 'qa4mre', 'qasc', 'quac', 'quality', 'quickdraw_bitmap', 'race', 'radon', 'real_toxicity_prompts', 'reddit', 'reddit_disentanglement', 'reddit_tifu', 'ref_coco', 
'resisc45', 'rlu_atari', 'rlu_atari_checkpoints', 'rlu_atari_checkpoints_ordered', 'rlu_control_suite', 'rlu_dmlab_explore_object_rewards_few', 'rlu_dmlab_explore_object_rewards_many', 'rlu_dmlab_rooms_select_nonmatching_object', 'rlu_dmlab_rooms_watermaze', 'rlu_dmlab_seekavoid_arena01', 'rlu_locomotion', 'rlu_rwrl', 'robomimic_mg', 'robomimic_mh', 'robomimic_ph', 'robonet', 'robosuite_panda_pick_place_can', 'rock_paper_scissors', 'rock_you', 's3o4d', 'salient_span_wikipedia', 'samsum', 'savee', 'scan', 'scene_parse150', 'schema_guided_dialogue', 'sci_tail', 'scicite', 'scientific_papers', 'scrolls', 'segment_anything', 'sentiment140', 'shapes3d', 'sift1m', 'simpte', 'siscore', 'smallnorb', 'smartwatch_gestures', 'snli', 'so2sat', 'speech_commands', 'spoken_digit', 'squad', 'squad_question_generation', 'stanford_dogs', 'stanford_online_products', 'star_cfq', 'starcraft_video', 'stl10', 'story_cloze', 'summscreen', 'sun397', 'super_glue', 'svhn_cropped', 'symmetric_solids', 'tao', 'tatoeba', 'ted_hrlr_translate', 'ted_multi_translate', 'tedlium', 'tf_flowers', 'the300w_lp', 'tiny_shakespeare', 'titanic', 'trec', 'trivia_qa', 'tydi_qa', 'uc_merced', 'ucf101', 'unified_qa', 'universal_dependencies', 'unnatural_instructions', 'user_libri_audio', 'user_libri_text', 'vctk', 'visual_domain_decathlon', 'voc', 'voxceleb', 'voxforge', 'waymo_open_dataset', 'web_graph', 'web_nlg', 'web_questions', 'webvid', 'wider_face', 'wiki40b', 'wiki_auto', 'wiki_bio', 'wiki_dialog', 'wiki_table_questions', 'wiki_table_text', 'wikiann', 'wikihow', 'wikipedia', 'wikipedia_toxicity_subtypes', 'wine_quality', 'winogrande', 'wit', 'wit_kaggle', 'wmt13_translate', 'wmt14_translate', 'wmt15_translate', 'wmt16_translate', 'wmt17_translate', 'wmt18_translate', 'wmt19_translate', 'wmt_t2t_translate', 'wmt_translate', 'wordnet', 'wsc273', 'xnli', 'xquad', 'xsum', 'xtreme_pawsx', 'xtreme_pos', 'xtreme_s', 'xtreme_xnli', 'yahoo_ltrc', 'yelp_polarity_reviews', 'yes_no', 'youtube_vis', 'huggingface:acronym_identification', 'huggingface:ade_corpus_v2', 'huggingface:adv_glue', 'huggingface:adversarial_qa', 'huggingface:aeslc', 'huggingface:afrikaans_ner_corpus', 'huggingface:ag_news', 'huggingface:ai2_arc', 'huggingface:air_dialogue', 'huggingface:ajgt_twitter_ar', 'huggingface:allegro_reviews', 'huggingface:allocine', 'huggingface:alt', 'huggingface:amazon_polarity', 'huggingface:amazon_reviews_multi', 'huggingface:amazon_us_reviews', 'huggingface:ambig_qa', 'huggingface:americas_nli', 'huggingface:ami', 'huggingface:amttl', 'huggingface:anli', 'huggingface:app_reviews', 'huggingface:aqua_rat', 'huggingface:aquamuse', 'huggingface:ar_cov19', 'huggingface:ar_res_reviews', 'huggingface:ar_sarcasm', 'huggingface:arabic_billion_words', 'huggingface:arabic_pos_dialect', 'huggingface:arabic_speech_corpus', 'huggingface:arcd', 'huggingface:arsentd_lev', 'huggingface:art', 'huggingface:arxiv_dataset', 'huggingface:ascent_kb', 'huggingface:aslg_pc12', 'huggingface:asnq', 'huggingface:asset', 'huggingface:assin', 'huggingface:assin2', 'huggingface:atomic', 'huggingface:autshumato', 'huggingface:babi_qa', 'huggingface:banking77', 'huggingface:bbaw_egyptian', 'huggingface:bbc_hindi_nli', 'huggingface:bc2gm_corpus', 'huggingface:beans', 'huggingface:best2009', 'huggingface:bianet', 'huggingface:bible_para', 'huggingface:big_patent', 'huggingface:bigbench', 'huggingface:billsum', 'huggingface:bing_coronavirus_query_set', 'huggingface:biomrc', 'huggingface:biosses', 'huggingface:biwi_kinect_head_pose', 'huggingface:blbooks', 
'huggingface:blbooksgenre', 'huggingface:blended_skill_talk', 'huggingface:blimp', 'huggingface:blog_authorship_corpus', 'huggingface:bn_hate_speech', 'huggingface:bnl_newspapers', 'huggingface:bookcorpus', 'huggingface:bookcorpusopen', 'huggingface:boolq', 'huggingface:bprec', 'huggingface:break_data', 'huggingface:brwac', 'huggingface:bsd_ja_en', 'huggingface:bswac', 'huggingface:c3', 'huggingface:c4', 'huggingface:cail2018', 'huggingface:caner', 'huggingface:capes', 'huggingface:casino', 'huggingface:catalonia_independence', 'huggingface:cats_vs_dogs', 'huggingface:cawac', 'huggingface:cbt', 'huggingface:cc100', 'huggingface:cc_news', 'huggingface:ccaligned_multilingual', 'huggingface:cdsc', 'huggingface:cdt', 'huggingface:cedr', 'huggingface:cfq', 'huggingface:chr_en', 'huggingface:cifar10', 'huggingface:cifar100', 'huggingface:circa', 'huggingface:civil_comments', 'huggingface:clickbait_news_bg', 'huggingface:climate_fever', 'huggingface:clinc_oos', 'huggingface:clue', 'huggingface:cmrc2018', 'huggingface:cmu_hinglish_dog', 'huggingface:cnn_dailymail', 'huggingface:coached_conv_pref', 'huggingface:coarse_discourse', 'huggingface:codah', 'huggingface:code_search_net', 'huggingface:code_x_glue_cc_clone_detection_big_clone_bench', 'huggingface:code_x_glue_cc_clone_detection_poj104', 'huggingface:code_x_glue_cc_cloze_testing_all', 'huggingface:code_x_glue_cc_cloze_testing_maxmin', 'huggingface:code_x_glue_cc_code_completion_line', 'huggingface:code_x_glue_cc_code_completion_token', 'huggingface:code_x_glue_cc_code_refinement', 'huggingface:code_x_glue_cc_code_to_code_trans', 'huggingface:code_x_glue_cc_defect_detection', 'huggingface:code_x_glue_ct_code_to_text', 'huggingface:code_x_glue_tc_nl_code_search_adv', 'huggingface:code_x_glue_tc_text_to_code', 'huggingface:code_x_glue_tt_text_to_text', 'huggingface:com_qa', 'huggingface:common_gen', 'huggingface:common_language', 'huggingface:common_voice', 'huggingface:commonsense_qa', 'huggingface:competition_math', 'huggingface:compguesswhat', 'huggingface:conceptnet5', 'huggingface:conceptual_12m', 'huggingface:conceptual_captions', 'huggingface:conll2000', 'huggingface:conll2002', 'huggingface:conll2003', 'huggingface:conll2012_ontonotesv5', 'huggingface:conllpp', 'huggingface:consumer-finance-complaints', 'huggingface:conv_ai', 'huggingface:conv_ai_2', 'huggingface:conv_ai_3', 'huggingface:conv_questions', 'huggingface:coqa', 'huggingface:cord19', 'huggingface:cornell_movie_dialog', 'huggingface:cos_e', 'huggingface:cosmos_qa', 'huggingface:counter', 'huggingface:covid_qa_castorini', 'huggingface:covid_qa_deepset', 'huggingface:covid_qa_ucsd', 'huggingface:covid_tweets_japanese', 'huggingface:covost2', 'huggingface:cppe-5', 'huggingface:craigslist_bargains', 'huggingface:crawl_domain', 'huggingface:crd3', 'huggingface:crime_and_punish', 'huggingface:crows_pairs', 'huggingface:cryptonite', 'huggingface:cs_restaurants', 'huggingface:cuad', 'huggingface:curiosity_dialogs', 'huggingface:daily_dialog', 'huggingface:dane', 'huggingface:danish_political_comments', 'huggingface:dart', 'huggingface:datacommons_factcheck', 'huggingface:dbpedia_14', 'huggingface:dbrd', 'huggingface:deal_or_no_dialog', 'huggingface:definite_pronoun_resolution', 'huggingface:dengue_filipino', 'huggingface:dialog_re', 'huggingface:diplomacy_detection', 'huggingface:disaster_response_messages', 'huggingface:discofuse', 'huggingface:discovery', 'huggingface:disfl_qa', 'huggingface:doc2dial', 'huggingface:docred', 'huggingface:doqa', 'huggingface:dream', 
'huggingface:drop', 'huggingface:duorc', 'huggingface:dutch_social', 'huggingface:dyk', 'huggingface:e2e_nlg', 'huggingface:e2e_nlg_cleaned', 'huggingface:ecb', 'huggingface:ecthr_cases', 'huggingface:eduge', 'huggingface:ehealth_kd', 'huggingface:eitb_parcc', 'huggingface:electricity_load_diagrams', 'huggingface:eli5', 'huggingface:eli5_category', 'huggingface:elkarhizketak', 'huggingface:emea', 'huggingface:emo', 'huggingface:emotion', 'huggingface:emotone_ar', 'huggingface:empathetic_dialogues', 'huggingface:enriched_web_nlg', 'huggingface:enwik8', 'huggingface:eraser_multi_rc', 'huggingface:esnli', 'huggingface:eth_py150_open', 'huggingface:ethos', 'huggingface:ett', 'huggingface:eu_regulatory_ir', 'huggingface:eurlex', 'huggingface:euronews', 'huggingface:europa_eac_tm', 'huggingface:europa_ecdc_tm', 'huggingface:europarl_bilingual', 'huggingface:event2Mind', 'huggingface:evidence_infer_treatment', 'huggingface:exams', 'huggingface:factckbr', 'huggingface:fake_news_english', 'huggingface:fake_news_filipino', 'huggingface:farsi_news', 'huggingface:fashion_mnist', 'huggingface:fever', 'huggingface:few_rel', 'huggingface:financial_phrasebank', 'huggingface:finer', 'huggingface:flores', 'huggingface:flue', 'huggingface:food101', 'huggingface:fquad', 'huggingface:freebase_qa', 'huggingface:gap', 'huggingface:gem', 'huggingface:generated_reviews_enth', 'huggingface:generics_kb', 'huggingface:german_legal_entity_recognition', 'huggingface:germaner', 'huggingface:germeval_14', 'huggingface:giga_fren', 'huggingface:gigaword', 'huggingface:glucose', 'huggingface:glue', 'huggingface:gnad10', 'huggingface:go_emotions', 'huggingface:gooaq', 'huggingface:google_wellformed_query', 'huggingface:grail_qa', 'huggingface:great_code', 'huggingface:greek_legal_code', 'huggingface:gsm8k', 'huggingface:guardian_authorship', 'huggingface:gutenberg_time', 'huggingface:hans', 'huggingface:hansards', 'huggingface:hard', 'huggingface:harem', 'huggingface:has_part', 'huggingface:hate_offensive', 'huggingface:hate_speech18', 'huggingface:hate_speech_filipino', 'huggingface:hate_speech_offensive', 'huggingface:hate_speech_pl', 'huggingface:hate_speech_portuguese', 'huggingface:hatexplain', 'huggingface:hausa_voa_ner', 'huggingface:hausa_voa_topics', 'huggingface:hda_nli_hindi', 'huggingface:head_qa', 'huggingface:health_fact', 'huggingface:hebrew_projectbenyehuda', 'huggingface:hebrew_sentiment', 'huggingface:hebrew_this_world', 'huggingface:hellaswag', 'huggingface:hendrycks_test', 'huggingface:hind_encorp', 'huggingface:hindi_discourse', 'huggingface:hippocorpus', 'huggingface:hkcancor', 'huggingface:hlgd', 'huggingface:hope_edi', 'huggingface:hotpot_qa', 'huggingface:hover', 'huggingface:hrenwac_para', 'huggingface:hrwac', 'huggingface:humicroedit', 'huggingface:hybrid_qa', 'huggingface:hyperpartisan_news_detection', 'huggingface:iapp_wiki_qa_squad', 'huggingface:id_clickbait', 'huggingface:id_liputan6', 'huggingface:id_nergrit_corpus', 'huggingface:id_newspapers_2018', 'huggingface:id_panl_bppt', 'huggingface:id_puisi', 'huggingface:igbo_english_machine_translation', 'huggingface:igbo_monolingual', 'huggingface:igbo_ner', 'huggingface:ilist', 'huggingface:imagenet-1k', 'huggingface:imagenet_sketch', 'huggingface:imdb', 'huggingface:imdb_urdu_reviews', 'huggingface:imppres', 'huggingface:indic_glue', 'huggingface:indonli', 'huggingface:indonlu', 'huggingface:inquisitive_qg', 'huggingface:interpress_news_category_tr', 'huggingface:interpress_news_category_tr_lite', 'huggingface:irc_disentangle', 
'huggingface:isixhosa_ner_corpus', 'huggingface:isizulu_ner_corpus', 'huggingface:iwslt2017', 'huggingface:jeopardy', 'huggingface:jfleg', 'huggingface:jigsaw_toxicity_pred', 'huggingface:jigsaw_unintended_bias', 'huggingface:jnlpba', 'huggingface:journalists_questions', 'huggingface:kan_hope', 'huggingface:kannada_news', 'huggingface:kd_conv', 'huggingface:kde4', 'huggingface:kelm', 'huggingface:kilt_tasks', 'huggingface:kilt_wikipedia', 'huggingface:kinnews_kirnews', 'huggingface:klue', 'huggingface:kor_3i4k', 'huggingface:kor_hate', 'huggingface:kor_ner', 'huggingface:kor_nli', 'huggingface:kor_nlu', 'huggingface:kor_qpair', 'huggingface:kor_sae', 'huggingface:kor_sarcasm', 'huggingface:labr', 'huggingface:lama', 'huggingface:lambada', 'huggingface:large_spanish_corpus', 'huggingface:laroseda', 'huggingface:lc_quad', 'huggingface:lccc', 'huggingface:lener_br', 'huggingface:lex_glue', 'huggingface:liar', 'huggingface:librispeech_asr', 'huggingface:librispeech_lm', 'huggingface:limit', 'huggingface:lince', 'huggingface:linnaeus', 'huggingface:liveqa', 'huggingface:lj_speech', 'huggingface:lm1b', 'huggingface:lst20', 'huggingface:m_lama', 'huggingface:mac_morpho', 'huggingface:makhzan', 'huggingface:masakhaner', 'huggingface:math_dataset', 'huggingface:math_qa', 'huggingface:matinf', 'huggingface:mbpp', 'huggingface:mc4', 'huggingface:mc_taco', 'huggingface:md_gender_bias', 'huggingface:mdd', 'huggingface:med_hop', 'huggingface:medal', 'huggingface:medical_dialog', 'huggingface:medical_questions_pairs', 'huggingface:medmcqa', 'huggingface:menyo20k_mt', 'huggingface:meta_woz', 'huggingface:metashift', 'huggingface:metooma', 'huggingface:metrec', 'huggingface:miam', 'huggingface:mkb', 'huggingface:mkqa', 'huggingface:mlqa', 'huggingface:mlsum', 'huggingface:mnist', 'huggingface:mocha', 'huggingface:monash_tsf', 'huggingface:moroco', 'huggingface:movie_rationales', 'huggingface:mrqa', 'huggingface:ms_marco', 'huggingface:ms_terms', 'huggingface:msr_genomics_kbcomp', 'huggingface:msr_sqa', 'huggingface:msr_text_compression', 'huggingface:msr_zhen_translation_parity', 'huggingface:msra_ner', 'huggingface:mt_eng_vietnamese', 'huggingface:muchocine', 'huggingface:multi_booked', 'huggingface:multi_eurlex', 'huggingface:multi_news', 'huggingface:multi_nli', 'huggingface:multi_nli_mismatch', 'huggingface:multi_para_crawl', 'huggingface:multi_re_qa', 'huggingface:multi_woz_v22', 'huggingface:multi_x_science_sum', 'huggingface:multidoc2dial', 'huggingface:multilingual_librispeech', 'huggingface:mutual_friends', 'huggingface:mwsc', 'huggingface:myanmar_news', 'huggingface:narrativeqa', 'huggingface:narrativeqa_manual', 'huggingface:natural_questions', 'huggingface:ncbi_disease', 'huggingface:nchlt', 'huggingface:ncslgr', 'huggingface:nell', 'huggingface:neural_code_search', 'huggingface:news_commentary', 'huggingface:newsgroup', 'huggingface:newsph', 'huggingface:newsph_nli', 'huggingface:newspop', 'huggingface:newsqa', 'huggingface:newsroom', 'huggingface:nkjp-ner', 'huggingface:nli_tr', 'huggingface:nlu_evaluation_data', 'huggingface:norec', 'huggingface:norne', 'huggingface:norwegian_ner', 'huggingface:nq_open', 'huggingface:nsmc', 'huggingface:numer_sense', 'huggingface:numeric_fused_head', 'huggingface:oclar', 'huggingface:offcombr', 'huggingface:offenseval2020_tr', 'huggingface:offenseval_dravidian', 'huggingface:ofis_publik', 'huggingface:ohsumed', 'huggingface:ollie', 'huggingface:omp', 'huggingface:onestop_english', 'huggingface:onestop_qa', 'huggingface:open_subtitles', 
'huggingface:openai_humaneval', 'huggingface:openbookqa', 'huggingface:openslr', 'huggingface:openwebtext', 'huggingface:opinosis', 'huggingface:opus100', 'huggingface:opus_books', 'huggingface:opus_dgt', 'huggingface:opus_dogc', 'huggingface:opus_elhuyar', 'huggingface:opus_euconst', 'huggingface:opus_finlex', 'huggingface:opus_fiskmo', 'huggingface:opus_gnome', 'huggingface:opus_infopankki', 'huggingface:opus_memat', 'huggingface:opus_montenegrinsubs', 'huggingface:opus_openoffice', 'huggingface:opus_paracrawl', 'huggingface:opus_rf', 'huggingface:opus_tedtalks', 'huggingface:opus_ubuntu', 'huggingface:opus_wikipedia', 'huggingface:opus_xhosanavy', 'huggingface:orange_sum', 'huggingface:oscar', 'huggingface:para_crawl', 'huggingface:para_pat', 'huggingface:parsinlu_reading_comprehension', 'huggingface:pass', 'huggingface:paws', 'huggingface:paws-x', 'huggingface:pec', 'huggingface:peer_read', 'huggingface:peoples_daily_ner', 'huggingface:per_sent', 'huggingface:persian_ner', 'huggingface:pg19', 'huggingface:php', 'huggingface:piaf', 'huggingface:pib', 'huggingface:piqa', 'huggingface:pn_summary', 'huggingface:poem_sentiment', 'huggingface:polemo2', 'huggingface:poleval2019_cyberbullying', 'huggingface:poleval2019_mt', 'huggingface:polsum', 'huggingface:polyglot_ner', 'huggingface:prachathai67k', 'huggingface:pragmeval', 'huggingface:proto_qa', 'huggingface:psc', 'huggingface:ptb_text_only', 'huggingface:pubmed', 'huggingface:pubmed_qa', 'huggingface:py_ast', 'huggingface:qa4mre', 'huggingface:qa_srl', 'huggingface:qa_zre', 'huggingface:qangaroo', 'huggingface:qanta', 'huggingface:qasc', 'huggingface:qasper', 'huggingface:qed', 'huggingface:qed_amara', 'huggingface:quac', 'huggingface:quail', 'huggingface:quarel', 'huggingface:quartz', 'huggingface:quickdraw', 'huggingface:quora', 'huggingface:quoref', 'huggingface:race', 'huggingface:re_dial', 'huggingface:reasoning_bg', 'huggingface:recipe_nlg', 'huggingface:reclor', 'huggingface:red_caps', 'huggingface:reddit', 'huggingface:reddit_tifu', 'huggingface:refresd', 'huggingface:reuters21578', 'huggingface:riddle_sense', 'huggingface:ro_sent', 'huggingface:ro_sts', 'huggingface:ro_sts_parallel', 'huggingface:roman_urdu', 'huggingface:roman_urdu_hate_speech', 'huggingface:ronec', 'huggingface:ropes', 'huggingface:rotten_tomatoes', 'huggingface:russian_super_glue', 'huggingface:rvl_cdip', 'huggingface:s2orc', 'huggingface:samsum', 'huggingface:sanskrit_classic', 'huggingface:saudinewsnet', 'huggingface:sberquad', 'huggingface:sbu_captions', 'huggingface:scan', 'huggingface:scb_mt_enth_2020', 'huggingface:scene_parse_150', 'huggingface:schema_guided_dstc8', 'huggingface:scicite', 'huggingface:scielo', 'huggingface:scientific_papers', 'huggingface:scifact', 'huggingface:sciq', 'huggingface:scitail', 'huggingface:scitldr', 'huggingface:search_qa', 'huggingface:sede', 'huggingface:selqa', 'huggingface:sem_eval_2010_task_8', 'huggingface:sem_eval_2014_task_1', 'huggingface:sem_eval_2018_task_1', 'huggingface:sem_eval_2020_task_11', 'huggingface:sent_comp', 'huggingface:senti_lex', 'huggingface:senti_ws', 'huggingface:sentiment140', 'huggingface:sepedi_ner', 'huggingface:sesotho_ner_corpus', 'huggingface:setimes', 'huggingface:setswana_ner_corpus', 'huggingface:sharc', 'huggingface:sharc_modified', 'huggingface:sick', 'huggingface:silicone', 'huggingface:simple_questions_v2', 'huggingface:siswati_ner_corpus', 'huggingface:smartdata', 'huggingface:sms_spam', 'huggingface:snips_built_in_intents', 'huggingface:snli', 
'huggingface:snow_simplified_japanese_corpus', 'huggingface:so_stacksample', 'huggingface:social_bias_frames', 'huggingface:social_i_qa', 'huggingface:sofc_materials_articles', 'huggingface:sogou_news', 'huggingface:spanish_billion_words', 'huggingface:spc', 'huggingface:species_800', 'huggingface:speech_commands', 'huggingface:spider', 'huggingface:squad', 'huggingface:squad_adversarial', 'huggingface:squad_es', 'huggingface:squad_it', 'huggingface:squad_kor_v1', 'huggingface:squad_kor_v2', 'huggingface:squad_v1_pt', 'huggingface:squad_v2', 'huggingface:squadshifts', 'huggingface:srwac', 'huggingface:sst', 'huggingface:stereoset', 'huggingface:story_cloze', 'huggingface:stsb_mt_sv', 'huggingface:stsb_multi_mt', 'huggingface:style_change_detection', 'huggingface:subjqa', 'huggingface:super_glue', 'huggingface:superb', 'huggingface:svhn', 'huggingface:swag', 'huggingface:swahili', 'huggingface:swahili_news', 'huggingface:swda', 'huggingface:swedish_medical_ner', 'huggingface:swedish_ner_corpus', 'huggingface:swedish_reviews', 'huggingface:swiss_judgment_prediction', 'huggingface:tab_fact', 'huggingface:tamilmixsentiment', 'huggingface:tanzil', 'huggingface:tapaco', 'huggingface:tashkeela', 'huggingface:taskmaster1', 'huggingface:taskmaster2', 'huggingface:taskmaster3', 'huggingface:tatoeba', 'huggingface:ted_hrlr', 'huggingface:ted_iwlst2013', 'huggingface:ted_multi', 'huggingface:ted_talks_iwslt', 'huggingface:telugu_books', 'huggingface:telugu_news', 'huggingface:tep_en_fa_para', 'huggingface:text2log', ...]
Downloading the CelebA dataset
celeba_bldr = tfds.builder('celeb_a')
print(celeba_bldr.info.features)
print('\n', 30*"=", '\n')
print(celeba_bldr.info.features.keys())
print('\n', 30*"=", '\n')
print(celeba_bldr.info.features['image'])
print('\n', 30*"=", '\n')
print(celeba_bldr.info.features['attributes'].keys())
print('\n', 30*"=", '\n')
print(celeba_bldr.info.citation)
FeaturesDict({ 'attributes': FeaturesDict({ '5_o_Clock_Shadow': bool, 'Arched_Eyebrows': bool, 'Attractive': bool, 'Bags_Under_Eyes': bool, 'Bald': bool, 'Bangs': bool, 'Big_Lips': bool, 'Big_Nose': bool, 'Black_Hair': bool, 'Blond_Hair': bool, 'Blurry': bool, 'Brown_Hair': bool, 'Bushy_Eyebrows': bool, 'Chubby': bool, 'Double_Chin': bool, 'Eyeglasses': bool, 'Goatee': bool, 'Gray_Hair': bool, 'Heavy_Makeup': bool, 'High_Cheekbones': bool, 'Male': bool, 'Mouth_Slightly_Open': bool, 'Mustache': bool, 'Narrow_Eyes': bool, 'No_Beard': bool, 'Oval_Face': bool, 'Pale_Skin': bool, 'Pointy_Nose': bool, 'Receding_Hairline': bool, 'Rosy_Cheeks': bool, 'Sideburns': bool, 'Smiling': bool, 'Straight_Hair': bool, 'Wavy_Hair': bool, 'Wearing_Earrings': bool, 'Wearing_Hat': bool, 'Wearing_Lipstick': bool, 'Wearing_Necklace': bool, 'Wearing_Necktie': bool, 'Young': bool, }), 'identity': FeaturesDict({ 'Identity_No': int64, }), 'image': Image(shape=(218, 178, 3), dtype=uint8), 'landmarks': FeaturesDict({ 'lefteye_x': int64, 'lefteye_y': int64, 'leftmouth_x': int64, 'leftmouth_y': int64, 'nose_x': int64, 'nose_y': int64, 'righteye_x': int64, 'righteye_y': int64, 'rightmouth_x': int64, 'rightmouth_y': int64, }), }) ============================== dict_keys(['image', 'landmarks', 'attributes', 'identity']) ============================== Image(shape=(218, 178, 3), dtype=uint8) ============================== dict_keys(['5_o_Clock_Shadow', 'Arched_Eyebrows', 'Attractive', 'Bags_Under_Eyes', 'Bald', 'Bangs', 'Big_Lips', 'Big_Nose', 'Black_Hair', 'Blond_Hair', 'Blurry', 'Brown_Hair', 'Bushy_Eyebrows', 'Chubby', 'Double_Chin', 'Eyeglasses', 'Goatee', 'Gray_Hair', 'Heavy_Makeup', 'High_Cheekbones', 'Male', 'Mouth_Slightly_Open', 'Mustache', 'Narrow_Eyes', 'No_Beard', 'Oval_Face', 'Pale_Skin', 'Pointy_Nose', 'Receding_Hairline', 'Rosy_Cheeks', 'Sideburns', 'Smiling', 'Straight_Hair', 'Wavy_Hair', 'Wearing_Earrings', 'Wearing_Hat', 'Wearing_Lipstick', 'Wearing_Necklace', 'Wearing_Necktie', 'Young']) ============================== @inproceedings{conf/iccv/LiuLWT15, added-at = {2018-10-09T00:00:00.000+0200}, author = {Liu, Ziwei and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou}, biburl = {https://www.bibsonomy.org/bibtex/250e4959be61db325d2f02c1d8cd7bfbb/dblp}, booktitle = {ICCV}, crossref = {conf/iccv/2015}, ee = {http://doi.ieeecomputersociety.org/10.1109/ICCV.2015.425}, interhash = {3f735aaa11957e73914bbe2ca9d5e702}, intrahash = {50e4959be61db325d2f02c1d8cd7bfbb}, isbn = {978-1-4673-8391-2}, keywords = {dblp}, pages = {3730-3738}, publisher = {IEEE Computer Society}, timestamp = {2018-10-11T11:43:28.000+0200}, title = {Deep Learning Face Attributes in the Wild.}, url = {http://dblp.uni-trier.de/db/conf/iccv/iccv2015.html#LiuLWT15}, year = 2015 }
# If an error occurs while downloading the CelebA data in the cell below, manually download
# the four files from https://git.io/JL5GM and copy them to ~/tensorflow_datasets/downloads/manual.
# Alternatively, you can download them from the translator's Drive using the gdown package.
import gdown
gdown.download(id='1vDDFjykRuzEagaHwwIlv6hpaHDgEUcGo', output='img_align_celeba.zip')
gdown.download(id='1XDTGJ2-QNMvkIdFwlvYAO28uc9v2d9PO', output='list_attr_celeba.txt')
gdown.download(id='1V6zDszhMCokTZTh4EZ48RB9wslY_YSCl', output='list_eval_partition.txt')
gdown.download(id='1iwam-RFy3tuh0yj29kK9tgEJOqGrH4_r', output='list_landmarks_align_celeba.txt')
!mkdir -p ~/tensorflow_datasets/downloads/manual
!cp img_align_celeba.zip ~/tensorflow_datasets/downloads/manual
!cp list_attr_celeba.txt ~/tensorflow_datasets/downloads/manual
!cp list_eval_partition.txt ~/tensorflow_datasets/downloads/manual
!cp list_landmarks_align_celeba.txt ~/tensorflow_datasets/downloads/manual
Downloading... From: https://drive.google.com/uc?id=1vDDFjykRuzEagaHwwIlv6hpaHDgEUcGo To: /content/img_align_celeba.zip 100%|██████████| 1.44G/1.44G [00:26<00:00, 55.3MB/s] Downloading... From: https://drive.google.com/uc?id=1XDTGJ2-QNMvkIdFwlvYAO28uc9v2d9PO To: /content/list_attr_celeba.txt 100%|██████████| 26.7M/26.7M [00:00<00:00, 60.1MB/s] Downloading... From: https://drive.google.com/uc?id=1V6zDszhMCokTZTh4EZ48RB9wslY_YSCl To: /content/list_eval_partition.txt 100%|██████████| 2.84M/2.84M [00:00<00:00, 235MB/s] Downloading... From: https://drive.google.com/uc?id=1iwam-RFy3tuh0yj29kK9tgEJOqGrH4_r To: /content/list_landmarks_align_celeba.txt 100%|██████████| 12.2M/12.2M [00:00<00:00, 54.7MB/s]
# Download the data and save it to disk
celeba_bldr.download_and_prepare()
Downloading and preparing dataset 1.39 GiB (download: 1.39 GiB, generated: 1.63 GiB, total: 3.01 GiB) to /root/tensorflow_datasets/celeb_a/2.1.0...
Dl Completed...: 0 url [00:00, ? url/s]
Dl Size...: 0 MiB [00:00, ? MiB/s]
Generating splits...: 0%| | 0/3 [00:00<?, ? splits/s]
Generating train examples...: 0%| | 0/162770 [00:00<?, ? examples/s]
Shuffling /root/tensorflow_datasets/celeb_a/2.1.0.incomplete7TEVX3/celeb_a-train.tfrecord*...: 0%| …
Generating validation examples...: 0%| | 0/19867 [00:00<?, ? examples/s]
Shuffling /root/tensorflow_datasets/celeb_a/2.1.0.incomplete7TEVX3/celeb_a-validation.tfrecord*...: 0%| …
Generating test examples...: 0%| | 0/19962 [00:00<?, ? examples/s]
Shuffling /root/tensorflow_datasets/celeb_a/2.1.0.incomplete7TEVX3/celeb_a-test.tfrecord*...: 0%| |…
Dataset celeb_a downloaded and prepared to /root/tensorflow_datasets/celeb_a/2.1.0. Subsequent calls will reuse this data.
# Load the data from disk as tf.data.Datasets
datasets = celeba_bldr.as_dataset(shuffle_files=False)
datasets.keys()
dict_keys([Split('train'), Split('validation'), Split('test')])
#import tensorflow as tf
ds_train = datasets['train']
assert isinstance(ds_train, tf.data.Dataset)
example = next(iter(ds_train))
print(type(example))
print(example.keys())
<class 'dict'>
dict_keys(['attributes', 'identity', 'image', 'landmarks'])
ds_train = ds_train.map(lambda item:
(item['image'], tf.cast(item['attributes']['Male'], tf.int32)))
ds_train = ds_train.batch(18)
images, labels = next(iter(ds_train))
print(images.shape, labels)
(18, 218, 178, 3) tf.Tensor([0 1 0 0 1 1 1 1 1 0 0 0 1 0 0 1 1 1], shape=(18,), dtype=int32)
fig = plt.figure(figsize=(12, 8))
for i, (image, label) in enumerate(zip(images, labels)):
    ax = fig.add_subplot(3, 6, i+1)
    ax.set_xticks([]); ax.set_yticks([])
    ax.imshow(image)
    ax.set_title('{}'.format(label), size=15)
# plt.savefig('images/13_3.png', dpi=300)
plt.show()
Here is another way to load a dataset.
mnist, mnist_info = tfds.load('mnist', with_info=True,
shuffle_files=False)
print(mnist_info)
print(mnist.keys())
Downloading and preparing dataset 11.06 MiB (download: 11.06 MiB, generated: 21.00 MiB, total: 32.06 MiB) to /root/tensorflow_datasets/mnist/3.0.1...
Dl Completed...: 0%| | 0/5 [00:00<?, ? file/s]
Dataset mnist downloaded and prepared to /root/tensorflow_datasets/mnist/3.0.1. Subsequent calls will reuse this data. tfds.core.DatasetInfo( name='mnist', full_name='mnist/3.0.1', description=""" The MNIST database of handwritten digits. """, homepage='http://yann.lecun.com/exdb/mnist/', data_dir='/root/tensorflow_datasets/mnist/3.0.1.incompleteL38ZZB', file_format=tfrecord, download_size=11.06 MiB, dataset_size=21.00 MiB, features=FeaturesDict({ 'image': Image(shape=(28, 28, 1), dtype=uint8), 'label': ClassLabel(shape=(), dtype=int64, num_classes=10), }), supervised_keys=('image', 'label'), disable_shuffling=False, splits={ 'test': <SplitInfo num_examples=10000, num_shards=1>, 'train': <SplitInfo num_examples=60000, num_shards=1>, }, citation="""@article{lecun2010mnist, title={MNIST handwritten digit database}, author={LeCun, Yann and Cortes, Corinna and Burges, CJ}, journal={ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist}, volume={2}, year={2010} }""", ) dict_keys(['test', 'train'])
ds_train = mnist['train']
assert isinstance(ds_train, tf.data.Dataset)
ds_train = ds_train.map(lambda item:
(item['image'], item['label']))
ds_train = ds_train.batch(10)
batch = next(iter(ds_train))
print(batch[0].shape, batch[1])
fig = plt.figure(figsize=(15, 6))
for i, (image, label) in enumerate(zip(batch[0], batch[1])):
    ax = fig.add_subplot(2, 5, i+1)
    ax.set_xticks([]); ax.set_yticks([])
    ax.imshow(image[:, :, 0], cmap='gray_r')
    ax.set_title('{}'.format(label), size=15)
# plt.savefig('images/13_4.png', dpi=300)
plt.show()
(10, 28, 28, 1) tf.Tensor([4 1 0 7 8 1 2 7 1 6], shape=(10,), dtype=int64)