You can use the links below to view this notebook in the Jupyter Notebook Viewer (nbviewer.jupyter.org) or run it in Google Colab (colab.research.google.com).
View in Jupyter Notebook Viewer | Run in Google Colab
Loading datasets from the tensorflow_datasets library
from IPython.display import Image
Image(url='https://git.io/JL5iw', width=800)
Image(url='https://git.io/JL5io', width=500)
#! pip install tensorflow
import tensorflow as tf
print('TensorFlow version:', tf.__version__)
import numpy as np
np.set_printoptions(precision=3)
TensorFlow version: 2.14.0
! python -c 'import tensorflow as tf; print(tf.__version__)'
2023-11-10 05:38:23.439759: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-10 05:38:23.439825: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-10 05:38:23.439877: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-11-10 05:38:26.479280: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2.14.0
a = np.array([1, 2, 3], dtype=np.int32)
b = [4, 5, 6]
t_a = tf.convert_to_tensor(a)
t_b = tf.convert_to_tensor(b)
print(t_a)
print(t_b)
tf.Tensor([1 2 3], shape=(3,), dtype=int32)
tf.Tensor([4 5 6], shape=(3,), dtype=int32)
tf.is_tensor(a), tf.is_tensor(t_a)
(False, True)
t_ones = tf.ones((2, 3))
t_ones.shape
TensorShape([2, 3])
t_ones.numpy()
array([[1., 1., 1.],
       [1., 1., 1.]], dtype=float32)
const_tensor = tf.constant([1.2, 5, np.pi], dtype=tf.float32)
print(const_tensor)
tf.Tensor([1.2 5. 3.142], shape=(3,), dtype=float32)
Unlike tf.constant, the tf.convert_to_tensor function can also accept a tf.Variable object (which will be covered shortly) as input, as the short sketch below shows.
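A minimal sketch, assuming eager execution as in the rest of this notebook:
v = tf.Variable([1.0, 2.0, 3.0])
t_v = tf.convert_to_tensor(v)  # works for Variable inputs as well
print(t_v)  # tf.Tensor([1. 2. 3.], shape=(3,), dtype=float32)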
You can also create tensors with the tf.fill and tf.one_hot functions. The tf.fill function creates a tensor filled with a single scalar value: the first argument is the tensor shape, just like tf.ones, and the second argument is the scalar value to fill with. For example, the following call produces the same result as tf.ones((2, 3)).
tf.fill((2, 3), 1)
<tf.Tensor: shape=(2, 3), dtype=int32, numpy=
array([[1, 1, 1],
       [1, 1, 1]], dtype=int32)>
When creating large tensors, tf.fill is more efficient than tf.ones.
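As a rough, illustrative check (the shape and repetition count below are arbitrary choices, and timings depend on your hardware):
import timeit
shape = (4000, 4000)
# compare wall-clock time of creating the same tensor 100 times each way
print('tf.ones:', timeit.timeit(lambda: tf.ones(shape), number=100))
print('tf.fill:', timeit.timeit(lambda: tf.fill(shape, 1.0), number=100))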
The tf.one_hot function is a convenient way to create a one-hot encoded matrix. Its first argument is the list of indices marking the one-hot positions, and its second argument is the length of each one-hot vector. The resulting matrix has shape (length of the first argument $\times$ second argument). For example, the following code creates a (3 $\times$ 4) one-hot matrix.
tf.one_hot([0, 1, 2], 4)
<tf.Tensor: shape=(3, 4), dtype=float32, numpy=
array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.]], dtype=float32)>
Starting with TensorFlow 2.4, a NumPy-compatible API is available under the tf.experimental.numpy module. It is based on NumPy 1.16 and may change in future releases. See https://www.tensorflow.org/api_docs/python/tf/experimental/numpy for the full API list and https://www.tensorflow.org/guide/tf_numpy for a description of the TensorFlow NumPy API.
t_a_new = tf.cast(t_a, tf.int64)
print(t_a_new.dtype)
<dtype: 'int64'>
t = tf.random.uniform(shape=(3, 5))
t_tr = tf.transpose(t)
print(t.shape, ' --> ', t_tr.shape)
(3, 5) --> (5, 3)
t = tf.zeros((30,))
t_reshape = tf.reshape(t, shape=(5, 6))
print(t_reshape.shape)
(5, 6)
t = tf.zeros((1, 2, 1, 4, 1))
t_sqz = tf.squeeze(t, axis=(2, 4))
print(t.shape, ' --> ', t_sqz.shape)
(1, 2, 1, 4, 1) --> (1, 2, 4)
tf.random.set_seed(1)
t1 = tf.random.uniform(shape=(5, 2),
minval=-1.0,
maxval=1.0)
t2 = tf.random.normal(shape=(5, 2),
mean=0.0,
stddev=1.0)
t3 = tf.multiply(t1, t2).numpy()
print(t3)
[[-0.27  -0.874]
 [-0.017 -0.175]
 [-0.296 -0.139]
 [-0.727  0.135]
 [-0.401  0.004]]
t4 = tf.math.reduce_mean(t1, axis=0)
print(t4)
tf.Tensor([0.09 0.207], shape=(2,), dtype=float32)
t5 = tf.linalg.matmul(t1, t2, transpose_b=True)
print(t5.numpy())
[[-1.144  1.115 -0.87  -0.321  0.856]
 [ 0.248 -0.191  0.25  -0.064 -0.331]
 [-0.478  0.407 -0.436  0.022  0.527]
 [ 0.525 -0.234  0.741 -0.593 -1.194]
 [-0.099  0.26   0.125 -0.462 -0.396]]
t6 = tf.linalg.matmul(t1, t2, transpose_a=True)
print(t6.numpy())
[[-1.711  0.302]
 [ 0.371 -1.049]]
norm_t1 = tf.norm(t1, ord=2, axis=1).numpy()
print(norm_t1)
[1.046 0.293 0.504 0.96 0.383]
np.sqrt(np.sum(np.square(t1), axis=1))
array([1.046, 0.293, 0.504, 0.96 , 0.383], dtype=float32)
NumPy functions call an object's __array__() method before processing their input arguments, which lets NumPy-compatible objects, such as a pandas Series, be used with the NumPy API. Tensors implement this method as well, so they can be passed directly to NumPy functions, as the short example below shows.
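A minimal sketch, assuming pandas is available (it is preinstalled in Colab):
import pandas as pd
s = pd.Series([1.0, 2.0, 3.0])
t = tf.constant([1.0, 2.0, 3.0])
print(np.sum(s))  # 6.0 -- the Series is converted through its __array__() method
print(np.sum(t))  # 6.0 -- the tensor is converted the same way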
Many mathematical functions are available at the top level, for example tf.multiply(), tf.reduce_mean(), tf.reduce_sum(), and tf.matmul(). Since Python 3.5, the @ operator can also be used for matrix multiplication. For example, the following computation produces the same result as above.
t1 @ tf.transpose(t2)
<tf.Tensor: shape=(5, 5), dtype=float32, numpy=
array([[-1.144,  1.115, -0.87 , -0.321,  0.856],
       [ 0.248, -0.191,  0.25 , -0.064, -0.331],
       [-0.478,  0.407, -0.436,  0.022,  0.527],
       [ 0.525, -0.234,  0.741, -0.593, -1.194],
       [-0.099,  0.26 ,  0.125, -0.462, -0.396]], dtype=float32)>
As noted above, TensorFlow 2.4 and later provide a NumPy-compatible API under the tf.experimental.numpy module (see https://www.tensorflow.org/api_docs/python/tf/experimental/numpy for the full API list and https://www.tensorflow.org/guide/tf_numpy for an overview). With the TensorFlow NumPy API, matrix multiplication can be written even more simply.
import tensorflow.experimental.numpy as tnp
tnp.experimental_enable_numpy_behavior()
tn1 = tnp.array(t1)
tn2 = tnp.array(t2)
print(tnp.dot(tn1, tn2.T))
tf.Tensor(
[[-1.144  1.115 -0.87  -0.321  0.856]
 [ 0.248 -0.191  0.25  -0.064 -0.331]
 [-0.478  0.407 -0.436  0.022  0.527]
 [ 0.525 -0.234  0.741 -0.593 -1.194]
 [-0.099  0.26   0.125 -0.462 -0.396]], shape=(5, 5), dtype=float32)
For the full set of TensorFlow math functions, see https://www.tensorflow.org/api_docs/python/tf/math; the linear algebra functions are listed at https://www.tensorflow.org/api_docs/python/tf/linalg.
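A small, hand-picked sample from these modules (an illustrative sketch, not part of the original code):
x = tf.constant([[4.0, 2.0],
                 [1.0, 3.0]])
print(tf.math.log(x))    # element-wise natural logarithm
print(tf.linalg.det(x))  # determinant: 4*3 - 2*1 = 10.0
print(tf.linalg.inv(x))  # matrix inverse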
tf.random.set_seed(1)
t = tf.random.uniform((6,))
print(t.numpy())
t_splits = tf.split(t, 3)
[item.numpy() for item in t_splits]
[0.165 0.901 0.631 0.435 0.292 0.643]
[array([0.165, 0.901], dtype=float32), array([0.631, 0.435], dtype=float32), array([0.292, 0.643], dtype=float32)]
tf.random.set_seed(1)
t = tf.random.uniform((5,))
print(t.numpy())
t_splits = tf.split(t, num_or_size_splits=[3, 2])
[item.numpy() for item in t_splits]
[0.165 0.901 0.631 0.435 0.292]
[array([0.165, 0.901, 0.631], dtype=float32), array([0.435, 0.292], dtype=float32)]
A = tf.ones((3,))
B = tf.zeros((2,))
C = tf.concat([A, B], axis=0)
print(C.numpy())
[1. 1. 1. 0. 0.]
A = tf.ones((3,))
B = tf.zeros((3,))
S = tf.stack([A, B], axis=1)
print(S.numpy())
[[1. 0.]
 [1. 0.]
 [1. 0.]]
a = [1.2, 3.4, 7.5, 4.1, 5.0, 1.0]
ds = tf.data.Dataset.from_tensor_slices(a)
print(ds)
<_TensorSliceDataset element_spec=TensorSpec(shape=(), dtype=tf.float32, name=None)>
for item in ds:
    print(item)
tf.Tensor(1.2, shape=(), dtype=float32)
tf.Tensor(3.4, shape=(), dtype=float32)
tf.Tensor(7.5, shape=(), dtype=float32)
tf.Tensor(4.1, shape=(), dtype=float32)
tf.Tensor(5.0, shape=(), dtype=float32)
tf.Tensor(1.0, shape=(), dtype=float32)
ds_batch = ds.batch(3)
for i, elem in enumerate(ds_batch, 100):
    print('batch {}:'.format(i), elem.numpy())
batch 100: [1.2 3.4 7.5]
batch 101: [4.1 5. 1. ]
tf.random.set_seed(1)
t_x = tf.random.uniform([4, 3], dtype=tf.float32)
t_y = tf.range(4)
ds_x = tf.data.Dataset.from_tensor_slices(t_x)
ds_y = tf.data.Dataset.from_tensor_slices(t_y)
ds_joint = tf.data.Dataset.zip((ds_x, ds_y))
for example in ds_joint:
    print(' x: ', example[0].numpy(),
          ' y: ', example[1].numpy())
 x:  [0.165 0.901 0.631]  y:  0
 x:  [0.435 0.292 0.643]  y:  1
 x:  [0.976 0.435 0.66 ]  y:  2
 x:  [0.605 0.637 0.614]  y:  3
## Method 2:
ds_joint = tf.data.Dataset.from_tensor_slices((t_x, t_y))
for example in ds_joint:
    print(' x: ', example[0].numpy(),
          ' y: ', example[1].numpy())
 x:  [0.165 0.901 0.631]  y:  0
 x:  [0.435 0.292 0.643]  y:  1
 x:  [0.976 0.435 0.66 ]  y:  2
 x:  [0.605 0.637 0.614]  y:  3
ds_trans = ds_joint.map(lambda x, y: (x*2-1.0, y))
for example in ds_trans:
    print(' x: ', example[0].numpy(),
          ' y: ', example[1].numpy())
 x:  [-0.67  0.803  0.262]  y:  0
 x:  [-0.131 -0.416  0.285]  y:  1
 x:  [ 0.952 -0.13   0.32 ]  y:  2
 x:  [0.21  0.273 0.229]  y:  3
tf.random.set_seed(1)
ds = ds_joint.shuffle(buffer_size=len(t_x))
for example in ds:
    print(' x: ', example[0].numpy(),
          ' y: ', example[1].numpy())
 x:  [0.976 0.435 0.66 ]  y:  2
 x:  [0.435 0.292 0.643]  y:  1
 x:  [0.165 0.901 0.631]  y:  0
 x:  [0.605 0.637 0.614]  y:  3
ds = ds_joint.batch(batch_size=3,
drop_remainder=False)
batch_x, batch_y = next(iter(ds))
print('Batch x: \n', batch_x.numpy())
print('Batch y: ', batch_y.numpy())
Batch x: 
 [[0.165 0.901 0.631]
 [0.435 0.292 0.643]
 [0.976 0.435 0.66 ]]
Batch y:  [0 1 2]
ds = ds_joint.batch(3).repeat(count=2)
for i, (batch_x, batch_y) in enumerate(ds):
    print(i, batch_x.shape, batch_y.numpy())
0 (3, 3) [0 1 2] 1 (1, 3) [3] 2 (3, 3) [0 1 2] 3 (1, 3) [3]
ds = ds_joint.repeat(count=2).batch(3)
for i, (batch_x, batch_y) in enumerate(ds):
    print(i, batch_x.shape, batch_y.numpy())
0 (3, 3) [0 1 2] 1 (3, 3) [3 0 1] 2 (2, 3) [2 3]
tf.random.set_seed(1)
## Order 1: shuffle -> batch -> repeat
ds = ds_joint.shuffle(4).batch(2).repeat(3)
for i, (batch_x, batch_y) in enumerate(ds):
    print(i, batch_x.shape, batch_y.numpy())
0 (2, 3) [2 1] 1 (2, 3) [0 3] 2 (2, 3) [0 3] 3 (2, 3) [1 2] 4 (2, 3) [3 0] 5 (2, 3) [1 2]
tf.random.set_seed(1)
## Order 1: shuffle -> batch -> repeat
ds = ds_joint.shuffle(4).batch(2).repeat(20)
for i, (batch_x, batch_y) in enumerate(ds):
    print(i, batch_x.shape, batch_y.numpy())
0 (2, 3) [2 1] 1 (2, 3) [0 3] 2 (2, 3) [0 3] 3 (2, 3) [1 2] 4 (2, 3) [3 0] 5 (2, 3) [1 2] 6 (2, 3) [1 3] 7 (2, 3) [2 0] 8 (2, 3) [1 2] 9 (2, 3) [3 0] 10 (2, 3) [3 0] 11 (2, 3) [2 1] 12 (2, 3) [3 0] 13 (2, 3) [1 2] 14 (2, 3) [3 0] 15 (2, 3) [2 1] 16 (2, 3) [2 3] 17 (2, 3) [0 1] 18 (2, 3) [1 2] 19 (2, 3) [0 3] 20 (2, 3) [0 1] 21 (2, 3) [2 3] 22 (2, 3) [3 2] 23 (2, 3) [0 1] 24 (2, 3) [3 0] 25 (2, 3) [1 2] 26 (2, 3) [1 3] 27 (2, 3) [2 0] 28 (2, 3) [2 1] 29 (2, 3) [0 3] 30 (2, 3) [2 3] 31 (2, 3) [0 1] 32 (2, 3) [3 1] 33 (2, 3) [2 0] 34 (2, 3) [3 2] 35 (2, 3) [1 0] 36 (2, 3) [3 0] 37 (2, 3) [2 1] 38 (2, 3) [0 2] 39 (2, 3) [3 1]
tf.random.set_seed(1)
## Order 2: batch -> shuffle -> repeat
ds = ds_joint.batch(2).shuffle(4).repeat(3)
for i, (batch_x, batch_y) in enumerate(ds):
    print(i, batch_x.shape, batch_y.numpy())
0 (2, 3) [0 1] 1 (2, 3) [2 3] 2 (2, 3) [0 1] 3 (2, 3) [2 3] 4 (2, 3) [2 3] 5 (2, 3) [0 1]
tf.random.set_seed(1)
## Order 2: batch -> shuffle -> repeat
ds = ds_joint.batch(2).shuffle(4).repeat(20)
for i, (batch_x, batch_y) in enumerate(ds):
    print(i, batch_x.shape, batch_y.numpy())
0 (2, 3) [0 1] 1 (2, 3) [2 3] 2 (2, 3) [0 1] 3 (2, 3) [2 3] 4 (2, 3) [2 3] 5 (2, 3) [0 1] 6 (2, 3) [2 3] 7 (2, 3) [0 1] 8 (2, 3) [2 3] 9 (2, 3) [0 1] 10 (2, 3) [2 3] 11 (2, 3) [0 1] 12 (2, 3) [2 3] 13 (2, 3) [0 1] 14 (2, 3) [2 3] 15 (2, 3) [0 1] 16 (2, 3) [0 1] 17 (2, 3) [2 3] 18 (2, 3) [2 3] 19 (2, 3) [0 1] 20 (2, 3) [0 1] 21 (2, 3) [2 3] 22 (2, 3) [2 3] 23 (2, 3) [0 1] 24 (2, 3) [2 3] 25 (2, 3) [0 1] 26 (2, 3) [2 3] 27 (2, 3) [0 1] 28 (2, 3) [0 1] 29 (2, 3) [2 3] 30 (2, 3) [0 1] 31 (2, 3) [2 3] 32 (2, 3) [2 3] 33 (2, 3) [0 1] 34 (2, 3) [2 3] 35 (2, 3) [0 1] 36 (2, 3) [2 3] 37 (2, 3) [0 1] 38 (2, 3) [0 1] 39 (2, 3) [2 3]
tf.random.set_seed(1)
## Order 3: batch -> repeat -> shuffle
ds = ds_joint.batch(2).repeat(3).shuffle(4)
for i, (batch_x, batch_y) in enumerate(ds):
    print(i, batch_x.shape, batch_y.numpy())
0 (2, 3) [0 1] 1 (2, 3) [0 1] 2 (2, 3) [2 3] 3 (2, 3) [2 3] 4 (2, 3) [0 1] 5 (2, 3) [2 3]
# If you are running in Colab, run the following code to download the images.
!mkdir cat_dog_images
!wget https://raw.githubusercontent.com/rickiepark/python-machine-learning-book-3rd-edition/master/ch13/cat_dog_images/cat-01.jpg
!wget https://raw.githubusercontent.com/rickiepark/python-machine-learning-book-3rd-edition/master/ch13/cat_dog_images/cat-02.jpg
!wget https://raw.githubusercontent.com/rickiepark/python-machine-learning-book-3rd-edition/master/ch13/cat_dog_images/cat-03.jpg
!wget https://raw.githubusercontent.com/rickiepark/python-machine-learning-book-3rd-edition/master/ch13/cat_dog_images/dog-01.jpg
!wget https://raw.githubusercontent.com/rickiepark/python-machine-learning-book-3rd-edition/master/ch13/cat_dog_images/dog-02.jpg
!wget https://raw.githubusercontent.com/rickiepark/python-machine-learning-book-3rd-edition/master/ch13/cat_dog_images/dog-03.jpg
!mv *.jpg cat_dog_images/
2023-11-10 05:38:43 (9.25 MB/s) - ‘cat-01.jpg’ saved [139707/139707]
2023-11-10 05:38:43 (7.50 MB/s) - ‘cat-02.jpg’ saved [152296/152296]
2023-11-10 05:38:43 (6.57 MB/s) - ‘cat-03.jpg’ saved [122677/122677]
2023-11-10 05:38:44 (7.73 MB/s) - ‘dog-01.jpg’ saved [135505/135505]
2023-11-10 05:38:44 (8.44 MB/s) - ‘dog-02.jpg’ saved [258155/258155]
2023-11-10 05:38:44 (9.34 MB/s) - ‘dog-03.jpg’ saved [213035/213035]
import pathlib
imgdir_path = pathlib.Path('cat_dog_images')
file_list = sorted([str(path) for path in imgdir_path.glob('*.jpg')])
print(file_list)
['cat_dog_images/cat-01.jpg', 'cat_dog_images/cat-02.jpg', 'cat_dog_images/cat-03.jpg', 'cat_dog_images/dog-01.jpg', 'cat_dog_images/dog-02.jpg', 'cat_dog_images/dog-03.jpg']
import matplotlib.pyplot as plt
import os
fig = plt.figure(figsize=(10, 5))
for i, file in enumerate(file_list):
    img_raw = tf.io.read_file(file)
    img = tf.image.decode_image(img_raw)
    print('Image size: ', img.shape)
    ax = fig.add_subplot(2, 3, i+1)
    ax.set_xticks([]); ax.set_yticks([])
    ax.imshow(img)
    ax.set_title(os.path.basename(file), size=15)
# plt.savefig('images/13_1.png', dpi=300)
plt.tight_layout()
plt.show()
Image size:  (900, 1200, 3)
Image size:  (900, 1200, 3)
Image size:  (900, 742, 3)
Image size:  (800, 1200, 3)
Image size:  (800, 1200, 3)
Image size:  (900, 1200, 3)
labels = [1 if 'dog' in os.path.basename(file) else 0
for file in file_list]
print(labels)
[0, 0, 0, 1, 1, 1]
ds_files_labels = tf.data.Dataset.from_tensor_slices(
(file_list, labels))
for item in ds_files_labels:
    print(item[0].numpy(), item[1].numpy())
b'cat_dog_images/cat-01.jpg' 0
b'cat_dog_images/cat-02.jpg' 0
b'cat_dog_images/cat-03.jpg' 0
b'cat_dog_images/dog-01.jpg' 1
b'cat_dog_images/dog-02.jpg' 1
b'cat_dog_images/dog-03.jpg' 1
def load_and_preprocess(path, label):
    image = tf.io.read_file(path)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.resize(image, [img_height, img_width])
    image /= 255.0
    return image, label
img_width, img_height = 120, 80
ds_images_labels = ds_files_labels.map(load_and_preprocess)
fig = plt.figure(figsize=(10, 5))
for i, example in enumerate(ds_images_labels):
    print(example[0].shape, example[1].numpy())
    ax = fig.add_subplot(2, 3, i+1)
    ax.set_xticks([]); ax.set_yticks([])
    ax.imshow(example[0])
    ax.set_title('{}'.format(example[1].numpy()),
                 size=15)
plt.tight_layout()
# plt.savefig('images/13_2.png', dpi=300)
plt.show()
(80, 120, 3) 0
(80, 120, 3) 0
(80, 120, 3) 0
(80, 120, 3) 1
(80, 120, 3) 1
(80, 120, 3) 1
Loading datasets from the tensorflow_datasets library
The tensorflow-datasets library is already installed in Colab, but upgrade it to the latest version. After installing, you need to restart the kernel (or the Colab runtime).
!pip install --upgrade tensorflow-datasets
Requirement already satisfied: tensorflow-datasets in /usr/local/lib/python3.10/dist-packages (4.9.3) Requirement already satisfied: absl-py in /usr/local/lib/python3.10/dist-packages (from tensorflow-datasets) (1.4.0) Requirement already satisfied: array-record in /usr/local/lib/python3.10/dist-packages (from tensorflow-datasets) (0.5.0) Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from tensorflow-datasets) (8.1.7) Requirement already satisfied: dm-tree in /usr/local/lib/python3.10/dist-packages (from tensorflow-datasets) (0.1.8) Requirement already satisfied: etils[enp,epath,etree]>=0.9.0 in /usr/local/lib/python3.10/dist-packages (from tensorflow-datasets) (1.5.2) Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from tensorflow-datasets) (1.23.5) Requirement already satisfied: promise in /usr/local/lib/python3.10/dist-packages (from tensorflow-datasets) (2.3) Requirement already satisfied: protobuf>=3.20 in /usr/local/lib/python3.10/dist-packages (from tensorflow-datasets) (3.20.3) Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from tensorflow-datasets) (5.9.5) Requirement already satisfied: requests>=2.19.0 in /usr/local/lib/python3.10/dist-packages (from tensorflow-datasets) (2.31.0) Requirement already satisfied: tensorflow-metadata in /usr/local/lib/python3.10/dist-packages (from tensorflow-datasets) (1.14.0) Requirement already satisfied: termcolor in /usr/local/lib/python3.10/dist-packages (from tensorflow-datasets) (2.3.0) Requirement already satisfied: toml in /usr/local/lib/python3.10/dist-packages (from tensorflow-datasets) (0.10.2) Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from tensorflow-datasets) (4.66.1) Requirement already satisfied: wrapt in /usr/local/lib/python3.10/dist-packages (from tensorflow-datasets) (1.14.1) Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from etils[enp,epath,etree]>=0.9.0->tensorflow-datasets) (2023.6.0) Requirement already satisfied: importlib_resources in /usr/local/lib/python3.10/dist-packages (from etils[enp,epath,etree]>=0.9.0->tensorflow-datasets) (6.1.1) Requirement already satisfied: typing_extensions in /usr/local/lib/python3.10/dist-packages (from etils[enp,epath,etree]>=0.9.0->tensorflow-datasets) (4.5.0) Requirement already satisfied: zipp in /usr/local/lib/python3.10/dist-packages (from etils[enp,epath,etree]>=0.9.0->tensorflow-datasets) (3.17.0) Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests>=2.19.0->tensorflow-datasets) (3.3.2) Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests>=2.19.0->tensorflow-datasets) (3.4) Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests>=2.19.0->tensorflow-datasets) (2.0.7) Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests>=2.19.0->tensorflow-datasets) (2023.7.22) Requirement already satisfied: six in /usr/local/lib/python3.10/dist-packages (from promise->tensorflow-datasets) (1.16.0) Requirement already satisfied: googleapis-common-protos<2,>=1.52.0 in /usr/local/lib/python3.10/dist-packages (from tensorflow-metadata->tensorflow-datasets) (1.61.0)
import tensorflow_datasets as tfds
print(len(tfds.list_builders()))
print(tfds.list_builders()[:5])
1210 ['abstract_reasoning', 'accentdb', 'aeslc', 'aflw2k3d', 'ag_news_subset']
## Run the following command to get the full list
tfds.list_builders()
['abstract_reasoning', 'accentdb', 'aeslc', 'aflw2k3d', 'ag_news_subset', 'ai2_arc', 'ai2_arc_with_ir', 'amazon_us_reviews', 'anli', 'answer_equivalence', 'arc', 'asqa', 'asset', 'assin2', 'bair_robot_pushing_small', 'bccd', 'beans', 'bee_dataset', 'beir', 'big_patent', 'bigearthnet', 'billsum', 'binarized_mnist', 'binary_alpha_digits', 'ble_wind_field', 'blimp', 'booksum', 'bool_q', 'bot_adversarial_dialogue', 'bucc', 'c4', 'c4_wsrs', 'caltech101', 'caltech_birds2010', 'caltech_birds2011', 'cardiotox', 'cars196', 'cassava', 'cats_vs_dogs', 'celeb_a', 'celeb_a_hq', 'cfq', 'cherry_blossoms', 'chexpert', 'cifar10', 'cifar100', 'cifar100_n', 'cifar10_1', 'cifar10_corrupted', 'cifar10_h', 'cifar10_n', 'citrus_leaves', 'cityscapes', 'civil_comments', 'clevr', 'clic', 'clinc_oos', 'cmaterdb', 'cnn_dailymail', 'coco', 'coco_captions', 'coil100', 'colorectal_histology', 'colorectal_histology_large', 'common_voice', 'conll2002', 'conll2003', 'controlled_noisy_web_labels', 'coqa', 'corr2cause', 'cos_e', 'cosmos_qa', 'covid19', 'covid19sum', 'crema_d', 'criteo', 'cs_restaurants', 'curated_breast_imaging_ddsm', 'cycle_gan', 'd4rl_adroit_door', 'd4rl_adroit_hammer', 'd4rl_adroit_pen', 'd4rl_adroit_relocate', 'd4rl_antmaze', 'd4rl_mujoco_ant', 'd4rl_mujoco_halfcheetah', 'd4rl_mujoco_hopper', 'd4rl_mujoco_walker2d', 'dart', 'databricks_dolly', 'davis', 'deep1b', 'deep_weeds', 'definite_pronoun_resolution', 'dementiabank', 'diabetic_retinopathy_detection', 'diamonds', 'div2k', 'dmlab', 'doc_nli', 'dolphin_number_word', 'domainnet', 'downsampled_imagenet', 'drop', 'dsprites', 'dtd', 'duke_ultrasound', 'e2e_cleaned', 'efron_morris75', 'emnist', 'eraser_multi_rc', 'esnli', 'eurosat', 'fashion_mnist', 'flic', 'flores', 'food101', 'forest_fires', 'fuss', 'gap', 'geirhos_conflict_stimuli', 'gem', 'genomics_ood', 'german_credit_numeric', 'gigaword', 'glove100_angular', 'glue', 'goemotions', 'gov_report', 'gpt3', 'gref', 'groove', 'grounded_scan', 'gsm8k', 'gtzan', 'gtzan_music_speech', 'hellaswag', 'higgs', 'hillstrom', 'horses_or_humans', 'howell', 'i_naturalist2017', 'i_naturalist2018', 'i_naturalist2021', 'imagenet2012', 'imagenet2012_corrupted', 'imagenet2012_fewshot', 'imagenet2012_multilabel', 'imagenet2012_real', 'imagenet2012_subset', 'imagenet_a', 'imagenet_lt', 'imagenet_pi', 'imagenet_r', 'imagenet_resized', 'imagenet_sketch', 'imagenet_v2', 'imagenette', 'imagewang', 'imdb_reviews', 'irc_disentanglement', 'iris', 'istella', 'kddcup99', 'kitti', 'kmnist', 'laion400m', 'lambada', 'lfw', 'librispeech', 'librispeech_lm', 'libritts', 'ljspeech', 'lm1b', 'locomotion', 'lost_and_found', 'lsun', 'lvis', 'malaria', 'math_dataset', 'math_qa', 'mctaco', 'media_sum', 'mlqa', 'mnist', 'mnist_corrupted', 'movie_lens', 'movie_rationales', 'movielens', 'moving_mnist', 'mrqa', 'mslr_web', 'mt_opt', 'mtnt', 'multi_news', 'multi_nli', 'multi_nli_mismatch', 'natural_instructions', 'natural_questions', 'natural_questions_open', 'newsroom', 'nsynth', 'nyu_depth_v2', 'ogbg_molpcba', 'omniglot', 'open_images_challenge2019_detection', 'open_images_v4', 'openbookqa', 'opinion_abstracts', 'opinosis', 'opus', 'oxford_flowers102', 'oxford_iiit_pet', 'para_crawl', 'pass', 'patch_camelyon', 'paws_wiki', 'paws_x_wiki', 'penguins', 'pet_finder', 'pg19', 'piqa', 'places365_small', 'placesfull', 'plant_leaves', 'plant_village', 'plantae_k', 'protein_net', 'q_re_cc', 'qa4mre', 'qasc', 'quac', 'quality', 'quickdraw_bitmap', 'race', 'radon', 'real_toxicity_prompts', 'reddit', 'reddit_disentanglement', 'reddit_tifu', 'ref_coco', 
'resisc45', 'rlu_atari', 'rlu_atari_checkpoints', 'rlu_atari_checkpoints_ordered', 'rlu_control_suite', 'rlu_dmlab_explore_object_rewards_few', 'rlu_dmlab_explore_object_rewards_many', 'rlu_dmlab_rooms_select_nonmatching_object', 'rlu_dmlab_rooms_watermaze', 'rlu_dmlab_seekavoid_arena01', 'rlu_locomotion', 'rlu_rwrl', 'robomimic_mg', 'robomimic_mh', 'robomimic_ph', 'robonet', 'robosuite_panda_pick_place_can', 'rock_paper_scissors', 'rock_you', 's3o4d', 'salient_span_wikipedia', 'samsum', 'savee', 'scan', 'scene_parse150', 'schema_guided_dialogue', 'sci_tail', 'scicite', 'scientific_papers', 'scrolls', 'segment_anything', 'sentiment140', 'shapes3d', 'sift1m', 'simpte', 'siscore', 'smallnorb', 'smartwatch_gestures', 'snli', 'so2sat', 'speech_commands', 'spoken_digit', 'squad', 'squad_question_generation', 'stanford_dogs', 'stanford_online_products', 'star_cfq', 'starcraft_video', 'stl10', 'story_cloze', 'summscreen', 'sun397', 'super_glue', 'svhn_cropped', 'symmetric_solids', 'tao', 'tatoeba', 'ted_hrlr_translate', 'ted_multi_translate', 'tedlium', 'tf_flowers', 'the300w_lp', 'tiny_shakespeare', 'titanic', 'trec', 'trivia_qa', 'tydi_qa', 'uc_merced', 'ucf101', 'unified_qa', 'universal_dependencies', 'unnatural_instructions', 'user_libri_audio', 'user_libri_text', 'vctk', 'visual_domain_decathlon', 'voc', 'voxceleb', 'voxforge', 'waymo_open_dataset', 'web_graph', 'web_nlg', 'web_questions', 'webvid', 'wider_face', 'wiki40b', 'wiki_auto', 'wiki_bio', 'wiki_dialog', 'wiki_table_questions', 'wiki_table_text', 'wikiann', 'wikihow', 'wikipedia', 'wikipedia_toxicity_subtypes', 'wine_quality', 'winogrande', 'wit', 'wit_kaggle', 'wmt13_translate', 'wmt14_translate', 'wmt15_translate', 'wmt16_translate', 'wmt17_translate', 'wmt18_translate', 'wmt19_translate', 'wmt_t2t_translate', 'wmt_translate', 'wordnet', 'wsc273', 'xnli', 'xquad', 'xsum', 'xtreme_pawsx', 'xtreme_pos', 'xtreme_s', 'xtreme_xnli', 'yahoo_ltrc', 'yelp_polarity_reviews', 'yes_no', 'youtube_vis', 'huggingface:acronym_identification', 'huggingface:ade_corpus_v2', 'huggingface:adv_glue', 'huggingface:adversarial_qa', 'huggingface:aeslc', 'huggingface:afrikaans_ner_corpus', 'huggingface:ag_news', 'huggingface:ai2_arc', 'huggingface:air_dialogue', 'huggingface:ajgt_twitter_ar', 'huggingface:allegro_reviews', 'huggingface:allocine', 'huggingface:alt', 'huggingface:amazon_polarity', 'huggingface:amazon_reviews_multi', 'huggingface:amazon_us_reviews', 'huggingface:ambig_qa', 'huggingface:americas_nli', 'huggingface:ami', 'huggingface:amttl', 'huggingface:anli', 'huggingface:app_reviews', 'huggingface:aqua_rat', 'huggingface:aquamuse', 'huggingface:ar_cov19', 'huggingface:ar_res_reviews', 'huggingface:ar_sarcasm', 'huggingface:arabic_billion_words', 'huggingface:arabic_pos_dialect', 'huggingface:arabic_speech_corpus', 'huggingface:arcd', 'huggingface:arsentd_lev', 'huggingface:art', 'huggingface:arxiv_dataset', 'huggingface:ascent_kb', 'huggingface:aslg_pc12', 'huggingface:asnq', 'huggingface:asset', 'huggingface:assin', 'huggingface:assin2', 'huggingface:atomic', 'huggingface:autshumato', 'huggingface:babi_qa', 'huggingface:banking77', 'huggingface:bbaw_egyptian', 'huggingface:bbc_hindi_nli', 'huggingface:bc2gm_corpus', 'huggingface:beans', 'huggingface:best2009', 'huggingface:bianet', 'huggingface:bible_para', 'huggingface:big_patent', 'huggingface:bigbench', 'huggingface:billsum', 'huggingface:bing_coronavirus_query_set', 'huggingface:biomrc', 'huggingface:biosses', 'huggingface:biwi_kinect_head_pose', 'huggingface:blbooks', 
'huggingface:blbooksgenre', 'huggingface:blended_skill_talk', 'huggingface:blimp', 'huggingface:blog_authorship_corpus', 'huggingface:bn_hate_speech', 'huggingface:bnl_newspapers', 'huggingface:bookcorpus', 'huggingface:bookcorpusopen', 'huggingface:boolq', 'huggingface:bprec', 'huggingface:break_data', 'huggingface:brwac', 'huggingface:bsd_ja_en', 'huggingface:bswac', 'huggingface:c3', 'huggingface:c4', 'huggingface:cail2018', 'huggingface:caner', 'huggingface:capes', 'huggingface:casino', 'huggingface:catalonia_independence', 'huggingface:cats_vs_dogs', 'huggingface:cawac', 'huggingface:cbt', 'huggingface:cc100', 'huggingface:cc_news', 'huggingface:ccaligned_multilingual', 'huggingface:cdsc', 'huggingface:cdt', 'huggingface:cedr', 'huggingface:cfq', 'huggingface:chr_en', 'huggingface:cifar10', 'huggingface:cifar100', 'huggingface:circa', 'huggingface:civil_comments', 'huggingface:clickbait_news_bg', 'huggingface:climate_fever', 'huggingface:clinc_oos', 'huggingface:clue', 'huggingface:cmrc2018', 'huggingface:cmu_hinglish_dog', 'huggingface:cnn_dailymail', 'huggingface:coached_conv_pref', 'huggingface:coarse_discourse', 'huggingface:codah', 'huggingface:code_search_net', 'huggingface:code_x_glue_cc_clone_detection_big_clone_bench', 'huggingface:code_x_glue_cc_clone_detection_poj104', 'huggingface:code_x_glue_cc_cloze_testing_all', 'huggingface:code_x_glue_cc_cloze_testing_maxmin', 'huggingface:code_x_glue_cc_code_completion_line', 'huggingface:code_x_glue_cc_code_completion_token', 'huggingface:code_x_glue_cc_code_refinement', 'huggingface:code_x_glue_cc_code_to_code_trans', 'huggingface:code_x_glue_cc_defect_detection', 'huggingface:code_x_glue_ct_code_to_text', 'huggingface:code_x_glue_tc_nl_code_search_adv', 'huggingface:code_x_glue_tc_text_to_code', 'huggingface:code_x_glue_tt_text_to_text', 'huggingface:com_qa', 'huggingface:common_gen', 'huggingface:common_language', 'huggingface:common_voice', 'huggingface:commonsense_qa', 'huggingface:competition_math', 'huggingface:compguesswhat', 'huggingface:conceptnet5', 'huggingface:conceptual_12m', 'huggingface:conceptual_captions', 'huggingface:conll2000', 'huggingface:conll2002', 'huggingface:conll2003', 'huggingface:conll2012_ontonotesv5', 'huggingface:conllpp', 'huggingface:consumer-finance-complaints', 'huggingface:conv_ai', 'huggingface:conv_ai_2', 'huggingface:conv_ai_3', 'huggingface:conv_questions', 'huggingface:coqa', 'huggingface:cord19', 'huggingface:cornell_movie_dialog', 'huggingface:cos_e', 'huggingface:cosmos_qa', 'huggingface:counter', 'huggingface:covid_qa_castorini', 'huggingface:covid_qa_deepset', 'huggingface:covid_qa_ucsd', 'huggingface:covid_tweets_japanese', 'huggingface:covost2', 'huggingface:cppe-5', 'huggingface:craigslist_bargains', 'huggingface:crawl_domain', 'huggingface:crd3', 'huggingface:crime_and_punish', 'huggingface:crows_pairs', 'huggingface:cryptonite', 'huggingface:cs_restaurants', 'huggingface:cuad', 'huggingface:curiosity_dialogs', 'huggingface:daily_dialog', 'huggingface:dane', 'huggingface:danish_political_comments', 'huggingface:dart', 'huggingface:datacommons_factcheck', 'huggingface:dbpedia_14', 'huggingface:dbrd', 'huggingface:deal_or_no_dialog', 'huggingface:definite_pronoun_resolution', 'huggingface:dengue_filipino', 'huggingface:dialog_re', 'huggingface:diplomacy_detection', 'huggingface:disaster_response_messages', 'huggingface:discofuse', 'huggingface:discovery', 'huggingface:disfl_qa', 'huggingface:doc2dial', 'huggingface:docred', 'huggingface:doqa', 'huggingface:dream', 
'huggingface:drop', 'huggingface:duorc', 'huggingface:dutch_social', 'huggingface:dyk', 'huggingface:e2e_nlg', 'huggingface:e2e_nlg_cleaned', 'huggingface:ecb', 'huggingface:ecthr_cases', 'huggingface:eduge', 'huggingface:ehealth_kd', 'huggingface:eitb_parcc', 'huggingface:electricity_load_diagrams', 'huggingface:eli5', 'huggingface:eli5_category', 'huggingface:elkarhizketak', 'huggingface:emea', 'huggingface:emo', 'huggingface:emotion', 'huggingface:emotone_ar', 'huggingface:empathetic_dialogues', 'huggingface:enriched_web_nlg', 'huggingface:enwik8', 'huggingface:eraser_multi_rc', 'huggingface:esnli', 'huggingface:eth_py150_open', 'huggingface:ethos', 'huggingface:ett', 'huggingface:eu_regulatory_ir', 'huggingface:eurlex', 'huggingface:euronews', 'huggingface:europa_eac_tm', 'huggingface:europa_ecdc_tm', 'huggingface:europarl_bilingual', 'huggingface:event2Mind', 'huggingface:evidence_infer_treatment', 'huggingface:exams', 'huggingface:factckbr', 'huggingface:fake_news_english', 'huggingface:fake_news_filipino', 'huggingface:farsi_news', 'huggingface:fashion_mnist', 'huggingface:fever', 'huggingface:few_rel', 'huggingface:financial_phrasebank', 'huggingface:finer', 'huggingface:flores', 'huggingface:flue', 'huggingface:food101', 'huggingface:fquad', 'huggingface:freebase_qa', 'huggingface:gap', 'huggingface:gem', 'huggingface:generated_reviews_enth', 'huggingface:generics_kb', 'huggingface:german_legal_entity_recognition', 'huggingface:germaner', 'huggingface:germeval_14', 'huggingface:giga_fren', 'huggingface:gigaword', 'huggingface:glucose', 'huggingface:glue', 'huggingface:gnad10', 'huggingface:go_emotions', 'huggingface:gooaq', 'huggingface:google_wellformed_query', 'huggingface:grail_qa', 'huggingface:great_code', 'huggingface:greek_legal_code', 'huggingface:gsm8k', 'huggingface:guardian_authorship', 'huggingface:gutenberg_time', 'huggingface:hans', 'huggingface:hansards', 'huggingface:hard', 'huggingface:harem', 'huggingface:has_part', 'huggingface:hate_offensive', 'huggingface:hate_speech18', 'huggingface:hate_speech_filipino', 'huggingface:hate_speech_offensive', 'huggingface:hate_speech_pl', 'huggingface:hate_speech_portuguese', 'huggingface:hatexplain', 'huggingface:hausa_voa_ner', 'huggingface:hausa_voa_topics', 'huggingface:hda_nli_hindi', 'huggingface:head_qa', 'huggingface:health_fact', 'huggingface:hebrew_projectbenyehuda', 'huggingface:hebrew_sentiment', 'huggingface:hebrew_this_world', 'huggingface:hellaswag', 'huggingface:hendrycks_test', 'huggingface:hind_encorp', 'huggingface:hindi_discourse', 'huggingface:hippocorpus', 'huggingface:hkcancor', 'huggingface:hlgd', 'huggingface:hope_edi', 'huggingface:hotpot_qa', 'huggingface:hover', 'huggingface:hrenwac_para', 'huggingface:hrwac', 'huggingface:humicroedit', 'huggingface:hybrid_qa', 'huggingface:hyperpartisan_news_detection', 'huggingface:iapp_wiki_qa_squad', 'huggingface:id_clickbait', 'huggingface:id_liputan6', 'huggingface:id_nergrit_corpus', 'huggingface:id_newspapers_2018', 'huggingface:id_panl_bppt', 'huggingface:id_puisi', 'huggingface:igbo_english_machine_translation', 'huggingface:igbo_monolingual', 'huggingface:igbo_ner', 'huggingface:ilist', 'huggingface:imagenet-1k', 'huggingface:imagenet_sketch', 'huggingface:imdb', 'huggingface:imdb_urdu_reviews', 'huggingface:imppres', 'huggingface:indic_glue', 'huggingface:indonli', 'huggingface:indonlu', 'huggingface:inquisitive_qg', 'huggingface:interpress_news_category_tr', 'huggingface:interpress_news_category_tr_lite', 'huggingface:irc_disentangle', 
'huggingface:isixhosa_ner_corpus', 'huggingface:isizulu_ner_corpus', 'huggingface:iwslt2017', 'huggingface:jeopardy', 'huggingface:jfleg', 'huggingface:jigsaw_toxicity_pred', 'huggingface:jigsaw_unintended_bias', 'huggingface:jnlpba', 'huggingface:journalists_questions', 'huggingface:kan_hope', 'huggingface:kannada_news', 'huggingface:kd_conv', 'huggingface:kde4', 'huggingface:kelm', 'huggingface:kilt_tasks', 'huggingface:kilt_wikipedia', 'huggingface:kinnews_kirnews', 'huggingface:klue', 'huggingface:kor_3i4k', 'huggingface:kor_hate', 'huggingface:kor_ner', 'huggingface:kor_nli', 'huggingface:kor_nlu', 'huggingface:kor_qpair', 'huggingface:kor_sae', 'huggingface:kor_sarcasm', 'huggingface:labr', 'huggingface:lama', 'huggingface:lambada', 'huggingface:large_spanish_corpus', 'huggingface:laroseda', 'huggingface:lc_quad', 'huggingface:lccc', 'huggingface:lener_br', 'huggingface:lex_glue', 'huggingface:liar', 'huggingface:librispeech_asr', 'huggingface:librispeech_lm', 'huggingface:limit', 'huggingface:lince', 'huggingface:linnaeus', 'huggingface:liveqa', 'huggingface:lj_speech', 'huggingface:lm1b', 'huggingface:lst20', 'huggingface:m_lama', 'huggingface:mac_morpho', 'huggingface:makhzan', 'huggingface:masakhaner', 'huggingface:math_dataset', 'huggingface:math_qa', 'huggingface:matinf', 'huggingface:mbpp', 'huggingface:mc4', 'huggingface:mc_taco', 'huggingface:md_gender_bias', 'huggingface:mdd', 'huggingface:med_hop', 'huggingface:medal', 'huggingface:medical_dialog', 'huggingface:medical_questions_pairs', 'huggingface:medmcqa', 'huggingface:menyo20k_mt', 'huggingface:meta_woz', 'huggingface:metashift', 'huggingface:metooma', 'huggingface:metrec', 'huggingface:miam', 'huggingface:mkb', 'huggingface:mkqa', 'huggingface:mlqa', 'huggingface:mlsum', 'huggingface:mnist', 'huggingface:mocha', 'huggingface:monash_tsf', 'huggingface:moroco', 'huggingface:movie_rationales', 'huggingface:mrqa', 'huggingface:ms_marco', 'huggingface:ms_terms', 'huggingface:msr_genomics_kbcomp', 'huggingface:msr_sqa', 'huggingface:msr_text_compression', 'huggingface:msr_zhen_translation_parity', 'huggingface:msra_ner', 'huggingface:mt_eng_vietnamese', 'huggingface:muchocine', 'huggingface:multi_booked', 'huggingface:multi_eurlex', 'huggingface:multi_news', 'huggingface:multi_nli', 'huggingface:multi_nli_mismatch', 'huggingface:multi_para_crawl', 'huggingface:multi_re_qa', 'huggingface:multi_woz_v22', 'huggingface:multi_x_science_sum', 'huggingface:multidoc2dial', 'huggingface:multilingual_librispeech', 'huggingface:mutual_friends', 'huggingface:mwsc', 'huggingface:myanmar_news', 'huggingface:narrativeqa', 'huggingface:narrativeqa_manual', 'huggingface:natural_questions', 'huggingface:ncbi_disease', 'huggingface:nchlt', 'huggingface:ncslgr', 'huggingface:nell', 'huggingface:neural_code_search', 'huggingface:news_commentary', 'huggingface:newsgroup', 'huggingface:newsph', 'huggingface:newsph_nli', 'huggingface:newspop', 'huggingface:newsqa', 'huggingface:newsroom', 'huggingface:nkjp-ner', 'huggingface:nli_tr', 'huggingface:nlu_evaluation_data', 'huggingface:norec', 'huggingface:norne', 'huggingface:norwegian_ner', 'huggingface:nq_open', 'huggingface:nsmc', 'huggingface:numer_sense', 'huggingface:numeric_fused_head', 'huggingface:oclar', 'huggingface:offcombr', 'huggingface:offenseval2020_tr', 'huggingface:offenseval_dravidian', 'huggingface:ofis_publik', 'huggingface:ohsumed', 'huggingface:ollie', 'huggingface:omp', 'huggingface:onestop_english', 'huggingface:onestop_qa', 'huggingface:open_subtitles', 
'huggingface:openai_humaneval', 'huggingface:openbookqa', 'huggingface:openslr', 'huggingface:openwebtext', 'huggingface:opinosis', 'huggingface:opus100', 'huggingface:opus_books', 'huggingface:opus_dgt', 'huggingface:opus_dogc', 'huggingface:opus_elhuyar', 'huggingface:opus_euconst', 'huggingface:opus_finlex', 'huggingface:opus_fiskmo', 'huggingface:opus_gnome', 'huggingface:opus_infopankki', 'huggingface:opus_memat', 'huggingface:opus_montenegrinsubs', 'huggingface:opus_openoffice', 'huggingface:opus_paracrawl', 'huggingface:opus_rf', 'huggingface:opus_tedtalks', 'huggingface:opus_ubuntu', 'huggingface:opus_wikipedia', 'huggingface:opus_xhosanavy', 'huggingface:orange_sum', 'huggingface:oscar', 'huggingface:para_crawl', 'huggingface:para_pat', 'huggingface:parsinlu_reading_comprehension', 'huggingface:pass', 'huggingface:paws', 'huggingface:paws-x', 'huggingface:pec', 'huggingface:peer_read', 'huggingface:peoples_daily_ner', 'huggingface:per_sent', 'huggingface:persian_ner', 'huggingface:pg19', 'huggingface:php', 'huggingface:piaf', 'huggingface:pib', 'huggingface:piqa', 'huggingface:pn_summary', 'huggingface:poem_sentiment', 'huggingface:polemo2', 'huggingface:poleval2019_cyberbullying', 'huggingface:poleval2019_mt', 'huggingface:polsum', 'huggingface:polyglot_ner', 'huggingface:prachathai67k', 'huggingface:pragmeval', 'huggingface:proto_qa', 'huggingface:psc', 'huggingface:ptb_text_only', 'huggingface:pubmed', 'huggingface:pubmed_qa', 'huggingface:py_ast', 'huggingface:qa4mre', 'huggingface:qa_srl', 'huggingface:qa_zre', 'huggingface:qangaroo', 'huggingface:qanta', 'huggingface:qasc', 'huggingface:qasper', 'huggingface:qed', 'huggingface:qed_amara', 'huggingface:quac', 'huggingface:quail', 'huggingface:quarel', 'huggingface:quartz', 'huggingface:quickdraw', 'huggingface:quora', 'huggingface:quoref', 'huggingface:race', 'huggingface:re_dial', 'huggingface:reasoning_bg', 'huggingface:recipe_nlg', 'huggingface:reclor', 'huggingface:red_caps', 'huggingface:reddit', 'huggingface:reddit_tifu', 'huggingface:refresd', 'huggingface:reuters21578', 'huggingface:riddle_sense', 'huggingface:ro_sent', 'huggingface:ro_sts', 'huggingface:ro_sts_parallel', 'huggingface:roman_urdu', 'huggingface:roman_urdu_hate_speech', 'huggingface:ronec', 'huggingface:ropes', 'huggingface:rotten_tomatoes', 'huggingface:russian_super_glue', 'huggingface:rvl_cdip', 'huggingface:s2orc', 'huggingface:samsum', 'huggingface:sanskrit_classic', 'huggingface:saudinewsnet', 'huggingface:sberquad', 'huggingface:sbu_captions', 'huggingface:scan', 'huggingface:scb_mt_enth_2020', 'huggingface:scene_parse_150', 'huggingface:schema_guided_dstc8', 'huggingface:scicite', 'huggingface:scielo', 'huggingface:scientific_papers', 'huggingface:scifact', 'huggingface:sciq', 'huggingface:scitail', 'huggingface:scitldr', 'huggingface:search_qa', 'huggingface:sede', 'huggingface:selqa', 'huggingface:sem_eval_2010_task_8', 'huggingface:sem_eval_2014_task_1', 'huggingface:sem_eval_2018_task_1', 'huggingface:sem_eval_2020_task_11', 'huggingface:sent_comp', 'huggingface:senti_lex', 'huggingface:senti_ws', 'huggingface:sentiment140', 'huggingface:sepedi_ner', 'huggingface:sesotho_ner_corpus', 'huggingface:setimes', 'huggingface:setswana_ner_corpus', 'huggingface:sharc', 'huggingface:sharc_modified', 'huggingface:sick', 'huggingface:silicone', 'huggingface:simple_questions_v2', 'huggingface:siswati_ner_corpus', 'huggingface:smartdata', 'huggingface:sms_spam', 'huggingface:snips_built_in_intents', 'huggingface:snli', 
'huggingface:snow_simplified_japanese_corpus', 'huggingface:so_stacksample', 'huggingface:social_bias_frames', 'huggingface:social_i_qa', 'huggingface:sofc_materials_articles', 'huggingface:sogou_news', 'huggingface:spanish_billion_words', 'huggingface:spc', 'huggingface:species_800', 'huggingface:speech_commands', 'huggingface:spider', 'huggingface:squad', 'huggingface:squad_adversarial', 'huggingface:squad_es', 'huggingface:squad_it', 'huggingface:squad_kor_v1', 'huggingface:squad_kor_v2', 'huggingface:squad_v1_pt', 'huggingface:squad_v2', 'huggingface:squadshifts', 'huggingface:srwac', 'huggingface:sst', 'huggingface:stereoset', 'huggingface:story_cloze', 'huggingface:stsb_mt_sv', 'huggingface:stsb_multi_mt', 'huggingface:style_change_detection', 'huggingface:subjqa', 'huggingface:super_glue', 'huggingface:superb', 'huggingface:svhn', 'huggingface:swag', 'huggingface:swahili', 'huggingface:swahili_news', 'huggingface:swda', 'huggingface:swedish_medical_ner', 'huggingface:swedish_ner_corpus', 'huggingface:swedish_reviews', 'huggingface:swiss_judgment_prediction', 'huggingface:tab_fact', 'huggingface:tamilmixsentiment', 'huggingface:tanzil', 'huggingface:tapaco', 'huggingface:tashkeela', 'huggingface:taskmaster1', 'huggingface:taskmaster2', 'huggingface:taskmaster3', 'huggingface:tatoeba', 'huggingface:ted_hrlr', 'huggingface:ted_iwlst2013', 'huggingface:ted_multi', 'huggingface:ted_talks_iwslt', 'huggingface:telugu_books', 'huggingface:telugu_news', 'huggingface:tep_en_fa_para', 'huggingface:text2log', ...]
Downloading the CelebA dataset
celeba_bldr = tfds.builder('celeb_a')
print(celeba_bldr.info.features)
print('\n', 30*"=", '\n')
print(celeba_bldr.info.features.keys())
print('\n', 30*"=", '\n')
print(celeba_bldr.info.features['image'])
print('\n', 30*"=", '\n')
print(celeba_bldr.info.features['attributes'].keys())
print('\n', 30*"=", '\n')
print(celeba_bldr.info.citation)
FeaturesDict({ 'attributes': FeaturesDict({ '5_o_Clock_Shadow': bool, 'Arched_Eyebrows': bool, 'Attractive': bool, 'Bags_Under_Eyes': bool, 'Bald': bool, 'Bangs': bool, 'Big_Lips': bool, 'Big_Nose': bool, 'Black_Hair': bool, 'Blond_Hair': bool, 'Blurry': bool, 'Brown_Hair': bool, 'Bushy_Eyebrows': bool, 'Chubby': bool, 'Double_Chin': bool, 'Eyeglasses': bool, 'Goatee': bool, 'Gray_Hair': bool, 'Heavy_Makeup': bool, 'High_Cheekbones': bool, 'Male': bool, 'Mouth_Slightly_Open': bool, 'Mustache': bool, 'Narrow_Eyes': bool, 'No_Beard': bool, 'Oval_Face': bool, 'Pale_Skin': bool, 'Pointy_Nose': bool, 'Receding_Hairline': bool, 'Rosy_Cheeks': bool, 'Sideburns': bool, 'Smiling': bool, 'Straight_Hair': bool, 'Wavy_Hair': bool, 'Wearing_Earrings': bool, 'Wearing_Hat': bool, 'Wearing_Lipstick': bool, 'Wearing_Necklace': bool, 'Wearing_Necktie': bool, 'Young': bool, }), 'identity': FeaturesDict({ 'Identity_No': int64, }), 'image': Image(shape=(218, 178, 3), dtype=uint8), 'landmarks': FeaturesDict({ 'lefteye_x': int64, 'lefteye_y': int64, 'leftmouth_x': int64, 'leftmouth_y': int64, 'nose_x': int64, 'nose_y': int64, 'righteye_x': int64, 'righteye_y': int64, 'rightmouth_x': int64, 'rightmouth_y': int64, }), }) ============================== dict_keys(['image', 'landmarks', 'attributes', 'identity']) ============================== Image(shape=(218, 178, 3), dtype=uint8) ============================== dict_keys(['5_o_Clock_Shadow', 'Arched_Eyebrows', 'Attractive', 'Bags_Under_Eyes', 'Bald', 'Bangs', 'Big_Lips', 'Big_Nose', 'Black_Hair', 'Blond_Hair', 'Blurry', 'Brown_Hair', 'Bushy_Eyebrows', 'Chubby', 'Double_Chin', 'Eyeglasses', 'Goatee', 'Gray_Hair', 'Heavy_Makeup', 'High_Cheekbones', 'Male', 'Mouth_Slightly_Open', 'Mustache', 'Narrow_Eyes', 'No_Beard', 'Oval_Face', 'Pale_Skin', 'Pointy_Nose', 'Receding_Hairline', 'Rosy_Cheeks', 'Sideburns', 'Smiling', 'Straight_Hair', 'Wavy_Hair', 'Wearing_Earrings', 'Wearing_Hat', 'Wearing_Lipstick', 'Wearing_Necklace', 'Wearing_Necktie', 'Young']) ============================== @inproceedings{conf/iccv/LiuLWT15, added-at = {2018-10-09T00:00:00.000+0200}, author = {Liu, Ziwei and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou}, biburl = {https://www.bibsonomy.org/bibtex/250e4959be61db325d2f02c1d8cd7bfbb/dblp}, booktitle = {ICCV}, crossref = {conf/iccv/2015}, ee = {http://doi.ieeecomputersociety.org/10.1109/ICCV.2015.425}, interhash = {3f735aaa11957e73914bbe2ca9d5e702}, intrahash = {50e4959be61db325d2f02c1d8cd7bfbb}, isbn = {978-1-4673-8391-2}, keywords = {dblp}, pages = {3730-3738}, publisher = {IEEE Computer Society}, timestamp = {2018-10-11T11:43:28.000+0200}, title = {Deep Learning Face Attributes in the Wild.}, url = {http://dblp.uni-trier.de/db/conf/iccv/iccv2015.html#LiuLWT15}, year = 2015 }
# If an error occurs while downloading the CelebA data in the cell below, manually download
# the four files from https://git.io/JL5GM and copy them to ~/tensorflow_datasets/downloads/manual.
# Alternatively, you can download them from the translator's Drive using the gdown package.
import gdown
gdown.download(id='1vDDFjykRuzEagaHwwIlv6hpaHDgEUcGo', output='img_align_celeba.zip')
gdown.download(id='1XDTGJ2-QNMvkIdFwlvYAO28uc9v2d9PO', output='list_attr_celeba.txt')
gdown.download(id='1V6zDszhMCokTZTh4EZ48RB9wslY_YSCl', output='list_eval_partition.txt')
gdown.download(id='1iwam-RFy3tuh0yj29kK9tgEJOqGrH4_r', output='list_landmarks_align_celeba.txt')
!mkdir -p ~/tensorflow_datasets/downloads/manual
!cp img_align_celeba.zip ~/tensorflow_datasets/downloads/manual
!cp list_attr_celeba.txt ~/tensorflow_datasets/downloads/manual
!cp list_eval_partition.txt ~/tensorflow_datasets/downloads/manual
!cp list_landmarks_align_celeba.txt ~/tensorflow_datasets/downloads/manual
Downloading... From: https://drive.google.com/uc?id=1vDDFjykRuzEagaHwwIlv6hpaHDgEUcGo To: /content/img_align_celeba.zip 100%|██████████| 1.44G/1.44G [00:26<00:00, 55.3MB/s] Downloading... From: https://drive.google.com/uc?id=1XDTGJ2-QNMvkIdFwlvYAO28uc9v2d9PO To: /content/list_attr_celeba.txt 100%|██████████| 26.7M/26.7M [00:00<00:00, 60.1MB/s] Downloading... From: https://drive.google.com/uc?id=1V6zDszhMCokTZTh4EZ48RB9wslY_YSCl To: /content/list_eval_partition.txt 100%|██████████| 2.84M/2.84M [00:00<00:00, 235MB/s] Downloading... From: https://drive.google.com/uc?id=1iwam-RFy3tuh0yj29kK9tgEJOqGrH4_r To: /content/list_landmarks_align_celeba.txt 100%|██████████| 12.2M/12.2M [00:00<00:00, 54.7MB/s]
# Download the data and save it to disk
celeba_bldr.download_and_prepare()
Downloading and preparing dataset 1.39 GiB (download: 1.39 GiB, generated: 1.63 GiB, total: 3.01 GiB) to /root/tensorflow_datasets/celeb_a/2.1.0...
Dl Completed...: 0 url [00:00, ? url/s]
Dl Size...: 0 MiB [00:00, ? MiB/s]
Generating splits...: 0%| | 0/3 [00:00<?, ? splits/s]
Generating train examples...: 0%| | 0/162770 [00:00<?, ? examples/s]
Shuffling /root/tensorflow_datasets/celeb_a/2.1.0.incomplete7TEVX3/celeb_a-train.tfrecord*...: 0%| …
Generating validation examples...: 0%| | 0/19867 [00:00<?, ? examples/s]
Shuffling /root/tensorflow_datasets/celeb_a/2.1.0.incomplete7TEVX3/celeb_a-validation.tfrecord*...: 0%| …
Generating test examples...: 0%| | 0/19962 [00:00<?, ? examples/s]
Shuffling /root/tensorflow_datasets/celeb_a/2.1.0.incomplete7TEVX3/celeb_a-test.tfrecord*...: 0%| |…
Dataset celeb_a downloaded and prepared to /root/tensorflow_datasets/celeb_a/2.1.0. Subsequent calls will reuse this data.
# Load the data from disk as tf.data.Datasets
datasets = celeba_bldr.as_dataset(shuffle_files=False)
datasets.keys()
dict_keys([Split('train'), Split('validation'), Split('test')])
#import tensorflow as tf
ds_train = datasets['train']
assert isinstance(ds_train, tf.data.Dataset)
example = next(iter(ds_train))
print(type(example))
print(example.keys())
<class 'dict'>
dict_keys(['attributes', 'identity', 'image', 'landmarks'])
ds_train = ds_train.map(lambda item:
(item['image'], tf.cast(item['attributes']['Male'], tf.int32)))
ds_train = ds_train.batch(18)
images, labels = next(iter(ds_train))
print(images.shape, labels)
(18, 218, 178, 3) tf.Tensor([0 1 0 0 1 1 1 1 1 0 0 0 1 0 0 1 1 1], shape=(18,), dtype=int32)
fig = plt.figure(figsize=(12, 8))
for i, (image, label) in enumerate(zip(images, labels)):
    ax = fig.add_subplot(3, 6, i+1)
    ax.set_xticks([]); ax.set_yticks([])
    ax.imshow(image)
    ax.set_title('{}'.format(label), size=15)
# plt.savefig('images/13_3.png', dpi=300)
plt.show()
Here is another way to load a dataset.
mnist, mnist_info = tfds.load('mnist', with_info=True,
shuffle_files=False)
print(mnist_info)
print(mnist.keys())
Downloading and preparing dataset 11.06 MiB (download: 11.06 MiB, generated: 21.00 MiB, total: 32.06 MiB) to /root/tensorflow_datasets/mnist/3.0.1...
Dl Completed...: 0%| | 0/5 [00:00<?, ? file/s]
Dataset mnist downloaded and prepared to /root/tensorflow_datasets/mnist/3.0.1. Subsequent calls will reuse this data. tfds.core.DatasetInfo( name='mnist', full_name='mnist/3.0.1', description=""" The MNIST database of handwritten digits. """, homepage='http://yann.lecun.com/exdb/mnist/', data_dir='/root/tensorflow_datasets/mnist/3.0.1.incompleteL38ZZB', file_format=tfrecord, download_size=11.06 MiB, dataset_size=21.00 MiB, features=FeaturesDict({ 'image': Image(shape=(28, 28, 1), dtype=uint8), 'label': ClassLabel(shape=(), dtype=int64, num_classes=10), }), supervised_keys=('image', 'label'), disable_shuffling=False, splits={ 'test': <SplitInfo num_examples=10000, num_shards=1>, 'train': <SplitInfo num_examples=60000, num_shards=1>, }, citation="""@article{lecun2010mnist, title={MNIST handwritten digit database}, author={LeCun, Yann and Cortes, Corinna and Burges, CJ}, journal={ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist}, volume={2}, year={2010} }""", ) dict_keys(['test', 'train'])
ds_train = mnist['train']
assert isinstance(ds_train, tf.data.Dataset)
ds_train = ds_train.map(lambda item:
(item['image'], item['label']))
ds_train = ds_train.batch(10)
batch = next(iter(ds_train))
print(batch[0].shape, batch[1])
fig = plt.figure(figsize=(15, 6))
for i, (image, label) in enumerate(zip(batch[0], batch[1])):
    ax = fig.add_subplot(2, 5, i+1)
    ax.set_xticks([]); ax.set_yticks([])
    ax.imshow(image[:, :, 0], cmap='gray_r')
    ax.set_title('{}'.format(label), size=15)
# plt.savefig('images/13_4.png', dpi=300)
plt.show()
(10, 28, 28, 1) tf.Tensor([4 1 0 7 8 1 2 7 1 6], shape=(10,), dtype=int64)