I have noticed that a model is small before training but becomes much bigger after training. This can be observed from the file size when the model is saved to disk.
You can look at the question here: https://stackoverflow.com/q/57058178/2593810
import tensorflow as tf
from tensorflow import keras as kr
import numpy as np
import os
tf.__version__
'2.0.0'
def build_model():
    model = kr.Sequential([
        kr.layers.Dense(1000, 'relu', input_shape=(500,)),
        kr.layers.Dense(1000, 'relu'),
        kr.layers.Dense(1, 'sigmoid')
    ])
    model.compile('adam', 'binary_crossentropy', ['acc'])
    return model
def print_model_size(filename):
    print(f"{filename} size: {os.path.getsize(filename) / 1024 / 1024:.3f} MiB")
model_a = build_model()
model_a.summary()
fn = 'model_a.h5'
model_a.save(fn)
print_model_size(fn)
Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= dense (Dense) (None, 1000) 501000 _________________________________________________________________ dense_1 (Dense) (None, 1000) 1001000 _________________________________________________________________ dense_2 (Dense) (None, 1) 1001 ================================================================= Total params: 1,503,001 Trainable params: 1,503,001 Non-trainable params: 0 _________________________________________________________________ model_a.h5 size: 5.748 MiB
def create_y(x):
    return (x[:, [100, 200, 300, 400]].sum(1) > 2).astype('float32')
x_train = np.random.random((10000, 500))
x_test = np.random.random((2000, 500))
y_train = create_y(x_train)
y_test = create_y(x_test)
model_a.fit(x_train, y_train, validation_split=0.1, epochs=100, callbacks=[kr.callbacks.EarlyStopping(patience=5, restore_best_weights=True)])
Train on 9000 samples, validate on 1000 samples
Epoch 1/100
9000/9000 [==============================] - 5s 512us/sample - loss: 0.5887 - acc: 0.6672 - val_loss: 0.2988 - val_acc: 0.8920
Epoch 2/100
9000/9000 [==============================] - 3s 379us/sample - loss: 0.2613 - acc: 0.8850 - val_loss: 0.1835 - val_acc: 0.9270
Epoch 3/100
9000/9000 [==============================] - 3s 379us/sample - loss: 0.1957 - acc: 0.9153 - val_loss: 0.1744 - val_acc: 0.9220
Epoch 4/100
9000/9000 [==============================] - 4s 395us/sample - loss: 0.1417 - acc: 0.9413 - val_loss: 0.1274 - val_acc: 0.9470
Epoch 5/100
9000/9000 [==============================] - 3s 372us/sample - loss: 0.1284 - acc: 0.9464 - val_loss: 0.1563 - val_acc: 0.9230
Epoch 6/100
9000/9000 [==============================] - 3s 364us/sample - loss: 0.1577 - acc: 0.9313 - val_loss: 0.1393 - val_acc: 0.9390
Epoch 7/100
9000/9000 [==============================] - 3s 363us/sample - loss: 0.1401 - acc: 0.9403 - val_loss: 0.1368 - val_acc: 0.9360
Epoch 8/100
9000/9000 [==============================] - 3s 358us/sample - loss: 0.1217 - acc: 0.9476 - val_loss: 0.2715 - val_acc: 0.8800
Epoch 9/100
9000/9000 [==============================] - 3s 360us/sample - loss: 0.1355 - acc: 0.9427 - val_loss: 0.1444 - val_acc: 0.9390
<tensorflow.python.keras.callbacks.History at 0x29b1e866080>
fn = 'model_a_trained.h5'
model_a.save(fn)
print_model_size(fn)
model_a_trained.h5 size: 17.232 MiB
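The extra ~11.5 MiB is roughly what Adam's per-weight state accounts for. Here is a back-of-envelope check, assuming float32 weights (4 bytes per value):

n_params = model_a.count_params()           # 1,503,001
weights_mib = n_params * 4 / 1024 / 1024    # ~5.73 MiB of raw float32 weights
print(f"weights only:         {weights_mib:.3f} MiB")
# Adam keeps two slot tensors (m and v) per trainable weight,
# so the optimizer state roughly triples the payload.
print(f"weights + Adam slots: {weights_mib * 3:.3f} MiB")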
# copy weights of model A to model B
model_b = build_model()
model_b.set_weights(model_a.get_weights())
fn = 'model_b.h5'
model_b.save(fn)
print_model_size(fn)
model_b.h5 size: 5.748 MiB
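Note that get_weights()/set_weights() copies only the layers' kernels and biases, not the optimizer state, which is why model_b.h5 matches the untrained size:

# 6 arrays: a kernel and a bias for each of the three Dense layers
print(len(model_a.get_weights()))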
load_model = kr.models.load_model
model_a = load_model('model_a_trained.h5')
model_b = load_model('model_b.h5')
print(model_a.evaluate(x_train, y_train, verbose=0))
print(model_a.evaluate(x_test, y_test, verbose=0))
[0.0855224913239479, 0.974]
[0.12154238364100456, 0.9475]
print(model_b.evaluate(x_train, y_train, verbose=0))
print(model_b.evaluate(x_test, y_test, verbose=0))
[0.0855224913239479, 0.974]
[0.12154238364100456, 0.9475]
You will see that both model_a and model_b give the same accuracy, yet their disk space consumption is dramatically different.
This is because .fit() populates the optimizer with training state that is not needed for prediction, and model.save() writes it alongside the weights by default. In this case the extra data is the Adam optimizer's moment estimates (two additional tensors per trainable weight), which is why the file roughly triples in size. The overhead varies from optimizer to optimizer; with plain SGD it would be negligible, since SGD keeps no per-weight state.
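As a rough comparison (model_c and its filename are made up here for illustration), training the same architecture with plain SGD should keep the saved file close to the weights-only size:

model_c = build_model()
model_c.compile('sgd', 'binary_crossentropy', ['acc'])  # plain SGD: no per-weight slots
model_c.fit(x_train, y_train, epochs=1, verbose=0)
fn = 'model_c_sgd.h5'
model_c.save(fn)
print_model_size(fn)  # expected to stay around ~5.7 MiB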
So if you don't plan to train the model any further, you should save it with include_optimizer=False to reduce disk space consumption.
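For example (the filename below is just for illustration), this should bring the trained model's file back down to roughly the weights-only size:

fn = 'model_a_trained_no_opt.h5'
model_a.save(fn, include_optimizer=False)  # drop the Adam state
print_model_size(fn)  # expected: roughly 5.7 MiB again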