Why does the Keras model predict slower after compile?

I have noticed that after the .compile() call, the model predicts a lot slower, even before any training. This means it can affect model speed in real-time inference, such as object detection on a webcam. This experiment tries to reproduce the issue as clearly as possible.
See related question here: https://stackoverflow.com/q/58378374/2593810
import tensorflow as tf
kr = tf.keras
import numpy as np
np.set_printoptions(suppress=True)
tf.__version__, kr.__version__, np.__version__
('2.0.0', '2.2.4-tf', '1.16.5')
model = kr.Sequential([
    kr.layers.Dense(2000, activation='relu', input_shape=(5,)),
    kr.layers.Dense(2000, activation='relu'),
    kr.layers.Dense(5, activation='softmax')
])
model.summary()
Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= dense (Dense) (None, 2000) 12000 _________________________________________________________________ dense_1 (Dense) (None, 2000) 4002000 _________________________________________________________________ dense_2 (Dense) (None, 5) 10005 ================================================================= Total params: 4,024,005 Trainable params: 4,024,005 Non-trainable params: 0 _________________________________________________________________
x = np.random.random((1, 5))
model.predict(x)
array([[0.19717476, 0.19599418, 0.2063633 , 0.20119473, 0.19927306]], dtype=float32)
%%timeit -n 20
model.predict(x)
2.93 ms ± 278 µs per loop (mean ± std. dev. of 7 runs, 20 loops each)
model.compile(kr.optimizers.SGD(momentum=0.9), 'sparse_categorical_crossentropy', ['acc'])
model.predict(x)
array([[0.19717476, 0.19599418, 0.2063633 , 0.20119473, 0.19927306]], dtype=float32)
%%timeit -n 20
model.predict(x)
27 ms ± 770 µs per loop (mean ± std. dev. of 7 runs, 20 loops each)
Notice that prediction after compile() is significantly slower than before: roughly 27 ms versus 3 ms per call.
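If you want to reproduce these measurements outside a notebook, a rough plain-Python equivalent of the %%timeit cells using the standard timeit module might look like the sketch below (the repeat and loop counts are my choice, not the notebook defaults):

import timeit
# 7 runs of 20 predict() calls each, mirroring %%timeit -n 20 above
runs = timeit.repeat(lambda: model.predict(x), repeat=7, number=20)
print('%.2f ms per call' % (min(runs) / 20 * 1000))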
from sklearn.model_selection import train_test_split
# create a dummy dataset: the label y is the index of the maximum value in each row of X
X = np.random.random((5000, 5))
Y = np.argmax(X, axis=1)
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)
%%time
model.fit(x_train, y_train, epochs=5, validation_split=0.2)
Train on 3200 samples, validate on 800 samples
Epoch 1/5
3200/3200 [==============================] - 3s 1ms/sample - loss: 1.3513 - acc: 0.6631 - val_loss: 0.9611 - val_acc: 0.8288
Epoch 2/5
3200/3200 [==============================] - 3s 867us/sample - loss: 0.6527 - acc: 0.8747 - val_loss: 0.4152 - val_acc: 0.9488
Epoch 3/5
3200/3200 [==============================] - 3s 907us/sample - loss: 0.3554 - acc: 0.9244 - val_loss: 0.2873 - val_acc: 0.9438
Epoch 4/5
3200/3200 [==============================] - 3s 908us/sample - loss: 0.2685 - acc: 0.9328 - val_loss: 0.2171 - val_acc: 0.9538
Epoch 5/5
3200/3200 [==============================] - 3s 955us/sample - loss: 0.2231 - acc: 0.9403 - val_loss: 0.1946 - val_acc: 0.9438
Wall time: 15.2 s
<tensorflow.python.keras.callbacks.History at 0x11c9c72e588>
model.evaluate(x_test, y_test, batch_size=128, verbose=0)
[0.2094351042509079, 0.942]
x, model.predict(x)
(array([[0.71759054, 0.88347487, 0.21729862, 0.01851623, 0.87170631]]), array([[0.02285822, 0.7512861 , 0.00000684, 0.00000017, 0.22584875]], dtype=float32))
%%timeit -n 20
model.predict(x)
28.3 ms ± 1.9 ms per loop (mean ± std. dev. of 7 runs, 20 loops each)
Notice that prediction is still slow after fitting.
model.save('model.h5', include_optimizer=False, save_format='h5')
model2 = kr.models.load_model('model.h5')
WARNING:tensorflow:No training configuration found in save file: the model was *not* compiled. Compile it manually.
%%timeit -n 20
model2.predict(x)
3.28 ms ± 555 µs per loop (mean ± std. dev. of 7 runs, 20 loops each)
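As a quick sanity check (not part of the original run), the reloaded model should produce exactly the same outputs as the compiled one, since saving only dropped the training configuration:

# expected to be True: the weights are unchanged, only the optimizer was stripped
np.allclose(model.predict(x), model2.predict(x))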
The model is indeed slower after compile(), but why does that happen? I don't know. I'm quite certain it is a bug, or at least an unintended surprise. As a user, you expect the model to run as fast as possible when calling predict(), because it is the only way you have to get predictions from the model.
Think about the numpy equivalent: Dense layers are simply a bunch of matrix multiplications, vector additions, and nonlinear activations, applied to the input x and the weights inside the model. As long as no garbage work is being computed, that cost is fixed, so predict() should always take roughly constant time. If it deviates from this, it is likely a bug.
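To make that concrete, here is a minimal numpy sketch of the forward pass those three Dense layers perform, with relu and softmax written out by hand and the weights pulled from the trained model:

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

def numpy_predict(model, x):
    # replay the same fixed sequence of ops: two relu layers, then softmax
    (w1, b1), (w2, b2), (w3, b3) = [l.get_weights() for l in model.layers]
    h1 = relu(x @ w1 + b1)
    h2 = relu(h1 @ w2 + b2)
    return softmax(h2 @ w3 + b3)

# should match model.predict(x) up to float32 rounding
np.allclose(numpy_predict(model, x), model.predict(x), atol=1e-6)

The numpy version does the same fixed amount of work on every call, which is why a roughly tenfold slowdown from compile() alone looks suspicious.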
Let's see the solution that we will get from this GitHub issue: https://github.com/tensorflow/tensorflow/issues/33340