Why does the Keras model predict slower after compile?

I have noticed that after the .compile() call, the model predicts a lot slower, even before any training. This means it can affect model speed in real-time inference, such as object detection on a webcam. This experiment tries to reproduce the issue as clearly as possible.
See related question here: https://stackoverflow.com/q/58378374/2593810
import tensorflow as tf
kr = tf.keras
import numpy as np
np.set_printoptions(suppress=True)
tf.__version__, kr.__version__, np.__version__
('2.0.0', '2.2.4-tf', '1.16.5')
model = kr.Sequential([
    kr.layers.Dense(2000, activation='relu', input_shape=(5,)),
    kr.layers.Dense(2000, activation='relu'),
    kr.layers.Dense(5, activation='softmax')
])
model.summary()
Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= dense (Dense) (None, 2000) 12000 _________________________________________________________________ dense_1 (Dense) (None, 2000) 4002000 _________________________________________________________________ dense_2 (Dense) (None, 5) 10005 ================================================================= Total params: 4,024,005 Trainable params: 4,024,005 Non-trainable params: 0 _________________________________________________________________
x = np.random.random((1, 5))
model.predict(x)
array([[0.19717476, 0.19599418, 0.2063633 , 0.20119473, 0.19927306]], dtype=float32)
%%timeit -n 20
model.predict(x)
2.93 ms ± 278 µs per loop (mean ± std. dev. of 7 runs, 20 loops each)
model.compile(kr.optimizers.SGD(momentum=0.9), 'sparse_categorical_crossentropy', ['acc'])
model.predict(x)
array([[0.19717476, 0.19599418, 0.2063633 , 0.20119473, 0.19927306]], dtype=float32)
%%timeit -n 20
model.predict(x)
27 ms ± 770 µs per loop (mean ± std. dev. of 7 runs, 20 loops each)
Notice that prediction after compile() is significantly slower than before: roughly 27 ms versus 3 ms per call.
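If you want to reproduce these measurements outside a notebook, a rough plain-Python equivalent of the %%timeit cells using the standard timeit module might look like the sketch below (the repeat and loop counts are my choice, not the notebook defaults):

import timeit
# 7 runs of 20 predict() calls each, mirroring %%timeit -n 20 above
runs = timeit.repeat(lambda: model.predict(x), repeat=7, number=20)
print('%.2f ms per call' % (min(runs) / 20 * 1000))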
from sklearn.model_selection import train_test_split
# create a dummy dataset: the label y is the index of the maximum value in each row of X
X = np.random.random((5000, 5))
Y = np.argmax(X, axis=1)
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)
%%time
model.fit(x_train, y_train, epochs=5, validation_split=0.2)
Train on 3200 samples, validate on 800 samples
Epoch 1/5
3200/3200 [==============================] - 3s 1ms/sample - loss: 1.3513 - acc: 0.6631 - val_loss: 0.9611 - val_acc: 0.8288
Epoch 2/5
3200/3200 [==============================] - 3s 867us/sample - loss: 0.6527 - acc: 0.8747 - val_loss: 0.4152 - val_acc: 0.9488
Epoch 3/5
3200/3200 [==============================] - 3s 907us/sample - loss: 0.3554 - acc: 0.9244 - val_loss: 0.2873 - val_acc: 0.9438
Epoch 4/5
3200/3200 [==============================] - 3s 908us/sample - loss: 0.2685 - acc: 0.9328 - val_loss: 0.2171 - val_acc: 0.9538
Epoch 5/5
3200/3200 [==============================] - 3s 955us/sample - loss: 0.2231 - acc: 0.9403 - val_loss: 0.1946 - val_acc: 0.9438
Wall time: 15.2 s
<tensorflow.python.keras.callbacks.History at 0x11c9c72e588>
model.evaluate(x_test, y_test, batch_size=128, verbose=0)
[0.2094351042509079, 0.942]
x, model.predict(x)
(array([[0.71759054, 0.88347487, 0.21729862, 0.01851623, 0.87170631]]), array([[0.02285822, 0.7512861 , 0.00000684, 0.00000017, 0.22584875]], dtype=float32))
%%timeit -n 20
model.predict(x)
28.3 ms ± 1.9 ms per loop (mean ± std. dev. of 7 runs, 20 loops each)
Notice that prediction is still slow after fitting.
model.save('model.h5', include_optimizer=False, save_format='h5')
model2 = kr.models.load_model('model.h5')
WARNING:tensorflow:No training configuration found in save file: the model was *not* compiled. Compile it manually.
%%timeit -n 20
model2.predict(x)
3.28 ms ± 555 µs per loop (mean ± std. dev. of 7 runs, 20 loops each)
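As a quick sanity check (not part of the original run), the reloaded model should produce exactly the same outputs as the compiled one, since saving only dropped the training configuration:

# expected to be True: the weights are unchanged, only the optimizer was stripped
np.allclose(model.predict(x), model2.predict(x))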
The model is indeed slower after compile(), but why does that happen? I don't know. I'm quite certain it is a bug, or at least an unintended surprise. As a user, you expect the model to run as fast as possible when calling predict(), because it is the only way you have to get predictions from the model.
Think about the numpy equivalent: Dense layers are simply a bunch of matrix multiplications, vector additions, and nonlinear activations, applied to the input x and the weights inside the model. As long as no garbage work is being computed, that cost is fixed, so predict() should always take roughly constant time. If it deviates from this, it is likely a bug.
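To make that concrete, here is a minimal numpy sketch of the forward pass those three Dense layers perform, with relu and softmax written out by hand and the weights pulled from the trained model:

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

def numpy_predict(model, x):
    # replay the same fixed sequence of ops: two relu layers, then softmax
    (w1, b1), (w2, b2), (w3, b3) = [l.get_weights() for l in model.layers]
    h1 = relu(x @ w1 + b1)
    h2 = relu(h1 @ w2 + b2)
    return softmax(h2 @ w3 + b3)

# should match model.predict(x) up to float32 rounding
np.allclose(numpy_predict(model, x), model.predict(x), atol=1e-6)

The numpy version does the same fixed amount of work on every call, which is why a roughly tenfold slowdown from compile() alone looks suspicious.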
Let's see the solution that we will get from this GitHub issue: https://github.com/tensorflow/tensorflow/issues/33340