Magic Commands¶

Magic commands (those that start with %) are commands that modify the configuration of a Jupyter Notebook. A number of magic commands are available by default (see list here), and many more can be added with extensions. The magic command in this section lets matplotlib display our plots directly in the browser instead of saving them to a local file.

In [1]:

%matplotlib inline

Activity 8: Re-training a model dynamically¶

In this activity, we re-train our model every time new data is available.

First, we import cryptonic. Cryptonic is a simple software application developed for this course that implements all the steps up to this section using Python classes and modules. Consider Cryptonic a template for how you could develop similar applications.

In [2]:

import cryptonic
import pandas as pd

D:\Anaconda3\envs\machinelearn\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.

In [3]:

from tqdm import tqdm_notebook

In [4]:

import matplotlib.pyplot as plt

In [5]:

plt.style.use('seaborn-white')

In [6]:

from cryptonic.models.model import Model
from cryptonic.markets.coinmarketcap import CoinMarketCap
from cryptonic.models import normalizations

Fetching Real-time Data¶

Throughout this project we have been using data originally provided by CoinMarketCap. We have created an interface for collecting both real-time and historical data as part of cryptonic: the class CoinMarketCap().

In [7]:

print(CoinMarketCap())


        Crypto-currency data comes from the website CoinMarketCap.
        CoinMarketCap is can be accessed at: https://coinmarketcap.com/

        The permission to use the data is available on their FAQ

            https://coinmarketcap.com/faq/

        and reads:

            "Q: Am I allowed to use content (screenshots, data, graphs, etc.) 
            for one of my personal projects and/or commercial use?

            R: Absolutely! Feel free to use any content as you see fit. 
            We kindly ask that you cite us as a source."

Our model is designed to work with daily data. Let's go ahead and collect historic daily data from CoinMarketCap (this is the same data used previously).

In [8]:

historic_data = CoinMarketCap.historic()

In [9]:

# need to reverse dataframe so that earliest is first
historic_data = historic_data.iloc[::-1].reset_index(drop=True)

In [10]:

historic_data.head(3)

Out[10]:

	date	open	high	low	close	volume	market_cap
0	2013-04-28	135.30	135.98	132.10	134.21	None	1500519936
1	2013-04-29	134.44	147.49	134.00	144.54	None	1491160064
2	2013-04-30	144.00	146.93	134.05	139.00	None	1597779968

The data contains practically the same variables as our earlier dataset. However, much of it comes from an earlier period, and recent Bitcoin prices have been far more volatile than those of a few years ago. Before using this data in our model, let's filter it to dates on or after January 1, 2017.

In [11]:

#
#  Using the Pandas API, filter the dataframe
#  for observations from January 1, 2017 onward.
# 
#  Hint: use the `date` column / variable.
#
model_data = historic_data[historic_data['date'] >= '2017-01-01']
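The reversal and the date filter above can be tried on a toy dataframe. This is a minimal sketch: the rows are made up, and because the dtype of the `date` column in historic_data is not shown, the sketch assumes ISO-formatted strings and parses them before comparing. Comparing ISO date strings directly also works, since they sort lexicographically, but parsing avoids surprises with other formats.

```python
import pandas as pd

# Toy rows in newest-first order; the values are made up for illustration.
toy = pd.DataFrame({
    "date": ["2017-01-02", "2017-01-01", "2016-12-31"],
    "close": [3.0, 2.0, 1.0],
})

# Reverse so the earliest observation comes first, then renumber the index.
toy = toy.iloc[::-1].reset_index(drop=True)

# Parse the dates explicitly before comparing against the cutoff.
toy["date"] = pd.to_datetime(toy["date"])
cutoff = pd.Timestamp("2017-01-01")

# Keep observations on or after the cutoff.
filtered = toy[toy["date"] >= cutoff].reset_index(drop=True)

print(filtered["close"].tolist())
```

The 2016 row is dropped and the remaining rows stay in chronological order.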

The `Model()` Class¶

We have also created the class Model() which compiles all the code we have written so far. We will use that class to build, train, and evaluate our model.

In [12]:

M = Model(data=model_data,
          variable='close',
          predicted_period_size=7)

In [13]:

M.build()

Out[13]:

<keras.models.Sequential at 0xdc59be0>

In [14]:

M.train(epochs=100, verbose=1)

Epoch 1/100
1/1 [==============================] - 1s 577ms/step - loss: 0.0057
Epoch 2/100
1/1 [==============================] - 0s 18ms/step - loss: 0.0046
Epoch 3/100
1/1 [==============================] - 0s 21ms/step - loss: 0.0040
Epoch 4/100
1/1 [==============================] - 0s 17ms/step - loss: 0.0035
Epoch 5/100
1/1 [==============================] - 0s 17ms/step - loss: 0.0031
Epoch 6/100
1/1 [==============================] - 0s 20ms/step - loss: 0.0028
Epoch 7/100
1/1 [==============================] - 0s 16ms/step - loss: 0.0025
Epoch 8/100
1/1 [==============================] - 0s 17ms/step - loss: 0.0022
Epoch 9/100
1/1 [==============================] - 0s 17ms/step - loss: 0.0020
Epoch 10/100
1/1 [==============================] - 0s 17ms/step - loss: 0.0018
Epoch 11/100
1/1 [==============================] - 0s 18ms/step - loss: 0.0017
Epoch 12/100
1/1 [==============================] - 0s 18ms/step - loss: 0.0015
Epoch 13/100
1/1 [==============================] - 0s 20ms/step - loss: 0.0014
Epoch 14/100
1/1 [==============================] - 0s 17ms/step - loss: 0.0012
Epoch 15/100
1/1 [==============================] - 0s 17ms/step - loss: 0.0011
Epoch 16/100
1/1 [==============================] - 0s 20ms/step - loss: 0.0010
Epoch 17/100
1/1 [==============================] - 0s 16ms/step - loss: 9.4727e-04
Epoch 18/100
1/1 [==============================] - 0s 21ms/step - loss: 8.6410e-04
Epoch 19/100
1/1 [==============================] - 0s 20ms/step - loss: 7.8790e-04
Epoch 20/100
1/1 [==============================] - 0s 16ms/step - loss: 7.1792e-04
Epoch 21/100
1/1 [==============================] - 0s 23ms/step - loss: 6.5352e-04
Epoch 22/100
1/1 [==============================] - 0s 14ms/step - loss: 5.9417e-04
Epoch 23/100
1/1 [==============================] - 0s 15ms/step - loss: 5.3942e-04
Epoch 24/100
1/1 [==============================] - 0s 14ms/step - loss: 4.8890e-04
Epoch 25/100
1/1 [==============================] - 0s 16ms/step - loss: 4.4228e-04
Epoch 26/100
1/1 [==============================] - 0s 19ms/step - loss: 3.9927e-04
Epoch 27/100
1/1 [==============================] - 0s 21ms/step - loss: 3.5963e-04
Epoch 28/100
1/1 [==============================] - 0s 15ms/step - loss: 3.2312e-04
Epoch 29/100
1/1 [==============================] - 0s 15ms/step - loss: 2.8954e-04
Epoch 30/100
1/1 [==============================] - 0s 19ms/step - loss: 2.5869e-04
Epoch 31/100
1/1 [==============================] - 0s 16ms/step - loss: 2.3039e-04
Epoch 32/100
1/1 [==============================] - 0s 20ms/step - loss: 2.0448e-04
Epoch 33/100
1/1 [==============================] - 0s 17ms/step - loss: 1.8079e-04
Epoch 34/100
1/1 [==============================] - 0s 18ms/step - loss: 1.5918e-04
Epoch 35/100
1/1 [==============================] - 0s 19ms/step - loss: 1.3951e-04
Epoch 36/100
1/1 [==============================] - 0s 18ms/step - loss: 1.2167e-04
Epoch 37/100
1/1 [==============================] - 0s 14ms/step - loss: 1.0554e-04
Epoch 38/100
1/1 [==============================] - 0s 18ms/step - loss: 9.1008e-05
Epoch 39/100
1/1 [==============================] - 0s 17ms/step - loss: 7.7976e-05
Epoch 40/100
1/1 [==============================] - 0s 15ms/step - loss: 6.6348e-05
Epoch 41/100
1/1 [==============================] - 0s 16ms/step - loss: 5.6033e-05
Epoch 42/100
1/1 [==============================] - 0s 14ms/step - loss: 4.6941e-05
Epoch 43/100
1/1 [==============================] - 0s 14ms/step - loss: 3.8984e-05
Epoch 44/100
1/1 [==============================] - 0s 14ms/step - loss: 3.2075e-05
Epoch 45/100
1/1 [==============================] - 0s 15ms/step - loss: 2.6126e-05
Epoch 46/100
1/1 [==============================] - 0s 18ms/step - loss: 2.1053e-05
Epoch 47/100
1/1 [==============================] - 0s 18ms/step - loss: 1.6770e-05
Epoch 48/100
1/1 [==============================] - 0s 14ms/step - loss: 1.3194e-05
Epoch 49/100
1/1 [==============================] - 0s 14ms/step - loss: 1.0244e-05
Epoch 50/100
1/1 [==============================] - 0s 15ms/step - loss: 7.8409e-06
Epoch 51/100
1/1 [==============================] - 0s 15ms/step - loss: 5.9114e-06
Epoch 52/100
1/1 [==============================] - 0s 14ms/step - loss: 4.3849e-06
Epoch 53/100
1/1 [==============================] - 0s 17ms/step - loss: 3.1966e-06
Epoch 54/100
1/1 [==============================] - 0s 14ms/step - loss: 2.2876e-06
Epoch 55/100
1/1 [==============================] - 0s 17ms/step - loss: 1.6050e-06
Epoch 56/100
1/1 [==============================] - 0s 15ms/step - loss: 1.1026e-06
Epoch 57/100
1/1 [==============================] - 0s 19ms/step - loss: 7.4056e-07
Epoch 58/100
1/1 [==============================] - 0s 18ms/step - loss: 4.8562e-07
Epoch 59/100
1/1 [==============================] - 0s 13ms/step - loss: 3.1037e-07
Epoch 60/100
1/1 [==============================] - 0s 14ms/step - loss: 1.9302e-07
Epoch 61/100
1/1 [==============================] - 0s 15ms/step - loss: 1.1659e-07
Epoch 62/100
1/1 [==============================] - 0s 13ms/step - loss: 6.8266e-08
Epoch 63/100
1/1 [==============================] - 0s 15ms/step - loss: 3.8697e-08
Epoch 64/100
1/1 [==============================] - 0s 18ms/step - loss: 2.1504e-08
Epoch 65/100
1/1 [==============================] - 0s 17ms/step - loss: 1.4773e-08
Epoch 66/100
1/1 [==============================] - 0s 16ms/step - loss: 4.4828e-08
Epoch 67/100
1/1 [==============================] - 0s 14ms/step - loss: 4.1620e-07
Epoch 68/100
1/1 [==============================] - 0s 14ms/step - loss: 3.0120e-06
Epoch 69/100
1/1 [==============================] - 0s 16ms/step - loss: 7.9916e-06
Epoch 70/100
1/1 [==============================] - 0s 13ms/step - loss: 6.9289e-06
Epoch 71/100
1/1 [==============================] - 0s 12ms/step - loss: 3.5072e-06
Epoch 72/100
1/1 [==============================] - 0s 14ms/step - loss: 1.6460e-06
Epoch 73/100
1/1 [==============================] - 0s 19ms/step - loss: 8.8001e-07
Epoch 74/100
1/1 [==============================] - 0s 18ms/step - loss: 5.8072e-07
Epoch 75/100
1/1 [==============================] - 0s 17ms/step - loss: 4.7660e-07
Epoch 76/100
1/1 [==============================] - 0s 16ms/step - loss: 4.9039e-07
Epoch 77/100
1/1 [==============================] - 0s 14ms/step - loss: 6.1208e-07
Epoch 78/100
1/1 [==============================] - 0s 17ms/step - loss: 9.1186e-07
Epoch 79/100
1/1 [==============================] - 0s 14ms/step - loss: 1.5109e-06
Epoch 80/100
1/1 [==============================] - 0s 14ms/step - loss: 2.6383e-06
Epoch 81/100
1/1 [==============================] - 0s 16ms/step - loss: 4.1685e-06
Epoch 82/100
1/1 [==============================] - 0s 18ms/step - loss: 5.4834e-06
Epoch 83/100
1/1 [==============================] - 0s 13ms/step - loss: 5.2744e-06
Epoch 84/100
1/1 [==============================] - 0s 18ms/step - loss: 4.3413e-06
Epoch 85/100
1/1 [==============================] - 0s 18ms/step - loss: 3.1149e-06
Epoch 86/100
1/1 [==============================] - 0s 20ms/step - loss: 2.3650e-06
Epoch 87/100
1/1 [==============================] - 0s 15ms/step - loss: 1.8645e-06
Epoch 88/100
1/1 [==============================] - 0s 16ms/step - loss: 1.6893e-06
Epoch 89/100
1/1 [==============================] - 0s 16ms/step - loss: 1.7023e-06
Epoch 90/100
1/1 [==============================] - 0s 17ms/step - loss: 2.0366e-06
Epoch 91/100
1/1 [==============================] - 0s 17ms/step - loss: 2.8027e-06
Epoch 92/100
1/1 [==============================] - 0s 15ms/step - loss: 4.2650e-06
Epoch 93/100
1/1 [==============================] - 0s 13ms/step - loss: 5.5191e-06
Epoch 94/100
1/1 [==============================] - 0s 15ms/step - loss: 5.0343e-06
Epoch 95/100
1/1 [==============================] - 0s 18ms/step - loss: 3.4275e-06
Epoch 96/100
1/1 [==============================] - 0s 17ms/step - loss: 2.1638e-06
Epoch 97/100
1/1 [==============================] - 0s 18ms/step - loss: 1.4714e-06
Epoch 98/100
1/1 [==============================] - 0s 16ms/step - loss: 1.1876e-06
Epoch 99/100
1/1 [==============================] - 0s 17ms/step - loss: 1.1678e-06
Epoch 100/100
1/1 [==============================] - 0s 17ms/step - loss: 1.4197e-06

Out[14]:

<keras.callbacks.History at 0x1078ca90>

We can now use the model to make predictions with the predict() method. Setting denormalized=True returns values in the original scale of the data, in our case US dollars.

In [15]:

M.predict(denormalized=True)

Out[15]:

array([6739.1294, 6724.405 , 6378.5703, 6262.376 , 6207.931 , 6105.3643,
       5768.2573], dtype=float32)
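To illustrate what returning to the original scale means, here is a minimal sketch of one possible scheme, point-relative normalization, and its inverse. Which normalization Model() actually applies is not shown in this section, so the choice of scheme, the function names, and the prices below are all assumptions made for illustration.

```python
def normalize(series):
    """Point-relative normalization: each value relative to the first one."""
    first = series[0]
    return [value / first - 1.0 for value in series]


def denormalize(normalized, first):
    """Invert normalize() given the first value of the original series."""
    return [(value + 1.0) * first for value in normalized]


prices = [100.0, 110.0, 95.0]              # made-up closing prices in USD
scaled = normalize(prices)                 # roughly [0.0, 0.1, -0.05]
restored = denormalize(scaled, prices[0])  # back on the US dollar scale

print(scaled)
print(restored)
```

The model works with values like `scaled`, while a denormalized prediction is reported on the scale of `prices`.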

We now evaluate our model to inspect the statistics for the last epoch of training compared to a single test week.

In [16]:

M.evaluate()

Out[16]:

{'mape': 6.28, 'mse': 0.0, 'rmse': 525.83}
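For reference, the three statistics can be computed by hand as follows. This is a sketch using the textbook definitions with made-up numbers; how M.evaluate() computes them internally, including the scale it uses for each one, is not shown here.

```python
import math


def mse(observed, predicted):
    """Mean squared error."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean squared error, in the same units as the data."""
    return math.sqrt(mse(observed, predicted))


def mape(observed, predicted):
    """Mean absolute percentage error, expressed in percent."""
    errors = [abs((o - p) / o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(errors) / len(errors)


observed = [100.0, 200.0]     # made-up observed values
predicted = [110.0, 180.0]    # made-up predictions

print(mse(observed, predicted))
print(rmse(observed, predicted))
print(mape(observed, predicted))
```

RMSE is expressed in the units of the data, while MAPE is a percentage and is therefore easier to compare across price levels.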

Finally, we can save the trained model to disk for later use.

In [17]:

M.save('/bitcoin_model_prod_v0.h5')

Our Model() class can also load a previously trained model when instantiated with the path parameter.

In [18]:

M = Model(path='/bitcoin_model_prod_v0.h5',
          data=model_data,
          variable='close',
          predicted_period_size=7)

In [19]:

M.predict(denormalized=True)

Out[19]:

array([6739.1294, 6724.405 , 6378.5703, 6262.376 , 6207.931 , 6105.3643,
       5768.2573], dtype=float32)

New Data, Re-train Old Model¶

One strategy discussed earlier is to re-train the existing model as new data arrives. The main concern is shaping the new data to match the way the model was configured. As an example, we configure our model to predict one week using the previous 40 weeks. We first train it on the first 40 weeks of 2017, then continue re-training it on the following weeks until we reach week 50.

In [20]:

print('Number of full weeks: {}'.format(len(model_data) // 7))

Number of full weeks: 83

First, let's build a model with the first set of data. Notice that the slice ends at 7*40 + 7 + 1: 40 weeks for training, 1 week for testing, and one additional observation.

In [21]:

#######################################
# size of original model_data dataframe
print([len(model_data), len(model_data)/7])
# size of sliced dataframe
print([len(model_data[0:(7*40 + 7 + 1)]), len(model_data[0:(7*40 + 7 + 1)])/7])

[584, 83.42857142857143]
[288, 41.142857142857146]
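The numbers printed above follow from simple integer arithmetic. This sketch reproduces them without the dataframe; the constant and variable names are chosen here for illustration.

```python
DAYS_PER_WEEK = 7
total_days = 584    # len(model_data), as printed above

# Full weeks available, and the days left over after them.
full_weeks, leftover_days = divmod(total_days, DAYS_PER_WEEK)

# Slice end used for the first model: 40 training weeks, 1 test week, 1 extra day.
train_weeks, test_weeks = 40, 1
window = DAYS_PER_WEEK * train_weeks + DAYS_PER_WEEK * test_weeks + 1

print(full_weeks, leftover_days)
print(window, divmod(window, DAYS_PER_WEEK))
```

Using divmod makes the leftover days explicit, which the fractional values 83.43 and 41.14 only hint at.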

In [22]:

M = Model(data=model_data[0:(7*40 + 7 + 1)],
          variable='close',
          predicted_period_size=7)

In [23]:

M.build()

Out[23]:

<keras.models.Sequential at 0xc9041d0>

In [24]:

M.train()

Out[24]:

<keras.callbacks.History at 0x11d7cef0>

In [25]:

#
#  Re-train the model on slices of model_data
#  and collect the evaluation results.
#
#  As written, the loop runs once, on the same
#  first 40 + 1 weeks used to build the model.
#  To re-train on overlapping groups of 7 days,
#  widen the range and index the slice with i.
#
results = []
for i in range(0, 1):
    M.train(model_data[0:(7*40 + 7 + 1)])
    results.append(M.evaluate())
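One way to generate overlapping windows that advance by 7 days is sketched below. The helper weekly_windows is hypothetical, and its start and stop arithmetic is an assumption modeled on the 7*40 + 7 + 1 slice used above; it is not necessarily the intended solution to the exercise.

```python
def weekly_windows(first_week, last_week, train_weeks=40, period=7):
    """Yield (start, stop) row offsets, advancing one week at a time.

    Every window has the same length: train_weeks of training data,
    one test week, and one extra observation.
    """
    length = period * train_weeks + period + 1
    for week in range(first_week, last_week):
        stop = period * week + period + 1
        yield stop - length, stop


windows = list(weekly_windows(40, 50))
for start, stop in windows:
    # With the notebook objects available, each step could re-train with:
    #     M.train(model_data[start:stop])
    #     results.append(M.evaluate())
    print(start, stop)
```

The first window matches the slice used to build the model, and each later window drops the oldest week while adding the newest one.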

In [26]:

M.predict(denormalized=True)

Out[26]:

array([5690.425 , 5869.613 , 5900.2695, 5931.796 , 6675.685 , 6930.4644,
       7162.448 ], dtype=float32)

New Data, New Model¶

Another strategy is to create and train a new model every time new data is available. This approach tends to reduce catastrophic forgetting, but training time increases as data increases.

Its implementation is quite simple.

Let's assume we have old data for 49 weeks of 2017 and, a week later, new data becomes available. We represent this with the variables old_data and new_data.

In [27]:

old_data = model_data[0*7:7*48 + 7 + 1]

In [28]:

new_data = model_data[0*7:7*49 + 7 + 1]

In [29]:

M = Model(data=old_data,
          variable='close',
          predicted_period_size=7)

In [30]:

M.build()
M.train()

Out[30]:

<keras.callbacks.History at 0x13fa3b00>

In [31]:

M.predict(denormalized=True)

Out[31]:

array([15419.928, 15863.199, 16237.421, 19407.395, 24318.959, 22523.018,
       20626.521], dtype=float32)

Now, assume that new data is available. Using this technique, we create a new model trained on the most recent window of new_data: the slice below drops the first week, so the window keeps the same length as old_data.

In [32]:

#
#  Re-instantiate the model with the Model()
#  class using the new_data variable instead
#  of the old_data one. 
#
M = Model(data=new_data[1*7:7*49+7+1],
          variable='close',
          predicted_period_size=7)
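The slice arithmetic behind old_data, new_data, and the re-instantiated model can be checked on a plain list. In this sketch a list of 584 integers stands in for the rows of model_data, and the name latest_window is introduced here for illustration.

```python
days = list(range(584))    # stand-in for the 584 rows of model_data

old_data = days[0 * 7:7 * 48 + 7 + 1]
new_data = days[0 * 7:7 * 49 + 7 + 1]

# Same slice as the one passed to Model() above: skip the first week.
latest_window = new_data[1 * 7:7 * 49 + 7 + 1]

print(len(old_data), len(new_data), len(latest_window))
print(latest_window[0], latest_window[-1])
```

The new window is exactly as long as old_data, shifted forward by one week.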

In [33]:

M.build()
M.train()

Out[33]:

<keras.callbacks.History at 0x148caf60>

In [34]:

M.predict(denormalized=True)

Out[34]:

array([19086.01 , 20918.2  , 21627.947, 20273.305, 20531.28 , 21991.209,
       24124.64 ], dtype=float32)

This approach is very simple to implement and tends to work well. We will be using this to deploy our application.