ODE parameter prediction¶

We will perform ODE parameter prediction for forecasting of the number of cases. We have two ways for prediction.

Time-series prediction withOUT indicators
Time-series prediction with indicators

The second one uses indicators, including OxCGRT indicators, the number of vaccinations.

Note:
The target (Y) of prediction is ODE parameter values, not the number of cases. ODE parameter values are more useful because ODE parameter values have physical meanings, including (non-dimensional) effective contact rate, and they are always in the range of (0, 1).

In [ ]:

from datetime import timedelta
import covsirphy as cs
from matplotlib import pyplot as plt
import numpy as np
cs.__version__

The target of prediction is estimated ODE parameter values. At we will prepare them as explained with tutorials. For example, we can use class method ODEScenario.auto_build(), specifying location name. "Baseline" scenario will be created automatically with downloaded datasets.

In [ ]:

snr = cs.ODEScenario.auto_build(geo="Japan", model=cs.SIRFModel)
# Show summary
snr.summary()

For demonstration, we will get the start date of future phases.

In [ ]:

future_start_date = snr.simulate(display=False).index.max() + timedelta(days=1)
future_start_date

1. Time-series prediction withOUT indicators¶

This scenario "Predicted" does not use indicators, using AutoTS package: a time series package for Python designed for rapidly deploying high-accuracy forecasts at scale.

At first, create "Predicted" scenario, copying estimated ODE parameter values of "Baseline" scenario.

In [ ]:

snr.build_with_template(name="Predicted", template="Baseline");

Then, run ODEScenario().predict(days, name, **kwargs). We can apply keyword arguments of autots.AutoTS() except for forecast_length (always the same as days).

In [ ]:

snr.predict(days=30, name="Predicted");

Check the predicted ODE parameter values.

In [ ]:

df = snr.append().summary()
df.loc[df["Start"] >= future_start_date]

Check the dynamics.

In [ ]:

snr.simulate(name="Predicted");

2. Time-series prediction with indicators¶

using future_regressor functionality of AutoTS, we will predict ODE parameter values with indicators. We can download/create time-series data of indicators using DataEngineer class as explained in Tutorial: Data preparation and Tutorial: Data engineering.

In [ ]:

data_eng = cs.DataEngineer()
data_eng.download(databases=["japan", "covid19dh", "owid"]).clean().transform()
subset_df, *_ = data_eng.subset(geo="Japan")
indicator_df = subset_df.drop(["Population", "Susceptible", "Confirmed", "Infected", "Fatal", "Recovered"], axis=1)
indicator_df

2.1 Principal Component Analysis¶

To remove multicollinearity of indicators, we use pca package: a python package to perform Principal Component Analysis and to create insightful plots via our MLEngineer(seed=0).pca(X, n_components). Standardization (Z-score normalization) and Principal Component Analysis (PCA) will be performed.

In [ ]:

ml = cs.MLEngineer()
pca_dict = ml.pca(X=indicator_df, n_components=0.95)
pca_df = pca_dict["PC"].copy()
pca_df.tail()

The output of MLEngineer().pca() is the model of pca package, we can show some figures easily as follows.

Explained variance:

In [ ]:

pca_dict["model"].plot()
plt.close()

Top features:

In [ ]:

df = pca_dict["topfeat"].copy()
df["PC"] = df["PC"].str.extract(r"(\d+)").astype(np.int64)
df = df.sort_values(["PC", "type"]).reset_index(drop=True)

def highlight(d):
    styles = d.copy()
    styles.loc[:, :] = ""
    styles.loc[d["type"] == "best", :] = "background-color: yellow"
    return styles

df.style.apply(highlight, axis=None)

2-2. Future values of indicators¶

Before prediction of ODE parameter values, we need to prepare future values of (PCA-performed) indicators. We can add future values to the pandas.DataFrame manually or forecast them with MLEngineer(seed=0).predict(Y, days=<int>, X=None) as follows.

In [ ]:

future_df = ml.forecast(Y=pca_df, days=30, X=None)
future_df.tail()

2-3. Prediction of ODE parameter values¶

Now we have Y (estimated ODE parameter values) and X (estimated/forecasted indicator values without multicollinearity), we can predict ODE parameter values of future phases using ODEScenario().predict(days=<int>, name=<str>, seed=0, X=<pandas.DataFrame>). The new scenario is named "Predicted_with_X" here.

In [ ]:

snr.build_with_template(name="Predicted_with_X", template="Baseline")
snr.predict(days=30, name="Predicted_with_X", seed=0, X=future_df);

3. Compare scenarios¶

As explained with Tutorial: Scenario analysis, we can compare scenarios.

In [ ]:

# Adjust the last date, appending a phase
snr.append();

In [ ]:

# Compare reproduction number
ymin = snr.compare_param("Rt", date_range=(future_start_date, None), display=False).min().min()
snr.compare_param("Rt", date_range=(future_start_date, None), ylim=(ymin, None));

Note that minimum value of y in the figure was changed to focus on the differences of the scenarios.

In [ ]:

ymin_value = snr.compare_cases("Confirmed", date_range=(future_start_date, None), display=False).Predicted.min()
snr.compare_cases("Confirmed", date_range=(future_start_date, None), ylim=(ymin_value, None));

In [ ]:

# Show representative values
snr.describe()

Thank you!