We will perform ODE parameter prediction for forecasting of the number of cases. We have two ways for prediction.
The second one uses indicators, including OxCGRT indicators, the number of vaccinations.
Note:
The target (Y) of prediction is ODE parameter values, not the number of cases. ODE parameter values are more useful because ODE parameter values have physical meanings, including (non-dimensional) effective contact rate, and they are always in the range of (0, 1).
from datetime import timedelta
import covsirphy as cs
from matplotlib import pyplot as plt
import numpy as np
cs.__version__
The target of prediction is estimated ODE parameter values. At we will prepare them as explained with tutorials. For example, we can use class method ODEScenario.auto_build()
, specifying location name. "Baseline" scenario will be created automatically with downloaded datasets.
snr = cs.ODEScenario.auto_build(geo="Japan", model=cs.SIRFModel)
# Show summary
snr.summary()
For demonstration, we will get the start date of future phases.
future_start_date = snr.simulate(display=False).index.max() + timedelta(days=1)
future_start_date
This scenario "Predicted" does not use indicators, using AutoTS package: a time series package for Python designed for rapidly deploying high-accuracy forecasts at scale.
At first, create "Predicted" scenario, copying estimated ODE parameter values of "Baseline" scenario.
snr.build_with_template(name="Predicted", template="Baseline");
Then, run ODEScenario().predict(days, name, **kwargs)
. We can apply keyword arguments of autots.AutoTS()
except for forecast_length
(always the same as days
).
snr.predict(days=30, name="Predicted");
Check the predicted ODE parameter values.
df = snr.append().summary()
df.loc[df["Start"] >= future_start_date]
Check the dynamics.
snr.simulate(name="Predicted");
using future_regressor
functionality of AutoTS, we will predict ODE parameter values with indicators. We can download/create time-series data of indicators using DataEngineer
class as explained in Tutorial: Data preparation and Tutorial: Data engineering.
data_eng = cs.DataEngineer()
data_eng.download(databases=["japan", "covid19dh", "owid"]).clean().transform()
subset_df, *_ = data_eng.subset(geo="Japan")
indicator_df = subset_df.drop(["Population", "Susceptible", "Confirmed", "Infected", "Fatal", "Recovered"], axis=1)
indicator_df
To remove multicollinearity of indicators, we use pca package: a python package to perform Principal Component Analysis and to create insightful plots via our MLEngineer(seed=0).pca(X, n_components)
. Standardization (Z-score normalization) and Principal Component Analysis (PCA) will be performed.
ml = cs.MLEngineer()
pca_dict = ml.pca(X=indicator_df, n_components=0.95)
pca_df = pca_dict["PC"].copy()
pca_df.tail()
The output of MLEngineer().pca()
is the model of pca package, we can show some figures easily as follows.
Explained variance:
pca_dict["model"].plot()
plt.close()
Top features:
df = pca_dict["topfeat"].copy()
df["PC"] = df["PC"].str.extract(r"(\d+)").astype(np.int64)
df = df.sort_values(["PC", "type"]).reset_index(drop=True)
def highlight(d):
styles = d.copy()
styles.loc[:, :] = ""
styles.loc[d["type"] == "best", :] = "background-color: yellow"
return styles
df.style.apply(highlight, axis=None)
Before prediction of ODE parameter values, we need to prepare future values of (PCA-performed) indicators. We can add future values to the pandas.DataFrame
manually or forecast them with MLEngineer(seed=0).predict(Y, days=<int>, X=None)
as follows.
future_df = ml.forecast(Y=pca_df, days=30, X=None)
future_df.tail()
Now we have Y (estimated ODE parameter values) and X (estimated/forecasted indicator values without multicollinearity), we can predict ODE parameter values of future phases using ODEScenario().predict(days=<int>, name=<str>, seed=0, X=<pandas.DataFrame>)
. The new scenario is named "Predicted_with_X" here.
snr.build_with_template(name="Predicted_with_X", template="Baseline")
snr.predict(days=30, name="Predicted_with_X", seed=0, X=future_df);
As explained with Tutorial: Scenario analysis, we can compare scenarios.
# Adjust the last date, appending a phase
snr.append();
# Compare reproduction number
ymin = snr.compare_param("Rt", date_range=(future_start_date, None), display=False).min().min()
snr.compare_param("Rt", date_range=(future_start_date, None), ylim=(ymin, None));
Note that minimum value of y in the figure was changed to focus on the differences of the scenarios.
ymin_value = snr.compare_cases("Confirmed", date_range=(future_start_date, None), display=False).Predicted.min()
snr.compare_cases("Confirmed", date_range=(future_start_date, None), ylim=(ymin_value, None));
# Show representative values
snr.describe()
Thank you!