When we go from multiple time-series down to a single time-series, the best way to access all relevant information is through `ModelSelectorResult` objects.
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('seaborn')
plt.rcParams['figure.figsize'] = [12, 6]
from hcrystalball.model_selection import ModelSelector
from hcrystalball.utils import get_sales_data
from hcrystalball.wrappers import get_sklearn_wrapper
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
df = get_sales_data(n_dates=365*2,
                    n_assortments=1,
                    n_states=1,
                    n_stores=2)
df.head()
# let's start simple
df_minimal = df[['Sales']]
ms_minimal = ModelSelector(frequency='D', horizon=10)
ms_minimal.create_gridsearch(
    n_splits=2,
    between_split_lag=None,
    sklearn_models=False,
    sklearn_models_optimize_for_horizon=False,
    autosarimax_models=False,
    prophet_models=False,
    tbats_models=False,
    exp_smooth_models=False,
    average_ensembles=False,
    stacking_ensembles=False)
ms_minimal.add_model_to_gridsearch(get_sklearn_wrapper(LinearRegression, hcb_verbose=False))
ms_minimal.add_model_to_gridsearch(get_sklearn_wrapper(RandomForestRegressor, random_state=42, hcb_verbose=False))
ms_minimal.select_model(df=df_minimal, target_col_name='Sales')
There are three ways to get down to the single time-series result level:

- `.results[i]`, which is fast, but does not ensure that results are loaded in the same order in which they were created (the reason is the hash used in the name of each result; results are later read in alphabetic order)
- `.get_result_for_partition()` with a `dict` based partition
- `.get_result_for_partition()` with the `partition_hash` (also found in the results file name if persisted)

result = ms_minimal.results[0]
result = ms_minimal.get_result_for_partition({'no_partition_label': ''})
result = ms_minimal.get_result_for_partition(ms_minimal.partitions[0])
result = ms_minimal.get_result_for_partition('fb452abd91f5c3bcb8afa4162c6452c2')
As you can see below, we try to store all relevant information on the result object to enable easy access to data that would otherwise be very lengthy to obtain.
result
result.X_train
result.y_train
Ready to be plotted or adjusted to your needs
result.df_plot
result.df_plot.tail(50).plot();
The stored hashes and names can help, for example, to filter `cv_data`, or to get a glimpse of which parameters the best model has
result.best_model_hash
result.best_model_name
result.best_model_repr
Get information about how our best model behaved in cross-validation
result.best_model_cv_results['mean_fit_time']
Or how all the models behaved
result.cv_results.sort_values('rank_test_score').head()
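Standard pandas selection applies if you want to slim the full table down to the columns of interest. A minimal standalone sketch, with a mocked-up frame standing in for `result.cv_results` (the real one follows sklearn's `GridSearchCV.cv_results_` column convention; the values here are invented):

```python
import pandas as pd

# Mocked-up stand-in for result.cv_results; the real one follows
# sklearn's cv_results_ column convention (rank_test_score, params, ...).
cv_results = pd.DataFrame({
    'rank_test_score': [2, 1],
    'mean_test_score': [-3.1, -2.4],
    'mean_fit_time': [0.02, 0.35],
    'params': [{'model': 'LinearRegression'},
               {'model': 'RandomForestRegressor'}],
})

# Best model first, keeping only the columns we care about
summary = (cv_results
           .sort_values('rank_test_score')
           [['rank_test_score', 'mean_test_score', 'params']])

# Parameters of the top-ranked model
print(summary.iloc[0]['params'])
```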
Access predictions made during cross validation with possible cv splits and true target values
result.cv_data.head()
result.cv_data.drop(['split'], axis=1).plot();
result.best_model_cv_data.head()
result.best_model_cv_data.plot();
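If you want a number rather than a plot, per-split error metrics can be computed directly from a cv_data-like frame with plain pandas. A standalone sketch with a made-up frame mirroring the split / true-target / prediction layout (the column names and values here are assumptions for illustration, not the library's exact schema):

```python
import pandas as pd

# Made-up frame mirroring the layout of result.best_model_cv_data:
# one prediction column, the true target, and the cv split index.
cv_data = pd.DataFrame({
    'split': [0, 0, 1, 1],
    'y_true': [10.0, 12.0, 11.0, 9.0],
    'prediction': [11.0, 12.5, 10.0, 9.5],
})

# Mean absolute error per cross-validation split
abs_errors = (cv_data['y_true'] - cv_data['prediction']).abs()
mae_per_split = abs_errors.groupby(cv_data['split']).mean()
print(mae_per_split)
```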
With `**plot_params` that you can pass depending on your plotting backend
result.plot_result(plot_from='2015-06', title='Performance', color=['blue','green']);
result.plot_error(title='Error');
result.persist?
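`persist` writes the result (or a chosen attribute of it) to disk so it can be reloaded later; check the `hcrystalball.model_selection` docs for the matching loader functions. Since a fitted result is not reproducible here, a plain pickle round-trip with a stand-in dict sketches the underlying idea (this is not the library's own persistence helper):

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in object; with hcrystalball you would call result.persist(...)
# instead (see the docstring above for the supported arguments).
result_stub = {'best_model_name': 'sklearn', 'horizon': 10}

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / 'result.pickle'
    # Write the object to disk ...
    with open(target, 'wb') as f:
        pickle.dump(result_stub, f)
    # ... and read it back
    with open(target, 'rb') as f:
        loaded = pickle.load(f)

print(loaded['best_model_name'])
```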