Chapter Goals

  • Demonstrate how to extract algorithm portfolio equity from a QuantConnect backtest.
  • Demonstrate how to predict future return paths using Bayesian cones.
  • Demonstrate how to estimate the distribution of algorithm CAGRs.
  • Demonstrate how to use model averaging to aid predictions.

Chapter Outline

  1. Read in Algorithm Portfolio Equity
  2. Choose the Best Algorithm Among 4 Variants
  3. Choose Best Bayesian Model of Algorithm Returns
  4. Compare Bayesian Cones for all Algos and all Return Models
  5. Compare Best Algo Predicted Portfolio Ending Values
  6. Compare Best Algo Predicted CAGR Distributions
  7. Model Averaging
  8. Conclusion

Algo Evaluation Motivation

When evaluating our trading systems there are two major areas of uncertainty we have to address: algorithm uncertainty (AU) and model uncertainty (MU).

Algorithm Uncertainty (AU)

  • trade sequencing
  • slippage/price impact
  • network errors
  • software errors
  • hardware errors

Model Uncertainty (MU)

  • model is misspecified
  • incorrect parameters
  • changing market environment/nonstationarity
  • missing variables
  • etc.

Notice that the AU examples involve issues that can only occur once our algorithm is live. In other words, some combination of those issues did not happen during the backtest, but some combination of them could happen in the future. How can we estimate the impact of such random exogenous shocks on our trading strategy's performance?

In this chapter we will employ a Bayesian methodology that allows us to estimate the variation in strategy performance across many different return paths, in an attempt to account for these exogenous shocks. The flip side of this approach is that we have to use a model, along with all of its requisite assumptions.

That introduces the second area of importance, MU. To incorporate our uncertainty about which model is "best" we will create three different models and compare them. Then we will combine their predictions before we make our final inference and prediction.

NOTE: in this chapter I have abstracted away some of the boilerplate code used to create and format data into the imported script ch5_utils.py. Other functions and processes I have chosen to leave in the notebook for easier reference, due to their importance in the analysis.

In [1]:
%load_ext watermark
%watermark

%load_ext autoreload
%autoreload 2

# import standard libs
from IPython.display import display
from IPython.core.debugger import set_trace as bp
from pathlib import PurePath, Path
import sys
import time
from collections import OrderedDict as od
import re
import os
import json
os.environ['THEANO_FLAGS'] = 'device=cpu,floatX=float32'

# get project dir
pp = PurePath(Path.cwd()).parts[:-1]
pdir = PurePath(*pp)
script_dir = pdir / 'scripts' 
viz_dir = pdir / 'visuals' / '05_Algorithm_Evaluation'
log_dir = pdir / 'data' / 'quantconnect_data'
sys.path.append(script_dir.as_posix())

# import python scientific stack
import pandas as pd
pd.set_option('display.max_rows', 50)
import numpy as np
import scipy.stats as stats
import math
import pymc3 as pm
from theano import shared, theano as tt
import ffn

# import visual tools
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

plt.style.use('seaborn-talk')
plt.style.use('bmh')

# import util libs
from tqdm import tqdm
import warnings
warnings.filterwarnings("ignore")

from utils import cprint
from ch5_utils import pymc3_helper 
pmh = pymc3_helper()

RANDOM_STATE = 777
today = pd.to_datetime('today').date()

print()# 
%watermark -p pandas,numpy,pymc3,theano,sklearn,statsmodels,scipy,ffn,matplotlib,seaborn
2018-11-10T21:09:28-07:00

CPython 3.6.6
IPython 6.5.0

compiler   : GCC 7.2.0
system     : Linux
release    : 4.15.0-38-generic
machine    : x86_64
processor  : x86_64
CPU cores  : 12
interpreter: 64bit

pandas 0.23.4
numpy 1.14.6
pymc3 3.5
theano 1.0.3
sklearn 0.20.0
statsmodels 0.9.0
scipy 1.1.0
ffn (0, 3, 3)
matplotlib 3.0.0
seaborn 0.9.0

Read in backtest portfolio equity data.

To do this I have defined some convenience functions to clean up the raw dataset:

1. Read in the raw JSON.
2. Extract the portfolio equity from the JSON, rename the columns, and parse the time column.
3. Rename the df columns to the algorithm variant name.
4. Return the properly formatted df.
In [2]:
def read_bt_json(fp):
    """fn: read Quantconnect backtest json"""
    with open(fp, encoding='utf-8') as f_in:
        return(json.load(f_in))
    
    
def extract_portfolio_equity(jdata):
    """fn: extract port equity timeseries from Quantconnect json"""
    d = jdata['Charts']['Strategy Equity']['Series']['Equity']['Values']
    equity = (pd.DataFrame(d)
              .rename(columns=dict(x='time', y='equity'))
              .assign(time=lambda df: pd.to_datetime(df.time, utc=True, unit='s'))
              .set_index('time'))
    return equity

def _get_column_name(text):
    """fn: get column name from the json filename (drops the extension)"""
    groups = text.split('.')
    return '_'.join(groups[:-1])

def read_port_equity(fn):
    fp = PurePath(log_dir / fn)#.as_posix()
    jdata = read_bt_json(fp) 
    
    # get column name
    col = _get_column_name(fn)
    # extract equity data
    equity = (extract_portfolio_equity(jdata)
              .rename(columns=dict(equity=col)))
    return equity
In [3]:
#norm_2_cp_equity = read_port_equity('normal_2_thres_70cp.json')
norm_2_cp_equity = read_port_equity('normal_2_thres_57_lev1_5x_rm.json')
norm_2_60_equity = read_port_equity('normal_2_thres_60.json')
norm_2_70_equity = read_port_equity('normal_2_thres_70.json')
norm_2_80_equity = read_port_equity('normal_2_thres_80.json')
cprint(norm_2_cp_equity)
-------------------------------------------------------------------------------
dataframe information
-------------------------------------------------------------------------------
                           normal_2_thres_57_lev1_5x_rm
time                                                   
2018-11-07 21:00:00+00:00                   273236.9544
2018-11-08 14:31:00+00:00                   272016.3744
2018-11-08 21:00:00+00:00                   273236.9544
2018-11-09 14:31:00+00:00                   273877.7244
2018-11-09 21:00:00+00:00                   272953.9344
--------------------------------------------------
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 5137 entries, 2008-09-01 04:00:00+00:00 to 2018-11-09 21:00:00+00:00
Data columns (total 1 columns):
normal_2_thres_57_lev1_5x_rm    5137 non-null float64
dtypes: float64(1)
memory usage: 80.3 KB
None
-------------------------------------------------------------------------------

Combine strategy dataframes into one df for easier analysis.

In [4]:
list_of_dfs = [norm_2_cp_equity, norm_2_60_equity, norm_2_70_equity, norm_2_80_equity]
dfs = (pd.concat(list_of_dfs, axis=1)
       .resample('D') 
       .mean() # resample to average daily value
       .dropna(how='all')) 
cprint(dfs)
-------------------------------------------------------------------------------
dataframe information
-------------------------------------------------------------------------------
                           normal_2_thres_57_lev1_5x_rm  normal_2_thres_60  \
time                                                                         
2018-11-05 00:00:00+00:00                   272126.4094                NaN   
2018-11-06 00:00:00+00:00                   272626.6644                NaN   
2018-11-07 00:00:00+00:00                   272626.6644                NaN   
2018-11-08 00:00:00+00:00                   272626.6644                NaN   
2018-11-09 00:00:00+00:00                   273415.8294                NaN   

                           normal_2_thres_70  normal_2_thres_80  
time                                                             
2018-11-05 00:00:00+00:00                NaN                NaN  
2018-11-06 00:00:00+00:00                NaN                NaN  
2018-11-07 00:00:00+00:00                NaN                NaN  
2018-11-08 00:00:00+00:00                NaN                NaN  
2018-11-09 00:00:00+00:00                NaN                NaN  
--------------------------------------------------
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2569 entries, 2008-09-01 to 2018-11-09
Data columns (total 4 columns):
normal_2_thres_57_lev1_5x_rm    2569 non-null float64
normal_2_thres_60               2351 non-null float64
normal_2_thres_70               2351 non-null float64
normal_2_thres_80               2351 non-null float64
dtypes: float64(4)
memory usage: 100.4 KB
None
-------------------------------------------------------------------------------

In [5]:
# portfolio equity returns 
R = ffn.to_log_returns(dfs).dropna(how='all')
cprint(R)
-------------------------------------------------------------------------------
dataframe information
-------------------------------------------------------------------------------
                           normal_2_thres_57_lev1_5x_rm  normal_2_thres_60  \
time                                                                         
2018-11-05 00:00:00+00:00                     -0.003412                NaN   
2018-11-06 00:00:00+00:00                      0.001837                NaN   
2018-11-07 00:00:00+00:00                      0.000000                NaN   
2018-11-08 00:00:00+00:00                      0.000000                NaN   
2018-11-09 00:00:00+00:00                      0.002890                NaN   

                           normal_2_thres_70  normal_2_thres_80  
time                                                             
2018-11-05 00:00:00+00:00                NaN                NaN  
2018-11-06 00:00:00+00:00                NaN                NaN  
2018-11-07 00:00:00+00:00                NaN                NaN  
2018-11-08 00:00:00+00:00                NaN                NaN  
2018-11-09 00:00:00+00:00                NaN                NaN  
--------------------------------------------------
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2568 entries, 2008-09-02 to 2018-11-09
Data columns (total 4 columns):
normal_2_thres_57_lev1_5x_rm    2568 non-null float64
normal_2_thres_60               2350 non-null float64
normal_2_thres_70               2350 non-null float64
normal_2_thres_80               2350 non-null float64
dtypes: float64(4)
memory usage: 100.3 KB
None
-------------------------------------------------------------------------------

Choose the Best Algorithm Among 4 Variants

Now we can use the ffn package to compute an assortment of performance statistics for comparing the algorithm portfolios. We're going to choose the best algorithm using the following metrics (a rough manual cross-check is sketched just after the list):

  • Total Return
  • Daily Sharpe
  • CAGR
  • Calmar Ratio
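Because ffn abstracts these computations, here is a minimal sketch (not part of the original analysis; the manual_metrics helper is hypothetical) that recomputes the four ranking metrics directly from the daily equity curves in dfs. Its annualization and drawdown conventions may differ slightly from ffn's internals, so treat it as a rough cross-check rather than a replacement.

# rough cross-check of the four ranking metrics, computed directly from
# the daily equity curves (conventions may differ slightly from ffn's)
import numpy as np
import pandas as pd

def manual_metrics(equity, periods_per_year=252):
    """equity: pd.Series of portfolio values indexed by date"""
    equity = equity.dropna()
    rets = equity.pct_change().dropna()
    total_return = equity.iloc[-1] / equity.iloc[0] - 1
    years = (equity.index[-1] - equity.index[0]).days / 365.25
    cagr = (1 + total_return) ** (1 / years) - 1
    sharpe = rets.mean() / rets.std() * np.sqrt(periods_per_year)  # risk-free rate assumed 0
    drawdown = equity / equity.cummax() - 1
    calmar = cagr / abs(drawdown.min())
    return pd.Series(dict(total_return=total_return, cagr=cagr,
                          daily_sharpe=sharpe, calmar=calmar))

# dfs.apply(manual_metrics)  # one column of metrics per algo variant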
In [6]:
perf = ffn.calc_stats(dfs)

We can easily view the portfolio cumulative returns indexed to 100.

In [7]:
perf.plot()

Next we display the performance statistics.

In [8]:
perf.display()
Stat                 normal_2_thres_57_lev1_5x_rm    normal_2_thres_60    normal_2_thres_70    normal_2_thres_80
-------------------  ------------------------------  -------------------  -------------------  -------------------
Start                2008-09-01                      2008-09-01           2008-09-01           2008-09-01
End                  2017-12-29                      2017-12-29           2017-12-29           2017-12-29
Risk-free rate       0.00%                           0.00%                0.00%                0.00%

Total Return         183.51%                         11.01%               20.51%               9.22%
Daily Sharpe         0.93                            0.24                 0.43                 0.24
Daily Sortino        1.24                            0.29                 0.48                 0.22
CAGR                 11.82%                          1.13%                2.02%                0.95%
Max Drawdown         -26.66%                         -15.83%              -13.01%              -11.08%
Calmar Ratio         0.44                            0.07                 0.16                 0.09

MTD                  2.55%                           0.23%                0.12%                0.30%
3m                   7.45%                           0.38%                0.97%                1.48%
6m                   12.64%                          1.24%                0.92%                1.48%
YTD                  30.98%                          6.14%                4.11%                1.72%
1Y                   31.00%                          6.16%                4.21%                1.72%
3Y (ann.)            13.04%                          3.34%                3.87%                0.86%
5Y (ann.)            11.07%                          1.21%                2.12%                -0.16%
10Y (ann.)           11.82%                          1.13%                2.02%                0.95%
Since Incep. (ann.)  11.82%                          1.13%                2.02%                0.95%

Daily Sharpe         0.93                            0.24                 0.43                 0.24
Daily Sortino        1.24                            0.29                 0.48                 0.22
Daily Mean (ann.)    12.01%                          1.25%                2.12%                1.04%
Daily Vol (ann.)     12.93%                          5.16%                4.99%                4.36%
Daily Skew           -0.08                           -0.27                -0.63                -0.97
Daily Kurt           5.09                            8.66                 12.41                20.09
Best Day             5.18%                           2.31%                2.36%                2.36%
Worst Day            -4.99%                          -2.21%               -2.65%               -2.65%

Monthly Sharpe       0.97                            0.21                 0.38                 0.24
Monthly Sortino      1.51                            0.32                 0.62                 0.30
Monthly Mean (ann.)  12.65%                          1.22%                2.09%                1.05%
Monthly Vol (ann.)   13.00%                          5.93%                5.52%                4.34%
Monthly Skew         -0.16                           0.21                 0.40                 -0.19
Monthly Kurt         0.48                            1.03                 1.74                 2.81
Best Month           9.33%                           4.93%                5.62%                3.88%
Worst Month          -10.34%                         -5.23%               -4.85%               -4.77%

Yearly Sharpe        1.20                            0.33                 0.52                 0.48
Yearly Sortino       -                               1.34                 3.93                 1.05
Yearly Mean          13.87%                          1.93%                2.56%                1.43%
Yearly Vol           11.57%                          5.79%                4.94%                2.96%
Yearly Skew          0.81                            0.98                 1.37                 0.28
Yearly Kurt          -0.58                           0.79                 1.95                 -0.29
Best Year            32.90%                          13.60%               13.21%               6.53%
Worst Year           0.89%                           -4.42%               -2.79%               -3.01%

Avg. Drawdown        -2.09%                          -2.45%               -1.56%               -2.01%
Avg. Drawdown Days   24.97                           175.05               102.22               156.38
Avg. Up Month        3.45%                           1.29%                1.25%                1.02%
Avg. Down Month      -2.34%                          -1.15%               -1.04%               -0.62%
Win Year %           100.00%                         66.67%               77.78%               55.56%
Win 12m %            95.05%                          58.42%               67.33%               53.47%

Among the four algorithms, according to the four metrics listed above, the clear "winner" is the normal_2_thres_57_lev1_5x_rm variant: the cherry-picked version that uses a little bit of leverage and a long-term holding period.

Choose Best Bayesian Model of Algorithm Returns

In this section we address the dual issues of algorithm uncertainty and model uncertainty.

Model Sampling and Comparison

In this next section we model portfolio equity returns using three different distributions: Normal, Laplace, and Student T. Then we compare the models.

Each model treats the return series $Y$ as draws from a distribution with location $\mu$ and scale $\sigma$. The prior parameters below mirror the pymc3 code that follows (the second argument of $N$ is the standard deviation).

Here's the Normal specification:

$\mu \sim N(0, 0.01)$

$\sigma \sim HalfCauchy(1)$

$Y \sim N(\mu, \sigma)$

Here's the Laplace specification:

$\mu \sim N(0, 0.01)$

$\sigma \sim HalfCauchy(1)$

$Y \sim Laplace(\mu, \sigma)$

Here's the Student T specification:

$\nu \sim Exp(0.1)$

$\mu \sim N(0, 0.01)$

$\sigma \sim HalfCauchy(1)$

$Y \sim StudentT(\nu + 2, \mu, \sigma)$

In [11]:
## Code adapted from: https://github.com/quantopian/pyfolio/blob/master/pyfolio/bayesian.py ##

def normal_model(R, samples, name):
    """fn: sample normal model of strategy returns
    
    params
    ------
        R: pd.Series() simple returns ts
        samples: int()
        name: str(), model name
        
    returns
    -------
        model, trace
    """
    with pm.Model(name=name) as model:
        mu = pm.Normal('mean rets', mu=0, sd=.01, testval=R.mean())
        sigma = pm.HalfCauchy('vol', beta=1, testval=R.std())
        returns = pm.Normal('returns', mu=mu, sd=sigma, observed=R)

        pm.Deterministic(
            'annual mean returns',
            returns.distribution.mean * 252)

        pm.Deterministic(
            'annual volatility',
            returns.distribution.variance**.5 *
            np.sqrt(252))

        pm.Deterministic(
            'sharpe',
            returns.distribution.mean /
            returns.distribution.variance**.5*np.sqrt(252))

        step = pm.NUTS()
        trace = pm.sample(samples, tune=samples, step=step)   
    return model, trace

def laplace_model(R, samples, name):
    """fn: sample laplace model of strategy returns
    
    params
    ------
        R: pd.Series() simple returns ts
        samples: int()
        name: str(), model name
        
    returns
    -------
        model, trace
    """
    with pm.Model(name=name) as model:
        mu = pm.Normal('mean rets', mu=0, sd=.01, testval=R.mean())
        sigma = pm.HalfCauchy('vol', beta=1, testval=R.std())
        returns = pm.Laplace('returns', mu=mu, b=sigma, observed=R)

        pm.Deterministic(
            'annual mean returns',
            returns.distribution.mean * 252)

        pm.Deterministic(
            'annual volatility',
            returns.distribution.variance**.5 *
            np.sqrt(252))

        pm.Deterministic(
            'sharpe',
            returns.distribution.mean /
            returns.distribution.variance**.5*np.sqrt(252))

        step = pm.NUTS(target_accept=.9)
        trace = pm.sample(samples, tune=samples, step=step)
    return model, trace 

def student_model(R, samples, name):
    """fn: sample student T model of strategy returns
    
    params
    ------
        R: pd.Series() simple returns ts
        samples: int()
        name: str(), model name
        
    returns
    -------
        model, trace
    """

    with pm.Model(name=name) as model:
        nu = pm.Exponential('nu_minus_two', 1. / 10., testval=3.)
        mu = pm.Normal('mean rets', mu=0, sd=.01, testval=R.mean())
        sigma = pm.HalfCauchy('vol', beta=1, testval=R.std())
        returns = pm.StudentT('returns', nu=nu+2, mu=mu, sd=sigma, observed=R)

        pm.Deterministic(
            'annual mean rets',
            returns.distribution.mean * 252)

        pm.Deterministic(
            'annual volatility',
            returns.distribution.variance**.5*np.sqrt(252))

        pm.Deterministic(
            'sharpe',
            returns.distribution.mean /
            returns.distribution.variance**.5*np.sqrt(252))

        step = pm.NUTS(target_accept=.9)
        trace = pm.sample(samples, tune=samples, step=step)       
    return model, trace

def run_models_traces(r, samples=2_000):
    """fn: to run multiple models using algo returns
    
    params
    ------
        r: numpy array or shared theano array of algo returns
            example: shared(R[best_algo_variant].dropna().values)
            
    returns
    -------
        models: ordereddict with model outputs
        traces: ordereddict with trace outputs
    """

    # get model, traces
    norm_model, norm_trace = normal_model(r, samples, 'normal_model')
    la_model, la_trace = laplace_model(r, samples, 'la_model')
    t_model, t_trace = student_model(r, samples, 't_model')  
    
    # ordered dict is required to ensure insertion order
    # python 3.7 dict will implement insertion order feature by default
    models = od(norm_model=norm_model, la_model=la_model, t_model=t_model)    
    traces = od(norm_trace=norm_trace, la_trace=la_trace, t_trace=t_trace)
    
    compareDict = {norm_model:norm_trace, la_model:la_trace, t_model:t_trace}
    return models, traces, compareDict

For this portion of the analysis we will focus on the best algorithm variant, normal_2_thres_57_lev1_5x_rm.

In [12]:
best_algo_variant = 'normal_2_thres_57_lev1_5x_rm' #'normal_2_thres_70cp'
r = R[best_algo_variant].values
models, traces, compareDict = run_models_traces(r)
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [normal_model_vol, normal_model_mean rets]
Sampling 4 chains: 100%|██████████| 16000/16000 [00:04<00:00, 3681.52draws/s]
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [la_model_vol, la_model_mean rets]
Sampling 4 chains: 100%|██████████| 16000/16000 [00:04<00:00, 3225.34draws/s]
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [t_model_vol, t_model_mean rets, t_model_nu_minus_two]
Sampling 4 chains: 100%|██████████| 16000/16000 [00:11<00:00, 1392.91draws/s]
In [13]:
df_comp_waic = (pm.compare(compareDict, method='stacking'))
df_comp_waic
Out[13]:
              WAIC      pWAIC  dWAIC   weight  SE      dSE    var_warn
t_model       -17905    2.92   0       0.51    104.14  0      0
la_model      -17902.6  2.22   2.39    0.49    104.47  13.78  0
normal_model  -17451.4  4.38   453.62  0       132.87  68.85  0
In [14]:
fig, ax = plt.subplots()
pm.compareplot(df_comp_waic, ax=ax);

The best model is the t_model, as it has the lowest WAIC (and therefore zero dWAIC). We can look at its traceplots below.
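The pmh.plot_traces_pymc helper used below is defined in ch5_utils.py; presumably it wraps pymc3's built-in trace plotting, which could also be called directly along these lines (a sketch, assuming the traces from In [12] are available):

# direct pymc3 call that the abstracted helper presumably wraps
import pymc3 as pm
pm.traceplot(traces['t_trace']);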

In [15]:
rvs = pmh.get_model_varnames(models['t_model'])
pmh.plot_traces_pymc(traces['t_trace'], varnames=rvs)

Comparing Bayesian Cones for all Algos and all Return Models

Even though we already know which model best describes the data, let's compare these algo variants and models a little more, in order to better understand how the model choices affect our algorithm performance predictions.

To demonstrate this, in the next section we will sample from each of the 3 models for each of the 4 algorithm variants and output their predicted performance in the form of a Bayesian cone. I first became aware of this concept via Thomas Wiecki at Quantopian. Simply put, it is a Bayesian methodology for predicting the probable return paths of the algorithm in question. Their primary use case was determining whether an algorithm was overfit to its sample period by comparing its performance since going live against its backtest performance. Once an algorithm is live, if it severely underperforms the sampled Bayesian credible intervals, then it is likely overfit.

However, in this use case we are going to adapt the methodology to simply predict the algo's likely return paths (i.e., its future performance) and quantify our uncertainty about that prediction.

First we are going to make a prediction dictionary to hold the following information for each model:

  • name: this is the string name of the model and the dictionary key.
  • ppc_samples: this is the raw pm.sample_ppc() output.
  • ppc: this is a formatted df of the ppc_samples output.
  • cuml_df: this is a formatted df of predicted cumulative returns.
In [20]:
preds = {}

for algo_name in R.columns:
    print('-'*77)
    print(f'sampling algorithm variant: {algo_name}')
    print()
    tmp_dict = {}
    tmp_r = shared(R[algo_name].dropna().values)
    tmp_models, tmp_traces, tmp_compareDict = run_models_traces(tmp_r)
    
    for model, trace in tmp_compareDict.items():
        ppc_samples = pm.sample_ppc(trace, 
                                    samples=252*2,
                                    model=model,
                                    size=tmp_r.eval().shape[0])
        #ppc_samples = pm.sample_posterior_predictive(trace, 
        #                            samples=252*2,
        #                            model=model,
        #                            size=tmp_r.eval().shape[0])        
        tmp_df = pd.DataFrame(ppc_samples[f'{model.name}_returns'])
        tmp_df.index = pd.date_range(pd.to_datetime('today').date(),
                                     periods=tmp_df.shape[0], freq='D')
        ppc = pmh.make_ppc_df(tmp_df)
        cuml_df = ppc.cumsum()
        tmp_dict[model.name] = dict(ppc_samples=ppc_samples,
                                    ppc=ppc,
                                    cuml_df=cuml_df)
    preds[algo_name] = tmp_dict
-----------------------------------------------------------------------------
sampling algorithm variant: normal_2_thres_57_lev1_5x_rm

Multiprocess sampling (4 chains in 4 jobs)
NUTS: [normal_model_vol, normal_model_mean rets]
Sampling 4 chains: 100%|██████████| 16000/16000 [00:04<00:00, 3620.96draws/s]
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [la_model_vol, la_model_mean rets]
Sampling 4 chains: 100%|██████████| 16000/16000 [00:05<00:00, 3106.22draws/s]
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [t_model_vol, t_model_mean rets, t_model_nu_minus_two]
Sampling 4 chains: 100%|██████████| 16000/16000 [00:11<00:00, 1344.55draws/s]
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-20-c5fe94d775f5> in <module>()
     13                                     samples=252*2,
     14                                     model=model,
---> 15                                     size=tmp_r.eval().shape[0])
     16         #ppc_samples = pm.sample_posterior_predictive(trace,
     17         #                            samples=252*2,

/media/bcr/HDD/anaconda3/envs/bayes_dash/lib/python3.6/site-packages/pymc3/sampling.py in sample_ppc(trace, samples, model, vars, size, random_seed, progressbar)
   1130     ppc_trace = defaultdict(list)
   1131     for varname, value in var_values:
-> 1132         ppc_trace[varname] = np.zeros((samples,) + value.shape, value.dtype)
   1133 
   1134     try:

MemoryError: 

In this section we adapt the Bayesian cone code from pyfolio to our use case below.

  1. The first function, make_train_test(), creates the train and test splits needed for the Bayesian cone plot. In the original use case this is simpler because we would split the algorithm return series at the date the algorithm went live. In this custom implementation we use all of the algorithm's cumulative returns + 1 as the train set. The test set is the predicted mean of the sampled cumulative returns, scaled by the last value of the train set.

  2. The function compute_bayes_cone() calculates the percentile scores, i.e. the cumulative returns at the specified percentiles.

  3. The function plot_bayes_cone() plots a single algorithm's predicted performance along with the credible intervals computed by compute_bayes_cone().

  4. The function plot_bayes_cone_grid() is simply a wrapper that plots each of the 4 algorithm variants with each of the 3 models for easy comparison.

In [ ]:
## Code adapted from: https://github.com/quantopian/pyfolio/blob/master/pyfolio/bayesian.py ##

def make_train_test(returns, cuml_df):
    """fn: to make train test dfs
    
    Params
    ------
        returns: pd.Series()
        cuml_df: df of simulated cuml returns
    
    Returns
    -------
        train, test: pd.Series(), pd.DataFrame()
    """
    t = returns.cumsum().add(1)
    t_test = cuml_df['mean_sim_path'].add(1) * t.iloc[-1]
    t_test.index = t_test.index.tz_localize('utc')
    return t, t_test

def compute_bayes_cone(preds, starting_value=1.):
    """
    Compute 5, 25, 75 and 95 percentiles of cumulative returns, used
    for the Bayesian cone.
    
    Params
    ------
    preds : numpy.array
        Multiple (simulated) cumulative returns.
    starting_value : int (optional)
        Have cumulative returns start around this value.
        Default = 1.
        
    Returns
    -------
    dict of percentiles over time
        Dictionary mapping percentiles (5, 25, 75, 95) to a
        timeseries.
    """

    def scoreatpercentile(cum_preds, p):
        return [stats.scoreatpercentile(
            c, p) for c in cum_preds.T]

    cum_preds = np.cumprod(preds + 1, 1) * starting_value
    perc = {p: scoreatpercentile(cum_preds, p) for p in (5, 25, 75, 95)}

    return perc

def plot_bayes_cone(train, test, algo_name, model_name, percentiles, ax=None):
    """fn: plot bayes cone using train 'test' split"""
    if ax is None: ax = plt.gca()
        
    t = train
    t_rel = test
    t.loc[t_rel.index[0]] = t_rel.iloc[0]

    t.plot(ax=ax, color='g', label='in-sample')
    t_rel.plot(ax=ax, color='r', label='future estimate')

    ax.fill_between(t_rel.index, percentiles[5], percentiles[95], alpha=.3)
    ax.fill_between(t_rel.index, percentiles[25], percentiles[75], alpha=.6)
    ax.legend(loc='upper left', frameon=True, framealpha=0.5)
    ax.set_title(f'{algo_name}::{model_name}')
    ax.set_xlabel('')
    ax.set_ylabel('Cumulative returns')

    ax.set_xlim(t.index[0], t_rel.index[-1])


def plot_bayes_cone_grid(preds, R, model_labels):

    algo_names = R.columns
    
    n = len(model_labels)
    m = len(algo_names)
    
    fig, axes = plt.subplots(m,n, figsize=(15,15))

    for i, algo_name in enumerate(algo_names):    
        for j, model_label in enumerate(model_labels):

            r = R[algo_name]
            tmp_c_df = preds[algo_name][model_label]['cuml_df']
            train, test = make_train_test(r, tmp_c_df)

            tmp_ppc_samples = preds[algo_name][model_label]['ppc_samples']
            
            # extract subsample before computing cone
            df_t = (pd.DataFrame(tmp_ppc_samples[f'{model_label}_returns'])
                    .sample(len(tmp_ppc_samples[f'{model_label}_returns']),
                            axis=1)
                    .values)
            perc = compute_bayes_cone(df_t, starting_value=train.iloc[-1])

            ax = axes[i,j]
            plot_bayes_cone(train, test, algo_name, model_label, perc, ax=ax)
        plt.suptitle('Bayesian cone || algo_name::model_name', 
                     fontsize=14, fontweight='medium',)
    plt.tight_layout()    
    fig.subplots_adjust(top=.92)
    today=pd.to_datetime('today').date()
    save_pth = PurePath(viz_dir/f'bayesian_cones_comparison_{today}.png').as_posix()
    fig.savefig(save_pth, dpi=300, bbox_inches='tight')    

Now let's examine the results.

In [ ]:
model_labels = ['normal_model', 'la_model', 't_model']
## plot
plot_bayes_cone_grid(preds, R, model_labels)

Some observations:

  • If we look at the last column, which features all 4 algo variants paired with the t_model, we can see that, compared to the other columns, the t_model consistently has the widest dispersion of predicted return paths. This is somewhat expected, given that the Student T model is designed to accommodate fatter tails.

  • Looking at the la_model column we can see that it consistently estimates the predicted return path to hover very close to zero on average, no matter what the performance of the algorithm was. This is also expected, because the Laplace distribution is a double exponential that concentrates its mass tightly around its location (here near 0). The la_model also seems to have the smallest cone range compared to the others. A quick numerical check of these distributional claims follows the list.
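As a rough sanity check on those observations (this comparison is not part of the original analysis), the sketch below compares the three candidate distributions at a common location of 0 and scale of 1; df=3 for the Student T is an arbitrary illustrative choice rather than the fitted nu.

# stylized comparison of the three return distributions: the Student T's
# polynomial tails keep large moves far more likely than under the Normal,
# while the Laplace piles extra mass right at its center
import scipy.stats as stats

dists = {
    'normal': stats.norm(loc=0, scale=1),
    'laplace': stats.laplace(loc=0, scale=1),
    'student': stats.t(df=3, loc=0, scale=1),
}
for name, d in dists.items():
    print(f'{name:8s} pdf(0)={d.pdf(0):.3f}  '
          f'P(X>3)={d.sf(3):.2e}  P(X>6)={d.sf(6):.2e}')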

Compare Best Algo's Predicted Portfolio Ending Values

Compare Best Algo Predicted CAGR Distributions

In the next section we will quantify our observations about the Bayesian cone widths. We will again disregard the other algorithm variants and use our best_algo_variant, normal_2_thres_57_lev1_5x_rm, to compare each of the 3 models and the widths of their Bayesian cones at the end of the prediction period. To do this we will use the distribution of the simulated portfolio ending values and compare their 5th, 50th, and 95th percentile values.

This analysis will also allow us to compare estimated CAGRs of the best_algo_variant for each of the 3 return models.

First let's look at a quick sample of the predicted cumulative return paths of our best algorithm with the "best" model of returns.

In [ ]:
ex_cuml_df = preds[best_algo_variant]['t_model']['cuml_df']

fig, ax = plt.subplots(figsize=(15,5))
ex_cuml_df.sample(150, axis=1).plot(legend=False, ax=ax, x_compat=True)
xmin, xmax = ex_cuml_df.index.min(), ex_cuml_df.index.max()
ax.set_xlim((xmin, xmax))
save_pth=PurePath(viz_dir/f'best_algo_t_model_simulated_paths_sample_{today}.png').as_posix()
fig.savefig(save_pth, dpi=300)

In this section we are going to extract the simulated portfolio ending values for the best algo variant, normal_2_thres_57_lev1_5x_rm, under each return model, and compare the ending value distributions.

We will also compute the CAGR of each simulated return path and summarize the distribution of expected CAGRs using the 5th, 50th, and 95th percentiles. The pmh helpers used below are defined in ch5_utils.py; a rough sketch of what they compute follows.
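The sketch below is an approximation of the abstracted get_paths/get_end_vals logic, not the actual ch5_utils.py implementation: it converts the simulated cumulative-return paths to portfolio values from a hypothetical starting equity of $100K and summarizes ending values and per-path CAGRs at the chosen percentiles.

# rough outline of what the abstracted helpers compute (start_value is hypothetical)
start_value = 100_000
cuml_df = preds[best_algo_variant]['t_model']['cuml_df']
paths = cuml_df.drop(columns=['mean_sim_path'], errors='ignore')

end_values = start_value * (1 + paths.iloc[-1])            # ending value per simulated path
n_years = (paths.index[-1] - paths.index[0]).days / 365.25
cagrs = (end_values / start_value) ** (1 / n_years) - 1    # CAGR per simulated path

print(end_values.quantile([0.05, 0.5, 0.95]))
print(cagrs.quantile([0.05, 0.5, 0.95]))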

In [ ]:
n_cols = 5000
quantiles = [0.05,.5,.95]

for model in model_labels:
    ## get ppc, paths and end vals 
    tmp_ppc = preds[best_algo_variant][model]['ppc']
    sim_path_df = pmh.get_paths(tmp_ppc, n_cols)
    sim_end_val = pmh.get_end_vals(sim_path_df)
    
    ## plot sim port end vals
    fig, axes = plt.subplots(1,2, figsize=(15,3))
    pmh.plot_port_end_dist(sim_end_val, axes,
                           model_name=model)
    save_pth = PurePath(viz_dir/f'best_algo_{model}_simulated_ending_values_dist_{today}.png').as_posix()
    fig.savefig(save_pth, dpi=300)
    ## plot cagr bar
    cagr = ffn.calc_cagr(sim_path_df)
    step = .01
    pcts = np.arange(0,1+step,step)
    pcts = np.round(pd.Index(pcts),2)
    ser = cagr.quantile(pcts)    

    fig, ax = plt.subplots(figsize=(15,3))
    pmh.plot_cagr_bar(ser, quantiles,
                      ax, model_name=model)
    save_pth = PurePath(viz_dir/f'best_algo_{model}_simulated_cagr_perc_{today}.png').as_posix()
    fig.savefig(save_pth, dpi=300)    

Model Averaging

So now we have 3 return models. Our previous WAIC based decision criterion told us that the t_model does the best job of modeling the algorithm returns. But as we can see there are other potentially credible models that hold information that could help improve our predictions. How can we incorporate other useful models in our predictions?

Luckily for us, pymc3 makes the process straightforward. In section 3 we compared our 3 models using pm.compare(). The output dataframe has a column of weights, which we can "vaguely interpret as the probability that each model will make the correct predictions on future data."

Next we can use the function pm.sample_ppc_w() to incorporate those weights. Make sure the weights are ordered identically to the traces and models we constructed above (a conceptual sketch of what this weighted sampling does follows).
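Conceptually, weighted posterior predictive sampling draws each simulated return from one of the candidate models, chosen with probability equal to that model's stacking weight. The toy sketch below illustrates that idea with hypothetical per-model sample arrays (ppc_norm, ppc_la, ppc_t are placeholders); it is not pymc3's actual implementation.

# toy illustration of weight-based mixing of posterior predictive samples
import numpy as np

rng = np.random.RandomState(RANDOM_STATE)
weights = np.array([0.00, 0.49, 0.51])  # normal, laplace, student t (must sum to 1)
ppc_norm, ppc_la, ppc_t = (rng.normal(size=1000) for _ in range(3))  # placeholder draws
ppcs = [ppc_norm, ppc_la, ppc_t]

n_draws = 252 * 2
model_idx = rng.choice(len(ppcs), size=n_draws, p=weights)   # pick a model per draw...
mixed = np.array([rng.choice(ppcs[i]) for i in model_idx])   # ...then sample from it
print(mixed.shape)  # (504,)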

In [ ]:
df_comp_waic
In [ ]:
def make_stacked_model_df(traces, models, weights, samples=500, seed=0):
    """fn: sample wtd ppc
    
    Params
    ------
        traces: list of traces
        models: list of models
        weights: pd.Series of weights
    
    Returns
    -------
        ppc_df: df of wtd ppc samples
    """
    cols = []
    for i in tqdm(range(samples)):
        ppc_w = pm.sample_ppc_w(traces=traces, samples=252*2, 
                                models=models, 
                                weights=weights,
                                random_seed=seed+i, # reproducible                               
                                progressbar=False)
        ppc_w_data = (ppc_w[k] for k in list(ppc_w.keys()))
        col = pd.Series(np.hstack(ppc_w_data))
        cols.append(col)
    ppc_df = (pd.concat(cols,axis=1)) 
    ppc_df.index = pd.date_range(pd.to_datetime('today').date(),
                              periods=ppc_df.shape[0], freq='D')

    ppc_df = ppc_df.assign(mean_sim_port=lambda df: df.mean(1))
    return ppc_df


traces = list(compareDict.values())
models = list(compareDict.keys())
# NB: the weights must be in the same order as `traces`/`models`
# (normal, laplace, student t); sorting ascending happens to give that order here
weights = df_comp_waic.weight.sort_values(ascending=True)

ppc_df = make_stacked_model_df(traces, models, weights)
cprint(ppc_df)
In [ ]:
n_cols = 500
quantiles = [0.05,.5,.95]

## get ppc, paths and end vals 
sim_path_df = pmh.get_paths(ppc_df, n_cols)
sim_end_val = pmh.get_end_vals(sim_path_df)

## plot sim port end vals
fig, axes = plt.subplots(1,2, figsize=(15,3))
pmh.plot_port_end_dist(sim_end_val, axes,
                       model_name='stacked_model')
save_pth = PurePath(viz_dir/f'stacked_model_simulated_ending_values_dist_{today}.png').as_posix()
fig.savefig(save_pth, dpi=300)

## plot cagr bar
cagr = ffn.calc_cagr(sim_path_df)
step = .01
pcts = np.arange(0,1+step,step)
pcts = np.round(pd.Index(pcts),2)
ser = cagr.quantile(pcts)    

fig, ax = plt.subplots(figsize=(15,3))
pmh.plot_cagr_bar(ser, quantiles,
                  ax, model_name='stacked_model')
save_pth = PurePath(viz_dir/f'stacked_model_simulated_cagr_perc_{today}.png').as_posix()
fig.savefig(save_pth, dpi=300)  

What if we try our own arbitrary weights?

In [ ]:
new_weights = [0.32, 0.34, 0.34]

new_ppc_dfs = make_stacked_model_df(traces, models, new_weights)
cprint(new_ppc_dfs)
In [ ]:
n_cols = 500
quantiles = [0.05,.5,.95]

## get ppc, paths and end vals 
sim_path_df = pmh.get_paths(new_ppc_dfs, n_cols)
sim_end_val = pmh.get_end_vals(sim_path_df)

## plot sim port end vals
fig, axes = plt.subplots(1,2, figsize=(15,3))
pmh.plot_port_end_dist(sim_end_val, axes,
                       model_name='alt_stacked_model')
save_pth = PurePath(viz_dir/f'alt_stacked_model_simulated_ending_values_dist_{today}.png').as_posix()
fig.savefig(save_pth, dpi=300)

## plot cagr bar
cagr = ffn.calc_cagr(sim_path_df)
step = .01
pcts = np.arange(0,1+step,step)
pcts = np.round(pd.Index(pcts),2)
ser = cagr.quantile(pcts)    

fig, ax = plt.subplots(figsize=(15,3))
pmh.plot_cagr_bar(ser, quantiles,
                  ax, model_name='alt_stacked_model')
save_pth = PurePath(viz_dir/f'alt_stacked_model_simulated_cagr_perc_{today}.png').as_posix()
fig.savefig(save_pth, dpi=300)  

Conclusions

Using our stacked_model with the original pymc3-generated weights, we can interpret the results as follows: trading the best algorithm variant (normal_2_thres_57_lev1_5x_rm), we would expect that over the next 504 trading days, or approximately 2 years, our portfolio equity will end between ~$92K and ~$114K, with an expected value of ~$103K. The predicted CAGR is estimated to be between -5% and +10%, with an expected value of 2.0%.
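As a back-of-the-envelope check on those figures (this calculation is not in the original notebook and assumes a $100K starting portfolio), the implied CAGR follows directly from the ending values; the exact number depends on whether the 504 simulated periods are annualized as calendar days, as ffn.calc_cagr would from the daily date index, or as trading days spanning roughly 2 years.

# back-of-the-envelope check of the quoted expected CAGR
start, end = 100_000, 103_000      # assumed $100K start, ~$103K expected ending value
years_calendar = 504 / 365.25      # ~1.4 years if the index is calendar days
years_trading = 504 / 252          # 2.0 years if treated as trading days
for years in (years_calendar, years_trading):
    print(f'{(end / start) ** (1 / years) - 1:.2%}')   # ~2.2% and ~1.5%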
