Lecture 3: Geocentric Models
Lecture 4: Wiggly Orbits
Proposed problems and solutions, ported to Python from the original R code, for the exercises of the week.
%matplotlib inline
import pymc3 as pm
import numpy as np
import pandas as pd
import theano
from scipy import stats
from sklearn.metrics import mean_squared_error
import math
import matplotlib.pyplot as plt
import altair as alt
alt.data_transformers.enable('default', max_rows=None)
import arviz as az
import warnings
warnings.filterwarnings('ignore')
Example of building a prior manually (kind of a pain):
x = np.linspace(100, 250, 100)
plt.plot(x, stats.norm.pdf(x, 178, 20));  # prior for mu: Normal(178, 20)
x = np.linspace(-10, 60, 100)
plt.plot(x, stats.uniform.pdf(x, 0, 50));  # prior for sigma: Uniform(0, 50)
n_samples = 1000
sample_mu = stats.norm.rvs(loc=178, scale=20, size=n_samples)      # sample mu from its prior
sample_sigma = stats.uniform.rvs(loc=0, scale=50, size=n_samples)  # sample sigma from its prior
prior_h = stats.norm.rvs(loc=sample_mu, scale=sample_sigma)        # implied prior over heights
pm.kdeplot(prior_h)
plt.xlabel('heights', fontsize=14)
plt.yticks([]);
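For comparison, the same prior predictive simulation can be done inside a pymc3 model with `sample_prior_predictive`, which we will lean on for the rest of the notebook. A minimal sketch (the model and variable names here are mine, not from the course):

# Minimal sketch: the same prior predictive simulation via pymc3 instead of
# scipy. `prior_model` and `h` are hypothetical names of my choosing.
with pm.Model() as prior_model:
    mu = pm.Normal('mu', mu=178, sd=20)             # prior for the mean height
    sigma = pm.Uniform('sigma', lower=0, upper=50)  # prior for the spread
    h = pm.Normal('h', mu=mu, sd=sigma)             # implied height distribution
    prior_h_pm = pm.sample_prior_predictive(samples=1000)

pm.kdeplot(prior_h_pm['h'])
plt.xlabel('heights', fontsize=14)
plt.yticks([]);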
The weights listed below were recorded in the !Kung census, but heights were not recorded for these individuals. Provide predicted heights and 89% compatibility intervals for each of these individuals. That is, fill in the table below, using model-based predictions.
aux = pd.DataFrame(np.array([45, 40, 65, 31, 53])).reset_index().rename(columns={'index':'Individual',0:'weight'})
aux['expected height']=np.nan
aux['89% interval'] = np.nan
aux
|   | Individual | weight | expected height | 89% interval |
|---|---|---|---|---|
| 0 | 0 | 45 | NaN | NaN |
| 1 | 1 | 40 | NaN | NaN |
| 2 | 2 | 65 | NaN | NaN |
| 3 | 3 | 31 | NaN | NaN |
| 4 | 4 | 53 | NaN | NaN |
Let's get the data.
The weights that interest us are all adult weights, so we can analyze only the adults and make an okay linear approximation. If you did something else, that's okay. I deliberately made the question a little vague. Loading the data, selecting out adults, and doing the regression from the book:
d = pd.read_csv('../../data/Howell1.csv', sep=';', header=0)
d.head()
|   | height | weight | age | male |
|---|---|---|---|---|
| 0 | 151.765 | 47.825606 | 63.0 | 1 |
| 1 | 139.700 | 36.485807 | 63.0 | 0 |
| 2 | 136.525 | 31.864838 | 65.0 | 0 |
| 3 | 156.845 | 53.041915 | 41.0 | 1 |
| 4 | 145.415 | 41.276872 | 51.0 | 0 |
d2 = d[d.age >= 18].reset_index(drop=True)
d2 = d2.assign(weight_c=pd.Series(d2.weight - d2.weight.mean()))
This is the first of many times a model is defined in pymc3 in this course, so I'll take a moment to explain its general structure. First the model is defined and named, in this case as `model_1`. Then there are four basic sections within it:

- **Data**: where the data the model needs is defined. I'm taking advantage here of the `Data` container class recently included in pymc3 v3.7. This class wraps the theano shared variable class and lets the model be aware of its inputs and outputs. In later notebooks I usually define the data outside of the model because it saves some space when comparing models, but for an individual model I think this is just a bit nicer. It is very useful for making predictions with the model on data not used for fitting, something extremely common in Machine Learning ($X_{test}$ anyone?). In the course and the book this is usually called out-of-sample prediction, but in general Data Science we refer to it simply as measuring the accuracy of the model on the test set. More information on the `Data` class can be found here and here, and on out-of-sample (and in-sample) prediction here.
- **Priors**: where the priors are defined. These priors are the initial distributions our parameters evolve from. They are supposed to represent our knowledge about the system and the nature of reality and maths, heh. For instance, a standard deviation cannot be negative, so we initialize it as a Uniform distribution with lower bound zero. In the course this is mentioned again and again.
- **Regression**: where we define the actual structure of the model, in this case a linear regression (to estimate $\mu$), and where we also define the distribution of our output variable, in this case `height_hat`. I will be using the "hat" suffix in the output variable of almost all models to express that it is the model's estimate of the original variable, in this case `height`.
- **Prior, trace and posterior**: where the priors are sampled from and where the posteriors are estimated and sampled from. Most of the time we will need neither the prior nor the posterior predictions, only the parameters, which are stored in the trace. However, I think it's good policy to keep these lines in all models, to keep all the concepts in mind, and just uncomment them when we need them (like in this exercise).
with pm.Model() as model_1:
# Data
weight = pm.Data('weight', d2['weight_c'].values)
height = pm.Data('height', d2['height'].values)
# Priors
alpha = pm.Normal('alpha', mu=178, sd=20)
beta = pm.Normal('beta', mu=0, sd=10)
sigma = pm.Uniform('sigma', lower=0, upper=50)
# Regression
mu = alpha + beta * weight
height_hat = pm.Normal('height_hat', mu=mu, sd=sigma, observed=height)
# Prior sampling, trace definition and posterior sampling
prior = pm.sample_prior_predictive()
posterior_1 = pm.sample(draws=1000, tune=1000)
posterior_pred_1 = pm.sample_posterior_predictive(posterior_1)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, beta, alpha]
Sampling 4 chains, 0 divergences: 100%|██████████| 8000/8000 [00:02<00:00, 2787.39draws/s]
100%|██████████| 4000/4000 [00:54<00:00, 73.70it/s]
Let's get summaries and traces using both `az` and `pm`.
az.summary(posterior_1, credible_interval=.89).round(2)
#pm.summary(posterior_1, alpha=.11).round(2)# also possible
|   | mean | sd | mcse_mean | mcse_sd | hpd_5.5% | hpd_94.5% | ess_mean | ess_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|---|---|
| alpha | 154.6 | 0.28 | 0.0 | 0.0 | 154.17 | 155.04 | 6154.0 | 6152.0 | 6156.0 | 2962.0 | 1.0 |
| beta | 0.9 | 0.04 | 0.0 | 0.0 | 0.84 | 0.97 | 6599.0 | 6599.0 | 6655.0 | 2465.0 | 1.0 |
| sigma | 5.1 | 0.19 | 0.0 | 0.0 | 4.80 | 5.41 | 5620.0 | 5581.0 | 5667.0 | 3028.0 | 1.0 |
Both pymc3 and arviz have traceplot functions to see what the distribution of each parameter looks like. There are usually four curves per parameter, since four chains is the default when running the model.
pm.traceplot(posterior_1);
#az.plot_trace(posterior_1); #also possible
Let's get a predicted value for each of our original samples. This is not requested by the exercise, but I think it makes sense to first estimate how good our predictions are in-sample (on $X_{train}$) before moving on to out-of-sample predictions (on $X_{test}$).
posterior_pred_1['height_hat'].shape
(4000, 352)
We got 1000 draws (times 4 chains) of `height_hat` for each of the original samples. Let's just take the mean for each original sample and build a new column for it in the original dataset.
d2 = d2.assign(height_hat=np.mean(posterior_pred_1['height_hat'],axis=0))
plot = alt.Chart(d2)\
.mark_circle()\
.encode(
x=alt.X('height', title='height',scale=alt.Scale(domain=(135, 180))),
y=alt.Y('height_hat', title='height_hat',scale=alt.Scale(domain=(135, 180)))
)
plot
print(f'The RMSE in X_train is {round(math.sqrt(mean_squared_error(d2.height.values, d2.height_hat.values)),2)} cm.')
The RMSE in X_train is 5.07 cm.
I think that's a reasonably good RMSE. Let's now build the High Posterior Density interval for each original sample.
height_hat_hpd = pm.hpd(posterior_pred_1['height_hat'], alpha=0.11)
#height_hat_hpd = az.hpd(posterior_pred_1['height_hat'], credible_interval=.89) # is also good
hpdi = pd.DataFrame(height_hat_hpd).rename({0:'lower_hpdi',1:'upper_hpdi'}, axis=1)
d3 = pd.concat([d2,hpdi], axis = 1)
round(d3.head(5),2)
|   | height | weight | age | male | weight_c | height_hat | lower_hpdi | upper_hpdi |
|---|---|---|---|---|---|---|---|---|
| 0 | 151.76 | 47.83 | 63.0 | 1 | 2.84 | 157.03 | 148.86 | 165.17 |
| 1 | 139.70 | 36.49 | 63.0 | 0 | -8.50 | 146.79 | 139.01 | 155.29 |
| 2 | 136.52 | 31.86 | 65.0 | 0 | -13.13 | 142.63 | 134.62 | 151.08 |
| 3 | 156.84 | 53.04 | 41.0 | 1 | 8.05 | 162.00 | 153.85 | 170.50 |
| 4 | 145.42 | 41.28 | 51.0 | 0 | -3.71 | 151.41 | 143.28 | 159.62 |
Let's finally predict the values requested by the exercise. We had our original variables stored in `aux`.

In order to use the model to predict on unseen data, we need to swap the values stored in the `Data` containers within the model. I know, it's a bit convoluted, but you get used to it, heh. In this case we had two inputs, `weight` and `height`, but we only have information on `weight` for our new individuals, so we only update that variable. It is a bit odd, though, that the dependent variable, in this case `height`, can just remain holding the original training data and things still work fine. I guess `sample_posterior_predictive` does not use it; since the dimensions do not match (the training data has 352 individuals and the test data only 5), that's the only possible explanation.

I read here that some people define a fake holdout $y$ with the same dimensions as $X_{test}$, but at least now it seems unnecessary (check also this, where only the predictors are updated). Maybe someone can PR this and shed some light here.

In any case, the predictions are good and match perfectly those obtained in the original R solutions.
weight.set_value(aux.weight.values - np.mean(d2.weight))  # center the new weights with the training mean, as done for fitting
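As an aside, since pymc3 3.7 the same swap can also be done through `pm.set_data`, which targets the `Data` containers by name instead of touching the underlying theano shared variable directly. A sketch of the equivalent call (same centering as above):

# Equivalent to the weight.set_value(...) call above, via pm.set_data.
pm.set_data({'weight': aux.weight.values - np.mean(d2.weight)}, model=model_1)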
Once the data is swapped, we use `sample_posterior_predictive` again over the original trace and model.
posterior_pred_1 = pm.sample_posterior_predictive(trace = posterior_1, samples = 500, model = model_1)
100%|██████████| 500/500 [00:07<00:00, 64.27it/s]
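A quick sanity check (my addition, not in the original solution): the posterior predictive samples should now follow the shape of the new data, 500 draws for the 5 new individuals instead of the 352 training rows.

posterior_pred_1['height_hat'].shape  # expected: (500, 5)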
And we finally get the requested values for the new individuals.
aux['expected height'] = posterior_pred_1['height_hat'].mean(axis=0)
aux['89% interval'] = pd.Series(list(pm.hpd(posterior_pred_1['height_hat'], alpha=0.11)))
aux
|   | Individual | weight | expected height | 89% interval |
|---|---|---|---|---|
| 0 | 0 | 45 | 154.378392 | [147.10980069780555, 162.98875419930548] |
| 1 | 1 | 40 | 150.080918 | [142.40517323434042, 158.0363212364431] |
| 2 | 2 | 65 | 172.656271 | [165.08652880603088, 181.55263532157568] |
| 3 | 3 | 31 | 141.740800 | [133.28582732918625, 150.67856920275216] |
| 4 | 4 | 53 | 161.565233 | [151.87099317877485, 168.51969307816304] |
Model the relationship between `height` (cm) and the natural logarithm of weight (log-kg): `log(weight)`. Use the entire `Howell1` data frame, all 544 rows, adults and non-adults. Use any model type from Chapter 4 (from the book) that you think useful: an ordinary linear regression, a polynomial or a spline. Plot the posterior predictions against the raw data.
d2 = d
d2 = d2.assign(log_weight_c=pd.Series(np.log(d2.weight) - np.log(d2.weight).mean()))
with pm.Model() as model_2:
# Data
log_weight = pm.Data('log_weight', d2['log_weight_c'].values)
height = pm.Data('height', d2['height'].values)
# Priors
alpha = pm.Normal('alpha', mu=178, sd=20)
beta = pm.Normal('beta', mu=0, sd=1)
sigma = pm.Uniform('sigma', lower=0, upper=50)
# Regression
mu = alpha + beta * log_weight
height_hat = pm.Normal('height_hat', mu=mu, sd=sigma, observed=height)
# Prior sampling, trace definition and posterior sampling
prior = pm.sample_prior_predictive()
posterior_2 = pm.sample(draws=1000, tune=1000)
posterior_pred_2 = pm.sample_posterior_predictive(posterior_2)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, beta, alpha]
Sampling 4 chains, 0 divergences: 100%|██████████| 8000/8000 [00:04<00:00, 1901.01draws/s]
100%|██████████| 4000/4000 [01:06<00:00, 70.39it/s]
Let's get summaries and traces using both `az` and `pm`.
az.summary(posterior_2, credible_interval=.89).round(2)
# pm.summary(posterior_2, alpha=.11).round(2) # also possible
|   | mean | sd | mcse_mean | mcse_sd | hpd_5.5% | hpd_94.5% | ess_mean | ess_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|---|---|
| alpha | 138.32 | 0.79 | 0.02 | 0.01 | 137.16 | 139.68 | 2234.0 | 2234.0 | 2236.0 | 2656.0 | 1.0 |
| beta | 16.19 | 1.35 | 0.03 | 0.02 | 14.11 | 18.38 | 1921.0 | 1900.0 | 1939.0 | 1804.0 | 1.0 |
| sigma | 18.58 | 0.93 | 0.02 | 0.02 | 17.11 | 20.02 | 1923.0 | 1923.0 | 1930.0 | 1846.0 | 1.0 |
pm.traceplot(posterior_2);
# az.plot_trace(posterior_2); # also possible
d2 = d2.assign(height_hat=np.mean(posterior_pred_2['height_hat'],axis=0)) # just as the previous exercise, we recover the mean of height_hat for each sample
height_pred_hpd = pm.hpd(posterior_pred_2['height_hat'], alpha=0.11) # and the High Posterior Density interval for each sample as well
# az.hpd(posterior_pred_2['height_hat'], credible_interval=.89) is also good
Some harmless data wrangling and plotting:
hpdi = pd.DataFrame(height_pred_hpd).rename({0:'lower_hpdi',1:'upper_hpdi'}, axis=1)
d2 = pd.concat([d2,hpdi], axis = 1)
d2_plot = d2.melt(id_vars=['weight'],
value_vars=['height','height_hat'],
var_name='height_origin',
value_name='heights')
plot = alt.Chart(d2_plot.query('height_origin.isin(["height","height_hat"])'))\
.mark_point()\
.encode(
x=alt.X('weight', title='weight',scale=alt.Scale(domain=(0, 80))),
y=alt.Y('heights', title='heights',scale=alt.Scale(domain=(0, 180))),
color='height_origin'
)
plot_hpdi = alt.Chart(d2)\
.mark_area(opacity=0.2, color='orange')\
.encode(
x=alt.X('weight', title='weight',scale=alt.Scale(domain=(0, 80))),
y='lower_hpdi:Q',
y2='upper_hpdi:Q'
)
plot_2 = alt.Chart(d2)\
.mark_point(color='black')\
.encode(
x=alt.X('height', title='height',scale=alt.Scale(domain=(0, 180))),
y=alt.Y('height_hat', title='height_hat',scale=alt.Scale(domain=(0, 180)))
)
alt.hconcat(plot+plot_hpdi,plot_2)
Results are meh. I don't know why I get a worse fit than the R solution; since the R solution does not show its parameters, I cannot trace back the discrepancy, but the code looks fine to me. Let's see what happens when we fit a second-degree polynomial regression.
d2 = d
d2 = d2.assign(weight_c=pd.Series(d2.weight - d2.weight.mean())/d2.weight.std())
d2 = d2.assign(weight_c_2=pd.Series(d2.weight_c**2))
with pm.Model() as model_2_2:
# Data
weight_c = pm.Data('weight_c', d2['weight_c'].values)
weight_c_2 = pm.Data('weight_c_2', d2['weight_c_2'].values)
height = pm.Data('height', d2['height'].values)
# Priors
alpha = pm.Normal('alpha', mu=178, sd=20)
beta1 = pm.Lognormal('beta1', mu=0, sd=1) # note this is a lognormal, not a normal, just to follow the code of the book as close as possible (a normal works just as good)
beta2 = pm.Normal('beta2', mu=0, sd=1)
sigma = pm.Uniform('sigma', lower=0, upper=50)
# Regression
mu = alpha + beta1 * weight_c + beta2 * weight_c_2
height_hat = pm.Normal('height_hat', mu=mu, sd=sigma, observed=height)
# Prior sampling, trace definition and posterior sampling
prior = pm.sample_prior_predictive()
posterior_2_2 = pm.sample(draws=1000, tune=1000)
posterior_pred_2_2 = pm.sample_posterior_predictive(posterior_2_2)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, beta2, beta1, alpha]
Sampling 4 chains, 0 divergences: 100%|██████████| 8000/8000 [00:04<00:00, 1697.10draws/s]
100%|██████████| 4000/4000 [00:56<00:00, 70.83it/s]
Let's get summaries and traces using both `az` and `pm`.
az.summary(posterior_2_2, credible_interval=.89).round(2)
# pm.summary(posterior_2_2, alpha=.11).round(2) # also possible
|   | mean | sd | mcse_mean | mcse_sd | hpd_5.5% | hpd_94.5% | ess_mean | ess_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|---|---|
| alpha | 146.04 | 0.37 | 0.01 | 0.0 | 145.48 | 146.64 | 2650.0 | 2650.0 | 2647.0 | 3019.0 | 1.0 |
| beta2 | -7.79 | 0.27 | 0.00 | 0.0 | -8.21 | -7.36 | 2446.0 | 2445.0 | 2447.0 | 2542.0 | 1.0 |
| beta1 | 21.74 | 0.29 | 0.00 | 0.0 | 21.23 | 22.16 | 3142.0 | 3142.0 | 3155.0 | 3140.0 | 1.0 |
| sigma | 5.80 | 0.18 | 0.00 | 0.0 | 5.52 | 6.09 | 3414.0 | 3408.0 | 3422.0 | 2868.0 | 1.0 |
pm.traceplot(posterior_2_2);
#az.plot_trace(posterior_2_2); #also possible
d2 = d2.assign(height_hat=np.mean(posterior_pred_2_2['height_hat'],axis=0))
height_pred_hpd = pm.hpd(posterior_pred_2_2['height_hat'], alpha=0.11)
# az.hpd(posterior_pred_2_2['height_hat'], credible_interval=.89) is also good
hpdi = pd.DataFrame(height_pred_hpd).rename({0:'lower_hpdi',1:'upper_hpdi'}, axis=1)
d2 = pd.concat([d2,hpdi], axis = 1)
d2_plot = d2.melt(id_vars=['weight'],
value_vars=['height','height_hat'],
var_name='height_origin',
value_name='heights')
plot = alt.Chart(d2_plot.query('height_origin.isin(["height","height_hat"])'))\
.mark_point()\
.encode(
x=alt.X('weight', title='weight',scale=alt.Scale(domain=(0, 80))),
y=alt.Y('heights', title='heights',scale=alt.Scale(domain=(0, 180))),
color='height_origin'
)
plot_hpdi = alt.Chart(d2)\
.mark_area(opacity=0.2, color='orange')\
.encode(
x=alt.X('weight', title='weight',scale=alt.Scale(domain=(0, 80))),
y='lower_hpdi:Q',
y2='upper_hpdi:Q'
)
plot_2 = alt.Chart(d2)\
.mark_point(color='black')\
.encode(
x=alt.X('height', title='height',scale=alt.Scale(domain=(0, 180))),
y=alt.Y('height_hat', title='height_hat',scale=alt.Scale(domain=(0, 180)))
)
alt.hconcat(plot+plot_hpdi,plot_2)
Much better now.
Plot the prior predictive distribution for the polynomial regression model in Chapter 4 (just what we did in the previous exercise). You can modify the code that plots the linear regression prior predictive distribution. 20 or 30 parabolas from the prior should suffice to show where the prior probability resides. Can you modify the prior distributions of `alpha`, `beta1`, and `beta2` so that the prior predictions stay within the biologically reasonable outcome space? That is to say: do not try to fit the data by hand. But do try to keep the curves consistent with what you know about height and weight, before seeing these exact data.
We have already sampled the prior with the `sample_prior_predictive` function within the model. Super-easy syntax. Let's do it; 30 samples are plenty for plotting.
az.summary(prior, credible_interval=.89).round(2)
# using summary with the priors is just a bit ugly, a bunch of warnings trigger and NaNs are produced
arviz.stats.stats_utils - WARNING - Array contains NaN-value. (message repeated six times)
|   | mean | sd | mcse_mean | mcse_sd | hpd_5.5% | hpd_94.5% | ess_mean | ess_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|---|---|
| sigma_interval__ | 0.16 | 1.63 | NaN | NaN | -5.68 | NaN | NaN | NaN | NaN | NaN | NaN |
| sigma | 26.05 | 13.65 | NaN | NaN | 0.17 | NaN | NaN | NaN | NaN | NaN | NaN |
| beta1_log__ | -0.08 | 1.03 | NaN | NaN | -2.81 | NaN | NaN | NaN | NaN | NaN | NaN |
| beta1 | 1.57 | 2.06 | NaN | NaN | 0.06 | NaN | NaN | NaN | NaN | NaN | NaN |
| alpha | 178.34 | 20.12 | NaN | NaN | 120.19 | NaN | NaN | NaN | NaN | NaN | NaN |
| beta2 | -0.06 | 1.02 | NaN | NaN | -2.75 | NaN | NaN | NaN | NaN | NaN | NaN |
| height_hat | 178.31 | 35.66 | 0.88 | 0.63 | 122.61 | 234.7 | 1622.0 | 1622.0 | 1556.0 | 3843.0 | 1.22 |
Let's just recover the values from the priors manually and build a pandas dataframe.
df = pd.DataFrame()
df = df.assign(height_hat = pd.Series(prior['height_hat'].mean(axis=1)),
alpha = pd.Series(prior['alpha']),
sigma = pd.Series(prior['sigma']),
beta1 = pd.Series(prior['beta1']),
beta2 = pd.Series(prior['beta2']))
df.head()
|   | height_hat | alpha | sigma | beta1 | beta2 |
|---|---|---|---|---|---|
| 0 | 217.866044 | 219.399265 | 23.173812 | 3.352894 | -1.657478 |
| 1 | 178.677972 | 175.505789 | 46.415113 | 0.176993 | 0.567291 |
| 2 | 161.901267 | 160.900475 | 13.919769 | 1.652082 | 1.164084 |
| 3 | 198.518299 | 199.519911 | 34.265158 | 1.179330 | 1.502513 |
| 4 | 168.899595 | 167.592501 | 27.945360 | 0.739642 | -0.053964 |
df.mean() # it checks out with the az.summary results
height_hat    178.308150
alpha         178.337894
sigma          26.047420
beta1           1.571378
beta2          -0.057667
dtype: float64
Now some data wrangling.
priors = pd.DataFrame()
for i in range(30): # 30 parabolas is fine
priors['prior_'+str(i)] = df.loc[i,'alpha'] + df.loc[i,'beta1'] * d2['weight_c'] + df.loc[i,'beta2'] * d2['weight_c_2']
aux = pd.concat([d2.weight_c, priors], axis=1)
aux_plot = aux.melt(id_vars=['weight_c'],
value_vars=list(aux.columns)[1:],
var_name='prior',
value_name='height')
plot = alt.Chart(aux_plot)\
.mark_line()\
.encode(
x=alt.X('weight_c', title='weight_c'),
y=alt.Y('height', title='height'),
color=alt.Color('prior', legend=None)
)
plot.properties(title='beta1 ~ dlnorm(0, 1) , beta2 ~ dnorm(0, 1)')
Recall that the world's tallest person was 270 cm tall. The tallest person in the sample is about 180 cm. The prior curvature is not very strong; those parabolas hardly bend at all. We can increase the standard deviation on the `beta2` prior, but that produces some silly shapes (left below), where average weight is either the tallest or the shortest. That can't be right. The basic problem is that `beta2` needs to be negative to make the curve bend down, but `beta1` also has to change in order to move the maximum height to the right. It's all a bit confusing, and it is the key reason that working with polynomial models is so hard. The prior on the right below can only bend down, because I've made the linear model `a + beta1*weight_c - beta2*weight_c_2` and given `beta2` a Lognormal prior.
d2 = d
d2 = d2.assign(weight_c=pd.Series(d2.weight - d2.weight.mean())/d2.weight.std())
d2 = d2.assign(weight_c_2=pd.Series(d2.weight_c**2))
Please note I'm running three models in the same snippet for simplicity. The only differences are the priors.
with pm.Model() as model_3:
# Data
weight_c = pm.Data('weight_c', d2['weight_c'].values)
weight_c_2 = pm.Data('weight_c_2', d2['weight_c_2'].values)
height = pm.Data('height', d2['height'].values)
# Priors
alpha = pm.Normal('alpha', mu=178, sd=20)
beta1_1 = pm.Lognormal('beta1_1', mu=0, sd=1)
beta2_1 = pm.Normal('beta2_1', mu=0, sd=10) # increasing the sd
beta1_2 = pm.Lognormal('beta1_2', mu=2, sd=1)
beta2_2 = pm.Lognormal('beta2_2', mu=1, sd=1) # making beta2 a lognormal
beta1_3 = pm.Normal('beta1_3', mu=0, sd=100) # flat prior
beta2_3 = pm.Normal('beta2_3', mu=0, sd=100) # flat prior
sigma = pm.Uniform('sigma', lower=0, upper=50)
# Regressions
mu_1 = alpha + beta1_1 * weight_c + beta2_1 * weight_c_2
height_hat_1 = pm.Normal('height_hat_1', mu=mu_1, sd=sigma, observed=height)
mu_2 = alpha + beta1_2 * weight_c - beta2_2 * weight_c_2 # note the sign change
height_hat_2 = pm.Normal('height_hat_2', mu=mu_2, sd=sigma, observed=height)
mu_3 = alpha + beta1_3 * weight_c + beta2_3 * weight_c_2
height_hat_3 = pm.Normal('height_hat_3', mu=mu_3, sd=sigma, observed=height)
# Prior sampling, trace definition and posterior sampling
prior = pm.sample_prior_predictive()
posterior_3 = pm.sample()
posterior_pred_3 = pm.sample_posterior_predictive(posterior_3)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, beta2_3, beta1_3, beta2_2, beta1_2, beta2_1, beta1_1, alpha]
Sampling 4 chains, 0 divergences: 100%|██████████| 4000/4000 [00:04<00:00, 948.38draws/s]
100%|██████████| 2000/2000 [01:31<00:00, 21.90it/s]
df = pd.DataFrame()
df = df.assign(alpha = pd.Series(prior['alpha']),
sigma = pd.Series(prior['sigma']),
beta1_1 = pd.Series(prior['beta1_1']),
beta2_1 = pd.Series(prior['beta2_1']),
beta1_2 = pd.Series(prior['beta1_2']),
beta2_2 = pd.Series(prior['beta2_2']),
beta1_3 = pd.Series(prior['beta1_3']),
beta2_3 = pd.Series(prior['beta2_3']))
df.head()
|   | alpha | sigma | beta1_1 | beta2_1 | beta1_2 | beta2_2 | beta1_3 | beta2_3 |
|---|---|---|---|---|---|---|---|---|
| 0 | 174.900998 | 7.264140 | 0.939067 | -11.344971 | 3.215034 | 0.980401 | -6.743351 | -15.686045 |
| 1 | 192.935702 | 8.193026 | 0.519352 | -3.573985 | 2.470305 | 2.705127 | 67.586147 | 56.931776 |
| 2 | 194.906673 | 13.476760 | 1.030064 | -5.529674 | 6.735777 | 3.593835 | 205.460102 | 19.219400 |
| 3 | 191.899950 | 1.687804 | 6.197893 | -16.481562 | 28.411498 | 15.928632 | 2.799394 | 57.436045 |
| 4 | 182.414696 | 28.177356 | 0.581860 | 6.988846 | 22.424196 | 3.230487 | -72.042675 | 238.269823 |
Now some data wrangling.
priors_1 = pd.DataFrame()
priors_2 = pd.DataFrame()
priors_3 = pd.DataFrame()
for i in range(30): # 30 parabolas is fine
priors_1['prior_'+str(i)] = df.loc[i,'alpha'] + df.loc[i,'beta1_1'] * d2['weight_c'] + df.loc[i,'beta2_1'] * d2['weight_c_2']
priors_2['prior_'+str(i)] = df.loc[i,'alpha'] + df.loc[i,'beta1_2'] * d2['weight_c'] - df.loc[i,'beta2_2'] * d2['weight_c_2'] # note the sign change
priors_3['prior_'+str(i)] = df.loc[i,'alpha'] + df.loc[i,'beta1_3'] * d2['weight_c'] + df.loc[i,'beta2_3'] * d2['weight_c_2']
aux_1 = pd.concat([d2.weight_c, priors_1], axis=1)
aux_2 = pd.concat([d2.weight_c, priors_2], axis=1)
aux_3 = pd.concat([d2.weight_c, priors_3], axis=1)
aux_1_plot = aux_1.melt(id_vars=['weight_c'],
value_vars=list(aux_1.columns)[1:],
var_name='prior',
value_name='height')
aux_2_plot = aux_2.melt(id_vars=['weight_c'],
value_vars=list(aux_2.columns)[1:],
var_name='prior',
value_name='height')
aux_3_plot = aux_3.melt(id_vars=['weight_c'],
value_vars=list(aux_3.columns)[1:],
var_name='prior',
value_name='height')
plot_1 = alt.Chart(aux_1_plot)\
.mark_line(clip=True)\
.encode(
x=alt.X('weight_c', title='weight_c', scale=alt.Scale(domain=(-2, 2))),
y=alt.Y('height', title='height', scale=alt.Scale(domain=(50, 250))),
color=alt.Color('prior', legend=None)
)
plot_2 = alt.Chart(aux_2_plot)\
.mark_line(clip=True)\
.encode(
x=alt.X('weight_c', title='weight_c', scale=alt.Scale(domain=(-2, 2))),
y=alt.Y('height', title='height', scale=alt.Scale(domain=(50, 250))),
color=alt.Color('prior', legend=None)
)
alt.hconcat(plot_1.properties(title='beta1 ~ dlnorm(0, 1) , beta2 ~ dnorm(0, 1)'),
plot_2.properties(title='beta1 ~ dlnorm(2, 1) , beta2 ~ dlnorm(1, 1)'))
A key problem in getting reasonable curves here is that, obviously, `alpha`, `beta1` and `beta2` are correlated in the family of reasonable curves. But the priors are uncorrelated; they are independent of one another (see the sketch at the end of this section for one way such a correlation could be encoded). Still, if you can get independent priors to at least live within some reasonable space of outcome values, that's a lot better than flat priors. What would flat priors look like here? Something like this:
plot_3 = alt.Chart(aux_3_plot)\
.mark_line(clip=True)\
.encode(
x=alt.X('weight_c', title='weight_c', scale=alt.Scale(domain=(-2, 2))),
y=alt.Y('height', title='height', scale=alt.Scale(domain=(50, 250))),
color=alt.Color('prior', legend=None)
)
plot_3.properties(title='beta1 ~ dnorm(0, 100) , beta2 ~ dnorm(0, 100)')
These prior curves actually strongly favor explosive growth or shrinkage near the mean. This is a general phenomenon with “flat” priors: Once the predictor is at all complicated, “flat” does not imply “no relationship.” Do any of the priors above make a difference for inference in this sample? No. There is a lot of data and the model is quite simple, in terms of the way that parameters relate to predictions. This will not always be the case.
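As a closing aside on the correlation point above: one way to encode dependence between `alpha`, `beta1` and `beta2` would be a joint multivariate normal prior. Here is a purely hypothetical sketch, with means and a covariance matrix of my own choosing (a negative `beta1`/`beta2` covariance so that steeper lines tend to come with stronger downward bends); it is not from the course or the book.

# Hypothetical sketch: a correlated joint prior over (alpha, beta1, beta2).
# All numbers below are my assumptions, for illustration only.
import numpy as np
import pymc3 as pm

prior_mu = np.array([178.0, 15.0, -5.0])
prior_cov = np.array([[400.0,   0.0,   0.0],   # alpha independent
                      [  0.0,  25.0, -10.0],   # beta1 and beta2
                      [  0.0, -10.0,  25.0]])  # negatively correlated

with pm.Model() as model_corr:
    coefs = pm.MvNormal('coefs', mu=prior_mu, cov=prior_cov, shape=3)
    sigma = pm.Uniform('sigma', lower=0, upper=50)
    mu = coefs[0] + coefs[1] * d2['weight_c'].values + coefs[2] * d2['weight_c_2'].values
    height_hat = pm.Normal('height_hat', mu=mu, sd=sigma, observed=d2['height'].values)
    prior_corr = pm.sample_prior_predictive(samples=30)  # 30 curves, as before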