This notebook briefly walks through fitting the four linear Bayesian regression calibrations for global core top δ18Oc and SSTs.
Notebook settings for graphics and load some libraries.
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import pymc3 as pm
Set paths to our core top data. These are also given as supplemental information in the paper.
coretop_path = '../data/parsed/coretops.csv'
coretop_grid_path = '../data/parsed/coretops_grid.csv'
This should be straightforward.
coretops_raw = pd.read_csv(coretop_path)
coretops_grid = pd.read_csv(coretop_grid_path)
coretops_raw['foramtype'] = coretops_raw['species'].astype('category')
coretops_grid['foramtype'] = coretops_grid['species'].astype('category')
Take a quick look at the data to see what we're dealing with... (and make sure everything is there)
coretops_raw['foramtype'].unique()
[G. bulloides, G. ruber, N. incompta, N. pachyderma, T. sacculifer] Categories (5, object): [G. bulloides, G. ruber, N. incompta, N. pachyderma, T. sacculifer]
coretops_raw['foramtype'].value_counts()
G. ruber 1002 G. bulloides 635 T. sacculifer 442 N. pachyderma 425 N. incompta 132 Name: foramtype, dtype: int64
coretops_grid.head()
species | gridlat | gridlon | d18oc | d18osw | t_annual | t_seasonal | m1 | m2 | m3 | m4 | m5 | m6 | m7 | m8 | m9 | m10 | m11 | m12 | foramtype | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | G. bulloides | -57.5 | -53.5 | 2.240000 | -0.297468 | 2.100175 | 3.783593 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | G. bulloides |
1 | G. bulloides | -56.5 | -115.5 | 2.280000 | -0.292819 | 5.167257 | 5.167257 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | G. bulloides |
2 | G. bulloides | -56.5 | -56.5 | 2.245000 | -0.281994 | 3.576477 | 4.666713 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | G. bulloides |
3 | G. bulloides | -55.5 | -58.5 | 2.146000 | -0.272228 | 4.832495 | 4.983686 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | G. bulloides |
4 | G. bulloides | -55.5 | -57.5 | 2.348571 | -0.266258 | 4.671475 | 5.106034 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | G. bulloides |
This is the model setup that pools all foram species together and calibrates on annual-mean SSTs.
Models is: δ18Oc=α+β∗T+δ18Osw−0.27+ϵ
with parameters: α∼N(3,4)β∼N(−0.2,1)ϵ∼N(0,τ)τ∼HalfCauchy(1)
Copy the original coretops_grid
to coretops
, which is our working copy. Then we make a temp
column from the annual SST data.
coretops = coretops_grid.copy()
coretops['temp'] = coretops['t_annual']
This gives you an idea what the data looks like:
print(coretops.shape)
coretops.head()
(1386, 21)
species | gridlat | gridlon | d18oc | d18osw | t_annual | t_seasonal | m1 | m2 | m3 | ... | m5 | m6 | m7 | m8 | m9 | m10 | m11 | m12 | foramtype | temp | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | G. bulloides | -57.5 | -53.5 | 2.240000 | -0.297468 | 2.100175 | 3.783593 | 1 | 1 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | G. bulloides | 2.100175 |
1 | G. bulloides | -56.5 | -115.5 | 2.280000 | -0.292819 | 5.167257 | 5.167257 | 1 | 1 | 1 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | G. bulloides | 5.167257 |
2 | G. bulloides | -56.5 | -56.5 | 2.245000 | -0.281994 | 3.576477 | 4.666713 | 1 | 1 | 1 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | G. bulloides | 3.576477 |
3 | G. bulloides | -55.5 | -58.5 | 2.146000 | -0.272228 | 4.832495 | 4.983686 | 1 | 1 | 1 | ... | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | G. bulloides | 4.832495 |
4 | G. bulloides | -55.5 | -57.5 | 2.348571 | -0.266258 | 4.671475 | 5.106034 | 1 | 1 | 1 | ... | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | G. bulloides | 4.671475 |
5 rows × 21 columns
This is the number of gridpoints we have for each foram species.
coretops.foramtype.value_counts()
G. ruber 489 G. bulloides 291 N. pachyderma 273 T. sacculifer 243 N. incompta 90 Name: foramtype, dtype: int64
Now define the model and sample it.
temp = coretops['temp'].values
d18osw = coretops['d18osw'].values
d18oc = coretops['d18oc'].values
with pm.Model() as model:
# Intercept and slope
a = pm.Normal('a', mu=3.0, sd=2)
b = pm.Normal('b', mu=-0.2, sd=1)
# Model error
tau = pm.HalfCauchy('tau', beta=1)
# Likelihood
d18oc_est = a + b * temp + (d18osw - 0.27)
likelihood_d18oc = pm.Normal('likelihood_d18oc', mu=d18oc_est, sd=tau,
observed=d18oc)
trace = pm.sample(draws=5000, tune=1000, chains=2, init='jitter+adapt_diag', random_seed=123)
Auto-assigning NUTS sampler... Initializing NUTS using jitter+adapt_diag... Multiprocess sampling (2 chains in 2 jobs) NUTS: [tau, b, a] Sampling 2 chains: 100%|██████████| 12000/12000 [00:14<00:00, 822.13draws/s]
Very basic diagnostic plots showing the sampling and posterior distribution of the parameters. Remember that the trace plot (below) has two lines on each plot, one for each chain.
pm.plots.traceplot(trace)
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7f2bfed0da58>, <matplotlib.axes._subplots.AxesSubplot object at 0x7f2bfecdda90>], [<matplotlib.axes._subplots.AxesSubplot object at 0x7f2bfec87cc0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7f2bfecafef0>], [<matplotlib.axes._subplots.AxesSubplot object at 0x7f2bfec64160>, <matplotlib.axes._subplots.AxesSubplot object at 0x7f2bfec0b128>]], dtype=object)
pm.plots.forestplot(trace)
GridSpec(1, 2, width_ratios=[3, 1])
Here is our PSIS-LOO statistics, for cross-validation.
pm.stats.loo(trace, model)
LOO_r(LOO=2248.330012450686, LOO_se=74.15124124931084, p_LOO=3.6165249073924315, shape_warn=0)
Same deal as before. This is a model setup that pools all foram species together but calibrates against the seasonal SSTs instead of annual.
Models is: δ18Oc=α+β∗T+δ18Osw−0.27+ϵ
with parameters: α∼N(3,4)β∼N(−0.2,1)ϵ∼N(0,τ)τ∼HalfCauchy(1)
coretops = coretops_grid.copy()
coretops['temp'] = coretops['t_seasonal']
print(coretops.shape)
coretops.head()
(1386, 21)
species | gridlat | gridlon | d18oc | d18osw | t_annual | t_seasonal | m1 | m2 | m3 | ... | m5 | m6 | m7 | m8 | m9 | m10 | m11 | m12 | foramtype | temp | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | G. bulloides | -57.5 | -53.5 | 2.240000 | -0.297468 | 2.100175 | 3.783593 | 1 | 1 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | G. bulloides | 3.783593 |
1 | G. bulloides | -56.5 | -115.5 | 2.280000 | -0.292819 | 5.167257 | 5.167257 | 1 | 1 | 1 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | G. bulloides | 5.167257 |
2 | G. bulloides | -56.5 | -56.5 | 2.245000 | -0.281994 | 3.576477 | 4.666713 | 1 | 1 | 1 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | G. bulloides | 4.666713 |
3 | G. bulloides | -55.5 | -58.5 | 2.146000 | -0.272228 | 4.832495 | 4.983686 | 1 | 1 | 1 | ... | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | G. bulloides | 4.983686 |
4 | G. bulloides | -55.5 | -57.5 | 2.348571 | -0.266258 | 4.671475 | 5.106034 | 1 | 1 | 1 | ... | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | G. bulloides | 5.106034 |
5 rows × 21 columns
Now define the model and sample...
temp = coretops['temp'].values
d18osw = coretops['d18osw'].values
d18oc = coretops['d18oc'].values
with pm.Model() as model:
# Intercept and slope
a = pm.Normal('a', mu=3.0, sd=2)
b = pm.Normal('b', mu=-0.2, sd=1)
# Model error
tau = pm.HalfCauchy('tau', beta=1)
# Likelihood
d18oc_est = a + b * temp + (d18osw - 0.27)
likelihood_d18oc = pm.Normal('likelihood_d18oc', mu=d18oc_est, sd=tau,
observed=d18oc)
trace = pm.sample(draws=5000, tune=1000, chains=2, init='jitter+adapt_diag', random_seed=123)
Auto-assigning NUTS sampler... Initializing NUTS using jitter+adapt_diag... Multiprocess sampling (2 chains in 2 jobs) NUTS: [tau, b, a] Sampling 2 chains: 100%|██████████| 12000/12000 [00:13<00:00, 859.43draws/s]
Diagnostic plots...
pm.plots.traceplot(trace)
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7f2bfc78f4e0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7f2bfc73d518>], [<matplotlib.axes._subplots.AxesSubplot object at 0x7f2bfc6e3748>, <matplotlib.axes._subplots.AxesSubplot object at 0x7f2bfc7c79b0>], [<matplotlib.axes._subplots.AxesSubplot object at 0x7f2c0cfb7b00>, <matplotlib.axes._subplots.AxesSubplot object at 0x7f2bfc7b7a90>]], dtype=object)
pm.plots.forestplot(trace)
GridSpec(1, 2, width_ratios=[3, 1])
PSIS-LOO, the cross-validation statistic:
pm.stats.loo(trace, model)
LOO_r(LOO=2063.3352538660483, LOO_se=65.1643759424864, p_LOO=3.268363037828749, shape_warn=0)
This is the hierarchical model setup, calibrated on annual SSTs. We're allowing some wiggle room in the parameters, for the individual species.
Models is: δ18Oc=αi+βi∗T+δ18Osw−0.27+ϵ
using parameters set for individual foram species (i): ϵ∼N(0,τi)αi∼N(μα,σα)βi∼N(μβ,σβ)τi∼Γ(σ2mσ2d,σmσ2d)
and hyperparameters: μα∼N(3,4)μβ∼N(−0.2,1)σα∼HalfCauchy(0.5)σβ∼HalfCauchy(0.25)σm∼HalfCauchy(1)σd∼HalfCauchy(1)
The rest follows what we had before.
Let's make a new copy of the core top data and set temp
:
coretops = coretops_grid.copy()
coretops['temp'] = coretops['t_annual']
Define the model and sample from it...
temp = coretops['temp'].values
d18osw = coretops['d18osw'].values
d18oc = coretops['d18oc'].values
foramtype = coretops['foramtype'].cat.codes
n_foram = len(set(foramtype))
with pm.Model() as model:
# Hyperparameters
mu_a = pm.Normal('mu_a', mu=3, sd=2)
sigma_a = pm.HalfCauchy('sigma_a', beta=0.5)
mu_b = pm.Normal('mu_b', mu=-0.2, sd=1)
sigma_b = pm.HalfCauchy('sigma_b', beta=0.25)
sigma_m = pm.HalfCauchy('sigma_m', beta=1)
sigma_d = pm.HalfCauchy('sigma_d', beta=1)
# Intercept and slope
a = pm.Normal('a', mu=mu_a, sd=sigma_a, shape=n_foram)
b = pm.Normal('b', mu=mu_b, sd=sigma_b, shape=n_foram)
# Model error
tau = pm.Gamma('tau', alpha=sigma_m**2 / sigma_d**2,
beta=sigma_m / sigma_d**2,
shape=n_foram)
# Likelihood
d18oc_est = a[foramtype] + b[foramtype] * temp + (d18osw - 0.27)
likelihood_d18oc = pm.Normal('likelihood_d18oc', mu=d18oc_est,
sd=tau[foramtype], observed=d18oc)
trace = pm.sample(draws=5000, tune=1000, chains=2, init='jitter+adapt_diag', random_seed=123)
/home/sbm/miniconda3/envs/d18oc_sst/lib/python3.7/site-packages/theano/tensor/subtensor.py:2197: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result. rval = inputs[0].__getitem__(inputs[1:]) Auto-assigning NUTS sampler... Initializing NUTS using jitter+adapt_diag... /home/sbm/miniconda3/envs/d18oc_sst/lib/python3.7/site-packages/theano/tensor/subtensor.py:2197: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result. rval = inputs[0].__getitem__(inputs[1:]) Multiprocess sampling (2 chains in 2 jobs) NUTS: [tau, b, a, sigma_d, sigma_m, sigma_b, mu_b, sigma_a, mu_a] Sampling 2 chains: 100%|██████████| 12000/12000 [03:07<00:00, 64.02draws/s]
Diagnostic plots and such... same as before.
pm.plots.traceplot(trace)
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7f2be6e4fb70>, <matplotlib.axes._subplots.AxesSubplot object at 0x7f2be6deacf8>], [<matplotlib.axes._subplots.AxesSubplot object at 0x7f2be6e06f28>, <matplotlib.axes._subplots.AxesSubplot object at 0x7f2be6da8198>], [<matplotlib.axes._subplots.AxesSubplot object at 0x7f2be6dc53c8>, <matplotlib.axes._subplots.AxesSubplot object at 0x7f2be6ef4ef0>], [<matplotlib.axes._subplots.AxesSubplot object at 0x7f2be6f1b390>, <matplotlib.axes._subplots.AxesSubplot object at 0x7f2be736af98>], [<matplotlib.axes._subplots.AxesSubplot object at 0x7f2be81e8588>, <matplotlib.axes._subplots.AxesSubplot object at 0x7f2be8c5af28>], [<matplotlib.axes._subplots.AxesSubplot object at 0x7f2be8c3de48>, <matplotlib.axes._subplots.AxesSubplot object at 0x7f2be92a1e10>], [<matplotlib.axes._subplots.AxesSubplot object at 0x7f2be98f8978>, <matplotlib.axes._subplots.AxesSubplot object at 0x7f2be98daba8>], [<matplotlib.axes._subplots.AxesSubplot object at 0x7f2bea342e48>, <matplotlib.axes._subplots.AxesSubplot object at 0x7f2bea796c18>], [<matplotlib.axes._subplots.AxesSubplot object at 0x7f2bea770c18>, <matplotlib.axes._subplots.AxesSubplot object at 0x7f2bea7c4eb8>]], dtype=object)
pm.plots.forestplot(trace)
GridSpec(1, 2, width_ratios=[3, 1])
...and the PSIS-LOO cross-validation statistic:
pm.stats.loo(trace, model)
/home/sbm/miniconda3/envs/d18oc_sst/lib/python3.7/site-packages/theano/tensor/subtensor.py:2197: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result. rval = inputs[0].__getitem__(inputs[1:])
LOO_r(LOO=1783.1659029713364, LOO_se=68.30307163243374, p_LOO=18.424742754624276, shape_warn=0)
Here we use the same hierarchical model setup, but calibrated on seasonal SSTs.
Models is: δ18Oc=αi+βi∗T+δ18Osw−0.27+ϵ
using parameters set for individual foram species (i): ϵ∼N(0,τi)αi∼N(μα,σα)βi∼N(μβ,σβ)τi∼Γ(σ2mσ2d,σmσ2d)
and hyperparameters: μα∼N(3,4)μβ∼N(−0.2,1)σα∼HalfCauchy(0.5)σβ∼HalfCauchy(0.25)σm∼HalfCauchy(1)σd∼HalfCauchy(1)
coretops = coretops_grid.copy()
coretops['temp'] = coretops['t_seasonal']
Fit the model and sample it...
temp = coretops['temp'].values
d18osw = coretops['d18osw'].values
d18oc = coretops['d18oc'].values
foramtype = coretops['foramtype'].cat.codes
n_foram = len(set(foramtype))
with pm.Model() as model:
# Hyperparameters
mu_a = pm.Normal('mu_a', mu=3, sd=2)
sigma_a = pm.HalfCauchy('sigma_a', beta=0.5)
mu_b = pm.Normal('mu_b', mu=-0.2, sd=1)
sigma_b = pm.HalfCauchy('sigma_b', beta=0.25)
sigma_m = pm.HalfCauchy('sigma_m', beta=1)
sigma_d = pm.HalfCauchy('sigma_d', beta=1)
# Intercept and slope
a = pm.Normal('a', mu=mu_a, sd=sigma_a, shape=n_foram)
b = pm.Normal('b', mu=mu_b, sd=sigma_b, shape=n_foram)
# Model error
tau = pm.Gamma('tau', alpha=sigma_m**2 / sigma_d**2,
beta=sigma_m / sigma_d**2,
shape=n_foram)
# Likelihood
d18oc_est = a[foramtype] + b[foramtype] * temp + (d18osw - 0.27)
likelihood_d18oc = pm.Normal('likelihood_d18oc', mu=d18oc_est,
sd=tau[foramtype], observed=d18oc)
trace = pm.sample(draws=5000, tune=1000, chains=2, init='jitter+adapt_diag', random_seed=123)
Auto-assigning NUTS sampler... Initializing NUTS using jitter+adapt_diag... Multiprocess sampling (2 chains in 2 jobs) NUTS: [tau, b, a, sigma_d, sigma_m, sigma_b, mu_b, sigma_a, mu_a] Sampling 2 chains: 100%|██████████| 12000/12000 [04:22<00:00, 26.80draws/s] The number of effective samples is smaller than 25% for some parameters.
Diagnostics...
pm.plots.traceplot(trace)
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7f2c025db5c0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7f2c03b03630>], [<matplotlib.axes._subplots.AxesSubplot object at 0x7f2be693e160>, <matplotlib.axes._subplots.AxesSubplot object at 0x7f2be6242470>], [<matplotlib.axes._subplots.AxesSubplot object at 0x7f2be61dc9b0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7f2be61f9f28>], [<matplotlib.axes._subplots.AxesSubplot object at 0x7f2be619c4e0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7f2be61baa20>], [<matplotlib.axes._subplots.AxesSubplot object at 0x7f2be615afd0>, <matplotlib.axes._subplots.AxesSubplot object at 0x7f2be6180588>], [<matplotlib.axes._subplots.AxesSubplot object at 0x7f2be611db00>, <matplotlib.axes._subplots.AxesSubplot object at 0x7f2be61450b8>], [<matplotlib.axes._subplots.AxesSubplot object at 0x7f2be60eb630>, <matplotlib.axes._subplots.AxesSubplot object at 0x7f2be6111ba8>], [<matplotlib.axes._subplots.AxesSubplot object at 0x7f2be60c3160>, <matplotlib.axes._subplots.AxesSubplot object at 0x7f2be60696d8>], [<matplotlib.axes._subplots.AxesSubplot object at 0x7f2be6092c50>, <matplotlib.axes._subplots.AxesSubplot object at 0x7f2be6042208>]], dtype=object)
pm.plots.forestplot(trace)
GridSpec(1, 2, width_ratios=[3, 1])
And our PSIS-LOO cross-validation statistic...
pm.stats.loo(trace, model)
LOO_r(LOO=1952.2478024964116, LOO_se=65.8857575929775, p_LOO=17.875933673433337, shape_warn=0)