A Dockerfile that will produce a container with all the dependencies necessary to run this notebook is available here.

Clear the theano cache before each run to avoid subtle bugs.

In [1]:
!rm -rf /home/jovyan/.theano/

In [2]:
%matplotlib inline

In [3]:
import datetime
from itertools import product
import logging
import pickle
from warnings import filterwarnings

In [4]:
from matplotlib import pyplot as plt
from matplotlib.offsetbox import AnchoredText
from matplotlib.ticker import FuncFormatter, StrMethodFormatter
import numpy as np
import pandas as pd
import scipy as sp
import seaborn as sns
from sklearn.preprocessing import LabelEncoder
from theano import shared, tensor as tt

In [5]:
# keep theano from complaining about compile locks for small models
(logging.getLogger('theano.gof.compilelock')
        .setLevel(logging.CRITICAL))

# silence PyMC3 warnings (there aren't many)
filterwarnings(
    'ignore', ".*diverging samples after tuning.*",
    module='pymc3'
)
filterwarnings(
    'ignore', ".*acceptance probability.*",
    module='pymc3'
)

In [6]:
pct_formatter = StrMethodFormatter('{x:.1%}')

blue, green, *_ = sns.color_palette()

# configure pyplot for readability when rendered as a slideshow and projected
sns.set(color_codes=True)

plt.rc('figure', figsize=(8, 6))

LABELSIZE = 14
plt.rc('axes', labelsize=LABELSIZE)
plt.rc('axes', titlesize=LABELSIZE)
plt.rc('figure', titlesize=LABELSIZE)
plt.rc('legend', fontsize=LABELSIZE)
plt.rc('xtick', labelsize=LABELSIZE)
plt.rc('ytick', labelsize=LABELSIZE)

In [7]:
SEED = 207183  # from random.org, for reproducibility


# Last Two Minute Report

Since late in the 2014-2015 season, the NBA has issued last two minute reports. These reports give the league's assessment of the correctness of foul calls and non-calls in the last two minutes of any game in which the score difference was three points or fewer at any point during those two minutes.

These reports differ notably from play-by-play logs in that they include information on non-calls for notable on-court interactions. This non-call information presents a unique opportunity to study the factors that impact foul calls. There is a level of subjectivity inherent in the NBA's definition of notable on-court interactions, which we attempt to mitigate later using season-specific factors.
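
As a rough illustration of the inclusion rule described above, the following sketch checks whether a game would be covered by a last two minute report. The function name `qualifies_for_l2m` and the `margins` input (a list of score differences observed during the final two minutes) are hypothetical and are not part of the NBA's data or of this notebook's pipeline.

```python
def qualifies_for_l2m(margins, threshold=3):
    """Return True if the absolute score margin was ever <= threshold.

    `margins` is a hypothetical iterable of score differences (for example,
    home score minus away score) sampled during the last two minutes.
    """
    return any(abs(margin) <= threshold for margin in margins)


# The margin touched three points at one moment, so the game qualifies.
print(qualifies_for_l2m([7, 5, 3, 6]))

# The margin never dropped to three or fewer, so the game does not qualify.
print(qualifies_for_l2m([9, 8, 10, 12]))
```

Because a single moment within the threshold is enough, a game can qualify even when the final margin is large.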

### Scraping the data

%%bash