Title: Two Years of Bayesian Bandits for E-Commerce
Abstract: At Monetate, we've deployed Bayesian bandits (both noncontextual and contextual) to help our clients optimize their e-commerce sites since early 2016. This talk is an overview of the lessons we've learned from both the processes of deploying real-time Bayesian machine learning systems at scale and building a data product on top of these systems that is accessible to nontechnical users (marketers). This talk will cover
We will focus primarily on noncontextual bandits and give a brief overview of these problems in the contextual setting as time permits.
Bio: Austin Rochford is Chief Data Scientist at Monetate, where he does research and development for machine learning-driven marketing products. He is a recovering mathematician, a passionate Bayesian, and a PyMC3 developer.
A Dockerfile that will produce a container with the dependencies of this notebook is available here.
%matplotlib inline
from tqdm import tqdm, trange
from matplotlib import pyplot as plt
from matplotlib.dates import DateFormatter, MonthLocator
from matplotlib.ticker import StrMethodFormatter
import numpy as np
import pandas as pd
import scipy as sp
import seaborn as sns
SEED = 76224 # from random.org, for reproducibility
np.random.seed(SEED)
sns.set()
C = sns.color_palette()
pct_formatter = StrMethodFormatter('{x:.1%}')
# configure matplotlib
FIGURE_WIDTH = 8
FIGURE_HEIGHT = 6
plt.rc('figure', figsize=(FIGURE_WIDTH, FIGURE_HEIGHT))
LABELSIZE = 14
plt.rc('axes', labelsize=LABELSIZE)
plt.rc('axes', titlesize=LABELSIZE)
plt.rc('figure', titlesize=LABELSIZE)
plt.rc('legend', fontsize=LABELSIZE)
plt.rc('xtick', labelsize=LABELSIZE)
plt.rc('ytick', labelsize=LABELSIZE)


Much of the A/B testing industry uses classical Fisher/Neyman-Pearson style fixed-horizon frequentist significance tests. Sophisticated approaches use sequential hypothesis testing. We've found, through many years of interaction with marketers, that they take a fundamentally Bayesian view of the world. Most interpret p-values as the "probability that the experiment is better/worse than the control."
Multi-armed bandit problems are extensively studied and come in many variations. Here we focus on the simplest multi-armed bandit objective, regret minimization.
Thompson sampling randomizes user/variant assignment according to the probability that each variant maximizes the posterior expected reward.
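One common way to realize this randomization is to draw a single sample from each variant's posterior and assign the user to the variant with the largest draw. The sketch below illustrates that idea for Bernoulli rewards with Beta posteriors. The function names (`thompson_assign`, `simulate`), the uniform Beta(1, 1) priors, and the true conversion rates are illustrative assumptions, not values from the talk.

```python
import numpy as np


def thompson_assign(alpha, beta, rng):
    """Draw one sample per variant from its Beta posterior; pick the largest."""
    draws = rng.beta(alpha, beta)
    return int(np.argmax(draws))


def simulate(true_rates, n_users, seed=0):
    """Simulate Thompson sampling on Bernoulli rewards (hypothetical rates)."""
    rng = np.random.default_rng(seed)
    k = len(true_rates)
    # Assumption: uniform Beta(1, 1) prior on every variant's reward rate.
    alpha = np.ones(k)
    beta = np.ones(k)
    counts = np.zeros(k, dtype=int)
    for _ in range(n_users):
        arm = thompson_assign(alpha, beta, rng)
        # Simulated conversion for the assigned variant (1 = success).
        reward = int(rng.random() < true_rates[arm])
        # Conjugate Beta-Bernoulli posterior update.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        counts[arm] += 1
    return counts, alpha, beta


# Illustrative rates only; real conversion rates are unknown in practice.
counts, alpha, beta = simulate([0.05, 0.10], n_users=5000)
print(counts)
```

Because each draw comes from the current posterior, a variant is chosen exactly as often as it is believed to be the best one, so traffic shifts toward better variants as evidence accumulates.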
The probability that a user is assigned variant A is
$$ \begin{align*} P(r_A > r_B\ |\ \mathcal{D}) & = \int_0^1 P(r_A > r\ |\ \mathcal{D})\ \pi_B(r\ |\ \mathcal{D})\ dr \\ & = \int_0^1 \left(\int_r^1 \pi_A(s\ |\ \mathcal{D})\ ds\right)\ \pi_B(r\ |\ \mathcal{D})\ dr \\ & \propto \int_0^1 \left(\int_r^1 s^{\alpha_A - 1} (1 - s)^{\beta_A - 1}\ ds\right) r^{\alpha_B - 1} (1 - r)^{\beta_B - 1}\ dr \end{align*} $$
N = 5000
x, y = np.random.uniform(0, 1, size=(2, N))
fig, ax = plt.subplots()
ax.set_aspect('equal');
ax.scatter(x, y, c='k', alpha=0.5);
ax.set_xticks([0, 1]);
ax.set_xlim(0, 1.01);
ax.set_yticks([0, 1]);
ax.set_ylim(0, 1.01);
fig
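The assignment probability $P(r_A > r_B\ |\ \mathcal{D})$ has no simple closed form in general, but it can be approximated by sampling. As a rough sketch, the code below draws from two Beta posteriors and counts how often the draw for A exceeds the draw for B, then checks the result against numerical quadrature of the single integral $\int_0^1 P(r_A > r\ |\ \mathcal{D})\ \pi_B(r\ |\ \mathcal{D})\ dr$. The function names and the posterior parameters are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy import integrate, stats


def prob_a_beats_b_mc(a_A, b_A, a_B, b_B, n_samples=200_000, seed=0):
    """Monte Carlo estimate of P(r_A > r_B | D) for Beta posteriors."""
    rng = np.random.default_rng(seed)
    r_A = rng.beta(a_A, b_A, size=n_samples)
    r_B = rng.beta(a_B, b_B, size=n_samples)
    # Fraction of joint posterior draws in which A's rate exceeds B's.
    return float(np.mean(r_A > r_B))


def prob_a_beats_b_quad(a_A, b_A, a_B, b_B):
    """Quadrature of the integral of P(r_A > r | D) * pi_B(r | D) over [0, 1]."""
    def integrand(r):
        # sf is the survival function 1 - CDF, i.e. P(r_A > r | D).
        return stats.beta.sf(r, a_A, b_A) * stats.beta.pdf(r, a_B, b_B)

    value, _ = integrate.quad(integrand, 0.0, 1.0)
    return value


# Hypothetical posterior parameters, for illustration only.
p_mc = prob_a_beats_b_mc(12, 90, 8, 94)
p_quad = prob_a_beats_b_quad(12, 90, 8, 94)
print(p_mc, p_quad)
```

The two estimates should agree up to Monte Carlo error, which shrinks at a rate of roughly one over the square root of the number of samples.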