Two Years of Bayesian Bandits for E-Commerce

PyData NYC • October 18, 2018 • @AustinRochford

• Founded 2008, web optimization and personalization SaaS

Simulating a bandit

In [12]:
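class BetaBinomial:
    def __init__(self, a0=1., b0=1.):
        # Beta(a, b) posterior over a conversion rate; Beta(1, 1) is uniform
        self.a = a0
        self.b = b0

    def sample(self):
        # Draw one plausible conversion rate from the current posterior
        return sp.stats.beta.rvs(self.a, self.b)

    def update(self, n, x):
        # Conjugate update: x conversions observed in n trials
        self.a += x
        self.b += n - x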
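
The `update` method above relies on Beta-Binomial conjugacy: a Beta(a, b) prior combined with x conversions in n trials gives a Beta(a + x, b + n - x) posterior. A minimal sketch of that arithmetic follows; the values of `n` and `x` are made up for illustration and do not come from the talk.

```python
from scipy import stats

# Uniform Beta(1, 1) prior, matching the defaults of BetaBinomial above
a0, b0 = 1.0, 1.0

# Hypothetical data (an assumption for this example): 3 conversions in 20 visits
n, x = 20, 3

# Conjugate update: successes add to a, failures add to b
a_post = a0 + x
b_post = b0 + (n - x)

# Posterior mean of a Beta(a, b) distribution is a / (a + b)
post_mean = a_post / (a_post + b_post)

# The same posterior as a frozen SciPy distribution, e.g. for credible intervals
posterior = stats.beta(a_post, b_post)
lower, upper = posterior.interval(0.95)
```

The posterior mean lies between the prior mean (0.5) and the observed rate (x / n), and moves toward the observed rate as n grows.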

In [13]:
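class Bandit:
    def __init__(self, a_post, b_post):
        self.a_post = a_post
        self.b_post = b_post

    def assign(self):
        # Thompson sampling: draw from each posterior; 1 (arm B) if B's draw is larger, else 0 (arm A)
        return 1 * (self.a_post.sample() < self.b_post.sample())

    def update(self, arm, reward):
        # Update only the posterior of the arm that was played
        arm_post = self.a_post if arm == 0 else self.b_post
        arm_post.update(1, reward)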
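
Because `assign` compares one draw from each posterior, it returns arm B with probability equal to the current posterior probability that B's rate exceeds A's. The sketch below estimates that probability by Monte Carlo. The function name `prob_b_beats_a` and the conversion counts are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def prob_b_beats_a(a_params, b_params, n_draws=100_000):
    """Monte Carlo estimate of P(rate_B > rate_A) under two Beta posteriors."""
    a_draws = rng.beta(*a_params, size=n_draws)
    b_draws = rng.beta(*b_params, size=n_draws)
    # Fraction of paired draws in which arm B's sampled rate is larger
    return np.mean(a_draws < b_draws)


# Before any data, both posteriors are Beta(1, 1), so traffic splits about evenly
p_prior = prob_b_beats_a((1, 1), (1, 1))

# Hypothetical data: 50/1000 conversions on A and 100/1000 on B, on Beta(1, 1) priors
p_data = prob_b_beats_a((51, 951), (101, 901))
```

With no data the estimate is close to one half; once the evidence favors B, nearly all traffic goes to B.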

In [15]:
A_RATE, B_RATE = 0.05, 0.1
N = 1000

rewards_gen = generate_rewards(A_RATE, B_RATE, N)
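
The definition of `generate_rewards` is not shown in these cells. From the way its output is used (one item per time step, indexed by arm), a plausible stand-in is a generator that yields a pair of Bernoulli rewards per step. The sketch below is an assumption, not the original implementation; the `seed` parameter is added here for reproducibility.

```python
import numpy as np


def generate_rewards(a_rate, b_rate, n, seed=None):
    """Yield n arrays [reward_A, reward_B] of Bernoulli draws (hypothetical stand-in)."""
    rng = np.random.default_rng(seed)
    for _ in range(n):
        # One potential reward per arm; the bandit only observes the arm it plays
        yield rng.binomial(1, [a_rate, b_rate])


# Example: first five reward pairs with a 5% rate on A and a 10% rate on B
first_five = list(generate_rewards(0.05, 0.1, 5, seed=0))
```

Generating both potential rewards up front keeps the reward stream independent of the assignment policy, so different policies can be compared on the same stream.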

In [16]:
bandit = Bandit(BetaBinomial(), BetaBinomial())
arms = np.empty(N, dtype=np.int64)
rewards = np.empty(N)

for t, arm_rewards in tqdm(enumerate(rewards_gen), total=N):
    arms[t] = bandit.assign()
    rewards[t] = arm_rewards[arms[t]]

    bandit.update(arms[t], rewards[t])

100%|██████████| 1000/1000 [00:00<00:00, 5443.70it/s]

In [18]:
fig

Out[18]:

Simulating many bandits

In [20]:
N_BANDIT = 100

arms = np.empty((N_BANDIT, N), dtype=np.int64)
rewards = np.empty((N_BANDIT, N))

for i in trange(N_BANDIT):
    arms[i], rewards[i] = simulate_bandit(
        generate_rewards(A_RATE, B_RATE, N), N
    )

100%|██████████| 100/100 [00:11<00:00,  8.98it/s]
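
The helper `simulate_bandit` is also not defined in these cells. It presumably wraps the single-bandit loop shown earlier. The sketch below is one way to write it under that assumption. It swaps the `BetaBinomial` and `Bandit` classes for a small NumPy array of Beta parameters so that the block is self-contained, and the `seed` parameter is an addition.

```python
import numpy as np


def simulate_bandit(rewards_gen, n, seed=None):
    """Run one two-armed Thompson sampling bandit for n steps (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Beta(1, 1) priors for both arms: row = arm, columns = (a, b)
    params = np.ones((2, 2))
    arms = np.empty(n, dtype=np.int64)
    rewards = np.empty(n)

    for t, arm_rewards in enumerate(rewards_gen):
        # One posterior draw per arm; play arm 1 (B) if its draw is larger
        draws = rng.beta(params[:, 0], params[:, 1])
        arms[t] = int(draws[0] < draws[1])
        rewards[t] = arm_rewards[arms[t]]
        # Conjugate update for the arm that was played
        params[arms[t], 0] += rewards[t]
        params[arms[t], 1] += 1 - rewards[t]

    return arms, rewards


# Usage with an inline reward stream (5% on A, 10% on B, 500 steps)
demo_rng = np.random.default_rng(0)
demo_gen = (demo_rng.binomial(1, [0.05, 0.1]) for _ in range(500))
demo_arms, demo_rewards = simulate_bandit(demo_gen, 500, seed=1)
```

The reward stream must yield exactly `n` items, since the output arrays are preallocated to that length.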

In [22]:
fig

Out[22]:

A/A testing

A/A bandits

In [24]:
N = 2000

arms, rewards = simulate_bandits(
    lambda: generate_rewards(A_RATE, A_RATE, N),
    N, N_BANDIT
)

100%|██████████| 100/100 [00:22<00:00,  4.57it/s]
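
`simulate_bandits` appears to generalize the earlier loop over `N_BANDIT` runs: it takes a zero-argument factory so that each run gets a fresh reward stream. A minimal sketch under that assumption follows. To stay self-contained, the per-run simulator is passed in as a hypothetical fourth argument, `simulate_one`, which the call above does not have. The demonstration uses a stand-in policy that assigns arms uniformly at random, not Thompson sampling.

```python
import numpy as np


def simulate_bandits(rewards_gen_factory, n, n_bandit, simulate_one):
    """Run n_bandit independent simulations of n steps each (hypothetical helper)."""
    arms = np.empty((n_bandit, n), dtype=np.int64)
    rewards = np.empty((n_bandit, n))
    for i in range(n_bandit):
        # Calling the factory gives every run its own fresh reward stream
        arms[i], rewards[i] = simulate_one(rewards_gen_factory(), n)
    return arms, rewards


rng = np.random.default_rng(0)


def uniform_policy(rewards_gen, n):
    """Stand-in simulator: assigns arms uniformly at random."""
    pulls = np.array(list(rewards_gen))  # shape (n, 2)
    chosen = rng.integers(0, 2, size=n)
    return chosen, pulls[np.arange(n), chosen].astype(float)


# A/A setting: both arms share the same 5% rate
aa_arms, aa_rewards = simulate_bandits(
    lambda: (rng.binomial(1, [0.05, 0.05]) for _ in range(200)),
    200, 5, uniform_policy,
)
```

Passing a factory rather than a generator matters because a generator is exhausted after one run.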

In [26]:
fig

Out[26]:

In [28]:
fig

Out[28]: