This notebook is the code used in my blog post on Green dice are loaded (welcome to p-hacking)
It demonstrates how to mislead people about a scientific experiment by hacking p-values following the methodlogy from this xkcd comic:
First, some useful imports.
import numpy as np
import pandas as pd
from numpy.random import random_integers, seed
We are given 20 dice colors, and we roll dice for each color 1,000 times. We report the number of six. We set the random seed to make sure results are reproducible.
dice = ['Purple', 'Brown', 'Pink', 'Blue', 'Teal',
'Salmon', 'Red', 'Turquoise', 'Magenta', 'Yellow',
'Grey', 'Tan', 'Cyan', 'Green', 'Mauve',
'Beige', 'Lilac', 'Black', 'Peach', 'Orange']
def dice_experiments(dice, n):
df = pd.DataFrame(index=dice)
for die in dice:
result = random_integers(1,6,n)
df.ix[die,'Number of Six'] = np.sum(result[result==6])//6
return df
np.random.seed(75000)
df = dice_experiments(dice, 1000)
df
Number of Six | |
---|---|
Purple | 151 |
Brown | 167 |
Pink | 158 |
Blue | 167 |
Teal | 181 |
Salmon | 162 |
Red | 170 |
Turquoise | 161 |
Magenta | 165 |
Yellow | 180 |
Grey | 172 |
Tan | 164 |
Cyan | 181 |
Green | 188 |
Mauve | 165 |
Beige | 172 |
Lilac | 178 |
Black | 176 |
Peach | 173 |
Orange | 157 |
The green color has 188 times a six, which is rather high. Indeed, average value is 1,000/6 = 167.
The probability that green dice have at least 188 six is about 4% as we will see below. This is the p-value for the green color result. If we decided to watch the green color before the experiement, then this low p-value would warrant us a scientific publication. Read my blog to see why.
Let's see how unlikely this result is by running our experiment many times.
def get_fraction_one_color(color, dice, k, n, repeat):
success = 0.0
for experiment in range(repeat):
df = dice_experiments(dice, n)
if df.loc['Green','Number of Six'] >= k:
success += 1
return success/repeat
get_fraction_one_color('Green', dice, 188, 1000, 1000)
0.041
This is about 4.1%, i.e. rather unlikely. There must be a reason why green dice favor six that much!
We misinterpreted what we did and our conclusion is misleading. What we did was to run the experiment for 20 colors, then select the color with the highest numpber of six. The p-value should be the probability that at least one color gets at least 188 six.
Let's evaluate this probability experimentally.
def get_fraction_any_color(dice, k, n, repeat):
success = 0.0
for experiment in range(repeat):
df = dice_experiments(dice, n)
if df['Number of Six'].max() >= k:
success += 1
return success/repeat
get_fraction_any_color(dice, 188, 1000, 1000)
0.556
This probability is about 56%, i.e. very likely. There is nothing special about our dice afterall.
This little code may look contrived, but it is not. scientists got publications using a similar methodology, see my blog for details.
Let's compute the above p-values using probability theory.
The probability that k out of n are six is given by the function below. Indeed, in order to get exactly $k$ times a six among $n$ rolls, we must first decide where those six appear in the sequence of 1,000 rolls. There are exactly $n$ choose $k = n!/(k!.(n-k)!$ ways to do it. Then for each of these sequence, the probability that all of the $k$ occurrences are a six is $1/6^k$ and the probability that the remaining $n-k$ occurrences are not a six is $(5/6)^{n-k}$. The probability to get exactly $k$ times a six out of $n$ dice rolls is therefore equal to $1/6^k(5/6)^{n-k}n!/(k!(n-k)!)$.
from scipy.misc import comb
def proba_k_among_n(k, n):
p = 1/6
q = 1-p
result = p**k * q**(n-k) * comb(n, k)
return result
print("%0.4f" % proba_k_among_n(188,1000))
0.0066
The probability that at least k out of n are six is the sum of the probability for each possible value greater than or equal to k.
def proba_at_least_k_among_n(k, n):
proba = 0.0
for i in range(k, n+1):
proba += proba_k_among_n(i, n)
return proba
print("%0.4f" % proba_at_least_k_among_n(188,1000))
0.0401
Probability that one specific color gets at least k six out of n is what we just computed.
def proba_one_color(k, n):
return proba_at_least_k_among_n(k, n)
print("%0.3f" % proba_one_color(188, 1000))
0.040
Probability that at least one color among ncolors gets at least k six out of n
def proba_at_least_one_color(ncolors, k, n):
p = proba_one_color(k, n)
proba_no_color = (1-p)**ncolors
return 1 - proba_no_color
print("%0.3f" % proba_at_least_one_color(20, 188, 1000))
0.559
This is close to what we found experimentally.
A more direct code, using binomial distribution.
from scipy import stats
One experiment gets more than 188 times a six.
stats.binom(1000, 1/6.0).sf(187)
0.040142990286396493
At least one in 20 experiments gets more than 188 times a six.
1 - (stats.binom(1000, 1/6.0).cdf(187)**20)
0.55931241410111465
We wanted to mimick the xkcd comic, hence we looked for a random seed that yields the result we want. This is really bad p-hacking, as we define first the p-value, then find experiment settings that produce it.
def find_seed(dice, k, n, repeat, rounding):
nseed = 0.0
for seed in range(0,repeat,rounding):
np.random.seed(seed)
df = dice_experiments(dice, n)
m = df['Number of Six'].max()
if m == k and m == df.loc['Green','Number of Six']:
return seed
res = find_seed(dice, 188, 1000, 1000000, 1000)
res
75000