Solution to a problem posted at
https://www.reddit.com/r/statistics/comments/4csjee/finding_pab_given_two_sets_of_data/
Copyright 2016 Allen Downey
MIT License: http://opensource.org/licenses/MIT
from __future__ import print_function, division
from numpy.random import choice
from collections import Counter
from collections import defaultdict
Roll six 6-sided dice:
def roll(die):
return choice(die, 6)
die = [1,2,3,4,5,6]
roll(die)
array([2, 1, 2, 1, 1, 1])
Count how many times each outcome occurs and score accordingly:
def compute_score(outcome):
counts = Counter(outcome)
dd = defaultdict(list)
[dd[v].append(k) for k, v in counts.items()]
return len(dd[max(dd)])
compute_score([1,1,1,1,1,1])
1
Run many times and accumulate scores:
n = 100000
scores = [compute_score(roll(die)) for _ in range(n)]
Print the percentages of each score:
for score, freq in sorted(Counter(scores).items()):
print(score, 100*freq/n)
1 59.178 2 35.384 3 3.867 6 1.571
Or even better, just enumerate the possibilities.
from itertools import product
die = [1,2,3,4,5,6]
counts = Counter(compute_score(list(outcome)) for outcome in product(*[die]*6))
n = sum(counts.values())
for score, freq in sorted(counts.items()):
print(score, 100*freq/n)
1 59.2335390947 2 35.3652263374 3 3.85802469136 6 1.54320987654