In a gameshow, contestants try to guess which of 3 closed doors contains a cash prize (goats are behind the other two doors). Of course, the odds of choosing the correct door are 1 in 3. As a twist, the host of the show occasionally opens a door after a contestant makes his or her choice. This door is always one of the two the contestant did not pick, and is also always one of the goat doors (note that it is always possible to do this, since there are two goat doors). At this point, the contestant has the option of keeping his or her original choice, or switching to the other unopened door. The question is: is there any benefit to switching doors? The answer surprises many people who haven't heard the question before.
We can answer the problem by running simulations in Python. We'll do it in several parts.
First, write a function called simulate_prizedoor. This function will simulate the location of the prize in many games -- see the detailed specification below:
import IPython
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib
import sklearn
import requests
import networkx as nx
import BeautifulSoup
import mrjob
import pattern
%matplotlib inline
# this actually imports matplotlib
import matplotlib.pyplot as plt
IPython version: 1.0.0 (need at least 1.0)
Numpy version: 1.7.1 (need at least 1.7.1)
SciPy version: 0.12.0 (need at least 0.12.0)
Pandas version: 0.11.0 (need at least 0.11.0)
Matplotlib version: 1.2.1 (need at least 1.2.1)
Scikit-Learn version: 0.13.1 (need at least 0.13.1)
requests version: 1.2.3 (need at least 1.2.3)
NetworkX version: 1.7 (need at least 1.7)
BeautifulSoup version: 3.2.1 (need at least 3.2)
Mr Job version: 0.4 (need at least 0.4)
Pattern version: 2.6 (need at least 2.6)
"""
Function
--------
simulate_prizedoor
Generate a random array of 0s, 1s, and 2s, representing
hiding a prize between door 0, door 1, and door 2
Parameters
----------
nsim : int
The number of simulations to run
Returns
-------
sims : array
Random array of 0s, 1s, and 2s
Example
-------
>>> print simulate_prizedoor(3)
array([0, 0, 2])
"""
AR NOTES
The following code should now work for any number of games and any number of doors.
One way to keep the number of games and the number of doors open-ended is to query the user with input(), but this barfs in IPython. E.g.: number_of_games = input('How many games would you like to run? '), and likewise for num_doors and num_goats_revealed.
# for now we can just set these values at the head of the code file
number_of_games = 10  # raise to 10000 for the final stay-vs-switch comparison
number_of_doors = 3
number_of_goats_revealed = number_of_doors - 2
def simulate_prizedoor(nsim):
    # draw one random prize door per game from 0 .. number_of_doors-1
    # (the upper bound of np.random.randint is exclusive, which accounts for 0-indexing)
    sims = [np.random.randint(0, number_of_doors) for _ in range(0, nsim)]
    return sims
#your code here
# assign list of doors to 'simDoors'
simDoors = simulate_prizedoor(number_of_games)
# display output
# print 'The prize doors:', simDoors
The prize doors: [2, 2, 2, 0, 1, 2, 1, 0, 0, 0]
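The list comprehension above calls the random number generator once per game. As a hedged alternative, the whole array can be drawn in a single call. This is only a sketch, written for Python 3 and a recent NumPy rather than the Python 2 setup of this notebook; the name simulate_prizedoor_vec and the ndoors and seed parameters are assumptions made for illustration.

```python
import numpy as np

def simulate_prizedoor_vec(nsim, ndoors=3, seed=None):
    """Return an array of nsim prize doors drawn uniformly from 0..ndoors-1."""
    # hypothetical helper: a seeded Generator keeps runs reproducible
    rng = np.random.default_rng(seed)
    # the upper bound of integers() is exclusive, so doors are 0..ndoors-1
    return rng.integers(0, ndoors, size=nsim)

prizes = simulate_prizedoor_vec(10, seed=0)
print(prizes)
```

The result is a NumPy array rather than a list, which matches the "array" return type in the specification.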
Next, write a function that simulates the contestant's guesses for nsim simulations. Call this function simulate_guess. The specs:
"""
Function
--------
simulate_guess
Return any strategy for guessing which door a prize is behind. This
could be a random strategy, one that always guesses 2, whatever.
Parameters
----------
nsim : int
The number of simulations to generate guesses for
Returns
-------
guesses : array
An array of guesses. Each guess is a 0, 1, or 2
Example
-------
>>> print simulate_guess(5)
array([0, 0, 0, 0, 0])
"""
#your code here
def simulate_guess(nsim):
    # guess a random door for each game - same as the simulate_prizedoor routine
    # (the upper bound of np.random.randint is exclusive)
    guesses = [np.random.randint(0, number_of_doors) for _ in xrange(nsim)]
    return guesses
simGuesses = simulate_guess(number_of_games)
# display output
# print 'The guesses: ', simGuesses
The guesses: [2, 2, 2, 1, 1, 0, 1, 1, 2, 0]
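The specification allows any guessing strategy, including one that always picks the same door. Because the prize location is uniform, a fixed guess should win just as often as a random one. A minimal Python 3 sketch of that strategy follows; simulate_guess_fixed and its door parameter are hypothetical names, not part of the assignment.

```python
import numpy as np

def simulate_guess_fixed(nsim, door=0):
    """Return an array of nsim guesses that all pick the same door."""
    return np.full(nsim, door, dtype=int)

guesses = simulate_guess_fixed(5)
print(guesses)  # [0 0 0 0 0]
```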
Next, write a function, goat_door, to simulate randomly revealing one of the goat doors that a contestant didn't pick.
"""
Function
--------
goat_door
Simulate the opening of a "goat door" that doesn't contain the prize,
and is different from the contestant's guess
Parameters
----------
prizedoors : array
The door that the prize is behind in each simulation
guesses : array
The door that the contestant guessed in each simulation
Returns
-------
goats : array
The goat door that is opened for each simulation. Each item is 0, 1, or 2, and is different
from both prizedoors and guesses
Examples
--------
>>> print goat_door(np.array([0, 1, 2]), np.array([1, 1, 1]))
array([2, 2, 0])
"""
#your code here
# FUNCTION: goat_door
# input: arrays of prize doors and guesses
# output: list of revealed goat doors (one list per game)
def goat_door(prizedoors, guesses):
    # initialize goats list
    goats = []
    # QUESTION
    # I understand how to vectorize single operations in Python, such as computing
    # doors/guesses and populating arrays. But can you vectorize/array-orient a series
    # of commands, like with the apply family of functions in R?
    # LOOP
    # iterate through each 'game' where there's a prize door and a guessed door
    # (use the length of the input rather than the global number_of_games)
    for i in xrange(len(prizedoors)):
        # make a list with all possible door choices
        doors = list(range(0, number_of_doors))
        # for readability
        this_prize = prizedoors[i]
        this_guess = guesses[i]
        # remove prize and guess from doors and keep what's left over in goats
        # (slightly redundant if prize == guess, but when len(doors) gets big, this is the easiest omnibus implementation)
        if this_prize in doors: doors.remove(this_prize)
        if this_guess in doors: doors.remove(this_guess)
        # note: this always reveals the lowest-numbered eligible doors, not a random pick
        goats.append(doors[0:number_of_goats_revealed])
    return goats
# execute
simGoats = goat_door(simDoors, simGuesses)
# display output
#print 'The prize doors: ', simDoors
#print 'The guesses: ', simGuesses
#print 'The goat doors: ', simGoats
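On the question of vectorizing a series of steps: for the three-door case, one possible approach is to give every (game, door) pair a random score, rule out the prize and guessed doors, and take the best remaining door per row. The Python 3 sketch below assumes exactly three doors and one revealed goat; goat_door_vec and its seed parameter are illustrative names, and it returns a flat array as in the specification rather than the list of lists used by the loop version.

```python
import numpy as np

def goat_door_vec(prizedoors, guesses, seed=None):
    """For each game, pick a random door that is neither the prize nor the guess."""
    prizedoors = np.asarray(prizedoors)
    guesses = np.asarray(guesses)
    nsim = len(prizedoors)
    rng = np.random.default_rng(seed)
    # one random score in [0, 1) per (game, door)
    scores = rng.random((nsim, 3))
    rows = np.arange(nsim)
    # rule out the prize door and the guessed door with a score below any random value
    scores[rows, prizedoors] = -1.0
    scores[rows, guesses] = -1.0
    # the highest-scoring remaining door is the one the host opens
    return scores.argmax(axis=1)

goats = goat_door_vec([0, 1, 2], [1, 1, 1], seed=0)
print(goats)
```

When the guess equals the prize, two doors stay eligible and the random scores break the tie, so the reveal is random in that case.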
Write a function, switch_guess, that represents the strategy of always switching a guess after the goat door is opened.
"""
Function
--------
switch_guess
The strategy that always switches a guess after the goat door is opened
Parameters
----------
guesses : array
Array of original guesses, for each simulation
goatdoors : array
Array of revealed goat doors for each simulation
Returns
-------
The new door after switching. Should be different from both guesses and goatdoors
Examples
--------
>>> print switch_guess(np.array([0, 1, 2]), np.array([1, 2, 1]))
array([2, 0, 0])
"""
#your code here
def switch_guess(guesses, goatdoors):
    # initialize list for new (i.e. switched) choices
    new_picks = []
    # LOOP
    # iterate through all games, switch choice in each one
    for i in xrange(len(guesses)):
        doors = list(range(0, number_of_doors))
        these_goats = goatdoors[i]
        this_guess = guesses[i]
        # go through each goat in the revealed goat doors
        # and remove that value from doors
        for goat in these_goats:
            if goat in doors: doors.remove(goat)
        # then remove the guess value from doors
        if this_guess in doors: doors.remove(this_guess)
        # what's left is the door you'd switch to
        new_picks.append(doors[0])
    return new_picks
simSwitches = switch_guess(simGuesses, simGoats)
# display output
#print 'The guesses: ', simGuesses
#print 'The prize doors: ', simDoors
#print 'The goat doors: ', simGoats
#print 'New picks: ', simSwitches
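For exactly three doors there is a compact arithmetic shortcut: the door numbers 0, 1, and 2 sum to 3, so the door that is neither the guess nor the opened goat door is 3 - guess - goat. A small Python 3 sketch, assuming flat arrays of guesses and goat doors as in the specification (switch_guess_vec is a hypothetical name):

```python
import numpy as np

def switch_guess_vec(guesses, goatdoors):
    """Return the door left after removing the guess and the opened goat door."""
    # doors 0, 1, 2 sum to 3, so the remaining door is 3 - guess - goat
    return 3 - np.asarray(guesses) - np.asarray(goatdoors)

switched = switch_guess_vec([0, 1, 2], [1, 2, 1])
print(switched)  # [2 0 0]
```

This shortcut does not generalize to more doors; the loop version handles that case.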
Last function: write a win_percentage function that takes an array of guesses and prizedoors, and returns the percent of correct guesses.
"""
Function
--------
win_percentage
Calculate the percent of times that a simulation of guesses is correct
Parameters
-----------
guesses : array
Guesses for each simulation
prizedoors : array
Location of prize for each simulation
Returns
--------
percentage : number between 0 and 100
The win percentage
Examples
---------
>>> print win_percentage(np.array([0, 1, 2]), np.array([0, 0, 0]))
33.333
"""
#your code here
def win_percentage(guesses, prizedoors):
    wins = 0
    # count the games in which the guess matches the prize door
    for i in xrange(len(guesses)):
        this_guess = guesses[i]
        this_prizedoor = prizedoors[i]
        if this_guess == this_prizedoor: wins += 1
    return float(wins) / len(guesses) * 100
# keep the win rates as numbers so they can be compared numerically later
no_switch_winrate = win_percentage(simGuesses, simDoors)
switch_winrate = win_percentage(simSwitches, simDoors)
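The same calculation can be written as one array expression: comparing two arrays elementwise gives booleans, and their mean is the fraction of correct guesses. A Python 3 sketch under that assumption (win_percentage_vec is an illustrative name):

```python
import numpy as np

def win_percentage_vec(guesses, prizedoors):
    """Return the percentage (0 to 100) of games where the guess equals the prize door."""
    matches = np.asarray(guesses) == np.asarray(prizedoors)
    return 100.0 * np.mean(matches)

pct = win_percentage_vec([0, 1, 2], [0, 0, 0])
print(round(pct, 3))  # 33.333
```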
Now, put it together. Simulate 10000 games where the contestant keeps the original guess, and 10000 games where the contestant switches doors after a goat door is revealed. Compute the percentage of time the contestant wins under either strategy. Is one strategy better than the other?
#your code here
print 'If you stuck with your original choice always, you won', no_switch_winrate, '% of the time.'
print 'If you changed your original choice always, you won', switch_winrate, '% of the time.'
phrases = {
    'switch': 'So it turns out that switching is better than staying.',
    'stay': 'So it turns out that staying is better than switching.',
    'same': 'So it turns out that switching and staying are about the same.'
}
# compare numerically; comparing the string forms would order them character by character
if float(switch_winrate) > float(no_switch_winrate): print phrases['switch']
elif float(switch_winrate) < float(no_switch_winrate): print phrases['stay']
else: print phrases['same']
If you stuck with your original choice always, you won 34.04 % of the time.
If you changed your original choice always, you won 65.96 % of the time.
So it turns out that switching is better than staying.
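For comparison, the whole experiment can also be sketched as one self-contained function using only the standard library. This is a Python 3 sketch, not the notebook's own code: play_games and its seed parameter are assumed names, and the host is assumed to pick at random when two goat doors are available.

```python
import random

def play_games(nsim, seed=None):
    """Return (stay_pct, switch_pct) over nsim three-door games."""
    rng = random.Random(seed)
    stay_wins = 0
    switch_wins = 0
    for _ in range(nsim):
        prize = rng.randrange(3)
        guess = rng.randrange(3)
        # the host opens a goat door that is neither the prize nor the guess
        goat = rng.choice([d for d in range(3) if d != prize and d != guess])
        # switching means taking the one door that is neither guessed nor opened
        switched = next(d for d in range(3) if d != guess and d != goat)
        stay_wins += (guess == prize)
        switch_wins += (switched == prize)
    return 100.0 * stay_wins / nsim, 100.0 * switch_wins / nsim

stay_pct, switch_pct = play_games(10000, seed=1)
print("stay: %.2f%%  switch: %.2f%%" % (stay_pct, switch_pct))
```

In every game exactly one of the two strategies wins, so the two percentages always add up to 100.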
Many people find this answer counter-intuitive (famously, PhD mathematicians have incorrectly claimed the result must be wrong. Clearly, none of them knew Python).
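The simulated rates can be checked against an exact count. Assuming the prize and the guess are independent and uniform over the three doors, there are nine equally likely (prize, guess) pairs; staying wins when they match, and switching wins in every other case because the host is forced to open the only remaining goat door. A Python 3 sketch of that enumeration (exact_win_probabilities is a hypothetical name):

```python
from fractions import Fraction
from itertools import product

def exact_win_probabilities():
    """Return (P(stay wins), P(switch wins)) as exact fractions."""
    stay = Fraction(0)
    switch = Fraction(0)
    for prize, guess in product(range(3), repeat=2):
        weight = Fraction(1, 9)  # each (prize, guess) pair is equally likely
        if guess == prize:
            # the first pick was right, so keeping it wins
            stay += weight
        else:
            # the host must open the only other goat door, so the
            # remaining unopened door holds the prize: switching wins
            switch += weight
    return stay, switch

stay, switch = exact_win_probabilities()
print(stay, switch)  # 1/3 2/3
```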
One of the best ways to build intuition about why opening a goat door affects the odds is to re-run the experiment with 100 doors and one prize. If the game show host opens 98 goat doors after you make your initial selection, would you want to keep your first pick or switch? Can you generalize your simulation code to handle the case of n doors?
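One hedged way to approach the n-door case: if the host opens n - 2 goat doors, only the original guess and one other door stay closed. That other door is the prize whenever the first guess was wrong, and a random goat door otherwise. The Python 3 sketch below simulates this directly; simulate_n_doors and its parameters are illustrative names.

```python
import random

def simulate_n_doors(nsim, ndoors=100, seed=None):
    """Return (stay_pct, switch_pct) when the host opens ndoors - 2 goat doors."""
    rng = random.Random(seed)
    stay_wins = 0
    switch_wins = 0
    for _ in range(nsim):
        prize = rng.randrange(ndoors)
        guess = rng.randrange(ndoors)
        if guess == prize:
            # every other door hides a goat; the host leaves one of them closed
            other = rng.choice([d for d in range(ndoors) if d != guess])
        else:
            # the host cannot open the prize door, so it is the one left closed
            other = prize
        stay_wins += (guess == prize)
        switch_wins += (other == prize)
    return 100.0 * stay_wins / nsim, 100.0 * switch_wins / nsim

stay_pct, switch_pct = simulate_n_doors(10000, ndoors=100, seed=1)
print("stay: %.2f%%  switch: %.2f%%" % (stay_pct, switch_pct))
```

With 100 doors, staying should win about 1% of the time and switching about 99%, which makes the advantage of switching much easier to see than in the three-door game.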