Dice Baseball

The March 22, 2019 Riddler asks us to simulate baseball using probabilities from a 19th-century dice game. The rules leave some choices unspecified; the following are my current choices (an early version made different choices that resulted in slightly more runs):

  • On a b-base hit, runners advance b bases, except that a runner on second scores on a 1-base hit.
  • On an "out at first", all runners advance one base.
  • A double play only applies if there is a runner on first; in that case the batter and the runner on first are both out, and other runners advance one base. Otherwise only the batter is out.
  • On a fly out, a runner on third scores; other runners do not advance.
  • On an error, the batter reaches first and all runners advance one base.
  • On a base on balls, only forced runners advance.
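
The base-on-balls rule is the least obvious of these, so here is a minimal sketch of it in isolation. The helper name `walk` is hypothetical (it is not part of the notebook); it assumes the same conventions used later in the notebook: runners are a list of occupied bases, the batter enters as "base 0", and a runner who reaches base 4 has scored.

```python
def walk(runners):
    """Advance only the forced runners on a base on balls (illustrative helper).

    A runner on base r is forced only if every base below r, down to the
    batter at base 0, is occupied.
    """
    runners = runners + [0]  # the batter becomes a runner at "base 0"
    return sorted(r + all(b in runners for b in range(r)) for r in runners)

print(walk([]))         # [1]
print(walk([3]))        # [1, 3]        runner on third is not forced
print(walk([1, 3]))     # [1, 2, 3]     only the runner on first is forced
print(walk([1, 2, 3]))  # [1, 2, 3, 4]  bases loaded: the runner from third scores
```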

I also made some choices about the implementation:

  • I wanted to have one event per batter, so I don't allow "strike" as an event. Rather, I compute the probability of a strikeout (i.e., three "strike" dice rolls in a row before any other event) as (7/36)**3, and check for that first.
  • Note that a dice roll such as (1, 1) is a 1/36 event, whereas (1, 2) is a 2/36 event, because it also represents (2, 1).
  • I'll represent events with the following one letter codes:
    • K, O, o, f, D: strikeout, foul out, out at first, fly out, double play
    • 1, 2, 3, 4: single, double, triple, home run
    • E, B: error, base on balls
  • I'll keep track of runners with a list of occupied bases; runners = [1, 2] means runners on first and second.
  • A runner who advances to base 4 or higher has scored a run (unless there are already 3 outs).
  • The function inning simulates a half inning and returns the number of runs scored.
  • I want to be able to test inning by feeding it specific events, and I also want to generate many innings worth of random events. So I'll make the interface be that I pass in an iterator of events.
  • I'll randomly simulate 1 million innings and store the resulting scores in innings.
  • To simulate a game I just sample 9 elements of innings and sum them.
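
The two probability claims in the list above can be checked by brute-force enumeration. This is only a sketch; the names `rolls` and `p_strikeout` are mine, and the 7/36 chance of a "strike" roll is taken as given from the list.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 ordered rolls of two dice and group them into unordered pairs.
rolls = Counter(tuple(sorted(roll)) for roll in product(range(1, 7), repeat=2))
print(rolls[1, 1], rolls[1, 2])  # 1 2  -> a 1/36 event and a 2/36 event

# Strikeout: three "strike" rolls in a row, each with probability 7/36.
p_strikeout = Fraction(7, 36) ** 3
print(p_strikeout)               # 343/46656, a bit under 1%
```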
In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import random
In [2]:
def our_national_ball_game():
    "An iterator of events sampled from the odds specified in `Our National Ball Game`."
    events = '2111111EEBBOOooooooofffffD334'
    while True:
        yield 'K' if random.random() < (7 / 36) ** 3 else random.choice(events)

def inning(events=our_national_ball_game(), verbose=False) -> int:
    "Simulate a half inning based on events, and return number of runs scored."
    outs = runs = 0 # Inning starts with no outs and no runs,
    runners = []    # ... and with nobody on base
    while True:
        x = next(events)
        if verbose: print(f'outs: {outs}, runs: {runs}, runners: {runners}, event: {x}')
        if x in 'KODof': # strikeout, foul out, double play, out at first, fly out
            outs += 1    # Batter is out
            if x == 'D' and 1 in runners: # double play
                outs += 1
                runners = [r + 1 for r in runners if r != 1]
            elif x == 'o': # out at first (other runners advance)
                runners = [r + 1 for r in runners]
            elif x == 'f' and 3 in runners and outs < 3: # fly out; runner on 3rd scores
                runners.remove(3)
                runs += 1
        else: 
            runners.append(0) # Batter becomes a runner
            if x in '1234':   # single, double, triple, homer
                runners = [r + int(x) + (r == 2) for r in runners]
            elif x == 'E':    # error
                runners = [r + 1 for r in runners]
            elif x == 'B':    # base on balls
                runners = [r + all(b in runners for b in range(r)) for r in runners]
        # See if inning is over, and if not, whether anyone scored
        if outs >= 3:
            return runs
        runs += sum(r >= 4 for r in runners)
        runners = [r for r in runners if r < 4]

Let's peek at some random innings:

In [3]:
inning(verbose=True)
outs: 0, runs: 0, runners: [], event: f
outs: 1, runs: 0, runners: [], event: B
outs: 1, runs: 0, runners: [1], event: o
outs: 2, runs: 0, runners: [2], event: 1
outs: 2, runs: 1, runners: [1], event: 2
outs: 2, runs: 1, runners: [3, 2], event: o
Out[3]:
1
In [4]:
inning(verbose=True)
outs: 0, runs: 0, runners: [], event: 3
outs: 0, runs: 0, runners: [3], event: o
outs: 1, runs: 1, runners: [], event: O
outs: 2, runs: 1, runners: [], event: B
outs: 2, runs: 1, runners: [1], event: o
Out[4]:
1

And we can feed in any events we want to test the code:

In [5]:
inning(iter('2EBD12f'), verbose=True)
outs: 0, runs: 0, runners: [], event: 2
outs: 0, runs: 0, runners: [2], event: E
outs: 0, runs: 0, runners: [3, 1], event: B
outs: 0, runs: 0, runners: [3, 2, 1], event: D
outs: 2, runs: 1, runners: [3], event: 1
outs: 2, runs: 2, runners: [1], event: 2
outs: 2, runs: 2, runners: [3, 2], event: f
Out[5]:
2

That looks good.

Now, simulate a million innings, and then sample from them to simulate a million nine-inning games:

In [6]:
N = 1000000
innings = [inning() for _ in range(N)]
games = [sum(random.sample(innings, 9)) for _ in range(N)]

Finally, display the mean number of runs scored per team per nine-inning game, along with a histogram:

In [7]:
plt.hist(games, ec='black', bins=max(games)-min(games)+1)
sum(games) / N
Out[7]:
14.462798
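
One way to sanity-check a mean like this: by linearity of expectation, the mean runs per nine-inning game should equal 9 times the mean runs per inning, so the two numbers can be compared directly. The sketch below shows the idea on made-up inning scores (`toy_innings` and `toy_games` are hypothetical names, and the data is not output from the simulation).

```python
import random
from statistics import mean

random.seed(0)  # arbitrary seed, only so the sketch is repeatable

# Made-up inning scores for illustration; NOT results from the simulation.
toy_innings = [0, 0, 0, 1, 0, 2, 0, 0, 5, 1, 0, 3] * 100

# Same sampling scheme as the notebook: 9 innings per game, without replacement.
toy_games = [sum(random.sample(toy_innings, 9)) for _ in range(20_000)]

# The two values should agree up to sampling noise.
print(mean(toy_games), 9 * mean(toy_innings))
```

With the real data, the same comparison would be between `sum(games) / N` and `9 * sum(innings) / N`.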