An Introduction to Probabilistic Programming in FSharp

Drawing

Introduction

A combination of pursuing a Master's in Data Science combined with an awesome trip to Open FSharp in San Francisco convinced me to contribute to this year's FSharp Advent Calendar; my contribution from last year can be found here. One of the workshops presented at Open FSharp entitled "Interactive Computing with F# Jupyter" by Colin Gravill covered making use of IFSharp Notebooks, the F# analog of Python's Jupyter notebooks, for computational experiments; this workshop truly blew my mind by highlighting the amazing capability of utilizing the power of F# in a "Notebook-esque" environment.

Probabilistic Programming has always been a topic that has caused me to break out in metaphorical hives. Therefore, I decided to pursue the selfish goal of trying to demystify this topic as much as possible by writing a blog post. I hope you enjoy it as much as I researching and developing this notebook!

What is Probabilistic Programming?

Probabilistic Programming is a software methodology for constructing probabilistic models to be used to make probabilistic inferences.

Simple and succint domain driven modelling techniques in conjunction with the lucid mathematical expressibility of F# makes the language and the concomitant ecosystem a perfect environment to start developing probabilistic models. The two experiments I cover in this blog post that elucidate on these qualities and try to give a taste of Probabilistic Programming are the Monty Hall Problem and Approximate Bayesian Computation.

The Monty Hall Problem

Drawing

The Monty Hall Problem involves a contestant at a game show choosing one of three doors in an effort to choose the door that hides a prize. Once the player chooses a door, the game show host opens a door that doesn't have a prize behind it and gives the contestant a choice to switch or stay with the chosen door.

Intuitively, staying with the selected door would make sense since switching wouldn't make a difference to the end result. However, after the host opens a door without the prize behind it, there is more information available to the contestant to make a decision on and hence, this 50/50 chance induced by switching improves the overall favors of winning.

This problem has stumped even the most astute mathematicians such as Paul Erdős and so, seemed like a perfect problem to solve to demonstrate the power of domain driven modelling and constructing probabilistic models in F#. In this section, I plan to develop the Monty Hall game domain and then take a probabilistic approach of proving that switching doors after the reveal is more favorable than staying with the selected door.

Preliminaries

Our first step is to implement some building blocks that will be used while developing the domain.

Probability

Probability is defined as the likelihood of a given event's occurrence that can be modelled as a double with valid values between 0 and 1 (both inclusive).

In [1]:
type Probability = private Probability of double

module Probability =
    let create ( probability : double ) : Probability option =
        if probability >= 0.0 && probability <= 1.0 then Some ( Probability probability )
        else None

    let getProbability ( Probability probability ) = probability
Example
In [2]:
// Creating
let prob = Probability.create ( 1.0 / 2.0 )

// Getting the value
match prob with
    | Some p -> Probability.getProbability p
    | None   -> nan
Out[2]:
0.5

Probability Distribution

A Probability Distribution can be modelled as a sequence of pairs of event occurences and probabilities.

In [3]:
type Distribution<'T> = seq< 'T * Probability >
Uniform Distribution

For the Monty Hall problem we will be making use of the Uniform Distribution as from the perspective of the contestant, the prize could be behind any of the doors available to be opened. The Uniform Distribution assumes an equal and constant probability of all events.

In [4]:
module UniformDistribution = 

    let private uniformRandomNumberGenerator = System.Random()

    let create ( events : seq< int > ) : Distribution<int>  = 
        seq {
            let countOfSequence = Seq.length events
            let distributionSequence =
                events
                |> Seq.map( fun e -> e, Probability.create ( 1.0 / double( countOfSequence )))
                |> Seq.filter( fun ( e, p ) -> p.IsSome )
                |> Seq.map( fun (e, p) -> e, p.Value )
            yield! distributionSequence
        }

    let drawOneFromDistribution ( uniformDistribution : Distribution<int> ) : int * Probability = 
        let countOfDistribution = 
            Seq.length uniformDistribution
        let idx = uniformRandomNumberGenerator.Next( 0, countOfDistribution )
        uniformDistribution
        |> Seq.item idx
        
    let drawOneFromEvents ( events : seq< int > ) : int * Probability = 
        let distribution = create events
        drawOneFromDistribution distribution
Example: Dice Rolls
In [5]:
let diceRollDistribution = UniformDistribution.create [ 1..6 ]
printfn "%A" ( Seq.toList diceRollDistribution )
[(1, Probability 0.1666666667); (2, Probability 0.1666666667);
 (3, Probability 0.1666666667); (4, Probability 0.1666666667);
 (5, Probability 0.1666666667); (6, Probability 0.1666666667)]
In [6]:
printfn "%A" ( UniformDistribution.drawOneFromDistribution diceRollDistribution )
(3, Probability 0.1666666667)

Game Domain

The game domain consists of 3 main elements:

  1. Game Outcome: The contestant can either Win or Lose.
  2. Doors: The game consists of 3 doors. To represent the initial state of the doors, we add in an "NA" door representing our "null" state.
  3. Game State: The Game State represents the state of the game at any instance of the game consisting of:
    • The Door Chosen by the Host as the Winning door
    • The Door Chosen by the Contestant
    • The Door Chosen by the Host to Open after the contestant has chosen the door.
In [7]:
type Outcome   = Win | Lose
type Door      = A | B | C | NA
type GameState = { winningDoor : Door; chosenDoor : Door; openedDoor : Door }

Game States

Drawing

The game states are the following:

  1. Start of the game
  2. The host hides the prize behind a random door
  3. The contestant chooses a door believed to hide the prize
  4. The host opens one of the doors that doesn't have the prize and isn't the door chosen by the contestant.
  5. The contestant chooses one of two strategies:
    • Switch doors
    • Stay with the currently chosen door
  6. Prize is revealed and the outcome of the game is made known.
In [8]:
let doors = [ A; B; C ]
let startGameState : GameState  = 
    { winningDoor = NA; chosenDoor = NA; openedDoor = NA }
In [9]:
let hidePrize ( state : GameState ) : GameState =
    let winningDoorIdx = fst ( UniformDistribution.drawOneFromEvents ( [ 1..3 ] )) - 1
    { state with winningDoor = doors.[ winningDoorIdx ] }
    
hidePrize startGameState
Out[9]:
{winningDoor = B;
 chosenDoor = NA;
 openedDoor = NA;}

The start of the game and hidding of the prize is naturally composed into the initializeGame.

In [10]:
let initializeGame : GameState =
    hidePrize startGameState
    
initializeGame
Out[10]:
{winningDoor = C;
 chosenDoor = NA;
 openedDoor = NA;}
In [11]:
let chooseDoor ( state : GameState ) ( door : Door ) : GameState =
    { state with chosenDoor = door }
    
let chooseRandomDoor ( state : GameState ) : GameState =
    let chosenDoorIdx = fst ( UniformDistribution.drawOneFromEvents ( [ 1..3 ] )) - 1
    { state with chosenDoor = doors.[ chosenDoorIdx ] }

let chosenRandomDoor = chooseRandomDoor initializeGame
chosenRandomDoor
Out[11]:
{winningDoor = C;
 chosenDoor = C;
 openedDoor = NA;}
In [12]:
let openDoor ( state : GameState ) : GameState =
    // Choose the Non-Winning Door that hasn't been chosen by the contestant.
    let doorToOpen = 
        doors 
        |> List.except [ state.winningDoor; state.chosenDoor ]
        |> List.item 0
    { state with openedDoor = doorToOpen }
    
let postOpenDoor = openDoor ( chosenRandomDoor )
postOpenDoor
Out[12]:
{winningDoor = C;
 chosenDoor = C;
 openedDoor = A;}

Next, we define a strategy once the door is opened by host. The two strategies are switching or staying with the chosen door.

In [13]:
type Strategy = GameState -> Outcome

let switch ( state : GameState ) : Outcome =
    let doorToSwitchTo = 
        doors
        |> List.except [ state.chosenDoor; state.openedDoor ]
        |> List.item 0
    if doorToSwitchTo = state.winningDoor then Win
    else Lose
    
let stay ( state : GameState ) : Outcome = 
    if state.chosenDoor = state.winningDoor then Win
    else Lose

printfn "On Switch: %A" ( switch postOpenDoor )
printfn "On Stay: %A"   ( stay postOpenDoor )
On Switch: Lose
On Stay: Win

Running Simulations

Now that we have the domain model and all the game states and strategies well defined, we continue by running 10,000 simulations of the contestant strategy of switching doors or staying with the current door.

In [14]:
let simulationCount = 10000

let simulateMontyHall ( strategy : Strategy ) : Outcome = 
    let game = 
        initializeGame 
        |> chooseRandomDoor
        |> openDoor
    strategy( game )

simulateMontyHall switch
Out[14]:
Win

Plotting the Results

Now that we have all the pieces in place to run the simulation, we proceed to pull in the XPlot library to help us plot our results via Paket.

In [15]:
#load "Paket.fsx"

Paket.Package([ "XPlot.Plotly" ])

#load "XPlot.Plotly.fsx"
#load "XPlot.Plotly.Paket.fsx"

open XPlot.Plotly