Bayesian Inference in FSharp

Introduction

For my F# Advent submission this year, I decided to implement an extremely simple Bayesian Statistical Framework to be used for inference. The purpose of statistical inference is to infer characteristics of a population via samples. The "Bayesian" prefix refers to making use of the Bayes Theorem to conduct inferential statistics.

image.png

I like to think of Statistical Inference via a pasta cooking analogy: hypothetically, assume you lose the cooking instructions of dried pasta you are trying to cook. You start boiling the water and don't know when the pasta is al-dente. One way to check is to careful remove a piece or two and test if it is cooked to your liking. Through this process, you'll know right away if the pasta needs to cook more or is ready to be sauce'd up. Similarly, inference deals with trying to figure out characteristics (is the pasta ready?) of a population (all the pasta) via samples (a single or few pieces of the pasta of your choice).

image.png

F#, as a language, didn't fail to deliver an awesome development experience! Admittedly, I hadn't used F# much in the past couple of months because of my deep dives into R and Python. However, within little to no-time a combination of muscle memory and ease of usage bumped up my productivity. The particular aspects of the language that made it easy to develop a Bayesian Statistical Framework are Pattern Matching and Immutable Functional Data Structures such as Records and Discriminated Unions that make expressing the domain succinctly and lucidly not only for the developer but also the reader.

Motivation

I recently received a Masters in Data Science but discovered that none of my courses dove deep into Bayesian Statistics. Therefore, motivated by a penchant desire to fully understand the differences between frequentist approaches and Bayesian ones to eventually gain an understanding of Bayesian Neural Networks. I, consequently, set out to dive deep into the world of Bayesian Statistics.

While reading "A First Course in Bayesian Statistical Methods" by Peter Hoff (highly recommend this book to those with a strong mathematical background) and this playlist by Ben Lambert, I discovered pymc3, a Probabilistic Programming library in Python. Pymc3, in my opinion, has some great abstractions that I wanted to implement myself in a functional-first setting.

The best way to learn a new statistical algorithm is my implementing it and the best way to learn something well is to teach it to others; this submission is a result of that ideology.

Bayes Theorem

The crux of Bayesian Inference lies in making use of the Bayes Theorem to get the Posterior:

\begin{align} p(\theta \mid y) = \frac{p(y \mid \theta) p(\theta)}{p(y)} \end{align}

From an inferential perspective, here are the interpretations of the terms of theorem:

Term Mathematical Representation Definition
Variable to Infer $\theta$ The variable we are trying to infer details about from a population i.e. trying to figure out if that variable is characteristic of the population.
Observations $y$ Observations obtained to validate our hypothesis that $\theta$ is characteristic of the true population.
Prior $p(\theta)$ The likelihood of the hypothesis that $\theta$ represents the true population before obtaining any observations
Likelihood $p(y \mid \theta)$ Also known as the Sampling Model, is the likelihood that $y$ would be the outcome of our experiment if we knew $\theta$ was representative of the true population.
Evidence $p(y)$ Likelihood of observing the data we have collected in all circumstances regardless of our hypothesis. This quantity can also be interpretted as the normalizing constant that causes our posterior probability to be a real number between 0.0 and 1.0. It's important to note that this quantity, out of the box, doesn't depend on the variable we are trying to infer.
Posterior $p(\theta \mid y)$ Likelihood of our hypothesis that $\theta$ is characteristic of the true population after observing the data or in light of new information.

In my opinion, Bayes Theorem resonates with the Scientific Method as we are effectively trying to quantify the validity of a hypothesis in light of new information i.e. compute the posterior distribution. As an important observational aside, the type of the output of Bayes Theorem i.e. the type of is a probability distribution rather than a point-estimate, which has some consequences in practice.

Bayesian Inference in Practice

In practice, the denominator or the Evidence i.e. the normalization constant of the Bayes theorem is a pain to calculate. For example, for continuous variables, Bayes theorem has the following form via an application of the Law of Total Probability:

\begin{align} p(\theta \mid y) = \frac{p(y \mid \theta) p(\theta)}{\int_{\Theta}p(y \mid \tilde{\theta}) p(\tilde{\theta}) \; d\tilde{\theta}} \end{align}

That integral in the denominator is UGLY and sometimes computational infeasible and so, approximation techniques must be used to get to generate the posterior distribution: enter Markov-Chain Monte Carlo (MCMC) methods taking advantage of the fact that the denominator is a normalization constant i.e. the posterior isn't proportional to the evidence but is proportional to the product of the prior and likelihood:

\begin{align} p(\theta \mid y) \propto p(y \mid \theta) p(\theta) \end{align}

Markov-Chain Monte-Carlo (MCMC) Algorithms

MCMC algorithms help avoid the issue of the ugly sometimes infeasible denominator of Bayes Theorem by giving us approximations of the posterior distribution. For the sake of clarity to give a definition for these algorithms, let's break up the conjugated terms:

Monte-Carlo

This term refers to fact that we make use of random numbers via a random number generator to approximate the posterior distribution to some reasonable degree of precision. The premise for the use of the Monte-Carlo method is that of the Law of Large Numbers, which states that the average of the results of an experiment of a large number of iterations or trials tends to get closer to the expected value.

The use case here is the large number of iterations guided by random number generators to cleverly approximate the posterior distribution will eventually resemble the posterior distribution.

Markov-Chain

"The future only depends on the present, not the past" - Peter Hoff (knicked this from Peter Hoff's AWESOME book mentioned before) encapsulates the idea behind Markov-Chains. Markov-Chains are a sequence of events whose probability is only a function of the previous event. To a computer scientist, a Markov-Chain can be thought of as a linked list (Thanks Steve!) of approximations whose values via Monte-Carlo methods typify the posterior.

Metropolis Hastings Algorithm

The Metropolis Hastings Algorithm is one choice of a MCMC Algorithm that can be used to approximate the posterior distribution. This algorithm, for each chain in the Markov-Chains, decides at each step or iteration if a random move in parameter space to approximate the posterior is worth keeping or rejecting based on a criteria that'll describe later. The specific flavor of Metropolis Hastings I make use of is the Symmetric Metropolis Hastings algorithm and I'll elucidate on its implementation in a section below.

Goals

The goals of this submission are to develop the following:

  1. A Bayesian Domain Specific Language (DSL) and Its Parser that a statistician with no programming experience can understand and use to specify a model and its parameters. Once our user specifies the model and the parameters, the parser associated with the Bayesian DSL converts this model into something digestable by the framework.
  2. A Bayesian Network from the Parsed Bayesian DSL addressing the simplest case i.e. one random variable representing the prior and one for the likelihood in the form of a Directed Acylical Graph (DAG).
  3. Symmeteric Metropolis-Hastings, an MCMC algorithm that makes use of the Bayesian DSL representation to approximate the posterior distribution.

Here is the proposed workflow I have developed in this submission; I will continue to add to it as time goes on but it is a start: image

Getting Setup

The inferential logic will require minimal dependencies:

Dependency Use
XPlot.Plotly Used for charting the histogram of the distributions.
MathNet.Numerics Used for the out-of-the-box statistical distributions.
Newtonsoft.Json Parsing the parameters of the Bayesian DSL.

I wrote this notebook via the .net-fsharp kernel via .NET Interactive and found it great to use! This blogpost was pretty helpful to get me started.

In [1]:
// Dependencies
#r "nuget: XPlot.Plotly"
#r "nuget: MathNet.Numerics"
#r "nuget: Newtonsoft.Json"

// Imports 
open XPlot.Plotly
open MathNet.Numerics
open Newtonsoft.Json
Installing package XPlot.Plotly.done!
Installing package MathNet.Numerics.done!
Installing package Newtonsoft.Json.done!

A Bayesian Domain Specific Language (DSL) and Its Parser

plan1

The goal of this section is to develop a Bayesian Domain Specific Language (DSL) that a statistician with no programming experience can understand and use to specify a model and its parameters.

Drawing inspiration from Stan and pymc3 tutorials such as this one, I wanted this DSL to be extremely simple, albeit, complete and therefore, the components I found that describe each random variable of the Bayesian Model are:

  1. The Name of the Random Variable
  2. Conditionals of the Random Variables i.e. what other variables are given to be true to completely specify the random variable
  3. The distribution associated with the random variable.
  4. The parameters as variables of the distribution e.g. for a Normal Distribution, the parameters will be $\mu$ and $\sigma$, the mean and the variance respectively.
  5. The observed data for the Random Variable - if the random variable has
  6. A map of the parameters to constants to decouple the model from the values associated with the model.

I also wanted this DSL to be a first class citizen in the eyes of a Bayesian statistician that specifies models using probability distributions.

Format

Random Variable Name [|Comma-separated Conditionals] ~ Distribution(Comma-separated Parameters without spaces) [: observed]

The details enclosed in [] imply optionality.

Example

θ ~ Gamma(a,b)
Y|θ ~ Poisson(θ) : observed1
Z|Y,θ ~ Beta(θ, Y) : observed2

The Domain

The domain or the types associated with the representation of the ParsedRandomVariable representing a single line in the specified model and the ParsedBayesianModel is a list of these.

In [2]:
type ParsedRandomVariable = 
    { Name             : string
      Conditionals     : string list
      Distribution     : string
      Parameters       : string list
      Observed         : string option }

type ParsedBayesianModel = ParsedRandomVariable list

Parsing Logic

The parsing logic for each line in the user specified model is as follows:

  1. Split the line by spaces.
  2. Extract the Name and the Conditionals from the first part of the model split on the tilde (~). Random Variable Name | [Conditionals].
  3. From the second part of the model split on the tilde, get the name of the distribution and its associated parameters along with the optionally available observed variable.

F#'s awesome pattern matching was a lifesaver here not only for it's ease of use but it's automatic ability to specify the failure cases.

In [3]:
open System

// Format: RVName [|Conditionals] ~ Distribution(Parameters) [: observed] 
// [] -> optional
// NOTE: There can't be any spaces in the distribution parameters.
let parseLineOfModel (lineInModel : string) : ParsedRandomVariable = 
 
    // Helper fn to split the string based on a variety of type of delimiters.
    // Resultant type is a list of strings to feed in for the pattern matching.
    let splitToList (toSplit : string) (delimiters : obj) : string list = 
        let split = 
            match delimiters with
            | :? string        as s   -> toSplit.Split(s, StringSplitOptions.RemoveEmptyEntries) 
            | :? array<string> as arr -> toSplit.Split(arr, StringSplitOptions.RemoveEmptyEntries) 
            | :? array<char>   as arr -> toSplit.Split(arr, StringSplitOptions.RemoveEmptyEntries) 
            | _ -> failwithf "Splitting based on delimiters failed as it is neither a string nor an array of strings: Of Type: %A - %A" (delimiters.GetType()) toSplit
        
        Array.toList split

    match splitToList lineInModel " " with
    | nameAndConditionals :: "~" :: distributionParametersObserved ->
        // Get the name and conditionals.
        let splitNameAndConditionals = splitToList nameAndConditionals "|"
        let name = splitNameAndConditionals.[0]
        let conditionals = 
            match splitNameAndConditionals with 
            | name :: conditionals -> 
                if conditionals.Length > 0 then splitToList conditionals.[0] ","
                else []
            | _ -> failwithf "Pattern not found for RV Name and Conditionals - the format is: RVName|Condtionals: %A" splitNameAndConditionals

        let extractAndGetParameters (distributionNameAndParameters : string) : string * string list = 
            let splitDistributionAndParameters = splitToList distributionNameAndParameters [| "("; ")" |]
            (splitDistributionAndParameters.[0], splitToList splitDistributionAndParameters.[1] ",")
            
        match distributionParametersObserved with 

        // Case: Without Observations. Example: θ ~ Gamma(a,b)
        | distributionNameAndParameters when distributionNameAndParameters.Length = 1 ->
            let extractedDistributionAndParameters = extractAndGetParameters distributionNameAndParameters.[0]
            { Name             = name; 
              Conditionals     = conditionals; 
              Distribution     = (fst extractedDistributionAndParameters).ToLower();
              Observed         = None; 
              Parameters       = snd extractedDistributionAndParameters; }

        // Case: With Observations. Example: Y|θ ~ Poisson(θ) : observed
        | distributionNameAndParameters :: ":" :: observed ->
            let extractedDistributionAndParameters = extractAndGetParameters distributionNameAndParameters
            { Name             = name;
              Conditionals     = conditionals; 
              Distribution     = (fst extractedDistributionAndParameters).ToLower();
              Observed         = Some observed.Head; // Only 1 observed list permitted.
              Parameters       = snd extractedDistributionAndParameters; } 

        // Case: Error.
        | _ -> failwithf "Pattern not found for the model while parsing the distribution, parameters and optionally, the observed variables: %A" distributionParametersObserved 

    | _ -> failwithf "Pattern not found for the following line in the model - please check the syntax: %A" lineInModel

let parseModel (model : string) : ParsedBayesianModel = 
    model.Split('\n') 
    |> Array.map(parseLineOfModel)
    |> Array.toList

// Helper to print out the Parsed Bayesian Model
let printParsedModel (model : string) : unit = 
    let parsedModel = parseModel model
    printfn "Model: %A is represented as %A" model parsedModel

Examples of Parsing a User-Specified DSL

In [4]:
// Print out our simple 1-Parameter Model.
let model1 = @"θ ~ Gamma(a,b)
              Y|θ ~ Poisson(θ) : observed"
printParsedModel(model1)

// This model doesn't make sense but adding to test multiple conditionals.
let model2  = @"θ ~ Beta(unit,unit)
               γ ~ Gamma(a,b)
               Y|θ,γ ~ Binomial(n,θ) : observed"
printParsedModel(model2)
Model: "θ ~ Gamma(a,b)
              Y|θ ~ Poisson(θ) : observed" is represented as [{ Name = "θ"
   Conditionals = []
   Distribution = "gamma"
   Parameters = ["a"; "b"]
   Observed = None }; { Name = "Y"
                        Conditionals = ["θ"]
                        Distribution = "poisson"
                        Parameters = ["θ"]
                        Observed = Some "observed" }]
Model: "θ ~ Beta(unit,unit)
               γ ~ Gamma(a,b)
               Y|θ,γ ~ Binomial(n,θ) : observed" is represented as [{ Name = "θ"
   Conditionals = []
   Distribution = "beta"
   Parameters = ["unit"; "unit"]
   Observed = None }; { Name = "γ"
                        Conditionals = []
                        Distribution = "gamma"
                        Parameters = ["a"; "b"]
                        Observed = None }; { Name = "Y"
                                             Conditionals = ["θ"; "γ"]
                                             Distribution = "binomial"
                                             Parameters = ["n"; "θ"]
                                             Observed = Some "observed" }]
Out[4]:

Specifying the Parameters

The idea is to decouple the parameters separate from the model and this is done by saving the details as a JSON string. This task is made simple using the Newtonsoft.Json library.

In [5]:
open System
open System.Collections.Generic

open MathNet.Numerics.Distributions

open Newtonsoft.Json

type Observed = float list

type ParameterList = 
    { Observed : float list; Parameters : Dictionary<string, float> } 

let deserializeParameters (paramsAsString : string) : ParameterList = 
    JsonConvert.DeserializeObject<ParameterList>(paramsAsString)

Examples of the Deserialization of the Parameters

In [6]:
// Parameter List 1
let parameters1 = "{Parameters : {μ0 : 0, σ0 : 1, μ : 5, σ : 2, λ : 4}, observed : [4.2,0.235,2.11]}"
let deserializedParameters1 = deserializeParameters parameters1
printfn "Deserialized Parameters 1: %A" (deserializedParameters1)

// Parameter List 2
let parameters2 = "{Parameters: {λ : 2}}"
let deserializedParameters2 = deserializeParameters parameters2
// Applying the Deserialized Parameters to Sample from a Distribution
let exp = Exponential deserializedParameters2.Parameters.["λ"] 
printfn "Sampling from the Exponential Distribution with the λ = %A parameter: %A" exp (exp.Sample())
Deserialized Parameters 1: { Observed = [4.2; 0.235; 2.11]
  Parameters = seq [[μ0, 0]; [σ0, 1]; [μ, 5]; [σ, 2]; ...] }
Sampling from the Exponential Distribution with the λ = Exponential(λ = 2) parameter: 1.673949327
Out[6]:

Converting the Parsed Model into a Bayesian Network

plan_section2

The goal is to take the parsed representations of the user-inputted model and convert them into a directed acylic graph (DAG) that represents a Bayesian Network in an effort to easily and generically compute the posterior distribution.

The first step here is to map the distribution that's specified by the user as a string to an actual function that can compute the density or mass functions used in the prior and likelihood calculations. Subsequent to the distribution mapping, creating the notion of a Bayesian Node and the associated Bayesian Network will lead us to having a concrete form of the specified model. In this flavor of implementation, I will be simplifying the logic to only consider a two node network with one node representing the prior and the other, the likelihood. In a future post, I'll be generalizing this algorithm further.

Distribution Mapping

The distribution mapping step involves developing:

  1. Type-specified domain for the distribution.
  2. Parsing logic that takes the name of the distribution and applies the ParsedRandomVariable based parameters to it.
  3. A helper function based on the type of distribution to get the probability density function based on an input.

NOTE: The list of distributions is incomplete, however, the idea should be similar to add newer ones.

In [7]:
open MathNet.Numerics.Distributions

type DistributionType = 
    | Continuous
    | Discrete

type Parameter     = float
type DiscreteInput = int
type Input         = float

type DensityType = 
    | OneParameter         of (Parameter * Input -> float) 
    | OneParameterDiscrete of (Parameter * DiscreteInput -> float)
    | TwoParameter         of (Parameter * Parameter * Input -> float)

type DistributionInfo = { RVName             : string
                          DistributionType   : DistributionType 
                          DistributionName   : string 
                          Parameters         : float list
                          Density            : DensityType } with
                          
    static member Create (item : ParsedRandomVariable) 
                         (parameterList : ParameterList) : DistributionInfo = 
    // I know this is ugly but this functionality assumes the user enters the 
    // parameters in the order that's expected by the MathNet Numerics Library. 
    // Grab the parameters associated with this Random Variable.
    let rvParameters = 
        item.Parameters
        |> List.filter(parameterList.Parameters.ContainsKey) 
        |> List.map(fun item -> parameterList.Parameters.[item])

    // Extract Distribution Associated with the Parsed Random Variable.
    match item.Distribution with

    // 1 Parameter Distributions 
    | "exponential" -> 
        { RVName           = item.Name
          DistributionName = item.Distribution
          Parameters       = rvParameters
          DistributionType = DistributionType.Continuous
          Density          = OneParameter Exponential.PDF }
    | "poisson" ->
        { RVName           = item.Name
          DistributionName = item.Distribution
          Parameters       = rvParameters
          DistributionType = DistributionType.Discrete
          Density          = OneParameterDiscrete Poisson.PMF } 
    // 2 Parameter Distributions 
    | "normal" ->
        { RVName           = item.Name
          DistributionName = item.Distribution
          Parameters       = rvParameters
          DistributionType = DistributionType.Continuous
          Density          = TwoParameter Normal.PDF }
    | "gamma" ->
        { RVName           = item.Name
          DistributionName = item.Distribution
          Parameters       = rvParameters
          DistributionType = DistributionType.Continuous
          Density          = TwoParameter Gamma.PDF }
    | "beta" ->
        { RVName           = item.Name
          DistributionName = item.Distribution
          Parameters       = rvParameters
          DistributionType = DistributionType.Continuous
          Density          = TwoParameter Beta.PDF }
    | "continuousuniform" ->
        { RVName           = item.Name
          DistributionName = item.Distribution
          Parameters       = rvParameters
          DistributionType = DistributionType.Continuous
          Density          = TwoParameter ContinuousUniform.PDF }
    // Failure Case
    | _ -> failwithf "Distribution not registered: %A" item.Distribution                      
                          
    member this.ComputeOneParameterPDF (parameter : float) (input : float) : float =
        match this.Density with
        | OneParameter pdf -> pdf(parameter,input)
        | _ -> failwithf "Incorrect usage of function with a non One Parameter Density Type. Density given: %A" this.Density 
    member this.ComputeOneParameterDiscretePMF (parameter : float) (input : int) : float =
        match this.Density with
        | OneParameterDiscrete pmf -> pmf(parameter,input)
        | _ -> failwithf "Incorrect usage of function with a non One Parameter Discrete Density Type. Density given: %A" this.Density 
    member this.ComputeTwoParameterPDF (parameter1 : float) (parameter2 : float) (input : float) : float =
        match this.Density with
        | TwoParameter pdf -> pdf(parameter1, parameter2, input)
        | _ -> failwithf "Incorrect usage of function with a non Two Parameter Density Type. Density given: %A" this.Density

Helper Methods to get the Distributions from the User Specified DSL

In [8]:
let getDistributionInfoForModel(model : string) (parameterList : string) : DistributionInfo list = 
    let parsedModel   = parseModel model
    let parameterList = deserializeParameters parameterList 
    parsedModel
    |> List.map(fun x -> DistributionInfo.Create x parameterList)

let getDensityOrProbabilityForModel (model : string) 
                                    (parameterList : string) 
                                    (data : float seq) : IDictionary<string, float seq> = 
    getDistributionInfoForModel model parameterList
    |> List.map(fun (e : DistributionInfo) -> 
        match e.Density with
        | OneParameter p ->
            let param   = List.exactlyOne e.Parameters 
            let results = data |> Seq.map(fun d -> e.ComputeOneParameterPDF param d)
            e.RVName, results
        | OneParameterDiscrete p ->
            let param   = List.exactlyOne e.Parameters 
            let results = data |> Seq.map(fun d -> e.ComputeOneParameterDiscretePMF param (int d))
            e.RVName, results
        | TwoParameter p ->
            let p2 : float list = e.Parameters |> List.take 2
            let results = data |> Seq.map(fun d -> e.ComputeTwoParameterPDF p2.[0] p2.[1] d)
            e.RVName, results)
    |> dict

Testing the Distribution Mapping Logic on Random Samples via a Simple Model

In [9]:
// Exponential. 
let exponentialModel     = "x ~ Exponential(lambda)"
let exponentialParamList = "{Parameters: {lambda : 2, a : 2., b : 2.3}, Observed : []}"
let exponentialDummyData = ContinuousUniform.Samples(0., 200.) |> Seq.take 2000
let exponentialPdfs      = getDensityOrProbabilityForModel exponentialModel exponentialParamList exponentialDummyData
printfn "Exponential: %A" (exponentialPdfs.Values)

// Normal.
let normalModel     = "x ~ Normal(mu,sigma)"
let normalParamList = "{Parameters: {mu: 0., sigma : 1.}, Observed : []}"
let normalDummyData = Normal.Samples(0.0, 1.0) |> Seq.take 2000
let normalPdfs      = getDensityOrProbabilityForModel normalModel normalParamList normalDummyData 
printfn "Normal: %A" (normalPdfs.Values)

// Poisson.
let poissonModel     = "x ~ Poisson(theta)"
let poissonParamList = "{Parameters: {theta: 44}, Observed : []}"
let poissonDummyData = ContinuousUniform.Samples(0., 5.) |> Seq.take 2000 
let poissonPdfs      = getDensityOrProbabilityForModel poissonModel poissonParamList poissonDummyData 
printfn "Poisson: %A" (poissonPdfs.Values)
Exponential: seq
  [seq [5.021224175e-54; 6.790246353e-50; 2.044679134e-68; 2.02863983e-172; ...]]
Normal: seq [seq [0.3813916696; 0.1382035578; 0.1331370548; 0.2884060033; ...]]
Poisson: seq
  [seq [1.104713281e-15; 1.21518461e-14; 7.532136009e-17; 1.21518461e-14; ...]]
Out[9]:

Bayesian Node

The next step is to specify the Bayesian Node, the abstraction that should encapsulate all the information for successfully computing probabilities. There can be two types of Nodes:

  1. With Observed Variables for cases such as the Likelihood where we have data from a sampling model.
  2. Non-Observed Variables for cases such as the Prior where all we have is a guess as to what the distribution of the parameter we want to infer is.
In [20]:
type BayesianNodeTypeInfo =
    | Observed of float list 
    | NonObserved

type BayesianNode = 
    { Name                 : string
      NodeType             : BayesianNodeTypeInfo 
      DistributionInfo     : DistributionInfo
      ParsedRandomVariable : ParsedRandomVariable } with

    static member Create(parsedRandomVariable : ParsedRandomVariable)
                        (parameterList : ParameterList) =

        let nodeType : BayesianNodeTypeInfo =
            match parsedRandomVariable.Observed with
            | Some _ -> BayesianNodeTypeInfo.Observed parameterList.Observed
            | None   -> BayesianNodeTypeInfo.NonObserved
        { Name                 = parsedRandomVariable.Name;
          NodeType             = nodeType;
          DistributionInfo     = DistributionInfo.Create parsedRandomVariable parameterList;
          ParsedRandomVariable = parsedRandomVariable; }

    member this.GetDependents (parsedBayesianModel : ParsedBayesianModel) : ParsedRandomVariable list =
        parsedBayesianModel 
        |> List.filter(fun x -> x.Conditionals |> List.contains(this.Name))

An Example of Creating a Node From a Line in the Model

In [21]:
let lineOfModel = @"x ~ Exponential(lambda) : observed"
let paramList   = "{Parameters: {lambda : 2, a : 2., b : 2.3 }, observed : [1,2,3,55]}"
BayesianNode.Create (parseLineOfModel lineOfModel) (deserializeParameters paramList)
Out[21]:
NameNodeTypeDistributionInfoParsedRandomVariable
x{ FSI_0024+BayesianNodeTypeInfo+Observed: Item: [ 1, 2, 3, 55 ], Tag: 0, IsObserved: True, IsNonObserved: False }{ FSI_0011+DistributionInfo: RVName: x, DistributionType: { FSI_0011+DistributionType: Tag: 0, IsContinuous: True, IsDiscrete: False }, DistributionName: exponential, Parameters: [ 2 ], Density: { FSI_0011+DensityType+OneParameter: Item: { [email protected]: }, Tag: 0, IsOneParameter: True, IsOneParameterDiscrete: False, IsTwoParameter: False } }{ FSI_0006+ParsedRandomVariable: Name: x, Conditionals: [ ], Distribution: exponential, Parameters: [ lambda ], Observed: { Microsoft.FSharp.Core.FSharpOption<System.String>: Value: observed } }

Simple Bayesian Network

A General Bayesian Network is a Directed Acyclic Graph consisting of Bayesian Nodes and the relationship amongst them. A Simple Bayesian Network is a type of Bayesian Network with two Bayesian Nodes namely, the Prior and Likelihood nodes or in other words, the model that the user can specify is constrained to two random variables; this is done for the sake of simplicity.

The Simple Bayesian Network also consists of functions that can be used to compute the numerator of Bayes Theorem i.e. $Posterior = Prior \times Likelihood$

The Prior is computed using the PDF or PMF based on the parameters associated with the Prior Bayesian Node. For example, a Normal Prior Node will have the formula of Normal.PDF(mean, stddev, input).

The Likelihood, on the other hand, involves the result from the Prior and the observed list. The assumptions in the computation of the Likelihood is that the observations are conditionally independent given our prior and the exchangability of the observations where the order of the observations don't matter i.e. the order or labels of the observations don't change the outcome.

\begin{align} p(y_1, \dots, y_n \mid \theta) = p(y_1 \mid \theta) \times \dots \times p(y_n \mid \theta) = \prod_i p(y_i \mid \theta). \end{align}

In other words, we will be computing the PDF based on the prior, parameters of the likelihood node and the observed variables. For example, if we were to be inferring the mean of the Normal distribution, our likelihood will be computed by the following pseudo-code: product(Normal.PDF(prior, stddevOfLikelihood, observation))

In [12]:
// Only to be used for a model with 2 nodes 
// i.e. one for the prior and one for the likelihood.
type SimpleBayesianNetworkModel = 
  { Name       : string
    Nodes      : IDictionary<string, BayesianNode>
    Prior      : BayesianNode
    Likelihood : BayesianNode } with

    member this.GetPriorProbability (input : float) : float = 
      let distributionInfo = this.Prior.DistributionInfo 
      match distributionInfo.Density with
      | OneParameter p ->
          let param = List.exactlyOne distributionInfo.Parameters 
          distributionInfo.ComputeOneParameterPDF param input 
      | OneParameterDiscrete p ->
          let param   = List.exactlyOne distributionInfo.Parameters 
          distributionInfo.ComputeOneParameterDiscretePMF param (int input)
      | TwoParameter p ->
          let p2 : float list = distributionInfo.Parameters |> List.take 2
          distributionInfo.ComputeTwoParameterPDF p2.[0] p2.[1] input

    member this.GetLikelihoodProbability (prior : float) : float = 
        let distributionInfo = this.Likelihood.DistributionInfo
        let observed         = match this.Likelihood.NodeType with
                               | Observed l -> l
                               | _ -> failwithf "Incorrectly constructed Simple Network Model. %A" this 
        let density = 
          match distributionInfo.Density with
          | OneParameter p -> 
              observed |> List.map(fun d -> distributionInfo.ComputeOneParameterPDF prior d)
          | OneParameterDiscrete p ->
              observed |> List.map(fun d -> distributionInfo.ComputeOneParameterDiscretePMF prior (int (Math.Ceiling d)))
          | TwoParameter p ->
              let p : float = distributionInfo.Parameters |> List.exactlyOne 
              observed |> List.map(fun d -> distributionInfo.ComputeTwoParameterPDF prior p d)
        density
        |> List.fold (*) 1.0

    member this.GetPosteriorWithoutScalingFactor (input: float) : float = 
      let priorPdf      = this.GetPriorProbability input 
      let likelihoodPdf = this.GetLikelihoodProbability priorPdf
      priorPdf * likelihoodPdf

    static member Construct (name : string)
                            (model : ParsedBayesianModel)
                            (parameterList : ParameterList) : SimpleBayesianNetworkModel = 

        // Construct all the modes of the model.
        let allNodes : (string * BayesianNode) list =
            model
            |> List.map(fun m -> m.Name, BayesianNode.Create m parameterList)

        let prior : BayesianNode =  
            allNodes
            |> List.filter(fun (_,m) -> m.NodeType = BayesianNodeTypeInfo.NonObserved) 
            |> List.map(fun (_,m) -> m)
            |> List.exactlyOne

        let likelihood : BayesianNode =  
            allNodes
            |> List.filter(fun (_,m) -> m.NodeType <> BayesianNodeTypeInfo.NonObserved)
            |> List.map(fun (_,m) -> m)
            |> List.exactlyOne

        { Name       = name; 
          Nodes      = dict allNodes; 
          Prior      = prior;
          Likelihood = likelihood; }

Example of Creating a Simple Bayesian Network

In [13]:
let model = @"x ~ Normal(μ,τ) 
              y|x ~ Normal(x,σ) : observed"
let parsedModel = parseModel model
let paramList = "{Parameters: {μ : 5, τ : 3.1622, σ : 1}, observed : [9.37,10.18,9.16,11.60,10.33]}"
let normalModel = 
    SimpleBayesianNetworkModel.Construct "Normal-Normal" parsedModel (deserializeParameters paramList)
normalModel
Out[13]:
NameNodesPriorLikelihood
Normal-Normal[ { System.Collections.Generic.KeyValuePair<System.String,FSI_0014+BayesianNode>: Key: x, Value: { FSI_0014+BayesianNode: Name: x, NodeType: { FSI_0014+BayesianNodeTypeInfo+_NonObserved: Tag: 1, IsObserved: False, IsNonObserved: True }, DistributionInfo: { FSI_0011+DistributionInfo: RVName: x, DistributionType: { FSI_0011+DistributionType: Tag: 0, IsContinuous: True, IsDiscrete: False }, DistributionName: normal, Parameters: [ 5, 3.1622 ], Density: { FSI_0011+DensityType+TwoParameter: Item: [email protected], Tag: 2, IsOneParameter: False, IsOneParameterDiscrete: False, IsTwoParameter: True } }, ParsedRandomVariable: { FSI_0006+ParsedRandomVariable: Name: x, Conditionals: [ ], Distribution: normal, Parameters: [ μ, τ ], Observed: <null> } } }, { System.Collections.Generic.KeyValuePair<System.String,FSI_0014+BayesianNode>: Key: y, Value: { FSI_0014+BayesianNode: Name: y, NodeType: { FSI_0014+BayesianNodeTypeInfo+Observed: Item: [ 9.37, 10.18, 9.16, 11.6, 10.33 ], Tag: 0, IsObserved: True, IsNonObserved: False }, DistributionInfo: { FSI_0011+DistributionInfo: RVName: y, DistributionType: { FSI_0011+DistributionType: Tag: 0, IsContinuous: True, IsDiscrete: False }, DistributionName: normal, Parameters: [ 1 ], Density: { FSI_0011+DensityType+TwoParameter: Item: [email protected], Tag: 2, IsOneParameter: False, IsOneParameterDiscrete: False, IsTwoParameter: True } }, ParsedRandomVariable: { FSI_0006+ParsedRandomVariable: Name: y, Conditionals: [ x ], Distribution: normal, Parameters: [ x, σ ], Observed: { Microsoft.FSharp.Core.FSharpOption`1[System.String]: Value: observed } } } } ]{ FSI_0014+BayesianNode: Name: x, NodeType: { FSI_0014+BayesianNodeTypeInfo+_NonObserved: Tag: 1, IsObserved: False, IsNonObserved: True }, DistributionInfo: { FSI_0011+DistributionInfo: RVName: x, DistributionType: { FSI_0011+DistributionType: Tag: 0, IsContinuous: True, IsDiscrete: False }, DistributionName: normal, Parameters: [ 5, 3.1622 ], Density: { FSI_0011+DensityType+TwoParameter: Item: { [email protected]: }, Tag: 2, IsOneParameter: False, IsOneParameterDiscrete: False, IsTwoParameter: True } }, ParsedRandomVariable: { FSI_0006+ParsedRandomVariable: Name: x, Conditionals: [ ], Distribution: normal, Parameters: [ μ, τ ], Observed: <null> } }{ FSI_0014+BayesianNode: Name: y, NodeType: { FSI_0014+BayesianNodeTypeInfo+Observed: Item: [ 9.37, 10.18, 9.16, 11.6, 10.33 ], Tag: 0, IsObserved: True, IsNonObserved: False }, DistributionInfo: { FSI_0011+DistributionInfo: RVName: y, DistributionType: { FSI_0011+DistributionType: Tag: 0, IsContinuous: True, IsDiscrete: False }, DistributionName: normal, Parameters: [ 1 ], Density: { FSI_0011+DensityType+TwoParameter: Item: { [email protected]: }, Tag: 2, IsOneParameter: False, IsOneParameterDiscrete: False, IsTwoParameter: True } }, ParsedRandomVariable: { FSI_0006+ParsedRandomVariable: Name: y, Conditionals: [ x ], Distribution: normal, Parameters: [ x, σ ], Observed: { Microsoft.FSharp.Core.FSharpOption<System.String>: Value: observed } } }

Debugging the Prior and Likelihood

A quick sanity check to see if the numbers match up if we were to compute the Prior and Likelihood "by hand".

In [14]:
let model = @"x ~ Normal(μ,τ) 
              y|x ~ Normal(x,σ) : observed"
let parsedModel = parseModel model
let paramList = "{Parameters: {μ : 5, τ : 3.1622, σ : 1}, observed : [9.37,10.18,9.16,11.60,10.33]}"
let normalModel = 
    SimpleBayesianNetworkModel.Construct "Normal-Normal" parsedModel (deserializeParameters paramList)
    
// Prior Check
let priorFromModel = normalModel.GetPriorProbability 0.0
let computedPrior  = Normal.PDF(5., 3.1622, 0.0)
printfn "Normal Model Prior: %A | Prior from formula: %A" priorFromModel computedPrior

// Likelihood Check
let likelihoodFromModel = normalModel.GetLikelihoodProbability 0.0
let observed : float list = 
    match normalModel.Likelihood.NodeType with
    | Observed x -> x
    | _ -> failwithf "Incorrectly constructed Likelihood"
    
let likelihoodListFromFormula : float list = 
    observed |> List.map(fun (o:float) -> Normal.PDF(computedPrior, 1., o))
let likelihoodFromFormula = likelihoodListFromFormula |> List.fold (*) 1.

printfn "Normal Model Likelihood: %A | Likelihood from formula: %A" 
    (normalModel.GetLikelihoodProbability 0.03614314702) 
    (likelihoodFromFormula)
Normal Model Prior: 0.03614314702 | Prior from formula: 0.03614314702
Normal Model Likelihood: 4.158402902e-114 | Likelihood from formula: 4.158402901e-114
Out[14]:

Applying the Symmetric Metropolis Hastings Algorithm to Approximate the Posterior Distribution

plan_section3

Now that we have some digestible representation of the Simple Bayesian Network, our next step is to make use of an MCMC algorithm to approximate the posterior. The steps here as follows:

  1. Define a well-specified MCMC domain to run the algorithmss.
  2. Implement a type of MCMC algorithm called the Symmetric Metropolis Hasting Algorithm.
  3. Apply the algorithm to a Simple Bayesian Network.

The Symmetric Metropolis Hastings Algorithm

I would consider the Symmetric Metropolis Hastings Algorithm to be a simple MCMC algorithm that's based on some smart applications of the Bayes Theorem and therefore, I chose it to be the "toy" approximation algorithm used here. There are better algorithms out there such as Hamiltonian Monte-Carlo or the No U-Turn Sampler that I plan to prospectively explore.

The Algorithm

  1. Produce a proposal based on the proposal distribution and the current value.
  2. Compute the Posterior with the proposal and the current value.
  3. Compute the Acceptance Ratio.
  4. Accept or Reject the proposal. If accepted, the current value will be the proposed value else, grab another proposal using the proposal distribution.
  5. Repeat until iterative convergence for all chains requested.

The main ideas associated with the steps or single iterations of this algorithm are:

1. Proposing New Values To Approximate the Parameter Space

Using a proposal distribution, we come up with newer values of the parameter to explore and find regions that are more likely to be outputted by the posterior distribution. Examples of the proposal distributions are:

  1. Uniform(current - δ, current + δ)
  2. Normal(current, δ^2)

where δ is a hyperparameter that's a proxy of the size of the jumps.

2. Computing the Acceptance Ratio

Taking advantage of the notion below and considering different parameters, say $\theta$ and $\theta^{*}$, we come up with a new quantity called the Acceptance Ratio, $r$.

$p(\theta \mid y) \propto p(y \mid \theta) p(\theta)$

Where $r = \frac{p(\theta \mid y)}{p(\theta^{*} \mid y)} = \frac{p(y \mid \theta) p(\theta)}{p(y \mid \theta^{*}) p(\theta^{*})}$

By examining this ratio, it can be made evident that if the numerator is greater than the denominator, we have an acceptance ratio > 1; this implies that the posterior without the normalization constant with $\theta$ is higher than that with $\theta^{*}$ or is a data-point more likely (or with less uncertainity) to be obtained from the posterior.

3. Accepting or Rejecting The Proposal Based on the Acceptance Ratio

The Symmetric Metropolis Hastings algorithm is a type of Metropolis Hastings algorithm where the proposal distribution between the current the proposed, new value is symmetric i.e. the probability to obtain the proposal given the current value is the same as the probability to obtain the current value given the proposal.

With this idea in mind, once we have computed the acceptance ratio, we want to check to see how likely it'll typify the posterior distribution. This can be done by taking a random drawing from a uniform distribution where the lower and upper values are 0 and 1 respectively. This idea can be elucidated on by the fact that if the proposal based posterior better approximates the real-posterior better than a random drawing from a 0-1 uniform distribution, in the long run, it'll eventually be close-enough to the real-posterior.

As mentioned before, if the proposal was accepted, we use that as the next current value to generate another proposal or else try to get another proposal in the subsequent iteration.

Weaknesses of this Algorithm

The simplicity of this model comes with costs, specifically, the dependence on the hyperparameters that have to be tuned and autocorrelation of values or if the proposed distribution is extremely close to the current value, it'll take longer to explore the entire parameter space as the jumps will be smaller. As an example of the weaknesses, I needed to experiment with a bunch of hyperparameters before I got to some semblance of the posterior distribution I was expecting in addition to increases the number of iterations.

Defining the MCMC Domain

I liked the idea of using a simple Request -> Result model from the Client-Server model to encapulate the approximation. Additionally, I drew inspiration from reading pymc3's code to come up with the following domain.

In general, I want any of the prospective MCMC algorithms to have the ability to:

  1. Output multiple chains that approximate the posterior.
  2. Have the ability to specify multiple types of "steps" or mechanisms to induce jumps in parameter space for appropriately approximating the posterior distribution.
  3. Outputting statistics as to how the algorithm did.
  4. Specifying a converge criteria after which the MCMC algorithm stops the approximation.
  5. An ability to pass in a proposal distribution on the basis of which jumps in the parameter space are induced.

Terminology

The Acceptance Rate is a result of an MCMC chain the proportion of the number of accepted proposals to the total number of non-burn in proposals. This metric is a proxy of how the steps are approximating the posterior.

The BurnInIterationsPct is the percent of posterior approximates to ignore; these are taken from the beginning of the chains and are representative of the first few jumps before the algorithm is at a point where it's better at approximating the posterior.

In [22]:
type ConvergenceCriteria = 
    | IterativeConvergence of int // Number of Iterations

type ProposalDistribution =
    | Normal of float  // Normal( current, delta )
    | ContinuousUniform of float // ContinuousUniform( current - delta, current + delta )
    | PositiveContinuousUniform of float // ( x ~ Uniform( current - delta, current + delta ) if x <= 0 then 0.1 else x )

type MCMCInferenceStep = 
    | SymmetricMetropolisHastings of ProposalDistribution * float

type MCMCChain =
    { Id             : int
      AcceptanceRate : float // # of Acceptances / Total # of Chain Items
      StepValues     : float seq }

type MCMCRequest = 
    { StepFunction         : MCMCInferenceStep
      ConvergenceCriteria  : ConvergenceCriteria
      BurnInIterationsPct  : float 
      Chains               : int 
      PrintDebug           : bool }

type MCMCResult =
    { Chains         : MCMCChain seq 
      MCMCRequest    : MCMCRequest }
In [16]:
open System
open MathNet.Numerics.Distributions
open XPlot.Plotly

type AcceptanceRejection =
    | Acceptance of float // Acceptance of the Proposal
    | Rejection  of float // Rejection of the Proposal

let doSymmetricMetropolisHastings (request : MCMCRequest) 
                                  (iterations : int)
                                  (proposalDistribution : ProposalDistribution)
                                  (initialValue : float)
                                  (simpleBayesianModel : SimpleBayesianNetworkModel) : MCMCResult =

    let getChain (id : int) (request : MCMCRequest) 
                 (iterations : int) (simpleBayesianModel : SimpleBayesianNetworkModel) : MCMCChain =

        let burnin         = int (Math.Ceiling(request.BurnInIterationsPct / 100. * float iterations))
        let zeroOneUniform = ContinuousUniform()

        let mutable current = simpleBayesianModel.GetPosteriorWithoutScalingFactor initialValue

        let matchedProposalDistribution (input : float) :  float =
            match proposalDistribution with 
            | Normal delta            -> Normal(input, delta).Sample()
            | ContinuousUniform delta -> ContinuousUniform(input - delta, input + delta).Sample()
            | PositiveContinuousUniform delta -> 
                let u = ContinuousUniform(input - delta, input + delta).Sample()
                if u <= 0. then input + 0.1 else u

        let step (iteration : int) : AcceptanceRejection =
            let proposed           = matchedProposalDistribution current
            let currentProbability = simpleBayesianModel.GetPosteriorWithoutScalingFactor current
            let proposedProbabilty = simpleBayesianModel.GetPosteriorWithoutScalingFactor proposed
            let acceptanceRatio    = Math.Min(currentProbability / proposedProbabilty, 1.)
            let uniformDraw        = zeroOneUniform.Sample()

            if request.PrintDebug then 
                printfn "Chain: %A Iteration: %A - Current Probability: %A | Proposed Probability: %A | AcceptanceRatio: %A | Uniform Draw: %A | Current: %A | Proposed: %A" 
                    id iteration currentProbability proposedProbabilty acceptanceRatio uniformDraw current proposed

            if uniformDraw < acceptanceRatio then (current <- proposed; Acceptance proposed) 
            else Rejection current
    
        let stepResults : AcceptanceRejection seq =
            seq {1..iterations}
            |> Seq.map step
            |> Seq.skip burnin

        let getAcceptanceRateAndStepValues : float * float seq =

            // Compute the Acceptance Rate.
            let acceptanceRate : float = 
                let totalBurninWithoutBurnin : float = float (Seq.length stepResults)
                let totalNumberOfAcceptances : float = 
                    stepResults 
                    |> Seq.filter(fun x -> 
                        match x with
                        | Acceptance x -> true
                        | _ -> false)
                    |> Seq.length
                    |> float
                totalNumberOfAcceptances / totalBurninWithoutBurnin

            // Grab the Step Values that'll approximate the posterior.
            let stepValues =
                stepResults |> Seq.map(fun s ->
                    match s with
                    | Acceptance v -> v
                    | Rejection v -> v)

            acceptanceRate, stepValues

        let acceptanceRate, stepValues = getAcceptanceRateAndStepValues 

        { Id             = id
          AcceptanceRate = acceptanceRate
          StepValues     = stepValues} 

    let chains : MCMCChain seq =
        seq {1..request.Chains}
        |> Seq.map(fun id ->  getChain id request iterations simpleBayesianModel)

    { Chains      = chains
      MCMCRequest =  request }

Running the Algorithm

This method takes in a generic MCMC Request and a model and returns an MCMC Result consisting of the chain of results that have the posterior sequence as well as the acceptance rate.

In [17]:
let runMCMC (request : MCMCRequest) 
            (model : SimpleBayesianNetworkModel) : MCMCResult =
    match request.StepFunction with
    | SymmetricMetropolisHastings (proposalDistribution,initialValue) ->
        match request.ConvergenceCriteria with 
        | IterativeConvergence iterations -> 
            doSymmetricMetropolisHastings request iterations proposalDistribution initialValue model
        | _ -> failwith "You need to pass in the number of iterations for the Metropolis-Hastings algorithm"
    | _ -> failwithf "Step Function Not Registered: %A" request.StepFunction

Tieing it All Together

Let's now go from a user inputted model to the results of the Symmetric Metropolis Hastings algorithm to approximate the posterior for two cases. We'll plot the results of the first chain to highlight if the algorithm did the right thing.

Exponential Prior and Exponential Likelihood

In [18]:
let model = @"x ~ Exponential(a) 
              y|x ~ Exponential(x) : observed"
let parsedModel = parseModel model
let paramList = "{Parameters: {a : 2., b : 2.}, observed : [1,2,3,4,4,2,5,6,7,3,2,3,4,5,6,1,2,3,4,4,4,4]}"
let simpleModel = SimpleBayesianNetworkModel.Construct "Exponential Model" parsedModel (deserializeParameters paramList)

let request : MCMCRequest = 
    { StepFunction        = SymmetricMetropolisHastings (ProposalDistribution.PositiveContinuousUniform 0.002, 0.001)
      ConvergenceCriteria = IterativeConvergence 10000
      BurnInIterationsPct = 0.2
      PrintDebug          = false 
      Chains              = 4 }

let mcmc       = runMCMC request simpleModel
let firstChain = Seq.head mcmc.Chains

Histogram(x = firstChain.StepValues)
|> Chart.Plot
|> Chart.WithTitle "Exponential Prior and Likelihood"
Out[18]:

As the online notebook renderer isn't displaying the chart, I took a screenshot of it.

Exponential Prior and Likelihood

This histogram of the Step Values indicates that the Posterior has the shape of an Exponential distribution; which is expected. As an example, here is the PDF of a typical exponential distribution:

ExponentialDistribution

Normal Prior and Exponential Likelihood

In [19]:
#!fsharp
let model = @"x ~ Normal(μ,τ) 
              y|x ~ Exponential(x) : observed"
let parsedModel = parseModel model
let paramList = "{Parameters: {μ : 5, τ : 3.1622, σ : 1}, observed : [9.37,10.18,9.16,11.60,10.33]}"
let simpleModel = 
    SimpleBayesianNetworkModel.Construct "Normal-Exponential" parsedModel (deserializeParameters paramList)


let request : MCMCRequest = 
    { StepFunction        = SymmetricMetropolisHastings (ProposalDistribution.PositiveContinuousUniform 0.003, 0.0001)
      ConvergenceCriteria = IterativeConvergence 50000
      BurnInIterationsPct = 0.2
      PrintDebug          = false 
      Chains              = 4 }

let mcmc       = runMCMC request simpleModel
let firstChain = Seq.head mcmc.Chains

Histogram(x = firstChain.StepValues)
|> Chart.Plot
|> Chart.WithTitle "Normal Prior with Exponential Likelihood"
|> Chart.WithWidth 700
|> Chart.WithHeight 500
Out[19]: