Bayes Nets

A Julia package for Bayesian Networks

A Bayesian Network (BN) represents a probability distribution over a set of variables, $P(x_1, x_2, \ldots, x_n)$. Bayesian networks leverage variable relations in order to efficiently decompose the joint distribution into smaller conditional probability distributions.

A BN is defined by a directed acyclic graph and a set of conditional probability distributions. Each node in the graph corresponds to a variable $x_i$ and is associated with a conditional probability distribution $P(x_i \mid \text{parents}(x_i))$.

Installation

Pkg.add("BayesNets");

Default visualization of the network structure is provided by the GraphPlot package. However, we recommend using tex-formatted graphs provided by the TikzGraphs package. Installation requirements for TikzGraphs (e.g., PGF/Tikz and pdf2svg) are provided here. Simply run using TikzGraphs in your Julia session to automatically switch to tex-formatted graphs (thanks to the Requires.jl package).

Usage

In [1]:
using Random
Random.seed!(0) # seed the random number generator to 0, for a reproducible demonstration
using BayesNets
using TikzGraphs # required to plot tex-formatted graphs (recommended), otherwise GraphPlot.jl is used

Representation

Bayesian Networks are represented with the BayesNet type. This type contains the directed acyclic graph (a LightTables.DiGraph) and a list of conditional probability distributions (a list of CPDs)

Here we construct the BayesNet $a \rightarrow b$, with Gaussians $a$ and $b$:

$$ a = \mathcal{N}(0,1) \qquad b = \mathcal{N}(2a +3,1) $$
In [2]:
bn = BayesNet()
push!(bn, StaticCPD(:a, Normal(1.0)))
push!(bn, LinearGaussianCPD(:b, [:a], [2.0], 3.0, 1.0))
Out[2]:

Conditional Probability Distributions

Conditional Probablity Distributions, $P(x_i \mid \text{parents}(x_i))$, are defined in BayesNets.CPDs. Each CPD knows its own name, the names of its parents, and is associated with a distribution from Distributions.jl.

CPDForm Description
StaticCPD Any Distributions.distribution; indepedent of any parents
FunctionalCPD Allows for a CPD defined with a custom eval function
ParentFunctionalCPD Modification to FunctionalCPD allowing the parent values to be passed in
CategoricalCPD Categorical distribution, assumes integer parents in $1:N$
LinearGaussianCPD Linear Gaussian, assumes target and parents are numeric
ConditionalLinearGaussianCPD A linear Gaussian for each discrete parent instantiation

Each CPD can be learned from data using fit.

Here we learn the same network as above.

In [3]:
a = randn(100)
b = randn(100) .+ 2*a .+ 3

data = DataFrame(a=a, b=b)
cpdA = fit(StaticCPD{Normal}, data, :a)
cpdB = fit(LinearGaussianCPD, data, :b, [:a])

bn2 = BayesNet([cpdA, cpdB])
Out[3]:

Each CPD implements four functions:

  • name(cpd) - obtain the name of the variable target variable
  • parents(cpd) - obtain the list of parents
  • nparams(cpd - obtain the number of free parameters in the CPD
  • cpd(assignment) - allows calling cpd() to obtain the conditional distribution
  • Distributions.fit(Type{CPD}, data, target, parents)
In [4]:
cpdB(:a=>0.5)
Out[4]:
Normal{Float64}(μ=3.9100288094947837, σ=2.5884318677452463)

Several functions conveniently condition and then produce their return values:

In [5]:
rand(cpdB, :a=>0.5) # condition and then sample
pdf(cpdB, :a=>1.0, :b=>3.0) # condition and then compute pdf(distribution, 3)
logpdf(cpdB, :a=>1.0, :b=>3.0) # condition and then compute logpdf(distribution, 3);

The NamedCategorical distribution allows for String or Symbol return values. The FunctionalCPD allows for crafting quick and simple CPDs:

In [6]:
bn2 = BayesNet()
push!(bn2, StaticCPD(:sighted, NamedCategorical([:bird, :plane, :superman], [0.40, 0.55, 0.05])))
push!(bn2, FunctionalCPD{Bernoulli}(:happy, [:sighted], a->Bernoulli(a == :superman ? 0.95 : 0.2)))
Out[6]:

Variables can be removed by name using delete!. A warning will be issued when removing a CPD with children.

In [7]:
delete!(bn2, :happy)
Out[7]:

Likelihood

A Bayesian Network represents a joint probability distribution, $P(x_1, x_2, \ldots, x_n)$. Assignments are represented as dictionaries mapping variable names (Symbols) to variable values. We can evaluate probabilities as we would with Distributions.jl, only we use exclamation points as we modify the internal state when we condition:

In [8]:
pdf(bn, :a=>0.5, :b=>2.0) # evaluate the probability density
Out[8]:
0.01900834726778591

We can also evaluate the likelihood of a dataset:

In [9]:
data = DataFrame(a=[0.5,1.0,2.0], b=[4.0,5.0,7.0])
pdf(bn, data)    #  0.00215
logpdf(bn, data) # -6.1386;

Or the likelihood for a particular cpd:

In [10]:
pdf(cpdB, data)    #  0.006
logpdf(cpdB, data) # -5.201
Out[10]:
-5.611376074196931

Sampling

Assignments can be sampled from a BayesNet.

In [11]:
rand(bn)
Out[11]:
Dict{Symbol,Any} with 2 entries:
  :a => 1.20808
  :b => 4.93954
In [12]:
rand(bn, 5)
Out[12]:

5 rows × 2 columns

ab
Float64Float64
10.1218343.03183
2-0.5131012.11181
30.5232821.93753
40.7088595.28383
52.700317.53423

In general, sampling can be done according to rand(BayesNet, BayesNetSampler, nsamples) to produce a table of samples, rand(BayesNet, BayesNetSampler) to produce a single Assignment, or rand!(Assignment, BayesNet, BayesNetSampler) to modify an assignment in-place.

New samplers need only implement rand!.

The functions above default to the DirectSampler, which samples the variables in topographical order.

Rejection sampling can be used to draw samples that are consistent with a provided assignment:

In [13]:
bn = BayesNet()
push!(bn, StaticCPD(:a, Categorical([0.3,0.7])))
push!(bn, StaticCPD(:b, Categorical([0.6,0.4])))
push!(bn, CategoricalCPD{Bernoulli}(:c, [:a, :b], [2,2], [Bernoulli(0.1), Bernoulli(0.2), Bernoulli(1.0), Bernoulli(0.4)]))
Out[13]:
In [14]:
rand(bn, RejectionSampler(:c=>1), 5)
Out[14]:

5 rows × 3 columns

abc
Int64Int64Bool
1121
2221
3211
4211
5111

One can also use weighted sampling:

In [15]:
rand(bn, LikelihoodWeightedSampler(:c=>1), 5)
Out[15]:

5 rows × 4 columns

abcp
AnyAnyAnyAny
12210.333333
22110.166667
32110.166667
42110.166667
52110.166667

One can also use Gibbs sampling. More options are available than are shown in the example below.

In [16]:
bn_gibbs = BayesNet()
push!(bn_gibbs, StaticCPD(:a, Categorical([0.999,0.001])))
push!(bn_gibbs, StaticCPD(:b, Normal(1.0)))
push!(bn_gibbs, LinearGaussianCPD(:c, [:a, :b], [3.0, 1.0], 0.0, 1.0))

evidence = Assignment(:c => 10.0)
initial_sample = Assignment(:a => 1, :b => 1, :c => 10.0)
gsampler = GibbsSampler(evidence, burn_in=500, thinning=1, initial_sample=initial_sample)
rand(bn_gibbs, gsampler, 5)
Out[16]:

5 rows × 3 columns

abc
AnyAnyFloat64
118.39310.0
217.0663610.0
317.4485310.0
416.781310.0
516.781310.0

Parameter Learning

BayesNets.jl supports parameter learning for an entire graph.

In [17]:
# specify each node's CPD type individually
fit(BayesNet, data, (:a=>:b), [StaticCPD{Normal}, LinearGaussianCPD])
Out[17]:
In [18]:
# specify a single CPD type for all nodes
fit(BayesNet, data, (:a=>:b), LinearGaussianCPD)
Out[18]:

Fitting can be done for specific BayesNets types as well:

In [19]:
data = DataFrame(c=[1,1,1,1,2,2,2,2,3,3,3,3], 
                 b=[1,1,1,2,2,2,2,1,1,2,1,1],
                 a=[1,1,1,2,1,1,2,1,1,2,1,1])

fit(DiscreteBayesNet, data, (:a=>:b, :a=>:c, :b=>:c))
Out[19]:

Fitting a DiscreteCPD, which is a CategoricalCPD{Categorical}, can be done with a specified number of categories. This prevents cases where your test data does not provide an example for every category.

In [20]:
cpd = fit(DiscreteCPD, DataFrame(a=[1,2,1,2,2]), :a, ncategories=3);
cpd = fit(DiscreteCPD, data, :b, [:a], parental_ncategories=[3], target_ncategories=3);

Inference

Inference methods for discrete Bayesian networks can be used via the infer method:

In [21]:
bn = DiscreteBayesNet()
push!(bn, DiscreteCPD(:a, [0.3,0.7]))
push!(bn, DiscreteCPD(:b, [0.2,0.8]))
push!(bn, DiscreteCPD(:c, [:a, :b], [2,2], 
        [Categorical([0.1,0.9]),
         Categorical([0.2,0.8]),
         Categorical([1.0,0.0]),
         Categorical([0.4,0.6]),
        ]))
Out[21]:
In [22]:
ϕ = infer(bn, :c, evidence=Assignment(:b=>1))
Out[22]:

2 rows × 2 columns

cpotential
Int64Float64
110.17
220.83

Several inference methods are available. Exact inference is the default.

InfereceMethod Description
ExactInference Performs exact inference using discrete factors and variable elimination
LikelihoodWeightingInference Approximates p(query \ evidence) with N weighted samples using likelihood weighted sampling
LoopyBelief The loopy belief propagation algorithm
GibbsSamplingNodewise Gibbs sampling where each iteration changes one node
GibbsSamplingFull Gibbs sampling where each iteration changes all nodes

All inference methods inherit from the InferenceMethod type.

In [23]:
ϕ = infer(GibbsSamplingNodewise(), bn, [:a, :b], evidence=Assignment(:c=>2))
Out[23]:

4 rows × 3 columns

abpotential
Int64Int64Float64
1110.134
2210.214
3120.0
4220.652

Inference produces a Factor type. It can be converted to a DataFrame.

In [24]:
convert(DataFrame, ϕ)
Out[24]:

4 rows × 3 columns

abpotential
Int64Int64Float64
1110.134
2210.214
3120.0
4220.652

Structure Learning

Structure learning can be done as well.

In [25]:
using Discretizers
using RDatasets
iris = dataset("datasets", "iris")
names(iris)
data = DataFrame(
    SepalLength = iris[!,:SepalLength],
    SepalWidth = iris[!,:SepalWidth],
    PetalLength = iris[!,:PetalLength],
    PetalWidth = iris[!,:PetalWidth],
    Species = encode(CategoricalDiscretizer(iris[!,:Species]), iris[!,:Species]),
)
data[1:3,:] # only display a subset...
Out[25]:

3 rows × 5 columns

SepalLengthSepalWidthPetalLengthPetalWidthSpecies
Float64Float64Float64Float64Int64
15.13.51.40.21
24.93.01.40.21
34.73.21.30.21

Here we use the K2 structure learning algorithm which runs in polynomial time but requires that we specify a topological node ordering.

In [26]:
parameters = K2GraphSearch([:Species, :SepalLength, :SepalWidth, :PetalLength, :PetalWidth], 
                       ConditionalLinearGaussianCPD,
                       max_n_parents=2)
fit(BayesNet, data, parameters)
Out[26]:

CPD types can also be specified per-node. Note that complete CPD definitions are required - simply using StaticCPD is insufficient as you need the target distribution type as well, as in StaticCPD{Categorical}.

Changing the ordering will change the structure.

In [27]:
CLG = ConditionalLinearGaussianCPD
parameters = K2GraphSearch([:Species, :PetalLength, :PetalWidth, :SepalLength, :SepalWidth], 
                        [StaticCPD{Categorical}, CLG, CLG, CLG, CLG],
                        max_n_parents=2)
fit(BayesNet, data, parameters)
Out[27]:

A ScoringFunction allows for extracting a scoring metric for a CPD given data. The negative BIC score is implemented in NegativeBayesianInformationCriterion.

A GraphSearchStrategy defines a structure learning algorithm. The K2 algorithm is defined through K2GraphSearch and GreedyHillClimbing is implemented for discrete Bayesian networks and the Bayesian score:

In [28]:
data = DataFrame(c=[1,1,1,1,2,2,2,2,3,3,3,3], 
                 b=[1,1,1,2,2,2,2,1,1,2,1,1],
                 a=[1,1,1,2,1,1,2,1,1,2,1,1])
parameters = GreedyHillClimbing(ScoreComponentCache(data), max_n_parents=3, prior=UniformPrior())
bn = fit(DiscreteBayesNet, data, parameters)
Out[28]:

One can specify the number of categories for each variable in case it cannot be correctly inferred:

In [29]:
bn = fit(DiscreteBayesNet, data, parameters, ncategories=[3,3,2])
Out[29]:

A whole suite of features are supported for DiscreteBayesNets.

In [30]:
bayesian_score(bn, data, parameters.prior) # compute the Bayesian score of the data under the BayesNet
Out[30]:
-31.28804624550449
In [31]:
count(bn, :a, data) # obtain a list of counts for the node
Out[31]:

2 rows × 2 columns

acount
Int64Int64
119
223
In [32]:
statistics(bn.dag, data) # sufficient statistics from a discrete dataset
Out[32]:
3-element Array{Array{Int64,2},1}:
 [4; 4; 4]
 [3 1 3; 1 3 1]
 [3 1 … 2 0; 0 0 … 1 1]
In [33]:
table(bn, :b) # obtain the factor table for a node
Out[33]:

6 rows × 3 columns

abp
Int64Int64Float64
1110.666667
2210.166667
3120.25
4220.666667
5130.0833333
6230.166667
In [34]:
table(bn, :c, :a=>1) # obtain a factor table matching a particular assignment
Out[34]:

9 rows × 4 columns

bacp
Int64Int64Int64Float64
11110.4
22110.2
33110.333333
41120.2
52120.6
63120.333333
71130.4
82130.2
93130.333333

Reading from XDSL

One can read discrete Bayesian networks from the .XDSL file format.

In [35]:
bn = readxdsl(joinpath(dirname(pathof(BayesNets)), "..", "test", "sample_bn.xdsl"))
Out[35]:

Bayesian Score for a Network Structure

The bayesian score for a discrete-valued BayesNet can can be calculated based only on the structure and data (the CPDs do not need to be defined beforehand). This is implemented with a method of bayesian_score that takes in a directed graph, the names of the nodes and data.

In [36]:
data = DataFrame(c=[1,1,1,1,2,2,2,2,3,3,3,3], 
                 b=[1,1,1,2,2,2,2,1,1,2,1,1],
                 a=[1,1,1,2,1,1,2,1,1,2,1,1])
g = DAG(3)
add_edge!(g,1,2); add_edge!(g,2,3); add_edge!(g,1,3)
bayesian_score(g, [:a,:b,:c], data)
Out[36]:
-29.642240688899513