In this notebook we give an easy method for finding the optimal BN for a subset of the BN variables. First we create a Gobnilp object and read in some data:
from pygobnilp.gobnilp import Gobnilp
m = Gobnilp()
m.learn('discrete.dat',end='data')
Using license file /home/james/gurobi.lic Academic license - for non-commercial use only Changed value of parameter PreCrush to 1 Prev: 0 Min: 0 Max: 1 Default: 0 Changed value of parameter CutPasses to 100000 Prev: -1 Min: -1 Max: 2000000000 Default: -1 Changed value of parameter GomoryPasses to 100000 Prev: -1 Min: -1 Max: 2000000000 Default: -1 Changed value of parameter MIPFocus to 2 Prev: 0 Min: 0 Max: 3 Default: 0 Changed value of parameter ZeroHalfCuts to 2 Prev: -1 Min: -1 Max: 2 Default: -1 Changed value of parameter MIPGap to 0.0 Prev: 0.0001 Min: 0.0 Max: inf Default: 0.0001 Changed value of parameter MIPGapAbs to 0.0 Prev: 1e-10 Min: 0.0 Max: inf Default: 1e-10
We can use the bn_variables
property to get a list of the variables in this data. Normally a BN would be learned for all these variables.
m.bn_variables
['A', 'B', 'C', 'D', 'E', 'F']
To restrict attention to just the variables B, C, D and F, we just use the set_bn_variables
method:
m.set_bn_variables('BCDF')
m.bn_variables
['B', 'C', 'D', 'F']
Since the data has already been read in we use the learn
without supplying a data source and start the learning process from the 'data' stage.
m.learn(start='data')
********** BN has score -16917.208027209443 ********** B<- -4841.285573475907 C<- -3501.5105385969146 D<-B,C -5104.223974458091 F<- -3470.18794067853 ********** bnlearn modelstring = [B][C][D|B:C][F] ********** CPDAG: Vertices: B,C,D,F B->D C->D
It is interesting to compare this network to that learned from the entire set of variables (when there is no restricton on parent set size).
m.set_bn_variables('ABCDEF')
m.learn(start='data',palim=None) # no restricton on parent set size
********** BN has score -24028.0947783535 ********** A<- -5502.137377150637 B<-A -3688.9395212202216 C<- -3501.5105385969146 D<-A,C -3555.014444236549 E<-B,F -4310.304956470649 F<- -3470.18794067853 ********** bnlearn modelstring = [A][B|A][C][D|A:C][E|B:F][F] ********** CPDAG: Vertices: A,B,C,D,E,F A-B A->D B->E C->D F->E
We get the same conditional independence relations for B, C, D and F in both graphs.