In this notebook we see how to call Gobnilp from within an R session and how to use Gobnilp and bnlearn together. First of all let's attach the bnlearn package (which I will assume is already installed):
library(bnlearn)
Attaching package: ‘bnlearn’

The following object is masked from ‘package:stats’:

    sigma
To communicate with Gobnilp we will use the excellent reticulate
package (which is also assumed installed):
library(reticulate)
Now that reticulate is in use we can use its "import" command to import the gobnilp module:
gob <- import("pygobnilp.gobnilp")
Now we can use the newly created gob to access attributes of the gobnilp module. Usually the class Gobnilp is all we are interested in: we just want to use it to create a Gobnilp object. Here's how to do that:
m <- gob$Gobnilp()
Note that instead of Python's '.' for attribute access we use R's '$'. Although above we created the variable gob, there is usually no need to do so: if all we want is the Gobnilp object, this is simpler:
m <- import("pygobnilp.gobnilp")$Gobnilp()
Now that we have the object m
we can use it to, for example, learn a BN from a data file:
m$learn("discrete.dat",plot=FALSE)
I have put plot=FALSE since, although the learned BN is indeed plotted when running Gobnilp from R, errors are also generated. Note also that there is no textual output about the learned BN here. This is a Jupyter issue: the textual output is generated when using a normal R session. But we can still get a report on what was learned as follows:
m$learned_bn
**********
BN has score -24028.0947783535
**********
A<- -5502.137377150637
B<-A -3688.9395212202216
C<- -3501.5105385969146
D<-A,C -3555.014444236549
E<-B,F -4310.304956470649
F<- -3470.18794067853
**********
bnlearn modelstring =
[A][B|A][C][D|C:A][E|B:F][F]
writeLines(m$learned_bn$cpdag_str())
**********
CPDAG:
Vertices: A,B,C,D,E,F
A-B
A->D
B->E
C->D
F->E
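Since Gobnilp reports a bnlearn modelstring, one convenient way to move the learned structure into bnlearn is via bnlearn's model2network function. Here is a sketch, where the modelstring is copied by hand from the output above:

```r
library(bnlearn)

# Rebuild the Gobnilp-learned DAG as a bnlearn object from its modelstring
bn_from_gobnilp <- model2network("[A][B|A][C][D|C:A][E|B:F][F]")

# All of bnlearn's machinery is now available, e.g. its arcs and its own CPDAG
arcs(bn_from_gobnilp)
cpdag(bn_from_gobnilp)
```

This keeps the two systems loosely coupled: the structure crosses over as a plain string rather than as a Python object.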
We can also use Gobnilp to learn a BN from data supplied as an R data frame. Since we have bnlearn available we can grab its built-in dataset learning.test and then have a look at it.
data(learning.test)
summary(learning.test)
 A        B        C        D        E        F
 a:1668   a:2362   a:3717   a:1755   a:1941   a:2509
 b:1670   b: 568   b:1024   b:1570   b:1493   b:2491
 c:1662   c:2070   c: 259   c:1675   c:1566
So learning.test is a data frame with 6 discrete variables. Here's how to use Gobnilp to learn from this data frame (with default parameters):
m2 <- gob$Gobnilp()
m2$learn(learning.test,plot=FALSE)
m2$learned_bn
**********
BN has score -24028.0947783535
**********
A<- -5502.137377150637
B<-A -3688.939521220218
C<- -3501.5105385969146
D<-A,C -3555.014444236549
E<-B,F -4310.3049564706525
F<- -3470.18794067853
**********
bnlearn modelstring =
[A][B|A][C][D|C:A][E|B:F][F]
So we get the same BN as before, which is no surprise since learning.test and the data in the file discrete.dat are exactly the same! Since we have only 6 variables there is no need to use Gobnilp's default parent set limit of 3. So let's learn again without it.
m2 <- gob$Gobnilp()
m2$learn(learning.test,plot=FALSE,palim=99)
m2$learned_bn
**********
BN has score -24028.0947783535
**********
A<- -5502.137377150637
B<-A -3688.939521220218
C<- -3501.5105385969146
D<-A,C -3555.014444236549
E<-B,F -4310.3049564706525
F<- -3470.18794067853
**********
bnlearn modelstring =
[A][B|A][C][D|C:A][E|B:F][F]
We get the same BN yet again, so this really is a globally optimal BN. For comparison, we can use one of bnlearn's structure learning algorithms on learning.test. Here's hill-climbing, for example:
m3 <- hc(learning.test)
m3
  Bayesian network learned via Score-based methods

  model:
   [A][C][F][B|A][D|A:C][E|B:F]
  nodes:                                 6
  arcs:                                  5
    undirected arcs:                     0
    directed arcs:                       5
  average markov blanket size:           2.33
  average neighbourhood size:            1.67
  average branching factor:              0.83

  learning algorithm:                    Hill-Climbing
  score:                                 BIC (disc.)
  penalization coefficient:              4.258597
  tests used in the learning procedure:  40
  optimized:                             TRUE
So hill-climbing with the BIC score finds the same network.
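We can also check this agreement programmatically in bnlearn. The sketch below rebuilds Gobnilp's result from its printed modelstring (rather than converting the Python object directly) and compares it to the hill-climbing result:

```r
library(bnlearn)
data(learning.test)

m3 <- hc(learning.test)                                  # hill-climbing result
gob_bn <- model2network("[A][B|A][C][D|C:A][E|B:F][F]")  # Gobnilp's result

all.equal(m3, gob_bn)  # TRUE when the two DAGs are identical
compare(gob_bn, m3)    # arc-level agreement: true/false positives and negatives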
We can use Gobnilp to learn from continuous data in the same way. Here is an example using the bnlearn dataset gaussian.test:
data(gaussian.test)
m4 <- gob$Gobnilp()
m4$learn(gaussian.test,data_type="continuous",score="BGe",plot=FALSE,palim=99)
m4$learned_bn
**********
BN has score -53258.94161814129
**********
A<- -7124.782936593145
B<-D 487.26955605315015
D<- -14692.560410628308
C<-A,B -3743.043565645814
E<- -10545.851006239516
F<-A,D,E,G -7109.807736959381
G<- -10530.165518128275
**********
bnlearn modelstring =
[A][B|D][D][C|B:A][E][F|G:E:D:A][G]
And again, we can see how hill-climbing does (using BGe scoring) ...
m5 <- hc(gaussian.test,score="bge")
m5
score(m5,gaussian.test,type="bge")
  Bayesian network learned via Score-based methods

  model:
   [A][B][E][G][C|A:B][D|B][F|A:D:E:G]
  nodes:                                 7
  arcs:                                  7
    undirected arcs:                     0
    directed arcs:                       7
  average markov blanket size:           4.00
  average neighbourhood size:            2.00
  average branching factor:              1.00

  learning algorithm:                    Hill-Climbing
  score:                                 Bayesian Gaussian (BGe)
  graph prior:                           Uniform
  imaginary sample size (normal):        1
  imaginary sample size (Wishart):       9
  tests used in the learning procedure:  75
  optimized:                             TRUE
Hill-climbing and Gobnilp's exact algorithm find BNs in the same Markov equivalence class, and since BGe is score-equivalent they have the same score. In this particular example, the only advantage of exact learning is that we know we have the optimal BN.
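The Markov-equivalence claim can itself be checked in bnlearn by comparing CPDAGs. The following sketch again rebuilds Gobnilp's network from the modelstring it printed above:

```r
library(bnlearn)
data(gaussian.test)

m5 <- hc(gaussian.test, score = "bge")
gob_bn <- model2network("[A][B|D][D][C|B:A][E][F|G:E:D:A][G]")

# Two DAGs are Markov equivalent iff their CPDAGs coincide
all.equal(cpdag(m5), cpdag(gob_bn))

# Score equivalence: both DAGs receive the same BGe score
score(m5, gaussian.test, type = "bge")
score(gob_bn, gaussian.test, type = "bge")
```

Here the two DAGs differ only in the orientation of the B-D arc, which is not compelled, so they collapse to the same CPDAG.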