I introduce a new open source Bayesian network structure learning API called Free-BN (FBN). FBN is licensed under the Apache 2.0 license. Below, I’ll scratch the surface of FBN and walk you through an example of using it.

Why another Bayesian network structure learning API?

While working on my dissertation, I had a tough time finding open source APIs for constraint-based structure learning of Bayesian networks. The few open source APIs I found dealing with Bayesian networks written in Java were:

This page here provides a long list of Bayesian network related software/APIs. One of the fruits of my dissertation work (though not reported or included in my doctoral dissertation) was the development of FBN, a Java API for Bayesian network structure learning.

Some features of FBN

So, what can FBN currently do (related to Bayesian networks)? Here’s a non-exhaustive list.
Working with FBN should be relatively easy. It’s meant to be an API (not an application). Currently, FBN can only learn from database sources, although you could extend the API to learn from flat files. FBN is designed around inversion of control (IoC) via dependency injection (DI) and uses the Spring Framework to achieve that design. Because FBN uses DI and works primarily with interfaces, the API can easily be extended to include other structure learning algorithms.

Walkthrough preliminaries

Before I walk through how to use FBN, here is some background information. The dataset is generated using logic sampling from the Bayesian network reported by (Cooper 1992). This Bayesian network has three variables: X1, X2, and X3. Its structure is a serial connection: X1 -> X2 -> X3. The local probability models reported are shown in the table below.
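Independently of the reported probabilities, the sampling procedure itself is short enough to sketch. Logic sampling draws each variable in arc order, so every child is sampled from the distribution selected by its already-sampled parent. The class name and the probability values below are placeholders chosen for illustration; they are not taken from FBN or from the table, and FBN’s own generator lives in DataGenerator.java.

```java
import java.util.Random;

/** Minimal logic (forward) sampling sketch for a serial network X1 -> X2 -> X3. */
final class LogicSamplingSketch {
    // Placeholder parameters for illustration only (assumed, not from the source).
    static final double P_X1 = 0.6;                     // P(x1 = 1)
    static final double[] P_X2_GIVEN_X1 = {0.3, 0.8};   // P(x2 = 1 | x1 = 0), P(x2 = 1 | x1 = 1)
    static final double[] P_X3_GIVEN_X2 = {0.15, 0.9};  // P(x3 = 1 | x2 = 0), P(x3 = 1 | x2 = 1)

    /** Draws n complete samples; each row is {x1, x2, x3} with values 0 or 1. */
    static int[][] sample(int n, long seed) {
        Random rng = new Random(seed);
        int[][] rows = new int[n][];
        for (int i = 0; i < n; i++) {
            // Sample parents before children, following the direction of the arcs.
            int x1 = rng.nextDouble() < P_X1 ? 1 : 0;
            int x2 = rng.nextDouble() < P_X2_GIVEN_X1[x1] ? 1 : 0;
            int x3 = rng.nextDouble() < P_X3_GIVEN_X2[x2] ? 1 : 0;
            rows[i] = new int[] {x1, x2, x3};
        }
        return rows;
    }

    public static void main(String[] args) {
        int[][] rows = sample(10000, 42L);
        int onesX1 = 0;
        for (int[] r : rows) {
            onesX1 += r[0];
        }
        System.out.printf("Fraction of samples with x1 = 1: %.3f%n", onesX1 / (double) rows.length);
    }
}
```

Fixing the seed makes the generated dataset reproducible, which is convenient when comparing learned structures across runs.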
The algorithm used to learn the Bayesian network from the data is Three Phase Dependency Analysis (TPDA) (Cheng 2002). TPDA is a constraint-based Bayesian network structure learning algorithm with three phases: drafting, thickening, and thinning. TPDA is implemented in FBN and will be used to learn the Bayesian network structure from the data generated using logic sampling.

Set up your data source

FBN takes as input data stored in a database that has a JDBC driver. Some examples of such databases are Oracle, MS SQL Server, and MySQL. In this walkthrough, I’ll show examples using MySQL. The data must be stored in two separate tables: one table to specify the variables (denote this as vtable), and one table to hold the actual data (denote this as dtable). The vtable should have the following fields: name, type, and domain. An example of a DDL for a vtable using MySQL is:
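A minimal sketch of such a DDL, issued through plain JDBC, might look like the following. Only the three field names (name, type, domain) come from the text; the column types, lengths, the primary key, and the class and method names are assumptions, so check them against the demo/mysql.sql script that ships with FBN.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical helper that creates a variable table with name, type, and domain columns. */
final class VtableDdlSketch {
    /** Builds the CREATE TABLE statement; column types and sizes are assumed. */
    static String createVtableSql(String tableName) {
        return "CREATE TABLE " + tableName + " ("
            + "`name` VARCHAR(64) NOT NULL PRIMARY KEY, "  // variable name, e.g. x1
            + "`type` INT NOT NULL, "                      // numeric type code
            + "`domain` VARCHAR(255) NOT NULL"             // allowed values (encoding assumed)
            + ")";
    }

    /** Executes the DDL on an open JDBC connection. */
    static void createVtable(Connection conn, String tableName) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.executeUpdate(createVtableSql(tableName));
        }
    }

    public static void main(String[] args) {
        // Printing the statement lets you paste it into a MySQL client instead.
        System.out.println(createVtableSql("vtable"));
    }
}
```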
Since we have three binary variables (x1, x2, and x3), we have to insert values into the vtable to describe these variables.
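As a rough illustration, the three rows could be inserted with a JDBC PreparedStatement as sketched here. The table name vtable and the column names follow the text; the domain encoding ("true,false") and the class and method names are assumptions made for this sketch, not FBN’s documented format.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical helper that describes x1, x2, and x3 in the vtable. */
final class VtableInsertSketch {
    static final int CATEGORICAL = 1; // type code used for categorical variables

    /** One row per variable: {name, type code, domain}. */
    static Object[][] variableRows() {
        String domain = "true,false"; // assumed encoding of a binary domain
        return new Object[][] {
            {"x1", CATEGORICAL, domain},
            {"x2", CATEGORICAL, domain},
            {"x3", CATEGORICAL, domain}
        };
    }

    /** Inserts all variable rows in a single batch. */
    static void insertVariables(Connection conn) throws SQLException {
        String sql = "INSERT INTO vtable (`name`, `type`, `domain`) VALUES (?, ?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (Object[] row : variableRows()) {
                ps.setString(1, (String) row[0]);
                ps.setInt(2, (Integer) row[1]);
                ps.setString(3, (String) row[2]);
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }

    public static void main(String[] args) {
        for (Object[] row : variableRows()) {
            System.out.println(row[0] + " | " + row[1] + " | " + row[2]);
        }
    }
}
```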
The type is set to 1 for categorical variables. For all types see net.fdm.data.intf.Variable. Now, we have to create a table to hold the data. The following is a sample MySQL DDL to create such a table.
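One plausible layout for the dtable is a column per variable and a row per sample. The sketch below generates such a DDL from the variable names. That layout, the VARCHAR column type, and the class and method names are all assumptions; the authoritative script is demo/mysql.sql in the FBN source.

```java
/** Hypothetical helper that derives a data-table DDL from the variable names. */
final class DtableDdlSketch {
    /** Builds CREATE TABLE with one column per variable (layout and types assumed). */
    static String createDtableSql(String tableName, String[] variableNames) {
        StringBuilder sql = new StringBuilder("CREATE TABLE " + tableName + " (");
        for (int i = 0; i < variableNames.length; i++) {
            if (i > 0) {
                sql.append(", ");
            }
            // Backticks guard against variable names that collide with SQL keywords.
            sql.append('`').append(variableNames[i]).append("` VARCHAR(32) NOT NULL");
        }
        return sql.append(")").toString();
    }

    public static void main(String[] args) {
        System.out.println(createDtableSql("dtable", new String[] {"x1", "x2", "x3"}));
    }
}
```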
Now that we have created the dtable, insert data into it.
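Loading many sampled rows is usually done as a JDBC batch rather than one statement per row. The sketch below assumes a dtable with one column per variable and stores each 0/1 sample value as the string "false" or "true"; the layout, the value encoding, and the class and method names are assumptions for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical helper that batch-inserts sampled rows into the data table. */
final class DtableInsertSketch {
    /** Builds a parameterized INSERT with one placeholder per variable. */
    static String insertSql(String tableName, String[] variableNames) {
        StringBuilder cols = new StringBuilder();
        StringBuilder marks = new StringBuilder();
        for (int i = 0; i < variableNames.length; i++) {
            if (i > 0) {
                cols.append(", ");
                marks.append(", ");
            }
            cols.append('`').append(variableNames[i]).append('`');
            marks.append('?');
        }
        return "INSERT INTO " + tableName + " (" + cols + ") VALUES (" + marks + ")";
    }

    /** Sends all samples to the database in one batch; each sample holds 0/1 values. */
    static void insertSamples(Connection conn, String tableName,
                              String[] variableNames, int[][] samples) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(insertSql(tableName, variableNames))) {
            for (int[] sample : samples) {
                for (int c = 0; c < sample.length; c++) {
                    ps.setString(c + 1, sample[c] == 1 ? "true" : "false"); // assumed encoding
                }
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }

    public static void main(String[] args) {
        System.out.println(insertSql("dtable", new String[] {"x1", "x2", "x3"}));
    }
}
```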
If you download the source code for FBN, the MySQL scripts are located in demo/mysql.sql. The source code to create the Bayesian network and perform logic sampling is located in demo/com/vang/jee/fbn/demo/DataGenerator.java.

Set up your structure learning algorithm

Now it’s time to set up our structure learning algorithm of choice. We can do so in code (using Java) or, as the better alternative, “wire up” the algorithm using Spring and XML files. The following code shows how to wire up the TPDA structure learning algorithm using Java.
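The self-contained sketch below illustrates only the shape of that wiring: a data source is handed to a DAO, the DAO is handed to a learner, and the learner turns an array of variables into a set of edges. Every type in it (ToyDataSource, ToyVariableDao, ToyLearner) is a stand-in defined in the block, not an FBN class. The data source is an in-memory table rather than MySQL, and the learner is a much simpler constraint-based procedure than TPDA: it adds an edge when two variables share mutual information and removes it again when a third variable makes them conditionally independent. The counts and the 0.01 threshold are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Wiring sketch with hypothetical stand-in types; none of these are FBN classes. */
final class WiringSketch {
    /** Stand-in for a JDBC DataSource: an in-memory table of 0/1 samples. */
    static final class ToyDataSource {
        final String[] names = {"x1", "x2", "x3"};
        final List<int[]> rows = new ArrayList<>();
        void add(int count, int x1, int x2, int x3) {
            for (int i = 0; i < count; i++) rows.add(new int[] {x1, x2, x3});
        }
    }

    /** Stand-in for a DAO exposing the variables and their data. */
    static final class ToyVariableDao {
        private final ToyDataSource ds;
        ToyVariableDao(ToyDataSource ds) { this.ds = ds; }
        String[] getVariables() { return ds.names.clone(); }
        int[][] getRows() { return ds.rows.toArray(new int[0][]); }
    }

    /** Stand-in learner (not TPDA): marginal tests add edges, conditional tests remove them. */
    static final class ToyLearner {
        private final ToyVariableDao dao;
        private final double threshold;
        ToyLearner(ToyVariableDao dao, double threshold) { this.dao = dao; this.threshold = threshold; }

        /** Expects vars in the same order as the data columns; returns edges such as "x1-x2". */
        List<String> learn(String[] vars) {
            int[][] rows = dao.getRows();
            List<String> edges = new ArrayList<>();
            for (int a = 0; a < vars.length; a++) {
                for (int b = a + 1; b < vars.length; b++) {
                    boolean keep = cmi(rows, a, b, -1) > threshold;
                    for (int c = 0; keep && c < vars.length; c++) {
                        if (c != a && c != b && cmi(rows, a, b, c) <= threshold) keep = false;
                    }
                    if (keep) edges.add(vars[a] + "-" + vars[b]);
                }
            }
            return edges;
        }
    }

    /** Empirical I(A;B|C) in nats for binary columns; c = -1 gives plain mutual information. */
    static double cmi(int[][] rows, int a, int b, int c) {
        double[][][] n = new double[2][2][2]; // counts indexed [c][a][b]
        for (int[] r : rows) n[c < 0 ? 0 : r[c]][r[a]][r[b]]++;
        double sum = 0;
        for (int k = 0; k < 2; k++) {
            double nc = n[k][0][0] + n[k][0][1] + n[k][1][0] + n[k][1][1];
            for (int i = 0; i < 2; i++) {
                for (int j = 0; j < 2; j++) {
                    double nac = n[k][i][0] + n[k][i][1];
                    double nbc = n[k][0][j] + n[k][1][j];
                    if (n[k][i][j] > 0) {
                        sum += n[k][i][j] / rows.length * Math.log(n[k][i][j] * nc / (nac * nbc));
                    }
                }
            }
        }
        return sum;
    }

    static ToyDataSource getDataSource() {
        ToyDataSource ds = new ToyDataSource();
        // 32 rows whose counts follow a chain x1 -> x2 -> x3 exactly (illustrative values).
        ds.add(9, 1, 1, 1); ds.add(3, 1, 1, 0); ds.add(1, 1, 0, 1); ds.add(3, 1, 0, 0);
        ds.add(3, 0, 1, 1); ds.add(1, 0, 1, 0); ds.add(3, 0, 0, 1); ds.add(9, 0, 0, 0);
        return ds;
    }

    static ToyVariableDao getVariableDao(ToyDataSource ds) { return new ToyVariableDao(ds); }

    static ToyLearner getStructureLearner(ToyVariableDao dao) { return new ToyLearner(dao, 0.01); }

    public static void main(String[] args) {
        ToyVariableDao dao = getVariableDao(getDataSource());
        List<String> edges = getStructureLearner(dao).learn(dao.getVariables());
        System.out.println(edges); // prints [x1-x2, x2-x3]
    }
}
```

Because each collaborator is passed in rather than constructed internally, swapping the toy learner for another algorithm only changes getStructureLearner, which is the same property that Spring wiring provides through configuration. In the actual FBN code, the corresponding pieces are a JDBC DataSource, FBN’s VariableDao, and the TPDA implementation.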
The getDataSource method gets a DataSource pointing to your database (MySQL in this instance). The getVariableDao method provides a reference to the VariableDao object that has access to the variables and data. The getStructureLearner method wires up the TPDA implementation. In the main method, you get a reference to all the variables for which you want to perform Bayesian network structure learning and an instance of the structure learner. You then pass this array of variables into the learner to produce a Graph. The nodes in the graph should be: x1, x2, x3. The arcs in this graph are: x1–x2 and x2–x3. Therefore, the structure is: x1–x2–x3. Note that this graph is undirected, so by itself it does not satisfy the directed acyclic graph (DAG) requirement of a Bayesian network. The source code for this learning example is located in the source distribution under demo/src/com/vang/jee/fbn/demo/TestLearning.java.

How to get the source and dependencies?

The FBN API depends on two other minor projects called Free-Display and Free-GA (FGA). The Free-Display API is used to visualize the Bayesian networks, while the FGA API is used for search-and-scoring methods for Bayesian network structure learning. You may download all these APIs, and they are all licensed under the Apache 2.0 license.
I hope this API helps you in your research. Happy research, data mining, and programming! Cheers! Sib ntsib dua mog (see you again)!

References