My dissertation is "Learning Optimal Bayesian Networks with Heuristic Search." In this work, we propose a heuristic search formulation of the optimal Bayesian network structure learning problem. A Bayesian network structure is a directed acyclic graph (DAG) in which each vertex corresponds to a variable. In this problem, we are given a complete dataset (like a spreadsheet, where each row is a "record" and each column is one of the variables) and a scoring function. The scoring function takes as input a DAG structure and a dataset and outputs a score for that structure on that dataset. The problem is to find the structure that optimizes the scoring function for a given dataset.
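As a concrete illustration of such a scoring function, here is a minimal sketch of a BIC-style score for complete discrete data. It is a hypothetical example, not the scoring function used in the dissertation; the function name `bic_score` and the dict-based data representation are my own choices for the sketch. The key property it shows is decomposability: the score is a sum of one local term per variable and its parents.

```python
import math
from collections import Counter

def bic_score(data, structure):
    """BIC-style score of a DAG on complete discrete data (a sketch).
    `data` is a list of records (dicts mapping variable -> value);
    `structure` maps each variable to a tuple of its parent variables.
    The score decomposes into one local term per variable."""
    n = len(data)
    total = 0.0
    for var, parents in structure.items():
        # count (parent configuration, child value) pairs and parent configurations
        joint = Counter((tuple(r[p] for p in parents), r[var]) for r in data)
        par = Counter(tuple(r[p] for p in parents) for r in data)
        states = {r[var] for r in data}
        # maximized log-likelihood of this variable given its parents
        ll = sum(c * math.log(c / par[pc]) for (pc, _), c in joint.items())
        # BIC penalty: free parameters per variable, scaled by log n
        k = (len(states) - 1) * len(par)
        total += ll - 0.5 * k * math.log(n)
    return total
```

Because the score decomposes, the search formulation described next only ever needs the best local score of one variable given a candidate parent set, never the score of a whole network at once.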
In our search formulation, each node in the implicit search graph corresponds to a subnetwork over some subset of the variables. A successor of a node represents adding one more variable to the subnetwork and selecting its optimal parents. The start node of the search is the empty subnetwork, while the goal node is a subnetwork over all of the variables. A path from the start to the goal represents a choice of parents for every variable in the network structure, and the cost of the path is exactly the score of that structure. Therefore, the shortest path from the start to the goal corresponds to the optimal network structure.
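The formulation can be sketched with a plain Dijkstra search over subsets of variables. This is only a toy version under simplifying assumptions: local scores are taken as given (lower is better), parent sets are enumerated exhaustively, and the function and variable names are mine, not the dissertation's.

```python
import heapq
from itertools import combinations, count

def shortest_path_network(variables, local_score):
    """Optimal structure via shortest path in the implicit search graph.
    Nodes are frozensets of variables; moving from subset U to U | {X}
    costs the best local score of X with parents chosen from U.
    `local_score(X, parents)` is assumed precomputed; lower is better.
    Returns (total score, parent map)."""
    def best_parents(x, avail):
        # pick the cheapest parent set for x from the already-added variables
        best = (local_score(x, ()), ())
        for k in range(1, len(avail) + 1):
            for ps in combinations(sorted(avail), k):
                s = local_score(x, ps)
                if s < best[0]:
                    best = (s, ps)
        return best

    start, goal = frozenset(), frozenset(variables)
    dist = {start: 0.0}
    parents_of = {start: {}}
    tie = count()                       # tiebreaker so the heap never compares sets
    pq = [(0.0, next(tie), start)]
    while pq:
        g, _, u = heapq.heappop(pq)
        if u == goal:
            return g, parents_of[u]
        if g > dist[u]:
            continue                    # stale queue entry
        for x in goal - u:
            cost, ps = best_parents(x, u)
            v = u | {x}
            if v not in dist or g + cost < dist[v]:
                dist[v] = g + cost
                parents_of[v] = {**parents_of[u], x: ps}
                heapq.heappush(pq, (g + cost, next(tie), v))
```

Note how each start-to-goal path fixes an ordering of the variables, and the chosen parent sets along it assemble a full DAG whose score is the path cost.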
Using our shortest-path formulation, we have adapted several graph search algorithms to find optimal Bayesian network structures. Changhe and a previous student of his, Xiojian Wu, initially developed an A* algorithm. We have since improved the algorithm and shown it is viable for about 25 variables. As with most A* algorithms, storing the closed list in RAM is the main resource bottleneck. Because of the highly regular structure of the search space, we can divide it into layers: all predecessors of a node in layer l lie in layer l-1. Therefore, we only need to detect duplicates (maintain a closed list) for one layer at a time; after generating layer l, layer l-1 can be deleted. This search resembles Zhou and Hansen's breadth-first branch-and-bound algorithm. We also incorporate external memory. We have shown that this algorithm can scale to about 35 variables.
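The layer-by-layer sweep can be sketched as follows. This is a simplified in-memory illustration of the idea only, with no branch-and-bound pruning or external-memory handling, and `step_cost` standing in for the best local score of a variable given the current subset (an assumption of the sketch).

```python
def layered_search(variables, step_cost):
    """Sweep the search space one layer at a time.
    Layer l holds all subsets of size l; every predecessor of a node in
    layer l lies in layer l-1, so a duplicate-detection table is needed
    only for the layer currently being generated, and the previous layer
    can be dropped (or, in the real algorithm, pushed to disk) once the
    new layer is complete. `step_cost(x, subset)` is the assumed cost of
    adding variable x to `subset`; lower is better."""
    current = {frozenset(): 0.0}            # layer 0: the empty subnetwork
    for _ in range(len(variables)):
        nxt = {}
        for u, g in current.items():
            for x in set(variables) - u:
                v = u | {x}
                c = g + step_cost(x, u)
                if v not in nxt or c < nxt[v]:
                    nxt[v] = c              # duplicate detection within one layer
        current = nxt                       # previous layer is discarded here
    return current[frozenset(variables)]    # cost of the goal node
```

The memory saving is the point: instead of a closed list over the whole search space, only two layers ever exist at once, and the larger runs replace the in-memory dict with files on disk.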
I am also interested in bounded-error learning. For example, we have developed algorithms that learn networks whose scores are within, say, 5% of the score of the optimal network. These algorithms are based on weighted A*. Preliminary empirical results suggest that learning networks within 8% of optimal is "easy" (not NP-easy), while learning networks within 4% of optimal is much more difficult. In practice, score calculation is time-consuming; it often runs longer than the structure learning step itself. I would like to reduce that time, and I believe that introducing bounded error at this stage could significantly improve the overall efficiency of structure learning.
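The weighted A* idea can be sketched on the same toy search graph: inflate the heuristic by a weight w >= 1, so that f = g + w*h, which trades solution quality for speed while keeping the returned score within a factor w of optimal (given an admissible heuristic and nonnegative costs). The heuristic below relaxes acyclicity by letting every remaining variable pick its best parents from all other variables; everything else (names, exhaustive enumeration) is my own simplification, not the dissertation's implementation.

```python
import heapq
from itertools import combinations, count

def weighted_astar(variables, local_score, w=1.1):
    """Weighted A* over subsets of variables with f = g + w*h, w >= 1.
    `local_score(x, parents)` is assumed precomputed; lower is better.
    With the admissible relaxed heuristic below, the returned score is
    within a factor w of the optimal score (a sketch, not the thesis code)."""
    all_vars = frozenset(variables)

    def best(x, avail):
        # cheapest local score of x over all parent sets drawn from `avail`
        cands = [local_score(x, ())]
        for k in range(1, len(avail) + 1):
            cands += [local_score(x, ps) for ps in combinations(sorted(avail), k)]
        return min(cands)

    def h(u):
        # relaxation: each remaining variable may use any other variables
        # as parents, ignoring acyclicity -- never overestimates
        return sum(best(x, all_vars - {x}) for x in all_vars - u)

    tie = count()
    pq = [(w * h(frozenset()), next(tie), frozenset(), 0.0)]
    g_best = {frozenset(): 0.0}
    while pq:
        f, _, u, g = heapq.heappop(pq)
        if u == all_vars:
            return g
        if g > g_best.get(u, float('inf')):
            continue                    # stale entry
        for x in all_vars - u:
            v = u | {x}
            g2 = g + best(x, u)
            if g2 < g_best.get(v, float('inf')):
                g_best[v] = g2
                heapq.heappush(pq, (g2 + w * h(v), next(tie), v, g2))
```

With w = 1 this reduces to plain A*; larger w makes the search greedier, which is the knob behind the 4%-versus-8% behavior described above.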