Learning Comparisons

A variety of design decisions and parameters affect the behavior of score-based structure learning algorithms. Optimality is perhaps the most straightforward distinction. Some algorithms, such as dynamic programming, are guaranteed to return the best-scoring network; others, such as Friedman's sparse candidate algorithm, offer no guarantees on the quality of the returned network. The structure learning page discusses my work in this area in detail. The choice of scoring function and its parameters, such as the equivalent sample size (ESS) for the BDeu score, affects both global and local search algorithms. Constraints and priors offer a variety of ways to incorporate expert knowledge.

Zhifa, Changhe, and I have thoroughly investigated the effect of the scoring function on structure learning. In this work, we learned gold standard networks from UCI datasets. We then sampled a variety of datasets from each network. Next, we used an optimal structure learning algorithm to learn a network from each sampled dataset. Finally, we computed the structural Hamming distance between the gold standard and learned networks. We repeated this process using AIC, BIC, fNML, and BDeu with a variety of equivalent sample sizes. We found that BIC and fNML performed well. BDeu performed well for some ESSs, but poorly for others.
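To make the evaluation metric concrete, here is a minimal sketch of the structural Hamming distance, assuming networks are stored as n x n 0/1 adjacency matrices and following the common convention that a reversed edge counts as a single error; the exact variant used in our experiments may differ.

```python
import numpy as np

def structural_hamming_distance(gold: np.ndarray, learned: np.ndarray) -> int:
    """Count the vertex pairs whose edge status (no edge, i -> j, or j -> i)
    differs between the two networks.  Each mismatch corresponds to one
    edge addition, deletion, or reversal."""
    n = gold.shape[0]
    distance = 0
    for i in range(n):
        for j in range(i + 1, n):
            if (gold[i, j], gold[j, i]) != (learned[i, j], learned[j, i]):
                distance += 1
    return distance
```

A distance of zero means the learned network recovered the gold standard structure exactly.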

Abiola and I have evaluated the effect of expert knowledge on structure learning. I adapted the exact dynamic programming structure learning algorithm to incorporate hard structural constraints. That is, an expert specifies that some edges must exist and others are not allowed. (Experts can actually specify constraints in disjunctive normal form, so the constraints can be rather complex.) Abiola and I ran experiments using a publicly available epigenetic dataset. We derived our expert knowledge from the literature. We then ran the algorithm using a number of different constraints. We validated our model by counting the number of relationships present in both the literature and the learned network. Our results showed that "positive" constraints (A -> B must exist) often helped find other true positives. On the other hand, negative constraints did not help find more true positives in most cases.
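A disjunctive normal form constraint is just a disjunction of clauses, each a conjunction of edge literals. The sketch below shows one hypothetical encoding; the representation, names, and example constraint are illustrative assumptions, not the format the algorithm actually uses.

```python
# A literal is (edge, must_be_present); a clause is a list of literals that
# must all hold; the DNF constraint holds when at least one clause does.
def satisfies_dnf(edges: set, dnf: list) -> bool:
    return any(
        all((edge in edges) == present for edge, present in clause)
        for clause in dnf
    )

# "X -> Y exists and Z -> Y does not" OR "W -> Y exists"
constraint = [
    [(("X", "Y"), True), (("Z", "Y"), False)],
    [(("W", "Y"), True)],
]
print(satisfies_dnf({("X", "Y")}, constraint))  # True: first clause holds
print(satisfies_dnf({("Z", "Y")}, constraint))  # False: no clause holds
```

In an exact algorithm, a hard constraint of this form would presumably be enforced by rejecting any structure, or partial structure, that cannot satisfy it.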

Not all algorithms treat constraints the same way. Algorithms like the simple dynamic programming approach treat prior knowledge as "hard" constraints; they always return structures consistent with the constraints. Others use special scoring functions that bias the score toward networks closer to the structure the expert specified. These algorithms treat the expert knowledge as "soft" constraints; enough contradictory evidence from the data can "override" the constraint. In the future, I would like to investigate soft constraints and priors in more detail.
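As a point of contrast with hard constraints, here is a minimal sketch of one common soft-constraint formulation: a log-space structure prior that charges a fixed penalty for every edge on which the network disagrees with the expert. The names are hypothetical, and this is one of several possible priors rather than any particular system's implementation.

```python
def soft_constrained_score(data_score: float,
                           network_edges: set,
                           expert_edges: set,
                           penalty: float) -> float:
    """Combine the data score with a structure prior that charges
    `penalty` (in log space) per edge of disagreement with the expert."""
    disagreements = len(network_edges ^ expert_edges)  # symmetric difference
    return data_score - penalty * disagreements
```

Because the penalty is finite, a network that contradicts the expert can still win if the data support it strongly enough, which is exactly what makes the constraint "soft."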