TIGHTENING BOUNDS FOR BAYESIAN NETWORK STRUCTURE LEARNING
Xiannian Fan, Changhe Yuan and Brandon Malone

A recent breadth-first branch and bound algorithm (BFBnB) for learning Bayesian network structures (Malone et al. 2011) uses two bounds to prune the search space for better efficiency: a lower bound calculated from pattern database heuristics, and an upper bound obtained by a hill climbing search. Whenever the lower bound of a search path exceeds the upper bound, the path is guaranteed to lead to suboptimal solutions and is discarded immediately. This paper introduces methods for tightening both bounds. The lower bound is tightened by using more informed variable groupings in creating the pattern databases, and the upper bound is tightened using an anytime learning algorithm. Empirical results show that these bounds improve the efficiency of Bayesian network learning by two to three orders of magnitude.

Bayesian Network Structure Learning

Representation. Joint probability distribution over a set of variables.
Structure. A DAG encoding conditional dependencies.
•Vertices correspond to variables.
•Edges indicate relationships among variables.
Parameters. Conditional probability distributions.
Learning. Find the network with the minimal score for a complete dataset D. We often omit D for brevity.

Dynamic Programming

Intuition. Every DAG must have a leaf. Optimal networks for a single variable are trivial. Recursively add new leaves and select their optimal parents until all variables have been added. All orderings have to be considered.

(Figure: begin with a single variable; pick one variable as a leaf and find its optimal parents; pick another leaf and find its optimal parents from the current variables; continue picking leaves and finding optimal parents.)

Recurrences.
Score(U) = min_{X in U} { Score(U \ {X}) + BestScore(X, U \ {X}) }
BestScore(X, U) = min_{PA ⊆ U} score(X | PA)

Graph Search Formulation

The dynamic programming can be visualized as a search through an order graph.

The Order Graph
Node. A subset U of the variables, storing Score(U), the score of the best subnetwork over U.
Successor. Add a variable X as a leaf to U.
Path. Induces an ordering on the variables.
Size. 2^n nodes, one for each subset.

Admissible Heuristic Search Formulation

Start Node. Top node, {}.
Goal Node. Bottom node, V.
Shortest Path. Corresponds to the optimal structure.
g(U). Score(U).
h(U). Obtained by relaxing the acyclicity constraint.

Tightening Lower Bound

The lower bound is calculated from a pattern database heuristic called the k-cycle conflict heuristic. In particular, the static k-cycle conflict pattern database was shown to perform well.

Computing the k-Cycle Conflict Heuristic. The main idea is to relax the acyclicity constraint between groups of variables; acyclicity is still enforced among the variables within each group. For an 8-variable problem, partition the variables by Simple Grouping (SG) into two groups: G1 = {X1, X2, X3, X4} and G2 = {X5, X6, X7, X8}. We created the pattern databases with a backward breadth-first search in the order graph for each group.

E.g., how do we calculate the heuristic for the pattern {X2, X3, X5, X7}?
P1 = h1({X2, X3}) = BestScore(X2, {X1, X4} ∪ G2) + BestScore(X3, {X1, X2, X4} ∪ G2)
P2 = h2({X5, X7}) = BestScore(X5, {X6, X8} ∪ G1) + BestScore(X7, {X5, X6, X8} ∪ G1)
Additive pattern database heuristic: h({X2, X3, X5, X7}) = h1({X2, X3}) + h2({X5, X7}) = P1 + P2
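To make the lookup concrete, here is a minimal Python sketch of the additive pattern database query under simple grouping. The databases, costs, and names below are illustrative placeholders, not the actual implementation.

```python
# Minimal sketch of an additive pattern database lookup for the
# k-cycle conflict heuristic. The entries below are illustrative
# placeholders; in the real algorithm they are filled by a backward
# breadth-first search in the order graph of each group.

# Static grouping of an 8-variable problem (simple grouping shown).
G1 = frozenset({"X1", "X2", "X3", "X4"})
G2 = frozenset({"X5", "X6", "X7", "X8"})

# Hypothetical pattern databases: pattern (a subset of one group) -> cost.
# Each entry stores the best cost of adding the pattern's variables as
# leaves, enforcing acyclicity only within the group.
pdb = {
    G1: {frozenset({"X2", "X3"}): 4.2},   # h1({X2, X3}) = P1
    G2: {frozenset({"X5", "X7"}): 3.1},   # h2({X5, X7}) = P2
}

def heuristic(pattern):
    """Additive heuristic: split the pattern by group and sum the
    precomputed costs. Additivity holds because each group's database
    only scores edges pointing into its own variables."""
    total = 0.0
    for group, table in pdb.items():
        sub_pattern = frozenset(pattern) & group
        if sub_pattern:
            total += table[sub_pattern]
    return total

# h({X2, X3, X5, X7}) = h1({X2, X3}) + h2({X5, X7}) = P1 + P2
print(heuristic({"X2", "X3", "X5", "X7"}))  # 7.3
```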
More Informed Grouping Strategies

Rather than using SG (grouping the first half of the variables versus the second half), we developed more informed grouping strategies.

1. Maximize the correlation between the variables within each group, and minimize the correlation between groups.
a) Family Grouping (FG): We created a correlation graph with the Max-Min Parents and Children (MMPC) algorithm, weighted the edges by negative p-values, and then performed graph partitioning.
b) Parents Grouping (PG): We created a correlation graph by considering only the optimal parent set out of all the other variables for each variable, weighted the edges by negative p-values, and then performed graph partitioning.

2. Use topological ordering information.
a) Topology Grouping (TG): We took the topological ordering of an anytime Bayesian network solution found by AWA*, then partitioned the variables according to that ordering.

Tightening Upper Bound

Anytime window A* (AWA*) was shown to find high-quality, often optimal, solutions very quickly, thus providing a tight upper bound. A sketch of how the two bounds work together in BFBnB appears after the references.

Experiments

We measured the running time (in seconds) and the number of expanded nodes of BFBnB, compared against the previous heuristic.

(Figure: the effect of different grouping strategies on the number of expanded nodes and running time. The four grouping methods are the simple grouping (SG), FG, PG, and TG.)

(Figure: the effect of upper bounds generated by running AWA* for different amounts of time on the performance of BFBnB search.)

Selected References

1. Yuan, C.; Malone, B.; and Wu, X. 2011. Learning optimal Bayesian networks using A* search. In IJCAI '11.
2. Malone, B.; and Yuan, C. 2011. Improving the scalability of optimal Bayesian network learning with frontier breadth-first branch and bound search. In UAI '11.
3. Felner, A.; Korf, R. E.; and Hanan, S. 2004. Additive pattern database heuristics. Journal of Artificial Intelligence Research (JAIR) 22.
4. Malone, B.; and Yuan, C. 2013. Evaluating anytime algorithms for learning optimal Bayesian networks. In UAI '13.
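As noted above, the following is a minimal Python sketch of the pruning rule that ties the two bounds together. All names here are hypothetical illustrations, not the authors' implementation.

```python
# Minimal sketch of the pruning rule in breadth-first branch and bound
# (BFBnB). In the real algorithm, the heuristic comes from the pattern
# databases and the upper bound from an anytime algorithm such as AWA*.

def should_prune(g_cost, remaining_pattern, upper_bound, heuristic):
    """A path is discarded when its lower bound f = g + h exceeds the
    cost of the best complete network found so far (the upper bound);
    such a path is guaranteed to lead to suboptimal solutions."""
    f = g_cost + heuristic(remaining_pattern)
    return f > upper_bound

# Demo with a stand-in heuristic value: a node whose partial network
# costs 10.0, with {X2, X3, X5, X7} still to be added as leaves.
h = lambda pattern: 7.3  # e.g., h({X2, X3, X5, X7}) = P1 + P2
print(should_prune(10.0, {"X2", "X3", "X5", "X7"}, 15.0, h))  # True: 17.3 > 15.0
```

A tighter lower bound (larger h) or a tighter upper bound (smaller incumbent cost) makes this test fire more often, which is exactly why tightening both bounds prunes more of the order graph.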