Finding Optimal Bayesian Network Structures with Constraints Learned from Data
Xiannian Fan (City University of New York), Brandon Malone (University of Helsinki), Changhe Yuan (City University of New York)

Overview
• Structure Learning of Bayesian Networks
• Graph Search Formulation
• Reduce Search Space Using Constraints
• Experiments
• Summary

Learning optimal Bayesian networks
• Very often we have data sets.
• From such data we can extract knowledge: the network structure and its numerical parameters.

Score-based learning
• Goal: find a Bayesian network $N$ that optimizes a scoring function, e.g., MDL or BIC.
• The problem is NP-hard.
• Decomposability: $s(N) = \sum_i s_i(PA_i)$, where $PA_i$ is the parent set of $X_i$ in $N$.

Graph Search Formulation
• Formulate the learning task as a shortest-path problem: the shortest-path solution to the graph search problem corresponds to an optimal Bayesian network [Yuan, Malone, Wu, IJCAI-11].

Search graph (order graph)
• Search space: variable subsets.
• Start node: the empty set; goal node: the complete variable set.
• Edges: selecting parents; the edge $U \rightarrow U \cup \{X\}$ has cost $BestScore(X, U)$.
• Task: find the shortest path between the start and goal nodes [Yuan, Malone, Wu, IJCAI-11].
(Figure: the order graph over four variables, a lattice of subsets from ∅ up to {1,2,3,4}.)

Pruning Based on Bounds
• Improved search heuristic [Yuan and Malone, UAI-12].
• Tightening bounds [Fan, Yuan and Malone, AAAI-14].
(Figure: an order graph in which several subsets are pruned by the bounds and never expanded.)

Search Space
• Size: $O(2^n)$ [Yuan, Malone, Wu, IJCAI-11].
(Figure: the full order graph over four variables.)
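The shortest-path formulation above can be sketched as a dynamic program over the order graph: every subset is a node, and extending a subset $U$ by one variable $X$ costs $BestScore(X, U)$. A minimal sketch in Python, assuming a caller-supplied function `best_score(x, subset_mask)` that returns the best local score for variable `x` with parents drawn from the bitmask `subset_mask` (both names are illustrative, not from the authors' code):

```python
def shortest_path_order_graph(n, best_score):
    """Exact structure learning as a shortest path through the order graph.

    n: number of variables (subsets are encoded as n-bit masks).
    best_score(x, subset_mask): best local score for variable x when its
    parents must come from subset_mask (hypothetical interface).
    Returns the score of an optimal network, i.e. the shortest-path cost
    from the empty set to the complete set.
    """
    INF = float("inf")
    full = (1 << n) - 1
    g = [INF] * (1 << n)       # g[U] = cost of the best path from {} to U
    g[0] = 0.0
    # Increasing integer order is a valid topological order of the order
    # graph: clearing bits of a mask can only decrease its value.
    for u in range(1 << n):
        if g[u] == INF:
            continue
        for x in range(n):     # add one more leaf variable x to U
            if u & (1 << x):
                continue
            v = u | (1 << x)
            cost = g[u] + best_score(x, u)
            if cost < g[v]:
                g[v] = cost
    return g[full]
```

This enumerates all $O(2^n)$ subsets, which matches the search-space size above; the pruning and constraint techniques in the following slides exist precisely to avoid expanding most of these nodes.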
Motivation for our work
• Observation: most local scores are not optimal for any variable ordering.
• Sparse parent graph [Yuan and Malone, UAI-12]:
 – Bounds on parent sets for MDL [Tian, UAI-00].
 – Properties of decomposable score functions [de Campos and Ji, JMLR-11].
• Consequently, most local scores can be pruned.

Sparse Parent Graph
• Potentially optimal parent sets for X1: propagate the best scores, then prune useless local scores.
(Figures (a)–(c): the local score table s1(PA1) before propagation, after propagating the best scores, and after pruning.)

Motivation for our work (continued)
• Potentially Optimal Parent Sets (POPS): an example.
• Observation: not all variables can possibly be ancestors of the others.
 – E.g., in the example, no variable in {X3, X4, X5, X6} can be an ancestor of X1 or X2.

POPS Constraints
• Parent-child graph: formed by aggregating the POPS.
• Component graph: formed by the strongly connected components (SCCs) of the parent-child graph; it gives the ancestor constraints.
(Figure: component graph with SCCs {1, 2} and {3, 4, 5, 6}.)
• Decomposing the problem:
 – Each SCC corresponds to a smaller subproblem.
 – Each subproblem can be solved independently.
• Recursive POPS constraints:
 – Selecting the parents for one of the variables has the effect of removing that variable from the parent relation graph, which may split the remaining components further.
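The POPS-constraint construction above can be sketched directly: aggregate the POPS into a parent-child graph, then take its strongly connected components; the SCCs, in topological order, are the independent subproblems. A minimal sketch using Kosaraju's algorithm (the input format `var -> list of candidate parent sets` is an illustrative assumption, not the authors' data structure):

```python
def parent_child_graph(pops):
    """Aggregate POPS into a directed graph: add edge p -> c whenever p
    appears in some potentially optimal parent set of child c."""
    edges = {v: set() for v in pops}
    for child, parent_sets in pops.items():
        for ps in parent_sets:
            for p in ps:
                edges[p].add(child)
    return edges

def sccs(edges):
    """Kosaraju's algorithm.  Returns the strongly connected components
    in topological order of the component graph (ancestor components
    first), which is exactly the order the subproblems can be solved in."""
    order, seen = [], set()
    def dfs1(v):                      # first pass: record finish order
        seen.add(v)
        for w in edges[v]:
            if w not in seen:
                dfs1(w)
        order.append(v)
    for v in edges:
        if v not in seen:
            dfs1(v)
    rev = {v: set() for v in edges}   # second pass: reversed graph
    for v, ws in edges.items():
        for w in ws:
            rev[w].add(v)
    comps, seen = [], set()
    for v in reversed(order):
        if v in seen:
            continue
        comp, stack = set(), [v]
        seen.add(v)
        while stack:
            u = stack.pop()
            comp.add(u)
            for w in rev[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps
```

For instance, with POPS in which X1 and X2 only name each other as parents while X3 and X4 reference each other and X1, the components come out as {1, 2} followed by {3, 4}, mirroring the slide's ancestor constraint.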
Top-p POPS Constraints
• Create the parent relation graph using only the best p POPS of each variable.
 – A tradeoff between optimality and efficiency (lossy score pruning).
• Sub-optimality bound: the most loss incurred by excluding the pruned POPS for $X_i$ is
 $\delta_i = \max\left(s_i(PA_i) - s_i(PA'_i)\right)$,
 where $s_i(PA_i)$ is the score when $X_i$ selects its parents $PA_i$ under the top-p constraints, and $s_i(PA'_i)$ is the score when $X_i$ selects parents without constraints.
• Error bound for a score $s(N)$ learned under the POPS constraints:
 $\epsilon = \frac{s(N)}{\sum_i \left(s_i(PA_i) - \delta_i\right)}$

Experiments: POPS and Recursive POPS Constraints
• Alarm (37 variables): expanded nodes (millions) and running time (seconds) for No Constraint, POPS, and Recursive POPS. (Figure: bar charts.)
• Barley (48 variables): the same comparison. (Figure: bar charts.)
• Soybean (36 variables): the same comparison. (Figure: bar charts.)

Experiments: Top-p POPS Constraints
• Hailfinder, for varying values of p. (Figure: results as a function of p.)

Summary
• Graph search formulation for learning optimal Bayesian networks.
• This work: reduce the search space using POPS constraints.
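The top-p pruning and its sub-optimality bound can be sketched as follows. This is a hedged reading of the bound: the input format (`var -> {parent_set: score}`, smaller scores better, as with MDL/BIC) is an illustrative assumption, and `delta` is computed here as the gap between the worst retained score and the best overall score, one conservative instantiation of the max-difference on the slide:

```python
def top_p_prune(pops_scores, p):
    """Keep only the best p POPS per variable (lossy pruning).

    pops_scores: dict var -> {parent_set (frozenset): local score},
    smaller is better (hypothetical input format).
    Returns (kept, delta), where delta[i] bounds the loss for X_i as
    (worst kept score) - (best overall score).
    """
    kept, delta = {}, {}
    for v, table in pops_scores.items():
        ranked = sorted(table.items(), key=lambda kv: kv[1])
        top = ranked[:p]
        kept[v] = dict(top)
        delta[v] = top[-1][1] - top[0][1]
    return kept, delta

def error_bound(learned_score, best_kept, delta):
    """epsilon = s(N) / sum_i (s_i(PA_i) - delta_i): how far the score
    learned under the top-p constraint can be from the optimal score.
    best_kept: dict var -> s_i(PA_i), the best retained local score."""
    return learned_score / sum(best_kept[v] - delta[v] for v in best_kept)
```

A ratio of 1 means the constrained search provably lost nothing; values above 1 quantify the worst-case gap from optimal.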