
FINDING OPTIMAL BAYESIAN NETWORK STRUCTURES WITH CONSTRAINTS LEARNED FROM DATA
Xiannian Fan, Brandon Malone and Changhe Yuan
Graduate Center/City University of New York
University of Helsinki
Several recent algorithms for learning Bayesian network structures first calculate potentially optimal parent sets (POPS) for all variables and then use various optimization techniques to
find a set of POPS, one for each variable, that constitutes an optimal network structure. This paper makes the observation that there is useful information implicit in the POPS. Specifically,
the POPS of a variable constrain its parent candidates. Moreover, the parent candidates of all variables together give a directed cyclic graph, which often decomposes into a set of strongly
connected components (SCCs). Each SCC corresponds to a smaller subproblem which can be solved independently of the others. Our results show that solving the constrained
subproblems significantly improves the efficiency and scalability of heuristic search-based structure learning algorithms. Further, we show that by considering only the top p POPS of each
variable, we quickly find provably very high quality networks for large datasets.
Bayesian Network Structure Learning with Graph Search
Graph Search Formulation
Bayesian Network Structure Learning
Representation. Joint probability distribution over a set of variables.
Structure. A DAG encoding conditional dependencies.
• Vertices correspond to variables.
• Edges indicate relationships among variables.
Parameters. Conditional probability distributions.
Learning. Find the network N with the minimal score for a complete dataset D (we often omit D for brevity):
Score(N) = Σ_i Score(Xi | PAi),
where Score(Xi | PAi) is called the local score of Xi given its parent set PAi.
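For a concrete illustration of this decomposable score, consider the sketch below. The variable names and score values are purely hypothetical; in practice the local scores would come from a scoring function such as MDL or BDeu.

```python
# Hypothetical local scores: local_score[X][parent set] -> score (lower is better).
local_score = {
    "A": {frozenset(): 10.2, frozenset({"B"}): 8.7},
    "B": {frozenset(): 9.5},
    "C": {frozenset(): 12.0, frozenset({"A", "B"}): 7.1},
}

def total_score(structure):
    """Score of a DAG given as {variable: chosen parent set}."""
    return sum(local_score[x][frozenset(pa)] for x, pa in structure.items())

# The DAG B -> A, {A, B} -> C scores 8.7 + 9.5 + 7.1 = 25.3.
print(total_score({"A": {"B"}, "B": set(), "C": {"A", "B"}}))
```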
Dynamic Programming
Intuition. Every DAG must have a leaf. Optimal networks for a single variable are trivial. Recursively add new leaves and select their optimal parents until all variables have been added. All orderings have to be considered.
(Illustration) Begin with a single variable. Pick one variable as a leaf and find its optimal parents. Pick another leaf and find its optimal parents from the current variables. Continue picking leaves and finding optimal parents.
Recurrences. Score(U) = min_{X ∈ U} [ Score(U \ {X}) + BestScore(X, U \ {X}) ], where BestScore(X, U \ {X}) = min_{PA ⊆ U \ {X}} Score(X | PA).
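A direct rendering of this recurrence, memoized over subsets (it still visits all 2^n subsets); the local_score table repeats the hypothetical one above:

```python
from functools import lru_cache

# Hypothetical local scores, as in the previous sketch (lower is better).
local_score = {
    "A": {frozenset(): 10.2, frozenset({"B"}): 8.7},
    "B": {frozenset(): 9.5},
    "C": {frozenset(): 12.0, frozenset({"A", "B"}): 7.1},
}

def best_score(x, candidates):
    # BestScore(X, U): best local score of X when its parents must come from U.
    return min(s for pa, s in local_score[x].items() if pa <= candidates)

@lru_cache(maxsize=None)
def score(u):
    # Score(U): score of an optimal subnetwork over the variable set U.
    if not u:
        return 0.0
    return min(score(u - {x}) + best_score(x, u - {x}) for x in u)

print(score(frozenset(local_score)))  # 25.3, matching the DAG scored above
```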
The dynamic programming can be visualized as a search through an order graph.

Potentially Optimal Parent Sets (POPS)
While the local scores are defined for all 2^(n-1) possible parent sets of each variable, this number is greatly reduced by pruning parent sets that are provably never optimal (Tian 2000; de Campos and Ji 2011). We refer to this pruning as lossless score pruning because it is guaranteed not to remove the optimal network from consideration. We refer to the parent sets remaining after pruning as potentially optimal parent sets (POPS), and denote the set of POPS of variable Xi by Pi.
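One simple consequence of lossless pruning is that a parent set dominated by one of its own subsets can never be optimal. The sketch below applies just that rule to already-computed scores (hypothetical values, lower is better); the rules of Tian (2000) and de Campos and Ji (2011) are stronger and also avoid scoring many sets in the first place.

```python
def pops(scored_parent_sets):
    """Keep only parent sets that are not dominated by one of their subsets.

    scored_parent_sets: {frozenset(parent set): score}, lower scores are better.
    """
    kept = {}
    for pa, s in scored_parent_sets.items():
        dominated = any(
            other < pa and s2 <= s          # a strict subset scores at least as well
            for other, s2 in scored_parent_sets.items()
        )
        if not dominated:
            kept[pa] = s
    return kept

# Hypothetical scores for a single variable X:
scores_x = {frozenset(): 12.0, frozenset({"A"}): 11.0, frozenset({"A", "B"}): 11.5}
print(pops(scores_x))  # {A, B} is pruned because its subset {A} scores better
```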
The Order Graph
Calculation. Score(U), the score of the best subnetwork over U.
Node. One node for each subset U, labeled with Score(U).
Successor. Add a variable X as a leaf to U.
Path. Each path from {} to V induces an ordering on the variables.
Size. 2^n nodes, one for each subset.
Figure 1: The order graph for 4 variables.
Admissible Heuristic Search
Formulation
Start Node. The top node, {}.
Goal Node. The bottom node, V.
Shortest Path. Corresponds to an optimal structure.
g(U). Score(U), the cost of the best subnetwork over U.
h(U). An estimate of the cost of the remaining variables, obtained by relaxing acyclicity.
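A compact sketch of this formulation (not the URLearning implementation): the heuristic below uses the basic relaxation in which every remaining variable takes its best parents from all other variables, ignoring cycles; the referenced papers develop much tighter heuristics.

```python
import heapq
from itertools import count

def a_star(local_score):
    """Shortest path from {} to V in the order graph; returns the optimal score.

    local_score: {variable: {frozenset(parent set): score}}, as in the sketches above.
    """
    V = frozenset(local_score)

    def best_score(x, candidates):
        return min(s for pa, s in local_score[x].items() if pa <= candidates)

    def h(u):
        # Relax acyclicity: every remaining variable picks its best parents
        # from all other variables, whether or not that creates cycles.
        return sum(best_score(x, V - {x}) for x in V - u)

    start = frozenset()
    g = {start: 0.0}
    tie = count()                                  # tie-breaker for the heap
    frontier = [(h(start), next(tie), start)]
    while frontier:
        f, _, u = heapq.heappop(frontier)
        if u == V:
            return g[u]
        if f > g[u] + h(u):                        # stale entry; a cheaper path to u exists
            continue
        for x in V - u:                            # successor: add X as a leaf of U
            v, cost = u | {x}, g[u] + best_score(x, u)
            if cost < g.get(v, float("inf")):
                g[v] = cost
                heapq.heappush(frontier, (cost + h(v), next(tie), v))
```

On the toy local_score table above, a_star(local_score) returns 25.3, matching the dynamic-programming result.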
Table 1: The POPS for a six-variable problem. The ith row shows Pi.
POPS Constraints Pruning
Motivation: Not all variables can possibly be ancestors of the others.
Previous technique: Consider all variable orderings anyway.
Shortcomings: An exponential increase in the number of paths in the search space.
Contribution: Construct the parent relation graph and find its SCCs; divide the problem into independent subproblems based on the SCCs.
We collect all of the potential parent–child relations from the POPS and obtain the resulting parent relation graph.
Figure 2: The parent relation graph.
We extract the strongly connected components (SCCs) of the parent relation graph. The SCCs form the component graph (Cormen et al. 2001), which gives ancestor constraints (we call these the POPS constraints). Each SCC corresponds to a smaller subproblem which can be solved independently of the others.
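A sketch of this construction on made-up POPS (the variable names and parent sets are hypothetical; the SCC computation simply leans on networkx):

```python
import networkx as nx

# Hypothetical POPS: for each variable, the parent sets that survived pruning.
pops = {
    "X1": [set(), {"X2"}],
    "X2": [set(), {"X1"}],
    "X3": [set(), {"X1"}, {"X1", "X2"}],
    "X4": [set(), {"X3"}],
}

# Add an edge P -> X whenever P appears in some POPS of X.
parent_relation = nx.DiGraph()
parent_relation.add_nodes_from(pops)
for x, parent_sets in pops.items():
    for pa in parent_sets:
        parent_relation.add_edges_from((p, x) for p in pa)

# Each SCC is a smaller structure-learning subproblem; the condensation of the
# graph (one node per SCC) gives the ancestor ordering between subproblems.
sccs = list(nx.strongly_connected_components(parent_relation))
print(sccs)  # e.g., [{'X1', 'X2'}, {'X3'}, {'X4'}]
```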
Recursive POPS Constraints Pruning
Selecting the parents for one of the variables has the effect of removing that variable from the parent relation graph. After removing it, the remaining variables may split into smaller SCCs, and the resulting smaller subproblems can be solved recursively. Figure 3(b) shows an example.
Figure 3: Order graphs after applying the POPS constraints. (a) The order graph after applying the POPS constraints once. (b) The order graph after recursively applying the POPS constraints on the second subproblem.
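Continuing the hypothetical sketch above, removing a variable and recomputing the SCCs of what remains is enough to expose the finer decomposition:

```python
def decompose_after_removing(graph, variable):
    # Once parents have been selected for `variable`, drop it from the parent
    # relation graph and recompute the SCCs of the remaining variables.
    remaining = graph.copy()
    remaining.remove_node(variable)
    return list(nx.strongly_connected_components(remaining))

print(decompose_after_removing(parent_relation, "X2"))  # e.g., [{'X1'}, {'X3'}, {'X4'}]
```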
Top-p POPS Constraints
Motivation: Despite POPS constraints pruning, some problems remain difficult.
Previous technique: AWA* has been used to find solutions with bounded optimality guarantees.
Shortcomings: AWA* does not give any explicit tradeoff between complexity and optimality.
Contribution: Lossy score pruning gives a more principled way to control the tradeoff; we create the parent relation graph using only the best p POPS of each variable and discard POPS not compatible with this graph.
Rather than constructing the parent relation graph by aggregating all of the POPS, we can instead create the graph by considering only the best p POPS for each variable.
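One way to realize this, sketched under the assumption that each variable's POPS list is already sorted best-first (the function name and the compatibility test below are illustrative, not the exact procedure used in the paper):

```python
import networkx as nx

def top_p_constraint(pops, p):
    """Build the parent relation graph from the best p POPS of each variable and
    keep only the POPS compatible with the resulting SCC ordering."""
    restricted = nx.DiGraph()
    restricted.add_nodes_from(pops)
    for x, parent_sets in pops.items():
        for pa in parent_sets[:p]:                 # pops[x] is sorted best-first
            restricted.add_edges_from((parent, x) for parent in pa)

    # Condensation: one node per SCC; its edges give the ancestor ordering.
    condensed = nx.condensation(restricted)
    scc_of = condensed.graph["mapping"]            # variable -> SCC id

    def allowed(parent, child):
        # A parent is compatible if it lies in the child's SCC or an ancestor SCC.
        return scc_of[parent] == scc_of[child] or scc_of[parent] in nx.ancestors(
            condensed, scc_of[child]
        )

    return {
        x: [pa for pa in parent_sets if all(allowed(parent, x) for parent in pa)]
        for x, parent_sets in pops.items()
    }
```

Applied to the hypothetical pops table from the SCC sketch with p = 1, for example, the restricted graph has no edges, so only the empty parent sets remain compatible; with p = 2 every POPS in that toy example survives.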
Experimental Results

Figure 4: Running time (in seconds) of A* under three constraint settings: No Constraint, POPS Constraint, and Recursive POPS Constraint, on datasets including Autos, Soybean, Alarm, and Barley.

Observation: The constraints seem to help benchmark network datasets more than UCI datasets.

Figure 5: The behavior of the Hailfinder dataset under the top-p POPS constraint as p varies.

Observation: Even for very small values of p, the top-p POPS constraint results in networks provably very close to the globally optimal solution.
Software: http://url.cs.qc.cuny.edu/software/URLearning.html
Selected References
1. de Campos, C. P., and Ji, Q. Efficient learning of Bayesian networks using constraints. JMLR, 2011.
2. Tian, J. A branch-and-bound algorithm for MDL learning Bayesian networks. In UAI '00, 2000.
3. Yuan, C., Malone, B., and Wu, X. Learning optimal Bayesian networks using A* search. In IJCAI '11, 2011.
4. Yuan, C., and Malone, B. An improved admissible heuristic for learning optimal Bayesian networks. In UAI '12, 2012.
Acknowledgements
This research was supported by NSF grants IIS-0953723 and IIS-1219114, and by the Academy of Finland (COIN, 251170).