Markov Properties of Directed
Acyclic Graphs
Peter Spirtes
Outline
Bayesian Networks
Simplicity and Bayesian Networks
Causal Interpretation of Bayesian Networks
Greedy Equivalence Search
Directed Acyclic Graph (DAG)
[Figure: DAG with edges ng → ns, bd → ns, and bd → bl]
ng – no gas
bl – battery light on
bd – battery dead
ns – no start
The vertices are random variables.
All edges are directed.
There are no directed cycles.
Definition of Conditional Independence
Let P(X,Y,Z) be a joint distribution over X, Y, Z.
X is independent of Y conditional on Z in P, written as I_P(X,Y|Z), if and only if P(X|Y,Z) = P(X|Z) whenever P(y,z) > 0.
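For discrete distributions the definition translates into a finite check. Below is a minimal sketch (the dict representation of the joint and the XOR example are illustrative assumptions, not from the slides) that tests I_P(X,Y|Z) by comparing P(x,y,z)·P(z) with P(x,z)·P(y,z):

```python
# A minimal sketch of checking I_P(X,Y|Z) in a discrete joint distribution,
# represented here as a hypothetical dict mapping (x, y, z) tuples to probabilities.
from itertools import product

def marginal(P, keep):
    """Sum the joint P over all positions not in `keep` (a tuple of indices)."""
    out = {}
    for assignment, p in P.items():
        key = tuple(assignment[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def independent(P, tol=1e-9):
    """Check I_P(X,Y|Z): P(x,y,z)*P(z) == P(x,z)*P(y,z) whenever P(z) > 0."""
    Pz = marginal(P, (2,))
    Pxz = marginal(P, (0, 2))
    Pyz = marginal(P, (1, 2))
    return all(
        abs(P.get((x, y, z), 0.0) * Pz[(z,)] - Pxz[(x, z)] * Pyz[(y, z)]) < tol
        for x, y, z in product((0, 1), repeat=3)
        if Pz[(z,)] > 0
    )

# Example: X and Y are independent coin flips, Z = X XOR Y.
# Then I_P(X,Y|∅) holds, but I_P(X,Y|Z) fails.
P = {(x, y, (x + y) % 2): 0.25 for x in (0, 1) for y in (0, 1)}
print(independent(P))  # False: conditioning on Z makes X and Y dependent
```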
Probabilistic Interpretation: Local Markov Condition
[Figure: DAG with edges ng → ns, bd → ns, and bd → bl]
Distribution P satisfies the Local Markov Condition for DAG G iff each vertex is independent in P of all vertices that are neither parents nor descendants, conditional on its parents.
I_P(ng, {bd,bl} | ∅)
I_P(bl, {ng,ns} | bd)
I_P(bd, ng | ∅)
I_P(ns, bl | {bd,ng})
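For a given DAG, the Local Markov Condition mechanically generates one statement per vertex. A minimal sketch (the child → parents dict encoding is an assumption for illustration) that recovers the four statements above:

```python
# A sketch that lists the Local Markov independencies of a DAG,
# using the battery example from the slides.
dag = {"ng": [], "bd": [], "bl": ["bd"], "ns": ["ng", "bd"]}  # child -> parents

def descendants(dag, v):
    """All vertices reachable from v by directed paths."""
    children = {u: [c for c, ps in dag.items() if u in ps] for u in dag}
    out, stack = set(), [v]
    while stack:
        for c in children[stack.pop()]:
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

for v, parents in dag.items():
    rest = set(dag) - {v} - set(parents) - descendants(dag, v)
    if rest:
        print(f"I_P({v}, {sorted(rest)} | {sorted(parents)})")
```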
Probabilistic Interpretation: I-Map
[Figure: DAG G with edges ng → ns, bd → ns, and bd → bl]
If distribution P satisfies the Local Markov Condition for G, G is an I-map (Independence-map) of P.
Graphical Entailment
[Figure: DAG with edges ng → ns, bd → ns, and bd → bl]
If G being an I-map of P entails that a conditional independence relation I holds in P, G entails I.
Examples: in every distribution P that G is an I-map of,
I_P(bd, ng | ∅)
I_P(bl, ng | {bd,ns})
If I is Not Entailed by G
[Figure: DAG with edges ng → ns, bd → ns, and bd → bl]
If conditional independence relation I is not entailed by G, then I may hold in some (but not every) distribution P that G is an I-map of.
Example: G is an I-map of some P such that ¬I_P(ns, bl | ∅), and of some P′ such that I_P′(ns, bl | ∅).
Factorization
[Figure: DAG with edges ng → ns, bd → ns, and bd → bl]
If G is an I-map of P, then
P(ng, bd, bl, ns) = P(ng) P(bd) P(bl | bd) P(ns | bd, ng)
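The factorization can be verified numerically. A sketch, under assumed (made-up) conditional probability values, that builds the joint from the factors and confirms one of the entailed independencies, I_P(bd, ng | ∅):

```python
# Build the joint distribution from the factorization and check that the
# marginal P(ng, bd) factors, i.e., I_P(bd, ng | ∅) holds.
from itertools import product

# Assumed parameter values, for illustration only.
p_ng = 0.1
p_bd = 0.2
p_bl = {0: 0.05, 1: 0.9}                      # P(bl = 1 | bd)
p_ns = {(0, 0): 0.01, (0, 1): 0.95,
        (1, 0): 0.9, (1, 1): 0.99}            # P(ns = 1 | ng, bd)

def bern(p, v):
    return p if v == 1 else 1 - p

joint = {}
for ng, bd, bl, ns in product((0, 1), repeat=4):
    joint[(ng, bd, bl, ns)] = (bern(p_ng, ng) * bern(p_bd, bd)
                               * bern(p_bl[bd], bl) * bern(p_ns[(ng, bd)], ns))

# Marginal P(ng, bd) should equal P(ng) * P(bd).
for ng, bd in product((0, 1), repeat=2):
    m = sum(joint[(ng, bd, bl, ns)] for bl, ns in product((0, 1), repeat=2))
    assert abs(m - bern(p_ng, ng) * bern(p_bd, bd)) < 1e-12
print("I_P(bd, ng | ∅) holds in the factorized joint")
```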
Example of Parametric Family
[Figure: DAG with edges ng → ns, bd → ns, and bd → bl]
For binary variables, the dimension of the set of distributions that G is an I-map of is 8:
1. P(ng = 0)
2. P(bd = 0)
3. P(bl = 0 | bd = 0)
4. P(bl = 0 | bd = 1)
5. P(ns = 0 | ng = 0, bd = 0)
6. P(ns = 0 | ng = 0, bd = 1)
7. P(ns = 0 | ng = 1, bd = 0)
8. P(ns = 0 | ng = 1, bd = 1)
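The count generalizes: each binary vertex contributes one free parameter per joint configuration of its parents. A one-line sketch:

```python
# Dimension of the binary DAG model: each vertex contributes 2^(#parents)
# free parameters. For the battery DAG this gives 1 + 1 + 2 + 4 = 8.
dag = {"ng": [], "bd": [], "bl": ["bd"], "ns": ["ng", "bd"]}  # child -> parents
dim = sum(2 ** len(parents) for parents in dag.values())
print(dim)  # 8
```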
Outline
Bayesian Networks
Simplicity and Bayesian Networks
Causal Interpretation of Bayesian Networks
Greedy Equivalence Search
Markov Equivalence
Two DAGs G1 and G2 are Markov equivalent when they contain the same variables, and for all disjoint X, Y, Z, X is entailed to be independent of Y conditional on Z in G1 if and only if X is entailed to be independent of Y conditional on Z in G2.
Markov Equivalence
Two DAGs over the same set of variables are Markov equivalent iff they have:
the same adjacencies
the same unshielded colliders (X → Y ← Z, and no X → Z or Z → X edge)
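The characterization is easy to check directly. A sketch (DAGs are encoded as child → parents dicts; g_prime reverses the bd – bl edge, which may or may not match the G′ drawn in the next slide's figure):

```python
# Two DAGs are Markov equivalent iff they share adjacencies and unshielded colliders.
def skeleton(dag):
    return {frozenset((c, p)) for c, ps in dag.items() for p in ps}

def unshielded_colliders(dag):
    adj = skeleton(dag)
    return {(frozenset((x, z)), y)
            for y, ps in dag.items()
            for x in ps for z in ps
            if x != z and frozenset((x, z)) not in adj}

def markov_equivalent(g1, g2):
    return (skeleton(g1) == skeleton(g2)
            and unshielded_colliders(g1) == unshielded_colliders(g2))

g = {"ng": [], "bd": [], "bl": ["bd"], "ns": ["ng", "bd"]}
g_prime = {"ng": [], "bd": ["bl"], "bl": [], "ns": ["ng", "bd"]}  # bd–bl reversed
print(markov_equivalent(g, g_prime))  # True: same skeleton, same collider at ns
```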
Markov Equivalence
[Figure: DAG G and a Markov equivalent DAG G′ over {ng, bl, bd, ns}]
Patterns
A pattern represents a Markov equivalence class of DAGs.
[Figure: DAGs G and G′ and the pattern Pattern(G) representing their equivalence class]
Patterns
The adjacencies in a pattern are the same as the
adjacencies in each DAG in the d-separation
equivalence class.
An edge is oriented as A → B in the pattern if it is
oriented as A → B in every DAG in the equivalence
class.
An edge is oriented as A – B in the pattern if the edge
is oriented as A → B in some DAGs in the
equivalence class, and as A ← B in other DAGs in
the equivalence class.
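These rules translate into a short sketch: an edge keeps its arrowhead in the pattern only if every DAG in the class orients it the same way. (The child → parents dict encoding is an assumption; within an equivalence class all DAGs share a skeleton, which the code relies on.)

```python
def edges(dag):
    """Directed edges of a DAG given as a dict child -> parents."""
    return {(p, c) for c, ps in dag.items() for p in ps}

def pattern(dags):
    edge_sets = [edges(d) for d in dags]
    out = set()
    for (a, b) in edge_sets[0]:
        if all((a, b) in e for e in edge_sets):
            out.add((a, b))             # oriented a -> b in every DAG
        else:
            out.add(frozenset((a, b)))  # undirected a - b in the pattern
    return out

g = {"ng": [], "bd": [], "bl": ["bd"], "ns": ["ng", "bd"]}
g_prime = {"ng": [], "bd": ["bl"], "bl": [], "ns": ["ng", "bd"]}
print(pattern([g, g_prime]))
# {('ng', 'ns'), ('bd', 'ns'), frozenset({'bd', 'bl'})}
```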
Patterns
All of the conditional independence relations entailed by a DAG G represented by a pattern G′ can be read off of G′.
So we can speak of a pattern being an I-map of P, by extension.
Faithfulness
P is faithful to G if:
Every conditional independence entailed by G is true in P (Markov)
Every conditional independence true in P is entailed by G
For every P, there is at most one, and possibly no, pattern that P is faithful to.
Unfaithfulness
[Figure: pattern G, the complete graph over {X, Y, Z}, annotated "+ I_P(X,Z | ∅)"; pattern G′, the collider X → Y ← Z, annotated "I_P(X,Z | ∅)"]
Both G and G′ are patterns that are I-maps of a P in which the only independence is I_P(X,Z | ∅), but P is faithful only to G′, not G.
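A standard way such a P arises is by parameter cancellation. A numpy sketch (the coefficients are illustrative assumptions) of a linear Gaussian model X → Y → Z with a direct X → Z edge whose coefficient exactly cancels the indirect path:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
a, b = 1.0, 2.0   # X -> Y and Y -> Z coefficients
c = -a * b        # X -> Z coefficient chosen to cancel the X -> Y -> Z path

x = rng.normal(size=n)
y = a * x + rng.normal(size=n)
z = b * y + c * x + rng.normal(size=n)

# cov(X, Z) = a*b + c = 0: X and Z are marginally independent even though
# the generating graph is a complete triangle.
print(np.corrcoef(x, z)[0, 1])          # ≈ 0 (unentailed independence)
print(np.corrcoef(x, z - b * y)[0, 1])  # ≈ -0.89: dependent once Y's path is removed
```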
Unfaithfulness
[Figure: chain X → Y → Z, annotated "I_P(X,Z | Y) + I_P(X,Z | ∅)"; collider X → Y ← Z, annotated "I_P(X,Z | Y) + I_P(X,Z | ∅)"]
If both I_P(X,Z | Y) and I_P(X,Z | ∅) hold, then P is not faithful to any DAG.
Violations of Faithfulness
If conditional independence can be expressed as a rational function of the parameters (e.g., in any exponential family, including Gaussian and multinomial), then violations of faithfulness have Lebesgue measure 0.
Minimality
G is Markov and Minimal for P if and only if:
Every conditional independence entailed by G is true in P (Markov)
No graph that entails a proper superset of the conditional independencies entailed by G is Markov to P (Minimal)
For every P, there is at least one, and possibly more than one, pattern Markov and minimal for P.
Two Markov Minimal Graphs
for P
[Figure: chain X → Y → Z and collider X → Y ← Z, each annotated "I_P(X,Z | Y) + I_P(X,Z | ∅)"]
Faithfulness and Minimality
If P is faithful to G, then G is the unique Markov Minimal pattern for P.
Lattice of I-maps
[Figure: lattice of patterns over {X, Y, Z}, ordered by entailment inclusion along one axis and by dimensionality along the other]
Outline
Bayesian Networks
Simplicity and Bayesian Networks
Causal Interpretation of Bayesian Networks
Greedy Equivalence Search
Causal Interpretation of DAG
There is an edge X → Y in G if and only if
there is some pair of experimental
manipulations of all of the variables in G
other than Y, that
differ only in what manipulation is performed
on X; and
make the probability of Y different.
Causal Sufficiency
A set S of variables is causally sufficient if
there are no variables not in S that are direct
causes of more than one variable in S.
Causal Sufficiency
[Figure: DAG with edges ng → ns, bd → ns, and bd → bl]
S = {ng, ns} is causally sufficient.
S = {ng, ns, bl} is not causally sufficient: bd is a direct cause of both bl and ns.
Causal Markov Assumption
In a population Pop with causally sufficient graph G and distribution P, each variable is independent of its non-descendants (non-effects) conditional on its parents (immediate causes) in P.
Equivalently: G is an I-map of P.
Causal Faithfulness Assumption
In a population Pop with causally sufficient
graph G and distribution P, I(X,Y|Z) in P
only if X is entailed to be independent of Y
conditional on Z by G.
Causal Faithfulness Assumption
As currently used, it is a substantive simplicity assumption, not a methodological simplicity assumption. It says to prefer the simplest explanation: if a more complicated explanation is true, the assumption leads us astray.
Causal Faithfulness Assumption
It serves two roles in practice:
Aiding model choice (in which case weaker
versions of the assumption will do)
Simplifying search
Causal Minimality Assumption
In a population Pop with causally sufficient
graph G and distribution P, G is a minimal
I-map of P.
Causal Minimality Assumption
This is a strictly weaker simplicity
assumption than Causal Faithfulness.
Given the manipulation interpretation of the
causal graph, and an everywhere positive
distribution, the Causal Minimality
Assumption is entailed by the Causal
Markov Assumption.
Outline
Bayesian Networks
Simplicity and Bayesian Networks
Causal Interpretation of Bayesian Networks
Greedy Equivalence Search
Greedy Equivalence Search
Inputs: Sample from a probability
distribution over a causally sufficient set of
variables from a population with causal
graph G.
Output: A pattern that represents G.
Assumptions
Causally sufficient set of variables
No feedback
The true causal structure can be represented
by a DAG.
Causal Markov Assumption
Causal Faithfulness Assumption
Consistent scores
If model M1 contains the true
distribution, while model M2 doesn’t,
then in the large sample limit M1 gets
the higher score.
If both M1 and M2 contain the true
distribution, and M1 has fewer
parameters than M2 does, then in the
large sample limit M1 gets the higher
score.
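As a concrete instance, the BIC of a discrete DAG model is the maximized log-likelihood minus (d/2)·log n, where d is the parameter count from the earlier dimension slide. A sketch under illustrative assumptions (data as a list of dicts mapping variable names to 0/1 values):

```python
# BIC for a binary DAG model: maximized multinomial log-likelihood
# minus (d/2) * log(n), with d = sum over vertices of 2^(#parents).
import math
from collections import Counter

def bic(dag, data):
    n = len(data)
    ll = 0.0
    for v, parents in dag.items():
        counts = Counter((tuple(row[p] for p in parents), row[v]) for row in data)
        parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
        for (pa, val), c in counts.items():
            ll += c * math.log(c / parent_counts[pa])  # MLE of P(v = val | pa)
    dim = sum(2 ** len(parents) for parents in dag.values())
    return ll - (dim / 2) * math.log(n)
```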
Consistent scores
So, assuming the Causal Markov and
Faithfulness conditions, a consistent
score assigns the true model (and its
Markov equivalent models) the highest
score in the limit.
Examples: the Bayesian Information Criterion (BIC), and the posterior under BDe or BGe priors.
Score Equivalence
Under BIC, BDe, and BGe, Markov equivalent DAGs are also score equivalent, i.e., they always receive the same score. (One could also use posterior probabilities under a wide variety of priors.)
This allows GES to search over the space of patterns, instead of the space of DAGs.
Two Phases of GES
Forward Greedy Search (FGS): Starting with any pattern Q, evaluate all patterns that are I-maps of Q with one more edge, and move to the one with the greatest increase in score. Iterate until a local maximum.
Backward Greedy Search (BGS): Starting with a pattern Q that contains the true distribution, evaluate all patterns with one fewer edge of which Q is an I-map, and move to the one with the greatest increase in score. Iterate until a local maximum.
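A heavily simplified sketch of the two phases, using the bic() sketch above. For brevity it greedily adds and deletes single edges over DAGs, not over patterns with Chickering's Insert/Delete operators; the GES of the slides works in pattern space, so this is an illustration of the two-phase structure, not the actual algorithm.

```python
# A simplified two-phase greedy search in the spirit of GES, over DAGs.
from itertools import permutations

def acyclic(dag):
    seen, done = set(), set()
    def visit(v):
        if v in done:
            return True
        if v in seen:
            return False  # back edge: cycle
        seen.add(v)
        ok = all(visit(p) for p in dag[v])
        done.add(v)
        return ok
    return all(visit(v) for v in dag)

def greedy_phase(dag, data, neighbors):
    while True:
        best, best_score = None, bic(dag, data)
        for cand in neighbors(dag):
            s = bic(cand, data)
            if s > best_score:
                best, best_score = cand, s
        if best is None:
            return dag  # local maximum reached
        dag = best

def additions(dag):
    """All DAGs obtained by adding one edge a -> b between non-adjacent vertices."""
    for a, b in permutations(dag, 2):
        if a not in dag[b] and b not in dag[a]:
            cand = {v: list(ps) for v, ps in dag.items()}
            cand[b].append(a)
            if acyclic(cand):
                yield cand

def deletions(dag):
    """All DAGs obtained by deleting one edge."""
    for b in dag:
        for a in dag[b]:
            cand = {v: list(ps) for v, ps in dag.items()}
            cand[b].remove(a)
            yield cand

def ges_like(variables, data):
    dag = {v: [] for v in variables}
    dag = greedy_phase(dag, data, additions)   # forward: add edges until local max
    return greedy_phase(dag, data, deletions)  # backward: delete edges until local max
```

On data sampled from the battery network, the forward phase typically over-adds edges and the backward phase prunes back toward a minimal Markov structure, mirroring the proof on the next slide.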
Asymptotic Optimality
In the large sample limit, GES always returns a
pattern that is minimal and Markov to P.
Proof. Because the score is consistent, the
forward phase continues until it reaches a
pattern G such that P is Markov to G. The
backward phase preserves Markov, and
continues until there is no pattern that is both
Markov to P and entails more independencies.
Asymptotic Optimality
If P is faithful to the causal pattern G, then GES returns G.
Proof Sketch: G is the unique pattern that is minimal and Markov to P. By the previous theorem, GES returns G.
When GES Fails to Find the Right
Causal Pattern
In the large sample limit, GES always finds a Markov Minimal pattern.
A Markov Minimal pattern may fail to be the true causal pattern if:
the Causal Minimality Assumption is violated; or
there are multiple Markov Minimal patterns, and GES outputs the wrong one.
One Markov Minimal Pattern for
P, but Not the Causal Pattern
[Figure: the true causal DAG, a complete triangle over {X, Y, Z}, annotated "+ I(X,Z | ∅)"; the collider X → Y ← Z, annotated "I(X,Z | ∅)"]
Two Markov Minimal Graphs
for P
[Figure: chain G: X → Y → Z and collider G′: X → Y ← Z, each annotated "I(X,Z | Y) + I(X,Z | ∅)"]
For some parametric families G and G’ have
the same score and the same dimension.
For other parametric families, G’ has lower
dimension and higher score than G.
Multiple Minimal I-maps?
Lower the standard of success: output all Markov Minimal patterns, or output any Markov Minimal pattern.
More data: experiments.
More background knowledge: time order, etc.
References
Chickering, D. M. (2002). Optimal Structure Identification with Greedy Search. Journal of Machine Learning Research 3, pp. 507–554.
Spirtes, P., Glymour, C., and Scheines, R. (2000). Causation, Prediction, and Search, 2nd edition. MIT Press, Cambridge, MA.