Weakening the Causal Faithfulness Assumption
Jiji Zhang
Lingnan University
Based on joint work with Peter Spirtes

Markov and Faithfulness Assumptions
Suppose the set of observed variables V is causally sufficient and its causal structure can be properly represented by a DAG over V.
A statement of conditional independence is said to be entailed by a DAG if it is entailed by the Markov property of the DAG.
Causal Markov Assumption: Every conditional independence statement entailed by the causal DAG over V is satisfied by the joint distribution over V.
Causal Faithfulness Assumption: Every conditional independence statement satisfied by the joint distribution over V is entailed by the causal DAG over V.

Simple Examples of Unfaithfulness
[Figure: three example DAGs over X, Y, Z. The first has edges labelled +, +, −; node value ranges [0, 1], [0, 1, 2], [0, 1] also appear.]
• Example 1. Entailed: none; Extra: X ⫫ Z.
• Example 2. Entailed: X ⫫ Z | Y; Extra: X ⫫ Z.
• Example 3. Entailed: X ⫫ Y; Extra: X ⫫ Z; Y ⫫ Z.

Testing Faithfulness?
• Without knowing the true causal DAG, the Faithfulness assumption is not fully testable.
• But given the Markov assumption, the Faithfulness assumption has a testable consequence: the distribution of V is (Markov and) faithful to some DAG.
• Unfaithfulness is in principle detectable if the distribution is not faithful to any DAG. It is undetectable if the distribution is faithful to some (false) DAG.

SGS Algorithm
S1. Form the complete undirected graph H over V.
S2. For each pair of variables X and Y, search for S ⊆ V\{X, Y} such that X and Y are independent conditional on S. Remove the edge between X and Y in H iff such a set is found.
S3. For each unshielded triple <X, Y, Z> (i.e., X and Y are adjacent, Y and Z are adjacent, but X and Z are not adjacent),
(1) If X and Z are not independent conditional on any subset of V\{X, Z} that contains Y, then mark the triple as a collider: X → Y ← Z.
(2) If X and Z are not independent conditional on any subset of V\{X, Z} that does not contain Y, then mark the triple as a non-collider (i.e., not X → Y ← Z).
S4. More orientation rules …

Justification of S2
S2. For each pair of variables X and Y, search for S ⊆ V\{X, Y} such that X and Y are independent conditional on S. Remove the edge between X and Y in H iff such a set is found.
• Inference of adjacencies is justified by the Markov assumption.
• Inference of non-adjacencies is justified by a consequence of the Faithfulness assumption.
Adjacency-Faithfulness: For every X, Y ∈ V, if X and Y are adjacent in the true causal DAG, then they are not independent conditional on any subset of V\{X, Y}.

Justification of S3
S3. For each unshielded triple <X, Y, Z> (i.e., X and Y are adjacent, Y and Z are adjacent, but X and Z are not adjacent),
(1) If X and Z are not independent conditional on any subset of V\{X, Z} that contains Y, then mark the triple as a collider: X → Y ← Z.
(2) If X and Z are not independent conditional on any subset of V\{X, Z} that does not contain Y, then mark the triple as a non-collider (i.e., not X → Y ← Z).
• (1) and (2) are both justified by the Markov assumption.
• What about the Faithfulness assumption?

Justification of S3 (cont'd)
• The antecedent of clause (1) and that of clause (2) do not exhaust the logical possibilities.
[Figure: example DAG over X, Y, Z.] Entailed: X ⫫ Z | Y; Extra: X ⫫ Z.
• The remaining logical possibility is ruled out by the following consequence of the Faithfulness assumption:
Orientation-Faithfulness: For every unshielded triple <X, Y, Z> in the true causal DAG,
– If X → Y ← Z, then X and Z are not independent conditional on any subset of V\{X, Z} that contains Y.
– Otherwise, X and Z are not independent conditional on any subset of V\{X, Z} that does not contain Y.

First Weakening of Faithfulness
• It follows that given the Markov and Adjacency-Faithfulness assumptions, violations of Orientation-Faithfulness are detectable, and there is a straightforward test:
S3*. For each unshielded triple <X, Y, Z>,
(1) If X and Z are not independent conditional on any subset of V\{X, Z} that contains Y, then mark the triple as a collider: X → Y ← Z.
(2) If X and Z are not independent conditional on any subset of V\{X, Z} that does not contain Y, then mark the triple as a non-collider (i.e., not X → Y ← Z).
(3) Otherwise, mark the triple as ambiguous or unfaithful.

Conservative SGS
• Replace S3 with S3*, and we get what we call the Conservative SGS (CSGS) algorithm.
• The CSGS algorithm is correct under the causal Markov and Adjacency-Faithfulness assumptions.
• When Orientation-Faithfulness happens to hold, the output of CSGS is the same as that of SGS.

E-pattern
• We call the (supposed) output of CSGS an extended pattern (e-pattern), which represents a set of patterns (each of which represents a Markov equivalence class of DAGs).
[Figure: an e-pattern over X, Y, Z, W, U and the patterns it represents.]

Violations of Adjacency-Faithfulness
• Some violations of Adjacency-Faithfulness are also detectable.
[Figure: two example DAGs. First: Extra: X ⫫ Z. Second: Extra: X ⫫ Z; Y ⫫ Z.]
• Compare to an undetectable violation:
[Figure: example DAG over X, Y, Z.] Extra: X ⫫ Z.

Triangle-Faithfulness
Triangle-Faithfulness: For every triangle <X, Y, Z> (i.e., they are adjacent to one another) in the true causal DAG,
(1) If Y is a non-collider on the path <X, Y, Z>, then X and Z are not independent conditional on any subset of V\{X, Z} that does not contain Y.
(2) If Y is a collider on the path <X, Y, Z>, then X and Z are not independent conditional on any subset of V\{X, Z} that contains Y.
[Figure: triangles over X, Y, Z illustrating clauses (1) and (2).]
• Triangle-Faithfulness is weaker than Adjacency-Faithfulness.

Further Weakening of Faithfulness
• Another weak condition entailed by the Adjacency-Faithfulness assumption is known as the causal Minimality condition: no proper subgraph of the true causal DAG satisfies the Markov condition with the joint distribution.
• Theorem: Given the causal Markov, Minimality and Triangle-Faithfulness assumptions, any violation of the Faithfulness assumption is detectable.
• What if we only make the Markov, Minimality and Triangle-Faithfulness assumptions?

CSGS under the Weaker Assumptions
• Given the Markov assumption, in the adjacency step S2, the inferred adjacencies are still correct.
• The inferred non-adjacencies, however, are not necessarily correct, since Adjacency-Faithfulness is not assumed. (Mark the non-adjacencies as 'apparent'.)
• Given the Markov and Triangle-Faithfulness assumptions, the orientation step S3* is still correct! (For an 'apparently' unshielded triple <X, Y, Z>, either it is really unshielded or it is a triangle. In the former case, S3* is correct by the Markov assumption; in the latter case, S3* is correct by the Triangle-Faithfulness assumption.)

Testing Adjacency-Faithfulness?
• Therefore, given only the Markov and Triangle-Faithfulness assumptions, CSGS is still correct, provided that we take the non-adjacencies in the output as uninformative.
• Can we somehow test Adjacency-Faithfulness, and confirm the non-adjacencies if the test is passed?
• What we have for now: take the output of CSGS and check the Markov condition for each pattern represented by the output. If every pattern satisfies the Markov condition, then the non-adjacencies are correct (assuming Minimality in addition to Markov and Triangle-Faithfulness).

Conjecture
• The condition should be improvable. In particular, it is sufficient but not necessary for Adjacency-Faithfulness.
• A necessary condition for Adjacency-Faithfulness is: some pattern represented by the CSGS output satisfies the Markov condition.
• Conjecture: The necessary condition is also sufficient. That is, assuming Markov, Minimality, and Triangle-Faithfulness, Adjacency-Faithfulness holds iff some pattern represented by the CSGS output satisfies the Markov condition.
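
The adjacency step S2 and the conservative orientation step S3* can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the names `subsets`, `make_oracle` and `csgs` are hypothetical, conditional independence is supplied by a hand-written oracle rather than a statistical test, and the conditioning sets for a triple <X, Y, Z> are taken to be the subsets of V\{X, Z}. The search is exhaustive, so it is only practical for a handful of variables.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = sorted(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def make_oracle(facts):
    """Build a hypothetical independence oracle from (pair, conditioning set) facts."""
    table = {(frozenset(pair), frozenset(s)) for pair, s in facts}

    def indep(x, y, s):
        return (frozenset((x, y)), frozenset(s)) in table

    return indep


def csgs(variables, indep):
    """Return (adjacencies, triple marks) following steps S2 and S3*."""
    variables = set(variables)

    # S2: keep the edge X - Y iff no subset of V\{X, Y} separates X and Y.
    adjacent = set()
    for x, y in combinations(sorted(variables), 2):
        if not any(indep(x, y, s) for s in subsets(variables - {x, y})):
            adjacent.add(frozenset((x, y)))

    # S3*: classify each (apparently) unshielded triple <X, Y, Z>.
    marks = {}
    for y in sorted(variables):
        nbrs = sorted(v for v in variables if frozenset((v, y)) in adjacent)
        for x, z in combinations(nbrs, 2):
            if frozenset((x, z)) in adjacent:
                continue  # shielded triple, nothing to classify
            seps = [s for s in subsets(variables - {x, z}) if indep(x, z, s)]
            sep_with_y = any(y in s for s in seps)
            sep_without_y = any(y not in s for s in seps)
            if not sep_with_y:
                marks[(x, y, z)] = "collider"        # clause (1)
            elif not sep_without_y:
                marks[(x, y, z)] = "non-collider"    # clause (2)
            else:
                marks[(x, y, z)] = "ambiguous"       # clause (3)
    return adjacent, marks


# Unfaithful case from the slides: X ⫫ Z | Y is entailed, X ⫫ Z is extra.
unfaithful = make_oracle([(("X", "Z"), {"Y"}), (("X", "Z"), set())])
print(csgs({"X", "Y", "Z"}, unfaithful)[1])  # the triple is marked "ambiguous"
```

With only the entailed independence X ⫫ Z | Y in the oracle, the same triple is marked as a non-collider; with only X ⫫ Z, it is marked as a collider. The removed X–Z edge is what the slides call an 'apparent' non-adjacency: the sketch does not check whether it is genuine.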

Still Further Weakening
• Let G and H be DAGs over V. H is an I-structure of G if every conditional independence entailed by G is also entailed by H. H is a proper I-structure of G if H is an I-structure of G but G is not an I-structure of H.
P-minimality assumption: No proper I-structure of the true causal DAG satisfies the Markov condition with the joint distribution.
• The causal Faithfulness assumption is equivalent to a conjunction of (1) the P-minimality assumption and (2) that the joint distribution is faithful to some DAG.

Still Further Weakening (cont'd)
• The causal Faithfulness assumption is often regarded as a methodological assumption of simplicity; but simplicity is only part of its content, namely the P-minimality assumption.
• Violations of the P-minimality assumption are not detectable; given the P-minimality assumption, violations of (the rest of) the Faithfulness assumption are detectable.
• The causal (SGS-)minimality assumption plus the Triangle-Faithfulness assumption entail the P-minimality assumption.
• Conversely, the P-minimality assumption entails the causal (SGS-)minimality assumption, but does not entail Triangle-Faithfulness.

Example
[Figure: example DAG over X, Y, Z, W.] Entailed: Y ⫫ W | {X, Z}; Extra: X ⫫ Z | {Y, W}.
• Triangle-Faithfulness is violated, but P-minimality is not.
• Assuming Markov and P-minimality, the violation of Triangle-Faithfulness is detectable.
[Figure: several alternative graphs over X, Y, Z, W.]

Example (cont'd)
[Figure: example DAG over X, Y, Z, W.] Entailed: Y ⫫ W | {X, Z}; Extra: X ⫫ Z | {Y, W}.
Output of CSGS: [Figure: graph over X, Y, Z, W.]
• I suspect that VCSGS (i.e., CSGS in which non-adjacencies are regarded as ambiguous, unless a check of the Markov condition in the end confirms them) is also correct under the causal Markov and P-minimality assumptions.

Further Questions
• Are there feasible versions (or approximations)?
• How about causal inference without causal sufficiency?

PC and CPC
• The PC algorithm is a much more efficient version of SGS.
• The key efficiency-improving ideas are also applicable to CSGS (when Adjacency-Faithfulness is assumed to hold). The resulting algorithm is called Conservative PC (CPC).
• Joe Ramsey did simulations and found that even when the Faithfulness assumption is true, CPC (1) produces significantly fewer errors than PC at moderate sample sizes; (2) outputs about as much correct information as PC does; and (3) runs almost as fast.

Almost Unfaithfulness
• The reason, we think, is that CPC guards not only against strict failures of Orientation-Faithfulness, but also against 'almost' violations.
• Intuitively, CPC suspends judgment when it detects "almost unfaithfulness" at a given sample size, just as it suspends judgment when it detects unfaithfulness in the large sample limit.

Uniform Consistency
• A negative result due to Robins et al. (2003) is that causal inference can only be pointwise consistent but not uniformly consistent under the causal Markov and Faithfulness assumptions.
• The basis of their proof is related to almost unfaithfulness.

Uniform Consistency of Inferring Causal Direction
• Suppose that we have the right adjacencies, and use procedures like PC to infer causal directions.
• Robins et al.'s results do not apply here.
• But we can still show that the PC procedure is not uniformly consistent in the inference of causal direction given the right adjacencies.

Uniform Consistency of Inferring Causal Direction (cont'd)
• Our argument is based on a theorem that no procedure can be uniformly consistent in, for example, deciding between an unshielded collider (X → Y ← Z) and an unshielded non-collider without sometimes suspending judgment.
• This argument does not apply to CPC, and we can show that CPC can be made uniformly consistent in its inference of causal directions (given the right adjacencies).

References
P. Spirtes and J. Zhang (forthcoming) "A uniformly consistent estimator of causal effects under the k-triangle-faithfulness assumption", Statistical Science.
J. Zhang (2013) "A comparison of three Occam's razors for Markovian causal models", British Journal for the Philosophy of Science 64(2): 423-448.
J. Zhang (2008) "Error probabilities for the inference of causal direction", Synthese 163: 409-418.
J. Zhang and P. Spirtes (2008) "Detection of unfaithfulness and robust causal inference", Minds and Machines 18(2): 239-271.
J. Ramsey, P. Spirtes, and J. Zhang (2006) "Adjacency-faithfulness and conservative causal inference", UAI proceedings: 401-408.