Detection of Unfaithfulness and Robust Causal Inference

Weakening the Causal Faithfulness Assumption
Jiji Zhang
Lingnan University
Based on joint work with Peter Spirtes
Markov and Faithfulness Assumptions
Suppose the set of observed variables V is causally sufficient and its
causal structure can be properly represented by a DAG over V.
A statement of conditional independence is said to be entailed by a
DAG if it is entailed by the Markov property of the DAG.
Causal Markov Assumption: Every conditional independence
statement entailed by the causal DAG over V is satisfied by the
joint distribution over V.
Causal Faithfulness Assumption: Every conditional independence
statement satisfied by the joint distribution over V is entailed by the
causal DAG over V.
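To make "entailed by a DAG" concrete, here is a minimal Python sketch of a d-separation check, one standard way of reading conditional independencies off the Markov property of a DAG. The representation (a dict `parents` mapping each variable to a list of its parents) and the function names are assumptions made for illustration; they are not part of the talk.

```python
from itertools import combinations

def ancestors(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen

def d_separated(parents, x, y, s):
    """True iff the DAG entails that x and y are independent given s.

    Assumes x and y are distinct and neither is in s.
    """
    s = set(s)
    keep = ancestors(parents, {x, y} | s)
    # Moralize the ancestral subgraph: connect each node to its parents
    # and connect every pair of parents that share a child.
    adj = {v: set() for v in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(ps, 2):
            adj[p].add(q)
            adj[q].add(p)
    # x and y are d-separated iff y is unreachable from x once s is removed.
    seen, stack = {x}, [x]
    while stack:
        v = stack.pop()
        for w in adj[v] - s:
            if w == y:
                return False
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return True

# Illustrative DAGs: a chain X -> Y -> Z and a collider X -> Z <- Y.
chain = {"X": [], "Y": ["X"], "Z": ["Y"]}
collider = {"X": [], "Y": [], "Z": ["X", "Y"]}
```

Under the Markov assumption, every statement for which `d_separated` returns True must hold in the distribution; Faithfulness adds that no other independence holds.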
Simple Examples of Unfaithfulness
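[Figure: three example DAGs over X, Y, Z. The edges of the first graph are labeled +, +, -; the value ranges [0, 1], [0, 1, 2], [0, 1] annotate variables in the figure.]
First graph. Entailed: none; Extra: X ⫫ Z.
Second graph. Entailed: X ⫫ Z | Y; Extra: X ⫫ Z.
Third graph. Entailed: X ⫫ Y; Extra: X ⫫ Z; Y ⫫ Z.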
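Two of these patterns can be reproduced with small, hypothetical parameterizations. The sketch below assumes (a) a linear model X = e1, Y = a*X + e2, Z = b*Y + c*X + e3 with independent unit-variance noise, where the two paths from X to Z cancel when c = -a*b, and (b) a collider in which X and Y are independent fair coins and Z = X xor Y. Both parameterizations and all function names are illustrative assumptions; the slides do not specify them.

```python
from fractions import Fraction
from itertools import product

def implied_cov_xz(a, b, c):
    """Cov(X, Z) in the assumed linear model
    X = e1, Y = a*X + e2, Z = b*Y + c*X + e3 (independent unit-variance noise).
    Substituting gives Z = (a*b + c)*X + b*e2 + e3, so Cov(X, Z) = a*b + c."""
    return a * b + c

def xor_joint():
    """Exact joint distribution of (X, Y, Z) with X, Y fair coins and Z = X xor Y."""
    return {(x, y, x ^ y): Fraction(1, 4) for x, y in product((0, 1), repeat=2)}

def marginally_independent(joint, i, j):
    """Exact check that coordinates i and j of the joint are independent."""
    def prob(pred):
        return sum(p for outcome, p in joint.items() if pred(outcome))
    return all(
        prob(lambda o: o[i] == u and o[j] == v)
        == prob(lambda o: o[i] == u) * prob(lambda o: o[j] == v)
        for u, v in product((0, 1), repeat=2)
    )

# Path cancellation: the DAG entails no independence, yet Cov(X, Z) = 0.
cancelled = implied_cov_xz(2, 3, -6)

# XOR collider: only the independence of X and Y is entailed,
# but X, Z and Y, Z are also pairwise independent.
joint = xor_joint()
extra_xz = marginally_independent(joint, 0, 2)
extra_yz = marginally_independent(joint, 1, 2)
```

In the linear case, zero covariance amounts to independence only under a further assumption such as joint Gaussianity.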
Testing Faithfulness?
• Without knowing the true causal DAG, the Faithfulness
assumption is not fully testable.
• But given the Markov assumption, the Faithfulness assumption
has a testable consequence: the distribution of V is (Markov
and) faithful to some DAG.
• Unfaithfulness is in principle detectable if the distribution is
not faithful to any DAG.
It is undetectable if the distribution is faithful to some (false)
DAG.
SGS Algorithm
S1. Form the complete undirected graph H over V.
S2. For each pair of variables X and Y, search for S ⊆ V\{X, Y} such
that X and Y are independent conditional on S. Remove the edge
between X and Y in H iff such a set is found.
S3. For each unshielded triple <X, Y, Z> (i.e., X and Y are adjacent, Y
and Z are adjacent, but X and Z are not adjacent),
(1) If X and Z are not independent conditional on any subset of V\{X,
Z} that contains Y, then mark the triple as a collider: X → Y ← Z.
(2) If X and Z are not independent conditional on any subset of V\{X,
Z} that does not contain Y, then mark the triple as a non-collider
(i.e., not X → Y ← Z).
S4. More orientation rules …
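Steps S1 to S3 can be sketched in Python as follows. This is a brute-force illustration under the assumption that a perfect independence oracle `indep(x, y, s)` is available; the oracle, the function names, and the data structures are hypothetical, and the further orientation rules of S4 are not covered.

```python
from itertools import combinations

def subsets(items):
    """Yield every subset of items as a frozenset."""
    items = sorted(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def sgs_skeleton_and_triples(variables, indep):
    """Run S1-S3 given an assumed oracle indep(x, y, s) -> bool."""
    variables = sorted(variables)
    # S1: start from the complete undirected graph.
    adjacent = {frozenset(pair) for pair in combinations(variables, 2)}
    # S2: remove the edge x - y iff some conditioning set separates x and y.
    for x, y in combinations(variables, 2):
        others = set(variables) - {x, y}
        if any(indep(x, y, s) for s in subsets(others)):
            adjacent.discard(frozenset((x, y)))
    # S3: classify each unshielded triple <x, y, z>.
    colliders, noncolliders = [], []
    for y in variables:
        nbrs = [v for v in variables if frozenset((v, y)) in adjacent]
        for x, z in combinations(nbrs, 2):
            if frozenset((x, z)) in adjacent:
                continue  # shielded triple, skip
            candidates = list(subsets(set(variables) - {x, z}))
            if not any(indep(x, z, s) for s in candidates if y in s):
                colliders.append((x, y, z))
            elif not any(indep(x, z, s) for s in candidates if y not in s):
                noncolliders.append((x, y, z))
    return adjacent, colliders, noncolliders

# Oracle for a distribution faithful to X -> Y <- Z:
# the only independence is between X and Z given the empty set.
def collider_oracle(x, y, s):
    return {x, y} == {"X", "Z"} and len(s) == 0

skeleton, colliders, noncolliders = sgs_skeleton_and_triples(
    ["X", "Y", "Z"], collider_oracle
)
```

The exhaustive subset search is exponential in the number of variables, so the sketch is only practical for very small examples.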
Justification of S2
S2. For each pair of variables X and Y, search for S ⊆ V\{X, Y} such
that X and Y are independent conditional on S. Remove the edge
between X and Y in H iff such a set is found.
• Inference of adjacencies is justified by the Markov assumption.
• Inference of non-adjacencies is justified by a consequence of the
Faithfulness assumption.
Adjacency-Faithfulness: For every X, Y ∈ V, if X and Y are adjacent in
the true causal DAG, then they are not independent conditional on
any subset of V\{X, Y}.
Justification of S3
S3. For each unshielded triple <X, Y, Z> (i.e., X and Y are adjacent, Y
and Z are adjacent, but X and Z are not adjacent),
(1) If X and Z are not independent conditional on any subset of V\{X,
Z} that contains Y, then mark the triple as a collider: X → Y ← Z.
(2) If X and Z are not independent conditional on any subset of V\{X,
Z} that does not contain Y, then mark the triple as a non-collider
(i.e., not X → Y ← Z).
• (1) and (2) are both justified by the Markov assumption.
• What about the Faithfulness assumption?
Justification of S3 (cont’d)
• The antecedent of clause (1) and that of clause (2) do not exhaust
the logical possibilities.
[Figure: example DAG over X, Y, Z.]
Entailed: X ⫫ Z | Y; Extra: X ⫫ Z.
• The remaining logical possibility is ruled out by the following
consequence of the Faithfulness assumption:
Orientation-Faithfulness: For every unshielded triple <X, Y, Z> in
the true causal DAG,
– If X → Y ← Z, then X and Z are not independent conditional on
any subset of V\{X, Z} that contains Y.
– Otherwise, X and Z are not independent conditional on any
subset of V\{X, Z} that does not contain Y.
First Weakening of Faithfulness
• It follows that given the Markov and Adjacency-Faithfulness
assumptions, violations of Orientation-Faithfulness are detectable,
and there is a straightforward test:
S3*. For each unshielded triple <X, Y, Z>,
(1) If X and Z are not independent conditional on any subset of V\{X,
Z} that contains Y, then mark the triple as a collider: X → Y ← Z.
(2) If X and Z are not independent conditional on any subset of V\{X,
Z} that does not contain Y, then mark the triple as a non-collider
(i.e., not X → Y ← Z).
(3) Otherwise, mark the triple as ambiguous or unfaithful.
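The three-way decision in S3* can be sketched as a single function. As before, `indep(x, z, s)` is an assumed perfect independence oracle, and the function name and return labels are hypothetical choices for illustration.

```python
from itertools import combinations

def classify_triple(x, y, z, variables, indep):
    """Classify an unshielded triple <x, y, z> following S3*.

    indep(x, z, s) is an assumed oracle for conditional independence.
    """
    others = sorted(set(variables) - {x, z})
    candidates = [
        frozenset(c)
        for r in range(len(others) + 1)
        for c in combinations(others, r)
    ]
    # Clause (1): dependent given every candidate set that contains y.
    if not any(indep(x, z, s) for s in candidates if y in s):
        return "collider"
    # Clause (2): dependent given every candidate set that omits y.
    if not any(indep(x, z, s) for s in candidates if y not in s):
        return "non-collider"
    # Clause (3): independence shows up both with and without y.
    return "ambiguous"

# An unfaithful chain: X and Z are independent given Y (entailed)
# and also independent given the empty set (extra).
def unfaithful_chain_oracle(x, z, s):
    return {x, z} == {"X", "Z"}

label = classify_triple("X", "Y", "Z", ["X", "Y", "Z"], unfaithful_chain_oracle)
```

For a genuinely unshielded triple the first two clauses cannot both apply, because S2 removed the edge between X and Z only after finding some separating set, which either contains Y or does not.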
Conservative SGS
• Replace S3 with S3*, and we get what we call the Conservative
SGS (CSGS) algorithm.
• The CSGS algorithm is correct under the causal Markov and
Adjacency-Faithfulness assumptions.
• When Orientation-Faithfulness happens to hold, the output of
CSGS is the same as that of SGS.
E-pattern
• We call the (supposed) output of CSGS an extended pattern
(e-pattern), which represents a set of patterns (each of which
represents a Markov equivalence class of DAGs).
[Figure: four graphs over X, Y, Z, W, U illustrating an e-pattern and patterns it represents.]
Violations of Adjacency-Faithfulness
• Some violations of Adjacency-Faithfulness are also detectable.
[Figure: two example graphs, one over X, Y, Z and one over X, Y, Z, W.]
Extra: X ⫫ Z.
Extra: X ⫫ Z; Y ⫫ Z.
• Compare to an undetectable violation:
[Figure: example graph over X, Y, Z.]
Extra: X ⫫ Z.
Triangle-Faithfulness
Triangle-Faithfulness: For every triangle <X, Y, Z> (i.e., X, Y and Z are
adjacent to one another) in the true causal DAG,
(1) If Y is a non-collider on the path <X, Y, Z>, then X and Z
are not independent conditional on any subset of V\{X, Z} that
does not contain Y.
[Figure: two triangles over X, Y, Z.]
(2) If Y is a collider on the path <X, Y, Z>, then X and Z are not
independent conditional on any subset of V\{X, Z} that
contains Y.
[Figure: one triangle over X, Y, Z.]
• Triangle-Faithfulness is weaker than Adjacency-Faithfulness.
Further Weakening of Faithfulness
• Another weak condition entailed by the Adjacency-Faithfulness
assumption is known as the causal Minimality condition: no
proper subgraph of the true causal DAG satisfies the Markov
condition with the joint distribution.
• Theorem: Given the causal Markov, Minimality and
Triangle-Faithfulness assumptions, any violation of the Faithfulness
assumption is detectable.
• What if we only make the Markov, Minimality and
Triangle-Faithfulness assumptions?
CSGS under the Weaker Assumptions
• Given the Markov assumption, in the adjacency step S2, the
inferred adjacencies are still correct.
• The inferred non-adjacencies, however, are not necessarily
correct, since Adjacency-Faithfulness is not assumed. (Mark the
non-adjacencies as ‘apparent’).
• Given the Markov and Triangle-Faithfulness assumptions, the
orientation step S3* is still correct!
(For an ‘apparently’ unshielded triple <X, Y, Z>, either it is really
unshielded or it is a triangle. In the former case, S3* is correct by
the Markov assumption; in the latter case, S3* is correct by the
Triangle-Faithfulness assumption.)
Testing Adjacency-Faithfulness?
• Therefore, given only the Markov and Triangle-Faithfulness
assumptions, CSGS is still correct, provided that we take the
non-adjacencies in the output as uninformative.
• Can we somehow test Adjacency-Faithfulness and confirm
non-adjacencies if the test returns affirmative?
• What we have for now: take the output of CSGS and check the
Markov condition for each pattern represented by the output. If
every pattern satisfies the Markov condition, then the
non-adjacencies are correct (assuming Minimality in addition to
Markov and Triangle-Faithfulness).
Conjecture
• The condition just stated (every pattern represented by the CSGS
output satisfies the Markov condition) should be improvable. In
particular, it is sufficient but not necessary for Adjacency-Faithfulness.
• A necessary condition for Adjacency-Faithfulness is: some pattern
represented by the CSGS output satisfies the Markov condition.
• Conjecture: The necessary condition is also sufficient.
That is, assuming Markov, Minimality, and Triangle-Faithfulness,
Adjacency-Faithfulness holds iff some pattern represented by the
CSGS output satisfies the Markov condition.
Still Further Weakening
• Let G and H be DAGs over V. H is an I-structure of G if every
conditional independence entailed by G is also entailed by H. H is
a proper I-structure of G if H is an I-structure of G but G is not an
I-structure of H.
P-minimality assumption: No proper I-structure of the true causal
DAG satisfies the Markov condition with the joint distribution.
• The causal Faithfulness assumption is equivalent to a conjunction
of (1) the P-minimality assumption and (2) that the joint
distribution is faithful to some DAG.
Still Further Weakening (cont’d)
• The causal Faithfulness assumption is often regarded as a
methodological assumption of simplicity; that is only part of its
content, namely, the P-minimality assumption.
• Violations of the P-minimality assumption are not detectable.
Given the P-minimality assumption, violations of (the rest of) the
Faithfulness assumption are detectable.
• The causal (SGS-)minimality assumption plus the
Triangle-Faithfulness assumption entail the P-minimality assumption.
• Conversely, the P-minimality assumption entails the causal
(SGS-)minimality assumption, but does not entail Triangle-Faithfulness.
Example
[Figure: example DAG over X, Y, Z, W.]
Entailed: Y ⫫ W | {X, Z}; Extra: X ⫫ Z | {Y, W}.
• Triangle-Faithfulness is violated, but P-minimality is not.
• Assuming Markov and P-minimality, the violation of Triangle-Faithfulness is
detectable.
[Figure: six graphs over X, Y, Z, W.]
Example (cont’d)
[Figure: example DAG over X, Y, Z, W.]
Entailed: Y ⫫ W | {X, Z}; Extra: X ⫫ Z | {Y, W}.
Output of CSGS:
[Figure: graph over X, Y, Z, W.]
• I suspect that VCSGS (i.e., CSGS in which non-adjacencies are regarded as
ambiguous, unless a check of Markov condition in the end confirms them) is
also correct under the causal Markov and P-minimality assumptions.
Further Questions
• Are there feasible versions (or approximations)?
• How about causal inference without causal sufficiency?
PC and CPC
• The PC algorithm is a much more efficient version of SGS.
• The key efficiency-improving ideas are also applicable to
CSGS (when Adjacency-Faithfulness is assumed to hold). The
resulting algorithm was called Conservative PC (CPC).
• Joe Ramsey did simulations and found that even when the
Faithfulness assumption is true, (1) CPC produces
significantly fewer errors than PC at moderate sample sizes;
(2) outputs about as much correct information as PC does; and
(3) runs almost as fast.
Almost Unfaithfulness
• The reason, we think, is that CPC not only guards against strict
failure of orientation-faithfulness, but also guards against
almost violations.
• Intuitively, CPC suspends judgments when it detects “almost
unfaithfulness” at a given sample size, just as it suspends
judgments when it detects unfaithfulness in the large sample
limit.
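A rough numerical illustration of "almost unfaithfulness at a given sample size" follows. It assumes a linear model X = e1, Y = a*X + e2, Z = b*Y + c*X + e3 with independent unit-variance noise, and uses the Fisher z-transform as a stand-in independence test. Neither the model nor the test is taken from the slides, and the check compares the approximate typical value of the statistic with a critical value rather than simulating data.

```python
import math

def corr_xz(a, b, c):
    """Population correlation of X and Z in the assumed linear model.

    Z = (a*b + c)*X + b*e2 + e3, so Cov(X, Z) = a*b + c and
    Var(Z) = (a*b + c)**2 + b**2 + 1, with Var(X) = 1.
    """
    cov = a * b + c
    var_z = (a * b + c) ** 2 + b ** 2 + 1
    return cov / math.sqrt(var_z)

def typical_fisher_z(rho, n):
    """Approximate typical value of sqrt(n - 3) * atanh(r) when the
    true correlation is rho and the sample size is n."""
    return math.sqrt(n - 3) * math.atanh(rho)

def looks_independent(rho, n, critical=1.96):
    """True if a dependence of size rho would typically fall below the
    two-sided 5% critical value at sample size n."""
    return abs(typical_fisher_z(rho, n)) < critical

# Near-cancellation: c is close to, but not exactly, -a*b.
rho = corr_xz(1.0, 1.0, -0.95)
at_moderate_n = looks_independent(rho, 500)
at_large_n = looks_independent(rho, 100_000)
```

With these assumed coefficients the dependence between X and Z is real but would typically go undetected at n = 500, while at n = 100,000 it would be detected.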
Uniform Consistency
• A negative result due to Robins et al. (2003) is that causal
inference can only be pointwise consistent but not uniformly
consistent under the Causal Markov and Faithfulness
assumptions.
• The basis of their proof is related to almost unfaithfulness.
Uniform Consistency of Inferring Causal Direction
• Suppose that we have the right adjacencies, and use
procedures like PC to infer causal directions.
• Robins et al.’s results do not apply here.
• But we can still show that the PC procedure is not uniformly
consistent in the inference of causal direction given the right
adjacencies.
Uniform Consistency of Inferring Causal Direction (cont’d)
• Our argument is based on a theorem that no procedure can be
uniformly consistent in, for example, deciding between an
unshielded collider (X → Y ← Z) and an unshielded non-collider
without sometimes suspending judgments.
• This argument does not apply to CPC, and we can show that
CPC can be made uniformly consistent in its inference of causal
directions (given the right adjacencies).
References
P. Spirtes and J. Zhang (forthcoming) “A uniformly consistent estimator of causal
effects under the k-triangle-faithfulness assumption”, Statistical Science.
J. Zhang (2013) “A comparison of three Occam’s razors for Markovian causal
models”, British Journal for the Philosophy of Science, 64(2): 423-448.
J. Zhang (2008) “Error probabilities for the inference of causal direction”,
Synthese 163: 409-418.
J. Zhang and P. Spirtes (2008) “Detection of unfaithfulness and robust causal
inference”, Minds and Machines 18(2): 239-271.
J. Ramsey, P. Spirtes, and J. Zhang (2006) “Adjacency-faithfulness and
conservative causal inference”, UAI proceedings: 401-408.