
Advertisement
Peter Spirtes, Jiji Zhang
1
 Faithfulness comes in several flavors and is a kind of principle that selects simpler (in a certain sense) models over more complicated ones.
 We show how to weaken the assumption of standard
faithfulness so that it needs to be applied in fewer
circumstances.
 We show how to weaken the assumption of strong (ε-)faithfulness so that it does not prohibit the existence of weak edges.
 We show how to modify the causal search algorithms so
that they make fewer mind changes as the sample size
grows.
2
[Figure: a sequence of graphs over X, Y, Z, W.]
 True Graph
 W = aZ + ε_W
 Z = bX + cY + ε_Z
 X = ε_X
 Y = ε_Y
I_P(W,X|Z) = 0
I_P(W,Y|Z) = 0
I_P(X,Y|∅) = 0
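To make the entailed independence facts concrete, here is a minimal sketch in Python (the coefficients a = 1.0, b = 0.8, c = 0.6 and the standard-normal errors are illustrative assumptions, not values from the slides). It simulates this model and checks that the sample partial correlations corresponding to I_P(W,X|Z), I_P(W,Y|Z), and I_P(X,Y|∅) are near zero, while ρ(X,Y|Z) is not, since conditioning on the collider Z induces dependence.

```python
# Minimal sketch of the model on this slide; coefficients and noise are
# illustrative assumptions, not taken from the slides.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
a, b, c = 1.0, 0.8, 0.6

X = rng.standard_normal(n)                    # X = eps_X
Y = rng.standard_normal(n)                    # Y = eps_Y
Z = b * X + c * Y + rng.standard_normal(n)    # Z = bX + cY + eps_Z
W = a * Z + rng.standard_normal(n)            # W = aZ + eps_W

def partial_corr(u, v, conds):
    """Correlation of u and v after regressing both on the variables in conds."""
    if conds:
        C = np.column_stack(conds + [np.ones_like(u)])
        u = u - C @ np.linalg.lstsq(C, u, rcond=None)[0]
        v = v - C @ np.linalg.lstsq(C, v, rcond=None)[0]
    return np.corrcoef(u, v)[0, 1]

print("rho(W,X|Z) =", partial_corr(W, X, [Z]))   # ~ 0  (I_P(W,X|Z))
print("rho(W,Y|Z) =", partial_corr(W, Y, [Z]))   # ~ 0  (I_P(W,Y|Z))
print("rho(X,Y)   =", partial_corr(X, Y, []))    # ~ 0  (I_P(X,Y|empty set))
print("rho(X,Y|Z) =", partial_corr(X, Y, [Z]))   # nonzero: Z is a collider
```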
3
 S1. Form the complete undirected graph H on the given set
of variables V.
 S2. For each pair of variables X and Y in V, search for a
subset S of V\{X, Y} such that X and Y are independent
conditional on S. Remove the edge between X and Y in H
iff such a set is found.
 S3. Let K be the graph resulting from S2. For each unshielded triple <X, Y, Z> (i.e., X and Y are adjacent, Y and Z are adjacent, but X and Z are not adjacent), if X and Z are independent conditional on some subset of V\{X, Z} that does not contain Y, then orient the triple as a collider: X → Y ← Z.
 S4. Execute the entailed orientation rules.
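The following is a schematic sketch of steps S1–S3, not the authors' implementation: it assumes a user-supplied conditional-independence oracle or test `indep(x, y, S)`, it reuses the separating set found in S2 as the witness for the collider check in S3 (a common shortcut), and the entailed orientation rules of S4 are omitted.

```python
# Schematic sketch of SGS steps S1-S3; `indep(x, y, S)` is an assumed
# conditional-independence oracle or statistical test supplied by the user.
from itertools import combinations

def sgs_skeleton_and_colliders(variables, indep):
    # S1: start from the complete undirected graph (edges as frozensets).
    adj = {frozenset(pair) for pair in combinations(variables, 2)}
    sepset = {}

    # S2: remove X - Y iff some S in V \ {X, Y} makes X and Y independent.
    for x, y in combinations(variables, 2):
        rest = [v for v in variables if v not in (x, y)]
        for size in range(len(rest) + 1):
            found = [set(S) for S in combinations(rest, size) if indep(x, y, set(S))]
            if found:
                adj.discard(frozenset((x, y)))
                sepset[frozenset((x, y))] = found[0]
                break

    # S3: for each unshielded triple X - Y - Z, orient X -> Y <- Z when Y is
    # not in the separating set found for X and Z (used here as the witness).
    colliders = []
    for x, z in combinations(variables, 2):
        if frozenset((x, z)) in adj:
            continue                      # X and Z adjacent: not unshielded
        for y in variables:
            if (y not in (x, z)
                    and frozenset((x, y)) in adj
                    and frozenset((y, z)) in adj
                    and y not in sepset.get(frozenset((x, z)), set())):
                colliders.append((x, y, z))
    return adj, colliders
```

For example, calling `sgs_skeleton_and_colliders(["X", "Y", "Z", "W"], oracle)` with a hypothetical d-separation oracle for the true graph of the earlier example (X → Z ← Y, Z → W) would remove the X–Y, X–W, and Y–W edges and report the single collider X → Z ← Y.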
4
 Causal Markov Assumption: For a set of variables for
which there are no unmeasured common causes, each
variable is independent of its non-effects conditional
on its direct causes.
 Non-obvious equivalent formulation: If I_G(X,Y|Z) holds in a causal DAG G (i.e., Z d-separates X and Y) with no unmeasured common causes, then I_P(X,Y|Z) = 0.
 If I_P(X,Y|Z) = 0 then I_G(X,Y|Z) in causal DAG G.
 Converse of the Causal Markov Assumption (this is the Causal Faithfulness Assumption).
 If I_P(X,Y|Z) is a rational function of the parameters, then violations of faithfulness have Lebesgue measure 0.
5
 Reduction of Underdetermination
 If I(A,B|∅) then prefer A → C ← B to A → C → B
 Computational Efficiency
 If A – C – B and I(A,B|∅) then we don't need to check I(A,B|C)
 Statistical Efficiency
 The Markov equivalence class can be found without testing independence conditional on any set of size greater than the maximum degree of any variable in the true causal graph.
6
 If causal sufficiency and the Causal Markov and Causal Faithfulness Assumptions hold, then there exist pointwise consistent estimators of the Markov equivalence class:
 SGS
 PC
 GES (Gaussian, multinomial)
 If we assume only the Causal Markov Assumption and causal sufficiency, there are no pointwise consistent estimators of the Markov equivalence class:
 Gaussian
 Multinomial
 Unrestricted
7
 Even if causal sufficiency and the Causal Markov and Causal Faithfulness Assumptions hold, there is no uniformly consistent estimator of the Markov equivalence class:
 Gaussian
 Multinomial
 Unrestricted
8
 (A4: ε-faithfulness) The partial correlations between X(i) and X(j) given {X(r); r ∈ k} for some set k ⊆ {1,…,p_n}\{i,j} are denoted by ρ_{n;i,j|k}. Their absolute values are bounded from below and above:

inf{ |ρ_{n;i,j|k}| : i, j, k with ρ_{n;i,j|k} ≠ 0 } ≥ c_n,  where c_n^{-1} = O(n^d) for some 0 < d < b/2,

sup_{n;i,j,k} |ρ_{n;i,j|k}| ≤ M < 1,

where 0 < b ≤ 1 is as in (A3).
9
Assume (A1)-(A4). Denote by Ĝ_{skel,n}(α_n) the estimate from the (first part of the) SGS algorithm and by G_{skel,n} the true skeleton of the DAG G_n. Then there exists α_n → 0 (n → ∞) such that

P[Ĝ_{skel,n}(α_n) = G_{skel,n}] = 1 − O(exp(−C n^{1−2d})) → 1 (n → ∞) for some 0 < C < ∞,

where d > 0 is as in (A4).
10
 Uhler et al.: (A4) tends to be violated fairly often if the parameter values are assigned randomly and ε is not very small.
 There are two ways to get very small partial correlations – almost-cancellations and very weak edges (see the sketch below).
 (A4) forbids both – it entails that there are no very
weak edges.
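As a concrete illustration (the coefficients below are assumptions chosen for this sketch, not Uhler et al.'s examples), the following computes population correlations for two linear Gaussian models: one where ρ(X,Y) is tiny because the X → Y edge itself is weak, and one where ρ(X,Z) is tiny because a strong direct X → Z effect is almost cancelled by the indirect path X → Y → Z.

```python
# Two ways to get a near-zero population correlation in a linear Gaussian
# SEM; all coefficients are illustrative assumptions.
import numpy as np

def sem_covariance(B, noise_var):
    """Covariance of X = B X + eps: Sigma = (I - B)^{-1} Omega (I - B)^{-T}."""
    n = B.shape[0]
    A = np.linalg.inv(np.eye(n) - B)
    return A @ np.diag(noise_var) @ A.T

def corr(Sigma, i, j):
    return Sigma[i, j] / np.sqrt(Sigma[i, i] * Sigma[j, j])

# (1) A genuinely weak edge: X -> Y with coefficient 0.01.
B_weak = np.array([[0.0,  0.0],
                   [0.01, 0.0]])
print("weak edge:           rho(X,Y) =", corr(sem_covariance(B_weak, np.ones(2)), 0, 1))

# (2) Almost-cancellation: X -> Y (0.8), Y -> Z (0.9), and a strong direct
# edge X -> Z (-0.71), so the total effect of X on Z is -0.71 + 0.72 = 0.01.
B_cancel = np.array([[ 0.0,  0.0, 0.0],
                     [ 0.8,  0.0, 0.0],
                     [-0.71, 0.9, 0.0]])
print("almost-cancellation: rho(X,Z) =", corr(sem_covariance(B_cancel, np.ones(3)), 0, 2))
```

Both marginal correlations come out near zero, which is exactly the kind of value that (A4) rules out; roughly speaking, the k-Triangle-Faithfulness Assumption introduced later forbids the cancellation case while tolerating the genuinely weak edge.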
11
[Figure: four graphs over X, Y, Z, W.]
12
[Figure: outputs of the search on the X, Y, Z, W example at small, medium−, medium+, and large sample sizes, annotated with the independence facts I_P(W,{X,Y}|{Z}), I_P(W,{X,Y}|∅), I_P(X,Y|∅), and I_P(W,Z|∅).]
13
[Figure: the same four sample-size panels as the previous slide.]
14
[Figure: true graph X→Y→Z→W; outputs X–Y–Z–W and X–Y–Z→W at small and large sample sizes, annotated with the independence facts I_P(X,Z|Y), I_P(Y,W|{X,Z}), and I_P(X,W|∅).]
15
[Figure: the same panels as the previous slide.]
16
[Figure: a graph over X, Y, Z, W.]
17
[Figure: the four sample-size panels from slide 13, repeated.]
18
 S3*. Let K be the undirected graph resulting from S2. For each unshielded triple <X, Y, Z> (a sketch of this classification follows below),
 If X and Z are not independent conditional on any subset of V\{X, Z} that contains Y, then orient the triple as a collider: X → Y ← Z.
 If X and Z are not independent conditional on any subset of V\{X, Z} that does not contain Y, then mark the triple as a non-collider.
 Otherwise, mark the triple as ambiguous (or unfaithful).
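A schematic sketch of this classification for a single unshielded triple, assuming an independence oracle or test `indep(x, z, S)`; it exhaustively checks subsets of V \ {X, Z} and is meant only to make the three cases explicit.

```python
# Classify one unshielded triple <x, y, z> per S3*; `indep` is an assumed
# conditional-independence oracle or test over the variable set `variables`.
from itertools import combinations

def classify_triple(x, y, z, variables, indep):
    rest = [v for v in variables if v not in (x, z)]
    sep_with_y = sep_without_y = False      # separating sets seen so far
    for size in range(len(rest) + 1):
        for S in combinations(rest, size):
            if indep(x, z, set(S)):
                if y in S:
                    sep_with_y = True
                else:
                    sep_without_y = True
    # S2 guarantees at least one separating set exists for an unshielded pair.
    if sep_without_y and not sep_with_y:
        return "collider"        # orient X -> Y <- Z
    if sep_with_y and not sep_without_y:
        return "non-collider"
    return "ambiguous"           # unfaithful-looking pattern of independencies
```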
19
 Adjacency-Faithfulness – If X – Y in the causal DAG, then I_P(X,Y|Z) ≠ 0 for any Z ⊆ V\{X, Y}.
20
 Triangle – For any three variables that form a
triangle in causal DAG G
 If Z is a non-collider on the path <X, Z, Y>, then X and Y
are not independent conditional on any subset of V\{X,
Y} that does not contain Z;
 If Z is a collider on the path <X, Z, Y>, then X and Y are
not independent conditional on any subset of V\{X, Y}
that contains Z.
 Suppose X → Y ← Z and I_P(X,Z|Y) = 0. This is faithful
to X → Y → Z. This cannot be detected, so it must be
assumed.
21
X
Z
Y
X
Z
Y
W
¬I(X,Z| )
¬I(X,Y|Z)
¬I(Y,Z| )
¬I(X,Z| )
¬I(Y,Z| )
¬I(X,Y|Z)
¬I(X,W| )
¬I(Y,W| )
¬I(Z,W| )
¬I(X,Z|W) ¬I(X,Z|Y,W)
¬I(Y,Z|W) ¬I(Y,Z|X,W)
¬I(X,Y|W) ¬I(X,Y|Z,W)
¬I(X,W|Z) ¬I(X,W|Y)
¬I(Y,W|X) ¬I(Y,W|Z)
¬I(Z,W|X) ¬I(Z,W|Y)
22
 The population distribution is not Markov to any
proper subDAG of the true causal DAG.
 Causal Minimality is entailed by the manipulation definition of causation if the distribution is positive.
 There is a weaker kind of causal minimality – P-minimality: the population distribution is not Markov to any DAG that entails a proper superset of the conditional independence relations.
 Is this sufficient for the correctness of VCSGS?
23
[Figure: true graph X→Y→Z→W; small- and large-sample outputs X–Y–Z–W, annotated with the independence facts I_P(X,Z|Y), I_P(Y,W|{X,Z}), and I_P(X,W|∅).]
24
[Figure: true graph X→Y→Z→W; outputs X–Y–Z–W and X–Y–Z→W at small and large sample sizes, annotated with I_P(X,Z|Y), I_P(Y,W|{X,Z}), and I_P(X,W|∅).]
25
[Figure: the same panels as the previous slide.]
26
 V1. Form the complete undirected graph H on the given set of variables
V.
 V2. For each pair of variables X and Y in V, search for a subset S of
V\{X, Y} such that X and Y are independent conditional on S. Remove
the edge between X and Y in H and mark the pair <X, Y> as ‘apparently
non-adjacent’, if and only if such a set is found.
 V3. Let K be the graph resulting from V2. For each apparently unshielded triple <X, Y, Z> (i.e., X and Y are adjacent, Y and Z are adjacent, but X and Z are apparently non-adjacent),
 If X and Z are not independent conditional on any subset of V\{X, Z} that contains Y, then orient the triple as a collider: X → Y ← Z.
 If X and Z are not independent conditional on any subset of V\{X, Z} that does not contain Y, then mark the triple as a non-collider.
 Otherwise, mark the triple as ambiguous (or unfaithful), and mark the pair <X, Z> as 'definitely non-adjacent'.
27
 V4. Execute the same orientation rules as in S4,
until none of them applies.
 V5. Let M be the graph resulting from V4. For each
consistent disambiguation of the ambiguous
triples in M (i.e., each disambiguation that leads to
a pattern), test whether each vertex V in the
resulting pattern satisfies the Markov condition. If
V and W satisfy the Markov condition in every
pattern, then mark the ‘apparently non-adjacent’
<V,W> pair as ‘definitely non-adjacent’.
28
Faithfulness
Adjacency-Faithfulness
Triangle-Faithfulness
P-Minimality
29
 If the Triangle-Faithfulness, Causal Minimality, and Causal Markov Assumptions hold, then VCSGS is a consistent estimator of the extended Markov equivalence class.
 Is it complete?
30
 V5*. Let M be the graph resulting from V4. For each
consistent disambiguation of the ambiguous triples in M
(i.e., each disambiguation that leads to a pattern), test
whether each vertex V in the resulting pattern satisfies the
Markov condition. If V and W satisfy the Markov condition
in some pattern, then mark the ‘apparently non-adjacent’
<V,W> pair as ‘definitely non-adjacent’.
31
 Assumption NVV(J):
inf_{X_i ∈ V} var_M(X_i | V\{X_i}) ≥ J, for some (small) J > 0
 Assumption UBC(C):
sup_{X_i, X_j ∈ V, W ⊆ V\{X_i, X_j}} |ρ_M(X_i, X_j | W)| ≤ C, for some C < 1
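A brute-force sketch of what these two assumptions say for a small linear Gaussian model (the model and its coefficients are assumptions made for illustration): var_M(X_i | V\{X_i}) is the reciprocal of the i-th diagonal entry of the precision matrix, and the UBC bound is checked by enumerating every pair and conditioning set.

```python
# Check NVV(J) and UBC(C) by brute force for a small Gaussian model; the
# example covariance below comes from an assumed 4-variable linear SEM.
import numpy as np
from itertools import combinations

def partial_corr(Sigma, i, j, W):
    idx = [i, j] + list(W)
    theta = np.linalg.inv(Sigma[np.ix_(idx, idx)])   # precision of the margin
    return -theta[0, 1] / np.sqrt(theta[0, 0] * theta[1, 1])

def nvv_ubc(Sigma):
    p = Sigma.shape[0]
    theta = np.linalg.inv(Sigma)
    inf_cond_var = min(1.0 / theta[i, i] for i in range(p))    # NVV side
    sup_pcorr = 0.0                                            # UBC side
    for i, j in combinations(range(p), 2):
        others = [k for k in range(p) if k not in (i, j)]
        for size in range(len(others) + 1):
            for W in combinations(others, size):
                sup_pcorr = max(sup_pcorr, abs(partial_corr(Sigma, i, j, W)))
    return inf_cond_var, sup_pcorr

# Illustrative SEM: X -> Z <- Y, Z -> W, unit noise variances.
B = np.zeros((4, 4))
B[2, 0], B[2, 1], B[3, 2] = 0.8, 0.6, 1.0
A = np.linalg.inv(np.eye(4) - B)
Sigma = A @ A.T
J, C = nvv_ubc(Sigma)
print("inf var(X_i | rest) =", round(J, 4), "   sup |rho(X_i,X_j|W)| =", round(C, 4))
```

NVV(J) then holds for any J below the printed infimum, and UBC(C) for any C above the printed supremum.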
32
 Given a set of variables V, suppose the true causal model over V is M = <P, G>, where P is a Gaussian distribution over V, and G is a DAG with vertices V. For any three variables X, Y, Z that form a triangle in G (i.e., each pair of vertices is adjacent),
 If Y is a non-collider on the path <X, Y, Z>, then |ρ(X, Z|W)| ≥ k · |e_M(X – Z)| for all W ⊆ V that do not contain Y; and
 If Y is a collider on the path <X, Y, Z>, then |ρ(X, Z|W)| ≥ k · |e_M(X – Z)| for all W ⊆ V that do contain Y. (A brute-force sketch of this check follows.)
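The sketch below checks the condition for a single triangle in a linear Gaussian model; the two example models, their coefficients, and the choice k = 0.1 are assumptions for illustration, not the M1–M3 models discussed later. The weak-edge model satisfies the condition, while the almost-cancellation model violates it even though its X – Z edge is strong.

```python
# k-Triangle-Faithfulness check for one triangle <X, Y, Z> in a linear
# Gaussian SEM; models and k are illustrative assumptions.
import numpy as np
from itertools import combinations

def partial_corr(Sigma, i, j, W):
    idx = [i, j] + list(W)
    theta = np.linalg.inv(Sigma[np.ix_(idx, idx)])
    return -theta[0, 1] / np.sqrt(theta[0, 0] * theta[1, 1])

def k_triangle_ok(Sigma, x, y, z, edge_coef_xz, k, y_is_collider):
    """True iff |rho(x,z|W)| >= k*|e(x-z)| for every relevant conditioning set W."""
    others = [v for v in range(Sigma.shape[0]) if v not in (x, z)]
    for size in range(len(others) + 1):
        for W in combinations(others, size):
            relevant = (y in W) if y_is_collider else (y not in W)
            if relevant and abs(partial_corr(Sigma, x, z, W)) < k * abs(edge_coef_xz):
                return False
    return True

# Triangle X -> Y -> Z plus a direct X -> Z edge; Y is a non-collider on <X, Y, Z>.
for name, coef_xz in [("weak X->Z edge      ", 0.01),
                      ("almost-cancellation ", -0.47)]:
    B = np.zeros((3, 3))
    B[1, 0], B[2, 1], B[2, 0] = 0.8, 0.6, coef_xz
    A = np.linalg.inv(np.eye(3) - B)
    Sigma = A @ A.T
    ok = k_triangle_ok(Sigma, x=0, y=1, z=2, edge_coef_xz=coef_xz, k=0.1, y_is_collider=False)
    print(name, "k = 0.1 condition satisfied?", ok)
```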
33
 S3* (sample version). Let K be the undirected graph resulting from the adjacency phase. For each unshielded triple <X, Y, Z>,
 If there is a set W not containing Y such that the test of ρ(X, Z|W) = 0 returns 0 (i.e., accepts the hypothesis), and for every set U that contains Y, the test of |ρ(X,Z|U)| = 0 returns 1 (i.e., rejects the hypothesis) and the test of |ρ(X,Z|U) – ρ(X,Z|W)|  L returns 0 (i.e., accepts the hypothesis), then orient the triple as a collider: X → Y ← Z.
 If there is a set W containing Y such that the test of ρ(X, Z|W) = 0 returns 0 (i.e., accepts the hypothesis), and for every set U that does not contain Y, the test of |ρ(X,Z|U)| = 0 returns 1 (i.e., rejects the hypothesis) and the test of |ρ(X,Z|U) – ρ(X,Z|W)|  L returns 0 (i.e., accepts the hypothesis), then mark the triple as a non-collider.
 Otherwise, mark the triple as ambiguous.
34
 Say that CSGS(L, n, M) errs if it contains (i) an adjacency not in G_M; (ii) a marked non-collider not in G_M; or (iii) an orientation not in G_M.
 Theorem: Given causal sufficiency of the measured variables V, the Causal Markov, k-Triangle-Faithfulness, NVV(J), and UBC(C) Assumptions, the CSGS algorithm is uniformly consistent in the sense that

lim_{n→∞} sup_{M ∈ ψ_{k,J,C}} P^n_M(CSGS(L, n, M) errs) = 0
35
 For each vertex Z:
 Unless every vertex not adjacent to Z is confirmed to be non-adjacent to Z, return 'Unknown' for every edge containing Z;
 else
 For every non-adjacent pair <Y, Z> in EP(G), let the estimate be 0.
 For each vertex Z such that all of the edges containing Z are oriented in EP(G), if Y is a parent of Z in EP(G), let the estimate be the sample regression coefficient of Y in the regression of Z on its parents in EP(G). (A sketch follows below.)
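A minimal sketch of the estimation rule, assuming we already have a data matrix, each vertex's parent set in EP(G), and a flag saying whether all of that vertex's non-adjacencies are confirmed and all of its incident edges oriented; everything else (the search itself, the confirmation of non-adjacencies) is taken as given.

```python
# Sketch of the edge-coefficient estimation step; the dict-based inputs are
# assumptions made for this sketch, not the authors' data structures.
import numpy as np

def estimate_edges(data, parents, fully_determined):
    """data: {name: 1-D sample array}; parents: {name: list of parent names};
    fully_determined: {name: True if all non-adjacencies of the vertex are
    confirmed and all of its incident edges are oriented}."""
    estimates = {}
    for z, pa in parents.items():
        if not fully_determined[z]:
            for y in pa:
                estimates[(y, z)] = "Unknown"
            continue
        if not pa:
            continue
        # Regress Z on its parents (plus an intercept); one coefficient per edge.
        X = np.column_stack([data[y] for y in pa] + [np.ones_like(data[z])])
        coef, *_ = np.linalg.lstsq(X, data[z], rcond=None)
        for y, b in zip(pa, coef):
            estimates[(y, z)] = float(b)
    return estimates
```

Non-adjacent pairs receive the estimate 0, per the 'else' clause above; that bookkeeping is left out of the sketch.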
36
 Let M_1 be an output of the Estimation Algorithm, and M_2 be a causal model. We define the structural coefficient distance d[M_1, M_2] between M_1 and M_2 to be

d[M_1, M_2] = max_{i,j} |ê_{M_1}(X_i → X_j) − e_{M_2}(X_i → X_j)|

where by convention |ê_{M_1}(X_i → X_j) − e_{M_2}(X_i → X_j)| = 0 if ê_{M_1}(X_i → X_j) = "Unknown".
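A small sketch of this distance, using an edge-to-coefficient dictionary representation that is an assumption of the sketch; edges whose estimate is "Unknown" contribute 0 per the stated convention, and edges missing from the estimate are treated as estimated at 0.

```python
# Structural coefficient distance d[M1, M2]; "Unknown" estimates count as 0.
def structural_coefficient_distance(est, true):
    diffs = [0.0]
    for edge in set(est) | set(true):
        e_hat = est.get(edge, 0.0)       # absent edge: estimated coefficient 0
        if e_hat == "Unknown":
            continue                     # convention: contributes 0
        diffs.append(abs(e_hat - true.get(edge, 0.0)))
    return max(diffs)

# Hypothetical example: one estimated edge, one "Unknown", one missed weak edge.
est  = {("X", "Y"): 0.80, ("Y", "Z"): "Unknown"}
true = {("X", "Y"): 0.75, ("Y", "Z"): 0.60, ("Z", "W"): 0.02}
print(structural_coefficient_distance(est, true))   # ~0.05 (the X -> Y error dominates)
```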
37
 E1. Run the CSGS algorithm on an i.i.d. sample of size n from P_M.
 E2. Let the output from E1 be CSGS(L, n, M). Apply step V5 of the VCSGS algorithm (from section 3), using tests of zero partial correlations, and record which non-adjacencies are confirmed.
 E3. Apply the Estimation Algorithm to CSGS(L, n, M),
the confirmed non-adjacencies, and the sample of size
n.
38
 Given causal sufficiency of the measured variables V, the Causal Markov, k-Triangle-Faithfulness, NVV(J), and UBC(C) Assumptions, the Edge Estimation I algorithm is uniformly consistent in the sense that for every δ > 0

lim_{n→∞} sup_{M ∈ ψ_{k,J,C}} P^n_M(d[Ô(M), M] > δ) = 0

 For a large enough and dense enough graph, this still allows for the possibility of large manipulation errors (due to many small edge errors).
39
[Figure: models over X1, X2, X3 with parameter values 1.0, 0.01, 0.7877781, 1.0, 0.612157, and 1.0.]
40
 If k > 0.014, then the k-Triangle-Faithfulness Assumption is violated for models M2 and M3, but not for M1.
 If 0.008 < k < 0.014, then the k-Triangle-Faithfulness Assumption is violated for model M3, but not for M1 or M2.
41
 E1. Run Edge Estimation Algorithm I.
 E2. Set ForbiddenOrientations = {}.
 E3. For each maximal clique in CSGS(L, n, M) such that if a
vertex in the clique is not adjacent to some vertex not in the
clique, it is definitely non-adjacent
 (i) for each possible orientation O of all of the unoriented
edges in the maximal clique
 Apply the orientation O to each of the unoriented edges.
 Apply Meek's orientation rules.
 If application of the rules produces a cycle or a new unshielded collider, add O to ForbiddenOrientations.
 Add O to ForbiddenOrientations if for any Y and W such that Y is a non-collider on the path <X, Y, Z>, and W ⊆ V does contain Y,

φ_n( k · ê_O(X – Z) − ρ̂(X, Z | W) > L ) = 0
42
 E4. For each unoriented edge X – Y in CSGS(L, n, M), if there is only one orientation X → Y that does not occur in ForbiddenOrientations, and every vertex that Y is not adjacent to, Y is definitely not adjacent to, orient the edge as X → Y.
 E5. For each vertex V such that some edge containing V in CSGS(L, n, M) is not oriented, if there is only one orientation of all of the edges containing V that is not in ForbiddenOrientations, and every vertex that V is not adjacent to, V is definitely not adjacent to, let the estimate of each edge be the corresponding sample regression coefficient in the regression of V on its parents in the non-forbidden orientation.
43
 Theorem: Given causal sufficiency of the measured variables V, the Causal Markov, k-Triangle-Faithfulness, NVV(J), and UBC(C) Assumptions, the Edge Estimation II algorithm is uniformly consistent in the sense that for every δ > 0

lim_{n→∞} sup_{M ∈ ψ_{k,J,C}} P^n_M(O(L, n, M) errs) = 0

lim_{n→∞} sup_{M ∈ ψ_{k,J,C}} P^n_M(d[Ô(M), M] > δ) = 0

 where O(L, n, M) is the graphical output of the Edge Estimation II algorithm, and Ô(M) is its estimated set of edge coefficients.
44
 We weakened the assumption of faithfulness so that fewer inferences from conditional independence to d-separation need to be made.
 We strengthened the assumption so that it allows one to make inferences from "almost independence" in a probability distribution to d-separation in a causal graph, allowing for the existence of uniformly consistent estimation algorithms.
45
 We changed the concept of correctness to allow for missing weak edges and for saying "don't know" about some features of Markov equivalence classes.
 The new simplicity assumption broke up the Markov
equivalence class in the sense that it considers some
models in a Markov equivalence class simpler than
other models in the same Markov equivalence class.
 This allowed for uniformly consistent estimates of
linear coefficients in a causal model, as well as causal
structure.
46
 Can we get similar results for:
 PC
 FCI
 non-linear models
 increasing numbers of variables and vertex degree and decreasing k (analogous to Kalisch and Bühlmann)?
 If parameter values are randomly assigned, how often is k-triangle-faithfulness violated as a function of:
 sample size
 clique size
 parameter distribution
 k
47
 Kalisch, M., and Bühlmann, P. (2007). Estimating high-dimensional directed acyclic graphs with the PC-algorithm. Journal of Machine Learning Research 8, 613–636.
 Spirtes, P., and Zhang, J. (forthcoming). A Uniformly Consistent Estimator of Causal Effects Under the k-Triangle-Faithfulness Assumption. Statistical Science.
 Spirtes, P., and Zhang, J. (submitted). Three Faces of Faithfulness. Synthese.
48