Graphical Methods for Identification in Structural
Equation Models
Carlos Eduardo Fisch de Brito
June 2004
Technical Report R-314
Cognitive Systems Laboratory
Department of Computer Science
University of California
Los Angeles, CA 90095-1596, USA
This report reproduces a dissertation submitted to UCLA in partial satisfaction
of the requirements for the degree of Doctor of Philosophy in Computer Science.
This work was supported in part by AFOSR grant #F49620-01-1-0055, NSF grant
#IIS-0097082, California State MICRO grant #00-077, and MURI grant #N00014-00-1-0617.
© Copyright by
Carlos Eduardo Fisch de Brito
2004
To my parents and wife
TABLE OF CONTENTS

1 Introduction
  1.1 Data Analysis with SEM and the Identification Problem
  1.2 Overview of Results
  1.3 Related Work
2 Problem Definition and Background
  2.1 Structural Equation Models and Identification
  2.2 Graph Background
  2.3 Wright’s Method of Path Analysis
3 Auxiliary Sets for Model Identification
  3.1 Introduction
  3.2 Basic Systems of Linear Equations
  3.3 Auxiliary Sets and Linear Independence
  3.4 Model Identification Using Auxiliary Sets
  3.5 Simpler Conditions for Identification
    3.5.1 Bow-free Models
    3.5.2 Instrumental Condition
4 Algorithm
5 Correlation Constraints
  5.1 Obtaining Constraints Using Auxiliary Sets
6 Sufficient Conditions For Non-Identification
  6.1 Violating the First Condition
  6.2 Violating the Second Condition
7 Instrumental Sets
  7.1 Causal Influence and the Components of Correlation
  7.2 Instrumental Variable Methods
  7.3 Instrumental Sets
8 Discussion and Future Work
A (Proofs from Chapter 3)
B (Proofs from Chapter 6)
C (Proofs from Chapter 7)
  C.1 Preliminary Results
    C.1.1 Partial Correlation Lemma
    C.1.2 Path Lemmas
  C.2 Proof of Theorem 8
    C.2.1 Notation and Basic Linear Equations
    C.2.2 System of Equations
    C.2.3 Identification of …
  C.3 Proof of Lemma 9
References
LIST OF FIGURES

1.1 Smoking and lung cancer example
1.2 Model for correlations between blood pressures of relatives
1.3 McDonald’s regressional hierarchy examples
2.1 A simple structural model and its causal diagram
2.2 A causal diagram
2.3 A causal diagram
2.4 Wright’s equations
3.1 Wright’s equations
3.2 Figure
3.3 Example illustrating a condition in the G criterion
3.4 Example illustrating rule R2
3.5 Example illustrating the Auxiliary Sets method
3.6 Example of a Bow-free model
3.7 Examples illustrating the instrumental condition
4.1 A causal diagram and the corresponding flow network
5.1 D-separation conditions and correlation constraints
5.2 Example of a model that imposes correlation constraints
6.1 Figure
6.2 Figure
6.3 Figure
7.1 Typical Instrumental Variable
7.2 Conditional IV Examples
7.3 Simultaneous use of two IVs
7.4 More examples of Instrumental Sets
VITA

1971    Born, Mogi das Cruzes-SP, Brazil.

1997    B.S. Computer Science, Universidade Federal do Rio de Janeiro, Brazil.

1999    M.S. Computer Science, Universidade Federal do Rio de Janeiro, Brazil.

PUBLICATIONS
Brito, C., Gafni, E., and Vaya, S. An Information Theoretic Lower Bound for Broadcasting in Radio Networks. In STACS'04 - 21st International Symposium on Theoretical Aspects of Computer Science, Montpellier, France.

Brito, C., Koutsoupias, E., and Vaya, S. Competitive Analysis of Organization Networks, or Multicast Acknowledgement: How Much to Wait? In SODA'04 - ACM-SIAM Symposium on Discrete Algorithms, New Orleans, LA, USA.

Brito, C. A New Approach to the Identification Problem. In SBIA'02 - Brazilian Symposium on Artificial Intelligence, Porto de Galinhas/Recife, Brazil.

Brito, C. and Pearl, J. Generalized Instrumental Variables. In UAI'02 - The Eighteenth Conference on Uncertainty in Artificial Intelligence, Edmonton, Alberta, Canada.

Brito, C. and Pearl, J. A Graphical Criterion for the Identification of Causal Effects in Linear Models. In AAAI'02 - The Eighteenth National Conference on Artificial Intelligence, Edmonton, Alberta, Canada.

Brito, C. and Pearl, J. A New Identification Condition for Recursive Models with Correlated Errors. In Structural Equation Models, 2002.

Brito, C., Silva, E. S., Diniz, M. C., and Leão, R. M. M. Análise Transiente de Modelos de Fonte Multimídia. In SBRC'00 - 18º Simpósio Brasileiro de Redes de Computadores, Belo Horizonte, MG, Brazil.

Brito, C., Moraes, R., Oliveira, D., and Silva, E. S. Comunicação Multicast Confiável na Implementação de uma Ferramenta Whiteboard. In SBRC'99 - 17º Simpósio Brasileiro de Redes de Computadores, Salvador, BA, Brazil.
ABSTRACT OF THE DISSERTATION
Graphical Methods for Identification in Structural
Equation Models
by
Carlos Eduardo Fisch de Brito
Doctor of Philosophy in Computer Science
University of California, Los Angeles, 2004
Professor Judea Pearl, Chair
Structural Equation Models (SEM) are among the most important tools for causal analysis in the social and behavioral sciences (e.g., Economics, Sociology). A central problem in the application of SEM is the analysis of identification. Succinctly, a model is identified if it admits only one parametrization compatible with a given covariance matrix (i.e., observed data). The identification of a model is important because, in general, no reliable quantitative conclusion can be derived from non-identified models.
In this work, we develop a new approach for the analysis of identification in SEM,
based on graph theoretic techniques. Our main result is a general sufficient criterion
for model identification. The criterion consists of a number of graphical conditions
on the causal diagram of the model. We also develop a new method for computing
correlation constraints imposed by the structural assumptions, that can be used for
model testing. Finally, we also provide a generalization of the traditional method of Instrumental Variables, through the concept of Instrumental Sets.
CHAPTER 1
Introduction
Structural Equation Models (SEM) are among the most important tools for causal analysis in the social and behavioral sciences [Bol89, Dun75, McD97, BW80, Fis66,
KKB98]. Although most developments in SEM have been done by scientists in these
areas, the theoretical aspects of the model provide interesting problems that can benefit
from techniques developed in computer science.
In a structural equation model, the relationships among a set of observed variables
are expressed by linear equations. Each equation describes the dependence of one
variable in terms of the others, and contains a stochastic error term accounting for the
influence of unobserved factors. Independence assumptions on pairs of error terms are
also specified in the model.
An attractive characteristic of SEM models is their simple causal interpretation.
Specifically, the linear equation Y = aX + e encodes two distinct assumptions: (1) the possible existence of (direct) causal influence of X on Y; and, (2) the absence of (direct) causal influence on Y of any variable that does not appear on the right-hand side of the equation. The parameter a quantifies the (direct) causal effect of X on Y. That is, the equation claims that a unit increase in X would result in a units increase of Y, assuming that everything else remains the same.
Let us consider a simple example taken from [Pea00a]. This model investigates the relations between smoking (X) and lung cancer (Y), taking into consideration the amount of tar (Z) deposited in a person’s lungs, and allowing for unobserved factors to affect both smoking and cancer. This situation is represented by the following equations:

    X = e1
    Z = aX + e2
    Y = bZ + e3
    Cov(e1, e2) = Cov(e2, e3) = 0,    Cov(e1, e3) = γ

The first three equations claim, respectively, that the level of smoking of a person
depends only on factors not included in the model, the amount of tar deposited in the
lungs depends on the level of smoking as well as external factors, and the level of
cancer depends on the amount of tar in the lungs and external factors. The remaining
equations say that the external factors that cause tar to be accumulated in the lungs
are independent of the external factors that affect the other variables, but the external
factors that have influence on smoking and cancer may be correlated.
All the information contained in the equations can be expressed by a graphical
representation, called causal diagram, as illustrated in Figure 1.1. We formally define
the model and its graphical representation in section 2.1.
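The claims of a model with this shape can be checked numerically. Below is a minimal simulation sketch; the parameter values for a, b and the error covariance γ are arbitrary illustrative choices (not values from the text), and the variable roles follow Figure 1.1.

```python
import random

random.seed(1)
a, b, gamma = 0.8, 0.5, 0.3   # hypothetical parameter values
N = 100_000

xs, zs, ys = [], [], []
for _ in range(N):
    # Build errors e1, e3 with Cov(e1, e3) = gamma via a shared latent factor,
    # keeping Var(e1) = Var(e3) = 1; e2 is independent of both.
    u = random.gauss(0.0, gamma ** 0.5)
    e1 = u + random.gauss(0.0, (1 - gamma) ** 0.5)
    e2 = random.gauss(0.0, 1.0)
    e3 = u + random.gauss(0.0, (1 - gamma) ** 0.5)
    x = e1              # smoking
    z = a * x + e2      # tar
    y = b * z + e3      # cancer
    xs.append(x); zs.append(z); ys.append(y)

def cov(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / len(u)

# Path-tracing predictions for this diagram:
#   Cov(X, Z) = a             (chain X -> Z)
#   Cov(X, Y) = a*b + gamma   (chain X -> Z -> Y, plus the arc X <-> Y)
cxz, cxy = cov(xs, zs), cov(xs, ys)
```

The shared latent factor u is only a simulation device for producing correlated errors; it plays the role of the unobserved common cause of smoking and cancer.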
Figure 1.2 shows a more elaborate model used to study correlations between relatives for systolic and diastolic blood pressures [TEM93]. The squares represent blood
pressures for each type of individual, and the circles represent genetic and environmental causes of variation: the additive genetic contribution of the polygenes; D, the dominance genetic contributions; all the environmental factors; and those environmental components only shared by siblings of the same sex.
Figure 1.1: Smoking and lung cancer example
1.1 Data Analysis with SEM and the Identification Problem
The process of data analysis using Structural Equation Models consists of four steps
[KKB98]:
1. Specification: Description of the structure of the model. That is, the qualitative
relations among the variables are specified by linear equations. Quantitative information is generally not specified and is represented by parameters.
2. Identification: Analysis to decide if there is a unique valuation for the parameters
that make the model compatible with the observed data. The identification of a
SEM model is formally defined in Section 2.1.
3. Estimation: Actual estimation of the parameters from statistical information on the
observed variables.
4. Evaluation of fit: Assessment of the quality of the model as a description of the
data.
In this work, we will concentrate on the problem of Identification. That is, we
leave the task of model specification to other investigators, and develop conditions
to decide if these models are identified or not. The identification of a model is important because, in general, no reliable quantitative conclusion can be derived from a non-identified model.

Figure 1.2: Model for correlations between blood pressures of relatives

The question of identification has been the object of extensive
research [Fis66], [Dun75], [Pea00a], [McD97], [Rig95]. Despite all this effort, the
problem still remains open. That is, we do not have a necessary and sufficient condition for model identification in SEM. Some results are available for special classes of
models, and will be reviewed in Section 1.3.
In our approach to the problem, we state Identification as an intrinsic property of
the model, depending only on its structural assumptions. Since all such assumptions
are captured in the graphical representation of the model, we can apply graph theoretic
techniques to study the problem of Identification in SEM. Thus, our main results consist of graphical conditions for identification, to be applied on the causal diagram of
the model.
As a byproduct of our analysis, estimation methods will also be provided, as well
as methods to obtain constraints imposed by the model on the distribution over the
observed variables, thus addressing questions in steps 3 and 4 above.
1.2 Overview of Results
The central question studied in this work is the problem of identification in recursive
SEM models, that is, models that do not contain feedbacks (see Section 2.1 for more
details). The basic tool used in the analysis is Wright’s decomposition, which allows
us to express correlation coefficients as polynomials on the parameters of the model.
The important fact about this decomposition is that each term in the polynomial corresponds to a path in the causal diagram.
Based on the observation that these polynomials are linear on specific subsets of
parameters, we reduce the problem of Identification to the analysis of simple systems
of linear equations. As one should expect, conditions for linear independence of those
systems (which imply a unique solution and thus identification of the parameters),
translate into graphical conditions on the paths of the causal diagram.
Hence, the fundamental step in our method of Auxiliary Sets for model identification (see Chapter 3) consists in finding, for each variable, a set of variables satisfying specific restrictions on the paths between that variable and each variable in the set.
As it turns out, the restrictions that allow us to obtain the maximum generality
from the method are not so easy to verify by visual inspection of the causal diagram.
To overcome this problem, we developed an algorithm that searches for an Auxiliary
Set for a given variable
(see Chapter 4). We also provide the Bow-free condition
and the Instrumental condition (Section 3.5), which are special cases of the general
method, but have straightforward application.
The machinery developed for the method of Auxiliary Sets can also be used to
compute correlation constraints. These constraints are implied by the structural assumptions, and allow us to test the model [McD97]. The basic idea is very simple.
While in the case of identification we take advantage of linearly independent equations, correlation constraints are immediately obtained from linearly dependent equations (see Chapter 5). Despite its simplicity, this is a very powerful method for computing correlation constraints.
The main goal of this research is to solve the problem of Identification for recursive
models. Namely, to obtain a necessary and sufficient condition for model identification. The sufficient condition provided by the method of Auxiliary Sets is very general,
and in Chapter 6 we present our initial efforts to prove that it is also necessary for identification.
Finally, in Chapter 7, we consider the problem of parameter identification. This
problem is motivated by the observation that even in non-identified models there may
exist some parameters whose value is uniquely determined by the structural assumptions and data. We provide a solution based on the concept of Instrumental Sets, which
generalizes the traditional method of Instrumental Variables [BT84]. The criterion for
parameter identification involves d-separation conditions, and the proofs required the
development of new techniques of independent interest.
1.3 Related Work
The use of graphical models to represent and reason about probability distributions has
been extensively studied [WL83, CCK83, Pea88]. In many areas such models have become the standard representation, e.g., Bayesian networks for dealing with uncertainty
in Artificial Intelligence [Pea88], and Markov random fields for speech recognition
and coding [KS80]. Some reasons for the success of the language of graphs in many
domains are: it provides a compact representation for a large class of probability distributions; it is convenient to describe dependencies among variables; and it consists
of a natural language for causal modeling. Besides these advantages, many methods
and techniques were developed to reason about probability distributions directly at the
level of the graphical representation [LS88, HD96]. An example of such a technique
is the d-separation criterion [Pea00a], which allows us to read off conditional independencies among variables by inspecting the graphical representation of the model.
The Identification problem has been tackled in the past half century, primarily by
econometricians and social scientists [Fis66, Dun75]. It is still unsolved. In other
words, we are not in possession of a necessary and sufficient criterion for deciding
whether the parameters in a structural model can be determined uniquely from the
covariance matrix of the observed variables.
Certain restricted classes of models are nevertheless known to be identifiable, and
these are often assumed by social scientists as a matter of convenience or convention [Dun75].

Figure 1.3: McDonald’s regressional hierarchy examples

McDonald [1997] characterizes a hierarchy of three such classes (see Figure
1.3): (1) uncorrelated errors, (2) correlated errors restricted to exogenous variables,
and (3) correlated errors restricted to pairs of causally unordered variables (i.e., variables that are not connected by uni-directed paths). The structural equations in all
three classes are regressional (i.e., the error term in each equation is uncorrelated with
the explanatory variables of that same equation) hence the parameters can be estimated
uniquely using Ordinary Least Squares techniques.
Traditional approaches to the Identification problem are based on algebraic manipulation of the equations defining the model. Powerful algebraic methods have been
developed for testing whether a specific parameter, or a specific equation in a model
is identifiable. However, such methods are often too complicated for investigators to
apply in the pre-analytic phase of model construction. Additionally, those specialized
methods are limited in scope. The rank and order criteria [Fis66], for example, do
not exploit restrictions on the error covariances (if such are available). The rank criterion further requires a precise estimate of the covariance matrix before identifiability can be decided. Identification methods based on block recursive models [Fis66, Rig95], for another example, insist on uncorrelated errors between any pair of
ordered blocks.
Recently, some advances have been achieved on graphical conditions for identification [Pea98, Pea00a, SRM98]. Examples of such conditions are the “back-door” and “single-door” criteria [Pea00a, pp. 150–2]. The back-door criterion consists of a d-separation test applied to the causal diagram, and provides a sufficient condition for the
identification of specific causal effects in the model. A problem with such conditions is
that they are applicable only in sparse models, that is, models rich in conditional independence. The same holds for criteria based on instrumental variables (IV) [BT84], since these require a search for variables (called instruments) that are uncorrelated with the error terms in specific equations.
CHAPTER 2
Problem Definition and Background
2.1 Structural Equation Models and Identification
A structural equation model M for a vector of observed variables Y = (Y1, ..., Yn)' is defined by a set of linear equations of the form

    Yi = Σ_{j≠i} c_ij · Yj + e_i,    for i = 1, ..., n.

Or, in matrix form,

    Y = CY + e,    where e = (e1, ..., en)'.

The term e_i in each equation corresponds to a stochastic error, assumed to have normal distribution with zero mean. The model also specifies independence assumptions for those error terms, by the indication of which entries in the error covariance matrix Ψ = Cov(e) have value zero.

In this work, we consider only recursive models, which are characterized by the fact that the coefficient matrix C is lower triangular. This assumption is reasonable in many domains, since it basically forbids feedback causation; that is, a sequence of variables Y_{i1}, ..., Y_{ik} where each Y_{ij} appears in the right-hand side of the equation for Y_{i(j+1)}, and variable Y_{ik} appears in the equation for Y_{i1}.
The structural assumptions encoded in a model M consist of:

(1) the set of variables omitted in the right-hand side of each equation (i.e., the zero entries in matrix C); and,

(2) the pairs of independent error terms (i.e., zero entries in Ψ).

The set of parameters of model M is composed by the (possibly) non-zero entries of matrices C and Ψ. A parametrization θ for model M is a function that assigns a real value to each parameter of the model. The pair (M, θ) determines a unique covariance matrix over the observed variables, given by [Bol89]:

    Σ(M, θ) = (I − C)⁻¹ Ψ [(I − C)⁻¹]ᵀ    (2.1)

where C and Ψ are obtained by replacing each non-zero entry by the respective value assigned by θ.
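Eq. (2.1) is easy to evaluate directly. In a recursive model the coefficient matrix is strictly lower triangular, hence nilpotent, so (I − C)⁻¹ = I + C + C² + ... terminates after n − 1 terms; each power of C accumulates chains of a given length. The sketch below (pure Python; a hypothetical three-variable chain X → Z → Y with a correlated error between X and Y, parameter values arbitrary) computes the implied covariance matrix this way.

```python
def matmul(A, B):
    n, m, p = len(A), len(B[0]), len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(p)) for j in range(m)]
            for i in range(n)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def inv_i_minus(C):
    # (I - C)^(-1) = I + C + C^2 + ... ; the series is finite because C is
    # strictly lower triangular (nilpotent) in a recursive model.
    n = len(C)
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    power = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(n - 1):
        power = matmul(power, C)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total

# Variable order (X, Z, Y); C[i][j] = coefficient of variable j in equation i.
a, b, g = 0.8, 0.5, 0.3
C   = [[0, 0, 0], [a, 0, 0], [0, b, 0]]
Psi = [[1, 0, g], [0, 1, 0], [g, 0, 1]]   # Cov(eX, eY) = g

B = inv_i_minus(C)   # total-effect matrix: entry (i, j) sums chain products j -> i
Sigma = matmul(matmul(B, Psi), transpose(B))
# e.g. Sigma[0][2] = a*b + g, matching the unblocked paths X -> Z -> Y and X <-> Y
```

The Neumann-series inverse is chosen deliberately: it makes the connection between the matrix formula and path sums visible, since C^k contributes exactly the chains of length k.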
Now, we are ready to define formally the problem of Identification in SEM.
Definition 1 (Model Identification) A structural equation model M is said to be identified if, for almost every parametrization θ for M, the following condition holds:

    Σ(M, θ') = Σ(M, θ)  ⇒  θ' = θ    (2.2)

That is, if we view a parametrization θ as a point in R^k, where k is the number of parameters, then the set of points in which condition (2.2) does not hold has Lebesgue measure zero.
The identification status of simple models can be determined by explicitly calculating the covariances between the observed variables, and analyzing if the resulting
expressions imply a unique solution for the parameters. This method is illustrated in
the following examples.
Consider the model defined by the equations:
(2.3)
where the covariances of pairs of error terms not listed above are assumed to be zero.
We also make the assumption that each of the observed variables has zero mean and
is standardized (i.e., has variance 1). This assumption is not important because, if this
is not the case, a simple transformation can put the variables in this form. Immediate
consequences of this last assumption are that Var(Yi) = E[Yi²] = 1 and that the covariance of any pair of observed variables equals their correlation coefficient.
and Calculating the covariances between observed variables, we obtain:
(2.4)
and, by similar derivations,
(2.5)
Now, it is easy to see that the values of parameters are uniquely determined by the covariances , and . Parameters and
are obtained by solving the system formed by the expressions for and
. Hence, if two parametrizations θ and θ' induce the same covariance matrix they must be identical, and the model is identified.
Note, however, that this argument does not hold if parameter is exactly 1. In this
case, the equations for and do not allow us to obtain a unique
solution for parameters and . A unique solution can still be obtained from the
expression for , but if we also have
, then the parameters and are not uniquely determined by the covariance matrix of the observed variables. This
explains why we only require condition (2.2) to hold for almost every parametrization,
and allow it to fail in a set of measure zero.
The simplest example of a non-identified model corresponds to:

    Y1 = e1
    Y2 = a·Y1 + e2,    Cov(e1, e2) = α

In this case, the covariance matrix for the observed variables Y1, Y2 contains only one entry, whose value is given by:

    σ_{Y1 Y2} = E[Y1 Y2] = E[Y1 (a·Y1 + e2)] = a + α

Now, given any parametrization θ = (a, α), it is easy to construct another parametrization θ' = (a', α') with a' + α' = a + α. But this implies that Σ(M, θ') = Σ(M, θ), and so the model is non-identified.
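Numerically, the non-identification shows up as two distinct parametrizations producing the same covariance matrix. A small sketch under the standardization assumption (the numerical values are arbitrary):

```python
def sigma(a, alpha):
    # Standardized bow model: Y1 = e1, Y2 = a*Y1 + e2, Cov(e1, e2) = alpha.
    # Var(Y2) = a^2 + 2*a*alpha + Var(e2) is pinned to 1 by the choice of Var(e2).
    var_e2 = 1 - a * a - 2 * a * alpha
    s12 = a + alpha
    return [[1.0, s12], [s12, a * a + 2 * a * alpha + var_e2]]

theta1 = (0.5, 0.3)   # (a, alpha)
theta2 = (0.7, 0.1)   # a distinct parametrization with the same a + alpha
S1, S2 = sigma(*theta1), sigma(*theta2)
# Both yield Sigma ~ [[1, 0.8], [0.8, 1]]: the data cannot tell them apart.
```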
Figure 2.1: A simple structural model and its causal diagram
In general, if a model M is non-identified, then for each parametrization θ there exists an infinite number of distinct parametrizations θ' such that Σ(M, θ') = Σ(M, θ). However, it is also possible that for most parametrizations θ only a finite number of distinct parametrizations generate the same covariance matrix. We will return to this issue in Chapter 6, where we study sufficient conditions for non-identification, and will provide an example of this situation.
A few other algebraic methods, like algebra of expectations [Dun75], have been
proposed in the literature. However, those techniques are too complicated to analyze
complex models. Here, we pursue a different strategy, and study the identification
status of SEM models using graphical methods. For this purpose, we introduce the
graphical representation of the model, called a causal diagram [Pea00a].
The causal diagram of a model M consists of a directed graph whose nodes correspond to the observed variables Y1, ..., Yn in the model. A directed edge from Yj to Yi indicates that Yj appears on the right-hand side of the equation for Yi with a non-zero coefficient. A bidirected arc between Yi and Yj indicates that the corresponding error terms, e_i and e_j, have non-zero correlation. The graphical representation can be completed by labeling the directed edges with the respective coefficients of the linear equations, and the bidirected arcs with the non-zero entries of the covariance matrix Ψ.
Figure 2.2: A causal diagram
Figure 2.1 shows the causal diagram for the example given in Eq. (2.3). Note that
the causal diagram of a recursive model does not have any cycle composed only of
directed edges.
The next section presents some basic definitions and facts about the type of directed
graphs considered here. Then, in section 2.3 we establish the connection between the
Identification problem and the graphical representation of the model.
2.2 Graph Background
A path between variables X and Y in a causal diagram consists of a sequence of edges E1, ..., En such that E1 is incident to X, En is incident to Y, and every pair of consecutive edges in the sequence has a common variable. Variables X and Y are called the extreme points of the path, and every other variable appearing in some edge is said to be an intermediate variable in the path. We say that the path points to extreme point X (resp. Y) if the edge E1 (resp. En) has an arrow head pointing to X (resp. Y). For example, the following are some of the paths between X and Y in the causal diagram of Figure 2.2:

Note that only the third path points to variable X, but all of them point to Y.
A path E1, ..., En between X and Y is valid if variable X only appears in E1, variable Y only appears in En, and every intermediate variable appears in exactly two edges in the path. Among the examples above, only the first three are valid. The last one is invalid because one of its variables appears in more than two edges.
The special case of a path composed only of directed edges, all of which are oriented in the same direction, is called a chain. The first example above corresponds to a chain between X and Y.
We will also make use of a few family terms to refer to variables in particular topological relationships. Specifically, if the edge X → Y is present in the causal diagram, then we say that X is a parent of Y. Similarly, if there exists a chain from X to Y, then X is said to be an ancestor of Y, and Y is a descendant of X. Clearly, in a recursive model, we cannot have the situation where a variable is both an ancestor and a descendant of some other variable. In the causal diagram of Figure 2.2, and are the parents of variable , and is an ancestor of both and .

Given a path p between X and Y, and an intermediate variable Z in p, we denote by p[X, Z] the path consisting of the edges of p that appear between X and Z. [1] Variable Z is a collider in path p between X and Y if both p[X, Z] and p[Z, Y] point to Z. A path that does not contain any collider is said to be unblocked. Next, we consider a few important facts about unblocked paths.

[1] Here, and in most of the following, we are only concerned about valid paths, so this concept is well-defined.
Define the depth of a node X in a causal diagram as the length (i.e., number of edges) of the longest chain from any ancestor of X to X. Nodes with no ancestors have depth 0.
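Depth can be computed by a simple recursion over parent lists (well-defined because a recursive model has no directed cycles). A sketch, using a small hypothetical diagram:

```python
def depths(parents):
    # parents maps each node to the list of its parents (edge parent -> node).
    # depth(v) = number of edges on the longest chain ending at v; 0 if v has
    # no ancestors. Assumes an acyclic (recursive) diagram.
    memo = {}
    def d(v):
        if v not in memo:
            memo[v] = 0 if not parents[v] else 1 + max(d(p) for p in parents[v])
        return memo[v]
    return {v: d(v) for v in parents}

# Hypothetical diagram: X -> Z, Z -> Y, X -> Y.
d = depths({'X': [], 'Z': ['X'], 'Y': ['Z', 'X']})
```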
Lemma 1 Let X and Y be nodes in the causal diagram of a recursive model such that depth(X) ≥ depth(Y). Then, every path between X and Y which includes a node Z with depth(Z) ≥ depth(X) must have a collider.
Proof: Consider a path p between X and Y and a node Z satisfying the conditions above. We observe that Z cannot be an ancestor of either X or Y, otherwise we would have depth(X) > depth(Z) or depth(Y) > depth(Z).

Now, consider the subpath of p between Z and X. If this subpath has the form Z → ..., then it must contain a collider, since it cannot be a directed path from Z to X. Similarly, if the subpath of p between Z and Y has the form Z → ..., then it must contain a collider. In all the remaining cases, both edges of p adjacent to Z point to Z, and Z is a collider blocking the path. □
If p is a path between X and Z, and q is a path between Z and Y, then p + q denotes the path obtained by the concatenation of the sequences of edges corresponding to p and q.
Lemma 2 Let p be an unblocked path between X and Z, and let q be an unblocked path between Z and Y. Then, p + q is a valid unblocked path between X and Y if and only if:

(i) p and q do not have any intermediate variable in common;
Figure 2.3: A causal diagram
(ii) either p is a chain from Z to X, or q is a chain from Z to Y.
Definition 2 (d-separation) A set of nodes Z d-separates X from Y in a graph if Z closes every path between X and Y. A path p is closed by a set Z (possibly empty) if one of the following holds:

(i) p contains at least one non-collider that is in Z;

(ii) p contains at least one collider that is outside Z and has no descendant in Z.

For example, consider the path in Figure 2.3. This path is closed by any set containing variables or . On the other hand, the path is closed by the empty set, but is not closed by any set containing or but not or . It is easy to verify by inspection that the set closes every path between and , and so d-separates from .
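For small diagrams, Definition 2 can be checked by brute force: enumerate every simple path between the two nodes in the skeleton and test whether the given set closes it. The sketch below does exactly that; the two example graphs are hypothetical stand-ins, not the diagram of Figure 2.3.

```python
def d_separated(edges, x, y, z):
    # edges: set of directed pairs (u, v) meaning u -> v; z: a set of nodes.
    # Returns True iff z closes every path between x and y (Definition 2).
    nodes = {n for e in edges for n in e}
    nbrs = {v: [w for w in nodes if (v, w) in edges or (w, v) in edges]
            for v in nodes}

    def descendants(v):
        seen, stack = set(), [v]
        while stack:
            u = stack.pop()
            for (s, t) in edges:
                if s == u and t not in seen:
                    seen.add(t); stack.append(t)
        return seen

    def closed(path):
        for i in range(1, len(path) - 1):
            v = path[i]
            collider = (path[i - 1], v) in edges and (path[i + 1], v) in edges
            if not collider and v in z:
                return True                          # condition (i)
            if collider and v not in z and not (descendants(v) & z):
                return True                          # condition (ii)
        return False

    def paths(prefix):
        v = prefix[-1]
        if v == y:
            yield prefix
            return
        for w in nbrs[v]:
            if w not in prefix:
                yield from paths(prefix + [w])

    return all(closed(p) for p in paths([x]))

chain = {('X', 'Z'), ('Z', 'Y')}          # X -> Z -> Y
collider_graph = {('X', 'Z'), ('Y', 'Z')} # X -> Z <- Y
```

Exhaustive path enumeration is exponential in general; it is used here only because it mirrors the definition literally.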
2.3 Wright’s Method of Path Analysis
The method of path analysis [Wri34] for identification is based on a decomposition of
the correlations between observed variables into polynomials on the parameters of the
Figure 2.4: Wright’s equations.
model. More precisely, for variables X and Y in a recursive model, the correlation coefficient of X and Y, denoted by ρ_XY, can be expressed as:

    ρ_XY = Σ_p T(p)    (2.6)

where the term T(p) represents the product of the parameters of the edges along path p, and the summation ranges over all unblocked paths between X and Y. For this equality to hold, the variables in the model must be standardized (i.e., variance equal to 1) and have zero mean. We refer to Eq. (2.6) as Wright’s decomposition for ρ_XY.
Figure 2.4 shows a simple model and the decompositions of the correlations for each
pair of variables.
The set of equations obtained from Wright’s decompositions summarizes all the
statistical information encoded in the model. Therefore, any question about identification can be decided by studying the solutions for this system of equations. However,
since this is a system of non-linear equations, it can be very difficult to analyze the
identification of large models by directly studying the solutions for these equations.
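Eq. (2.6) can be checked mechanically on small models: enumerate the unblocked paths and sum the products of edge parameters. The sketch below does this for a hypothetical standardized model with directed edges X → Z (a), Z → Y (b) and X → Y (c) and no bidirected arcs; the expected totals follow from the covariance algebra of that model.

```python
def wright_rho(edges, x, y):
    # edges: dict mapping directed pairs (u, v) [u -> v] to edge parameters.
    # Returns the sum of parameter products over unblocked paths between
    # x and y, i.e. Wright's decomposition for a standardized recursive
    # model without bidirected arcs.
    nodes = {n for e in edges for n in e}
    nbrs = {v: [w for w in nodes if (v, w) in edges or (w, v) in edges]
            for v in nodes}

    def unblocked(path):
        # no intermediate node with both adjacent edges pointing into it
        return not any((path[i-1], path[i]) in edges and
                       (path[i+1], path[i]) in edges
                       for i in range(1, len(path) - 1))

    total = 0.0
    stack = [[x]]
    while stack:
        path = stack.pop()
        if path[-1] == y:
            if unblocked(path):
                prod = 1.0
                for u, v in zip(path, path[1:]):
                    prod *= edges.get((u, v), edges.get((v, u)))
                total += prod
            continue
        for w in nbrs[path[-1]]:
            if w not in path:
                stack.append(path + [w])
    return total

a, b, c = 0.4, 0.3, 0.2   # hypothetical standardized edge coefficients
model = {('X', 'Z'): a, ('Z', 'Y'): b, ('X', 'Y'): c}
# rho_XY: paths X -> Y (c) and X -> Z -> Y (a*b)
# rho_ZY: paths Z -> Y (b) and Z <- X -> Y (a*c)
# rho_XZ: only X -> Z (a); the path X -> Y <- Z is blocked by the collider Y
```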
CHAPTER 3
Auxiliary Sets for Model Identification
3.1 Introduction
In this chapter we investigate sufficient conditions for model identification. Specifically, we want to find graphical conditions on the causal diagram that guarantee the
identification of every parameter in the model. One example of the type of result obtained here is the Bow-Free Condition, which states that every model whose causal
diagram has at most one edge connecting any pair of variables is identified.
The starting point for our analysis of identification is the set of equations provided by Wright's decompositions of correlations. Then, we make the following important observation.

For an arbitrary variable Y, let Inc(Y) be the set of incoming edges to Y (i.e., edges with an arrowhead pointing to Y). Then, any unblocked path in the causal diagram can include at most one edge from Inc(Y). This follows because if two such edges appeared on a valid path, they would have to be consecutive. But since both edges point to Y, the variable Y would be a collider on the path, and the path would be blocked.
Now, recall that each term in the polynomial of Wright's decomposition corresponds to an unblocked path in the causal diagram. Thus, the observation above implies that such polynomials are linear in the parameters of the edges in Inc(Y).
Hence, our approach to the problem of Identification in SEM can be summarized
20
as follows. First, we partition all the edges in the causal diagram into sets of incoming
edges. Then, we study the identification of the parameters associated with each set by
analyzing the solution of a system of linear equations.
Two conditions must be satisfied to obtain the identification of the parameters corresponding to a set of edges Inc(Y). First, there must exist a sufficient number of linearly independent equations. Second, the coefficients of these equations, which are functions of other parameters in the model, must be identified.
To address the first issue, we develop a graphical characterization of linear independence, called the G criterion. That is, for a fixed variable Y, if a set of variables Z = {Z_1, ..., Z_k} satisfies the graphical conditions established by the G criterion, then the decompositions of ρ_{Z_1 Y}, ..., ρ_{Z_k Y} are linearly independent (with respect to the parameters of the edges in Inc(Y)). These conditions are based on the existence of specific unblocked paths between Y and each of the Z_i's.
The second point is addressed by establishing an appropriate order to solve the
systems of equations.
The following sections will formally develop this graphical analysis of identification.
3.2 Basic Systems of Linear Equations
We begin by partitioning the set of edges in the causal diagram into sets of incoming
edges.
Fix an ordering ≺ for the variables in the model, with the only restriction that if there is a directed edge X → Y, then X must appear before Y in ≺. For each variable Y, we define Inc(Y) as the set of edges in the causal diagram that connect Y to any variable appearing before Y in the ordering ≺.

21

It easily follows from this definition that, for each variable Y, Inc(Y) contains all directed edges pointing to Y (i.e., edges X → Y).
Lemma 3 Any unblocked path between Y and some variable Z can include at most one edge from Inc(Y). Moreover, if Z appears before Y in the ordering ≺, then any such path must include exactly one edge from Inc(Y).

Proof: The first part of the lemma follows from the argument given in Section 3.1. For the second part, assume that p is an unblocked path between Z and Y which does not contain any edge from Inc(Y). Let X be the variable adjacent to Y in path p. Clearly, X must appear after Y in ≺ (otherwise the edge connecting X and Y would belong to Inc(Y)). But then Lemma 1 says that p contains a collider, which is a contradiction. □
Now, fix an arbitrary variable Y, and let λ_1, ..., λ_m denote the parameters of the edges in Inc(Y). Then, Lemma 3 allows us to express Wright's decomposition of the correlation between Y and any variable Z appearing before Y as a linear equation in the λ_j's:

ρ_{ZY} = a_1 λ_1 + ... + a_m λ_m

where each coefficient a_j is the sum, over the unblocked paths between Z and Y that include the edge with parameter λ_j, of the products of the parameters of the remaining edges (in particular, a_j = 0 if no such path exists). Figure 3.1 shows the linear equations obtained from the correlations between Y and every other variable in a model.

Now, given a set of variables Z = {Z_1, ..., Z_k}, we let Φ(Z, Y) denote the system of equations corresponding to the decompositions of the correlations ρ_{Z_1 Y}, ..., ρ_{Z_k Y}. Whenever clear from the context, we drop the reference to Y

22

and simply write Φ(Z).
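The linearity just described can be illustrated numerically. In the sketch below (a hypothetical model in the spirit of Figure 3.1, whose original labels are not legible in this copy), Inc(Y) consists of the edges X_1 → Y and X_2 → Y, and the decompositions of ρ_{Z_1 Y} and ρ_{Z_2 Y} form a linear system that recovers λ_1 and λ_2:

```python
import numpy as np

# Hypothetical model: Z1 -> X1 -> Y, Z2 -> X2 -> Y, with the exogenous
# Z1, Z2 correlated (r).  Inc(Y) = {X1 -> Y, X2 -> Y} with parameters
# lam1, lam2.  Variable order: (Z1, Z2, X1, X2, Y).
a, b, r = 0.6, 0.7, 0.2
lam1, lam2 = 0.3, 0.4

B = np.zeros((5, 5))
B[2, 0] = a; B[3, 1] = b; B[4, 2] = lam1; B[4, 3] = lam2
omega = np.zeros((5, 5))
omega[0, 0] = omega[1, 1] = 1.0
omega[0, 1] = omega[1, 0] = r
omega[2, 2] = 1 - a**2
omega[3, 3] = 1 - b**2
cov_x1x2 = a * b * r
omega[4, 4] = 1 - (lam1**2 + lam2**2 + 2 * lam1 * lam2 * cov_x1x2)
I = np.eye(5)
Sigma = np.linalg.inv(I - B) @ omega @ np.linalg.inv(I - B).T

# Wright's decompositions of rho_{Z1,Y} and rho_{Z2,Y} are LINEAR in
# (lam1, lam2): rho_{Z1,Y} = a*lam1 + b*r*lam2 and
#               rho_{Z2,Y} = a*r*lam1 + b*lam2.
A = np.array([[a, b * r],
              [a * r, b]])
rho = np.array([Sigma[0, 4], Sigma[1, 4]])
lam_hat = np.linalg.solve(A, rho)
assert np.allclose(lam_hat, [lam1, lam2])
```

Solving the 2-by-2 system recovers the two parameters exactly, since the coefficient matrix has determinant ab(1 − r²) ≠ 0 here.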
Figure 3.1: Wright's equations.
3.3 Auxiliary Sets and Linear Independence
Following the ideas presented in Section 3.1, we would like to find a set of variables
that provides a system of linearly independent equations. This motivates the following
definition:
Definition 3 (Auxiliary Sets) A set of variables Z = {Z_1, ..., Z_k} is said to be an Auxiliary Set with respect to Y if and only if the system of equations Φ(Z, Y) is linearly independent.

Next, we obtain sufficient graphical conditions for a given set of variables to be an auxiliary set for Y.
Since the terms in Wright's decompositions correspond to unblocked paths, it is natural to expect that linear independence between equations translates into properties of such paths. In the following, we explore this connection by analyzing a few examples, and then we introduce the G criterion.
For each of the models in Figure 3.2 we will verify whether the set Z = {Z_1, Z_2} qualifies as an Auxiliary Set for Y. In model M_1, the system of equations Φ(Z) is given by:
23
Figure 3.2: Models M1, M2, and M3.
ρ_{Z_1 Y} = a λ_1 + b λ_2
ρ_{Z_2 Y} = c λ_1 + d λ_2

It is easy to see that the equations are linearly independent², and so Z is an Auxiliary Set for Y. We also call attention to the fact that the unblocked paths p_1 (between Z_1 and Y) and p_2 (between Z_2 and Y) have no intermediate variables in common.
In model M_2, the system Φ(Z) is formed by:

ρ_{Z_1 Y} = a λ_1 + b λ_2
ρ_{Z_2 Y} = c (a λ_1 + b λ_2)

Clearly, the equations are not linearly independent in this case. This occurs because every unblocked path between Z_1 and Y in M_2 can be extended by the edge between Z_2 and Z_1 to give an unblocked path between Z_2 and Y.
Finally, in model M_3, the system Φ(Z) again consists of the decompositions of ρ_{Z_1 Y} and ρ_{Z_2 Y}, and again we obtain a pair of linearly independent equations. The important fact to note here is that, if we extend path p_1 by the edge between Z_2 and Z_1, we obtain a path blocked by Z_1.

²This is not true if ad = bc, but this condition only holds on a set of measure zero (see the discussion in Section 2.1).

24

Figure 3.3: Example illustrating condition (ii) in the G criterion.
In general, the situation can become much more complicated, with one equation
being a linear combination of several others. However, as we will see in the following,
the examples discussed above illustrate the essential graphical properties that characterize linear independence.
G Criterion: A set of variables Z = {Z_1, ..., Z_k} satisfies the G criterion with respect to Y if there exist paths p_1, ..., p_k such that:

(i) p_i is an unblocked path between Z_i and Y including some edge from Inc(Y);

(ii) for i ≠ j, Z_j is the only possible common variable of paths p_i and p_j (other than Y), and in this case both p_i and p_j must point to Z_j (see Figure 3.3 for an example).
25
Next, we prove a technical lemma, and then establish the main result of this chapter.

Let Z = {Z_1, ..., Z_k} be a set of variables, and assume that paths p_1, ..., p_k witness the fact that Z satisfies the G criterion with respect to Y. Let λ_1, ..., λ_m denote the parameters of the edges in Inc(Y), and, without loss of generality, assume that, for i = 1, ..., k, path p_i contains the edge with parameter λ_i. (It is easy to see that condition (ii) of the G criterion does not allow paths p_i and p_j to have a common edge.)

Lemma 4 For i ≤ k and j ≠ i, let q be an unblocked path between Z_i and Y that includes the edge with parameter λ_j. Then, q must contain an edge that does not appear in any of p_1, ..., p_k.

Proof: Without loss of generality, we may assume that, for i < j, if both variables Z_i and Z_j appear in path p_j, then Z_i is an intermediate variable in p_j. If this is not the case, then we can always rename the variables so that this condition holds and condition (ii) of the G criterion is not violated.

Now, let q be a path satisfying the conditions of the lemma, and assume, for contradiction, that q contains only edges appearing in p_1, ..., p_k. In the following we show that q must be blocked by a collider.

Clearly, we can divide the path q into segments s_1, ..., s_r such that all the edges in each segment belong to the same witnessing path. Since q connects Z_i to Y but includes the edge carrying parameter λ_j, the edges of the last segment belong to path p_j, while the edges of the first segment belong to a different path. Hence there must exist two consecutive segments s_t and s_{t+1}, and distinct indices l and l', such that the edges in s_t belong to p_l and the edges in s_{t+1} belong to p_{l'}.

Let V be the common variable of segments s_t and s_{t+1}. Then V appears in both p_l and p_{l'}, and since l ≠ l', condition (ii) of the G criterion implies that the edges of p_l and p_{l'} incident to V point to V. But then the two edges of q that meet at V both point to V, and so V is a collider in path q. Hence q is blocked, contradicting the assumption that q is unblocked, and the lemma follows. □
Theorem 1 If the set of variables Z = {Z_1, ..., Z_k} satisfies the G criterion with respect to Y, then Z is an Auxiliary Set for Y.

Proof: The system of equations Φ(Z) can be written in matrix form as

A Λ = ρ

where Λ = [λ_1, ..., λ_m]ᵀ, A is a k by m matrix, and ρ = [ρ_{Z_1 Y}, ..., ρ_{Z_k Y}]ᵀ.

Let A' denote the submatrix corresponding to the first k columns of A. We will show that det(A') ≠ 0, which implies that rank(A) = k and that the equations in Φ(Z) are linearly independent with respect to the λ_j's.

Applying the definition of determinant, we obtain

det(A') = Σ_π sgn(π) a_{1π(1)} ··· a_{kπ(k)}    (3.1)

where the summation ranges over all permutations π of {1, ..., k}, and sgn(π) denotes the parity of permutation π.

27
First, observe that entry a_{ij} corresponds to the unblocked paths between Z_i and Y that include the edge from Inc(Y) with parameter λ_j. In particular, p_i is one of the paths contributing to a_{ii}, and we can write a_{ii} = T(p_i)/λ_i + a'_{ii}. This implies that the term T* = T(p_1)/λ_1 ··· T(p_k)/λ_k appears in the summand corresponding to the identity permutation. Also, note that every factor in T* is the parameter of an edge in some p_i.

On the other hand, any term in the summand of a permutation distinct from the identity must contain a factor from some entry a_{iπ(i)}, with π(i) ≠ i. Such an entry corresponds to unblocked paths between Z_i and Y including the edge from Inc(Y) with parameter λ_{π(i)}. But Lemma 4 says that those paths must have at least one edge that does not appear in any of p_1, ..., p_k. This implies that T* is not cancelled out by any other term in Eq. (3.1), and so det(A') does not vanish, completing the proof of the theorem. □
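The permutation expansion in Eq. (3.1) can be made concrete with a few lines of code. The helper below is illustrative only; it computes a determinant directly from the expansion and checks it against a standard routine:

```python
from itertools import permutations
import numpy as np

def leibniz_det(M):
    """Determinant via the permutation expansion of Eq. (3.1):
    det(M) = sum over permutations pi of sgn(pi) * prod_i M[i, pi(i)]."""
    n = M.shape[0]
    total = 0.0
    for pi in permutations(range(n)):
        # parity of the permutation = parity of its number of inversions
        inv = sum(1 for i in range(n) for j in range(i + 1, n) if pi[i] > pi[j])
        sign = -1.0 if inv % 2 else 1.0
        total += sign * np.prod([M[i, pi[i]] for i in range(n)])
    return total

M = np.array([[0.6, 0.14],
              [0.12, 0.7]])   # arbitrary 2x2 coefficient matrix
assert np.isclose(leibniz_det(M), np.linalg.det(M))
```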
3.4 Model Identification Using Auxiliary Sets
Assume that for each variable Y there is an Auxiliary Set Z(Y), with |Z(Y)| = |Inc(Y)|. This implies that for each Y there exists a system of linear equations that can be solved uniquely for the parameters λ_1, ..., λ_m of the edges in Inc(Y). This fact, however, does not guarantee the identification of the λ_j's, because the solution for each λ_j is a function of the coefficients in the linear equations, which may depend on non-identified parameters.

To prove identification we need to find an appropriate order in which to solve the systems of equations. This order will depend on the variables that compose each auxiliary set. The following theorem gives a simple sufficient condition for identification:
Theorem 2 Assume that the Auxiliary Set Z(Y) of each variable Y satisfies:

(i) |Z(Y)| = |Inc(Y)|;

28

(ii) depth(Z) < depth(Y), for all Z ∈ Z(Y).

Then, the model is identified.
Proof: We prove the theorem by induction on the depth of the variables.

Let Y be a variable at depth 0. Note that Inc(Y) can only contain bidirected edges connecting Y to another variable at depth 0. Let Z ↔ Y be such an edge. Observing that the only unblocked path between Z and Y consists precisely of the edge Z ↔ Y, we get that the parameter of this edge is identified and given by ρ_{ZY}.

Now, assume that, for every variable at depth smaller than d, the parameters of the edges in its set of incoming edges are identified.

Let Y be a variable at depth d, and let Z ∈ Z(Y). Lemma 1 implies that every intermediate variable of an unblocked path between Z and Y has depth smaller than d. The inductive hypothesis then implies that the coefficients of the linear equations in Φ(Z(Y)) are identified. Hence, the parameters of the edges in Inc(Y) are identified. □
In the general case, however, the auxiliary set for some variable Y may contain variables at greater depth than Y, or even descendants of Y. This would force us to solve the systems of equations in an order different from the one established by the depth of the variables.
In the following, we provide two rules that impose restrictions on the order in
which the linear systems must be solved. We will see that if these restrictions do not
generate a cycle, then the model is identified.
R1: If, for every bidirected edge X ↔ Y in Inc(Y), variable X is not an ancestor of any Z ∈ Z(Y), then Φ(Y) can be solved at any time.

R2: Otherwise,

29

Figure 3.4: Example illustrating rule R2(b).

a) For every Z ∈ Z(Y), Φ(Z) must be solved before Φ(Y) is solved.

30

b) If Z ∈ Z(Y) is a descendant of Y, and U ↔ W is a bidirected edge with U an ancestor of Z, then for every variable V lying on a chain from Y to Z (see Figure 3.4), Φ(V) must be solved before Φ(Y).
For a model and a given choice of Auxiliary Sets, the restrictions above can be represented by a directed graph, called the dependence graph G_D, as follows:

Each node in G_D corresponds to a variable in the model;

There exists a directed edge from X to Y in G_D if rule R1 does not apply to Y, and rule R2 imposes that Φ(X) must be solved before Φ(Y).
The next theorem states our general sufficient condition for model identification:
Theorem 3 Assume that there exist Auxiliary Sets for each variable in model M, such that the associated dependence graph G_D has no directed cycles. Then, model M is identified.

The proof of the theorem is given in Appendix A.
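Checking the hypothesis of Theorem 3 amounts to testing a directed graph for acyclicity. The sketch below (an illustrative helper; the adjacency encoding is our own) uses Kahn's algorithm: repeatedly remove nodes with no incoming edges, and conclude that the graph is acyclic iff every node gets removed:

```python
# Illustrative helper (not from the dissertation): test whether a
# dependence graph, given as {node: set of successors}, is acyclic.

def is_acyclic(graph):
    indeg = {v: 0 for v in graph}
    for v in graph:
        for w in graph[v]:
            indeg[w] = indeg.get(w, 0) + 1
    queue = [v for v, d in indeg.items() if d == 0]
    removed = 0
    while queue:
        v = queue.pop()
        removed += 1
        for w in graph.get(v, ()):
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    return removed == len(indeg)
```

For example, {X → Y, Y → W} is acyclic, while {Y → W, W → Y} contains a directed cycle.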
Figure 3.5 shows an example that illustrates the method just described. Apparently, this is a very simple model. However, it actually requires the full generality of Theorem 3. The figure also shows the auxiliary sets for each variable and the corresponding dependence graph. The fact that rule R1 can be applied to variable Z avoids a dependence of Φ(Z) on the other systems, and eliminates the possibility of a cycle.
3.5 Simpler Conditions for Identification
The sufficient condition for identification presented in the previous section is very
general, but complicated to verify by visual inspection of the causal diagram. The
31
Figure 3.5: Example illustrating the Auxiliary Sets method (R1 applies to Z; R2 applies to Y and W).
main difficulty resides in finding the required unblocked paths witnessing that a set
of variables is an Auxiliary Set. A solution for this problem is provided in Chapter 4,
where we develop an algorithm to find an appropriate Auxiliary Set for a fixed variable Y. However, simple conditions for identification still seem to be useful, and they are the subject of this section.
3.5.1 Bow-free Models
A bow-free model is characterized by the property that no pair of variables is connected
by more than one edge. Actually, this represents the simplest situation for our method
of identification.
Corollary 1 Every bow-free model is identified.
32
Figure 3.6: Example of a bow-free model.
Proof: Fix an arbitrary variable Y, and let Z = {Z_1, ..., Z_m} be the set of variables such that, for i = 1, ..., m, the edge connecting Z_i and Y belongs to Inc(Y). Since the model is bow-free, the Z_i's are all distinct and |Z| = |Inc(Y)|. Moreover, the set of paths p_1, ..., p_m, where each p_i is the trivial path consisting of the single edge connecting Z_i and Y, witnesses that Z is an Auxiliary Set for Y.

Now, let ≺ be the ordering used in the construction of the sets of incoming edges. Then, for every Y, all the variables in Z(Y) appear before Y in ≺. This implies that the dependence graph is acyclic, and the corollary follows. □
Figure 3.6 shows an interesting example. A brief examination of this causal diagram reveals that no conditional independence holds among the variables X, Y, Z, and W. As a consequence, most traditional methods for identification (e.g., Instrumental Variables, the Back-door criterion) would fail to classify this model as identified. However, as can be easily verified, this model is bow-free and hence identified.
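The bow-free property itself is easy to test by machine. The sketch below (an illustrative helper; the edge encoding is our own) reports whether any pair of variables is connected by more than one edge:

```python
# Illustrative helper (not from the dissertation): check the bow-free
# property.  directed: set of (tail, head) pairs; bidirected: set of
# frozenset pairs.  A model is bow-free if no pair of variables is
# connected by more than one edge (in particular, never by both a
# directed and a bidirected edge).

def is_bow_free(directed, bidirected):
    seen = set()
    for (u, v) in directed:
        pair = frozenset((u, v))
        if pair in seen:
            return False
        seen.add(pair)
    for pair in bidirected:
        if pair in seen:
            return False
        seen.add(pair)
    return True
```

For instance, a model with directed edges Z → X, X → Y and a bidirected edge Z ↔ Y is bow-free, while X → Y together with X ↔ Y is a bow.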
3.5.2 Instrumental Condition
From the preceding result, it is clear that all the problems for identification arise from
the existence of bow-arcs in the causal diagram (i.e., pairs of variables connected by
both a directed and a bidirected edge). Let us examine this structure in more detail and
try to understand why it represents a problem for identification.
33
Assume that there is a bow-arc between variables X and Y, and let λ and δ denote the parameters of the directed and bidirected edges in the bow, respectively. Then, Wright's decomposition for the correlation ρ_{XY} gives³:

ρ_{XY} = λ + δ

Now, if this is the only constraint on the values of λ and δ, then there exists an infinite number of solutions for λ and δ that are consistent with the observed correlation ρ_{XY}. This situation occurs, for example, when both edges in the bow point to Y, and there is no other edge in the model with an arrowhead pointing to Y. In this case, by analyzing the decompositions of the correlations between any pair of variables, we observe that either they do not depend on the values of λ, δ at all, or they depend only on their sum (see Figure 3.7(a) for an example). From this observation it is possible to derive a proof that any such model is non-identified. Similar techniques will be explored in Chapter 6 to obtain graphical conditions for non-identification.
Now, consider the situation where there exists a third variable Z with a directed edge pointing to X. A variable Z with such properties is sometimes called an Instrumental Variable. Figure 3.7(b) shows an example of this situation, with the respective decompositions of correlations. It is easy to verify that there exists a unique solution for the parameters λ and δ in terms of the observed correlations.

Hence, the basic idea for our next sufficient condition for identification is to find, for each bow-arc between X and Y, a distinct variable Z with an edge pointing to X. The condition is precisely stated as follows.
For a fixed variable Y, let Z(Y) = {Z_1, ..., Z_k} be the set of variables that are connected to Y by both a directed and a bidirected edge, both pointing to Y.

³Here, we are assuming that there is no other unblocked path between X and Y.

34
Figure 3.7: Examples illustrating the instrumental condition.
Instrumental Condition: We say that a variable Y satisfies the Instrumental Condition if, for each variable Z_i ∈ Z(Y), there exists a distinct variable W_i satisfying:

(i) W_i ∉ Z(Y);

(ii) W_i is not connected to Y by any edge;

(iii) there exists an edge between W_i and Z_i that points to Z_i.
Corollary 2 Assume that the Instrumental Condition holds for every variable in model M. Then model M is identified.
35
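The computation behind Figure 3.7(b) can be written out explicitly. In the sketch below, the parameter names are our hypothetical reading of the figure: Z → X with coefficient a, the bow X → Y with coefficient λ, and X ↔ Y with parameter δ. The instrument pins down λ and δ uniquely from the three observed correlations:

```python
import math

# Wright's decompositions for the instrumented bow (hypothetical reading
# of Figure 3.7(b)):
#   rho_ZX = a,    rho_ZY = a * lam,    rho_XY = lam + delta
a, lam, delta = 0.8, 0.5, 0.1
rho_ZX, rho_ZY, rho_XY = a, a * lam, lam + delta

# Unlike the bow alone, the three correlations admit a unique solution:
lam_hat = rho_ZY / rho_ZX
delta_hat = rho_XY - lam_hat
assert math.isclose(lam_hat, lam) and math.isclose(delta_hat, delta)
```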
CHAPTER 4
Algorithm
In some elaborate models, it is not an easy task to check whether a set of variables satisfies the GAV criterion. Moreover, the criterion itself does not provide any guidance for finding a set of variables that satisfies its conditions. In this chapter we present an algorithm that finds an Auxiliary Set for a fixed variable Y, if any such set exists.
The basic idea is to reduce the problem to an instance of the maximum flow problem in a network.
Cormen et al. [CCR90] define the maximum flow problem as follows. A flow network G = (V, E) is a directed graph in which each edge (u, v) has a nonnegative capacity c(u, v). We distinguish two vertices in the flow network: a source s and a sink t. A flow in G is a real-valued function f : V × V → R satisfying:

(i) f(u, v) ≤ c(u, v), for all u, v ∈ V;

(ii) f(u, v) = −f(v, u), for all u, v ∈ V;

(iii) Σ_{v∈V} f(u, v) = 0, for all u ∈ V − {s, t}.

That is, condition (i) states that the amount of flow on any edge cannot exceed its capacity; condition (ii) says that the amount of flow running in one direction of an edge is the same as the flow in the other direction, but with opposite sign; and condition (iii) establishes that the amount of flow entering any vertex must be the same as the amount of flow leaving the vertex.

36

Intuitively, the value of a flow f is the amount of flow transferred from the source s to the sink t, and can be formally defined as

|f| = Σ_{v∈V} f(s, v).

In the maximum flow problem, we are given a flow network G, with source s and sink t, and we wish to find a flow of maximum value from s to t.
Before describing the construction of the flow network, we make a few observations. Fix an arbitrary variable Y.

Lemma 5 Assume that the set of variables Z satisfies the conditions of the GAV criterion with respect to Y. Then, there exists a set of variables Z' and paths q_1, ..., q_k such that

(i) |Z'| = |Z|;

(ii) q_1, ..., q_k witness that Z' satisfies the GAV criterion;

(iii) for i = 1, ..., k, every intermediate variable in q_i is an ancestor of Y.

Proof: Let Z = {Z_1, ..., Z_k}, and let p_1, ..., p_k be paths witnessing that Z satisfies the GAV criterion. The lemma follows from the next observation.

Let V be an intermediate variable in the path p_i associated with variable Z_i. Then the paths p_1, ..., p_{i−1}, p_i[V, Y], p_{i+1}, ..., p_k (where p_i[V, Y] denotes the subpath of p_i from V to Y) witness that {Z_1, ..., Z_{i−1}, V, Z_{i+1}, ..., Z_k} satisfies the GAV criterion with respect to Y. □

Let Z be a set of variables, and q_1, ..., q_k be paths satisfying the conditions of Lemma 5. Then, it follows that each of the paths q_i must be either a bidirected edge Z_i ↔ Y, or a chain Z_i → ... → Y, or the concatenation of a bidirected edge Z_i ↔ V with a chain V → ... → Y.
37
From these observations we conclude that, in the search for an Auxiliary Set for Y, we only need to consider:

• ancestors of Y;

• variables connected by a bidirected edge to either Y or an ancestor of Y.

Now, from condition (ii) of the GAV criterion, we get that an ancestor X of Y can appear in at most two paths: (1) the path between X and Y; and (2) some other path, as an intermediate variable. To allow this possibility, for each ancestor of Y, we create two vertices in the flow network.

Since non-ancestors of Y can appear in at most one path, there will be only one vertex in the flow network corresponding to each such variable.

Directed edges between ancestors of Y in the causal diagram are represented by directed edges between the corresponding vertices in the flow network. Bidirected edges incident to Y and to non-ancestors of Y are also represented by directed edges. Bidirected edges between ancestors of Y require special treatment, because they can appear as the first edge of either of two paths, but not in both of them. To enforce this restriction we make use of an extra vertex.

Next, we define the flow network G(Y) that will be used to find an Auxiliary Set for Y.
The set of vertices of G(Y) consists of:

• for each ancestor X of Y, two vertices, denoted V_X and V'_X;

• for each non-ancestor W of Y, one vertex V_W;

• for each bidirected edge X ↔ X' connecting ancestors of Y, one vertex V_{XX'};

38

• a source vertex s;

• a sink vertex t, corresponding to variable Y.

The set of edges of G(Y) is defined as follows:

• for each ancestor X of Y, we include the edge (V_X, V'_X);

• for each directed edge X → X' in the causal diagram, connecting ancestors of Y, we include the edge (V'_X, V_{X'});

• for each directed edge X → Y, we include the edge (V'_X, t);

• for each bidirected edge X ↔ Y, where X is an ancestor of Y, we include the edge (V_X, t);

• for each bidirected edge W ↔ X, where W is a non-ancestor and X is an ancestor of Y, we include the edge (V_W, V_X);

• for each bidirected edge W ↔ Y, where W is a non-ancestor, we include the edge (V_W, t);

• for each bidirected edge X ↔ X', where both X and X' are ancestors of Y, we include the edges (V_X, V_{XX'}), (V_{X'}, V_{XX'}), (V_{XX'}, V_X), (V_{XX'}, V_{X'});

• for each ancestor X of Y, we include the edge (s, V_X);

• for each non-ancestor W, we include the edge (s, V_W).
To solve the maximum flow problem on the flow network G(Y) defined above, we assign capacity 1 to every edge. We also impose the additional constraint of a maximum incoming flow capacity of 1 on every vertex (this can be implemented

39

by splitting each vertex into two and connecting them by an edge of capacity 1), except for vertices s and t.

We solve the max-flow problem using the Ford–Fulkerson algorithm. From the integrality theorem ([CCR90], p. 603), the computed flow allocates a non-negative integer amount of flow to each edge. Since we assign capacity 1 to every edge, this solution corresponds to edge-disjoint directed paths from s to t. The Auxiliary Set returned by the algorithm is simply the set of variables corresponding to the first vertex in each path.
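The max-flow step can be sketched generically. The helper below is a BFS-based variant of Ford–Fulkerson (Edmonds–Karp) on unit capacities; the construction of G(Y) itself is not reproduced here, and the helper names are our own. With all capacities equal to 1, each augmentation adds exactly one unit of flow, i.e., one more edge-disjoint path:

```python
from collections import deque, defaultdict

def unit_network(edges):
    """Build a 0/1-capacity structure from a list of directed edges."""
    cap = defaultdict(lambda: defaultdict(int))
    for u, v in edges:
        cap[u][v] = 1
    return cap

def max_flow(cap, s, t):
    """Edmonds-Karp on a capacity dict cap[u][v]; returns the flow value."""
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow
        # augment by one unit (all capacities here are 0/1)
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1
```

The vertex-capacity constraint of the text would be imposed by the splitting trick described above before calling `max_flow`.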
Theorem 4 The algorithm described above is sound and complete. That is, the set of variables returned by the algorithm is an Auxiliary Set for Y of maximum size.

Proof: Fix a variable Y in model M, and let G(Y) be the corresponding flow network. Let f be the flow computed by the Ford–Fulkerson algorithm on G(Y), and let k = |f|.

As described above, we interpret the flow solution as a set of edge-disjoint paths P_1, ..., P_k from the source s to the sink t.

First, we show that each such path corresponds to an unblocked path in the causal diagram of M. Let P = (s, v_1, ..., v_r, t) be one of the paths in the solution. We make the following observations:

• v_1 is either a vertex corresponding to a non-ancestor of Y, or a vertex corresponding to an ancestor of Y in the causal diagram;

• for i = 2, ..., r, each v_i is either a vertex corresponding to an ancestor of Y, or a vertex corresponding to a bidirected edge between ancestors of Y, and the latter can only occur

40

between vertices corresponding to the two ancestors joined by that bidirected edge.
Now, we establish a correspondence between the edges of P and the edges of the causal diagram.

1. The last edge (v_r, t) of P corresponds to an edge of the causal diagram incident to variable Y.

2. If v_i and v_{i+1} both correspond to ancestors of Y, then the edge (v_i, v_{i+1}) in P corresponds to the edge connecting the respective variables in the causal diagram.

3. If v_i is a vertex of the form V_{XX'}, then the two edges of P incident to v_i together correspond to the bidirected edge X ↔ X' in the causal diagram.

4. If v_1 corresponds to a non-ancestor W and v_2 corresponds to an ancestor X, then the edge (v_1, v_2) corresponds to the bidirected edge W ↔ X in the causal diagram.

5. If v_1 corresponds to a non-ancestor W and r = 1, then the edge (v_1, t) corresponds to the bidirected edge W ↔ Y in the causal diagram.
Now, let the first vertices in each of the paths P_1, ..., P_k be given, and let Z_1, ..., Z_k be the variables associated with those vertices in the causal diagram.

Note that the constraint of maximum incoming flow capacity of 1 on the vertices of G(Y) implies that the paths P_1, ..., P_k are vertex-disjoint (and also implies that the variables Z_1, ..., Z_k are all distinct). However, this constraint does not imply that the corresponding paths in the causal diagram are vertex-disjoint.

For an ancestor X of Y, it is possible that the vertices V_X and V'_X appear in distinct paths P_i and P_j, and so X would appear in two of the corresponding paths in the causal
41
Figure 4.1: A causal diagram and the corresponding flow network
diagram. In fact, this is the only possibility for a variable to appear in more than one such path, and in this case it is easy to verify that the conditions of the G criterion hold. This proves that the algorithm is sound.

Completeness follows easily from the construction of the flow network and the optimality of the Ford–Fulkerson algorithm. □
Figure 4.1 shows an example of a simple causal diagram and the corresponding flow network G(Y).
Theorem 5 The time complexity of the algorithm described above is O(n³), where n is the number of variables in the model.

Proof: The theorem follows easily from the facts that the number of vertices in the flow network is proportional to the number of variables in the model, and that the Ford–Fulkerson algorithm runs in time O(N³), where N is the size of the flow network. □

42
CHAPTER 5
Correlation Constraints
It is a well-known fact that the set of structural assumptions defining a SEM model may impose constraints on the covariance matrix Σ of the observed variables [McD97]. That is, a given set of structural assumptions may imply that the value of a particular entry in Σ is a function of some other entries in this matrix.

An immediate consequence of this observation is that a model M may not be compatible with an observed covariance matrix Σ*, in the sense that Σ(θ) ≠ Σ* for every parametrization θ. This allows us to test the quality of the model. That is, by verifying whether the constraints imposed by the structural assumptions are satisfied in the observed covariance matrix (at least approximately), we either increase our confidence in the model, or decide to modify (or discard) it.
The first type of constraint imposed by a SEM model is associated with d-separation conditions. Such a condition is equivalent to a conditional independence statement [Pea00a], and implies that a corresponding correlation coefficient (or partial correlation) must be zero. Figure 5.1 shows two examples illustrating how we can obtain correlation constraints from d-separation conditions. In the causal diagram of Figure 5.1(a), it is easy to see that variables X and W are d-separated (the only path between them is blocked by variable Z). This immediately gives ρ_{XW} = 0. For the model in Figure 5.1(b), variables X and W are d-separated when conditioning on Z, which implies that ρ_{XW·Z} = 0. Applying the recursive formula for partial correlations, we get:

ρ_{XW·Z} = (ρ_{XW} − ρ_{XZ} ρ_{WZ}) / √((1 − ρ_{XZ}²)(1 − ρ_{WZ}²))

43

Figure 5.1: D-separation conditions and correlation constraints.

and so, ρ_{XW} = ρ_{XZ} ρ_{ZW}.
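The recursive formula and the resulting constraint can be checked directly. The helper below is illustrative only; it computes the first-order partial correlation, and the d-separation constraint ρ_{XW·Z} = 0 is then equivalent to ρ_{XW} = ρ_{XZ} ρ_{ZW}:

```python
import math

def partial_corr(r_xw, r_xz, r_wz):
    """First-order partial correlation via the recursive formula:
    rho_{XW.Z} = (rho_XW - rho_XZ*rho_WZ) / sqrt((1-rho_XZ^2)(1-rho_WZ^2))"""
    return (r_xw - r_xz * r_wz) / math.sqrt((1 - r_xz**2) * (1 - r_wz**2))

# If rho_XW equals the product rho_XZ * rho_WZ, the partial correlation vanishes.
assert abs(partial_corr(0.4 * 0.3, 0.4, 0.3)) < 1e-12
```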
The d-separation conditions, however, do not capture all the constraints imposed by the structural assumptions. In the model of Figure 5.1(c), no d-separation condition holds, but an algebraic analysis shows that there is a constraint involving the correlations among the variables.

Applying Wright's decomposition to these correlation coefficients, we obtain a system of equations in the model parameters. Factoring out a common parameter in the right-hand side of one equation, we recover the expression appearing in another; proceeding in this way, each correlation can be written as the product of a parameter and another correlation, and it is easy to see that a constraint on the correlations is implied by the resulting pair of equations.

44

Clearly, this type of analysis is not appropriate for complex models. In the following we show how to use the concept of Auxiliary Sets to compute constraints in a systematic way.
5.1 Obtaining Constraints Using Auxiliary Sets
The idea is simple, but gives a general method for computing correlation constraints.

Fix a variable Y, and let λ_1, ..., λ_m denote the parameters of the edges in Inc(Y). Recall that if Z = {Z_1, ..., Z_k} is an Auxiliary Set for Y, then the decompositions of the correlations ρ_{Z_1 Y}, ..., ρ_{Z_k Y} give a linearly independent system of equations with respect to the λ_j's. Let us denote this system by Φ(Z).

Now, if k = m, then this system of equations is maximal, in the sense that any linear equation on the parameters λ_1, ..., λ_m can be expressed as a linear combination of the equations in Φ(Z). Hence, computing this linear combination for the decomposition of ρ_{WY}, where W ∉ Z, gives a constraint involving the correlations among Z_1, ..., Z_k, W, and Y.
In the following, we describe the method in more detail.

Let Z = {Z_1, ..., Z_k} be an Auxiliary Set for Y, and assume that k = m. The decompositions of the correlations ρ_{Z_1 Y}, ..., ρ_{Z_k Y} can be written as:

ρ_{Z_i Y} = a_{i1} λ_1 + ... + a_{im} λ_m,   i = 1, ..., k    (5.1)

or, in matrix form, ρ = A Λ. Since Z is an Auxiliary Set and k = m, it follows that A is an m by m matrix with rank m.

Now, let W ∉ Z, and write the decomposition of ρ_{WY} as

ρ_{WY} = v_1 λ_1 + ... + v_m λ_m    (5.2)

Clearly, the vector v = [v_1, ..., v_m] can be expressed as a linear combination of the rows of matrix A as

v = c_1 A_1 + ... + c_m A_m

where A_i denotes the i-th row of matrix A, and the c_i's are the coefficients of the linear combination. But this implies that we can express the correlation ρ_{WY} as

ρ_{WY} = c_1 ρ_{Z_1 Y} + ... + c_m ρ_{Z_m Y}

by considering the left-hand sides of Equations (5.1) and (5.2).
46
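The linear-combination step can be carried out numerically. In the sketch below, the matrix entries are hypothetical stand-ins, since the worked numbers of the example are not legible in this copy; the coefficients c are obtained by solving Aᵀc = v, and the implied constraint ρ_{WY} = Σ_i c_i ρ_{Z_i Y} then holds for any parameter values:

```python
import numpy as np

# Hypothetical coefficients: rows of A come from the decompositions of
# rho_{Z1,Y} and rho_{Z2,Y} in (lam1, lam2); v comes from the
# decomposition of rho_{W,Y}.
A = np.array([[0.6, 0.14],
              [0.12, 0.7]])
v = np.array([0.36, 0.42])

# Coefficients of the linear combination: solve c^T A = v, i.e. A^T c = v.
c = np.linalg.solve(A.T, v)

# The implied constraint rho_{W,Y} = c[0]*rho_{Z1,Y} + c[1]*rho_{Z2,Y}
# holds identically in the parameters:
lam = np.array([0.3, 0.5])        # arbitrary parameter values
rho_Z = A @ lam
rho_W = v @ lam
assert np.isclose(rho_W, c @ rho_Z)
```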
Figure 5.2: Example of a model that imposes correlation constraints.
We also note that the coefficients c_i are functions of the parameters of the model. But if the model is identified, then each of these parameters can be expressed as a function of the correlations among the observed variables. Hence, we obtain an expression only in terms of those correlations.

To illustrate the method, let us consider the model in Figure 5.2. It is not difficult to check that the variables X_1, X_2, X_3, together with Z, form an Auxiliary Set for variable Y. Next, we obtain a constraint involving the correlations between Y and the remaining variables.

The decompositions of the correlations between Y and each variable in the Auxiliary Set give a system of equations as in (5.1), and the decomposition of the correlation between W and Y gives an equation as in (5.2). The corresponding matrix A and vector v are then given by the coefficients of these equations.

47

After some calculations, one can verify that the vector v can be expressed as a linear combination of the first and fourth rows of A.    (5.3)

Since the model is identified, we can express the parameters appearing in this linear combination in terms of the correlation coefficients. Substituting these expressions in Equation (5.3) and performing some algebraic manipulations, we obtain a constraint involving only the observed correlations.

48
CHAPTER 6
Sufficient Conditions For Non-Identification
The ultimate goal of this research is to solve the problem of identification in SEM.
That is, to obtain a necessary and sufficient condition for identification, based only on
the structural assumptions of the model.
In Chapter 3, we introduced our graphical approach for identification, and provided
a very general sufficient condition for model identification. Indeed, we are not aware of
any example of an identified model that cannot be proven to be so using the method of
auxiliary sets. Here, we present the results of our initial efforts on the other side of the
problem. That is, we investigate necessary graphical conditions for the identification
of a SEM model.
The method of auxiliary sets imposes two main conditions to classify a model as identified. First, for each variable Y in the model we must find a sufficiently large auxiliary set. Second, given the choice of auxiliary sets, a precedence relation is established among the variables, and represented by a dependence graph G_D. This graph cannot contain any directed cycle.
A natural strategy, then, is to assume that one of these conditions does not hold, and try to prove that the model is non-identified. The proof of non-identification is conceptually simple: we begin with an arbitrary parametrization of the model and show that, under the specified conditions, it is possible to construct another, distinct parametrization that generates exactly the same covariance matrix. This proves that the model is non-identified.

Figure 6.1: (a) the simplest non-identified model, in which two variables are connected by both a directed and a bidirected edge; (b) the generalization considered in Theorem 6.
6.1 Violating the First Condition
In this section we analyze two sets of conditions that prevent the existence of a sufficiently large Auxiliary Set for a given variable. In both cases we can show that they imply the non-identification of the underlying model. These conditions, however, do not provide a complete characterization of the situations in which the first condition of the Auxiliary Sets method does not hold; at the end of this section, we give an example that illustrates this fact. At this point, such a characterization is the major obstacle to obtaining a more general condition for non-identification.
The simplest example of non-identification, briefly discussed in Section 3.5.2, consists of a model in which a pair of variables is connected by both a directed and a bidirected edge pointing to the same variable, and there is no other edge pointing to that variable in the causal diagram (see Figure 6.1(a)).
The next theorem provides a simple generalization of this situation. We assume that, for a fixed variable, there exists an Auxiliary Set with not enough variables. We also assume that every edge connecting a variable of the Auxiliary Set and a variable outside it does not point to the latter (see Figure 6.1(b)). Under these conditions, it is possible to show that no Auxiliary Set for the fixed variable is larger than the given one. The theorem proves that any such model is non-identified.
Theorem 6 Let Y be an arbitrary variable in model M, and let S be an Auxiliary Set for Y. Assume that:

(i) S does not contain enough variables;

(ii) S only contains non-descendants of Y;

(iii) there exists no edge between some variable of S and a variable outside S pointing to the latter.

Then, model M is non-identified.
Proof: Let the λ's denote the parameters of the edges incident to Y. First, we establish that, for every pair of variables, their correlation depends on the values of the λ's only through the correlations between Y and the members of its Auxiliary Set; that is, any modification of the values of the λ's that does not change those correlations also does not change the correlation of the pair. This is formally stated as follows:

Lemma 6 Let two arbitrary variables in model M be given. Then, their correlation can be expressed as a linear function of the correlations between Y and the members of its Auxiliary Set, in which the independent term and the coefficients do not depend on the parameters λ.
The proof of the lemma is given in Appendix B.
Now, fix an arbitrary parametrization of model M. This parametrization induces a unique correlation coefficient for each pair of variables. According to Wright's decomposition, the correlations between Y and the members of its Auxiliary Set depend on the values of the λ's through a system of linear equations. Since the Auxiliary Set does not contain enough variables, this system has more unknowns than equations, and so there exists an infinite number of solutions for the λ's. The values assigned to the λ's by the original parametrization correspond to one of these solutions; any other solution gives a distinct parametrization which, by Lemma 6, generates the same correlations. ∎
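The counting argument in this proof can be illustrated numerically. The sketch below is a hypothetical instance (the coefficient matrix and observed values are invented, not taken from any model in the text): with more unknown parameters than equations, a particular solution can be shifted along a null-space direction of the coefficient matrix without changing any observed correlation.

```python
import numpy as np

# Hypothetical coefficient matrix from Wright's decomposition:
# 2 observed correlations constraining 3 unknown edge parameters.
A = np.array([[1.0, 0.5, 0.0],
              [0.2, 1.0, 0.7]])
b = np.array([0.4, 0.3])          # invented observed correlations

# Minimum-norm particular solution of A x = b.
x0, *_ = np.linalg.lstsq(A, b, rcond=None)

# A null-space direction of A: moving along it changes the parameters
# but leaves every observed correlation untouched.
n = np.linalg.svd(A)[2][-1]

x1 = x0 + 2.5 * n                 # a second, distinct parametrization
print(np.allclose(A @ x0, b), np.allclose(A @ x1, b), np.allclose(x0, x1))
```

Since the null space is one-dimensional here, the set of indistinguishable parametrizations is a whole line, mirroring the "infinite number of solutions" in the proof.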
The next theorem considers a much weaker set of assumptions that still prevent the existence of an Auxiliary Set for Y with enough variables, and shows that any such model is non-identified.

Let Y be an arbitrary variable in model M, and let S be an Auxiliary Set for Y. Define the boundary of S as the subset of variables in S for which there exists a variable outside S connected to them by an edge.

Theorem 7 Assume that, for a given variable Y in model M, the following conditions hold:

(i) the Auxiliary Set S does not contain enough variables;

(ii) for any variable in the boundary of S, there exists no edge pointing to it;

(iii) for any variable in the boundary of S and any variable outside S, there is no edge pointing from the former to the latter.

Then, model M is non-identified.

Figure 6.2: (a) a causal diagram satisfying the conditions of Theorem 7; (b) a causal diagram in which they do not hold.
The proof of this theorem is given in Appendix B.
Figure 6.2(a) shows a causal diagram satisfying the conditions of Theorem 7. In this model, the only edges between a variable outside the Auxiliary Set for Y and a variable inside it are the two edges incident to its boundary, and they satisfy the conditions of the theorem.

Figure 6.2(b) shows a causal diagram where the conditions of Theorem 7 do not hold: one of the edges incident to the boundary violates them. However, it is still possible to show that there is no Auxiliary Set for Y with more than 5 variables.
6.2 Violating the Second Condition
In the previous section we studied the problem of identification under two sets of assumptions. In both cases we could show that, for every parametrization of the model, there exists an infinite number of distinct parametrizations that generate the same covariance matrix. We conjecture that this will be the case whenever the first condition of the method of Auxiliary Sets fails. Equivalently, we believe that the conditions that characterize an Auxiliary Set are also necessary for linear independence among the equations given by Wright's decompositions.

Surprisingly, the situation seems to be very different when the second condition fails. Next, we discuss an example in which every variable has a large enough auxiliary set, but the corresponding dependence graph contains a cycle. A simple algebraic analysis shows that, for almost every parametrization, there exists exactly one other distinct parametrization that generates the same covariance matrix.
At this point, we cannot yet provide conditions for the existence of only a finite number of parametrizations generating the same covariance matrix, so we leave the problem open. This seems to be an important question, however. According to the definition in Section 2.1, any such model would be considered non-identified. On the other hand, if there exists only a small number of parametrizations compatible with the observed correlations, the investigator may proceed with the SEM analysis and choose, based on domain knowledge, which parametrization is more appropriate for the case at hand.
Consider the model illustrated by Figure 6.3, whose edges carry parameters a, b, c, α, β, and γ. Applying Wright's decomposition to the correlation between each pair of variables, we obtain a system of equations in these parameters.

Figure 6.3: A model whose dependence graph contains a cycle: the variables X, Y, Z, and W are connected by edges with parameters a, b, c, α, β, and γ.
Using the equations for two of the correlations, we can express two of the parameters in terms of a third; similarly, two other equations give expressions for two more parameters. Substituting these expressions in one of the remaining equations and performing some algebraic manipulation, we obtain a quadratic equation in a single parameter, in which it is possible to show that the coefficient of the quadratic term does not vanish. This implies that there exist exactly two distinct parametrizations of this model that generate the same covariance matrix; indeed, it is not hard to exhibit two such parametrizations explicitly and to verify that they generate the same covariance matrix.
CHAPTER 7
Instrumental Sets
So far we have concentrated our efforts on questions related to the entire model, such as: "Is the model identified?", or "Does the model impose any constraint on the correlations among observed variables?"
In this chapter we focus our attention on the identification of specific subsets of parameters of the model. This goal is based on the observation that, even when we cannot
prove the identification of the model (or, when the model is actually non-identified),
we may still be able to show that the value of some parameters is uniquely determined
by the structural assumptions in the model and the observed data.
This type of result may be valuable in situations where, even though a model is
specified for all the observed variables, the main object of study consists of the relations among a small subset of those variables.
We also restrict ourselves to the identification of parameters associated with directed edges in the causal diagram. Those parameters represent the strength of (direct) causal relationships, and are usually more important than the spurious correlations (associated with bidirected edges).
The main result of the chapter is a sufficient condition for the identification of the parameters of a subset of the incoming directed edges of a variable Y (i.e., directed edges with arrowhead pointing to Y). An important characteristic of this condition is that it does not depend on the identification of any other parameter in the model.
This result actually generalizes the graphical version of the method of Instrumental Variables [Pea00a]. According to this method, the parameter of edge X → Y is identified if we can find a variable Z and a set of variables W satisfying specific d-separation conditions. A more detailed explanation of this method is given in the next section.

Our contribution is to extend this method to allow the use of multiple pairs (Z1, W1), ..., (Zk, Wk) to prove, simultaneously, the identification of the parameters of directed edges X1 → Y, ..., Xk → Y. As we will see later, there exist examples in which our method of Instrumental Sets proves the identification of a subset of parameters, but the application of the original method to the individual parameters would fail.
Before proceeding, we would like to call attention to a few aspects of the proof technique developed in this chapter, which seem to be of independent interest. The method imposes two main conditions for a set of variables to be an Instrumental Set. First, we require the existence of a number of unblocked paths with specific properties; this condition is very similar to the one in the GAV criterion, and basically ensures the linear independence of a system of equations. The second consists of d-separation conditions between Y and each Zi, given the variables in Wi.

Now, to be able to take advantage of these d-separation conditions, we have to work with partial correlations instead of the standard correlation coefficients used in Chapter 3. The problem, however, is that no technique similar to Wright's decomposition was available to express partial correlations as linear expressions in the parameters of interest.

This difficulty is overcome by developing a new decomposition of the partial correlation into a linear expression in terms of the correlations among the relevant variables. Applying Wright's decomposition to each of these correlation coefficients then leads to a linear equation in the parameters of the model. The d-separation conditions imply that only the parameters under study appear in these equations, and the result finally follows by showing that all the coefficients in the linear equations can be estimated from data.
7.1 Causal Influence and the Components of Correlation
This section introduces some concepts that will be used in the description of Instrumental Variables methods.
Intuitively, we say that variable X has causal influence on variable Y if changes in the value of X lead to changes in the value of Y. This notion is captured in the graphical language as follows.
An observed variable X has causal influence on variable Y if there exists at least one path in the causal diagram, consisting only of directed edges, going from X to Y. Such a path reflects the existence of variables V1, ..., Vk such that:

• X appears in the r.h.s. of the equation for V1;

• for i = 1, ..., k - 1, each Vi appears in the r.h.s. of the equation for Vi+1;

• Vk appears in the r.h.s. of the equation for Y.

Now, it is easy to see that any change in the value of X would propagate through the Vi's and affect the value of Y.
Similarly, an error term has causal influence on variable Y if either there exists a directed path to Y from the variable in whose equation the error term appears, or a path consisting of a bidirected edge concatenated with a directed path to Y (the error term can be viewed as sitting on top of the bidirected edge). Note that observed variables do not have causal influence on error terms, and the causal relationships among error terms are not specified by the model.
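In graph terms, the definition above is just reachability along directed edges. A minimal sketch (the diagram and the function name are our own, not from the text):

```python
def has_causal_influence(children, x, y):
    """True iff there is a path of one or more directed edges from x to y,
    i.e., x propagates into y's structural equation through intermediates."""
    seen = set()
    stack = list(children.get(x, ()))   # variables whose equations contain x
    while stack:
        v = stack.pop()
        if v == y:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(children.get(v, ()))
    return False

# Hypothetical diagram: Z -> X -> Y, W -> Y
diagram = {"Z": {"X"}, "X": {"Y"}, "W": {"Y"}}
print(has_causal_influence(diagram, "Z", "Y"))  # True
print(has_causal_influence(diagram, "Y", "Z"))  # False
```

The dictionary maps each variable to the variables whose equations it enters, so the search visits exactly the propagation chain described in the bullet list above.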
We also say that the causal influence of X (or of some error term) on Y is mediated by a variable Z if Z appears in every directed path from X to Y (and in every unblocked path starting with a bidirected edge, in the case of error terms).
The correlation between two observed variables can then be explained as the sum of two components:

• the causal influence of one variable on the other; and,

• the correlation created by other variables (or error terms) which have causal influence on both (sometimes called spurious correlation).
The problem of identification consists in assessing the strength of the (direct)
causal influences of one variable on another, given the structure of the model and
the correlations between observed variables. This requires the ability to separate the
portions of the correlation contributed by each of the components above.
7.2 Instrumental Variable Methods
The traditional definition qualifies variable
and effect
if [Pea00a]:
as instrumental, relative to a cause
1. Every error term with causal influence on
of
2.
(i.e., uncorrelated);
is not independent of
not mediated by
is independent
.
and is created by the correlation between and , and the causal influence of on , represented by parameter
. This implies that ¡ . Property , allows us to obtain the value of by
writing . Hence, parameter is identified.
Property (1) implies that the correlation between
60
Figure 7.1: Typical Instrumental Variable: Z → X (parameter a), X → Y (parameter c).
Figure 7.1 shows a typical example of the use of an instrumental variable. In this model, it is easy to verify that variable Z has properties (1) and (2). Note that this causal diagram could be just a portion of a larger model; but, as long as properties (1) and (2) hold, variable Z can be used to obtain the identification of parameter c.
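The identity ρZY = c · ρZX behind this argument can be checked by simulation. The sketch below is our own toy instantiation of Figure 7.1, with an explicit latent confounder standing in for a bidirected arc between X and Y; all coefficient values are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000
a, c = 0.8, 1.5                          # assumed edge coefficients

u = rng.normal(size=n)                   # latent confounder: the X <-> Y arc
z = rng.normal(size=n)                   # the instrument
x = a * z + u + rng.normal(size=n)
y = c * x + 2.0 * u + rng.normal(size=n)

# Naive regression slope is biased by the confounder u ...
naive = np.cov(x, y)[0, 1] / np.var(x)
# ... while the IV ratio isolates c, because z is uncorrelated with
# every error term influencing y except through x.
iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
print(round(naive, 2), round(iv, 2))
```

The naive estimate overshoots the true value of c, while the instrumental ratio recovers it up to sampling noise.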
A generalization of the method of Instrumental Variables (IV) is offered through the use of conditional IVs. A conditional IV is a variable Z that does not have properties (1) and (2) above, but such that, after conditioning on a set of variables W, these properties hold (with the independence statements replaced by conditional independence). When such a pair (Z, W) is found, the causal influence of X on Y is identified and given by the ratio of the partial correlations of Z with Y and with X, given W.
Now, given the graphical interpretation of causal influence provided in Section 7.1, we can express the properties of a conditional IV in graphical terms using d-separation [Pea00b]:

Let M be a model, and let G denote its causal diagram. Variable Z is a conditional IV relative to edge X → Y if there exists a set of variables W (possibly empty) satisfying:

1. W contains only non-descendants of Y;

2. W d-separates Z from Y in the subgraph Gc obtained from G by removing edge X → Y;

3. W does not d-separate Z from X in Gc.
Figure 7.2: Conditional IV Examples: (a) the original causal diagram; (b) the subgraph obtained by removing the edge X → Y.
As an example, let us verify that Z is a conditional IV for the edge X → Y in the model of Figure 7.2(a). The first step consists in removing the edge X → Y to obtain the subgraph Gc, shown in Figure 7.2(b). Now, it is easy to see that, after conditioning on variable W, Z becomes d-separated from Y, but not from X, in Gc.
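Checks like this can be automated with the standard moralization test for d-separation (restrict to the ancestral subgraph, moralize, delete the conditioning set, and test undirected connectivity). The DAG below is a hypothetical diagram loosely modeled on Figure 7.2, with edges W → Z, W → Y, Z → X, and X → Y; the structure is our assumption:

```python
def parents(g, v):
    return {u for u, ch in g.items() if v in ch}

def ancestral(g, nodes):
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents(g, stack.pop()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(g, x, y, z):
    """True iff z d-separates x from y in the DAG g (children-dict)."""
    a = ancestral(g, {x, y} | set(z))
    und = {v: set() for v in a}              # moral graph, undirected
    for u in a:
        for w in g.get(u, ()):
            if w in a:
                und[u].add(w), und[w].add(u)
    for v in a:                               # marry co-parents
        ps = [p for p in parents(g, v) if p in a]
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                und[ps[i]].add(ps[j]), und[ps[j]].add(ps[i])
    seen, stack = {x}, [x]                    # search avoiding z
    while stack:
        v = stack.pop()
        if v == y:
            return False
        for w in und[v] - set(z) - seen:
            seen.add(w)
            stack.append(w)
    return True

g  = {"W": {"Z", "Y"}, "Z": {"X"}, "X": {"Y"}, "Y": set()}
gc = {"W": {"Z", "Y"}, "Z": {"X"}, "X": set(), "Y": set()}  # X -> Y removed
print(d_separated(gc, "Z", "Y", {"W"}))   # W separates Z from Y in Gc
print(d_separated(gc, "Z", "X", {"W"}))   # ...but not from X
print(d_separated(g,  "Z", "Y", {"W"}))   # in the full graph they stay connected
```

The three queries mirror conditions 2 and 3 of the conditional-IV criterion on this toy diagram.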
7.3 Instrumental Sets
Before introducing our method of Instrumental Sets, let us analyze an example where
the procedure above fails.
Consider the model in Figure 7.3. A visual inspection of this causal diagram shows that variable Z1 does not qualify as a conditional IV for the edge X1 → Y. The problem here is that Z1 cannot be d-separated from Y, no matter how we choose the conditioning set W: if W does not include Z2, then the path from Z1 to Y through Z2 remains open; however, including Z2 in W would open another path between Z1 and Y.

By symmetry, we conclude that Z2 does not qualify as a conditional IV for the edge X2 → Y either. Hence, the identification of parameters c1 and c2 cannot be proved using the graphical criterion for the method of conditional IVs.
However, observe the subgraph of Figure 7.3, obtained by deleting edges X1 → Y and X2 → Y. It is easy to verify that properties (1) and (2) hold for Z1 and Z2 in this subgraph (taking W to be empty for both Z1 and Z2). Thus, the idea is to let Z1 and Z2 form an Instrumental Set, and to try to prove the identification of parameters c1 and c2 simultaneously.

Figure 7.3: Simultaneous use of two IVs.
Note that the d-separation conditions alone are not sufficient to guarantee the identification of the parameters: in a variant of the model of Figure 7.3, variables Z1 and Z2 also become d-separated from Y after removing the edges X1 → Y and X2 → Y, and yet parameters c1 and c2 are non-identified. Similarly to the GAV criterion, we have to require the existence of specific paths between each of the Z's and Y.
Next, we give a precise definition of Instrumental Sets, in terms of graphical conditions, and state the main result of this chapter.
Definition 4 (Instrumental Sets) Let X1, ..., Xk be an arbitrary subset of the parents of Y. The set Z1, ..., Zk is said to be an Instrumental Set relative to Y and X1, ..., Xk if there exist triplets (Z1, W1, p1), ..., (Zk, Wk, pk) such that:

(i) for i = 1, ..., k, variable Zi and the elements of Wi are non-descendants of Y;

(ii) let Gc be the subgraph obtained by deleting edges X1 → Y, ..., Xk → Y from the causal diagram of the model; then, for i = 1, ..., k, the set Wi d-separates Zi from Y in Gc, but Wi does not block path pi;

(iii) for i = 1, ..., k, pi is an unblocked path between Zi and Y including the edge Xi → Y;

(iv) for i ≠ j, the only possible common variable of paths pi and pj (other than Y) is Xi; in this case, both pi and pj must point to Xi.

Figure 7.4: More examples of Instrumental Sets.

Theorem 8 If Z1, ..., Zk is an Instrumental Set relative to variable Y and the set of parents X1, ..., Xk, then the parameters of edges X1 → Y, ..., Xk → Y are identified, and can be computed by solving a system of linear equations.
The proof of the theorem is given in Appendix C.
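The flavor of Theorem 8 can be illustrated by simulation: each instrument contributes one linear equation relating observed covariances to the unknown edge parameters, and solving the resulting system recovers them jointly. The model below is a hypothetical instantiation with explicit latent confounders standing in for bidirected arcs; all structure and coefficients are our assumptions, not the exact model of Figure 7.3:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000
c1, c2 = 1.0, -0.5                       # target edge parameters X1->Y, X2->Y

z1 = rng.normal(size=n)
z2 = rng.normal(size=n)
u1 = rng.normal(size=n)                  # latent: X1 <-> Y
u2 = rng.normal(size=n)                  # latent: X2 <-> Y
x1 = 0.9 * z1 + 0.3 * z2 + u1 + rng.normal(size=n)
x2 = 0.4 * z1 + 0.8 * z2 + u2 + rng.normal(size=n)
y = c1 * x1 + c2 * x2 + u1 + u2 + rng.normal(size=n)

# Each instrument Zi gives one linear equation:
#   cov(Zi, Y) = c1 * cov(Zi, X1) + c2 * cov(Zi, X2)
A = np.array([[np.cov(z1, x1)[0, 1], np.cov(z1, x2)[0, 1]],
              [np.cov(z2, x1)[0, 1], np.cov(z2, x2)[0, 1]]])
b = np.array([np.cov(z1, y)[0, 1], np.cov(z2, y)[0, 1]])
c_hat = np.linalg.solve(A, b)
print(c_hat.round(2))
```

Neither instrument would identify its parameter alone here, but the 2×2 system is non-singular, which is the role played by the path conditions of the definition.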
Figure 7.4 shows more examples in which the method of conditional IVs fails, but our new criterion is able to prove the identification of the parameters. In particular, model (a) is a bow-free model, and thus is completely identifiable. Model (b) illustrates an interesting case in which variable X2 is used as the instrument for c1 while X1 is the instrument for c2. Finally, in model (c), we have an example in which the parameter of edge X3 → Y is non-identifiable, and still the method can prove the identification of c1 and c2.
CHAPTER 8
Discussion and Future Work
An important contribution of this work was to offer a new approach to the problem
of identification in SEM, based on the graphical analysis of the causal diagram of the
model. This approach allowed us to obtain powerful sufficient conditions for identification, and a new method of computing correlation constraints.
This new approach has important advantages over existing techniques. Traditional
methods [Dun75, Fis66] are based on algebraic manipulation of the equations that
define the model. As a consequence, they have to handle the two types of structural
assumptions contained in the model (e.g., (a) which variables appear in each equation;
and, (b) how the error terms are correlated with each other) in different ways, which
makes the analysis more complicated. The language of graphs, on the other hand,
allows us to represent both types of assumptions in the same way, namely, by the
presence or absence of edges in the causal diagram. This permits a uniform treatment
in the analysis of identification.
Another important advantage of our approach relates to conditional independencies implied by the model. Most existing methods [Fis66, BT84, Pea00b] strongly
rely on conditional independencies to prove that a model is identified. As a consequence, such methods are not very informative when the model has few such relations.
Since we do not make direct use of conditional independencies in our derivations, we can prove identification in many cases where most methods fail. Even the method of instrumental sets, which involves a d-separation test, is not very sensitive to conditional independencies, because the tests are performed on a potentially much sparser graph.
The sufficient conditions for identification obtained in this work correspond to the most general criteria currently available to SEM investigators. To the best of our knowledge, this is also the first work that provides necessary conditions for identification. Although these results answer many questions in practical applications of SEM, finding a necessary and sufficient condition for identification is still an outstanding open problem from a theoretical perspective.
Another important question that remains open is the application of our graphical
methods to non-recursive models. Since the basic tool for our analysis (i.e., Wright’s
decomposition of correlations) only applies to recursive models, this problem may
require the development of new techniques.
APPENDIX A
(Proofs from Chapter 3)
Proof of Theorem 3:

Fix an arbitrary variable with its Auxiliary Set. List the parents of this variable and, for each parent, let the corresponding parameter be that of the directed edge into the variable. Similarly, list the variables connected to it by a bidirected edge and, for each of them, the parameter of that bidirected edge. (Note that the two lists may have variables in common.)
Assume first that condition C1 can be applied to the pair, and consider any variable of the Auxiliary Set. Then, we can write the decomposition of its correlation with the fixed variable as a linear expression in the parameters of the edges incident to the latter, and we show that the coefficients in this expression are identified.

First, note that every unblocked path between the two variables that includes a directed edge into the fixed variable can be decomposed into an unblocked path ending at the corresponding parent, followed by that edge. Moreover, every unblocked path ending at a parent can be extended by the corresponding edge to give an unblocked path ending at the fixed variable. These facts imply that the coefficient of each directed-edge parameter is identified and given by the correlation with the corresponding parent.

Now, let an arbitrary bidirected edge incident to the fixed variable be given. If its other endpoint is the chosen variable itself, then its coefficient is 1, because there is only one unblocked path including this edge, which consists precisely of the edge. In the other case, the coefficient is 0, because any unblocked path including the edge should consist of a chain from the chosen variable to the endpoint concatenated with the edge, but no such chain exists. Hence, every coefficient is identified.
Assume now that condition C2 is applied. Consider the relevant variable and the set of its parents, with the parameter of each directed edge into it. The idea is to replace the equation corresponding to the decomposition of the correlation in the system by a suitable linear combination of the equations for its parents.
This equation is also linearly independent of all the other ones in the system. Moreover, since many of the unblocked paths in question can be obtained by concatenating some edge with an unblocked path ending at the corresponding parent, many terms are cancelled out in the r.h.s. We also observe that the remaining term in the r.h.s. corresponds to paths that do not include such edges, and is zero in the non-descendant case.
The next lemma proves the identification of these coefficients.

Lemma 7 Each coefficient in the r.h.s. of (*) is identified and has value either 0 or 1.
Proof:
Fix a bidirected edge with parameter . We consider a few separate cases:
and is a non-descendant of .
Any unblocked path between and including edge must consist of a chain from to concatenated with the edge . Since is a non-descendant
of , no such chain exists. Thus, the coefficient of the parameter in the decomposition of the correlation is zero. Since each parent is also a non-descendant, the coefficient in the decomposition of each parent's correlation is also zero. Hence, the coefficient in the linear combination is 0.

In the second case, note that there is only one unblocked path including the edge, which consists precisely of the edge itself; thus, the coefficient in the decomposition of the correlation is 1. Again, each parent is a non-descendant, so the corresponding coefficients are zero. Hence, the coefficient in the linear combination is 1.

Finally, consider the case in which the endpoint is a descendant.
Consider the set of chains to the fixed variable that do not include the chosen variable (for non-descendants this is the set of all such chains). Clearly, each of these chains must include a parent of the fixed variable; thus, we can partition the chains according to the last parent appearing in each of them. Observing that the resulting sums are precisely the coefficients of the parameter in the decompositions of the corresponding correlations, we conclude that the coefficient in the linear combination is again identified. ∎
Now, it just remains to analyze the coefficients ’s. The next lemma takes care of
the case when is a non-descendant of .
Lemma 8 Let be a non-descendant of . Then, the coefficient of Æ in the decomposition of is identified and given by .
Proof: Recall that the coefficient in question is the parameter of a directed edge into the fixed variable. Then, the lemma follows from the facts that every unblocked path ending at the corresponding parent can be extended by this edge to give an unblocked path ending at the fixed variable, and that those are all the unblocked paths including this edge. ∎

The identification of the coefficients in the non-descendant case then follows from the identification of the correlations above.

Now, consider the identification of the coefficients in the descendant case. First, we observe that any unblocked path that contains a directed edge pointing to the relevant variable can be decomposed into an unblocked path and such an edge. Thus, all terms associated with such paths are cancelled out in (*).
This immediately gives the desired conclusion for the independent term, since every unblocked path between the two variables must either end with a directed edge pointing to the fixed variable, or be a bidirected edge between them; in the latter case, however, the other endpoint would be in the Auxiliary Set, which would create a cycle in the dependency graph.

For the remaining coefficients, each term corresponds to a path consisting of the concatenation of a bidirected edge and a chain. Since the relevant variable and all intermediate variables in the chain are solved before the current one, all parameters in this path are identified. But this implies that the coefficient is identified. ∎
APPENDIX B
(Proofs from Chapter 6)
Proof of Lemma 6:

Let the parameters of the relevant edges be given. Consider first the set of unblocked paths between the two variables that do not include any such edge; the independent term is given by the sum over this set, and clearly does not depend on the values of these parameters. Now, consider the set of unblocked paths that do include some such edge, with the edge appearing in the first subpath (the other case is similar).
Let an arbitrary path from the latter set be given. Then, it follows from the assumptions of the theorem that the path must contain at least one variable from the Auxiliary Set. We let the first such variable (i.e., closest to the starting endpoint) be chosen, and divide the path into three segments p1, p2, and p3, where both p1 and p3 can be null if the chosen variable coincides with an endpoint. We also note that p1 is a chain (by the assumptions of the theorem), and p3 is a chain as well, because every relevant edge points towards the fixed variable and the path is unblocked.
Now, for each relevant variable, consider the set of all chains from it to the fixed variable that do not contain any other variable of the Auxiliary Set, and the set of all such chains without this restriction. Also, for each pair, consider the set of unblocked paths between them.
Proposition 1 Pick one path from each of the three sets above. Then, their concatenation gives a valid unblocked path between the two variables if and only if the first and last pieces do not have any intermediate variable in common.
Proof: If the two pieces have a common variable, then clearly the concatenation gives an invalid path.
In the other case, the proposition follows by observing that every intermediate variable in the middle piece belongs to the Auxiliary Set (by the assumptions of the theorem), and that intermediate variables in the first and last pieces cannot belong to it (by definition, and by the assumptions of the theorem). ∎
The arguments above show that the correlation can be written in the stated form, where each of the coefficients is independent of the values of the parameters in question; the lemma follows. ∎
Proof of Theorem 7:

For each relevant variable, define a first collection consisting of the following unblocked paths between it and the fixed variable: (a) all chains whose intermediate variables belong to the Auxiliary Set; and (b) all unblocked paths that point to the fixed variable through a descendant of it. For each relevant variable, define also a second collection consisting of all chains to the fixed variable whose intermediate variables belong to the Auxiliary Set.

We first show that the correlation of each relevant variable with the fixed variable can be expressed as a linear combination of the sums over these collections; these sums are the terms referred to below.

Case 1.

Consider the set of unblocked paths between the variable and the fixed variable, so that the correlation is the sum over these paths. Clearly, this sum splits into the contribution of the first collection and a remainder; thus, we just need to show that the remainder is a linear combination of the terms defined above. There are two types of remaining paths:

1) chains with some intermediate variable that does not belong to the Auxiliary Set;

2) unblocked paths that include a descendant of the fixed variable and point to it.
Let us consider paths of type (1) first.

Proposition 2 Every path of type (1) contains a variable from the boundary.
Proof: Assume that the proposition does not hold for some path of type (1). Then, the path must include at least one variable that does not belong to the Auxiliary Set. Let the last such variable (i.e., closest to the fixed variable) be chosen, and consider its neighbor in the path. Since the remaining subpath is a chain, it follows from the conditions of the theorem that the neighbor belongs to the Auxiliary Set; but then the connecting edge witnesses that the chosen variable belongs to the boundary, which contradicts the initial assumption. ∎

Fix a path of type (1), and let the last variable from the boundary in it be chosen. Let an arbitrary chain from the corresponding collection be given.
Proposition 3 The concatenation gives a valid chain to the fixed variable.

Proof: Since each piece is a chain, the concatenation is a chain as well. Such a chain must be a valid one, otherwise we would have a contradiction to the recursiveness of the model. ∎
It follows from the two propositions above that the contribution of paths of type (1) can be expressed as a linear combination of the terms defined above.
Now, we consider the paths of type (2).

Proposition 4 Every path of type (2) contains a variable such that: (i) it is a descendant of the fixed variable; and (ii) the path points to it.

Proof: Assume that the proposition does not hold for some path of type (2). Then, the path must include at least one variable that does not satisfy (i) and (ii) above. Let the last such variable (i.e., closest to the fixed variable) be chosen, and consider the variable adjacent to it in the path. Since the path is unblocked and points to the chosen variable, the remaining subpath must be a chain. This fact, together with the conditions of the theorem, implies that the adjacent variable satisfies the requirements, which contradicts the initial assumption. ∎

Fix a path of type (2), and let the last such variable in it be chosen.
Proposition 5 Every intermediate variable in the subpath considered is an ancestor of the fixed variable.

Proof: Let the last variable in the subpath such that the remainder is not a chain be chosen. Clearly, any intermediate variable in the chain portion is an ancestor of the fixed variable. Moreover, the subpath preceding the chosen variable must point to it; since the whole path is unblocked, that subpath must itself be a chain. But this implies that the chosen variable and every intermediate variable preceding it is an ancestor of it, and since it is in turn an ancestor of the fixed variable, the proposition holds. ∎
Let an arbitrary path from the corresponding collection be given.

Proposition 6 The concatenation gives a valid unblocked path between the variable and the fixed variable.

Proof: Since one of the pieces is a chain, the path obtained from the concatenation is unblocked. The validity of the path follows from the facts that every intermediate variable in the first piece is an ancestor of the fixed variable, that every intermediate variable in the second is a descendant of it, and from the recursiveness of the model. ∎
It follows from the two propositions above that the contribution of paths of type (2) can be expressed as a linear combination of the terms defined above.

Case 2.
Consider the set of unblocked paths between the variable and the fixed variable, so that the correlation is the sum over these paths. Clearly, this sum splits into two terms; thus, we only need to show that the second term on the right-hand side is a linear combination of the terms defined above.
There are two types of paths in
:
1) chains from to with some intermediate variable that does not belong to ;
2) unblocked paths between and that point to .
Let us consider first paths of type (1).

Proposition 7 Every path of type (1) contains an intermediate variable from the boundary.
Proof: (same as the proof of Proposition 2) ∎
Fix a path of type (1), and let the last intermediate variable from the boundary in it be chosen. Let an arbitrary chain from the corresponding collection be given.
Proposition 8 The concatenation of gives a valid chain from to .
Proof: (same as the proof of Proposition 3) ∎
It follows from the two propositions above that the contribution of paths of type (1) can be expressed as a linear combination of the terms defined above.
Now, we consider paths of type (2), and further divide them into:

a) paths that have an intermediate variable from the boundary such that the remainder of the path is a chain to the fixed variable;

b) all the remaining paths.

Let a path of type (2a) be given, and let the first such variable from the boundary in it (i.e., closest to the starting endpoint) be chosen. Let an arbitrary path from the corresponding collection be given.
Proposition 9 The concatenation
gives a valid unblocked path between
and .
Proof: Since
one of the pieces is a chain, the path obtained from the concatenation is unblocked. Now, let an intermediate variable of the relevant piece be given; then it is a descendant of the fixed variable. But, by definition, no path in the collection contains such variables. Thus, it follows that the concatenation produces a valid path. ∎
It follows from the two propositions above that ´¾µ can be expressed
as a linear combination of the ’s.
Now, consider paths of type (2b).

Proposition 10 Every path p of type (2b) contains an intermediate variable V from S such that p[V ~ Y] is a chain from V to Y.

Proof: Let p be a path of type (2b), and let V be the first variable in p such that p[Z ~ V] is not a chain. Clearly, such variable must exist, otherwise p would be a chain from Z to Y that does not include any edge from S. Moreover, p[Z ~ V] points to V, and so p[V ~ Y] must be a chain from V to Y. If V belongs to S, then we are done. So, we consider the two remaining cases.

Assume that V = Z. Then, the proposition easily follows from the facts that p[V ~ Y] is a chain from Z to Y, and that there is no edge pointing to Z.

Finally, if V = Y, then let U be the variable adjacent to Y in p. Since edge (U, Y) points to Y, U cannot be an intermediate variable of a chain from V to Y, because then p would be of type (2a). Thus, we must have U in S, and the condition of the theorem gives that the edge between U and Y is a directed edge from U to Y. Hence, p[U ~ Y] is a chain from U to Y. □

Fix a path p of type (2b), and let V be the last intermediate variable from S in p. Let q be an arbitrary path from Q(V).
Proposition 11 The concatenation of q and p[V ~ Y] gives a valid unblocked path between Z and Y.

Proof: Since p[V ~ Y] is a chain from V to Y, the path obtained from the concatenation is unblocked. Now, let U be the last variable in p such that p[U ~ Y] is a chain from U to Y (if there is no such variable, we take U = V). It follows that, since p is of type (2b), p[U ~ Y] does not contain any variable from S. Moreover, every intermediate variable in p[U ~ Y] is an ancestor of Y. But, by definition, intermediate variables in any path from Q(V) must belong to S and be descendants of V. Hence, the concatenation produces a valid path. □

It follows from the two propositions above that Σ^(2b) can be expressed as a linear combination of the λ's.
Case 3:

Let P denote the set of unblocked paths between Z and Y, so that ρ_{ZY} = Σ_{p in P} T(p). There are two types of paths in P:

1) chains from Z to Y;
2) unblocked paths between Z and Y that point to Z.

Let us consider paths of type (1) first.

Proposition 12 Every path p of type (1) contains an intermediate variable from S.

Proof: Follows from the fact that there is no edge pointing to Y whose tail lies outside S. □

Fix a path p of type (1), and let V be the last intermediate variable from S in p. Let q be an arbitrary path from Q(V).

Proposition 13 The concatenation of q and p[V ~ Y] gives a valid chain from Z to Y.

Proof: (same as proof of Proposition 3) □

It follows from the two propositions above that Σ^(1) can be expressed as a linear combination of the ρ's.
Now, we consider paths of type (2), and further divide them into:

a) paths p that have an intermediate variable V from S such that p[V ~ Y] is a chain from V to Y;
b) all the remaining paths.

Fix a path p of type (2a), and let V be the first variable from S in p. Let q be an arbitrary path from Q(V).

Proposition 14 The concatenation of q and p[V ~ Y] gives a valid unblocked path between Z and Y.

Proof: (same as proof of Proposition 9) □

It follows from the two propositions above that Σ^(2a) can be expressed as a linear combination of the ρ's.

Now, we consider paths of type (2b).

Proposition 15 Every path p of type (2b) contains an intermediate variable V from S such that p[V ~ Y] is a chain from V to Y.

Proof: (same as proof of Proposition 10) □

Fix a path p of type (2b), and let V be the last intermediate variable from S in p. Let q be an arbitrary path from Q(V).

Proposition 16 The concatenation of q and p[V ~ Y] gives a valid unblocked path between Z and Y.

Proof: (same as proof of Proposition 11) □

It follows from the two propositions above that Σ^(2b) can be expressed as a linear combination of the λ's. □
APPENDIX C

(Proofs from Chapter 7)

C.1 Preliminary Results

C.1.1 Partial Correlation Lemma

The next lemma provides a convenient expression for the partial correlation coefficient of Y and Z, given W_1, ..., W_k, denoted ρ_{YZ.W_1...W_k}. The proof is given in Section C.3.

Lemma 9 The partial correlation ρ_{YZ.W_1...W_k} can be expressed as the ratio:

    ρ_{YZ.W_1...W_k} = φ(Y, Z, W_1, ..., W_k) / ψ(Y, W_1, ..., W_k)        (C.1)

where φ and ψ are functions of the correlations among Y, Z, W_1, ..., W_k, satisfying the following conditions:

(i) ψ does not depend on Z.

(ii) φ is linear on the correlations ρ_{ZY}, ρ_{ZW_1}, ..., ρ_{ZW_k}, with no constant term.

(iii) The coefficients of the correlations ρ_{ZY}, ρ_{ZW_1}, ..., ρ_{ZW_k} in φ are polynomials on the correlations among the variables Y, W_1, ..., W_k. Moreover, the coefficient of ρ_{ZY} has constant term equal to 1, and the coefficients of ρ_{ZW_1}, ..., ρ_{ZW_k} are linear on the correlations ρ_{YW_1}, ..., ρ_{YW_k}, with no constant term.

(iv) ψ is a polynomial on the correlations among the variables Y, W_1, ..., W_k, with constant term equal to 1.
C.1.2 Path Lemmas

The following lemmas explore some consequences of the conditions in the definition of Instrumental Sets.

Lemma 10 W.l.o.g., we may assume that, for i ≠ j, paths p_i and p_j do not have any common variable other than (possibly) Y.

Proof: Assume that paths p_i and p_j have some variables in common, different from Y. Let V be the closest variable to Z_i in path p_i which also belongs to path p_j. We show that after replacing triple (Z_i, W_i, p_i) by triple (Z_i, W_i, p_i[Z_i ~ V] + p_j[V ~ Y]), the conditions in the definition still hold.

It follows from the conditions that subpath p_i[Z_i ~ V] must point to V. Since p_j is unblocked, subpath p_j[V ~ Y] must be a directed path from V to Y.

Now, variable Z_i cannot be a descendant of Y, because p_j[V ~ Y] is a directed path from V to Y, and Z_i is a non-descendant of Y. Thus, this condition still holds.

Consider the causal graph G̅. Assume that there exists a path between Z_i and Y witnessing that W_i does not d-separate Z_i from Y in G̅. Since p_j[V ~ Y] is a directed path from V to Y, we can always find another path witnessing that W_i does not d-separate Z_i from Y (for example, if the two paths do not have any variable in common other than V, then we can just take their concatenation). But this is a contradiction, and thus it is easy to see that the d-separation condition still holds.

The remaining condition follows from the fact that p_i and p_j point to Y. □

In the following, we assume that the conditions of Lemma 10 hold.
Lemma 11 For all i, there exists no unblocked path between Z_i and Y, different from p_i, which includes edge X_i -> Y and is composed only by edges from p_1, ..., p_k.

Proof: Let p be an unblocked path between Z_i and Y, different from p_i, and assume that p is composed only by edges from p_1, ..., p_k.

According to the conditions, Z_i does not appear in any path p_j with j ≠ i. Thus, p must start with some edges of p_i. Since p is different from p_i, it must contain at least one edge from p_1, ..., p_{i-1}, p_{i+1}, ..., p_k. Let (V_1, V_2) denote the first edge in p which does not belong to p_i.

From Lemma 10, it follows that variable V_1 must belong to some p_j, j ≠ i, and both subpath p[Z_i ~ V_1] and edge (V_1, V_2) must point to V_1. But this implies that p is blocked by V_1, which contradicts our assumptions. □

The proofs for the next two lemmas are very similar to the previous one, and so are omitted.

Lemma 12 For all i, there is no unblocked path between Y and some W in W_i composed only by edges from p_1, ..., p_k.

Lemma 13 For all i, j, with j > i, there is no unblocked path between Z_i and Y including edge X_j -> Y, composed only by edges from p_1, ..., p_k.

C.2 Proof of Theorem 8

C.2.1 Notation and Basic Linear Equations

Fix a variable Y in the model. Let Z_1, ..., Z_m be the set of all non-descendants of Y which are connected to Y by an edge (directed, bidirected, or both). Define the following set of edges with an arrowhead at Y:

    E_Y = { edges between some Z_l and Y }

Note that for some l there may be more than one edge between Z_l and Y (one directed and one bidirected). Thus, |E_Y| >= m. Let λ_1, ..., λ_h, h >= m, denote the parameters of the edges in E_Y.

It follows that edges X_1 -> Y, ..., X_k -> Y belong to E_Y, because X_1, ..., X_k are clearly non-descendants of Y. W.l.o.g., let λ_j be the parameter of edge X_j -> Y, for j = 1, ..., k, and let λ_{k+1}, ..., λ_h be the parameters of the remaining edges in E_Y.

Let Z be any non-descendant of Y. Wright's equation for the pair (Z, Y) is given by

    ρ_{ZY} = Σ_{paths p} T(p)        (C.2)

where each term T(p) corresponds to an unblocked path p between Z and Y. The next lemma proves a property of such paths.

Lemma 14 Let Y be a variable in a recursive model, and let Z be a non-descendant of Y. Then, any unblocked path between Z and Y must include exactly one edge from E_Y.

Lemma 14 allows us to write Eq. (C.2) as

    ρ_{ZY} = Σ_{j=1}^{h} a_j λ_j        (C.3)

Thus, the correlation between Z and Y can be expressed as a linear function of the parameters λ_1, ..., λ_h, with no constant term.
Consider a triple (Z_i, W_i, p_i), and let W_i = {W_1, ..., W_r}.[1] From Lemma 9, we can express the partial correlation of Z_i and Y given W_i as:

    ρ_{YZ_i.W_i} = φ(Y, Z_i, W_1, ..., W_r) / ψ(Y, W_1, ..., W_r)        (C.4)

where function φ is linear on the correlations ρ_{Z_iY}, ρ_{Z_iW_1}, ..., ρ_{Z_iW_r}, and ψ is a function of the correlations among the variables given as arguments. We abbreviate φ(Y, Z_i, W_1, ..., W_r) by φ_i, and ψ(Y, W_1, ..., W_r) by ψ_i.

[1] To simplify the notation, we assume that W_i = {W_1, ..., W_r}, for each i.

We have seen that the correlations ρ_{Z_iY}, ρ_{Z_iW_1}, ..., ρ_{Z_iW_r} can be expressed as linear functions of the parameters λ_1, ..., λ_h. Since φ_i is linear on these correlations, it follows that we can express φ_i as a linear function of the parameters λ_1, ..., λ_h.

Formally, by Lemma 9, φ_i can be written as:

    φ_i = a_0 ρ_{Z_iY} + Σ_{l=1}^{r} a_l ρ_{Z_iW_l}        (C.5)

Also, for each of these correlations, we can write:

    ρ_{Z_iW_l} = Σ_{j=1}^{h} c_{lj} λ_j        (C.6)

(with ρ_{Z_iW_0} standing for ρ_{Z_iY}). Replacing each correlation in Eq. (C.5) by the expression given by Eq. (C.6), we obtain

    φ_i = Σ_{j=1}^{h} b_j λ_j        (C.7)

where the coefficients b_j are given by:

    b_j = Σ_{l=0}^{r} a_l c_{lj}        (C.8)

Lemma 15 The coefficients b_j in Eq. (C.7), when evaluated on G̅, are identically zero.

Proof: The fact that W_i d-separates Z_i from Y in G̅ implies that ρ_{YZ_i.W_i} = 0 in any probability distribution compatible with G̅ ([Pea00a], pg. 142). Thus, φ_i must vanish when evaluated in G̅. But this implies that the coefficient of each of the λ's in Eq. (C.7) must be identically zero.
Now, we show that the only difference between the evaluations of φ_i on the causal graphs G and G̅ consists on the coefficients of parameters λ_1, ..., λ_k.

First, observe that coefficients a_0, ..., a_r are polynomials on the correlations among the variables Y, W_1, ..., W_r. Thus, they only depend on the unblocked paths between such variables in the causal graph. However, the insertion of edges X_1 -> Y, ..., X_k -> Y in G̅ does not create any new unblocked path between any pair of Y, W_1, ..., W_r (and obviously does not eliminate any existing one). Hence, the coefficients a_l have exactly the same value in the evaluations of φ_i on G and G̅.

Now, let j be such that j > k. Note that the insertion of edges X_1 -> Y, ..., X_k -> Y in G̅ does not create any new unblocked path between Z_i and Y including the edge whose parameter is λ_j (and does not eliminate any existing one). Hence, coefficients c_{0j}, ..., c_{rj} have exactly the same value on G and G̅.

From the two previous facts, we conclude that, for j > k, the coefficient of λ_j in the evaluations of φ_i on G and G̅ have exactly the same value, namely zero.

Next, we argue that φ_i does not vanish when evaluated on G. Let j be such that j <= k. Note that there is no unblocked path between Z_i and Y in G̅ including edge X_j -> Y, because this edge does not exist in G̅. Hence, the coefficient of λ_j in the expression for the correlation ρ_{Z_iY} on G̅ must be zero. On the other hand, the coefficient of λ_j in the same expression on G is not necessarily zero. In fact, it follows from the conditions in the definition of Instrumental Sets that the coefficient of λ_i contains the term corresponding to path p_i. □

From Lemma 15, we get that φ_i is a linear function only on the parameters λ_1, ..., λ_k.
C.2.2 System of Equations

Rewriting Eq. (C.4) for each triple (Z_i, W_i, p_i), we obtain the following system Φ of linear equations on the parameters λ_1, ..., λ_k:

    Φ :  b_{11} λ_1 + ... + b_{1k} λ_k = ρ_{YZ_1.W_1} ψ_1
         ...
         b_{k1} λ_1 + ... + b_{kk} λ_k = ρ_{YZ_k.W_k} ψ_k

where the terms on the right-hand side can be computed from the correlations among the variables Y, Z_1, ..., Z_k, W_1, ..., W_r, estimated from data.
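Numerically, once the coefficients and right-hand sides have been estimated, solving Φ is an ordinary k-by-k linear solve. The values below are made up for a two-instrument system, purely for illustration:

```python
import numpy as np

# Hypothetical 2-instrument system in the spirit of Section C.2.2:
# row i encodes  sum_j A[i, j] * lambda_j = b[i], with entries computable
# from correlations estimated from data (the numbers here are invented).
A = np.array([[0.9, 0.2],
              [0.1, 0.8]])
b = np.array([0.75, 0.46])
lam = np.linalg.solve(A, b)  # unique solution iff det(A) != 0
assert np.isclose(np.linalg.det(A), 0.70)
assert np.allclose(A @ lam, b)
```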
Our goal is to show that Φ can be solved uniquely for the λ's, and so prove the identification of λ_1, ..., λ_k. The next lemma proves an important result in this direction. Let A denote the matrix of coefficients of Φ.

Lemma 16 det(A) is a non-trivial polynomial on the parameters of the model.

Proof: From Eq. (C.8), we get that each entry of A is given by

    A_{ij} = Σ_{l=0}^{r} a_{il} c_{lj}

where a_{il} is the coefficient of ρ_{Z_iW_l} (or ρ_{Z_iY}, if l = 0) in the linear expression for φ_i in terms of correlations (see Eq. (C.5)); and c_{lj} is the coefficient of λ_j in the expression for the correlation ρ_{Z_iW_l} in terms of the parameters (see Eq. (C.6)).

From property (iii) of Lemma 9, we get that a_{i0} has constant term equal to 1. Thus, we can write a_{i0} = 1 + â_{i0}, where â_{i0} represents the remaining terms of a_{i0}.

Also, from the condition of Theorem 8, it follows that c_{0i} contains the term t_i, the product of the parameters of the edges along p_i except λ_i. Thus, we can write c_{0i} = t_i + ĉ_{0i}, where ĉ_{0i} represents all the remaining terms of c_{0i}.
Hence, a diagonal entry A_{ii} of A can be written as

    A_{ii} = (1 + â_{i0}) t_i + (1 + â_{i0}) ĉ_{0i} + Σ_{l=1}^{r} a_{il} c_{li}        (C.9)

Now, the determinant of A is defined as the weighted sum, for all permutations π of (1, ..., k), of the product of the entries selected by π (entry A_{ij} is selected by permutation π if the i-th element of π is j), where the weights are +1 or -1, depending on the parity of the permutation. Then, it is easy to see that the term t_1 t_2 ... t_k appears in the product of permutation π_0 = (1, 2, ..., k), which selects all the diagonal entries of A.
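The permutation expansion used here is the Leibniz formula for the determinant. A direct, exponential-time transcription, useful for checking the argument on small matrices:

```python
from itertools import permutations
import numpy as np

def parity(perm):
    # Sign of a permutation: (-1) raised to the number of inversions.
    inv = sum(1 for i in range(len(perm))
                for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1.0 if inv % 2 else 1.0

def leibniz_det(M):
    # det(M) = sum over permutations pi of sign(pi) * prod_i M[i][pi(i)].
    n = len(M)
    return sum(parity(p) * np.prod([M[i][p[i]] for i in range(n)])
               for p in permutations(range(n)))

M = np.array([[2.0, 1.0, 0.0],
              [0.5, 3.0, 1.0],
              [1.0, 0.0, 1.5]])
assert np.isclose(leibniz_det(M), np.linalg.det(M))
```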
We prove that det(A) does not vanish by showing that t_1 ... t_k appears only once in the product of permutation π_0, and that it does not appear in the product of any other permutation.

Before proving those facts, note that, from the conditions of Lemma 10, for i ≠ j, paths p_i and p_j have no edge in common. Thus, every factor t_i is distinct from each other.

Proposition 17 Term t_1 ... t_k appears only once in the product of permutation π_0.

Proof: Let τ be a term in the product of permutation π_0. Then, τ has one factor corresponding to each diagonal entry of A. A diagonal entry of A can be expressed as a sum of three terms (see Eq. (C.9)).
Let S be the set of indices i such that the factor of τ corresponding to entry A_{ii} comes from the first term of Eq. (C.9) (i.e., (1 + â_{i0}) t_i).

Assume that the factor of τ corresponding to entry A_{ii} comes from the second term of Eq. (C.9) (i.e., (1 + â_{i0}) ĉ_{0i}). Recall that each term in ĉ_{0i} corresponds to an unblocked path between Z_i and Y, different from p_i, including edge X_i -> Y. However, from Lemma 11, any such path must include either an edge which does not belong to any of p_1, ..., p_k, or an edge which appears in some p_j, j ≠ i. In the first case, it is easy to see that τ must have a factor which does not appear in t_1 ... t_k. In the second, the parameter of an edge of some p_j, j ≠ i, must appear twice as a factor of τ, while it appears only once in t_1 ... t_k. Hence, τ and t_1 ... t_k are distinct terms.

Now, assume that the factor of τ corresponding to entry A_{ii} comes from the third term of Eq. (C.9) (i.e., Σ_{l=1}^{r} a_{il} c_{li}). From property (iii) of Lemma 9, a_{il} is a linear function on the correlations ρ_{YW_1}, ..., ρ_{YW_r}, with no constant term. Moreover, each such correlation can be expressed as a sum of terms corresponding to unblocked paths between Y and the respective W_l. Thus, every term in a_{il} has the term of an unblocked path between Y and some W_l as a factor. By Lemma 12, we get that any such path must include either an edge that does not belong to any of p_1, ..., p_k, or an edge which appears in some p_j. As above, in both cases τ and t_1 ... t_k must be distinct terms.

After eliminating all those terms from consideration, the remaining terms in the product of π_0 are given by the expression:

    Π_{i=1}^{k} (1 + â_{i0}) t_i

Since â_{i0} is a polynomial on the correlations among variables Y, W_1, ..., W_r, with no constant term, it follows that t_1 ... t_k appears only once in this expression. □
Proposition 18 Term t_1 ... t_k does not appear in the product of any permutation other than π_0.

Proof: Let π be a permutation different from π_0, and let τ be a term in the product of π.

Let S be the set of indices i such that π selects the diagonal entry in the i-th row of A. As before, for i in S, if the factor of τ corresponding to entry A_{ii} does not come from the first term of Eq. (C.9) (i.e., (1 + â_{i0}) t_i), then τ must be different from t_1 ... t_k. So, we assume that this is the case.

Since π is different from π_0, there is some row i in which π does not select the diagonal entry. Then, π must select some entry A_{ij}, with j ≠ i. Entry A_{ij} can be written as:

    A_{ij} = a_{i0} c_{0j} + Σ_{l=1}^{r} a_{il} c_{lj}

Assume that the factor of τ corresponding to entry A_{ij} comes from term a_{i0} c_{0j}. Recall that each term in c_{0j} corresponds to an unblocked path between Z_i and Y including edge X_j -> Y. Thus, in this case, Lemma 13 implies that τ and t_1 ... t_k are distinct terms.

Now, assume that the factor of τ corresponding to entry A_{ij} comes from term Σ_{l=1}^{r} a_{il} c_{lj}. Then, by the same argument as in the previous proof, terms τ and t_1 ... t_k are distinct. □

Hence, term t_1 ... t_k is not cancelled out and the lemma holds. □
C.2.3 Identification of λ_1, ..., λ_k

Lemma 16 gives that det(A) is a non-trivial polynomial on the parameters of the model. Thus, det(A) only vanishes on the roots of this polynomial. However, [Oka73] has shown that the set of roots of a polynomial has Lebesgue measure zero. Thus, system Φ has a unique solution almost everywhere.

It just remains to show that we can estimate the entries of the matrix of coefficients of system Φ from data. Let us examine again an entry A_{ij} of matrix A:

    A_{ij} = Σ_{l=0}^{r} a_{il} c_{lj}

From property (iii) of Lemma 9, the factors a_{il} in the expression above are polynomials on the correlations among the variables Y, W_1, ..., W_r, and thus can be estimated from data.

Now, recall that c_{0j} is given by the sum of terms corresponding to each unblocked path between Z_i and Y including edge X_j -> Y. Precisely, for each term t in c_{0j}, there is an unblocked path p between Z_i and Y including edge X_j -> Y, such that t is the product of the parameters of the edges along p, except for λ_j.

However, notice that for each unblocked path between Z_i and Y including edge X_j -> Y, we can obtain an unblocked path between Z_i and X_j, by removing edge X_j -> Y. On the other hand, for each unblocked path between Z_i and X_j, we can obtain an unblocked path between Z_i and Y, by extending it with edge X_j -> Y. Thus, factor c_{0j} is nothing else but ρ_{Z_iX_j}. It is easy to see that the same argument holds for c_{lj} with l > 0. Thus, c_{lj} = ρ_{W_lX_j}.

Hence, each entry of matrix A can be estimated from data, and we can solve the system of equations Φ to obtain the parameters λ_1, ..., λ_k.
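For a single instrument (k = 1) the system collapses to the classical instrumental-variable estimate λ = σ_{ZY} / σ_{ZX}. A simulation sketch, with a model and numbers invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
# Hypothetical model:  Z -> X -> Y, with a latent confounder U of X and Y,
# so the regression of Y on X is biased but the IV ratio recovers lambda.
u = rng.normal(size=n)
z = rng.normal(size=n)
x = 0.8 * z + u + rng.normal(size=n)
lam = 0.5
y = lam * x - u + rng.normal(size=n)
# The 1x1 instance of system Phi:  cov(Z, Y) = lam * cov(Z, X)
lam_hat = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
assert abs(lam_hat - lam) < 0.05
```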
C.3 Proof of Lemma 9

Functions φ and ψ are defined recursively. The base case k = 1 is given by the usual expression for the first-order partial correlation, and the step from k - 1 to k follows the standard recursive formula for partial correlations. Using induction and the recursive definition of φ and ψ, it is easy to check that the ratio φ/ψ gives exactly ρ_{YZ.W_1...W_k}.

Now, we prove that functions φ and ψ as defined satisfy properties (i)-(iv). This is clearly the case for k = 1. Now, assume that the properties are satisfied for all smaller values of k.
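The standard recursion on the last conditioning variable can be transcribed directly. In the sketch below, pairwise correlations are stored in a dict keyed by frozensets (my own convention, not the thesis's):

```python
import math

def partial_corr(r, y, z, cond):
    # r_{yz.cond} via the usual recursion on the last conditioning variable:
    #   r_{yz.W,w} = (r_{yz.W} - r_{yw.W} * r_{zw.W})
    #                / sqrt((1 - r_{yw.W}^2) * (1 - r_{zw.W}^2))
    if not cond:
        return r[frozenset((y, z))]
    w, rest = cond[-1], cond[:-1]
    ryz = partial_corr(r, y, z, rest)
    ryw = partial_corr(r, y, w, rest)
    rzw = partial_corr(r, z, w, rest)
    return (ryz - ryw * rzw) / math.sqrt((1 - ryw**2) * (1 - rzw**2))

r = {frozenset(('y', 'z')): 0.5,
     frozenset(('y', 'w')): 0.4,
     frozenset(('z', 'w')): 0.3}
rho = partial_corr(r, 'y', 'z', ['w'])
assert abs(rho - (0.5 - 0.4 * 0.3) / math.sqrt((1 - 0.16) * (1 - 0.09))) < 1e-12
```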
Property (i) follows from the definition of ψ and the assumption that it holds for k - 1.

Now, by the inductive hypothesis, the terms from which φ is built are linear on the correlations ρ_{ZY}, ρ_{ZW_1}, ..., ρ_{ZW_{k-1}}, together with the new correlation ρ_{ZW_k}. Thus, φ is linear on ρ_{ZY}, ρ_{ZW_1}, ..., ρ_{ZW_k}, with no constant term, and property (ii) holds.

The coefficients in φ are polynomials on the correlations among the variables Y, W_1, ..., W_k. Thus, the first part of property (iii) holds. For the second part, note that correlation ρ_{ZY} only appears in the first term of φ, and by the inductive hypothesis its coefficient has constant term equal to 1. Also, since the latter term is linear on the correlations ρ_{YW_1}, ..., ρ_{YW_k}, we must have that the coefficients of ρ_{ZW_1}, ..., ρ_{ZW_k} are linear on these correlations. Hence, property (iii) holds.

Finally, for property (iv), we note that by the inductive hypothesis, the first term of ψ has constant term equal to 1, and the second term has no constant term. Thus, property (iv) holds. □
R EFERENCES
[BMW94] P.A. Bekker, A. Merckens, and T.J. Wansbeek. Identification, equivalent
models, and computer algebra. Academic Press, Boston, 1994.
[Bol89]
K.A. Bollen. Structural Equations with Latent Variables. John Wiley,
New York, 1989.
[BP02]
C. Brito and J. Pearl. “A Graphical Criterion for the Identification of
Causal Effects in Linear Models.” In Proceedings of the AAAI Conference,
pp. 533–538, Edmonton, 2002.
[BT84]
R.J. Bowden and D.A. Turkington. Instrumental Variables. Cambridge
University Press, Cambridge, England, 1984.
[BW80]
P. M. Bentler and D. G. Weeks. “Linear structural equations with latent
variables.” Psychometrika, pp. 289–307, 1980.
[CCK83] J. M. Chambers, W. S. Cleveland, B. Kleiner, and P. A. Tukey. Graphical
Methods for Data Analysis. Wadsworth, Belmont, CA, 1983.
[CCR90]
T. Cormen, C. Leiserson, and R. Rivest. Introduction to Algorithms. The
MIT Press, 1990.
[Dun75]
O.D. Duncan. Introduction to Structural Equation Models. Academic
Press, New York, 1975.
[Fis66]
F.M. Fisher. The Identification Problem in Econometrics. McGraw-Hill,
New York, 1966.
[Ger83]
V. Geraci. “Errors in variables and the indivicual structure model.” International Economic Review, 1983.
[HD96]
C. Huang and A. Darwiche. “Inference in belief networks: A procedural
guide.” International Journal of Approximate Reasoning, 15(3):225–263,
1996.
[KKB98] D.A. Kenny, D.A. Kashy, and N. Bolger. “Data Analysis in Social Psychology.” In D. Gilbert, S. Fiske, and G. Lindzey, editors, The Handbook
of Social Psychology, pp. 233–265. McGraw-Hill, Boston, MA, 1998.
[KS80]
R. Kindermann and J.L. Snell. Markov Random Fields and their Applications. American Mathematical Society, Providence, RI, 1980.
[LS88]
S. L. Lauritzen and D. J. Spiegelhalter. “Local Computations with Probabilities on Graphical Structures and Their Application to Expert Systems.”
Journal of the Royal Statistical Society (B), 50:157–224, 1988.
[McD97] R. McDonald. “Haldane’s lungs: A case study in path analysis.” Multiv.
Behav. Res., pp. 1–38, 1997.
[Oka73]
M. Okamoto. “Distinctness of the Eigenvalues of a Quadratic form in a
Multivariate Sample.” Annals of Statistics, 1(4):763–765, July 1973.
[Pea88]
J. Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Mateo, CA, 1988.
[Pea98]
J. Pearl. “Graphs, causality, and structural equation models.” Sociological
Methods and Research, 27(2):226–284, 1998.
[Pea00a]
J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, New York, 2000.
[Pea00b]
J. Pearl. “Parameter identification: A new perspective.” Technical Report
R-276, UCLA, August 2000.
[Rig95]
E.E. Rigdon. “A necessary and sufficient identification rule for structural models estimated in practice.” Multivariate Behavioral Research,
30(3):359–383, 1995.
[SRM98] P. Spirtes, T. Richardson, C. Meek, R. Scheines, and C. Glymour. “Using path diagrams as a structural equation modelling tool.” Sociological
Methods and Research, pp. 182–225, November 1998.
[TEM93] K. Tambs, L.J. Eaves, T. Moum, J. Holmen, M.C. Neale, S. Naess, and
P.G. Lund-Larsen. “Age-specific genetic effects for blood pressure.” Hypertension, 22:789–795, 1993.
[WL83]
N. Wermuth and S.L. Lauritzen. “Graphical and recursive models for contingency tables.” Biometrika, 70:537–552, 1983.
[Wri34]
S. Wright. “The method of path coefficients.” Ann. Math. Statist., 5:161–
215, 1934.