Graphical Methods for Identification in Structural Equation Models

Carlos Eduardo Fisch de Brito

June 2004

Technical Report R-314
Cognitive Systems Laboratory
Department of Computer Science
University of California
Los Angeles, CA 90095-1596, USA

This report reproduces a dissertation submitted to UCLA in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science. This work was supported in part by AFOSR grant #F49620-01-1-0055, NSF grant #IIS-0097082, California State MICRO grant #00-077, and MURI grant #N00014-00-1-0617.

© Copyright by Carlos Eduardo Fisch de Brito 2004

To my parents and wife

TABLE OF CONTENTS

1 Introduction
  1.1 Data Analysis with SEM and the Identification Problem
  1.2 Overview of Results
  1.3 Related Work
2 Problem Definition and Background
  2.1 Structural Equation Models and Identification
  2.2 Graph Background
  2.3 Wright's Method of Path Analysis
3 Auxiliary Sets for Model Identification
  3.1 Introduction
  3.2 Basic Systems of Linear Equations
  3.3 Auxiliary Sets and Linear Independence
  3.4 Model Identification Using Auxiliary Sets
  3.5 Simpler Conditions for Identification
    3.5.1 Bow-free Models
    3.5.2 Instrumental Condition
4 Algorithm
5 Correlation Constraints
  5.1 Obtaining Constraints Using Auxiliary Sets
6 Sufficient Conditions for Non-Identification
  6.1 Violating the First Condition
  6.2 Violating the Second Condition
7 Instrumental Sets
  7.1 Causal Influence and the Components of Correlation
  7.2 Instrumental Variable Methods
  7.3 Instrumental Sets
8 Discussion and Future Work
A Proofs from Chapter 3
B Proofs from Chapter 6
C Proofs from Chapter 7
  C.1 Preliminary Results
    C.1.1 Partial Correlation Lemma
    C.1.2 Path Lemmas
  C.2 Proof of Theorem 8
    C.2.1 Notation and Basic Linear Equations
    C.2.2 System of Equations
    C.2.3 Identification
  C.3 Proof of Lemma 9
References

LIST OF FIGURES

1.1 Smoking and lung cancer example
1.2 Model for correlations between blood pressures of relatives
1.3 McDonald's regressional hierarchy examples
2.1 A simple structural model and its causal diagram
2.2 A causal diagram
2.3 A causal diagram
2.4 Wright's equations
3.1 Wright's equations
3.2 Figure
3.3 Example illustrating a condition in the G criterion
3.4 Example illustrating rule R2
3.5 Example illustrating the Auxiliary Sets method
3.6 Example of a Bow-free model
3.7 Examples illustrating the instrumental condition
4.1 A causal diagram and the corresponding flow network
5.1 D-separation conditions and correlation constraints
5.2 Example of a model that imposes correlation constraints
6.1 Figure
6.2 Figure
6.3 Figure
7.1 Typical Instrumental Variable
7.2 Conditional IV Examples
7.3 Simultaneous use of two IVs
7.4 More examples of Instrumental Sets

VITA

1971 Born, Mogi das Cruzes-SP, Brazil.
1997 B.S. Computer Science, Universidade Federal do Rio de Janeiro, Brazil.
1999 M.S. Computer Science, Universidade Federal do Rio de Janeiro, Brazil.

PUBLICATIONS

Brito, C., Gafni, E., and Vaya, S. An Information Theoretic Lower Bound for Broadcasting in Radio Networks. In STACS'04 - 21st International Symposium on Theoretical Aspects of Computer Science, Montpellier, France.

Brito, C., Koutsoupias, E., and Vaya, S. Competitive Analysis of Organization Networks, or Multicast Acknowledgement: How Much to Wait? In SODA'04 - ACM-SIAM Symposium on Discrete Algorithms, New Orleans, LA, USA.

Brito, C. A New Approach to the Identification Problem. In SBIA'02 - Brazilian Symposium on Artificial Intelligence, Porto de Galinhas/Recife, Brazil.

Brito, C. and Pearl, J. Generalized Instrumental Variables.
In UAI'02 - The Eighteenth Conference on Uncertainty in Artificial Intelligence, Edmonton, Alberta, Canada.

Brito, C. and Pearl, J. A Graphical Criterion for the Identification of Causal Effects in Linear Models. In AAAI'02 - The Eighteenth National Conference on Artificial Intelligence, Edmonton, Alberta, Canada.

Brito, C. and Pearl, J. A New Identification Condition for Recursive Models with Correlated Errors. In Structural Equation Modeling, 2002.

Brito, C., Silva, E. S., Diniz, M. C., and Leao, R. M. M. Analise Transiente de Modelos de Fonte Multimidia (Transient Analysis of Multimedia Source Models). In SBRC'00 - 18o Simposio Brasileiro de Redes de Computadores, Belo Horizonte, MG, Brazil.

Brito, C., Moraes, R., Oliveira, D., and Silva, E. S. Comunicacao Multicast Confiavel na Implementacao de uma Ferramenta Whiteboard (Reliable Multicast Communication in the Implementation of a Whiteboard Tool). In SBRC'99 - 17o Simposio Brasileiro de Redes de Computadores, Salvador, BA, Brazil.

ABSTRACT OF THE DISSERTATION

Graphical Methods for Identification in Structural Equation Models

by Carlos Eduardo Fisch de Brito
Doctor of Philosophy in Computer Science
University of California, Los Angeles, 2004
Professor Judea Pearl, Chair

Structural Equation Models (SEM) are one of the most important tools for causal analysis in the social and behavioral sciences (e.g., economics, sociology). A central problem in the application of SEM is the analysis of identification. Succinctly, a model is identified if only a unique parametrization of it is compatible with a given covariance matrix (i.e., with the observed data). The identification of a model is important because, in general, no reliable quantitative conclusion can be derived from a non-identified model.

In this work, we develop a new approach to the analysis of identification in SEM, based on graph-theoretic techniques. Our main result is a general sufficient criterion for model identification. The criterion consists of a number of graphical conditions on the causal diagram of the model.
We also develop a new method for computing the correlation constraints imposed by the structural assumptions, which can be used for model testing. Finally, we provide a generalization of the traditional method of Instrumental Variables, through the concept of Instrumental Sets.

CHAPTER 1

Introduction

Structural Equation Models (SEM) are one of the most important tools for causal analysis in the social and behavioral sciences [Bol89, Dun75, McD97, BW80, Fis66, KKB98]. Although most developments in SEM have been made by scientists in these areas, the theoretical aspects of the model provide interesting problems that can benefit from techniques developed in computer science.

In a structural equation model, the relationships among a set of observed variables are expressed by linear equations. Each equation describes the dependence of one variable on the others, and contains a stochastic error term accounting for the influence of unobserved factors. Independence assumptions on pairs of error terms are also specified in the model.

An attractive characteristic of SEM models is their simple causal interpretation. Specifically, the linear equation Y = βX + ε encodes two distinct assumptions: (1) the possible existence of a (direct) causal influence of X on Y; and (2) the absence of a (direct) causal influence on Y of any variable that does not appear on the right-hand side of the equation. The parameter β quantifies the (direct) causal effect of X on Y. That is, the equation claims that a unit increase in X would result in an increase of β units in Y, assuming that everything else remains the same.

Let us consider a simple example taken from [Pea00a]. This model investigates the relation between smoking (X) and lung cancer (Y), taking into consideration the amount of tar (Z) deposited in a person's lungs, and allowing for unobserved factors to affect both smoking and cancer.
This situation is represented by the following equations:

X = ε1
Z = aX + ε2
Y = bZ + ε3
Cov(ε1, ε2) = Cov(ε2, ε3) = 0,   Cov(ε1, ε3) = γ

The first three equations claim, respectively, that the level of smoking of a person depends only on factors not included in the model; that the amount of tar deposited in the lungs depends on the level of smoking as well as on external factors; and that the level of cancer depends on the amount of tar in the lungs and on external factors. The remaining equations say that the external factors that cause tar to accumulate in the lungs are independent of the external factors that affect the other variables, but that the external factors influencing smoking and cancer may be correlated.

All the information contained in the equations can be expressed by a graphical representation, called a causal diagram, as illustrated in Figure 1.1. We formally define the model and its graphical representation in Section 2.1.

Figure 1.1: Smoking and lung cancer example

Figure 1.2 shows a more elaborate model used to study correlations between relatives for systolic and diastolic blood pressures [TEM93]. The squares represent blood pressures for each type of individual, and the circles represent genetic and environmental causes of variation: the additive genetic contribution of the polygenes; the dominance genetic contributions; all the environmental factors; and those environmental components shared only by siblings of the same sex.

1.1 Data Analysis with SEM and the Identification Problem

The process of data analysis using Structural Equation Models consists of four steps [KKB98]:

1. Specification: description of the structure of the model. That is, the qualitative relations among the variables are specified by linear equations. Quantitative information is generally not specified and is represented by parameters.

2. Identification: analysis to decide whether there is a unique valuation of the parameters that makes the model compatible with the observed data.
The identification of a SEM model is formally defined in Section 2.1.

3. Estimation: actual estimation of the parameters from statistical information on the observed variables.

4. Evaluation of fit: assessment of the quality of the model as a description of the data.

In this work we concentrate on the problem of identification. That is, we leave the task of model specification to other investigators, and develop conditions to decide whether these models are identified or not. The identification of a model is important because, in general, no reliable quantitative conclusion can be derived from a non-identified model.

Figure 1.2: Model for correlations between blood pressures of relatives

The question of identification has been the object of extensive research [Fis66], [Dun75], [Pea00a], [McD97], [Rig95]. Despite all this effort, the problem still remains open; that is, we do not have a necessary and sufficient condition for model identification in SEM. Some results are available for special classes of models, and these will be reviewed in Section 1.3.

In our approach to the problem, we state identification as an intrinsic property of the model, depending only on its structural assumptions. Since all such assumptions are captured in the graphical representation of the model, we can apply graph-theoretic techniques to study the problem of identification in SEM. Thus, our main results consist of graphical conditions for identification, to be applied to the causal diagram of the model. As a byproduct of our analysis, estimation methods will also be provided, as well as methods to obtain constraints imposed by the model on the distribution over the observed variables, thus addressing questions in steps 3 and 4 above.

1.2 Overview of Results

The central question studied in this work is the problem of identification in recursive SEM models, that is, models that do not contain feedback (see Section 2.1 for more details).
The basic tool used in the analysis is Wright's decomposition, which allows us to express correlation coefficients as polynomials in the parameters of the model. The important fact about this decomposition is that each term in the polynomial corresponds to a path in the causal diagram.

Based on the observation that these polynomials are linear in specific subsets of the parameters, we reduce the problem of identification to the analysis of simple systems of linear equations. As one would expect, conditions for the linear independence of those systems (which imply a unique solution, and thus identification of the parameters) translate into graphical conditions on the paths of the causal diagram. Hence, the fundamental step in our method of Auxiliary Sets for model identification (see Chapter 3) consists in finding, for each variable Y, a set of variables satisfying specific restrictions on the paths connecting them to Y. As it turns out, the restrictions that give the method its maximum generality are not so easy to verify by visual inspection of the causal diagram. To overcome this problem, we developed an algorithm that searches for an Auxiliary Set for a given variable (see Chapter 4). We also provide the Bow-free condition and the Instrumental condition (Section 3.5), which are special cases of the general method but have straightforward application.

The machinery developed for the method of Auxiliary Sets can also be used to compute correlation constraints. These constraints are implied by the structural assumptions, and allow us to test the model [McD97]. The basic idea is very simple: while in the case of identification we take advantage of linearly independent equations, correlation constraints are immediately obtained from linearly dependent equations (see Chapter 5). Despite its simplicity, this is a very powerful method for computing correlation constraints.
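The duality between independent and dependent equations can be illustrated with a small numeric sketch. The coefficient matrix below is hypothetical, not taken from any model in this work; it stands for a system ρ = Aλ in which three correlations are expressed in terms of two free parameters, so one linear dependency among the rows yields one testable constraint on the correlations:

```python
import numpy as np

# Hypothetical system rho = A @ lam: three correlations expressed as
# linear combinations of two edge parameters.  The third row equals
# the sum of the first two, so the equations are linearly dependent.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# Any parametrization produces correlations satisfying the constraint.
lam = np.array([0.4, -0.2])
rho = A @ lam

# A basis vector v of the left null space of A gives the constraint
# v . rho = 0; for this A it reads rho3 - rho1 - rho2 = 0.
U, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-10))
v = U[:, rank:]
print(np.allclose(v.T @ rho, 0))  # True: the implied constraint holds
```

Any covariance matrix violating the constraint is incompatible with the assumed structure, which is exactly what makes such dependencies useful for model testing.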
The main goal of this research is to solve the problem of identification for recursive models, namely, to obtain a necessary and sufficient condition for model identification. The sufficient condition provided by the method of Auxiliary Sets is very general, and in Chapter 6 we present our initial efforts to prove that it is also necessary for identification.

Finally, in Chapter 7, we consider the problem of parameter identification. This problem is motivated by the observation that even in non-identified models there may exist some parameters whose values are uniquely determined by the structural assumptions and the data. We provide a solution based on the concept of Instrumental Sets, which generalizes the traditional method of Instrumental Variables [BT84]. The criterion for parameter identification involves d-separation conditions, and the proofs required the development of new techniques of independent interest.

1.3 Related Work

The use of graphical models to represent and reason about probability distributions has been extensively studied [WL83, CCK83, Pea88]. In many areas such models have become the standard representation, e.g., Bayesian networks for dealing with uncertainty in Artificial Intelligence [Pea88], and Markov random fields for speech recognition and coding [KS80]. Some reasons for the success of the language of graphs in many domains are: it provides a compact representation for a large class of probability distributions; it is convenient for describing dependencies among variables; and it constitutes a natural language for causal modeling. Besides these advantages, many methods and techniques have been developed to reason about probability distributions directly at the level of the graphical representation [LS88, HD96]. An example of such a technique is the d-separation criterion [Pea00a], which allows us to read off conditional independencies among variables by inspecting the graphical representation of the model.
The identification problem has been tackled for the past half century, primarily by econometricians and social scientists [Fis66, Dun75]. It is still unsolved; in other words, we are not in possession of a necessary and sufficient criterion for deciding whether the parameters in a structural model can be determined uniquely from the covariance matrix of the observed variables.

Certain restricted classes of models are nevertheless known to be identifiable, and these are often assumed by social scientists as a matter of convenience or convention [Dun75]. McDonald [1997] characterizes a hierarchy of three such classes (see Figure 1.3): (1) uncorrelated errors; (2) correlated errors restricted to exogenous variables; and (3) correlated errors restricted to pairs of causally unordered variables (i.e., variables that are not connected by unidirected paths). The structural equations in all three classes are regressional (i.e., the error term in each equation is uncorrelated with the explanatory variables of that same equation), hence the parameters can be estimated uniquely using Ordinary Least Squares techniques.

Figure 1.3: McDonald's regressional hierarchy examples

Traditional approaches to the identification problem are based on algebraic manipulation of the equations defining the model. Powerful algebraic methods have been developed for testing whether a specific parameter, or a specific equation in a model, is identifiable. However, such methods are often too complicated for investigators to apply in the pre-analytic phase of model construction. Additionally, those specialized methods are limited in scope. The rank and order criteria [Fis66], for example, do not exploit restrictions on the error covariances (if such are available). The rank criterion further requires a precise estimate of the covariance matrix before identifiability can be decided.
Identification methods based on block-recursive models [Fisher, 1966; Rigdon, 1995], for another example, insist on uncorrelated errors between any pair of ordered blocks.

Recently, some advances have been achieved on graphical conditions for identification [Pea98, Pea00a, SRM98]. Examples of such conditions are the "back-door" and "single-door" criteria [Pea00a, pp. 150-2]. The back-door criterion consists of a d-separation test applied to the causal diagram, and provides a sufficient condition for the identification of specific causal effects in the model. A problem with such conditions is that they are applicable only in sparse models, that is, models rich in conditional independencies. The same holds for criteria based on instrumental variables (IV) [Bowden and Turkington, 1984], since these require a search for variables (called instruments) that are uncorrelated with the error terms in specific equations.

CHAPTER 2

Problem Definition and Background

2.1 Structural Equation Models and Identification

A structural equation model for a vector of observed variables Y = (Y1, ..., Yn)' is defined by a set of linear equations of the form

Yi = Σ_{j ≠ i} λij Yj + εi,   for i = 1, ..., n,

or, in matrix form, Y = ΛY + ε, where Λ = [λij]. The term εi in each equation corresponds to a stochastic error, assumed to have a normal distribution with zero mean. The model also specifies independence assumptions for those error terms, by indicating which entries of the error covariance matrix Ψ = Cov(ε) have value zero.

In this work we consider only recursive models, which are characterized by the fact that the matrix Λ is lower triangular. This assumption is reasonable in many domains, since it basically forbids feedback causation: that is, it rules out any sequence of variables in which each one appears on the right-hand side of the equation for the next, and the last appears in the equation for the first.

The structural assumptions encoded in a model consist of: (1) the set of variables omitted from the right-hand side of each equation (i.e., the zero entries in the matrix Λ); and
(2) the pairs of independent error terms (i.e., the zero entries in Ψ).

The set of parameters of a model M is composed of the (possibly) non-zero entries of the matrices Λ and Ψ. A parametrization θ for M is a function that assigns a real value to each parameter of the model. The pair (M, θ) determines a unique covariance matrix Σ(θ) over the observed variables, given by [Bol89]:

Σ(θ) = (I - Λθ)^{-1} Ψθ [(I - Λθ)^{-1}]'    (2.1)

where Λθ and Ψθ are obtained by replacing each non-zero entry of Λ and Ψ by the respective value assigned by θ.

Now we are ready to formally define the problem of identification in SEM.

Definition 1 (Model Identification) A structural equation model M is said to be identified if, for almost every parametrization θ for M, the following condition holds:

Σ(θ') = Σ(θ)  implies  θ' = θ    (2.2)

That is, if we view a parametrization θ as a point in R^m, where m is the number of parameters, then the set of points for which condition (2.2) does not hold has Lebesgue measure zero.

The identification status of simple models can be determined by explicitly calculating the covariances between the observed variables, and analyzing whether the resulting expressions imply a unique solution for the parameters. This method is illustrated in the following examples.

Consider the model of Eq. (2.3), whose causal diagram is shown in Figure 2.1, with directed edges labeled a, b, c and bidirected arcs labeled α and β; the covariances of the pairs of error terms not connected by a bidirected arc are assumed to be zero. We also make the assumption that each of the observed variables has zero mean and is standardized (i.e., has variance 1). This assumption is not restrictive because, if it does not hold, a simple transformation can put the variables in this form. An immediate consequence of the assumption is that the covariances between the observed variables coincide with the corresponding correlation coefficients.

Calculating the covariances between the observed variables, we obtain polynomial expressions in the parameters (Eqs. (2.4) and (2.5)). From these expressions it is easy to see that the values of some parameters are uniquely determined by individual covariances, while the remaining parameters are obtained by solving the small systems of equations formed by the other expressions.
Hence, if two parametrizations θ and θ' induce the same covariance matrix, they must be identical, and the model is identified. Note, however, that this argument does not hold if one of the parameters is exactly 1. In that case, some pair of equations no longer yields a unique solution for the parameters appearing in them; a unique solution can still be obtained from another expression, but if yet another parameter also takes a special value, then those parameters are not uniquely determined by the covariance matrix of the observed variables. This explains why we require condition (2.2) to hold only for almost every parametrization, allowing it to fail on a set of measure zero.

The simplest example of a non-identified model corresponds to the pair of equations:

X1 = ε1
X2 = cX1 + ε2,   with Cov(ε1, ε2) = α ≠ 0.

In this case, the covariance matrix of the observed variables X1, X2 contains only one non-trivial entry, whose value is given by:

σ_{X1 X2} = E[X1 X2] = E[X1 (cX1 + ε2)] = c + α

Now, given any parametrization θ = (c, α), it is easy to construct another parametrization θ' = (c + δ, α - δ) with θ' ≠ θ. But this implies Σ(θ') = Σ(θ), and so the model is non-identified.

Figure 2.1: A simple structural model and its causal diagram

In general, if a model is non-identified, then for each parametrization θ there exists an infinite number of distinct parametrizations θ' such that Σ(θ') = Σ(θ). However, it is also possible that, for most parametrizations θ, only a finite number of distinct parametrizations generate the same covariance matrix. We will return to this issue in Chapter 6, where we study sufficient conditions for non-identification, and we will provide an example of this situation.

A few other algebraic methods, like the algebra of expectations [Dun75], have been proposed in the literature. However, those techniques are too complicated for the analysis of complex models. Here we pursue a different strategy, and study the identification status of SEM models using graphical methods. For this purpose, we introduce the graphical representation of the model, called a causal diagram [Pea00a].
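This non-identifiability can be checked numerically with the covariance formula of Eq. (2.1). The sketch below encodes the two-variable "bow" model X1 = ε1, X2 = cX1 + ε2 with Cov(ε1, ε2) = α; the parameter names and the numeric choices (c, α) = (0.5, 0.2) versus (0.7, 0.0) are our own illustrative values:

```python
import numpy as np

def implied_cov(c, alpha):
    """Implied covariance of the bow model X1 = e1, X2 = c*X1 + e2,
    Cov(e1, e2) = alpha, with both variables standardized (Eq. (2.1))."""
    Lam = np.array([[0.0, 0.0], [c, 0.0]])
    psi2 = 1.0 - c**2 - 2.0 * c * alpha      # chosen so that Var(X2) = 1
    Psi = np.array([[1.0, alpha], [alpha, psi2]])
    A = np.linalg.inv(np.eye(2) - Lam)
    return A @ Psi @ A.T

# Two distinct parametrizations with c' = c + delta, alpha' = alpha - delta:
S1 = implied_cov(0.5, 0.2)
S2 = implied_cov(0.7, 0.0)
print(np.allclose(S1, S2))   # True: the data cannot tell them apart
```

Both parametrizations yield σ12 = c + α = 0.7, so no amount of data on X1, X2 alone distinguishes them.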
The causal diagram of a model M consists of a directed graph whose nodes correspond to the observed variables Y1, ..., Yn of the model. A directed edge from Yj to Yi indicates that Yj appears on the right-hand side of the equation for Yi with a non-zero coefficient. A bidirected arc between Yi and Yj indicates that the corresponding error terms, εi and εj, have non-zero correlation. The graphical representation can be completed by labeling the directed edges with the respective coefficients of the linear equations, and the bidirected arcs with the corresponding non-zero entries of the error covariance matrix Ψ.

Figure 2.2: A causal diagram

Figure 2.1 shows the causal diagram for the example given in Eq. (2.3). Note that the causal diagram of a recursive model does not have any cycle composed only of directed edges. The next section presents some basic definitions and facts about the type of directed graphs considered here. Then, in Section 2.3, we establish the connection between the identification problem and the graphical representation of the model.

2.2 Graph Background

A path between variables X and Y in a causal diagram consists of a sequence of edges e1, ..., ek such that e1 is incident to X, ek is incident to Y, and every pair of consecutive edges in the sequence has a common variable. Variables X and Y are called the extreme points of the path, and every other variable appearing in some edge ei is said to be an intermediate variable of the path. We say that the path points to the extreme point X (Y) if the edge e1 (ek) has an arrow head pointing to X (Y). For example, consider some of the paths between X and Y in the causal diagram of Figure 2.2. Note that only the third such path points to variable X, but all of them point to Y.

A path e1, ..., ek between X and Y is valid if variable X only appears in e1, variable Y only appears in ek, and every intermediate variable appears in exactly two edges of the path. Among the examples just mentioned, only the first three are valid; the last one is invalid because some variable appears in more than two edges.
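One possible encoding of these path notions is sketched below; the edge representation and the helper names are ours, not notation from this work. It enumerates the valid collider-free paths between two nodes of the diagram of Figure 1.1 (X → Z → Y with X ↔ Y):

```python
# Each edge: (u, v, kind) where kind 'dir' means u -> v and 'bi' means u <-> v.
edges = [('X', 'Z', 'dir'), ('Z', 'Y', 'dir'), ('X', 'Y', 'bi')]

def has_head_at(edge, node):
    """True if the edge has an arrow head pointing to node."""
    u, v, kind = edge
    return node == v if kind == 'dir' else node in (u, v)

def other_end(edge, node):
    u, v, _ = edge
    return v if node == u else u

def unblocked_paths(start, goal):
    """Enumerate valid paths from start to goal containing no collider."""
    results, stack = [], [(start, [])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            results.append(path)
            continue
        for e in edges:
            if node not in (e[0], e[1]) or e in path:
                continue
            nxt = other_end(e, node)
            if any(nxt in (f[0], f[1]) for f in path):  # keep the path valid
                continue
            # 'node' must not be a collider between the previous edge and e.
            if path and has_head_at(path[-1], node) and has_head_at(e, node):
                continue
            stack.append((nxt, path + [e]))
    return results

print(len(unblocked_paths('X', 'Y')))  # 2 paths: X -> Z -> Y and X <-> Y
```

The path X → Z ← ... never arises here, but the collider test in the inner loop is exactly what would discard it in a larger diagram.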
The special case of a path composed only of directed edges, all oriented in the same direction, is called a chain. The first example above corresponds to a chain from X to Y.

We will also make use of a few family terms to refer to variables in particular topological relationships. Specifically, if the edge X → Y is present in the causal diagram, then we say that X is a parent of Y. Similarly, if there exists a chain from X to Y, then X is said to be an ancestor of Y, and Y is a descendant of X. Clearly, in a recursive model we cannot have a situation in which X is both an ancestor and a descendant of some other variable Y. The causal diagram of Figure 2.2 provides examples of these relationships.

Given a path p between X and Y, and an intermediate variable Z of p, we denote by p[X ~ Z] the subpath consisting of the edges of p that appear between X and Z. (1) Variable Z is a collider in a path p between X and Y if both p[X ~ Z] and p[Z ~ Y] point to Z. A path that does not contain any collider is said to be unblocked.

(1) Here, and in most of the following, we are only concerned with valid paths, so this concept is well-defined.

Next, we consider a few important facts about unblocked paths. Define the depth of a node Y in a causal diagram as the length (i.e., number of edges) of the longest chain from any ancestor of Y to Y. Nodes with no ancestors have depth 0.

Lemma 1 Let X and Y be nodes in the causal diagram of a recursive model such that depth(X) ≥ depth(Y). Then, every path between X and Y which includes a node Z with depth(Z) ≥ depth(X) must have a collider.

Proof: Consider a path p between X and Y, and a node Z in p satisfying the conditions above. We observe that Z cannot be an ancestor of either X or Y; otherwise we would have depth(X) > depth(Z) or depth(Y) > depth(Z). Now, consider the subpath of p between X and Z. If this subpath has the form X ... ← Z (leaving Z through a directed edge), then it must contain a collider, since it cannot be a chain from Z to X. Similarly, if the subpath of p between Z and Y has the form Z → ... Y, then it must contain a collider.
In all the remaining cases, Z itself is a collider blocking the path. ∎

If p is a path between X and Z, and q is a path between Z and Y, then p + q denotes the path obtained by the concatenation of the sequences of edges corresponding to p and q.

Lemma 2 Let p be an unblocked path between X and Z, and let q be an unblocked path between Z and Y. Then, p + q is a valid unblocked path between X and Y if and only if: (i) p and q do not have any intermediate variable in common; and (ii) either p is a chain from Z to X, or q is a chain from Z to Y.

Definition 2 (d-separation) A set of nodes Z d-separates X from Y in a graph if Z closes every path between X and Y. A path p is closed by a set Z (possibly empty) if one of the following holds: (i) p contains at least one non-collider that is in Z; (ii) p contains at least one collider that is outside Z and has no descendant in Z.

Figure 2.3: A causal diagram

For example, consider the paths in Figure 2.3. A path whose intermediate variables are all non-colliders is closed by any set containing at least one of them. On the other hand, a path containing a collider is closed by the empty set, but it is not closed by a set that contains the collider, or one of its descendants, without also containing some non-collider of the path. It is then easy to verify by inspection whether a given set closes every path between two variables, and so d-separates them.

2.3 Wright's Method of Path Analysis

The method of path analysis [Wri34] for identification is based on a decomposition of the correlations between observed variables into polynomials in the parameters of the model. More precisely, for variables X and Y in a recursive model, the correlation coefficient of X and Y, denoted ρXY, can be expressed as:

ρXY = Σ_p T(p)    (2.6)

where the summation ranges over all unblocked paths p between X and Y, and the term T(p) represents the product of the parameters of the edges along path p. For this equality to hold, the variables in the model must be standardized (i.e., have variance equal to 1) and have zero mean. We refer to Eq. (2.6) as Wright's decomposition of ρXY.

Figure 2.4: Wright's equations.
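For the model of Figure 1.1, Wright's decomposition can be verified symbolically: each correlation below expands to exactly the sum, over the unblocked paths between the two variables, of the products of the edge parameters. The covariance is computed with the matrix formula of Eq. (2.1), and the error variances are chosen so that the variables are standardized, as the method requires:

```python
import sympy as sp

a, b, g = sp.symbols('a b gamma')
# Figure 1.1: X -> Z -> Y with a bidirected arc X <-> Y (parameter gamma).
Lam = sp.Matrix([[0, 0, 0], [a, 0, 0], [0, b, 0]])
psi2 = 1 - a**2                    # standardizes Z
psi3 = 1 - b**2 - 2*a*b*g          # standardizes Y
Psi = sp.Matrix([[1, 0, g], [0, psi2, 0], [g, 0, psi3]])
Inv = (sp.eye(3) - Lam).inv()
Sigma = sp.expand(Inv * Psi * Inv.T)

# One term per unblocked path:
assert sp.simplify(Sigma[0, 1] - a) == 0          # X--Z: the path X -> Z
assert sp.simplify(Sigma[0, 2] - (a*b + g)) == 0  # X--Y: X->Z->Y and X<->Y
assert sp.simplify(Sigma[1, 2] - (b + a*g)) == 0  # Z--Y: Z->Y and Z<-X<->Y
print('Wright decomposition verified')
```

Note how the blocked path Z → Y ... is never needed: every monomial of every correlation is accounted for by an unblocked path.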
Figure 2.4 shows a simple model and the decompositions of the correlations for each pair of variables. The set of equations obtained from Wright's decompositions summarizes all the statistical information encoded in the model. Therefore, any question about identification can be decided by studying the solutions of this system of equations. However, since this is a system of non-linear equations, it can be very difficult to analyze the identification of large models by directly studying its solutions.

CHAPTER 3

Auxiliary Sets for Model Identification

3.1 Introduction

In this chapter we investigate sufficient conditions for model identification. Specifically, we want to find graphical conditions on the causal diagram that guarantee the identification of every parameter in the model. One example of the type of result obtained here is the Bow-free condition, which states that every model whose causal diagram has at most one edge connecting any pair of variables is identified.

The starting point for our analysis of identification is the set of equations provided by Wright's decompositions of the correlations. We then make the following important observation. For an arbitrary variable Y, let E be a set of incoming edges of Y (i.e., edges with an arrow head pointing to Y). Then any unblocked path in the causal diagram can include at most one edge from E. This follows because if two such edges appeared in a valid path, they would have to be consecutive; but since both edges point to Y (giving the pattern → Y ←), the path would be blocked.

Now, recall that each term in the polynomial of Wright's decomposition corresponds to an unblocked path in the causal diagram. Thus, the observation above implies that such polynomials are linear in the parameters of the edges in E.

Hence, our approach to the problem of identification in SEM can be summarized as follows. First, we partition all the edges of the causal diagram into sets of incoming edges.
Then, we study the identification of the parameters associated with each set by analyzing the solution of a system of linear equations. Two conditions must be satisfied to obtain the identification of the parameters corresponding to a set of edges E. First, there must exist a sufficient number of linearly independent equations. Second, the coefficients of these equations, which are functions of other parameters in the model, must be identified.

To address the first issue, we developed a graphical characterization of linear independence, called the G criterion. That is, for a fixed variable Y, if a set of variables Z_1, …, Z_k satisfies the graphical conditions established by the G criterion, then the decompositions of ρ_{YZ_1}, …, ρ_{YZ_k} are linearly independent (with respect to the parameters of the edges in E). These conditions are based on the existence of specific unblocked paths between Y and each of the Z_i's. The second point is addressed by establishing an appropriate order in which to solve the systems of equations. The following sections formally develop this graphical analysis of identification.

3.2 Basic Systems of Linear Equations

We begin by partitioning the set of edges in the causal diagram into sets of incoming edges. Fix an ordering ≺ for the variables in the model, with the only restriction that if X is an ancestor of Y, then X must appear before Y in ≺. For each variable Y, we define E_Y as the set of edges in the causal diagram that connect Y to any variable appearing before Y in the ordering ≺. It easily follows from this definition that, for each variable Y, E_Y contains all directed edges pointing to Y (i.e., edges X → Y).

Lemma 3 Any unblocked path between Y and some variable Z can include at most one edge from E_Y. Moreover, if Z ≺ Y, then any such path must include exactly one edge from E_Y.

Proof: The first part of the lemma follows from the argument given in Section 3.1. For the second part, assume that p is an unblocked path between Y and Z which does not contain any edge from E_Y. Let X be the variable adjacent to Y in path p.
Clearly, Y ≺ X (otherwise the edge between X and Y would belong to E_Y). But then Lemma 1 says that p contains a collider, which is a contradiction. □

Now, fix an arbitrary variable Y, and let λ_1, …, λ_m denote the parameters of the edges in E_Y. Then, Lemma 3 allows us to express Wright's decomposition of the correlation between Y and any variable Z as a linear equation in the λ_j's:

ρ_{YZ} = Σ_j a_j λ_j

where the coefficient a_j collects the contributions of the unblocked paths between Y and Z that include the edge with parameter λ_j. Figure 3.1 shows the linear equations obtained from the correlations between Y and every other variable in a model.

[Figure 3.1: Wright's equations.]

Now, given a set of variables Z = {Z_1, …, Z_k}, we let Φ(Y, Z) denote the system of equations corresponding to the decompositions of the correlations ρ_{YZ_1}, …, ρ_{YZ_k}. Whenever clear from the context, we drop the reference to Y and Z and simply write Φ.

3.3 Auxiliary Sets and Linear Independence

Following the ideas presented in Section 3.1, we would like to find a set of variables that provides a system of linearly independent equations. This motivates the following definition:

Definition 3 (Auxiliary Sets) A set of variables Z = {Z_1, …, Z_k} is said to be an Auxiliary Set with respect to Y if and only if the system of equations Φ(Y, Z) is linearly independent.

Next, we obtain sufficient graphical conditions for a given set of variables to be an auxiliary set for Y. Since the terms in Wright's decompositions correspond to unblocked paths, it is natural to expect that linear independence between equations translates into properties of such paths. In the following, we explore this connection by analyzing a few examples, and then we introduce the G criterion.

[Figure 3.2: Models M1, M2 and M3.]

For each of the models in Figure 3.2 we verify whether the set {Z_1, Z_2} qualifies as an Auxiliary Set for Y. In model M1, it is easy to see that the two equations of Φ are linearly independent(2), and so {Z_1, Z_2} is an Auxiliary Set for Y.

(2) This is not true if ad = bc, but this condition only holds on a set of measure zero (see the discussion in Section 2.1).
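As a quick numeric illustration of the footnote (my sketch, not from the text): for a two-equation system ρ_1 = a·λ_1 + c·λ_2, ρ_2 = b·λ_1 + d·λ_2, independence fails only on the measure-zero set ad = bc, so randomly drawn path coefficients yield independent equations essentially always. The function name is an illustrative assumption.

```python
import random

def independent(a, b, c, d, tol=1e-12):
    """Linear independence of {a*l1 + c*l2, b*l1 + d*l2} via the determinant."""
    return abs(a * d - b * c) > tol

random.seed(0)
# Generic parameters: the degenerate case a*d = b*c has measure zero.
results = [independent(*(random.uniform(-1, 1) for _ in range(4)))
           for _ in range(1000)]
assert all(results)
```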
We also call attention to the fact that the unblocked paths from Z_1 and from Z_2 to Y have no intermediate variables in common. In model M2, the equations of Φ are clearly not linearly independent. This occurs because every unblocked path between Z_1 and Y in M2 can be extended by the edge between Z_1 and Z_2 to give an unblocked path between Z_2 and Y. Finally, in model M3 we again obtain a pair of linearly independent equations. The important fact to note here is that, if we extend the unblocked path between Z_1 and Y by the edge between Z_2 and Z_1, we obtain a path blocked by Z_1.

In general, the situation can become much more complicated, with one equation being a linear combination of several others. However, as we will see in the following, the examples discussed above illustrate the essential graphical properties that characterize linear independence.

[Figure 3.3: Example illustrating condition (ii) of the G criterion.]

G Criterion: A set of variables Z_1, …, Z_k satisfies the G criterion with respect to Y if there exist paths p_1, …, p_k such that:

(i) p_i is an unblocked path between Z_i and Y including some edge from E_Y;

(ii) for i ≠ j, Z_j is the only possible common variable in paths p_i and p_j (other than Y), and in this case, both p_i and p_j must point to Z_j (see Figure 3.3 for an example).

Next, we prove a technical lemma, and then establish the main result of this chapter. Let Z = {Z_1, …, Z_k} be a set of variables, and assume that paths p_1, …, p_k witness the fact that Z satisfies the G criterion with respect to Y. Let λ_1, …, λ_m denote the parameters of the edges in E_Y, and, without loss of generality, assume that, for each i, path p_i contains the edge with parameter λ_i. (It is easy to see that condition (ii) of the G criterion does not allow paths p_i and p_j to have a common edge.)
Lemma 4 For j ≠ i, let q be an unblocked path between Z_i and Y including the edge with parameter λ_j. Then, q must contain an edge that does not appear in any of p_1, …, p_k.

Proof: Without loss of generality, we may assume that, whenever Z_j is an intermediate variable in path p_i, we have j < i. If this is not the case, then we can always rename the variables so that this condition holds, and condition (ii) of the G criterion is not violated.

Now, let q be a path satisfying the conditions of the lemma, and assume that q contains only edges appearing in p_1, …, p_k. In the following we show that q must be blocked by a collider. Clearly, we can divide the path q into segments τ_1, …, τ_r such that all the edges in each segment belong to the same path p_l. By condition (ii) of the G criterion, a variable Z_l can appear only in a path p_i with l ≤ i, and so there must exist two consecutive segments τ_f and τ_{f+1}, and distinct indices l, l′, such that the edges in τ_f belong to p_l and the edges in τ_{f+1} belong to p_{l′}. Let V be the common variable of τ_f and τ_{f+1}; then V appears in both p_l and p_{l′}, so by condition (ii) of the G criterion V must be one of the Z's, and both p_l and p_{l′} must point to V. If the edges of τ_f and τ_{f+1} adjacent to V both point to V, then V is a collider in path q and the lemma follows. In the remaining case, τ_f cannot be the first segment of path q, and repeating the argument with the earlier segments eventually contradicts the assumed ordering of the variables. Thus q is blocked, and the lemma follows. □

Theorem 1 If the set of variables Z = {Z_1, …, Z_k} satisfies the G criterion with respect to Y, then Z is an Auxiliary Set for Y.

Proof: The system of equations Φ can be written in matrix form as

A λ = ρ

where λ = [λ_1, …, λ_m]′, A is a k × m matrix, and ρ = [ρ_{YZ_1}, …, ρ_{YZ_k}]′. Let Q denote the submatrix corresponding to the first k columns of A. We will show that det(Q) ≠ 0, which implies that rank(A) = k and the equations in Φ are linearly independent with respect to the λ_j's. Applying the definition of determinant, we obtain

det(Q) = Σ_σ sgn(σ) Π_{i=1}^{k} Q_{i σ(i)}    (3.1)

where the summation ranges over all permutations σ of {1, …, k}, and sgn(σ) denotes the parity of permutation σ.
First, observe that entry Q_{ij} collects the terms of the unblocked paths between Z_i and Y that include the edge from E_Y with parameter λ_j. In particular, p_i is one of these paths, and we can write Q_{ii} = T(p_i) + Q′_{ii}. This implies that the term Π_{i=1}^{k} T(p_i) appears in the summand corresponding to the identity permutation. Also, note that every factor in Π_i T(p_i) is the parameter of an edge in some p_l. On the other hand, any term in the summand of a permutation distinct from the identity must contain a factor from some entry Q_{ij}, with i ≠ j. Such an entry corresponds to unblocked paths between Z_i and Y including the edge from E_Y with parameter λ_j. But Lemma 4 says that those paths must have at least one edge that does not appear in any of p_1, …, p_k. This implies that the term Π_i T(p_i) is not cancelled out by any other term in (3.1), and so det(Q) does not vanish, completing the proof of the theorem. □

3.4 Model Identification Using Auxiliary Sets

Assume that for each variable Y there is an Auxiliary Set Z_Y, with |Z_Y| = |E_Y|. This implies that for each Y there exists a system of linear equations that can be solved uniquely for the parameters λ_1, …, λ_m of the edges in E_Y. This fact, however, does not guarantee the identification of the λ_j's, because the solution of each system is a function of the coefficients of the linear equations, which may depend on non-identified parameters.

To prove identification we need to find an appropriate order in which to solve the systems of equations. This order will depend on the variables that compose each auxiliary set. The following theorem gives a simple sufficient condition for identification:

Theorem 2 Assume that the Auxiliary Set Z_Y of each variable Y satisfies:

(i) |Z_Y| = |E_Y|;

(ii) depth(Z) ≤ depth(Y), for all Z ∈ Z_Y.

Then, the model is identified.

Proof: We prove the theorem by induction on the depth of the variables. Let Y be a variable at depth 0. Note that E_Y can only contain bidirected edges connecting Y to another variable at depth 0. Let Z ↔ Y be such an edge. Observing that the only unblocked path between Z and Y consists precisely of the edge Z ↔ Y, we get that the parameter of this edge is identified and given by ρ_{ZY}.
Now, assume that, for every variable at depth smaller than d, the parameters of the edges in its set of incoming edges are identified. Let Y be a variable at depth d. Lemma 1 implies that every intermediate variable of an unblocked path between Y and some Z ∈ Z_Y has depth smaller than d. The inductive hypothesis then implies that the coefficients of the linear equations in Φ(Y, Z_Y) are identified. Hence, the parameters of the edges in E_Y are identified. □

In the general case, however, the auxiliary set for some variable Y may contain variables at greater depth than Y, or even descendants of Y. This would force us to solve the systems of equations in an order different from the one established by the depth of the variables. In the following, we provide two rules that impose restrictions on the order in which the linear systems must be solved. We will see that if these restrictions do not generate a cycle, then the model is identified. Let Φ_Y denote the system of equations associated with variable Y.

R1: If, for every bidirected edge X ↔ Y in E_Y, variable X is not an ancestor of any Z ∈ Z_Y, then Φ_Y can be solved at any time.

R2: Otherwise,

a) for every Z ∈ Z_Y, Φ_Z must be solved before Φ_Y is solved;

b) if Z is a descendant of Y, and X ↔ W is a bidirected edge with X an ancestor of Z, then for every variable V lying on a chain from Y to Z (see Figure 3.4), Φ_V must be solved before Φ_Y.

[Figure 3.4: Example illustrating rule R2(b).]

For a model M and a given choice of Auxiliary Sets, the restrictions above can be represented by a directed graph GD, called the dependence graph, as follows: each node in GD corresponds to a variable in the model; there exists a directed edge from X to Y in GD if rule R1 does not apply to Y, and rule R2 imposes that Φ_X must be solved before Φ_Y.

The next theorem states our general sufficient condition for model identification:

Theorem 3 Assume that there exist Auxiliary Sets for each variable in model M such that the associated dependence graph has no directed cycles. Then, model M is identified.

The proof of the theorem is given in Appendix A. Figure 3.5 shows an example that illustrates the method just described.
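The acyclicity requirement on the dependence graph can be checked mechanically with an ordinary topological sort. A minimal sketch (the function name and the example dependencies are my own illustration, not the thesis' notation):

```python
from collections import deque

def solving_order(pred):
    """Order in which the systems can be solved, or None if the
    dependence graph has a directed cycle (Kahn's algorithm).

    pred -- pred[v] is the set of variables whose systems must be
            solved before the system of v.
    """
    indeg = {v: len(p) for v, p in pred.items()}
    succ = {v: [] for v in pred}
    for v, ps in pred.items():
        for u in ps:
            succ[u].append(v)
    queue = deque(v for v, d in indeg.items() if d == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return order if len(order) == len(pred) else None

# Acyclic restrictions: a solving order exists and Theorem 3 applies.
assert solving_order({"Z": set(), "W": {"Z"}, "Y": {"Z", "W"}}) == ["Z", "W", "Y"]
# A cycle between two systems: Theorem 3 does not apply.
assert solving_order({"A": {"B"}, "B": {"A"}}) is None
```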
Apparently, this is a very simple model; however, it actually requires the full generality of Theorem 3. The figure also shows the auxiliary sets for each variable and the corresponding dependence graph. The fact that rule R1 can be applied to variable Z avoids a dependence of Φ_Z on the other systems, and eliminates the possibility of a cycle.

[Figure 3.5: Example illustrating the Auxiliary Sets method. R1 applies to Z; R2 applies to Y and W.]

3.5 Simpler Conditions for Identification

The sufficient condition for identification presented in the previous section is very general, but complicated to verify by visual inspection of the causal diagram. The main difficulty resides in finding the required unblocked paths witnessing that a set of variables is an Auxiliary Set. A solution to this problem is provided in Chapter 4, where we develop an algorithm that finds an appropriate Auxiliary Set for a fixed variable Y. However, simple conditions for identification still seem to be useful, and this is the subject of this section.

3.5.1 Bow-free Models

A bow-free model is characterized by the property that no pair of variables is connected by more than one edge. This represents the simplest situation for our method of identification.

Corollary 1 Every bow-free model is identified.

Proof: Fix an arbitrary variable Y, and let Z = {Z_1, …, Z_m} be the set of variables such that, for each i, the edge between Z_i and Y belongs to E_Y. Since the model is bow-free, it follows that |Z| = |E_Y|. Moreover, the set of paths p_1, …, p_m, where each p_i is the trivial path consisting of the single edge between Z_i and Y, witnesses that Z is an Auxiliary Set for Y. Now, let ≺ be the ordering used in the construction of the sets of incoming edges. Then, for every Y, all the variables in Z_Y appear before Y in ≺. This implies that the dependence graph is acyclic, and the corollary follows. □

[Figure 3.6: Example of a bow-free model.]

Figure 3.6 shows an interesting example.
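The bow-free property itself is trivial to check mechanically. A sketch assuming a simple edge-set representation (mine, not the thesis'):

```python
def is_bow_free(directed, bidirected):
    """True iff no pair of variables is connected by more than one edge.

    directed   -- set of (x, y) tuples for edges x -> y
    bidirected -- set of frozenset({x, y}) for edges x <-> y
    """
    pairs = [frozenset(e) for e in directed] + list(bidirected)
    return len(pairs) == len(set(pairs))

# The model Z -> X -> Y with Z <-> Y is bow-free ...
assert is_bow_free({("Z", "X"), ("X", "Y")}, {frozenset({"Z", "Y"})})
# ... but adding X <-> Y creates a bow-arc on the pair (X, Y).
assert not is_bow_free({("Z", "X"), ("X", "Y")},
                       {frozenset({"X", "Y"}), frozenset({"Z", "Y"})})
```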
A brief examination of this causal diagram reveals that no conditional independence holds among the variables. As a consequence, most traditional methods for identification (e.g., Instrumental Variables, Back-door criterion) would fail to classify this model as identified. However, as can easily be verified, this model is bow-free and hence identified.

3.5.2 Instrumental Condition

From the preceding result, it is clear that all the problems for identification arise from the existence of bow-arcs in the causal diagram (i.e., pairs of variables connected by both a directed and a bidirected edge). Let us examine this structure in more detail and try to understand why it represents a problem for identification.

Assume that there is a bow-arc between variables X and Y, and let λ and δ denote the parameters of the directed and bidirected edges in the bow, respectively. Then, Wright's decomposition for the correlation ρ_{XY} gives(3):

ρ_{XY} = λ + δ

Now, if this is the only constraint on the values of λ and δ, then there exists an infinite number of solutions for λ and δ that are consistent with the observed correlation ρ_{XY}. This situation occurs, for example, when both edges in the bow point to Y, and there is no other edge in the model with an arrowhead pointing to Y. In this case, by analyzing the decomposition of the correlations between any pair of variables, we observe that either they do not depend on the values of λ, δ at all, or they only depend on their sum (see Figure 3.7(a) for an example). From this observation it is possible to derive a proof that any such model is non-identified. Similar techniques will be explored in Chapter 6 to obtain graphical conditions for non-identification.

Now, consider the situation where there exists a third variable Z with a directed edge pointing to X. A variable with such properties is sometimes called an Instrumental Variable. Figure 3.7(b) shows an example of this situation, with the respective decompositions of correlations.

(3) Here, we are assuming that there is no other unblocked path between X and Y.
It is easy to verify that there exists a unique solution for the parameters λ, δ in terms of the observed correlations. Hence, the basic idea for our next sufficient condition for identification is to find, for each bow-arc between X and Y, a distinct variable Z with an edge pointing to X. The condition is precisely stated as follows.

[Figure 3.7: Examples illustrating the instrumental condition.]

For a fixed variable Y, let Z = {Z_1, …, Z_k} be the set of variables that are connected to Y by both a directed and a bidirected edge, both pointing to Y.

Instrumental Condition: We say that a variable Y satisfies the Instrumental Condition if, for each variable Z_i ∈ Z, there exists a unique variable W_i satisfying:

(i) W_i ∉ Z;

(ii) W_i is not connected to Y by any edge;

(iii) there exists an edge between W_i and Z_i that points to Z_i.

Corollary 2 Assume that the Instrumental Condition holds for every variable in model M. Then model M is identified.

CHAPTER 4

Algorithm

In some elaborate models, it is not an easy task to check whether a set of variables satisfies the G criterion. Moreover, the criterion itself does not provide any guidance for finding a set of variables satisfying its conditions. In this chapter we present an algorithm that finds an Auxiliary Set for a fixed variable Y, if any such set exists. The basic idea is to reduce the problem to an instance of the maximum-flow problem in a network.

Cormen et al. [CCR90] define the maximum-flow problem as follows. A flow network G = (V, E) is a directed graph in which each edge (u, v) has a nonnegative capacity c(u, v). We distinguish two vertices in the flow network: a source s and a sink t. A flow in G is a real-valued function f : V × V → R satisfying:

(i) f(u, v) ≤ c(u, v), for all u, v ∈ V;

(ii) f(u, v) = −f(v, u), for all u, v ∈ V;

(iii) Σ_{v ∈ V} f(u, v) = 0, for all u ∈ V − {s, t}.
That is, condition (i) states that the amount of flow on any edge cannot exceed its capacity; condition (ii) says that the amount of flow running in one direction of an edge is the same as the flow in the other direction, but with opposite sign; and condition (iii) establishes that the amount of flow entering any vertex must be the same as the amount of flow leaving the vertex.

Intuitively, the value of a flow is the amount of flow that is transferred from the source s to the sink t, and can be formally defined as

|f| = Σ_{v ∈ V} f(s, v)

In the maximum-flow problem, we are given a flow network G, with source s and sink t, and we wish to find a flow of maximum value from s to t.

Before describing the construction of the flow network, we make a few observations. Fix an arbitrary variable Y.

Lemma 5 Assume that the set of variables Z satisfies the conditions of the G criterion with respect to Y. Then, there exists a set of variables Z′ and paths p′_1, …, p′_k such that:

(i) |Z′| = |Z|;

(ii) p′_1, …, p′_k witness that Z′ satisfies the G criterion;

(iii) for each i, every intermediate variable in p′_i is an ancestor of Y.

Proof: Let Z = {Z_1, …, Z_k}, and let p_1, …, p_k be paths witnessing that Z satisfies the G criterion. The lemma follows from the next observation. Let V be an intermediate variable in the path p_i associated with variable Z_i, and let p_i[V, Y] denote the subpath of p_i from V to Y. Then the paths p_1, …, p_{i−1}, p_i[V, Y], p_{i+1}, …, p_k witness that {Z_1, …, Z_{i−1}, V, Z_{i+1}, …, Z_k} satisfies the G criterion with respect to Y. □

Now, let Z be a set of variables, and p_1, …, p_k be paths satisfying the conditions of Lemma 5. Then, it follows that each of the paths p_i must be either a bidirected edge Z_i ↔ Y, or a chain Z_i → ⋯ → Y, or the concatenation of a bidirected edge with a chain Z_i ↔ V → ⋯ → Y.

From these observations we conclude that, in the search for an Auxiliary Set for Y, we only need to consider:

- ancestors of Y;

- variables connected by a bidirected edge to either Y or an ancestor of Y.

Now, from condition (ii) of the G criterion, we get that an ancestor of Y can appear in at most two paths: (1) the path between itself and Y; and (2) some path p_i, as an intermediate variable.
To allow this possibility, for each ancestor of Y we create two vertices in the flow network. Since non-ancestors of Y can appear in at most one path, there will be only one vertex in the flow network corresponding to each such variable. Directed edges between ancestors of Y in the causal diagram are represented by directed edges between the corresponding vertices in the flow network. Bidirected edges incident to Y and to non-ancestors of Y are also represented by directed edges. Bidirected edges between ancestors of Y require special treatment, because they can appear as the first edge of either of two paths, but not of both of them. To enforce this restriction we make use of an extra vertex.

Next, we define the flow network G(Y) that will be used to find an Auxiliary Set for Y. The set of vertices of G(Y) consists of:

- for each ancestor X of Y, two vertices, denoted V_X⁻ and V̂_X;

- for each non-ancestor W, a vertex V_W;

- for each bidirected edge X ↔ Z connecting ancestors of Y, a vertex V_{XZ};

- a source vertex s;

- a sink vertex t, corresponding to variable Y.

The set of edges of G(Y) mirrors the structure of the causal diagram, with one group of edges for each of the following cases: an internal edge for each ancestor of Y; an edge for each directed edge of the causal diagram connecting ancestors of Y; an edge for each directed edge pointing to Y; an edge for each bidirected edge incident to Y; an edge for each bidirected edge between a non-ancestor and an ancestor of Y; an edge for each bidirected edge incident to a non-ancestor; four edges, through the extra vertex V_{XZ}, for each bidirected edge X ↔ Z between ancestors of Y; and, finally, an edge leaving the source s for each candidate variable.

To solve the maximum-flow problem on the flow network defined above, we assign capacity 1 to every edge in G(Y).
We also impose the additional constraint of a maximum incoming flow capacity of 1 on every vertex in G(Y) (this can be implemented by splitting each vertex into two and connecting them by an edge of capacity 1), except for the vertices s and t. We solve the max-flow problem using the Ford–Fulkerson algorithm. From the integrality theorem ([CCR90], p. 603), the computed flow allocates a non-negative integer amount of flow to each edge in G(Y). Since we assign capacity 1 to every edge, this solution corresponds to a set of edge-disjoint directed paths from s to t. The Auxiliary Set returned by the algorithm is simply the set of variables corresponding to the first vertex in each path.

Theorem 4 The algorithm described above is sound and complete. That is, the set of variables returned by the algorithm is an Auxiliary Set for Y with maximum size.

Proof: Fix a variable Y in model M, and let G(Y) be the corresponding flow network. Let f be the flow computed by the Ford–Fulkerson algorithm on G(Y), and let k = |f|. As described above, we interpret the flow solution as a set of edge-disjoint paths q_1, …, q_k from the source s to the sink t.

First, we show that each such path corresponds to an unblocked path in the causal diagram of M. Let q = ⟨s, v_1, …, v_r, t⟩ be one of the paths in the solution. We make the following observations:

- v_1 is either a vertex corresponding to a non-ancestor of Y, or a vertex corresponding to an ancestor of Y in the causal diagram;

- for 1 < i ≤ r, each v_i is either a vertex corresponding to an ancestor of Y, or a vertex corresponding to a bidirected edge between ancestors of Y, but the latter can only occur in the position permitted by the construction.

Now, we establish a correspondence between the edges of q and the edges of the causal diagram: the edge entering t corresponds to the edge of the causal diagram incident to Y; an edge between vertices of ancestors of Y corresponds to the directed edge between those ancestors; and the edges incident to a vertex V_{XZ} correspond to the bidirected edge X ↔ Z.
Now, let the first vertices of the paths q_1, …, q_k correspond to variables Z_1, …, Z_k in the causal diagram. Note that the constraint of a maximum incoming flow capacity of 1 on the vertices of G(Y) implies that the paths q_1, …, q_k are vertex-disjoint (and also implies that the variables Z_1, …, Z_k are all distinct). However, this constraint does not imply that the corresponding paths in the causal diagram are vertex-disjoint. For an ancestor X of Y, it is possible that the vertices V_X⁻ and V̂_X appear in distinct paths q_i and q_j, and so X would appear in two of the corresponding paths in the causal diagram. In fact, this is the only possibility for a variable to appear in more than one such path, and in this case it is easy to verify that the conditions of the G criterion hold. This proves that the algorithm is sound. Completeness easily follows from the construction of the flow network and the optimality of the Ford–Fulkerson algorithm. □

[Figure 4.1: A causal diagram and the corresponding flow network.]

Figure 4.1 shows an example of a simple causal diagram and the corresponding flow network.

Theorem 5 The time complexity of the algorithm described above is O(n³).

Proof: The theorem easily follows from the facts that the number of vertices in the flow network is proportional to the number of variables in the model, and that the Ford–Fulkerson algorithm runs in time O(n³) on a flow network of size n with unit capacities. □

CHAPTER 5

Correlation Constraints

It is a well-known fact that the set of structural assumptions defining a SEM model may impose constraints on the covariance matrix Σ of the observed variables [McD97].
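Since every capacity equals 1, the Ford–Fulkerson scheme reduces to repeatedly finding one augmenting path and pushing a single unit along it. A self-contained sketch of this step (the encoding of G(Y) itself is omitted; vertex names are illustrative):

```python
from collections import deque

def max_flow_unit(edges, s, t):
    """Value of a maximum flow in a unit-capacity network
    (Ford-Fulkerson with BFS augmenting paths)."""
    cap, adj = {}, {}
    for u, v in edges:
        cap[(u, v)] = cap.get((u, v), 0) + 1
        cap.setdefault((v, u), 0)            # residual capacity
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    flow = 0
    while True:
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:     # BFS for an augmenting path
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        v = t                                # push one unit along the path
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1

# Two vertex-disjoint s-t paths: maximum flow value 2, i.e. two
# candidate variables for the Auxiliary Set in this toy network.
assert max_flow_unit([("s", "a"), ("a", "t"), ("s", "b"), ("b", "t")],
                     "s", "t") == 2
```

With unit capacities the number of augmentations is at most the flow value, which is what gives the polynomial bound quoted in Theorem 5.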
That is, a given set of structural assumptions may imply that the value of a particular entry in Σ is a function of some other entries in this matrix. An immediate consequence of this observation is that a model may not be compatible with an observed covariance matrix Σ*, in the sense that for every parametrization θ we have Σ(θ) ≠ Σ*. This allows us to test the quality of the model. That is, by verifying whether the constraints imposed by the structural assumptions are satisfied in the observed covariance matrix (at least approximately), we either increase our confidence in the model, or decide to modify (or discard) it.

The first type of constraint imposed by a SEM model is associated with d-separation conditions. Such a condition is equivalent to a conditional independence statement [Pea00a], and implies that a corresponding correlation coefficient (or partial correlation) must be zero. Figure 5.1 shows two examples illustrating how we can obtain correlation constraints from d-separation conditions.

[Figure 5.1: D-separation conditions and correlation constraints.]

In the causal diagram of Figure 5.1(a), it is easy to see that variables X and W are d-separated (the only path between them is blocked by variable Z). This immediately gives ρ_{XW} = 0. For the model in Figure 5.1(b), it follows that variables X and W are d-separated when conditioning on Z. This implies that ρ_{XW·Z} = 0. Applying the recursive formula for partial correlations, we get:

ρ_{XW·Z} = (ρ_{XW} − ρ_{XZ} ρ_{ZW}) / √((1 − ρ²_{XZ})(1 − ρ²_{ZW})) = 0

and so ρ_{XW} = ρ_{XZ} ρ_{ZW}.

The d-separation conditions, however, do not capture all the constraints imposed by the structural assumptions. In the model of Figure 5.1(c), no d-separation condition holds, but the following algebraic analysis shows that there is a constraint involving the correlations among the variables. Applying Wright's decomposition to these correlation coefficients, we obtain a pair of equations such that, factoring out a common parameter on the right-hand side of one equation, we obtain the right-hand side of the other.
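A minimal numeric check of this constraint (my sketch; it assumes the model in Figure 5.1(b) is the chain X → Z → W with standardized variables, which may differ from the actual figure):

```python
from math import sqrt

def partial_corr(r_xw, r_xz, r_zw):
    """Recursive formula for the partial correlation rho_{XW.Z}."""
    return (r_xw - r_xz * r_zw) / sqrt((1 - r_xz ** 2) * (1 - r_zw ** 2))

# For an assumed chain X -> Z -> W, Wright's rule gives rho_XZ = b,
# rho_ZW = c and rho_XW = b*c, so the partial correlation vanishes.
b, c = 0.6, 0.3
assert abs(partial_corr(b * c, b, c)) < 1e-12
```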
Thus, we can write one correlation as a multiple of the other, and similarly for a second pair of correlations. It is then easy to see that these two relations imply a constraint among the correlations that does not correspond to any conditional independence. Clearly, this type of analysis is not appropriate for complex models. In the following we show how to use the concept of Auxiliary Sets to compute constraints in a systematic way.

5.1 Obtaining Constraints Using Auxiliary Sets

The idea is simple, but gives a general method for computing correlation constraints. Fix a variable Y, and let λ_1, …, λ_m denote the parameters of the edges in E_Y. Recall that if Z = {Z_1, …, Z_k} is an Auxiliary Set for Y, then the decomposition of the correlations ρ_{YZ_1}, …, ρ_{YZ_k} gives a linearly independent system of equations with respect to the λ_j's. Let us denote this system by Φ.

Now, if k = m, then this system of equations is maximal, in the sense that any linear equation in the parameters λ_1, …, λ_m can be expressed as a linear combination of the equations in Φ. Hence, computing this linear combination for the decomposition of ρ_{YW}, where W ∉ Z, gives a constraint involving the correlations among Y, W, Z_1, …, Z_k.

In the following, we describe the method in more detail. Let Z = {Z_1, …, Z_m} be an Auxiliary Set for Y, with |Z| = |E_Y| = m. The decomposition of the correlations ρ_{YZ_1}, …, ρ_{YZ_m} can be written as:

ρ_{YZ_i} = Σ_{j=1}^{m} a_{ij} λ_j,  i = 1, …, m    (5.1)

or, in matrix form, A λ = ρ. Since Z is an Auxiliary Set and |Z| = m, it follows that A is an m × m matrix with rank m. Now, let W ∉ Z, and write the decomposition of ρ_{YW} as:

ρ_{YW} = Σ_{j=1}^{m} b_j λ_j    (5.2)

Clearly, the vector b = [b_1, …, b_m] can be expressed as a linear combination of the rows of matrix A as

b = Σ_{i=1}^{m} c_i A_i

where A_i denotes the i-th row of matrix A, and the c_i's are the coefficients of the linear combination. But this implies that we can express the correlation ρ_{YW} as

ρ_{YW} = Σ_{i=1}^{m} c_i ρ_{YZ_i}

by considering the left-hand sides of Equations (5.1) and (5.2).

[Figure 5.2: Example of a model that imposes correlation constraints.]

We also note that the coefficients c_i are functions of the parameters of the model. But if the model is identified, then each of these parameters can be expressed as a function of the correlations among the observed variables.
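For the 2 × 2 case, the coefficients c_i can be obtained by Cramer's rule applied to Aᵀc = b; the resulting identity ρ_{YW} = c_1 ρ_{YZ_1} + c_2 ρ_{YZ_2} is the constraint. A self-contained sketch (the matrix entries are arbitrary illustrative numbers, not taken from Figure 5.2):

```python
def row_combination_2x2(A, b):
    """Coefficients (c1, c2) with b = c1*A[0] + c2*A[1], via Cramer's rule."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a21 * a12          # det(A) = det(A transpose)
    c1 = (b[0] * a22 - a21 * b[1]) / det
    c2 = (a11 * b[1] - b[0] * a12) / det
    return c1, c2

A = [[1.0, 2.0], [3.0, 4.0]]
b = [5.0, 6.0]
c1, c2 = row_combination_2x2(A, b)
# b really is c1*A1 + c2*A2, so rho_YW = c1*rho_YZ1 + c2*rho_YZ2.
assert all(abs(c1 * A[0][j] + c2 * A[1][j] - b[j]) < 1e-12 for j in range(2))
```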
Hence, we obtain an expression only in terms of those correlations.

To illustrate the method, let us consider the model in Figure 5.2. It is not difficult to check that the set {Z, X_1, X_2, X_3} is an Auxiliary Set for variable Y. Next, we obtain a constraint involving the correlation ρ_{YW}. The decompositions of the correlations between Y and each variable in the Auxiliary Set give a system of four equations in the parameters λ_1, …, λ_4, and the decomposition of ρ_{YW} gives one additional equation, with the corresponding matrix A and vector b. After some calculations, one can verify that vector b can be expressed as a linear combination of the first and fourth rows of A as

b = c_1 A_1 + c_4 A_4    (5.3)

Since the model is identified, we can express the coefficients c_1 and c_4 in terms of the correlation coefficients. Substituting these expressions in Equation (5.3) and performing some algebraic manipulations, we obtain a polynomial constraint on the observed correlations.

CHAPTER 6

Sufficient Conditions For Non-Identification

The ultimate goal of this research is to solve the problem of identification in SEM, that is, to obtain a necessary and sufficient condition for identification based only on the structural assumptions of the model. In Chapter 3, we introduced our graphical approach to identification and provided a very general sufficient condition for model identification. Indeed, we are not aware of any example of an identified model that cannot be proven to be so using the method of auxiliary sets. Here, we present the results of our initial efforts on the other side of the problem. That is, we investigate necessary graphical conditions for the identification of a SEM model.

The method of auxiliary sets imposes two main conditions to classify a model as identified. First, for each variable Y in the model we must find a sufficiently large auxiliary set. Second, given the choice of auxiliary sets, a precedence relation is established among the variables and represented by a dependence graph. This graph cannot contain any directed cycle.
A natural strategy, then, is to assume that one of these conditions does not hold, and try to prove that the model is non-identified. The proof of non-identification is conceptually simple. We begin with an arbitrary parametrization θ for model M. Then, we show that, under the specified conditions, it is possible to construct another parametrization θ′ ≠ θ such that Σ(θ) = Σ(θ′). This proves that the model is non-identified.

[Figure 6.1]

6.1 Violating the First Condition

In this section we analyze two sets of conditions that prevent the existence of a sufficiently large Auxiliary Set for a given variable Y. In both cases we can show that they imply the non-identification of the underlying model. These conditions, however, do not provide a complete characterization of the situations in which the first condition of the Auxiliary Sets method fails. At the end of this section, we give an example that illustrates this fact. At this point, such a characterization is the major difficulty in obtaining a more general condition for non-identification.

The simplest example of non-identification, briefly discussed in Section 3.5.2, consists of a model in which a pair of variables X, Y is connected by both a directed and a bidirected edge, both pointing to Y, and there is no other edge pointing to Y in the causal diagram (see Figure 6.1(a)). The next theorem provides a simple generalization of this situation. We assume that, for a fixed variable Y, there exists an Auxiliary Set Z with not enough variables. We also assume that every edge connecting a variable Z ∈ Z and a variable outside the set does not point to Z (see Figure 6.1(b)). Under these conditions, it is possible to show that there exists no Auxiliary Set for Y which is larger than Z. The theorem proves that any such model is non-identified.

Theorem 6 Let Y be an arbitrary variable in model M, and let Z = {Z_1, …, Z_k} be an Auxiliary Set for Y.
Assume that:

(i) A has fewer variables than the number of relevant parameters;
(ii) A only contains non-descendants of Y;
(iii) there exists no edge between some Z in A and a variable outside A pointing to the latter.

Then, model M is non-identified.

Proof: Let λ_1, ..., λ_m denote the parameters of the edges incident to Y. First, we establish that, for every pair of variables (U, V), the correlation ρ_UV depends on the values of the λ's only through the correlations ρ_{Y Z_1}, ..., ρ_{Y Z_k}. That is, any modification of the values of the λ's that does not change the values of ρ_{Y Z_1}, ..., ρ_{Y Z_k} also does not change the value of ρ_UV. This is formally stated as follows:

Lemma 6 Let U and V be arbitrary variables in model M. Then, the correlation ρ_UV can be expressed as a linear function of ρ_{Y Z_1}, ..., ρ_{Y Z_k}, where the independent term and the coefficients do not depend on the parameters λ_1, ..., λ_m.

The proof of the lemma is given in Appendix B. Now, fix an arbitrary parametrization θ for model M. This parametrization induces a unique correlation coefficient for each pair of variables. According to Wright's decomposition, the correlations ρ_{Y Z_1}, ..., ρ_{Y Z_k} depend on the values of the parameters λ_1, ..., λ_m through k linear equations. Since k < m, this system of equations has more variables than equations, and so there exists an infinite number of solutions for the λ's. The values assigned to the λ's by parametrization θ correspond to one of these solutions. Any other solution gives a parametrization θ′, distinct from θ, with the same induced covariance matrix. □

The next theorem considers a much weaker set of assumptions that still prevent the existence of an Auxiliary Set for Y with enough variables. The theorem shows that any such model is non-identified. Let Y be an arbitrary variable in model M, and let A be an Auxiliary Set for Y. Define the boundary of A, denoted bd(A), as the subset of variables Z in A for which there exists a variable outside A and an edge between the two pointing to Z.

Figure 6.2: (a) a causal diagram satisfying the conditions of Theorem 7; (b) a diagram in which they fail.

Theorem 7 Assume that for a given variable Y in model M, the following conditions hold:

(i) the Auxiliary Set A for Y does not have enough variables;
(ii) for any Z in bd(A), there exists no edge pointing to Z;
(iii) for any Z in bd(A) and any variable W outside A, there is no edge pointing to W.
Then, model M is non-identified.

The proof of this theorem is given in Appendix B. Figure 6.2(a) shows a causal diagram satisfying the conditions in Theorem 7. In this model, the Auxiliary Set for Y is given by {Z_1, Z_2, Z_3}, with boundary {Z_3}. Note that only two edges connect a variable outside the set to a variable in it, and neither of them points into the set. Figure 6.2(b) shows a causal diagram where the conditions of Theorem 7 do not hold: the Auxiliary Set and its boundary can be chosen analogously, but one of the edges violates condition (iii). However, it is still possible to show that there is no Auxiliary Set for Y with more than 5 variables.

6.2 Violating the Second Condition

In the previous section we studied the problem of identification under two sets of assumptions. In both cases we could show that, for every parametrization θ of model M, there exists an infinite number of distinct parametrizations of M that generate the same covariance matrix Σ(θ). We conjecture that this will be the case whenever the first condition of the method of Auxiliary Sets fails. Equivalently, we believe that the conditions that characterize an Auxiliary Set are also necessary for linear independence among the equations given by Wright's decompositions.

Surprisingly, the situation seems to be very different when the second condition fails. Next, we discuss an example in which every variable has a large enough Auxiliary Set, but the corresponding dependence graph contains a cycle. A simple algebraic analysis shows that, for almost every parametrization θ, there exists exactly one other distinct parametrization θ′ such that Σ(θ′) = Σ(θ).

At this point, we still cannot provide conditions for the existence of only a finite number of parametrizations generating the same covariance matrix, so we leave the problem open. However, this seems to be an important question. According to the definition in Section 2.1, any such model would be considered non-identified.
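The "exactly two parametrizations" phenomenon discussed here arises when the analysis reduces the remaining Wright equation to a quadratic whose leading coefficient does not vanish. A minimal numeric sketch, where the polynomial coefficients are hypothetical stand-ins for the model-dependent expression derived in the text:

```python
# If the surviving equation has the form  k2*g**2 + k1*g + k0 = 0  with
# k2 != 0, at most two real values of the parameter g are compatible with
# the observed correlations.  The coefficients below are illustrative only.
import numpy as np

coeffs = [2.0, -3.0, 1.0]        # 2*g**2 - 3*g + 1 = 0
roots = np.roots(coeffs)         # the two candidate parameter values
```

Each root, substituted back into the earlier expressions, yields one complete parametrization; as the text notes, domain knowledge must then decide which of the two is appropriate.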
On the other hand, if there exists only a small number of parametrizations compatible with the observed correlations, the investigator can decide to proceed with the SEM analysis, choosing, based on domain knowledge, the parametrization that is more appropriate for the case at hand.

Consider the model illustrated by Figure 6.3. Applying Wright's decomposition to the correlations between each pair of variables, we obtain a system of equations in the parameters α, β, γ, a, b, c.

Figure 6.3: A model in which every variable has a large enough Auxiliary Set, but the dependence graph contains a cycle.

Using the equations for two of the correlations, we can express two of the parameters in terms of the others. Similarly, we use two further equations to obtain expressions for another pair of parameters. Now, substituting these expressions into the remaining equation, and after some algebraic manipulation, we obtain a quadratic equation in one of the parameters, where it is possible to show that the coefficient of the quadratic term does not vanish. This implies that there exist exactly two distinct parametrizations for this model that generate the same covariance matrix; indeed, one can exhibit two explicit parametrizations that do so.

CHAPTER 7

Instrumental Sets

So far we have concentrated our efforts on questions related to the entire model, such as: "Is the model identified?", or "Does the model impose any constraint on the correlations among observed variables?". In this chapter we focus our attention on the identification of specific subsets of parameters of the model. This goal is based on the observation that, even when we cannot prove the identification of the model (or when the model is actually non-identified), we may still be able to show that the value of some parameters is uniquely determined by the structural assumptions in the model and the observed data.
This type of result may be valuable in situations where, even though a model is specified for all the observed variables, the main object of study consists of the relations among a small subset of those variables. We also restrict ourselves to the identification of parameters associated with directed edges in the causal diagram. Those parameters represent the strength of (direct) causal relationships, and are usually more important than the spurious correlations (associated with bidirected edges).

The main result of the chapter is a sufficient condition for the identification of the parameters of a subset of edges with arrowhead pointing to a variable Y (i.e., directed edges X_i → Y). An important characteristic of this condition is that it does not depend on the identification of any other parameter in the model.

This result actually generalizes the graphical version of the method of Instrumental Variables [Pea00a]. According to this method, the parameter of edge X → Y is identified if we can find a variable Z and a set of variables W satisfying specific d-separation conditions. A more detailed explanation of this method is given in the next section. Our contribution is to extend this method to allow the use of multiple pairs (Z_1, W_1), ..., (Z_k, W_k) to prove, simultaneously, the identification of the parameters of directed edges X_1 → Y, ..., X_k → Y. As we will see later, there exist examples in which our method of Instrumental Sets proves the identification of a subset of parameters, but the application of the original method to the individual parameters would fail.

Before proceeding, we would like to call attention to a few aspects of the proof technique developed in this chapter, which seem to be of independent interest. The method imposes two main conditions for a set of variables to be an Instrumental Set. First, we require the existence of a number of unblocked paths with specific properties.
This condition is very similar to the one in the GAV Criterion, and basically ensures the linear independence of a system of equations. The second consists of d-separation conditions between Y and each Z_i, given the variables in W_i. Now, to be able to take advantage of these d-separation conditions, we have to work with partial correlations instead of the standard correlation coefficients used in Chapter 3. The problem, however, is that no technique similar to Wright's decomposition was available to express partial correlations as linear expressions in the parameters of interest. This difficulty is overcome by developing a new decomposition of the partial correlation into a linear expression in terms of the correlations among the variables involved. This leads to a linear equation on the parameters of the model, obtained by applying Wright's decomposition to each of the correlation coefficients. The d-separation conditions then imply that only the parameters under study appear in these equations. The result finally follows by showing that all the coefficients in the linear equations can be estimated from data.

7.1 Causal Influence and the Components of Correlation

This section introduces some concepts that will be used in the description of Instrumental Variables methods. Intuitively, we say that variable X has causal influence on variable Y if changes in the value of X lead to changes in the value of Y. This notion is captured in the graphical language as follows. An observed variable X has causal influence on variable Y if there exists at least one path in the causal diagram, consisting only of directed edges, going from X to Y. This path reflects the existence of variables V_1, ..., V_m such that: X appears on the r.h.s. of the equation for V_1; each V_i appears on the r.h.s. of the equation for V_{i+1}; and V_m appears on the r.h.s. of the equation for Y. Now, it is easy to see that any change in the value of X would propagate through the V_i's and affect the value of Y.
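Under this graphical reading, causal influence between observed variables is simply directed-path reachability in the causal diagram. A minimal sketch (the tiny diagram and its variable names are illustrative, not taken from the text):

```python
# X has causal influence on Y iff the causal diagram contains a directed
# path from X to Y.  Diagram encoded as dict: node -> list of children.

def reaches(graph, x, y, seen=None):
    """True iff there is a directed path from x to y in graph."""
    if x == y:
        return True
    seen = seen or set()
    seen.add(x)
    return any(reaches(graph, child, y, seen)
               for child in graph.get(x, []) if child not in seen)

# X -> V1 -> Y, and a separate W -> Y: X and W influence Y, but not
# conversely, matching the propagation argument in the text.
diagram = {"X": ["V1"], "V1": ["Y"], "W": ["Y"], "Y": []}
```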
Similarly, an error term has causal influence on variable Y if either there exists a directed path from its associated variable to Y, or a path consisting of a bidirected edge concatenated with a directed path leading to Y (the error term can be viewed as sitting on top of the bidirected edge). Note that observed variables do not have causal influence on error terms, since the model contains no equations for the error terms, and the causal relationships among error terms are not specified by the model.

We also say that the causal influence of a variable X (or some error term) on Y is mediated by a variable V if V appears in every directed path from X to Y (and in every unblocked path starting with a bidirected edge, in the case of error terms).

The correlation between two observed variables X and Y can then be explained as the sum of two components:

- the causal influence of X on Y; and
- the correlation created by other variables (or error terms) which have causal influence on both X and Y (sometimes called spurious correlation).

The problem of identification consists in assessing the strength of the (direct) causal influence of one variable on another, given the structure of the model and the correlations between observed variables. This requires the ability to separate the portions of the correlation contributed by each of the components above.

7.2 Instrumental Variable Methods

The traditional definition qualifies variable Z as instrumental, relative to a cause X and effect Y, if [Pea00a]:

1. Every error term with causal influence on Y not mediated by X is independent of Z (i.e., uncorrelated);
2. Z is not independent of X.

Property (1) implies that the correlation between Z and Y is created by the correlation between Z and X and the causal influence of X on Y, represented by parameter c. This implies that ρ_ZY = ρ_ZX · c. Property (2), namely ρ_ZX ≠ 0, allows us to obtain the value of c by writing c = ρ_ZY / ρ_ZX. Hence, parameter c is identified.

Figure 7.1: Typical Instrumental Variable: Z → X → Y, with X and Y confounded by a bidirected edge; the parameter of Z → X is a, and that of X → Y is c.

Figure 7.1 shows a typical example of the use of an instrumental variable.
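The identity c = ρ_ZY / ρ_ZX can be checked by simulation. The sketch below works with covariances rather than standardized correlations, for which the analogous ratio cov(Z,Y)/cov(Z,X) recovers the unstandardized coefficient; all structural coefficients and the sample size are arbitrary choices, not values from the text:

```python
# Simulating a model in the spirit of Figure 7.1: Z -> X -> Y, with a
# latent confounder u playing the role of the bidirected edge X <-> Y.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
a, c = 0.8, 0.5                       # Z -> X and X -> Y coefficients
u = rng.normal(size=n)                # latent confounder of X and Y
z = rng.normal(size=n)
x = a * z + u + rng.normal(size=n)
y = c * x + u + rng.normal(size=n)    # u biases a naive regression of y on x

# IV estimate: the confounder u is uncorrelated with z, so it cancels.
c_hat = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
```

With this sample size, c_hat lands close to the true value 0.5, while an ordinary least-squares regression of y on x would be biased upward by the confounder.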
In this model, it is easy to verify that variable Z has properties (1) and (2). Note that this causal diagram could be just a portion of a larger model; but, as long as properties (1) and (2) hold, variable Z can be used to obtain the identification of parameter c.

A generalization of the method of Instrumental Variables (IV) is offered through the use of conditional IV's. A conditional IV is a variable Z that does not have properties (1) and (2) above, but acquires them after conditioning on a set of variables W (with the independence statements replaced by conditional independence). When such a pair (Z, W) is found, the causal influence of X on Y is identified and given by the ratio of the corresponding partial correlations given W.

Now, given the graphical interpretation of causal influence provided in Section 7.1, we can express the properties of a conditional IV in graphical terms using d-separation [Pea00b]: Let M be a model, and let G denote its causal diagram. Variable Z is a conditional IV relative to edge X → Y if there exists a set of variables W (possibly empty) satisfying:

1. W contains only non-descendants of Y;
2. W d-separates Z from Y in the subgraph obtained from G by removing edge X → Y;
3. W does not d-separate Z from X in that subgraph.

Figure 7.2: Conditional IV Examples: (a) a model containing the edge X → Y; (b) the subgraph obtained by removing that edge.

As an example, let us verify that Z is a conditional IV for the edge X → Y in the model of Figure 7.2(a). The first step consists in removing the edge X → Y to obtain the subgraph shown in Figure 7.2(b). Now, it is easy to see that, after conditioning on variable W, Z becomes d-separated from Y, but not from X, in this subgraph.

7.3 Instrumental Sets

Before introducing our method of Instrumental Sets, let us analyze an example where the procedure above fails. Consider the model in Figure 7.3. A visual inspection of this causal diagram shows that variable Z_1 does not qualify as a conditional IV for the edge X_1 → Y. The problem here is that Z_1 cannot be d-separated from Y, no matter how we choose the set W: if a certain variable is left out of W, one unblocked path between Z_1 and Y remains open.
However, including that variable in W would open another path, on which it is a collider. By symmetry, we also conclude that Z_2 does not qualify as a conditional IV for the edge X_2 → Y. Hence, the identification of parameters c_1 and c_2 cannot be proved using the graphical criterion for the method of conditional IVs.

However, observe the subgraph in Figure 7.3(b), obtained by deleting edges X_1 → Y and X_2 → Y. It is easy to verify that properties (1) and (2) hold for Z_1 and Z_2 in this subgraph (by taking W = ∅ for both Z_1 and Z_2). Thus, the idea is to let Z_1 and Z_2 form an Instrumental Set, and try to prove the identification of parameters c_1 and c_2 simultaneously.

Figure 7.3: Simultaneous use of two IVs: (a) the original model; (b) the subgraph obtained by deleting edges c_1 and c_2; (c) a variant in which the parameters remain non-identified.

Note that the d-separation conditions are not sufficient to guarantee the identification of the parameters. In the model of Figure 7.3(c), variables Z_1 and Z_2 also become d-separated from Y after removing edges c_1 and c_2; however, parameters c_1 and c_2 are non-identified in this case. Similarly to the GAV criterion, we have to require the existence of specific paths between each of the Z_i's and Y.

Next, we give a precise definition of Instrumental Sets, in terms of graphical conditions, and state the main result of this chapter.

Definition 4 (Instrumental Sets) Let X_1, ..., X_k be parents of Y, and let {Z_1, ..., Z_k} be an arbitrary subset of variables. The set {Z_1, ..., Z_k} is said to be an Instrumental Set relative to Y and X_1, ..., X_k if there exist triplets (Z_1, W_1, p_1), ..., (Z_k, W_k, p_k) such that:

(i) for i = 1, ..., k, variable Z_i and the elements of W_i are non-descendants of Y;
(ii) let G' be the subgraph obtained by deleting edges X_1 → Y, ..., X_k → Y from the causal diagram of the model; then, for i = 1, ..., k, the set W_i d-separates Z_i from Y in G', but W_i does not block path p_i;
(iii) for i = 1, ..., k, p_i is an unblocked path between Z_i and Y including the edge X_i → Y;
(iv) for i ≠ j
, the only possible common variable in paths p_i and p_j (other than Y) is Z_j, and, in this case, both p_i and p_j must point to Z_j.

Theorem 8 If {Z_1, ..., Z_k} is an Instrumental Set relative to variable Y and the set of parents X_1, ..., X_k, then the parameters of edges X_1 → Y, ..., X_k → Y are identified, and can be computed by solving a system of linear equations.

The proof of the theorem is given in Appendix C.

Figure 7.4: More examples of Instrumental Sets.

Figure 7.4 shows more examples in which the method of conditional IVs fails, but our new criterion is able to prove the identification of the parameters c_i. Model (a) is a bow-free model, and thus is completely identifiable. Model (b) illustrates an interesting case in which variable X_2 is used as the instrument for c_1, while Z is the instrument for c_2. Finally, in model (c), we have an example in which the parameter of one of the edges is non-identifiable, and still the method can prove the identification of c_1 and c_2.

CHAPTER 8

Discussion and Future Work

An important contribution of this work was to offer a new approach to the problem of identification in SEM, based on the graphical analysis of the causal diagram of the model. This approach allowed us to obtain powerful sufficient conditions for identification, and a new method of computing correlation constraints.

This new approach has important advantages over existing techniques. Traditional methods [Dun75, Fis66] are based on algebraic manipulation of the equations that define the model. As a consequence, they have to handle the two types of structural assumptions contained in the model (namely, (a) which variables appear in each equation; and (b) how the error terms are correlated with each other) in different ways, which makes the analysis more complicated. The language of graphs, on the other hand, allows us to represent both types of assumptions in the same way, namely, by the presence or absence of edges in the causal diagram. This permits a uniform treatment in the analysis of identification.
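As a concrete illustration of how the graphical conditions cash out computationally, the Instrumental Sets criterion of Chapter 7 ultimately reduces identification to a linear system in observable covariances. The simulated model below is a hypothetical stand-in (not the exact structure of Figure 7.3): each Z_i enters only X_i, and a common latent confounder u correlates X_1, X_2, and Y.

```python
# With instruments Z1, Z2 for edges X1 -> Y and X2 -> Y, the parameters
# (c1, c2) solve:  cov(Zi, Y) = c1*cov(Zi, X1) + c2*cov(Zi, X2),  i = 1, 2.
import numpy as np

rng = np.random.default_rng(1)
n = 300_000
c1, c2 = 0.7, -0.4
z1, z2 = rng.normal(size=n), rng.normal(size=n)
u = rng.normal(size=n)                      # confounder of X1, X2, and Y
x1 = z1 + u + rng.normal(size=n)
x2 = z2 + u + rng.normal(size=n)
y = c1 * x1 + c2 * x2 + u + rng.normal(size=n)

A = np.array([[np.cov(z1, x1)[0, 1], np.cov(z1, x2)[0, 1]],
              [np.cov(z2, x1)[0, 1], np.cov(z2, x2)[0, 1]]])
b = np.array([np.cov(z1, y)[0, 1], np.cov(z2, y)[0, 1]])
c_hat = np.linalg.solve(A, b)               # recovers (c1, c2) jointly
```

The uniform, equational character of the reduction is exactly what the graphical analysis buys: once the Instrumental Set conditions hold, every quantity in the system is estimable from data.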
Another important advantage of our approach relates to conditional independencies implied by the model. Most existing methods [Fis66, BT84, Pea00b] strongly rely on conditional independencies to prove that a model is identified. As a consequence, such methods are not very informative when the model has few such relations. Since we do not make direct use of conditional independencies in our derivations, we can prove identification in many cases where most methods fail. Even the method of Instrumental Sets, which involves a d-separation test, is not very sensitive to conditional independencies, because the tests are performed on a potentially much sparser graph.

The sufficient conditions for identification obtained in this work correspond to the most general criteria currently available to SEM investigators. To the best of our knowledge, this is also the first work that provides necessary conditions for identification. Although these results answer many questions in practical applications of SEM, finding a necessary and sufficient condition for identification is still an outstanding open problem from a theoretical perspective. Another important question that remains open is the application of our graphical methods to non-recursive models. Since the basic tool for our analysis (i.e., Wright's decomposition of correlations) only applies to recursive models, this problem may require the development of new techniques.

APPENDIX A

(Proofs from Chapter 3)

Proof of Theorem 3: Fix an arbitrary variable Y with Auxiliary Set A = {Z_1, ..., Z_k}. Let X_1, ..., X_n be the set of parents of Y and, for each X_i, let δ_i denote the parameter of edge X_i → Y. Similarly, let S_1, ..., S_m be the set of variables such that a bidirected edge between S_j and Y belongs to the diagram and, for each S_j, let λ_j denote the parameter of edge S_j ↔ Y. [Note that we may have S_j = X_i for some pairs i, j.]

Assume first that condition C1 can be applied to the pair. Let Z be any variable from A. Then, we can write the decomposition of ρ_{YZ} as a linear expression in the δ_i's and λ_j's, and we show that the coefficients of the δ_i's and λ_j's are identified.
First, note that every unblocked path between Z and Y including edge X_i → Y can be decomposed into an unblocked path between Z and X_i and the edge X_i → Y. Moreover, every unblocked path between Z and X_i can be extended by edge X_i → Y to give an unblocked path between Z and Y. These facts imply that the coefficient of δ_i is identified and given by ρ_{Z X_i}. Now, let S_j ↔ Y be an arbitrary bidirected edge. If Z = S_j, then the coefficient of λ_j is 1, because there is only one unblocked path between Z and Y including this edge, which consists precisely of the edge itself. In the other case, any unblocked path between Z and Y including S_j ↔ Y should consist of a chain from Z to S_j concatenated with the edge S_j ↔ Y, but no such chain exists, so the coefficient is 0. Hence, every coefficient is identified.

Assume now that condition C2 is applied to the pair. Let X'_1, ..., X'_r be the set of parents of Z and, for each X'_l, let δ'_l denote the parameter of edge X'_l → Z. The idea is to replace the equation corresponding to the decomposition of ρ_{YZ} by the following linear combination:

ρ_{YZ} − Σ_l δ'_l ρ_{Y X'_l}    (*)

This equation is also linearly independent of all the other ones in the system (this has to be shown). However, since many of the unblocked paths between Z and Y can be obtained by concatenating some edge X'_l → Z with an unblocked path between X'_l and Y, many terms cancel out on the r.h.s. We also observe that the remaining term on the r.h.s. corresponds to paths that do not include edges pointing to Z, and is zero when Z is a non-descendant of Y (?).

The next lemma proves the identification of the coefficients of the λ_j's.

Lemma 7 Each coefficient of a λ_j in the r.h.s. of (*) is identified and has value either 0 or 1.

Proof: Fix a bidirected edge S_j ↔ Y with parameter λ_j. We consider a few separate cases.

Case 1: Z ≠ S_j and S_j is a non-descendant of Z. Any unblocked path between Z and Y including edge S_j ↔ Y must consist of a chain from Z to S_j concatenated with the edge S_j ↔ Y. Since S_j is a non-descendant
of Z, no such chain exists. Thus, the coefficient of λ_j in the decomposition of ρ_{YZ} is zero. Since each parent of Z is also a non-descendant of S_j, the coefficient of λ_j in the decomposition of each ρ_{Y X'_l} is also zero. Hence, the coefficient of λ_j in (*) is 0.

Case 2: Z = S_j. Note that, in this case, there is only one unblocked path between Z and Y including S_j ↔ Y, which consists precisely of the edge itself. Thus, the coefficient of λ_j in the decomposition of ρ_{YZ} is 1. Again, each parent of Z is a non-descendant of S_j, so the coefficient of λ_j in the decomposition of each ρ_{Y X'_l} is zero. Hence, the coefficient of λ_j in (*) is 1.

Case 3: S_j is a descendant of Z. Let C denote the set of chains from S_j to Z that do not include Y (for non-descendants of Y this is the set of all chains from S_j to Z). Clearly, each of these chains must include a parent of Z. Thus, we can partition the chains in C according to the parent of Z through which each of them enters Z. More precisely, let C_l denote the set of chains in C that include edge X'_l → Z. Now, we can write the coefficient of λ_j in the decomposition of ρ_{YZ} as the corresponding sum over the C_l's. Observing that each partial sum is, up to the factor δ'_l, the coefficient of λ_j in the decomposition of ρ_{Y X'_l}, we conclude that the coefficient of λ_j in (*) is 0. □

Now, it just remains to analyze the coefficients of the δ_i's. The next lemma takes care of the case when X_i is a non-descendant of Z.

Lemma 8 Let X_i be a non-descendant of Z. Then, the coefficient of δ_i in the decomposition (*) is identified and given by ρ_{Z X_i}.

Proof: Recall that δ_i is the parameter of directed edge X_i → Y. The lemma follows from the facts that every unblocked path between Z and X_i can be extended by edge X_i → Y to give an unblocked path between Z and Y, and that those are all the unblocked paths between Z and Y including edge X_i → Y. □

The identification of the coefficients of the δ_i's when X_i is a non-descendant of Z then follows from the identification of the λ coefficients. Now, consider the identification of the coefficients of the δ_i's when X_i is a descendant of Z. First, we observe that any unblocked path between Z and Y that contains a directed edge pointing to Z can be decomposed into an unblocked path between X'_l and Y and the edge X'_l → Z (for some l). Thus, all terms associated with such paths are cancelled out in (*). This leaves only paths that either end with a directed edge pointing to Y, or consist of a bidirected edge between Z and Y.
However, in the latter case, Y would be in the Auxiliary Set for Z, which would create a cycle in the dependency graph; this immediately rules that case out. For X_i a descendant of Z, each term in the coefficient of δ_i corresponds to a path consisting of the concatenation of a bidirected edge and a chain ending at X_i. Since Z, and all intermediate variables in such a chain, are solved before Y, all parameters on this path are identified. But this implies that the coefficient of δ_i is identified. □

APPENDIX B

(Proofs from Chapter 6)

Proof of Lemma 6: Let λ_1, ..., λ_m denote the parameters of the edges incident to Y. Let P_0 denote the set of unblocked paths between U and V that do not include any edge incident to Y. Then, the independent term is given by the sum of path terms over P_0, and clearly does not depend on the values of the λ's.

Now, let P_1 denote the set of unblocked paths between U and V that include some edge incident to Y, with such edge appearing on U's side of the path (the other case is similar). Let p be an arbitrary path from P_1. Then, it follows from the assumptions in the theorem that p must contain at least one variable from A. We let Z denote the first (i.e., closest to U) variable from A to appear in p, and divide the path into three segments:

- p_1, from U to Z;
- p_2, from Z to Y;
- p_3, from Y to V;

where p_1 and p_3 can be null, if U = Z or Y = V, respectively. We also note that p_1 is a chain (by the assumptions of the theorem), and p_3 is a chain, because every edge incident to Y points to Y and p is unblocked.

Now, for each Z in A, let C_Z denote the set of all chains from U to Z that do not contain any other variable from A; and let D denote the set of all chains from Y to V. Also, for each Z, let Q_Z denote the set of unblocked paths between Z and Y.

Proposition 1 For any Z, let p_1 be a path in C_Z, let p_2 be a path in Q_Z, and let p_3 be a path in D. Then, the concatenation of p_1, p_2 and p_3 gives a valid unblocked path between U and V if and only if p_1 and p_3 do not have any intermediate variable in common.

Proof: If p_1 and p_3 have a common variable, then clearly the concatenation gives an invalid path. In the other case, the proposition follows by observing that every intermediate variable in p_2 belongs to A (by the assumptions of the theorem), and that intermediate
variables in p_1 and p_3 cannot belong to A (by definition, and by the assumptions of the theorem). □

The arguments above show that we can write ρ_UV as a linear expression in the correlations ρ_{Y Z_1}, ..., ρ_{Y Z_k}, whose coefficients are sums of products of parameters along the chains described above. Clearly, each of these coefficients is independent of the values of the parameters λ_1, ..., λ_m. The lemma follows. □

Proof of Theorem 7: For each variable Z in A, let P_Z denote the set composed of the following unblocked paths between Z and Y: a) all chains from Z to Y whose intermediate variables belong to A; and b) all unblocked paths that point to Y and include a variable which is a descendant of Y. Also, for each variable Z, let C_Z denote the set composed of all chains from Z to Y whose intermediate variables belong to A. We first show that the relevant correlations can each be expressed as a linear combination of the terms associated with the P_Z's and the C_Z's.

Case 1. Let R be the set of unblocked paths between the pair of variables under consideration, so that their correlation is the sum of the path terms over R. Thus, we just need to show that the second term on the right-hand side is a linear combination of the terms associated with the P_Z's and C_Z's. There are two types of paths to consider:

1) chains with some intermediate variable that does not belong to A;
2) unblocked paths that include a descendant of Y and point to Y.

Let us consider paths of type (1) first.

Proposition 2 Every path of type (1) contains a variable from bd(A).

Proof: Assume that the proposition does not hold for some path p of type (1). Then, p must include at least one variable that does not belong to A. Let V be the last such variable in p (i.e., closest to Y), and let S be the variable adjacent to V on the Y side of p. Since that segment of p is a chain, it follows from the conditions in the theorem that S must belong to A. But then the edge between V and S witnesses that S is in bd(A), which contradicts the initial assumption. □

Fix a path p of type (1), and let Z be the last variable from bd(A) in p. Let q be an arbitrary path from C_Z.

Proposition 3 The concatenation of the initial segment of p (up to Z) with q gives a valid chain from U to Y.

Proof: Since the initial segment of p is a chain from U to Z, and q is a chain from Z to Y, it follows that the concatenation is a chain from U to Y.
Now, such a chain must be a valid one; otherwise we would have a contradiction to the recursiveness of the model. □

It follows from the two propositions above that the contribution of the paths of type (1) can be expressed as a linear combination of the terms associated with the C_Z's. Now, we consider the paths of type (2).

Proposition 4 Every path p of type (2) contains a variable V such that (i) V is a descendant of Y; and (ii) p points to V.

Proof: Assume that the proposition does not hold for some path p of type (2). Then, p must include at least one variable that does not belong to A and fails (i) or (ii) above. Let V be the last such variable in p (i.e., closest to Y), and let S be the variable adjacent to V on the Y side of p. Since p is unblocked and points to V, we get that the segment of p from V to Y must be a chain. This fact, together with the conditions of the theorem, implies that S satisfies (i) and (ii). But then the edge between V and S contradicts the initial assumption. □

Fix a path p of type (2), and let V be the last variable in p satisfying (i) and (ii).

Proposition 5 Every intermediate variable on the segment of p from V to Y is an ancestor of Y.

Proof: Let S be the last variable on that segment such that the portion of p up to S is not a chain. Clearly, any intermediate variable between S and Y is an ancestor of Y. Moreover, the subpath ending at S must point to S. Since p is unblocked, it follows that the segment from S to Y must be a chain from S to Y. But this implies that S, and every intermediate variable before it, is an ancestor of S. Since S is an ancestor of Y, the proposition holds. □

Let q be an arbitrary path from P_V.

Proposition 6 The concatenation of the initial segment of p (up to V) with q gives a valid unblocked path between U and Y.

Proof: Since q starts as a chain, the path obtained from the concatenation is unblocked. The validity of the path follows from the facts that every intermediate variable in the initial segment of p is an ancestor of Y, every intermediate variable in q is a descendant of Y, and the recursiveness of the model. □

It follows from the two propositions above that the contribution of the paths of type (2) can be expressed as a linear combination of the terms associated with the P_Z's.

Case 2. Let R be the set of unblocked paths between the pair of variables under consideration, so that their correlation is the sum of the path terms over R. Clearly, we can write this sum in two parts.
Thus, we only need to show that the second term on the right-hand side is a linear combination of the terms associated with the C_Z's and P_Z's. There are two types of paths to consider:

1) chains with some intermediate variable that does not belong to A;
2) unblocked paths that point to Y.

Let us first consider paths of type (1).

Proposition 7 Every path of type (1) contains an intermediate variable from bd(A).

Proof: Same as the proof of Proposition 2. □

Fix a path p of type (1), and let Z be the last intermediate variable from bd(A) in p. Let q be an arbitrary path from C_Z.

Proposition 8 The concatenation of the initial segment of p (up to Z) with q gives a valid chain.

Proof: Same as the proof of Proposition 3. □

It follows from the two propositions above that the contribution of the paths of type (1) can be expressed as a linear combination of the terms associated with the C_Z's.

Now, we consider paths of type (2), and further divide them into: a) paths that have an intermediate variable V such that the remaining segment is a chain from V to Y; b) all the remaining paths.

Let p be of type (2a), and let V be the first such variable in p (i.e., closest to the start of the path). Let q be an arbitrary path from P_V.

Proposition 9 The concatenation of the initial segment of p (up to V) with q gives a valid unblocked path.

Proof: Since the segment of p ending at V is a chain into V, the path obtained from the concatenation is unblocked. Now, let S be a variable common to both segments. Then S would be a descendant of Y; but, by definition, no path in P_V contains such variables. Thus, the concatenation produces a valid path. □

It follows from the two propositions above that the contribution of the paths of type (2a) can be expressed as a linear combination of the terms associated with the P_Z's. Now, consider paths of type (2b).

Proposition 10 Every path of type (2b) contains an intermediate variable V such that the remaining segment is a chain from V to Y.

Proof: Let p be a path of type (2b), and let V be the first variable in p at which the remaining segment fails to be a chain. Clearly, such a variable must exist; otherwise p would be a chain that does not include any relevant edge. Moreover, the segment ending at V points to V, and so the remaining segment must be a chain from V to Y. If V belongs to the required set, then we are done. So, we consider the two remaining cases. Assume first that V belongs to A.
Then, the proposition easily follows from the facts that the remaining segment is a chain from V to Y, and that there is no edge pointing to a variable outside A (condition (iii) of the theorem). Finally, if V does not belong to A, then let S be the variable adjacent to V on that segment. Since the edge between them points to V, it follows from condition (ii) that S cannot belong to bd(A). Thus, we must have S in A, and the conditions of the theorem give that the edge is a directed edge from S to V; hence the remaining segment is a chain, for otherwise p would be of type (2a). □

Fix a path p of type (2b), and let V be the last such intermediate variable in p. Let q be an arbitrary path from P_V.

Proposition 11 The concatenation of the initial segment of p (up to V) with q gives a valid unblocked path.

Proof: Since the segment ending at V is a chain into V, the path obtained from the concatenation is unblocked. Now, let S be the last variable in p such that the segment from S onward is a chain (if there is no such variable, we take S = V). It follows that, since p is of type (2b), the initial segment does not contain any variable from A. Moreover, every intermediate variable from S onward is an ancestor of Y. But, by definition, intermediate variables in any path from P_V must belong to A and be descendants of Y. Hence, the concatenation produces a valid path. □

It follows from the two propositions above that the contribution of the paths of type (2b) can be expressed as a linear combination of the terms associated with the P_Z's.

Case 3. Let R denote the set of unblocked paths between the pair of variables under consideration, so that their correlation is the sum of the path terms over R. There are two types of paths in R:

1) chains;
2) unblocked paths that point to Y.

Let us consider paths of type (1) first.

Proposition 12 Every path of type (1) contains an intermediate variable from bd(A).

Proof: Follows from the fact that there is no edge pointing to a variable outside A. □

Fix a path p of type (1), and let Z be the last intermediate variable from bd(A) in p. Let q be an arbitrary path from C_Z.

Proposition 13 The concatenation of the initial segment of p (up to Z) with q gives a valid chain.

Proof: Same as the proof of Proposition 3. □

It follows from the two propositions above that the contribution of the paths of type (1) can be expressed as a linear combination of the terms associated with the C_Z's.
Now, we consider paths of type (2), and further divide them into: (2a) paths that have an intermediate variable v from W such that the subpath of the path from v to Z is a chain from v to Z; (2b) all the remaining paths.

Fix a path p of type (2a), and let v be the first variable from W in p. Let q be an arbitrary path from Q_v.

Proposition 14 The concatenation of q with the subpath of p from v to Z gives a valid unblocked path between the origin of q and Z.

Proof: (same as proof of Proposition 9). ∎

It follows from the two propositions above that τ^(2a) can be expressed as a linear combination of the ρ's.

Now, we consider paths of type (2b).

Proposition 15 Every path of type (2b) contains an intermediate variable v from W such that the subpath of the path from v to Y is a chain from v to Y.

Proof: (same as proof of Proposition 10). ∎

Fix a path p of type (2b), and let v be the last intermediate variable from W in p such that the subpath of p from v to Y is a chain. Let q be an arbitrary path from Q_v.

Proposition 16 The concatenation of q with the subpath of p from v to Y gives a valid unblocked path between the origin of q and Y.

Proof: (same as proof of Proposition 11). ∎

It follows from the two propositions above that τ^(2b) can be expressed as a linear combination of the ρ's. ∎

APPENDIX C (Proofs from Chapter 7)

C.1 Preliminary Results

C.1.1 Partial Correlation Lemma

The next lemma provides a convenient expression for the partial correlation coefficient of Y and Z, given W_1, ..., W_r, denoted ρ_{YZ·W_1...W_r}.

Lemma 9 The partial correlation ρ_{YZ·W_1...W_r} can be expressed as the ratio

ρ_{YZ·W_1...W_r} = φ / ψ    (C.1)

where φ and ψ are functions of the correlations among Y, Z, W_1, ..., W_r, satisfying the following conditions:

(i) φ and ψ depend only on the correlations among the variables given as arguments.

(ii) φ is linear on the correlations ρ_{YZ}, ρ_{YW_1}, ..., ρ_{YW_r}, with no constant term.

(iii) The coefficients of ρ_{YZ}, ρ_{YW_1}, ..., ρ_{YW_r} in φ are polynomials on the correlations among the variables Z, W_1, ..., W_r. Moreover, the coefficient of ρ_{YZ} has constant term equal to 1, and the coefficients of the correlations ρ_{YW_1}, ..., ρ_{YW_r} are linear on ρ_{ZW_1}, ..., ρ_{ZW_r}, with no constant term.

(iv) ψ is a polynomial on the correlations among the variables Z, W_1, ..., W_r, with constant term equal to 1.

The proof is given in Section C.3.

C.1.2 Path Lemmas

The following lemmas explore some consequences of the conditions in the definition of Instrumental Sets.
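The ratio in Lemma 9 is built, in Section C.3, by recursing on the conditioning set; the recursion itself is the textbook one for partial correlations. A minimal sketch follows — the function name, variable indices, and the example correlation matrix are illustrative assumptions, not material from the text.

```python
import math

# Correlation matrix over four variables, indices 0..3 (illustrative values).
R = [[1.00, 0.50, 0.40, 0.20],
     [0.50, 1.00, 0.30, 0.10],
     [0.40, 0.30, 1.00, 0.25],
     [0.20, 0.10, 0.25, 1.00]]

def partial_corr(i, j, cond, R):
    """rho_{ij.cond}: condition on the last variable of `cond`, recurse on the rest."""
    if not cond:
        return R[i][j]
    k, rest = cond[-1], list(cond[:-1])
    r_ij = partial_corr(i, j, rest, R)
    r_ik = partial_corr(i, k, rest, R)
    r_jk = partial_corr(j, k, rest, R)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik**2) * (1 - r_jk**2))

# One conditioning variable: matches the familiar closed form.
r = partial_corr(0, 1, [2], R)
expected = (R[0][1] - R[0][2] * R[1][2]) / math.sqrt((1 - R[0][2]**2) * (1 - R[1][2]**2))
assert math.isclose(r, expected)

# The result does not depend on the order in which we condition.
assert math.isclose(partial_corr(0, 1, [2, 3], R), partial_corr(0, 1, [3, 2], R))
```

The order-invariance check reflects the fact that the partial correlation given a set is well-defined, even though the recursion eliminates conditioning variables one at a time.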
Lemma 10 W.l.o.g., we may assume that, for i ≠ j, paths p_i and p_j do not have any common variable other than (possibly) Y.

Proof: Assume that paths p_i and p_j have some variables in common, different from Y. Let v be the closest variable to Z_i in path p_i which also belongs to path p_j. We show that after replacing the triple ⟨Z_i, W_i, p_i⟩ by the triple ⟨v, W_i, p'_i⟩, where p'_i is the subpath of p_i from v to Y, the conditions of the theorem still hold.

It follows from the conditions that the subpath of p_i from Z_i to v must point to v. Since p_i is unblocked, the subpath p'_i must be a directed path from v to Y. Now, variable v cannot be a descendant of Y, because p'_i is a directed path from v to Y and the model is recursive. Thus, the non-descendance condition still holds.

Consider the causal graph G'. Assume that there exists a path witnessing that W_i does not d-separate v from Y in G'. Since the subpath of p_i from Z_i to v is unblocked, we can always find another path witnessing that W_i does not d-separate Z_i from Y in G' (for example, if the two paths do not have any variable in common other than v, then we can just take their concatenation). But this is a contradiction, and thus it is easy to see that the d-separation condition still holds. The remaining condition follows from the fact that both the subpath of p_i from Z_i to v and the first edge of p'_i point to v. ∎

In the following, we assume that the conditions of lemma 10 hold.

Lemma 11 For all i, there exists no unblocked path between Z_i and Y, different from p_i, which includes edge (X_i, Y) and is composed only by edges from p_1, ..., p_k.

Proof: Let q be an unblocked path between Z_i and Y, different from p_i, and assume that q is composed only by edges from p_1, ..., p_k. According to lemma 10, Z_i does not appear in any path p_j with j ≠ i. Thus, q must start with some edges of p_i. Since q is different from p_i, it must contain at least one edge that does not belong to p_i. Let (v_1, v_2) denote the first edge in q which does not belong to p_i. From lemma 10 and the conditions of the theorem, both the subpath of q from Z_i to v_1 and the edge (v_1, v_2) must point to v_1. But this implies that q is blocked by v_1, which contradicts our assumptions. ∎

The proofs for the next two lemmas are very similar to the previous one, and so are omitted.
Lemma 12 For all i, there is no unblocked path between Z_i and some W ∈ W_1 ∪ ... ∪ W_k composed only by edges from p_1, ..., p_k.

Lemma 13 For all i, j, with j ≠ i, there is no unblocked path between Z_i and Y including edge (X_j, Y), composed only by edges from p_1, ..., p_k.

C.2 Proof of Theorem 8

C.2.1 Notation and Basic Linear Equations

Fix a variable Y in the model. Let N be the set of all non-descendants of Y which are connected to Y by an edge (directed, bidirected, or both). Define the following set of edges with an arrowhead at Y:

Inc(Y) = {(V, Y) : V ∈ N}

Note that for some V ∈ N there may be more than one edge between V and Y (one directed and one bidirected). Thus, |Inc(Y)| ≥ |N|. Let λ_1, ..., λ_m denote the parameters of the edges in Inc(Y). It follows that the edges (X_1, Y), ..., (X_k, Y) belong to Inc(Y), because X_1, ..., X_k are clearly non-descendants of Y. W.l.o.g., let λ_i be the parameter of edge (X_i, Y), i = 1, ..., k, and let λ_{k+1}, ..., λ_m be the parameters of the remaining edges in Inc(Y).

Let Z be any non-descendant of Y. Wright's equation for the pair (Z, Y) is given by

ρ_{ZY} = Σ_p T(p)    (C.2)

where each term T(p) corresponds to an unblocked path p between Z and Y. The next lemma proves a property of such paths.

Lemma 14 Let Y be a variable in a recursive model, and let Z be a non-descendant of Y. Then, any unblocked path between Z and Y must include exactly one edge from Inc(Y).

Lemma 14 allows us to write Eq. (C.2) as

ρ_{ZY} = Σ_{j=1}^{m} a_j λ_j    (C.3)

Thus, the correlation between Z and Y can be expressed as a linear function of the parameters λ_1, ..., λ_m, with no constant term.

Consider a triple ⟨Z_i, W_i, p_i⟩, and let W_i = {W_1, ..., W_r}.¹ From lemma 9, we can express the partial correlation of Z_i and Y given W_i as the ratio

ρ_{YZ_i·W_i} = φ_i / ψ_i    (C.4)

where φ_i is linear on the correlations ρ_{YZ_i}, ρ_{YW_1}, ..., ρ_{YW_r}, and ψ_i is a function of the correlations among the variables given as arguments. We abbreviate φ(Y, Z_i, W_1, ..., W_r) by φ_i, and ψ(Z_i, W_1, ..., W_r) by ψ_i.

¹To simplify the notation, we write W_i = {W_1, ..., W_r}, leaving the dependence of the W_l's on i implicit.

We have seen that the correlations ρ_{YZ_i}, ρ_{YW_1}, ..., ρ_{YW_r} can be expressed as linear functions of the parameters λ_1, ..., λ_m. Since φ_i is linear on these correlations, it follows that we can express φ_i as a linear function of the parameters λ_1, ..., λ_m.
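The claim just made — correlations with Y, and hence each φ_i, are linear in the λ's with no constant term — can be checked numerically on a small model. The sketch below builds the implied covariance matrix from the structural coefficients via Σ = (I − B)⁻¹ Ω (I − B)⁻ᵀ; the model, variable order, and coefficient values are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Variable order: Z, X1, X2, Y.  Structural equations:
#   X1 = a*Z + e1,  X2 = c*Z + e2,  Y = lam1*X1 + lam2*X2 + e3.
a, c, lam1, lam2 = 0.8, -0.5, 0.7, 0.3
n = 4
B = np.zeros((n, n))   # B[i, j] = coefficient of variable j in the equation of variable i
B[1, 0] = a
B[2, 0] = c
B[3, 1] = lam1
B[3, 2] = lam2
Omega = np.eye(n)      # independent, unit-variance error terms

I = np.eye(n)
M = np.linalg.inv(I - B)
Sigma = M @ Omega @ M.T   # implied covariance matrix of (Z, X1, X2, Y)

Z, X1, X2, Y = 0, 1, 2, 3
# Eq. (C.3) in miniature: cov(Z, Y) is linear in the parameters of the
# edges arriving at Y, with no constant term.
assert np.isclose(Sigma[Z, Y], lam1 * Sigma[Z, X1] + lam2 * Sigma[Z, X2])
```

Every unblocked path from Z to Y ends with exactly one of the edges X1 → Y or X2 → Y (Lemma 14), which is why the covariance splits cleanly along λ_1 and λ_2.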
Formally, by lemma 9, φ_i can be written as:

φ_i = c_0 · ρ_{YZ_i} + Σ_{l=1}^{r} c_l · ρ_{YW_l}    (C.5)

Also, for each V ∈ {Z_i, W_1, ..., W_r}, we can write:

ρ_{YV} = Σ_{j=1}^{m} a_{Vj} λ_j    (C.6)

Replacing each correlation in Eq. (C.5) by the expression given by Eq. (C.6), we obtain

φ_i = Σ_{j=1}^{m} b_j λ_j    (C.7)

where the coefficients b_j are given by:

b_j = c_0 · a_{Z_i j} + Σ_{l=1}^{r} c_l · a_{W_l j}    (C.8)

Lemma 15 The coefficients of λ_{k+1}, ..., λ_m in Eq. (C.7) are identically zero.

Proof: The fact that W_i d-separates Z_i from Y in G' implies that ρ_{YZ_i·W_i} = 0 in any probability distribution compatible with G' ([Pea00a], pg. 142). Thus, φ_i must vanish when evaluated in G'. But this implies that the coefficient of each of the λ_j's in Eq. (C.7) must be identically zero when evaluated in G'.

Now, we show that the only difference between the evaluations of φ_i on the causal graphs G and G' consists of the coefficients of the parameters λ_1, ..., λ_k. First, observe that the coefficients c_0, c_1, ..., c_r are polynomials on the correlations among the variables Z_i, W_1, ..., W_r. Thus, they only depend on the unblocked paths between such variables in the causal graph. However, the insertion of the edges (X_1, Y), ..., (X_k, Y) in G' does not create any new unblocked path between any pair of Z_i, W_1, ..., W_r (and obviously does not eliminate any existing one). Hence, the coefficients c_0, ..., c_r have exactly the same value in the evaluations of φ_i on G and G'.

Now, let j be such that k < j ≤ m, and let V ∈ {Z_i, W_1, ..., W_r}. Note that the insertion of the edges (X_1, Y), ..., (X_k, Y) in G' does not create any new unblocked path between V and Y including the edge whose parameter is λ_j (and does not eliminate any existing one). Hence, the coefficients a_{Vj}, k < j ≤ m, have exactly the same value on G and G'.

From the two previous facts, we conclude that, for k < j ≤ m, the coefficient of λ_j in the evaluations of φ_i on G and G' has exactly the same value, namely zero.

Next, we argue that φ_i does not vanish when evaluated on G. Let j be such that 1 ≤ j ≤ k, and let V ∈ {Z_i, W_1, ..., W_r}. Note that there is no unblocked path between V and Y in G' including edge (X_j, Y), because this edge does not exist in G'. Hence, the coefficient of λ_j in the expression for the correlation ρ_{YV} on G' must be zero. On the other hand, the coefficient of λ_j in the same expression on G is not necessarily zero.
In fact, it follows from the conditions in the definition of Instrumental Sets that, for 1 ≤ i ≤ k, the coefficient of λ_i in φ_i contains the term τ_i = T(p_i)/λ_i, the product of the parameters of the edges along p_i other than λ_i. From lemma 15, we get that φ_i is a linear function only on the parameters λ_1, ..., λ_k. ∎

C.2.2 System of Equations

Rewriting Eq. (C.4) for each triple ⟨Z_i, W_i, p_i⟩, we obtain the following system Φ of k linear equations on the parameters λ_1, ..., λ_k:

Φ:  b_{11} λ_1 + ... + b_{1k} λ_k = ψ_1 · ρ_{YZ_1·W_1}
    ...
    b_{k1} λ_1 + ... + b_{kk} λ_k = ψ_k · ρ_{YZ_k·W_k}

where the terms on the right-hand side can be computed from the correlations among the variables Y, Z_1, ..., Z_k, W_1, ..., W_k, estimated from data. Our goal is to show that Φ can be solved uniquely for the λ's, and so prove the identification of λ_1, ..., λ_k. The next lemma proves an important result in this direction. Let B denote the matrix of coefficients of Φ.

Lemma 16 det(B) is a non-trivial polynomial on the parameters of the model.

Proof: From Eq. (C.8), we get that each entry of B is given by

b_{ij} = c_{i0} · a_{Z_i j} + Σ_{l=1}^{r} c_{il} · a_{W_l j}

where c_{il} is the coefficient of ρ_{YW_l} (or ρ_{YZ_i}, if l = 0) in the linear expression for φ_i in terms of correlations (see Eq. (C.5)); and a_{Vj} is the coefficient of λ_j in the expression for the correlation ρ_{YV} in terms of the parameters (see Eq. (C.6)).

From property (iii) of lemma 9, we get that c_{i0} has constant term equal to 1. Thus, we can write c_{i0} = 1 + ĉ_{i0}, where ĉ_{i0} represents the remaining terms of c_{i0}. Also, from the conditions of Theorem 8, it follows that a_{Z_i i} contains the term τ_i. Thus, we can write a_{Z_i i} = τ_i + â_{ii}, where â_{ii} represents all the remaining terms of a_{Z_i i}. Hence, a diagonal entry of B can be written as

b_{ii} = c_{i0} · τ_i + c_{i0} · â_{ii} + Σ_{l=1}^{r} c_{il} · a_{W_l i}    (C.9)

Now, the determinant of B is defined as the weighted sum, over all permutations π of ⟨1, ..., k⟩, of the product of the entries selected by π (entry b_{ij} is selected by permutation π if the i-th element of π is j), where the weights are 1 or −1, depending on the parity of the permutation. Then, it is easy to see that the term T* = τ_1 · τ_2 ··· τ_k appears in the product of permutation π_0 = ⟨1, ..., k⟩, which selects all the diagonal entries of B. We prove that det(B) does not vanish by showing that T* appears only once in the product of π_0, and that T* does not appear in the product of any other permutation.
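The determinant expansion used in the proof of Lemma 16 — a signed sum over all permutations of the entries they select — can be sketched directly. The matrix values below are arbitrary, and `det_leibniz` is an illustrative name; this is the standard Leibniz formula, not code from the dissertation.

```python
import itertools
import math

def det_leibniz(M):
    """det(M) = sum over permutations pi of sign(pi) * prod_i M[i][pi(i)]."""
    n = len(M)
    total = 0.0
    for pi in itertools.permutations(range(n)):
        # Parity via counting inversions: even -> +1, odd -> -1.
        inv = sum(1 for i in range(n) for j in range(i + 1, n) if pi[i] > pi[j])
        sign = -1.0 if inv % 2 else 1.0
        prod = 1.0
        for i in range(n):
            prod *= M[i][pi[i]]
        total += sign * prod
    return total

M = [[2.0, 1.0, 0.5],
     [0.0, 3.0, 1.0],
     [1.0, 0.0, 4.0]]
assert math.isclose(det_leibniz(M), 23.5)  # cofactor expansion gives the same value
```

In the proof, the identity permutation π_0 picks out the diagonal product containing T*, and the argument shows no other permutation's product can produce or cancel that term.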
Before proving those facts, note that, from lemma 10, for i ≠ j, paths p_i and p_j have no edge in common. Thus, the factors τ_1, ..., τ_k are all distinct from each other.

Proposition 17 Term T* appears only once in the product of permutation π_0.

Proof: Let t be a term in the product of permutation π_0. Then, t has one factor corresponding to each diagonal entry of B. A diagonal entry of B can be expressed as a sum of three terms (see Eq. (C.9)).

Assume that the factor of t corresponding to entry b_{ii} comes from the second term of b_{ii} (i.e., c_{i0} · â_{ii}). Recall that each term in â_{ii} corresponds to an unblocked path between Z_i and Y, different from p_i, including edge (X_i, Y). However, from lemma 11, any such path must include either an edge which does not belong to any of p_1, ..., p_k, or an edge which appears in some p_j, with j ≠ i. In the first case, it is easy to see that t must have a factor which does not appear in T*. In the second, the parameter of an edge of some p_j, j ≠ i, must appear twice as a factor of t, while it appears only once in T*. Hence, t and T* are distinct terms.

Now, assume that the factor of t corresponding to entry b_{ii} comes from the third term of b_{ii} (i.e., Σ_l c_{il} · a_{W_l i}). Recall that c_{il} is the coefficient of ρ_{YW_l} in the expression for φ_i. From property (iii) of lemma 9, c_{il} is a linear function on the correlations ρ_{Z_i W_1}, ..., ρ_{Z_i W_r}, with no constant term. Moreover, each such correlation can be expressed as a sum of terms corresponding to unblocked paths between Z_i and W_l. Thus, every term in Σ_l c_{il} · a_{W_l i} has the term of an unblocked path between Z_i and some W_l as a factor. By lemma 12, any such path must include either an edge that does not belong to any of p_1, ..., p_k, or an edge which appears in some p_j, with j ≠ i. As above, in both cases t and T* must be distinct terms.
After eliminating all those terms from consideration, the remaining terms in the product of π_0 are given by the expression:

Π_{i=1}^{k} (1 + ĉ_{i0}) · τ_i

Since each ĉ_{i0} is a polynomial on the correlations among the variables Z_i, W_1, ..., W_r, with no constant term, it follows that T* appears only once in this expression. ∎

Proposition 18 Term T* does not appear in the product of any permutation other than π_0.

Proof: Let π be a permutation different from π_0, and let t be a term in the product of π. As before, if, for some i with π(i) = i, the factor of t corresponding to entry b_{ii} does not come from the first term of b_{ii} (i.e., c_{i0} · τ_i), then t must be different from T*. So, we assume that this is the case.

Since π is different from π_0, there must be some i such that π does not select the diagonal entry in the i-th row of B. Then, π must select some entry b_{ij}, with j ≠ i. Entry b_{ij} can be written as:

b_{ij} = c_{i0} · a_{Z_i j} + Σ_{l=1}^{r} c_{il} · a_{W_l j}

Assume that the factor of t corresponding to entry b_{ij} comes from the term c_{i0} · a_{Z_i j}. Recall that each term in a_{Z_i j} corresponds to an unblocked path between Z_i and Y including edge (X_j, Y). Thus, in this case, lemma 13 implies that t and T* are distinct terms. Now, assume that the factor of t corresponding to entry b_{ij} comes from the term Σ_l c_{il} · a_{W_l j}. Then, by the same argument as in the previous proof, the terms t and T* are distinct. ∎

Hence, term T* is not cancelled out, and the lemma holds. ∎

C.2.3 Identification of λ_1, ..., λ_k

Lemma 16 gives that det(B) is a non-trivial polynomial on the parameters of the model. Thus, det(B) only vanishes on the roots of this polynomial. However, [Oka73] has shown that the set of roots of a polynomial has Lebesgue measure zero. Thus, the system Φ has a unique solution almost everywhere.

It just remains to show that we can estimate the entries of the matrix of coefficients B of the system Φ from data. Let us examine again an entry of matrix B:

b_{ij} = c_{i0} · a_{Z_i j} + Σ_{l=1}^{r} c_{il} · a_{W_l j}

From condition (iii) of lemma 9, the factors c_{i0}, c_{i1}, ..., c_{ir} in the expression above are polynomials on the correlations among the variables Z_i, W_1, ..., W_r, and thus can be estimated from data. Now, recall that a_{Z_i j} is given by the sum of terms corresponding to each unblocked path between Z_i and Y including edge (X_j, Y).
Precisely, for each term in a_{Z_i j}, there is an unblocked path q between Z_i and Y including edge (X_j, Y), such that the term is the product of the parameters of the edges along q, except for λ_j. However, notice that for each unblocked path between Z_i and Y including edge (X_j, Y), we can obtain an unblocked path between Z_i and X_j by removing edge (X_j, Y). On the other hand, for each unblocked path between Z_i and X_j, we can obtain an unblocked path between Z_i and Y by extending it with edge (X_j, Y). Thus, the factor a_{Z_i j} is nothing else but ρ_{Z_i X_j}. It is easy to see that the same argument holds for the a_{W_l j}, and thus a_{W_l j} = ρ_{W_l X_j}. Hence, each entry of matrix B can be estimated from data, and we can solve the system of equations Φ to obtain the parameters λ_1, ..., λ_k.

C.3 Proof of Lemma 9

Functions φ and ψ are defined recursively. For r = 1, they are obtained from the familiar expression for the partial correlation given a single variable:

ρ_{YZ·W_1} = (ρ_{YZ} − ρ_{YW_1} ρ_{ZW_1}) / √((1 − ρ²_{YW_1})(1 − ρ²_{ZW_1}))

For r > 1, φ and ψ are obtained by applying the same expression with each correlation replaced by the corresponding partial correlation given W_1, ..., W_{r−1}, and clearing the denominators. Using induction and the recursive definition, it is easy to check that Eq. (C.1) holds.

Now, we prove that functions φ and ψ as defined satisfy properties (i)–(iv). This is clearly the case for r = 1. Now, assume that the properties are satisfied for all smaller values of r.

Property (i) follows from the definition of φ and ψ and the assumption that it holds for smaller r.

By the inductive hypothesis, φ(Y, Z, W_1, ..., W_{r−1}) is linear on the correlations ρ_{YZ}, ρ_{YW_1}, ..., ρ_{YW_{r−1}}, and φ(Y, W_r, W_1, ..., W_{r−1}) is linear on the correlations ρ_{YW_1}, ..., ρ_{YW_r}. Thus, φ(Y, Z, W_1, ..., W_r) is linear on ρ_{YZ}, ρ_{YW_1}, ..., ρ_{YW_r}, with no constant term, and property (ii) holds.

The coefficients multiplying these correlations are polynomials on the correlations among the variables Z, W_1, ..., W_r. Thus, the first part of property (iii) holds. For the second part, note that the correlation ρ_{YZ} only appears in the first term of φ, and by the inductive hypothesis its coefficient has constant term equal to 1. Also, since the coefficient of each ρ_{YW_l} is built from factors which, by the inductive hypothesis, are linear on the correlations ρ_{ZW_1}, ..., ρ_{ZW_r} with no constant term, property (iii) holds.

Finally, for property (iv), we note that, by the inductive hypothesis, the first term of ψ(Z, W_1, ..., W_r) has constant term equal to 1, and the second term has no constant term. Thus, property (iv) holds. ∎

REFERENCES

[BMW94] P.A. Bekker, A. Merckens, and T.J. Wansbeek. Identification, Equivalent Models, and Computer Algebra.
Academic Press, Boston, 1994.

[Bol89] K.A. Bollen. Structural Equations with Latent Variables. John Wiley, New York, 1989.

[BP02] C. Brito and J. Pearl. "A Graphical Criterion for the Identification of Causal Effects in Linear Models." In Proceedings of the AAAI Conference, pp. 533–538, Edmonton, 2002.

[BT84] R.J. Bowden and D.A. Turkington. Instrumental Variables. Cambridge University Press, Cambridge, England, 1984.

[BW80] P.M. Bentler and D.G. Weeks. "Linear structural equations with latent variables." Psychometrika, pp. 289–307, 1980.

[CCK83] J.M. Chambers, W.S. Cleveland, B. Kleiner, and P.A. Tukey. Graphical Methods for Data Analysis. Wadsworth, Belmont, CA, 1983.

[CCR90] T. Cormen, C. Leiserson, and R. Rivest. Introduction to Algorithms. The MIT Press, 1990.

[Dun75] O.D. Duncan. Introduction to Structural Equation Models. Academic Press, New York, 1975.

[Fis66] F.M. Fisher. The Identification Problem in Econometrics. McGraw-Hill, New York, 1966.

[Ger83] V. Geraci. "Errors in variables and the individual structure model." International Economic Review, 1983.

[HD96] C. Huang and A. Darwiche. "Inference in belief networks: A procedural guide." International Journal of Approximate Reasoning, 15(3):225–263, 1996.

[KKB98] D.A. Kenny, D.A. Kashy, and N. Bolger. "Data Analysis in Social Psychology." In D. Gilbert, S. Fiske, and G. Lindzey, editors, The Handbook of Social Psychology, pp. 233–265. McGraw-Hill, Boston, MA, 1998.

[KS80] R. Kindermann and J.L. Snell. Markov Random Fields and their Applications. American Mathematical Society, Providence, RI, 1980.

[LS88] S.L. Lauritzen and D.J. Spiegelhalter. "Local Computations with Probabilities on Graphical Structures and Their Application to Expert Systems." Journal of the Royal Statistical Society (B), 50:157–224, 1988.

[McD97] R. McDonald. "Haldane's lungs: A case study in path analysis." Multivariate Behavioral Research, pp. 1–38, 1997.

[Oka73] M. Okamoto.
"Distinctness of the Eigenvalues of a Quadratic Form in a Multivariate Sample." Annals of Statistics, 1(4):763–765, July 1973.

[Pea88] J. Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Mateo, CA, 1988.

[Pea98] J. Pearl. "Graphs, causality, and structural equation models." Sociological Methods and Research, 27(2):226–284, 1998.

[Pea00a] J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, New York, 2000.

[Pea00b] J. Pearl. "Parameter identification: A new perspective." Technical Report R-276, UCLA, August 2000.

[Rig95] E.E. Rigdon. "A necessary and sufficient identification rule for structural models estimated in practice." Multivariate Behavioral Research, 30(3):359–383, 1995.

[SRM98] P. Spirtes, T. Richardson, C. Meek, R. Scheines, and C. Glymour. "Using path diagrams as a structural equation modelling tool." Sociological Methods and Research, pp. 182–225, November 1998.

[TEM93] K. Tambs, L.J. Eaves, T. Moum, J. Holmen, M.C. Neale, S. Naess, and P.G. Lund-Larsen. "Age-specific genetic effects for blood pressure." Hypertension, 22:789–795, 1993.

[WL83] N. Wermuth and S.L. Lauritzen. "Graphical and recursive models for contingency tables." Biometrika, 70:537–552, 1983.

[Wri34] S. Wright. "The method of path coefficients." Ann. Math. Statist., 5:161–215, 1934.