TESTING LATENT VARIABLE MODELS WITH SURVEY DATA

TABLE OF CONTENTS FOR THIS SECTION

As a compromise between formatting and download time, the chapters below are in Microsoft Word. You may want to use "Find" to go to the chapters below by pasting the chapter title into the "Find what" window.

FOREWORD
INTRODUCTION
THEORETICAL MODEL TESTING STUDIES
STEP I IN UNOBSERVED VARIABLE-SURVEY DATA (UV-SD) MODEL TESTING-DEFINING MODEL CONCEPTS
FIRST-ORDER CONSTRUCTS
SECOND-ORDER CONSTRUCTS
INTERACTIONS AND QUADRATICS
DEFINING CONCEPTS
IDENTIFYING IMPORTANT ANTECEDENTS
SUGGESTIONS FOR STEP I-- DEFINING MODEL CONCEPTS

2002 Robert A. Ping, Jr. 9/20/02

FOREWORD

This book critically reviews the process of testing (or validating, as it is sometimes called) theoretical models involving unobserved or latent variables and survey data, and it selectively suggests improvements in this process. Because I am a captive of my discipline, a branch of social science research that investigates socio-economic exchanges between organizations (also known as marketing channels research), the book's examples and many of its comments about theoretical model testing practices using survey data involve research in Marketing. Nevertheless, because research in Marketing follows the same conventions and practices for theoretical model testing as the other branches of the social sciences, this book and its suggestions have application across the social sciences. Thus, this book is intended for researchers who test theoretical models involving latent variables¹ and survey data, and its purpose is to help these researchers reliably test these models with survey data.

I am a theory tester, and my first experience with covariance structure analysis or structural equation analysis (I shall use the more popular term structural equation analysis) came while testing a theoretical model with multiple dependent or endogenous variables and survey data.
After trying Ordinary Least Squares regression, then Canonical Correlation, to estimate the path coefficients in my structural model, I settled on structural equation analysis, specifically LISREL, for its ability to jointly estimate simultaneous equations and to model the effects of measurement error. I was subsequently surprised by how difficult structural equation analysis was to use. In those days LISREL was available only on a mainframe computer. Using it to estimate path coefficients in a structural model required learning the LISREL programming "language," then writing and debugging a comparatively lengthy computer program in this language. LISREL is now available on PCs, and programming has been somewhat simplified (see Hayduk, 1996:xiii) with its SIMPLIS code (LISREL program) generator. There were (and are) other structural equation analysis programs besides LISREL² (e.g., EQS, AMOS, etc.), but in comparison to OLS regression, for example, structural equation analysis still seems as much an "art" as an estimation technique. Thus, this book is also intended to help make analyzing survey data using structural equation analysis a little easier.

The book is organized around the process of testing theoretical models involving latent variables and survey data (i.e., first define the model constructs; then state the relationships among these constructs, develop appropriate measures of the constructs, and gather data using these measures; next validate these measures, and test the stated relationships among the constructs). It brings together what is known about this process, and it selectively adds to this body of knowledge. The book begins with a discussion of the process of testing theoretical models involving latent variables and survey data, then it details each step in this process using several examples involving real-world survey data. Although the book assumes the reader is familiar with the terminology of covariance structure analysis or structural equation analysis, and with a software package for the analysis of structural equations (e.g., LISREL, EQS, AMOS, etc.), I have tried to make it as accessible as possible.

1. The book is restricted to models that have criterion variables rather than classification variables (i.e., latent class models are not discussed).

2. Available structural equation analysis software includes AMOS (Arbuckle, 1988), COSAN (Fraser, 1988), EQS (Bentler, 1985), LINCS (Schoenberger and Arminger, 1988), and LISCOMP (Muthen, 1984).

The list of those I should thank in this book is long and certainly incomplete. My first exposure to structural equation analysis was while working with Bob Dwyer at the University of Cincinnati. Neil Ritchie, also at UC, helped refine that first exposure. My thinking about structural equations has been heavily influenced by the writings of James Anderson, Richard Bagozzi, David Gerbing and John Hunter; Peter Bentler; Kenneth Bollen; Michael Browne and Robert Cudeck; John Fox and Michael Sobel; Leslie Hayduk; Karl Jöreskog and Dag Sörbom; and John Kenny.

This book is on my web site for several reasons. It allows me to use my recent experiences to periodically revise the book without the rigors of publishing a revised edition. Because the book is searchable using standard "Find" functions, I did not have to spend time building an index, and this searchability seems to make it more useful than a printed version. However, for several reasons the book is in Microsoft Word, rather than Acrobat Reader (PDF) or HTML, so it may download rather slowly. It was also "auto" formatted from WordPerfect to Word, and so, in addition to my own errors of omission and commission, there are probably reformatting errors. If you see anything you like or dislike, errors, etc., please e-mail me with the details.

Robert A. Ping, Jr.
Department of Marketing
Wright State University
Dayton, Ohio 45435-0001
rping@wright.edu

INTRODUCTION

This monograph selectively suggests improvements in the process of testing theoretical models involving unobserved variables³ and survey data. Because my social science research involves theoretical research in Marketing, the book's examples and much of its discussion of theoretical model testing are focused there. Nevertheless, because the conventions and processes of theoretical model testing using survey data are generally the same across the social sciences, the book's suggestions have application throughout the social sciences. Thus, the book is intended for researchers in the social sciences who test theoretical models involving latent variables and survey data.

The suggestions in the monograph are prompted by a review of a sample of substantive articles in the social sciences that qualitatively judged compliance with generally accepted procedures for testing latent variables using survey data, and a review of the recent social science methods literature.⁴ The book also selectively extends recent results in the methods literature, and proposes novel applications of several others. It provides numerous explanations and examples, and overall is intended as a contribution to continuous improvement in the use of generally accepted procedures for theoretical model tests involving unobserved variables and survey data in the social sciences.

3. The book assumes criterion dependent or endogenous variables rather than classification variables (i.e., latent class models are not discussed).

4. I reviewed articles reporting theoretical model tests involving survey data in the Journal of Consumer Research, the Journal of Marketing Research, the Journal of Marketing, Marketing Science, the Journal of the Academy of Marketing Science, the Journal of Retailing, the Journal of Personal Selling and Sales Management, and the Journal of Business Research from 1980 to the present. I also selectively reviewed methodological articles in the Psychological Bulletin/Psychological Methods, Psychometrika, Multivariate Behavioral Research, Sociological Methodology, Sociological Methods and Research, the Journal of Marketing Research, and Quality and Quantity, for this same period. I had planned to complete similar reviews of the major journals in Psychology, Education, Political Science, Sociology, etc., but shortly after starting those reviews it became apparent that research in Marketing could be argued to be representative of the good and bad practices involved in testing theoretical models involving survey data. The resulting sample also may restrict any raised hackles resulting from my critical remarks about model testing practices in the sample to those in Marketing.

THEORETICAL MODEL TESTING STUDIES

Perhaps Bollen (1989:268) best states the objective of theoretical model testing studies that involve unobserved variables and survey data: "In virtually all cases we do not expect to have a completely accurate description of reality. The goal is more modest. If the model... helps us to understand the relations between variables and does a reasonable job of matching (fitting) the data, we may judge it (the model) as partially validated. The assumption that we have identified the exact process generating the data would not be accepted."

Reasonableness or adequacy in model testing studies involving unobserved variables and survey data is addressed by first determining measure adequacy, then determining model adequacy.
Measure adequacy is typically determined using conceptual definitions of the unobserved concepts, observed items that "tap into" or measure the unobserved concepts, and, increasingly, model-to-data fit and parameter estimates from measurement models that utilize structural equation analysis. Model adequacy is determined using hypotheses, and model-to-data fit and parameter estimates from structural models that utilize structural equation analysis.

Specifically, social science researchers appear to agree that specifying and testing models using unobserved variables with multiple item measures of these unobserved variables and survey data (UV-SD models) involves i) defining model constructs, ii) stating relationships among these constructs, iii) developing appropriate measures of these constructs, iv) gathering data using these measures, v) validating these measures, and vi) validating the model (i.e., testing the stated relationships among the constructs). However, based on the articles I reviewed, there also appears to be considerable latitude in some cases, and confusion in others, regarding how these steps are carried out in UV-SD model tests.

For example, in response to calls for increased psychometric attention to measures in theoretical model tests in Marketing (e.g., Churchill, 1979; Churchill and Peter, 1984; Cote and Buckley, 1987, 1988; Heeler and Ray, 1972; Peter, 1979, 1981; Peter and Churchill, 1986; among others), reliability and validity now receive more attention in these tests, when compared to the study results. However, the articles reviewed exhibited significant variation in what constitutes an adequate demonstration of valid and reliable measures when unobserved variables and survey data were involved. For example, in some articles steps v) (measure validation) and vi) (model validation) involved separate data sets. In other articles a single data set was used to validate both the measures and the model.
Further, in some articles the reliabilities of measures used in previous studies were reassessed. However, in other articles reliabilities were assumed to be constants that, once assessed, should be invariant in subsequent studies. Similarly, in some articles many facets of validity for each measure were examined, even for previously used measures. In other articles few facets of measure validity were examined, and validities for existing measures were also assumed to be constants (i.e., once judged acceptably valid a measure should be acceptably valid in subsequent studies).

Further, methodologists in the social sciences have long warned about regression's potential for coefficient bias and sample-to-sample coefficient variation (inefficiency) because of measurement error (Bohrnstedt and Carter, 1971; see Rock, Werts, Linn and Jöreskog, 1977; Warren, White and Fuller, 1974; and demonstrations in Cohen and Cohen, 1983). Nevertheless, based on the articles I reviewed, regression still appears to be generally acceptable in some areas as an estimation technique for survey data with variables that contain measurement error. In addition, although many of the studies I reviewed acknowledged the risk of generalizing from a single study,⁵ in general there was little subsequent concern about the appropriateness or amount of generalizing from a single study.

Because there are other examples, major and minor, such as little apparent concern about violations of the assumptions underlying the estimation techniques used in UV-SD model tests (e.g., the use of ordinal data with covariance structure analysis, which assumes continuous data), it seems fair to say there appear to be fewer generally accepted principles of model validation using unobserved variables and survey data than there could be.⁶

Fortunately there have been important advances in validating UV-SD models.
These include new results in developing, testing and evaluating multiple item measures, and estimating models employing these measures. However, some of these developments have appeared in literatures not widely read or easily understood by all substantive researchers in the social sciences. Thus, this monograph is intended for these researchers, and one of its objectives is to selectively identify areas for continuous improvement in the process of testing UV-SD models.

5. Generalizing from a single study involves recommending interventions based on a single study, which ignores the possibility that "confirmed" hypotheses could be disconfirmed in a subsequent study, and disconfirmed associations could be confirmed in a future study. This can occur in many ways, including errors in the study (e.g., omission of important predictor/independent/exogenous variables, the use of ordinal data with structural equation analysis, which assumes continuous data, etc.), and the presence of unmodeled interactions or quadratic latent variables in the model.

6. A previous version of this monograph commented on the effects of this latitude in UV-SD studies. However, informal reviewers reacted negatively to statements such as "authors and reviewers... may apply idiosyncratic, rather than generally accepted, standards in evaluating these studies," and, "this can produce an unnecessarily prolonged and unpredictable review processes... in which errors of acceptance and rejection can be higher than they ought to be." They stated that these problems were well known and readers did not need to be reminded of them.

The book provides a qualitative review of UV-SD model testing practices.⁷ It provides selective discussions of the errors of omission and commission in the process of testing theoretical models involving latent variables and survey data, and it suggests, and selectively extends, remedies from recent applicable methods research.
For example, it suggests an additional procedure for achieving model-to-data fit, or consistency in a measure, using covariance structure analysis (I will use the more popular term, structural equation analysis, e.g., analysis using LISREL, EQS, AMOS, etc.), and it includes a suggestion for easily executed pretests using scenario analyses. The monograph provides accessible discussions of several overlooked but valuable statistics such as Average Variance Extracted (AVE) and Root Mean Square Error of Approximation (RMSEA), and it suggests an estimator of AVE that does not rely on structural equation analysis. It discusses matters that may be well known to methodologists but less well known to substantive researchers, such as error-adjusted regression, the use of single summed indicators in structural equation analysis, and the use of a nonrecursive model to investigate directionality or causality.

This research selectively discusses recent advances in the detection of interactions and quadratics, and provides a rationale for the more frequent inclusion of interactions and quadratics in UV-SD model tests. It calls for additional attention to measure consistency, and thus model-to-data fit, in structural equation analysis, and argues for a higher threshold for acceptable reliability based on average variance extracted. This research also renews calls for caution in generalizing from a single study because of the unavoidable risks from violations of methodological assumptions and the use of inter-subject research designs to test intra-subject hypotheses. It also discusses the implications of reliability and facets of validity as sampling statistics with unknown sampling distributions. It suggests techniques such as easily executed experiments that could be used to pretest measures, and bootstrapping for reliabilities and facets of validity.
In addition, it suggests an alternative to omitting items in structural equation analysis to improve model-to-data fit that should be especially useful for older measures established before structural equation analysis became popular.

The first step in UV-SD model testing is discussed next.

7. See Footnote 4.

STEP I IN UNOBSERVED VARIABLE-SURVEY DATA MODEL TESTING-DEFINING MODEL CONCEPTS

Models that use unobserved variables, multiple item measures of these unobserved variables, and survey data (UV-SD models) involve so-called latent or unobserved variables because we observe only indirect evidence or indications of these model variables. For example, we can directly observe or measure a concept such as household income, but we can measure only indirect evidence or indications of the concept or construct overall satisfaction. Thus, it is the practice in social science to measure several indications, or what are termed indicators, of each latent variable using multiple-item measures.

The construction of these indicators or multiple items is guided by the definition of the construct or concept. These definitions were as a rule clearly stated in the articles I reviewed. Because these matters have received attention previously (e.g., Bollen, 1989:180 and Churchill, 1979), later in this section I will simply summarize the two definitional requirements for the unobserved variables typically involved in theoretical model tests: conceptual definition and operational definition. However, because several types of constructs are underutilized in the social sciences and they offer opportunities to add to the richness and descriptiveness of UV-SD models, I will comment on second-order constructs, and interactions and quadratics. To discuss second-order constructs I begin with the notion of a first-order construct.

FIRST-ORDER CONSTRUCTS

A first-order construct has observed variables (i.e., the items in its measure) as indicators of the construct.
These constructs were ubiquitous in the articles I reviewed. The relationship between indicators and their construct in a first-order construct typically assumes the construct "drives" the indicators (i.e., the indicators are observable instances or manifestations of their unobservable construct, and a diagram of the construct and its indicators would show the construct specified or connected to the indicators with arrows from the construct to the indicators-- a reflective relationship, see Bagozzi, 1980b, 1984 and Figure A in Appendix A). Less frequently, the indicators "drive" the construct (i.e., the indicators define the construct rather than being several instances of a construct, and a diagram of the construct and its indicators would show the indicators connected to the construct with arrows from the indicators to the construct-- a formative relationship, see Fornell and Bookstein, 1982).

In UV-SD models estimated using regression, unidimensional items are summed to indicate their construct, while in structural equation analysis the indicators' relationship with their construct is explicitly specified using structural equation analysis software such as AMOS, EQS and LISREL (however, see Step V-- Single Indicator Structural Equation Analysis below for a summed indicator approach used in structural equation analysis).

SECOND-ORDER CONSTRUCTS

Second-order constructs have other unobserved constructs as their "indicators" (see Figure J in Appendix J). These constructs were infrequently observed in the articles I reviewed.
Nevertheless, in Dwyer and Oh's (1987) study of environmental munificence and relationship quality in interfirm relationships, for example, the second-order construct relationship quality had the first-order constructs satisfaction, trust, and minimal opportunism as indicators (see Bagozzi, 1981a; Bagozzi and Heatherton, 1994; Gerbing and Anderson, 1984; Gerbing, Hamilton and Freeman, 1994; Hunter and Gerbing, 1982; Jöreskog, 1970; and Rindskopf and Rose, 1988 for accessible discussions of second-order constructs). Each first-order construct has its respective observed indicators. Specifying second-order constructs, such as relationship quality, with first-order constructs simplified the structural paths of the model, and it provided a richer description of the consequences of environmental munificence, for example. A similar approach was taken in Ping's (1997a) study of the relationship between cost-of-exit and voice in interfirm relationships. Cost-of-exit was itemized using several unobserved constructs (alternative attractiveness, relationship investment, and switching cost) which themselves had observed indicators.

As these examples suggest, a second-order construct can be used to combine several related constructs into a single higher-order construct to simplify the structural paths in a UV-SD model. A second-order construct can also be used as an alternative to omitting items of a multidimensional measure to obtain model-to-data fit in structural equation analysis. This can be useful with established measures, developed before the advent of structural equation analysis, that turn out to be multidimensional using structural equation analysis (see Gerbing, Hamilton and Freeman, 1994). In addition, a second-order construct can be used to account for types of error other than measurement error in structural equation analysis (see Gerbing and Anderson, 1984).
A second-order construct can be conceptualized for regression as factors in an exploratory factor analysis that are not particularly orthogonal (however, see the cautions about regression using variables measured with error in Step VI-- Violations of Assumptions). When the items in each of these factors are summed, an exploratory factor analysis of the resulting summed items is also unidimensional. To use a second-order construct in regression (e.g., for exploratory purposes), the items in each first-order construct (factor) should be unidimensional. In addition, the second-order construct (factor) should be unidimensional using exploratory factor analysis with each first-order construct's summed items as a single item per construct, and the second-order construct should be face or content valid using the first-order constructs as "items" (see Appendix J for an example).

INTERACTIONS AND QUADRATICS

Consider the model

\[ Y = b_0 + b_1 X + b_2 Z + b_3 XZ + b_4 XX + e \tag{1} \]
\[ \phantom{Y} = b_0 + b_1 X + (b_2 + b_3 X)Z + b_4 XX + e \tag{1a} \]

in which XZ is an interaction and XX is a quadratic. Unlike experiments analyzed with ANOVA, where interactions (e.g., XZ in Equation 1) and quadratics (e.g., XX in Equation 1) are routinely estimated to help interpret significant main effects (i.e., the X-Y and Z-Y associations), interactions and quadratics were rarely seen in the articles reviewed. This may have been because authors have confused the difficulty of detecting interactions and quadratics with their frequency of occurrence (see Podsakoff, Todor, Grover and Huber, 1984; also see McClelland and Judd, 1993).
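To make Equations 1 and 1a concrete, the following is a minimal sketch in Python. It is an illustration under stated assumptions, not an analysis from the articles reviewed: the data are simulated, the coefficient values are hypothetical, X and Z are treated as error-free observed variables, and Ordinary Least Squares (via NumPy) stands in for the estimation techniques discussed in this monograph. The function names are invented for the example.

```python
import numpy as np

def simulate_equation_1(n=5000, seed=0):
    """Simulate data from Equation 1 using hypothetical population coefficients."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    z = rng.normal(size=n)
    e = rng.normal(size=n)
    # Assumed values: b0 = 1.0, b1 = 0.5, b2 = 0.0, b3 = 0.4, b4 = 0.2
    y = 1.0 + 0.5 * x + 0.0 * z + 0.4 * x * z + 0.2 * x * x + e
    return x, z, y

def ols(columns, y):
    """Ordinary Least Squares with an intercept; returns [b0, b1, ...]."""
    design = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def moderated_z_coefficient(b, x_level):
    """Coefficient of Z in Equation 1a, (b2 + b3*X), at a chosen level of X."""
    return b[2] + b[3] * x_level

x, z, y = simulate_equation_1()
full = ols([x, z, x * z, x * x], y)   # Equation 1, with XZ and XX
reduced = ols([x, z], y)              # same data, XZ and XX left out

for level in (-2.0, 0.0, 2.0):
    coef_z = moderated_z_coefficient(full, level)
    print(f"X = {level:+.1f}: coefficient of Z = {coef_z:+.2f}")
print(f"coefficient of Z with XZ and XX omitted = {reduced[2]:+.2f}")
```

With these assumed values the coefficient of Z is near zero when the interaction is omitted, yet the factored coefficient (b2 + b3X) is clearly negative at low X and clearly positive at high X. The sketch ignores measurement error, so it says nothing about the bias that error introduces into regression estimates.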
In addition, until recently interactions and quadratics in UV-SD models have been difficult for researchers to specify and interpret (see Aiken and West, 1991; Ping, 1995, 1996a).⁸ Further, because they are mathematical constructs or concepts rather than mental constructs, and have indicators that are products of observed variables, interactions and quadratics may be judged by some substantive researchers as inappropriate in UV-SD models.⁹

8. There are a variety of techniques for detecting interactions and quadratics. However, because many do not produce structural coefficients (e.g., b3 and b4 in Equation 1) that suggest the direction and strength of a significant interaction or quadratic, and they do not permit detailed interpretation such as that shown in Appendix C, the discussion will concentrate on techniques that estimate structural coefficients for interactions and quadratics.

9. This research treats interactions as mathematical constructs. Because they are mathematical, interactions and quadratics can, for example, be algebraically factored, as shown in Equation 1a. Because they are also constructs, their psychometrics (e.g., reliability and validity) are, or should be, important.

Nevertheless, authors have called for more thorough investigation of interactions and quadratics in survey research (e.g., Aiken and West, 1991; Blalock, 1965; Cohen, 1968; Cohen and Cohen, 1975, 1983; Darlington, 1990; Friedrich, 1982; Kenny, 1985; Howard, 1989; Jaccard, Turrisi and Wan, 1990; Neter, Wasserman and Kutner, 1989; Pedhazur, 1982). Their argument is the same as that used for main effects in ANOVA: failing to consider the possibility of interactions and quadratics in the population model is likely to lead to erroneous interpretations of the study's results.

To explain, in Equation 1 the actual coefficient of Z is given by (b2 + b3X) (see Equation 1a). The statistical significance of this moderated coefficient of Z could be very different from the statistical significance of the coefficient of Z in Equation 1 without the XZ variable (i.e., Y = b0' + b1'X + b2'Z) (see Aiken and West, 1991). Specifically, if the interaction is significant (i.e., b3 is significant), b2' could be nonsignificant while (b2 + b3X) is significant over part(s) of the range of X in the survey (see Table C2 in Appendix C). Thus, failing to include an interaction when it is present in the population model would lead to a misleading interpretation of the Z-Y association. While strictly speaking a nonsignificant b2' implies the Z-Y association is disconfirmed, it is clearly not the case with a significant XZ interaction that Z is never associated with Y. The association simply depends on the level of X. This has several implications, including that b2' could be observed to be variously nonsignificant or significant across multiple studies, and it may explain inconsistent findings in studies.

Alternatively, b2' could be significant while (b2 + b3X) could be nonsignificant over part of the range of X in a study. In this event, failing to include the interaction could produce a false "confirmation" of the Z-Y association: the significant Z-Y association is actually nonsignificant over parts of the range of X in a study. This error is especially insidious in UV-SD model tests. Many of the studies I reviewed provided interventions (e.g., recommendations to practitioners) based on significant associations in the study, many of which, it seemed to me, could easily have been contingent associations. The algebra and implications of failing to consider the possibility of a population quadratic are similar.

Thus, care should be taken to consider interactions and quadratics in UV-SD model testing studies. Specifically, they should, of course, be considered when theory postulates their existence.
In addition, they should be considered in post-hoc probing, as is done in experiments analyzed with ANOVA, to aid in interpreting and providing the implications of significant associations, or as a possible explanation for hypothesized but nonsignificant associations (see Appendix C for an example) or inconsistent results across studies.

There has been considerable progress in estimating interactions in survey data using regression (e.g., Aiken and West, 1991; Denters and Puijenbrork, 1989; Feucht, 1989; Heise, 1986; Jaccard, Turrisi and Wan, 1990; Ping, 1996b; Warren, White and Fuller, 1974) and structural equation analysis (see Bollen, 1995; Hayduk, 1987; Jaccard and Wan, 1995; Jöreskog and Yang, 1996; Kenny and Judd, 1984; Ping, 1995, 1996a; Wong and Long, 1987) (also see Appendix A). However, estimating interactions using structural equation analysis is more difficult than using Ordinary Least Squares regression (Aiken and West, 1991). Nevertheless, latent variable interactions have been estimated using structural equation analysis (e.g., Hochwarter, Ferris and Perrewe, 2001; Lee and Bae, 1999; Lee and Ganesh, 1999; Masterson, 2001; Osterhuis, 1997; Singh, 1998). In addition, interactions between first-order and second-order latent variables have been estimated with structural equation analysis (see Ping, 1999). I will discuss the estimation of interactions and quadratics later. I now return to Step I-- Defining Model Concepts.

DEFINING CONCEPTS

Concepts are defined using words, then they are measured using observed variables or items that "tap" or are instances of these definitions. Thus, conceptual definitions, definitions of the concepts in the model, are required to provide meaning for the label (e.g., satisfaction, solidarity, equity, etc.) used for each concept, and meaning for concepts associated with these labels.
For example, a concept with the label "overall relationship satisfaction" could have the conceptual definition, "the subject's overall evaluation of the costs and benefits attributed to the buyer-seller relationship." Conceptual definitions are important for judging the adequacy of the observed items used to measure the concept because these items should be judged to be instances or indicators of the concept.

Operational definitions of concepts can be used in addition to conceptual definitions. Conceptual definitions can be general, while operational definitions can allow for contextual specificity or other contingencies in the study. For example, exiting (a label) could be conceptualized as, or have the conceptual definition of, physically ending the relationship. However, this concept could be operationalized or measured in many ways depending on the study context, the population being sampled, the difficulty of tracking down subjects who have exited, etc. Thus, the concept of exiting could be operationalized as exit-propensity, and have an operational definition of "the intention to end the relationship" (see Ping, 1993). An operational definition should be consistent with its conceptual definition, and items should be "instances" of or "tap" their operational definition.

A second-order construct should be conceptually and operationally defined because the validity of a second-order construct is as important as the validity of a first-order construct. However, providing conceptual definitions for interactions or quadratics is difficult because they are mathematical concepts rather than mental constructs. Nevertheless, operational definitions should be provided (e.g., Z's moderation of the X-Y association was operationalized as XZ, the interaction between X and Z) because there are many operationalizations of an interaction (e.g., X/Z, etc., see Jaccard, Turrisi and Wan, 1990).
IDENTIFYING IMPORTANT ANTECEDENTS

Ideally, defining the concepts in the model to be tested should involve all the antecedents of each dependent or endogenous variable. However, models in the social sciences typically account for only a portion of the variation in the dependent variables. There usually are other unknown variables that are antecedents of each dependent variable but are not included in the model. While accounting for all antecedents of every dependent variable is frequently impossible, especially in early stages of theory development, not including important antecedents (i.e., ones that are significantly related to the dependent variable, and that are also correlated with the other independent variables included in the model) biases (i.e., inflates or deflates) observed associations (see Duncan, 1975). This places a great burden on theoretical model testing for the inclusion of all important antecedents.

Knowing which unstudied antecedents are important obviously requires considerable researcher knowledge and time. As a result, an objective of pretesting the model, which is discussed later, should be to address the adequacy of the antecedents (i.e., explained variance). A model that does not explain much variance in a dependent variable is vulnerable to subsequent research that uses models that explain more variance in that dependent variable: nonsignificant results may later turn out to be significant because important antecedents were not included in the original model. Finally, the requirement to adequately model the antecedents of dependent variables is not contrary to the notion of model parsimony, because parsimony is intended to exclude variables that are unimportant.
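The inflation or deflation that results from omitting an important antecedent can be illustrated with a small simulation. The sketch below is hypothetical throughout: X and W are two simulated, error-free antecedents of Y with assumed coefficients of 0.5 each, `rho` is their assumed correlation, and the function names are invented for the example. Ordinary Least Squares via NumPy is used only for convenience.

```python
import numpy as np

def simulate_antecedents(rho, n=5000, seed=1):
    """Simulate Y with two antecedents, X and W, that correlate rho."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    w = rho * x + np.sqrt(1.0 - rho ** 2) * rng.normal(size=n)
    y = 0.5 * x + 0.5 * w + rng.normal(size=n)  # assumed population model
    return x, w, y

def fit_ols(columns, y):
    """OLS with an intercept; returns (coefficients, explained variance R2)."""
    design = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    r2 = 1.0 - resid.var() / y.var()
    return coef, r2

for rho in (0.6, -0.6):
    x, w, y = simulate_antecedents(rho)
    full, r2_full = fit_ols([x, w], y)       # both antecedents modeled
    omitted, r2_omitted = fit_ols([x], y)    # W left out of the model
    print(f"rho = {rho:+.1f}: X coefficient with W = {full[1]:.2f} "
          f"(R2 = {r2_full:.2f}), without W = {omitted[1]:.2f} "
          f"(R2 = {r2_omitted:.2f})")
```

Under these assumptions the population coefficient of X with W omitted is 0.5 + 0.5(rho), so a positive correlation between the antecedents inflates the X-Y association and a negative correlation deflates it, while explained variance drops in both cases.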
SUGGESTIONS FOR STEP I-- DEFINING MODEL CONCEPTS

Because of their importance in establishing the meaning of a construct and their importance to evaluating the domain sampling adequacy of the observed items that comprise the measure of the construct, conceptual definitions should be carefully stated for each construct. Operational definitions should be used to allow for contextual specificity of the sample or other contingencies in the study, and to describe second-order concepts and interactions.

To reduce bias in the estimates of the associations with dependent variables in the model, and thus the likelihood of false positive (Type I) and false negative (Type II) errors, the important antecedents of each dependent variable should be specified in the model to be validated. For similar reasons, interactions and quadratics should be considered even if theory is mute on their possible existence, as they are in ANOVA studies. At a minimum they should be estimated on a post hoc basis when a hypothesized association turns out to be nonsignificant, as a possible explanation for this lack of significance.

Because second-order constructs can combine dimensions of a multidimensional construct and produce a more parsimonious structural model, among other reasons, plausible second-order constructs should also be considered.