EVALUATING SUBSTANTIVE CONCLUSIONS BASED ON INCOMPLETE DATA:
A METHODOLOGICAL COMPARISON OF MISSING DATA TECHNIQUES USED
IN A STUDY OF THE EFFECT OF RACE ON TEACHERS’ EVALUATIONS OF
STUDENTS’ CLASSROOM BEHAVIOR
Heather Schwartz
B.A., California State University, Sacramento, 2006
THESIS
Submitted in partial satisfaction of
the requirements for the degree of
MASTER OF ARTS
in
SOCIOLOGY
at
CALIFORNIA STATE UNIVERSITY, SACRAMENTO
FALL
2009
© 2009
Heather Schwartz
ALL RIGHTS RESERVED
EVALUATING SUBSTANTIVE CONCLUSIONS BASED ON INCOMPLETE DATA:
A METHODOLOGICAL COMPARISON OF MISSING DATA TECHNIQUES USED
IN A STUDY OF THE EFFECT OF RACE ON TEACHERS’ EVALUATIONS OF
STUDENTS’ CLASSROOM BEHAVIOR
A Thesis
by
Heather Schwartz
Approved by:
_____________________________________, Committee Chair
Randall MacIntosh, Ph.D.
_____________________________________, Second Reader
Manuel Barajas, Ph.D.
____________________________
Date
Student: Heather Schwartz
I certify that this student has met the requirements for format contained in the University
format manual, and that this thesis is suitable for shelving in the Library and credit is to
be awarded for the thesis.
_____________________________, Graduate Coordinator
Amy Liu, Ph.D.
Department of Sociology
_________________
Date
Abstract
of
EVALUATING SUBSTANTIVE CONCLUSIONS BASED ON INCOMPLETE DATA:
A METHODOLOGICAL COMPARISON OF MISSING DATA TECHNIQUES USED
IN A STUDY OF THE EFFECT OF RACE ON TEACHERS’ EVALUATIONS OF
STUDENTS’ CLASSROOM BEHAVIOR
by
Heather Schwartz
Statement of Problem
The problem of missing data in statistical analysis is one that the field of social
research has failed to adequately address despite its potential to significantly affect results
and subsequent substantive conclusions. The purpose of this study is to evaluate the
practical application of missing data techniques in reaching substantive sociological
conclusions on the basis of statistical analyses with incomplete data sets. This study
compares three different methods for handling incomplete data: multiple imputation,
direct maximum likelihood, and listwise deletion.
Sources of Data
The comparisons are conducted via a reexamination of a multiple regression
analysis of the ECLS-K 1998-99 data set by Downey and Pribesh (2004), who reported
the results of their study on the effects of teacher and student race on teachers’
evaluations of students’ classroom behavior using multiple imputation to handle missing
data.
Conclusions Reached
After comparing the three different methods for handling incomplete data, this
study comes to the general conclusion that multiple imputation and direct maximum
likelihood will produce equivalent results and arrive at the same substantive sociological
conclusions. The current study also found that direct maximum likelihood shared more
similarities with listwise deletion than with multiple imputation, which may be the result
of differences in data handling between this author and Downey and Pribesh. In general, both
direct maximum likelihood and listwise deletion produced increased significance levels
and therefore a greater number of statistically significant variables when compared to the
multiple imputation results. Still, all three methods produced basically equivalent results.
The importance of taking method choice and missing data into careful consideration prior
to performing a statistical analysis and drawing subsequent substantive conclusions is
also stressed.
_____________________________________, Committee Chair
Randall MacIntosh, Ph.D.
_____________________________________
Date
TABLE OF CONTENTS
Page
List of Tables .............................................................................................................. ix
Chapter
1. INTRODUCTION .................................................................................................. 1
Statement of the Problem .................................................................................. 1
Methods ............................................................................................................ 4
Organization of Current Study .......................................................................... 5
2. LITERATURE REVIEW ....................................................................................... 7
Effects of Race on Teachers’ Evaluations of Students’ Behavior .................... 7
Missing Data ................................................................................................... 13
Missing Data Mechanisms .............................................................................. 16
Handling Incomplete Data .............................................................................. 20
Traditional Methods ........................................................................................ 21
Direct Maximum Likelihood (DML) .............................................................. 23
Multiple Imputation (MI) ................................................................................ 27
Review of Missing Data Issues ....................................................................... 30
3. METHODOLOGY ............................................................................................... 33
Hypothesis....................................................................................................... 33
Sample............................................................................................................. 33
Dependent Measures ....................................................................................... 35
Independent Measures .................................................................................... 36
Control Variables ............................................................................................ 38
Evaluation of Missingness .............................................................................. 41
Analytical Plan ................................................................................................ 47
4. FINDINGS ............................................................................................................ 54
Methodological Comparisons ......................................................................... 60
Summary of Findings ...................................................................................... 68
5. DISCUSSION ....................................................................................................... 70
Discussion of Findings .................................................................................... 70
Evaluation and Critique of Study.................................................................... 71
Impact on Future Research ............................................................................. 78
Conclusion ...................................................................................................... 80
References ................................................................................................................... 82
LIST OF TABLES
Page
1. Descriptive Statistics from Downey and Pribesh’s Study .................................... 42
2. Percentage of Missing Data from ECLS-K Public Use Data Files ....................... 45
3. Race of Student by Race of Teacher from Downey and Pribesh’s Study ............ 49
4. Descriptive Statistics for the Variables Used in the Listwise Deletion Analysis . 51
5. Unstandardized Regression Coefficients for the Dependent Variable Externalizing
Problem Behaviors (Model 1 and Model 2 only) ....................................................... 56
6. Unstandardized Regression Coefficients for Dependent Variable Approaches to
Learning (Model 1 and Model 2 only) .......................................................................... 56
7. Unstandardized Regression Coefficients for Dependent Variable Externalizing
Problem Behaviors (Model 3 and Model 4 only) ....................................................... 57
8. Unstandardized Regression Coefficients for Dependent Variable Approaches to
Learning (Model 3 and Model 4 only) ........................................................................ 59
Chapter 1
INTRODUCTION
Statement of the Problem
Empirical research relies heavily on the statistical analysis of quantitative data,
collected by various means such as self-administered questionnaires, surveys, and
interviews. Unfortunately, cases and even entire variables must often be left out of these
statistical analyses due to missing or incomplete data. At times, the incompleteness of a
data set is the undesirable yet expected result of the research design, and thus generally
ignorable (Fowler 2002; Rubin 1976; Schafer and Graham 2002; Stumpf 1978). For
example, the researcher may choose to administer different versions of a survey to
various subsets of the sample in order to save time or money and still obtain responses to
a large number of questions. Missingness of a data set may also be the unplanned
outcome of the research design, such as asking confusing questions or not providing all
applicable response choices. Missing or incomplete data can also be the result of
attrition, especially in longitudinal studies where respondents may die or records may be
lost before all waves of data are collected. In social research, missing data is often the
result of respondents forgetting or refusing to answer all questions. Respondents will
often skip questions they do not understand or respond to only part of multiple-part
questions. In addition, many respondents will refuse to provide sensitive or private
information, especially in the context of face-to-face interviews. In these instances, the
missingness of data should be examined for its relationship to other variables and cases
and should not be ignored (Allison 2002; Byrne 2001a; Enders 2006; Fowler 2002; Rubin
1976; Rudas 2005; Schafer and Graham 2002; Schafer and Olsen 1998; Stumpf 1978).
Common statistical analysis methods are based on the assumption that the data set
is essentially complete. Using conventional analytical methods with an incomplete data
set can produce biased, unreliable results with diminished statistical power, and the
statistical program may not run correctly (Allison 2002; Collins, Schafer, and Kam 2001;
Fowler 2002; Rubin 1976; Rubin 1987; Rudas 2005; Schafer and Graham 2002).
However, most data sets are not complete, especially in social research and longitudinal
studies. Nevertheless, the results of statistical analyses using incomplete data sets are
often taken as valid and reliable (Allison 2002; Enders 2006; Rubin 1976; Rubin 1987;
Schafer and Graham 2002). Fellow researchers may realize that the results may be
biased due to a high rate of missingness, yet statistics are often utilized by those who do
not possess such expertise. Consumers of the results of statistical research are often
social and political institutions, such as hospitals, schools, and governmental agencies
that draw on the results to determine significant issues such as funding and program
efficacy (Regoeczi and Riedel 2003). Consequently, it is important that statistical
analyses be performed using complete data whenever possible in order to maximize the
reliability and validity of the results (Allison 2002; Schafer and Graham 2002; Schafer
and Olsen 1998; Stumpf 1978). Thus, when a complete data set is not available or
feasible, it is necessary for researchers to address the issue of missing data.
Even with the proliferation of methodological and theoretical literature on the
problem of missing data since the 1970s, many data analysts still fail to adequately
address the issue of missing data from a theoretical basis, often simply treating it as a
nuisance that can be ignored (Allison 1999; Baydar 2004; Enders 2006; Regoeczi and
Riedel 2003; Rubin 1976; Rudas 2005; Schafer and Graham 2002; Stumpf 1978; Wothke
and Arbuckle 1996). Regularly, a high rate of missingness is acknowledged by the
author as a limitation but not accounted for in the actual analysis (Schafer and Graham
2002). Additionally, though there are currently methods and software designed to
effectively deal with missing data, many social researchers are either not aware of them
or choose not to employ them (Allison 1999).
Despite recent theoretical advances, “theory has not had much influence on
practice in the treatment of missing data” (Wothke and Arbuckle 1996:2). When missing
data methods are utilized, analysts continue to use outdated ad hoc editing methods to
force data into a semblance of completeness and often make problematic assumptions
regarding the mechanisms of missingness without any theoretical basis (Baydar 2004;
Enders 2006; Rubin 1976; Schafer and Graham 2002). It has been said that in social
research there is a general tolerance of imprecision (Espeland 1988). Effective methods
have been developed; however, they are often avoided, being viewed as difficult to learn
and not widely applicable (Allison 1999). As will be discussed in depth, opinions
regarding the employment of modern methods vary widely. For example, multiple
imputation is generally accepted as an effective and adaptable method; however, it is also
time-consuming and awkward to perform. Direct maximum likelihood, on the other hand,
is more difficult to learn but often preferable to use, though it has fewer applications (Allison
2002; Collins, Schafer, and Kam 2001; Enders 2006; Schafer and Graham 2002). Most
researchers simply use whatever statistical software package they are comfortable with,
regardless of whether their data meets the assumptions required by that method and
despite literature suggesting otherwise (McArdle 1994).
The problem of missing data is complex as it is both an issue for researchers, who
must choose when and how to deal with missing data, and for their audience, who
consume the results of analysis on the basis that it is reliable, valid, and unbiased. This
problem only becomes more complex when one introduces the issues of secondary data
analysis (Rubin 1987; Rubin 1996). The problem of missing data in statistical analysis is
one that the social research field has failed to adequately address and accept (Baydar
2004; Espeland 1988; McArdle 1994; Rudas 2005; Yuan and Bentler 2000). The broad
purpose of this study is to bridge the gap between the methodological and theoretical
literature regarding missing data techniques and their practical applications in reaching
substantive conclusions on the basis of statistical analyses with incomplete data sets.
Methods
In order to attain this goal, this study will compare three different methods for
handling incomplete data. This comparison will be conducted via a reexamination of a
multiple regression analysis of the ECLS-K 1998-99 data set (Downey and Pribesh
2004). Downey and Pribesh (2004) reported the results of their study on the effects of
teacher and student race on teachers’ evaluations of students’ classroom behavior using
multiple imputation (MI). MI replaces each missing value with one or more plausible
values drawn from a predictive distribution, with the specifics varying by the type of MI
employed; the analyses of the resulting completed data sets are then combined into a single
set of estimates. Downey and Pribesh (2004) report following Allison's (2002) recommended data
augmentation model and performing five imputations (Allison 2002; Schafer 1999).
study will compare these results with those obtained via two other generally accepted
methods, direct maximum likelihood (DML) and listwise deletion (LD). The simplest
and most commonly used method for dealing with missing data is deletion. In short,
LD is performed by omitting any case that has missing data for any variable from the
analysis altogether. There are other deletion methods, which will be discussed, however
LD will be the focus of this study (Allison 1999; Allison 2002; Byrne 2001a; Collins,
Schafer, and Kam 2001; Enders 2006; Schafer and Graham 2002; Stumpf 1978). DML is
a frequently recommended method and considered comparable to MI. Simply put, DML
identifies the parameter values that maximize the likelihood of the actually observed
(incomplete) data under the estimated model. There are several
varieties of maximum likelihood estimation, some of which will be reviewed, though
direct maximum likelihood estimation will be utilized in the actual comparative analysis
(Allison 2002; Eliason 1993; Enders 2006; Schafer and Graham 2002).
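To make the MI combining step concrete, the following sketch applies Rubin's rules for pooling a single regression coefficient across m completed data sets. The coefficients and standard errors are invented for illustration and are not taken from Downey and Pribesh's analysis.

```python
from statistics import mean

def pool_mi_estimates(estimates, variances):
    """Pool one coefficient across m imputed data sets using Rubin's rules:
    the point estimate is the mean of the per-imputation estimates, and the
    total variance is the within-imputation variance W plus the
    between-imputation variance B inflated by (1 + 1/m)."""
    m = len(estimates)
    q_bar = mean(estimates)                                  # pooled point estimate
    w = mean(variances)                                      # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)   # between-imputation variance
    return q_bar, w + (1 + 1 / m) * b

# Hypothetical coefficients and standard errors from five imputations (m = 5),
# echoing the five imputations Downey and Pribesh report performing.
coefs = [0.148, 0.153, 0.149, 0.151, 0.150]
ses = [0.030, 0.031, 0.029, 0.030, 0.030]
q_bar, total_var = pool_mi_estimates(coefs, [se ** 2 for se in ses])
print(q_bar, total_var ** 0.5)   # pooled coefficient and pooled standard error
```

Because the between-imputation variance enters the total, the pooled standard error is never smaller than the average within-imputation standard error, which is how MI propagates the uncertainty due to the missing data itself.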
In the course of performing this methodological comparison, the most commonly
used statistical software packages used to perform each of the three methods of analysis
will also be discussed. In addition, the various effects of method choice will also be
considered. The aim of this comparison is to demonstrate how missing data methods
influence substantive conclusions based on statistical analyses.
Organization of Current Study
Prior to performing these analyses, the theoretical and practical literature
regarding missing data and methods for handling incomplete data sets will be reviewed at
length, with a focus on the three methods to be compared. The substantive literature on
the effect of student and teacher race on teachers’ evaluations of students’ classroom
behavior will be introduced as well. The specific methods to be employed will be
discussed at length and then applied to the ECLS-K 1998-99 data set used by Downey
and Pribesh (2004). Two separate regression analyses will be performed, using the DML
and LD methods. The results of Downey and Pribesh’s analysis will be utilized as
representative of the MI method. Then, the results of all three regression analyses will be
compared and discussed to evaluate the efficacy, implementation, and appropriate
applications of the different methods. In addition to presenting these findings, the current
study will be evaluated and apparent limitations will be discussed as well as implications
for future studies.
Chapter 2
LITERATURE REVIEW
An objective of this study is to provide a cohesive link between the use of missing
data methods in substantive research and the methodological theories regarding handling
missing data in the field of social science research. This will be accomplished through
the re-examination of data analysis performed by Downey and Pribesh (2004) in their
study of the relationship between teacher and student race in teachers’ evaluations of
students’ behaviors using three different methods for handling missing data. Prior to
performing and comparing the various methods designed to handle incomplete
quantitative data, the existing literature on these methods will be reviewed. However, in
order to develop an understanding of the broad premise and significance of this current
study, the substantive literature will first be reviewed.
Effects of Race on Teachers’ Evaluations of Students’ Behavior
Downey and Pribesh’s 2004 study examined the effects of students’ and teachers’
race on teachers’ evaluations of students, particularly on teachers’ subjective evaluations
of students’ classroom behavior. Data from the ECLS-K 1998-99 kindergarten data set
and NELS 8th grade data set were compared to investigate whether teachers' poor
evaluations of black students' behavior are better explained by teacher bias or by
Oppositional Culture Theory. Oppositional Culture Theory emphasizes minority group
agency, holding that minority groups undermine their own educational outcomes by
developing a culture in opposition to the values of the dominant
group, particularly formal schooling (Downey 2008; Farkas, Lleras, and Maczuga 2002;
Ogbu 2003). According to Downey and Pribesh, in order to support the Oppositional
Culture Theory, evaluations of black students must get worse as students age and adopt
an oppositional culture ideology. Operationally, this would be exhibited by poorer
evaluations of black 8th grade students as compared to kindergarten students, showing
that black students do indeed change from eager learners into oppositional, defiant
students. Downey and Pribesh’s theory of teacher bias is that teachers give less favorable
evaluations to students of different backgrounds, in this case students from a different
racial/ethnic background (Downey and Pribesh 2004; Ehrenberg, Goldhaber, and Brewer
1995; Long and Henderson 1971; Espinosa and Laffey 2003; Alexander, Entwisle, and
Thompson 1987). The teacher bias hypothesis would be supported if both kindergarten
and 8th grade black students received inferior evaluations from white teachers. The
premise behind this hypothesis is that if white teachers are biased against black students
or in favor of white students, it would be regardless of students’ actual behaviors as
kindergarten students do not yet have a negative idea of school or authority.
According to Downey and Pribesh, their findings “replicated the pattern that
others have found: Black students are typically rated as poorer classroom citizens than
white students” (2004:275). The results of Downey and Pribesh’s (2004) regression
analyses indicate that the effects of racial matching are comparable for both kindergarten
and 8th grade students, with white teachers giving black students poorer behavioral
evaluations than white students. In Downey and Pribesh’s data analysis, for all models
the regression coefficients indicated that black students receive higher ratings on the
Externalizing Problem Behaviors scale (indicating more problem behaviors) and lower
ratings on the Approaches to Learning scale (indicating fewer favorable learning behaviors) than do
white students. Downey and Pribesh state that these less favorable evaluations are the
function of the student-teacher race effect and not simply the effect of student race. In
Model 4, they replaced the variables for students’ and teachers’ race with variables
measuring the student-teacher racial dyad and found that black students matched with
white teachers receive significantly higher reports of problem behaviors (b = .150,
p<.001) and lower reports of favorable behaviors (b = -.127, p<.001) when compared
to white students matched with white teachers. However, they did not find any
significant differences when looking at black students matched with black teachers or
white students matched with black teachers, further evidence that the poorer evaluations
of black students are related to the teacher’s race. In addition, they report that black
students receive more favorable behavioral evaluations when matched with black
teachers than do white students matched with white teachers (b = -.063, p = .06)
(2004:275). Downey and Pribesh interpret these results as evidence to support their
hypothesis that black students receive poorer evaluations than white students as a
function of teacher bias on the basis of student race.
In summary, Downey and Pribesh (2004) report that these statistical results do not
support the Oppositional Culture Theory explanation. On the other hand, the results do
support the teacher bias explanation that white teachers are biased against black students.
Based on this conclusion regarding subjective evaluations by teachers, Downey and
Pribesh conclude that the next step would be to examine how student-teacher racial (mis)matching impacts objective measures such as achievement and learning.
In addition to variables measuring student and teacher race and student-teacher
racial matching, Downey and Pribesh also included several control variables in their
analysis. Taking these control variables into account decreased the effects of students’
race on negative evaluations and several control variables were statistically significant
according to their published tables. Unfortunately, they do not include a discussion of
these variables in their study. For one, female students received far more favorable
evaluations than did male students in the areas of behavior (b = -.261, p<.001) and
approaches to learning (b = .285, p<.001), effects that were larger than that of race
according to Models 3 and 4. Another variable which had a larger effect on classroom
evaluations than race was the type of parent(s) living in the student’s household. Those
students with both biological parents in the home received more positive behavioral
evaluations (b = -.189, p<.001) and were reported to have more skills for learning (b =
.147, p<.001) as compared to other family compositions. Also, first-time kindergartners
were reported to have far better scores on the Approaches to Learning scale (b = .211,
p<.001) and fewer problem behaviors (b = -.147, p<.001) than those children who had been
students before. Other variables which were statistically significant in the direction of
more favorable evaluations were: socioeconomic status, student’s age, teacher’s
educational level (for problem behaviors only), public school (for problem behaviors
only), and the percentage of students who are black (for problem behaviors only). The
only control variable found to be statistically significant in the direction of less favorable
behavioral evaluations was the percentage of students eligible for free lunch in the
school, although its effects appear to be minimal (b = .001, p<.01). Downey and Pribesh
do not report any discussion or statistic with which to evaluate the predictive power of
their models, so it is unknown whether the variation in teachers’ evaluations of students’
classroom behaviors is adequately explained in these models. Still, one can see by the
change in student race regression coefficients from Model 1 and Model 2 to Model 3 that
the inclusion of these control variables does account for some of the variation that was
attributed to student race in the first two models. When controlling for the independent
variables included in Model 3, the effects of students’ race on teachers’ evaluations
decreased dramatically (b = .151 and b = -.127 respectively). It should be noted that the
change in variables to measure student and teacher race in Model 4 had little effect on
control variables for either dependent variable.
Furthering their scholarly contributions, Downey and Pribesh’s 2004 study has
informed subsequent studies. Two studies have cited their findings as empirical evidence
against Oppositional Culture Theory and proof of the need for policy changes in the areas
of education and socialization of minority children (Downey 2008; Gosa and Alexander
2007). A substantial number of studies have utilized the work as support for the theory
of teacher bias against students of dissimilar backgrounds, especially against minority
students. The proposals based on these substantive findings include hiring more women
and minorities for positions of power, eliminating tracking systems, providing
supplemental on-campus programs for minority youths, demanding and rewarding behavior
favored by mainstream society, focusing on enhancing non-cognitive skills in schools, and
even challenging the idea that education equalizes other social inequities (Bodovski and
Farkas 2008; Cohen and Huffman 2007; Condron 2007; Downey, von Hippel, and Broh
2004; Entwisle, Alexander, and Olson 2005; Lleras 2008; Shernoff and Schmidt 2008;
Stearns and Glennie 2006; Tach and Farkas 2006). Thus, the substantive conclusions
based on Downey and Pribesh’s (2004) statistical analysis will continue to have an
impact on future applied and theoretical sociological studies.
As with most studies, Downey and Pribesh’s work testing Oppositional Culture
Theory has been critically reviewed by their peers. Criticism has primarily been based
upon the problematic assumptions relating to the Oppositional Culture Theory in general,
and their operationalization of it in this and prior studies (Ainsworth-Darnell and Downey
1998; Downey 2008; Farkas, Lleras, and Maczuga 2002; Tach and Farkas 2006; Takei
and Shouse 2008). The current study will take a critical look at the methodological
aspects of Downey and Pribesh’s work. Their substantive conclusion is directly related
to the results of their statistical data analyses; however, these analyses were performed
using an incomplete data set. Previous studies have shown that there may be problematic
implications with reliance on statistical data analyses using incomplete quantitative data
(Regoeczi and Riedel 2003; Rubin 1976). Therefore, it is important to examine whether
Downey and Pribesh’s results were valid. Were the results of their study skewed by
missing data and did they handle the issue of missing data appropriately? The current
study will examine these questions through a review of prior literature and
methodological comparison.
As explained above, Downey and Pribesh (2004) used a multiple imputation method
to deal with the missing and incomplete cases in their statistical analyses. This is but one
method used to handle missing and incomplete data sets in quantitative studies. The
literature regarding the problem of missing data in general will be reviewed below,
followed by an extensive discussion regarding particular methods.
Missing Data
In the literature regarding handling missing data, the existence of missing data in
social research study is depicted as common and often unavoidable (Allison 2002;
Arbuckle 2007; Carter 2006; Fowler 2002; Regoeczi and Riedel 2003; Rudas 2005;
Stumpf 1978). Data can be missing for a variety of reasons including oversight in data
collection and recording, respondent drop out in longitudinal studies, planned
missingness in research design, and refusal, among others (Byrne 2001a; Schafer and
Graham 2002; Sinharay, Stern, and Russell 2001). The literature generally discusses
three types of missing data in surveys: unit non-response, which indicates no information
for a subject or case; item non-response, which is no information for a particular variable
for a case; and undercoverage, either due to attrition or sampling issues (Madow,
Nisselson, and Olkin 1983; Rudas 2005; Schafer and Graham 2002). Alternatively,
others divide the three missing data situations into those of: omission, which includes unit
and item non-response; attrition; and planned missing data, based on research design
(Graham, Hofer, and Piccinin 1994). One of the most common reasons for missing data
is refusal to answer a particular question in an otherwise complete survey, usually the
refusal to provide a piece of personal or controversial information. Sinharay, Stern, and
Russell (2001) found that income information is the most common item of refusal
(around 14%). Whatever the reason, missing data is seen as inevitable in social research,
especially surveys (Carter 2006; Yuan and Bentler 2000).
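The distinction drawn above between unit non-response and item non-response can be sketched with a small classifier over a toy respondent-by-item table; the respondent records and item names are hypothetical.

```python
def classify_nonresponse(records, items):
    """Tag each record as unit non-response (nothing answered),
    item non-response (some answers missing), or complete.
    `records` maps a respondent id to a dict of item -> value (None = missing)."""
    labels = {}
    for rid, answers in records.items():
        missing = [item for item in items if answers.get(item) is None]
        if len(missing) == len(items):
            labels[rid] = "unit non-response"
        elif missing:
            labels[rid] = "item non-response"   # e.g. refused only the income item
        else:
            labels[rid] = "complete"
    return labels

items = ["age", "income", "education"]
records = {
    1: {"age": 34, "income": None, "education": 16},      # refused income
    2: {"age": None, "income": None, "education": None},  # never responded
    3: {"age": 52, "income": 41000, "education": 12},     # complete case
}
print(classify_nonresponse(records, items))
```

Respondent 1 illustrates the refusal pattern Sinharay, Stern, and Russell (2001) describe, where the income item in particular goes unanswered in an otherwise complete record.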
If incomplete data sets and other types of missing data are commonplace in
survey research, then why is missing data considered a problem? Most social scientists
are interested in substantive issues and are not focused on methodology; thus most
statistical analyses are performed using standard complete-data methods even if the data
are incomplete (Rubin 1978; Rubin 1987; Rubin 1996; Rudas 2005; Schafer et al. 1996).
This presents a problem as standard statistical methods are designed for complete,
“rectangular” data sets (Little and Rubin 2002:3) where rows represent units (cases) and
columns represent variables (items). Missing variables and cases alter the rectangularity
of a data set resulting in a variety of missing data patterns. Most incomplete data sets
used in sociological research result in a general or multivariate pattern of missing data
due to item or unit non-response or a monotone pattern due to attrition of respondents in
longitudinal studies (Little and Rubin 2002). However, most statistical analysis
procedures cannot deal with a non-rectangular data set and sociologists often resort to ad
hoc editing of data to make it fit the parameters required by the method of analysis
(Rudas 2005; Schafer and Graham 2002).
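The monotone pattern produced by attrition can be distinguished from a general pattern with a simple check over a toy data matrix whose columns are ordered waves; the values below are invented for illustration.

```python
def is_monotone_pattern(rows):
    """Return True if missingness is monotone across ordered waves:
    in every row, once a value is missing (None), all later values are
    missing too -- the pattern produced by respondent attrition."""
    for row in rows:
        dropped_out = False
        for value in row:
            if value is None:
                dropped_out = True
            elif dropped_out:
                return False   # an observed value after a gap: general pattern
    return True

# Attrition: respondents drop out and never return.
attrition = [[1, 2, 3], [4, 5, None], [6, None, None]]
# Item non-response mid-survey yields a general (non-monotone) pattern.
general = [[1, None, 3], [4, 5, 6]]
print(is_monotone_pattern(attrition), is_monotone_pattern(general))
```

The distinction matters in practice because some missing-data procedures have simpler, more efficient forms when the pattern is monotone.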
Given that complete-data methods cannot handle non-rectangular data matrixes,
missing data can seriously bias results and conclusions drawn from empirical research.
The extent and direction of this bias depends on the amount and pattern of the missing
data. However, there are no clear guidelines as to what constitutes a problematic amount of
missing data or how to proceed with an incomplete data set (Bose 2001; Byrne 2001a;
Fowler 2002; Madow, Nisselson, and Olkin 1983; Regoeczi and Riedel 2003). There are
three general concerns when dealing with missing data: loss of efficiency, complication
in data handling and analysis, and bias from the difference between observed and
unobserved data (Horton and Lipsitz 2001; Mackelprang 1970; Stumpf 1978). More
specifically, the problems associated with performing statistical analyses on an
incomplete data set can include biased, inefficient estimates of population characteristics,
distorted distributions, and increased errors in hypothesis testing (Collins, Schafer, and
Kam 2001; Madow, Nisselson, and Olkin 1983). Due to the reduced sample size available for analysis, parameter estimates lose efficiency; they also become biased when respondents are systematically different from non-respondents, because the sample is then no longer representative of the population (Arbuckle 2007; Rubin 1987).
Incomplete data sets analyzed using complete-data methods not only yield biased and inaccurate results, but are often taken as valid statistical analyses upon
which substantive conclusions are based. Many social scientists lack the methodological
expertise to thoroughly evaluate the soundness of the statistical findings of a study
(Rubin 1996; Schafer et al. 1996). Unfortunately, many public agencies and social
scientists knowingly misuse statistics to further a cause or to prove the efficacy of a certain social program. Therefore, the issues of incomplete data should be considered in
all studies involving statistical analysis (Byrne 2001a; Madow, Nisselson, and Olkin
1983; Regoeczi and Riedel 2003).
Surprisingly, this issue was largely ignored in the statistics literature until the early
1970s, when Donald Rubin began his seminal work on the subject. His work was
considered groundbreaking and has since provided the framework behind modern
methods and terminology used to discuss them (Baydar 2004; Espeland 1988; Little and
Rubin 2002; Mackelprang 1970; Rubin 1976). After Rubin brought the issue of missing
data into the limelight, there was a relative proliferation in concern with non-response, as
evidenced by the creation of committees and panels for dealing with incomplete data.
The federal Office of Management and Budget even created a guideline stating that no
survey would be approved that anticipates a response rate of less than 50% (Rubin 1978).
Unfortunately, despite the increase in theoretical literature on the subject and available
methods and software which efficiently and validly deal with missing data, most social
researchers still view these methods as novel and the subject remains largely ignored
(McArdle 1994; Rubin 1996).
Missing Data Mechanisms
While the analysis of incomplete data is necessarily inferior to that of a complete
data set, the efficiency of analysis procedures varies depending on the proportion of
missing data and its distribution in the data set; this is especially true when data is
systematically missing or when a large portion of the sample is missing. In most
discussions regarding statistical analysis with missing data, one can find reference to
missing data mechanisms. The three mechanisms have been termed “missing completely
at random,” “missing at random,” and “missing not at random” (Allison 1987; Byrne
2001a; Collins, Schafer, and Kam 2001; Mackelprang 1970; Schafer and Graham 2002;
Stumpf 1978). Many interpret these mechanisms as the cause or reason for the
missingness. However, missing data mechanisms actually represent the distribution of
missingness, not a causal relationship. Mechanisms reflect the possible relationships
between the missingness and the value of missing items themselves as well as to other
measured variables, not what causes the data to be missing (Enders 2006; Little and
Rubin 2002; Rubin 1976; Schafer and Graham 2002). Rubin (1976) stated that missing
data was ignorable if it was missing at random and observed at random, an idea that has
come to be referred to as missing completely at random in most discussions (Allison
1987). Rubin’s 1976 definitions of the three primary mechanisms of missing data have
become the common terms for discussing this topic, and will be used in the current
discussion.
Missing completely at random (MCAR)
When the missingness does not depend on the observed or missing data in the
data set, then the missing data is said to be missing completely at random (MCAR) (Little
and Rubin 2002; Schafer and Graham 2002). In other words, data are MCAR when the probability of missing data on a particular variable is unrelated to the
value of that variable or the values of any other variables in the data set, observed or
unobserved (Byrne 2001a; Enders 2001; Regoeczi and Riedel 2003). Basically, the
observed values are a random subsample of a hypothetically complete data set where the
non-respondents do not differ from respondents (Enders 2001; Little and Rubin 2002). If
data is MCAR the missingness is considered ignorable, because the missing response is
independent of all variables in the study and occurs by chance. In the case of MCAR,
analysis remains unbiased (Sinharay, Stern, and Russell 2001). There is no relationship
between the patterns of missing and observed data (McArdle 1994) and missingness is
not related to any known or unknown variable relating to the data set (Horton and Lipsitz
2001).
MCAR is a special case of MAR, which is discussed below (Enders 2006; Schafer and Graham 2002), and is the most restrictive assumption (Byrne 2001a). The
MCAR assumption can be tested for statistically but is unlikely to hold (Regoeczi and
Riedel 2003) unless the missingness is by design (Little and Rubin 2002).
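For illustration, the MCAR mechanism can be simulated in a few lines of Python with NumPy (a hypothetical sketch; the variable, sample size, and missingness rate are invented for illustration and are not drawn from any study discussed here). Because each value is deleted with a fixed probability unrelated to any variable, the observed cases remain a representative random subsample:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
income = rng.normal(50, 10, n)          # hypothetical, fully generated variable

# MCAR: each value is missing with probability 0.3, independent of
# income itself and of every other variable in the data set.
missing = rng.random(n) < 0.3
observed = income[~missing]

# The observed cases are a random subsample of the hypothetically
# complete data set, so their mean stays close to the full-sample mean.
print(round(income.mean(), 2), round(observed.mean(), 2))
```

The near-identical means illustrate why MCAR missingness is ignorable: the respondents are not systematically different from the non-respondents.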
Missing at random (MAR)
Data are missing at random (MAR) if the probability that an observation is
missing can depend on the values of observed data, but not on the values of the missing
item itself (Allison 2002; Baydar 2004; Byrne 2001a; Enders 2001; Enders 2006; Little
and Rubin 2002; Regoeczi and Riedel 2003; Schafer and Graham 2002). Basically, the
occurrence of missing values may be at random, but their missingness can be linked to
the observed values of other variables in the data set (Byrne 2001a; Horton and Lipsitz
2001; Sinharay, Stern, and Russell 2001). MAR is also considered ignorable
missingness, as some relationship exists between the patterns of missing data and the observed scores, but the data are still observed at random (McArdle 1994; Rubin 1976).
MAR is a less restrictive assumption than MCAR (Byrne 2001a), but still not
totally realistic in the social sciences. MAR is usually the case with planned missingness
and can be viewed as a good working assumption (Allison 2002; Arbuckle 2007;
Regoeczi and Riedel 2003; Schafer and Graham 2002; Yuan and Bentler 2000).
However, there is no statistical test for MAR as there is no way to determine if missing
values depend on the variable itself or differ systematically without knowing the values
of the missing data. Methodologists suggest that the only way to be sure that missing
data are MAR is to actually go back and collect the missing data. Some do suggest that
one practical way of evaluating whether data are MAR is to check that the missingness is predictable from other observed variables. Nevertheless, most also agree that minor departure
from MAR may not cause significant bias, so the assumption of MAR is usually a safe
one and tests are not as necessary.
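MAR missingness can likewise be sketched in a short hypothetical simulation (Python/NumPy; the variables and rates are invented for illustration). Here the probability that income is missing rises with the respondent's observed age, but never depends on income itself. Note that even under MAR the complete cases alone are no longer representative, which is why simply dropping incomplete cases biases the estimate:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
age = rng.normal(40, 10, n)                     # fully observed
income = 30 + 0.8 * age + rng.normal(0, 5, n)   # partially observed

# MAR: older respondents are more likely to skip the income item.
# Missingness depends only on the *observed* age, never on income itself.
p_missing = 1 / (1 + np.exp(-(age - 40) / 5))   # rises with age
missing = rng.random(n) < p_missing

# The retained (complete) cases are systematically younger, so their
# mean income is biased downward relative to the full sample.
print(round(income.mean(), 2), round(income[~missing].mean(), 2))
```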
Missing not at random (MNAR)
Missingness is considered to be missing not at random (MNAR) if the missingness is related to the values of the missing items themselves, not only to the observed values. MNAR missingness is
considered nonignorable because the missingness is related to the value that would have
been observed (Allison 2002; Horton and Lipsitz 2001; Little and Rubin 2002; McArdle
1994; Schafer and Graham 2002; Sinharay, Stern, and Russell 2001).
MNAR is the least restrictive and most plausible assumption in applied settings
(Byrne 2001a). However, it is also the most problematic as MNAR missingness can have
an effect on the generalizability of findings and introduce significant bias as a result of
the systematic difference between cases with missing and observed variables (Arbuckle
2007; Byrne 2001a; Fowler 2002; Stumpf 1978).
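The MNAR case can be contrasted with the previous mechanisms in a final hypothetical sketch (Python/NumPy; the variable and rates are invented for illustration). Here the probability of refusal depends on the value that would have been observed, so no observed variable carries information about the distortion:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
income = rng.normal(50, 10, n)

# MNAR: the higher the (unobserved) income, the more likely the
# respondent is to refuse the item -- missingness depends on the
# value that would have been observed.
p_missing = 1 / (1 + np.exp(-(income - 50) / 5))
missing = rng.random(n) < p_missing

# The observed cases systematically under-represent high earners, and
# no other measured variable can account for this distortion.
print(round(income.mean(), 2), round(income[~missing].mean(), 2))
```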
The treatment of missing data depends on the mechanism behind the missingness; however, little attention is paid to this issue in practice (Allison 2002; Regoeczi and
Riedel 2003; Stumpf 1978). Theoretically, in most statistical analyses it is generally
assumed that missing data is accidental and random, and can thus be ignored (Rubin
1976). However, statisticians should examine the process behind missing data and
include this process in their model and choice of method (Graham, Hofer, and Piccinin
1994; Rubin 1976).
Handling Incomplete Data
Most literature on missing data methods actually indicates that the best method
for dealing with the issues of incomplete data is to collect data as fully and accurately as
possible and to handle missing data at the data collection phase (Allison 2002; Fowler
2002; Graham, Hofer, and Piccinin 1994; Madow, Nisselson, and Olkin 1983; Rudas
2005). However, this is not always possible or sufficient. The goal of any statistical
procedure is to make valid, efficient inferences about the population (Rudas 2005;
Schafer and Graham 2002). The general guidelines for handling incomplete data are similar: a method should allow standard complete-data analyses to be used, should be capable of yielding valid inferences, and should reveal the sensitivity of those inferences to various plausible models for non-response (Rubin 1987). When he began his work on incomplete data, Rubin outlined
three related objectives which must be met to properly handle non-response: adjust estimates for the fact that non-respondents differ systematically from respondents; expand standard errors to reflect the smaller sample size and the differences between respondents and non-respondents; and expose the sensitivity of estimates and standard errors to possible
difference between respondents and non-respondents on unmeasured background
variables (Rubin 1987).
The overarching goal when dealing with incomplete data is to estimate predictors
so that uncertainty due to missingness is taken into account (Allison 1987; Rudas 2005;
Stumpf 1978). This avoids the serious bias which missing data can cause in complete-data analysis methods (Sinharay, Stern, and Russell 2001). Adhering to this goal should
provide consistent, efficient parameter estimates and good estimation of standard errors,
which will allow valid hypothesis testing and confidence intervals (Allison 1987). The question is how to deal with missing data in a way that meets these goals. In keeping
with the majority of literature on missing data methods, traditional methods will be
briefly reviewed before discussing in detail the three contemporary methods which are
the focus of the current study.
Traditional Methods
Most traditional methods are ad hoc editing treatments for missing data and are
not theoretically based. Their primary goal is to fix up the data so they can be analyzed
by methods designed for complete data, to make the data fit back into the rectangular
matrix (Allison 1999; Allison 2002; Stumpf 1978; Wothke and Arbuckle 1996). These
methods generally work well in limited contexts where data are MCAR and only a small
amount of missing data exists. Still, they are prone to estimation bias and are less
powerful and less efficient than modern methods, even when the data is MCAR (Enders
2006). One type of method traditionally used is reweighting, in which complete cases are
weighted so that their distribution more closely resembles that of the full sample or
population (Fowler 2002; Schafer and Graham 2002). However, the most commonly
used methods repair data before analysis by discarding records (deletion) or filling in
values (imputation) (Enders 2006). Simple imputation methods fill in missing values
with some estimated score, such as an average value (Schafer and Graham 2002). These
imputation methods obtain the imputed score used to fill in for the missing value in
various ways. However, all are considered fairly arbitrary, lacking variation, and
significantly biased (Byrne 2001a; Enders 2006; Rubin 1987). Imputation methods are
more efficient than deletion as there is no data loss, but are more difficult to implement
and can severely distort relationships (Schafer and Graham 2002). There are two primary
deletion methods: pairwise and listwise. In pairwise deletion, only cases with missing
values on variables included in a particular computation are excluded from analysis
(Allison 1999; Arbuckle 2007; Byrne 2001a; Enders 2006; Schafer and Graham 2002).
The sample size may vary widely across variables depending on the particular analysis
and the sampled cases will be different for each analysis.
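This behavior is easy to observe in practice: pandas' `DataFrame.corr()`, for example, computes each correlation from the pairwise-complete observations, so each coefficient can rest on a different case base. The following sketch uses invented data purely for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 200
df = pd.DataFrame(rng.normal(size=(n, 3)), columns=["x", "y", "z"])

# Knock out different, overlapping sets of cases on x and y; z stays complete.
df.loc[rng.random(n) < 0.2, "x"] = np.nan
df.loc[rng.random(n) < 0.2, "y"] = np.nan

# Pairwise deletion: the x-y coefficient uses only cases complete on both
# x and y, while the x-z coefficient uses a different (larger) case base.
n_xy = df[["x", "y"]].dropna().shape[0]   # cases available for r(x, y)
n_xz = df[["x", "z"]].dropna().shape[0]   # cases available for r(x, z)
print(n_xy, n_xz)
print(df.corr().round(2))
```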
Listwise Deletion (LD)
The other principal deletion method is listwise deletion (LD), also called complete
case analysis or casewise deletion (Allison 1999; Arbuckle 2007; Schafer and Graham
2002; Sinharay, Stern, and Russell 2001; Stumpf 1978). Although LD is based on the ad
hoc ideas of other traditional methods, it is the most commonly applied method for
handling incomplete data problems. In LD, cases are omitted which do not have
complete records on all variables to be included in the analysis (Byrne 2001a; Carter
2006; Enders 2006; Little and Rubin 2002; Schafer and Graham 2002). LD can lead to a
very small sample size and thus incorrect statistical results (Sinharay, Stern, and Russell
2001). However, unlike pairwise deletion, all analyses are calculated with the same set
of cases and the final sample includes only cases with complete records. Yet, analysis of
these complete cases may be biased because they can be unrepresentative of the full population if the missingness is not MCAR (Byrne 2001a; Carter 2006; Rubin 1987; Schafer
and Graham 2002).
The opinion regarding LD is varied in the literature. The general opinion is that
LD assumes data are MCAR, but can still produce somewhat accurate results with MAR and a small amount of missing data (Carter 2006; Enders 2006; Sinharay, Stern, and Russell
2001). However, others note that even with data that are MCAR, LD may be consistent
but is still biased and inefficient as it often discards a large amount of data (Allison 1987;
Arbuckle 2007; Carter 2006; Enders 2006; Little and Rubin 2002). There is general
consensus that LD is inefficient when there is a substantial amount of missing data as a
large portion of data is discarded from analysis (Allison 1999; Schafer and Graham 2002)
resulting in a loss of information, reduced sample size, and overall decreased statistical
power (Byrne 2001a). Still, most do agree that LD has its place in contemporary missing
data methodology as it is the simplest method to achieve satisfactory results and remains
the most widely used method (Schafer and Graham 2002).
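The loss of information under LD compounds quickly across variables. In the hypothetical sketch below (invented data; pandas' `dropna()` implements listwise deletion), three variables each missing 10% of their values independently discard roughly 1 − 0.9³ ≈ 27% of the sample:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
n = 1_000
df = pd.DataFrame(rng.normal(size=(n, 3)), columns=["a", "b", "c"])

# 10% MCAR missingness on each variable, independently.
for col in df.columns:
    df.loc[rng.random(n) < 0.10, col] = np.nan

# Listwise deletion keeps only fully observed rows: roughly
# 0.9 ** 3 = 72.9% of the cases survive.
complete = df.dropna()
print(len(complete), f"{len(complete) / n:.0%} of cases retained")
```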
Direct Maximum Likelihood Estimation (DML)
“Appropriate methods do not make something out of nothing but do make the
most out of available data” (Graham, Hofer, and Piccinin 1994:14). Maximum
likelihood estimation (ML) is a theoretically based, iterative method for dealing with
incomplete data (Byrne 2001a; Eliason 1993; Enders 2006; Little and Rubin 2002). ML
has two basic assumptions: that the missing data mechanism is ignorable (MCAR or MAR) and that the data fit a multivariate normal model (Allison 1987; Sinharay, Stern, and
Russell 2001; Yuan and Bentler 2000). The basic goal of ML is to identify the
population parameter values that are most likely to have produced a particular sample of
the data, using only the variables that are observed for each case (Collins, Schafer, and Kam 2001;
Enders 2006). Basically, ML borrows information from observed variables and identifies
the best possible parameter value such that it maximizes the likelihood that the actual
observed values will be produced by the estimated model (Eliason 1993; Enders 2001).
ML uses iterative algorithms to try out different values (Enders 2006). The discrepancy
between each case’s data and estimation is quantified by the likelihood, which is similar
to a probability measuring how likely a particular score is to occur from a normal
distribution with a particular set of parameter values.
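The idea of "trying out" parameter values can be made concrete with a small hypothetical sketch (Python/NumPy; the data are simulated for illustration). For a normal model, the log likelihood is maximized exactly at the sample mean, so candidate values on either side of it score lower:

```python
import numpy as np

rng = np.random.default_rng(5)
sample = rng.normal(loc=10, scale=2, size=500)

def log_likelihood(mu, sigma, data):
    """Sum of normal log-densities: how likely the observed data are
    under a normal model with the given parameter values."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (data - mu) ** 2 / (2 * sigma**2))

sigma_hat = sample.std()          # ML estimate of sigma (divisor n)
candidates = [sample.mean() - 0.5, sample.mean(), sample.mean() + 0.5]
scores = [log_likelihood(mu, sigma_hat, sample) for mu in candidates]

# The middle candidate (the sample mean) attains the highest likelihood:
# it is the parameter value most likely to have produced this sample.
print([round(s, 1) for s in scores])
```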
There are three common ML estimation algorithms for missing data (Allison
2002; Enders 2001; Yuan and Bentler 2000):
1) Expectation Maximization (EM) Algorithm, a commonly used two-stage iterative
method. The first step is to obtain estimates given the observed values, and the second step is to find the maximum likelihood parameters using the data set completed with the estimates obtained in the first step.
2) Multiple Group Approach in which a sample is divided into groups so that each
subgroup has the same pattern of missing data and a likelihood function is computed for
each group and then maximized.
3) Direct Maximum Likelihood (DML) which is similar to the multiple group method
except that the likelihood function is calculated at the individual rather than group level.
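The EM algorithm's two steps can be sketched for the simplest bivariate case: x fully observed, y missing for some cases under MAR. This is a hypothetical, minimal implementation (all names and values invented; it is not the algorithm as packaged in any particular program). The E-step fills in the expected sufficient statistics for the missing y values given the current parameters, including the conditional residual variance; the M-step re-estimates the parameters from those completed statistics:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 2_000
x = rng.normal(1.0, 1.0, n)
y = 2.0 + 0.5 * (x - 1.0) + rng.normal(0.0, np.sqrt(0.75), n)
obs = rng.random(n) > 0.4 * (x > 1.0)   # MAR: y missing ~40% of the time when x > 1

# Start from complete-case estimates; mu_x and s_xx use all cases (x is complete).
mu_x, s_xx = x.mean(), x.var()
mu_y, s_yy = y[obs].mean(), y[obs].var()
s_xy = np.cov(x[obs], y[obs], bias=True)[0, 1]

for _ in range(200):
    beta = s_xy / s_xx
    resid_var = s_yy - beta * s_xy
    # E-step: expected sufficient statistics for cases with y missing.
    ey = np.where(obs, y, mu_y + beta * (x - mu_x))
    ey2 = np.where(obs, y ** 2, ey ** 2 + resid_var)  # conditional variance added
    # M-step: re-estimate the parameters from the completed statistics.
    mu_y_new = ey.mean()
    s_yy = ey2.mean() - mu_y_new ** 2
    s_xy = (x * ey).mean() - mu_x * mu_y_new
    converged = abs(mu_y_new - mu_y) < 1e-8
    mu_y = mu_y_new
    if converged:
        break

print(round(mu_y, 3), round(s_yy, 3), round(s_xy, 3))
```

The estimates converge near the generating values (mean of y = 2, variance 1, covariance 0.5), whereas the complete-case starting values for the mean of y are biased downward because the discarded cases have larger x.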
DML is considered theoretically superior to other ML methods and will be the
ML method of focus in this study (Byrne 2001b; Yuan and Bentler 2000). Direct
Maximum Likelihood (DML) is also referred to as full information maximum likelihood
and raw maximum likelihood (Navarro 2003). DML is a direct likelihood inference that
results from “ratios of the likelihood function for various values of the parameter” (Rubin
1976:586). DML is considered to be a direct method because there is no attempt to
restore the data matrix to rectangular form (Byrne 2001b). Model parameters and
standard errors are estimated directly using all observed data and no values are
imputed (Enders 2006). In direct approaches (DML and Multiple Group Approach)
parameter estimates are obtained directly from available raw data without a preliminary
data preparation step (i.e. imputation) and complete data analysis methods can be used
(Enders 2001). This is unlike indirect approaches (EM Algorithm), in which an additional data preparation phase is required, along with further analyses to recover lost residual variability. DML uses all of the available information on observed
portions of variables to generate ML-based statistics (Carter 2006; Navarro 2003). The DML estimate is obtained by maximizing the observed-data likelihood, a function linking the observed data to the model parameters (Little and Rubin 2002; Sinharay, Stern, and Russell 2001; Yuan and Bentler 2000). DML maximizes the case-wise
likelihood of the observed data through an iterative function, which ultimately converges
on a single set of parameter values that maximize the log likelihood (Enders 2006; Wothke
and Arbuckle 1996).
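For the simple special case of two variables where x is fully observed and y is partially missing, maximizing the observed-data likelihood has a closed form via a factorization of the kind discussed by Little and Rubin (2002): f(x, y) = f(x) · f(y | x). The following hypothetical NumPy sketch (simulated data, invented names) contrasts the resulting direct ML estimate of the mean of y with the listwise-deletion estimate under MAR:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2_000
x = rng.normal(1.0, 1.0, n)
y = 2.0 + 0.5 * (x - 1.0) + rng.normal(0.0, 1.0, n)
obs = rng.random(n) > 0.4 * (x > 1.0)   # MAR: y missing more often when x is large

# Factored likelihood: the x-parameters use *all* n cases, while the
# regression of y on x uses only the cases where y is observed.
# No value is ever imputed.
mu_x = x.mean()                                           # all cases
b1 = np.cov(x[obs], y[obs], bias=True)[0, 1] / x[obs].var()
b0 = y[obs].mean() - b1 * x[obs].mean()                   # complete cases

mu_y_dml = b0 + b1 * mu_x          # direct ML estimate of the mean of y
mu_y_listwise = y[obs].mean()      # listwise deletion, for comparison
print(round(mu_y_dml, 3), round(mu_y_listwise, 3))
```

The direct estimate recovers the generating mean (2.0), while listwise deletion is biased downward because the dropped cases have systematically larger x, and therefore larger y.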
Even among those who favor ML methods, DML has been historically
unavailable and rarely used except in SEM applications (Allison 1999; Byrne 2001a;
Collins, Schafer, and Kam 2001). The reasons for this lack of use are mostly that DML is
model specific, complicated, and until recently required sophisticated methods (Sinharay,
Stern, and Russell 2001). The general feeling was that DML might be too much work for practical use, and many preferred to use the EM algorithm over DML (Allison 1987;
Yuan and Bentler 2000).
Still, advantages of DML over ad hoc methods are clear. DML’s lack of reliance
on MCAR is one obvious advantage (Wothke and Arbuckle 1996). It provides unbiased
and valid inferences under MAR, especially in large samples (Arbuckle 2007; Schafer and Graham 2002). Even when data are not quite MAR, DML produces valid results, and
with MNAR data DML is the least biased method. DML is able to yield standard error
estimates which take into account the incompleteness in the data and provide a method
for hypothesis testing (Byrne 2001a; Little and Rubin 2002). A theoretical advantage of
DML over other methods is that it provides estimates without requiring the filling in of missing values (Sinharay, Stern, and Russell 2001). DML is flexible, is not limited by
the number of missing data patterns, and does not require complex steps to accommodate
missing data (Carter 2006). DML is widely applicable to a variety of analyses including
multiple regression and structural equation modeling (SEM) (Enders 2001).
New algorithms and software, such as Amos, have made DML a simpler and
more feasible option (Allison 1987; Arbuckle 2007; Collins, Schafer, and Kam 2001;
Zeiser 2008). DML is the standard option available for dealing with missing data in the
Amos program. Amos (Analysis of Moment Structures) is a widely available statistical
software package which uses SEM in the analysis of mean and covariance structures.
Procedures can be performed by path diagram (Amos Graphics) or equation statement
(Amos Basic) (Byrne 2001b; Byrne 2001a; Cunningham and Wang 2005). With the
increased use of Amos, the DML method for the analysis of incomplete data sets will become more mainstream.
Multiple Imputation (MI)
Multiple Imputation (MI) is another type of correction method, similar in practice to simple imputation but theoretically based (Wothke and Arbuckle 1996). MI does not create
information but represents the observed information so as to make it appropriate for valid
analysis using complete data tools (Rubin 1996). MI was designed for the survey context
by Rubin in 1971 when he became concerned that non-respondents were systematically
different from respondents in an educational testing survey (Rubin 1987). He developed
MI for large sample surveys in which data is collected to be used by a variety of users
and for a variety of purposes (Sinharay, Stern, and Russell 2001). In particular, MI was
designed for instances where database constructors and users are distinct entities (Rubin
1996). MI has three assumptions: matching imputation and analysis models, at least
MAR if not MCAR, and a multivariate normal model (Allison 2000; Enders 2006).
MI is the process of replacing each missing data point with a set of m>1 plausible
values to generate m complete data sets to be analyzed by standard statistical methods
(Allison 2000; Collins, Schafer, and Kam 2001; Enders 2001; Enders 2006; Freedman
and Wolf 1995; Little and Rubin 2002; Navarro 2003; Penn 2007; Rubin 1987; Rubin
1996; Schafer 1999; Schafer and Graham 2002; Schafer and Olson 1998; Sinharay, Stern,
and Russell 2001; Yuan 2000). Multiple imputations represent a distribution of
possibilities and reflect the uncertainty about nonresponse bias (Fay 1992; Graham,
Hofer, and Piccinin 1994; Horton and Lipsitz 2001; Rubin 1978). MI is a three-step
process:
1) Imputation: Starting from the observed data and using a predictive model, a set of plausible values for each missing value is created. These values are used to fill in and
create complete data sets. Multiple full data sets are created, each with a different set of
random draws.
2) Analysis: Data sets are each analyzed using complete-data methods.
3) Combination: The results from each separate analysis are combined using
straightforward calculations developed by Rubin to develop single results.
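The three steps can be sketched end to end in a small hypothetical simulation (Python/NumPy; data, names, and m = 20 are invented for illustration). Here each imputation refits a regression on a bootstrap resample of the complete cases, a deliberately simplified stand-in for a proper Bayesian parameter draw, fills the missing y values with a prediction plus a random residual, analyzes each completed set, and pools the results with Rubin's combination rules:

```python
import numpy as np

rng = np.random.default_rng(8)
n, m = 1_000, 20
x = rng.normal(1.0, 1.0, n)
y = 2.0 + 0.5 * (x - 1.0) + rng.normal(0.0, 1.0, n)
obs = rng.random(n) > 0.4 * (x > 1.0)   # MAR missingness on y

estimates, variances = [], []
cc = np.flatnonzero(obs)                # indices of the complete cases
for _ in range(m):
    # Step 1 -- Imputation: bootstrap-resample the complete cases to vary
    # the regression parameters, then fill each missing y with a
    # prediction plus a random residual draw.
    idx = rng.choice(cc, size=cc.size, replace=True)
    b1 = np.cov(x[idx], y[idx], bias=True)[0, 1] / x[idx].var()
    b0 = y[idx].mean() - b1 * x[idx].mean()
    resid_sd = (y[idx] - b0 - b1 * x[idx]).std()
    y_imp = np.where(obs, y, b0 + b1 * x + rng.normal(0.0, resid_sd, n))
    # Step 2 -- Analysis: run the complete-data analysis on each data set.
    estimates.append(y_imp.mean())
    variances.append(y_imp.var() / n)
# Step 3 -- Combination (Rubin's rules): pooled point estimate, plus a
# total variance adding between-imputation to within-imputation spread.
q_bar = float(np.mean(estimates))
w = float(np.mean(variances))           # within-imputation variance
b = float(np.var(estimates, ddof=1))    # between-imputation variance
total_var = w + (1 + 1 / m) * b
print(round(q_bar, 3), round(total_var ** 0.5, 4))
```

The pooled total variance exceeds the within-imputation variance alone, which is how MI reflects the added uncertainty due to the missing data.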
There are two standard algorithms for performing MI: Propensity Score Classifier
with approximate Bayesian bootstrap and Data Augmentation (Allison 2000; Carter
2006; Schafer and Graham 2002). Generally, data augmentation is considered to be the
best algorithm because it produces little or no bias. Data augmentation is an iterative,
regression-based method of simulating the posterior distribution. Data augmentation
iteratively draws a sequence of values of the parameters and missing data until
convergence occurs. The iterative chain repeats two steps (Allison 2000; Collins,
Schafer, and Kam 2001; Enders 2001; Enders 2006; Schafer and Olson 1998; Yuan
2000):
I Step: Replaces missing values with predicted scores from a series of multiple
regression equations.
P Step: New covariance matrix and mean vector elements are randomly sampled from a
posterior distribution that is conditional on the filled-in data from I Step.
The I and P steps are repeated numerous times, with imputed data sets saved at specified intervals, until the chain converges to its stationary distribution (Enders 2006; Yuan 2000).
The chain is said to have converged when the within-chain variation approximately equals the between-chain variation (Little and Rubin 2002) or when the distribution of parameter estimates no longer changes between contiguous iterations (Enders 2001).
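The I/P iteration can be sketched as a loop. This hypothetical, deliberately simplified version (invented data and names; MCAR missingness for simplicity) redraws the regression parameters from a bootstrap resample of the completed data as a crude stand-in for the exact posterior draw of a full data augmentation sampler:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 500
x = rng.normal(0.0, 1.0, n)
y = 1.0 + 0.5 * x + rng.normal(0.0, 1.0, n)
miss = rng.random(n) < 0.3              # MCAR missingness on y, for simplicity

# Initialize the parameters from the complete cases.
cc = ~miss
b1 = np.cov(x[cc], y[cc], bias=True)[0, 1] / x[cc].var()
b0 = y[cc].mean() - b1 * x[cc].mean()
resid_sd = (y[cc] - b0 - b1 * x[cc]).std()

chain = []
for t in range(300):
    # I step: replace missing y values with regression predictions plus
    # random noise, given the current parameter values.
    y_work = np.where(miss, b0 + b1 * x + rng.normal(0.0, resid_sd, n), y)
    # P step: redraw the parameters from a bootstrap resample of the
    # completed data (a crude stand-in for a draw from the posterior).
    idx = rng.integers(0, n, n)
    b1 = np.cov(x[idx], y_work[idx], bias=True)[0, 1] / x[idx].var()
    b0 = y_work[idx].mean() - b1 * x[idx].mean()
    resid_sd = (y_work[idx] - b0 - b1 * x[idx]).std()
    chain.append(b0)

# After a burn-in, the draws fluctuate around a stationary value;
# completed data sets saved at spaced intervals would then serve as
# the multiple imputations.
print(round(float(np.mean(chain[100:])), 3))
```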
The SAS program used by Downey and Pribesh (2004) uses a data augmentation
MI method for imputing missing data based on Rubin’s guidelines (1987), which
assumes MCAR or MAR and multivariate normality (Horton and Lipsitz 2001; Sinharay,
Stern, and Russell 2001; Yuan 2000).
Overall, MI is considered to be a simple and generalizable method (Little and
Rubin 2002; Sinharay, Stern, and Russell 2001). One of the major advantages of MI is
that missing data is dealt with prior to analysis, which makes the completed data set more
available for secondary data analysis. This also encourages a more inclusive strategy of
adding auxiliary variables to improve the missing data management without having to
actually include them in subsequent analyses (Collins, Schafer, and Kam 2001; Penn
2007; Schafer and Graham 2002; Schafer et al. 1996). The imputed data sets can be
readily analyzed with available software, and no special analysis software is needed by the user (Enders 2006). This also allows different methods to be used for imputation and
analysis (Schafer 1999). In addition, MI has weaker, more realistic assumptions than
traditional methods and still performs well when data does not fit the normal model
(Allison 2000; Enders 2006). Because no data are omitted, MI sustains the original sample size and thus reduces bias relative to traditional methods.
Yet, MI has its drawbacks. The disadvantages of MI are that it takes more time,
more effort, and more storage space than other methods because multiple data sets must
be created and analyzed (Graham, Hofer, and Piccinin 1994; Little and Rubin 2002;
Rubin 1987). In addition to being labor intensive, MI also requires more statistical
expertise (Enders 2006). Although the ability to perform analysis and imputations
separately is an advantage, the results can be invalid if the analysis model is not the same as the imputation model (Fay 1992). However, this is easily remedied by the imputer, who
should include as many variables as possible when doing imputations (Rubin 1996;
Sinharay, Stern, and Russell 2001).
Review of Missing Data Issues
As the review of existing literature on the subject reveals, there is a wide variety of methods currently in use for handling incomplete data sets, ranging from simply
ignoring the missingness to elaborate estimation and analysis methods designed to
simulate the variability of a complete data set. Since Rubin initiated the discussion in the
1970s, there has been a relatively large amount of theoretical literature regarding the
issue of missing data in statistical analysis. Unfortunately, theory has not had much
influence in practice on the treatment of missing data (Wothke and Arbuckle 1996).
Most methodologists advise one to avoid solving missing data problems by simply
replacing missing values with arbitrary numbers, such as 0 or -9, as is the common
treatment (Schafer and Graham 2002). Further, recent theoretical literature urges the
treatment of missing values as a source of variation rather than as simply a nuisance, as many traditional social scientists view it (Sinharay, Stern, and Russell 2001).
Nevertheless, there is no real consensus about which methods should be employed to
handle statistical analyses with missing data in practice (Collins, Schafer, and Kam
2001).
Most methodologists do acknowledge that how missing data should be handled
depends on the distribution of missing data in the particular data set to be analyzed
(Regoeczi and Riedel 2003). Statisticians should consider the process behind missing
data and include this process in their method choice (Graham, Hofer, and Piccinin 1994; Rubin 1976). In order for most statistical analysis methods to produce unbiased, valid results, the assumption of MCAR must be made, even for those which are purported to be
appropriate for incomplete data sets.
Unfortunately, little attention has been paid to the issue of missing data
mechanisms and method choice in practice. In some cases, such as MCAR with a small
amount of missingness, LD may be an appropriate method given its simplicity (Allison
2002; Enders 2001; Regoeczi and Riedel 2003). However, with MAR or a substantial
amount of missing data, LD is clearly inefficient and biased (Wothke and Arbuckle
1996). The methodological literature presents MI and DML as generally comparable
methods (Collins, Schafer, and Kam 2001; Enders 2006). A large portion of
contemporary literature recommends, under general MAR conditions, DML as a first choice wherever possible and MI where appropriate, over traditional ad hoc and simple imputation methods (Navarro 2003; Schafer and Graham 2002). However, some
methodologists suggest that the choice between methods may be one of personal preference and convenience rather than theoretically based (Allison 2002; Enders 2006).
Regardless of the rationale behind method choice, the fact remains that many
social scientists perform statistical analyses using incomplete data sets. The results of
such analyses are generally considered accurate and valid by both the researcher and their
audience. Further, substantive conclusions and significant decisions are often based on
the results of these statistical analyses. The remainder of the current study will focus on a
comparison of three widely used, mainstream methods for handling incomplete data (LD,
DML, and MI) and the subsequent evaluation of the substantive conclusions of Downey
and Pribesh’s 2004 study.
Chapter 3
METHODOLOGY
Hypothesis
In light of the prior literature, the following hypothesis has been developed and
will be further explored in the methodological comparison to follow: DML and MI will
produce equivalent results and in application arrive at the same substantive conclusions.
Therefore, DML should be used whenever it is appropriate to do so, even if MI is also
appropriate, as it is generally easier to implement.
Sample
The current study is a re-examination of Downey and Pribesh’s 2004 study,
“When Race Matters: Teachers’ Evaluations of Students’ Classroom Behavior.”
Downey and Pribesh utilized data from the Early Childhood Longitudinal Study’s base
year data collected from the fall kindergarten class of 1998-99, commonly referred to as
the ECLS-K study (National Center for Education Statistics (NCES) 2004). In order to
conduct a proper comparison of methods using Downey and Pribesh’s 2004 study as
a starting point, this study will utilize the same data set and variables as their original
study.
The ECLS-K is a nationally representative sample of 22,782 children who
attended kindergarten in the 1998-99 school year. Base year data were collected when
the children were kindergarten students. There have been four subsequent waves of data
collection since, when the children were in the first, third, and fifth grades (NCES 2004).
Sampling for the primary wave of ECLS-K involved a dual-frame, multistage sampling
design (West, Denton, and Reaney 2001). The primary sampling units (PSUs) utilized by
the ECLS-K study were revised from existing multipurpose PSUs created from 1990
county-level population data in order to meet the study’s precision goals regarding size,
race, and income. These PSUs were updated using “1994 population estimates of five-year-olds by race-ethnicity” (NCES 2004:4-1). These new PSUs were constructed for a
minimum PSU size of 320 five year olds, a size which was designed to allow for over
sampling of specific demographic groups. PSUs which did not meet the minimum
standard were incorporated into an adjoining PSU. Next, private and public schools
offering Kindergarten programs were selected from within the sampled PSUs. The
school sampling frame was composed of pre-existing school data, with schools not
meeting the minimum number of kindergarten students clustered together. Finally,
students were sampled from these Kindergarten programs, with the goals of obtaining a
“self-weighting sample of students” and a minimum sample size for several targeted
subpopulations (NCES 2004:4-8; West, Denton, and Reaney 2001).
According to the ECLS-K study documentation:
The Early Childhood Longitudinal Study-Kindergarten Class of 1998-99
(ECLS-K) employed a multistage probability sample design to select a
nationally representative sample of children attending kindergarten in
1998-99. The primary sampling units (PSUs) were geographic areas
consisting of counties or groups of counties. The second stage units were
schools within sampled PSUs. The third and final stage units were
students within schools. (NCES 2004:4-1)
In all, 100 PSUs were selected for the ECLS-K. The 24 PSUs with the
largest measures of size were designated as certainty selections or self-representing (SR) and were set aside. Once the SR PSUs were removed,
the remaining PSUs were partitioned into 38 strata of roughly equal
measure of size. The frame of non-SR PSUs was first sorted into eight
superstrata by MSA/nonMSA status and by Census region. Within the
four MSA superstrata, the variables used for further stratification were
race-ethnicity (high concentration of API, Black, or Hispanic), size class
(MOS >= 13,000 and MOS < 13,000) and 1988 per capita income. Within
the four non-MSA superstrata, the stratification variables were race,
ethnicity and per capita income. (NCES 2004:4-2)
Once the sampled students were identified, school personnel provided contact
information so that parents could be contacted for consent and to be interviewed
themselves. In addition, each sampled student was linked to their primary teacher. Each
case includes information on the student and parent, teacher and class, and school
obtained via parent interviews, self-administered teacher and school administrator
questionnaires, and direct observation student assessments. This information is separated
into three distinct data files: child file, teacher file, and school file (NCES 2004).
In following the methodology of Downey and Pribesh (2004), this study will
focus on only the Fall 1998-99 data. Further, the focus will be restricted to “the 2,707
black and 10,282 white students who were matched with either a black teacher or a white
teacher in the fall of 1998” (Downey and Pribesh 2004:270). This sample of 12,989
includes 1,024 black teachers and 11,965 white teachers.
Dependent Measures
The dependent variables used by Downey and Pribesh (2004) were selected in
order to measure teachers’ subjective assessments of students’ classroom behaviors. The
two ECLS-K variables used are scales constructed by the NCES. The actual questions
used to measure these scales are not available for public use due to copyright issues
(NCES 2004).
The Externalizing Problem Behaviors scale asked teachers to rate how often the
student argued, fought, got angry, acted impulsively, and disturbed ongoing activities.
The Approaches to Learning scale asked teachers to rate the student’s attentiveness, task
persistence, eagerness to learn, learning independence, flexibility, and organization.
Responses ranged from 1-4 with 1 = Never, 2 = Sometimes, 3 = Often, and 4 = Very
Often. There was a fifth response option of N/O = No opportunity to observe this
behavior, which was coded as -9. Non-response was coded as a missing value with no
numerical identifier. Since the Externalizing Problem Behavior Scale is measuring the
frequency of negative behaviors, a higher score is actually representative of a poorer
evaluation. Conversely, the Approaches to Learning Scale is measuring the frequency of
positive behaviors, so a higher score is representative of a more favorable evaluation
(Downey and Pribesh 2004; NCES 2004).
Independent Measures
The independent variables measure the student’s and teacher’s race. As indicated
above, Downey and Pribesh (2004) only used black and white students matched with black
and white teachers in their analysis. Therefore, the ECLS-K variable for student race
must be modified so that only black and white students are included in the analysis. The
original variable is coded as follows: 1 = white, 2 = black, 3 = Hispanic (race specified),
4 = Hispanic (race not specified), 5 = Asian, 6 = Native Hawaiian/Pacific Islander, 7 =
American Indian/Alaskan Native, 8 = More than one race (not Hispanic), -1 = Not
Applicable, and -9 = Not Ascertained (NCES 2004). This is a composite variable created
by the NCES and its components are not available for public screening. The ECLS-K
variable for student race is recoded so that 1 = white, 2 = black, and all other racial
categories set to 0. Additionally, a filter will be applied so that only cases with values of
1 or 2 are included in the analysis (Downey and Pribesh 2004).
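The original recoding and filtering were done in SPSS, but the same two-step logic can be sketched in Python with pandas. The column name `race` below is a hypothetical stand-in for the ECLS-K composite student race variable, and the data are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the ECLS-K composite student race variable,
# coded 1-8 for racial categories and -9 for "Not Ascertained".
df = pd.DataFrame({"race": [1, 2, 3, 5, -9, 1, 2, 8]})

# Recode: keep 1 = white and 2 = black; set every other category to 0.
df["student_race"] = df["race"].where(df["race"].isin([1, 2]), 0)

# Filter: only cases coded 1 or 2 enter the analysis.
analytic = df[df["student_race"].isin([1, 2])]
```

The filter step mirrors SPSS's "Select Cases" function, leaving only black and white students in the working sample.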
The same sort of technique is applied to the ECLS-K variables for teacher race.
However, the procedure is a little different as teacher race is measured using five separate
variables (Native American/Pacific Islander, Asian, black, Hawaiian, and white). The
teacher is coded as either 1 = Yes, 2 = No, or -9 = Not Ascertained for each of the
racial/ethnic categories, with non-responses coded simply as system missing with no
numerical value (NCES 2004). Only the variables asking if the teacher is black or white
are utilized in Downey and Pribesh’s (2004) analysis. Further, only the 1 = Yes values
are included in the analysis. Thus, dummy variables are created which set the 1 = Yes
values to 1 and all other values to 0 for both the black and white teacher race variables.
By doing this, only teachers who responded as being white or black are left with a
numerical value. Finally, a single variable measuring teacher race was created by
combining the black and white teacher race dummy variables. This new teacher race
variable is coded as 1 = white and 2 = black with all other values set to 0 (Downey and
Pribesh 2004).
Several other independent variables were created by Downey and Pribesh (2004)
to represent the (mis-)matching between student and teacher race. They created binary
variables out of the teacher and student race variables to represent the four possible
student-teacher race combinations (black student/black teacher, white student/white
teacher, black student/white teacher, and white student/black teacher). The variables
were produced by creating a dummy variable for each of the student and teacher racial
categories separately with 1 = the category to be measured (e.g., for the white student
dummy variable, 1 = white and all others = 0). The four student-teacher race variables are
then computed from the product of the appropriate student and teacher dummy race
variables. Downey and Pribesh (2004) included these variables in their regression, with
the white student/white teacher variable as the omitted referent category in Model 4.
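The product construction can be made concrete with a short sketch. The variable names below are illustrative, not the ECLS-K field names, and the four rows are toy cases covering each student-teacher combination.

```python
import pandas as pd

# Illustrative 0/1 race dummies for four hypothetical student-teacher pairs.
df = pd.DataFrame({
    "black_student": [1, 0, 1, 0],
    "white_student": [0, 1, 0, 1],
    "black_teacher": [1, 1, 0, 0],
    "white_teacher": [0, 0, 1, 1],
})

# Each student-teacher race combination is simply the product of the
# corresponding student and teacher dummies.
df["black_student_black_teacher"] = df["black_student"] * df["black_teacher"]
df["white_student_white_teacher"] = df["white_student"] * df["white_teacher"]
df["black_student_white_teacher"] = df["black_student"] * df["white_teacher"]
df["white_student_black_teacher"] = df["white_student"] * df["black_teacher"]
```

Because the dummies are mutually exclusive, each case scores 1 on exactly one of the four combination variables.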
Control Variables
Several other independent variables measuring important demographic and
background information of the student, teacher, and school are included in Downey and
Pribesh’s (2004) analysis as control variables. These variables are:
Gender, which is a composite variable used to measure the student’s gender based
on the parent interview. The ECLS-K gender variable is coded as 1 = Male, 2 = Female,
-9 = Not Ascertained, and non-response = no value. This variable is recoded to create a
female dummy variable where 1 = Female and 0 = Male.
Socioeconomic status, a categorical variable which measures the student’s
family’s socioeconomic status. This variable is coded as 1 = (bottom) 1st Quintile, 2 =
2nd Quintile, 3 = 3rd Quintile, 4 = 4th Quintile, 5 = (top) 5th Quintile, and non-response =
no value. This variable is also a composite derived from a logarithm of several other
variables measuring the parent’s income, education, and occupational prestige (NCES
2004).
Student age, which is measured using a ratio variable reporting student age in
months. This variable is also a composite which was calculated “by determining the
number of days between the child assessment date and the child’s date of birth. The value
was then divided by 30 to calculate the age in months” (NCES 2004:7-7). The actual
values for each student are not available for public use.
Types of parents in household, a nominal variable which measures the types of
parent(s) which lived in the student’s household at the time of the survey. This variable
is coded as 1 = Biological mother and biological father, 2 = Biological mother and other
father (step-, adoptive, foster), 3 = Biological father and other mother (step-, adoptive,
foster), 4 = Biological mother only, 5 = Biological father only, 6 = Two adoptive parents,
7 = Single adoptive parent or adoptive parent and stepparent, 8 = Related guardian(s), 9 =
Unrelated guardian(s), and non-response = no numerical value. A dummy variable is
created for the category with both biological parents with 1 = Biological mother and
biological father and 0 = All other responses.
First-time kindergartner status, a variable which measures whether or not the
student is a first-time kindergartner and is coded as 1 = Yes, 2 = No, -8 = Don’t Know, -9 = Not Ascertained, and non-response = no value. A dummy variable is created for
first-time kindergarteners where 1 = Yes and 0 = No.
Teacher’s educational attainment, a variable which measures the teacher’s highest
degree achieved. This variable is coded as 1 = High School/Associate’s
Degree/Bachelor’s Degree, 2 = At least one year beyond Bachelor’s, 3 = Master’s
Degree, 4 = Education Specialist/Professional Diploma, 5 = Doctorate, -9 = Not
Ascertained, and Non-response = no value.
Teacher age, which is a ratio variable measuring the teacher’s age in years. This
variable was recoded by NCES from the teacher’s response indicating their birth year in
order to protect their privacy and the actual response values are not available for public
use.
Public school status, which is a categorical variable measuring whether the child’s
school is public or private with 1 = Public, 2 = Private, -9 = Not Ascertained, and non-response = no value. Although Downey and Pribesh report in their Table 2 that the
variable is measuring whether the school is identified as public with 1 = No and 0 = Yes,
a closer examination reveals that the original variable is actually asking if the school is
public with 1 = Yes and 2 = No (2004:272). This variable is recoded to create a dummy
variable for public school where 0 = No and 1 = Yes, thus a higher value indicates that
the school is public.
Percentage of students eligible for free lunch, which is a continuous ratio variable
that measures the percentage of students in the school who were eligible for free lunch
at the time of the survey. This is a composite variable created by the NCES from the
school administrators’ responses regarding the number of students enrolled in their
school and number of students who were eligible for free lunch.
Percentage of black students, which is a categorical variable measured at the
school level to determine the percentage of students who are identified as black in that
school at the time of the survey. Downey and Pribesh reported in their Table 2
(2004:272) that they were measuring the “percentage of minority students.” However in
their Table 5, which reported the results of their regression, it states that the variable
measured the “percentage of black students” (2004:276). Upon further examination, it
was determined that Downey and Pribesh were in fact looking at the percentage of black
students. The ECLS-K variable used is an ordinal variable which is coded as 1 = 0, 2 =
More than 0 and less than 5, 3 = 5 to less than 10, 4 = 10 to less than 25, 5 = 25 or more, -7 = Refused, -8 = Don’t Know, -9 = Not Ascertained, Non-response = No Value. There
is also a discrepancy in how this variable is coded as the ECLS-K codebook (NCES
2004) indicates the response ranges noted above, whereas Downey and Pribesh report in
their Table 2 (2004:272) that the range is from 0 (none) to 5 (25% or more). It appears
that the range noted in the ECLS-K codebook is correct. This variable is recoded to set
all missing values (-7, -8, and -9) to system missing, thus leaving only the response codes
1-5 for use in the analysis.
Descriptive statistics for all variables included in Downey and Pribesh’s
(2004:272) analysis can be found in Table 1.
Evaluation of Missingness
As discussed earlier, there are many different types of incomplete data and
various reasons behind why the data are missing which are important to consider prior to
performing any statistical data analysis. Because the ECLS-K public use data files have
been modified, it is difficult to accurately evaluate the amount, types, patterns, and
mechanisms of missing data in the ECLS-K data set (NCES 2001). These modifications
include the inclusion of substitute schools to counteract poor school response rates and
Table 1: Descriptive Statistics from Downey and Pribesh's Study

Name of Variable                        Mean     Standard Deviation
Classroom Behavior
  Externalizing Problem Behaviors       1.65     .65
  Approaches to Learning                2.98     .68
Student-Teacher Race
  Black Student                         .21      .41
  Black Teacher                         .08      .27
  Black Student x Black Teacher         .06      .24
Student Characteristics
  Female Student                        .49      .50
  Socioeconomic Status                  3.26     1.36
  Student’s Age (in months)             34.21    6.72
  Mother/Father Household               .66      .47
  First-time Kindergartner              .95      .21
Teacher Characteristics
  Teacher’s Highest Degree              2.09     .91
  Teacher’s Age                         41.72    9.91
School Characteristics
  Public School                         .76      .43
  Percentage Eligible for Free Lunch    26.76    26.25
  Percentage of Minority Students       1.39     2.92
Source: Downey and Pribesh 2004:272
composite variables and other mechanisms to protect the privacy of students and parents.
Further, there are five different codes used for missing values, only one of which is
readily recognized as missing by software such as SPSS and AMOS and several of which
were not used in the construction of composite variables. Consequently, there is some
disparity regarding the rates of missingness published by the NCES and others who
utilized the ECLS-K data set depending on whether the restricted or public use data files
were used and which missing data codes were counted as missing.
Based on data obtained using the ECLS-K base year public use data files and
SPSS, there is a marked disparity between the percentage of cases counted as missing and
those coded as missing under the various missing values codes. For example, for the
variable measuring teachers’ educational level, 15.4% of cases are coded as system
missing and thus counted as missing by SPSS. However, another 5.3% of cases are
coded as “not ascertained” and are given a numerical value of -9 (NCES 2001). Thus,
one could either report a missingness rate of 15.4% or 20.7% for this variable and still
technically be accurate. Due to these coding schemes as well as the recoding of many
variables by NCES prior to the release of the ECLS-K public use data files, it is difficult
to determine the rates and types of missingness present in the ECLS-K base year data.
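The two candidate missingness rates for a variable like teacher education follow directly from this accounting. A sketch with made-up data (the real file is far larger) shows how SPSS's default count and the code-inclusive count diverge:

```python
import numpy as np
import pandas as pd

# Made-up data: 20 cases, 2 system missing (NaN) and 1 coded -9
# ("Not Ascertained"), mimicking the ECLS-K coding scheme.
educ = pd.Series([3, 2, np.nan, 1, -9, 4, 2, 3, np.nan, 1,
                  2, 3, 4, 5, 1, 2, 3, 2, 1, 4])

n = len(educ)
system_missing = educ.isna().sum() / n     # the only kind SPSS counts by default
not_ascertained = (educ == -9).sum() / n   # -9 carries a numeric value, so it is skipped
total_missing = system_missing + not_ascertained
```

With real ECLS-K proportions, this is exactly the 15.4% versus 20.7% gap described above: both figures are "accurate," depending on which codes are treated as missing.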
According to Downey and Pribesh, “missing cases were modest (i.e. less than 5
percent) for most variables. Percentage eligible for free lunches in ECLS is an exception
with one third missing values in kindergarten” (2004:275). According to the NCES
(2001:4-9), only 26.9% of public schools and 19.3% of all schools had missing values for
percentage eligible for free lunch. While these numbers remain high, they are not the one
third reported by Downey and Pribesh. However, based on the ECLS-K public use data
files and SPSS’ descriptive statistics function, 41.27% of schools have missing data on
this variable. This difference may be explained by the fact that the NCES reported
statistics on the original school sample of 1,277 schools, whereas it is assumed that
Downey and Pribesh reported statistics based on the 866 schools used in the public use
data files and may have counted only certain missing values, while the 41.27% is the sum
of all types of missing data for that variable (NCES 2001). Still, while Downey and
Pribesh reported modest rates of missing data, there are several variables used in this
analysis with relatively high rates of missingness as can be seen in Table 2.
Incomplete data can introduce many problems in statistical analysis including
nonresponse bias. Consequently, the National Center for Education Statistics has set a
standard for its surveys where any survey with an overall response rate of less than 70
percent is subject to a nonresponse bias analysis (Bose 2001). The purpose of the
nonresponse bias analysis is to identify any potential sources of bias and address them, if
possible. Nevertheless, unless the true population values are known, an accurate
Table 2: Percentage of Missing Data from ECLS-K Public Use Data Files (obtained via SPSS)

Name of Variable                      System Missing   “Not Ascertained”   “Don’t Know”   Total
Classroom Behavior
  Externalizing Problem Behaviors     9.4              1.60                               11%
  Approaches to Learning              9.4              .37                                9.77%
Student-Teacher Race
  Student Race                        0                .33                                0.33%
  Black Teacher                       9.5              4.4                                13.9%
  White Teacher                       9.5              4.4                                13.9%
Student Characteristics
  Gender                              0                .06                                0.06%
  Socioeconomic Status                5.3                                                 5.3%
  Age                                 6.9              3.45                               10.35%
  Types of Parents in Household       14.9                                                14.9%
  First-time Kindergartner            14.9             .04                 .11            15.05%
Teacher Characteristics
  Teacher’s Highest Degree            15.4             5.3                                20.7%
  Teacher’s Age                       15.5             3.81                               19.31%
School Characteristics
  Public School                       14.6             .20                                14.8%
  Percentage Eligible for Free Lunch  14.6             26.67                              41.27%
  Percentage of Black Students        14.6             4.1                                18.7%
evaluation of bias is not possible. Further, even in cases where the real population values
are known, there is no way to tell if bias is due to nonresponse or some other factor such
as sampling bias. In order to fully understand the NCES’ studies on nonresponse, one
must first understand that the NCES uses two separate components to discuss
nonresponse bias. First, the completion rate refers to the percentage of participating units
and is calculated independently for the various components of a survey. Second is the
response rate, which refers to the overall percentage of participation in the study as
determined by computing the product of the completion rate at the various stages (Bose
2001; West, Denton, and Reaney 2001).
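This product rule can be sketched directly, using the base-year completion rates discussed in the next paragraph (the rates themselves are the published figures; the arithmetic is the point):

```python
# Overall response rate = product of the stage completion rates.
# ECLS-K base year figures: roughly 74% of sampled schools participated
# (944 of 1,277), students completed at 92%, parents at 89%.
school_completion = 0.74
student_completion = 0.92
parent_completion = 0.89

student_response = school_completion * student_completion   # about 68%
parent_response = school_completion * parent_completion     # about 66%
```

Multiplying the stages is what pushes both overall response rates below the NCES's 70 percent threshold even though each individual stage exceeds it.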
The ECLS-K survey is multifaceted and included data collected from various
sources, including the school, the student, and the parents. In the base year (1998-1999),
students had a completion rate of 92% and parents had a completion rate of 89%.
Unfortunately, only 944 of the 1,277 (74%) schools sampled participated in the first year
of the study and only 69.4% of schools participated in the fall wave. Therefore, when the
student response rate was computed, as a product of the student and school completion
rates, it came to only 68.1% and the parent response rate equaled only 65.9%. Thus, in
following with NCES standards, a nonresponse bias analysis was conducted on the
ECLS-K base year data (Bose 2001; West, Denton, and Reaney 2001). Five separate
tests for potential nonresponse bias were conducted, and the findings indicated that there
was no evidence of bias due to school nonresponse. Still, the NCES has admitted that
recruiting schools to willingly participate in the first wave of data collection in the ECLS-K was harder than for any of the subsequent waves and that this nonresponse and lack of
cooperation may have resulted in biased estimates not identified in the analysis (Bose
2001). In fact, where school response rates were below 65%, substitute schools were
recruited with at least 74 substitute schools participating in the fall data collection (NCES
2001). Additionally, the NCES had to implement a “special refusal conversion effort” to
recruit parents (NCES 2001: 5-14) and began offering monetary incentives to teachers.
Unfortunately, there is no way for this current study to perform its own nonresponse bias
analysis of the ECLS-K Base Year data set as the actual data in its original form with all
values intact are not available for public use due to privacy issues.
Analytical Plan
As indicated above, this study will examine three separate statistical analyses in
order to compare three different methods for handling missing data, with the aim of
evaluating the results of Downey and Pribesh’s 2004 study of the student-teacher racial
dyad. Each method will regress the two measures of teachers’ evaluations of students’
classroom behaviors on the students’ and teachers’ race and the interaction between
students’ and teachers’ race using the same four models which Downey and Pribesh
utilized. The missing data methods to be used are: listwise deletion (LD), as it is the
most commonly used and generally applicable traditional method; direct maximum
likelihood (DML), which is a relatively easy to implement but more sophisticated
method; and multiple imputation (MI), the method which Downey and Pribesh (2004)
chose. For each of these analyses, the exact same variables will be utilized and all
measures will be taken to ensure these analyses remain comparable. The major
difference will be the methodology used in dealing with the incomplete cases in the
ECLS-K 1998-99 data set. In order to employ the missing data treatments, the variables
may be further recoded to appropriately handle the incomplete, omitted, and other missing
values, following each method’s standard procedures. Each of these methods
has been discussed thoroughly in the review of prior literature.
Listwise deletion
The first of these analyses will be an ordinary least squares regression analysis
using the method of LD to deal with missing data (Allison 1999; Allison 2002; Byrne
2001b; Enders 2006; Rubin 1976). This analysis will be performed using the standard
SPSS statistical software package to edit the ECLS-K Base Year Public Use data set and
run a multiple regression analysis with missing values excluded using the listwise
deletion method. Before running the regression, each variable was examined and recoded
as necessary to specify all missing data values as “system missing”. Only those values
which represented meaningful categories were included in the analysis.
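SPSS performs the deletion automatically once all codes are declared missing, but the two steps can be sketched in Python with pandas. The data and column names below are toy illustrations, not ECLS-K fields.

```python
import numpy as np
import pandas as pd

# Toy data with ECLS-K-style numeric missing codes (-9 = Not Ascertained).
df = pd.DataFrame({
    "behavior":      [1.5, 2.0, -9.0, 1.0, 3.5],
    "black_student": [1,   0,    1,   -9,  0],
    "female":        [1,   0,    1,    0, -9],
})

# Step 1: declare every missing-value code as system missing (NaN).
df = df.replace([-1, -7, -8, -9], np.nan)

# Step 2: listwise deletion keeps only cases complete on every variable.
complete = df.dropna()
```

Note that a case is dropped if it is missing on any analysis variable, which is why listwise deletion can shrink the sample sharply when missingness is spread across many variables.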
On page 270, Downey and Pribesh (2004) include a table showing the distribution
by race of black and white students matched with black or white teachers. After applying
a filter which restricted the analysis to only black and white kindergarten students with
black or white teachers in the Fall, the results obtained by performing this cross-tabulation in SPSS were identical to those of Downey and Pribesh. These data can be
found in Table 3. Aside from using this table as an initial point of comparison between
the two methods, this table also provides some useful substantive information. As
Downey and Pribesh point out, almost 70% of the black students are matched with a
teacher of a different race as opposed to less than 2% of the white students. This puts the
disproportionate number of white teachers into perspective for the reader.
Table 3: Race of Student by Race of Teacher from Downey and Pribesh’s Study

                        Teacher’s Race
Student’s Race      White       Black       Total
White               10,090      192         10,282
Black               1,875       832         2,707
Total               11,965      1,024       12,989
Source: Downey and Pribesh 2004:270
Table 4 reports the descriptive statistics (mean and standard deviation) of the
variables used in the multiple regression using LD after all recoding had been completed.
When compared to Table 1, which reports the mean and standard deviation values
according to Downey and Pribesh’s study, one can see that there is a slight difference in
most of the values. For the most part, the mean and standard deviation values vary by
less than +/- 0.10. Still, only three variables have matching mean and standard deviation
values (female, mother/father household, and first-time kindergartner).
Surprisingly, the mean value attained via SPSS for student age is 68.52 which is
over double that reported by Downey and Pribesh (34.21). Upon closer examination, it
appears that the value reported by Downey and Pribesh may be a typographical error.
Since the variable measures student age in months, the mean age according to Downey
and Pribesh would be 2.85 years, which is several years younger than the customary
enrollment age for kindergarten. The mean age as determined via SPSS using the ECLS-
K variable for student age would be 5.71 years, which appears to be a more reasonable
value for the mean age of kindergartners.
Aside from the discrepancies in the student age variable, the only other difference
which appears significant is that for the variable measuring the percentage of students in
the school eligible for free lunch. The mean value from the SPSS analysis is 1.42 greater
than that reported by Downey and Pribesh, with an increase in the standard deviation of
0.84. According to Downey and Pribesh (2004), the rate of missingness for this variable
is the highest of all variables used with one third missing. This may be one explanation
for the difference in mean values, as missing values were excluded through listwise
deletion in SPSS, which results in a smaller sample size and a possible increase in mean
values.
Table 4: Descriptive Statistics for the Variables Used in the Listwise Deletion Analysis

Name of Variable                        Mean     Standard Deviation
Classroom Behavior
  Externalizing Problem Behavior        1.63     .64
  Approaches to Learning                2.97     .68
Student-Teacher Race
  Black Student                         .19      .39
  Black Teacher                         .06      .24
  Black Student x Black Teacher         .04      .21
Student Characteristics
  Female Student                        .49      .50
  Socioeconomic Status                  3.15     1.40
  Student’s Age (in months)             68.52    4.34
  Mother/Father Household               .66      .47
  First-time Kindergartner              .95      .21
Teacher Characteristics
  Highest Degree                        2.12     .90
  Teacher’s Age                         41.80    10.06
School Characteristics
  Public School                         .77      .42
  Percentage Elig. for Free Lunch       28.18    27.09
  Percentage of Black Students          2.86     1.35
Direct maximum likelihood
The second analysis will be performed using the method of DML estimation to
handle the missing data issues (Allison 2002; Arbuckle 2007; Byrne 2001b; Enders
2006). This analysis will use the Amos Graphics software package with the full
information maximum likelihood function enabled to account for missing data by
estimating means and intercepts (Arbuckle 2007). The Amos program will utilize
structural equation modeling to estimate the regression equation from the available data
and the results from this analysis can be interpreted as if the analysis had been performed
using a complete data set. Downey and Pribesh utilized adjusted standard errors; robust
standard errors are available through Amos’ bootstrapping procedure, but that
procedure requires complete data and therefore cannot be used in this study. Additionally,
whereas Downey and Pribesh did not publish any measures of fit for their models, Amos
calculates numerous model fit measures; the current study will use squared multiple
correlation (R²) values, as this was also the model fit measure calculated using the LD
method.
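Amos’ implementation is proprietary, but the casewise likelihood that full information maximum likelihood maximizes can be sketched under the multivariate normality assumption: each case contributes the density of only its observed variables, using the matching sub-vector of the mean and sub-matrix of the covariance. The function below is a simplified illustration of that likelihood; the optimization over the parameters, which Amos performs, is omitted.

```python
import numpy as np

def fiml_loglik(data, mu, sigma):
    """Casewise (full information) log-likelihood under multivariate
    normality; `data` is an (n, k) array with NaN marking missing values."""
    total = 0.0
    for row in data:
        obs = ~np.isnan(row)           # which variables this case observed
        if not obs.any():
            continue                   # a fully missing case contributes nothing
        diff = row[obs] - mu[obs]
        sub = sigma[np.ix_(obs, obs)]  # covariance of the observed subset
        k = obs.sum()
        total += -0.5 * (k * np.log(2 * np.pi)
                         + np.log(np.linalg.det(sub))
                         + diff @ np.linalg.solve(sub, diff))
    return total
```

In practice the mean vector and covariance matrix are chosen to maximize this quantity, and the regression estimates are read off the fitted parameters, which is why no cases need to be deleted or imputed.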
As indicated in the literature review, Amos allows the user to perform analyses
using path diagrams and has a non-traditional user interface which may be difficult for
social scientists to employ without specific training. Because of this and other program
nuances, the literature suggests that all data editing and recoding should be completed
prior to working with the dataset in Amos. It is recommended that SPSS be used to
perform this recoding as Amos is able to use SPSS data files and recognizes the periods
(.) in SPSS data sets as missing values (Arbuckle 2007; Zeiser 2008). Thus, prior to
performing the DML analysis in Amos, all variables to be used were examined and
recoded as necessary to specify missing data values and create dummy variables using
SPSS.
Multiple Imputation
The third analysis has already been performed by Downey and Pribesh (2004),
using MI to treat the missing data and create a complete data set (Allison 2002; Byrne
2001b; Enders 2006; Schafer 1999; Schafer and Olson 1998). Downey and Pribesh
(2004) used the SAS program to create five imputed data sets using all the variables
included in their analysis and to perform the subsequent multiple regression analysis.
Additionally, Downey and Pribesh utilized adjusted standard errors. The results reported
for the MI method will be taken directly from Downey and Pribesh’s published results
(2004).
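Although the MI results here are taken from the published study, the pooling step MI requires is worth making concrete. Under Rubin’s rules, each coefficient is averaged across the m imputed data sets, and its variance combines the average squared standard error with the between-imputation variance. A minimal sketch, with hypothetical numbers:

```python
import numpy as np

def pool(coefs, variances):
    """Combine one coefficient's m imputed-data estimates by Rubin's rules.
    `coefs` are the m point estimates; `variances` their squared SEs."""
    coefs = np.asarray(coefs, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(coefs)
    q_bar = coefs.mean()             # pooled point estimate
    w = variances.mean()             # within-imputation variance
    b = coefs.var(ddof=1)            # between-imputation variance
    t = w + (1 + 1 / m) * b          # total variance
    return q_bar, np.sqrt(t)         # pooled estimate and standard error

# Five imputations of a single (hypothetical) regression coefficient:
estimate, se = pool([0.20, 0.22, 0.21, 0.19, 0.23], [0.01] * 5)
```

The between-imputation term is what inflates MI standard errors relative to single-data-set methods, a point that matters for the significance patterns discussed in the findings.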
The results obtained from these three analyses will be compared to one another.
The goal of this comparison will be to evaluate the substantive results published by
Downey and Pribesh (2004). This will be achieved by evaluating and comparing the
statistical output and application of the various methods for handling missing data. This
methodological comparison will test this study’s claim that DML and MI will produce
equivalent results and that DML may be generally recommended over MI as it is easier to
implement.
Chapter 4
FINDINGS
This study tested the hypothesis that DML and MI will produce equivalent results
and in application arrive at the same substantive conclusions; and thus DML should be
used whenever appropriate, as it is generally easier to implement than MI. Some
discrepancies were observed in this methodological comparison as MI resulted in fewer
statistically significant independent variables and reduced significance levels when
compared to DML and LD. However, in general, the hypothesis can be supported, as MI
and DML did produce equivalent results and arrived at the same substantive conclusions
in the current methodological comparison.
In testing this study’s hypothesis that DML and MI will produce equivalent
results and thus will arrive at the same substantive conclusions in application, the results
of an identical statistical analysis performed using both MI and DML will be evaluated
and discussed. Additionally, the analysis was performed using the traditional LD method
and these results will also be discussed. This discussion will focus on the MI and DML
methods as these methods are generally supported on a theoretical basis in the literature.
The basis for this methodological comparison is a study of the effect of race on teachers’
evaluations of students’ classroom behaviors performed by Downey and Pribesh (2004).
Downey and Pribesh studied the effects of two dependent variables in four separate
statistical models, with their substantive conclusion based primarily on the results of the
fourth model. Therefore, while all models have been evaluated, the focus of this
discussion will be on the fourth model. As will be discussed below, Tables 5, 6, 7, and 8
present the quantitative results of all three regression analyses.
When comparing the results from all three methods, there is a distinctive pattern
which emerges. While not as prominent in Models 1 and 2 as in Models 3 and 4, it is
evident from all models that MI resulted in fewer statistically significant independent
variables and reduced significance levels when contrasted with DML and LD. This
means the standard errors for the DML and LD analyses are smaller. In fact, DML
generally resulted in a greater number of statistically significant independent
variables than even LD, which was an unexpected outcome. This overarching trend can
be most clearly recognized when looking at the teacher and school characteristic
variables in Models 3 and 4 for both dependent variables. There one can see that DML
produced increased statistical significance for a number of independent variables when
compared with MI and LD, although LD also resulted in inflated numbers of statistically
significant variables when evaluated against MI. Although this trend is prevalent in the
teacher and school characteristic variables, there is only one variable which was found to
be statistically significant by the MI method and not DML (with reduced significance
level using LD): students’ socioeconomic status on the problem behaviors dependent
variable. These trends and other methodological differences will be discussed below.
Table 5: Unstandardized Regression Coefficients for Dependent Variable Externalizing
Problem Behaviors (Model 1 and Model 2 only).

                                      Model 1                       Model 2
Variable                       MI       DML      LD          MI       DML      LD
Student-Teacher Race
  Black Student              .222***  .225***  .237***     .237***  .241***  .253***
                             (.017)   (.015)   (.016)      (.018)   (.016)   (.016)
  Black Teacher             -.116**  -.132*** -.124***      .003    -.001     .013
                             (.029)   (.021)   (.024)      (.054)   (.046)   (.047)
  Black Student x
  Black Teacher                                            -.162**  -.177**  -.184**
                                                           (.061)   (.055)   (.055)
Constant                     1.63     1.60     1.61        1.62     1.60     1.61
R Square                     a        .017     .018        a        .017     .019
*p<.05, **p<.01, ***p<.001 (two-tailed tests).
a Downey and Pribesh (2004) did not report an R Square statistic.
Table 6: Unstandardized Regression Coefficients for Dependent Variable Approaches
to Learning (Model 1 and Model 2 only).

                                      Model 1                       Model 2
Variable                       MI       DML      LD          MI       DML      LD
Student-Teacher Race
  Black Student             -.240*** -.266*** -.270***     -.243*** -.274*** -.279***
                             (.017)   (.016)   (.016)      (.019)   (.016)   (.017)
  Black Teacher              .040     .086***  .056*        .008     .013    -.020
                             (.034)   (.022)   (.025)      (.058)   (.049)   (.049)
  Black Student x
  Black Teacher                                             .042     .098     .100
                                                           (.065)   (.057)   (.057)
Constant                     3.03     3.01     3.01        3.03     3.01     3.01
R Square                     a        .023     .022        a        .024     .022
*p<.05, **p<.01, ***p<.001 (two-tailed tests).
a Downey and Pribesh (2004) did not report an R Square statistic.
Table 7: Unstandardized Regression Coefficients for Dependent Variable Externalizing
Problem Behaviors (Model 3 and Model 4 only).

                                      Model 3                       Model 4
Variable                       MI       DML      LD          MI       DML      LD
Student-Teacher Race
  Black Student              .151***  .142***  .176***
                             (.020)   (.019)   (.027)
  Black Teacher             -.001     .029    -.020
                             (.053)   (.045)   (.069)
  Black Student x
  Black Teacher             -.208*** -.213*** -.182*
                             (.059)   (.053)   (.082)
  White Student-
  White Teacher                                             ----     ----     ----
  Black Student-
  Black Teacher                                            -.063    -.033    -.026
                                                           (.034)   (.028)   (.045)
  Black Student-
  White Teacher                                             .150***  .150***  .176***
                                                           (.020)   (.020)   (.027)
  White Student-
  Black Teacher                                             .006     .038    -.020
                                                           (.051)   (.046)   (.069)
Students' Characteristics
  Female Student            -.261*** -.264*** -.271***     -.263*** -.264*** -.271***
                             (.011)   (.009)   (.015)      (.011)   (.009)   (.015)
  Students' SES             -.018*** -.005    -.017**      -.017*** -.005    -.017**
                             (.005)   (.004)   (.006)      (.005)   (.004)   (.006)
  Students' Age             -.006*** -.005*** -.007***     -.006*** -.005*** -.007***
                             (.001)   (.001)   (.002)      (.001)   (.001)   (.002)
  Types of Parents in
  Household                 -.189*** -.205*** -.189***     -.192*** -.205*** -.189***
                             (.014)   (.011)   (.017)      (.014)   (.011)   (.017)
  First-time Kindergartner  -.147*** -.127*** -.132***     -.146*** -.128*** -.132***
                             (.028)   (.023)   (.037)      (.029)   (.023)   (.037)
Teachers' Characteristics
  Highest Degree            -.020*   -.016**  -.035***     -.020*   -.017**  -.035***
                             (.009)   (.006)   (.009)      (.009)   (.006)   (.009)
  Teachers' Age             -.001    -.002*** -.002*       -.002*   -.002*** -.002*
                             (.000)   (.000)   (.001)      (.000)   (.000)   (.001)
School Characteristics
  Public School             -.043*   -.059*** -.055**      -.044*   -.062*** -.055**
                             (.021)   (.014)   (.021)      (.022)   (.014)   (.021)
  Percentage Eligible For
  Free Lunch                 .001**   .001***  .001**       .001*    .001***  .001**
                             (.000)   (.000)   (.000)      (.000)   (.000)   (.000)
  Percentage of Black
  Students                  -.011*   -.019*** -.017*       -.013*   -.019*** -.017*
                             (.007)   (.005)   (.007)      (.007)   (.005)   (.007)
Constant                     2.61     2.53     2.77        2.64     2.55     2.77
R Square                     a        .086     .095        a        .085     .095
*p<.05, **p<.01, ***p<.001 (two-tailed tests).
a Downey and Pribesh (2004) did not report an R Square statistic.
Table 8: Unstandardized Regression Coefficients for Dependent Variable Approaches
to Learning (Model 3 and Model 4 only).

                                      Model 3                       Model 4
Variable                       MI       DML      LD          MI       DML      LD
Student-Teacher Race
  Black Student             -.127*** -.126*** -.171***
                             (.020)   (.020)   (.028)
  Black Teacher              .024    -.025    -.050
                             (.056)   (.046)   (.070)
  Black Student x
  Black Teacher              .082     .148**   .178*
                             (.062)   (.055)   (.083)
  White Student-
  White Teacher                                             ----     ----     ----
  Black Student-
  Black Teacher                                            -.022    -.006    -.042
                                                           (.040)   (.029)   (.046)
  Black Student-
  White Teacher                                            -.127*** -.130*** -.171***
                                                           (.020)   (.020)   (.028)
  White Student-
  Black Teacher                                             .022    -.028    -.050
                                                           (.056)   (.048)   (.070)
Students' Characteristics
  Female Student             .285***  .281***  .292***      .284***  .281***  .292***
                             (.011)   (.009)   (.015)      (.011)   (.009)   (.015)
  Students' SES              .088***  .074***  .085***      .087***  .075***  .085***
                             (.005)   (.004)   (.007)      (.005)   (.004)   (.007)
  Students' Age              .025***  .024***  .026***      .025***  .024***  .026***
                             (.001)   (.001)   (.002)      (.001)   (.001)   (.002)
  Types of Parents in
  Household                  .147***  .175***  .156***      .146***  .176***  .156***
                             (.013)   (.011)   (.018)      (.013)   (.011)   (.018)
  First-time Kindergartner   .211***  .204***  .238***      .211***  .205***  .238***
                             (.030)   (.024)   (.038)      (.029)   (.024)   (.038)
Teachers' Characteristics
  Highest Degree            -.006     .000     .010        -.006     .000     .010
                             (.011)   (.006)   (.009)      (.003)   (.006)   (.009)
  Teachers' Age             -.002    -.002*** -.003**      -.002    -.002*** -.003**
                             (.001)   (.001)   (.001)      (.001)   (.001)   (.001)
School Characteristics
  Public School              .005     .036**   .034         .005     .039**   .034
                             (.023)   (.014)   (.021)      (.023)   (.014)   (.021)
  Percentage Eligible For
  Free Lunch                 .000    -.001**   .000         .000    -.001**   .000
                             (.000)   (.000)   (.000)      (.000)   (.000)   (.000)
  Percentage of Black
  Students                   .005     .018***  .016*        .007     .018***  .016*
                             (.008)   (.005)   (.007)      (.008)   (.005)   (.007)
Constant                     .56      .68      .54          .55      .66      .54
R Square                     a        .128     .134         a        .127     .134
*p<.05, **p<.01, ***p<.001 (two-tailed tests).
a Downey and Pribesh (2004) did not report an R Square statistic.
Methodological Comparisons
Multiple Imputation: A summary of findings from Downey and Pribesh’s study
For the purposes of this study, the results published by Downey and Pribesh
(2004) will be utilized as an example of a multiple regression analysis using the MI
method for handling incomplete and missing data and are the primary referent against
which the other two methods are being evaluated. The MI columns of Tables 5, 6, 7, and
8 present the results of Downey and Pribesh’s analysis which found that black students
matched with white teachers receive significantly higher ratings of problem behaviors and lower ratings of favorable behaviors when compared to white students matched
with white teachers (2004:276). They did not find any significant differences when
looking at black students matched with black teachers or white students matched with
black teachers.
As noted above, when compared to DML and LD methods, the MI analysis
resulted in decreased levels of statistical significance for a large number of variables. In
Model 1, the black teacher variable had a decreased regression coefficient and decreased
significance level using the MI method as opposed to the DML or LD methods. While
there were no significant differences between MI and DML or LD in Model 2, this trend
becomes more obvious in Model 3 and 4. In Models 3 and 4 with the problem behaviors
dependent variable, the MI significance levels for all teacher and school characteristic
variables are lower than they are using DML and most are lower than with LD. This
pattern continues with the learning approaches dependent variable, with the exception of
the variable for teachers’ education level. However, this trend is not universal; in Model
3 for the dependent variable externalizing problem behaviors, there is one variable which
was found to be statistically significant using the MI method, but not with DML:
students’ socioeconomic status. Whereas it was expected that MI would produce less
inflated estimates when compared to LD, it was not expected that the estimates would be
so unlike those obtained via DML.
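The pooling step that distinguishes MI is Rubin's rules: each of the m imputed datasets yields its own estimate and standard error, and the between-imputation variance is added to the within-imputation variance. A minimal sketch (the numbers below are made up for illustration, not taken from the tables) shows why pooled MI standard errors tend to be larger than any single analysis's, which is consistent with the reduced significance levels observed here:

```python
# Sketch of Rubin's rules for pooling MI estimates.  Each imputed dataset i
# yields an estimate q_i with squared standard error u_i; the pooled variance
# combines within- and between-imputation components.
import math

def pool_rubin(estimates, variances):
    """Combine per-imputation estimates and their variances (SE^2)."""
    m = len(estimates)
    q_bar = sum(estimates) / m                              # pooled estimate
    u_bar = sum(variances) / m                              # within variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between variance
    total = u_bar + (1 + 1 / m) * b                         # total variance
    return q_bar, math.sqrt(total)

# Five hypothetical imputations, each with SE = .017:
est, se = pool_rubin([0.22, 0.24, 0.21, 0.23, 0.25], [0.017 ** 2] * 5)
print(f"pooled estimate = {est:.3f}, pooled SE = {se:.4f}")
```

Because the between-imputation term is always added, the pooled SE here exceeds the .017 within-imputation SE, illustrating one mechanism by which MI can report fewer significant coefficients than methods that ignore imputation uncertainty.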
Direct Maximum Likelihood
The results of the regression analysis using the DML method to handle missing
data can be found in the DML columns of Tables 5, 6, 7, and 8. As indicated above, the
findings from the DML analysis do vary somewhat from those of the analysis using MI
and are fairly similar to those obtained via LD. While the regression coefficient and
standard errors values do differ slightly, the major differences between MI and DML are
found in the coefficient significance values. Aside from a few variables, DML resulted in
increased levels of statistical significance when compared to both the MI and LD
methods. Again, this pattern is most clearly evident when looking at the school and
teacher characteristic variables in Model 3 and 4, although it is also present in Model 1.
Looking at Model 1 with Approaches to Learning as the dependent variable, one sees that
the teacher race variable is highly statistically significant in the DML analysis (b =
.086***) whereas it was not significant at all in the MI analysis. The variable was also significant in the LD analysis (b = .056*). For all models, the DML R2 values reveal that these
models have somewhat weak predictive power with the largest R2 value being .128. The
R2 values for Model 1 indicate that only 1.7% of the variance in the Externalizing Problem Behaviors scale and 2.2% of the Approaches to Learning scale is explained by students' and teachers' race alone, values which are also very similar to those obtained via
LD. When the first student-teacher racial matching variable, black student-black teacher,
was added in Model 2, there were actually no discrepancies found between the regression
coefficient significance levels in the MI and DML analyses. This change in variables for
teacher and student race had no effect on the squared multiple correlations values, which
remain at .017 and .024 respectively. The majority of the differences between the DML
and MI methods arise in Models 3 and 4.
In Model 3, the addition of independent variables as controls increases the
discrepancies between the DML and MI methods. Most of these differences are found
between the control variables’ significance values. Many of the teacher and school
characteristics which were not significant in the MI analysis were significant when the analysis was performed using DML; this was also the case in the LD analysis. For the dependent
variable Externalizing Problem Behaviors, there is no discrepancy in the significance
levels and very little discrepancy in the coefficient values for the three variables
measuring student and teacher race. However, there is one variable which was highly
significant in the MI analysis which is no longer statistically significant using DML,
students’ socioeconomic status. Conversely, there are four variables with increased
significance values using DML versus MI on both dependent variables: teachers’
education, public school, percentage eligible for free lunch, and percentage of black
students. Further, with the problem behaviors dependent variable, teachers' age was
found to be not significant in the MI analysis but is highly statistically significant in the
DML analysis, although its effects appear minimal (b = -.002***), as was the case in the
LD analysis. The differences between the MI and DML findings continue when Model 3
for the dependent variable Approaches to Learning is examined. In this analysis, the
student-teacher racial matching variable is found to be a significant indicator with black
students matched with black teachers receiving better evaluations than other student-teacher racial combinations (b = .148**), although this variable was not found to be
significant in the MI analysis.
With the substitution of student-teacher racial matching variables for student and
teacher race variables in Model 4, there is not much change in the incongruity between
methods found in Model 3. For the dependent variable Externalizing Problem Behaviors,
again the student’s socioeconomic status is not significant using DML, whereas it was
with both the LD and MI methods. Also, there are five variables with greater significance levels than those obtained via the MI method: teacher's education, teacher's age, public school, percentage eligible for free lunch, and percentage of black students.
All of the indicators for the Approaches to Learning variable that were significant in
Model 3 remain so in Model 4. Thus, the four variables (teacher’s age, public school,
percentage eligible for free lunch, and percentage of black students) remain statistically
significant in the DML analysis, although they were not found to be significant using the
MI method. In this fully specified model, the squared multiple correlations values are
.085 for Externalizing Problem Behaviors and .127 for Approaches to Learning.
Despite the literature which suggested that the DML and MI results would be
similar, there were actually more similarities between the DML and LD methods and
more inconsistencies between the DML and MI methods. In fact, the DML method
found more variables to be significant and generally increased significance values even
when compared to the LD method. The most glaring example of this is that in Models 3
and 4, both the DML and LD methods found the teacher and school characteristic
variables to have increased statistical significance as indicators for both dependent
variables when compared to MI. It is also important to note that only one variable was
found to be significant by the MI method and not by the DML method, students’
socioeconomic status. While the theoretical literature suggests that DML and MI should
reach equivalent results and that LD will produce dramatically biased and inefficient
results, this comparison unexpectedly revealed fewer similarities between DML and MI than between DML and LD.
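The logic of DML can be illustrated on a toy problem. Under full-information maximum likelihood, each case contributes the likelihood of whatever it has observed, so incomplete cases are not discarded. The sketch below uses synthetic bivariate-normal data (not the thesis analysis): cases missing y still contribute the marginal density of x when estimating the means, variances, and correlation.

```python
# Toy full-information (direct) maximum likelihood sketch with synthetic data.
# Complete cases contribute the bivariate normal density; cases with y missing
# contribute only the marginal density of x -- nothing is deleted.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_normal, norm

rng = np.random.default_rng(1)
n = 800
x = rng.normal(0.0, 1.0, n)
y = 1.5 + 0.5 * x + rng.normal(0.0, 0.6, n)
y[rng.random(n) < 0.4] = np.nan            # y missing for ~40% of cases
both = ~np.isnan(y)

def neg_loglik(theta):
    mu_x, mu_y, lsx, lsy, rho_raw = theta
    sx, sy = np.exp(lsx), np.exp(lsy)      # keep scales positive
    rho = np.tanh(rho_raw)                 # keep correlation in (-1, 1)
    cov = [[sx ** 2, rho * sx * sy], [rho * sx * sy, sy ** 2]]
    ll = multivariate_normal.logpdf(
        np.column_stack([x[both], y[both]]), mean=[mu_x, mu_y], cov=cov).sum()
    # Incomplete cases still inform mu_x and sx via the marginal of x:
    ll += norm.logpdf(x[~both], loc=mu_x, scale=sx).sum()
    return -ll

res = minimize(neg_loglik, x0=[0.0, 0.0, 0.0, 0.0, 0.0], method="BFGS")
mu_x_hat, mu_y_hat = res.x[0], res.x[1]
print(f"FIML estimates: mu_x = {mu_x_hat:.2f}, mu_y = {mu_y_hat:.2f}")
```

This is only a sketch of the estimation principle; the analyses reported in Tables 5 through 8 were run in structural equation modeling software, not with this code.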
Listwise Deletion
The LD columns of Tables 5, 6, 7, and 8 report the results of regressing the two
measures of teachers’ evaluations of students’ classroom behavior on students’ and
teachers’ race using SPSS to perform the regression analysis with missing values
excluded via LD. Following Downey and Pribesh's design, four separate models were
used for each of the two dependent variables. While Downey and Pribesh did not publish
any evaluations of their models’ predictions, the current study tested the LD models’
predictions using squared multiple correlation (R2) values. The R2 values were
statistically significant for all models using LD, and the fully specified models have R2
values of .095 and .134.
Given the differences in the results of the MI and LD methods of handling
incomplete and missing data with only one data set, one can clearly see how method
choice can affect substantive conclusions. It is evident that the LD analysis has resulted
in increased statistical significance for a number of variables as compared to MI. This
may be because far fewer cases are retained under LD than under MI. While Downey and Pribesh (2004) did not publish their sample size
using MI, it is assumed that they utilized the entire sample of black and white students
matched with black or white teachers (n = 12989); whereas the fully specified models
have a significantly reduced sample size using LD (n = 6917 for Externalizing Problem
Behaviors and n = 6984 for Approaches to Learning). Substantive conclusions can vary
considerably based on simply choosing a different method of dealing with missing data
and subsequent analysis. For example, if one were to perform this analysis using MI as
Downey and Pribesh did, one would not consider teacher and school characteristics such
as teacher’s educational level, public versus private school, and percentage of students in
the school eligible for free lunch to be significant whereas someone performing the same
analysis using LD may base their substantive conclusion on the inflated significance of
these variables. While it is difficult to determine which method to choose, a brief review
of the LD analysis will support the literature which advises against the use of LD on a
theoretical basis.
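The sample-size loss under LD compounds across variables: with independent missingness of roughly 5% on each of k variables, only about 0.95^k of the cases survive deletion. A short synthetic sketch (the 5% rate, the value of k, and the column names are assumptions for illustration) shows the attrition mechanism at the sample size reported above:

```python
# Why LD losses compound: independent per-variable missingness multiplies.
# Synthetic illustration with pandas; the missingness rate and column names
# are assumptions, not properties of the ECLS-K data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n, k = 12989, 13                     # n from the text; k chosen for illustration
df = pd.DataFrame(rng.normal(size=(n, k)),
                  columns=[f"v{i}" for i in range(k)])
for c in df.columns:                 # ~5% missing per column, independently
    df.loc[rng.random(n) < 0.05, c] = np.nan

complete = df.dropna()               # listwise deletion
print(f"full n = {n}, complete cases = {len(complete)}")
print(f"expected fraction = {0.95 ** k:.2f}, observed = {len(complete) / n:.2f}")
```

Under these assumed rates roughly half the sample survives, which is in the same range as the complete-case counts (n = 6917 and n = 6984) observed in the LD analyses here.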
As with DML, while the differences between MI and LD are most clearly seen in
Models 3 and 4 they are present in Model 1 as well. In the first model, which looks at the
effects of students’ and teachers’ race, the LD analysis resulted in the increased statistical
significance of the black teacher variable on both dependent variables when compared to
MI. However, like the DML analysis, there are no significant differences between the
LD and MI results for Model 2, which includes the student race and teacher race
variables with the addition of a variable measuring the interaction of student-teacher race
(black student x black teacher). With the addition of the black student-black teacher
variable, the variable for teacher race is no longer statistically significant for either of the
dependent variables (recall that Downey and Pribesh did not find that it was significant in
the Approaches to Learning Model 1). In just examining these two models, it is already
apparent that the LD analysis has resulted in increased significance levels when evaluated
against the MI analysis. Yet, its similarities with DML are also emerging.
The discrepancies between the LD and MI methods increase with the addition of
control variables in Models 3 and 4. While there is some minor variation in the values of
the regression coefficients and standard errors, the primary differences are evident in the
significance values. Again, this is similar to the pattern seen in the DML comparison. In
looking at the Model 3 analysis with the Externalizing Problem Behaviors scale as the
dependent variable, it is evident that there is an inconsistency in the significance values
of several variables. Both black student-black teacher and students’ socioeconomic status
were found to have higher levels of significance in the MI analysis than in the LD
analysis. Several other variables were found to have higher significance levels in the LD
analysis than in the MI analysis: teacher’s educational level, teacher’s age (which was
not found to be significant at all in MI analysis), and public school. In the Model 3 for
Approaches to Learning, there are three statistically significant variables which were not
significant in the MI analysis: black student-black teacher, teachers’ age, and percentage
of black students in the school. One can see that teacher and school characteristics have
an increase in significance levels in the LD results, as compared to the MI analysis, a
trend which was also evident in the DML analysis.
In Model 4, Downey and Pribesh replace the student race and teacher race
variables with three measures of student-teacher racial matching, with the white student-white teacher combination as the omitted referent category. This change in variables had
very little effect on the results and did not result in much change in regard to the
remaining independent variables. Thus, the discrepancies found between the MI and LD
methods in Model 4 are similar to those from Model 3. For the problem behaviors
analyses, the students’ socioeconomic status remains less significant in the LD analysis
than in MI. Additionally, teachers’ educational level, public school, and percentage of
students eligible for free lunch in the school have increased statistical significance in the
LD method as compared to the MI method. In the analysis using Approaches to Learning
as the dependent variable, the discrepancies between the LD and MI methods continue.
Both teachers’ age and the percentage of black students in the school are statistically
significant in LD whereas they were not in MI. In both Model 3 and 4, variables were
found to be statistically significant using the LD method but not significant using the MI
method which could result in very different substantive conclusions regarding the effects
of teacher and school characteristics on teachers’ evaluations of students’ behaviors.
Thus, the current methodological study supported the theoretical literature’s opinion that
LD will result in biased and inefficient estimates due to dramatically decreased sample
size. However, this study’s findings also departed from the literature in that it found the
same patterns of increased significance levels in the DML method as it did in LD.
Summary of Findings
Given this comparison of three different methods for handling incomplete data
(LD, MI, and DML), the hypothesis that DML and MI will produce equivalent results
cannot be fully supported. While DML and MI did produce comparable results regarding
the primary independent variables' effects on the dependent variables, they produced
substantially different findings regarding which control variables may be significant
indicators as well, with both DML and LD having inflated significance levels for several
independent variables. While this did not affect Downey and Pribesh’s ultimate
conclusion, it may have if the study’s authors had not been as focused on the race
variables. If the independent variable of interest had been a teacher or school
characteristic, it may have been found not significant using MI but would be
found statistically significant using DML or even LD. Consequently, while DML and MI
did produce equivalent results from the perspective of Downey and Pribesh, the methods
may have resulted in entirely different findings if one of the control variables had been
the focus of the study instead of race.
Nonetheless, the hypothesis that in application DML and MI will arrive at the
same substantive conclusions can be supported. Downey and Pribesh’s substantive
conclusion, that black students receive less favorable evaluations on their classroom
behavior as a function of teacher bias, would be supported by performing the statistical
analysis with any of these three methods. In all three methods, student race remained a
highly significant indicator of poor evaluations on both dependent variables for all
models. Additionally, the black student-white teacher racial matching variable was found
to be statistically significant with black students paired with white teachers receiving the
worst evaluations when compared to the other black/white student-teacher racial
combinations across all models and with all methods. Therefore, even though there are notable differences in regard to which additional independent variables are
significant indicators of the dependent variables, the evidence which Downey and Pribesh
used to arrive at their substantive conclusion is present using any of these three methods.
Thus, one can conclude that DML and MI in application may arrive at the same
substantive conclusions.
Chapter 5
DISCUSSION
Discussion of Findings
Although the literature suggested that DML and MI would produce somewhat
equivalent results, the current study found that DML shared more similarities with LD
than with MI. In general, both DML and LD produced increased significance levels and
therefore a greater number of statistically significant variables when compared to the MI
results. Despite these differences, the regression analyses using DML and LD both still
resulted in findings which supported the substantive conclusions Downey and Pribesh
(2004) reached using the MI method; thus, the hypothesis that in practice MI and DML will arrive at the same substantive conclusions was generally supported.
The hypothesis also suggested that because MI and DML will produce equivalent
results, DML should be used whenever possible as it is generally easier to implement.
Unfortunately, the current study was unable to adequately test this hypothesis as this
author did not execute the analysis using MI, but used Downey and Pribesh’s published
results using MI instead. In application this author did find that both DML and LD were
fairly equivalent and simple in their implementation, with the only drawback to
implementing DML being the need to acclimate to the use of a graphical user interface to
specify the model in a path diagram, as opposed to traditional menu-driven programs.
Therefore, one can conclude that, given the theoretical support for DML and its fairly simple implementation, it should be favored over LD
whenever possible. Still, as the literature suggests that MI procedures require a large
amount of computer memory space and processing time due to the creation of numerous
imputed datasets, one may be fairly certain that DML should be recommended over MI when appropriate, as it requires no additional processing time or storage space.
Based on the prior literature, one can conclude that the MI estimates are the most
valid and efficient and least biased of the three used in this methodological comparison.
Accordingly, one would also conclude that the inflated results obtained via the DML and
LD methods may have been affected by bias due to missing data, at least to some extent.
For these reasons, the prior literature advises that one should evaluate the missingness of
a data set before beginning a statistical analysis. Unfortunately, the current study was
unable to adequately review the missingness of the ECLS-K dataset as the public use data
files were edited by the NCES prior to release, concealing a great deal of information
from the data analyst.
Evaluation and Critique of Study
The current study is a re-examination of an existing study by Downey and Pribesh
(2004) on the effects of students’ and teachers’ race on teachers’ evaluations of students’
classroom behaviors using data from the ECLS-K dataset. Downey and Pribesh
performed a secondary data analysis to test whether black students received less
favorable evaluations than white students due to teachers’ racial bias or black students’
adoption of Oppositional Culture. Based on their statistical findings using the MI method
to handle missing data in the ECLS-K dataset, Downey and Pribesh concluded that the
black students received more negative evaluations than white students as a result of
teachers’ racial bias. As Downey and Pribesh state, these findings show that race
continues to be an important factor in the classroom and that the idea of student-teacher
racial matching deserves further attention. They suggest that the next step is to study
how racial matching affects students’ academic achievement.
As with all studies, there are some limitations to Downey and Pribesh’s study.
Their paper provides a brief discussion of the limitations they perceived, such as the fact
that their findings were based on a limited amount of data gathered at only one specific
point in time, included only teachers’ evaluations of students’ behaviors, and did not
include any students’ evaluations of teachers. There are other apparent weaknesses to
Downey and Pribesh’s study. As with all statistical analyses, there are limitations based
on the particular dataset used. The limitations of this study inherent to the secondary
analysis of the ECLS-K dataset will be discussed in detail in a separate section below.
Downey and Pribesh’s study can also be critiqued based on a close examination of the
study’s methodology and theoretical reasoning.
When reviewing published studies based on statistical analyses, one must be sure
to evaluate the theoretical and methodological basis for the substantive conclusions made.
On a basic level, one must ask whether the variables are actually measuring what the
authors claim that they are measuring and whether the model is actually testing the
hypothesis that the authors have developed. As Downey and Pribesh point out, they only
use measures of teachers’ evaluations of students’ classroom behaviors from the ECLS-K
as the basis for their substantive conclusion that teachers are biased against black
students. This is problematic as they do not use the most common and official means by
which teachers evaluate students, grades. Teachers receive training to evaluate students
on an academic grading system and may not have understood the evaluation scheme of
the ECLS-K survey or may have provided evaluations without much thought. Therefore,
while it is important to understand teachers’ opinions and attitudes regarding students’
behaviors, the Externalizing Problem Behaviors and Approaches to Learning scales may
not have been the most reliable or valid measures to use. Additionally, whereas the
ECLS-K gathered fairly detailed ethnic and racial data from a nationwide sample of
students and teachers, Downey and Pribesh only used data from black and white students
and teachers in their analysis. These findings were generalized in the substantive
conclusion that black students receive less favorable evaluations based on teachers’ racial
bias without ever considering students and teachers of other racial and ethnic groups.
The problem of choosing variables to measure race quantitatively raises the question of
whether it is even appropriate to study race in this manner. One must consider whether
the concept of race can be reduced to a single variable and whether quantitative studies
can truly be used to examine problems involving race. After careful examination of the
main variables used in Downey and Pribesh’s study, one can see that perhaps the
variables were not measuring what the authors claimed to be measuring.
Downey and Pribesh compare the results of two independent statistical analyses
to test their hypothesis that black students receive less favorable behavioral evaluations in
school as a function of teacher bias as opposed to Oppositional Culture Theory (OCT).
While they provide quantitative support for their conclusion, it is questionable why they
did not test the teacher bias hypothesis and the OCT hypothesis separately instead of pitting
the two theories against one another. Downey and Pribesh pose their research question as
if these two are the only possible causes of poor evaluations of black students’ behavior
and as mutually exclusive options. The authors also seem closed to the idea that any
variable besides race could have a significant impact on teachers’ evaluations as there
was little discussion on other significant variables in the analysis, such as family, teacher,
and school characteristics. Further, as discussed in the literature review, both of the
teacher bias and OCT hypotheses are based on theories which are considered to be
problematic and myopic by many sociologists. In summary, they simply put blame on
either the teacher or the student and fail to account for the larger processes at work
behind racial bias and oppositional culture. Downey and Pribesh further narrow the matter by isolating the focus to only black and white racial matching. Again, after
evaluating the theoretical basis of Downey and Pribesh’s study, one reveals reasoning
which appears problematic and possibly biased.
Just as Downey and Pribesh’s research methods and substantive conclusions were
thoroughly examined and critiqued, one must scrutinize the current study to determine its
strengths and weaknesses and reveal any potential problems. After a review of the prior
literature, it is apparent that there is a lack of discussion regarding missing data methods
in sociology; one of the primary strengths of the current study is that it will work to fill
this gap. In addition to simply supplying needed information regarding the importance of
considering missingness prior to data analysis, the current study can be used to revitalize
the discussion regarding the proper use and treatment of incomplete datasets and
evaluation of substantive conclusions based on incomplete datasets. This methodological
comparison as a means of evaluating substantive conclusions is an innovative approach to
sociological research and may provide the impetus for future studies, an obvious positive
feature of the current study.
As with any study, the current analysis is not without its limitations. These
include the fact that the methodological comparison was limited to the analysis of only
one dataset using only three methods, with only two of the analyses executed by this
author. Consequently, one may conclude that the discrepancies between the MI and
DML methods and similarities between the DML and LD methods may be the result of
differences in data handling by this author and Downey and Pribesh. Although the
selection of methods was based on a review of the prior literature on the subject, one
must still question whether or not the results of this study are generalizable to other data
analyses. Although the findings of this study did support the original hypothesis that MI
and DML will lead to the same substantive conclusions in application, it was not
anticipated that the DML and LD methods would produce more similar results than the MI
and DML methods did. These unexpected findings only cast further doubt on the
generalizability of the findings, a major limitation of the study. One possible manner to
address this area of weakness would be to perform further methodological comparisons
and create a meta-analysis comparing the findings from the various studies.
The most obvious and consequential limitations of this study are related to the use
of the ECLS-K dataset, limitations which are shared by the Downey and Pribesh study as
they are inherent to the dataset. Unfortunately, only the public use versions of the
ECLS-K data were available for use, and these datasets were already edited by the NCES prior
to being released to the public. Therefore, this study was unable to adequately evaluate
the types and amounts of missing data in the ECLS-K dataset. In addition to editing data
to protect the students’ privacy, the NCES also replaced numerous variables with NCES
created composite variables. Subsequently, all information on missing cases for the
original variables was made unavailable to the public. Further, the NCES did admit to
very low school response rates which led to the recruitment of replacement schools, a
departure from the original sampling frame. Even with these additional recruitment
measures, the school completion rates were below NCES standards and prompted an
internal nonresponse bias analysis. Although the NCES nonresponse bias analysis
concluded that there was no bias due to school nonresponse, the current study was unable
to independently test this claim due to the unavailability of data (Bose 2001; West,
Denton, and Reaney 2001). As with all analyses of secondary data, one must trust that
the sampling, data collection, and data coding functions were performed properly.
However, one must still evaluate whether there are any potential risks to the reliability or
validity of the data. In the ECLS-K, as with many large confidential datasets, there are
several codes used for the various types of missing data. Still, not all situations fit neatly
into these coding schemes and the secondary data analyst must use the data as it is
available without having access to the actual responses. For example, according to the
ECLS-K codebook (NCES 2004), children needing special accommodations due to
physical or cognitive limitations were excluded from certain survey components and
coded as “not applicable” on that variable. If a child was unable to complete a question
with repeated instruction, the code “don’t know” was used. As a secondary data user,
one must trust that the survey administrators were capable of determining whether a
child needed special accommodations or simply did not know the answer. Unfortunately,
many areas of limitation are inherent to the use of secondary data in a statistical analysis
and are unavoidable without the collection of one’s own data.
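To make this preprocessing step concrete, a recoding routine like the one described above might be sketched as follows. The reserved codes and variable values here are purely illustrative; they are not the actual ECLS-K codebook values.

```python
# Illustrative sketch only: these reserved codes are hypothetical and do NOT
# correspond to the actual ECLS-K codebook values.
NOT_APPLICABLE = -1   # e.g., child excluded due to special accommodations
DONT_KNOW = -9        # e.g., child could not answer after repeated instruction
RESERVED = {NOT_APPLICABLE, DONT_KNOW}

def recode_missing(values):
    """Replace reserved codes with None so downstream analysis treats them as missing."""
    return [None if v in RESERVED else v for v in values]

scores = [3, -9, 4, -1, 5]
print(recode_missing(scores))  # [3, None, 4, None, 5]
```

Note that once both reserved codes are mapped to a single missing indicator, the distinction between "not applicable" and "don't know" is lost, which is precisely the information a secondary analyst can no longer recover.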
Another limitation to this study is that the methodological comparison used one
analysis prepared by another author and two analyses prepared by the current author. As
the MI results were obtained by Downey and Pribesh and the DML and LD results were
obtained by this author, the differences in MI and DML results may be the result of minor
differences in data preparation. One area of difference may be that the current author
recoded variables so that only meaningful values were included in the analysis and all
other values (such as refusals) were denoted as “missing”. Therefore, one may improve
upon the current study by conducting a methodological comparison using only analyses
with data prepared in exactly the same manner or by the same analyst.
In summary, it is important to understand that every study will have certain
strengths and weaknesses. It is even more important to evaluate a study and determine
what these assets and limitations are prior to utilizing a secondary source or publishing
one’s own findings for others to use. Due to the nature of the ECLS-K public use dataset,
many of the limitations of this current study as well as that of Downey and Pribesh’s
study were unavoidable if the ECLS-K data was to be used. Certain areas of weakness
are due to the adherence to a particular theory, such as the Oppositional Culture Theory
or teacher bias theory. While not all of a study’s limitations may be avoided, all should
be revealed and discussed so that the substantive conclusions drawn from that study can
be applied appropriately.
Impact on Future Research
As indicated above, one of the primary strengths of this study is that it
provides information on a neglected area in sociology: missing data methods.
Missingness is considered inevitable in social research. Therefore, issues regarding the
analysis of missing data need to be discussed, understood, and considered by the field of
sociology. The understanding of such issues is imperative as traditional statistical
methods will not perform properly using incomplete data. So, one must edit and adapt
the dataset to fit a traditional method or use a method designed for incomplete data.
Unfortunately, these methods are neither well known nor commonly used. If an unsuitable
method is used, it can result in unreliable and biased results and errors in performing even
basic functions. This can lead to increased errors in hypothesis testing and thus distorted
substantive conclusions. However, many sociologists are concerned with substantive
issues and do not want to take the time or effort to consider technical and methodological
issues. In addition, many do not have the expertise to recognize the issues regarding
missing data methods. Thus, many simply use the method and program they are
comfortable with and unknowingly base substantive conclusions on biased results. The
current study not only fills a void in the existing literature regarding missing data
methods, but also reveals the neglected area of handling incomplete data in sociological
methods and opens up the discussion regarding these issues.
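As a minimal illustration of why the default method matters, the sketch below shows listwise deletion, the traditional default in most statistical packages, silently discarding every case with any missing value. The variable names are hypothetical, not drawn from the ECLS-K.

```python
# Minimal sketch of listwise deletion (LD): a case with a missing value on
# ANY analysis variable is dropped entirely, shrinking the sample and, if the
# data are not missing completely at random, potentially biasing the results.
cases = [
    {"behavior": 4.0, "race_match": 1, "ses": 0.2},
    {"behavior": 3.5, "race_match": 0, "ses": None},   # missing ses     -> dropped
    {"behavior": None, "race_match": 1, "ses": -0.1},  # missing outcome -> dropped
    {"behavior": 2.8, "race_match": 0, "ses": 0.4},
]

complete = [c for c in cases if all(v is not None for v in c.values())]
print(f"{len(cases)} cases collected, {len(complete)} retained after listwise deletion")
```

Here half the sample vanishes before the model is even estimated, which is exactly the kind of loss that methods designed for incomplete data, such as MI and DML, are meant to avoid.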
Underlying issues were revealed regarding the importance of method choice and
research design in sociological studies using incomplete data, areas which have been
largely ignored in the field of sociology. First is the issue of reviewing and
understanding the dataset prior to conducting any analysis. Ideally, one would have a
thorough knowledge of the dataset prior to even deciding what method of statistical
analysis is to be performed and with what software package. This includes an
understanding of the limitations of the particular dataset. Second is the issue of ensuring
that the hypothesis one seeks to test is actually being tested by the model chosen. The
implications of this study for the field of social research will be an increased emphasis on
teaching the importance of method choice and teaching more than just the traditional
missing data methods in research methods courses. Further, it will lead to increased
scrutiny of the methods used to treat incomplete data in published works.
Although there are limitations to using the ECLS-K data in statistical analyses,
there are numerous possibilities for future substantive research using the ECLS-K as a
foundation. As Downey and Pribesh (2004) suggest, one can use the data collected in the
ECLS-K study to evaluate whether the teachers’ or students’ race has an effect on the
academic achievement of students or on the academic evaluations they receive. Due to
the wealth of information contained in the ECLS-K, future analyses of these data can
evaluate “the role of various things such as child care, home educational environment,
teachers’ instructional practices, class size and the general climate, and facilities and
safety of the schools” on areas such as changes in student academic achievement and
performance (West, Denton, and Reaney 2001: xiii). The current literature suggests a
focus on using research in standards based educational reform, and surveys such as the
ECLS-K will provide the data by which school programs can be evaluated (Stanovich
and Stanovich 2003). While there are many opportunities for future research using the
ECLS-K data, the fact remains that the limitations inherent to using this dataset must be
thoroughly understood and taken into account to ensure that only valid substantive
conclusions are based on these data.
The statistical findings and substantive conclusions made by Downey and Pribesh
in their 2004 study of the effects of students’ and teachers’ race on teachers’ evaluations
of students’ classroom behaviors also provide many possibilities for future research in the
areas of race, education, and child behavior. Though there were several apparent
weaknesses in their study, Downey and Pribesh’s findings may serve as a viable jumping
off point for future sociologists interested in studying these areas. As Downey and
Pribesh point out in their article, their focus on classroom behaviors leaves room for
research into the areas of academic achievement and grading. Given that the concepts of
race and education are so broad and pervasive, the influence of Downey and Pribesh’s
statistical results and substantive findings will no doubt lend support and influence to
future sociological research.
Conclusion
While this current study’s methodological comparisons resulted in some
unexpected findings, the results did generally support the hypothesis that MI and DML in
application will result in the same substantive conclusions. All three methods resulted in
findings which did support the substantive conclusions made by Downey and Pribesh that
black students receive less favorable evaluations than white students due to teacher bias
on the basis of race. After a careful review of the prior literature and Downey and
Pribesh’s 2004 study, with a methodological comparison as the basis of evaluation, one
finds that the choice of method used for the analysis and handling of missing data is an
important step in sociological research. It is evident that the choice of method can have
an effect on statistical findings, and this study found that one can be fairly certain that DML
should be used as the missing data method of choice whenever appropriate, as it is
generally easier to implement than MI and has more theoretical support in the literature
than LD. While methodological considerations are an important foundation for a valid
and unbiased study, it is also clear that the hypothesis guides the focus of the research
and thus may draw attention to or away from the influence of independent variables. The
influence of hypothesis on the focus of research can be seen in Downey and Pribesh’s
lack of discussion regarding statistically significant independent variables other than race
in their study. Theoretical and substantive issues will continue to guide sociological
research; however, it is evident that sociologists must begin to seriously consider
methodological issues in order to ensure the reliability of future substantive conclusions
based on statistical analyses using incomplete quantitative data.
REFERENCES
Ainsworth-Darnell, James W. and Douglas B. Downey. 1998. “Assessing the
Oppositional Culture Explanation for Racial/Ethnic Differences in School
Performance.” American Sociological Review 63(4): 536-553.
Alexander, Karl L., Doris R. Entwisle, and Maxine S. Thompson. 1987. “School
Performance, Status Relations, and the Structure of Sentiment: Bringing the
Teacher Back In.” American Sociological Review 52(5): 665-682.
Allison, Paul D. 1987. “Estimation of Linear Models with Incomplete Data.”
Sociological Methodology 17:71-103.
------. 1999. Multiple Regression: A Primer. Thousand Oaks, CA: Pine Forge Press.
------. 2000. “Multiple Imputation for Missing Data: A Cautionary Tale.” Sociological
Methods and Research 28(3):301-309.
------. 2002. Missing Data. Sage University Papers series on Quantitative Applications
in the Social Sciences, 07-136. Thousand Oaks, CA: Sage.
Arbuckle, James L. 2007. Amos 17.0 User’s Guide. [MRDF] Spring House, PA:
Amos Development Corporation. (http://amosdevelopment.com).
Baydar, Nazli. 2004. “Book Reviews.” Sociological Methods & Research 33: 157-161.
Bodovski, Katrina and George Farkas. 2008. “Concerted Cultivation and Unequal
Achievement in Elementary School.” Social Science Research 37: 903–919.
Bose, Jonaki. 2001. “Nonresponse Bias Analyses at the National Center for Education
Statistics.” Proceedings of Statistics Canada Symposium 2001. Achieving Data
Quality in a Statistical Agency: A Methodological Perspective.
Byrne, Barbara M. 2001a. Structural Equation Modeling with AMOS: Basic Concepts,
Applications, and Programming. Mahwah, NJ: Lawrence Erlbaum Associates,
Inc.
------. 2001b. “Structural Equation Modeling with AMOS, EQS, and LISREL:
Comparative Approaches to Testing for the Factorial Validity of a Measurement
Instrument.” International Journal of Testing 1(1):55-86.
Carter, Rufus Lynn. 2006. “Solutions for Missing Data in Structural Equation
Modeling.” Research & Practice in Assessment 1(1):1-6.
Cohen, Philip N. and Matt Huffman. 2007. “Working for the Woman? Female
Managers and the Gender Wage Gap.” American Sociological Review 72: 681–
704.
Condron, Dennis J. 2007. “Stratification and Educational Sorting: Explaining
Ascriptive Inequalities in Early Childhood Reading Group Placement.” Social
Problems 54(1):139-160.
Collins, Linda M., Joseph L. Schafer, and Chi-Ming Kam. 2001. “A Comparison of
Inclusive and Restrictive Strategies in Modern Missing Data Procedures.”
Psychological Methods 6(4): 330-351.
Cunningham, Everarda G. and Wei C. Wang. 2005. “Using AMOS Graphics to Enhance
the Understanding and Communication of Multiple Regression.” IASE/ISI
Satellite. Swinburne University of Technology, Australia.
Downey, Douglas B. 2008. “Black/White Differences in School Performance: The
Oppositional Culture Explanation.” Annual Review of Sociology 34:107-126.
Downey, Douglas B. and Shana Pribesh. 2004. “When Race Matters: Teachers’
Evaluations of Students’ Classroom Behavior.” Sociology of Education 77(4):
267-282.
Downey, Douglas B., Paul T. von Hippel, and Beckett A. Broh. 2004. “Are Schools the
Great Equalizer? Cognitive Inequality During the Summer Months and the School
Year.” American Sociological Review 69(5): 613-635.
Ehrenberg, Ronald G., Daniel D. Goldhaber, and Dominic J. Brewer. 1995. “Do
Teachers’ Race, Gender, and Ethnicity Matter? Evidence from the National
Educational Longitudinal Study of 1988.” Industrial and Labor Relations Review
48(3): 547-561.
Eliason, Scott R. 1993. Maximum Likelihood Estimation: Logic and Practice. A Sage
University Paper series on Quantitative Applications in the Social Sciences, 07-096.
Newbury Park, CA: Sage.
Enders, Craig K. 2001. “A Primer on Maximum Likelihood Algorithms Available for
Use With Missing Data.” Structural Equation Modeling 8(1):128-141.
------. 2006. “A Primer on the Use of Modern Missing-Data Methods in Psychosomatic
Medicine Research.” Psychosomatic Medicine 68:427-436.
Entwisle, Doris R., Karl L. Alexander, and Linda Steffel Olson. 2005. “First Grade and
Educational Attainment by Age 22: A New Story.” American Journal of
Sociology 10(5):1458–1502.
Espeland, Mark A. 1988. “Review.” American Journal of Sociology 94(1): 156-158.
Espinosa, Linda M. and James M. Laffey. 2003. “Urban Primary Teacher Perceptions of
Children with Challenging Behaviors.” Journal of Children and Poverty 9(2):
135-156.
Farkas, George, Christy Lleras, and Steve Maczuga. 2002. “Does Oppositional Culture
Exist in Minority and Poverty Peer Groups?”. American Sociological Review
67(1):148-155.
Fay, Robert E. 1992. “When are Inferences from Multiple Imputation Valid?”.
Proceedings of the Survey Research Methods Section, American Statistical
Association. Pp. 227-232.
Fowler, Floyd J. Jr. 2002. Survey Research Methods. 3rd ed. Applied Social Research
Methods Series. Vol. 1. Thousand Oaks, CA: Sage Publications.
Freedman, Vicki A. and Douglas A. Wolf. 1995. “A Case Study on the Use of Multiple
Imputation.” Demography 32(3):459-470.
Graham, John W., Scott M. Hofer, and Andrea M. Piccinin. 1994. “Analysis with
Missing Data in Drug Prevention Research.” Pp. 13-63 in National Institute on
Drug Abuse Research Monograph Series: Advances in Data Analysis for
Prevention Intervention Research, edited by L.M. Collins and L.A. Seitz.
Rockville, MD: National Institute on Drug Abuse.
Gosa, Travis L. and Karl L. Alexander. 2007. “Family (Dis)Advantage and the
Educational Prospects of Better Off African American Youth: How Race Still
Matters.” Teachers College Record 109(9): 285-321.
Horton, Nicholas J. and Stuart R. Lipsitz. 2001. “Multiple Imputation in Practice:
Comparison of Software Packages for Regression Models with Missing
Variables.” The American Statistician 55(3):244-254.
Little, Roderick and Donald B. Rubin. 2002. Statistical Analysis with Missing Data.
2nd ed. Hoboken, NJ: John Wiley & Sons, Inc.
Lleras, Christy. 2008. “Do Skills and Behaviors in High School Matter? The Contribution
of Noncognitive Factors in Explaining Differences in Educational Attainment and
Earnings.” Social Science Research 37: 888–902.
Long, Barbara H. and Edmund H. Henderson. 1971. “Teachers’ Judgments of Black and
White School Beginners.” Sociology of Education 44(3): 358-368.
Mackelprang, A. J. 1970. “Missing Data in Factor Analysis and Multiple Regression.”
Midwest Journal of Political Science 14(3): 493-505.
Madow, William G., Harold Nisselson, and Ingram Olkin, eds. 1983. Incomplete Data in
Sample Surveys, Vol. 1, Report and Case Studies. New York: Academic Press.
McArdle, John J. 1994. “Structural Factor Analysis Experiments with Incomplete Data.”
Multivariate Behavioral Research 29(4):409-454.
National Center for Education Statistics, U.S. Dept. of Education. 2004. ECLS-K
Base Year Public-Use Data Files and Electronic Codebook. CD-ROM. NCES
2001-029 Revised August 2004. Rockville, MD: Westat.
Navarro, Jose Blas. 2003. “Methods for the Analysis of Explanatory Linear Regression
Models with Missing Data Not at Random.” Quality & Quantity 37:363-376.
Penn, David A. 2007. “Estimating Missing Values from the General Social Survey: An
Application of Multiple Imputation.” Social Science Quarterly 88(2): 573-584.
Regoeczi, Wendy C. and Marc Riedel. 2003. “The Application of Missing Data
Estimation Models to the Problem of Unknown Victim/Offender Relationships in
Homicide Cases.” Journal of Quantitative Criminology 19(2): 155-183.
Rubin, Donald B. 1976. “Inference and Missing Data.” Biometrika 63(3): 581-592.
------. 1978. “Multiple Imputations In Sample Surveys: A Phenomenological Bayesian
Approach to Nonresponse.” Proceedings of the Survey Research Methods Section,
American Statistical Association. Pp. 20-34. Washington, D.C.
------. 1987. Multiple Imputation for Nonresponse in Surveys. New York: John Wiley
& Sons, Inc.
------. 1996. “Multiple Imputation After 18+ Years.” Journal of the American Statistical
Association 91(434):473-489.
Rudas, Tamas. 2005. “Mixture Models of Missing Data.” Quality & Quantity 39:19-36.
Schafer, Joseph L. 1999. “Multiple Imputation: A Primer.” Statistical Methods in
Medical Research 8: 3-15.
Schafer, J. L., T. M. Ezzati-Rice, W. Johnson, M. Khare, R. J. A. Little, and D. B. Rubin.
1996. “The NHANES III Multiple Imputation Project.” Proceedings of the
Survey Research Methods Section, American Statistical Association. Pp. 28-37.
Schafer, Joseph L. and John W. Graham. 2002. “Missing Data: Our View of the State
of the Art.” Psychological Methods 7(2): 147-177.
Schafer, Joseph L. and Maren K. Olsen. 1998. “Multiple Imputation for Multivariate
Missing-Data Problems: A Data Analyst’s Perspective.” Multivariate Behavioral
Research 33(4): 545-571.
Shernoff, David J. and Jennifer A. Schmidt. 2008. “Further Evidence of an
Engagement–Achievement Paradox Among U.S. High School Students.” Journal
of Youth Adolescence 37:564–580.
Sinharay, Sandip, Hal S. Stern, and Daniel Russell. 2001. “The Use of Multiple
Imputation for the Analysis of Missing Data.” Psychological Methods 6(4): 317-329.
Stanovich, Paula J. and Keith E. Stanovich. 2003. “Using Research and Reason in
Education: How Teachers Can Use Scientifically Based Research to Make
Curricular & Instructional Decisions.” Portsmouth, New Hampshire: RMC
Research Corporation.
Stearns, Elizabeth and Elizabeth J. Glennie. 2006. “When and Why Dropouts Leave
High School.” Youth Society 38(1): 29-57.
Stumpf, Stephen A. 1978. “A Note On Handling Missing Data.” Journal of
Management 4(1): 65-73.
Tach, Laura Marie and George Farkas. 2006. “Learning-related Behaviors, Cognitive
Skills, and Ability Grouping when Schooling Begins.” Social Science Research
35: 1048–1079.
Takei, Yoshimitsu and Roger Shouse. 2008. “Ratings in Black and White: Does Racial
Symmetry or Asymmetry Influence Teacher Assessment of a Pupil’s Work
Habits?”. Social Psychology of Education 11(4):367-387.
West, Jerry, Kristin Denton, and Lizabeth M. Reaney. 2001. The Kindergarten Year:
Findings from the Early Childhood Longitudinal Study, Kindergarten Class of
1998-99. National Center For Education Statistics, NCES 2001-023.
Washington, DC: U.S. Department of Education.
Wothke, Werner and James L. Arbuckle. 1996. “Full-information Missing Data
Analysis with AMOS.” SPSS White Paper.
Yuan, Yang C. 2000. “Multiple Imputation for Missing Data: Concepts and New
Development.” P267-25. SAS White Papers. Rockville, MD: SAS Institute.
Yuan, Ke-Hai and Peter M. Bentler. 2000. “Three Likelihood-Based Methods for Mean
and Covariance Structure Analysis with Nonnormal Missing Data.” Sociological
Methodology 30:165-200.
Zeiser, Krissy. 2008. “PRI Workshop: Introduction to AMOS.” Pennsylvania State
University, November 13, 2008.