Value-added modeling - Maryland Assessment Research Center

Modeling Student Growth Using
Multilevel Mixture Item Response Theory
Hong Jiao
Robert Lissitz
University of Maryland
Presentation at the 2012 MARCES Conference
October 2012
Thanks to
Yong Luo, Chao Xie, and Ming Li for feedback
Outline of presentation
• Value-added modeling
• Multilevel IRT models
• Mixture IRT models
• Direct modeling of students’ growth parameters in multilevel mixture IRT models
• Simulation for direct modeling of growth in IRT models
• Future explorations
Value-added modeling
• VAM intends to estimate the effect of
educational inputs on student outcomes or
student achievement as measured by
standardized tests. (McCaffrey et al. 2003)
• Accurate estimation of students’ achievement
is very important as high stakes decisions are
associated with the use of such scores.
• All value-added models estimate the growth
associated with schools and/or teachers
• To measure growth, some models control for
students’ prior achievement (FL
commissioned paper by AIR)
Complexity of the VAM
• How prior achievements are accounted for
• How value-added scores of school and
teacher effects are estimated
• Assumptions about the sustainability of
school and teacher effects
• Value-added models can be grouped into two
major classes (AIR):
 Typical learning path models
 Covariate adjustment models
Typical learning path models: longitudinal mixed-effects models
• Each student is assumed to have a typical
learning path
• Schools and teachers can alter this learning
path relative to the state mean, a conditional
average
• No direct control of prior achievement
• With more data points, a student’s propensity
to achieve can be estimated with more
accuracy
• With each passing year, a student’s typical
learning path can be estimated with increased
precision over time
• Different learning path models assume
differently about how teachers and schools
can impact a student’s propensity to achieve
Different learning path models
• In Sanders’ Tennessee Value-Added Assessment
System (TVAAS) model, teacher effects are
assumed to have a permanent impact on
students
• McCaffrey and Lockwood (2008) relaxed this
assumption and let the data dictate the
extent to which teacher effects decay over
time
• Kane et al. (2008) found that teacher effects
appear to dissipate over the course of about
two years in an experiment in Los Angeles
Covariate adjustment models
• Direct control of prior student scores: prior
test scores are included as predictors in the
model
• Teacher effects can be treated as either fixed
or random
• To obtain unbiased estimates, covariate
adjustment models must account for the
measurement error introduced by including
students’ prior achievement as a model
predictor
Covariate adjustment models
• Two frequently used methods for accounting
for measurement error in regression analysis
include
 Direct modeling of error, such as in
structural equation models or errors-in-variables regression
 Instrumental variable approach using one
or more variables that are assumed to
influence the current year score, but not
prior year scores to statistically purge the
measurement error from the prior year
scores
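To illustrate why measurement error in the prior score matters, here is a minimal sketch. It simulates a prior-year score observed with error and shows that the ordinary least squares slope shrinks by roughly the reliability of the predictor. Dividing by the reliability is a simple errors-in-variables style correction. The sample size, error variances, and true slope of 0.8 are illustrative assumptions, not values from the presentation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
true_b = 0.8  # assumed true coefficient on prior achievement

true_prior = rng.normal(0.0, 1.0, n)                    # error-free prior achievement
observed_prior = true_prior + rng.normal(0.0, 0.5, n)   # prior score with measurement error
current = true_b * true_prior + rng.normal(0.0, 0.5, n)

def ols_slope(x, y):
    """Slope of the simple regression of y on x."""
    x_c = x - x.mean()
    return float(np.dot(x_c, y - y.mean()) / np.dot(x_c, x_c))

naive_b = ols_slope(observed_prior, current)

# Reliability = true-score variance / observed-score variance (known here by design).
reliability = 1.0 / (1.0 + 0.5 ** 2)
corrected_b = naive_b / reliability

print(f"naive slope {naive_b:.3f}, corrected slope {corrected_b:.3f}")
```

In practice the reliability is itself an estimate, so the correction carries its own uncertainty.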
Statistical controls for contextual factors
• Students are not randomly assigned to
districts, schools, and classes
• Parent selection of schools and teachers,
teacher selection of schools, subjects, and
sections, principal discretion in assigning
certain students to certain teachers
• These selection factors cause significant
biases
• Unbiased estimation of teacher value-added
requires controlling for the factors that
influence both the selection of students into
particular classes and current year test scores
Statistical controls for contextual factors
• Many value-added models assume only
students’ prior test score is relevant to
students’ posttest score
• Other models incorporate controls for
additional variables that might influence
selection and outcomes
Statistical controls for contextual factors
• Empirical evidence is mixed on the extent to which
student characteristics other than score histories
remain correlated with test scores after controlling
for prior test scores
• Some studies found that controlling for student-level
characteristics makes little if any significant
difference in model estimates (Ballou, Sanders,
and Wright, 2004; McCaffrey et al. 2004)
 This is consistent with the view that durable student
characteristics associated with race, income, and other
characteristics are already reflected in prior test scores,
such that controlling for the prior test scores controls for
any relevant impact of the factors proxied by the
measured characteristics
Statistical controls for contextual factors
• In contrast, when student factors are aggregated
to school or classroom levels, they sometimes
reveal a significant residual effect (Raudenbush,
2004; Ballou, Sanders, & Wright, 2004). School or
classroom characteristics may explain additional
variance in students’ posttest scores independently
beyond students’ individual characteristics
accounted for by their prior test scores
• On this view, true teacher effectiveness really does
vary with student characteristics, and the correlated
variation in estimated teacher value-added is not the
consequence of uncontrolled selection bias but
rather a reflection of these true differences in
teacher effectiveness.
Durability of teacher effects
• Typical learning path models require an
assumption about the durability of the impact of
teachers on a student’s learning path.
 Sanders’ Tennessee value-added assessment
system assumes that teacher effects have a
permanent impact on students
 McCaffrey & Lockwood (2008) let the data
dictate the extent to which teacher effects
decay over time
 Kane et al. (2008) found that teacher effects
appeared to dissipate over the course of
about two years in an experiment in LA.
Durability of teacher effects
• Covariate adjustment models do not make
assumptions about the durability of teacher effects,
because they explicitly establish expectations based
on prior achievement by including prior test scores
as a covariate, rather than relying on the abstract
‘propensity to achieve’ estimated in learning path models
Unit of measurement for student achievement
• Colorado growth model (Betebenner, 2008)
uses entirely normative in-state percentile
ranks
• Does not rely on a potentially flawed vertical
scale
• But provides only normative criteria
• Students’ growth is examined relative to
their peers rather than absolute growth in
their own learning.
Dependent variable in growth modeling
• The majority of models used an interval measure:
students’ scaled test scores
• Students’ percentile ranks within grade were
also used as the dependent variable in some
models
Correction of biased estimates of teacher effects in VAM
• Selection effects include parent selection of
schools and teachers; teacher selection of
schools, subjects, and sections; and principal
discretion in assigning certain students to
certain teachers
• Selection effects can be mitigated when the
model includes factors that are not accounted
for by pretest scores, and are associated with
posttest scores after controlling for pretest
scores.
Issues arising from the use of achievement test scores as an
outcome measure
• Testing is infrequent: once a year
• Tests sample all topics related to achievement
• The scale for measuring achievement is not
predetermined by the nature of achievement
but is chosen by the test developer.
• Changes to the timing of tests, the weight
given to alternative topics, or the scaling of
the test could change our conclusions about
the relative achievement or growth in
achievement across classes of students.
Potential problems in value-added models
• Linking errors could be conflated with teacher
effects
• The equal-interval property of the scale across
grades is questionable.
• Ceiling effects at higher grades may lead to
smaller learning gains than at grades in the
middle of the scale.
• Measurement error causes estimated
treatment effects to be confounded with group
means of prior achievement (Lockwood, 2012)
Covariate Adjusted Models (McCaffrey et al. 2003)
$$S_t = m_t + b\,S_{t-1} + T_t + \varepsilon_t$$
where
• $S_t$: the student’s score at time t
• $m_t$: a student-specific mean
• $S_{t-1}$: the student’s score at time t-1
• $T_t$: the teacher effect
• $\varepsilon_t$: the error term, assumed to be normally distributed and independent of …
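A minimal sketch of how a covariate adjusted model of this form could be fitted by least squares, with teacher effects treated as fixed. The simulated data, the number of teachers, and the true coefficient of 0.7 are illustrative assumptions. The student-specific mean is simplified to a common constant absorbed by the teacher indicators.

```python
import numpy as np

rng = np.random.default_rng(1)
n_teachers, class_size = 20, 50
true_b = 0.7                                   # assumed coefficient on prior achievement
true_teacher = rng.normal(0.0, 0.3, n_teachers)
true_teacher -= true_teacher.mean()            # effects relative to the average teacher

teacher = np.repeat(np.arange(n_teachers), class_size)
prior = rng.normal(0.0, 1.0, teacher.size)     # score at time t-1
noise = rng.normal(0.0, 0.5, teacher.size)
current = true_b * prior + true_teacher[teacher] + noise   # score at time t

# Design matrix: prior score plus one indicator column per teacher (fixed effects).
dummies = (teacher[:, None] == np.arange(n_teachers)[None, :]).astype(float)
X = np.column_stack([prior, dummies])
coef, *_ = np.linalg.lstsq(X, current, rcond=None)

b_hat = float(coef[0])
teacher_hat = coef[1:] - coef[1:].mean()       # center to match the simulated effects

print(f"estimated b = {b_hat:.3f}")
```

Treating teacher effects as random instead would shrink the estimates toward zero, which a mixed-effects routine would handle.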
Gain Score Models (McCaffrey et al. 2003)
$$S_t - S_{t-1} = m_t + T_t + \varepsilon_t$$
where
• $S_t$: the student’s score at time t
• $m_t$: a student-specific mean
• $S_{t-1}$: the student’s score at time t-1
• $T_t$: the teacher effect
• $\varepsilon_t$: the error term, assumed to be normally distributed and independent of …
The gain score model can be viewed as a special case of the
covariate adjusted model, where b, the coefficient of prior
achievement, is equal to 1.
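Because the gain score model fixes the coefficient of prior achievement at 1, teacher effects can be approximated by average gains per teacher. The sketch below shows this on simulated data. The class sizes, effect sizes, and zero student-specific mean are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n_teachers, class_size = 10, 100
true_teacher = np.linspace(-0.4, 0.4, n_teachers)   # assumed effects, mean zero

teacher = np.repeat(np.arange(n_teachers), class_size)
prior = rng.normal(0.0, 1.0, teacher.size)
noise = rng.normal(0.0, 0.5, teacher.size)
current = prior + true_teacher[teacher] + noise     # coefficient on prior score fixed at 1

gain = current - prior                              # score at t minus score at t-1
mean_gain = np.array([gain[teacher == k].mean() for k in range(n_teachers)])
teacher_hat = mean_gain - mean_gain.mean()          # relative to the average teacher

print(np.round(teacher_hat, 2))
```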
Teacher Effect Estimate in VAM (Luo, Jiao, & Van Wie, 2012)
• Two-step process:
 In most value-added modeling, student
achievement scores are estimated before
entering the model for estimating teacher or
school effect.
 Students’ achievement scores are estimated
based on a certain item response theory (IRT)
model first, most often a unidimensional IRT
model.
Issues with Two-Step Process (Luo, Jiao, & Van Wie, 2012)
• Standard IRT models are used in operation to
measure students’ achievement scores.
• Non-random assignment of students to
schools and classes causes local person
dependence due to the nesting structure
(Reckase, 2009; Jiao et al., 2012).
• Measurement precision might be affected
• Parameter estimates may be biased due to the
reduced effective sample size (Cochran, 1977;
Cyr & Davies, 2005; Kish, 1965).
• Ultimately, the accuracy in estimating teacher
and school effect may also be affected.
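The reduction in effective sample size under nesting can be illustrated with the usual design-effect approximation for equal-sized clusters, commonly attributed to Kish (1965). The cluster size and intraclass correlation below are hypothetical values chosen only for illustration.

```python
def effective_sample_size(n_students, cluster_size, icc):
    """Design-effect approximation for equal-sized clusters: n / (1 + (m - 1) * icc)."""
    design_effect = 1.0 + (cluster_size - 1) * icc
    return n_students / design_effect

# Hypothetical setting: 1000 students in classes of 25 with an intraclass correlation of 0.2.
n_eff = effective_sample_size(n_students=1000, cluster_size=25, icc=0.2)
print(round(n_eff, 1))
```

With no within-class dependence (an intraclass correlation of 0) the function returns the nominal sample size.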
Outcome variables in VAM
• Standardized test scores
• Intrinsic measurement errors in the test
scores
• A possible solution is to use multilevel item
response theory (IRT) models
 Students’ achievement, teacher effects, and
school effects are modeled simultaneously,
with item response data as the input; the
latent ability is estimated together with the
other model parameters, such as item
parameters and teacher and school random
effects (Van Wie, Luo, & Jiao, 2012; Luo,
Jiao, & Van Wie, 2012)
Four-level model in the traditional Rasch model format (Van
Wie, Luo, & Jiao, 2012)
$$p_{jmsi} = \frac{1}{1 + \exp[-(\theta_j + \theta_T + \theta_S - b_i)]}$$
Four-level model in the traditional 3PL IRT model format (Luo,
Jiao, & Van Wie, 2012)
$$p_{jmsi} = c_i + \frac{1 - c_i}{1 + \exp[-(\theta_j + \theta_T + \theta_S - b_i)]}$$
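A minimal sketch of the response probability in the four-level models, assuming the usual logistic form in which student, teacher, and school effects add and item difficulty is subtracted. The function and argument names are hypothetical. Setting the guessing parameter to 0 gives the Rasch version.

```python
import math

def p_correct(theta_student, theta_teacher, theta_school, b_item, c_item=0.0):
    """Probability of a correct response; c_item = 0 reduces the 3PL form to the Rasch form."""
    logit = theta_student + theta_teacher + theta_school - b_item
    return c_item + (1.0 - c_item) / (1.0 + math.exp(-logit))

rasch_p = p_correct(0.3, 0.1, -0.4, 0.0)               # effects cancel, so the logit is 0
three_pl_p = p_correct(0.3, 0.1, -0.4, 0.0, c_item=0.2)
print(rasch_p, three_pl_p)
```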
Multilevel IRT Framework
• Model parameter estimation of the 4-level IRT
models: item parameters, student ability, teacher
effect, and school effect.
Rasch: HLM7, PROC GLIMMIX, MCMC
2PL: MCMC
3PL: MCMC
Teacher Effect and School Effect Computation
• In the two-step process, teacher effect is
computed as the average of the scores of the
nested students, and school effect is computed
as the average of the teacher effects within the
school. This is analogous to the status model.
• In the 4-level IRT model, the student ability, the
teacher effect and the school effect were
simultaneously estimated.
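The two-step computation described above can be sketched as follows: teacher effects are averages of the nested students’ estimated scores, and school effects are averages of the teacher effects within each school. The records and labels are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (school, teacher, estimated student ability from step one)
records = [
    ("S1", "T1", 0.5), ("S1", "T1", -0.1),
    ("S1", "T2", 0.9), ("S1", "T2", 0.7),
    ("S2", "T3", -0.4), ("S2", "T3", -0.6),
]

scores_by_teacher = defaultdict(list)
school_of_teacher = {}
for school, teacher, theta_hat in records:
    scores_by_teacher[teacher].append(theta_hat)
    school_of_teacher[teacher] = school

# Step two (a): teacher effect = mean score of the teacher's students.
teacher_effect = {t: mean(scores) for t, scores in scores_by_teacher.items()}

# Step two (b): school effect = mean of the teacher effects within the school.
effects_by_school = defaultdict(list)
for t, effect in teacher_effect.items():
    effects_by_school[school_of_teacher[t]].append(effect)
school_effect = {s: mean(effects) for s, effects in effects_by_school.items()}

print(teacher_effect, school_effect)
```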
Findings
• Except for RMSE in teacher effect parameter
estimation, the 4-level 3PL IRT model performs
significantly better than the 2-level 3PL IRT
model.
• Especially noticeable is the considerable
improvement of teacher effect parameter
estimation.
Improvement of Teacher Effect Estimates
• The improvement is especially noticeable when
teacher effects and school effects are medium.
• The improvement decreases with the decrease of
teacher effects and school effects.
Further improvement
• As the change score is ultimately used in
evaluating teacher and school effects in several
value-added models, we explored direct
estimation of change score by including prior
achievement scores in the IRT modeling.
• An IRT model formulation for growth score is
presented and model parameter estimation is
explored.
• A multilevel formulation is presented.
• A mixture IRT version including growth score is
presented and model parameter estimation is
discussed.
Possible models
• Rasch model with direct modeling of
growth parameter
$$P_{ji}(x = 1 \mid b_i, \theta_j, \Delta\theta_j) = \frac{1}{1 + \exp(-(\theta_j + \Delta\theta_j - b_i))}$$
• Multilevel Rasch model with direct
modeling of growth parameter
$$p_{jmsi} = \frac{1}{1 + \exp[-(\theta_j + \Delta\theta_j + \theta_T + \theta_S - b_i)]}$$
Possible models
• Multilevel Rasch mixture model with direct
modeling of growth parameter with no
latent classes at teacher and school levels
$$p_{jmsic} = \frac{1}{1 + \exp[-(\theta_{jc} + \Delta\theta_{jc} + \theta_T + \theta_S - b_{ic})]}$$
• Multilevel Rasch mixture model with direct
modeling of growth parameter with latent
classes at teacher and school levels
$$p_{jmsic} = \frac{1}{1 + \exp[-(\theta_{jc} + \Delta\theta_{jc} + \theta_{Tc} + \theta_{Sc} - b_{ic})]}$$
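A minimal sketch of how class-specific probabilities in a mixture model of this kind combine into a marginal response probability. It shows the variant with no latent classes at the teacher and school levels. The mixing proportions, parameter values, and function names are hypothetical and are not taken from the presentation.

```python
import math

def rasch_growth_p(theta, delta, teacher, school, b):
    """Multilevel Rasch probability with a growth term added to prior ability."""
    return 1.0 / (1.0 + math.exp(-(theta + delta + teacher + school - b)))

def marginal_p(class_params, mixing, teacher=0.0, school=0.0):
    """Weight the class-specific probabilities by assumed mixing proportions."""
    return sum(
        weight * rasch_growth_p(cp["theta"], cp["delta"], teacher, school, cp["b"])
        for weight, cp in zip(mixing, class_params)
    )

# Two hypothetical latent classes with class-specific ability, growth, and difficulty.
classes = [
    {"theta": 0.0, "delta": 0.5, "b": 0.5},   # class 1: logit = 0
    {"theta": 0.0, "delta": 0.0, "b": 0.0},   # class 2: logit = 0
]
p_marginal = marginal_p(classes, mixing=[0.3, 0.7])
print(p_marginal)
```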
Simulation Study
• 30 items and 1000 examinees simulated
$$b \sim N(0, 1)$$
$$\theta \sim N(0, 1)$$
$$\Delta\theta \sim N(0, 0.5)$$
$$P_{ji}(x = 1 \mid b_i, \theta_j, \Delta\theta_j) = \frac{1}{1 + \exp(-(\theta_j + \Delta\theta_j - b_i))}$$
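A minimal sketch of how data for a simulation like this could be generated. It assumes that the second argument of each normal distribution is a standard deviation. The reported standard deviation of the true growth values (about 0.49) is consistent with that reading, but it remains an assumption, as do the seed and variable names.

```python
import numpy as np

rng = np.random.default_rng(2012)
n_persons, n_items = 1000, 30

b = rng.normal(0.0, 1.0, n_items)               # item difficulties
theta_prior = rng.normal(0.0, 1.0, n_persons)   # prior achievement
delta_theta = rng.normal(0.0, 0.5, n_persons)   # growth; 0.5 treated as an SD (assumption)

# Rasch probability with current ability = prior ability + growth.
logit = (theta_prior + delta_theta)[:, None] - b[None, :]
p = 1.0 / (1.0 + np.exp(-logit))
x = rng.binomial(1, p)                          # persons-by-items matrix of 0/1 responses

print(x.shape, round(float(x.mean()), 3))
```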
Model Parameter Estimation
Using the Markov chain Monte Carlo (MCMC)
method implemented in OpenBUGS 3.0.7
$$x_{ji} \sim \mathrm{Bernoulli}(p_{ji})$$
$$P_{ji}(x = 1 \mid b_i, \theta_j, \Delta\theta_j) = \frac{1}{1 + \exp(-(\theta_j + \Delta\theta_j - b_i))}$$
Priors: $b_i \sim \mathrm{dnorm}(0, 10)$, $\Delta\theta_j \sim N(0, 2)$
$\theta_j$ is a known parameter: a prior test score
MCMC runs
two chains used
Initial values were generated by the program
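The presentation estimates the model by MCMC in OpenBUGS. As a much simpler stand-in, the sketch below computes posterior-mode (MAP) estimates of the growth parameter by Newton steps, treating item difficulties and prior ability as known. This is a swapped-in technique for illustration, not the authors' procedure. The prior variance of 2 assumes that N(0, 2) denotes a variance, and the data are simulated under assumed settings.

```python
import numpy as np

rng = np.random.default_rng(7)
n_persons, n_items = 1000, 30
b = rng.normal(0.0, 1.0, n_items)               # treated as known (simplifying assumption)
theta_prior = rng.normal(0.0, 1.0, n_persons)   # known prior test score
delta_true = rng.normal(0.0, 0.5, n_persons)    # true growth, 0.5 treated as an SD
p_true = 1.0 / (1.0 + np.exp(-((theta_prior + delta_true)[:, None] - b[None, :])))
x = rng.binomial(1, p_true)

prior_var = 2.0                                 # assumes N(0, 2) denotes a variance
delta_hat = np.zeros(n_persons)
for _ in range(30):
    p = 1.0 / (1.0 + np.exp(-((theta_prior + delta_hat)[:, None] - b[None, :])))
    grad = (x - p).sum(axis=1) - delta_hat / prior_var       # gradient of the log posterior
    info = (p * (1.0 - p)).sum(axis=1) + 1.0 / prior_var     # negative second derivative
    delta_hat += np.clip(grad / info, -1.0, 1.0)             # damped Newton step

recovery = float(np.corrcoef(delta_hat, delta_true)[0, 1])
print(f"correlation between estimated and true growth: {recovery:.3f}")
```

A full Bayesian treatment would also estimate the item difficulties and report posterior uncertainty, which this shortcut omits.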
Convergence Check
Multiple criteria for convergence check
The required number of iterations for equilibrium
varied for different models
Number of burn-in iterations: 40,000
The model parameter inferences were made
based on the 10,000 monitoring iterations for
each chain with a total of 20,000 samples.
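The presentation does not name the convergence criteria used. One common criterion for multiple chains is the Gelman-Rubin potential scale reduction factor, sketched below for two chains of 10,000 draws. The draws are simulated for illustration and the function name is hypothetical.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for an array of shape (n_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    n = chains.shape[1]
    within = chains.var(axis=1, ddof=1).mean()          # mean within-chain variance
    between = n * chains.mean(axis=1).var(ddof=1)       # between-chain variance
    pooled = (n - 1) / n * within + between / n
    return float(np.sqrt(pooled / within))

rng = np.random.default_rng(3)
mixed = rng.normal(0.0, 1.0, size=(2, 10000))           # two chains sampling the same target
stuck = mixed + np.array([[0.0], [3.0]])                # second chain shifted away

r_mixed = gelman_rubin(mixed)
r_stuck = gelman_rubin(stuck)
print(f"well-mixed: {r_mixed:.3f}, not converged: {r_stuck:.3f}")
```

Values close to 1 are usually read as consistent with convergence, and values well above 1 indicate that the chains disagree.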
Growth parameter estimates
Descriptive Statistics
Variable             N      Minimum     Maximum    Mean         Std. Deviation
dtheta               1000   -1.426000   1.818000   -.00315860   .485358632
dtheta_true          1000   -1.346726   1.459898   -.01686716   .488377290
dif_theta            1000   -1.513055   2.036097    .00835295   .543427499
Valid N (listwise)   1000
Correlations: growth parameter estimates
Correlations
                                    dtheta    dtheta_true
dtheta        Pearson Correlation   1         .745**
              Sig. (2-tailed)                 .000
              N                     1000      1000
dtheta_true   Pearson Correlation   .745**    1
              Sig. (2-tailed)       .000
              N                     1000      1000
**. Correlation is significant at the 0.01 level (2-tailed).
Correlations
                                    dtheta_true   dif_theta
dtheta_true   Pearson Correlation   1             .616**
              Sig. (2-tailed)                     .000
              N                     1000          1000
dif_theta     Pearson Correlation   .616**        1
              Sig. (2-tailed)       .000
              N                     1000          1000
**. Correlation is significant at the 0.01 level (2-tailed).
Future Research
• Multilevel IRT model for direct estimation of growth (change) scores.
• A mixture multilevel IRT model for direct estimation of growth (change) scores.
• A constrained version of the model is possible by restricting the growth (change) scores to non-negative values.
• Extensions to other IRT models such as the 2PL, 3PL-c, 3PL-d, and 4P IRT models, and to the mixture versions of these models.
• Replicate the study and simulate more study conditions.
• Model fit indexes for selecting among competing models should be investigated under more extensive study conditions.
Thank you!