On the Merits of Planning and Planning for Missing Data*

*You’re a fool for not using planned
missing data design
Todd D. Little
University of Kansas
Director, Quantitative Training Program
Director, Center for Research Methods and Data Analysis
Director, Undergraduate Social and Behavioral Sciences Methodology Minor
Member, Developmental Psychology Training Program
crmda.KU.edu
Workshop presented 5-23-2012 @
University of Turku, Finland
Special Thanks to: Mijke Rhemtulla & Wei Wu
Road Map
• Learn about the different types of missing data
• Learn about ways in which the missing data process
can be recovered
• Understand why imputing missing data is not cheating
• Learn why NOT imputing missing data is more likely to lead
to errors in generalization!
• Learn about intentionally missing designs
Key Considerations
• Recoverability
  • Is it possible to recover what the sufficient statistics would have been if there were no missing data?
    • (sufficient statistics = means, variances, and covariances)
  • Is it possible to recover what the parameter estimates of a model would have been if there were no missing data?
• Bias
  • Are the sufficient statistics/parameter estimates systematically different from what they would have been had there not been any missing data?
• Power
  • Do we have the same or similar rates of power (1 – Type II error rate) as we would without missing data?
Types of Missing Data
• Missing Completely at Random (MCAR)
  • No association with unobserved variables (selective process) and no association with observed variables
• Missing at Random (MAR)
  • No association with unobserved variables, but may be related to observed variables
  • Random in the statistical sense of predictable
• Non-random (Selective) Missing (MNAR)
  • Some association with unobserved variables and maybe with observed variables
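The three mechanisms can be made concrete with a small simulation. The sketch below is not from the workshop: the variable names, the correlation of .60 between x and y, the 50% missingness rate, and the regression-based correction are all assumptions chosen for illustration. It deletes values of y under each mechanism and compares the complete-case mean of y with a mean that borrows information from the fully observed x.

```python
import numpy as np


def simulate(n=20_000, rho=0.6, rate=0.5, seed=0):
    """Return x (always observed), y (subject to missingness), and three masks.

    All settings here are illustrative assumptions, not values from the slides.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)
    masks = {
        # MCAR: a coin flip unrelated to anything
        "MCAR": rng.random(n) < rate,
        # MAR: y is missing when the OBSERVED variable x is high
        "MAR": x > np.quantile(x, 1 - rate),
        # MNAR: y is missing when y ITSELF (unobserved) is high
        "MNAR": y > np.quantile(y, 1 - rate),
    }
    return x, y, masks


def complete_case_mean(y, missing):
    """Mean of y using only the cases where y was observed."""
    return float(y[~missing].mean())


def regression_adjusted_mean(x, y, missing):
    """Fit y on x among observed cases, then average predictions over everyone."""
    slope, intercept = np.polyfit(x[~missing], y[~missing], deg=1)
    return float(np.mean(intercept + slope * x))


x, y, masks = simulate()
results = {
    name: (complete_case_mean(y, m), regression_adjusted_mean(x, y, m))
    for name, m in masks.items()
}
for name, (cc, adj) in results.items():
    print(f"{name}: complete-case mean = {cc:+.3f}, adjusted mean = {adj:+.3f}")
```

The population mean of y is 0. In this setup the complete-case mean stays near 0 only under MCAR, using x pulls the MAR estimate back toward 0, and the MNAR estimate remains biased because the cause of missingness is not in the model.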
Effects of imputing missing data

                        | No Association with         | An Association with
                        | Observed Variable(s)        | Observed Variable(s)
No Association with     | MCAR                        | MAR
Unobserved/Unmeasured   | • Fully recoverable         | • Partly to fully recoverable
Variable(s)             | • Fully unbiased            | • Less biased to unbiased
An Association with     | NMAR                        | MAR/NMAR
Unobserved/Unmeasured   | • Unrecoverable             | • Partly recoverable
Variable(s)             | • Biased (same bias as not  | • Same to unbiased
                        |   estimating)               |
Effects of imputing missing data

                        | No Association with         | An Association with         | An Association with
                        | ANY Observed Variable       | Analyzed Variables          | Unanalyzed Variables
No Association with     | MCAR                        | MAR                         | MAR
Unobserved/Unmeasured   | • Fully recoverable         | • Partly to fully           | • Partly to fully
Variable(s)             | • Fully unbiased            |   recoverable               |   recoverable
                        |                             | • Less biased to unbiased   | • Less biased to unbiased
An Association with     | NMAR                        | MAR/NMAR                    | MAR/NMAR
Unobserved/Unmeasured   | • Unrecoverable             | • Partly to fully           | • Partly to fully
Variable(s)             | • Biased (same bias as not  |   recoverable               |   recoverable
                        |   estimating)               | • Same to unbiased          | • Same to unbiased

Statistical Power: Will always be greater when missing data is imputed!
Modern Missing Data Analysis
MI or FIML
• In 1978, Rubin proposed Multiple Imputation (MI)
  • An approach especially well suited for use with large public-use databases.
  • First suggested in 1978 and developed more fully in 1987.
  • MI primarily uses the Expectation Maximization (EM) algorithm and/or the Markov Chain Monte Carlo (MCMC) algorithm.
• Beginning in the 1980s, likelihood approaches developed.
  • Multiple group SEM
  • Full Information Maximum Likelihood (FIML).
    • An approach well suited to more circumscribed models
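MI produces m completed datasets, so each parameter is estimated m times and the results have to be combined. The slides do not spell out the pooling step; the sketch below shows the standard combining rules usually attributed to Rubin (1987), with a hypothetical function name and made-up numbers used only for illustration.

```python
import numpy as np


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors.

    estimates: one parameter estimate per imputed dataset
    variances: the matching sampling variances (squared standard errors)
    Returns the pooled estimate and its total variance.
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()                  # pooled point estimate
    within = u.mean()                 # average within-imputation variance
    between = q.var(ddof=1)           # between-imputation variance
    total = within + (1 + 1 / m) * between
    return float(q_bar), float(total)


# Hypothetical results from m = 3 imputed datasets
pooled_estimate, pooled_variance = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
print(pooled_estimate, pooled_variance ** 0.5)
```

The between-imputation term is what carries the extra uncertainty due to missingness into the standard error.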
60% MAR correlation estimates with 1 PCA
auxiliary variable (r = .60)
Figure 7. Simulation results showing XY correlation estimates (with 95 and 99%
confidence intervals) associated with a 60% MAR Situation and 1 PCA auxiliary
variable.
Three-form design
• What goes in the Common Set?

Form | Common Set X | Variable Set A | Variable Set B | Variable Set C
1    | ¼ of items   | ¼ of items     | ¼ of items     | missing
2    | ¼ of items   | ¼ of items     | missing        | ¼ of items
3    | ¼ of items   | missing        | ¼ of items     | ¼ of items
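The layout in the table can be generated programmatically. This is a minimal sketch, assuming set sizes of 6, 5, 5, and 5 items (X, A, B, C) and random assignment of participants to forms; the names `ITEM_SETS`, `FORMS`, `assign_forms`, and `administered_mask` are hypothetical.

```python
import numpy as np

# Assumed item counts per set; the table above only says roughly a quarter each.
ITEM_SETS = {"X": 6, "A": 5, "B": 5, "C": 5}
# Which item sets each form contains (Form 1 = XAB, Form 2 = XAC, Form 3 = XBC).
FORMS = {1: "XAB", 2: "XAC", 3: "XBC"}


def item_labels():
    """Labels such as X1..X6, A1..A5, B1..B5, C1..C5."""
    return [f"{name}{i + 1}" for name, count in ITEM_SETS.items() for i in range(count)]


def assign_forms(n_participants, seed=0):
    """Randomly assign each participant to form 1, 2, or 3."""
    rng = np.random.default_rng(seed)
    return rng.integers(1, 4, size=n_participants)


def administered_mask(forms):
    """Boolean matrix (participants x items): True where the item is administered."""
    labels = item_labels()
    mask = np.zeros((len(forms), len(labels)), dtype=bool)
    for row, form in enumerate(forms):
        for col, label in enumerate(labels):
            mask[row, col] = label[0] in FORMS[int(form)]
    return mask


forms = assign_forms(300)
mask = administered_mask(forms)
print(mask.sum(axis=1)[:5])   # items answered by the first five participants

# Every pair of items is seen together on at least one form,
# so every covariance remains estimable.
one_of_each = administered_mask(np.array([1, 2, 3])).astype(int)
pair_counts = one_of_each.T @ one_of_each
```

Because form assignment is random and under the researcher's control, the resulting missingness is MCAR by design.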
Three-form design: Example
• 21 questions made up of 7 3-question subtests

Subtest           | Item
Demographics      | How old are you?
                  | Are you male or female?
                  | What is your occupation?
Musical Taste     | What is your favorite genre of music?
                  | Do you like to listen to music while you work?
                  | Do you prefer music played loud or softly?
Openness          | I have a rich vocabulary.
                  | I have excellent ideas.
                  | I have a vivid imagination.
Extraversion      | I start conversations.
                  | I am the life of the party.
                  | I am comfortable around people.
Neuroticism       | I get stressed out easily.
                  | I get irritated easily.
                  | I have frequent mood swings.
Conscientiousness | I am always prepared.
                  | I like order.
                  | I pay attention to details.
Agreeableness     | I am interested in people.
                  | I have a soft heart.
                  | I take time out for others.
Three-form design: Example
• Common Set (X): the Demographics and Musical Taste items
• Set A: the first item of each personality subtest (Openness, Extraversion, Neuroticism, Conscientiousness, Agreeableness)
• Set B: the second item of each personality subtest
• Set C: the third item of each personality subtest
Form 1 (XAB)                                   | Form 2 (XAC)                                   | Form 3 (XBC)
How old are you?                               | How old are you?                               | How old are you?
Are you male or female?                        | Are you male or female?                        | Are you male or female?
What is your occupation?                       | What is your occupation?                       | What is your occupation?
What is your favorite genre of music?          | What is your favorite genre of music?          | What is your favorite genre of music?
Do you like to listen to music while you work? | Do you like to listen to music while you work? | Do you like to listen to music while you work?
Do you prefer music played loud or softly?     | Do you prefer music played loud or softly?     | Do you prefer music played loud or softly?
I have a rich vocabulary.                      | I have a rich vocabulary.                      | I have excellent ideas.
I have excellent ideas.                        | I have a vivid imagination.                    | I have a vivid imagination.
I start conversations.                         | I start conversations.                         | I am the life of the party.
I am the life of the party.                    | I am comfortable around people.                | I am comfortable around people.
I get stressed out easily.                     | I get stressed out easily.                     | I get irritated easily.
I get irritated easily.                        | I have frequent mood swings.                   | I have frequent mood swings.
I am always prepared.                          | I am always prepared.                          | I like order.
I like order.                                  | I pay attention to details.                    | I pay attention to details.
I am interested in people.                     | I am interested in people.                     | I have a soft heart.
I have a soft heart.                           | I take time out for others.                    | I take time out for others.
Participant | Form | Age | Sex | Occupation   | Genre     | Work Music | Volume | Open1-3 | Extra1-3 | Neuro1-3 | Consc1-3 | Agree1-3
1           | 1    | 47  | F   | professor    | Classical | N          | loud   | 4 4 --  | 1 5 --   | 1 2 --   | 4 2 --   | 3 2 --
2           | 1    | 42  | F   | musician     | Funk      | N          | soft   | 1 3 --  | 2 2 --   | 5 3 --   | 4 1 --   | 2 1 --
3           | 1    | 27  | M   | student      | Jazz      | N          | soft   | 2 4 --  | 5 5 --   | 2 4 --   | 5 1 --   | 4 2 --
4           | 1    | 29  | M   | server       | Metal     | N          | soft   | 1 3 --  | 5 2 --   | 2 1 --   | 1 1 --   | 4 2 --
5           | 1    | 27  | M   | chef         | Rock      | N          | soft   | 1 4 --  | 5 1 --   | 2 2 --   | 5 3 --   | 2 2 --
6           | 2    | 21  | F   | painter      | Pop       | Y          | loud   | 4 -- 4  | 2 -- 1   | 1 -- 5   | 1 -- 5   | 5 -- 3
7           | 2    | 39  | F   | librarian    | Alt       | N          | loud   | 1 -- 4  | 4 -- 3   | 4 -- 3   | 4 -- 2   | 4 -- 3
8           | 2    | 22  | F   | server       | Ska       | N          | soft   | 4 -- 2  | 3 -- 3   | 3 -- 3   | 1 -- 2   | 5 -- 5
9           | 2    | 38  | M   | doctor       | Punk      | N          | loud   | 1 -- 3  | 2 -- 2   | 2 -- 4   | 4 -- 1   | 3 -- 2
10          | 2    | 29  | F   | statistician | Pop       | N          | loud   | 4 -- 5  | 3 -- 4   | 5 -- 4   | 3 -- 2   | 3 -- 1
11          | 3    | 28  | F   | chef         | Rock      | Y          | loud   | -- 3 3  | -- 5 5   | -- 5 4   | -- 3 3   | -- 2 5
12          | 3    | 25  | M   | nurse        | Rock      | N          | soft   | -- 4 5  | -- 2 2   | -- 2 5   | -- 4 5   | -- 3 5
13          | 3    | 29  | M   | lawyer       | Jazz      | Y          | soft   | -- 3 4  | -- 3 2   | -- 4 5   | -- 4 5   | -- 1 2
14          | 3    | 38  | F   | accountant   | Metal     | N          | soft   | -- 3 1  | -- 1 2   | -- 3 3   | -- 4 4   | -- 5 4
15          | 3    | 21  | F   | secretary    | Alt       | N          | loud   | -- 4 4  | -- 1 2   | -- 1 1   | -- 5 3   | -- 4 5
Expansions of 3-Form Design

Form | Item Set Order
1    | XAB
2    | XAC
3    | XBC
4    | AXB
5    | AXC
6    | BXC
7    | ABX
8    | ACX
9    | BCX

(Graham, Taylor, Olchowski, & Cumsille, 2006)
2-Method Planned Missing Design
[Path diagram: a Smoking factor and a Self-Report Bias factor, with the indicators Self-Report 1, Self-Report 2, CO, and Cotinine.]
2-Method Measurement
• Expensive Measure 1
  • Gold standard: highly valid (unbiased) measure of the construct under investigation
  • Problem: Measure 1 is time-consuming and/or costly to collect, so it is not feasible to collect from a large sample
• Inexpensive Measure 2
  • Practical: inexpensive and/or quick to collect on a large sample
  • Problem: Measure 2 is systematically biased so not ideal
2-Method Measurement
• e.g., measuring stress
  • Expensive Measure 1 = collect spit samples, measure cortisol
  • Inexpensive Measure 2 = survey querying stressful thoughts
• e.g., measuring intelligence
  • Expensive Measure 1 = WAIS IQ scale
  • Inexpensive Measure 2 = multiple choice IQ test
• e.g., measuring smoking
  • Expensive Measure 1 = carbon monoxide measure
  • Inexpensive Measure 2 = self-report
• e.g., Student Attention
  • Expensive Measure 1 = Classroom observations
  • Inexpensive Measure 2 = Teacher report
2-Method Measurement
• How it works
  • ALL participants receive Measure 2 (the cheap one)
  • A subset of participants also receive Measure 1 (the gold standard)
  • Using both measures (on a subset of participants) enables us to estimate and remove the bias from the inexpensive measure (for all participants) using a latent variable model
2-Method Measurement
• Example
  • Does child’s level of classroom attention in Grade 1 predict math ability in Grade 3?
  • Attention Measures
    • 1) Direct Classroom Assessment (2 items, N = 60)
    • 2) Teacher Report (2 items, N = 200)
  • Math Ability Measure, 1 item (test score, N = 200)
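The data layout this example implies can be sketched as follows. Only the sample sizes (200 and 60) and the variable roles come from the example; the loadings, the normal distributions, and the names `simulate_two_method`, `TR1`, `DA1`, and so on are assumptions for illustration. The sketch only generates data with the planned missingness; fitting the latent variable model itself would be done in SEM software.

```python
import numpy as np


def simulate_two_method(n_total=200, n_gold=60, seed=0):
    """Simulate a two-method planned missing data set (hypothetical loadings)."""
    rng = np.random.default_rng(seed)
    attention = rng.standard_normal(n_total)      # construct of interest
    teacher_bias = rng.standard_normal(n_total)   # systematic rater bias

    def noise():
        return rng.standard_normal(n_total)

    data = {
        # Cheap measure: reflects attention AND teacher bias
        "TR1": 0.7 * attention + 0.5 * teacher_bias + 0.5 * noise(),
        "TR2": 0.7 * attention + 0.5 * teacher_bias + 0.5 * noise(),
        # Gold standard: reflects attention only
        "DA1": 0.8 * attention + 0.6 * noise(),
        "DA2": 0.8 * attention + 0.6 * noise(),
        "Math": 0.5 * attention + noise(),
    }
    # A random subset receives the expensive measure; everyone else is missing by design.
    gold_rows = rng.choice(n_total, size=n_gold, replace=False)
    keep = np.zeros(n_total, dtype=bool)
    keep[gold_rows] = True
    for name in ("DA1", "DA2"):
        data[name] = np.where(keep, data[name], np.nan)
    return data


data = simulate_two_method()
counts = {name: int(np.isfinite(values).sum()) for name, values in data.items()}
print(counts)
```

Since the subset is chosen at random by the researcher, the missing direct assessments are MCAR by construction.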
[Path diagram, repeated over six build slides: the latent factor Attention (Grade 1) has the indicators Teacher Report 1 (TR1, N = 200), Teacher Report 2 (TR2, N = 200), Direct Assessment 1 (DA1, N = 60), and Direct Assessment 2 (DA2, N = 60), and is linked to Math Score (Grade 3) (N = 200); a separate Teacher Bias factor accompanies the teacher reports.]
Developmental time-lag model
• Use 2-time point data with variable time-lags to measure a growth trajectory + practice effects (McArdle & Woodcock, 1997)
Age
Student | T1   | T2
1       | 5;6  | 5;7
2       | 5;3  | 5;8
3       | 4;9  | 4;11
4       | 4;6  | 5;0
5       | 4;11 | 5;4
6       | 5;7  | 5;10
7       | 5;2  | 5;3
8       | 5;4  | 5;8

Time: 0 1 2 3 4 5 6
$Y_t = 1 \cdot I + B_t G + A_t P$

[Path diagram, built up over five slides: the occasions T0 to T6 load on the latent factors Intercept, Growth, and Practice.]
• Intercept: loadings fixed to 1 at T0 through T6
• Linear growth: Growth loadings $B_t$ = 0, 1, 2, 3, 4, 5, 6
• Constant Practice Effect: Practice loadings $A_t$ = 0, 1, 1, 1, 1, 1, 1
• Exponential Practice Decline: Practice loadings $A_t$ = 0, 1, .82, .67, .55, .45, .37
The Equations for Each Time Point

Constant Practice Effect     | Declining Practice Effect
$Y_{T0} = I$                 | $Y_{T0} = I$
$Y_{T1} = I + 1G + P$        | $Y_{T1} = I + 1G + 1.0P$
$Y_{T2} = I + 2G + P$        | $Y_{T2} = I + 2G + .82P$
$Y_{T3} = I + 3G + P$        | $Y_{T3} = I + 3G + .67P$
$Y_{T4} = I + 4G + P$        | $Y_{T4} = I + 4G + .55P$
$Y_{T5} = I + 5G + P$        | $Y_{T5} = I + 5G + .45P$
$Y_{T6} = I + 6G + P$        | $Y_{T6} = I + 6G + .37P$
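The per-occasion equations are the rows of a loading matrix with columns for Intercept, Growth, and Practice. The sketch below builds that matrix for both practice specifications. One assumption to flag: the declining weights listed above match exp(-0.2 (t - 1)) to two decimals, so the code uses a decay rate of 0.2; that rate is inferred from the numbers, not stated in the slides. The function names and the example factor values are hypothetical.

```python
import numpy as np


def loading_matrix(n_occasions=7, practice="constant", decay=0.2):
    """Rows are occasions T0..T6; columns are Intercept, Growth, Practice loadings."""
    t = np.arange(n_occasions)
    intercept = np.ones(n_occasions)      # 1 at every occasion
    growth = t.astype(float)              # B_t = 0, 1, 2, ...
    if practice == "constant":
        # No practice effect at T0, a full effect at every later occasion
        a = (t >= 1).astype(float)
    elif practice == "declining":
        # Full effect at T1, then exponential decline (decay rate is an inferred assumption)
        a = np.where(t >= 1, np.exp(-decay * (t - 1)), 0.0)
    else:
        raise ValueError("practice must be 'constant' or 'declining'")
    return np.column_stack([intercept, growth, a])


def expected_scores(i, g, p, practice="constant"):
    """Model-implied score at each occasion: Y_t = 1*I + B_t*G + A_t*P."""
    return loading_matrix(practice=practice) @ np.array([i, g, p], dtype=float)


constant = loading_matrix(practice="constant")
declining = loading_matrix(practice="declining")
print(np.round(declining[:, 2], 2))
print(expected_scores(10, 2, 3, practice="constant"))
```

With hypothetical values I = 10, G = 2, and P = 3, the constant-practice rows reproduce the left-hand column of equations term by term.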
Developmental time-lag model
• Summary
  • 2 measured time points are formatted according to time-lag
  • This formatting allows a growth-curve to be fit, measuring growth and practice effects
Age at each grade
Student | K    | 1    | 2
1       | 5;6  | 6;7  | 7;3
2       | 5;3  | 6;0  | 7;4
3       | 4;9  | 5;11 | 6;10
4       | 4;6  | 5;5  | 6;4
5       | 4;11 | 5;9  | 6;10
6       | 5;7  | 6;7  | 7;5
7       | 5;2  | 6;1  | 7;3
8       | 5;4  | 6;5  | 7;6

Age bins: 4;6-4;11 | 5;0-5;5 | 5;6-5;11 | 6;0-6;5 | 6;6-6;11 | 7;0-7;5 | 7;6-7;11
• Out of 3 waves, we create 7 waves of data with high missingness
• Allows for more fine-tuned age-specific growth modeling
• Even high amounts of missing data are not typically a problem for estimation

Age
Student | 4;6-4;11 | 5;0-5;5 | 5;6-5;11 | 6;0-6;5 | 6;6-6;11 | 7;0-7;5 | 7;6-7;11
1       |          |         | 5;6      |         | 6;7      | 7;3     |
2       |          | 5;3     |          | 6;0     |          | 7;4     |
3       | 4;9      |         | 5;11     |         | 6;10     |         |
4       | 4;6      | 5;5     |          | 6;4     |          |         |
5       | 4;11     |         | 5;9      |         | 6;10     |         |
6       |          |         | 5;7      |         | 6;7      | 7;5     |
7       |          | 5;2     |          | 6;1     |          | 7;3     |
8       |          | 5;4     |          | 6;5     |          |         | 7;6
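The restructuring from 3 grade-based waves to 7 age-based waves is a binning step. The sketch below uses the eight students' ages from the slides (years;months) and assumes half-year bins starting at 4;6; the function names are hypothetical, and each cell stores the age in months as a stand-in for the score that would be placed there in real data.

```python
import numpy as np

# Ages at grades K, 1, and 2, copied from the table in the slides (years;months)
AGES = {
    1: ["5;6", "6;7", "7;3"],
    2: ["5;3", "6;0", "7;4"],
    3: ["4;9", "5;11", "6;10"],
    4: ["4;6", "5;5", "6;4"],
    5: ["4;11", "5;9", "6;10"],
    6: ["5;7", "6;7", "7;5"],
    7: ["5;2", "6;1", "7;3"],
    8: ["5;4", "6;5", "7;6"],
}
FIRST_BIN_START = 4 * 12 + 6   # 4;6 expressed in months
BIN_WIDTH = 6                  # half-year bins
N_BINS = 7


def to_months(age):
    """Convert 'years;months' to a number of months."""
    years, months = age.split(";")
    return int(years) * 12 + int(months)


def bin_label(index):
    """Label such as '4;6-4;11' for the bin at the given index."""
    start = FIRST_BIN_START + index * BIN_WIDTH
    end = start + BIN_WIDTH - 1
    return f"{start // 12};{start % 12}-{end // 12};{end % 12}"


def restructure(ages):
    """Students x age-bin matrix; NaN marks cells that are missing by design."""
    students = sorted(ages)
    wide = np.full((len(students), N_BINS), np.nan)
    for row, student in enumerate(students):
        for age in ages[student]:
            col = (to_months(age) - FIRST_BIN_START) // BIN_WIDTH
            wide[row, col] = to_months(age)   # placeholder for the observed score
    return wide


wide = restructure(AGES)
labels = [bin_label(i) for i in range(N_BINS)]
missing_rate = float(np.isnan(wide).mean())
print(labels)
print(f"proportion missing: {missing_rate:.2f}")
```

Each student contributes 3 of the 7 columns, so more than half of the cells are empty, which is the "high missingness" the slide refers to.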
Growth-Curve Design

Group | Time 1  | Time 2  | Time 3  | Time 4  | Time 5
1     | x       | x       | x       | x       | x
2     | x       | x       | x       | x       | missing
3     | x       | x       | x       | missing | x
4     | x       | x       | missing | x       | x
5     | x       | missing | x       | x       | x
6     | missing | x       | x       | x       | x
Growth Curve Design II

Group | Time 1  | Time 2  | Time 3  | Time 4  | Time 5
1     | x       | x       | x       | x       | x
2     | x       | x       | x       | missing | missing
3     | x       | x       | missing | x       | missing
4     | x       | missing | x       | x       | missing
5     | missing | x       | x       | x       | missing
6     | x       | x       | missing | missing | x
7     | x       | missing | x       | missing | x
8     | missing | x       | x       | missing | x
9     | x       | missing | missing | x       | x
10    | missing | x       | missing | x       | x
11    | missing | missing | x       | x       | x
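Both growth-curve designs follow one rule: a complete-data group plus one group for every way of dropping a fixed number of waves. A minimal sketch, with a hypothetical function name, that enumerates those groups (the group order differs from the tables):

```python
from itertools import combinations


def planned_missing_patterns(n_waves=5, n_missing=1):
    """Return observation patterns (True = measured) for a wave-missing design.

    The first pattern is the complete-data group; the rest drop every
    possible combination of n_missing waves.
    """
    patterns = [tuple([True] * n_waves)]
    for dropped in combinations(range(n_waves), n_missing):
        patterns.append(tuple(wave not in dropped for wave in range(n_waves)))
    return patterns


design_1 = planned_missing_patterns(n_missing=1)   # 1 complete group + 5 groups
design_2 = planned_missing_patterns(n_missing=2)   # 1 complete group + 10 groups

for group, pattern in enumerate(design_2, start=1):
    print(group, " ".join("x" if seen else "missing" for seen in pattern))
```

Dropping one wave gives the 6 groups of the first design, and dropping two gives the 11 groups of Design II, since there are 10 ways to choose 2 of 5 waves.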
The Impact of Auxiliary Variables
• Consider the following Monte Carlo simulation:
  • 60% MAR (i.e., Aux1) missing data
  • 1,000 samples of N = 100
Excluding A Correlate of Missingness
Simulation Results Showing the Bias Associated with Omitting a Correlate of Missingness.
MNAR improvements
Simulation Results Showing the Bias Reduction Associated with Including Auxiliary Variables in a MNAR Situation.
Improvement in power relative to the power of a model with no auxiliary variables.
Simulation results showing the relative power associated with including auxiliary variables in a MCAR Situation.
PCA Auxiliary Variables
• Use PCA to reduce the dimensionality of the auxiliary variables in a data set.
• A new, smaller set of auxiliary variables is created (e.g., principal components) that contains all the useful information (both linear and non-linear) in the original data set.
• These principal component scores are then used to inform the missing data handling procedure (i.e., FIML, MI).
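The reduction step can be sketched with a plain SVD. This is an illustration under stated assumptions, not the workshop's procedure: the auxiliary variables are simulated, they are assumed to be complete, and only linear information is captured (non-linear information would need extra columns such as squares or products before the PCA). The function name `pca_scores` is hypothetical.

```python
import numpy as np


def pca_scores(aux, n_components=1):
    """Return principal component scores and the proportion of variance explained."""
    aux = np.asarray(aux, dtype=float)
    # Standardize so every auxiliary variable contributes on the same scale
    z = (aux - aux.mean(axis=0)) / aux.std(axis=0, ddof=1)
    # Rows of vt are the component directions, ordered by explained variance
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = z @ vt[:n_components].T
    explained = (s**2)[:n_components] / np.sum(s**2)
    return scores, explained


# Hypothetical data: 8 auxiliary variables sharing one common source
rng = np.random.default_rng(0)
common = rng.standard_normal((500, 1))
aux = 0.8 * common + 0.6 * rng.standard_normal((500, 8))

scores, explained = pca_scores(aux, n_components=1)
print(scores.shape, np.round(explained, 2))
```

The single column in `scores` would then be added to the imputation or FIML model in place of the 8 original auxiliary variables.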
The Use of PCA Auxiliary Variables
• Consider a series of simulations:
  • MCAR, MAR, MNAR (10-60%) missing data
  • 1,000 samples of N = 50-1000
60% MAR correlation estimates with no auxiliary variables
Simulation results showing XY correlation estimates (with 95 and 99% confidence intervals) associated with a 60% MAR Situation.
Bias – Linear MAR process
ρAux,Y = .60; 60% MAR
Non-Linear Missingness
Bias – Non-Linear MAR process
ρAux,Y = .60; 60% non-linear MAR
Bias
ρAux,Y = .60; 60% MAR
60% MAR correlation estimates with all possible auxiliary variables (r = .60)
Simulation results showing XY correlation estimates (with 95 and 99% confidence intervals) associated with a 60% MAR Situation and 8 auxiliary variables.
60% MAR correlation estimates with 1 PCA auxiliary variable (r = .60)
Simulation results showing XY correlation estimates (with 95 and 99% confidence intervals) associated with a 60% MAR Situation and 1 PCA auxiliary variable.
Auxiliary Variable Power Comparison
[Figure legend: 1 PCA Auxiliary; All 8 Auxiliary Variables; 1 Auxiliary]
Faster and more reliable convergence
Summary
• Including principal component auxiliary variables in the imputation model improves parameter estimation
  • compared to the absence of auxiliary variables, and
  • in most cases beyond the improvement from typical auxiliary variables, particularly with the nonlinear MAR type of missingness.
• Principal component auxiliary variables improve missing data handling procedures when the number of potential auxiliary variables is beyond a practical limit.
www.quant.ku.edu
Simple Significance Testing with MI
• Generate multiply imputed datasets (m).
• Calculate a single covariance matrix on all N*m observations.
  • By combining information from all m datasets, this matrix should represent the best estimate of the population associations.
• Run the Analysis model on this single covariance matrix and use the resulting estimates as the basis for inference and hypothesis testing.
  • The fit function from this approach should be the best basis for making inferences about model fit and significance.
• Using a Monte Carlo Simulation, we test the hypothesis that this approach is reasonable.
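The first two steps amount to stacking the m imputed datasets and computing one covariance matrix on the stacked rows. A minimal sketch, assuming each imputed dataset is an N x p array; the function name and the random stand-in data are hypothetical, and the imputation and the model fitting themselves are outside the sketch.

```python
import numpy as np


def stacked_covariance(imputed_datasets):
    """Stack m imputed datasets (each N x p) and return one p x p covariance matrix."""
    stacked = np.vstack(imputed_datasets)      # (N * m) x p
    return np.cov(stacked, rowvar=False), stacked.shape[0]


# Stand-ins for m = 5 imputed versions of one N = 100, p = 4 dataset
rng = np.random.default_rng(0)
base = rng.standard_normal((100, 4))
imputed = [base + 0.1 * rng.standard_normal(base.shape) for _ in range(5)]

cov_matrix, n_rows = stacked_covariance(imputed)
print(cov_matrix.shape, n_rows)
```

The resulting matrix is what the Analysis model would be run on; the stacked file has N*m rows, while the original sample size is still N.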
Population Model
[Path diagram: two factors, Factor A (indicators A1 to A10) and Factor B (indicators B1 to B10), plus Covariates 1 to 3; the parameter values shown on the diagram are 1, .6, .5, .3, .2, .1, and .05.]
Analysis Model
[Path diagram: Factor A with loadings λ1,1 to λ10,1 on A1 to A10 and Factor B with loadings λ11,2 to λ20,2 on B1 to B10; residual variances θ1,1 to θ20,20; factor variances fixed to 1*; factor covariance ψ2,1 (0*)†.]
†: ψ2,1 is fixed to zero to parameterize the alternative model.
Results: Change in Chi-squared Test
Percent Relative Bias

N   | PM | SuperMatrix | Naive
100 | 10 | 0.893       | 1.079
100 | 30 | 4.399       | 5.319
100 | 50 | 16.294      | 19.517
250 | 10 | 0.489       | 0.570
250 | 30 | 1.829       | 2.176
250 | 50 | 8.072       | 9.269
500 | 10 | 0.630       | 0.669
500 | 30 | 1.993       | 2.175
500 | 50 | 6.202       | 6.764

Note: The x-axis of the figure above represents levels of PM nested within sample size.
Thanks for your attention!
Questions?
Update
Dr. Todd Little is currently at
Texas Tech University
Director, Institute for Measurement, Methodology, Analysis and Policy (IMMAP)
Director, “Stats Camp”
Professor, Educational Psychology and Leadership
Email: yhat@ttu.edu
IMMAP (immap.educ.ttu.edu)
Stats Camp (Statscamp.org)