Statistical Analysis of Repeated-Measures Data

Partially Missing At Random and
Ignorable Inferences for Parameter
Subsets with Missing Data
Roderick Little
Rennes 2015
Outline
• Inference with missing data: Rubin's (1976) paper on
conditions for ignoring the missing-data mechanism
• Rubin’s standard conditions are sufficient but not
necessary: example
• Propose definitions of partially MAR, ignorability for
likelihood (and Bayes) inference for subsets of
parameters (Little and Zangeneh, 2013)
• Application: Subsample ignorable methods for regression
with missing covariates (Little and Zhang, 2011)
• Joint work with Nanhua Zhang, Sahar Zangeneh
Rubin (1976 Biometrika)
• Landmark paper (5000+ citations, after being
rejected by many journals!)
– I wrote my first referee’s report (11 pages!), and an
obscure discussion on ancillarity
• Modeled the missing data mechanism by
treating missingness indicators as random
variables, assigning them a distribution
• Sufficient conditions under which missing data
mechanism can be ignored for likelihood and
frequentist inference about parameters
– Focus here on likelihood, Bayes
Ignoring the mechanism
$D$ = data with no missing values, $D_{obs}$ observed, $D_{mis}$ missing
$R$ = response indicator matrix
$f_{D,R}(D, R \mid \theta, \psi) = f_D(D \mid \theta)\, f_{R|D}(R \mid D, \psi)$
• Full likelihood:
$L(\theta, \psi \mid D_{obs}, R) = \text{const.} \times \int f_D(D \mid \theta)\, f_{R|D}(R \mid D, \psi)\, dD_{mis}$
• Likelihood ignoring mechanism:
$L_{ign}(\theta \mid D_{obs}, R) = \text{const.} \times \int f_D(D \mid \theta)\, dD_{mis}$
• Missing data mechanism can be ignored for
likelihood inference when
$L(\theta, \psi \mid D_{obs}, R) = L_{ign}(\theta \mid D_{obs}, R) \times L_{rest}(\psi \mid D_{obs}, R)$
Rubin’s sufficient conditions for
ignoring the mechanism
• Missing data mechanism can be ignored for
likelihood inference when
– (a) the missing data are missing at random (MAR):
$f_{R|D}(R \mid D_{obs}, D_{mis}, \psi) = f_{R|D}(R \mid D_{obs}, \psi)$ for all $D_{mis}, \psi$
– (b) distinctness of the parameters of the data model
and the missing-data mechanism:
$\Omega_{(\theta,\psi)} = \Omega_\theta \times \Omega_\psi$; for Bayes, $\theta$ and $\psi$ a-priori independent
• MAR is the key condition: without (b),
inferences are valid but not fully efficient
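The step from MAR to the factored likelihood can be written out in one line. The following is a sketch, with $\theta$ denoting the data-model parameters and $\psi$ the mechanism parameters (symbol names assumed here for the derivation): under MAR the mechanism density does not involve $D_{mis}$, so it moves outside the integral.

```latex
\begin{align*}
L(\theta,\psi \mid D_{obs}, R)
 &= \text{const.} \times \int f_D(D \mid \theta)\,
    f_{R|D}(R \mid D_{obs}, D_{mis}, \psi)\, dD_{mis} \\
 &= \text{const.} \times f_{R|D}(R \mid D_{obs}, \psi)
    \int f_D(D \mid \theta)\, dD_{mis}
    && \text{(by MAR)} \\
 &= L_{ign}(\theta \mid D_{obs}, R) \times L_{rest}(\psi \mid D_{obs}, R).
\end{align*}
```

Distinctness is what then allows the factor involving only $\psi$ to be dropped without losing information about $\theta$.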
More on Rubin (1976)
• Seaman et al. (2013) propose a more complex
but precise notation
• Distinguish between “direct” likelihood
inference and “frequentist” likelihood inference
– “Realized MAR” sufficient for direct likelihood
inference – R depends only on realized observed
data
– “Everywhere MAR” sufficient for frequentist
likelihood inference: MAR condition needs to hold
for observed values in future repeated sampling
– Rubin (1976) uses term “always MAR”. See also
Mealli and Rubin (2015, forthcoming)
“Sufficient for ignorable” is not the
same as “ignorable”
• These definitions have come to define ignorability (e.g.
Little and Rubin 2002)
• However, Rubin (1976) described (a) and (b) as the
"weakest simple and general conditions under which it
is always appropriate to ignore the process that causes
missing data".
• These conditions are not necessary for ignoring the
mechanism in all situations.
MAR + distinctness $\Rightarrow$ ignorable
ignorable $\not\Rightarrow$ MAR + distinctness
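Example 1: Nonresponse with auxiliary data
$D_{obs} = (D_{resp}, D_{aux})$
$D_{resp} = \{(y_{i1}, y_{i2}),\ i = 1, \ldots, m\}$, $D_{aux} = \{y^*_{j1},\ j = 1, \ldots, n\}$
$D_{aux}$ includes the respondent values of $Y_1$,
but we do not know which they are.
$(Y_1, Y_2) \sim_{ind} f(y_{i1}, y_{i2} \mid \theta)$
$\Pr(r_i = 1 \mid y_{i1}, y_{i2}, \psi) = g(y_{i1}, \psi)$
[Figure: two data files that are not linked. The auxiliary file holds $Y_1$ for the sample (or the whole population, $N$); the survey file holds $R$, $Y_1$, $Y_2$, with $Y_1$ and $Y_2$ observed when $R = 1$ and missing (?) when $R = 0$.]
Not MAR -- $y_{i1}$ missing for nonrespondents $i$
But... mechanism is ignorable, does not need to be modeled:
Marginal distribution of $Y_1$ estimated from $D_{aux}$
Conditional of $Y_2$ given $Y_1$ estimated from $D_{resp}$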
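A small simulation can illustrate the claim. This is only a sketch under assumed settings that are not in the slides: bivariate normal data with correlation 0.7, a logistic response model in $Y_1$, and the mean of $Y_2$ as the target. It combines the marginal of $Y_1$ from the unlinked auxiliary file with the regression of $Y_2$ on $Y_1$ from respondents, and compares the result with the naive respondent mean.

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho = 20000, 0.7  # assumed sample size and correlation (illustrative only)

# Complete data: (Y1, Y2) bivariate normal, means 0, variances 1, correlation rho
y1 = rng.normal(size=n)
y2 = rho * y1 + np.sqrt(1 - rho**2) * rng.normal(size=n)

# Response depends on Y1 only: Pr(r = 1 | y1, y2) = expit(y1)
resp = rng.random(n) < 1 / (1 + np.exp(-y1))

# D_resp: linked (y1, y2) pairs for respondents
y1_resp, y2_resp = y1[resp], y2[resp]
# D_aux: all n values of Y1, shuffled so they cannot be linked to respondents
y1_aux = rng.permutation(y1)

# Conditional of Y2 given Y1 from D_resp (least squares line),
# marginal of Y1 from D_aux; no model for the response mechanism is fitted
slope, intercept = np.polyfit(y1_resp, y2_resp, 1)
mean_y2_aux = intercept + slope * y1_aux.mean()

# Naive respondent mean of Y2, which ignores the selection on Y1
mean_y2_cc = y2_resp.mean()

print(f"true mean of Y2: 0, aux-based: {mean_y2_aux:.3f}, respondents only: {mean_y2_cc:.3f}")
```

The auxiliary-based estimate should sit close to the true mean of zero, while the respondent-only mean is pulled upward because respondents tend to have larger $Y_1$.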
MAR, ignorability for parameter subsets
• MAR and ignorability are defined in terms of
the complete set of parameters in the data
model for D
• It would be useful to have a definition of MAR
that applies to subsets of parameters of
substantive interest.
• Example: inference for regression parameters
might be “partially MAR” when parameters for
a model for full data are not.
MAR, ignorability for parameter subsets
$\theta = (\theta_1, \theta_2)$, $\psi$ = parameters of model for mechanism
Define the data to be partially MAR for
likelihood inference about $\theta_1$, denoted P-MAR($\theta_1$), if:
$L(\theta_1, \theta_2, \psi \mid D_{obs}, R) = L_{ign}(\theta_1 \mid D_{obs}) \times L_{rest}(\theta_2, \psi \mid D_{obs}, R)$
for all $\theta_1, \theta_2, \psi$
Data are IGN($\theta_1$) if P-MAR($\theta_1$) and $\theta_1$ and $(\theta_2, \psi)$ distinct
MAR, ignorability for parameter subsets
Special case where $\theta_1 = \theta$, all the parameters:
Data are P-MAR($\theta$) if:
$L(\theta, \psi \mid D_{obs}, R) = L_{ign}(\theta \mid D_{obs}) \times L_{rest}(\psi \mid D_{obs}, R)$
for all $\theta, \psi$
A consequence of (but does not imply) Rubin's MAR condition
IGN($\theta$) if MAR($\theta$) and $\theta$ and $\psi$ distinct
Partial MAR given a function of
mechanism
Harel and Schafer (2009) define a different kind of Partial MAR:
Mechanism is partially MAR given $g(R)$ if:
$P(R \mid Y_{obs}, Y_{mis}, g(R), \theta, \psi) = P(R \mid Y_{obs}, g(R), \theta, \psi)$
for all $\theta, \psi, R, Y_{obs}$
Here "partial" relates to the mechanism;
in our definition "partial" relates to the parameters.
These ideas seem quite distinct
Example 1: Auxiliary Survey Data
$D_{obs} = (D_{resp}, D_{aux})$
$D_{resp} = \{(y_{i1}, y_{i2}),\ i = 1, \ldots, m\}$, $D_{aux} = \{y^*_{j1},\ j = 1, \ldots, n\}$
$D_{aux}$ includes the respondent values of $Y_1$,
but we do not know which they are.
$D = \{(y_{i1}, y_{i2}),\ i = 1, \ldots, n\}$
$(Y_1, Y_2) \sim f(y_{i1}, y_{i2} \mid \theta)$
$\Pr(r_i = 1 \mid y_{i1}, y_{i2}, \psi) = g(y_{i1}, \psi)$
[Figure: auxiliary file with $Y_1$ and survey file with $R$, $Y_1$, $Y_2$ (values missing when $R = 0$); the two files are not linked.]
Easy to show that data are P-MAR($\theta$),
and IGN($\theta$) if $\theta$, $\psi$ are distinct
Ex. 2: MNAR Monotone Bivariate Data
$D = \{(y_{i1}, y_{i2}),\ i = 1, \ldots, n\}$
$D_{obs} = \{(y_{i1}, y_{i2}),\ i = 1, \ldots, m\}$ and $\{y_{i1},\ i = m+1, \ldots, n\}$
$(Y_1, Y_2) \sim f(y_{i1}, y_{i2} \mid \theta) = f(y_{i1} \mid \theta_1) \times f(y_{i2} \mid y_{i1}, \theta_2)$
$\Pr(r_{i2} = 1 \mid y_{i1}, y_{i2}, \psi) = g(y_{i1}, y_{i2}, \psi)$ (MNAR)
[Figure: monotone pattern with columns $M$, $Y_1$, $Y_2$; $Y_1$ fully observed, $Y_2$ missing (?) for some cases.]
COMMENT: Clearly, inference about parameters $\theta_1$
of the marginal distribution of $Y_1$ can ignore mechanism,
since $Y_1$ has no missing values.
In proposed definition, this mechanism is P-MAR($\theta_1$),
and IGN($\theta_1$) if $\theta_1$ and $(\theta_2, \psi)$ distinct
[Note distinctness for IGN($\theta_1$)]
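The factorization behind this comment can be written out explicitly. The following is a sketch in which $\theta_1$, $\theta_2$ index the marginal of $Y_1$ and the conditional of $Y_2$ given $Y_1$, and $\psi$ indexes the mechanism (symbol names assumed here); cases $i = 1, \ldots, m$ have $Y_2$ observed.

```latex
\begin{align*}
L(\theta_1,\theta_2,\psi \mid D_{obs}, R)
 &= \prod_{i=1}^{n} f(y_{i1} \mid \theta_1) \\
 &\quad \times \prod_{i=1}^{m} f(y_{i2} \mid y_{i1}, \theta_2)\,
        g(y_{i1}, y_{i2}, \psi) \\
 &\quad \times \prod_{i=m+1}^{n} \int f(y_{i2} \mid y_{i1}, \theta_2)\,
        \bigl[1 - g(y_{i1}, y_{i2}, \psi)\bigr]\, dy_{i2} \\
 &= L_{ign}(\theta_1 \mid D_{obs}) \times L_{rest}(\theta_2, \psi \mid D_{obs}, R).
\end{align*}
```

Only the first product involves $\theta_1$, which is the P-MAR($\theta_1$) factorization; $\theta_2$ and $\psi$ remain entangled in the remaining factors because the mechanism is MNAR for $Y_2$.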
More generally…
$(Y_1, R^{(1)}), (Y_2, R^{(2)})$ blocks of incomplete variables, and
$f(y_{1i}, y_{2i}, r_i^{(1)}, r_i^{(2)}) = f_1(y_{1i} \mid \theta_1)\, \Pr(r_i^{(1)} \mid y_{1i}, \psi_1)$
$\times\, f_2(y_{2i} \mid y_{1i}, \theta_2)\, \Pr(r_i^{(2)} \mid r_i^{(1)}, y_{1i}, y_{2i}, \psi_2)$
Assume: $\Pr(r_i^{(1)} \mid y_{1i}; \psi_1) = g_1(y_{1,obs,i}, \psi_1)$ for all $y_{1,mis,i}$,
$\Pr(r_i^{(2)} \mid r_i^{(1)}, y_{1i}, y_{2i}; \psi_2) = g_2(r_i^{(1)}, y_{1i}, y_{2i}, \psi_2)$,
Mechanism is P-MAR($\theta_1$), IGN($\theta_1$) if $\theta_1$ and
$(\theta_2, \psi_1, \psi_2)$ are distinct
Application: missing data in covariates
Target: regression of Y on X, Z; missing data on X
[Figure: data matrix with columns Z, X, Y; X has missing values (?).]
Ignorable likelihood (IL) methods include information for the
regression in the incomplete cases
(particularly intercept and coefficients of
Z) and are valid assuming MAR:
Pr(X missing) = g(Z, Y)
BUT: if Pr(X missing) = g(Z, X),
complete-case (CC) analysis is consistent, but IL methods (or weighted CC)
are inconsistent since the mechanism is not MAR
Simulations favoring IL often generate data under MAR,
hence are biased against CC
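The contrast between the two mechanisms can be checked numerically. The sketch below uses assumed settings that are not in the slides (independent standard normal Z and X, unit coefficients, logistic missingness models) and looks only at the complete-case estimate of the coefficient of X: it should be close to the true value when missingness depends on (Z, X), and visibly off when missingness depends on (Z, Y).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000  # assumed sample size (illustrative only)
z = rng.normal(size=n)
x = rng.normal(size=n)
y = 1 + z + x + rng.normal(size=n)  # true coefficient of x is 1


def expit(t):
    return 1 / (1 + np.exp(-t))


def cc_slope_x(observed):
    """OLS of y on (1, z, x) using only the cases where x is observed."""
    design = np.column_stack([np.ones(observed.sum()), z[observed], x[observed]])
    coef, *_ = np.linalg.lstsq(design, y[observed], rcond=None)
    return coef[2]


# Mechanism A: Pr(X missing) = g(Z, X); depends on x itself, so not MAR
obs_a = rng.random(n) >= expit(z + x)
# Mechanism B: Pr(X missing) = g(Z, Y); MAR, but depends on the outcome
obs_b = rng.random(n) >= expit(z + y - 1)

slope_a = cc_slope_x(obs_a)
slope_b = cc_slope_x(obs_b)
print(f"CC slope under g(Z, X): {slope_a:.3f}; under g(Z, Y): {slope_b:.3f}")
```

A simulation that only generates data as in mechanism B would therefore show CC at its worst, which is the point of the remark about simulations being biased against CC.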
More general missing data in X

Pattern   Observation, i    z_i   x_i   y_i   R_{x_i}
P1        i = 1,…,m         √     √     √     u_x = (1,...,1)
P2        i = m+1,…,n       √     ?     √     ≠ u_x

Key: √ denotes observed, ? denotes observed or missing; x_i could be a vector
Ignorable Likelihood methods
Ignorable likelihood inferences (IL) for model $p(x_i, y_i \mid z_i, \theta)$ are based on
$L_{ign}(\theta) = \text{const.} \times \prod_{i=1}^{n} p(x_{obs,i}, y_i \mid z_i, \theta)$
$\theta_1 = \theta_1(\theta)$ = parameters of regression of Y on X, Z
ML: $\hat\theta_1 = \theta_1(\hat\theta)$
Bayes: draw $\theta_1^{(d)} = \theta_1(\theta^{(d)})$
Multiple imputation: draw $X_{mis}^{(d)} \sim P(X_{mis} \mid \text{data})$,
apply multiple imputation combining rules to estimates of $\theta_1$
All these methods assume MAR:
$p(R_{x_i} \mid z_i, x_i, y_i, \psi) = p(R_{x_i} \mid z_i, x_{obs,i}, y_i, \psi)$ for all $x_{mis,i}$
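For the multiple imputation route, the combining step can be sketched as follows. This is a minimal illustration of the standard combining rules for a scalar parameter; the function name and the example numbers are hypothetical, and it takes the per-dataset estimates and variances as given rather than producing the imputations.

```python
import numpy as np


def combine_mi(estimates, variances):
    """Combine a scalar estimate across D imputed datasets.

    estimates: point estimates, one per imputed dataset
    variances: their estimated sampling variances
    Returns the combined estimate and its total variance.
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    d = q.size
    q_bar = q.mean()              # combined point estimate
    within = u.mean()             # average within-imputation variance
    between = q.var(ddof=1)       # between-imputation variance
    total = within + (1 + 1 / d) * between
    return q_bar, total


# Hypothetical estimates of one regression coefficient from D = 3 imputations
est, total_var = combine_mi([1.0, 2.0, 3.0], [0.1, 0.1, 0.1])
print(est, total_var)
```

The between-imputation term is what carries the extra uncertainty due to the missing values; with a single imputation it cannot be estimated at all.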
CC analysis
Assume: missingness of X depends on covariates, not outcomes:
$p(R_{x_i} = u_x \mid z_i, x_i, y_i, \psi) = p(R_{x_i} = u_x \mid z_i, x_i, \psi)$ for all $y_i$   (*)
(MNAR mechanism: missingness can depend on missing values of X)
With $\phi = (\theta, \psi)$:
$L_{full}(\phi) = \prod_{i=1}^{n} \int p(R_{x_i}, x_i, y_i \mid z_i, \phi)\, dx_{mis,i}$
$= \prod_{i=1}^{m} p(R_{x_i} = u_x, x_i, y_i \mid z_i, \phi) \times \prod_{i=m+1}^{n} \int p(R_{x_i}, x_i, y_i \mid z_i, \phi)\, dx_{mis,i}$
$= \prod_{i=1}^{m} p(y_i \mid z_i, x_i, R_{x_i} = u_x, \phi)\, p(R_{x_i} = u_x, x_i \mid z_i, \phi) \times \prod_{i=m+1}^{n} \int p(R_{x_i}, x_i, y_i \mid z_i, \phi)\, dx_{mis,i}$
$= \prod_{i=1}^{m} p(y_i \mid z_i, x_i, \theta_1) \times \prod_{i=1}^{m} p(R_{x_i} = u_x, x_i \mid z_i, \phi) \times \prod_{i=m+1}^{n} \int p(R_{x_i}, x_i, y_i \mid z_i, \phi)\, dx_{mis,i}$   (follows from (*))
$= L_{cc}(\theta_1) \times L_{rest}(\phi)$
Hence data are P-MAR($\theta_1$), but not IGN($\theta_1$),
since $\phi$ includes $\theta_1$: loss of information except in special cases
Extension: missing data on X and Y

Pattern   Observation, i    z_i   x_i   y_i   R_{x_i}
P1        i = 1,…,m         √     √     ?     u_x = (1,...,1)
P2        i = m+1,…,n       √     ?     ?     ≠ u_x

Key: √ denotes observed, ? denotes observed or missing; x_i could be a vector
P1: covariates complete
P2: x incomplete
SSIL analysis, X and Y missing
Assume:
XCOV: missingness of X depends on covariates, not outcomes:
$p(R_{x_i} = u_x \mid z_i, x_i, y_i, \psi) = p(R_{x_i} = u_x \mid z_i, x_i, \psi)$ for all $y_i$
YMAR: Y is MAR in subsample with X observed:
$p(R_{y_i} \mid R_{x_i} = u_x, z_i, x_i, y_i, \psi) = p(R_{y_i} \mid R_{x_i} = u_x, z_i, x_i, y_{obs,i}, \psi)$ for all $y_{mis,i}$
Under these conditions, data are P-MAR($\theta_1$),
and corresponding analysis is to apply IL method
to the subsample of cases with X fully observed.
We call this subsample ignorable likelihood (SSIL)
SSIL likelihood, X and Y missing
$L_{full}(\phi) = \prod_{i=1}^{n} \int\!\!\int p(R_{x_i}, R_{y_i}, x_i, y_i \mid z_i, \phi)\, dx_{mis,i}\, dy_{mis,i}$
$= \prod_{i=1}^{m} \int p(x_i, R_{x_i} = u_x, y_i, R_{y_i} \mid z_i, \phi)\, dy_{mis,i} \times L_{rest}(\phi)$
$= L_{rest}(\phi) \times \prod_{i=1}^{m} p(x_i, R_{x_i} = u_x \mid z_i, \phi) \times \prod_{i=1}^{m} \int p(y_i \mid z_i, x_i, R_{x_i} = u_x, \phi)\, p(R_{y_i} \mid y_i, z_i, x_i, R_{x_i} = u_x, \psi)\, dy_{mis,i}$
$= L^{*}_{rest}(\phi) \times \prod_{i=1}^{m} \int p(y_i \mid z_i, x_i, \theta_1)\, p(R_{y_i} \mid z_i, y_{obs,i}, x_i, R_{x_i} = u_x, \psi)\, dy_{mis,i}$
$= L^{**}_{rest}(\phi) \times \prod_{i=1}^{m} p(y_{obs,i} \mid z_i, x_i, \theta_1)$
Hence data are P-MAR($\theta_1$), SSIL valid for $\theta_1$

Two covariates X, W with different mechanisms

Pattern   Observation, i      z_i   x_i   w_i   y_i   R_{x_i}   R_{w_i}
P1        i = 1,…,m           √     √     √     ?     u_x       u_w
P2        i = m+1,…,m+r       √     √     ?     ?     u_x       any
P3        i = m+r+1,…,n       √     ?     ?     ?     ≠ u_x     any

Key: √ denotes observed, ? denotes observed or missing
P1: covariates complete
P2: x obs, w, y may be mis
P3: x mis, w, y may be mis
SSIL: analyze cases in patterns 1 and 2
Subsample Ignorable Likelihood (SSIL)
• Target: regression of Y on Z, X, and W
• Assume:
(XCOV) Completeness of X can depend on covariates but not Y:
$p(R_{x_i} = u_x \mid z_i, x_i, w_i, y_i, \psi_x) = p(R_{x_i} = u_x \mid z_i, x_i, w_i, \psi_x)$ for all $y_i$
(WYMAR) Missingness of (W, Y) is MAR within subsample
of cases with X observed:
$p(R_{(w_i, y_i)} \mid z_i, x_i, w_i, y_i, R_{x_i} = u_x; \psi_{wy \cdot x}) = p(R_{(w_i, y_i)} \mid z_i, x_i, w_{obs,i}, y_{obs,i}, R_{x_i} = u_x; \psi_{wy \cdot x})$ for all $w_{mis,i}, y_{mis,i}$
• By similar proof to previous case, data are P-MAR($\theta_1$);
SSIL applies IL method (e.g. ML) to the subsample of
cases for which X is observed, but W or Y may be
incomplete
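The case-selection step of SSIL is simple to express in code. The sketch below covers only that step, with missing values coded as NaN; the helper name `ssil_subsample` and the toy arrays are hypothetical, and the IL fit on the retained cases (e.g. ML or multiple imputation under MAR) is deliberately left out.

```python
import numpy as np


def ssil_subsample(z, x, w, y):
    """Keep the cases with x fully observed; w and y may still contain NaN."""
    x2d = np.asarray(x, dtype=float).reshape(len(x), -1)  # x may be a vector per case
    keep = ~np.isnan(x2d).any(axis=1)
    return z[keep], x2d[keep], w[keep], y[keep]


nan = np.nan
z = np.array([0.1, 0.2, 0.3, 0.4])
x = np.array([1.0, nan, 3.0, 4.0])   # case 2 has x missing and is discarded
w = np.array([5.0, 6.0, nan, 8.0])   # case 3 has w missing but is retained
y = np.array([9.0, 1.0, 2.0, nan])   # case 4 has y missing but is retained

z_s, x_s, w_s, y_s = ssil_subsample(z, x, w, y)
# An ignorable-likelihood method would now be applied to (z_s, x_s, w_s, y_s);
# that step is not implemented in this sketch.
print(len(z_s), int(np.isnan(w_s).sum()), int(np.isnan(y_s).sum()))
```

Complete-case analysis would instead drop cases 2, 3 and 4, and a full IL analysis would keep all four; SSIL sits between the two.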
Simulation Study
• For each of 1000 replications, 5000 observations Z, W,
X and Y generated as:
$(y_i \mid z_i, w_i, x_i) \sim_{ind} N(1 + z_i + w_i + x_i,\ 1)$
$(z_i, w_i, x_i) \sim_{ind} N(0, \Sigma)$, $\Sigma = \begin{pmatrix} 1 & \rho & \rho \\ \rho & 1 & \rho \\ \rho & \rho & 1 \end{pmatrix}$
• 20-35% of missing values of W and X generated by four
mechanisms
Simulation: missing data mechanisms

Mechanism         ψ0(w)  ψz(w)  ψw(w)  ψx(w)  ψy(w)   ψ0(x)  ψz(x)  ψw(x)  ψx(x)  ψy(x)
I: All valid       -1     1      0      0      0       -1     1      0      0      0
II: CC valid       -1     1      1      1      0       -1     1      1      1      0
III: IML valid     -2     1      0      0      1       -2     1      1      0      1
IV: SSIML valid    -1     1      1      1      0       -2     1      1      0      1

$\text{logit}\, P(R_{w_i} = 0 \mid z_i, w_i, x_i, y_i) = \psi_0^{(w)} + \psi_z^{(w)} z_i + \psi_w^{(w)} w_i + \psi_x^{(w)} x_i + \psi_y^{(w)} y_i$
$\text{logit}\, P(R_{x_i} = 0 \mid R_{w_i} = 1, z_i, w_i, x_i, y_i) = \psi_0^{(x)} + \psi_z^{(x)} z_i + \psi_w^{(x)} w_i + \psi_x^{(x)} x_i + \psi_y^{(x)} y_i$
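One replicate of this design could be generated along the following lines. This is a sketch, not the authors' code: the function and dictionary names are hypothetical, the coefficient order is (intercept, z, w, x, y) as in the logit models, and it ASSUMES that X is left observed whenever W is missing, since only the X model conditional on W being observed is specified.

```python
import numpy as np


def expit(t):
    return 1 / (1 + np.exp(-t))


# (psi_0, psi_z, psi_w, psi_x, psi_y) for the W model, then for the X model
MECHANISMS = {
    "I":   ((-1, 1, 0, 0, 0), (-1, 1, 0, 0, 0)),
    "II":  ((-1, 1, 1, 1, 0), (-1, 1, 1, 1, 0)),
    "III": ((-2, 1, 0, 0, 1), (-2, 1, 1, 0, 1)),
    "IV":  ((-1, 1, 1, 1, 0), (-2, 1, 1, 0, 1)),
}


def simulate(mechanism, rho=0.0, n=5000, seed=0):
    """Generate (z, w, x, y) with NaN marking missing values of w and x."""
    rng = np.random.default_rng(seed)
    sigma = np.full((3, 3), rho) + (1 - rho) * np.eye(3)  # unit variances, correlation rho
    z, w, x = rng.multivariate_normal(np.zeros(3), sigma, size=n).T
    y = 1 + z + w + x + rng.normal(size=n)
    design = np.column_stack([np.ones(n), z, w, x, y])
    psi_w, psi_x = (np.array(p, dtype=float) for p in MECHANISMS[mechanism])
    w_missing = rng.random(n) < expit(design @ psi_w)
    # ASSUMPTION: the X model is applied only where W is observed (R_w = 1)
    x_missing = ~w_missing & (rng.random(n) < expit(design @ psi_x))
    return z, np.where(w_missing, np.nan, w), np.where(x_missing, np.nan, x), y


z, w, x, y = simulate("I")
print(f"fraction of W missing: {np.isnan(w).mean():.2f}")
```

Running all four mechanisms at both values of the correlation, and repeating over many seeds, would give the replicates over which the estimators are compared.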
RMSEs*1000 of Estimated Regression Coefficients for Before
Deletion (BD), Complete Cases (CC), Ignorable Maximum
Likelihood (IML) and Subsample Ignorable Maximum Likelihood
(SSIML), under Four Missing Data Mechanisms.

            ρ = 0                          ρ = 0.8
Method      I*     II     III    IV        I      II     III    IV
BD          27     28     28     27        50     46     50     46
CC          45     44     553    322       86     71     426    246
IML         37     231    36     116       58     96     53     90
SSIML       42     133    360    49        70     80     319    69
Valid:      ALL    CC     IML    SSIML     ALL    CC     IML    SSIML
Missing Covariates in Survival Analysis
$\{t_1, \ldots, t_k\}$ distinct survival times, $j$ = unit that fails at time $t_j$ (no ties);
$R_j$ = risk set at time $t_j$, and $z_j$, $x_j$ are covariates.
Complete data: contribution of data at time $t_j$ to partial likelihood is
$L_j = \dfrac{\lambda(y = t_j \mid z_j, x_j, \beta)}{\sum_{k \in R_j} \lambda(y = t_j \mid z_k, x_k, \beta)}$, where $\lambda(y = t_j \mid z_j, x_j, \beta)$ = hazard
With $z_j$ fully observed (can be time-varying),
$x_j$ covariate-dependent missing, i.e.:
$\Pr(R_{x_j} = u \mid y_j, z_j, x_j) = \Pr(R_{x_j} = u \mid z_j, x_j)$
then $\lambda(y = t_j \mid R_{x_j} = u, z_j, x_j, \beta) = \lambda(y = t_j \mid z_j, x_j, \beta)$
Hence SSIL, restricted to cases with $x_j$ observed in each risk set,
gives valid partial likelihood -- see Zhang and Little (2014)
How to choose X, W
• Choice requires understanding of the
mechanism:
• Variables that are missing based on their
underlying values belong in W
• Variables that are MAR belong in X
• Collecting data about why variables are missing
is obviously useful to get the model right
• But this applies to all missing data
adjustments…
Other questions and points
– How much is lost from SSIL relative to full
likelihood model of data and missing data
mechanism?
• In some special cases, SSIL is efficient for a
pattern-mixture model
• In other cases, trade-off between additional
specification of mechanism and loss of efficiency
from conditional likelihood
– MAR analysis applied to the subset does not
have to be likelihood-based
• E.g. weighted GEE, AIPWEE
– Pattern-mixture models (Little, 1993) can
also avoid modeling the mechanism
Conclusions
• Defined partial MAR for a subset of parameters
• Application to regression with missing
covariates: sometimes discarding data is useful!
• Subsample ignorable likelihood: apply
likelihood method to data, selectively
discarding cases based on assumed missing-data mechanism
– More efficient than CC
– Valid for P-MAR mechanisms where IL, CC are
inconsistent
References
Harel, O. and Schafer, J.L. (2009). Partial and Latent Ignorability in
missing data problems. Biometrika, 2009, 1-14
Little, R.J.A. (1993). Pattern-Mixture Models for Multivariate
Incomplete Data. JASA, 88, 125-134.
Little, R. J. A., and Rubin, D. B. (2002). Statistical Analysis with Missing
Data (2nd ed.) Wiley.
Little, R.J. and Zangeneh, S.Z. (2013). Missing at random and
ignorability for inferences about subsets of parameters with missing
data. University of Michigan Biostatistics Working Paper Series.
Little, R. J. and Zhang, N. (2011). Subsample ignorable likelihood for
regression analysis with missing data. JRSSC, 60, 4, 591–605.
Rubin, D. B. (1976). Inference and Missing Data. Biometrika 63, 581–592.
Seaman, S., Galati, J., Jackson, D. and Carlin, J. (2013). What Is Meant
by “Missing at Random”? Statist. Sci. 28, 2, 257-268.
Zhang, N. and Little, R.J. (2014). Lifetime Data Analysis, published online
Aug 2014. doi:10.1007/s10985-014-9304-x.