Partially Missing At Random and Ignorable Inferences for Parameter Subsets with Missing Data Roderick Little Rennes 2015 1 Outline • Inference with missing data: Rubin's (1976) paper on conditions for ignoring the missing-data mechanism • Rubin’s standard conditions are sufficient but not necessary: example • Propose definitions of partially MAR, ignorability for likelihood (and Bayes) inference for subsets of parameters (Little and Zanganeh, 2013) • Application: Subsample ignorable methods for regression with missing covariates (Little and Zhang, 2011) • Joint work with Nanhua Zhang, Sahar Zanganeh Rennes 2015 2 v Rennes 2015 3 Rubin (1976 Biometrika) • Landmark paper (5000+ citations, after being rejected by many journals!) – I wrote my first referee’s report (11 pages!), and an obscure discussionon ancillarity • Modeled the missing data mechanism by treating missingness indicators as random variables, assigning them a distribution • Sufficient conditions under which missing data mechanism can be ignored for likelihood and frequentist inference about parameters – Focus here on likelihood, Bayes Rennes 2015 4 Ignoring the mechanism D data with no missing values, Dobs observed, Dmis missing R = response indicator matrix f D ,R ( D, R | , ) f D ( D | ) f R|D ( R | D, ) • Full likelihood: L( , | Dobs , R) const. f D ( D | ) f R|D ( R | D, )dDmis • Likelihood ignoring mechanism: Lign ( | Dobs , R) const. f D ( D | )dDmis • Missing data mechanism can be ignored for likelihood inference when L( , | Dobs , R) Lign ( |Dobs ,R) Lrest ( | Dobs , R) Rennes 2015 5 Rubin’s sufficient conditions for ignoring the mechanism • Missing data mechanism can be ignored for likelihood inference when – (a) the missing data are missing at random (MAR): f R|D ( R | Dobs , Dmis , ) f R|D ( R | Dobs , ) for all Dmis , – (b) distinctness of the parameters of the data model and the missing-data mechanism: ( , ) ; for Bayes, and a-priori independent • MAR is the key condition: without (b), inferences are valid but not fully efficient Rennes 2015 6 More on Rubin (1976) • Seaman et al. (2013) propose a more complex but precise notation • Distinguish between “direct” likelihood inference and “frequentist likelihood inference – “Realized MAR” sufficient for direct likelihood inference – R depends only on realized observed data – “Everywhere MAR” sufficient for frequentist likelihood inference: MAR condition needs to hold for observed values in future repeated sampling – Rubin (1976) uses term “always MAR”. See also Mealli and Rubin (2015, forthcoming) Rennes 2015 7 “Sufficient for ignorable” is not the same as “ignorable” • These definitions have come to define ignorability (e.g. Little and Rubin 2002) • However, Rubin (1976) described (a) and (b) as the "weakest simple and general conditions under which it is always appropriate to ignore the process that causes missing data". • These conditions are not necessary for ignoring the mechanism in all situations. MAR+distinctness ignorable ignorable MAR+distinctness Rennes 2015 8 Example 1: Nonresponse with Or whole auxiliary data population N Dobs ( Dresp , Daux ) Dresp ( yi1 , yi 2 ), i 1,..., m , Daux y*j1 , j 1,..., n Y1 R Y1 Y2 Daux includes the respondent values of Y1 , but we do not know which they are. Y1 , Y2 ~ ind f ( yi1 , yi 2 | ) Pr(ri 1| yi1 , yi 2 , ) g ( yi1 , ) 0 0 0 1 1 ? ? ? ? Not linked Not MAR -- yi1 missing for nonrespondents i But... mechanism is ignorable, does not need to be modeled: Marginal distribution of Y1 estimated from Daux Conditional of Y2 given Y1 estimated from D resp Rennes 2015 9 MAR, ignorability for parameter subsets • MAR and ignorability are defined in terms of the complete set of parameters in the data model for D • It would be useful to have a definition of MAR that applies to subsets of parameters of substantive interest. • Example: inference for regression parameters might be “partially MAR” when parameters for a model for full data are not. Rennes 2015 10 MAR, ignorability for parameter subsets =(1 , 2 ), = parameters of model for mechanism Define the data to be partially MAR for likelihood inference about 1 , denoted P-MAR(1 ), if: L(1 , 2 , | Dobs , R) Lign (1 | Dobs ) Lrest ( 2 , | Dobs , R) for all 1 , 2 , Data are IGN(1 ) if P-MAR(1 ) and 1 and ( 2 , ) distinct Rennes 2015 11 MAR, ignorability for parameter subsets Special case where 1 = , all the parameters: Data are P-MAR( ) if: L( , | Dobs , R) Lign ( | Dobs ) Lrest ( | Dobs , R) for all , A consequence of (but does not imply) Rubin's MAR condition IGN( ) if MAR( ) and and distinct Rennes 2015 12 Partial MAR given a function of mechanism Harel and Schafer (2009) define a different kind of Partial MAR: Mechanism is partially MAR given g ( R) if: P( R | Yobs , Ymis , g ( R), , ) P( R | Yobs , g ( R), , ) for all , , R, Yobs Here "partial" relates to the mechanism, In our definition "partial" relates to the parameters These ideas seem quite distinct Rennes 2015 13 Example 1: Auxiliary Survey Data Dobs ( Dresp , Daux ) Dresp ( yi1 , yi 2 ), i 1,..., m , Daux y*j1 , j 1,..., n Y1 R Y1 Y2 0 0 0 1 1 Daux includes the respondent values of Y1 , but we do not know which they are. D ( yi1 , yi 2 ), i 1,..., n} Y1 , Y2 ~ f ( yi1 , yi 2 | ) Pr(ri 1| yi1 , yi 2 , ) g ( yi1 , ) ? ? ? ? Not linked Easy to show that data are P-MAR( ), and IGN( ) if , are distinct Rennes 2015 14 Ex. 2: MNAR Monotone Bivariate Data M Y1 Y2 D ( yi1 , yi 2 ), i 1,..., n} Dobs ( yi1 , yi 2 ), i 1,..., m and yi1 , i m 1,..., n Y1 , Y2 ~ f ( yi1 , yi 2 | ) f ( yi1 | 1 ) f ( yi 2 | yi1 , 2 ) Pr (ri 2 1| yi1 , yi 2 , ) g ( yi1 , yi 2 , ) (MNAR) 0 0 0 1 1 ? ? COMMENT: Clearly, inference about parameters 1 of the marginal distribution of Y1 can ignore mechanism, since Y1 has no missing values. In proposed definition, this mechanism is P-MAR(1 ), and IGN(1 ) if 1 and ( 2 , ) distinct [Note distinctness for IGN(1 )] Rennes 2015 15 More generally… (Y1 , R (1) ),(Y2 , R (2) ) blocks of incomplete variables, and f ( y1i , y2i , ri(1) , ri(2) ) f1 ( y1i | 1 ) Pr(ri(1) | y1i , 1 ) f1 ( y2i | y1i , 2 ) Pr(ri (2) | ri(1) , y1i , y2i , 2 ) Assume: Pr(ri(1) | y1i ; 1 ) g1 ( y1,obs,i , 1 ) for all y1,mis,i , Pr(ri(2) | ri(1) , y1i , y2i ; 2 ) g 2 (ri(1) , y1i , y2i , 2 ), Mechanism is P-MAR(1 ), IGN(1 ) if 1 and ( 2 , 1 , 2 ) are distinct Rennes 2015 16 Application: missing data in covariates Target: regression of Y on X, Z; missing data on X IL methods include information for the regression in the incomplete cases (particularly intercept and coefficients of Z) and are valid assuming MAR: Pr(X missing)= g(Z, Y) Z X Y ? BUT: if Pr(X missing)= g(Z, X) CC analysis is consistent, but IL methods (or weighted CC) are inconsistent since mechanism is not MAR Simulations favoring IL often generate data under MAR, hence are biased against CC Rennes 2015 17 More general missing data in X Pattern Observation, i zi xi Could be vector yi R( xi , yi ) P1 i = 1,…,m √ √ √ P2 i = m +1,…,n √ ? √ ux (1,...,1) ux Key: √ denotes observed, ? denotes observed or missing zi xi yi P1 P2 Rennes 2015 18 Ignorable Likelihood methods Ignorable likelihood inferences (IL) for model: p ( xi , yi | zi , ) are based on n Lign ( ) const. p ( xobs,i , yi | zi , ) i 1 1 1 ( ) = parameters of regression of Y on X , Z ML: ˆ1 1 (ˆ) Bayes: draw 1( d ) 1 ( ( d ) ) (d ) Multiple imputation: draw X mis ~ P ( X mis | data), apply multiple imputation combining rules to estimates of 1 All these methods assume MAR: p( Rxi | zi , xi , yi , ) p( Rxi | zi , xobs,i , yi , ) for all xmis,i Rennes 2015 19 CC analysis Assume: missingness of X depends on covariates, not outcomes: p Rxi u x | zi , xi , yi , ) p Rxi u x | zi , xi , ) for all yi n ( , ), Lfull ( ) p( Rx , xi , yi | zi , )dxmis i i 1 m n i 1 i m1 p ( Rxi u x , xi , yi | zi , ) p ( Rxi , xi , yi | zi , )dxmis m n i 1 i m1 * MNAR mechanism: Missingness can depend on missing values of X p ( yi | zi , xi , Rxi u x , ) p ( Rxi u x , xi , yi | zi , ) p ( Rxi , xi , yi | zi , )dxmis m m n i 1 i 1 i m 1 p ( yi | zi , xi ,1 ) p ( Rxi u x , xi , yi | zi , ) p ( Rxi , xi , yi | zi , )dxmis Lcc (1 ) Lrest ( ) Follows from (*) Hence data are P-MAR(1 ), but not IGN(1 ), since includes 1: loss of information except in special cases Rennes 2015 20 Extension: missing data on X and Y zi zi Could be vector yi Rxi xi Pattern Observation, i P1 i = 1,…,m √ √ ? P2 i = m +1,…,n √ ? ? ux (1,...,1) ux Key: √ denotes observed, ? denotes observed or missing zi xi yi P1: covariates complete P2: x incomplete Rennes 2015 21 SSIL analysis, X and Y missing Assume: XCOV: missingness of X depends on covariates, not outcomes: p Rxi u x | zi , xi , yi , ) p Rxi u x | zi , xi , ) for all yi YMAR: Y is MAR in subsample with X observed: p ( Ryi | Rxi u x , zi , xi , yi , ) p ( Ryi | Rxi u x , zi , xi , yobs,i , ) for all yi (YMAR) Under these conditions, data are P-MAR (1 ), and corresponding analysis is to apply IL method to the subsample of cases with X fully observed. We call this subsample ignorable likelihood SSIL Rennes 2015 22 SSIL likelihood, X and Y missing n Lfull ( ) p ( Rxi , Ryi , xi , yi | zi , ) dxmis dymis i 1 m p ( xi , Rxi u x , yi , Ryi | zi , ) dymis Lrest ( ) i 1 m Lrest ( ) p( xi , Rxi u x | zi , ) i 1 m i 1 p( y | z , x , R i i i xi u x , ) p( Ryi | yi , zi , xi , Rxi u x , )dymis m L ( ) p ( yi | zi , xi ,1 ) p ( Ryi | zi , yobs,i , xi , Rxi u x , )dymis * rest i 1 m L**rest ( ) p ( yobs,i | zi , xi ,1 ) i 1 Hence data are P-MAR(1 ), SSIL valid for 1 Rennes 2015 23 Two covariates X, W with different mechanisms Pattern Observation, i P1 i = 1,…,m P2 P3 zi xi wi √ √ √ i = m +1,…,m+r √ √ i = m +r+1,…,n √ ? yi Rxi Rwi ? ux uw ? ? ux uw ? ? ux uw SSIL: analyze cases in patterns 1 and 2 zi xi wi yi P1: covariates complete P2: x obs, w, y may be mis P3: x mis, w, y may be mis Rennes 2015 24 Subsample Ignorable Likelihood (SSIL) • Target: regression of Y on Z, X, and W • Assume: (XCOV) Completeness of X can depend on covariates but not Y : p Rxi u x | zi , xi , wi , yi , x ) p Rxi u x | zi , xi , wi , x ) for all yi (WYMAR) Missingness of (W , Y ) is MAR within subsample of cases with X observed: p ( R( wi , yi ) | zi , xi , wi , yi , Rxi u x ; wyx ) p ( R( wi , yi ) | zi , xi , wobs,i , yobs,i , Rxi u x ; wyx ) for all wmis,i , ymis,i • By similar proof to previous case, data are P-MAR(1 ); SSIL applies IL method (e.g. ML) to the subsample of cases for which X is observed, but W or Y may be incomplete Rennes 2015 25 Simulation Study • For each of 1000 replications, 5000 observations Z, W, X and Y generated as: yi | zi , wi , xi ~ ind ( zi , wi , xi ) ~ ind N (1 zi wi xi ,1) 1 N (0, ), 1 1 • 20-35% of missing values of W and X generated by four mechanisms Rennes 2015 26 Simulation: missing data mechanisms Mechanisms 0( w) z( w) w( w) x( w) y( w) 0( x ) z( x ) w( x ) x( x ) y( w) I: All valid -1 1 0 0 0 -1 1 0 0 0 II: CC valid -1 1 1 1 0 -1 1 1 1 0 III: IML valid -2 1 0 0 1 -2 1 1 0 1 IV: SSIML valid -1 1 1 1 0 -2 1 1 0 1 logit P( R logit P( Rwi 0 | zi , wi , xi , yi ) 0( w) z( w) zi w( w) wi x( w) xi y( w) yi xi 0 | Rwi 1, zi , wi , xi , yi ) 0( x ) z( x ) zi w( x ) wi x( x ) xi y( x ) yi Rennes 2015 27 RMSEs*1000 of Estimated Regression Coefficients for Before Deletion (BD), Complete Cases (CC), Ignorable Maximum Likelihood (IML) and Subsample Ignorable Maximum Likelihood (SSIML), under Four Missing Data Mechanisms. 0 0.8 I* II III IV I II III IV BD 27 28 28 27 50 46 50 46 CC 45 44 553 322 86 71 426 246 IML 37 231 36 116 58 96 53 90 SSIML 42 133 360 49 70 80 319 69 Valid: ALL CC IML SSIML ALL CC Rennes 2015 IML SSIML 28 Missing Covariates in Survival Analysis {t1 ,..., tk } distinct survival times, j = unit that fails at time t j (no ties); R j risk set at time t j , and z j , x j are covariates. Complete data: contribution of data at time t j to partial likelihood is ( y t j | z j , xj , ) Lj , ( y t j | z j , x j , ) hazard ( y tk | zk , xk , ) kR j With z j fully observed (can be time-varying), x j covariate-dependent missing, i.e.: Pr( Rx j u | y j , z j , x j ) Pr( Rx j u | z j , x j ) then ( y t j | Rx j u , z j , x j , ) ( y t j | z j , x j , ) Hence SSIL, restricted to cases with x j observed in each risk set gives valid partial likelihood -- see Zhang and Little (2014) Rennes 2015 29 How to choose X, W • Choice requires understanding of the mechanism: • Variables that are missing based on their underlying values belong in W • Variables that are MAR belong in X • Collecting data about why variables are missing is obviously useful to get the model right • But this applies to all missing data adjustments… Rennes 2015 30 Other questions and points – How much is lost from SSIL relative to full likelihood model of data and missing data mechanism? • In some special cases, SSIL is efficient for a pattern-mixture model • In other cases, trade-off between additional specification of mechanism and loss of efficiency from conditional likelihood – MAR analysis applied to the subset does not have to be likelihood-based • E.g. weighted GEE, AIPWEE – Pattern-mixture models (Little, 1993) can also avoid modeling the mechanism Rennes 2015 31 Conclusions • Defined partial MAR for a subset of parameters • Application to regression with missing covariates: sometimes discarding data is useful! • Subsample ignorable likelihood: apply likelihood method to data, selectively discarding cases based on assumed missingdata mechanism – More efficient than CC – Valid for P-MAR mechanisms where IL, CC are inconsistent Rennes 2015 32 References Harel, O. and Schafer, J.L. (2009). Partial and Latent Ignorability in missing data problems. Biometrika, 2009, 1-14 Little, R.J.A. (1993). Pattern-Mixture Models for Multivariate Incomplete Data. JASA, 88, 125-134. Little, R. J. A., and Rubin, D. B. (2002). Statistical Analysis with Missing Data (2nd ed.) Wiley. Little, R.J. and Zangeneh, S.Z. (2013). Missing at random and ignorability for inferences about subsets of parameters with missing data. University of Michigan Biostatistics Working Paper Series. Little, R. J. and Zhang, N. (2011). Subsample ignorable likelihood for regression analysis with missing data. JRSSC, 60, 4, 591–605. Rubin, D. B. (1976). Inference and Missing Data. Biometrika 63, 581592. Seaman, S., Galati, J., Jackson, D. and Carlin, J. (2013). What Is Meant by “Missing at Random”? Statist. Sci. 28, 2, 257-268. Zhang, N. & Little, R.J. (2014). Lifetime Data Analysis, published online Aug 2014. doi:10.1007/s10985-014-9304-x. Rennes 2015 33