Partially missing at random and ignorable inferences for parameter subsets with missing data
Roderick Little

Outline
• Survey Bayesics in three slides
• Inference with missing data: Rubin's (1976) paper on conditions for ignoring the missing-data mechanism
• Rubin's standard conditions are sufficient but not necessary: example
• Propose definitions of MAR, ignorability for likelihood (and Bayes) inference for subsets of parameters
• Examples
• Joint work with Sahar Zangeneh

Graybill Conference: Partially Missing at Random

Calibrated Bayes
– Frequentists should be Bayesian
  • Bayes is optimal under assumed model
– Bayesians should be frequentist
  • We never know the model (and all models are wrong)
  • Inferences should have good repeated sampling characteristics
– Calibrated Bayes (e.g. Box 1980, Rubin 1984, Little 2012)
  • Inference based on a Bayesian model
  • Model chosen to yield inferences that are well-calibrated in a frequentist sense
  • Aim for posterior probability intervals that have (approximately) nominal frequentist coverage

Calibrated Bayes models for surveys should incorporate sample design features
– All models are wrong, some models are useful
  • Design-assisted: make the estimator more robust
  • Calibrated Bayes: make the model more robust
– Many models yield design-consistent estimates
– Models that ignore features like survey weights are vulnerable to misspecification
– But models can be successfully applied in survey setting, with attention to design features
  • Weighting, stratification, clustering
– Capture design weights as covariates in the prediction model (e.g.
Gelman 2007)

Benefits of Bayes
• Unified approach to all problems
  – Avoids current approach: "inferential schizophrenia"
• Not asymptotic
  – Propagates errors in estimating parameters
• Avoids frequentist pitfalls:
  – Conditions on ancillaries
  – Obeys likelihood principle

There are those who predict… and those who weight

Rubin (1976 Biometrika)
• Landmark paper (3700+ citations, after being rejected by many journals!)
  – RL wrote his first (11 page) referee report, and an obscure discussion
• Modeled the missing data mechanism by treating missingness indicators as random variables, assigning them a distribution
• Sufficient conditions under which missing data mechanism can be ignored for likelihood and frequentist inference about parameters
  – Focus here on likelihood, Bayes

Ignoring the mechanism
$D$ = data with no missing values, $D_{obs}$ = observed, $D_{mis}$ = missing
$R$ = response indicator matrix
$f_{D,R}(D, R \mid \theta, \psi) = f_D(D \mid \theta)\, f_{R|D}(R \mid D, \psi)$
• Full likelihood:
  $L(\theta, \psi \mid D_{obs}, R) = \text{const.} \times \int f_D(D \mid \theta)\, f_{R|D}(R \mid D, \psi)\, dD_{mis}$
• Likelihood ignoring mechanism:
  $L_{ign}(\theta \mid D_{obs}, R) = \text{const.} \times \int f_D(D \mid \theta)\, dD_{mis}$
• Missing data mechanism can be ignored for likelihood inference when
  $L(\theta, \psi \mid D_{obs}, R) = L_{ign}(\theta \mid D_{obs}, R) \times L_{rest}(\psi \mid D_{obs}, R)$

Rubin's sufficient conditions for ignoring the mechanism
• Missing data mechanism can be ignored for likelihood inference when
  – (a) the missing data are missing at random (MAR):
    $f_{R|D}(R \mid D_{obs}, D_{mis}, \psi) = f_{R|D}(R \mid D_{obs}, \psi)$ for all $D_{mis}, \psi$
  – (b) distinctness of the parameters of the data model and the missing-data mechanism:
    $(\theta, \psi) \in \Omega_\theta \times \Omega_\psi$; for Bayes, $\theta$ and $\psi$ a-priori independent
• MAR is the key condition: without (b), inferences are valid but not fully efficient

"Sufficient for ignorable" is not the same as "ignorable"
• These definitions have come to define ignorability (e.g. Little and Rubin 2002)
• However, Rubin (1976) described (a) and (b) as the "weakest simple and general conditions under which it is always appropriate to ignore the process that causes missing data".
• These conditions are not necessary for ignoring the mechanism in all situations.
  MAR + distinctness $\Rightarrow$ ignorable
  ignorable $\nRightarrow$ MAR + distinctness

Example 1: Nonresponse with auxiliary data
$D_{obs} = (D_{resp}, D_{aux})$
$D_{resp} = \{(y_{i1}, y_{i2}),\ i = 1, \dots, m\}$, $D_{aux} = \{y^*_{j1},\ j = 1, \dots, n\}$ (or the whole population, $N$)
[Diagram: data pattern with columns Y1 (auxiliary data, not linked), R, Y1, Y2; entries marked ? are missing for nonrespondents.]
$D_{aux}$ includes the respondent values of $Y_1$, but we do not know which they are.
$(Y_1, Y_2) \sim_{ind} f(y_{i1}, y_{i2} \mid \theta)$
$\Pr(r_i = 1 \mid y_{i1}, y_{i2}, \psi) = g(y_{i1}, \psi)$
Not MAR: $y_{i1}$ is missing for nonrespondents $i$. But...
mechanism is ignorable, and does not need to be modeled:
• Marginal distribution of $Y_1$ estimated from $D_{aux}$
• Conditional of $Y_2$ given $Y_1$ estimated from $D_{resp}$

MAR, ignorability for parameter subsets
• MAR and ignorability are defined in terms of the complete set of parameters in the data model for $D$
• It would be useful to have a definition of MAR that applies to subsets of parameters, including parameters of substantive interest.
• A trivial example: It seems plausible that a nonignorable mechanism would be MAR for the parameters of distributions of variables that are not missing.

MAR, ignorability for parameter subsets
$\theta = (\theta_1, \theta_2)$
Mechanism is partially MAR for likelihood inference about $\theta_1$, denoted P-MAR($\theta_1$), if:
$L(\theta_1, \theta_2, \psi \mid D_{obs}, R) = L_{ign}(\theta_1 \mid D_{obs}, R) \times L_{rest}(\theta_2, \psi \mid D_{obs}, R)$ for all $\theta_1, \theta_2, \psi$
Mechanism is IGN($\theta_1$) if P-MAR($\theta_1$) and $\theta_1$ and $(\theta_2, \psi)$ are distinct

MAR, ignorability for parameter subsets
Special case where $\theta_1 = \theta$
Mechanism is P-MAR($\theta$) if:
$L(\theta, \psi \mid D_{obs}, R) = L_{ign}(\theta \mid D_{obs}, R) \times L_{rest}(\psi \mid D_{obs}, R)$ for all $\theta, \psi$
A consequence of (but does not imply) Rubin's MAR condition
IGN($\theta$) if P-MAR($\theta$) and $\theta$ and $\psi$ are distinct

Partial MAR given a function of mechanism
Harel and Schafer (2009) define a different kind of Partial MAR:
Mechanism is partially MAR given $g(R)$ if:
$P(R \mid Y_{obs}, Y_{mis}, g(R), \theta, \psi) = P(R \mid Y_{obs}, g(R), \theta, \psi)$ for all $\theta, \psi, R, Y_{obs}$
Here "partial" relates to the mechanism; in my definition "partial" relates to the parameters.
These ideas seem quite distinct.

Example 1: Auxiliary Survey Data
$D_{obs} = (D_{resp}, D_{aux})$
$D_{resp} = \{(y_{i1}, y_{i2}),\ i = 1, \dots, m\}$, $D_{aux} = \{y^*_{j1},\ j = 1, \dots, n\}$
[Diagram: data pattern with columns Y1 (auxiliary data), R, Y1, Y2.]
$D_{aux}$ includes the respondent values of $Y_1$, but we do not know which they are.
$D = \{(y_{i1}, y_{i2}),\ i = 1, \dots, n\}$
$(Y_1, Y_2) \sim f(y_{i1}, y_{i2} \mid \theta)$
$\Pr(r_i = 1 \mid y_{i1}, y_{i2}, \psi) = g(y_{i1}, \psi)$
[$D_{aux}$ is not linked to $D_{resp}$; entries marked ? in the data pattern are missing.]
Easy to show that mechanism is P-MAR($\theta$), and IGN($\theta$) if $\theta$, $\psi$ are distinct

Ex. 2: MNAR Monotone Bivariate Data
$D = \{(y_{i1}, y_{i2}),\ i = 1, \dots, n\}$
$D_{obs} = \{(y_{i1}, y_{i2}),\ i = 1, \dots, m\}$ and $\{y_{i1},\ i = m+1, \dots, n\}$
$(Y_1, Y_2) \sim f(y_{i1}, y_{i2} \mid \theta) = f(y_{i1} \mid \theta_1)\, f(y_{i2} \mid y_{i1}, \theta_2)$
$\Pr(r_{i2} = 1 \mid y_{i1}, y_{i2}, \psi) = g(y_{i1}, y_{i2}, \psi)$ (MNAR)
[Diagram: data pattern with columns M, Y1, Y2; Y1 is fully observed, entries marked ? are missing values of Y2.]
COMMENT: Clearly, inference about parameters $\theta_1$ of the marginal distribution of $Y_1$ can ignore the mechanism, since $Y_1$ has no missing values. In the proposed definition, this mechanism is P-MAR($\theta_1$), and IGN($\theta_1$) if $\theta_1$ and $(\theta_2, \psi)$ are distinct
• Paper presents more interesting case with $Y_1$, $Y_2$ blocks of variables and missing data in each block

More generally…
$(Y_1, R^{(1)})$, $(Y_2, R^{(2)})$ blocks of incomplete variables, and
$f(y_{1i}, y_{2i}, r_i^{(1)}, r_i^{(2)}) = f_1(y_{1i} \mid \theta_1)\, \Pr(r_i^{(1)} \mid y_{1i}, \psi_1) \times f_2(y_{2i} \mid y_{1i}, \theta_2)\, \Pr(r_i^{(2)} \mid r_i^{(1)}, y_{1i}, y_{2i}, \psi_2)$
Assume:
$\Pr(r_i^{(1)} \mid y_{1i}; \psi_1) = g_1(y_{1,obs,i}, \psi_1)$ for all $y_{1,mis,i}$,
$\Pr(r_i^{(2)} \mid r_i^{(1)}, y_{1i}, y_{2i}; \psi_2) = g_2(r_i^{(1)}, y_{1i}, y_{2i}, \psi_2)$
Mechanism is P-MAR($\theta_1$), and IGN($\theta_1$) if $\theta_1$ and $(\theta_2, \psi_1, \psi_2)$ are distinct

Ex. 3: Complete Case Analysis in Regression
$D = \{(y_{i1}, y_{i2}),\ i = 1, \dots, n\}$
$D_{obs} = \{(y_{i1}, y_{i2}),\ i = 1, \dots, m\}$
$(Y_1, Y_2) \sim f(y_{i1}, y_{i2} \mid \theta)$
$\Pr(r_i = 1 \mid y_{i1}, y_{i2}, \psi) = g(y_{i1}, \psi)$
[Diagram: data pattern with columns R, Y1, Y2; both Y1 and Y2 are missing (?) for nonrespondents.]
MNAR, but inference about parameters of the conditional distribution of $Y_2$ given $Y_1$ based on complete cases is valid, ignoring the mechanism.
Let $f(y_{i1}, y_{i2} \mid \theta) = f_1(y_{i1} \mid \theta_1)\, f_2(y_{i2} \mid y_{i1}, \theta_2)$
$L(\theta_1, \theta_2, \psi \mid D_{obs}, R) = \text{const.} \times L_1(\theta_2 \mid D_{obs}) \times L_2(\theta_1, \psi \mid D_{obs}, R)$, where
$L_1(\theta_2 \mid D_{obs}) = \prod_{i=1}^{m} f_2(y_{i2} \mid y_{i1}, \theta_2)$
MNAR, but P-MAR($\theta_2$), and IGN($\theta_2$) if $\theta_2$, $(\theta_1, \psi)$ are distinct

Ex.
4: A normal pattern-mixture model
$D_{obs} = \{(y_{i1}, y_{i2}),\ i = 1, \dots, m\}$ and $\{y_{i1},\ i = m+1, \dots, n\}$
$f(D, R_2 \mid \phi, \pi) = f_{D|R}(D \mid R_2, \phi)\, f_R(R_2 \mid \pi)$
$(y_{i1}, y_{i2} \mid r_{i2} = j, \phi) \sim_{ind} G(\mu^{(j)}, \Sigma^{(j)}),\ j = 0, 1$
$r_{i2} \sim_{ind} \text{Bern}(\pi)$
Assume $\Pr(r_{i2} = 1 \mid y_{i1}, y_{i2}) = g(y_{i2})$, $g$ unknown (MNAR)
[Diagram: data pattern with columns R2, Y1, Y2; Y1 is fully observed, entries marked ? are missing values of Y2.]
COMMENT: The distribution of $Y_1$ given $Y_2$ and $R_2$ does not depend on $R_2$, so it can be estimated from complete cases, ignoring the mechanism
Parameters: $\phi_{1\cdot 2} = (\beta_{10\cdot 2}, \beta_{12\cdot 2}, \sigma_{11\cdot 2})$; $\mu_2^{(0)}, \sigma_{22}^{(0)}, \mu_1^{(1)}, \sigma_{11}^{(1)}$
$L(\phi, \pi \mid D_{obs}, R_2) = \text{const.} \times L_1(\phi_{1\cdot 2} \mid D_{obs}, R_2) \times L_2$, where $L_2$ is the factor for the remaining parameters and
$L_1(\phi_{1\cdot 2} \mid D_{obs}, R_2) = \prod_{i=1}^{m} f_{1\cdot 2}(y_{i1} \mid y_{i2}, \phi_{1\cdot 2})$
MNAR, but P-MAR($\phi_{1\cdot 2}$); not IGN($\phi_{1\cdot 2}$), since $\phi_{1\cdot 2}$ and the remaining parameters are not distinct

Ex. 5: Subsample ignorable likelihood
Little and Zhang (2011)

Pattern  Z  W  X  Y
P1       √  √  ?  ?
P2       √  ?  ?  ?

Columns could be vectors; √ = fully observed; ? = observed or missing
• Interest concerns parameters $\theta_1$ of regression of Y on (Z, X, W)
• Z complete, W and (X, Y) incomplete. W complete in P1.
• Division of covariates into W, X is based on the following MNAR assumptions about the missing data mechanism:
  – Pr(W complete) = fn(W, X, Z) (not Y)
  – (X, Y) MAR in subsample with W fully observed (that is, P1)
This mechanism is P-MAR($\theta_1$); the corresponding analysis is to apply an ignorable likelihood method, discarding data in P2

Ex. 6: Auxiliary data, survey nonresponse
$D = \{(y_{i1}, y_{i2}, y_{i3}),\ i = 1, \dots, n\}$
$D_{obs} = (D_{resp}, D_{aux})$
$D_{resp} = \{(y_{i1}, y_{i2}, y_{i3}),\ i = 1, \dots, r;\ (y_{i1}),\ i = r+1, \dots, n\}$
$D_{aux} = \{y^*_{j2},\ j = 1, \dots, N\}$, $N$ = population size
$(Y_1, Y_2, Y_3) \sim f(y_{i1}, y_{i2}, y_{i3} \mid \theta)$
$\Pr(m_i = 1 \mid y_{i1}, y_{i2}, y_{i3}, \psi) = g(y_{i1}, y_{i2}, \psi)$
[Diagram: data pattern with an auxiliary column Y2 for population units 1..N and sample columns Y1, Y2, Y3 for units 1..r (respondents) and r+1..n; entries marked ? are missing.]
[$D_{aux}$ is not linked to the sample records.]
Not MAR: $y_{i2}$ is missing for nonrespondents
But mechanism is P-MAR($\theta$) if $g(y_{i1}, y_{i2}, \psi)$ is an additive function of $(y_{i1}, y_{i2})$
• Marginal of $Y_2$ from $D_{aux}$, marginal of $Y_1$ from $D_{resp}$
• Conditional of $Y_3$ given $Y_1, Y_2$ from complete cases in $D_{resp}$

Simulation Study
$[Y_1, Y_2, Y_3, M] = [Y_1, Y_2]\,[Y_3 \mid Y_1, Y_2]\,[M \mid Y_1, Y_2, Y_3]$
$[Y_1, Y_2] \sim$ multinomial
$[Y_3 \mid Y_1, Y_2]$ generated as $\text{logit}\,\Pr(Y_3 = 1 \mid Y_1, Y_2) = 0.5 + \beta_1 Y_1 + \beta_2 Y_2 + \beta_{12} Y_1 Y_2$
$[M \mid Y_1, Y_2]$ generated as $\text{logit}\,\Pr(M = 1 \mid Y_1, Y_2) = 0.5 + \gamma_1 Y_1 + \gamma_2 Y_2 + \gamma_{12} Y_1 Y_2$
Each $\beta_j$, $\gamma_j$ set to zero or two (various combinations)
$N = 100{,}000$; $n = 200$, $1000$ and $10{,}000$

Simulation Study: methods
CC: Complete Case estimates based on the responding units
M1: ML based on a logistic regression with interaction for Y3
M2: ML based on an additive logistic regression for Y3
NR: Weighting class estimates where nonresponse weights are obtained based on Y1
PS: Post-stratification weighted estimates based on Y2
NRPS: Adjust weights using both Y1 and Y2.
For categorical variables, this method is equivalent to linear calibration regression or generalized raking estimates.

Simulation: summary findings
• When response depends on the Y1*Y2 interaction, all methods do poorly
• When data are MCAR, all methods do similarly well
• Model-based methods remove almost all the bias and perform better when response doesn't depend on the Y1*Y2 interaction
• Qualitative patterns hold for different sample sizes

Frequentist inference
• Rubin's (1976) sufficient conditions for ignorability for frequentist inference were even stronger (essentially MCAR)
• These can be weakened too: for example, asymptotic frequentist inference based on ML and the observed information matrix works under the conditions given here
• Small sample inference is more complex

Summary
• Proposed definitions of partial MAR, ignorability for subsets of parameters
• Expands range of situations where missing data mechanism can be ignored
• Though, in some cases, MAR analysis entails a loss of information
  – How much is lost is an interesting question, varies by context

References
Harel, O. and Schafer, J.L. (2009). Partial and Latent Ignorability in missing data problems. Biometrika, 2009, 1-14.
Little, R.J.A. (1993). Pattern-Mixture Models for Multivariate Incomplete Data. JASA, 88, 125-134.
Little, R. J. A., and Rubin, D.
B. (2002). Statistical Analysis with Missing Data (2nd ed.). Wiley.
Little, R.J. and Zangeneh, S.Z. (2013). Missing at random and ignorability for inferences about subsets of parameters with missing data. University of Michigan Biostatistics Working Paper Series.
Little, R. J. and Zhang, N. (2011). Subsample ignorable likelihood for regression analysis with missing data. JRSSC, 60, 4, 591–605.
Rubin, D. B. (1976). Inference and Missing Data. Biometrika, 63, 581-592.
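
Illustration of Ex. 3 (complete case analysis in regression)
The claim in Ex. 3 can be checked numerically. The sketch below is only an illustration under assumptions that are not in the talk: a bivariate normal data model, a logistic form for g(y_i1, ψ), the specific parameter values, and the function name simulate_complete_case are all hypothetical choices. It fits the regression of Y2 on Y1 to complete cases only, and also reports the complete-case mean of Y1 for comparison with the full-data mean.

```python
import numpy as np


def simulate_complete_case(n=20000, beta0=1.0, beta1=2.0, psi1=1.5, seed=0):
    """Simulate the Ex. 3 setting and fit Y2 on Y1 using complete cases only.

    All distributions and parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Data model: Y1 ~ N(0, 1); Y2 | Y1 ~ N(beta0 + beta1 * Y1, 1).
    y1 = rng.normal(0.0, 1.0, size=n)
    y2 = beta0 + beta1 * y1 + rng.normal(0.0, 1.0, size=n)

    # Mechanism: Pr(r_i = 1 | y_i1, y_i2) = g(y_i1), here a logistic curve.
    # Y1 and Y2 are both unobserved when r_i = 0, so the mechanism is MNAR.
    prob_respond = 1.0 / (1.0 + np.exp(-psi1 * y1))
    r = rng.random(n) < prob_respond

    # Complete-case least squares for the conditional mean of Y2 given Y1.
    design = np.column_stack([np.ones(r.sum()), y1[r]])
    coef, *_ = np.linalg.lstsq(design, y2[r], rcond=None)

    return {
        "n_resp": int(r.sum()),
        "cc_intercept": float(coef[0]),
        "cc_slope": float(coef[1]),
        "cc_mean_y1": float(y1[r].mean()),      # marginal of Y1 from respondents
        "full_mean_y1": float(y1.mean()),       # marginal of Y1 from all units
    }


result = simulate_complete_case()
for name, value in result.items():
    print(f"{name}: {value}")
```

Under these assumptions the complete-case intercept and slope should land close to beta0 and beta1, while the complete-case mean of Y1 is shifted away from the full-data mean. That matches the slide's point: the mechanism can be ignored for the parameters of the conditional distribution of Y2 given Y1, but not for the marginal distribution of Y1.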