A Latent Class Call-back Model for Survey Nonresponse

Paul P. Biemer, RTI International and UNC-CH
Michael W. Link, Centers for Disease Control and Prevention

Outline
• Motivation for the study
• Early cooperator effects (ECE) in the Behavioral Risk Factor Surveillance System (BRFSS)
• Call-back models
  – Manifest and latent
• Model extensions
• Application to the BRFSS
• Results
• Summary and conclusions

Terms and definitions
• Cooperators = units that will eventually respond at some request or call-back
• Non-cooperators (also called hardcore nonrespondents) = units that will not respond to any call-back

Terms and definitions (cont’d)
• Early cooperators = cooperators that respond at early calls (say, 5 or fewer)
• Later cooperators = cooperators that respond at later calls (say, 6 or more)
• Early cooperator effect (ECE) = expected difference in estimates based on early vs. early + later cooperators (say, $E(\bar{y}_5 - \bar{y})$)

[Figure: Response rates as a function of number of call attempts. Series: Int, Ref, NC. Horizontal axis: number of call attempts (1-5, 6-7, 8-9, 10-11, 12-14, 15+). Vertical axis: 0 to 60.]

Illustration 1 - Have you ever been told by a doctor, nurse or other health professional that you had asthma?

Number of call attempts    Percent “yes”
1-5                        13.8
1-15+                      13.4

Small ECE: a maximum of 5 calls is adequate.

Illustration 2 - During the past 12 months, have you had a flu shot?

Number of call attempts    Percent “yes”
1-5                        38.3
1-15+                      35.8

Larger ECE: a maximum of 5 call attempts may be biasing. Could consider other definitions of “early cooperator.”

Why study ECE?
• Effort (and costs) could be saved if ECE is small
• If ECE is not small, adjustments may be applied to reduce it
• May need to adjust for hardcore nonrespondents (HCNRs), not only later cooperators

What adjustments can be applied to reduce the ECE?
• Nonresponse adjustments
  – Require characteristics of nonrespondents
  – Lack of information is a limitation for some surveys
• Post-stratification adjustments
  – Require known target population totals within adjustment cells
  – Variables limited to those available externally
• Call-back model adjustments
  – Assume response propensity is a function of the level of effort required to obtain a response and of grouping variables
  – Related work of Drew and Fuller (1980), Politz and Simmons (1949), others

ECE in the BRFSS
Survey details
• One of the largest RDD surveys in the world
• Estimates the prevalence of risk behaviors and preventive health practices
• Monthly, state-based, cross-sectional survey
• Target population is adults in telephone households
• Data source: 2004 survey with ~300,000 interviews

ECE in the BRFSS (cont’d)
• Early cooperator defined as responding within 5 or fewer call attempts
• Examined differences in
  – demographic characteristics
  – 10 selected health characteristics overall and by demographic domain
• ECE estimated by $\bar{y}_5 - \bar{y}$
• Data weighted by base weights only

Typical Values of ECE

                        General Health   Asthma   Drink Alcohol   Flu Shot
                        (Excellent)
Prevalence              21%              13%      53%             36%
Total                   1.2              0.3      -2.2            2.6
Male                    1.3              0.1      -1.7            2.9
Female                  1.1              0.4      -2.0            2.2
White, non-Hispanic     1.6              0.1      -2.7            2.4
Black, non-Hispanic     2.5              0.6      -2.2            1.1
Hispanic                -0.7             1.5      -0.9            1.4

Typical Values of ECE (cont’d)

                        General Health   Asthma   Drink Alcohol   Influenza Shot
Education
  < High school         1.1              1.4      -2.6            2.9
  High school           1.4              0.1      -2.5            2.7
  > High school         1.0              0.3      -1.7            2.4
Number of adults
  One                   2.1              0.1      -2.9            3.1
  Two                   1.1              0.2      -2.0            2.5
  Three or more         0.5              0.9      -1.8            1.4

Summary of the Results
• Early cooperators are different from later cooperators on many dimensions
• For most characteristics ECE is relatively small
  – Less than 3 percentage points at aggregate level
  – Rarely more than 3 points for domains
• For some characteristics, ECE may be important
• Other definitions of ECE also considered

Hardcore Nonresponse Bias
• Hardcore nonrespondents (HCNRs) = units that will not respond under the current survey protocol no matter the number of call-backs
• ECE does not include the bias due to hardcore nonrespondents
• Total nonresponse bias = bias due to cooperators who did not respond + bias due to hardcore nonrespondents
• Adjusting for ECE may not remove bias due to HCNRs

Call-back Models for Adjusting for ECE and HCNR Bias
• General idea
  – Estimate the response propensity for subgroups of the population
  – Response propensity is modeled as a function of the level of effort (LOE) needed to obtain a response
• Two models are considered
  – Manifest model (MM): ignores HCNRs
  – Latent class model (LCM): includes HCNRs
    • Includes a latent indicator variable to represent the HCNRs in the population
    • Why latent?

Illustration for 5 Call-backs

Group A            Group B            Group C
High response      Medium response    Low response
propensity         propensity         propensity
11111              33111              33332
31111              33333              33333
33111              33311              33331
11111              33332              33332
33331              31111              33333
...                ...                ...

1 = interview; 2 = noninterview; 3 = noncontact
Potential Advantages over Post-Stratification
• Post-stratification adjustments (PSAs) depend upon the availability of external benchmarks or auxiliary data
  – Selection of control variables is quite limited
  – Target populations are also quite limited
  – Adjust for “ignorable” nonresponse only

Potential Advantages over Post-Stratification (cont’d)
• Call-back model can rely only on internal variables
  – Weighting classes can be defined for any variables collected in the survey
  – Can be applied for any target population
  – Greater ability to select variables that are highly correlated with response propensity
  – Adjusts for “ignorable” and “nonignorable” nonresponse

Modeling Framework
• Simple random sampling
• Survey eligibility is known for all sample members
• No right censoring
  – (i.e., all noncontacts received maximum LOE)
Extensions to relax these assumptions are described in the paper.

Incorporating the Model-based Weights

Unadjusted estimator of the mean:

$$\bar{y} = n_r^{-1} \sum_{g=1}^{K} \sum_{i=1}^{n_{rg}} y_{gi} = \sum_{g=1}^{K} \hat{p}_g \, \bar{y}_{rg}$$

where $\hat{p}_g = n_{rg}/n_r$ is based on the sample distribution.

Adjusted estimator of the mean:

$$\tilde{y} = \sum_{g=1}^{K} \hat{\pi}_g \, \bar{y}_{rg}$$

where $\hat{\pi}_g$, the estimated proportion of the target population in group $g$, is estimated from the model.

Two Models for Estimating $\pi_g$
• MM (Manifest model): assumes all nonrespondents would eventually respond at some LOE (i.e., all nonrespondents have a positive probability of response)
• LCM (Latent class model): incorporates 0 probability of response for the hardcore nonrespondents (HCNRs)

Technical Details

Notation
• $l = 1, \ldots, L$: levels of effort (LOE)
• $o_l = 1, 2, 3$: outcome of LOE $l$, where 1 = interview, 2 = noninterview, 3 = noncontact
• $l^*$: LOE associated with state S = 1 or 2
• $g = 1, \ldots, K$: grouping variable (weighting class variable)

Notation (cont’d)
• $\pi_{l^*,1|g}$: probability a person in group $g$ is interviewed at LOE $l^*$
• $\pi_{l^*,2|g}$: probability a person in group $g$ is noninterviewed at LOE $l^*$
• $\pi_{L,3|g}$: probability a person in group $g$ is never contacted
• $n(l^*,1,g)$: number of sample persons in group $g$ interviewed at LOE $l^*$
• $n(l^*,2)$: number of sample persons noninterviewed at LOE $l^*$
• $n(L,3)$: number of sample persons never contacted after $L$ (max LOE) attempts

General Idea – Outcome Patterns for 5 Call-backs

Pattern    Cooperator ($x=2$)     HCNR ($x=1$)
11111      $\pi_{1,1|g,x=2}$      0
31111      $\pi_{2,1|g,x=2}$      0
33111      $\pi_{3,1|g,x=2}$      0
33311      $\pi_{4,1|g,x=2}$      0
33331      $\pi_{5,1|g,x=2}$      0
22222      $\pi_{1,2|g,x=2}$      $\pi_{1,2|g,x=1}$
32222      $\pi_{2,2|g,x=2}$      $\pi_{2,2|g,x=1}$
33222      $\pi_{3,2|g,x=2}$      $\pi_{3,2|g,x=1}$
33322      $\pi_{4,2|g,x=2}$      $\pi_{4,2|g,x=1}$
33332      $\pi_{5,2|g,x=2}$      $\pi_{5,2|g,x=1}$
33333      $\pi_{5,3|g,x=2}$      $\pi_{5,3|g,x=1}$

Likelihood for the Manifest Model

$$\log \mathcal{L}(\pi) = \sum_{g,\,l^*} n(l^*,1,g) \log\left(\pi_g \, \pi_{l^*,1|g}\right) + \sum_{l^*} n(l^*,2) \log\Big(\sum_{g} \pi_g \, \pi_{l^*,2|g}\Big) + n(L,3) \log\Big(\sum_{g} \pi_g \, \pi_{L,3|g}\Big)$$

This model is appropriate when
(a) every sample member has a positive probability of responding at some LOE, or
(b) adjustment for ECE only is desired.

Likelihood for the Latent Class Model

$$\log \mathcal{L}(\pi) = \sum_{g,\,l^*} n(l^*,1,g) \log\left(\pi_g \, \pi_{x=2} \, \pi_{l^*,1|g,x=2}\right) + \sum_{l^*} n(l^*,2) \log\Big(\sum_{g} \pi_g \left[\pi_{x=1} \, \pi_{l^*,2|g,x=1} + \pi_{x=2} \, \pi_{l^*,2|g,x=2}\right]\Big) + n(L,3) \log\Big(\sum_{g} \pi_g \left[\pi_{x=1} \, \pi_{L,3|g,x=1} + \pi_{x=2} \, \pi_{L,3|g,x=2}\right]\Big)$$

• Introduces a latent variable $X$, where $X = 1$ if HCNR and $X = 2$ otherwise
• Appropriate when some sample members have a 0 probability of responding and adjustment for total nonresponse (later cooperators + HCNRs) is desired

Results

Four Estimators were Considered
• Unadjusted estimator
• Estimator using MM estimates of $\pi_g$
• Estimator using LCM estimates of $\pi_g$
• Estimator using CPS estimates of $\pi_g$
  – i.e., usual PSA estimator
  – treated as the “gold standard”

Comparison of the ECE for a Maximum Five Callbacks Strategy Before and After MM Adjustment

Variable         Estimate (%)   Unadjusted ECE   Manifest Model ECE
GENHLTH
  Excellent      20.7           -0.9             -0.6
  Very good      33.1           -0.4             -0.4
  Good           29.6           0.1              0.1
ALCOHOL          52.8           -2.2             -1.8
ASTHMA           13.4           0.3              0.5
DIABETES         8.8            0.7              0.3
FLUSHOT          35.8           2.5              -0.8
HLTHCOV          86.0           0.8              -1.3
PHYMO            18.7           2.2              0.9

Differences between PSA and Unadjusted and Adjusted Estimates for a Maximum Five Callbacks

Variable         PSA Estimate   Diff Unadj   Diff MM   Diff LCM
GENHLTH
  Excellent      20.7           -0.6         -0.3      -1.2
  Very good      33.1           0.3          -0.1      -1.6
  Good           29.6           -0.1         -0.1      1.0
ALCOHOL          52.8           -0.6         -0.3      -0.8
ASTHMA           13.4           -0.3         -0.1      0.1
DIABETES         8.8            0.9          0.4       0.6
FLUSHOT          35.8           4.2          0.9       -0.1
HLTHCOV          86.0           2.7          0.6       -2.5
PHYMO            18.7           1.9          0.7       -0.2

Estimating the Potential Bias Reduction
• BRFSS data do not exhibit very large nonresponse biases
• Therefore, consider a variable, $Y$, that has maximum nonresponse bias given the BRFSS nonresponse rates
• To do this, we form $Y_g$ = BRFSS response rate for group $g$
• Compute the relative difference between the unadjusted and adjusted estimates and the PSA estimate of the mean of $Y$

Absolute Relative Differences ($|RD_L|$) for Unadjusted and Adjusted Estimators as a Function of Number of Call-backs

No. of Call-Backs   $|RD_{U,L}|$ (%)   $|RD_{MM,L}|$ (%)   $|RD_{LCM,L}|$ (%)
5                   8.8                5.2                 1.4
7                   6.9                4.0                 2.5
9                   5.8                3.4                 2.9
11                  8.8                3.4                 2.9
14                  4.5                2.6                 2.8
15                  4.0                2.3                 2.4

Conclusions
• ECE for 5 call-backs is generally small, but can be moderately high for some characteristics
• The Manifest Model can be employed to reduce ECE
• The Latent Class Model can be employed to reduce total nonresponse bias (later cooperators + HCNR bias)
• Future research should focus on
  – Variable selection
  – Comparisons of MSEs of the estimators
  – Small/medium size sample properties
  – Integration with other post-survey weight adjustments
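
Appendix sketch: unadjusted vs. adjusted estimator

The two estimators on the "Incorporating the Model-based Weights" slide differ only in how the respondent group means are weighted: by each group's share of respondents, or by a model-based estimate of the group's population share. The short Python sketch below illustrates that difference under stated assumptions. The function names, the group labels "A" and "B", the response values, and the `pi_hat` weights are all hypothetical illustration values, not BRFSS data or output of the MM or LCM.

```python
from typing import Dict, List


def group_means(responses: Dict[str, List[float]]) -> Dict[str, float]:
    """Respondent mean of y within each weighting class g."""
    return {g: sum(ys) / len(ys) for g, ys in responses.items()}


def unadjusted_mean(responses: Dict[str, List[float]]) -> float:
    """Weight each group mean by its share of respondents, n_rg / n_r."""
    n_r = sum(len(ys) for ys in responses.values())
    means = group_means(responses)
    return sum(len(ys) / n_r * means[g] for g, ys in responses.items())


def adjusted_mean(responses: Dict[str, List[float]],
                  pi_hat: Dict[str, float]) -> float:
    """Weight each group mean by an externally supplied estimate of pi_g.

    pi_hat stands in for estimates that would come from a fitted
    call-back model; fitting that model is NOT done here.
    """
    if set(pi_hat) != set(responses):
        raise ValueError("pi_hat must cover exactly the respondent groups")
    if abs(sum(pi_hat.values()) - 1.0) > 1e-9:
        raise ValueError("pi_hat must sum to 1")
    means = group_means(responses)
    return sum(pi_hat[g] * means[g] for g in responses)


# Hypothetical 0/1 responses for two weighting classes.
responses = {"A": [1, 1, 0, 1], "B": [0, 1]}
# Hypothetical model-based group proportions (assumed, not estimated).
pi_hat = {"A": 0.5, "B": 0.5}

y_unadj = unadjusted_mean(responses)
y_adj = adjusted_mean(responses, pi_hat)
print(f"unadjusted: {y_unadj:.3f}")  # unadjusted: 0.667
print(f"adjusted:   {y_adj:.3f}")    # adjusted:   0.625
```

In this toy example group A is over-represented among respondents (4 of 6) relative to the assumed population share of one half, so re-weighting pulls the estimate toward group B's lower mean.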