Invited Lunchtime Session: Biemer

A Latent Class Call-back Model
for Survey Nonresponse
Paul P. Biemer
RTI International and UNC-CH
Michael W. Link
Centers for Disease Control and Prevention
Outline
• Motivation for the study
• Early cooperator effects (ECE) in the Behavioral
Risk Factor Surveillance System (BRFSS)
• Call-back models
– Manifest and latent
• Model extensions
• Application to the BRFSS
• Results
• Summary and conclusions
Terms and definitions
• Cooperators = units that will eventually respond
at some request or call-back
• Non-cooperators (also called hardcore
nonrespondents) = units that will not respond to
any call-back
Terms and definitions (cont’d)
• Early cooperators = cooperators that respond at
early calls (say, 5 or fewer)
• Later cooperators = Cooperators that respond at
later calls (say, 6 or more)
• Early cooperator effect (ECE) = expected
difference in estimates based on early vs. early +
later cooperators (say, $E(\bar{y}_5 - \bar{y})$)
Response rates as a function of number of call attempts
[Figure: series Int, Ref and NC plotted against number of call attempts (1-5, 6-7, 8-9, 10-11, 12-14, 15+); y-axis from 0 to 60]
Illustration 1: Have you ever been told by a doctor, nurse
or other health professional that you had asthma?
Number of call attempts | Percent “yes”
1-5 | 13.8
1-15+ | 13.4
Small ECE → a maximum of 5 calls is adequate
Illustration 2: During the past 12 months, have
you had a flu shot?
Number of call attempts | Percent “yes”
1-5 | 38.3
1-15+ | 35.8
Larger ECE → a maximum of 5 call attempts may be
biasing
Could consider other definitions of “early
cooperator.”
Why study ECE?
• Effort (and costs) could be saved if ECE is small
• If ECE is not small, adjustments may be applied
to reduce it
• May need to adjust for hardcore nonrespondents
(HCNRs), not only for later cooperators
What adjustments can be applied to reduce the
ECE?
• Nonresponse adjustments
– Requires characteristics of nonrespondents
– Lack of information a limitation for some surveys
• Post-stratification adjustments
– Requires known target population totals within adjustment
cells
– Variables limited to those available externally
• Call-back model adjustments
– Assumes response propensity is a function of the level of effort
required to obtain a response and of grouping variables
– Related to work by Drew and Fuller (1980), Politz and
Simmons (1949), and others
ECE in the BRFSS
Survey details
• One of the largest RDD surveys in the world
• Estimates the prevalence of risk behaviors and
preventive health practices
• Monthly, state-based, cross-sectional survey
• Target population is adults in telephone households
• Data source: 2004 survey with ~300,000
interviews
ECE in the BRFSS (cont’d)
• Early cooperator defined as responding within 5
or fewer call attempts
• Examined differences in
– demographic characteristics
– 10 selected health characteristics overall and by
demographic domain
• ECE estimated by $\bar{y}_5 - \bar{y}$
• Data weighted by base weights only
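The ECE estimate can be sketched in a few lines of code. This is a minimal illustration, assuming cooperator-level records that hold a 0/1 response, the call attempt at which the unit responded, and its base weight; the function names `weighted_mean` and `estimate_ece` and the toy data are hypothetical, not part of the BRFSS processing. Only cooperators enter: the first mean uses those responding within the cutoff, the second uses all cooperators.

```python
def weighted_mean(values, weights):
    """Weighted mean of `values` using `weights` (e.g., base weights)."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def estimate_ece(y, num_calls, base_weights, cutoff=5):
    """Hypothetical helper: early-cooperator mean minus all-cooperator mean.

    y            -- 0/1 indicators (or other numeric values) for cooperators
    num_calls    -- call attempt at which each cooperator responded
    base_weights -- base weights (no nonresponse or post-stratification)
    cutoff       -- maximum number of calls defining an early cooperator
    """
    early = [(v, w) for v, c, w in zip(y, num_calls, base_weights) if c <= cutoff]
    if not early:
        raise ValueError("no early cooperators at this cutoff")
    y_early = weighted_mean([v for v, _ in early], [w for _, w in early])
    y_all = weighted_mean(y, base_weights)
    return y_early - y_all


# Toy data: six cooperators, three of whom responded within 5 calls.
ece = estimate_ece(y=[1, 0, 1, 1, 0, 0],
                   num_calls=[1, 2, 5, 7, 9, 15],
                   base_weights=[1, 1, 1, 1, 1, 1])
```

Multiplying the result by 100 expresses it in percentage points, the scale used for the ECE values in the tables that follow.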
Typical Values of ECE
(ECE in percentage points)
Domain | General health (excellent) | Asthma | Drink alcohol | Flu shot
Prevalence | 21% | 13% | 53% | 36%
Total | 1.2 | 0.3 | -2.2 | 2.6
Male | 1.3 | 0.1 | -1.7 | 2.9
Female | 1.1 | 0.4 | -2.0 | 2.2
White, non-Hispanic | 1.6 | 0.1 | -2.7 | 2.4
Black, non-Hispanic | 2.5 | 0.6 | -2.2 | 1.1
Hispanic | -0.7 | 1.5 | -0.9 | 1.4
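Typical Values of ECE (cont’d)
(ECE in percentage points)
Domain | General health | Asthma | Drink alcohol | Influenza shot
Education: < High school | 1.1 | 1.4 | -2.6 | 2.9
Education: High school | 1.4 | 0.1 | -2.5 | 2.7
Education: > High school | 1.0 | 0.3 | -1.7 | 2.4
Number of adults: One | 2.1 | 0.1 | -2.9 | 3.1
Number of adults: Two | 1.1 | 0.2 | -2.0 | 2.5
Number of adults: Three or more | 0.5 | 0.9 | -1.8 | 1.4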
Summary of the Results
• Early cooperators are different from later
cooperators on many dimensions
• For most characteristics ECE is relatively small
– Less than 3 percentage points at aggregate level
– Rarely more than 3 points for domains
• For some characteristics, ECE may be important
• Other definitions of ECE also considered
Hardcore Nonresponse Bias
• Hardcore nonrespondents (HCNRs) = units that will not
respond under the current survey protocol, no
matter how many call-backs are made
• ECE does not include the bias due to hardcore
nonrespondents
• Total nonresponse bias = Bias due to cooperators
who did not respond + bias due to hardcore
nonrespondents
• Adjusting for ECE may not remove the bias due to
HCNRs
Call-back Models for Adjusting for ECE and
HCNR Bias
• General idea
– Estimate the response propensity for subgroups of the
population
– Response propensity is modeled as a function of the level of
effort (LOE) required to obtain a response
• Two models are considered
– Manifest model (MM) – ignores HCNRs
– Latent class model (LCM) – includes HCNRs
• Includes a latent indicator variable to represent the
HCNRs in the population
• Why latent?
Illustration for 5 Call-backs
Group A (high response propensity) | Group B (medium response propensity) | Group C (low response propensity)
11111 | 33111 | 33332
31111 | 33333 | 33333
33111 | 33311 | 33331
11111 | 33332 | 33332
33331 | 31111 | 33333
... | ... | ...
1 = interview; 2 = noninterview; 3 = noncontact
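Each outcome string above encodes one digit per call. A minimal sketch for reducing a string to the pair (LOE of the first interview or noninterview, outcome) that the call-back models work with, i.e. the $l^*$ of the technical details; it assumes, as the illustrated patterns suggest, that an interview or noninterview, once reached, repeats for the remaining calls. `parse_pattern` is a hypothetical helper.

```python
INTERVIEW, NONINTERVIEW, NONCONTACT = 1, 2, 3


def parse_pattern(pattern):
    """Return (l_star, outcome) for an outcome string such as "33111".

    Each character is the outcome at one level of effort:
    1 = interview, 2 = noninterview, 3 = noncontact.
    l_star is the first LOE whose outcome is 1 or 2; an all-3 pattern is
    a noncontact at the maximum LOE, len(pattern).
    """
    codes = [int(ch) for ch in pattern]
    if not codes or any(c not in (INTERVIEW, NONINTERVIEW, NONCONTACT) for c in codes):
        raise ValueError(f"invalid pattern: {pattern!r}")
    for level, code in enumerate(codes, start=1):
        if code in (INTERVIEW, NONINTERVIEW):
            # Assumption: the state is absorbing, so the remaining
            # calls must repeat the same code.
            if any(c != code for c in codes[level:]):
                raise ValueError(f"non-absorbing pattern: {pattern!r}")
            return level, code
    return len(codes), NONCONTACT


# Group A patterns from the illustration.
group_a = [parse_pattern(p) for p in ["11111", "31111", "33111", "11111", "33331"]]
```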
Potential Advantages over Post-Stratification
• Post-stratification adjustments (PSA’s) depend
upon the availability of external benchmarks or
auxiliary data
– Selection of control variables is quite limited
– Target populations also quite limited
– Adjust for “ignorable” nonresponse only
Potential Advantages over Post-Stratification (cont’d)
• Call-back models can rely solely on internal
variables
– Weighting classes can be defined for any variables
collected in the survey
– Can be applied for any target population
– Greater ability to select variables that are highly
correlated with response propensity
– Adjust for “ignorable” and “nonignorable” nonresponse
Modeling Framework
• Simple random sampling
• Survey eligibility is known for all sample
members
• No right censoring
– (i.e., all noncontacts received the maximum LOE)
Extensions to relax these assumptions are
described in the paper
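Incorporating the Model-based Weights
Unadjusted estimator of the mean
$$\bar{y} = \frac{1}{n_r}\sum_{g=1}^{K}\sum_{i=1}^{n_{rg}} y_{gi} = \sum_{g=1}^{K}\frac{n_{rg}}{n_r}\,\bar{y}_{rg}$$
where $n_{rg}$ is the number of respondents in group g, $n_r = \sum_g n_{rg}$, $\bar{y}_{rg}$ is the respondent mean in group g, and the group shares $n_{rg}/n_r$ are based on the sample distribution
Adjusted estimator of the mean
$$\bar{y}_{\pi} = \sum_{g=1}^{K}\hat{\pi}_g\,\bar{y}_{rg}$$
where $\hat{\pi}_g$ is estimated from the model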
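A minimal sketch of the two estimators, assuming the respondent mean and respondent count are already available for each weighting class; the function names, the group values, and the model-based group proportions below are hypothetical and only show how the reweighting works.

```python
def unadjusted_mean(group_means, respondent_counts):
    """Respondent mean: each group weighted by its share n_rg / n_r of respondents."""
    n_r = sum(respondent_counts)
    return sum(m * n for m, n in zip(group_means, respondent_counts)) / n_r


def adjusted_mean(group_means, pi_hat, tol=1e-9):
    """Model-adjusted mean: group respondent means weighted by model estimates pi_hat_g."""
    if abs(sum(pi_hat) - 1.0) > tol:
        raise ValueError("pi_hat must sum to 1")
    return sum(p * m for p, m in zip(pi_hat, group_means))


# Hypothetical example with two weighting classes.
group_means = [0.40, 0.20]      # respondent means ybar_rg
respondent_counts = [300, 100]  # n_rg
pi_hat = [0.5, 0.5]             # hypothetical model estimates of pi_g

y_unadj = unadjusted_mean(group_means, respondent_counts)
y_adj = adjusted_mean(group_means, pi_hat)
```

Here the unadjusted estimate is 0.35 and the adjusted estimate is 0.30: the second group makes up only a quarter of respondents but, under the hypothetical model estimates, half of the population, so reweighting pulls the estimate toward its lower mean.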
Two Models for Estimating $\pi_g$
MM (Manifest Model)
Assumes all nonrespondents would eventually
respond at some LOE (i.e., all nonrespondents have
a positive probability of response)
LCM (Latent Class Model)
Incorporates a zero probability of response for the
hardcore nonrespondents (HCNRs)
Technical Details
Notation
$l = 1,\dots,L$: levels of effort (LOE)
$o_l = 1, 2, 3$: outcome of LOE $l$, where 1 = interview, 2 = noninterview, 3 = noncontact
$l^*$: LOE associated with state $S = 1$ or 2
$g = 1,\dots,K$: grouping variable (weighting class variable)
Notation (cont’d)
$\pi_{l^*,1\mid g}$: probability a person in group g is interviewed at LOE $l^*$
$\pi_{l^*,2\mid g}$: probability a person in group g is noninterviewed at LOE $l^*$
$\pi_{L,3\mid g}$: probability a person in group g is never contacted
$n(l^*,1,g)$: number of sample persons in group g interviewed at LOE $l^*$
$n(l^*,2)$: number of sample persons noninterviewed at LOE $l^*$
$n(L,3)$: number of sample persons never contacted after L (max LOE) attempts
(g is observed only for interviewed persons, so the last two counts are not broken down by g)
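A minimal sketch of how these counts might be tabulated, assuming each call record has already been reduced to a tuple (l*, outcome, g), with g set to None when the unit was not interviewed; `tabulate_counts` and this record layout are hypothetical.

```python
from collections import Counter


def tabulate_counts(records, max_loe):
    """Tabulate n(l*,1,g), n(l*,2) and n(L,3).

    records -- iterable of (l_star, outcome, group) tuples, where outcome is
               1 = interview, 2 = noninterview, 3 = noncontact, and group is
               the weighting class g (required only for interviews).
    max_loe -- L, the maximum level of effort.
    """
    n_interview = Counter()     # keys (l_star, g): n(l*, 1, g)
    n_noninterview = Counter()  # keys l_star:      n(l*, 2)
    n_noncontact = 0            # n(L, 3)
    for l_star, outcome, group in records:
        if outcome == 1:
            if group is None:
                raise ValueError("interviewed records need a group")
            n_interview[(l_star, group)] += 1
        elif outcome == 2:
            n_noninterview[l_star] += 1
        elif outcome == 3:
            # No right censoring: every noncontact received the maximum LOE.
            if l_star != max_loe:
                raise ValueError("noncontacts must have l_star == max_loe")
            n_noncontact += 1
        else:
            raise ValueError(f"unknown outcome: {outcome}")
    return n_interview, n_noninterview, n_noncontact


records = [(1, 1, "A"), (1, 1, "A"), (3, 1, "B"), (2, 2, None), (5, 3, None)]
n_int, n_nonint, n_nc = tabulate_counts(records, max_loe=5)
```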
General Idea – Outcome Patterns for 5 Call-backs
Pattern | Cooperator (X = 2) | HCNR (X = 1)
11111 | $\pi_{1,1\mid g,x=2}$ | 0
31111 | $\pi_{2,1\mid g,x=2}$ | 0
33111 | $\pi_{3,1\mid g,x=2}$ | 0
33311 | $\pi_{4,1\mid g,x=2}$ | 0
33331 | $\pi_{5,1\mid g,x=2}$ | 0
22222 | $\pi_{1,2\mid g,x=2}$ | $\pi_{1,2\mid g,x=1}$
32222 | $\pi_{2,2\mid g,x=2}$ | $\pi_{2,2\mid g,x=1}$
33222 | $\pi_{3,2\mid g,x=2}$ | $\pi_{3,2\mid g,x=1}$
33322 | $\pi_{4,2\mid g,x=2}$ | $\pi_{4,2\mid g,x=1}$
33332 | $\pi_{5,2\mid g,x=2}$ | $\pi_{5,2\mid g,x=1}$
33333 | $\pi_{5,3\mid g,x=2}$ | $\pi_{5,3\mid g,x=1}$
Likelihood for the Manifest Model
$$\log \mathcal{L}(\boldsymbol{\pi}) = \sum_{g,l^*} n(l^*,1,g)\log\left(\pi_g\,\pi_{l^*,1\mid g}\right) + \sum_{l^*} n(l^*,2)\log\left(\sum_{g}\pi_g\,\pi_{l^*,2\mid g}\right) + n(L,3)\log\left(\sum_{g}\pi_g\,\pi_{L,3\mid g}\right)$$
This model is appropriate when
(a) every sample member has a positive probability of
responding at some LOE, or
(b) adjustment for ECE only is desired
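To make the three terms concrete, here is a minimal sketch that evaluates the manifest-model log-likelihood for given parameter values. It only evaluates the function; how the parameters are constrained and maximized is not specified on these slides, so no fitting routine is shown. The dictionary layout (keyed by group and by (LOE, outcome)) and the function name are hypothetical.

```python
import math


def loglik_mm(pi_g, pi_cond, n_interview, n_noninterview, n_noncontact, max_loe):
    """Manifest-model log-likelihood.

    pi_g           -- {g: pi_g}, group proportions summing to 1
    pi_cond        -- {g: {(l, o): pi_{l,o|g}}} with keys (l, 1), (l, 2)
                      for l = 1..L, plus (L, 3)
    n_interview    -- {(l_star, g): n(l*,1,g)}
    n_noninterview -- {l_star: n(l*,2)}
    n_noncontact   -- n(L,3)
    """
    ll = 0.0
    # Interviews: g is observed, so each cell contributes log(pi_g * pi_{l*,1|g}).
    for (l_star, g), n in n_interview.items():
        ll += n * math.log(pi_g[g] * pi_cond[g][(l_star, 1)])
    # Noninterviews: g is not observed, so the probability is summed over groups.
    for l_star, n in n_noninterview.items():
        ll += n * math.log(sum(pi_g[g] * pi_cond[g][(l_star, 2)] for g in pi_g))
    # Noncontacts after the maximum LOE L, also summed over groups.
    if n_noncontact:
        ll += n_noncontact * math.log(
            sum(pi_g[g] * pi_cond[g][(max_loe, 3)] for g in pi_g))
    return ll


# Hypothetical single-call (L = 1) example with two groups.
pi_g = {"A": 0.5, "B": 0.5}
pi_cond = {"A": {(1, 1): 0.6, (1, 2): 0.2, (1, 3): 0.2},
           "B": {(1, 1): 0.2, (1, 2): 0.4, (1, 3): 0.4}}
ll = loglik_mm(pi_g, pi_cond, {(1, "A"): 3, (1, "B"): 1}, {1: 2}, 1, max_loe=1)
```

Maximizing this function over the parameters yields the MM estimates of $\pi_g$; the optimization method is not specified here.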
Likelihood for the Latent Class Model
$$\log \mathcal{L}(\boldsymbol{\pi}) = \sum_{g,l^*} n(l^*,1,g)\log\left(\pi_g\,\pi_{x=2}\,\pi_{l^*,1\mid g,x=2}\right) + \sum_{l^*} n(l^*,2)\log\left(\pi_{x=1}\sum_{g}\pi_g\,\pi_{l^*,2\mid g,x=1} + \pi_{x=2}\sum_{g}\pi_g\,\pi_{l^*,2\mid g,x=2}\right) + n(L,3)\log\left(\pi_{x=1}\sum_{g}\pi_g\,\pi_{L,3\mid g,x=1} + \pi_{x=2}\sum_{g}\pi_g\,\pi_{L,3\mid g,x=2}\right)$$
Introduces a latent variable X, where X = 1 if HCNR and X = 2
otherwise
Appropriate when some sample members have a zero probability of
responding and adjustment for total nonresponse (later cooperators
+ HCNRs) is desired
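A corresponding sketch of the latent class log-likelihood. It assumes, following the outcome-pattern table, that HCNRs (X = 1) have zero interview probability and that the class proportions do not depend on g; with the HCNR proportion set to 0 it reproduces the manifest-model value. Function name and data layout are hypothetical, and as before only evaluation, not estimation, is shown.

```python
import math


def loglik_lcm(pi_g, pi_x1, pi_coop, pi_hcnr,
               n_interview, n_noninterview, n_noncontact, max_loe):
    """Latent class model log-likelihood (X = 1: HCNR, X = 2: cooperator).

    pi_g    -- {g: pi_g}
    pi_x1   -- pi_{x=1}, the HCNR proportion; pi_{x=2} = 1 - pi_x1
    pi_coop -- {g: {(l, o): pi_{l,o|g,x=2}}}
    pi_hcnr -- {g: {(l, o): pi_{l,o|g,x=1}}} for o = 2, 3 only;
               HCNRs have zero interview probability.
    """
    pi_x2 = 1.0 - pi_x1

    def mixed(key):
        # Probability of a nonresponse outcome, summed over groups and classes.
        hcnr = sum(pi_g[g] * pi_hcnr[g].get(key, 0.0) for g in pi_g)
        coop = sum(pi_g[g] * pi_coop[g][key] for g in pi_g)
        return pi_x1 * hcnr + pi_x2 * coop

    ll = 0.0
    # Interviews can only come from cooperators (X = 2).
    for (l_star, g), n in n_interview.items():
        ll += n * math.log(pi_g[g] * pi_x2 * pi_coop[g][(l_star, 1)])
    for l_star, n in n_noninterview.items():
        ll += n * math.log(mixed((l_star, 2)))
    if n_noncontact:
        ll += n_noncontact * math.log(mixed((max_loe, 3)))
    return ll


# Hypothetical single-call (L = 1) example with two groups.
pi_g = {"A": 0.5, "B": 0.5}
pi_coop = {"A": {(1, 1): 0.6, (1, 2): 0.2, (1, 3): 0.2},
           "B": {(1, 1): 0.2, (1, 2): 0.4, (1, 3): 0.4}}
pi_hcnr = {"A": {(1, 2): 0.5, (1, 3): 0.5}, "B": {(1, 2): 0.5, (1, 3): 0.5}}
counts = ({(1, "A"): 3, (1, "B"): 1}, {1: 2}, 1)
ll_no_hcnr = loglik_lcm(pi_g, 0.0, pi_coop, pi_hcnr, *counts, max_loe=1)
ll_half_hcnr = loglik_lcm(pi_g, 0.5, pi_coop, pi_hcnr, *counts, max_loe=1)
```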
Results
Four Estimators were Considered
• Unadjusted estimator
• Estimator using MM estimates of $\pi_g$
• Estimator using LCM estimates of $\pi_g$
• Estimator using CPS estimates of $\pi_g$
– i.e., usual PSA estimator
– treated as the “gold standard”
Comparison of the ECE for a Maximum Five
Callbacks Strategy Before and After MM
Adjustment
Characteristic | Estimate (%) | Unadjusted ECE | Manifest Model ECE
GENHLTH: Excellent | 20.7 | -0.9 | -0.6
GENHLTH: Very good | 33.1 | -0.4 | -0.4
GENHLTH: Good | 29.6 | 0.1 | 0.1
ALCOHOL | 52.8 | -2.2 | -1.8
ASTHMA | 13.4 | 0.3 | 0.5
DIABETES | 8.8 | 0.7 | 0.3
FLUSHOT | 35.8 | 2.5 | -0.8
HLTHCOV | 86.0 | 0.8 | -1.3
PHYMO | 18.7 | 2.2 | 0.9
Differences between PSA and Unadjusted and
Adjusted Estimates for a Maximum Five Callbacks
Characteristic | PSA Estimate | Diff (Unadj) | Diff (MM) | Diff (LCM)
GENHLTH: Excellent | 20.7 | -0.6 | -0.3 | -1.2
GENHLTH: Very good | 33.1 | 0.3 | -0.1 | -1.6
GENHLTH: Good | 29.6 | -0.1 | -0.1 | 1.0
ALCOHOL | 52.8 | -0.6 | -0.3 | -0.8
ASTHMA | 13.4 | -0.3 | -0.1 | 0.1
DIABETES | 8.8 | 0.9 | 0.4 | 0.6
FLUSHOT | 35.8 | 4.2 | 0.9 | -0.1
HLTHCOV | 86.0 | 2.7 | 0.6 | -2.5
PHYMO | 18.7 | 1.9 | 0.7 | -0.2
Estimating the Potential Bias Reduction
• BRFSS data do not exhibit very large
nonresponse biases
• Therefore, consider a variable, Y, that has
maximum nonresponse bias given the BRFSS
nonresponse rates
• To do this, we form $Y_g$ = BRFSS response rate for group g
• Compute the relative difference between each
estimate (unadjusted and adjusted) and the PSA
estimate of the mean of Y
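A minimal sketch of this check with hypothetical numbers: two groups of equal population size whose response rates are 0.8 and 0.4. Because Y equals the group response rate, the high-response group is overrepresented among respondents and the respondent mean overstates the population mean. The relative difference is assumed here to be (estimate - PSA) / PSA in percent, which the slides do not spell out; the function names are hypothetical.

```python
def respondent_mean_of_y(respondents, response_rates):
    """Unadjusted (respondent) mean of Y when Y_g = response rate of group g."""
    n_r = sum(respondents.values())
    return sum(respondents[g] * response_rates[g] for g in respondents) / n_r


def abs_relative_difference(estimate, psa_estimate):
    """|RD| in percent, assuming RD = (estimate - PSA) / PSA."""
    return abs(100.0 * (estimate - psa_estimate) / psa_estimate)


# Hypothetical: 100 sampled units per group, equal population shares.
response_rates = {"A": 0.8, "B": 0.4}
respondents = {"A": 80, "B": 40}   # 100 * response rate in each group
psa_mean = 0.6                     # population mean of Y with equal shares

y_unadj = respondent_mean_of_y(respondents, response_rates)
rd_unadj = abs_relative_difference(y_unadj, psa_mean)
```

Here the respondent mean is 80/120 ≈ 0.667 against a population mean of 0.6, an absolute relative difference of about 11.1%.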
Absolute Relative Differences ($\lvert RD_L \rvert$) for Unadjusted
and Adjusted Estimators as a Function of Number of
Call-backs
No. of call-backs | $\lvert RD_{U,L}\rvert$ (%) | $\lvert RD_{MM,L}\rvert$ (%) | $\lvert RD_{LCM,L}\rvert$ (%)
5 | 8.8 | 5.2 | 1.4
7 | 6.9 | 4.0 | 2.5
9 | 5.8 | 3.4 | 2.9
11 | 8.8 | 3.4 | 2.9
14 | 4.5 | 2.6 | 2.8
15 | 4.0 | 2.3 | 2.4
Conclusions
• ECE for 5 call-backs is generally small, but can be
moderately high for some characteristics
• The Manifest Model can be employed to reduce
ECE
• The Latent Class Model can be employed to
reduce total nonresponse bias (Later
Cooperators + HCNR bias)
• Future research should focus on
– Variable selection
– Comparisons of MSEs of the estimators
– Small/medium size sample properties
– Integration with other post-survey weight adjustments