Estimating Phone Service
and Usage Percentages:
How to Weight the Data from a Local,
Dual-Frame Sample Survey
of Cellphone and Landline Telephone Users
in the United States
Thomas M. Guterbock
TomG@virginia.edu
Presented at
AAPOR 2009
Hollywood, FL
May 14, 2009
The Problem
• Dual-frame telephone surveys are becoming more
prevalent in U.S. survey research
– The rising percentages and distinctive demographics of
cellphone-only [CPO] households make it imperative
that sample designs cover them.
– Landline RDD + Cellphone RDD sample frames
• Result: sample data for 3 phone-service segments
– CPO; overlap (dual-phone); landline-only [LLO]
• Problem: what is the correct population
distribution across 3 phone service segments?
Center for Survey Research
University of Virginia
National data? No problem
• National Health Interview Survey [NHIS] data are
the ‘gold standard’
– Uses a very large N, continuous sampling, in-person
mode to establish household phone service.
– NHIS provides fairly current data on cellphone
coverage, percent CPO, phone segment distributions
• NHIS data are available for the U.S. & for four
census regions
– State estimates released in 2009 using CPS + NHIS
• SOLUTION: Weight phone-service segments in
the national sample to NHIS percents for U.S.
What about local studies?
• We cannot assume that the local phone-service segment
distribution is the same as national or regional averages.
• Cellphone penetration and CPO lifestyle adoption vary
considerably across areas.
• Cell penetration is higher in high density areas, metro
areas, high-income areas, flat terrain, near interstates
• CPO percentage varies with age, ethnicity, urbanicity,
landline phone costs
• NHIS: strong phone service variation across regions, states
– Variation within states is probably similar in magnitude
Why not use percents from
the local sample data?
• In a local dual-frame sample, we will directly
observe % CPO in the cell sample, % LLO in the
landline sample.
• But estimation from these observed percents is
problematic for several reasons:
1) If we just combine the two samples, we overlook the
fact that overlap households are double-sampled.
2) It’s not intuitively obvious how to calculate the
percentages for the combined sample from the split
sample results.
3) Cellphone-only cases are substantially
overcounted in a cellphone sample.
• CPOs have different telephone behaviors. They are more likely
than dual-phone users . . .
– To have phone with them
– To have phone turned on
– To accept calls from unknown numbers
4) Cellphone samples are usually kept small
because of higher per-completion cost
• So we can’t just add up the segment counts from the
two samples.
Can we use the local sample data?
• Collected data from the two realized, local
samples surely contain useful information about
local phone-service segments
• Overcounts of CPO and LLO distort these data
• We have to do the math correctly
• IDEA: Estimate the amount of CPO and LLO
overcount in national dual-frame studies, and then
apply an adjustment to the local sample data to
arrive at local estimates for %CPO and %LLO
Overview: A proposed solution
• Develop algebraic solution for combining the two sample
results from a dual-frame design into an overall phone
service segment distribution, assuming equal response
rates.
• Develop algebraic solution for combining the two samples
when response rates are NOT equal
– higher response rates (overcounts) are assumed for CPO and LLO
(compared to overlap)
• Compare 2007 CHIS to 2007 NHIS (West region) to
estimate ‘response rate ratios’ that correspond to the
observed overcount
• Apply these ratios to newly collected dual-frame survey
data from three counties in Virginia
– Result: plausible, locality-specific estimates of phone segments
Key assumptions
• Local phone-service segment distributions vary
– Forcing NHIS segment distributions onto local data
would distort results
• Response rate ratios (rates of overcount) are
constant across surveys
– If fielding and screening procedures are similar
• Sampling variability is ignorable
– In comparison of NHIS to CHIS
– In projection from the local samples to local population
How to combine dual-frame
sample results
(equal response rates)
The universe of telephone households
[Figure: Venn diagram of the cell phone frame (Frame 1) and the RDD frame (Frame 2) within the universe of telephone households (100%). All percentages are from 2007 NHIS data (West region).]
• Cell phones (Frame 1): 81.1% of telephone households
– Cell phone samples include some households that are also in the RDD frame
– Landline-only households are excluded
• RDD (Frame 2): 86.8% of telephone households
– RDD samples cover all landline households
– Cell-phone-only households are excluded
• RDD and cell samples overlap, yield complete coverage
• Three phone-service segments:
– a = CPO (cell only): PaT = .132 (13.2%)
– ab = OVERLAP (cell + landline): PabT = .679 (67.9%)
– b = LLO (landline only): PbT = .189 (18.9%)
• These proportions define the population distribution of segments:
$$P_a^T + P_b^T + P_{ab}^T = 1$$
With equal response rates, cell sample would show:
• CPO as percent of Frame 1: Pa′ = .132/.811 = .163
• OVERLAP as percent of Frame 1: Pab′ = .679/.811 = .837
• Pa′ + Pab′ = 1
With equal response rates, RDD sample would show:
• OVERLAP as percent of Frame 2: Pab″ = .679/.868 = .783
• LLO as percent of Frame 2: Pb″ = .189/.868 = .218
• Pab″ + Pb″ = 1
So, if response rates were equal,
we would have . . .

| Segment | True values, NHIS West 2007 | Observed thru Cell sample | Observed thru RDD sample |
|---|---|---|---|
| CPO | PaT = 13.2% | Pa′ = 16.3% | |
| Overlap | PabT = 67.9% | Pab′ = 83.7% | Pab″ = 78.3% |
| LLO | PbT = 18.9% | | Pb″ = 21.7% |
| Total | 100.0% | 100.0% | 100.0% |
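The arithmetic behind this table can be sketched in a few lines of Python. This is an illustrative sketch, not code from the presentation: the function name `expected_shares_equal_rates` and its argument names are assumptions introduced here, and the inputs are the NHIS West 2007 segment shares quoted above.

```python
def expected_shares_equal_rates(p_cpo, p_overlap, p_llo):
    """Shares each frame's sample would show if all segments responded equally.

    Hypothetical helper: each segment's share is simply rescaled to the part
    of the population that the frame covers.
    """
    cell_frame = p_cpo + p_overlap   # Frame 1 coverage (cell phones)
    rdd_frame = p_overlap + p_llo    # Frame 2 coverage (landline RDD)
    return {
        "cell": {"cpo": p_cpo / cell_frame, "overlap": p_overlap / cell_frame},
        "rdd": {"overlap": p_overlap / rdd_frame, "llo": p_llo / rdd_frame},
    }


# NHIS West 2007 shares: CPO, overlap, LLO
shares = expected_shares_equal_rates(0.132, 0.679, 0.189)
for frame, segments in shares.items():
    for segment, value in segments.items():
        print(f"{frame:4s} {segment:8s} {value:.3f}")
```

The computed values should agree with the table to within about a tenth of a percentage point; any small differences presumably reflect rounding in the published percentages.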
How do we get from observed
percentages to population percents?

| Segment | True values, NHIS West 2007 | Observed thru Cell sample | Observed thru RDD sample |
|---|---|---|---|
| CPO | PaT = ?? | Pa′ = 16.3% | |
| Overlap | PabT = ?? | Pab′ = 83.7% | Pab″ = 78.3% |
| LLO | PbT = ?? | | Pb″ = 21.7% |
| Total | 100.0% | 100.0% | 100.0% |
Formulas for calculating
underlying population distribution
$$P_{ab}^T = \frac{1}{\dfrac{1}{P_{ab}'} + \dfrac{1}{P_{ab}''} - 1}$$
$$P_a^T = \frac{P_{ab}^T}{P_{ab}'} - P_{ab}^T$$
With PabT and PaT evaluated, we have:
$$P_b^T = 1 - P_a^T - P_{ab}^T$$
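The equal-response-rate formulas can be checked numerically. The sketch below is illustrative only: `population_from_observed` and its argument names are hypothetical, and the inputs are the overlap shares that each sample would show under the NHIS West 2007 distribution.

```python
def population_from_observed(pab_cell, pab_rdd):
    """Recover (CPO, overlap, LLO) population shares, assuming equal response rates.

    pab_cell: overlap share observed in the cell sample (Pab')
    pab_rdd:  overlap share observed in the RDD sample (Pab'')
    """
    p_overlap = 1.0 / (1.0 / pab_cell + 1.0 / pab_rdd - 1.0)
    p_cpo = p_overlap / pab_cell - p_overlap
    p_llo = 1.0 - p_cpo - p_overlap
    return p_cpo, p_overlap, p_llo


# Overlap shares implied by the NHIS West 2007 distribution
p_cpo, p_overlap, p_llo = population_from_observed(0.679 / 0.811, 0.679 / 0.868)
print(f"CPO {p_cpo:.3f}  Overlap {p_overlap:.3f}  LLO {p_llo:.3f}")
```

Feeding in the two observed overlap shares should return the starting distribution (.132, .679, .189) up to floating-point error, which is a quick way to confirm that the formulas invert the frame-coverage arithmetic.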
Combining dual-frame sample
results when response rates
are not equal
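Three segments, four response rates
[Figure: Venn diagram of the cell phone and RDD frames with segments a (CPO), ab (overlap), and b (LLO), annotated with four response rates.]
• Cell sample response rate for CPOs: ra
• Cell sample response rate for overlap: rab′
• RDD sample response rate for overlap: rab″
• RDD sample response rate for LLOs: rb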
4 response rates,
2 response rate ratios
• Reduction in base response for dual-phone in the
cell sample is:
$$r_1 = \frac{r_{ab}'}{r_a}$$
– This is the ‘response rate ratio’ that applies to the
cellphone sample.
• Reduction in base response for dual-phone in the
RDD sample is:
$$r_2 = \frac{r_{ab}''}{r_b}$$
– This is the response rate ratio for the RDD sample.
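A short derivation sketch may help connect these ratios to what each sample shows. It rests on an assumption made explicit here: each segment appears in a sample in proportion to its population share multiplied by its response rate. Under that assumption, the overlap shares observed in the two samples are

$$P_{ab}' = \frac{r_{ab}' P_{ab}^T}{r_a P_a^T + r_{ab}' P_{ab}^T} = \frac{r_1 P_{ab}^T}{P_a^T + r_1 P_{ab}^T}, \qquad P_{ab}'' = \frac{r_{ab}'' P_{ab}^T}{r_b P_b^T + r_{ab}'' P_{ab}^T} = \frac{r_2 P_{ab}^T}{P_b^T + r_2 P_{ab}^T}$$

so only the ratios, not the four individual response rates, affect the observed percentages. Setting both ratios to 1 recovers the equal-response-rate case.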
It follows that . . .
$$r_{ab}' = r_1 r_a; \qquad r_{ab}'' = r_2 r_b.$$
• And our expressions for calculating true
population phone service segments are modified
by incorporating the response rate ratios:
$$P_{ab}^T = \frac{1}{\dfrac{r_1}{P_{ab}'} + \dfrac{r_2}{P_{ab}''} + 1 - r_1 - r_2}$$
$$P_a^T = \frac{r_1 P_{ab}^T}{P_{ab}'} - r_1 P_{ab}^T$$
$$P_b^T = 1 - P_a^T - P_{ab}^T$$
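A minimal Python sketch of these modified expressions follows. The function name `estimate_segments` is hypothetical, and the ratios used in the check (0.5 and 0.8) are made-up values chosen only to exercise the code. The check simulates observed shares on the assumption that each segment appears in a sample in proportion to its population share times its response rate, then recovers the population shares.

```python
def estimate_segments(pab_cell, pab_rdd, r1=1.0, r2=1.0):
    """Estimate (CPO, overlap, LLO) population shares from observed overlap shares.

    pab_cell, pab_rdd: overlap shares observed in the cell and RDD samples
    r1, r2: response rate ratios for the cell and RDD samples
    """
    p_overlap = 1.0 / (r1 / pab_cell + r2 / pab_rdd + 1.0 - r1 - r2)
    p_cpo = r1 * p_overlap / pab_cell - r1 * p_overlap
    p_llo = 1.0 - p_cpo - p_overlap
    return p_cpo, p_overlap, p_llo


# Round-trip check with illustrative (not estimated) ratios
true_cpo, true_overlap, true_llo = 0.132, 0.679, 0.189
r1, r2 = 0.5, 0.8
pab_cell = r1 * true_overlap / (true_cpo + r1 * true_overlap)
pab_rdd = r2 * true_overlap / (true_llo + r2 * true_overlap)
recovered = estimate_segments(pab_cell, pab_rdd, r1, r2)
print([round(p, 3) for p in recovered])
```

With both ratios left at their default of 1.0, the function reduces to the equal-response-rate formulas.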
How to calculate
response rate ratios
• Now assume that we have observed results from a
dual-frame phone survey.
• We also know the true population distribution.
• We can calculate the response rate ratios:
$$r_1 = \frac{P_a^T \, P_{ab}'}{P_{ab}^T - P_{ab}^T \, P_{ab}'}$$
$$r_2 = \frac{P_b^T \, P_{ab}''}{P_{ab}^T - P_{ab}^T \, P_{ab}''}$$
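As a sketch of where these expressions come from, one can start from the expression for the CPO share with response rate ratios incorporated and solve for the ratio:

$$P_a^T = \frac{r_1 P_{ab}^T}{P_{ab}'} - r_1 P_{ab}^T = r_1 P_{ab}^T \, \frac{1 - P_{ab}'}{P_{ab}'} \quad\Longrightarrow\quad r_1 = \frac{P_a^T \, P_{ab}'}{P_{ab}^T \,(1 - P_{ab}')}$$

The same steps applied to the LLO share, which satisfies $P_b^T = r_2 P_{ab}^T / P_{ab}'' - r_2 P_{ab}^T$ by symmetry, give the expression for $r_2$.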
Deriving response rate ratios
by comparing
CHIS 2007 to NHIS
CHIS 2007
California Health Interview Survey

| Segment | True values, NHIS West 2007 | Observed thru Cell sample | Observed thru RDD sample |
|---|---|---|---|
| CPO | PaT = 13.2% | Pa′ = 34.6% (≠ 16.3%) | |
| Overlap | PabT = 67.9% | Pab′ = 65.4% | Pab″ = 68.3% |
| LLO | PbT = 18.9% | | Pb″ = 32.7% (≠ 21.7%) |
| Total | 100.0% | 100.0% | 100.0% |
From these data
we can evaluate r1 and r2
$$r_1 = \frac{P_a^T \, P_{ab}'}{P_{ab}^T - P_{ab}^T \, P_{ab}'} = .368$$
In the cellphone sample,
overlap response rate
is only 37% of CPO rate.
$$r_2 = \frac{P_b^T \, P_{ab}''}{P_{ab}^T - P_{ab}^T \, P_{ab}''} = .598$$
In the RDD sample,
overlap response rate
is about 60% of LLO rate.
• Overcount of CPOs is greater than overcount of LLOs.
This shows: many dual-phone users still use cellphone
as a secondary device.
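The evaluation of r1 and r2 can be reproduced approximately from the rounded percentages shown for CHIS 2007 and NHIS West 2007. This is a sketch under stated assumptions: the function name `response_rate_ratios` is hypothetical, and because the inputs are rounded slide values the results may differ from .368 and .598 in the third decimal.

```python
def response_rate_ratios(p_cpo, p_overlap, p_llo, pab_cell, pab_rdd):
    """Compute (r1, r2) from a known population distribution and observed overlap shares."""
    r1 = p_cpo * pab_cell / (p_overlap - p_overlap * pab_cell)
    r2 = p_llo * pab_rdd / (p_overlap - p_overlap * pab_rdd)
    return r1, r2


# Population shares: NHIS West 2007. Observed overlap shares: CHIS 2007.
r1, r2 = response_rate_ratios(0.132, 0.679, 0.189, pab_cell=0.654, pab_rdd=0.683)
print(f"r1 = {r1:.3f}, r2 = {r2:.3f}")
```

Values below 1 mean that dual-phone households respond at a lower rate than single-service households in that frame, which is what produces the overcount of CPO and LLO cases.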
Calculating local area estimates
of population phone-service
segment distributions
2008 Prince William County Survey
• Citizen satisfaction survey in large, suburban
county in Northern Virginia
• N = 1,666
• Triple frame design: cellphone, landline RDD, and
directory-listed sample
– Here we combine the landline samples and treat as a
dual-frame design
• Screening questions patterned after those on CHIS
2008 Results for Prince William County, VA

| Segment | True values for PWC | Observed thru Cell sample | Observed thru RDD sample |
|---|---|---|---|
| CPO | PaT = ?? | Pa′ = 40.6% | 0.7% |
| Overlap | PabT = ?? | Pab′ = 59.4% | Pab″ = 88.5% |
| LLO | PbT = ?? | | Pb″ = 10.5% |
| Total | 100.0% | 100.0% | 100.0% |
Apply formulas given above:
$$P_{ab}^T = \frac{1}{\dfrac{r_1}{P_{ab}'} + \dfrac{r_2}{P_{ab}''} + 1 - r_1 - r_2} = .753$$
$$P_a^T = \frac{r_1 P_{ab}^T}{P_{ab}'} - r_1 P_{ab}^T = .190$$
$$P_b^T = 1 - P_a^T - P_{ab}^T = .057$$
Calculations based on:
r1 = .368
r2 = .598
2008 Results for Prince William County, VA

| Segment | True values for PWC | Observed thru Cell sample | Observed thru RDD sample |
|---|---|---|---|
| CPO | PaT = 19.0% | Pa′ = 40.6% | 0.7% |
| Overlap | PabT = 75.3% | Pab′ = 59.4% | Pab″ = 88.5% |
| LLO | PbT = 5.7% | | Pb″ = 10.5% |
| Total | 100.0% | 100.0% | 100.0% |
2008 Albemarle County Survey
• Citizen satisfaction survey
• Suburban and rural county surrounding City of
Charlottesville, VA
• Triple-frame design similar to that of the PWC survey
• Smaller sample size: n = 700
2008 Results for Albemarle County, VA

| Segment | True values for Albemarle | Observed thru Cell sample | Observed thru RDD sample |
|---|---|---|---|
| CPO | PaT = 8.4% | Pa′ = 21.9% | 0.2% |
| Overlap | PabT = 81.4% | Pab′ = 78.1% | Pab″ = 82.7% |
| LLO | PbT = 10.2% | | Pb″ = 17.2% |
| Total | 100.0% | 100.0% | 100.0% |
2008 Chesterfield County Survey
• Citizen satisfaction survey
• Suburban county adjacent to Richmond, VA
• Triple-frame design similar to that of the PWC survey
– Treated as dual frame here
• n = 1600
2008 Results for Chesterfield County, VA

| Segment | True values for Chesterfield | Observed thru Cell sample | Observed thru RDD sample |
|---|---|---|---|
| CPO | PaT = 8.0% | Pa′ = 20.4% | 0.1% |
| Overlap | PabT = 84.8% | Pab′ = 79.6% | Pab″ = 87.6% |
| LLO | PbT = 7.2% | | Pb″ = 12.4% |
| Total | 100.0% | 100.0% | 100.0% |
Contrasting results

| Segment | NHIS | CHIS [= NHIS] | Prince William | Albemarle | Chesterfield |
|---|---|---|---|---|---|
| CPO (PaT) | 13.2% | 13.2% | 19.0% | 8.4% | 8.0% |
| Overlap (PabT) | 67.9% | 67.9% | 75.3% | 81.4% | 84.8% |
| LLO (PbT) | 18.9% | 18.9% | 5.7% | 10.2% | 7.2% |
| Total | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% |
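The three county estimates can be reproduced approximately from the observed overlap shares and the two response rate ratios. The sketch below is illustrative: `estimate_segments` and the `observed` dictionary are names introduced here, the inputs are the rounded percentages reported for each county, and small differences from the table (a tenth of a point or so) are to be expected from that rounding.

```python
R1, R2 = 0.368, 0.598  # response rate ratios derived from CHIS 2007 vs. NHIS West 2007


def estimate_segments(pab_cell, pab_rdd, r1=R1, r2=R2):
    """Estimate (CPO, overlap, LLO) population shares from observed overlap shares."""
    p_overlap = 1.0 / (r1 / pab_cell + r2 / pab_rdd + 1.0 - r1 - r2)
    p_cpo = r1 * p_overlap / pab_cell - r1 * p_overlap
    p_llo = 1.0 - p_cpo - p_overlap
    return p_cpo, p_overlap, p_llo


# (overlap share in cell sample, overlap share in RDD sample), from the 2008 surveys
observed = {
    "Prince William": (0.594, 0.885),
    "Albemarle": (0.781, 0.827),
    "Chesterfield": (0.796, 0.876),
}

estimates = {county: estimate_segments(*shares) for county, shares in observed.items()}
for county, (cpo, overlap, llo) in estimates.items():
    print(f"{county:15s} CPO {cpo:6.1%}  Overlap {overlap:6.1%}  LLO {llo:6.1%}")
```

Swapping in updated ratios only requires changing `R1` and `R2`, which matches the idea that the ratios can be re-estimated from more current national surveys.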
Using the estimated segment
distribution to weight the
sample data
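Example: PWC 2008

| Segment | Cell sample: n | Cell sample: % | RDD sample: n | RDD sample: % | Combined sample, unweighted: n | Combined sample, unweighted: % |
|---|---|---|---|---|---|---|
| CPO | 76 | 40.6% | 11 | 0.7% | 87 | 5.3% |
| Overlap | 111 | 59.4% | 1303 | 88.5% | 1414 | 85.4% |
| LLO | | | 154 | 10.5% | 154 | 9.3% |
| Total | 187 | 100.0% | 1468 | 100.0% | 1655 | 100.0% |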
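3-segment weights: PWC 2008

| Segment | Combined sample, unweighted: n | Combined sample, unweighted: % | True values for PWC | Weight | Weighted N | Weighted % |
|---|---|---|---|---|---|---|
| CPO | 87 | 5.3% | 19.0% | 3.61 | 314 | 19.0% |
| Overlap | 1414 | 85.4% | 75.3% | .88 | 1247 | 75.3% |
| LLO | 154 | 9.3% | 5.7% | .61 | 94 | 5.7% |
| Total | 1655 | 100.0% | 100.0% | | 1655 | 100.0% |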
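The weights in this table are each segment's target count divided by its unweighted count. A minimal sketch, assuming the estimated PWC distribution as the target and using hypothetical names (`segment_weights`, `counts`, `targets`):

```python
def segment_weights(counts, targets):
    """Weight per segment = (target share * total n) / unweighted segment count."""
    total_n = sum(counts.values())
    return {seg: targets[seg] * total_n / counts[seg] for seg in counts}


# PWC 2008 combined sample counts and estimated population shares
counts = {"CPO": 87, "Overlap": 1414, "LLO": 154}
targets = {"CPO": 0.190, "Overlap": 0.753, "LLO": 0.057}

weights = segment_weights(counts, targets)
weighted_n = {seg: weights[seg] * counts[seg] for seg in counts}
for seg in counts:
    print(f"{seg:8s} weight {weights[seg]:.2f}  weighted N {weighted_n[seg]:.0f}")
```

Because the target shares sum to 1, the weighted counts sum to the original sample size, so the weighting changes the segment mix without changing the total N.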
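But wait . . . We have 4 segments

| Segment | Cell sample: n | Cell sample: % | RDD sample: n | RDD sample: % | Combined sample, unweighted: n | Combined sample, unweighted: % |
|---|---|---|---|---|---|---|
| CPO | 76 | 40.6% | 11 | 0.7% | 87 | 5.3% |
| Overlap via cell | 111 | 59.4% | | | 111 | 6.7% |
| Overlap via RDD | | | 1303 | 88.5% | 1303 | 78.7% |
| LLO | | | 154 | 10.5% | 154 | 9.3% |
| Total | 187 | 100.0% | 1468 | 100.0% | 1655 | 100.0% |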
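If 2 frames split the overlap equally:

| Segment | Combined sample, unweighted: n | Combined sample, unweighted: % | True values for PWC | Weight | Weighted N | Weighted % |
|---|---|---|---|---|---|---|
| CPO | 87 | 5.3% | 19.0% | 3.61 | 314 | 19.0% |
| Overlap via cell | 111 | 6.7% | 37.7% | 5.62 | 623 | 37.7% |
| Overlap via RDD | 1303 | 78.7% | 37.7% | .48 | 623 | 37.7% |
| LLO | 154 | 9.3% | 5.7% | .61 | 94 | 5.7% |
| Total | 1655 | 100.0% | 100.0% | | 1655 | 100.0% |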
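If overlap-cell segment gets weight = 2

| Segment | Combined sample, unweighted: n | Combined sample, unweighted: % | True values for PWC | Weight | Weighted N | Weighted % |
|---|---|---|---|---|---|---|
| CPO | 87 | 5.3% | 19.0% | 3.61 | 314 | 19.0% |
| Overlap via cell | 111 | 6.7% | 75.3% (both overlap rows combined) | 2.00 | 222 | 13.4% |
| Overlap via RDD | 1303 | 78.7% | | .79 | 1025 | 61.9% |
| LLO | 154 | 9.3% | 5.7% | .61 | 94 | 5.7% |
| Total | 1655 | 100.0% | 100.0% | | 1655 | 100.0% |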
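The two ways of allocating the overlap target between the cell and RDD overlap cases can be compared with a short sketch. Everything named here (`four_segment_weights`, `cell_overlap_share`, `cell_overlap_weight`) is hypothetical and introduced only for illustration; the counts and target shares are the PWC 2008 values.

```python
counts = {"CPO": 87, "Overlap via cell": 111, "Overlap via RDD": 1303, "LLO": 154}
targets = {"CPO": 0.190, "Overlap": 0.753, "LLO": 0.057}
total_n = sum(counts.values())


def four_segment_weights(cell_overlap_share=None, cell_overlap_weight=None):
    """Weights for four segments; give exactly one of the two keyword arguments.

    cell_overlap_share:  fraction of the overlap target assigned to cell-sample cases
    cell_overlap_weight: fixed weight for cell-sample overlap cases; the RDD overlap
                         cases then absorb the rest of the overlap target
    """
    if (cell_overlap_share is None) == (cell_overlap_weight is None):
        raise ValueError("specify exactly one of cell_overlap_share, cell_overlap_weight")
    overlap_target_n = targets["Overlap"] * total_n
    if cell_overlap_weight is not None:
        cell_n = cell_overlap_weight * counts["Overlap via cell"]
    else:
        cell_n = cell_overlap_share * overlap_target_n
    rdd_n = overlap_target_n - cell_n
    return {
        "CPO": targets["CPO"] * total_n / counts["CPO"],
        "Overlap via cell": cell_n / counts["Overlap via cell"],
        "Overlap via RDD": rdd_n / counts["Overlap via RDD"],
        "LLO": targets["LLO"] * total_n / counts["LLO"],
    }


equal_split = four_segment_weights(cell_overlap_share=0.5)
capped = four_segment_weights(cell_overlap_weight=2.0)
for name, w in (("equal split", equal_split), ("cell overlap weight = 2", capped)):
    print(name, {seg: round(val, 2) for seg, val in w.items()})
```

Both schemes hit the same overall overlap target; they differ only in how much the small number of cell-sample overlap cases is weighted up, which is the trade-off the two tables illustrate.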
In Summary . . .
Problem and solution
• We don’t have ‘gold standard’ data by which to weight the
results of a dual-frame telephone survey in a local area
• Weighting to national or state averages might not be
accurate
• We developed needed formulas that relate observed
percentages to underlying population phone segment
distributions
• We calculated ‘response rate ratios’ by comparing CHIS
2007 to regional NHIS 2007 results.
• We applied these ratios to calculate underlying
distributions in three local telephone surveys
Results
• The estimates for three suburban counties in
Virginia are quite different from national phone-segment distributions, and from each other
– Cellphone penetration is higher in Northern Virginia
than in downstate suburbs, or in national estimates
– CPO lifestyle has been adopted by fewer people in the
downstate suburbs
• The estimates can guide weighting of sample data
– But we must use caution in weighting our cellphone
samples up too much
– Larger cellphone samples needed in the future
Future research
• This is a time of rapid change in the telephone
system
– We are just learning how to deal with the weighting
issues in cellphone surveys
• We need to look at optimization of our dual-frame
designs (cf. Hartley 1962)
• Estimates of response rate ratios can be updated
using more current national phone surveys
compared to NHIS
• Results would be strengthened if external local
data were available to validate the estimates