Estimating Phone Service and Usage Percentages: How to Weight the Data from a Local, Dual-Frame Sample Survey of Cellphone and Landline Telephone Users in the United States

Thomas M. Guterbock
TomG@virginia.edu
Center for Survey Research, University of Virginia

Presented at AAPOR 2009
Hollywood, FL
May 14, 2009

The Problem
• Dual-frame telephone surveys are becoming more prevalent in U.S. survey research
  – The rising percentages and distinctive demographics of cellphone-only [CPO] households make it imperative that sample designs cover them.
  – Landline RDD + Cellphone RDD sample frames
• Result: sample data for 3 phone-service segments
  – CPO; overlap (dual-phone); landline-only [LLO]
• Problem: what is the correct population distribution across 3 phone service segments?

National data? No problem
• National Health Interview Survey [NHIS] data are the ‘gold standard’
  – Uses a very large N, continuous sampling, in-person mode to establish household phone service.
  – NHIS provides fairly current data on cellphone coverage, percent CPO, phone segment distributions
• NHIS data are available for the U.S. & for four census regions
  – State estimates released in 2009 using CPS + NHIS
• SOLUTION: Weight phone-service segments in the national sample to NHIS percents for U.S.

What about local studies?
• We cannot assume that the local phone-service segment distribution is the same as national or regional averages.
• Cellphone penetration and CPO lifestyle adoption vary considerably across areas.
• Cell penetration is higher in high density areas, metro areas, high-income areas, flat terrain, near interstates
• CPO percentage varies with age, ethnicity, urbanicity, landline phone costs
• NHIS: strong phone service variation across regions, states
  – Variation within states is probably similar in magnitude

Why not use percents from the local sample data?
• In a local dual-frame sample, we will directly observe % CPO in the cell sample, % LLO in the landline sample.
• But estimation from these observed percents is problematic for several reasons:
  1) If we just combine the two samples, we overlook the fact that overlap households are double-sampled.
  2) It’s not intuitively obvious how to calculate the percentages for the combined sample from the split sample results.

Why not use percents from the local sample data? (continued)
  3) Cellphone-only cases are substantially overcounted in a cellphone sample.
     • CPOs have different telephone behaviors. More likely than dual-phone users . . .
        • To have phone with them
        • To have phone turned on
        • To accept calls from unknown numbers
  4) Cellphone samples are usually kept small because of higher per-completion cost
• So we can’t just add up the segment counts from the two samples.

Can we use the local sample data?
• Collected data from the two realized, local samples surely contain useful information about local phone-service segments
• Overcounts of CPO and LLO distort these data
• We have to do the math correctly
• IDEA: Estimate the amount of CPO and LLO overcount in national dual-frame studies, and then apply an adjustment to the local sample data to arrive at local estimates for %CPO and %LLO

Overview: A proposed solution
• Develop algebraic solution for combining the two sample results from a dual-frame design into an overall phone service segment distribution, assuming equal response rates.
• Develop algebraic solution for combining the two samples when response rates are NOT equal
  – higher response rates (overcounts) are assumed for CPO and LLO (compared to overlap)
• Compare 2007 CHIS to 2007 NHIS (West region) to estimate ‘response rate ratios’ that correspond to the observed overcount
• Apply these ratios to newly collected dual-frame survey data from three counties in Virginia
  – Result: plausible, locality-specific estimates of phone segments

Key assumptions
• Local phone-service segment distributions vary
  – Forcing NHIS segment distributions onto local data would distort results
• Response rate ratios (rates of overcount) are constant across surveys
  – If fielding and screening procedures are similar
• Sampling variability is ignorable
  – In comparison of NHIS to CHIS
  – In projection from the local samples to local population

How to combine dual-frame sample results (equal response rates)

The universe of telephone households
[Figure: two overlapping sample frames that together cover 100% of telephone households.]
• Cell phones (Frame 1): 81.1% of telephone households. Cell phone samples include some households that are also in the RDD frame; landline-only households are excluded.
• RDD (Frame 2): 86.8% of telephone households. RDD samples cover all landline households; cell-phone-only households are excluded.
• RDD and cell samples overlap and yield complete coverage.
• Segments:
  – a: CPO (cell only), 13.2%, PaT = .132
  – ab: Overlap (cell + landline), 67.9%, PabT = .679
  – b: LLO (landline only), 18.9%, PbT = .189
• These proportions define the population distribution of segments:
  \[ P_a^T + P_b^T + P_{ab}^T = 1 \]
All percentages are from 2007 NHIS data (West region).

With equal response rates, cell sample would show:
[Figure: Frame 1 (cell phones, 81.1%), made up of CPO (PaT = .132) and overlap (PabT = .679); LLO (PbT = .189) lies outside this frame.]
• CPO as percent of Frame 1: Pa′ = .132/.811 = .163
• Overlap as percent of Frame 1: Pab′ = .679/.811 = .837
  \[ P_a' + P_{ab}' = 1 \]
All percentages are from 2007 NHIS data (West region).
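The frame arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `frame_shares` and the variable names are assumptions made here, not part of the original slides, and the same conditioning is applied to Frame 2 (the RDD frame) for comparison. The inputs are the rounded NHIS West 2007 proportions quoted on the slides, so results agree with the slide values only to about three decimals.

```python
# Population segment shares (2007 NHIS, West region), as quoted on the slides.
pa_t, pab_t, pb_t = 0.132, 0.679, 0.189  # CPO, overlap, LLO

def frame_shares(exclusive, overlap):
    """Hypothetical helper: what one frame shows under equal response rates.

    Returns (frame coverage, exclusive-segment share within the frame,
    overlap share within the frame).
    """
    coverage = exclusive + overlap
    return coverage, exclusive / coverage, overlap / coverage

# Frame 1 (cell phones): CPO + overlap.
cell_cov, pa_cell, pab_cell = frame_shares(pa_t, pab_t)
# Frame 2 (RDD): LLO + overlap.
rdd_cov, pb_rdd, pab_rdd = frame_shares(pb_t, pab_t)

print(f"Frame 1 coverage {cell_cov:.3f}: Pa' = {pa_cell:.3f}, Pab' = {pab_cell:.3f}")
print(f"Frame 2 coverage {rdd_cov:.3f}: Pb'' = {pb_rdd:.3f}, Pab'' = {pab_rdd:.3f}")
```

Within each frame the two shares sum to 1, which is the constraint the later formulas rely on.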
With equal response rates, RDD sample would show:
[Figure: Frame 2 (RDD, 86.8%), made up of overlap (PabT = .679) and LLO (PbT = .189); CPO (PaT = .132) lies outside this frame.]
• Overlap as percent of Frame 2: Pab″ = .679/.868 = .783
• LLO as percent of Frame 2: Pb″ = .189/.868 = .218
  \[ P_b'' + P_{ab}'' = 1 \]
All percentages are from 2007 NHIS data (West region).

So, if response rates were equal, we would have . . .

           True values         Observed thru    Observed thru
           (NHIS West 2007)    Cell sample      RDD sample
CPO        PaT    13.2%        Pa′   16.3%
Overlap    PabT   67.9%        Pab′  83.7%      Pab″  78.3%
LLO        PbT    18.9%                         Pb″   21.7%
Total            100.0%             100.0%           100.0%

How do we get from observed percentages to population percents?

           True values         Observed thru    Observed thru
           (NHIS West 2007)    Cell sample      RDD sample
CPO        PaT    ??           Pa′   16.3%
Overlap    PabT   ??           Pab′  83.7%      Pab″  78.3%
LLO        PbT    ??                            Pb″   21.7%
Total            100.0%             100.0%           100.0%

Formulas for calculating underlying population distribution
\[ P_{ab}^T = \frac{1}{\dfrac{1}{P_{ab}'} + \dfrac{1}{P_{ab}''} - 1} \]
\[ P_a^T = \frac{P_{ab}^T}{P_{ab}'} - P_{ab}^T \]
With PabT and PaT evaluated, we have:
\[ P_b^T = 1 - P_a^T - P_{ab}^T \]

Combining dual-frame sample results when response rates are not equal

Three segments, four response rates
[Figure: the three segments a (CPO), ab (overlap), and b (LLO) across the cell phone and RDD frames.]
• Cell sample response rate for CPOs: ra
• Cell sample response rate for overlap: rab′
• RDD sample response rate for LLOs: rb
• RDD sample response rate for overlap: rab″

4 response rates, 2 response rate ratios
• Reduction in base response for dual-phone in the cell sample is:
  \[ r_1 = \frac{r_{ab}'}{r_a} \]
  – This is the ‘response rate ratio’ that applies to the cellphone sample.
• Reduction in base response for dual-phone in the RDD sample is:
  \[ r_2 = \frac{r_{ab}''}{r_b} \]
  – This is the response rate ratio for the RDD sample.

It follows that . . .
\[ r_{ab}' = r_1 (r_a); \qquad r_{ab}'' = r_2 (r_b). \]
• And our expressions for calculating true population phone service segments are modified by incorporating the response rate ratios:
\[ P_{ab}^T = \frac{1}{\dfrac{r_1}{P_{ab}'} + \dfrac{r_2}{P_{ab}''} + 1 - r_1 - r_2} \]
\[ P_a^T = \frac{r_1 P_{ab}^T}{P_{ab}'} - r_1 (P_{ab}^T) \]
\[ P_b^T = 1 - P_a^T - P_{ab}^T \]

How to calculate response rate ratios
• Now assume that we have observed results from a dual-frame phone survey.
• We also know the true population distribution.
• We can calculate the response rate ratios:
\[ r_1 = \frac{(P_a^T)\, P_{ab}'}{P_{ab}^T - (P_{ab}^T)\, P_{ab}'} \]
\[ r_2 = \frac{(P_b^T)\, P_{ab}''}{P_{ab}^T - (P_{ab}^T)\, P_{ab}''} \]

Deriving response rate ratios by comparing CHIS 2007 to NHIS

CHIS 2007 (California Health Interview Survey)

           True values         Observed thru              Observed thru
           (NHIS West 2007)    Cell sample                RDD sample
CPO        PaT    13.2%        Pa′   34.6%  (≠ 16.3%)
Overlap    PabT   67.9%        Pab′  65.4%                Pab″  68.3%
LLO        PbT    18.9%                                   Pb″   32.7%  (≠ 21.7%)
Total            100.0%             100.0%                     100.0%

From these data we can evaluate r1 and r2
\[ r_1 = \frac{(P_a^T)\, P_{ab}'}{P_{ab}^T - (P_{ab}^T)\, P_{ab}'} = .368 \]
In the cellphone sample, overlap response rate is only 37% of CPO rate.
\[ r_2 = \frac{(P_b^T)\, P_{ab}''}{P_{ab}^T - (P_{ab}^T)\, P_{ab}''} = .598 \]
In the RDD sample, overlap response rate is about 60% of LLO rate.
• Overcount of CPOs is greater than overcount of LLOs. This shows: many dual-phone users still use cellphone as a secondary device.
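The response rate ratio formulas can be evaluated directly from the CHIS and NHIS figures quoted above. The following is a minimal sketch, not the author's own code; the function name `response_rate_ratios` and its argument names are assumptions introduced here for illustration. Because the inputs are the rounded percentages shown on the slides, the results match the reported .368 and .598 only approximately.

```python
def response_rate_ratios(pa_t, pab_t, pb_t, pab_cell, pab_rdd):
    """Hypothetical helper implementing the two ratio formulas.

    pa_t, pab_t, pb_t : true population shares (CPO, overlap, LLO)
    pab_cell          : overlap share observed in the cell sample (Pab')
    pab_rdd           : overlap share observed in the RDD sample (Pab'')
    """
    r1 = pa_t * pab_cell / (pab_t - pab_t * pab_cell)
    r2 = pb_t * pab_rdd / (pab_t - pab_t * pab_rdd)
    return r1, r2

# True values: NHIS West 2007. Observed values: CHIS 2007.
r1, r2 = response_rate_ratios(
    pa_t=0.132, pab_t=0.679, pb_t=0.189,
    pab_cell=0.654, pab_rdd=0.683,
)
print(f"r1 is about {r1:.2f}, r2 is about {r2:.2f}")
```

Under the equal-response-rate assumption both ratios would equal 1; values well below 1 indicate how strongly dual-phone households are undercounted relative to CPO and LLO households in each sample.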
Calculating local area estimates of population phone-service segment distributions

2008 Prince William County Survey
• Citizen satisfaction survey in large, suburban county in Northern Virginia
• N = 1,666
• Triple frame design: cellphone, landline RDD, and directory-listed sample
  – Here we combine the landline samples and treat as a dual-frame design
• Screening questions patterned after those on CHIS

2008 Results for Prince William County, VA

                           Observed thru    Observed thru
                           Cell sample      RDD sample
CPO        PaT             Pa′   40.6%             0.7%
Overlap    PabT            Pab′  59.4%      Pab″  88.5%
LLO        PbT                              Pb″   10.5%
Total            100.0%         100.0%           100.0%

2008 Results for Prince William County, VA

           True values     Observed thru    Observed thru
           for PWC         Cell sample      RDD sample
CPO        PaT    ??       Pa′   40.6%             0.7%
Overlap    PabT   ??       Pab′  59.4%      Pab″  88.5%
LLO        PbT    ??                        Pb″   10.5%
Total            100.0%         100.0%           100.0%

Apply formulas given above:
\[ P_{ab}^T = \frac{1}{\dfrac{r_1}{P_{ab}'} + \dfrac{r_2}{P_{ab}''} + 1 - r_1 - r_2} = .753 \]
\[ P_a^T = \frac{r_1 P_{ab}^T}{P_{ab}'} - r_1 (P_{ab}^T) = .190 \]
\[ P_b^T = 1 - P_a^T - P_{ab}^T = .057 \]
Calculations based on: r1 = .368, r2 = .598

2008 Results for Prince William County, VA

           True values     Observed thru    Observed thru
           for PWC         Cell sample      RDD sample
CPO        PaT   19.0%     Pa′   40.6%             0.7%
Overlap    PabT  75.3%     Pab′  59.4%      Pab″  88.5%
LLO        PbT    5.7%                      Pb″   10.5%
Total            100.0%         100.0%           100.0%

2008 Albemarle County Survey
• Citizen satisfaction survey
• Suburban and rural county surrounding City of Charlottesville, VA
• Similar triple-frame design as in PWC survey
• Smaller sample size: n = 700

2008 Results for Albemarle County, VA

                           Observed thru    Observed thru
                           Cell sample      RDD sample
CPO        PaT             Pa′   21.9%             0.2%
Overlap    PabT            Pab′  78.1%      Pab″  82.7%
LLO        PbT                              Pb″   17.2%
Total            100.0%         100.0%           100.0%

2008 Results for Albemarle County, VA

           True values     Observed thru    Observed thru
           for Albemarle   Cell sample      RDD sample
CPO        PaT    8.4%     Pa′   21.9%             0.2%
Overlap    PabT  81.4%     Pab′  78.1%      Pab″  82.7%
LLO        PbT   10.2%                      Pb″   17.2%
Total            100.0%         100.0%           100.0%

2008 Chesterfield County Survey
• Citizen satisfaction survey
• Suburban county adjacent to Richmond, VA
• Similar triple-frame design as in PWC survey
  – Treated as dual frame here
• n = 1600

2008 Results for Chesterfield County, VA

                           Observed thru    Observed thru
                           Cell sample      RDD sample
CPO        PaT             Pa′   20.4%             0.1%
Overlap    PabT            Pab′  79.6%      Pab″  87.6%
LLO        PbT                              Pb″   12.4%
Total            100.0%         100.0%           100.0%

2008 Results for Chesterfield County, VA

           True values        Observed thru    Observed thru
           for Chesterfield   Cell sample      RDD sample
CPO        PaT    8.0%        Pa′   20.4%             0.1%
Overlap    PabT  84.8%        Pab′  79.6%      Pab″  87.6%
LLO        PbT    7.2%                         Pb″   12.4%
Total            100.0%            100.0%           100.0%

Contrasting results

                   NHIS      CHIS [= NHIS]   Prince William   Albemarle   Chesterfield
CPO      PaT       13.2%     13.2%           19.0%             8.4%        8.0%
Overlap  PabT      67.9%     67.9%           75.3%            81.4%       84.8%
LLO      PbT       18.9%     18.9%            5.7%            10.2%        7.2%
Total             100.0%    100.0%          100.0%           100.0%      100.0%

Using the estimated segment distribution to weight the sample data

Example: PWC 2008

           Observed thru     Observed thru     Combined sample
           cell sample       RDD sample        unweighted
CPO          76   40.6%        11    0.7%        87    5.3%
Overlap     111   59.4%      1303   88.5%      1414   85.4%
LLO                           154   10.5%       154    9.3%
Total       187  100.0%      1468  100.0%      1655  100.0%

3-segment weights: PWC 2008

           Combined sample    True values    Weight    Weighted N
           unweighted         for PWC
CPO          87    5.3%        19.0%         3.61       314   19.0%
Overlap    1414   85.4%        75.3%          .88      1247   75.3%
LLO         154    9.3%         5.7%          .61        94    5.7%
Total      1655  100.0%       100.0%                   1655  100.0%

But wait . . .
We have 4 segments

                    Observed thru     Observed thru     Combined sample
                    cell sample       RDD sample        unweighted
CPO                   76   40.6%        11    0.7%        87    5.3%
Overlap via cell     111   59.4%                         111    6.7%
Overlap via RDD                       1303   88.5%      1303   78.7%
LLO                                    154   10.5%       154    9.3%
Total                187  100.0%      1468  100.0%      1655  100.0%

If 2 frames split the overlap equally:

                    Combined sample    True values    Weight    Weighted N
                    unweighted         for PWC
CPO                   87    5.3%        19.0%         3.61       314   19.0%
Overlap via cell     111    6.7%        37.7%         5.62       623   37.7%
Overlap via RDD     1303   78.7%        37.7%          .48       623   37.7%
LLO                  154    9.3%         5.7%          .61        94    5.7%
Total               1655  100.0%       100.0%                   1655  100.0%

If overlap-cell segment gets weight = 2

                    Combined sample    True values    Weight    Weighted N
                    unweighted         for PWC
CPO                   87    5.3%        19.0%         3.61       314   19.0%
Overlap via cell     111    6.7%        75.3%         2.00       222   13.4%
Overlap via RDD     1303   78.7%       (combined)      .79      1025   61.9%
LLO                  154    9.3%         5.7%          .61        94    5.7%
Total               1655  100.0%       100.0%                   1655  100.0%

In Summary . . .

Problem and solution
• We don’t have ‘gold standard’ data by which to weight the results of a dual-frame telephone survey in a local area
• Weighting to national or state averages might not be accurate
• We developed needed formulas that relate observed percentages to underlying population phone segment distributions
• We calculated ‘response rate ratios’ by comparing CHIS 2007 to regional NHIS 2007 results.
• We applied these ratios to calculate underlying distributions in three local telephone surveys

Results
• The estimates for three suburban counties in Virginia are quite different from national phone-segment distributions, and from each other
  – Cellphone penetration is higher in Northern Virginia than in downstate suburbs, or in national estimates
  – CPO lifestyle has been adopted by fewer people in the downstate suburbs
• The estimates can guide weighting of sample data
  – But we must use caution in weighting our cellphone samples up too much
  – Larger cellphone samples needed in the future

Future research
• This is a time of rapid change in the telephone system
  – We are just learning how to deal with the weighting issues in cellphone surveys
• We need to look at optimization of our dual-frame designs (cf. Hartley 1962)
• Estimates of response rate ratios can be updated using more current national phone surveys compared to NHIS
• Results would be strengthened if external local data were available to validate the estimates
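The local-area calculation and the 3-segment weighting described in this talk can be sketched end to end in Python. This is an illustrative sketch under stated assumptions, not the author's implementation: the function names `estimate_segments` and `segment_weights` are hypothetical, the inputs are the rounded Prince William County percentages and counts from the slides, and the results therefore agree with the reported estimates (.190, .753, .057) and weights (3.61, .88, .61) only to within rounding.

```python
def estimate_segments(pab_cell, pab_rdd, r1=1.0, r2=1.0):
    """Hypothetical helper: population shares (CPO, overlap, LLO).

    pab_cell, pab_rdd : overlap shares observed in the cell and RDD samples
    r1, r2            : response rate ratios (1.0 means equal response rates)
    """
    pab = 1.0 / (r1 / pab_cell + r2 / pab_rdd + 1.0 - r1 - r2)
    pa = r1 * pab / pab_cell - r1 * pab
    pb = 1.0 - pa - pab
    return pa, pab, pb

def segment_weights(counts, targets):
    """Hypothetical helper: weight = target share / unweighted sample share."""
    n = sum(counts.values())
    return {seg: targets[seg] / (counts[seg] / n) for seg in counts}

# Prince William County 2008, using r1 and r2 derived from CHIS vs. NHIS.
pa, pab, pb = estimate_segments(0.594, 0.885, r1=0.368, r2=0.598)

# Unweighted combined-sample counts from the PWC example.
counts = {"CPO": 87, "Overlap": 1414, "LLO": 154}
targets = {"CPO": pa, "Overlap": pab, "LLO": pb}
weights = segment_weights(counts, targets)
weighted_n = {seg: weights[seg] * counts[seg] for seg in counts}

for seg in counts:
    print(f"{seg}: target {targets[seg]:.3f}, weight {weights[seg]:.2f}, "
          f"weighted N {weighted_n[seg]:.0f}")
```

Calling `estimate_segments` with the default `r1 = r2 = 1.0` reduces to the equal-response-rate formulas, so the same function covers both cases.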