Using Probability-Based Online Samples to
Calibrate Non-Probability Opt-In Samples
Charles DiSogra, Curtiss L. Cobb, Elisa Chan and J. Michael Dennis
67th Annual Conference of the American Association for Public Opinion Research (AAPOR)
May 19, 2012, Orlando, FL.
© GfK 2012 | Title of presentation | DD. Month 2012
Outline
 Probability and non-probability online panel samples
 Situations for blending panel samples
 Early adopter attitudes
 Technique for calibrating
 Quantitative evaluation of bias (8 examples)
 Conclusions & Future Research
DiSogra, C. et al. Calibrating Non-Probability Internet Samples with Probability Samples Using
Early Adopter Characteristics. In 2011 JSM Proceedings, Survey Methods Section. Alexandria,
VA: American Statistical Association.
Types of Web panels
Probability-based
 Recruited with probability samples (no non-sampled volunteers)
• Area-based, in-person methods
• Random-digit dial (RDD) or dual frame with a cell phone component
• Address-based sampling (ABS)
 Panel members have a known selection probability
 Due to recruitment costs, current panels are of modest size (range 2,000-55,000)
 Used for prevalence estimates and more rigorous survey research
Non-probability-based
 Membership consists of people on the web who volunteer or “opt-in” to join
• Advertisements or e-mail marketing
• Aggregator sites (e.g. surveyclub.com, paidsurveyworld.net, getpaidsurveys.com)
 Convenience samples with unknown selection bias
 Memberships can be large: a million members or many more
 Used widely by market researchers to reach target audiences
A question of accuracy
Accuracy of 2 probability samples (Internet panel and RDD)
compared to 7 non-probability Internet panel samples
Average absolute error for 13 secondary demographics and non-demographics (weighted)
[Bar chart: average absolute error for the 7 non-probability samples and the 2 probability samples]
Source: Yeager, Krosnick, et al., 2011. Comparing the Accuracy of RDD Telephone Surveys and Internet Surveys
Conducted with Probability and Non-Probability Samples. Public Opinion Quarterly. 75:709-747.
55,000+ members
Probability-based ABS recruitment
Recruitment takes place throughout the year
Representative of U.S. adults
Includes:
Adults with no Internet access (24% of adults)
• Provided laptop and free ISP
Cell phone only (30% of adults)
Spanish-language
Extensive profile data maintained on each member
• demographics, attitudes, behaviors, health, media usage, etc.
Samples from the panel are assigned to projects
• e-mail invitations and a link to the online survey questionnaire
Situations for blending panel samples
Q. Why are we blending these two different samples?
A. Finite size of the probability-based panel
Small or rare populations
Some examples:
• Boat owners
• Recent college graduates
• Less-common medical conditions
• Viewers of specific niche cable networks
• Specific race/ethnic groups when large samples are required
Small geographic area samples
Some examples:
• Specific congressional districts
• Small media markets
• ZIP code clusters
Solution to finite probability panel sample size
 Supplement probability sample with opt-in panel cases
Use quota sampling with opt-in cases to minimize demo skews/weights
 Calibrate the combined samples to the probability sample
Use “ancillary information” to minimize bias introduced from opt-in cases
Assumption A: The probability sample has the most accurate answer
Assumption B: The two samples consist of the same demographic
Assumption C: “ancillary information” differentiates the two samples
Assumption D: Weighting on “ancillary information” will align the combined
samples with the probability sample results
What ancillary information?
Early adopter (EA) attitudes
Percent “Strongly agree / Agree”

| Statement | ANES Web Panel | KnowledgePanel (KN) | Opt-In Web Panel A | Opt-In Web Panel B |
|---|---|---|---|---|
| Base (n) | (1,397) | (1,210) | (1,221) | (1,223) |
| I usually try new products before other people do | 26.4 | 24.0 | 44.2* | 41.4* |
| I often try new brands because I like variety and get bored with the same old thing | 36.6 | 34.1 | 52.0* | 54.2* |
| When I shop I look for what is new | 44.5 | 35.7* | 55.2* | 59.0* |
| I like to be the first among my friends and family to try something new | 23.8 | 22.2 | 38.1* | 39.6* |
| I like to tell others about new brands or technology | 51.8 | 45.0* | 60.2* | 62.1* |

* p < .05 Difference compared to ANES Web Panel uses Fisher’s exact test
Completion rates: ANES 65.8%; KN 63.7%; Opt-in A 4.6%; Opt-in B 4.7%
Dennis, J.M., Osborn, L., Semans, K. (2009). Comparison Study: Early Adopter Attitudes and Online Behavior in Probability and Non-Probability Web Panels (Accuracy’s Impact on Research). Palo Alto, CA: Knowledge Networks.
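The significance flags in the table rely on Fisher’s exact test. The sketch below shows how such a test could be run on one cell, using only the Python standard library. It rests on an assumption: the weighted percentages are converted back to whole counts by multiplying by the base n, which may not match the counts used in the original analysis. The function name `fisher_exact_two_sided` is illustrative.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def pmf(k):
        # Hypergeometric probability of k "agree" answers in the first row.
        return comb(row1, k) * comb(row2, col1 - k) / total

    p_obs = pmf(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    tail = sum(p for p in (pmf(k) for k in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))
    return min(1.0, tail)

# "I usually try new products before other people do":
# ANES 26.4% of 1,397 vs. Opt-In Web Panel A 44.2% of 1,221.
# ASSUMPTION: counts are approximated from the reported percentages.
anes_n, optin_n = 1397, 1221
anes_agree = round(anes_n * 0.264)
optin_agree = round(optin_n * 0.442)

p_value = fisher_exact_two_sided(
    anes_agree, anes_n - anes_agree,
    optin_agree, optin_n - optin_agree,
)
print(f"p = {p_value:.3g}")
```

Under these approximated counts the p-value falls far below .05, in line with the asterisk on the 44.2 entry.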
Early adopters by demo group by panel
[Bar chart: early adopters (y-axis 0 to 60) by panel (legend: ANES, KP, KN, Opt-In A, Opt-In B) within demographic group. Race/Ethnicity: White, African American, Hispanic. Gender: Male, Female. Age Group: 18-24, 25-34, 35-44, 45-54, 55+. Education: HS or Less, Some College, BA/BS or More.]
What is calibration weighting?
Combines data from different sources and uses estimates from one source
as “benchmarks” to adjust (calibrate) the data.
Integrates auxiliary information irrespective of relationship to other variables
Rueda et al. 2007
Reduction of bias (non-response, coverage, measurement error)
Kott 2006; Skinner 1999
Efficient for limited time-frames, resources (a lower analyst burden)
Can be used for any variable of interest if:
• differential mode effects are avoided
• opt-in sample uses quotas to control for demos and impact on weights
• identified characteristics differentiate opt-in from probability samples
Rueda, M., et al. (2007). Estimation of the distribution function with calibration methods. J Stat Plan Inference 137(2): 435–448.
Kott, P. (2006). Using calibration weighting to adjust for nonresponse and coverage errors. Survey Methodology, 133–142.
Skinner, C.J. (1999). Calibration weighting and non-sampling errors. Research in Official Statistics, 2, 33-43.
Calibration Steps
Step 1 – Weight the KP probability sample
Step 2 – Combine weighted probability sample + non-probability opt-in sample,
then re-weight to Step 1 benchmarks = Blended sample
Step 3 – Evaluate by comparing KP sample (Step 1) to Blended sample (Step 2)
for 5 EA questions and at least 3 study variables
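The steps above can be sketched in code. This is a minimal illustration, not the authors' production procedure: it assumes raking (iterative proportional fitting) as the reweighting method, uses one demographic variable and one early adopter item, and runs on made-up toy records. The names `weighted_shares`, `rake`, `sex` and `ea1` are hypothetical.

```python
from collections import defaultdict

def weighted_shares(rows, weights, var):
    """Weighted share of each category of `var`."""
    totals = defaultdict(float)
    for row, w in zip(rows, weights):
        totals[row[var]] += w
    grand = sum(totals.values())
    return {cat: t / grand for cat, t in totals.items()}

def rake(rows, weights, targets, max_iter=100, tol=1e-8):
    """Rescale weights until the weighted shares of every variable in
    `targets` match its benchmark shares (iterative proportional fitting)."""
    weights = list(weights)
    for _ in range(max_iter):
        max_gap = 0.0
        for var, target in targets.items():
            current = weighted_shares(rows, weights, var)
            for cat, share in target.items():
                max_gap = max(max_gap, abs(current.get(cat, 0.0) - share))
            weights = [w * target[row[var]] / current[row[var]]
                       for row, w in zip(rows, weights)]
        if max_gap < tol:
            break
    return weights

# Step 1: probability sample with its (toy) weights.
prob = [
    {"sex": "F", "ea1": "agree"}, {"sex": "F", "ea1": "disagree"},
    {"sex": "M", "ea1": "disagree"}, {"sex": "M", "ea1": "disagree"},
]
prob_w = [1.2, 0.8, 1.1, 0.9]

# Opt-in cases enter unweighted and skew toward "agree" on the EA item.
optin = [
    {"sex": "F", "ea1": "agree"}, {"sex": "M", "ea1": "agree"},
    {"sex": "M", "ea1": "agree"}, {"sex": "F", "ea1": "disagree"},
]

# Benchmarks come from the weighted probability sample (Step 1).
benchmarks = {v: weighted_shares(prob, prob_w, v) for v in ("sex", "ea1")}

# Step 2: combine, then reweight the blended file to the benchmarks.
blended = prob + optin
start_w = prob_w + [1.0] * len(optin)
cal_w = rake(blended, start_w, benchmarks)

# Step 3: compare blended estimates with the probability-sample benchmarks.
print(weighted_shares(blended, start_w, "ea1"))  # before calibration
print(weighted_shares(blended, cal_w, "ea1"))    # after calibration
print(benchmarks["ea1"])                         # reference
```

Including the early adopter item among the raking targets is what pulls the blended estimate back toward the probability sample; raking on demographics alone would leave the opt-in skew on that item in place.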
Results before calibration
Step 4 – Select 1-3 EA Qs as calibration variables for raked reweighting
K Panel: n = 105, Deff = 1.7636
Blended: n = 174, Deff = 1.5127
[Bar chart, y-axis 0.0 to 100.0; bar values below]

| Variable | Blended | K Panel |
|---|---|---|
| EA1 Try before | 67.7 | 64.2 |
| EA2 Like variety | 76.0 | 73.6 |
| EA3 What's new | 69.5 | 61.3 |
| EA4 Be first | 59.7 | 52.3 |
| EA5 Tell others | 35.5 | 27.1 |
| SV1 Variable 1 | 60.6 | 55.9 |
| SV2 Variable 2 | 52.8 | 41.9 |
| SV3 Variable 3 | 38.3 | 24.3 |
Results after calibration
Evaluate
 Minimize bias introduced from opt-in non-probability sample
K Panel: n = 105, Deff = 1.7636
Blended: n = 174, Deff = 1.6002
[Bar chart, y-axis 0.0 to 100.0; bar values below]

| Variable | Blended | K Panel |
|---|---|---|
| EA1 Try before | 66.2 | 64.2 |
| EA2 Like variety | 73.4 | 73.6 |
| EA3 What's new | 61.3 | 61.3 |
| EA4 Be first | 47.7 | 52.3 |
| EA5 Tell others | 25.4 | 27.1 |
| SV1 Variable 1 | 54.2 | 55.9 |
| SV2 Variable 2 | 42.2 | 41.9 |
| SV3 Variable 3 | 14.5 | 24.3 |
Analysis comparing calibration results
Four Conditions:
 Weighted probability sample alone (Reference benchmark)
 Weighted non-probability opt-in sample alone (post-stratification only)
 Blend weighted probability sample with unweighted opt-in sample,
then reweight to reference benchmark – NOT CALIBRATED
 Blend weighted probability sample with unweighted opt-in sample,
then reweight to reference benchmark – CALIBRATED
Analysis comparing calibration results
Example: Smoking behavior in a mid-west state
Sample
611 probability sample cases
750 opt-in non-probability sample cases
1,361 combined or “blended” cases
Used 13 items from the study questionnaire
Quantitative benchmarks
Examined:
 Average absolute error in weighted estimates
 Number of items with an error of 2 percentage points or more
 Design Effect [ Deff = Σ w_i² / Σ w_i ]
 Average estimated squared bias (Ghosh-Dastidar et al. 2009)
 Average estimated Mean Squared Error
Ghosh-Dastidar, B., Elliott, M. N., Haviland, A. M., & Karoly, L. A. (2009). Composite Estimates from Incomplete
and Complete Frames for Minimum-MSE Estimation in a Rare Population: An Application to Families with Young
Children. Public Opinion Quarterly, 73 (4), 761-784.
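The first three benchmarks can be sketched in code. This is an illustration under stated assumptions, not the authors' implementation: the estimates and weights are made-up toy values, the function names are hypothetical, and the design effect is written in Kish's general form, n Σ w_i² / (Σ w_i)², which reduces to the slide's Σ w_i² / Σ w_i when the weights are scaled to sum to n. The squared-bias estimator of Ghosh-Dastidar et al. is not reproduced here.

```python
def average_absolute_error(estimates, reference):
    """Mean absolute difference, in percentage points, across items."""
    return sum(abs(e - r) for e, r in zip(estimates, reference)) / len(reference)

def items_with_error_at_least(estimates, reference, threshold=2.0):
    """Number of items whose absolute error is `threshold` points or more."""
    return sum(1 for e, r in zip(estimates, reference) if abs(e - r) >= threshold)

def design_effect(weights):
    """Kish design effect due to unequal weighting."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2

# Toy weighted estimates (percent) for three items.
reference = [20.0, 35.0, 50.0]   # weighted probability sample
blended = [21.0, 38.5, 49.0]     # blended sample

aae = average_absolute_error(blended, reference)
n_large = items_with_error_at_least(blended, reference)

# Toy weights scaled to sum to n = 3, so both Deff forms agree.
weights = [0.5, 1.0, 1.5]
deff = design_effect(weights)
deff_slide_form = sum(w * w for w in weights) / sum(weights)

print(aae, n_large, deff, deff_slide_form)
```

Using the probability sample as the reference follows Assumption A: errors are measured against the weighted probability-sample estimates rather than against an external truth.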
Analysis comparing calibration results
Example: Smoking behavior in a mid-west state

| Measure | Weighted probability sample (Reference) | Weighted opt-in sample | Two samples blended, re-weighted, not calibrated | Two samples blended, re-weighted, calibrated * |
|---|---|---|---|---|
| Number of cases (n) | 611 | 750 | 1,361 | 1,361 |
| Average absolute error | -- | 5.3% | 2.3% | 1.3% |
| No. of items with error of 2+ percentage points | -- | 12 | 7 | 3 |
| Design Effect | 1.87 | 3.48 | 2.16 | 2.10 |
| Avg. est. squared bias | -- | 25.579 | 2.056 | 0.064 |
| Avg. est. Mean Squared Error (MSE) | 3.937 | 28.741 | 3.816 | 1.826 |

* Calibrated using EA1, EA3 and EA5
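A short arithmetic check helps in reading the last two rows of the smoking example. It assumes the reported MSE follows the usual decomposition, MSE = variance + squared bias, which the slides do not state explicitly; the figures are the reported averages of 25.579, 2.056 and 0.064 for squared bias and 3.937, 28.741, 3.816 and 1.826 for MSE. The dictionary keys are illustrative labels.

```python
# Reported averages for the smoking example (probability sample is the reference).
reference_mse = 3.937
results = {
    "weighted opt-in":         {"sq_bias": 25.579, "mse": 28.741},
    "blended, not calibrated": {"sq_bias": 2.056,  "mse": 3.816},
    "blended, calibrated":     {"sq_bias": 0.064,  "mse": 1.826},
}

for name, r in results.items():
    # ASSUMPTION: MSE = variance + squared bias, so variance is the remainder.
    r["variance"] = round(r["mse"] - r["sq_bias"], 3)
    # MSE relative to using the probability sample alone.
    r["mse_vs_reference"] = round(r["mse"] / reference_mse, 2)
    print(f"{name}: variance {r['variance']}, "
          f"MSE ratio vs. reference {r['mse_vs_reference']}")
```

Under that assumption the two blended conditions have nearly the same implied variance (about 1.76), so the drop in MSE from 3.816 to 1.826 comes almost entirely from the reduction in squared bias.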
More examples comparing calibrated results:
Sample sizes

| Study | Probability sample cases | Opt-In Non-probability sample cases | Combined “blended” cases |
|---|---|---|---|
| Example 1: Coastal state environmental study | 1,280 | 767 | 2,047 |
| Example 2: Restaurant use among Hispanics | 560 | 251 | 811 |
| Example 3: Holiday shopping among Hispanics | 532 | 268 | 800 |
| Example 4: Mentoring among LGBT | 134 | 858 | 992 |
| Example 5: National study of smokers | 4,200 | 8,049 | 12,249 |
| Example 6: Low-prevalence disease patients | 268 | 537 | 805 |
| Example 7: Work experience among minorities | 1,550 | 2,379 | 3,929 |
Average absolute error (%) for 7 studies
[Chart: average absolute error (%), y-axis 0 to 16, by condition]
Weighted Opt-in: average 8.3%
Blended KP and Opt-In (Not calibrated): average 4.1%
Calibrated: average 3.2%
Number of items with error ≥ 2 percentage points
for 7 studies
[Chart: No. of Items, y-axis 4 to 14, by condition]
Weighted Opt-in: average 10.57
Blended KP and Opt-In (Not calibrated): average 9.57
Calibrated: average 8.43
Design effect: no real stress on the weights
Deff = Σ w_i² / Σ w_i

| Study | Weighted probability sample (Reference) | Weighted opt-in sample | Two samples blended, re-weighted, not calibrated | Two samples blended, re-weighted, calibrated |
|---|---|---|---|---|
| AVERAGE | 2.19 | 1.62 | 1.84 | 1.96 |
| Example 1: Coastal state environmental study | 2.37 | 1.73 | 2.16 | 2.19 |
| Example 2: Restaurant use among Hispanics | 2.41 | 1.74 | 2.15 | 2.08 |
| Example 3: Holiday shopping among Hispanics | 2.27 | 1.88 | 2.03 | 2.08 |
| Example 4: Mentoring among LGBT | 2.09 | 1.27 | 1.34 | 2.02 |
| Example 5: National study of smokers | 2.45 | 1.83 | 2.13 | 2.22 |
| Example 6: Low-prevalence disease patients | 1.89 | 1.58 | 1.65 | 1.63 |
| Example 7: Work experience among minorities | 1.87 | 1.33 | 1.43 | 1.48 |
Average estimated squared bias: greatly reduced

| Study | Weighted probability sample (Reference) | Weighted opt-in sample | Two samples blended, re-weighted, not calibrated | Two samples blended, re-weighted, calibrated |
|---|---|---|---|---|
| AVERAGE | - | 97.18 | 18.56 | 9.46 |
| Example 1: Coastal state environmental study | - | 103.43 | 13.27 | 6.21 |
| Example 2: Restaurant use among Hispanics | - | 142.84 | 10.57 | 4.18 |
| Example 3: Holiday shopping among Hispanics | - | 275.28 | 29.87 | 17.08 |
| Example 4: Mentoring among LGBT | - | 42.45 | 31.70 | 10.13 |
| Example 5: National study of smokers | - | 57.76 | 26.00 | 14.93 |
| Example 6: Low-prevalence disease patients | - | 23.09 | 7.36 | 4.63 |
| Example 7: Work experience among minorities | - | 35.40 | 11.15 | 9.06 |
Avg. estimated MSE: reduced with calibration

| Study | Weighted probability sample (Reference) | Weighted opt-in sample | Two samples blended, re-weighted, not calibrated | Two samples blended, re-weighted, calibrated |
|---|---|---|---|---|
| AVERAGE | 5.33 | 101.38 | 20.41 | 11.30 |
| Example 1: Coastal state environmental study | 1.81 | 106.39 | 14.40 | 7.35 |
| Example 2: Restaurant use among Hispanics | 4.26 | 152.31 | 13.55 | 7.15 |
| Example 3: Holiday shopping among Hispanics | 4.06 | 284.00 | 32.73 | 19.90 |
| Example 4: Mentoring among LGBT | 16.90 | 45.15 | 34.03 | 12.47 |
| Example 5: National study of smokers | 0.50 | 58.04 | 26.18 | 15.11 |
| Example 6: Low-prevalence disease patients | 8.32 | 27.40 | 10.22 | 7.48 |
| Example 7: Work experience among minorities | 1.48 | 36.38 | 11.74 | 9.65 |
Conclusions
 The KnowledgePanel Calibration℠ technique using non-probability samples can deliver
larger sample sizes when the preferred probability sample source is limited
 Calibrating non-probability samples with probability samples using early adopter
questions minimizes bias in the resulting estimates in the larger combined sample
 Process is relatively simple, serves short timelines for rapid data turnaround
Future Research
 Identify additional characteristics and attitudes that generally distinguish between
probability-based panelists and opt-in panelists that can be used for calibration
 Continue to evaluate our calibration approach across survey topics and populations
 Continued research is necessary to better understand the underlying statistical effects
Thank you!
Charles DiSogra, Chief Statistician, GfK Sampling Statistics
charles.disogra@gfk.com