EMR 6500: Survey Research Dr. Chris L. S. Coryn Lyssa N. Wilson

EMR 6500:
Survey Research
Dr. Chris L. S. Coryn
Lyssa N. Wilson
Spring 2013
Agenda
• Stratified random sampling for
means and totals
• Review
Stratified Random Sampling
Stratified Random Sampling
• A stratified random sample is one in
which some form of random
sampling is applied in each of a set
of separate groups formed from all
entries on a sampling frame from
which a sample is to be drawn
Strata
• In stratified random sampling, strata
are nonoverlapping groups
separating population elements
• By strategically forming these
groups, stratification becomes a
feature of the sample design that can
improve the statistical quality of
survey estimates
Notation for Stratified Random
Sampling
$L$ = number of strata
$N_i$ = number of sampling units in stratum $i$
$N$ = number of sampling units in the population
$N = N_1 + N_2 + \cdots + N_L$
Allocation to Strata
• Deciding how a stratified sample will
be distributed among all strata is
called stratum allocation
• The most appropriate allocation
method depends on how the
stratification will be used
Equal Allocation
• If the main purpose of stratification is
to control subgroup sample sizes for
important population subgroups,
stratum sample sizes should be
sufficient to meet precision
requirements for subgroup analysis
• An important part of the analysis is to
produce comparisons among all
subgroup strata
• In this instance, equal allocation (i.e.,
equal sample sizes) would be
appropriate
Proportional Allocation
• Proportional allocation is a prudent choice
when the main focus of the analysis is
characteristics of several subgroups or the
population as a whole and where the
appropriate allocations for these analyses
are discrepant
• Proportional allocation involves applying the
same sampling rate to all strata, thus
implying that the percent distribution of the
selected sample among strata is identical to
the corresponding distribution for the
population
Optimum Allocation
• Optimum allocation, in which the most
cost-efficient stratum sample sizes are
sought, can lead to estimates of overall
population characteristics that are
statistically superior to those from
proportionate allocations
• When all stratum unit costs are the
same, the stratum sampling rates that
yield the most precise sample
estimates are proportional to the
stratum-specific standard deviations
(Neyman allocation)
Estimation of a Population
Mean and Total
Estimate of Population Mean
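$$\bar{y}_{st} = \frac{1}{N}\left[N_1\bar{y}_1 + N_2\bar{y}_2 + \cdots + N_L\bar{y}_L\right] = \frac{1}{N}\sum_{i=1}^{L} N_i\bar{y}_i$$
$$\hat{V}(\bar{y}_{st}) = \frac{1}{N^2}\left[N_1^2\hat{V}(\bar{y}_1) + N_2^2\hat{V}(\bar{y}_2) + \cdots + N_L^2\hat{V}(\bar{y}_L)\right]$$
$$= \frac{1}{N^2}\left[N_1^2\left(1-\frac{n_1}{N_1}\right)\left(\frac{s_1^2}{n_1}\right) + \cdots + N_L^2\left(1-\frac{n_L}{N_L}\right)\left(\frac{s_L^2}{n_L}\right)\right]$$
$$= \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\left(1-\frac{n_i}{N_i}\right)\left(\frac{s_i^2}{n_i}\right)$$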
Example for a Population Mean
            N     n      M       SD
Town A    155    20    33.90    5.95
Town B     62     8    25.12   15.25
Rural      93    12    19.00    9.36
$$\bar{y}_{st} = \frac{1}{N}\left[N_1\bar{y}_1 + N_2\bar{y}_2 + \cdots + N_L\bar{y}_L\right]$$
$$= \frac{1}{310}\left[(155)(33.900) + (62)(25.125) + (93)(19.000)\right]$$
$$= 27.675$$
Example for a Population Mean
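$$\hat{V}(\bar{y}_{st}) = \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\left(1-\frac{n_i}{N_i}\right)\left(\frac{s_i^2}{n_i}\right)$$
$$= \frac{1}{(310)^2}\left[\frac{(155)^2(0.871)(5.95)^2}{20} + \frac{(62)^2(0.871)(15.25)^2}{8} + \frac{(93)^2(0.871)(9.36)^2}{12}\right]$$
$$= 1.97$$
$$\bar{y}_{st} \pm 2\sqrt{\hat{V}(\bar{y}_{st})} = 27.675 \pm 2\sqrt{1.97} = 27.7 \pm 2.8$$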
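The stratified mean, its estimated variance, and the bound on the error of estimation can be checked numerically. The following is a minimal Python sketch using the three-stratum data from the example; the dictionary layout and the function name `stratified_mean` are illustrative choices, not part of the slides, and the Town B mean is entered as 25.125, the value used in the computation above.

```python
from math import sqrt

# Stratum data from the example: (N_i, n_i, sample mean, sample SD).
strata = {
    "Town A": (155, 20, 33.900, 5.95),
    "Town B": (62, 8, 25.125, 15.25),
    "Rural": (93, 12, 19.000, 9.36),
}


def stratified_mean(strata):
    """Return (estimate, estimated variance) of the stratified mean."""
    N = sum(N_i for N_i, _, _, _ in strata.values())
    # Weighted average of stratum means, weights N_i / N.
    estimate = sum(N_i * y_bar for N_i, _, y_bar, _ in strata.values()) / N
    # Sum of N_i^2 * (1 - n_i/N_i) * s_i^2 / n_i, divided by N^2.
    variance = sum(
        N_i**2 * (1 - n_i / N_i) * sd**2 / n_i
        for N_i, n_i, _, sd in strata.values()
    ) / N**2
    return estimate, variance


y_st, v_st = stratified_mean(strata)
bound = 2 * sqrt(v_st)  # two standard errors
print(f"mean = {y_st:.3f}, variance = {v_st:.2f}, bound = {bound:.1f}")
```

The code keeps the finite population correction unrounded, so the variance can differ from the hand computation in the third decimal place.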
Estimate of Population Total
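$$N\bar{y}_{st} = N_1\bar{y}_1 + N_2\bar{y}_2 + \cdots + N_L\bar{y}_L = \sum_{i=1}^{L} N_i\bar{y}_i$$
$$\hat{V}(N\bar{y}_{st}) = N^2\hat{V}(\bar{y}_{st}) = \sum_{i=1}^{L} N_i^2\left(1-\frac{n_i}{N_i}\right)\left(\frac{s_i^2}{n_i}\right)$$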
Example for Population Total
$$N\bar{y}_{st} = 310(27.7) = 8{,}587$$
$$\hat{V}(N\bar{y}_{st}) = N^2\hat{V}(\bar{y}_{st}) = (310)^2(1.97) = 189{,}278.560$$
$$N\bar{y}_{st} \pm 2\sqrt{\hat{V}(N\bar{y}_{st})} = 8{,}587 \pm 2\sqrt{189{,}278.560} = 8{,}587 \pm 870$$
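Scaling from the mean to the total multiplies the estimate by N and the variance by N squared. A small Python sketch of that step follows; the function name `stratified_total` is hypothetical. It takes the unrounded mean 27.675 and the rounded variance 1.97 as inputs, so the point estimate differs slightly from the slide, which rounds the mean to 27.7 before multiplying.

```python
from math import sqrt


def stratified_total(N, mean_estimate, mean_variance):
    """Scale a stratified mean and its variance up to the population total."""
    total = N * mean_estimate
    variance = N**2 * mean_variance
    return total, variance


tau_hat, v_tau = stratified_total(N=310, mean_estimate=27.675, mean_variance=1.97)
bound = 2 * sqrt(v_tau)  # two standard errors of the estimated total
print(f"total = {tau_hat:.2f}, bound = {bound:.0f}")
```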
Selecting the Sample Size for
Estimating Population Means
and Totals
Sample Size for Estimating
Population Means and Totals
$$n = \frac{\sum_{i=1}^{L} N_i^2\sigma_i^2 / a_i}{N^2 D + \sum_{i=1}^{L} N_i\sigma_i^2}$$
$$D = \frac{B^2}{4} \quad \text{when estimating } \mu$$
$$D = \frac{B^2}{4N^2} \quad \text{when estimating } \tau$$
Example for a Population Mean
$\sigma_1^2 \approx 25$, $\sigma_2^2 \approx 225$, and $\sigma_3^2 \approx 100$
allocation fractions are $a_1 = \frac{1}{3}$, $a_2 = \frac{1}{3}$, and $a_3 = \frac{1}{3}$
$B = 2$
$$D = \frac{B^2}{4} = \frac{2^2}{4} = 1$$
Example for a Population Mean
$N_1 = 155$, $N_2 = 62$, and $N_3 = 93$
$$\sum_{i=1}^{3} \frac{N_i^2\sigma_i^2}{a_i} = \frac{N_1^2\sigma_1^2}{a_1} + \frac{N_2^2\sigma_2^2}{a_2} + \frac{N_3^2\sigma_3^2}{a_3}$$
$$= \frac{(155)^2(25)}{(1/3)} + \frac{(62)^2(225)}{(1/3)} + \frac{(93)^2(100)}{(1/3)}$$
$$= (24{,}025)(75) + (3{,}844)(675) + (8{,}649)(300)$$
$$= 6{,}991{,}275$$
Example for a Population Mean
$$\sum_{i=1}^{3} N_i\sigma_i^2 = N_1\sigma_1^2 + N_2\sigma_2^2 + N_3\sigma_3^2$$
$$= (155)(25) + (62)(225) + (93)(100) = 27{,}125$$
$$N^2 D = (310)^2(1) = 96{,}100$$
Example for a Population Mean
$$n = \frac{\sum_{i=1}^{L} N_i^2\sigma_i^2 / a_i}{N^2 D + \sum_{i=1}^{L} N_i\sigma_i^2} = \frac{6{,}991{,}275}{96{,}100 + 27{,}125} = \frac{6{,}991{,}275}{123{,}225} = 56.7$$
$$n_1 = n(a_1) = 57\left(\frac{1}{3}\right) = 19$$
$$n_2 = n(a_2) = 57\left(\frac{1}{3}\right) = 19$$
$$n_3 = n(a_3) = 57\left(\frac{1}{3}\right) = 19$$
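The sample-size calculation for arbitrary allocation fractions can be sketched in Python as follows. The function name `stratified_sample_size` and its `estimating` parameter are assumptions made for illustration; the inputs are the stratum sizes, approximate variances, equal fractions, and bound B = 2 from the example.

```python
from math import ceil


def stratified_sample_size(N_i, var_i, a_i, B, estimating="mean"):
    """Total sample size n for allocation fractions a_i and bound B."""
    N = sum(N_i)
    # D = B^2/4 for a mean, B^2/(4 N^2) for a total.
    D = B**2 / 4 if estimating == "mean" else B**2 / (4 * N**2)
    numerator = sum(Ni**2 * v / a for Ni, v, a in zip(N_i, var_i, a_i))
    denominator = N**2 * D + sum(Ni * v for Ni, v in zip(N_i, var_i))
    return numerator / denominator


fractions = [1 / 3, 1 / 3, 1 / 3]
n = stratified_sample_size([155, 62, 93], [25, 225, 100], fractions, B=2)
n_total = ceil(n)  # round up so the bound is met
allocation = [round(n_total * a) for a in fractions]
print(f"n = {n:.1f}, rounded up to {n_total}, allocation = {allocation}")
```

Rounding n up rather than to the nearest integer is the conservative choice, since a smaller sample would not guarantee the stated bound.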
Neyman Allocation
$$n_i = n\left(\frac{N_i\sigma_i}{\sum_{k=1}^{L} N_k\sigma_k}\right)$$
$$n = \frac{\left(\sum_{k=1}^{L} N_k\sigma_k\right)^2}{N^2 D + \sum_{i=1}^{L} N_i\sigma_i^2}$$
Neyman Allocation
$\sigma_1 \approx 5$, $\sigma_2 \approx 15$, and $\sigma_3 \approx 10$
$N_1 = 155$, $N_2 = 62$, and $N_3 = 93$
$B = 2$
$$\sum_{i=1}^{3} N_i\sigma_i = N_1\sigma_1 + N_2\sigma_2 + N_3\sigma_3$$
$$= (155)(5) + (62)(15) + (93)(10) = 2{,}635$$
Neyman Allocation
$$n_1 = n\left(\frac{N_1\sigma_1}{\sum_{k=1}^{L} N_k\sigma_k}\right) = n\left[\frac{(155)(5)}{2{,}635}\right] = n(0.30)$$
$$n_2 = n\left[\frac{(62)(15)}{2{,}635}\right] = n(0.35)$$
$$n_3 = n\left[\frac{(93)(10)}{2{,}635}\right] = n(0.35)$$
Neyman Allocation
$$D = \frac{B^2}{4} = \frac{2^2}{4} = 1$$
$$N^2 D = (310)^2(1) = 96{,}100$$
$$\sum_{i=1}^{3} N_i\sigma_i^2 = N_1\sigma_1^2 + N_2\sigma_2^2 + N_3\sigma_3^2$$
$$= (155)(25) + (62)(225) + (93)(100) = 27{,}125$$
Neyman Allocation
$$n = \frac{\left(\sum_{k=1}^{3} N_k\sigma_k\right)^2}{N^2 D + \sum_{i=1}^{3} N_i\sigma_i^2} = \frac{(2{,}635)^2}{96{,}100 + 27{,}125} = 56.34$$
$$n_1 = na_1 = (57)(0.30) = 17$$
$$n_2 = na_2 = (57)(0.35) = 20$$
$$n_3 = na_3 = (57)(0.35) = 20$$
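The Neyman calculation can be reproduced with a short Python sketch. The function name `neyman_allocation` is hypothetical, and the inputs are the stratum sizes, approximate standard deviations, and bound B = 2 from the example. The unrounded sample size comes out near 56.35, which rounds up to the same 57 used above.

```python
from math import ceil


def neyman_allocation(N_i, sd_i, B):
    """Return (n, fractions) under Neyman allocation for estimating a mean."""
    N = sum(N_i)
    D = B**2 / 4
    weights = [Ni * s for Ni, s in zip(N_i, sd_i)]  # N_i * sigma_i
    weight_sum = sum(weights)
    n = weight_sum**2 / (N**2 * D + sum(Ni * s**2 for Ni, s in zip(N_i, sd_i)))
    fractions = [w / weight_sum for w in weights]
    return n, fractions


n, fractions = neyman_allocation([155, 62, 93], [5, 15, 10], B=2)
n_total = ceil(n)
allocation = [round(n_total * a) for a in fractions]
print(f"n = {n:.2f}, fractions = {[round(a, 2) for a in fractions]}")
print(f"allocation = {allocation}")
```

Rounding each stratum size independently happens to sum to 57 here; in general the rounded sizes may need a manual adjustment to match the total.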
Proportional Allocation
$$n = \frac{\sum_{i=1}^{L} N_i\sigma_i^2}{ND + \frac{1}{N}\sum_{i=1}^{L} N_i\sigma_i^2}$$
Proportional Allocation
$\sigma_1 \approx 10$, $\sigma_2 \approx 10$, and $\sigma_3 \approx 10$
$N_1 = 155$, $N_2 = 62$, and $N_3 = 93$
$B = 2$
$$\sum_{i=1}^{3} N_i\sigma_i^2 = N_1\sigma_1^2 + N_2\sigma_2^2 + N_3\sigma_3^2$$
$$= (155)(100) + (62)(100) + (93)(100)$$
$$= 310(100) = 31{,}000$$
Proportional Allocation
$$D = \frac{B^2}{4} = \frac{2^2}{4} = 1$$
$$n = \frac{31{,}000}{310(1) + \left(\frac{1}{310}\right)(31{,}000)} = 75.6$$
Proportional Allocation
$$n_1 = n\left(\frac{N_1}{\sum_{k=1}^{3} N_k}\right) = n\left(\frac{N_1}{N}\right) = n\left(\frac{155}{310}\right) = n(0.5) = 38$$
$$n_2 = n\left(\frac{N_2}{\sum_{k=1}^{3} N_k}\right) = n\left(\frac{N_2}{N}\right) = n\left(\frac{62}{310}\right) = n(0.2) = 15$$
$$n_3 = n\left(\frac{N_3}{\sum_{k=1}^{3} N_k}\right) = n\left(\frac{N_3}{N}\right) = n\left(\frac{93}{310}\right) = n(0.3) = 23$$
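The proportional-allocation example can be verified the same way. This is a minimal Python sketch with a hypothetical function name, `proportional_allocation`, using the equal variances (100 in each stratum) and bound B = 2 from the example.

```python
from math import ceil


def proportional_allocation(N_i, var_i, B):
    """Return (n, fractions) under proportional allocation for a mean."""
    N = sum(N_i)
    D = B**2 / 4
    weighted_var = sum(Ni * v for Ni, v in zip(N_i, var_i))  # sum of N_i * sigma_i^2
    n = weighted_var / (N * D + weighted_var / N)
    fractions = [Ni / N for Ni in N_i]  # same sampling rate in every stratum
    return n, fractions


n, fractions = proportional_allocation([155, 62, 93], [100, 100, 100], B=2)
n_total = ceil(n)
allocation = [round(n_total * a) for a in fractions]
print(f"n = {n:.1f}, rounded up to {n_total}, allocation = {allocation}")
```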
Comparison of Allocation
Methods
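Proportional
$$n = \frac{\sum_{i=1}^{L} N_i\sigma_i^2}{ND + \frac{1}{N}\sum_{i=1}^{L} N_i\sigma_i^2}$$
Neyman
$$n = \frac{\left(\sum_{k=1}^{L} N_k\sigma_k\right)^2}{N^2 D + \sum_{i=1}^{L} N_i\sigma_i^2}$$
General framework
$$n = \frac{\sum_{i=1}^{L} N_i^2\sigma_i^2 / a_i}{N^2 D + \sum_{i=1}^{L} N_i\sigma_i^2}$$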
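The proportional and Neyman formulas are special cases of the general framework: substituting the corresponding allocation fractions into the general formula gives the same n. The Python sketch below checks this numerically. The helper name `general_n` is hypothetical, and the stratum values are borrowed from the Neyman example purely for illustration.

```python
def general_n(N_i, sd_i, a_i, D):
    """General sample-size formula for arbitrary allocation fractions a_i."""
    N = sum(N_i)
    numerator = sum(Ni**2 * s**2 / a for Ni, s, a in zip(N_i, sd_i, a_i))
    denominator = N**2 * D + sum(Ni * s**2 for Ni, s in zip(N_i, sd_i))
    return numerator / denominator


N_i = [155, 62, 93]
sd_i = [5, 15, 10]
D = 1.0  # B = 2 when estimating a mean
N = sum(N_i)
weighted_var = sum(Ni * s**2 for Ni, s in zip(N_i, sd_i))
weights = [Ni * s for Ni, s in zip(N_i, sd_i)]

# Proportional: a_i = N_i / N
a_prop = [Ni / N for Ni in N_i]
n_prop_general = general_n(N_i, sd_i, a_prop, D)
n_prop_direct = weighted_var / (N * D + weighted_var / N)

# Neyman: a_i = N_i * sigma_i / sum of N_k * sigma_k
a_neyman = [w / sum(weights) for w in weights]
n_neyman_general = general_n(N_i, sd_i, a_neyman, D)
n_neyman_direct = sum(weights) ** 2 / (N**2 * D + weighted_var)

print(f"proportional: {n_prop_general:.2f} vs {n_prop_direct:.2f}")
print(f"Neyman:       {n_neyman_general:.2f} vs {n_neyman_direct:.2f}")
```

With unequal stratum standard deviations, as here, the Neyman sample size is smaller than the proportional one for the same bound.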
Review
The Tailored Design Method
The Tailored Design Method
• Uses multiple motivational features
in compatible and mutually
supportive ways to encourage high
quantity and quality of responses
The Tailored Design Method
• Premised on social exchange
perspective on human behavior
• Assumes that the likelihood of
responding is greater when the
expected rewards outweigh the
anticipated costs
The Tailored Design Method
• Gives attention to all aspects of
contacting and communicating with
respondents
• Encourages response by considering
survey sponsorship, the nature of the
population and variations within it,
and content of questions
The Tailored Design Method
• Emphasizes reducing errors of
coverage, sampling, nonresponse,
and measurement
Coverage Error
• Occurs when not all members of a
population have a known,
non-zero probability of selection
• Occurs when those who are excluded
are different from those who are
included
Sampling Error
• Results from surveying only some
rather than all members of a
population
• Represented by B, the bound on the
error of estimation
Nonresponse Error
• Occurs when people selected do not
respond and are different than those
who do
• Nonresponse can occur at the level
of items within a survey or at the
level of the survey
– Missing at random (MAR)
– Missing completely at random (MCAR)
Measurement Error
• Occurs when responses are
inaccurate or imprecise
• Primarily related to poor layout and
poor design and wording of questions
Social Exchange and Surveys
• Addresses three central questions
about design and implementation
1. How can the perceived rewards for
responding be increased?
2. How can the perceived costs of
responding be reduced?
3. How can trust be established so that
people believe the rewards will
outweigh the costs of responding?
Increasing Benefits
• Provide information about the survey
• Ask for help or advice
• Show positive regard
• Say thank you
• Support group values
• Give tangible rewards
• Make the questionnaire interesting
• Provide social validation
• Inform people that opportunities to
respond are limited
Decreasing Costs
• Make it convenient to respond
• Avoid subordinating language
• Make the questionnaire short and
easy to complete
• Minimize requests for personal or
sensitive information
• Emphasize similarity to other
requests or tasks to which a person
has already responded
Establishing Trust
• Obtain sponsorship by legitimate
authority
• Provide a token of appreciation in
advance
• Make the task appear important
• Ensure confidentiality and security of
information
Features that can be Tailored
• Survey mode
– Singular or multiple
• Sample design
– Type of sample
– Number of units sampled
• Incentives
– Type of incentive
– Amount or cost of incentive
– Before or after
Features that can be Tailored
• Contacts
– Number of contacts
– Timing of initial and subsequent
contacts
– Mode of each contact
– Whether contacts will be personalized
– Sponsorship information
– Visual design of each contact
– Text or words in each contact
Features that can be Tailored
• Additional materials
– Whether to provide them at all
– Type of materials (e.g., research report)
– Visual design of materials
– Text or wording of materials
Features that can be Tailored
• Questionnaire
– Topics included
– Length (duration, number of
pages/screens, number of questions)
– First page or screen
– Visual design
– Organization and order of questions
– Navigation through questionnaire
Features that can be Tailored
• Individual questions
– Topic (sensitive, of interest to the
respondent)
– Type (open-ended versus closed-ended)
– Organization of information
– Text or wording
– Visual design
Coverage and Sampling
Central Terminology
• An element is an object on which a
measurement is taken
• A population is a collection of elements to
which an inference is made from a sample
• A sample is a collection of sampling units
drawn from a frame or frames
• Sampling units are nonoverlapping
collections of elements from the population
that cover the entire population
• A frame is a list of sampling units
Central Terminology
• A completed sample is the units that
respond
• Sampling error is the result of
collecting data from only a subset,
rather than all, units from a frame
– Again, represented by B, the bound on
the error of estimation
Coverage
• The degree to which the units in a
sampling frame correspond to the
population of interest
• Coverage is likely one of the most
serious problems in most surveys
Coverage and Frame Problems
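[Figure: overlapping target population and frame population. The overlap is the covered population; target-population units missing from the frame are undercoverage; frame units outside the target population are ineligible units.]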
Reducing Coverage Error
• Central questions:
– Does the list contain everyone in the
survey population?
– Does the list include people who are not
in the study population?
– How is the list maintained and updated?
– Are the same sample units included on
the list more than once?
– Does the list contain other information
that can be used to improve the survey?
Estimate of Population Mean
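$$\hat{\mu} = \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$
$$\hat{V}(\bar{y}) = \left(1-\frac{n}{N}\right)\frac{s^2}{n}$$
$$2\sqrt{\hat{V}(\bar{y})} = 2\sqrt{\left(1-\frac{n}{N}\right)\frac{s^2}{n}}$$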
Estimate of Population Total
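$$\hat{\tau} = N\bar{y} = \frac{N\sum_{i=1}^{n} y_i}{n}$$
$$\hat{V}(\hat{\tau}) = \hat{V}(N\bar{y}) = N^2\left(1-\frac{n}{N}\right)\left(\frac{s^2}{n}\right)$$
$$2\sqrt{\hat{V}(N\bar{y})} = 2\sqrt{N^2\left(1-\frac{n}{N}\right)\left(\frac{s^2}{n}\right)}$$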
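The simple random sampling estimators of the mean and total can be sketched in Python as follows. The function name `srs_estimates`, the four-value sample, and the population size N = 40 are all made up for illustration; none of them come from the slides.

```python
from math import sqrt
from statistics import mean, variance


def srs_estimates(y, N):
    """Mean and total estimates with two-standard-error bounds under SRS."""
    n = len(y)
    y_bar = mean(y)
    s2 = variance(y)  # sample variance with divisor n - 1
    v_mean = (1 - n / N) * s2 / n  # includes the finite population correction
    return {
        "mean": y_bar,
        "mean_bound": 2 * sqrt(v_mean),
        "total": N * y_bar,
        "total_bound": 2 * sqrt(N**2 * v_mean),
    }


# Hypothetical sample of n = 4 from a population of N = 40.
est = srs_estimates([2, 4, 6, 8], N=40)
print(est)
```

The bound for the total is simply N times the bound for the mean, which follows from the N squared factor in the variance.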
Selecting the Sample Size for
Estimating Population Means
and Totals
Sample Size for Estimating
Population Means
$$n = \frac{N\sigma^2}{(N-1)D + \sigma^2}$$
• where
$$D = \frac{B^2}{4}$$
Sample Size for Estimating
Population Means
• Often, the population variance, $\sigma^2$, is
unknown
• An approximate value of $\sigma$ can be
obtained by
$$\sigma \approx \frac{\text{Range}}{4}$$
Sample Size for Estimating
Population Totals
$$n = \frac{N\sigma^2}{(N-1)D + \sigma^2}$$
• where
$$D = \frac{B^2}{4N^2}$$
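The sample-size formulas for means and totals under simple random sampling differ only in D. A minimal Python sketch follows, with a hypothetical function name `srs_sample_size` and made-up inputs: N = 1,000, a guessed range of 100 (so sigma is approximated as 100/4 = 25), a bound of 3 on the mean, and the equivalent bound of 3,000 on the total.

```python
from math import ceil


def srs_sample_size(N, sigma, B, estimating="mean"):
    """Sample size needed to estimate a mean or total within bound B."""
    # D = B^2/4 for a mean, B^2/(4 N^2) for a total.
    D = B**2 / 4 if estimating == "mean" else B**2 / (4 * N**2)
    return N * sigma**2 / ((N - 1) * D + sigma**2)


sigma_guess = 100 / 4  # range / 4 approximation
n_mean = srs_sample_size(1000, sigma_guess, B=3)
n_total = srs_sample_size(1000, sigma_guess, B=3000, estimating="total")
print(f"mean: n = {n_mean:.2f} -> {ceil(n_mean)}")
print(f"total: n = {n_total:.2f} -> {ceil(n_total)}")
```

A bound of B on the mean corresponds to a bound of N times B on the total, so the two calls return the same sample size.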
Estimation of a Population
Proportion
Estimate of Population
Proportion
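$$\hat{p} = \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$
$$\hat{V}(\hat{p}) = \left(1-\frac{n}{N}\right)\frac{\hat{p}\hat{q}}{n-1}$$
where
$$\hat{q} = 1 - \hat{p}$$
$$2\sqrt{\hat{V}(\hat{p})} = 2\sqrt{\left(1-\frac{n}{N}\right)\frac{\hat{p}\hat{q}}{n-1}}$$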
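A proportion is the mean of 0/1 responses, so the estimator is a short computation. The Python sketch below uses a hypothetical function name, `proportion_estimate`, and made-up data: 15 "yes" answers in a sample of n = 100 from a population of N = 300.

```python
from math import sqrt


def proportion_estimate(successes, n, N):
    """Estimated proportion and two-standard-error bound under SRS."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    # Note the divisor n - 1, as in the variance formula for p-hat.
    v_hat = (1 - n / N) * p_hat * q_hat / (n - 1)
    return p_hat, 2 * sqrt(v_hat)


p_hat, bound = proportion_estimate(successes=15, n=100, N=300)
print(f"p = {p_hat:.2f} +/- {bound:.3f}")
```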
Selecting the Sample Size for
Estimating a Population
Proportion
Sample Size for Estimating
Population Proportions
$$n = \frac{Npq}{(N-1)D + pq}$$
• where
$$q = 1 - p$$
• and
$$D = \frac{B^2}{4}$$
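The sample size for a proportion can be sketched the same way. The function name `proportion_sample_size` and the inputs (N = 2,000, bound B = 0.05) are assumptions for illustration; p = 0.5 is used as a planning value because pq is largest there, which gives the most conservative n when nothing is known about p.

```python
from math import ceil


def proportion_sample_size(N, B, p=0.5):
    """Sample size needed to estimate a proportion within bound B."""
    q = 1 - p
    D = B**2 / 4
    return N * p * q / ((N - 1) * D + p * q)


n = proportion_sample_size(N=2000, B=0.05)
print(f"n = {n:.2f} -> {ceil(n)}")
```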
An Overview of Crafting Good
Questions
Issues to Consider
1. What survey mode(s) will be used
to ask the questions?
2. Is the question being repeated from
another survey, and/or will answers
be compared to previously collected
data?
3. Will respondents be willing and
motivated to answer accurately?
4. What type of information is the
question asking for?
Choosing Words and Forming
Questions
1. Make sure the question applies to the respondent
2. Make sure the question is technically accurate
3. Ask one question at a time
4. Use simple and familiar words
5. Use specific and concrete words to specify the
concepts clearly
6. Use as few words as possible to pose the question
7. Use complete sentences with simple sentence
structures
8. Make sure “yes” means yes and “no” means no
9. Be sure the question specifies the response task
Visual Presentation of Survey
Questions
1. Use darker and/or larger print for the question and lighter
and/or smaller print for answer choices and answer spaces
2. Use spacing to create subgrouping within a question
3. Visually standardize all answer spaces or response options
4. Use visual design properties to emphasize elements that are
important to the respondent and to deemphasize those that are
not
5. Make sure words and visual elements that make up the question
send consistent messages
6. Integrate special instructions into the question where they will
be used rather than including them as freestanding entities
7. Separate optional or occasionally needed instructions from the
question stem by font or symbol variation
8. Organize each question in a way that minimizes the need to
reread portions in order to comprehend the response task
9. Choose line spacing, font, and text size to ensure the legibility
of the text