Error and Sample Sizes

PHC 6716
June 1, 2011
Chris McCarty
Types of error
• Non-sampling error – Error associated with
collecting and analyzing the data
• Sampling error – Error associated with failing to
interview the entire population
Non-Sampling Error
• Coverage error
▫ Wrong population definition
▫ Flawed sampling frame
▫ Interviewer or management error in following sampling frame
• Response error
▫ Badly worded question results in invalid or incorrect response
▫ Interviewer bias changes response
• Non-response error
▫ Respondent refuses to take survey or is away
▫ Respondent refuses to answer certain questions
• Processing errors
▫ Error in data entry or recording of responses
• Analysis errors
▫ Inappropriate analytical techniques, weighting or imputation are applied
Sampling Error
• Sampling error is known after the data are collected by calculating
the Margin of Error and confidence intervals
• Surveys don’t have a Margin of Error, questions do
• Power analyses use estimates of the parameters involved in
calculating the margin of error
• It is common to see sample sizes of 400 and 1000 for surveys (these
are associated with 5% and 3% margins of error)
• In most cases the size of the population being sampled from is
irrelevant
• The margin of error should be calculated using the size of the
subgroups sampled
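To illustrate the point about subgroups, here is a minimal sketch. It assumes a proportion near 50% (the most conservative case), a 95% confidence level, and a hypothetical survey of 1,000 respondents containing a subgroup of 200. The function name and the sample sizes are illustrative, not from the slides.

```python
import math

def proportion_moe(n, p=0.5, z=1.96):
    """Margin of error for a proportion, without finite population adjustment."""
    return z * math.sqrt(p * (1 - p) / n)

full_sample = proportion_moe(1000)  # whole survey
subgroup = proportion_moe(200)      # hypothetical subgroup of 200 respondents

print(f"Full sample (n=1000): +/- {full_sample:.1%}")
print(f"Subgroup (n=200):     +/- {subgroup:.1%}")
```

An estimate reported for the subgroup alone carries the subgroup's wider margin, not the margin of the full sample.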
Margin of Error Formula
$$H = \frac{zs}{\sqrt{n}}$$
• H = Half interval (the margin of error), expressed in the same units as
the standard deviation
• z = z score associated with level of confidence
(typically 95%)
• s = standard deviation
• n = sample size
The z score
• The z value is the z score associated with a level
of confidence
• Typically (almost exclusively) surveys use 95%
• This means that if the survey were replicated
100 times, in about 95 of them the estimate
would be within the margin of error of the true population value
• The z score associated with 95% is 1.96
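The z score can be recovered from the confidence level with the inverse of the standard normal distribution. The sketch below uses Python's standard library; the helper name `z_for_confidence` is illustrative.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided z score for a confidence level given as a fraction, e.g. 0.95."""
    # Split the remaining probability equally between the two tails.
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> z = {z_for_confidence(level):.3f}")
```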
The standard deviation (s)
• For a continuous variable the standard deviation
is typically not known
• Previous research may suggest some reasonable
range for the standard deviation
• After you have collected the data the standard
deviation is known
Example: Age of Floridians
$$H = \frac{1.96(17.6)}{\sqrt{406}} = \frac{34.496}{20.149} = 1.712$$
• Sample of 406 Floridians
• Age range 18 to 92
• Mean age of sample = 52.3
• Standard deviation = 17.6
• 95 times out of 100 sample estimate would be
between 50.58 and 54.01 (Frequentist
interpretation)
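The arithmetic in this example can be checked with a short sketch. The function name is illustrative; the inputs are the rounded values from the slide, so the last digit of the interval may differ slightly from a calculation on the unrounded data.

```python
import math

def mean_moe(s, n, z=1.96):
    """Margin of error for a mean: H = z * s / sqrt(n)."""
    return z * s / math.sqrt(n)

mean_age, s, n = 52.3, 17.6, 406  # values from the Floridian age example
h = mean_moe(s, n)

print(f"H = {h:.3f}")
print(f"Interval: {mean_age - h:.2f} to {mean_age + h:.2f}")
```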
Margin of Error for a Proportion
$$H = z\sqrt{\frac{p(1-p)}{n}}$$
• p = proportion
Example: Floridians employed
$$H = z\sqrt{\frac{p(1-p)}{n}} = 1.96\sqrt{\frac{.5529(.4471)}{415}} = .047$$
• Sample of 415 Floridians
• 55.29 percent employed
• 44.71 percent not employed
• 95 times out of 100 the estimate of the percent
employed would be between 50.59 and 59.99
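A minimal sketch of the same calculation, using the inputs shown for this example (the function name is illustrative). Evaluated directly with these inputs, the expression comes to roughly .048, a little above the .047 on the slide; the gap may come from rounding or from inputs not shown here.

```python
import math

def proportion_moe(p, n, z=1.96):
    """Margin of error for a proportion: H = z * sqrt(p * (1 - p) / n)."""
    return z * math.sqrt(p * (1 - p) / n)

p, n = 0.5529, 415  # share employed and sample size from the example
h = proportion_moe(p, n)

print(f"H = {h:.4f}")
print(f"Interval: {(p - h) * 100:.2f}% to {(p + h) * 100:.2f}%")
```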
Margin of Error with Finite Population
Adjustment
$$H = z\sqrt{\frac{p(1-p)}{n}\cdot\frac{N-n}{N-1}}$$
Example: Floridians employed with
finite population adjustment
$$H = z\sqrt{\frac{p(1-p)}{n}\cdot\frac{N-n}{N-1}} = 1.96\sqrt{\frac{.5529(.4471)}{415}\cdot\frac{6{,}949{,}759-415}{6{,}949{,}759-1}} = .0469$$
• With the finite population adjustment the margin of
error is .01 percent lower
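The size of the adjustment can be inspected directly. The sketch below computes the correction factor sqrt((N - n) / (N - 1)) separately, using the population and sample sizes from the example; the function names are illustrative. With a sample this small relative to the population the factor is extremely close to 1, so the adjusted and unadjusted margins computed here agree to several decimal places.

```python
import math

def proportion_moe(p, n, z=1.96):
    """Unadjusted margin of error for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

def finite_population_factor(n, N):
    """Finite population adjustment: sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))

p, n, N = 0.5529, 415, 6_949_759  # values from the example
unadjusted = proportion_moe(p, n)
fpc = finite_population_factor(n, N)
adjusted = unadjusted * fpc

print(f"Correction factor: {fpc:.6f}")
print(f"H unadjusted:      {unadjusted:.6f}")
print(f"H adjusted:        {adjusted:.6f}")
```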
H adjusted versus not adjusted as sample size increases
[Figure: margin of error (vertical axis, 0 to 120) plotted against sample size n (horizontal axis, 1 to 1394) for two series, H and H adjusted.]
• No real value to adjustment until you reach 10 percent of population
• H adjusted falls to zero as you approach a census
• H unadjusted never does
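These three observations can be reproduced numerically. The sketch below assumes a hypothetical population of N = 1400 and p = 0.5; both values are chosen only for illustration and are not taken from the slides.

```python
import math

def moe(p, n, z=1.96):
    """Unadjusted margin of error for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

def moe_adjusted(p, n, N, z=1.96):
    """Margin of error with the finite population adjustment."""
    return moe(p, n, z) * math.sqrt((N - n) / (N - 1))

N = 1400  # hypothetical population size
p = 0.5   # assumed proportion (most conservative case)

print(f"{'n':>6} {'n/N':>6} {'H (%)':>8} {'H adj (%)':>10}")
for n in (14, 140, 700, 1260, 1400):
    print(f"{n:>6} {n / N:>6.0%} {moe(p, n) * 100:>8.1f} "
          f"{moe_adjusted(p, n, N) * 100:>10.1f}")
```

At n = N (a census) the adjusted margin is exactly zero, while the unadjusted formula still reports a positive margin.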
Formula to determine sample size
given a desired margin of error
$$n = \frac{z^2 s^2}{H^2}$$
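A minimal sketch of this formula, rounding up to the next whole respondent. For a proportion, s² is replaced by p(1 - p), consistent with the margin of error formula for a proportion. The function name and the target margins are illustrative assumptions.

```python
import math

def required_sample_size(s, H, z=1.96):
    """Sample size for a desired margin of error: n = z^2 * s^2 / H^2, rounded up."""
    return math.ceil((z ** 2) * (s ** 2) / (H ** 2))

# Mean: standard deviation of 17.6 years, hypothetical target margin of 1 year.
n_age = required_sample_size(s=17.6, H=1.0)

# Proportion with p = 0.5, so s = sqrt(0.5 * 0.5) = 0.5.
n_five_percent = required_sample_size(s=0.5, H=0.05)
n_three_percent = required_sample_size(s=0.5, H=0.03)

print(f"Age within 1 year:   n = {n_age}")
print(f"Proportion, +/- 5%:  n = {n_five_percent}")
print(f"Proportion, +/- 3%:  n = {n_three_percent}")
```

The last two results sit close to the commonly used sample sizes of 400 and 1000.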
Calculator sites
• http://www.americanresearchgroup.com/moe.html
• http://www.surveysystem.com/sscalc.htm
Power Analysis
n       H (%)
100     9.8
200     6.9
300     5.7
400     4.9
500     4.4
600     4.0
700     3.7
800     3.5
900     3.3
1000    3.1
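The values in this table are consistent with the margin of error formula for a proportion with p = 0.5 and z = 1.96. The sketch below regenerates them under that assumption; the variable names are illustrative.

```python
import math

z, p = 1.96, 0.5  # assumed: 95% confidence, most conservative proportion

# Margin of error in percent for each sample size, rounded to one decimal.
table = {n: round(z * math.sqrt(p * (1 - p) / n) * 100, 1)
         for n in range(100, 1001, 100)}

print(f"{'n':>5} {'H (%)':>6}")
for n, h in table.items():
    print(f"{n:>5} {h:>6.1f}")
```

Quadrupling the sample size from 100 to 400 only halves the margin of error, because n enters the formula under a square root.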
Dillman formula
$$N_s = \frac{(N_p)(p)(1-p)}{(N_p - 1)(B/C)^2 + (p)(1-p)}$$
Where:
Ns = completed sample size needed for desired level of precision
Np = size of population (in this case assume 80,000)
p = proportion of population expected to choose one of the two response categories (in this case either owner or renter)
B = acceptable amount of sampling error (in this case assume ±5% = 0.05)
C = z statistic associated with the confidence level (in this case assume a 95% confidence level = 1.96)
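A minimal sketch of the formula with the values given above. The slide does not state a value for p, so p = 0.5 (an even owner/renter split, the most conservative case) is an assumption here; the function name is illustrative.

```python
def dillman_sample_size(Np, p, B, C):
    """Completed sample size needed for a desired level of precision."""
    numerator = Np * p * (1 - p)
    denominator = (Np - 1) * (B / C) ** 2 + p * (1 - p)
    return numerator / denominator

# Np, B and C are from the slide; p = 0.5 is an assumed value.
ns = dillman_sample_size(Np=80_000, p=0.5, B=0.05, C=1.96)

print(f"Completed sample size needed: {ns:.1f}")
```

With a population this large the result is close to the value from the simpler formula n = z²s²/H², since the finite population term has little effect.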
[Figure: Relationship between cost and sampling error with increases in sample size. Two series, Cost (dollars, 0 to 35,000) and H (margin of error in percent, 0 to 120), plotted against sample size n (1 to 1394).]