Error and Sample Sizes
PHC 6716
June 1, 2011
Chris McCarty

Types of error
• Non-sampling error
  – Error associated with collecting and analyzing the data
• Sampling error
  – Error associated with failing to interview the entire population

Non-Sampling Error
• Coverage error
  ▫ Wrong population definition
  ▫ Flawed sampling frame
  ▫ Interviewer or management error in following sampling frame
• Response error
  ▫ Badly worded question results in invalid or incorrect response
  ▫ Interviewer bias changes response
• Non-response error
  ▫ Respondent refuses to take survey or is away
  ▫ Respondent refuses to answer certain questions
• Processing errors
  ▫ Error in data entry or recording of responses
• Analysis errors
  ▫ Inappropriate analytical techniques, weighting or imputation are applied

Sampling Error
• Sampling error is estimated after the data are collected by calculating the Margin of Error and confidence intervals
• Surveys don’t have a Margin of Error, questions do
• Power analyses use estimates of the parameters involved in calculating the margin of error
• It is common to see sample sizes of 400 and 1000 for surveys (these are associated with 5% and 3% margins of error)
• In most cases the size of the population being sampled from is irrelevant
• The margin of error should be calculated using the size of the subgroups sampled

Margin of Error Formula
$H = \dfrac{z s}{\sqrt{n}}$
• H = Half interval (the margin of error), expressed in the same units as the standard deviation
• z = z score associated with level of confidence (typically 95%)
• s = standard deviation
• n = sample size

The z score
• The z value is the z score associated with a level of confidence
• Typically (almost exclusively) surveys use 95%
• This means that if the survey were replicated 100 times, 95 times out of 100 the estimate would be within the margin of error of the population value
• The z score associated with 95% is 1.96

The standard deviation (s)
• For a continuous variable the standard deviation is typically not known
• Previous research may suggest some reasonable range for the
margin of error
• After you have collected the data the standard deviation is known

Example: Age of Floridians
$H = \dfrac{1.96(17.6)}{\sqrt{406}} = \dfrac{34.496}{20.149} = 1.712$
• Sample of 406 Floridians
• Age range 18 to 92
• Mean age of sample = 52.3
• Standard deviation = 17.6
• 95 times out of 100 the sample estimate would be between 50.58 and 54.01 (Frequentist interpretation)

Margin of Error for a Proportion
$H = z\sqrt{\dfrac{p(1-p)}{n}}$
• p = proportion

Example: Floridians employed
$H = z\sqrt{\dfrac{p(1-p)}{n}} = 1.96\sqrt{\dfrac{.5529(.4471)}{415}} = .0478$
• Sample of 415 Floridians
• 55.29 percent employed
• 44.71 percent not employed
• 95 times out of 100 the estimate of the percent employed would be between 50.51 and 60.07

Margin of Error with Finite Population Adjustment
$H = z\sqrt{\dfrac{p(1-p)}{n}}\sqrt{\dfrac{N-n}{N-1}}$
• N = size of the population

Example: Floridians employed with finite population adjustment
$H = z\sqrt{\dfrac{p(1-p)}{n}}\sqrt{\dfrac{N-n}{N-1}} = 1.96\sqrt{\dfrac{.5529(.4471)}{415}}\sqrt{\dfrac{6{,}949{,}759-415}{6{,}949{,}759-1}} = .0478$
• With the finite population adjustment the margin of error is lower by less than .001 percentage points, so the two values agree to four decimal places

[Figure: H adjusted versus not adjusted as sample size increases. x-axis: n; y-axis: Margin of error; series: H, H adjusted]
• No real value to adjustment until the sample reaches about 10 percent of the population
• H adjusted falls to zero as you approach a census
• H unadjusted never does

Formula to determine sample size given a desired margin of error
$n = \dfrac{z^2 s^2}{H^2}$

Calculator sites
• http://www.americanresearchgroup.com/moe.html
• http://www.surveysystem.com/sscalc.htm

Power Analysis
n       H (%)
100     9.8
200     6.9
300     5.7
400     4.9
500     4.4
600     4.0
700     3.7
800     3.5
900     3.3
1000    3.1

Dillman formula
$N_s = \dfrac{(N_p)(p)(1-p)}{(N_p-1)(B/C)^2 + (p)(1-p)}$
Where:
N_s = completed sample size needed for desired level of precision
N_p = size of population (in this case assume 80,000)
p = proportion of population expected to choose one of the two response categories (in this case either owner or renter)
B = acceptable amount of sampling error (in this case assume +/-5% = 0.05)
C = z statistic associated with the confidence level (in this case assume
a 95% confidence level = 1.96)

[Figure: Relationship between cost and sampling error with increases in sample size. x-axis: n; left y-axis: Dollars; right y-axis: Margin of error (%); series: Cost, H]
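
The formulas in this lecture can be tried out with a short Python sketch. This is only an illustration under stated assumptions, not part of the original slides: the function names are hypothetical, the inputs come from the worked examples above, and p = 0.5 in the Dillman call is an assumed even split between the two response categories, since the slide does not give a value.

```python
import math

Z_95 = 1.96  # z score for a 95% level of confidence, as in the lecture


def moe_mean(s, n, z=Z_95):
    """Margin of error for a mean: H = z * s / sqrt(n)."""
    return z * s / math.sqrt(n)


def moe_proportion(p, n, z=Z_95, N=None):
    """Margin of error for a proportion: H = z * sqrt(p(1-p)/n).

    If a population size N is given, the finite population
    adjustment sqrt((N - n) / (N - 1)) is applied as well.
    """
    h = z * math.sqrt(p * (1 - p) / n)
    if N is not None:
        h *= math.sqrt((N - n) / (N - 1))
    return h


def sample_size(s, h, z=Z_95):
    """Sample size for a desired margin of error: n = z^2 s^2 / H^2.

    Returns a float; in practice one would round up.
    For a proportion, pass s = sqrt(p * (1 - p)).
    """
    return (z ** 2) * (s ** 2) / (h ** 2)


def dillman_sample_size(Np, p, B, C=Z_95):
    """Dillman formula for the completed sample size Ns."""
    return (Np * p * (1 - p)) / ((Np - 1) * (B / C) ** 2 + p * (1 - p))


if __name__ == "__main__":
    # Age of Floridians: s = 17.6, n = 406
    print("H (age):", round(moe_mean(17.6, 406), 3))

    # Floridians employed: p = .5529, n = 415, N = 6,949,759
    print("H (employed):", round(moe_proportion(0.5529, 415), 4))
    print("H (employed, adjusted):",
          round(moe_proportion(0.5529, 415, N=6_949_759), 4))

    # Power analysis table: p = 0.5 gives the widest margin of error
    for n in range(100, 1001, 100):
        print(n, round(100 * moe_proportion(0.5, n), 1))

    # Dillman example: Np = 80,000, B = 0.05, C = 1.96, assumed p = 0.5
    print("Ns:", round(dillman_sample_size(80_000, 0.5, 0.05), 1))
```

Passing the population size as an optional argument keeps the adjusted and unadjusted margins of error in one function, which makes it easy to see how little the adjustment matters until the sample is a sizable share of the population.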