INTRODUCTION TO SURVEY SAMPLING February 23, 2011 Karen Foote Retzer Survey Research Laboratory University of Illinois at Chicago www.srl.uic.edu 1 of 22 Census or sample? Census: • Gathering information about every individual in a population Sample: • Selection of a small subset of a population Survey Research Laboratory 2 of 22 Why sample instead of taking a census? • Less expensive • Less time-consuming • More accurate • Samples can lead to statistical inference about the entire population Survey Research Laboratory 3 of 22 Probability Sample • Generalize to the entire population • Unbiased results • Known, non-zero probability of selection Non-probability Sample • Exploratory research • Convenience • Probability of selection is unknown Survey Research Laboratory 4 of 22 Target population Definition: The population to which we want to generalize our findings. • Unit of analysis: Individual/Household/City • Geography: State of Illinois/Champaign County/City of Urbana • Age/Gender • Other variables Survey Research Laboratory 5 of 22 Examples of target populations • Population of adults (18+) in Champaign County • UIUC faculty, staff, students • Youth age 5 to 18 in Champaign County Survey Research Laboratory 6 of 22 Sampling frame • A complete list of all units, at the first stage of sampling, from which a sample is drawn • For example, Lists of addresses Phone numbers in specific area codes Maps of geographic areas Survey Research Laboratory 7 of 22 Sampling frames Example 1: • Population: Adults (18+) in Champaign County • Possible Frame: list of phone numbers, list of block maps, list of addresses Example 2: • Population: Females age 40–60 in Chicago • Possible Frame: list of phone numbers, list of block maps Example 3: • Population: Youth age 5 to 18 in Cook County • Possible Frame: List of schools Survey Research Laboratory 8 of 22 Sample designs for probability samples • Simple random samples • Systematic samples • Stratified samples • Cluster • Multi-stage Survey Research Laboratory 9 of 22 Simple random sampling • Definition: Every element has the same probability of selection and every combination of elements has the same probability of selection. • Probability of selection: n/N, where n = sample size; N = population size • Use Random Number tables, software packages to generate random numbers • Most precision estimates assume SRS Survey Research Laboratory 10 of 22 Systematic sampling • Definition: Every element has the same probability of selection, but not every combination can be selected. • Use when drawing SRS is difficult List of elements is long & not computerized • Procedure Determine population size N and sample size n Calculate sampling interval (N/n) Pick random start between 1 & sampling interval Take every ith case Problem of periodicity Survey Research Laboratory 11 of 22 Stratified sampling: Proportionate • To ensure sample resembles some aspect of population • Population is divided into subgroups (strata) Students by year in school Faculty by gender • Simple Random Sample (with same probability of selection) taken from each stratum. Survey Research Laboratory 12 of 22 Stratified sampling: Disproportionate • Major use is comparison of subgroups • Population is divided into subgroups (strata) Compare girls & boys who play Little League Compare seniors & freshmen who live in dorms • Probability of selection needs to be higher for smaller stratum (girls & seniors) to be able to compare subgroups. • Post-stratification weights Survey Research Laboratory 13 of 22 Cluster sampling • Typically used in face-to-face surveys • Population divided into clusters Schools (earlier example) Blocks • Reasons for cluster sampling Reduction in cost No satisfactory sampling frame available Survey Research Laboratory 14 of 22 Determining sample size: SRS • Need to consider Precision Variation in subject of interest • Formula Sample size no = CI2 * (pq) Precision For example: no = 1.962 * (.5 * .5) .052 • Sample size not dependent on population size. Survey Research Laboratory 15 of 22 Sample size: Other issues • Finite Population Correction n = no/(1 + no/N) • Design effects • Analysis of subgroups • Increase size to accommodate nonresponse • Cost Survey Research Laboratory 16 of 22 Changes in Field of Survey Research From Random Digit Dial to Address Based Sampling Survey Research Laboratory 17 of 22 Cell Phones • 24.5% of US Households are cell phone only (Blumberg & Luke, 2010) • Cell phone only households: • • • • Unrelated adults Non-white Young (<=29) Lower SES • RDD sample frames tend not to include cell phones and can lead to bias Survey Research Laboratory 18 of 22 Cell Phones, cont • Cell phone frames harder to target geographically than landline frame • Frame overlap with RDD • Public Opinion Quarterly, 2007 Special Issue, Vol. 71, Num. 5 Survey Research Laboratory 19 of 22 Address Based Sampling • Sampling addresses from a near universal listing of residential mail delivery locations (Michael Link) • Post-office Delivery Sequence Files (DSF) Survey Research Laboratory 20 of 22 Address Based Sampling Advantages • Coverage of target population is very high • Can be matched to name (~85%) and listed telephone numbers (~65%) • Includes non-telephone households and cell-only households • More efficient than traditional blocklisting Survey Research Laboratory 21 of 22 Address Based Sampling Disadvantages • Incomplete in rural areas (although improving with 9-1-1 address conversion) • Difficulties with “multidrop” addresses Survey Research Laboratory 22 of 22 Before taking questions… • Slides available at www.srl.uic.edu; click on “Seminar Series” • Next seminar: Introduction to Web Surveys, Wednesday, March 2 • Evaluation Survey Research Laboratory 23 of 22