VI. Sampling (Nov. 2, 4)
  Frankfort-Nachmias & Nachmias (Chapter 8 – Sampling and Sample Designs)
  King, Keohane and Verba (Chapter 4)
  Barbara Geddes. 1990. “How the Cases You Choose Affect the Answers You Get: Selection Bias in Comparative Politics.” Political Analysis, 2:1, 131-150.

Applications
  William Reed. 2000. “A Unified Statistical Model of Conflict Onset and Escalation.” American Journal of Political Science, Vol. 44, No. 1 (Jan., 2000), pp. 84-93.
  Richard Timpone. 1998. “Structure, Behavior and Voter Turnout in the United States.” American Political Science Review, Vol. 92 (1): 145-158.

Sampling
  • Population – any well-defined set of units of analysis; the group to which our theories apply
  • Sample – any subset of units collected in some manner from the population; the data we use to test our theories
  • Parameter vs. Statistic

Types of Samples
  • Probability sample – each element of the population has a known probability of being included in the sample
  • Nonprobability sample – each element of the population has an unknown probability of being included in the sample

Types of Nonprobability Samples
  • Convenience sample
  • Purposive sample
  • Problem – may not be representative of the population to which we want to generalize

Famous Example of Convenience Sampling
  • Literary Digest – used automobile registration lists and telephone directories as sampling frame for presidential polls
  • 1928: 18 million postcards; accurately predicted the outcome of the 1928 election (Hoover-R)
  • 1932: 20 million postcards; accurately predicted the 1932 election (Roosevelt-D)
  • 1936: predicted Landon (R) 57%
  • What happened?
Famous Example of Convenience Sampling, cont.
  • Before 1936: upper class / working class – more or less representative partisan distribution
  • 1936 and beyond: upper class disproportionately Republican, working class disproportionately Democratic

Types of Nonprobability Samples
  • Quota samples – elements are chosen based on selected characteristics and the representation of these characteristics in the population
  • Ensures accurate representation of selected characteristics
  • Elements with selected characteristics are chosen in convenience fashion

Famous Examples of Quota Samples
  • 1936 – George Gallup used quota sampling to accurately predict:
      The (inaccurate) Literary Digest prediction
      The winner of the 1936 election
  • 1948 – quota sampling incorrectly predicts Dewey to defeat Truman

Types of Probability Samples
  • Simple random sample – each element of the population has an equal chance of being selected
  • Systematic sample – elements selected from a list at predetermined intervals
  • Stratified sample – elements in the population are grouped into strata, and each stratum is randomly sampled

Example of Stratified Sampling
  • Population: 75% white, 10% black, 10% Hispanic, 5% Asian
  • Simple random sample of 1,000: approximately 750 white, 100 black, 100 Hispanic, 50 Asian
  • Minority-group subsamples are too small for group comparisons
  • Solution: use stratified sampling to over-sample minority groups (disproportionate stratified sampling)

Types of Probability Samples, cont.
  • Cluster sample – elements are grouped into “clusters,” and sampling proceeds in two stages:
  • (1) A random sample of clusters is chosen
  • (2) Elements within selected clusters are then randomly selected and aggregated to form the final sample
  • This is the sampling method used in many national surveys (e.g.
    clusters = metropolitan areas, zip codes, area codes)

Sampling Distribution (of sample means)
  • Start from the population
  • Draw a random sample of size n
  • Calculate the sample mean
  • Repeat until all possible random samples of size n are exhausted
  • The resulting collection of sample means is the sampling distribution of sample means

Sampling Distribution of Sample Means
  • Def: A frequency distribution of all possible sample means taken from the same population for a given sample size (n)
  • The mean of the sampling distribution will be equal to the population mean.
  • The sampling distribution will be approximately normally distributed (regardless of the population distribution, if n > 30)

Standard Error
  • How the sample means vary from sample to sample (i.e. within the sampling distribution) is expressed statistically by the standard deviation of the sampling distribution.

Standard Error, cont.
  • The standard error for a sample mean is calculated as: s / √n
      where s = sample standard deviation
            n = sample size

Simulating a Sampling Distribution (For a Sample Proportion)
  • Dichotomous variable for which the true population value is set at .25
  • Randomly draw 1,000 samples of size n
  • Repeat for different n’s and compare

[Figure: Simulation of a Sampling Distribution (n=10)]
[Figure: Simulation of a Sampling Distribution (n=100)]

Sample Size and Sampling Error

Sample Selection Bias
  • What is it?
  • What are the consequences of selecting on:
      The dependent variable?
      The independent variable?
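
The exercise outlined under "Simulating a Sampling Distribution (For a Sample Proportion)" can be sketched roughly as follows. This is only an illustrative sketch, not the code behind the original figures: the function name simulate_sampling_distribution, the fixed seed, and the use of Python's standard library are assumptions made here. The "theoretical SE" line assumes the population standard deviation of a dichotomous variable, √(p(1 − p)), in place of s in the s / √n formula.

```python
import math
import random
import statistics


def simulate_sampling_distribution(p, n, num_samples=1000, seed=0):
    """Draw num_samples random samples of size n from a dichotomous
    population with true proportion p; return the sample proportions.

    Hypothetical helper for illustration; the seed is fixed only so
    that repeated runs give the same draws.
    """
    rng = random.Random(seed)
    proportions = []
    for _ in range(num_samples):
        # Each draw is a "success" with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions


TRUE_P = 0.25  # true population value, as set on the slide

for n in (10, 100):
    props = simulate_sampling_distribution(TRUE_P, n)
    # Center of the simulated sampling distribution.
    mean_of_props = statistics.mean(props)
    # Spread of the simulated sampling distribution (its standard deviation).
    empirical_se = statistics.pstdev(props)
    # Assumed formula for a proportion: sqrt(p * (1 - p) / n).
    theoretical_se = math.sqrt(TRUE_P * (1 - TRUE_P) / n)
    print(f"n={n:>3}  mean={mean_of_props:.3f}  "
          f"empirical SE={empirical_se:.3f}  theoretical SE={theoretical_se:.3f}")
```

Comparing the two printed rows should show both simulated distributions centered near .25, with the n=100 distribution noticeably narrower than the n=10 one, which is the point of the "Sample Size and Sampling Error" comparison.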