Statistics and Data Analysis
Professor William Greene
Stern School of Business
IOMS Department
Department of Economics

[Slide banner graphics: pie chart of Percent vs Type; scatterplot and marginal plot of Listing vs IncomePC; probability plot, histogram, boxplot, and empirical CDF of Listing]

The Margin of Error

The CNN/Opinion Research Corp. said 51 percent of those polled thought Biden did the best job, while 36 percent thought Palin did the best job. On the question of the candidates' qualifications to assume the presidency, 87 percent of those polled said Biden is qualified and 42 percent said Palin is qualified. The poll had a margin of error of plus or minus 4 percentage points.
http://www.cnn.com/2008/POLITICS/10/03/debate.poll/index.html (9:30 AM, Friday, October 3, 2008)

What Does the “Margin of Error” Tell You?

Did Biden do the better job?
If we could ask every individual all over the world who had an opinion, the proportion who would say yes is θ. We assume that such a value exists. It has to be as of a moment in time. The next day, the same question might get a different answer.

The Margin of Error

We can’t ask everyone, so we ask a sample of people, i = 1,…,n.
Do you think Biden did the better job?
xi = 1 if the person answers yes, xi = 0 if they answer no.
51% said yes, so P = (1/n) Σi xi = 0.51.
Is θ = 0.51? No, we didn’t ask everyone; we just asked a sample. 0.51 is an estimate of θ.

Why the 4% Margin of Error?

We acknowledge that the 0.51 might be inaccurate because it is based on a sample.
We assume that whatever n is, the sample was drawn randomly.
We use our empirical rule to figure out what the real value of θ might be.

Some Theory

P = (1/n) Σi xi = 0.51
P is a random variable. It is the sum of n Bernoulli random variables, divided by n.
E[xi] = the probability that person i will answer yes. This is θ.
So, the expected value of P is (1/n) Σi θ = θ.
They think P is a good estimate of θ.
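
The claim that the expected value of P equals θ can be checked with a small simulation. This is only a sketch: the "true" proportion of 0.51, the sample size of 625, the number of repeated polls, and the helper name `sample_proportion` are all assumptions chosen for illustration, not figures from the pollster.

```python
import random

def sample_proportion(theta, n, rng):
    """Share of 'yes' answers among n simulated respondents.

    Each respondent answers yes with probability theta (a Bernoulli draw).
    """
    return sum(rng.random() < theta for _ in range(n)) / n

rng = random.Random(0)   # fixed seed so the run is repeatable
theta = 0.51             # assumed true proportion (illustrative only)
n = 625                  # assumed sample size (illustrative only)

# Repeat the poll many times and average the resulting values of P.
draws = [sample_proportion(theta, n, rng) for _ in range(2000)]
mean_p = sum(draws) / len(draws)

print(f"average of P over repeated polls: {mean_p:.4f}")
print(f"assumed theta:                    {theta:.4f}")
```

Any single poll can miss θ by a few points, but the average over many polls should sit very close to it.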

Theory Continued

Since P is a random variable, it has a variance.
Because the sample is random, the xi are independent, so the variances add:
Var[P] = (1/n²) Σi θ(1 − θ) = θ(1 − θ)/n
The standard deviation is the square root. Use P to estimate this.
The estimated standard deviation is sqrt(0.51(0.49)/n).
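
As a quick numerical illustration of the formula above, the sketch below evaluates the estimated standard deviation for a few sample sizes. The function name `estimated_sd` and the sample sizes are assumptions made for the example.

```python
import math

def estimated_sd(p, n):
    """Estimated standard deviation of a sample proportion: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

# P = 0.51 comes from the poll; the sample sizes are illustrative.
for n in (100, 625, 2500):
    print(n, round(estimated_sd(0.51, n), 4))
```

Quadrupling the sample size halves the standard deviation, since n appears under the square root.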

That Margin of Error

They use the same empirical rule we do.
The margin of error of ±0.04 is ±2 standard deviations.
So, one standard deviation is 0.02.
0.02² = P(1 − P)/n. If P is 0.51, n = 625.
(According to David Gregory, they asked 560.)
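
The back-calculation of n can be written out as a short sketch. It assumes, as the slide does, that the reported margin is exactly two estimated standard deviations; the function names are hypothetical.

```python
import math

def implied_sample_size(p, margin, k=2):
    """Solve (margin / k)^2 = p(1 - p)/n for n."""
    sd = margin / k
    return p * (1 - p) / sd ** 2

def margin_of_error(p, n, k=2):
    """k estimated standard deviations of a sample proportion."""
    return k * math.sqrt(p * (1 - p) / n)

n = implied_sample_size(0.51, 0.04)
print(round(n))                      # sample size implied by a 4-point margin

# Going the other way with the 560 respondents mentioned above:
m560 = margin_of_error(0.51, 560)
print(round(100 * m560, 1))          # margin in percentage points
```

With 560 respondents the two-standard-deviation margin comes out slightly above 4 points, which is consistent with a reported figure rounded to "plus or minus 4".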

What they Found

Based on a survey of 625 people, we believe the proportion of people who think Biden did a better job is between 47% and 55%.
Based on the same logic, the proportion for Palin is between 32% and 40%.
What would these ranges be if they had asked 6,250 people instead of 625?
Is this CERTAIN?
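
One way to explore the first question is to recompute the two-standard-deviation ranges for both sample sizes. The sketch below uses the reported shares of 51% and 36% and the back-calculated n = 625; the function name `two_sd_range` is an assumption for the example. On the second question, the two-standard-deviation rule covers roughly 95% of samples, so a range of this kind is never certain.

```python
import math

def two_sd_range(p, n):
    """Return (low, high) = p minus/plus 2 estimated standard deviations."""
    margin = 2 * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

shares = {"Biden": 0.51, "Palin": 0.36}   # shares reported by the poll

for n in (625, 6250):
    for name, p in shares.items():
        low, high = two_sd_range(p, n)
        print(f"n={n:5d}  {name}: {100 * low:.1f}% to {100 * high:.1f}%")
```

Ten times as many respondents narrows each range by a factor of sqrt(10), about 3.16, not by a factor of 10.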

Regression: θ|State ≠ θ Overall