Statistics and Data Analysis
Professor William Greene
Stern School of Business
IOMS Department
Department of Economics

Statistical Tests: Variances

Equal Variance Assumption

The formula can be used whether the variances of the two groups are the same or not. If it is known that the variances of the two groups are the same, then the results on pages 359-360 in your text can be used to compute a single variance estimator.

Why do this? (It's extra work.) It makes the test procedure more powerful. This will be true whenever you can "pool" data.

If the sample size is even moderately large, don't bother (unless you need to impress someone with your statistical expertise). The change will be trivial and it will not change your conclusion.
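
As a rough illustration of the point about pooling, the following Python sketch computes the two-sample t statistic both ways: with separate variances, and with a single pooled variance that weights the two sample variances by w = (N_A - 1)/(N_A + N_B - 2). The function names and the data are made up for illustration; they do not come from the course materials or the text.

```python
import math
import statistics


def unpooled_t(a, b):
    """t statistic using each group's own sample variance."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    diff = statistics.mean(a) - statistics.mean(b)
    return diff / math.sqrt(va / na + vb / nb)


def pooled_t(a, b):
    """t statistic using one pooled variance for both groups."""
    na, nb = len(a), len(b)
    # Weighted average of the two sample variances.
    w = (na - 1) / (na + nb - 2)
    sp2 = w * statistics.variance(a) + (1 - w) * statistics.variance(b)
    diff = statistics.mean(a) - statistics.mean(b)
    return diff / math.sqrt(sp2 * (1 / na + 1 / nb))


# Hypothetical data, chosen only to exercise the two formulas.
group_a = [12.1, 9.8, 11.4, 10.7, 13.0, 10.2, 11.9, 10.5]
group_b = [9.5, 10.9, 8.7, 9.9, 10.4]

print("unpooled t:", unpooled_t(group_a, group_b))
print("pooled t:  ", pooled_t(group_a, group_b))
```

When the two samples are the same size, the two statistics coincide exactly. They differ only when both the sample sizes and the sample variances differ.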

Equal Variances Computation with Small Samples

The test statistic is

\[ t = \frac{\bar{x}_A - \bar{x}_B - 0}{\sqrt{s_A^2/N_A + s_B^2/N_B}} \]

If the variances are the same, this is

\[ t = \frac{\bar{x}_A - \bar{x}_B - 0}{\sqrt{s_P^2/N_A + s_P^2/N_B}} = \frac{\bar{x}_A - \bar{x}_B - 0}{s_P \sqrt{1/N_A + 1/N_B}} \]

Use a weighted average,

\[ s_P^2 = w\,s_A^2 + (1 - w)\,s_B^2, \quad \text{where } w = \frac{N_A - 1}{N_A + N_B - 2}. \]

Unequal Variances

In comparing means, the validity (and power) of the test becomes dubious if the variances are very unequal. If \( s_A^2/s_B^2 > 2 \) or \( < 1/2 \), it might be a good idea to reconsider the whole exercise.

You can test for unequal variances.

Warning: If you do not reject the hypothesis of equal variances and then go on to test equality of the means, you now have two sources of type I error. (Theoretical statisticians worry about this.)

A Test for Unequal Variances

Assuming both samples are drawn from normal populations with the same variance (the means can be different), the ratio \( s_A^2/s_B^2 \) has an F distribution with \( N_A - 1 \) and \( N_B - 1 \) degrees of freedom.

If this F is larger than the upper critical value or lower than the lower critical value, reject the hypothesis of equal variances.
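
The variance-ratio test can be sketched in Python as follows. This is a minimal illustration, assuming SciPy is available and that a two-sided test at level alpha is wanted; `scipy.stats.f.ppf` returns the F critical values. The function name `variance_ratio_test` and the sample data are hypothetical, not part of the course materials.

```python
import statistics
from scipy import stats


def variance_ratio_test(a, b, alpha=0.05):
    """Two-sided F test of equal variances for two samples.

    Assumes both samples come from normal populations.
    Returns (F statistic, lower critical value, upper critical value, reject).
    """
    f_stat = statistics.variance(a) / statistics.variance(b)
    dfn, dfd = len(a) - 1, len(b) - 1
    # Critical values that leave alpha/2 in each tail of F(dfn, dfd).
    lower = stats.f.ppf(alpha / 2, dfn, dfd)
    upper = stats.f.ppf(1 - alpha / 2, dfn, dfd)
    reject = bool(f_stat < lower or f_stat > upper)
    return f_stat, lower, upper, reject


# Hypothetical data, for illustration only.
sample_a = [23.1, 19.8, 25.4, 21.7, 22.9, 20.3, 24.6]
sample_b = [18.2, 30.5, 12.9, 27.4, 21.0, 33.8]

f_stat, lower, upper, reject = variance_ratio_test(sample_a, sample_b)
print("F =", f_stat, "critical values:", lower, upper, "reject:", reject)
```

The ratio is rejected in either tail, so it does not matter which sample's variance goes in the numerator as long as the degrees of freedom are given in the matching order.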

Two Variances Test

Equal Variances Test

Variance Test: Income vs. Own/Rent

\[ 11183.0^2 / 17317.6^2 = 0.42 \]