Statistical Inference and Regression Analysis: GB.3302.30
Professor William Greene
Stern School of Business
IOMS Department
Department of Economics

Inference and Regression
Part 9 – Linear Model Topics

Agenda
Variable Selection – Stepwise Regression
Partial Regression – The Meaning of Multiple Regression
Panel Data
Test of Regression Stability
Generalized Regression
    Robust inference for OLS regression
    Heteroscedasticity and weighted least squares
    Autocorrelation and generalized least squares

Stepwise Regression
Start with (a) no model, or (b) the specific variables that are designated to be forced into whatever model is ultimately chosen.
(A: Forward step) Add a variable: “Significant?” Include the most “significant” variable not already included.
(B: Backward step) Are variables already included in the equation now adversely affected by collinearity? If any variables become “insignificant,” remove the least significant variable.
Return to (A).
This can cycle back and forth for a while. Usually not.
Ultimately selects only variables that appear to be “significant.”

Stepwise Regression Feature

Specify Procedure
All 10 predictors.
Subset of predictors that must appear in the final model chosen (optional).
No need to change Methods or Options. I changed the P value for inclusion to .10.
Used 0.10 as the cutoff “p-value” for inclusion or removal. All P values in the selected model will be less than or equal to .10.
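The forward and backward steps described above can be sketched in a few lines. This is only an illustration under stated assumptions, not Minitab's implementation: the function names `t_ratios` and `stepwise` are hypothetical, the data are simulated, and a cutoff of |t| ≥ 1.645 stands in for the P-value cutoff of .10 (a large-sample approximation).

```python
import numpy as np

def t_ratios(X, y):
    """OLS t ratios for each column of X (X must already contain the constant)."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    b = xtx_inv @ X.T @ y
    e = y - X @ b
    s2 = e @ e / (n - k)
    return b / np.sqrt(s2 * np.diag(xtx_inv))

def stepwise(X, y, t_cut=1.645, max_steps=50):
    """Hypothetical helper: forward/backward selection using |t| >= t_cut."""
    n, p = X.shape
    const = np.ones((n, 1))
    included = []
    for _ in range(max_steps):
        # (A) Forward step: find the most significant variable not yet included.
        best_j, best_t = None, t_cut
        for j in range(p):
            if j in included:
                continue
            Z = np.hstack([const, X[:, included + [j]]])
            t_j = abs(t_ratios(Z, y)[-1])
            if t_j >= best_t:
                best_j, best_t = j, t_j
        if best_j is None:
            break  # nothing left that passes the cutoff
        included.append(best_j)
        # (B) Backward step: drop the least significant variable
        # for as long as any included variable falls below the cutoff.
        while included:
            Z = np.hstack([const, X[:, included]])
            t = np.abs(t_ratios(Z, y)[1:])
            worst = int(np.argmin(t))
            if t[worst] >= t_cut:
                break
            included.pop(worst)
    return included

# Simulated example (an assumption): 10 predictors, y unrelated to all of them.
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

chosen = stepwise(X, y)
if chosen:
    Z = np.hstack([np.ones((n, 1)), X[:, chosen]])
    final_t = np.abs(t_ratios(Z, y)[1:])
else:
    final_t = np.array([])
print(chosen, final_t)
```

Whatever the procedure selects, every reported |t| is at or above the cutoff by construction, even though y here is pure noise.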
Stepwise Regression

What’s Wrong with It?
No reason to assume that the resulting model will make any sense.
Test statistics are completely invalid and cannot be used for statistical inference. (They can’t be t ratios if you know in advance they will be larger than 2.)

What’s Right with It?
Automatic – push button.
Simple to use. Not much thinking involved.
Relates in some way to connection of the variables to each other – significance – not just R².

Multiple Regression

The Frisch-Waugh Theorem
Partialing out the effect of a variable

U.S. Gasoline Market, 1953-2004

Multiple Regression of logG on logPG and logY
Two Side Regressions

Regress logG on a constant and logY and compute residuals RESLOGG.
Regress logPg on a constant and logY and compute residuals RESLOGPG.
Interesting Plots

Original regression of logG on a constant and logPg. The line slopes the wrong way.
New regression of ReslogG on a constant and ReslogPg. The line slopes the right way.

Regression of Residuals on Residuals
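The two side regressions and the residual-on-residual regression can be tried on simulated data. This is a sketch under assumptions: the series below are synthetic stand-ins, not the U.S. gasoline data, and the coefficients 1.0 and -0.3 and the helper `ols` are chosen only for illustration.

```python
import numpy as np

def ols(X, y):
    """Least squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Synthetic stand-ins (assumptions): price moves closely with income,
# and consumption rises with income but falls with price.
rng = np.random.default_rng(1)
n = 200
logy = rng.normal(size=n)
logpg = logy + 0.3 * rng.normal(size=n)
logg = 1.0 * logy - 0.3 * logpg + 0.05 * rng.normal(size=n)

const = np.ones(n)
X1 = np.column_stack([const, logy])

# Simple regression of logG on a constant and logPg.
b_simple = ols(np.column_stack([const, logpg]), logg)[1]

# Multiple regression of logG on a constant, logPg and logY.
b_multiple = ols(np.column_stack([const, logpg, logy]), logg)[1]

# Two side regressions on (constant, logY), then residuals on residuals.
reslogg = logg - X1 @ ols(X1, logg)
reslogpg = logpg - X1 @ ols(X1, logpg)
b_partial = ols(np.column_stack([const, reslogpg]), reslogg)[1]

print(b_simple, b_multiple, b_partial)
```

In this setup the simple regression slope comes out positive because price is acting as a proxy for income, while the residual-on-residual slope is negative and matches the coefficient on logPg in the multiple regression.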
NLOGIT
Minitab

Use CALC to compute logg=loge(g), logpg=loge(pg), logy=loge(pcincome).
Regression of logg on logpg and logy.
To save residuals, use Storage as above. Residuals are saved as RESI1 and RESI2 in the data area.

Frisch-Waugh (1933) Theorem

Context: Model contains two sets of variables:

\[ X = [\,[1, \text{time}] \mid [\text{other variables}]\,] = [X_1 \;\; X_2] \]

Regression model:

\[ y = X_1\beta_1 + X_2\beta_2 + \varepsilon \quad \text{(population)} \]
\[ y = X_1 b_1 + X_2 b_2 + e \quad \text{(sample)} \]

Problem: Algebraic expression for the second set of least squares coefficients, $b_2$.
Frisch-Waugh Result

“We get the same result whether we (1) detrend the other variables by using the residuals from a regression of them on a constant and a time trend and use the detrended data in the regression or (2) just include a constant and a time trend in the regression and not detrend the data.”

“Detrend the data” means compute the residuals from the regressions of the variables on a constant and a time trend.
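The equivalence of routes (1) and (2) is easy to check numerically. The sketch below uses simulated trending data, which is an assumption made only for illustration; the helper name `detrend` and all coefficient values are hypothetical.

```python
import numpy as np

# Simulated trending data (an assumption, not data from the slides).
rng = np.random.default_rng(2)
T = 52
time = np.arange(T, dtype=float)
X1 = np.column_stack([np.ones(T), time])              # constant and time trend
X2 = rng.normal(size=(T, 2)) + 0.05 * time[:, None]   # two trending regressors
y = 1.0 + 0.02 * time + X2 @ np.array([0.5, -1.0]) + rng.normal(size=T)

def detrend(v):
    """Residuals from regressing v (vector or matrix) on a constant and time."""
    return v - X1 @ np.linalg.lstsq(X1, v, rcond=None)[0]

# (1) Detrend y and the other variables, then regress residuals on residuals.
b2_detrended = np.linalg.lstsq(detrend(X2), detrend(y), rcond=None)[0]

# (2) Include the constant and the time trend in the regression, no detrending.
b_full = np.linalg.lstsq(np.column_stack([X1, X2]), y, rcond=None)[0]
b2_full = b_full[2:]

print(b2_detrended, b2_full)
```

The two coefficient vectors agree up to floating-point error, which is the content of the quoted result.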
Partitioned Solution

Method of solution (Why did F&W care? In 1933, matrix computation was not trivial!)
Direct manipulation of normal equations produces

\[ (X'X)b = X'y \]

\[ X = [X_1, X_2] \quad \text{so} \quad X'X = \begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix} \quad \text{and} \quad X'y = \begin{bmatrix} X_1'y \\ X_2'y \end{bmatrix} \]

\[ (X'X)b = \begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} X_1'y \\ X_2'y \end{bmatrix} \]

\[ X_1'X_1 b_1 + X_1'X_2 b_2 = X_1'y \]

\[ X_2'X_1 b_1 + X_2'X_2 b_2 = X_2'y \;\Longrightarrow\; X_2'X_2 b_2 = X_2'y - X_2'X_1 b_1 = X_2'(y - X_1 b_1) \]

Partitioned Solution

Direct manipulation of normal equations produces

\[ b_2 = (X_2'X_2)^{-1} X_2'(y - X_1 b_1) \]

What is this?
Regression of $(y - X_1 b_1)$ on $X_2$.

The Frisch-Waugh Result

Continuing the algebraic manipulation, the solution for $b_2$ is:

\[ b_2 = [(X_2'M_1)(M_1X_2)]^{-1}[(X_2'M_1)(M_1y)] \]

where $M_1 = I - X_1(X_1'X_1)^{-1}X_1'$, and $M_1X_2$ and $M_1y$ are residuals in regressions on $X_1$.

This is Frisch and Waugh’s famous result - the “double residual regression.”

How do we interpret this? A regression of residuals on residuals.
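The intermediate algebra can be filled in as follows. This is a sketch that assumes $X_1'X_1$ and $X_2'M_1X_2$ are nonsingular; it uses only the two normal equations and the definition of $M_1$ given above.

```latex
% Solve the first normal equation for b_1:
\[ X_1'X_1 b_1 + X_1'X_2 b_2 = X_1'y
   \;\Longrightarrow\;
   b_1 = (X_1'X_1)^{-1}X_1'(y - X_2 b_2) \]

% Substitute into the second normal equation:
\[ X_2'X_1 (X_1'X_1)^{-1}X_1'(y - X_2 b_2) + X_2'X_2 b_2 = X_2'y \]

% Collect terms, using M_1 = I - X_1 (X_1'X_1)^{-1} X_1':
\[ X_2' M_1 X_2 \, b_2 = X_2' M_1 y
   \;\Longrightarrow\;
   b_2 = (X_2' M_1 X_2)^{-1} X_2' M_1 y \]

% M_1 is symmetric and idempotent, so M_1 = M_1' M_1 and
\[ b_2 = [(M_1X_2)'(M_1X_2)]^{-1} (M_1X_2)'(M_1 y) \]
```

The last line is the regression of the residuals $M_1y$ on the residuals $M_1X_2$.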
“We get the same result whether we (1) detrend the other variables by using the residuals from a regression of them on a constant and a time trend and use the detrended data in the regression or (2) just include a constant and a time trend in the regression and not detrend the data.”

“Detrend the data” means compute the residuals from the regressions of the variables on a constant and a time trend.

Partial Regression

Important terms in this context:
Partialing out the effect of X1.
Netting out the effect …
“Partial regression coefficients.”

To continue belaboring the point: Note the interpretation of partial regression as “net of the effect of …”

Now, follow this through for the case in which X1 is just a constant term, a column of ones. What are the residuals in a regression on a constant? What is M1? Note that this produces the result that we can do linear regression on data in mean deviation form.

“Partial regression coefficients” are the same as “multiple regression coefficients.” This follows from the Frisch-Waugh theorem.

Understanding Multiple Regression

In a multiple regression, the coefficient on an x is interpreted to give the effect of a change in x on the change in y, holding everything else
constant. That is, “net of the effect of everything else.”

How can this work in y = a + b1 Educ + b2 Age + e? Each year of education means aging by 1 year. How is it possible to hold age constant and increase education by 1 year?

Application – Health and Income

German Health Care Usage Data, 7,293 Individuals, Varying Numbers of Periods

Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced panel with 7,293 individuals. There are altogether 27,326 observations. The number of observations ranges from 1 to 7 per family. (Frequencies are: 1=1525, 2=2158, 3=825, 4=926, 5=1051, 6=1000, 7=987).

Variables in the file are:
The dependent variable of interest is
DOCVIS = number of visits to the doctor in the observation period
HHNINC = household nominal monthly net income in German marks / 10000 (4 observations with income = 0 were dropped)
HHKIDS = 1 if there are children under age 16 in the household; 0 otherwise
EDUC = years of schooling
AGE = age in years

27/95 Multiple Regression
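The “net of the effect of everything else” interpretation can be checked numerically. The following is a minimal sketch with NumPy on simulated data; the variable names, sample size and coefficients are assumptions made for illustration and are not taken from the health care file. It compares the coefficient on education from the full multiple regression with the one obtained by first netting out the constant and age (the Frisch-Waugh result), and then repeats the comparison for the special case of data in mean deviation form.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical data: education and age are correlated by construction.
age = rng.uniform(25, 65, n)
educ = 8 + 0.1 * age + rng.normal(0, 2, n)
y = 1.0 + 0.5 * educ - 0.03 * age + rng.normal(0, 1, n)

def ols(X, y):
    """Least squares coefficients b = (X'X)^{-1} X'y."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def residuals(X, y):
    """Residuals from regressing y on the columns of X (that is, M y)."""
    return y - X @ ols(X, y)

ones = np.ones(n)

# (1) Full multiple regression of y on a constant, educ and age.
b_full = ols(np.column_stack([ones, educ, age]), y)

# (2) Partial regression: net out (constant, age) from both y and educ,
#     then regress residuals on residuals.
X1 = np.column_stack([ones, age])
y_net = residuals(X1, y)
educ_net = residuals(X1, educ)
b_partial = (educ_net @ y_net) / (educ_net @ educ_net)

# (3) Special case X1 = constant: the residuals are deviations from means,
#     so the slopes can be computed from data in mean deviation form.
Xd = np.column_stack([educ - educ.mean(), age - age.mean()])
b_dev = ols(Xd, y - y.mean())

print(b_full[1], b_partial)   # coefficient on educ, computed two ways
print(b_full[1:], b_dev)      # slopes with a constant vs. mean deviations
```

Both printed pairs should agree up to rounding error, which is the sense in which a multiple regression coefficient is a partial regression coefficient.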
Inference and Regression: Panel Data
31/95 THE Application of Frisch-Waugh: The Fixed Effects Model
A regression model with a dummy variable for each individual in the sample, each observed $T_i$ times.
$y_i = X_i\beta + d_i\alpha_i + \varepsilon_i$, for each individual
\[
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}
=
\begin{bmatrix}
X_1 & d_1 & 0 & \cdots & 0 \\
X_2 & 0 & d_2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
X_N & 0 & 0 & \cdots & d_N
\end{bmatrix}
\begin{bmatrix} \beta \\ \alpha \end{bmatrix}
+ \varepsilon
\]
The dummy variable block has N columns. N may be thousands. I.e., the regression has thousands of variables (coefficients).

32/95 Application – Health and Income
German Health Care Usage Data, 7,293 Individuals, Varying Numbers of Periods
The data and variables are as described above: an unbalanced panel of 7,293 families with 27,326 observations, DOCVIS as the dependent variable, and regressors HHNINC, HHKIDS, EDUC and AGE.
We also want to include a separate family effect for each of the 7,293 families. This requires 7,293 dummy variables in addition to the four regressors.

33/95 ‘Within’ Transformations
\[
X'M_D X = \sum_{i=1}^{N} X_i' M_D^i X_i, \qquad
\left(X_i' M_D^i X_i\right)_{k,l} = \sum_{t=1}^{T_i} (x_{it,k} - \bar{x}_{i.,k})(x_{it,l} - \bar{x}_{i.,l})
\]
\[
X'M_D y = \sum_{i=1}^{N} X_i' M_D^i y_i, \qquad
\left(X_i' M_D^i y_i\right)_{k} = \sum_{t=1}^{T_i} (x_{it,k} - \bar{x}_{i.,k})(y_{it} - \bar{y}_{i.})
\]

34/95 Least Squares Dummy Variable Estimator
b is obtained by ‘within’ groups least squares (group mean deviations).
Normal equations for a are $D'Xb + D'Da = D'y$
$a = (D'D)^{-1}D'(y - Xb)$
$a_i = (1/T_i)\sum_{t=1}^{T_i}(y_{it} - x_{it}'b) = \bar{e}_i$
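The equivalence between ‘within’ groups least squares and the brute-force regression on dummy variables can be checked on a small simulated panel. The sketch below uses NumPy; the panel dimensions, the coefficients and all variable names are assumptions made for illustration and are not the German health care data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical unbalanced panel: N groups with between 1 and 7 observations each.
N, K = 50, 2
T = rng.integers(1, 8, size=N)
group = np.repeat(np.arange(N), T)            # group index of every observation
n = group.size

alpha = rng.normal(0, 2, N)                   # assumed true fixed effects
X = rng.normal(size=(n, K)) + alpha[group][:, None]   # regressors correlated with effects
beta = np.array([1.0, -0.5])
y = X @ beta + alpha[group] + rng.normal(0, 1, n)

def group_means(v):
    """Group means of v (vector or matrix), one row per group."""
    sums = np.zeros((N,) + v.shape[1:])
    np.add.at(sums, group, v)
    return sums / T.reshape((N,) + (1,) * (v.ndim - 1))

# 'Within' estimator: least squares on deviations from group means.
Xbar, ybar = group_means(X), group_means(y)
Xw, yw = X - Xbar[group], y - ybar[group]
b_within = np.linalg.lstsq(Xw, yw, rcond=None)[0]

# a_i is the group mean of (y_it - x_it'b).
a_within = ybar - Xbar @ b_within

# Brute-force LSDV: regress y on [X, D] with one dummy column per group.
D = np.zeros((n, N))
D[np.arange(n), group] = 1.0
coef = np.linalg.lstsq(np.column_stack([X, D]), y, rcond=None)[0]
b_lsdv, a_lsdv = coef[:K], coef[K:]

print(b_within, b_lsdv)
```

The within computation never forms the n × N dummy matrix, which is what makes the estimator practical when N is in the thousands.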
35/95 Estimating the Fixed Effects Model
The FEM is a linear regression model but with many independent variables.
\[
\begin{bmatrix} b \\ a \end{bmatrix}
=
\begin{bmatrix} X'X & X'D \\ D'X & D'D \end{bmatrix}^{-1}
\begin{bmatrix} X'y \\ D'y \end{bmatrix}
\]
Using the Frisch-Waugh theorem,
\[
b = \left[X'M_D X\right]^{-1} X'M_D y
\]
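The step from the partitioned normal equations to the Frisch-Waugh form can be written out explicitly. This is a sketch that assumes the usual residual-maker definition $M_D = I - D(D'D)^{-1}D'$: solve the second block of normal equations for a, substitute into the first block, and collect terms.

```latex
\begin{aligned}
X'Xb + X'Da &= X'y, \\
D'Xb + D'Da &= D'y \;\Rightarrow\; a = (D'D)^{-1}D'(y - Xb), \\
X'Xb + X'D(D'D)^{-1}D'(y - Xb) &= X'y, \\
X'\left[I - D(D'D)^{-1}D'\right]Xb &= X'\left[I - D(D'D)^{-1}D'\right]y, \\
b &= \left[X'M_D X\right]^{-1}X'M_D y, \qquad M_D = I - D(D'D)^{-1}D'.
\end{aligned}
```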
36/95 Fixed Effects Estimator (cont.)
\[
M_D =
\begin{bmatrix}
M_D^1 & 0 & \cdots & 0 \\
0 & M_D^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & M_D^N
\end{bmatrix}
\quad \text{(The dummy variables are orthogonal)}
\]
\[
M_D^i = I_{T_i} - d_i(d_i'd_i)^{-1}d_i' = I_{T_i} - (1/T_i)\,d_i d_i'
=
\begin{bmatrix}
1-\frac{1}{T_i} & -\frac{1}{T_i} & \cdots & -\frac{1}{T_i} \\
-\frac{1}{T_i} & 1-\frac{1}{T_i} & \cdots & -\frac{1}{T_i} \\
\vdots & \vdots & \ddots & \vdots \\
-\frac{1}{T_i} & -\frac{1}{T_i} & \cdots & 1-\frac{1}{T_i}
\end{bmatrix}
\]

Inference and Regression: Chow Test

38/95 Equal Regressions
Setting: Two groups of observations (men/women, countries, two different periods, firms, etc.)
Regression Model: y = α + β1x1 + β2x2 + … + ε
Hypothesis: The same model applies to both groups
Rejection region: Large values of F

39/95 Application
Health satisfaction depends on many factors: Age, Income, Children, Education, Marital Status
Do these factors figure differently in a model for women compared to men?
Investigation: Multiple regression
Hypothesis: The regressions are the same.
Rejection Region: Estimated regressions that are very different.

40/95 Test of Structural Stability
Two groups, cleverly labeled Group 1 and Group 2.
Regression model applies to the two groups: $y_j = X_j\beta_j + \varepsilon_j$
Null hypothesis: $\beta_1 = \beta_2$
Test using an F statistic.
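The test of structural stability can be sketched in a few lines. The code below is a minimal illustration in NumPy, assuming the F statistic compares the pooled sum of squared residuals with the sum from the two separate regressions, with K + 1 numerator and N1 + N2 − 2K − 2 denominator degrees of freedom. The function names and the simulated two-group data are hypothetical; the last call uses the sums of squares reported in this deck for the health satisfaction regressions.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared residuals e'e from least squares of y on X."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return e @ e

def chow_f_from_ss(ss1, ss2, ss_all, n1, n2, k):
    """F statistic for equal regressions; k regressors plus a constant."""
    num_df = k + 1
    den_df = n1 + n2 - 2 * k - 2
    f = ((ss_all - ss1 - ss2) / num_df) / ((ss1 + ss2) / den_df)
    return f, num_df, den_df

def chow_test(X1, y1, X2, y2):
    """X1 and X2 must each include the constant column."""
    k = X1.shape[1] - 1
    ss1, ss2 = ssr(X1, y1), ssr(X2, y2)
    ss_all = ssr(np.vstack([X1, X2]), np.concatenate([y1, y2]))
    return chow_f_from_ss(ss1, ss2, ss_all, len(y1), len(y2), k)

# Hypothetical example: two groups whose first slope differs.
rng = np.random.default_rng(2)

def make_group(n, beta):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    return X, X @ beta + rng.normal(size=n)

X1, y1 = make_group(200, np.array([1.0, 0.5, -0.2]))
X2, y2 = make_group(300, np.array([1.0, 0.9, -0.2]))
f_stat, df1, df2 = chow_test(X1, y1, X2, y2)
print(f_stat, df1, df2)

# Sums of squares and sample sizes as reported for the health satisfaction
# regressions (women, men, pooled), with K = 5 regressors plus a constant.
f_health, _, _ = chow_f_from_ss(66677.66, 66705.75, 133585.3, 13083, 14243, 5)
print(f_health)
```

Separating `chow_f_from_ss` from the regressions is a design choice: it lets the statistic be recomputed from published output when the raw data are not at hand. With the reported sums of squares it gives a value of about 6.89.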
41/95 Testing Strategy: Setup
Fit separate regressions for the two groups.
Separate coefficient vectors b1 and b2. Each coefficient vector is $b_j = (X_j'X_j)^{-1}X_j'y_j$
Sums of squares: $e_1'e_1 = (y_1 - X_1b_1)'(y_1 - X_1b_1)$ and $e_2'e_2 = (y_2 - X_2b_2)'(y_2 - X_2b_2)$
Total separate sum of squares = SS12 = $e_1'e_1 + e_2'e_2$
Pooled regression: $b = (X_1'X_1 + X_2'X_2)^{-1}(X_1'y_1 + X_2'y_2)$
Pooled sum of squares: SSpooled = $e'e = (y_1 - X_1b)'(y_1 - X_1b) + (y_2 - X_2b)'(y_2 - X_2b)$
SSpooled must be ≥ SS12, because the pooled regression is the pair of separate regressions with the restriction b1 = b2 imposed.

42/95 Testing Strategy
Rejection Regions
(1) b1 is very different from b2
(2) SSpooled is much larger than SS12
These are the same.
\[
F[K+1,\; N_1+N_2-2K-2] = \frac{(SS_{pooled} - SS_{12})/(K+1)}{SS_{12}/(N_1+N_2-2K-2)}
\]

43/95 Procedure: Equal Regressions
There are N1 observations in Group 1 and N2 in Group 2.
There are K variables and the constant term in the model.
This test requires you to compute three regressions and retain the sum of squared residuals from each:
SS1 = sum of squares from N1 observations in group 1
SS2 = sum of squares from N2 observations in group 2
SSALL = sum of squares from NALL = N1+N2 observations when the two groups are pooled.
\[
F = \frac{(SS_{ALL} - SS_1 - SS_2)/(K+1)}{(SS_1 + SS_2)/(N_1+N_2-2K-2)}
\]
The hypothesis of equal regressions is rejected if F is larger than the critical value from the F table (K+1 numerator and NALL−2K−2 denominator degrees of freedom).

44/95 Health Satisfaction Models: Men vs. Women
German survey data over 7 years, 1984 to 1991 (with a gap). 27,326 observations on Health Satisfaction and several covariates.
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |   T    |P value | Mean of X|
+--------+--------------+----------------+--------+--------+----------+
Women===|=[NW = 13083]================================================
Constant|   7.05393353      .16608124      42.473   .0000   1.0000000
AGE     |   -.03902304      .00205786     -18.963   .0000  44.4759612
EDUC    |    .09171404      .01004869       9.127   .0000  10.8763811
HHNINC  |    .57391631      .11685639       4.911   .0000   .34449514
HHKIDS  |    .12048802      .04732176       2.546   .0109   .39157686
MARRIED |    .09769266      .04961634       1.969   .0490   .75150959
Men=====|=[NM = 14243]================================================
Constant|   7.75524549      .12282189      63.142   .0000   1.0000000
AGE     |   -.04825978      .00186912     -25.820   .0000  42.6528119
EDUC    |    .07298478      .00785826       9.288   .0000  11.7286996
HHNINC  |    .73218094      .11046623       6.628   .0000   .35905406
HHKIDS  |    .14868970      .04313251       3.447   .0006   .41297479
MARRIED |    .06171039      .05134870       1.202   .2294   .76514779
Both====|=[NALL = 27326]==============================================
Constant|   7.43623310      .09821909      75.711   .0000   1.0000000
AGE     |   -.04440130      .00134963     -32.899   .0000  43.5256898
EDUC    |    .08405505      .00609020      13.802   .0000  11.3206310
HHNINC  |    .64217661      .08004124       8.023   .0000   .35208362
HHKIDS  |    .12315329      .03153428       3.905   .0001   .40273000
MARRIED |    .07220008      .03511670       2.056   .0398   .75861817

45/95 Computing the F Statistic
+--------------------------------------------------------------------------------+
|                                        Women          Men            All       |
| LHS=HEALTH    Mean                 =   6.634172       6.924362       6.785662  |
|               Standard deviation   =   2.329513       2.251479       2.293725  |
|               Number of observs.   =   13083          14243          27326     |
| Model size    Parameters           =   6              6              6         |
|               Degrees of freedom   =   13077          14237          27320     |
| Residuals     Sum of squares       =   66677.66       66705.75       133585.3  |
|               Standard error of e  =   2.258063       2.164574       2.211256  |
| Fit           R-squared            =   0.060762       0.076033       .070786   |
| Model test    F (P value)          =   169.20(.000)   234.31(.000)   416.24(.0000) |
+--------------------------------------------------------------------------------+
\[
F = \frac{[133{,}585.3 - (66{,}677.66 + 66{,}705.75)]/6}{(66{,}677.66 + 66{,}705.75)/(27{,}326 - 6 - 6)} = 6.8904
\]
The critical value for F[6, 27314] is 2.0989.
Even though the regressions look similar, the hypothesis of equal regressions is rejected.

Inference and Regression: Generalized Regression

47/95 Generalized Regression
“General” in that the main assumptions of the regression model are not met:
Heteroscedasticity
Serial correlation (autocorrelation)

48/95 Conditional Homoscedasticity and Nonautocorrelation
Disturbances provide no information about each other.
$\mathrm{Var}[\varepsilon_i \mid X] = \sigma^2$
$\mathrm{Cov}[\varepsilon_i, \varepsilon_j \mid X] = 0$
\[
\begin{bmatrix}
\mathrm{Var}(\varepsilon_1) & \mathrm{Cov}(\varepsilon_1,\varepsilon_2) & \mathrm{Cov}(\varepsilon_1,\varepsilon_3) & \cdots & \mathrm{Cov}(\varepsilon_1,\varepsilon_N) \\
\mathrm{Cov}(\varepsilon_2,\varepsilon_1) & \mathrm{Var}(\varepsilon_2) & \mathrm{Cov}(\varepsilon_2,\varepsilon_3) & \cdots & \mathrm{Cov}(\varepsilon_2,\varepsilon_N) \\
\mathrm{Cov}(\varepsilon_3,\varepsilon_1) & \mathrm{Cov}(\varepsilon_3,\varepsilon_2) & \mathrm{Var}(\varepsilon_3) & \cdots & \mathrm{Cov}(\varepsilon_3,\varepsilon_N) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mathrm{Cov}(\varepsilon_N,\varepsilon_1) & \mathrm{Cov}(\varepsilon_N,\varepsilon_2) & \mathrm{Cov}(\varepsilon_N,\varepsilon_3) & \cdots & \mathrm{Var}(\varepsilon_N)
\end{bmatrix}
=
\begin{bmatrix}
\sigma^2 & 0 & 0 & \cdots & 0 \\
0 & \sigma^2 & 0 & \cdots & 0 \\
0 & 0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \sigma^2
\end{bmatrix}
= \sigma^2 I
\]

49/95 Data on 18 OECD countries, 19 years, Gasoline consumption
logGas/Car = a + b1 logIncome + b2 logP + b3 logCars/Capita + e

50/95 Heteroscedasticity
Countries are ordered by the standard deviation of their 19 residuals.
Regression of log of per capita gasoline use on log of per capita income, gasoline price and number of cars per capita for 18 OECD countries for 19 years. The standard deviation varies by country.

51/95 Heteroscedasticity: Regression results for 6 (of 18) countries

52/95 Autocorrelation
logG = β1 + β2 logPg + β3 logY + β4 logPnc + β5 logPuc + ε

53/95 Autocorrelation Results from an Incomplete Model

54/95 Heteroscedasticity
Disturbances still provide no information about each other.
$\mathrm{Var}[\varepsilon_i \mid X] = \sigma_i^2$
$\mathrm{Cov}[\varepsilon_i, \varepsilon_j \mid X] = 0$
But data are of unequal variation

$$
\begin{bmatrix}
\mathrm{Var}(\varepsilon_1) & \mathrm{Cov}(\varepsilon_1,\varepsilon_2) & \mathrm{Cov}(\varepsilon_1,\varepsilon_3) & \cdots & \mathrm{Cov}(\varepsilon_1,\varepsilon_N) \\
\mathrm{Cov}(\varepsilon_2,\varepsilon_1) & \mathrm{Var}(\varepsilon_2) & \mathrm{Cov}(\varepsilon_2,\varepsilon_3) & \cdots & \mathrm{Cov}(\varepsilon_2,\varepsilon_N) \\
\mathrm{Cov}(\varepsilon_3,\varepsilon_1) & \mathrm{Cov}(\varepsilon_3,\varepsilon_2) & \mathrm{Var}(\varepsilon_3) & \cdots & \mathrm{Cov}(\varepsilon_3,\varepsilon_N) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mathrm{Cov}(\varepsilon_N,\varepsilon_1) & \mathrm{Cov}(\varepsilon_N,\varepsilon_2) & \mathrm{Cov}(\varepsilon_N,\varepsilon_3) & \cdots & \mathrm{Var}(\varepsilon_N)
\end{bmatrix}
=
\begin{bmatrix}
\sigma_1^2 & 0 & 0 & \cdots & 0 \\
0 & \sigma_2^2 & 0 & \cdots & 0 \\
0 & 0 & \sigma_3^2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \sigma_N^2
\end{bmatrix}
$$

55/95 Response to Heteroscedasticity
Regression model (so far) assumes homoscedasticity
Any implications for use of least squares?
Any adjustment needed for regression computations?

56/95 Robust Regression Inference
Case 1. Heteroscedasticity is of unknown type.
Least squares is OK: still unbiased and consistent.
Least squares standard errors are not OK.
Earlier result, $\mathrm{Var}[b \mid X] = \sigma^2 (X'X)^{-1}$, is no longer correct.
It is possible to adjust the covariance matrix.
Robust Covariance Matrix: Gives appropriate standard errors whether or not there is heteroscedasticity.

57/95 The (Hal) White Estimator
Compute a covariance matrix for least squares that will not be distorted by heteroscedasticity, but will also be OK if there is no heteroscedasticity.
Steps:
1. Compute LS, b = … as usual, and compute residuals, $e_i$
2. Compute $A = X'X = \sum_i x_i x_i'$
3. Compute $B = X'[e^2]X = \sum_i e_i^2 x_i x_i'$
4. Compute estimator of $\mathrm{Var}[b \mid X]$ as $A^{-1} B A^{-1}$
This is called a sandwich estimator
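
The White steps can be sketched in a few lines of NumPy. This is an illustrative sketch, not the course software: the function name `white_covariance` and the simulated data are assumptions made for the example, and no small-sample correction is applied, matching the steps as listed.

```python
import numpy as np

def white_covariance(X, y):
    """Return (b, V_white, V_conventional) for least squares of y on X."""
    n, k = X.shape
    A = X.T @ X                               # A = sum_i x_i x_i'
    b = np.linalg.solve(A, X.T @ y)           # least squares coefficients
    e = y - X @ b                             # residuals e_i
    B = (X * (e ** 2)[:, None]).T @ X         # B = sum_i e_i^2 x_i x_i'
    A_inv = np.linalg.inv(A)
    V_white = A_inv @ B @ A_inv               # sandwich: A^-1 B A^-1
    s2 = e @ e / (n - k)
    V_conv = s2 * A_inv                       # conventional s^2 (X'X)^-1
    return b, V_white, V_conv

# Simulated (hypothetical) data whose disturbance sd grows with x.
rng = np.random.default_rng(1)
n = 500
x = rng.uniform(1.0, 5.0, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n) * x

b, V_white, V_conv = white_covariance(X, y)
print("b:", b)
print("robust se:      ", np.sqrt(np.diag(V_white)))
print("conventional se:", np.sqrt(np.diag(V_conv)))
```

The coefficient vector is the ordinary least squares result in both cases; only the estimated covariance matrix, and hence the standard errors, changes.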

58/95 Heteroscedasticity Robust Standard Errors
REGRESS;Lhs=lgaspcar ; Rhs=one,lincomep,lrpmg,lcarpcap ; Heteroscedasticity $
(Minitab does not know how to do this.)

59/95 Heteroscedasticity
Case 2. Simple function of a variable
$$ y_i = \alpha + \beta x_i + \varepsilon_i, \qquad \mathrm{Var}[\varepsilon_i] = \sigma^2 z_i^2 $$
Transform to a model that looks like the familiar regression
$$ \frac{y_i}{z_i} = \alpha \frac{1}{z_i} + \beta \frac{x_i}{z_i} + \frac{\varepsilon_i}{z_i} $$
$$ y_i^* = \alpha w_i^* + \beta x_i^* + \varepsilon_i^*, \qquad \mathrm{Var}[\varepsilon_i^*] = \frac{\sigma^2 z_i^2}{z_i^2} = \sigma^2 $$
Implication: Regression of y* on w* and x* (without a constant)
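
A small numerical sketch of this transformation follows. The data are simulated and every value in it (sample size, coefficients, the range of $z_i$) is an assumption chosen only for illustration. The variable $z_i$ is treated as known.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
x = rng.uniform(0.0, 10.0, size=n)
z = rng.uniform(0.5, 3.0, size=n)             # assumed known scale variable
alpha, beta, sigma = 1.0, 0.5, 0.3            # arbitrary illustrative values
y = alpha + beta * x + sigma * z * rng.normal(size=n)   # Var[eps_i] = sigma^2 z_i^2

# Transformed variables: divide everything, including the constant, by z_i.
y_star = y / z
w_star = 1.0 / z
x_star = x / z

# Regression of y* on w* and x*, with no added constant term.
Z = np.column_stack([w_star, x_star])
coef, *_ = np.linalg.lstsq(Z, y_star, rcond=None)
a_wls, b_wls = coef

# The residual variance of the transformed model estimates sigma^2.
resid = y_star - Z @ coef
s2 = resid @ resid / (n - 2)
print("alpha:", a_wls, "beta:", b_wls, "sigma^2:", s2)
```

The coefficient on w* estimates the original constant and the coefficient on x* estimates the original slope, so no reinterpretation of the parameters is needed after the transformation.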

61/95 Number of Observations

62/95 Weighted LS Using Means
$$ y_i = \alpha + \beta x_i + \varepsilon_i, \qquad \mathrm{Var}[\varepsilon_i] = \sigma^2 \frac{1}{N_i} $$
Transform to a model that looks like the familiar regression
$$ \sqrt{N_i}\, y_i = \alpha \sqrt{N_i} + \beta \sqrt{N_i}\, x_i + \sqrt{N_i}\, \varepsilon_i $$
$$ y_i^* = \alpha w_i^* + \beta x_i^* + \varepsilon_i^*, \qquad \mathrm{Var}[\varepsilon_i^*] = N_i \frac{\sigma^2}{N_i} = \sigma^2 $$
Implication: Regression of y* on w* and x* (without a constant)

63/95 Weighted Least Squares
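
A minimal sketch of weighted least squares with group means, under the assumption that each observation is the mean of $N_i$ underlying individual observations and that x is constant within a group. The group sizes and coefficients are made up for the example. In that special case the weighted regression on the means reproduces least squares on the individual data, which gives a convenient check.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical groups of unequal size; x is constant within a group.
sizes = np.array([5, 20, 50, 8, 100, 12])
x_group = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
g = np.repeat(np.arange(sizes.size), sizes)
y_ind = 1.0 + 0.5 * x_group[g] + rng.normal(scale=2.0, size=g.size)

# Group means of y: each has disturbance variance sigma^2 / N_i.
y_bar = np.bincount(g, weights=y_ind) / sizes

# Multiply every variable, including the constant, by sqrt(N_i).
root_n = np.sqrt(sizes)
Z = np.column_stack([root_n, root_n * x_group])      # w* and x*
coef_means, *_ = np.linalg.lstsq(Z, root_n * y_bar, rcond=None)

# Least squares on the individual observations, for comparison.
X_ind = np.column_stack([np.ones(g.size), x_group[g]])
coef_ind, *_ = np.linalg.lstsq(X_ind, y_ind, rcond=None)

print("WLS on means:     ", coef_means)
print("OLS on individuals:", coef_ind)
```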

64/95 Groupwise Heteroscedasticity

65/95 Strategy for Groupwise
$\mathrm{Var}[\varepsilon_{country,year}] = \sigma^2_{country}$
Strategy:
(1) Linear Regression pooling the data
(2) For each country, use $\hat{\sigma}^2_{country} = \dfrac{\sum_{years} e^2_{country,year}}{T_{country}}$
(3) Weighted least squares

66/95 Groupwise Weighted Least Squares
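
The three-step groupwise strategy can be sketched as follows. The panel here is simulated (18 groups of 19 periods to mirror the dimensions of the gasoline data), and the regressors, coefficients and group standard deviations are all hypothetical stand-ins, not the OECD data.

```python
import numpy as np

rng = np.random.default_rng(4)
n_groups, T = 18, 19
group = np.repeat(np.arange(n_groups), T)
sigma_g = np.linspace(0.02, 0.10, n_groups)        # hypothetical group sds

X = np.column_stack([np.ones(group.size), rng.normal(size=(group.size, 3))])
beta = np.array([2.0, 0.9, -0.9, -0.8])            # arbitrary values
y = X @ beta + rng.normal(size=group.size) * sigma_g[group]

# (1) Pooled linear regression.
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b_ols

# (2) Group variance estimate: sum of squared residuals over years / T.
sigma2_hat = np.bincount(group, weights=e ** 2) / np.bincount(group)

# (3) Weighted least squares: divide each row by its group's sigma-hat.
w = 1.0 / np.sqrt(sigma2_hat[group])
b_wls, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)

print("pooled OLS:   ", b_ols)
print("groupwise WLS:", b_wls)
```

Dividing each row by the estimated group standard deviation is the same transformation used for Case 2, with the estimated $\hat{\sigma}_{country}$ playing the role of $z_i$.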

67/95 Combining Heteroscedastic Estimates
Data on 18 OECD countries, 19 years, Gasoline consumption
logGas/Car = a + b1 logIncome + b2 logP + b3 logcars/Capita + e
18 Estimates of $(\beta_1, \beta_2, \beta_3, \beta_4)$

68/95 Combining Heteroscedastic Estimates
1. Simple Average
2. Weighted average based on s-squared
3. Pool the data
4. Weighted average based on all information
5. Generalized Least Squares (same as 4.)

69/95 Simple Average
Each least squares estimate $b_k$ has variance $A_k = \sigma_k^2 (X_k'X_k)^{-1}$
Simple average, $b = (1/N) b_1 + (1/N) b_2 + \dots$
$b = \sum_k w_k b_k$ where $w_k = 1/N$
$w_k$ is the same for all k
Each is unbiased, so the average is unbiased
Variance is $(1/N)^2 A_1 + (1/N)^2 A_2 + \dots$

70/95 Weighted Average Based on $\sigma_k^2$
Variances range from $.012^2$ to $.075^2$, difference of 36 times
Forming weights, a smaller variance should get greater weight.
Use $w_k = (1/\sigma_k^2) / [(1/\sigma_1^2) + (1/\sigma_2^2) + \dots + (1/\sigma_N^2)]$
Weights are positive and sum to 1.
Weighted average $b = \sum_k w_k b_k$ is still unbiased
Variance now $w_1^2 A_1 + w_2^2 A_2 + \dots + w_N^2 A_N$

71/95 Pooling the Data
$b = [X_1'X_1 + X_2'X_2 + \dots]^{-1} [X_1'y_1 + X_2'y_2 + \dots]$
Use a trick. $X_k'y_k = X_k'X_k b_k$
After some matrix algebra, $b = \sum_k W_k b_k$ = a matrix weighted average, with $W_k = [\sum_j X_j'X_j]^{-1} X_k'X_k$
$\sum_k W_k = I$
The variance is a messy matrix weighted sum
The average accounts for data but not $\sigma_k^2$

72/95 Best Weighted Average
Account for both $X_k'X_k$ and $\sigma_k^2$
In Matrix Weighting, $W_k = [\sum_j A_j^{-1}]^{-1} A_k^{-1}$, where $A_k = \sigma_k^2 (X_k'X_k)^{-1}$, so that estimates with smaller variance get greater weight
This is equivalent to the weighted least squares we did before.
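
The two matrix weighted averages can be checked numerically. The sketch below uses simulated groups (all sizes and parameter values are assumptions for illustration) and takes the "best" weights to be proportional to the inverse variances $A_k^{-1} = X_k'X_k / \sigma_k^2$, which is the form that coincides with weighted least squares. For simplicity the true $\sigma_k$ are used in place of estimates.

```python
import numpy as np

rng = np.random.default_rng(5)
n_groups, T, k = 6, 19, 3
sigmas = np.linspace(0.05, 0.5, n_groups)          # hypothetical group sds
beta = np.array([1.0, 0.5, -0.5])                  # arbitrary values

Xs, ys = [], []
for s in sigmas:
    Xk = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
    Xs.append(Xk)
    ys.append(Xk @ beta + s * rng.normal(size=T))

# Separate least squares estimate b_k for each group.
bs = [np.linalg.solve(Xk.T @ Xk, Xk.T @ yk) for Xk, yk in zip(Xs, ys)]

# Pooled least squares as a matrix weighted average of the b_k,
# with W_k = [sum_j X_j'X_j]^-1 X_k'X_k.
S = sum(Xk.T @ Xk for Xk in Xs)
W_pool = [np.linalg.solve(S, Xk.T @ Xk) for Xk in Xs]
b_pool_avg = sum(W @ b for W, b in zip(W_pool, bs))
X_all, y_all = np.vstack(Xs), np.concatenate(ys)
b_pool = np.linalg.solve(X_all.T @ X_all, X_all.T @ y_all)

# Inverse-variance matrix weights: W_k = [sum_j A_j^-1]^-1 A_k^-1.
P = [Xk.T @ Xk / s ** 2 for Xk, s in zip(Xs, sigmas)]
P_sum = sum(P)
W_best = [np.linalg.solve(P_sum, Pk) for Pk in P]
b_best = sum(W @ b for W, b in zip(W_best, bs))

# Weighted least squares on the stacked data, for comparison.
w_rows = np.concatenate([np.full(T, 1.0 / s) for s in sigmas])
b_wls, *_ = np.linalg.lstsq(X_all * w_rows[:, None], y_all * w_rows, rcond=None)

print("pooled:", b_pool, "matrix average:", b_pool_avg)
print("best:  ", b_best, "WLS:           ", b_wls)
```

Both sets of weights sum to the identity matrix; they differ only in whether the group variances enter the weights.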
73/95 Comparison Worst and Best
74/95 GLS for AR1

75/95 U.S. Gasoline Market, 1953-2004

76/95 Autocorrelation
How to adjust the estimator to deal with serial correlation
Robust estimator to patch least squares?

77/95 Robust Estimation for b
Assume $\mathrm{Cov}[e_t, e_{t-s}]$ fades as the time separation increases.
$\log G_t = \beta_1 \log Pg_t + \beta_2 \log Income_t + \gamma_1 \log Pnc_t + \gamma_2 \log Puc_t + \gamma_3 \log Ppt_t + \varepsilon_t$
Correlations of least squares residuals with past values for 13 years.
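Correlations of residuals with their own past values can be computed directly. This is a minimal sketch, assuming the residuals already average to zero so no mean-centering is applied; the function name `residual_autocorrelations` and the sample residuals are hypothetical, not taken from the gasoline data.

```python
def residual_autocorrelations(e, max_lag):
    """Return r_s = sum_{t>s} e_t * e_{t-s} / sum_t e_t^2 for s = 1..max_lag."""
    denom = sum(v * v for v in e)
    correlations = []
    for s in range(1, max_lag + 1):
        # Cross products of each residual with the residual s periods earlier.
        numer = sum(e[t] * e[t - s] for t in range(s, len(e)))
        correlations.append(numer / denom)
    return correlations


# Hypothetical residuals that drift slowly, as positively autocorrelated ones do.
residuals = [0.5, 0.4, 0.1, -0.2, -0.4, -0.3, 0.1, 0.3]
for lag, r in enumerate(residual_autocorrelations(residuals, 3), start=1):
    print(f"lag {lag}: {r:.3f}")
```

If the correlations shrink toward zero as the lag grows, that is the "fading" pattern the robust estimator relies on.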
78/95 Newey-West Estimator

79/95 Model for Autocorrelation
Based on a specific assumption: $\varepsilon_t = \rho\varepsilon_{t-1} + u_t$, $u_t \sim$ mean 0, variance $\sigma^2$.
$0 < \rho < 1$; $\rho$ is the autocorrelation coefficient.
Results (not derived here): $\mathrm{Var}[\varepsilon_t] = \sigma^2 / (1 - \rho^2)$ and $\mathrm{Correlation}[\varepsilon_t, \varepsilon_{t-1}] = \rho$.

80/95 How To Do Generalized Least Squares
Model: $y_t = \alpha + \beta x_t + \varepsilon_t$, $\varepsilon_t = \rho\varepsilon_{t-1} + u_t$.
A transformation: $\rho y_{t-1} = \rho\alpha + \rho\beta x_{t-1} + \rho\varepsilon_{t-1}$.
Subtract: $(y_t - \rho y_{t-1}) = \alpha(1 - \rho) + \beta(x_t - \rho x_{t-1}) + (\varepsilon_t - \rho\varepsilon_{t-1})$, or $y_t^* = \alpha^* + \beta x_t^* + u_t$.
Now compute the regression.
Two loose ends:
(1) Need to know $\rho$. First just use least squares, then $r = \sum_{t=2}^{T} e_t e_{t-1} / \sum_{t=1}^{T} e_t^2$.
(2) What happens to the first observation? Lost? No.
Use $y_1^* = y_1\sqrt{1 - \rho^2}$ and $x_1^* = x_1\sqrt{1 - \rho^2}$.

81/95 AR(1) for U.S. Gasoline

Inference and Regression Generalized Regression (Optional)

83/95 Generalized Regression Model
Setting: The classical linear model assumes that $E[\varepsilon\varepsilon'] = \mathrm{Var}[\varepsilon] = \sigma^2 I$. That is, observations are uncorrelated and all are drawn from a distribution with the same variance. The generalized regression (GR) model allows the variances to differ across observations and allows correlation across observations.
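Returning to the "How To Do Generalized Least Squares" slide, the two-step procedure (estimate $\rho$ by $r$ from least squares residuals, then run least squares on the transformed data with a rescaled first observation) might be sketched as follows. This is an illustration under assumptions: the data are simulated with made-up parameter values, and the function names `ols` and `ar1_two_step` are hypothetical.

```python
import numpy as np


def ols(X, y):
    """Least squares coefficients from the normal equations X'Xb = X'y."""
    return np.linalg.solve(X.T @ X, X.T @ y)


def ar1_two_step(x, y):
    """Two-step GLS for y_t = alpha + beta*x_t + eps_t with AR(1) disturbances."""
    T = len(y)
    X = np.column_stack([np.ones(T), x])

    # Step 1: ordinary least squares, then r from the residuals.
    b_ols = ols(X, y)
    e = y - X @ b_ols
    r = (e[1:] @ e[:-1]) / (e @ e)

    # Step 2: transform. Rows t >= 2 are quasi-differenced; row 1 is
    # scaled by sqrt(1 - r^2) so it is not lost. The column of ones is
    # transformed too, so the first coefficient estimates alpha itself.
    scale = np.sqrt(1.0 - r**2)
    X_star = np.vstack([scale * X[:1], X[1:] - r * X[:-1]])
    y_star = np.concatenate([scale * y[:1], y[1:] - r * y[:-1]])
    return b_ols, r, ols(X_star, y_star)


# Simulated example (all parameter values are assumptions for illustration).
rng = np.random.default_rng(42)
T, alpha, beta, rho = 500, 1.0, 2.0, 0.7
x = rng.normal(size=T)
u = rng.normal(size=T)
eps = np.zeros(T)
eps[0] = u[0] / np.sqrt(1.0 - rho**2)  # start from the stationary variance
for t in range(1, T):
    eps[t] = rho * eps[t - 1] + u[t]
y = alpha + beta * x + eps

b_ols, r, b_gls = ar1_two_step(x, y)
print("r:", r, "OLS:", b_ols, "two-step GLS:", b_gls)
```

Because the sketch transforms the constant column along with $x_t$, it reports an estimate of $\alpha$ directly rather than $\alpha^* = \alpha(1 - \rho)$.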
84/95 Implications
The assumption that $\mathrm{Var}[\varepsilon] = \sigma^2 I$ is used to derive the result $\mathrm{Var}[b] = \sigma^2 (X'X)^{-1}$. If it is not true, then the use of $s^2 (X'X)^{-1}$ to estimate $\mathrm{Var}[b]$ is inappropriate.
The assumption was used to derive most of our test statistics, so they must be revised as well.
Least squares gives each observation a weight of 1/n. But if the variances are not equal, then some observations are more informative than others.
Least squares is based on simple sums, so the information that one observation might provide about another is never used.

85/95 GR Model
The generalized regression model: $y = X\beta + \varepsilon$, $E[\varepsilon|X] = 0$, $\mathrm{Var}[\varepsilon|X] = \sigma^2\Omega$.
Leading cases:
Simple heteroscedasticity
Autocorrelation
Panel data and heterogeneity more generally.

86/95 Least Squares
Still unbiased. (The proof did not rely on $\Omega$.)
For consistency, we need the true variance of b:
$\mathrm{Var}[b|X] = E[(b-\beta)(b-\beta)'|X] = (X'X)^{-1} E[X'\varepsilon\varepsilon'X] (X'X)^{-1} = \sigma^2 (X'X)^{-1} X'\Omega X (X'X)^{-1}$.
Divide all 4 terms by n. If the middle one converges to a finite matrix of constants, we have the result, so we need to examine $(1/n) X'\Omega X = (1/n) \sum_i \sum_j \omega_{ij} x_i x_j'$.
This will be another assumption of the model.

87/95 Generalized Least Squares
A transformation of the model: $P = \Omega^{-1/2}$, so $P'P = \Omega^{-1}$.
$Py = PX\beta + P\varepsilon$, or $y^* = X^*\beta + \varepsilon^*$.
Why?
$E[\varepsilon^*\varepsilon^{*\prime}|X^*] = P E[\varepsilon\varepsilon'|X^*] P' = P E[\varepsilon\varepsilon'|X] P' = \sigma^2 P\Omega P' = \sigma^2 \Omega^{-1/2}\Omega\Omega^{-1/2} = \sigma^2 \Omega^0 = \sigma^2 I$

88/95 Generalized Least Squares
Aitken theorem. The Generalized Least Squares estimator, GLS.
$Py = PX\beta + P\varepsilon$, or $y^* = X^*\beta + \varepsilon^*$.
$E[\varepsilon^*\varepsilon^{*\prime}|X^*] = \sigma^2 I$
Use ordinary least squares in the transformed model. It satisfies the Gauss-Markov theorem.
$b^* = (X^{*\prime}X^*)^{-1}X^{*\prime}y^*$

89/95 Unbiasedness
$\hat\beta = (X'\Omega^{-1}X)^{-1} X'\Omega^{-1}y = \beta + (X'\Omega^{-1}X)^{-1} X'\Omega^{-1}\varepsilon$
$E[\hat\beta|X] = \beta + (X'\Omega^{-1}X)^{-1} X'\Omega^{-1}E[\varepsilon|X] = \beta$ if $E[\varepsilon|X] = 0$
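The two ways of writing the GLS estimator, least squares on the transformed data and the direct formula $(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y$, can be compared numerically. The sketch below assumes a known diagonal $\Omega$ with made-up entries; all data and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Hypothetical regressors and a known diagonal Omega (heteroscedasticity).
X = np.column_stack([np.ones(n), rng.normal(size=n)])
omega = np.exp(rng.normal(size=n))  # positive diagonal elements of Omega
Omega = np.diag(omega)
beta = np.array([1.0, -0.5])
y = X @ beta + np.sqrt(omega) * rng.normal(size=n)

# P = Omega^{-1/2}; for a diagonal Omega this is diag(1 / sqrt(omega_i)).
P = np.diag(1.0 / np.sqrt(omega))
X_star, y_star = P @ X, P @ y

# Ordinary least squares in the transformed model.
b_star = np.linalg.solve(X_star.T @ X_star, X_star.T @ y_star)

# Direct GLS formula.
Omega_inv = np.diag(1.0 / omega)
b_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)

print("transformed OLS:", b_star)
print("direct GLS:     ", b_gls)
```

The two results agree because $X^{*\prime}X^* = X'P'PX = X'\Omega^{-1}X$ and $X^{*\prime}y^* = X'\Omega^{-1}y$.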
90/95 Consistency
Use mean square: $\mathrm{Var}[\hat\beta|X] = \frac{\sigma^2}{n}\left(\frac{X'\Omega^{-1}X}{n}\right)^{-1} \to 0$?
Requires $X'\Omega^{-1}X/n$ to be "well behaved": it must either converge to a constant matrix or diverge.
Heteroscedasticity case: $\frac{X'\Omega^{-1}X}{n} = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{\omega_{ii}} x_i x_i'$
Autocorrelation case: $\frac{X'\Omega^{-1}X}{n} = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n}\omega^{ij} x_i x_j'$, where $\omega^{ij}$ is the ij-th element of $\Omega^{-1}$. There are $n^2$ terms. Convergence is unclear.
91/95 Generalized (Weighted) Least Squares
Heteroscedasticity case:
$$\mathrm{Var}[\varepsilon] = \sigma^2\Omega = \sigma^2\begin{bmatrix} \omega_1 & 0 & \cdots & 0 \\ 0 & \omega_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \omega_n \end{bmatrix}, \qquad \Omega^{-1/2} = \begin{bmatrix} 1/\sqrt{\omega_1} & 0 & \cdots & 0 \\ 0 & 1/\sqrt{\omega_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/\sqrt{\omega_n} \end{bmatrix}$$
$$\hat\beta = (X'\Omega^{-1}X)^{-1}(X'\Omega^{-1}y) = \left[\sum_{i=1}^{n}\frac{1}{\omega_i}x_i x_i'\right]^{-1}\left[\sum_{i=1}^{n}\frac{1}{\omega_i}x_i y_i\right]$$
$$\hat\sigma^2 = \frac{\sum_{i=1}^{n}\left(\dfrac{y_i - x_i'\hat\beta}{\sqrt{\omega_i}}\right)^2}{n - K}$$

92/95 Autocorrelation
$\varepsilon_t = \rho\varepsilon_{t-1} + u_t$ ("First order autocorrelation." How does this come about?)
Assume $-1 < \rho < 1$. Why?
ut = ‘nonautocorrelated white noise’ t = t-1 + ut (the autoregressive form) = (t-2 + ut-1) + ut = ... (continue to substitute) = ut + ut-1 + 2ut-2 + 3ut-3 + ... = (the moving average form) (Some observations about modeling time series.) 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 200000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 0 Mean StDev N 10 500000 300000 10 Normal 100 12 700000 400000 30 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing C ategory Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 1 0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 93/95 Autocorrelation Var[ t ] Var[ut ut 1 2ut 1 ...] 
\[ {} = \operatorname{Var}\left[\sum_{i=0}^{\infty} \rho^{i} u_{t-i}\right] = \sum_{i=0}^{\infty} \rho^{2i}\sigma_u^2 = \frac{\sigma_u^2}{1-\rho^2} \]

An easier way: Since \(\operatorname{Var}[\varepsilon_t] = \operatorname{Var}[\varepsilon_{t-1}]\) and \(\varepsilon_t = \rho\,\varepsilon_{t-1} + u_t\),

\[ \operatorname{Var}[\varepsilon_t] = \rho^2 \operatorname{Var}[\varepsilon_{t-1}] + \operatorname{Var}[u_t] + 2\rho\operatorname{Cov}[\varepsilon_{t-1}, u_t] = \rho^2 \operatorname{Var}[\varepsilon_t] + \sigma_u^2 \]

\[ \operatorname{Var}[\varepsilon_t] = \frac{\sigma_u^2}{1-\rho^2} \]

94/95 Autocovariances

Continuing...

\[ \operatorname{Cov}[\varepsilon_t, \varepsilon_{t-1}] = \operatorname{Cov}[\rho\,\varepsilon_{t-1} + u_t, \varepsilon_{t-1}] = \rho\operatorname{Cov}[\varepsilon_{t-1}, \varepsilon_{t-1}] + \operatorname{Cov}[u_t, \varepsilon_{t-1}] = \rho\operatorname{Var}[\varepsilon_{t-1}] = \rho\operatorname{Var}[\varepsilon_t] = \frac{\rho\,\sigma_u^2}{1-\rho^2} \]

\[ \operatorname{Cov}[\varepsilon_t, \varepsilon_{t-2}] = \operatorname{Cov}[\rho\,\varepsilon_{t-1} + u_t, \varepsilon_{t-2}] = \rho\operatorname{Cov}[\varepsilon_{t-1}, \varepsilon_{t-2}] + \operatorname{Cov}[u_t, \varepsilon_{t-2}] = \rho\operatorname{Cov}[\varepsilon_t, \varepsilon_{t-1}] = \frac{\rho^2 \sigma_u^2}{1-\rho^2} \]

and so on.

95/95 Autocorrelation Matrix

\[ \sigma^2 \Omega = \frac{\sigma_u^2}{1-\rho^2}
\begin{bmatrix}
1 & \rho & \rho^2 & \cdots & \rho^{T-1} \\
\rho & 1 & \rho & \cdots & \rho^{T-2} \\
\rho^2 & \rho & 1 & \cdots & \rho^{T-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho^{T-1} & \rho^{T-2} & \rho^{T-3} & \cdots & 1
\end{bmatrix} \]

96/95 Generalized Least Squares

\[ \Omega^{-1/2} =
\begin{bmatrix}
\sqrt{1-\rho^2} & 0 & 0 & \cdots & 0 \\
-\rho & 1 & 0 & \cdots & 0 \\
0 & -\rho & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & -\rho & 1
\end{bmatrix},
\qquad
\Omega^{-1/2} y =
\begin{bmatrix}
\sqrt{1-\rho^2}\, y_1 \\
y_2 - \rho\, y_1 \\
y_3 - \rho\, y_2 \\
\vdots \\
y_T - \rho\, y_{T-1}
\end{bmatrix} \]
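The AR(1) results on these last slides can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names, the values rho = 0.6 and T = 5, and the sample vector y are all illustrative assumptions. It builds the autocorrelation matrix with (s, t) element rho^|s-t|, builds the transformation matrix from the Generalized Least Squares slide, and confirms that the transformation removes the autocorrelation. Note that with Omega scaled to have ones on its diagonal, the product P Omega P' equals (1 - rho^2) I rather than I; the scalar does not affect the GLS estimator.

```python
import numpy as np

def ar1_correlation_matrix(rho, T):
    """Omega for AR(1) errors: the (s, t) element is rho**|s - t|."""
    idx = np.arange(T)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def ar1_transform_matrix(rho, T):
    """Transformation matrix from the GLS slide (hypothetical helper name)."""
    P = np.eye(T)
    P[0, 0] = np.sqrt(1.0 - rho ** 2)                # first row: sqrt(1 - rho^2)
    P[np.arange(1, T), np.arange(0, T - 1)] = -rho   # subdiagonal: -rho
    return P

# Illustrative values (assumptions, not taken from the slides).
rho, T, sigma_u2 = 0.6, 5, 1.0

omega = ar1_correlation_matrix(rho, T)
P = ar1_transform_matrix(rho, T)

# Transformed disturbances are uncorrelated with common variance factor (1 - rho^2).
whitened = P @ omega @ P.T

# Variance of epsilon_t two ways: truncated infinite sum vs. closed form.
var_sum = sigma_u2 * sum(rho ** (2 * i) for i in range(200))
var_closed = sigma_u2 / (1.0 - rho ** 2)

# Transformed data: first element scaled, the rest are quasi-differences.
y = np.array([1.0, 2.0, 4.0, 3.0, 5.0])
y_star = P @ y

print(np.round(whitened, 10))
print(var_sum, var_closed)
print(y_star)
```

The same matrix P would be applied to each column of X as well as to y, after which ordinary least squares on the transformed data gives the GLS estimator for a known rho.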