Statistical Inference and Regression Analysis: GB.3302.30 Professor William Greene Stern School of Business IOMS Department Department of Economics 2/97 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 700000 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 2 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 3/97 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 700000 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 3 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 Statistics and Data Analysis Part 7 – Regression Model-1 Regression Diagnostics 5/97 Using the Residuals How do you know the model is “good?” Various diagnostics to be developed over the semester. But, the first place to look is at the residuals. 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 6/97 Residuals Can Signal a Flawed Model Standard application: Cost function for output of a production process. Compare linear equation to a quadratic model (in logs) (123 American Electric Utilities) 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 7/97 Electricity Cost Function 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 8/97 Candidate Model for Cost Log c = a + b log q + e Scatterplot of logCost vs logOutput 7 6 5 Most of the points in this area are above the regression line. logCost 4 3 Most of the points in this area are above the regression line. 2 1 Most of the points in this area are below the regression line. 0 -1 800000 800000 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI Mean StDev N AD P-Value 95 90 500000 400000 100000 15000 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 4 5 200000 2 1 100000 15000 0 200000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 300000 10 Mean StDev N 10 500000 400000 20 300000 200000 60 50 40 30 Marginal Plot of Listing vs IncomePC Normal 100 12 700000 600000 70 12 Empirical CDF of Listing 14 800000 80 600000 200000 369687 156865 51 0.994 0.012 10 Histogram of Listing 900000 99 700000 300000 100000 Probability Plot of Listing 8 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 900000 6 logOutput Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball 4 Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 2 Percent 0 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 9/97 A Better Model? Log Cost = α + β1 logOutput + β2 [logOutput]2 + ε 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 10/97 Candidate Models for Cost The quadratic equation is the appropriate model. Logc = a + b1 logq + b2 log2q + e 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 11/97 Missing Variable Included Residuals from the quadratic cost model Residuals Versus logOutput Residuals Versus logOutput (response is logCost) (response is logCost) 2.0 0.50 1.5 0.25 Residual Residual 1.0 0.5 0.00 0.0 -0.25 -0.5 -0.50 -1.0 0 2 4 6 logOutput 8 10 12 0 2 4 6 logOutput 8 10 12 Residuals from the linear cost model 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 12/97 Unusual Data Points Outliers have (what appear to be) very large disturbances, ε Scatterplot of Weight vs TLength Regression of Foreign Box Office on Domestic Overseas = 6.693 + 1.051 Domestic 160 1400 S R-Sq R-Sq(adj) 1200 140 73.0041 52.2% 52.1% Overseas Weight 1000 120 100 800 600 400 80 200 0 60 16 18 20 TLength 22 24 26 28 0 Wolf weight vs. tail length Sausage 5.8% 900000 800000 800000 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC 900000 Scatterplot of Listing vs IncomePC Normal - 95% CI 90 400000 200000 100000 15000 60 50 40 30 20000 22500 25000 IncomePC 27500 30000 32500 6 200000 2 1 100000 15000 800000 1000000 Normal Mean StDev N 369687 156865 51 80 5 400000 600000 Listing Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 100 8 4 200000 600 10 500000 300000 0 500 12 700000 400000 10 17500 400 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 500000 300 Domestic Histogram of Listing 900000 Mean StDev N AD P-Value 95 600000 300000 100000 Probability Plot of Listing 99 700000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball 200 The 500 most successful movies Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 100 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 14 Percent 12 Frequency 10 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 13/97 Outliers 99.5% of observations will lie within mean ± 3 standard deviations. We show (a+bx) ± 3se below.) Regression of Foreign Box Office on Domestic Titanic is 8.1 standard deviations from the regression! Overseas = 6.693 + 1.051 Domestic 1400 S R-Sq R-Sq(adj) 1200 1000 Overseas These observations might deserve a close look. 73.0041 52.2% 52.1% 800 Only 0.86% of the 466 observations lie outside the bounds. (We will refine this later.) 600 400 200 0 800000 800000 600000 500000 400000 Mushroom 16.2% Plain 32.5% 90 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 500000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 600000 600 Scatterplot of Listing vs IncomePC Normal - 95% CI 99 700000 300000 100000 Probability Plot of Listing 500 17500 20000 22500 25000 IncomePC 27500 30000 32500 Percent 900000 400 0 1000000 60 800000 40 Listing Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball 300 Domestic Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% 200 Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 100 Frequency 0 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 14/97 Prices paid at auction for Monet paintings vs. surface area (in logs) logPrice = a + b logArea + e Not an outlier: Monet chose to paint a small painting. Possibly an outlier: Why was the price so low? 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 15/97 What to Do About Outliers (1) Examine the data (2) Are they due to mismeasurement error or obvious “coding errors?” Delete the observations. (3) Are they just unusual observations? Do nothing. (4) Generally, resist the temptation to remove outliers. Especially if the sample is large. (500 movies is large. 10 wolves is not.) (5) Question why you think it is an outlier. Is it really? 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 16/97 Regression Options 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 17/97 Diagnostics Residuals e = y - (a + bx) Standardized Residual e e* = (x-x) 2 se 1 (x-x) 2 e* has mean 0 and standard deviation 1 Influential observations have very large values of | x i - x | / (x i - x ) 2 . (How large depends on the number of variables in the model.) 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 18/97 On Removing Outliers Be careful about singling out particular observations this way. The resulting model might be a product of your opinions, not the real relationship in the data. Removing outliers might create new outliers that were not outliers before. Statistical inferences from the model will be incorrect. 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 Statistics and Data Analysis Part 7 – Regression Model-2 Statistical Inference 20/97 b As a Statistical Estimator What is the interest in b? = dE[y|x]/dx Effect of a policy variable on the expectation of a variable of interest. Effect of medication dosage on disease response … many others 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 21/97 Application: Health Care Data German Health Care Usage Data, There are altogether 27,326 observations on German households, 1984-1994. 800000 800000 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 Percent 900000 0 1000000 60 800000 40 Listing Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 1(Number of doctor visits > 0) 1(Number of hospital visits > 0) health satisfaction, coded 0 (low) - 10 (high) number of doctor visits in last three months number of hospital visits in last calendar year insured in public health insurance = 1; otherwise = 0 insured by add-on insurance = 1; otherswise = 0 household nominal monthly net income in German marks / 10000. children under age 16 in the household = 1; otherwise = 0 years of schooling age in years marital status years of education Frequency DOCTOR = HOSPITAL= HSAT = DOCVIS = HOSPVIS = PUBLIC = ADDON = INCOME = HHKIDS = EDUC = AGE = MARRIED = EDUC = 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 22/97 Regression? Population relationship Income = + Health + (For this population, Income = .31237 + .00585 Health + E[Income | Health] = .31237 + .00585 Health 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 23/97 Distribution of Health 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 24/97 Distribution of Income 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 25/97 Average Income | Health Sausage 5.8% 900000 800000 800000 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC 900000 90 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 200000 2 1 100000 15000 800000 1000000 Normal Mean StDev N 369687 156865 51 80 5 400000 600000 Listing Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 100 8 4 200000 3192 10 500000 300000 0 3061 12 700000 400000 10 6172 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 400000 4191 Histogram of Listing 900000 Mean StDev N AD P-Value 95 500000 2570 Scatterplot of Listing vs IncomePC Normal - 95% CI 600000 300000 100000 Probability Plot of Listing 99 700000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball 1390 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing Pepperoni 21.8% 1173 Percent Meatball Garlic 5.0% 2.3% 642 Frequency Pie Chart of Percent vs Type Mushroom and Onion 9.2% 255 Listing 447 Percent = Listing Nj Health 4233 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 26/97 b is a statistic Random because it is a sum of the ’s. It has a distribution, like any sample statistic 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 27/97 Sampling Experiment 500 samples of N=52 drawn from the 27,326 (using a random number generator to simulate N observation numbers from 1 to 27,326) Compute b with each sample Histogram of 500 values 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 28/97 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 29/97 Conclusions Sampling variability Seems to center on Appears to be normally distributed 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 30/97 Distribution of slope estimator, b Assumptions: (Model) Regression: yi = + xi + i (Crucial) Exogenous data: data x and noise are independent; E[|x]=0 or Cov(,x)=0 (Temporary) Var[|x] = 2, not a function of x (Homoscedastic) Results: What are the properties of b? 800000 800000 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 Percent 900000 0 1000000 60 800000 40 Listing Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Frequency Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Listing Pie Chart of Percent vs Type Mushroom and Onion 9.2% Percent 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 31/97 x x y y x x N b i 1 i i 2 N i 1 i yi x i i Treat the sample data on x as given, then x x ( x i i ) ( x ) N (Insert expr. for yi ) b i 1 i x i 1 x b i 1 i 2 x ( ) ( x i x) (i ) N (Expand) x N i i1 x i x N 2 x x x x x x (Expand more) b x x x x x x x x (Simplify) b x x x x x x (First term simplifies) b x x N N i 1 i i 1 i 2 N i 1 i 1 2 i 1 i N i N 2 i 1 2 i 1 i N i N i i i 2 N i 1 i i N i 1 i i 2 N i 1 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% i 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 32/97 (1) b is unbiased and linear in x x x x x x b x x x x x x N N i 1 i i 1 i 2 N i 1 N i 2 N i 1 i i 1 w i i , w i n i 1 i xi x 2 N i1 x i x i 2 N i 1 i i (Linearity in i ) N E[b | x1 ,..., x N ] E i 1 w i i | x i i 1 w i E[ i | x i ] N (Unbiasedness) 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 33/97 (2) b is efficient 800000 800000 500000 400000 Mushroom 16.2% Plain 32.5% 90 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 500000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 600000 300000 100000 Scatterplot of Listing vs IncomePC Normal - 95% CI 700000 700000 600000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 Percent 900000 0 1000000 60 800000 40 Listing Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 Frequency Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepper and Onion 7.3% Variance of b is smallest among linear unbiased estimators. Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Gauss – Markov Theorem: Like Rao Blackwell. (Proof in Greene) Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 34/97 (3) b is consistent xi x wi 2 N i1 x i x b i 1 w i i , n (Linearity in i ) 2 x x N N i Var[b | data] i 1 w i2 2 2 i 1 2 N x x i 1 i N 2 xi x 2 2 i 1 = 0 2 2 N 2 N x x xi x i 1 i 1 i 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 35/97 Consistency: N=52 vs. N=520 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 36/97 a is unbiased and consistent a y bx x i 1 w i i x N i 1 ( N1 xw i )i N E[a | data] i 1 ( N1 xw i )E[i | data] N 2 2 2 x (x x) x(x x) 1 i Var[a | data] i 1 2 N i 2 2 N N (x x) 2 i 1 (x i x) i 1 i x2 2 1 N 0 2 N i 1 (x i x) N 800000 800000 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 Percent 900000 0 1000000 60 800000 40 Listing Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% Frequency 2 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 37/97 Covariance of a and b a i 1 ( N1 xw i )i N b i 1 w i i N Covariance[a, b] i 1 Cov ( N1 xw i )i , w i i (cross terms are all zero because of independence) N 2 i 1 ( N1 xw i )w i N 2 N w i 2 1 N i 1 N i 1 xw i2 (first term is zero) = 2 x i 1 w i2 N 800000 800000 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 Percent 900000 0 1000000 60 800000 40 Listing Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball (x i x) (We will need this result later) 2 Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% i 1 Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% N Frequency 2 x 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 38/97 Inference about Have derived expected value and variance of b. b is a ‘point’ estimator Looking for a way to form a confidence interval. Need a distribution and a pivotal statistic to use. 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 39/97 Normality Additional assumption about the data i comes from a normal population with mean 0 and variance 2 b i 1 wi i N 2 2 E[b|data] , Var[b|data]= N b i 1 ( xi x ) 2 Now, we also have that b has a normal distribution with this mean and variance 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 40/97 Confidence Interval b ~ N[0,1] b Prob b z 2(1 ( z )) b E.g., if z = 1.96, Prob[b is more than 1.96 standard deviations from ] is 2(1 - (1.96)) = 2(1 - .975) = 2(.025) = .05 The usual confidence interval calculation, applied to b. Prob[b - zb < b + zb ] = 2(1 ( z )) ********* We need to know to use this. *********** 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 41/97 Estimating sigma squared i=1 ei2 N se2 = N- 2 = N 2 (y a bx ) i i i=1 N- 2 Is an unbiased estimator of 2 . (Proof later) se2 2 2 Use to form s N as an estimator of b N 2 i 1 ( xi x ) i 1 ( xi x ) 2 2 b (N-2)s e2 has a chi squared distribution with N-2 degrees of freedom. 2 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 42/97 Usable Confidence Interval Use s instead of s. Use t distribution instead of normal. Critical t depends on degrees of freedom b - ts < < b + ts 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 43/97 Slope Estimator Results b=72.7181 N 2 Σ i=1 ei =10751.6 N=62 N Σ i=1 (cntwait3i -cntwait3) 2 =1.49654 s e2 =13.38627 2 s b =10.94249 t*=2.0003 Confidence interval = 72.7181 2.0003(10.94249) 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 44/97 Regression Results ----------------------------------------------------------------------------Ordinary least squares regression ............ LHS=BOX Mean = 20.72065 Standard deviation = 17.49244 ---------No. of observations = 62 DegFreedom Mean square Regression Sum of Squares = 7913.58 1 7913.57745 Residual Sum of Squares = 10751.5 60 179.19235 Total Sum of Squares = 18665.1 61 305.98555 ---------Standard error of e = 13.38627 Root MSE 13.16860 Fit R-squared = .42398 R-bar squared .41438 Model test F[ 1, 60] = 44.16247 Prob F > F* .00000 --------+-------------------------------------------------------------------| Standard Prob. 95% Confidence BOX| Coefficient Error t |t|>T* Interval --------+-------------------------------------------------------------------Constant| -14.3600** 5.54587 -2.59 .0121 -25.2297 -3.4903 CNTWAIT3| 72.7181*** 10.94249 6.65 .0000 51.2712 94.1650 --------+-------------------------------------------------------------------Note: ***, **, * ==> Significance at 1%, 5%, 10% level. ----------------------------------------------------------------------------- 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 45/97 Hypothesis Test about Outside the confidence interval is the rejection for hypothesis tests about For the internet buzz regression, the confidence interval is 51.2712 to 94.1650 800000 800000 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 Percent 900000 0 1000000 60 800000 40 Listing Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Frequency Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Listing Pie Chart of Percent vs Type Mushroom and Onion 9.2% The hypothesis that equals zero is rejected. Percent 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 Statistics and Data Analysis Part 7-3 – Prediction 47/97 Predicting y Using the Regression Actual y0 is + x0 + 0 Prediction is y0^ = a + bx0 + 0 Error is y0 – y0^ = (a-) + (b-)x0 + 0 Variance of the error is Var[a] + x02 Var[b] + 2x0 Cov[a,b] + Var[0] 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 48/97 Prediction Variance Var[a] + x 02 Var[b] + 2x 0 Cov[a,b] + var[ε 0 ] 1 2 2 x2 1 x 2 2 N x 2 x 0 0 2 N 2 N 2 N ( x x ) ( x x ) ( x x ) i 1 i i 1 i i 1 i Collect the terms: 2 ( x0 x ) 2 1 1 2 2 2 Var[prediction error] = 1 N 1 ( x x ) sb 0 2 N N i 1 ( xi x ) To form a prediction interval, use s to estimate and use t distribution. 2 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 49/97 Quantum of Solace Pie Chart of Percent vs Type Meatball Garlic 5.0% 2.3% Mushroom and Onion 9.2% Pepperoni 21.8% Sausage 5.8% 900000 800000 800000 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC 900000 Scatterplot of Listing vs IncomePC Normal - 95% CI 90 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 500000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 600000 300000 100000 Probability Plot of Listing 99 700000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing Percent Frequency Listing Percent Actual Box = $67.528882M a=-14.36, b=72.7181, N=62, sb =10.94249, s2 = 13.38632 buzz = 0.76, prediction = 40.906 Mean buzz = 0.4824194 (buzz – mean)2 = 1.49654 Sforecast = 13.8314252 Confidence interval = 40.906 +/- 2.003(13.831425) = 13.239 to 68.527 (Note: The confidence interval contains the value) Listing 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 50/97 Forecasting Out of Sample Fitted Line Plot Regression Analysis: G versus Income The regression equation is G = 1.93 + 0.000179 Income Predictor Coef SE Coef T P Constant 1.9280 0.1651 11.68 0.000 Income 0.00017897 0.00000934 19.17 0.000 S = 0.370241 R-Sq = 88.0% R-Sq(adj) = 87.8% G = 1.928 + 0.000179 Income 8 Regression 95% PI S R-Sq R-Sq(adj) 7 0.370241 88.0% 87.8% G 6 5 How to predict G for 2012? You would need first to predict Income for 2012. 4 3 10000 12500 15000 17500 20000 22500 25000 27500 Income How should we do that? Per Capita Gasoline Consumption vs. Per Capita Income, 1953-2004. 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 51/97 The Extrapolation Penalty The interval is narrowest at x* = x , the center of our experience. The interval widens as we move away from the center of our experience to reflect the greater uncertainty. (1) Uncertainty about the prediction of x (2) Uncertainty that the linear relationship will continue to exist as we move farther from the center. Fitted Line Plot Weight = 40.00 + 3.000 TLength Regression 95% PI S R-Sq R-Sq(adj) 22.3607 36.0% 28.0% 100 50 0 12 14 16 18 20 TLength 22 24 26 28 1 s 1+ (x* x)2 (SE(b))2 N Prediction 1.96 Sausage 5.8% 900000 800000 800000 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC 900000 Scatterplot of Listing vs IncomePC Normal - 95% CI 90 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 500000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 600000 300000 100000 Probability Plot of Listing 99 700000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 2 e 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 10 Percent Weight 150 Frequency 200 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 52/97 Normality Necessary for t statistics and confidence intervals Residuals reveal whether disturbances are normal? Standard tests and devices 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 53/97 Normally Distributed Residuals? -------------------------------------Kolmogorov-Smirnov test of F(E ) vs. Normal[ .00000, 13.27610^2] ******* K-S test statistic = .1810063 ******* 95% critical value = .1727202 ******* 99% critical value = .2070102 Normality hyp. should be rejected. -------------------------------------- 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 54/97 Nonnormal Disturbances Plain 32.5% 800000 800000 500000 90 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 500000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 600000 300000 100000 Scatterplot of Listing vs IncomePC Normal - 95% CI 700000 700000 600000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 900000 Percent Scatterplot of Listing vs IncomePC 900000 400000 Mushroom 16.2% t is essentially normal if N > 50. Frequency Sausage 5.8% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepper and Onion 7.3% Use standard normal instead of t Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Appeal to the central limit theorem Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 Statistics and Data Analysis Part 7-4 – Multiple Regression 56/97 Box Office and Movie Buzz 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 700000 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 56 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 57/97 Box Office and Budget 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 700000 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 57 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 58/97 Budget and Buzz Effects 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 700000 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 58 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 59/97 An Enduring Art Mystery Graphics show relative sizes of the two works. The Persistence of Statistics. Rice, 2007 Why do larger paintings command higher prices? 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% The Persistence of Memory. Salvador Dali, 1931 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 60/97 The Data Histogram of ln (SurfaceArea) Histogram of ln (US$) 80 90 80 60 70 50 60 Frequency Frequency 70 40 30 50 40 30 20 20 10 10 0 10.5 12.0 13.5 ln (US$) 15.0 0 16.5 3.2 4.0 4.8 5.6 6.4 ln (SurfaceArea) 7.2 8.0 8.8 Note: Using logs in this context. This is common when analyzing financial measurements (e.g., price) and when percentage changes are more interesting than unit changes. (E.g., what is the % premium when the painting is 10% larger?) 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 61/97 Monet in Large and Small Sale prices of 328 signed Monet paintings Fitted Line Plot ln (US$) = 2.825 + 1.725 ln (SurfaceArea) 18 S R-Sq R-Sq(adj) 17 1.00645 20.0% 19.8% ln (US$) 16 The residuals do not show any obvious patterns that seem inconsistent with the assumptions of the model. 15 14 13 12 11 6.0 6.2 6.4 6.6 6.8 7.0 ln (SurfaceArea) 7.2 7.4 7.6 Log of $price = a + b log surface area + e 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 62/97 Monet Regression: There seems to be a regression. Is there a theory? 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 63/97 How much for the signature? The sample also contains 102 unsigned paintings Average Sale Price Signed $3,364,248 Not signed $1,832,712 Average price of signed Monet’s is almost twice that of unsigned 800000 800000 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 Percent 900000 0 1000000 60 800000 40 Listing Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Frequency Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Listing Pie Chart of Percent vs Type Mushroom and Onion 9.2% Percent 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 64/97 Can we separate the two effects? Average Prices Small Large Unsigned 346,845 5,795,000 Signed 689,422 5,556,490 What do the data suggest? (1) The size effect is huge (2) The signature effect is confined to the small paintings. 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 65/97 A Multiple Regression Scatterplot of ln (US$) vs ln (SurfaceArea) 18 Signed 0 1 17 16 b2 ln (US$) 15 14 13 12 11 10 6.0 6.2 6.4 6.6 6.8 7.0 ln (SurfaceArea) 7.2 7.4 7.6 Ln Price = a + b1 ln Area + b2 (0 if unsigned, 1 if signed) + e 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 66/97 Monet Multiple Regression Regression Analysis: ln (US$) versus ln (SurfaceArea), Signed The regression equation is ln (US$) = 4.12 + 1.35 ln (SurfaceArea) + 1.26 Signed Predictor Coef SE Coef T P Constant 4.1222 0.5585 7.38 0.000 ln (SurfaceArea) 1.3458 0.08151 16.51 0.000 Signed 1.2618 0.1249 10.11 0.000 S = 0.992509 R-Sq = 46.2% R-Sq(adj) = 46.0% Interpretation (to be explored as we develop the topic): (1) Elasticity of price with respect to surface area is 1.3458 – very large (2) The signature multiplies the price by exp(1.2618) (about 3.5), for any given size. 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 67/97 Ceteris Paribus in Theory Demand for gasoline: G = f(price,income) Demand (price) elasticity: eP = %change in G given %change in P holding income constant. How do you do that in the real world? 900000 800000 800000 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Percent Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Frequency Pie Chart of Percent vs Type Mushroom and Onion 9.2% Listing The “percentage changes” How to change price and hold income constant? Percent 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 68/97 The Real World Data 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 69/97 U.S. Gasoline Market, 1953-2004 Time Series Plot of logG, logIncome, logPg 5 Variable logG logIncome logPg Data 4 3 2 1 800000 800000 600000 500000 400000 Mushroom 16.2% Plain 32.5% 90 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 500000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 600000 2001 Scatterplot of Listing vs IncomePC Normal - 95% CI 99 700000 300000 100000 Probability Plot of Listing 1993 17500 20000 22500 25000 IncomePC 27500 30000 32500 Percent 900000 1985 0 1000000 60 800000 40 Listing Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball 1977 Year Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% 1969 Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 1961 Frequency 1953 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 70/97 Shouldn’t Demand Curves Slope Downward? Scatterplot of GasPrice vs G 140 120 GasPrice 100 80 60 40 20 0 800000 800000 600000 500000 400000 Mushroom 16.2% Plain 32.5% 90 400000 100000 15000 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 4 5 200000 2 1 100000 15000 0 200000 400000 600000 Listing 800000 1000000 Mean StDev N 369687 156865 51 80 8 300000 10 Normal 10 500000 400000 20 300000 200000 60 50 40 30 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 100 12 700000 600000 70 0.65 14 800000 80 500000 200000 369687 156865 51 0.994 0.012 0.60 Histogram of Listing 900000 Mean StDev N AD P-Value 95 600000 0.55 Scatterplot of Listing vs IncomePC Normal - 95% CI 99 700000 300000 100000 Probability Plot of Listing 0.50 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 900000 0.45 G Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball 0.40 Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 0.35 Percent 0.30 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 71/97 A Thought Experiment Pepperoni 21.8% Sausage 5.8% 900000 800000 800000 600000 500000 400000 Mushroom 16.2% Plain 32.5% g 3 10000 Probability Plot of Listing 90 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 Histogram of Listing 6 200000 2 1 100000 15000 800000 1000000 Marginal Plot of Listing vs IncomePC Normal Mean StDev N 369687 156865 51 80 5 400000 600000 Listing 27500 Empirical CDF of Listing 100 8 4 200000 25000 10 500000 300000 0 22500 12 700000 400000 10 17500 20000 Income 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 500000 15000 900000 Mean StDev N AD P-Value 95 600000 12500 Scatterplot of Listing vs IncomePC Normal - 95% CI 99 700000 300000 100000 4 Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball 5 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing Meatball Garlic 5.0% 2.3% 6 Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 7 Frequency Scatterplot of g vs Income Listing Percent The main driver of gasoline consumption is income not price Income is growing over time. We are not holding income constant when we change price! How do we do that? Listing 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 72/97 How to Hold Income Constant? Multiple Regression Using Price and Income Regression Analysis: G versus GasPrice, Income The regression equation is G = 0.134 - 0.00163 GasPrice + 0.000026 Income Predictor Constant GasPrice Income Coef 0.13449 -0.0016281 0.00002634 SE Coef 0.02081 0.0004152 0.00000231 T 6.46 -3.92 11.43 P 0.000 0.000 0.000 It looks like the theory works. 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 Statistics and Data Analysis Linear Multiple Regression Model 74/97 Classical Linear Regression Model The model is y = f(x1,x2,…,xK,1,2,…K) + = a multiple regression model. Important examples: Marginal cost in a multiple output setting Separate age and education effects in an earnings equation. Denote (x1,x2,…,xK) as x. Boldface symbol = vector. 800000 800000 500000 400000 Mushroom 16.2% Plain 32.5% 90 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 500000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 600000 300000 100000 Scatterplot of Listing vs IncomePC Normal - 95% CI 700000 700000 600000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 Percent 900000 0 1000000 60 800000 40 Listing Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 Frequency Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepper and Onion 7.3% ‘Dependent’ and ‘independent’ variables. Independent of what? Think in terms of autonomous variation. Can y just ‘change?’ What ‘causes’ the change? Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Form of the model – E[y|x] = a linear function of x. Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 75/97 Model Assumptions: Generalities Pie Chart of Percent vs Type Pepperoni 21.8% Sausage 5.8% 900000 800000 800000 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC 900000 Scatterplot of Listing vs IncomePC Normal - 95% CI 90 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 500000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 600000 300000 100000 Probability Plot of Listing 99 700000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Meatball Garlic 5.0% 2.3% Mushroom and Onion 9.2% 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing Percent Frequency Listing Linearity means linear in the parameters. We’ll return to this issue shortly. Identifiability. It is not possible in the context of the model for two different sets of parameters to produce the same value of E[y|x] for all x vectors. (It is possible for some x.) Conditional expected value of the deviation of an observation from the conditional mean function is zero Form of the variance of the random variable around the conditional mean is specified Nature of the process by which x is observed. Assumptions about the specific probability distribution. Percent 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 76/97 Linearity of the Model E[y|x] = 1*1 + 2*x2 + … + K*xK. (1*1 = the intercept term). Sausage 5.8% 900000 800000 800000 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC 900000 Scatterplot of Listing vs IncomePC Normal - 95% CI 90 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 500000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 600000 300000 100000 Probability Plot of Listing 99 700000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% Frequency Boldface letter indicates a column vector. “x” denotes a variable, a function of a variable, or a function of a set of variables. There are K “variables” on the right hand side of the conditional mean “function.” The first “variable” is usually a constant term. (Wisdom: Models should have a constant term unless the theory says they should not.) Listing f(x1,x2,…,xK,1,2,…K) = x11 + x22 + … + xKK Notation: x11 + x22 + … + xKK = x. Percent 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 77/97 Linearity Pie Chart of Percent vs Type Pepperoni 21.8% Sausage 5.8% 900000 800000 800000 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC 900000 Scatterplot of Listing vs IncomePC Normal - 95% CI 90 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 500000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 600000 300000 100000 Probability Plot of Listing 99 700000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Meatball Garlic 5.0% 2.3% Mushroom and Onion 9.2% 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing Logs and levels in economics Time trends, and time trends in loglinear models – rates of growth Dummy variables Quadratics, power functions, log-quadratic, trig functions, interactions and so on. Percent Frequency Listing Linearity means linear in the parameters, not in the variables E[y|x] = 1 f1(…) + 2 f2(…) + … + K fK(…). fk() may be any function of data. Examples: Percent 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 78/97 Linearity Pie Chart of Percent vs Type Pepperoni 21.8% Sausage 5.8% 900000 800000 800000 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC 900000 Scatterplot of Listing vs IncomePC Normal - 95% CI 90 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 500000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 600000 300000 100000 Probability Plot of Listing 99 700000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Meatball Garlic 5.0% 2.3% Mushroom and Onion 9.2% 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing Percent Frequency Listing Simple linear model, E[y|x] =x’β Quadratic model: E[y|x] = α + β1x + β2x2 Loglinear model, E[lny|x] = α + Σk lnxkβk Semilog, E[y|x] = α + Σk lnxkβk All are “linear.” An infinite number of variations. Percent 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 79/97 Matrix Notation xik xrow,column xobservation ,var iable Observation 1 y1 1 x11 2 x12 ... K x1K 1 Observation 2 y 2 1 x21 2 x22 ... K x2 K 2 Observation 3 y3 1 x31 2 x32 ... K x3 K 3 ... Observation i yi 1 xi1 2 xi 2 ... K xiK i ... Observation N y1 1 xN 1 2 xN 2 ... K xNK N 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 700000 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 79 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 80/97 Notation Define column vectors of N observations on y and the K x variables. y1 x11 y2 x21 y yN xN 1 y = X + x1K 1 1 x2 K 2 2 xNK K N x12 x22 xN 2 The assumption means that the rank of the matrix X is K. No linear dependencies => FULL COLUMN RANK of the matrix X. 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 81/97 Uniqueness of the Conditional Mean The conditional mean relationship must hold for any set of N observations, i = 1,…,N. Assume, that N K (justified later) E[y1|x] = x1 E[y2|x] = x2 … E[yn|x] = xn All N observations at once: E[y|X] = X = E. 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 82/97 Uniqueness of E[y|X] Now, suppose there is a that produces the same expected value, E[y|X] = X = E. Let = - . Then, X = X - X = E - E = 0. Is this possible? X is an NK matrix (N rows, K columns). What does X = 0 mean? We assume this is not possible. This is the ‘full rank’ assumption. Ultimately, it will imply that we can ‘estimate’ . This requires N K . 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 83/97 An Unidentified (But Valid) Theory of Art Appreciation Enhanced Monet Area Effect Model: Height and Width Effects Log(Price) = β1 + β2 log Area + β3 log Aspect Ratio + β4 log Height + β5 Signature + ε (Aspect Ratio = Height/Width) 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 84/97 Conditional Homoscedasticity and Nonautocorrelation Disturbances provide no information about each other. Var[i | X ] = 2 Cov[i, j |X] = 0 800000 800000 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 100000 15000 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 4 5 200000 2 1 100000 15000 0 200000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 300000 10 Mean StDev N 10 500000 400000 20 300000 200000 60 50 40 30 Normal 100 12 700000 600000 70 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 80 600000 200000 369687 156865 51 0.994 0.012 ... 0 ... 0 2 ... 0 I ... ... ... 2 0 0 2 ... 0 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 0 2 0 ... 0 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% ... Cov(1 , N ) 2 ... Cov( 2 , N ) 0 ... Cov(3 , N ) 0 ... ... ... ... Var ( N ) 0 Percent Cov(1 , 2 ) Cov(1 , 3 ) Var (1 ) Cov( , ) Var ( 2 ) Cov( 2 , 3 ) 2 1 Cov(3 , 1 ) Cov(3 , 2 ) Var (3 ) ... ... ... Cov( N , 1 ) Cov( N , 2 ) Cov( N , 3 ) 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 85/97 Heteroscedasticity Countries are ordered by the standard deviation of their 19 residuals. Regression of log of per capita gasoline use on log of per capita income, gasoline price and number of cars per capita for 18 OECD countries for 19 years. The standard deviation varies by country. 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 86/97 Autocorrelation logG=β1 + β2logPg + β3logY + β4logPnc + β5logPuc + ε 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 87/97 Autocorrelation Results from an Incomplete Model 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 88/97 Normal Distribution of ε Used to facilitate finite sample derivations of certain test statistics. Observations are independent Assumption will be unnecessary – we will use the central limit theorem for the statistical results we need. 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 89/97 The Linear Model 800000 800000 500000 400000 Mushroom 16.2% Plain 32.5% 90 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 500000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 600000 300000 100000 Scatterplot of Listing vs IncomePC Normal - 95% CI 700000 700000 600000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 Percent 900000 0 1000000 60 800000 40 Listing Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 Frequency Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepper and Onion 7.3% Regression: E[y|X] = X Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% y = X+ε, N observations, K columns in X, (usually including a column of ones for the intercept). Standard assumptions about X Standard assumptions about ε|X E[ε|X]=0, E[ε]=0 and Cov[ε,x]=0 Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 Statistics and Data Analysis Least Squares 91/97 Vocabulary Pie Chart of Percent vs Type Pepperoni 21.8% Sausage 5.8% 900000 800000 800000 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC 900000 Scatterplot of Listing vs IncomePC Normal - 95% CI 90 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 500000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 600000 300000 100000 Probability Plot of Listing 99 700000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Meatball Garlic 5.0% 2.3% Mushroom and Onion 9.2% 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing Percent Frequency Listing Some terms to be used in the discussion. Population characteristics and entities vs. sample quantities and analogs Residuals and disturbances Population regression line and sample regression Objective: Learn about the conditional mean function. Estimate and 2 First step: Mechanics of fitting a line to a set of data Percent 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 92/97 Least Squares 2 e i1 i ee = (y - Xb)'(y - Xb) n A digression on multivariate calculus. Matrix and vector derivatives. Derivative of a scalar with respect to a vector Derivative of a column vector wrt a row vector Other derivatives 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 93/97 Matrix Results ... in1 xi1 xiK ... in1 xi 2 xiK ... ... 2 n ... i 1 xiK in1 xi21 in1 xi1 xi 2 n 2 n x x x i 1 i 2 X'X = i 1 i 2 i1 ... ... n n i 1 xiK xi1 i 1 xiK xi 2 in1 xi1 yi n y x X'y = i i 2 i ... n i xiK yi 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 94/97 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 95/97 Moment Matrices XX = 1 Pg Y 1 Pg Y 12 ...12 .925(1) ... 3.789(1) 6036(1) ... 11934(1) 2 2 .925(1) ... 3.789(1) .925 ... 3.789 6036(.925) ... 11934(3.789) 6036(1) ... 11934(1) 6036(.925) ... 11934(3.789) 60362 ... 119342 Xy = 1 Pg Y G 1(129.7) ... 1(297.8) .925(129.7) ... 3.789(297.8) 6036(129.7) ... 11934(297.8) 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 96/97 Least Squares Normal Equations (y - Xb)'(y - Xb) b (1 1) / (k 1) 2 X'(y - Xb) = 0 ( -2)(N K)'(N 1) = ( -2)(K N)(N 1) = K 1 Note: Derivative of 11 wrt K 1 is a K 1 vector. Solution - Least squares normal equations: X'y = X'Xb Assuming it exists: b = (X'X)-1X'y 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 97/97 Second Order Conditions (y - Xb)'(y - Xb) b 2 X'(y - Xb) (y - Xb)'(y - Xb) column vector b = = b row vector [2X'(y - Xb)] = b = 2X'X 800000 800000 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 Percent 900000 0 1000000 60 800000 40 Listing Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% Frequency 2 (y - Xb)'(y - Xb) bb 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 98/97 Does b Minimize e’e? in1 xi21 in1 xi1 xi 2 n n 2 2 x x x e'e i 1 i 2 2 X'X = 2 i 1 i 2 i1 ... bb' ... n n i 1 xiK xi1 i 1 xiK xi 2 ... in1 xi1 xiK ... in1 xi 2 xiK ... ... n 2 ... i 1 xiK If there were a single b, we would require this to be positive, which it would be; 2x'x = 2 i 1 xi2 0. n The matrix counterpart of a positive number is a positive definite matrix. 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000 99/97 Positive Definite Matrix Matrix C is positive definite if a'Ca is > 0 for any a. Generally hard to check. Requires a look at characteristic roots. For some matrices, it is easy to verify. X'X is one of these. 2 v k=1 k 0 K a'(X'X)a = (a'X')( Xa) = ( Xa)'( Xa) = v'v = Could v = 0? v = 0 means Xa = 0. Is this possible? Conclusion: b = ( X'X)-1 X'y does indeed minimize e'e. 600000 500000 400000 Mushroom 16.2% Plain 32.5% Scatterplot of Listing vs IncomePC Normal - 95% CI 90 500000 400000 200000 100000 15000 60 50 40 30 17500 20000 22500 25000 IncomePC 27500 30000 32500 6 5 200000 2 1 100000 15000 400000 600000 Listing 800000 1000000 369687 156865 51 80 8 4 200000 Mean StDev N 10 500000 300000 0 Normal 100 12 700000 400000 10 Marginal Plot of Listing vs IncomePC Empirical CDF of Listing 14 800000 600000 70 20 300000 200000 369687 156865 51 0.994 0.012 80 600000 Histogram of Listing 900000 Mean StDev N AD P-Value 95 700000 300000 100000 Probability Plot of Listing 99 17500 20000 22500 25000 IncomePC 27500 30000 32500 0 1000000 60 800000 40 Listing 800000 800000 Percent 900000 Frequency Sausage 5.8% Scatterplot of Listing vs IncomePC 900000 700000 Listing Pepper and Onion 7.3% Boxplot of Listing Category Pepperoni Plain Mushroom Sausage Pepper and Onion Mushroom and Onion Garlic Meatball Listing Pepperoni 21.8% Listing Meatball Garlic 5.0% 2.3% Percent Pie Chart of Percent vs Type Mushroom and Onion 9.2% 20 600000 400000 0 0 200000 300000 400000 500000 600000 Listing 700000 800000 900000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70 80 90 Listing 200000 15000 20000 25000 IncomePC 30000