Regression Models
Professor William Greene
Stern School of Business
IOMS Department
Department of Economics

Part 6: Multiple Regression

Regression and Forecasting Models
Part 6 – Multiple Regression

Multiple Regression Agenda
The concept of multiple regression
Computing the regression equation
Multiple regression “model”
Using the multiple regression model
Building the multiple regression model
Regression diagnostics and inference

Concept of Multiple Regression
Different conditional means
Application: Monet’s signature
Holding things constant
Application: Price and income effects
Application: Age and education
Sales promotion: Price and competitors
The general idea of multiple regression

Monet in Large and Small
Logs of sale prices of 328 signed Monet paintings
[Fitted line plot: ln(US$) vs. ln(Surface Area). ln(US$) = 2.825 + 1.725 ln(Surface Area); S = 1.00645, R-Sq = 20.0%, R-Sq(adj) = 19.8%]
The residuals do not show any obvious patterns that seem inconsistent with the assumptions of the model.
ln(Price in $) = a + b ln(Surface Area) + e

How much for the signature?
The sample also contains 102 unsigned paintings.
Average Sale Price
Signed       $3,364,248
Not signed   $1,832,712
The average price of a signed Monet is almost twice that of an unsigned one.

Can we separate the two effects?
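Because both variables in the fitted line are in logs, the slope 1.725 is an elasticity: a 1% larger painting predicts roughly a 1.725% higher price, and multiplying the surface area by k multiplies the predicted price by k^1.725 (the intercept cancels). A minimal Python sketch using only the slope printed on the plot; the function name is illustrative and not from the slides:

```python
# Slope from the fitted line plot: ln(US$) = 2.825 + 1.725 ln(Surface Area)
SLOPE = 1.725  # elasticity of price with respect to surface area


def price_multiplier(area_ratio: float) -> float:
    """Factor by which the predicted price changes when surface area is multiplied by area_ratio.

    In a log-log model, ln(P1) - ln(P0) = SLOPE * (ln(A1) - ln(A0)),
    so P1 / P0 = (A1 / A0) ** SLOPE and the intercept drops out.
    """
    return area_ratio ** SLOPE


# Doubling the surface area
print(round(price_multiplier(2.0), 3))  # about 3.306
# A 1% larger painting: percent change in predicted price
print(round(100 * (price_multiplier(1.01) - 1), 3))
```

In this size-only model, doubling the surface area multiplies the predicted price by about 3.3. (The Monet multiple regression, which also controls for the signature, estimates a smaller elasticity, 1.3458.)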
Average Prices
              Small       Large
Unsigned    346,845   5,795,000
Signed      689,422   5,556,490
What do the data suggest? (1) The size effect is huge. (2) The signature effect is confined to the small paintings.

Thought Experiments: Ceteris Paribus
Consider Monets of the same size, some signed and some not, and compare their prices. This is the signature effect.
Consider signed Monets and compare large ones to small ones; likewise for unsigned Monets. This is the size effect.

A Multiple Regression
[Scatterplot of ln(US$) vs. ln(Surface Area), by Signed (0, 1), with the shift b2 marked]
ln Price = b0 + b1 ln Area + b2 (0 if unsigned, 1 if signed) + e

Monet Multiple Regression
Regression Analysis: ln (US$) versus ln (SurfaceArea), Signed
The regression equation is
ln (US$) = 4.12 + 1.35 ln (SurfaceArea) + 1.26 Signed

Predictor          Coef  SE Coef      T      P
Constant         4.1222   0.5585   7.38  0.000
ln (SurfaceArea) 1.3458  0.08151  16.51  0.000
Signed           1.2618   0.1249  10.11  0.000

S = 0.992509   R-Sq = 46.2%   R-Sq(adj) = 46.0%

Interpretation (to be explored as we develop the topic):
(1) The elasticity of price with respect to surface area is 1.3458, which is very large.
(2) The signature multiplies the price by exp(1.2618) (about 3.5), for any given size.

Ceteris Paribus in Theory
Demand for gasoline: G = f(price, income)
Demand (price) elasticity: eP = %change in G / %change in P, holding income constant.
How do you do that in the real world?
The “percentage changes”
How to change price and hold income constant?

The Real World Data
U.S. Gasoline Market, 1953-2004
[Time series plot of logG, logIncome, logPg by year]

Shouldn’t Demand Curves Slope Downward?
[Scatterplot of GasPrice vs. G]

A Thought Experiment
The main driver of gasoline consumption is income, not price.
Income is growing over time.
We are not holding income constant when we change price!
How do we do that?
[Scatterplot of g vs. Income]

How to Hold Income Constant?
Multiple Regression Using Price and Income
Regression Analysis: G versus GasPrice, Income
The regression equation is
G = 0.134 - 0.00163 GasPrice + 0.000026 Income

Predictor        Coef     SE Coef      T      P
Constant      0.13449     0.02081   6.46  0.000
GasPrice   -0.0016281   0.0004152  -3.92  0.000
Income     0.00002634  0.00000231  11.43  0.000

It looks like the theory works: once income is held constant, the coefficient on GasPrice is negative.

Application: WHO
WHO data on 191 countries in 1995-1999.
Analysis of Disability Adjusted Life Expectancy (DALE)
EDUC = average years of education
PCHexp = per capita health expenditure
DALE = α + β1 EDUC + β2 PCHexp + ε

The (Famous) WHO Data

Specify the Variables in the Model

Graphs

Regression Results

Practical Model Building
Understanding the regression: the left-out variable problem
Using different kinds of variables:
Dummy variables
Logs
Time trend
Quadratic

A Fundamental Result
What happens when you leave a crucial variable out of your model?

Regression Analysis: g versus GasPrice (no income)
The regression equation is
g = 3.50 + 0.0280 GasPrice

Predictor      Coef   SE Coef      T      P
Constant     3.4963    0.1678  20.84  0.000
GasPrice   0.028034  0.002809   9.98  0.000

Regression Analysis: G versus GasPrice, Income
The regression equation is
G = 0.134 - 0.00163 GasPrice + 0.000026 Income

Predictor        Coef     SE Coef      T      P
Constant      0.13449     0.02081   6.46  0.000
GasPrice   -0.0016281   0.0004152  -3.92  0.000
Income     0.00002634  0.00000231  11.43  0.000

Without income, the estimated price effect is positive; with income held constant, it is negative.

An Elaborate Multiple Loglinear Regression Model

A Conspiracy Theory for Art Sales at Auction
From 1995 to about 2000, Sotheby’s and Christie’s conspired on commission rates.

If the Theory is Correct…
[Scatterplot of ln(US$) vs. ln(Surface Area): paintings sold from 1995 to 2000 vs. sold before 1995 or after 2000]

Evidence
The statistical evidence seems to be consistent with the theory.
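The “fundamental result” on leaving out a crucial variable is omitted variable bias: with income omitted, the price coefficient also picks up income’s effect, because price and income move together over time. The sketch below reproduces the sign flip on simulated data; every number in it is invented for illustration and is not the U.S. gasoline data. It also checks the standard in-sample identity: short-regression slope = long-regression price slope + (income coefficient) × (slope from regressing income on price).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated (hypothetical) data: price moves together with income
income = rng.normal(20.0, 4.0, n)
price = 0.5 * income + rng.normal(0.0, 1.0, n)

# True demand: consumption falls with price, holding income constant
b_price_true, b_income_true = -0.2, 0.3
g = 1.0 + b_price_true * price + b_income_true * income + rng.normal(0.0, 0.5, n)


def ols(y, *regressors):
    """Least squares coefficients [constant, b1, b2, ...]."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


short = ols(g, price)         # income left out
full = ols(g, price, income)  # income held constant
d = ols(income, price)[1]     # slope of the omitted variable on the included one

print("price coefficient, income omitted :", short[1])
print("price coefficient, income included:", full[1])
print("full[1] + full[2] * d             :", full[1] + full[2] * d)
```

In the simulation the bias term full[2] * d is positive and large enough to flip the sign of the price coefficient, mirroring the two gasoline regressions.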
A Production Function Multiple Regression Model
Sales of (Cameras/Videos/Warranties) = f(Floor Space, Staff)

Production Function for Videos
How should I interpret the negative coefficient on logFloor?

An Application to Credit Modeling

Age and Education Effects on Income

A Multiple Regression
+----------------------------------------------------+
| LHS=HHNINC Mean                = .3520836          |
|            Standard deviation  = .1769083          |
| Model size Parameters          = 3                 |
|            Degrees of freedom  = 27323             |
| Residuals  Sum of squares      = 794.9667          |
|            Standard error of e = .1705730          |
| Fit        R-squared           = .07040754         |
+----------------------------------------------------+
+--------+--------------+-----------+
|Variable| Coefficient  | Mean of X |
+--------+--------------+-----------+
|Constant|  -.39266196  |           |
|AGE     |   .02458140  | 43.5256898|
|EDUC    |   .01994416  | 11.3206310|
+--------+--------------+-----------+

Education and Age Effects on Income
Effect on log Income of 8 more years of education
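In a linear multiple regression, the predicted effect of changing one regressor while holding the others fixed is that regressor’s coefficient times the change; the constant drops out. A sketch using the AGE and EDUC coefficients printed in the HHNINC table above, so the results are in HHNINC units; the log-income effect named in the slide title would need the coefficient from that specification, which is not reproduced here. The function name is hypothetical:

```python
# Coefficients from the HHNINC regression table (AGE and EDUC)
B_AGE = 0.02458140
B_EDUC = 0.01994416


def predicted_change(d_age: float = 0.0, d_educ: float = 0.0) -> float:
    """Change in predicted HHNINC when AGE changes by d_age and EDUC by d_educ.

    Each regressor's contribution is its coefficient times its change;
    the constant cancels, and whatever is not changed is held fixed.
    """
    return B_AGE * d_age + B_EDUC * d_educ


print(predicted_change(d_educ=8))  # 8 more years of education, same age
print(predicted_change(d_age=10))  # 10 years older, same education
```

With these coefficients, 8 more years of education at a given age raises predicted HHNINC by about 0.16.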