Regression Models
Professor William Greene
Stern School of Business
IOMS Department
Department of Economics

Part 1: Simple Linear Model

Regression and Forecasting Models
Part 1 – Simple Linear Model

Theory
Demand theory: Q = f(Price)
"The Law of Demand": demand curves slope downward.
What does "ceteris paribus" mean here?

Data on the U.S. Gasoline Market
Quantity = G = Expenditure / Price

Shouldn't Demand Curves Slope Downward?
[Figure: Scatterplot of GasPrice vs G]

Data on 62 Movies in 2010

Average Box Office Revenue is about $20.7 Million

Is There a Theory for This?
Scatter plot of box office revenues vs. number of "Can't Wait To See It" votes on Fandango for 62 movies.

Average Box Office by Internet Buzz Index
Each point is the average box office for the movies whose buzz index falls in that interval.

Deterministic Relationship: Not a Theory
Expected High Temperatures, August 11-20, 2013, ZIP 10012, NY

Probabilistic Relationship
What Explains the Noise?
Fuel Bill = Function of Rooms + Random Variation

Movie Buzz Data
Probabilistic Relationship?

The Regression Model
\( y = \beta_0 + \beta_1 x + \varepsilon \)
- y = dependent variable
- x = independent variable
- The 'regression' is the deterministic part, \( \beta_0 + \beta_1 x \).
- The 'disturbance' (noise) is \( \varepsilon \).
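A minimal Python sketch of the model's two parts, assuming hypothetical values β0 = 1, β1 = 2 and a normal disturbance with standard deviation 0.5 (the slides do not specify a distribution for ε). The regression part is fixed once x is fixed, while every draw of y adds fresh noise, so the average of many draws at the same x settles near β0 + β1x.

```python
import random
import statistics

# Hypothetical parameter values, chosen only for illustration.
BETA0 = 1.0   # intercept
BETA1 = 2.0   # slope
SIGMA = 0.5   # standard deviation of the disturbance (assumed normal)

def regression_part(x: float) -> float:
    """Deterministic part of the model: beta0 + beta1 * x."""
    return BETA0 + BETA1 * x

def draw_y(x: float, rng: random.Random) -> float:
    """One observation y = beta0 + beta1 * x + epsilon."""
    epsilon = rng.gauss(0.0, SIGMA)  # the disturbance (noise)
    return regression_part(x) + epsilon

rng = random.Random(2013)
x0 = 3.0
draws = [draw_y(x0, rng) for _ in range(100_000)]

print(regression_part(x0))       # 7.0
print(statistics.mean(draws))    # close to 7.0, but not exactly 7.0
print(statistics.stdev(draws))   # close to SIGMA = 0.5
```

Individual draws scatter around 7.0; only their average is pinned down by x, which is the sense in which the relationship is probabilistic rather than deterministic.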
The regression model is \( E[y \mid x] = \beta_0 + \beta_1 x \).

Linear Regression Model
[Figure: a straight line in the (x, y) plane with y intercept \( \beta_0 \) and slope \( \beta_1 \)]

The Model
Constructed to provide a framework for interpreting the observed data.
- What is the meaning of the observed relationship (assuming there is one)?
- How is it used?
  - Prediction: what reason is there to assume that we can use sample observations to predict outcomes?
  - Testing relationships

The slope is the interesting quantity. Each additional year of education is associated with an increase of 3.611 in disability-adjusted life expectancy (DALE).

A Cost Model
Data: Electricity.mpj
- Total cost in $Million
- Output in million KWH
- N = 123 American electric utilities
Model: \( \text{Cost} = \beta_0 + \beta_1\,\text{KWH} + \varepsilon \)

Cost Relationship
[Figure: Scatterplot of Cost vs Output]

Sample Regression

Interpreting the Model
Cost = 2.44 + 0.00529 Output + e
Cost is in $Million; Output is in million KWH.
- Fixed cost = cost when output = 0 = $2.44 million.
- Marginal cost = change in cost / change in output = 0.00529 $Million per million KWH = 0.00529 $/KWH = 0.529 cents/KWH.

Covariation and Causality
[Figure: Fitted line plot of DALE vs EDUC, DALE = 35.16 + 3.611 EDUC; S = 7.87034, R-Sq = 59.2%, R-Sq(adj) = 59.0%]
Does more education make you live longer (on average)?

Causality?
Estimated Income = -451 + 50.2 Height
Height (inches) and income ($/mo.) in first post-MBA job (men). WSJ, 12/30/86.

Ht.  Inc.    Ht.  Inc.    Ht.  Inc.
70   2990    68   2910    75   3150
67   2870    66   2840    68   2860
69   2950    71   3180    69   2930
70   3140    68   3020    76   3210
65   2790    73   3220    71   3180
73   3230    73   3370    66   2670
64   2880    70   3180    69   3050
70   3140    71   3340    65   2750
69   3000    69   2970    67   2960
73   3170    73   3240    70   3050

How to compute the y intercept, b0, and the slope, b1, in y = b0 + b1x.

Least Squares Regression

Fitting a Line to a Set of Points
Gauss's method of least squares.
[Figure: Scatterplot of PerCapitaG vs Income, showing the points \( (x_i, y_i) \), the predictions \( b_0 + b_1 x_i \), and the residuals]
Residuals: \( e_i = y_i - (b_0 + b_1 x_i) = y_i - \hat{y}_i \)
Choose b0 and b1 to minimize the sum of squared residuals:
\( SS = \sum_{i=1}^{N} [y_i - b_0 - b_1 x_i]^2 = \sum_{i=1}^{N} [y_i - (b_0 + b_1 x_i)]^2 = \sum_{i=1}^{N} e_i^2 \)

Computing the Least Squares Parameters b0 and b1
Four numbers are needed:
- \( \bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i = 20.721 \)
- \( \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i = 0.48242 \)
- \( \text{Var}(x) = s_x^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2 = 0.02453 \)
- \( \text{Cov}(x,y) = s_{xy} = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y}) = 1.784 \)
Then
\( b_1 = s_{xy} / s_x^2 = 72.7181 \) (dividing the rounded values shown, 1.784 / 0.02453, gives about 72.73)
\( b_0 = \bar{y} - b_1\bar{x} = 20.721 - (72.7181)(0.48242) = -14.36 \)

[Figure: fitted line with b1 = 72.718, b0 = -14.36]

Least Squares Uses Calculus
\( SS = \frac{1}{N-1}\sum_{i=1}^{N}(y_i - b_0 - b_1 x_i)^2 \)
\( \frac{\partial SS}{\partial b_0} = \frac{1}{N-1}\sum_{i=1}^{N} 2(y_i - b_0 - b_1 x_i)(-1) = 0 \)
\( \frac{\partial SS}{\partial b_1} = \frac{1}{N-1}\sum_{i=1}^{N} 2(y_i - b_0 - b_1 x_i)(-x_i) = 0 \)
The factor 1/(N-1) does not change where the minimum is.
The solution is \( b_0 = \bar{y} - b_1\bar{x} \), where
\( b_1 = \dfrac{\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2} \)

Least squares minimizes the sum of squared deviations from the line.
- b0 = -14.36, b1 = 72.718: Sum of Squares = 10751.5
- b0 = -20.00, b1 = 73.500: Sum of Squares = 12469.7

Summary
- Theory vs. practice
- Linear relationship: deterministic
- Regression relationship: random, stochastic, 'probabilistic'; the mean is a function of x
- Causality vs. correlation
- Least squares
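As a check on the four-number recipe for b0 and b1 (two means, a variance, a covariance) and on the claim that least squares minimizes the sum of squared residuals, here is a small Python sketch applied to the 30 height and income pairs from the WSJ table on the "Causality?" slide. It is an illustration only: the helper names least_squares and sum_of_squares are hypothetical, and the data lists were transcribed by hand from that table, reading each row left to right.

```python
from statistics import mean

# Height (inches) and monthly income ($) in first post-MBA job (men), WSJ, 12/30/86.
HEIGHT = [70, 68, 75, 67, 66, 68, 69, 71, 69, 70, 68, 76, 65, 73, 71,
          73, 73, 66, 64, 70, 69, 70, 71, 65, 69, 69, 67, 73, 73, 70]
INCOME = [2990, 2910, 3150, 2870, 2840, 2860, 2950, 3180, 2930, 3140,
          3020, 3210, 2790, 3220, 3180, 3230, 3370, 2670, 2880, 3180,
          3050, 3140, 3340, 2750, 3000, 2970, 2960, 3170, 3240, 3050]

def least_squares(x, y):
    """Return (b0, b1) from x-bar, y-bar, Var(x) and Cov(x, y)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    b1 = cov_xy / var_x          # slope: s_xy / s_x^2
    b0 = y_bar - b1 * x_bar      # intercept: the line passes through (x-bar, y-bar)
    return b0, b1

def sum_of_squares(x, y, b0, b1):
    """Sum of squared residuals e_i = y_i - (b0 + b1 * x_i)."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

b0, b1 = least_squares(HEIGHT, INCOME)
print(f"Estimated Income = {b0:.0f} + {b1:.1f} Height")  # Estimated Income = -451 + 50.2 Height

# Moving the slope away from the least squares value raises the sum of squares.
print(sum_of_squares(HEIGHT, INCOME, b0, b1) < sum_of_squares(HEIGHT, INCOME, b0, b1 + 1.0))  # True
```

The 1/(N-1) factors cancel in the ratio for b1, so dividing both sums by N instead would give the same line.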