Notes 1: Linear Regression.

Regression Models
Professor William Greene
Stern School of Business
IOMS Department
Department of Economics
Part 1: Simple Linear Model
Regression and Forecasting Models
Part 1 – Simple Linear Model
Theory
Demand Theory: Q = f(Price)
• "The Law of Demand": demand curves slope downward.
• What does "ceteris paribus" mean here?

Data on the U.S. Gasoline Market
Quantity = G = Expenditure / Price
Shouldn’t Demand Curves Slope Downward?
[Figure: Scatterplot of GasPrice (vertical axis) vs. G (horizontal axis)]
Data on 62 Movies in 2010
Average Box Office Revenue is about $20.7 Million
Is There a Theory for This?
Scatter plot of box office revenues vs. number of “Can’t
Wait To See It” votes on Fandango for 62 movies.
Average Box Office by Internet Buzz Index
[Figure legend: each marker = average box office for buzz in interval]
Deterministic Relationship: Not a Theory
Expected High Temperatures, August 11-20, 2013, ZIP 10012, NY
Probabilistic Relationship
What Explains the Noise?
Fuel Bill = Function of Rooms + Random Variation
Movie Buzz Data
Probabilistic Relationship?
The Regression Model
\( y = \beta_0 + \beta_1 x + \varepsilon \)
y = dependent variable
x = independent variable
The 'regression' is the deterministic part, \( \beta_0 + \beta_1 x \).
The 'disturbance' (noise) is \( \varepsilon \).
The regression model is \( E[y|x] = \beta_0 + \beta_1 x \).
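The split between the deterministic part and the disturbance can be illustrated with a small simulation. This is a minimal sketch, not part of the original notes: the parameter values `BETA0`, `BETA1`, and `SIGMA` are hypothetical, and the disturbance is assumed to be normal with mean zero at every x. That assumption is what makes the average of y at a given x equal the regression value.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical parameter values chosen only for illustration.
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.0

def draw_y(x):
    """One draw of y = beta0 + beta1*x + epsilon, with epsilon ~ Normal(0, SIGMA)."""
    return BETA0 + BETA1 * x + random.gauss(0.0, SIGMA)

x0 = 10.0
draws = [draw_y(x0) for _ in range(20000)]

sample_mean = sum(draws) / len(draws)      # average of the noisy outcomes at x0
regression_value = BETA0 + BETA1 * x0      # deterministic part, E[y|x0]

print(f"average of y at x = {x0}: {sample_mean:.3f}")
print(f"beta0 + beta1 * x:        {regression_value:.3f}")
```

Individual draws scatter around the line, but their average at a fixed x should settle close to the deterministic part as the number of draws grows.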
Linear Regression Model
[Figure: the line \( y = \beta_0 + \beta_1 x \) plotted against x, with \( \beta_1 \) = slope and \( \beta_0 \) = y-intercept]
The Model

• Constructed to provide a framework for interpreting the observed data
  - What is the meaning of the observed relationship (assuming there is one)?
• How it's used
  - Prediction: What reason is there to assume that we can use sample observations to predict outcomes?
  - Testing relationships
The slope is the interesting quantity.
Each additional year of education is associated with an
increase of 3.611 in disability adjusted life expectancy.
A Cost Model
Electricity.mpj
Total cost in $Million
Output in Million KWH
N = 123 American electric utilities
Model: \( \text{Cost} = \beta_0 + \beta_1\,\text{KWH} + \varepsilon \)
Cost Relationship
[Figure: Scatterplot of Cost (vertical axis) vs. Output (horizontal axis)]
Sample Regression
Interpreting the Model
Cost = 2.44 + 0.00529 Output + e
• Cost is in $ million; Output is in million KWH.
• Fixed cost = cost when output = 0
  Fixed cost = $2.44 million
• Marginal cost
  = change in cost / change in output
  = 0.00529 × ($ million / million KWH)
  = 0.00529 $/KWH = 0.529 cents/KWH
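The unit arithmetic above can be checked in a few lines. This is a minimal sketch using the two coefficients reported on the slide. The function name `predicted_cost` and the output level of 10,000 million KWH are illustrative choices, not part of the original notes.

```python
B0 = 2.44       # intercept from the fitted line, in $ million
B1 = 0.00529    # slope from the fitted line, in $ million per million KWH

def predicted_cost(output_mkwh):
    """Fitted cost in $ million for an output given in million KWH."""
    return B0 + B1 * output_mkwh

fixed_cost = predicted_cost(0.0)        # cost when output = 0, in $ million

# ($ million) / (million KWH) cancels to $ per KWH, so the slope is already in $/KWH.
marginal_cost_dollars_per_kwh = B1
marginal_cost_cents_per_kwh = 100 * B1

# 10,000 million KWH is an arbitrary output level chosen for illustration.
example_cost = predicted_cost(10000)

print(f"fixed cost:    ${fixed_cost} million")
print(f"marginal cost: {marginal_cost_cents_per_kwh:.3f} cents/KWH")
print(f"fitted cost at 10,000 million KWH: ${example_cost:.2f} million")
```

Because both cost and output are measured in millions, the factor of one million cancels in the slope, which is why 0.00529 can be read directly as dollars per KWH.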

Covariation and Causality
Fitted Line Plot: DALE = 35.16 + 3.611 EDUC
S = 7.87034, R-Sq = 59.2%, R-Sq(adj) = 59.0%
[Figure: fitted line plot of DALE (vertical axis) vs. EDUC (horizontal axis)]
Does more education make you live longer (on average)?
Causality?
Estimated Income = -451 + 50.2 Height
Height (inches) and Income ($/mo.) in first post-MBA job (men). WSJ, 12/30/86.

Ht.  Inc.    Ht.  Inc.    Ht.  Inc.
70   2990    68   2910    75   3150
67   2870    66   2840    68   2860
69   2950    71   3180    69   2930
70   3140    68   3020    76   3210
65   2790    73   3220    71   3180
73   3230    73   3370    66   2670
64   2880    70   3180    69   3050
70   3140    71   3340    65   2750
69   3000    69   2970    67   2960
73   3170    73   3240    70   3050
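The fitted line can be recomputed from the 30 observations in the table. This is a minimal sketch in plain Python using the usual least squares formulas (slope = covariation of x and y divided by variation of x). The variable names are illustrative, and the data are transcribed from the table, so any transcription slip would change the result.

```python
# (height in inches, income in $/month), transcribed row by row from the table.
data = [
    (70, 2990), (68, 2910), (75, 3150),
    (67, 2870), (66, 2840), (68, 2860),
    (69, 2950), (71, 3180), (69, 2930),
    (70, 3140), (68, 3020), (76, 3210),
    (65, 2790), (73, 3220), (71, 3180),
    (73, 3230), (73, 3370), (66, 2670),
    (64, 2880), (70, 3180), (69, 3050),
    (70, 3140), (71, 3340), (65, 2750),
    (69, 3000), (69, 2970), (67, 2960),
    (73, 3170), (73, 3240), (70, 3050),
]

n = len(data)
mean_h = sum(h for h, _ in data) / n
mean_inc = sum(inc for _, inc in data) / n

# Sums of cross-products and squares of deviations from the means.
sxy = sum((h - mean_h) * (inc - mean_inc) for h, inc in data)
sxx = sum((h - mean_h) ** 2 for h, _ in data)

b1 = sxy / sxx                  # slope: $/month per additional inch
b0 = mean_inc - b1 * mean_h     # intercept

print(f"Estimated Income = {b0:.0f} + {b1:.1f} Height")
```

The computation reproduces an association in the sample. It says nothing about whether height causes income, which is the question the slide raises.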
How to compute the y-intercept, b0, and the slope, b1, in y = b0 + b1x.
[Figure annotations: b1, b0]
Least Squares Regression
Fitting a Line to a Set of Points
Gauss's method of least squares.
Residuals: \( e_i = y_i - (b_0 + b_1 x_i) = y_i - \hat{y}_i \)
Predictions: \( \hat{y}_i = b_0 + b_1 x_i \)
Choose b0 and b1 to minimize the sum of squared residuals:
\[ SS = \sum_{i=1}^{N} [y_i - b_0 - b_1 x_i]^2 = \sum_{i=1}^{N} [y_i - (b_0 + b_1 x_i)]^2 = \sum_{i=1}^{N} e_i^2 \]
[Figure: Scatterplot of PerCapitaG (vertical axis) vs. Income (horizontal axis), with one observation (Xi, Yi), its prediction, and its residual marked]
Computing the Least Squares Parameters b0 and b1
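4 numbers are needed:
\[ \bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i = 20.721, \qquad \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i = 0.48242 \]
\[ \mathrm{Var}(x) = s_x^2 = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2 = 0.02453 \]
\[ \mathrm{Cov}(x,y) = s_{xy} = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}) = 1.784 \]
\[ b_1 = \frac{s_{xy}}{s_x^2} = \frac{1.784}{0.02453} = 72.7181 \]
\[ b_0 = \bar{y} - b_1 \bar{x} = 20.721 - (72.7181)(0.48242) = -14.36 \]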
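The two-step calculation above can be repeated directly from the four summary numbers. This is a minimal sketch with illustrative variable names. It uses the rounded values printed on the slide, so the slope comes out near 72.73 instead of the slide's 72.7181. The likely reason, stated here as an assumption, is that the slide's figure was computed from unrounded inputs.

```python
# The four summary statistics, as rounded on the slide.
y_bar = 20.721     # sample mean of y
x_bar = 0.48242    # sample mean of x
var_x = 0.02453    # sample variance of x, s_x^2
cov_xy = 1.784     # sample covariance of x and y, s_xy

b1 = cov_xy / var_x          # slope
b0 = y_bar - b1 * x_bar      # intercept

print(f"b1 = {b1:.3f}")
print(f"b0 = {b0:.2f}")
```

The intercept still rounds to -14.36, so the small difference in the slope does not change the fitted line in any way that matters for interpretation.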
[Figure annotations: b1 = 72.718, b0 = -14.36]
Least Squares Uses Calculus
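\[ SS = \frac{1}{N-1}\sum_{i=1}^{N} (y_i - b_0 - b_1 x_i)^2 \]
(The factor 1/(N-1) only rescales the sum of squared residuals; it does not change the values of b0 and b1 that minimize it.)
\[ \frac{\partial SS}{\partial b_0} = \frac{1}{N-1}\sum_{i=1}^{N} \frac{\partial (y_i - b_0 - b_1 x_i)^2}{\partial b_0} = \frac{1}{N-1}\sum_{i=1}^{N} 2(y_i - b_0 - b_1 x_i)(-1) = 0 \]
\[ \frac{\partial SS}{\partial b_1} = \frac{1}{N-1}\sum_{i=1}^{N} \frac{\partial (y_i - b_0 - b_1 x_i)^2}{\partial b_1} = \frac{1}{N-1}\sum_{i=1}^{N} 2(y_i - b_0 - b_1 x_i)(-x_i) = 0 \]
The solution is
\[ b_0 = \bar{y} - b_1 \bar{x}, \quad \text{where} \quad b_1 = \frac{\frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2} \]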
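The two first-order conditions can be checked numerically. This is a minimal sketch, assuming a small hypothetical data set (the `x` and `y` values below are invented for illustration). It computes b0 and b1 from the solution formulas and then evaluates both derivatives at that point.

```python
# Hypothetical data, chosen only to illustrate the check.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance and variance, each with the 1/(N-1) factor.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
s_xx = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Derivatives of SS = (1/(N-1)) * sum (y_i - b0 - b1*x_i)^2, evaluated at (b0, b1).
d_ss_d_b0 = sum(2 * (yi - b0 - b1 * xi) * (-1) for xi, yi in zip(x, y)) / (n - 1)
d_ss_d_b1 = sum(2 * (yi - b0 - b1 * xi) * (-xi) for xi, yi in zip(x, y)) / (n - 1)

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"dSS/db0 = {d_ss_d_b0:.2e}, dSS/db1 = {d_ss_d_b1:.2e}")
```

Both derivatives should be zero up to floating-point rounding. The first condition says the residuals sum to zero, and the second says they are uncorrelated with x in the sample.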
Least squares minimizes the sum of squared deviations from the line.
b0 =-14.36, b1=72.718, Sum of Squares = 10751.5
b0 =-20.00, b1=73.500, Sum of Squares = 12469.7
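The same comparison can be run on any data set: compute the sum of squares at the least squares line and at other candidate lines. This is a minimal sketch, assuming hypothetical data (the movie data are not reproduced here). The helper names `sum_of_squares` and `least_squares` and the perturbation sizes are illustrative.

```python
def sum_of_squares(b0, b1, x, y):
    """Sum of squared deviations of y from the line b0 + b1*x."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

def least_squares(x, y):
    """Least squares intercept and slope from the closed-form solution."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares(x, y)
ss_min = sum_of_squares(b0, b1, x, y)

# Sum of squares for eight other lines obtained by shifting b0 and/or b1.
others = [
    sum_of_squares(b0 + d0, b1 + d1, x, y)
    for d0 in (-1.0, 0.0, 1.0)
    for d1 in (-0.5, 0.0, 0.5)
    if (d0, d1) != (0.0, 0.0)
]

print(f"least squares line: SS = {ss_min:.4f}")
print(f"smallest SS among the other lines: {min(others):.4f}")
```

Every alternative line tried here gives a larger sum of squares than the least squares line, which is the pattern the two rows above show for the movie data.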
Summary
• Theory vs. practice
• Linear Relationship
  - Deterministic
  - Random, stochastic, 'probabilistic'
  - Mean is a function of x
• Regression Relationship
  - Causality vs. correlation
  - Least squares