Notes 6: Multiple Regression

Regression Models
Professor William Greene
Stern School of Business
IOMS Department
Department of Economics
Part 6: Multiple Regression
Regression and Forecasting Models
Part 6 – Multiple Regression
Multiple Regression Agenda
The concept of multiple regression
Computing the regression equation
Multiple regression “model”
Using the multiple regression model
Building the multiple regression model
Regression diagnostics and inference

Concept of Multiple Regression



Different conditional means
    Application: Monet’s signature
Holding things constant
    Application: Price and income effects
    Application: Age and education
    Sales promotion: Price and competitors
The general idea of multiple regression
Monet in Large and Small
Logs of Sale prices of 328 signed Monet paintings
[Figure: Fitted line plot of ln (US$) vs. ln (SurfaceArea). Fitted line: ln (US$) = 2.825 + 1.725 ln (SurfaceArea); S = 1.00645, R-Sq = 20.0%, R-Sq(adj) = 19.8%]
The residuals do not show any obvious patterns that seem inconsistent with the assumptions of the model.
Log of $price = a + b log surface area + e
How much for the signature?

The sample also contains 102 unsigned paintings.

Average Sale Price
Signed        $3,364,248
Not signed    $1,832,712

The average price of a signed Monet is almost twice that of an unsigned one.
Can we separate the two effects?
Average Prices
            Small       Large
Unsigned    346,845     5,795,000
Signed      689,422     5,556,490
What do the data suggest?
(1) The size effect is huge
(2) The signature effect is confined to the small paintings.
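Both conclusions can be checked directly from the four cell averages. The following is a minimal Python sketch; the dictionary layout and variable names are illustrative assumptions, and only the four averages come from the table.

```python
# Average sale prices from the 2x2 table (signature status x size).
avg_price = {
    ("unsigned", "small"): 346_845,
    ("unsigned", "large"): 5_795_000,
    ("signed", "small"): 689_422,
    ("signed", "large"): 5_556_490,
}

# Signature effect within each size class: signed average / unsigned average.
signature_ratio = {
    size: avg_price[("signed", size)] / avg_price[("unsigned", size)]
    for size in ("small", "large")
}

# Size effect within each signature class: large average / small average.
size_ratio = {
    status: avg_price[(status, "large")] / avg_price[(status, "small")]
    for status in ("unsigned", "signed")
}

for size, ratio in signature_ratio.items():
    print(f"signed/unsigned, {size} paintings: {ratio:.2f}")
for status, ratio in size_ratio.items():
    print(f"large/small, {status} paintings: {ratio:.2f}")
```

Among small paintings the signed average is roughly double the unsigned one, while among large paintings the ratio is slightly below 1. The large/small ratios are far bigger than either signature ratio.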
Thought experiments: Ceteris paribus

Consider Monets of the same size, some signed and some not, and compare prices. This is the signature effect.

Consider signed Monets and compare large ones to small ones. Likewise for unsigned Monets. This is the size effect.
A Multiple Regression
[Figure: Scatterplot of ln (US$) vs. ln (SurfaceArea), points grouped by Signed (0, 1); the plot is annotated with b2]
Ln Price = b0 + b1 ln Area + b2 (0 if unsigned, 1 if signed) + e
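A model of this form can be fit by ordinary least squares with the 0/1 indicator entered as one more column. The sketch below uses synthetic data, not the Monet sample: the "true" coefficients, the noise level, and all variable names are assumptions made for illustration, and NumPy is assumed to be available.

```python
import numpy as np

# Synthetic data only: the "true" coefficients below are made-up
# illustration values, not estimates from the Monet sample.
rng = np.random.default_rng(0)
n = 400
ln_area = rng.uniform(6.0, 7.6, size=n)   # log surface area
signed = rng.integers(0, 2, size=n)       # 0 = unsigned, 1 = signed
true_b0, true_b1, true_b2 = 4.0, 1.3, 1.2
noise = rng.normal(0.0, 0.5, size=n)
ln_price = true_b0 + true_b1 * ln_area + true_b2 * signed + noise

# Design matrix: a column of ones, the continuous regressor, and the dummy.
X = np.column_stack([np.ones(n), ln_area, signed])

# Least squares estimates of (b0, b1, b2).
b, *_ = np.linalg.lstsq(X, ln_price, rcond=None)
b0, b1, b2 = b

# The dummy shifts the intercept: two parallel lines, b2 apart.
print(f"unsigned line: ln price = {b0:.3f} + {b1:.3f} ln area")
print(f"signed line:   ln price = {b0 + b2:.3f} + {b1:.3f} ln area")
```

Because the dummy enters additively, the two fitted lines share the slope b1 and differ only in their intercepts, by b2.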
Monet Multiple Regression
Regression Analysis: ln (US$) versus ln (SurfaceArea), Signed
The regression equation is
ln (US$) = 4.12 + 1.35 ln (SurfaceArea) + 1.26 Signed
Predictor           Coef      SE Coef    T        P
Constant            4.1222    0.5585     7.38     0.000
ln (SurfaceArea)    1.3458    0.08151    16.51    0.000
Signed              1.2618    0.1249     10.11    0.000

S = 0.992509    R-Sq = 46.2%    R-Sq(adj) = 46.0%
Interpretation (to be explored as we develop the topic):
(1) Elasticity of price with respect to surface area is 1.3458 – very large
(2) The signature multiplies the price by exp(1.2618) (about 3.5), for any given size.
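The arithmetic behind both interpretations can be reproduced in a few lines. This is a small Python sketch; the variable names are illustrative, and the doubling-of-area calculation is an added example of what an elasticity of 1.3458 implies, not something stated on the slide.

```python
import math

# Coefficients copied from the regression output above.
b_ln_area = 1.3458   # coefficient on ln(SurfaceArea)
b_signed = 1.2618    # coefficient on the Signed dummy

# Signed = 1 adds b_signed to ln(price), i.e. multiplies price by exp(b_signed).
signature_multiplier = math.exp(b_signed)

# In a log-log relationship, scaling area by k scales price by k ** b_ln_area.
price_factor_double_area = 2.0 ** b_ln_area

print(f"signature multiplier: {signature_multiplier:.2f}")
print(f"price factor when area doubles: {price_factor_double_area:.2f}")
```

The first number is the "about 3.5" quoted in point (2). The second shows that, with an elasticity above 1, doubling the area more than doubles the predicted price.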
Ceteris Paribus in Theory

Demand for gasoline:
G = f(price,income)

Demand (price) elasticity:
eP = %change in G given %change in P
holding income constant.

How do you do that in the real world?
    The “percentage changes”
    How to change price and hold income constant?
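One standard way to write the "holding income constant" idea is as a partial derivative. The notation below is an assumption for illustration: the slide writes the elasticity as eP, and here P stands for price and I for income.

```latex
\varepsilon_P
  = \left.\frac{\%\Delta G}{\%\Delta P}\right|_{I\ \text{fixed}}
  \approx \frac{\partial G}{\partial P}\cdot\frac{P}{G}
  = \frac{\partial \ln G}{\partial \ln P},
\qquad G = f(P, I).
```

The last equality is why a coefficient in a regression of logs on logs can be read as an elasticity.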
The Real World Data
U.S. Gasoline Market, 1953-2004
[Figure: Time series plot of logG, logIncome, logPg, 1953-2004; x-axis: Year]
Shouldn’t Demand Curves Slope Downward?
[Figure: Scatterplot of GasPrice vs. G]
A Thought Experiment



The main driver of gasoline consumption is income, not price.
Income is growing over time.
We are not holding income constant when we change price!
How do we do that?
[Figure: Scatterplot of g vs. Income]
How to Hold Income Constant?
Multiple Regression Using Price and Income
Regression Analysis: G versus GasPrice, Income
The regression equation is
G = 0.134 - 0.00163 GasPrice + 0.000026 Income
Predictor    Coef          SE Coef       T        P
Constant     0.13449       0.02081       6.46     0.000
GasPrice     -0.0016281    0.0004152     -3.92    0.000
Income       0.00002634    0.00000231    11.43    0.000
It looks like the theory works.
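With the fitted equation in hand, "holding income constant" becomes a calculation: change the price in the equation and leave income where it is. The Python sketch below does this; the function name and the scenario values (a price of 60 rising to 70, income fixed at 20,000) are hypothetical choices for illustration.

```python
# Coefficients copied from the regression output above.
B_CONST = 0.13449
B_PRICE = -0.0016281
B_INCOME = 0.00002634


def predicted_g(gas_price: float, income: float) -> float:
    """Fitted value of G from the estimated equation."""
    return B_CONST + B_PRICE * gas_price + B_INCOME * income


# Hypothetical scenario (values chosen for illustration only):
# raise the price by 10 units while income stays at 20,000.
income_fixed = 20_000.0
g_before = predicted_g(gas_price=60.0, income=income_fixed)
g_after = predicted_g(gas_price=70.0, income=income_fixed)

# With income held fixed, the change in G is B_PRICE times the price change.
change_in_g = g_after - g_before
print(f"G before: {g_before:.4f}, G after: {g_after:.4f}, change: {change_in_g:.4f}")
```

The income term cancels in the difference, so the predicted change in G depends only on the GasPrice coefficient. That is the sense in which the multiple regression coefficient is a ceteris paribus effect.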
Application: WHO

WHO data on 191 countries in 1995-1999.
Analysis of Disability Adjusted Life Expectancy = DALE
EDUC = average years of education
PCHexp = per capita health expenditure
DALE = α + β1 EDUC + β2 PCHexp + ε
The (Famous) WHO Data
Specify the Variables in the Model
Graphs
Regression Results
Practical Model Building
Understanding the regression: The left-out variable problem
Using different kinds of variables
    Dummy variables
    Logs
    Time trend
    Quadratic
A Fundamental Result
What happens when you leave a crucial
variable out of your model?
Regression Analysis: g versus GasPrice (no income)
The regression equation is
g = 3.50 + 0.0280 GasPrice
Predictor    Coef        SE Coef     T        P
Constant     3.4963      0.1678      20.84    0.000
GasPrice     0.028034    0.002809    9.98     0.000
Regression Analysis: G versus GasPrice, Income
The regression equation is
G = 0.134 - 0.00163 GasPrice + 0.000026 Income
Predictor    Coef          SE Coef       T        P
Constant     0.13449       0.02081       6.46     0.000
GasPrice     -0.0016281    0.0004152     -3.92    0.000
Income       0.00002634    0.00000231    11.43    0.000
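The sign flip between the two outputs above can be reproduced on simulated data. The sketch below is not the gasoline data: the coefficients, noise levels, and the helper `ols` are assumptions chosen so that price rises with income, as in the thought experiment. NumPy is assumed to be available.

```python
import numpy as np

# Synthetic data only: coefficients and noise levels are made-up
# illustration values, not the U.S. gasoline data.
rng = np.random.default_rng(1)
n = 200
income = rng.normal(0.0, 1.0, size=n)
price = 0.8 * income + rng.normal(0.0, 0.5, size=n)   # price moves with income
g = 1.0 - 1.0 * price + 3.0 * income + rng.normal(0.0, 0.5, size=n)


def ols(y, *columns):
    """Least squares coefficients of y on a constant plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


_, b_price_long, b_income_long = ols(g, price, income)   # income included
_, b_price_short = ols(g, price)                          # income left out
_, slope_income_on_price = ols(income, price)             # auxiliary regression

# Omitted-variable identity:
# short slope = long slope + income coefficient * auxiliary slope.
implied_short = b_price_long + b_income_long * slope_income_on_price

print(f"price coefficient, income included: {b_price_long:.3f}")
print(f"price coefficient, income omitted:  {b_price_short:.3f}")
print(f"implied by the identity:            {implied_short:.3f}")
```

When income is left out, the price coefficient absorbs part of the income effect, in proportion to how strongly income and price move together. In this simulation that is enough to turn a negative price effect into a positive estimate.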
An Elaborate Multiple Loglinear Regression Model
A Conspiracy Theory for Art Sales at Auction
Sotheby’s and Christie’s conspired on commission rates from 1995 to about 2000.
If the Theory is Correct…
[Figure: Scatterplot of ln (US$) vs. ln (SurfaceArea), with two groups: paintings sold from 1995 to 2000, and paintings sold before 1995 or after 2000]
Evidence
The statistical evidence seems to be consistent with the theory.
A Production Function Multiple Regression Model
Sales of (Cameras/Videos/Warranties) = f(Floor Space, Staff)
Production Function for Videos
How should I interpret the negative coefficient on logFloor?
An Application to Credit Modeling
Age and Education Effects on Income
A Multiple Regression
+----------------------------------------------------+
| LHS=HHNINC    Mean                 = .3520836      |
|               Standard deviation   = .1769083      |
| Model size    Parameters           = 3             |
|               Degrees of freedom   = 27323         |
| Residuals     Sum of squares       = 794.9667      |
|               Standard error of e  = .1705730      |
| Fit           R-squared            = .07040754     |
+----------------------------------------------------+
+--------+--------------+-----------+
|Variable| Coefficient  | Mean of X |
+--------+--------------+-----------+
|Constant|  -.39266196  |           |
|AGE     |   .02458140  | 43.5256898|
|EDUC    |   .01994416  | 11.3206310|
+--------+--------------+-----------+
Education and Age Effects on Income
Effect on log Income of 8 more years of education