Regression Forced March

17.871
Spring 2006
Regression quantifies how one variable can be
described in terms of another
Black Elected Officials Example I
[Figure: scatterplot of beo (y-axis, 0 to 10.8) against bpop (x-axis, 1.2 to 30.8)]
Stop a second:
What is the correlation between beo
& bpop? .72, .82, .92?
[Figure: scatterplot of beo against bpop, repeated]
The Linear Relationship between Two
Variables
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
The Linear Relationship between African
American Population & Black Legislators
[Figure: scatterplot of beo against bpop with fitted values; $\beta_0 = -1.31$, $\beta_1 = 0.359$]
How did we get that line?
1. Pick a representative value of Yi
[Figure: scatterplot of beo against bpop with fitted values; one observation is marked $Y_i$]
How did we get that line?
2. Decompose Yi into two parts
[Figure: scatterplot of beo against bpop with fitted values]
How did we get that line?
3. Label the points
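[Figure: scatterplot of beo against bpop with fitted values; for one observation the plot labels $Y_i$, the fitted value $\hat{Y}_i$, and the "residual" $\varepsilon_i = Y_i - \hat{Y}_i$]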
$$Y_i = (\beta_0 + \beta_1 X_i) + \varepsilon_i$$
Stop a moment: What is $\varepsilon_i$?
• Vagueness of theory
• Poor proxies (i.e., measurement error)
• Wrong functional form
• See Utts & Heckard discussion about the
difference between deterministic
relationships and statistical relationships
The Method of Least Squares
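Pick $\beta_0$ and $\beta_1$ to minimize
$$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \quad \text{or} \quad \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$$
[Figure: scatterplot of beo against bpop with fitted values; for one observation the plot labels $Y_i$, $\hat{Y}_i$, and $\varepsilon_i = Y_i - \hat{Y}_i$]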
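Solve for
$$\frac{\partial \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2}{\partial \beta_1} = 0$$
$$\beta_1 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})(X_i - \bar{X})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \quad \text{or} \quad \frac{\operatorname{cov}(X, Y)}{\operatorname{var}(X)}$$
(Utts & Heckard, p. 164)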
Solve for
$$\frac{\partial \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2}{\partial \beta_0} = 0$$
$$\beta_0 = \bar{Y} - \beta_1 \bar{X}$$
Note that if you rearrange:
$$\bar{Y} = \beta_0 + \beta_1 \bar{X}$$
(Utts & Heckard, p. 164)
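The two solutions above can be checked numerically. The following is a minimal sketch on made-up numbers (not the beo/bpop data); the variable names are illustrative, and the comparison against NumPy's `polyfit` is only a sanity check.

```python
import numpy as np

# Made-up illustrative data (NOT the beo/bpop dataset from the slides).
x = np.array([1.0, 2.0, 4.0, 7.0, 10.0])
y = np.array([0.5, 1.9, 3.2, 6.8, 9.1])

x_bar, y_bar = x.mean(), y.mean()

# Slope: sum of cross-deviations over the sum of squared X deviations,
# which is the same ratio as cov(X, Y) / var(X).
beta1 = np.sum((y - y_bar) * (x - x_bar)) / np.sum((x - x_bar) ** 2)

# Intercept: chosen so the line passes through the point of means.
beta0 = y_bar - beta1 * x_bar

# Sanity check against NumPy's degree-1 least-squares fit,
# which returns (slope, intercept).
slope_np, intercept_np = np.polyfit(x, y, 1)

print("beta0 =", beta0, "beta1 =", beta1)
```

Because the intercept is defined as $\bar{Y} - \beta_1 \bar{X}$, the fitted line always passes through $(\bar{X}, \bar{Y})$, which is the rearranged form noted above.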
$$\bar{Y} = \beta_0 + \beta_1 \bar{X}$$
[Figure: scatterplot of beo against bpop with fitted values]
About the Functional Form
• Linear in the variables vs. linear in the parameters
– Y = a + bX + e (linear in both)
– Y = a + bX + cX^2 + e (linear in parms.)
– Y = a + Xb + e (linear in variables)
– Y = a + lnXb/Zc + e (linear in neither)
• Utts & Heckard pp. 174-175
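To illustrate why "linear in the parameters" is the property that matters for least squares, here is a small sketch that fits the quadratic case Y = a + bX + cX^2 with an ordinary linear solver. The coefficient values are assumptions chosen for the demonstration, and the data are noiseless so the fit should recover them exactly.

```python
import numpy as np

# Assumed "true" coefficients, chosen only for this demonstration.
a_true, b_true, c_true = 1.0, 2.0, -0.5

x = np.linspace(0.0, 5.0, 11)
y = a_true + b_true * x + c_true * x**2  # noiseless, so the fit is exact

# Design matrix with columns 1, X, X^2. The model is nonlinear in X
# but linear in (a, b, c), so linear least squares still applies.
X = np.column_stack([np.ones_like(x), x, x**2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

print(coef)
```

The squared term is treated as just another column of the design matrix; nothing about the estimation method changes.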
[Figure: Black Elected Officials; leg and two sets of fitted values plotted against pop]
Log transformations
Y = a + bX + e
b = dY/dX, or
b = the unit change in Y given a unit
change in X
Typical case
Y = a + b lnX + e
b = dY/(dX/X), or
b = the unit change in Y given a %
change in X
Cases where there’s a natural
limit on growth
ln Y = a + bX + e
b = (dY/Y)/dX, or
b = the % change in Y given a unit
change in X
Exponential growth
ln Y = a + b ln X + e
b = (dY/Y)/(dX/X), or
b = the % change in Y given a %
change in X (elasticity)
Economic production
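As a rough illustration of the log-log case, the sketch below generates noiseless data from an assumed power law and recovers the exponent as the slope of ln Y on ln X. The constants 3.0 and 0.7 are made up for the example.

```python
import numpy as np

# Assumed power-law relationship Y = 3 * X^0.7 (made-up constants).
scale, elasticity = 3.0, 0.7
x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
y = scale * x**elasticity

# Regress ln Y on ln X; polyfit returns (slope, intercept).
b, a = np.polyfit(np.log(x), np.log(y), 1)

# Approximate % change in Y for a 1% increase in X implied by the slope.
pct_change_y = (1.01**b - 1.0) * 100.0

print("slope b =", b, "exp(a) =", np.exp(a), "% change in Y =", pct_change_y)
```

The slope equals the assumed exponent, and a 1% increase in X corresponds to roughly a b% increase in Y, which is the elasticity reading given above. The approximation is close only for small percentage changes.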
How “good” is the fitted line?
[Figure: three scatterplots against bpop (x-axis, 1.2 to 30.8), each with fitted values and a y-axis from -2 to 15: smally, beo, and bigy]
Judging results
• Substantive interpretation of coefficients
• Technical judgment of regression
– Judgment of coefficients
– Judgment of overall fit
Determining Goodness of Fit I
• Coefficients
– Standard error of a coefficient
– t-statistic: coeff./s.e.
Standard error of the regression picture
[Figure: scatterplot of beo against bpop with fitted values; for one observation the plot labels $Y_i$, $\hat{Y}_i$, and $\varepsilon_i = Y_i - \hat{Y}_i$]
Determining Goodness of Fit
• Standard error of the regression or standard
error of estimate (Root mean square error in
STATA)
$$\text{s.e.e.} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{\text{d.f.}}}, \qquad \text{d.f.} = n - 2$$
R-squared picture
[Figure: scatterplot of beo against bpop with fitted values and a horizontal line at $\bar{Y}$; for one observation the plot labels the deviations $(Y_i - \bar{Y})$, $(\hat{Y}_i - \bar{Y})$, and $(Y_i - \hat{Y}_i)$]

$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \text{``total sum of squares''}$$
$$\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 = \text{``regression sum of squares''}$$
$$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \text{``residual sum of squares''}$$
Determining Goodness of Fit
• R-squared
$$r^2 = \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}$$
or percent variance "explained"
“coefficient of determination”
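The three sums of squares and the two fit measures can be computed directly. This is a sketch on made-up data (not the beo/bpop data), with illustrative variable names, showing that the total sum of squares splits into the regression and residual parts for a least-squares line with an intercept.

```python
import numpy as np

# Made-up illustrative data (NOT the beo/bpop dataset from the slides).
x = np.array([1.0, 2.0, 4.0, 7.0, 10.0])
y = np.array([0.5, 1.9, 3.2, 6.8, 9.1])
n = len(y)

# Least-squares line; polyfit returns (slope, intercept).
beta1, beta0 = np.polyfit(x, y, 1)
y_hat = beta0 + beta1 * x

tss = np.sum((y - y.mean()) ** 2)      # total sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)  # regression sum of squares
rss = np.sum((y - y_hat) ** 2)         # residual sum of squares

r_squared = ess / tss                  # share of variation "explained"
see = np.sqrt(rss / (n - 2))           # standard error of the regression

print("R-squared =", r_squared, "s.e.e. =", see)
```

In the two-variable case the same R-squared is also the square of the correlation between X and Y.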
Return to Black Elected Officials
Example
. reg beo bpop

      Source |       SS       df       MS              Number of obs =      41
-------------+------------------------------           F(  1,    39) =  202.56
       Model |   351.26542     1   351.26542           Prob > F      =  0.0000
    Residual |  67.6326195    39  1.73416973           R-squared     =  0.8385
-------------+------------------------------           Adj R-squared =  0.8344
       Total |  418.898039    40   10.472451           Root MSE      =  1.3169

------------------------------------------------------------------------------
         beo |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        bpop |   .3584751   .0251876    14.23   0.000     .3075284    .4094219
       _cons |  -1.314892   .3277508    -4.01   0.000    -1.977831   -.6519535
------------------------------------------------------------------------------
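The summary statistics in this output follow from the sums of squares and the coefficient table by simple arithmetic. The sketch below recomputes them from the printed numbers only; the variable names are illustrative.

```python
import math

# Numbers copied from the Stata output for `reg beo bpop`.
model_ss, resid_ss, total_ss = 351.26542, 67.6326195, 418.898039
model_df, resid_df = 1, 39
coef_bpop, se_bpop = 0.3584751, 0.0251876

r_squared = model_ss / total_ss                        # Model SS / Total SS
root_mse = math.sqrt(resid_ss / resid_df)              # sqrt(Residual MS)
t_stat = coef_bpop / se_bpop                           # coeff. / s.e.
f_stat = (model_ss / model_df) / (resid_ss / resid_df) # Model MS / Residual MS

print(round(r_squared, 4), round(root_mse, 4), round(t_stat, 2), round(f_stat, 2))
```

With a single regressor, the F statistic is the square of the slope's t statistic, up to rounding in the printed values.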
Residuals
$$e_i = Y_i - B_0 - B_1 X_i$$
[Figure: scatterplot of beo against bpop with fitted values; the points for AL and IL are labeled]
One important numerical property of
residuals
• The sum of the residuals is zero.
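This property holds for any least-squares fit that includes an intercept, because setting the derivative with respect to the intercept to zero forces the residuals to sum to zero. A minimal check on made-up data (not from the slides):

```python
import numpy as np

# Made-up illustrative data (NOT from the slides).
x = np.array([1.0, 2.0, 4.0, 7.0, 10.0])
y = np.array([0.5, 1.9, 3.2, 6.8, 9.1])

# Least-squares line with an intercept; polyfit returns (slope, intercept).
b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)

# Both sums are zero up to floating-point rounding.
print("sum of residuals:", residuals.sum())
print("sum of residuals times x:", np.sum(residuals * x))
```

The second printed sum reflects the companion condition from the slope's first-order equation: the residuals are also uncorrelated with X in the sample.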
Regression Commands in STATA
• reg depvar indvars
• predict newvar
• predict newvar, resid
Why It’s Called Regression
[Figure: height of sons (y-axis) plotted against height of fathers (x-axis)]
Some Regressions
Temperature and Latitude
[Figure: scatterplot of JanTemp (y-axis, 0 to 80) against latitude (x-axis, 25 to 45), points labeled by city]
. reg jantemp latitude

      Source |       SS       df       MS              Number of obs =      20
-------------+------------------------------           F(  1,    18) =   49.34
       Model |  3250.72219     1  3250.72219           Prob > F      =  0.0000
    Residual |  1185.82781    18  65.8793228           R-squared     =  0.7327
-------------+------------------------------           Adj R-squared =  0.7179
       Total |     4436.55    19  233.502632           Root MSE      =  8.1166

------------------------------------------------------------------------------
     jantemp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    latitude |  -2.341428   .3333232    -7.02   0.000    -3.041714   -1.641142
       _cons |   125.5072   12.77915     9.82   0.000     98.65921    152.3552
------------------------------------------------------------------------------

. predict py
(option xb assumed; fitted values)

. predict ry,resid
[Figure: scatterplot of JanTemp against latitude with fitted values, points labeled by city]
. gsort -ry

. list city jantemp py ry

     +-------------------------------------------------+
     |           city   jantemp         py          ry |
     |-------------------------------------------------|
  1. |     PortlandOR        40    17.8015     22.1985 |
  2. | SanFranciscoCA        49   36.53293    12.46707 |
  3. |   LosAngelesCA        58   45.89864    12.10136 |
  4. |      PhoenixAZ        54   48.24007    5.759929 |
  5. |      NewYorkNY        32   29.50864    2.491357 |
     |-------------------------------------------------|
  6. |        MiamiFL        67   64.63007     2.36993 |
  7. |       BostonMA        29   27.16722    1.832785 |
  8. |      NorfolkVA        39   38.87436     .125643 |
  9. |    BaltimoreMD        32    34.1915     -2.1915 |
 10. |     SyracuseNY        22   24.82579   -2.825786 |
     |-------------------------------------------------|
 11. |       MobileAL        50   52.92293   -2.922928 |
 12. |   WashingtonDC        31    34.1915     -3.1915 |
 13. |      MemphisTN        40   43.55721   -3.557214 |
 14. |    ClevelandOH        25   29.50864   -4.508643 |
 15. |       DallasTX        43   48.24007   -5.240071 |
     |-------------------------------------------------|
 16. |      HoustonTX        50   55.26435   -5.264356 |
 17. |   KansasCityMO        28    34.1915     -6.1915 |
 18. |   PittsburghPA        25   31.85007   -6.850072 |
 19. |  MinneapolisMN        12   20.14293   -8.142929 |
 20. |       DuluthMN         7   15.46007   -8.460073 |
     +-------------------------------------------------+
Bush Vote and Southern Baptists
[Figure: scatterplot of Bush Pct 2004 (y-axis, .4 to .7) against Southern Baptist % (x-axis, 0 to .6), points labeled by state, with fitted values]
. reg bush sbc_mpct

      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(  1,    48) =   11.83
       Model |  .069183833     1  .069183833           Prob > F      =  0.0012
    Residual |  .280630922    48  .005846478           R-squared     =  0.1978
-------------+------------------------------           Adj R-squared =  0.1811
       Total |  .349814756    49  .007139077           Root MSE      =  .07646

------------------------------------------------------------------------------
        bush |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    sbc_mpct |    .196814   .0572138     3.44   0.001     .0817779    .3118501
       _cons |   .4931758   .0155007    31.82   0.000     .4620095     .524342
------------------------------------------------------------------------------
[Figure: scatterplot of Bush Pct 2004 against Southern Baptist %, points labeled by state, with fitted values, repeated]
Weight by State Population
. reg bush sbc_mpct [aw=votes]
(sum of wgt is   1.2207e+08)

      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(  1,    48) =   40.18
       Model |  .118925068     1  .118925068           Prob > F      =  0.0000
    Residual |  .142084951    48  .002960103           R-squared     =  0.4556
-------------+------------------------------           Adj R-squared =  0.4443
       Total |  .261010018    49  .005326735           Root MSE      =  .05441

------------------------------------------------------------------------------
        bush |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    sbc_mpct |    .261779   .0413001     6.34   0.000     .1787395    .3448185
       _cons |   .4563507   .0112155    40.69   0.000     .4338004    .4789011
------------------------------------------------------------------------------
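One way to think about the weighted coefficients is as the solution to a least-squares problem in which each observation's squared residual is multiplied by its weight. The sketch below assumes that interpretation and uses made-up values and hypothetical weights, not the state-level data; it computes the weighted slope and intercept from weighted means and cross-checks them by rescaling rows by the square root of the weight.

```python
import numpy as np

# Made-up values and hypothetical weights (NOT the state-level data).
x = np.array([0.0, 0.1, 0.2, 0.4, 0.5])
y = np.array([0.45, 0.50, 0.48, 0.58, 0.60])
w = np.array([10.0, 1.0, 5.0, 2.0, 8.0])

# Weighted means replace the simple means in the usual formulas.
x_bar_w = np.average(x, weights=w)
y_bar_w = np.average(y, weights=w)

beta1_w = (np.sum(w * (x - x_bar_w) * (y - y_bar_w))
           / np.sum(w * (x - x_bar_w) ** 2))
beta0_w = y_bar_w - beta1_w * x_bar_w

# Cross-check: minimizing sum(w * e^2) is the same as ordinary least
# squares after multiplying each row by sqrt(w).
X = np.column_stack([np.ones_like(x), x])
sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

print("weighted intercept and slope:", beta0_w, beta1_w)
```

Heavily weighted observations pull the line toward themselves, which is why the weighted and unweighted slopes in the two outputs above differ.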
[Figure: Bush Pct 2004 against Southern Baptist % with two sets of fitted values]
Midterm loss & pres’l popularity
[Figure: scatterplot of midterm loss (y-axis, -80 to 0) against Gallup approval rating (Nov.) (x-axis, 30 to 70), points labeled by election year from 1938 to 2002]
. reg loss gallup

      Source |       SS       df       MS              Number of obs =      17
-------------+------------------------------           F(  1,    15) =    5.70
       Model |  2493.96962     1  2493.96962           Prob > F      =  0.0306
    Residual |  6564.50097    15  437.633398           R-squared     =  0.2753
-------------+------------------------------           Adj R-squared =  0.2270
       Total |  9058.47059    16  566.154412           Root MSE      =   20.92

------------------------------------------------------------------------------
        loss |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gallup |   1.283411     .53762     2.39   0.031     .1375011    2.429321
       _cons |  -96.59926   29.25347    -3.30   0.005    -158.9516   -34.24697
------------------------------------------------------------------------------
[Figure: scatterplot of loss against Gallup approval rating (Nov.) with fitted values, points labeled by election year]
. reg loss gallup if year>1948

      Source |       SS       df       MS              Number of obs =      14
-------------+------------------------------           F(  1,    12) =   17.53
       Model |  3332.58872     1  3332.58872           Prob > F      =  0.0013
    Residual |  2280.83985    12  190.069988           R-squared     =  0.5937
-------------+------------------------------           Adj R-squared =  0.5598
       Total |  5613.42857    13  431.802198           Root MSE      =  13.787

------------------------------------------------------------------------------
        loss |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gallup |    1.96812   .4700211     4.19   0.001     .9440315    2.992208
       _cons |  -127.4281   25.54753    -4.99   0.000    -183.0914   -71.76486
------------------------------------------------------------------------------
[Figure: scatterplot of loss against Gallup approval rating (Nov.) with two sets of fitted values, points labeled by election year]