China Carbon Emissions and Population Factors 1978-2008

Ridge Regression
Population Characteristics and Carbon Emissions in China
(1978-2008)
Q. Zhu and X. Peng (2012). “The Impacts of Population Change on Carbon Emissions in China During 1978-2008,”
Environmental Impact Assessment Review, Vol. 36, pp. 1-8
Data Description/Model
• Data Years: 1978-2008 (n = 31 Years)
• Dependent Variable – Carbon Emissions (million-tons)
• Independent Variables
  • Population (10,000s)
  • Urbanization Rate (%)
  • Percentage of Population of Working Age (%)
  • Average Household Size (persons/hhold)
  • Per Capita Expenditures (Adjusted to Year=2000)
$$\ln C_t = \beta_0 + \beta_P \ln P_t + \beta_U \ln U_t + \beta_W \ln W_t + \beta_H \ln H_t + \beta_E \ln E_t + \varepsilon_t$$
Short-hand Notation: $Y_t = \beta_0 + \beta_1 X_{t1} + \beta_2 X_{t2} + \beta_3 X_{t3} + \beta_4 X_{t4} + \beta_5 X_{t5} + \varepsilon_t$
Correlation Transformation
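$$Y_t^* = \frac{Y_t - \bar{Y}}{s_Y \sqrt{n-1}} \qquad X_{tj}^* = \frac{X_{tj} - \bar{X}_j}{s_j \sqrt{n-1}}$$
$$\bar{X}_j = \frac{\sum_{t=1}^{n} X_{tj}}{n} \qquad s_j^2 = \frac{\sum_{t=1}^{n} \left( X_{tj} - \bar{X}_j \right)^2}{n-1}$$
$$\mathbf{X}^* = \begin{bmatrix} X_{11}^* & X_{12}^* & \cdots & X_{15}^* \\ X_{21}^* & X_{22}^* & \cdots & X_{25}^* \\ \vdots & \vdots & & \vdots \\ X_{31,1}^* & X_{31,2}^* & \cdots & X_{31,5}^* \end{bmatrix} \qquad \mathbf{X}^{*\prime}\mathbf{X}^* = \mathbf{R}_{XX} = \begin{bmatrix} 1 & r_{12} & \cdots & r_{15} \\ r_{21} & 1 & \cdots & r_{25} \\ \vdots & \vdots & & \vdots \\ r_{51} & r_{52} & \cdots & 1 \end{bmatrix}$$
$$\mathbf{Y}^* = \begin{bmatrix} Y_1^* \\ Y_2^* \\ \vdots \\ Y_{31}^* \end{bmatrix} \qquad \mathbf{X}^{*\prime}\mathbf{Y}^* = \mathbf{R}_{XY} = \begin{bmatrix} r_{Y1} \\ r_{Y2} \\ \vdots \\ r_{Y5} \end{bmatrix}$$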
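The correlation transformation can be sketched in a few lines of NumPy. This is a minimal illustration on hypothetical random data (the arrays `X` and `Y` below are stand-ins, not the China series); the function name `correlation_transform` is an assumption made for this example.

```python
import numpy as np

def correlation_transform(M):
    """Center each column and scale it by s_j * sqrt(n - 1)."""
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    centered = M - M.mean(axis=0)
    s = M.std(axis=0, ddof=1)          # sample standard deviation (divisor n - 1)
    return centered / (s * np.sqrt(n - 1))

# Hypothetical data standing in for 31 years of 5 logged predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(31, 5)).cumsum(axis=0)
Y = X @ np.array([0.5, 0.3, 1.0, -0.8, 0.2]) + rng.normal(scale=0.1, size=31)

X_star = correlation_transform(X)
Y_star = correlation_transform(Y.reshape(-1, 1)).ravel()

R_XX = X_star.T @ X_star   # correlation matrix among the predictors
R_XY = X_star.T @ Y_star   # correlations of each predictor with Y
```

After the transformation every column has sum of squares 1, which is why the cross-product matrices are correlation matrices.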
Data
R_XX (rows and columns X1, ..., X5) and R_XY:

           X1        X2        X3        X4        X5        R_XY
X1     1.0000    0.9753    0.9712   -0.9873    0.9847    0.952534
X2     0.9753    1.0000    0.9801   -0.9907    0.9952    0.974802
X3     0.9712    0.9801    1.0000   -0.9802    0.9773    0.960147
X4    -0.9873   -0.9907   -0.9802    1.0000   -0.9942   -0.97861
X5     0.9847    0.9952    0.9773   -0.9942    1.0000    0.982587
Note that the X-variables are very highly correlated. This causes problems when R_XX is inverted
and used to obtain the least squares estimate of the regression coefficient vector and its variance-covariance matrix.
Eigenvalues of X*'X*:

Number   Eigenvalue   Percent   Cumulative
1        4.9346       98.692     98.692
2        0.0311        0.622     99.314
3        0.0249        0.498     99.812
4        0.0064        0.129     99.941
5        0.003         0.059    100

(X*'X*)^-1 = INV(R_XX), whose diagonal elements are the Variance Inflation Factors:

  50.62     36.87     -9.85     32.98    -44.13     VIF1 = 50.62
  36.87    147.72    -27.81     21.50   -134.78     VIF2 = 147.72
  -9.85    -27.81     31.40     13.30     19.91     VIF3 = 31.40
  32.98     21.50     13.30    122.38     54.80     VIF4 = 122.38
 -44.13   -134.78     19.91     54.80    213.60     VIF5 = 213.60
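As a rough check, the eigenvalues and VIFs can be recomputed from the printed correlation matrix. This is a sketch only: the matrix is entered as rounded to four decimals, and because it is nearly singular the recomputed inverse will only approximate the values reported above.

```python
import numpy as np

# R_XX as printed (rounded to 4 decimals), in the order X1, ..., X5.
R_XX = np.array([
    [ 1.0000,  0.9753,  0.9712, -0.9873,  0.9847],
    [ 0.9753,  1.0000,  0.9801, -0.9907,  0.9952],
    [ 0.9712,  0.9801,  1.0000, -0.9802,  0.9773],
    [-0.9873, -0.9907, -0.9802,  1.0000, -0.9942],
    [ 0.9847,  0.9952,  0.9773, -0.9942,  1.0000],
])

eigenvalues = np.linalg.eigvalsh(R_XX)[::-1]       # sorted largest first
percent = 100 * eigenvalues / eigenvalues.sum()     # share of total variation
cumulative = np.cumsum(percent)

vif = np.diag(np.linalg.inv(R_XX))                  # VIF_j = j-th diagonal of the inverse
condition_number = eigenvalues[0] / eigenvalues[-1]

print(np.round(eigenvalues, 4), np.round(vif, 2))
```

One dominant eigenvalue and several near zero is the signature of the multicollinearity that inflates the VIFs.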
Ridge Regression
• Method of producing a biased estimator of the regression coefficient vector β that
has a smaller Mean Square Error than OLS
• Mean Square Error of Estimator = Variance + Bias^2
• Ridge estimator trades off a small amount of bias for a large reduction in
variance when the predictor variables are highly
correlated
• Problem: Choosing the shrinkage parameter c
• We will work with the standardized regression model
based on the correlation transformed variables, then
“back transform” the regression coefficients to
original scale
Ridge Estimator (Standardized X, Y)
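$$\hat{\gamma}_{OLS} = \left( X^{*\prime}X^* \right)^{-1} X^{*\prime}Y^* \qquad V\left\{ \hat{\gamma}_{OLS} \right\} = \sigma^2 \left( X^{*\prime}X^* \right)^{-1}$$
$$\hat{\gamma}_{R} = \left( X^{*\prime}X^* + cI \right)^{-1} X^{*\prime}Y^* \qquad V\left\{ \hat{\gamma}_{R} \right\} = \sigma^2 \left( X^{*\prime}X^* + cI \right)^{-1} X^{*\prime}X^* \left( X^{*\prime}X^* + cI \right)^{-1}$$
$$\hat{\gamma}_{R} = \left( X^{*\prime}X^* + cI \right)^{-1} X^{*\prime}Y^* = \frac{1}{c}\left[ \left( X^{*\prime}X^* \right)^{-1} + \frac{1}{c} I \right]^{-1} \left( X^{*\prime}X^* \right)^{-1} X^{*\prime}Y^* = \left[ c \left( X^{*\prime}X^* \right)^{-1} + I \right]^{-1} \hat{\gamma}_{OLS}$$
Note: $\left( A^{-1} + B^{-1} \right)^{-1} = A \left( A + B \right)^{-1} B = B \left( A + B \right)^{-1} A$
$$\sum_{j=1}^{p} V\left\{ \hat{\gamma}_{OLS,j} \right\} = \sigma^2 \sum_{j=1}^{p} \frac{1}{\lambda_j} \qquad \sum_{j=1}^{p} V\left\{ \hat{\gamma}_{R,j} \right\} = \sigma^2 \sum_{j=1}^{p} \frac{\lambda_j}{\left( \lambda_j + c \right)^2}$$
$\lambda_j$ = $j$th eigenvalue of $X^{*\prime}X^*$, $c \ge 0$
Note the unconventional notation of $\gamma$ as the standardized regression coefficient vector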
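The two forms of the ridge estimator and the eigenvalue expressions for the total variance can be checked numerically. The sketch below uses hypothetical collinear data (a shared trend plus noise), not the China series, and the helper name `correlation_transform` is an assumption for this example.

```python
import numpy as np

def correlation_transform(M):
    """Center each column and divide by s_j * sqrt(n - 1), i.e. the root of its centered sum of squares."""
    M = np.asarray(M, dtype=float)
    centered = M - M.mean(axis=0)
    return centered / np.sqrt((centered ** 2).sum(axis=0))

rng = np.random.default_rng(1)
n, p, c = 31, 5, 0.20
trend = np.linspace(0.0, 1.0, n)
X = trend[:, None] + 0.05 * rng.normal(size=(n, p))   # hypothetical collinear predictors
Y = X.sum(axis=1) + 0.05 * rng.normal(size=n)

Xs, Ys = correlation_transform(X), correlation_transform(Y)
A, I = Xs.T @ Xs, np.eye(p)

gamma_ols = np.linalg.solve(A, Xs.T @ Ys)
gamma_ridge = np.linalg.solve(A + c * I, Xs.T @ Ys)
# Equivalent form: [c (X*'X*)^-1 + I]^-1 gamma_OLS
gamma_ridge_alt = np.linalg.solve(c * np.linalg.inv(A) + I, gamma_ols)

lam = np.linalg.eigvalsh(A)
total_var_ols = np.sum(1.0 / lam)                 # sum_j V{gamma_OLS,j} / sigma^2
total_var_ridge = np.sum(lam / (lam + c) ** 2)    # sum_j V{gamma_R,j} / sigma^2

M = np.linalg.inv(A + c * I)
ridge_cov = M @ A @ M                              # V{gamma_R} / sigma^2
```

The trace of `ridge_cov` equals the eigenvalue sum because `A` and `A + cI` share eigenvectors.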
China Carbon Emissions Data (c = 0.20)
INV(X*'X*) and beta_OLS:

  50.617     36.874     -9.848     32.980    -44.128     beta_OLS = -0.931
  36.874    147.722    -27.808     21.499   -134.777                -1.045
  -9.848    -27.808     31.397     13.302     19.914                 0.207
  32.980     21.499     13.302    122.378     54.796                -0.775
 -44.128   -134.777     19.914     54.796    213.603                 1.967

INV(X*'X*+0.2I) and beta_R:

   3.617     -0.746     -0.798      0.957     -0.906     beta_R =    0.124
  -0.746      3.803     -0.888      0.936     -1.042                 0.203
  -0.798     -0.888      3.540      0.848     -0.788                 0.168
   0.957      0.936      0.848      3.891      0.971                -0.215
  -0.906     -1.042     -0.788      0.971      3.888                 0.234

Eigenvalues (lambda_i) and their variance contributions:

lambda_i   1/lambda_i   lambda_i/(lambda_i+0.20)^2
4.9346       0.2027     0.1872
0.0311      32.1543     0.5823
0.0249      40.1606     0.4923
0.0064     156.2500     0.1502
0.0030     333.3333     0.0728
Sum        562.1010     1.4848
Note the difference:
$$\frac{\sum_{j=1}^{5} V\left\{ \hat{\gamma}_{OLS,j} \right\}}{\sigma^2} = 562.1010 \qquad \frac{\sum_{j=1}^{5} V\left\{ \hat{\gamma}_{R,j} \right\}}{\sigma^2} = 1.4848$$
The estimated regression coefficients have changed by large amounts, and the signs have reversed
for Population and Urbanization Rate
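The sums and the ridge coefficients for c = 0.20 can be reproduced from the reported (rounded) quantities. This sketch simply re-does the arithmetic with the printed eigenvalues, R_XY, and INV(X*'X*+0.2I); small discrepancies in the third decimal are expected from rounding.

```python
import numpy as np

c = 0.20
lam = np.array([4.9346, 0.0311, 0.0249, 0.0064, 0.0030])          # eigenvalues as reported
R_XY = np.array([0.952534, 0.974802, 0.960147, -0.97861, 0.982587])

# INV(X*'X* + 0.2 I) as reported.
inv_ridge = np.array([
    [ 3.617, -0.746, -0.798,  0.957, -0.906],
    [-0.746,  3.803, -0.888,  0.936, -1.042],
    [-0.798, -0.888,  3.540,  0.848, -0.788],
    [ 0.957,  0.936,  0.848,  3.891,  0.971],
    [-0.906, -1.042, -0.788,  0.971,  3.888],
])

sum_var_ols = np.sum(1.0 / lam)                   # total variance / sigma^2 under OLS
sum_var_ridge = np.sum(lam / (lam + c) ** 2)      # total variance / sigma^2 under ridge
gamma_ridge = inv_ridge @ R_XY                    # standardized ridge coefficients

print(round(sum_var_ols, 4), round(sum_var_ridge, 4), np.round(gamma_ridge, 3))
```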
Back-Transforming Coefficients to Original Scale
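Letting $\beta_j$ represent the coefficient in original (log, in this example) scale:
$$\beta_j = \gamma_j \frac{s_Y}{s_j} \quad j = 1, \ldots, p \qquad \beta_0 = \bar{Y} - \beta_1 \bar{X}_1 - \cdots - \beta_p \bar{X}_p$$
$$\hat{\beta}_{R,j} = \frac{s_Y}{s_j} \hat{\gamma}_{R,j} \quad j = 1, \ldots, p \qquad \hat{\beta}_{R,0} = \bar{Y} - \hat{\beta}_{R,1} \bar{X}_1 - \cdots - \hat{\beta}_{R,p} \bar{X}_p$$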
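The back-transformation can be sketched as a small function that fits the standardized ridge model and rescales the coefficients. The function name `ridge_original_scale` and the data are hypothetical; setting c = 0 should recover ordinary least squares with an intercept, which gives a convenient sanity check.

```python
import numpy as np

def ridge_original_scale(X, Y, c):
    """Fit ridge on correlation-transformed data, then return (beta0, beta) in the original scale."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, p = X.shape
    x_bar, y_bar = X.mean(axis=0), Y.mean()
    s_x, s_y = X.std(axis=0, ddof=1), Y.std(ddof=1)

    Xs = (X - x_bar) / (s_x * np.sqrt(n - 1))
    Ys = (Y - y_bar) / (s_y * np.sqrt(n - 1))
    gamma = np.linalg.solve(Xs.T @ Xs + c * np.eye(p), Xs.T @ Ys)

    beta = gamma * s_y / s_x          # beta_j = gamma_j * s_Y / s_j
    beta0 = y_bar - beta @ x_bar      # intercept from the means
    return beta0, beta

# Hypothetical data (not the China series).
rng = np.random.default_rng(2)
n, p = 31, 3
X = np.linspace(0.0, 1.0, n)[:, None] + 0.1 * rng.normal(size=(n, p))
Y = 1.5 + X @ np.array([0.5, -0.8, 0.2]) + 0.05 * rng.normal(size=n)

beta0_ols, beta_ols = ridge_original_scale(X, Y, c=0.0)
beta0_ridge, beta_ridge = ridge_original_scale(X, Y, c=0.20)
```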
Choosing the Shrinkage Parameter, c
• Ridge Trace – Plot of the standardized ridge
regression coefficients versus c and observe where
they flatten out
• Cc Statistic – Similar to Cp used in regression model
selection
• PRESS Statistic extended to Ridge Regression – Cross-Validation Sum of Squares for “left-out” residuals
• Generalized Cross-Validation – Similar to PRESS,
based on prediction
• Plot of VIFs versus c and observe where they all fall
below 10
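A ridge trace is just the standardized coefficients evaluated over a grid of c values. The sketch below computes one from the printed R_XX and R_XY; the grid from 0 to 0.30 is an arbitrary choice for illustration, and because the correlations are rounded the values near c = 0 are only approximate.

```python
import numpy as np

R_XX = np.array([
    [ 1.0000,  0.9753,  0.9712, -0.9873,  0.9847],
    [ 0.9753,  1.0000,  0.9801, -0.9907,  0.9952],
    [ 0.9712,  0.9801,  1.0000, -0.9802,  0.9773],
    [-0.9873, -0.9907, -0.9802,  1.0000, -0.9942],
    [ 0.9847,  0.9952,  0.9773, -0.9942,  1.0000],
])
R_XY = np.array([0.952534, 0.974802, 0.960147, -0.97861, 0.982587])

c_grid = np.linspace(0.0, 0.30, 31)     # hypothetical grid of shrinkage values
I = np.eye(5)

# ridge_trace[k, j] = standardized ridge coefficient of predictor j at c_grid[k]
ridge_trace = np.array([np.linalg.solve(R_XX + c * I, R_XY) for c in c_grid])

k = np.argmin(np.abs(c_grid - 0.20))
print(np.round(ridge_trace[k], 3))      # should be close to the reported beta_R at c = 0.20
```

Plotting each column of `ridge_trace` against `c_grid` gives the ridge trace; the same loop with the ridge VIF formula gives the VIF plot.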
Cc - Statistic
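$$H_c = X^* \left( X^{*\prime}X^* + cI \right)^{-1} X^{*\prime}$$
$$tr\left( H_c \right) = tr\left( X^* \left( X^{*\prime}X^* + cI \right)^{-1} X^{*\prime} \right) = tr\left( X^{*\prime}X^* \left( X^{*\prime}X^* + cI \right)^{-1} \right) = \sum_{j=1}^{p} \frac{\lambda_j}{\lambda_j + c}$$
$$C_c = \frac{SSE_c}{s^2} - n + 2 + 2\,tr\left( H_c \right) = \frac{SSE_c}{s^2} + 2\left[ 1 + tr\left( H_c \right) \right] - n$$
where: $SSE_c = \left( Y^* - X^* \hat{\gamma}_R(c) \right)' \left( Y^* - X^* \hat{\gamma}_R(c) \right)$
$s^2 = MSE_{OLS} = \left( Y^* - X^* \hat{\gamma}_{OLS} \right)' \left( Y^* - X^* \hat{\gamma}_{OLS} \right) / \left( n - p' \right)$
$1 + tr\left( H_c \right)$ is the "effective" number of parameters, replacing $p'$ in $C_p$:
$$C_p = \frac{SSE(\text{Model})}{s^2} + 2p' - n$$
where $p$ = # of predictors in current model and $p' = p + 1$
Goal: Choose c to minimize $C_c$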
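The Cc statistic can be sketched as follows on hypothetical data. The divisor n - p - 1 used for the OLS mean square error is an assumption; it is consistent with Cc equalling p + 1 at c = 0, as in the tabulated value of 6.0000 for the five-predictor China model. The names `correlation_transform` and `cc_statistic` are made up for this example.

```python
import numpy as np

def correlation_transform(M):
    """Center each column and divide by s_j * sqrt(n - 1)."""
    M = np.asarray(M, dtype=float)
    centered = M - M.mean(axis=0)
    return centered / np.sqrt((centered ** 2).sum(axis=0))

def cc_statistic(Xs, Ys, c):
    """C_c = SSE_c / s^2 - n + 2 + 2 tr(H_c) for standardized Xs, Ys."""
    n, p = Xs.shape
    A = Xs.T @ Xs
    gamma_ols = np.linalg.solve(A, Xs.T @ Ys)
    gamma_c = np.linalg.solve(A + c * np.eye(p), Xs.T @ Ys)
    sse_c = np.sum((Ys - Xs @ gamma_c) ** 2)
    s2 = np.sum((Ys - Xs @ gamma_ols) ** 2) / (n - p - 1)   # assumed divisor n - p'
    lam = np.linalg.eigvalsh(A)
    tr_H = np.sum(lam / (lam + c))                           # trace of H_c via eigenvalues
    return sse_c / s2 - n + 2 + 2 * tr_H

# Hypothetical collinear data (not the China series).
rng = np.random.default_rng(4)
n, p = 31, 3
trend = np.linspace(0.0, 1.0, n)
X = trend[:, None] + 0.05 * rng.normal(size=(n, p))
Y = 1.0 + X @ np.array([0.5, -0.3, 0.8]) + 0.05 * rng.normal(size=n)
Xs, Ys = correlation_transform(X), correlation_transform(Y)

c_grid = np.linspace(0.0, 0.05, 51)
cc = np.array([cc_statistic(Xs, Ys, c) for c in c_grid])
best_c = c_grid[np.argmin(cc)]
```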
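“PRESS” Statistic
$$PR_{Ridge} = \sum_{i=1}^{n} \left( \frac{e_{i,c}}{1 - (1/n) - h_{ii,c}} \right)^2$$
$$e_{i,c} = Y_i^* - x_i^{*\prime} \hat{\gamma}_R(c) \qquad x_i^{*\prime} = \begin{bmatrix} X_{i1}^* & X_{i2}^* & \cdots & X_{ip}^* \end{bmatrix} \qquad \hat{\gamma}_R(c) = \left( X^{*\prime}X^* + cI \right)^{-1} X^{*\prime}Y^*$$
$$H_c = X^* \left( X^{*\prime}X^* + cI \right)^{-1} X^{*\prime} = \begin{bmatrix} h_{11,c} & h_{12,c} & \cdots & h_{1n,c} \\ h_{21,c} & h_{22,c} & \cdots & h_{2n,c} \\ \vdots & \vdots & & \vdots \\ h_{n1,c} & h_{n2,c} & \cdots & h_{nn,c} \end{bmatrix}$$
Goal: Choose c to minimize $PR_{Ridge}$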
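A minimal sketch of the ridge PRESS computation on hypothetical data follows. At c = 0 the formula reduces, after rescaling by the total sum of squares of Y, to the usual leave-one-out PRESS for OLS with an intercept, which is a convenient way to check an implementation. The names `correlation_transform` and `press_ridge` are assumptions for this example.

```python
import numpy as np

def correlation_transform(M):
    """Center each column and divide by s_j * sqrt(n - 1)."""
    M = np.asarray(M, dtype=float)
    centered = M - M.mean(axis=0)
    return centered / np.sqrt((centered ** 2).sum(axis=0))

def press_ridge(Xs, Ys, c):
    """PR_Ridge for standardized predictors Xs and response Ys."""
    n, p = Xs.shape
    H = Xs @ np.linalg.inv(Xs.T @ Xs + c * np.eye(p)) @ Xs.T   # H_c
    e = Ys - H @ Ys                                            # residuals e_{i,c}
    return np.sum((e / (1.0 - 1.0 / n - np.diag(H))) ** 2)

# Hypothetical collinear data (not the China series).
rng = np.random.default_rng(3)
n, p = 31, 3
trend = np.linspace(0.0, 1.0, n)
X = trend[:, None] + 0.05 * rng.normal(size=(n, p))
Y = 1.0 + X @ np.array([0.5, -0.3, 0.8]) + 0.05 * rng.normal(size=n)
Xs, Ys = correlation_transform(X), correlation_transform(Y)

c_grid = np.linspace(0.0, 0.05, 51)
press = np.array([press_ridge(Xs, Ys, c) for c in c_grid])
best_c = c_grid[np.argmin(press)]
```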
Generalized Cross Validation
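$$GCV = \frac{\sum_{i=1}^{n} e_{i,c}^2}{\left\{ n - \left[ 1 + tr\left( H_c \right) \right] \right\}^2}$$
$$e_{i,c} = Y_i^* - x_i^{*\prime} \hat{\gamma}_R(c) \qquad x_i^{*\prime} = \begin{bmatrix} X_{i1}^* & X_{i2}^* & \cdots & X_{ip}^* \end{bmatrix} \qquad \hat{\gamma}_R(c) = \left( X^{*\prime}X^* + cI \right)^{-1} X^{*\prime}Y^*$$
$$H_c = X^* \left( X^{*\prime}X^* + cI \right)^{-1} X^{*\prime} = \begin{bmatrix} h_{11,c} & h_{12,c} & \cdots & h_{1n,c} \\ h_{21,c} & h_{22,c} & \cdots & h_{2n,c} \\ \vdots & \vdots & & \vdots \\ h_{n1,c} & h_{n2,c} & \cdots & h_{nn,c} \end{bmatrix}$$
Goal: Choose c to minimize GCV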
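The GCV criterion, in the form given on this slide (residual sum of squares divided by the squared effective residual degrees of freedom), can be sketched as below. The data are hypothetical and the names `correlation_transform` and `gcv_ridge` are made up for this example.

```python
import numpy as np

def correlation_transform(M):
    """Center each column and divide by s_j * sqrt(n - 1)."""
    M = np.asarray(M, dtype=float)
    centered = M - M.mean(axis=0)
    return centered / np.sqrt((centered ** 2).sum(axis=0))

def gcv_ridge(Xs, Ys, c):
    """GCV = sum(e_{i,c}^2) / {n - [1 + tr(H_c)]}^2 for standardized Xs, Ys."""
    n, p = Xs.shape
    A = Xs.T @ Xs
    gamma_c = np.linalg.solve(A + c * np.eye(p), Xs.T @ Ys)
    e = Ys - Xs @ gamma_c
    lam = np.linalg.eigvalsh(A)
    tr_H = np.sum(lam / (lam + c))          # trace of H_c via eigenvalues
    return np.sum(e ** 2) / (n - (1.0 + tr_H)) ** 2

# Hypothetical collinear data (not the China series).
rng = np.random.default_rng(5)
n, p = 31, 3
trend = np.linspace(0.0, 1.0, n)
X = trend[:, None] + 0.05 * rng.normal(size=(n, p))
Y = 1.0 + X @ np.array([0.5, -0.3, 0.8]) + 0.05 * rng.normal(size=n)
Xs, Ys = correlation_transform(X), correlation_transform(Y)

c_grid = np.linspace(0.0, 0.05, 51)
gcv = np.array([gcv_ridge(Xs, Ys, c) for c in c_grid])
best_c = c_grid[np.argmin(gcv)]
```

Unlike PRESS, GCV needs only the trace of the hat matrix, so it avoids forming the n x n matrix.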
Cc , PRESS, GCV for China Carbon Data
c        C_c      PRESS    GCV
0.0000   6.0000   0.1348   0.00015984
0.0001   5.9166   0.1344   0.00015931
0.0002   5.8859   0.1344   0.00015912
0.0003   5.8991   0.1346   0.00015920
0.0004   5.9491   0.1352   0.00015951
0.0005   6.0297   0.1359   0.00016001
0.0006   6.1359   0.1369   0.00016067
0.0007   6.2635   0.1379   0.00016145
0.0008   6.4088   0.1391   0.00016234
0.0009   6.5689   0.1404   0.00016332
0.0010   6.7412   0.1417   0.00016436
0.0011   6.9235   0.1431   0.00016546
0.0012   7.1139   0.1445   0.00016660
0.0013   7.3110   0.1459   0.00016778
0.0014   7.5134   0.1474   0.00016899
0.0015   7.7198   0.1489   0.00017021
0.0016   7.9294   0.1504   0.00017145
All of these methods select very low values for c. The graphical methods tend to
choose larger values for the stabilization of the regression coefficients and VIFs.
Variance Inflation Factors
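$$VIF = \text{diagonal elements of } \mathbf{R}_{XX}^{-1} = \left( X^{*\prime}X^* \right)^{-1}$$
For Ridge Regression, we have:
$$VIF(c) = \text{diagonal elements of } \left( X^{*\prime}X^* + cI \right)^{-1} X^{*\prime}X^* \left( X^{*\prime}X^* + cI \right)^{-1}$$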
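The ridge VIFs can be sketched directly from the formula, again using the printed (rounded) R_XX, so the results are approximate. The function name `ridge_vif` is an assumption for this example. The sum of the ridge VIFs equals the sum of lambda_j/(lambda_j + c)^2, which ties this back to the total-variance comparison for c = 0.20.

```python
import numpy as np

R_XX = np.array([
    [ 1.0000,  0.9753,  0.9712, -0.9873,  0.9847],
    [ 0.9753,  1.0000,  0.9801, -0.9907,  0.9952],
    [ 0.9712,  0.9801,  1.0000, -0.9802,  0.9773],
    [-0.9873, -0.9907, -0.9802,  1.0000, -0.9942],
    [ 0.9847,  0.9952,  0.9773, -0.9942,  1.0000],
])

def ridge_vif(R, c):
    """Diagonal of (R + cI)^-1 R (R + cI)^-1; c = 0 gives the ordinary VIFs."""
    M = np.linalg.inv(R + c * np.eye(R.shape[0]))
    return np.diag(M @ R @ M)

vif_ols = ridge_vif(R_XX, 0.0)
vif_c20 = ridge_vif(R_XX, 0.20)

# VIFs along a hypothetical grid of c values, e.g. for a VIF-versus-c plot.
c_grid = np.linspace(0.0, 0.30, 31)
vif_path = np.array([ridge_vif(R_XX, c) for c in c_grid])
```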
Final Model – Estimated Regression Coefficients
• The residual-based measures Cc, PRESS, and GCV suggest very small values of c
• The Ridge Trace suggests a larger value, with coefficients stabilizing above c = 0.15 or so
• The VIF plot suggests values above c = 0.03, where all VIF values are less than 10
• The authors used c = 0.20, based on the ridge trace

Variable       beta-hat
lnPop           0.5540
lnUrban         0.3332
lnWorkforce     1.3212
lnHholdSz      -0.7835
lnExpend        0.1645
constant       -2.0923