Revised Chapter 10 in Specifying and Diagnostically Testing Econometric Models (Edition 3) © by Houston H. Stokes, 29 November 2011. All rights reserved. Preliminary Draft.

Chapter 10 Special Topics in OLS Estimation ..... 2
10.0 Introduction ..... 2
10.1 The QR Approach ..... 2
10.2 The Principal-Component Regression Model ..... 10
10.3 Ridge, Lasso and Elastic Net Models ..... 15
Figure 10.1 OLS vs GLM Yhat Values for Out-of-Sample Observations ..... 20
Figure 10.2 Out-of-Sample Residuals for OLS and GLM Models ..... 20
Figure 10.3 Effect of Changes in Lambda on the Out-of-Sample RSS ..... 21
Figure 10.4 RSS for Out-of-Sample Models vs Number of Vectors in the Model ..... 22
Table 10.1 Effect on the Out-of-Sample RSS of Values of λ and the Number of Coefficients ..... 23
10.4 Partial Least Squares and Continuum Regression Models ..... 23
Table 10.2 Matlab Code to Obtain PLS Model Solution Suggested by de Jong (1993) ..... 27
Table 10.3 B34S Implementation of SIMPLS Calculation ..... 28
Table 10.4 PLS1: the de Jong-Wise-Ricker (2001) PLS-CRM Estimation Approach ..... 30
Table 10.5 B34S Implementation of PLS1 including CRMTEST ..... 31
Table 10.6 Effect on the RSS of PCM, PLS and CRM Models of Varying Degrees ..... 35
Figure 10.6 Residual Sum of Squares for Various CR Models ..... 44
Figure 10.7 Residual Sum of Squares Surface ..... 45
Table 10.7 Setup for Analysis of Octane Data ..... 45
Figure 10.8 Octane vs NIR Data ..... 47
Figure 10.9 Sensitivity of Octane Model to # of Vectors and γ Setting ..... 48
Figure 10.10 Octane Model Matrix T ..... 49
Figure 10.11 Mapping of X Data to the PLS Vectors ..... 50
Table 10.8 Tests to Illustrate PLS Model Intermediate Calculations ..... 54
10.5 Boosting ..... 62
10.6 Extended Examples ..... 63
Table 10.9 Example File for Shrinkage Models ..... 63
Table 10.10 Ridge and Lasso Routines ..... 64
Table 10.11 LTS and LTS_REC Routines for Resistant Estimation ..... 70
Table 10.12 Estimation of LTS Based Models ..... 72
Table 10.13 Boosting Routine ..... 75
Table 10.14 Modified Forward Stagewise Model Boosting ..... 76
Table 10.15 Boosting Test Case ..... 77
Figure 10.12 OLS Boosting Example ..... 78
Table 10.16 Modifications to OLS Boosting to Facilitate Forecasting ..... 80
Table 10.17 Forecasting an OLS Boosting Model ..... 82
Table 10.18 Wampler Test Problem ..... 86
Table 10.19 Longley Test Data ..... 89
Table 10.20 Results from the Longley Total Equation ..... 92
Table 10.21 Correlation between the Error Term and Right-Hand Side ..... 93
Table 10.22 Effect of Data Precision on Accuracy Using the Grunfeld Data ..... 94
Table 10.23 Effect of Data Precision on Accuracy Using Gas Data ..... 99
Table 10.24 Matlab Commands to Replicate Accuracy Results Obtained with B34S ..... 105
Table 10.25 Correlation of the Residual and RHS Variables Using Matlab 2006b ..... 106
Table 10.26 Correlation of the Residual and RHS Variables Using Matlab 2007a ..... 106
10.7 Conclusion ..... 106

Special Topics in OLS Estimation

10.0 Introduction

The qr command in B34S allows calculation of a high-accuracy solution to the OLS regression problem and is discussed in section 10.1. Using this command, the principal component regression can be calculated to give further insight into possible rank problems; this model is discussed in section 10.2, where the singular value decomposition is introduced. The matrix command provides substantially more capability and can be used to provide additional insight. After an initial survey of the theory, a number of examples are presented. Further examples on this important topic are presented in section 16.7, which involves the matrix command and illustrates accuracy gains due to data precision and alternative methods of calculation.

Section 10.3 discusses the ridge, lasso, least trimmed squares and elastic net models, which can be used to shrink the variables on the right-hand side and/or remove outliers. These procedures are related to the principal component regression model, which can also be used to shrink the model. Section 10.4 is devoted to the partial least squares (PLS) procedure, which is shown to be a compromise, in terms of shrinkage, within the continuum of models running from OLS to principal component regression (PCR). The continuum power regression model is shown to be a more general setup involving OLS, PLS and PCR as special cases.
Section 10.5 is devoted to boosting, while extended examples for many of the procedures discussed are given in section 10.6.

10.1 The QR Approach

Interest in the QR approach to estimation was stimulated by Longley's (1967) seminal paper on computer accuracy.1 Equation (4.1-6) indicated how the OLS estimate $\hat\beta$, which is usually calculated as $(X'X)^{-1}X'y$, could be more accurately calculated as $R^{-1}Q_1'y$, where the T by K matrix X is initially factored as the product of the T by T orthogonal matrix Q and the upper triangular K by K Cholesky matrix R:

$Q'X = \begin{bmatrix} R \\ 0 \end{bmatrix}$   (10.1-1)

Matrix Q is usually partitioned as $Q = [Q_1 \; Q_2]$ so that

$X = Q_1 R$   (10.1-2)

where $Q_1$ is a T by K matrix that is usually kept in factored form as

$Q = H_1 H_2 \cdots H_K$   (10.1-3)

where $H_i$ is the Householder transformation (Strang 1976, 279). $Q_2$ is a T by (T − K) matrix. In terms of the QR factorization the parameter values and the fitted values are

$\hat\beta = (X'X)^{-1}X'y = (R'Q_1'Q_1R)^{-1}R'Q_1'y = (R'R)^{-1}R'Q_1'y = R^{-1}Q_1'y$   (10.1-4)

and

$X\hat\beta = Q_1RR^{-1}Q_1'y = Q_1Q_1'y$.   (10.1-5)

Davidson and MacKinnon (1993, 30) write $X\hat\beta = Q_1R\hat\beta \equiv Q_1\hat\gamma$. In view of (10.1-5),

$Q_1\hat\gamma = Q_1Q_1'y$  and  $\hat\gamma = Q_1'y$   (10.1-6)

from which the fitted values (10.1-5) can be calculated. Assuming T is large relative to K, $Q_1$ is T by K and $Q_2$ is a T by (T − K) matrix, which is substantially larger. Usually the "economy" QR is performed, where only $Q_1$ is calculated. In many cases even $Q_1$ is not needed, since only R is required to form $(X'X)^{-1} = (R'Q_1'Q_1R)^{-1} = R^{-1}(R^{-1})'$, from which it is possible to obtain the standard errors and $\hat\beta$ using a more accurate estimate of R than is obtained from the Cholesky factorization of $X'X$. An example of the gain will be shown below. Dongarra et al (1979, sec. 9.1) show that, given X is of rank K, the matrix

$P_X = Q_1Q_1'$   (10.1-7)

is the orthogonal projection onto the column space of X and

$P_X^{\perp} = Q_2Q_2'$   (10.1-8)

is the projection onto the orthogonal complement of X in view of equation (10.1-1). The residuals of an OLS model are constrained to be orthogonal to, or have zero correlation with, the right-hand-side variables of the equation. Using a regression, the mapping of X to $Q_1$ will be illustrated below. Given that the residuals are orthogonal to the fitted values, it follows from (10.1-2) that

$\hat e = P_X^{\perp} y = Q_2Q_2'y$.   (10.1-9)

1 The literature on the QR approach is vast. No attempt will be made to provide a summary in the limited space available in this book. Good references include Longley (1984), Dongarra et al (1979) and Strang (1976). Chapter 4 of this book provides a discussion of this approach applied to systems estimation. This chapter provides some examples and a brief intuitive discussion based in part on Dongarra et al (1979), which documents the LINPACK routines DQRDC and DQRSL. These routines are used for all QR calculations.
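Before turning to the B34S examples, the relationships in (10.1-4) and (10.1-5) can be illustrated with a short Matlab fragment. This is a minimal sketch on simulated data, not code from the text; the point is only that the economy QR reproduces the normal-equations estimate while keeping the residuals numerically orthogonal to the fitted values.

% Sketch of (10.1-4)-(10.1-5) on simulated, nearly collinear data
T = 100;
X = [ones(T,1) randn(T,2)];
X = [X, X(:,2) + 1e-8*randn(T,1)];    % near-duplicate column
y = X*[1; 2; 3; 4] + 0.1*randn(T,1);
b_chol = (X'*X)\(X'*y);               % normal-equations (Cholesky) route
[Q1,R] = qr(X,0);                     % economy QR: X = Q1*R
b_qr   = R\(Q1'*y);                   % beta = inv(R)*Q1'*y, eq. (10.1-4)
e_qr   = y - Q1*(Q1'*y);              % residuals via eq. (10.1-5)
disp((X*b_qr)'*e_qr)                  % inner product near 0: orthogonality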
The code listed next illustrates these relationships using the Theil (1971) textile dataset studied in Chapter 2. First the X matrix is built and the economy QR factorization performed. The columns of Q1 are shown to have Euclidean length unity and to be orthogonal (Q1'Q1 = I). Column 1 of Q1 is shown to be a scalar transformation of column 1 of X (-0.1205241588547691). OLS is used to show that the second and third columns of Q1 are linear transformations of columns 1-2 and 1-3 of X, respectively.

b34sexec options ginclude('b34sdata.mac') member(theil); b34srun;
b34sexec matrix;
call loaddata;
call echooff;
x=mfam(catcol(log10ri log10rpt,constant));
r=qr(x,q);
call print(x,q,r);
i=nocols(q);
test=array(i:);
do k=1,nocols(q);
   test(k)=sumsq(q(,k));
enddo;
call print(test);
col_1_q=q(,1); col_2_q=q(,2); col_3_q=q(,3);
col_1_x=x(,1); col_2_x=x(,2); col_3_x=x(,3);
s= afam(col_1_q)/afam(col_1_x);
call print('Scale factor for x(,1) => q(,1) ',S:);
call print('Second col of q linear transform of x Orthog. to Col 1':);
call olsq(col_2_q, col_1_x, col_2_x :noint :print);
call olsq(col_1_q, col_2_q :noint :print);
call olsq(col_3_q, col_1_x, col_2_x col_3_x :noint :print);
call print('Test of orthogonal Condition',transpose(q)*q);
b34srun;

Edited output:

B34S 8.10Z (D:M:Y) 21/ 8/06 (H:M:S) 14: 3:40   DATA STEP

Variable   # Cases  Mean      Std. Dev.     Variance      Maximum  Minimum  Label
TIME       17       1931.00   5.04975       25.5000       1939.00  1923.00  YEAR
CT         17       134.506   23.5773       555.891       168.000  99.0000  CONSUMPTION OF TEXTILES
RI         17       102.982   5.30097       28.1003       112.300  95.4000  REAL INCOME
RPT        17       76.3118   16.8662       284.470       101.000  52.6000  RELATIVE PRICE OF TEXTILES
LOG10CT    17       2.12214   0.791131E-01  0.625889E-02  2.22531  1.99564  LOG10(CONSUMPTION OF TEXTILES)
LOG10RI    17       2.01222   0.222587E-01  0.495451E-03  2.05038  1.97955  LOG10(REAL INCOME)
LOG10RPT   17       1.87258   0.961571E-01  0.924619E-02  2.00432  1.72099  LOG10(RELATIVE PRICE OF TEXTILES)
CONSTANT   17       1.00000   0.00000       0.00000       1.00000  1.00000

Number of observations in data file 17
Current missing variable code 1.000000000000000E+31
Data begins on (D:M:Y) 1: 1:1923 ends 1: 1:1939. Frequency is 1

B34S(r) Matrix Command. d/m/y 21/ 8/06. h:m:s 14: 3:40.

=> CALL LOADDATA$
=> CALL ECHOOFF$

X = Matrix of 17 by 3 elements

  Obs       1          2          3
   1     1.98543    2.00432    1.00000
   2     1.99167    2.00043    1.00000
   3     2.00000    2.00000    1.00000
   4     2.02078    1.95713    1.00000
   5     2.02078    1.93702    1.00000
   6     2.03941    1.95279    1.00000
   7     2.04454    1.95713    1.00000
   8     2.05038    1.91803    1.00000
   9     2.03862    1.84572    1.00000
  10     2.02243    1.81558    1.00000
  11     2.00732    1.78746    1.00000
  12     1.97955    1.79588    1.00000
  13     1.98408    1.80346    1.00000
  14     1.98945    1.72099    1.00000
  15     2.01030    1.77597    1.00000
  16     2.00689    1.77452    1.00000
  17     2.01620    1.78746    1.00000

Q = Matrix of 17 by 3 elements

  Obs       1              2               3
   1    -0.239292      -0.417759       -0.306449
   2    -0.240044      -0.391904       -0.236160
   3    -0.241048      -0.370073       -0.142443
   4    -0.243552      -0.204205        0.920310E-01
   5    -0.243552      -0.150576        0.923998E-01
   6    -0.245799      -0.146393        0.301762
   7    -0.246416      -0.145235        0.359337
   8    -0.247120      -0.264908E-01    0.425744
   9    -0.245703       0.137147        0.294795
  10    -0.243751       0.177336        0.113219
  11    -0.241931       0.214822       -0.561974E-01
  12    -0.238583       0.123456       -0.368744
  13    -0.239129       0.114489       -0.317944
  14    -0.239777       0.347728       -0.255997
  15    -0.242290       0.252841       -0.224782E-01
  16    -0.241879       0.248275       -0.607658E-01
  17    -0.243000       0.236848        0.436465E-01

R = Matrix of 3 by 3 elements

   1    -8.29709   -7.72132   -4.12287
   2     0.00000   -0.375029   0.304283E-03
   3     0.00000    0.00000   -0.442434E-01

TEST = Array of 3 elements

   1.00000   1.00000   1.00000

Scale factor for x(,1) => q(,1)   -0.1205241588547691

Second col of q linear transform of x Orthog. to Col 1

Note the perfect fit of columns 1 and 2 of X mapping to column 2 of Q1.
Ordinary Least Squares Estimation

Dependent variable                  COL_2_Q
Centered R**2                       1.000000000000000
Adjusted R**2                       1.000000000000000
Residual Sum of Squares             2.033498177717717E-26
Residual Variance                   1.355665451811812E-27
Standard Error                      3.681936245797599E-14
Total Sum of Squares                0.9999999945536433
Log Likelihood                      502.7987247346952
Mean of the Dependent Variable      1.789899162686381E-05
Std. Error of Dependent Variable    0.2499999993192054
Sum Absolute Residuals              5.317413176442187E-13
1/Condition XPX                     5.668199248898641E-04
Maximum Absolute Residual           5.750955267558311E-14
Number of Observations              17

Variable   Lag   Coefficient   SE               t
COL_1_X    0     2.4814260     0.91472228E-13   0.27127643E+14
COL_2_X    0     -2.6664627    0.98177458E-13   -0.27159623E+14

Here column 1 of Q1 is orthogonal to column 2 of Q1.

Ordinary Least Squares Estimation

Dependent variable                  COL_1_Q
Centered R**2                       -8683.232715995069
Adjusted R**2                       -8683.232715995069
Residual Sum of Squares             0.9999999999999993
Residual Variance                   6.249999999999996E-02
Standard Error                      0.2499999999999999
Total Sum of Squares                1.151512209199723E-04
Log Likelihood                      -3.964164000159326E-02
Mean of the Dependent Variable      -0.2425216604976431
Std. Error of Dependent Variable    2.682713422544098E-03
Sum Absolute Residuals              4.122868228459933
1/Condition XPX                     0.9999999999999999
Maximum Absolute Residual           0.2471202954562588
Number of Observations              17

Variable   Lag   Coefficient       SE           t
COL_2_Q    0     -0.27061686E-15   0.25000000   -0.10824674E-14

Column 3 of Q1 is shown to be a linear transform of X.

Ordinary Least Squares Estimation

Dependent variable                  COL_3_Q
Centered R**2                       1.000000000000000
Adjusted R**2                       1.000000000000000
Residual Sum of Squares             1.675885395409665E-23
Residual Variance                   1.197060996721189E-24
Standard Error                      1.094102827307008E-12
Total Sum of Squares                0.9998848542254362
Log Likelihood                      445.7268402712293
Mean of the Dependent Variable      -2.602552757714062E-03
Std. Error of Dependent Variable    0.2499856063638260
Sum Absolute Residuals              1.409913852334910E-11
1/Condition XPX                     9.474231625417810E-06
Maximum Absolute Residual           1.713684749660160E-12
Number of Observations              17

Variable   Lag   Coefficient       SE               t
COL_1_X    0     11.248239         0.12603327E-10   0.89248168E+12
COL_2_X    0     -0.18338531E-01   0.29174534E-11   -0.62858008E+10
COL_3_X    0     -22.602243        0.24729178E-10   -0.91399087E+12

Test of orthogonal Condition

Matrix of 3 by 3 elements

        1                2                3
1    1.00000         -0.270617E-15    -0.213371E-15
2   -0.270617E-15     1.00000         -0.100614E-15
3   -0.213371E-15    -0.100614E-15     1.00000

B34S Matrix Command Ending. Last Command reached.
Space available in allocator 8856736, peak space used 3330
Number variables used 63, peak number used 63
Number temp variables used 51, # user temp clean 0

The next example illustrates the gains of the QR approach. The gas data example studied in Chapters 7 and 8 is used to show accuracy gains obtained by use of the QR factorization on a simple problem. Later in this chapter a more difficult problem will be shown.

/$ Illustrates OLS Capability under Matrix Command
b34sexec options ginclude('b34sdata.mac') member(gas); b34srun;
b34sexec matrix;
call load(qr_small :staging);
call echooff;
call loaddata;
nlag=6;
call olsq(gasout gasin{0 to nlag} gasout{1 to nlag} :print :savex);
call print(ccf(%y, %res));
call print(ccf(%yhat,%res));
/;
/; Large QR used for illustration. Q2 is large!!!
/; error2 equation uses q1 => uses the economy qr. This is the
/; best way to proceed
/;
r=qr(%x,q:);
call qr_small(%x,q,r,q1,q2,r_small);
/; call print(q,q1,q2);
yhat  =q1*transpose(q1)*%y;
error =q2*transpose(q2)*%y;
error2=%y - yhat;
beta=inv(r_small)*transpose(q1)*%y;
call print('Beta from QR ',beta);
call print(ccf(%y,error));
call print(ccf(yhat,error));
call print(ccf(yhat,error2));
/; call tabulate(%y,%yhat,yhat,%res,error,error2);
call print(' ':);
call print('Study Error Buildup using Cholesky':);
call print(' ':);
/; excessive problem
maxlag=40;
chol_r=vector(maxlag:);
qr_r  =vector(maxlag:);
r_cond=vector(maxlag:);
do i=1,maxlag;
/;
/; :qr call uses linpack to get OLS. This is close to LAPACK QR( )
/;
   call olsq(gasout gasin{0 to i} gasout{1 to i} :savex);
   chol_r(i)=ccf(%yhat , %res);
   r_cond(i)=%rcond;
/; call olsq(gasout gasin{0 to i} gasout{1 to i} :qr :savex);
/; chol_r(i)=ccf(%yhat , %res);
/;
/; Use economy size qr to save space!!
/;
   r=qr(%x,q);
   qr_yhat  =q*transpose(q)*%y;
   qr_error =%y - qr_yhat;
   qr_r(i)  =ccf(qr_yhat,qr_error);
enddo;
call tabulate(chol_r,qr_r r_cond :title
   'As maxlag increases accuracy declines');
b34srun;

When lags of 1-6 are used, as suggested by Tiao-Box (1981), the reciprocal of the matrix condition was found to be 2.34596E-08. The correlation between the Cholesky ê and ŷ was -0.1333E-10, while for the QR this was -0.4148E-14, which is substantially smaller. When an excessive number of lags (40) was used, the condition fell to 0.5827E-10 and the correlations were -0.1387E-10 and -0.3607E-13, respectively. Output documenting these findings is shown below.

=> CALL LOAD(QR_SMALL :STAGING)$
=> CALL ECHOOFF$

Ordinary Least Squares Estimation

Dependent variable                  GASOUT
Centered R**2                       0.9946789205951879
Adjusted R**2                       0.9944282900435120
Residual Sum of Squares             16.09378007198891
Residual Variance                   5.831079736227866E-02
Standard Error                      0.2414762873705794
Total Sum of Squares                3024.532965517241
Log Likelihood                      7.767793572930756
Mean of the Dependent Variable      53.50965517241379
Std. Error of Dependent Variable    3.235044356946151
Sum Absolute Residuals              48.28024672308078
F(13, 276)                          3968.705786042084
F Significance                      1.000000000000000
1/Condition XPX                     2.345968875600428E-08
Maximum Absolute Residual           1.407426755603240
Number of Observations              290

Variable   Lag   Coefficient       SE               t
GASIN      0     -0.67324068E-01   0.76805415E-01   -0.87655367
GASIN      1     0.19318519        0.16668178       1.1590061
GASIN      2     -0.21454694       0.18914123       -1.1343214
GASIN      3     -0.42981100       0.18922276       -2.2714551
GASIN      4     0.14122227        0.19069299       0.74057401
GASIN      5     -0.94767767E-01   0.18185376       -0.52112076
GASIN      6     0.23492127        0.11100544       2.1163041
GASOUT     1     1.5418090         0.59960417E-01   25.713781
GASOUT     2     -0.58620686       0.11056171       -5.3020786
GASOUT     3     -0.17641567       0.11539164       -1.5288427
GASOUT     4     0.13419248        0.11472181       1.1697207
GASOUT     5     0.54079963E-01    0.10092430       0.53584681
GASOUT     6     -0.40030303E-01   0.42973854E-01   -0.93150370
CONSTANT   0     3.8759484         0.85787179       4.5180975

The correlation of y and ŷ with the Cholesky error is shown:

 0.72945729E-01
-0.18895630E-10

Next $\hat\beta$ is calculated using the QR method and the same correlations are performed, with one difference.
When $\hat e = y - \hat y$ is used, the correlation is 0.4827531E-14, while when ê is calculated using $Q_2$ as in (10.1-8) the correlation is slightly smaller in absolute value, -0.350777E-14.2

Beta from QR

BETA = Vector of 14 elements

-0.673241E-01    0.193185        -0.214547       -0.429811
 0.141222       -0.947678E-01     0.234921        1.54181
-0.586207       -0.176416         0.134192        0.540800E-01
-0.400303E-01    3.87595

 0.72945729E-01
-0.35077737E-14
 0.48275313E-14

Study Error Buildup using Cholesky

As maxlag increases accuracy declines

Obs   CHOL_R        QR_R          R_COND
  1   -0.3512E-11   -0.7724E-15   0.9180E-06
  2   -0.1841E-10    0.1020E-13   0.2720E-06
  3   -0.2981E-10   -0.8277E-15   0.8986E-07
  4   -0.2070E-10   -0.5568E-14   0.5034E-07
  5   -0.1927E-10    0.1700E-13   0.3355E-07
  6   -0.1890E-10    0.4828E-14   0.2346E-07
  7   -0.1908E-10   -0.9341E-14   0.1800E-07
  8   -0.1333E-10   -0.4148E-14   0.1364E-07
  9   -0.1823E-10    0.1090E-13   0.1009E-07
 10   -0.1827E-10   -0.2608E-13   0.6999E-08
 11   -0.1523E-10    0.1335E-13   0.5429E-08
 12   -0.1454E-10   -0.1528E-13   0.4008E-08
 13   -0.1479E-10    0.2589E-13   0.3039E-08
 14   -0.2019E-10    0.2707E-14   0.2460E-08
 15   -0.1711E-10   -0.2973E-13   0.1896E-08
 16   -0.1432E-10    0.1527E-14   0.1574E-08
 17   -0.1526E-10   -0.2731E-13   0.1299E-08
 18   -0.1733E-10   -0.1140E-13   0.1075E-08
 19   -0.1958E-10   -0.9592E-14   0.9116E-09
 20   -0.1701E-10   -0.2558E-13   0.5930E-09
 21   -0.1990E-10    0.3035E-13   0.4891E-09
 22   -0.1744E-10   -0.3318E-14   0.4295E-09
 23   -0.1978E-10   -0.7585E-14   0.3684E-09
 24   -0.1676E-10   -0.1069E-13   0.3191E-09
 25   -0.1614E-10    0.3729E-13   0.2692E-09
 26   -0.1279E-10   -0.6407E-14   0.2405E-09
 27   -0.1606E-10   -0.3884E-13   0.2157E-09
 28   -0.1862E-10    0.2220E-13   0.2008E-09
 29   -0.1539E-10   -0.5976E-14   0.1900E-09
 30   -0.1945E-10    0.1394E-13   0.1783E-09
 31   -0.2438E-10   -0.5711E-14   0.1660E-09
 32   -0.1603E-10   -0.1808E-13   0.1540E-09
 33   -0.1894E-10    0.3219E-13   0.1394E-09
 34   -0.1650E-10    0.2197E-14   0.1561E-09
 35   -0.8106E-11    0.3452E-13   0.1393E-09
 36   -0.1313E-10   -0.4354E-14   0.1041E-09
 37   -0.2144E-10    0.2191E-13   0.1067E-09
 38   -0.1345E-10    0.1394E-13   0.8881E-10
 39   -0.1293E-10   -0.6742E-13   0.7338E-10
 40   -0.1387E-10   -0.3607E-13   0.5827E-10

B34S Matrix Command Ending. Last Command reached.

The qr command has a number of options that facilitate its use in cases in which rank may be a problem. The EPS option allows the user to specify a nonnegative number such that the condition number of X'X must be < (1/EPS). If EPS is not supplied, it defaults to V * 1.0E-16, where V is the largest sum of absolute values in any row of X. The IFREZ option allows the user to require that certain variables be placed in any final model when all variables cannot be entered in the regression due to rank problems.3

The B34S qr command provides easy access to the singular value decomposition procedure, although much more power is contained in the matrix procedure capability of the same name.4 Use of this capability is explained in the next section.

2 Later in this chapter the effect of data precision on the accuracy of QR vs Cholesky calculation is studied using two different datasets. In those examples the estimated correlation between the residual and the right-hand-side variables of the model is tested.
3 The Cholesky decomposition requires on the order of K³ + NK²/2 operations while the QR decomposition requires on the order of NK² operations. If K > (N/2) then the QR is both faster and more accurate than the Cholesky. When K = (N/2) the speeds are the same, while for smaller rank problems the Cholesky is faster. The QR decomposition requires more memory than the Cholesky.
4 The matrix command r=qr(x,q); or r=qr(x,q,q1,q2); will perform a QR factorization of X. The QR approach to OLS is implemented by the :qr option on the call olsq( ); command. Outside of the matrix command, the qr command will do QR and PC estimation.
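The error-buildup experiment above can be mimicked outside B34S. The following Matlab fragment is a rough, hedged analogue using simulated data in place of the gas series: as nearly redundant columns are added, rcond(X'X) falls and the normal-equations residuals drift away from exact orthogonality to the fitted values, while the QR residuals stay near machine zero.

% Rough analogue of 'Study Error Buildup using Cholesky'
T = 300;
X = [ones(T,1) randn(T,1)];
for j = 3:12
    X = [X, X(:,j-1) + 1e-2*randn(T,1)];  % add a nearly redundant column
    y = sum(X,2) + randn(T,1);
    b1 = (X'*X)\(X'*y);                   % Cholesky / normal equations
    [Q1,R] = qr(X,0);
    b2 = R\(Q1'*y);                       % economy QR
    fprintf('k=%2d rcond=%9.2e NE=%10.2e QR=%10.2e\n', j, ...
            rcond(X'*X), (X*b1)'*(y-X*b1), (X*b2)'*(y-X*b2));
end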
The following commands will estimate both the QR approach to the OLS model and the associated principal component regression for a model $y = f(x_1, x_2, x_3)$:

b34sexec qr ipcc=pcreglist$
model y = x1 x2 x3$
b34seend$

Values printed include $\hat\beta_j$, $t_j$, U, V, $\hat Y$, $\hat\alpha_j$, and $t_{\alpha_j}$. If IPCC=PCREG, then U, V, $\hat Y$, and $\hat e$ are not printed. The principal component regression model is discussed next.

10.2 The Principal-Component Regression Model

The principal-component regression model can easily be calculated from the data and provides a useful transformation, especially in cases in which there is collinearity.5 Assume X is a T by K matrix. The singular value decomposition of X is

$X = U \Sigma V'$   (10.2-1)

where U is T by r, $\Sigma$ is r by r, V is K by r, $T \ge K$ and $r \le K$. U and V have the property that

$U'U = I$   (10.2-2)

and

$V'V = I$   (10.2-3)

$\Sigma_{ij} = 0$ for $i \ne j$ and the elements $\Sigma_{ii}$ are the square roots of the nonzero eigenvalues of $X'X$ or $XX'$. This result builds on the fact that while there are T eigenvalues of $XX'$, only K are nonzero, and these are in fact the eigenvalues of $X'X$. The rows of $V'$ are the eigenvectors of $X'X$, while the columns of U are the eigenvectors of $XX'$. This can easily be seen if we note that, using (10.2-1), $X'X = V\Sigma U'U\Sigma V' = V\Sigma^2 V'$ and $XX' = U\Sigma V'V\Sigma U' = U\Sigma^2 U'$. If K = r, then X is of full rank. If we replace X in equation (2.1-7) by its singular value decomposition, we obtain

$Y = X\beta + e = U\Sigma V'\beta + e$   (10.2-4)

which can be written

$Y = U\alpha + e$   (10.2-5)

if we define

$\alpha \equiv \Sigma V'\beta$.   (10.2-6)

An estimate $\hat\alpha$ can be calculated from equation (2.1-8) as

$\hat\alpha = (U'U)^{-1}U'y = U'y$   (10.2-7)

$\hat\beta = (V')^{-1}\Sigma^{-1}\hat\alpha = V\Sigma^{-1}\hat\alpha$   (10.2-8)

where $\hat\alpha = [\hat\alpha_1, \ldots, \hat\alpha_k]$ and $\hat\beta = [\hat\beta_1, \ldots, \hat\beta_k]$. Mandel (1982) points out that the variance of each coefficient is

$var(\hat\beta_j) = \sum_{m=1}^{k} [v_{jm}^2 / \Sigma_{mm}^2] \hat\sigma^2$   (10.2-9)

where $\hat\sigma^2$ is defined in equation (2.1-10), $v_{jm}$ is an element of matrix V, and

$var(\hat\alpha_j) = \hat\sigma^2$   (10.2-10)

Equation (10.2-9) shows that it is possible to determine just what vectors in X are causing the increase in the variance, since the elements of V are bounded by one in absolute value. As the smallest $\Sigma_{mm}$ approaches 0, $var(\hat\beta_j)$ approaches infinity. The t test for each coefficient $\hat\alpha_j$ is

$t_{\alpha_j} = \hat\alpha_j / \hat\sigma$   (10.2-11)

in view of equation (10.2-10).

The singular value decomposition can be used to illustrate the problems of near collinearity. Following Mandel (1982), from equation (10.2-1) we write

$U\Sigma = XV$   (10.2-12)

or

$U \begin{bmatrix} \Sigma_a & 0 \\ 0 & \Sigma_b \end{bmatrix} = X [V_a \; V_b]$   (10.2-13)

where $\Sigma_a$ and $\Sigma_b$ are square diagonal matrices that contain, respectively, the $k_a$ larger and the $k_b$ very small singular values ($k_a + k_b = K$) along the diagonals. $V_a$ and $V_b$ are K by $k_a$ and K by $k_b$, respectively.

5 Mandel (1982) provides a good summary of the uses of the principal-component regression and the discussion here has benefited from that treatment. The LINPACK FORTRAN routine DSVDC has been used to calculate the singular value decomposition. This routine has been found to be substantially more accurate than the singular value decomposition routines in versions 8 and 9 of the IMSL Library and substantially faster than the corresponding NAG routine to calculate the singular value decomposition. Stokes (1983, 1984) implemented LINPACK in SPEAKEASY 7. Subsequent testing of the SPEAKEASY LINKULE SVDCM, which uses DSVDC, against the LINKULE SVD, which uses the NAG routine, documented the speed advantage.
In terms of our prior notation, $r = k_a$. If X is close to not being full rank, then

$X V_b \approx 0$   (10.2-14)

since $U[0 \; \Sigma_b]' \to 0$ as $\Sigma_b$ approaches zero. Equation (10.2-14) is very important in understanding why predictions of OLS models with rank problems in X have high variance in cases of near collinearity and why there is no unique solution for $\hat\beta_i$ when there is exact collinearity $(r < K)$. The perfect collinearity case will be discussed first. Assume the vector $\hat\beta$ is partitioned into $\hat\beta_a$ containing $k_a$ elements and $\hat\beta_b$ containing $k_b$ elements. Given a new vector $x_{T+j}$ consisting of K elements, a forecast of y in period T + j can be calculated from

$\hat y_{T+j} = x_{T+j}(\hat\beta_a \; \hat\beta_b)' = (x_{a,T+j} \; x_{b,T+j})(\hat\beta_a \; \hat\beta_b)'$   (10.2-15)

where $x_{T+j}$ is partitioned into $x_{a,T+j}$ and $x_{b,T+j}$. Following Mandel (1982), if $Z = \Sigma V'$, where Z is r by K, equation (10.2-8) becomes

$Z\hat\beta = (Z_a \; Z_b)(\hat\beta_a \; \hat\beta_b)' = \hat\alpha$   (10.2-16)

where Z has been partitioned into $Z_a$, which is $k_a$ by $k_a$, and $Z_b$, which is $k_a$ by $k_b$. From equation (10.2-16), $\hat\beta_a$ is written as

$\hat\beta_a = -Z_a^{-1} Z_b \hat\beta_b + Z_a^{-1}\hat\alpha$   (10.2-17)

Equation (10.2-17) shows that if a value for $\hat\beta_b$ is arbitrarily determined, $\hat\beta_a$ is uniquely determined. Using equation (10.2-17), we substitute for $\hat\beta_a$ in equation (10.2-15):

$\hat y_{T+j} = x_{a,T+j}(Z_a^{-1}\hat\alpha) + (x_{b,T+j} - x_{a,T+j} Z_a^{-1} Z_b)\hat\beta_b$   (10.2-18)

Equation (10.2-18) suggests that the only way we can make $\hat y_{T+j}$ independent of any arbitrary value of $\hat\beta_b$ is to impose the constraint

$x_{b,T+j} = x_{a,T+j} Z_a^{-1} Z_b$   (10.2-19)

which implies that

$\hat y_{T+j} = x_{a,T+j}(Z_a^{-1}\hat\alpha)$.   (10.2-20)

We note that

$Z_a^{-1} Z_b = (V_a')^{-1} V_b'$   (10.2-21)

where $V'$ has been partitioned as we did with Z. The above discussion repeats Mandel's (1982) important proof that if there is collinearity such that $r < K$, there is no unique solution for $\hat y_{T+j}$, given a vector $x_{T+j}$, except when the new x vector $(x_{T+j})$ fulfills equation (10.2-19).

Next, near collinearity will be discussed. Consider a new vector $x_{T+j}$ from which we want to obtain a prediction $\hat y_{T+j}$. If $x_{T+j}$ satisfies the near collinearity condition of the X matrix expressed in equation (10.2-14), then from equation (10.2-1) we can write

$x_{T+j} = u_{T+j} \Sigma V'$   (10.2-22)

which, since $\Sigma^{-1}$ exists, can be written

$u_{T+j} = x_{T+j} V \Sigma^{-1}$   (10.2-23)

where $x_{T+j}$ and $u_{T+j}$ are K element vectors. From equation (10.2-7)

$\hat y_{T+j} = u_{T+j}\hat\alpha$   (10.2-24)

and, in view of equation (10.2-10),

$var(\hat y_{T+j}) = \hat\sigma^2 \sum_{i=1}^{K} u_i^2$   (10.2-25)

From equation (10.2-13)

$u_{T+j} = (x_{T+j}V_a \;\; x_{T+j}V_b) \begin{bmatrix} \Sigma_a^{-1} & 0 \\ 0 & \Sigma_b^{-1} \end{bmatrix}$   (10.2-26)

Assume the new vector $x_{T+j}$ satisfies the near collinearity condition of the X matrix expressed in equation (10.2-14); then $x_{T+j}V_b \approx 0$ and the variance of $\hat y_{T+j}$ will be small. However, in the case where $x_{T+j}V_b \ne 0$, the value $(x_{T+j}V_b)\Sigma_b^{-1}$ will be very large because of having to invert $\Sigma_b$, which has small values along the diagonal. These small values will imply large values in $u_{T+j}$ from equation (10.2-26) and a large variance of $\hat y_{T+j}$ from equation (10.2-25). This rather long discussion, which has benefited from Mandel's (1982) excellent paper, has stressed the problems of using an OLS regression model for prediction purposes when the original X matrix has collinearity problems. The singular value decomposition approach to OLS estimation has been shown to highlight the effect of collinearity, which potentially impacts OLS, ARIMA, VAR and VARMA models. The ridge regression model and the Lasso procedure are two ways to deal with this problem in a structured manner.6

6 The stepwise and best regression approaches, discussed in Chapter 2, are other alternatives that have their advantages and disadvantages.
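As a concrete illustration of (10.2-7) and (10.2-8), the following Matlab fragment is a minimal principal-component regression sketch (not from the text; X and y are assumed to already exist): near-zero singular values are dropped and the component estimates are mapped back to $\hat\beta$.

% Minimal PCR sketch: X is T by K, y is T by 1
[U,S,V] = svd(X,'econ');           % eq. (10.2-1): X = U*S*V'
s = diag(S);
r = sum(s > s(1)*1e-14);           % drop near-zero singular values
Ur = U(:,1:r); Vr = V(:,1:r); sr = s(1:r);
alpha_hat = Ur'*y;                 % eq. (10.2-7)
beta_pcr  = Vr*(alpha_hat./sr);    % eq. (10.2-8): beta = V*inv(Sigma)*alpha
yhat = Ur*alpha_hat;               % fitted values on the retained components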
10.3 Ridge, Lasso and Elastic Net Models

The ridge regression model of Hoerl-Kennard (1970), as discussed in Hastie-Tibshirani-Friedman (2001, 59; 2009, 61), is a shrinkage method of estimation that involves calculation of

$\hat\beta^{ridge} = \arg\min_\beta \{ \sum_{i=1}^{T} (y_i - \beta_0 - \sum_{j=1}^{k} x_{ij}\beta_j)^2 + \lambda \sum_{j=1}^{k} \beta_j^2 \}$   (10.3-1)

The ridge coefficients minimize a penalized residual sum of squares, where $\lambda \ge 0$ controls the amount of shrinkage. $\lambda = 0.0$ implies an OLS model. If the inputs are centered, $\hat\beta_0 = \sum_{i=1}^{T} y_i / T$ and $X^*$ has the vector of 1's removed, the rest of the ridge coefficients can be estimated as

$\hat\beta^{ridge} = (X^{*\prime}X^* + \lambda I)^{-1} X^{*\prime} y$.   (10.3-2)

Note that

$X\hat\beta^{ridge} = X^*(X^{*\prime}X^* + \lambda I)^{-1}X^{*\prime}y = U\Sigma(\Sigma^2 + \lambda I)^{-1}\Sigma U'y = \sum_{j=1}^{k} u_j \frac{\Sigma_{jj}^2}{\Sigma_{jj}^2 + \lambda} u_j'y$   (10.3-4)

where $u_j$ are the columns of U from (10.2-1), $\Sigma_{jj}$ is the jth diagonal element of $\Sigma$, and $X^*$ was used in place of X. Equation (10.3-4) shows that for $\lambda > 0$ the ridge estimates are smaller than their OLS counterparts. The columns of $X^*$ with the least variance will be associated with the smallest $\Sigma_{jj}$. An example is shown below.

The Lasso is another shrinkage procedure which, following Hastie-Tibshirani-Friedman (2001, 72; 2009, 68), can be estimated as

$\hat\beta^{lasso} = \arg\min_\beta \sum_{i=1}^{T} [y_i - \beta_0 - \sum_{j=1}^{k}(x_{ij} - \bar x_j)\beta_j]^2$ subject to $\sum_{j=1}^{k} |\beta_j|^d \le t$   (10.3-5)

with d = 1. Setting d = 2 corresponds to the ridge regression (10.3-1), while d = 0 corresponds to variable subset selection. Examples of the ridge and lasso models are presented in section 10.6 in Tables 10.9 and 10.10.

A related approach is the least trimmed squares (LTS) model discussed in Faraway (2005, 101-103), where one minimizes $\sum_{i=1}^{q} \hat u_{(i)}^2$, where $q \le T$ and $\hat u_{(i)}^2$ is the ith smallest squared residual. The smaller q, the more outliers are removed from the dataset. Compared to the full-sample estimates, inspection of the LTS coefficients will indicate how sensitive the results are to possibly rogue observations that are coming from a different distribution or have something wrong/strange with the data. The LTS estimates are an example of a resistant regression method and are illustrated in section 10.6 in Tables 10.11 and 10.12.

The Elastic Net model, discussed next, combines the ridge and lasso modeling approaches and provides a major step forward in view of the progress made in easily computing the solutions of lasso and ridge models over a range of $\lambda$ values.7 The basic Elastic Net model assumes both $\lambda$ and $\alpha$ and minimizes

$\frac{1}{2T}\sum_{i=1}^{T}(y_i - \beta_0 - x_i'\beta)^2 + \lambda\sum_{j=1}^{k}P_\alpha(\beta_j)$, where $P_\alpha(\beta_j) = .5(1-\alpha)\beta_j^2 + \alpha|\beta_j|$   (10.3-6)

Equation (10.3-6) is a compromise between a ridge regression penalty $(\alpha = 0)$ and a lasso penalty $(\alpha = 1)$.8

7 Zou-Hastie (2005) first proposed the Elastic Net Model. Friedman-Hastie-Tibshirani (2009) provided details on a fast way to compute the general linear model using coordinate descent. Further details on this approach are given in Hastie-Tibshirani-Friedman (2009, 662).
8 The elastic net model is implemented in the GLM command, which uses GPL code developed by Friedman-Hastie-Tibshirani in a 2009 working paper.
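A hedged Matlab sketch of the ridge calculation in (10.3-2), not from the text, makes the shrinkage visible: center the data, sweep $\lambda$, and watch each coefficient move toward zero as $\lambda$ grows.

% Ridge path sketch: X (T by k, no constant column) and y assumed to exist
Xc = X - repmat(mean(X),size(X,1),1);   % center the inputs
yc = y - mean(y);                       % beta0_hat = mean(y) per (10.3-1)
k = size(Xc,2);
lambdas = [0 .01 .1 1 10 100];
B = zeros(k,numel(lambdas));
for m = 1:numel(lambdas)
    B(:,m) = (Xc'*Xc + lambdas(m)*eye(k))\(Xc'*yc);   % eq. (10.3-2)
end
disp(B)    % columns shrink toward zero as lambda increases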
The code listed below tests a number of the capabilities of the GLM model. The problem is to test the extent to which out-of-sample performance is impacted by shrinking the model. The exact number of lags (6) used by Tiao-Box (1981) is used and a holdout sample of 100 is assumed. The GLM switch ne=9 restricts the model to nine coefficients and measures the effect on the R**2 and e'e.

/; Illustrates effect of reduction in model on the out of sample
/; performance
/;
b34sexec options ginclude('gas.b34'); b34srun;
b34sexec matrix;
call loaddata;
call load(glm_info :staging);
call echooff;
/; logic works for holdout > 0
/; set max VAR lag and holdout # of obs
/;
/; k=max lag, parm(0.0)=> ridge (1.0) lasso, nlam = # models tried
k=6;
holdout=100;
maxlag=k;
nlam=100;
lam_min=.0001;
thr =.1e-6;
parm = .5;
ne = 9;
/; At issue is how are yhat values calculated within the sample treated?
/; For purposes of this analysis they are removed
call olsq(gasout gasout{1 to k} gasin{1 to k} :print :savex
   :holdout holdout);
%yfuture= gasout(integers(norows(%x)+maxlag,norows(gasout)));
olsf=mfam(%xfuture)*vfam(%coef);
olsfss=sumsq(afam(%yfuture)-afam(olsf));
call print(' ':);
call print('ols forecast error sum sq ',olsfss:);
call print(' ':);
call glm(gasout gasout{1 to k} gasin{1 to k} :print :savex
   :lamdamin lam_min :nlam nlam :holdout holdout :thr thr
   :parm parm :ne ne);
call print(' ':);
%yfuture= gasout(integers(norows(%x)+maxlag,norows(gasout)));
call glm_info(%yfuture,%xfuture,%coef,%a0,%alm,glmf, fss,mod,1);
call print(' ':);
call print('glm forecast sum of squares ',fss:);
call print(' ':);
res_ols=vfam(%yfuture)-olsf;
res_glm=vfam(%yfuture)-glmf;
call graph(%yfuture,olsf,glmf :nolabel :nocontact :pgborder
   :file 'ols_glm_yhat_oos.wmf'
   :heading 'OLS vs GLM yhat out-of-sample');
call graph( res_ols res_glm :nolabel :nocontact :pgborder
   :file 'ols_glm_res_oos.wmf'
   :heading 'OLS vs GLM residual out-of-sample');
call tabulate(%yfuture,olsf,glmf res_ols res_glm);
/; tests of reduction loss as a function of restriction
tparm=grid(.1,.9,.1);
fsstest=array(norows(tparm):);
nein=ne;
do jj=1,norows(tparm);
   call glm(gasout gasout{1 to k} gasin{1 to k} :savex
      :lamdamin lam_min :nlam nlam :holdout holdout :thr thr
      :parm tparm(jj) :ne ne);
   call glm_info(%yfuture,%xfuture,%coef,%a0,%alm,glmf, fss,mod,1);
   fsstest(jj)=fss;
enddo;
call tabulate(tparm fsstest);
call graph(tparm fsstest :plottype xyplot :grid :pgborder
   :nolabel :nocontact
   :file 'PARM_vs_OSS.wmf'
   :heading 'Parm vs GLM residual SS out of sample');
rss_glm=array(2*k:);
ne_used=array(2*k:);
icount=1;
%yfuture= gasout(integers(norows(%x)+maxlag,norows(gasout)));
do ne=1,2*k;
   ii=ne;
   call glm(gasout gasout{1 to k} gasin{1 to k} :savex
      :lamdamin lam_min :nlam nlam :holdout holdout :thr thr
      :parm parm :ne ii
/;    :print
      );
/; call print('+++++++++++++++++++++++++':);
   call glm_info(%yfuture,%xfuture,%coef,%a0,%alm,glmf,fss, mod,0);
   rss_glm(icount)=sfam(fss);
   ne_used(icount)=sfam(dfloat(ii));
   icount=icount+1;
enddo;
rss_glm =dropfirst(rss_glm,1);
ne_used =dropfirst(ne_used,1);
call tabulate(rss_glm,ne_used);
call graph(ne_used rss_glm :grid
   :heading 'RSS out-of-sample as a function of model reduction'
   :plottype xyplot :nocontact :nolabel :pgborder
   :file 'rss_loss.wmf');
b34srun;

Edited and annotated output follows:
Ordinary Least Squares Estimation

Dependent variable                  GASOUT
Centered R**2                       0.9979718278979760
Adjusted R**2                       0.9978343247046184
Residual Sum of Squares             3.682663208127010
Residual Variance                   2.080600682557632E-02
Standard Error                      0.1442428744360578
Total Sum of Squares                1815.754789473684
Log Likelihood                      105.0235276580087
Mean of the Dependent Variable      52.50894736842105
Std. Error of Dependent Variable    3.099543224133753
Sum Absolute Residuals              20.92257736335549
F(12, 177)                          7257.808371787274
F Significance                      1.000000000000000
1/Condition XPX                     3.637698112128508E-09
Maximum Absolute Residual           0.4505041686010856
Number of Observations              190
Holdout option reduced sample by    100

Variable   Lag   Coefficient       SE               t
GASOUT     1     1.2285222         0.74956376E-01   16.389829
GASOUT     2     -0.29063287       0.11828994       -2.4569534
GASOUT     3     -0.24342211       0.11861680       -2.0521723
GASOUT     4     0.20531898        0.11606267       1.7690355
GASOUT     5     -0.89608071E-01   0.10219641       -0.87682210
GASOUT     6     0.31553886E-01    0.40767543E-01   0.77399529
GASIN      1     0.12786487        0.52721231E-01   2.4253013
GASIN      2     -0.32396180       0.11198859       -2.8928107
GASIN      3     -0.49884894       0.12726385       -3.9198008
GASIN      4     0.25235636        0.13183588       1.9141706
GASIN      5     -0.19926975       0.12969996       -1.5363902
GASIN      6     0.12738336        0.99759679E-01   1.2769022
CONSTANT   0     8.4137071         1.3930658        6.0397055

ols forecast error sum sq   52.36509239330390

The OLS out-of-sample residual sum of squares was found to be 52.365. A GLM model is shown next, where the maximum number of variables in the model is assumed to be 9. In equation (10.3-6) it is assumed that $\alpha = .5$; the last $\lambda$ considered was 0.142E-01. Figures 10.1 and 10.2 show out-of-sample $\hat y$ and residuals for the OLS and GLM models.

Generalized Linear Model via Coordinated Descent
Version 5/17/2008 converted to real*8 10/28/2009

Number of Observations                          190
Holdout option reduced sample by                100
Number of right hand side variables             12
Maximum number of variables in model (:ne)      9
Number of lamda values considered (:nlam)       100
Minimum lamda (:flmin)                          1.000000000000000E-04
Covariance updating algorithm selected (:cua)
Percent lasso (:parm)                           0.5000000000000000
Analysis on standardized predictor variables
Converge threshold for Lamda solution (:thr)    1.000000000000000E-07

Left Hand Side Variable GASOUT

Series    Mean    Max     Min
GASOUT    52.51   60.20   45.60

Right Hand Side Variables

 #   Series   Lag   Mean     Max     Min      Penalty
 1   GASOUT   1     52.52    60.20   45.60    1.000
 2   GASOUT   2     52.53    60.20   45.60    1.000
 3   GASOUT   3     52.53    60.20   45.60    1.000
 4   GASOUT   4     52.53    60.20   45.60    1.000
 5   GASOUT   5     52.53    60.20   45.60    1.000
 6   GASOUT   6     52.53    60.20   45.60    1.000
 7   GASIN    1     0.2094   2.834   -2.716   1.000
 8   GASIN    2     0.2091   2.834   -2.716   1.000
 9   GASIN    3     0.2064   2.834   -2.716   1.000
10   GASIN    4     0.2025   2.834   -2.716   1.000
11   GASIN    5     0.1980   2.834   -2.716   1.000
12   GASIN    6     0.1933   2.834   -2.716   1.000

Total Number of passes over the data        14807
Maximum R**2                                0.9966397334158390
Last Lamda Value Considered                 1.422720271397891E-02
Residual Sum of Squares for last model      6.101420144071417
Sum of Absolute Residuals for last model    26.03246518892973
Largest Absolute residual for last model    0.6581948928894832

[Figure: 'OLS vs GLM yhat out-of-sample' — %YFUTURE, OLSF and GLMF plotted against observations 0-100.]

Figure 10.1 OLS vs GLM Yhat Values for Out-of-Sample Observations

[Figure: 'OLS vs GLM residual out-of-sample' — RES_OLS and RES_GLM plotted against observations 0-100.]

Figure 10.2 Out-of-Sample Residuals for OLS and GLM Models

Figure 10.3 shows the effect of changing the value of $\alpha$ on the out-of-sample residual sum of squares. The residual sum of squares is relatively insensitive to $\alpha$ values in the range of .3 to 1.0. Figure 10.4 shows the sum of squares is relatively insensitive to models with 8 or more coefficients.
More detail on these results is presented in Table 10.1.

[Figure: 'Parm vs GLM residual SS out of sample' — out-of-sample RSS plotted against TPARM values .10 to .90.]

Figure 10.3 Effect of Changes in Lambda on the Out-of-Sample RSS

[Figure: 'RSS out-of-sample as a function of model reduction' — out-of-sample RSS plotted against NE_USED values 2 to 12.]

Figure 10.4 RSS for Out-of-Sample Models vs Number of Vectors in the Model

Assuming $\alpha = .5$, the out-of-sample sum of squares varied between 1338.78 and 85.43, depending on the settings used. Table 10.1 shows added detail as $\alpha$ is changed from .1 to .9.

Table 10.1 Effect on the Out-of-Sample RSS of Values of λ and the Number of Coefficients
____________________________________________________________________

Obs   TPARM    FSSTEST        Obs   RSS_GLM   NE_USED
 1    0.1000   691.0            1   1190.     2.000
 2    0.2000   247.3            2   1055.     3.000
 3    0.3000   74.50            3   933.6     4.000
 4    0.4000   69.11            4   821.6     5.000
 5    0.5000   85.43            5   271.1     6.000
 6    0.6000   84.92            6   194.8     7.000
 7    0.7000   84.35            7   85.43     8.000
 8    0.8000   83.34            8   85.43     9.000
 9    0.9000   82.23            9   85.43     10.00
                               10   85.43     11.00
                               11   85.43     12.00
____________________________________________________________________
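For readers working outside B34S, the same elastic net experiment can be approximated with the Matlab Statistics Toolbox lasso function, which also implements the penalty of (10.3-6). This is a hedged sketch, with X and y standing in for the lagged gas data built above.

% Elastic-net path: 'Alpha' plays the role of :parm, 'NumLambda' of :nlam
[B,FitInfo] = lasso(X,y,'Alpha',0.5,'NumLambda',100);
nonzero = sum(B ~= 0);                     % coefficients kept at each lambda
yhat = X*B(:,1) + FitInfo.Intercept(1);    % fit at one lambda on the path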
Define T as the PLS component or orthogonal vectors and gs ( ) as the Gram-Schmidt orthogonalization11 of ( ) where T gs ( K ) (10.4-2) given K [ULU ' y, UL2U ' y, ,ULaU ' y] (10.4-3) and a is the number of columns of the PLS model. Given p U ' y/ | y | (10.4-4) is the vector of correlations of y with the non-zero principle components, the canonical form of K is K [ Lp, L2 p, , La p] (10.4-5) 9 This reference will be used to develop the math in the following discuission. 10 The SVD decomposition shows X USV '. X ' X VSU 'USV ' VS 2V '. For a positive definite matrix such as X ' X then V ' U . S 2 L. 11 The Gram-Schmidt orthogonalization on a matrix is usually U from the SVD of the matrix. Special Topics in OLS Estimation 10-25 from which we can define the orthonormal basis of K as T gs( K ) U 'T (10.4-6) T UT XR (10.4-7) where R are the weights given by R VL.5U 'UT VL.5T (10.4-8) The importance of R is that it shows how each orthogonal column in T is related to the original data in X . The loadings c of T are defined as c y 'T (T 'T )1 y 'UT (10.4-9) The loadings of T with respect to X are P X 'T (T 'T )1 X 'T VL.5U 'UT VL.5T (10.4-10) The fraction of variance of y explained by the PLS model is 2 RyT y 'T (T 'T )1T ' y / y ' y cc '/ | y |2 p 'TT ' p (10.4-11) where p U ' y / | y | . The PLS fitted y and PLS coefficient vector PLS are yˆ PLS T (T 'T ) 1T ' y Tc ' XRc ' (10.4-12) and ˆPLS Rc ' (10.4-13) respectively. From R we can obtain the implied vector ˆOLS as ˆOLS RˆPLS (10.4-14) since the PLS coefficients are related to their implied OLS coefficients. Either can be used to obtain ŷ . T PLS y X OLS yˆ (10.4-15) 10-26 Chapter 10 The importance of (10.4-14) is that inspection of the implied OLS coefficients OLS shows how the latent vectors in T are related to the underlying data vectors in X. If the number of latent vectors is less than the number of original vectors in X, the resulting residual sum of squares will be larger than would occur if all PLS coefficients were used. Table 10.2 contains the Matlab code to estimate the PLS model using the de Jong (1993) method of analysis. Table 10.3 shows the b34s implementation of this approach. Although this approach is outdated, it is still in use in sas proc pls and in the Matlab plsregress command. Special Topics in OLS Estimation Table 10.2 Matlab Code to Obtain PLS Model Solution suggested by de Jong (1993) function [Xloadings,Yloadings,Xscores,Yscores,Weights]=simpls(X0,Y0,ncomp) [n,dx] = size(X0); dy = size(Y0,2); % outClass = superiorfloat(X0,Y0); Xloadings = zeros(dx,ncomp,outClass); Yloadings = zeros(dy,ncomp,outClass); if nargout > 2 Xscores = zeros(n,ncomp,outClass); Yscores = zeros(n,ncomp,outClass); if nargout > 4 Weights = zeros(dx,ncomp,outClass); end end % Each new basis vector can be removed from Cov separately. V = zeros(dx,ncomp); Cov = X0'*Y0; for i = 1:ncomp % Find unit length ti=X0*ri and % is jointly maximized, subject [ri,si,ci] = svd(Cov,'econ'); ti = X0*ri; normti = norm(ti); ti = ti ./ Xloadings(:,i) = X0'*ti; qi = si*ci/normti; % = Y0'*ti Yloadings(:,i) = qi; if nargout > Xscores(:,i) Yscores(:,i) if nargout > Weights(:,i) end end ui=Y0*ci whose covariance, ri'*X0'*Y0*ci, to ti'*tj=0 for j=1:(i-1). 
Table 10.2 Matlab Code to Obtain PLS Model Solution suggested by de Jong (1993)

function [Xloadings,Yloadings,Xscores,Yscores,Weights] = simpls(X0,Y0,ncomp)
[n,dx] = size(X0);
dy = size(Y0,2);
outClass = superiorfloat(X0,Y0);
Xloadings = zeros(dx,ncomp,outClass);
Yloadings = zeros(dy,ncomp,outClass);
if nargout > 2
    Xscores = zeros(n,ncomp,outClass);
    Yscores = zeros(n,ncomp,outClass);
    if nargout > 4
        Weights = zeros(dx,ncomp,outClass);
    end
end
% Each new basis vector can be removed from Cov separately.
V = zeros(dx,ncomp);
Cov = X0'*Y0;
for i = 1:ncomp
    % Find unit length ti=X0*ri and ui=Y0*ci whose covariance,
    % ri'*X0'*Y0*ci, is jointly maximized, subject to ti'*tj=0
    % for j=1:(i-1).
    [ri,si,ci] = svd(Cov,'econ');
    ri = ri(:,1); ci = ci(:,1); si = si(1);
    ti = X0*ri;
    normti = norm(ti);
    ti = ti ./ normti;                 % ti'*ti == 1
    Xloadings(:,i) = X0'*ti;
    qi = si*ci/normti;                 % = Y0'*ti
    Yloadings(:,i) = qi;
    if nargout > 2
        Xscores(:,i) = ti;
        Yscores(:,i) = Y0*qi;          % = Y0*(Y0'*ti), proportional to Y0*ci
        if nargout > 4
            Weights(:,i) = ri ./ normti;   % rescaled weights
        end
    end
    % Update the orthonormal basis.
    vi = Xloadings(:,i);
    for repeat = 1:2
        for j = 1:i-1
            vj = V(:,j);
            vi = vi - (vj'*vi)*vj;
        end
    end
    vi = vi ./ norm(vi);
    V(:,i) = vi;
    Cov = Cov - vi*(vi'*Cov);
    Vi = V(:,1:i);
    Cov = Cov - Vi*(Vi'*Cov);
end
if nargout > 2
    for i = 1:ncomp
        ui = Yscores(:,i);
        for repeat = 1:2
            for j = 1:i-1
                tj = Xscores(:,j);
                ui = ui - (tj'*ui)*tj;
            end
        end
        Yscores(:,i) = ui;
    end
end

Table 10.3 B34S Implementation of SIMPLS Calculation

subroutine pls_reg(y,x,pls_coef,xload,yload,xscores,
 yscores,weights,yhat,pls_res,rss,ncomp,iprint);
/; Partial Least Squares. See Wold (1975)
/; pls_reg is designed to 100% track the Matlab simpls routine
/; from which this discussion of PLS and PC regression has been
/; developed.
/;
/; The Matlab code came from
/; de Jong, S. "SIMPLS: An Alternative Approach to Partial Least Squares
/; Regression." Chemometrics and Intelligent Laboratory Systems. Vol.
/; 18, 1993, pp. 251-263.
/;
/; A newer reference is:
/;
/; y        => left hand variable. Usually %y from olsq with :savex
/;             Can include more than 1 col!
/; x        => right hand variable. Usually %x from olsq with :savex
/;             n by k
/; pls_coef => The pls_beta. Calculated as:
/;             pls_beta=weights*transpose(yload) for each.
/;             pls_coef is a (k,ncomp) matrix of coefficients. If
/;             ncomp is set = k, then pls_coef(,ncomp) is the same
/;             as the ols coefficients
/; xload
/; yload
/; xscores
/; yscores
/; weights
/; yhat     => Predicted y value
/; res      => Residual for last PLS regression
/; rss      => Residual Sum of Squares for 1,...,ncomp models
/; ncomp    => # of cols in pls_beta
/; iprint   => =0 print nothing, =1 print results
/;
/; Built 28 April 2011 by Houston H. Stokes
/;
k=nocols(x);
kk=nocols(y);
n=norows(x);
if(ncomp.gt.k)then;
call epprint(
'ERROR: ncomp must be 0 < ncomp le # cols of x.
was',ncomp:); call epprint( ' # of columns of x was ',k:); go to finish; endif; if(norows(y).ne.norows(x))then; call epprint('ERROR: # of obs in y and x not the same':); go to finish; endif; if(kk.gt.1)then; call epprint( 'ERROR: This release of pls_reg limited to one left hand variable':); go to finish; endif; meany=mean(y); meanx=array(k:); y0=y-meany; x0=x; do i=1,k; meanx(i)=mean(x(,i)); x0(,i)=mfam(afam(x(,i))-sfam(afam(meanx(i)))); enddo; vbig vi vibig xload yload xscores yscores weights =matrix(k,ncomp:); =matrix(k,1:); =matrix(k,1:); =matrix(k,ncomp:); =matrix(kk,ncomp:); =matrix(n,ncomp:); =matrix(n,ncomp:); =matrix(k,ncomp:); Special Topics in OLS Estimation rss =vector(ncomp:); pls_beta=matrix(k,ncomp:); cov=matrix(nocols(x0),1:transpose(x0)*y0); do i=1,ncomp; s=svd(cov,ibad,21,uu,vv ); if(ibad.ne.0)then; call epprint('ERROR: SVD of cov failed':); go to finish; endif; ri=uu(,1); ci=vv(,1); si=s(1); ti=x0*ri; normti=sqrt(sumsq(ti)); if(normti.le.0.0)then; call epprint('ERROR: Norm of ti le 0.0':); go to finish; endif; ti=vfam(afam(ti)/normti); xload(,i)=transpose(x0)*ti; qi=si*ci/normti; yload(,i)=qi; xscores(,i)=ti; yscores(,i)=y0*qi; weights(,i)=vfam(afam(ri)/sfam(normti)); vi(,1)=xload(,i); do repeat=1,2; if(i.gt.1)then; do j=1,(i-1); vj(,1)=vbig(,j); vi=mfam(afam(vi)-(sfam(transpose(vj)*vi)*afam(vj))); enddo; endif; enddo; normvi=sqrt(sumsq(vi)); if(normvi.le.0.0)then; call epprint('ERROR: Norm of vi le 0.0':); go to finish; endif; xjunk=afam(afam(vi)/ normvi); vi=matrix(norows(xload),1:xjunk); vbig(,i)=vi(,1); cov=cov - (vi*(transpose(vi)*cov)); vibig=submatrix(vbig,1,norows(vbig),1,i); cov=cov - (vibig*(transpose(vibig)*cov)); do iii=1,ncomp; ui=yscores(,iii); do repeat=1,2; if(iii.gt.1)then; do j=1,(iii-1); tj=xscores(,j); xwork=sfam(tj*ui); ui=mfam(afam(ui)-(xwork*afam(tj))); enddo; endif; enddo; yscores(,iii)=ui; enddo; pls_beta=weights*transpose(yload); adj=array(nocols(x):); do jj=1,nocols(x); adj(jj)=meanx(jj)*sfam(pls_beta(jj,1)); enddo; scale=meany-sum(adj); 10-29 10-30 Chapter 10 jj=norows(pls_beta); pls_beta(jj,1)=scale; yhat=x*pls_beta; pls_res=vfam(afam(y)-afam(yhat)); rss(i)=sumsq(pls_res); pls_coef(,i)=pls_beta(,1); enddo; if(iprint.ne.0)then; call print(' ':); iix=nocols(x); call print('Partial Least Squares - 26 April 2011 Version' :); call print('Number Columns in origional data ',iix :); call print('Number Columns in PLS Coefficient Vector',ncomp:); call print('PlS sum of squared errors ',rss(ncomp):); endif; /; go to done; finish continue; yhat=missing(); phs_beta=missing(); done continue; return; end; Table 10.4 PLS1 the Jong-Wise-Ricker (2001) PLS-CRM Estimation Approach function [c,R,P,B_CPR,R2X,R2y,T,B_CPRmh] = pls1(x,y,A,alpha,yORG,xORG) % % Code suggested by Jong-Wise-Ricker j Chemometrics 2001, 15: 85-100 % % Inputs: % x x matrix % y y matrix % A dimensionality of PLS model % alpha =0, .5 1. for OLS, PLS and PC % % Outputs: % T orthonormal PLS component scores where T = XR % R weights % P % B_CPR betas - PLS regression vector % c loadings of T with respect to y % +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ % % Code from de Jong, Wise, Ricker (2001) % 29 April 2011 version. 
% Additions by Michael Hunstad
% +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[U,L,V]=svd(x,0);                % SVD
L=diag(L.^2);                    % eigenvalues
r=sum(L>(L(1)/1e14));            % rank of x
L=L(1:r);                        % non-zero eigenvalues
U=U(:,1:r);                      % unit-length PC scores
V=V(:,1:r);                      % PC weights
A=min(A,r);                      % dimensionality
gam = alpha/(1-alpha);           % continuum power exponent
Lgam=L.^gam;                     % 'powered' eigenvalues
rho=(y'*U)';                     % |y| * corr(y,PCs)
T=zeros(r,A);                    % initialize T
for a=1:A
  t=Lgam.*rho;                   % a column of (XX')^gam * y
  t=t-T(:,1:max(1,a-1))*(T(:,1:max(1,a-1))'*t);  % orthogonalize t
  t=t/sqrt(t'*t);                % normalize t
  T(:,a)=t;                      % store in T
  rho=rho-t*(t'*rho);            % residual of rho w.r.t. t
end
c=rho'*T;                        % loadings c of T with respect to y
cmh = y'*U*T;                    % this is added
R=V*(diag(1./sqrt(L))*T);
P=V*(diag(sqrt(L))*T);
B_CPR=R*triu(c'*ones(size(c)));  % ones(size(c)) has been fixed
meanY = mean(yORG);              % mean of original data - i.e., not de-meaned
xORG(:,end+1) = ones(size(xORG,1),1);
meanX = mean(xORG);              % mean of original data - i.e., not de-meaned
B_CPRmh = R*cmh';
B_CPRmh(end) = [meanY - meanX*B_CPRmh];
R2X=100*1'*cumsum(T'.^2)'/sum(1);  % R-squared on X, Eq. 18
R2y=100*cumsum(c.^2)/(y'*y);       % R-squared on y,  Eq. 17
T=U*T;                             % Eq. 11
% note that throughout the algorithm T is actually T-tilde
% until the last step

The algorithm logic in Tables 10.2 and 10.3 shows that initially the covariance matrix of X and y is formed. The SVD is then repeatedly calculated as the information contained in each T vector is used to update the covariance matrix, so a number of loops are required. In contrast, the de Jong-Wise-Ricker (2001) algorithm achieves the same result with only one loop and one SVD calculation. The Matlab logic is shown in Table 10.4 and its B34S implementation in Table 10.5. Comments have been added to show additions to the Matlab code. The crmtest program was developed to graphically display the effect of changes in γ on the residual sum of squares.

Table 10.5 B34S Implementation of PLS1 including CRMTEST

/;
/; Also includes crmtest to graphically study CRM Model
/;
subroutine pls1_reg(y,x,y0,x0,r,pls_beta,
 u,v,s,pls_coef,yhat,pls_res,rss,ncomp1,gamma,iprint);
/;
/; Partial Least Squares. See Wold (1975).
/;
/; This subroutine allows the user to artificially decrease (gamma < 1)
/; or increase (gamma > 1) the degree of multicollinearity in the X
/; data. Only one SVD is used, in contrast to the simpls approach
/; coded in pls_reg, which should increase performance. This is
/; called canonical PLS.
/;
/; pls_reg and pls1_reg take the response variable into account, and
/; therefore often lead to models that are able to fit the response
/; variable with fewer components. pls1_reg is designed to implement
/; the de Jong-Wise-Ricker Matlab code from which this discussion of
/; PLS and PC regression has been developed. Alternative Matlab code
/; came from de Jong, S. "SIMPLS: An Alternative Approach to Partial
/; Least Squares Regression," Chemometrics and Intelligent Laboratory
/; Systems, Vol. 18, 1993, pp. 251-263, and was implemented in B34S
/; as pls_reg. That approach is substantially slower and has less
/; capability. The complete reference for this code is: de Jong,
/; Sijmen, Barry Wise and N. Lawrence Ricker, "Canonical Partial
/; Least Squares and Continuum Power Regression," Journal of
/; Chemometrics, Vol. 15, 2001, pp. 85-100.
/;
/; pc_reg creates components to explain the observed variability in
/; the predictor variables, without considering the response variable
/; at all.
/;
/; y        => left hand variable.   Usually %y from olsq with :savex
/; x        => right hand variables. Usually %x from olsq with :savex
/; y0       => y with mean removed
/; x0       => X with means subtracted except for last col
/; r        => weight matrix. Note bigt=x0*r from equation (16)
/; pls_beta => pls_beta such that x0*r*pls_beta + mean(y) maps to
/;             yhat, i.e., t*pls_beta + mean(y) since t = x0*r
/; u        => n by k from svd(x)
/; v        => from svd(x), where x=u*diagmat(s)*transpose(v)
/; s        => singular values
/; pls_coef => If ncomp is set = k, then pls_coef is the same as the
/;             OLS coefficients. This might change in future releases
/;             to be similar to pls_reg.
/;             Note: (t*pls_beta)+mean(y) = x*pls_coef = yhat
/; yhat     => predicted y value for last regression
/; pls_res  => residual for last PLS regression
/; rss      => residual sum of squares for all ncomp models
/; ncomp1   => # of cols in pls_beta
/; gamma    => continuum power setting. Use caution changing gamma:
/;             gamma = 0 => OLS
/;             gamma = 1 => PLS
/;             gamma > 1 => increasing multicollinearity in dataset;
/;                          set .le. 20. gamma => infinity gives PC
/;             gamma < 1 => decrease multicollinearity in dataset
/; iprint   => =0 print nothing, =1 print results,
/;             =2 suppress coef list
/;
/; Note: bigt = x0*r;  bigt = u*t_tilda => t_tilda = transpose(u)*bigt
/;       loadings of bigt with respect to y0:
/;          cc = y0*u*t_tilda = y0*bigt
/;       fitted y0 = bigt*cc = x0*r*y0*x0*r
/;
/; Built 11 May 2011 by Houston H. Stokes
/; Contributions made by Michael Hunstad to Matlab version.
/; Equation numbers refer to de Jong-Wise-Ricker (2001).
/;
kin=nocols(x);
kk=nocols(y);
n=norows(x);
ncomp=ncomp1;
if(ncomp.gt.kin)then;
  call epprint(
   'ERROR: ncomp must be 0 < ncomp le # cols of x. was',ncomp:);
  call epprint(
   '       # of columns of x was ',kin:);
  go to finish;
endif;
if(norows(y).ne.norows(x))then;
  call epprint('ERROR: # of obs in y and x not the same':);
  go to finish;
endif;
if(kk.gt.1)then;
  call epprint(
   'ERROR: This release of pls1_reg limited to one left hand variable':);
  go to finish;
endif;
meany=mean(y);
meanx=array(kin:);
y0=y-meany;
x0=x;
do i=1,(kin-1);
  meanx(i)=mean(x(,i));
  x0(,i)=mfam(afam(x(,i))-sfam(afam(meanx(i))));
enddo;
s=svd(x0,ibad,21,u,v );
if(ibad.ne.0)then;
  call epprint('ERROR: SVD x0 failed':);
  go to finish;
endif;
L=afam(s)*afam(s);
k=idint(sum(L.gt.(sfam(L(1))/1.e+14)));
L=L(integers(1,k));
ncomp=min1(ncomp,k);
u=submatrix(u,1,norows(u),1,k);
v=submatrix(v,1,norows(v),1,k);
Lgam=vfam(afam(L)**gamma);
rho=(y0*u);
bigt=matrix(k,ncomp:);
rss=vector(ncomp:);
do i=1,ncomp;
  maxcol=max1(1,i-1);
  t=afam(Lgam)*afam(rho);
  t=t- afam(submatrix(bigt,1,k,1,maxcol)*
     (transpose(submatrix(bigt,1,k,1,maxcol))*vfam(t)));
  t=vfam(afam(t)/sqrt(sumsq((t))));
  bigt(,i)=vfam(t);
  rho=rho-(vfam(t)*(vfam(t)*vfam(rho)));
  /; Equation (16) (10.4-13)
  pls_beta=y0*u*bigt;
  yhat=(u*bigt*pls_beta) + meany;
  rss(i)=sumsq((afam(y)-afam(yhat)));
enddo;
/; Equation (14) (10.4-8)
r=v*(diagmat((1./dsqrt(afam(L))))*bigt);
/; Equation (22) (10.4-13)
pls_coef=r*pls_beta;
i=norows(pls_coef);
pls_coef(i)=sfam(meany)-sfam(vfam(meanx)*pls_coef);
/; Equation (11) (10.4-7)
bigt=u*bigt;
yhat=x*pls_coef;
pls_res=vfam(afam(y)-afam(yhat));
tss=variance(y)*dfloat(norows(y)-1);
rsq=1.0-afam(rss(ncomp)/tss);
if(iprint.ne.0)then;
  call print(' ':);
  iix=nocols(x);
  call print('Partial Least Squares PLS1 - 9 May 2011 Version. ':);
  call print('Logic from de Jong, Wise, Ricker (2001) Matlab Code':);
  call print('Number of rows in original data    ',norows(x):);
  call print('Number Columns in original data    ',iix:);
  call print('Number Columns in PLS Coefficient Vector ',ncomp:);
  if(ncomp.lt.ncomp1)
   call print('Note: PLS coefficient vector reduced due to rank of X':);
  call print('Gamma                              ',gamma:);
  call print('Mean of left hand variable         ',meany:);
  call print('PLS sum of squared errors          ',rss(ncomp):);
  call print('Total sum of squares               ',tss:);
  call print('PLS R^2                            ',rsq:);
  if(iprint.ne.2)then;
    call tabulate(pls_beta,pls_coef
     :title '(T*pls_beta)+mean(y) = x*pls_coef');
  endif;
endif;
/;
go to done;
finish continue;
yhat=missing();
pls_coef=missing();
done continue;
return;
end;

subroutine crmtest(y,x,ncomp,gammag,rsstest,rote,iprint,noshow);
/;
/; Investigates the effect of changes in gamma on the RSS
/; Various gamma => alternate Continuum Regression Models
/;
/; y       => left hand variable. Must be vector
/; x       => right hand variable matrix with constant included
/; ncomp   => # of PLS/CR vectors
/; gammag  => Vector of gamma values
/; rsstest => Matrix of RSS values for 1-ncomp vectors and gammag
/; rote    => Sets rotation
/; iprint  => =0 do not give pls1_reg output
/;            =1 give pls1_reg output
/;            =2 do not give pls1_reg coef list
/; noshow  => =1 Just produce graph in crm_test.wmf
/;            =0 show graph and save graph
/;
/; 0 < gamma < 1  => Multicollinearity taken from X matrix
/; 1 < gamma < 15.=> Multicollinearity added to X matrix.
/; gamma = 1      => PLS Model. A large value of gamma
/;                   approaches PC
/;
/; Subroutine crmtest built 23 May 2010 by Houston H. Stokes
/; +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
/;
ii=ncomp;
jj=norows(gammag);
rsstest=matrix(ii,jj:);
do j=1,jj;
  gamma=gammag(j);
  call pls1_reg(y,x,y0,x0,r,c,u,v,s,pls2coef,yhat,
   pls_res,pls_rss,ncomp,gamma,iprint);
  rsstest(,j)=pls_rss;
enddo;
scaleadj=array(4: dfloat(1), gammag(1),
               dfloat(ii),gammag(norows(gammag)));
if(noshow.eq.1)then;
  call graph(rsstest :plottype meshc :d3axis :d3border :grid :noshow
   :rotation rote :pgborder :file 'crm_test.wmf'
   :xlabel '# Vectors' :ylabelleft 'Gamma' :pgunits scaleadj
   :heading 'RSS vs # vectors and gamma');
endif;
if(noshow.ne.1)then;
  call graph(rsstest :plottype meshc :d3axis :d3border :grid
   :rotation rote :pgborder :file 'crm_test.wmf'
   :xlabel '# Vectors' :ylabelleft 'Gamma' :pgunits scaleadj
   :heading 'RSS vs # vectors and gamma');
endif;
return;
end;
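Before turning to the driving program, it is worth collecting the identities that pls1_reg prints and that Table 10.8 later verifies numerically. In the notation of the listing (y0 and X0 the de-meaned data, R the weight matrix, T = X0R the latent vectors with orthonormal columns), a sketch of the algebra, using the chapter's equation labels where they appear in the code comments:

   T = X0 R,      β̂_PLS = T′y0,      ŷ = T β̂_PLS + ȳ = X β̂_coef   (10.4-12)

   β̂_coef = R β̂_PLS,  with the constant term reset to  ȳ − x̄′β̂_coef .

When ncomp is set to the number of columns of X, β̂_coef collapses to the OLS coefficients, which is the "base case" used repeatedly in the examples that follow.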
Table 10.6 is the driving program to analyse the gas data using the PLS and CRM methods of analysis.

Table 10.6 Effect on the RSS of PCM, PLS and CRM Models of Varying Degrees

b34sexec options ginclude('gas.b34'); b34srun;
b34sexec matrix;
call loaddata;
call load(pls_reg);
call load(pls1_reg);
call load(pc_reg);
call echooff;
nn=6;
call olsq(gasout gasin{1 to nn} gasout{1 to nn} :print :savex);
ols_coef=%coef;
ols_rss =%rss;
iprint=1;
%xhold=%x;
%yhold=%y;
/; Test model - As setup will stop
testmod=1;
if(testmod.ne.0)then;
  iprint=2;
  noshow=0;
  ncomp=4;
  gammag=grid(.05,1.,.01);
  call print(gammag);
  rote=270.;
  call crmtest(%y,%x, ncomp,gammag,rsstest,rote,iprint,noshow);
  call print(rsstest);
  call stop;
endif;
call pc_reg(%y,%x,ols_coef,ols_rss,tss,pc_coef,
 pcrss,pc_size,u,iprint);
jj=integers(norows(pcrss),1,-1);
pc_rss=pcrss(jj);
/;
/; Note: Use caution changing gamma for pls1_reg
/;   gamma = 0 => OLS
/;   gamma = 1 => PLS
/;   gamma > 1 => increasing multicollinearity in dataset
/;   gamma < 1 => decrease multicollinearity in dataset
/;
ncompmax=9;
/;
/; simpls code: Left in for testing purposes only.
/; pls1_reg much faster!
/;
/; call pls_reg(%y,%x,pls_coef,xload,yload,xscores,
/;  yscores,weights,yhat,error_pls,pls_rss,ncompmax,iprint);
/;
/; call tabulate(pc_rss,pls_rss);
/;
gamma= 1.;
call pls1_reg(%y,%x,y0,x0,r,pls_beta,u,v,s,pls_coef,yhat,
 pls_res,pls_rss,ncompmax,gamma,iprint);
k1=norows(pls_rss);
k2=norows(pc_rss);
if(k1.lt.k2)call deleterow(pc_rss,k1+1,(k2-k1));
%x=%xhold;
%y=%yhold;
rss_1p=pls_rss;
ncompmax=6;
gamma= 1.;
call pls1_reg(%y,%x,y0,x0,r,pls_beta,u,v,s,pls_coef,yhat,
 pls_res,pls_rss,ncompmax,gamma,iprint);
jj=integers(norows(pcrss),1,-1);
pc_rss=pcrss(jj);
gamma= .1;
call pls1_reg(%y,%x,y0,x0,r,pls_beta,u,v,s,pls_coef,yhat,
 pls_res,rss_p1 ,ncompmax,gamma,iprint);
gamma= .2;
call pls1_reg(%y,%x,y0,x0,r,pls_beta,u,v,s,pls_coef,yhat,
 pls_res,rss_p2 ,ncompmax,gamma,iprint);
gamma= .4;
call pls1_reg(%y,%x,y0,x0,r,pls_beta,u,v,s,pls_coef,yhat,
 pls_res,rss_p4 ,ncompmax,gamma,iprint);
gamma= 1.5;
call pls1_reg(%y,%x,y0,x0,r,pls_beta,u,v,s,pls_coef,yhat,
 pls_res,rss_1p5 ,ncompmax,gamma,iprint);
gamma= 10.;
call pls1_reg(%y,%x,y0,x0,r,pls_beta,u,v,s,pls_coef,yhat,
 pls_res,rss_10p,ncompmax,gamma,iprint);
gamma= 15.;
call pls1_reg(%y,%x,y0,x0,r,pls_beta,u,v,s,pls_coef,yhat,
 pls_res,rss_15p,ncompmax,gamma,iprint);
k1=norows(rss_p1);
k2=norows(pc_rss);
if(k1.lt.k2)call deleterow(pc_rss,k1+1,(k2-k1));
call graph(pc_rss rss_p1 rss_p2, rss_p4,pls_rss,
 rss_1p5,rss_10p,rss_15p
 :nocontact :pgborder :grid :nolabel :file 'pc_pls_rss.wmf'
 :heading 'Res Sum sq vs # pls_.5, pls,% pls_1p5 components');
call tabulate(pc_rss rss_p1 rss_p2, rss_p4,pls_rss,
 rss_1p5,rss_10p,rss_15p
 :title 'As Gamma increases => PC Regression rss');
gamma= .1;
call pls1_reg(%y,%x,y0,x0,r,pls_beta,u,v,s,pls_coef,yhat,
 pls_res,rss_p1 ,3,gamma,iprint);
t=(x0*r);
call print(' ':);
call print('Using Only Three vectors of Reduced Model we Try OLS':);
call olsq(%y t :print);
call print(' ':);
call print('We now see what Stepwise will do in terms of RSS':);
call stepwise(%y %xhold :nstep 3 :noint :print :printsteps);
b34srun;

Edited results from running the commands in Table 10.6 are listed next.

B34SI 8.11F  (D:M:Y) 19/ 5/11  (H:M:S) 20:45:35  DATA STEP  PAGE 1

Variable  Label                            # Cases  Mean            Std. Dev.  Variance  Maximum   Minimum
TIME      1                                296      148.500         85.5921    7326.00   296.000   1.00000
GASIN     2 Input gas rate in cu. ft / min 296     -0.568345E-01    1.07277    1.15083   2.83400  -2.71600
GASOUT    3 Percent CO2 in outlet gas      296      53.5091         3.20212    10.2536   60.5000   45.6000
CONSTANT  4                                296      1.00000         0.00000    0.00000   1.00000   1.00000

Number of observations in data file   296
Current missing variable code         1.000000000000000E+031

B34SI Matrix Command. d/m/y 19/ 5/11. h:m:s 20:45:35.
=> CALL LOADDATA$
=> CALL LOAD(PLS1_REG)$
=> CALL LOAD(PC_REG)$
=> CALL ECHOOFF$

The OLS results listed next are the "base case."

Ordinary Least Squares Estimation
Dependent variable                  GASOUT
Centered R**2                       0.994664107436370
Adjusted R**2                       0.994432949635779
Residual Sum of Squares             16.1385829591582
Residual Variance                   5.826203234353124E-002
Standard Error                      0.241375293564878
Total Sum of Squares                3024.53296551724
Log Likelihood                      7.36469419004290
Mean of the Dependent Variable      53.5096551724138
Std. Error of Dependent Variable    3.23504435694615
Sum Absolute Residuals              48.1338529453902
F(12, 277)                          4302.96578742115
F Significance                      1.00000000000000
1/Condition XPX                     1.929993696611666E-008
Maximum Absolute Residual           1.43081466326252
Number of Observations              290

Variable  Lag  Coefficient        SE                t
GASIN     1    0.63160860E-01     0.75989856E-01    0.83117489
GASIN     2   -0.13345763         0.16490508       -0.80929968
GASIN     3   -0.44123536         0.18869442       -2.3383593
GASIN     4    0.15200749         0.19021604        0.79913078
GASIN     5   -0.12036440         0.17941884       -0.67085705
GASIN     6    0.24930584         0.10973982        2.2717902
GASOUT    1    1.5452265          0.59808504E-01    25.836234
GASOUT    2   -0.59293307         0.11024897       -5.3781279
GASOUT    3   -0.17105674         0.11518138       -1.4851076
GASOUT    4    0.13238479         0.11465530        1.1546329
GASOUT    5    0.56869923E-01     0.10083191        0.56400722
GASOUT    6   -0.42085617E-01     0.42891891E-01   -0.98120217
CONSTANT  0    3.8241094          0.85547296        4.4701698

The principal component results indicate the increase in e'e that occurs when fewer than 13 principal component vectors are used.

Principle Component Regression Model
Total Sum of Squares           3024.53296551724
Number of observations in X    290
Number of columns in X         13

PC and OLS Coefficients
Obs  PC_COEF     OLS_COEF
1    -912.2       0.6316E-01
2    -27.16      -0.1335
3     5.125      -0.4412
4    -18.93       0.1520
5     5.327      -0.1204
6     3.503       0.2493
7    -2.414       1.545
8    -1.023      -0.5929
9    -1.104      -0.1711
10   -0.3703      0.1324
11   -0.3912      0.5687E-01
12    0.3701     -0.4209E-01
13   -1.058       3.824

Shrinkage Accuracy loss
Obs  PC_SIZE  PC_RSS  RSQ
1    13       16.14   0.9947
2    12       17.26   0.9943
3    11       17.39   0.9942
4    10       17.55   0.9942
5    9        17.68   0.9942
6    8        18.90   0.9937
7    7        19.95   0.9934
8    6        25.78   0.9915
9    5        38.05   0.9874
10   4        66.42   0.9780
11   3        424.9   0.8595
12   2        451.1   0.8508
13   1        1189.   0.6070

Note that for 9 partial least squares vectors e'e = 16.1597, while for the PCR model e'e = 17.68.

Partial Least Squares PLS1 - 9 May 2011 Version.
Logic from de Jong, Wise, Ricker (2001) Matlab Code
Number of rows in original data             290
Number Columns in original data             13
Number Columns in PLS Coefficient Vector    9
Gamma                                       1.00000000000000
Mean of left hand variable                  53.5096551724138
PLS sum of squared errors                   16.1597525278728
Total sum of squares                        3024.53296551724
PLS R^2                                     0.994657108151205

(T*pls_beta)+mean(y) = x*pls_coef
Obs  PLS_BETA  PLS_COEF
1    47.70      0.6239E-01
2    25.33     -0.1572
3    6.602     -0.3602
4    5.115      0.4605E-01
5    3.910     -0.5736E-01
6    2.053      0.2355
7    1.325      1.550
8    0.8342    -0.6111
9    0.2304    -0.1489
10   NA         0.1277
11   NA         0.4777E-01
12   NA        -0.3690E-01
13   NA         3.820
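The monotone rise of PC_RSS in the shrinkage table has a simple algebraic source. As a sketch of the standard principal component regression algebra (stated here for reference; it is not part of the program output): writing the SVD of the de-meaned data as X0 = UΣV′ with singular values σi, the m-component PCR coefficients and residual sum of squares are

   β̂_PCR(m) = Σ_{i=1..m} (u_i′y0 / σi) v_i,      e′e(m) = e′e_OLS + Σ_{i=m+1..k} (u_i′y0)²,

so every deleted component adds (u_i′y0)² to the residual sum of squares, regardless of how useful that direction is for predicting y. PLS instead orders its latent vectors by their covariance with y, which is why 9 PLS vectors already attain e'e = 16.16 while 9 principal components leave e'e = 17.68.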
Here only 6 latent vectors are calculated, but they are the same as the first 6 obtained when 9 were calculated in the prior estimation. The implied OLS coefficients are, however, all different. For this problem e'e = 18.665, while for a PCR model with 6 latent vectors e'e = 25.78, which is substantially larger.

Partial Least Squares PLS1 - 9 May 2011 Version.
Logic from de Jong, Wise, Ricker (2001) Matlab Code
Number of rows in original data             290
Number Columns in original data             13
Number Columns in PLS Coefficient Vector    6
Gamma                                       1.00000000000000
Mean of left hand variable                  53.5096551724138
PLS sum of squared errors                   18.6653423691790
Total sum of squares                        3024.53296551724
PLS R^2                                     0.993828686087412

(T*pls_beta)+mean(y) = x*pls_coef
Obs  PLS_BETA  PLS_COEF
1    47.70      0.1171
2    25.33     -0.1598
3    6.602     -0.3709
4    5.115     -0.2662
5    3.910      0.6345E-01
6    2.053      0.4162
7    NA         1.222
8    NA        -0.5941E-01
9    NA        -0.3070
10   NA        -0.1807E-01
11   NA         0.1571
12   NA        -0.5878E-01
13   NA         3.439

Here a CRM model is estimated where γ = .1. Note e'e = 16.138, which shows the gain over the prior PLS model. Models with γ = .2 and γ = .4 were tried with only marginal changes in e'e.

Partial Least Squares PLS1 - 9 May 2011 Version.
Logic from de Jong, Wise, Ricker (2001) Matlab Code
Number of rows in original data             290
Number Columns in original data             13
Number Columns in PLS Coefficient Vector    6
Gamma                                       0.100000000000000
Mean of left hand variable                  53.5096551724138
PLS sum of squared errors                   16.1385839842000
Total sum of squares                        3024.53296551724
PLS R^2                                     0.994664107097461

(T*pls_beta)+mean(y) = x*pls_coef
Obs  PLS_BETA    PLS_COEF
1    54.53        0.6322E-01
2    5.776       -0.1337
3    1.081       -0.4408
4    0.1641       0.1515
5    0.3024E-01  -0.1199
6    0.5818E-02   0.2490
7    NA           1.545
8    NA          -0.5930
9    NA          -0.1711
10   NA           0.1324
11   NA           0.5687E-01
12   NA          -0.4209E-01
13   NA           3.826

Partial Least Squares PLS1 - 9 May 2011 Version.
Logic from de Jong, Wise, Ricker (2001) Matlab Code
Number of rows in original data             290
Number Columns in original data             13
Number Columns in PLS Coefficient Vector    6
Gamma                                       0.200000000000000
Mean of left hand variable                  53.5096551724138
PLS sum of squared errors                   16.1391507928914
Total sum of squares                        3024.53296551724
PLS R^2                                     0.994663919693753

(T*pls_beta)+mean(y) = x*pls_coef
Obs  PLS_BETA    PLS_COEF
1    53.82        0.6393E-01
2    10.14       -0.1367
3    2.845       -0.4325
4    0.8182       0.1397
5    0.2597      -0.1107
6    0.9250E-01   0.2436
7    NA           1.545
8    NA          -0.5923
9    NA          -0.1745
10   NA           0.1336
11   NA           0.5915E-01
12   NA          -0.4358E-01
13   NA           3.860
Logic from de Jong, Wise, Ricker (2001) Matlab Code Number of rows in original data Number Columns in origional data Number Columns in PLS Coefficient Vector Gamma Mean of left hand variable PLS sum of squared errors Total sum of squares PLS R^2 290 13 6 1.50000000000000 53.5096551724138 20.8390179186053 3024.53296551724 0.993110004699505 (T*pls_beta)+mean(y) = x*pls_coef Obs 1 2 3 4 5 6 7 8 9 10 11 12 13 PLS_BETA 46.13 27.75 6.983 5.662 4.138 2.882 NA NA NA NA NA NA NA PLS_COEF 0.1431 -0.1815 -0.4012 -0.3103 0.7465E-01 0.4672 1.094 0.9743E-01 -0.2608 -0.9247E-01 0.9881E-01 -0.2192E-02 3.501 Partial Least Squares PLS1 - 9 May 2011 Version. Logic from de Jong, Wise, Ricker (2001) Matlab Code Number of rows in original data Number Columns in origional data Number Columns in PLS Coefficient Vector Gamma Mean of left hand variable PLS sum of squared errors Total sum of squares PLS R^2 (T*pls_beta)+mean(y) = x*pls_coef Obs 1 2 3 4 5 6 PLS_BETA 45.08 29.07 7.601 6.030 3.252 1.041 PLS_COEF -0.1727 -0.1519 -0.1474 -0.1569 -0.1369 -0.4914E-01 290 13 6 10.0000000000000 53.5096551724138 43.2030846923479 3024.53296551724 0.985715783168870 10-40 7 8 9 10 11 12 13 Chapter 10 NA NA NA NA NA NA NA 0.7445 0.3305 -0.1016 -0.3090 -0.1595 0.2667 12.19 Partial Least Squares PLS1 - 9 May 2011 Version. Logic from de Jong, Wise, Ricker (2001) Matlab Code Number of rows in original data Number Columns in origional data Number Columns in PLS Coefficient Vector Gamma Mean of left hand variable PLS sum of squared errors Total sum of squares PLS R^2 290 13 6 15.0000000000000 53.5096551724138 200.245716277202 3024.53296551724 0.933792847173363 (T*pls_beta)+mean(y) = x*pls_coef Obs 1 2 3 4 5 6 7 8 9 10 11 12 13 PLS_BETA 45.08 29.07 7.601 6.059 6.059 6.059 NA NA NA NA NA NA NA PLS_COEF 0.4093 0.7494E-01 -0.2962 -0.5521 -0.5721 -0.3516 0.8249 0.1877 -0.3572 -0.4844 -0.1153 0.5611 20.45 As Gamma increases => PC Regression rss This table summarizes the effect of the residual sum of squares for settings of .1, .2, .4, 1., 1.5, 10., and 15. Resulting in 6 PLS vectors to be compared to the PC_RSS vector. Obs 1 2 3 4 5 6 PC_RSS 1189. 451.1 424.9 66.42 38.05 25.78 RSS_P1 50.69 17.34 16.17 16.14 16.14 16.14 RSS_P2 127.8 24.98 16.88 16.22 16.15 16.14 RSS_P4 320.5 51.91 25.50 18.51 16.58 16.22 PLS_RSS 749.6 107.9 64.33 38.17 22.88 18.67 RSS_1P5 896.9 127.1 78.33 46.27 29.15 20.84 RSS_10P 992.0 146.8 89.06 52.70 42.12 43.20 RSS_15P 992.0 146.8 89.06 52.70 89.76 200.2 Using only three latent vectors with .1 the residual sum of squares is 16.166. This is substantially smaller than the case with the PC approach were e ' e 424.9 . Partial Least Squares PLS1 - 9 May 2011 Version. 
Logic from de Jong, Wise, Ricker (2001) Matlab Code Number of rows in original data Number Columns in origional data Number Columns in PLS Coefficient Vector Gamma Mean of left hand variable PLS sum of squared errors Total sum of squares PLS R^2 290 13 3 0.100000000000000 53.5096551724138 16.1664707952968 3024.53296551724 0.994654886893411 (T*pls_beta)+mean(y) = x*pls_coef Obs 1 2 3 4 5 6 7 8 9 10 11 12 13 PLS_BETA 54.53 5.776 1.081 NA NA NA NA NA NA NA NA NA NA PLS_COEF 0.6363E-01 -0.1371 -0.4245 0.1119 -0.1005 0.2502 1.521 -0.5501 -0.1862 0.1164 0.6327E-01 -0.3639E-01 3.867 Using Only Three vectors of Reduced Model we Try OLS Ordinary Least Squares Estimation Dependent variable Centered R**2 Adjusted R**2 Residual Sum of Squares Residual Variance Standard Error Total Sum of Squares %Y 0.994654886893411 0.994598819273412 16.1664707952967 5.652612166187671E-002 0.237752227459338 3024.53296551724 Special Topics in OLS Estimation Log Likelihood Mean of the Dependent Variable Std. Error of Dependent Variable Sum Absolute Residuals F( 3, 286) F Significance 1/Condition XPX Maximum Absolute Residual Number of Observations Variable Col____1 Col____2 Col____3 CONSTANT Lag 0 0 0 0 Coefficient 54.532922 5.7756087 1.0813185 53.509655 7.11434715299218 53.5096551724138 3.23504435694615 48.2875380708319 17740.2730293859 1.00000000000000 3.452225714523516E-003 1.41956375257301 290 SE 0.23775223 0.23775223 0.23775223 0.13961292E-01 t 229.36871 24.292554 4.5480900 3832.7153 We now see what Stepwise will do in terms of RSS Stepwise Option called. Y Variable %Y Y Variable Mean 53.5096551724138 Y Variable Variance 10.4655119914091 Y Variable Number 13 Number of observations 290 PIN set as 5.000000000000000E-002 POUT set as 0.100000000000000 TOL set as 2.220446049250313E-014 Constant estimated in all models - Listed for Final Model X Var. # 1 2 3 4 5 6 7 8 9 10 11 12 Name Col____1 Col____2 Col____3 Col____4 Col____5 Col____6 Col____7 Col____8 Col____9 Col___10 Col___11 Col___12 Lag 0 0 0 0 0 0 0 0 0 0 0 0 Mean -0.59800000E-01 -0.57886207E-01 -0.56775862E-01 -0.56613793E-01 -0.57286207E-01 -0.58534483E-01 53.496207 53.482759 53.467931 53.451379 53.434483 53.417931 Var 1.1731711 1.1737639 1.1742883 1.1743570 1.1741485 1.1738231 10.423757 10.373543 10.308830 10.227766 10.139360 10.047221 FORWARD STEPWISE SELECTION STEP 0: No variables entered. * * * Statistics for Variables Not in the Model * * * Coef. Standard t-statistic Prob. of Variance Variable Estimate Error to enter Larger t Inflation 1 -1.788 0.1410 -12.682 0.0000 1 2 -2.165 0.1212 -17.873 0.0000 1 3 -2.516 0.0947 -26.586 0.0000 1 4 -2.761 0.0669 -41.239 0.0000 1 5 -2.838 0.0546 -51.977 0.0000 1 6 -2.732 0.0710 -38.483 0.0000 1 7 0.975 0.0137 71.218 0.0000 1 8 0.904 0.0258 35.057 0.0000 1 9 0.805 0.0357 22.520 0.0000 1 10 0.696 0.0433 16.085 0.0000 1 11 0.593 0.0486 12.201 0.0000 1 12 0.506 0.0522 9.681 0.0000 1 STEP 1 : Variable 7 entered. Dependent Variable 13 Source Regression Error Total Variable 7 R-squared (percent) 94.627 Adjusted R-squared 94.608 Est. Std. Dev. of Model Error 0.7512 * * * Analysis of Variance * * * Sum of Mean DF Squares Square Overall F 1 2862.0 2862.0 5072.046 288 162.5 0.6 289 3024.5 Prob. of Larger F 0.0000 * * * Inference on Coefficients * * * (Conditional on the Selected Model) Coef. Standard Prob. of Estimate Error t-statistic Larger t 0.975 0.0137 71.218 0.0000 Variance Inflation 1.00 * * * Statistics for Variables Not in the Model * * * Coef. Standard t-statistic Prob. 
of           Variance
Variable     Estimate   Error     to enter   Larger t   Inflation
1            -0.495     0.0365    -13.584    0.0000     1.31
2            -0.662     0.0328    -20.179    0.0000     1.56
3            -0.860     0.0310    -27.718    0.0000     2.12
4            -1.072     0.0429    -24.958    0.0000     3.50
5            -1.033     0.0908    -11.381    0.0000     7.16
6             0.421     0.1336      3.150    0.0018     11.06
8            -0.857     0.0306    -27.994    0.0000     18.53
9            -0.405     0.0207    -19.580    0.0000     5.25
10           -0.245     0.0178    -13.746    0.0000     2.75
11           -0.161     0.0166     -9.717    0.0000     1.89
12           -0.108     0.0159     -6.781    0.0000     1.51

STEP 2 : Variable 8 entered.

Dependent Variable 13
R-squared (percent) 98.560   Adjusted R-squared 98.550
Est. Std. Dev. of Model Error 0.3896

* * * Analysis of Variance * * *
Source      DF   Sum of Squares  Mean Square  Overall F  Prob. of Larger F
Regression  2    2981.0          1490.5       9819.495   0.0000
Error       287  43.6            0.2
Total       289  3024.5

* * * Inference on Coefficients * * *
(Conditional on the Selected Model)
Variable  Coef. Estimate  Standard Error  t-statistic  Prob. of Larger t  Variance Inflation
7          1.807          0.03056          59.126      0.0000             18.53
8         -0.857          0.03063         -27.994      0.0000             18.53

* * * Statistics for Variables Not in the Model * * *
Variable  Coef. Estimate  Standard Error  t-statistic to enter  Prob. of Larger t  Variance Inflation
1         -0.279          0.02036         -13.687               0.0000             1.53
2         -0.379          0.02154         -17.580               0.0000             2.15
3         -0.514          0.02792         -18.403               0.0000             3.79
4         -0.551          0.05009         -11.003               0.0000             7.96
5         -0.082          0.07212          -1.131               0.2589             11.64
6          0.323          0.06792           4.749               0.0000             11.08
9          0.465          0.05263           8.835               0.0000             68.97
10         0.213          0.02172           9.788               0.0000             12.22
11         0.121          0.01358           8.936               0.0000             4.54
12         0.080          0.01042           7.695               0.0000             2.50

STEP 3 : Variable 3 entered.

Dependent Variable 13
R-squared (percent) 99.341   Adjusted R-squared 99.334
Est. Std. Dev. of Model Error 0.2641

* * * Analysis of Variance * * *
Source      DF   Sum of Squares  Mean Square  Overall F   Prob. of Larger F
Regression  3    3004.6          1001.5       14361.258   0.0000
Error       286  19.9            0.1
Total       289  3024.5

* * * Inference on Coefficients * * *
(Conditional on the Selected Model)
Variable  Coef. Estimate  Standard Error  t-statistic  Prob. of Larger t  Variance Inflation
3         -0.514          0.02792         -18.403      0.0000             3.79
7          1.352          0.03225          41.926      0.0000             44.91
8         -0.518          0.02777         -18.648      0.0000             33.16

* * * Statistics for Variables Not in the Model * * *
Variable  Coef. Estimate  Standard Error  t-statistic to enter  Prob. of Larger t  Variance Inflation
1         -0.052          0.02918         -1.775                0.0770             4.17
2         -0.139          0.05694         -2.442                0.0152             16.05
4          0.324          0.07612          4.258                0.0000             29.88
5          0.282          0.04985          5.663                0.0000             13.41
6          0.300          0.04440          6.762                0.0000             11.09
9          0.200          0.04233          4.730                0.0000             82.25
10         0.103          0.01758          5.852                0.0000             14.62
11         0.060          0.01060          5.647                0.0000             5.23
12         0.039          0.00784          5.009                0.0000             2.77

Intercept 8.85996699737638   Std. error 0.423056555466042   t Stat. 20.9427483935716

Figure 10.5 shows that the explained sum of squares starts to level out once there are more than 4 vectors in the T matrix. Figure 10.6 shows the pattern of the residual sum of squares for alternative γ values and numbers of columns in the T matrix. In general the PC model had a larger residual sum of squares for every number of latent vectors, except relative to the γ = 10 and γ = 15 models with more than 4 vectors. For example, e'e was 42.12 and 43.20 for the γ = 10 model with 5 and 6 vectors respectively. For the γ = 15 model the values were 89.76 and 200.2, while the corresponding PC model values were 38.05 and 25.78 respectively. Clearly this γ was too large.
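The surfaces in Figures 10.6 and 10.7 can be reproduced outside B34S with a few lines of Matlab by condensing the de Jong-Wise-Ricker logic of Table 10.4. The sketch below is illustrative only: the function name crm_rss_grid is an assumption, and the implicit expansion in the de-meaning step requires Matlab R2016b or later. It computes e'e for each (number of vectors, γ) pair, which is what crmtest loops over:

function RSS = crm_rss_grid(y, X, A, gammas)
% RSS(a,g) = residual sum of squares using a latent vectors and gammas(g).
% y is n by 1; X is n by k and should include the constant column.
y0 = y - mean(y);
X0 = X - mean(X);            % de-mean; the constant column becomes zero
[U,S,~] = svd(X0,0);
L = diag(S).^2;              % eigenvalues of X0'X0
r = sum(L > L(1)/1e14);      % numerical rank
U = U(:,1:r);  L = L(1:r);
A = min(A,r);
RSS = zeros(A, numel(gammas));
rho0 = U'*y0;                % projections of y0 on the PC scores
for g = 1:numel(gammas)
    rho = rho0;
    T = zeros(r,A);
    for a = 1:A
        t = (L.^gammas(g)).*rho;          % 'powered' eigenvalue weights
        t = t - T(:,1:max(1,a-1))*(T(:,1:max(1,a-1))'*t);  % orthogonalize
        t = t/sqrt(t'*t);                 % normalize
        T(:,a) = t;
        rho = rho - t*(t'*rho);
        yhat = U*(T(:,1:a)*(T(:,1:a)'*rho0)) + mean(y);
        RSS(a,g) = sum((y - yhat).^2);    % e'e for this (a, gamma) pair
    end
end
end

For the gas-data grid of Table 10.6 the call would be RSS = crm_rss_grid(y, X, 4, .05:.01:1); plotting RSS as a surface gives the shape of Figure 10.7.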
Explained sum of squares as a function of # of PC terms
Figure 10.5 PCR Model Explained Sum of Squares vs # of Latent Vectors

Res Sum sq vs # pls_.5, pls,% pls_1p5 components
Figure 10.6 Residual Sum of Squares for various CR Models

Figure 10.7 shows the same information as Figure 10.6, except for more cases and in a 3-D context. A grid (.05, .06, ..., .99, 1.) of 96 γ values was used for up to 4 CRM vectors, giving 4 × 96 = 384 models for which e'e was calculated and displayed on the vertical axis. The resulting 3-D graph illustrates the gain of CRM models over PLS models.

RSS vs # vectors and gamma
Figure 10.7 Residual Sum of Squares Surface

The next dataset was used by Matlab to illustrate the PLS model where there are more columns in the X matrix than observations. The Matlab spectra dataset consists of 60 samples of gasoline for which 401 near infrared (NIR) spectral intensities were measured. The left hand side variable is the octane rating of the gasoline. Table 10.7 lists the setup to study this dataset. Figure 10.8 plots the X matrix. It can be seen that the spectral intensities are quite different depending on frequency but remarkably stable across the 60 samples. Figure 10.9 plots e'e for various γ values (.1, .2, ..., 2.4, 2.5) with from one to five vectors, for a total of 125 models. A γ value of 1 and 6 vectors was selected as a reasonable model. Figure 10.10 shows a plot of the T matrix of latent values, while Figure 10.11 shows how the data in the original X matrix map to the six latent vectors.

Table 10.7 Setup for Analysis of Octane Data

/;
/; Data obtained from Matlab
/;
b34sexec matrix;
call echooff;
call load(pls1_reg);
call load(pc_reg);
testmod=0;
/;
/; Data on near infrared (NIR) spectral intensities at 401 wavelengths
/; 60 samples. Octane rating is in col 1. Matrix is 60,402
/;
call getmatlab(dd :file 'c:\b34slm\mfiles\spectra.dat');
y=dd(,1);
x=submatrix(dd,1,norows(dd),2,nocols(dd));
call graph(x :plottype meshc :d3axis :d3border :grid
 :rotation 180. :pgborder :file 'Spectra_data.wmf'
 :xlabel 'Sample #' :ylabelleft 'NIF Frequency'
 :heading 'Spectra NIR vs Octane');
/; Test model
if(testmod.ne.0)then;
  iprint=2;
  noshow=0;
  ncomp=5;
  gammag=grid(.1, 2.5,.1);
  call print(gammag);
  rote=180.;
  call crmtest(y,x, ncomp,gammag,rsstest,rote,iprint,noshow);
  call print(rsstest);
  call stop;
endif;
/; Model test using PC Model
iprint=2;
call pc_reg(y,x,ols_coef,ols_rss,tss,pc_coef,
 pcrss,pc_size,u,iprint);
/; PLS Model
ncomp=6;
gamma=1.0;
call pls1_reg(y,x,y0,x0,r,c,u,v,s,pls2coef,yhat,
 pls_res,pls_rss,ncomp,gamma,iprint);
T=x*r;
call print(r);
call print(t);
call graph(T :plottype meshc :d3axis :d3border :grid :pgborder
 :xlabel 'Sample' :ylabelleft 'PLS Latent Vector'
 :file 'pls_latent_v.wmf'
 :heading 'PLS Latent Vector matrix T');
/; 2D graphs of latent vectors
/; t1=t(,1);
/; t2=t(,2);
/; t9=t(,9);
/; call graph(t1,t2,t9);
scale=array(4: 1.,1.,dfloat(nocols(x)),dfloat(ncomp));
call graph(r :plottype meshc :d3axis :d3border :grid :pgborder
 :xlabel 'X Data Mapping' :pgunits scale
/; :rotation 90.
:ylabelleft 'PLS Latent Vector' :file 'pls_loading.wmf' :heading 'PLS Loading Matrix R'); /; /; /; /; r1=r(,1); r2=r(,2); r9=r(,9); call graph(r1 r2 r9); call olsq(y t :print ); b34srun; Special Topics in OLS Estimation 10-47 Spectra NIR vs Octane Z A x i s 1.2 1 .8 .6 .4 .2 0 400 300 200 100 NIF Freq uenc y Figure 10.8 Octain vs NIG Data 60 50 40 30 le # Samp 20 10 10-48 Chapter 10 RSS vs # vectors and gamma Z A x i s 120 100 80 60 40 20 1 2 2 1.5 3 1 .5 Gamm a 4 5 s ctor # Ve Figure 10.9 Sensitivity of Octain Model to # of Vectors and setting Special Topics in OLS Estimation 10-49 PLS Latent Vector matrix T Z A x i s 4 2 0 -2 -4 1 2 3 PLS 4 Late nt V ect Figure 10.10 Octain Model Matrix T 5 6 10 20 40 30 e l Samp 50 60 10-50 Chapter 10 PLS Loading Matrix R Z A x i s 2.5 2 1.5 1 .5 0 -.5 -1 1 400 2 3 PLS 300 4 200 5 Late nt V ect 100 6 ng appi M a t X Da Figure 10.11 Maping of X data to the PLS Vectors Edited and annotated output from running the commands in Table 10.7 are listed next. First aPC model calculates 60 PC vectors and 401 implied OLS coefficients. B34SI Matrix Command. d/m/y 26/ 5/11. h:m:s 13: 1:46. => CALL ECHOOFF$ Data File built by Matlab 21-May-2011 11:38:12 Principle Total Sum Number of Number of Number of Component Regression Model of Squares 138.127125000000 observations in X 60 columns in X 401 PC elements 60 PC and OLS Coefficients Obs 1 2 3 4 5 6 7 8 PC_COEF OLS_COEF -675.2 -23.01 -2.159 -10.65 -6.105 -26.52 -0.4348 -21.86 -10.88 -21.90 0.9242E-01 5.019 -2.296 6.909 0.6961 -7.443 Special Topics in OLS Estimation 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 398 399 400 401 10-51 0.4730 -14.68 -1.858 -38.76 1.198 6.106 -1.570 14.45 -0.3460 3.334 -0.5206 7.614 0.1264 9.576 -0.2591 -1.239 0.3299 1.752 0.1848 -1.778 0.8297 -2.746 0.8646E-01 5.230 0.8050E-02 9.994 0.1637 6.767 0.3583 -10.93 -0.9069E-01 4.438 -0.3398 0.1093 -0.1047E-01 5.693 -0.5152 -10.47 0.2129E-01 -3.665 -0.1544 1.370 0.3143 0.9660 -0.4799E-01 -11.18 0.2204E-01 9.342 0.2801 12.99 -0.1853 6.821 0.1522 4.310 -0.3894E-01 -1.419 -0.1909 -0.8463 0.3460 -13.15 0.1246 12.81 0.2984 7.896 -0.2201E-01 -4.004 0.2400 1.009 0.5249 9.465 -0.1540 -2.972 -0.1975E-01 -1.656 -0.1968 5.203 -0.4844E-01 3.471 -0.1857E-01 -7.817 0.5029E-01 -13.18 0.2989E-01 13.92 -0.2077 -2.524 0.1306 2.368 0.2120 -7.109 -0.8663E-01 -11.43 -0.2303 8.175 -0.2034 2.233 0.8019E-01 9.659 -0.1181 5.830 -0.9125E-01 -11.84 0.2023 -10.18 NA NA NA NA -1.375 -1.992 0.2682 8.139 The accuracy loss as less and less PC vectors are listed next. If for example there were only 6 PC vectors e ' e 16.39. Shrinkage Accuracy loss Obs PC_SIZE 1 2 3 4 5 6 7 8 9 10 11 12 13 14 60 59 58 57 56 55 54 53 52 51 50 49 48 47 PC_RSS RSQ 0.2777E-24 1.000 0.4094E-01 0.9997 0.4927E-01 0.9996 0.6322E-01 0.9995 0.6965E-01 0.9995 0.1110 0.9992 0.1640 0.9988 0.1716 0.9988 0.2165 0.9984 0.2336 0.9983 0.2767 0.9980 0.2776 0.9980 0.2801 0.9980 0.2805 0.9980 10-52 Chapter 10 15 46 0.2828 48 49 50 51 52 53 54 55 56 57 58 59 60 13 12 11 10 9 8 7 6 5 4 3 2 1 2.941 3.060 5.526 6.961 10.41 10.64 11.12 16.39 16.40 134.7 134.9 172.2 176.9 0.9980 0.9787 0.9778 0.9600 0.9496 0.9246 0.9230 0.9195 0.8813 0.8813 0.2457E-01 0.2320E-01 -0.2466 -0.2803 Note that e ' e 1.769 far lower that a 6 vector model PC model. Partial Least Squares PLS1 - 9 May 2011 Version. 
Logic from de Jong, Wise, Ricker (2001) Matlab Code Number of rows in original data Number Columns in origional data Number Columns in PLS Coefficient Vector Gamma Mean of left hand variable PLS sum of squared errors Total sum of squares PLS R^2 60 401 6 1.00000000000000 87.1775000000000 1.76946660567481 138.127125000000 0.987189579123761 The 4001 by 6 R matrix, of which only 20 rows are shown, maps from X to T using equation (10.4-7). The jth column measures the maping of the ith column in the X matrix to the jth latent vector. The latent vector matrix T is shown in figure 10.10 while the R matrix is shown in figure 10.11. The full T matrix is shown below. R = Matrix of 1 2 3 4 5 6 7 8 9 10 11 12 T 1 -0.312977E-02 -0.198509E-02 -0.187377E-02 -0.128795E-02 -0.121976E-02 -0.122001E-02 0.971001E-04 -0.160181E-02 -0.335944E-02 -0.631175E-02 -0.875870E-02 -0.101636E-01 = Matrix of 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 1 0.988901 0.693743 0.793356 0.767886 1.14628 0.941463 1.11180 0.928202 1.06688 1.11623 1.22368 1.11970 0.927796 0.825877 0.695175 1.01141 1.08833 1.07239 0.907406 0.946685 401 by 2 -0.710386E-04 0.280989E-03 0.384907E-03 0.714660E-03 0.793598E-03 0.890373E-03 0.130553E-02 0.765827E-03 0.659687E-04 -0.104119E-02 -0.191428E-02 -0.242430E-02 60 by 2 0.297868E-01 -0.220051E-01 0.269856E-02 -0.123758E-01 0.675831E-01 0.230492E-01 0.624769E-01 0.299545E-01 0.547265E-01 0.634153E-01 0.816043E-01 0.591827E-01 0.232402E-01 0.100105E-01 -0.123719E-01 0.345590E-01 0.558072E-01 0.556562E-01 0.199947E-01 0.324142E-01 6 elements 3 0.243791E-01 0.300102E-01 0.336652E-01 0.432287E-01 0.461349E-01 0.504599E-01 0.573708E-01 0.482682E-01 0.324927E-01 0.909237E-02 -0.833871E-02 -0.187356E-01 6 4 0.864504E-01 0.846470E-01 0.841480E-01 0.847414E-01 0.861428E-01 0.867424E-01 0.992914E-01 0.894234E-01 0.908605E-01 0.789190E-01 0.679014E-01 0.692084E-01 5 0.840030E-01 0.744934E-01 0.755793E-01 0.762021E-01 0.737757E-01 0.822037E-01 0.841788E-01 0.944800E-01 0.879563E-01 0.883645E-01 0.732018E-01 0.753006E-01 6 -0.578909E-01 -0.131944 -0.122159 -0.147239 -0.111015 -0.285924E-01 0.203989E-01 0.102505 0.520659E-01 0.778642E-01 -0.492166E-01 -0.332195E-01 elements 3 -1.37788 -1.20226 -0.927989 -1.38612 -1.16447 -1.32746 -1.10155 -0.998211 -1.07367 -1.13888 -1.24993 -1.21057 -1.10421 -0.977263 -0.808208 -1.34252 -1.16320 -1.10078 -1.26097 -1.01369 4 -5.25711 -5.24118 -5.20228 -5.26845 -5.20102 -5.26045 -5.15796 -5.22046 -5.14851 -5.14688 -5.15973 -5.19612 -5.27282 -5.13307 -5.17704 -5.35978 -5.15035 -5.18537 -5.31193 -5.26007 5 4.57921 4.53260 4.48372 4.53431 4.62051 4.62389 4.55432 4.44818 4.51542 4.54266 4.60185 4.62723 4.49948 4.40562 4.40392 4.59336 4.49485 4.40185 4.32437 4.34062 6 0.224458 0.346293 0.185458 0.119033 -0.309322 0.483005E-01 0.113346E-01 0.827684E-01 -0.463346E-01 0.912766E-01 0.259372 0.335436 0.333305 0.130752 0.270361 0.135832 0.180443 0.100493 0.954622E-01 0.136154 Special Topics in OLS Estimation 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 1.04033 0.990318 0.984146 1.01701 1.00449 1.08251 1.00992 1.03627 1.00691 1.08173 1.04786 0.888500 0.913688 0.882807 0.903452 1.08614 1.01224 1.10485 0.970562 1.07900 1.29637 1.17331 1.09387 0.939720 1.18000 1.17921 1.17253 1.08209 1.19069 1.22376 1.08351 1.16980 1.14637 0.883618 0.990513 0.968454 1.00950 1.09814 1.22335 1.07687 0.445206E-01 0.355594E-01 0.347272E-01 0.421427E-01 0.393541E-01 0.571684E-01 0.406514E-01 0.433007E-01 0.397300E-01 0.516227E-01 
0.458973E-01 0.129255E-01 0.189848E-01 0.114013E-01 0.136410E-01 0.554221E-01 0.347526E-01 0.545122E-01 0.310651E-01 0.516013E-01 0.928419E-01 0.727464E-01 0.556964E-01 0.195051E-01 0.745302E-01 0.710570E-01 0.679095E-01 0.566267E-01 0.723700E-01 0.807788E-01 0.580789E-01 0.710416E-01 0.721835E-01 0.173970E-01 0.360076E-01 0.274825E-01 0.517277E-01 0.604937E-01 0.882957E-01 0.557995E-01 -1.23375 -1.16217 -1.17629 -1.14606 -1.19053 -1.08735 -1.25093 -1.30249 -1.24469 -1.27910 -1.26462 -1.36045 -1.36032 -1.36708 -1.37623 -1.15058 -1.36990 -1.13586 -1.02342 -1.09833 -1.24613 -1.16533 -1.14417 -1.31289 -1.16350 -1.12185 -1.13039 -1.00997 -1.13766 -1.13632 -1.19487 -1.31213 -1.21964 -1.37663 -1.43152 -1.44889 -1.19443 -1.31325 -1.15502 -1.27155 -5.28707 -5.17825 -5.26010 -5.27858 -5.30752 -5.25844 -5.26287 -5.33735 -5.32440 -5.35182 -5.34313 -5.37922 -5.37230 -5.35219 -5.39813 -5.23086 -5.37636 -5.35801 -5.37516 -5.38084 -5.20289 -5.17351 -5.28656 -5.40741 -5.26702 -5.45193 -5.53415 -5.43525 -5.50693 -5.47720 -5.00232 -5.13979 -4.98038 -4.99856 -5.00132 -5.25708 -4.98649 -5.09163 -5.05872 -5.09131 4.32031 4.60146 4.31059 4.31616 4.32373 4.43951 4.37527 4.37207 4.32866 4.37673 4.35792 4.32504 4.24544 4.33537 4.35643 4.38856 4.47357 4.52024 4.42959 4.46691 4.56651 4.50500 4.63859 4.48914 4.42641 4.24065 4.34307 4.19551 4.31464 4.22620 4.30132 4.44597 4.35272 4.17836 4.37452 4.52559 4.07003 4.28462 4.24757 4.19806 10-53 0.107698 -0.560219E-01 0.103885 0.749457E-01 0.548220E-01 0.743532E-01 0.147988 0.647937E-01 -0.150272E-01 0.531925E-02 -0.114005E-01 0.108952 0.115398 0.103431 0.105236 0.308389 0.840115E-01 0.281195 0.300350 0.329084 0.202289 -0.139271E-02 0.177489 0.180210 0.863997E-01 0.265480 0.293311 -0.626014E-01 0.242892 0.238114 0.111759 0.334206 0.361026 0.240871 0.127398 0.383110 0.645175E-01 0.205630 0.312002 0.103613 As a test, the latent vector matrix T is used on the right hand side of an OLS model predicting y. As expected the residual sum of squares was 1.76940. Ordinary Least Squares Estimation Dependent variable Centered R**2 Adjusted R**2 Residual Sum of Squares Residual Variance Standard Error Total Sum of Squares Log Likelihood Mean of the Dependent Variable Std. Error of Dependent Variable Sum Absolute Residuals F( 6, 53) F Significance 1/Condition XPX Maximum Absolute Residual Number of Observations Variable Col____1 Col____2 Col____3 Col____4 Col____5 Col____6 CONSTANT Lag 0 0 0 0 0 0 0 Coefficient 6.6986671 0.52497431 9.3217709 2.0139747 0.47607837 0.97167176 99.783430 Y 0.987190014829113 0.985739827828635 1.76940642294723 3.338502684806099E-002 0.182715699511730 138.127125000000 20.5747007921381 87.1775000000000 1.53007768164378 8.45940033463316 680.732908586196 1.00000000000000 1.604776721098139E-008 0.447398270645863 60 SE 4.2958494 22.074553 0.43351639 0.45293403 0.29499321 0.19412013 1.3070037 t 1.5593347 0.23781877E-01 21.502696 4.4465077 1.6138621 5.0055177 76.345179 B34S Matrix Command Ending. Last Command reached. Space available in allocator Number variables used Number temp variables used 24856574, peak space used 74, peak number used 5192, # user temp clean 583756 74 0 10-54 Chapter 10 Table 10.8 Tests to illustrate PLS Model intermediate calculations. 
/;
/; Illustrates PLS Calculations
/;
b34sexec options ginclude('gas.b34'); b34srun;
b34sexec matrix;
call loaddata;
call load(pls1_reg);
call echooff;
/; Base Case calculated with OLS and PLS. e'e will be the same
nn=6;
call olsq(gasout gasin{1 to nn} gasout{1 to nn} :print :savex);
ols_coef=%coef;
ols_rss =%rss;
yhat_ols=%yhat;
iprint=1;
ncompmax=nocols(%x);
gamma= 1.;
call pls1_reg(%y,%x,y0,x0,r,pls_beta,u,v,s,pls_coef,yhat,
 pls_res,pls_rss,ncompmax,gamma,iprint);
call echoon;
call print('are ols_rss = pls_rss?',ols_rss,pls_rss);
/;
/; testing equation 10.4-12. Note we use the x with means removed.
/;
yhat_pls=x0*r*pls_beta+mean(%y);
/;
/; pls_coef = r*pls_beta. pls_coef gets us to yhat another way
/;
yhat2=%x * pls_coef;
call tabulate(yhat_ols,yhat,yhat_pls,yhat2);
/; Reduced Model - How does e'e increase with only 4 vectors in T?
ncompmax=4;
gamma= 1.;
call echooff;
call pls1_reg(%y,%x,y0,x0,r,pls_beta,u,v,s,pls_coef,yhat,
 pls_res,pls_rss,ncompmax,gamma,iprint);
call echoon;
yhat_pls=x0*r*pls_beta+mean(%y);
yhat2=%x * pls_coef;
call tabulate(yhat_ols,yhat,yhat_pls,yhat2
 :title 'Last three columns must be the same');
T=%x*r;
T_tilda=transpose(u)*T;
call print(' ':);
call print('Looking at loading':);
call print('Error sum of squares decreases as columns of T enter':);
call print(' ':);
call stepwise(%y T :nstep 3 :print :printsteps);
call graph(r :plottype meshc :d3axis :grid :rotation 0. :pgborder
 :heading 'Mapping of X to PLS vectors in T'
 :file 'mapping.wmf');
/;
call print(' ':);
call print('Validate the r**2 using equation (10.4-10) & (10.4-11)':);
/;
c=y0*U*T_tilda;
r_sq_y=sumsq(c)/sumsq(y0);
call print(r_sq_y);
call print(' ':);
call print('Check with an OLS calculation of y = f(T) ':);
call olsq(%y t :print);
/; Validate the R matrix
call print('r matrix',r);
call olsq(t(,1) xhold :noint :print);
call olsq(t(,2) xhold :noint :print);
call olsq(t(,3) xhold :noint :print);
call olsq(t(,4) xhold :noint :print);
b34srun;

Edited results of running the code in Table 10.8 follow.

B34SI Matrix Command. d/m/y 27/ 5/11. h:m:s 9:10:20.

=> CALL LOADDATA$
=> CALL LOAD(PLS1_REG)$
=> CALL ECHOOFF$

This is the base case: when the number of PLS vectors equals the number of right hand side vectors in an OLS model, e'e = 16.13858 in both cases.

Ordinary Least Squares Estimation Dependent variable Centered R**2 Adjusted R**2 Residual Sum of Squares Residual Variance Standard Error Total Sum of Squares Log Likelihood Mean of the Dependent Variable Std.
Error of Dependent Variable Sum Absolute Residuals F(12, 277) F Significance 1/Condition XPX Maximum Absolute Residual Number of Observations Variable GASIN GASIN GASIN GASIN GASIN GASIN GASOUT GASOUT GASOUT GASOUT GASOUT GASOUT CONSTANT Lag 1 2 3 4 5 6 1 2 3 4 5 6 0 Coefficient 0.63160860E-01 -0.13345763 -0.44123536 0.15200749 -0.12036440 0.24930584 1.5452265 -0.59293307 -0.17105674 0.13238479 0.56869923E-01 -0.42085617E-01 3.8241094 GASOUT 0.994664107436370 0.994432949635779 16.1385829591582 5.826203234353124E-002 0.241375293564878 3024.53296551724 7.36469419004290 53.5096551724138 3.23504435694615 48.1338529453902 4302.96578742115 1.00000000000000 1.929993696611666E-008 1.43081466326252 290 SE 0.75989856E-01 0.16490508 0.18869442 0.19021604 0.17941884 0.10973982 0.59808504E-01 0.11024897 0.11518138 0.11465530 0.10083191 0.42891891E-01 0.85547296 t 0.83117489 -0.80929968 -2.3383593 0.79913078 -0.67085705 2.2717902 25.836234 -5.3781279 -1.4851076 1.1546329 0.56400722 -0.98120217 4.4701698 Note that the implied PLS_COEF is in fact the OLS coefficients. Equation (10.4-15) shows two ways to obtain ŷ from ˆPLS or ˆOLS . Partial Least Squares PLS1 - 9 May 2011 Version. Logic from de Jong, Wise, Ricker (2001) Matlab Code Number of rows in original data Number Columns in origional data Number Columns in PLS Coefficient Vector Gamma Mean of left hand variable PLS sum of squared errors Total sum of squares PLS R^2 (T*pls_beta)+mean(y) = x*pls_coef Obs 1 2 3 4 5 6 PLS_BETA 47.70 25.33 6.602 5.115 3.910 2.053 PLS_COEF 0.6316E-01 -0.1335 -0.4412 0.1520 -0.1204 0.2493 290 13 13 1.00000000000000 53.5096551724138 16.1385829591581 3024.53296551724 0.994664107436370 10-56 7 8 9 10 11 12 13 Chapter 10 1.325 1.545 0.8342 -0.5929 0.2304 -0.1711 0.1323 0.1324 0.5716E-01 0.5687E-01 0.4442E-02 -0.4209E-01 0.1914E-01 3.824 This section illustrates the increase in e ' e as fewer and fewer PLS vectors are used. => CALL PRINT('are ols_rss = pls_rss?',OLS_RSS,PLS_RSS)$ are ols_rss = pls_rss? OLS_RSS = 16.138583 PLS_RSS = Vector of 749.612 16.2128 16.1598 13 elements 107.920 64.3347 38.1673 22.8806 16.1422 16.1390 16.1389 16.1386 18.6653 16.9087 Here ŷ from the olsq command (yhat_ols) is the same as ŷ from the pls1_reg command (yhat). The vectors yhat_pls and yhat2 are from equation (10.4-15) and are the same. => YHAT_PLS=X0*R*PLS_BETA+MEAN(%Y)$ => YHAT2=%X * PLS_COEF$ => CALL TABULATE(YHAT_OLS,YHAT,YHAT_PLS,YHAT2)$ Obs 1 2 3 4 5 6 7 8 9 10 . YHAT_OLS 52.76 52.34 52.15 52.08 51.94 52.15 52.92 53.76 55.04 55.89 . YHAT 52.76 52.34 52.15 52.08 51.94 52.15 52.92 53.76 55.04 55.89 . YHAT_PLS 52.76 52.34 52.15 52.08 51.94 52.15 52.92 53.76 55.04 55.89 . YHAT2 52.76 52.34 52.15 52.08 51.94 52.15 52.92 53.76 55.04 55.89 . 280 281 282 283 284 285 286 287 288 289 290 53.00 53.69 55.40 57.05 57.26 57.96 58.33 57.78 57.61 57.14 56.74 53.00 53.69 55.40 57.05 57.26 57.96 58.33 57.78 57.61 57.14 56.74 53.00 53.69 55.40 57.05 57.26 57.96 58.33 57.78 57.61 57.14 56.74 53.00 53.69 55.40 57.05 57.26 57.96 58.33 57.78 57.61 57.14 56.74 The reduced PLS model now has only 4 vectors. Note that as expected e ' e 38.167 . There are only 4 PLS coefficients (PLS_BETA) but they map to 13 coefficients that show how the 4 PLS vectors map to the original X matrix. => NCOMPMAX=4$ => GAMMA= 1.$ => CALL ECHOOFF$ Special Topics in OLS Estimation Partial Least Squares PLS1 - 9 May 2011 Version. 
Logic from de Jong, Wise, Ricker (2001) Matlab Code Number of rows in original data Number Columns in origional data Number Columns in PLS Coefficient Vector Gamma Mean of left hand variable PLS sum of squared errors Total sum of squares PLS R^2 10-57 290 13 4 1.00000000000000 53.5096551724138 38.1673392441907 3024.53296551724 0.987380749464682 (T*pls_beta)+mean(y) = x*pls_coef Obs 1 2 3 4 5 6 7 8 9 10 11 12 13 PLS_BETA 47.70 25.33 6.602 5.115 NA NA NA NA NA NA NA NA NA PLS_COEF 0.3014E-02 -0.1409 -0.2707 -0.3042 -0.2166 -0.4860E-01 0.7309 0.2073 -0.1318 -0.2034 -0.6247E-01 0.1701 15.44 => YHAT_PLS=X0*R*PLS_BETA+MEAN(%Y)$ => YHAT2=%X * PLS_COEF$ => => CALL TABULATE(YHAT_OLS,YHAT,YHAT_PLS,YHAT2 :TITLE 'Last three columns must be the same')$ The last three columns have to be the same. The first column is what would have been the case had all 13 PLS vectors been used. Note that the first 4 PLS vector coefficients are the same as when there were 13 PLS coefficients but that ˆOLS has of course changed. This suggests that users can experiment withs subsets of a PLS model by using less T vectors. Last three columns must be the same Obs 1 2 3 4 5 6 7 8 9 10 . 280 281 282 283 284 285 286 287 288 289 290 YHAT_OLS 52.76 52.34 52.15 52.08 51.94 52.15 52.92 53.76 55.04 55.89 . 53.00 53.69 55.40 57.05 57.26 57.96 58.33 57.78 57.61 57.14 56.74 YHAT 53.00 52.52 52.18 52.07 52.04 52.19 52.73 53.57 54.73 55.76 . 53.05 53.67 54.91 56.35 57.04 57.27 57.19 56.81 56.46 56.02 55.76 YHAT_PLS 53.00 52.52 52.18 52.07 52.04 52.19 52.73 53.57 54.73 55.76 . 53.05 53.67 54.91 56.35 57.04 57.27 57.19 56.81 56.46 56.02 55.76 YHAT2 53.00 52.52 52.18 52.07 52.04 52.19 52.73 53.57 54.73 55.76 . 53.05 53.67 54.91 56.35 57.04 57.27 57.19 56.81 56.46 56.02 55.76 Here we are obtaining T and T using (10.4-6) and (10.4-7) for future use. Using the stepwise command and restricting to 3 variables, the first 3 T vectors enter with e ' e values of 749.6, 107.9 and 64.33 respectively. As a final check all 4 PLS vectors are used to estimate the model using the olsq command. As anticipated e ' e 38.17 . Of more interest is that the PLS vectors 10-58 Chapter 10 now can be ranked in terms of their t tests which were respectively 130.33, 69.22, 18.04, 13.98 with the t for the constant 29.057. => T=%X*R$ => T_TILDA=TRANSPOSE(U)*T$ => CALL PRINT(' ':)$ => CALL PRINT('Looking at loading':)$ Looking at loading => CALL PRINT('Error sum of squares decreases as columns of T enter':)$ Error sum of squares decreases as columns of T enter => CALL PRINT(' ':)$ => CALL STEPWISE(%Y T :NSTEP 3 :PRINT :PRINTSTEPS)$ Stepwise Option called. Y Variable %Y Y Variable Mean 53.5096551724138 Y Variable Variance 10.4655119914091 Y Variable Number 4 Number of observations 290 PIN set as 5.000000000000000E-002 POUT set as 0.100000000000000 TOL set as 2.220446049250313E-014 Constant estimated in all models - Listed for Final Model X Var. # 1 2 3 Name Col____1 Col____2 Col____3 Lag 0 0 0 Mean 0.95836819 -0.44771392 -0.19376726 Var 0.34602076E-02 0.34602076E-02 0.34602076E-02 FORWARD STEPWISE SELECTION STEP 0: No variables entered. * * * Statistics for Variables Not in the Model * * * Coef. Standard t-statistic Prob. of Variance Variable Estimate Error to enter Larger t Inflation 1 47.70 1.613 29.564 0.0000 1 2 25.33 2.876 8.807 0.0000 1 3 6.60 3.217 2.052 0.0411 1 STEP 1 : Variable 1 entered. Dependent Variable 4 R-squared (percent) 75.216 Source Regression Error Total Variable 1 Adjusted R-squared 75.130 Est. Std. Dev. 
of Model Error 1.613 * * * Analysis of Variance * * * Sum of Mean DF Squares Square Overall F 1 2274.9 2274.9 874.022 288 749.6 2.6 289 3024.5 Prob. of Larger F 0.0000 * * * Inference on Coefficients * * * (Conditional on the Selected Model) Coef. Standard Prob. of Estimate Error t-statistic Larger t 47.70 1.613 29.564 0.0000 Variance Inflation 1 * * * Statistics for Variables Not in the Model * * * Coef. Standard t-statistic Prob. of Variance Estimate Error to enter Larger t Inflation 25.33 0.613 41.310 0.0000 1 6.60 1.568 4.209 0.0000 1 Variable 2 3 STEP 2 : Variable 2 entered. Special Topics in OLS Estimation Dependent Variable 4 R-squared (percent) 96.432 Est. Std. Dev. of Model Error 0.6132 * * * Analysis of Variance * * * Sum of Mean DF Squares Square Overall F 2 2916.6 1458.3 3878.204 287 107.9 0.4 289 3024.5 Source Regression Error Total Variable 1 2 Adjusted R-squared 96.407 Prob. of Larger F 0.0000 * * * Inference on Coefficients * * * (Conditional on the Selected Model) Coef. Standard Prob. of Estimate Error t-statistic Larger t 47.70 0.6132 77.781 0.0000 25.33 0.6132 41.310 0.0000 Variance Inflation 1 1 * * * Statistics for Variables Not in the Model * * * Coef. Standard t-statistic Prob. of Variance Variable Estimate Error to enter Larger t Inflation 3 6.60 0.4743 13.920 0.0000 1 STEP 3 : Dependent Variable 4 Variable 3 entered. R-squared (percent) 97.873 Intercept Std. error t Stat. Est. Std. Dev. of Model Error 0.4743 * * * Analysis of Variance * * * Sum of Mean DF Squares Square Overall F 3 2960.2 986.7 4386.523 286 64.3 0.2 289 3024.5 Source Regression Error Total Variable 1 2 3 Adjusted R-squared 97.851 Prob. of Larger F 0.0000 * * * Inference on Coefficients * * * (Conditional on the Selected Model) Coef. Standard Prob. of Estimate Error t-statistic Larger t 47.70 0.4743 100.564 0.0000 25.33 0.4743 53.410 0.0000 6.60 0.4743 13.920 0.0000 Variance Inflation 1 1 1 20.4197570610335 0.510801287876884 39.9759310433750 => => => CALL GRAPH(R :PLOTTYPE MESHC :D3AXIS :GRID :ROTATION 0. :PGBORDER :HEADING 'Mapping of X to PLS vectors in T' :FILE 'mapping.wmf')$ => CALL PRINT(' ':)$ => CALL PRINT('Validate the r**2 using equation (10.4-10) & (10.4-11)':)$ Validate the r**2 using equation (10.4-10) & (10.4-11) => C=Y0*U*T_TILDA$ => R_SQ_Y=SUMSQ(C)/SUMSQ(Y0)$ => CALL PRINT(R_SQ_Y)$ R_SQ_Y = 0.98738075 => CALL PRINT(' => CALL PRINT('Check with an OLS calculation of y = f(T) ':)$ Check with an OLS calculation of y = f(T) => CALL OLSQ(%Y T :PRINT)$ ':)$ 10-59 10-60 Chapter 10 Ordinary Least Squares Estimation Dependent variable Centered R**2 Adjusted R**2 Residual Sum of Squares Residual Variance Standard Error Total Sum of Squares Log Likelihood Mean of the Dependent Variable Std. Error of Dependent Variable Sum Absolute Residuals F( 4, 285) F Significance 1/Condition XPX Maximum Absolute Residual Number of Observations Variable Col____1 Col____2 Col____3 Col____4 CONSTANT => Lag 0 0 0 0 0 Coefficient 47.696134 25.331646 6.6018826 5.1154030 15.438848 %Y 0.987380749464682 0.987203637176467 38.1673392441908 0.133920488576108 0.365951483910242 3024.53296551724 -117.446563459150 53.5096551724138 3.23504435694615 70.3905891845412 5574.88562434546 1.00000000000000 3.532683390771366E-004 1.74714855404004 290 SE 0.36595148 0.36595148 0.36595148 0.36595148 0.53132559 t 130.33458 69.221322 18.040322 13.978364 29.057227 CALL PRINT('r matrix',R)$ The final sequence of tests varidate the calculation of the R matrix by running each PLS vector in the T matrix on all the X data. 
The ith regression listed below replicates the ith column in R . Note that as expected th fit is perfect. r matrix R = Matrix of 1 2 3 4 5 6 7 8 9 10 11 12 13 => 1 -0.816216E-03 -0.989053E-03 -0.114989E-02 -0.126158E-02 -0.129674E-02 -0.124796E-02 0.395372E-02 0.364950E-02 0.322800E-02 0.276954E-02 0.233997E-02 0.197721E-02 -0.270289E-18 13 by 2 -0.400790E-02 -0.504122E-02 -0.571785E-02 -0.568495E-02 -0.477588E-02 -0.309993E-02 0.980122E-02 0.443410E-02 -0.907315E-03 -0.519005E-02 -0.782500E-02 -0.873809E-02 0.322290E-17 4 elements 3 -0.834093E-02 -0.119730E-01 -0.134877E-01 -0.107002E-01 -0.376867E-02 0.471397E-02 0.850522E-02 -0.105492E-01 -0.178896E-01 -0.114058E-01 0.466072E-02 0.230272E-01 0.458080E-16 4 0.388115E-01 0.221005E-01 0.351568E-02 -0.575148E-02 -0.173543E-02 0.114023E-01 0.465133E-01 -0.184729E-02 -0.282878E-01 -0.251736E-01 -0.129510E-02 0.283711E-01 0.234304E-15 CALL OLSQ(T(,1) XHOLD :NOINT :PRINT)$ Ordinary Least Squares Estimation Dependent variable Centered R**2 Adjusted R**2 Residual Sum of Squares Residual Variance Standard Error Total Sum of Squares Log Likelihood Mean of the Dependent Variable Std. Error of Dependent Variable Sum Absolute Residuals 1/Condition XPX Maximum Absolute Residual Number of Observations Variable Col____1 Col____2 Col____3 Col____4 Col____5 Col____6 Col____7 Col____8 Lag 0 0 0 0 0 0 0 0 Coefficient -0.81621613E-03 -0.98905278E-03 -0.11498880E-02 -0.12615798E-02 -0.12967366E-02 -0.12479594E-02 0.39537165E-02 0.36494975E-02 ##1209 1.00000000000000 1.00000000000000 1.705044853683151E-022 6.155396583693686E-025 7.845633552297537E-013 1.00000000000000 7678.51625031125 0.958368190986431 5.882352941176474E-002 1.596971443973416E-010 1.929993696611666E-008 3.731459585765151E-012 290 SE 0.24699651E-12 0.53600550E-12 0.61333008E-12 0.61827593E-12 0.58318085E-12 0.35669698E-12 0.19440084E-12 0.35835193E-12 t -0.33045654E+10 -0.18452288E+10 -0.18748273E+10 -0.20404803E+10 -0.22235582E+10 -0.34986543E+10 0.20337960E+11 0.10184116E+11 Special Topics in OLS Estimation Col____9 Col___10 Col___11 Col___12 Col___13 => 0 0 0 0 0 0.32280031E-02 0.27695401E-02 0.23399745E-02 0.19772077E-02 -0.21420440E-11 Variable Col____1 Col____2 Col____3 Col____4 Col____5 Col____6 Col____7 Col____8 Col____9 Col___10 Col___11 Col___12 Col___13 Lag 0 0 0 0 0 0 0 0 0 0 0 0 0 Coefficient -0.40079028E-02 -0.50412166E-02 -0.57178505E-02 -0.56849453E-02 -0.47758761E-02 -0.30999253E-02 0.98012207E-02 0.44341045E-02 -0.90731500E-03 -0.51900517E-02 -0.78250020E-02 -0.87380883E-02 -0.49981027E-11 ##1216 1.00000000000000 1.00000000000000 1.749298702340186E-023 6.315157770181178E-026 2.512997765653837E-013 1.00000000000000 8008.67567425387 -0.447713919937468 5.882352941176472E-002 5.085987186959073E-011 1.929993696611666E-008 1.206201805104001E-012 290 SE 0.79114285E-13 0.17168539E-12 0.19645286E-12 0.19803704E-12 0.18679590E-12 0.11425192E-12 0.62267614E-13 0.11478201E-12 0.11991723E-12 0.11936952E-12 0.10497775E-12 0.44655451E-13 0.89064693E-12 t -0.50659660E+11 -0.29363108E+11 -0.29105458E+11 -0.28706475E+11 -0.25567350E+11 -0.27132369E+11 0.15740479E+12 0.38630657E+11 -0.75661773E+10 -0.43478870E+11 -0.74539623E+11 -0.19567797E+12 -5.6117666 CALL OLSQ(T(,3) XHOLD :NOINT :PRINT)$ Ordinary Least Squares Estimation Dependent variable Centered R**2 Adjusted R**2 Residual Sum of Squares Residual Variance Standard Error Total Sum of Squares Log Likelihood Mean of the Dependent Variable Std. 
Error of Dependent Variable Sum Absolute Residuals 1/Condition XPX Maximum Absolute Residual Number of Observations Variable Col____1 Col____2 Col____3 Col____4 Col____5 Col____6 Col____7 Col____8 Col____9 Col___10 Col___11 Col___12 Col___13 => 0.86221674E+10 0.74315313E+10 0.71396667E+10 0.14182144E+11 -0.77034789 CALL OLSQ(T(,2) XHOLD :NOINT :PRINT)$ Ordinary Least Squares Estimation Dependent variable Centered R**2 Adjusted R**2 Residual Sum of Squares Residual Variance Standard Error Total Sum of Squares Log Likelihood Mean of the Dependent Variable Std. Error of Dependent Variable Sum Absolute Residuals 1/Condition XPX Maximum Absolute Residual Number of Observations => 0.37438418E-12 0.37267422E-12 0.32774282E-12 0.13941529E-12 0.27806190E-11 Lag 0 0 0 0 0 0 0 0 0 0 0 0 0 Coefficient -0.83409338E-02 -0.11972963E-01 -0.13487669E-01 -0.10700250E-01 -0.37686661E-02 0.47139743E-02 0.85052182E-02 -0.10549245E-01 -0.17889554E-01 -0.11405813E-01 0.46607156E-02 0.23027214E-01 -0.62476284E-11 ##1223 1.00000000000000 1.00000000000000 1.387238706486448E-022 5.008081972875263E-025 7.076780322205333E-013 1.00000000000000 7708.42629761190 -0.193767263817997 5.882352941176477E-002 1.497757265433997E-010 1.929993696611666E-008 3.005623527840839E-012 290 SE 0.22279145E-12 0.48347825E-12 0.55322521E-12 0.55768637E-12 0.52603053E-12 0.32174153E-12 0.17535003E-12 0.32323430E-12 0.33769543E-12 0.33615304E-12 0.29562481E-12 0.12575293E-12 0.25081251E-11 t -0.37438303E+11 -0.24764223E+11 -0.24380068E+11 -0.19186859E+11 -0.71643486E+10 0.14651432E+11 0.48504231E+11 -0.32636526E+11 -0.52975410E+11 -0.33930419E+11 0.15765644E+11 0.18311474E+12 -2.4909557 CALL OLSQ(T(,4) XHOLD :NOINT :PRINT)$ Ordinary Least Squares Estimation Dependent variable Centered R**2 Adjusted R**2 Residual Sum of Squares Residual Variance ##1230 1.00000000000000 1.00000000000000 8.851832887877189E-022 3.195607540749888E-024 10-61 10-62 Chapter 10 Standard Error Total Sum of Squares Log Likelihood Mean of the Dependent Variable Std. Error of Dependent Variable Sum Absolute Residuals 1/Condition XPX Maximum Absolute Residual Number of Observations Variable Col____1 Col____2 Col____3 Col____4 Col____5 Col____6 Col____7 Col____8 Col____9 Col___10 Col___11 Col___12 Col___13 Lag 0 0 0 0 0 0 0 0 0 0 0 0 0 Coefficient 0.38811538E-01 0.22100525E-01 0.35156778E-02 -0.57514845E-02 -0.17354253E-02 0.11402341E-01 0.46513253E-01 -0.18472915E-02 -0.28287817E-01 -0.25173608E-01 -0.12951020E-02 0.28371074E-01 -0.12852264E-10 1.787626230717677E-012 0.999999999999996 7439.69644703890 0.973707980304693 5.882352941176460E-002 3.691369432345937E-010 1.929993696611666E-008 7.552181102710165E-012 290 SE 0.56278113E-12 0.12212876E-11 0.13974715E-11 0.14087406E-11 0.13287765E-11 0.81273344E-12 0.44294198E-12 0.81650425E-12 0.85303370E-12 0.84913756E-12 0.74676143E-12 0.31765749E-12 0.63356356E-11 t 0.68963823E+11 0.18096086E+11 0.25157420E+10 -0.40827136E+10 -0.13060325E+10 0.14029620E+11 0.10500981E+12 -0.22624395E+10 -0.33161430E+11 -0.29646089E+11 -0.17342915E+10 0.89313413E+11 -2.0285674 10.5 Boosting Hastie-Tibshirani-Friedman (2001, 299) have noted "boosting is one of the most powerful learning ideas introduced in the last ten years….The motivation for boosting was a procedure that combines the outputs of many 'weak' classifiers12 to produce a powerful 'committee.'" Efron-HastieJohnson-Tibshirani (2004, 445) noted ".. in some sense least squares boosting may be carrying out a Lasso fit on the infinite set of tree predictors." 
In this section two forms of the boosting algorithm are outlined; their use is illustrated later. In many data mining applications the number of potential right hand side variables is so large that it is not feasible to place them all in an OLS model. Boosting provides an iterative procedure by which the information in the vast set of potential right hand side variables can be extracted, one variable at a time.

Boosting first centers all x variables to have zero mean and unit length and removes the mean from the left hand side variable. Assume a small positive adjustment constant 0 < ε < 1. Define ê_{jt}, for t ∈ {1, ..., N}, as the t-th observation of ê_{j·}, the residual at the j-th iteration of a model to predict y_t. Start the process at j = 1, where by assumption no knowledge is used, so ŷ_{1t} = 0, which implies ê_{1t} = y_t. Next select x_{k·} as the vector having the highest absolute correlation with ê_{j·}. An OLS regression ê_{j·} = f(x_{k·}) produces a fitted value ẑ_{j·}, which is used to update

   ŷ_{j+1,·} = ŷ_{j·} + ε ẑ_{j·}   and   ê_{j+1,·} = y_· − ŷ_{j+1,·} .

The process is repeated for a given number of steps. It can be shown that if enough steps are taken, the correlation of y_t and ŷ_t will approximate what can be obtained by an OLS fit with all the x variables in the model. This will be illustrated in Table 10.12, and a short sketch of the update loop is given at the end of this section. A variant of boosting is modified stagewise boosting, shown in Table 10.13, which involves first finding the best x vector but in addition taking a small step in all vectors already in the model. Stokes suggests a modification that estimates a constant at each stage and does not center the x variables or remove the mean from y. This modification, shown in Tables 10.13 and 10.14, facilitates forecasting out of sample.

12 By classifier they mean OLS, L1, MINIMAX, MARS, GAM etc.
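As a concrete illustration, the update loop just described can be written in a few lines of Matlab. This is a minimal sketch under stated assumptions, not the B34S routine of Table 10.13: the function name ls_boost, the fixed number of steps, and the use of implicit expansion (Matlab R2016b or later) are choices made here for brevity.

function yhat = ls_boost(y, X, eps, nsteps)
% Minimal least-squares boosting sketch (illustrative only).
% y: n by 1; X: n by k; eps: small step size in (0,1); nsteps: iterations.
X = X - mean(X);              % center each column
X = X ./ sqrt(sum(X.^2));     % scale each column to unit length
ybar = mean(y);
e = y - ybar;                 % j = 1: yhat = 0, so e = y - mean(y)
yhat = zeros(size(y));
for j = 1:nsteps
    c = X'*e;                 % with unit-length centered columns this is
                              % proportional to corr(x_k, e) across k
    [~, k] = max(abs(c));     % pick the most correlated column
    z = c(k)*X(:,k);          % OLS fitted value of e on x_k alone
    yhat = yhat + eps*z;      % small step toward the fit
    e = e - eps*z;            % update the residual
end
yhat = yhat + ybar;           % add the mean back to get fitted y
end

With eps small (for example .1) and enough steps, the correlation of y and yhat approaches the full OLS value, which is the behavior Table 10.12 illustrates.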
Table 10.9 Example file for Shrinkage Models
___________________________________________________________________
/;
/; Ridge regression also shown
/;
%b34slet dorats =0;
%b34slet doridge=1;
b34sexec options ginclude('gas.b34'); b34srun;
b34sexec matrix;
call loaddata;
call load(ridge :staging);
call load(lasso :staging);
call echooff;
k=6;
call olsq(gasout gasout{1 to k} gasin{1 to k} :print :savex);
%b34sif(&doridge.ne.0)%then;
lamda=2.;
call ridge(%y,%x,lamda,%coef,%names,%lag,ridge_c,0);
call ridge(%y,%x,lamda,%coef,%names,%lag,ridge_c,1);
%b34sendif;
lamda= 10.*sum(%coef)/2.0;
call echooff;
call lasso(%y,%x,%coef,%lcoef1,%l_t1,lamda,lresid1,1);
call lasso(%y,%x,%coef,%lcoef2,%l_t2,lamda,lresid2,3);
call tabulate(%names,%lag,%coef,%se,%t,%lcoef1,%l_t1,%lcoef2,%l_t2);
b34srun;
%b34sif(&dorats.ne.0)%then;
b34sexec options open('rats.dat') unit(28) disp=unknown$ b34srun$
b34sexec options open('rats.in')  unit(29) disp=unknown$ b34srun$
b34sexec options clean(28)$ b34srun$
b34sexec options clean(29)$ b34srun$
/; Uses logic from RATS User's Guide Version 6.1 Page 192
b34sexec pgmcall$
rats passasts
pcomments('* ',
'* Data passed from B34S(r) system to RATS',
'* ',
"display @1 %dateandtime() @33 ' Rats Version ' %ratsversion()"
'* ') $
PGMCARDS$
*
* Non centered results
*
cmoment
# constant gasout{1 to 6} gasin{1 to 6} gasout
linreg(cmoment) gasout
# constant gasout{1 to 6} gasin{1 to 6}
do row=1,13
compute %cmom(row,row)=%cmom(row,row)+2
end do row
linreg(cmoment) gasout
# constant gasout{1 to 6} gasin{1 to 6}
b34sreturn$
b34srun $
b34sexec options close(28)$ b34srun$
b34sexec options close(29)$ b34srun$
b34sexec options
/$ dodos(' rats386 rats.in rats.out ')
dodos('start /w /r rats32s rats.in /run')
dounix('rats rats.in rats.out')$
B34SRUN$
b34sexec options npageout
WRITEOUT('Output from RATS',' ',' ')
COPYFOUT('rats.out')
dodos('ERASE rats.in','ERASE rats.out','ERASE rats.dat')
dounix('rm rats.in','rm rats.out','rm rats.dat')
$ B34SRUN$
%b34sendif;
___________________________________________________________________

The matrix subroutines ridge and lasso that are called are listed in Table 10.10:

Table 10.10 Ridge and Lasso Routines
_____________________________________________________________
subroutine ridge(y,x,lamda,ols_c,_name,_lag,ridge_c,iscale);
/;
/; Ridge Regression. See Hastie-Tibshirani-Friedman (2001) page 60
/;
/; y       => left hand side
/; x       => right hand side - constant at end
/; lamda   => Lamda for Ridge Regression
/; OLS_c   => OLS Coef
/; _name   => Usually %name
/; _lag    => Usually %lag
/; ridge_c => Ridge Coef
/; iscale  => =0 do not center X matrix
/;            =1 center X matrix
/;
/; ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
/; Routine built 24 March 2006
/; ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
newx=mfam(x);
if(iscale.ne.0)then;
call deletecol(newx);
b0=mean(y);
do i=1,nocols(newx);
newx(,i)=newx(,i)-mean(newx(,i));
enddo;
endif;
d=diagmat(vector(nocols(newx):))+lamda;
b_ridge=inv((transpose(newx)*newx) + d)*transpose(newx)*vfam(y);
if(iscale.ne.0)then;
ridge_c=vector(:b_ridge,b0);
call print(' ':);
call print('Ridge Regression: X matrix has been centered ':);
endif;
if(iscale.eq.0)then;
ridge_c=vector(:b_ridge);
call print(' ':);
call print('Ridge Regression: X matrix has not been centered ':);
endif;
call tabulate(_name,_lag,OLS_c,ridge_c);
return;
end;

subroutine lasso(y,x2,olscoef,lcoef,l_t,lamda,lresid,iprint);
/;
/; Implements the LASSO shrinkage Method
/; Reference: Hastie-Tibshirani-Friedman (2001) Page 64, 72 and 77
/;
/; y       => left hand side
/; x2      => Right hand side. Usually %x from :savex
/; olscoef => %coef from call olsq. Used for starting values
/; lcoef   => Lasso Coef
/; l_t     => Lasso Coef t
/; lamda   => Lamda for Lasso Model. Larger Lamda => more shrinkage
/; lresid  => Residual sum of squares from Lasso Model
/; iprint  => =0 No print, use cmaxf2
/;            =1 Print,    use cmaxf2
/;            =2 No print, use maxf2
/;            =3 Print,    use maxf2
/;
/; Note: The constant is not restricted!!
/;       Page 77 illustrates a centered example
/; +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
rvec=array(:olscoef);
ll= array(norows(rvec):)-1.d+32;
uu= array(norows(rvec):)+1.d+32;
x=mfam(x2);
lcoef=vfam(olscoef);
i=integers(norows(olscoef)-1);
if(iprint.eq.0)
call cmaxf2(func :name lasso_1 :parms lcoef :ivalue rvec :maxit 2000 :lower ll :upper uu);
if(iprint.eq.1)
call cmaxf2(func :name lasso_1 :parms lcoef :ivalue rvec :maxit 2000 :lower ll :upper uu :print);
if(iprint.eq.2)
call maxf2(func :name lasso_1 :parms lcoef :ivalue rvec :maxit 2000);
if(iprint.eq.3)
call maxf2(func :name lasso_1 :parms lcoef :ivalue rvec :maxit 2000 :print);
lresid =sumsq(y-x*lcoef);
l_t=%t;
lse=%se;
if(iprint.eq.1.or.iprint.eq.3)then;
call print('Lamda for Lasso model ',lamda:);
call print('Sum of squared Residuals for Lasso Model ',lresid);
endif;
return;
end;

program lasso_1;
func=(-1.0)*(sumsq(y-x*lcoef) + lamda*(sum(abs(lcoef(i)))) );
call outstring(3,3,'Function');
call outdouble(36,3,func);
return;
end;
_____________________________________________________________

Edited output is shown next. First the gas data is used in a VAR model with 6 lags. The condition of X'X was found to be 2.341548639194536E-08. A problem with the ridge approach is the need to set $\lambda$, which was set to 2.0 for purposes of illustration.

Ordinary Least Squares Estimation
Dependent variable                 GASOUT
Centered R**2                      0.9946641074363697
Adjusted R**2                      0.9944329496357792
Residual Sum of Squares            16.13858295915809
Residual Variance                  5.826203234353101E-02
Standard Error                     0.2413752935648779
Total Sum of Squares               3024.532965517241
Log Likelihood                     7.364694190043447
Mean of the Dependent Variable     53.50965517241379
Std. Error of Dependent Variable   3.235044356946151
Sum Absolute Residuals             48.13385294520057
F(12, 277)                         4302.965787421152
F Significance                     1.000000000000000
1/Condition XPX                    2.341548639194536E-08
Maximum Absolute Residual          1.430814663262694
Number of Observations             290

Variable  Lag  Coefficient      SE              t
GASOUT     1   1.5452265       0.59808504E-01   25.836234
GASOUT     2  -0.59293307      0.11024897       -5.3781279
GASOUT     3  -0.17105674      0.11518138       -1.4851076
GASOUT     4   0.13238479      0.11465530        1.1546329
GASOUT     5   0.56869923E-01  0.10083191        0.56400722
GASOUT     6  -0.42085617E-01  0.42891891E-01   -0.98120217
GASIN      1   0.63160860E-01  0.75989856E-01    0.83117489
GASIN      2  -0.13345763      0.16490508       -0.80929968
GASIN      3  -0.44123536      0.18869442       -2.3383593
GASIN      4   0.15200749      0.19021604        0.79913078
GASIN      5  -0.12036440      0.17941884       -0.67085705
GASIN      6   0.24930584      0.10973982        2.2717902
CONSTANT   0   3.8241094       0.85547296        4.4701698

Ridge Regression: X matrix has not been centered

Obs  _NAME     _LAG  OLS_C        RIDGE_C
 1   GASOUT     1     1.545        1.462
 2   GASOUT     2    -0.5929      -0.3270
 3   GASOUT     3    -0.1711      -0.2444
 4   GASOUT     4     0.1324       0.5392E-01
 5   GASOUT     5     0.5687E-01   0.7752E-01
 6   GASOUT     6    -0.4209E-01  -0.2530E-01
 7   GASIN      1     0.6316E-01   0.6540E-01
 8   GASIN      2    -0.1335      -0.1593
 9   GASIN      3    -0.4412      -0.3127
10   GASIN      4     0.1520      -0.7397E-01
11   GASIN      5    -0.1204       0.2525E-01
12   GASIN      6     0.2493       0.4256
13   CONSTANT   0     3.824        0.2010

Ridge Regression: X matrix has been centered

Obs  _NAME     _LAG  OLS_C        RIDGE_C
 1   GASOUT     1     1.545        1.355
 2   GASOUT     2    -0.5929      -0.3117
 3   GASOUT     3    -0.1711      -0.2472
 4   GASOUT     4     0.1324       0.5269E-01
 5   GASOUT     5     0.5687E-01   0.9835E-01
 6   GASOUT     6    -0.4209E-01  -0.3854E-01
 7   GASIN      1     0.6316E-01   0.4999E-01
 8   GASIN      2    -0.1335      -0.1568
 9   GASIN      3    -0.4412      -0.3262
10   GASIN      4     0.1520      -0.1011
11   GASIN      5    -0.1204       0.4107E-02
12   GASIN      6     0.2493       0.2302
13   CONSTANT   0     3.824       53.51

The ridge calculation is validated for the not-centered case by RATS as shown below. For example, note that both B34S and RATS find the coefficient for gasout{1} = 1.46164.

*
* Data passed from B34S(r) system to RATS
*
display @1 %dateandtime() @33 ' Rats Version ' %ratsversion()
08/21/2006 12:05 Rats Version 6.10000
*
CALENDAR(IRREGULAR)
ALLOCATE 296
OPEN DATA rats.dat
DATA(FORMAT=FREE,ORG=OBS, $ MISSING= 0.1000000000000000E+32 ) / $ TIME $ GASIN $ GASOUT $ CONSTANT
SET TREND = T
TABLE
Series   Obs  Mean        Std Error   Minimum     Maximum
TIME     296  148.500000  85.592056   1.000000    296.000000
GASIN    296  -0.056834   1.072766   -2.716000    2.834000
GASOUT   296  53.509122   3.202121   45.600000    60.500000
TREND    296  148.500000  85.592056   1.000000    296.000000
*
* Non centered results
*
cmoment
# constant gasout{1 to 6} gasin{1 to 6} gasout
linreg(cmoment) gasout
# constant gasout{1 to 6} gasin{1 to 6}

Linear Regression - Estimation by Least Squares
Dependent Variable GASOUT
Usable Observations  290   Degrees of Freedom  277
Centered R**2   0.994664   R Bar **2  0.994433
Uncentered R**2 0.999981   T x R**2   289.994
Mean of Dependent Variable       53.509655172
Std Error of Dependent Variable   3.235044357
Standard Error of Estimate        0.241375294
Sum of Squared Residuals         16.138582959
Regression F(12,277)           4302.9658
Significance Level of F           0.00000000
Log Likelihood                    7.36469
Durbin-Watson Statistic           1.990879

Variable        Coeff        Std Error    T-Stat    Signif
*******************************************************************************
1.  Constant    3.824109410  0.855472960   4.47017  0.00001141
2.  GASOUT{1}   1.545226521  0.059808504  25.83623  0.00000000
3.  GASOUT{2}  -0.592933069  0.110248971  -5.37813  0.00000016
4.  GASOUT{3}  -0.171056741  0.115181382  -1.48511  0.13865246
5.  GASOUT{4}   0.132384790  0.114655303   1.15463  0.24923589
6.  GASOUT{5}   0.056869923  0.100831906   0.56401  0.57320557
7.  GASOUT{6}  -0.042085617  0.042891891  -0.98120  0.32734926
8.  GASIN{1}    0.063160860  0.075989856   0.83117  0.40659072
9.  GASIN{2}   -0.133457630  0.164905082  -0.80930  0.41903740
10. GASIN{3}   -0.441235358  0.188694423  -2.33836  0.02008022
11. GASIN{4}    0.152007492  0.190216039   0.79913  0.42489927
12. GASIN{5}   -0.120364395  0.179418842  -0.67086  0.50287058
13. GASIN{6}    0.249305837  0.109739815   2.27179  0.02386617

do row=1,13
(01.0032) compute %cmom(row,row)=%cmom(row,row)+2
(01.0076) end do row
linreg(cmoment) gasout
# constant gasout{1 to 6} gasin{1 to 6}

Linear Regression - Estimation by Least Squares
Dependent Variable GASOUT
Usable Observations  290   Degrees of Freedom  277
Centered R**2   0.994093   R Bar **2  0.993838
Uncentered R**2 0.999979   T x R**2   289.994
Mean of Dependent Variable       53.509655172
Std Error of Dependent Variable   3.235044357
Standard Error of Estimate        0.253955001
Sum of Squared Residuals         17.864600467
Log Likelihood                   -7.36850
Durbin-Watson Statistic           1.714901

Variable        Coeff        Std Error    T-Stat    Signif
*******************************************************************************
1.  Constant    0.201004680  0.175858742   1.14299  0.25402989
2.  GASOUT{1}   1.461646296  0.048558960  30.10045  0.00000000
3.  GASOUT{2}  -0.327011979  0.088129225  -3.71060  0.00024987
4.  GASOUT{3}  -0.244439653  0.089247268  -2.73890  0.00656424
5.  GASOUT{4}   0.053923597  0.088583807   0.60873  0.54320247
6.  GASOUT{5}   0.077517182  0.082098090   0.94420  0.34588925
7.  GASOUT{6}  -0.025297741  0.037997848  -0.66577  0.50611360
8.  GASIN{1}    0.065403718  0.058181817   1.12413  0.26193261
9.  GASIN{2}   -0.159258039  0.106196160  -1.49966  0.13484158
10. GASIN{3}   -0.312688564  0.109538260  -2.85461  0.00463463
11. GASIN{4}   -0.073966322  0.111719455  -0.66207  0.50847551
12. GASIN{5}    0.025246375  0.111576001   0.22627  0.82115779
13. GASIN{6}    0.425619705  0.074577938   5.70705  0.00000003

Greene (2003, 58) remarks of ridge regression that "this biased estimator has a covariance matrix unambiguously smaller ... The tradeoff of some bias for smaller variance may be worth making ... but, nevertheless, economists are generally averse to biased estimators, so this approach has little practical use." Greene's traditional view may be a bit harsh in view of the recent interest in data mining. A major disadvantage of the ridge procedure is the need to set $\lambda$. A possible strategy is to estimate the model over a range of $\lambda$ values. Accuracy problems associated with the estimation of large models might be resolved by use of higher-accuracy methods, such as the QR approach, and higher data precision. These issues are discussed in Chapter 16.

An advantage of the lasso technique lies in its ability to shrink the least significant coefficients toward 0.0. The CMAXF2 and MAXF2 commands are used to solve (10.3-5). Since the constraints were not binding, the observed differences between the two solutions lie in the way the Hessian was calculated.13

Note: Scaled step tolerance satisfied. May be a local solution. Or progress too slow. Adjust STEPTL.
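Before examining the optimizer output, it may help to see the criterion being solved written out directly. The following minimal Python sketch (our illustration with simulated data; B34S itself uses the IMSL-based CMAXF2/MAXF2 commands) minimizes $e'e + \lambda\sum_j|\beta_j|$ with the constant left unpenalized and the OLS coefficients used as starting values, exactly as the lasso routine of Table 10.10 does. A derivative-free method is used here because the absolute-value penalty is not differentiable at zero.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([rng.normal(size=(n, 5)), np.ones(n)])   # constant last
y = X @ np.array([1.5, -0.6, 0.0, 0.0, 0.25, 3.8]) + 0.3 * rng.normal(size=n)
lam = 20.0

def lasso_crit(b):
    # e'e + lam * sum(|b_j|); the constant (last element) is not restricted,
    # mirroring the lasso_1 program above (which maximizes the negative).
    r = y - X @ b
    return r @ r + lam * np.abs(b[:-1]).sum()

b_ols = np.linalg.lstsq(X, y, rcond=None)[0]        # OLS starting values
b_lasso = minimize(lasso_crit, b_ols, method="Nelder-Mead",
                   options={"maxiter": 20000}).x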
Constrained Maximum Likelihood Estimation using CMAXF2 Command

Final Functional Value            -66.33990842435293
# of parameters                    13
# of good digits in function       15
# of iterations                    107
# of function evaluations          366
# of gradiant evaluations          108
Scaled Gradient Tolerance          6.055454452393343E-06
Scaled Step Tolerance              3.666852862501036E-11
Relative Function Tolerance        3.666852862501036E-11
False Convergence Tolerance        2.220446049250313E-14
Maximum allowable step size        4210.1335437160988
Size of Initial Trust region      -1.000000000000000
1 / Cond. of Hessian Matrix        2.703839538993241E-12

 #  Name      Coefficient      Standard Error   T Value
 1  BETA___1   1.0244446       0.27374792E-02    374.22918
 2  BETA___2  -0.25655831E-09  0.27427500E-05   -0.93540535E-04
 3  BETA___3  -0.23134710      0.72168633E-02   -32.056462
 4  BETA___4  -0.25678470E-07  0.11949062E-03   -0.21489946E-03
 5  BETA___5   0.93661382E-11  0.31952100E-02    0.29313060E-08
 6  BETA___6   0.24113886E-01  0.15537732E-02    15.519566
 7  BETA___7  -0.10680745E-10  0.78749958E-06   -0.13562859E-04
 8  BETA___8  -0.41266048E-01  0.52025460E-02   -7.9318949
 9  BETA___9  -0.53869926      0.10659899E-01   -50.535117
10  BETA__10  -0.45525531E-07  0.20988395E-03   -0.21690811E-03
11  BETA__11  -0.23292376E-08  0.84625424E-04   -0.27524088E-04
12  BETA__12   0.16193677E-08  0.59690117E-04    0.27129577E-04
13  BETA__13   9.7543182       0.45478764        21.448072

13 The SE is the square root of the diagonal elements of the inverse of the Hessian. The CMAXF2 and MAXF2 commands use IMSL routines based on NLPQL and ZXLSF, respectively, that use a quasi-Newton (BFGS) method. The Hessian can differ depending on the gradient at the point of the solution.

Gradiant Vector
-0.162525E-01 -0.274755E-02 -8.32499 -6.88956 -0.179800E-01 -20.6699 -11.5486
 9.27715 14.5865 -0.209805E-03 -0.151524E-01 -4.30439 0.160521E-02

Lower vector
-0.100000E+33 -0.100000E+33 -0.100000E+33 -0.100000E+33 -0.100000E+33 -0.100000E+33 -0.100000E+33
-0.100000E+33 -0.100000E+33 -0.100000E+33 -0.100000E+33 -0.100000E+33 -0.100000E+33

Upper vector
 0.100000E+33  0.100000E+33  0.100000E+33  0.100000E+33  0.100000E+33  0.100000E+33  0.100000E+33
 0.100000E+33  0.100000E+33  0.100000E+33  0.100000E+33  0.100000E+33  0.100000E+33

Lamda for Lasso model 22.60966011992043
Sum of squared Residuals for Lasso Model
LRESID = 24.288858

Note: Last global step failed to locate a lower point than the current x value.

Maximum Likelihood Estimation using MAXF2 Command
Finite-difference Gradiant

Final Functional Value            -66.33989973667484
# of parameters                    13
# of good digits in function       15
# of iterations                    105
# of function evaluations          294
# of gradiant evaluations          129
Scaled Gradient Tolerance          6.055454452393343E-06
Scaled Step Tolerance              3.666852862501036E-11
Relative Function Tolerance        3.666852862501036E-11
False Convergence Tolerance        2.220446049250313E-14
Maximum allowable step size        4210.1335437160988
1 / Cond. of Hessian Matrix        9.887930572777443E-07

 #  Name      Coefficient      Standard Error   T Value
 1  BETA___1   1.0246898       0.21438009E-01    47.797807
 2  BETA___2  -0.23461755E-08  0.21492272E-02   -0.10916368E-05
 3  BETA___3  -0.23158361      0.25456171E-01   -9.0973466
 4  BETA___4  -0.78332610E-09  0.66603466E-03   -0.11761041E-05
 5  BETA___5   0.39740723E-10  0.23122965E-02    0.17186690E-07
 6  BETA___6   0.24191727E-01  0.33097359E-01    0.73092619
 7  BETA___7  -0.56468635E-08  0.11352848E-02   -0.49739622E-05
 8  BETA___8  -0.41580631E-01  0.16030380E-01   -2.5938644
 9  BETA___9  -0.53813947      0.63823539E-02   -84.316771
10  BETA__10  -0.34905921E-09  0.86480428E-03   -0.40362799E-06
11  BETA__11  -0.15004931E-08  0.48923809E-03   -0.30669997E-05
12  BETA__12   0.15162698E-08  0.24349930E-03    0.62269984E-05
13  BETA__13   9.7497007       0.93149088        10.466770

SE calculated as sqrt |diagonal(inv(%hessian))|

Gradiant Vector
-0.231362E-02 0.197955E-03 12.6483 15.4705 -0.121604E-02 0.864598 10.5148
-12.2471 -8.01297 -0.692538E-04 0.144096E-02 14.2601 0.106954E-03

Lamda for Lasso model 22.60966011992043
Sum of squared Residuals for Lasso Model
LRESID = 24.281743

Obs  %NAMES    %LAG  %COEF        %SE          %T      %LCOEF1      %L_T1        %LCOEF2      %L_T2
 1   GASOUT     1     1.545       0.5981E-01   25.84    1.024        374.2        1.025        47.80
 2   GASOUT     2    -0.5929      0.1102       -5.378  -0.2566E-09  -0.9354E-04  -0.2346E-08  -0.1092E-05
 3   GASOUT     3    -0.1711      0.1152       -1.485  -0.2313      -32.06       -0.2316      -9.097
 4   GASOUT     4     0.1324      0.1147        1.155  -0.2568E-07  -0.2149E-03  -0.7833E-09  -0.1176E-05
 5   GASOUT     5     0.5687E-01  0.1008        0.5640  0.9366E-11   0.2931E-08   0.3975E-10   0.1719E-07
 6   GASOUT     6    -0.4209E-01  0.4289E-01   -0.9812  0.2411E-01   15.52        0.2419E-01   0.7309
 7   GASIN      1     0.6316E-01  0.7599E-01    0.8312 -0.1068E-10  -0.1356E-04  -0.5647E-08  -0.4974E-05
 8   GASIN      2    -0.1335      0.1649       -0.8093 -0.4127E-01  -7.932       -0.4158E-01  -2.594
 9   GASIN      3    -0.4412      0.1887       -2.338  -0.5387      -50.54       -0.5381      -84.32
10   GASIN      4     0.1520      0.1902        0.7991 -0.4553E-07  -0.2169E-03  -0.3491E-09  -0.4036E-06
11   GASIN      5    -0.1204      0.1794       -0.6709 -0.2329E-08  -0.2752E-04  -0.1500E-08  -0.3067E-05
12   GASIN      6     0.2493      0.1097        2.272   0.1619E-08   0.2713E-04   0.1516E-08   0.6227E-05
13   CONSTANT   0     3.824       0.8555        4.470   9.754        21.45        9.750        10.47

At a cost of $e'e$ rising from 16.13858 to 24.28886, insignificant parameters such as GASIN{1} were reduced substantially, i.e., from 0.6316E-01 to -0.1068E-10. We are left with a substantially reduced model in which GASOUT{1, 3} and GASIN{3} were shown to be driving the model.

The GAS model is next viewed from the perspective of the LTS approach. Here outliers are trimmed from the dataset and the results inspected to see how much difference their removal makes.14 The analysis uses the lts and lts_rec subroutines listed in Table 10.11.

Table 10.11 LTS and LTS_REC Routines for Resistant Estimation
__________________________________________________________
subroutine lts(y,x,resid,names,lags,oldcoef,oldse,oldt,p,sort_y,
 sort_x,ihold,iprint);
/;
/; Least Trimmed Squares
/;
/; Reference: "Linear Models with R" By Julian Faraway
/;            Page 101
/;
/; y       => left hand side. Usually %y
/; x       => right hand side. Usually %x
/; resid   => Residual from original regression. Usually %res
/; names   => Names from regression. Usually %names
/; lags    => lags from original regression. Usually %lags
/; oldcoef => Coefficient for full model. Usually %coef
/; oldse   => SE from original Model. Usually %se
/; oldt    => t from original model.
Usually %t
/; p       => fraction of the sample retained; (1-p) is trimmed
/; sort_y  => Y after sorting and truncation
/; sort_x  => X after sorting and truncation
/; ihold   => # of obs held out
/; iprint  => If > 0 => print.
/;
/; Help on how to recover coef outside of routine
/; Assume:
/;
/; n=6;
/; call olsq(gasout gasin{1 to n} gasout{1 to n} :print :savex);
/; call lts(%y,%x,%res,%names,%lag,%coef,%se,%t,p,newy,newx,
/;          ihold,iprint);
/; option # 1
/; call olsq(newy newx :noint :print :holdout ihold);
/; option # 2
/; n=norows(newy);
/; call deleterow(newy,n-ihold+1,ihold);
/; call deleterow(newx,n-ihold+1,ihold);
/; call print(inv(transpose(newx)*newx)*transpose(newx)*newy);
/;
/; -----------------------------------------------------------
/;
/; LTS is an example of a resistant regression method.
/; The objective is to see how sensitive the results are
/; to outliers.
/;
/; Built 21 June 2007 by Houston H. Stokes
/; Mods  23 June 2007
/;
n=norows(x);
if(p.lt.0. .or. p .gt.1.)then;
call epprint('p not in range 0 lt p le 1. Was ',p:);
go to done;
endif;
r=afam(resid)*afam(resid);
j=ranker(r);
sort_x=x(j,);
sort_y=y(j);
ihold=idint((1.0-p)*dfloat(n));
if(iprint.eq.0) call olsq(sort_y sort_x :noint :qr :holdout ihold);
if(iprint.ne.0)then;
call print(' ':);
call print('Least Trimmed Squares with holdout ',ihold:);
call olsq(sort_y sort_x :noint :print :qr :holdout ihold);
call print(' ':);
call print('Least Trimmed Squares with holdout % ',(1.0-p):);
call tabulate(names lags oldcoef oldse oldt %coef %se %t :cname);
endif;
done continue;
return;
end;

14 Since the larger residuals have been removed, the t statistics usually increase as the number of observations dropped is increased.

program lts_rec;
/;
/; Does Recursive lts
/;
/; Needs the following set:
/;
/; oldcoef =1;
/; n_recur =6;
/;
/; oldcoef = 0 => use prior LTS coefficients for the table
/; oldcoef = 1 => use the base OLS coefficients for the table
/; n_recur =   => Sets number of recursive LTS estimates
/;
/; example of use:
/;
/; b34sexec matrix;
/; call loaddata;
/; call load(lts :staging);
/; call load(lts_rec :staging);
/; call echooff;
/; n=6;
/; p=.8;
/; iprint=1;
/; call olsq(gasout gasin{1 to n} gasout{1 to n} :print :savex);
/; call lts(%y,%x,%res,%names,%lag,%coef,%se,%t,p,newy,newx,
/;          ihold,iprint);
/; oldcoef=1;
/; n_recur=6;
/; call lts_rec;
/; b34srun;
/;
/; Built 23 June 2007
/; -----------------------------------------------------------
/;
holdname=%names;
holdlag =%lag;
holdcoef=%coef;
holdse  =%se;
holdt   =%t;
do i=1,n_recur;
/; logic: rerun to get %coef etc. and proceed
call olsq(newy,newx,:noint :savex :holdout ihold);
yy=newy;
xx=newx;
n_y=norows(yy);
call deleterow(yy,n_y-ihold+1,ihold);
call deleterow(xx,n_y-ihold+1,ihold);
if(oldcoef.eq.1)then;
%coef=holdcoef;
%se  =holdse;
%t   =holdt;
endif;
call print(' ':);
if(oldcoef.eq.0)call print('Oldcoef is Prior LTS Coef':);
if(oldcoef.eq.1)call print('Oldcoef is OLS Coef':);
call print('Recursive Trimmed Squares pass ',i:);
call lts(yy,xx,%res,%names,%lag,%coef,%se,%t,p,newy,
 newx,ihold,iprint);
enddo;
return;
end;
__________________________________________________________

A sample job that does both LTS and recursive LTS estimation is shown in Table 10.12,

Table 10.12 Estimation of LTS Based Models
____________________________________
b34sexec options ginclude('gas.b34'); b34srun;
b34sexec matrix;
call loaddata;
call load(lts :staging);
call load(lts_rec :staging);
call echooff;
n=6;
p=.8;
iprint=1;
call olsq(gasout gasin{1 to n} gasout{1 to n} :print :savex);
call lts(%y,%x,%res,%names,%lag,%coef,%se,%t,p,newy,newx,ihold,iprint);
/;
/; Recursive
/;
oldcoef=1;
n_recur=2;
call lts_rec;
b34srun;
____________________________________

and when run produces edited output:

Least Trimmed Squares with holdout = 57

Ordinary Least Squares Estimation using QR Method
Dependent variable                 SORT_Y
Centered R**2                      0.9986167752837815
Adjusted R**2                      0.9985413266628969
Residual Sum of Squares            3.228594427396152
Residual Variance                  1.467542921543706E-02
Standard Error                     0.1211421859446042
Total Sum of Squares               2334.106952789699
Log Likelihood                     167.8898399989807
Mean of the Dependent Variable     53.32103004291845
Std. Error of Dependent Variable   3.171877335426148
Sum Absolute Residuals             22.80572349773055
F(12, 220)                         13235.71940182378
F Significance                     1.000000000000000
QR Rank Check variable (eps) set as 2.220446049250313E-16
Maximum Absolute Residual          0.3122580017218866
Number of Observations             233

Variable  Lag  Coefficient      SE              t
Col____1   0   0.25577822E-01  0.45056830E-01   0.56767914
Col____2   0  -0.16805808E-01  0.10722801      -0.15672965
Col____3   0  -0.61852464      0.12362575      -5.0032025
Col____4   0   0.26140688      0.12062224       2.1671532
Col____5   0  -0.45601485E-01  0.11847044      -0.38491867
Col____6   0   0.17696140      0.65535734E-01   2.7002277
Col____7   0   1.5118237       0.43583183E-01   34.688234
Col____8   0  -0.46466148      0.79569077E-01  -5.8397244
Col____9   0  -0.30331998      0.74393105E-01  -4.0772594
Col___10   0   0.20943964      0.75542966E-01   2.7724572
Col___11   0  -0.88668066E-04  0.67338641E-01  -0.13167487E-02
Col___12   0  -0.18285169E-01  0.27449928E-01  -0.66612815
Col___13   0   3.4737529       0.57869901       6.0026936

Least Trimmed Squares with holdout % 0.2000000000000000

Obs  NAMES     LAGS  OLDCOEF      OLDSE        OLDT     %COEF        %SE          %T
 1   GASIN      1     0.6316E-01  0.7599E-01    0.8312   0.2558E-01  0.4506E-01    0.5677
 2   GASIN      2    -0.1335      0.1649       -0.8093  -0.1681E-01  0.1072       -0.1567
 3   GASIN      3    -0.4412      0.1887       -2.338   -0.6185      0.1236       -5.003
 4   GASIN      4     0.1520      0.1902        0.7991   0.2614      0.1206        2.167
 5   GASIN      5    -0.1204      0.1794       -0.6709  -0.4560E-01  0.1185       -0.3849
 6   GASIN      6     0.2493      0.1097        2.272    0.1770      0.6554E-01    2.700
 7   GASOUT     1     1.545       0.5981E-01   25.84     1.512       0.4358E-01   34.69
 8   GASOUT     2    -0.5929      0.1102       -5.378   -0.4647      0.7957E-01   -5.840
 9   GASOUT     3    -0.1711      0.1152       -1.485   -0.3033      0.7439E-01   -4.077
10   GASOUT     4     0.1324      0.1147        1.155    0.2094      0.7554E-01    2.772
11   GASOUT     5     0.5687E-01  0.1008        0.5640  -0.8867E-04  0.6734E-01   -0.1317E-02
12   GASOUT     6    -0.4209E-01  0.4289E-01   -0.9812  -0.1829E-01  0.2745E-01   -0.6661
13   CONSTANT   0     3.824       0.8555        4.470    3.474       0.5787        6.003

Note how the significance of most variables increased, as expected.
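The trimming logic that produced these results can be sketched in a few lines. The following Python fragment (our illustration under hypothetical names; it is not the B34S code) ranks observations by squared OLS residual, keeps the smallest $p$ share, and refits, which is what the lts subroutine does via ranker and a :holdout regression; the recursive variant of lts_rec simply feeds the trimmed sample back in.

import numpy as np

def lts_pass(y, X, p=0.8):
    # One LTS-style pass: fit OLS, rank by squared residual, drop the
    # worst (1-p) share (the "holdout"), refit on the remainder.
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    r2 = (y - X @ b) ** 2
    keep = np.argsort(r2)[: len(y) - int((1.0 - p) * len(y))]
    return np.linalg.lstsq(X[keep], y[keep], rcond=None)[0], keep

# Hypothetical usage, including one recursive pass as in lts_rec:
rng = np.random.default_rng(2)
X = np.column_stack([rng.normal(size=(100, 2)), np.ones(100)])
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=100)
y[:5] += 25.0                                  # plant a few outliers
b1, keep = lts_pass(y, X, p=0.8)               # first pass
b2, _ = lts_pass(y[keep], X[keep], p=0.8)      # recursive pass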
If in fact true outliers were removed by the LTS algorithm, then the LTS estimated coefficients might be used in place of the OLS coefficients for any out-of-sample forecasting. Conversely, coefficients that obtained their significance from the outliers will, in many cases, be seen to lose significance in the LTS model once those outliers are removed. There is no reason why the LTS algorithm might not be applied again. As an example, if LTS is applied two more times the results are:

Oldcoef is OLS Coef
Recursive Trimmed Squares pass 1
Least Trimmed Squares with holdout 46

Ordinary Least Squares Estimation using QR Method
Dependent variable                 SORT_Y
Centered R**2                      0.9993441473606075
Adjusted R**2                      0.9992989161440977
Residual Sum of Squares            1.276018688275978
Residual Variance                  7.333440737218262E-03
Standard Error                     8.563551095905403E-02
Total Sum of Squares               1945.587486631016
Log Likelihood                     200.9770082868623
Mean of the Dependent Variable     53.34973262032085
Std. Error of Dependent Variable   3.234215171813110
Sum Absolute Residuals             13.09864156581015
F(12, 174)                         22094.12490914283
F Significance                     1.000000000000000
QR Rank Check variable (eps) set as 2.220446049250313E-16
Maximum Absolute Residual          0.1621619933634477
Number of Observations             187

Variable  Lag  Coefficient      SE              t
Col____1   0   0.26812228E-01  0.38419042E-01   0.69788903
Col____2   0   0.26604491E-01  0.89371617E-01   0.29768389
Col____3   0  -0.69665370      0.98456422E-01  -7.0757568
Col____4   0   0.30968284      0.91216416E-01   3.3950341
Col____5   0  -0.52287859E-01  0.87851941E-01  -0.59518160
Col____6   0   0.16948075      0.50876822E-01   3.3311977
Col____7   0   1.4767408       0.36399384E-01   40.570487
Col____8   0  -0.37286811      0.65275731E-01  -5.7122013
Col____9   0  -0.39747469      0.64883283E-01  -6.1259954
Col___10   0   0.26441683      0.63975923E-01   4.1330679
Col___11   0  -0.22112572E-01  0.54621589E-01  -0.40483208
Col___12   0  -0.14860658E-01  0.22135590E-01  -0.67134681
Col___13   0   3.5218606       0.49989539       7.0451953

Least Trimmed Squares with holdout % 0.2000000000000000

Obs  NAMES     LAGS  OLDCOEF      OLDSE        OLDT     %COEF        %SE          %T
 1   Col____1   0     0.6316E-01  0.7599E-01    0.8312   0.2681E-01  0.3842E-01    0.6979
 2   Col____2   0    -0.1335      0.1649       -0.8093   0.2660E-01  0.8937E-01    0.2977
 3   Col____3   0    -0.4412      0.1887       -2.338   -0.6967      0.9846E-01   -7.076
 4   Col____4   0     0.1520      0.1902        0.7991   0.3097      0.9122E-01    3.395
 5   Col____5   0    -0.1204      0.1794       -0.6709  -0.5229E-01  0.8785E-01   -0.5952
 6   Col____6   0     0.2493      0.1097        2.272    0.1695      0.5088E-01    3.331
 7   Col____7   0     1.545       0.5981E-01   25.84     1.477       0.3640E-01   40.57
 8   Col____8   0    -0.5929      0.1102       -5.378   -0.3729      0.6528E-01   -5.712
 9   Col____9   0    -0.1711      0.1152       -1.485   -0.3975      0.6488E-01   -6.126
10   Col___10   0     0.1324      0.1147        1.155    0.2644      0.6398E-01    4.133
11   Col___11   0     0.5687E-01  0.1008        0.5640  -0.2211E-01  0.5462E-01   -0.4048
12   Col___12   0    -0.4209E-01  0.4289E-01   -0.9812  -0.1486E-01  0.2214E-01   -0.6713
13   Col___13   0     3.824       0.8555        4.470    3.522       0.4999        7.045

Oldcoef is OLS Coef
Recursive Trimmed Squares pass 2
Least Trimmed Squares with holdout 37

Ordinary Least Squares Estimation using QR Method
Dependent variable                 SORT_Y
Centered R**2                      0.9996155075367260
Adjusted R**2                      0.9995818293647604
Residual Sum of Squares            0.5880635102914726
Residual Variance                  4.292434381689581E-03
Standard Error                     6.551667254744842E-02
Total Sum of Squares               1529.453933333333
Log Likelihood                     202.7758915427529
Mean of the Dependent Variable     53.14733333333334
Std. Error of Dependent Variable   3.203871329950913
Sum Absolute Residuals             7.986427256057823
F(12, 137)                         29681.40635892062
F Significance                     1.000000000000000
QR Rank Check variable (eps) set as 2.220446049250313E-16
Maximum Absolute Residual          0.1219604495012589
Number of Observations             150

Variable  Lag  Coefficient      SE              t
Col____1   0   0.14701514E-01  0.32385411E-01   0.45395483
Col____2   0   0.78455632E-01  0.75255516E-01   1.0425233
Col____3   0  -0.75794272      0.81201400E-01  -9.3341090
Col____4   0   0.32497672      0.73417337E-01   4.4264302
Col____5   0  -0.25320706E-01  0.71378642E-01  -0.35473785
Col____6   0   0.15306172      0.41052916E-01   3.7284008
Col____7   0   1.4458257       0.32552851E-01   44.414719
Col____8   0  -0.28187578      0.59770228E-01  -4.7159898
Col____9   0  -0.49898873      0.57220844E-01  -8.7204014
Col___10   0   0.32026947      0.54902520E-01   5.8334202
Col___11   0  -0.35984380E-01  0.47938922E-01  -0.75062973
Col___12   0  -0.15499016E-01  0.19306494E-01  -0.80278768
Col___13   0   3.5229776       0.40461601       8.7069654

Least Trimmed Squares with holdout % 0.2000000000000000

Obs  NAMES     LAGS  OLDCOEF      OLDSE        OLDT     %COEF        %SE          %T
 1   Col____1   0     0.6316E-01  0.7599E-01    0.8312   0.1470E-01  0.3239E-01    0.4540
 2   Col____2   0    -0.1335      0.1649       -0.8093   0.7846E-01  0.7526E-01    1.043
 3   Col____3   0    -0.4412      0.1887       -2.338   -0.7579      0.8120E-01   -9.334
 4   Col____4   0     0.1520      0.1902        0.7991   0.3250      0.7342E-01    4.426
 5   Col____5   0    -0.1204      0.1794       -0.6709  -0.2532E-01  0.7138E-01   -0.3547
 6   Col____6   0     0.2493      0.1097        2.272    0.1531      0.4105E-01    3.728
 7   Col____7   0     1.545       0.5981E-01   25.84     1.446       0.3255E-01   44.41
 8   Col____8   0    -0.5929      0.1102       -5.378   -0.2819      0.5977E-01   -4.716
 9   Col____9   0    -0.1711      0.1152       -1.485   -0.4990      0.5722E-01   -8.720
10   Col___10   0     0.1324      0.1147        1.155    0.3203      0.5490E-01    5.833
11   Col___11   0     0.5687E-01  0.1008        0.5640  -0.3598E-01  0.4794E-01   -0.7506
12   Col___12   0    -0.4209E-01  0.4289E-01   -0.9812  -0.1550E-01  0.1931E-01   -0.8028
13   Col___13   0     3.824       0.8555        4.470    3.523       0.4046        8.707

where the original OLS results are listed on the left. Note how the t scores are moving upward. Tables 10.13 and 10.14 list subroutines that implement boosting and modified stagewise boosting. Table 10.15 contains the test case code, which performs 100 iterations with $\varepsilon = .5$. Figure 10.12 shows how the correlation of $y$ and $\hat{y}_{j,.}$ increases as the iterations proceed. The "best" correlation obtained for OLS was .719547.

Table 10.13 Boosting Routine
___________________________________________________________
subroutine boost(y,yhat,res,x,in,e,ipass,itype,iprint);
/;
/; Implements OLS boosting as described in "Least Angle Regression"
/; by Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani
/; The Annals of Statistics Vol 32 No. 2 (April 2004) pp 407-451
/;
/; y      => left hand variable
/; yhat   => Forecasted y
/; res    => error
/; x      => n by k matrix of right hand side variables
/; in     => work vector telling what vectors in X used
/;           in = 0 implies that vector in
/; e      => Step size - use a small number
/; ipass  => Number of times called. Be sure called enough times;
/;           one way to test this is to monitor progress in the
/;           correlation improvement
/; itype  => =0 OLS, =1 MARS, =2 GAM, =3 L1, =4 minimax
/; iprint => =0 no printing, =1 print step data, =2 print correlations
/;
/; Note: X must be centered. Y must have mean removed.
/;
cc=array(nocols(x):);
if(ipass.eq.1)then;
in=array(nocols(x):)+1.;
yhat=y*0.0;
res=y;
endif;
do i=1,nocols(x);
cc(i)=ccf(res,x(,i));
enddo;
ij=imax(abs(cc));
if(iprint.eq.2)then;
call print('Largest Correlation vector was ',ij:);
call tabulate(in,cc);
endif;
if(iprint.eq.0)then;
if(itype.eq.0)call olsq(     res,x(,ij) :noint);
if(itype.eq.1)call marspline(res,x(,ij));
if(itype.eq.2)call gamfit(   res,x(,ij)[predictor,3] :noint);
if(itype.eq.3)call olsq(     res,x(,ij) :noint :l1);
if(itype.eq.4)call olsq(     res,x(,ij) :noint :minimax);
endif;
if(iprint.ne.0)then;
if(itype.eq.0)call olsq(     res,x(,ij) :noint :print);
if(itype.eq.1)call marspline(res,x(,ij) :print);
if(itype.eq.2)call gamfit(   res,x(,ij)[predictor,3] :noint :print);
if(itype.eq.3)call olsq(     res,x(,ij) :noint :print :l1);
if(itype.eq.4)call olsq(     res,x(,ij) :noint :print :minimax);
endif;
if(itype.eq.3)%yhat=%l1yhat;
if(itype.eq.4)%yhat=%mmyhat;
yhat=yhat+ afam(e)*afam(%yhat);
res=y-yhat;
return;
end;
____________________________________________________________

Table 10.14 Modified Forward Stagewise model boosting
___________________________________________________________
subroutine boost2(y,yhat,res,x,xbuild,in,e,ipass,itype,iprint);
/;
/; Modified OLS boosting as described in "Least Angle Regression"
/; by Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani
/; The Annals of Statistics Vol 32 No. 2 (April 2004) pp 407-451
/;
/; y      => left hand variable
/; yhat   => Forecasted y
/; res    => error
/; x      => n by k matrix of right hand side variables
/; in     => work vector telling what vectors in X used; in set = 0
/; e      => Step size
/; ipass  => Number of times called
/; itype  => =0 OLS, =1 MARS, =2 GAM, =3 L1, =4 Minimax
/; iprint => =0 no printing, =1 print step data, =2 print correlations
/;
/; Note: X must be centered. Y must have mean removed.
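/;
/; How boost2 differs from boost (Table 10.13): each newly selected
/; column of x is appended to xbuild, and the do j=1,nocols(xbuild)
/; loop below then takes a small step of size e on every column
/; already in the model at each pass, not just on the newest one.
/; This is the modified forward stagewise variant described in the text.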
/;
cc=array(nocols(x):);
if(ipass.eq.1)then;
in=array(nocols(x):)+1.;
yhat=y*0.0;
res=y;
endif;
do i=1,nocols(x);
cc(i)=ccf(res,x(,i));
enddo;
ij=imax(abs(cc));
if(iprint.eq.2)then;
call print('Largest Correlation vector was ',ij:);
call tabulate(in,cc);
endif;
if(iprint.ne.0)call print(ipass,cc,ij);
if(ipass.eq.1) xbuild=array(norows(x),1:x(,ij));
if(ipass.gt.1.and.in(ij).ne.0.0)xbuild=catcol(xbuild,x(,ij));
in(ij)=0.0;
do j=1,nocols(xbuild);
if(iprint.eq.0)then;
if(itype.eq.0)call olsq(     res,xbuild(,j) :noint);
if(itype.eq.1)call marspline(res,xbuild(,j));
if(itype.eq.2)call gamfit(   res,xbuild(,j)[predictor,3] :noint);
if(itype.eq.3)call olsq(     res,xbuild(,j) :noint :l1);
if(itype.eq.4)call olsq(     res,xbuild(,j) :noint :minimax);
endif;
if(iprint.ne.0)then;
if(itype.eq.0)call olsq(     res,xbuild(,j) :noint :print);
if(itype.eq.1)call marspline(res,xbuild(,j) :print);
if(itype.eq.2)call gamfit(   res,xbuild(,j)[predictor,3] :noint :print);
if(itype.eq.3)call olsq(     res,xbuild(,j) :noint :print :l1);
if(itype.eq.4)call olsq(     res,xbuild(,j) :noint :print :minimax);
endif;
if(itype.eq.3)%yhat=%l1yhat;
if(itype.eq.4)%yhat=%mmyhat;
yhat=yhat+ afam(e)*afam(%yhat);
res=y-yhat;
enddo;
return;
end;
_____________________________________________________________________________

Table 10.15 Boosting Test Case
_________________________________________________________________
b34sexec options ginclude('b34sdata.mac') member(efron_1); b34srun;
/; b34sexec options ginclude('b34sdata.mac') member(efron_2); b34srun;
/;
/; Implements OLS boosting as described in "Least Angle Regression"
/; by Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani
/; The Annals of Statistics Vol 32 No. 2 (April 2004) pp 407-451
/;
/; Modified boosting also available if iboost set = 1
/;
b34sexec matrix;
call loaddata;
call load(center  :staging);
call load(center2 :staging);
call load(boost   :staging);
call echooff;
/; iboost=0 => ols boost
/; iboost=1 => modified OLS boost
iboost=0;
/; iboost=1;
e=.5;
ntry=100;
/; itype=0 => ols
/; itype=1 => mars
/; itype=2 => gam
/; itype=3 => l1
/; itype=4 => minimax
itype=0;
iprint=0;
x=center2(catcol(age sex bmi bp s1 s2 s3 s4 s5 s6));
y=y-mean(y);
/; Set up the base case.
if(itype.eq.0)call olsq(y,x :noint :print);
if(itype.eq.1)call marspline(y x :print :nk 20);
if(itype.eq.2)call gamfit(y x[predictor,3] :print);
if(itype.eq.3)call olsq(y,x :noint :print :l1);
if(itype.eq.4)call olsq(y,x :noint :print :minimax);
call print(ccf(%y,%yhat));
base1=ccf(%y,%yhat);
fit=array(ntry:);
do i=1,ntry;
if(iboost.eq.0)call boost( y,yhat,res,x,       in,e,i,itype,iprint);
if(iboost.eq.1)call boost2(y,yhat,res,x,xbuild,in,e,i,itype,iprint);
/; if(iprint.ne.0)call graph(y,yhat);
fit(i)=ccf(y,yhat);
call outstring(1,2,'Iteration');
call outinteger(40,2,i);
call outstring(1,3,'Correlation');
call outdouble(40,3,fit(i));
enddo;
base=array(norows(fit):)+base1;
if(itype.eq.0)call char1(cc1,'OLS'    );
if(itype.eq.1)call char1(cc1,'MARS'   );
if(itype.eq.2)call char1(cc1,'GAM'    );
if(itype.eq.3)call char1(cc1,'L1'     );
if(itype.eq.4)call char1(cc1,'MINIMAX');
cc=' Boosting based Correlation of y and yhat given eps =';
if(iboost.eq.1)
cc=' Modified Boosting based Correlation of y and yhat given eps =';
call ir8tostr(e,cc2,'(f8.4)');
cc=catrow(cc1,cc,cc2);
call graph(base,fit :heading cc :file 'boost.wmf' :nocontact :nolabel :pgborder);
call print(fit);
b34srun;
_________________________________________________________________

(Figure: "OLS Boosting based Correlation of y and yhat given eps = 0.5000"; the BASE and FIT series are plotted against Obs = 0 to 100, with the vertical scale running from .58 to .72.)

Figure 10.12 OLS Boosting Example

Using Efron's test data and setting $\varepsilon = .5$, after 100 iterations the correlation for boosting was .717720, as shown below:

Ordinary Least Squares Estimation
Dependent variable                 Y
Centered R**2                      0.5177484222203508
Adjusted R**2                      0.5077015143499414
Residual Sum of Squares            1263985.785633341
Residual Variance                  2925.893022299400
Standard Error                     54.09152449598182
Total Sum of Squares               2621009.124434389
Log Likelihood                    -2385.992862123519
Mean of the Dependent Variable    -1.196026686437816E-14
Std. Error of Dependent Variable   77.09300453299109
Sum Absolute Residuals             19128.63379518925
F( 9, 432)                         51.53311137103671
F Significance                     1.000000000000000
1/Condition XPX                    1.708266908084447E-03
Maximum Absolute Residual          155.8267661215523
Number of Observations             442

Variable  Lag  Coefficient   SE          t
Col____1   0   -10.009866    59.680052  -0.16772549
Col____2   0   -239.81564    61.151444  -3.9216677
Col____3   0    519.84592    66.456394   7.8223613
Col____4   0    324.38465    65.346228   4.9640913
Col____5   0   -792.17564    416.19732  -1.9033655
Col____6   0    476.73902    338.63787   1.4078137
Col____7   0    101.04327    212.28533   0.47597857
Col____8   0    177.06324    161.28879   1.0978025
Col____9   0    751.27370    171.70091   4.3754789
Col___10   0    67.626692    65.907867   1.0260792

0.71954737

FIT = Array of 100 elements
0.586450 0.676525 0.691952 0.700391 0.700302 0.706626 0.705759 0.706539
0.707305 0.710165 0.712198 0.712596 0.713552 0.713761 0.713940 0.714881
0.715390 0.715387 0.715911 0.715876 0.716153 0.716215 0.716437 0.716488
0.716644 0.716694 0.716811 0.716909 0.716942 0.716963 0.717037 0.717096
0.717120 0.717162 0.717174 0.717203 0.717220 0.717249 0.717265 0.717294
0.717309 0.717326 0.717342 0.717356 0.717374 0.717381 0.717390 0.717397
0.717412 0.717418 0.717429 0.717434 0.717444 0.717454 0.717459 0.717469
0.717472 0.717482 0.717487 0.717497 0.717500 0.717509 0.717517 0.717522
0.717527 0.717531 0.717538 0.717543 0.717548 0.717555 0.717559 0.717567
0.717570 0.717577 0.717581 0.717588 0.717593 0.717600 0.717607 0.717611
0.717617 0.717620 0.717627 0.717632 0.717638 0.717645 0.717648 0.717652
0.717660 0.717665 0.717671 0.717675 0.717682 0.717688 0.717693 0.717696
0.717703 0.717708 0.717713 0.717720

If the iterations are increased to 1900, the correlation increases to .719537, just a shade below the OLS value of .719547.

Table 10.16 Modifications to OLS Boosting to Facilitate Forecasting
________________________________________________________________________
subroutine boost3(y,yhat,res,x,in,beta1,beta2,e,ipass,itype,iprint);
/;
/; Implements a mod to OLS boosting as described in
/; "Least Angle Regression"
/; by Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani
/; The Annals of Statistics Vol 32 No. 2 (April 2004) pp 407-451
/;
/; ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
/; HHS Mods involve use of constant and no mean adjustment
/; Goal is to facilitate forecasting using boost4 routine
/; in, beta1, beta2 used as input into boost4
/; ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
/;
/; y      => left hand variable
/; yhat   => Forecasted y
/; res    => error
/; x      => n by k matrix of right hand side variables
/; in     => work vector telling vector in X used
/; beta1  => saves the beta
/; beta2  => saves the constant
/; e      => Step size - use a small number
/; ipass  => Number of times called. Be sure called enough times;
/;           one way to test this is to monitor progress in the
/;           correlation improvement
/; itype  => =0 OLS, =1 MARS, =2 GAM, =3 L1, =4 minimax
/; iprint => =0 no printing, =1 print step data, =2 print correlations
/;
/; +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
/;
cc=array(nocols(x):);
if(itype.eq.1.or.itype.eq.2)then;
call print('ERROR: ITYPE in boost3 cannot be 1 or 2':);
call stop;
endif;
if(ipass.eq.1)then;
yhat=y*0.0;
res=y;
endif;
do i=1,nocols(x);
cc(i)=ccf(res,x(,i));
enddo;
ij=imax(abs(cc));
in(ipass)=ij;
if(iprint.eq.2)then;
call print('Largest Correlation vector was ',ij:);
call tabulate(in,cc);
endif;
if(iprint.eq.0)then;
if(itype.eq.0)call olsq( res,x(,ij) );
/; if(itype.eq.1)call marspline(res,x(,ij) );
/; if(itype.eq.2)call gamfit( res,x(,ij)[predictor,3] :noint);
if(itype.eq.3)call olsq( res,x(,ij) :l1);
if(itype.eq.4)call olsq( res,x(,ij) :minimax);
endif;
if(iprint.ne.0)then;
if(itype.eq.0)call olsq( res,x(,ij) :print);
/; if(itype.eq.1)call marspline(res,x(,ij) :print );
/; if(itype.eq.2)call gamfit( res,x(,ij)[predictor,3] :noint :print);
if(itype.eq.3)call olsq( res,x(,ij) :print :l1);
if(itype.eq.4)call olsq( res,x(,ij) :print :minimax);
endif;
if(itype.eq.0)then;
beta1(ipass)=%coef(1);
beta2(ipass)=%coef(2);
endif;
if(itype.eq.3)then;
%yhat=%l1yhat;
beta1(ipass)=%l1coef(1);
beta2(ipass)=%l1coef(2);
endif;
if(itype.eq.4)then;
%yhat=%mmyhat;
beta1(ipass)=%mmcoef(1);
beta2(ipass)=%mmcoef(2);
endif;
yhat=yhat+ afam(e)*afam(%yhat);
res=y-yhat;
return;
end;

subroutine boost4(yhat,x,in,beta1,beta2,e);
/;
/; HHS boosting Forecasting for model estimated with boost3
/; ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
/; HHS Mods involve use of constant and no mean adjustment
/; ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
/;
/; yhat  => Forecasted y
/; x     => n by k matrix of right hand side variables
/; in    => work vector telling what vectors in X used
/; beta1 => Beta Coef
/; beta2 => Constant coef
/; e     => Step size used
/;
/; +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
/;
yhat=array(norows(x):);
do i=1,norows(beta1);
add=(beta1(i)*x(,in(i)))+beta2(i);
yhat=yhat+ afam(e)*afam(add);
enddo;
return;
end;
_______________________________________________________________________

Table 10.17 Forecasting an OLS Boosting Model
________________________________________________________________________
b34sexec options ginclude('b34sdata.mac') member(efron_1); b34srun;
/; b34sexec options ginclude('b34sdata.mac') member(efron_2); b34srun;
/;
/; Implements OLS boosting as described in "Least Angle Regression"
/; by Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani
/; The Annals of Statistics Vol 32 No. 2 (April 2004) pp 407-451
/;
/; Modified boosting also available if iboost set = 1
/;
b34sexec matrix;
call loaddata;
call load(center  :staging);
call load(center2 :staging);
call load(boost   :staging);
call echooff;
/; itype=0 => ols
/; itype=1 => mars not ready
/; itype=2 => gam not ready
/; itype=3 => L1
/; itype=4 => MINIMAX
itype=0;
iprint=0;
e=.5 ;
ntry=1000;
x=catcol(age sex bmi bp s1 s2 s3 s4 s5 s6);
/; Gets the base
if(itype.eq.0)call olsq(y,x :print);
if(itype.eq.1)call marspline(y x :print :nk 20);
if(itype.eq.2)call gamfit(y x[predictor,3] :print);
if(itype.eq.3)call olsq(y,x :print :l1);
if(itype.eq.4)call olsq(y,x :print :minimax);
call print(ccf(%y,%yhat));
base1=ccf(%y,%yhat);
fit  =array(ntry:);
beta1=array(ntry:);
beta2=array(ntry:);
in   =idint(array(ntry:));
do i=1,ntry;
call boost3(y,yhat,res,x,in,beta1,beta2,e,i,itype,iprint);
fit(i)=ccf(y,yhat);
call outstring(1,2,'Iteration');
call outinteger(20,2,i);
call outstring(1,3,'Correlation');
call outdouble(20,3,fit(i));
enddo;
call tabulate(in,fit,beta1,beta2);
/; Testing if can "forecast" using saved Model
call boost4(yhat2,x,in,beta1,beta2,e);
call tabulate(yhat,yhat2);
base=array(norows(fit):)+base1;
if(itype.eq.0)call char1(cc1,'OLS'    );
if(itype.eq.1)call char1(cc1,'MARS'   );
if(itype.eq.2)call char1(cc1,'GAM'    );
if(itype.eq.3)call char1(cc1,'L1'     );
if(itype.eq.4)call char1(cc1,'MINIMAX');
cc=' HHS Boosting based Correlation of y and yhat given eps =';
call ir8tostr(e,cc2,'(f8.4)');
cc=catrow(cc1,cc,cc2);
call graph(base,fit :heading cc :file 'boost.wmf' :nocontact :nolabel :pgborder);
call print(fit);
b34srun;
________________________________________________________________________

Edited results from running the code in Table 10.17 are given next.

B34S 8.11D  (D:M:Y) 23/ 3/08  (H:M:S) 15:18: 1  DATA STEP  Efron Diabeties Data  PAGE 1

Variable   #   Cases  Mean         Std Deviation  Variance      Maximum      Minimum
AGE        1   442    48.51809955  13.10902782    171.8466104   79.00000000  19.00000000
SEX        2   442    1.468325792  0.4995611704   0.2495613630  2.000000000  1.000000000
BMI        3   442    26.37579186  4.418121561    19.51979812   42.20000000  18.00000000
BP         4   442    94.64701357  13.83128342    191.3044010   133.0000000  62.00000000
S1         5   442    189.1402715  34.60805168    1197.717241   301.0000000  97.00000000
S2         6   442    115.4391403  30.41308097    924.9554940   242.4000000  41.60000000
S3         7   442    49.78846154  12.93420215    167.2935854   99.00000000  22.00000000
S4         8   442    4.070248869  1.290449897    1.665260936   9.090000000  2.000000000
S5         9   442    4.641410860  0.5223905611   0.2728918983  6.107000000  3.258100000
S6        10   442    91.26018100  11.49633474    132.1657124   124.0000000  58.00000000
Y         11   442    152.1334842  77.09300453    5943.331348   346.0000000  25.00000000
CONSTANT  12   442    1.000000000  0.000000000    0.000000000   1.000000000  1.000000000

Number of observations in data file 442
Current missing variable code 1.000000000000000E+31

B34S(r) Matrix Command. d/m/y 23/ 3/08. h:m:s 15:18: 1.

=> CALL LOADDATA$
=> CALL LOAD(CENTER  :STAGING)$
=> CALL LOAD(CENTER2 :STAGING)$
=> CALL LOAD(BOOST   :STAGING)$
=> SUBROUTINE BOOST3(Y,YHAT,RES,X,IN,BETA1,BETA2,E,IPASS,ITYPE,IPRINT)$
=> SUBROUTINE BOOST4(YHAT,X,IN,BETA1,BETA2,E)$
=> CALL ECHOOFF$

Ordinary Least Squares Estimation
Dependent variable                 Y
Centered R**2                      0.5177484222203494
Adjusted R**2                      0.5065592904853228
Residual Sum of Squares            1263985.785633344
Residual Variance                  2932.681637200335
Standard Error                     54.15423932805570
Total Sum of Squares               2621009.124434389
Log Likelihood                    -2385.992862123519
Mean of the Dependent Variable     152.1334841628959
Std. Error of Dependent Variable   77.09300453299109
Sum Absolute Residuals             19128.63379518937
F(10, 431)                         46.27243958524313
F Significance                     1.000000000000000
1/Condition XPX                    1.170314752960385E-08
Maximum Absolute Residual          155.8267661215634
Number of Observations             442

Variable  Lag  Coefficient      SE             t
Col____1   0   -0.36361224E-01  0.21704144    -0.16753126
Col____2   0   -22.859648       5.8358213     -3.9171261
Col____3   0    5.6029621       0.71710550     7.8133023
Col____4   0    1.1168080       0.22523817     4.9583425
Col____5   0   -1.0899963       0.57333186    -1.9011613
Col____6   0    0.74645046      0.53083439     1.4061833
Col____7   0    0.37200472      0.78246385     0.47542735
Col____8   0    6.5338319       5.9586378      1.0965311
Col____9   0    68.483125       15.669719      4.3704117
Col___10   0    0.28011699      0.27331395     1.0248909
CONSTANT   0   -334.56714       67.454621     -4.9598846

0.71954737

This illustrates the fit as more and more iterations are performed. Fit approaches the OLS fit of .71954737.

Obs   IN   FIT     BETA1        BETA2
   1   3   0.5865   10.23       -117.8
   2   9   0.6765   64.20       -221.9
   3   4   0.6920   1.337       -88.55
   4   7   0.7004  -1.067        72.16
   5   9   0.7003   19.87       -82.71
   6   2   0.7066  -13.63        24.77
   7   3   0.7058   1.566       -38.93
   8   4   0.7065   0.3923      -35.94
   9   7   0.7073  -0.3380       17.42
  10   2   0.7102  -10.40        15.56
  11   5   0.7122  -0.1071       20.40
  12  10   0.7126   0.3346      -30.46
  13   2   0.7136  -5.868        8.653
  14   7   0.7138  -0.2400       11.97
  15   4   0.7139   0.1968      -18.62
  16   6   0.7149  -0.9124E-01   10.54
  17   2   0.7154  -4.372        6.422
  18   9   0.7154   4.479       -20.79
  19   5   0.7159  -0.5773E-01   10.142
  20   9   0.7159   3.225       -14.97
 991   6   0.7194   0.2122E-02  -0.2450
 992   5   0.7194  -0.2006E-02   0.3795
 993   6   0.7194   0.2085E-02  -0.2407
 994   7   0.7194   0.4979E-02  -0.2479
 995   5   0.7194  -0.1873E-02   0.3542
 996   9   0.7194   0.1560      -0.7241
 997   1   0.7194  -0.4288E-02   0.2080
 998   6   0.7194   0.1982E-02  -0.2288
 999   5   0.7194  -0.2113E-02   0.3996
1000   6   0.7194   0.2069E-02  -0.2388

This tests BOOST4 since YHAT = YHAT2.

Obs  YHAT    YHAT2
 1   205.9   205.9
 2   68.40   68.40
 3   176.6   176.6
 4   166.0   166.0
 5   128.2   128.2
 6   106.1   106.1
 7   75.28   75.28
 8   119.9   119.9
 9   159.5   159.5
10   213.9   213.9
11   97.98   97.98
12   97.49   97.49
13   114.9   114.9
14   164.0   164.0
15   102.6   102.6
16   175.8   175.8
17   210.14  210.14
18   182.8   182.8
19   147.6   147.6
20   123.6   123.6
21   119.7   119.7
22   87.21   87.21
23   114.5   114.5
24   257.9   257.9
25   165.0   165.0
26   147.1   147.1
27   96.66   96.66
28   178.9   178.9
29   129.1   129.1
30   184.4   184.4
31   159.1   159.1
32   69.51   69.51
33   259.6   259.6
34   110.6   110.6
35   79.00   79.00
36   86.36   86.36
37   207.6   207.6
38   157.0   157.0
39   241.2   241.2
40   137.0   137.0

The next set of examples uses the QR approach to estimate datasets that are known to be difficult when test answers are available. Table 10.18 lists the commands to load the data from Wampler (1970). Five problems are run, each with a different left-hand-side variable. The object is to show the accuracy differences that result.

$y_1 = 1 + x + x^2 + x^3 + x^4 + x^5$   (10.5-1)

$y_2 = 1 + .1x + .01x^2 + .001x^3 + .0001x^4 + .00001x^5$   (10.5-2)

$y_3 = y_1 + \delta$   (10.5-3)

$y_4 = y_1 + 100\delta$   (10.5-4)

$y_5 = y_1 + 10000\delta$   (10.5-5)

For all problems except (10.5-2) the right-hand side is a constant plus $x, x^2, \ldots, x^5$.
For problem (10.5-2) the effect of multicollinearity is isolated from the effect of larger and larger values on the right-hand side. Here the variables are scaled as $(.1)x, (.1)^2x^2, \ldots, (.1)^5x^5$ and the left-hand side is adjusted such that for all five models the estimated coefficients should be 1. It is hypothesized that the estimated coefficients from estimating (10.5-2) will be closer to 1 than those from (10.5-1) and that this effect will be due to scaling.

Table 10.18 Wampler Test Problem
______________________________________________________________
==WAMPLER
b34sexec data heading('Data from Wampler JASA June 1970') filef=@@$
* Data from Roy Wampler JASA June 1970 Vol 65, No. 330, pp. 549-564$
input x1 delta$
build y1 y2 y3 y4 y5 x2 x3 x4 x5 ax1 ax2 ax3 ax4 ax5$
* y1 = 1+ x1 + x1**2 + x1**3 + x1**4 + x1**5 $
* y2 = 1 + .1*x1 +.01*x1**2 +.001*x1**3 + .0001*x1**4 + .00001*x1**5$
* y3 = y1 + delta $
* y4 = y1 + 100*delta $
* y5 = y1 + 10000*delta $
gen x2= x1*x1 $
gen x3 = x2*x1$
gen x4 = x3*x1$
gen x5 = x4*x1$
gen ax1=.1*x1 $
gen ax2= .01*x2$
gen ax3=.001*x3$
gen ax4=.0001*x4$
gen ax5=.00001*x5 $
gen y1 = 1 + x1 + x1**2 + x1**3 + x1**4 + x1**5$
gen y2 = 1 + .1*x1 +.01*x1**2 +.001*x1**3 + .0001*x1**4 + .00001*x1**5$
gen y3 = y1 + delta$
gen y4 = y1 + 100. *delta $
gen y5 = y1 + 10000.*delta $
datacards$
0. 759.
1. -2048.
2. 2048.
3. -2048.
4. 2523.
5. -2048.
6. 2048.
7. -2048.
8. 1838.
9. -2048.
10. 2048.
11. -2048.
12. 1838.
13. -2048.
14. 2048.
15. -2048.
16. 2523.
17. -2048.
18. 2048.
19. -2048.
20. 759.
b34sreturn$
b34seend$
b34sexec list$ var x1 delta$ b34seend$
b34sexec qr $ model y1= x1 x2 x3 x4 x5 $ b34seend$
b34sexec qr $ model y2=ax1 ax2 ax3 ax4 ax5 $ b34seend$
b34sexec qr $ model y3= x1 x2 x3 x4 x5 $ b34seend$
b34sexec qr $ model y4= x1 x2 x3 x4 x5 $ b34seend$
b34sexec qr $ model y5= x1 x2 x3 x4 x5 $ b34seend$
==
______________________________________________________________

Equations (10.5-1) and (10.5-2) are perfect fits, and all estimated coefficients should be 1.0 since the parameters {.1, .01, .001, .0001, .00001} have been built into the variables {AX1,...,AX5}. Equations (10.5-3) - (10.5-5) are not perfect fits and have increasing amounts of noise. X is defined as {0,1,...,20}. Edited output from running this test problem on a Gateway P5-90 is given below. On the IBM 390 slightly different answers will be found, as shown in Stokes (1991, 246).
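The conditioning argument behind (10.5-2) can be previewed numerically. The following small numpy sketch (ours; it is not part of the Wampler job) compares the condition number of $X'X$ built from the raw powers of $x = 0,\ldots,20$ with that of the scaled powers used to build {AX1,...,AX5}. The scaled cross-product matrix should be better conditioned by many orders of magnitude, which is why the Y2 model recovers more correct digits below.

import numpy as np

x = np.arange(21.0)                                     # x = 0, 1, ..., 20
X  = np.column_stack([x**j for j in range(6)])          # 1, x, ..., x**5
AX = np.column_stack([(0.1 * x)**j for j in range(6)])  # scaled powers, (.1x)**j

print(np.linalg.cond(X.T @ X))    # condition of X'X from raw powers: enormous
print(np.linalg.cond(AX.T @ AX))  # condition of X'X from scaled powers: far smaller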
OLS using QR decomposition for Y = Y1 X- 3
Number of observations     = 21
Adjusted R square          = 1.00000000000000
Standard Error of Estimate = 0.763648257769776D-010
Sum of Squared Residuals   = 0.874737992392221D-019

Variable         Coefficient           Standard Error            t value
X5       X-11    1.00000000000001350   0.363438173143536890E-14  275149963293776.470
X4       X-10    0.999999999999290570  0.182671077017781300E-12  5474320381337.38090
X3       X- 9    1.00000000001383780   0.328334037806728010E-11  304567874440.871150
X2       X- 8    0.999999999878483870  0.252164302854411900E-10  39656683700.2237400
CONSTANT X-17    0.999999999453480280  0.696406437713641010E-10  14359430718.8853910
X1       X- 1    1.00000000045903460   0.764750532765802950E-10  13076159579.0254170

OLS using QR decomposition for Y = Y2 X- 4
Number of observations     = 21
Adjusted R square          = 1.00000000000000
Standard Error of Estimate = 0.160405034435521D-014
Sum of Squared Residuals   = 0.385946626083912D-028

Variable         Coefficient           Standard Error            t value
AX5      X-16    0.999999999999994780  0.763405299300070910E-14  130992017073610.310
CONSTANT X-17    0.999999999999995890  0.146280826919992380E-14  683616589443359.750
AX3      X-14    0.999999999999980570  0.689668733017904870E-13  14499714893904.2730
AX1      X-12    1.00000000000001440   0.160636568334661690E-13  62252325878667.1480
AX4      X-15    1.00000000000002040   0.383702314531362890E-13  26061870416950.0860
AX2      X-13    0.999999999999993560  0.529673488693598360E-13  18879555449649.2110

OLS using QR decomposition for Y = Y3 X- 5
Number of observations     = 21
Adjusted R square          = 0.999994078701093
Standard Error of Estimate = 2360.14502379267
Sum of Squared Residuals   = 83554267.9999999

Variable         Coefficient           Standard Error            t value
X5       X-11    1.00000000000001620   0.112324854679314690      8.90274910975845120
X4       X-10    0.999999999999141240  5.64566512170766810       0.177127048530407170
X3       X- 9    1.00000000001652990   101.475507550352620       0.985459471114569390E-02
X2       X- 8    0.999999999858289690  779.343524331607910       0.128313121061206540E-02
CONSTANT X-17    0.999999999425811750  2152.32624678173580       0.464613578411293800E-03
X1       X- 1    1.00000000051438680   2363.55173469688360       0.423092071916349630E-03

OLS using QR decomposition for Y = Y4 X- 6
Number of observations     = 21
Adjusted R square          = 0.943304587767550
Standard Error of Estimate = 236014.502379268
Sum of Squared Residuals   = 835542680000.000

Variable         Coefficient           Standard Error            t value
X5       X-11    1.00000000000030930   11.2324854679314790       0.890274910976105170E-01
X4       X-10    0.999999999984220960  564.566512170767280       0.177127048527764240E-02
X3       X- 9    1.00000000028554030   10147.5507550352700       0.985459471379667420E-04
X2       X- 8    0.999999997837600070  77934.3524331608500       0.128313120801925440E-04
CONSTANT X-17    0.999999996673739290  215232.624678173770       0.464613577132643180E-05
X1       X- 1    1.00000000606312800   236355.173469688570       0.423092074263977630E-05

OLS using QR decomposition for Y = Y5 X- 7
Number of observations     = 21
Adjusted R square          = -0.330337747712334
Standard Error of Estimate = 23601450.2379268
Sum of Squared Residuals   = 0.835542680000000D+016

Variable         Coefficient           Standard Error            t value
X5       X-11    1.00000000002976020   1123.24854679314760       0.890274911002324830E-03
X4       X-10    0.999999998484693230  56456.6512170767240       0.177127048262157320E-04
X3       X- 9    1.00000002728896820   1014755.07550352710       0.985459497990451150E-06
X2       X- 8    0.999999795397397810  7793435.24331608510       0.128313094826191270E-06
CONSTANT X-17    0.999999721282435080  21523262.4678173740       0.464613449182103880E-07
X1       X- 1    1.00000056024505920   23635517.3469688560       0.423092308733959070E-07
Since they are exact fits, the models for Y1 and Y2 show 1.0 as the adjusted R2. The model for Y1 shows approximately 11 to 12 correct digits, while the model for Y2 shows approximately 14 correct digits. This is because in the model for Y2 the higher-power X terms are scaled, making them smaller and thus making it easier to factor X. This problem, as noted, has been designed to show the effects of matrix scaling when the answers are known. The models for Y3, Y4 and Y5 are not exact. As the error is gradually increased, the adjusted R2 falls from .99 to .94 to -.33, and the round-off error increases with the same X matrix. It is a common error to assume that only rank problems in X can cause accuracy problems; these examples show the effect of changes in the left-hand-side vector on the accuracy of the solution. The example also illustrates a signal-to-noise-ratio effect. Note that the SEs of the last three models have the same digits but appear to have been scaled. How could this be so? From equation (2.1-11) we note that the SEs are the square roots of the diagonal elements of $\hat{\sigma}^2(X'X)^{-1}$. In all three problems $(X'X)^{-1}$ is unchanged, although increasingly more noise is added in the form of $\delta$, $100\delta$ and $10000\delta$. The added noise causes $\hat{\sigma}$ to increase by factors of 100 and 10,000 respectively, which is exactly the scaling observed in the SEs.

The second problem to be studied is the Longley (1967) dataset, which is given in Table 10.19. Three types of problems are run: the qr command, which obtains $\hat{\beta}_i$ from equation (4.1-6); the usual regression command, in which the constant is a vector of 1's in the data set and $\hat{\beta}_i$ is obtained from equation (2.1-3); and the deviations-from-the-means approach, which obtains $\hat{\beta}_i$ from equation (2.1-5) for all variables except the constant and from equation (2.1-6) for the constant. The deviations-from-the-means estimation strategy is implemented by rereading the data back into B34S from unit 8 (with the UNIT=8 option) and specifying the noconstant option on the second data command. In the subsequent regression command the noint option is used because the deviations-from-the-means approach obtains an estimate of the constant from equation (2.1-6). The deviations-from-the-means option is rarely used, since the usual approach is quite accurate; its use here is to study the effect of multicollinearity on estimation. Although Table 10.19 lists all the Longley data, only results for Y1 are given. The reader is encouraged to run the other problems.

Table 10.19 Longley Test Data
______________________________________________________________
==LONGLEY
/$longley
/$ this problem is discussed in chapter 10
b34sexec data noob=16 nohead heading=('longley data')$
input y1 x1 x2 x3 x4 x5 x6 y2 y3 y4 y5 y6 y7 y8 $
* Data from longley jasa september 1967 $
* All equations contain variables x1-x7 $
datacards$
60323. 83.0 234289. 2356. 1590. 107608. 1947. 8256. 6045. 427. 1714. 38407. 1892. 3582.
61122. 88.5 259426. 2325. 1456. 108632. 1948. 7960. 6139. 401. 1731. 39241. 1863. 3787.
60171. 88.2 258054. 3682. 1616. 109773. 1949. 8017. 6208. 396. 1772. 37922. 1908. 3948.
61187. 89.5 284599. 3351. 1650. 110929. 1950. 7497. 6069. 404. 1995. 39196. 1928. 4098.
63221. 96.2 328975. 2099. 3099. 112075. 1951. 7048. 5869. 400. 2055. 41460. 2302. 4087.
63639. 98.1 346999. 1932. 3594. 113270. 1952. 6792. 5670. 431. 1922. 42216. 2420. 4188.
64989. 99.0 365385. 1870. 3547. 115094. 1953. 6555. 5794. 423. 1985. 43587. 2305. 4340.
63761 100.0 363112. 3578. 3350. 116219. 1954. 6495. 5880. 445. 1919. 42271. 2188. 4563.
The second problem to be studied is the Longley (1967) dataset, which is given in Table 10.19. Three types of problems are run: the qr command, which obtains $\hat{\beta}_i$ from equation (4.1-6); the usual regression command, in which the constant is a vector of 1's in the dataset and $\hat{\beta}_i$ is obtained from equation (2.1-3); and the deviations-from-the-mean approach, which obtains $\hat{\beta}_i$ from equation (2.1-5) for all variables except the constant and from equation (2.1-6) for the constant. The deviations-from-the-mean estimation strategy is implemented by rereading the data back into B34S from unit 8 (with the UNIT=8 option) and specifying the noconstant option on the second data command. In the subsequent regression command the noint option is used because the deviations-from-the-mean approach obtains an estimate of the constant from equation (2.1-6). The deviations-from-the-mean option is rarely used, since the usual approach is quite accurate. Its use here is to study the effect of multicollinearity on estimation. Although Table 10.19 lists all the Longley data, only results for Y1 are given. The reader is encouraged to run the other problems.

Table 10.19 Longley Test Data
______________________________________________________________
==LONGLEY
/$longley
/$ this problem is discussed in chapter 10
b34sexec data noob=16 nohead heading=('longley data')$
input y1 x1 x2 x3 x4 x5 x6 y2 y3 y4 y5 y6 y7 y8 $
* Data from longley jasa september 1967 $
* All equations contain variables x1-x7 $
datacards$
60323. 83.0 234289. 2356. 1590. 107608. 1947. 8256. 6045. 427. 1714. 38407. 1892. 3582.
61122. 88.5 259426. 2325. 1456. 108632. 1948. 7960. 6139. 401. 1731. 39241. 1863. 3787.
60171. 88.2 258054. 3682. 1616. 109773. 1949. 8017. 6208. 396. 1772. 37922. 1908. 3948.
61187. 89.5 284599. 3351. 1650. 110929. 1950. 7497. 6069. 404. 1995. 39196. 1928. 4098.
63221. 96.2 328975. 2099. 3099. 112075. 1951. 7048. 5869. 400. 2055. 41460. 2302. 4087.
63639. 98.1 346999. 1932. 3594. 113270. 1952. 6792. 5670. 431. 1922. 42216. 2420. 4188.
64989. 99.0 365385. 1870. 3547. 115094. 1953. 6555. 5794. 423. 1985. 43587. 2305. 4340.
63761 100.0 363112. 3578. 3350. 116219. 1954. 6495. 5880. 445. 1919. 42271. 2188. 4563.
66019 101.2 397469. 2904. 3048. 117388. 1955. 6718. 5886. 524. 2216. 43761. 2187. 4727.
67857 104.6 419180. 2822. 2857. 118734. 1956. 6572. 5936. 581. 2359. 45131. 2209. 5069.
68169 108.4 442769. 2936. 2798. 120445. 1957. 6222. 6089. 626. 2328. 45278. 2217. 5409.
66513 110.8 444546. 4681. 2637. 121950. 1958. 5844. 6185. 605. 2456. 43530. 2191. 5702.
68655 112.6 482704. 3813. 2552. 123366. 1959. 5836. 6298. 597. 2520. 45214. 2233. 5957.
69564 114.2 502601. 3931. 2514. 125368. 1960. 5723. 6367. 615. 2489. 45850. 2270. 6250.
69331 115.7 518173. 4806. 2572. 127852. 1961. 5463. 6388. 662. 2594. 45397. 2279. 6548.
70551 116.9 554894. 4007. 2827. 130081. 1962. 5190. 6271. 623. 2626. 46652. 2340. 6849.
b34sreturn$
b34seend$
b34sexec qr ipcc=pcreg $
model y1 = x1 x2 x3 x4 x5 x6 $
title=('Regression on Total')$
b34seend$
b34sexec regression manydigits toll=.1e-08$
comment=('Regression on Total ')$
model y1=x1 x2 x3 x4 x5 x6 $
b34seend$
b34sexec data noob=16 nohead noconstant unit=08 filef=dp
heading=('no constant on input')$
input y1 x1 x2 x3 x4 x5 x6 y2 y3 y4 y5 y6 y7 y8 $
b34seend$
b34sexec regression manydigits toll=.1e-08 noint$
comment=('Regression on Total ')$
model y1=x1 x2 x3 x4 x5 x6 $
b34seend$
==
_____________________________________________________________

Edited output from running the problem in Table 10.19 on a Gateway 2000 Intel P5-90 is given next. The manydigits option has been specified on the regression command to write the coefficients with 16 digits of accuracy. In addition, the usual coefficient output is given. The models run are QR, regression on levels and regression on deviations from the means.

QR option version 1 July 1996
Comments regression on total
Of 35000 Real*8 space, 144 is being used.

OLS using QR decomposition for Y = Y1 (X- 1)
Number of observations      = 16
Adjusted R square           = 0.992465007628829
Standard Error of Estimate  = 304.854073561898
Sum of Squared Residuals    = 836424.055505548

Variable           Coefficient                Standard Error             t value
X2       X- 3     -0.358191792926104010E-01   0.334910077722366880E-01  -1.06951631722183400
X5       X- 6     -0.511041056535210610E-01   0.226073200069318770      -0.226051144663991450
X3       X- 4     -2.02022980381710490        0.488399681651600530      -4.13642735594212410
X4       X- 5     -1.03322686717371810        0.214274163161636330      -4.82198531044692570
X6       X- 7      1829.15146461373640        455.478499142097410        4.01588981271120100
X1       X- 2      15.0618722714605970        84.9149257747525470        0.177376028231056780
CONSTANT X-15     -3482258.63459618620        890420.383607150170       -3.91080291815572870

Singular values of X

 1663668.227889470      83899.57794622086      3407.197376095865      1582.643681003795
 41.69360109707301      3.648093794807085      0.3423709062101756E-03

The principal-component regression coefficients and their t values are also printed but are not reproduced here.
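A minimal Matlab sketch of the singular-value diagnostic reported above follows. The data are illustrative (not the Longley file): a near-duplicate column drives the smallest singular value toward zero and the condition number up, just as in the Longley design matrix.

t = (1:16)';
x = [ones(16,1) t t.^2 t.^2+1e-6*randn(16,1)];   % near-collinear design
s = svd(x);                                      % singular values, descending
fprintf('max sv = %g  min sv = %g  cond = %g\n', s(1), s(end), s(1)/s(end));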
The regression command is run next on the levels data:

Problem Number 1   Subproblem Number 1
F to enter            0.99999998E-02
F to remove           0.49999999E-02
Tolerance             0.99999997E-09
Maximum no of steps   7
Dependent variable X( 1). Variable Name Y1
Standard Error of Y = 3511.9683 for degrees of freedom 15.

Step Number 7   Variable Entering 3
Multiple R 0.997737   Std Error of Y.X 304.854   R Square 0.995479   Adjusted R Square 0.992465
-2 * ln(Maximum of Likelihood Function)   219.234869071137
Akaike Information Criterion (AIC)        235.234869071137
Schwartz Information Criterion (SIC)      241.415578849055
Residual Variance                         92936.0029969030

Analysis of Variance for reduction in SS due to variable entering
Source            DF   SS            MS            F
Due Regression     6   0.18417E+09   0.30695E+08   330.29
Dev. from Reg.     9   0.83642E+06   92936.
Total             15   0.18501E+09   0.12334E+08

Multiple Regression Equation (extended precision), Y1 =
Variable            Coefficient               Std. Error                T Val.
X1       X- 2       15.06187367747587         84.91492341168758          0.1774
X2       X- 3       -0.3581918143189604E-01   0.3349100704795319E-01    -1.070
X3       X- 4       -2.020229835731279        0.4883996712397694        -4.136
X4       X- 5       -1.033226876342277        0.2142741577415063        -4.822
X5       X- 6       -0.5110409828822657E-01   0.2260731940689374        -0.2261
X6       X- 7       1829.151500000934         455.4784907402389          4.016
CONSTANT X-15       -3482258.703813081        890420.3671920912         -3.911

The order of entrance (or deletion) of the variables is printed, followed by the estimates of computational error in the coefficients, which in the order printed are 0.4144E-01, -323.9, -0.3364, -0.6822E-01, 95.36, 202.8 and -0.1038.

The deviations-from-the-means run (Problem Number 2, Maximum no of steps 6, Step Number 6, Variable Entering 2) reports the same Multiple R 0.997737, Std Error of Y.X 304.854 and R Square 0.995479, with extended-precision coefficients:

Variable            Coefficient               Std. Error
X1       X- 2       15.06187230978256         84.91492337552916
X2       X- 3       -0.3581917930289365E-01   0.3349100682722105E-01
X3       X- 4       -2.020229803946594        0.4883996678667511
X4       X- 5       -1.033226867209468        0.2142741571071722
X5       X- 6       -0.5110410558489326E-01   0.2260731936835627
X6       X- 7       1829.151464660804         455.4784862720230
CONST.   X-15       -3482258.634718623        890420.3882199302
For this run the Adjusted R Square is 0.992465, -2 * ln(Maximum of Likelihood Function) is 219.234869616679, the Akaike Information Criterion (AIC) is 235.234869616679, the Schwartz Information Criterion (SIC) is 241.415579394597 and the Residual Variance is 92936.0061656852. The estimates of computational error in the coefficients are, in the order printed, -0.2957E-09, 0.1304E-04, 0.1019E-07, 0.1905E-08, -0.4573E-06 and -0.2079E-09.

Table 10.20 lists the estimated coefficients for the three approaches and also lists results obtained by Beaton, Rubin, and Barone (1976) using DORTHO. Careful readers will note that there are slight differences between the reported coefficients in columns QR, LEVELS and DMEANS, which were run with B34S on IBM/MVS, and the corresponding values obtained in the output given above. The differences arise because the Intel P5-90 has full IEEE accuracy, while the IBM has less. DORTHO uses the classical Gram-Schmidt approach, which Beaton, Rubin and Barone assure us "agree[s] with Longley's hand-calculated solution in every published place."15 The B34S QR results on raw, unscaled data are very close to these results. The deviations-from-the-means output is next best in accuracy, while the levels output is least accurate.

15 Since most economic data only have a few significant digits, reporting regression coefficients to 16 places makes no sense. However, the study of all 16 digits does make sense when testing program calculation accuracy. When performing these tests, an implicit assumption is that we have at least 16 significant digits of accuracy for the input data. If we have n significant digits, where n < 16, reporting 16 digits of accuracy in the answer assumes implicitly that the last 16-n digits in the raw data were 0. As will be shown in section 16.7, this does not mean that calculation should not be carried out with a high degree of accuracy. Experiments with VPA math will show that the precision selected when the initial data are read into a floating-point number may "doom" any subsequent calculations even if the accuracy is boosted for the calculation. The implication is that data read into real*4 in memory may cause problems with subsequent calculations, even if the data have been moved to real*8, because of the accuracy loss when the original data were read.

Unless the TOLL parameter is set to a very low number, the B34S regression command will not attempt the problem. The regression command's measure of accuracy, based on the Faddeeva (1959) algorithm, on the IBM/MVS shows the absolute values of the tests ranging from .04144 to 1619 for output using the level data. The corresponding range for the P5-90 is .04144 to 323.9, which is substantially lower. These errors are orders of magnitude larger than what is usually found for problems such as those reported in Chapter 2. Nevertheless, about eight digits are correct and, if only the usual output is inspected, there is no way to tell any difference from the DORTHO results. The regression results from the deviations-from-the-means approach are more accurate: the error estimate values range from .5206E-09 to .8714E-05 in absolute value on the IBM, with comparable values on the P5-90.

The PC regression output shows the singular values of X. These range from .1664E+07 to .34237D-03, which is a large spread. Equation (10.2-9) shows how this spread magnifies the variance of the coefficients.
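To see why the spread magnifies the coefficient variances, write the singular value decomposition as $X = U \Sigma V'$ with singular values $\sigma_1 \ge \cdots \ge \sigma_k$ and right singular vectors $v_j$. A standard decomposition (the notation here may differ from that of equation (10.2-9)) is

$$\operatorname{Var}(\hat{\beta}) \;=\; \hat{\sigma}^2 (X'X)^{-1} \;=\; \hat{\sigma}^2 \sum_{j=1}^{k} \frac{v_j v_j'}{\sigma_j^2},$$

so the direction associated with the smallest singular value, here $\sigma_k \approx .34237\text{D-}03$, enters the variance with weight $1/\sigma_k^2 \approx 8.5\text{E+}06$, while the largest enters with weight of order E-13.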
Section 16.7 introduces the topic of accuracy from the point of view of an exhaustive investigation of the gains from calculation in real*16, of how data are read (as characters or as real numbers) and of the advantages of variable precision arithmetic (VPA), whereby approximately 1750 digits of accuracy are possible in the current implementation of B34S. Since these approaches involve extensive use of the matrix command, their discussion has been put off until later. Table 2.9 in Chapter 2 illustrates use of the QR procedure to measure leverage as suggested by Davidson and MacKinnon (1993, 37-39).

Table 10.20 Results from the Longley Total Equation
───────────────────────────────────────────────────────────────────────────────────────
Variable   DORTHO              QR                     LEVELS                DMEANS
──────────────────────────────────────────────────────────────────────────────────────
Constant   -3482258.634597     -3482258.63459606981   -3482258.656344591    -3482258.634727585
X1         15.061872271761     15.0618722714175008    15.06187281288034     15.06187235777704
X2         -.035819179293      -.035819179292592743   -.0358191799858490    -.035819179315730
X3         -2.020229803818     -2.02022980381681849   -2.020229814090662    -2.02022980410821
X4         -1.033226867174     -1.03322686717366818   -1.033226870110064    -1.03322686725428
X5         -0.051104105653     -0.05110410565364853   -.0511041031599990    -.051104105499356
X6         1829.151464614112   1829.15146461368289    1829.151475721324     1829.15146471942
───────────────────────────────────────────────────────────────────────────────────────
DORTHO reports results from Beaton, Rubin, and Barone (1976). QR reports results from the B34S QR command. LEVELS reports the regression command where data are levels. DMEANS reports results from the regression command where the data are deviations from the means. Columns QR, LEVELS and DMEANS were run with B34S on IBM/MVS.

One of the key assumptions of OLS is that the correlation between the error term and the right-hand-side variables of the equation is forced to be zero or near zero. At issue is how sensitive this calculation is to the problem at hand, the data precision and the method of calculation. The next two examples investigate the relationship between the precision of the data (real*4, real*8, real*16 and VPA), the method of solution (Cholesky vs QR) and the difficulty of the problem. More detail on this issue is contained in Stokes (2005). The first problem involves a very simple test case based on the Grunfeld data that does not involve right-hand-side variables of different sizes. The second example uses a three-lag model of the gas data. A summary of the findings is given in Table 10.21.

Using the simple dataset and real*8 data, the correlation for the Cholesky and QR approaches is between xe-14 and xe-15, where x is some number. The corresponding numbers for real*16 are xe-33 and xe-34. Variable precision math (VPA) is discussed in some detail in Chapter 16; using default settings, the correlation was xM-62. Using real*4, the corresponding correlations were xe-5 to xe-6. For the somewhat more complex dataset a different pattern emerges. For real*8 the correlations increase for all cases, with the QR showing substantially less correlation. What is surprising is the relatively poor real*4 showing, with correlations in the area of xe-01. Use of the QR reduces these to xe-05. The implication of this study is that any calculation involving real*4 data is very prone to error and should be avoided. If real*4 data must be used, however, it is highly recommended that the QR method be employed. Even with real*16 data, gains from using the QR are evident.
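A minimal Matlab sketch of this residual/RHS-correlation check follows. The data are synthetic (the chapter's own B34S and Matlab test programs appear in Tables 10.22-10.24); it contrasts real*8 and real*4 precision and the normal-equations and QR solution paths.

n = 200;
x = [ones(n,1) cumsum(randn(n,2))];          % constant plus two regressors
y = x*[1; 2; 3] + randn(n,1);
b  = (x'*x)\(x'*y);                          % normal-equations solve
[q,r] = qr(x,0);                             % thin QR factorization
rq = y - q*(q'*y);                           % QR residual, no beta needed
fprintf('real*8 NE: %g\n', max(abs(corr(x(:,2:3), y-x*b))));   % typically ~1e-15
fprintf('real*8 QR: %g\n', max(abs(corr(x(:,2:3), rq))));
xs = single(x); ys = single(y);              % repeat in real*4
b4 = (xs'*xs)\(xs'*ys);
fprintf('real*4 NE: %g\n', max(abs(corr(double(xs(:,2:3)), double(ys-xs*b4)))));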
The jobs that made these calculations, together with the estimated coefficients, are shown next. While the estimated coefficients are not markedly different across methods, except for the real*4 cases, and thus can mask calculation problems, the correlation between the error and the right-hand side provides a more rigorous test of the accuracy of the calculation.

Table 10.21 Correlation between the Error term and Right-Hand-Side
___________________________________________________
Grunfeld Example      GEF             GEC
_____________________________________________________________________________________
Real*8 Chol      -0.356987E-14   -0.260707E-14
Real*8 QR         0.271906E-15    0.333125E-15
Real*16 Chol     -0.852072E-33   -0.188439E-33
Real*16 QR       -0.912934E-34   -0.263815E-33
Real*4 LU         0.171420E-05    0.126598E-06
Real*4 QR         0.216786E-06   -0.236300E-06
VPA               -.332843M-62    -.179417M-62
______________________________________________________________________________________________________
Gas Data of lag Order 3
                 GASOUT(t-1)     GASOUT(t-2)     GASOUT(t-3)     GASIN(t-1)      GASIN(t-2)      GASIN(t-3)
______________________________________________________________________________________________________
Real*8 Chol      -0.119284E-10    0.868642E-11   -0.430643E-11   -0.434085E-13    0.455703E-13   -0.123613E-13
Real*8 QR         0.131642E-13    0.120757E-13    0.106601E-13   -0.596223E-14   -0.858368E-14   -0.108972E-13
Real*16 Chol     -0.132839E-28    0.747619E-29   -0.214679E-29    0.515527E-32    0.797312E-32    0.441218E-32
Real*16 QR       -0.137433E-31   -0.126724E-31   -0.112671E-31    0.584987E-32    0.889735E-32    0.117243E-31
Real*4 LU        -0.895673E-01   -0.918947E-01   -0.891362E-01    0.369165E-01    0.452364E-01    0.557397E-01
Real*4 QR        -0.144533E-05   -0.151427E-05   -0.153826E-05    0.124983E-06    0.420500E-06    0.724758E-06
VPA                .957024M-59     .796435M-59     .628011M-59    -.691398M-59    -.839960M-59    -.963458M-59
______________________________________________________________________________________________________
For data descriptions see text. All calculations were made with the C10H and C10I programs. The Cholesky calculations were made with various precisions of the LINPACK Cholesky routines. The QR results were run with the LINPACK QR code, except for the real*4 function call to qr( ), which used LAPACK. The VPA results were done with LINPACK routines that were converted to run VPA math. The real*8 LU and real*4 LU calculations were made with the LINPACK routines DGECO/DGEDI and SGECO/SGEDI respectively. Runs were made with b34slf95.exe; real*4 results differ for b34sia32.exe.

The first test case, whose commands are shown in Table 10.22, runs the simple Grunfeld GE equation.
Table 10.22 Effect of Data Precision on Accuracy using the Grunfeld Data
==CH10F Effect of Data Precision on Accuracy
/;
/; Charles Renfro Test Regression
/;
b34sexec options ginclude('b34sdata.mac') member(grunfeld_4);
b34srun;
b34sexec matrix;
call loaddata;
call echooff;
/;
/; Set parameters for tests
/;
call print(' ':);
call print('GE Equation':);
call print('-----------':);
call print(' ':);
call print(' ':);
call load(cov :staging);
call load(cor :staging);
call echooff;
/;
dovpa=1;
call olsq(gei gef gec :print :savex);
call print('Tests on Real*8 Cholesky':);
i=nocols(%x);
test=%x;
test(1,i)=%res;
call print('Last Column is the residual':);
call print(cor(test));
call olsq(gei gef gec :print :savex :qr);
call print('Tests on Real*8 QR':);
i=nocols(%x);
test=%x;
test(1,i)=%res;
call print('Last Column is the residual':);
call print(cor(test));
%yhold=%y;
%xhold=%x;
%ytest=r8tor16(%yhold);
%xtest=r8tor16(%xhold);
call olsq(%ytest %xtest :print :noint :savex);
call print('Tests on Real*16 Cholesky':);
i=nocols(%x);
test=%x;
test(1,i)=%res;
call print('Last Column is the residual':);
call print(cor(test));
call olsq(%ytest %xtest :print :noint :qr :savex);
call print('Tests on Real*16 QR':);
i=nocols(%x);
test=%x;
test(1,i)=%res;
call print('Last Column is the residual':);
call print(cor(test));
/;
/; This example shows how critical the method is when data are not
/; saved with enough precision
call print('Real*4 Tests ':);
call print('_________ ':);
%ytest=r8tor4(%yhold);
%xtest=r8tor4(%xhold);
beta = inv(transpose(%xtest)*%xtest)*transpose(%xtest)*%ytest;
call print('beta from Real*4 using inverse ':);
call print(beta);
i=nocols(%xtest);
%res=%xtest*beta-%ytest;
test=%xtest;
test(1,i)=%res;
call print('Last Column is the residual':);
call print(cor(test));
/; QR approach
r4_r=qr( %xtest ,r4_q);
/; r8_r=qr(r4tor8(%xtest),r8_q);
/; call print('R looked at from real 4 to real 8 and from real 4',
/;            r8_r,r4_r);
r4_yhat=r4_q*transpose(r4_q)*%ytest;
r4_res =%ytest-r4_yhat;
/;
/; Beta not needed to get yhat from QR !!
/;
r4_beta=inv(r4_r)*transpose(r4_q)*%ytest;
call print('beta from Real*4 using QR ':);
call print(r4_beta);
i=nocols(%xtest);
test=%xtest;
test(1,i)=r4_res;
call print('Last Column is the residual':);
call print(cor(test));
if(dovpa.ne.0)then;
call print('VPA Tests ':);
call print('_________ ':);
%ytest=vpa(%yhold);
%xtest=vpa(%xhold);
beta = inv(transpose(%xtest)*%xtest)*transpose(%xtest)*%ytest;
call print('beta from VPA ':);
call print(beta);
i=nocols(%xtest);
%res=%xtest*beta-%ytest;
test=%xtest;
test(1,i)=%res;
call print('Last Column is the residual':);
call print(cor(test));
endif;
b34srun;
==
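The remark in Table 10.22 that "Beta not needed to get yhat from QR" follows in one line. With the thin factorization $X = QR$, where $Q'Q = I$,

$$\hat{y} = X(X'X)^{-1}X'y = QR\,(R'Q'QR)^{-1}R'Q'y = QR\,R^{-1}(R')^{-1}R'Q'y = QQ'y,$$

so the fitted values and residuals can be formed from $Q$ alone, avoiding the explicit inversion of $X'X$.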
Edited output produces:

The data step lists the 20 annual observations (1935-1954) on the 17 variables in grunfeld_4 (YEAR, GMI, GMF, GMC, CI, CF, CC, USI, USF, USC, GEI, GEF, GEC, WI, WF, WC and CONSTANT), together with their labels, means, standard deviations, variances, maxima and minima. For example, GEI (General Electric gross investment) has mean 102.290 and standard deviation 48.5845.

GE Equation
-----------

Ordinary Least Squares Estimation
Dependent variable                 GEI
Centered R**2                      0.7053066881516170
Adjusted R**2                      0.6706368867576896
Residual Sum of Squares            13216.58777024301
Residual Variance                  777.4463394260592
Standard Error                     27.88272474895628
Total Sum of Squares               44848.61800000000
Mean of the Dependent Variable     102.2900000000000
F( 2, 17)                          20.34354567358886
1/Condition XPX                    8.363317695562057E-09
Number of Observations             20

Variable   Lag   Coefficient       SE                t
GEF        0     0.26551189E-01    0.15566104E-01    1.7057055
GEC        0     0.15169387        0.25704083E-01    5.9015476
CONSTANT   0     -9.9563065        31.374249         -0.31734007

Tests on Real*8 Cholesky. With the last column of the test array set to the residual, the correlations of GEF and GEC with the residual are -0.356987E-14 and -0.260707E-14 respectively (the GEF-GEC correlation is 0.118243).

The same equation estimated with the :qr option reproduces the coefficients and summary statistics exactly (QR Rank Check variable (eps) set as 2.220446049250313E-16).

Tests on Real*8 QR. The correlations of GEF and GEC with the residual fall to 0.271906E-15 and 0.333125E-15.
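The scale of these correlations tracks the machine epsilon of each storage type. A quick Matlab check (not part of the original output) prints the real*4 and real*8 values; B34S prints the real*8 value as its "QR Rank Check variable (eps)" and, in the real*16 runs below, the corresponding quad value 1.9259...E-34.

fprintf('real*4 eps = %g\n', eps('single'));   % about 1.19e-07
fprintf('real*8 eps = %g\n', eps('double'));   % about 2.22e-16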
Ordinary Least Squares Estimation - Real*16 reproduces the same fit, now printing 33-34 digits (for example, Residual Sum of Squares 13216.587770243005958442337265798097 and Standard Error 27.882724748956282268018488624583601).

Tests on Real*16 Cholesky. The residual correlations with GEF and GEC are -0.852072E-33 and -0.188439E-33.

Tests on Real*16 QR (QR Rank Check variable (eps) set as 1.9259299443872358530559779425849273E-0034). The residual correlations fall to -0.912934E-34 and -0.263815E-33.

Real*4 Tests
_________
beta from Real*4 using inverse = 0.265513E-01  0.151694  -9.95650
  residual correlations: 0.171420E-05 and 0.126598E-06
beta from Real*4 using QR      = 0.265512E-01  0.151694  -9.95630
  residual correlations: 0.216786E-06 and -0.236300E-06

VPA Tests
_________
beta from VPA = .265512M-1  .151694M+0  -.995631M+1
  residual correlations: -.332843M-62 and -.179417M-62

B34S Matrix Command Ending. Last Command reached.

The next setup, shown in Table 10.23, runs the slightly more demanding gas data model.
Table 10.23 Effect of Data Precision on Accuracy using Gas Data
==CH10G Effect of precision on OLS Assumptions
b34sexec options ginclude('gas.b34');
b34srun;
b34sexec matrix;
call loaddata;
call load(cov :staging);
call load(cor :staging);
call echooff;
lag=3;
/;
/; These options take space
/;
dovpa=1;
doreal4=1;
call olsq(gasout gasout{1 to lag} gasin{1 to lag} :print :savex :diag);
/; Save data for matlab
call makematlab(%y :file 'ydata2.m');
call makematlab(%x :file 'xdata2.m');
call print('Tests on Real*8 Cholesky':);
call print('++++++++++++++++++++++++':);
i=nocols(%x);
test=%x;
test(1,i)=%res;
call print('Last Column is the residual':);
call print(cor(test));
call olsq(gasout gasout{1 to lag} gasin{1 to lag} :print :savex :qr :diag);
call print('Tests on Real*8 QR':);
call print('++++++++++++++++++':);
i=nocols(%x);
test=%x;
test(1,i)=%res;
call print('Last Column is the residual':);
call print(cor(test));
%yhold=%y;
%xhold=%x;
%ytest=r8tor16(%yhold);
%xtest=r8tor16(%xhold);
call olsq(%ytest %xtest :print :diag :noint :savex);
call print('Tests on Real*16 Cholesky':);
call print('+++++++++++++++++++++++++':);
i=nocols(%x);
test=%x;
test(1,i)=%res;
call print('Last Column is the residual':);
call print(cor(test));
call olsq(%ytest %xtest :print :diag :noint :qr :savex);
call print('Tests on Real*16 QR':);
i=nocols(%x);
test=%x;
test(1,i)=%res;
call print('Last Column is the residual':);
call print(cor(test));
if(dovpa.ne.0)then;
call print('VPA Tests ':);
call print('_________ ':);
%ytest=vpa(%yhold);
%xtest=vpa(%xhold);
beta = inv(transpose(%xtest)*%xtest)*transpose(%xtest)*%ytest;
call print('beta from VPA ':);
call print(beta);
i=nocols(%xtest);
%res=%xtest*beta-%ytest;
test=%xtest;
test(1,i)=%res;
call print('Last Column is the residual':);
call print(cor(test));
endif;
/; This example shows how critical the method is when data are not
/; saved with enough precision
if(doreal4.ne.0)then;
call print('Real*4 Tests ':);
call print('_________ ':);
%ytest=r8tor4(%yhold);
%xtest=r8tor4(%xhold);
beta = inv(transpose(%xtest)*%xtest)*transpose(%xtest)*%ytest;
call print('beta from Real*4 using inverse ':);
call print(beta);
i=nocols(%xtest);
%res=%xtest*beta-%ytest;
test=%xtest;
test(1,i)=%res;
call print('Last Column is the residual':);
call print(cor(test));
/; QR approach
r4_r=qr( %xtest ,r4_q);
/; r8_r=qr(r4tor8(%xtest),r8_q);
/; call print('R looked at from real 4 to real 8 and from real 4',
/;            r8_r,r4_r);
r4_yhat=r4_q*transpose(r4_q)*%ytest;
r4_res =%ytest-r4_yhat;
/;
/; Beta not needed to get yhat from QR !!
/;
r4_beta=inv(r4_r)*transpose(r4_q)*%ytest;
call print('beta from Real*4 using QR ':);
call print(r4_beta);
i=nocols(%xtest);
test=%xtest;
test(1,i)=r4_res;
call print('Last Column is the residual':);
call print(cor(test));
endif;
b34srun;
==

When the commands in Table 10.23 are run, the edited output is:

The data step lists 296 observations on TIME, GASIN (input gas rate in cu. ft/min) and GASOUT (percent CO2 in outlet gas); GASIN has mean -0.0568345 and GASOUT has mean 53.5091.

Ordinary Least Squares Estimation
Dependent variable                 GASOUT
Centered R**2                      0.9940295933829316
Adjusted R**2                      0.9939043400972588
Residual Sum of Squares            18.05876021349723
Residual Variance                  6.314251822901129E-02
Standard Error                     0.2512817506883683
Total Sum of Squares               3024.711945392491
F( 6, 286)                         7936.155830513969
1/Condition XPX                    7.928195070756484E-08
Number of Observations             293

The :diag option also reports the AIC (31.04125896346881), the SIC (60.48263983560535) and the Akaike FPE, GCV, Hannan-Quinn, Shibata and Rice criteria.

Variable   Lag   Coefficient    SE               t
GASOUT     1     1.6122699      0.57281193E-01   28.146583
GASOUT     2     -0.93427375    0.87318874E-01   -10.699562
GASOUT     3     0.19438193     0.41791902E-01   4.6511865
GASIN      1     0.10562265     0.77592475E-01   1.3612487
GASIN      2     -0.32153622    0.15220401       -2.1125345
GASIN      3     -0.19498353    0.10682485       -1.8252639
CONSTANT   0     6.8096728      0.56650687       12.020459

Tests on Real*8 Cholesky. The correlations of the six lagged regressors with the residual are -0.119284E-10, 0.868642E-11, -0.430643E-11, -0.434085E-13, 0.455703E-13 and -0.123613E-13.
The :qr estimation reproduces the coefficients and summary statistics (QR Rank Check variable (eps) set as 2.220446049250313E-16).

Tests on Real*8 QR. The residual correlations fall to 0.131642E-13, 0.120757E-13, 0.106601E-13, -0.596223E-14, -0.858368E-14 and -0.108972E-13.
Ordinary Least Squares Estimation - Real*16 again reproduces the fit with 33-34 digits printed (Residual Sum of Squares 18.058760213497255366748012081384418, Standard Error 0.25128175068836849856260081690384742).

Tests on Real*16 Cholesky. The residual correlations are -0.132839E-28, 0.747619E-29, -0.214679E-29, 0.515527E-32, 0.797312E-32 and 0.441218E-32.
Ordinary Least Squares Estimation - Real*16 & QR (QR Rank Check variable (eps) set as 1.9259299443872358530559779425849273E-0034) again matches.

Tests on Real*16 QR. The residual correlations fall to -0.137433E-31, -0.126724E-31, -0.112671E-31, 0.584987E-32, 0.889735E-32 and 0.117243E-31.
VPA Tests
_________
beta from VPA = .161227M+1  -.934274M+0  .194382M+0  .105623M+0  -.321536M+0  -.194984M+0  .680967M+1
  residual correlations: .957024M-59, .796435M-59, .628011M-59, -.691398M-59, -.839960M-59, -.963458M-59

Real*4 Tests
_________
beta from Real*4 using inverse = 1.61238  -0.942064  0.194891  0.105573  -0.320821  -0.195644  6.79072
  residual correlations: -0.895673E-01, -0.918947E-01,
  -0.891362E-01, 0.369165E-01, 0.452364E-01, 0.557397E-01
beta from Real*4 using QR      = 1.61227  -0.934273  0.194382  0.105623  -0.321537  -0.194985  6.80968
  residual correlations: -0.144533E-05, -0.151427E-05, -0.153826E-05, 0.124983E-06, 0.420500E-06, 0.724758E-06

B34S Matrix Command Ending. Last Command reached.

These results validate those reported in Table 10.21. Note how far the real*4 inverse coefficients drift from the real*8 values, while the real*4 QR coefficients remain close. At issue is whether these accuracy patterns are unique to B34S or can be confirmed with other software. For that we turn to Matlab, where the commands in Table 10.24 were run using two versions of Matlab. The results from running those commands are reported in Table 10.25 for Matlab 2006b and in Table 10.26 for Matlab 2007a. Note that there are differences, indicating that between these two releases of Matlab there were unannounced changes that marginally impacted accuracy.16 However, the patterns exhibited in both tables are similar and support the results obtained with B34S reported in Table 10.21. Using Matlab 2006b, real*4 gas data and the LU method produced correlations between -.0264 and .0035, which are far larger than machine epsilon for this data type. Since most regression packages use LU or Cholesky to solve OLS, this finding shows the danger: the calculation of the model has clearly broken down. Both versions of Matlab produced error messages to warn the user. For Matlab 2006b the gas-data error message was "Warning: Matrix is close to singular or badly scaled. Results may be inaccurate. RCOND = 5.224432e-008", which should alert users to problems. For the GE problem the message was "Warning: Matrix is close to singular or badly scaled. Results may be inaccurate. RCOND = 8.358012e-009."
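A minimal Matlab sketch (synthetic near-collinear data, not the chapter's test files) shows where this warning comes from: the RCOND figure is Matlab's reciprocal condition estimate of the matrix being inverted, the same style of diagnostic B34S prints as "1/Condition XPX".

t = (1:20)';
x = [ones(20,1) t t+1e-7*randn(20,1)];   % nearly duplicate columns
fprintf('rcond(X''X) = %g\n', rcond(x'*x));
inv(x'*x);                               % triggers the near-singular warning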
The exact Matlab commands used for the Gas Data problem are listed in Table 10.24. The command getb34s brings in the x and y matrices with the appropriate lags for the two problems studied.16

16 The Matlab 2006b results were done on a Windows XP Professional Dell 650 workstation; the Matlab 2007a results were done on a Dell Latitude Windows XP Professional machine.

Table 10.24 Matlab Commands to Replicate Accuracy Results Obtained with B34S
x=getb34s('xdata2.m');
y=getb34s('ydata2.m');
beta8=inv(x'*x)*x'*y;
res8=y-x*beta8;
yhat_8=x*beta8;
newx8=x;
newx8(:,7)=res8;
disp('Matlab LU real*8 results')
c8=corr(newx8)
beta4=inv(single(x)'*single(x))*single(x)'*single(y);
disp('++++++++++++++++ beta4-beta8 +++++++++++++++++++++++')
beta4-beta8
yhat_4=single(x)*beta4;
res4 =single(y)-single(x)*beta4;
newx4=single(x);
newx4(:,7)=res4;
disp('Matlab LU real*4 results')
c4=corr(newx4)
betatest=[beta8 double(beta4) ]
% qr
disp('Matlab QR real*8 and QR real*4')
[q8,r8]=qr(x,0);
[q4,r4]=qr(single(x),0);
yhat_q8=q8*q8'*y;
yhat_q4=q4*q4'*single(y);
% plot(yhat_q8-yhat_q4)
% [yhat_q4 yhat_q8 yhat_8 yhat_4]
res_q8= y -yhat_q8;
res_q4=single(y)-yhat_q4;
% Get residual another way via beta to test
beta_q4=inv(r4)*q4'*single(y);
yhat_q4_alt=single(x)*beta_q4;
res_q4_alt=single(y)-yhat_q4_alt;
disp('beta_q4-beta8')
beta_q4-beta8
newx8_q=x;
newx4_q=single(x);
newx8_q(:,7)=res_q8;
newx4_q(:,7)=res_q4;
c8_q=corr(newx8_q);
c4_q=corr(newx4_q);
disp('alt c4_q via beta')
newx4_q_alt=single(x);
newx4_q_alt(:,7)=res_q4_alt;
c4_q_alt=corr(newx4_q_alt);
testres=[res8 res_q8 res4 res_q4 res_q4_alt];
testaccr =[c8(7,:)' c8_q(7,:)' c4(7,:)' c4_q(7,:)' c4_q_alt(7,:)']';
testaccr
disp('++++++++++++++++ Matlab Ending +++++++++++++++')
quit

Table 10.25. Correlation of the Residual and RHS Variables using Matlab 2006b
__________________________________________________________________________________________________
Grunfeld Example   GEF             GEC
__________________________________________________________________________________________________
Real*8 LU     -1.2559e-015    4.3021e-016
Real*8 QR      8.3267e-017    2.0817e-017
Real*4 LU     -4.4651e-007    1.4417e-007
Real*4 QR     -9.9298e-008    2.4814e-007
Real*4 QR_2   -4.7489e-009    1.0605e-007
__________________________________________________________________________________________________
Gas Data of lag Order 3
Model Estimated: gasout = f(gasout(t-1),...,gasout(t-3), gasin(t-1),...,gasin(t-3), constant)
Real*8 LU      2.4454e-010    2.4313e-010    2.233e-010    -1.1906e-010   -1.4591e-010   -1.7591e-010
Real*8 QR     -1.4745e-015   -1.0686e-015   -1.027e-015    -8.9304e-015   -4.5935e-015    2.9837e-016
Real*4 LU     -0.022544      -0.023669      -0.026415       0.0035028      0.0054714      0.0087814
Real*4 QR      2.6992e-006    4.2817e-006    5.239e-006     2.8188e-006    1.8091e-006    9.9824e-007
Real*4 QR_2    8.7203e-005    8.8714e-005    8.4771e-005   -2.5406e-005   -3.4171e-005   -4.6354e-005
__________________________________________________________________________________________________
All calculations done with Matlab Version 2006b. Correlation is done with the Matlab-supplied command corr. The LU factorization and QR analysis are done in Matlab with the LAPACK software [1992]. Real*4 QR calculates the residual as $e = y - QQ'y$, while Real*4 QR_2 uses $e = y - X\hat{\beta}$ with $\hat{\beta} = R^{-1}Q'y$. Real*8 data was converted to real*4 in Matlab using the built-in function single( ).
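A minimal Matlab sketch (synthetic data) of the two real*4 QR residuals defined in the notes to Table 10.25 may help: QR forms $e = y - QQ'y$ directly, while QR_2 first recovers $\hat{\beta} = R^{-1}Q'y$ and then forms $e = y - X\hat{\beta}$.

n = 100;
x = single([ones(n,1) randn(n,2)]);
y = single(randn(n,1));
[q,r] = qr(x,0);            % thin QR in single precision
e_qr  = y - q*(q'*y);       % the "Real*4 QR" residual
beta  = r\(q'*y);           % same as inv(R)*Q'*y, but via back-substitution
e_qr2 = y - x*beta;         % the "Real*4 QR_2" residual
fprintf('max difference = %g\n', max(abs(double(e_qr - e_qr2))));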
Table 10.26. Correlation of the Residual and RHS Variables using Matlab 2007a
__________________________________________________________________________________________________
Grunfeld Example   GEF             GEC
__________________________________________________________________________________________________
Real*8 LU      2.0123e-016   -4.4409e-016
Real*8 QR      4.9613e-016    2.6368e-016
Real*4 LU     -1.8515e-006   -3.2783e-007
Real*4 QR     -2.9802e-008    2.9802e-007
Real*4 QR_2   -7.0781e-008    4.4703e-008
__________________________________________________________________________________________________
Gas Data of lag Order 3
Model Estimated: gasout = f(gasout(t-1),...,gasout(t-3), gasin(t-1),...,gasin(t-3), constant)
Real*8 LU      1.257e-010     1.3066e-010    1.2848e-010   -5.5661e-011   -6.6412e-011   -7.9434e-011
Real*8 QR     -1.2436e-014   -1.1942e-014   -1.2615e-014    2.2346e-014    2.2511e-014    2.1032e-014
Real*4 LU     -0.033102      -0.033387      -0.032592       0.015702       0.019042       0.022815
Real*4 QR      2.0284e-006    1.682e-006     1.3784e-006   -2.6099e-006   -2.5911e-006   -2.6249e-006
Real*4 QR_2   -1.2402e-005   -1.1405e-005   -1.027e-005     1.1327e-005    1.2809e-005    1.3571e-005
__________________________________________________________________________________________________
For a discussion of what is calculated, see Table 10.25. Matlab documentation indicates that the LAPACK routines used were DLANGE, DGETRF, DGECON, DGETRI and SLANGE, SGETRF, SGECON, SGETRI for real*8 and real*4 respectively for the inv( ) command, and DGEQRF, DORGQR and SGEQRF, SORGQR for the Matlab qr( ) command.

10.7 Conclusion

This chapter has illustrated the QR approach to regression analysis and the associated PC regression. A number of examples were used to illustrate various problems. The Wampler dataset shows the effects on accuracy of rank problems in the X matrix, given the y vector, and of changes in the y vector, given the X matrix. While most researchers realize that problems can occur when X is close to not being of full rank, only a few realize that the y vector can also cause problems. The Longley dataset was used to show the effects on accuracy of estimating the coefficients with $(X'X)^{-1}X'y$, with the deviations-from-the-means approach, or with the QR approach. The PC regression was shown to provide important information about the structure of the OLS problem, especially in the case of collinearity or near collinearity. The ridge, lasso and elastic net approaches were illustrated as alternative approaches to data shrinkage. The LTS model, a resistant estimation method, illustrated how outliers can be dropped systematically and the resulting changes in the estimated coefficients and t tests observed. The final example used two relatively easy problems to show how data precision and the method of calculation interact in imposing the OLS requirement that the error of the model be uncorrelated with the right-hand-side variables. The main finding of this research is that real*4 calculation of the OLS model is subject to serious loss of accuracy, especially when the QR method is not employed.