LM (Lagrange Multiplier) Test on Non-linearities and Model Specification
Data 4-8
sub = subscription rate, i.e., number of subscribers (demand for cable TV); this is the dependent variable
home = number of homes passed by each system
inst = installation fee
svc = monthly service charge
tv = number of TV signals carried by each cable system
age = age of each system in years
air = number of free TV signals received
y = per capita income in dollars
ORIGINAL MODEL (with all variables)
reg sub home inst svc tv age air y
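Source | SS df MS Number of obs = 40
-------------+------------------------------ F( 7, 32) = 36.15
Model | 38941.0866 7 5563.01237 Prob > F = 0.0000
Residual | 4923.91489 32 153.87234 R-squared = 0.8877
-------------+------------------------------ Adj R-squared = 0.8632
Total | 43865.0015 39 1124.74363 Root MSE = 12.405
------------------------------------------------------------------------------
sub | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
home | .4055489 .0350034 11.59 0.000 .3342493 .4768484
inst | -.5264195 .4760743 -1.11 0.277 -1.496151 .4433121
svc | 2.038732 2.126968 0.96 0.345 -2.293759 6.371224
tv | .7565077 .6878111 1.10 0.280 -.6445177 2.157533
age | 1.193511 .5026509 2.37 0.024 .1696448 2.217378
air | -5.111142 1.518459 -3.37 0.002 -8.204143 -2.018142
y | .0016552 .0034692 0.48 0.637 -.0054114 .0087217
_cons | -6.807724 26.65981 -0.26 0.800 -61.11198 47.49653
------------------------------------------------------------------------------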
For the LM test, we need the residuals, û (uhat). In a menu-driven package, go to PROC / Make Residuals under this equation output. The residuals will be stored as RESID01, unless they have already been stored in the data as resid. In Stata, use the command in Step 1 instead.
Step 1: Type predict uhat, res to save the residuals from this model as a new variable, uhat.
Regress uhat on the original (linear) variables and on the squares of all these variables. This is called the auxiliary regression.
To create the squared variables, use the generate command. For example, generate homesq = home^2, and repeat this for every variable. The output is below.
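Step 1 can be written as a short do-file. This is a minimal sketch. It assumes the Data 4-8 variables are already in memory under the names used above. The foreach loop replaces typing generate once per variable, and it produces the same names (homesq, instsq, and so on) that appear in the auxiliary regression.

```stata
* Step 1 (sketch): original model, residuals, and squared regressors
regress sub home inst svc tv age air y
predict uhat, residuals          // same as: predict uhat, res

* generate homesq, instsq, svcsq, tvsq, agesq, airsq, ysq in one loop
foreach v of varlist home inst svc tv age air y {
    generate `v'sq = `v'^2
}

* auxiliary regression: residuals on the levels and the squares
regress uhat home inst svc tv age air y homesq instsq svcsq tvsq agesq ysq airsq
```

After the last line, the auxiliary regression is the active estimate. Its stored results e(N) and e(r2) are therefore available for Step 2.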
AUXILIARY REGRESSION
Dependent Variable: uhat
reg uhat home inst svc tv age air y homesq instsq svcsq tvsq agesq ysq airsq
Source | SS df MS Number of obs = 40
-------------+------------------------------ F( 14, 25) = 2.18
Model | 2707.24898 14 193.374927 Prob > F = 0.0431
Residual | 2216.66599 25 88.6666394 R-squared = 0.5498
-------------+------------------------------ Adj R-squared = 0.2977
Total | 4923.91497 39 126.25423 Root MSE = 9.4163
------------------------------------------------------------------------------
uhat | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
home | .0338516 .0838807 0.40 0.690 -.138904 .2066071
inst | .9183878 2.124189 0.43 0.669 -3.456461 5.293237
svc | 10.10552 19.19416 0.53 0.603 -29.42559 49.63664
tv | -1.417977 2.654203 -0.53 0.598 -6.884409 4.048455
age | -2.550652 1.462274 -1.74 0.093 -5.562261 .4609572
air | 23.82289 5.239157 4.55 0.000 13.03264 34.61313
y | .0828789 .0525753 1.58 0.128 -.025402 .1911597
homesq | .0002207 .0002839 0.78 0.444 -.0003639 .0008053
instsq | -.0210408 .0655122 -0.32 0.751 -.1559657 .113884
svcsq | -.7789781 1.285439 -0.61 0.550 -3.426389 1.868433
agesq | .1392755 .0733663 1.90 0.069 -.0118254 .2903763
tvsq | .0484437 .1016867 0.48 0.638 -.160984 .2578714
ysq | -4.55e-06 2.83e-06 -1.60 0.121 -.0000104 1.29e-06
airsq | -1.582278 .3731794 -4.24 0.000 -2.350855 -.8137009
_cons | -481.4361 264.2859 -1.82 0.080 -1025.743 62.87092
------------------------------------------------------------------------------
Step 2: Test the hypothesis that the coefficients of all squared variables are equal to zero by computing the following statistic.
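N × R² = 40 × 0.5498 = 21.99 ≈ 22, where N is the number of observations and R² is from the auxiliary regression. Under the null hypothesis, this statistic has (asymptotically) a chi-squared distribution with degrees of freedom equal to 7, the number of squared variables.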
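The test can also be written in equation form. The symbols α, β, γ, v and x_{ij} below are our own labels for this sketch. They do not appear in the output. Here x_{i1}, …, x_{i7} stand for home, inst, svc, tv, age, air and y for system i.

```latex
\hat{u}_i = \alpha_0 + \sum_{j=1}^{7} \beta_j x_{ij} + \sum_{j=1}^{7} \gamma_j x_{ij}^2 + v_i

H_0:\ \gamma_1 = \gamma_2 = \cdots = \gamma_7 = 0
\qquad H_1:\ \text{at least one } \gamma_j \neq 0

LM = N R^2_{\text{aux}} = 40 \times 0.5498 = 21.99
\;\overset{a}{\sim}\; \chi^2_7 \ \text{under } H_0
```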
If N × R² exceeds the 5% chi-squared critical value with 7 df, reject the null hypothesis that the non-linear (squared) terms have zero coefficients. The alternative is that non-linearities are present in the regression equation. The 5% critical value from the chi-squared table is 14.07. Since 22 > 14.07, we reject the null and conclude that at least some of the squared terms belong in the model.
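Stata can compute the statistic and the critical value directly, so you do not need a table. This is a sketch that assumes it runs immediately after the auxiliary regression, so that e(N) and e(r2) refer to that regression. The 7 degrees of freedom are hard-coded because seven squared terms were added. Without the data, you can replace e(N) and e(r2) with the reported values 40 and 0.5498.

```stata
* Step 2 (sketch): LM = N * R-squared of the auxiliary regression
scalar LM   = e(N)*e(r2)
scalar crit = invchi2tail(7, 0.05)   // 5% critical value, 7 df (one per squared term)
scalar pval = chi2tail(7, LM)        // upper-tail p-value of the LM statistic
display "LM = " %6.3f LM "   5% critical value = " %6.3f crit "   p-value = " %6.4f pval
if LM > crit display "Reject H0: at least one squared term belongs in the model"
```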
Step 3: Add to the revised model only those squared variables whose p-values in the auxiliary regression are less than 0.5.
The auxiliary regression helps us determine which new variables to add to the model.
We use the arbitrary rule that only squared variables with p-values below 50% are added, to capture possible non-linearities. By this rule, the squares of home, air, y and age (homesq, airsq, ysq and agesq) are included in the revised model. The revised regression output is below.
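The 50% screening rule can be applied automatically. This is a sketch that assumes the auxiliary regression is still the active estimate. It recomputes each squared term's two-sided p-value from _b and _se, which matches the P>|t| column. It then collects the names that pass in a local macro. The macro name keep is our own choice.

```stata
* Step 3 (sketch): keep squared terms with auxiliary-regression p-value < 0.5
local keep
foreach v in homesq instsq svcsq tvsq agesq ysq airsq {
    local p = 2*ttail(e(df_r), abs(_b[`v']/_se[`v']))   // two-sided t-test p-value
    if `p' < 0.5 local keep `keep' `v'
}
display "squared terms to add: `keep'"
```

With the auxiliary output above, this should list homesq, agesq, ysq and airsq, in loop order.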
REVISED REGRESSION
reg sub home inst svc tv age air y airsq homesq agesq ysq
Source | SS df MS Number of obs = 40
-------------+------------------------------ F( 11, 28) = 45.85
Model | 41557.8147 11 3777.98315 Prob > F = 0.0000
Residual | 2307.18681 28 82.3995289 R-squared = 0.9474
-------------+------------------------------ Adj R-squared = 0.9267
Total | 43865.0015 39 1124.74363 Root MSE = 9.0774
------------------------------------------------------------------------------
sub | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
home | .4319163 .0792356 5.45 0.000 .2696096 .594223
inst | -.1820829 .3957133 -0.46 0.649 -.9926647 .628499
svc | .212256 1.966629 0.11 0.915 -3.8162 4.240712
tv | .6961586 .5292162 1.32 0.199 -.3878917 1.780209
age | -1.071755 1.230453 -0.87 0.391 -3.592225 1.448715
air | 18.19856 4.882402 3.73 0.001 8.197409 28.1997
y | .0756733 .0475538 1.59 0.123 -.0217363 .1730829
airsq | -1.557916 .3382912 -4.61 0.000 -2.250874 -.8649579
homesq | .000224 .0002689 0.83 0.412 -.0003268 .0007748
agesq | .1173853 .0579629 2.03 0.052 -.0013463 .2361169
ysq | -4.05e-06 2.56e-06 -1.58 0.124 -9.29e-06 1.19e-06
_cons | -407.0792 211.7802 -1.92 0.065 -840.8913 26.73291
------------------------------------------------------------------------------
Step 4: To arrive at the final model, eliminate variables one at a time. At each round, drop the variable with the highest p-value and re-estimate the model. The final model, which keeps the non-linear (squared) terms airsq, agesq and ysq, is shown below.
All remaining variables are significant at the 10% level. With an R-squared of 0.9425 (adjusted R-squared 0.9299), the model fits well.
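Stata's stepwise prefix can automate this backward elimination. When only pr() is given, it removes the single least significant term, refits, and repeats until every remaining term has a p-value below the threshold. The 10% threshold is an assumption, taken from the statement that all retained variables are significant at 10%. We have not checked that the automated path drops the same four variables (inst, svc, tv and homesq) as the manual procedure. Compare its output with the final model below.

```stata
* Step 4 (sketch): backward elimination from the revised model,
* removing one term at a time while its p-value is >= 0.10
stepwise, pr(0.10): regress sub home inst svc tv age air y airsq homesq agesq ysq
```

Note that stepwise treats each term on its own. It will not automatically keep a level such as age just because agesq stays in the model.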
FINAL MODEL
reg sub home age air y airsq agesq ysq
Source | SS df MS Number of obs = 40
-------------+------------------------------ F( 7, 32) = 74.94
Model | 41343.0677 7 5906.15253 Prob > F = 0.0000
Residual | 2521.9338 32 78.8104313 R-squared = 0.9425
-------------+------------------------------ Adj R-squared = 0.9299
Total | 43865.0015 39 1124.74363 Root MSE = 8.8775
------------------------------------------------------------------------------
sub | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
home | .4959548 .0283002 17.52 0.000 .4383091 .5536005
age | -1.557536 .9037355 -1.72 0.094 -3.398385 .2833129
air | 17.30471 4.341038 3.99 0.000 8.462305 26.14711
y | .1108304 .034791 3.19 0.003 .0399634 .1816974
airsq | -1.417655 .2919451 -4.86 0.000 -2.012328 -.8229826
agesq | .1392119 .0437577 3.18 0.003 .0500803 .2283435
ysq | -5.95e-06 1.88e-06 -3.16 0.003 -9.78e-06 -2.12e-06
_cons | -562.6759 158.0816 -3.56 0.001 -884.6775 -240.6743
------------------------------------------------------------------------------