Nonlinear Regression Functions (SW Chapter 8)

Outline
1. Nonlinear regression functions – general comments
2. Nonlinear functions of one variable
3. Nonlinear functions of two variables: interactions

The TestScore – STR relation looks linear (maybe)…

But the TestScore – Income relation looks nonlinear...

The general nonlinear population regression function

$Y_i = f(X_{1i}, X_{2i}, \ldots, X_{ki}) + u_i, \quad i = 1, \ldots, n$

Assumptions
1. $E(u_i \mid X_{1i}, X_{2i}, \ldots, X_{ki}) = 0$ (same); implies that f is the conditional expectation of Y given the X’s.
2. $(X_{1i}, \ldots, X_{ki}, Y_i)$ are i.i.d. (same).
3. Big outliers are rare (same idea; the precise mathematical condition depends on the specific f).
4. No perfect multicollinearity (same idea; the precise statement depends on the specific f).

Nonlinear Functions of a Single Independent Variable (SW Section 8.2)

We’ll look at two complementary approaches:
1. Polynomials in X
2. Logarithmic transformations

1. Polynomials in X

Approximate the population regression function by a polynomial:

$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \ldots + \beta_r X_i^r + u_i$

· Still linear in the parameters?
· Therefore …

Example: TestScore vs. Income

$Income_i$ = average district income in the i-th district (thousands of dollars per capita)

Quadratic specification:
$TestScore_i = \beta_0 + \beta_1 Income_i + \beta_2 (Income_i)^2 + u_i$

Cubic specification:
$TestScore_i = \beta_0 + \beta_1 Income_i + \beta_2 (Income_i)^2 + \beta_3 (Income_i)^3 + u_i$

Estimation in STATA

generate avginc2 = avginc*avginc;     (create a new regressor)
reg testscr avginc avginc2, r;

Regression with robust standard errors                 Number of obs =     420
                                                       F(  2,   417) =  428.52
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5562
                                                       Root MSE      =  12.724

------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      avginc |   3.850995   .2680941    14.36   0.000      3.32401    4.377979
     avginc2 |  -.0423085   .0047803    -8.85   0.000     -.051705   -.0329119
       _cons |   607.3017   2.901754   209.29   0.000     601.5978    613.0056
------------------------------------------------------------------------------

Interpretation

. graph twoway scatter testscr avginc || connected yhat avginc, sort msymbol(none) || connected yhat2 avginc, sort msymbol(none)

See Chapter 3 of Statistics with Stata, especially pages 79 & 118.

Interpretation (b)

Compute “effects” for different values of X:

$\widehat{TestScore} = 607.3 + 3.85\,Income_i - 0.0423\,(Income_i)^2$, with standard errors (2.9), (0.27), and (0.0048)

Predicted change in TestScore for a change in income from $5,000 per capita to $6,000 per capita:

$\Delta\widehat{TestScore} = (607.3 + 3.85 \times 6 - 0.0423 \times 6^2) - (607.3 + 3.85 \times 5 - 0.0423 \times 5^2) = 3.4$

Predicted “effects” for different values of X:

Change in Income ($1000 per capita)     Predicted ΔTestScore
from 5 to 6                             3.4
from 25 to 26                           1.7
from 45 to 46                           0.0

How does the effect change as income increases? Perhaps there is a declining marginal benefit of an increase in school budgets?

Caution! What is the effect of a change from 65 to 66? Don’t extrapolate outside the range of the data!

Interpretation

Marginal effects in STATA

Estimation of a cubic in STATA

test avginc2 avginc3;     (execute the test command after running the regression)

 ( 1)  avginc2 = 0.0
 ( 2)  avginc3 = 0.0

       F(  2,   416) =   37.69
            Prob > F =    0.0000

Write down H0 and H1 … conclusion?

Plotting a cubic in STATA

. reg testscr avginc avginc2 avginc3, rob
. predict yhat3
(option xb assumed; fitted values)
. graph twoway scatter testscr avginc || connected yhat2 avginc, sort msymbol(none) || connected yhat3 avginc, sort msymbol(T)

Marginal effects in STATA

Ramsey’s RESET Test: REgression Specification Error Test

• Consider the model
  (1) $Y_i = \beta_0 + \beta_1 X_{1i} + \ldots + \beta_k X_{ki} + u_i$
• General test for misspecification of functional form
• If LSA #1 holds, then no nonlinear function of the X’s should be significant when added to the model.
• Consider
  (2) $Y_i = \beta_0 + \beta_1 X_{1i} + \ldots + \beta_k X_{ki} + \delta_0 \hat{Y}_i^2 + \delta_1 \hat{Y}_i^3 + \ldots + u_i$
• Null hypothesis is that (1) is correctly specified, i.e., the coefficients on all powers of $\hat{Y}_i$ in (2) are zero
• How many powers of predicted values to include?
• Conduct an F-test on the powers of predicted values
• J.B. Ramsey (1969), “Tests for Specification Error in Classical Linear Least Squares Regression Analysis,” Journal of the Royal Statistical Society, Series B 31, 350–371.

Ramsey’s RESET Test

. reg test str avginc, r

Linear regression                                      Number of obs =     420
                                                       F(  2,   417) =  132.65
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5115
                                                       Root MSE      =  13.349

------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         str |  -.6487401   .3533403    -1.84   0.067     -1.34329      .04581
      avginc |   1.839112    .114733    16.03   0.000     1.613585    2.064639
       _cons |   638.7292   7.301234    87.48   0.000     624.3773     653.081
------------------------------------------------------------------------------

. estat ovtest          (can just type: ovtest)

Ramsey RESET test using powers of the fitted values of testscr
       Ho:  model has no omitted variables
                 F(3, 414) =     18.36
                  Prob > F =      0.0000

Ramsey’s RESET Test

. reg test str avginc avginc2, r

Linear regression                                      Number of obs =     420
                                                       F(  3,   416) =  286.55
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5638
                                                       Root MSE      =  12.629

------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         str |  -.9099512   .3545374    -2.57   0.011    -1.606859   -.2130432
      avginc |   3.881859   .2709564    14.33   0.000     3.349245    4.414474
     avginc2 |   -.044157   .0049606    -8.90   0.000     -.053908    -.034406
       _cons |   625.2308   7.087793    88.21   0.000     611.2984    639.1631
------------------------------------------------------------------------------

. estat ovtest

Ramsey RESET test using powers of the fitted values of testscr
       Ho:  model has no omitted variables
                 F(3, 413) =      2.48
                  Prob > F =      0.0605

Ramsey’s RESET Test

. reg test str avginc avginc2 avginc3, r

Linear regression                                      Number of obs =     420
                                                       F(  4,   415) =  207.23
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5663
                                                       Root MSE      =  12.608

------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         str |  -.9277523   .3562919    -2.60   0.010    -1.628114   -.2273905
      avginc |   5.124736   .7045403     7.27   0.000     3.739824    6.509649
     avginc2 |  -.1011073   .0287052    -3.52   0.000     -.157533   -.0446815
     avginc3 |   .0007293   .0003414     2.14   0.033     .0000582    .0014003
       _cons |   617.8974   7.926373    77.95   0.000     602.3165    633.4782
------------------------------------------------------------------------------

. estat ovtest

Ramsey RESET test using powers of the fitted values of testscr
       Ho:  model has no omitted variables
                 F(3, 412) =      1.79
                  Prob > F =      0.1490

Ramsey’s RESET Test

. reg test str el_pct meal_pct, r

Linear regression                                      Number of obs =     420
                                                       F(  3,   416) =  453.48
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.7745
                                                       Root MSE      =  9.0801

------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         str |  -.9983092   .2700799    -3.70   0.000    -1.529201   -.4674178
      el_pct |  -.1215733   .0328317    -3.70   0.000      -.18611   -.0570366
    meal_pct |  -.5473456   .0241072   -22.70   0.000    -.5947328   -.4999583
       _cons |     700.15    5.56845   125.74   0.000     689.2042    711.0958
------------------------------------------------------------------------------

. estat ovtest

Ramsey RESET test using powers of the fitted values of testscr
       Ho:  model has no omitted variables
                 F(3, 413) =      6.29
                  Prob > F =      0.0004

Ramsey’s RESET Test

. reg test str el_pct meal_pct avginc, r

Linear regression                                      Number of obs =     420
                                                       F(  4,   415) =  467.42
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.8053
                                                       Root MSE      =  8.4477

------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         str |  -.5603892   .2550641    -2.20   0.029    -1.061768   -.0590105
      el_pct |  -.1943282   .0332445    -5.85   0.000    -.2596768   -.1289795
    meal_pct |  -.3963661   .0302302   -13.11   0.000    -.4557895   -.3369427
      avginc |    .674984   .0837161     8.06   0.000     .5104236    .8395444
       _cons |   675.6082   6.201865   108.94   0.000     663.4172    687.7992
------------------------------------------------------------------------------

. estat ovtest

Ramsey RESET test using powers of the fitted values of testscr
       Ho:  model has no omitted variables
                 F(3, 412) =      0.47
                  Prob > F =      0.7014

Ramsey’s RESET Test: replicated

. predict yh
(option xb assumed; fitted values)

. sum yh

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          yh |       420    654.1565    17.09817   614.9183   702.8387

. gen yhz = (yh-r(mean))/r(sd)

. sum yh*

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          yh |       420    654.1565    17.09817   614.9183   702.8387
         yhz |       420    1.22e-09           1  -2.294882   2.847214

. gen yhz2=yhz*yhz
. gen yhz3=yhz^3
. gen yhz4=yhz^4

Ramsey’s RESET Test: replicated

. reg test str el meal avginc yhz2 yhz3 yhz4

      Source |       SS       df       MS              Number of obs =     420
-------------+------------------------------           F(  7,   412) =  244.48
       Model |  122595.145     7  17513.5921           Prob > F      =  0.0000
    Residual |  29514.4488   412  71.6370116           R-squared     =  0.8060
-------------+------------------------------           Adj R-squared =  0.8027
       Total |  152109.594   419  363.030056           Root MSE      =  8.4639

------------------------------------------------------------------------------
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         str |  -.5500585   .2336368    -2.35   0.019    -1.009327   -.0907896
      el_pct |  -.2170374   .0407058    -5.33   0.000    -.2970544   -.1370204
    meal_pct |   -.400967   .0289303   -13.86   0.000    -.4578364   -.3440976
      avginc |   .6476592   .1505253     4.30   0.000     .3517657    .9435527
        yhz2 |   .7652051    .915534     0.84   0.404    -1.034495    2.564906
        yhz3 |  -.0822669   .3243362    -0.25   0.800    -.7198272    .5552933
        yhz4 |  -.0650369   .1767693    -0.37   0.713     -.412519    .2824453
       _cons |   675.8077   5.443279   124.15   0.000     665.1076    686.5077
------------------------------------------------------------------------------

. test yhz2 yhz3 yhz4

 ( 1)  yhz2 = 0
 ( 2)  yhz3 = 0
 ( 3)  yhz4 = 0

       F(  3,   412) =    0.47
            Prob > F =    0.7014

The F-statistic and p-value match the estat ovtest result for the same model (F(3, 412) = 0.47, Prob > F = 0.7014): the RESET test is an F-test on powers 2 to 4 of the standardized fitted values, computed here with conventional (non-robust) standard errors.

2. Logarithmic functions of Y and/or X

The 3 log specifications

Case               Population regression function
I.   linear-log    $Y_i = \beta_0 + \beta_1 \ln(X_i) + u_i$
II.  log-linear    $\ln(Y_i) = \beta_0 + \beta_1 X_i + u_i$
III. log-log       $\ln(Y_i) = \beta_0 + \beta_1 \ln(X_i) + u_i$

I. Linear-log population regression function

Linear-log case, continued

Example: TestScore vs. ln(Income)

$\widehat{TestScore} = 557.8 + 36.42 \ln(Income_i)$, with standard errors (3.8) and (1.40)

so a 1% increase in Income is associated with an increase in TestScore of 0.36 points on the test (a 1% increase raises ln(Income) by about 0.01, and $36.42 \times 0.01 \approx 0.36$).

· Standard errors, confidence intervals, R² – all the usual tools of regression apply here.
· How does this compare to the cubic model?

Linear-log vs. Cubic models
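One way to compare the linear-log and polynomial fits is by the predicted change in TestScore each implies for the same $1,000 income changes. The cubic’s coefficients for the income-only regression are not reported in this section, so this Python sketch (Python rather than Stata, purely illustrative) uses the quadratic fit reported earlier alongside the linear-log fit. It only evaluates the two reported fitted equations; no data are involved. The helper names `quad_hat`, `linlog_hat`, and `predicted_change` are hypothetical.

```python
import math

# Estimated coefficients as reported on the slides (Income in $1000 per capita).
# Quadratic:  TestScore = 607.3 + 3.85*Income - 0.0423*Income^2
# Linear-log: TestScore = 557.8 + 36.42*ln(Income)
QUAD = (607.3, 3.85, -0.0423)
LINLOG = (557.8, 36.42)


def quad_hat(income):
    """Predicted TestScore from the quadratic fit."""
    b0, b1, b2 = QUAD
    return b0 + b1 * income + b2 * income ** 2


def linlog_hat(income):
    """Predicted TestScore from the linear-log fit."""
    b0, b1 = LINLOG
    return b0 + b1 * math.log(income)


def predicted_change(fitted, x0, x1):
    """Predicted change in TestScore when Income moves from x0 to x1."""
    return fitted(x1) - fitted(x0)


for x0 in (5, 25, 45):
    dq = predicted_change(quad_hat, x0, x0 + 1)
    dl = predicted_change(linlog_hat, x0, x0 + 1)
    print(f"{x0} -> {x0 + 1}: quadratic {dq:.1f}, linear-log {dl:.1f}")

# A 1% increase under the linear-log fit is 36.42 * ln(1.01), about 0.36,
# whatever the starting income (20 is an arbitrary base level).
print(f"1% increase, linear-log: {predicted_change(linlog_hat, 20, 20.2):.2f}")
```

Running it should print quadratic effects of 3.4, 1.7, and 0.0 (matching the earlier table) and linear-log effects of 6.6, 1.4, and 0.8, followed by 0.36 for the 1% increase. Both fits show a declining effect of income, but the linear-log slope, 36.42/Income, never turns negative, whereas the quadratic’s slope does beyond its peak.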
II. Log-linear population regression function

Log-linear case, continued

III. Log-log population regression function

Log-log case, continued

Example: ln(TestScore) vs. ln(Income)

$\widehat{\ln(TestScore)} = 6.336 + 0.0554 \ln(Income_i)$, with standard errors (0.006) and (0.0021)

A 1% increase in Income is associated with an increase of 0.0554% in TestScore: a 1% increase raises ln(Income) by about 0.01, so ln(TestScore) rises by about $0.0554 \times 0.01 = 0.000554$ (Income up by a factor of 1.01, TestScore up by a factor of approximately 1.000554).

Example: ln(TestScore) vs. ln(Income), ctd.

The log-linear and log-log specifications:

Other nonlinear functions (and nonlinear least squares) (SW App. 8.1)

The foregoing nonlinear regression functions have flaws…
· Polynomial …
· Linear-log …
· How about a nonlinear function such as
  $Y = \beta_0 - \alpha e^{-\beta_1 X}$
  where $\beta_0$, $\beta_1$, and $\alpha$ are unknown parameters? This is called a negative exponential growth curve.

Negative exponential growth

We want to estimate the parameters of

$Y_i = \beta_0 - \alpha e^{-\beta_1 X_i} + u_i$

or

$Y_i = \beta_0 \left[1 - e^{-\beta_1 (X_i - \beta_2)}\right] + u_i$     (*)

(the two forms are equivalent, with $\alpha = \beta_0 e^{\beta_1 \beta_2}$).

Compare model (*) to the linear-log or cubic models:

$Y_i = \beta_0 + \beta_1 \ln(X_i) + u_i$

$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \beta_3 X_i^3 + u_i$

The linear-log and polynomial models are linear in their parameters ($\beta_0$ and $\beta_1$ in the linear-log model; $\beta_0, \ldots, \beta_3$ in the cubic), but model (*) is not.

Nonlinear Least Squares

Negative exponential growth; RMSE = 12.675
Linear-log; RMSE = 12.618 (oh well…)
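Because model (*) is not linear in $\beta_0$, $\beta_1$, $\beta_2$, OLS formulas do not apply; nonlinear least squares instead minimizes the sum of squared residuals numerically. This Python sketch shows one standard way to do that, Gauss-Newton iterations with step halving, and is not necessarily the routine SW or Stata use. The data are hypothetical and noise-free, generated from (*) with made-up parameters (2.0, 0.3, 1.0) at x = 1, …, 20 (they are not the California test score data), and the starting values are likewise an illustrative assumption: NLS always needs starting values, and with real data the answer can depend on them. All function names are hypothetical.

```python
import math


def model(x, b0, b1, b2):
    """Negative exponential growth curve: b0 * [1 - exp(-b1 * (x - b2))]."""
    return b0 * (1.0 - math.exp(-b1 * (x - b2)))


def ssr(xs, ys, theta):
    """Sum of squared residuals; returns inf if exp() overflows for this theta."""
    try:
        return sum((y - model(x, *theta)) ** 2 for x, y in zip(xs, ys))
    except OverflowError:
        return math.inf


def solve3(a, b):
    """Solve the 3x3 system a @ d = b by Gaussian elimination with partial pivoting."""
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    d = [0.0] * 3
    for r in range(2, -1, -1):
        d[r] = (m[r][3] - sum(m[r][c] * d[c] for c in range(r + 1, 3))) / m[r][r]
    return d


def gauss_newton(xs, ys, theta, max_iter=200):
    """Minimize the SSR of model (*) over theta = [b0, b1, b2]."""
    theta = list(theta)
    for _ in range(max_iter):
        b0, b1, b2 = theta
        jtj = [[0.0] * 3 for _ in range(3)]  # J'J
        jtr = [0.0] * 3                      # J'r
        for x, y in zip(xs, ys):
            e = math.exp(-b1 * (x - b2))
            # Partial derivatives of the model with respect to (b0, b1, b2).
            g = [1.0 - e, b0 * e * (x - b2), -b0 * b1 * e]
            r = y - b0 * (1.0 - e)
            for i in range(3):
                jtr[i] += g[i] * r
                for j in range(3):
                    jtj[i][j] += g[i] * g[j]
        step = solve3(jtj, jtr)  # Gauss-Newton direction: (J'J) step = J'r
        current = ssr(xs, ys, theta)
        t = 1.0
        while t > 1e-10:  # step halving: shrink until the SSR does not increase
            candidate = [th + t * s for th, s in zip(theta, step)]
            if ssr(xs, ys, candidate) <= current:
                break
            t /= 2.0
        else:
            return theta  # no step lowers the SSR any further
        theta = candidate
        if max(abs(t * s) for s in step) < 1e-12:
            break
    return theta


# Hypothetical, noise-free synthetic data (NOT the test score data).
true_theta = [2.0, 0.3, 1.0]
xs = list(range(1, 21))
ys = [model(x, *true_theta) for x in xs]

estimate = gauss_newton(xs, ys, [2.0, 0.2, 0.5])
print([round(v, 4) for v in estimate])
```

Since the synthetic data contain no error term, the estimates should recover the assumed parameters essentially exactly. With real data one would also want standard errors, which this sketch does not compute.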