4. Nonlinear regression functions

Up to now:
• The population regression function was assumed to be linear
• The slope(s) of the population regression function is (are) constant
• The effect on Y of a unit change in the regressor Xj (j = 1, . . . , k) does not depend on the value of Xj

Now:
• Two groups of methods for detecting and modeling nonlinear population regression functions
72

Group #1:
• The effect on Y of a change in one regressor, say X1, depends on the value of X1 itself
• Example: reducing class size by one student per teacher (that is, a change in STR) might have a larger effect on TEST SCORE when class sizes are already manageably small

Group #2:
• The effect on Y of a change in one regressor, say X1, depends on the value of another regressor, say X2
• Example: students still learning English might benefit from having more one-on-one attention
• The effect on TEST SCORE of reducing STR is greater in districts with higher values of PCTEL
73

Population regression functions with different slopes
[Figure: three panels showing (a) a regression function whose slope depends on the value of X1, (b) a regression function with constant slope, and (c) two regression functions of X1 with different slopes, one for X2 = 0 and one for X2 = 1]
74

4.1. A general strategy for modeling nonlinear regression functions

Empirical example:
• Consider the student-performance dataset
• Generally, we would expect the economic background of the students to have an impact on TEST SCORE ('rich' students perform better than 'poor' students)
• The economic background is measured by the variable AVGINC (average per capita income in the school district in thousands of 1998 US dollars)
75

Test score vs.
district income with a linear OLS regression function
[Figure: scatterplot of test score (600–720) against district income (0–60 thousand dollars) with the fitted linear OLS regression line]
76

Scatterplot characteristics:
• The variables AVGINC and TEST SCORE are highly correlated (correlation coefficient: 0.71)
• For incomes below 10000 US-$ or above 40000 US-$ the points lie below the OLS line
• For incomes between 15000 US-$ and 30000 US-$ the points lie above the OLS line
−→ nonlinear relationship between TEST SCORE and AVGINC

Possibly:
• Quadratic relationship between both variables:

TEST SCOREi = β0 + β1 · AVGINCi + β2 · AVGINCi^2 + ui    (4.1)

(quadratic regression model)
77

Estimation of model (4.1):
• Eq. (4.1) is a variant of the multiple regression model

Yi = β0 + β1 · X1i + . . . + βk · Xki + ui

with k = 2 and

X1i = AVGINCi
X2i = AVGINCi^2

−→ the OLS estimation technique is applicable
• We can test the null hypothesis that the population regression function is linear against the alternative that it is quadratic by conducting the test

H0 : β2 = 0   vs.   H1 : β2 ≠ 0

on the basis of the conventional t-statistic
78

OLS estimation results of the quadratic model (4.1)

Dependent Variable: TEST_SCORE
Method: Least Squares
Date: 10/04/12  Time: 18:31
Sample: 1 420
Included observations: 420
White heteroskedasticity-consistent standard errors & covariance

Variable     Coefficient   Std. Error   t-Statistic   Prob.
C            607.3017      2.901754     209.2878      0.0000
AVGINC       3.850995      0.268094     14.36434      0.0000
AVGINC_SQ    -0.042308     0.004780     -8.850509     0.0000

R-squared            0.556173    Mean dependent var      654.1565
Adjusted R-squared   0.554045    S.D. dependent var      19.05335
S.E. of regression   12.72381    Akaike info criterion   7.931944
Sum squared resid    67510.32    Schwarz criterion       7.960803
Log likelihood       -1662.708   Hannan-Quinn criter.    7.943350
F-statistic          261.2778    Durbin-Watson stat      0.951439
Prob(F-statistic)    0.000000
79

Test scores vs.
district income with a quadratic OLS regression function
[Figure: scatterplot of test score (600–720) against district income (0–60 thousand dollars) with the fitted quadratic OLS regression function]
80

Obviously:
• β2 is significantly different from zero at all conventional levels
−→ the quadratic model fits the data better than the linear model

Next:
• Consider the general nonlinear regression model

Yi = f(X1i, X2i, . . . , Xki) + ui   (i = 1, . . . , n)    (4.2)

where f(X1i, X2i, . . . , Xki) is a general nonlinear population regression function
• Under the OLS assumptions on Slide 18 we have

E(Yi|X1i, X2i, . . . , Xki) = f(X1i, X2i, . . . , Xki)
81

Question:
• What is the expected effect on Y of a change in one regressor, say a change ∆Xj in Xj (j = 1, . . . , k)?

Answer:
• The expected change in Y, ∆Y, associated with the change ∆Xj in the regressor Xj, holding all other regressors constant, is the difference between the value of the population regression function before and after changing Xj:

∆Y = f(X1, . . . , Xj + ∆Xj, . . . , Xk) − f(X1, . . . , Xj, . . . , Xk)    (4.3)
82

Remarks:
• Note that the specific parametric form of f(X1, X2, . . . , Xk) is unknown
• f(X1, X2, . . . , Xk) contains unknown parameters that have to be estimated from the data
• Let f̂(X1, X2, . . . , Xk) denote the predicted value of Y based on the estimator f̂ of the population regression function
• Then the predicted change in Y is

∆Ŷ = f̂(X1, . . . , Xj + ∆Xj, . . . , Xk) − f̂(X1, . . . , Xj, . . . , Xk)    (4.4)
83

Example:
• Consider the quadratic OLS regression of TEST SCORE on AVGINC and AVGINC^2 on Slide 79 with the estimated coefficients

β̂0 = 607.3017,  β̂1 = 3.8510,  β̂2 = −0.0423

• An increase in district income from 10 to 11 (i.e. from 10000 US-$ per capita to 11000 US-$) yields the estimated effect

∆Ŷ = (β̂0 + β̂1 · 11 + β̂2 · 11^2) − (β̂0 + β̂1 · 10 + β̂2 · 10^2) = 2.9627

• An increase in district income from 40 to 41 (i.e.
from 40000 US-$ per capita to 41000 US-$) yields the estimated effect

∆Ŷ = (β̂0 + β̂1 · 41 + β̂2 · 41^2) − (β̂0 + β̂1 · 40 + β̂2 · 40^2) = 0.4247
84

Example: [continued]
• Obviously, a change in income of 1000 US-$ is associated with a larger change in predicted test scores if the initial income is low (10000 US-$) than if it is high (40000 US-$)
• The predicted changes are 2.9627 points versus 0.4247 points

Remarks:
• The estimator ∆Ŷ of the effect on Y of changing the regressor Xj depends on the estimator of the population regression function, f̂, which varies from one sample to the next
−→ ∆Ŷ contains sampling error
• There are several techniques for computing the standard error SE(∆Ŷ) (see Stock & Watson, 2011, pp. 302–303)
85

Strategy for modeling nonlinear regressions:
1. Identify a possible nonlinear relationship (use economic theory and general knowledge)
2. Specify a nonlinear function and estimate its parameters by OLS (see the next section for various nonlinear functions)
3. Check whether the nonlinear model improves upon a linear model (use t- and F-statistics)
4. Plot the estimated nonlinear regression function
5. Estimate the effect on Y of a change in the regressor Xj
86

4.2. Nonlinear functions of a single regressor

Outline:
• Description of the most important nonlinear regression functions (polynomials and logarithms)
• We restrict attention to regressions with a single regressor
• Extensions to multiple regressors are straightforward
• We treat the alternative nonlinear functions separately, although it is unproblematic to combine them in one regression function
87

4.2.1. Polynomials

Definition 4.1: (Polynomial regression model)
We define the general polynomial regression model of degree r as

Yi = β0 + β1 · Xi + β2 · Xi^2 + . . . + βr · Xi^r + ui.    (4.5)

When r = 2 or r = 3, we call Eq. (4.5) the quadratic or the cubic regression model, respectively.

Remarks:
• We interpret Xi, Xi^2, . . . , Xi^r as the r distinct regressors X1i, X2i, . . .
, Xri
• We estimate the parameters β0, β1, . . . , βr via OLS by regressing Yi against Xi, Xi^2, . . . , Xi^r
88

Test of linear versus polynomial specification:
• If the 'true' regression is linear, then the terms Xi^2, Xi^3, . . . , Xi^r do not enter the population regression function (4.5)
• Hypothesis test:

H0: 'Regression is linear'  vs.  H1: 'Regression is a polynomial of degree r'

• In terms of the coefficients:

H0 : β2 = 0, β3 = 0, . . . , βr = 0  vs.  H1 : at least one βj ≠ 0 (j = 2, . . . , r)

• Use the F-testing strategy as described in Section 3.4 to test this specification issue
89

Which polynomial degree?
• Trade-off between (1) flexibility in the shape of the regression function and (2) the precision of the estimated coefficients
• Include just enough polynomial terms to model the nonlinear regression function adequately, but no more

Sequential testing procedure:
1. Pick a maximum degree r and perform the OLS estimation (try a maximal r = 4)
2. Use the t-statistic to test H0 : βr = 0 vs. H1 : βr ≠ 0
3. If you reject H0 in Step 2, use the polynomial of degree r and stop the procedure
90

Sequential testing procedure: [continued]
4. If you do not reject H0 in Step 2, eliminate the Xi^r term from the regression and estimate a polynomial of degree r − 1. Use the t-statistic to test H0 : βr−1 = 0 vs. H1 : βr−1 ≠ 0
5. If you reject H0 in Step 4, use the polynomial of degree r − 1 and stop the procedure
6. If you do not reject H0 in Step 4, continue this procedure until the coefficient on the highest power in your polynomial is statistically significant
91

4.2.2.
Logarithms

Natural logarithm [ln(x)]:
• ln(x) is the most important nonlinear function in economics
• Logs convert changes in variables into percentage changes

Logs and percentages:
• Consider a variable x and a small change in x, ∆x
• The percentage change in x is given by 100 · (∆x/x)
• The following approximation holds:

ln(x + ∆x) − ln(x) ≈ ∆x/x

(the difference in logs approximates the percentage change divided by 100)
92

Definition 4.2: (Logarithmic regression models)
We consider the following three types of regression models:

Yi = β0 + β1 · ln(Xi) + ui,    (4.6)
ln(Yi) = β0 + β1 · Xi + ui,    (4.7)
ln(Yi) = β0 + β1 · ln(Xi) + ui.    (4.8)

We refer to the models (4.6) – (4.8) as the linear-log, the log-linear and the log-log model, respectively.

Remarks:
• The regression models (4.6) – (4.8) are conventional regression models with a single regressor
−→ the OLS estimation technique applies (provided that the OLS assumptions are satisfied)
93

Remarks: [continued]
• The three models (4.6) – (4.8) differ in the interpretation of the coefficient β1

Interpretation of β1:
• In the linear-log model (4.6) a 1% change in X is associated with a change in Y of 0.01β1
• In the log-linear model (4.7) a change in X by one unit (∆X = 1) is associated with a 100β1% change in Y
• In the log-log model (4.8) a 1% change in X is associated with a β1% change in Y, that is, β1 is the elasticity of Y with respect to X (see class for details)
94

Remarks:
• Which of the log regression models best fits the data?
• Only the log-linear and the log-log models (4.7) and (4.8) can be compared via their R̄² values
• The linear-log model (4.6) cannot be compared with the other log models via the R̄² values since the dependent variables differ (Yi vs. ln(Yi))
−→ use economic theory and other expert knowledge of the specific data problem at hand to decide whether it makes sense to specify Y in logarithms
95

4.3.
Interactions between regressors

Up to now:
• The nonlinear relationship between Y and the regressor X depends on the value of the regressor X itself

Now:
• The effect on Y of a change in one regressor, say X1, depends on the value of another regressor, say X2
−→ interactions between the regressors
96

4.3.1. Interactions between two dummy regressors

Definition 4.3: (Dummy variables)
We consider a potential regressor that may indicate the presence or absence of a qualitative characteristic or an attribute (such as male or female, catholic or non-catholic, and so forth). We quantify such attributes by constructing artificial variables of the form

Di = 1 if the attribute is present for the ith observation,
Di = 0 if the attribute is not present for the ith observation,

for i = 1, . . . , n. We call variables like Di dummy variables (or binary or indicator variables).
97

Remarks:
• We have already made use of dummy variables on Slides 34–36
• Dummies are essentially nominal-scale (qualitative) variables that have been quantified
• Note that a dummy variable can only assume the two values 0 and 1
• Dummy regressors as specified in Definition 4.3 can be incorporated in regression models just as easily as any other quantitative (continuous) regressor
98

Consider the following empirical problem:
• Assume you have a data set containing the dependent variable (log) earnings, that is,

Yi = ln(EARNINGSi)   (i = 1, . . .
, n) and the two dummy variables

D1i = 1 if the ith worker has a college degree, 0 otherwise
D2i = 1 if the ith worker is female, 0 otherwise

• You aim at analyzing the effects of a worker's schooling (college degree or not) and the worker's gender (female or male) on the worker's earnings
99

Empirical problem: [continued]
• Consider the intuitive regression model

Yi = β0 + β1 · D1i + β2 · D2i + ui    (4.9)

• Interpretation of the parameters:
β1 is the effect on (log) earnings of having a college degree, holding gender constant
β2 is the effect on (log) earnings of being female, holding schooling constant
• The limitation of this model is that the effect on earnings of having a college degree is the same for men and women
100

Removal of this limitation:
• Augment the regression model (4.9) by the interaction term (D1i × D2i):

Yi = β0 + β1 · D1i + β2 · D2i + β3 · (D1i × D2i) + ui    (4.10)

• The interaction term (D1i × D2i) in (4.10) allows the population effect on log earnings (Yi) of having a college degree (that is, changing D1i from D1i = 0 to D1i = 1) to depend on gender (D2i)

Mathematical background:
• Use Formula (4.3) on Slide 82 to compute the expected effect on Y, ∆Y, resulting from a change in D1i from 0 to 1, given the fixed value d2 for D2i
101

Mathematical background: [continued]
• We have

E(Yi|D1i = 0, D2i = d2) = β0 + β1 × 0 + β2 × d2 + β3 × (0 × d2)
                        = β0 + β2 × d2

and

E(Yi|D1i = 1, D2i = d2) = β0 + β1 × 1 + β2 × d2 + β3 × (1 × d2)
                        = β0 + β1 + β2 × d2 + β3 × d2

• This yields the expected effect on Y:

∆Y = E(Yi|D1i = 1, D2i = d2) − E(Yi|D1i = 0, D2i = d2)
   = β0 + β1 + β2 × d2 + β3 × d2 − β0 − β2 × d2
   = β1 + β3 × d2    (4.11)
102

Interpretation of (4.11):
• The expected effect of acquiring a college degree (that is, a unit change in D1i) depends on the person's gender:

∆Y = β1        if the worker is male (d2 = 0)
∆Y = β1 + β3   if the worker is female (d2 = 1)

−→ The coefficient β3 on the interaction term (D1i × D2i) in regression (4.10) is the difference in the effect of
acquiring a college degree for women versus men

Empirical exercise:
• Interaction between the student-teacher ratio and the percentage of English learners in a dummy regression model of the form (4.10) (see class)
103

4.3.2. Interactions between a continuous and a dummy regressor

Consider the following data set:
• The dependent variable is (log) earnings, that is, Yi = ln(EARNINGSi) (i = 1, . . . , n)
• We consider the dummy regressor

Di = 1 if the ith worker has a college degree, 0 otherwise

• We consider the continuous regressor

Xi = individual's years of work experience
104

Three alternative regression models:
• Specification with both regressors, no interaction term:

Yi = β0 + β1 · Xi + β2 · Di + ui    (4.12)

• Specification with both regressors plus interaction term:

Yi = β0 + β1 · Xi + β2 · Di + β3 · (Xi × Di) + ui    (4.13)

• Specification with the continuous regressor plus interaction term:

Yi = β0 + β1 · Xi + β2 · (Xi × Di) + ui    (4.14)

Question:
• For each of the specifications (4.12) – (4.14), what are the expected effects on (log) earnings (Yi) of having a college degree (that is, from changing Di from Di = 0 to Di = 1)?
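As an aside (not part of the original slides): specification (4.13) can be estimated with any OLS routine. A minimal sketch in Python/NumPy on simulated data, where the "true" parameter values, the sample size and the noise level are all invented for illustration:

```python
# Illustrative sketch: OLS estimation of specification (4.13),
#   Yi = b0 + b1*Xi + b2*Di + b3*(Xi*Di) + ui,
# on simulated data. All numeric values below are invented.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
X = rng.uniform(0, 40, n)                # years of work experience
D = rng.integers(0, 2, n).astype(float)  # 1 = college degree, 0 = no degree
b0, b1, b2, b3 = 2.0, 0.02, 0.5, 0.01    # invented "true" parameters
Y = b0 + b1 * X + b2 * D + b3 * X * D + rng.normal(0, 0.1, n)

# Regressor matrix: constant, X, D, and the interaction X*D
Z = np.column_stack([np.ones(n), X, D, X * D])
beta_hat, *_ = np.linalg.lstsq(Z, Y, rcond=None)

# Expected effect of changing D from 0 to 1 at a given X,
# i.e. the estimated analogue of beta2 + beta3*X
def college_effect(x):
    return beta_hat[2] + beta_hat[3] * x

print(beta_hat)            # close to (2.0, 0.02, 0.5, 0.01)
print(college_effect(10))  # close to 0.5 + 0.01*10 = 0.6
```

Because of the interaction term, the fitted effect of a college degree is not a single number but a function of experience X, which is exactly the point of specification (4.13).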
105

Expected effects:
• For specification (4.12) we have

∆Y = E(Yi|Di = 1, Xi) − E(Yi|Di = 0, Xi)
   = β0 + β1 · Xi + β2 − β0 − β1 · Xi
   = β2    (4.15)

• By analogous calculations, we find for the specifications (4.13) and (4.14)

∆Y = β2 + β3 · Xi    (4.16)

and

∆Y = β2 · Xi    (4.17)
106

Remarks:
• The effects ∆Y = E(Yi|Di = 1, Xi) − E(Yi|Di = 0, Xi) computed in (4.15) – (4.17) can be interpreted as differences between the two population regression functions associated with the two values of the dummy regressor, Di = 1 and Di = 0
• For specification (4.12) this difference is constantly equal to β2, producing two population regression lines with different intercepts and the same slope
• By analogous reasoning, the specifications (4.13) and (4.14) produce population regression lines with (a) different intercepts and different slopes and with (b) the same intercept and different slopes (see the figure on Slide 108)
107

Regression functions using dummy and continuous regressors
[Figure: population regression lines for the specifications (4.12) – (4.14)]
108

Interpretation of coefficients:
• How can we interpret the specific regression coefficients involved in the specifications (4.12) – (4.14)?
−→ see class

Empirical exercise:
• Application to the student-teacher ratio and the percentage of English learners (see class)
109

4.3.3. Interactions between two continuous regressors

Consider the following data set:
• The dependent variable is (log) earnings, that is, Yi = ln(EARNINGSi) (i = 1, . . .
, n)
• We consider the two continuous regressors

X1i = individual's years of work experience
X2i = individual's years of schooling

• Regression specification:

Yi = β0 + β1 · X1i + β2 · X2i + β3 · (X1i × X2i) + ui    (4.18)
110

Expected effects on Y:
• In (4.18), a change in X1 by ∆X1 (holding X2 constant) leads to

∆Y = (β1 + β3 · X2)∆X1    (4.19)

−→ the effect on Y of a change in X1 by ∆X1 depends on the value of X2
• Analogously, we find that the effect on Y of a change in X2 by ∆X2 (holding X1 constant) depends on the value of X1:

∆Y = (β2 + β3 · X1)∆X2    (4.20)
111

Expected effects on Y: [continued]
• We now consider a simultaneous change in X1 by ∆X1 and, at the same time, in X2 by ∆X2
• We then find that the expected change in Y is given by

∆Y = (β1 + β3 · X2)∆X1 + (β2 + β3 · X1)∆X2 + β3∆X1∆X2    (4.21)

• The first term in (4.21) is the effect of changing X1 holding X2 constant
• The second term in (4.21) is the effect of changing X2 holding X1 constant
• The final term, β3∆X1∆X2, in (4.21) is the extra effect of changing both X1 and X2
112

Interactions in multiple regression: [summary]
• The interaction term between the regressors X1 and X2 is their product X1 × X2
• Including the interaction term allows the effect on Y of a change in X1 to depend on the value of X2 and, conversely, allows the effect of a change in X2 to depend on the value of X1
• The coefficient on X1 × X2 is the effect of a one-unit increase in X1 and X2, above and beyond the sum of the individual effects of a unit increase in X1 alone and a unit increase in X2 alone
• This is true irrespective of whether X1 and/or X2 are continuous or dummy regressors
113
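The decomposition in Eq. (4.21) can be verified numerically. A short sketch (not part of the original slides; all parameter values and changes are invented for illustration):

```python
# Numerical check of the decomposition in Eq. (4.21): for the regression
# function f(X1, X2) = b0 + b1*X1 + b2*X2 + b3*X1*X2, a simultaneous
# change (dX1, dX2) equals the two partial effects plus the cross term
# b3*dX1*dX2. All numeric values below are invented.
b0, b1, b2, b3 = 1.0, 0.04, 0.08, 0.002   # invented parameter values

def f(x1, x2):
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

x1, x2 = 10.0, 12.0    # e.g. years of experience and years of schooling
dx1, dx2 = 1.0, 2.0    # simultaneous changes in both regressors

# Left-hand side of (4.21): change in f from moving both regressors
delta_y = f(x1 + dx1, x2 + dx2) - f(x1, x2)

# Right-hand side of (4.21): partial effects plus the extra cross term
decomposition = ((b1 + b3 * x2) * dx1
                 + (b2 + b3 * x1) * dx2
                 + b3 * dx1 * dx2)

print(delta_y, decomposition)  # identical up to floating-point rounding
```

Dropping the cross term β3∆X1∆X2 from the right-hand side breaks the equality, which illustrates why (4.21) is not simply the sum of the two one-regressor effects (4.19) and (4.20).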