2/7/24

MULTIVARIATE REGRESSION I

TODAY
1. The general linear model
2. Nonlinear variables and interactions
3. Empirical example

WE SAW THE SIMPLE LINEAR MODEL UNDER THE CLASSICAL ASSUMPTIONS

$Y_i = \alpha + \beta X_i + \varepsilon_i, \quad i = 1, \dots, N$

1. $E[\varepsilon_i \mid X_i] = 0$, $i = 1, \dots, N$
2. $V(\varepsilon_i) = \text{constant} \equiv \sigma^2$, $i = 1, \dots, N$
3. $\text{Cov}(\varepsilon_i, \varepsilon_j) = 0 \ \forall \ i \neq j$
4. The $X_i$ are not random and they are not all the same

MULTIPLE LINEAR REGRESSION MODEL
• We simply add explanatory variables: $Y_i = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$
• Now the regressors are $X_1, \dots, X_k$.
• From now on, for convenience of notation, we will call the y-intercept of the model $\beta_0$. It is as if there were an explanatory variable $X_0 = 1$ for all observations.
• We will maintain all the classical assumptions except for a small modification in the second part of assumption 4.

DEFINITION OF THE MULTIPLE LINEAR REGRESSION MODEL
• "Explains variable $y$ in terms of variables $x_1, x_2, \dots, x_k$."

MOTIVATION FOR MULTIPLE REGRESSION
• Incorporate more explanatory factors into the model.
• Explicitly hold fixed other factors that otherwise would end up in the error term.
• Allow for more flexible functional forms.
• Example: wage equation.

EXAMPLE: AVERAGE TEST SCORES AND PER-STUDENT SPENDING
• Per-student spending is likely to be correlated with average family income at a given high school because of school financing.
• Omitting average family income from the regression would lead to a biased estimate of the effect of spending on average test scores.
• In a simple regression model, the effect of per-student spending would partly include the effect of family income on test scores.

INTERPRETATION OF THE PARAMETERS OF THE GENERAL LINEAR MODEL

$Y_i = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$

• With $E[\varepsilon_i] = 0$ and non-random regressors we have:
$E[Y_i] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$
• The marginal effect of $X_k$ is given by:
$\dfrac{\partial E[Y_i]}{\partial X_k} = \beta_k$

INTERPRETATION OF THE PARAMETERS OF THE GENERAL LINEAR MODEL (CONT.)
• $\beta_k$ measures the change in $E[Y_i]$ associated with a marginal change in the $k$-th explanatory variable, while keeping all other variables constant (a partial derivative).
• "Ceteris paribus" interpretation.
• The meaning of "marginal change" is tied to the units of measurement of the explanatory variable (1 cent, $1, $1 thousand, $1 million, etc.).

INTERPRETATION OF THE MULTIPLE REGRESSION MODEL
• The multiple linear regression model manages to hold the values of the other explanatory variables fixed even if, in reality, they are correlated with the explanatory variable under consideration.
• It still has to be assumed that unobserved factors do not change if the explanatory variables are changed.

EXAMPLE: DETERMINANTS OF COLLEGE GPA
• Interpretation:
• Holding ACT fixed, another point of high school grade point average is associated with another .453 points of college grade point average.
• Or: if we compare two students with the same ACT, but the hsGPA of student A is one point higher, we predict student A to have a colGPA that is .453 points higher than that of student B.
• Holding high school grade point average fixed, another 10 points on the ACT are associated with less than one point of college GPA.

LINEARITY

$Y_i = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$

Note that this model is linear in the variables and in the parameters:
• $Y$ is a linear function of the variables $X_1, X_2, \dots, X_k$ ⇒ linear model in the variables.
• $Y$ is a linear function of $\beta_0, \beta_1, \dots, \beta_k$ ⇒ linear model in the parameters.
• OLS only requires linearity in the parameters.
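To make the omitted-variable logic of the test-scores example concrete, here is a minimal simulation sketch, assuming Python with numpy and statsmodels; all variable names and effect sizes below are invented for illustration and are not from the slides.

```python
# Minimal sketch (invented numbers): family income drives both per-student
# spending and test scores, so a simple regression of scores on spending
# alone absorbs part of the income effect.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1_000
income = rng.normal(50, 10, n)                 # average family income
spending = 2.0 * income + rng.normal(0, 5, n)  # correlated with income
score = 0.5 * spending + 1.0 * income + rng.normal(0, 5, n)

# Simple regression: spending only (income omitted).
short_fit = sm.OLS(score, sm.add_constant(spending)).fit()

# Multiple regression: spending and income both included.
X = sm.add_constant(np.column_stack([spending, income]))
long_fit = sm.OLS(score, X).fit()

print(short_fit.params[1])   # biased upward: picks up the income effect
print(long_fit.params[1:])   # close to the true values (0.5, 1.0)
```

The multiple regression recovers the ceteris paribus effect of spending because income is held fixed rather than left in the error term.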
WE SAW TWO MODELS BEFORE THE MIDTERM
Variables in logarithms:
• Logarithmic (log-log) model: $\log Y_i = \beta_0 + \beta_1 \log X_i + \varepsilon \Rightarrow \beta_1$ is an elasticity.
• Semi-logarithmic (log-lin) model: $\log Y_i = \beta_0 + \beta_1 X_i + \varepsilon \Rightarrow \beta_1$ is a semi-elasticity.

EXAMPLE: CEO SALARY, SALES AND CEO TENURE
• The model assumes a constant-elasticity relationship between CEO salary and the sales of his or her firm.
• The model assumes a quadratic relationship between CEO salary and his or her tenure with the firm.
• Meaning of "linear" regression: the model has to be linear in the parameters (not in the variables).

INTERPRETATION
• Important: we always have to think about which model we are in and what the units of measurement of the variables are.
• Suppose that $Y$ and $X$ are measured in pesos, and that $\beta_1 = 0.5$:

| Model | Equation | Marginal effect |
| Linear | $Y_i = \beta_0 + \beta_1 X_i + \varepsilon$ | If $X$ increases by 1 peso, $Y$ increases by 0.50 pesos |
| Log-log | $\log Y_i = \beta_0 + \beta_1 \log X_i + \varepsilon$ | If $X$ increases by 1%, $Y$ increases by 0.50% |
| Log-lin | $\log Y_i = \beta_0 + \beta_1 X_i + \varepsilon$ | If $X$ increases by 1 peso, $Y$ increases by 50% (0.50 × 100%) |
| Lin-log | $Y_i = \beta_0 + \beta_1 \log X_i + \varepsilon$ | If $X$ increases by 1%, $Y$ increases by 0.005 pesos (0.5/100) |

WHAT DO THESE DATA SUGGEST?
[Figure]

QUADRATIC VARIABLES
• Quadratic model in $X$: $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon$
• The marginal effect of $X$ is now given by:
$\dfrac{\partial E[Y_i]}{\partial X_i} = \beta_1 + 2\beta_2 X_i$
• The marginal effect is no longer constant: it depends on $\beta_1$, $\beta_2$, and the value we assign to $X_i$.
• Not only does the magnitude of the effect depend on $X$: when $\beta_1$ and $\beta_2$ have different signs, the sign of the marginal effect also depends on the value we assign to $X_i$.
• By construction, the linear model predicts a constant marginal effect of $X$ on $Y$. Note that the marginal effect of the quadratic model changes with the value of $X$ (in the plotted example, the marginal effect is increasingly negative).

EXAMPLE: FAMILY INCOME AND FAMILY CONSUMPTION
• The model has two explanatory variables: income and income squared.
• Consumption is explained as a quadratic function of income.
• One has to be very careful when interpreting the coefficients.

PROPERTIES OF OLS ON ANY SAMPLE OF DATA
• Fitted values and residuals
• Algebraic properties of OLS regression

GOODNESS-OF-FIT
• Decomposition of total variation
• R-squared
• Alternative expression for R-squared

EXAMPLE: EXPLAINING ARREST RECORDS
• Interpretation:
• If the proportion of prior arrests increases by 0.5, the predicted fall in arrests is 7.5 arrests per 100 men.
• If the months in prison increase from 0 to 12, the predicted fall in arrests is 0.408 arrests for a particular man.
• If the quarters employed increase by 1, the predicted fall in arrests is 10.4 arrests per 100 men.

EXAMPLE: EXPLAINING ARREST RECORDS (CONT.)
• An additional explanatory variable is added.
• Interpretation:
• Average prior sentence increases the number of arrests (?).
• Limited additional explanatory power, as R-squared increases by little.
• General remark on R-squared: even if R-squared is small (as in this example), the regression may still provide good estimates of ceteris paribus effects.
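Looking back at the quadratic model above, here is a minimal sketch of how the marginal effect $\beta_1 + 2\beta_2 X$ can be evaluated after an OLS fit, again assuming Python with numpy and statsmodels; the consumption/income data-generating numbers are invented.

```python
# Sketch: fit Y = b0 + b1*X + b2*X^2 and evaluate dE[Y]/dX = b1 + 2*b2*X
# at several values of X. Data and coefficients are illustrative only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
income = rng.uniform(10, 100, 500)
consumption = 5 + 0.8 * income - 0.003 * income**2 + rng.normal(0, 2, 500)

X = sm.add_constant(np.column_stack([income, income**2]))
fit = sm.OLS(consumption, X).fit()
b0, b1, b2 = fit.params

for x in (20, 50, 80):
    print(f"marginal effect at income={x}: {b1 + 2 * b2 * x:.3f}")
```

In this invented example the estimated marginal effect shrinks as income rises, which is exactly the non-constant-effect point made on the slide.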
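And a companion sketch of the goodness-of-fit decomposition listed above: total variation (SST) splits into explained (SSE) and residual (SSR) parts, so R-squared can be computed as SSE/SST, as 1 − SSR/SST, or as the squared correlation between $y$ and its fitted values. Simulated data again; numpy and statsmodels assumed.

```python
# Sketch of the R-squared decomposition: SST = SSE + SSR.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=300)
y = 1.0 + 2.0 * x + rng.normal(size=300)

fit = sm.OLS(y, sm.add_constant(x)).fit()
y_hat, u_hat = fit.fittedvalues, fit.resid

sst = np.sum((y - y.mean()) ** 2)           # total sum of squares
sse = np.sum((y_hat - y_hat.mean()) ** 2)   # explained sum of squares
ssr = np.sum(u_hat ** 2)                    # residual sum of squares

print(sse / sst)                         # R-squared
print(1 - ssr / sst)                     # same number
print(np.corrcoef(y, y_hat)[0, 1] ** 2)  # alternative: squared corr(y, y_hat)
print(fit.rsquared)                      # matches statsmodels' own R-squared
```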
MULTIPLE REGRESSION ANALYSIS: ESTIMATION
• Standard assumptions for the multiple regression model:
• Assumption MLR.1 (Linear in parameters)
• Assumption MLR.2 (Random sampling)

ESTIMATION
• Including irrelevant variables in a regression model
• Omitting relevant variables: the simple case

STANDARD ASSUMPTIONS FOR THE MULTIPLE REGRESSION MODEL (CONT.)
• Assumption MLR.3 (No perfect collinearity): in the sample (and therefore in the population), none of the independent variables is constant and there are no exact linear relationships among the independent variables.
• Remarks on MLR.3:
• The assumption only rules out perfect collinearity/correlation between explanatory variables; imperfect correlation is allowed.
• If an explanatory variable is a perfect linear combination of the other explanatory variables, it is superfluous and may be eliminated.
• Constant variables are also ruled out (they are collinear with the intercept).

STANDARD ASSUMPTIONS FOR THE MULTIPLE REGRESSION MODEL (CONT.)
• Assumption MLR.4 (Zero conditional mean)
• In a multiple regression model, the zero conditional mean assumption is much more likely to hold, because fewer things end up in the error.
• Example: average test scores.

NON-PERFECT MULTICOLLINEARITY IN THE GENERAL LINEAR MODEL
• MLR.3 requires that there be no linear dependence among the explanatory variables.
• In other words, none of the explanatory variables can be expressed as a linear combination of the others.
• That is, there cannot be constants $\lambda_j$, not all equal to zero, such that $X_k = \sum_{j \neq k} \lambda_j X_j$.
• Non-perfect multicollinearity ≠ no correlation between the explanatory variables:
• There can be correlation, as long as it is not perfect.
• It is not enough that the correlation between pairs of variables is not perfect: an exact linear relationship can involve several variables at once.
• None of the variables $X_1, \dots, X_k$ can be a constant. Why?

MORE EXAMPLES
• Example of perfect collinearity: small sample.
• Example of perfect collinearity: relationships between regressors.
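As a small numeric sketch of these collinearity examples (invented data, Python with numpy assumed): perfect collinearity means the design matrix loses full column rank, which is exactly what MLR.3 rules out.

```python
# Invented-data sketch: with an exactly collinear regressor, the design
# matrix is rank-deficient and OLS coefficients are not identified.
import numpy as np

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 2 * x1 + x2                       # exact linear combination of x1, x2

X_ok = np.column_stack([np.ones(n), x1, x2])
X_bad = np.column_stack([np.ones(n), x1, x2, x3])
print(np.linalg.matrix_rank(X_ok))     # 3: full column rank
print(np.linalg.matrix_rank(X_bad))    # 3 < 4 columns: perfect collinearity

# A full set of dummies plus the intercept (the "dummy trap") is
# collinear in the same way, since d + (1 - d) equals the constant.
d = rng.integers(0, 2, n)
X_trap = np.column_stack([np.ones(n), d, 1 - d])
print(np.linalg.matrix_rank(X_trap))   # 2 < 3 columns
```

Typical regression software reacts to this by dropping one of the offending columns or reporting a singular design matrix.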
HOW DO RESEARCHERS USE MVR?
• To interpret differences as causal, we are always looking for an "all other things equal" comparison.
• Ideally, everything should be equivalent across observations except for the variable of interest.
• This is why RCTs and natural experiments were so useful.
• But we can't always answer questions of interest with experiments.

HOW DO RESEARCHERS USE MVR? (CONT.)
• In multivariate regression, coefficients are interpreted as the change in $Y$ associated with a one-unit change in $X$, holding all other right-hand-side variables constant.
• In some cases, it may be possible to control for the right RHS variables to obtain a plausibly all-other-things-equal comparison.
• "Regression-based causal inference is predicated on the assumption that when key observed variables have been made equal across treatment and control groups, selection bias from the things we can't see is also mostly eliminated" (MM).
• Caution: while this assumption may be satisfied in some cases, it is not in many others!

DALE AND KRUEGER (2002)
• Question: how do returns to college differ for public vs. private colleges?
• Why not just compare earnings for people who went to public vs. private colleges?
• Ideal experiment: randomly assign students to colleges. Feasible? Ethical?
• We can't do the ideal experiment, so we have to find another way!

DALE AND KRUEGER (2002)
• Dale and Krueger's idea: students who applied to the same set of schools and had the same acceptances/rejections but attend different schools may be similar enough for an all-else-equal comparison.
• "Many decisions and choices, including those related to college attendance, involve a certain amount of serendipitous variation generated by financial considerations, personal circumstances, and timing. Serendipity can be exploited in a sample of applicants on the cusp, who could easily go one way or the other."

DALE AND KRUEGER (2002)
• Data: College and Beyond (C&B).
• Includes more than 14,000 students who enrolled in a group of moderately to highly selective U.S. colleges (enrolled 1976):
• Prestigious private schools like UPenn, Princeton, Yale.
• Smaller private schools like Swarthmore, Williams, Oberlin.
• Four public universities: Michigan, UNC, Penn State, Miami of Ohio.
• Survey data collected when these students took the SAT (1975); follow-up data collected long after most completed college (1996).

DALE AND KRUEGER (2002)
Model:

$\ln Y_i = \alpha + \beta P_i + \sum_{j=1}^{150} \gamma_j \, GROUP_{ij} + \delta_1 SAT_i + \delta_2 \ln PI_i + e_i$

• $Y_i$ is earnings for individual $i$ in 1995.
• $P_i = 1$ if individual $i$ attended a private school, $= 0$ if a public one.
• $GROUP_{ij} = 1$ if individual $i$ is in college application/acceptance group $j$, $= 0$ otherwise (together called selectivity-group fixed effects).
• $SAT_i$ is individual $i$'s SAT score.
• $PI_i$ is parental income for individual $i$.
• The model also includes some other controls not written out here.
• What is the coefficient of interest? How will we interpret it?

DALE AND KRUEGER (2002)
• For causality, we need an all-other-things-equal comparison.
• Experiments are the ideal, but not always feasible.
• Clever multivariate regressions that control for the right things may be an alternative path.
• Dale and Krueger (2002) compare students who applied to and were accepted by a set of schools with the same selectivity rankings but attended private vs. public schools, and find no estimated earnings gain from attending a private school (controlling for selectivity-group fixed effects).

QUESTIONS?
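To fix ideas, here is a minimal sketch of a regression in the spirit of the Dale and Krueger specification above, assuming Python with pandas and statsmodels. The data, group structure, and magnitudes are all simulated and hypothetical, not Dale and Krueger's; the point is only to show how selectivity-group fixed effects enter as dummy variables.

```python
# Hypothetical simulation sketch of a selectivity-group fixed-effects
# regression: log earnings on a private-school dummy, group dummies,
# SAT score, and log parental income. Not Dale and Krueger's data or code.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 5_000
df = pd.DataFrame({
    "private": rng.integers(0, 2, n),          # P_i
    "group": rng.integers(0, 150, n),          # selectivity group j
    "sat": rng.normal(1200, 100, n),           # SAT_i
    "log_parent_inc": rng.normal(11, 0.5, n),  # ln(PI_i)
})
group_effect = rng.normal(0.0, 0.2, 150)       # gamma_j, invented
df["log_earn"] = (
    10.0
    + 0.0 * df["private"]                      # true private effect set to 0
    + group_effect[df["group"]]
    + 0.0005 * df["sat"]
    + 0.10 * df["log_parent_inc"]
    + rng.normal(0.0, 0.3, n)
)

# C(group) expands the selectivity groups into fixed-effect dummies.
fit = smf.ols("log_earn ~ private + C(group) + sat + log_parent_inc",
              data=df).fit()
print(fit.params["private"])  # estimate of beta; ~0 in this simulation
```

With the group dummies included, $\beta$ is identified only from variation in private-school attendance within application/acceptance groups, which is the sense in which the comparison is "all else equal".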