Econ 140
Multiple Regression Applications II & III
Lecture 17

Today’s Plan
• Two topics and how they relate to multiple regression
  – Multicollinearity
  – Dummy variables

Multicollinearity
• Suppose we have the following regression equation: $Y = a + b_1 X_1 + b_2 X_2 + e$
• Multicollinearity occurs when some or all of the independent X variables are linearly related
• Different forms of multicollinearity:
  – Perfect: OLS estimation will not work
  – Non-perfect: comes out of applied work

Multicollinearity Example
• Again we’ll use returns to education where:
  – the dependent variable Y is (log) wages
  – the independent variables (X’s) are age, experience, and years of schooling
• Experience is defined as years in the labor force, or the difference between age and years of schooling
  – this can be written: Experience = Age - Years of school
  – What’s the problem with this?

Multicollinearity Example (2)
• Note that we’ve expressed experience as the difference of two of our other independent variables
  – by constructing experience in this manner we create a collinear dependence between age and experience
  – the relationship between age and experience is a linear relationship such that: as age increases, for given years of schooling, experience also increases
• We can write our regression equation for this example: $Wages = a + b_1 Experience + b_2 Age + e$

Multicollinearity Example (3)
• Recall that our estimate for b1 is
  $$\hat{b}_1 = \frac{\sum x_1 y \sum x_2^2 - \sum x_1 x_2 \sum x_2 y}{\sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2}$$
  where x1 = experience and x2 = age
• The problem is that x1 and x2 are linearly related
  – as we get closer to perfect linearity, the denominator will go to zero
  – OLS won’t work!
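
The vanishing denominator can be checked numerically. The sketch below is illustrative only: the ages and years of schooling are made-up values (not taken from the course spreadsheets), and it assumes the lower-case x's in the slope formula are deviations from sample means. When schooling is the same for everyone, experience is an exact linear function of age and the denominator is zero.

```python
def deviations(values):
    """Return each value minus the sample mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def slope_denominator(x1, x2):
    """Denominator of the two-regressor OLS slope formula:
    sum(x1^2) * sum(x2^2) - (sum(x1 * x2))^2, with x1, x2 in deviation form."""
    d1, d2 = deviations(x1), deviations(x2)
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    return s11 * s22 - s12 ** 2


# Hypothetical data, for illustration only.
age = [25, 32, 41, 47, 53, 60]
schooling_varied = [12, 16, 10, 14, 18, 12]
schooling_constant = [12] * len(age)

# Experience = Age - Years of school, as defined on the slides.
experience_varied = [a - s for a, s in zip(age, schooling_varied)]
experience_collinear = [a - s for a, s in zip(age, schooling_constant)]

denominator_varied = slope_denominator(experience_varied, age)
denominator_collinear = slope_denominator(experience_collinear, age)

print("schooling varies:  ", denominator_varied)
print("schooling constant:", denominator_collinear)
```

With varying schooling the denominator is positive, so the slope can be computed; with constant schooling it is zero and the formula breaks down.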

Multicollinearity Example (4)
• Recall that the estimated variance for $\hat{b}_1$ is:
  $$\hat{\sigma}^2_{\hat{b}_1} = \hat{\sigma}^2_{YX} \, \frac{\sum x_2^2}{\sum x_1^2 \sum x_2^2 - \left(\sum x_1 x_2\right)^2}$$
  – So as x1 and x2 approach perfect collinearity, the denominator will go to zero and the estimated variance of $\hat{b}_1$ will increase
• Implications:
  – with multicollinearity, you will get large standard errors on partial coefficients
  – your t-ratios, given the null hypothesis that the value of the coefficient is zero, will be small

More Multicollinearity Examples
• On L16_1.xls we have individual data on age, years of education, weekly earnings, school age, and experience
  – we can perform a regression to calculate returns given age and experience
  – we can also estimate bivariate models including only age, only experience, and only years of schooling
  – we expect that the problem is that experience is related to age (to test this, we can regress age on experience)
    • if the slope coefficient on experience is 1 and the fit is exact (R2 = 1), there is perfect multicollinearity

More Multicollinearity Examples (2)
• On L16_2.xls there’s a made-up example of perfect multicollinearity
  – OLS is unable to calculate the slope coefficients
  – calculating the products and cross-products, we find that the denominator for the slope coefficients is zero, as predicted
  – In an applied problem with non-perfect multicollinearity we see these properties:
    1) OLS is still unbiased
    2) Large variances and standard errors, which make hypothesis testing difficult
    3) Few significant coefficients but a high R2

More Multicollinearity Examples (3)
• What to do with L16_2.xls?
  – There’s simply not enough variation
  – We can collect more data or rethink the model
  – We can test for partial correlations between the X variables (as demonstrated on L16_1.xls)
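
The check of regressing age on experience can be sketched in a few lines. This is an illustration under stated assumptions: the data are hypothetical (not the contents of L16_1.xls), and `regress` is a helper written for this example. An exact fit (R2 = 1) signals perfect multicollinearity, while an R2 close to 1 signals the non-perfect kind.

```python
def regress(y, x):
    """Bivariate OLS of y on x; returns (intercept, slope, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


# Hypothetical data, for illustration only.
age = [25, 32, 41, 47, 53, 60]
experience_collinear = [a - 12 for a in age]  # everyone has 12 years of school
experience_varied = [a - s for a, s in zip(age, [12, 16, 10, 14, 18, 12])]

# Regress age on experience, as suggested for L16_1.xls.
_, slope_c, r2_c = regress(age, experience_collinear)
_, slope_v, r2_v = regress(age, experience_varied)

print("constant schooling: slope =", slope_c, " R2 =", r2_c)
print("varied schooling:   slope =", slope_v, " R2 =", r2_v)
```

In the constant-schooling case age is an exact linear function of experience, so including both regressors makes OLS fail; in the varied case the fit is strong but not exact, which is the situation that produces large standard errors.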

Dummy variables
• Dummy variables allow you to include qualitative variables (or variables that otherwise cannot be quantified) in your regression
  – examples include: gender, race, marital status, and religion
  – they also become important when looking at “regime shifts”, which may be new policy initiatives, economic change, or seasonality
• We will look at some examples:
  – using female as a qualitative variable
  – using marital status as a qualitative variable
  – using the Phillips curve to demonstrate a regime shift

Qualitative example: female
• We’ll construct a dummy variable for i = 1, …, n:
    Di = 0 if not female
    Di = 1 if female
  – We can do this with any qualitative variable
  – Note: assigning the values for the dummy variable is an arbitrary choice
• On L17_1.xls there is a sample from the current CPS
  – to create the dummy variable “female” we assign the value one to the CPS sex code of two and the value zero to the CPS sex code of one
  – we can include the dummy variable in the regression equation like we would any other variable

Qualitative example: female (2)
• We estimate the following equation:
  $\hat{Y}_i = 5.975 - 0.485 D_i$
• Now we can ask: what are the expected earnings given that a person is male?
  $E(Y_i \mid D_i = 0) = a + b(0) = a = 5.975$
• Similarly, what are the expected earnings given that a person is female?
  $E(Y_i \mid D_i = 1) = a + b(1) = a + b = 5.975 - 0.485 = 5.490$

Qualitative example: female (4)
• We can use other variables to extend our analysis
• For example, we can include age to get the equation: $Y = a + b_1 D_i + b_2 X_i + e$
  – where Xi can be any or all relevant variables
  – the coefficient b1 on Di indicates how much less, on average, females earn than males
  – for males the intercept will be $\hat{a}$
  – for females the intercept will be $\hat{a} + \hat{b}_1$

Qualitative example: female (5)
• The estimated regression found on the spreadsheet is
  $\hat{Y}_i = 5.085 - 0.656 D_i + 0.023 X_i$
• The expected weekly earnings for men are:
  $E(Y_i \mid D_i = 0) = a + b_2 X_i$
• The expected weekly earnings for women are:
  $E(Y_i \mid D_i = 1) = (a + b_1) + b_2 X_i$

Qualitative example: female (6)
• An important note: we can’t include dummy variables for both male and female in the same regression equation
  – suppose we have $Y = a + b_1 D_{1i} + b_2 D_{2i} + e$
  – where:
    D1i = 0 if male, D1i = 1 if female
    D2i = 0 if female, D2i = 1 if male
  – OLS won’t be able to estimate the regression coefficients because D1i and D2i sum to one for every observation, so together they show perfect multicollinearity with the intercept a
• So if a qualitative variable has m categories, you should include (m - 1) dummy variables for it in the regression equation

Example: marital status
• The spreadsheet (L17_1.xls) also estimates the following regression equation using two distinct dummy variables:
  $Y = a + b_1 D_{1i} + b_2 D_{2i} + b_3 X_i + e$
  – where:
    D1i = 0 if male, D1i = 1 if female
    D2i = 0 if other, D2i = 1 if married
• Using the regression equation we can create four categories: married males, unmarried males, married females, and unmarried females

Example: marital status (2)
• Expected earnings for unmarried males:
  $E(Y_i \mid D_{1i} = 0, D_{2i} = 0) = a + b_3 X_i$
• Expected earnings for unmarried females:
  $E(Y_i \mid D_{1i} = 1, D_{2i} = 0) = (a + b_1) + b_3 X_i$
• Expected earnings for married males:
  $E(Y_i \mid D_{1i} = 0, D_{2i} = 1) = (a + b_2) + b_3 X_i$
• Expected earnings for married females:
  $E(Y_i \mid D_{1i} = 1, D_{2i} = 1) = (a + b_1 + b_2) + b_3 X_i$

Interactive terms
• So far we’ve only used dummy variables to change the intercept
• We can also use dummy variables to alter the partial slope coefficients
• Let’s think about this model: $W_i = a + b_1 Age_i + b_2 Married_i + e_i$
  – we could argue that $\hat{b}_1$, $\hat{b}_2$ and $\hat{a}$ would be different for males and females
  – we want to think about two sub-sample groups: males and females
  – we can test the hypothesis that the partial slope coefficients will be different for these 2 groups

Interactive terms (2)
• To test our hypothesis we’ll estimate the regression equation for the whole sample and then for the two sub-sample groups
• We test to see if our estimated coefficients are the same between males and females
• Our null hypothesis is:
  $H_0: a_M = a_F,\ b_{1M} = b_{1F},\ b_{2M} = b_{2F}$

Interactive terms (3)
• We have an unrestricted form and a restricted form
  – unrestricted: used when we estimate for the sub-sample groups separately
  – restricted: used when we estimate for the whole sample
• What type of statistic will we use to carry out this test?
  – F-statistic:
    $$F = \frac{(SSR_R - SSR_U)/q}{SSR_U/(n_1 - k + n_2 - k)}$$
    where q = k, the number of parameters in the model, and n = n1 + n2, where n is the complete sample size

Interactive terms (4)
• The sum of squared residuals for the unrestricted form will be: $SSR_U = SSR_M + SSR_F$
• L17_2.xls
  – the data is sorted according to the dummy variable “female”
  – there is a second dummy variable for marital status
  – there are 3 estimated regression equations, one each for the total sample, male sub-sample, and female sub-sample

Interactive terms (5)
• The output allows us to gather the necessary sums of squared residuals and sample sizes to construct the estimate:
  $$F^* = \frac{(SSR_R - SSR_U)/q}{SSR_U/(n_1 - k + n_2 - k)} = \frac{[16.261 - (7.495 + 5.093)]/3}{(7.495 + 5.093)/(33 - 6)} = \frac{1.224}{0.466} = 2.626$$
  – Since $F_{0.05,3,27} = 2.96 > F^*$ we cannot reject the null hypothesis that the partial slope coefficients are the same for males and females

Interactive terms (6)
• What if $F^* > F_{0.05,3,27}$? How to read the results?
  – There’s a difference between the two sub-samples and therefore we should estimate the wage equations separately
  – Or we could interact the dummy variables with the other variables
• To interact the dummy variable with the age and marital status variables, we multiply the dummy variable by each of them to get:
  $W_i = a + b_1 Age_i + b_2 Married_i + b_3 D_i + b_4 (D_i \cdot Age_i) + b_5 (D_i \cdot Married_i) + e_i$

Interactive terms (7)
• Using L17_2.xls you can construct the interactive terms by multiplying the FEMALE column by the AGE and MARRIED columns
  – one way to see if the two sub-samples are different is to look at the t-ratios on the interactive terms
  – in this example, neither of the t-ratios is statistically significant, so we can’t reject the null hypothesis
• We now know how to use dummy variables to indicate the importance of sub-sample groups within the data
  – dummy variables are also useful for testing for structural breaks or regime shifts

Interactive terms (8)
• If we want to estimate the equation for the first sub-sample (males) we take the expectation of the wage equation where the dummy variable for female takes the value of zero:
  $E(W_i \mid D_i = 0) = a + b_1 Age_i + b_2 Married_i$
• We can do the same for the second sub-sample (females):
  $E(W_i \mid D_i = 1) = (a + b_3) + (b_1 + b_4) Age_i + (b_2 + b_5) Married_i$
• We can see that by using only one regression equation, we have allowed the intercept and partial slope coefficients to vary by sub-sample

Phillips Curve example
• Phillips curve as an example of a regime shift.
• Data points from 1950 - 1970: there is a downward sloping, reciprocal relationship between wage inflation and unemployment
  [Figure: wage inflation (W) plotted against unemployment (UN), 1950 - 1970]

Phillips Curve example (2)
• But if we look at data points from 1971 - 1996:
  [Figure: wage inflation (W) plotted against unemployment (UN), 1971 - 1996]
• From the data we can detect an upward sloping relationship

Phillips Curve example (3)
• There seems to be a regime shift between the two periods
  – note: this is an arbitrary choice of regime shift; it was not dictated by a specific change
• We will use the Chow Test (F-test) to test for this regime shift
  – the test will use a restricted form:
    $W_t = a + b \frac{1}{UN}$
  – it will also use an unrestricted form:
    $W_t = a + b_1 D + b_2 \frac{1}{UN} + b_3 D \frac{1}{UN}$
  – D is the dummy variable for the regime shift, equal to 0 for 1950-1970 and 1 for 1971-1996

Phillips Curve example (4)
• L17_3.xls estimates the restricted regression equations and calculates the F-statistic for the Chow Test
• The null hypothesis will be: $H_0: b_1 = b_3 = 0$
  – we are testing to see if the dummy variable for the regime shift alters the intercept or the slope coefficient
• The F-statistic is (* indicates restricted):
  $$F = \frac{\left(\sum \hat{e}^{*2} - \sum \hat{e}^2\right)/q}{\sum \hat{e}^2/(n - k)}$$
  where q = 2

Phillips Curve example (5)
• The expectation of wage inflation for the first time period:
  $E(W \mid D = 0) = a + b_2 \frac{1}{UN}$
• The expectation of wage inflation for the second time period:
  $E(W \mid D = 1) = (a + b_1) + (b_2 + b_3) \frac{1}{UN}$
• You can use the spreadsheet data to carry out these calculations

What we’ve learned
• Multicollinearity
  – linear relationship between independent variables
  – examples
• Dummy variables
  – way to include qualitative variables in regressions
  – examples
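
The F-statistic used for the male/female comparison and for the Chow Test has the same form in both cases, so it can be sketched as one small function. The function name and argument names below are made up for this illustration. The numbers plugged in are the sums of squared residuals and sample size reported on the "Interactive terms (5)" slide, and 2.96 is the 5% critical value quoted there.

```python
def chow_f_statistic(ssr_restricted, ssr_unrestricted, q, df_unrestricted):
    """F = [(SSR_R - SSR_U) / q] / [SSR_U / df_unrestricted].

    q is the number of restrictions; df_unrestricted is the residual
    degrees of freedom of the unrestricted form.
    """
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / df_unrestricted
    return numerator / denominator


# Values from the male/female wage example on the slides.
ssr_r = 16.261            # whole sample (restricted form)
ssr_u = 7.495 + 5.093     # two sub-sample regressions (unrestricted form)
k = 3                     # parameters per equation: a, b1, b2
n = 33                    # complete sample size, n1 + n2

f_star = chow_f_statistic(ssr_r, ssr_u, q=k, df_unrestricted=n - 2 * k)

critical_value = 2.96     # F(0.05; 3, 27) as quoted on the slide
reject_null = f_star > critical_value

print(round(f_star, 3), reject_null)
```

For the Phillips curve test the same function applies with q = 2 and df_unrestricted = n - k, where k = 4 counts a, b1, b2 and b3 in the unrestricted form; the sums of squared residuals would come from L17_3.xls.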