Econ 140 Multiple Regression Applications III Lecture 18

Dummy variables
• Include qualitative indicators in the regression: e.g. gender, race, regime shifts.
• So far, we have only seen the change in the intercept for the regression line.
• Suppose now we wish to investigate whether the slope changes as well as the intercept.
• This can be written as a general equation:
  $W_i = a + b_1 \text{Age}_i + b_2 \text{Married}_i + b_3 D_i + b_4 (D_i \cdot \text{Age}_i) + b_5 (D_i \cdot \text{Married}_i) + e_i$
• Suppose first we wish to test for the difference between males and females.

Interactive terms
• For females and males separately, the model would be:
  $W_i = a + b_1 \text{Age}_i + b_2 \text{Married}_i + e_i$
  – in so doing we argue that $\hat{b}_1$, $\hat{b}_2$ and $\hat{a}$ would be different for males and females
  – we want to think about two sub-sample groups: males and females
  – we can test the hypothesis that the intercept and partial slope coefficients will be different for these 2 groups

Interactive terms (2)
• To test our hypothesis we’ll estimate the regression equation above ($W_i = a + b_1 \text{Age}_i + b_2 \text{Married}_i + e_i$) for the whole sample and then for the two sub-sample groups
• We test to see if our estimated coefficients are the same between males and females
• Our null hypothesis is:
  $H_0: a_M = a_F,\; b_{1M} = b_{1F},\; b_{2M} = b_{2F}$

Interactive terms (3)
• We have an unrestricted form and a restricted form
  – unrestricted: used when we estimate for the sub-sample groups separately
  – restricted: used when we estimate for the whole sample
• What type of statistic will we use to carry out this test?
  – F-statistic:
    $F = \dfrac{(SSR_R - SSR_U)/q}{SSR_U/(n_1 - k + n_2 - k)}$
    where q = k, the number of parameters in the model, and $n = n_1 + n_2$ is the complete sample size

Interactive terms (4)
• The sum of squared residuals for the unrestricted form will be: $SSR_U = SSR_M + SSR_F$
• L17_2.xls
  – the data are sorted according to the dummy variable “female”
  – there is a second dummy variable for marital status
  – there are 3 estimated regression equations, one each for the total sample, male sub-sample, and female sub-sample

Interactive terms (5)
• The output allows us to gather the necessary sums of squared residuals and sample sizes to construct the test statistic:
  $F^* = \dfrac{[16.261 - (7.495 + 5.093)]/3}{(7.495 + 5.093)/(33 - 6)} = \dfrac{1.224}{0.466} = 2.626$
  – Since $F_{0.05,3,27} = 2.96 > F^*$ we cannot reject the null hypothesis that the intercept and partial slope coefficients are the same for males and females

Irene O. Wong:

Interactive terms (6)
• What if $F^* > F_{0.05,3,27}$? How do we read the results?
  – There’s a difference between the two sub-samples and therefore we should estimate the wage equations separately
  – Or we could interact the dummy variable with the other variables
• To interact the dummy variable with the age and marital status variables, we multiply the dummy variable by each of them to get:
  $W_i = a + b_1 \text{Age}_i + b_2 \text{Married}_i + b_3 D_i + b_4 (D_i \cdot \text{Age}_i) + b_5 (D_i \cdot \text{Married}_i) + e_i$

Interactive terms (7)
• Using L17_2.xls you can construct the interactive terms by multiplying the FEMALE column by the AGE and MARRIED columns
  – one way to see if the two sub-samples are different is to look at the t-ratios on the interactive terms
  – in this example, neither of the t-ratios is statistically significant, so we can’t reject the null hypothesis that the coefficients on the interactive terms are zero

Interactive terms (8)
• If we want to estimate the equation for the first sub-sample (males) we take the expectation of the wage equation where the dummy variable for female takes the value of zero:
  $E(W_i \mid D_i = 0) = a + b_1 \text{Age}_i + b_2 \text{Married}_i$
• We can do the same for the second sub-sample (females):
  $E(W_i \mid D_i = 1) = (a + b_3) + (b_1 + b_4) \text{Age}_i + (b_2 + b_5) \text{Married}_i$
• We can see that by using only one regression equation, we have allowed the intercept and partial slope coefficients to vary by sub-sample

Phillips Curve example
• Phillips curve as an example of a regime shift.
• Data points from 1950 - 1970: There is a downward sloping, reciprocal relationship between wage inflation and unemployment
  [Figure: wage inflation (W) plotted against unemployment (UN), 1950 - 1970]

Phillips Curve example (2)
• But if we look at data points from 1971 - 1996:
  [Figure: wage inflation (W) plotted against unemployment (UN), 1971 - 1996]
• From the data we can detect an upward sloping relationship
• ALWAYS graph the data between the 2 main variables of interest

Phillips Curve example (3)
• There seems to be a regime shift between the two periods
  – note: this is an arbitrary choice of regime shift - it was not dictated by a specific change
• We will use the Chow Test (F-test) to test for this regime shift
  – the test will use a restricted form: $W_t = a + b \dfrac{1}{UN_t}$
  – it will also use an unrestricted form: $W_t = a + b_1 D + b_2 \dfrac{1}{UN_t} + b_3 D \dfrac{1}{UN_t}$
  – D is the dummy variable for the regime shift, equal to 0 for 1950-1970 and 1 for 1971-1996

Phillips Curve example (4)
• L17_3.xls estimates the restricted regression equations and calculates the F-statistic for the Chow Test
• The null hypothesis will be: $H_0: b_1 = b_3 = 0$
  – we are testing to see if the dummy variable for the regime shift alters the intercept or the slope coefficient
• The F-statistic is (* indicates restricted):
  $F = \dfrac{\left(\sum \hat{e}^{*2} - \sum \hat{e}^{2}\right)/q}{\sum \hat{e}^{2}/(n - k)}$, where q = 2

Phillips Curve example (5)
• The expectation of wage inflation for the first time period:
  $E(W_t \mid D = 0) = a + b_2 \dfrac{1}{UN_t}$
• The expectation of wage inflation for the second time period:
  $E(W_t \mid D = 1) = (a + b_1) + (b_2 + b_3) \dfrac{1}{UN_t}$
• You can use the spreadsheet data to carry out these calculations

Econ 140 Relaxing Assumptions Lecture 18

Today’s Plan
• A review of what we have learned in regression so far and a look forward to what will happen when we relax assumptions around the regression line
• Introduction to new concepts:
  – Heteroskedasticity
  – Serial correlation (also known as autocorrelation)
  – Non-independence of independent variables

CLRM Revision
• Calculating the linear
regression model (using OLS)
• Use of the sum of squared residuals: calculate the variance for the regression line and the mean squared deviation
• Hypothesis tests: t-tests, F-tests, $\chi^2$ test.
• Coefficient of determination ($R^2$) and the adjustment.
• Modeling: use of log-linear, logs, reciprocal.
• Relationship between F and $R^2$
• Imposing linear restrictions: e.g. $H_0: b_2 = b_3 = 0$ (q = 2); $H_0: a + b = 1$.
• Dummy variables and interactions; Chow test.

Relaxing assumptions
• What are the assumptions we have used throughout?
• Two assumptions about the population for the bi-variate case:
  1. $E(Y \mid X) = a + bX$ (the conditional expectation function is linear);
  2. $V(Y \mid X) = \sigma^2$ (conditional variances are constant)
• Assumptions concerning the sampling procedure (i = 1..n):
  1. Values of $X_i$ (not all equal) are prespecified;
  2. $Y_i$ is drawn from the subpopulation having $X = X_i$;
  3. The $Y_i$’s are independent.
• Consequences are:
  1. $E(Y_i) = a + bX_i$;
  2. $V(Y_i) = \sigma^2$;
  3. $C(Y_h, Y_i) = 0$
  – How can we test to see if these assumptions don’t hold?
  – What can we do if the assumptions don’t hold?

Homoskedasticity
• We would like our estimates to be BLUE (best linear unbiased estimators)
• We need to look out for three potential violations of the CLRM assumptions: heteroskedasticity, autocorrelation, and non-independence of X (or simultaneity bias).
• Heteroskedasticity: usually found in cross-section data (and longitudinal data)
• In earlier lectures, we saw that the variance of $\hat{b}$ is
  $V(\hat{b}) = \dfrac{\sigma^2}{\sum x^2}$
  – This is an example of homoskedasticity, where the variance is constant

Homoskedasticity (2)
• Homoskedasticity can be illustrated like this:
  [Figure: Y plotted against X, with constant variance around the regression line at X1, X2, X3]

Heteroskedasticity
• But, we don’t always have constant variance $\sigma^2$
  – We may have a variance that varies with each observation, or $\sigma_i^2$
• When there is heteroskedasticity, the variance around the regression line varies with the values of X

Heteroskedasticity (2)
• The non-constant variance around the regression line can be drawn like this:
  [Figure: Y plotted against X, with non-constant variance around the regression line at X1, X2, X3]

Serial (auto) correlation
• Serial correlation can be found in time series data (and longitudinal data)
• Under serial correlation, we have covariance terms
  $V(\hat{b}) = \sum_i c_i^2 \sigma^2 + \sum_h \sum_{i \neq h} c_h c_i \sigma_{hi}$
  – where $Y_i$ and $Y_h$ are correlated, or each $Y_i$ is not independently drawn
  – This results in nonzero covariance terms

Serial (auto) correlation (2)
• Example: We can think of this using time series data such that unemployment at time t is related to unemployment in the previous time period t-1
• If we have a model with unemployment as the dependent variable $Y_t$ then
  – $Y_t$ and $Y_{t-1}$ are related
  – $e_t$ and $e_{t-1}$ are also related

Non-independence
• The non-independence of independent variables is the third violation of the ordinary least squares assumptions
• Remember from the OLS derivation that we minimized the sum of the squared residuals $\sum e^2$ given that $C(X, e) = 0$
  – we needed independence between the X variable and the error term
  – if not, the values of X are not pre-specified
  – without independence, the estimates are biased

Summary
• Heteroskedasticity and serial correlation
  – make the estimates inefficient
  – therefore make the estimated
standard errors incorrect
• Non-independence of independent variables
  – makes estimates biased
  – instrumental variables and simultaneous equations are used to deal with this third type of violation
• Starting next lecture we’ll take a more in-depth look at the three violations of the CLRM assumptions
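
Worked check of the Chow test arithmetic
• As a check on the F-statistic from the male/female wage example earlier in this lecture, the calculation can be sketched in a few lines of Python. This is only an illustrative sketch: the function name chow_f and its argument names are hypothetical and not part of the course spreadsheets. The sums of squared residuals, the sample size (33), the number of parameters (3) and the critical value (2.96) are the figures quoted in the lecture.

```python
def chow_f(ssr_restricted, ssr_group1, ssr_group2, n, k):
    """Chow-test F statistic built from sums of squared residuals.

    ssr_restricted: SSR from the whole-sample (restricted) regression
    ssr_group1, ssr_group2: SSRs from the two sub-sample regressions
    n: complete sample size (n1 + n2)
    k: number of parameters in the model (also the number of restrictions q)
    """
    # Unrestricted SSR is the sum of the two sub-sample SSRs
    ssr_unrestricted = ssr_group1 + ssr_group2
    q = k
    numerator = (ssr_restricted - ssr_unrestricted) / q
    # Degrees of freedom: (n1 - k) + (n2 - k) = n - 2k
    denominator = ssr_unrestricted / (n - 2 * k)
    return numerator / denominator


# Figures quoted in the lecture for the wage example (hypothetical helper above)
f_star = chow_f(16.261, 7.495, 5.093, n=33, k=3)
f_critical = 2.96  # F(0.05; 3, 27) as quoted in the lecture

print(f"F* = {f_star:.3f}")
print("reject H0" if f_star > f_critical else "cannot reject H0")
```

• With the lecture's figures this reproduces F* of about 2.626, which is below the quoted critical value, so the null hypothesis is not rejected.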