Lecture 18

Econ 140: Multiple Regression Applications III

Dummy variables
• Include qualitative indicators in the regression: e.g. gender, race, regime shifts.
• So far, we have only seen a dummy variable change the intercept of the regression line.
• Suppose now we wish to investigate whether the slope changes as well as the intercept.
• This can be written as a general equation:
  $W_i = a + b_1 \text{Age}_i + b_2 \text{Married}_i + b_3 D_i + b_4 (D_i \cdot \text{Age}_i) + b_5 (D_i \cdot \text{Married}_i) + e_i$
• Suppose first we wish to test for the difference between males and females.

Interactive terms
• For females and males separately, the model would be:
  $W_i = a + b_1 \text{Age}_i + b_2 \text{Married}_i + e_i$
  – in so doing we argue that $\hat{b}_1$, $\hat{b}_2$ and $\hat{a}$ would be different for males and females
  – we want to think about two sub-sample groups: males and females
  – we can test the hypothesis that the intercept and partial slope coefficients differ between these 2 groups

Interactive terms (2)
• To test our hypothesis we'll estimate the regression equation above ($W_i = a + b_1 \text{Age}_i + b_2 \text{Married}_i + e_i$) for the whole sample and then for the two sub-sample groups
• We test to see if our estimated coefficients are the same between males and females
• Our null hypothesis is:
  $H_0: a_M = a_F,\; b_{1M} = b_{1F},\; b_{2M} = b_{2F}$

Interactive terms (3)
• We have an unrestricted form and a restricted form
  – unrestricted: used when we estimate for the sub-sample groups separately
  – restricted: used when we estimate for the whole sample
• What type of statistic will we use to carry out this test?
  – F-statistic:
    $F = \dfrac{(SSR_R - SSR_U)/q}{SSR_U / [(n_1 - k) + (n_2 - k)]}$
    where q = k, the number of parameters in the model, and n = n_1 + n_2 is the complete sample size

Interactive terms (4)
• The sum of squared residuals for the unrestricted form will be: $SSR_U = SSR_M + SSR_F$
• L17_2.xls
  – the data is sorted according to the dummy variable "female"
  – there is a second dummy variable for marital status
  – there are 3 estimated regression equations, one each for the total sample, the male sub-sample, and the female sub-sample

Interactive terms (5)
• The output allows us to gather the necessary sums of squared residuals and sample sizes to construct the test statistic:
  $F^* = \dfrac{[16.261 - (7.495 + 5.093)]/3}{(7.495 + 5.093)/(33 - 6)} = \dfrac{1.224}{0.466} = 2.626$
  – Since $F_{0.05,3,27} = 2.96 > F^*$, we cannot reject the null hypothesis that the intercept and partial slope coefficients are the same for males and females

Interactive terms (6)
• What if $F^* > F_{0.05,3,27}$? How to read the results?
  – There's a difference between the two sub-samples and therefore we should estimate the wage equations separately
  – Or we could interact the dummy variable with the other variables
• To interact the dummy variable with the age and marital status variables, we multiply the dummy variable by each of them to get:
  $W_i = a + b_1 \text{Age}_i + b_2 \text{Married}_i + b_3 D_i + b_4 (D_i \cdot \text{Age}_i) + b_5 (D_i \cdot \text{Married}_i) + e_i$

Interactive terms (7)
• Using L17_2.xls you can construct the interactive terms by multiplying the FEMALE column by the AGE and MARRIED columns
  – one way to see if the two sub-samples are different is to look at the t-ratios on the interactive terms
  – in this example, neither of the t-ratios is statistically significant, so we can't reject the null hypothesis

Interactive terms (8)
• If we want the equation for the first sub-sample (males), we take the expectation of the wage equation where the dummy variable for female takes the value of zero:
  $E(W_i \mid D_i = 0) = a + b_1 \text{Age}_i + b_2 \text{Married}_i$
• We can do the same for the second sub-sample (females):
  $E(W_i \mid D_i = 1) = (a + b_3) + (b_1 + b_4) \text{Age}_i + (b_2 + b_5) \text{Married}_i$
• We can see that by using only one regression equation, we have allowed the intercept and partial slope coefficients to vary by sub-sample

Phillips Curve example
• Phillips curve as an example of a regime shift.
• Data points from 1950-1970: there is a downward sloping, reciprocal relationship between wage inflation and unemployment
  [Figure: wage inflation (W) plotted against unemployment (UN), 1950-1970]

Phillips Curve example (2)
• But if we look at data points from 1971-1996:
  [Figure: wage inflation (W) plotted against unemployment (UN), 1971-1996]
• From the data we can detect an upward sloping relationship
• ALWAYS graph the data between the 2 main variables of interest

Phillips Curve example (3)
• There seems to be a regime shift between the two periods
  – note: this is an arbitrary choice of regime shift; it was not dictated by a specific change
• We will use the Chow Test (F-test) to test for this regime shift
  – the test will use a restricted form:
    $W_t = a + b \dfrac{1}{UN_t}$
  – it will also use an unrestricted form:
    $W_t = a + b_1 D_t + b_2 \dfrac{1}{UN_t} + b_3 \left( D_t \cdot \dfrac{1}{UN_t} \right)$
  – D is the dummy variable for the regime shift, equal to 0 for 1950-1970 and 1 for 1971-1996

Phillips Curve example (4)
• L17_3.xls estimates the restricted regression equations and calculates the F-statistic for the Chow Test
• The null hypothesis will be: $H_0: b_1 = b_3 = 0$
  – we are testing to see if the dummy variable for the regime shift alters the intercept or the slope coefficient
• The F-statistic is (* indicates restricted):
  $F = \dfrac{\left( \sum \hat{e}^{*2} - \sum \hat{e}^{2} \right)/q}{\sum \hat{e}^{2}/(n - k)}$, where q = 2

Phillips Curve example (5)
• The expectation of wage inflation for the first time period:
  $E(W_t \mid D_t = 0) = a + b_2 \dfrac{1}{UN_t}$
• The expectation of wage inflation for the second time period:
  $E(W_t \mid D_t = 1) = (a + b_1) + (b_2 + b_3) \dfrac{1}{UN_t}$
• You can use the spreadsheet data to carry out these calculations

Relaxing Assumptions

Today's Plan
• A review of what we have learned in regression so far and a look forward to what will happen when we relax the assumptions around the regression line
• Introduction to new concepts:
  – Heteroskedasticity
  – Serial correlation (also known as autocorrelation)
  – Non-independence of independent variables
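
As a recap of the F-test used in the first part of this lecture, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name chow_f and the variable names are hypothetical, the sums of squared residuals and sample size are the ones quoted for the male/female wage example, and the 5% critical value of 2.96 is taken from the slide rather than computed.

```python
def chow_f(ssr_restricted, ssr_unrestricted, q, df_unrestricted):
    """F = [(SSR_R - SSR_U) / q] / [SSR_U / df_U]."""
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / df_unrestricted
    return numerator / denominator


# Figures quoted in the male/female wage example (L17_2.xls).
ssr_pooled = 16.261                    # restricted: whole sample
ssr_male, ssr_female = 7.495, 5.093    # unrestricted: separate sub-samples
n, k = 33, 3                           # total observations, parameters per equation

ssr_u = ssr_male + ssr_female          # SSR_U = SSR_M + SSR_F
df_u = n - 2 * k                       # (n1 - k) + (n2 - k) = n - 2k

f_star = chow_f(ssr_pooled, ssr_u, q=k, df_unrestricted=df_u)
f_critical = 2.96                      # F(0.05; 3, 27), as given on the slide
reject_null = f_star > f_critical

print(round(f_star, 3), reject_null)
```

The same function applies to the Phillips curve regime-shift test once the restricted and unrestricted sums of squared residuals are read off L17_3.xls, with q = 2 and n - k degrees of freedom taken from the unrestricted regression.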

CLRM Revision
• Calculating the linear regression model (using OLS)
• Use of the sum of squared residuals: calculate the variance for the regression line and the mean squared deviation
• Hypothesis tests: t-tests, F-tests, $\chi^2$ test.
• Coefficient of determination ($R^2$) and the adjustment.
• Modeling: use of log-linear, logs, reciprocal.
• Relationship between F and $R^2$
• Imposing linear restrictions: e.g. $H_0: b_2 = b_3 = 0$ (q = 2); $H_0: a + b = 1$.
• Dummy variables and interactions; Chow test.

Relaxing assumptions
• What are the assumptions we have used throughout?
• Two assumptions about the population for the bivariate case:
  1. $E(Y \mid X) = a + bX$ (the conditional expectation function is linear);
  2. $V(Y \mid X) = \sigma^2$ (conditional variances are constant)
• Assumptions concerning the sampling procedure (i = 1..n):
  1. Values of $X_i$ (not all equal) are prespecified;
  2. $Y_i$ is drawn from the subpopulation having $X = X_i$;
  3. The $Y_i$'s are independent.
• Consequences are:
  1. $E(Y_i) = a + bX_i$;
  2. $V(Y_i) = \sigma^2$;
  3. $C(Y_h, Y_i) = 0$
  – How can we test to see if these assumptions don't hold?
  – What can we do if the assumptions don't hold?

Homoskedasticity
• We would like our estimates to be BLUE
• We need to look out for three potential violations of the CLRM assumptions: heteroskedasticity, autocorrelation, and non-independence of X (or simultaneity bias).
• Heteroskedasticity: usually found in cross-section data (and longitudinal data)
• In earlier lectures, we saw that the variance of $\hat{b}$ is
  $V(\hat{b}) = \dfrac{\sigma^2}{\sum x_i^2}$
  – This is an example of homoskedasticity, where the variance is constant

Homoskedasticity (2)
• Homoskedasticity can be illustrated like this:
  [Figure: Y against X, showing constant variance around the regression line at X1, X2, X3]

Heteroskedasticity
• But we don't always have constant variance $\sigma^2$
  – We may have a variance that varies with each observation, or $\sigma_i^2$
• When there is heteroskedasticity, the variance around the regression line varies with the values of X

Heteroskedasticity (2)
• The non-constant variance around the regression line can be drawn like this:
  [Figure: Y against X, showing the spread around the regression line changing across X1, X2, X3]

Serial (auto) correlation
• Serial correlation can be found in time series data (and longitudinal data)
• Under serial correlation, we have covariance terms:
  $V(\hat{b}) = \sum_i c_i^2 \sigma^2 + \sum_h \sum_{i \neq h} c_h c_i \sigma_{hi}$
  – where $Y_i$ and $Y_h$ are correlated, or each $Y_i$ is not independently drawn
  – This results in nonzero covariance terms

Serial (auto) correlation (2)
• Example: we can think of this using time series data such that unemployment at time t is related to unemployment in the previous time period t-1
• If we have a model with unemployment as the dependent variable $Y_t$, then
  – $Y_t$ and $Y_{t-1}$ are related
  – $e_t$ and $e_{t-1}$ are also related

Non-independence
• The non-independence of independent variables is the third violation of the ordinary least squares assumptions
• Remember from the OLS derivation that we minimized the sum of the squared residuals $\sum e^2$ given that $C(X, e) = 0$
  – we needed independence between the X variable and the error term
  – if not, the values of X are not pre-specified
  – without independence, the estimates are biased

Summary
• Heteroskedasticity and serial correlation
  – make the estimates inefficient
  – and therefore make the estimated standard errors incorrect
• Non-independence of independent variables
  – makes estimates biased
  – instrumental variables and simultaneous equations are used to deal with this third type of violation
• Starting next lecture we'll take a more in-depth look at the three violations of the CLRM assumptions
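
To make the contrast in the summary concrete, here is a small simulation sketch using only the Python standard library. Everything in it is an assumption chosen for illustration, not taken from the lecture data: the true model Y = 1 + 2X + e, the error structures, and the names ols_slope and mean_slope. It compares the average OLS slope when the error variance grows with X against the average slope when X is correlated with the error term. Serial correlation is not simulated here.

```python
import random

TRUE_A, TRUE_B = 1.0, 2.0  # assumed population intercept and slope


def ols_slope(xs, ys):
    """Bivariate OLS slope: sum of cross-deviations over sum of squared x-deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def mean_slope(case, reps=200, n=200, seed=0):
    """Average the OLS slope over repeated samples drawn under one violation."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        xs, ys = [], []
        for _ in range(n):
            z = rng.uniform(1.0, 5.0)
            if case == "heteroskedastic":
                x = z
                e = rng.gauss(0.0, 0.5 * x)   # error spread grows with X
            elif case == "non-independent":
                e = rng.gauss(0.0, 1.0)
                x = z + 0.8 * e               # X is correlated with the error
            else:
                raise ValueError(f"unknown case: {case}")
            xs.append(x)
            ys.append(TRUE_A + TRUE_B * x + e)
        slopes.append(ols_slope(xs, ys))
    return sum(slopes) / reps


hetero_mean = mean_slope("heteroskedastic")
nonindep_mean = mean_slope("non-independent")
print("true slope:", TRUE_B)
print("mean slope, heteroskedastic errors:", round(hetero_mean, 3))
print("mean slope, X correlated with e:", round(nonindep_mean, 3))
```

Under these assumptions the heteroskedastic case should average close to the true slope of 2, since the violation affects efficiency and the standard errors rather than the centre of the sampling distribution, while the non-independent case should average noticeably above 2, which is the bias the summary refers to.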
