PANEL DATA (Ch. 10)

The recommended exercise questions from the textbook:
• Chapter 10: All except (10.6), (10.10).

[1] What are panel data?

• Panel data consist of observations on the same n entities at two or more time periods (T ≥ 2). If the data set contains observations on the variables X and Y, the data are denoted
$$(X_{it}, Y_{it}), \quad i = 1, \dots, n, \quad t = 1, \dots, T,$$
where the first subscript, i, refers to the entity being observed, and the second subscript, t, refers to the date at which it is observed.
• Balanced panel vs. unbalanced panel:
  • Balanced panel: the variables are observed for each entity and each time period.
  • Unbalanced panel: some data are missing for at least one entity in at least one time period.
• We consider the analysis of balanced panels, but the extension to unbalanced panels is straightforward.

[2] Revisiting Omitted Variables Biases

• Issue: Do alcohol taxes help decrease traffic deaths?
• Data: fatality.wf1
  • 48 U.S. states (excluding Alaska and Hawaii): n = 48.
  • 1982-1988: T = 7.
  • FatalityRate = number of traffic accident deaths per 10,000 people.
  • BeerTax = tax per case of beer ($).
• Estimation results for the 1982 data (standard errors in parentheses):
$$\widehat{FatalityRate} = \underset{(0.15)}{2.01} + \underset{(0.13)}{0.15}\, BeerTax$$
• Estimation results for the 1988 data:
$$\widehat{FatalityRate} = \underset{(0.11)}{1.86} + \underset{(0.13)}{0.44}\, BeerTax$$
• What is going on here? Both estimated slopes are positive, which would suggest that higher beer taxes go with more traffic deaths.
• Consider a simple multiple regression model (for a given time t):
$$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + u_{it}, \quad i = 1, \dots, n,$$
where $Z_i$ is a time-invariant regressor.
• What do $\beta_1$ and $\beta_2$ measure? $\beta_1$ measures the partial effect of $X_{it}$ on $Y_{it}$ with $Z_i$ held constant. Similarly, $\beta_2$ measures the partial effect of $Z_i$ on $Y_{it}$ with $X_{it}$ held constant.
• What if you estimate $Y_{it} = \alpha_0 + \alpha_1 X_{it} + error_{it}$ instead?
$$\hat{\alpha}_1 \xrightarrow{p} \beta_1 + \beta_2 \frac{\mathrm{cov}(X_{it}, Z_i)}{\mathrm{var}(X_{it})}$$
• Each state would have a different level of preference for alcohol (say, $Z_i$ = Pal).
• Pal (Z) and BeerTax (X) could be positively related: $\mathrm{cov}(X_{it}, Z_i) > 0$.
• Pal (Z) would have a positive partial effect on FatalityRate ($\beta_2 > 0$).
• Thus, $\hat{\alpha}_1$ could be positive even if the true $\beta_1$ is negative.
• How could we control for Pal using panel data?

[3] Panel Data with Two Time Periods

• Two equations for 1982 and 1988:
$$FatalityRate_{i,1988} = \beta_0 + \beta_1 BeerTax_{i,1988} + \beta_2 Z_i + u_{i,1988}$$
$$FatalityRate_{i,1982} = \beta_0 + \beta_1 BeerTax_{i,1982} + \beta_2 Z_i + u_{i,1982}$$
→ Subtracting the second equation from the first:
$$FatalityRate_{i,1988} - FatalityRate_{i,1982} = \beta_1 (BeerTax_{i,1988} - BeerTax_{i,1982}) + (u_{i,1988} - u_{i,1982}) \qquad (1)$$
• No $Z_i$ in (1)! OLS on (1) will yield a consistent estimator of $\beta_1$.
• Actual estimation results for (1) (standard errors in parentheses):
$$\widehat{FatalityRate_{1988} - FatalityRate_{1982}} = \underset{(0.065)}{-0.072} - \underset{(0.36)}{1.04}\,(BeerTax_{1988} - BeerTax_{1982})$$
  The estimated regression includes an intercept, which allows the mean change in the fatality rate to be nonzero even when the beer tax does not change.
• Comments on the before-and-after estimation results:
  • As the real beer tax increases by $1 per case, the traffic fatality rate falls by 1.04 deaths per 10,000 people.
  → This is a big effect, because the mean traffic fatality rate is approximately two.
• This before-and-after approach works well if T = 2. What should we do if T > 2?

[4] Fixed Effects Regression

(A) A simple regression model:
$$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + u_{it}, \quad i = 1, \dots, n, \quad t = 1, \dots, T. \qquad (1)$$
• Set $\alpha_i = \beta_0 + \beta_2 Z_i$. Then, we have
$$Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}, \qquad (2)$$
which is called the “fixed effects regression model.”
• For the i'th cross-sectional entity, the regression line is (2). The slope coefficient $\beta_1$ is the same for all i, but the intercepts $\alpha_i$ differ across entities i (and are constant over time).
• Equivalently, set:
$$Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \gamma_3 D3_i + \dots + \gamma_n Dn_i + u_{it}, \qquad (3)$$
where i = 1, ..., n, t = 1, ..., T (nT observations),
$$D2_i = \begin{cases} 1 & \text{if } i \text{ is the 2nd entity;} \\ 0 & \text{otherwise,} \end{cases}$$
and the other dummy variables D3, ..., Dn are similarly defined.
• In (3), $\alpha_1 = \beta_0$, $\alpha_2 = \beta_0 + \gamma_2$, ..., $\alpha_n = \beta_0 + \gamma_n$.
• The slope coefficient $\beta_1$ and the n other parameters ($\beta_0, \gamma_2, \dots, \gamma_n$) can be estimated by OLS on model (3).
• “Entity-demeaned” OLS algorithm:
$$Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}$$
$$\bar{Y}_i = \beta_1 \bar{X}_i + \alpha_i + \bar{u}_i, \quad \text{where } \bar{Y}_i = \frac{1}{T} \sum_{t=1}^{T} Y_{it} \text{ (and } \bar{X}_i, \bar{u}_i \text{ are defined similarly).}$$
  Subtracting the second equation from the first:
$$(Y_{it} - \bar{Y}_i) = \beta_1 (X_{it} - \bar{X}_i) + (u_{it} - \bar{u}_i). \qquad (4)$$
• OLS estimator of $\beta_1$ from (4) = OLS estimator of $\beta_1$ from (3).
• Least squares assumptions for the fixed effects model:
  (FEA.1) $E(u_{it} \mid X_{i1}, X_{i2}, \dots, X_{iT}, \alpha_i) = 0$.
  (FEA.2) The data, $(X_{i1}, \dots, X_{iT}, Y_{i1}, \dots, Y_{iT})$, i = 1, ..., n, are a random sample.
  (FEA.3) $(X_{it}, \alpha_i)$ have nonzero finite fourth moments: large outliers are unlikely.
  (FEA.4) There is no perfect multicollinearity.
  (FEA.5) No autocorrelation: $\mathrm{cov}(u_{it}, u_{is} \mid X_{i1}, \dots, X_{iT}, \alpha_i) = 0$ for all t ≠ s.
  For multiple regressions, $X_{it}$ should be replaced by the full list $X_{1,it}, \dots, X_{k,it}$.
• What happens if (FEA.5) is violated? (See the comments on (FEA.5) below.)

(B) Extension to multiple X's.
• The fixed effects regression model is
$$Y_{it} = \beta_1 X_{1,it} + \dots + \beta_k X_{k,it} + \alpha_i + u_{it}, \qquad (5)$$
where i = 1, ..., n, and t = 1, ..., T.
• Equivalently, the fixed effects model can be written as
$$Y_{it} = \beta_0 + \beta_1 X_{1,it} + \dots + \beta_k X_{k,it} + \gamma_2 D2_i + \dots + \gamma_n Dn_i + u_{it}. \qquad (6)$$
• “Entity-demeaned” algorithm:
$$(Y_{it} - \bar{Y}_i) = \beta_1 (X_{1,it} - \bar{X}_{1,i}) + \dots + \beta_k (X_{k,it} - \bar{X}_{k,i}) + (u_{it} - \bar{u}_i). \qquad (7)$$
• OLS estimators of $\beta_1, \dots, \beta_k$ from (7) = OLS estimators of $\beta_1, \dots, \beta_k$ from (6).

(C) Application to Traffic Deaths.
• Fixed effects regression results (standard error in parentheses):
$$\widehat{FatalityRate} = \underset{(0.20)}{-0.66}\, BeerTax + StateFixedEffects.$$

[5] Time and Entity Fixed Effects Model

(1) Motivation.
• Return to our FatalityRate example:
$$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + \beta_3 S_t + u_{it},$$
where $Y_{it}$ = FatalityRate; $X_{it}$ = BeerTax; $Z_i$ = time-invariant preferences for alcohol or driving of the people in state i; $S_t$ = time-specific effects (common to all states), such as overall automobile safety improvements.
• Let
$$B1_t = \begin{cases} 1 & \text{if } t \text{ is the first time period;} \\ 0 & \text{otherwise.} \end{cases}$$
  Define the dummy variables $B2_t, \dots, BT_t$ similarly.

(2) Time and Entity Fixed Effects Model:
$$Y_{it} = \beta_0 + \beta_1 X_{1,it} + \dots + \beta_k X_{k,it} + \gamma_2 D2_i + \dots + \gamma_n Dn_i + \delta_2 B2_t + \dots + \delta_T BT_t + u_{it}.$$
• The model has many regressors (k + n + T − 1 coefficients, including the intercept). We can still get reasonably accurate estimates of $\beta_1, \dots, \beta_k$, but the estimates of $\gamma_2, \dots, \gamma_n$ and $\delta_2, \dots, \delta_T$ are inaccurate.
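The claim in [4] that OLS on the dummy-variable regression (3) and OLS on the entity-demeaned regression (4) give the same estimate of β1 can be checked numerically. The following is a minimal sketch in plain Python, not the Eviews procedure used later in these notes. The panel (n = 3, T = 4) is made up for illustration, and the function names `pooled_slope`, `demeaned_slope`, `lsdv_slope`, and `solve` are hypothetical helpers. The numbers were chosen so that entities with high X also have high intercepts, which mimics the omitted-variable story in [2].

```python
# Hypothetical balanced panel: n = 3 entities (rows), T = 4 periods (columns).
# All numbers are made up for illustration.
x = [[1.0, 2.0, 3.0, 4.0],
     [2.0, 3.0, 5.0, 6.0],
     [4.0, 5.0, 6.0, 9.0]]
y = [[2.1, 1.4, 1.2, 0.3],
     [4.0, 3.6, 2.1, 1.9],
     [6.2, 5.5, 5.1, 3.0]]

def mean(v):
    return sum(v) / len(v)

def pooled_slope(x, y):
    """OLS slope of Y on X ignoring the entity effects (omits alpha_i)."""
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    xbar, ybar = mean(xs), mean(ys)
    num = sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys))
    den = sum((a - xbar) ** 2 for a in xs)
    return num / den

def demeaned_slope(x, y):
    """OLS slope from the entity-demeaned regression (4)."""
    num = den = 0.0
    for xi, yi in zip(x, y):
        xbar, ybar = mean(xi), mean(yi)
        for xit, yit in zip(xi, yi):
            num += (xit - xbar) * (yit - ybar)
            den += (xit - xbar) ** 2
    return num / den

def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with pivoting."""
    m = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * m
    for r in reversed(range(m)):
        s = sum(aug[r][c] * z[c] for c in range(r + 1, m))
        z[r] = (aug[r][m] - s) / aug[r][r]
    return z

def lsdv_slope(x, y):
    """OLS slope from the dummy-variable regression (3).

    Regressors are [1, X_it, D2_i, ..., Dn_i]; the normal equations
    (X'X) b = X'y are solved directly.
    """
    n = len(x)
    rows, ys = [], []
    for i in range(n):
        for xit, yit in zip(x[i], y[i]):
            dummies = [1.0 if i == j else 0.0 for j in range(1, n)]
            rows.append([1.0, xit] + dummies)
            ys.append(yit)
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yv for r, yv in zip(rows, ys)) for p in range(k)]
    return solve(xtx, xty)[1]  # coefficient on X_it

b_pooled = pooled_slope(x, y)
b_within = demeaned_slope(x, y)
b_lsdv = lsdv_slope(x, y)
print("pooled:", round(b_pooled, 4))
print("entity-demeaned:", round(b_within, 4))
print("dummy-variable:", round(b_lsdv, 4))
```

In this made-up panel the pooled slope is positive while the two fixed effects slopes coincide and are negative, which is the same sign pattern as in the traffic-deaths example. Demeaning also avoids estimating the n − 1 dummy coefficients, which matters once n is large.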
(3) Application to traffic deaths (standard error in parentheses):
$$\widehat{FatalityRate} = \underset{(0.25)}{-0.64}\, BeerTax + StateFixedEffects + TimeFixedEffects.$$

[6] Drunk Driving Laws and Traffic Death

• Would driving laws and economic conditions matter?
• Drinking-age and drunk-driving laws do not matter very much.
• Economic factors are important.
• Specification (4) in Table 10.1 is the base model.
• Average tax = $0.5/case, and average fatality rate = 2 per 10,000 people.
• As the tax increases by $0.5, the fatality rate drops by 0.45 × 0.5 = 0.225 (per 10,000).
→ But this result is somewhat imprecise: the 95% confidence interval for the effect of BeerTax is
  −0.45 ± 1.96 × 0.22 → (−0.88, −0.02),
  which is quite wide.

[7] Eviews Exercise

(1) Exercise with an artificial panel data set named “artificial_panel.xls.”

There are four variables in the Excel file: “country”, “year”, “y”, and “x”. Each variable has 12 observations, from the 3rd row to the 14th row. The data are artificial numbers for three countries: US, Japan and Korea. Notice that the variable “country” is alphabetic, not numeric.

STEP 1: Open artificial_panel.xls using Excel. Then, using your mouse, select the data block and copy it.

STEP 2: Open Eviews. Then, type the following in the Eviews command window (the narrow white window below the File, Edit, Object buttons):
    create u 12 (enter)
Then, a workfile window will pop up. Type the following in the command window:
    alpha country (enter)
    data year y x (enter)
The command “alpha” is used to create alphabetic variables, while “data” is for numeric variables. Then, a spreadsheet will pop up. Close the window by clicking on the X in the upper-right corner of the window. Eviews will ask you whether you want to delete Untitled Group. Click on the Yes button.

STEP 3: On the workfile, click on the Show button. Then, a SHOW window will pop up. Type in the window:
    country year y x
Click on OK. Then, a spreadsheet will pop up.
Click on the Edit+/- button and place your cursor on the 1-country cell (the first row of the country column). Then right-click and paste the data copied in STEP 1. You will see that the data from the Excel file are pasted into the spreadsheet. Close the spreadsheet by clicking on the X in the upper-right corner. Eviews will ask you whether you want to delete Untitled Group. Click on the Yes button.

STEP 4: On the workfile, push the Save button. Choose the drive and folder where you want to save the file. Choose the file name “artificial_panel.wf1”. Click on the Save button. Then, a “Workfile Save” window will pop up. Just click on the OK button. Then, you will be back at the workfile.

STEP 5: On the workfile, push the Proc button. Choose Structure/Resize Current Page… Then you will have the Workfile Structure window. Choose Dated Panel. Type 2001 for Start date, 2004 for End date, country for Cross-section ID series, and year for Date series. Then, click on OK. You will be back at the workfile. Save it!!!

STEP 6: Push the Objects/New Object... button. Choose Equation and choose art_pan as the name of the object. Then, an Equation Estimation window will pop up. Type “y x” in the Equation specification box. Then click on Panel Options. Choose “Fixed” for Cross-section, “Fixed” for Period, and “White (diagonal)” for Coef covariance method. By choosing “Fixed” for Cross-section, you are running the regression with dummy variables for the individual entities. By choosing “Fixed” for Period, you are adding time dummy variables to the regression.

STEP 7: Choose View/Fixed/Random Effects/Cross-section Effects to display the estimated cross-section effects. Choose View/Fixed/Random Effects/Period Effects to display the estimated period effects. Choose View/Fixed/Random Effects Testing/Redundant Fixed Effects to test the significance of the fixed effects.
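What choosing “Fixed” for both Cross-section and Period does in STEP 6 can be illustrated outside Eviews. The following is a minimal sketch in plain Python under stated assumptions: the panel is balanced (3 countries × 4 years = 12 observations), the x values, entity effects, period effects, and the slope of 2 are all made up (they are not the contents of artificial_panel.xls), and the helper names `two_way_demean` and `within_slope` are hypothetical. For a balanced panel, subtracting the entity mean and the period mean and adding back the grand mean removes both sets of dummy variables, so the slope can be estimated from the transformed data alone.

```python
# Hypothetical balanced panel: 3 countries x 4 years = 12 observations.
# These numbers are made up; they are NOT the contents of artificial_panel.xls.
countries = ["US", "Japan", "Korea"]
years = [2001, 2002, 2003, 2004]
x_values = {
    "US":    [1.0, 2.0, 4.0, 3.0],
    "Japan": [2.0, 2.0, 3.0, 5.0],
    "Korea": [0.0, 3.0, 3.0, 6.0],
}
alpha = {"US": 1.0, "Japan": -0.5, "Korea": 2.0}        # assumed entity effects
delta = {2001: 0.0, 2002: 0.3, 2003: -0.2, 2004: 0.7}   # assumed period effects
TRUE_SLOPE = 2.0                                         # assumed beta_1

# Build x and y keyed by (country, year). y contains no noise, so the
# within estimator should recover TRUE_SLOPE up to rounding error.
x = {(c, t): x_values[c][j] for c in countries for j, t in enumerate(years)}
y = {(c, t): TRUE_SLOPE * x[c, t] + alpha[c] + delta[t] for (c, t) in x}

def two_way_demean(v):
    """Subtract entity and period means, then add back the grand mean."""
    n, T = len(countries), len(years)
    entity_mean = {c: sum(v[c, t] for t in years) / T for c in countries}
    period_mean = {t: sum(v[c, t] for c in countries) / n for t in years}
    grand_mean = sum(v.values()) / (n * T)
    return {(c, t): v[c, t] - entity_mean[c] - period_mean[t] + grand_mean
            for (c, t) in v}

def within_slope(x, y):
    """OLS slope of two-way demeaned y on two-way demeaned x."""
    xd, yd = two_way_demean(x), two_way_demean(y)
    num = sum(xd[k] * yd[k] for k in xd)
    den = sum(xd[k] ** 2 for k in xd)
    return num / den

beta_hat = within_slope(x, y)
print("observations:", len(x))
print("slope estimate:", round(beta_hat, 6))
```

The shortcut relies on the panel being balanced; with missing observations the dummy-variable regression is the safer route.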
I found that the F and χ² statistics for the individual dummy variables and the time dummy variables are computed assuming that the error terms in the regression model are homoskedastic over i and t. So, the results are not reliable if the error terms are in fact heteroskedastic. If you would like to test whether the time effects are statistically significant, I suggest estimating your model with None chosen for Period but with the time-dummy variables included as regressors, and then testing their joint significance with a Wald test.

(2) Exercise with fatality.wf1.

--------------------------------------------------------------------
variable name    variable label
--------------------------------------------------------------------
state            State ID (FIPS) Code
year             Year
spircons         Spirits Consumption
unrate           Unemployment Rate
perinc           Per Capita Personal Income
emppop           Employment/Population Ratio
beertax          Tax on Case of Beer
sobapt           % Southern Baptist
mormon           % Mormon
mlda             Minimum Legal Drinking Age
dry              % Residing in Dry Counties
yngdrv           % of Drivers Aged 15-24
vmiles           Ave. Mile per Driver
vmilespd         Ave. Mile per 1,000 Driver
breath           Prelim. Breath Test Law
jaild            Mandatory Jail Sentence
comserd          Mandatory Community Service
jailcom          jaild + comserd
allmort          # of Vehicle Fatalities (#VF)
mrall            Vehicle Fatality Rate (VFR) = #VF/Population
vfrall           10,000*mrall = VFR per 10,000 people
allnite          # of Night-time VF (#NVF)
mralln           Night-time VFR (NVFR)
allsvn           # of Single VF (#SVF)
a1517            #NVF, 15-17 year olds
mra1517n         NVFR, 15-17 year olds
a1829            #VF, 18-20 year olds
a1820n           #NVF, 18-20 year olds
mra1820          VFR, 18-20 year olds
mra1820n         NVFR, 18-20 year olds
a2124            #VF, 21-24 year olds
mra2124          VFR, 21-24 year olds
a2124n           #NVF, 21-24 year olds
mra2124n         NVFR, 21-24 year olds
aidall           # of alcohol-involved VF
da18             Dummy variable for drinking age = 18
da19             Dummy variable for drinking age = 19
da20             Dummy variable for drinking age = 20
lincperc         Log of per capita real income
mraidall         Alcohol-Involved VFR
pop              Population
pop1517          Population, 15-17 year olds
pop1820          Population, 18-20 year olds
pop2124          Population, 21-24 year olds
miles            total vehicle miles (millions)
unus             U.S. unemployment rate
epopus           U.S. Emp/Pop Ratio
gspch            GSP Rate of Change
Dum1982
Dum1983
Dum1984
:
DUM1988
--------------------------------------------------------------------

• Estimation of specification (4) in Table 10.1 on p. 368.

Dependent Variable: VFRALL
Sample: 1982 1988
Cross-sections included: 48
Total panel (balanced) observations: 336
White diagonal standard errors & covariance (d.f. corrected)

Variable      Coefficient   Std. Error   t-Statistic   Prob.
C               -2.327171     1.316419     -1.767804   0.0782
BEERTAX         -0.450272     0.222005     -2.028203   0.0435
DA18             0.027509     0.065473      0.420158   0.6747
DA19            -0.019096     0.039510     -0.483315   0.6293
DA20             0.030875     0.045689      0.675767   0.4998
JAILD            0.012644     0.031940      0.395866   0.6925
COMSERD          0.034135     0.114820      0.297289   0.7665
VMILESPD         0.008226     0.008368      0.983073   0.3264
LINCPERC         1.814889     0.472220      3.843312   0.0002
UNRATE          -0.063043     0.011616     -5.427345   0.0000
DUM1982          0.533926     0.075931      7.031706   0.0000
DUM1983          0.435841     0.070418      6.189300   0.0000
DUM1984          0.246723     0.050392      4.896067   0.0000
DUM1985          0.155325     0.043688      3.555327   0.0004
DUM1986          0.189843     0.040808      4.652090   0.0000
DUM1987          0.087532     0.032452      2.697246   0.0074

Effects Specification
Cross-section fixed (dummy variables)

R-squared             0.939540    Mean dependent var    2.040444
Adjusted R-squared    0.925809    S.D. dependent var    0.570194
Log likelihood        183.8646    F-statistic           68.42532
Durbin-Watson stat    1.733929    Prob(F-statistic)     0.000000

• Testing significance of the individual and time dummy variables:
[Estimation choosing “Fixed” for Period and not using the dummy variables as regressors.]

Redundant Fixed Effects Tests
Equation: MIN
Test cross-section and period fixed effects

Effects Test                        Statistic     d.f.       Prob.
Cross-section F                     44.772106     (47,273)   0.0000
Cross-section Chi-square           727.186063     47         0.0000
Period F                            19.685127     (6,273)    0.0000
Period Chi-square                  120.798386     6          0.0000
Cross-Section/Period F              40.398468     (53,273)   0.0000
Cross-Section/Period Chi-square    732.351587     53         0.0000

• Testing significance of the time dummy variables:
[Estimation choosing “None” for Period and using the dummy variables as regressors.]

Wald Test:
Equation: MIN

Test Statistic    Value       df         Probability
F-statistic       11.46715    (6, 273)   0.0000
Chi-square        68.80287    6          0.0000

Comments on (FEA.5):
• What if (FEA.5) fails, so that $\mathrm{corr}(u_{it}, u_{is} \mid X_{it}, X_{is}, \alpha_i) \neq 0$?
• The OLS panel data estimators of $\beta_1$ are still unbiased and consistent.
• The OLS standard errors will be wrong.
• Use “heteroskedasticity and autocorrelation-consistent standard errors” (clustered standard errors).
• The clustered SE formula is NOT the usual (hetero-robust) SE formula! [Appendix 10.2 (pp. 379-381)].
• The clustered SE might not be very accurate if n (the number of entities) is small.
• Eviews can compute these!
• In Eviews, choose “White period” instead of “White (diagonal)”.

• Estimation of specification (7) in Table 10.1 on p. 368.

Dependent Variable: VFRALL
Sample: 1982 1988
Cross-sections included: 48
Total panel (balanced) observations: 336
White period standard errors & covariance (d.f. corrected)

Variable      Coefficient   Std. Error   t-Statistic   Prob.
C               -2.327171     1.915400     -1.214979   0.2254
BEERTAX         -0.450272     0.319805     -1.407961   0.1603
DA18             0.027509     0.075267      0.365483   0.7150
DA19            -0.019096     0.053288     -0.358351   0.7204
DA20             0.030875     0.054076      0.570957   0.5685
JAILD            0.012644     0.017699      0.714386   0.4756
COMSERD          0.034135     0.142797      0.239043   0.8113
VMILESPD         0.008226     0.007355      1.118432   0.2644
LINCPERC         1.814889     0.683535      2.655150   0.0084
UNRATE          -0.063043     0.013984     -4.508168   0.0000
DUM1982          0.533926     0.098541      5.418291   0.0000
DUM1983          0.435841     0.091540      4.761205   0.0000
DUM1984          0.246723     0.064103      3.848852   0.0001
DUM1985          0.155325     0.054832      2.832774   0.0050
DUM1986          0.189843     0.042774      4.438265   0.0000
DUM1987          0.087532     0.032445      2.697841   0.0074

Effects Specification
Cross-section fixed (dummy variables)

R-squared             0.939540    Mean dependent var    2.040444
Adjusted R-squared    0.925809    S.D. dependent var    0.570194
Durbin-Watson stat    1.733929    Prob(F-statistic)     0.000000

• Average tax = $0.5/case, and average fatality rate = 2 per 10,000 people.
• As the tax increases by $0.5, the fatality rate drops by 0.45 × 0.5 = 0.225 (per 10,000).
→ The 95% confidence interval for the effect of BeerTax is
  −0.45 ± 1.96 × 0.32 → (−1.08, 0.18),
  which is wider than (−0.88, −0.02) and now includes zero.
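The two confidence intervals for the BeerTax effect can be reproduced with a few lines of arithmetic. The following is a minimal sketch in plain Python that uses the rounded coefficient and standard errors quoted in the text (−0.45, 0.22, 0.32) and the large-sample critical value 1.96; the function name `ci95` and the variable names are hypothetical.

```python
def ci95(coef, se, z=1.96):
    """Large-sample 95% confidence interval: coef +/- z * se."""
    return (coef - z * se, coef + z * se)

beer_tax = -0.45         # BEERTAX coefficient, rounded as in the text
se_white_diag = 0.22     # White (diagonal) standard error, rounded
se_white_period = 0.32   # White period (clustered) standard error, rounded

ci_diag = ci95(beer_tax, se_white_diag)
ci_period = ci95(beer_tax, se_white_period)

for label, (lo, hi) in [("White (diagonal)", ci_diag),
                        ("White period", ci_period)]:
    includes_zero = lo <= 0.0 <= hi
    print(f"{label}: ({lo:.2f}, {hi:.2f}), width {hi - lo:.2f}, "
          f"includes zero: {includes_zero}")

# Effect of a $0.5 tax increase on the fatality rate (per 10,000 people).
effect = beer_tax * 0.5
print(f"effect of a $0.5 increase: {effect:.3f}")
```

With the clustered standard error the interval covers zero, so the BeerTax effect is no longer significant at the 5% level, which agrees with the p-value of 0.1603 in the White period output.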