STATA Introductory manual 1 MADE QUIZ (for the last time) • What are the main OLS assumptions? 1. On average right 2. Linear 3. Predicting variables and error term uncorrelated 4. No serial correlation in error term 5. Homoscedasticity + Normality of error term 2 MADE Regression with STATA • Refresh yourself how we did regressions? – regress gdppc fdi trade inv school2 – regress gdppc fdi trade inv school2, nocons Source df MS Model Residual 2.0703e+09 691290985 4 88 517584509 7855579.38 Total 2.7616e+09 92 30017706.8 gdppc fdi trade inv school2 _cons 3 SS Coef. 508.1666 13.57583 -72.10749 110.629 -83.17579 Std. Err. 147.8113 12.64687 39.74305 10.25845 899.459 t 3.44 1.07 -1.81 10.78 -0.09 Number of obs = F( 4, 88) Prob > F R-squared Adj R-squared Root MSE P>|t| 0.001 0.286 0.073 0.000 0.927 = = = = = 93 65.89 0.0000 0.7497 0.7383 2802.8 [95% Conf. Interval] 214.4226 -11.55717 -151.0885 90.24245 -1870.661 801.9106 38.70884 6.873478 131.0155 1704.31 MADE Diagnostics with STATA • Normality of the residual – predict e, residual [directly after regress] – sktest e [Jarque-Bery test] Skewness/Kurtosis tests for Normality Variable Pr(Skewness) e 0.025 Pr(Kurtosis) 0.057 adj chi2(2) 7.73 joint Prob>chi2 0.0210 • RESET test (so-called omitted variables or functional form test) – ovtest, rhs Ramsey RESET test using powers of the independent variables Ho: model has no omitted variables F(12, 76) = 3.30 Prob > F = 0.0007 4 MADE First conclusions • There is something wrong with the model specification: – residuals are not normal (although sample is not that small) – some variables seem to demonstrate nonlinearity • We need to check if it causes problems for estimation… 5 MADE Diagnostics with STATA • Heteroscedasticity – hettest, rhs [Breush-Pagan test] Breusch-Pagan / Cook-Weisberg test for heteroskedasticity Ho: Constant variance Variables: fdi trade inv school2 chi2( 4) Prob > chi2 6 = = 32.11 0.0000 MADE Diagnostics with STATA • Heteroscedasticity – imtest, white [White test] White's test for Ho: homoskedasticity against Ha: unrestricted heteroskedasticity chi2( 14) Prob > chi2 = = 30.92 0.0057 Cameron & Trivedi's decomposition of IM-test Source 7 chi2 df p Heteroskedasticity Skewness Kurtosis 30.92 9.28 1.43 14 4 1 0.0057 0.0545 0.2319 Total 41.63 19 0.0020 MADE Diagnostics with STATA • Autocorrelation (we would need time, not this dataset) – tsset t – dwstat [Durbin-Watson test] – bgodfrey, lags(1 2 3) [Breush-Godfrey test] 8 MADE Conclusions • Model specification is wrong – sample small and no normality – omitted variable problem – for sure heteroscedasticity (cannot trust standard errors) => There is something very wrong! • Solutions – we can use robust estimators, but it will not solve the inconsistency problem – we need a new model specification 9 MADE Diagnostics with STATA • Structural stability [Chow test] – – – – – – – – – – 10 gen d=0 gen dfdi=0 gen dtrade=0 gen dinv=0 replace d=1 if lat<0 (why so????) replace dfdi=fdi if lat<0 replace dtrade=trade if lat<0 replace dinv=inv if lat<0 reg gdppc fdi trade inv school2 d dfdi dtrade dinv test (d=0) (dfdi=0) (dtrade=0) (dinv=0) MADE Structure does not seem to be driving this Source SS df MS Model Residual 2.0905e+09 671139673 8 84 261311169 7989758.01 Total 2.7616e+09 92 30017706.8 gdppc fdi trade inv school2 d dfdi dtrade dinv _cons Coef. 526.6277 12.36589 -111.958 113.815 -3125.048 -53.68357 8.752883 137.5564 641.9922 Std. Err. 171.405 13.84263 47.72409 10.87722 2113.929 377.4741 75.44521 93.96076 1037.978 t 3.07 0.89 -2.35 10.46 -1.48 -0.14 0.12 1.46 0.62 Number of obs = F( 8, 84) Prob > F R-squared Adj R-squared Root MSE P>|t| 0.003 0.374 0.021 0.000 0.143 0.887 0.908 0.147 0.538 = = = = = 93 32.71 0.0000 0.7570 0.7338 2826.6 [95% Conf. Interval] 185.7702 -15.16171 -206.8626 92.18447 -7328.828 -804.3323 -141.2782 -49.29488 -1422.141 867.4853 39.89348 -17.05342 135.4455 1078.732 696.9651 158.784 324.4077 2706.125 . test (d=0) (dfdi=0) (dtrade=0) (dinv=0) ( ( ( ( 1) 2) 3) 4) d = 0 dfdi = 0 dtrade = 0 dinv = 0 F( 11 4, 84) = Prob > F = . end of do-file 0.63 0.6421 MADE Job for you now • Start thinking – Maybe there are some variables we should include in specification? – Maybe we should start taking logs? – Maybe interactions? – Maybe model should be estimated on subsamples? • Try drawing things to prove your approach 12 MADE