Introduction to STATA III: Graphs and Regressions
Prof. Dr. Herbert Brücker
University of Bamberg
Seminar “Migration and the Labour Market”
June 27, 2013

1 GRAPHS
• Present your data graphically
• It is usually helpful to present the main information/variables in your data set graphically
• There are many graphical commands; the Graphics menu lists them
• The simplest way is to show the development of your variable(s) over time
• Syntax:
    graph twoway line [variable1] [variable2] if …
    graph twoway line wqjt year if ed==1 & ex==1
• This produces a two-dimensional graph with the wage on the vertical axis and the year on the horizontal axis for education group 1 and experience group 1

Making a graph
[Figure: Graph of mean wage in education 1 and experience 1 group]
[Figure: Graph of migration rate in edu 1 and exp 1 group]

GRAPHS: Two Y-axes
• Two axes: It might be useful to display two variables on different y-axes with different scales (e.g. wages and migration rates)
• Syntax:
    graph twoway (line [variable1] [variable2], yaxis(1)) (line [variable3] [variable2], yaxis(2)) if …
    graph twoway (line wqjt year, yaxis(1)) (line mqjt year, yaxis(2)) if ed==1 & ex==1
• This produces a two-dimensional graph with the wage on the first vertical axis (y1) and the migration rate on the second vertical axis (y2)

GRAPHS: Scatter plots (I/II)
• Scatter plots display the relation between two variables
• Syntax:
    graph twoway scatter [variable1] [variable2] if …
    graph twoway scatter wqjt mqjt if ed==1
• This produces a two-dimensional scatter plot which shows the relation between the two variables

GRAPHS: Scatter plots (II/II)
• You can also add a linear fitted line
• Syntax:
    graph twoway scatter [variable1] [variable2] if … || lfit [variable1] [variable2] if …
    graph twoway scatter wqjt mqjt if ed==1 || lfit wqjt mqjt if ed==1

2 Running regressions
• The standard OLS regression command in STATA is regress
• Syntax:
    regress depvar [list of indepvar] [if], [options]
• e.g.
    regress ln_wijt mijt $D_i $D_j $D_t

The multivariate linear regression model
The general econometric model:
    y_i = β0 + β1 x_1i + β2 x_2i + … + βk x_ki + ε_i
where
• y_i is the dependent (or: endogenous) variable
• x_1i, …, x_ki are the exogenous variables, which explain the dependent variable
• β0 is the constant, i.e. the y-axis intercept (the value of y if all x = 0)
• β1, β2, …, βk are the regression coefficients or parameters of the regression
• ε_i is the residual or disturbance term

Running a regression model
[Screenshot with labels: globals, regression command, dependent variable, independent variables]

Running a Regression: Output
How to interpret the output of a regression

. reg ln_wqkt mqkt

      Source |       SS       df       MS              Number of obs =     800
-------------+------------------------------           F(  1,   798) =  212.53
       Model |  23.4146717     1  23.4146717           Prob > F      =  0.0000
    Residual |  87.9145738   798  .110168639           R-squared     =  0.2103
-------------+------------------------------           Adj R-squared =  0.2093
       Total |  111.329246   799  .139335727           Root MSE      =  .33192

------------------------------------------------------------------------------
     ln_wqkt |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        mqkt |  -1.369118    .093913   -14.58   0.000    -1.553464   -1.184772
       _cons |   4.706176    .017403   270.42   0.000     4.672015    4.740337
------------------------------------------------------------------------------

Annotations on the output:
• Upper left block: variance of the model (SS, MS) and degrees of freedom (df)
• Upper right block, from top to bottom: 1. Observations, 2. fit of the model, 3. F-Test, 4. R-squared, 5. adjusted R-squared, 6. Root mean squared error
• Coefficient rows: mqkt is β1, _cons is β0
• t and P>|t|: analysis of significance levels
• Last two columns: 95% confidence interval

Recall the Borjas (2003) model
    y_ijt = β m_ijt + s_i + x_j + t_t + (s_i ∙ x_j) + (s_i ∙ t_t) + (x_j ∙ t_t) + ε_ijt

This model in STATA
Syntax:
    regress ln_wqjt mqjt $Di $Dj $Dt $Dij $Dit $Djt
where
• ln_wqjt: dependent variable (log wage)
• mqjt: migration share in the education-experience cell
• $Di: global for education dummies
• $Dj: global for experience dummies
• $Dt: global for time dummies
• $Dij: global for education-experience interaction dummies
• $Dit: global for education-time interaction dummies
• $Djt: global for experience-time interaction dummies

What is a global?
• A global defines a vector (list) of variables
• Defining a global: STATA Syntax:
    global [global name] [variable1] [variable2] … [variablex]
    global Di Ded1 Ded2 Ded3
Using a global e.g.
in a regression:
    regress [depvariable] [other variable] $[global name]
    regress ln_wqjt mqjt $Di
This is equivalent to:
    regress ln_wqjt mqjt Ded1 Ded2 Ded3
Thus, globals are useful shortcuts for lists (vectors) of variables.

An alternative to the Borjas (2003) model:
    y_ijkt = β m_ijt + γ_k (z_k ∙ m_ijt) + s_i + x_j + z_k + t_t + (s_i ∙ x_j) + (s_i ∙ z_k) + (x_j ∙ z_k) + (s_i ∙ t_t) + (x_j ∙ t_t) + (z_k ∙ t_t) + ε_ijkt
where
• z_k is a dummy for foreigners (1 if foreigner, 0 if native)
• γ_k is a coefficient which captures the different impact on foreigners
• k (k = 0, 1) is a subscript for nationality
Idea: the slope coefficient γ_k is significantly different from zero if natives and immigrants are imperfect substitutes in the labour market.
Problem: We have to reorganize the dataset such that it delivers the wage, unemployment rates etc. separately for foreigners and natives.

3 Panel Models
• Very often you use panel models, i.e. models which have a group and a time series dimension
• There exist special estimators for this, e.g. fixed or random effects models
• A fixed effects model is a model where you have a fixed (constant) effect for each individual/group. This is equivalent to a dummy variable for each group
• A random effects model is a model where you have a random effect for each individual/group, which is based on assumptions on the distribution of the individual effects

Panel Models
Preparing data for Panel Models:
• For running panel models STATA needs to identify the group (individual) and the time series dimension
• Therefore you need an index for each group and an index for each time period
• Then use the tsset command to organize your dataset as a panel data set
• Syntax:
    tsset index year
• where index is the group/individual index and year the time index

Preparation: Running the tsset command
[Screenshot: running the tsset command]

Running Regressions: Panel Models
• Then you can use panel estimators, e.g. the xtreg estimator
• Syntax:
    xtreg depvar [list of indepvar] [if], [options]
    xtreg ln_wijt m_ijt, fe
• i.e.
in the example we run a simple fixed effects panel regression model, which is equivalent to including a dummy variable for each group (in this case each education-experience group)

Running a Panel Regression: command
[Screenshot: panel regression command]

Running a Panel Regression: Output
[Screenshot: panel regression output]

Running Regressions: Panel Models
• There are other features of panel estimators which are helpful
• Heteroscedasticity: the variance is not constant, but varies across groups
• xtpcse [varlist], hetonly reports standard errors corrected for panel heteroscedasticity
• xtgls [varlist], p(h) corrects coefficients and standard errors for panel heteroscedasticity, but may produce biased results depending on the group and time dimension of the panel
• Note: p(h) after the comma is a so-called “option” in the STATA syntax

[Figure: Heteroscedasticity within a group (Y plotted against x)]
[Figure: Heteroscedasticity in panel models across groups (Y plotted against x)]

Running Regressions: Panel Models
• Contemporaneous correlation across cross-sections: the error terms are contemporaneously correlated across cross-sections, e.g. due to macroeconomic disturbances
• xtgls [varlist], p(c) corrects for contemporaneous correlation and panel heteroscedasticity, but may produce biased results depending on the group and time dimension of the panel

Next Meeting: July 4!
Presentation: July 18
Room RZ 01.02
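
Recap sketch: the commands from this session could be strung together in one do-file roughly as follows. This is only a minimal sketch on simulated data, not the seminar dataset. The seed, the number of groups and years, and the numbers used to generate the migrant share and the log wage are all invented for illustration. Only the command names and the variable names (wqjt, ln_wqjt, mqjt, ed, ex, year, Ded1 …, index) follow the slides. It is meant to be run as a do-file, since the // comments are not accepted at the interactive prompt.

```stata
* --- Hypothetical toy panel: 4 education x 8 experience groups, 10 years ---
clear
set seed 12345
set obs 320
gen ed   = ceil(_n/80)                    // education group 1..4
gen ex   = mod(ceil(_n/10) - 1, 8) + 1    // experience group 1..8
gen year = 2000 + mod(_n - 1, 10)         // years 2000..2009

* Invented data-generating process (assumption, for illustration only)
gen mqjt    = 0.3*runiform()                              // migrant share
gen ln_wqjt = 4.7 - 1.4*mqjt + 0.1*ed + rnormal(0, 0.3)   // log wage
gen wqjt    = exp(ln_wqjt)                                // wage level

* --- Graphs ---
twoway line wqjt year if ed==1 & ex==1
twoway (line wqjt year, yaxis(1)) (line mqjt year, yaxis(2)) if ed==1 & ex==1
twoway scatter wqjt mqjt if ed==1 || lfit wqjt mqjt if ed==1

* --- Dummies, a global, and OLS ---
tabulate ed, generate(Ded)     // creates Ded1 ... Ded4
global Di Ded2 Ded3 Ded4       // Ded1 left out as the base category
regress ln_wqjt mqjt $Di

* --- Panel setup and panel estimators ---
egen index = group(ed ex)      // one index per education-experience cell
tsset index year
xtreg  ln_wqjt mqjt, fe        // fixed effects
xtpcse ln_wqjt mqjt, hetonly   // heteroscedasticity-corrected standard errors
xtgls  ln_wqjt mqjt, p(h)      // FGLS with panel heteroscedasticity
```

Leaving one dummy out of the global avoids perfect collinearity with the constant; if all dummies are listed, regress drops one of them automatically. The p(c) variant of xtgls is not included because this toy panel has more groups (32) than time periods (10), which is exactly the kind of group and time dimension the warning on the last slide refers to.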