GRAPHS

Introduction to STATA III:
Graphs and Regressions
Prof. Dr. Herbert Brücker
University of Bamberg
Seminar “Migration and the Labour Market”
June 27, 2013
1 GRAPHS
• Present your data graphically
• It is usually helpful to present the main
information/variables in your data set graphically
• There are many graph commands; use the
Graphics menu to browse them
• The simplest way is to show the development of your
variable(s) over time
• Syntax:
• graph twoway line [variable1] [variable2] if …
• graph twoway line wqjt year if ed==1 & ex==1
• This produces a two-dimensional graph with the
wage on the vertical and the year on the horizontal
axis for education group 1 and experience group 1
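A minimal sketch of the same command on a dataset that ships with STATA, since the seminar data are not reproduced here. The dataset uslifeexp and its variables le (life expectancy) and year are assumptions that stand in for wqjt and year.

```stata
* Sketch on the built-in uslifeexp data (assumption: stands in for the seminar data)
sysuse uslifeexp, clear

* The y variable comes first, the x variable last;
* the if condition restricts the sample that is plotted
graph twoway line le year if year >= 1950
```

The if condition plays the same role as ed==1 & ex==1 above: only the selected observations are drawn.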
Making a graph
Graph of mean wage in the education 1 and experience 1 group
Graph of migration rate in the education 1 and experience 1 group
GRAPHS: Two Y-axes
• Two axes: It can be useful to display two variables
on different y-axes with different scales (e.g. wages
and migration rates)
• Syntax:
• graph twoway (line [variable1] [variable2], yaxis(1))
(line [variable3] [variable2], yaxis(2)) if …
• graph twoway (line wqjt year, yaxis(1))
(line mqjt year, yaxis(2)) if ed==1 & ex==1
• This produces a two-dimensional graph with the
wage on the first vertical axis (y1) and the migration
rate on the second vertical axis (y2)
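A two-axis graph can be tried out on the sp500 dataset that ships with STATA. This is only a sketch: close (closing price) and volume are assumptions that stand in for the wage and the migration rate, two series with very different scales. Note that each plot goes in its own parentheses after graph twoway.

```stata
* Sketch on the built-in sp500 data (assumption: stands in for the seminar data)
sysuse sp500, clear

* First plot uses the left axis, second plot the right axis;
* the if condition after the last parenthesis applies to both plots
graph twoway (line close date, yaxis(1)) ///
             (line volume date, yaxis(2)) ///
             if date >= td(01jul2001)
```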
GRAPHS: Scatter plots (I/II)
• Scatter plots display the relation between two
variables
• Syntax:
• graph twoway scatter [variable1] [variable2] if …
• graph twoway scatter wqjt mqjt if ed==1
• This produces a two-dimensional scatter plot which
shows the relation between the two variables
GRAPHS: Scatter plots (II/II)
• You can also add a linear fitted line:
• Syntax:
• graph twoway scatter [variable1] [variable2] if …
|| lfit [variable1] [variable2] if …
• graph twoway scatter wqjt mqjt if ed==1
|| lfit wqjt mqjt if ed==1
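The scatter plot with a fitted line can be sketched on the auto dataset that ships with STATA. The variables price, mpg and foreign are assumptions that stand in for wqjt, mqjt and ed.

```stata
* Sketch on the built-in auto data (assumption: stands in for the seminar data)
sysuse auto, clear

* Scatter plot plus linear fit, both restricted to the same subsample;
* /// continues the command on the next line in a do-file
graph twoway scatter price mpg if foreign==0 ///
          || lfit    price mpg if foreign==0
```

The if condition is repeated for lfit so that the line is fitted on exactly the points that are shown.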
2 Running regressions
• The standard OLS regression command in STATA is regress
• Syntax:
• regress depvar [list of indepvars] [if] [, options]
• e.g. regress ln_wijt mijt $D_i $D_j $D_t
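To see all parts of the syntax at once, here is a small sketch on the auto dataset that ships with STATA. The variable names and the choice of vce(robust) as the option are assumptions for illustration, not part of the seminar model.

```stata
* Sketch on the built-in auto data (assumption: stands in for the seminar data)
sysuse auto, clear

* depvar = price, indepvars = mpg weight,
* if = domestic cars only, option = robust standard errors
regress price mpg weight if foreign==0, vce(robust)
```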
The multivariate linear regression model
The general econometric model:
yi = β0 + β1 x1i + β2 x2i + … + βk xki + εi
where
• yi: the dependent (or: endogenous) variable
• x1i, …, xki: exogenous variables, explaining the dependent variable
• β0: constant, i.e. the y-axis intercept (the value of y if all x = 0)
• β1, β2, …, βk: regression coefficients or parameters of the regression
• εi: residual, disturbance term
Running a regression model
[Screenshot: a regression command with its parts labelled: regression command, dependent variable, independent variables, globals]
Running a Regression: Output
How to interpret the output of a regression
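. reg ln_wqkt mqkt

      Source |       SS       df       MS              Number of obs =     800
-------------+------------------------------           F(  1,   798) =  212.53
       Model |  23.4146717     1  23.4146717           Prob > F      =  0.0000
    Residual |  87.9145738   798  .110168639           R-squared     =  0.2103
-------------+------------------------------           Adj R-squared =  0.2093
       Total |  111.329246   799  .139335727           Root MSE      =  .33192

------------------------------------------------------------------------------
     ln_wqkt |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        mqkt |  -1.369118    .093913   -14.58   0.000    -1.553464   -1.184772
       _cons |   4.706176    .017403   270.42   0.000     4.672015    4.740337
------------------------------------------------------------------------------

Annotations on the output:
• Upper left block: variance of model (Model row), degrees of freedom (df column)
• Upper right block, from top to bottom: 1. Observations, 2. Fit of the model,
3. F-Test, 4. R-squared, 5. Adjusted R-squared, 6. Root mean squared error
• Coef. column: β1 (mqkt) and β0 (_cons)
• t and P>|t| columns: analysis of significance levels
• Last two columns: 95% confidence interval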
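The numbers in a regression output are linked by simple formulas, which can be checked from the results STATA stores after regress. The following is a sketch on the auto dataset that ships with STATA; price and mpg are assumptions that stand in for ln_wqkt and mqkt.

```stata
* Sketch on the built-in auto data (assumption: stands in for the seminar data)
sysuse auto, clear
regress price mpg

* t statistic = coefficient / standard error
display "t(mpg)    = " (_b[mpg]/_se[mpg])

* R-squared = model sum of squares / total sum of squares
display "R-squared = " (e(mss)/(e(mss) + e(rss)))

* F = model mean square / residual mean square
display "F         = " ((e(mss)/e(df_m))/(e(rss)/e(df_r)))

* 95% confidence interval = coefficient -/+ critical t value * standard error
scalar tcrit = invttail(e(df_r), 0.025)
display "CI lower  = " (_b[mpg] - tcrit*_se[mpg])
display "CI upper  = " (_b[mpg] + tcrit*_se[mpg])
```

Each displayed value should match the corresponding entry of the regress output printed above it.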
Recall the Borjas (2003) model
yijt = β mijt + si + xj + tt + (si ∙ xj) + (si ∙ tt) + (xj ∙ tt) + εijt
This model in STATA Syntax:
regress ln_wqjt mqjt $Di $Dj $Dt $Dij $Dit $Djt
where
• ln_wqjt: dependent variable (log wage)
• mqjt: migration share in education-experience cell
• $Di: global for education dummies
• $Dj: global for experience dummies
• $Dt: global for time dummies
• $Dij: global for education-experience interaction
dummies
• $Dit: global for education-time interaction dummies
• $Djt: global for experience-time interaction dummies
What is a global?
• A global stores a list (vector) of variable names under one name
• Defining a global:
STATA Syntax:
global [global name] [variable1] [variable2] …[variablex]
global Di Ded1 Ded2 Ded3
Using a global e.g. in a regression:
regress [depvariable] [other variable] $[global name]
regress ln_wqjt mqjt $Di
This is equivalent to:
regress ln_wqjt mqjt Ded1 Ded2 Ded3
Thus, globals are useful shortcuts for lists (vectors) of
variables.
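A short sketch of how dummies and a global can be built and used, run on the auto dataset that ships with STATA. The names rep78, Drep1 … Drep5 and the global Drep are assumptions for illustration; they play the role of the education variable, Ded1 … Ded3 and Di.

```stata
* Sketch on the built-in auto data (assumption: stands in for the seminar data)
sysuse auto, clear

* Create one dummy per category of rep78: Drep1, Drep2, ..., Drep5
tabulate rep78, generate(Drep)

* Store the list of dummies in a global (Drep1 is left out as base category)
global Drep Drep2 Drep3 Drep4 Drep5

* Show what the global contains
display "$Drep"

* These two commands estimate the same regression
regress price mpg $Drep
regress price mpg Drep2 Drep3 Drep4 Drep5
```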
An alternative to the Borjas (2003) model:
yijkt = β mijt + γk (zk ∙ mijt) + si + xj + zk + tt + (si ∙ xj) + (si ∙ zk)
+ (xj ∙ zk) + (si ∙ tt) + (xj ∙ tt) + (zk ∙ tt) + εijkt
where
• zk is a dummy for foreigners (1 if foreigner, 0 if native)
• γk is a coefficient, which captures the different impact on foreigners,
• k (k= 0, 1) is a subscript for nationality
Idea: the slope coefficient γk is significantly different from zero, if
natives and immigrants are imperfect substitutes in the labour
market.
Problem: We have to reorganize the dataset such that it delivers
the wage and unemployment rates etc. for foreigners and
natives.
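One possible way to reorganize the data is the reshape command. The sketch below uses a made-up toy dataset: all numbers and the variable names w0 (wage of natives), w1 (wage of foreigners), m (migration share), z and zm are hypothetical and only illustrate the mechanics.

```stata
* Toy data (hypothetical): one row per education-experience-year cell,
* with the native wage (w0) and the foreigner wage (w1) side by side
clear
input ed ex year m w0 w1
1 1 2000 0.10 4.5 4.2
1 1 2001 0.12 4.6 4.3
1 2 2000 0.08 4.8 4.4
1 2 2001 0.09 4.9 4.5
end

* Stack the data: one row per cell, year and nationality (z = 0 native, 1 foreigner)
reshape long w, i(ed ex year) j(z)

* Interaction of the foreigner dummy with the migration share
generate zm = z*m

list, sepby(ed ex year)
```

After the reshape, w is the wage of group z in each cell and zm can enter the regression as the interaction term.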
3 Panel Models
• Very often you use panel models, i.e. models which
have a group and a time series dimension
• There exist special estimators for this, e.g. fixed or
random effects models
• A fixed effects model is a model where you have a
fixed (constant) effect for each individual/group.
This is equivalent to a dummy variable for each
group
• A random effects model is a model where you
have a random effect for each individual/group,
which is based on assumptions on the distribution
of individual effects
Panel Models
Preparing data for Panel Models:
• For running panel models STATA needs to identify the
group(individual) and time series dimension
• Therefore you need an index for each group and an
index for each time period
• Then use the tsset command to declare your dataset
as a panel data set
• Syntax:
• tsset index year
• where index is the group/individual index and year
the time index
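If the dataset has no group index yet, one can be built from the variables that define the group. The following is a sketch on made-up toy data: the numbers are hypothetical, and the use of egen with group() to build the index is an assumption about how the cells are identified.

```stata
* Toy data (hypothetical): education-experience cells observed in two years
clear
input ed ex year wqjt
1 1 2000 4.5
1 1 2001 4.6
1 2 2000 4.8
1 2 2001 4.9
2 1 2000 5.1
2 1 2001 5.2
end

* One running number per education-experience combination
egen index = group(ed ex)

* Declare the panel structure: group index first, time index second
tsset index year
```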
Preparation: Running the tsset command
Running Regressions: Panel Models
• Then you can use panel estimators,
e.g. the xtreg estimator
• Syntax:
• xtreg depvar [list of indepvars] [if] [, options]
• xtreg ln_wijt m_ijt, fe
• i.e. in the example we run a simple fixed effects panel
regression model, which is equivalent to including a
dummy variable for each group (in this case each
education-experience group)
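The equivalence between the fixed effects estimator and a regression with one dummy per group can be checked on simulated data. This is a sketch only: the data-generating process and the names id, year, x, y and alpha are hypothetical, and the i.id notation assumes STATA 11 or later.

```stata
* Simulated panel (hypothetical): 5 groups observed over 10 years
clear
set seed 12345
set obs 5
generate id = _n
generate alpha = rnormal()            // group-specific fixed effect
expand 10
bysort id: generate year = 2000 + _n
generate x = rnormal() + alpha        // regressor correlated with the effect
generate y = 1 + 2*x + alpha + rnormal()

tsset id year

* Fixed effects estimator
xtreg y x, fe
scalar b_fe = _b[x]

* OLS with a dummy variable for each group
regress y x i.id
scalar b_lsdv = _b[x]

display "fixed effects: " b_fe "   group dummies: " b_lsdv
```

Both commands should report the same slope coefficient on x; only the way the group effects are handled in the output differs.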
Running a Panel Regression: command
Running a Panel Regression: Output
Running Regressions: Panel Models
• There are other features of panel estimators which
are helpful
• Heteroscedasticity: the variance of the error term is
not constant, but varies across groups
• xtpcse …, hetonly reports panel-corrected standard
errors that allow for heteroscedasticity across groups
• xtgls …, p(h) corrects coefficients and standard
errors for panel heteroscedasticity, but may
produce biased results depending on the group
and time dimension of the panel
• Note: hetonly and p(h) after the comma are so-called
“options” in the STATA syntax; p(h) is short for
panels(heteroskedastic)
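Both estimators can be tried on a simulated panel in which the error variance differs across groups. This is a sketch under stated assumptions: the data and the names id, year, x and y are hypothetical. Note that xtgls takes the panels() option, written p(h) in short form, while xtpcse uses the option hetonly for the heteroscedasticity-only case.

```stata
* Simulated panel (hypothetical): 10 groups, 20 years,
* error standard deviation grows with the group index
clear
set seed 2013
set obs 10
generate id = _n
expand 20
bysort id: generate year = 1990 + _n
generate x = rnormal()
generate y = 1 + 0.5*x + id*rnormal()

tsset id year

* OLS coefficients with panel-corrected, heteroscedasticity-only standard errors
xtpcse y x, hetonly

* Feasible GLS with a separate error variance for each group
xtgls y x, panels(heteroskedastic)
```

The two commands differ in what is corrected: xtpcse keeps the OLS coefficients and adjusts only the standard errors, while xtgls re-weights the observations and therefore changes the coefficients as well.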
[Figure: heteroscedasticity within a group (Y plotted against x)]
[Figure: heteroscedasticity in panel models across groups (Y plotted against x)]
Running Regressions: Panel Models
• Contemporaneous correlation across cross-sections
• Contemporaneous correlation: the error terms of
different cross-sections are correlated in the same
period, e.g. due to macroeconomic disturbances
• xtgls …, p(c) corrects for contemporaneous correlation
and panel heteroscedasticity, but may produce
biased results depending on the group and time
dimension of the panel.
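A sketch of the p(c) option on simulated data in which all groups are hit by the same shock in each year. The data and the names id, year, shock, x and y are hypothetical; the panel is balanced and has more time periods than groups, which is an assumption made so that the cross-sectional correlations can be estimated.

```stata
* Simulated panel (hypothetical): 5 groups, 30 years,
* with a common macroeconomic shock in each year
clear
set seed 42
set obs 30
generate year = 1980 + _n
generate shock = rnormal()            // same shock for all groups in a year
expand 5
bysort year: generate id = _n
generate x = rnormal()
generate y = 1 + 0.5*x + shock + rnormal()

tsset id year

* Feasible GLS allowing for heteroscedasticity and
* correlation of the error terms across groups
xtgls y x, panels(correlated)
```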
Next Meeting
July 4!
Presentation: July 18
Room RZ 01.02