stata hands on session

SIMULATIONS MODELS FOR INTERNATIONAL TRADE
GRAVITY EQUATIONS FOR INTERNATIONAL TRADE MODELS
Paris-Dauphine / September 2015
DOCUMENT 2:
STATA HANDS ON SESSION
Ramón Mahía – UAM
(Based on the material provided by UNCTAD-WTO; see note 1)
Complete modified and commented DO File: DO_MODIFIED_COMMENTED
1.- MANIPULATION OF DATA (Previous to Econometric estimation)
(Steps 1 to 7 as described in Chapter 3 – UNCTAD/WTO.)
Several operations must be performed before estimation (see DO_MANIPULATION_COMMENTED):
- Download the datasets from their sources and import them into a single software format (Stata .dta, EViews wf, ...)
- Homogenize the formats of the different datasets: list of countries, country names, variable names, “names” for years
- Replace missing values (zeros for trade, a functional code such as 999 for true missing values, ...)
- Generate the structure of the gravity model dataset: all possible combinations of countries (and years if a panel is used)
- Merge the different files into a single one
- Generate dummies (if needed) for year, country, and year x country
- Compute log variables (for GDP, trade and distance)
Step 1:
- Import the CSV trade flows file (tradeflows.csv), label the variables and save it in .dta format
- Import the txt file “joinwto.txt”, with the year of accession of each country, and save it in .dta format
- Import the CSV file “GDP.csv”, with GDP data for each country from 1960 to 2006. Replace BELGIUM and LUXEMBOURG with BENELUX, compute BENELUX GDP as the sum of both countries, rename the year variables and save it in .dta format
- Open the Stata data file containing the rest of the explanatory variables, fix the BENELUX problem, rename some variables, label some others and save it in .dta format
Note 1 (IMPORTANT): The content of this document, and especially the exercise section, is based on the document prepared by UNCTAD-WTO entitled “A Practical Guide to Trade Policy Analysis” (Chapter 3, “Analyzing bilateral trade using the gravity equation”). To access the online version of this UNCTAD-WTO document, visit the web page: http://vi.unctad.org/tpa/index.html
At the end of Step 1, four different Stata files have been created and stored in the default directory:
1. tradeflows.dta (endogenous variable): a panel dataset by YEAR and PAIR of countries, in LONG format
2. joinwto.dta (for the explanatory variable “wtoaccesion”): a cross-section dataset for INDIVIDUAL countries
3. GDP.dta (for the GDP explanatory variables): a panel dataset by YEAR and INDIVIDUAL country, in WIDE format
4. CEPII.dta (other explanatory variables, in LONG format): a cross-section dataset for PAIRS of countries
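The BENELUX aggregation in Step 1 can be sketched as follows. This is only a minimal illustration, not the course DO file: the figures are invented toy values typed in with input (the real GDP.csv would be read with import delimited), and the variable name country is an assumption. The yr prefix follows the reshape command shown later in this document.

```stata
* Toy stand-in for GDP.csv in WIDE format (invented figures, NOT real GDP data)
clear
input str12 country yr2000 yr2001
"BELGIUM"     230  235
"LUXEMBOURG"   20   21
"FRANCE"     1300 1340
end

* Recode both countries to a single BENELUX unit ...
replace country = "BENELUX" if inlist(country, "BELGIUM", "LUXEMBOURG")

* ... and add up their GDP year by year (one row per country remains)
collapse (sum) yr2000 yr2001, by(country)
list, clean
```

With these toy values BENELUX ends up with 250 and 256. Note that collapse (sum) treats a missing value as zero, so a year in which only one of the two countries reports GDP would need separate handling.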
Step 2:
- Starting with “tradeflows.dta”, create the FULL structure of the data file: PANEL DATA by YEAR and every possible combination (PAIR) of countries, filling the newly created pairs with zeros. The temporary file created is "gravity_temp1.dta"
- Reshape GDP.dta to a LONG panel dataset and create a duplicate (GDP is going to be used as both importer's GDP and exporter's GDP)
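One way to build the full pair-by-year structure is Stata's fillin command. The sketch below is an assumption about how this could be done, not necessarily what the course DO file does; the toy data and the variable names exporter, importer and imports are hypothetical.

```stata
* Toy trade flows in LONG format (invented values)
clear
input str3 exporter str3 importer year imports
"FRA" "DEU" 2000 10
"DEU" "FRA" 2000 12
"FRA" "ESP" 2001  5
end

* Create every exporter x importer x year combination observed in the data;
* fillin flags the rows it adds with _fillin == 1
fillin exporter importer year

* Treat the newly created pairs as zero trade
replace imports = 0 if _fillin == 1

* A country does not trade with itself
drop if exporter == importer
drop _fillin
list, sepby(year)
```

fillin only crosses the values that actually appear in each variable (here ESP never appears as an exporter), so the country lists must be complete on both sides before running it. Filling with zeros also assumes that an absent record means no trade rather than unreported trade.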
Syntax of the reshape command (WIDE to LONG), where j() names the new variable:
    reshape long stub, i(i) j(j)
Applied to GDP.dta:
    reshape long yr, i(countrycode) j(year)
    rename yr gdp
Step 3:
- MERGE the two new files (“GDP_exporter.dta” and “GDP_importer.dta”) with "gravity_temp1.dta", keeping only those observations (PAIRS of countries) with information in both files.
- MERGE “joinwto.dta” with that file, creating two new variables: join_exporter and join_importer.
- The new temporary file created is "gravity_temp3.dta"
Step 4:
- MERGE the data of the two files “CEPII.dta” (previously saved) and “religion.dta” with the previous one.
- The new temporary file created is "gravity_temp4.dta"
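The merges described above all follow the same pattern: a country-level (or pair-level) file is attached to the pair-by-year file. A minimal sketch for the GDP merge, using invented toy values, hypothetical variable names and temporary files instead of the real .dta files:

```stata
* Toy GDP file in LONG format (invented values); ESP is deliberately absent
clear
input str3 country year gdp
"FRA" 2000 1300
"DEU" 2000 1900
end

* Duplicate it: one copy keyed by exporter, one keyed by importer
tempfile gdp_exporter gdp_importer
preserve
rename (country gdp) (exporter gdp_exporter)
save `gdp_exporter'
restore
rename (country gdp) (importer gdp_importer)
save `gdp_importer'

* Toy pair-by-year file standing in for gravity_temp1.dta
clear
input str3 exporter str3 importer year imports
"FRA" "DEU" 2000 10
"DEU" "FRA" 2000 12
"FRA" "ESP" 2000  5
end

* Many pairs share one country-year GDP record, hence m:1;
* keep(match) keeps only observations found in both files
merge m:1 exporter year using `gdp_exporter', keep(match) nogenerate
merge m:1 importer year using `gdp_importer', keep(match) nogenerate
list, clean
```

The FRA-ESP row is dropped by the second merge because ESP has no GDP record, which is the "information in both files" rule in action. The merges with the accession years and with the pair-level files would use the same merge m:1 pattern with the appropriate key variables.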
Step 5:
- Create WTO accession dummies depending on whether one, none or both countries are WTO members (onein, nonein, bothin)
- The new PERMANENT file created is "gravity.dta". It contains the core dataset (endogenous and exogenous variables, except for the country / country x time / time dummies and some remaining transformations)
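The three membership dummies can be derived from the accession years. The rule used below (a country counts as a member from its accession year onwards) is an assumption made for this sketch; join_exporter and join_importer are the variable names given in Step 3, and the rows are a toy example.

```stata
* Toy rows: accession years of exporter and importer for the year 2000
clear
input str3 exporter str3 importer year join_exporter join_importer
"FRA" "DEU" 2000 1995 1995
"FRA" "CHN" 2000 1995 2001
"CHN" "VNM" 2000 2001 2007
end

* Membership status of each side in the given year.
* A missing accession year compares as larger than any year in Stata,
* so never-members get 0 here.
generate byte exp_in = year >= join_exporter
generate byte imp_in = year >= join_importer

generate byte bothin = exp_in & imp_in
generate byte onein  = (exp_in + imp_in) == 1
generate byte nonein = !exp_in & !imp_in

* Exactly one of the three dummies must be 1 in every row
assert bothin + onein + nonein == 1
list exporter importer year bothin onein nonein, clean
```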
The structure of the main dataset is shown in the next screenshot: each row contains a trade flow (import) and the variables for the gravity equation (GDPs, and the terms for barriers and incentives), EXCEPT FOR the MRT dummies.
Step 6:
- Create the country / country x time / time dummies for the specification of the MRT terms and the time fixed effects
- In this block, due to memory restrictions, three different options are offered if the number of dummies exceeds Stata's capacity:
   o Option selected in this example: reduce the number of years (year > 1995, i.e. 1996-2005)
   o Compute country-period (rather than country-year) dummies
   o Make a balanced panel (reducing the sample to those countries with information for the same time period)
Step 7:
- Create the logs of the GDP variables and distance
- Compute five-year averages of some variables
- Create a subset with OECD countries for the period 1996-2005
- Create a subset with OECD countries for the period 2000-2005
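The dummy and log creation can be sketched as below. The data are invented toy rows and the raw variable names (imports, gdp_exporter, gdp_importer, dist) are assumptions; the generated names (exporter_*, importer_*, year_*, limports, lgdp_*, ldist) follow those used in the regressions later in this document.

```stata
* Toy pair-by-year rows (invented values)
clear
input str3 exporter str3 importer year imports gdp_exporter gdp_importer dist
"FRA" "DEU" 1995  9 1500 2500  880
"FRA" "DEU" 1996 10 1550 2550  880
"DEU" "FRA" 1996 12 2550 1550  880
"FRA" "ESP" 1997  0 1600  600 1050
end

* Option selected in the example: keep only years after 1995
keep if year > 1995

* One dummy per exporter, importer and year (exporter_1, exporter_2, ...)
quietly tabulate exporter, generate(exporter_)
quietly tabulate importer, generate(importer_)
quietly tabulate year, generate(year_)

* Country x year identifier, usable later as a time-varying fixed effect
egen exp_year = group(exporter year)

* Logs; ln(0) is missing in Stata, so zero trade flows drop out of a log-log model
generate limports      = ln(imports)
generate lgdp_exporter = ln(gdp_exporter)
generate lgdp_importer = ln(gdp_importer)
generate ldist         = ln(dist)
describe, simple
```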
2.- ECONOMETRIC ESTIMATIONS OF GRAVITY EQUATIONS
- Load dataset “gravity_OECD_2000_2005.dta”:
   o 33 countries
   o 6 years
   o for a total of 33 x (33 - 1) x 6 = 6336 records
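The record count is simply the number of directed country pairs times the number of years. A quick check of the arithmetic (the scalar names are illustrative only):

```stata
scalar n_countries = 33
scalar n_years     = 6

* Each country can import from every other country: N x (N - 1) directed pairs
scalar n_pairs   = n_countries * (n_countries - 1)
scalar n_records = n_pairs * n_years

display "Directed pairs per year: " n_pairs
display "Total records:           " n_records
```

This gives 1056 pairs per year and 6336 records in total.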
- REG1: ESTIMATE A BASIC CROSS-SECTION REGRESSION IN LOG-LOG FORM, WITHOUT MRTs, FOR A SINGLE YEAR, AND DO SOME BASIC CHECKS:
   o Check the number of valid observations for the endogenous variable “LIMPORTS” in 2000 and 2005. There could be a maximum of 33*32 = 1056 valid values, but there are only 992 because of 64 missing values due to zero trade flows (the log of zero is undefined) with origin or destination in BLX (the Belgium-Luxembourg aggregate).
   o Estimate the simplest log-linear gravity model regression for the year 2005 using only lgdp_exporter, lgdp_importer and ldistance, and interpret the parameters/elasticities
   o Check whether the GDP elasticities are close to one, as predicted by theory:
      ▪ Theory predicts a value around 1 for both elasticities.
      ▪ A difference between the origin and destination GDP elasticities is expected; a lower estimate for importer GDP would suggest evidence of home market effects (due to barriers to entry or national product differentiation).
      ▪ Meta-analysis shows that the distance coefficient is also around -1.
     Meta-analysis of 2,500 gravity equation estimates. Table extracted from Head, K., & Mayer, T. (2013). Gravity equations: Workhorse, toolkit, and cookbook. Handbook of international economics, 4.
   o Check whether trade is significantly more sensitive to trade barriers (proxied by distance) in 2005 than in 2000
      ▪ Procedure: compare the basic estimation for different years (2000 vs 2005) using seemingly unrelated estimation (Stata suest command; see note 2)
      ▪ There appears to be no statistically significant difference between the 2000 and 2005 estimates.
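The REG1 checks can be sketched as follows. To keep the example runnable on its own it uses SIMULATED data with true elasticities of 1, 1 and -1, so the results say nothing about real trade; on the course data the same commands would be run after loading gravity_OECD_2000_2005.dta. The variable names are those used in the regressions of this document.

```stata
* Simulated stand-in for the gravity dataset (NOT real data)
clear
set seed 2015
set obs 2000
generate year = cond(_n <= 1000, 2000, 2005)
generate lgdp_exporter = rnormal(26, 1.5)
generate lgdp_importer = rnormal(26, 1.5)
generate ldist         = rnormal(8, 0.8)
generate limports = -20 + lgdp_exporter + lgdp_importer - ldist + rnormal()

* Number of valid observations of the endogenous variable in 2005
count if !missing(limports) & year == 2005

* Basic log-log gravity regression for 2005
regress limports lgdp_exporter lgdp_importer ldist if year == 2005

* Are the GDP elasticities equal to 1 and the distance elasticity equal to -1?
test lgdp_exporter = 1
test lgdp_importer = 1
test ldist = -1

* 2000 vs 2005: estimate each year separately (without the robust option,
* which suest does not accept), store the results and combine them
regress limports lgdp_exporter lgdp_importer ldist if year == 2000
estimates store y2000
regress limports lgdp_exporter lgdp_importer ldist if year == 2005
estimates store y2005
suest y2000 y2005

* Cross-model test: same distance elasticity in both years?
test [y2000_mean]ldist = [y2005_mean]ldist
```

After suest, the coefficients of each stored regress model live in an equation called name_mean, which is why the final test refers to [y2000_mean] and [y2005_mean].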
- REG2: ESTIMATE ANOTHER CROSS-SECTION REGRESSION WITH ADDITIONAL VARIABLES
   o Estimate, with robust inference, for 2005 adding more variables:
        reg limports contig comlang_off onein colony REPlandlocked PARTlandlocked religion ldist lgdp* if year==2005, robust
Note 2: The seemingly unrelated estimation procedure combines the estimation results (parameter vectors and variance matrices) into one parameter vector and a simultaneous (co)variance matrix. It is run after each equation has been estimated separately. The reasoning is that the error terms of different equations might be correlated, which may affect the estimated covariance of the parameters and thus every cross-model hypothesis test involving parameters of those equations.
      ▪ The “onein” coefficient cannot be estimated (it only takes the value zero), and the same holds for “bothin” (it only takes the value 1) (tab onein if year==2005)
   o Compare the REG1 and REG2 regressions (see note 3). Check the elasticities obtained:
      ▪ The GDP coefficients appear to be slightly overestimated in the first regression, but the size, and even the sign, of this bias depends on how the omitted trade resistance / incentive variables relate to GDP for the particular countries included in the sample.
      ▪ How do you compute the effect (in %) of a dummy variable? (see note 4)
      ▪ The adjacency coefficient (“contig”) usually lies in the vicinity of 0.5 (Head, K. 2003), suggesting that trade is about 65% higher as a result of sharing a border. Omitting this variable may cause an upward bias (in absolute value) in the distance parameter (the two variables are negatively related to each other).
      ▪ Contiguity and common language seem to have very comparable effects, with coefficients around 0.5 (Head, K., & Mayer, T. (2013); see the table above).
      ▪ According to some papers, common links (language, colony, ...) may cause very significant rises in trade (up to two or three times, or even more). Colonial links are not significant in our regression given the particular nature of the sample (only OECD countries included).
      ▪ The “landlocked” variable seems to be significant only for the PARTNER (importer) country, resulting in a reduction of imports of around 30% (coeff. = -0.357, so Exp(-0.357) - 1 ≈ -0.30).
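The conversion from a dummy coefficient β in a log-log model to a percentage effect, Exp(β) - 1, can be checked directly in Stata. The two coefficient values below are the ones quoted in the discussion above (0.5 for contiguity, and a landlocked coefficient of size 0.357 taken as negative).

```stata
* Percentage effect of a dummy in a log-linear model: exp(b) - 1

* Contiguity coefficient of 0.5: trade about 65% higher
display %6.3f (exp(0.5) - 1)

* A coefficient of -0.357: trade about 30% lower
display %6.3f (exp(-0.357) - 1)

* After an actual regression, the same quantity with a standard error
* could be obtained with, e.g.:  nlcom exp(_b[contig]) - 1
```

The effect is not symmetric in the sign of β: a coefficient of +0.357 would mean about +43%, whereas -0.357 means about -30%.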
- REG3: ADDING DUMMIES TO CONTROL FOR THE MRT EFFECT
   o Keep only the 2000-2005 observations with origin or destination in an OECD country
   o 3.1 Try to estimate REG2, with robust inference, for the 2005 cross section, adding the country dummies importer_* and exporter_* to control for MRT (and also adding the year* dummies)
Note 3: For that, it is useful to use the “eststo” command (download it first if not already installed).
Note 4: Remember that, in a log-log model, the raw coefficients of dummies do not represent elasticities (% changes). The percentage effect can be easily derived as Exp(β)-1.
      ▪ Commonly, year dummies control for omitted terms causing secular / trend variation in panel data models (in our example, variation affecting world trade for every exporter-importer pair).
      ▪ Given that importer_* and exporter_* are country specific (not pair specific), they are perfectly collinear with the other country-specific variables, such as REPlandlocked, PARTlandlocked, lgdp_importer and lgdp_exporter.
      ▪ Important differences appear for the common coefficients, especially in the case of distance (“ldist”), which now exhibits an elasticity greater than one in absolute value (as expected according to the meta-analysis table previously shown).
   o 3.2 How can we add country dummies and still estimate the coefficients of country-specific variables such as GDP? A pooled OLS regression for a short period (2000-2005) could be a solution for country-specific variables that vary over time (GDP, for example), but not for time-invariant country variables (such as REPlandlocked, PARTlandlocked)
      ▪ Repeat the previous regression for the period 2000-2005
      ▪ In effect, the GDP coefficients can now be estimated and, in line with the literature, the elasticities drop substantially (down to 0.6) in this “structural” version compared to the previous estimates (without controlling for MRTs)
   o 3.3 What if we now add country x time dummies, allowing the MRTs to vary over time? (In the previous regression, the MRT terms were assumed to be constant over time.)
      ▪ The answer is that, once the MRTs are allowed to vary over time, the coefficients of country-specific time-varying variables (such as GDP) can again no longer be estimated
   o 3.4 What if we now add country-pair dummies to control for pair heterogeneity?
      ▪ Adding “pairid” fixed effects makes it impossible to estimate the coefficient of any time-invariant country-pair variable, such as distance, colony, onein, ...
      ▪ SO IF WE CONTROL FOR ALL FIXED EFFECTS AT THE SAME TIME (COUNTRY, YEAR, COUNTRY X YEAR, AND COUNTRY-PAIRS) WE THEN LOSE THE REST OF THE PARAMETERS (except for the fixed effects)
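The identification problems discussed in 3.1 to 3.4 can be reproduced on a small SIMULATED panel. Everything below is an assumption made for illustration: 8 hypothetical countries placed on a line, invented GDP and distance formulas, and factor-variable notation (i.exp_id, i.imp_id) used in place of the exporter_* and importer_* dummies of the course dataset.

```stata
* Simulated panel: 8 countries, all directed pairs, years 2000-2005 (NOT real data)
clear
set seed 2015
set obs 8
generate exp_id = _n
expand 8
bysort exp_id: generate imp_id = _n
drop if exp_id == imp_id
expand 6
bysort exp_id imp_id: generate year = 1999 + _n

* Country-year GDP, pair-level distance, and a log-log gravity outcome
generate lgdp_exporter = 25 + 0.3*exp_id + 0.5*sin(exp_id*year)
generate lgdp_importer = 25 + 0.3*imp_id + 0.5*sin(imp_id*year)
generate ldist         = ln(1 + abs(exp_id - imp_id))
generate limports = lgdp_exporter + lgdp_importer - ldist + rnormal()

* 3.1 One cross section with country dummies: GDP is collinear with the
*     dummies, so Stata reports some terms as omitted
regress limports i.exp_id i.imp_id lgdp_exporter lgdp_importer ldist ///
    if year == 2005, vce(robust)

* 3.2 Pooled 2000-2005 with country and year dummies: GDP varies within
*     country over time, so its coefficients can be estimated
regress limports i.exp_id i.imp_id i.year lgdp_exporter lgdp_importer ldist, ///
    vce(robust)

* 3.3 Country x year dummies: GDP is collinear with them again
regress limports i.exp_id#i.year i.imp_id#i.year ///
    lgdp_exporter lgdp_importer ldist, vce(robust)

* 3.4 Pair fixed effects: the time-invariant pair variable ldist is omitted
egen pairid = group(exp_id imp_id)
areg limports lgdp_exporter lgdp_importer ldist i.year, absorb(pairid) vce(robust)
```

In each output, the terms that cannot be identified are the ones Stata flags as omitted because of collinearity.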
- REG4: PANEL DATA (Step 8 in the UNCTAD-WTO document)
   o Set the panel data structure (remember that the panel unit is the “ij” pair)
   o Estimate a simple FIXED effects panel model to control for bilateral MRTs (also including time effects)
   o Note that, when controlling with FE for bilateral MRT terms, we are unable to estimate the coefficients of any TIME-INVARIANT variable, whether defined for “ij” pairs (such as distance, colony, common language, FTA) or at the level of “i” and/or “j” (such as landlocked)
   o Using RANDOM effects, we can estimate every coefficient (including those lost with FE) but, as always with RE, at the risk of biased estimates
   o Consider choosing between RE and FE with a Hausman test
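The REG4 sequence (declare the panel, FE, RE, Hausman test) can be sketched on SIMULATED data as below. The data-generating formulas, the pair effect u and the stored-estimate names are assumptions made for illustration; on the course data the same commands would be applied to the gravity dataset with its own pair identifier.

```stata
* Simulated pair-by-year panel with a pair-specific effect (NOT real data)
clear
set seed 2015
set obs 8
generate exp_id = _n
expand 8
bysort exp_id: generate imp_id = _n
drop if exp_id == imp_id
expand 6
bysort exp_id imp_id: generate year = 1999 + _n
egen pairid = group(exp_id imp_id)

* One random draw per pair, copied to all years of that pair
bysort pairid (year): generate u = rnormal() if _n == 1
by pairid: replace u = u[1]

generate lgdp_exporter = 25 + 0.3*exp_id + 0.5*sin(exp_id*year)
generate lgdp_importer = 25 + 0.3*imp_id + 0.5*sin(imp_id*year)
generate ldist         = ln(1 + abs(exp_id - imp_id))
generate limports = lgdp_exporter + lgdp_importer - ldist + u + rnormal()

* The panel unit is the ij pair, the time variable is the year
xtset pairid year

* Fixed effects: ldist is time invariant within a pair and is omitted
xtreg limports lgdp_exporter lgdp_importer ldist i.year, fe
estimates store fe_model

* Random effects: ldist can be estimated
xtreg limports lgdp_exporter lgdp_importer ldist i.year, re
estimates store re_model

* Hausman test: consistent estimator (FE) first, efficient one (RE) second
hausman fe_model re_model
```

A rejection of the null hypothesis in the Hausman test points to FE; otherwise RE is usually retained because it is more efficient and keeps the time-invariant variables.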