ASSIGNMENT #2 - DUE APRIL 11, 2016 DATA ANALYSIS USING STATA

NO LATE ASSIGNMENTS ACCEPTED; NO ASSIGNMENTS SENT VIA EMAIL ACCEPTED.
The following questions use the Excel data, “gravity.xlsx”, posted on the course website.
Your job is to use the data provided in order to analyze the determinants of global trade.
Answer each of the following questions with a short written answer in complete sentences, and attach a
hard copy of the Stata output that supports your answer. This output must consist of screen
captures from Stata, not from Excel or any other program; otherwise, zero points will be given.
Feel free to work in groups of up to four people. When you submit your assignment, please submit
one assignment per group and write down the names of all people who have worked on the
assignment. Each individual in the group will receive the same grade for the assignment.
Collaboration is encouraged; copying answers is not. If the assignment of group/person A appears to
be identical to that of group/person B, but A's name is not on B's assignment or vice-versa, then both
A and B will receive an automatic zero on the assignment.
The dataset consists of 1188 observations on 13 variables for a set of 108 bilateral country pairs. The
following comprises the full set of variable names and labels in the dataset.
year = the year of the observation
ctry1 = the abbreviation for the first country in the country pair (e.g., CAN = Canada)
ctry2 = the abbreviation for the second country in the country pair (e.g., AUS = Australia)
trade12 = the value of exports from ctry1 to ctry2 (in millions of 1990 US dollars)
trade21 = the value of exports from ctry2 to ctry1 (in millions of 1990 US dollars)
gdp1 = gross domestic product of ctry1 (in millions of 1990 US dollars)
gdp2 = gross domestic product of ctry2 (in millions of 1990 US dollars)
dist = the distance separating the capital cities of ctry1 and ctry2 (in kilometers)
border = a dummy variable equal to 1 if ctry1 and ctry2 share a border and 0 if otherwise (e.g., if
Canada is ctry1 and the US is ctry2, then border = 1; if Canada is ctry1 and India is ctry2,
then border = 0)
empire = a dummy variable equal to 1 if ctry1 and ctry2 were in the same empire either now or in the
past and 0 if otherwise (e.g., if Canada is ctry1 and the US is ctry2, then empire = 0; if
Canada is ctry1 and India is ctry2, then empire = 1)
ervol = a measure of the volatility of the nominal exchange rate between ctry1’s currency and ctry2’s
currency
fixed = a dummy variable equal to 1 if ctry1 and ctry2 have a fixed nominal exchange rate and 0 if
otherwise
language = a dummy variable equal to 1 if ctry1 and ctry2 share the same language and 0 if otherwise
(e.g., if Canada is ctry1 and the US is ctry2, then language = 1; if Canada is ctry1 and India is
ctry2, then language = 1)
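As a starting point, the dataset described above could be loaded and checked against the stated counts. This is only a minimal sketch: it assumes that gravity.xlsx sits in Stata's current working directory and that the first row of the sheet holds the variable names, neither of which is stated in the assignment. The helper variable pairtag is a hypothetical name used only for this check.

```stata
* Assumption: gravity.xlsx is in the current working directory
* and its first row contains the variable names.
import excel using "gravity.xlsx", firstrow clear

* List variable names and storage types; 13 variables are expected.
describe

* Count observations; 1188 are expected.
count

* Tag one row per (ctry1, ctry2) combination and count the tags;
* 108 country pairs are expected.
egen pairtag = tag(ctry1 ctry2)
count if pairtag
drop pairtag
```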
1) Generate the sum of trade12 and trade21 and call it “trade”; likewise, generate the sum of gdp1 and
gdp2 and call it “gdp”. Report the sample mean, minimum, maximum, and standard deviation for
“trade” and “gdp”. Generate and attach a histogram for each of these variables.
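The steps in question 1 might be sketched as follows, assuming the data are already in memory. The graph names hist_trade and hist_gdp are arbitrary labels chosen for illustration.

```stata
* Assumption: the gravity data are already loaded.
generate trade = trade12 + trade21
generate gdp   = gdp1 + gdp2

* summarize reports the number of observations, mean,
* standard deviation, minimum, and maximum.
summarize trade gdp

* One histogram per variable; name() keeps both graph windows open.
histogram trade, name(hist_trade, replace)
histogram gdp,   name(hist_gdp, replace)
```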
2) Generate the natural log of the product of trade12 and trade21 and call it “lntrade”; likewise,
generate the natural log of the product of gdp1 and gdp2 and call it “lngdpprod”. Report the sample
mean, minimum, maximum, and standard deviation for “lntrade” and “lngdpprod”. Generate and
attach a histogram for each of these variables.
3) Generate a scatterplot of “trade” versus “gdp” (with “trade” on the y-axis) and “lntrade” versus
“lngdpprod” (with “lntrade” on the y-axis). Describe your findings.
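Questions 2 and 3 could be approached along these lines. The sketch assumes that “trade” and “gdp” already exist from question 1; the graph names are arbitrary. Note that Stata's ln() returns a missing value for zero or negative arguments, so any observation with zero trade in either direction would drop out of “lntrade”.

```stata
* Logs of the products of the bilateral flows and of the two GDPs.
generate lntrade   = ln(trade12 * trade21)
generate lngdpprod = ln(gdp1 * gdp2)

summarize lntrade lngdpprod
histogram lntrade,   name(hist_lntrade, replace)
histogram lngdpprod, name(hist_lngdpprod, replace)

* scatter takes the y-axis variable first, then the x-axis variable.
scatter trade gdp,         name(sc_levels, replace)
scatter lntrade lngdpprod, name(sc_logs, replace)
```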
4) Generate the natural log of distance and call it “lndist”. Estimate the coefficients in the log-log
regression of “lntrade” on “lngdpprod” and “lndist”, making sure to include a constant. Interpret the
coefficient estimates. Formulate appropriate null and alternative hypotheses based on your
expectations about the sign of the coefficient associated with “lndist”, test them at the 1% level of
significance, and interpret the result of the test in plain English. Construct a 99% confidence interval for the
coefficient associated with “lndist”. What is the interpretation of this 99% confidence interval?
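One possible way to set up the estimation in question 4 is sketched below, assuming “lntrade” and “lngdpprod” have been generated as in question 2. The hypotheses and their interpretation are left to the written answer; the last line only illustrates how a one-sided p-value could be recovered from the stored results.

```stata
generate lndist = ln(dist)

* regress includes a constant by default;
* level(99) requests 99% confidence intervals in the output table.
regress lntrade lngdpprod lndist, level(99)

* The p-values in the regress table are two-sided. The upper-tail
* probability of |t| is the one-sided p-value, provided the estimate
* has the sign stated in the alternative hypothesis.
display ttail(e(df_r), abs(_b[lndist] / _se[lndist]))
```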
5) Perform a test of joint significance (that is, an F-test) for the model in question 4, using a 1% level
of significance. Explain the results.
6) Perform a RESET test for the model in question 4, using a 10% level of significance. Explain the
results.
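The tests in questions 5 and 6 could be run after re-estimating the model from question 4, as in this sketch. It assumes the variables from the earlier questions exist. Stata's estat ovtest is used here as the RESET test; by default it adds powers of the fitted values to the regression.

```stata
* Re-estimate the question 4 model without printing the table again.
quietly regress lntrade lngdpprod lndist

* Joint test that both slope coefficients are zero. This matches the
* overall F statistic shown in the header of the regress output.
test lngdpprod lndist

* Ramsey RESET test based on powers of the fitted values.
estat ovtest
```

The reported p-values can then be compared with the 1% and 10% significance levels named in the questions.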
Now, you are going to specify an appropriate model on your own and use it to quantify the
determinants of international trade between countries.
7) Which variable would you choose as your dependent variable, “trade” or “lntrade”? Explain.
8) For the dependent variable that you chose under 7), you need to select some variables apart from
GDP and distance which simply must be in the model. Name one more variable which simply must be
in the model and explain why and what your expectations are regarding its sign. Now, name two more
variables you may want to include, but about which you are not yet sure. Explain your answer and
what your expectations are regarding their signs.
9) Estimate a few models including the variables you named in 7) and 8). For your preferred set of
explanatory variables, estimate both a linear model and a log-log model. Report the regression results
of all the models you estimated, indicate which one is your preferred specification, and explain why
(make sure you refer to and/or test for omitted variable bias, other specification errors,
multicollinearity, and serial correlation).
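Some of the diagnostics mentioned in question 9 might be produced as sketched below. Everything specific here is an assumption for illustration: the variables in the local macro extra are placeholders rather than a recommended specification, pair and ehat are hypothetical helper names, year is assumed to be stored as an integer, and the residual regression is only an informal first-order check, not a formal serial correlation test.

```stata
* Placeholder regressors; replace with the variables chosen in 7) and 8).
local extra "border language"

regress lntrade lngdpprod lndist `extra'

* Variance inflation factors as a multicollinearity check.
estat vif

* RESET test as a check for omitted variables or functional form.
estat ovtest

* Declare the panel structure so that lag operators work within pairs.
egen pair = group(ctry1 ctry2)
xtset pair year

* Informal check: regress the residuals on their own first lag.
predict ehat, residuals
regress ehat L.ehat
```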
10) Name one more variable which is not in the dataset, but should be in the model given the time
period of the sample. Explain why and what your expectations are regarding its sign. Given this
answer, what can you likely conclude about the determinants of international trade between countries?