Paper Outline

Abstract
This is a half-page description of your conclusions and how you reached them. Don’t summarize
everything that you did. Don’t report t-values, p-values, or goodness-of-fit measures. Do highlight the
results for your variable(s) of interest and how your results compare to other studies.
I. Introduction and Motivation
Introduce the topic and your variable(s) of interest by describing what you are doing and why it is
important or unique. Provide relevant “big picture” statistics. For example, if you are estimating
gender wage discrimination, you could begin by discussing the unconditional average (or median)
wages by gender.
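A minimal sketch of such a "big picture" comparison, using only the Python standard library. The wage figures and the helper name summarize_by_group are invented for illustration and are not taken from any real dataset.

```python
from statistics import mean, median

# Hypothetical hourly wages, invented purely for illustration.
wages = [
    ("female", 18.0), ("female", 22.0), ("female", 35.0),
    ("male", 20.0), ("male", 26.0), ("male", 44.0),
]

def summarize_by_group(rows):
    """Return {group: (mean, median)} without conditioning on any covariates."""
    groups = {}
    for group, wage in rows:
        groups.setdefault(group, []).append(wage)
    return {g: (mean(v), median(v)) for g, v in groups.items()}

summary = summarize_by_group(wages)
for group, (avg, med) in sorted(summary.items()):
    print(f"{group}: mean={avg:.2f}, median={med:.2f}")
```

These are unconditional figures: they do not hold education, experience, or any other covariate constant, which is what the regression later in the paper is for.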
II. Literature Review
Have other people studied this issue or one related to it (in particular, search the peer-reviewed
literature with broad enough key words to find appropriate studies), and what conclusions did they
reach? See your Stats notebook for suggestions on places to find peer-reviewed literature.
III. Model Development
Write down the standard Mincerian wage regression.
(1)  $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_K X_{Ki} + \varepsilon_i$
Do not discuss the economic theory behind every variable. Rather, pick (at least) one explanatory
variable of interest and carefully develop the theory behind it in the context of a linear regression
model as given in equation (1), where i indexes individuals, K is the number of slope parameters,
and $\varepsilon_i$ represents a random error term. Should $\beta_k$, the marginal influence of $X_k$ on Y, be positive
or negative and why? Does your theory specify a linear relationship¹ between the variables and, if
not, is it appropriate to use a linear representation to approximate the relationship? Any relationship
evident from your data is not relevant at this point. You may have to explain that you will assume
linearity because your theory may be non-linear or it may not specify the nature of the relationship.
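As an illustration only, one common textbook form of the Mincerian wage regression can be written as a special case of equation (1). The symbols $w_i$ (wage), $S_i$ (years of schooling), and $E_i$ (years of potential experience) are naming assumptions made for this sketch, not notation from this outline.

```latex
\ln(w_i) = \beta_0 + \beta_1 S_i + \beta_2 E_i + \beta_3 E_i^2 + \varepsilon_i
```

This form is quadratic in experience but still linear in the parameters, so it fits equation (1) with $Y_i = \ln(w_i)$, $X_{1i} = S_i$, $X_{2i} = E_i$, and $X_{3i} = E_i^2$.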
¹ Equation (1) only includes linear effects, but you may find that non-linear effects are appropriate
to include for some independent variables.
IV. Discussion of the Data
Are the data appropriate to test the theory for your variable of interest? How closely do your data
match the theoretical regression in (1)? Describe the data sources, the sampling procedures used to
collect the data, and possible nonsampling errors.
Include a table with “raw” (i.e., not yet cleaned) summary statistics for your dataset as it is right
after you download it and get it into SAS, Stata, or Excel. Next, describe the steps you took to
“clean” your data, justifying your decisions. For example, if you have a variable called income and
the summary statistics indicate that the maximum value for it is 999999, then that likely indicates
that 999999 is not a real value for a person’s annual income, but is instead a placeholder which the
data collectors used when a person did not answer the question about income or perhaps did not
know their income. Of course, it could be a real income value, so read the documentation
accompanying your data carefully and investigate any values which seem out of the ordinary.
Another example of data that would need to be cleaned is if a tabulation of years of education
includes the value 0. While that looks like it could be a true value of education, it may actually
mean a person did not even complete 1st grade, or it could be another placeholder, perhaps
indicating that the variable years of education is “Not Available” for that individual. Check the
person’s age: it might be a 3-year-old child, whom you would not want to include in a Mincerian
wage determination analysis. A third example is data which are simply incorrect, such as birth
month equal to 13 or weeks unemployed last year equal to 75 (where the person may have misread
or misinterpreted the question “How long were you unemployed in the last year?” to be “How long
have you been unemployed?”).
When you find such data, you can either correct them if you are able to discern the true value, or you
can change them to a period (actually type in “.”), which commonly means “missing value.”
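The cleaning rules described above could be sketched in Python as follows. This is an illustration under stated assumptions, not a prescribed procedure: the field names, the records, and the min_age cutoff of 16 are all hypothetical, and 999999 is treated as a placeholder only because this sketch assumes the data documentation says so. Python's None plays the role of the "." missing value.

```python
# Hypothetical records, invented for illustration.
raw = [
    {"income": 42000, "educ_years": 12, "age": 35, "birth_month": 4, "weeks_unemp": 0},
    {"income": 999999, "educ_years": 16, "age": 41, "birth_month": 13, "weeks_unemp": 75},
    {"income": 0, "educ_years": 0, "age": 3, "birth_month": 7, "weeks_unemp": 0},
]

def clean(record, income_placeholder=999999, min_age=16):
    """Return a cleaned copy, or None if the person is outside the sample.

    The placeholder code and the age cutoff are assumptions of this sketch;
    check them against the documentation for your own data.
    """
    if record["age"] < min_age:
        return None  # e.g., a 3-year-old does not belong in a wage regression
    out = dict(record)
    if out["income"] == income_placeholder:
        out["income"] = None  # documented "no answer" code, not a real income
    if not 1 <= out["birth_month"] <= 12:
        out["birth_month"] = None  # impossible month
    if not 0 <= out["weeks_unemp"] <= 52:
        out["weeks_unemp"] = None  # more weeks than a year contains
    return out

cleaned = [c for c in map(clean, raw) if c is not None]
for row in cleaned:
    print(row)
```

Each rule is a separate, commented line so that the justification for every cleaning decision can be stated alongside it in the paper.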
You should also make the units of your variables clear. Do you have binary variables (only 0 or 1)?
Do you have ordinal variables (self-rated health on a scale of 1 to 5)? Are monetary values
expressed in dollars, thousands of dollars, millions? Are they real or nominal? If real, what is the
base year(s)? Are rates – such as interest rates, unemployment rates, unionization rate, remarriage
rate – measured as a percent or a proportion?
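Two of the unit questions above reduce to simple arithmetic, sketched here. The function names and the numbers are hypothetical; the price index values in particular are invented and do not come from any published series.

```python
def to_real(nominal, price_index, base_index=100.0):
    """Express a nominal amount in base-period dollars.

    price_index is the index for the period in which the amount was measured;
    base_index is the index value in the chosen base period.
    """
    return nominal * base_index / price_index

def percent_to_proportion(rate_percent):
    """Convert a rate stated in percent (5.5) to a proportion (0.055)."""
    return rate_percent / 100.0

# Invented example: $50,000 measured when the index stood at 125 (base = 100).
real_income = to_real(50000.0, price_index=125.0)
unemp_rate = percent_to_proportion(5.5)
print(real_income, unemp_rate)
```

Whichever convention you choose, state it in the data section, because it changes how the coefficient on that variable is read.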
Summarize this cleaned data, which you will use to run the regressions, with appropriate sample
statistics (mean, median, percentiles, standard deviation, coefficient of variation, etc.). What do the
statistics tell you about the sample (e.g., the degree of dispersion or the symmetry of the data)? Do
you notice any interesting facts or observations? Might those observations affect your analysis and,
if so, how?
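The sample statistics listed above can be computed with Python's standard library, as in this sketch. The values are invented, missing observations are assumed to be stored as None, and the helper name describe is hypothetical.

```python
from statistics import mean, median, stdev, quantiles

def describe(values):
    """Summary statistics for one variable, skipping missing (None) values."""
    obs = [v for v in values if v is not None]
    avg = mean(obs)
    sd = stdev(obs)                      # sample standard deviation (n - 1)
    p25, p50, p75 = quantiles(obs, n=4)  # quartile cut points
    return {
        "n": len(obs),
        "mean": avg,
        "median": median(obs),
        "sd": sd,
        "cv": sd / avg,                  # coefficient of variation
        "p25": p25,
        "p75": p75,
    }

# Invented hourly wages with one missing value and one large value.
stats = describe([10.0, 12.0, 14.0, 16.0, None, 48.0])
for name, value in stats.items():
    print(f"{name}: {value}")
```

In this invented sample the mean exceeds the median, which is the kind of asymmetry worth mentioning when you discuss the data.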
V. Analyses
Write down and estimate a model similar to equation (1) and properly interpret the results, focusing
on your variable(s) of interest. In your discussions, do not discuss the mechanics of the calculation
but, instead, emphasize the implications of the calculations. Do not simply paste the statistical
results as presented by SAS, Stata, or Excel; instead, use your word processor to create more
informative tables. For example, a table like the following is much more reader-friendly and
informative than a series of regression outputs from SAS, Stata, or Excel. You may use the “outreg”
command in Stata, which automatically formats your results in the style used in many economics
journals. You should report the full regression results (I have only listed one explanatory variable
below for brevity), but highlight your variable(s) of interest, making sure to discuss statistical
significance and economic significance (that is, you may find a statistically significant effect, but it
might be very small in magnitude).
Table #
Regression Results
Dependent variable is …

Covariates                           Model (1)    Model (2)    Model (3)
X_k
  (standard error or t-statistic)
Constant (intercept)
  (standard error or t-statistic)
Coefficient of Determination
Adjusted R²
Standard Error of the Estimate

Absolute value of t-statistics/standard errors in parentheses. * significant at 10%; ** significant at 5%; *** significant at 1%
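If you build the table outside Stata, the layout above can be sketched in a few lines of Python. Everything here is illustrative: the function names, the variable names, and the coefficient, standard error, and p-value figures are invented placeholders, not results.

```python
def stars(p_value):
    """Significance markers following the note under the table."""
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.10:
        return "*"
    return ""

def format_table(models, variables, label_width=22, col_width=14):
    """models: list of dicts mapping variable -> (coef, std_err, p_value)."""
    header = "Covariates".ljust(label_width) + "".join(
        f"Model ({i})".rjust(col_width) for i in range(1, len(models) + 1)
    )
    lines = [header]
    for var in variables:
        coef_row = var.ljust(label_width)
        se_row = "".ljust(label_width)
        for model in models:
            if var in model:
                coef, se, p = model[var]
                coef_row += f"{coef:.3f}{stars(p)}".rjust(col_width)
                se_row += f"({se:.3f})".rjust(col_width)
            else:  # variable not included in this specification
                coef_row += "".rjust(col_width)
                se_row += "".rjust(col_width)
        lines += [coef_row, se_row]
    return "\n".join(lines)

# Placeholder numbers, invented only to show the layout.
models = [
    {"schooling": (0.080, 0.010, 0.000), "constant": (1.500, 0.200, 0.000)},
    {"schooling": (0.075, 0.011, 0.000), "experience": (0.020, 0.012, 0.096),
     "constant": (1.300, 0.210, 0.000)},
]
table = format_table(models, ["schooling", "experience", "constant"])
print(table)
```

Standard errors are placed in parentheses under each coefficient here; if you report t-statistics instead, say so in the table note.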
When you present the regression analysis, be certain to discuss the following points.
a. Interpret the estimated regression coefficient(s) of interest for the population model of equation
(1). Do they make sense intuitively? Why or why not?
b. How accurate is the estimation of the linear representation given in equation (1)? Look at the
coefficient of determination and the standard error of the estimate and compare these values to
the standards for a “good” and a “poor” regression. How does it compare to others’ work?
c. Perform the following hypothesis tests to determine if there is a linear relationship as specified
in equation (1). Clearly interpret the results for your variable(s) of interest.
H0: $\beta_1 = \dots = \beta_K = 0$
H0: $\beta_k = 0$ (appropriate one-tailed alternative)
This second hypothesis tests statistical significance. Also discuss economic significance.
d. You do not need to test the least squares assumptions for this paper.
e. What are the drawbacks of your analysis (e.g., omitted variables and their consequences, sample
selection bias)?
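To make the quantities in points b and c concrete, here is a sketch for the one-regressor case (K = 1) in plain Python. The data are invented and the function name simple_ols is hypothetical; with several regressors you would rely on SAS, Stata, or another statistical package rather than these hand formulas. The sketch does not look up critical values, so compare the t- and F-statistics to a table at your chosen significance level.

```python
from math import sqrt

def simple_ols(x, y):
    """OLS of y on a constant and one regressor x, with fit and test statistics."""
    n, k = len(x), 1
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope estimate
    b0 = y_bar - b1 * x_bar             # intercept estimate
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    df = n - k - 1                      # residual degrees of freedom
    r2 = 1 - sse / sst                  # coefficient of determination
    see = sqrt(sse / df)                # standard error of the estimate
    se_b1 = see / sqrt(sxx)
    return {
        "b0": b0, "b1": b1, "r2": r2,
        "adj_r2": 1 - (1 - r2) * (n - 1) / df,
        "see": see, "se_b1": se_b1,
        "t_b1": b1 / se_b1,                      # tests H0: beta_1 = 0
        "f_stat": (r2 / k) / ((1 - r2) / df),    # tests H0: all slopes = 0
    }

# Invented data for illustration only.
result = simple_ols([1, 2, 3, 4, 5], [2.0, 4.1, 5.9, 8.2, 9.8])
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

With a single regressor the F-statistic equals the square of the slope's t-statistic, so the two hypothesis tests in point c differ only in their alternatives; with K > 1 they answer different questions.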
VI. Conclusion
Discuss the implications of the analysis for your theory. Compare your results to previous work.
References (including citations for the data)
Appendices
Include your SAS or Stata program with appropriate comments in it (so that a reader can follow it
more easily), SAS or Stata output, and Excel printouts. These should have no additional writing