Paper Outline

Abstract

This is a half-page description of your conclusions and how you reached them. Don’t summarize everything that you did. Don’t report t-values, p-values, or goodness-of-fit measures. Do highlight the results for your variable(s) of interest and how your results compare to other studies.

I. Introduction and Motivation

Introduce the topic and your variable(s) of interest by describing what you are doing and why it is important or unique. Provide relevant “big picture” statistics. For example, if you are estimating gender wage discrimination, you could begin by discussing the unconditional average (or median) wages by gender.

II. Literature Review

Have other people studied this issue or one related to it, and what conclusions did they reach? In particular, search the peer-reviewed literature with key words broad enough to find appropriate studies. See your Stats notebook for suggestions on places to find peer-reviewed literature.

III. Model Development

Write down the standard Mincerian wage regression.

(1)  $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_K X_{Ki} + \varepsilon_i$

Do not discuss the economic theory behind every variable. Rather, pick (at least) one explanatory variable of interest and carefully develop the theory behind it in the context of a linear regression model as given in equation (1), where $i$ indexes individuals, $K$ is the number of slope parameters, and $\varepsilon_i$ represents a random error term. Should $\beta_k$, the marginal influence of $X_k$ on $Y$, be positive or negative, and why? Does your theory specify a linear relationship¹ between the variables and, if not, is it appropriate to use a linear representation to approximate the relationship? Any relationship evident from your data is not relevant at this point. You may have to explain that you will assume linearity because your theory may be non-linear or it may not specify the nature of the relationship.

IV. Discussion of the Data

Are the data appropriate to test the theory for your variable of interest?
How closely do your data match the theoretical regression in (1)? Describe the data sources, the sampling procedures used to collect the data, and possible nonsampling errors. Include a table with “raw” (i.e., not yet cleaned) summary statistics for your dataset as it is right after you download it and get it into SAS, Stata, or Excel.

¹ Equation (1) only includes linear effects, but you may find that non-linear effects are appropriate to include for some independent variables.

Next, describe the steps you took to “clean” your data, justifying your decisions. For example, if you have a variable called income and the summary statistics indicate that its maximum value is 999999, then 999999 is likely not a real value for a person’s annual income but a placeholder that the data collectors used when a person did not answer the question about income or perhaps did not know their income. Of course, it could be a real income value, so read the documentation accompanying your data carefully and investigate any values which seem out of the ordinary.

Another example of data that would need to be cleaned is a tabulation of years of education that includes the value 0. While that looks like it could be a true value of education, it may actually mean that a person did not even complete 1st grade, or it could be another placeholder, perhaps indicating that the variable years of education is “Not Available” for that individual. Check the person’s age: it might be a 3-year-old child, whom you would not want to include in a Mincerian wage determination analysis.
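
The two placeholder checks described above can be sketched in code. What follows is a minimal Python illustration, not the SAS or Stata program this assignment asks for. The records, the field names (age, income, educ_years), the treatment of 999999 as a documented placeholder, and the working-age cutoff of 16 are all assumptions made for the example; your own data documentation determines the real rules.

```python
# Hypothetical raw records; field names and values are illustrative only.
RAW = [
    {"id": 1, "age": 34, "income": 52000, "educ_years": 16},
    {"id": 2, "age": 41, "income": 999999, "educ_years": 12},
    {"id": 3, "age": 3, "income": 0, "educ_years": 0},
    {"id": 4, "age": 58, "income": 38000, "educ_years": 0},
]

INCOME_PLACEHOLDER = 999999  # assumption: the codebook says this means "no answer"
MIN_WORKING_AGE = 16         # assumption: sample restriction chosen by the analyst


def clean(records):
    """Return (cleaned_records, notes) without modifying the input records."""
    cleaned, notes = [], []
    for rec in records:
        row = dict(rec)  # work on a copy so the raw data stay intact
        if row["age"] < MIN_WORKING_AGE:
            # Not part of a wage-determination sample; drop and record why.
            notes.append((row["id"], "dropped: below working age"))
            continue
        if row["income"] == INCOME_PLACEHOLDER:
            # Recode the placeholder as missing (None here, "." in SAS or Stata).
            row["income"] = None
            notes.append((row["id"], "income set to missing"))
        if row["educ_years"] == 0:
            # Could be a true zero or a placeholder; flag it for follow-up.
            notes.append((row["id"], "educ_years == 0: check documentation"))
        cleaned.append(row)
    return cleaned, notes


cleaned, notes = clean(RAW)
for note in notes:
    print(note)
```

Keeping a list of notes alongside the cleaned records makes it easier to justify each cleaning decision in the write-up.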

A third example is data which are simply incorrect, such as birth month equal to 13 or weeks unemployed last year equal to 75 (where the person may have misread or misinterpreted the question “How long were you unemployed in the last year?” to be “How long have you been unemployed?”). When you find such data, you can either correct them if you are able to discern the true value, or you can change them to a period (actually type in “.”), which commonly means “missing value.”

You should also make the units of your variables clear. Do you have binary variables (only 0 or 1)? Do you have ordinal variables (self-rated health on a scale of 1 to 5)? Are monetary values expressed in dollars, thousands of dollars, millions? Are they real or nominal? If real, what is the base year(s)? Are rates (such as interest rates, unemployment rates, unionization rate, remarriage rate) measured as a percent or a proportion?

Summarize the cleaned data, which you will use to run the regressions, with appropriate sample statistics (mean, median, percentiles, standard deviation, coefficient of variation, etc.). What do the statistics tell you about the sample (e.g., the degree of dispersion or the symmetry of the data)? Do you notice any interesting facts or observations? Might those observations affect your analysis and, if so, how?

V. Analyses

Write down and estimate a model similar to equation (1) and properly interpret the results, focusing on your variable(s) of interest. In your discussions, do not discuss the mechanics of the calculation but, instead, emphasize the implications of the calculations. Do not simply paste the statistical results as presented by SAS, Stata, or Excel; instead, use your word processor to create more informative tables. For example, a table like the following is much more reader-friendly and informative than a series of regression outputs from SAS, Stata, or Excel.
You may use the “outreg” command in Stata, which automatically formats your results in the style used in many economics journals. You should report the full regression results (I have only listed one explanatory variable below for brevity), but highlight your variable(s) of interest, making sure to discuss statistical significance and economic significance (that is, you may find a statistically significant effect, but it might be very small in magnitude).

Table #  Regression Results
Dependent variable is …

Covariates                            Model (1)    Model (2)    Model (3)
X_k
  (standard error or t-statistic)
Constant (intercept)
  (standard error or t-statistic)
Coefficient of Determination
Adjusted R^2
Standard Error of the Estimate

Absolute value of t-statistics/standard errors in parentheses.
* significant at 10%; ** significant at 5%; *** significant at 1%

When you present the regression analysis, be certain to discuss the following points.

a. Interpret the estimated regression coefficient(s) of interest for the population model of equation (1). Do they make sense intuitively? Why or why not?

b. How accurate is the estimation of the linear representation given in equation (1)? Look at the coefficient of determination and the standard error of the estimate and compare these values to the standards for a “good” and a “poor” regression. How does it compare to others’ work?

c. Perform the following hypothesis tests to determine if there is a linear relationship as specified in equation (1). Clearly interpret the results for your variable(s) of interest.

    $H_0: \beta_1 = \dots = \beta_K = 0$
    $H_0: \beta_k = 0$ (appropriate one-tailed alternative)

   This second hypothesis tests statistical significance. Also discuss economic significance.

d. You do not need to test the least squares assumptions for this paper.

e. What are the drawbacks of your analysis (e.g., omitted variables and their consequences, sample selection bias)?

VI. Conclusion

Discuss the implications of the analysis for your theory.
Compare your results to previous work.

References (including citations for the data)

Appendices

Include your SAS or Stata program with appropriate comments in it (so that a reader can follow it more easily), SAS or Stata output, and Excel printouts. These should have no additional writing