UNIVERSITY OF OTTAWA 2020-2021
DEPARTMENT OF ECONOMICS
Dr. Myra Mohnen
ECO4186 Applied Econometrics
Assignment 1

NOTE: The deadline for handing in this econometric exercise is Monday, 25 September 2020. You should hand in (1) your code, (2) your log file, and (3) a Word/PDF document with your figures, tables, and interpretations of the results. Save these documents as surname code, surname log and surname results, respectively. Upload these documents to Brightspace.

1. You are asked to construct the dataset yourself. To do so, follow these instructions:
(a) Go to the NLS Investigator website at https://www.nlsinfo.org/investigator/pages/search.jsp?s=NLSY79.
(b) You can access the data as a guest. Click "begin searching as guest".
(c) Choose the NLSY79 (National Longitudinal Survey of Youth 1979).
(d) In the “Variable Search” tab, select the variables you need for your analysis. You only need to select each variable for one year (e.g. 1979).
(e) Once you have selected your variables, review them in the “Review Selected Variables” tab. Make sure to save this page. It will help you label the variables later on.
(f) Go to the “Save/Download” tab and then “Basic Download”. From there, choose a filename and download the dataset you created.
(g) In your downloads, you will find a folder with that filename.
(h) Open the CSV file in Stata using the command “insheet using filename.csv”.
(i) Rename all variables to upper case using the command “rename *, upper”.
(j) Use the do-file provided in the downloaded folder. It contains the value labels along with commands that rename the variables. Copy and paste these to clean your dataset.
(k) Rename the variables so that each starts with a descriptive name followed by the year. For example, “rename Q3_4_* educ*”. Keep the variables CASEID_1979, SAMPLE_RACE, and SAMPLE_SEX.
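The import and renaming steps (h) to (k) can be sketched in Stata as follows. This is only an illustration under stated assumptions: mydata.csv is a placeholder file name, and the underscores in Q3_4_* are assumed (Stata variable names cannot contain spaces, so the handout's pattern is read here as Q3_4_*).

```stata
* Sketch only; "mydata.csv" is a placeholder for the file you downloaded.
insheet using "mydata.csv", clear

* Step (i): make every variable name upper case.
rename *, upper

* Step (j): paste the label and rename commands from the downloaded do-file here.

* Step (k): replace the question-number prefix with a readable one, keeping the year.
* Assumes the do-file produced names of the form Q3_4_1979 (an assumption).
rename Q3_4_* educ*
```

In Stata 13 and later, import delimited supersedes insheet, although insheet still runs.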
Variables needed for the analysis:
• Gender
• Race
• Income: total income from wages and salary in the past calendar year
• Age
• Education: highest grade completed
• Experience: total tenure (in weeks) with employer as of interview date, job #1
• Training: work experience/special training to get current job
• Married: marital status
• Occupation at job #1
• Residence: rural/urban
• Union membership

2. Clean your dataset
• Certain numerical variables will contain negative numbers. These arise because some respondents refused to answer the question (value = -1), didn’t know the answer (value = -2), skipped the question (value = -3), or weren’t asked the question (value = -5). Make sure to replace those values with missing “.”.
• Create dummy variables for female (equal to 1 if the person is a woman, 0 otherwise), minority (equal to 1 if the person is black or Hispanic, 0 otherwise), married (equal to 1 if the person is married, 0 otherwise), and urban (equal to 1 if the person lives in an urban area, 0 otherwise).
• Create a string variable categorizing the type of occupation (professional, manager, sales, clerical, craftsmen, and armed forces). The link between the occupation codes and these categories is explained on the NLSY website when you select the variables.
• Label the indicator variable for union membership.
• Label each value of the variable union appropriately.
• Compare the descriptive statistics found in the NLS codebook with your own.

3. Descriptives
(a) Create a descriptives table: for each variable, what are the average, standard deviation, minimum, and maximum values in your dataset? What is the number of observations in your dataset?
(b) Calculate the correlation coefficient between wage and years of education. Is there a strong relationship between these two variables?
(c) Create a histogram of education (highest grade completed) for each type of occupation. Describe it.

4. Tests
(a) What percentage of individuals are women, minorities, and union members?
(b) What is the difference in average wages by each of these characteristics? Is each difference statistically significant?
(c) Examine graphically the relationship between age and union membership, using a bar graph. In this graph, each bar should show the average union membership status for one of the age categories. Interpret your result.

5. Plots
(a) Create the variable log(wage).
(b) Produce a scatter plot of log wage against education, including the fitted line.

6. Partialling out vs multivariate regression
(a) Drop all observations for which log(wage) is missing.
(b) Run the following regression: educ = β0 + β1 women
(c) Predict the residuals from the above equation and store them in a variable named residual.
(d) Run the following regression: log(wage) = β0 + β1 residual
(e) Run the following regression: log(wage) = β0 + β1 educ + β2 women
(f) Compare the coefficient on residual in (d) with the coefficient on educ in (e).

7. MLR
(a) Create a table from the following regressions and interpret the results:
i. wage = β0 + β1 educ + ε
ii. log(wage) = β0 + β1 educ + ε
iii. log(wage) = β0 + β1 educ + β2 experience + ε
iv. log(wage) = β0 + β1 educ + β3 training + ε
v. log(wage) = β0 + β1 educ + β2 experience + β3 training + ε
vi. log(wage) = β0 + β1 educ + β2 experience + β3 training + β4 experience² + ε
vii. log(wage) = β0 + β1 educ + β2 union + ε
(b) Discuss the problem of omitted variable bias using equations (ii) and (iii).
(c) For specification (iv), test whether education and training are jointly significant (at a significance level of 5%).
(d) In specification (vi), what is the return to experience?
(e) Using specification (vii), test whether union has a statistically significant effect on log wages (at the 1% significance level).
(f) How much does the R² increase when union membership is included, compared to the specification without union membership?

8. Regression: dummy, interactions
(a) Run each of the following regressions and interpret your results:
i. Run a regression of lwage on educ and women.
ii. Run a regression of lwage on educ, women, and minority.
iii. Run a regression of lwage on educ, women, and the different values of minority.
iv. Run a regression of lwage on educ, women, minority, urban, married, and age.
v. Add an interaction term between women and minority. (Hint: you can use the operator # to create an interaction, or create the interaction variable yourself.)
(b) Plot how predicted log wage changes with education by gender.
(c) Given the last specification, what is the predicted log wage for a minority married woman, aged 20, living in an urban area, with 10 years of education?
(d) What does the coefficient on the interaction term measure?
(e) Test whether the effect of the interaction term is significant.
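The hint about # in the last regression of question 8(a) can be sketched as follows. This is an illustration of the syntax rather than a prescribed solution. It assumes that lwage, educ, women, minority, urban, married, and age already exist under these names and that women and minority are 0/1 dummies. The name women_minority is hypothetical.

```stata
* Sketch only; variable names follow the assignment text and are assumed to exist.

* Option 1: factor-variable notation. ## adds both main effects and the interaction.
regress lwage educ i.women##i.minority urban married age

* Test whether the interaction coefficient is zero.
test 1.women#1.minority

* Option 2: build the interaction by hand (women_minority is a hypothetical name).
generate women_minority = women * minority
regress lwage educ women minority women_minority urban married age
test women_minority
```

A single # adds only the interaction term, so with that form the main effects for women and minority have to be listed separately.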