Uploaded by Amanda Jiang

Assignment1

UNIVERSITY OF OTTAWA
2020-2021
DEPARTMENT OF ECONOMICS
Dr. Myra Mohnen
ECO4186 Applied Econometrics
Assignment 1
NOTE: The deadline for handing in this econometric exercise is Monday, 25
September 2020.
You should hand in (1) your code, (2) your log file and (3) a Word/PDF document
with your figures, tables and interpretations of the results. Save these documents
as surname_code, surname_log and surname_results, respectively. Upload these
documents on Brightspace.
1. You are asked to construct the dataset yourself. To do so, follow these instructions:
(a) Go to the NLS Investigator website at https://www.nlsinfo.org/investigator/pages/search.jsp?s=NLSY79.
(b) You can access the data as a guest. Click “begin searching as guest”.
(c) Choose the NLSY79 (National Longitudinal Survey of Youth 1979).
(d) In the “Variable Search” tab, select the variables you need for your
analysis. You only need to select each variable for one year (e.g. 1979).
(e) Once you have selected your variables, review them in the “Review Selected Variables” tab. Make sure to save this page; it will help you label the variables later
on.
(f) Go to the “Save/Download” tab and then the “Basic Download”. From there,
choose a filename and download your created dataset.
(g) In your downloads, you will find a folder with the filename.
(h) Open the CSV file in Stata using the command “insheet using filename.csv”.
(i) Rename all variables to upper case using the command “rename *, upper”.
(j) Use the do-file provided in the downloaded folder. It contains the value labels
along with the renaming of variables. Copy and paste these to clean your dataset.
(k) Rename the variables so that they start with their name followed by the year.
For example, “rename Q3_4_* educ*”.
Keep the variables CASEID_1979, SAMPLE_RACE, and SAMPLE_SEX.
Variables needed for the analysis:
• Gender
• Race
• Income: Total income from wages and salary in past calendar year
• Age
• Education: highest grade completed
• Experience: total tenure (in weeks) with employer as of interview date, job #1
• Training: work experience/special training to get current job
• Married: Marital status
• Occupation at job #1
• Residence: Rural/ urban
• Union membership
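As a rough sketch, steps (h) to (k) could be collected at the top of a do-file as shown here. The names filename.csv and educ are the placeholders used in the instructions above, the log file name is hypothetical, and the variable names you actually see depend on the variables you selected and on the renaming commands in the downloaded do-file.

```stata
* Sketch of steps (h)-(k); all file names are placeholders
log using "surname_log", replace text   // hypothetical log file name

insheet using "filename.csv", clear     // step (h)
rename *, upper                         // step (i)

* step (j): paste the labelling and renaming commands from the
* do-file in the downloaded folder here

* step (k): assumes step (j) produced names of the form Q3_4_<year>
rename Q3_4_* educ*

describe                                // check names and labels
log close
```

In recent Stata versions insheet has been superseded by import delimited, but it still runs.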
2. Clean your dataset
• Certain numerical variables contain negative values. These arise because
some respondents refused to answer the question (value = -1), didn’t know the
answer (value = -2), skipped the question (value = -3), or weren’t asked the question
(value = -5). Make sure to replace those values with missing (“.”).
• Create dummy variables for female (equal to 1 if the person is a woman, 0 otherwise), minority (equal to 1 if the person is Black or Hispanic, 0 otherwise), married
(equal to 1 if the person is married, 0 otherwise), and urban (equal to 1 if the person is
living in an urban area, 0 otherwise).
• Create a string variable categorizing the type of occupation (professional, manager,
sales, clerical, craftsmen, and armed forces). The link between the occupation codes
and these categories will be explained in the NLSY website when you select the
variables.
• Label the indicator variable for union membership.
• Label each value of the variable union appropriately.
• Compare the descriptive statistics reported in the NLS codebook with your own.
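The cleaning steps above might look roughly like the following. This is only a sketch: the variable name union is hypothetical, and the category codes for SAMPLE_SEX and SAMPLE_RACE are assumptions that should be checked against the codebook before use.

```stata
* Recode the non-response codes listed above to missing (.)
* (if your extract also contains -4, add it to the list)
mvdecode _all, mv(-1 -2 -3 -5)

* Dummies; assumed coding: SAMPLE_SEX 2 = female,
* SAMPLE_RACE 1 = Hispanic, 2 = Black (verify in the codebook)
generate female   = (SAMPLE_SEX == 2)         if !missing(SAMPLE_SEX)
generate minority = inlist(SAMPLE_RACE, 1, 2) if !missing(SAMPLE_RACE)

* Variable label and value labels for a 0/1 union indicator
* (union is a hypothetical variable name)
label variable union "Union member"
label define yesno 0 "No" 1 "Yes"
label values union yesno

* Frequencies to compare with the codebook
tabulate female
tabulate minority
tabulate union, missing
```

The if !missing() qualifiers keep the dummies missing when the underlying variable is missing, instead of silently coding those cases as 0.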
3. Descriptives
(a) Create a descriptive statistics table: for each variable, what is the average, standard deviation,
minimum, and maximum value in your dataset? What is the number of observations
in your dataset?
(b) Calculate the correlation coefficient between wage and years of education. Is there
a strong relationship between these two variables?
(c) Create a histogram representing education (highest grade completed) for each type
of occupation. Describe it.
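One possible way to produce these descriptives is sketched here. The names wage, educ and occ_type are hypothetical and stand for the income variable, highest grade completed, and the occupation string variable created during cleaning.

```stata
* (a) observations, mean, standard deviation, min and max
summarize
count                         // total number of observations

* (b) correlation between wage and education
correlate wage educ
pwcorr wage educ, sig         // same correlation with its p-value

* (c) one histogram of education per occupation type
histogram educ, discrete percent by(occ_type)
```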
4. Tests
(a) What percentage of individuals are women, minorities and union members?
(b) What is the difference in wages for each of these characteristics? Is this difference
significant?
(c) Examine graphically the relationship between age and union membership, using
a bar graph. In this graph, each bar should show the average union membership
status for one of the age categories. Interpret your result.
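The questions above could be approached along these lines. The sketch assumes 0/1 dummies named female, minority and union and an income variable named wage; all of these names are hypothetical.

```stata
* (a) the mean of a 0/1 dummy is the share of ones
summarize female minority union

* (b) difference in mean wages between the two groups, with a t test
ttest wage, by(female)
ttest wage, by(minority)
ttest wage, by(union)

* (c) average union membership for each age
graph bar (mean) union, over(age) ytitle("Share of union members")
```

By default ttest assumes equal variances in the two groups; the unequal option relaxes this.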
5. Plots
(a) Create the variable log(wage).
(b) Produce a scatter plot of log wage against education, including the fitted line.
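A minimal sketch of these two steps, assuming the income variable is called wage and education is called educ (both names are hypothetical):

```stata
* (a) natural log of wage; missing when wage is zero, negative or missing
generate lwage = ln(wage)

* (b) scatter plot with the fitted regression line overlaid
twoway (scatter lwage educ) (lfit lwage educ), ///
    ytitle("log(wage)") xtitle("Highest grade completed")
```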
6. Partialling out vs multivariate regression
(a) Drop all observations for which log(wage) is missing.
(b) Run the following regression: educ = β0 + β1 women
(c) Predict the residuals from the above regression and store them as residual.
(d) Run the following regression: log(wage) = β0 + β1 residual
(e) Run the following regression: log(wage) = β0 + β1 educ + β2 women
(f) Compare the coefficients on education in both specifications, i.e. the coefficient on
residual in (d) and the coefficient on educ in (e).
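Steps (a) to (f) can be sketched as follows, assuming the variables are named lwage, educ and women (hypothetical names). By the partialling-out (Frisch-Waugh-Lovell) result, the slope on residual in (d) should coincide with the coefficient on educ in (e), while the reported standard errors will generally differ.

```stata
drop if missing(lwage)            // (a)

regress educ women                // (b)
predict residual, residuals       // (c) part of educ not explained by women

regress lwage residual            // (d)
scalar b_partial = _b[residual]

regress lwage educ women          // (e)

* (f) the difference should be zero up to rounding error
display "partialled out: " b_partial "   multivariate: " _b[educ]
display "difference: " _b[educ] - b_partial
```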
7. MLR
(a) Create a table from the following regressions and interpret the results
i. wage = β0 + β1 educ + u
ii. log(wage) = β0 + β1 educ + u
iii. log(wage) = β0 + β1 educ + β2 experience + u
iv. log(wage) = β0 + β1 educ + β3 training + u
v. log(wage) = β0 + β1 educ + β2 experience + β3 training + u
vi. log(wage) = β0 + β1 educ + β2 experience + β3 training + β4 experience² + u
vii. log(wage) = β0 + β1 educ + β2 union + u
(b) Discuss the problem of omitted variable bias using equations (ii) and (iii).
(c) For specification (iv), test if education and training are jointly significant (at a
significance level of 5%).
(d) In specification (vi), what is the return to experience?
(e) Using specification (vii), test if union has a statistically significant effect on log
wages (at 1% significance).
(f) How much does the R² increase when union membership is included, compared to the
specification without union membership?
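The regressions and tests in this question could be organised roughly as shown here, using only built-in commands. The variable names (wage, lwage, educ, experience, training, union) are hypothetical. In specification (vi) the marginal return to experience is β2 + 2 β4 experience, so it depends on the level of experience; the value of 52 weeks used in the sketch is an arbitrary illustration.

```stata
generate exper2 = experience^2

regress wage educ
estimates store m1
regress lwage educ
estimates store m2
scalar r2_ii = e(r2)                       // kept for part (f)
regress lwage educ experience
estimates store m3
regress lwage educ training
estimates store m4
test educ training                         // (c) joint F test; compare p-value with 0.05
regress lwage educ experience training
estimates store m5
regress lwage educ experience training exper2
estimates store m6
lincom experience + 104*exper2             // (d) b2 + 2*b4*52, i.e. at 52 weeks
regress lwage educ union
estimates store m7
test union                                 // (e) compare p-value with 0.01
display "Change in R2: " e(r2) - r2_ii     // (f) relative to specification (ii)

* (a) one table with coefficients, standard errors, N and R2
estimates table m1 m2 m3 m4 m5 m6 m7, b(%9.4f) se stats(N r2)
```

If union has missing values, specifications (ii) and (vii) are estimated on different samples, which is worth keeping in mind when comparing the two R² values.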
8. Regression: dummy, interactions
(a) Run each of the following regressions and interpret your results:
i. Run a regression of lwage on educ and women
ii. Run a regression of lwage on educ, women, and minority
iii. Run a regression of lwage on educ, women, and the different values of minority
iv. Run a regression of lwage on educ, women, minority, urban, married and age
v. Add an interaction term between women and minority. (Hint: you can use
the # operator to create an interaction, or create the interaction variable
yourself.)
(b) Plot how predicted log wage changes with education by gender
(c) Given the last specification, what is the predicted log wage for a minority married
woman, aged 20, living in an urban area with 10 years of education?
(d) What does the coefficient on the interaction term measure?
(e) Test if the effect of the interaction term is significant
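A sketch of how parts (a) to (e) might be set up with factor-variable notation. It assumes 0/1 dummies named women, minority, urban and married (hypothetical names), and the education grid from 8 to 20 years in the plot is only an illustrative choice.

```stata
* (b) predicted log wage by education, separately for men and women
regress lwage c.educ i.women
margins women, at(educ = (8(2)20))
marginsplot

* (a) iv: regression without the interaction
regress lwage educ i.women i.minority i.urban i.married age

* (a) v: ## adds both main effects and the women x minority interaction
regress lwage educ i.women##i.minority i.urban i.married age

* (c) predicted log wage for the profile described in the question
margins, at(educ = 10 women = 1 minority = 1 married = 1 urban = 1 age = 20)

* (e) significance of the interaction term
test 1.women#1.minority
```

The remaining specifications in (a) follow the same pattern with shorter variable lists.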