ASSIGNMENT #1 - DUE MARCH 2, 2016 DATA ANALYSIS USING STATA

NO LATE ASSIGNMENTS ACCEPTED; NO ASSIGNMENTS SENT VIA EMAIL ACCEPTED.
The following questions use the Excel data, “NHL 1601.xlsx”, posted on the course website.
Your job is to use the data provided in order to analyze the determinants of NHL players’ salaries.
Answer the following questions, providing a short written answer in complete sentences, and attach a
hard copy of the Stata output that supports your written answer. Stata output is required for questions
1 through 6 and 8 through 10. NB: this output must consist of screen captures from Stata, not from Excel
or anything else; see example below. Otherwise, zero points will be given.
Feel free to work in groups of up to four people. When you submit your assignment, please submit
one assignment per group and write down the names of all people who have worked on the
assignment. Each individual in the group will receive the same grade for the assignment.
Collaboration is encouraged; copying answers is not. If the assignment of group/person A appears to
be identical to that of group/person B, but A's name is not on B's assignment or vice-versa, then both
A and B will receive an automatic zero on the assignment.
(1) Give the sample mean, minimum, maximum, and standard deviation for the SALARY variable.
(2) Create a new variable called SALARY_NOMISS that equals 1 if the SALARY variable is not
missing and equals 0 otherwise (that is, 0 if SALARY is missing). What is the value of the sample
mean of SALARY_NOMISS? What does it measure?
(3) What is the average number of POINTS given that SALARY is not missing? What about when
SALARY is missing? What is the interpretation of these averages?
(4) The variable CAD_SALARY is an indicator variable that equals 1 if a player’s salary is measured
in Canadian dollars and equals 0 if it is measured in US dollars. The SALARY variable is the salary
paid to the player in the 1999-2000 season. From 1999 to 2000, the average Canada-US exchange rate
was 1 USD = 1.4833 CAD. Use this exchange rate to create a new variable called SALARY_USD
that reports players’ 1999-2000 salaries in US dollars. Report the mean, minimum, maximum, and
standard deviation of this new variable.
(5) Since this is historical data, it would be nice to know the real value of US dollar salaries (that is,
salaries adjusted for changes in prices over time). The US CPI in June 2000 was equal to 172.4 while
the US CPI in June 2015 was equal to 238.6. Generate a new variable called SALARY_REAL, which
represents the real value of US dollar salaries in 2015 dollars. Attach a histogram for this new variable.
(6) Which (non-salary related) variable has the strongest sample correlation with SALARY_REAL?
Which (non-salary related) variable has the weakest sample correlation with SALARY_REAL?
Interpret the value of these correlations.
(7) Report the p-value for testing the null hypothesis that the average SALARY_REAL of NHL
players was 1.50 million USD. Explain what this means.
(8) Use OLS to estimate the regression of SALARY_REAL (this is your dependent variable) on AGE,
POINTS, and an intercept. For reference, your independent variables here are: AGE = the player’s age
in years; and POINTS = number of points scored during the regular season. Report the estimated
regression coefficients and interpret the estimated coefficient on POINTS.
(9) Construct a new variable, DENSITY = WEIGHT/HEIGHT where WEIGHT = weight in pounds
and HEIGHT = height in inches. Use OLS to estimate the regression of SALARY_REAL (this is your
dependent variable) on DENSITY, AGE, POINTS, and an intercept. Report the estimated regression
coefficients. Is the estimated coefficient on POINTS the same as in (8)? Why or why not? Finally,
report the value of R² for this regression and interpret its value.
(10) Identify the highest paid player (based on SALARY_REAL). Given the values of the regression
estimated in (8), determine the value of the associated residual and interpret this value. Finally,
identify the player with the largest residual (in absolute value). What is the value of this residual and
what is its meaning?
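As a starting point for the variable construction in (2), (4), (5), and (9), the following is a minimal Stata sketch, not a model answer. It assumes that the first row of the spreadsheet holds the variable names, that they import exactly as written in the questions (SALARY, CAD_SALARY, WEIGHT, HEIGHT), and that these columns import as numeric. Check each assumption against the data after loading it.

```stata
* Sketch only. Assumes row 1 of the sheet holds variable names that match
* the names used in the questions, and that the columns are numeric.
import excel "NHL 1601.xlsx", firstrow clear

* (2) Indicator: 1 if SALARY is observed, 0 if it is missing.
generate SALARY_NOMISS = !missing(SALARY)

* (4) Convert CAD salaries to USD at 1 USD = 1.4833 CAD.
*     Salaries already in USD are left unchanged.
generate double SALARY_USD = SALARY
replace SALARY_USD = SALARY / 1.4833 if CAD_SALARY == 1

* (5) Rescale by the ratio of the June 2015 CPI to the June 2000 CPI.
generate double SALARY_REAL = SALARY_USD * (238.6 / 172.4)

* (9) Weight in pounds per inch of height.
generate double DENSITY = WEIGHT / HEIGHT

* Quick check of the new variables.
summarize SALARY_NOMISS SALARY_USD SALARY_REAL DENSITY
```

The variables are stored as double because salaries in the millions can lose precision in Stata's default float type. Observations with a missing SALARY stay missing in SALARY_USD and SALARY_REAL, so later commands such as summarize and regress drop them automatically.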