The median starting salary for new law school graduates is determined by

$$\log(\text{salary}) = \beta_0 + \beta_1\,\text{LSAT} + \beta_2\,\text{GPA} + \beta_3 \log(\text{libvol}) + \beta_4 \log(\text{cost}) + \beta_5\,\text{rank} + u,$$

where LSAT is the median LSAT score for the graduating class, GPA is the median college GPA for the class, libvol is the number of volumes in the law school library, cost is the annual cost of attending law school, and rank is a law school ranking (with rank = 1 being the best).

i. Explain why we expect $\beta_5 \le 0$.
We expect $\beta_5$ to be negative because a larger value of rank means a less prestigious law school. For example, a school with rank = 50 has 49 more prestigious schools ahead of it, and we would expect graduates of those schools to command higher starting salaries.

ii. What signs do you expect for the other slope parameters? Justify your answers.
We expect all other slope parameters to be positive.
$\beta_1 > 0$, $\beta_2 > 0$: LSAT and GPA both measure the quality of the entering class. Whatever the school's quality or prestige, the higher the students' GPA and LSAT scores, the higher their starting salaries should be.
$\beta_3 > 0$, $\beta_4 > 0$: The size of the library and the cost of attendance are also indicators of the school's prestige and quality. If a school is expensive to attend, we would expect its students to be better educated and therefore to earn a higher starting salary.

iii. Using the data in LAWSCH85, estimate the equation.

$$\widehat{\log(\text{salary})} = 8.34 + 0.0047\,\text{LSAT} + 0.248\,\text{GPA} + 0.095 \log(\text{libvol}) + 0.038 \log(\text{cost}) - 0.0033\,\text{rank},$$
$$n = 136, \quad R^2 = 0.842.$$

iv. What is the predicted ceteris paribus difference in salary for schools with a median GPA different by one point? (Report your answer as a percentage.)
Because the dependent variable is in logs, the coefficient is interpreted as a percentage change. If median GPA differs by one point, predicted starting salary differs by approximately 24.8%, other things equal:

$$\Delta\widehat{\log(\text{salary})} = 0.248 \cdot \Delta\text{GPA} = 0.248 \cdot 1 = 0.248.$$

v.
Interpret the coefficient on the variable $\log(\text{libvol})$.
This is an elasticity: a one percent increase in library volumes implies a 0.095% increase in predicted median starting salary, other things equal.

vi. Would you say it is better to attend a higher ranked law school? How much is a difference in the ranking of 20 worth in terms of predicted starting salary?
Yes. It is better to attend a higher ranked law school, that is, one with a lower value of rank. If law school A's rank is 20 lower than law school B's, the predicted starting salary is 100(0.0033)(20) = 6.6% higher for law school A.

TASK 2
In a study relating college grade point average to time spent in various activities, you distribute a survey to several students. The students are asked how many hours they spend each week on four activities: studying, sleeping, working, and leisure. Any activity is put into one of the four categories, so that for each student, the sum of hours in the four activities must be 168.

i. In the model

$$\text{GPA} = \beta_0 + \beta_1\,\text{study} + \beta_2\,\text{sleep} + \beta_3\,\text{work} + \beta_4\,\text{leisure} + u,$$

does it make sense to hold sleep, work, and leisure fixed while changing study?
The hours spent on the four activities (study, sleep, work, and leisure) must sum to 168. If study changes, at least one of the other three variables must change by an offsetting amount so that the total still equals 168. Therefore, it makes no sense to hold sleep, work, and leisure fixed while changing study.

ii. Explain why this model violates the assumption of no perfect collinearity (Assumption MLR.3). Support your answer.
Assumption MLR.3 requires that there be no perfect collinearity among the explanatory variables. Perfect collinearity exists when one explanatory variable is an exact linear function of the others. From part (i), study is an exact linear function of sleep, work, and leisure: study = 168 − sleep − work − leisure.
Similarly, sleep is an exact linear function of study, work, and leisure, and the same holds for work and for leisure. Therefore, the given model exhibits perfect collinearity and violates Assumption MLR.3.

iii. How could you reformulate the model so that its parameters have a useful interpretation and it satisfies Assumption MLR.3?
The model satisfies Assumption MLR.3 once any one of the four activity variables is dropped, because the remaining three are no longer tied together by an exact linear relationship. Here we drop leisure. In the reformulated model, $\beta_1$ is the change in GPA when study increases by one hour with sleep and work held fixed, which means that leisure falls by one hour. The other slope parameters are interpreted in the same way.

TASK 3
Use the data in ATTEND for this exercise.

i. Obtain the minimum, maximum, and average values for the variables atndrte, priGPA, and ACT.
atndrte: minimum 6.25, maximum 100, average 81.71.
priGPA: minimum 0.857, maximum 3.93, average 2.587.
ACT: minimum 13, maximum 32, average 22.51.

ii. Estimate the model

$$\text{atndrte} = \beta_0 + \beta_1\,\text{priGPA} + \beta_2\,\text{ACT} + u$$

and write the results in equation form. Interpret the intercept. Does it have a useful meaning?
We estimated the model as follows:

$$\widehat{\text{atndrte}} = 75.7 + 17.26\,\text{priGPA} - 1.72\,\text{ACT}, \quad n = 680, \quad R^2 = 0.2906.$$

The intercept implies that a student with an ACT score of zero and a cumulative GPA of zero prior to the term would have a predicted attendance rate of 75.7%. This interpretation is not useful: no student in the sample is close to these values (the minimum priGPA is 0.857 and the minimum ACT is 13), and a student with both scores equal to zero could not even enroll in college.

iii. Discuss the estimated slope coefficients. Are there any surprises?
The slope coefficient on priGPA is 17.26: holding ACT fixed, one more point of prior GPA is associated with an attendance rate that is 17.26 percentage points higher. This does not surprise us, because we would expect students with a higher GPA to have achieved it partly by regularly attending their classes.
Therefore, we would expect them to keep attending regularly. However, the negative slope coefficient on ACT surprises us: holding priGPA fixed, the higher a student's ACT score, the lower the predicted attendance rate. One possible explanation is that students with higher ACT scores feel they already know the material in certain classes and regard attending them as a waste of time and resources.

iv. What is the predicted atndrte if priGPA = 3.65 and ACT = 20? What do you make of this result? Are there any students in the sample with these values of the explanatory variables?

$$\widehat{\text{atndrte}} = 75.7 + 17.26(3.65) - 1.72(20) = 75.7 + 62.999 - 34.4 = 104.299$$

The predicted attendance rate for priGPA = 3.65 and ACT = 20 is about 104.3%. This outcome makes no sense, since part (i) showed that the attendance rate cannot exceed 100. Filtering the data shows that exactly one student in the sample has priGPA = 3.65 and ACT = 20, and that student's attendance rate is 87.5%.

v. If Student A has priGPA = 3.1 and ACT = 21 and Student B has priGPA = 2.1 and ACT = 26, what is the predicted difference in their attendance rates?

$$\Delta\widehat{\text{atndrte}} = 17.26(\text{priGPA}_A - \text{priGPA}_B) - 1.72(\text{ACT}_A - \text{ACT}_B)$$
$$= 17.26(3.1 - 2.1) - 1.72(21 - 26) = 17.26(1) - 1.72(-5) = 17.26 + 8.6 = 25.86$$

Student A's predicted attendance rate is 25.86 percentage points higher than Student B's.

TASK 4
Consider the estimated equation below, which can be used to study the effects of skipping class on college GPA.

i. Using the standard normal approximation, find the 95% confidence interval for $\beta_{hsGPA}$.
The lower bound of the confidence interval is

$$\hat\beta_j - c \cdot SE(\hat\beta_j) \qquad (1)$$

and the upper bound is

$$\hat\beta_j + c \cdot SE(\hat\beta_j). \qquad (2)$$

The constant $c$ is the critical value of the t distribution, which depends on the regression's degrees of freedom and on the confidence level. In our case, the regression has 137 degrees of freedom ($df = n - k - 1 = 141 - 3 - 1 = 137$) and the confidence level is 95%. Using the standard normal approximation, the critical value is $c = 1.96$. Substituting into equations (1) and (2) gives the bounds of the confidence interval:

$$\text{upper bound} = \hat\beta_{hsGPA} + c \cdot SE(\hat\beta_{hsGPA}) = 0.412 + 1.96 \cdot 0.094 = 0.412 + 0.184 = 0.596$$
$$\text{lower bound} = \hat\beta_{hsGPA} - c \cdot SE(\hat\beta_{hsGPA}) = 0.412 - 1.96 \cdot 0.094 = 0.412 - 0.184 = 0.228$$

The interpretation is that, at the 95% confidence level, if hsGPA increases by one point, holding everything else fixed, college GPA would on average rise by between 0.228 and 0.596 points.

ii. Can you reject the hypothesis $H_0: \beta_{hsGPA} = 0.4$ against the two-sided alternative at the 5% level?
No. We cannot reject $H_0: \beta_{hsGPA} = 0.4$, because 0.4 lies well inside the 95% confidence interval calculated in part (i):

$$P(0.228 < \beta_{hsGPA} < 0.596) = 0.95$$

iii. Can you reject the hypothesis $H_0: \beta_{hsGPA} = 1$ against the two-sided alternative at the 5% level?
Yes. We reject $H_0: \beta_{hsGPA} = 1$ against the two-sided alternative at the 5% significance level, because 1 lies well outside the 95% confidence interval.

TASK 5
Use the data in WAGE2 for this exercise.

i. Consider the standard wage equation

$$\log(\text{wage}) = \beta_0 + \beta_1\,\text{educ} + \beta_2\,\text{exper} + \beta_3\,\text{tenure} + u.$$

State the null hypothesis that another year of general workforce experience has the same effect on log(wage) as another year of tenure with the current employer.
The null hypothesis is

$$H_0: \beta_2 = \beta_3 \iff \beta_2 - \beta_3 = 0, \qquad H_1: \beta_2 \ne \beta_3.$$

Under $H_0$, another year of general workforce experience ($\beta_2$) has the same effect on log(wage) as another year of tenure with the current employer ($\beta_3$).

ii. Test the null hypothesis in part (i) against a two-sided alternative, at the 5% significance level, by constructing a 95% confidence interval. What do you conclude?
First, we define the parameter $\theta_1$:

$$\theta_1 = \beta_2 - \beta_3 \implies \beta_2 = \theta_1 + \beta_3, \qquad H_0: \theta_1 = 0, \quad H_1: \theta_1 \ne 0.$$

Substituting $\beta_2 = \theta_1 + \beta_3$ into the model gives

$$\log(\text{wage}) = \beta_0 + \beta_1\,\text{educ} + \beta_2\,\text{exper} + \beta_3\,\text{tenure} + u$$
$$= \beta_0 + \beta_1\,\text{educ} + (\theta_1 + \beta_3)\,\text{exper} + \beta_3\,\text{tenure} + u$$
$$= \beta_0 + \beta_1\,\text{educ} + \theta_1\,\text{exper} + \beta_3\,\text{exper} + \beta_3\,\text{tenure} + u$$
$$= \beta_0 + \beta_1\,\text{educ} + \theta_1\,\text{exper} + \beta_3(\text{exper} + \text{tenure}) + u.$$

Generating $x_3 = \text{exper} + \text{tenure}$, the model to estimate is

$$\log(\text{wage}) = \beta_0 + \beta_1\,\text{educ} + \theta_1\,\text{exper} + \beta_3\,x_3 + u.$$

The estimated equation, with the standard errors listed in the same order as the coefficients, is

$$\widehat{\log(\text{wage})} = 5.497 + 0.0749\,\text{educ} + 0.00195\,\text{exper} + 0.0134\,x_3$$
$$(0.41) \quad (0.0065) \quad (0.00474) \quad (0.0026)$$
$$t_{\hat\theta_1} = 0.41$$

The 95% confidence interval for $\theta_1$ ranges from −0.00736 to 0.01126. Since zero lies within this interval, we fail to reject $H_0$ against the two-sided alternative at the 5% significance level. In other words, we cannot reject the hypothesis that another year of general workforce experience has the same effect on log(wage) as another year of tenure with the current employer.
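
The confidence-interval test in part (ii) can be checked with a few lines of Python. This is only a minimal sketch: it takes the reported estimate (0.00195) and standard error (0.00474) as given rather than re-estimating the regression from WAGE2, and it assumes the standard normal critical value 1.96. The function name `confidence_interval` is made up for this illustration. Because the inputs are rounded, the bounds it produces can differ from the ones quoted above in the fifth decimal place.

```python
def confidence_interval(estimate, std_error, critical_value=1.96):
    """Return (t_statistic, lower, upper) for testing H0: parameter = 0.

    critical_value defaults to 1.96, the standard normal approximation
    for a two-sided 95% interval (an assumption of this sketch).
    """
    t_stat = estimate / std_error
    margin = critical_value * std_error
    return t_stat, estimate - margin, estimate + margin


# Reported values for theta_1, the coefficient on exper after substitution.
theta_hat = 0.00195
se_theta = 0.00474

t_stat, lower, upper = confidence_interval(theta_hat, se_theta)

# H0: theta_1 = 0 is rejected only if zero falls outside the interval.
reject_h0 = not (lower <= 0.0 <= upper)

print(f"t statistic: {t_stat:.2f}")
print(f"95% CI: [{lower:.5f}, {upper:.5f}]")
print(f"Reject H0 at the 5% level: {reject_h0}")
```

The same helper applies to Task 4(i) by passing the estimate 0.412 and the standard error 0.094.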