Uploaded by Maham Raza


TASK 1
The median starting salary for new law school graduates is determined by
$$\log(salary) = \beta_0 + \beta_1\,LSAT + \beta_2\,GPA + \beta_3\log(libvol) + \beta_4\log(cost) + \beta_5\,rank + u$$
where LSAT is the median LSAT score for the graduating class, GPA is the median college
GPA for the class, libvol is the number of volumes in the law school library, cost is the
annual cost of attending law school, and rank is a law school ranking (with rank = 1 being
the best).
i.
Explain why we expect 𝛽5 ≤ 0.
We expect β5 ≤ 0 because a larger value of rank means a less prestigious law school (rank = 1 is the best), and graduates of less prestigious schools are expected to earn less, other factors being equal.
For example, a school with rank = 50 has 49 more prestigious schools ahead of it, whose graduates would be expected to earn higher starting salaries.
ii.
What signs do you expect for the other slope parameters? Justify your answers.
We expect all the other slope parameters to be positive.
β1 > 0, β2 > 0. LSAT and GPA both measure the quality of the class. Whatever the school's quality or prestige, the higher the students' LSAT scores and GPAs, the higher their starting salaries should be.
β3 > 0, β4 > 0. The size of the library and the cost of attendance are also indicators of a school's prestige and quality. If attending a school is expensive, we would expect its students to receive a better education and therefore earn a higher starting salary.
iii.
Using the data LAWSCH85, estimate the equation.
The estimated equation is:
$$\widehat{\log(salary)} = 8.34 + 0.0047\,LSAT + 0.248\,GPA + 0.095\log(libvol) + 0.038\log(cost) - 0.0033\,rank$$
n = 136, R² = 0.842.
iv.
What is the predicted ceteris paribus difference in salary for schools with a median
GPA different by one point? (Report your answer as a percentage.)
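Because the dependent variable is log(salary), 100 times a slope coefficient approximates the percentage change in salary.
Therefore, for two schools whose median GPAs differ by one point, the predicted median starting salary is about 24.8% higher at the school with the higher GPA, ceteris paribus:
$$\Delta\widehat{\log(salary)} = 0.248 \cdot \Delta GPA = 0.248 \cdot 1 = 0.248, \qquad \%\Delta salary \approx 100 \cdot 0.248 = 24.8\%$$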
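The 24.8% figure relies on the approximation %Δsalary ≈ 100·Δlog(salary), which becomes less accurate as the change in log(salary) grows. A minimal standard-library sketch compares it with the exact change implied by the log model, 100·(exp(0.248) − 1); the coefficient 0.248 is taken from the estimated equation.

```python
import math

# Estimated coefficient on GPA from the fitted LAWSCH85 equation.
beta_gpa = 0.248
delta_gpa = 1.0  # one-point difference in median GPA

approx_pct = 100 * beta_gpa * delta_gpa                 # log approximation: 100 * change in log(salary)
exact_pct = 100 * (math.exp(beta_gpa * delta_gpa) - 1)  # exact % change implied by the log model
print(f"approx {approx_pct:.1f}%, exact {exact_pct:.1f}%")  # approx 24.8%, exact 28.1%
```

For a coefficient of this size the exact figure is about 3.3 percentage points larger than the approximation.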
v.
Interpret the coefficient on the variable log(𝑙𝑖𝑏𝑣𝑜𝑙)
This is an elasticity: a one percent increase in library volumes implies a 0.095% increase in
predicted median starting salary, other things equal.
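Because the model is log-log in libvol, the elasticity of 0.095 holds at every library size, and the exact salary response to multiplying libvol by a factor r is r^0.095 − 1. The sketch below (standard library only; the helper name `pct_salary_change` is hypothetical) evaluates this for a 1% increase, where it is very close to 0.095%, and for a doubling of the library, where the constant elasticity does not scale linearly to 0.095 × 100% = 9.5%.

```python
BETA_LIBVOL = 0.095  # estimated coefficient on log(libvol), i.e. the elasticity

def pct_salary_change(volume_ratio: float) -> float:
    """Exact predicted % change in salary when libvol is multiplied by volume_ratio,
    holding the other regressors fixed (hypothetical helper)."""
    return 100 * (volume_ratio ** BETA_LIBVOL - 1)

one_percent = pct_salary_change(1.01)  # close to the 0.095% elasticity reading
doubling = pct_salary_change(2.0)      # about 6.8%, well below 9.5%
print(f"{one_percent:.4f}% for +1%, {doubling:.2f}% for doubling")
```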
vi.
Would you say it is better to attend a higher ranked law school? How much is a
difference in the ranking of 20 worth in terms of predicted starting salary?
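Yes. Because β̂5 = −0.0033 is negative, a higher-ranked school (a smaller value of rank) predicts a higher starting salary, so it is better to attend a school with a lower rank number.
If law school A's rank is 20 smaller than law school B's, the predicted starting salary at A is about 100(0.0033)(20) = 6.6% higher than at B, ceteris paribus.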
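To see that only the rank term matters for this comparison, the sketch below codes the fitted equation from part (iii) as a plain Python function (the name `predicted_log_salary` is hypothetical) and compares two hypothetical schools that are identical except that one is ranked 10th and the other 30th. The school characteristics are made up for illustration; they cancel out of the difference.

```python
import math

def predicted_log_salary(lsat, gpa, libvol, cost, rank):
    """Fitted log(salary) using the reported LAWSCH85 coefficients (hypothetical helper)."""
    return (8.34 + 0.0047 * lsat + 0.248 * gpa + 0.095 * math.log(libvol)
            + 0.038 * math.log(cost) - 0.0033 * rank)

# Hypothetical school characteristics, identical except for rank.
common = dict(lsat=160, gpa=3.3, libvol=300, cost=9000)
log_gap = predicted_log_salary(**common, rank=10) - predicted_log_salary(**common, rank=30)

approx_pct = 100 * log_gap                 # 6.6% (log approximation)
exact_pct = 100 * (math.exp(log_gap) - 1)  # about 6.82% (exact)
print(f"log gap {log_gap:.4f}, approx {approx_pct:.2f}%, exact {exact_pct:.2f}%")
```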
TASK 2
In a study relating college grade point average to time spent in various activities, you distribute a
survey to several students. The students are asked how many hours they spend each week on four
activities: studying, sleeping, working, and leisure. Any activity is put into one of the four
categories, so that for each student, the sum of hours in the four activities must be 168.
i.
In this model,
𝑮𝑷𝑨 = 𝜷𝟎 + 𝜷𝟏𝒔𝒕𝒖𝒅𝒚 + 𝜷𝟐𝒔𝒍𝒆𝒆𝒑 + 𝜷𝟑𝒘𝒐𝒓𝒌 + 𝜷𝟒𝒍𝒆𝒊𝒔𝒖𝒓𝒆 + 𝒖
Does it make sense to keep sleep, work, and leisure fixed while changing your study
schedule?
The given model is:
𝑮𝑷𝑨 = 𝜷𝟎 + 𝜷𝟏𝒔𝒕𝒖𝒅𝒚 + 𝜷𝟐𝒔𝒍𝒆𝒆𝒑 + 𝜷𝟑𝒘𝒐𝒓𝒌 + 𝜷𝟒𝒍𝒆𝒊𝒔𝒖𝒓𝒆 + 𝒖
It is also given that the hours spent on the four activities (study, sleep, work and leisure) must sum to 168 for every student.
So if study changes by some number of hours, at least one of the other three activities must change by the opposite amount to keep the total at 168.
Therefore, it makes no sense to hold sleep, work and leisure fixed while changing study, because the total number of hours would no longer equal 168.
ii.
Explain why this model violates the assumption of no perfect collinearity (MLR.3). Support
your answer.
Assumption MLR.3 requires that there be no perfect collinearity among the independent variables. Perfect collinearity exists when one independent variable is an exact linear function of the others.
From part (i), study is an exact linear function of sleep, work and leisure: study = 168 − sleep − work − leisure.
Similarly, sleep is an exact linear function of study, work and leisure, and the same holds for work and for leisure.
Therefore, the model exhibits perfect collinearity, and it violates Assumption MLR.3.
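The violation can be checked numerically. The sketch below assumes NumPy and generates hypothetical weekly hours (made-up numbers, not survey data) that satisfy the 168-hour identity. The design matrix with an intercept and all four activities then has rank 4 instead of 5, so X′X cannot be inverted and OLS has no unique solution.

```python
import numpy as np

# Hypothetical weekly hours for 50 students (illustrative only, not real survey data).
rng = np.random.default_rng(0)
n = 50
study = rng.uniform(5, 40, n)
sleep = rng.uniform(40, 60, n)
work = rng.uniform(0, 30, n)
leisure = 168.0 - study - sleep - work  # every student's hours sum to 168

# Intercept plus all four activities: five columns, but one is redundant.
X = np.column_stack([np.ones(n), study, sleep, work, leisure])
print(X.shape[1], np.linalg.matrix_rank(X))  # 5 columns, rank 4
```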
iii.
How could you reformulate the model so that its parameters have a useful
interpretation and it satisfies Assumption MLR.3?
Because of the 168-hour identity, it is enough to drop any one of the four activity variables; the dropped activity is then determined by the other three.
Here we drop leisure:
𝑮𝑷𝑨 = 𝜷𝟎 + 𝜷𝟏𝒔𝒕𝒖𝒅𝒚 + 𝜷𝟐𝒔𝒍𝒆𝒆𝒑 + 𝜷𝟑𝒘𝒐𝒓𝒌 + 𝒖
Now β1 is the change in GPA when study increases by one hour while sleep and work are held fixed, which means leisure falls by one hour. So β1 measures the effect of moving one hour from leisure to study, and β2 and β3 have the analogous interpretations for sleep and work. The reformulated model satisfies Assumption MLR.3.
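The interpretation of the reformulated model can be illustrated with made-up numbers. The sketch below assumes NumPy and hypothetical per-hour effects for all four activities (they are not estimates), builds noise-free GPA values, drops leisure and fits OLS by least squares. Each recovered slope equals that activity's effect minus the leisure effect, which is the "one hour taken from leisure" interpretation; the intercept absorbs 168 times the leisure effect.

```python
import numpy as np

# Hypothetical weekly hours (illustrative only).
rng = np.random.default_rng(1)
n = 60
study = rng.uniform(5, 40, n)
sleep = rng.uniform(40, 60, n)
work = rng.uniform(0, 30, n)
leisure = 168.0 - study - sleep - work

# Hypothetical per-hour effects, used only to generate noise-free GPA values.
a, b_study, b_sleep, b_work, b_leisure = 1.0, 0.03, 0.01, -0.005, 0.002
gpa = a + b_study * study + b_sleep * sleep + b_work * work + b_leisure * leisure

# Reformulated model: leisure dropped, so the design matrix has full column rank.
X = np.column_stack([np.ones(n), study, sleep, work])
coef, *_ = np.linalg.lstsq(X, gpa, rcond=None)
print(np.linalg.matrix_rank(X), np.round(coef, 4))
# Slopes are relative to leisure: b_study - b_leisure, b_sleep - b_leisure, b_work - b_leisure.
```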
TASK 3
Use the data in ATTEND for this exercise.
i.
Obtain the minimum, maximum, and average values for the variables atndrte,
priGPA, and ACT.
The minimum of the variable atndrte is 6.25, the maximum is 100 and the average value
is 81.71.
The minimum of the variable priGPA is 0.857, the maximum is 3.93 and the average value
is 2.587.
The minimum of the variable ACT is 13, the maximum is 32 and the average value is 22.51.
ii.
Estimate the model
𝒂𝒕𝒏𝒅𝒓𝒕𝒆 = 𝜷𝟎 + 𝜷𝟏𝒑𝒓𝒊𝑮𝑷𝑨 + 𝜷𝟐𝑨𝑪𝑻 + 𝒖
Write the results in equation form. Interpret the intercept. Does it have a useful
meaning?
The estimated model is:
$$\widehat{atndrte} = 75.7 + 17.26\,priGPA - 1.72\,ACT$$
n = 680, R² = 0.2906
The intercept implies that a student with an ACT score of zero and a cumulative GPA of zero prior to the term would, on average, have an attendance rate of 75.7%.
This interpretation has no useful meaning: no student in the sample is anywhere near these values (priGPA ranges from 0.857 to 3.93 and ACT from 13 to 32), and a student with such scores would not even be able to enroll in college.
iii.
Discuss the estimated slope coefficients. Are there any surprises?
The slope coefficient on priGPA is 17.26: holding ACT fixed, a one-point higher prior GPA predicts an attendance rate about 17.26 percentage points higher. This does not surprise us, because we would expect students who earned higher GPAs to have done so partly by attending their classes regularly, and they are likely to keep attending regularly in college.
However, the negative coefficient on ACT is surprising. It means that, holding priGPA fixed, each additional ACT point predicts an attendance rate about 1.72 percentage points lower.
One possible explanation is that students with higher ACT scores may have already mastered some of the material, so they regard attending those classes as a waste of their time and resources.
iv.
What is the predicted atndrte if priGPA = 3.65 and ACT = 20? What do you make
of this result? Are there any students in the sample with these values of the
explanatory variables?
$$\widehat{atndrte} = 75.7 + 17.26(3.65) - 1.72(20) = 75.7 + 62.999 - 34.4 = 104.299$$
The predicted attendance rate for priGPA = 3.65 and ACT = 20 is about 104.3%.
This result makes no sense, because atndrte is a percentage and cannot exceed 100 (the maximum found in part (i) is exactly 100). A linear model does not restrict its predictions to the feasible range, so unusual combinations of the explanatory variables can produce impossible predictions.
Searching the sample for these values shows that exactly one student has priGPA = 3.65 and ACT = 20, and that student's actual attendance rate is 87.5%.
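A small sketch of this calculation, coding the fitted equation as a plain Python function (the name `predict_atndrte` is hypothetical). It also computes the residual for the one sample student with these values, using the actual attendance rate of 87.5% reported above.

```python
def predict_atndrte(pri_gpa: float, act: float) -> float:
    """Fitted attendance rate (in percent) from the estimated equation (hypothetical helper)."""
    return 75.7 + 17.26 * pri_gpa - 1.72 * act

pred = predict_atndrte(3.65, 20)
actual = 87.5             # attendance rate of the one matching student in the sample
residual = actual - pred  # negative: the model overpredicts this student
print(f"predicted {pred:.3f}, residual {residual:.3f}, above 100: {pred > 100}")
```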
v.
If Student A has priGPA = 3.1 and ACT = 21 and Student B has priGPA = 2.1 and
ACT = 26, what is the predicted difference in their attendance rates?
Δatndrte = 17.26 ⋅ (priGPA_A − priGPA_B) − 1.72 ⋅ (ACT_A − ACT_B)
= 17.26 ⋅ (3.1 − 2.1) − 1.72 ⋅ (21 − 26)
= 17.26 ⋅ 1 − 1.72 ⋅ (−5)
= 17.26 + 8.6
= 25.86
Student A's predicted attendance rate is 25.86 percentage points higher than Student B's.
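The same answer comes from differencing the two full predictions, because the intercept cancels. A minimal check in Python, with a hypothetical helper `predict_atndrte` that codes the fitted equation from part (ii):

```python
def predict_atndrte(pri_gpa: float, act: float) -> float:
    """Fitted attendance rate (in percent) from the estimated equation (hypothetical helper)."""
    return 75.7 + 17.26 * pri_gpa - 1.72 * act

# Difference of the two full predictions: the intercept 75.7 cancels.
diff = predict_atndrte(3.1, 21) - predict_atndrte(2.1, 26)
# Same result using only the slopes and the differences in the regressors.
diff_slopes = 17.26 * (3.1 - 2.1) - 1.72 * (21 - 26)
print(round(diff, 2), round(diff_slopes, 2))  # 25.86 25.86
```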
TASK 4
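Consider the estimated equation for college GPA, which can be used to study the effects of skipping
class on college GPA. In it, the coefficient on hsGPA is 0.412 with a standard error of 0.094, estimated from n = 141 observations with three explanatory variables.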
i.
Using the standard normal approximation, find the 95% confidence interval for
𝛽ℎ𝑠𝐺𝑃𝐴.
The confidence interval is β̂_j ± c ⋅ se(β̂_j), so the lower bound is
$$\hat\beta_j - c \cdot se(\hat\beta_j) \qquad (1)$$
and the upper bound is
$$\hat\beta_j + c \cdot se(\hat\beta_j) \qquad (2)$$
where c is the critical value of the t distribution, which depends on the regression's degrees of freedom and the chosen confidence level.
In our case, the regression has 137 degrees of freedom
(df = n − k − 1 = 141 − 3 − 1 = 137), and the confidence level is 95%.
Because the question asks for the standard normal approximation, we use c = 1.96 (the exact t critical value with 137 df, about 1.98, is very close).
Substituting c = 1.96 and the estimates into equations 1 and 2 gives the upper and lower bounds of the confidence interval:
$$\text{upper bound} = \hat\beta_{hsGPA} + c \cdot se(\hat\beta_{hsGPA}) = 0.412 + 1.96 \cdot 0.094 = 0.412 + 0.184 = 0.596$$
$$\text{lower bound} = \hat\beta_{hsGPA} - c \cdot se(\hat\beta_{hsGPA}) = 0.412 - 1.96 \cdot 0.094 = 0.412 - 0.184 = 0.228$$
So the 95% confidence interval for β_hsGPA is [0.228, 0.596].
The interpretation is that, with 95% confidence, if hsGPA increases by 1 point, holding everything else fixed, college GPA increases on average by between 0.228 and 0.596 points.
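The interval can be reproduced with Python's standard library, where `statistics.NormalDist` supplies the standard normal critical value. The estimate 0.412 and standard error 0.094 are the values used above.

```python
from statistics import NormalDist

beta_hat, se = 0.412, 0.094          # estimate and standard error for hsGPA
c = NormalDist().inv_cdf(0.975)      # two-sided 95% critical value, about 1.96
lower, upper = beta_hat - c * se, beta_hat + c * se
print(round(c, 2), round(lower, 3), round(upper, 3))  # 1.96 0.228 0.596
```

Using the t distribution with 137 degrees of freedom instead (for example via `scipy.stats.t.ppf(0.975, 137)`, about 1.98) would widen the interval only slightly.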
ii.
Can you reject the hypothesis 𝐻0: 𝛽ℎ𝑠𝐺𝑃𝐴 = 0.4 against the two-sided alternative
at the 5% level?
No. We cannot reject 𝐻0: 𝛽ℎ𝑠𝐺𝑃𝐴 = 0.4 at the 5% level, because 0.4 lies well inside the 95% confidence interval calculated in part (i) of the task, [0.228, 0.596].
iii.
Can you reject the hypothesis 𝐻0: 𝛽ℎ𝑠𝐺𝑃𝐴 = 1 against the two-sided alternative at
the 5% level?
Yes. We reject 𝐻0: 𝛽ℎ𝑠𝐺𝑃𝐴 = 1 in favor of the two-sided alternative at the 5% significance level, because 1 lies well outside the 95% confidence interval [0.228, 0.596].
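Both decisions can also be reached with t statistics, t = (β̂_hsGPA − a)/se(β̂_hsGPA) for a null value a, compared with the 5% two-sided critical value of about 1.96. A minimal standard-library sketch, using the estimate and standard error from part (i) and normal-approximation p-values:

```python
from statistics import NormalDist

beta_hat, se = 0.412, 0.094
z = NormalDist()
c = z.inv_cdf(0.975)  # two-sided 5% critical value, about 1.96

results = {}
for null_value in (0.4, 1.0):
    t = (beta_hat - null_value) / se
    p_value = 2 * (1 - z.cdf(abs(t)))  # normal-approximation p-value
    reject = abs(t) > c
    results[null_value] = (t, p_value, reject)
    print(f"H0: beta = {null_value}: t = {t:.2f}, p = {p_value:.3f}, reject = {reject}")
```

The t statistic for a = 0.4 (about 0.13) is far below 1.96, while the one for a = 1 (about −6.26) is far beyond it in absolute value, matching the confidence-interval reasoning.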
TASK 5
Use the data in WAGE2 for this exercise.
i.
Consider the standard wage equation.
𝒍𝒐𝒈(𝒘𝒂𝒈𝒆) = 𝜷𝟎 + 𝜷𝟏𝒆𝒅𝒖𝒄 + 𝜷𝟐𝒆𝒙𝒑𝒆𝒓 + 𝜷𝟑 𝒕𝒆𝒏𝒖𝒓𝒆 + 𝒖
State the null hypothesis that another year of general workforce experience has the same
effect on log(wage) as another year of tenure with the current employer.
The null hypothesis would be:
𝐻0: 𝛽2 = 𝛽3 ⟹ 𝛽2 − 𝛽3 = 0
𝐻1: 𝛽2 ≠ 𝛽3
The expression means that another year of general workforce experience (β2) has the same effect
on log(wage) as another year of tenure with the current employer (β3).
ii.
Test the null hypothesis in part (i) against a two-sided alternative, at the 5%
significance level, by constructing a 95% confidence interval. What do you
conclude?
First, define the parameter θ1 as:
𝜃1 = 𝛽2 − 𝛽3 ⟹ 𝛽2 = 𝜃1 + 𝛽3
𝐻0: 𝜃1 = 0, 𝐻1: 𝜃1 ≠ 0
Substituting β2 = θ1 + β3 into the original model gives:
𝑙𝑜𝑔(𝑤𝑎𝑔𝑒) = β0 + β1 𝑒𝑑𝑢𝑐 + β2 𝑒𝑥𝑝𝑒𝑟 + β3 𝑡𝑒𝑛𝑢𝑟𝑒 + u
= 𝛽0 + 𝛽1 𝑒𝑑𝑢𝑐 + (𝜃1 + 𝛽3) 𝑒𝑥𝑝𝑒𝑟 + 𝛽3 𝑡𝑒𝑛𝑢𝑟𝑒 + 𝑢
= 𝛽0 + 𝛽1 𝑒𝑑𝑢𝑐 + 𝜃1 𝑒𝑥𝑝𝑒𝑟 + 𝛽3 𝑒𝑥𝑝𝑒𝑟 + 𝛽3 𝑡𝑒𝑛𝑢𝑟𝑒 + 𝑢
= 𝛽0 + 𝛽1 𝑒𝑑𝑢𝑐 + 𝜃1 𝑒𝑥𝑝𝑒𝑟 + 𝛽3 (𝑒𝑥𝑝𝑒𝑟 + 𝑡𝑒𝑛𝑢𝑟𝑒) + 𝑢
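Define x3 = exper + tenure, so the model we estimate is
$$\log(wage) = \beta_0 + \beta_1\,educ + \theta_1\,exper + \beta_3\,x_3 + u$$
The estimated equation, with standard errors in parentheses, is
$$\widehat{\log(wage)} = \underset{(0.41)}{5.497} + \underset{(0.0065)}{0.0749}\,educ + \underset{(0.00474)}{0.00195}\,exper + \underset{(0.0026)}{0.0134}\,x_3$$
The coefficient on exper is θ̂1 = 0.00195 with standard error 0.00474, so t_θ̂1 = 0.00195/0.00474 ≈ 0.41.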
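The substitution trick can be checked numerically. The sketch below assumes NumPy and uses made-up, noise-free data (not the WAGE2 sample): it regresses log(wage) on educ, exper and x3 = exper + tenure and confirms that the coefficient on exper equals β2 − β3 = θ1 while the coefficient on x3 equals β3. With real data, the standard error of θ̂1 is read directly from this same regression.

```python
import numpy as np

# Hypothetical, noise-free data (NOT the WAGE2 sample); coefficients are illustrative.
rng = np.random.default_rng(2)
n = 100
educ = rng.integers(9, 19, n).astype(float)
exper = rng.integers(1, 24, n).astype(float)
tenure = rng.integers(0, 23, n).astype(float)
b0, b1, b2, b3 = 5.5, 0.07, 0.03, 0.01
log_wage = b0 + b1 * educ + b2 * exper + b3 * tenure

# Reparametrized regression: log(wage) on educ, exper and x3 = exper + tenure.
x3 = exper + tenure
X = np.column_stack([np.ones(n), educ, exper, x3])
coef, *_ = np.linalg.lstsq(X, log_wage, rcond=None)
theta1 = coef[2]  # coefficient on exper recovers b2 - b3
print(np.round(coef, 6))
```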
The 95% confidence interval for θ1 runs from −0.00736 to 0.01126.
Since zero lies inside this interval, we fail to reject H0 against the two-sided alternative
at the 5% significance level.
In other words, the data provide no statistically significant evidence that another year of general workforce experience affects log(wage) differently from another year of tenure with the current employer.
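A standard-library sketch of the t statistic and the 95% interval for θ1, using θ̂1 = 0.00195 and se(θ̂1) = 0.00474 from the regression above and the standard normal critical value. It gives roughly −0.00734 to 0.01124; the small difference from the interval reported above presumably reflects rounding of the inputs or a slightly larger critical value.

```python
from statistics import NormalDist

theta_hat, se_theta = 0.00195, 0.00474  # estimate and standard error of theta1
t_theta = theta_hat / se_theta          # t statistic for H0: theta1 = 0
c = NormalDist().inv_cdf(0.975)         # about 1.96
lower, upper = theta_hat - c * se_theta, theta_hat + c * se_theta
print(f"t = {t_theta:.2f}, CI = [{lower:.5f}, {upper:.5f}], contains 0: {lower < 0 < upper}")
```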