2022 BT2101 Mid Term Practice

BT2101 Mid-Term Problem-Set
This problem-set is shared with the intention of simulating the mid-term exam. You do not need to
submit your solutions to the problem-set. Please feel free to discuss the problems (and solutions) with
your peers and post any clarification questions on the LumiNUS Forum. Suggested solutions to the PS
will be shared as a separate file.
September 21, 2022
1 Critique of Research Proposals
Critique each of the following proposed research plans. Your critique should explain any problems with
the proposed research and describe how the research plan might be improved (e.g., by addressing an omitted
variable issue). Include a discussion of any additional data that need to be collected (e.g., additional
control variables) and the appropriate statistical techniques (e.g., multiple regression) for
analyzing those data.
(a) A recent study found that the death rate for people who sleep less than 6 hours per night is lower
than the death rate for people who sleep 8 or more hours.
The 1 million observations used for this study came from a random survey of Americans aged 30 to
102. The study tracked sleep and death rates over a period of 5 years. The death rate was
measured as follows: the death rate for people sleeping 8 hours was calculated as the ratio of the
number of deaths over the span of the study among people sleeping 8 hours to the total number of
survey respondents who slept 8 hours. This calculation was then repeated for people sleeping 6
hours and so on.
Based on this summary, would you recommend that people who sleep 9 hours per night consider
reducing their sleep to 6 or 7 hours if they want to prolong their lives? Why or why not? Explain.
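The death rate defined above is a crude (unadjusted) ratio. The sketch below uses made-up counts, NOT taken from the study, to compute that ratio for each sleep group and then again within two hypothetical age bands. It shows how a confounder such as age can make the shorter-sleep group look healthier overall even when that group has the higher death rate in every age band. The age bands and all counts are assumptions chosen for illustration.

```python
# Hypothetical counts, NOT from the study:
# (sleep_hours, age_band) -> (deaths over the study period, respondents)
COUNTS = {
    (6, "30-59"): (10, 1000),
    (6, "60+"): (20, 100),
    (8, "30-59"): (1, 200),
    (8, "60+"): (90, 900),
}


def crude_death_rate(counts, hours):
    """Deaths among people sleeping `hours` divided by all respondents sleeping `hours`."""
    deaths = sum(d for (h, _), (d, _n) in counts.items() if h == hours)
    respondents = sum(n for (h, _), (_d, n) in counts.items() if h == hours)
    return deaths / respondents


def stratum_death_rate(counts, hours, age_band):
    """The same ratio, computed only within one (hypothetical) age band."""
    deaths, respondents = counts[(hours, age_band)]
    return deaths / respondents


if __name__ == "__main__":
    for hours in (6, 8):
        print(f"{hours}h crude rate: {crude_death_rate(COUNTS, hours):.4f}")
    for band in ("30-59", "60+"):
        for hours in (6, 8):
            print(f"{hours}h, age {band}: {stratum_death_rate(COUNTS, hours, band):.4f}")
```

In this made-up table, the 6-hour group has the lower crude rate only because most of its members are younger. Within each age band, it fares worse. Comparing rates within age bands, or regressing mortality on sleep with age and other controls, is one way to address this kind of omitted variable.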
(b) A student is interested in determining whether imprisonment has a permanent effect on a person’s
wage rate. She collects data on a random sample of people who have been released from prison.
The data set includes information on each person’s current wage, education, age, ethnicity,
gender, tenure (time in current job), occupation, and union status.
She also collects wage information from a random sample of people who have never served time in
prison. She makes sure that these two samples are statistically similar on all columns that could
confound the relationship of interest (education, age, ethnicity, gender, tenure (time in current
job), occupation, and union status).
She merges these two samples and keeps an indicator column that records whether the person has
ever been imprisoned (0/1). The researcher plans to estimate the effect of imprisonment on
wages by regressing wages on this incarceration indicator, including in the regression
the other potential determinants of wages (education, tenure, union status, and so on).
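One concern with the planned regression is selection on unobservables. The simulation below illustrates it under an assumed data-generating process that is entirely hypothetical. An unobserved trait u raises the chance of imprisonment and also lowers wages, and the true effect of prison is set to -2. The regression that omits u (as the planned one must) overstates the wage penalty. Controlling for u, which is infeasible in practice, recovers the true value. The sketch assumes NumPy is available, and `ols` is an illustrative helper rather than a library function.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000

# Hypothetical data-generating process (an assumption for illustration):
# an unobserved trait u raises the chance of imprisonment and lowers wages;
# the true effect of prison on wages is -2.
u = rng.normal(size=n)
educ = rng.normal(12, 2, size=n)
prison = (u + rng.normal(size=n) > 1).astype(float)
wage = 20 + 1.5 * educ - 2.0 * prison - 3.0 * u + rng.normal(size=n)


def ols(y, *columns):
    """OLS coefficients of y on a constant plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# The planned regression: wages on the prison indicator plus observed controls.
b_planned = ols(wage, prison, educ)
# Infeasible benchmark that also controls for the unobserved trait u.
b_oracle = ols(wage, prison, educ, u)

print("prison coefficient, planned regression:", round(b_planned[1], 2))
print("prison coefficient, controlling for u: ", round(b_oracle[1], 2))
```

This illustrates only one problem. Whether controls such as occupation, tenure, and union status are themselves affected by imprisonment is a separate issue that the simulation does not model.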
2 Real Estate Business Analyst
A business analyst for a real estate firm collected data from a random sample of 220 home sales from
a community. Let Price denote the selling price (in hundreds of thousands of SGD), BDR denote the
number of bedrooms, Bath denote the number of bathrooms, Hsize denote the size of the house (in
square feet), Lsize denote the lot size (in square feet), Age denote the age of the house (in years), and
Poor denote a binary variable that is equal to 1 if the condition of the house is reported as “poor.” An
estimated regression yields:
$$\widehat{Price} = 119.2 + 0.485\,BDR + 23.4\,Bath + 0.156\,Hsize + 0.002\,Lsize + 0.090\,Age - 48.8\,Poor \tag{1}$$
(a) Suppose that a homeowner adds a new bathroom to her house, which increases the size of the
house by 100 square feet. What is the expected increase in the value of the house?
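The key step here is that the renovation changes two regressors at once: Bath rises by 1 and Hsize rises by 100. Because regression (1) is linear, the change in predicted Price is the sum of each coefficient times the change in its regressor, with all other regressors held fixed. The sketch below encodes that arithmetic. `predicted_change` is a hypothetical helper name, and the result is in the units of Price.

```python
# Slope coefficients from the estimated regression (1).
COEF = {"BDR": 0.485, "Bath": 23.4, "Hsize": 0.156, "Lsize": 0.002, "Age": 0.090, "Poor": -48.8}


def predicted_change(coef, changes):
    """Change in predicted Price when each named regressor moves by the given amount,
    holding every other regressor fixed (linear model, so the effects add up)."""
    return sum(coef[name] * delta for name, delta in changes.items())


# A new bathroom that also adds 100 square feet of floor area.
change = predicted_change(COEF, {"Bath": 1, "Hsize": 100})
print(round(change, 2))
```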
(b) The analyst wants to test the impact of house color (Red, Blue, or Green) on house price and
estimates the model below:
$$\widehat{Price} = \hat{\beta}_0 + \hat{\beta}_1 \times Red + \hat{\beta}_2 \times Green \tag{2}$$
The mean price of all houses in the sample is 58.97. The mean price of all red houses in the
sample is 63.12. The mean price is 36.34 for blue houses and 47.14 for green houses.
What are the estimated values of $\hat{\beta}_1$ and $\hat{\beta}_2$ in the model above?
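Model (2) includes only Red and Green dummies, so Blue is the omitted base category. With dummy-only regressors, OLS reproduces group means. The intercept is the base-group mean, and each dummy coefficient is that group's mean minus the base-group mean. The sketch below checks this numerically with NumPy on hypothetical prices, two houses per color. These prices are chosen only so that the color means match those given in the question. They are not the analyst's data, and their overall mean does not equal 58.97, since the overall mean depends on group sizes and does not enter the coefficients.

```python
import numpy as np

# Hypothetical prices (two houses per color), chosen only so that each color's
# mean equals the sample mean given in the question; not the analyst's data.
prices = {
    "Red": [62.12, 64.12],
    "Blue": [35.34, 37.34],
    "Green": [46.14, 48.14],
}

y, red, green = [], [], []
for color, values in prices.items():
    for p in values:
        y.append(p)
        red.append(1.0 if color == "Red" else 0.0)
        green.append(1.0 if color == "Green" else 0.0)

# Blue is the omitted (base) category, so only Red and Green dummies enter.
X = np.column_stack([np.ones(len(y)), red, green])
beta_hat, *_ = np.linalg.lstsq(X, np.array(y), rcond=None)
print(np.round(beta_hat, 2))
```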
(c) The analyst fails to reject the null hypothesis in the hypothesis test for $\hat{\beta}_1$, but rejects the
null in favor of the alternative hypothesis for $\hat{\beta}_0$ and $\hat{\beta}_2$. Interpret this in the context of
population-level mean prices (for all houses and/or houses of each color). Is there a relationship between
the color of a house and its selling price?
3 Software Pilot Test
NUS is developing new software aimed at improving students’ academic performance. Students
were asked to volunteer to pilot-test the software. The following results were derived from the
pilot test with eight students. Grade is the score each student obtained on a class quiz (full mark: 10).
Adopt is a binary variable that equals 1 if the volunteer student used the software when preparing for the
quiz and 0 otherwise.
$$\widehat{Grade} = 7.75 + 1.5\,Adopt \tag{3}$$
(a) How do you interpret the values 7.75 and 1.5 in the model above?
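The eight students' actual scores are not listed here, so the sketch below uses hypothetical scores chosen to reproduce Model (3). It checks numerically that, with a single binary regressor, the OLS intercept equals the mean Grade of non-adopters and the slope equals the difference between the adopters' and non-adopters' mean Grade. It uses plain Python. `simple_ols` and `group_mean` are illustrative helper names.

```python
# Hypothetical quiz scores for eight volunteers (not the actual pilot data),
# chosen so that the OLS fit matches Model (3).
adopt = [0, 0, 0, 0, 1, 1, 1, 1]
grade = [7, 8, 8, 8, 9, 10, 9, 9]


def simple_ols(x, y):
    """Intercept and slope of an OLS regression of y on a constant and x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def group_mean(x, y, value):
    """Mean of y among observations with x == value."""
    ys = [yi for xi, yi in zip(x, y) if xi == value]
    return sum(ys) / len(ys)


b0, b1 = simple_ols(adopt, grade)
print(b0, b1)
print(group_mean(adopt, grade, 0), group_mean(adopt, grade, 1) - group_mean(adopt, grade, 0))
```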
(b) If we had many more volunteers and still obtained the same estimates, could we conclude that using the
software causally increases academic performance? Why or why not?
(c) If you had complete control over the pilot testing (ethics and money are not a constraint), how
would you measure the effectiveness of this software? Please explain your actions and choices with
rigorous reasoning.
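One possible design is a randomized controlled trial. Draw students at random from the whole cohort rather than relying on volunteers, randomly assign them to use the software or not, give both groups the same quiz, and compare average scores. Random assignment makes adoption independent of traits such as motivation. The sketch below covers only the assignment and comparison steps. The student IDs, the cohort size of 200, the seed, and the helper names are all hypothetical.

```python
import random


def randomize(participants, seed=None):
    """Randomly split participants into treatment and control groups of equal size
    (if the count is odd, the extra participant goes to control)."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def difference_in_means(treatment_scores, control_scores):
    """Estimated average effect: mean outcome of the treated minus mean of the controls."""
    return (sum(treatment_scores) / len(treatment_scores)
            - sum(control_scores) / len(control_scores))


# Hypothetical IDs for students sampled at random from the whole cohort.
students = [f"S{i:03d}" for i in range(1, 201)]
treatment, control = randomize(students, seed=2101)
```

After the quiz, `difference_in_means` applied to the two groups' scores estimates the average effect of the software. A standard two-sample test on those scores then assesses whether the difference is statistically significant.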
4 Transformations
Two students from BT2101 are tasked with running an OLS regression and interpreting the results. The
raw data assigned to them contains only two columns: the dependent variable (Y) and the independent
variable (X).
Student 1 trains the following model and reports the estimated values of the slope ($\beta_1$) and intercept
($\beta_0$):
$$\hat{Y} = \beta_0 + \beta_1 X \tag{4}$$
Student 2 standardizes the independent variable observations using the following transformation,
where $\mu_X$ is the mean of the independent variable observations and $\sigma_X$ is their standard deviation:
$$X' = \frac{X - \mu_X}{\sigma_X}$$
Student 2 then uses the standardized independent variable to run their regression:
$$\hat{Y} = \beta_0' + \beta_1' X' \tag{5}$$
• Student 1 reports that $\beta_0$ is positive and statistically significant (at a significance level mutually
agreed upon ahead of time). Can we use this information to predict the direction and statistical
significance of $\beta_0'$? Why, or why not?
• Student 1 reports that $\beta_1$ is negative and statistically insignificant (at a significance level mutually
agreed upon ahead of time). Can we use this information to predict the direction and statistical
significance of $\beta_1'$? Why, or why not?
• Student 1 reports that the $R^2$ of their model is 63%. Can we use this information to predict the
$R^2$ of the model reported by Student 2? Why, or why not?
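A quick numerical check compares the two students' fits on simulated data that are purely hypothetical. The data are generated with a positive intercept and a mean of X far from zero. The sketch assumes NumPy and uses a hand-rolled helper, `simple_ols` (an illustrative name, not a library function). For σ_X it uses the population standard deviation (NumPy's default ddof=0). The relationships checked hold for either choice as long as it is applied consistently.

```python
import numpy as np


def simple_ols(x, y):
    """OLS of y on a constant and x: intercept, slope, R^2 and the slope's t-statistic."""
    n = len(x)
    x_bar, y_bar = x.mean(), y.mean()
    sxx = np.sum((x - x_bar) ** 2)
    b1 = np.sum((x - x_bar) * (y - y_bar)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = y - (b0 + b1 * x)
    r2 = 1 - np.sum(resid ** 2) / np.sum((y - y_bar) ** 2)
    se_b1 = np.sqrt(np.sum(resid ** 2) / (n - 2) / sxx)
    return b0, b1, r2, b1 / se_b1


# Hypothetical data: positive intercept, mean of X far from zero.
rng = np.random.default_rng(0)
x = rng.normal(50, 10, size=200)
y = 10 - 0.5 * x + rng.normal(0, 5, size=200)

# Student 2's transformation (population standard deviation, ddof=0).
x_std = (x - x.mean()) / x.std()

b0, b1, r2, t1 = simple_ols(x, y)
b0_s, b1_s, r2_s, t1_s = simple_ols(x_std, y)
print(f"raw:          b0={b0:.3f}  b1={b1:.3f}  R2={r2:.3f}  t(b1)={t1:.2f}")
print(f"standardized: b0={b0_s:.3f}  b1={b1_s:.3f}  R2={r2_s:.3f}  t(b1)={t1_s:.2f}")
```

In this sketch, the standardized slope is the raw slope multiplied by σ_X, so its sign is unchanged. The slope's t-statistic and R² are identical across the two fits. The standardized intercept equals the mean of Y, which with these hypothetical data has the opposite sign to β_0.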
5 Non-linear Transformation
Consider a regression model:
$$Y_i = \beta_0 + \beta_1 \ln(X_{1i}) + \beta_2 X_{2i} + \beta_3 D_{1i} + \beta_4 D_{2i} + \beta_5 (X_{2i} \times D_{1i}) + \beta_6 (D_{1i} \times D_{2i}) + \epsilon_i \tag{6}$$
where $X_{1i}$ and $X_{2i}$ are continuous variables and $D_{1i}$ and $D_{2i}$ are binary variables.
Answer the following questions.
(A) By how much does Y change when $X_{1i}$ increases by 0.05%?
(B) By how much does Y change when $X_{2i}$ increases by 1 unit?
(C) What is the effect on Y of going from $D_{2i} = 0$ to $D_{2i} = 1$?
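Equation (6) is linear in the parameters. Each effect can therefore be found by evaluating the equation before and after the change, with everything else held fixed, and taking the difference. The helpers below encode those differences; their names are hypothetical and the coefficient values are made up for illustration. The 0.05% case is computed exactly as beta1 * ln(1.0005) and compared with the usual small-change approximation beta1 * 0.0005.

```python
import math


def dy_from_x1_pct(beta1, pct):
    """Exact change in Y when X1 rises by pct percent, all else fixed:
    beta1 * [ln(X1 * (1 + pct/100)) - ln(X1)] = beta1 * ln(1 + pct/100)."""
    return beta1 * math.log(1 + pct / 100)


def dy_from_x2_unit(beta2, beta5, d1):
    """Change in Y when X2 rises by one unit; depends on D1 through the X2*D1 term."""
    return beta2 + beta5 * d1


def dy_from_d2_switch(beta4, beta6, d1):
    """Change in Y when D2 goes from 0 to 1; depends on D1 through the D1*D2 term."""
    return beta4 + beta6 * d1


# Hypothetical coefficient values, for illustration only.
b1, b2, b4, b5, b6 = 2.0, 0.8, 1.5, -0.3, 0.6
print(dy_from_x1_pct(b1, 0.05), b1 * 0.05 / 100)  # exact vs. small-change approximation
for d1 in (0, 1):
    print(d1, dy_from_x2_unit(b2, b5, d1), dy_from_d2_switch(b4, b6, d1))
```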