BT2101 Mid-Term Problem-Set

This problem-set is shared with the intention of simulating the mid-term exam. You do not need to submit your solutions to the problem-set. Please feel free to discuss the problems (and solutions) with your peers and post any clarification questions on the LumiNUS Forum. Suggested solutions to the PS will be shared as a separate file.

September 21, 2022

1 Critique of Research Proposals

Critique each of the following proposed research plans. Your critique should explain any problems with the proposed research and describe how the research plan might be improved (e.g., addressing omitted variable issue). Include a discussion of any additional data that need to be collected (e.g., adding more control variables) and the appropriate statistical techniques (e.g., multiple regression) for analyzing those data.

(a) A recent study found that the death rate for people who sleep less than 6 hours per night is lower than the death rate for people who sleep 8 or more hours. 1 million observations used for this study came from a random survey of Americans aged 30 to 102. The study tracked sleep and death rates over a period of 5 years. The death rate was measured as follows: the death rate for people sleeping 8 hours was calculated as the ratio of the number of deaths over the span of the study among people sleeping 8 hours to the total number of survey respondents who slept 8 hours. This calculation was then repeated for people sleeping 6 hours and so on. Based on this summary, would you recommend that people who sleep 9 hours per night consider reducing their sleep to 6 or 7 hours if they want to prolong their lives? Why or why not? Explain.

(b) A student is interested in determining whether imprisonment has a permanent effect on a person’s wage rate. She collects data on a random sample of people who have been released from prison.
The data set includes information on each person’s current wage, education, age, ethnicity, gender, tenure (time in current job), occupation, and union status. She also collects wage information from a random sample of people who have never served time in prison. She makes sure that these two samples are statistically similar for all columns that could confound the relationship of interest (education, age, ethnicity, gender, tenure (time in current job), occupation, and union status). She merges these two samples and maintains an indicator column that records if the person has ever been imprisoned (0/1). The researcher plans to estimate the effect of imprisonment on wages by regressing wages on an indicator variable for incarceration, including in the regression the other potential determinants of wages (education, tenure, union status, and so on).

2 Real Estate Business Analyst

A business analyst for a real estate firm collected data from a random sample of 220 home sales from a community. Let Price denote the selling price (in hundreds of thousands of SGD), BDR denote the number of bedrooms, Bath denote the number of bathrooms, Hsize denote the size of the house (in square feet), Lsize denote the lot size (in square feet), Age denote the age of the house (in years), and Poor denote a binary variable that is equal to 1 if the condition of the house is reported as “poor.” An estimated regression yields:

$$\widehat{\mathit{Price}} = 119.2 + 0.485\,\mathit{BDR} + 23.4\,\mathit{Bath} + 0.156\,\mathit{Hsize} + 0.002\,\mathit{Lsize} + 0.090\,\mathit{Age} - 48.8\,\mathit{Poor} \tag{1}$$

(a) Suppose that a homeowner adds a new bathroom to her house, which increases the size of the house by 100 square feet. What is the expected increase in the value of the house?

(b) The analyst wants to test the impact of house color (Red, Blue or Green) on house price. He/She estimates the model below:

$$\widehat{\mathit{Price}} = \hat{\beta}_0 + \hat{\beta}_1 \cdot \mathit{Red} + \hat{\beta}_2 \cdot \mathit{Green} \tag{2}$$

The mean price of all houses in the sample is 58.97. The mean price of all red houses in the sample is 63.12.
The mean price is 36.34 for blue houses and 47.14 for green houses. What will be the estimated values of $\hat{\beta}_1$ and $\hat{\beta}_2$ in the model above?

(c) The analyst fails to reject the null hypothesis in a hypothesis test for $\hat{\beta}_1$, but accepts the alternative hypothesis for $\hat{\beta}_0$ and $\hat{\beta}_2$. Interpret this in the context of population-level mean prices (for all houses and/or houses of each color). Is there a relationship between the color of the house and its selling price?

3 Software Pilot Test

NUS is developing new software with the aim of improving students’ academic performance. Students were asked to volunteer to pilot test the software. The following table contains the results derived from the pilot test for eight students. Grade is the score each student gets in a class quiz, and the full mark is 10. Adopt is a binary variable that equals 1 if the volunteer student uses the software when preparing for the quiz (and equals 0 otherwise).

$$\widehat{\mathit{Grade}} = 7.75 + 1.5\,\mathit{Adopt} \tag{3}$$

(a) How do you interpret the 7.75 and 1.5 seen in the above model?

(b) If we had many more volunteers and still obtained the same estimates, could we say that using the software causally increases academic performance? Why or why not?

(c) If you had complete control over the pilot testing (ethics and money are not a constraint), how would you measure the effectiveness of this software? Please explain your actions and choices with rigorous reasoning.

4 Transformations

Two students from BT2101 are tasked with running an OLS regression and interpreting the results. The raw data assigned to them contains only two columns, the dependent variable ($Y$) and the independent variable ($X$).
Student 1 trains the following model and reports the estimated values for slope ($\beta_1$) and intercept ($\beta_0$):

$$\hat{Y} = \beta_0 + \beta_1 X \tag{4}$$

Student 2 standardizes the independent variable observations by using the following transformation, where $\mu_X$ is the mean of the independent variable observations and $\sigma_X$ is their standard deviation:

$$X' = \frac{X - \mu_X}{\sigma_X}$$

Student 2 then uses the standardized independent variable to run their regression:

$$\hat{Y} = \beta'_0 + \beta'_1 X' \tag{5}$$

• Student 1 reports that $\beta_0$ is positive and statistically significant (at a significance level mutually agreed upon ahead of time). Can we use this information to predict the direction and statistical significance of $\beta'_0$? Why, or why not?

• Student 1 reports that $\beta_1$ is negative and statistically insignificant (at a significance level mutually agreed upon ahead of time). Can we use this information to predict the direction and statistical significance of $\beta'_1$? Why, or why not?

• Student 1 reports that the $R^2$ of their model is 63%. Can we use this information to predict the $R^2$ of the model reported by Student 2? Why, or why not?

5 Non-linear Transformation

Consider a regression model:

$$Y_i = \beta_0 + \beta_1 \cdot \ln(X_{1i}) + \beta_2 \cdot X_{2i} + \beta_3 \cdot D_{1i} + \beta_4 \cdot D_{2i} + \beta_5 \cdot (X_{2i} \cdot D_{1i}) + \beta_6 \cdot (D_{1i} \cdot D_{2i}) + \epsilon_i \tag{6}$$

where $X_{1i}$ and $X_{2i}$ are continuous variables and $D_{1i}$ and $D_{2i}$ are binary variables. Answer the following questions.

(A) By how much does $Y$ change when $X_{1i}$ is increased by 0.05%?

(B) By how much does $Y$ change when $X_{2i}$ is increased by 1 unit?

(C) What is the effect on $Y$ of going from $D_{2i} = 0$ to $D_{2i} = 1$?
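
If you want to sanity-check an analytical answer to (A) to (C), one option is to evaluate the deterministic part of model (6) before and after the change and take the difference. The Python sketch below does this under stated assumptions: the coefficient values in `BETA` and the evaluation point in the usage lines are hypothetical, chosen only for illustration, and the helper names (`predict`, `change_x1_pct`, `change_x2_unit`, `change_d2_switch`) are not part of the problem set. The error term is omitted, so the differences are changes in the predicted value of Y.

```python
import math

# Hypothetical coefficients (beta_0 .. beta_6), chosen only for illustration.
# They are NOT estimates from the problem set.
BETA = (1.0, 2.0, 0.5, -1.0, 3.0, 0.25, -0.75)


def predict(x1, x2, d1, d2, beta=BETA):
    """Deterministic part of model (6); the error term is omitted."""
    b0, b1, b2, b3, b4, b5, b6 = beta
    return (b0 + b1 * math.log(x1) + b2 * x2 + b3 * d1 + b4 * d2
            + b5 * x2 * d1 + b6 * d1 * d2)


def change_x1_pct(pct, x1, x2, d1, d2):
    """Change in predicted Y when X1 rises by `pct` percent, all else fixed."""
    return predict(x1 * (1 + pct / 100.0), x2, d1, d2) - predict(x1, x2, d1, d2)


def change_x2_unit(x1, x2, d1, d2):
    """Change in predicted Y when X2 rises by one unit, all else fixed."""
    return predict(x1, x2 + 1, d1, d2) - predict(x1, x2, d1, d2)


def change_d2_switch(x1, x2, d1):
    """Change in predicted Y when D2 goes from 0 to 1, all else fixed."""
    return predict(x1, x2, d1, 1) - predict(x1, x2, d1, 0)


if __name__ == "__main__":
    # Arbitrary evaluation point; repeat for both values of D1 to see
    # whether each change depends on the interaction terms.
    for d1 in (0, 1):
        print("D1 =", d1)
        print("  X1 up 0.05%:", change_x1_pct(0.05, 10.0, 4.0, d1, 0))
        print("  X2 up 1 unit:", change_x2_unit(10.0, 4.0, d1, 0))
        print("  D2 from 0 to 1:", change_d2_switch(10.0, 4.0, d1))
```

Running the loop for both values of D1, and with a few different starting values of X1 and X2, shows which of the three changes depend on the other regressors. Your analytical answers should explain that pattern.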