Western Michigan University
Department of Statistics
Statistics 5680 – Summer I
MIDTERM EXAM

Instructions: This take-home exam contains 3 pages and 2 problems. Show enough work to convince me that you know what you are doing. Please remember to JUSTIFY JUSTIFY. It is no fun to read solutions that are written with the minimal amount of work, use as few words as possible, and fail to carefully justify their steps, even if they are correct. I will be caring a lot about justifications.

You are NOT allowed to communicate with anyone (in person, by phone, email, text or otherwise) about your exam other than me. If you make any assumptions because of lack of information, clearly state the assumptions that you are making.

Copy the statement below to your answer document, print your name, and write the date.

    I certify here that the work on this exam is solely mine. I did not receive any assistance from others and I did not provide any assistance to others.

    PRINT YOUR NAME:
    DATE:

Turning in Your Exam: You should submit your exam to the Dropbox on Elearning.

Due: by 5pm May 31, 2022. Late exams will NOT be accepted and will not be scored.

I wish you all well!

1. (50 points) You have taken a course in regression analysis, and a group of public health researchers asks you to serve as a statistical consultant. The researchers have conducted an orthodontic study involving 200 children, 100 boys and 100 girls, 8 to 14 years of age. On each child, the distance (mm) from the center of the pituitary gland to the pterygomaxillary fissure was measured. The study's objective is to investigate the association between the orthodontic distance measurements (denoted by distance) and age (denoted by age). The scatterplot is shown in Figure 1.

[Figure 1: Orthodontic study scatter plot. Horizontal axis: age (ticks at 8, 10, 12, 14). Vertical axis: Orthodontic Distance (ticks from 10 to 35).]

You suggest to the researchers a simple linear regression model with a normal error term:

Model 1A: $\text{distance}_i = \beta_0 + \beta_1\,\text{age}_i + \varepsilon_i$,

where the $\varepsilon_i$'s are independently distributed as $N(0, \sigma^2)$.

(a) (5 points) Based on Figure 1, discuss the appropriateness of Model 1A for describing these data.

(b) (5 points) Researchers notice that in Model 1A you assume a constant variance for all $\varepsilon_i$'s. Why would you want to assume that the mean of the error terms is 0, i.e. $E(\varepsilon_i) = 0$? Clearly explain your reasoning.

(c) (5 points) Researchers argue that when the age is zero, i.e. age = 0, the orthodontic distance is also zero, i.e. distance = 0. Thus there is no need for a y-intercept in Model 1A, and the researchers propose another model:

Model 1B: $\text{distance}_i = \beta_1\,\text{age}_i + \varepsilon_i$.

As a statistical consultant, do you agree with the researchers' Model 1B? Clearly explain your reasoning.

(d) (5 points) Researchers plan to use the transformation $\text{age}^*_i = \text{age}_i^2$ and modify Model 1A to

Model 1C: $\text{distance}_i = \beta_0 + \beta_1\,\text{age}^*_i + \varepsilon_i$.

Suppose that the researchers wish to compare Model 1A and Model 1C for predicting distance by using the "General linear test approach" of Lecture 6 (slides 16-18), which compares full and reduced models. Do you think that this comparison can be done? If yes, write the full and reduced models. If not, clearly explain your reasoning.

The R commands and output for Model 1A are:

    > fit1 = lm(distance ~ age, data = dental.data)
    > summary(fit1)

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  1.03672    1.08298   0.957     0.34
    age          2.05135    0.09815  20.901   <2e-16 ***
    ---
    Residual standard error: 2.511 on 198 degrees of freedom
    Multiple R-squared: 0.6881, Adjusted R-squared: 0.6865
    F-statistic: 436.8 on 1 and 198 DF, p-value: < 2.2e-16

(e) (5 points) Provide the fitted regression model equation.

(f) (5 points) If age increases by 3 years, how does $\widehat{\text{distance}}$ change?

(g) (5 points) Suppose that researchers are trying to determine whether there is a linear relationship between distance and age. Carefully state the null and alternative hypotheses you will test, provide a test statistic and p-value for the test, and state your conclusion (using significance level 0.05).

(h) (5 points) Researchers notice that the coefficient of determination $R^2$ is substantially large, i.e. $R^2 = 0.688 \approx 69\%$. They conclude that Model 1A is a good model for predicting distance. Do you agree with their conclusion? Explain your reasoning.

(i) (5 points) What is an estimate of $\sigma^2$?

(j) (5 points) One boy who was 12 years old had an orthodontic distance of 26.3 mm. What is his predicted orthodontic distance? Calculate his residual.

2. (20 points) Consider the orthodontic study in Problem 1. The R commands and output for the analysis of variance of Model 1A are:

    > fit1 = lm(distance ~ age, data = dental.data)
    > anova(fit1)
    Analysis of Variance Table

    Response: distance
               Df Sum Sq Mean Sq F value    Pr(>F)
    age         1 2755.1 2755.08  436.84 < 2.2e-16 ***
    Residuals 198 1248.8    6.31

(a) (5 points) Compute SSE, SSR and SSTO.

(b) (5 points) Obtain the coefficient of correlation r by using results from part (a).

(c) (5 points) State precisely the null and alternative hypotheses being tested.

(d) (5 points) What is your conclusion based on the test in part (c) (using significance level 0.05)?

END OF EXAM.
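
Note on running the quoted commands: the R commands in Problems 1 and 2 read a data frame named dental.data, which is not included with this exam. The sketch below is only one way to make those commands executable, under stated assumptions: it builds a simulated stand-in with the same column names (distance, age) and the same sample size (200 children, 100 boys and 100 girls). The sex column, the four age values (taken from the tick marks of Figure 1), and the generating values (intercept 1, slope 2, error standard deviation 2.5) are all hypothetical choices, so the output will not match the numbers printed in the exam.

```r
# Hypothetical stand-in for dental.data. The real data are not part of the
# exam, so every value generated here is simulated for illustration only.
set.seed(5680)
n <- 200

dental.data <- data.frame(
  # Assumption: ages fall on the four tick marks shown in Figure 1.
  age = rep(c(8, 10, 12, 14), times = n / 4),
  # Assumption: a sex label, 100 boys followed by 100 girls.
  sex = rep(c("boy", "girl"), each = n / 2)
)

# Assumed generating model: a straight line in age plus independent normal
# errors with mean 0 and constant variance, i.e. the form of Model 1A.
dental.data$distance <- 1 + 2 * dental.data$age + rnorm(n, mean = 0, sd = 2.5)

# The two commands quoted in the exam, now runnable on the simulated data.
fit1 <- lm(distance ~ age, data = dental.data)
print(summary(fit1))  # coefficient table, residual standard error, R-squared
print(anova(fit1))    # analysis of variance table for the same fit
```

Because the seed is fixed, repeated runs give the same simulated data set; changing the assumed slope or error standard deviation shows how the two tables respond.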