Uploaded by Ernest Au

Midterm Summer 2022

Western Michigan University
Department of Statistics
Statistics 5680 – Summer I
MIDTERM EXAM
Instructions: This take-home exam contains 3 pages and 2 problems. Show enough work to
convince me that you know what you are doing. Please remember to JUSTIFY, JUSTIFY. It is no
fun to read solutions that show the minimal amount of work, use as few words as possible, and
fail to justify their steps carefully, even if they are correct. I will care a lot about justifications.
You are NOT allowed to communicate with anyone (in person, by phone, email, text
or otherwise) about your exam other than me. If you make any assumptions because of lack
of information, clearly state the assumptions that you are making.
Copy the statement below to your answer document and print your name and write
the date.
I certify here that the work on this exam is solely mine. I did not receive any assistance from others
and I did not provide any assistance to others.
PRINT YOUR NAME:
DATE:
Turning in Your Exam: You should submit your exam to the Dropbox on Elearning.
Due: by 5pm May 31, 2022. Late exams will NOT be accepted and will not be scored.
I wish you all well!
1. (50 points) You have taken a course in regression analysis, and a group of public health researchers asks you to serve as a statistical consultant. The researchers have conducted an
orthodontic study involving 200 children, 100 boys and 100 girls, 8 to 14 years of age. On each
child, the distance (mm) from the center of the pituitary gland to the pterygomaxillary fissure
was measured. The study's objective is to investigate the association between the orthodontic
distance measurements (denoted by distance) and age (denoted by age). The scatterplot is
shown in Figure 1.
[Figure 1: Orthodontic study scatter plot. x-axis: age (ticks from 8 to 14); y-axis: Orthodontic Distance (ticks from 10 to 35).]
You suggest to the researchers a simple linear regression model with a normal error term:
Model 1A:
$\text{distance}_i = \beta_0 + \beta_1\,\text{age}_i + \varepsilon_i$,
where the $\varepsilon_i$'s are independently distributed as normal $N(0, \sigma^2)$.
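To make the assumptions of Model 1A concrete, the following is a minimal R sketch that simulates data from a model of this form and fits it with lm(). The parameter values (beta0, beta1, sigma), the uniform age distribution, and the names sim.data and sim.fit are assumptions chosen only for illustration; they are not taken from the study data.

```r
# Illustrative simulation of a simple linear regression with normal errors.
# All numeric values below are hypothetical, not estimates from the exam data.
set.seed(1)
n     <- 200   # same sample size as the study
beta0 <- 1     # assumed intercept
beta1 <- 2     # assumed slope
sigma <- 2.5   # assumed error standard deviation

age      <- runif(n, min = 8, max = 14)       # ages between 8 and 14
epsilon  <- rnorm(n, mean = 0, sd = sigma)    # independent N(0, sigma^2) errors
distance <- beta0 + beta1 * age + epsilon     # response generated by the model

sim.data <- data.frame(age = age, distance = distance)
sim.fit  <- lm(distance ~ age, data = sim.data)
coef(sim.fit)                                 # estimated intercept and slope
```

Because the errors have mean zero and constant variance by construction, the simulated points scatter evenly around a straight line, which is the pattern Model 1A presumes.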
(a) (5 points) Based on Figure 1, discuss the appropriateness of Model 1A for describing
these data.
(b) (5 points) Researchers notice that in Model 1A, you assume a constant variance for all
$\varepsilon_i$'s. Why would you want to assume that the mean of the error terms is 0, i.e. $E(\varepsilon_i) = 0$?
Clearly explain your reasoning.
(c) (5 points) Researchers argue that when the age is zero, i.e. age = 0, the orthodontic
distance is also zero, i.e. distance = 0. Thus there is no need for a y-intercept in
Model 1A, and the researchers propose another model:
Model 1B:
$\text{distance}_i = \beta_1\,\text{age}_i + \varepsilon_i$.
As a statistical consultant, do you agree with the researchers' Model 1B? Clearly explain
your reasoning.
(d) (5 points) Researchers plan to use the transformation $\text{age}^*_i = \text{age}_i^2$ and modify Model 1A
as follows:
Model 1C: $\text{distance}_i = \beta_0 + \beta_1\,\text{age}^*_i + \varepsilon_i$.
Suppose that the researchers wish to compare Model 1A and Model 1C for predicting
distance by using the "General linear test approach" of Lecture 6 (slides 16-18), which
compares full and reduced models. Do you think that this comparison can be done? If
yes, write the full and reduced models. If not, clearly explain your reasoning.
The following R commands fit Model 1A and summarize the fit:
> fit1 = lm(distance ~ age, data = dental.data)
> summary(fit1)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.03672    1.08298   0.957     0.34
age          2.05135    0.09815  20.901   <2e-16 ***
---
Residual standard error: 2.511 on 198 degrees of freedom
Multiple R-squared: 0.6881, Adjusted R-squared: 0.6865
F-statistic: 436.8 on 1 and 198 DF, p-value: < 2.2e-16
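As a reading aid for the summary output, the sketch below checks two standard relationships using the printed numbers: each t value is the estimate divided by its standard error, and in simple linear regression the F-statistic equals the square of the slope's t value. The variable names are hypothetical, and small discrepancies are expected because the printed values are rounded.

```r
# Values copied from the printed summary(fit1) output (rounded as shown there).
est_age <- 2.05135    # slope estimate
se_age  <- 0.09815    # standard error of the slope
df_res  <- 198        # residual degrees of freedom

t_age <- est_age / se_age                    # t value = Estimate / Std. Error
f_val <- t_age^2                             # F = t^2 with one predictor
p_val <- 2 * pt(-abs(t_age), df = df_res)    # two-sided p-value for the slope

round(c(t = t_age, F = f_val), 3)
```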
(e) (5 points) Provide the fitted regression model equation.
(f) (5 points) If age increases by 3 years, how does the predicted distance, $\widehat{\text{distance}}$, change?
(g) (5 points) Suppose that researchers are trying to determine whether there is a linear
relationship between distance and age. Carefully state the null and alternative hypotheses
you will test, provide the test statistic and p-value for the test, and state your conclusion
(using significance level 0.05).
(h) (5 points) Researchers notice that the coefficient of determination $R^2$ is fairly
large, i.e. $R^2 = 0.688 \approx 69\%$. They conclude that Model 1A is a good model for
predicting distance. Do you agree with their conclusion? Explain your reasoning.
(i) (5 points) What is an estimate of $\sigma^2$?
(j) (5 points) One boy who was 12 years old had an orthodontic distance of 26.3 mm. What
is his predicted orthodontic distance? Calculate his residual.
2. (20 points) Consider the orthodontic study in Problem 1. The following R commands produce
the analysis of variance table for Model 1A:
> fit1 = lm(distance ~ age, data = dental.data)
> anova(fit1)
Analysis of Variance Table
Response: distance
           Df Sum Sq Mean Sq F value    Pr(>F)
age         1 2755.1 2755.08  436.84 < 2.2e-16 ***
Residuals 198 1248.8    6.31
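The columns of the ANOVA table are linked by simple arithmetic: each Mean Sq is the Sum Sq divided by its Df, and the F value is the ratio of the two mean squares. The sketch below reproduces those columns from the printed sums of squares; the variable names are hypothetical, and the results match the table only up to rounding of the printed inputs.

```r
# Sums of squares and degrees of freedom copied from the printed anova(fit1) table.
ss_age <- 2755.08;  df_age <- 1      # row "age"
ss_res <- 1248.8;   df_res <- 198    # row "Residuals"

ms_age <- ss_age / df_age            # Mean Sq for age
ms_res <- ss_res / df_res            # Mean Sq for Residuals
f_val  <- ms_age / ms_res            # F value = ratio of mean squares

round(c(MS_age = ms_age, MS_res = ms_res, F = f_val), 2)
```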
(a) (5 points) Compute SSE, SSR and SSTO.
(b) (5 points) Obtain the coefficient of correlation r by using results from part (a).
(c) (5 points) State precisely the null and alternative hypotheses that are being tested.
(d) (5 points) What is your conclusion based on the test in part (c) (using significance level
0.05)?
END OF EXAM.