Solution sheet for HW1
1.5. No.
The simple linear regression model is Yᵢ = β₀ + β₁Xᵢ + εᵢ. Taking
expectations gives E{Yᵢ} = β₀ + β₁Xᵢ, with no error term. The model
is “simple” in that there is only one predictor variable (KNNL, p. 9).
1.7. a. No. Model (1.1) does not assume any particular distribution
for the error terms, so an exact probability for Y cannot be computed.
b. Yes. P(195 ≤ Y ≤ 205) = P((195 − 200)/√25 ≤ Z ≤ (205 − 200)/√25)
= P(−1 ≤ Z ≤ 1) = .68
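The probability in part b can be checked numerically. The DATA step below is an illustrative sketch, not part of the original solution. It assumes Y ~ N(200, 25), the mean and variance that appear in the standardization above, and the variable names are chosen only for this example.

```sas
/* Check of 1.7 b: P(195 <= Y <= 205), assuming Y ~ N(200, 25) */
data _null_;
   mu    = 200;        /* E{Y}, taken from the standardization   */
   sigma = sqrt(25);   /* standard deviation, since sigma^2 = 25 */
   p = cdf('NORMAL', 205, mu, sigma) - cdf('NORMAL', 195, mu, sigma);
   put 'P(195 <= Y <= 205) = ' p 6.4;   /* written to the SAS log */
run;
```

The value written to the log should agree with .68 after rounding to two decimals.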
1.12.
a. Observational
b. This conclusion may not be true. Causation is not clear in
this case since other factors that could have influenced frequency
of colds cannot be excluded.
c. Gender, age, family income and so on.
d. One possible way out of this situation is to try to include
other explanatory variables (such as those in part c) into the
original model.
1.19 a.
Parameter Estimates
Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|  95% Confidence Limits
Intercept   1             2.11405         0.32089     6.59    <.0001    1.47859    2.74951
ACT         1             0.03883         0.01277     3.04    0.0029    0.01353    0.06412
b₀ = 2.11405, b₁ = 0.03883, Ŷ = 2.11405 + 0.03883X
b. Graphically, the model seems to describe the prevailing
trend rather well, although a substantial amount of variation in the
data hasn’t been explained by it. Note also the presence of several
outliers that could have influenced the fit rather strongly.
c. Ŷₕ = 2.11405 + 0.03883(30) = 3.27895 at Xₕ = 30
d. b₁ = 0.03883
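The point estimate in part c follows directly from the fitted line. The sketch below assumes the level Xₕ = 30, which is the value consistent with the reported 3.27895, and uses the rounded coefficients from the Parameter Estimates table. The variable names are illustrative.

```sas
/* Check of 1.19 c: point estimate of the mean response */
data _null_;
   b0 = 2.11405;       /* intercept estimate from the SAS output     */
   b1 = 0.03883;       /* slope estimate from the SAS output         */
   xh = 30;            /* assumed ACT score; not stated on the sheet */
   yhat = b0 + b1*xh;  /* fitted mean GPA at xh                      */
   put yhat= 8.5;      /* written to the SAS log                     */
run;
```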
1.21
Parameter Estimates
Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|  95% Confidence Limits
Intercept   1            10.20000         0.66332    15.38    <.0001    8.67037   11.72963
X           1             4.00000         0.46904     8.53    <.0001    2.91839    5.08161
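a. Ŷ = 10.20 + 4.00X
b. Ŷₕ = 10.20 + 4.00(1) = 14.2
c. b₁ = 4.0
d. (X̄, Ȳ) = (1, 14.2). The fitted line passes through this point,
since Ŷ = 10.20 + 4.00(1) = 14.2 = Ȳ.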
1.23.
a. The residuals can be read from the “Output Statistics” table in the
SAS output (they are also saved by the statement
“output out=new1 p=pred r=resid”):
i:     1        2        ...   119       120
eᵢ:    0.9676   1.2274   ...   -0.8753   -0.2532
Sum of Residuals                  0
Sum of Squared Residuals          45.81761
Predicted Residual SS (PRESS)     47.61035
Yes, the residuals sum to zero.
b.
Root MSE          0.62313    R-Square   0.0726
Dependent Mean    3.07405    Adj R-Sq   0.0648
Coeff Var        20.27049
MSE = 0.388, √MSE = 0.623 grade points
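The estimates in part b can be recovered from the sum of squared residuals in part a, since MSE = SSE/(n − 2). The sketch below is only a check of that arithmetic. It assumes n = 120 observations (the residual index runs to 120) and uses illustrative variable names.

```sas
/* Check of 1.23 b: MSE and Root MSE from SSE */
data _null_;
   sse = 45.81761;        /* Sum of Squared Residuals from part a */
   n   = 120;             /* assumed number of observations       */
   mse = sse/(n - 2);     /* error mean square on n - 2 df        */
   rootmse = sqrt(mse);   /* estimate of sigma, in grade points   */
   put mse= 8.5 rootmse= 8.5;
run;
```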
2.1 a. Yes; the implied level of significance is α = .05.
b. Note that in the real world the population (X) cannot be equal
to zero, so X = 0 falls outside the scope of the model. Therefore,
any inference concerning the intercept is rather meaningless.
2.4 a.
t(.995; 118) = 2.61814,
.03883 ± 2.61814(.01277),
.00540 ≤ 𝛽1 ≤ .07226
*The same interval can be read from the SAS output (the last digit
differs because SAS does not round the intermediate values):
Parameter Estimates
Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|  99% Confidence Limits
Intercept   1             2.11405         0.32089     6.59    <.0001    1.27390    2.95420
ACT         1             0.03883         0.01277     3.04    0.0029    0.00539    0.07227
(This table also provides the results needed for parts b and c.)
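The hand calculation in part a can be reproduced in a short DATA step. This is a sketch for checking the arithmetic, not part of the original solution. It takes the slope estimate 0.03883, its standard error 0.01277 and the 118 degrees of freedom from the results above, and the variable names are illustrative.

```sas
/* Check of 2.4 a: 99% confidence interval for the slope */
data _null_;
   b1 = 0.03883;              /* slope estimate from the output */
   se = 0.01277;              /* standard error of the slope    */
   df = 118;                  /* degrees of freedom, n - 2      */
   tcrit = tinv(0.995, df);   /* t(.995; 118)                   */
   lower = b1 - tcrit*se;     /* lower 99% confidence limit     */
   upper = b1 + tcrit*se;     /* upper 99% confidence limit     */
   put tcrit= 8.5 lower= 8.5 upper= 8.5;
run;
```

Because the inputs are rounded to five decimals, the limits written to the log should match the hand calculation rather than the last digit of the SAS table.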
b. H₀: β₁ = 0, Hₐ: β₁ ≠ 0.
t* = (.03883 − 0)/.01277 = 3.04072.
If |t*| ≤ 2.61814, do not reject H₀; otherwise reject H₀.
Since |t*| = 3.04072 > 2.61814, reject H₀ and conclude Hₐ.
c. p-value = 0.00291.
Because 0.00291 < 0.01, there is significant evidence to reject H₀
and conclude Hₐ. This is the same conclusion as in part b.
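The test statistic and two-sided p-value in parts b and c can be checked the same way. The sketch below assumes the rounded slope estimate and standard error from the output and 118 degrees of freedom. The variable names are illustrative, and the p-value may differ from the SAS table in the last digit because of that rounding.

```sas
/* Check of 2.4 b and c: t statistic and two-sided p-value */
data _null_;
   b1 = 0.03883;                             /* slope estimate        */
   se = 0.01277;                             /* its standard error    */
   df = 118;                                 /* degrees of freedom    */
   tstar  = (b1 - 0)/se;                     /* test of H0: beta1 = 0 */
   pvalue = 2*(1 - probt(abs(tstar), df));   /* two-sided p-value     */
   put tstar= 8.5 pvalue= 8.5;
run;
```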
SAS CODE
1.19 and 1.23
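data new;
  input GPA ACT;
  datalines;
(COPY AND PASTE THE GIVEN DATASET HERE)
;
run;
/* scatter plot with the fitted regression line (i=rl) */
symbol1 v=dot i=rl;
proc gplot data=new;
  plot GPA*ACT;
run;
/* clb: confidence limits for the coefficients; p r: predicted values and residuals */
proc reg data=new;
  model GPA=ACT/clb p r;
  output out=new1 p=pred r=resid;
run;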
1.21
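data new;
  input Y X;
  datalines;
(COPY AND PASTE THE GIVEN DATASET HERE)
;
run;
/* scatter plot with the fitted regression line (i=rl) */
symbol1 v=dot i=rl;
proc gplot data=new;
  plot Y*X;
run;
/* clb: confidence limits for the coefficients; p r: predicted values and residuals */
proc reg data=new;
  model Y=X/clb p r;
  output out=new1 p=pred r=resid;
run;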
2.4
data new;
  input GPA ACT;
  datalines;
(COPY AND PASTE THE GIVEN DATASET HERE)
;
run;
/* alpha=0.01 makes clb report 99% confidence limits */
proc reg data=new alpha=0.01;
  model GPA=ACT/clb p r;
  output out=new1 p=pred r=resid;
run;