Solution sheet for HW1

1.5
No. The simple linear regression model is $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$; its mean response $E\{Y_i\} = \beta_0 + \beta_1 X_i$ contains no error term. The model is "simple" in that there is only one predictor variable (KNNL p. 9).

1.7
a. No. Model (1.1) does not assume any distribution for the error terms, so no probability involving $Y$ can be computed.
b. Yes. Under the normal error model, $Y \sim N(200, 25)$, so
$P(195 \le Y \le 205) = P\left(\frac{195-200}{\sqrt{25}} \le Z \le \frac{205-200}{\sqrt{25}}\right) = P(-1 \le Z \le 1) \approx .68$

1.12
a. Observational.
b. The conclusion is not necessarily valid. Causation cannot be established here, because other factors that could have influenced the frequency of colds cannot be ruled out.
c. Gender, age, family income, and so on.
d. One possible remedy is to include other explanatory variables (such as those in part c) in the model.

1.19
a. SAS output:

Parameter Estimates
Variable   DF   Parameter   Standard   t Value   Pr > |t|   95% Confidence Limits
                 Estimate      Error
Intercept   1     2.11405    0.32089      6.59     <.0001    1.47859    2.74951
ACT         1     0.03883    0.01277      3.04     0.0029    0.01353    0.06412

$b_0 = 2.11405$, $b_1 = 0.03883$, $\hat{Y} = 2.11405 + 0.03883X$

b. Graphically, the fitted line seems to describe the prevailing trend reasonably well, although a substantial amount of the variation in the data is not explained by it. Note also the presence of several outliers that could have influenced the fit strongly.
c. $\hat{Y}_h = 2.11405 + 0.03883(30) = 3.27895$
d. $b_1 = 0.03883$ (the estimated change in mean GPA per one-point increase in ACT score)

1.21
SAS output:

Parameter Estimates
Variable   DF   Parameter   Standard   t Value   Pr > |t|   95% Confidence Limits
                 Estimate      Error
Intercept   1    10.20000    0.66332     15.38     <.0001    8.67037   11.72963
X           1     4.00000    0.46904      8.53     <.0001    2.91839    5.08161

a. $\hat{Y} = 10.20 + 4.00X$
b. $\hat{Y}_h = 10.20 + 4.00(1) = 14.2$
c. 4.0
d. $(\bar{X}, \bar{Y}) = (1, 14.2)$. Since $\hat{Y} = 10.20 + 4.00(1) = 14.2$ at $X = \bar{X}$, the fitted line passes through $(\bar{X}, \bar{Y})$.

1.23
a. The residuals are listed in the "Output Statistics" table of the SAS output (requested by the p and r options on the MODEL statement); the statement "output out=new1 p=pred r=resid;" also saves them in the data set new1.

Sum of Residuals                       0
Sum of Squared Residuals        45.81761
Predicted Residual SS (PRESS)   47.61035

i      1        2        ...   119       120
e_i    0.9676   1.2274   ...   -0.8753   -0.2532

Yes, the residuals sum to zero.
b.
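Root MSE          0.62313    R-Square   0.0726
Dependent Mean    3.07405    Adj R-Sq   0.0648
Coeff Var        20.27049

$\sigma^2$ is estimated by $MSE = 0.388$ and $\sigma$ by $\sqrt{MSE} = 0.623$; $\sigma$ is expressed in grade points.

2.1
a. Yes. The 95% confidence interval for $\beta_1$ does not contain zero, so $H_0: \beta_1 = 0$ is rejected; the implied level of significance is $\alpha = .05$.
b. In practice a marketing district cannot have a population of zero, so $X = 0$ lies outside the scope of the model. Any inference concerning the intercept is therefore rather meaningless, and a negative lower confidence limit for $\beta_0$ need not be troubling.

2.4
a. $t(.995; 118) = 2.61814$, so the interval is $.03883 \pm 2.61814(.01277)$, giving $.00540 \le \beta_1 \le .07226$. The interval does not include zero.
The same interval appears in the SAS output when PROC REG is run with alpha=0.01:

Parameter Estimates
Variable   DF   Parameter   Standard   t Value   Pr > |t|   99% Confidence Limits
                 Estimate      Error
Intercept   1     2.11405    0.32089      6.59     <.0001    1.27390    2.95420
ACT         1     0.03883    0.01277      3.04     0.0029    0.00539    0.07227

(This table also provides the results for parts b and c.)

b. $H_0: \beta_1 = 0$, $H_a: \beta_1 \ne 0$. $t^* = (.03883 - 0)/.01277 = 3.04072$. Decision rule: if $|t^*| \le 2.61814$, conclude $H_0$; otherwise conclude $H_a$. Since $|t^*| = 3.04072 > 2.61814$, we conclude $H_a$.
c. P-value = 0.00291. Because 0.00291 < 0.01, there is significant evidence to reject $H_0$ and conclude $H_a$, the same conclusion as in part b.

SAS CODE

1.19 and 1.23
data new;
  input GPA ACT;
  datalines;
(COPY AND PASTE THE GIVEN DATASET HERE)
;
run;

symbol1 v=dot i=rl;   /* plot points as dots and overlay the fitted regression line */
proc gplot data=new;
  plot GPA*ACT;
run;

proc reg data=new;
  model GPA=ACT/clb p r;              /* CLB: confidence limits for b0, b1; P: predicted values; R: residuals */
  output out=new1 p=pred r=resid;     /* save predicted values and residuals in data set new1 */
run;

1.21
data new;
  input Y X;
  datalines;
(COPY AND PASTE THE GIVEN DATASET HERE)
;
run;

symbol1 v=dot i=rl;
proc gplot data=new;
  plot Y*X;
run;

proc reg data=new;
  model Y=X/clb p r;
  output out=new1 p=pred r=resid;
run;

2.4
data new;
  input GPA ACT;
  datalines;
(COPY AND PASTE THE GIVEN DATASET HERE)
;
run;

proc reg data=new alpha=0.01;         /* ALPHA= sets the level of the CLB confidence limits (here 99%) */
  model GPA=ACT/clb p r;
  output out=new1 p=pred r=resid;
run;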
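As an optional arithmetic check on the hand calculations in 1.7(b), 1.19(c), 1.21(d), 1.23(b) and 2.4, the Python sketch below recomputes them from the estimates reported in the SAS output. It is not part of the assigned SAS solution. Assumptions: t(.995; 118) = 2.61814 is copied from the solution because Python's standard library has no t quantile function; the helper least_squares and the four-point toy data set are hypothetical, included only to illustrate the two least squares properties used in 1.21(d) and 1.23(a).

```python
import math
from statistics import NormalDist, fmean

# 1.7(b): under the normal error model Y ~ N(200, 25).
# NormalDist takes the standard deviation, so sigma = sqrt(25) = 5.
y_dist = NormalDist(mu=200, sigma=math.sqrt(25))
p_1_7 = y_dist.cdf(205) - y_dist.cdf(195)  # equals P(-1 <= Z <= 1)

# 1.19: b0 and b1 copied from the SAS Parameter Estimates table.
b0, b1 = 2.11405, 0.03883
yhat_30 = b0 + b1 * 30  # 1.19(c): estimated mean GPA at X_h = 30

# 1.21(d): the fitted line evaluated at X-bar = 1 should give Y-bar = 14.2.
yhat_at_xbar = 10.20 + 4.00 * 1

# 1.23(b): MSE = SSE / (n - 2), with SSE and n = 120 taken from the SAS output.
sse, n = 45.81761, 120
mse = sse / (n - 2)
root_mse = math.sqrt(mse)

# 2.4: t(.995; 118) is copied from the solution (assumption, see lead-in).
t_crit, se_b1 = 2.61814, 0.01277
ci_99 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)  # 2.4(a): b1 +/- t * s{b1}
t_star = (b1 - 0) / se_b1                           # 2.4(b): test statistic
reject_h0 = abs(t_star) > t_crit                    # 2.4(b): decision rule


def least_squares(xs, ys):
    """Hypothetical helper: least squares estimates (b0, b1) for Y = b0 + b1*X."""
    xbar, ybar = fmean(xs), fmean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Hypothetical toy data (not from the homework): the fitted line passes through
# (X-bar, Y-bar) and the residuals sum to zero, as in 1.21(d) and 1.23(a).
toy_x, toy_y = [1, 2, 3, 4], [2, 4, 5, 4]
toy_b0, toy_b1 = least_squares(toy_x, toy_y)
toy_resid = [y - (toy_b0 + toy_b1 * x) for x, y in zip(toy_x, toy_y)]

print(f"1.7(b)  P = {p_1_7:.4f}")
print(f"1.19(c) Yhat_h = {yhat_30:.5f}")
print(f"1.23(b) MSE = {mse:.3f}, sqrt(MSE) = {root_mse:.3f}")
print(f"2.4(a)  {ci_99[0]:.5f} <= beta1 <= {ci_99[1]:.5f}")
print(f"2.4(b)  t* = {t_star:.5f}, reject H0: {reject_h0}")
```

The sketch uses only the standard library so it runs without SciPy; with SciPy available, the hard-coded critical value could instead be computed from a t distribution with 118 degrees of freedom.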