Stats 252 Lab Assignment 3
Zeng, Yiye Q3
Mark Prokopiuk 1034216

Question 1:

a)
[Figure: histogram of height (Count vs. height)]

The heights tend to cluster toward the high end of the observed range. The mean height looks to be around 139-141. The height varies quite a bit over the 31 day span. Temperature and air pressure might be factors distorting the measurements.

b)
[Figure: histogram of duration (Count vs. duration)]

The distribution appears to have two separate clusters, one centred around 1.9 and the other around 4.4.

Question 2:

a)
[Figure: scatterplot "Duration, Interval" (interval vs. duration)]

c) The relationship is quite strong and quite linear. There are many outliers, but the general trend is linear. The association is positive: as the duration of an eruption increases, so does the interval between eruptions.

Question 3:

a)
Correlations

                                  interval    duration
interval   Pearson Correlation    1           .924(**)
           Sig. (2-tailed)                    .000
           N                      272         263
duration   Pearson Correlation    .924(**)    1
           Sig. (2-tailed)        .000
           N                      263         299

** Correlation is significant at the 0.01 level (2-tailed).

b) The sign and magnitude do agree with Question 2: the correlation is positive and very close to 1.

Question 4:

a)
Model Summary

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .924(a)   .854       .853                6.493

a Predictors: (Constant), duration

ANOVA(b)

Model 1       Sum of Squares   df    Mean Square   F          Sig.
Regression    64228.526        1     64228.526     1523.314   .000(a)
Residual      11004.721        261   42.164
Total         75233.247        262

a Predictors: (Constant), duration
b Dependent Variable: interval

Coefficients(a)

              Unstandardized Coefficients   Standardized Coefficients
Model 1       B          Std. Error         Beta                        t        Sig.
(Constant)    33.347     1.201                                          27.765   .000
duration      13.285     .340               .924                        39.030   .000

a Dependent Variable: interval

Model: μ(interval | duration) = β0 + β1(duration)

b) Estimate: μ̂(interval | duration) = 33.347 + 13.285(duration)

The slope of the regression line indicates a positive relationship between interval and duration.

[Figure: scatterplot of interval vs. duration with fitted regression line, R Sq Linear = 0.854]

The line is a reasonably good fit, although many points lie well away from it.

c) R Square measures the proportion of variation explained. R² = 0.854, so 85.4 percent of the variation in interval can be explained by the regression of interval on duration.

d) For duration = 2: 33.347 + 13.285(duration) = 33.347 + 13.285(2) = 33.347 + 26.57 = 59.917

e) H0: β1 = 0, Ha: β1 ≠ 0
t-stat = 13.285 / 0.340 ≈ 39.1 (reported as 39.030 in the Coefficients table), sig = 0.000
Extremely strong evidence against H0.
SSR(EM) = 75233.247
SSR(SLR) = 11004.721

f)
[Figure: scatterplot of Regression Standardized Residual vs. Regression Standardized Predicted Value; Dependent Variable: interval]

The residuals are clumped into two groups, each seemingly with its own average. They do, however, appear to be scattered about the horizontal (zero) line.

g)
[Figure: Normal P-P Plot of Regression Standardized Residual (Expected Cum Prob vs. Observed Cum Prob); Dependent Variable: interval]

The normality assumption for the response appears appropriate. The points in the plot lie almost entirely along the reference line.

h)

Question 5:

a)
[Figure: scatterplot "Interval and Duration*Height" (interval vs. DandH)]

b) The graph is almost identical to the one of Interval and Duration alone. The relationship is quite linear. Given a certain height and duration, interval can be easily predicted. Positive association.
Yes, there are some outliers.

c) Model: μ(interval | duration*height) = β0 + β1(duration*height)

Estimate: μ̂(interval | duration*height) = 33.347 + .088(duration*height)

R² = 0.738, so 73.8 percent of the variation in interval can be explained by the regression of interval on duration*height.
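
The arithmetic in Question 4 can be checked directly from the sums of squares and coefficients reported in the SPSS output. The following is a minimal sketch in Python; the variable names are chosen here for illustration, and the only inputs are the rounded values printed in the ANOVA and Coefficients tables, so small differences from the SPSS figures (for example in the t-statistic) are expected.

```python
import math

# Values copied from the Question 4 SPSS output (rounded as printed).
ss_regression = 64228.526   # Regression Sum of Squares
ss_residual = 11004.721     # Residual Sum of Squares, SSR(SLR)
ss_total = 75233.247        # Total Sum of Squares, SSR(EM)
df_residual = 261           # Residual degrees of freedom
b0, b1 = 33.347, 13.285     # Intercept and slope estimates
se_b1 = 0.340               # Standard error of the slope

# R Square: share of the total variation explained by the regression.
r_squared = ss_regression / ss_total

# Residual mean square and its square root (Std. Error of the Estimate).
ms_residual = ss_residual / df_residual
std_error_estimate = math.sqrt(ms_residual)

# F-statistic: the regression has 1 df, so its mean square equals its sum of squares.
f_stat = ss_regression / ms_residual

# t-statistic for H0: slope = 0, using the rounded slope and standard error.
t_slope = b1 / se_b1

# Estimated mean interval for an eruption with duration = 2.
predicted_at_2 = b0 + b1 * 2

print(f"R Square            = {r_squared:.3f}")
print(f"Residual MS         = {ms_residual:.3f}")
print(f"Std. Error of Est.  = {std_error_estimate:.3f}")
print(f"F                   = {f_stat:.3f}")
print(f"t (slope)           = {t_slope:.3f}")
print(f"Estimate at dur = 2 = {predicted_at_2:.3f}")
```

The t-statistic computed this way comes out slightly above the 39.030 in the Coefficients table, because the printed slope and standard error are rounded to three decimals.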