SOLUTIONS TO ACTIVITY SET 4

Relationship Between Chug Time and Weight of a Person

From Utts and Heckard. For a statistics class project at a large northeastern university, students examined the relationship between x = body weight (in pounds) and y = time to chug a 12 ounce beverage (in seconds). We’ll leave it to you to imagine the beverage. The students collected data from 13 individuals, and those data are displayed below.

Time to Chug vs. Weight of Person

Student   Weight (lb)   Time (s)
1         153           5.6
2         169           6.1
3         178           3.3
4         198           3.4
5         128           8.2
6         183           3.5
7         177           6.1
8         210           3.1
9         243*          4.0
10        208           3.2
11        157           6.3
12        163           6.9
13        158           6.7

[Fitted line plot of time vs. weight: time = 13.0028 - 0.0444460 weight; S = 1.10466, R-Sq = 61.5%, R-Sq(adj) = 58.0%]

4.1 Use Minitab on the dataset Chugging found in the Datasets folder in ANGEL. Do Stat>Regression>Fitted Line Plot. Click ‘Storage’ and then ‘Residuals’ and ‘Fits’. These will be stored in columns C4 and C5 and named RESI1 and FITS1. Your output should look as follows:

Regression Analysis: time versus weight

The regression equation is
time = 13.0028 - 0.0444460 weight

S = 1.10466   R-Sq = 61.5%   R-Sq(adj) = 58.0%

Analysis of Variance
Source       DF   SS        MS        F         P
Regression    1   21.4093   21.4093   17.5447   0.002
Error        11   13.4230    1.2203
Total        12   34.8323

Note: Explain this equation. Discuss slope as change in y per unit change in x in the context of this problem.

Delete the Big Guy!

Regression Analysis: time versus weight

The regression equation is
time = 16.2509 - 0.0640304 weight

S = 0.864635   R-Sq = 77.8%   R-Sq(adj) = 75.5%

Analysis of Variance
Source       DF   SS        MS        F         P
Regression    1   26.1532   26.1532   34.9832   0.000
Error        10    7.4759    0.7476
Total        11   33.6292

The NOTE from above: The slope indicates “for a unit change in X, Y will change by the amount and direction of the slope”. So here, using all of the data, for a 1 pound increase in weight the predicted time will decrease by 0.04445 seconds.

a.
Create a scatter plot of the measurements by Graph > Scatter Plot, and select weight as the predictor and time as the response. Describe the relationship between chug time and weight. Which is the response variable and which is the predictor?

There is a negative relationship between time (the response variable) and weight (the predictor, or explanatory, variable): heavier people tend to have shorter chug times.

b. The heaviest person looks to be an outlier. Do you think it is a legitimate observation, or do you think an error was made in recording or entering the data?

It is probably a legitimate value; there is no guarantee that all heavy people can chug faster than lighter people.

c. The least squares regression line for predicting chug time from body weight is given on the preceding page. What is the fitted regression line? (Stat>Regression>Regression)

Fitted regression line using all 13 observations: time = 13.0028 - 0.0444460 weight. (With the heaviest person deleted, as in 4.2, the line is time = 16.2509 - 0.0640304 weight.)

d. What do the values in the FITS and RES columns represent?

The fits (FITS1) are the values of the response (time) obtained when the observed values of the predictor (weight) are entered into the regression equation. The residuals (RESI1) are the observed values of the response, Y, minus the fitted values.

4.2 Although outliers should never be deleted without a reason, there are several reasons why it may be legitimate to conduct an analysis without them. Delete the data point for the heaviest person (click on the cell with the weight of 243 and enter *) and re-calculate the regression line for the remainder of the data. You should obtain the following output:

(Big Guy deleted)

Regression Analysis: time versus weight

The regression equation is
time = 16.2509 - 0.0640304 weight

S = 0.864635   R-Sq = 77.8%   R-Sq(adj) = 75.5%

Analysis of Variance
Source       DF   SS        MS        F         P
Regression    1   26.1532   26.1532   34.9832   0.000
Error        10    7.4759    0.7476
Total        11   33.6292

a. Use the regression line with the ‘big guy’ deleted to estimate the chug time for an individual who weighs 243 pounds. Do you think this estimate could be achieved by anybody?
The fitted regression equation is: time = 16.2509 - 0.0640304 weight. Substitute 243 for weight in this equation to get time = 16.2509 - (0.0640304)(243) = 16.2509 - 15.5594 = 0.6915 seconds. It is inconceivable that anyone could chug this fast! With the Big Guy deleted, the largest observed weight is 210 pounds, so we are extrapolating beyond the range of observation, and this can lead to very misleading results.

b. What does the value of $R^2$ represent? (Explain it using the variables from this data.)

$R^2$ is the coefficient of determination. In simple terms, it gives the proportion of the variation in the response (Y) variable that is explained by the predictor (X) variable. For our example: with the Big Guy deleted, 77.8% of the variability in chug time is explained by weight, compared to 61.5% when the Big Guy is included.

c. What is the correlation between Chug Time and Weight for both data sets, including and excluding the Big Guy?

The correlation r is equal to the square root of the coefficient of determination ($R^2$) and takes the sign of the slope, so it can take any value in the range -1 ≤ r ≤ 1. The correlation is commonly reported as a decimal value.

Big Guy deleted: √0.778 = 0.882, and r is negative since the slope of the regression equation is negative. So r = -0.882.

Big Guy included: √0.615 = 0.784, and the slope is again negative. So r = -0.784.

d. Find the correlation between Chug Time and Weight (you can pick whether to use the Big Guy or not) by going to Stat>Basic Statistics>Correlation and entering both variables into the Variables box. Does this correlation value agree with the value you found in part c?

Yes, the values are the same.

e. How does the fit of the regression line of the original data compare (visually and statistically) to the fit of the regression line to the data with the big guy removed? To do this, first stay with the current data with the big guy removed and go to Stat > Regression > Fitted Line Plot.
Select weight as the Predictor (x-variable) and time as the Response (y-variable). Once the graph is created, you can click twice on the title, which will open an “Edit title” box. Type in the box under Text: Big Guy Deleted. Now add the weight (243) of the Big Guy back into the data and repeat these steps, labeling the graph Big Guy Included.

[Fitted line plot, Big Guy Deleted: time = 16.25 - 0.06403 weight; S = 0.864635, R-Sq = 77.8%, R-Sq(adj) = 75.5%]

[Fitted line plot, Big Guy Included: time = 13.00 - 0.04445 weight; S = 1.10466, R-Sq = 61.5%, R-Sq(adj) = 58.0%]

Discussion, for the regression using all of the data:

‘RESIDUALS’ and ‘FITS’. Fits are the values obtained by substituting values of weight in the regression equation. Residuals are the differences between the observed values y and the fitted values.

FITS = predicted time = 13.0028 - 0.0444460 weight

Calculation of $R^2$:

$\mathrm{SSTO} = \sum (y - \bar{y})^2 = 34.8323$ = sum of squared errors of predictions using the simple average $\bar{y} = 5.05385$ to estimate y = time to chug. To get $\bar{y} = 5.05385$, use Minitab: Calc>Column Statistics>Mean (time).

$\mathrm{SSE} = \sum (y - \hat{y})^2 = 13.4230$ = sum of squared errors of predictions (fitted values) using the regression equation = sum of squared residuals. Do in Minitab: Calc>Column Statistics>Sum of Squares.

$R^2 = (34.8323 - 13.4230) / 34.8323 = 21.4093 / 34.8323 = 0.615$, or 61.5%. The value 21.4093 = SSR is the amount by which the sum of squared errors of predictions is reduced by using the regression equation instead of the mean.

Discuss the effect of removing the ‘Big Guy’. NOTE:
- how $R^2$ changes, from 61.5% to 77.8%;
- how the regression equation changes: the slope is more negative (-0.0640 instead of -0.0444);
- the scatter plot looks tighter around the regression line because the outlier is no longer there.
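The hand calculations in 4.2 and in the discussion above can be double-checked outside Minitab. The following is a minimal Python sketch, not part of the original activity. It works only from the sums of squares and coefficients reported in the Minitab output rather than refitting the raw data, and the function names (r_squared, correlation, predict_time) are illustrative assumptions.

```python
import math

# Numbers copied from the Minitab output in this handout (not refit from raw data).
SSTO_ALL, SSE_ALL, SLOPE_ALL = 34.8323, 13.4230, -0.0444460   # Big Guy included
SSTO_DEL, SSE_DEL, SLOPE_DEL = 33.6292, 7.4759, -0.0640304    # Big Guy deleted
INTERCEPT_DEL = 16.2509


def r_squared(ssto, sse):
    """Proportion of total variation explained: (SSTO - SSE) / SSTO."""
    return (ssto - sse) / ssto


def correlation(r2, slope):
    """Square root of R^2, given the sign of the fitted slope."""
    return math.copysign(math.sqrt(r2), slope)


def predict_time(intercept, slope, weight):
    """Fitted chug time (seconds) for a given weight (pounds)."""
    return intercept + slope * weight


r2_all = r_squared(SSTO_ALL, SSE_ALL)
r2_del = r_squared(SSTO_DEL, SSE_DEL)
r_all = correlation(r2_all, SLOPE_ALL)
r_del = correlation(r2_del, SLOPE_DEL)
time_243 = predict_time(INTERCEPT_DEL, SLOPE_DEL, 243)

print(f"Big Guy included: R^2 = {r2_all:.3f}, r = {r_all:.3f}")
print(f"Big Guy deleted:  R^2 = {r2_del:.3f}, r = {r_del:.3f}")
print(f"Predicted time at 243 lb (Big Guy deleted line): {time_243:.4f} s")
```

Rounded to three decimals, this gives R^2 = 0.615 and r = -0.784 with the Big Guy included, R^2 = 0.778 and r = -0.882 with him deleted, and a predicted time of about 0.6915 seconds at 243 pounds, matching the answers above.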