SOLUTIONS TO ACTIVITY SET 4
Relationship Between Chug Time and Weight of a Person
From Utts and Heckard. For
a statistics class project at a large northeastern university, students examined the relationship
between x = body weight (in pounds) and y = time to chug a 12 ounce beverage (in seconds).
We’ll leave it to you to imagine the beverage. The students collected data from 13 individuals, and
those data are displayed below.
Time to Chug vs. Weight of Person
Student   Weight   Time
1         153      5.6
2         169      6.1
3         178      3.3
4         198      3.4
5         128      8.2
6         183      3.5
7         177      6.1
8         210      3.1
9         243*     4.0
10        208      3.2
11        157      6.3
12        163      6.9
13        158      6.7
[Fitted line plot: time (y-axis, seconds) versus weight (x-axis, pounds)]
time = 13.0028 - 0.0444460 weight
S = 1.10466   R-Sq = 61.5 %   R-Sq(adj) = 58.0 %
4.1 Use Minitab on the dataset Chugging found in the Datasets folder in ANGEL. Do
Stat>Regression>Fitted Line Plot. Click ‘Storage’ and then ‘Residuals’ and ‘Fits’.
These will be stored in columns C4 and C5 and named as RESI1 and FITS1. Your output
should look as follows:
Regression Analysis: time versus weight
The regression equation is
time = 13.0028 - 0.0444460 weight
S = 1.10466   R-Sq = 61.5 %   R-Sq(adj) = 58.0 %
Analysis of Variance
Source      DF       SS       MS        F      P
Regression   1  21.4093  21.4093  17.5447  0.002
Error       11  13.4230   1.2203
Total       12  34.8323
Note: Explain this equation. Discuss slope as change in y per unit change in x in the context of this problem.
Delete the Big Guy!
Regression Analysis: time versus weight
The regression equation is
time = 16.2509 - 0.0640304 weight
S = 0.864635   R-Sq = 77.8 %   R-Sq(adj) = 75.5 %
Analysis of Variance
Source      DF       SS       MS        F      P
Regression   1  26.1532  26.1532  34.9832  0.000
Error       10   7.4759   0.7476
Total       11  33.6292
The NOTE from above: The slope indicates “for a unit change in X, Y will change by the amount
and direction of the slope”. So here, for a 1 pound increase in Weight the predicted time will
decrease by 0.04445 seconds.
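As a small illustration of this reading of the slope, the following Python sketch evaluates the all-data regression equation at two weights one pound apart. The weights 170 and 171 and the function name `predicted_time` are illustrative assumptions, not part of the activity; the coefficients are copied from the Minitab output.

```python
# Coefficients copied from the Minitab output for all 13 observations.
INTERCEPT = 13.0028
SLOPE = -0.0444460  # seconds per pound


def predicted_time(weight_lb):
    """Predicted chug time (seconds) for a body weight in pounds."""
    return INTERCEPT + SLOPE * weight_lb


# Two hypothetical people whose weights differ by exactly one pound.
lighter = predicted_time(170)
heavier = predicted_time(171)
change_per_pound = heavier - lighter

print(f"Predicted time at 170 lb: {lighter:.4f} s")
print(f"Predicted time at 171 lb: {heavier:.4f} s")
print(f"Change for a 1 lb increase: {change_per_pound:.4f} s")
```

The printed change equals the slope whichever pair of weights one pound apart is chosen, because the fitted relationship is a straight line.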
a. Create a scatter plot of the measurements by Graph > Scatter Plot, and select weight as the
predictor and time as the response. Describe the relationship between chug time and weight.
Which is the response variable and which is the predictor?
There is a negative relationship between time (the response variable) and weight (the explanatory variable): heavier people tend to have shorter chug times.
b. The heaviest person looks to be an outlier. Do you think it is a legitimate observation or do
you think an error was made in recording or entering the data?
It is probably a legitimate value; there is no guarantee that all heavy people can chug faster than lighter people.
c. The least squares regression line for predicting chug time from body weight is given on the
preceding page. What is the fitted regression line? (Stat>Regression>Regression)
Fitted regression line (all 13 observations): time = 13.0028 - 0.0444460 weight
d. What do the values in the FITS and RES columns represent?
The fits are the values of the response (time) obtained when the observed values of the predictor
(weight) are entered into the regression equation.
The residuals (RES) are the observed values of the response, Y, minus the fitted values.
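A minimal Python sketch of how the FITS and RES columns arise, assuming the all-data regression equation from the Minitab output and using the first three rows of the data table. The variable names are illustrative; Minitab performs the same arithmetic for every row.

```python
# Coefficients copied from the Minitab output for all 13 observations.
INTERCEPT = 13.0028
SLOPE = -0.0444460

# (weight in lb, observed time in s) for students 1-3 in the data table.
observations = [(153, 5.6), (169, 6.1), (178, 3.3)]

rows = []
for weight, observed in observations:
    fit = INTERCEPT + SLOPE * weight   # value stored in the FITS column
    residual = observed - fit          # value stored in the RES column
    rows.append((weight, observed, fit, residual))
    print(f"weight={weight}  observed={observed:.1f}  "
          f"fit={fit:.4f}  residual={residual:+.4f}")
```

A negative residual means the person chugged faster than the line predicts for that weight; a positive residual means slower.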
4.2 Although outliers should never be deleted without a reason, there are several reasons why it
may be legitimate to conduct an analysis without them. Delete the data point for the heaviest
person (click on the cell with the weight of 243 and enter *) and re-calculate the regression line
for the remainder of the data. You should obtain the following output:
(Big Guy deleted)
Regression Analysis: time versus weight
The regression equation is
time = 16.2509 - 0.0640304 weight
S = 0.864635   R-Sq = 77.8 %   R-Sq(adj) = 75.5 %
Analysis of Variance
Source      DF       SS       MS        F      P
Regression   1  26.1532  26.1532  34.9832  0.000
Error       10   7.4759   0.7476
Total       11  33.6292
a. Use the regression line with the ‘big guy’ deleted to estimate the chug time for an individual
who weighs 243 pounds. Do you think this estimate could be achieved by anybody?
The fitted regression equation is: time = 16.2509 - 0.0640304 weight. Substitute
243 for weight in this equation to get
time = 16.2509 – (0.0640304)(243) = 16.2509 – 15.5594 = 0.6915 seconds.
It is inconceivable that anyone could chug this fast! We are extrapolating beyond the range of
the observed weights (the heaviest remaining person weighs 210 pounds), and this can lead to very misleading results.
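The same calculation can be sketched in Python, together with a simple range check that flags extrapolation. The helper names `predict` and `is_extrapolation` are illustrative assumptions; the coefficients come from the Big Guy deleted output and the weights from the data table with the 243 lb observation removed.

```python
# Coefficients from the regression with the 243 lb observation removed.
INTERCEPT = 16.2509
SLOPE = -0.0640304

# Weights (lb) of the 12 people who remain in the data table.
weights_used = [153, 169, 178, 198, 128, 183, 177, 210, 208, 157, 163, 158]


def predict(weight_lb):
    """Predicted chug time (seconds) from the fitted line."""
    return INTERCEPT + SLOPE * weight_lb


def is_extrapolation(weight_lb):
    """True when the weight lies outside the range used to fit the line."""
    return not (min(weights_used) <= weight_lb <= max(weights_used))


estimate = predict(243)
print(f"Predicted time at 243 lb: {estimate:.4f} s")
print(f"Observed weight range: {min(weights_used)} to {max(weights_used)} lb")
print(f"Extrapolating? {is_extrapolation(243)}")
```

A range check like this cannot say whether a prediction is wrong, only that the data offer no support for the fitted line at that weight.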
b. What does the value of R² represent? (Explain it using the variables from this data.)
R² is the coefficient of determination; in simple terms, it gives the proportion of the
variation in the response (Y) that is explained by the predictor (X). For
our example: with the Big Guy deleted, 77.8% of the variability in chug time is
explained by weight, compared to 61.5% when the Big Guy is included.
c. What is the correlation between Chug Time and Weight for both the data sets,
including and excluding the Big Guy?
The correlation, r, is equal to the square root of R² and takes the sign of the slope,
so it always satisfies –1 ≤ r ≤ 1. The correlation is commonly reported as a decimal
value rather than a percentage. Here R² is the coefficient of determination.
Big Guy deleted: √0.778 = 0.882, and r is negative since the slope of the
regression equation is negative, so r = –0.882.
Big Guy included: √0.615 = 0.784, so r = –0.784.
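The square-root-and-sign rule can be sketched in a few lines of Python. The function name `correlation_from_r_squared` is an illustrative assumption, and the rule applies to simple linear regression with one predictor, as in this activity.

```python
import math


def correlation_from_r_squared(r_squared, slope):
    """Return r = sqrt(R-squared) carrying the sign of the regression slope."""
    return math.copysign(math.sqrt(r_squared), slope)


# R-Sq values and slopes copied from the two Minitab outputs.
r_deleted = correlation_from_r_squared(0.778, -0.0640304)
r_included = correlation_from_r_squared(0.615, -0.0444460)

print(f"Big Guy deleted:  r = {r_deleted:.3f}")
print(f"Big Guy included: r = {r_included:.3f}")
```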
d. Find the correlation between Chug Time and Weight (you can pick whether to use the
Big Guy or not) by going to Stat>Basic Statistics>Correlation and entering both variables into
the Variables box. Does this correlation value agree with the value you found in part c?
Yes, the values are the same.
e. How does the fit of the regression line to the original data compare (visually and statistically)
to the fit of the regression line to the data with the big guy removed? To do this, first stay with
the current data with the big guy removed and go to Stat > Regression > Fitted Line Plot. Select
weight as the Predictor (x-variable) and time as the Response (y-variable). Once the graph is
created, double-click the title to open an “Edit title” box. Type in the box
under Text: Big Guy Deleted. Now add the weight (243) of the Big Guy back into the data and
repeat these steps, labeling the graph Big Guy Included.
[Fitted line plot "Big Guy Deleted": time versus weight]
time = 16.25 - 0.06403 weight
S = 0.864635   R-Sq = 77.8%   R-Sq(adj) = 75.5%
[Fitted line plot "Big Guy Included": time versus weight]
time = 13.00 - 0.04445 weight
S = 1.10466   R-Sq = 61.5%   R-Sq(adj) = 58.0%
Discussion, for the regression using all of the data:
‘RESIDUALS’ and ‘FITS’. Fits are the values obtained by substituting the observed values of
weight into the regression equation: FITS = 13.0028 - 0.0444460 weight. Residuals are the
differences between the observed values y and the fitted values.
Calculation of R²:
SSTO = Σ(y − ȳ)² = 34.8323 = sum of squared errors of predictions using the
simple average ȳ = 5.05385 to estimate y = time to chug. To get ȳ = 5.05385, use
Minitab: Calc>Column Statistics>Mean (time).
SSE = Σ(y − ŷ)² = 13.4230 = sum of squared errors of predictions (fitted values)
using the regression equation = sum of squared residuals. Do in Minitab:
Calc>Column Statistics>Sum of Squares (RESI1).
R² = (34.8323 − 13.4230) / 34.8323 = 21.4093 / 34.8323 = 0.615 or 61.5%. The value
21.4093 = SSR is the amount by which the sum of squared errors of predictions is reduced
by using the regression equation instead of the mean.
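As a cross-check, the summary statistics Minitab prints can be recomputed from the sums of squares in the two ANOVA tables. The sketch below uses the standard simple-regression formulas; the function name `anova_summary` and the dictionary keys are illustrative assumptions, and the inputs are copied from the output.

```python
import math


def anova_summary(n, ss_regression, ss_error, ss_total):
    """Recompute R-Sq, R-Sq(adj), S and F for a one-predictor regression."""
    df_error = n - 2                    # n observations minus 2 fitted coefficients
    mse = ss_error / df_error           # mean squared error
    return {
        "R-Sq": ss_regression / ss_total,
        "R-Sq(adj)": 1 - mse / (ss_total / (n - 1)),
        "S": math.sqrt(mse),
        "F": ss_regression / mse,       # regression df is 1, so MSR equals SSR
    }


# Sums of squares copied from the two ANOVA tables.
all_data = anova_summary(13, 21.4093, 13.4230, 34.8323)
big_guy_deleted = anova_summary(12, 26.1532, 7.4759, 33.6292)

for label, stats in [("All data", all_data), ("Big Guy deleted", big_guy_deleted)]:
    print(label)
    for name, value in stats.items():
        print(f"  {name} = {value:.4f}")
```

Small differences in the last printed digit relative to Minitab are expected, because the sums of squares entered here are already rounded.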
Discuss effect of removing the ‘Big Guy’:
NOTE:
• R² increases from 61.5% to 77.8%.
• The regression equation changes: the slope is more negative (–0.0640 versus –0.0444).
• The scatter plot looks tighter around the regression line because the outlier is no
longer there.