Simple Linear Regression: Assignment #7 (50 pts.)

STAT 600 – Simple Linear Regression Assignment (50 pts.)
Spring 2014
1 - Mercury Contamination of Walleyes in Island Lake Reservoir
Goal: Develop a regression model to predict/explain the mercury level (ppm) found in the
tissues of a walleye using its length (in.).
Data Set: Walleyes Island Lake
This assignment is similar to your simple linear regression handout; however, I want you
to investigate mercury contamination levels found in walleyes (ppm) versus the length of
the walleye (in.). The primary interest is in developing a walleye consumption advisory
based on length for walleyes in Island Lake Reservoir near Duluth, so let Y = HGPPM and
X = LGTHIN.
Main Items to address:
a. Compute the linear correlation coefficient (r) to initially investigate the linear relationship
between these two variables. Are these variables linearly related to each other? Explain.
(2 pts.)
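For part (a), the sample (Pearson) correlation coefficient can be written out directly from its definition, r = Sxy / sqrt(Sxx * Syy). The following is a minimal Python 3 sketch using only the standard library. It assumes the LGTHIN and HGPPM columns have already been read into two lists; the function name and the toy values below are hypothetical and are not the Island Lake data.

```python
import math


def pearson_r(x, y):
    """Sample (Pearson) correlation coefficient r between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be equal-length lists with at least two values")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sums of cross-products and squares
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy values (NOT the walleye data), only to exercise the function
x_toy = [1, 2, 3, 4, 5]
y_toy = [2, 4, 5, 4, 5]
print(round(pearson_r(x_toy, y_toy), 4))  # → 0.7746
```

Values of r near +1 or -1 indicate a strong linear relationship; a value near 0 indicates little linear relationship, although a nonlinear relationship may still be present, which is why the scatter plot in part (e) is still worth making.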
b. Perform the overall regression usefulness test (i.e., H0: Regression is not useful vs. HA:
Regression is useful) to formalize your initial investigation of these variables. What is
your decision for this test? Write a conclusion in everyday language for this test. (2 pts.)
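For part (b), the overall F statistic for a one-predictor model comes from the ANOVA decomposition SST = SSR + SSE, with F = (SSR / 1) / (SSE / (n - 2)). Below is a minimal Python 3 sketch under that definition; the function name and toy values are hypothetical, and the decision still requires an F critical value with (1, n - 2) degrees of freedom or the p-value reported by your statistical software.

```python
def overall_f_test(x, y):
    """Return (F, df_model, df_error) for the simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx               # least-squares slope
    b0 = y_bar - b1 * x_bar      # least-squares intercept
    sst = sum((yi - y_bar) ** 2 for yi in y)                       # total sum of squares
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # error sum of squares
    ssr = sst - sse                                                # regression sum of squares
    msr = ssr / 1          # one predictor, so 1 model degree of freedom
    mse = sse / (n - 2)    # n - 2 error degrees of freedom
    return msr / mse, 1, n - 2


# Hypothetical toy values (NOT the walleye data)
f_stat, df1, df2 = overall_f_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(f_stat, 4), df1, df2)  # → 4.5 1 3
```

H0 is rejected at level alpha when F exceeds the critical value F(alpha; 1, n - 2), or equivalently when the reported p-value is below alpha.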
c. Perform the test of whether the slope of the regression line is zero (i.e., H0:
β_LGTHIN = 0 vs. HA: β_LGTHIN ≠ 0). What is your decision for this test? Write a conclusion
using everyday language for this test. (2 pts.)
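For part (c), the slope test statistic is t = b1 / SE(b1), where SE(b1) = sqrt(MSE / Sxx) and the reference distribution is t with n - 2 degrees of freedom. A minimal Python 3 sketch follows; the function name and the toy values are hypothetical.

```python
import math


def slope_t_test(x, y):
    """Return (b1, SE(b1), t) for H0: slope = 0 in simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)
    se_b1 = math.sqrt(mse / sxx)  # standard error of the estimated slope
    return b1, se_b1, b1 / se_b1


# Hypothetical toy values (NOT the walleye data)
b1, se_b1, t_stat = slope_t_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(t_stat, 4))       # → 2.1213
print(round(t_stat ** 2, 4))  # → 4.5, the same as the overall F for these values
```

With a single predictor, t² equals the overall F statistic, so the two-sided slope test in part (c) and the usefulness test in part (b) always reach the same decision.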
d. What is the R-Square value for this analysis? In the context of this problem, carefully
explain what this number is measuring. (3 pts.)
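For part (d), R-Square is the proportion of the total variation in Y that the fitted line accounts for, R² = SSR / SST = 1 - SSE / SST. The sketch below is a minimal Python 3 illustration of that formula; the function name and toy values are hypothetical.

```python
def r_squared(x, y):
    """R-Square for the least-squares line of y on x: 1 - SSE/SST (equivalently SSR/SST)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sst = sum((yi - y_bar) ** 2 for yi in y)                       # total variation in y
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # variation left unexplained
    return 1 - sse / sst


# Hypothetical toy values (NOT the walleye data)
print(round(r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # → 0.6
```

In simple linear regression R² is also the square of the correlation r from part (a); for the toy values, (0.7746)² ≈ 0.6.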
e. Create a scatter plot of the data with the estimated regression line added.
In the context of this problem, carefully interpret the y-intercept and slope of your
estimated regression line. Again, carefully explain what these numbers are measuring.
(You need to do more than say they are the y-intercept and slope of the line.) (4 pts.)
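The estimated line in part (e) is the least-squares line, with slope b1 = Sxy / Sxx and intercept b0 = ȳ - b1·x̄. The minimal Python 3 sketch below computes both; the function name and toy values are hypothetical, and drawing the scatter plot itself is left to your statistical software.

```python
def least_squares_line(x, y):
    """Return (b0, b1) for the fitted line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx           # estimated change in mean y for each 1-unit increase in x
    b0 = y_bar - b1 * x_bar  # estimated mean y at x = 0 (may lie outside the observed x range)
    return b0, b1


# Hypothetical toy values (NOT the walleye data)
b0, b1 = least_squares_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(b0, 4), round(b1, 4))  # → 2.2 0.6
```

When interpreting b0, check whether X = 0 lies anywhere near the observed lengths; if it does not, the intercept is an extrapolation and may have no practical meaning.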
f. Discuss whether or not the assumptions for this procedure are being met. Also,
identify any outliers in the data set. If there are problems, I do not expect you to try to fix
them; just identify them, and for the purposes of the rest of the problem we will pretend
they are not there. (4 pts.)
Checking the assumptions:
> Model Appropriate: Make sure no trends remain in the residual plot.
> Constant Variance: Make sure there are no megaphone patterns in the residual plot.
> Independence: You don’t really need to check this, as these data are not collected over
time.
> Normality: Make a histogram of the residuals and make sure they approximately follow a
normal distribution.
> Outliers: Any observations whose residuals fall outside ±2*RMSE are considered possible outliers.
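The outlier rule in the checklist can be applied mechanically once the residuals are available, with RMSE = sqrt(SSE / (n - 2)). This minimal Python 3 sketch returns the residuals (which can also feed the residual plot and histogram) together with the indices of points outside ±2*RMSE; the function name and the toy values, including the deliberately odd point, are hypothetical.

```python
import math


def flag_possible_outliers(x, y):
    """Fit y on x by least squares and flag points whose residual lies outside +/- 2*RMSE.

    Returns (flagged_indices, residuals, rmse).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    rmse = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))  # root mean square error
    flagged = [i for i, e in enumerate(residuals) if abs(e) > 2 * rmse]
    return flagged, residuals, rmse


# Hypothetical toy values: y equals x except for one deliberately odd point at x = 5
x_toy = list(range(1, 11))
y_toy = [1, 2, 3, 4, 15, 6, 7, 8, 9, 10]
flagged, residuals, rmse = flag_possible_outliers(x_toy, y_toy)
print(flagged)  # → [4]  (0-based index of the point at x = 5)
```

A flagged point is only a possible outlier under this rule of thumb; it is worth looking at in the scatter plot before deciding anything about it.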
g. It is recommended that people not consume more than one fish per month whose tissue
mercury level exceeds 0.5 ppm. Because your average walleye angler does not carry a gas
spectrometer in their fishing boat, actually measuring the Hg level in a walleye they have
caught is a problem. However, it is very easy for an angler to measure the length of their
walleye in inches.
Using your regression model, what length of walleye would you recommend for the “do
not eat more than one walleye exceeding _______ inches per month” advisory? (2 pts.)
It is also recommended that humans should never consume fish with mercury levels
exceeding 1 ppm in their tissues. Complete the following “we recommend that you do
not eat any walleyes exceeding __________ inches from Island Lake”. (2 pts.)
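One simple way to turn a mercury threshold into a length for part (g) is to solve the fitted line b0 + b1·X = y_target for X. The Python 3 sketch below does only that; it gives a point estimate and ignores the uncertainty in the fitted line. The function name is hypothetical, and the toy coefficients are the ones from the made-up example values, not a fitted walleye model.

```python
def length_for_mercury(b0, b1, y_target):
    """Solve y_target = b0 + b1 * x for x (a point estimate only)."""
    if b1 == 0:
        raise ValueError("slope is zero, so the fitted line cannot be solved for x")
    return (y_target - b0) / b1


# Hypothetical toy coefficients (NOT fitted to the Island Lake data)
print(round(length_for_mercury(2.2, 0.6, 5.0), 4))  # → 4.6667
```

With your own fitted coefficients, calling the function with y_target = 0.5 and y_target = 1.0 gives the lengths at which the estimated mean mercury level reaches each threshold.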
h. Using your regression analysis, estimate the mean mercury level in the population of
walleyes in Island Lake at each of the lengths below. Give both a single-value estimate and
a 95% confidence interval for the mean. Also give the correct interpretation of the
confidence interval for each case.
a) 21.2 inches in length (Note: this is the actual length of one of the walleyes in the data) (3 pts.)
b) 11 inches in length (Note: this is the actual length of one of the walleyes in the data) (3 pts.)
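The confidence interval for the mean response at X = x0 in part (h) is ŷ ± t* · sqrt(MSE · (1/n + (x0 - x̄)² / Sxx)). In this minimal Python 3 sketch the critical value t* = t(0.025, n - 2) is passed in by the caller (for example from a t table, or `scipy.stats.t.ppf(0.975, n - 2)` if SciPy is installed); the function name and toy values are hypothetical.

```python
import math


def mean_response_ci(x, y, x0, t_crit):
    """Point estimate and confidence interval for the MEAN of y at x = x0.

    t_crit is the two-sided critical value t(alpha/2, n - 2), supplied by the caller.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    mse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    y_hat = b0 + b1 * x0
    se_mean = math.sqrt(mse * (1 / n + (x0 - x_bar) ** 2 / sxx))
    return y_hat, y_hat - t_crit * se_mean, y_hat + t_crit * se_mean


# Hypothetical toy values (NOT the walleye data); 3.182 is approximately t(0.025, 3)
print(mean_response_ci([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 3, 3.182))
```

Because of the (x0 - x̄)² term, the interval is narrowest at the mean length and widens as x0 moves away from it, which is worth keeping in mind when comparing the 21.2-inch and 11-inch cases.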
i. Suppose you just caught a whopper walleye measuring 25.1 inches from Island Lake.
What do you predict the mercury level in this particular fish would be? Give both a
single-value estimate and an interval estimate (a prediction interval), along with the
correct interpretation of the interval estimate. (3 pts.)
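Part (i) concerns one particular fish rather than the population mean, so the interval is a prediction interval: ŷ ± t* · sqrt(MSE · (1 + 1/n + (x0 - x̄)² / Sxx)). As before, this minimal Python 3 sketch takes t* as an argument supplied by the caller; the function name and toy values are hypothetical.

```python
import math


def prediction_interval(x, y, x0, t_crit):
    """Point prediction and prediction interval for ONE new y at x = x0.

    t_crit is the two-sided critical value t(alpha/2, n - 2), supplied by the caller.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    mse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    y_hat = b0 + b1 * x0
    # The leading "1 +" adds the scatter of an individual observation around the mean
    se_pred = math.sqrt(mse * (1 + 1 / n + (x0 - x_bar) ** 2 / sxx))
    return y_hat, y_hat - t_crit * se_pred, y_hat + t_crit * se_pred


# Hypothetical toy values (NOT the walleye data); 3.182 is approximately t(0.025, 3)
print(prediction_interval([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 3, 3.182))
```

At the same x0, the prediction interval is always wider than the confidence interval for the mean, because it must cover an individual fish and not just the average.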
j. Would you recommend using your model to predict the mercury level for a walleye
that is 8 inches in length? How about 29 inches? Explain your reasoning. (1 pt.)
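Part (j) is about extrapolation. A minimal first check, sketched below in Python 3, is whether a new length lies inside the range of lengths used to fit the model; the function name and the toy lengths are hypothetical, not the Island Lake measurements.

```python
def within_observed_range(x, x0):
    """Return True if x0 lies within [min(x), max(x)], the X range used to fit the model."""
    return min(x) <= x0 <= max(x)


# Hypothetical lengths in inches (NOT the Island Lake data)
lengths_toy = [10.5, 14.0, 18.2, 22.7]
print(within_observed_range(lengths_toy, 12.0))  # → True
print(within_observed_range(lengths_toy, 29.0))  # → False
```

A value outside this range means the prediction relies on the straight-line relationship continuing where no fish were measured.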
k. Would you recommend using this model to predict the mercury levels for walleyes in
the Mississippi River? Explain. (1 pt.)
l. The Island Lake walleye data file also contains the weight (lbs.) for each of the fish
sampled. Do you think using weight as opposed to length to establish consumption
advisories is a good idea? Justify your answer. (2 pts.)
2 - Waist Circumference and Deep Abdominal AT
Goal: Develop a regression model to predict/explain deep abdominal AT (Y) using waist
circumference (cm) as the predictor (X).
Data File: Waist Circumference
Despres et al. in “Estimate of Deep Abdominal Adipose-Tissue Accumulation from
Simple Anthropometric Measurements in Men”, American Journal of Clinical Nutrition,
(1991), point out that the topography of adipose tissue (AT) is associated with metabolic
complications considered as risk factors for cardiovascular disease. It is important, they
state, to measure the amount of intraabdominal AT as part of the evaluation of the
cardiovascular-disease risk of an individual. Computed tomography (CT), the only
available technique that precisely and reliably measures the amount of deep abdominal
AT, however, is costly and requires irradiation of the subject. In addition, the technique
is not available to many physicians. Despres and his colleagues conducted a study to
develop equations to predict the amount of deep abdominal AT from simple
anthropometric measurements. Their subjects were men between the ages of 18 and 42
years who were free from metabolic disease that would require treatment. Among the
measurements taken on each subject were deep abdominal AT obtained by CT and waist
circumference (cm). The question of interest is how well one can predict and estimate
deep abdominal AT from knowledge of waist circumference.
Main Items to address:
a.) Create a scatter plot of the data and compute the correlation between waist
circumference and deep abdominal AT. Comment on what you see in this plot in terms of
the relationship between deep abdominal AT and waist circumference. (2 pts.)
b.) In the context of this problem, carefully interpret the y-intercept and slope of your
estimated regression line, i.e. carefully explain what these numbers are measuring.
(You need to do more than say they are the y-intercept and slope of the line.) Also
explain to a colleague how they would use this model to predict deep abdominal AT.
(3 pts.)
c.) What is the R-Square value for this analysis? In the context of this problem, carefully
explain what this number is measuring. (2 pts.)
d.) Discuss whether or not the regression assumptions are being met. Also, identify any
outliers in the data set. (4 pts.)
Checklist for checking the regression model assumptions:
> Model Appropriate: Make sure no trends remain in the residual plot.
> Constant Variance: Make sure there are no megaphone patterns in the residual plot.
> Independence: You don’t really need to check this, as these data are not collected over
time.
> Normality: Make a histogram of the residuals and make sure they approximately follow a
normal distribution.
> Outliers: Any observations whose residuals fall outside ±2*RMSE are considered possible outliers.
e.) Give a 95% prediction interval for the deep abdominal AT of a 25-year-old man with a
waist circumference of 105 cm. Interpret this interval. (2 pts.)
Note: There is an individual with a waist circumference of 105 cm in these data.
f.) Can we use the results of this study to predict the deep abdominal AT of an individual
with a waist circumference of 135 cm? Explain. (1 pt.)
g.) Can we use the results of this study to predict the deep abdominal AT of a 50-year-old
male with a waist circumference of 100 cm? Explain. (1 pt.)
h.) Can we use the results of this study to predict the deep abdominal AT of a 24-year-old
female with a waist circumference of 70 cm? Explain. (1 pt.)