Assignment 3 - Winona State University

Assignment 3 – STAT 360 – Regression Analysis
1 - Mercury Contamination of Walleyes in Island Lake Reservoir
Datafiles: Walleyes Island Lake.JMP and Walleyes Island Lake.txt
Walleyes are the state fish of Minnesota and are the most important game fish in
MN (estimated 43,000 jobs and $2.8 billion in retail spending). The main
contaminant found in MN walleyes is mercury which can have health
consequences if ingested. Every year the MN Dept. of Natural Resources (DNR)
and the MN Dept. of Health (MDH) publish waterway specific fish consumption
guidelines for at-risk populations - children under age 15 and women who are or
may become pregnant. To see current consumption guidelines, visit:
http://www.health.state.mn.us/divs/eh/fish/eating/sitespecific.html.
Data Source: These data come from Minnesota's Fish Contaminant Monitoring
Program (FCMP), a joint effort by the DNR, MDH, MN Dept. of
Agriculture (MDA), and the Minnesota Pollution Control Agency (MPCA), and
cover the years 1990-1998.
Goal: Develop a regression model to predict/explain the mercury level found in the
tissues of a walleye (ppm) using its length (in.). Of primary interest is developing
a walleye consumption advisory based on length for walleyes in Island Lake
Reservoir near Duluth, so let Y = HGPPM and X = LGTHIN.
a) Examine a scatterplot of HGPPM vs. LGTHIN and add marginal
distribution estimates to the plot. Would you characterize the joint
distribution of (LGTHIN,HGPPM) as bivariate normal? Explain. (3 pts.)
b) Take the natural log of HGPPM and use Analyze > Distribution to
examine the distribution of both HGPPM and log(HGPPM). Find the
sample means of the mercury levels in the log scale ($\overline{\log(y)}$) and in the
original scale ($\bar{y}$). Convert $\overline{\log(y)}$ back to the original scale and compare it
to $\bar{y}$. How do they compare? (3 pts.)
c) Repeat part (b), but this time consider the sample medians instead. What
do you find? (3 pts.)
d) Fit the model:
$E(\text{HGPPM} \mid \text{LGTHIN}) = \beta_0 + \beta_1\,\text{LGTHIN}$
$\mathrm{Var}(\text{HGPPM} \mid \text{LGTHIN}) = \sigma^2$
Examine residual plots and comment on the model assumptions. (3 pts.)
e) Construct a nonconstant variance plot for the model fit in part (d),
$|\hat{e}_i|^{1/2}$ vs. $\hat{y}_i$, and discuss what this plot suggests regarding the model
assumptions. (3 pts.)
f) Despite the fact that this model is clearly deficient, interpret both
parameter estimates, $\hat{\beta}_0$ and $\hat{\beta}_1$, in words using proper units. (2 pts.)
g) Now fit the model:
$E(\log(\text{HGPPM}) \mid \text{LGTHIN}) = \beta_0 + \beta_1\,\text{LGTHIN}$
$\mathrm{Var}(\log(\text{HGPPM}) \mid \text{LGTHIN}) = \sigma^2$
Examine residual plots and comment on the model assumptions. (3 pts.)
h) Construct a nonconstant variance plot for the model fit in part (g),
$|\hat{e}_i|^{1/2}$ vs. $\hat{y}_i$, and discuss what this plot suggests regarding the model
assumptions. (3 pts.)
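For readers working in R rather than JMP, the nonconstant variance plot described in parts (e) and (h) can be sketched as follows. This is only an illustration under stated assumptions: the data are simulated stand-ins (not the Island Lake measurements), and the object names fit_raw, fit_log, ncv_raw, and ncv_log are made up for this example.

```r
# Simulated stand-in data; these are NOT the Island Lake measurements.
set.seed(1)
LGTHIN <- runif(80, min = 10, max = 25)                    # hypothetical lengths (in.)
HGPPM  <- exp(-3 + 0.12 * LGTHIN + rnorm(80, sd = 0.4))    # hypothetical Hg levels (ppm)

fit_raw <- lm(HGPPM ~ LGTHIN)        # response in the original scale
fit_log <- lm(log(HGPPM) ~ LGTHIN)   # response in the natural log scale

# Square root of the absolute residuals for each fit
ncv_raw <- sqrt(abs(resid(fit_raw)))
ncv_log <- sqrt(abs(resid(fit_log)))

# Plot sqrt(|residual|) against the fitted values, one panel per model
op <- par(mfrow = c(1, 2))
plot(fitted(fit_raw), ncv_raw,
     xlab = "Fitted HGPPM", ylab = "sqrt(|residual|)", main = "Response: HGPPM")
lines(lowess(fitted(fit_raw), ncv_raw))   # smooth trend through the points
plot(fitted(fit_log), ncv_log,
     xlab = "Fitted log(HGPPM)", ylab = "sqrt(|residual|)", main = "Response: log(HGPPM)")
lines(lowess(fitted(fit_log), ncv_log))
par(op)
```

A smooth that rises or falls steadily with the fitted values points toward nonconstant variance, while a roughly flat smooth is consistent with the constant variance assumption.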
You should find that the model using log(HGPPM) as the response is more
appropriate for modeling the relationship between the mercury levels found in
the walleyes and their length in inches. For the remainder of this problem you
will be working with the model from part (g), where the response is log(HGPPM).
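Because the response is now in the natural log scale, slope interpretations in the original scale (ppm) are multiplicative rather than additive. The following is a short worked sketch of why, writing $x$ for LGTHIN and assuming the natural log is used throughout, as in part (b):

```latex
\begin{align*}
\widehat{\log(\text{HGPPM})} &= \hat{\beta}_0 + \hat{\beta}_1 x \\
\exp\!\left(\hat{\beta}_0 + \hat{\beta}_1 (x + 1)\right)
  &= e^{\hat{\beta}_1} \cdot \exp\!\left(\hat{\beta}_0 + \hat{\beta}_1 x\right) \\
\text{percent change per 1-inch increase} &= 100\left(e^{\hat{\beta}_1} - 1\right)\%
\end{align*}
```

The endpoints of a confidence interval for $\beta_1$ can be transformed the same way. Note that a back-transformed fitted value, $\exp(\hat{\beta}_0 + \hat{\beta}_1 x)$, is usually read as an estimate of the median mercury level at length $x$ (assuming the log-scale errors are symmetric), not the mean.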
i) Test the hypotheses
NH: $E(\log(\text{HGPPM}) \mid \text{LGTHIN}) = \beta_0$
AH: $E(\log(\text{HGPPM}) \mid \text{LGTHIN}) = \beta_0 + \beta_1\,\text{LGTHIN}$
and summarize your findings. (3 pts.)
j) Conduct the following test for the population slope parameter ($\beta_1$):
NH: $\beta_1 = 0$
AH: $\beta_1 \neq 0$
Summarize your results. Square the t-statistic for this test and compare it
to the F-statistic from the test in part (i). What do you find? (3 pts.)
k) What is the R-Square ($R^2$) value for the regression of log(HGPPM) on
LGTHIN? In the context of this problem, carefully explain what this value
is measuring. (3 pts.)
l) Use the estimated slope ($\hat{\beta}_1$) and the associated CI for $\beta_1$ to interpret the
change in the response in the original scale associated with a 1-inch
increase in the length of walleyes in Island Lake. Summarize your
findings carefully and thoroughly, both multiplicatively and in terms of a
percent increase in the mercury level (ppm). (6 pts.)
m) Construct a scatterplot of Y = HGPPM vs. X = LGTHIN and use Bivariate
Fit > Fit Special to fit the model $E(\log(\text{HGPPM}) \mid \text{LGTHIN})$. Also add
the shaded confidence and prediction intervals to the plot. Include this
plot below and discuss it. (3 pts.)
n) Give a point estimate and CI for $E(\log(\text{HGPPM}) \mid \text{LGTHIN} = 17.9)$. Also
convert these back to the original scale and interpret. (4 pts.)
o) Give a point estimate and PI for $\widetilde{\log(\text{HGPPM})} \mid \text{LGTHIN} = 20$. Also convert
these back to the original scale and interpret. (4 pts.)
Note regarding part (o): There are no walleyes in the sample that are 20 inches in length.
To obtain these estimates you will need to save the Prediction Interval Formula to the
data table and then add a new row to the spreadsheet corresponding to a walleye that is 20
inches in length.
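In R, a comparable calculation can be sketched with predict(), which accepts new predictor values directly. This is a sketch under stated assumptions, not the JMP procedure described above: the column names HGPPM and LGTHIN follow the assignment, but the data are simulated stand-ins and the names fit, new_fish, pi_log, and pi_ppm are illustrative.

```r
# Simulated stand-in data; these are NOT the Island Lake measurements.
set.seed(2)
Island <- data.frame(LGTHIN = runif(80, min = 10, max = 25))   # hypothetical lengths (in.)
Island$HGPPM <- exp(-3 + 0.12 * Island$LGTHIN + rnorm(80, sd = 0.4))

# Simple linear regression with the natural log of mercury as the response
fit <- lm(log(HGPPM) ~ LGTHIN, data = Island)

# A length that need not appear in the sample
new_fish <- data.frame(LGTHIN = 20)

# 95% prediction interval in the log scale (columns: fit, lwr, upr)
pi_log <- predict(fit, newdata = new_fish, interval = "prediction", level = 0.95)

# Back-transform the point estimate and both endpoints to ppm
pi_ppm <- exp(pi_log)

print(pi_log)
print(pi_ppm)
```

Using interval = "confidence" in the same call gives a confidence interval for the mean log response at the chosen length instead of a prediction interval for an individual fish.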
It is recommended that humans should not consume more than one fish per
month with mercury levels in its tissues greater than 0.5 ppm. Because your
average walleye angler does not carry a gas spectrometer in their fishing boat,
actually measuring the Hg level found in a walleye they have caught is a
problem. However, it is very easy for an angler to measure the length of their
walleye in inches.
p) Using your regression model, what length of walleye would you
recommend for the “do not eat more than one walleye exceeding _______
inches per month” advisory? (2 pts.)
Note regarding parts (p & q): The process of finding an X value associated with a
specific value for the response (Y) is called inverse prediction. Also keep in mind that your
model is for log(Y), not Y, so you will need to take this into account when answering these
questions.
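As a minimal sketch of inverse prediction for the log-response model, set the fitted line equal to the log of the target mercury level $y_0$ and solve for the length $x_0$. The symbols $y_0$ and $x_0$ are introduced here only for illustration, and the natural log is assumed:

```latex
\begin{align*}
\log(y_0) &= \hat{\beta}_0 + \hat{\beta}_1 x_0 \\
x_0 &= \frac{\log(y_0) - \hat{\beta}_0}{\hat{\beta}_1}
\end{align*}
```

This gives only a point solution based on the fitted line; it does not account for the uncertainty in the estimated coefficients or the fish-to-fish variation around the line.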
q) It is also recommended that humans should never consume fish with
mercury levels exceeding 1 ppm in their tissues. Complete the following
“we recommend that you do not eat any walleyes exceeding __________
inches from Island Lake”. (2 pts.)
r) Would you recommend using your model to predict the mercury level for
a walleye that is 7 inches in length? How about 29 inches? Explain your
reasoning. (2 pts.)
s) Would you recommend using this model to predict the mercury levels
and develop consumption advisories for walleyes in the Mississippi
River? Explain. (1 pt.)
t) The Island Lake walleye data also contain the weight (lbs.) for each of the
fish sampled. Do you think using weight as opposed to length to establish
consumption advisories is a good idea? Justify your answer by fitting
models for mercury or log mercury level using X = WTLB as the predictor
and contrasting the results with those above. (4 pts.)
u) Another possible model to consider is:
$E(\log(\text{HGPPM}) \mid \text{LGTHIN}) = \beta_0 + \beta_1 \log(\text{LGTHIN})$
$\mathrm{Var}(\log(\text{HGPPM}) \mid \text{LGTHIN}) = \sigma^2$
Fit this model, examine residual plots, and comment on the adequacy of
this model. (5 pts.)
v) Use the estimated slope ($\hat{\beta}_1$) from the model in part (u) to interpret the
change in the response in the original scale associated with a 1 unit
increase in log(LGTHIN). (3 pts.)
w) Use the estimated slope ($\hat{\beta}_1$) from the model in part (u) to interpret the
change in the response in the original scale associated with a 20% increase
in the length of walleyes. (3 pts.)
x) Using this regression model, what length of walleye would you
recommend for the “do not eat more than one walleye exceeding _______
inches per month” advisory? How does your recommendation compare
to the recommendation from part (p)? (3 pts.)
y) It is also recommended that humans should never consume fish with
mercury levels exceeding 1 ppm in their tissues. Complete the following
“we recommend that you do not eat any walleyes exceeding __________
inches from Island Lake” using this regression model. How does your
recommendation compare to the recommendation from part (q)? (3 pts.)
z) Use R to fit the model $E(\log(\text{HGPPM}) \mid \text{LGTHIN}) = \beta_0 + \beta_1 \log(\text{LGTHIN})$
and obtain a model summary. Include the output from R below. To
retain the appearance of R output, use Courier New (10 pt) as the
font. (5 pts.)
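Code to run in R:
# Read the comma-delimited data file, chosen interactively
Island = read.table(file.choose(), header=TRUE, sep=",")
names(Island)             # check the variable names
attach(Island)            # make HGPPM and LGTHIN available by name
logHg = log(HGPPM)        # natural log of mercury level (ppm)
logX = log(LGTHIN)        # natural log of length (in.)
# trendscatter() is not part of base R; it is assumed to come from course-supplied functions
trendscatter(logHg~logX)
lm1 = lm(logHg~logX)      # fit the log-log model
summary(lm1)              # coefficient table, R-square, and F-test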
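The coefficients reported by summary(lm1) can be read in the original scale as follows. This is a sketch assuming natural logs for both variables; the symbols $c$ (a multiplicative change in length), $y_0$ (a target mercury level in ppm), and $x_0$ (the corresponding length) are introduced only for illustration.

```latex
\begin{align*}
\widehat{\log(\text{HGPPM})} &= \hat{\beta}_0 + \hat{\beta}_1 \log(\text{LGTHIN})
  \;\Longrightarrow\;
  \widehat{\text{HGPPM}} = e^{\hat{\beta}_0}\,\text{LGTHIN}^{\hat{\beta}_1} \\
\text{length multiplied by } c
  &\;\Longrightarrow\; \widehat{\text{HGPPM}} \text{ multiplied by } c^{\hat{\beta}_1}
  \quad (\text{e.g. } c = 1.2 \text{ for a 20\% increase}) \\
x_0 &= \exp\!\left(\frac{\log(y_0) - \hat{\beta}_0}{\hat{\beta}_1}\right)
\end{align*}
```

A 1 unit increase in log(LGTHIN) corresponds to multiplying the length by $e \approx 2.718$, so it multiplies the back-transformed fitted value by $e^{\hat{\beta}_1}$.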