Assignment 3 – STAT 360 – Regression Analysis

1 - Mercury Contamination of Walleyes in Island Lake Reservoir

Datafiles: Walleyes Island Lake.JMP and Walleyes Island Lake.txt

Walleyes are the state fish of Minnesota and the most important game fish in MN (an estimated 43,000 jobs and $2.8 billion in retail spending). The main contaminant found in MN walleyes is mercury, which can have health consequences if ingested. Every year the MN Dept. of Natural Resources (DNR) and the MN Dept. of Health (MDH) publish waterway-specific fish consumption guidelines for at-risk populations: children under age 15 and women who are or may become pregnant. To see the current consumption guidelines, visit: http://www.health.state.mn.us/divs/eh/fish/eating/sitespecific.html.

Data Source: These data come from Minnesota's Fish Contaminant Monitoring Program (FCMP), a joint effort by the DNR, MDH, MN Dept. of Agriculture (MDA), and the Minnesota Pollution Control Agency (MPCA), and cover the years 1990 to 1998.

Goal: Develop a regression model to predict/explain the mercury level found in the tissues of a walleye (ppm) using its length (in.). Of primary interest is developing a length-based consumption advisory for walleyes in Island Lake Reservoir near Duluth, so let Y = HGPPM and X = LGTHIN.

a) Examine a scatterplot of HGPPM vs. LGTHIN and add marginal distribution estimates to the plot. Would you characterize the joint distribution of (LGTHIN, HGPPM) as bivariate normal? Explain. (3 pts.)

b) Take the natural log of HGPPM and use Analyze > Distribution to examine the distributions of both HGPPM and log(HGPPM). Find the sample means of the mercury levels in the log scale, $\overline{\log(y)}$, and in the original scale, $\bar{y}$. Convert $\overline{\log(y)}$ back to the original scale and compare it to $\bar{y}$. How do they compare? (3 pts.)

c) Repeat part (b), but this time consider the sample medians instead. What do you find? (3 pts.)
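A minimal R sketch of the comparison in parts (b) and (c), using a small made-up vector of mercury levels (hypothetical values, not the Island Lake data). Exponentiating the mean of the logs gives the geometric mean, which can never exceed the ordinary mean; exponentiating the median of the logs recovers the median itself, because log is monotone.

```r
# Hypothetical mercury levels (ppm) for 9 fish; NOT the Island Lake data.
y <- c(0.12, 0.18, 0.25, 0.31, 0.40, 0.55, 0.70, 0.90, 1.60)
logy <- log(y)  # natural log, as in part (b)

# Part (b): back-transforming the mean of the logs gives the geometric mean,
# which is never larger than the ordinary sample mean.
geo_mean <- exp(mean(logy))
arith_mean <- mean(y)

# Part (c): log() is monotone, so the middle value stays the middle value
# and back-transforming the median of the logs recovers median(y).
back_median <- exp(median(logy))
orig_median <- median(y)

cat("exp(mean(log y)) =", round(geo_mean, 4), "  mean(y) =", round(arith_mean, 4), "\n")
cat("exp(median(log y)) =", round(back_median, 4), "  median(y) =", round(orig_median, 4), "\n")
```

With an even number of fish the sample median averages the two middle values, so in that case the two medians can differ slightly.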
d) Fit the model
$E(\mathrm{HGPPM} \mid \mathrm{LGTHIN}) = \beta_0 + \beta_1 \mathrm{LGTHIN}$
$\mathrm{Var}(\mathrm{HGPPM} \mid \mathrm{LGTHIN}) = \sigma^2$
Examine residual plots and comment on the model assumptions. (3 pts.)

e) Construct a nonconstant variance plot for the model fit in part (d), $|\hat{e}_i|^{1/2}$ vs. $\hat{y}_i$, and discuss what this plot suggests regarding the model assumptions. (3 pts.)

f) Although this model is clearly deficient, interpret both parameter estimates, $\hat{\beta}_0$ and $\hat{\beta}_1$, in words using proper units. (2 pts.)

g) Now fit the model
$E(\log(\mathrm{HGPPM}) \mid \mathrm{LGTHIN}) = \beta_0 + \beta_1 \mathrm{LGTHIN}$
$\mathrm{Var}(\log(\mathrm{HGPPM}) \mid \mathrm{LGTHIN}) = \sigma^2$
Examine residual plots and comment on the model assumptions. (3 pts.)

h) Construct a nonconstant variance plot for the model fit in part (g), $|\hat{e}_i|^{1/2}$ vs. $\hat{y}_i$, and discuss what this plot suggests regarding the model assumptions. (3 pts.)

You should find that the model using log(HGPPM) as the response is more appropriate for modeling the relationship between the mercury levels found in the walleyes and their length in inches. For the remainder of this problem you will be working with the model from part (g), where the response is log(HGPPM).

i) Test the hypotheses
NH: $E(\log(\mathrm{HGPPM}) \mid \mathrm{LGTHIN}) = \beta_0$
AH: $E(\log(\mathrm{HGPPM}) \mid \mathrm{LGTHIN}) = \beta_0 + \beta_1 \mathrm{LGTHIN}$
and summarize your findings. (3 pts.)

j) Conduct the following test for the population slope parameter ($\beta_1$):
NH: $\beta_1 = 0$
AH: $\beta_1 \neq 0$
Summarize your results. Square the t-statistic for this test and compare it to the F-statistic from the test in part (i). What do you find? (3 pts.)

k) What is the R-Square ($R^2$) value for the regression of log(HGPPM) on LGTHIN? In the context of this problem, carefully explain what this value is measuring. (3 pts.)

l) Use the estimated slope ($\hat{\beta}_1$) and the associated CI for $\beta_1$ to interpret the change in the response in the original scale associated with a 1 inch increase in the length of walleyes in Island Lake. Summarize your findings carefully and thoroughly, both multiplicatively and in terms of a percent increase in the mercury level (ppm). (6 pts.)
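The calculations behind parts (d) through (l) can be sketched in base R as below. This is a sketch under stated assumptions: the lengths and mercury levels are simulated from an assumed log-linear relationship (hypothetical values, not the Island Lake sample), and only `lm()`, `summary()`, and `confint()` are used. It draws the $|\hat{e}_i|^{1/2}$ vs. fitted-values plot, checks that the squared slope t-statistic equals the overall F-statistic, and back-transforms the slope: $e^{\hat{\beta}_1}$ is the estimated multiplicative change in median HGPPM per additional inch, and $100(e^{\hat{\beta}_1} - 1)$ is the same change as a percent.

```r
# Simulated stand-in data (hypothetical; NOT the Island Lake sample).
set.seed(360)
n <- 60
LGTHIN <- runif(n, 10, 26)                             # lengths (in.)
HGPPM <- exp(-3 + 0.12 * LGTHIN + rnorm(n, sd = 0.3))  # assumed log-linear truth

fit <- lm(log(HGPPM) ~ LGTHIN)  # the model from part (g)

# Parts (e)/(h): nonconstant variance plot, |e_i|^(1/2) against fitted values
plot(fitted(fit), sqrt(abs(resid(fit))),
     xlab = "Fitted values", ylab = "|residual|^(1/2)")

# Parts (i)/(j): in simple regression the squared slope t-statistic equals F
t_slope <- summary(fit)$coefficients["LGTHIN", "t value"]
F_stat <- summary(fit)$fstatistic[["value"]]
print(c(t_squared = t_slope^2, F = F_stat))

# Part (l): back-transform the slope and its 95% CI
b1 <- coef(fit)[["LGTHIN"]]
ci <- confint(fit)["LGTHIN", ]  # 95% CI for beta_1 on the log scale
mult <- exp(b1)                 # multiplicative change in median HGPPM per extra inch
pct <- 100 * (mult - 1)         # the same change as a percent increase
print(c(multiplier = mult, percent = pct))
print(exp(ci))                  # CI for the multiplier
print(100 * (exp(ci) - 1))      # CI for the percent increase
```

Exponentiating the CI endpoints works because $\exp$ is increasing, so the interval for $\beta_1$ maps directly to an interval for the multiplier.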
m) Construct a scatterplot of Y = HGPPM vs. X = LGTHIN and use Bivariate Fit > Fit Special to fit the model $E(\log(\mathrm{HGPPM}) \mid \mathrm{LGTHIN}) = \beta_0 + \beta_1 \mathrm{LGTHIN}$. Also add the shaded confidence and prediction intervals to the plot. Include this plot below and discuss it. (3 pts.)

n) Give a point estimate and CI for $E(\log(\mathrm{HGPPM}) \mid \mathrm{LGTHIN} = 17.9)$. Also convert these back to the original scale and interpret. (4 pts.)

o) Give a point estimate and PI for $\log(\mathrm{HGPPM})$ given $\mathrm{LGTHIN} = 20$. Also convert these back to the original scale and interpret. (4 pts.)

Note regarding part (o): There are no walleyes in the sample that are 20 inches in length. To obtain these estimates you will need to save the Prediction Interval Formula to the data table and then add a new row to the spreadsheet corresponding to a walleye that is 20 inches in length.

It is recommended that humans not consume more than one fish per month whose tissue mercury level exceeds 0.5 ppm. Because your average walleye angler does not carry a gas spectrometer in their fishing boat, actually measuring the Hg level in a walleye they have caught is a problem. However, it is very easy for an angler to measure the length of their walleye in inches.

p) Using your regression model, what length of walleye would you recommend for the “do not eat more than one walleye exceeding _______ inches per month” advisory? (2 pts.)

Note regarding parts (p) and (q): The process of finding an X value associated with a specific value of the response (Y) is called inverse prediction. Also keep in mind that your model is for log(Y), not Y, so you will need to take this into account when answering these questions.

q) It is also recommended that humans never consume fish with mercury levels exceeding 1 ppm in their tissues. Complete the following: “we recommend that you do not eat any walleyes exceeding __________ inches from Island Lake”. (2 pts.)
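A sketch of parts (n) through (q) in R, again on simulated stand-in data (hypothetical values, not the Island Lake sample; the helper `inverse_length()` is illustrative and not part of any package). `predict()` with `interval = "confidence"` or `interval = "prediction"` returns limits on the log scale; exponentiating them gives limits for the median mercury level (CI, assuming normal errors on the log scale) or for a single new fish (PI). For inverse prediction, solve $\log(c) = \hat{\beta}_0 + \hat{\beta}_1 x$ for $x$, giving $x = (\log(c) - \hat{\beta}_0)/\hat{\beta}_1$ with $c = 0.5$ or $1$ ppm.

```r
# Simulated stand-in data (hypothetical; NOT the Island Lake sample).
set.seed(360)
LGTHIN <- runif(60, 10, 26)
HGPPM <- exp(-3 + 0.12 * LGTHIN + rnorm(60, sd = 0.3))
fit <- lm(log(HGPPM) ~ LGTHIN)

# Part (n): estimate and 95% CI for E(log(HGPPM) | LGTHIN = 17.9)
ci_log <- predict(fit, newdata = data.frame(LGTHIN = 17.9), interval = "confidence")
# Part (o): prediction and 95% PI for log(HGPPM) of one new 20-inch fish
pi_log <- predict(fit, newdata = data.frame(LGTHIN = 20), interval = "prediction")
print(exp(ci_log))  # original scale: median HGPPM at 17.9 in. (columns fit, lwr, upr)
print(exp(pi_log))  # original scale: HGPPM for a single 20-inch walleye

# Parts (p)/(q): inverse prediction. Solve log(c) = b0 + b1 * x for x.
# inverse_length() is a hypothetical helper written for this sketch.
b <- coef(fit)
inverse_length <- function(c_ppm) {
  (log(c_ppm) - b[["(Intercept)"]]) / b[["LGTHIN"]]
}
print(inverse_length(c(0.5, 1)))  # lengths (in.) where the estimated median reaches 0.5 and 1 ppm
```

This gives only a point estimate of each cutoff length; it ignores the uncertainty in $\hat{\beta}_0$ and $\hat{\beta}_1$.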
r) Would you recommend using your model to predict the mercury level for a walleye that is 7 inches in length? How about 29 inches? Explain your reasoning. (2 pts.)

s) Would you recommend using this model to predict the mercury levels and develop consumption advisories for walleyes in the Mississippi River? Explain. (1 pt.)

t) The Island Lake walleye data also contain the weight (lbs.) of each fish sampled. Do you think using weight instead of length to establish consumption advisories is a good idea? Justify your answer by fitting models for mercury or log mercury level using X = WTLB as the predictor and contrasting the results with those above. (4 pts.)

u) Another possible model to consider is
$E(\log(\mathrm{HGPPM}) \mid \mathrm{LGTHIN}) = \beta_0 + \beta_1 \log(\mathrm{LGTHIN})$
$\mathrm{Var}(\log(\mathrm{HGPPM}) \mid \mathrm{LGTHIN}) = \sigma^2$
Fit this model, examine residual plots, and comment on the adequacy of this model. (5 pts.)

v) Use the estimated slope ($\hat{\beta}_1$) from the model in part (u) to interpret the change in the response in the original scale associated with a 1 unit increase in log(LGTHIN). (3 pts.)

w) Use the estimated slope ($\hat{\beta}_1$) from the model in part (u) to interpret the change in the response in the original scale associated with a 20% increase in the length of walleyes. (3 pts.)

x) Using this regression model, what length of walleye would you recommend for the “do not eat more than one walleye exceeding _______ inches per month” advisory? How does your recommendation compare to the recommendation from part (p)? (3 pts.)

y) It is also recommended that humans never consume fish with mercury levels exceeding 1 ppm in their tissues. Complete the following: “we recommend that you do not eat any walleyes exceeding __________ inches from Island Lake” using this regression model. How does your recommendation compare to the recommendation from part (q)? (3 pts.)

z) Use R to fit the model $E(\log(\mathrm{HGPPM}) \mid \mathrm{LGTHIN}) = \beta_0 + \beta_1 \log(\mathrm{LGTHIN})$ and obtain a model summary. Include the output from R below.
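To retain the appearance of R output, use Courier New (10 pt) as the font. (5 pts.)

Code to run in R (trendscatter() comes from the s20x package; install it once with install.packages("s20x")):

library(s20x)                                                # provides trendscatter()
Island = read.table(file.choose(), header = TRUE, sep = ",")  # choose the .txt data file
names(Island)                                                # check the column names
attach(Island)                                               # use HGPPM, LGTHIN directly
logHg = log(HGPPM)                                           # natural log of mercury (ppm)
logX = log(LGTHIN)                                           # natural log of length (in.)
trendscatter(logHg ~ logX)                                   # scatterplot with a smoothed trend
lm1 = lm(logHg ~ logX)                                       # fit log(HGPPM) on log(LGTHIN)
summary(lm1)                                                 # model summary for part (z)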
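For the log-log model in parts (u) through (y), the back-transformations change. A 1-unit increase in log(LGTHIN) multiplies the median HGPPM by $e^{\hat{\beta}_1}$; since $\log(1.2x) = \log(x) + \log(1.2)$, a 20% increase in length multiplies it by $1.2^{\hat{\beta}_1}$; and inverse prediction becomes $x = \exp\{(\log(c) - \hat{\beta}_0)/\hat{\beta}_1\}$. The R sketch below illustrates these on data simulated from an assumed power-law relationship (hypothetical values, not Island Lake); `inverse_length_ll()` is an illustrative helper, not part of any package.

```r
# Simulated stand-in data from an assumed power law (hypothetical; NOT Island Lake).
set.seed(360)
LGTHIN <- runif(60, 10, 26)
HGPPM <- exp(-8 + 2.5 * log(LGTHIN) + rnorm(60, sd = 0.3))
fit_ll <- lm(log(HGPPM) ~ log(LGTHIN))  # the model from part (u)

b0 <- coef(fit_ll)[[1]]  # intercept
b1 <- coef(fit_ll)[[2]]  # slope on log(LGTHIN)

# Part (v): +1 in log(LGTHIN) multiplies the median HGPPM by exp(b1)
print(exp(b1))
# Part (w): a 20% longer fish has log length larger by log(1.2),
# so the median HGPPM is multiplied by exp(b1 * log(1.2)) = 1.2^b1
print(1.2^b1)
# Parts (x)/(y): inverse prediction when both sides are logged.
# inverse_length_ll() is a hypothetical helper written for this sketch.
inverse_length_ll <- function(c_ppm) exp((log(c_ppm) - b0) / b1)
print(inverse_length_ll(c(0.5, 1)))
```

Because both variables are logged, $\hat{\beta}_1$ acts as an elasticity: the multiplier for any percentage change in length depends only on the ratio of lengths, not on the starting length.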