STAT 360: Take-Home Midterm
Fall 2014
Points: 75 pts
Name(s):______________________ _____________________

1. For the 1st problem, consider the FireAntMound data provided on our course website. This data is from an investigation centered on computing the volume of fire ant mounds. This data is courtesy of J.T. Vogt, who works for the US Forest Service and is stationed in Tennessee.

Source: Vogt, J.T. (2007). Three-Dimensional Sampling Method For Characterizing Ant Mounds. Florida Entomologist 90(3), pp. 553-558.

Variable Descriptions:
MoundID: Mound identification
Long Side (dm): Length of the long side of the mound (units of measurement are decimeters, dm)
Short Side (dm): Length of the short side of the mound
Height (dm): Height of the mound
Volume (dm^3): Volume of the mound, using Raster Method

In this paper, Vogt highlights two different methods for estimating the volume of fire ant mounds.

Ellipsoid (or Manual) Method:
$$\tfrac{1}{2}\,\text{Volume of Ellipsoid} = \frac{2}{3} \cdot \pi \cdot \frac{\text{Length}}{2} \cdot \frac{\text{Width}}{2} \cdot \text{Height}$$

Oblique Cone Method:
$$\text{Volume of Oblique Cone} = \frac{1}{3} \cdot \pi \cdot \frac{\text{Length}}{2} \cdot \frac{\text{Width}}{2} \cdot \text{Height}$$

[Figures: Volume using Ellipsoid Method; Volume using Oblique Cone Method]

Consider the following snip from Vogt’s article. Several sentences from the top of page 556 have been identified here.

a. References to “raster method” refer to the Volume variable in our dataset, and “manual method” is the calculation of volume using half of an ellipsoid. Verify Statement #1, i.e. that the variation in Volume is indeed less than the variation in estimated volume using the ellipsoid method. (2 pts)
Turn In: Output that supports this statement with a brief discussion.

b. Add a kernel smoother to a plot of the residuals from the ellipsoid method to verify the first part of Statement #2, i.e. “Overestimation of mound volume by the manual method”. Discuss how this portion of Statement #2 is supported by your plot. (3 pts)
Turn In: Plot with kernel smoother and your discussion.

c.
Consider next the second portion of Statement #2, i.e. “(Overestimation) increased with increasing mound size.” How is this portion of the statement supported by the plot you obtained above? (2 pts)

d. Obtain a plot of the residuals from the oblique cone method to verify Statement #3, i.e. “… more closely resembling an oblique cone…”. How does this plot (along with the residual plot obtained above for the ellipsoid method) support this statement? (3 pts)
Turn In: Plot of residuals and your discussion.

e. Compute the appropriate R^2 quantities to verify the second part of Statement #3, i.e. “… more closely resembling an oblique cone …”. Discuss how these values support Statement #3. (5 pts)
Turn In: All output used to compute the total unexplained variability, the calculations used to obtain the R^2 values, and your discussion.

f. Fit a model with Y = Estimated Volume using Oblique Cone and X = Volume to verify the quantities reported in Statement #6. (2 pts)
Turn In: Regression output that supports these quantities.

g. Create a new variable, say ObliquePrediction_Adjusted, by adding the correction factor specified in Statement #7 to the predicted volumes from the oblique cone method. Compute an R^2 quantity to verify that this correction factor improved the predictions. (3 pts)
Turn In: Screen capture or printout of this new set of predicted values.
Turn In: All output used to compute the R^2 value.

2. The Donkey (or Ass) Problem. The problem is centered on the estimation of the weight of donkeys in rural Kenya. Veterinarians need a quick and accurate method of weighing donkeys so that the right dosage of drugs can be administered. Published in Significance Magazine, Oct 2014.

Consider the following possible mean functions for this investigation.
Predictor    Structure for Mean and Variance Functions
Length       $E(\text{Weight} \mid \text{Length}) = \beta_0 + \beta_1 \cdot \text{Length}$
             $\mathrm{VAR}(\text{Weight} \mid \text{Length}) = \sigma^2$
Girth        $E(\text{Weight} \mid \text{Girth}) = \beta_0 + \beta_1 \cdot \text{Girth}$
             $\mathrm{VAR}(\text{Weight} \mid \text{Girth}) = \sigma^2$
Height       $E(\text{Weight} \mid \text{Height}) = \beta_0 + \beta_1 \cdot \text{Height}$
             $\mathrm{VAR}(\text{Weight} \mid \text{Height}) = \sigma^2$

Note: I have set JMP to exclude Observation #9. The exclusion can be removed by right-clicking and selecting Unexclude.

a. Which single predictor (Length, Girth, or Height) is best? (2 pts)
Turn In: Rationale for how you made this determination.

b. Write several sentences (in paragraph form with proper grammar, etc.) that communicate a summary of the important elements of this regression output. You should write these statements as if the intended audience is the Veterinary Association of Kenya, see http://www.kva.co.ke/. (6 pts)
Turn In: Written sentences in nice paragraph form.

c. Obtain the appropriate residual plot from your regression analysis. Again, using a few sentences, discuss whether or not the theoretical regression assumptions are being met for your analysis. (4 pts)
Turn In: Plot of your residuals and your discussion.

d. Use the Fit Y by X platform in JMP, and the associated options from its drop-down menu, to obtain a plot similar to the one shown below. Also, write a few sentences regarding this plot so that a veterinarian can understand how to read it. (4 pts)
Note: I was able to remove the dots by highlighting all dots, right-clicking, and setting Transparency to 0.
Turn In: Plot and your discussion regarding how to use this plot.

e. Suppose that the veterinarian does not really like your plot because such a plot does not follow acceptable dosage protocol. The protocols allow for a 10% error in weight when determining the appropriate amount of dosage. The vet has drawn a plot she’d rather have. Create this plot in whatever software package you’d like. (4 pts)
Turn In: The plot with the ±10% error bands.

f. [Time to earn your $60,000 future starting salary.]
Use whatever procedure you’d like to identify all donkeys that are outside the ±10% error bands you provided in part e. The dosage amount for these donkeys would have been either too high or too low. Other than their weight, is there anything particularly unusual about these donkeys? That is, do these donkeys tend to be old? Do they tend to be of the same sex? Is the inappropriate amount of dosage a function of Body Condition Score? Is there something we can attribute these inappropriate dosages to? If so, as collaborators in this research we have an ethical obligation to inform the veterinarians. (5 pts)
Turn In: A list of row identifiers for the donkeys which would have suffered from an inappropriate dose of medication. A few sentences about the potential reasons for any inappropriate dosages.

g. For this last part, Unexclude Observation #9 (which I believe to be a newborn donkey). Refit the model you have been working with. Use JMP to obtain the leverage values and plot a leverage value plot similar to what we did on page 8/9 of Handout #9. Why does this plot strongly support leaving Observation #9 out of our analysis? Discuss. (5 pts)
Turn In: The leverage plot and your discussion regarding leaving Observation #9 out of our analysis.

3. Consider the following data that I gathered on my trip to Hawaii last month. (Yes, I really did this.) The airplane on the way to Los Angeles was modern and had flight information data that was updated about every minute or so throughout the entire flight. This problem will utilize only the data from the last 30 minutes or so.

[Figures: An airplane (not the one I rode in); Screen on seat back (which contains flight data); Data as entered into JMP]

The vertical temperature profile in our atmosphere varies depending on the geographic characteristics of a particular area. Suppose in Los Angeles, the vertical temperature profile is of the following form.
The temperature near sea level stays a bit cooler due to cooler ocean air coming in, whereas the temperature a couple thousand feet above sea level tends to be a bit higher due to warm air moving over the mountains. After this point, the temperatures continue to drop at a fairly constant rate as altitude rises.

a. Create a plot of Temp (F) vs. Altitude (x-axis) for the flight data I collected in the last 30 minutes or so of my flight. Does the data I collected agree with the temperature profile schematic provided above? Discuss in detail. (4 pts)
Turn In: The plot and relevant discussions.

b. Fit the following mean and variance functions to this data.
$$E(\text{Temperature} \mid \text{Altitude}) = \beta_0 + \beta_1 \cdot \text{Altitude}$$
$$\mathrm{VAR}(\text{Temperature} \mid \text{Altitude}) = \sigma^2$$
In what ways does this model fail to capture Temperature as a function of Altitude? Discuss. (4 pts)
Turn In: The plot with estimated model and relevant discussions.

c. Obtain the residual plot for the model specified in part b. In what ways does this plot support your answer in part b? (3 pts)
Turn In: The residual plot with relevant discussions.

d. Is a kernel smoother able to appropriately model this particular temperature profile? You should obtain a residual plot for your kernel smoother to ensure an adequate fit. Discuss the adequacy of the predictions from the kernel smoother. (4 pts)
Turn In: The plot with kernel smoother predictions, a plot of residuals from this fit, and all relevant discussions.

One major problem in using a kernel smoother is that making predictions for new observations is not easily done (JMP saves the prediction formulas, but this is not typical in other software). Suppose your boss wants you to fit the following models.

Model #1 for Altitude ≤ 1750:
$$E(\text{Temperature} \mid \text{Altitude} \le 1750) = \beta_0 + \beta_1 \cdot \text{Altitude}$$
$$\mathrm{VAR}(\text{Temperature} \mid \text{Altitude} \le 1750) = \sigma_1^2$$

Model #2 for Altitude > 1750:
$$E(\text{Temperature} \mid \text{Altitude} > 1750) = \beta_3 + \beta_4 \cdot \text{Altitude}$$
$$\mathrm{VAR}(\text{Temperature} \mid \text{Altitude} > 1750) = \sigma_2^2$$

e. Fit these models separately.
Figure out how to plot a single mean function for this situation. You can use whatever software package you’d like to do this. (Note: The predicted values from the two models may not match near 1750; this is OK, as getting them to match exactly is beyond the scope of this course.) (5 pts)
Turn In: A plot that clearly indicates the predicted Temperature for Altitude values from 0 up to 35000 or so.

f. Use the single set of predicted values you obtained above (from joining Model #1 and #2) to compute the residuals. Find the R^2 for this model that was constructed in two parts. What is this value? (5 pts)
Turn In: All output used to compute the R^2 value.
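
For anyone working outside JMP, the mechanics of parts e and f of Problem 3 (fit two lines separately, join their predictions, then compute R^2 = 1 - SSE/SST from the joined residuals) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names (fit_line, two_piece_predictions, r_squared) are hypothetical, and the altitude and temperature values are made up for demonstration. They are not the flight data from the course website.

```python
import numpy as np


def fit_line(x, y):
    """Ordinary least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    b1, b0 = np.polyfit(x, y, deg=1)  # polyfit returns the highest degree first
    return b0, b1


def two_piece_predictions(altitude, temp, cutoff=1750.0):
    """Fit separate lines at or below and above the cutoff; return joined predictions."""
    altitude = np.asarray(altitude, dtype=float)
    temp = np.asarray(temp, dtype=float)
    low = altitude <= cutoff
    b0_low, b1_low = fit_line(altitude[low], temp[low])
    b0_high, b1_high = fit_line(altitude[~low], temp[~low])
    # Use Model #1 predictions at or below the cutoff, Model #2 above it.
    return np.where(low,
                    b0_low + b1_low * altitude,
                    b0_high + b1_high * altitude)


def r_squared(y, y_hat):
    """R^2 = 1 - SSE / SST, with SST measured about the overall mean of y."""
    y = np.asarray(y, dtype=float)
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - sse / sst


# Made-up values for illustration only (NOT the flight data).
altitude = np.array([0, 500, 1000, 1500, 1750,
                     2500, 5000, 10000, 20000, 35000], dtype=float)
temp = np.array([60, 62, 65, 67, 68,
                 66, 58, 40, 5, -48], dtype=float)

pred = two_piece_predictions(altitude, temp)
print("R^2 for the two-part model:", r_squared(temp, pred))
```

Note that SST here is computed about the overall mean of Temperature rather than about each piece's own mean, so the resulting value is a single R^2 for the joined model and not an average of the two separate regressions' R^2 values.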