STAT 360: Exam #1

STAT 360: Take-Home Midterm
Fall 2014
Points: 75 pts
Name(s):______________________
_____________________
1. For the first problem, consider the FireAntMound data provided on our course website. This
data is from an investigation centered on computing the volume of fire ant mounds. This
data is courtesy of J.T. Vogt, who works for the US Forest Service and is stationed in
Tennessee.
Source: Vogt, J.T. (2007). Three-Dimensional Sampling Method For Characterizing Ant
Mounds. Florida Entomologist 90(3), pp. 553-558.
Variable Descriptions:
• MoundID: Mound identification
• Long Side (dm): Length of the long side of the mound (units of measurement are decimeters, dm)
• Short Side (dm): Length of the short side of the mound
• Height (dm): Height of the mound
• Volume (dm^3): Volume of the mound, using Raster Method
In this paper, Vogt highlights two different methods for estimating the volume of fire ant
mounds.
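Ellipsoid (or Manual) Method:
$\frac{1}{2}\,(\text{Volume of Ellipsoid}) = \frac{2}{3} \cdot \pi \cdot \frac{\text{Length}}{2} \cdot \frac{\text{Width}}{2} \cdot \text{Height}$
Oblique Cone Method:
$\text{Volume of Oblique Cone} = \frac{1}{3} \cdot \pi \cdot \frac{\text{Length}}{2} \cdot \frac{\text{Width}}{2} \cdot \text{Height}$
[Figures: Volume using Ellipsoid Method; Volume using Oblique Cone Method]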
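As a rough illustration of the two volume calculations, the sketch below codes half of an ellipsoid with semi-axes Length/2, Width/2, and Height, that is (2/3)·π·(Length/2)·(Width/2)·Height, and a cone on the same elliptical base, (1/3)·π·(Length/2)·(Width/2)·Height. The function names and the mound dimensions are made up for illustration, and it is an assumption here that Length and Width correspond to the Long Side and Short Side columns.

```python
import math


def half_ellipsoid_volume(length, width, height):
    """Ellipsoid (manual) method: half of an ellipsoid with
    semi-axes length/2, width/2 and height."""
    return (2.0 / 3.0) * math.pi * (length / 2.0) * (width / 2.0) * height


def oblique_cone_volume(length, width, height):
    """Oblique cone method: cone on an elliptical base with
    semi-axes length/2 and width/2."""
    return (1.0 / 3.0) * math.pi * (length / 2.0) * (width / 2.0) * height


# Hypothetical mound dimensions in dm (not taken from the FireAntMound data).
length, width, height = 4.0, 3.0, 2.0
v_ellipsoid = half_ellipsoid_volume(length, width, height)
v_cone = oblique_cone_volume(length, width, height)
print(round(v_ellipsoid, 3), round(v_cone, 3))
```

For any mound, the half-ellipsoid estimate is exactly twice the oblique cone estimate, since the two formulas differ only in the leading fraction.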
Consider the following excerpt from Vogt’s article. Several sentences from the top of page 556
have been identified here.
a. References to “raster method” refer to the Volume variable in our dataset, and “manual
method” is the calculation of volume using half of an ellipsoid. Verify Statement #1, i.e. that
the variation in Volume is indeed less than the variation in estimated volume using the
ellipsoid method. (2 pts)
• Turn In: Output that supports this statement with a brief discussion.
b. Add a kernel smoother to a plot of the residuals from the ellipsoid method to verify the first
part of Statement #2, i.e. “Overestimation of mound volume by the manual method”.
Discuss how this portion of Statement #2 is supported by your plot. (3 pts)
• Turn In: Plot with kernel smoother and your discussion.
c. Consider next the second portion of Statement #2, i.e. “(Overestimation) increased with
increasing mound size.” How is this portion of the statement supported by the plot you
obtained above? (2 pts)
d. Obtain a plot of the residuals from the oblique cone method to verify Statement #3, i.e. “…
more closely resembling an oblique cone…”. How does this plot (along with the residual
plot obtained above for the ellipsoid method) support this statement? (3 pts)
• Turn In: Plot of residuals and your discussion.
e. Compute the appropriate $R^2$ quantities to verify the second part of Statement #3, i.e. “…
more closely resembling an oblique cone …”. Discuss how these values support Statement
#3. (5 pts)
• Turn In: All output used to compute the total unexplained variability, the
calculations used to obtain the $R^2$ values, and your discussion.
f. Fit a model with Y = Estimated Volume using Oblique Cone and X = Volume to verify the
quantities reported in Statement #6. (2 pts)
• Turn In: Regression output that supports these quantities.
g. Create a new variable, say ObliquePrediction_Adjusted, by adding the correction factor
specified in Statement #7 to the predicted volumes from the oblique cone method.
Compute an $R^2$ quantity to verify that this correction factor improved the predictions. (3 pts)
• Turn In: Screen capture or printout of this new set of predicted values.
• Turn In: All output used to compute the $R^2$ value.
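The $R^2$ quantities in parts e and g can be built from residuals rather than read off regression output. The sketch below assumes one common convention, $R^2 = 1 - SS_{unexplained} / SS_{total}$, where the residual is the raster Volume minus the estimated volume; the course handouts may define the quantity differently. The function name and all numbers are invented for illustration and are not from the FireAntMound data.

```python
def r_squared(observed, predicted):
    """1 - (unexplained sum of squares) / (total sum of squares)."""
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    ss_unexplained = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_unexplained / ss_total


# Hypothetical volumes in dm^3, chosen only to make the arithmetic easy.
raster_volume = [10.0, 20.0, 30.0, 40.0]
ellipsoid_estimate = [14.0, 27.0, 41.0, 56.0]
cone_estimate = [9.0, 21.0, 29.0, 42.0]

r2_ellipsoid = r_squared(raster_volume, ellipsoid_estimate)
r2_cone = r_squared(raster_volume, cone_estimate)
print(round(r2_ellipsoid, 3), round(r2_cone, 3))
```

Because the predictions here come from a fixed formula rather than a least-squares fit, this quantity is not guaranteed to be non-negative; a badly biased method can produce a value below zero.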
2. The Donkey (or Ass) Problem.
The problem is centered on the estimation of the weight of donkeys in rural Kenya.
Veterinarians need a quick and accurate method of weighing donkeys so that the right
dosage of drugs can be administered.
Published in Significance Magazine, Oct 2014.
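Consider the following possible mean functions for this investigation.
Predictor    Structure for Mean and Variance Functions
Length       • $E(\text{Weight} \mid \text{Length}) = \beta_0 + \beta_1 \cdot \text{Length}$
             • $\mathrm{Var}(\text{Weight} \mid \text{Length}) = \sigma^2$
Girth        • $E(\text{Weight} \mid \text{Girth}) = \beta_0 + \beta_1 \cdot \text{Girth}$
             • $\mathrm{Var}(\text{Weight} \mid \text{Girth}) = \sigma^2$
Height       • $E(\text{Weight} \mid \text{Height}) = \beta_0 + \beta_1 \cdot \text{Height}$
             • $\mathrm{Var}(\text{Weight} \mid \text{Height}) = \sigma^2$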
Note: I have set JMP to exclude Observation #9. The exclusion can be removed by right-clicking and
selecting Unexclude.
a. Which single predictor (Length, Girth, or Height) is best? (2 pts)
• Turn In: Rationale for how you made this determination.
b. Write several sentences (in paragraph form with proper grammar, etc.) that
communicate a summary of the important elements of this regression output. You
should write these statements as if the intended audience is the Veterinary Association
of Kenya, see http://www.kva.co.ke/. (6 pts)
• Turn In: Written sentences in nice paragraph form.
c. Obtain the appropriate residual plot from your regression analysis. Again, using a few
sentences, discuss whether or not the theoretical regression assumptions are being met
for your analysis. (4 pts)
• Turn In: Plot of your residuals and your discussion.
d. Use the Fit Y by X platform in JMP, and the associated options from its drop-down menu,
to obtain a plot similar to the one shown below. Also, write a few sentences regarding this
plot so that a veterinarian can understand how to read it. (4 pts)
Note: I was able to remove the dots by highlighting all dots, right-clicking, and setting Transparency to 0.
• Turn In: Plot and your discussion regarding how to use this plot.
e. Suppose that the veterinarian does not really like your plot because such a plot does not
follow acceptable dosage protocol. The protocols allow for a 10% error in weight when
determining the appropriate amount of dosage. The vet has drawn a plot she’d rather
have. Create this plot in whatever software package you’d like. (4 pts)
• Turn In: The plot with the ±10% error bands.
f. [Time to earn your $60,000 future starting salary.] Use whatever procedure you’d like to
identify all donkeys that are outside the ±10% error bands you provided in part e. The
dosage amount for these donkeys would have either been too high or too low. Other
than their weight, is there anything particularly unusual about these donkeys? That is,
do these donkeys tend to be old? Do they tend to be of the same sex? Is the
inappropriate amount of dosage a function of Body Condition Score? Is there
something we can attribute these inappropriate dosages to? If so, as a collaborator in
this research we have an ethical obligation to inform the veterinarians. (5 pts)
• Turn In: A list of row identifiers for the donkeys which would have suffered from an
inappropriate dose of medication.
• Turn In: A few sentences about the potential reasons for any inappropriate dosages.
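One possible procedure for parts e and f is sketched below: fit a least-squares line, then flag every row whose actual weight differs from its predicted weight by more than 10% of the prediction. Reading the protocol as "±10% of the predicted weight" is an assumption, as are the function names, the generic predictor, and the data, which are invented and not from the donkey dataset.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def outside_band(x, y, intercept, slope, tolerance=0.10):
    """1-based row numbers whose actual y is outside predicted * (1 +/- tolerance)."""
    flagged = []
    for row, (xi, yi) in enumerate(zip(x, y), start=1):
        predicted = intercept + slope * xi
        if abs(yi - predicted) > tolerance * abs(predicted):
            flagged.append(row)
    return flagged


# Hypothetical body measurement (cm) and weight (kg) for five animals.
predictor = [100.0, 110.0, 120.0, 130.0, 140.0]
weight = [100.0, 120.0, 170.0, 160.0, 180.0]

intercept, slope = fit_line(predictor, weight)
flagged = outside_band(predictor, weight, intercept, slope)
print(intercept, slope, flagged)
```

The lower and upper bands for a plot are simply 0.9 and 1.1 times the fitted line, so they widen as the predicted weight grows.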
g. For this last part, Unexclude Observation #9 (which I believe to be a newborn donkey).
Refit the model you have been working with. Use JMP to obtain the leverage values and create
a leverage value plot similar to what we did on page 8/9 of Handout #9. Why does this
plot strongly support leaving Observation #9 out of our analysis? Discuss. (5 pts)
• Turn In: The leverage plot and your discussion regarding leaving Observation #9
out of our analysis.
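For a simple linear regression, the leverage of observation i depends only on the predictor values: $h_i = 1/n + (x_i - \bar{x})^2 / S_{xx}$. The sketch below computes these values outside JMP as a cross-check. The function name and the data are hypothetical; the last value is deliberately far from the rest to mimic an unusually small animal.

```python
def leverages(x):
    """Leverage values h_i = 1/n + (x_i - x_bar)^2 / Sxx for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]


# Hypothetical predictor values; the last one sits far from the others.
x = [100.0, 105.0, 110.0, 115.0, 40.0]
h = leverages(x)
most_influential_row = h.index(max(h)) + 1  # 1-based row number
print([round(v, 3) for v in h], most_influential_row)
```

The leverages always sum to the number of estimated coefficients (two here), so one value close to 1 means a single observation is dominating the fitted line.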
3. Consider the following data that I gathered on my trip to Hawaii last month. (Yes, I
really did this.) The airplane on the way to Los Angeles was modern and had flight
information data that was updated about every minute or so throughout the entire flight.
This problem will utilize only the data from the last 30 minutes or so.
[Figure: An airplane (not the one I rode in)]
[Figure: Screen (which contains flight data) on seat back]
[Figure: Data as entered into JMP]
The vertical temperature profile in our atmosphere varies depending on the geographic
characteristics of a particular area. Suppose in Los Angeles, the vertical temperature profile is of
the following form.
The temperature near sea level stays a bit cooler due to cooler ocean air coming in, whereas the
temperature a couple thousand feet above sea level tends to be a bit higher due to warm air
moving over the mountains. After this point, the temperatures continue to drop at a fairly
constant rate as altitude rises.
a. Create a plot of Temp (F) vs. Altitude (x-axis) for the flight data I collected in the last 30
minutes or so of my flight. Does the data I collected agree with the temperature profile
schematic provided above? Discuss in detail. (4 pts)
• Turn In: The plot and relevant discussions.
b. Fit the following mean and variance function to this data.
• $E(\text{Temperature} \mid \text{Altitude}) = \beta_0 + \beta_1 \cdot \text{Altitude}$
• $\mathrm{Var}(\text{Temperature} \mid \text{Altitude}) = \sigma^2$
In what ways does this model fail in capturing the Temperature as a function of
Altitude? Discuss. (4 pts)
• Turn In: The plot with estimated model and relevant discussions.
c. Obtain the residual plot for the model specified in part b. In what ways does this plot
support your answer in Part b? (3 pts)
• Turn In: The residual plot with relevant discussions.
d. Is a kernel smoother able to appropriately model this particular temperature profile?
You should obtain a residual plot for your kernel smoother to ensure an adequate fit.
Discuss the adequacy of the predictions from the kernel smoother. (4 pts)
• Turn In: The plot with kernel smoother predictions, a plot of residuals from this fit,
and all relevant discussions.
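To make the idea of a kernel smoother concrete, the sketch below implements a Nadaraya-Watson estimator with a Gaussian kernel: each prediction is a weighted average of the observed temperatures, with weights that decay with distance in altitude. This is a swapped-in textbook technique and is not necessarily the algorithm behind JMP's kernel smoother option. The function name, bandwidth, and data are all assumptions made for illustration.

```python
import math


def kernel_smooth(x, y, x0, bandwidth):
    """Nadaraya-Watson estimate at x0 using a Gaussian kernel."""
    weights = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)


# Hypothetical altitude (ft) and temperature (F) readings.
altitude = [0.0, 1000.0, 2000.0, 3000.0, 4000.0]
temp = [60.0, 64.0, 66.0, 62.0, 58.0]

fitted = [kernel_smooth(altitude, temp, a, bandwidth=1000.0) for a in altitude]
residuals = [t - f for t, f in zip(temp, fitted)]
print([round(f, 2) for f in fitted])
print([round(r, 2) for r in residuals])
```

Predicting at a new altitude only requires calling the function with a different x0, but the original data must be kept around, which is one reason such predictions are awkward to share compared with a fitted equation.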
One major problem in using a kernel smoother is that making predictions for new observations
is not easily done (JMP saves the prediction formulas, but this is not typical in other software).
Suppose your boss wants you to fit the following models.
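Model #1 for Altitude ≤ 1750:
• $E(\text{Temperature} \mid \text{Altitude} \le 1750) = \beta_0 + \beta_1 \cdot \text{Altitude}$
• $\mathrm{Var}(\text{Temperature} \mid \text{Altitude} \le 1750) = \sigma_1^2$
Model #2 for Altitude > 1750:
• $E(\text{Temperature} \mid \text{Altitude} > 1750) = \beta_3 + \beta_4 \cdot \text{Altitude}$
• $\mathrm{Var}(\text{Temperature} \mid \text{Altitude} > 1750) = \sigma_2^2$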
e. Fit these models separately. Figure out how to plot a single mean function for this
situation. You can use whatever software package you’d like to do this. (Note: The
predicted values near 1750 may not match; this is OK, as getting them to match exactly is
beyond the scope of this course.) (5 pts)
• Turn In: A plot that clearly indicates the predicted Temperature for Altitude
values from 0 up to 35000 or so.
f. Use the single set of predicted values you obtained above (from joining Model #1 and #2)
to compute the residuals. Find the $R^2$ for this model that was constructed in two parts.
What is this value? (5 pts)
• Turn In: All output used to compute the $R^2$ value.
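One way to join the two fits is sketched below: fit a separate least-squares line on each side of the 1750 cutoff, pick the line by altitude when predicting, and compute a single $R^2$ as 1 minus the pooled residual sum of squares over the total sum of squares. That $R^2$ convention, the helper names, and the data are assumptions for illustration. No attempt is made to force the two lines to meet at 1750.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def piecewise_predict(alt, low_fit, high_fit, cutoff=1750.0):
    """Use the low-altitude line at or below the cutoff, else the high-altitude line."""
    intercept, slope = low_fit if alt <= cutoff else high_fit
    return intercept + slope * alt


# Hypothetical altitude (ft) and temperature (F) readings.
altitude = [0.0, 500.0, 1000.0, 1500.0, 2000.0, 10000.0, 20000.0, 30000.0]
temp = [60.0, 62.0, 64.0, 66.0, 65.0, 35.0, 2.0, -34.0]

low = [(a, t) for a, t in zip(altitude, temp) if a <= 1750.0]
high = [(a, t) for a, t in zip(altitude, temp) if a > 1750.0]
low_fit = fit_line([a for a, _ in low], [t for _, t in low])
high_fit = fit_line([a for a, _ in high], [t for _, t in high])

predicted = [piecewise_predict(a, low_fit, high_fit) for a in altitude]
mean_temp = sum(temp) / len(temp)
ss_total = sum((t - mean_temp) ** 2 for t in temp)
ss_resid = sum((t - p) ** 2 for t, p in zip(temp, predicted))
r2 = 1.0 - ss_resid / ss_total
print(low_fit, high_fit, round(r2, 4))
```

Evaluating piecewise_predict over a grid of altitudes from 0 to 35000 gives the points needed to draw the combined mean function as one curve.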