STAT 360: HW #2
Fall 2014
Points: 35 pts

Name(s): ________________________ _______________________ _______________________

Data Description

These data are haystack measurements taken in Nebraska in 1927 and 1928, when farmers sold hay unbaled and in the stack, which required estimating the volume of a stack. Two measurements that could be made easily with a rope were usually employed: the Circumference, the distance around the base of the stack, and the Over, the distance from the ground on one side of the stack, over the top, to the ground on the other side. Stacks vary in height and shape, so a simple computation like the volume of a hemisphere, while perhaps a useful first approximation, may or may not be sufficiently accurate.

Source: Ezekiel, M. (1941). Methods of Correlation Analysis, 2nd ed. New York: Wiley, pp. 378-380.

Data

[Figure: Typical Haystack (I think…)]

Answer the following using whatever software you’d like.

Marginal Distribution

1. Obtain the following quantities for Volume. (3 pts)

   Mean =
   Variance =
   Total Unexplained Variation =

Distribution of Volume | Circumference

2. Create a plot of Volume vs. Circumference. Volume is the response variable for this investigation, so place this variable on the y-axis in your plot. Give a brief statement (one or two sentences) about the general pattern you see in this plot. (3 pts)

3. Show the math as to why it is reasonable to estimate the volume of a haystack using the following quantity. (3 pts)

   \[ \text{Volume} \approx \frac{\text{Circumference}^3}{12\pi^2} \]

   Hint: You may want to start with the volume of a sphere (Google it!).

4. Use the function above as an estimate of the mean function for Volume | Circumference. Plot this estimated mean function on the graph created for Problem #2. Note: You may have to obtain an estimate of the mean function for each data point in the dataset in order to construct this plot. (3 pts)

5. Obtain the Residual² value for each point in your dataset.
Sum up these values to obtain the total unexplained variation for the mean function given above. (2 pts)

   Total Unexplained Variation =

6. What proportion of the total unexplained variation in Volume is being explained by using this mean function? Show the math here. (2 pts)

Distribution of Volume | Over

7. What would be a reasonable function for estimating volume if the Over measurement was used instead of the Circumference measurement? Explain how you obtained this function. (4 pts)

8. Use your function in Problem #7 to obtain the Residual² value for each point in your dataset. Sum up these values to obtain the total unexplained variation for the mean function using Over. (3 pts)

   Total Unexplained Variation =

9. Once again, what proportion of the total unexplained variation in Volume is being explained by using a mean function based on Over? (2 pts)

Comparing the Estimated Mean Functions

10. Which is the better measurement to use for estimating the average volume of a haystack: the Circumference or the Over measurement? Explain. (4 pts)

Investigation of the Variance Function

11. Pick one set of the residuals obtained above. Obtain the |Residual| value for each data point and plot it against Circumference or Over, depending on which mean function was used. (3 pts)

12. Discuss the general pattern seen in the above plot. Does the variance function appear to increase, decrease, or not change as the haystacks get larger? Discuss the possible consequences of a changing variance function for someone who purchases hay in this manner. (3 pts)
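
Computing note: since any software may be used, here is one possible sketch in Python of the bookkeeping behind the residual-based problems. It shows how a Residual² sum, the proportion of variation explained, and the |Residual| values could be computed for the Circumference-based mean function. The function names are hypothetical, and the three data points are invented placeholders, not values from the Ezekiel haystack data. The proportion is computed here as one minus the ratio of the two Residual² sums, which is one common reading of the question and an assumption of this sketch.

```python
import math


def total_unexplained_variation(observed, fitted):
    """Sum of squared residuals (observed minus fitted) over all data points."""
    return sum((y - f) ** 2 for y, f in zip(observed, fitted))


def hemisphere_volume(circumference):
    """Mean function from the handout: Circumference^3 / (12 * pi^2)."""
    return circumference ** 3 / (12 * math.pi ** 2)


def proportion_explained(observed, fitted):
    """Share of the marginal variation (about the mean) removed by the fitted values."""
    mean_y = sum(observed) / len(observed)
    marginal = total_unexplained_variation(observed, [mean_y] * len(observed))
    conditional = total_unexplained_variation(observed, fitted)
    return 1 - conditional / marginal


# Placeholder values for illustration only (NOT the haystack dataset).
circumference = [60.0, 70.0, 80.0]
volume = [1800.0, 2900.0, 4300.0]

# Estimated mean of Volume | Circumference at each data point.
fitted = [hemisphere_volume(c) for c in circumference]

# |Residual| values, as used when examining the variance function.
abs_residuals = [abs(y - f) for y, f in zip(volume, fitted)]

print("Total unexplained variation:", total_unexplained_variation(volume, fitted))
print("Proportion explained:", proportion_explained(volume, fitted))
print("|Residual| values:", abs_residuals)
```

The same three helper functions could be reused with a different mean function, such as one based on Over, by swapping out `hemisphere_volume`.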