STAT 530 Final Exam, Take-home Part – May 5, 2016
Total Possible Points: 50

General Instructions:
1. You may do these by hand or by computer.
2. If a computer is used, attach relevant output only. No writing on output unless it is a plot.
3. Highlight or underline your answers.
4. Whether or not you are asked, plot your data before proceeding. Some of the data sets may be problematic, and only plotting them will allow you to proceed in the right direction.
5. You may use other books and resources, but no help from any “animate” being, especially no help from classmates.
6. Try to be brief and neat, please!

Problem 1: Physical characteristics of sharks are of interest to surfers and scuba divers as well as to marine researchers. Because it is difficult to measure jaw width in living sharks, researchers would like to determine whether it is possible to estimate jaw width from body length, which is more easily measured. The following data on x = length (in feet) and y = jaw width (in inches) for 44 sharks were found in various articles appearing in the magazines Skin Diver and Scuba News:

x   18.7  12.3  18.6  16.4  15.7  18.3  14.3  16.6   9.4  18.2  13.2
y   17.5  12.3  21.8  17.2  16.2  19.9  13.3  15.8  10.2  19.0  16.8

x   14.6  15.8  14.9  17.6  12.1  16.4  13.6  15.3  16.1  13.5  19.1
y   13.9  14.7  15.1  18.5  12.0  13.8  14.2  16.9  16.0  15.9  17.9

x   16.7  17.8  16.2  12.6  17.8  13.8  16.2  22.8  16.8  13.6  13.2
y   15.2  18.2  16.7  11.6  17.4  14.2  15.7  21.2  16.3  13.0  13.3

x   12.2  15.2  14.7  12.4  13.2  15.8  15.7  19.7  18.7  13.2  16.8
y   14.8  15.9  15.3  11.9  11.6  14.3  14.3  21.3  20.8  12.2  16.9

a. Determine the least-squares equation that can be used for predicting jaw width.
b. Use your model to predict jaw width when length is 21 feet. Provide the 95% confidence interval and interpret it.
c. Are any inherent assumptions being violated? Discuss briefly. Give one test and one plot for each assumption.
d. LOWESS the data using a smoothing parameter of 0.3. Based on your LOWESS predicted values and your LS predicted values, do you think the linear model is appropriate here?
e. There is a suggestion that a non-linear model might fit the data better. The suggested model is y = g1 * (1 - exp(-g0 * x)). Use your own starting values to fit this model. Find your estimated values of the parameters and predict y when x = 21. Which do you think is a better model?
f. Find the 95% confidence intervals for the parameters of the non-linear fit using the bootstrap, and compare them with the observed 95% confidence intervals. Comment.
g. Another way of looking at this is to think of a shark with a jaw width greater than 14 inches as a big-mouthed shark. Is length a good predictor of a shark being big-mouthed? Do the relevant analysis for this.
h. Critique your analysis. What would you/could you do differently if you had more time?

Problem 2: Use the data set that is given on the WEB under Final_exam_prob2.xls. This is a data set for predicting the weight of an individual based on certain physical characteristics.

Age: age of the subject
Gender: male 1, female 0
Weight kg: weight in kilograms
pelvic brdth: the breadth measurement around the pelvic region
waistgirth: circumference of the waist region
thighgrth: circumference of the thighs
bicepgrth: circumference of the biceps
calfgrth: circumference of the calf
height: height in centimeters
head: head circumference

Use the first 350 observations to fit the data. Use the remaining observations as a hold-out sample to check the feasibility of the model.

a. First do a detailed exploratory analysis (graphs, plots, box-plots, etc.) to check for distributional assumptions and the presence of aberrations and outliers in the data.
b. Select the BEST model for this data. In doing so, take into account Cp, adjusted R^2, and partial slopes.
For your BEST model, find:
c. The least-squares estimates for the partial slopes.
d. Test to see if these partial slopes significantly differ from 0. Perform each test at alpha = .05.
If we want to make a joint statement about the partial slopes, is our alpha level = .05?
e. How good is your model? Mention the assumptions and possible violations of such. Give one test and one plot for each of the relevant assumptions about the errors. Assume that the data was collected in the order shown.
f. Is multicollinearity a problem? Why or why not? If so, provide the ridge trace and give your choice for the biasing constant. Provide the ridge estimates.
g. Are there any outliers in the data set? What would you do if you detect outliers? Are they influential observations or merely aberrations and mistakes in the data entry process? If outliers are an issue, use robust regression with Huber weights and provide the robust estimates.
h. How do the predicted values from your hold-out sample match up with your observed values? Discuss the results from the hold-out sample.
i. Critique your model building process (a-h). Can you suggest viable alternatives to this procedure?
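
One possible way to organize the hold-out check described in Problem 2 (fit on the first 350 observations, then compare predictions with the remaining ones) is sketched below in Python. This is only an illustration under stated assumptions: the function names (solve_linear_system, fit_ols, predict, holdout_rmse), the use of RMSE as the comparison measure, and the toy rows are all hypothetical and are not part of the exam. The real predictors would come from Final_exam_prob2.xls.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (perfectly collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(predictors, response):
    """Return [intercept, slope_1, ..., slope_p] from the normal equations X'X b = X'y."""
    design = [[1.0] + list(row) for row in predictors]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, response)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coefs, predictors):
    """Apply fitted coefficients (intercept first) to each row of predictors."""
    return [coefs[0] + sum(b * v for b, v in zip(coefs[1:], row)) for row in predictors]


def holdout_rmse(predictors, response, n_train=350):
    """Fit on the first n_train rows; return (coefs, RMSE on the remaining rows)."""
    coefs = fit_ols(predictors[:n_train], response[:n_train])
    test_x, test_y = predictors[n_train:], response[n_train:]
    if not test_y:
        raise ValueError("no hold-out rows left after the training split")
    preds = predict(coefs, test_x)
    mse = sum((y - f) ** 2 for y, f in zip(test_y, preds)) / len(test_y)
    return coefs, math.sqrt(mse)


# Made-up toy rows (NOT the exam data): the response is an exact linear
# function of two predictors, so the fit should recover it almost exactly.
toy_x = [[i % 7, (3 * i) % 11] for i in range(400)]
toy_y = [2.0 + 3.0 * a - 0.5 * b for a, b in toy_x]
coefs, rmse = holdout_rmse(toy_x, toy_y, n_train=350)
print(coefs, rmse)  # coefficients close to 2, 3, -0.5 and RMSE near 0
```

With real data the hold-out RMSE will not be near zero. Comparing it with the residual standard error from the 350 training rows is one way to judge how well the chosen model carries over to new observations.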