STAT 530
Final Exam
Take-home Part – May 5 2016
Total Possible Points 50
General Instructions:
1. You may do these by hand or by computer
2. If a computer is used, attach relevant output only. No writing on output unless it is a plot.
3. Highlight or underline your answers
4. WHETHER OR NOT you are asked, plot your data before proceeding. Some of the data sets may be
problematic, and only plotting them will allow you to proceed in the right direction.
5. You may use other books and resources, but no help from any “animate” being, especially no help from
classmates.
6. Try to be brief and neat, please!
Problem 1:
Physical characteristics of sharks are of interest to surfers and scuba divers as well as to marine researchers.
Because it is difficult to measure jaw width in living sharks, researchers would like to determine whether it is
possible to estimate jaw width from body length, which is more easily measured. The following data on x =
length (in feet) and y = jaw width (in inches) for 44 sharks were found in various articles appearing in the
magazines Skin Diver and Scuba News:
x      y       x      y       x      y       x      y
18.7   17.5    14.6   13.9    16.7   15.2    12.2   14.8
12.3   12.3    15.8   14.7    17.8   18.2    15.2   15.9
18.6   21.8    14.9   15.1    16.2   16.7    14.7   15.3
16.4   17.2    17.6   18.5    12.6   11.6    12.4   11.9
15.7   16.2    12.1   12.0    17.8   17.4    13.2   11.6
18.3   19.9    16.4   13.8    13.8   14.2    15.8   14.3
14.3   13.3    13.6   14.2    16.2   15.7    15.7   14.3
16.6   15.8    15.3   16.9    22.8   21.2    19.7   21.3
 9.4   10.2    16.1   16.0    16.8   16.3    18.7   20.8
18.2   19.0    13.5   15.9    13.6   13.0    13.2   12.2
13.2   16.8    19.1   17.9    13.2   13.3    16.8   16.9
a. Determine the least squares equation that can be used for predicting jaw width.
b. Use your model to predict jaw width when length is 21 feet. Provide the 95% confidence interval and
interpret it.
c. Are any inherent assumptions being violated? Discuss briefly. Give one test and one plot for each
assumption.
d. LOWESS the data using a smoothing parameter of 0.3. Based on your LOWESS predicted values and your
LS predicted values, do you think the linear model is appropriate here?
e. There is a suggestion that a non-linear model might fit the data better. The suggested model is
y = g1 * (1 - exp(-g0 * x)). Use your own starting values to fit this model. Find the estimated values of
the parameters and predict y when x = 21. Which do you think is the better model?
f. Find the 95% confidence interval for the parameters of the non-linear fit using the bootstrap, and compare
it with the observed 95% confidence interval. Comment.
g. Another way of looking at this is to think of a shark with a jaw wider than 14 inches as a big-mouthed
shark. Is length a good predictor of a shark being big-mouthed? Do the relevant analysis for this.
h. Critique your analysis. What would you or could you do differently if you had more time?
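For those working by computer, the least squares fit and the prediction at a length of 21 feet could be set up along the following lines. This is only a sketch, assuming Python with NumPy and SciPy (neither is prescribed by the exam) and assuming the x and y columns of the table pair up row by row in the order listed. Because the question does not say which 95% interval is intended, the sketch computes both the interval for the mean jaw width and the prediction interval for a single shark; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Shark data transcribed from the table: x = length (feet), y = jaw width (inches).
# Assumption: each x is paired with the y in the same row of the adjacent column.
x = np.array([
    18.7, 12.3, 18.6, 16.4, 15.7, 18.3, 14.3, 16.6, 9.4, 18.2, 13.2,
    14.6, 15.8, 14.9, 17.6, 12.1, 16.4, 13.6, 15.3, 16.1, 13.5, 19.1,
    16.7, 17.8, 16.2, 12.6, 17.8, 13.8, 16.2, 22.8, 16.8, 13.6, 13.2,
    12.2, 15.2, 14.7, 12.4, 13.2, 15.8, 15.7, 19.7, 18.7, 13.2, 16.8,
])
y = np.array([
    17.5, 12.3, 21.8, 17.2, 16.2, 19.9, 13.3, 15.8, 10.2, 19.0, 16.8,
    13.9, 14.7, 15.1, 18.5, 12.0, 13.8, 14.2, 16.9, 16.0, 15.9, 17.9,
    15.2, 18.2, 16.7, 11.6, 17.4, 14.2, 15.7, 21.2, 16.3, 13.0, 13.3,
    14.8, 15.9, 15.3, 11.9, 11.6, 14.3, 14.3, 21.3, 20.8, 12.2, 16.9,
])

n = x.size
xbar, ybar = x.mean(), y.mean()
sxx = np.sum((x - xbar) ** 2)

# Least squares estimates for the line y = b0 + b1 * x.
b1 = np.sum((x - xbar) * (y - ybar)) / sxx
b0 = ybar - b1 * xbar

# Residual standard error on n - 2 degrees of freedom.
resid = y - (b0 + b1 * x)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))
t_crit = stats.t.ppf(0.975, df=n - 2)

# Point prediction and the two 95% intervals at x0 = 21 feet.
x0 = 21.0
y0_hat = b0 + b1 * x0
se_mean = s * np.sqrt(1 / n + (x0 - xbar) ** 2 / sxx)      # mean response
se_pred = s * np.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)  # single new shark
ci_mean = (y0_hat - t_crit * se_mean, y0_hat + t_crit * se_mean)
pi_new = (y0_hat - t_crit * se_pred, y0_hat + t_crit * se_pred)

print(f"fitted line: y = {b0:.3f} + {b1:.3f} x")
print(f"prediction at x = {x0}: {y0_hat:.2f}")
print(f"95% CI for the mean response: ({ci_mean[0]:.2f}, {ci_mean[1]:.2f})")
print(f"95% prediction interval:      ({pi_new[0]:.2f}, {pi_new[1]:.2f})")
```

Note that 21 feet lies near the upper edge of the observed lengths, so both intervals widen there through the (x0 - xbar)^2 term.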
Problem 2:
Use the data set that is given on the web as Final_exam_prob2.xls. This is a data set for predicting the weight
of an individual based on certain physical characteristics.
Age: Age of the subject
Gender: Male 1, female 0
Weight kg: Weight in kilograms
pelvic brdth: the breadth measurement around the pelvic region
waistgirth: the circumference of the waist region
thighgrth: circumference of the thighs
bicepgrth: circumference of the biceps
calfgrth: circumference of the calf
height: height in cm
head: head circumference
Use the first 350 observations to fit the model. Use the remaining observations as a hold-out sample to check the
feasibility of the model.
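The fit/hold-out split described above might be organized as in the sketch below. It assumes Python with NumPy and assumes the predictors and the weight column have already been loaded into arrays in the order given in the file. The helper names (`split_holdout`, `fit_ols`, `predict`, `holdout_summary`) are hypothetical, and the demonstration uses synthetic stand-in data, not the exam data set.

```python
import numpy as np

N_TRAIN = 350  # number of leading observations used for fitting (from the problem)


def split_holdout(X, y, n_train=N_TRAIN):
    """Split by row order: first n_train rows for fitting, the rest held out."""
    return (X[:n_train], y[:n_train]), (X[n_train:], y[n_train:])


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict(beta, X):
    """Predicted response for each row of X."""
    return beta[0] + X @ beta[1:]


def holdout_summary(y_obs, y_pred):
    """Simple agreement measures between observed and predicted values."""
    err = y_obs - y_pred
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mean_error": float(np.mean(err)),
        "corr": float(np.corrcoef(y_obs, y_pred)[0, 1]),
    }


# Synthetic stand-in data (NOT the exam data): 400 rows, 3 predictors.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(400, 3))
y_demo = 70 + X_demo @ np.array([5.0, 3.0, -2.0]) + rng.normal(size=400)

(train_X, train_y), (hold_X, hold_y) = split_holdout(X_demo, y_demo)
beta_demo = fit_ols(train_X, train_y)
summary_demo = holdout_summary(hold_y, predict(beta_demo, hold_X))
print(summary_demo)
```

The split is by position rather than at random because the problem specifies the first 350 observations. A hold-out RMSE that is much larger than the residual standard error from the fitting sample would suggest overfitting.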
a. First do a detailed exploratory analysis (graphs, plots, box plots, etc.) to check distributional assumptions
and the presence of aberrations and outliers in the data.
b. Select the BEST model for this data. In doing so, take into account Cp, adjusted R², and the partial
slopes.
For your BEST model, find:
c. The least squares estimates for the partial slopes.
d. Test to see if these partial slopes differ significantly from 0. Perform each test at alpha = .05. If we want to
make a joint statement about the partial slopes, is our alpha level still .05?
e. How good is your model? Mention the assumptions and possible violations of them. Give one test and one
plot for each of the relevant assumptions about the errors. Assume that the data were collected in the order
shown.
f. Is multicollinearity a problem? Why or why not? If so, provide the ridge trace and give your choice for the
biasing constant. Provide the ridge estimates.
g. Are there any outliers in the data set? What would you do if you detect outliers? Are they influential
observations or merely aberrations and mistakes in the data entry process? If outliers are an issue, use
robust regression with Huber weights and provide the robust estimates.
h. How do the predicted values from your hold-out sample match up with your observed values? Discuss the
results from the hold-out sample.
i. Critique your model building process (a-h). Can you suggest viable alternatives to this procedure?
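For the multicollinearity part, variance inflation factors and a ridge trace could be computed roughly as follows. This is a sketch under stated assumptions: Python with NumPy, predictors in the columns of an array, and the correlation-transformation form of ridge regression (standardized coefficients, with the biasing constant k added to the diagonal of the predictor correlation matrix). The function names `vif` and `ridge_trace` are hypothetical, and the demonstration uses synthetic data with two deliberately collinear columns, not the exam data set.

```python
import numpy as np


def vif(X):
    """Variance inflation factors: diagonal of the inverse correlation matrix."""
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))


def ridge_trace(X, y, ks):
    """Standardized ridge coefficients (R + k I)^(-1) r for each k in ks.

    R is the correlation matrix of the predictors and r holds the
    correlation of each predictor with y. Row i of the result corresponds
    to ks[i]; k = 0 reproduces the standardized least squares solution.
    """
    p = X.shape[1]
    R = np.corrcoef(X, rowvar=False)
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
    return np.array([np.linalg.solve(R + k * np.eye(p), r) for k in ks])


# Synthetic demonstration (NOT the exam data): x2 is nearly a copy of x1.
rng = np.random.default_rng(0)
n_demo = 200
x1 = rng.normal(size=n_demo)
x2 = x1 + 0.1 * rng.normal(size=n_demo)
x3 = rng.normal(size=n_demo)
X_demo = np.column_stack([x1, x2, x3])
y_demo = 2.0 * x1 + x3 + rng.normal(size=n_demo)

ks = np.linspace(0.0, 0.5, 11)
vif_demo = vif(X_demo)
trace_demo = ridge_trace(X_demo, y_demo, ks)

print("VIF:", np.round(vif_demo, 2))
for k, coefs in zip(ks, trace_demo):
    print(f"k = {k:.2f}  standardized coefficients = {np.round(coefs, 3)}")
```

Plotting each column of the returned array against k gives the ridge trace. A common rule of thumb is to pick the smallest k at which the coefficients stabilize, and to treat VIF values well above 10 as a sign of serious multicollinearity.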