
Spring 2016 STAT 530 (due April 1)
Mid Term Exam
Take-home Part
Total Possible Points 50
General Instructions:
1. You may do these by hand or by computer
2. If computer is used, attach relevant output only. DO NOT write on output.
3. Highlight or underline your answers
4. You may use other books and resources, but no help from any “animate” being.
5. Try to be brief and neat, please! (Your grade could be inversely proportional to the weight of your exam!!!!)
1) The following is a real-life data set giving income level and the amount of fish consumed per capita for a household. The data were collected from 25 families in rural east India, where fish is a major (sometimes the only) protein source. Income is yearly income in rupees (the exchange rate is about 50 rupees to a dollar), and fish consumption is in grams per head. We are interested in predicting fish consumption from income level. Define your corresponding y and x and answer the following questions. (25 points)
Income (Rs./year)    Fish consumption (g/head)
3000                 2532.0
6000                 6579.0
11250                1724.6
22500                603.4
40000                16756.0
3040                 2802.9
6120                 5372.6
11470                1101.3
23260                4839.5
43500                15141.7
3090                 2841.4
6235                 3333.3
11800                1092.7
25900                3.8
47730                15547.4
3145                 1197.3
6372                 1301.4
12254                1595.8
29064                4032.7
53075                2.8
3211                 2572.4
6538                 1521.6
12745                597.2
33055                3064.1
a. Construct a scatter plot of y vs. x.
b. Determine the least squares equation that can be used to predict a value of y from a value of x.
c. Give a 95% confidence interval for the change in fish consumption per unit rise in income.
d. Does this rate (in c) differ significantly from 0? Use alpha = 0.05.
e. Use your model to predict fish consumption when income is (i) Rs. 10,000 and (ii) Rs. 2,000.
f. Provide a 95% confidence interval for the mean fish consumption in e(i).
g. Provide a 95% prediction interval for fish consumption in e(i).
h. Comment on your findings in (f) and (g).
i. Provide a 95% prediction interval for the income, given that the fish consumption was 1600 grams per head.
j. Provide the ANOVA table and discuss what this F-test actually tests.
k. What were the basic assumptions for the model in (b)?
l. Were these assumptions satisfied? Discuss in detail, giving plots and formal tests.
m. Would you consider the model in (b) to be a good model? Why or why not?
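A minimal, stdlib-only Python sketch of the computations behind (b) through (g), (i) and (j), assuming x = income (rupees/year) and y = fish consumption (grams/head), and using the 24 (income, consumption) pairs listed in the table above. The helper names (`t_cdf`, `t_quantile`, `fit_slr`, `summarize`) are hypothetical; the t quantile is obtained by numerically integrating the Student-t density and bisecting, rather than by calling a statistics library. The inverse-prediction interval for (i) uses a common approximation, x0_hat ± t·sqrt(MSE/b1² · (1 + 1/n + (x0_hat − x̄)²/Sxx)), which is reasonable only when the slope is estimated precisely. This is a sketch for checking hand calculations, not the course's prescribed method.

```python
import math

# x = income (rupees/year), y = fish consumption (grams/head): the 24 pairs in the table.
INCOME = [3000, 6000, 11250, 22500, 40000, 3040, 6120, 11470, 23260, 43500,
          3090, 6235, 11800, 25900, 47730, 3145, 6372, 12254, 29064, 53075,
          3211, 6538, 12745, 33055]
FISH = [2532.0, 6579.0, 1724.6, 603.4, 16756.0, 2802.9, 5372.6, 1101.3, 4839.5, 15141.7,
        2841.4, 3333.3, 1092.7, 3.8, 15547.4, 1197.3, 1301.4, 1595.8, 4032.7, 2.8,
        2572.4, 1521.6, 597.2, 3064.1]


def t_cdf(t, df, steps=2000):
    """Student-t CDF via Simpson's rule on the density from 0 to |t| (steps must be even)."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def density(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = abs(t) / steps
    total = density(0.0) + density(abs(t))
    total += sum((4 if i % 2 else 2) * density(i * h) for i in range(1, steps))
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def t_quantile(p, df):
    """Return t with P(T <= t) = p, for p >= 0.5, by bisection on t_cdf."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def fit_slr(x, y):
    """Least squares fit of y = b0 + b1*x, plus the sums of squares used below."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    ssr = b1 * sxy  # regression sum of squares, equal to Sxy**2 / Sxx
    return dict(n=n, xbar=xbar, sxx=sxx, b0=b0, b1=b1, sse=sse, ssr=ssr, mse=sse / (n - 2))


def summarize(x, y, x0, y0, alpha=0.05):
    """Slope CI and test, ANOVA F, CI/PI at x0, and approximate inverse PI at y0."""
    m = fit_slr(x, y)
    n, mse, sxx, xbar, b0, b1 = m["n"], m["mse"], m["sxx"], m["xbar"], m["b0"], m["b1"]
    t = t_quantile(1 - alpha / 2, n - 2)

    se_b1 = math.sqrt(mse / sxx)
    y0_hat = b0 + b1 * x0                      # point prediction at x0
    lev = 1 / n + (x0 - xbar) ** 2 / sxx
    se_mean = math.sqrt(mse * lev)             # SE of the estimated mean response at x0
    se_pred = math.sqrt(mse * (1 + lev))       # SE for one new observation at x0

    x0_hat = (y0 - b0) / b1                    # inverse prediction of x from observed y0
    se_x0 = math.sqrt(mse / b1 ** 2 * (1 + 1 / n + (x0_hat - xbar) ** 2 / sxx))

    return {
        "b0": b0, "b1": b1,
        "slope_ci": (b1 - t * se_b1, b1 + t * se_b1),
        "t_slope": b1 / se_b1,                 # compare |t_slope| with t(1 - alpha/2, n - 2)
        "F": m["ssr"] / mse,                   # ANOVA F on (1, n - 2) df; equals t_slope**2
        "y0_hat": y0_hat,
        "mean_ci": (y0_hat - t * se_mean, y0_hat + t * se_mean),
        "pred_int": (y0_hat - t * se_pred, y0_hat + t * se_pred),
        "x0_hat": x0_hat,
        "inverse_pi": (x0_hat - t * se_x0, x0_hat + t * se_x0),
    }


if __name__ == "__main__":
    for income in (10000, 2000):
        for key, value in summarize(INCOME, FISH, x0=income, y0=1600).items():
            print(income, key, value)
```

As a cross-check, R's `lm()` followed by `confint()` and `predict(..., interval = "confidence")` or `interval = "prediction"` should reproduce the slope and mean/prediction intervals.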
2) The following consulting problem is about land cover in a region of South America that was ravaged by forest fires. The response is the percentage of the area that is now covered with some vegetation. The possible predictors are the age of the burn in months, the elevation of the area, and the slope and aspect of the area. (25 points)
Exposed Soil (%), C1  Age of Burn (months), C2  Elevation (ft), C3  Slope, C4  Aspect (degrees), C5
6.32                  1.5                       3720                24         290
2.90                  12.0                      4160                20         270
7.70                  15.0                      3760                18         90
25.66                 22.0                      3860                22         170
4.94                  25.0                      3800                25         140
2.75                  48.0                      3760                20         90
0.00                  90.0                      3785                8          180
2.12                  108.0                     4120                32         180
0.00                  120.0                     3775                10         220
13.10                 6.0                       3680                20         270
12.48                 7.0                       3730                16         60
11.12                 22.0                      3800                27         270
1.56                  22.0                      3815                10         160
0.00                  48.0                      3725                15         100
0.00                  120.0                     3630                17         40
0.00                  156.0                     3840                14         70
1.27                  12.0                      3875                7          320
13.12                 18.0                      4130                11         220
4.34                  24.0                      3525                20         110
0.33                  36.0                      3850                13         90
0.04                  36.0                      3920                10         90
0.00                  60.0                      3610                10         40
For the data provided above, find:
a. The least squares estimates of the partial slopes for a model that predicts y from the given predictors.
b. Interpret your results in ACTUAL words.
c. Test whether these partial slopes differ significantly from 0. Perform each test at alpha = 0.05.
d. How good is your model?
e. Mention the assumptions and possible violations in the context of your data. Provide at least one plot and one test for your assumption checks.
f. Give the partial sums of squares and interpret them in this context. What information do they give you?
g. Select the “best model” for this data set. What criteria did you use to select “best”? (Hint: if your assumptions are not met earlier, think of simple transformations when selecting your “best model”.)
h. Look at the assumptions and any violations for the “best” model. (Your final model may have violations too.)
i. Are outliers a problem for this data set? Why or why not?
j. Is multicollinearity a problem?
k. Give a general critique of the methods that you used in your analysis (a-j). If you were a statistician consulting on this problem, what would you have done differently?
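A similar stdlib-only Python sketch for (a) and (j), under stated assumptions: it treats column C1 (exposed soil, %) as the response y and the raw, untransformed columns C2 to C5 as predictors, fits y on an intercept plus those predictors by solving the normal equations X'X b = X'y, and computes variance inflation factors VIF_j = 1/(1 − R_j²), where R_j² comes from regressing predictor j on the other predictors. The helper names (`solve`, `ols`, `vifs`) are hypothetical. Standard errors, t-tests and partial sums of squares are left out to keep it short, and solving the normal equations directly is less numerically stable than a QR decomposition, which is acceptable for a four-predictor sketch.

```python
# Columns C1-C5 from the table, in row order (22 sites).
SOIL = [6.32, 2.90, 7.70, 25.66, 4.94, 2.75, 0.00, 2.12, 0.00, 13.10, 12.48,
        11.12, 1.56, 0.00, 0.00, 0.00, 1.27, 13.12, 4.34, 0.33, 0.04, 0.00]
AGE = [1.5, 12.0, 15.0, 22.0, 25.0, 48.0, 90.0, 108.0, 120.0, 6.0, 7.0,
       22.0, 22.0, 48.0, 120.0, 156.0, 12.0, 18.0, 24.0, 36.0, 36.0, 60.0]
ELEV = [3720, 4160, 3760, 3860, 3800, 3760, 3785, 4120, 3775, 3680, 3730,
        3800, 3815, 3725, 3630, 3840, 3875, 4130, 3525, 3850, 3920, 3610]
SLOPE = [24, 20, 18, 22, 25, 20, 8, 32, 10, 20, 16,
         27, 10, 15, 17, 14, 7, 11, 20, 13, 10, 10]
ASPECT = [290, 270, 90, 170, 140, 90, 180, 180, 220, 270, 60,
          270, 160, 100, 40, 70, 320, 220, 110, 90, 90, 40]


def solve(a, b):
    """Solve a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]  # raises ZeroDivisionError if a is singular
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        beta[r] = (m[r][n] - sum(m[r][c] * beta[c] for c in range(r + 1, n))) / m[r][r]
    return beta


def ols(columns, y):
    """Least squares fit of y on an intercept plus the predictor columns; returns (beta, R^2)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    xtx = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
    xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    beta = solve(xtx, xty)  # normal equations X'X beta = X'y
    fitted = [sum(b * xij for b, xij in zip(beta, row)) for row in X]
    ybar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return beta, 1 - sse / sst


def vifs(columns):
    """VIF for each predictor: 1 / (1 - R^2) from regressing it on the others."""
    out = []
    for j in range(len(columns)):
        others = columns[:j] + columns[j + 1:]
        _, r2 = ols(others, columns[j])
        out.append(1 / (1 - r2))
    return out


if __name__ == "__main__":
    predictors = [AGE, ELEV, SLOPE, ASPECT]
    beta, r2 = ols(predictors, SOIL)
    print("intercept and partial slopes (age, elevation, slope, aspect):", beta)
    print("R^2:", r2)
    print("VIFs:", vifs(predictors))
```

Large VIFs (a common rule of thumb flags values above about 10) would suggest that multicollinearity is inflating the variances of the partial slopes.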