Spring 2016 STAT 530(due April 1) Mid Term Exam Take-home Part Total Possible Points 50 General Instructions: 1. You may do these by hand or by computer 2. If computer is used, attach relevant output only. DO NOT write on output. 3. Highlight or underline your answers 4. You may use other books and resources, but no help from any “animate” being. 5. Try to be brief and neat, please! (Your grade could be inversely proportional to the weight of your exam!!!!) 1) The following data set is a real life data set indicating income level and the amount of fish consumed per capita for a household. This data was collected from 25 families in rural east India, where fish is a major (sometimes the only) protein source. Income is yearly income in rupees (about 50 rupees to a dollar is the exchange rate), and the fish consumption is in grams per head. We are interested in predicting fish consumption based on income levels. Define your corresponding y and x and answer the following questions. (25 points) Income fish-cons. income cons 3000 2532.0 6000 6579.0 11250 1724.6 22500 603.4 40000 16756.0 3040 2802.9 6120 5372.6 11470 1101.3 23260 4839.5 43500 15141.7 3090 2841.4 6235 3333.3 11800 1092.7 25900 3.8 47730 15547.4 3145 1197.3 6372 1301.4 12254 1595.8 29064 4032.7 53075 2.8 3211 2572.4 6538 1521.6 12745 597.2 33055 3064.1 a. Construct a scatter plot of y vs x. b. Determine least squares equation that can be used for predicting a value of y based on a value of x. c. Give 95% confidence intervals for the change in fish consumption per unit rise in income. d. Does this rate (in c) differ significantly from 0? Use alpha=0.05. e. Use your model to predict fish consumption when income is i)Rs. 10,000, ii) Rs. 2000 f. Provide 95% confidence intervals for the mean fish consumption in e (i). g. Provide 95% prediction intervals for fish consumption in e (i) h. Comment on your findings in (f) and (g). i. Provide a 95% prediction interval for the income when you are given that the fish consumption was 1600 gm per unit. j. Provide the ANOVA table and discuss what this F-test actually tests. k. What were the basic assumptions for the model in (b)? l. Were these assumptions satisfied? Discuss in detail giving plots and formal tests. m. Would you consider the model in (b) to be a good model? Why or why not? Example. The following consulting problem is about land cover in South America in a region that was ravaged by forest fires. The response is the percentage of the area that is now covered with some vegetation. The possible predictors are the age of the burn in months, the elevation of the area, the slope and aspect of the area. (25 points) Exposed Soil (%),C1 6.32 2.90 7.70 25.66 4.94 2.75 0.00 2.12 0.00 13.10 12.48 11.12 1.56 0.00 0.00 0.00 1.27 13.12 4.34 0.33 0.04 0.00 Age of Burn Months,C2 1.5 12.0 15.0 22.0 25.0 48.0 90.0 108.0 120.0 6.0 7.0 22.0 22.0 48.0 120.0 156.0 12.0 18.0 24.0 36.0 36.0 60.0 Elevation in Feet,C3 3720 4160 3760 3860 3800 3760 3785 4120 3775 3680 3730 3800 3815 3725 3630 3840 3875 4130 3525 3850 3920 3610 Slope,C4 24 20 18 22 25 20 8 32 10 20 16 27 10 15 17 14 7 11 20 13 10 10 Aspect in Degrees,C5 290 270 90 170 140 90 180 180 220 270 60 270 160 100 40 70 320 220 110 90 90 40 For the data provided above find: a. b. c. d. e. f. g. h. i. j. k. The least square estimates for the partial slopes for a model trying to predict y based on the given predictors. Interpret your results in ACTUAL words. Test to see if these partial slopes significantly differ from 0. Perform each test at alpha = .05. How good is your model? Mention the assumptions and possible violations in the context of your data. Provide at least one plot and one test for your assumption checks. Give the partial sums of squares and interpret them in this context. What information do these give you. Select the “best model” for this data set. What criteria did you use to select “best”? (Hint: if your assumptions are not met earlier, think of easy transformations in the selection of your “best model”) Look at the assumptions and violations as such for the “best” model. ( Your final model may have violations too) Are outliers a problem for the data set? Why or why not? Is multi-collinearity a problem? Give a general critique of the methods that you used in your analysis (a-j). If you were a statistician consulting on this problem what would you have done different?