STA 226: INTRODUCTION TO STATISTICAL INFERENCE
ASSIGNMENT

Instructions: Attempt all questions. If a level of significance is not stated, use 5%.

Question One

(a) It is desired to investigate the level of premium charged by two companies for contents policies for houses in a certain area. A random sample of 10 houses insured by company A is compared with similar houses insured by company B. The premiums charged in each case are as follows.

Company A  117 154 166 189 190 202 233 263 289 331
Company B  142 160 166 188 221 241 276 279 284 302

(i) Illustrate the data on a suitable diagram and hence comment briefly on the validity of the assumptions required for a two-sample t test for the premiums of these two companies. [1 mark]
(ii) Assuming that the premiums are normally distributed, carry out a formal test to check that it is appropriate to apply a two-sample t test to these data. [3 marks]
(iii) Test whether the level of premiums charged by company B was higher than that charged by company A. State your conclusion clearly. [3 marks]

(b) In a medical study conducted to test the suggestion that daily exercise has the effect of lowering blood pressure, a sample of eight patients with high blood pressure was selected. Their blood pressure was measured initially and then again a month later, after they had participated in an exercise program. The results are shown in the table below:

Patient   1   2   3   4   5   6   7   8
Before  155 152 146 153 146 160 139 148
After   145 147 123 137 141 142 140 138

The following are the R outputs.

Output 1

        Shapiro-Wilk normality test

data:  before - after
W = 0.9706, p-value = 0.9027

Output 2

        Paired t-test

data:  before and after
t = 3.8549, df = 7, p-value = 0.003126
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 5.46661     Inf
sample estimates:
mean of the differences
                  10.75

Use the above outputs to answer the following questions.

(i) Does the data seem to be normal?
Justify your answer. [1 mark]
(ii) Does the data provide sufficient evidence to support the claim that exercise reduces blood pressure in patients? [2 marks]

(c) Assume that the above data are from two independent populations. The following are R outputs from the data.

Output 3

        F test to compare two variances

data:  before and after
F = 0.7866, num df = 7, denom df = 7, p-value = 0.7595
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.1574794 3.9289733
sample estimates:
ratio of variances
         0.7865955

Output 4

        Two Sample t-test

data:  before and after
t = 3.1085, df = 14, p-value = 0.007702
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  3.33269 18.16731
sample estimates:
mean of x mean of y
  149.875   139.125

Output 5

        Welch Two Sample t-test

data:  before and after
t = 3.1085, df = 13.803, p-value = 0.007813
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  3.322749 18.177251
sample estimates:
mean of x mean of y
  149.875   139.125

Clearly interpret the above outputs. [3 marks]

Question Two

(a) The number of new customers generated per month by different branches of a small building society is being monitored for employee bonus purposes. Head office has collated the figures sent in by four branches over recent months, which are as follows:

Branch 1  11   5   4   9   3   0   -
Branch 2   9   7   6   8  12   -   -
Branch 3   5   4   5   6   0   8   6
Branch 4   7   8  12   0   1  15   6

There are different numbers of figures because incomplete data were sent to Head office. Investigate whether there is any difference in the mean number of new customers between the branches. Use a 5% level of significance. [5 marks]

(b) Nineteen pigs are assigned at random among four experimental groups. Each group is fed a different diet. The data are pig body weights, in kilograms, after being raised on these diets. We wish to ask whether pig weights are the same for all four diets.

Feed1   Feed2   Feed3   Feed4
 60.8    68.7   102.6    87.9
 57      67.7   102.1    84.2
 65      74     100.2    83.1
 58.6    66.3    96.5    85.7
 61.7    69.8   100      90.3

(i) What type of hypothesis test will you use? [1 mark]
(ii) What are the test's assumptions? [2 marks]
(iii) Side-by-side boxplots are plotted as shown below to compare the four distributions. Do the samples look like they were drawn from populations with the same distribution? Justify your answer. [2 marks]

[Figure: side-by-side boxplots of pig weight for feed1, feed2, feed3 and feed4; vertical axis from 60 to 100]

(iv) The following is R output from the data. Interpret the results in the context of the problem. [1 mark]

Output 6

Analysis of Variance Table

Response:
          Df Sum Sq Mean Sq F value   Pr(>F)
feed       3   4686    1562   194.6 8.47e-13 ***
Residuals 16    128       8
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(v) The following is R output from a test of the homogeneity of variances. Interpret the results in the context of the problem. [2 marks]

Output 7

        Bartlett test of homogeneity of variances

data:  y by feed
Bartlett's K-squared = 0.2364, df = 3, p-value = 0.9715

(a) A certain species of plant produces flowers which are either red, white or pink. It also produces leaves which may be either plain or variegated. For a sample of 500 plants, the distribution of flower color and leaf type was as follows.

             Red   White   Pink
Plain         97      42     77
Variegated   105     148     31

Test whether these results indicate any association between flower color and leaf type. [4 marks]
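
For a two-way table such as the flower color by leaf type counts above, the expected counts under independence and the chi-square statistic could be computed along the following lines. This is only a minimal sketch using Python's standard library; the variable names are assumptions made for illustration, and the closed-form p-value used here is valid only because this table has 2 degrees of freedom.

```python
import math

# Observed counts from the table: rows are leaf types, columns are flower colors.
observed = {
    "Plain": {"Red": 97, "White": 42, "Pink": 77},
    "Variegated": {"Red": 105, "White": 148, "Pink": 31},
}
colors = ["Red", "White", "Pink"]

# Marginal totals.
row_totals = {leaf: sum(counts.values()) for leaf, counts in observed.items()}
col_totals = {c: sum(observed[leaf][c] for leaf in observed) for c in colors}
grand_total = sum(row_totals.values())

# Expected count under independence: row total * column total / grand total.
expected = {
    leaf: {c: row_totals[leaf] * col_totals[c] / grand_total for c in colors}
    for leaf in observed
}

# Pearson chi-square statistic: sum over cells of (O - E)^2 / E.
chi_sq = sum(
    (observed[leaf][c] - expected[leaf][c]) ** 2 / expected[leaf][c]
    for leaf in observed
    for c in colors
)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(colors) - 1)

# For df = 2 only, the chi-square upper-tail probability is exp(-x / 2),
# so the 5% critical value is -2 * ln(0.05).
p_value = math.exp(-chi_sq / 2)
critical_5pct = -2 * math.log(0.05)

print(f"chi-square = {chi_sq:.3f}, df = {df}")
print(f"5% critical value = {critical_5pct:.3f}, p-value = {p_value:.3g}")
```

The statistic is compared with the 5% critical value for the stated degrees of freedom; for tables of other sizes the exponential shortcut does not apply and chi-square tables or statistical software would be needed.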