STA 226: INTRODUCTION TO STATISTICAL INFERENCE ASSIGNMENT
Instructions: Attempt all questions. In case a level of significance is not stated, use 5%.
Question One
(a) It is desired to investigate the level of premiums charged by two companies for
contents policies for houses in a certain area. A random sample of 10 houses insured by
company A is compared with 10 similar houses insured by company B. The premiums
charged in each case are as follows.
Company A: 117 154 166 189 190 202 233 263 289 331
Company B: 142 160 166 188 221 241 276 279 284 302
(i) Illustrate the data given on a suitable diagram and hence comment briefly on the
validity of the assumptions required for a two-sample t test for the premiums of these
two companies.
[1 mark]
(ii) Assuming that the premiums are normally distributed, carry out a formal test to check
whether it is appropriate to apply a two-sample t test to these data.
[3 marks]
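One way to carry out the check in part (ii) is a variance-ratio (F) test of equal population variances. The sketch below is a minimal illustration using only Python's standard library; the critical value `F_CRIT` is an assumed lookup from standard F tables (upper 2.5% point of F with 9 and 9 degrees of freedom, for a two-sided 5% test), not something the code computes, and the variable names are illustrative.

```python
from statistics import variance

# Premiums from Question One (a)
company_a = [117, 154, 166, 189, 190, 202, 233, 263, 289, 331]
company_b = [142, 160, 166, 188, 221, 241, 276, 279, 284, 302]

# Sample variances (divisor n - 1)
var_a = variance(company_a)
var_b = variance(company_b)

# Larger variance in the numerator, so only the upper tail is needed
f_stat = max(var_a, var_b) / min(var_a, var_b)

# Assumed table value: upper 2.5% point of F(9, 9), i.e. a two-sided 5% test
F_CRIT = 4.026

print(f"s_A^2 = {var_a:.2f}, s_B^2 = {var_b:.2f}, F = {f_stat:.4f}")
if f_stat > F_CRIT:
    print("Reject H0: the population variances appear to differ")
else:
    print("Do not reject H0: equal variances are plausible")
```

With these data the ratio is about 1.24, well below the tabulated value, so the equal-variance assumption behind the pooled two-sample t test is not contradicted.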
(iii) Test whether the level of premiums charged by company B was higher than that
charged by company A. State your conclusion clearly.
[3 marks]
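Part (iii) calls for a one-sided two-sample t test. The sketch below assumes equal population variances (the condition part (ii) is meant to check) and therefore uses the pooled variance; `T_CRIT` is an assumed value from standard t tables (upper 5% point with 18 degrees of freedom) rather than a computed quantile.

```python
from math import sqrt
from statistics import mean, variance

company_a = [117, 154, 166, 189, 190, 202, 233, 263, 289, 331]
company_b = [142, 160, 166, 188, 221, 241, 276, 279, 284, 302]
n_a, n_b = len(company_a), len(company_b)

# Pooled variance: valid only under the equal-variance assumption
sp2 = ((n_a - 1) * variance(company_a) + (n_b - 1) * variance(company_b)) / (n_a + n_b - 2)
se = sqrt(sp2 * (1 / n_a + 1 / n_b))

# H0: mu_B = mu_A  versus  H1: mu_B > mu_A (one-sided)
t_stat = (mean(company_b) - mean(company_a)) / se
df = n_a + n_b - 2

# Assumed table value: upper 5% point of t with 18 df
T_CRIT = 1.734

print(f"t = {t_stat:.4f} on {df} df")
print("Reject H0" if t_stat > T_CRIT else "Do not reject H0")
```

Here the difference in sample means (225.9 versus 213.4) gives t of roughly 0.45, far short of the critical value.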
(b) In a medical study conducted to test the suggestion that daily exercise has the effect
of lowering blood pressure, a sample of eight patients with high blood pressure was
selected. Their blood pressure was measured initially and then again a month later
after they had participated in an exercise program. The results are shown in the table
below:
Patient   1    2    3    4    5    6    7    8
Before  155  152  146  153  146  160  139  148
After   145  147  123  137  141  142  140  138
The following are R outputs for these data.
Output 1
Shapiro-Wilk normality test
data: before - after
W = 0.9706, p-value = 0.9027
Output 2
Paired t-test
data: before and after
t = 3.8549, df = 7, p-value = 0.003126
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 5.46661     Inf
sample estimates:
mean of the differences
10.75
Use the above outputs to answer the following questions.
(i) Do the data (the before-minus-after differences) appear to be normally distributed? Justify your answer.
[1 mark]
(ii) Do the data provide sufficient evidence to support the claim that exercise
reduces blood pressure in patients?
[2 marks]
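The paired t statistic in Output 2 can be reproduced by hand from the within-patient differences. This is a minimal standard-library sketch; `T_CRIT` is an assumed table value (upper 5% point of t with 7 degrees of freedom), used here only to rebuild the one-sided lower confidence bound.

```python
from math import sqrt
from statistics import mean, stdev

before = [155, 152, 146, 153, 146, 160, 139, 148]
after = [145, 147, 123, 137, 141, 142, 140, 138]

# Work with the within-patient differences, as Outputs 1 and 2 do
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)

d_bar = mean(diffs)
se = stdev(diffs) / sqrt(n)
t_stat = d_bar / se  # H1: mean difference > 0 (exercise lowers pressure)

# Assumed table value: upper 5% point of t with 7 df
T_CRIT = 1.895
lower_bound = d_bar - T_CRIT * se  # one-sided 95% lower confidence bound

print(f"mean difference = {d_bar}, t = {t_stat:.4f}, lower bound = {lower_bound:.3f}")
```

The statistic matches Output 2 (t = 3.8549, mean difference 10.75); the lower bound differs slightly from R's 5.46661 because the tabulated critical value is rounded.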
(c) Now assume that the above data come from two independent populations.
The following R outputs were obtained from the data.
Output 3
F test to compare two variances
data: before and after
F = 0.7866, num df = 7, denom df = 7, p-value = 0.7595
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.1574794 3.9289733
sample estimates:
ratio of variances
0.7865955
Output 4
Two Sample t-test
data: before and after
t = 3.1085, df = 14, p-value = 0.007702
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
3.33269 18.16731
sample estimates:
mean of x mean of y
  149.875   139.125
Output 5
Welch Two Sample t-test
data: before and after
t = 3.1085, df = 13.803, p-value = 0.007813
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
3.322749 18.177251
sample estimates:
mean of x mean of y
  149.875   139.125
Clearly interpret the above outputs.
[3 marks]
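To see where the numbers in Outputs 3 to 5 come from, the sketch below recomputes the variance ratio, the pooled two-sample t statistic, and the Welch statistic with its Welch-Satterthwaite degrees of freedom. It is a standard-library illustration treating "before" and "after" as independent samples, as part (c) instructs; variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

before = [155, 152, 146, 153, 146, 160, 139, 148]
after = [145, 147, 123, 137, 141, 142, 140, 138]
n1, n2 = len(before), len(after)
v1, v2 = variance(before), variance(after)
diff = mean(before) - mean(after)

# Output 3: F statistic is the ratio of sample variances (before / after)
f_ratio = v1 / v2

# Output 4: pooled-variance two-sample t test, df = n1 + n2 - 2
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_pooled = diff / sqrt(sp2 * (1 / n1 + 1 / n2))

# Output 5: Welch t test with Welch-Satterthwaite degrees of freedom
a, b = v1 / n1, v2 / n2
t_welch = diff / sqrt(a + b)
df_welch = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

print(f"F = {f_ratio:.4f}")
print(f"pooled t = {t_pooled:.4f} on {n1 + n2 - 2} df")
print(f"Welch t = {t_welch:.4f} on {df_welch:.3f} df")
```

Because both samples have 8 observations, the pooled and Welch standard errors are identical, so Outputs 4 and 5 report the same t; only the degrees of freedom (14 versus 13.803) and hence the p-values differ.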
Question Two
(a) The number of new customers generated per month by different branches of a small
building society is being monitored for employee bonus purposes. Head office has
collated the figures sent in by four branches over recent months, which are as follows:
Branch 1: 11  5  4  9  3  0  -
Branch 2:  9  7  6  8 12  -  -
Branch 3:  5  4  5  6  0  8  6
Branch 4:  7  8 12  0  1 15  6
The branches report different numbers of figures because incomplete data were sent to head
office. Investigate whether there is any difference between the mean numbers of new
customers per month at the four branches. Use a 5% level of significance.
[5 marks]
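A natural choice here is a one-way analysis of variance with unequal group sizes, leaving the missing entries ("-") out. The sketch below is a standard-library illustration of that approach; the helper `one_way_anova` is a hypothetical name introduced for this example, and `F_CRIT` is an assumed table value (upper 5% point of F with 3 and 21 degrees of freedom).

```python
from statistics import mean

# Monthly new-customer counts; missing entries ("-") are simply left out
branches = [
    [11, 5, 4, 9, 3, 0],       # Branch 1
    [9, 7, 6, 8, 12],          # Branch 2
    [5, 4, 5, 6, 0, 8, 6],     # Branch 3
    [7, 8, 12, 0, 1, 15, 6],   # Branch 4
]

def one_way_anova(groups):
    """Hypothetical helper: return (ss_between, ss_within, df_between, df_within, F)."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    group_means = [mean(g) for g in groups]
    # Between-group SS weights each squared mean deviation by its group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat

ss_b, ss_w, df_b, df_w, f_stat = one_way_anova(branches)

# Assumed table value: upper 5% point of F(3, 21)
F_CRIT = 3.072

print(f"SS_between = {ss_b:.3f}, SS_within = {ss_w:.3f}, F({df_b}, {df_w}) = {f_stat:.3f}")
print("Reject H0" if f_stat > F_CRIT else "Do not reject H0")
```

With 25 observations in 4 groups the test has (3, 21) degrees of freedom, and the F statistic comes out close to 1.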
(b) Twenty pigs are assigned at random among four experimental groups. Each group
is fed a different diet. The data are pig body weights, in kilograms, after being raised
on these diets. We wish to ask whether pig weights are the same for all four diets.
Feed1  Feed2  Feed3  Feed4
60.8   68.7   102.6  87.9
57     67.7   102.1  84.2
65     74     100.2  83.1
58.6   66.3   96.5   85.7
61.7   69.8   100    90.3
(i) What type of hypothesis test will you use?
[1 mark]
(ii) What are the test's assumptions?
[2 marks]
(iii) Side-by-side boxplots are plotted as shown below to compare the four
distributions. Do the samples look as though they were drawn from populations with the same
distribution? Justify your answer.
[2 marks]
[Figure: side-by-side boxplots of pig body weight for feed1, feed2, feed3 and feed4]
(iv) The following is R output from the data. Interpret the results in the context of the
problem.
[1 mark]
Output 6
Analysis of Variance Table
Response: y
          Df Sum Sq Mean Sq F value    Pr(>F)
feed       3   4686    1562   194.6  8.47e-13 ***
Residuals 16    128       8
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
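The entries of Output 6 can be checked by computing the one-way ANOVA sums of squares directly. This standard-library sketch takes the 20 weights from the table above (with 100 as the fifth Feed3 value, which is what the 16 residual degrees of freedom imply); it is an illustration, not the code that produced the R output.

```python
from statistics import mean

# Pig weights (kg) by diet, as given in the table
feeds = [
    [60.8, 57, 65, 58.6, 61.7],        # Feed1
    [68.7, 67.7, 74, 66.3, 69.8],      # Feed2
    [102.6, 102.1, 100.2, 96.5, 100],  # Feed3
    [87.9, 84.2, 83.1, 85.7, 90.3],    # Feed4
]

values = [x for g in feeds for x in g]
grand_mean = mean(values)
group_means = [mean(g) for g in feeds]

# Partition the total variation into between-diet and within-diet parts
ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(feeds, group_means))
ss_within = sum((x - m) ** 2 for g, m in zip(feeds, group_means) for x in g)
df_between = len(feeds) - 1
df_within = len(values) - len(feeds)
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

print(f"feed      {df_between:2d} {ss_between:8.0f} {ms_between:8.0f} {f_stat:7.1f}")
print(f"Residuals {df_within:2d} {ss_within:8.0f} {ms_within:8.0f}")
```

After rounding, the printed rows agree with the Df, Sum Sq, Mean Sq and F value columns of Output 6.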
(v) The following R output tests the homogeneity of variances. Interpret the
results in the context of the problem.
[2 marks]
Output 7
Bartlett test of homogeneity of variances
data:  y by feed
Bartlett's K-squared = 0.2364, df = 3, p-value = 0.9715
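Bartlett's statistic compares the log of the pooled variance with the logs of the individual group variances, scaled by a small-sample correction. The standard-library sketch below reproduces the K-squared value in Output 7 from the same 20 weights; `CHI2_CRIT` is an assumed table value (upper 5% point of chi-square with 3 degrees of freedom).

```python
from math import log
from statistics import variance

feeds = [
    [60.8, 57, 65, 58.6, 61.7],
    [68.7, 67.7, 74, 66.3, 69.8],
    [102.6, 102.1, 100.2, 96.5, 100],
    [87.9, 84.2, 83.1, 85.7, 90.3],
]

k = len(feeds)
sizes = [len(g) for g in feeds]
n_total = sum(sizes)
variances = [variance(g) for g in feeds]

# Pooled within-group variance
sp2 = sum((n - 1) * s2 for n, s2 in zip(sizes, variances)) / (n_total - k)

# Log-variance comparison divided by Bartlett's correction factor
numerator = (n_total - k) * log(sp2) - sum((n - 1) * log(s2) for n, s2 in zip(sizes, variances))
correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)) / (3 * (k - 1))
k_squared = numerator / correction

# Assumed table value: upper 5% point of chi-square with k - 1 = 3 df
CHI2_CRIT = 7.815

print(f"Bartlett's K-squared = {k_squared:.4f}, df = {k - 1}")
print("Reject H0" if k_squared > CHI2_CRIT else "Do not reject H0: variances look homogeneous")
```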
(a) A certain species of plant produces flowers that are either red, white or pink. It
also produces leaves that may be either plain or variegated. For a sample of 500
plants, the distribution of flower color and leaf type was as follows.
             Red  White  Pink
Plain         97     42    77
Variegated   105    148    31
Test whether these results indicate any association between flower color and leaf
type.
[4 marks]
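The usual approach is a Pearson chi-square test of independence on the 2 × 3 table, with expected counts formed from the row and column totals. The sketch below is a standard-library illustration; `CHI2_CRIT` is an assumed table value (upper 5% point of chi-square with 2 degrees of freedom).

```python
# Observed counts: rows = leaf type, columns = flower color (red, white, pink)
observed = [
    [97, 42, 77],    # Plain
    [105, 148, 31],  # Variegated
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Pearson chi-square: expected count = row total * column total / grand total
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Assumed table value: upper 5% point of chi-square with 2 df
CHI2_CRIT = 5.991

print(f"chi-square = {chi2:.2f} on {df} df")
print("Reject H0: evidence of association" if chi2 > CHI2_CRIT else "Do not reject H0")
```

All expected counts here exceed 5 (the smallest is about 46.7), so the chi-square approximation is reasonable for this table.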