

Department of Mathematics & Statistics

STAT 2593

Solutions to Assignment 10.

1. (Problem 6, page 624 Devore 5th ed., page 641 6th ed.)

The null probabilities for the three categories are $p_R = 9/16$, $p_Y = 3/16$ and $p_W = 4/16$ (R for red, Y for yellow, W for white). I.e. our null hypothesis is $H_0$: $p_R = 9/16$, $p_Y = 3/16$ and $p_W = 4/16$. The alternative hypothesis is simply the contradiction of this: $H_a$: $H_0$ is false. The observed counts are 195, 73, and 100, respectively.

The sample size is n = 195 + 73 + 100 = 368. Hence the expected counts are $E_R = 368 \times 9/16 = 207$, $E_Y = 368 \times 3/16 = 69$, and $E_W = 368 \times 4/16 = 92$.

The test statistic is

$$Q = \frac{(195 - 207)^2}{207} + \frac{(73 - 69)^2}{69} + \frac{(100 - 92)^2}{92} = 0.6957 + 0.2319 + 0.6957 = 1.6233.$$

The degrees of freedom are 3 − 1 = 2; the table value at the 0.10 significance level is 4.605. Since the observed value of Q is less than this table value we do not reject $H_0$ even at the 0.10 significance level.

I.e. there is no evidence in this data set against the genetic theory. Thus the data do not contradict the genetic theory. (They do not actually confirm the genetic theory, but they are “in harmony with” the genetic theory; they are consistent with the genetic theory.)
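The arithmetic for this problem can be checked with a short Python sketch. The variable names are my own, and the critical value is computed from a closed form rather than looked up: a chi-squared variable on 2 degrees of freedom is exponential with mean 2, so its upper-α point is −2 ln α.

```python
import math

# Null proportions from the genetic theory and the observed counts (R, Y, W).
null_probs = [9 / 16, 3 / 16, 4 / 16]
observed = [195, 73, 100]

n = sum(observed)                          # sample size
expected = [n * p for p in null_probs]     # expected counts under H0

# Pearson goodness-of-fit statistic: sum of (O - E)^2 / E over the cells.
q = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# On 2 degrees of freedom P(Q > c) = exp(-c / 2), so the upper-alpha
# critical value is -2 * ln(alpha). This shortcut holds for df = 2 only.
alpha = 0.10
critical = -2 * math.log(alpha)

print(f"n = {n}, expected = {expected}")
print(f"Q = {q:.4f}, critical value = {critical:.3f}, reject H0: {q > critical}")
```

Carried at full precision the statistic is 1.6232; the 1.6233 quoted above is the sum of the three terms after each has been rounded to 4 decimal places. The decision is the same either way.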

2. The estimate of λ is

$$\hat{\lambda} = \frac{\sum_{i=0}^{9} i \times O_i}{\sum_{i=0}^{9} O_i} = \frac{1163}{300} = 3.8767$$

to 4 decimal places. The corresponding Poisson probabilities are

$$P(X = x) = \frac{e^{-\hat{\lambda}} \hat{\lambda}^x}{x!}$$

for x = 0, 1, ..., 8 and $P(X \geq 9) = 1 - \sum_{i=0}^{8} P(X = i)$.

The expected counts are $n \times p_i$, where the $p_i$ are as given in the following table, and n is the sample size, equal to the sum of the observed counts, which is 300. We tabulate all the available information as follows:

Category    Probability    Observed count    Expected count
   0          0.0207              6               6.21
   1          0.0803             24              24.09
   2          0.1557             42              46.71
   3          0.2012             59              60.36
   4          0.1950             62              58.50
   5          0.1512             44              45.36
   6          0.0977             41              29.31
   7          0.0541             14              16.23
   8          0.0262              6               7.86
  ≥ 9         0.0179              2               5.37
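The estimate, the probabilities and the expected counts can be reproduced with the Python sketch below. The names are my own. One detail is an inference from the numbers rather than something stated in the text: the tabulated expected counts equal 300 times the probabilities after rounding to 4 decimal places (e.g. 300 × 0.0207 = 6.21), so the sketch rounds in the same order.

```python
import math

# Observed counts for X = 0, 1, ..., 8; the last cell is treated as X >= 9.
observed = [6, 24, 42, 59, 62, 44, 41, 14, 6, 2]
n = sum(observed)

# Estimate of lambda: the sample mean, sum(i * O_i) / sum(O_i).
lam = sum(i * o for i, o in enumerate(observed)) / n

# Poisson probabilities for x = 0..8, then P(X >= 9) as the remainder.
probs = [math.exp(-lam) * lam ** x / math.factorial(x) for x in range(9)]
probs.append(1 - sum(probs))

# Expected counts from the probabilities rounded to 4 decimal places.
expected = [round(n * round(p, 4), 2) for p in probs]

print(f"n = {n}, lambda-hat = {lam:.4f}")
for x, (p, o, e) in enumerate(zip(probs, observed, expected)):
    label = ">=9" if x == 9 else str(x)
    print(f"{label:>3}  {p:.4f}  {o:3d}  {e:6.2f}")
```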

The test statistic is

$$Q = \sum \frac{(O - E)^2}{E} = \frac{(6 - 6.21)^2}{6.21} + \frac{(24 - 24.09)^2}{24.09} + \ldots + \frac{(6 - 7.86)^2}{7.86} + \frac{(2 - 5.37)^2}{5.37} = 8.2871$$

to 4 decimal places.

The degrees of freedom are 10 − 1 − 1 = 8 (one degree of freedom is lost because λ was estimated from the data); the table value at the 0.10 significance level is 13.362. The observed test statistic is much less than this so we do not reject $H_0$, even at the 0.10 level.
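As a check on the ten-cell calculation, here is a minimal Python sketch using the tabulated observed and expected counts. The helper `chi2_sf_even_df` is my own addition, not part of the original solution; it uses the closed-form upper tail of the chi-squared distribution, which is available only when the degrees of freedom are even.

```python
import math

observed = [6, 24, 42, 59, 62, 44, 41, 14, 6, 2]
expected = [6.21, 24.09, 46.71, 60.36, 58.50, 45.36, 29.31, 16.23, 7.86, 5.37]

# Pearson statistic over the ten cells.
q = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# One parameter (lambda) was estimated, so df = cells - 1 - 1.
df = len(observed) - 1 - 1


def chi2_sf_even_df(x, df):
    """P(chi-squared > x) for EVEN df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!."""
    if df % 2 != 0:
        raise ValueError("closed form used here requires even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


p_value = chi2_sf_even_df(q, df)
critical_010 = 13.362  # table value quoted in the text for df = 8

print(f"Q = {q:.4f} on {df} df, p-value = {p_value:.3f}")
print(f"reject H0 at 0.10: {q > critical_010}")
```

The p-value comes out at roughly 0.41, which agrees with the table-based conclusion that the statistic is nowhere near significant.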

Now Devore says to combine the 8 and 9 categories into one cell. This makes me inclined to believe that Devore was actually doing the problem incorrectly, i.e. he probably used X = 9 for category 9 rather than X ≥ 9. Doing this makes the probabilities add up to something less than 1, and hence the expected counts add up to something less than n = 300, the sum of the observed counts, which makes everything a bit out of whack.

Doing it Devore’s way we get $p_9 = P(X = 9) = 0.0113$, which makes $E_9$ equal to 3.39. Devore would have worried because this is less than 5, violating the conservative criterion for validity of a chi-squared test. Combining the 8 and 9 categories we get a corresponding observed count of 8. The probability for this category is 0.0262 + 0.0113 = 0.0375, giving an expected count of 11.25. The rest of the observed and expected counts remain unchanged; calculating the chi-squared statistic after having combined categories 8 and 9 we get 6.6709 (to 4 decimal places). This is on 9 − 1 − 1 = 7 degrees of freedom. The table value at the 0.10 significance level is 12.017, and the observed value of the test statistic, 6.6709, is much smaller so our decision is unchanged: do not reject $H_0$.
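The combined-cell version can be sketched the same way. The variable names are mine, and the expected counts for cells 0 to 7 are copied from the table rather than recomputed. The sketch also prints the total of the expected counts, which shows the shortfall from 300 that arises when X = 9 is used in place of X ≥ 9.

```python
# Cells 0..7 unchanged; the last cell combines X = 8 and X = 9.
observed = [6, 24, 42, 59, 62, 44, 41, 14, 6 + 2]
expected = [6.21, 24.09, 46.71, 60.36, 58.50, 45.36, 29.31, 16.23]

# Probability of the combined cell: P(X = 8) + P(X = 9), with P(X = 9)
# used instead of P(X >= 9).
p_combined = 0.0262 + 0.0113
expected.append(round(300 * p_combined, 2))

q = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1

print(f"combined cell: observed = {observed[-1]}, expected = {expected[-1]:.2f}")
print(f"Q = {q:.4f} on {df} df (table value at 0.10: 12.017)")
print(f"sum of expected counts = {sum(expected):.2f}")
```

The expected counts total 298.02 rather than 300. The missing 1.98 is exactly the expected count that belongs to X ≥ 10.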

Finally we could also do the problem by considering a category X ≥ 10 (with an observed count of 0). The probability $p_{10}$ for this category is 0.0066, and the corresponding expected count is 1.98. Notice that all of the expected counts are at least 1; the smallest two are the last ones, 3.39 and 1.98. Also the mean expected count is $\bar{E} = 300/11 = 27.2727 > 5$. Hence the liberal criterion for the validity of the chi-squared test is satisfied. The resulting test statistic is 8.7221.

The degrees of freedom are 11 − 1 − 1 = 9; the table value at the 0.10 significance level is 14.684. The observed test statistic is much less than this so yet again we do not reject $H_0$, even at the 0.10 level.
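A sketch of the eleven-cell version follows, again with names of my own choosing and with the expected counts taken from the text. It also checks the two conditions of the liberal criterion as described above: every expected count at least 1, and a mean expected count above 5.

```python
# Cells X = 0..9 plus a final cell X >= 10 with an observed count of 0.
observed = [6, 24, 42, 59, 62, 44, 41, 14, 6, 2, 0]
expected = [6.21, 24.09, 46.71, 60.36, 58.50, 45.36, 29.31, 16.23, 7.86,
            3.39,   # 300 * P(X = 9)   = 300 * 0.0113
            1.98]   # 300 * P(X >= 10) = 300 * 0.0066

q = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1

# Liberal validity criterion as stated in the text.
min_expected = min(expected)
mean_expected = sum(expected) / len(expected)
criterion_ok = min_expected >= 1 and mean_expected > 5

print(f"Q = {q:.4f} on {df} df (table value at 0.10: 14.684)")
print(f"min E = {min_expected}, mean E = {mean_expected:.4f}, criterion met: {criterion_ok}")
```

Note that an empty cell still contributes to the statistic: with O = 0 the term (O − E)²/E is simply E, here 1.98.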

The upshot of it all is that no matter how you conduct the test, the data cast no doubt on the assertion that the number of exchanges has a Poisson distribution; the data are quite compatible with this assertion.


3. The Minitab output for the analysis is as follows:

MTB > chis c1-c3

Expected counts are printed below observed counts

              A        B        C    Total
    1        66       44       34      144
          61.06    47.39    35.54

    2        36       38       20       94
          39.86    30.94    23.20

    3        32       22       24       78
          33.08    25.67    19.25

Total       134      104       78      316

Chi-Sq = 0.399 + 0.243 + 0.067 +

0.374 + 1.613 + 0.442 +

0.035 + 0.525 + 1.170 = 4.868

DF = 4, P-Value = 0.301
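The Minitab figures can be cross-checked in a few lines of Python. This is only a sketch: the layout of the table (Minitab rows 1 to 3, columns A to C) is read off the output above, and the helper `chi2_sf_even_df` is my own, relying on the closed-form chi-squared tail that exists for even degrees of freedom.

```python
import math

# Observed counts: rows are Minitab rows 1-3, columns are A, B, C.
table = [
    [66, 44, 34],
    [36, 38, 20],
    [32, 22, 24],
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Expected count under independence: (row total * column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(table, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(table) - 1) * (len(table[0]) - 1)


def chi2_sf_even_df(x, df):
    """P(chi-squared > x) for EVEN df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!."""
    if df % 2 != 0:
        raise ValueError("closed form used here requires even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


p_value = chi2_sf_even_df(chi_sq, df)

print(f"Chi-Sq = {chi_sq:.3f}, DF = {df}, P-Value = {p_value:.3f}")
print(f"Expected count, row 2, column A: {expected[1][0]:.2f}")
```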

(a) The null hypothesis is $H_0$: There is no association between the classification according to the alleles at locus 1 and the classification according to the alleles at locus 2. This can also be stated as $H_0$: The alleles at locus 1 are independent of the alleles at locus 2, or even more succinctly as $H_0$: The two loci are in linkage equilibrium.

The alternative hypothesis is simply the contradiction of the null hypothesis, which may be stated as $H_a$: There is some sort of association between the classification according to the alleles at locus 1 and the classification according to the alleles at locus 2, or as $H_a$: The alleles at locus 1 and the alleles at locus 2 are dependent, or as $H_a$: The two loci are not in linkage equilibrium.

(b) From the Minitab output this expected number is 39.86.

(c) Since the p-value of the test is 0.301, which is large (much bigger than 0.10), we do not reject $H_0$ even at the 0.10 significance level. I.e. there is no evidence in the data against these loci being in linkage equilibrium.

(d) However we CANNOT conclude that the loci ARE in linkage equilibrium. We have simply “failed” to prove that they are not in linkage equilibrium. (Suppose that I tell you that I have a pet dragon in my garage.¹ You can’t prove that I don’t have a pet dragon in my garage. But this is a far cry from proving that I do have a pet dragon in my garage.)

¹ See “The Demon-Haunted World” by Carl Sagan.


4. (a)

(i) = $\frac{120 \times 402}{500} = 96.48$

(ii) = $\frac{32 \times 100}{500} = 6.40$

(iii) = $\frac{(34 - 26.40)^2}{26.40} = 2.188$

(iv) = $\frac{(10 - 5.12)^2}{5.12} = 4.651$

(v) = (4 − 1) × (3 − 1) = 6
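The five answers all follow from two formulas: an expected count is the product of the two marginal totals divided by n, and a cell's contribution is (O − E)²/E. A small Python sketch, with function names of my own and without assuming which of the two totals in each product is the row total:

```python
n = 500  # grand total


def expected_count(total_a, total_b, n):
    """Expected cell count under independence from the two marginal totals."""
    return total_a * total_b / n


def contribution(observed, expected):
    """One cell's contribution to the chi-squared statistic."""
    return (observed - expected) ** 2 / expected


part_i = expected_count(120, 402, n)
part_ii = expected_count(32, 100, n)
part_iii = contribution(34, 26.40)
part_iv = contribution(10, 5.12)
part_v = (4 - 1) * (3 - 1)  # degrees of freedom: (rows - 1) * (columns - 1)

print(f"(i) {part_i:.2f}  (ii) {part_ii:.2f}  (iii) {part_iii:.3f}  "
      f"(iv) {part_iv:.3f}  (v) {part_v}")
```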

(b) $H_0$: There is no association between machine and specification category.

$H_a$: There is some sort of association between machine and specification category.

(c) The table value at the 0.05 significance level is 12.592; since the observed value of the test statistic is 15.584 > 12.592 we reject $H_0$ at the 0.05 level (and hence a fortiori at the 0.10 level). The table value at the 0.01 significance level is 16.812; since the observed value of the test statistic is 15.584 < 16.812 we do not reject $H_0$ at the 0.01 level.

I.e. at the 0.10 and 0.05 significance levels there is evidence of an association between “machine” and “specification category”. However at the (tougher) 0.01 level the evidence of such an association is lacking.
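The same conclusion can be read off a p-value. The sketch below is my own addition: it uses the closed-form upper tail of the chi-squared distribution, which is valid here because 6 degrees of freedom is even, and it also confirms that the two table values correspond to tail areas of 0.05 and 0.01.

```python
import math


def chi2_sf_even_df(x, df):
    """P(chi-squared > x) for EVEN df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!."""
    if df % 2 != 0:
        raise ValueError("closed form used here requires even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


df = 6
q = 15.584  # observed test statistic quoted in the text

p_value = chi2_sf_even_df(q, df)
print(f"p-value = {p_value:.4f}")

# Compare with each significance level; rejecting means p-value < alpha.
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: reject H0 = {p_value < alpha}")

# Tail areas beyond the two table values used in the text.
print(f"P(chi2 > 12.592) = {chi2_sf_even_df(12.592, df):.4f}")
print(f"P(chi2 > 16.812) = {chi2_sf_even_df(16.812, df):.4f}")
```

The p-value is about 0.016, which lies between 0.01 and 0.05, matching the reject / reject / do-not-reject pattern above.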
