
Lecture 25

25.1 Goodness-of-fit for composite hypotheses.

(Textbook, Section 9.2)

Suppose that we have a sample of random variables $X_1, \ldots, X_n$ that can take a finite number of values $B_1, \ldots, B_r$ with unknown probabilities $p_1 = \mathbb{P}(X = B_1), \ldots, p_r = \mathbb{P}(X = B_r)$, and suppose that we want to test the hypothesis that this distribution comes from a parametric family $\{\mathbb{P}_\theta : \theta \in \Theta\}$. In other words, if we denote $p_j(\theta) = \mathbb{P}_\theta(X = B_j)$, we want to test:

$H_1$: $p_j = p_j(\theta)$ for all $j \le r$, for some $\theta \in \Theta$
$H_2$: otherwise.

If we wanted to test $H_1$ for one particular fixed $\theta$ we could use the statistic
\[
T = \sum_{j=1}^{r} \frac{(\nu_j - n p_j(\theta))^2}{n p_j(\theta)}
\]
and use a simple $\chi^2$ test from last lecture. The situation now is more complicated because we want to test if $p_j = p_j(\theta)$, $j \le r$, at least for some $\theta \in \Theta$, which means that we have many candidates for $\theta$.

One way to approach this problem is as follows.

(Step 1) Assuming that hypothesis $H_1$ holds, i.e. $\mathbb{P} = \mathbb{P}_\theta$ for some $\theta \in \Theta$, we can find an estimate $\theta^*$ of this unknown $\theta$, and then

(Step 2) try to test whether indeed the distribution is equal to $\mathbb{P}_{\theta^*}$ by using the statistic
\[
T = \sum_{j=1}^{r} \frac{(\nu_j - n p_j(\theta^*))^2}{n p_j(\theta^*)}
\]
in a $\chi^2$ test.
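For concreteness, once an estimate $\theta^*$ and the fitted cell probabilities $p_j(\theta^*)$ are available, the statistic $T$ is a one-line computation. The following is a minimal Python sketch (assuming NumPy; the function name is only illustrative) that takes the observed counts $\nu_j$ and the fitted probabilities:

```python
import numpy as np

def chi2_statistic(nu, p):
    """Pearson statistic T = sum_j (nu_j - n*p_j)^2 / (n*p_j)
    for observed cell counts nu and fitted cell probabilities p."""
    nu = np.asarray(nu, dtype=float)
    p = np.asarray(p, dtype=float)
    n = nu.sum()                 # total sample size
    expected = n * p             # expected counts under the fitted model
    return float(((nu - expected) ** 2 / expected).sum())
```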


This approach looks natural; the only question is what estimate $\theta^*$ to use and how the fact that $\theta^*$ also depends on the data will affect the convergence of $T$.

It turns out that if we let $\theta^*$ be the maximum likelihood estimate, i.e. the $\theta$ that maximizes the likelihood function
\[
\varphi(\theta) = p_1(\theta)^{\nu_1} \cdots p_r(\theta)^{\nu_r},
\]
then the statistic
\[
T = \sum_{j=1}^{r} \frac{(\nu_j - n p_j(\theta^*))^2}{n p_j(\theta^*)} \to \chi^2_{r-s-1}
\]
converges to the $\chi^2_{r-s-1}$ distribution with $r - s - 1$ degrees of freedom, where $s$ is the dimension of the parameter set $\Theta$. Of course, here we assume that $s \le r - 2$ so that we have at least one degree of freedom. Very informally, by dimension we understand the number of free parameters that describe the set $\Theta$, which we illustrate by the following examples.

1. The family of Bernoulli distributions $B(p)$ has only one free parameter $p \in [0,1]$ so that the set $\Theta = [0,1]$ has dimension $s = 1$.

2. The family of normal distributions $N(\mu, \sigma^2)$ has two free parameters $\mu \in \mathbb{R}$ and $\sigma^2 \ge 0$, and the set $\Theta = \mathbb{R} \times [0, \infty)$ has dimension $s = 2$.

3. Let us consider a family of all distributions on the set $\{0, 1, 2\}$. The distribution
\[
\mathbb{P}(X = 0) = p_1, \quad \mathbb{P}(X = 1) = p_2, \quad \mathbb{P}(X = 2) = p_3
\]
is described by the parameters $p_1$, $p_2$ and $p_3$. But since they are supposed to add up to one, $p_1 + p_2 + p_3 = 1$, one of these parameters is not free, for example, $p_3 = 1 - p_1 - p_2$. The remaining two parameters belong to the set $p_1 \in [0,1]$, $p_2 \in [0, 1 - p_1]$ shown in Figure 25.1, since their sum should not exceed one, and the dimension of this set is $s = 2$.

Figure 25.1: Free parameters of a three point distribution.
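Before turning to the example below, the claim that $T$ converges to $\chi^2_{r-s-1}$ can be checked by simulation. Here is a small Python sketch (assuming NumPy and SciPy) for an illustrative one-parameter family chosen only for this purpose: the cell probabilities of a Binomial$(3, \theta)$ distribution on $r = 4$ points, so that $s = 1$ and the limit should be $\chi^2_{4-1-1} = \chi^2_2$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def binom3_probs(theta):
    """Cell probabilities p_j(theta) = C(3, j) theta^j (1 - theta)^(3 - j), j = 0..3."""
    j = np.arange(4)
    return stats.binom.pmf(j, 3, theta)

def plug_in_T(nu):
    """Pearson statistic with the MLE of theta plugged in.
    For this family the MLE from the counts is theta* = sum_j j*nu_j / (3n)."""
    n = nu.sum()
    theta_star = (np.arange(4) * nu).sum() / (3 * n)
    expected = n * binom3_probs(theta_star)
    return ((nu - expected) ** 2 / expected).sum()

# Simulate samples from the family (theta_0 = 0.3) and compare the empirical
# 0.95-quantile of T with the chi^2 quantile for r - s - 1 = 4 - 1 - 1 = 2 d.o.f.
n, reps = 500, 2000
T_values = [plug_in_T(rng.multinomial(n, binom3_probs(0.3))) for _ in range(reps)]

print(np.quantile(T_values, 0.95))   # should be close to ...
print(stats.chi2.ppf(0.95, df=2))    # ... the chi^2_2 0.95-quantile (about 5.99)
```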


Example. (Textbook, p. 545) Suppose that a gene has two possible alleles $A_1$ and $A_2$, and the combinations of these alleles define three possible genotypes $A_1A_1$, $A_1A_2$ and $A_2A_2$. We want to test a theory that

Probability to pass $A_1$ to a child $= \theta$,
Probability to pass $A_2$ to a child $= 1 - \theta$,

and that the probabilities of the genotypes are given by
\[
p_1(\theta) = \mathbb{P}(A_1A_1) = \theta^2, \qquad
p_2(\theta) = \mathbb{P}(A_1A_2) = 2\theta(1-\theta), \qquad
p_3(\theta) = \mathbb{P}(A_2A_2) = (1-\theta)^2. \tag{25.1}
\]

Suppose that, given the sample $X_1, \ldots, X_n$ of the population, the counts of each genotype are $\nu_1$, $\nu_2$ and $\nu_3$.

To test the theory we want to test the hypotheses
$H_1$: $p_1 = p_1(\theta)$, $p_2 = p_2(\theta)$, $p_3 = p_3(\theta)$ for some $\theta \in [0, 1]$
$H_2$: otherwise.

First of all, the dimension of the parameter set is $s = 1$, since the family of distributions in (25.1) is described by one parameter $\theta$.

To find the MLE $\theta^*$ we have to maximize the likelihood function
\[
p_1(\theta)^{\nu_1} p_2(\theta)^{\nu_2} p_3(\theta)^{\nu_3}
\]
or, equivalently, maximize the log-likelihood
\[
\begin{aligned}
\log p_1(\theta)^{\nu_1} p_2(\theta)^{\nu_2} p_3(\theta)^{\nu_3}
&= \nu_1 \log p_1(\theta) + \nu_2 \log p_2(\theta) + \nu_3 \log p_3(\theta) \\
&= \nu_1 \log \theta^2 + \nu_2 \log 2\theta(1-\theta) + \nu_3 \log (1-\theta)^2 .
\end{aligned}
\]

To find the critical point we take the derivative, set it equal to 0 and solve for $\theta$, which gives (we omit these simple steps):
\[
\theta^* = \frac{2\nu_1 + \nu_2}{2n}.
\]
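For completeness, those omitted steps are short: differentiating the log-likelihood and setting the derivative to zero gives
\[
\frac{2\nu_1}{\theta} + \frac{\nu_2}{\theta} - \frac{\nu_2}{1-\theta} - \frac{2\nu_3}{1-\theta} = 0
\;\Longrightarrow\;
(2\nu_1 + \nu_2)(1-\theta) = (\nu_2 + 2\nu_3)\,\theta
\;\Longrightarrow\;
\theta^* = \frac{2\nu_1 + \nu_2}{2(\nu_1 + \nu_2 + \nu_3)} = \frac{2\nu_1 + \nu_2}{2n}.
\]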

Therefore, under the null hypothesis $H_1$ the statistic
\[
T = \frac{(\nu_1 - n p_1(\theta^*))^2}{n p_1(\theta^*)}
  + \frac{(\nu_2 - n p_2(\theta^*))^2}{n p_2(\theta^*)}
  + \frac{(\nu_3 - n p_3(\theta^*))^2}{n p_3(\theta^*)}
\to \chi^2_{r-s-1} = \chi^2_{3-1-1} = \chi^2_1
\]
converges to the $\chi^2_1$ distribution with one degree of freedom. If we take the level of significance $\alpha = 0.05$ and find the threshold $c$ so that
\[
0.05 = \alpha = \chi^2_1(T > c) \;\Longrightarrow\; c = 3.841,
\]

then we can use the following decision rule:

$H_1$: $T \le c = 3.841$
$H_2$: $T > c = 3.841$.
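Numerically, the whole test is a few lines. The following Python sketch (assuming NumPy and SciPy) uses made-up genotype counts purely for illustration, not the textbook's data:

```python
import numpy as np
from scipy import stats

# Illustrative genotype counts nu_1, nu_2, nu_3 (placeholders, not real data)
nu = np.array([85.0, 95.0, 20.0])
n = nu.sum()

# MLE: theta* = (2 nu_1 + nu_2) / (2n)
theta_star = (2 * nu[0] + nu[1]) / (2 * n)

# Fitted genotype probabilities p_1, p_2, p_3 from (25.1)
p = np.array([theta_star**2,
              2 * theta_star * (1 - theta_star),
              (1 - theta_star)**2])

# Pearson statistic and the chi^2_1 threshold at level alpha = 0.05
T = ((nu - n * p) ** 2 / (n * p)).sum()
c = stats.chi2.ppf(0.95, df=1)          # about 3.841

print(f"theta* = {theta_star:.3f}, T = {T:.3f}, c = {c:.3f}")
print("decide H1" if T <= c else "decide H2")
```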

General families. We could use a similar test when the distributions $\mathbb{P}_\theta$, $\theta \in \Theta$, are not necessarily supported by a finite number of points $B_1, \ldots, B_r$ (for example, continuous distributions). In this case, if we want to test the hypotheses
$H_1$: $\mathbb{P} = \mathbb{P}_\theta$ for some $\theta \in \Theta$
$H_2$: otherwise,
we can discretize them as we did in the last lecture (see Figure 25.2), i.e. consider the family of distributions
\[
p_j(\theta) = \mathbb{P}_\theta(X \in I_j) \quad \text{for } j \le r,
\]
and instead consider the derived hypotheses
$H_1$: $p_j = p_j(\theta)$ for some $\theta$, $j = 1, \ldots, r$
$H_2$: otherwise.

Figure 25.2: Goodness-of-fit for composite hypotheses ($\mathbb{P}_\theta$ discretized into intervals $I_1, I_2, \ldots, I_r$).
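To make the discretized test concrete, here is a minimal Python sketch (assuming NumPy and SciPy; the function name, the choice of the normal family, and the Nelder-Mead fit are illustrative choices, not prescribed by the notes). It fits $(\mu, \sigma)$ by maximizing the binned likelihood $\prod_j p_j(\theta)^{\nu_j}$, as in the result above, and compares $T$ to the $\chi^2_{r-s-1}$ quantile with $s = 2$:

```python
import numpy as np
from scipy import stats, optimize

def composite_gof_normal(x, inner_edges, alpha=0.05):
    """Chi-square test of H1: the data come from N(mu, sigma^2) for some (mu, sigma),
    after discretizing the line into r = len(inner_edges) + 1 cells
    (-inf, e_1], (e_1, e_2], ..., (e_{r-1}, +inf).  Needs r >= 4 so that r - 3 >= 1."""
    x = np.asarray(x, dtype=float)
    edges = np.concatenate(([-np.inf], np.asarray(inner_edges, dtype=float), [np.inf]))
    nu = np.array([np.sum((x > lo) & (x <= hi))
                   for lo, hi in zip(edges[:-1], edges[1:])])
    n, r = nu.sum(), len(nu)

    def cell_probs(mu, sigma):
        return np.diff(stats.norm.cdf(edges, loc=mu, scale=sigma))

    # Fit (mu, sigma) by maximizing the binned likelihood prod_j p_j(theta)^{nu_j}
    def neg_log_lik(params):
        mu, log_sigma = params
        p = np.clip(cell_probs(mu, np.exp(log_sigma)), 1e-12, None)
        return -(nu * np.log(p)).sum()

    res = optimize.minimize(neg_log_lik, x0=[x.mean(), np.log(x.std())],
                            method="Nelder-Mead")
    mu_star, sigma_star = res.x[0], np.exp(res.x[1])

    p = cell_probs(mu_star, sigma_star)
    T = ((nu - n * p) ** 2 / (n * p)).sum()
    dof = r - 2 - 1                              # r - s - 1 with s = 2 (mu and sigma)
    return T, stats.chi2.ppf(1 - alpha, dof)     # decide H2 if T exceeds the threshold
```

For instance, with four fixed cut points one gets $r = 5$ cells and $r - s - 1 = 2$ degrees of freedom.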
