This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this
material constitutes acceptance of that license and the conditions of use of materials on this site.
Copyright 2006, The Johns Hopkins University and Karl W. Broman. All rights reserved. Use of these materials
permitted only in accordance with license rights granted. Materials provided “AS IS”; no representations or
warranties provided. User assumes all responsibility for use, and all liability related thereto, and must independently
review all materials for accuracy and efficacy. May contain materials owned by others. User is responsible for
obtaining permissions for use from third parties as needed.
Goodness of fit tests
We observe data like that in the following table:
     AA   AB   BB
     35   43   22

We want to know: Do these data correspond reasonably to the proportions 1:2:1?
I have neglected to make precise the role of chance in this business.
Multinomial distribution
• Imagine an urn with k types of balls.
Let pi denote the proportion of type i.
• Draw n balls with replacement.
• Outcome: (n1, n2, . . . , nk), with Σi ni = n,
  where ni = no. of balls drawn that were of type i.
Examples
• The binomial distribution: the case k = 2.
• Self a heterozygous plant, obtain 50 progeny, and use test crosses to determine the genotype of each of the progeny.
• Obtain a random sample of 30 people from Hopkins, and classify them according to student/faculty/staff.
Multinomial probabilities
If 0 ≤ ni ≤ n and Σi ni = n, then

    P(X1=n1, . . . , Xk=nk) = [ n! / (n1! × · · · × nk!) ] × p1^n1 × · · · × pk^nk

Otherwise, P(X1=n1, . . . , Xk=nk) = 0.
Example
Let (p1, p2, p3) = (0.25, 0.50, 0.25) and n = 100. Then

    P(X1=35, X2=43, X3=22) = [ 100! / (35! 43! 22!) ] × 0.25^35 × 0.50^43 × 0.25^22 ≈ 7.3 × 10^-4
Rather brutal, numerically speaking.
The solution: take logs (and use a computer).
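For instance, here is a minimal sketch in R, using the counts and probabilities of the example above (dmultinom() works on the log scale when asked):

    ## Compute the multinomial probability on the log scale, then exponentiate.
    x <- c(35, 43, 22)
    p <- c(0.25, 0.50, 0.25)
    logpr <- dmultinom(x, prob = p, log = TRUE)
    exp(logpr)    # approximately 7.3e-04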
Goodness of fit test
We observe (n1, n2, n3) ∼ multinomial( n, (p1, p2, p3) ).
We seek to test H0 : p1 = 0.25, p2 = 0.5, p3 = 0.25 versus Ha : H0 is false.
We need:
(a) A test statistic
(b) The null distribution of the test statistic
Test statistics
Let n0i denote the expected count in group i if H0 is true.
LRT statistic:

    LRT = 2 ln [ Pr(data | p = MLE) / Pr(data | H0) ] = . . . = 2 Σi ni ln(ni / n0i)

χ2 test statistic:

    X2 = Σ (observed − expected)^2 / expected = Σi (ni − n0i)^2 / n0i
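As a minimal sketch in R (the function names lrt.stat and x2.stat are just illustrative):

    ## LRT and chi-square statistics for observed counts obs and null probabilities p0.
    lrt.stat <- function(obs, p0) {
      expd <- sum(obs) * p0                 # expected counts under H0
      2 * sum(obs * log(obs / expd))
    }
    x2.stat <- function(obs, p0) {
      expd <- sum(obs) * p0
      sum((obs - expd)^2 / expd)
    }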
Null distribution of test statistic
What values of LRT (or X2) should we expect, if H0 were true?
The null distributions of these statistics may be obtained by:
• Brute-force analytic calculations
• Computer simulations
• Asymptotic approximations
The brute-force method
    Pr(LRT = g | H0) = Σ Pr(n1, n2, n3 | H0), where the sum is over all (n1, n2, n3) giving LRT = g.
This is not feasible.
Computer simulation
1. Simulate a table conforming to the null hypothesis.
e.g., simulate (n1, n2, n3) ∼ multinomial( n=100, (1/4, 1/2, 1/4) )
2. Calculate your test statistic.
3. Repeat steps (1) and (2) many (e.g., 1000 or 10,000) times.
Estimated critical value = the 95th percentile of the results
Estimated P-value = the prop’n of results ≥ the observed value.
In R, use rmultinom(n, size, prob) to do n simulations of
a multinomial(size, prob).
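A minimal sketch of this recipe in R, using the χ2 statistic and the table above (the seed is arbitrary, chosen only for reproducibility):

    ## Simulation-based null distribution and P-value for the chi-square statistic.
    set.seed(1)
    p0   <- c(0.25, 0.50, 0.25)
    obs  <- c(35, 43, 22)
    expd <- 100 * p0
    x2.obs <- sum((obs - expd)^2 / expd)
    sims   <- rmultinom(10000, size = 100, prob = p0)   # one column per simulated table
    x2.sim <- apply(sims, 2, function(n) sum((n - expd)^2 / expd))
    quantile(x2.sim, 0.95)    # estimated critical value
    mean(x2.sim >= x2.obs)    # estimated P-value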
Asymptotic approximation
Very mathematically savvy people have shown that,
if the sample size, n, is large,
LRT ∼ χ2(k − 1)
X2 ∼ χ2(k − 1)
Example
We observe the following data:
     AA   AB   BB
     35   43   22
We imagine that these are counts
(n1, n2, n3) ∼ multinomial( n=100, (p1, p2, p3) ).
We seek to test H0 : p1 = 1/4, p2 = 1/2, p3 = 1/4.
We calculate LRT ≈ 4.96 and X2 ≈ 5.34.
Referring to the asymptotic approximations (χ2 dist’n with 2 degrees of freedom), we obtain P ≈ 8.4% (LRT) and P ≈ 6.9% (X2).
With 10,000 simulations under H0, we obtain P ≈ 8.9% (LRT) and P ≈ 7.4% (X2).
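These numbers can be checked directly in R with pchisq():

    ## Statistics and asymptotic P-values for the table above.
    obs  <- c(35, 43, 22)
    expd <- 100 * c(0.25, 0.50, 0.25)
    lrt  <- 2 * sum(obs * log(obs / expd))   # approximately 4.96
    x2   <- sum((obs - expd)^2 / expd)       # approximately 5.34
    1 - pchisq(c(lrt, x2), df = 2)           # approximately 0.084 and 0.069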
[Figures: estimated null distributions of the LRT statistic (95th %ile = 6.06) and of the chi-square statistic (95th %ile = 6.00), with the observed values marked.]
Summary and recommendation
For either the LRT or the χ2 test:
• The null distribution is approximately χ2(k − 1) if the sample size
is large.
• The null distribution can be approximated by simulating data
under the null hypothesis.
If the sample size is sufficiently large that the expected count in
each cell is ≥ 5, use the asymptotic approximation without worries.
Otherwise, consider using computer simulations.
Composite hypotheses
Sometimes, we ask not whether

    pAA = 0.25, pAB = 0.5, pBB = 0.25

but rather something like:

    pAA = f^2, pAB = 2f(1 − f), pBB = (1 − f)^2    (for some f)
For example: genotypes of a random sample of individuals at a diallelic locus.
Question: Is the locus in Hardy-Weinberg equilibrium (as expected
in the case of random mating)?
Example data:

     AA   AB   BB
      5   20   75
Another example
ABO blood groups; 3 alleles A, B, O.
Phenotype A  = genotype AA or AO
          B  = genotype BB or BO
          AB = genotype AB
          O  = genotype OO
Allele frequencies: fA, fB, fO
(Note that fA + fB + fO = 1)
Under Hardy-Weinberg equilibrium, we expect:

    pA  = fA^2 + 2 fA fO
    pB  = fB^2 + 2 fB fO
    pAB = 2 fA fB
    pO  = fO^2
Example data:

      O    A    B   AB
    104   91   36   19
LRT for example 1
Data: (nAA, nAB, nBB) ∼ multinomial( n, (pAA, pAB, pBB) )
We seek to test whether the data conform reasonably to
H0: pAA = f^2, pAB = 2f(1 − f), pBB = (1 − f)^2    (for some f)
General MLEs:
p̂AA = nAA/n, p̂AB = nAB/n, p̂BB = nBB/n
MLE under H0:

    f̂ = (nAA + nAB/2)/n   −→   p̃AA = f̂^2, p̃AB = 2 f̂ (1 − f̂), p̃BB = (1 − f̂)^2
LRT statistic:

    LRT = 2 × ln [ Pr(nAA, nAB, nBB | p̂AA, p̂AB, p̂BB) / Pr(nAA, nAB, nBB | p̃AA, p̃AB, p̃BB) ]
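A brief R sketch of these calculations, using the example 1 data (AA = 5, AB = 20, BB = 75):

    ## MLE of f under Hardy-Weinberg, expected counts, and the two statistics.
    n.obs <- c(5, 20, 75)                    # AA, AB, BB
    n     <- sum(n.obs)
    f.hat <- (n.obs[1] + n.obs[2]/2) / n     # 0.15
    expd  <- n * c(f.hat^2, 2*f.hat*(1 - f.hat), (1 - f.hat)^2)
    lrt   <- 2 * sum(n.obs * log(n.obs / expd))   # approximately 3.87
    x2    <- sum((n.obs - expd)^2 / expd)         # approximately 4.65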
LRT for example 2
Data: (nO, nA, nB, nAB) ∼ multinomial( n, (pO, pA, pB, pAB) )
We seek to test whether the data conform reasonably to
H0: pA = fA^2 + 2 fA fO, pB = fB^2 + 2 fB fO, pAB = 2 fA fB, pO = fO^2
(for some fO, fA, fB, where fO + fA + fB = 1)
General MLEs:
p̂O, p̂A, p̂B, p̂AB, like before.
MLE under H0: Requires numerical optimization.
Call them (f̂O, f̂A, f̂B) −→ (p̃O, p̃A, p̃B, p̃AB)
LRT statistic:

    LRT = 2 × ln [ Pr(nO, nA, nB, nAB | p̂O, p̂A, p̂B, p̂AB) / Pr(nO, nA, nB, nAB | p̃O, p̃A, p̃B, p̃AB) ]
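A minimal sketch of that numerical optimization in R, with the example 2 data; the parameterization over (fA, fB), with fO = 1 − fA − fB, and the starting values are just one reasonable choice:

    ## MLEs of the allele frequencies under H0 by numerical optimization.
    n.obs <- c(104, 91, 36, 19)               # O, A, B, AB
    negloglik <- function(f) {                # f = c(fA, fB); fO = 1 - fA - fB
      fA <- f[1]; fB <- f[2]; fO <- 1 - fA - fB
      if (fA <= 0 || fB <= 0 || fO <= 0) return(1e10)   # keep the search inside the simplex
      p <- c(fO^2, fA^2 + 2*fA*fO, fB^2 + 2*fB*fO, 2*fA*fB)
      -sum(n.obs * log(p))
    }
    out   <- optim(c(0.3, 0.1), negloglik)    # Nelder-Mead by default
    f.hat <- c(fA = out$par[1], fB = out$par[2], fO = 1 - sum(out$par))
    f.hat    # roughly fA = 0.25, fB = 0.12, fO = 0.63 (see the results below)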
χ2 test for these examples
• Obtain the MLE(s) under H0.
• Calculate the corresponding cell probabilities.
• Turn these into (estimated) expected counts under H0.
• Calculate

    X2 = Σ (observed − expected)^2 / expected
Null distribution for these cases
• Computer simulation (with one wrinkle; a sketch in R follows this list):
– Simulate data under H0 (plug in the MLEs for the observed
data)
– Calculate the MLE with the simulated data
– Calculate the test statistic with the simulated data
– Repeat many times.
• Asymptotic approximation
– Under H0, if the sample size, n, is large, both the LRT statistic and the χ2 statistic follow, approximately, a χ2 distribution
with k – s – 1 degrees of freedom, where s = no. parameters
estimated under H0.
– Note that s = 1 for example 1, and s = 2 for example 2, and
so df = 1 for both examples.
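Here is a minimal R sketch of the simulation recipe with the wrinkle, for example 1 (the Hardy-Weinberg case); f is re-estimated from each simulated table, and the seed is arbitrary:

    ## Parametric simulation under H0 for example 1, re-estimating f each time.
    set.seed(1)
    n.obs <- c(5, 20, 75)
    n     <- sum(n.obs)
    x2.stat <- function(counts) {
      f    <- (counts[1] + counts[2]/2) / sum(counts)         # MLE from these counts
      expd <- sum(counts) * c(f^2, 2*f*(1 - f), (1 - f)^2)
      sum((counts - expd)^2 / expd)
    }
    f.hat <- (n.obs[1] + n.obs[2]/2) / n
    p.hat <- c(f.hat^2, 2*f.hat*(1 - f.hat), (1 - f.hat)^2)
    sims   <- rmultinom(10000, size = n, prob = p.hat)   # simulate under H0 at the fitted MLE
    x2.sim <- apply(sims, 2, x2.stat)
    mean(x2.sim >= x2.stat(n.obs))   # estimated P-value (the slides report about 2.4%)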
Results, example 1
Example data:

     AA   AB   BB
      5   20   75

MLE:  f̂ = (5 + 20/2) / 100 = 15%

Expected counts:

       AA     AB      BB
     2.25   25.5   72.25

Test statistics:  LRT statistic = 3.87,  X2 = 4.65

Asymptotic χ2(df = 1) approx’n:   P ≈ 4.9% (LRT),  P ≈ 3.1% (X2)
10,000 computer simulations:      P ≈ 8.2% (LRT),  P ≈ 2.4% (X2)
[Figures: estimated null distributions of the LRT statistic (95th %ile = 4.58) and of the chi-square statistic (95th %ile = 3.36), with the observed values marked.]
Results, example 2
Example data:

      O    A    B   AB
    104   91   36   19

MLE:  f̂O ≈ 62.8%, f̂A ≈ 25.0%, f̂B ≈ 12.2%

Expected counts:

       O      A      B     AB
    98.5   94.2   42.0   15.3

Test statistics:  LRT statistic = 1.99,  X2 = 2.10

Asymptotic χ2(df = 1) approx’n:   P ≈ 16% (LRT),  P ≈ 15% (X2)
10,000 computer simulations:      P ≈ 17% (LRT),  P ≈ 15% (X2)
[Figures: estimated null distributions of the LRT statistic (95th %ile = 3.91) and of the chi-square statistic (95th %ile = 3.86), with the observed values marked.]
Example 3
Data on no. of sperm bound to an egg:

    no. sperm     0    1    2    3    4    5
    count        26    4    4    2    1    1
Q: Do these follow a Poisson distribution?
If X ∼ Poisson(λ), Pr(X = i) = e^(−λ) λ^i / i!, where λ = mean.

χ2 and likelihood ratio tests

MLE:  λ̂ = sample average = (0 × 26 + 1 × 4 + 2 × 4 + . . . + 5 × 1) / 38 ≈ 0.71

Expected counts:  n0i = n × e^(−λ̂) λ̂^i / i!
    no. sperm     0      1     2     3     4     5
    observed     26      4     4     2     1     1
    expected   18.7   13.3   4.7   1.1   0.2   0.0
    X2 = Σ (obs − exp)^2 / exp = . . . = 42.8

    LRT = 2 Σ obs log(obs/exp) = . . . = 18.8

Compare to χ2(df = 6 − 1 − 1 = 4) −→ p-value = 1 × 10^−8 (χ2) and 9 × 10^−4 (LRT).
By simulation: p-value = 16/10,000 (χ2) and 7/10,000 (LRT)
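A rough R sketch of these calculations, following the expected-count formula above; depending on how the final cell is handled, the statistics may differ somewhat from the values printed above:

    ## Poisson goodness of fit for the sperm-count data.
    counts <- c(26, 4, 4, 2, 1, 1)                  # no. eggs with 0, 1, ..., 5 bound sperm
    i      <- 0:5
    n      <- sum(counts)                           # 38
    lambda.hat <- sum(i * counts) / n               # approximately 0.71
    expd <- n * exp(-lambda.hat) * lambda.hat^i / factorial(i)
    x2   <- sum((counts - expd)^2 / expd)
    lrt  <- 2 * sum(counts * log(counts / expd))
    1 - pchisq(c(x2, lrt), df = 4)                  # asymptotic P-values (df = 6 - 1 - 1)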
[Figures: null simulation results: histograms of the simulated χ2 and LRT statistics, with the observed values marked.]
A final note
With these sorts of goodness-of-fit tests, we are often happy when our model does fit.
In other words, we often prefer to fail to reject H0.
Such a conclusion, that the data fit the model reasonably well,
should be phrased and considered with caution.
We should think: how much power do I have to detect, with these
limited data, a reasonable deviation from H0?