
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this
material constitutes acceptance of that license and the conditions of use of materials on this site.
Copyright 2006, The Johns Hopkins University and Karl Broman. All rights reserved. Use of these materials
permitted only in accordance with license rights granted. Materials provided “AS IS”; no representations or
warranties provided. User assumes all responsibility for use, and all liability related thereto, and must independently
review all materials for accuracy and efficacy. May contain materials owned by others. User is responsible for
obtaining permissions for use from third parties as needed.
Estimation
Goal: Estimate a population parameter, θ
Data: X1, X2, . . . , Xn ∼ iid with distribution depending on θ
If one has many estimators to choose from, pick
• the one with the smallest SE, among all unbiased estimators, or
• the one with the smallest RMS (root mean squared) error, even if biased
Sometimes it’s not clear how to form even one good estimator.
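The Python sketch below makes these criteria concrete. It takes estimates of a known θ from repeated simulated data sets and summarizes them by bias, SE and RMSE. The function name `bias_se_rmse` is hypothetical, and so are the five estimates, which only illustrate the arithmetic. When SE is computed with divisor m, RMSE² = bias² + SE² holds exactly. That identity is why a small bias can be worth accepting if it buys a smaller SE.

```python
import math


def bias_se_rmse(estimates, theta):
    """Summarize an estimator from repeated estimates of a known theta."""
    m = len(estimates)
    mean = sum(estimates) / m
    bias = mean - theta
    # SE: spread of the estimates around their own mean (divisor m)
    se = math.sqrt(sum((e - mean) ** 2 for e in estimates) / m)
    # RMSE: typical distance of the estimates from the true value
    rmse = math.sqrt(sum((e - theta) ** 2 for e in estimates) / m)
    return bias, se, rmse


# Hypothetical estimates, for illustration only
b, s, r = bias_se_rmse([0.08, 0.09, 0.10, 0.11, 0.07], theta=0.10)
print(b, s, r)
# With divisor m, RMSE^2 equals bias^2 + SE^2 (up to rounding error)
print(abs(r ** 2 - (b ** 2 + s ** 2)) < 1e-12)
```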
Example 1
Consider the problem of estimating the recombination fraction
(call it θ) between two genetic markers in an intercross.
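[Figure: intercross diagram. Parental lines AABB and aabb are crossed to produce F1 double heterozygotes (haplotypes AB and ab). The F1 are then intercrossed to produce F2 offspring, each carrying two haplotypes.]
Note: We won't observe the haplotypes.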
Example 1
Data (rows: B-locus genotype; columns: A-locus genotype)

        AA    Aa    aa
BB      58     9     0
Bb       8    95    14
bb       1    12    53

Probabilities

        AA             Aa                  aa
BB      ¼(1 − θ)²      ½θ(1 − θ)           ¼θ²
Bb      ½θ(1 − θ)      ½[θ² + (1 − θ)²]    ½θ(1 − θ)
bb      ¼θ²            ½θ(1 − θ)           ¼(1 − θ)²
Question: Possible estimates of the recombination fraction, θ?
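The probability table can be checked by combining two independent gametes from the F1. This sketch assumes the F1 carries haplotypes AB and ab, as in the intercross above. It therefore produces non-recombinant gametes AB and ab with probability (1 − θ)/2 each, and recombinant gametes Ab and aB with probability θ/2 each. The function names are hypothetical.

```python
from itertools import product


def gamete_probs(theta):
    """Gamete frequencies from an F1 with haplotypes AB/ab (assumed phase)."""
    return {
        ("A", "B"): (1 - theta) / 2,  # non-recombinant
        ("a", "b"): (1 - theta) / 2,  # non-recombinant
        ("A", "b"): theta / 2,        # recombinant
        ("a", "B"): theta / 2,        # recombinant
    }


def genotype_probs(theta):
    """Two-locus F2 genotype probabilities; phase is not observed."""
    g = gamete_probs(theta)
    probs = {}
    for (g1, p1), (g2, p2) in product(g.items(), repeat=2):
        # Sort alleles so that "aA" and "Aa" count as the same genotype
        a_geno = "".join(sorted(g1[0] + g2[0]))
        b_geno = "".join(sorted(g1[1] + g2[1]))
        key = (a_geno, b_geno)
        probs[key] = probs.get(key, 0.0) + p1 * p2
    return probs


probs = genotype_probs(0.1)
print(probs[("AA", "BB")], probs[("Aa", "Bb")], sum(probs.values()))
```

The double heterozygote AaBb collects both AB × ab (no recombinants) and Ab × aB (two recombinants). This is why its entry is ½[θ² + (1 − θ)²].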
Maximum likelihood estimation
Likelihood function: L(θ) = Pr(data | θ)
Log likelihood: l(θ) = log Pr(data | θ)
Maximum likelihood estimate:
Choose, as the estimate of θ, the value of θ
for which L(θ) is maximized. Since log is increasing, this is also the value that maximizes l(θ).
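For the example,
$$L(\theta) \propto \left[\tfrac{1}{4}(1-\theta)^2\right]^{58+53} \times \left[\tfrac{1}{2}\theta(1-\theta)\right]^{9+8+14+12} \times \left[\tfrac{1}{4}\theta^2\right]^{1+0} \times \left\{\tfrac{1}{2}\left[\theta^2 + (1-\theta)^2\right]\right\}^{95}$$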
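Here is a minimal sketch of the likelihood calculation for the Example 1 data. It evaluates l(θ) on a fine grid over 0 < θ ≤ 0.5 and takes the largest value. Grid search is used only for simplicity; the slides do not say how their MLE was computed. The names `COUNTS`, `cell_probs` and `log_lik` are hypothetical.

```python
import math

# Observed Example 1 counts: (A genotype, B genotype) -> number of F2 individuals
COUNTS = {
    ("AA", "BB"): 58, ("Aa", "BB"): 9, ("aa", "BB"): 0,
    ("AA", "Bb"): 8, ("Aa", "Bb"): 95, ("aa", "Bb"): 14,
    ("AA", "bb"): 1, ("Aa", "bb"): 12, ("aa", "bb"): 53,
}


def cell_probs(theta):
    """Intercross genotype probabilities, as in the probability table."""
    r, s = theta, 1 - theta
    return {
        ("AA", "BB"): s * s / 4, ("Aa", "BB"): r * s / 2, ("aa", "BB"): r * r / 4,
        ("AA", "Bb"): r * s / 2, ("Aa", "Bb"): (r * r + s * s) / 2, ("aa", "Bb"): r * s / 2,
        ("AA", "bb"): r * r / 4, ("Aa", "bb"): r * s / 2, ("aa", "bb"): s * s / 4,
    }


def log_lik(theta):
    probs = cell_probs(theta)
    # Cells with zero count contribute nothing (and skipping them avoids log(0))
    return sum(n * math.log(probs[cell]) for cell, n in COUNTS.items() if n > 0)


# Grid search over 0.0001, 0.0002, ..., 0.5
grid = [i / 10000 for i in range(1, 5001)]
theta_hat = max(grid, key=log_lik)
print(f"MLE of theta: {theta_hat:.3f}")  # prints: MLE of theta: 0.094
```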
Example 1: Log likelihood function
[Figure: log likelihood plotted against the recombination fraction, from 0 to 0.5; MLE = 9.4%]
A closer view
[Figure: the same log likelihood for recombination fractions from 0.06 to 0.14; MLE = 9.4%]
A comparison of two estimators
A simple estimator: assume double heterozygotes are non-recombinant.
For the case n = 250, θ = 0.10, the results of 1000 simulations:

          Bias     SE      RMSE
Simple   −0.005   0.013   0.014
MLE       0.000   0.014   0.014

[Figure: histograms of the 1000 estimates from the simple estimator and from the MLE; horizontal axis: estimate, 0.06 to 0.14]
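The simulation summarized above can be approximated as follows. The sketch makes one assumption about the simple estimator. It reads "assume double heterozygotes are non-recombinant" as: count recombinant gametes, score each AaBb individual as having none, and divide by 2n. Under that reading the expected value is θ − θ²/2, which gives a bias of −0.005 at θ = 0.10, in line with the table. The MLE is found by grid search, and the seed is arbitrary, so the printed bias, SE and RMSE will be close to the tabled values but not identical. All names are hypothetical.

```python
import math
import random
from collections import Counter


def class_probs(theta):
    """The nine genotype cells collapsed into four classes of equal probability."""
    r, s = theta, 1 - theta
    return {
        "nonrec_hom": s * s / 2,            # AABB, aabb: 0 recombinant gametes
        "one_rec": 2 * r * s,               # AABb, AaBB, Aabb, aaBb: 1 recombinant
        "two_rec": r * r / 2,               # AAbb, aaBB: 2 recombinants
        "double_het": (r * r + s * s) / 2,  # AaBb: 0 or 2 recombinants, unobserved
    }


def simple_estimate(k):
    """Recombinant-gamete proportion, scoring every AaBb as non-recombinant."""
    n = sum(k.values())
    return (k["one_rec"] + 2 * k["two_rec"]) / (2 * n)


GRID = [i / 1000 for i in range(1, 501)]  # candidate theta: 0.001, ..., 0.500


def mle(k):
    """Grid-search MLE; constants not involving theta are dropped."""
    def loglik(t):
        return ((2 * k["nonrec_hom"] + k["one_rec"]) * math.log(1 - t)
                + (k["one_rec"] + 2 * k["two_rec"]) * math.log(t)
                + k["double_het"] * math.log(t * t + (1 - t) ** 2))
    return max(GRID, key=loglik)


def simulate(n=250, theta=0.10, n_sims=1000, seed=1):
    rng = random.Random(seed)
    probs = class_probs(theta)
    labels, weights = list(probs), list(probs.values())
    out = {"Simple": [], "MLE": []}
    for _ in range(n_sims):
        # Counter returns 0 for classes that do not occur in a simulated data set
        k = Counter(rng.choices(labels, weights=weights, k=n))
        out["Simple"].append(simple_estimate(k))
        out["MLE"].append(mle(k))
    return out


def summarize(estimates, theta):
    m = len(estimates)
    mean = sum(estimates) / m
    se = math.sqrt(sum((e - mean) ** 2 for e in estimates) / m)
    rmse = math.sqrt(sum((e - theta) ** 2 for e in estimates) / m)
    return mean - theta, se, rmse


# Observed Example 1 data, collapsed: 58+53, 9+8+14+12, 1+0, 95
observed = Counter(nonrec_hom=111, one_rec=43, two_rec=1, double_het=95)
print("simple:", simple_estimate(observed), "MLE:", mle(observed))

results = simulate()
for name, ests in results.items():
    bias, se, rmse = summarize(ests, 0.10)
    print(f"{name:6s} bias {bias:+.3f}  SE {se:.3f}  RMSE {rmse:.3f}")
```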
Example 2
Suppose x ∼ binomial(n, p).
log likelihood function:
$$l(p) = \log\left[\binom{n}{x} p^{x} (1-p)^{n-x}\right] = x\log(p) + (n-x)\log(1-p) + \text{constant}$$
MLE: the obvious thing: p̂ = x/n
[Figure: log likelihood plotted against p, for n = 100 and x = 22; MLE = 0.22]
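This is a quick numerical check that the grid maximum of the binomial log likelihood falls at x/n for the plotted case n = 100, x = 22. The helper name `binom_loglik` is hypothetical. The constant log C(n, x) is dropped because it does not involve p.

```python
import math


def binom_loglik(p, n, x):
    """Binomial log likelihood without the constant log C(n, x)."""
    return x * math.log(p) + (n - x) * math.log(1 - p)


n, x = 100, 22
grid = [i / 10000 for i in range(1, 10000)]  # 0.0001, ..., 0.9999
p_hat = max(grid, key=lambda p: binom_loglik(p, n, x))
print(p_hat, x / n)  # prints: 0.22 0.22
```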
Example 3
Suppose x1, . . . , x20 ∼ iid Poisson(λ).
log likelihood function:
$$l(\lambda) = \log \prod_{i} \frac{e^{-\lambda}\lambda^{x_i}}{x_i!} = \cdots = -20\lambda + \Big(\sum_i x_i\Big)\log\lambda + \text{constant}$$
MLE: the obvious thing: λ̂ = x̄
[Figure: log likelihood plotted against λ, for n = 20 and mean = 1.5; MLE = 1.5]
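The same check works for the Poisson example. The log likelihood depends on the data only through Σxᵢ, so the sketch needs only n = 20 and the reported mean 1.5, which give Σxᵢ = 30. The helper name `poisson_loglik` is hypothetical.

```python
import math


def poisson_loglik(lam, n, total):
    """Poisson log likelihood up to a constant: -n*lam + (sum of x_i) * log(lam)."""
    return -n * lam + total * math.log(lam)


n, mean = 20, 1.5
total = n * mean  # sum of the x_i implied by the reported mean
grid = [i / 1000 for i in range(1, 5001)]  # 0.001, ..., 5.0
lam_hat = max(grid, key=lambda lam: poisson_loglik(lam, n, total))
print(lam_hat)  # prints: 1.5
```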
Example 4
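Suppose x1, . . . , xn ∼ iid N(µ, σ²).
log likelihood function:
$$l(\mu, \sigma) = \log \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x_i-\mu}{\sigma}\right)^{2}\right]$$
MLEs: almost the obvious things:
$$\hat\mu = \bar x, \qquad \hat\sigma = \sqrt{\textstyle\sum_i (x_i-\bar x)^2 / n}$$
("Almost", because σ̂ divides by n rather than by n − 1 as in the usual sample SD.)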
Example 4: the log likelihood surface
[Figure: log likelihood surface over µ and σ for a sample of n = 100, with the MLE marked]
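Here is a small sketch of the normal MLEs. The data values are hypothetical. It shows two things. First, σ̂ uses divisor n, so it matches Python's `statistics.pstdev` rather than `statistics.stdev` (divisor n − 1). Second, moving either parameter away from (µ̂, σ̂) lowers the log likelihood. The function names are hypothetical.

```python
import math
import statistics


def normal_mles(xs):
    """MLEs for iid N(mu, sigma^2) data: sample mean and divisor-n SD."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in xs) / n)
    return mu_hat, sigma_hat


def normal_loglik(mu, sigma, xs):
    return sum(-math.log(sigma * math.sqrt(2 * math.pi))
               - 0.5 * ((x - mu) / sigma) ** 2 for x in xs)


xs = [9.0, 10.5, 11.0, 12.5]  # hypothetical data, for illustration only
mu_hat, sigma_hat = normal_mles(xs)
# sigma_hat equals the divisor-n SD, not the divisor-(n-1) sample SD
print(mu_hat, sigma_hat, statistics.pstdev(xs), statistics.stdev(xs))

# Nudging either parameter away from the MLE lowers the log likelihood
best = normal_loglik(mu_hat, sigma_hat, xs)
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
lower = all(normal_loglik(mu_hat + dm, sigma_hat + ds, xs) < best for dm, ds in nudges)
print(lower)
```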
About MLEs
Maximum likelihood estimation is a general procedure for finding
a reasonable estimator
• In many cases, the MLE turns out to be the obvious thing.
• MLEs are often good (but not necessarily the best) estimators
– Nearly unbiased
– Small SE
• Sometimes obtaining the MLEs requires hefty computation
Example 5: ABO blood groups
Phenotype   Genotype    Frequency
O           OO          pO²
A           AA or AO    pA² + 2pA pO
B           BB or BO    pB² + 2pB pO
AB          AB          2pA pB
Frequencies under the assumption of Hardy-Weinberg equilibrium.
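The Hardy-Weinberg phenotype frequencies in the table can be written as a small function. Together they expand (pO + pA + pB)², so they sum to 1 whenever the allele frequencies do. The function name and the allele frequencies used below are hypothetical.

```python
def abo_phenotype_freqs(p_o, p_a, p_b):
    """ABO phenotype frequencies under Hardy-Weinberg equilibrium."""
    return {
        "O": p_o ** 2,
        "A": p_a ** 2 + 2 * p_a * p_o,  # AA or AO
        "B": p_b ** 2 + 2 * p_b * p_o,  # BB or BO
        "AB": 2 * p_a * p_b,
    }


freqs = abo_phenotype_freqs(0.6, 0.3, 0.1)  # hypothetical allele frequencies
print(freqs, sum(freqs.values()))
```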
Example 5: Data
Phenotype   No. subjects   % subjects
O           117            46.8%
A            98            39.2%
B            29            11.6%
AB            6             2.4%
Total       250           100%
Question: Estimates of pA, pB, pO?
Example 5: Estimates
Simple estimates
$$\tilde p_O = \sqrt{0.468} = 0.684$$
$$\tilde p_A \text{ solves } \tilde p_A^2 + 2\tilde p_A(0.684) = 0.392 \implies \tilde p_A = 0.243$$
$$\tilde p_B = 0.024/(2\tilde p_A) = 0.072$$
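The first two simple estimates can be reproduced directly. p̃O is the square root of the O frequency, and p̃A is the positive root of the quadratic p̃A² + 2p̃A p̃O − 0.392 = 0. This is a minimal sketch, and the variable names are illustrative.

```python
import math

n_total = 250
freq = {"O": 117 / n_total, "A": 98 / n_total, "B": 29 / n_total, "AB": 6 / n_total}

# The O phenotype frequency is pO^2
p_o = math.sqrt(freq["O"])
# Positive root of pA^2 + 2*pA*pO - freq_A = 0
p_a = -p_o + math.sqrt(p_o ** 2 + freq["A"])
print(round(p_o, 3), round(p_a, 3))  # prints: 0.684 0.243
```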
Log likelihood:
$$l(p_O, p_A, p_B) = 117\log(p_O^2) + 98\log(p_A^2 + 2p_Ap_O) + 29\log(p_B^2 + 2p_Bp_O) + 6\log(2p_Ap_B)$$
Example 5: log likelihood
[Figure: log likelihood as a function of pO and pA, with the MLE marked at pO = 0.690, pA = 0.237, pB = 0.073]
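The slide does not say how the MLE was computed. One standard approach for ABO data, swapped in here, is the EM ("gene-counting") algorithm. Each step uses the current allele frequencies to split the A and B phenotype counts into expected homozygotes and heterozygotes, then recounts alleles. The function name `abo_em`, the starting values and the tolerance are assumptions of this sketch.

```python
def abo_em(n_o, n_a, n_b, n_ab, tol=1e-10, max_iter=1000):
    """EM (gene-counting) estimates of (pO, pA, pB) from ABO phenotype counts."""
    n = n_o + n_a + n_b + n_ab
    p_o = p_a = p_b = 1 / 3  # arbitrary starting values
    for _ in range(max_iter):
        # E-step: expected AA vs AO among type A, and BB vs BO among type B
        n_aa = n_a * p_a ** 2 / (p_a ** 2 + 2 * p_a * p_o)
        n_ao = n_a - n_aa
        n_bb = n_b * p_b ** 2 / (p_b ** 2 + 2 * p_b * p_o)
        n_bo = n_b - n_bb
        # M-step: allele frequencies are expected allele counts over 2n
        new_a = (2 * n_aa + n_ao + n_ab) / (2 * n)
        new_b = (2 * n_bb + n_bo + n_ab) / (2 * n)
        new_o = (2 * n_o + n_ao + n_bo) / (2 * n)
        change = abs(new_a - p_a) + abs(new_b - p_b) + abs(new_o - p_o)
        p_o, p_a, p_b = new_o, new_a, new_b
        if change < tol:
            break
    return p_o, p_a, p_b


p_o, p_a, p_b = abo_em(117, 98, 29, 6)
print(f"{p_o:.3f} {p_a:.3f} {p_b:.3f}")  # prints: 0.690 0.237 0.073
```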