Slide set 26
Stat 330 (Spring 2015)
Last update: March 30, 2015
Estimators (Cont’d)
Review: What is an estimator? What are estimates? What are the properties we use to compare estimators?
Example: The sample mean x̄ is consistent for µ. That means that, as the sample size gets large, X̄ gets very close to µ in the sense of probability.
Derivation: Using Chebyshev's inequality, for any ε > 0,

P(|X̄ − µ| > ε) ≤ Var(X̄)/ε² = σ²/(nε²),

so that as n → ∞,

P(|X̄ − µ| > ε) → 0,

which means X̄ is consistent for µ.
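A quick numerical illustration of this (not from the slides; the population N(10, 2²), the sample sizes, and the seed are arbitrary choices) is to simulate x̄ for growing n and watch it settle near µ:

```python
import numpy as np

# Sketch: the sample mean approaches mu as n grows (consistency).
# Population N(mu=10, sigma=2) is an arbitrary illustrative choice.
rng = np.random.default_rng(330)
mu, sigma = 10.0, 2.0

for n in [10, 100, 1_000, 10_000, 100_000]:
    x_bar = rng.normal(mu, sigma, size=n).mean()
    print(f"n = {n:7d}   x_bar = {x_bar:.4f}   |x_bar - mu| = {abs(x_bar - mu):.4f}")
```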
Estimating parameters
Method of moments: this is one of the methods available for estimating parameters. The basic idea is to equate the population moments with the sample moments computed from the sample (x1, · · · , xn), where xi are the observed values of Xi for i = 1, . . . , n.
♣ The k-th population moment is defined as

µk = E(X^k).

♦ For example: E(X) = µ1 = µ, Var(X) = µ2 − µ1².
♣ The k-th sample moment is defined as

mk = (1/n) Σ_{i=1}^n xi^k.

♦ For example: m1 = (1/n) Σ_{i=1}^n xi = x̄.
Estimating parameters (Cont’d)
♣ The k-th population central moment is defined as

µ′k = E((X − µ)^k).

♣ The k-th sample central moment is defined as

m′k = (1/n) Σ_{i=1}^n (xi − x̄)^k,

where xi is the realization/sample value of Xi and x̄ is the sample mean.
♠ To estimate k parameters, equate the first k population and sample moments:

µi = mi, i = 1, 2, · · · , k.

This gives k equations to solve for the k unknown parameters.
Estimating parameters: Examples
Example 1: To estimate the parameter λ of the Poisson distribution, we need to set

µ1 = m1 ⇐⇒ λ = m1 = x̄, since µ1 = E(X) = λ.

There is only one unknown; solving for λ we obtain

λ̂ = x̄,

which is the method of moments estimator (MoM estimator, for short) of λ.
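As a quick sanity check (an illustrative sketch; the true λ = 4 and n = 500 are arbitrary choices, not from the slides):

```python
import numpy as np

# Sketch: the MoM estimator of a Poisson rate is just the sample mean.
rng = np.random.default_rng(1)
x = rng.poisson(lam=4.0, size=500)   # simulated data, true lambda = 4
lam_hat = x.mean()                   # lambda-hat = x-bar
print(f"lambda-hat = {lam_hat:.3f} (true value 4.0)")
```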
Example 2: (Roll a die) A die is rolled until a face shows a 6. The experiment is repeated 20 times, and the numbers of trials needed to show a 6 are

3, 9, 1, 6, 11, 10, 4, 1, 1, 4, 4, 10, 17, 1, 23, 3, 2, 7, 2, 3.

Estimate the probability of getting a 6, say θ, using the method of moments.
Solution: X is the number of trials needed for getting a 6, and x is the value of X for each realization. This is a geometric random variable with parameter θ, i.e. P(X = x) = θ(1 − θ)^(x−1). The method of moments gives the equation

µ1 = m1 ⇐⇒ 1/θ = m1 = (1/20) Σ_{i=1}^{20} xi, since µ1 = E(X) = 1/θ,

so that for the given realization xi, θ̂ = 20 / (Σ_{i=1}^{20} xi) = 0.16393.
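The arithmetic can be reproduced directly from the 20 recorded values (a minimal sketch):

```python
# Sketch: MoM estimate of theta from the 20 observed trial counts above.
x = [3, 9, 1, 6, 11, 10, 4, 1, 1, 4, 4, 10, 17, 1, 23, 3, 2, 7, 2, 3]
theta_hat = len(x) / sum(x)   # theta-hat = 1/x-bar, since E(X) = 1/theta
print(f"sum = {sum(x)}, theta-hat = {theta_hat:.5f}")   # sum = 122, theta-hat = 0.16393
```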
Example 3: Assume that IQ scores follow a normal distribution. Suppose we already know that the mean IQ score of the human population is µ = 100 and we would like to estimate the variance σ² based on the IQ scores of 20 randomly selected people:
92, 115, 103, 81, 107, 95, 92, 118, 99, 124, 90, 87, 108, 103, 91, 74, 84, 124, 81, 100
Solution: Only one parameter, σ², is unknown. We know that

µ2 = E(X²) = σ² + E(X)² = σ² + µ1²

and

m2 = (1/20) Σ_{i=1}^{20} xi², which for the given realization is m2 = 9879.5,

so that

µ2 = m2 ⇔ σ̂² = m2 − µ1².

If we use µ1 = 100, then σ̂² = 9879.5 − 100² < 0!! How do we resolve this problem?

Using central moments, µ′2 = E((X − µ)²) = σ², so

µ′2 = m′2 ⇔ σ̂² = (1/20) Σ_{i=1}^{20} (xi − x̄)²,

so that σ̂² = 196.94. That is, an estimate of σ² is obtained directly.
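Both numbers can be reproduced from the 20 IQ scores (a minimal sketch):

```python
# Sketch: reproduce m2 and the central-moment estimate of sigma^2
# from the 20 IQ scores above.
x = [92, 115, 103, 81, 107, 95, 92, 118, 99, 124,
     90, 87, 108, 103, 91, 74, 84, 124, 81, 100]
n = len(x)
x_bar = sum(x) / n                              # 98.4
m2 = sum(v ** 2 for v in x) / n                 # raw second sample moment
var_hat = sum((v - x_bar) ** 2 for v in x) / n  # second sample central moment
print(f"m2 = {m2}, m2 - 100^2 = {m2 - 100 ** 2}, sigma^2-hat = {var_hat:.2f}")
# m2 = 9879.5, m2 - 100^2 = -120.5 (negative!), sigma^2-hat = 196.94
```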
Estimating parameters: Maximum likelihood estimation (MLE)
Situation: We have n data values and assume that these data values are realizations of n i.i.d. random variables X1, . . . , Xn with distribution Fθ. Unfortunately, the value of θ is unknown.
Motivation: By changing the value of θ we can "move the density function fθ around", so that the density function fits the data best.
Principle: Since we do not know the true value θ of the distribution, we take the value θ̂ that most likely produced the observed values x1, · · · , xn.

Sketch of idea: "most likely" is equivalent to maximizing

Pθ(X1 = x1 ∩ X2 = x2 ∩ . . . ∩ Xn = xn)
= Pθ(X1 = x1) · Pθ(X2 = x2) · . . . · Pθ(Xn = xn)   (the Xi are independent!)
= Π_{i=1}^n Pθ(Xi = xi), or, for continuous variables, Π_{i=1}^n fθ(xi),

i.e. θ̂ is the maximizer of Π_{i=1}^n fθ(xi).

♠ Summary: θ̂ = argmax_θ Π_{i=1}^n fθ(xi)
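When no closed form is available, the maximizer can be found numerically. A minimal sketch (not from the slides; it assumes, for illustration, an exponential model fθ(x) = θ e^(−θx) with true θ = 2):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Sketch: numerical MLE, theta-hat = argmax of the (log-)likelihood.
rng = np.random.default_rng(7)
x = rng.exponential(scale=1 / 2.0, size=200)   # simulated data, true theta = 2

def neg_log_lik(theta):
    # -log L(theta) = -(n log theta - theta * sum(x)) for the exponential model
    return -(len(x) * np.log(theta) - theta * x.sum())

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 50), method="bounded")
print(f"theta-hat = {res.x:.3f}   (closed form 1/x-bar = {1 / x.mean():.3f})")
```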
Estimating parameters: Maximum likelihood estimation
Maximum likelihood:
(a) The joint probability density function (for continuous random variables) or joint probability mass function (for discrete random variables) for a random sample X1, · · · , Xn is defined as

L(θ) = L(θ; X1 = x1, · · · , Xn = xn) = Π_{i=1}^n Pθ(xi) or Π_{i=1}^n fθ(xi).

The above is viewed as a function of θ.
♠ L(θ) is called the Likelihood function of X1, · · · , Xn.
(b) The maximum likelihood estimator of θ is the maximizer of the likelihood L(θ) over its parameter space Θ.
(c) How do we find the maximum of L(θ)? Differentiate it and set the derivative to zero, or try to find the maximizing θ directly.
(d) Very often it is difficult to work with the derivative of L(θ) directly (it is a product); instead we use another trick and find the maximum of l(θ) = log L(θ), the log-likelihood function, which has the same maximizer since log is increasing.
♣ Five steps for finding the MLE (see the symbolic sketch after this list):
1. Find the likelihood function L(θ).
2. Take the natural log of the likelihood: l(θ) = log L(θ).
3. Differentiate the log-likelihood with respect to θ.
4. Set the derivative to zero.
5. Solve for θ.
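These five steps can also be carried out symbolically; a sketch using sympy for the Geo(p) model (the same model that is worked by hand on the next slides, with n observations and s = Σ xi):

```python
import sympy as sp

# Sketch: the five MLE steps, done symbolically for Geo(p).
p, n, s = sp.symbols("p n s", positive=True)

log_lik = n * sp.log(p) + (s - n) * sp.log(1 - p)  # steps 1-2: l(p) = log L(p)
score = sp.diff(log_lik, p)                        # step 3: differentiate
p_hat = sp.solve(sp.Eq(score, 0), p)               # steps 4-5: set to zero, solve
print(p_hat)   # [n/s], i.e. p-hat = n / sum(x_i) = 1/x-bar
```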
Examples: MLE
Example (Rolling dice): Find the MLE for the probability of getting a 6, p, in the "Roll a Die" example: repeating this experiment 100 times gave the following results,
x:   1   2   3   4   5   6   7   8   9  11  14  15  16  17  20  21  27  29
#:  18  20   8   9   9   5   8   3   5   3   3   3   1   1   1   1   1   1
♣ Recall: We know that X, the number of rolls until a 6 shows up, has a geometric distribution Geo(p), and x in the table is its realization.
♣ For a fair die, p is 1/6, and the geometric distribution has probability mass function p(x) = (1 − p)^(x−1) · p.
Step 1: Find the likelihood function L(p): since we have observed 100 outcomes x1, ..., x100, the likelihood function is L(p) = Π_{i=1}^{100} p(xi), i.e.

L(p) = Π_{i=1}^{100} fp(xi) = Π_{i=1}^{100} (1 − p)^(xi−1) p = p^100 · Π_{i=1}^{100} (1 − p)^(xi−1)
     = p^100 · (1 − p)^(Σ_{i=1}^{100} (xi − 1)) = p^100 · (1 − p)^(Σ_{i=1}^{100} xi − 100).
Step 2: Take the log of the likelihood function, l(p) = log L(p):

l(p) = log[p^100 · (1 − p)^(Σ_{i=1}^{100} xi − 100)]
     = log p^100 + log(1 − p)^(Σ_{i=1}^{100} xi − 100)
     = 100 log p + (Σ_{i=1}^{100} xi − 100) log(1 − p).
Step 3: Differentiate the log-likelihood with respect to p:

(d/dp) l(p) = (d/dp) log L(p) = 100/p − (Σ_{i=1}^{100} xi − 100) · 1/(1 − p)
            = [1/(p(1 − p))] · (100(1 − p) − (Σ_{i=1}^{100} xi − 100) p)
            = [1/(p(1 − p))] · (100 − p Σ_{i=1}^{100} xi).
Step 4: Set the derivative to zero: for the estimate p̂ the derivative must be zero:

(d/dp) log L(p̂) = 0 ⇐⇒ [1/(p̂(1 − p̂))] · (100 − p̂ Σ_{i=1}^{100} xi) = 0.
Step 5: Solve for p̂:

[1/(p̂(1 − p̂))] · (100 − p̂ Σ_{i=1}^{100} xi) = 0 ⇐⇒ 100 − p̂ Σ_{i=1}^{100} xi = 0,

so that

p̂ = 100 / Σ_{i=1}^{100} xi = 1 / ((1/100) Σ_{i=1}^{100} xi) = 1/x̄.
♠ In total, we have the estimate

p̂ = 100/568 = 0.1761

for the given xi from the above table.
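A minimal sketch reproducing this estimate from the frequency table:

```python
# Sketch: reconstruct sum(x_i) and p-hat from the frequency table above.
counts = {1: 18, 2: 20, 3: 8, 4: 9, 5: 9, 6: 5, 7: 8, 8: 3, 9: 5,
          11: 3, 14: 3, 15: 3, 16: 1, 17: 1, 20: 1, 21: 1, 27: 1, 29: 1}
n = sum(counts.values())                        # 100 repetitions
total = sum(x * c for x, c in counts.items())   # sum of all x_i = 568
print(f"n = {n}, sum = {total}, p-hat = {n / total:.4f}")   # p-hat = 0.1761
```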