Stat 330 (Spring 2015)
Slide set 26

Estimating parameters

Last update: March 30, 2015

Review: What is an estimator? What are estimates? What are the
properties we use to compare estimators?
Estimators (Cont’d)

Example: The sample mean x̄ is consistent for μ. That means that, as
the sample size gets large, X̄ gets very close to μ in the sense of
probability.

Derivation: Using Chebyshev’s inequality,

    P(|X̄ − μ| > ε) ≤ Var(X̄)/ε² = σ²/(nε²),

so that if n → ∞,

    P(|X̄ − μ| > ε) → 0,

which means X̄ is consistent for μ.
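♦ A minimal simulation sketch of this fact (not from the slides: the
Exponential(1) population with μ = 1, ε = 0.1, and the sample sizes are
arbitrary illustrative choices):

    # Monte Carlo estimate of P(|X_bar - mu| > eps) for growing n,
    # using an Exponential(1) population, so mu = 1.
    import random

    def tail_prob(n, eps=0.1, reps=2000, mu=1.0):
        """Fraction of replications in which |X_bar - mu| exceeds eps."""
        hits = 0
        for _ in range(reps):
            x_bar = sum(random.expovariate(1.0) for _ in range(n)) / n
            if abs(x_bar - mu) > eps:
                hits += 1
        return hits / reps

    for n in [10, 100, 1000]:
        print(n, tail_prob(n))  # the tail probability shrinks as n grows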
Estimating parameters (Cont’d)

Method of moments: one of the methods available for estimating
parameters. The basic idea is to equate the sample moments with the
population moments based on the sample (x1, · · · , xn), where xi are the
observed values of Xi for i = 1, . . . , n.

♣ The k-th population moment is defined as

    μk = E(X^k)

♦ For example: μ1 = E(X) = μ, and Var(X) = μ2 − μ1².

♣ The k-th sample moment is defined as

    mk = (1/n) Σ_{i=1}^n xi^k,

where xi is the realization/sample value of Xi.

♦ For example: m1 = (Σ_{i=1}^n xi)/n = x̄, where x̄ is the sample mean.
♣ The k-th population central moment is defined as

    μk = E((X − μ)^k)

♣ The k-th sample central moment is defined as

    mk = (1/n) Σ_{i=1}^n (xi − x̄)^k

♠ To estimate k parameters, equate the first k population and sample
moments:

    μi = mi,   i = 1, 2, · · · , k

So we have k equations to solve. (A short code sketch of this recipe
follows.)
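♦ A sketch of the recipe for a two-parameter case (my own illustration,
not an example from the slides): for a Normal(μ, σ²) sample, equating
μ1 = m1 and μ2 = m2 gives μ = m1 and σ² + μ² = m2, so μ̂ = m1 and
σ̂² = m2 − m1².

    # Method-of-moments sketch for a Normal(mu, sigma^2) sample.
    # Solving mu_1 = m_1 and mu_2 = m_2 gives the two estimators below.
    def mom_normal(xs):
        n = len(xs)
        m1 = sum(xs) / n                  # first sample moment
        m2 = sum(x * x for x in xs) / n   # second sample moment
        return m1, m2 - m1 * m1           # (mu_hat, sigma2_hat)

    print(mom_normal([92, 115, 103, 81, 107]))  # illustrative input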
Estimating parameters: Examples

Example 1: To estimate the parameter λ of a Poisson distribution, we need
to set

    μ1 = m1 ⟺ λ = m1 = x̄,   since μ1 = E(X) = λ

Only one unknown is there; solving for λ we obtain

    λ̂ = x̄,

which is the method of moments estimator (MoM for short) of λ.

Example 2: (Roll a die) A die is rolled until a face shows a 6. Repeating
this experiment 20 times, the numbers of trials needed to show a 6 are

    3, 9, 1, 6, 11, 10, 4, 1, 1, 4, 4, 10, 17, 1, 23, 3, 2, 7, 2, 3

Estimate the probability of getting a 6, say θ, using the method of
moments.

Solution: X is the number of trials needed for getting a 6 and x is the
value of X for each realization. This is a geometric random variable with
parameter θ, i.e. P(X = x) = θ(1 − θ)^(x−1). The method of moments gives
the equation

    μ1 = m1 ⟺ 1/θ = m1 = (1/20) Σ_{i=1}^{20} xi,   since μ1 = E(X) = 1/θ,

so that for the given realizations xi, θ̂ = 20/(Σ_{i=1}^{20} xi) = 20/122 ≈ 0.16393.
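♦ A two-line check of this computation (a sketch; the variable names are
mine):

    # theta_hat = (number of experiments) / (total number of rolls)
    trials = [3, 9, 1, 6, 11, 10, 4, 1, 1, 4, 4, 10, 17, 1, 23, 3, 2, 7, 2, 3]
    theta_hat = len(trials) / sum(trials)  # 20 / 122
    print(round(theta_hat, 5))             # 0.16393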
Example 3: Assume that IQ score follows a normal distribution.
Suppose we already know that the mean IQ score of the human
population is μ = 100 and would like to estimate the variance
σ² based on the IQ scores of 20 randomly selected people:

    92, 115, 103, 81, 107, 95, 92, 118, 99, 124, 90, 87, 108, 103, 91, 74, 84, 124, 81, 100

Solution: Only one parameter is unknown, σ². Since

    μ2 = E(X²) = σ² + E(X)² = σ² + μ1²,

we set

    μ2 = m2 ⟺ σ̂² = m2 − μ1²,

where m2 = (1/20) Σ_{i=1}^{20} xi²; for the given realization, m2 = 9879.5.

If using μ1 = 100, then σ̂² = 9879.5 − 100² < 0!! How to resolve this
problem?

Using the central moments,

    μ2 = m2 ⟺ σ² = E((X − μ)²) = (1/20) Σ_{i=1}^{20} (xi − x̄)²,

so that σ̂² = 196.94. That is, an estimate of σ² is obtained directly.
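♦ A short check of both computations (a sketch; the names are mine):

    # Raw-moment route fails here; the central-moment route works.
    iq = [92, 115, 103, 81, 107, 95, 92, 118, 99, 124,
          90, 87, 108, 103, 91, 74, 84, 124, 81, 100]
    n = len(iq)
    m2 = sum(x * x for x in iq) / n               # 9879.5
    print(m2 - 100**2)                            # -120.5, negative!
    x_bar = sum(iq) / n                           # 98.4
    print(sum((x - x_bar)**2 for x in iq) / n)    # 196.94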
Estimating parameters: Maximum likelihood estimation (MLE)

Situation and motivation: We have n data values (some numbers, e.g.) and
assume that these data values are realizations of n i.i.d. random variables
X1, . . . , Xn with distribution Fθ. Unfortunately, the value of θ is
unknown.

Motivation: By changing the value of θ we can “move the density function
fθ around”, so that the density function fits the data best.

Principle: Since we do not know the true value θ of the distribution, we
take the value θ̂ that most likely produced the observed values x1, · · · , xn.

Sketch of idea: “Most likely” is equivalent to maximizing

    Pθ(X1 = x1 ∩ X2 = x2 ∩ . . . ∩ Xn = xn)
      = Pθ(X1 = x1) · Pθ(X2 = x2) · . . . · Pθ(Xn = xn)   (the Xi are independent!)
      = Π_{i=1}^n Pθ(Xi = xi)

or, for continuous random variables,

      = Π_{i=1}^n fθ(xi).

♠ Summary: θ̂ = argmax_θ Π_{i=1}^n fθ(xi), i.e. θ̂ is the maximizer of
Π_{i=1}^n fθ(xi).
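♦ A direct illustration of the argmax principle (my own sketch, reusing
the 20 geometric observations from Example 2): evaluate the likelihood
product on a grid of candidate p values and keep the maximizer.

    # Brute-force argmax of the geometric likelihood over a grid of p.
    def likelihood(p, xs):
        L = 1.0
        for x in xs:
            L *= (1 - p) ** (x - 1) * p   # geometric pmf at x
        return L

    xs = [3, 9, 1, 6, 11, 10, 4, 1, 1, 4, 4, 10, 17, 1, 23, 3, 2, 7, 2, 3]
    grid = [k / 1000 for k in range(1, 1000)]          # p = 0.001, ..., 0.999
    p_hat = max(grid, key=lambda p: likelihood(p, xs))
    print(p_hat)                                       # 0.164, i.e. about 20/122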
Estimating parameters: Maximum likelihood estimation

Maximum likelihood:

(a) The joint probability density function (for continuous random variables)
or joint probability mass function (for discrete random variables) for a
random sample X1, · · · , Xn is defined as

    L(θ) = L(θ; X1 = x1, · · · , Xn = xn) = Π_{i=1}^n Pθ(xi) or Π_{i=1}^n fθ(xi)

The above function is a function of θ.
♠ L(θ) is called the likelihood function of X1, · · · , Xn.

(b) The maximum likelihood estimator of θ is the maximizer of the
likelihood L(θ) on its parameter space Θ.

(c) How do we get a maximum of L(θ)? Differentiate it and set the
derivative to zero, or try to find the θ that maximizes the likelihood
directly.

(d) Very often it is difficult to find a derivative of L(θ); instead we use
another trick and find a maximum of l(θ) = log L(θ), the log-likelihood
function.

♣ Five steps for finding the MLE (a numeric sketch follows this list):
1. Find the likelihood function L(θ).
2. Take the natural log of the likelihood function, l(θ) = log L(θ).
3. Differentiate the log-likelihood function with respect to θ.
4. Set the derivative to zero.
5. Solve for θ.
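♦ When step 5 has no closed form, the same recipe can be run numerically.
A minimal sketch (my own illustration, not from the slides), minimizing
−l(p) for the geometric model with scipy.optimize.minimize_scalar:

    # Numeric version of the five-step recipe for geometric data:
    # l(p) = n*log(p) + (sum(xs) - n)*log(1 - p); we minimize -l(p).
    import math
    from scipy.optimize import minimize_scalar

    def neg_log_lik(p, xs):
        n = len(xs)
        return -(n * math.log(p) + (sum(xs) - n) * math.log(1.0 - p))

    xs = [3, 9, 1, 6, 11, 10, 4, 1, 1, 4, 4, 10, 17, 1, 23, 3, 2, 7, 2, 3]
    res = minimize_scalar(neg_log_lik, args=(xs,),
                          bounds=(1e-6, 1 - 1e-6), method="bounded")
    print(res.x)  # close to the closed-form MLE 20/122 = 0.16393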
Examples: MLE

Example (Rolling dice): Find the MLE for the probability p of getting a
face 6 in the “roll a die” example: repeating this experiment 100 times
gave the following results,

    x:  1   2   3   4   5   6   7   8   9  11  14  15  16  17  20  21  27  29
    #: 18  20   8   9   9   5   8   3   5   3   3   3   1   1   1   1   1   1

♣ Recall: We know that X, the number of rolls until a 6 shows up, has a
geometric distribution Geo_p, and x in the table is its realization.
♣ For a fair die, p is 1/6, and the geometric distribution has probability
mass function p(x) = (1 − p)^(x−1) · p.

Step 1: Find the likelihood function L(p): Since we have observed 100
outcomes x1, ..., x100, the likelihood function is L(p) = Π_{i=1}^{100} p(xi):

    L(p) = Π_{i=1}^{100} fp(xi) = Π_{i=1}^{100} (1 − p)^(xi−1) · p
         = p^100 · Π_{i=1}^{100} (1 − p)^(xi−1)
         = p^100 · (1 − p)^(Σ_{i=1}^{100} (xi−1))
         = p^100 · (1 − p)^(Σ_{i=1}^{100} xi − 100)
Step 2: Take the natural log of the likelihood function, l(p) = log L(p):

    l(p) = log L(p) = log [ p^100 · (1 − p)^(Σ_{i=1}^{100} xi − 100) ]
         = log p^100 + log (1 − p)^(Σ_{i=1}^{100} xi − 100)
         = 100 log p + (Σ_{i=1}^{100} xi − 100) log(1 − p).
Step 3: Differentiate the log-likelihood with respect to p:

    (d/dp) l(p) = (d/dp) log L(p) = 100/p − (Σ_{i=1}^{100} xi − 100) · 1/(1 − p)

so that

    (d/dp) l(p) = [100(1 − p) − (Σ_{i=1}^{100} xi − 100) p] / (p(1 − p))
                = (100 − p Σ_{i=1}^{100} xi) / (p(1 − p)).
Step 4: Set the derivative to zero: for the estimate p̂ the derivative must
be zero:

    (d/dp) log L(p̂) = 0 ⟺ (100 − p̂ Σ_{i=1}^{100} xi) / (p̂(1 − p̂)) = 0

Step 5: Solve for p̂:

    (100 − p̂ Σ_{i=1}^{100} xi) / (p̂(1 − p̂)) = 0 ⟺ 100 − p̂ Σ_{i=1}^{100} xi = 0,

so that

    p̂ = 100 / Σ_{i=1}^{100} xi.

♠ In total, we have the estimate

    p̂ = 100/568 ≈ 0.1761

for the given xi from the above table.
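♦ A short check of the final number (a sketch; the frequency table is
expanded back into the 100 individual observations):

    # p_hat = 100 / sum(x_i), computed from the frequency table above.
    table = {1: 18, 2: 20, 3: 8, 4: 9, 5: 9, 6: 5, 7: 8, 8: 3, 9: 5,
             11: 3, 14: 3, 15: 3, 16: 1, 17: 1, 20: 1, 21: 1, 27: 1, 29: 1}
    xs = [x for x, count in table.items() for _ in range(count)]
    print(len(xs), sum(xs))   # 100 568
    print(100 / sum(xs))      # 0.17605..., i.e. about 0.1761, close to 1/6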