Stat 330 (Spring 2015): Slide set 26
Estimating parameters
Last update: March 30, 2015

Review: What is an estimator? What are estimates? What are the properties we use to compare estimators?

Estimators (Cont'd)

Example: The sample mean X̄ is consistent for μ. That means that, as the sample size gets large, X̄ gets very close to μ in the sense of probability.

Derivation: We know that Var(X̄) = σ²/n. Using Chebyshev's inequality,

  P(|X̄ − μ| > ε) ≤ Var(X̄)/ε² = σ²/(nε²),

so that P(|X̄ − μ| > ε) → 0 as n → ∞, which means X̄ is consistent for μ.

Estimating parameters

Method of moments: one of the methods available for estimating parameters. The basic idea is to equate the sample moments with the population moments based on the sample (x_1, ..., x_n), where x_i are the observed values of X_i for i = 1, ..., n.

♣ The k-th population moment is defined as μ_k = E(X^k).
♦ For example: E(X) = μ_1, Var(X) = μ_2 − μ_1².
♣ The k-th sample moment is defined as m_k = (1/n) Σ_{i=1}^n x_i^k, where x_i is the realization/sample value of X_i.
♦ For example: m_1 = Σ_{i=1}^n x_i/n = x̄, where x̄ is the sample mean.

Estimating parameters (Cont'd)

♣ The k-th population central moment is defined as μ_k = E((X − μ)^k).
♣ The k-th sample central moment is defined as m_k = (1/n) Σ_{i=1}^n (x_i − x̄)^k, where x̄ is the sample mean.
♠ To estimate k parameters, equate the first k population and sample moments: μ_i = m_i, i = 1, 2, ..., k. This gives k equations to solve.

Estimating parameters: Examples

Example 1: To estimate the parameter λ of a Poisson distribution, we set

  μ_1 = m_1  ⟺  λ = m_1 = x̄,  since μ_1 = E(X) = λ.

There is only one unknown; solving for λ we obtain λ̂ = x̄, which is the method of moments estimator (MoM for short) of λ.

Example 2: (Roll a die) A die is rolled until a face shows a 6. Repeating this experiment 20 times, the numbers of trials needed to show a 6 are

  3, 9, 1, 6, 11, 10, 4, 1, 1, 4, 4, 10, 17, 1, 23, 3, 2, 7, 2, 3

Estimate the probability of getting a 6, say θ, using the method of moments.

Solution: X is the number of trials needed to get a 6, and x is the value of X for each realization. This is a geometric random variable with parameter θ, i.e. P(X = x) = θ(1 − θ)^{x−1}. The method of moments gives the equation

  μ_1 = m_1  ⟺  1/θ = m_1 = (1/20) Σ_{i=1}^{20} x_i,  since μ_1 = E(X) = 1/θ,

so that for the given realization x_i, θ̂ = 20 / (Σ_{i=1}^{20} x_i) = 0.16393.

Example 3: Assume that IQ score follows a normal distribution. Suppose we already know that the mean IQ score of the human population is μ = 100 and would like to estimate the variance σ² based on the IQ scores of 20 randomly selected people:

  92, 115, 103, 81, 107, 95, 92, 118, 99, 124, 90, 87, 108, 103, 91, 74, 84, 124, 81, 100

Solution: Only one parameter is unknown, σ². Since μ_2 = E(X²) = σ² + E(X)² = σ² + μ_1², we have

  μ_2 = m_2  ⟺  σ̂² = m_2 − μ_1²,  where m_2 = (1/20) Σ_{i=1}^{20} x_i²; for the given realization, m_2 = 9879.5.

Problem: if we use μ_1 = 100, then σ̂² = 9879.5 − 100² < 0!!

How to resolve this problem? Use the central moments: σ² = E((X − μ)²) is the second population central moment, so equating it with the second sample central moment gives

  σ̂² = (1/20) Σ_{i=1}^{20} (x_i − x̄)²,

so that σ̂² = 196.94. That is, an estimate of σ² is obtained directly.
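The moment-matching arithmetic in Examples 2 and 3 is easy to check numerically. The short Python sketch below is not part of the original slides; the variable names are illustrative, and it simply reproduces θ̂ = 20/Σx_i and the central-moment estimate σ̂² = (1/20) Σ(x_i − x̄)² from the slide data.

```python
# Method-of-moments checks for Examples 2 and 3 (a sketch using the slide data).

# Example 2: number of rolls until a 6 appears (geometric data)
rolls = [3, 9, 1, 6, 11, 10, 4, 1, 1, 4, 4, 10, 17, 1, 23, 3, 2, 7, 2, 3]
theta_hat = len(rolls) / sum(rolls)          # 1/theta = m1  =>  theta_hat = n / sum(x_i)
print(f"theta_hat = {theta_hat:.5f}")        # 0.16393

# Example 3: IQ scores, estimating sigma^2
iq = [92, 115, 103, 81, 107, 95, 92, 118, 99, 124,
      90, 87, 108, 103, 91, 74, 84, 124, 81, 100]
n = len(iq)
xbar = sum(iq) / n
m2 = sum(x**2 for x in iq) / n                    # raw second sample moment: 9879.5
sigma2_raw = m2 - 100**2                          # naive plug-in with mu_1 = 100: negative!
sigma2_hat = sum((x - xbar)**2 for x in iq) / n   # central-moment estimate: 196.94
print(f"m2 = {m2}, raw estimate = {sigma2_raw}, sigma^2 hat = {sigma2_hat:.2f}")
```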
Estimating parameters: Maximum likelihood estimation (MLE)

Situation and motivation: We have n data values (some numbers, e.g.), and assume that these data values are realizations of n i.i.d. random variables X_1, ..., X_n with distribution F_θ. Unfortunately the value of θ is unknown.

Motivation: By changing the value of θ we can "move the density function f_θ around", so that the density function fits the data best.

Principle: Since we do not know the true value θ of the distribution, we take the value θ̂ that most likely produced the observed values x_1, ..., x_n.

Sketch of idea: "Most likely" is equivalent to maximizing

  P_θ(X_1 = x_1 ∩ X_2 = x_2 ∩ ... ∩ X_n = x_n)
    = P_θ(X_1 = x_1) · P_θ(X_2 = x_2) · ... · P_θ(X_n = x_n)   (the X_i are independent!)
    = Π_{i=1}^n P_θ(X_i = x_i),  or, in the continuous case, Π_{i=1}^n f_θ(x_i).

♠ Summary: θ̂ = argmax_θ Π_{i=1}^n f_θ(x_i), i.e. θ̂ is the maximizer of Π_{i=1}^n f_θ(x_i).

Estimating parameters: Maximum likelihood estimation

Maximum likelihood:
(a) The joint probability density function (for continuous random variables) or joint probability mass function (for discrete random variables) for a random sample X_1, ..., X_n is defined as

  L(θ) = L(θ; X_1 = x_1, ..., X_n = x_n) = Π_{i=1}^n P_θ(x_i)  or  Π_{i=1}^n f_θ(x_i).

♠ L(θ) is called the likelihood function of X_1, ..., X_n. The above function is a function of θ.
(b) The maximum likelihood estimator of θ is the maximizer of the likelihood L(θ) over its parameter space Θ.
(c) How do we get a maximum of L(θ)? Differentiate it and set the derivative to zero, or try to find the θ that maximizes the likelihood directly.
(d) Very often it is difficult to find a derivative of L(θ); instead we use another trick and find a maximum of l(θ) = log L(θ), the log-likelihood function.

♣ Five steps for finding the MLE:
1. Find the likelihood function L(θ).
2. Take the natural log of the likelihood function, l(θ) = log L(θ).
3. Differentiate the log-likelihood function with respect to θ.
4. Set the derivative to zero.
5. Solve for θ.

Examples: MLE

Example (Rolling a die): Find the MLE for the probability of getting a face 6, θ, in the "Roll a Die" example; repeating this experiment 100 times gave the following results:

  x (rolls needed):   1   2   3   4   5   6   7   8   9  11  14  15  16  17  20  21  27  29
  # (frequency):     18  20   8   9   9   5   8   3   5   3   3   3   1   1   1   1   1   1

♣ Recall: We know that X, the number of rolls until a 6 shows up, has a geometric distribution Geo_p, and x in the table is its realization.
♣ For a fair die, p is 1/6, and the geometric distribution has probability mass function p(x) = (1 − p)^{x−1} · p.

Step 1: Find the likelihood function L(p). Since we have observed 100 outcomes x_1, ..., x_100, the likelihood function is

  L(p) = Π_{i=1}^{100} p(x_i) = Π_{i=1}^{100} (1 − p)^{x_i − 1} p = p^{100} · Π_{i=1}^{100} (1 − p)^{x_i − 1}
       = p^{100} · (1 − p)^{Σ_{i=1}^{100} (x_i − 1)} = p^{100} · (1 − p)^{Σ_{i=1}^{100} x_i − 100}.

Step 2: Take the log of the likelihood function:

  l(p) = log L(p) = log[p^{100} · (1 − p)^{Σ x_i − 100}]
       = log p^{100} + log (1 − p)^{Σ x_i − 100}
       = 100 log p + (Σ_{i=1}^{100} x_i − 100) log(1 − p).

Step 3: Differentiate the log-likelihood with respect to p:

  (d/dp) l(p) = 100/p − (Σ_{i=1}^{100} x_i − 100)/(1 − p)
              = [100(1 − p) − p(Σ_{i=1}^{100} x_i − 100)] / [p(1 − p)]
              = [100 − p Σ_{i=1}^{100} x_i] / [p(1 − p)].

Step 4: Set the derivative to zero. For the estimate p̂ the derivative must be zero:

  (d/dp) l(p̂) = 0  ⟺  [1/(p̂(1 − p̂))] (100 − p̂ Σ_{i=1}^{100} x_i) = 0  ⟺  100 − p̂ Σ_{i=1}^{100} x_i = 0.

Step 5: Solve for p̂:

  p̂ = 100 / Σ_{i=1}^{100} x_i.

♠ In total, we have the estimate p̂ = 100/568 ≈ 0.1761 for the given x_i from the above table.
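As a sanity check on Steps 1 through 5, the short Python sketch below is not from the slides; the dictionary of counts and the grid search are illustrative. It rebuilds Σ x_i from the frequency table, computes the closed-form MLE p̂ = 100/Σ x_i, and confirms that a direct grid maximization of the log-likelihood l(p) lands on essentially the same value.

```python
import math

# Frequency table from the slides: x = rolls needed until a 6, count = how often observed
counts = {1: 18, 2: 20, 3: 8, 4: 9, 5: 9, 6: 5, 7: 8, 8: 3, 9: 5,
          11: 3, 14: 3, 15: 3, 16: 1, 17: 1, 20: 1, 21: 1, 27: 1, 29: 1}

n = sum(counts.values())                       # 100 repetitions
total = sum(x * c for x, c in counts.items())  # sum of all x_i = 568

# Closed-form MLE from Step 5: p_hat = n / sum(x_i)
p_hat = n / total
print(f"p_hat = {n}/{total} = {p_hat:.4f}")    # about 0.1761

# Log-likelihood l(p) = n*log(p) + (sum(x_i) - n)*log(1 - p), as in Step 2
def loglik(p):
    return n * math.log(p) + (total - n) * math.log(1 - p)

# Crude grid search over (0, 1) should agree with the closed form
grid = [k / 10000 for k in range(1, 10000)]
p_grid = max(grid, key=loglik)
print(f"grid-search maximizer = {p_grid:.4f}")
```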