Stat 330 (Spring 2015): Slide set 27
Last update: April 1, 2015

Review: What is MLE? How to find it (5 steps)?

Example for MLE:

Example: Let X1, . . . , Xn be i.i.d. N(μ, σ²), where both μ and σ² are unknown; x1, · · · , xn are the data/sample values of X1, . . . , Xn.

♣ θ may be multiple parameters: Θ ⊂ R^p with p > 1.

♥ What is the pdf of a normal random variable? f(x) = 1/√(2πσ²) · e^{−(x−μ)²/(2σ²)}.

♦ Since we have values from n independent variables, the likelihood function is a product of n densities:

  L(μ, σ²) = ∏_{i=1}^n 1/√(2πσ²) · e^{−(xᵢ−μ)²/(2σ²)} = (2πσ²)^{−n/2} · e^{−Σ_{i=1}^n (xᵢ−μ)²/(2σ²)}

♦ The log-likelihood is:

  l(μ, σ²) = log L(μ, σ²) = −(n/2) ln(2πσ²) − (1/(2σ²)) Σ_{i=1}^n (xᵢ − μ)²

♦ Since we now have two parameters, μ and σ², we need the two partial derivatives of the log-likelihood:

  ∂/∂μ log L(μ, σ²) = −(1/(2σ²)) Σ_{i=1}^n (xᵢ − μ)·(−2) = (1/σ²) Σ_{i=1}^n (xᵢ − μ)

  ∂/∂σ² log L(μ, σ²) = −n/(2σ²) + (1/(2(σ²)²)) Σ_{i=1}^n (xᵢ − μ)²

♦ We need to find values of μ and σ² that yield zeros for both derivatives at the same time.

♦ Setting ∂/∂μ log L(μ, σ²) = 0 gives μ̂ = (1/n) Σ_{i=1}^n xᵢ = x̄.

♦ Plugging this value into the derivative for σ² and setting ∂/∂σ² log L(μ̂, σ²) = 0 gives σ̂² = (1/n) Σ_{i=1}^n (xᵢ − μ̂)².

♠ Do you find something different? Compare σ̂² with s² (whose divisor is n − 1, not n): the MLE is biased!

♠ However, bias does not ruin MLEs, because of other nice features like small MSE etc.
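The derivation above can be checked numerically. The following is a minimal sketch, not from the original slides; the simulated data use arbitrarily chosen values μ = 5, σ = 2, n = 20. It computes the MLEs μ̂ and σ̂² and compares σ̂² with the sample variance s² (divisor n − 1) to make the bias visible.

```python
import random

# Simulate n i.i.d. N(mu, sigma^2) observations (mu, sigma, n chosen arbitrarily)
random.seed(1)
n, mu, sigma = 20, 5.0, 2.0
x = [random.gauss(mu, sigma) for _ in range(n)]

# MLEs from the derivation: mu_hat = x_bar, sigma2_hat = (1/n) * sum of (x_i - x_bar)^2
x_bar = sum(x) / n
sigma2_mle = sum((xi - x_bar) ** 2 for xi in x) / n

# Sample variance s^2 uses divisor n - 1, so sigma2_mle = (n - 1)/n * s^2 < s^2
s2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

print(f"mu_hat = {x_bar:.3f}, sigma2_MLE = {sigma2_mle:.3f}, s^2 = {s2:.3f}")
```

Averaged over many repetitions, σ̂² falls short of σ² by the factor (n − 1)/n, which is exactly the bias noted above.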
Topic 2: Confidence intervals:

Motivations: The last lectures have provided a way to compute point estimates for parameters. Based on that, it is natural to ask "how good is this point estimate?" or "how close is the estimate to the true value of the parameter?"

Further thoughts: Instead of just looking at the point estimate, we will now try to compute an interval around the estimated parameter value in which the true parameter is "likely" to fall. An interval like that is called a confidence interval.

Definition: An interval (L, U) is a (1 − α)·100% confidence interval for the parameter θ if it contains the parameter with probability (1 − α):

  P(L < θ < U) = 1 − α.

The coverage probability 1 − α is called the confidence level.

Equivalent definition: Let θ̂ be an estimate of θ. If

  P(|θ̂ − θ| < e) ≥ 1 − α

we say that the interval (θ̂ − e, θ̂ + e) is a (1 − α)·100% confidence interval for θ; 2e is the size of the confidence interval.

♠ Remark 1: L and U in the first definition are functions of the sample, that is, L = L(X1, · · · , Xn) and U = U(X1, · · · , Xn). In other words, L and U are random variables.
♠ Remark 2: In the above definitions, α is a value near 0.
♠ Remark 3: The true value θ is either within the confidence interval or not.
♠ Remark 4: min_{θ∈Θ} P_θ(θ ∈ (L, U)) is also called the confidence coefficient (C.C. for short).

Interpretation of confidence intervals (CI):

Wrong interpretation: I computed a 95% CI, it is (2, 8), so I can say my parameter θ must be in this interval with probability 95%.
♠ This is WRONG! (2, 8) is a fixed interval; it either contains θ or it does not (in other words, the probability that θ is in this particular interval is either 0 or 1).

Right interpretation:
(a) The confidence interval can be expressed in terms of samples (or repeated samples): "Were this procedure to be repeated on multiple samples, the calculated confidence interval (which would differ for each sample) would encompass the true population parameter 90% of the time."
(b) The probability associated with a confidence interval may also be considered from a pre-experiment point of view, in the same context in which arguments for the random allocation of treatments to study items are made. Here the experimenter sets out the way in which they intend to calculate a confidence interval and knows, before they do the actual experiment, that the interval they will end up calculating has a certain chance of covering the true but unknown value. This is very similar to the "repeated sample" interpretation above, except that it avoids relying on hypothetical repeats of a sampling procedure that may not be repeatable in any meaningful sense.
♠ If the true value of the parameter lies outside the 90% CI once it has been calculated, then an event has occurred which had a probability of 10% (or less) of happening by chance.
♥ However, we do not know the true value. For different parameters, we have different methods to construct a CI.

Construct CI:

Large sample CI for μ:
♠ Situation: we have a large set of observed values (n > 30, usually) x1, · · · , xn; these values are realizations of i.i.d. X1, . . . , Xn with E[Xi] = μ and Var[Xi] = σ².
♠ X̄ is an unbiased estimator for μ.
♠ By the CLT, X̄ is an approximately normally distributed random variable with E[X̄] = μ and Var[X̄] = σ²/n, i.e. X̄ ∼ N(μ, σ²/n); then Z := (X̄ − μ)/(σ/√n) ∼ N(0, 1).

A (1 − α)·100% confidence interval for μ is given as

  ( x̄ − zα/2 · σ/√n , x̄ + zα/2 · σ/√n )

where zα/2 = Φ⁻¹(1 − α/2) and −zα/2 = Φ⁻¹(α/2).
♠ In practice, we plug in s = √[ (1/(n−1)) Σ_{i=1}^n (xᵢ − x̄)² ] for σ when a value for it is not available.
♠ Recall that zα/2 = Φ⁻¹(1 − α/2) is defined by P(Z ≤ zα/2) = 1 − α/2.
♠ How to find Φ⁻¹(p) from the table? Some useful critical values zα/2, depending on α, are:

  α:     0.10   0.05   0.02   0.01
  zα/2:  1.65   1.96   2.33   2.58

Examples for CI:

Example 1: Suppose we want to find a 95% confidence interval for the mean salary of an ISU employee. A random sample of 100 ISU employees gives us a sample mean salary of x̄ = 21543. Suppose the standard deviation of salaries is known to be 3000.
♥ Using the above expression, we get a 95% confidence interval as

  21543 ± Φ⁻¹(1 − α/2) · 3000/√100 = 21543 ± Φ⁻¹(0.975) · 300

where Φ⁻¹(1 − α/2) = Φ⁻¹(1 − 0.05/2) = Φ⁻¹(0.975) = 1.96.
♥ The 95% confidence interval is then 21543 ± 588, i.e. if we repeat this study 100 times (with 100 different employees each time), we can say: in 95 out of 100 studies, the true parameter μ falls into a 588 range around x̄.

Example 2: Suppose we want to analyze some complicated queueing system, for which we have no formulas and theory. We are interested in the mean queue length of the system after reaching steady state.
♠ The only thing possible for us is to run simulations of this system and look at the queue length at some large time t, e.g. t = 1000 hrs.
♠ After 50 simulations, we have the data: x1 = number in queue at time 1000 hrs in the 1st simulation, x2 = number in queue at time 1000 hrs in the 2nd simulation, . . . , x50 = number in queue at time 1000 hrs in the 50th simulation.
♠ Our observations yield an average queue length of x̄ = 21.5 and s = √[ (1/(n−1)) Σ_{i=1}^n (xᵢ − x̄)² ] = 15.
♠ A 90% confidence interval is given as (α = 1 − 0.9 = 0.1)

  ( x̄ − z0.05 · s/√n , x̄ + z0.05 · s/√n ) = ( 21.5 − 1.65 · 15/√50 , 21.5 + 1.65 · 15/√50 ) = (17.9998, 25.0002)
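The two intervals above can be reproduced with a few lines of code. This is a sketch, not part of the slides: the helper name large_sample_ci is made up for illustration, and zα/2 is taken from Python's statistics.NormalDist rather than the table, so the queue-length interval uses z0.05 ≈ 1.645 instead of the rounded table value 1.65.

```python
from math import sqrt
from statistics import NormalDist

def large_sample_ci(x_bar, sd, n, alpha):
    """Large-sample (1 - alpha)*100% CI for mu: x_bar +/- z_{alpha/2} * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)      # z_{alpha/2} = Phi^{-1}(1 - alpha/2)
    e = z * sd / sqrt(n)
    return x_bar - e, x_bar + e

# Example 1: salaries, n = 100, x_bar = 21543, sigma = 3000, 95% CI -> 21543 +/- 588
print(large_sample_ci(21543, 3000, 100, 0.05))   # approx (20955, 22131)

# Example 2: queue length, n = 50, x_bar = 21.5, s = 15, 90% CI
print(large_sample_ci(21.5, 15, 50, 0.10))       # approx (18.0, 25.0)
```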
Construct CI (Cont'd):

Large sample CI for proportion p:
♠ Let p denote a proportion of a large population, or a probability.
♠ In order to get an estimate of this proportion, we can take a sample of n individuals from the population and check each one of them, whether or not they fulfill the criterion to be in that proportion of interest.
♠ This corresponds to a Bernoulli n-sequence, where we are only interested in the number of successes X, which in our case corresponds to the number of individuals that qualify for the interesting subgroup.
♠ An unbiased estimator of p is p̂ = X̄, whose value is p̂ = x̄.
♥ Recall: for a Bernoulli random variable, E(X) = p and Var(X) = p(1 − p).
♥ With X1, · · · , Xn i.i.d., E(X̄) = (1/n) Σ_{i=1}^n E(Xi) = p, and Var(X̄) = (1/n) Var(X1) = p(1 − p)/n.
♥ Using the C.L.T., the (1 − α)·100% CI is p̂ ± e, where e = zα/2 · √(p(1 − p)/n).
♦ In the above expression, we need to use an appropriate value to substitute for p:
  1. Conservative method: take p := 0.5, so that the (1 − α)·100% CI for p is p̂ ± zα/2/(2√n) (why conservative?).
  2. Substitution method: use p̂ instead of p, so that the (1 − α)·100% CI for p is p̂ ± zα/2 · √(p̂(1 − p̂)/n).
♦ Remarks:
  1. For large n, or when p̂ is close to 0.5, there is almost no difference at all.
  2. Conservative C.I.s are larger than C.I.s found by substitution.

Example: In the 2002 season the baseball player Sammy Sosa had a batting average of 0.288. (The batting average is the ratio of the number of hits to the times at bat.) Sammy Sosa was at bat 555 times in the 2002 season. Could the "true" batting average still be 0.300?
♠ Compute a 95% CI for the true batting average to study the above question.
♥ The conservative method gives: 0.288 ± 1.96/(2√555) ⇒ 0.288 ± 0.042.
♥ The substitution method gives: 0.288 ± 1.96 · √(0.288(1 − 0.288)/555) ⇒ 0.288 ± 0.038.
♠ The substitution method gives a slightly smaller confidence interval, but both intervals contain 0.3.
♥ There is not enough evidence to conclude that the true average is NOT 0.3; namely, 0.3 could be the true average in 2002.
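As a closing sketch (again not from the slides; the helper name proportion_ci is a made-up illustration), both methods for the Sammy Sosa question can be computed as follows.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_hat, n, alpha, conservative=False):
    """(1 - alpha)*100% large-sample CI for a proportion p."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Conservative method plugs in p = 0.5 (which maximizes p(1 - p)); substitution uses p_hat.
    e = z / (2 * sqrt(n)) if conservative else z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - e, p_hat + e

# Sammy Sosa: p_hat = 0.288, n = 555 at bats, 95% CI
print(proportion_ci(0.288, 555, 0.05, conservative=True))    # approx 0.288 +/- 0.042
print(proportion_ci(0.288, 555, 0.05, conservative=False))   # approx 0.288 +/- 0.038
# Both intervals contain 0.300, so 0.300 remains plausible as the true average.
```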