Statistics: Using Probability to Learn from Data
A Summary of Chapter 6
April 28, 2009

Data Generating Mechanisms and Inference

1. Nature: the true (unknown) parameter θ.
   Example: Two methods for manufacturing computer chips.
   p1 = probability of a defect from method 1
   p2 = probability of a defect from method 2

2. Data: X1, ..., Xn ∼ iid F_X(x; θ).
   Example: Manufacture n chips using each of the two methods.
   X = number defective from method 1, X ∼ Binomial(n, p1)
   Y = number defective from method 2, Y ∼ Binomial(n, p2)

3. Use probability to evaluate which values of θ are plausible based on the data.
   Example: What are our estimates of p1 and p2? What are plausible values for the difference p1 − p2? Are the data consistent with the hypothesis that the two methods are the same, i.e., that p1 = p2?

Three Main Themes

We don't know θ, but θ is the answer to our question. We can use the data to learn about θ.

1. Estimate θ: use the data to make an intelligent guess about the value of θ.
2. Make a confidence interval for θ: give a range of values of θ that are plausible, based on the data.
3. Test a hypothesis about θ: are the data consistent with the hypothesis that θ = θo, or should we reject the hypothesis θ = θo?

Estimation of θ

An estimator of θ is a function of the data: θ̂ = θ̂(X1, ..., Xn).

Properties of estimators:
- Unbiased: E[θ̂] = θ.
- Consistent: for a large sample size, P(|θ̂ − θ| > ε) ≈ 0 for any fixed ε > 0.
- Asymptotically normal: θ̂ is approximately N(θ, σ²/n).

Comment: An estimator is a random variable. The distribution of an estimator is called its sampling distribution.

Example: Estimate the probability of a defect from method 1.
Data: X1, ..., Xn ∼ iid Bernoulli(p1), where Xi = 1 if chip i from method 1 is defective and Xi = 0 otherwise.
Estimator of p1: p̂1 = (X1 + ... + Xn)/n = (# defective)/n.
p̂1 is unbiased: E[p̂1] = p1.
Sampling distribution: by the CLT, p̂1 is approximately N(p1, p1(1 − p1)/n).

How to Estimate θ? The Method of Maximum Likelihood

Let X1, ..., Xn be discrete. For observations X1 = x1, ..., Xn = xn, the maximum likelihood estimator (MLE) is the value θ̂ such that

P(X1 = x1, ..., Xn = xn; θ̂) ≥ P(X1 = x1, ..., Xn = xn; θ) for any θ.

Example: Estimate the probability of a defect from method 1.
Data: X1, ..., Xn ∼ iid Bernoulli(p1), coded as above.
Estimator of p1: p̂1 = (X1 + ... + Xn)/n = (# defective)/n.
Exercise: Show that p̂1 is the maximum likelihood estimator of p1.

Confidence Interval for θ

What is a range of plausible values for θ? A 95% confidence interval for θ is an interval of the form [θ̂ − δ, θ̂ + δ] such that

P(θ̂ − δ < θ < θ̂ + δ) = 95%.

Comments:
- What's random? The interval is random.
- What's fixed? The true, unknown θ is fixed.
- I construct an interval that has a high (i.e., 95%) chance of capturing the true θ. If I repeat the experiment 100 times, I expect 95% of the confidence intervals to contain the true θ.
- A confidence interval is called conservative if P(θ̂ − δ < θ < θ̂ + δ) ≥ 95%.

Example: Estimate the probability of a defect from method 1.
Estimator of p1: p̂1 = (X1 + ... + Xn)/n = (# defective)/n.
Use the CLT to show that for large n, a conservative 95% confidence interval for p1 is

[ p̂1 − 1.96/(2√n), p̂1 + 1.96/(2√n) ].

Approximately 95% of confidence intervals constructed in this way will contain the true p1. This is why 1.96/(2√n) ≈ 1/√n is called the margin of error. As we collect more data, n increases and the interval gets narrower: we have a more precise estimate of p1 when we have more data.
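The coverage claim above can be checked with a small simulation. The following sketch is not from the original summary; the true defect probability p1 = 0.1, the sample size n = 400, and the number of repeated experiments are arbitrary choices for illustration. Each pass draws a fresh Bernoulli sample, forms the conservative interval p̂1 ± 1.96/(2√n), and records whether it contains the true p1.

import numpy as np

rng = np.random.default_rng(2009)
p1_true = 0.1                 # hypothetical true defect probability (illustrative)
n = 400                       # chips manufactured in each experiment (illustrative)
n_experiments = 10_000        # number of repeated experiments

delta = 1.96 / (2 * np.sqrt(n))           # conservative margin of error
covered = 0
for _ in range(n_experiments):
    x = rng.binomial(1, p1_true, size=n)  # X1, ..., Xn ~ iid Bernoulli(p1)
    p1_hat = x.mean()                     # estimator: (# defective) / n
    if p1_hat - delta < p1_true < p1_hat + delta:
        covered += 1

print(f"empirical coverage: {covered / n_experiments:.3f}")   # expect at least 0.95

Because the interval is conservative (it replaces p1(1 − p1) by its largest possible value, 1/4), the empirical coverage typically comes out above 95% when p1 is far from 1/2.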
Test a Hypothesis

Null hypothesis: Ho: θ = θo.
Alternative hypothesis: H1: θ ≠ θo.

Test procedure: reject the null hypothesis in favor of the alternative if

P(observed data | Ho is true) < 0.05.

If the observed data are highly unlikely when Ho is true, then we have grounds for rejecting Ho in favor of H1.

Example: I give you a coin to toss, and I tell you that the coin is fair. Do you believe me?

Let p = P(head).
Hypotheses: Ho: the coin is fair, i.e., p = 1/2. H1: the coin is unfair, i.e., p < 1/2 or p > 1/2.
Data: You toss the coin 20 times and get 18 heads and 2 tails!
Test procedure: If the coin is fair (i.e., p = 1/2), then, writing Bin_{n=20, p=0.5}(k) for the Binomial(20, 0.5) CDF at k,

P(18 or more heads) = 1 − Bin_{n=20, p=0.5}(17) ≈ 0.0002
P(2 or fewer heads) = Bin_{n=20, p=0.5}(2) ≈ 0.0002
P(outcome as extreme as 18 heads and 2 tails) ≈ 0.0002 + 0.0002 = 0.0004

Conclusion: If the coin is fair, then you got extremely lucky: you observed a very unusual outcome. On the basis of these results, we reject the hypothesis that the coin is fair.
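The two tail probabilities can be verified numerically. A minimal sketch, assuming SciPy is available (it is not part of the original summary), using the Binomial(20, 0.5) distribution:

from scipy.stats import binom

n, p = 20, 0.5                       # 20 tosses of a fair coin
p_upper = 1 - binom.cdf(17, n, p)    # P(18 or more heads)
p_lower = binom.cdf(2, n, p)         # P(2 or fewer heads)

print(p_upper, p_lower)              # each is roughly 0.0002
print(p_upper + p_lower)             # two-sided probability, roughly 0.0004

Since this two-sided probability is far below 0.05, the test procedure above rejects Ho: the data are not consistent with a fair coin.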