Statistics: Using Probability to Learn from Data
A Summary of Chapter 6
April 28, 2009
Data Generating Mechanisms and Inference
1. Nature – the true (unknown) parameter θ
Example: Two methods for manufacturing computer chips.
p1 = Probability of a defect from method 1
p2 = Probability of a defect from method 2
2. Data: X1, . . . , Xn ∼ iid FX(x; θ)
Example: Manufacture n chips using each of the two methods
X = Number defective from method 1, X ∼ Binomial(n, p1)
Y = Number defective from method 2, Y ∼ Binomial(n, p2)
3. Use probability to evaluate what values of θ are plausible based on
the data.
Example:
What are our estimates of p1 and p2 ?
What are plausible values for the difference p1 − p2 ?
Are the data consistent with the hypothesis that the two methods are
the same?
i.e., are the data consistent with the hypothesis that p1 = p2?
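As a concrete sketch of this three-step view, the Python snippet below simulates the chip example end to end; the "true" defect rates p1, p2 and the sample size n are made-up values that nature would know but the experimenter would not:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" parameters, unknown to the experimenter in practice
p1, p2 = 0.10, 0.15   # defect rates for methods 1 and 2 (made-up values)
n = 500               # chips manufactured with each method

# Step 2: nature generates the data
X = rng.binomial(n, p1)   # number defective from method 1
Y = rng.binomial(n, p2)   # number defective from method 2

# Step 3: use the data to learn about the parameters
p1_hat, p2_hat = X / n, Y / n
print(f"p1_hat = {p1_hat:.3f}, p2_hat = {p2_hat:.3f}, "
      f"estimated difference = {p1_hat - p2_hat:.3f}")
```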
Three Main Themes
We don’t know θ, but θ is the answer to our question. We can use
the data to learn about θ
1. Estimate θ – Use the data to make an intelligent guess about the value
of θ
2. Make a confidence interval for θ – Give a range of values of θ that are
plausible, based on the data
3. Test a hypothesis about θ – Are the data consistent with the
hypothesis that θ = θo, or should we reject the hypothesis that θ = θo?
Estimation of θ
An estimator of θ is a function of the data:
θ̂ = θ̂(X1 , . . . , Xn )
Properties of Estimators
Unbiased: E[θ̂] = θ
Consistent: For a large sample size, P(|θ̂ − θ| > ε) ≈ 0 for any ε > 0
Asymptotically Normal: θ̂ is approximately N(θ, σ²/n) for large n
Comment: An estimator is a random variable. The distribution of an
estimator is called the sampling distribution.
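This comment can be checked by simulation. Below is a minimal Python sketch (the true θ, the sample size n, and the number of replications are made-up values): it draws many independent datasets, computes the estimator θ̂ = sample mean on each, and summarizes the resulting sampling distribution.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 0.3, 200, 10_000  # made-up true parameter, sample size, replications

# Each row is one dataset X1, ..., Xn; each dataset gives one value of theta_hat,
# so theta_hat below is a sample from the estimator's sampling distribution.
data = rng.binomial(1, theta, size=(reps, n))   # Bernoulli(theta) samples
theta_hat = data.mean(axis=1)                   # the estimator, once per dataset

# Unbiased: the average of theta_hat across datasets is close to theta
print(f"mean of theta_hat = {theta_hat.mean():.4f} (theta = {theta})")
# Asymptotically normal: the spread matches sqrt(sigma^2 / n)
print(f"sd of theta_hat   = {theta_hat.std():.4f} "
      f"(sqrt(theta*(1-theta)/n) = {np.sqrt(theta*(1-theta)/n):.4f})")
```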
Estimation of θ
Example: Estimate the probability of a defect from method 1.
Data: X1, . . . , Xn ∼ iid Bernoulli(p1)
Xi = 1 if defect from method 1, Xi = 0 otherwise
Estimator of p1:
p̂1 = (1/n) Σ_{i=1}^{n} Xi = (# defective)/n
p̂1 is unbiased: E[p̂1] = p1
Sampling Distribution: By the CLT, for large n,
p̂1 is approximately N(p1, p1(1 − p1)/n)
How to Estimate θ?
Method of Maximum Likelihood
Let X1, . . . , Xn be discrete. For observations X1 = x1, . . . , Xn = xn,
the Maximum Likelihood Estimator (MLE) is the value θ̂ such that
P(X1 = x1, . . . , Xn = xn; θ̂) ≥ P(X1 = x1, . . . , Xn = xn; θ)
for any θ.
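The definition can be illustrated numerically. The sketch below (with made-up Bernoulli data) evaluates the joint probability θ^s (1 − θ)^(n−s), where s = Σ xi, over a grid of candidate θ values and reports the maximizer:

```python
import numpy as np

x = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0])  # made-up observed defects
n, s = len(x), x.sum()

theta_grid = np.linspace(0.001, 0.999, 999)
# Joint probability of the observed sequence under each candidate theta
likelihood = theta_grid**s * (1 - theta_grid)**(n - s)

theta_mle = theta_grid[np.argmax(likelihood)]
print(f"grid MLE = {theta_mle:.3f}, sample mean = {s / n:.3f}")  # both ~0.3
```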
How to Estimate θ?
Example: Estimate the probability of a defect from method 1.
Data: X1, . . . , Xn ∼ iid Bernoulli(p1)
Xi = 1 if defect from method 1, Xi = 0 otherwise
Estimator of p1:
p̂1 = (1/n) Σ_{i=1}^{n} Xi = (# defective)/n
Exercise: Show that p̂1 is the maximum likelihood estimator of p1 .
Confidence Interval for θ
What is a range of plausible values for θ?
A 95% confidence interval for θ is an interval of the form
[θ̂ − δ, θ̂ + δ],
such that
P(θ̂ − δ < θ < θ̂ + δ) = 95%
Comments:
What’s random? The interval is random.
What’s fixed? The true, unknown θ is fixed.
I construct an interval that has a high (i.e., 95%) chance of capturing
the true θ.
If I repeat the experiment 100 times, I expect 95% of the confidence
intervals to contain the true θ.
A confidence interval is called conservative if
P(θ̂ − δ < θ < θ̂ + δ) ≥ 95%
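The repeated-experiment interpretation can be verified by simulation. A minimal sketch, using the usual normal-approximation interval p̂ ± 1.96 √(p̂(1 − p̂)/n) and made-up values for the true θ and n:

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 0.4, 300, 10_000   # made-up true parameter, sample size, repeats

covered = 0
for _ in range(reps):
    x = rng.binomial(1, theta, size=n)      # one run of the experiment
    theta_hat = x.mean()
    delta = 1.96 * np.sqrt(theta_hat * (1 - theta_hat) / n)  # half-width
    if theta_hat - delta < theta < theta_hat + delta:
        covered += 1                        # this interval captured the true theta

print(f"coverage = {covered / reps:.3f}  (target: 0.95)")
```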
Confidence Interval for θ
Example: Estimate the probability of a defect from method 1.
Estimator of p1:
p̂1 = (1/n) Σ_{i=1}^{n} Xi = (# defective)/n
Use the CLT to show that for large n, a conservative 95% confidence
interval for p1 is
[p̂1 − 1.96/(2√n), p̂1 + 1.96/(2√n)].
(Conservative because p1(1 − p1) ≤ 1/4, so the CLT half-width
1.96 √(p̂1(1 − p̂1)/n) is at most 1.96/(2√n).)
Approximately 95% of confidence intervals constructed in this way will
contain the true p1.
This is why 1.96/(2√n) ≈ 1/√n is called the margin of error.
As we collect more data, n increases, and we get a narrower interval.
We have a more precise estimator of p1 when we have more data.
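A quick sketch of this interval in code; the defect counts are made up, and only the formula p̂1 ± 1.96/(2√n) comes from the slide:

```python
import numpy as np

def conservative_ci(defects, n):
    """Conservative 95% CI for p1: p1_hat +/- 1.96 / (2 sqrt(n))."""
    p_hat = defects / n
    delta = 1.96 / (2 * np.sqrt(n))   # margin of error, ~1/sqrt(n)
    return p_hat - delta, p_hat + delta

# The interval narrows as n grows (made-up data with ~10% defects)
for n in [100, 400, 1600]:
    lo, hi = conservative_ci(round(0.1 * n), n)
    print(f"n = {n:4d}: ({lo:.3f}, {hi:.3f}), width = {hi - lo:.3f}")
```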
Test a Hypothesis
Null Hypothesis: Ho: θ = θo
Alternative Hypothesis: H1: θ ≠ θo
Test Procedure: Reject the null hypothesis in favor of the alternative
if
P(data as extreme as observed | Ho is true) < 0.05
If the observed data are highly unlikely when Ho is true, then we have
grounds for rejecting Ho in favor of H1 .
Test a Hypothesis
Example: I give you a coin to toss, and I tell you that the coin is fair.
Do you believe me? Let p = Prob(head).
Hypotheses:
Ho: Coin fair ↔ p = 1/2.
H1: Coin unfair ↔ p < 1/2 or p > 1/2.
Data: You toss the coin 20 times and get
18 heads and 2 tails!
Test procedure: If the coin is fair (i.e., p = 1/2), then, writing
Bin_{n,p}(k) for the Binomial CDF,
P(18 or more H's) = 1 − Bin_{n=20, p=0.5}(17) ≈ 0.0002
P(2 or fewer H's) = Bin_{n=20, p=0.5}(2) ≈ 0.0002
P(outcome as extreme as 18 H's and 2 T's) ≈ 0.0002 + 0.0002 = 0.0004
Conclusion: If the coin is fair, then you got extremely lucky. You
observed a very unusual outcome. On the basis of these results, we
reject the hypothesis that the coin is fair.
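These tail probabilities can be checked directly with scipy's binomial CDF (a verification sketch, not part of the original slides):

```python
from scipy.stats import binom

n, p = 20, 0.5
p_upper = 1 - binom.cdf(17, n, p)   # P(18 or more heads) under a fair coin
p_lower = binom.cdf(2, n, p)        # P(2 or fewer heads)
p_value = p_upper + p_lower         # two-sided p-value

print(f"P(X >= 18) = {p_upper:.4f}")   # ~0.0002
print(f"P(X <= 2)  = {p_lower:.4f}")   # ~0.0002
print(f"p-value    = {p_value:.4f}")   # ~0.0004, far below 0.05
```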