ortion

Stat 330 (Spring 2015): slide set 28
Example 2: Suppose that we are interested in the large time probability p that a server is available. In 100 simulations, a server was available at time t = 1000 hrs in 60 of them. What is a 95% confidence interval for this probability?
♠ If 60 out of 100 simulations showed a free server, we can use p̂ = 60/100 = 0.6 as an estimate for p, or use the conservative value p̂ = 0.5.
♠ For a 95% confidence interval, $z_{\alpha/2} = z_{(1-0.95)/2} = \Phi^{-1}(0.975) = 1.96$.
The conservative confidence interval is:
$$\hat p \pm z_{\alpha/2}\,\frac{1}{2\sqrt{n}} = 0.6 \pm 1.96\cdot\frac{1}{2\sqrt{100}} = 0.6 \pm 0.098.$$
For the confidence interval using substitution we get:
$$\hat p \pm z_{\alpha/2}\sqrt{\frac{\hat p(1-\hat p)}{n}} = 0.6 \pm 1.96\sqrt{\frac{0.6\cdot 0.4}{100}} = 0.6 \pm 0.096.$$
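The two intervals for the server example can be checked numerically. The following is a minimal Python sketch using only the standard library; the function names `z_value`, `substitution_ci` and `conservative_ci` are illustrative assumptions, not part of the course material, and z is taken from `statistics.NormalDist` instead of being rounded to 1.96.

```python
from math import sqrt
from statistics import NormalDist

def z_value(conf_level):
    """Return z_{alpha/2} for a two-sided interval (about 1.96 for 0.95)."""
    alpha = 1.0 - conf_level
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

def substitution_ci(p_hat, n, conf_level):
    """Interval whose half-width uses sqrt(p_hat * (1 - p_hat) / n)."""
    half = z_value(conf_level) * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half, p_hat + half

def conservative_ci(p_hat, n, conf_level):
    """Interval whose half-width uses the worst case p(1 - p) <= 1/4."""
    half = z_value(conf_level) / (2.0 * sqrt(n))
    return p_hat - half, p_hat + half

# Server example: 60 of 100 simulations showed a free server.
sub_lo, sub_hi = substitution_ci(0.6, 100, 0.95)
con_lo, con_hi = conservative_ci(0.6, 100, 0.95)
print(f"substitution: 0.6 +/- {(sub_hi - sub_lo) / 2:.3f}")
print(f"conservative: 0.6 +/- {(con_hi - con_lo) / 2:.3f}")
```

The conservative interval is slightly wider, as expected, because it does not use the observed p̂ in the standard error.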
Last update: April 8, 2015
Stat 330 (Spring 2015)
Slide set 28
$$\frac{z_{\alpha/2}}{2\sqrt{n}} \le 0.01 \iff \sqrt{n} \ge \frac{2.33}{2\cdot 0.01} = 116.5$$
Two Populations
CI for mean difference μ1 − μ2, or the difference of two proportions, p1 − p2
♠ X̄1 − X̄2 and p̂1 − p̂2 are the unbiased estimators for μ1 − μ2 and p1 − p2, respectively.
♠ Confidence intervals are summarized below:

large n confidence interval for μ1 − μ2 (based on independent X̄1 and X̄2):
$$\bar x_1 - \bar x_2 \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
(when $\sigma_1^2$, $\sigma_2^2$ are unknown, substitute $s_1^2$, $s_2^2$ respectively)

large n confidence interval for p1 − p2 (based on independent p̂1 and p̂2):
$$\hat p_1 - \hat p_2 \pm \frac{z_{\alpha/2}}{2}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \quad \text{(conservative)}$$
or
$$\hat p_1 - \hat p_2 \pm z_{\alpha/2}\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}} \quad \text{(substitution)}$$
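The conservative and substitution forms for p1 − p2 are linked by the bound p(1 − p) ≤ 1/4, which holds for every p in [0, 1]. A short worked step, written in the notation of the summary and offered as a sketch of the reasoning, not as the slides' own derivation:

$$\hat p_i(1-\hat p_i) \le \tfrac14 \ (i = 1, 2) \implies \sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}} \le \sqrt{\frac{1}{4n_1} + \frac{1}{4n_2}} = \frac12\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$

So the conservative interval is never narrower than the substitution interval, and the two coincide only when p̂1 = p̂2 = 0.5.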
Examples for CI of proportion
Example 1: Suppose we want to estimate the fraction of records in the 2000 IRS data base that have a taxable income over 35K.
Question: We want to get a 98% confidence interval and wish to estimate the quantity to within 0.01. How many samples do we need?
♠ The size of the CI is then 0.02, to satisfy the desired condition (W.L.O.G., we choose a conservative confidence interval for easy computation).
♠ Using the second definition: recall that P(|θ̂ − θ| < e) ≥ 1 − α is the second definition. With the conservative half-width $e = z_{\alpha/2}/(2\sqrt{n})$ and $z_{\alpha/2} = 2.33$, we have
$$2e \le 0.02 \iff e \le 0.01,$$
so that n ≥ 13573.
♠ In general, $n \ge 0.25\,(z_{\alpha/2}/\Delta)^2$, where Δ is half of the desired size of the confidence interval.
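The sample-size rule n ≥ 0.25 (z/Δ)² can be evaluated directly. The following is a minimal Python sketch; the function name `conservative_sample_size` is an illustrative assumption, and z = 2.33 is the rounded 98% value used on the slide (the unrounded normal quantile would give a slightly smaller n).

```python
from math import ceil

def conservative_sample_size(z, delta):
    """Smallest integer n with z / (2 * sqrt(n)) <= delta.

    Equivalent to n >= 0.25 * (z / delta)**2, up to floating-point rounding.
    """
    return ceil(0.25 * (z / delta) ** 2)

# IRS example: 98% confidence (z = 2.33), estimate within delta = 0.01.
n = conservative_sample_size(z=2.33, delta=0.01)
print(n)
```

Because the bound uses the worst case p = 0.5, the resulting n is sufficient whatever the true proportion turns out to be.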
Stat 330 (Spring 2015): slide set 28
Two Populations: Simple derivation
Derivation: The arguments are very similar in both cases, so we will only discuss the confidence interval for the difference between the two means.
1. X̄1 − X̄2 is approximately normal, since X̄1 and X̄2 are approximately normal and independent, with
$$E[\bar X_1 - \bar X_2] = E[\bar X_1] - E[\bar X_2] = \mu_1 - \mu_2$$
$$Var[\bar X_1 - \bar X_2] = Var[\bar X_1] + (-1)^2\,Var[\bar X_2] = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$
2. $\bar X_i \sim N(\mu_i, \sigma_i^2/n_i)$ for i = 1, 2
3. Then we can use similar arguments as before and get a C.I. for μ1 − μ2 as shown above.

Two Populations: Example
Example 1: Assume we have two parts of the IRS database: East Coast and West Coast. We want to compare the mean taxable income reported from the two regions in 2000.

                          East Coast       West Coast
# of sampled records:     n1 = 1000        n2 = 2000
mean taxable income:      x̄1 = $37200      x̄2 = $42000
standard deviation:       s1 = $10100      s2 = $15600

♠ We can, for example, compute a two-sided 95% confidence interval for μ1 − μ2 = difference in mean taxable income, as reported from 2000 tax returns, between East and West Coast as follows:
$$37000 - 42000 \pm z_{\alpha/2}\sqrt{\frac{10100^2}{1000} + \frac{15600^2}{2000}} = -5000 \pm 927$$
♠ Note: this shows pretty conclusively that the mean West Coast taxable income is higher than the mean East Coast taxable income (in the report from 2000). The interval contains only negative numbers.
♠ However, if it contained 0, the message wouldn't be so clear.

Example 2: Two different digital communication systems are compared: 100 large messages are sent via each system and we determine how many are corrupted in transmission, p̂1 = 0.05 and p̂2 = 0.10. What's the difference in the corruption rates? Find a 98% confidence interval.
♥ Use:
$$0.05 - 0.1 \pm 2.33\cdot\sqrt{\frac{0.05\cdot 0.95}{100} + \frac{0.10\cdot 0.90}{100}} = -0.05 \pm 0.086$$
♥ This calculation tells us that, based on these sample sizes, we don't even have a solid idea about the sign of p1 − p2, i.e. we can't tell which of the two p_i is larger.

Small samples when the standard deviation σ is unknown
Single Population:
• If the standard deviation σ is unknown, but the sample X1, ..., Xn can be assumed to come from a Normal distribution, then instead of using $z_{\alpha/2}$ we may use $t_{n-1,\alpha/2}$, which is the corresponding percentile of the t distribution with n − 1 degrees of freedom.
• The resulting 100 × (1 − α)% confidence interval for μ is
$$\bar x \pm t_{n-1,\alpha/2}\cdot\frac{s}{\sqrt{n}}.$$
This is helpful when the sample size n is small, since the CLT does not apply.
• See pages below for a note on the t-distribution.
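The two worked two-population examples (East/West Coast incomes and the two communication systems) can be reproduced numerically. The following is a minimal Python sketch; the function names `two_mean_ci` and `two_prop_ci` are illustrative assumptions. For the income example it uses x̄1 = 37000, the value appearing in the displayed computation; the table lists x̄1 = $37200, which would only shift the center of the interval, not its half-width.

```python
from math import sqrt

def two_mean_ci(x1, x2, s1, s2, n1, n2, z):
    """Large-sample CI for mu1 - mu2, with s1, s2 substituted for sigma1, sigma2."""
    center = x1 - x2
    half = z * sqrt(s1**2 / n1 + s2**2 / n2)
    return center, half

def two_prop_ci(p1, p2, n1, n2, z):
    """Large-sample (substitution) CI for p1 - p2."""
    center = p1 - p2
    half = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return center, half

# East/West Coast incomes, 95% interval (z = 1.96).
income_center, income_half = two_mean_ci(37000, 42000, 10100, 15600, 1000, 2000, z=1.96)
# Corruption rates of the two communication systems, 98% interval (z = 2.33).
rate_center, rate_half = two_prop_ci(0.05, 0.10, 100, 100, z=2.33)

print(f"{income_center:.0f} +/- {income_half:.0f}")
print(f"{rate_center:.2f} +/- {rate_half:.3f}")
# An interval that contains 0 leaves the sign of the difference undetermined.
print("income interval contains 0:", abs(income_center) < income_half)
print("rate interval contains 0:", abs(rate_center) < rate_half)
```

Returning the center and half-width separately makes the "does the interval contain 0" check a single comparison.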
Small samples; σ is unknown: continued...
Two Populations:
• Population variances $\sigma_1^2$ and $\sigma_2^2$ are unknown.
• Assume equal variances $\sigma_1^2 = \sigma_2^2 = \sigma^2$.
The 100 × (1 − α)% confidence interval for θ = μ1 − μ2 is
$$\bar x_1 - \bar x_2 \pm t_{n+m-2,\alpha/2}\cdot s_p\sqrt{\frac{1}{n} + \frac{1}{m}},$$
where $s_p^2$ is the pooled variance calculated as
$$s_p^2 = \frac{(n-1)s_1^2 + (m-1)s_2^2}{n+m-2}.$$
• Assume unequal variances $\sigma_1^2 \ne \sigma_2^2$.
The 100 × (1 − α)% confidence interval for θ = μ1 − μ2 is
$$\bar x_1 - \bar x_2 \pm t_{\nu,\alpha/2}\sqrt{\frac{s_1^2}{n} + \frac{s_2^2}{m}},$$
where
$$\nu = \frac{\left(\frac{s_1^2}{n} + \frac{s_2^2}{m}\right)^2}{\frac{s_1^4}{n^2(n-1)} + \frac{s_2^4}{m^2(m-1)}}.$$

t distribution
• A random variable T that is of the form
$$T = \frac{Z}{\sqrt{V/\nu}}$$
is said to have a t distribution with ν degrees of freedom, for random variables Z and V such that:
1. Z ∼ N(0, 1) and V ∼ $\chi^2_\nu$ (a chi-square distribution with ν degrees of freedom).
2. Z and V are independent.
3. Chi-square distribution is a special case of the Gamma distribution.
• The diagram shows the probability density function of this distribution for several different degrees of freedom:
[Figure: t-distribution density curves for several degrees of freedom]
• Observe that as the degrees of freedom increase, the shape of the t-distribution tends towards that of the Normal distribution.
• Read percentiles of the t distribution from Table A5 of Baron's textbook.
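The pooled variance and the degrees of freedom ν for the unequal-variance case are easy to get wrong by hand, so a small numerical sketch may help. The following Python uses only the standard library; the function names and the sample values (n = 10, m = 12, s1 = 2.0, s2 = 3.0) are hypothetical and chosen for illustration only. The t percentile is passed in by the caller, since it is read from a table such as Table A5.

```python
from math import sqrt

def pooled_variance(s1, s2, n, m):
    """s_p^2 = ((n-1) s1^2 + (m-1) s2^2) / (n + m - 2)."""
    return ((n - 1) * s1**2 + (m - 1) * s2**2) / (n + m - 2)

def unequal_variance_df(s1, s2, n, m):
    """Degrees of freedom nu for the unequal-variance interval."""
    a, b = s1**2 / n, s2**2 / m
    return (a + b) ** 2 / (a**2 / (n - 1) + b**2 / (m - 1))

def pooled_half_width(s1, s2, n, m, t_crit):
    """Half-width t * s_p * sqrt(1/n + 1/m); t_crit has n + m - 2 df."""
    return t_crit * sqrt(pooled_variance(s1, s2, n, m)) * sqrt(1 / n + 1 / m)

def unequal_half_width(s1, s2, n, m, t_crit):
    """Half-width t * sqrt(s1^2/n + s2^2/m); t_crit has nu df."""
    return t_crit * sqrt(s1**2 / n + s2**2 / m)

# Hypothetical sample summaries, not taken from the slides.
n, m, s1, s2 = 10, 12, 2.0, 3.0
sp2 = pooled_variance(s1, s2, n, m)
nu = unequal_variance_df(s1, s2, n, m)
print("pooled variance:", sp2)
print("degrees of freedom (unequal variances):", nu)
# t_crit = 2.086 is the tabulated 0.975 percentile for 20 degrees of freedom.
print("pooled half-width:", pooled_half_width(s1, s2, n, m, t_crit=2.086))
```

In general ν is not an integer; a common convention, assumed here and not stated in the slides, is to round it down before looking up the percentile.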