Slide set 28 Stat 330 (Spring 2015) Last update: April 8, 2015

Examples for CI of proportion
Example 1: Suppose we want to estimate the fraction of records in the
2000 IRS data base that have a taxable income over 35K.
Question: We want a 98% confidence interval that estimates this fraction to within 0.01. How many samples do we need?
♠ To satisfy this condition, the total width of the CI can be at most 0.02 (W.L.O.G., we choose the conservative confidence interval for easy computation):
♠ Using the second definition: recall that P(|θ̂ − θ| < e) ≥ 1 − α is the second definition, so we have
$$2e \le 0.02 \iff \frac{z_{\alpha/2}}{2\sqrt{n}} \le 0.01 \iff \sqrt{n} \ge \frac{2.33}{2 \cdot 0.01} = 116.5,$$
so that n ≥ 13573.
♠ In general, $n \ge 0.25 \left( \frac{z_{\alpha/2}}{\Delta} \right)^2$, where ∆ is half of the desired width of the confidence interval.
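The sample-size rule can be sketched in Python. This is a minimal illustration and not part of the original slides; the function name `conservative_sample_size` and its parameters are assumptions. Passing z = 2.33 reproduces the rounded table value used in Example 1; if z is omitted, the exact normal quantile (about 2.326) is used, which gives a slightly smaller n.

```python
import math
from statistics import NormalDist


def conservative_sample_size(half_width, confidence, z=None):
    """Smallest integer n with n >= 0.25 * (z / half_width)**2."""
    if z is None:
        # Exact upper alpha/2 quantile of the standard normal distribution.
        alpha = 1.0 - confidence
        z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.ceil(0.25 * (z / half_width) ** 2)


# Rounded z = 2.33 as in Example 1: sqrt(n) >= 116.5.
print(conservative_sample_size(0.01, 0.98, z=2.33))
# Exact quantile instead of the rounded table value.
print(conservative_sample_size(0.01, 0.98))
```

Rounding z up to 2.33 errs on the safe side, since it can only increase the required sample size.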
Example 2: Suppose that we are interested in the large-time probability p that a server is available. In 100 simulations, a server was available at time t = 1000 hrs in 60 of them. What is a 95% confidence interval for this probability?
♠ Since 60 out of 100 simulations showed a free server, we use p̂ = 60/100 = 0.6 as the estimate for p. For the standard error we can either substitute p̂ = 0.6 or use the conservative value 0.5, which maximizes p(1 − p).
♠ For a 95% confidence interval, $z_{\alpha/2} = z_{(1-0.95)/2} = \Phi^{-1}(0.975) = 1.96$.
The conservative confidence interval is:
$$\hat{p} \pm z_{\alpha/2} \frac{1}{2\sqrt{n}} = 0.6 \pm 1.96 \cdot \frac{1}{2\sqrt{100}} = 0.6 \pm 0.098.$$
For the confidence interval using substitution we get:
$$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.6 \pm 1.96 \sqrt{\frac{0.6 \cdot 0.4}{100}} = 0.6 \pm 0.096.$$
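Both intervals for the server example can be checked numerically. The following is a minimal sketch, assuming a hypothetical helper `proportion_ci` (not from the slides) that returns the estimate and the margin of error; it uses the exact normal quantile rather than the rounded 1.96.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, confidence, conservative=False):
    """Return (estimate, margin) for a large-n CI of a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    if conservative:
        # Worst case p(1 - p) <= 1/4 gives standard error 1 / (2 sqrt(n)).
        se = 1.0 / (2.0 * math.sqrt(n))
    else:
        # Substitute the estimate p_hat for the unknown p.
        se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat, z * se


for flag in (True, False):
    p_hat, margin = proportion_ci(60, 100, 0.95, conservative=flag)
    print(f"conservative={flag}: {p_hat:.2f} +/- {margin:.3f}")
```

The two margins are close here because p̂ = 0.6 is near 0.5; the conservative interval becomes noticeably wider only when p̂ is near 0 or 1.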
Two Populations
CI for mean difference µ1 − µ2, or, the difference of two proportions, p1 − p2
♠ X̄1 − X̄2 and p̂1 − p̂2 are the unbiased estimators for µ1 − µ2 and p1 − p2,
respectively.
♠ Confidence intervals are summarized below:
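Large n confidence interval for µ1 − µ2 (based on independent X̄1 and X̄2):
$$\bar{x}_1 - \bar{x}_2 \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
(when $\sigma_1^2$, $\sigma_2^2$ are unknown, substitute $s_1^2$, $s_2^2$ respectively)

Large n confidence interval for p1 − p2 (based on independent p̂1 and p̂2):
$$\hat{p}_1 - \hat{p}_2 \pm \frac{z_{\alpha/2}}{2} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \quad \text{(conservative)}$$
or
$$\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \quad \text{(substitution)}$$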
Two Populations: Simple derivation
Derivation: The arguments are very similar in both cases - we will only
discuss the confidence interval for the difference between the two means.
1. X̄1 − X̄2 is approximately normal, since X̄1 and X̄2 are approximately normal and independent.
2. With $\bar{X}_i \sim N(\mu_i, \sigma_i^2/n_i)$ for i = 1, 2, we have
$$E[\bar{X}_1 - \bar{X}_2] = E[\bar{X}_1] - E[\bar{X}_2] = \mu_1 - \mu_2$$
$$Var[\bar{X}_1 - \bar{X}_2] = Var[\bar{X}_1] + (-1)^2 Var[\bar{X}_2] = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$
3. Then we can use similar arguments as before to get a C.I. for µ1 − µ2 as shown above.
Two Populations: Example
Example 1: Assume we have two parts of the IRS database: East Coast and West Coast. We want to compare the mean taxable income reported from the two regions in 2000.
                        East Coast       West Coast
# of sampled records:   n1 = 1000        n2 = 2000
mean taxable income:    x̄1 = $37200      x̄2 = $42000
standard deviation:     s1 = $10100      s2 = $15600
♠ We can, for example, compute a two-sided 95% confidence interval for µ1 − µ2, the difference in mean taxable income as reported on 2000 tax returns between East and West Coast, as follows:
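♠ With $z_{\alpha/2} = 1.96$ and the sample values from the table:
$$37200 - 42000 \pm z_{\alpha/2} \sqrt{\frac{10100^2}{1000} + \frac{15600^2}{2000}} = -4800 \pm 927$$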
♠ Note: this shows pretty conclusively that the mean West Coast taxable
income is higher than the mean East Coast taxable income (in the report
from 2000). The interval contains only negative numbers.
♠ However, if it contained 0, the message would not be so clear.
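The large-n interval for a difference of means can be sketched as follows, using the summary statistics from the table in Example 1. The helper name `diff_means_ci` is hypothetical and the sample standard deviations are substituted for the unknown σ1, σ2, as the summary of confidence intervals suggests.

```python
import math
from statistics import NormalDist


def diff_means_ci(x1, s1, n1, x2, s2, n2, confidence):
    """Return (center, margin) of the large-n CI for mu1 - mu2."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    # Variances of independent sample means add.
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return x1 - x2, z * se


center, margin = diff_means_ci(37200, 10100, 1000, 42000, 15600, 2000, 0.95)
lower, upper = center - margin, center + margin
print(f"{center:.0f} +/- {margin:.0f}")
# If both endpoints share a sign, 0 is not in the interval.
print("interval excludes 0:", lower > 0 or upper < 0)
```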
Example 2: Two different digital communication systems are compared. We send 100 large messages via each system and determine how many are corrupted in transmission: p̂1 = 0.05 and p̂2 = 0.10. What is the difference in the corruption rates? Find a 98% confidence interval.
♥ Use:
$$0.05 - 0.10 \pm 2.33 \cdot \sqrt{\frac{0.05 \cdot 0.95}{100} + \frac{0.10 \cdot 0.90}{100}} = -0.05 \pm 0.086$$
♥ This calculation tells us that, based on these sample sizes, we don't even have a solid idea about the sign of p1 − p2, i.e. we can't tell which of the two p_i is larger.
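The two-proportion interval for the communication systems can be reproduced with a short sketch. The function `diff_props_ci` is a hypothetical helper, not something from the slides; it uses the exact normal quantile (about 2.326) instead of the rounded 2.33, which does not change the margin at three decimals.

```python
import math
from statistics import NormalDist


def diff_props_ci(p1, n1, p2, n2, confidence, conservative=False):
    """Return (center, margin) of the large-n CI for p1 - p2."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    if conservative:
        # Bound each p(1 - p) by 1/4.
        se = 0.5 * math.sqrt(1.0 / n1 + 1.0 / n2)
    else:
        # Substitute the sample proportions.
        se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p1 - p2, z * se


center, margin = diff_props_ci(0.05, 100, 0.10, 100, 0.98)
print(f"{center:.2f} +/- {margin:.3f}")
# The interval straddles 0, so the sign of p1 - p2 is not determined.
print("contains 0:", center - margin < 0 < center + margin)
```

With proportions this far from 0.5, the conservative version (`conservative=True`) is considerably wider than the substitution version.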
Small samples when the standard deviation σ is unknown
Single Population:
• If the standard deviation σ is unknown, but the sample X1, ..., Xn can be assumed to come from a Normal distribution, then instead of using $z_{\alpha/2}$ we may use $t_{n-1,\alpha/2}$, which is the corresponding percentile of the t distribution with n − 1 degrees of freedom.
• The resulting 100 × (1 − α)% confidence interval for µ is
$$\bar{x} \pm t_{n-1,\alpha/2} \cdot \frac{s}{\sqrt{n}}.$$
This is helpful when the sample size n is small, since the normal approximation from the CLT cannot be relied on.
• See pages below for a note on the t-distribution.
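A small-sample t interval can be sketched as below. The data values are made up purely for illustration, the function name `t_interval` is an assumption, and the critical value is passed in by hand (here $t_{4,0.025} \approx 2.776$, as read from a t table) because the Python standard library has no t quantile function.

```python
import math
from statistics import mean, stdev


def t_interval(sample, t_crit):
    """Return (x_bar, margin) for the CI x_bar +/- t_crit * s / sqrt(n)."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation (divides by n - 1)
    return x_bar, t_crit * s / math.sqrt(n)


# Hypothetical measurements; n = 5, so n - 1 = 4 degrees of freedom.
data = [9.8, 10.2, 10.4, 9.9, 10.1]
x_bar, margin = t_interval(data, t_crit=2.776)
print(f"{x_bar:.2f} +/- {margin:.3f}")
```

Using 1.96 in place of 2.776 here would give a noticeably narrower interval that understates the uncertainty for such a small sample.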
Small samples; σ is unknown: continued...
Two Populations:
• Population variances $\sigma_1^2$ and $\sigma_2^2$ are unknown.
• Assume equal variances, $\sigma_1^2 = \sigma_2^2 = \sigma^2$.
The 100 × (1 − α)% confidence interval for θ = µ1 − µ2 is
$$\bar{x}_1 - \bar{x}_2 \pm t_{n+m-2,\alpha/2} \cdot s_p \sqrt{\frac{1}{n} + \frac{1}{m}},$$
where $s_p^2$ is the pooled variance calculated as
$$s_p^2 = \frac{(n-1)s_1^2 + (m-1)s_2^2}{n+m-2}.$$
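The pooled-variance interval can be sketched from summary statistics. Here n and m are taken to be the two sample sizes; the function names and the numbers in the usage lines are hypothetical, and the critical value $t_{n+m-2,\alpha/2}$ is supplied by the caller from a t table.

```python
import math


def pooled_variance(n, s1, m, s2):
    """Weighted average of the two sample variances."""
    return ((n - 1) * s1 ** 2 + (m - 1) * s2 ** 2) / (n + m - 2)


def pooled_ci(x1, s1, n, x2, s2, m, t_crit):
    """Return (center, margin) of the equal-variance CI for mu1 - mu2."""
    s_p = math.sqrt(pooled_variance(n, s1, m, s2))
    return x1 - x2, t_crit * s_p * math.sqrt(1.0 / n + 1.0 / m)


# Hypothetical summary statistics with n = m = 10, so 18 degrees of
# freedom; t_{18, 0.025} is about 2.101 in a t table.
center, margin = pooled_ci(20.0, 2.0, 10, 18.5, 2.0, 10, t_crit=2.101)
print(f"{center:.2f} +/- {margin:.3f}")
```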
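• Assume unequal variances, $\sigma_1^2 \ne \sigma_2^2$.
The 100 × (1 − α)% confidence interval for θ = µ1 − µ2 is
$$\bar{x}_1 - \bar{x}_2 \pm t_{\nu,\alpha/2} \sqrt{\frac{s_1^2}{n} + \frac{s_2^2}{m}},$$
where
$$\nu = \frac{\left( \frac{s_1^2}{n} + \frac{s_2^2}{m} \right)^2}{\frac{s_1^4}{n^2(n-1)} + \frac{s_2^4}{m^2(m-1)}}.$$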
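The approximate degrees of freedom ν can be computed directly from the sample standard deviations and sample sizes. The sketch below uses a hypothetical function name, `approx_df`; since ν is generally not an integer, one would typically round it down before looking up the critical value in a t table.

```python
def approx_df(s1, n, s2, m):
    """Approximate degrees of freedom for the unequal-variance t interval."""
    a = s1 ** 2 / n
    b = s2 ** 2 / m
    return (a + b) ** 2 / (a ** 2 / (n - 1) + b ** 2 / (m - 1))


# With equal sample sizes and equal standard deviations, nu reduces to
# n + m - 2, the degrees of freedom of the pooled interval.
print(approx_df(2.0, 10, 2.0, 10))
# Very unequal spreads pull nu down toward the smaller sample's n - 1.
print(approx_df(1.0, 10, 10.0, 10))
```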
t distribution
• A random variable T of the form
$$T = \frac{Z}{\sqrt{V/\nu}}$$
is said to have a t distribution with ν degrees of freedom, for random variables Z and V such that:
1. Z ∼ N(0, 1) and V ∼ $\chi^2_\nu$ (a chi-square distribution with ν degrees of freedom).
2. Z and V are independent.
Note: the chi-square distribution is a special case of the Gamma distribution.
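The definition of T can be checked by simulation. This is a rough sketch under stated assumptions: the helper names are made up, V is drawn as a Gamma variable with shape ν/2 and scale 2 (the chi-square special case), and the cutoff 2.571 is the table value $t_{5,0.025}$.

```python
import math
import random


def sample_t(nu, rng):
    """Draw one value of T = Z / sqrt(V / nu)."""
    z = rng.gauss(0.0, 1.0)              # Z ~ N(0, 1)
    v = rng.gammavariate(nu / 2.0, 2.0)  # V ~ chi-square with nu d.f.
    return z / math.sqrt(v / nu)


def tail_fraction(nu, cutoff, draws=20000, seed=0):
    """Fraction of simulated |T| values that exceed the cutoff."""
    rng = random.Random(seed)
    hits = sum(abs(sample_t(nu, rng)) > cutoff for _ in range(draws))
    return hits / draws


# Roughly 5% of draws should exceed t_{5, 0.025} in absolute value.
print(tail_fraction(5, 2.571))
# The normal cutoff 1.96 is exceeded more often: heavier tails.
print(tail_fraction(5, 1.96))
```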
• The diagram shows the probability density function of this distribution
for several different degrees of freedom:
• Observe that as the degrees of freedom increase, the shape of the t distribution tends towards that of the Normal distribution.
• Read percentiles of the t distribution from Table A5 of Baron’s textbook.
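The convergence of the t density to the normal density can be illustrated numerically. The sketch below evaluates the standard t density formula with `math.lgamma`; the function name `t_pdf` and the chosen degrees of freedom are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def t_pdf(x, nu):
    """Density of the t distribution with nu degrees of freedom at x."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))


# Gap between the t density and the standard normal density at x = 0.
for nu in (1, 5, 30, 1000):
    gap = abs(t_pdf(0.0, nu) - NormalDist().pdf(0.0))
    print(nu, round(gap, 5))
```

The printed gap shrinks as ν grows, which matches the observation about the shape of the densities.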