Interval Estimation
Edwin Leuven
Interval estimation

While an estimator may be unbiased or consistent, a given estimate will never equal the true value:

- the point estimate does not give a sense of “closeness”, while
- the variance estimate does not give a sense of location

We could try to combine both in statements like:

“We are confident that θ lies somewhere between . . . and . . . ”

where we would like to

1. give a specific interval, and
2. be precise about how confident we are

This is the aim of a confidence interval (CI), which is a particular type of probability interval
Interval estimation
Formally we define a confidence interval as follows
Pr(L̂ < θ < Û) = 1 − α
where we construct estimates of a lower bound L̂ and an upper
bound Û such that the interval [L̂, Û] covers the parameter of
interest with probability 1 − α
We call 1 − α the confidence level
CI’s are random intervals because they differ across random samples
The confidence level is thus a probability relative to the sampling
distribution!
Probability intervals

We will consider probability intervals for a continuous r.v. X

Pr(a < X < b)

With density f(x) and support on the real line this equals

Pr(a < X < b) = ∫_a^b f(x) dx = ∫_{−∞}^b f(x) dx − ∫_{−∞}^a f(x) dx = F(b) − F(a)

Note that since X is continuous, Pr(X = x) = 0 and

Pr(a < X < b) = Pr(a ≤ X < b) = Pr(a < X ≤ b) = Pr(a ≤ X ≤ b)
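As a quick illustration of this identity, a sketch in R (the endpoints a = 1 and b = 4 and the χ²(3) distribution are illustrative choices):

```r
# Pr(a < X < b) = F(b) - F(a), illustrated for X ~ chi-squared(3)
a <- 1; b <- 4
p <- pchisq(b, df = 3) - pchisq(a, df = 3)
p
```

Integrating the density directly with `integrate(dchisq, a, b, df = 3)` gives the same probability.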
Probability intervals (X ∼ χ²(3))

[Figure: density of X ∼ χ²(3); the shaded area between a and b equals Pr(a < X < b) = F(b) − F(a)]
Probability intervals

We often need to compute either

p = F(a) ≡ Pr(X ≤ a)

or

a = F⁻¹(p)

In R we can do this using the pxxx and qxxx functions, e.g. for the normal distribution:

pnorm(1.96)
## [1] 0.9750021

qnorm(0.975)
## [1] 1.959964
What is the probability that our estimator is close to θ?
We can write this as: Pr(|θ̂ − θ| < ε) = Pr(θ − ε < θ̂ < θ + ε)

[Figure: density of θ̂, with the area between θ − ε and θ + ε marked “Area = ?”]
Interval estimation
The probability that our estimator is no further than ε from θ equals
Pr(θ − ε < θ̂ < θ + ε)
which is the probability that the r.v. θ̂ is in a fixed interval with
unknown boundaries
Note though that we can rewrite this as follows
Pr(θ − ε < θ̂ < θ + ε) = Pr(θ̂ − ε < θ < θ̂ + ε)
which is the probability that the random interval (θ̂ − ε, θ̂ + ε)
covers the fixed number θ
How do we construct such intervals and compute their
corresponding confidence levels?
CI for the mean – Normal data, variance known
Let X ∼ N(µ, σ²); then X̄ ∼ N(µ, σ²/n)

Now consider taking a random sample of size n, then

Pr(µ − ε < X̄ < µ + ε) = Pr(−ε/(σ/√n) < (X̄ − µ)/(σ/√n) < ε/(σ/√n))
                      = Φ(ε/(σ/√n)) − Φ(−ε/(σ/√n))
                      = 2Φ(ε/(σ/√n)) − 1 = 1 − α

For a given confidence level 1 − α we get the following ε

ε = z_{1−α/2} · σ/√n

where z_{1−α/2} = Φ⁻¹(1 − α/2)
CI for the mean – Normal data, variance known
[Figure: standard normal CDF Φ(x), with quantiles z_{α/2} and z_{1−α/2} cutting off probability mass α/2 in each tail]
CI for the mean – Normal data, variance known
Since

Pr(µ − ε < X̄ < µ + ε) = Pr(X̄ − ε < µ < X̄ + ε),

the following is a (1 − α)100% CI:

(X̄ − z_{1−α/2} · σ/√n, X̄ + z_{1−α/2} · σ/√n)

For a given sample, µ will be either inside or outside this interval

But before drawing the sample, there is a (1 − α)100% chance that an interval constructed this way will cover the true parameter µ
CI for the mean – Normal data, variance known
For example, if we set the confidence level at 1 − α = 0.90, then

z_{1−0.10/2} = Φ⁻¹(0.95) = −Φ⁻¹(0.05) ≈ 1.645

qnorm(.95)
## [1] 1.6448536

With n = 10 random draws from X ∼ N(µ, 1)

mean(rnorm(10))
## [1] -0.38315741

we get the following 90% confidence interval:

(−0.38 − 1.645 · 1/√10, −0.38 + 1.645 · 1/√10) ≈ (−0.90, 0.14)
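The same interval can be computed in one line in R, plugging in the sample mean −0.38 from the draw above:

```r
# 90% CI with known sigma = 1 and n = 10: mean +/- z_{0.95} * sigma / sqrt(n)
m <- -0.38; sigma <- 1; n <- 10
ci <- m + c(-1, 1) * qnorm(0.95) * sigma / sqrt(n)
round(ci, 2)   # -0.90  0.14
```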
CI for the mean – Normal data, variance known
We know that we need to cover the true parameter 90% of the time:
n = 10; nrep = 1e5; z = qnorm(0.95)
cover = rep(F, nrep)
for(i in 1:nrep) {
x = rnorm(n, 0, 1)
m = mean(x); se = 1 / sqrt(n)
ci0 = m - z * se; ci1 = m + z * se;
cover[i] = ci0 < 0 & 0 < ci1
}
mean(cover)
## [1] 0.90217
90% CI for the mean, n = 10
[Figure: CIs from 50 simulated samples, plotted by sample number over the range −2 to 2]
90% CI for the mean, n = 40
[Figure: CIs from 50 simulated samples, plotted by sample number over the range −2 to 2]
90% CI for the mean, n = 160
[Figure: CIs from 50 simulated samples, plotted by sample number over the range −2 to 2]
90% CI for the mean, n = 10
[Figure: CIs from 50 simulated samples, plotted by sample number over the range −2 to 2]
95% CI for the mean, n = 10
[Figure: CIs from 50 simulated samples, plotted by sample number over the range −2 to 2]
99% CI for the mean, n = 10
[Figure: CIs from 50 simulated samples, plotted by sample number over the range −2 to 2]
Computing sample size
Suppose you plan to collect data, and you want to know the sample size you need to achieve a certain level of confidence in your interval estimate

Since

ε = z_{1−α/2} · σ/√n

solving for n we obtain

n = (z_{1−α/2} · σ/ε)²

Note that we need a larger sample (n increases) if

- we require greater precision (ε decreases)
- we want to be more confident (α decreases)
- there is more dispersion in the population (σ increases)
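The calculation can be wrapped in a small helper; a sketch (the function name and the example values ε = 0.1, σ = 1 are illustrative, not from the slides):

```r
# n = (z_{1-alpha/2} * sigma / eps)^2, rounded up to an integer
sample_size <- function(eps, sigma, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  ceiling((z * sigma / eps)^2)
}
sample_size(eps = 0.1, sigma = 1)   # 385
```

Halving ε roughly quadruples the required n, as the squared formula suggests.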
CI for the Variance – Normal data
The sample variance

S² = 1/(n − 1) · Σ_{i=1}^n (Xᵢ − X̄)²

is our estimator for the population variance

When the Xᵢ follow a normal distribution, (n − 1)S²/σ² follows a so-called chi-squared distribution with n − 1 degrees of freedom:

Chi-squared distribution
If Zᵢ ∼ N(0, 1) and V = Σ_{i=1}^k Zᵢ², then V ∼ χ²(k), where k are the degrees of freedom. E[V] = k and Var(V) = 2k.
CI for the Variance – χ2 distribution
[Figure: densities of the χ²(1), χ²(2), χ²(3), and χ²(9) distributions]
CI for the Variance – Normal data
Because the chi-squared distribution is asymmetric, we need to set the boundaries of the CI such that we have α/2 probability mass on each side:

Pr(c^{n−1}_{.025} < (n − 1)S²/σ² < c^{n−1}_{.975}) = 0.95

where we can compute c^{n−1}_p in R using qchisq(p, k)

We can rewrite the above as

Pr((n − 1)S²/c^{n−1}_{.975} < σ² < (n − 1)S²/c^{n−1}_{.025}) = 0.95
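Putting the interval to use on a single simulated sample, a sketch in R (the seed, n = 20, and true σ = 2 are arbitrary choices):

```r
# 95% CI for sigma^2: ((n-1)S^2 / c_.975, (n-1)S^2 / c_.025)
set.seed(1)
n <- 20
x <- rnorm(n, mean = 0, sd = 2)   # true sigma^2 = 4
s2 <- var(x)
c(lower = (n - 1) * s2 / qchisq(0.975, n - 1),
  upper = (n - 1) * s2 / qchisq(0.025, n - 1))
```

Note the asymmetry: S² does not sit in the middle of the interval, unlike the mean in the normal CI.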
CI for the Variance – Normal data
[Figure: sampling distribution (density) of S², plotted over 0.0 to 2.5]
CI for the Variance – Normal data
If α = .05 we know that we need to cover the true parameter 95%
of the time:
n = 4; nrep = 1e5
cover = rep(F, nrep)
for(i in 1:nrep) {
v = var(rnorm(n, 1, 10))
ci0 = (n - 1) * v / qchisq(.975, n - 1)
ci1 = (n - 1) * v / qchisq(.025, n - 1)
cover[i] = ci0 < 100 & 100 < ci1
}
mean(cover)
## [1] 0.94955
CI for the mean – Normal data, variance unknown
We considered X ∼ N(µ, σ²) and assumed we knew σ²

In practice we probably don’t know σ², but we have an estimator

S² = 1/(n − 1) · Σ_{i=1}^n (Xᵢ − X̄)²

A simple solution is to replace σ with S:

(x̄ − z_{1−α/2} · S/√n, x̄ + z_{1−α/2} · S/√n)

But how does this work?
CI for the mean – Normal data, variance unknown
If α = .1 we know that we need to cover the true parameter 90% of
the time:
n = 10; nrep = 1e5; z = qnorm(.95)
cover = rep(F, nrep)
for(i in 1:nrep) {
x = rnorm(n, 1, 10)
m = mean(x); se = sd(x) / sqrt(n)
ci0 = m - z * se; ci1 = m + z * se;
cover[i] = ci0 < 1 & 1 < ci1
}
mean(cover)
## [1] 0.86477
Student’s t-distribution
It turns out that

T = (X̄ − µ)/(S/√n)

does not follow a normal distribution but a so-called t-distribution with n − 1 degrees of freedom

t distribution
If Z ∼ N(0, 1) and V ∼ χ²(k), with Z and V independent, then

T = Z/√(V/k) ∼ t(k)

where k are the degrees of freedom. E[T] = 0 (for k > 1) and Var(T) = k/(k − 2) (for k > 2).
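Because the t-distribution has fatter tails, its critical values exceed the normal ones and shrink toward them as k grows; a quick check in R (the chosen degrees of freedom are illustrative):

```r
# t critical values for a 95% interval, compared with the normal value
qt(0.975, df = c(2, 5, 10, 30, 100))
qnorm(0.975)   # approximately 1.96
```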
Student’s t-distribution
[Figure: densities of t(1), t(4), and N(0, 1) = t(∞)]
Student’s t-distribution
[Figure: t critical values t_{1−α/2}(k) against degrees of freedom k (5 to 50), converging to the normal quantiles z_{0.95}, z_{0.975}, and z_{0.995}]
CI for the mean – Normal data, variance unknown
We know that we need to cover the true parameter 95% of the time:
n = 10; nrep = 1e5; z = qt(.975, n - 1)
cover = rep(FALSE, nrep)
for(i in 1:nrep) {
x = rnorm(n, 1, 10)
m = mean(x); se = sd(x) / sqrt(n)
ci0 = m - z * se; ci1 = m + z * se;
cover[i] = ci0 < 1 & 1 < ci1
}
mean(cover)
## [1] 0.95102
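R’s built-in t.test returns this interval directly; a sketch checking it against the manual formula (seed and parameters arbitrary):

```r
# t-based 95% CI for the mean with unknown variance
set.seed(2)
x <- rnorm(10, mean = 1, sd = 10)
ci <- t.test(x, conf.level = 0.95)$conf.int
manual <- mean(x) + c(-1, 1) * qt(0.975, df = 9) * sd(x) / sqrt(10)
all.equal(as.numeric(ci), manual)   # TRUE
```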
Computing sample size – unknown variance
We saw that

n = z²_{1−α/2} · σ²/ε²

This means that without knowing the population variance σ² we cannot set the sample size

When X is Bernoulli we know that Var(X) = p(1 − p); while this depends on p, which is unknown, we know that 0 ≤ p(1 − p) ≤ 0.25

This means that

n = z²_{1−α/2} · σ²/ε² ≤ z²_{1−α/2} · 0.25/ε²
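This is the classic worst-case margin-of-error calculation for a proportion; a sketch in R for a 95% CI (the half-width ε = 0.03 is an illustrative choice):

```r
# conservative sample size for a proportion, using p(1-p) <= 0.25
eps <- 0.03
n <- ceiling(qnorm(0.975)^2 * 0.25 / eps^2)
n   # 1068
```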
CI for the mean – Non Normal data
Up until now we assumed that our data came from a Normal
distribution
It turns out that as long as our sample is large enough the Normal
distribution is a good approximation thanks to the Central Limit
Theorem
Central Limit Theorem
Let X₁, . . . , Xₙ be i.i.d. random variables with E[Xᵢ] = µ and Var(Xᵢ) = σ² < ∞; then, in distribution,

(X̄ − µ)/(σ/√n) → N(0, 1)

Consider the sampling distribution of X̄ when X ∼ χ²(3)
[Figure: sampling distribution (density) of X̄ for χ²(3) data, centered near E[X]]
CI’s for statistics other than the mean
Beyond the scope of this course, but some pointers:

- when sampling distributions are known, use these, otherwise
- CI’s when n is large: use the bootstrap
- CI’s when n is small: rely on non-parametric or permutation tests
Summary
Confidence Intervals (CI’s) are random intervals that cover the true
parameter with a given probability
We call this probability the confidence level, and 0.95 is commonly
used
Pay attention to the interpretation!

- Before drawing the sample, there is a (1 − α)100% chance that an interval constructed this way will cover the true parameter

We saw how to construct CI’s for the mean

For a given confidence level we can set CI widths by choosing the appropriate sample size