confidence interval

Math 144
Confidence Interval
In addition to the value of a point estimate, some statisticians suggest that we should also take the variance of the estimator into account.
Use the point estimate together with the variance of the estimator to form an interval that has a high probability of covering the unknown parameter.
This method, which incorporates the variance of the point estimator, is called interval estimation, and the resulting interval is called a "confidence interval".
Interval estimation
Assume that $\hat\theta_L$ and $\hat\theta_U$ are two functions of a random sample, determined by a point estimator $\hat\theta$ of an unknown parameter $\theta$, such that
$$P(\hat\theta_L \le \theta \le \hat\theta_U) = 1 - \alpha,$$
where α is a known value between 0 and 1.
After sampling, if the actual values of $\hat\theta_L$ and $\hat\theta_U$ are a and b, respectively, then the interval [a, b] is called a 100(1-α)% confidence interval (hereafter, C.I.) for θ.
The quantity 1-α is called the confidence level associated with the confidence interval.
Caution:
By definition, before sampling, we have a random interval estimator
$$[\hat\theta_L, \hat\theta_U]$$
for the unknown parameter θ.
After sampling, the confidence interval [a, b] is a fixed (not
random) interval. Indeed, it depends on the particular sample
observations.
Caution:
Most importantly, the unknown parameter θ is either inside
or outside the confidence interval [a, b].That is,
P(a ≤ θ ≤ b) = 0 or 1.
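After sampling, we have observations $x_1, \dots, x_n$, which give the fixed interval [a, b].
If θ lies outside [a, b], then P(a ≤ θ ≤ b) = 0.
[Figure: number line with θ outside the interval [a, b].]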
If θ lies inside [a, b], then P(a ≤ θ ≤ b) = 1.
[Figure: number line with θ inside the interval [a, b].]
Recall that before sampling, we have
$$P(\hat\theta_L \le \theta \le \hat\theta_U) = 1 - \alpha.$$
Interpretation of C.I.
The interpretation of a 100(1-α)% C.I. is as follows: if we obtain N (sufficiently large) independent random samples and, for each sample, construct one particular interval using the same point estimator, then about N(1-α) of these N intervals will contain the true unknown parameter θ.
However, we do not know which intervals contain θ and which do not, because θ is unknown.
For instance, if we construct intervals by drawing different samples repeatedly, say 100 times, then a 95% = 100(1-0.05)% C.I. for μ means that we expect μ to be contained in about 95 of the 100 fixed intervals. Again, we do not know which intervals these are, because µ is unknown.
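The repeated-sampling interpretation can be checked with a small simulation. The sketch below is only an illustration: the function name `coverage` and the values µ = 10, σ = 2, n = 25 are assumptions chosen for the demonstration, not taken from the course. It draws many normal samples with known σ, builds a 95% interval from each, and reports the fraction of intervals that contain µ.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu=10.0, sigma=2.0, n=25, alpha=0.05, reps=2000, seed=0):
    """Fraction of simulated intervals that contain the true mean mu.

    All default values are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    half = z * sigma / n ** 0.5              # half-width, sigma known
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half <= mu <= xbar + half:
            hits += 1
    return hits / reps

rate = coverage()
print(f"fraction of intervals covering mu: {rate:.3f}")
```

The reported fraction should be close to 0.95, while each individual interval either contains µ or does not.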
Steps to construct a confidence interval
Step 1: Find a point estimator of θ
Step 2: Find its EXACT (or approximate) distribution.
Step 3: Use the exact (or approximate) distribution found in Step 2 to construct the C.I.
Throughout this course, we are only interested in how to construct confidence intervals for the parameters µ and $\sigma^2$ using the sample mean $\bar X$ and the sample variance $S^2$.
In the following, we will discuss the distributions of $\bar X$ and $S^2$, and then see how to obtain confidence intervals for µ and $\sigma^2$ case by case.
One sample
Confidence Interval for µ with
NORMAL population
(known variance)
Confidence interval for µ
Case I: Normal distribution with unknown mean and KNOWN variance:
Consider a random sample of size n, {X1, X2, …, Xn}, from a normal distribution with unknown mean µ and KNOWN variance $\sigma^2$. That is,
$$X_1, \dots, X_n \sim N(\mu, \sigma^2).$$
Then we have a result that the sampling distribution of the sample mean is
$$\bar X \sim N\left(\mu, \frac{\sigma^2}{n}\right).$$
Or equivalently,
$$Z = \frac{\sqrt{n}\,(\bar X - \mu)}{\sigma} \sim N(0,1).$$
How to construct the interval?
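Define a quantity $z_\alpha$ such that
$$P(Z \ge z_\alpha) = \alpha.$$
[Figure: standard normal density with upper-tail area α to the right of $z_\alpha$.]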
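By the symmetry of the standard normal distribution, $z_{1-\alpha/2} = -z_{\alpha/2}$, and so we have
$$P\left(-z_{\alpha/2} \le \frac{\sqrt{n}\,(\bar X - \mu)}{\sigma} \le z_{\alpha/2}\right) = 1 - \alpha.$$
[Figure: standard normal density of $Z = \sqrt{n}(\bar X - \mu)/\sigma$, with area α/2 to the left of $z_{1-\alpha/2} = -z_{\alpha/2}$, area 1-α in the middle, and area α/2 to the right of $z_{\alpha/2}$.]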
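Rearranging the inequalities inside the probability so that μ stands alone in the middle gives
$$P\left(\bar X - \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}} \le \mu \le \bar X + \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}}\right) = 1 - \alpha.$$
Here θ = μ, with $\hat\theta_L = \bar X - z_{\alpha/2}\sigma/\sqrt{n}$ and $\hat\theta_U = \bar X + z_{\alpha/2}\sigma/\sqrt{n}$.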
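After sampling, we can find an actual value of the sample mean, say $\bar x$. Thus, the 100(1-α)% C.I. for μ is
$$\left[\bar x - \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}},\ \bar x + \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}}\right],$$
or simply written as
$$\bar x \pm \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}}.$$
The quantity $z_{\alpha/2}\,\sigma/\sqrt{n}$ is called the margin of error.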
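For example, if α = 0.05, then
$$P\left(\bar X - \frac{z_{0.025}\,\sigma}{\sqrt{n}} \le \mu \le \bar X + \frac{z_{0.025}\,\sigma}{\sqrt{n}}\right) = 0.95.$$
If all X1,…, Xn are observed, i.e. we have $x_1, \dots, x_n$, then the 95% C.I. for μ is
$$\left[\bar x - \frac{z_{0.025}\,\sigma}{\sqrt{n}},\ \bar x + \frac{z_{0.025}\,\sigma}{\sqrt{n}}\right].$$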
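Remark again that it does not mean that μ is inside this interval with a probability 0.95.
Note that μ is an unknown BUT fixed number, and $\bar x$ and σ are known.
So, μ is either inside or outside the fixed interval:
$$P\left(\bar x - \frac{z_{0.025}\,\sigma}{\sqrt{n}} \le \mu \le \bar x + \frac{z_{0.025}\,\sigma}{\sqrt{n}}\right) = 0 \text{ or } 1,$$
that is,
$$P\left(\bar X - \frac{z_{0.025}\,\sigma}{\sqrt{n}} \le \mu \le \bar X + \frac{z_{0.025}\,\sigma}{\sqrt{n}} \,\middle|\, \bar X = \bar x\right) = 0 \text{ or } 1,$$
whereas before sampling
$$P\left(\bar X - \frac{z_{0.025}\,\sigma}{\sqrt{n}} \le \mu \le \bar X + \frac{z_{0.025}\,\sigma}{\sqrt{n}}\right) = 0.95.$$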
Questions
Page 12
Q1: Given a random sample of 100 observations from a normal distribution for
which µ is unknown and σ = 8. Suppose that the sample mean is found to be
42.7 after sampling. Then what is the 95% C.I. for µ?
Q2: A wine importer needs to report the average percentage of alcohol in
bottles of French wine. From previous experience with different kinds of wine,
the importer believes the alcohol concentration is normally distributed with
standard deviation 1.2%. The importer randomly samples 60 bottles of the new
wine and obtains a sample mean 9.3%. Find a 90% C.I. for the population
average percentage.
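As a rough illustration of the known-variance formula $\bar x \pm z_{\alpha/2}\sigma/\sqrt{n}$, the sketch below applies it to the numbers in Q1 and Q2. The function name `z_interval` is hypothetical, and the quantile comes from Python's standard library instead of a printed normal table, so the last digit may differ slightly from a hand calculation with z = 1.96 or 1.645.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf):
    """C.I. for mu from a normal population with KNOWN sigma."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    half = z * sigma / sqrt(n)               # margin of error
    return xbar - half, xbar + half

q1 = z_interval(xbar=42.7, sigma=8, n=100, conf=0.95)
q2 = z_interval(xbar=9.3, sigma=1.2, n=60, conf=0.90)
print([round(v, 3) for v in q1])
print([round(v, 3) for v in q2])
```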
One sample
Confidence Interval for µ with
NORMAL population
(unknown variance)
Confidence interval for µ
Case II: Normal distribution with unknown mean and UNKNOWN variance:
Consider a random sample of size n, {X1, X2, …, Xn}, from a normal distribution with unknown mean µ and UNKNOWN variance $\sigma^2$. That is,
$$X_1, \dots, X_n \sim N(\mu, \sigma^2).$$
Then we have a result that the sampling distribution of the sample mean is
$$\bar X \sim N\left(\mu, \frac{\sigma^2}{n}\right).$$
Or equivalently,
$$Z = \frac{\sqrt{n}\,(\bar X - \mu)}{\sigma} \sim N(0,1).$$
After sampling, we can find an actual value of the sample mean, say $\bar x$. Thus, the 100(1-α)% C.I. for μ is
$$\left[\bar x - \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}},\ \bar x + \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}}\right].$$
However, σ is UNKNOWN. So, this interval is also unknown.
Replace $\sigma^2$ by the sample variance $S^2$. However, the next problem is: what is the sampling distribution of
$$\frac{\sqrt{n}\,(\bar X - \mu)}{S}\ ?$$
Still normal? NO!
Theorem
Consider a random sample of size n, {X1, X2, …, Xn}, from a normal distribution with unknown mean µ and UNKNOWN variance $\sigma^2$.
Then the sampling distribution of
$$\frac{\sqrt{n}\,(\bar X - \mu)}{S}$$
has a Student t distribution (or simply t distribution) with n - 1 degrees of freedom.
We write
$$T_{n-1} = \frac{\sqrt{n}\,(\bar X - \mu)}{S} \sim t_{n-1},$$
where
$$\bar X = \frac{1}{n}\sum_{i=1}^{n} X_i \quad\text{and}\quad S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar X)^2.$$
tk distribution
• Similar to a standard normal distribution, it is also symmetric about 0, so
P(T ≤ -a) = 1 - P(T ≤ a) = P(T ≥ a), if T follows a t distribution.
• Use a table of a t distribution to find a probability of a t-distributed random variable.
How to construct the interval?
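Define a quantity $t_{n-1,\alpha}$ such that
$$P(T_{n-1} \ge t_{n-1,\alpha}) = \alpha.$$
By the symmetry of the t distribution, we have
$$P\left(-t_{n-1,\alpha/2} \le \frac{\sqrt{n}\,(\bar X - \mu)}{S} \le t_{n-1,\alpha/2}\right) = 1 - \alpha,$$
and hence
$$P\left(\bar X - \frac{t_{n-1,\alpha/2}\,S}{\sqrt{n}} \le \mu \le \bar X + \frac{t_{n-1,\alpha/2}\,S}{\sqrt{n}}\right) = 1 - \alpha.$$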
After sampling, we can find the actual values of the sample mean and sample standard deviation, say $\bar x$ and s. Thus, the 100(1-α)% C.I. for μ is
$$\left[\bar x - \frac{t_{n-1,\alpha/2}\,s}{\sqrt{n}},\ \bar x + \frac{t_{n-1,\alpha/2}\,s}{\sqrt{n}}\right],$$
or simply written as
$$\bar x \pm \frac{t_{n-1,\alpha/2}\,s}{\sqrt{n}}.$$
How to use the table of t distribution
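[Figure: table of the t distribution. The columns give the value of α and the rows give the degrees of freedom. In the notation $t_{k,\alpha}$, the degrees of freedom come first and α second. For example, 2.353 = $t_{3,\,0.05}$.]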
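Questions
Page 14 Q3
(i) Find $P(-t_{14,\,0.025} \le T_{14} \le t_{14,\,0.005})$.
$$\begin{aligned}
P(-t_{14,\,0.025} \le T_{14} \le t_{14,\,0.005})
&= P(T_{14} \le t_{14,\,0.005}) - P(T_{14} \le -t_{14,\,0.025}) \\
&= [1 - P(T_{14} > t_{14,\,0.005})] - P(T_{14} > t_{14,\,0.025}) \quad\text{(by the symmetry of the t distribution)} \\
&= [1 - 0.005] - 0.025 \\
&= 0.97
\end{aligned}$$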
(ii) Find k such that $P(k \le T_{14} \le -1.761) = 0.045$.
$$\begin{aligned}
0.045 &= P(k \le T_{14} \le -1.761) \\
&= P(T_{14} \le -1.761) - P(T_{14} \le k) \\
&= P(T_{14} \ge 1.761) - P(T_{14} \ge -k) \quad\text{(by the symmetry of the t distribution)} \\
&= P(T_{14} \ge t_{14,\,0.05}) - P(T_{14} \ge -k) \\
&= 0.05 - P(T_{14} \ge -k)
\end{aligned}$$
Hence $P(T_{14} \ge -k) = 0.05 - 0.045 = 0.005 = P(T_{14} \ge 2.977)$, and so k = -2.977.
Questions
Page 14
Frequencies, in hertz (Hz), of 12 elephant calls:
14, 16, 17, 17, 24, 20, 32, 18, 29, 31, 15, 35
Assume that the population of possible elephant call
frequencies is a normal distribution, Now a scientist is
interested in the average of the frequencies, say µ. Find a
95% confidence interval for µ.
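The population variance is UNKNOWN. So, use the t distribution to construct the C.I. for µ.
Here $\bar x = 22.33$, $s^2 = 56.424$, n = 12 and α = 0.05, and from the t table $t_{11,\,0.025} = 2.201$.
Finally, the 95% C.I. for µ is
$$\bar x \pm \frac{t_{11,\,0.025}\,s}{\sqrt{n}} = 22.33 \pm 2.201\sqrt{\frac{56.424}{12}} = [17.557, 27.103].$$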
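The elephant-call calculation can be reproduced with a few lines of Python. This is a minimal sketch, assuming the critical value $t_{11,\,0.025} = 2.201$ is read from a t table (the standard library has no t quantile function); the variable names are illustrative. Because the code keeps the unrounded sample mean 22.333…, its endpoints differ from the slide's in the third decimal place.

```python
from math import sqrt
from statistics import mean, variance

calls = [14, 16, 17, 17, 24, 20, 32, 18, 29, 31, 15, 35]  # Hz
n = len(calls)
xbar = mean(calls)      # sample mean
s2 = variance(calls)    # sample variance (divides by n - 1)

T_11_0025 = 2.201       # t_{11, 0.025}, taken from a t table (assumption)
half = T_11_0025 * sqrt(s2 / n)
lo, hi = xbar - half, xbar + half
print(round(xbar, 2), round(s2, 3), round(lo, 3), round(hi, 3))
```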
Remark:
When n > 30, the difference between a t distribution with n - 1 degrees of freedom and the standard normal distribution is small. So, we have
$$t_{n-1,\alpha/2} \approx z_{\alpha/2}.$$
Therefore, when n > 30, we can use
$$\left[\bar x - \frac{z_{\alpha/2}\,s}{\sqrt{n}},\ \bar x + \frac{z_{\alpha/2}\,s}{\sqrt{n}}\right]$$
to approximate the 100(1-α)% C.I. for μ with unknown variance.
Two samples
Confidence Interval for µX - µY with
NORMAL populations
(known variances)
Confidence interval for µX - µY
Case I: Normal distributions with unknown means and KNOWN variances:
Consider two independent random samples,
$$X_1, \dots, X_n \sim N(\mu_X, \sigma_X^2) \quad\text{and}\quad Y_1, \dots, Y_m \sim N(\mu_Y, \sigma_Y^2).$$
We want to construct a C.I. for the mean difference µX - µY.
First, choose a point estimator of the mean difference: use $\bar X - \bar Y$ to estimate µX - µY.
How to construct the interval?
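Second, find the sampling distribution of $\bar X - \bar Y$. Indeed, we have a result that
$$\bar X - \bar Y \sim N\left(\mu_X - \mu_Y,\ \frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}\right).$$
Or equivalently,
$$\frac{(\bar X - \bar Y) - (\mu_X - \mu_Y)}{\sqrt{\dfrac{\sigma_X^2}{n} + \dfrac{\sigma_Y^2}{m}}} \sim N(0, 1).$$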
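The construction is similar to Case I in the one-sample case. After sampling, the 100(1-α)% C.I. for μX - μY is given by
$$\left[(\bar x - \bar y) - z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}},\ (\bar x - \bar y) + z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}}\right]$$
or
$$(\bar x - \bar y) \pm z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}}.$$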
In particular, if the two variances are EQUAL, say $\sigma_X^2 = \sigma_Y^2 = \sigma^2$, then the 100(1-α)% C.I. for μX - μY becomes
$$\left[(\bar x - \bar y) - z_{\alpha/2}\,\sigma\sqrt{\frac{1}{n} + \frac{1}{m}},\ (\bar x - \bar y) + z_{\alpha/2}\,\sigma\sqrt{\frac{1}{n} + \frac{1}{m}}\right].$$
Example
Two kinds of thread are being compared for strength. Fifty
pieces of each type of thread are tested under similar
conditions. Brand A had an average tensile strength of 78.3
kilograms with a population standard deviation of 5.6
kilograms, while brand B had an average tensile strength of
87.2 kilograms with a population standard deviation of 6.3
kilograms. Construct a 95% confidence interval for the
difference of the population means µA - µB.
Here n = m = 50 and α = 0.05. This is a two-sample problem with known variances, where
$$\bar x = 78.3,\quad \bar y = 87.2,\quad \sigma_X = 5.6,\quad \sigma_Y = 6.3.$$
Thus, the 95% C.I. for µA - µB is
$$(78.3 - 87.2) \pm (1.96)\sqrt{\frac{5.6^2}{50} + \frac{6.3^2}{50}} = [-11.24, -6.56].$$
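The thread example can be verified numerically. The sketch below is an illustration only; the function name `two_sample_z_interval` is hypothetical, and the normal quantile is taken from Python's standard library in place of the table value 1.96.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_interval(xbar, ybar, sigma_x, sigma_y, n, m, conf):
    """C.I. for mu_X - mu_Y with KNOWN population standard deviations."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)       # z_{alpha/2}
    half = z * sqrt(sigma_x ** 2 / n + sigma_y ** 2 / m)
    diff = xbar - ybar
    return diff - half, diff + half

lo, hi = two_sample_z_interval(78.3, 87.2, 5.6, 6.3, 50, 50, conf=0.95)
print(round(lo, 2), round(hi, 2))
```

Both endpoints are negative, which agrees with the interval [-11.24, -6.56] obtained by hand.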
Two samples
Confidence Interval for µX - µY with
NORMAL populations
(unknown variances)
Confidence interval for µX - µY
Case II: Normal distributions with unknown means and UNKNOWN variances:
Consider two independent random samples,
$$X_1, \dots, X_n \sim N(\mu_X, \sigma_X^2) \quad\text{and}\quad Y_1, \dots, Y_m \sim N(\mu_Y, \sigma_Y^2).$$
There are two cases:
(i) BOTH UNKNOWN variances are EQUAL;
(ii) BOTH UNKNOWN variances are DIFFERENT.
Recall that, in the one-sample case with UNKNOWN variance, we replace the population variance $\sigma^2$ by the sample variance $S^2$. Then we have a result that
$$\frac{\sqrt{n}\,(\bar X - \mu)}{S}$$
has a t distribution with n-1 degrees of freedom.
So, in two-sample cases, we will also replace the unknown variances by their estimators.
Then what estimators should we use to estimate the variances?
(i) In a case that BOTH UNKNOWN variances are EQUAL:
Use a statistic
$$S_p^2 = \frac{\sum_{i=1}^{n} (X_i - \bar X)^2 + \sum_{j=1}^{m} (Y_j - \bar Y)^2}{n + m - 2} = \frac{(n-1)S_X^2 + (m-1)S_Y^2}{n + m - 2},$$
which is called a pooled estimator of $\sigma^2$ or pooled sample variance.
Based on $S_p^2$, we have
$$\frac{(\bar X - \bar Y) - (\mu_X - \mu_Y)}{S_p\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}} \sim t_{n+m-2}.$$
So, after sampling, the 100(1-α)% C.I. for μX - μY is given by
$$(\bar x - \bar y) \pm t_{n+m-2,\,\alpha/2}\; s_p\sqrt{\frac{1}{n} + \frac{1}{m}}.$$
If n+m-2 > 30, then the confidence interval can be approximated by
$$(\bar x - \bar y) \pm z_{\alpha/2}\; s_p\sqrt{\frac{1}{n} + \frac{1}{m}}.$$
Example
Page 17
Two tomato fertilizers are compared to see if one is better than the
other.
The weight measurements of two independent random samples of
tomatoes grown using each of the two fertilizers (in ounces) are as
follows:
Fertilizer A (X): 12, 11, 7, 13, 8, 9, 10, 13
Fertilizer B (Y): 13, 11, 10, 6, 7, 4, 10
Assume that two populations are normal and their population variances are
equal. Consider a confidence level 1-α = 0.95.
Since n = 8, m = 7, $\bar x = 10.375$, $\bar y = 8.714$, $s_X^2 = 5.125$ and $s_Y^2 = 9.905$, we have
$$s_p^2 = \frac{(n-1)s_X^2 + (m-1)s_Y^2}{n + m - 2} = 7.331.$$
Thus, the 95% C.I. for µX - µY is given by
$$(10.375 - 8.714) \pm t_{13,\,0.025}\sqrt{7.331\left(\frac{1}{8} + \frac{1}{7}\right)} = [-1.366, 4.688].$$
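The fertilizer example can be reproduced directly from the raw data. This is a sketch under one stated assumption: the critical value $t_{13,\,0.025} = 2.160$ is read from a t table, since the standard library does not provide t quantiles. Variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

fert_a = [12, 11, 7, 13, 8, 9, 10, 13]   # X
fert_b = [13, 11, 10, 6, 7, 4, 10]       # Y
n, m = len(fert_a), len(fert_b)

# Pooled sample variance: weighted average of the two sample variances.
sp2 = ((n - 1) * variance(fert_a) + (m - 1) * variance(fert_b)) / (n + m - 2)

T_13_0025 = 2.160  # t_{13, 0.025}, taken from a t table (assumption)
diff = mean(fert_a) - mean(fert_b)
half = T_13_0025 * sqrt(sp2 * (1 / n + 1 / m))
lo, hi = diff - half, diff + half
print(round(sp2, 3), round(lo, 3), round(hi, 3))
```

The interval contains 0, so at the 95% level these data do not show that one fertilizer is better than the other.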
Question
Students may choose between a 3-semester-hour course in physics without labs and a 4-semester-hour course with labs. The final written examination is the same for each section. Suppose that 24 students in the section with labs made an average examination grade of 84 with a standard deviation of 4, and 36 students in the section without labs made an average grade of 77 with a standard deviation of 6. Find a 99% confidence interval for the difference between the average grades for the two courses.
Assume that the population variances are equal.
Confidence interval for µX- µY
Case II: Normal distributions with unknown means and UNKNOWN variances:
(ii) In a case that BOTH UNKNOWN variances are DIFFERENT:
We do not have a statistic such that its exact distribution can be
found to construct a C.I. for µX - µY in this case. However, it is still
possible for us to construct an APPROXIMATE confidence interval.
Now, both variances are different, so we cannot use the pooled
sample variance. In this case, we use the sample variance SX2 for
σX2 and SY2 for σY2.
That is, we consider
$$\frac{(\bar X - \bar Y) - (\mu_X - \mu_Y)}{\sqrt{\dfrac{S_X^2}{n} + \dfrac{S_Y^2}{m}}}.$$
It can be shown that the sampling distribution of the above statistic is an approximate t distribution with v degrees of freedom, where
$$v = \frac{\left(\dfrac{S_X^2}{n} + \dfrac{S_Y^2}{m}\right)^2}{\dfrac{1}{n-1}\left(\dfrac{S_X^2}{n}\right)^2 + \dfrac{1}{m-1}\left(\dfrac{S_Y^2}{m}\right)^2}.$$
Before sampling, v is random and unknown.
After sampling, the actual value of v is fixed and can be
found.
Remark that after sampling, the actual value of the degree of
freedom v is not always an integer. So, in practice, we must
round down to the nearest integer to achieve the desired
confidence interval.
That is, if v = 1.4, then take 1; if v = 2.9, then take 2.
Thus, the approximate 100(1-α)% C.I. for μX - μY is
$$\left[(\bar x - \bar y) - t_{v,\,\alpha/2}\sqrt{\frac{s_X^2}{n} + \frac{s_Y^2}{m}},\ (\bar x - \bar y) + t_{v,\,\alpha/2}\sqrt{\frac{s_X^2}{n} + \frac{s_Y^2}{m}}\right].$$
If v > 30, then the confidence interval becomes
$$\left[(\bar x - \bar y) - z_{\alpha/2}\sqrt{\frac{s_X^2}{n} + \frac{s_Y^2}{m}},\ (\bar x - \bar y) + z_{\alpha/2}\sqrt{\frac{s_X^2}{n} + \frac{s_Y^2}{m}}\right].$$
Question
A study was conducted by the Department of Zoology at the Virginia
Polytechnic Institute and State University to estimate the difference in the
amount of the chemical orthophosphorus measured at two different stations
on the James River. Orthophosphorus is measured in milligrams per liter.
Fifteen samples were collected from station 1 and 12 samples were obtained
from station 2. The 15 samples from station 1 had an average
orthophosphorus content of 3.84 milligrams per liter and a standard deviation
of 3.07 milligrams per liter, while the 12 samples from station 2 had an
average content of 1.49 milligrams per liter and a standard deviation of 0.80
milligram per liter. Find a 95% confidence interval for the difference in the true
average orthophosphorus contents at these two stations, assuming that the
observations came from normal populations with different variances.
This is a two-sample problem with α = 0.05, normal populations and different variances, where
$$\bar x = 3.84,\ s_X = 3.07,\ n = 15 \quad\text{and}\quad \bar y = 1.49,\ s_Y = 0.80,\ m = 12.$$
Consider µ1 - µ2, where µi is the true average orthophosphorus content at station i, i = 1 and 2.
Since the population variances are assumed to be unequal, we can only find an approximate 95% C.I. based on the t distribution with v degrees of freedom, where
$$v = \frac{\left(3.07^2/15 + 0.80^2/12\right)^2}{\left[(3.07^2/15)^2/14\right] + \left[(0.80^2/12)^2/11\right]} = 16.3 \approx 16.$$
So, for α = 0.05, we have
$$t_{v,\,\alpha/2} = t_{16,\,0.025} = 2.120.$$
Thus, the 95% C.I. for µ1 - µ2 is
$$(\bar x - \bar y) \pm t_{16,\,0.025}\sqrt{\frac{s_X^2}{n} + \frac{s_Y^2}{m}} = (3.84 - 1.49) \pm (2.120)\sqrt{\frac{3.07^2}{15} + \frac{0.80^2}{12}} = [0.60, 4.10].$$
Hence, we can say that we are 95% confident that the interval from 0.60 to 4.10 milligrams per liter contains the difference of the true average orthophosphorus contents for stations 1 and 2.
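The degrees-of-freedom formula and the resulting interval for the James River data can be checked as follows. This is a minimal sketch: the function name `approx_df` is hypothetical, and the critical value $t_{16,\,0.025} = 2.120$ is taken from a t table as in the solution above.

```python
from math import floor, sqrt

def approx_df(s_x, s_y, n, m):
    """Approximate degrees of freedom v for unequal, unknown variances."""
    a, b = s_x ** 2 / n, s_y ** 2 / m
    return (a + b) ** 2 / (a ** 2 / (n - 1) + b ** 2 / (m - 1))

xbar, s_x, n = 3.84, 3.07, 15   # station 1
ybar, s_y, m = 1.49, 0.80, 12   # station 2

v = approx_df(s_x, s_y, n, m)
df = floor(v)                   # round down, as described in the remark

T_16_0025 = 2.120               # t_{16, 0.025}, from a t table (assumption)
half = T_16_0025 * sqrt(s_x ** 2 / n + s_y ** 2 / m)
lo, hi = (xbar - ybar) - half, (xbar - ybar) + half
print(round(v, 1), df, round(lo, 2), round(hi, 2))
```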
One- (or Two-) sample(s)
Confidence Interval for µX (or µX - µY)
with NON-NORMAL population(s)
Approximate C.I. in One-sample case
Note that, so far, all results are based on normal population(s). Then a natural question is: how do we construct a C.I. when the population is NOT normal?
Unfortunately, in general, it is not easy to find a statistic whose exact distribution is easily found in this case.
However, if the sample size is large enough, then we can use a normal approximation for the distribution of the statistic used to construct the C.I.
Central Limit Theorem (CLT)
If $\bar X$ is the sample mean of a random sample X1,…, Xn of size n from any distribution with a finite mean µ and a finite positive variance $\sigma^2$, then the distribution of
$$\frac{\bar X - \mu}{\sigma/\sqrt{n}} = \frac{\sum_{i=1}^{n} X_i - n\mu}{\sqrt{n}\,\sigma}$$
is the standard normal distribution N(0,1) in the limit as n goes to infinity.
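A small simulation can make the theorem concrete. The sketch below is an illustration under assumptions of my own choosing: it uses the exponential distribution with rate 1 (mean 1, standard deviation 1) as a non-normal population, samples of size n = 50, and the hypothetical function name `fraction_within`. It standardizes each sample mean and reports how often the result falls between -1.96 and 1.96.

```python
import random
from math import sqrt
from statistics import fmean

def fraction_within(n=50, reps=4000, z=1.96, seed=1):
    """Share of standardized sample means lying in [-z, z].

    Population: Exp(1), a skewed distribution with mean 1 and sd 1
    (a hypothetical choice made only for this illustration).
    """
    rng = random.Random(seed)
    mu = sigma = 1.0
    inside = 0
    for _ in range(reps):
        xbar = fmean(rng.expovariate(1.0) for _ in range(n))
        if abs((xbar - mu) / (sigma / sqrt(n))) <= z:
            inside += 1
    return inside / reps

frac = fraction_within()
print(f"{frac:.3f}")
```

For a standard normal variable the corresponding probability is 0.95, so a fraction near 0.95 suggests that the normal approximation is already reasonable at this sample size.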
Approximate C.I. for µ
Case I: Any distribution with unknown mean and KNOWN variance:
Consider a random sample of size n, {X1, X2, …, Xn}, from a distribution with unknown mean µ and KNOWN variance $\sigma^2$.
After sampling, we can find an actual value of the sample mean, say $\bar x$. Thus, the APPROXIMATE 100(1-α)% C.I. for μ is
$$\left[\bar x - \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}},\ \bar x + \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}}\right].$$
Case II: Any distribution with unknown mean and UNKNOWN variance:
After sampling, we can find the actual values of the sample mean and sample standard deviation, say $\bar x$ and s. Thus, the APPROXIMATE 100(1-α)% C.I. for μ is
$$\left[\bar x - \frac{t_{n-1,\alpha/2}\,s}{\sqrt{n}},\ \bar x + \frac{t_{n-1,\alpha/2}\,s}{\sqrt{n}}\right].$$
If n is large enough, then the approximate 100(1-α)% C.I. for μ becomes
$$\left[\bar x - \frac{z_{\alpha/2}\,s}{\sqrt{n}},\ \bar x + \frac{z_{\alpha/2}\,s}{\sqrt{n}}\right].$$
Approximate C.I. in Two-sample case
Consider two independent random samples from distributions with means µX and µY and variances $\sigma_X^2$ and $\sigma_Y^2$, respectively.
(i) In a case of the SAME variance (say, $\sigma_X^2 = \sigma_Y^2 = \sigma^2$), the APPROXIMATE 100(1-α)% C.I. for µX - µY is
(if the variance $\sigma^2$ is known)
$$(\bar x - \bar y) \pm z_{\alpha/2}\,\sigma\sqrt{\frac{1}{n} + \frac{1}{m}}$$
(if the variance $\sigma^2$ is unknown)
$$(\bar x - \bar y) \pm t_{n+m-2,\,\alpha/2}\; s_p\sqrt{\frac{1}{n} + \frac{1}{m}}$$
or
$$(\bar x - \bar y) \pm z_{\alpha/2}\; s_p\sqrt{\frac{1}{n} + \frac{1}{m}}$$
if n+m-2 is large enough.
(ii) In a case of DIFFERENT variances, the APPROXIMATE 100(1-α)% C.I. for µX - µY is
(if the variances are known)
$$(\bar x - \bar y) \pm z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}}$$
(if the variances are unknown)
$$(\bar x - \bar y) \pm t_{v,\,\alpha/2}\sqrt{\frac{s_X^2}{n} + \frac{s_Y^2}{m}}$$
or
$$(\bar x - \bar y) \pm z_{\alpha/2}\sqrt{\frac{s_X^2}{n} + \frac{s_Y^2}{m}}$$
if v is large enough OR n and m are large enough.
Confidence Interval for σ2
with NORMAL population
Confidence interval for σ2
Case: Normal distribution with UNKNOWN variance:
Consider a random sample of size n, {X1, X2, …, Xn}, from a normal distribution with UNKNOWN mean and UNKNOWN variance $\sigma^2$. Then, the statistic
$$\frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^{n} (X_i - \bar X)^2}{\sigma^2}$$
has a chi-squared (or $\chi^2$) distribution with n - 1 degrees of freedom. We denote it by
$$X_{n-1}^2 = \frac{(n-1)S^2}{\sigma^2} \sim \chi_{n-1}^2.$$
Chi-squared distribution with k degrees of freedom
Not symmetric !!
How to construct the interval?
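Define a quantity $\chi_\alpha^2$ such that
$$P(X_k^2 \ge \chi_\alpha^2) = \alpha.$$
The value of $\chi_\alpha^2$ is found from the table of the chi-squared distribution with k degrees of freedom. So, with k = n - 1, we have
$$P\left(\chi_{1-\alpha/2}^2 \le \frac{(n-1)S^2}{\sigma^2} \le \chi_{\alpha/2}^2\right) = 1 - \alpha.$$
[Figure: density function of the chi-squared random variable $X_{n-1}^2$ with n-1 degrees of freedom, with area α/2 to the left of $\chi_{1-\alpha/2}^2$, area 1-α in the middle, and area α/2 to the right of $\chi_{\alpha/2}^2$.]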
Rearranging the inequalities inside the probability so that $\sigma^2$ stands alone in the middle gives
$$P\left(\frac{(n-1)S^2}{\chi_{\alpha/2}^2} \le \sigma^2 \le \frac{(n-1)S^2}{\chi_{1-\alpha/2}^2}\right) = 1 - \alpha.$$
After sampling, we can find an actual value of the sample variance, say $s^2$. Thus, the 100(1-α)% C.I. for $\sigma^2$ is
$$\left[\frac{(n-1)s^2}{\chi_{\alpha/2}^2},\ \frac{(n-1)s^2}{\chi_{1-\alpha/2}^2}\right].$$
How to use the table of chi-squared distribution
$$P(X_k^2 \ge \chi_\alpha^2) = \alpha.$$
[Figure: table of the chi-squared distribution. The columns give the value of α and the rows give the degrees of freedom. For example, 20.483 = $\chi_{0.025}^2$ with 10 degrees of freedom.]
Questions
Page 21
For a chi-squared distribution with v degrees of freedom,
a) If v = 5, then $\chi_{0.005}^2 = 16.750$.
b) If v = 19, then $\chi_{0.05}^2 = 30.144$.
Questions
Page 21
For a chi-squared distribution with v degrees of freedom, find
a) $\chi_\alpha^2$ such that $P(X_v^2 \ge \chi_\alpha^2) = 0.025$ when v = 19;
Here α = 0.025, so $\chi_\alpha^2 = \chi_{0.025}^2 = 32.852$.
b) $\chi_\alpha^2$ such that $P(37.652 \le X_v^2 \le \chi_\alpha^2) = 0.045$ when v = 25;
Since $37.652 = \chi_{0.05}^2$ with 25 degrees of freedom,
$$\begin{aligned}
0.045 &= P(37.652 \le X_{25}^2 \le \chi_\alpha^2) \\
&= P(X_{25}^2 \ge 37.652) - P(X_{25}^2 \ge \chi_\alpha^2) \\
&= 0.05 - P(X_{25}^2 \ge \chi_\alpha^2).
\end{aligned}$$
Hence $P(X_{25}^2 \ge \chi_\alpha^2) = 0.05 - 0.045 = 0.005 = \alpha$, and so $\chi_\alpha^2 = \chi_{0.005}^2 = 46.928$.
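Questions
Page 21
For a chi-squared distribution with v degrees of freedom, find $\chi_\alpha^2$ such that
a) $P(X_v^2 < \chi_\alpha^2) = 0.95$ when v = 6;
Here $P(X_6^2 \ge \chi_\alpha^2) = 0.05$, so α = 0.05 and $\chi_\alpha^2 = \chi_{0.05}^2 = 12.592$.
b) $P(\chi_\alpha^2 < X_v^2 < 23.209) = 0.015$ when v = 10;
Since $23.209 = \chi_{0.01}^2$ with 10 degrees of freedom, $P(X_{10}^2 \ge \chi_\alpha^2) = 0.015 + 0.01 = 0.025$, so α = 0.025 and $\chi_\alpha^2 = \chi_{0.025}^2 = 20.483$.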
How about the confidence interval for σ, not $\sigma^2$?
Recall that
$$P\left(\frac{(n-1)S^2}{\chi_{\alpha/2}^2} \le \sigma^2 \le \frac{(n-1)S^2}{\chi_{1-\alpha/2}^2}\right) = 1 - \alpha.$$
A 100(1 - α)% confidence interval for σ can be obtained by taking the square root of each endpoint of the interval for $\sigma^2$. That is,
$$\left[\sqrt{\frac{(n-1)s^2}{\chi_{\alpha/2}^2}},\ \sqrt{\frac{(n-1)s^2}{\chi_{1-\alpha/2}^2}}\right].$$
Example
The following are the weights, in decagrams, of 10
packages of grass seed distributed by a certain company:
46.4, 46.1, 45.8, 47.0, 46.1, 45.9, 45.8, 46.9, 45.2 and 46.0.
Find a 95% C.I. for the variance of all such packages of
grass seed distributed by this company, assuming that a
normal population is used.
Here n = 10 and α = 0.05, so from the table of the chi-squared distribution with 9 degrees of freedom,
$$\chi_{\alpha/2}^2 = \chi_{0.025}^2 = 19.023 \quad\text{and}\quad \chi_{1-\alpha/2}^2 = \chi_{0.975}^2 = 2.700.$$
The sample variance is
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar x)^2 = 0.286.$$
Thus, the 95% C.I. for the variance is
$$\left[\frac{9(0.286)}{19.023},\ \frac{9(0.286)}{2.700}\right] = [0.135, 0.953].$$
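The grass-seed calculation can be reproduced from the raw data. This sketch assumes the two chi-squared critical values for 9 degrees of freedom are read from a table, as in the example; the variable names are illustrative. The code keeps the unrounded sample variance, so its upper endpoint differs from the hand calculation in the third decimal place. It also takes square roots of the endpoints to give the corresponding interval for σ.

```python
from math import sqrt
from statistics import variance

weights = [46.4, 46.1, 45.8, 47.0, 46.1, 45.9, 45.8, 46.9, 45.2, 46.0]
n = len(weights)
s2 = variance(weights)   # sample variance (divides by n - 1)

# Table values for 9 degrees of freedom (assumption: read from a table).
CHI2_0025 = 19.023       # upper 0.025 point
CHI2_0975 = 2.700        # upper 0.975 point

var_lo = (n - 1) * s2 / CHI2_0025
var_hi = (n - 1) * s2 / CHI2_0975
sd_lo, sd_hi = sqrt(var_lo), sqrt(var_hi)   # interval for sigma
print(round(s2, 3), round(var_lo, 3), round(var_hi, 3))
```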
Sample size determination
Before we end the topic of estimation, let’s consider the
problem of how to determine the sample size.
Often, we wish to know how large a sample is necessary
to ensure that the error in estimating an unknown
parameter, say µ, will be less than a specified amount e.
Consider a 100(1-α)% C.I. for µ with known variance. The margin of error is
$$\frac{z_{\alpha/2}\,\sigma}{\sqrt{n}}.$$
Thus, solving for the sample size n in the equation
$$\frac{z_{\alpha/2}\,\sigma}{\sqrt{n}} = e$$
implies that the required sample size is
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^2.$$
Question
Page 23
A marketing research firm wants to conduct a survey to
estimate the average amount spent on entertainment by
each person visiting a popular resort. The people who
plan the survey would like to have an estimate close to
the true value such that we will have 95% confidence
that the difference between them is within $120. If the
population standard deviation is $400, then how large
should the sample be?
Here α = 0.05, so $z_{\alpha/2} = z_{0.025} = 1.96$. The population standard deviation is σ = 400, and requiring $|\mu - \bar x| \le 120$ gives e = 120.
Then, the required sample size is
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^2 = \left(\frac{1.96 \times 400}{120}\right)^2 = 42.68.$$
Since the sample size must be an integer, round up and take n = 43.
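The sample-size formula is easy to wrap in a helper. The sketch below is illustrative only: the function name `required_sample_size` is hypothetical, and it rounds the result up to the next integer because a sample size must be a whole number that keeps the error within e.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, e, conf):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= e (sigma known)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    return ceil((z * sigma / e) ** 2)

n_needed = required_sample_size(sigma=400, e=120, conf=0.95)
print(n_needed)
```

With the resort survey numbers (σ = 400, e = 120, 95% confidence) the raw value of the formula is about 42.68, so the helper returns 43.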