Oller's Brief on: Confidence Intervals

CONCEPTS AND OBJECTIVES
1. Identifying and constructing confidence intervals.
2. How to construct an interval when σ is known.
3. How to construct an interval when σ is unknown.
4. How to select the appropriate sample size for a desired margin of error.
5. Using Excel to construct confidence intervals.
A. BACKGROUND INFORMATION
We know from previous exercises:
1. A sample of size n is drawn from a population.
2. If you sum the values of the sample and divide by n, you will obtain a sample mean.
That sample mean is a point estimate of the population mean μ.
3. Our expectation is that the sample mean will equal the population mean: E[x̄] = μ.
However, any single sample mean is unlikely to equal μ exactly. The difference between
them, x̄ − μ, is the sampling error.
4. Additionally, we saw that if we took many samples from a population, the sample
means (and sampling errors) would follow a normal distribution in two instances:
a. If the sample size is greater than or equal to 30 (approximately normal).
b. If the population is normally distributed.
This distribution has its own standard deviation: the standard deviation of the sample means
(equivalently, of the sampling errors). It is called the standard error and is denoted $\sigma_{\bar{x}}$.
It should not be confused with the standard deviation of the population, σ. Rather than
drawing numerous samples to estimate the standard error directly, we use the well-known
property

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

Using Excel to calculate the standard error: type =(value for the population standard
deviation)/sqrt(sample size).
5. We know that we can construct a z-score for any sample mean x̄ drawn from a normally
distributed set of sample means:

$$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$$
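A minimal Python sketch of the two formulas above, using hypothetical values (σ = 24, n = 36, μ = 120, x̄ = 125). The helper names `standard_error` and `z_score` are illustrative only, not from any library; `standard_error` mirrors the Excel formula =(σ/sqrt(n)).

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sample means: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def z_score(xbar, mu, sigma, n):
    """Number of standard errors between the sample mean xbar and mu."""
    return (xbar - mu) / standard_error(sigma, n)

print(standard_error(24, 36))      # 4.0
print(z_score(125, 120, 24, 36))   # 1.25
```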
B. THE CONFIDENCE INTERVAL
Rationale: x̄ is merely a point estimate of μ. We can learn more about the quality of our
sample mean by constructing a confidence interval.
1. Confidence Interval- The confidence interval is equal to x  Margin of Error.
2. Confidence Level- The confidence level is the percentage of samples that will be
within the confidence interval. E.g. 90%
3. Confidence coefficient- The confidence level expressed as a proportion. E.g. .9
4. The level of significance- The complement to the confidence coefficient. α= Level of
Significance= 1- confidence coefficient. This is the probability that an interval estimation
procedure will yield an interval that does not include μ.
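To make the definition of the confidence level concrete, the Python sketch below simulates many samples from a normal population with a known mean and counts how often the interval x̄ ± z·σ/√n captures μ. The population parameters, the `coverage` helper, and the seed are hypothetical choices for illustration, and the Z table lookup is replaced by the standard library's `statistics.NormalDist().inv_cdf`.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu=100.0, sigma=15.0, n=36, conf=0.90, trials=2000, seed=1):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)                      # fixed seed for repeatability
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # critical value z_{alpha/2}
    margin = z * sigma / n ** 0.5                  # margin of error
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]  # draw from a normal population
        xbar = fmean(sample)
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials

print(coverage())
print(coverage(conf=0.99))
```

The printed fractions should land near 0.90 and 0.99 but rarely exactly on them. Each individual interval either contains μ or it does not; the confidence level describes the long-run hit rate of the procedure.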
Which method we use to construct the confidence interval depends on our particular
circumstances:
Case 1- σ is known
Case 2- σ is unknown
Case 3- n<30 and the population is not normally distributed.
C. CASE 1, σ KNOWN
First, the sample size needs to be large or the population needs to be normally distributed,
so that we can infer that the sampling distribution of x̄ is normal.
With a normal sampling distribution and a known population standard deviation, we can use the
standard normal (Z) distribution.
The CI = x̄ ± Margin of Error. The margin of error equals the critical z score for the chosen
level of significance multiplied by the standard error.
$$CI = \bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
Where:
CI = the confidence interval
x̄ = the point estimate (the sample mean)
σ/√n = the standard error
α% = the error we are willing to accept (the level of significance)
(1 − α)% = the confidence level (confidence coefficient)
$z_{\alpha/2}$ = the critical value of Z: the value corresponding to an area (probability) of α/2 in
both the lower and upper tails.
This means that α/2 of the probability lies in the upper tail and α/2 in the lower tail. To
find the critical z value, look up 1-α/2 in the body of the (cumulative) Z table and read off
the corresponding z score.
Why use 1-α/2 instead of 1-α?
[Figure: Standard Normal Distribution. The confidence interval occupies the center of the curve, with an area of α/2 in each tail. Axes: Score (−3 to 3) vs. F(x).]
Observations:
1. The higher the confidence level, the wider the CI.
2. $z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$ is the maximum sampling error, i.e. the margin of error: with probability
1 − α, the sampling error |x̄ − μ| will not exceed it.
Suppose:
x̄ = 125
σ = 24
n = 36
Find (construct) a 90% CI for the population mean.
To find the critical z value, look up the relevant probability in the body of the table:
Use 1-α/2.
α = 1 − .9 = .1, so α/2 = .1/2 = .05.
So, look up 1 − .05 = .95.
Z = 1.65 (the table value lies between 1.64 and 1.65; the more precise value is 1.645).
In Excel, find the Z score by typing =normsinv(1-α/2). Find the margin of error by typing
=Z score result*(σ/sqrt(n)). You can find the lower boundary of the confidence interval
by typing =x̄ - margin of error, and the upper boundary of the confidence interval by
typing =x̄ + margin of error.
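The same Excel steps, sketched in Python for the worked example above (x̄ = 125, σ = 24, n = 36, 90% confidence). `z_interval` is a hypothetical helper name, and `statistics.NormalDist().inv_cdf` plays the role of NORMSINV.

```python
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf):
    """CI for mu when sigma is known: xbar ± z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)   # Excel: =normsinv(1-alpha/2)
    margin = z * sigma / n ** 0.5             # Excel: =Z*(sigma/sqrt(n))
    return xbar - margin, xbar + margin

lower, upper = z_interval(125, 24, 36, 0.90)
print(round(lower, 2), round(upper, 2))
```

This gives roughly (118.42, 131.58). Using the rounded table value Z = 1.65 instead gives a margin of 1.65 × 4 = 6.6, i.e. (118.4, 131.6).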
How would you interpret this result?
Construct a 99% CI for the population mean
D. CASE 2, σ UNKNOWN
In this case, we still assume that the sampling distribution of x̄ is normal (the sample
size is large or the population is normally distributed); however, we cannot use σ to find
the standard error. We must use the sample standard deviation S as a proxy.
Recall that

$$S = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

a. In Excel, you just need to enter =stdev(sample range).
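For comparison with Excel's =stdev and =stdevp, Python's `statistics` module provides `stdev` (divides by n − 1) and `pstdev` (divides by N). The data values below are hypothetical.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical sample
s = statistics.stdev(data)          # divides by n - 1, like Excel =stdev(range)
sigma = statistics.pstdev(data)     # divides by N, like Excel =stdevp(range)
print(round(s, 4), sigma)
```

Here S ≈ 2.1381 while the population formula gives 2.0; dividing by n − 1 makes S slightly larger, which compensates for measuring spread around x̄ rather than μ.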
If σ is unknown, then we cannot use the z distribution, because

$$\frac{\bar{x} - \mu}{S/\sqrt{n}}$$

is no longer a standard normal random variable. It does not follow the Z distribution
anymore; instead, it follows a particular distribution called the t-distribution.
The t-distribution is:
1. A continuous distribution
2. Bell-shaped & symmetrical about zero
3. More spread out (flatter, with heavier tails) than the standard normal distribution;
however, as n increases, the t-distribution approaches the z-distribution.
To use the t-table (page 329) of your book, you need two pieces of information.
1. α/2
2. The degrees of freedom. DF=n-1.
Notes:
1. As n (and likewise the DF) becomes larger, the critical t value declines.
2. As n becomes large (greater than about 30), t ≈ z.
CI using the t-distribution:

$$CI = \bar{x} \pm t_{df=n-1,\,\alpha/2} \cdot \frac{S}{\sqrt{n}}$$

Where $t_{df=n-1,\,\alpha/2}$ is the t value from the t-table, using the t curve with df = n-1 and a
right-tail area of α/2.
USING EXCEL- To calculate the value of t using Excel, type the following:
=tinv(α,degrees of freedom)
****Make sure you use α, not α/2. Excel automatically makes the adjustment.****
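Python's standard library has no inverse t function, so the sketch below approximates one: it integrates the t density numerically (Simpson's rule) and bisects for the point with area α/2 in the upper tail, which is the value Excel's =tinv(α, df) returns. This is a hypothetical teaching sketch, not a library routine; in practice the t-table or a statistics package (for example SciPy's `scipy.stats.t.ppf`) would be used. The summary statistics (x̄ = 50, S = 8, n = 16) are made up for illustration.

```python
import math

def t_cdf(t, df, steps=2000):
    """P(T <= t) for t >= 0: 0.5 plus the area under the t density from 0 to t."""
    # Log of the normalizing constant of Student's t density.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    # Composite Simpson's rule on [0, t]; steps must be even.
    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3

def t_critical(alpha, df):
    """Value with area alpha/2 in the upper tail (like Excel's =tinv(alpha, df))."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 100.0   # assumes the critical value lies below 100
    for _ in range(50):   # bisection on the cumulative probability
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(xbar, s, n, conf):
    """CI for mu when sigma is unknown: xbar ± t * s / sqrt(n), with df = n - 1."""
    t = t_critical(1 - conf, n - 1)
    margin = t * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical summary statistics: xbar = 50, S = 8, n = 16, 95% confidence.
print(t_interval(50, 8, 16, 0.95))

# Critical values shrink toward z = 1.96 as the degrees of freedom grow.
for df in (5, 15, 30, 100, 1000):
    print(df, round(t_critical(0.05, df), 3))
```

The loop also illustrates the notes on degrees of freedom: the critical value falls from about 2.571 at df = 5 toward 1.96 as df grows. The numerical integration is accurate for the moderate critical values used in typical problems.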
Do problem 18 on page 336.
E. CASE 3, n<30 and the population is not normally distributed.
You cannot use t or z. You must increase your sample size.
F. HOW TO DETERMINE YOUR SAMPLE SIZE GIVEN A DESIRED MARGIN OF ERROR.
The margin of error is

$$z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$

Let

$$e = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$

If we know the values of e, σ, and z, then we can find n.
Solve for n:

$$e\sqrt{n} = z_{\alpha/2}\,\sigma$$

$$\sqrt{n} = \frac{z_{\alpha/2}\,\sigma}{e}$$

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^2$$
If e = 30   200   .05 , find the required sample size and always round up.
APPENDIX
Term | Description | Formula | Excel Command

Population Standard Deviation (σ)
Description: This is the standard deviation of the entire population. It measures the average dispersion from the mean.
Formula: $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
Excel Command: =stdevp(data)

Sample Standard Deviation (s)
Description: This is the standard deviation of your sample. It is an approximation of σ. For example, if you have a population of 500 and sample 100 of these observations, σ would be the standard deviation of all 500, whereas s would be the standard deviation of the 100 observations in your sample.
Formula: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Excel Command: =stdev(data)

Population Mean (µ)
Description: This is the average of the entire population.
Formula: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
Excel Command: =average(data)

Sample Mean (x̄)
Description: This is the average of the sample. It can take numerous potential values, depending on which sample is drawn. It is a point estimate of the population mean.
Formula: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Excel Command: =average(data)

Standard Error ($\sigma_{\bar{x}}$)
Description: This is the standard deviation of the sampling distribution (the distribution of the potential values of the sample mean). It is roughly the typical error when using a sample mean to estimate μ.
Formula: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ if σ is known, or $\sigma_{\bar{x}} = \frac{s}{\sqrt{n}}$ if σ is unknown.
Excel Command: =σ/sqrt(n)

Z(α/2)
Description: This is the z score at the location where the area in each tail is α/2. It measures distance from the mean in terms of standard errors: it tells you how far from the mean you must go to leave that probability of occurrence in each tail. We use Z when σ is known.
Formula: A z-score is equal to $\frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$. However, for a confidence interval we need the z-score at a given probability, so don't use this calculation for confidence intervals.
Excel Command: =normsinv(α/2) gives the negative z-score; =normsinv(1-α/2) gives the positive z-score. The two values differ only in sign.

t(α/2)
Description: This is the t score at the location where the area in each tail is α/2. It measures distance from the mean in terms of standard errors: it tells you how far from the mean you must go to leave that probability of occurrence in each tail. We use this distribution when σ is unknown.
Formula: A t-score is equal to $\frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$, with $\sigma_{\bar{x}}$ estimated by $s/\sqrt{n}$. However, for a confidence interval we need the t-score at a given probability, so don't use this calculation for confidence intervals.
Excel Command: =tinv(α, d.f.). The degrees of freedom for these problems are n-1. This finds a t-score with an area of α/2 in the upper tail.

Margin of Error
Description: This is the distance you add to and subtract from the sample mean to get your confidence interval. It is determined by your level of significance and the standard error.
Formula: Margin of Error = Z(α/2)·$\sigma_{\bar{x}}$ if using the standard normal distribution, or t(α/2)·$\sigma_{\bar{x}}$ if using the t distribution.
Excel Command: Just use * to multiply the two terms.

Confidence Interval
Description: This is a range produced by a procedure that captures the population mean with probability equal to the confidence level (e.g., 90% of intervals built this way from repeated samples contain μ).
Formula: Upper Boundary = x̄ + Margin of Error; Lower Boundary = x̄ - Margin of Error.