Uploaded by Neysi Parada Gamboa

Ch7 (Part 1)

Chapter 7.1
CONFIDENCE INTERVALS AND SAMPLE SIZE WHEN σ IS KNOWN
Estimation
• One important aspect of inferential statistics is estimation: the process of estimating a population parameter using information obtained from a sample.
• The accuracy of estimation depends on several key factors:
  • Sample size
  • Whether the observations are chosen randomly
  • Other assumptions that we will explore
In this chapter, statistical procedures for estimating the population mean, variance, and standard deviation will be explored.
What We Will Cover:
• Confidence intervals for the population mean when σ is known
• Determining sample size
• Confidence intervals for the population mean when σ is unknown
Point Estimates
• A point estimate is a specific numerical value used to estimate a parameter.
• The best point estimate of the population mean μ is the sample mean x̄.
• Sample measures (i.e., statistics) that are used to estimate population measures (i.e., parameters) are also called estimators.
• Statisticians are always looking for “good” estimators.
Properties of a Good Estimator
• Point estimators are considered “good” if they demonstrate the following properties:
  • Unbiasedness
  • Efficiency (relative efficiency)
  • Consistency
Unbiasedness
• For an estimator to be unbiased, the expected value (or mean) of the estimates obtained from samples of a given size must equal the parameter being estimated.
• θ = the population parameter of interest
• θ̂ = the point estimator of θ
• The sample statistic θ̂ is an unbiased estimator of the population parameter θ if

$$E(\hat{\theta}) = \theta$$
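To make the definition concrete, the sketch below enumerates every possible sample of size 2 from a small, hypothetical population (the values 2, 4, 6, 8 are an assumption for illustration, not from the slides) and checks that the average of all the sample means equals the population mean.

```python
from itertools import product
from statistics import mean

# Hypothetical toy population (illustration only, not from the slides).
population = [2, 4, 6, 8]
mu = mean(population)

# Every possible ordered sample of size n = 2, drawn with replacement.
# Each of these samples is equally likely under simple random sampling.
samples = list(product(population, repeat=2))
sample_means = [mean(s) for s in samples]

# The expected value of the estimator is its average over all samples.
expected_xbar = mean(sample_means)

print("population mean:", mu)
print("mean of all sample means:", expected_xbar)
```

Because the two printed values agree, the sample mean behaves as an unbiased estimator of μ for this toy population.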
Efficiency
• Assume that a simple random sample of n elements can be used to provide two unbiased estimators of the same population parameter, say θ̂₁ and θ̂₂.
• The point estimator θ̂₁ is said to be more efficient relative to θ̂₂ if:

$$\operatorname{Var}(\hat{\theta}_1) < \operatorname{Var}(\hat{\theta}_2)$$

• In other words, the point estimator with the smaller variance, and hence the smaller standard deviation, will tend to provide estimates that are closer to the true population parameter.
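One way to see relative efficiency is a small simulation. The sketch below assumes a normal population (mean 50, standard deviation 10, both hypothetical) and compares two estimators of its center, the sample mean and the sample median, by the variance of their estimates over many repeated samples.

```python
import random
from statistics import mean, median, pvariance

rng = random.Random(0)   # fixed seed so the run is repeatable

n = 25                   # sample size (hypothetical)
replications = 2000      # number of repeated samples

mean_estimates = []
median_estimates = []
for _ in range(replications):
    # Draw one sample from the assumed normal population.
    sample = [rng.gauss(50, 10) for _ in range(n)]
    mean_estimates.append(mean(sample))
    median_estimates.append(median(sample))

# Spread of each estimator across the repeated samples.
var_mean = pvariance(mean_estimates)
var_median = pvariance(median_estimates)

print(f"Var(sample mean)   = {var_mean:.2f}")
print(f"Var(sample median) = {var_median:.2f}")
```

For normally distributed data the sample mean should show the smaller variance, which is what "more efficient" means here.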
Consistency
• A point estimator is consistent if its values tend to become closer to the true population parameter as the sample size becomes larger.
• As the sample size n approaches the population size N, the value of the estimator should get closer to the population parameter.
Interval Estimation
• Interval estimate: an interval or range of values used to estimate the parameter.
• For example, we might say that the mean height of males falls within 3 inches of 70 inches:
  67 ≤ μ ≤ 73, or 70 ± 3
• However, the interval estimate may or may not contain the true value of the underlying population parameter.
• A degree of confidence must be assigned before creating an interval estimate.
Interval Estimation
• Confidence level: the probability that the interval estimate will contain the true population parameter, given repeated sampling.
• This is often expressed as a percentage.
• For example, one might say they are 99% confident that the interval estimate 70 ± 3 contains the true population mean, μ.
• This is interpreted as: with repeated sampling, a proportion 0.99 of the intervals constructed this way contain the true population mean.
Interval Estimation
• Confidence interval (CI): a specific interval estimate of a parameter, determined by using data obtained from a sample and the specific confidence level of the estimate.
• When an interval estimate is made and a confidence level is assigned, we have a confidence interval.
• The most common confidence levels used for confidence intervals are 90%, 95%, and 99%.
Margin of Error
• A point estimator cannot be expected to provide the exact value of the population parameter.
• But an interval estimate can be formed by adding and subtracting a margin of error to and from the point estimate:
  x̄ ± margin of error
• Margin of error (maximum error of the estimate): the maximum likely difference between the point estimate of a parameter and the actual value of the parameter.
• The purpose is to provide information about how close the point estimate is to the population parameter.
Assumptions for CI when σ is Known
• The sample must be a random sample.
  • All of the sample points have an equal chance of being selected.
• In most applications, a sample size of n ≥ 30 is adequate.
  • If n < 30, then the population must be normally distributed.
• If the population distribution is highly skewed or contains outliers, then a sample size of 50 or more is recommended.
• In practice, get as much data as you can; more is better.
Interval Estimate of a Population Mean
when σ is Known
• In order to develop an interval estimate of a population mean, the margin of error must be computed using either:
  • The population standard deviation, σ
  • The sample standard deviation, s
• σ is rarely known exactly, but often a good estimate can be obtained from comprehensive historical data.
• This is typically what we are referring to when we say that σ is known.
Interval Estimate of a Population Mean
when σ is Known
• Interval estimate of μ:

$$\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$

where
  x̄ = the sample mean
  1 − α = the confidence coefficient
  z_{α/2} = the z value providing an area of α/2 in the upper tail of the standard normal distribution
  σ = the population standard deviation
  n = the sample size
Interval Estimate of a Population Mean
when σ is Known
• Values of z_{α/2} for the most commonly used confidence levels are:

Confidence Level   α      α/2     Table Lookup Area   z_{α/2}
90%                0.10   0.05    0.95                1.65
95%                0.05   0.025   0.975               1.96
99%                0.01   0.005   0.995               2.58
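The lookup steps in this table can be reproduced in code. This sketch assumes the standard library's `NormalDist().inv_cdf` as a stand-in for the printed z table; the dictionary name `z_by_level` is just an illustrative choice.

```python
from statistics import NormalDist

std_normal = NormalDist()   # standard normal: mean 0, standard deviation 1

z_by_level = {}
for confidence in (0.90, 0.95, 0.99):
    alpha = 1 - confidence
    lookup_area = 1 - alpha / 2          # cumulative area to the left of z
    z = std_normal.inv_cdf(lookup_area)
    z_by_level[confidence] = z
    print(f"{confidence:.0%}  alpha={alpha:.2f}  area={lookup_area:.3f}  z={z:.3f}")
```

To three decimals the values come out near 1.645, 1.960, and 2.576, so the 1.65 and 2.58 used on the slide are two-decimal roundings.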
Interval Estimate of a Population Mean
when σ is Known
Meaning of confidence
• Suppose we choose α = 0.1 and construct the intervals using x̄ ± 1.65 · σ/√n.
• We can say that 90% of the intervals constructed this way, given repeated sampling, will contain the true population mean.
• We say that this interval has been established at the 90% confidence level.
• The value 0.9 is referred to as the confidence coefficient.
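The repeated-sampling interpretation can be checked with a simulation. The sketch below assumes a normal population with hypothetical values μ = 70, σ = 3, and n = 36, builds many intervals with z = 1.65, and counts how often they contain μ.

```python
import random
from math import sqrt

rng = random.Random(1)            # fixed seed so the run is repeatable

mu, sigma, n = 70.0, 3.0, 36      # hypothetical population and sample size
z = 1.65                          # z value for alpha = 0.10
margin = z * sigma / sqrt(n)

trials = 2000
hits = 0
for _ in range(trials):
    # Draw one sample and compute its mean.
    xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
    # Does this sample's interval contain the true mean?
    if xbar - margin <= mu <= xbar + margin:
        hits += 1

coverage = hits / trials
print(f"share of intervals containing mu: {coverage:.3f}")
```

The printed share should land close to 0.90. Any single interval either contains μ or it does not; the 90% describes the long-run behavior of the procedure.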
When σ is Known: Example 1
• Discount Sounds has 260 retail outlets throughout the United States. The firm is evaluating a potential location for a new outlet, based in part on the mean annual income of the individuals in the marketing area of the new location.
• A sample of size n = 36 was taken, and the sample mean income is $41,100. The population is not believed to be highly skewed. The population standard deviation is known to be $4,500, and the confidence coefficient to be used in the interval estimate is 0.95.
When σ is Known: Example 1
• First, find the margin of error given a confidence coefficient of 0.95, where
  • z_{α/2} = 1.96
  • n = 36
  • σ = $4,500

$$z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} = 1.96 \cdot \frac{4{,}500}{\sqrt{36}} = \$1{,}470$$
When σ is Known: Example 1
• Using the calculated margin of error, we can construct the interval estimate of μ:

$$\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$

• $41,100 ± $1,470
• ($39,630, $42,570)
• We can say that we are 95% confident that the interval contains the true population mean, μ.
When σ is Known: Example 1
• We can also do this for varying levels of confidence:

Confidence Level   Margin of Error   Interval Estimate
90%                $1,237.50         ($39,862.50, $42,337.50)
95%                $1,470            ($39,630, $42,570)
99%                $1,935            ($39,165, $43,035)

• Notice that in order to have a higher degree of confidence, the margin of error, and thus the width of the confidence interval, must be larger.
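The three rows above can be recomputed from the Example 1 inputs. This sketch uses the two-decimal z values from the slides (1.65, 1.96, 2.58); the variable names are illustrative.

```python
from math import sqrt

# Example 1 inputs from the slides.
xbar, sigma, n = 41_100, 4_500, 36

# z values as rounded on the slides.
z_table = {"90%": 1.65, "95%": 1.96, "99%": 2.58}

results = {}
for level, z in z_table.items():
    margin = z * sigma / sqrt(n)
    results[level] = (margin, xbar - margin, xbar + margin)
    print(f"{level}: margin = {margin:,.2f}, "
          f"interval = ({xbar - margin:,.2f}, {xbar + margin:,.2f})")
```

Only z changes between rows, so the growth in the margin of error comes entirely from asking for more confidence.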
[Figure: Confidence Intervals: Graphically]
When σ is Known: Example 2
• A researcher wishes to estimate the mean number of days it takes an automobile dealer to sell a Kia Forte.
• A random sample of 50 cars had a mean time on the dealer’s lot of 54 days.
• Assume the population standard deviation to be 6.0 days.
• Find the best point estimate of the population mean and the 95% confidence interval of the population mean.
When σ is Known: Example 2
x̄ = 54, z_{α/2} = 1.96, σ = 6, n = 50
• The best point estimate of the population mean is the sample mean, x̄ = 54 days.

$$54 \pm 1.96 \cdot \frac{6}{\sqrt{50}}$$

• 54 ± 1.7
• The confidence interval is (52.3, 55.7).
• With 95% confidence, we can say that 52.3 < μ < 55.7.
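As a quick check of the arithmetic in this example, the sketch below recomputes the margin of error and the interval limits from the stated inputs, using the slide's z value of 1.96.

```python
from math import sqrt

# Example 2 inputs from the slides.
xbar, sigma, n, z = 54, 6.0, 50, 1.96

margin = z * sigma / sqrt(n)
lower, upper = xbar - margin, xbar + margin

print(f"margin of error = {margin:.3f}")
print(f"interval = ({lower:.1f}, {upper:.1f})")
```

The margin is about 1.66 before rounding, so to one decimal place the limits are 52.3 and 55.7.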
Interval Estimate of a Population Mean
when σ is Known
• Sometimes confidence coefficients other than 90%, 95%, and 99% are used.
• We may need to calculate other values of z_{α/2}.
• The value of α represents the total area in both tails of the distribution.
• α is found by subtracting the desired confidence level from 1.
• For example, if we wanted a confidence level of 98%, we take:
  α = 1 − 0.98 = 0.02
• Then, we find α/2 = 0.01.
Interval Estimate of a Population Mean
when σ is Known
• α/2 = 0.01
• Now subtract this value from 1 to get the corresponding cumulative probability:
  1 − 0.01 = 0.99
• The z-score whose table area is closest to 0.99 is 2.33.
• So, the interval would be:

$$\bar{x} \pm 2.33 \cdot \frac{\sigma}{\sqrt{n}}$$
Sample Size
• The size of the sample is very important in statistical estimation.
• How large must the sample be to obtain an accurate estimate?
• The answer depends on three main factors:
  • The margin of error
  • The population standard deviation
  • The degree of confidence
Determining Sample Size
• If a desired margin of error is selected prior to sampling, then the sample size necessary to satisfy that margin of error can be determined by rearranging the equation for the margin of error.
Determining Sample Size
• Let E = the desired margin of error:

$$E = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$

• The necessary sample size for a given margin of error:

$$n = \left( \frac{z_{\alpha/2} \cdot \sigma}{E} \right)^2$$
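The sample size formula can be wrapped in a small helper. This is a sketch only: the function name `required_sample_size` and the inputs in the call are hypothetical. The result is rounded up with `ceil`, since rounding down would leave the margin of error slightly larger than E.

```python
from math import ceil

def required_sample_size(z, sigma, margin_of_error):
    """Smallest whole n for which z * sigma / sqrt(n) <= margin_of_error."""
    raw_n = (z * sigma / margin_of_error) ** 2
    return ceil(raw_n)   # always round up to the next whole observation

# Hypothetical inputs (not from the slides).
n_needed = required_sample_size(z=1.96, sigma=10.0, margin_of_error=2.0)
print(n_needed)
```

With these inputs the unrounded value is about 96.04, so the helper returns 97.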
Example 1: Determining Sample Size
• Discount Sounds is evaluating a potential location for a new retail outlet based in part on the mean annual income of the individuals in the marketing area of the new location.
• Suppose that the Discount Sounds management team wants an estimate of the population mean such that there is a 0.95 probability that the sampling error is $500 or less.
• How large a sample is needed to meet the required precision?
Example 1: Determining Sample Size
• We want to find n to get a margin of error equal to $500:

$$500 = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$

• At 95% confidence, z_{α/2} = z_{0.025} = 1.96. Recall σ = $4,500.

$$n = \left( \frac{z_{\alpha/2} \cdot \sigma}{E} \right)^2 = \left( \frac{1.96 \cdot 4{,}500}{500} \right)^2 \approx 311.17$$

• Rounding up, a sample size of 312 is needed to reach the desired precision.
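The calculation can be repeated in code, which also makes it easy to see how n responds to E. In the sketch below, the $500 target is from the example, while the $250 target is a hypothetical tighter requirement added for comparison.

```python
from math import ceil

z, sigma = 1.96, 4_500   # values from the Discount Sounds example

sizes = {}
for E in (500, 250):     # 250 is a hypothetical tighter target
    raw_n = (z * sigma / E) ** 2
    sizes[E] = ceil(raw_n)
    print(f"E = ${E}: unrounded n = {raw_n:.2f}, required n = {sizes[E]}")
```

Because E is squared in the denominator, halving the margin of error roughly quadruples the required sample size.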
Example 2: Determining Sample Size
• A sociologist wishes to estimate the average number of automobile thefts per day in a large city to within 2 automobiles.
• He wishes to be 99% confident, and from a previous study the standard deviation was found to be 4.2.
• How many days should he select to survey?
Example 2: Determining Sample Size
• First, what key information do we know?
  • α = 0.01
  • z_{α/2} = 2.58
  • E = 2
  • σ = 4.2

$$n = \left( \frac{z_{\alpha/2} \cdot \sigma}{E} \right)^2 = \left( \frac{2.58 \cdot 4.2}{2} \right)^2 \approx 29.35$$

• Rounding up, the researcher should sample 30 or more days to achieve the desired accuracy.
Determining Sample Size
• The necessary sample size equation requires a value for the population standard deviation, σ.
• If σ is unknown, a preliminary or planning value for σ can be used in the equation:
  • Use the estimate of the population standard deviation computed in a previous study.
  • Use a pilot study and use the sample standard deviation from that study.
  • Use judgement or a best guess for the value of σ.
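Since the planning value for σ is only an estimate, it can help to see how sensitive n is to it. The sketch below reuses the auto theft example (z = 2.58, E = 2, σ = 4.2) and adds two hypothetical alternative planning values, 3.5 and 5.0, purely for comparison.

```python
from math import ceil

z, E = 2.58, 2                       # 99% confidence, within 2 thefts
planning_sigmas = [3.5, 4.2, 5.0]    # 3.5 and 5.0 are hypothetical alternatives

sizes = {}
for sigma in planning_sigmas:
    # Required sample size for this planning value, rounded up.
    sizes[sigma] = ceil((z * sigma / E) ** 2)
    print(f"planning sigma = {sigma}: required n = {sizes[sigma]}")
```

A larger planning value gives a larger required sample, so a cautious (higher) guess for σ errs on the side of collecting enough data.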