Point and Interval Estimates
Hope Sabao(PhD)
University of Lusaka
9th July, 2021
Estimation
Estimation is a procedure by which a numerical value or values are
assigned to a population parameter based on the information
collected from a sample.
Definition
Estimation: The assignment of value(s) to a population parameter
based on a value of the corresponding sample statistic is called
estimation.
In inferential statistics, µ is called the true population mean and p
is called the true population proportion. There are many other
population parameters, such as the median, mode, variance, and
standard deviation.
Example
The Central Statistics Office in Zambia may want to find the mean
housing expenditure per month incurred by households. Finding this
mean is an example of estimating the true population mean µ.
If we can conduct a census (a survey that includes the entire
population) each time we want to find the value of a population
parameter, then the estimation procedures explained in this and
subsequent chapters are not needed. For example, if the central
statistics office can contact every household in Zambia to find
the mean housing expenditure incurred by households, the result of
the survey (which will actually be a census) will give the value of µ
and the procedures learned in this chapter will not be needed.
However, it is too expensive, very time consuming, or virtually
impossible to contact every member of a population to collect
information to find the true value of a population parameter.
Therefore, we usually take a sample from the population and
calculate the value of the appropriate sample statistic. Then we
assign a value or values to the corresponding population parameter
based on the value of the sample statistic. This chapter (and
subsequent chapters) explains how to assign values to population
parameters based on the values of sample statistics.
To estimate the mean housing expenditure per month incurred by
all households in Zambia, the Census Bureau will take a sample
of certain households, collect the information on the housing
expenditure that each of these households incurs per month, and
compute the value of the sample mean, x̄. Based on this value of x̄,
the bureau will then assign a value to the population mean, µ.
Definition
Estimate and Estimator: The value(s) assigned to a population
parameter based on the value of a sample statistic is called an
estimate. The sample statistic used to estimate a population
parameter is called an estimator.
The estimation procedure involves the following steps.
1. Select a sample.
2. Collect the required information from the members of the
sample.
3. Calculate the value of the sample statistic.
4. Assign value(s) to the corresponding population parameter.
Point and Interval Estimates
A Point Estimate
If we select a sample and compute the value of the sample statistic
for this sample, then this value gives the point estimate of the
corresponding population parameter.
Definition
Point Estimate: The value of a sample statistic that is used to
estimate a population parameter is called a point estimate.
Thus, the value of the sample mean, x̄, computed from a sample is
a point estimate of the corresponding population mean, µ. For the
example mentioned earlier, suppose the Census Bureau takes a
sample of 10,000 households and determines that the mean
housing expenditure per month, x̄, for this sample is $1970. Then,
using x̄ as a point estimate of µ, the Bureau can state that the mean
housing expenditure per month, µ, for all households is about
$1970. Thus,
Point estimate of a population parameter = Value of the
corresponding sample statistic
Each sample selected from a population is expected to yield a
different value of the sample statistic. Thus, the value assigned to
a population mean, µ, based on a point estimate depends on which
of the samples is drawn. Consequently, the point estimate assigns
a value to µ that almost always differs from the true value of the
population mean.
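To see why each sample yields a different point estimate, the sketch below draws several samples from a computer-generated population and computes x̄ for each one (steps 1 to 3 of the estimation procedure). The population (normal, mean 2000, standard deviation 600) and the helper name `point_estimate` are hypothetical choices for illustration only; they are not survey data.

```python
import random
import statistics

# Hypothetical population of 100,000 simulated monthly housing expenditures.
# These numbers are made up for illustration; they are not real survey data.
rng = random.Random(42)
population = [rng.gauss(2000, 600) for _ in range(100_000)]
mu = statistics.fmean(population)  # true population mean (unknown in practice)

def point_estimate(pop, n, rng):
    """Select a random sample of size n and return its sample mean x̄."""
    sample = rng.sample(pop, n)
    return statistics.fmean(sample)

# Each new sample assigns a slightly different value to µ.
estimates = [point_estimate(population, 100, rng) for _ in range(5)]
print(f"true mean µ = {mu:.2f}")
for i, xbar in enumerate(estimates, start=1):
    print(f"sample {i}: x̄ = {xbar:.2f}, error = {xbar - mu:+.2f}")
```

None of the five estimates equals µ exactly, which is the point made in the paragraph above.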
An Interval Estimate
In the case of interval estimation, instead of assigning a single
value to a population parameter, an interval is constructed around
the point estimate, and then a probabilistic statement that this
interval contains the corresponding population parameter is made.
Definition
Interval Estimation: In interval estimation, an interval is
constructed around the point estimate, and it is stated that this
interval is likely to contain the corresponding population parameter.
For the example about the mean housing expenditure, instead of
saying that the mean housing expenditure per month for all
households is $1970, we may obtain an interval by subtracting a
number from $1970 and adding the same number to $1970. Then
we state that this interval contains the population mean, µ. For
purposes of illustration, suppose we subtract $340 from $1970 and
add $340 to $1970. Consequently, we obtain the interval ($1970 –
$340) to ($1970 + $340), or $1630 to $2310.
Then we state that the interval $1630 to $2310 is likely to contain
the population mean, µ, and that the mean housing expenditure
per month for all households in Zambia is between $1630 and
$2310. This procedure is called interval estimation. The value
$1630 is called the lower limit of the interval, and $2310 is called
the upper limit of the interval. The number we add to and
subtract from the point estimate is called the margin of error.
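The arithmetic of an interval estimate is simply the point estimate minus and plus the margin of error. A minimal sketch; the function name `interval_estimate` is an illustrative choice:

```python
from typing import Tuple

def interval_estimate(point_estimate: float, margin_of_error: float) -> Tuple[float, float]:
    """Return (lower limit, upper limit) = point estimate minus/plus the margin of error."""
    return point_estimate - margin_of_error, point_estimate + margin_of_error

# Housing example: point estimate $1970, margin of error $340.
lower, upper = interval_estimate(1970, 340)
print(f"${lower} to ${upper}")  # $1630 to $2310
```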
The question arises: What number should we subtract from and
add to a point estimate to obtain an interval estimate? The
answer to this question depends on two considerations:
1. The standard deviation σx̄ of the sample mean, x̄
2. The level of confidence to be attached to the interval
First, the larger the standard deviation of x̄, the greater the
number subtracted from and added to the point estimate. This is
intuitive: if x̄ can take values over a wider range, then the
interval constructed around x̄ must be wider to include µ.
Second, the quantity subtracted and added must be larger if we
want to have a higher confidence in our interval. We always attach
a probabilistic statement to the interval estimation. This
probabilistic statement is given by the confidence level. An interval
constructed based on this confidence level is called a confidence
interval.
Definition
Confidence Level and Confidence Interval: Each interval is
constructed with regard to a given confidence level and is called a
confidence interval. The confidence interval is given as
Point estimate ± Margin of error
The confidence level associated with a confidence interval states
how much confidence we have that this interval contains the true
population parameter. The confidence level is denoted by
(1 − α)100%,
where α is the Greek letter alpha. When expressed as probability, it
is called the confidence coefficient and is denoted by 1 − α.
Although any value of the confidence level can be chosen to
construct a confidence interval, the more common values are 90%,
95%, and 99%. The corresponding confidence coefficients are .90,
.95, and .99, respectively.
Estimation of a Population Mean: σ Known
We now look at how to construct a confidence interval for the
population mean when the population standard deviation is known.
Here, there are three possible cases, as follows:
Case I. If the following three conditions are fulfilled:
1. The population standard deviation is known
2. The sample size is small (i.e. n < 30)
3. The population from which the sample is selected is normally
distributed,
then we use the normal distribution to make the confidence
interval for µ.
Case II. If the following two conditions are fulfilled:
1. The population standard deviation σ is known
2. The sample size is large (n ≥ 30)
then, again, we use the normal distribution to make the confidence
interval for µ.
Case III. If the following three conditions are fulfilled:
1. The population standard deviation σ is known.
2. The sample size is small (i.e., n < 30)
3. The population from which the sample is selected is not
normally distributed (or its distribution is unknown),
then we use a nonparametric method to make the confidence
interval for µ. Such methods are beyond the scope of this course.
Confidence Interval for µ
The (1 − α)100% confidence interval for µ under Cases I and II is
$$\bar{x} \pm z\,\sigma_{\bar{x}}$$
where
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
The value of z used here is obtained from the standard normal
distribution table. The quantity zσx̄ in the confidence interval
formula is called the margin of error and is denoted by E.
Definition
The margin of error for the estimate for µ, denoted by E, is the
quantity that is subtracted from and added to the value of x̄ to
obtain a confidence interval for µ. Thus
E = zσx̄
The value of z in the confidence interval formula is obtained from
the standard normal distribution table for the given confidence
level. To illustrate, suppose we want to construct a 95%
confidence interval for µ. A 95% confidence level means that the
total area under the normal curve for x̄ between two points that lie
at the same distance from µ, one on each side, is 95%, or 0.95, as
shown in the figure below:
Note that we have denoted these two points by z1 and z2. To find
the value of z for a 95% confidence level, we first find the areas to
the left of these two points, z1 and z2. Then we find the z values
for these two areas from the normal distribution table. Note that
these two values of z will be the same but with opposite signs. To
find these values of z, we perform the following two steps:
Step 1
The first step is to find the areas to the left of z1 and z2,
respectively. Note that the area between z1 and z2 is denoted by
1 − α. Hence, the total area in the two tails is α because the total
area under the curve is 1.0. Therefore, the area in each tail is α/2.
In our example, 1 − α = 0.95. Hence, the total area in both tails is
α = 1 − 0.95 = 0.05. Consequently, the area in each tail is
$$\frac{\alpha}{2} = \frac{0.05}{2} = 0.025$$
Then, the area to the left of z1 is 0.0250 and
the area to the left of z2 is 0.0250 + 0.95 = 0.9750.
Step 2
Now find the z values from the standard normal table such that the
areas to the left of z1 and z2 are .0250 and .9750, respectively.
These z values are −1.96 and 1.96, respectively.
Thus, for a confidence level of 95%, we will use z = 1.96 in the
confidence interval formula.
The following table lists the z values for some of the most
commonly used confidence levels. Note that we always use the
positive value of z in the formula.
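The two steps above (convert the confidence level to an area to the left of z2, then look up z) can be reproduced with Python's standard library, whose `statistics.NormalDist().inv_cdf` inverts the standard normal CDF. A sketch; the function name `z_for_confidence` and the list of levels are illustrative choices:

```python
from statistics import NormalDist

def z_for_confidence(confidence: float) -> float:
    """Positive z such that the area between -z and z under N(0, 1) equals `confidence`."""
    alpha = 1 - confidence
    # Step 1: the area to the left of z2 is (1 - alpha) + alpha/2 = 1 - alpha/2.
    # Step 2: invert the standard normal CDF at that area.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}: z = {z_for_confidence(level):.3f}")
```

Rounded to two decimals these give 1.96 for 95% and 2.58 for 99%. For 90% the unrounded value is about 1.645, which the example that follows rounds to 1.65.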
Example
A publishing company has just published a new college textbook.
Before the company decides the price at which to sell this
textbook, it wants to know the average price of all such textbooks
in the market. The research department at the company took a
sample of 25 comparable textbooks and collected information on
their prices. This information produced a mean price of $145 for
this sample. It is known that the standard deviation of the prices
of all such textbooks is $35 and the population of such prices is
normal.
(a) What is the point estimate of the mean price of all such
college textbooks?
(b) Construct a 90% confidence interval for the mean price of all
such college textbooks.
Solution
Here, σ is known. Although n < 30, the population is normally
distributed. Hence, we can use the normal distribution. From the
given information,
n = 25,
x̄ = $145 and σ = $35
The standard deviation of x̄ is given by
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{35}{\sqrt{25}} = \$7.00$$
(a) The point estimate of the mean price of all such college
textbooks is $145; that is,
Point estimate of µ = x̄ = $145
(b) The confidence level is 90%, or .90. First we find the z value
for a 90% confidence level. Here, the area in each tail of the
normal distribution curve is α/2 = (1 − 0.90)/2 = 0.05. In the
standard normal table, look for the areas .0500 and .9500 and find the
corresponding values of z. These values are z = −1.65 and
z = 1.65. Next, we substitute all the values in the confidence
interval formula for µ. The 90% confidence interval for µ is
x̄ ± zσx̄
= 145 ± 1.65(7.00) = 145 ± 11.55
= (145 − 11.55) to (145 + 11.55)
= $133.45 to $156.55
Thus, we are 90% confident that the mean price of all such college
textbooks is between $133.45 and $156.55.
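The textbook calculation can be checked with a short function implementing x̄ ± z·σ/√n. This is a sketch; the name `z_interval` is hypothetical, and z is passed in as the rounded table value 1.65 used in the solution (the unrounded value, about 1.645, gives a marginally narrower interval).

```python
import math

def z_interval(xbar: float, sigma: float, n: int, z: float):
    """Confidence interval x̄ ± z·σ/√n for µ when σ is known (Cases I and II)."""
    sigma_xbar = sigma / math.sqrt(n)  # standard deviation of x̄
    margin = z * sigma_xbar            # margin of error E
    return xbar - margin, xbar + margin

# Textbook example: n = 25, x̄ = $145, σ = $35, z = 1.65 for 90% confidence.
low, high = z_interval(145, 35, 25, 1.65)
print(f"${low:.2f} to ${high:.2f}")
```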
Determining the Sample Size for the Estimation of Mean
One reason we usually conduct a sample survey and not a census is
that almost always we have limited resources at our disposal. In
light of this, if a smaller sample can serve our purpose, then we
will be wasting our resources by taking a larger sample. For
instance, suppose we want to estimate the mean life of a certain
auto battery. If a sample of 40 batteries can give us the confidence
interval we are looking for, then we will be wasting money and
time if we take a sample of a much larger size, say, 500 batteries.
In such cases, if we know the confidence level and the width of the
confidence interval that we want, then we can find the
(approximate) size of the sample that will produce the required
result.
From the earlier discussion, we learned that E = z · σx̄ is called the
margin of error of estimate for µ. As we know, the standard
deviation of the sample mean is equal to σ/√n. Therefore, we can
write the margin of error of estimate for µ as:
$$E = z \cdot \frac{\sigma}{\sqrt{n}}$$
Suppose we predetermine the size of the margin of error, E, and
want to find the size of the sample that will yield this margin of
error. From the above expression, the following formula is obtained
that determines the required sample size n.
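The algebra behind that formula takes two steps: isolate √n, then square both sides.

$$E = z\frac{\sigma}{\sqrt{n}} \;\Longrightarrow\; \sqrt{n} = \frac{z\sigma}{E} \;\Longrightarrow\; n = \frac{z^2\sigma^2}{E^2}$$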
Determining the Sample Size for the Estimation of µ
Given the confidence level and the standard deviation of the
population, the sample size that will produce a predetermined
margin of error E of the confidence interval estimate of µ is
$$n = \frac{z^2\sigma^2}{E^2}$$
Example
An alumni association wants to estimate the mean debt of this
year’s college graduates. It is known that the population standard
deviation of the debts of this year’s college graduates is $11,800.
How large a sample should be selected so that the estimate with a
99% confidence level is within $800 of the population mean?
Solution
The alumni association wants the 99% confidence interval for the
mean debt of this year’s college graduates to be
x̄ ± $800
Hence, the maximum size of the margin of error of estimate is to
be $800; that is, E = $800.
The value of z for a 99% confidence level is 2.58. The value of σ is
given to be $11,800. Therefore, substituting all values in the
formula and simplifying, we obtain
$$n = \frac{z^2\sigma^2}{E^2} = \frac{(2.58)^2(11{,}800)^2}{800^2} = 1448.18 \approx 1449$$
Thus, the required sample size is 1449. If the alumni association
takes a sample of 1449 of this year’s college graduates, computes
the mean debt for this sample, and then makes a 99% confidence
interval around this sample mean, the margin of error of estimate
will be approximately $800. Note that we have rounded the final
answer for the sample size to the next higher integer. This is
always the case when determining the sample size.
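The sample-size rule, including rounding up to the next integer, fits in one line of code. A minimal sketch; the function name `sample_size_for_mean` is hypothetical:

```python
import math

def sample_size_for_mean(z: float, sigma: float, E: float) -> int:
    """n = z²σ²/E², always rounded UP so the margin of error does not exceed E."""
    return math.ceil((z * sigma / E) ** 2)

# Alumni example: z = 2.58 (99%), σ = $11,800, E = $800.
n = sample_size_for_mean(2.58, 11_800, 800)
print(n)  # 1449
```

Rounding up (`math.ceil`) rather than to the nearest integer is the design choice that matters: rounding 1448.18 down would give a sample slightly too small to guarantee the $800 margin.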
Estimation of a Population Mean: σ Not Known
This section explains how to construct a confidence interval for the
population mean µ when the population standard deviation σ is
not known. Here, again, there are three possible cases:
Case I. If the following three conditions are fulfilled:
1. The population standard deviation σ is not known
2. The sample size is small (i.e., n < 30)
3. The population from which the sample is selected is normally
distributed,
then we use the t distribution to make the confidence interval for µ.
Case II. If the following two conditions are fulfilled:
1. The population standard deviation σ is not known
2. The sample size is large (n ≥ 30)
then again we use the t distribution to make the confidence
interval for µ.
Case III. If the following three conditions are fulfilled:
1. The population standard deviation σ is not known.
2. The sample size is small (i.e., n < 30)
3. The population from which the sample is selected is not
normally distributed (or its distribution is unknown),
then we use a nonparametric method to make the confidence
interval for µ. Such procedures are beyond the scope of this course.
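Taken together, the three cases for σ known and the three cases for σ not known form a small decision rule. The sketch below encodes it; the function name and the returned labels are illustrative, and n ≥ 30 is the rule-of-thumb cutoff used in these notes.

```python
def interval_method(sigma_known: bool, n: int, population_normal: bool) -> str:
    """Distribution these notes prescribe for a confidence interval for µ."""
    large_sample = n >= 30
    if large_sample or population_normal:
        # Cases I and II: z if σ is known, t if σ must be estimated by s.
        return "normal (z)" if sigma_known else "t"
    # Case III: small sample from a non-normal (or unknown) population.
    return "nonparametric"

print(interval_method(True, 25, True))    # textbook example: normal (z)
print(interval_method(False, 64, False))  # large sample, σ unknown: t
```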
The t-Distribution
• The t distribution was developed by W. S. Gosset in 1908 and
published under the pseudonym Student. As a result, the t
distribution is also called Student’s t distribution.
• The t distribution is similar to the normal distribution in some
respects. Like the normal distribution curve, the t distribution
curve is symmetric (bell shaped) about the mean and never
meets the horizontal axis.
• The total area under a t distribution curve is 1.0, or 100%.
However, the t distribution curve is flatter than the standard
normal distribution curve.
• In other words, the t distribution curve has a lower height and
a wider spread (or, we can say, a larger standard deviation)
than the standard normal distribution. However, as the
sample size increases, the t distribution approaches the
standard normal distribution. The units of a t distribution are
denoted by t.
The shape of a particular t distribution curve depends on the
number of degrees of freedom (df). The number of degrees of
freedom for a t distribution is equal to the sample size minus one,
that is,
df = n − 1
Definition
The t Distribution: The t distribution is a specific type of
bell-shaped distribution with a lower height and a wider spread
than the standard normal distribution. As the sample size becomes
larger, the t distribution approaches the standard normal
distribution. The t distribution has only one parameter, called the
degrees of freedom (df). The mean of the t distribution is equal
to 0, and its standard deviation (for df > 2) is
$$\sqrt{\frac{df}{df - 2}}$$
The following figure shows the standard normal distribution and
the t distribution for 9 degrees of freedom. The standard deviation
of the standard normal distribution is 1.0, and the standard
deviation of the t distribution is
$$\sqrt{\frac{9}{9 - 2}} = 1.134$$
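The standard-deviation formula can be evaluated for a few values of df to show the t distribution tightening toward the standard normal (standard deviation 1.0) as df grows. A sketch; `t_standard_deviation` is a hypothetical name:

```python
import math

def t_standard_deviation(df: int) -> float:
    """Standard deviation √(df/(df − 2)) of the t distribution, defined for df > 2."""
    if df <= 2:
        raise ValueError("the t distribution has a finite standard deviation only for df > 2")
    return math.sqrt(df / (df - 2))

for df in (3, 9, 30, 100):
    print(df, round(t_standard_deviation(df), 3))
```

For df = 9 this reproduces the 1.134 above; by df = 100 the value is already close to 1.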
Meaning of Degrees of Freedom
• The number of degrees of freedom for a t distribution for the
purpose of this chapter is n − 1.
• The number of degrees of freedom is defined as the number of
observations that can be chosen freely.
• As an example, suppose we know that the mean of four values
is 20. Consequently, the sum of these four values is
20(4) = 80.
• Now, how many values out of four can we choose freely so
that the sum of these four values is 80?
• The answer is that we can freely choose 4 − 1 = 3. Suppose
we choose 27, 8, and 19 as the three values. Given these
three values and the information that the mean of the four
values is 20, the fourth value is 80-27-8-19=26. Thus, once
we have chosen three values, the fourth value is automatically
determined. Consequently, the number of degrees of freedom
for this example is
df = 4 − 1 = 3
We subtract 1 from n because we lose 1 degree of freedom to
calculate the mean.
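The degrees-of-freedom argument can be expressed directly: once the mean of `count` values is fixed, choosing `count − 1` of them determines the last one. The helper name `last_value` is hypothetical.

```python
from typing import Sequence

def last_value(mean: float, chosen: Sequence[float], count: int) -> float:
    """Given the mean of `count` values and count - 1 freely chosen values, return the last value."""
    if len(chosen) != count - 1:
        raise ValueError("exactly count - 1 values can be chosen freely")
    return mean * count - sum(chosen)

print(last_value(20, [27, 8, 19], 4))  # 26
```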
Example
Find the value of t for 16 degrees of freedom and .05 area in the
right tail of a t distribution curve.
Solution
• In the t distribution table, we locate 16 in the column of
degrees of freedom (labeled df) and .05 in the row of Area in
the right tail under the t distribution curve at the top of the
table.
• The entry at the intersection of the row of 16 and the column
of .05, which is 1.746, gives the required value of t.
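Python's standard library has no t-distribution quantile function, so in practice one uses the printed table or a library such as SciPy (`scipy.stats.t.ppf(0.95, 16)`). To stay self-contained, the sketch below instead integrates the t density numerically (Simpson's rule) and finds t by bisection. It is a numerical approximation, not how tables are produced, and it assumes a right-tail area between 0 and 0.5 and a t value below 50.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """Area to the left of x, using symmetry about 0 and Simpson's rule on [0, |x|]."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_a = total * h / 3
    return 0.5 + area_0_to_a if x >= 0 else 0.5 - area_0_to_a

def t_right_tail(area: float, df: int) -> float:
    """Value of t with `area` to its right (assumes 0 < area < 0.5), by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > area:
            lo = mid  # too much area to the right: move t further out
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_right_tail(0.05, 16), 3))
```

For 16 df and .05 in the right tail this agrees with the table value 1.746; by symmetry, the value with .05 in the left tail is its negative.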
Table: Determining t for 16 df and .05 area in the right tail.
Figure: The value of t for 16 df and .05 area in the right tail.
Figure: The value of t for 16 df and .05 area in the left tail.
Confidence interval for µ using the t-distribution
The (1 − α)100% confidence interval for µ is
$$\bar{x} \pm t\,s_{\bar{x}}$$
where
$$s_{\bar{x}} = \frac{s}{\sqrt{n}}$$
The value of t is obtained from the t distribution table for n − 1
degrees of freedom and the given confidence level. Here tsx̄ is the
margin of error of the estimate; that is,
E = tsx̄
Example
Sixty-four randomly selected adults who buy books for general
reading were asked how much they usually spend on books per
year. The sample produced a mean of $1450 and a standard
deviation of $300 for such annual expenses. Determine a 99%
confidence interval for the corresponding population mean.
Solution
From the given information,
n = 64,
x̄ = $1450 and s = $300.
and
Confidence level = 99% or 0.99
Here σ is not known, but the sample size is large (n > 30). Hence,
we will use the t distribution to make a confidence interval for µ.
First we calculate the standard deviation of x̄, the number of
degrees of freedom, and the area in each tail of the t distribution.
$$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{300}{\sqrt{64}} = 37.50$$
df = n − 1 = 64 − 1 = 63
$$\text{Area in each tail} = \frac{1 - 0.99}{2} = 0.005$$
From the t distribution table, t = 2.656 for 63 degrees of freedom
and .005 area in the right tail. The 99% confidence interval for µ is
x̄ ± tsx̄ = $1450 ± 2.656(37.50) = $1450 ± $99.60
= $1350.40 to $1549.60
Thus, we can state with 99% confidence that based on this sample
the mean annual expenditure on books by all adults who buy
books for general reading is between $1350.40 and $1549.60.
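The same interval can be checked with a function implementing x̄ ± t·s/√n. A sketch; the name `t_interval` is hypothetical, and t = 2.656 is passed in as read from the t table for 63 df and .005 in the right tail.

```python
import math

def t_interval(xbar: float, s: float, n: int, t: float):
    """Confidence interval x̄ ± t·s/√n for µ when σ is unknown (t for df = n − 1)."""
    s_xbar = s / math.sqrt(n)  # estimated standard deviation of x̄
    margin = t * s_xbar        # margin of error E
    return xbar - margin, xbar + margin

# Book-spending example: n = 64, x̄ = $1450, s = $300, t = 2.656 (99%, df = 63).
low, high = t_interval(1450, 300, 64, 2.656)
print(f"${low:.2f} to ${high:.2f}")
```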