
CHAPTER 2
ESTIMATION AND INFERENCE
1. Sample Statistic
2. The Sampling Distribution of the Sample Mean
   2.1. The Expected Value of 𝑥̅, or the Mean of the Means
   2.2. Variance of the Mean
3. The Normal Sampling Distribution of 𝑥̅
   3.1. Central Limit Theorem
   3.2. The Margin of Sampling Error
4. Properties of Estimators
   4.1. Unbiased Estimators
        4.1.1. Proof that 𝑠² is an unbiased estimator of the population variance σ²
   4.2. Efficient Estimators
5. Confidence Interval (Interval Estimate) for the Population Mean
   5.1. The 𝑡 Distribution
6. Test of Hypothesis for µ
   6.1. The Probability Value
1. Sample Statistic
In the previous discussions of random variables, both discrete and continuous, we have assumed that we
have exact information about the probability distribution or probability density function of the random
variable. In particular, we have assumed that we have exact knowledge of the population parameters,
namely, the mean (expected value) µ and the variance σ².
In practice, other than for random variables whose values can be determined through random experiments
that can be repeated under identical conditions, we do not know the exact probability distribution or density
function of a random variable. Therefore, we do not have an exact knowledge of the population parameters.
The next best alternative to the full knowledge of the population parameters is to estimate their values based
on data obtained through a random sample. The estimators of the two population parameters µ and σ² are,
respectively, 𝑥̅ (the sample mean) and 𝑠² (the sample variance), where,
𝑥̅ = ∑𝑥 ⁄ 𝑛    and    𝑠² = ∑(𝑥 − 𝑥̅)² ⁄ (𝑛 − 1)
The estimators 𝑥̅ and 𝑠² are each a sample statistic. The specific values obtained from the sample data for 𝑥̅ and 𝑠² are called estimates.
2. The Sampling Distribution of the Sample Mean
To obtain an estimate of the population mean we take a single random sample of size 𝑛 from the population.
From the sample data we compute the sample mean as an estimate of µ. The value of the sample mean 𝑥̅
depends upon the random sample selected. Since this value is not known until we take the random sample,
then 𝑥̅ is a random variable. Since 𝑥̅ is a random variable, then it has a probability distribution. The
probability distribution of 𝑥̅ is called the sampling distribution of 𝑥̅ . To explain the sampling distribution,
consider the following simple example.
Suppose we have a population consisting of 𝑁 = 5 elements with the following associated values represented
by 𝑥:
Population Element     𝑥
A                      15
B                      12
C                      9
D                      6
E                      3
First compute the mean and variance of the population:
µ = ∑𝑥⁄𝑁 = 9    and    σ² = ∑(𝑥 − µ)²⁄𝑁 = 18
Next write each of the 𝑥 values in the population on a ball and put them in a bowl. Now select a sample of size
𝑛 = 3 without replacement and compute the sample mean. Even though we are selecting only a sample of
size 3, this sample is one of the 10 possible samples that can be selected. These possible samples are listed
below along with the mean corresponding to each sample.
Sample Elements     Sample Data 𝑥ᵢ     Sample Mean 𝑥̅
A B C               15, 12, 9          12
A B D               15, 12, 6          11
A B E               15, 12, 3          10
A C D               15, 9, 6           10
A C E               15, 9, 3           9
A D E               15, 6, 3           8
B C D               12, 9, 6           9
B C E               12, 9, 3           8
B D E               12, 6, 3           7
C D E               9, 6, 3            6
The following table shows the relative frequency distribution of the sample means. The relative frequencies show that there is a probability associated with each value of the sample mean. This probability distribution is called the sampling distribution of the random variable 𝑥̅.
𝑥̅      𝑓(𝑥̅)
6       0.1
7       0.1
8       0.2
9       0.2
10      0.2
11      0.1
12      0.1
The sampling distribution of 𝑥̅ above implies that when this experiment is conducted many times, 20% of the samples would yield a sample mean of, say, 9, and 10% of the samples would yield a sample mean of, say, 6. As the distribution shows, each sample mean value has its own probability of occurring.
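This sampling distribution is easy to verify by brute-force enumeration. The following Python sketch (an illustration added here, not part of the original text; it uses only the standard library) lists all 10 samples of size 3 and tallies the relative frequency of each sample mean.

from itertools import combinations
from collections import Counter

population = {"A": 15, "B": 12, "C": 9, "D": 6, "E": 3}

# Enumerate all samples of size n = 3 drawn without replacement.
samples = list(combinations(population.values(), 3))

# Tally the sample means and convert counts to relative frequencies.
counts = Counter(sum(s) / 3 for s in samples)
for xbar in sorted(counts):
    print(xbar, counts[xbar] / len(samples))
# Prints: 6.0 0.1, 7.0 0.1, 8.0 0.2, 9.0 0.2, 10.0 0.2, 11.0 0.1, 12.0 0.1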
2.1. The Expected Value of 𝒙̅, or the Mean of the Means
In the previous chapter, you learned that the expected value of a random variable is the mean value of that random variable, obtained as the weighted mean of the values of the random variable, where the weights are the probabilities associated with each value.
E(𝑥) = ∑𝑥𝑓(𝑥)
Now the random variable of interest is the sample mean 𝑥̅ . Thus, the expected value of 𝑥̅ is the “mean of 𝑥̅ ”.
E(𝑥̅ ) = ∑𝑥̅ 𝑓(𝑥̅ )
The following table shows the calculation of the mean or expected value of 𝑥̅ .
𝑥̅      𝑓(𝑥̅)     𝑥̅𝑓(𝑥̅)
6       0.1       0.6
7       0.1       0.7
8       0.2       1.6
9       0.2       1.8
10      0.2       2.0
11      0.1       1.1
12      0.1       1.2
                  9.0

E(𝑥̅) = µ𝑥̅ = ∑𝑥̅𝑓(𝑥̅) = 9
Note that µ𝑥̅ = µ. This is a very important result. This relationship between the mean of the sample means,
E(𝑥̅ ), and the population mean, µ, is the cornerstone of inferential statistics. The relationship can simply be
stated as: “The mean of the means equals the mean.” That is, the expected value of the sample means is equal
to the mean of the parent population:
E(𝑥̅) = E(∑𝑥 ⁄ 𝑛) = E((𝑥₁ + 𝑥₂ + ⋯ + 𝑥ₙ) ⁄ 𝑛)
E(𝑥̅) = (1⁄𝑛) E(𝑥₁ + 𝑥₂ + ⋯ + 𝑥ₙ)
E(𝑥̅) = (1⁄𝑛) [E(𝑥₁) + E(𝑥₂) + ⋯ + E(𝑥ₙ)]
E(𝑥̅) = (1⁄𝑛)(µ + µ + ⋯ + µ) = (1⁄𝑛)(𝑛µ) = µ
Note that since the 𝑥ᵢ are randomly selected from the same population, the expected value of each 𝑥ᵢ is the population mean µ.
2.2. Variance of the Mean
Using var(𝑥̅) to denote the variance of 𝑥̅, we now compute it. Remember that the variance of a random variable is the expected value (the mean) of the squared deviations. Thus,

var(𝑥̅) = E[(𝑥̅ − µ)²] = ∑(𝑥̅ − µ)² 𝑓(𝑥̅)
𝑥̅      𝑓(𝑥̅)     (𝑥̅ − µ)²     (𝑥̅ − µ)²𝑓(𝑥̅)
6       0.1       9             0.9
7       0.1       4             0.4
8       0.2       1             0.2
9       0.2       0             0.0
10      0.2       1             0.2
11      0.1       4             0.4
12      0.1       9             0.9
                                3.0

var(𝑥̅) = ∑(𝑥̅ − µ)² 𝑓(𝑥̅) = 3
The variance of 𝑥̅ is not equal to the variance of 𝑥 (the variance of the parent population); that is, var(𝑥̅) ≠ σ². However, there is a definite relationship between the two variances, as shown by the following formula:
var(𝑥̅) = (σ²⁄𝑛) ∙ (𝑁 − 𝑛)⁄(𝑁 − 1)

var(𝑥̅) = (18⁄3) ∙ (5 − 3)⁄(5 − 1) = 3
The term (𝑁 − 𝑛)⁄(𝑁 − 1) in the formula is called the finite population correction factor. This factor approaches 1, and therefore drops out, for infinite populations. Thus,

var(𝑥̅) = σ²⁄𝑛
The proof of this relationship follows:
var(𝑥̅) = var(∑𝑥 ⁄ 𝑛) = var((𝑥₁ + 𝑥₂ + ⋯ + 𝑥ₙ) ⁄ 𝑛)
var(𝑥̅) = (1⁄𝑛²) var(𝑥₁ + 𝑥₂ + ⋯ + 𝑥ₙ)
var(𝑥̅) = (1⁄𝑛²) [var(𝑥₁) + var(𝑥₂) + ⋯ + var(𝑥ₙ)]
var(𝑥̅) = (1⁄𝑛²)(σ² + σ² + ⋯ + σ²)
var(𝑥̅) = (1⁄𝑛²)(𝑛σ²) = σ²⁄𝑛
Note that since 𝑥₁, 𝑥₂, ⋯, 𝑥ₙ are independent random selections from the same population,

var(𝑥₁) = var(𝑥₂) = ⋯ = var(𝑥ₙ) = σ²
The square root of the variance of 𝑥̅ is the standard error of the mean, denoted by se(𝑥̅ ).
se(𝑥̅) = σ ⁄ √𝑛
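Continuing the five-element example, the two results E(𝑥̅) = µ and var(𝑥̅) = (σ²⁄𝑛)(𝑁 − 𝑛)⁄(𝑁 − 1) can be checked numerically. A short Python sketch (illustrative only; it reuses the population above):

from itertools import combinations

x = [15, 12, 9, 6, 3]
N, n = len(x), 3

mu = sum(x) / N                               # population mean: 9.0
sigma2 = sum((v - mu) ** 2 for v in x) / N    # population variance: 18.0

# All possible sample means for samples of size 3 without replacement.
means = [sum(s) / n for s in combinations(x, n)]

e_xbar = sum(means) / len(means)                               # E(x-bar) = 9.0 = mu
var_xbar = sum((m - e_xbar) ** 2 for m in means) / len(means)  # var(x-bar) = 3.0

# The finite-population formula gives the same 3.0.
print(e_xbar, var_xbar, sigma2 / n * (N - n) / (N - 1))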
3. The Normal Sampling Distribution of 𝒙̅
Note that the number of samples of size n quickly becomes astronomical. For example, the number of
possible samples of size 𝑛 = 40 selected from a population of size 𝑁 = 1,000 is:
555,974,423,571,664,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000
Since each sample yields its own 𝑥̅ value, the random variable 𝑥̅ takes on an enormous number of values, making 𝑥̅ effectively a continuous random variable with a smooth probability density function. In fact, in most cases, the sampling
distribution of 𝑥̅ is normal or approximately normal. If the parent population from which the sample is taken
is normal, then the sampling distribution of 𝑥̅ is also normal.
[Figure: When the parent population distribution is normal with mean µ and standard deviation σ, the sampling distribution of 𝑥̅ is also normal, with mean µ and standard error σ⁄√𝑛.]
If the parent population is not normal, the sampling distribution of 𝑥̅ approaches the normal distribution, per the central limit theorem, as the sample size 𝑛 increases. A common rule of thumb is that a sample size of at least 𝑛 = 30 gives an approximately normal sampling distribution of 𝑥̅.
[Figure: When the parent population distribution is NOT normal, the sampling distribution of 𝑥̅ is approximately normal, with mean µ and standard error σ⁄√𝑛, provided 𝑛 ≥ 30.]
3.1. Central Limit Theorem
In applied statistical analysis many of the random variables used can be characterized as the sum of a large
number of independent random variables. For example, total daily sales in a store are the result of a number of
sales to individual customers—each of which can be modeled as a random variable. Total investment in the
United States in a month is the sum of individual investments by many independent firms. Thus, if 𝑥₁, 𝑥₂, …, 𝑥ₙ represent the results of individual random events, the observed random variable 𝑥 is the sum of these random variables:
𝑥 = ∑𝑥ᵢ = 𝑥₁ + 𝑥₂ + ⋯ + 𝑥ₙ

Using the properties of expected value shown in the previous chapter,

E(𝑥) = E(𝑥₁ + 𝑥₂ + ⋯ + 𝑥ₙ) = 𝑛µ
var(𝑥) = var(𝑥₁ + 𝑥₂ + ⋯ + 𝑥ₙ) = var(𝑥₁) + var(𝑥₂) + ⋯ + var(𝑥ₙ) = 𝑛σ²
The central limit theorem (CLT) states that, for large 𝑛, the resulting sum 𝑥 = ∑𝑥ᵢ is approximately normally distributed with mean 𝑛µ and standard deviation √𝑛σ:
𝑥~𝑁(𝑛µ, √𝑛𝜎)
Therefore,
𝑧 = (𝑥 − 𝑛µ) ⁄ (√𝑛σ) = (∑𝑥ᵢ − 𝑛µ) ⁄ (√𝑛σ)
is a standard normal random variable. If we divide the numerator and the denominator on the right hand side
by 𝑛, we have:
𝑧 = (𝑥̅ − µ) ⁄ (σ⁄√𝑛)
This implies that
𝑥̅ ~ N(µ, σ⁄√𝑛)
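The practical content of the theorem can be illustrated by simulation. The sketch below (an illustration added here, not from the chapter; the exponential population, sample size, and seed are arbitrary choices) draws repeated samples of size 𝑛 = 40 from a decidedly non-normal exponential population, whose mean and standard deviation are both 1, and standardizes each sample mean; the results behave like standard normal values.

import random
import statistics

random.seed(1)
mu = sigma = 1.0      # the exponential(1) population has mean 1 and std dev 1
n, reps = 40, 20_000

z_scores = []
for _ in range(reps):
    sample = [random.expovariate(1.0) for _ in range(n)]
    xbar = statistics.fmean(sample)
    z_scores.append((xbar - mu) / (sigma / n ** 0.5))

# Expect mean ~ 0, standard deviation ~ 1, and about 95% of the
# standardized means between -1.96 and 1.96.
print(statistics.fmean(z_scores))
print(statistics.stdev(z_scores))
print(sum(abs(z) < 1.96 for z in z_scores) / reps)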
Example 1
The speed on a certain stretch of an interstate highway is normally distributed with a mean of µ = 80 mph and a standard deviation of σ = 5 mph.
a) If a vehicle is randomly clocked, what is the probability that the speed is below 82 mph?
𝑃(𝑥 < 82) = ________
𝑧 = (𝑥 − µ) ⁄ σ = (82 − 80) ⁄ 5 = 0.40
P(𝑧 < 0.40) = 0.6554
Alternatively, using the Excel NORM.DIST(...) function:

=NORM.DIST(82,80,5,1) = 0.6554
[Figure: normal curve with mean µ = 80; the area 0.6554 lies to the left of x = 82.]
b) If a random sample of 𝑛 = 16 vehicles is clocked, what is the probability that the average sample speed is
below 82 mph?
P(𝑥̅ < 82) = ________
Now you have to use the sampling distribution of 𝑥̅ to solve this problem.
𝑥̅ ~ N(µ, 𝑠𝑒(𝑥̅ ) = σ⁄√𝑛)
se(𝑥̅ ) = 5⁄√16 = 1.25
2-Estimation and Inference
7 of 21
𝑧 = (𝑥̅ − µ) ⁄ se(𝑥̅) = (82 − 80) ⁄ 1.25 = 1.60
P(𝑧 < 1.60) = 0.9452

=NORM.DIST(82,80,1.25,1) = 0.9452
[Figure: sampling distribution of 𝑥̅ with mean µ = 80; the area 0.9452 lies to the left of 𝑥̅ = 82.]
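Both parts of Example 1 can also be computed outside Excel. A minimal Python sketch using scipy.stats (assuming that library is available; norm.cdf returns the cumulative left-tail probability):

from scipy.stats import norm

mu, sigma, n = 80, 5, 16

# (a) P(x < 82) for a single vehicle.
print(norm.cdf(82, loc=mu, scale=sigma))             # 0.6554

# (b) P(x-bar < 82) for a sample of n = 16 vehicles;
# the standard error is sigma / sqrt(n) = 1.25.
print(norm.cdf(82, loc=mu, scale=sigma / n ** 0.5))  # 0.9452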
3.2. The Margin of Sampling Error
Note that, like any random variable, the random variable 𝑥̅ consists of a fixed component and a random component. The fixed component is the mean or expected value of 𝑥̅, which is the population mean µ, and the random component is denoted by ϵ:

𝑥̅ = µ + ϵ

Using 𝑧 = (𝑥̅ − µ) ⁄ se(𝑥̅) and solving for 𝑥̅, we have

𝑥̅ = µ + 𝑧 ∙ se(𝑥̅)

The random component of 𝑥̅, then, can be expressed as

ϵ = 𝑧 ∙ se(𝑥̅)
The random component, called the margin of statistical (sampling) error, is a function of 𝑧. Using this relationship between ϵ and 𝑧, we can determine intervals within which the sample mean will fall with an associated probability. For example, suppose we want to find the lower and upper ends of a middle interval (symmetric about the mean) that contains 95% of all possible sample means. Of the remaining 5% of the sample means, 2.5% would exceed the upper boundary value and the other 2.5% would fall below the lower boundary. This 5% represents the probability that 𝑥̅ falls outside the 95% interval and is called the error probability, represented by the symbol α. Generally, given an α value, 1 − α represents the probability that the interval contains the sample mean. Thus, 𝑥̅𝐿 and 𝑥̅𝑈 are the boundaries of the interval that contains 1 − α proportion of all possible sample means. Of the remaining 𝑥̅ values, α⁄2 fall in the right tail (to the right of 𝑥̅𝑈) and α⁄2 in the left tail (to the left of 𝑥̅𝐿).
[Figure: sampling distribution of 𝑥̅ centered at µ, with the middle area 1 − α between 𝑥̅𝐿 = µ − 𝑧α⁄2 se(𝑥̅) and 𝑥̅𝑈 = µ + 𝑧α⁄2 se(𝑥̅), and area α⁄2 in each tail.]
The diagram is a graphic representation of the following probability statement:
P(𝑥̅𝐿 < 𝑥̅ < 𝑥̅𝑈 ) = 1 − 𝛼
P(µ − ϵ < 𝑥̅ < µ + ϵ) = P(µ − 𝑧α⁄2 se(𝑥̅ ) < 𝑥̅ < µ + 𝑧α⁄2 se(𝑥̅ )) = 1 − 𝛼
The 𝑧 score that bounds a tail area of α⁄2 under the standard normal curve is 𝑧α⁄2 . Thus, the margin of error
(MOE) formula is generally written as:
ϵ = 𝑧α⁄2 se(𝑥̅ )
and,
P(µ − 𝑧α⁄2 se(𝑥̅ ) < 𝑥̅ < µ + 𝑧α⁄2 se(𝑥̅ )) = 1 − 𝛼
Example 2
The speed on a certain stretch of an interstate highway is normally distributed with a mean of 80 mph and a standard deviation of 5 mph. A random sample of 𝑛 = 64 vehicles is clocked. Find the 95% margin of error
for the sample mean. In other words, find the middle interval of 𝑥̅ values which contains 95% of all possible
sample means for samples of size 𝑛 = 64.
1 − α = 0.95
α⁄2 = 0.025
𝑧α⁄2 = 𝑧0.025 = 1.96
se(𝑥̅ ) = σ⁄√𝑛 = 5⁄√64 = 0.625
ϵ = 𝑧α⁄2 se(𝑥̅ ) = 1.96(0.625) = 1.225
𝑥̅𝐿 = 80 − 1.225 = 78.775
𝑥̅𝑈 = 80 + 1.225 = 81.225
[Figure: sampling distribution of 𝑥̅ centered at µ = 80; the middle area 0.9500 lies between 78.775 and 81.225.]
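The same margin-of-error calculation can be scripted. A short sketch using scipy.stats (an assumption about the available toolkit; norm.ppf returns the 𝑧 score for a given left-tail area):

from scipy.stats import norm

mu, sigma, n, alpha = 80, 5, 64, 0.05

z = norm.ppf(1 - alpha / 2)   # z for alpha/2 = 0.025: 1.96
se = sigma / n ** 0.5         # standard error = 0.625
moe = z * se                  # margin of error = 1.225
print(mu - moe, mu + moe)     # interval: 78.775 to 81.225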
4. Properties of Estimators
4.1. Unbiased Estimators
Since estimators are random variables that can take on a practically infinite number of values, the probability that a single estimate will exactly equal the population parameter is practically zero. Thus there will always be a deviation between the estimate and the parameter. If the parameter of interest is the population mean µ, then the deviation between the sample mean 𝑥̅ and µ is:
𝑥̅ − µ = ϵ
Although this deviation will never be zero for any single estimate, in repeated sampling it is desirable that the mean or expected value of the deviation be zero; that is, the deviations above and below µ should cancel each other out: E(ϵ) = 0. If this equality holds in the long run, then
E(ϵ) = E(𝑥̅ − µ) = E(𝑥̅ ) − µ = 0
Thus,
E(𝑥̅ ) = µ
If deviations average to zero, then the expected value of 𝑥̅ is equal to the mean of the population. If this is
true, then 𝑥̅ is said to be an unbiased estimator of the population mean. The proof that E(𝑥̅ ) = µ was shown
above in the discussion of the sampling distribution of 𝑥̅ .
4.1.1. Proof that 𝒔² is an unbiased estimator of the population variance 𝛔²
We learned that to compute the variance of the sample you use the formula,
𝑠² = ∑(𝑥 − 𝑥̅)² ⁄ (𝑛 − 1)
The variance is the mean squared deviation of the data from the sample mean. In computing the mean squared deviation for the sample data, why do we divide the sum of squared deviations by 𝑛 − 1 and not by 𝑛?
This has to do with the fact that when computing 𝑠² we are finding the deviations of the random variable 𝑥
from another random variable, that is, 𝑥̅ . Thus, for a sample of size n, the number of random squared
deviations is reduced by 1. To explain, suppose you randomly select three items (𝑛 = 3) from a population
and obtain the following data points: 3, 9, 12. The mean of this sample, another random number, is 𝑥̅ = 8.
Given this mean, the first two squared deviations are (3 − 8)² = 25 and (9 − 8)² = 1. These are the only two random squared deviations. The third squared deviation, (12 − 8)² = 16, is no longer random, because when the mean is 8 the third number must be 12. Thus, you lose one “degree of freedom.”¹ To be unbiased, the mean of the squared deviations is then obtained by using 𝑛 − 1 = 2 degrees of freedom in the denominator.

¹ Note that 𝑥̅ = (1⁄𝑛)∑𝑥 = (1⁄𝑛)(𝑥₁ + 𝑥₂ + ⋯ + 𝑥ₙ). Thus, 𝑥ₙ = 𝑛𝑥̅ − (𝑥₁ + 𝑥₂ + ⋯ + 𝑥ₙ₋₁). This shows that any one of the 𝑛 observations in a sample can be written as a linear combination of 𝑥̅ and the remaining 𝑛 − 1 observations. Therefore, in computing the average of squared deviations as the sample variance, there are only 𝑛 − 1 independent squared deviations.
If we divide the sum of squared deviations by 𝑛, then the sample variance would be smaller, and it would thus underestimate the population variance. In other words, 𝑠² would be a biased estimator of the population variance. The following shows that using 𝑛 in the denominator of the sample variance would make it a biased estimator, and that when the sum is divided by 𝑛 − 1 the bias disappears.
For 𝑠² to be an unbiased estimator of σ², the following must hold:

E(𝑠²) = σ²
In the following proof, it will be shown that if in the sample variance formula the sum of squared deviations is
divided by n,
𝑠² = ∑(𝑥 − 𝑥̅)² ⁄ 𝑛
then
E(𝑠²) = ((𝑛 − 1)⁄𝑛) σ² < σ²
That is, the expected value of the sample variance would be less than the population variance, imparting a downward bias to the estimator. Therefore, dividing the sum of squared deviations of 𝑥, ∑(𝑥 − 𝑥̅)², by 𝑛 would make the resulting sample variance a biased estimator of the population variance. Now the proof:
E(𝑠²) = E[∑(𝑥 − 𝑥̅)² ⁄ 𝑛]
E(𝑠²) = (1⁄𝑛) E[∑(𝑥 − 𝑥̅)²]
Rewrite the sum of squared deviations within the brackets by adding and subtracting µ, as follows:

∑(𝑥 − 𝑥̅)² = ∑(𝑥 − 𝑥̅ + µ − µ)²
∑(𝑥 − 𝑥̅)² = ∑[(𝑥 − µ) − (𝑥̅ − µ)]²
∑(𝑥 − 𝑥̅)² = ∑[(𝑥 − µ)² − 2(𝑥 − µ)(𝑥̅ − µ) + (𝑥̅ − µ)²]
∑(𝑥 − 𝑥̅)² = ∑(𝑥 − µ)² − 2(𝑥̅ − µ)∑(𝑥 − µ) + 𝑛(𝑥̅ − µ)²
∑(𝑥 − 𝑥̅)² = ∑(𝑥 − µ)² − 2𝑛(𝑥̅ − µ)² + 𝑛(𝑥̅ − µ)²
[Note: ∑(𝑥 − µ) = ∑𝑥 − 𝑛µ = 𝑛𝑥̅ − 𝑛µ = 𝑛(𝑥̅ − µ)]
∑(𝑥 − 𝑥̅)² = ∑(𝑥 − µ)² − 𝑛(𝑥̅ − µ)²
Now we can write
E(𝑠²) = (1⁄𝑛) E[∑(𝑥 − µ)² − 𝑛(𝑥̅ − µ)²]
E(𝑠²) = (1⁄𝑛) E[∑(𝑥 − µ)²] − E[(𝑥̅ − µ)²]
E(𝑠²) = (1⁄𝑛) E[(𝑥₁ − µ)² + (𝑥₂ − µ)² + ⋯ + (𝑥ₙ − µ)²] − E[(𝑥̅ − µ)²]
E(𝑠²) = (1⁄𝑛) {E[(𝑥₁ − µ)²] + E[(𝑥₂ − µ)²] + ⋯ + E[(𝑥ₙ − µ)²]} − E[(𝑥̅ − µ)²]
E(𝑠²) = (1⁄𝑛)(σ₁² + σ₂² + ⋯ + σₙ²) − var(𝑥̅)
E(𝑠²) = (1⁄𝑛)(𝑛σ²) − σ²⁄𝑛
Since 𝑥₁, 𝑥₂, …, 𝑥ₙ are random selections from the same population, σ₁² = σ₂² = ⋯ = σₙ² = σ². Also, the variance of 𝑥̅ is var(𝑥̅) = σ²⁄𝑛. Thus,
E(𝑠²) = σ² − σ²⁄𝑛 = ((𝑛 − 1)⁄𝑛) σ²
which is what we set out to prove. For a sample statistic to be an unbiased estimator of a population parameter, the expected value of that sample statistic must equal the population parameter. Therefore, when the sample variance is calculated as
𝑠² = ∑(𝑥 − 𝑥̅)² ⁄ 𝑛
this variance would be a biased estimator of the population variance σ².
If, however, we use 𝑛 − 1 in the denominator of the sample variance formula,
𝑠² = ∑(𝑥 − 𝑥̅)² ⁄ (𝑛 − 1)
the end result of the same process would instead be

E(𝑠²) = (1⁄(𝑛 − 1))(𝑛σ²) − (𝑛⁄(𝑛 − 1))(σ²⁄𝑛) = σ²
Thus, when E(𝑠²) = σ², 𝑠² is said to be an unbiased estimator of σ².
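The algebraic result can also be seen empirically: dividing by 𝑛 systematically underestimates σ², while dividing by 𝑛 − 1 does not. A rough Python simulation sketch (the normal population, sample size, and seed are arbitrary illustrative choices):

import random

random.seed(1)
mu, sigma, n, reps = 100, 20, 5, 100_000

sum_biased = sum_unbiased = 0.0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    ss = sum((x - xbar) ** 2 for x in sample)
    sum_biased += ss / n           # denominator n
    sum_unbiased += ss / (n - 1)   # denominator n - 1

# Expect roughly (n-1)/n * sigma^2 = 320 for the n version
# and sigma^2 = 400 for the n - 1 version.
print(sum_biased / reps, sum_unbiased / reps)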
4.2. Efficient Estimators
In many situations different unbiased estimators of the population parameter can be obtained. However, the estimator with the smallest variance is clearly preferred, since it provides the smallest possible margin of error in the estimation process. The smaller the variance, the more closely the values of the sample statistic (the estimator) cluster around the population parameter. The unbiased estimator with the smallest variance is called the most efficient estimator.
5. Confidence Interval (Interval Estimate) for the Population Mean
In the previous discussions, to establish the theory of sampling distribution, we assumed that the population
mean µ and standard deviation σ are known. It was just explained that for samples of size 𝑛, 1 − α proportion
of all sample means fall within the margin of error of 𝜖 = 𝑧α⁄2 σ⁄√𝑛 from the population mean.
In practice, the whole purpose of inferential statistics is to find an estimate of the unknown population parameter. Obtaining a single sample will provide a point estimate of the population parameter. But this
point estimate gives us very little information about the precision of our estimate. We know that the number
of samples that could be selected, and hence the number of sample means calculated, is infinite, and these 𝑥̅
values are normally distributed about the population mean. Therefore, a point estimate would not tell us how
close the mean computed from a single sample is to the population mean.
An interval estimate provides a range of values, which allows us to state with a known level of confidence that
the population mean falls within that interval. This interval estimate for the population mean is obtained as
follows. It was explained above that,
𝑃 (µ − 𝑧α⁄2 se(𝑥̅ ) < 𝑥̅ < µ + 𝑧α⁄2 se(𝑥̅ )) = 1 − 𝛼
Take the inequality statement (the interval) within the parentheses and rewrite it as:
𝑥̅ − 𝑧α⁄2 se(𝑥̅ ) < µ < 𝑥̅ + 𝑧α⁄2 se(𝑥̅ )
The above inequality shows that µ falls within 1 − α proportion of all possible intervals built around the means of all random samples: 𝑥̅ ± 𝑧α⁄2 se(𝑥̅). Therefore, if we select one sample of size 𝑛 and build a single interval 𝑥̅ ± 𝑧α⁄2 se(𝑥̅), we are 100(1 − α)% confident that this interval contains the population mean. Thus, the
confidence interval for the population mean, with the lower end 𝐿 and the upper end 𝑈 is:
𝐿, 𝑈 = 𝑥̅ ± 𝑧α⁄2 se(𝑥̅ )
5.1. The t Distribution
So far, the theory of confidence intervals has been explained using the population standard deviation in the
margin of error formula:
𝜖 = 𝑧α⁄2 se(𝑥̅) = 𝑧α⁄2 σ ⁄ √𝑛
In practice, obviously, σ is also an unknown population parameter and must be estimated using the sample
data. The estimator of the population parameter σ is the sample statistic 𝑠, the sample standard deviation,
𝑠 = √(∑(𝑥 − 𝑥̅)² ⁄ (𝑛 − 1))
Therefore, in the margin of error formula the standard error of 𝑥̅ becomes an estimated value obtained using:
se(𝑥̅) = 𝑠 ⁄ √𝑛
When 𝑠 is used in place of σ a peculiar thing happens to the shape of the sampling distribution of 𝑥̅ . The
sampling distribution is still bell shaped, but the area under the curve for a given interval of 𝑥̅ values is not
the same as when the known σ is used. To illustrate, consider the following example:
First, suppose the mean of a normally distributed population is µ = 100 and the standard deviation is σ = 20. The proportion of 𝑥̅ values for samples of size 𝑛 = 16 taken from this population that fall between, say, 90.2 and 109.8 is determined as follows:
P(90.2 < 𝑥̅ < 109.8)
se(𝑥̅ ) = σ⁄√𝑛 = 20⁄√16 = 5
𝑧 = (𝑥̅ − µ)⁄se(𝑥̅ ) = ±1.96
P(−1.96 < 𝑧 < 1.96) = 0.95
Now, instead of using σ, let the standard deviation 20 be as if determined from a sample. That is, let 𝑠 = 20.
Hence,
se(𝑥̅ ) = s⁄√𝑛 = 20⁄√16 = 5
Here, when we attempt to transform 𝑥̅ to 𝑧 using the formula (𝑥̅ − µ) ⁄ (𝑠⁄√𝑛), a problem arises.
The new random variable obtained through this transformation no longer has a 𝑧 distribution (with mean 0 and standard deviation 1). This problem was observed by William S. Gosset (1876–1937), a British chemist/statistician, in a paper published in 1908. Gosset showed that, when the sample size is small, the standard normal table does not provide the accurate area under the curve for the scores obtained from the conversion formula (𝑥̅ − µ)⁄se(𝑥̅). In the above example, if (𝑥̅ − µ)⁄se(𝑥̅) = ±1.96, the area under the curve bounded by the two scores ±1.96 is no longer 0.95. Gosset developed an alternative table to obtain the
more accurate areas or probability values for the scores thus calculated. The new table of probabilities he
provided is now called the t table. And the random variable obtained from this transformation is said to have
a t distribution, where,
𝑡 = (𝑥̅ − µ) ⁄ (𝑠⁄√𝑛)
The difference between the z and t distributions is shown in the following diagram.
[Figure: the 𝑧 curve and the 𝑡 curve (df = 4); to the right of 1.96, the tail area under 𝑧 is 0.025 and the tail area under 𝑡 is 0.061.]
Like the z distribution, the t distribution is symmetric about the mean of 0. However, unlike z, which has a
unique, unchanging shape due to its fixed standard deviation 1, the t distribution acquires different shapes
depending on a parameter called degrees of freedom. In estimations involving μ, the degrees of freedom is
𝒅𝒇 = 𝒏 − 𝟏, the denominator used in computing the sample standard deviation. The smaller the degrees of
freedom, the larger the tail areas. As the 𝑑𝑓 increases, the 𝑡 distribution approaches the 𝑧 distribution and the tail area under the 𝑡 curve becomes closer and closer to the tail area under the 𝑧.
As the degrees of freedom increases, the distinction between 𝑧 and 𝑡 practically disappears. For any 𝑑𝑓 > 2, the standard deviation of the 𝑡 distribution is 𝑠ₜ = √(𝑑𝑓 ⁄ (𝑑𝑓 − 2)).
For example, if 𝑑𝑓 = 4, then 𝑠ₜ = 1.414. As 𝑑𝑓 rises, the standard deviation approaches 1, the standard deviation of 𝑧. If, for example, 𝑑𝑓 = 1000, the standard deviation is practically 1 (√(1000⁄998) = 1.001). The fact that 𝑡 has a larger standard deviation than 𝑧 makes the tail area under the 𝑡 curve relatively larger, for a given value, than the area under the 𝑧 curve for the same value. Thus, using a computer, it can be shown that, while the tail area for the 𝑧 score 1.96 is 0.025, the tail area associated with a 𝑡 score of 1.96 (with 𝑑𝑓 = 4) is 0.061.
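These tail areas are easy to confirm with software. A brief scipy.stats sketch (illustrative, assuming scipy is available; sf is the survival function, i.e., the upper-tail area 1 − cdf):

from scipy.stats import norm, t

# Upper-tail areas to the right of 1.96 under z and under t with df = 4.
print(norm.sf(1.96))   # 0.025
print(t.sf(1.96, 4))   # about 0.061

# Standard deviation of t: sqrt(df / (df - 2)) for df > 2.
print(t.std(4))        # about 1.414
print(t.std(1000))     # about 1.001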
Having a larger standard deviation and tail area than 𝑧 is a reflection of the fact that the 𝑡 distribution applies to situations with a greater inherent uncertainty. The uncertainty arises from the fact that σ is unknown and is estimated by the random variable 𝑠. The 𝑡 distribution, 𝑡 = (𝑥̅ − µ) ⁄ (𝑠⁄√𝑛), thus reflects the uncertainty in two random variables, 𝑥̅ and 𝑠, while 𝑧 = (𝑥̅ − µ) ⁄ (σ⁄√𝑛) reflects only the uncertainty due to 𝑥̅. The greater uncertainty in 𝑡 (which makes confidence intervals based on 𝑡 wider than those based on 𝑧) is the price we pay for not knowing σ and having to estimate it from sample data.
In inferential statistics we are interested in the t score for a given tail area, or in the tail area associated with a
given 𝑡 score. A typical t table provides the 𝑡 scores for a given 𝑑𝑓 and various tail areas. But there are no
tables which provide the tail area for different 𝑡 scores. In either case, a computer can easily provide the
values we are looking for.
Back to the confidence interval for µ: When σ is unknown, the margin of error used in building the confidence
interval is,
𝑒 = 𝑡α⁄2,𝑑𝑓 se(𝑥̅)
where 𝑑𝑓 = 𝑛 − 1 and se(𝑥̅ ) = 𝑠⁄√𝑛.
[Note: The symbol 𝑒 is used for the margin of error in place of 𝜖, reflecting the fact that we are using an estimated value for the standard error.]
The confidence interval with 1 − α level of confidence for the population mean is then,
𝑥̅ − 𝑡α⁄2,𝑑𝑓 (𝑠⁄√𝑛) < µ < 𝑥̅ + 𝑡α⁄2,𝑑𝑓 (𝑠⁄√𝑛)

𝐿, 𝑈 = 𝑥̅ ± 𝑡α⁄2,𝑑𝑓 (𝑠⁄√𝑛)
Example 3
To build a confidence interval with a 0.95 level of confidence for the average life of a certain type of light bulb, a sample of 𝑛 = 25 bulbs was tested. The sample mean is 𝑥̅ = 920.5 and the sample standard deviation is 𝑠 = 43.5.
1 − α = 0.95
se(𝑥̅) = 𝑠⁄√𝑛 = 43.5⁄√25 = 8.7
𝑑𝑓 = 𝑛 − 1 = 24
𝑡α⁄2,𝑑𝑓 = 𝑡0.025,24 = 2.064
To find 𝑡α⁄2,𝑑𝑓 = 𝑡0.025,24 = 2.064, use the following Excel function:

=T.INV.2T(probability, deg_freedom)
=T.INV.2T(0.05,24) = 2.064

𝑒 = 𝑡α⁄2,𝑑𝑓 (𝑠⁄√𝑛) = (2.064)(8.7) = 17.96
920.5 − 17.96 < µ < 920.5 + 17.96
902.54 < µ < 938.46
𝐿, 𝑈 = (902.54,938.46)
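For comparison, Example 3 can be reproduced in Python with scipy.stats (a sketch under the assumption that scipy is available; t.ppf returns the 𝑡 score for a given left-tail area and degrees of freedom):

from scipy.stats import t

xbar, s, n, alpha = 920.5, 43.5, 25, 0.05
df = n - 1

se = s / n ** 0.5                  # 8.7
t_crit = t.ppf(1 - alpha / 2, df)  # 2.064
moe = t_crit * se                  # about 17.96
print(xbar - moe, xbar + moe)      # about (902.54, 938.46)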
6. Test of Hypothesis for µ
In the interval estimate process we started with no knowledge, assumption, or hypothesis, about the
population parameter. A sample is taken and an interval is built around that sample.
Unlike the interval estimate approach to inferential statistics, the test of hypothesis starts with a conjecture
or hypothesis about the population parameter. Denoting the hypothesized value by µ0 , in hypothesis testing
we ascribe this value to the center of gravity of the 𝑥̅ values in the sampling distribution of 𝑥̅ . Then the
argument goes as follows: if µ0 were the actual population mean, then 1 − 𝛼 proportion of sample means
would fall within the interval bounded by 𝑥̅𝐿 , 𝑥̅𝑈 = µ0 ± 𝑧𝛼⁄2 se(𝑥̅ ).
[Figure: sampling distribution of 𝑥̅ centered at the hypothesized mean µ₀, with the middle area 1 − α between 𝑥̅𝐿 and 𝑥̅𝑈.]
To test the statistical validity of this conjecture, that is, to determine whether µ₀ is a reasonable value to ascribe to the population mean, a sample of size 𝑛 is taken, from which the sample statistic is computed. If the sample mean falls within the interval shown in the above diagram (the “acceptance region”), then we decide this mean belongs to the family of 𝑥̅ values whose center of gravity is µ₀, and conclude the population mean is the value we ascribed to µ₀.
Note that the interval or acceptance region (𝑥̅𝐿, 𝑥̅𝑈) is obtained by adding to and subtracting from µ₀ the margin of error 𝜖 = 𝑧α⁄2 se(𝑥̅):

𝑥̅𝐿, 𝑥̅𝑈 = µ₀ ± 𝑧α⁄2 se(𝑥̅)
Therefore, the main task in performing a test of hypothesis is to find the statistical margin of error. This provides the critical value used to establish the decision rule to accept or reject the hypothesis. The decision rule sets up the acceptance region, which defines the range of acceptable values to which the 𝑥̅ value from the sample is compared.
To determine the acceptance region, first you must state the claim (the hypothesis) about the population
mean in a prescribed way. The claim contains a null hypothesis, denoted by 𝐻0 , and an alternative
hypothesis, 𝐻1 . Suppose we are testing the hypothesis that the population mean equals 100.
𝐻0 : µ = 100
𝐻1 : µ ≠ 100
The null hypothesis states that the population mean equals 100; the alternative hypothesis states that the
population mean is a value other than 100.
Once you have stated your hypotheses, you must deal with the following dilemma. Since the test of hypothesis involves the sampling distribution, in deriving a conclusion from the results of a test based on a random sampling process, there is always a chance that you may make a wrong decision and commit an error. There are two possible errors.
1) Type I Error—Reject a true null hypothesis.
2) Type II Error—Fail to reject a false null hypothesis.
[Figure: (a) Type I error: the sampling distribution under H₀, centered at µ₀; 𝑥̅ values in the tails lead to rejecting a true H₀. (b) Type II error: the sampling distribution under H₁, centered at µ₁; 𝑥̅ values inside the non-rejection interval lead to not rejecting a false H₀.]
There is always a chance that you may commit either one of the two errors. If the population mean is in fact µ = µ₀ = 100, but the 𝑥̅ value falls outside the non-rejection interval (𝑥̅𝐿, 𝑥̅𝑈) in panel (a) of the above diagram, then you would wrongly reject a true null hypothesis—you have committed a Type I error. The probability of
committing a Type I error is α (the combined two-tail areas in the above diagram). The Type II error would
occur if the population mean is not equal to 100 (µ = µ1 ≠ 100), but 𝑥̅ falls inside the non-rejection interval
in panel (b), leading you to not reject a false null hypothesis. The probability of committing a Type II error is
denoted by β, shown as the area to the left of 𝑥̅𝑈 under the distribution labeled 𝐻1 . Reducing α, expanding the
non-rejection interval (𝑥̅𝐿 , 𝑥̅𝑈 ) for a given sample size, comes only at the cost of increasing β.
Performing a test of hypothesis is like conducting a trial in a criminal court. The defendant or the accused is
charged with a crime. The purpose of the trial is to establish the defendant’s guilt or innocence. The null
hypothesis is that the defendant is innocent (the accused is presumed innocent) and the alternative is that he
is guilty (the guilt to be established beyond a reasonable doubt by the prosecutor). If the jury finds an
innocent person guilty, it has rejected a true null hypothesis; it has, therefore, committed a Type I error. On
the other hand, if the jury finds a guilty person not guilty, it has not rejected a false null hypothesis; it has,
therefore, committed a Type II error.
In the hypothesis test, the benefit of the doubt is given to 𝐻₀, and the burden of proof is upon 𝐻₁. That is, we want to make it unlikely to reject the null hypothesis unless the evidence is “very strong”. We want to make it
unlikely to find the defendant guilty unless guilt is established beyond a reasonable doubt. For this reason α, the probability of rejecting a true null hypothesis, is always assigned a small value—typically, 5 percent. The α value is also called the level of significance.
Note that in a confidence interval, α is the percentage of all possible intervals built around sample means that do not capture the population mean. That is because α% of sample means fall outside the margin of error 𝜖 = 𝑧α⁄2 se(𝑥̅). In a test of hypothesis, α plays a similar role. If the randomly selected 𝑥̅ falls outside the prescribed margin of error, we would wrongly reject the null hypothesis. And there is always an α% chance of doing that.
Since committing a Type I Error is the more serious of the two errors, the threshold probability (the level of
significance α) is set in advance. The probability of Type II Error (β), however, varies based on several
factors, one of them being α. The method to determine β will be explained later in this chapter.
Suppose that, to test the null hypothesis that the population mean is 100, a random sample of size 16 is selected, with the following results:
108   109   104   95    105   93    97    100
96    95    100   109   108   106   102   108
The mean of the sample is 𝑥̅ = 102.2. The question then is: is 102.2 significantly different from 100? How do we decide if the difference is significant? If we want to limit our probability of Type I error to 5 percent, we select α = 0.05. Given this probability, we can determine the 95% margin of error as follows:
𝑒 = ±𝑡α⁄2,(𝑛−1) se(𝑥̅ )
First we must compute the sample standard deviation (𝑠 = 5.671) to determine the standard error of 𝑥̅.
se(𝑥̅ ) = 𝑠⁄√𝑛 = 1.418
𝑡0.025,15 = 2.131
Thus,
𝑒 = 2.131 × 1.418 = 3.02.
This tells us 95% of all means of samples of size 𝑛 = 16 fall within ±3.02 units of the population mean. Since 𝑥̅ = 102.2 differs from the hypothesized mean µ = 100 by 2.2, this difference falls within the acceptable margin of error of 3.02. Alternatively stated, 𝑥̅ = 102.2 falls within the non-rejection interval of 𝑥̅𝐿 = µ₀ − 𝑒 = 100 − 3.02 = 96.98 and 𝑥̅𝑈 = µ₀ + 𝑒 = 100 + 3.02 = 103.02. Therefore, if the population mean is 100, then 102.2 is one of the likely sample means.
The decision rule for rejecting the null hypothesis, in short, can be written as:

Reject H₀ if |𝑥̅ − µ₀| > 𝑒
This decision rule can also be written in a more frequently applied way, derived as follows:
Start with the decision rule above and substitute 𝑒 = 𝑡α⁄2,(𝑛−1) se(𝑥̅) on the right-hand side of the inequality:

|𝑥̅ − µ₀| > 𝑡α⁄2,(𝑛−1) se(𝑥̅)
Divide both sides by se(𝑥̅):

|𝑥̅ − µ₀| ⁄ se(𝑥̅) > 𝑡α⁄2,(𝑛−1)
The left-hand side is the test statistic |𝑡| and the right-hand side is the critical value. Thus, the decision rule becomes:

Reject H₀ if the test statistic exceeds the critical value: |𝑡| > 𝑡α⁄2,(𝑛−1)
In the example,

|𝑡| = (102.2 − 100) ⁄ 1.418 = 1.552

which is less than 𝑡0.025,15 = 2.131. Therefore, do not reject the null hypothesis.
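The entire test can be run directly from the raw data. A Python sketch using scipy.stats (an assumed toolkit; ttest_1samp returns the test statistic together with the two-tail probability value discussed in the next section):

from scipy.stats import t, ttest_1samp

data = [108, 109, 104, 95, 105, 93, 97, 100,
        96, 95, 100, 109, 108, 106, 102, 108]
mu0, alpha = 100, 0.05

stat, pvalue = ttest_1samp(data, popmean=mu0)
t_crit = t.ppf(1 - alpha / 2, df=len(data) - 1)

print(stat)                # about 1.54 (the text's 1.552 uses the rounded mean 102.2)
print(t_crit)              # 2.131
print(abs(stat) > t_crit)  # False -> do not reject H0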
6.1. The Probability Value
The probability value approach to the test of hypothesis is based on the following question: if the population mean is in fact 100, what is the probability that a randomly selected sample from this population would yield a sample mean that deviates from µ = 100 by 2.2 units or more?
This probability is the area under the curve to the right of 102.2. To find it, we must transform the sample mean into a 𝑡 score, which was already done above: |𝑡| = 1.552. Now find P(𝑡 > |𝑡|). Using Excel, this probability can be computed with the following function:
=T.DIST(x, deg_freedom, cumulative)

Since we want the tail area associated with the 𝑡 score, enter the negative value of 𝑡 and 1 for “cumulative”:

=T.DIST(-1.552,15,1) = 0.0708
P(𝑡 > 1.552) = 0.0708
[Figure: 𝑡 distribution (df = 15); the test statistic 1.552 falls to the left of the critical value 2.131, with the tail area 0.0708 to its right.]
For two-tail tests this probability value must be doubled, 0.0708 × 2 = 0.1416. This means that the
probability that a sample mean would deviate (in either direction) from the population mean of 100 by 2.2 or
more is 0.1416. Compared to the level of significance of α = 0.05, 0.1416 is a very high probability. This
implies that if we reject the null hypothesis that the population mean is 100, the probability of committing a
Type I error, rejecting a true null hypothesis, is over 14%, which far exceeds the self-imposed limit of 5%.
Therefore, we do not reject the null hypothesis. In Excel you can obtain the p-value for a two-tail test by

=T.DIST.2T(x, deg_freedom)
=T.DIST.2T(1.552,15) = 0.1416
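The same two-tail p-value in Python (a sketch, assuming scipy.stats; t.sf gives the upper-tail area):

from scipy.stats import t

t_stat, df = 1.552, 15
p_one_tail = t.sf(t_stat, df)   # about 0.0708
p_two_tail = 2 * p_one_tail     # about 0.1416
print(p_one_tail, p_two_tail)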