Uploaded by Adeel Toin

notesgbu3311

II-
Review of Chapter 8: Interval Estimation
Population: A population is the set of all elements of interest in a particular study (for that reason,
it is sometimes called the universe). Example: If one is interested in evaluating the average revenue
of the Moroccan family, the population would be the set of all Moroccan households.
Parameter: A parameter is a summary measure describing a characteristic of the entire population.
It is generally unknown, and is very difficult to assess precisely.
Very often, in fact, the population is so large that it is extremely difficult, if not impossible,
to consider it in its entirety because of a lack of time, money, etc. It is then
necessary to limit oneself to a subset of the entire population.
Sample: A sample is a subset of a population. Example: A sample of 5000 households is selected
from the population of Moroccan households to conduct the study.
Statistic: A statistic is a summary measure computed for a sample, rather than for the entire
population. It is generally used as an estimation of the corresponding population parameter.
It is often difficult, if not impossible, to study the parameters of entire populations. One then
needs to limit oneself to subsets of the population, known as samples, whose statistics provide
approximations of the population parameters. How well these statistics approximate the true
parameters is the subject of the current chapter.
III- Interval Estimation of Population Mean
III-1- The population standard deviation, $\sigma$, is KNOWN
Assume the standard deviation, $\sigma$, of the population is known. Given a sample and its corresponding mean $\bar{X}$, we may state with $(1-\alpha)$ confidence that the population mean $\mu$ is within
$$\bar{X} \pm Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$$
if $X$ follows a normal distribution with mean $\mu$ and standard deviation $\sigma$.
If $X$ is not normally distributed then the sample size, $n$, must be $\ge 30$.
Here
$$P\left[Z \le Z_{\alpha/2}\right] = 1 - \alpha/2$$
$Z_{\alpha/2}$ is called the critical value corresponding to a level of confidence of $(1-\alpha)$.
$e = Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$ is called the sampling error or the margin of error.
Exercise 1: In an effort to estimate the mean amount spent per customer for dinner at a major
Atlanta restaurant, data were collected for a sample of 49 customers. Assume a population standard
deviation of 5. If the sample mean is 24.80, what is the 95% confidence interval for the population
mean?
Solution:
Since the sample size is 49, which is $\ge 30$, we are $(1-\alpha)$ confident that
$$\bar{X} - Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$$
Let's first find $Z_{\alpha/2}$. Since we use 95% confidence, $\alpha = 1 - 0.95 = 0.05$, so
$$P\left[Z \le Z_{\alpha/2}\right] = 1 - \alpha/2 = 1 - 0.05/2 = 0.975$$
The Z-table gives us $Z_{\alpha/2} = 1.96$, so we are 95% confident that the population mean, $\mu$, satisfies
$$24.80 - 1.96 \times \frac{5}{\sqrt{49}} \le \mu \le 24.80 + 1.96 \times \frac{5}{\sqrt{49}}$$
$$23.4 \le \mu \le 26.2$$
We are 95% confident that the mean amount spent per customer for dinner, $\mu$, is between 23.40
and 26.20.
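The calculation above can be checked with a short Python sketch. It is only an illustration: the function name `z_interval` is hypothetical, and the critical value is taken from the standard library's `statistics.NormalDist` rather than from a Z-table, so it uses 1.95996... instead of the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    # Critical value: P[Z <= z] = 1 - alpha/2
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)  # margin of error e
    return sample_mean - margin, sample_mean + margin


# Exercise 1: n = 49, sigma = 5, sample mean = 24.80, 95% confidence
low, high = z_interval(24.80, 5, 49, 0.95)
print(f"{low:.2f} <= mu <= {high:.2f}")
```

Rounded to two decimals, the bounds agree with the interval found by hand.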
Exercise 2: The National Quality Research Center at the University of Michigan provides a quarterly measure of consumer opinions about products and services (The Wall Street Journal, February
18, 2003). A survey of 10 restaurants in the Fast Food/Pizza group showed a sample mean customer satisfaction index of 71. Past data indicate that the population standard deviation of the
index has been relatively stable with $\sigma = 4.85$.
a. What is the population? What is the parameter of interest? What is the statistic?
b. What assumption should the researcher be willing to make if a margin of error is desired?
c. Using 95% confidence, what is the margin of error?
d. What is the margin of error if 99% confidence is desired?
Solution:
b. Since the sample size is 10, which is less than 30, the researcher should assume that the
customer satisfaction index, $X$, is normally distributed.
c. Since we assume the customer satisfaction index is normally distributed, the margin of
error, $e$, is
$$e = Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$$
Let's first find $Z_{\alpha/2}$. Since we use 95% confidence, $\alpha = 1 - 0.95 = 0.05$, so
$$P\left[Z \le Z_{\alpha/2}\right] = 1 - \alpha/2 = 1 - 0.05/2 = 0.975$$
The Z-table gives us $Z_{\alpha/2} = 1.96$, so the margin of error is
$$e = Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} = 1.96 \times \frac{4.85}{\sqrt{10}} = 3.01$$
d. Since we assume the customer satisfaction index, $X$, is normally distributed, the margin
of error, $e$, is
$$e = Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$$
Let's first find $Z_{\alpha/2}$. Since we use 99% confidence, $\alpha = 1 - 0.99 = 0.01$, so
$$P\left[Z \le Z_{\alpha/2}\right] = 1 - \alpha/2 = 1 - 0.01/2 = 0.995$$
The Z-table gives us $Z_{\alpha/2} = 2.58$, so the margin of error is
$$e = Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} = 2.58 \times \frac{4.85}{\sqrt{10}} = 3.96$$
Exercise 3: A survey of small businesses with Web sites found that the average amount spent
on a site was 11500 per year (Fortune, March 5, 2001). Given a sample of 60 businesses and a
population standard deviation of $\sigma = \$4000$,
a. What is the population? What is the random variable X? What is the parameter of interest?
What is the statistic?
b. Using a 95% confidence, what is the margin of error?
c. What would you recommend if the study required a margin of error of 500?
Solution:
a. The population is all small businesses with Web sites. The random variable $X$ is the amount
spent on a Web site by a small business. The parameter is the average amount spent on a
Web site per year. The statistic is 11500, which is the average of a sample of 60 businesses.
b. Since the sample size is 60, which is $\ge 30$, using 95% confidence, the margin of error,
$e$, is
$$e = Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$$
Let's first find $Z_{\alpha/2}$. Since we use 95% confidence, $\alpha = 1 - 0.95 = 0.05$, so
$$P\left[Z \le Z_{\alpha/2}\right] = 1 - \alpha/2 = 1 - 0.05/2 = 0.975$$
The Z-table gives us $Z_{\alpha/2} = 1.96$, so
$$e = 1.96 \times \frac{4000}{\sqrt{60}} = \$1012.14$$
c. If the study requires $e = \$500$ then
$$Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} = 500$$
$$n = \left(Z_{\alpha/2} \times \frac{\sigma}{500}\right)^2 = \left(1.96 \times \frac{4000}{500}\right)^2 = 245.86$$
So we need to have a sample size of 246.
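Part c amounts to solving the margin-of-error formula for n and rounding up. A minimal sketch, assuming a hypothetical helper name `required_sample_size` and the exact normal quantile from `statistics.NormalDist` in place of the tabulated 1.96:

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest n such that z * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Solve z * sigma / sqrt(n) = margin for n, then round up
    return ceil((z * sigma / margin) ** 2)


# Exercise 3c: sigma = 4000, desired margin of error = 500
n_needed = required_sample_size(4000, 500)
print(n_needed)
```

Rounding up rather than to the nearest integer guarantees that the margin of error does not exceed the target.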
Exercise 4: Vogue magazine reported that the mean annual household income of its readers is
120000 (Vogue, January 2008). Assume this estimate of the mean annual household income is
based on a sample of 80 households, and based on past studies, the population standard deviation
is known to be $\sigma = \$30000$.
a. What is the population? What is the random variable X? What is the parameter of interest?
What is the statistic?
b. Develop a 90% confidence interval estimate of the population mean.
c. Develop a 95% confidence interval estimate of the population mean.
d. Develop a 99% confidence interval estimate of the population mean.
e. Discuss what happens to the width of the confidence interval as the confidence level is increased. Does this result seem reasonable? Explain.
Solution:
a. The population is all readers of the magazine Vogue. The random variable X is the annual
household income of a reader of the magazine Vogue. The parameter is the mean household
income, µ, of all readers of the magazine Vogue. The statistic is 120000 which is the average
of a sample of 80 households of readers of the magazine Vogue.
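b. Since the sample size is 80, which is $\ge 30$, we are $(1-\alpha)$ confident that
$$\bar{X} - Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$$
Let's first find $Z_{\alpha/2}$.
Since we use 90% confidence, $\alpha = 1 - 0.9 = 0.1$, so
$$P\left[Z \le Z_{\alpha/2}\right] = 1 - \alpha/2 = 1 - 0.1/2 = 0.95$$
The Z-table gives us $Z_{\alpha/2} = 1.64$, so we are 90% confident that the population mean, $\mu$, satisfies
$$120000 - 1.64 \times \frac{30000}{\sqrt{80}} \le \mu \le 120000 + 1.64 \times \frac{30000}{\sqrt{80}}$$
$$\$114499.27 \le \mu \le \$125500.73$$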
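c. Since the sample size is 80, which is $\ge 30$, we are $(1-\alpha)$ confident that
$$\bar{X} - Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$$
Let's first find $Z_{\alpha/2}$.
Since we use 95% confidence, $\alpha = 1 - 0.95 = 0.05$, so
$$P\left[Z \le Z_{\alpha/2}\right] = 1 - \alpha/2 = 1 - 0.05/2 = 0.975$$
The Z-table gives us $Z_{\alpha/2} = 1.96$, so we are 95% confident that the population mean, $\mu$, satisfies
$$120000 - 1.96 \times \frac{30000}{\sqrt{80}} \le \mu \le 120000 + 1.96 \times \frac{30000}{\sqrt{80}}$$
$$\$113425.96 \le \mu \le \$126574.04$$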
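d. Since the sample size is 80, which is $\ge 30$, we are $(1-\alpha)$ confident that
$$\bar{X} - Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$$
Let's first find $Z_{\alpha/2}$.
Since we use 99% confidence, $\alpha = 1 - 0.99 = 0.01$, so
$$P\left[Z \le Z_{\alpha/2}\right] = 1 - \alpha/2 = 1 - 0.01/2 = 0.995$$
The Z-table gives us $Z_{\alpha/2} = 2.58$, so we are 99% confident that the population mean, $\mu$, satisfies
$$120000 - 2.58 \times \frac{30000}{\sqrt{80}} \le \mu \le 120000 + 2.58 \times \frac{30000}{\sqrt{80}}$$
$$\$111346.42 \le \mu \le \$128653.58$$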
e. The confidence interval gets wider as the confidence level is increased. It makes sense because
if we keep the same sample size and we want to be more confident then the margin of error
gets larger.
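The widening described in part e can be seen numerically with the sketch below, which recomputes the margin of error for the three confidence levels. It uses exact normal quantiles from `statistics.NormalDist`, so the figures differ slightly from those obtained with the rounded table values 1.64, 1.96 and 2.58; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Exercise 4 data: sample mean, known sigma, sample size
x_bar, sigma, n = 120000, 30000, 80

margins = {}
for confidence in (0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margins[confidence] = z * sigma / sqrt(n)
    low, high = x_bar - margins[confidence], x_bar + margins[confidence]
    width = high - low  # interval width is twice the margin of error
    print(f"{confidence:.0%}: z = {z:.3f}, "
          f"[{low:.2f}, {high:.2f}], width = {width:.2f}")
```

With n and sigma fixed, only the critical value changes, so the width grows in step with the confidence level.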
III-2- The population standard deviation, $\sigma$, is UNKNOWN
The previous result giving the confidence interval estimate for the mean assumed that the standard
deviation of the population is known. Very often, however, this assumption cannot be made.
When this occurs, one needs to modify the way a confidence interval estimate is formed. This is
the objective of this section.
To the unknown $\sigma$ problem, a straightforward and intuitive answer can be given: just estimate
$\sigma$ using the point estimator $S$, the standard deviation of the sample, then apply the usual
general formula of a confidence interval estimate:
Confidence Interval = $\bar{X} \pm$ (Critical Value) $\times$ (Std. error)
The critical value, though, uses in this case a new probability distribution instead of the normal
distribution. This distribution is the Student's t distribution.
Assume the standard deviation, $\sigma$, of the population is unknown. Given a sample and its corresponding mean $\bar{X}$ and standard deviation $S$, we may state with $(1-\alpha)$ confidence that the
population mean $\mu$ is within
$$\bar{X} \pm t_{\alpha/2} \times \frac{S}{\sqrt{n}}$$
In most applications, a sample size of $n \ge 30$ is adequate when using the above interval estimation.
However, one needs to consider the validity for the following cases:
any $n$ if the population follows a normal distribution ($X$ is normally distributed).
$n$ must be $\ge 15$ if the population is not normally distributed but is roughly symmetric and
without outliers.
$n$ must be $\ge 50$ if the population's distribution is highly skewed or contains outliers.
Here
$$P\left[T \le t_{\alpha/2}\right] = 1 - \alpha/2$$
$t_{\alpha/2}$ is called the critical value of the Student's t-distribution corresponding to a level of
confidence of $(1-\alpha)$ and a number of degrees of freedom $df = n - 1$.
$e = t_{\alpha/2} \times \frac{S}{\sqrt{n}}$ is called the sampling error or the margin of error.
Remember: as $n$ becomes larger and larger, the t distribution with $n - 1$ degrees of freedom gets
closer and closer to a normal distribution $N(0,1)$.
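The convergence of the t distribution toward N(0,1) can be illustrated without a t-table. The sketch below is an assumption-laden illustration, not a library routine: Python's standard library has no t quantile function, so the hypothetical helpers `t_pdf`, `t_cdf` and `t_critical` integrate the Student's t density with Simpson's rule and invert the result by bisection.

```python
from math import exp, lgamma, log, pi
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))


def t_cdf(x, df, steps=1000):
    """P[T <= x] for x >= 0, using Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3  # by symmetry, P[T <= 0] = 0.5


def t_critical(confidence, df):
    """t_{alpha/2} such that P[T <= t] = 1 - alpha/2, by bisection."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z = NormalDist().inv_cdf(0.975)
for df in (9, 29, 79, 999):
    print(f"df = {df:4d}: t = {t_critical(0.95, df):.3f}  (z = {z:.3f})")
```

As df grows, the printed t critical values approach the normal critical value, which is why the t-based and Z-based intervals become nearly identical for large samples.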
Exercise 1: Redo the exercise in the previous section about a sample of 80 households of readers of
the magazine Vogue, but consider that we do not know the population's standard deviation
and that the sample's standard deviation is $S = \$30000$. Compare these results with the results
found in the previous section ($\sigma$ KNOWN).
Solution:
a. No need to answer it.
b. Since the sample size is 80, which is $\ge 50$, we are $(1-\alpha)$ confident that
$$\bar{X} - t_{\alpha/2} \times \frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2} \times \frac{S}{\sqrt{n}}$$
Let's first find $t_{\alpha/2}$.
Since we use 90% confidence, $\alpha = 1 - 0.9 = 0.1$, so
$$P\left[T \le t_{\alpha/2}\right] = 1 - \alpha/2 = 1 - 0.1/2 = 0.95$$
The t-table gives us $t_{\alpha/2} = 1.664$ (we use $df = n - 1 = 80 - 1 = 79$). So we are 90% confident
that the population mean, $\mu$, satisfies
$$120000 - 1.664 \times \frac{30000}{\sqrt{80}} \le \mu \le 120000 + 1.664 \times \frac{30000}{\sqrt{80}}$$
$$\$114418.77 \le \mu \le \$125581.23$$
The interval estimate we found when we considered $\sigma$ KNOWN was:
$$\$114499.27 \le \mu \le \$125500.73$$
It is clear that the interval when $\sigma$ is UNKNOWN is wider than when we solved with $\sigma$ KNOWN.
c. Since the sample size is 80, which is $\ge 50$, we are $(1-\alpha)$ confident that
$$\bar{X} - t_{\alpha/2} \times \frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2} \times \frac{S}{\sqrt{n}}$$
Let's first find $t_{\alpha/2}$.
Since we use 95% confidence, $\alpha = 1 - 0.95 = 0.05$, so
$$P\left[T \le t_{\alpha/2}\right] = 1 - \alpha/2 = 1 - 0.05/2 = 0.975$$
The t-table gives us $t_{\alpha/2} = 1.99$ (we use $df = n - 1 = 80 - 1 = 79$). So we are 95% confident
that the population mean, $\mu$, satisfies
$$120000 - 1.99 \times \frac{30000}{\sqrt{80}} \le \mu \le 120000 + 1.99 \times \frac{30000}{\sqrt{80}}$$
$$\$113325.34 \le \mu \le \$126674.66$$
The interval estimate we found when we considered $\sigma$ KNOWN was:
$$\$113425.96 \le \mu \le \$126574.04$$
It is clear that the interval when $\sigma$ is UNKNOWN is wider than when we solved with $\sigma$ KNOWN.
d. Since the sample size is 80, which is $\ge 50$, we are $(1-\alpha)$ confident that
$$\bar{X} - t_{\alpha/2} \times \frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2} \times \frac{S}{\sqrt{n}}$$
Let's first find $t_{\alpha/2}$.
Since we use 99% confidence, $\alpha = 1 - 0.99 = 0.01$, so
$$P\left[T \le t_{\alpha/2}\right] = 1 - \alpha/2 = 1 - 0.01/2 = 0.995$$
The t-table gives us $t_{\alpha/2} = 2.64$ (we use $df = n - 1 = 80 - 1 = 79$). So we are 99% confident
that the population mean, $\mu$, satisfies
$$120000 - 2.64 \times \frac{30000}{\sqrt{80}} \le \mu \le 120000 + 2.64 \times \frac{30000}{\sqrt{80}}$$
$$\$111145.17 \le \mu \le \$128854.83$$
The interval estimate we found when we considered $\sigma$ KNOWN was:
$$\$111346.42 \le \mu \le \$128653.58$$
It is clear that the interval when $\sigma$ is UNKNOWN is wider than when we solved with $\sigma$ KNOWN.
Exercise 2: Redo exercise 1 for a sample size of 100. Compare these results with the results found
in the previous section ($\sigma$ KNOWN with sample size 80).
Solution:
a. No need to answer it.
b. Since the sample size is 100, which is $\ge 50$, we are $(1-\alpha)$ confident that
$$\bar{X} - t_{\alpha/2} \times \frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2} \times \frac{S}{\sqrt{n}}$$
Let's first find $t_{\alpha/2}$.
Since we use 90% confidence, $\alpha = 1 - 0.9 = 0.1$, so
$$P\left[T \le t_{\alpha/2}\right] = 1 - \alpha/2 = 1 - 0.1/2 = 0.95$$
The t-table gives us $t_{\alpha/2} = 1.66$ (we use $df = n - 1 = 100 - 1 = 99$). So we are 90% confident
that the population mean, $\mu$, satisfies
$$120000 - 1.66 \times \frac{30000}{\sqrt{100}} \le \mu \le 120000 + 1.66 \times \frac{30000}{\sqrt{100}}$$
$$\$115020 \le \mu \le \$124980$$
The interval estimate we found when we solved with $\sigma$ KNOWN and $n = 80$ was:
$$\$114499.27 \le \mu \le \$125500.73$$
It is clear that the interval when $\sigma$ is UNKNOWN and $n = 100$ is narrower than when we solved
with $\sigma$ KNOWN and $n = 80$.
Even if we do not have the population's standard deviation we can obtain a narrower interval
by increasing the sample size.
c. Since the sample size is 100, which is $\ge 50$, we are $(1-\alpha)$ confident that
$$\bar{X} - t_{\alpha/2} \times \frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2} \times \frac{S}{\sqrt{n}}$$
Let's first find $t_{\alpha/2}$.
Since we use 95% confidence, $\alpha = 1 - 0.95 = 0.05$, so
$$P\left[T \le t_{\alpha/2}\right] = 1 - \alpha/2 = 1 - 0.05/2 = 0.975$$
The t-table gives us $t_{\alpha/2} = 1.984$ (we use $df = n - 1 = 100 - 1 = 99$). So we are 95% confident
that the population mean, $\mu$, satisfies
$$120000 - 1.984 \times \frac{30000}{\sqrt{100}} \le \mu \le 120000 + 1.984 \times \frac{30000}{\sqrt{100}}$$
$$\$114048 \le \mu \le \$125952$$
The interval estimate we found when we solved with $\sigma$ KNOWN and $n = 80$ was:
$$\$113425.96 \le \mu \le \$126574.04$$
It is clear that the interval when $\sigma$ is UNKNOWN and $n = 100$ is narrower than when we solved
with $\sigma$ KNOWN and $n = 80$.
Even if we do not have the population's standard deviation we can obtain a narrower interval
by increasing the sample size.
d. Since the sample size is 100, which is $\ge 50$, we are $(1-\alpha)$ confident that
$$\bar{X} - t_{\alpha/2} \times \frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2} \times \frac{S}{\sqrt{n}}$$
Let's first find $t_{\alpha/2}$.
Since we use 99% confidence, $\alpha = 1 - 0.99 = 0.01$, so
$$P\left[T \le t_{\alpha/2}\right] = 1 - \alpha/2 = 1 - 0.01/2 = 0.995$$
The t-table gives us $t_{\alpha/2} = 2.626$ (we use $df = n - 1 = 100 - 1 = 99$). So we are 99% confident
that the population mean, $\mu$, satisfies
$$120000 - 2.626 \times \frac{30000}{\sqrt{100}} \le \mu \le 120000 + 2.626 \times \frac{30000}{\sqrt{100}}$$
$$\$112122 \le \mu \le \$127878$$
The interval estimate we found when we solved with $\sigma$ KNOWN and $n = 80$ was:
$$\$111346.42 \le \mu \le \$128653.58$$
It is clear that the interval when $\sigma$ is UNKNOWN and $n = 100$ is narrower than when we solved
with $\sigma$ KNOWN and $n = 80$.
Even if we do not have the population's standard deviation we can obtain a narrower interval
by increasing the sample size.
Exercise 3 (Excel): A study designed to estimate the mean credit card debt for the population
of U.S. households used a sample of 70 households. The data collected is in the sheet named
"SigmaUnknown-Ex3" in the Excel file "Interval Estimation". Develop a 95% confidence interval
for the mean credit card debt for the population of the US.
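Solution: Using Excel
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \$9312$$
$$S = \sqrt{\frac{\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2}{n - 1}} = \sqrt{\frac{\sum_{i=1}^{70} \left(X_i - 9312\right)^2}{69}} = \$4007$$
Since we use 95% confidence, $\alpha = 1 - 0.95 = 0.05$ and
$$t_{\alpha/2} = t_{0.025}$$
We do not know the population's standard deviation, $\sigma$, but since the sample size is $\ge 50$,
we are 95% confident that the population mean, $\mu$, satisfies
$$\bar{X} - t_{0.025} \times \frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{0.025} \times \frac{S}{\sqrt{n}}$$
where $t$ has $n - 1 = 70 - 1 = 69$ degrees of freedom.
$$P\left[T \le t_{0.025}\right] = 1 - \alpha/2 = 1 - 0.05/2 = 0.975$$
The t-table gives us $t_{0.025} = 1.99$, so we are 95% confident that the mean credit card debt, $\mu$, of US
households satisfies
$$9312 - 1.99 \times \frac{4007}{\sqrt{70}} \le \mu \le 9312 + 1.99 \times \frac{4007}{\sqrt{70}}$$
$$\$8358.93 \le \mu \le \$10265.07$$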
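The same steps can be reproduced outside Excel. The sketch below is illustrative only: the function names are hypothetical, the t critical value is passed in as read from a t-table (1.99 for df = 69, as above), and the small data list at the end is made-up data, not the contents of the Excel sheet.

```python
from math import sqrt
from statistics import mean, stdev


def t_interval(x_bar, s, n, t_crit):
    """Interval for the mean from summary statistics, sigma unknown."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin


def t_interval_from_data(data, t_crit):
    """Same interval, computing X-bar and S from raw observations.

    statistics.stdev divides by n - 1, matching the formula for S.
    """
    return t_interval(mean(data), stdev(data), len(data), t_crit)


# Exercise 3 summary statistics: X-bar = 9312, S = 4007, n = 70
low, high = t_interval(9312, 4007, 70, 1.99)
print(f"{low:.2f} <= mu <= {high:.2f}")

# Hypothetical raw data (n = 5, df = 4, tabulated t_0.025 = 2.776)
demo_low, demo_high = t_interval_from_data([8, 10, 12, 9, 11], 2.776)
print(f"{demo_low:.2f} <= mu <= {demo_high:.2f}")
```

Passing the critical value as an argument keeps the sketch within the standard library, which has no t quantile function.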
Exercise 4 (Excel): A study designed to estimate the average price per square meter of apartments
in a certain neighborhood in Casablanca used a sample of 100 apartments. The data collected is
in the sheet named "SigmaUnknown-Ex4" in the Excel file "Interval Estimation". Develop a 95%
confidence interval for the mean price per square meter of apartments in that neighborhood.
IV- Population Proportion
Given a sample of size $n$ and the corresponding sample proportion $\bar{p}$, we may state with $(1-\alpha)$
confidence that the population proportion, $p$, is within
$$\bar{p} \pm Z_{\alpha/2} \times \sqrt{\frac{\bar{p}\,(1 - \bar{p})}{n}}$$
if $n\bar{p} \ge 5$ and $n(1 - \bar{p}) \ge 5$,
where $Z_{\alpha/2}$ is the critical value corresponding to the $(1-\alpha)$ level of confidence.
Example: A random sample of 400 voters showed that 54 will vote for Candidate A. Set up a 95%
confidence interval estimate for the proportion of voters who will vote for Candidate A.
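The sample proportion, $\bar{p}$, is
$$\bar{p} = \frac{54}{400} = 13.5\%$$
so $n\bar{p} = 400 \times 0.135 = 54 \ge 5$ and $n(1 - \bar{p}) = 400 \times (1 - 0.135) = 346 \ge 5$. We are 95% confident
that the population proportion, $p$, satisfies
$$\bar{p} - Z_{\alpha/2} \times \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} \le p \le \bar{p} + Z_{\alpha/2} \times \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$$
$$0.135 - 1.96 \times \sqrt{\frac{0.135 \times (1 - 0.135)}{400}} \le p \le 0.135 + 1.96 \times \sqrt{\frac{0.135 \times (1 - 0.135)}{400}}$$
$$0.1015 \le p \le 0.1685$$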
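The proportion interval, including the check on the two conditions, can be sketched as follows. The function name `proportion_interval` is hypothetical, and the critical value comes from `statistics.NormalDist` rather than from the Z-table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Confidence interval for a population proportion."""
    p_bar = successes / n
    # Conditions for using the normal approximation
    if n * p_bar < 5 or n * (1 - p_bar) < 5:
        raise ValueError("need n*p_bar >= 5 and n*(1 - p_bar) >= 5")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_bar * (1 - p_bar) / n)
    return p_bar - margin, p_bar + margin


# Example: 54 of 400 sampled voters favor Candidate A
low, high = proportion_interval(54, 400)
print(f"{low:.4f} <= p <= {high:.4f}")
```

Raising an error when the conditions fail is a design choice of this sketch; it makes explicit that the formula should not be applied to very small counts.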
Exercise 2 (Excel): A study about the car colors Moroccans prefer used a sample of 120 cars. The
data collected is in the sheet named "Proportion-Ex2" in the Excel file "Interval Estimation". The
study considered only four colors: black, dark blue, red, and white.
a. Develop a 95% confidence interval for the proportion of black cars in Morocco.
b. Develop a 95% confidence interval for the proportion of dark blue cars in Morocco.
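a. The sample proportion, $\bar{p}$, is
$$\bar{p} = \frac{35}{120} = 29.17\%$$
so $n\bar{p} = 120 \times 0.2917 = 35.004 \ge 5$ and $n(1 - \bar{p}) = 120 \times (1 - 0.2917) = 84.996 \ge 5$. We are
95% confident that the population proportion, $p$, satisfies
$$\bar{p} - Z_{\alpha/2} \times \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} \le p \le \bar{p} + Z_{\alpha/2} \times \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$$
$$0.2917 - 1.96 \times \sqrt{\frac{0.2917 \times (1 - 0.2917)}{120}} \le p \le 0.2917 + 1.96 \times \sqrt{\frac{0.2917 \times (1 - 0.2917)}{120}}$$
$$0.2104 \le p \le 0.373$$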
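b. The sample proportion, $\bar{p}$, is
$$\bar{p} = \frac{29}{120} = 24.17\%$$
so $n\bar{p} = 120 \times 0.2417 = 29.004 \ge 5$ and $n(1 - \bar{p}) = 120 \times (1 - 0.2417) = 90.996 \ge 5$. We are
95% confident that the population proportion, $p$, satisfies
$$\bar{p} - Z_{\alpha/2} \times \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} \le p \le \bar{p} + Z_{\alpha/2} \times \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$$
$$0.2417 - 1.96 \times \sqrt{\frac{0.2417 \times (1 - 0.2417)}{120}} \le p \le 0.2417 + 1.96 \times \sqrt{\frac{0.2417 \times (1 - 0.2417)}{120}}$$
$$0.1651 \le p \le 0.3183$$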
==============================================================
Chapter 09: Hypothesis Testing
I. Introduction
In the previous chapter, we have seen how to develop confidence interval estimates for the mean
and the proportion. In many practical situations, decision makers are more interested in deciding
whether to reject or not an assessment about a parameter of the population than in finding point
or interval estimates of that parameter. Statistical hypothesis testing provides an analytical and
rigorous methodology for doing such tests. These decisions are taken from
one or a few samples, so they are always subject to some amount of statistical
error. However, the advantage of hypothesis testing is that it gives managers and decision makers
control over the errors that they make during the process of testing hypotheses or
claims, as we shall see further in the chapter.
Definition: A null hypothesis is a claim or a statement about a parameter (e.g., the mean, the
proportion, the variance, etc.) of the population, which will be tested. Testing the null hypothesis
leads to determining whether the statement about the population parameter should or should not
be rejected. The null hypothesis is always represented by H0 .
By convention, we have the following:
(a) A null hypothesis should always contain an equality (e.g., $\mu = 4.65$), or a non-strict inequality
(e.g., $\mu \le 4.65$ or $\mu \ge 4.65$). A test like $\mu = 4.65$ is called a two-tail test, while a test like
$\mu \le 4.65$ or $\mu \ge 4.65$ is called a one-tail test.
(b) It can only be rejected, but never confirmed or accepted.
(c) It is always accompanied by its opposite, called the alternative hypothesis.
Definition: The alternative hypothesis, represented by H1 , is the opposite claim of the null
hypothesis. Therefore, the null hypothesis is rejected if and only if the alternative hypothesis is
accepted.
Example 1: A new machine in a manufacturing plant is designed to produce an hourly average of
90 units. This is the status quo, or the current state. Now, let us suppose the engineering department
comes up with a new design that is supposed to increase the output of the machine. The objective
is then to prove that the new design really worked as expected and increased the output. In this
case the hypotheses are set as follows:
$$\begin{cases} H_0: \mu \le 90 \\ H_1: \mu > 90 \end{cases}$$
As you can see, since we can only reject or not the null hypothesis ($H_0: \mu \le 90$), we end up
proving or not the alternative ($H_1: \mu > 90$), which is our objective.
Example 2: Let us now present a different situation in which the objective is not to try to prove
a claim, but rather to reject it. Suppose a consumer defense group receives complaints concerning
a famous brand of shoes, which mention that the actual sizes of the shoes do not conform to the sizes
announced. Size 11, in particular, is mentioned many times as a source of errors. The consumer
group thus decides to investigate this size. In this case, they are primarily interested in challenging
the company's claim that the average size is 11. Therefore, their hypotheses would be:
$$\begin{cases} H_0: \mu = 11 \\ H_1: \mu \ne 11 \end{cases}$$
The hypothesis testing procedure leads to two conclusions: the rejection of H0 or the conclusion
that H0 cannot be rejected (again, do not forget that H0 cannot be accepted!). Of course, H0 may
in reality be true or false. Hence, there are two types of errors that can be made:
Type I error: rejecting H0 while it is in fact true; or
Type II error: failing to reject H0 while it is in fact false.
II. Hypothesis Testing for the Mean, $\sigma$ KNOWN
Let us start with hypotheses that have equalities, that is, with what we call two-tail tests. Using Example
2 above, we have a null hypothesis $H_0: \mu = 11$. Intuitively, if we take a sample (with a large
enough size), compute its mean $\bar{X}$, and find a value that is very far from our claim (11),
then we suspect that the null hypothesis could be rejected. More rigorously, assuming $\mu = 11$, we
know from the Central Limit Theorem that $\bar{X}$ is approximately normally distributed with mean
$\mu$ and standard deviation $\sigma/\sqrt{n}$ (see Figure 1 below).
Let us define the area of the distribution located in the tails, so that the probability of each portion
of this area is equal to $\alpha/2$. Obviously, there is only a probability of $\alpha$ that $\bar{X}$ is in this area while
$\mu = 11$. Consequently, we can say with $(1-\alpha)$ confidence that $\mu \ne 11$ if we find a value of $\bar{X}$
in this area. Hence, this two-tail area is called the rejection region and is used, depending on the
position of $\bar{X}$ (within or outside the rejection region), to reject or not the null hypothesis.
How to determine the rejection region? We generally prefer to work with the standardized normal distribution instead of the distribution of $\bar{X}$. The boundaries $-Z_{\alpha/2}$ and $Z_{\alpha/2}$ are obtained
from the standardized normal table or from a statistical software package, using the fact that
$P\left[Z \le Z_{\alpha/2}\right] = 1 - \alpha/2$. Since the standardized normal distribution is used, we need to transform
$\bar{X}$ accordingly by computing the Z-test statistic,
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
The final step in our testing procedure is to determine whether the test statistic is inside the
rejection region or not, by comparing $Z$ to $-Z_{\alpha/2}$ and $Z_{\alpha/2}$: if $Z \ge Z_{\alpha/2}$ or $Z \le -Z_{\alpha/2}$, then
we are inside the rejection region and the null hypothesis is rejected; otherwise, we conclude that
there is not enough statistical evidence to reject $H_0$.
For our Example 2, suppose a sample of 100 pairs of shoes is selected and gives a sample mean of
11.3. Assume also that the standard deviation announced by the company is 0.55. Let us perform
the test of hypothesis $H_0$ with a significance level $\alpha = 0.05$.
Let us first compute the Z-test statistic. We have $\mu = 11$, $\bar{X} = 11.3$, $\sigma = 0.55$, and $n = 100$, so
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{11.3 - 11}{0.55/\sqrt{100}} = 5.45$$
Let's find $Z_{\alpha/2}$.
Since we use a significance level $\alpha = 0.05$,
$$P\left[Z \le Z_{0.025}\right] = 1 - \alpha/2 = 1 - 0.05/2 = 0.975$$
The Z-table gives us $Z_{0.025} = 1.96$.
Since $Z \ge Z_{\alpha/2}$ ($5.45 \ge 1.96$), our sample is located in the rejection region, so we reject $H_0$ with
5% significance level.
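The two-tail procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function name `two_tail_z_test` is hypothetical, and the critical value is computed with `statistics.NormalDist` instead of being read from the Z-table.

```python
from math import sqrt
from statistics import NormalDist


def two_tail_z_test(x_bar, mu0, sigma, n, alpha=0.05):
    """Two-tail Z test of H0: mu = mu0 when sigma is known.

    Returns the test statistic, the critical value, and whether
    H0 is rejected at significance level alpha.
    """
    z = (x_bar - mu0) / (sigma / sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    reject = z >= z_crit or z <= -z_crit  # inside the rejection region?
    return z, z_crit, reject


# Example 2: n = 100 pairs of shoes, X-bar = 11.3, sigma = 0.55
z, z_crit, reject = two_tail_z_test(11.3, 11, 0.55, 100, alpha=0.05)
print(f"Z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject}")
```

The function reports a decision to reject or not reject; consistent with the convention above, it never "accepts" the null hypothesis.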
Exercise 1: ATMs must be stocked with enough cash to satisfy customers making withdrawals over
an entire weekend. But if too much cash is unnecessarily kept in the ATMs, the bank is forgoing
the opportunity of investing the money and earning interest. Assume that at a particular branch
the population mean amount of money withdrawn from ATMs per customer transaction over the
weekend has traditionally been 160 with a population standard deviation of 30. The branch
managers are investigating the possibility that the population mean amount of money withdrawn,
$\mu$, has moved away from 160.
a. If a random sample of 36 customer transactions indicates that the sample mean withdrawal
amount is 172, is there evidence to believe that the population mean is no longer 160?
(Hint: first formulate your null and alternative hypotheses, then conduct the test with a 5%
significance level.)
b. How would your results change if the significance level is 1%?
c. How would your results change if the population standard deviation is 24?
Solution:
a.
$$\begin{cases} H_0: \mu = \$160 \\ H_1: \mu \ne \$160 \end{cases}$$
Let's first find $Z_{\alpha/2}$.
Since we use a significance level $\alpha = 0.05$,
$$P\left[Z \le Z_{0.025}\right] = 1 - \alpha/2 = 1 - 0.05/2 = 0.975$$
The Z-table gives us $Z_{0.025} = 1.96$.
$$Z = \frac{172 - 160}{30/\sqrt{36}} = 2.4$$
Since $Z \ge Z_{\alpha/2}$ ($2.4 \ge 1.96$), our sample is located in the rejection region, so we reject $H_0$
with 5% significance level.
b. Let's first find $Z_{\alpha/2}$.
Since we use a significance level $\alpha = 0.01$,
$$P\left[Z \le Z_{0.005}\right] = 1 - \alpha/2 = 1 - 0.01/2 = 0.995$$
The Z-table gives us $Z_{0.005} = 2.58$.
$$Z = \frac{172 - 160}{30/\sqrt{36}} = 2.4$$
Since $Z$ is between $-Z_{\alpha/2}$ and $Z_{\alpha/2}$ ($-2.58 \le 2.4 \le 2.58$), do not reject $H_0$. There is not
enough evidence to conclude that the mean amount of cash withdrawn per customer from the
ATM is not equal to 160.
c. With $\alpha = 0.05$:
Let's first find $Z_{\alpha/2}$.
Since we use a significance level $\alpha = 0.05$,
$$P\left[Z \le Z_{0.025}\right] = 1 - \alpha/2 = 1 - 0.05/2 = 0.975$$
The Z-table gives us $Z_{0.025} = 1.96$.
$$Z = \frac{172 - 160}{24/\sqrt{36}} = 3$$
Since $Z \ge Z_{\alpha/2}$ ($3 \ge 1.96$), our sample is located in the rejection region, so we reject $H_0$ with
5% significance level.
With $\alpha = 0.01$:
Let's first find $Z_{\alpha/2}$.
Since we use a significance level $\alpha = 0.01$,
$$P\left[Z \le Z_{0.005}\right] = 1 - \alpha/2 = 1 - 0.01/2 = 0.995$$
The Z-table gives us $Z_{0.005} = 2.58$.
$$Z = \frac{172 - 160}{24/\sqrt{36}} = 3$$
Since $Z \ge Z_{\alpha/2}$ ($3 \ge 2.58$), our sample is located in the rejection region, so we reject $H_0$ with
1% significance level.
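The three parts of this exercise differ only in the significance level and the population standard deviation, so they can be compared side by side. A minimal sketch, assuming exact normal quantiles from `statistics.NormalDist`; the `decisions` dictionary is an illustrative name, not part of the exercise.

```python
from math import sqrt
from statistics import NormalDist

# ATM exercise: H0: mu = 160 versus H1: mu != 160, with n = 36, X-bar = 172
x_bar, mu0, n = 172, 160, 36

decisions = {}
for sigma in (30, 24):
    z = (x_bar - mu0) / (sigma / sqrt(n))  # Z-test statistic
    for alpha in (0.05, 0.01):
        z_crit = NormalDist().inv_cdf(1 - alpha / 2)
        reject = abs(z) >= z_crit  # two-tail rejection region
        decisions[(sigma, alpha)] = reject
        verdict = "reject H0" if reject else "do not reject H0"
        print(f"sigma = {sigma}, alpha = {alpha}: "
              f"Z = {z:.2f}, critical = {z_crit:.2f} -> {verdict}")
```

The comparison shows that the conclusion depends on both choices: a smaller significance level widens the non-rejection region, while a smaller standard deviation makes the same sample mean more extreme.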