4 Outline

The Sample Mean is a Random Variable
The sample mean x̄ is a random variable: its value is determined by a random sampling process. The probability distribution of x̄ is called the sampling distribution of x̄.
How Is the Sampling Distribution of x̄ Generated?
Consider a population of five elements, A, B, C, D, and E. The numeric value x of each element is listed below, together with its deviation from the population mean and the squared deviation:

| Element | x | x − µ | (x − µ)² |
|---|---|---|---|
| A | 6 | −6.6 | 43.56 |
| B | 9 | −3.6 | 12.96 |
| C | 12 | −0.6 | 0.36 |
| D | 15 | 2.4 | 5.76 |
| E | 21 | 8.4 | 70.56 |
| Sum | 63 | 0.0 | 133.20 |

The population mean and variance are computed as follows:

\[ \mu = \frac{\sum x}{N} = \frac{63}{5} = 12.6 \]

\[ \sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{133.2}{5} = 26.64 \]

We select a random sample of size n = 3 from this population, without replacement. The table below lists all the possible samples. Each possible sample has its own mean x̄. Since there are 10 possible samples, there are 10 possible sample means. Thus, the value of x̄ is assigned through a random sampling process. This makes x̄ a random variable.

| Sample | Sample values | x̄ |
|---|---|---|
| A, B, C | 6, 9, 12 | 9 |
| A, B, D | 6, 9, 15 | 10 |
| A, B, E | 6, 9, 21 | 12 |
| A, C, D | 6, 12, 15 | 11 |
| A, C, E | 6, 12, 21 | 13 |
| A, D, E | 6, 15, 21 | 14 |
| B, C, D | 9, 12, 15 | 12 |
| B, C, E | 9, 12, 21 | 14 |
| B, D, E | 9, 15, 21 | 15 |
| C, D, E | 12, 15, 21 | 16 |
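The enumeration of all possible samples can be reproduced with a short script. This is a minimal sketch in Python (the handout itself does not use any programming language); the variable names `population`, `samples`, and `sample_means` are illustrative choices.

```python
from itertools import combinations
from statistics import mean

# Population from the example: element label -> numeric value.
population = {"A": 6, "B": 9, "C": 12, "D": 15, "E": 21}

# All samples of size n = 3 drawn without replacement (order ignored).
samples = list(combinations(population, 3))

# Mean of each sample, in the same order as the samples.
sample_means = [mean(population[label] for label in s) for s in samples]

for s, m in zip(samples, sample_means):
    print("".join(s), m)
```

Each printed line pairs one of the ten samples with its mean, so the list of x̄ values can be checked against the handout by eye.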
The table below lists all the possible values of x̄ in ascending order along with the probability (relative frequency) of each value. This table (and the chart) represents the probability distribution of x̄. The probability distribution of x̄ is called the sampling distribution of x̄.

Sampling Distribution of x̄

| x̄ | f(x̄) |
|---|---|
| 9 | 0.1 |
| 10 | 0.1 |
| 11 | 0.1 |
| 12 | 0.2 |
| 13 | 0.1 |
| 14 | 0.2 |
| 15 | 0.1 |
| 16 | 0.1 |
| Sum | 1.0 |
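As a hedged illustration, the relative frequencies can be tallied from the ten sample means of the example. The sketch below uses exact fractions to avoid rounding; the name `sampling_dist` is an assumption made for this example only.

```python
from collections import Counter
from fractions import Fraction

# The ten sample means from the example (one per possible sample).
sample_means = [9, 10, 12, 11, 13, 14, 12, 14, 15, 16]

# Relative frequency of each distinct value is its probability f(x-bar).
counts = Counter(sample_means)
sampling_dist = {
    xbar: Fraction(count, len(sample_means))
    for xbar, count in sorted(counts.items())
}

for xbar, prob in sampling_dist.items():
    print(xbar, float(prob))
```

The values 12 and 14 each occur in two samples, which is why they carry probability 0.2 while every other value carries 0.1.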
IMPORTANT PROPERTIES OF THE SAMPLING DISTRIBUTION OF x̄
The Mean of the Means Equals the Mean!
In the example above there are ten x̄ values. Find the mean of these ten sample means, µ_x̄. This calculation is shown in two ways.

Find µ_x̄ directly from the ten x̄ values:

x̄: 9, 10, 12, 11, 13, 14, 12, 14, 15, 16 (sum = 126)

\[ \mu_{\bar{x}} = \frac{\sum \bar{x}}{10} = \frac{126}{10} = 12.6 \]

Find the expected value of x̄, E(x̄), from the sampling distribution:

| x̄ | f(x̄) | x̄ f(x̄) |
|---|---|---|
| 9 | 0.1 | 0.9 |
| 10 | 0.1 | 1.0 |
| 11 | 0.1 | 1.1 |
| 12 | 0.2 | 2.4 |
| 13 | 0.1 | 1.3 |
| 14 | 0.2 | 2.8 |
| 15 | 0.1 | 1.5 |
| 16 | 0.1 | 1.6 |
| Sum | | 12.6 |

\[ \mu_{\bar{x}} \equiv E(\bar{x}) = \sum \bar{x} f(\bar{x}) = 12.6 \]

Note the important conclusion from these calculations:

\[ \mu_{\bar{x}} \equiv E(\bar{x}) = \mu \]

The mean of the (sample) means equals the (population) mean.
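The two routes to the mean of the sample means can be checked numerically. The following is a small sketch, assuming the ten sample means and the probabilities from the example; exact fractions are used so the three results can be compared for equality.

```python
from fractions import Fraction

sample_means = [9, 10, 12, 11, 13, 14, 12, 14, 15, 16]

# Way 1: plain average of the ten sample means.
mean_direct = Fraction(sum(sample_means), len(sample_means))

# Way 2: expected value from the sampling distribution, sum of x-bar * f(x-bar).
tenth = Fraction(1, 10)
sampling_dist = {9: tenth, 10: tenth, 11: tenth, 12: 2 * tenth,
                 13: tenth, 14: 2 * tenth, 15: tenth, 16: tenth}
expected_value = sum(xbar * f for xbar, f in sampling_dist.items())

# Population mean of the five elements, for comparison.
population_mean = Fraction(6 + 9 + 12 + 15 + 21, 5)

print(float(mean_direct), float(expected_value), float(population_mean))
```

All three printed values agree at 12.6, which is the equality the section states.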
The Variance and the Standard Error of x̄
The variance of x̄, denoted by var(x̄), is the measure of dispersion of the x̄ values around their center of gravity, that is, around µ_x̄. To find var(x̄), use the sampling distribution of x̄. The var(x̄) is calculated as the weighted mean of the squared deviations of x̄ from µ_x̄.

| x̄ | f(x̄) | (x̄ − µ_x̄)² | (x̄ − µ_x̄)² f(x̄) |
|---|---|---|---|
| 9 | 0.1 | 12.96 | 1.296 |
| 10 | 0.1 | 6.76 | 0.676 |
| 11 | 0.1 | 2.56 | 0.256 |
| 12 | 0.2 | 0.36 | 0.072 |
| 13 | 0.1 | 0.16 | 0.016 |
| 14 | 0.2 | 1.96 | 0.392 |
| 15 | 0.1 | 5.76 | 0.576 |
| 16 | 0.1 | 11.56 | 1.156 |
| Sum | | | 4.440 |

\[ \operatorname{var}(\bar{x}) = \sum (\bar{x} - \mu_{\bar{x}})^2 f(\bar{x}) = 4.440 \]

The square root of var(x̄) is called the standard error of x̄:

\[ \operatorname{se}(\bar{x}) = \sqrt{\operatorname{var}(\bar{x})} = \sqrt{4.440} = 2.107 \]

The Relationship Between var(x̄) and the Parent Population Variance, σ²
In this example, the parent population, the population from which the samples are obtained, is a finite population. A population is treated as finite when n/N ≥ 0.05. Here, n = 3 and N = 5, which gives n/N = 0.60.
When the parent population is finite, var(x̄) is related to σ² according to the following formula:

\[ \operatorname{var}(\bar{x}) = \frac{\sigma^2}{n}\left(\frac{N - n}{N - 1}\right) \]

The term (N − n)/(N − 1) is called the finite population correction factor (FPCF). For this example:

\[ \operatorname{var}(\bar{x}) = \frac{26.64}{3}\left(\frac{5 - 3}{5 - 1}\right) = 8.88(0.5) = 4.440 \]

var(x̄) for Non-Finite Populations
For large populations the FPCF approaches 1. Therefore, var(x̄) and se(x̄) become:

\[ \operatorname{var}(\bar{x}) = \frac{\sigma^2}{n} \qquad \operatorname{se}(\bar{x}) = \frac{\sigma}{\sqrt{n}} \]

These are the formulas that we will be using from this point forward.
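The agreement between the variance computed from the sampling distribution and the finite-population formula can be verified with a brief sketch. It assumes the five population values of the example and treats the ten samples as equally likely, so the population variance of the ten sample means is var(x̄).

```python
from itertools import combinations
from math import isclose, sqrt
from statistics import mean, pvariance

values = [6, 9, 12, 15, 21]          # population values of A..E
N, n = len(values), 3

sigma2 = pvariance(values)           # parent population variance
sample_means = [mean(s) for s in combinations(values, n)]

# Route 1: variance of the ten equally likely sample means.
var_direct = pvariance(sample_means)

# Route 2: sigma^2 / n times the finite population correction factor.
fpcf = (N - n) / (N - 1)
var_formula = sigma2 / n * fpcf

print(round(var_direct, 3), round(var_formula, 3), round(sqrt(var_direct), 3))
```

Both routes give a variance of about 4.44 and a standard error of about 2.107.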
The Number of Possible Samples and Sample Means
The number of possible samples of size n quickly becomes astronomical. For example, the number of possible samples of size n = 50 selected from a parent population with N = 1,000 elements is:
C(1000, 50) ≈ 9.5E+84. This is about 9.5 × 10^84, an 85-digit number.
Thus, since the number of possible samples is astronomically large (effectively infinite), and each sample provides its own sample mean, the number of x̄ values is also effectively infinite. From here we can conclude that, for all practical purposes, x̄ is a continuous random variable.
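The count of possible samples is the binomial coefficient C(N, n). As a quick sketch, Python's standard `math.comb` evaluates it exactly for both the small example and the large one:

```python
import math

# Samples of size 3 from the five-element population used earlier.
n_small = math.comb(5, 3)

# Samples of size 50 from a population of 1,000 elements.
n_large = math.comb(1000, 50)

print(n_small)               # 10
print(f"{n_large:.2e}")      # scientific notation
print(len(str(n_large)))     # number of digits
```

Because Python integers have arbitrary precision, the large count is computed exactly rather than approximated.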
The Sampling Distribution of x̄ Must Be Normal
The sole application of the sampling distribution of x̄ is in statistical inference (in inferential statistics). For this purpose the sampling distribution must be normally distributed.
The sampling distribution of x̄ will be normal if the samples are selected from a normal parent population, or, if the parent population is not normal, if the sample size is at least 30 (n ≥ 30). If the parent population is not normal, samples smaller than 30 will not generate a normal sampling distribution of x̄.

Examples Using the Normal Sampling Distribution of x̄
Suppose the average per-semester textbook expense of all IUPUI students is $550 (the population mean µ = 550) with a standard deviation of $260 (population standard deviation σ = 260).

Example 1
The textbook expenses of a random sample of n = 100 students are obtained. The probability that the sample mean is less than $500 is _____. Or, the proportion or percentage of x̄ values obtained from samples of size n = 100 that are less than $500 is _____.

P(x̄ < 500) = _____

In the normal sampling distribution of x̄, the mean or expected value is the mean of the parent population, E(x̄) = µ. The standard deviation (standard error) is se(x̄) = σ/√n.

[Figure: normal sampling distribution of x̄, with 500 and 550 marked on the x̄ axis]

You must first transform x̄ into the standard normal variable:

\[ z = \frac{\bar{x} - \mu}{\operatorname{se}(\bar{x})} \]

\[ \operatorname{se}(\bar{x}) = \frac{\sigma}{\sqrt{n}} = \frac{260}{\sqrt{100}} = 26 \]

\[ z = \frac{500 - 550}{26} = -1.92 \]

\[ P(z < -1.92) = 0.0274 \]
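The table lookup in Example 1 can be cross-checked in code. The sketch below uses `statistics.NormalDist` from the Python standard library in place of a printed z-table, and rounds z to two decimals to mirror how a table is read; that rounding is a choice made for this illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 550, 260, 100

se = sigma / sqrt(n)              # standard error of the sample mean
z = round((500 - mu) / se, 2)     # z rounded to two decimals, as in a z-table
p = NormalDist().cdf(z)           # area to the left of z

print(se, z, round(p, 4))         # 26.0 -1.92 0.0274
```

Without the rounding step, z is about −1.923 and the probability is slightly smaller, which is the usual small gap between table values and exact computation.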
The parent population parameters:
μ = $550
σ = $260
Example 2
A sample of size n = 64 is selected. The probability that the sample mean is greater than $600 is _____. Or, the proportion or percentage of x̄ values obtained from samples of size n = 64 that are greater than $600 is _____.

P(x̄ > 600) = _____

[Figure: normal sampling distribution of x̄, with 550 and 600 marked on the x̄ axis]

\[ \operatorname{se}(\bar{x}) = \frac{\sigma}{\sqrt{n}} = \frac{260}{\sqrt{64}} = 32.5 \]

\[ z = \frac{\bar{x} - \mu}{\operatorname{se}(\bar{x})} = \frac{600 - 550}{32.5} = 1.54 \]

\[ P(z > 1.54) = 0.0618 \]

Example 3
A sample of size n = 80 is selected. The probability that the sample mean is between $506 and $594 is _____. Or, the proportion or percentage of x̄ values obtained from samples of size n = 80 that are between $506 and $594 is _____.

P(506 < x̄ < 594) = _____

[Figure: normal sampling distribution of x̄, with 506, 550, and 594 marked on the x̄ axis]

\[ \operatorname{se}(\bar{x}) = \frac{\sigma}{\sqrt{n}} = \frac{260}{\sqrt{80}} = 29.069 \]

\[ z = \frac{506 - 550}{29.069} = -1.51 \qquad z = \frac{594 - 550}{29.069} = 1.51 \]

\[ P(-1.51 < z < 1.51) = 0.8690 \]

Example 4
A sample of size n = 80 is selected. The probability that the sample mean is within ±$40 of the population mean is _____. Or, the proportion or percentage of x̄ values obtained from samples of size n = 80 that are within ±$40 of the population mean is _____.

P(µ − 40 < x̄ < µ + 40) = _____

[Figure: normal sampling distribution of x̄, with 510, 550, and 590 marked on the x̄ axis]

\[ \operatorname{se}(\bar{x}) = \frac{\sigma}{\sqrt{n}} = \frac{260}{\sqrt{80}} = 29.069 \]

\[ z = \frac{510 - 550}{29.069} = -1.38 \qquad z = \frac{590 - 550}{29.069} = 1.38 \]

\[ P(-1.38 < z < 1.38) = 0.8324 \]
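Examples 2 to 4 follow one pattern: compute the standard error, convert each bound to a z-value, and take a difference of areas. A minimal sketch of that pattern follows; the helper name `prob_between` is hypothetical, and z is rounded to two decimals to mirror table lookups.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA = 550, 260
STD_NORMAL = NormalDist()   # standard normal, mean 0 and standard deviation 1

def z_value(xbar, n):
    """z-value of a sample mean, rounded to two decimals like a z-table."""
    return round((xbar - MU) / (SIGMA / sqrt(n)), 2)

def prob_between(lower, upper, n):
    """P(lower < x-bar < upper) for samples of size n."""
    return STD_NORMAL.cdf(z_value(upper, n)) - STD_NORMAL.cdf(z_value(lower, n))

p2 = 1 - STD_NORMAL.cdf(z_value(600, 64))   # Example 2: upper tail
p3 = prob_between(506, 594, 80)             # Example 3
p4 = prob_between(510, 590, 80)             # Example 4

print(round(p2, 4), round(p3, 4), round(p4, 4))
```

The printed values agree with the table-based answers (0.0618, 0.8690, 0.8324) up to the rounding inherent in four-decimal tables.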
Example 5
When n = 80, the proportion of x̄ values that fall within ±1.25 standard errors of the population mean is _____.

P(µ − 1.25 se(x̄) < x̄ < µ + 1.25 se(x̄)) = _____

[Figure: normal sampling distribution of x̄, with 513.7, 550, and 586.3 marked on the x̄ axis]

\[ \operatorname{se}(\bar{x}) = 29.069 \]

\[ 1.25\operatorname{se}(\bar{x}) = 1.25(29.07) = 36.3 \]

\[ P(\mu - 36.3 < \bar{x} < \mu + 36.3) = P(513.7 < \bar{x} < 586.3) \]

\[ z = (513.7 - 550)/29.07 = -1.25 \qquad z = (586.3 - 550)/29.07 = 1.25 \]

\[ P(-1.25 < z < 1.25) = 0.7888 \]

Note: the z-calculations here are redundant. The statement "within ±1.25 standard errors from the population mean" implies z = ±1.25.

Example 6
For samples of size n = 100, the interval that contains the middle 90% (0.90) of sample means is:

P(x̄_L < x̄ < x̄_U) = 0.9000, with x̄_L = ______ and x̄_U = ______

Find the quantity to be subtracted from and added to µ to get the lower and upper ends of the desired interval.

\[ z = \frac{\bar{x} - \mu}{\operatorname{se}(\bar{x})} \quad\Rightarrow\quad \bar{x} = \mu + z \cdot \operatorname{se}(\bar{x}) \]

Since each tail area under the normal curve is 0.05, z_0.05 = 1.64.

\[ \operatorname{se}(\bar{x}) = 260/\sqrt{100} = 26 \]

\[ \bar{x}_L = 550 + (-1.64)(26) = 550 - 42.64 = 507.36 \]

\[ \bar{x}_U = 550 + (1.64)(26) = 550 + 42.64 = 592.64 \]

The term z · se(x̄) is called the "margin of sampling error", or simply, MARGIN OF ERROR (MOE). The combined area of the two tails is called the error probability and is denoted by α. Here, α = 0.10.

Example 7
For samples of size n = 100, the interval symmetric about the mean that contains 95% (0.95) of sample means is:

P(x̄_L < x̄ < x̄_U) = 0.9500, with x̄_L = ______ and x̄_U = ______

\[ \alpha = 1 - 0.95 = 0.05 \qquad \alpha/2 = 0.025 \qquad z_{\alpha/2} = z_{0.025} = 1.96 \]

\[ \mathit{MOE} = z_{\alpha/2}\operatorname{se}(\bar{x}) = (1.96)(26) = 50.96 \approx \$51 \]

\[ \bar{x}_L = 550 - 51 = \$499 \qquad \bar{x}_U = 550 + 51 = \$601 \]
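The interval calculations can be sketched in code by inverting the standard normal distribution instead of reading a table. The function name `central_interval` is hypothetical. Note that `NormalDist().inv_cdf` returns an unrounded critical value (about 1.645 for a 0.05 tail), so the 90% margin comes out near 42.77 rather than the 42.64 obtained with the table value 1.64.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 550, 260, 100
se = sigma / sqrt(n)                       # 26.0

def central_interval(coverage):
    """Lower bound, upper bound, and margin of error for the middle
    `coverage` share of sample means."""
    alpha = 1 - coverage                   # combined area of the two tails
    z = NormalDist().inv_cdf(1 - alpha / 2)
    moe = z * se
    return mu - moe, mu + moe, moe

low90, high90, moe90 = central_interval(0.90)
low95, high95, moe95 = central_interval(0.95)

print(round(moe90, 2), round(moe95, 2))
```

For the 95% case the unrounded critical value is so close to 1.96 that the margin of error matches the hand calculation to two decimals.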
Determining the Sample Size for a Given Margin of Error
In Example 7, where n = 100, we determined that the interval containing 95% of x̄ values lies within ±$51 of µ. That is, the 95% margin of error (MOE) turned out to be $51 when n = 100:

\[ P\left(\mu - 1.96\frac{260}{\sqrt{100}} \le \bar{x} \le \mu + 1.96\frac{260}{\sqrt{100}}\right) = 0.95 \]

\[ P(\mu - 50.96 \le \bar{x} \le \mu + 50.96) = 0.95 \]

Generally,

\[ P\left[\mu - z_{\alpha/2}\operatorname{se}(\bar{x}) \le \bar{x} \le \mu + z_{\alpha/2}\operatorname{se}(\bar{x})\right] = 1 - \alpha \]

This expression means that 100(1 − α)% of sample means fall within MOE = ±z_{α/2} se(x̄) of the population mean. Since se(x̄) = σ/√n,

\[ P\left(\mu - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \bar{x} \le \mu + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha \]

This shows that the margin of error is inversely related to the sample size n (more precisely, to √n). The larger the sample size, the smaller the MOE. In Example 7, since we have chosen the error probability to be α = 0.05,

\[ P\left(\mu - z_{0.025}\frac{\sigma}{\sqrt{n}} \le \bar{x} \le \mu + z_{0.025}\frac{\sigma}{\sqrt{n}}\right) = 0.95 \]

In this interval, 95% of x̄ values deviate from µ by no more than $50.96 in either direction. Now suppose we are interested in an interval in which the middle 95% of x̄ values deviate from µ by no more than $20. That is, we want the 95% margin of error to be $20:

\[ P(\mu - 20 \le \bar{x} \le \mu + 20) = 0.95 \]

This smaller MOE, a narrower interval, requires a different sample size. How do we determine the proper n to obtain this MOE? Consider the MOE formula:

\[ \mathit{MOE} = z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]

Solving for n, we have

\[ n = \left(\frac{z_{\alpha/2}\,\sigma}{\mathit{MOE}}\right)^2 \]

Thus, to obtain a 95% margin of error of $20, the minimum sample size should be:

\[ n = \left(\frac{1.96 \times 260}{20}\right)^2 = 649.2 \]

Since we are interested in the minimum sample size, round n up to the next integer:

n = 650
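The sample-size formula translates directly into a few lines of code. This is a sketch only; the function name `min_sample_size_mean` is an illustrative assumption, and the rounding up is done with `math.ceil`.

```python
from math import ceil

def min_sample_size_mean(z, sigma, moe):
    """Smallest n for which z * sigma / sqrt(n) does not exceed moe."""
    return ceil((z * sigma / moe) ** 2)

n_required = min_sample_size_mean(z=1.96, sigma=260, moe=20)
print(n_required)  # 650
```

Rounding up rather than to the nearest integer matters: rounding down would leave the margin of error slightly above the target.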
Sampling Distribution of the Sample Proportion p̄
Everything you learned about the sampling distribution of x̄ applies equally to the sampling distribution of p̄. The only difference between the two is that in the sampling distribution of p̄ all data are binary. Therefore, the symbols change accordingly.
A Comparison

Sampling Distribution of x̄
Parent population mean:

\[ \mu = \sum x / N \]

Parent population variance:

\[ \sigma^2 = \sum (x - \mu)^2 / N \]

The number of possible samples of size n obtainable from the parent population is effectively infinite. Therefore, the number of sample mean x̄ values is also effectively infinite.
The center of gravity, the expected value, or the mean of the sample means equals the population mean. (The mean of the means equals the mean.)

\[ E(\bar{x}) = \mu_{\bar{x}} = \mu \]

The variance of the sample means is (n is the sample size):

\[ \operatorname{var}(\bar{x}) = \sigma^2 / n \]

The standard error of the means is:

\[ \operatorname{se}(\bar{x}) = \sigma / \sqrt{n} \]

To be applicable in statistical inference, the sampling distribution of x̄ must be normal.

Sampling Distribution of p̄
Parent population mean (proportion), where each x is 0 or 1:

\[ \pi = \sum x / N \]

Parent population variance:

\[ \sigma^2 = \pi(1 - \pi) \]

The number of possible samples of size n obtainable from the parent population is effectively infinite. Therefore, the number of sample proportion p̄ values is also effectively infinite.
The center of gravity, the expected value, or the mean of the sample proportions equals the population proportion. (The mean of the proportions equals the proportion.)

\[ E(\bar{p}) = \mu_{\bar{p}} = \pi \]

The variance of the sample proportions is (n is the sample size):

\[ \operatorname{var}(\bar{p}) = \sigma^2 / n = \pi(1 - \pi) / n \]

The standard error of the proportions is:

\[ \operatorname{se}(\bar{p}) = \sqrt{\pi(1 - \pi) / n} \]

To be applicable in statistical inference, the sampling distribution of p̄ must be normal.
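The claim that binary data have variance π(1 − π) can be checked on a small made-up population. The sketch below assumes a hypothetical population of 100 elements, 67 of which are coded 1 and 33 coded 0; these counts are chosen for illustration only.

```python
from math import isclose
from statistics import mean, pvariance

# Hypothetical binary population: 1 = has the attribute, 0 = does not.
population = [1] * 67 + [0] * 33

pi_pop = mean(population)          # proportion of ones, sum(x) / N
sigma2 = pvariance(population)     # sum((x - mean)^2) / N

print(pi_pop, round(sigma2, 4), round(pi_pop * (1 - pi_pop), 4))
```

The ordinary variance formula and the shortcut π(1 − π) give the same number, which is why the proportion case needs no separate variance parameter.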
Examples
Use the following information for the examples below. The proportion of households in a state that are homeowners is 0.67 (67%): π = 0.67.

Example 1
A sample of n = 500 households is selected. The probability that the sample proportion p̄ is less than 0.65 is _____. Or, the proportion or percentage of p̄ values obtained from samples of size n = 500 that are less than 0.65 is _____.

P(p̄ < 0.65) = ______

\[ \operatorname{se}(\bar{p}) = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.67(1 - 0.67)}{500}} = 0.0210 \]

\[ z = \frac{\bar{p} - \pi}{\operatorname{se}(\bar{p})} = \frac{0.65 - 0.67}{0.0210} = -0.95 \]

\[ P(z < -0.95) = 0.1711 \]

Example 2
A sample of size n = 500 is selected. The probability that the sample proportion p̄ is greater than 0.71 is _____. Or, the proportion or percentage of p̄ values obtained from samples of size n = 500 that are greater than 0.71 is _____.

P(p̄ > 0.71) = ______

\[ \operatorname{se}(\bar{p}) = 0.0210 \]

\[ z = \frac{0.71 - 0.67}{0.0210} = 1.90 \]

\[ P(z > 1.90) = 0.0287 \]

Example 3
A sample of size n = 600 is selected. The probability that the sample proportion p̄ is between 0.63 and 0.71 is _____. Or, the proportion or percentage of p̄ values obtained from samples of size n = 600 that are between 0.63 and 0.71 is _____.

P(0.63 < p̄ < 0.71) = _____

\[ \operatorname{se}(\bar{p}) = \sqrt{\frac{0.67(1 - 0.67)}{600}} = 0.0192 \]

\[ z = \frac{0.63 - 0.67}{0.0192} = -2.08 \qquad z = \frac{0.71 - 0.67}{0.0192} = 2.08 \]

\[ P(-2.08 < z < 2.08) = 0.9624 \]
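The proportion examples use the same z-transformation with se(p̄) in place of se(x̄). A hedged sketch follows; the helper names `se_proportion` and `z_value` are hypothetical, and z is rounded to two decimals to mirror a table lookup.

```python
from math import sqrt
from statistics import NormalDist

PI_POP = 0.67               # population proportion from the examples
STD_NORMAL = NormalDist()

def se_proportion(n):
    """Standard error of the sample proportion for samples of size n."""
    return sqrt(PI_POP * (1 - PI_POP) / n)

def z_value(p_bar, n):
    """z-value of a sample proportion, rounded to two decimals."""
    return round((p_bar - PI_POP) / se_proportion(n), 2)

p1 = STD_NORMAL.cdf(z_value(0.65, 500))                                  # Example 1
p2 = 1 - STD_NORMAL.cdf(z_value(0.71, 500))                              # Example 2
p3 = STD_NORMAL.cdf(z_value(0.71, 600)) - STD_NORMAL.cdf(z_value(0.63, 600))  # Example 3

print(round(se_proportion(500), 4), round(se_proportion(600), 4))
print(round(p1, 4), round(p2, 4))
```

The third probability differs from the table answer only in the fourth decimal, since a printed table rounds each area before the subtraction.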
The parent population parameter: π = 0.67

Example 4
A sample of size n = 800 is selected. The probability that the sample proportion is within ±0.03 (3 percentage points) of the population proportion is _____. Or, the proportion or percentage of p̄ values obtained from samples of size n = 800 that are within ±0.03 of the population proportion is _____.

P(π − 0.03 < p̄ < π + 0.03) = _____

\[ \operatorname{se}(\bar{p}) = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.67(1 - 0.67)}{800}} = 0.0166 \]

\[ z = \frac{\bar{p} - \pi}{\operatorname{se}(\bar{p})} = \frac{0.64 - 0.67}{0.0166} = -1.81 \qquad z = \frac{0.70 - 0.67}{0.0166} = 1.81 \]

\[ P(-1.81 < z < 1.81) = 0.9298 \]

Nearly 93% of sample proportions (for samples of n = 800) fall within ±0.03 (3 percentage points) of the population proportion. Alternatively, nearly 93% of sample proportions deviate from the population proportion by no more than 0.03.

Example 5
A sample of size n = 800 is selected. The probability that the sample proportion is within ±1.5 standard errors of the population proportion is _____. Or, the proportion or percentage of p̄ values obtained from samples of size n = 800 that are within ±1.5 standard errors of the population proportion is _____.

P[π − 1.50 se(p̄) < p̄ < π + 1.50 se(p̄)] = _____

[Figure: normal sampling distribution of p̄, with 0.645, 0.67, and 0.695 marked on the p̄ axis]

\[ \operatorname{se}(\bar{p}) = 0.0166 \]

\[ \pi \pm 1.50\operatorname{se}(\bar{p}) = 0.67 \pm 0.025 \]

\[ P(0.645 < \bar{p} < 0.695) = P(-1.50 < z < 1.50) = 0.8664 \]

Again, you can see that the calculations were redundant. "1.50 standard errors from the population proportion" implies "z = ±1.50".
The parent population parameter: π = 0.67. Sample size: n = 900.

\[ \operatorname{se}(\bar{p}) = \sqrt{\frac{0.67(1 - 0.67)}{900}} = 0.0157 \]

Example 6
The interval which contains 80% of p̄ values is

P(____ < p̄ < ____) = 0.80

Find the quantity to be subtracted from and added to π to get the lower and upper ends of the desired interval.

\[ z = \frac{\bar{p} - \pi}{\operatorname{se}(\bar{p})} \quad\Rightarrow\quad \bar{p} = \pi \pm z_{0.10}\operatorname{se}(\bar{p}) \]

Since each tail area under the normal curve is 0.10, z_0.10 = 1.28.

\[ \bar{p}_L = 0.67 + (-1.28)(0.0157) = 0.67 - 0.02 = 0.65 \]

\[ \bar{p}_U = 0.67 + (1.28)(0.0157) = 0.67 + 0.02 = 0.69 \]

The term z · se(p̄) is called the "margin of sampling error", or, more simply, MARGIN OF ERROR (MOE). The combined area of the two tails is called the error probability and is denoted by α. Here, α = 0.20.

Example 7
The interval which contains 90% of p̄ values is

P(____ < p̄ < ____) = 0.90

\[ \alpha = 1 - 0.90 = 0.10 \qquad \alpha/2 = 0.05 \qquad z_{\alpha/2} = z_{0.05} = 1.64 \]

\[ \mathit{MOE} = z_{\alpha/2}\operatorname{se}(\bar{p}) = (1.64)(0.0157) = 0.026 \]

\[ \bar{p}_L = 0.67 - 0.026 = 0.644 \qquad \bar{p}_U = 0.67 + 0.026 = 0.696 \]

Example 8
The interval which contains 95% of p̄ values is

P(____ < p̄ < ____) = 0.95

\[ \alpha = 1 - 0.95 = 0.05 \qquad \alpha/2 = 0.025 \qquad z_{\alpha/2} = z_{0.025} = 1.96 \]

\[ \mathit{MOE} = z_{\alpha/2}\operatorname{se}(\bar{p}) = (1.96)(0.0157) = 0.031 \]

\[ \bar{p}_L = 0.67 - 0.031 = 0.639 \qquad \bar{p}_U = 0.67 + 0.031 = 0.701 \]
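The three intervals for n = 900 can be generated in one loop. This is a sketch under the same assumptions as the hand calculations (π = 0.67, normal sampling distribution); the function name `central_interval` is hypothetical, and the critical values come from `NormalDist().inv_cdf` rather than a table.

```python
from math import sqrt
from statistics import NormalDist

pi_pop, n = 0.67, 900
se = sqrt(pi_pop * (1 - pi_pop) / n)       # about 0.0157

def central_interval(coverage):
    """Bounds that contain the middle `coverage` share of sample proportions."""
    alpha = 1 - coverage                   # combined area of the two tails
    z = NormalDist().inv_cdf(1 - alpha / 2)
    moe = z * se                           # margin of error
    return pi_pop - moe, pi_pop + moe

for coverage in (0.80, 0.90, 0.95):
    low, high = central_interval(coverage)
    print(coverage, round(low, 3), round(high, 3))
```

To three decimals the bounds match the hand-computed intervals, since the small differences in the critical values wash out at this precision.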
Determining the Sample Size for a Given Margin of Error
In Example 8, where n = 900, we determined that the interval containing 95% of p̄ values lies within ±0.031 (3.1 percentage points) of π. That is, the 95% margin of error (MOE) turned out to be 0.031 when n = 900:

\[ P\left(\pi - 1.96\sqrt{\frac{0.67(1 - 0.67)}{900}} \le \bar{p} \le \pi + 1.96\sqrt{\frac{0.67(1 - 0.67)}{900}}\right) = 0.95 \]

\[ P(\pi - 0.031 \le \bar{p} \le \pi + 0.031) = 0.95 \]

Generally,

\[ P\left[\pi - z_{\alpha/2}\operatorname{se}(\bar{p}) \le \bar{p} \le \pi + z_{\alpha/2}\operatorname{se}(\bar{p})\right] = 1 - \alpha \]

This expression means that 100(1 − α)% of sample proportions fall within MOE = ±z_{α/2} se(p̄) of the population proportion. Since se(p̄) = √(π(1 − π)/n),

\[ P\left(\pi - z_{\alpha/2}\sqrt{\frac{\pi(1 - \pi)}{n}} \le \bar{p} \le \pi + z_{\alpha/2}\sqrt{\frac{\pi(1 - \pi)}{n}}\right) = 1 - \alpha \]

This shows that the margin of error is inversely related to the sample size n (more precisely, to √n). The larger the sample size, the smaller the MOE. In Example 8, since we have chosen the error probability to be α = 0.05,

\[ P\left(\pi - z_{0.025}\sqrt{\frac{\pi(1 - \pi)}{n}} \le \bar{p} \le \pi + z_{0.025}\sqrt{\frac{\pi(1 - \pi)}{n}}\right) = 0.95 \]

In this interval, 95% of p̄ values deviate from π by no more than 0.031 (3.1 percentage points) in either direction. Now suppose we are interested in an interval in which the middle 95% of p̄ values deviate from π by no more than 0.02 (2 percentage points). That is, we want the 95% margin of error to be 0.02:

\[ P(\pi - 0.02 \le \bar{p} \le \pi + 0.02) = 0.95 \]

This smaller MOE, a narrower interval, requires a different sample size. How do we determine the proper n to obtain this MOE? Consider the MOE formula:

\[ \mathit{MOE} = z_{\alpha/2}\sqrt{\frac{\pi(1 - \pi)}{n}} \]

Solving for n, we have

\[ n = \left(\frac{z_{\alpha/2}}{\mathit{MOE}}\right)^2 \pi(1 - \pi) \]

Thus, to obtain a 95% margin of error of 0.02, the minimum sample size should be:

\[ n = \left(\frac{1.96}{0.02}\right)^2 0.67(1 - 0.67) = 2123.44 \]

Since we are interested in the minimum sample size, round n up to the next integer:

n = 2124
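As with the sample mean, the sample-size formula for a proportion is easy to sketch in code. The function name `min_sample_size_proportion` is an illustrative assumption; the inputs are the values used in the calculation above.

```python
from math import ceil

def min_sample_size_proportion(z, pi_pop, moe):
    """Smallest n for which z * sqrt(pi(1 - pi) / n) does not exceed moe."""
    return ceil((z / moe) ** 2 * pi_pop * (1 - pi_pop))

n_required = min_sample_size_proportion(z=1.96, pi_pop=0.67, moe=0.02)
print(n_required)  # 2124
```

Reducing the margin of error from 0.031 to 0.02 more than doubles the required sample, from 900 to 2,124, because n grows with the square of the reduction factor.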