5 Outline

Statistical Estimation of a Population Parameter
Statistical estimation involves both point estimation and interval estimation.
Point estimation involves a single number computed from the sample data, e.g., the sample mean, as an estimate of the population parameter. It is called
a "point" estimate because one point on the real number line is used to estimate the population parameter.
Interval estimation involves a range of values which may contain the population parameter. Associated with this interval is a percentage value
reflecting the level of confidence that the interval may contain the actual population parameter.
Confidence Interval (Interval Estimates) for the Population Mean μ
Background: Review of the Sampling Distribution of x̄ and the Margin of Sampling Error
From a population with mean µ = $190 and standard deviation σ = $64, samples of size n = 100 are selected. The sampling distribution of x̄, the sample
means computed from these samples, will be a normal curve with mean µ_x̄ = µ = 190 and standard error se(x̄) = σ ∕ √n = 6.4. Determine the
boundaries of the interval that contains 95% of all x̄ values computed from such samples.
\[ P(\bar{x}_1 < \bar{x} < \bar{x}_2) = 0.95 \]
\[ z = \frac{\bar{x} - \mu}{\mathrm{se}(\bar{x})} \]
\[ \bar{x}_1 = \mu - z_{0.025}\,\mathrm{se}(\bar{x}) \]
\[ \bar{x}_2 = \mu + z_{0.025}\,\mathrm{se}(\bar{x}) \]
\[ P\big(\mu - z_{0.025}\,\mathrm{se}(\bar{x}) < \bar{x} < \mu + z_{0.025}\,\mathrm{se}(\bar{x})\big) = 0.95 \]
\[ \mathrm{MOE} = z_{0.025}\,\mathrm{se}(\bar{x}) = z_{0.025}\,\frac{\sigma}{\sqrt{n}} = 1.96(6.4) = 12.5 \]
\[ P(\mu - 12.5 < \bar{x} < \mu + 12.5) = 0.95 \]
Thus, 95% of all sample means from random samples of n = 100 would fall
within MOE = ±$12.5 from the population mean µ = $190. If 95% of ð‘ĨĖ… ’s fall
within MOE, then the error probability, the probability that a sample mean
falls outside MOE, is α = 5%.
P(190 − 12.5 < x̄ < 190 + 12.5) = 0.95
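The margin-of-error arithmetic above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `margin_of_error` is chosen here for illustration and is not part of the notes. It uses the exact normal quantile rather than the rounded 1.96, so later decimals may differ slightly.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma, n, confidence=0.95):
    """Return z_{alpha/2} * sigma / sqrt(n) for the given confidence level."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    return z * sigma / sqrt(n)

mu, sigma, n = 190, 64, 100                   # population values from the text
moe = margin_of_error(sigma, n)
lower, upper = mu - moe, mu + moe

print(f"se(x-bar) = {sigma / sqrt(n):.1f}")
print(f"MOE = {moe:.1f}")
print(f"95% of sample means fall between {lower:.1f} and {upper:.1f}")
```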
Example
One of the infinite number of random samples of n = 100 selected from the
above population yields the following data:
205 282 142 115 230 106 108 129 123 223
 81 243 195 142 160 167 292 153 215 277
149 265 132 254 175 279  93 121 179 292
215 171 105 266 211 112 246 167 175 156
139 160 273  90 262 190 273 205  98 240
225 125 136 160 199 147 199 248 132 118
200 299 210 189 182 123 168 149 207 220
297  92 131 101 120 250 269 290 157 287
207 221 194 144 182 194 106 145  91 161
149  94 293 117 170 202 227  81 247 227
The mean of this sample is
x̄ = Σx ∕ n = 18363 ∕ 100 = 183.63
This sample mean is one of the 95% of all sample means that fall within the
interval µ ± MOE = 190 ± 12.5.
Building Intervals Around x̄
Now, using MOE = $12.5, build an interval around the sample mean
x̄ = 183.6:
L = 183.6 – 12.5 = 171.1
U = 183.6 + 12.5 = 196.1
This interval contains or includes the population mean µ = 190.
Another sample (data not shown) yields a sample mean x̄ = 196.7. Using
the same MOE = $12.5, build an interval around this new sample mean:
L, U = (184.2, 209.2). This interval also contains µ = 190.
Let’s select a third sample, compute the mean, and build an interval
around this x̄. The new mean is x̄ = 181.0 and the new interval is L, U =
(168.5, 193.5). This interval also contains µ.
Now consider an interval that does not contain the population mean.
This time x̄ = 204.2 lies more than one MOE from µ (204.2 − 190 = 14.2 > 12.5).
Therefore, the interval L, U = (191.7, 216.7) does not contain µ = 190. A situation
like this happens 5% of the time. That is, 5% of sample means
generate intervals that do not contain the population mean.
Now you can see that in 95% of cases the intervals built around different
sample means will contain µ. The reason is that 95% of x̄ values deviate
from µ by no more than MOE = $12.5. So if you build an interval around
each x̄ using MOE = 12.5, about 95 of every 100 such intervals
will contain the population mean.
Thus, when you select a sample of size n from this population and build
an interval around the sample mean, then you are 95% confident that
this interval contains the population mean µ.
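The 95% coverage claim can be illustrated with a small simulation. The sketch below assumes the population is normal with µ = 190 and σ = 64; the notes do not state the population's shape, and the number of trials and the random seed are arbitrary choices made here. It draws repeated samples of n = 100, builds x̄ ± MOE around each sample mean, and counts how often the interval contains µ.

```python
import random
from math import sqrt

def coverage(mu=190.0, sigma=64.0, n=100, z=1.96, trials=2000, seed=1):
    """Fraction of intervals x-bar +/- MOE that contain mu over repeated samples."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    moe = z * sigma / sqrt(n)          # same MOE = 12.5 used in the text
    hits = 0
    for _ in range(trials):
        # Assumption: a normal population; the notes only give mu and sigma.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        if xbar - moe < mu < xbar + moe:
            hits += 1
    return hits / trials

rate = coverage()
print(f"{rate:.1%} of the simulated intervals contain mu")
```

The printed rate should land close to 95%, with some run-to-run variation if the seed is changed.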
Building a Confidence Interval for µ When σ is Not Known
Example 1
To build an interval with a 95% confidence level for vehicle speed on a freeway, a random sample of n = 100 vehicles is clocked and the following
sample data are obtained. The data are in miles per hour.
71 68 71 78 76 87 66 83 89 83
73 81 86 84 82 79 79 72 89 82
72 67 66 86 89 74 86 89 76 79
86 82 70 90 78 66 80 73 75 65
87 65 76 66 71 75 73 69 90 74
70 80 70 78 66 71 68 82 86 89
72 86 85 79 70 71 73 79 76 83
71 90 77 80 77 81 78 69 79 86
75 71 78 73 71 72 80 76 76 82
78 87 90 65 90 88 88 87 89 73
Using Excel, compute the sample mean and standard deviation.
\[ \bar{x} = \frac{\sum x}{n} = 77.85 \]
\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = 7.41 \]
The interval is built around x̄ = 77.85. The specified “confidence level” is 95%. We want to be 95% confident that the interval we build contains the
population mean. This means that the error probability α (the probability that the interval does not contain the mean) is 5%.
The formula for the confidence interval is:
L, U = x̄ ± MOE
MOE = zα/2 se(x̄)
Given α = 0.05, zα/2 = z0.025 = 1.96. To compute se(x̄), previously we used the formula se(x̄) = σ ∕ √n. But this formula is no longer applicable because
σ, the population standard deviation, is not known. Instead of σ we must use s, the standard deviation computed from the sample. The standard error
formula then becomes:
se(x̄) = s ∕ √n
se(x̄) = 7.41 ∕ √100 = 0.741
MOE = (1.96)(0.741) = 1.45 mph
L, U = 77.85 ± 1.45 = (76.4, 79.3). We can state, with a 95% level of confidence, that the average speed of all vehicles on the freeway is between 76.4 and
79.3 mph.
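As a rough check of Example 1, the sketch below rebuilds the interval from the summary statistics given above (x̄ = 77.85, s = 7.41, n = 100). The function name `z_interval` is hypothetical. The loop over other confidence levels is an addition here, included only to show how the interval widens as the confidence level rises.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, s, n, confidence=0.95):
    """Large-sample interval for the mean, using s in place of sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    moe = z * s / sqrt(n)              # MOE = z_{alpha/2} * s / sqrt(n)
    return xbar - moe, xbar + moe, moe

# Summary statistics from Example 1 (vehicle speeds, mph)
for confidence in (0.90, 0.95, 0.99):
    lower, upper, moe = z_interval(77.85, 7.41, 100, confidence)
    print(f"{confidence:.0%}: MOE = {moe:.2f} mph, "
          f"interval = ({lower:.1f}, {upper:.1f})")
```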
Example 2
To build a 95% confidence interval for the average commuting distance
travelled by IUPUI students from their residence to the campus, a random
sample of n = 120 students provided the following data (in miles).
15  8  6  8 10 23 22 19 16  6
 7  8 10  8 21  7 15 14 13 22
24  2 12 14 13  4  4 25 17 25
 7  6 25  9 10 23 19  6  9 12
10  7  1 13 19 20  5 14 22 23
 8  6 11 18 12 21 13 18 24 22
14 25 12  5 16 13 23  7 16  7
 3 24 20  7 13  1  5  2 12 17
 1  1 23 15 13  5  5  8  6 23
12 21 13  9  5 11 12 22  5 15
16  4  9 17  2  2 20 10 17  8
13  4 13  9 15 24  2 17  5  9
Using Excel, compute the sample mean and standard deviation:
x̄ = 12.45
s = 6.896
The confidence interval for µ:
L, U = x̄ ± MOE
MOE = zα/2 se(x̄)
zα/2 = z0.025 = 1.96
se(x̄) = s ∕ √n = 6.896 ∕ √120 = 0.63
MOE = 1.96(0.63) = 1.23 miles
L, U = 12.45 ± 1.23 = (11.22, 13.68)
Example 3
To develop a 95% confidence interval for the mean amount spent per
customer at a famous downtown Indianapolis steakhouse, data were
collected for a sample of 108 customers, shown below. The data are
rounded to the nearest dollar.
33 34 44 38 54 33 28 45 31 42 46 32
29 56 37 51 44 53 39 30 41 31 43 49
41 32 58 43 32 32 32 54 46 57 39 53
49 34 28 36 56 46 52 45 42 39 53 35
32 52 32 52 31 58 44 32 34 36 47 45
53 28 51 32 54 47 48 40 58 58 47 42
38 54 42 47 53 40 31 44 53 45 48 38
55 57 29 38 45 42 45 57 45 34 57 42
50 33 33 35 54 31 34 33 57 30 41 56
Using Excel, compute the sample mean and standard deviation:
x̄ = $42.74
s = $9.15
The confidence interval for µ:
L, U = x̄ ± MOE
MOE = zα/2 se(x̄)
zα/2 = z0.025 = 1.96
se(x̄) = s ∕ √n = 9.15 ∕ √108 = 0.88
MOE = 1.96(0.88) = $1.72
L, U = 42.74 ± 1.72 = ($41.02, $44.46)
Confidence Intervals for µ Using “Small” Samples
When building a confidence interval for an unknown population parameter, we determine the MOE using a standard error computed from the
sample standard deviation s as an estimate of the population standard deviation σ. Thus, we are using an estimated value, s, to build an interval around
another estimated value, x̄. This adds an extra level of uncertainty to our interval estimate, and the consequence is a wider margin of error.
When the sample size is large, the impact of the added uncertainty on the MOE is inconsequential and can be
ignored. But with small samples we cannot ignore the impact.
Margin of Error with Small Samples
Generally, when the sample size n is 100 or more (n ≥ 100), the sample is considered a large sample. Accordingly, the MOE formula is:
\[ \mathrm{MOE} = z_{\alpha/2}\,\frac{s}{\sqrt{n}} \]
But if the sample size is less than 100 (n < 100), the sample is considered a small sample, and an alternative MOE formula is used:
\[ \mathrm{MOE} = t_{\alpha/2,\,df}\,\frac{s}{\sqrt{n}} \]
The term tα/2, df is a score from the t-distribution, which is explained next.
The z-distribution and t-distribution compared
The standard normal z-distribution is a bell-shaped distribution with the
mean of zero (0) and standard deviation of one (1). For a given error
probability α, say, α = 0.05,
zα/2 = z0.025 = 1.96
The t-distribution is also a bell-shaped distribution with a mean of
zero. However, the standard deviation of t is not 1. The standard
deviation of t varies with the degrees of freedom (df) of the distribution.
This is why the term df appears in the subscript of the t-score.
The distribution’s degrees of freedom is defined as df = n – 1 (sample
size minus 1).
When the distribution’s df = 9, the standard deviation is 1.13. (Don’t
worry about the formula to compute the standard deviation of t!) With
a bigger standard deviation, the t-scores are more widely dispersed
around the mean of 0 than z-scores. Therefore, the t-score which bounds a
tail area of 0.025 (t = 2.262) is located farther from zero than the
z-score bounding the same tail area (z = 1.96).
The following table shows that as the sample size (and degrees of
freedom) increases, the t-score converges to the z-score.
Tail area = 0.025
Degrees of freedom    10     20     30     50     100    200    1000
t-score               2.228  2.086  2.042  2.009  1.984  1.972  1.962
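You can find the t-score for different tail areas and degrees of freedom
using the t-table or Excel. Excel’s =TINV function takes the two-tailed probability α and the
degrees of freedom; for example, =TINV(0.05, 9) returns 2.262.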
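For readers without a t-table or Excel at hand, the t-scores can also be approximated numerically. The sketch below is one possible approach, not the method used in these notes. It integrates the t density with Simpson's rule and finds the score by bisection, using only the Python standard library. All function names are hypothetical, and the search bracket of 0 to 50 is an assumption that comfortably covers a 0.025 tail area.

```python
from math import exp, lgamma, log, pi

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))

def upper_tail(t, df, steps=1000):
    """P(T > t) for t >= 0, via Simpson's rule on [0, t] (steps must be even)."""
    if t == 0:
        return 0.5
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_score(tail_area, df):
    """Find t with P(T > t) = tail_area by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if upper_tail(mid, df) > tail_area:
            lo = mid                  # tail still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

for df in (9, 10, 20, 30, 50, 100, 200, 1000):
    print(df, round(t_score(0.025, df), 3))
```

The printed values can be compared against the table above; as df grows they approach the z-score of 1.96.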
Example 4
To build a 95% confidence interval for the average lifespan of compact
fluorescent light bulbs (CFL), a sample of n = 10 light bulbs was tested and
the following data (in hours) were obtained:
7010 8120 6670 9300 9450
9980 8600 6820 9750 8300
Using Excel or a calculator, the mean and standard deviation of the sample
are:
x̄ = 8400
s = 1237.435
L, U = x̄ ± MOE
\[ \mathrm{MOE} = t_{\alpha/2,\,df}\,\frac{s}{\sqrt{n}} \]
n = 10, df = 10 – 1 = 9, α = 0.05
t0.025, 9 = 2.262
MOE = (2.262)(1237.435 ∕ √10) = 885.15 hours
L, U = 8400 ± 885.15 = (7514.85, 9285.15) hours
Example 5
To build a 95% confidence interval for the average time he spends
commuting from his residence in Fishers to his office in downtown
Indianapolis, Bob kept track of his time on 15 randomly selected days in
a three-month period and recorded the following data (in minutes).
42 55 58 63 58 43 62 56 54 48 62 61 53 40 55
x̄ = 54
s = 7.531
se(x̄) = 7.531 ∕ √15 = 1.944
t0.025, 14 = 2.145
MOE = 2.145(1.944) = 4.2 minutes
L, U = 54 ± 4.2 = (49.8, 58.2) minutes
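Examples 4 and 5 are small enough to recompute from the raw data. The sketch below uses Python's `statistics` module in place of Excel, with the t-scores (2.262 and 2.145) taken directly from the text rather than computed. The helper name `t_interval` is chosen here for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(data, t_crit):
    """Small-sample interval: x-bar +/- t * s / sqrt(n), with t supplied by the caller."""
    n = len(data)
    xbar = mean(data)
    s = stdev(data)                   # sample standard deviation (divides by n - 1)
    moe = t_crit * s / sqrt(n)
    return xbar, s, moe

bulbs = [7010, 8120, 6670, 9300, 9450, 9980, 8600, 6820, 9750, 8300]
commute = [42, 55, 58, 63, 58, 43, 62, 56, 54, 48, 62, 61, 53, 40, 55]

# t-scores are the values quoted in the text: df = 9 and df = 14, tail area 0.025
for label, data, t_crit in (("bulbs", bulbs, 2.262), ("commute", commute, 2.145)):
    xbar, s, moe = t_interval(data, t_crit)
    print(f"{label}: mean = {xbar:.2f}, s = {s:.3f}, MOE = {moe:.2f}, "
          f"interval = ({xbar - moe:.2f}, {xbar + moe:.2f})")
```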
Minimum Sample Size for a Desired Margin of Error (Desired Confidence Interval Width)
When building a confidence interval, the narrower the interval, the more precise, and hence the more useful, the interval estimate. To make the
interval more precise we must make the MOE smaller, and to do so we must increase the sample size. The required
sample size therefore depends on the margin of error desired for the statistical analysis.
Example 6
In Example 4 a sample of n = 10 light bulbs yielded a margin of error of MOE = 885.15 hours. Suppose we are interested in a narrower MOE, say,
MOE = 200 hours, for a 95% confidence interval. What is the required minimum sample size for this margin of error?
The formula to find the minimum sample size is as follows:
\[ n = \left( \frac{z_{\alpha/2}\,\hat{\sigma}}{\mathrm{MOE}} \right)^2 \]
The new term in the formula is σ̂ (sigma hat), which is called the “planning value” for the standard deviation (see Chapter 5 notes about how to obtain a
planning value).
For the light bulb example use σ̂ = 1200 hours.
\[ n = \left( \frac{1.96 \times 1200}{200} \right)^2 = 138.3 \]
n = 139 light bulbs (always round the result UP)
Example 7
To build a 95% CI for Bob’s average commuting time in Example 5 with a margin of error of ±2 minutes, what is the minimum number of days Bob should
keep track of his commuting time? Use σ̂ = 8 minutes as the planning value.
\[ n = \left( \frac{1.96 \times 8}{2} \right)^2 = 61.47 \]
n = 62 days
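The sample-size formula for a mean, including the round-up step, can be sketched as a short function. The name `min_sample_size` is hypothetical. The function uses the exact normal quantile instead of the rounded 1.96, which leaves both rounded-up answers unchanged here.

```python
from math import ceil
from statistics import NormalDist

def min_sample_size(planning_sd, moe, confidence=0.95):
    """Smallest n with z * planning_sd / sqrt(n) <= moe (result rounded UP)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * planning_sd / moe) ** 2)

print(min_sample_size(1200, 200))   # light bulbs, planning value 1200 hours
print(min_sample_size(8, 2))        # Bob's commute, planning value 8 minutes
```

Rounding up rather than to the nearest integer guarantees that the achieved margin of error does not exceed the target.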
Confidence Interval for the Population Proportion π
To build a confidence interval for the population proportion, first compute the sample proportion p̄ from the sample and then determine the
margin of error using p̄.
L, U = p̄ ± MOE
MOE = zα/2 se(p̄)
To compute se(p̄) you must use p̄ as an estimate of π in the standard error formula.
\[ \mathrm{se}(\bar{p}) = \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} \]
Example 7
To build a 95% confidence interval for the proportion of Indiana residents who smoke cigarettes, a sample of 750 Hoosiers was surveyed, and 195 said they smoked
cigarettes regularly. Compute the sample proportion and the margin of error to build the interval.
p̄ = 195 ∕ 750 = 0.26
\[ \mathrm{se}(\bar{p}) = \sqrt{\frac{0.26(1 - 0.26)}{750}} = 0.0160 \]
α = 1 – 0.95 = 0.05
zα/2 = z0.025 = 1.96
MOE = 1.96(0.016) = 0.031
MOE ≈ 0.03
L, U = 0.26 ± 0.03 = (0.23, 0.29)
We are 95% confident that the proportion of Hoosiers who smoke cigarettes is between 0.23 and 0.29.
Example 8
In a recent poll, 737 of the 1,100 Americans surveyed said high gas prices have caused them financial hardship. Build a 95% confidence
interval for the proportion of all Americans who feel high gas prices have caused them financial hardship.
p̄ = 737 ∕ 1100 = 0.67
\[ \mathrm{se}(\bar{p}) = \sqrt{\frac{0.67(1 - 0.67)}{1100}} = 0.0142 \]
α = 1 – 0.95 = 0.05
zα/2 = z0.025 = 1.96
MOE = 1.96(0.0142) = 0.028
MOE ≈ 0.03
L, U = 0.67 ± 0.03 = (0.64, 0.70)
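The two proportion examples follow the same steps, so they can be sketched with one helper. The function name `proportion_interval` is hypothetical. The code keeps full precision throughout, whereas the worked examples round the MOE to 0.03 before building the interval, so the printed limits may differ in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Return (p_bar, se, moe) for a large-sample proportion interval."""
    p_bar = successes / n
    se = sqrt(p_bar * (1 - p_bar) / n)        # p_bar stands in for pi
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return p_bar, se, z * se

for label, x, n in (("smokers", 195, 750), ("gas prices", 737, 1100)):
    p_bar, se, moe = proportion_interval(x, n)
    print(f"{label}: p = {p_bar:.2f}, se = {se:.4f}, MOE = {moe:.3f}, "
          f"interval = ({p_bar - moe:.3f}, {p_bar + moe:.3f})")
```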
Minimum Sample Size for a Desired Margin of Error
Depending on the context of the statistical investigation, the researcher may want a specific width or precision for the interval estimate of
the population proportion. For a given confidence level, the narrower the interval estimate, the more precise the estimate of the population parameter.
The precision of the interval estimate depends on the margin of error, which varies inversely with the square root of the sample size:
\[ \mathrm{MOE} = z_{\alpha/2}\,\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} \]
Thus, we can specify the MOE in advance of the study and determine the sample size that would yield that margin of error. We can rearrange the MOE
formula to solve for n:
\[ n = \left( \frac{z_{\alpha/2}}{\mathrm{MOE}} \right)^2 \hat{\pi}(1 - \hat{\pi}) \]
Note that in the n formula we have replaced the symbol p̄ with π̂ (pi-hat). We cannot use p̄ because a sample proportion can only be computed from a sample
that has already been selected. The symbol π̂ stands for the “planning value”. In many cases a planning value of π̂ = 0.50 is used. This gives the largest
minimum sample size for a desired margin of error, because π̂(1 − π̂) is largest at π̂ = 0.50.
Example 9
To build a 95% confidence interval with a margin of error of ±0.03 (3 percentage points) for the proportion of likely voters who prefer the candidate Ima
Loozer in a statewide election, how many likely voters should be randomly contacted? Since this is a two-way race, a planning value of 0.50 is used.
α = 1 – 0.95 = 0.05
zα/2 = z0.025 = 1.96
MOE = 0.03
π̂ = 0.50
n = 0.5(0.5)(1.96 ∕ 0.03)² = 1067.11
n = 1068 (round UP)
Example 10
Suppose the election in Example 9 is a three-way race and there is a strong third-party candidate, Iwanna Winn. What is the minimum sample size for a
95% confidence interval with a margin of error of ±0.03 for the proportion of likely voters who prefer Ima Loozer? This time use a planning value of
π̂ = 0.35.
n = 0.35(0.65)(1.96 ∕0.03)² = 971.07
n = 972 (round UP)
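The sample-size calculation for a proportion can be sketched the same way. The function name `min_sample_size_proportion` is hypothetical, and the extra planning values in the loop are added here only to show that 0.50 gives the largest required sample.

```python
from math import ceil
from statistics import NormalDist

def min_sample_size_proportion(planning_value, moe, confidence=0.95):
    """n = (z / MOE)^2 * pi_hat * (1 - pi_hat), rounded UP."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z / moe) ** 2 * planning_value * (1 - planning_value))

# 0.50 and 0.35 are the planning values from the two election examples;
# the other values are illustrative.
for pi_hat in (0.10, 0.35, 0.50, 0.65, 0.90):
    print(pi_hat, min_sample_size_proportion(pi_hat, 0.03))
```

Planning values equally far from 0.50 on either side require the same sample size, since π̂(1 − π̂) is symmetric around 0.50.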