3680 Lecture 15

Math 3680
Lecture #15
Confidence Intervals
Review: Suppose that $E(X) = \mu$ and $SD(X) = \sigma$.
Recall the following two facts about the average $\bar{X}$ of
n observations drawn with replacement:
$E(\bar{X}) = \mu$
$SD(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \sigma_{\bar{X}}$
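These two facts can be checked by simulation. The sketch below is only an illustration: the box (a fair die), the number of draws, and the number of repetitions are hypothetical choices, not values from the lecture.

```python
import random
import statistics

random.seed(0)

box = [1, 2, 3, 4, 5, 6]   # hypothetical box (a fair die), not from the lecture
n = 25                     # draws per sample, made with replacement
reps = 20_000              # number of repeated samples

box_average = statistics.fmean(box)
box_sd = statistics.pstdev(box)

# Average of n draws with replacement, repeated many times.
means = [statistics.fmean(random.choices(box, k=n)) for _ in range(reps)]

sim_mean = statistics.fmean(means)    # should be close to the box average
sim_se = statistics.pstdev(means)     # should be close to box_sd / sqrt(n)
theory_se = box_sd / n ** 0.5

print(f"average of sample averages: {sim_mean:.3f} (box average {box_average})")
print(f"SD of sample averages:      {sim_se:.3f} (theory {theory_se:.3f})")
```

The simulated values will not match the theoretical ones exactly, but they should agree to within a few percent.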
Estimation
Example: A university has 25,000 registered
students. In a survey of 318 students, the average
age of the sample is found to be 22.4, with a sample
SD of 4.5 years. Estimate the average age of all
25,000 students, and attach a standard error to this
estimate.
Wrong Answer: The average age of the
student body is exactly 22.4 years.
What is wrong with this simplistic analysis?
Answer: Of course, we estimate the
average of the population to be 22.4 years –
but this estimate will not be exact. To
determine the magnitude of the error, we
need to find the SE, and that means a box
model.
25,000 tickets
Average = ??
SD = ??
318 draws
Bootstrap Estimation: Although the SD of the box
is unknown, we estimate it by the SD of the sample:
SD of box $\approx 4.5$
SE of the sample average $\approx \sqrt{\frac{25000 - 318}{25000 - 1}} \cdot \frac{4.5}{\sqrt{318}} \approx 0.251$.
(Why?)
Conclusion: The average age is about 22.4 years,
give or take 0.251 years or so.
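The arithmetic behind this standard error can be reproduced in a few lines. This is a minimal sketch using only the numbers in the example; the variable names are illustrative.

```python
import math

N = 25_000        # tickets in the box (registered students)
n = 318           # draws (students surveyed), made without replacement
sample_sd = 4.5   # sample SD, used as the estimate of the SD of the box

# Correction factor for drawing without replacement from a finite box.
correction = math.sqrt((N - n) / (N - 1))

se = correction * sample_sd / math.sqrt(n)
print(round(correction, 4))  # 0.9936
print(round(se, 3))          # 0.251
```

Because the sample is a small fraction of the box, the correction factor is close to 1 and changes the answer only slightly.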
Confidence Intervals:
Large samples or known $\sigma$
[Figure: standard normal curve; the central 68% of the area lies between -0.994458 and 0.994458.]
We say that the range
22.4 ± 0.251 years = 22.149 to 22.651 years
is a 68% confidence interval for the average age of the
population.
[Figure: standard normal curve; the central 95% of the area lies between -1.95996 and 1.95996.]
We say that the range
22.4 ± (1.96)(0.251) years = 21.909 to 22.891 years
is a 95% confidence interval for the average age of the
population.
[Figure: standard normal curve; the central 99.7% of the area lies between -2.96774 and 2.96774.]
We say that the range
22.4 ± (2.968)(0.251) years = 21.656 to 23.144 years
is a 99.7% confidence interval for the average age of
the population.
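The multipliers and intervals for these three confidence levels can be recomputed with Python's standard library. This is a sketch, not part of the lecture. It assumes the SE is carried at 0.2507 (its value before rounding to 0.251), which is what reproduces the 95% and 99.7% endpoints quoted here.

```python
from statistics import NormalDist

average = 22.4
se = 0.2507   # assumption: SE before rounding to 0.251

intervals = {}
for level in (0.68, 0.95, 0.997):
    # Central area `level` leaves (1 - level) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    intervals[level] = (z, average - z * se, average + z * se)
    print(f"{level:.1%}: z = {z:.5f}, "
          f"interval = {average - z * se:.3f} to {average + z * se:.3f}")
```

The 68% interval from this sketch is slightly narrower than the one quoted in the lecture, which uses a multiplier of 1 rather than 0.994.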
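[Figure: standard normal curve; the central area $1 - 2\alpha$ lies between $z_{\alpha}$ and $z_{1-\alpha}$.]
In general, we say that the range
$\bar{X} + z_{\alpha} \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{1-\alpha} \frac{\sigma}{\sqrt{n}}$
is a $1 - 2\alpha$ confidence interval for the population
average $\mu$.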
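Logic:
$P\left( z_{\alpha} < Z < z_{1-\alpha} \right) = 1 - 2\alpha$
$P\left( z_{\alpha} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{1-\alpha} \right) = 1 - 2\alpha$
$P\left( z_{\alpha} \frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < z_{1-\alpha} \frac{\sigma}{\sqrt{n}} \right) = 1 - 2\alpha$
$P\left( z_{\alpha} \frac{\sigma}{\sqrt{n}} < \mu - \bar{X} < z_{1-\alpha} \frac{\sigma}{\sqrt{n}} \right) = 1 - 2\alpha$ (multiply through by -1 and use the symmetry $z_{\alpha} = -z_{1-\alpha}$)
$P\left( \bar{X} + z_{\alpha} \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{1-\alpha} \frac{\sigma}{\sqrt{n}} \right) = 1 - 2\alpha$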
Observations:
1) We are NOT saying that 95% of the students
are between 21.9 and 22.9 years old – this is
patently ridiculous, of course.
2) We are NOT saying that there is a 95% chance
that the average age is between 21.9 and 22.9
years. The population average is constant – it is
either in this range or it is not.
Observations:
3) The correct interpretation is as follows: If many
people run this experiment independently and each finds a 95%
confidence interval, then the true population
parameter will lie in about 95% of these intervals.
[Figure: 100 different 95% confidence intervals; axis from 21.5 to 23.5.]
[Figure: 100 different 68% confidence intervals; axis from 21.75 to 23.25.]
[Figure: 100 different 95% confidence intervals, n = 4 × 318 = 1272; axis from 21.5 to 23.5.]
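The repeated-experiment interpretation can be explored with a small simulation. The sketch below is an illustration under stated assumptions: the population of 25,000 ages is hypothetical (generated from a normal curve with average 22.4 and SD 4.5), and the finite-population correction is ignored for simplicity.

```python
import random
from statistics import NormalDist, fmean, stdev

random.seed(1)

# Hypothetical population of 25,000 ages; not real data.
population = [random.gauss(22.4, 4.5) for _ in range(25_000)]
true_mean = fmean(population)

n = 318
z = NormalDist().inv_cdf(0.975)   # multiplier for a 95% interval


def interval_covers(sample):
    """Return True if the 95% interval built from `sample` contains true_mean."""
    m = fmean(sample)
    se = stdev(sample) / n ** 0.5
    return m - z * se <= true_mean <= m + z * se


trials = 1000
hits = sum(interval_covers(random.sample(population, n)) for _ in range(trials))
coverage = hits / trials
print(f"{coverage:.1%} of {trials} intervals contain the population average")
```

Each run of the experiment gives a different interval. The population average stays fixed, and roughly 95% of the intervals should capture it.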
Observations:
4) In the previous problem, we replaced the
population SD $\sigma$ with the sample SD $s$. (When did we do
this?) As it turns out, this makes little practical
difference for large samples.
More on this later when we consider small samples.
Observations:
5) The normal approximation has been used. As
discussed earlier, a large number of draws is
required for this assumption to hold.
6) Remember: There is no such thing as a 100%
confidence interval. In practice, scientists often
use 95% as a balance between a high confidence
level and a narrow confidence interval.
Example: In a simple random sample of 680
households (in a city of millions), the average
number of TV sets is 1.86, with an SD of 0.80. Find
a 95% confidence interval for the average number of
TV sets per household in the city.
True or false:
(i) 1.86 ± 0.06 is a 95%-confidence interval for this
population average.
(ii) 1.86 ± 0.06 is a 95%-confidence interval for this
sample average.
(iii) There is a 95% chance for the population
average to be in the range 1.86 ± 0.06.
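The margin of error of 0.06 used in these statements can be checked directly. This is a minimal sketch using the numbers from the example. Because the city has millions of households, no finite-population correction is applied.

```python
import math
from statistics import NormalDist

n = 680
sample_average = 1.86
sample_sd = 0.80

z = NormalDist().inv_cdf(0.975)          # about 1.96 for a 95% interval
se = sample_sd / math.sqrt(n)
margin = z * se

print(round(se, 4))                      # 0.0307
print(round(margin, 2))                  # 0.06
print(round(sample_average - margin, 2), round(sample_average + margin, 2))  # 1.8 1.92
```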
Example: The table below shows platelet counts
among 120 geriatric patients. Find a 95%
confidence interval for the average platelet count
among geriatric patients.
132 127 214 184 181 211 190 139 112 105
174 143 135 185 120 235 142 129 134 154
117 125 194 163 181 108 212 129 126 256
106 142 110 114 143 129 125 203 168 162
176 198 131 129 125 254 228 174 125 142
194 104 107 188 179 198 184 115 229 103
126 208 208 138 123 244 139 108 142 175
181 184 137 106 178 150 238 101 169 105
142 105 101 110 117 139 147 106 115 131
196 112 111 102 124 180 111 178 148 125
120 146 139 247 176 179 170 141 147 119
232 141 112 104 242 187 129 133 185 151
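With 120 observations, the large-sample method applies. One way to carry out the calculation is sketched below. The counts are copied from the example, and the sample SD is computed with the n - 1 formula, an assumption that makes little difference at this sample size.

```python
import math
import statistics
from statistics import NormalDist

# Platelet counts for the 120 patients, copied from the example.
counts = [
    132, 127, 214, 184, 181, 211, 190, 139, 112, 105,
    174, 143, 135, 185, 120, 235, 142, 129, 134, 154,
    117, 125, 194, 163, 181, 108, 212, 129, 126, 256,
    106, 142, 110, 114, 143, 129, 125, 203, 168, 162,
    176, 198, 131, 129, 125, 254, 228, 174, 125, 142,
    194, 104, 107, 188, 179, 198, 184, 115, 229, 103,
    126, 208, 208, 138, 123, 244, 139, 108, 142, 175,
    181, 184, 137, 106, 178, 150, 238, 101, 169, 105,
    142, 105, 101, 110, 117, 139, 147, 106, 115, 131,
    196, 112, 111, 102, 124, 180, 111, 178, 148, 125,
    120, 146, 139, 247, 176, 179, 170, 141, 147, 119,
    232, 141, 112, 104, 242, 187, 129, 133, 185, 151,
]

n = len(counts)
mean = statistics.mean(counts)
sd = statistics.stdev(counts)            # sample SD (n - 1 in the denominator)
z = NormalDist().inv_cdf(0.975)          # multiplier for a 95% interval
margin = z * sd / math.sqrt(n)
lo, hi = mean - margin, mean + margin

print(f"n = {n}, average = {mean:.1f}, SD = {sd:.1f}")
print(f"95% confidence interval: {lo:.1f} to {hi:.1f}")
```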
Fill in the blanks with either box or draws.
Probabilities are used when reasoning from the
__________ to the _____________.
Confidence levels are used when reasoning from the
____________ to the ______________.
Fill in the blank with either observed or
expected.
The chance error is in the _______________ value.
Fill in the blank with either sample or population.
The confidence level is for the ______________
average.
Confidence Intervals:
Projecting Sample Size
Example: In a preliminary simple random sample of
680 households (in a city of millions), the average
number of TV sets in the sample households is 1.86,
with an SD of 0.80.
Suppose that it’s desired to construct a 90%
confidence interval which has a margin of error of
0.03. How large a sample would be necessary?
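Solution:
$z_{0.95} \cdot \frac{0.8}{\sqrt{n}} \le 0.03$
$1.645 \cdot \frac{0.8}{\sqrt{n}} \le 0.03$
$43.867 \le \sqrt{n}$
$1924.3 \le n$
[Figure: standard normal curve; the central 90% of the area lies between -1.64485 and 1.64485.]
So the sample size should be at least 1925.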
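The same projection can be written as a short calculation. The sketch below uses the rounded table value 1.645, as the solution does. It also shows, for comparison, what the unrounded normal quantile gives.

```python
import math
from statistics import NormalDist

sd = 0.80            # SD from the preliminary sample
target_margin = 0.03

# Rounded table value, as used in the solution.
z_table = 1.645
n_required = math.ceil((z_table * sd / target_margin) ** 2)
print(n_required)    # 1925

# Unrounded quantile for comparison (an aside, not part of the lecture).
z_exact = NormalDist().inv_cdf(0.95)
n_exact = math.ceil((z_exact * sd / target_margin) ** 2)
print(n_exact)       # 1924
```

The two answers differ by one household because the required n falls very close to a whole number. In practice the preliminary SD of 0.80 is itself only an estimate, so either figure is a rough guide.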
Confidence Intervals:
Small samples
Example: A biological research team measures
the weights of 14 chipmunks, randomly chosen.
Find a 90% confidence interval for the average
weight of chipmunks.
Weights (ounces):
7.6 8.2 8.66 9.41 8.45 8.08 8.86
7.48 9.24 9.34 9.58 10.1 8.55 9.15
Note: The previous calculations used the fact that
$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
approximately follows the normal curve for large
values of n. In this problem, we cannot use this
approximation.
However, for both small and large samples, we can
use the fact that
$\frac{\bar{X} - \mu}{S/\sqrt{n}}$
approximately follows the Student’s t-distribution
with n - 1 degrees of freedom, provided the box itself
roughly follows the normal curve.
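[Figure: Student’s t curve with n - 1 degrees of freedom; the central area $1 - 2\alpha$ lies between $t_{n-1,\alpha}$ and $t_{n-1,1-\alpha}$.]
In general, we say that the range
$\bar{X} + t_{n-1,\alpha} \frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{n-1,1-\alpha} \frac{s}{\sqrt{n}}$,
where $s$ is the sample SD,
is a $1 - 2\alpha$ confidence interval for the population
average $\mu$.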
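Excel: TINV(0.1, 13) returns 1.77093.
[Figure: Student’s t curve with 13 degrees of freedom; the central 90% of the area lies between -1.77093 and 1.77093.]
The sample average is 8.76 ounces and the sample SD is 0.76 ounces.
Therefore, the 90% confidence interval is
$8.76 \pm 1.77093 \cdot \frac{0.76}{\sqrt{14}}$,
or 8.40 to 9.12 ounces.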
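The chipmunk interval can be reproduced from the raw data. This is a minimal sketch. The t multiplier 1.77093 is taken from the Excel result quoted in the lecture rather than computed, and the sample SD uses the n - 1 formula.

```python
import math
import statistics

# Weights of the 14 chipmunks (ounces), from the example.
weights = [7.6, 8.2, 8.66, 9.41, 8.45, 8.08, 8.86,
           7.48, 9.24, 9.34, 9.58, 10.1, 8.55, 9.15]

n = len(weights)
mean = statistics.mean(weights)
s = statistics.stdev(weights)       # sample SD (n - 1 in the denominator)

t_crit = 1.77093                    # TINV(0.1, 13), as quoted in the lecture
margin = t_crit * s / math.sqrt(n)
lo, hi = mean - margin, mean + margin

print(round(mean, 2), round(s, 2))  # 8.76 0.76
print(round(lo, 2), round(hi, 2))   # 8.4 9.12
```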
Note: Be sure you look up the correct
number on the table in the back of the book.
The numbers at the bottom of Table 4 specify
the two-sided confidence levels.
Example: Duracell tests 12 batteries in
flashlights. They determine that the average
life of the batteries in this sample is 3.58
hours, with a sample SD of 1.58 hours. Find a
95% confidence interval for the average life
of a Duracell battery in a flashlight.
Repeat if 100 batteries were tested (with the
same sample mean and SD as above).
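Both versions of this problem need a t multiplier, which a table or Excel would normally supply. The sketch below instead computes it numerically with the standard library only. The helper functions `t_pdf`, `t_cdf`, and `t_quantile` are illustrative names written for this example (Simpson's rule plus bisection), not a library API.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Value t with P(T <= t) = p, for 0.5 < p < 1, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(mean, sd, n, level=0.95):
    """Two-sided confidence interval based on the t-distribution."""
    t_crit = t_quantile(0.5 + level / 2, n - 1)
    margin = t_crit * sd / math.sqrt(n)
    return mean - margin, mean + margin


for n in (12, 100):
    lo, hi = t_interval(3.58, 1.58, n)
    print(f"n = {n}: {lo:.2f} to {hi:.2f} hours")
# n = 12: 2.58 to 4.58 hours
# n = 100: 3.27 to 3.89 hours
```

With 100 batteries the t multiplier (about 1.98) is already close to the normal value 1.96, so most of the narrowing comes from the larger n in the standard error.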
Note: In previous lectures, we considered another
technique of inferring information about the box
from the draws – namely, hypothesis testing.
Confidence intervals provide a method of
estimating the average of the box.
Hypothesis testing checks if the difference
between the supposed box average and the sample
average is either real or due to chance.