ESTIMATION

A. Armed with the central limit theorem and the normal distribution, one can obtain the
probability that one's ESTIMATE of a population parameter is in error.
1. If sample means (i.e., $\bar{Y}$s) are normally distributed about a population parameter,
$\mu$, then we know that 95% of all $\bar{Y}$s take values on the interval,
$$\mu \pm 1.96\,\sigma_Y/\sqrt{n}.$$
2. What we have in practice, however, are not $\mu$ and $\sigma_Y$, but $\bar{Y}$ and
$\hat{\sigma}_Y$, which are used as point estimates of $\mu$ and $\sigma_Y$.
3. In estimating these parameters we want estimates that are both UNBIASED (centered around
the parameter) and EFFICIENT (having a small degree of sampling error relative to other
estimators). For example, $\bar{Y} + 10$ is a biased estimator of $\mu$ (although it is an
unbiased estimator of $\mu + 10$). Also $Y_1$ (a single observation) is not as efficient as
$\bar{Y}$ in estimating $\mu$, because the variance of $Y_1$ is $\sigma_Y^2$, whereas the
variance of $\bar{Y}$ is $\sigma_Y^2/n$.
4. As you might expect, $\bar{Y}$ is an efficient, unbiased estimator of $\mu$. However, the
sample variance, $s_Y^2 = \frac{1}{n}\sum (Y_i - \bar{Y})^2$, is NOT an UNbiased estimator of
$\sigma_Y^2$.
To demonstrate this, it must first be acknowledged that $\frac{1}{n}\sum (Y_i - \mu)^2$ is an
unbiased estimator of $\sigma_Y^2$. Recall that $\sigma_Y^2$ is the average squared deviation
from the population mean. Like averages for other random variables (e.g., $\bar{Y}$ for $Y$),
a sample average of squared deviations from $\mu$ is an unbiased estimator of the population
average of such squared deviations.
By showing that this unbiased estimator (namely, $\frac{1}{n}\sum (Y_i - \mu)^2$) is
consistently larger than $s_Y^2$, it can be demonstrated that $s_Y^2$ is biased as an
estimator of $\sigma_Y^2$. Thus, the thesis is that for any sample
$$\frac{1}{n}\sum (Y_i - \mu)^2 \;\ge\; \frac{1}{n}\sum (Y_i - \bar{Y})^2 .$$
We begin by noting that
$$\sum (Y_i - \mu)^2 = \sum (Y_i - \bar{Y})^2 + n(\bar{Y} - \mu)^2 .$$
Since $n(\bar{Y} - \mu)^2 \ge 0$, the proof is complete.
Thus "$s_Y^2$" consistently UNDERestimates $\sigma_Y^2$. We can adjust for this bias by
changing the denominator of $s_Y^2$ from n to n-1. This is why
$$\hat{\sigma}_Y^2 = \frac{\sum (Y_i - \bar{Y})^2}{n - 1}$$
and not $s_Y^2$ is used as the POINT ESTIMATE of $\sigma_Y^2$.
(REMEMBER: The "hat" notation indicates that $\hat{\sigma}_Y^2$ is an estimator of
$\sigma_Y^2$.)
5. WARNING: When estimating the variance of $X$, divide by "n-1"; when estimating the
variance of $\bar{X}$, divide by "n". Regarding the latter warning, recall that according to
the central limit theorem,
$$\sigma_{\bar{X}}^2 = \frac{\sigma_X^2}{n}, \quad \text{NOT} \quad \frac{\sigma_X^2}{n-1} \; !!!$$
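The argument in item 4 can be checked numerically. The following is only a minimal sketch, not part of the original notes: the function names are hypothetical, and the simulated population (normal, with mean 0 and standard deviation 2, so variance 4) is an assumption chosen purely for illustration. The first function checks the decomposition used in the proof; the second averages the "divide by n" and "divide by n-1" estimates over many samples.

```python
import random

def decomposition_gap(sample, mu):
    """Difference between sum((Y - mu)^2) and sum((Y - Ybar)^2) + n*(Ybar - mu)^2.

    The proof in item 4 says this difference is zero (up to rounding).
    """
    n = len(sample)
    ybar = sum(sample) / n
    about_mu = sum((y - mu) ** 2 for y in sample)
    about_ybar = sum((y - ybar) ** 2 for y in sample)
    return about_mu - (about_ybar + n * (ybar - mu) ** 2)

def average_variance_estimates(mu=0.0, sigma=2.0, n=5, reps=20000, seed=1):
    """Average the n-denominator and (n-1)-denominator estimates over many samples.

    The population (normal, mean mu, s.d. sigma) is an illustrative assumption.
    """
    rng = random.Random(seed)
    total_div_n = 0.0
    total_div_n_minus_1 = 0.0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        ybar = sum(sample) / n
        ss = sum((y - ybar) ** 2 for y in sample)  # squared deviations from Ybar
        total_div_n += ss / n
        total_div_n_minus_1 += ss / (n - 1)
    return total_div_n / reps, total_div_n_minus_1 / reps

avg_div_n, avg_div_n_minus_1 = average_variance_estimates()
print(avg_div_n, avg_div_n_minus_1)
```

With a true variance of 4 and n = 5, the first average should settle near 4(n-1)/n = 3.2 and the second near 4, which is the bias the text describes.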
B. We have already talked about INTERVAL ESTIMATION in which a CONFIDENCE INTERVAL is
estimated. But intervals are only meaningful along some continuum of values. Thus it seems
peculiar to speak of constructing an interval around a "mean" for NOMINAL data. You CANNOT,
of course, just add up the numbers and divide by "n." (Reminiscent of our earlier discussion
on LEVELS OF MEASUREMENT, this would be "like adding apples and oranges.")
Our solution will be to consider each category separately. Taking a variable such as
religious affiliation, for example, a new variable (call it "D") could be constructed such
that
$$D = \begin{cases} 0 & \text{when not Catholic} \\ 1 & \text{when Catholic} \end{cases}$$
NOTE: Variables like this (namely, that take the value, one, when subjects have an attribute
and zero otherwise) are called dummy variables.
C. ESTIMATING PROPORTIONS:
1. Applying the formula for the mean to a dummy variable always produces a number with a
value somewhere between zero and one. For example, if the average of the Catholic
affiliation variable were .6, this would mean that 60% of the people in your sample are
Catholic. This follows because the mean of the Ds is
$$\hat{\pi} = \bar{D} = \frac{\sum_{i=1}^{n} D_i}{n} .$$
But since $D = 0$ when the subject is not Catholic and $D = 1$ when the subject is Catholic,
$\sum D_i$ is the number of Catholics in the sample, and so
$$\hat{\pi} = \text{proportion of Catholics in the sample.}$$
2. Now let's consider the variance of this statistic, $\hat{\pi}$. You will recall that the
peg-board illustration (used when we introduced the concept of normality) was a demonstration
of the sampling distribution of a proportion. Applying the lesson from that illustration to
this dummy variable, we know that if one were to draw a lot of random samples of "n"
residents from the city of Antwerp, one would expect that their respective
Catholic-proportions would be distributed normally (for large n) around the true proportion
of Catholics in the population.
To be able to make probabilistic inferences about the proportion of Catholics that we find in
our sample, we must be more specific about the exact nature of the sampling distribution of
these sample proportions:
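a. Applying the formula for the variance to the dummy variable, D:
$$\hat{\sigma}_D^2 = \frac{\sum (D_i - \bar{D})^2}{n-1} = \frac{\sum D_i^2 - n\bar{D}^2}{n-1} .$$
Noting that when $D_i = 0$ or $1$, $D_i^2 = D_i$, we have $\sum D_i^2 = \sum D_i = n\bar{D}$.
And since $\bar{D} = \hat{\pi}$,
$$\hat{\sigma}_D^2 = \frac{n\hat{\pi} - n\hat{\pi}^2}{n-1} = \frac{n}{n-1}\,\hat{\pi}(1 - \hat{\pi}) .$$
Finally, note that when the sample size is large, the ratio $\frac{n}{n-1}$ can (for all
practical purposes) be considered equal to 1. Thus
$$\hat{\sigma}_D^2 = \hat{\pi}(1 - \hat{\pi}) \quad \text{for large samples.}$$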
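The algebra in "a" can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of the original notes: the function name is hypothetical and the ten-person sample (six Catholics, four non-Catholics) is made up. It compares the ordinary n-1 variance of a dummy variable with the shortcut n/(n-1) * pi_hat * (1 - pi_hat) and with the large-sample version pi_hat * (1 - pi_hat).

```python
import statistics

def dummy_summary(d):
    """Return (pi_hat, n-1 variance, n/(n-1) shortcut, large-sample shortcut)."""
    n = len(d)
    pi_hat = sum(d) / n                              # mean of a dummy = sample proportion
    var_hat = statistics.variance(d)                 # divides by n-1
    shortcut = n / (n - 1) * pi_hat * (1 - pi_hat)   # exact algebraic equivalent
    large_sample = pi_hat * (1 - pi_hat)             # drops the n/(n-1) ratio
    return pi_hat, var_hat, shortcut, large_sample

# Hypothetical sample: 1 = Catholic, 0 = not Catholic
d = [1] * 6 + [0] * 4
pi_hat, var_hat, shortcut, large_sample = dummy_summary(d)
print(pi_hat, var_hat, shortcut, large_sample)
```

With only ten cases the ratio n/(n-1) is 10/9, so the large-sample shortcut is visibly smaller than the n-1 variance; the gap shrinks as n grows.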
b. But this is the large-sample estimate of the variance of a dummy variable, not of a
proportion. The central limit theorem is applied to $\hat{\pi}$ nearly identically as it was
applied to $\bar{X}$. That is, the variance of a proportion equals the variance of its
associated dummy variable divided by the sample size. In other words, the variance of a
proportion is estimated as follows:
$$\hat{\sigma}_{\hat{\pi}}^2 = \frac{\hat{\sigma}_D^2}{n} = \frac{\hat{\pi}(1 - \hat{\pi})}{n} .$$
Accordingly,
$$\hat{\pi} \sim N\!\left(\pi,\; \frac{\pi(1 - \pi)}{n}\right)$$
is what is called the "normal approximation of the binomial distribution." We shall return to
the BINOMIAL DISTRIBUTION shortly.
c. One more comment about estimating proportions: [1]
The following is a three-part "rule of thumb" for deciding when your sample is "large enough"
for you to assume that $\hat{\pi}$ is normally distributed:
1) Take whichever is SMALLER of "$\hat{\pi}$" and "$1 - \hat{\pi}$" and
2) multiply it by "n".
3) If this number is greater than five, you may assume that $\hat{\pi}$ is normally
distributed.
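The three steps of this rule of thumb can be written as a one-line check. This is a minimal sketch; the function name and the `threshold` parameter are hypothetical labels introduced here, with the default of five taken from step 3.

```python
def normal_approximation_ok(pi_hat, n, threshold=5):
    """Rule of thumb: the smaller of pi_hat and 1 - pi_hat, times n, must exceed threshold."""
    return min(pi_hat, 1 - pi_hat) * n > threshold

print(normal_approximation_ok(0.8, 32))   # smaller share is .2, and .2 * 32 = 6.4
print(normal_approximation_ok(0.1, 40))   # smaller share is .1, and .1 * 40 = 4
```

Note that the comparison is strict ("greater than five"), so a product of exactly five does not pass.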
D. Sample size required in estimating proportions
1. CBS/N.Y. Times telephone pollers claim to have estimates of public opinion that are in
error by only $\pm 3$ percentage points. Consider the question, How large a sample would you
need to estimate the proportion of Americans that would answer "Yes" to the question, "Do you
agree with the President's economic policies?"
2. Put differently, you want your estimate of the proportion to have a PRECISION (i.e., the
substantive error one is willing to tolerate) of $\pm 3\%$! NOTE that saying that you want
your estimate to be within 3% of the true amount SAYS NOTHING about the significance level
(or $\alpha$).
a. We know that a $100(1-\alpha)\%$ confidence interval around $\hat{\pi}$ would be
$$\hat{\pi} \pm z_{\alpha/2}\sqrt{\frac{\hat{\pi}(1 - \hat{\pi})}{n}} .$$
[1] This rule of thumb is from Alan Agresti and Barbara Finlay (Statistical Methods for the
Social Sciences, 2nd edition. San Francisco, CA: Dellen, 1986, p. 142). However, see also
Alan Agresti and Brent A. Coull (1998. "Approximate is Better than 'Exact' for Interval
Estimation of Binomial Proportions." American Statistician 52:119-126).
b. It is the last part of this expression that is the precision that the CBS/New York Times
pollers set at 3 percent. In accordance with my earlier discussion of the Big Picture
(Lecture Notes, p. 36), the Greek capital letter delta, $\Delta$, refers to precision. Note
that precision is one-half the width of a confidence interval. That is,
$$\text{PRECISION} = \Delta = z_{\alpha/2}\sqrt{\frac{\hat{\pi}(1 - \hat{\pi})}{n}} .$$
REMEMBER: The larger your precision is, the less precise (i.e., the more open to deviation
from the true value) your estimate will be!
c. After selecting a precision of $\Delta = 3\%$ and a significance level of, let's say,
$\alpha = .05$, this formula requires that we also have a value for $\hat{\pi}$ to determine
how large a sample is needed.
The most conservative choice when choosing a value for $\hat{\pi}$ is a value that yields the
largest value of $\hat{\pi}(1 - \hat{\pi})$. This allows one to estimate the largest possible
confidence interval (and, equivalently, the largest possible precision) that could result
once one's data are collected. It is easy to verify that this most conservative choice is
$\hat{\pi} = .50$! One need merely consider the values of $\hat{\pi}(1 - \hat{\pi})$ as
$\hat{\pi}$ ranges from 0 to 1: the product rises from 0 at $\hat{\pi} = 0$ to its maximum of
.25 at $\hat{\pi} = .50$ and falls back to 0 at $\hat{\pi} = 1$.
Thus, we find that
$$.03 = 1.96\sqrt{\frac{.50(1 - .50)}{n}} ,$$
which can be rewritten as
$$n = .25\left[\frac{1.96}{.03}\right]^2 = 1067.11 , \text{ or } 1068 .$$
Thus, $n = 1068$ is the size of CBS/NYT samples!
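The sample-size arithmetic above can be sketched as follows. This is an illustration rather than the pollers' actual procedure: the function name and its parameters are hypothetical, and the z-score is obtained from Python's standard library (about 1.96 for alpha = .05) instead of a printed table. The loop at the end tabulates pi(1 - pi) to show that the product peaks at .50.

```python
import math
from statistics import NormalDist

def required_sample_size(delta, alpha=0.05, pi=0.5):
    """Smallest whole n with z_{alpha/2} * sqrt(pi(1-pi)/n) <= delta.

    pi = 0.5 is the most conservative choice, since it maximizes pi(1-pi).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical z-score
    return math.ceil(pi * (1 - pi) * (z / delta) ** 2)   # always round UP

print(required_sample_size(0.03))

# pi(1 - pi) for pi = 0, .1, ..., 1: largest at pi = .5
for k in range(11):
    pi = k / 10
    print(pi, round(pi * (1 - pi), 2))
```

Passing a value of `pi` other than 0.5 gives the smaller sample that would suffice if the proportion were known in advance to be far from one half.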
THREE COMMENTS:
1) As always, round UP to whole numbers in estimating sample sizes for a given level of
precision. (I.e., no half people, please.) By rounding down, a larger (i.e., less precise)
precision is attained than the one that is sought. (A REMINDER: In all other cases, round to
the nearest decimal value as per p. 31 of these Lecture Notes.)
2) Notice that this number (i.e., 1068) is the same NO MATTER HOW LARGE THE POPULATION!!!
That is, to obtain a precision of $\Delta = 3\%$ (at $\alpha = .05$), you would need to
sample 1068 people no matter if they were drawn from a town of 20,000 people or from all
people in the world.
a) Of course, the reason why few surveys of the world are done is the difficulty in obtaining
a list of all humans from which such a sample could be drawn.
b) Although a larger sample is not required for large populations, a smaller sample is
allowed when one's sample size is more than one tenth the size of one's population. (For more
details, see "finite population correction" in Agresti and Finlay [1986, p. 87].)
3) A few words on estimating $\sigma_D^2$ (or, when to assume $\sigma_D^2 = .25$):
a) If you have collected your data, you will be able to calculate $\hat{\pi}$. In such cases
use $\hat{\sigma}_D^2 = \hat{\pi}(1 - \hat{\pi})$. (As a point of future reference: You will
always use $\hat{\sigma}_D^2 = \hat{\pi}(1 - \hat{\pi})$ when finding confidence intervals.)
b) If you are testing a hypothesis, you will have decided on a "hypothesized value" of $\pi$
(call it $\pi_0$). In such cases, use $\hat{\sigma}_D^2 = \pi_0(1 - \pi_0)$. NOTE: We shall
talk a lot more about $\pi_0$ when we discuss hypothesis testing. (ALSO, as a point of future
reference: You will always use $\hat{\sigma}_D^2 = \pi_0(1 - \pi_0)$ in calculating p-values
and Type II errors.)
c) When you have neither $\hat{\pi}$ nor $\pi_0$ (as is often the case when choosing n), use
the most conservative estimate of $\hat{\sigma}_D^2$, namely $\hat{\sigma}_D^2 = .25$.
d) When you have both $\hat{\pi}$ and $\pi_0$ (as often happens when you are writing up your
findings), use the more conservative estimate of $\hat{\sigma}_D^2$, namely whichever is the
larger of
$$\hat{\sigma}_D^2 = \hat{\pi}(1 - \hat{\pi}) \quad \text{or} \quad \hat{\sigma}_D^2 = \pi_0(1 - \pi_0) .$$
(WARNING: As mentioned in "a" and "b" above, there are exceptions to this rule when finding
confidence intervals, p-values, and Type II errors.)
E. You may recall from the second section of these lecture notes that neither the Central
Limit Theorem nor the Big Picture holds for estimates based on small samples. The binomial
distribution is a case in point. Let us consider what to do if we find that the sample is NOT
large enough to assume that
$$\hat{\pi} \sim N\!\left(\pi,\; \frac{\pi(1 - \pi)}{n}\right) .$$
That is, what should be done if we find that $n \cdot \hat{\pi} \le 5$ or
$n \cdot (1 - \hat{\pi}) \le 5$?
To answer this, we need to think a little more about the distribution of
$$D = \begin{cases} 0 & \text{if not Catholic} \\ 1 & \text{if Catholic} \end{cases}$$
Notice that this random variable acts much like flipping a coin. Assume that you have a
perfectly symmetric coin (i.e., a coin with equal chances of getting a "heads" or a "tails,"
OR with a probability of getting "heads" of $\pi = .5$). With a sample of size n = 2 coin
tosses, note that the distribution already begins to take on a somewhat normal shape with
$\Pr(\hat{\pi} = 0) = .25$ (no heads),
$\Pr(\hat{\pi} = .5) = .5$ (one head), and
$\Pr(\hat{\pi} = 1) = .25$ (two heads).
This probability distribution can be expressed as that of a BINOMIAL RANDOM VARIABLE, Y, by
multiplying each $\hat{\pi}$ value by the sample size. The resulting distribution is then
$\Pr(Y = 0) = .25$, $\Pr(Y = 1) = .50$, and $\Pr(Y = 2) = .25$.
NOTE: The BINOMIAL RANDOM VARIABLE, Y, is the sum of the values taken by each of n
independently sampled observations on a dummy variable. The sampling distribution of a
binomial random variable is called the BINOMIAL PROBABILITY DISTRIBUTION. The general form of
this binomial distribution is as follows:
$$\Pr(Y = y) = \binom{n}{y}\,\pi^{y}(1 - \pi)^{n-y} , \quad \text{where} \quad \binom{n}{y} = \frac{n!}{y!\,(n-y)!} .$$
Placing an exclamation point after a number does not indicate the statistician's excitement
about that particular number. When an exclamation point is placed after an integer, this
refers to a "factorial." It means that you multiply that number by all integers between it
and zero. For example, "five factorial" (5!) is calculated as 5*4*3*2*1 = 120.
Moreover, by definition, 0! = 1.
This seems a lot of trouble until you consider $\Pr(Y < 3)$ when $\pi = .8$ and n = 32!
Let's see how we would calculate this:
$$\Pr(Y = 0) = \binom{32}{0}(.8)^{0}(.2)^{32}, \quad \Pr(Y = 1) = \binom{32}{1}(.8)^{1}(.2)^{31}, \quad \Pr(Y = 2) = \binom{32}{2}(.8)^{2}(.2)^{30} .$$
After calculating these numbers, $\Pr(Y < 3)$ equals their sum.
CONCLUSION: When your sample size is too small for you to assume that the sampling
distribution of $\hat{\pi}$ is normally distributed, you should use the binomial distribution
instead of the standard normal distribution to determine if your results are significantly
different from what you would assume would happen by chance alone. Moreover, note that
although the binomial distribution may be computationally impractical for very large samples,
it ALWAYS APPLIES for proportions. In this last example, for instance,
$\pi \cdot n = .8 \cdot 32 = 25.6 > 5$ and $(1 - \pi) \cdot n = .2 \cdot 32 = 6.4 > 5$,
allowing one to use either the binomial distribution or its normal approximation in drawing
inferences about population proportions.
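The binomial formula is tedious by hand but short in code. The following is a minimal sketch (the function names are hypothetical) that implements the general form given above with Python's built-in `math.comb`, reproduces the two-coin-toss distribution, and then sums the three terms needed for Pr(Y < 3) when pi = .8 and n = 32.

```python
from math import comb

def binomial_pmf(y, n, pi):
    """Pr(Y = y): (n choose y) * pi^y * (1 - pi)^(n - y)."""
    return comb(n, y) * pi ** y * (1 - pi) ** (n - y)

def prob_less_than(k, n, pi):
    """Pr(Y < k): sum of the binomial probabilities for y = 0, 1, ..., k-1."""
    return sum(binomial_pmf(y, n, pi) for y in range(k))

# Two tosses of a fair coin: probabilities of 0, 1, and 2 heads
print([binomial_pmf(y, 2, 0.5) for y in range(3)])

# Pr(Y < 3) when pi = .8 and n = 32: an extremely small probability
print(prob_less_than(3, 32, 0.8))
```

Because the expected count here is .8 * 32 = 25.6, observing fewer than three successes is vanishingly unlikely, which the tiny sum confirms.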
F. t-DISTRIBUTION: If you are making an interval estimate of the mean of an interval- or
ratio-level variable and if your sample size is small (say, less than about n = 30), you will
need to use the t-distribution (see Table B) instead of the standard normal distribution.
1. All that is being said here is that (for the purposes of hand-calculated problems during
this course) when n > 30, you should use your standard normal table in estimating
probabilities associated with sample means. When $n \le 30$, use the t-distribution table.
(Statistical software like SPSS will always use the t-distribution internally.)
2. There is an important restriction when it comes to using the t-distribution, however.
Even if in one's population the distribution of a variable is not normal, it is legitimate
(given the central limit theorem) to assume that the sampling distribution of one's statistic
is normal, provided one's sample is large. In contrast, when your sample is small it is NOT
legitimate to assume that the sampling distribution of your statistic is that of a
t-distribution, unless you have reason to believe that your statistic's underlying variable
IS normally distributed in the population.
3. Notice that the table for the t-distribution is organized similarly to the chi-square
table, with degrees of freedom down the left column and probabilities at the top of the
columns. Like the chi-square table, the t-table presents values on a statistic (here,
t-scores instead of chi-square scores) for a large number of distributions, each distribution
defined according to the statistic's degrees of freedom. For example, when estimating a
confidence interval for a mean, the t-distribution with n-1 degrees of freedom should be
used. (NOTE: The shape of the t-distribution gets "flatter," the fewer its degrees of
freedom.)
4. AN EXAMPLE: Imagine that a government official wishes to appropriate funds for flood
relief. As part of his planning, he wishes to estimate (at the .10 level of significance) how
many of the people who reside on the Mississippi flood plain move off of the flood plain
within a year of the river's flooding. He only has data for the last 30 years and during
these years only six floods have occurred. He assumes his six measures of moved residents to
have been drawn from a normally distributed population of resident-moves during past floods.
Thus, his goal is to find an interval estimate of the number of people who move after
Mississippi floods. His data are
People who moved: 21, 31, 19, 38, 60, 47.
Clearly, this sample is smaller than n = 30. Note that if we use the normal distribution,
the 90% confidence interval is
$$\bar{X} \pm z_{\alpha/2}\,\frac{\hat{\sigma}_X}{\sqrt{n}} ,$$
which, since $\bar{X} = 36$ and $\hat{\sigma}_X^2 = 248$ [$\hat{\sigma}_X = 15.75$], yields a
90% confidence interval of
$$36 \pm 1.645 \cdot \frac{15.75}{\sqrt{6}} ,$$
or an interval between 25.4 and 46.6.
However, since n = 6 < 30, we must use $t_{\alpha/2}$ and not $z_{\alpha/2}$ in this
expression. Referring to Table B we find that $t_{5,.05} = 2.015$. Thus our 90% confidence
interval becomes
$$36 \pm 2.015 \cdot \frac{15.75}{\sqrt{6}} ,$$
or an interval between 23.0 and 49.0.
NOTE how in comparison to the z-score, the t-score leads you to make a more conservative
(i.e., a 'wider') interval estimate.
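The two intervals in this example can be reproduced with a short sketch. The function and variable names are hypothetical, and two assumptions should be flagged: the z-score comes from Python's standard library, while the t-score of 2.015 is simply copied from the text's Table B because the standard library has no t-distribution quantile function.

```python
import math
import statistics
from statistics import NormalDist

moved = [21, 31, 19, 38, 60, 47]   # residents who moved after each of six floods

def mean_interval(data, critical_value):
    """Interval: mean +/- critical_value * (n-1 standard deviation) / sqrt(n)."""
    n = len(data)
    xbar = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)   # stdev divides by n-1
    return xbar - critical_value * se, xbar + critical_value * se

z_crit = NormalDist().inv_cdf(0.95)   # about 1.645: .05 in each tail
t_crit = 2.015                        # copied from Table B: 5 df, .05 in each tail

z_low, z_high = mean_interval(moved, z_crit)
t_low, t_high = mean_interval(moved, t_crit)
print(round(z_low, 1), round(z_high, 1))   # interval based on the z-score
print(round(t_low, 1), round(t_high, 1))   # wider interval based on the t-score
```

The only difference between the two calls is the critical value, which makes it easy to see that the wider t-interval comes entirely from the larger multiplier.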
5. Now let's imagine that you could obtain data on an additional fourteen Mississippi floods
if you were to go to the considerable extra time and expense of obtaining data from local
newspapers on floods that took place during the previous century. Making use of the data from
the six cases that you do have, you assume that the population mean is in fact 36 flood
victims, and that the population variance equals 248 (squared victims).
Here's a question for you: "How far from (the assumed population mean of) 36 would the mean
from a random sample of 20 floods have to be for it to have a probability of less than .05 of
having been sampled from this assumed population of Mississippi River floods?" Because "how
far" requires that we consider deviations both larger and smaller than 36, the answer to this
question (again using Table B) is
$$t_{19,.025}\,\frac{\hat{\sigma}_X}{\sqrt{n}} = 2.093 \cdot \frac{15.75}{\sqrt{20}} = 7.37$$
moved residents "far from" 36.
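The arithmetic behind the 7.37 figure is sketched below. The function name is hypothetical, and the t-score of 2.093 (19 degrees of freedom, .025 in each tail) is copied from the text's Table B rather than computed.

```python
import math

def critical_distance(t_critical, variance, n):
    """Distance from the assumed mean: t * sqrt(variance / n)."""
    return t_critical * math.sqrt(variance / n)

distance = critical_distance(2.093, 248, 20)
print(round(distance, 2))
print(round(36 - distance, 2), round(36 + distance, 2))   # the two cut-off means
```

A sample mean from 20 floods falling outside the two printed cut-offs would be the kind of result the question describes.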
6. What a complicated way to speak of probabilities! The language of HYPOTHESIS TESTING was
developed to simplify probabilistic statements such as this.