HYPOTHESIS TESTING

A. DEFINITION: A hypothesis is a statement made prior to data collection about the value a sample statistic will take. Within the context of hypothesis testing, one speaks of the "null hypothesis" and of the "alternative hypothesis."
B. The NULL hypothesis (\(H_0\))
1. is always of the form \(H_0: \theta = \theta_0\), and
2. is always worded as the case with NO EFFECTS.
3. Rejecting \(H_0\) means there is EVIDENCE of effects. Failing to reject \(H_0\) means there is no evidence of effects.
C. More vocabulary of hypothesis testing:
1. Before collecting data, the statistician chooses (i.e., the statistician is an intentional agent, taking full responsibility for the choice of)
a. a value that (in accordance with the null hypothesis) we assume to be the value of \(\theta\) (the parameter of interest). This "value of the parameter under the null hypothesis" will be referred to with the symbol \(\theta_0\).
b. the form of an alternative hypothesis. Alternative hypotheses are either one- or two-tailed and take the following forms:
1) \(H_A: \theta > \theta_0\) (a one-tailed alternative hypothesis)
2) \(H_A: \theta < \theta_0\) (a one-tailed alternative hypothesis)
3) \(H_A: \theta \neq \theta_0\) (a two-tailed alternative hypothesis)
NOTE: One-tailed hypothesis tests specify the direction in which you believe there will be effects. For example, you might wish to test whether or not ghetto residents earn less (but not more) than others.
c. a SIGNIFICANCE LEVEL that sets the probability that you are willing to wrongly reject the null hypothesis. The significance level (or LEVEL OF SIGNIFICANCE or \(\alpha\)-LEVEL) will often be represented by the Greek letter \(\alpha\). In the next section, \(\alpha\) will also be referred to as the "probability of a Type I error" (i.e., as the probability of rejecting the null hypothesis given that the null hypothesis is true).
[Cartoon caption: "How do you want it: the crystal mumbo-jumbo or statistical probability?"]
d. a PRECISION, \(\Delta\), that sets the minimum substantive amount greater or smaller than \(\theta_0\) that you are interested in detecting as statistically significant at the significance level you have chosen. Put differently, precision is the distance from \(\theta_0\) such that a smaller distance would be too small to be of theoretical interest.
NOTE: You can decrease the precision used in your hypothesis test by increasing your sample size and, conversely, you can increase your precision by decreasing your sample size. Likewise, an increase in your significance level yields a decrease in precision and a decrease in significance level yields an increase in precision. It is important to recognize that the smaller the precision, the more precise your hypothesis test will be. Put differently, increasing the precision (i.e., in everyday usage) of your test entails a decreasing of the precision (i.e., \(\Delta\)) of the test. (Sorry, but that is how statisticians use this term.)
2. CRITICAL VALUES
Whether you have one or two critical values depends on the form you have chosen for an alternative hypothesis. After this, your choices of \(\theta_0\) and \(\Delta\) determine what the critical value(s) will be. A critical value's meaning depends upon which form of alternative hypothesis was chosen:
a. In a two-tailed test, values along the interval between the critical values are values of the parameter that are not significantly different from \(\theta_0\).
b. The critical value when \(H_A: \theta > \theta_0\) is the minimum value at which the parameter could be considered significantly larger than \(\theta_0\).
c. The critical value when \(H_A: \theta < \theta_0\) is the maximum value at which the parameter could be considered significantly smaller than \(\theta_0\).
3. REJECTION RULES
Each hypothesis test has a single rejection rule associated with it. This rejection rule determines when \(H_0\) is to be rejected. The rule always directs one to reject \(H_0\) when the parameter's estimated value (here, \(\hat{\theta}\)) is substantively different from \(\theta_0\).
a. In a two-tailed test, the rejection rule is always that \(H_0\) is to be rejected if \(\hat{\theta}\) is not equal to a number between the two critical values.
b. When \(H_A: \theta > \theta_0\), the rejection rule is always that \(H_0\) is to be rejected if \(\hat{\theta} > \theta_0 + \Delta\).
c. When \(H_A: \theta < \theta_0\), the rejection rule is always that \(H_0\) is to be rejected if \(\hat{\theta} < \theta_0 - \Delta\).
4. REJECTION REGIONS
If after data are collected \(\hat{\theta}\) is found to be of a magnitude that is substantively different from \(\theta_0\) (leading you to reject \(H_0\)), you may speak of \(\hat{\theta}\) as "having fallen into" the rejection region (or CRITICAL REGION).
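The three rejection rules can be sketched in a few lines of Python. This is only an illustration: the function name, the labels "greater", "less" and "two-sided", and the numbers in the usage lines are all made up for the example, and the precision (delta) is assumed to have been worked out already from the chosen significance level.

```python
def reject_h0(theta_hat, theta0, delta, alternative):
    """Return True if H0: theta = theta0 is rejected.

    delta is the precision: the distance from theta0 to a critical value.
    alternative is "greater", "less" or "two-sided" (hypothetical labels).
    """
    upper = theta0 + delta  # critical value for HA: theta > theta0
    lower = theta0 - delta  # critical value for HA: theta < theta0
    if alternative == "greater":
        return theta_hat > upper
    if alternative == "less":
        return theta_hat < lower
    if alternative == "two-sided":
        # Reject when theta_hat is NOT between the two critical values.
        return not (lower <= theta_hat <= upper)
    raise ValueError(f"unknown alternative: {alternative!r}")


# Made-up numbers, purely for illustration.
print(reject_h0(12.0, theta0=10.0, delta=1.5, alternative="greater"))
print(reject_h0(12.0, theta0=10.0, delta=1.5, alternative="less"))
print(reject_h0(9.0, theta0=10.0, delta=1.5, alternative="two-sided"))
```

An estimate that makes the function return True is one that has "fallen into" the rejection region.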
D. An ILLUSTRATION:
Imagine that we have a "random sample" of \(n = 10\) contract killers. One theory of deviance claims that people become contract killers only if they had no religious upbringing. During a recent national survey it was found that half of Americans had a religious upbringing (i.e., \(\pi_0 = .5\)). "What proportion of your sample of contract killers must have had a religious upbringing in order to have evidence at the .05 level of significance in support of the deviance theory?"
1. Note that \(\pi_0 \times n \le 5\) (and, by the way, \([1 - \pi_0] \times n \le 5\)), and thus that the binomial distribution is called for here. Also note that the required hypothesis test is one-tailed, because the deviance theory suggests that contract killers will be less likely to have had a religious upbringing than others. The null and alternative hypotheses are thus as follows:
    \[ H_0: \pi = .5 \]
    \[ H_A: \pi < .5 \]
2. Some probabilities (with \(Y\) the number of the ten contract killers who had a religious upbringing, and \(\pi = \pi_0 = .5\)):
    \[ \Pr(Y = 0) = .5^{10} = .0010 \]
    \[ \Pr(Y = 1) = 10 \times .5^{10} = .0098 \]
    \[ \Pr(Y = 2) = 45 \times .5^{10} = .0439 \]
We do NOT need to calculate any more probabilities to set up our rejection rule. This can be easily shown by illustrating as much of the probability distribution of \(Y\) as we have calculated so far:
[Figure: probability distribution of Y for Y = 0, 1, 2]
Note that
    \[ \Pr(Y < 2) = .0107 < .05 < \Pr(Y < 3) = .0547 \]
Accordingly, the REJECTION RULE in this illustration is, "Reject \(H_0\) if \(Y < 2\)." (It would not be correct to reject \(H_0\) if \(Y < 3\), because you would then no longer be applying the .05 significance level.) Once data were collected, you would need to have 0 or 1 (but no more) contract killers out of the ten to have had a religious upbringing in order that the data would provide statistically significant evidence (at \(\alpha = .05\)) in support of the deviance theory. Got it?
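The search for this rejection rule can be checked numerically. The sketch below uses only the Python standard library; the function names are invented for the example. It adds up binomial probabilities from Y = 0 upward and stops just before the running total would exceed the significance level.

```python
from math import comb


def binom_pmf(y, n, pi):
    """Pr(Y = y) for a binomial count with n trials and success probability pi."""
    return comb(n, y) * pi**y * (1 - pi) ** (n - y)


def lower_tail_cutoff(n, pi0, alpha):
    """Largest c with Pr(Y < c | pi0) <= alpha; the rule is 'reject H0 if Y < c'."""
    cumulative = 0.0
    c = 0
    while c <= n and cumulative + binom_pmf(c, n, pi0) <= alpha:
        cumulative += binom_pmf(c, n, pi0)
        c += 1
    return c, cumulative


cutoff, tail_prob = lower_tail_cutoff(n=10, pi0=0.5, alpha=0.05)
print(cutoff, round(tail_prob, 4))  # 2 0.0107
```

With n = 10 and a null proportion of .5 the loop stops at a cutoff of 2, which matches the rule "Reject Ho if Y < 2".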
E. ANOTHER ILLUSTRATION:
Assume that a random sample of \(n = 25\) New York City ghetto residents has been assembled and that their average income is $23,500 with a standard deviation estimate of $8,000. "Could one reject \(H_0: \mu = \$26{,}000\) (the average income for N.Y. City residents) in favor of \(H_A: \mu < \$26{,}000\) at the .05 significance level?"
Given that \(n = 25\) and \(\alpha = .05\) (and making the unlikely assumption that incomes within the ghetto are normally distributed), we can use Table B to find that \(t_{24,.05} = 1.711\). Moreover, the standard deviation of the sample means (i.e., the standard error) can be estimated as follows:
    \[ \hat{\sigma}_{\bar{X}} = \frac{\hat{\sigma}}{\sqrt{n}} = \frac{\$8{,}000}{\sqrt{25}} = \$1{,}600 \]
The critical value for \(\bar{X}\) is thus
    \[ \mu_0 - (t_{24,.05} \times \hat{\sigma}_{\bar{X}}) \]
OR
    \[ \$26{,}000 - (1.711 \times \$1{,}600) = \$23{,}262.40 \]
Accordingly, the rejection rule is, "Reject \(H_0\) if \(\bar{X} < \$23{,}262.40\)." Because \(\bar{X} = \$23{,}500\) falls in the acceptance region (i.e., because $23,500 > $23,262.40), we fail to reject the null hypothesis.
Note that one can also test hypotheses by standardizing our statistics, and taking critical values directly from Tables A or B. In this illustration, we could calculate an empirical value of \(t\) that would measure how many standard deviations $23,500 (i.e., \(\bar{X}\)) is away from $26,000 (i.e., \(\mu_0\)):
    \[ t_{24} = \frac{\bar{X} - \mu_0}{\hat{\sigma}_{\bar{X}}} = \frac{23{,}500 - 26{,}000}{1{,}600} = -1.5625 \]
One can also set up a rejection rule in a standardized form. For example, one could use the rejection rule, "Reject \(H_0\) if \(t_{24} < -1.711\)." Because \(t_{24} = -1.5625 > -t_{24,.05} = -1.711\), we fail to reject \(H_0\) as before. (As an exercise, you may wish to convince yourself algebraically that both methods of testing hypotheses always lead you to the same conclusion.)
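Both versions of the test can be run side by side. The sketch below takes its numbers from the illustration; the critical t value 1.711 is copied from the text (Table B), not computed, and the variable names are made up for the example.

```python
from math import sqrt

n = 25
x_bar = 23_500.0  # sample mean income
s = 8_000.0       # standard deviation estimate
mu0 = 26_000.0    # mean under the null hypothesis
t_crit = 1.711    # t(24, .05), copied from the text

std_err = s / sqrt(n)                    # 1600.0
critical_value = mu0 - t_crit * std_err  # about 23262.40
t_stat = (x_bar - mu0) / std_err         # -1.5625

reject_raw = x_bar < critical_value  # rule stated in dollars
reject_std = t_stat < -t_crit        # rule stated in standard errors

print(std_err, round(critical_value, 2), t_stat)
print(reject_raw, reject_std)  # False False
```

Both flags are False, so the two forms of the rule lead to the same decision here: fail to reject the null hypothesis.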
F. To standardize or not to standardize ...
1. As just noted, rejection rules can be set up in two ways. For example, in the New York City problem your rejection rule could have been stated in either of the following ways:
a. Reject the null hypothesis that New York City ghetto residents earn the same as New York City residents if the average ghetto income (\(\bar{X}\)) is less than $23,262.40.
b. Reject the null hypothesis that New York City ghetto residents earn the same as New York City residents if the average ghetto income (\(\bar{X}\)) is more than 1.711 standard errors below $26,000.
2. Both ways of expressing a rejection rule are correct. In homework problems (unless one or the other is specifically requested) you may use either. However, be sure that you understand how to use both.
G. \(\alpha\)-LEVEL versus P-VALUE
1. The level of significance (or \(\alpha\)-level) is the probability of rejecting \(H_0\) despite the fact that \(H_0\) is true.
2. The P-value is the probability under the null hypothesis (i.e., assuming \(H_0\) to be true) of obtaining a value for a statistic as supportive or more supportive of \(H_A\) than the value derived from the data in one's sample. Let's start with two examples:
a. A recent international survey found that the Netherlands has taller citizens on average than every other country in the world. On average, Hollanders over the age of 18 are 5'10" (i.e., 70 inches) tall. You wish to test whether Hollanders who live in the country's two northern provinces, Friesland and Groningen, are (as rumored) taller than the average Hollander. You randomly sample 64 residents of the two northern provinces and find the residents' average height to equal 6 feet, with a variance of 36 squared inches. "What is the P-value associated with these results?"
Answering this question requires that we determine how many standard errors 72 inches (i.e., the mean height calculated from data on your sample of 64 Hollanders from the two provinces) is greater than 70 inches (i.e., the mean height of Hollanders overall):
    \[ z = \frac{\bar{X} - \mu_0}{\hat{\sigma} / \sqrt{n}} = \frac{72 - 70}{6 / \sqrt{64}} = \frac{2}{.75} = 2.67 \]
The P-value associated with this z-statistic is found in Table A to be \(p = \Pr(Z > 2.67) = .0038\). In words, "One would expect to find average heights as large or larger than this in only 3.8 out of a thousand samples of size 64 that were randomly sampled from a population having the same distribution of heights as that of all Hollanders." Accordingly, one may legitimately conclude that the evidence supports your (alternative) hypothesis that Hollanders living in the two northernmost provinces are taller than Hollanders in general.
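Table A can be reproduced with the standard library. The sketch below is one way to do it and is not the handout's own procedure: it gets the upper-tail probability of a standard normal from math.erfc, and the helper name upper_tail is invented for the example.

```python
from math import erfc, sqrt


def upper_tail(z):
    """Pr(Z > z) for a standard normal Z."""
    return 0.5 * erfc(z / sqrt(2))


n = 64
x_bar = 72.0     # sample mean height in inches
mu0 = 70.0       # mean height of Hollanders overall
variance = 36.0  # sample variance in squared inches

std_err = sqrt(variance) / sqrt(n)  # 6 / 8 = 0.75
z = (x_bar - mu0) / std_err
p_value = upper_tail(z)

print(round(z, 2), round(p_value, 4))  # 2.67 0.0038
```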
b. A U.S. president's approval rating was 50% before his televised apology for having had an "improper" relationship with a White House intern. Imagine that you wish to evaluate whether this rating changed after the apology was made. You draw a random sample of 50 U.S. citizens and find that 30 of them "approve of how the President is running the country."
Answering this question requires that we determine how many standard errors \(\hat{\pi} = 30/50\) (i.e., the proportion of approvers among the 50 U.S. citizens in your sample) is different from \(\pi_0 = .5\) (i.e., the proportion of approvers among U.S. citizens prior to the apology):
    \[ z = \frac{\hat{\pi} - \pi_0}{\sqrt{\pi_0 (1 - \pi_0) / n}} = \frac{.6 - .5}{\sqrt{(.5)(.5) / 50}} = \frac{.1}{.0707} = 1.41 \]
Because the wording of the question is "two-tailed" (i.e., it refers to evaluating "change" in the president's rating, not decline or improvement), the P-value associated with this z-statistic is twice the probability associated with it in Table A. That is, \(p = 2 \times \Pr(Z > 1.41) = .0793 + .0793 = .1586\). In words, "One would expect to find a change in the president's approval as large or larger than this in about 16 out of a hundred samples of size 50 that were randomly sampled from a population in which there had been no such change." Accordingly, one may legitimately conclude that the evidence does not support your (alternative) hypothesis that the president's approval rating changed after his apology.
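The two-tailed calculation can be sketched the same way. The standard error below is built from the null proportion (.5), not from the sample proportion. The normal tail again comes from math.erfc and the helper name is invented for the example. Because the code does not round z to 1.41 before looking up the tail, it gives about .157 where the table-based figure is .1586.

```python
from math import erfc, sqrt


def upper_tail(z):
    """Pr(Z > z) for a standard normal Z."""
    return 0.5 * erfc(z / sqrt(2))


n = 50
approvers = 30
pi0 = 0.5  # approval proportion before the apology (null value)

pi_hat = approvers / n               # 0.6
std_err = sqrt(pi0 * (1 - pi0) / n)  # uses pi0, not pi_hat
z = (pi_hat - pi0) / std_err
p_value = 2 * upper_tail(abs(z))     # two-tailed: double the one-tail area

print(round(z, 2), round(p_value, 3))  # 1.41 0.157
```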
c. Conclusion: Now when you see \(p = .097\) in a journal article, you know that this refers to some statistic's P-value. (Also please remember that \(\pi_0\), not \(\hat{\pi}\), is used in estimating \(\sigma_{\hat{\pi}}\) in the process of finding a P-value for \(\hat{\pi}\).)
3. Here's a more general formulation: Let \(\hat{\theta}\) be an unbiased estimator of the parameter \(\theta\) from some population. Further, let \(\hat{\theta}_1\) be the value of the statistic, \(\hat{\theta}\), computed using data from the first random sample (among many possible random samples) from the population.
a. If \(H_A: \theta > \theta_0\), then \(p = \Pr(\hat{\theta} > \hat{\theta}_1 \mid \theta = \theta_0)\).
b. If \(H_A: \theta < \theta_0\), then \(p = \Pr(\hat{\theta} < \hat{\theta}_1 \mid \theta = \theta_0)\).
c. If \(H_A: \theta \neq \theta_0\) and \(\hat{\theta}_1 < \theta_0\), then \(p = 2 \times \Pr(\hat{\theta} < \hat{\theta}_1 \mid \theta = \theta_0)\).
d. If \(H_A: \theta \neq \theta_0\) and \(\hat{\theta}_1 > \theta_0\), then \(p = 2 \times \Pr(\hat{\theta} > \hat{\theta}_1 \mid \theta = \theta_0)\).
4. Thus, for example, if \(\bar{X}_1 = -5\) is the average "affirmative action approval score" in our first-and-only random sample of Ku Klux Klansmen (KKK members) and if our null hypothesis is that the average KKK score is zero (i.e., \(H_0: \mu = \mu_0 = 0\)), then we can divide the sampling distribution into three parts:
[Figure: sampling distribution of \(\bar{X}\) under \(H_0\), divided into area [1] (below -5), area [2] (between -5 and +5), and area [3] (above +5)]
a. If \(H_A: \mu > 0\), then \(p = \Pr(\bar{X} > -5 \mid \mu = 0) = [2] + [3]\).
b. If \(H_A: \mu < 0\), then \(p = \Pr(\bar{X} < -5 \mid \mu = 0) = [1]\).
c. If \(H_A: \mu \neq 0\), and given that \(\bar{X}_1 = -5 < \mu_0 = 0\), then \(p = 2 \times \Pr(\bar{X} < -5 \mid \mu = 0) = 2 \times [1] = [1] + [3]\), given the symmetry of the sampling distribution of \(\bar{X}\).
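The general formulation can be collected into one small function. The sketch below rests on assumptions that are not in the handout: it treats the sampling distribution of the estimator as normal with a known standard error, and the value 2.5 in the usage lines is invented, since the KKK example gives no standard error. The function name and the alternative labels are likewise hypothetical.

```python
from math import erfc, sqrt


def p_value(estimate, theta0, std_err, alternative):
    """P-value for H0: theta = theta0, assuming a normal sampling distribution."""
    z = (estimate - theta0) / std_err
    upper = 0.5 * erfc(z / sqrt(2))   # Pr(theta_hat > estimate | theta = theta0)
    lower = 0.5 * erfc(-z / sqrt(2))  # Pr(theta_hat < estimate | theta = theta0)
    if alternative == "greater":
        return upper
    if alternative == "less":
        return lower
    if alternative == "two-sided":
        # Twice the tail on the side where the estimate actually fell.
        return 2 * min(upper, lower)
    raise ValueError(f"unknown alternative: {alternative!r}")


# Sample mean of -5 with a hypothetical standard error of 2.5.
for alt in ("greater", "less", "two-sided"):
    print(alt, round(p_value(-5.0, theta0=0.0, std_err=2.5, alternative=alt), 4))
```

In the notation of the three-part figure, "less" returns area [1], "greater" returns [2] + [3], and "two-sided" returns 2 x [1].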
5. A final illustration:
"What would be the \(\alpha\)-level and the P-value for the mean income from a sample of 250 New York City ghetto residents, where we are testing the same hypotheses at the same significance level as tested before (on a sample of 25) and where, like before, the sample mean equals $23,500 and the standard deviation estimate equals $8,000?"
Our null and alternative hypotheses are again as follows:
    \[ H_0: \mu = \$26{,}000 \]
    \[ H_A: \mu < \$26{,}000 \]
Note that finding the \(\alpha\)-level requires no calculation, because it is a number determined by the statistician (in conjunction with the scientific community that provides audience to her analysis). To retain consistency with the previous illustration, we might decide to allow \(\alpha = .05\) in this illustration as well.
More calculating is required to find a P-value, however. Based on the central limit theorem we know that
    \[ \bar{X} \sim N(\mu, \sigma^2_{\bar{X}}) \]
Given this (and after converting units to thousands of dollars), a z-statistic can be calculated as follows:
    \[ z = \frac{\bar{X} - \mu_0}{\hat{\sigma} / \sqrt{n}} = \frac{23.5 - 26}{8 / \sqrt{250}} = \frac{-2.5}{.506} = -4.94 \]
Referring to Table A we find that \(\Pr(Z < -4.5) = .0000034\) and \(\Pr(Z < -5.0) = .000000287\). Thus one would expect to find an average income of $23,500 or less in approximately 3 in 10 million samples of size 250 that were randomly sampled from a population in which the true average income were $26,000. In other words, with a sample ten times larger than before, the same results strongly support the (alternative) hypothesis that ghetto residents earn less than New York City residents do overall.
NOTE: Researchers commonly report such small P-values as p < .001.
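The effect of the larger sample can be checked directly. The sketch below recomputes the z-statistic for n = 250 and takes the lower-tail probability from math.erfc (a stand-in for Table A, with an invented helper name). The resulting P-value should land between the two table entries quoted above.

```python
from math import erfc, sqrt


def lower_tail(z):
    """Pr(Z < z) for a standard normal Z."""
    return 0.5 * erfc(-z / sqrt(2))


n = 250
x_bar = 23.5  # sample mean, thousands of dollars
s = 8.0       # standard deviation estimate, thousands of dollars
mu0 = 26.0    # mean under the null hypothesis, thousands of dollars

std_err = s / sqrt(n)
z = (x_bar - mu0) / std_err
p_value = lower_tail(z)

print(round(z, 2))     # -4.94
print(p_value < 0.001)  # True
print(lower_tail(-5.0) < p_value < lower_tail(-4.5))  # True
```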
HANDOUT
CONFIDENCE INTERVALS vs. HYPOTHESIS TESTS
I. Confidence Intervals are calculated around a statistic AFTER data are collected:
A. When calculating a mean using interval-level data,
1) if \(n > 30\), then use \(\bar{X} \pm z_{\alpha/2} \, \hat{\sigma}_{\bar{X}}\), or
2) if \(n \le 30\), then use \(\bar{X} \pm t_{n-1,\alpha/2} \, \hat{\sigma}_{\bar{X}}\), where \(\hat{\sigma}_{\bar{X}} = \hat{\sigma}_X / \sqrt{n}\).
3) Note that when using the t-distribution, one must assume that \(X \sim N(\mu_X, \sigma^2_X)\).
B. When calculating a proportion from nominal-level data,
1) if either \(n \times \hat{\pi} < 5\) or \(n \times (1 - \hat{\pi}) < 5\), then use the binomial distribution to determine the range of proportions such that no more than \(\alpha/2\) of the proportions in your sampling distribution are higher than the upper bound of the interval AND such that no more than \(\alpha/2\) of the proportions are lower than the lower bound, or
2) if both \(n \times \hat{\pi} > 5\) and \(n \times (1 - \hat{\pi}) > 5\), then use \(\hat{\pi} \pm z_{\alpha/2} \sqrt{\hat{\pi}(1 - \hat{\pi}) / n}\).
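The two large-sample intervals can be sketched with the standard library's statistics.NormalDist, which supplies the z value for a given alpha. The function names are invented for the example, and the usage lines reuse the Hollander sample (mean 72, standard deviation 6, n = 64) and the approval sample (30 of 50) from earlier in these notes. The t-based interval is left out because the standard library has no t table.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_mean(x_bar, s, n, alpha=0.05):
    """Large-sample (n > 30) interval: x_bar +/- z(alpha/2) * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width


def z_interval_proportion(successes, n, alpha=0.05):
    """Normal-approximation interval for a proportion."""
    p_hat = successes / n
    if n * p_hat < 5 or n * (1 - p_hat) < 5:
        raise ValueError("sample too small: use the binomial distribution")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


mean_lo, mean_hi = z_interval_mean(x_bar=72.0, s=6.0, n=64)
prop_lo, prop_hi = z_interval_proportion(successes=30, n=50)
print(round(mean_lo, 2), round(mean_hi, 2))  # 70.53 73.47
print(round(prop_lo, 3), round(prop_hi, 3))  # 0.464 0.736
```

Unlike the P-value calculation, the interval for a proportion uses the sample proportion in its standard error, because no null value is being assumed.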
II. Hypotheses are decided upon BEFORE data are collected:
A. One-tailed hypotheses:
1) When \(H_A: \mu > \mu_0\),
a) if \(n > 30\), then the rejection rule is reject \(H_0\) if \(\bar{X} > \mu_0 + z_{\alpha} \, \hat{\sigma}_{\bar{X}}\).
b) if \(n \le 30\), then the rejection rule is reject \(H_0\) if \(\bar{X} > \mu_0 + t_{n-1,\alpha} \, \hat{\sigma}_{\bar{X}}\).
2) When \(H_A: \mu < \mu_0\), then the rejection rules are the same as in 1), except that one rejects \(H_0\) if \(\bar{X} < \mu_0 - z_{\alpha} \, \hat{\sigma}_{\bar{X}}\) (or, if \(n \le 30\), if \(\bar{X} < \mu_0 - t_{n-1,\alpha} \, \hat{\sigma}_{\bar{X}}\)).
B. Two-tailed hypotheses:
When \(H_A: \mu \neq \mu_0\), then the rejection rule is reject \(H_0\) when \(\bar{X}\) falls outside the interval \(\mu_0 \pm z_{\alpha/2} \, \hat{\sigma}_{\bar{X}}\) (or, if \(n \le 30\), \(\mu_0 \pm t_{n-1,\alpha/2} \, \hat{\sigma}_{\bar{X}}\)).
C. When testing hypotheses about proportions, use the above tests (with z, not t) ONLY IF \(n \times \pi_0 > 5\) AND \(n \times (1 - \pi_0) > 5\). Otherwise the binomial distribution is used to determine how large or small \(\hat{\pi}\) must be in order for the null hypothesis to be rejected at a specific \(\alpha\)-level.
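The large-sample one-tailed rule can be tied back to the earlier NOTE on precision. The sketch below assumes n > 30, so that z applies, and uses statistics.NormalDist for the z value; the function name is invented for the example. The numbers are the New York City figures (null mean $26,000, standard deviation estimate $8,000), tried at two sample sizes to show the precision shrinking as n grows.

```python
from math import sqrt
from statistics import NormalDist


def lower_tail_rule(mu0, s, n, alpha=0.05):
    """Return (precision, critical_value) for HA: mu < mu0 when n > 30."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    precision = z_alpha * s / sqrt(n)  # distance from mu0 to the critical value
    return precision, mu0 - precision


for n in (250, 1000):
    precision, critical_value = lower_tail_rule(mu0=26_000, s=8_000, n=n)
    print(n, round(precision, 2), round(critical_value, 2))
```

A sample mean of $23,500 lies below the critical value at either sample size, so the null hypothesis would be rejected in both cases.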