HYPOTHESIS TESTING

A. DEFINITION: A hypothesis is a statement, made prior to data collection, about the value of a population parameter (and thus about the values a sample statistic is likely to take). Within the context of hypothesis testing, one speaks of the "null hypothesis" and of the "alternative hypothesis."

B. The NULL hypothesis (\(H_0\))
   1. is always of the form \(H_0: \theta = \theta_0\), and
   2. is always worded as the case with NO EFFECTS.
   3. Rejecting \(H_0\) means there is EVIDENCE of effects. Failing to reject \(H_0\) means there is no (statistically significant) evidence of effects; it does not prove that there are no effects.

C. More vocabulary of hypothesis testing:
   1. Before collecting data, the statistician chooses (i.e., the statistician is an intentional agent, taking full responsibility for the choice of)
      a. a value that (in accordance with the null hypothesis) we assume to be the value of \(\theta\) (the parameter of interest). This "value of the parameter under the null hypothesis" will be referred to with the symbol \(\theta_0\);
      b. the form of an alternative hypothesis. Alternative hypotheses are either one- or two-tailed and take the following forms:
         1) \(H_A: \theta > \theta_0\)   (one-tailed alternative hypothesis)
         2) \(H_A: \theta < \theta_0\)   (one-tailed alternative hypothesis)
         3) \(H_A: \theta \neq \theta_0\)   (two-tailed alternative hypothesis)
         NOTE: One-tailed hypothesis tests specify the direction in which you believe there will be effects. For example, you might wish to test whether or not ghetto residents earn less (but not more) than others.
      c. a SIGNIFICANCE LEVEL that sets the probability that you are willing to wrongly reject the null hypothesis.
         The significance level (or LEVEL OF SIGNIFICANCE, or \(\alpha\)-LEVEL) will often be represented by the Greek letter \(\alpha\). In the next section, \(\alpha\) will also be referred to as the "probability of a Type I error" (i.e., as the probability of rejecting the null hypothesis given that the null hypothesis is true).

         [Cartoon: "How do you want it, the crystal mumbo jumbo or statistical probability?"]

      d. a PRECISION, \(\Delta\), that sets the minimum substantive amount greater or smaller than \(\theta_0\) that you are interested in detecting as statistically significant at the significance level you have chosen. Put differently, precision is the distance from \(\theta_0\) such that a smaller distance would be too small to be of theoretical interest.
         NOTE: You can decrease the precision used in your hypothesis test by increasing your sample size and, conversely, you can increase your precision by decreasing your sample size. Likewise, an increase in your significance level yields a decrease in precision, and a decrease in significance level yields an increase in precision. (For a test based on the t or z distribution, \(\Delta\) is the tabled critical value times the standard error; a larger sample shrinks the standard error, and a larger \(\alpha\) shrinks the tabled critical value.) It is important to recognize that the smaller the precision (i.e., \(\Delta\)), the more precise your hypothesis test will be. Put differently, increasing the precision of your test (in everyday usage) entails decreasing the precision (i.e., \(\Delta\)) of the test. (Sorry, but that is how statisticians use this term.)
   2. CRITICAL VALUES
      Whether you have one or two critical values depends on the form you have chosen for an alternative hypothesis.
      After this, your choices of \(\theta_0\) and \(\Delta\) determine what the critical value(s) will be: \(\theta_0 + \Delta\), \(\theta_0 - \Delta\), or both. A critical value's meaning depends upon which alternative-hypothesis form was chosen:
      a. In a two-tailed test, values along the interval between the critical values are estimates of the parameter that are not significantly different from \(\theta_0\).
      b. The critical value when \(H_A: \theta > \theta_0\) is the minimum value at which the parameter's estimate could be considered significantly larger than \(\theta_0\).
      c. The critical value when \(H_A: \theta < \theta_0\) is the maximum value at which the parameter's estimate could be considered significantly smaller than \(\theta_0\).
   3. REJECTION RULES
      Each hypothesis test has a single rejection rule associated with it. This rejection rule determines when \(H_0\) is to be rejected. The rule always directs one to reject \(H_0\) when the parameter's estimated value (here, \(\hat{\theta}\)) is substantively different from \(\theta_0\).
      a. In a two-tailed test, the rejection rule is always that \(H_0\) is to be rejected if \(\hat{\theta}\) does not fall between the two critical values (i.e., if \(\hat{\theta} < \theta_0 - \Delta\) or \(\hat{\theta} > \theta_0 + \Delta\)).
      b. When \(H_A: \theta > \theta_0\), the rejection rule is always that \(H_0\) is to be rejected if \(\hat{\theta} > \theta_0 + \Delta\).
      c. When \(H_A: \theta < \theta_0\), the rejection rule is always that \(H_0\) is to be rejected if \(\hat{\theta} < \theta_0 - \Delta\).
   4. REJECTION REGIONS
      If, after data are collected, \(\hat{\theta}\) is found to be of a magnitude that is substantively different from \(\theta_0\) (leading you to reject \(H_0\)), you may speak of \(\hat{\theta}\) as "having fallen into" the rejection region (or CRITICAL REGION).

D. An ILLUSTRATION: Imagine that we have a "random sample" of \(n = 10\) contract killers.
   One theory of deviance claims that people become contract killers only if they had no religious upbringing. During a recent national survey it was found that half of Americans had a religious upbringing (i.e., \(\pi_0 = .5\)). "What proportion of your sample of contract killers must have had a religious upbringing in order to have evidence at the .05 level of significance in support of the deviance theory?"
   1. Note that \(n\pi_0 \le 5\) (and, by the way, \(n(1 - \pi_0) \le 5\)), and thus that the binomial distribution is called for here. Also note that the required hypothesis test is one-tailed, because the deviance theory suggests that contract killers will be less likely to have had a religious upbringing than others. The null and alternative hypotheses are thus as follows:
      \(H_0: \pi = .5\)
      \(H_A: \pi < .5\)
   2. Some probabilities, where \(Y\) is the number of contract killers in the sample who had a religious upbringing:
      \[ \Pr(Y = 0) = \binom{10}{0}(.5)^{10} = \tfrac{1}{1024} \approx .001 \]
      \[ \Pr(Y = 1) = \binom{10}{1}(.5)^{10} = \tfrac{10}{1024} \approx .010 \]
      \[ \Pr(Y = 2) = \binom{10}{2}(.5)^{10} = \tfrac{45}{1024} \approx .044 \]
      We do NOT need to calculate any more probabilities to set up our rejection rule. This can be easily shown by examining as much of the probability distribution of \(Y\) as we have calculated so far. Note that \(\Pr(Y \le 1) = \tfrac{11}{1024} \approx .011 < .05\), whereas \(\Pr(Y \le 2) = \tfrac{56}{1024} \approx .055 > .05\).
      Accordingly, the REJECTION RULE in this illustration is, "Reject \(H_0\) if \(Y < 2\)." (It would not be appropriate to reject \(H_0\) if \(Y < 3\), because then you would no longer be applying the .05 significance level: the probability of a Type I error would be about .055.) Once data were collected, you would need to have 0 or 1 (but no more) contract killers out of the ten (i.e., a sample proportion of .1 or less) to have had a religious upbringing in order that the data would provide statistically significant evidence (at \(\alpha = .05\)) in support of the deviance theory. Got it?

E. ANOTHER ILLUSTRATION: Assume that a random sample of \(n = 25\) New York City ghetto residents has been assembled and that their average income is $23,500 with a standard deviation estimate of $8,000.
   "Could one reject \(H_0: \mu = \$26{,}000\) (the average income for N.Y. City residents) in favor of \(H_A: \mu < \$26{,}000\) at the .05 significance level?"
   Given that \(n = 25\) and \(\alpha = .05\) (and making the unlikely assumption that incomes within the ghetto are normally distributed), we can use Table B to find that \(t_{24,.05} = 1.711\). Moreover, the standard deviation of the sample means (i.e., the standard error) can be estimated as follows:
      \[ \hat{\sigma}_{\bar{X}} = \frac{\hat{\sigma}}{\sqrt{n}} = \frac{\$8{,}000}{\sqrt{25}} = \$1{,}600 \]
   The critical value for \(\bar{X}\) is thus
      \[ \mu_0 - \left(t_{24,.05} \cdot \hat{\sigma}_{\bar{X}}\right) = \$26{,}000 - (1.711 \cdot \$1{,}600) = \$23{,}262.40 \]
   (so that here \(\Delta = 1.711 \cdot \$1{,}600 = \$2{,}737.60\)). Accordingly, the rejection rule is, "Reject \(H_0\) if \(\bar{X} < \$23{,}262.40\)." Because \(\bar{X} = \$23{,}500\) falls in the acceptance region (i.e., because \(\$23{,}500 > \$23{,}262.40\)), we fail to reject the null hypothesis.
   Note that one can also test hypotheses by standardizing our statistics and taking critical values directly from Tables A or B. In this illustration, we could calculate an empirical value of \(t\) that would measure how many standard errors $23,500 (i.e., \(\bar{X}\)) is away from $26,000 (i.e., \(\mu_0\)):
      \[ t_{24} = \frac{\bar{X} - \mu_0}{\hat{\sigma}_{\bar{X}}} = \frac{\$23{,}500 - \$26{,}000}{\$1{,}600} = -1.5625 \]
   One can also set up a rejection rule in a standardized form. For example, one could use the rejection rule, "Reject \(H_0\) if \(t_{24} < -1.711\)." Because \(t_{24} = -1.5625 > -t_{24,.05} = -1.711\), we fail to reject \(H_0\) as before. (As an exercise, you may wish to convince yourself algebraically that both methods of testing hypotheses always lead you to the same conclusion.)

F. To standardize or not to standardize ...
   1. As just noted, rejection rules can be set up in two ways.
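      The two ways can be checked side by side for the New York City illustration in E. The following is a minimal Python sketch using only the standard library; the helper `reject_h0` and its string labels for the alternatives are hypothetical names introduced here, and they simply encode rules a to c of section C.3 with \(\Delta = t_{24,.05}\,\hat{\sigma}_{\bar{X}}\). The critical value 1.711 is read from Table B rather than computed.

```python
import math


def reject_h0(theta_hat, theta0, delta, alternative):
    """Apply rejection rules a to c of section C.3.

    alternative: "greater" (H_A: theta > theta0), "less" (H_A: theta < theta0),
    or "two-sided" (H_A: theta != theta0).
    """
    if alternative == "greater":
        return theta_hat > theta0 + delta
    if alternative == "less":
        return theta_hat < theta0 - delta
    if alternative == "two-sided":
        return theta_hat < theta0 - delta or theta_hat > theta0 + delta
    raise ValueError(f"unknown alternative: {alternative!r}")


# New York City illustration (section E)
n, x_bar, s, mu0 = 25, 23_500.0, 8_000.0, 26_000.0
t_crit = 1.711                      # t_{24,.05}, read from Table B

se = s / math.sqrt(n)               # estimated standard error: 1,600
delta = t_crit * se                 # precision: about 2,737.60
critical_value = mu0 - delta        # about 23,262.40

# Way 1: compare x_bar with the critical value in dollars
reject_raw = reject_h0(x_bar, mu0, delta, "less")

# Way 2: standardize x_bar and compare t with -t_crit
t_stat = (x_bar - mu0) / se         # -1.5625
reject_std = t_stat < -t_crit

print(f"SE = {se:.2f}, critical value = {critical_value:.2f}, t = {t_stat:.4f}")
print("reject H0 (raw)?", reject_raw, "| reject H0 (standardized)?", reject_std)
```

      Both comparisons return False, matching the conclusion in E; since \(\bar{X} < \mu_0 - t\,\hat{\sigma}_{\bar{X}}\) holds exactly when \((\bar{X} - \mu_0)/\hat{\sigma}_{\bar{X}} < -t\), the two ways cannot disagree.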
      For example, in the New York City problem your rejection rule could have been stated in either of the following ways:
      a. Reject the null hypothesis that New York City ghetto residents earn the same as New York City residents if the average ghetto income (\(\bar{X}\)) is less than $23,262.40.
      b. Reject the null hypothesis that New York City ghetto residents earn the same as New York City residents if the average ghetto income (\(\bar{X}\)) is more than 1.711 standard errors below $26,000.
   2. Both ways of expressing a rejection rule are correct. In homework problems (unless one or the other is specifically requested) you may use either. However, be sure that you understand how to use both.

G. \(\alpha\)-LEVEL versus P-VALUE
   1. The level of significance (or \(\alpha\)-level) is the probability of rejecting \(H_0\) despite the fact that \(H_0\) is true.
   2. The P-value is the probability under the null hypothesis (i.e., assuming \(H_0\) to be true) of obtaining a value for a statistic as supportive or more supportive of \(H_A\) than the value derived from the data in one's sample. Let's start with two examples:
      a. A recent international survey found that the Netherlands has taller citizens on average than every other country in the world. On average, Hollanders over the age of 18 are 5'10" (i.e., 70 inches) tall. You wish to test whether Hollanders who live in the country's two northern provinces, Friesland and Groningen, are (as rumored) taller than the average Hollander.
         You randomly sample 64 residents of the two northern provinces and find the residents' average height to equal 6 feet (72 inches), with a variance of 36 squared inches. "What is the P-value associated with these results?"
         Answering this question requires that we determine how many standard errors 72 inches (i.e., the mean height calculated from data on your sample of 64 Hollanders from the two provinces) is greater than 70 inches (i.e., the mean height of Hollanders overall):
            \[ \hat{\sigma}_{\bar{X}} = \frac{\sqrt{36}}{\sqrt{64}} = \frac{6}{8} = .75 \text{ inches}, \qquad z = \frac{72 - 70}{.75} \approx 2.67 \]
         The P-value associated with this z-statistic is found in Table A to be \(p = \Pr(Z > 2.67) = .0038\). In words, "One would expect to find average heights as large as or larger than this in only 3.8 out of a thousand samples of size 64 that were randomly sampled from a population having the same distribution of heights as that of all Hollanders." Accordingly, one may legitimately conclude that the evidence supports your (alternative) hypothesis that Hollanders living in the two northernmost provinces are taller than Hollanders in general.
      b. A U.S. president's approval rating was 50% before his televised apology for having had an "improper" relationship with a White House intern. Imagine that you wish to evaluate whether this rating changed after the apology was made. You draw a random sample of 50 U.S. citizens and find that 30 of them "approve of how the President is running the country." Answering this question requires that we determine how many standard errors \(\hat{\pi} = 30/50\) (i.e., the proportion of approvers among the 50 U.S.
         citizens in your sample) is different from \(\pi_0 = .5\) (i.e., the proportion of approvers among U.S. citizens prior to the apology):
            \[ \sigma_{\hat{\pi}} = \sqrt{\frac{\pi_0(1 - \pi_0)}{n}} = \sqrt{\frac{(.5)(.5)}{50}} \approx .0707, \qquad z = \frac{.6 - .5}{.0707} \approx 1.41 \]
         Because the wording of the question is "two-tailed" (i.e., it refers to evaluating "change," not decline or improvement, in the president's rating), the P-value associated with this z-statistic is twice the probability associated with it in Table A. That is, \(p = 2 \cdot \Pr(Z > 1.41) = 2 \cdot .0793 = .1586\). In words, "One would expect to find a change in the president's approval as large as or larger than this in about 16 out of a hundred samples of size 50 that were randomly sampled from a population in which there had been no such change." Accordingly, one may legitimately conclude that the evidence does not support your (alternative) hypothesis that the president's approval rating changed after his apology.
      c. Conclusion: Now when you see \(p = .097\) in a journal article, you know that this refers to some statistic's P-value. (Also please remember that \(\pi_0\), not \(\hat{\pi}\), is used in estimating \(\sigma_{\hat{\pi}}\) in the process of finding a P-value for \(\hat{\pi}\).)
   3. Here's a more general formulation: Let \(\hat{\theta}\) be an unbiased estimator of the parameter \(\theta\) from some population. Further, let \(\hat{\theta}_1\) be the value of the statistic \(\hat{\theta}\) computed using data from the first random sample (among many possible random samples) from the population.
      a. If \(H_A: \theta > \theta_0\), then \(p = \Pr(\hat{\theta} > \hat{\theta}_1 \mid \theta = \theta_0)\).
      b. If \(H_A: \theta < \theta_0\), then \(p = \Pr(\hat{\theta} < \hat{\theta}_1 \mid \theta = \theta_0)\).
      c. If \(H_A: \theta \neq \theta_0\) and \(\hat{\theta}_1 > \theta_0\), then \(p = 2 \cdot \Pr(\hat{\theta} > \hat{\theta}_1 \mid \theta = \theta_0)\).
      d. If \(H_A: \theta \neq \theta_0\) and \(\hat{\theta}_1 < \theta_0\), then \(p = 2 \cdot \Pr(\hat{\theta} < \hat{\theta}_1 \mid \theta = \theta_0)\).
   4. Thus, for example, if \(\bar{X}_1 = -5\) is the average "affirmative action approval score" in our first-and-only random sample of Ku Klux Klansmen (KKK members), and if our null hypothesis is that the average KKK score is zero (i.e., \(H_0: \mu = \mu_0 = 0\)), then we can divide the sampling distribution into three parts: [1] the area below \(-5\), [2] the area between \(-5\) and \(+5\), and [3] the area above \(+5\).
      a. If \(H_A: \mu > 0\), then \(p = \Pr(\bar{X} > -5 \mid \mu = 0) = [2] + [3]\).
      b. If \(H_A: \mu < 0\), then \(p = \Pr(\bar{X} < -5 \mid \mu = 0) = [1]\).
      c. If \(H_A: \mu \neq 0\), then, given that \(\bar{X}_1 = -5 < \mu_0 = 0\), \(p = 2 \cdot \Pr(\bar{X} < -5 \mid \mu = 0) = 2 \cdot [1] = [1] + [3]\), given the symmetry of the sampling distribution of \(\bar{X}\).
   5. A final illustration: "What would be the \(\alpha\)-level and the P-value for the mean income from a sample of 250 New York City ghetto residents, where we are testing the same hypotheses at the same significance level as tested before (on a sample of 25) and where, like before, the sample mean equals $23,500 and the standard deviation estimate equals $8,000?"
      Our null and alternative hypotheses are again as follows:
         \(H_0: \mu = \$26{,}000\)
         \(H_A: \mu < \$26{,}000\)
      Note that finding the \(\alpha\)-level requires no calculation, because it is a number determined by the statistician (in conjunction with the scientific community that provides the audience for her analysis). To retain consistency with the previous illustration, we might decide to allow \(\alpha = .05\) in this illustration as well.
      More calculating is required to find a P-value, however. Based on the central limit theorem we know that, under \(H_0\), \(\bar{X}\) is approximately distributed as \(N(\mu_0, \sigma^2/n)\), with \(\sigma\) estimated here by $8,000. Given this (and after converting units to thousands of dollars), a z-statistic can be calculated as follows:
         \[ z = \frac{\bar{X} - \mu_0}{\hat{\sigma}/\sqrt{n}} = \frac{23.5 - 26}{8/\sqrt{250}} \approx \frac{-2.5}{.506} \approx -4.94 \]
      Referring to Table A we find that \(\Pr(Z < -4.5) = .0000034\) and \(\Pr(Z < -5.0) = .000000287\), so \(p = \Pr(Z < -4.94)\) lies between these two values, close to the second.
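      The P-values in G.2 and G.5 can also be reproduced without Table A. Below is a minimal sketch, assuming the normal approximation used in the text; `upper_tail` and `p_value` are hypothetical helper names, with `p_value` following rules a to d of G.3 in standardized form, and the standard normal tail is computed from Python's `math.erfc`. Because it uses unrounded z-statistics, its results differ slightly from the Table A lookups: about .0038 for the heights, about .157 (rather than .1586) for the approval rating, and about 3.9 × 10^-7 for the 250-resident income sample.

```python
import math


def upper_tail(z):
    """Pr(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def p_value(z, alternative):
    """P-value of a standardized statistic z, following rules a to d of G.3.

    alternative: "greater" (H_A: theta > theta0), "less" (H_A: theta < theta0),
    or "two-sided" (H_A: theta != theta0; doubles the tail beyond |z|).
    """
    if alternative == "greater":
        return upper_tail(z)
    if alternative == "less":
        return upper_tail(-z)
    if alternative == "two-sided":
        return 2 * upper_tail(abs(z))
    raise ValueError(f"unknown alternative: {alternative!r}")


# a. Holland heights: n = 64, sample mean 72, variance 36, mu0 = 70
z_height = (72 - 70) / (math.sqrt(36) / math.sqrt(64))

# b. Presidential approval: 30 of 50 approve, pi0 = .5 (pi0, not pi-hat, in the SE)
pi_hat, pi0, n_poll = 30 / 50, 0.5, 50
z_approval = (pi_hat - pi0) / math.sqrt(pi0 * (1 - pi0) / n_poll)

# 5. Ghetto incomes in thousands of dollars: n = 250, mean 23.5, s = 8, mu0 = 26
z_income = (23.5 - 26) / (8 / math.sqrt(250))

for label, z, alt in [("heights", z_height, "greater"),
                      ("approval", z_approval, "two-sided"),
                      ("income, n = 250", z_income, "less")]:
    print(f"{label}: z = {z:.2f}, p = {p_value(z, alt):.3g}")
```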
      Thus one would expect to find an average income of $23,500 or less in roughly 4 out of 10 million samples of size 250 that were randomly sampled from a population in which the true average income was $26,000. In other words, with a sample ten times larger than before, the same results strongly support the (alternative) hypothesis that ghetto residents earn less than New York City residents do overall.
      NOTE: Researchers commonly report such small P-values as \(p < .001\).

HANDOUT
CONFIDENCE INTERVALS vs. HYPOTHESIS TESTS

I. Confidence intervals are calculated around a statistic AFTER data are collected:
   A. When calculating a mean using interval-level data,
      1) if \(n > 30\), then use \(\bar{X} \pm z_{\alpha/2}\,\hat{\sigma}_{\bar{X}}\), or
      2) if \(n \le 30\), then use \(\bar{X} \pm t_{n-1,\alpha/2}\,\hat{\sigma}_{\bar{X}}\).
      3) Note that when using the t-distribution, one must assume that \(X \sim N(\mu_X, \sigma_X^2)\).
   B. When calculating a proportion from nominal-level data,
      1) if either \(n\hat{\pi} < 5\) or \(n(1 - \hat{\pi}) < 5\), then use the binomial distribution to determine the range of proportions such that no more than \(\alpha/2\) of the proportions in your sampling distribution are higher than the upper bound of the interval AND such that no more than \(\alpha/2\) of the proportions are lower than the lower bound, or
      2) if both \(n\hat{\pi} > 5\) and \(n(1 - \hat{\pi}) > 5\), then use \(\hat{\pi} \pm z_{\alpha/2}\,\hat{\sigma}_{\hat{\pi}}\), where \(\hat{\sigma}_{\hat{\pi}} = \sqrt{\hat{\pi}(1 - \hat{\pi})/n}\).

II. Hypotheses are decided upon BEFORE data are collected:
   A. One-tailed hypotheses:
      1) When \(H_A: \mu > \mu_0\),
         a) if \(n > 30\), then the rejection rule is reject \(H_0\) if \(\bar{X} > \mu_0 + z_{\alpha}\,\hat{\sigma}_{\bar{X}}\);
         b) if \(n \le 30\), then the rejection rule is reject \(H_0\) if \(\bar{X} > \mu_0 + t_{n-1,\alpha}\,\hat{\sigma}_{\bar{X}}\).
      2) When \(H_A: \mu < \mu_0\), then the rejection rules are the same as in 1), except that one rejects \(H_0\) if \(\bar{X} < \mu_0 - z_{\alpha}\,\hat{\sigma}_{\bar{X}}\) (or, if \(n \le 30\), if \(\bar{X} < \mu_0 - t_{n-1,\alpha}\,\hat{\sigma}_{\bar{X}}\)).
   B. Two-tailed hypotheses: When \(H_A: \mu \neq \mu_0\), then the rejection rule is reject \(H_0\) when \(\bar{X}\) falls outside the interval \(\mu_0 \pm z_{\alpha/2}\,\hat{\sigma}_{\bar{X}}\) (or \(\mu_0 \pm t_{n-1,\alpha/2}\,\hat{\sigma}_{\bar{X}}\) if \(n \le 30\)).
   C. When testing hypotheses about proportions, use the above tests (with z, not t) ONLY IF \(n\pi_0 > 5\) AND \(n(1 - \pi_0) > 5\). Otherwise the binomial distribution is used to determine how large or small \(\hat{\pi}\) must be in order for the null hypothesis to be rejected at a specific \(\alpha\)-level.
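      The binomial search described in II.C (and carried out by hand in the contract-killer illustration) can be sketched in a few lines of Python. This is a minimal sketch for a lower-tailed test, assuming \(H_A: \pi < \pi_0\) as in that illustration; `binom_pmf` and `lower_tail_cutoff` are hypothetical names, and `math.comb` requires Python 3.8 or later. It accumulates \(\Pr(Y = k)\) from \(k = 0\) upward and stops as soon as the cumulative probability would exceed \(\alpha\).

```python
from math import comb


def binom_pmf(k, n, p):
    """Pr(Y = k) for Y ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def lower_tail_cutoff(n, pi0, alpha):
    """Largest c with Pr(Y <= c | pi = pi0) <= alpha, or None if no such c exists.

    For H_A: pi < pi0 the rejection rule is then "Reject H0 if Y <= c".
    """
    cumulative, cutoff = 0.0, None
    for k in range(n + 1):
        cumulative += binom_pmf(k, n, pi0)
        if cumulative > alpha:
            break  # including this outcome would push the Type I error above alpha
        cutoff = k
    return cutoff


# Contract-killer illustration: n = 10, pi0 = .5, alpha = .05
c = lower_tail_cutoff(10, 0.5, 0.05)
if c is None:
    print("No outcome is extreme enough to reject H0 at this alpha")
else:
    actual_alpha = sum(binom_pmf(k, 10, 0.5) for k in range(c + 1))
    print(f"Reject H0 if Y <= {c} (i.e., Y < {c + 1}); actual alpha = {actual_alpha:.4f}")
```

      For the contract killers this gives \(c = 1\), i.e., the rule "Reject \(H_0\) if \(Y < 2\)", with an actual Type I error probability of 11/1024 (about .011). Because the binomial distribution is discrete, the attainable \(\alpha\) falls below the nominal .05. An upper-tailed search (for \(H_A: \pi > \pi_0\)) would accumulate from \(k = n\) downward instead.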