A. STATISTICS make understanding the world SIMPLER!
1. As a researcher you want to understand something about a particular group of people, countries, plants, animals, or whatever. That is, you are interested in a certain POPULATION (every single instance of a phenomenon).
2. The first step in making life SIMPLE is to SAMPLE a subset of your population of interest. Then you do not have to consider everybody!
3. Your first item of business is to make sure that your sample is representative of the population you are interested in. The reason for this is to ensure that you can make INFERENCES from your sample to the population.
a. Notice the distinction here between DESCRIPTION of your sample and INFERENCES from your sample to your population:
[Diagram: a sample is drawn from the population; the statistician describes the sample (descriptive statistics) and infers from the sample back to the population (inferential statistics).]
b. STATISTICS are numbers that DESCRIBE your sample and that allow you to make INFERENCES about population PARAMETERS (those characteristics you could only know if you measured everyone in your population). Such characteristics are not impossible to know if your population is, for example, the senior leaders of Ames churches, but they are nearly impossible to know if your population is that of the United States.
c. E.g., let's say we want to estimate what proportion of Americans agree that "The Taliban is evil." First, we must obtain a sample of "Americans." Then we might calculate that 35% of the people in the sample agree. (This percentage would be a DESCRIPTIVE STATISTIC.)
Finally, we might INFER that this proportion has a probability of, say, .95 of being within .02 of the true POPULATION PARAMETER. That's a pretty nice trick! The hitch is, of course, how could we be so sure of our statistic, so sure, even, that we can assign a probability to the confidence that we are correct?
d. THE TRICK: We can ensure that generalizations may legitimately be made from our sample to the population of interest if every element in this population had the same chance of being an element in the sample.
4. The definition of a SIMPLE RANDOM SAMPLE follows directly from this condition. (Definition: A group of n members of a population [of size N], having had the same chance of selection as every other group of n members.) NOTE the contrast between 'n' and 'N' here!!!
B. Once we have obtained a sample that is representative of our population of interest, there is a second way that statistics simplify things. Rather than examine all aspects of everything sampled, statisticians obtain information about specific characteristics of their samples and represent this information in a DATA MATRIX. Data matrices look as follows:

               Var1   Var2   Var3
  person #1      2      4      2
  person #2      5      7      1
  person #3      3      2      4
  ...

1. There will be exactly n rows in this data matrix, one for each unit of analysis (or phenomenon of interest) that has been sampled from the N such units in the population.
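A minimal Python sketch of drawing a simple random sample and storing it as a data matrix, under stated assumptions: the population of N = 10 "persons" and its three variables are made up for illustration, and `simple_random_sample` is a hypothetical helper name. It relies on the standard library's `random.Random.sample`, which draws without replacement, so every group of n units has the same chance of selection; the sampled units then become the n rows of the matrix.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Hypothetical helper: draw a simple random sample of n units (no replacement)."""
    if not 0 <= n <= len(population):
        raise ValueError("n must be between 0 and N")
    rng = random.Random(seed)          # seeded only so the example is reproducible
    return rng.sample(population, n)   # every n-subset has the same chance

# Made-up population of N = 10 persons, each with three illustrative variables.
population = [
    {"id": i, "Var1": i % 4, "Var2": (3 * i) % 7, "Var3": i % 5}
    for i in range(1, 11)
]
N = len(population)

sample = simple_random_sample(population, n=3, seed=401)

# Data matrix: exactly n rows (one per sampled unit), one column per variable.
columns = ["Var1", "Var2", "Var3"]
data_matrix = [[unit[col] for col in columns] for unit in sample]

print(f"N = {N}, n = {len(data_matrix)}")
print("          ", *columns)
for unit, row in zip(sample, data_matrix):
    print(f"person #{unit['id']}", *row)
```

Changing `n` changes the number of rows; changing `N` changes only how many units the sample is drawn from.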
One's unit of analysis can be "the person" (as above), "the married couple" (e.g., sampled from those attending a marital therapy clinic), "the corporation" (e.g., sampled from those listed among the Fortune 500), "the egg" (e.g., sampled from among clutches laid by sea turtles on California beaches), etc.
2. Each column in the data matrix corresponds to a distinct type of information regarding each unit of analysis. Because this information varies among units of analysis, statisticians commonly refer collectively to the information contained in any one column of this matrix as a "variable." Thus to say that a data matrix has five variables is to say that it has five columns.
C. A VARIABLE is a characteristic of units of analysis that can have more than one attribute. For example, individuals may have the characteristic gender; corporations may have the characteristic centralization; etc. Variables may be discrete or continuous.
1. DISCRETE variables have a countable set of possible attributes. E.g., gender, number of children (you can't have a half-child), income (you can't have a half-cent), religious affiliation, etc.
2. CONTINUOUS variables have an infinite number of attributes. E.g., age (someone can be 22.23156... years old), anxiety, etc.
3. NOTE: In practice, variables such as income (since they may have a large number of different attributes) are generally treated as continuous in statistical analyses. And such variables as anxiety and prejudice (since they are difficult to measure except in very gross terms) may be treated as discrete (e.g., low, medium, or high).
D. The statistician assigns a number to each unit of analysis's attribute on each variable of interest. This assignment of each attribute to one and only one number is called OPERATIONALIZATION. The attributes must have both of the following characteristics:
1. EXHAUSTIVE: You must have a number for each potential attribute. For example, some studies may not find male and female exhaustive for their purposes and may add categories (and associated numbers) for androgynous and undifferentiated.
2. MUTUALLY EXCLUSIVE: You should not have an attribute to which you have assigned two numbers, nor a unit of analysis that fits two attributes. E.g., you must be able to assign a single occupational code to the teacher who moonlights as a janitor.
E. Once you have found mutually exclusive and exhaustive categories that allow you to operationalize your measure (by assigning exactly one number to each subject's attribute), YOU MUST BE CAREFUL NOT TO PUT TOO MUCH FAITH IN THOSE NUMBERS. If you assign numbers according to people's ages (e.g., 40 to the attribute "40 years old"), then if person X has the value 40 on the variable and person Y has the value 20, you may conclude that person X is twice as old as person Y.
However, if "Protestant" is assigned the value 1 and "Catholic" is assigned 2, then you would have trouble arguing that Catholics have "twice the religious affiliation" of Protestants. The issue here is one of LEVEL OF MEASUREMENT. And what this silly example implies is that the "higher" one's level of measurement, the more meaningful are the numbers one has assigned to the attributes of one's variable. Let's be specific. Traditionally, there are said to be 4 levels of measurement:

Table 1: Amount of Information Available for Variables at Each of the Four Levels of Measurement.

  Level of          Level of Information
  Measurement    Difference   Direction   Magnitude   Proportion
  nominal             X
  ordinal             X            X
  interval            X            X           X
  ratio               X            X           X           X

NOTE: This table should be read, "If a variable is [level of measurement], then you can judge the [level of information] between two of its attributes" (e.g., "If a variable is ratio, then you can judge the proportion between two of its attributes").

1. NOMINAL - Attributes are exhaustive and mutually exclusive. That's all. Examples of nominal variables are gender, race, and religious affiliation. The assignment of numbers to nominal variables' attributes is ARBITRARY.
2. ORDINAL - In addition to being exhaustive and mutually exclusive, attributes of ordinal variables can be rank ordered. Examples of ordinal variables are prejudice, religiosity, anxiety, and occupational prestige. With such variables you cannot say how much more or less prejudiced, religious, etc. one person is than another, because you have no units in terms of which to measure them. All you can do is rank them from low to high.
3. INTERVAL - Distances between interval variables' rank-ordered attributes can be measured in terms of units (e.g., centuries in time, dollars of income, numbers of children). Nearly all interval-level variables are also ratio-level variables. Examples of strictly interval-level variables are time (unless one believes in the "Big Bang" theory of the universe) and, possibly, IQ. (As IQ scores have become a standard measure of intelligence, intervals between consecutive IQ scores have come to be regarded as equivalent.)
4. RATIO - In addition to indicating direction and distance, attributes of ratio-level variables have a "true zero point," allowing for statements that person X has "so-many times as many" of an attribute as person Y. Examples are age, number of children, years living in Ames, etc.
NOTE: You can always convert a higher-order variable to a lower-order one (e.g., exact age to age group, or birth date to Zodiac sign), but NOT the reverse! Thus in designing your own research, use the highest "level of measure" you think you might need.
F. MEASURES OF CENTRAL TENDENCY: Let's say that we have a sample with data on a NOMINAL variable (e.g., religious affiliation), and we want to give our reader a single statistic that provides a measure of central tendency (viz., a statistic used to convey an impression of what the typical measurement in the sample is like) on this variable.
Because nominal variables' attributes are simply "different" from each other (e.g., Protestant is not "more" or "less" than Catholic), the appropriate measure of central tendency is always the MODE (i.e., the attribute most common among the units of analysis in one's sample). Thus if more of your respondents are Protestant than any other affiliation, your sample's mode for the religious affiliation variable is "Protestant." If in your data matrix the value 1 is assigned to this attribute, the modal value for the variable is "1."
G. The measure of central tendency used with INTERVAL or RATIO variables (e.g., age) is usually the MEAN (or arithmetic average). You could always find the modal value for an interval- or ratio-level variable, of course. However, the mean is more informative than the mode in that it conveys centrality as a position between extremes. Conversely, it would be meaningless to calculate a mean for nominal or ordinal variables, because neither of these types of measures has UNITS associated with it. Let's say that someone tells you that the mean age in her sample is five. You do not know for sure what this means unless you are told whether these are five years, five months (in a study of infants), five minutes (in one of fruit flies), etc.
1. Note that the mean is calculated by adding up all values on a variable and then dividing by the sample size:
$$\bar X = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
2. A more compact way of writing this uses the symbol sigma ($\Sigma$):
$$\bar X = \frac{1}{n}\sum_{i=1}^{n} X_i$$
3. It is easy to verify that, for any constant $c$,
$$\sum_{i=1}^{n} cX_i = c\sum_{i=1}^{n} X_i$$
This is what I shall refer to as RULE 1.
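The mode and the mean described in F and G take only a few lines of Python. This is a sketch: the codebook, the coded religion responses, and the ages are made-up illustrations, not data from these notes, and `mode` here is a hypothetical helper built on `collections.Counter`. The last lines check RULE 1 numerically for one constant; a check on one data set illustrates the rule but does not prove it.

```python
from collections import Counter

# Hypothetical codebook: the numbers are arbitrary labels for nominal attributes.
RELIGION = {1: "Protestant", 2: "Catholic", 3: "Jewish", 4: "None", 5: "Other"}

religion = [1, 2, 1, 4, 1, 2, 5, 1, 3, 2]         # made-up coded responses
ages = [22, 35, 41, 29, 50, 33, 27, 45, 38, 30]    # made-up ages, in years

def mode(values):
    """Most common value; on a tie, the value encountered first wins."""
    return Counter(values).most_common(1)[0][0]

modal_code = mode(religion)
print("mode:", modal_code, RELIGION[modal_code])   # mode: 1 Protestant
# Averaging the religion codes would run without error, but the result is meaningless.

# Mean of a ratio-level variable: (1/n) times the sum of the X_i.
n = len(ages)
mean_age = sum(ages) / n
print("mean age:", mean_age, "years")              # mean age: 35.0 years

# RULE 1: sum(c * X_i) == c * sum(X_i); here c = 12 converts years to months.
c = 12
assert sum(c * x for x in ages) == c * sum(ages)
print("mean age:", sum(c * x for x in ages) / n, "months")  # mean age: 420.0 months
```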
Two other rules are:
RULE 2:
$$\sum_{i=1}^{n}(X_i + Y_i) = \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i$$
and RULE 3:
$$\sum_{i=1}^{n} c = nc$$
H. WARNING: Be careful when calculating a mean across two (or more) groups! Suppose that in one STAT 401 class 16 of the 20 students get As. In another, 15 of 30 get As. Now consider the question, "What is the mean proportion of students who get As?" THIS IS A TRICK QUESTION!!!
NOTE: The answer depends on what your unit of analysis (phenomenon of interest) is. If STUDENTS are your units of analysis, you must divide the number of students who get As by the total number of students: (16 + 15)/(20 + 30) = 31/50 = .62. (This will yield a mean as a proportion.) If CLASSES are your units of analysis, then you add the proportions and divide by the number of classes: (.80 + .50)/2 = .65. (This will yield a mean of proportions.) Determining your unit of analysis is not a statistical issue. Instead it depends on what you wish to draw conclusions about: students or classes.
NOTE: When students are the units of analysis, the mean proportion is a WEIGHTED AVERAGE, where you "weight" the two proportions by class size:
$$\bar p = \frac{n_1}{n_1 + n_2}\,p_1 + \frac{n_2}{n_1 + n_2}\,p_2 = \frac{20}{50}(.80) + \frac{30}{50}(.50) = .62$$
NOTICE that the two weights are $n_1/(n_1 + n_2)$ for the first class and $n_2/(n_1 + n_2)$ for the second. These two quotients are, of course, each group's proportion of the total students. ALSO NOTE: This is NOT the same as $(p_1 + p_2)/2 = .65$.
I. TWO IMPORTANT PROPERTIES OF THE MEAN:
1. The sum of the deviations from the mean is zero. Using summation notation, this is stated as
$$\sum_{i=1}^{n}(X_i - \bar X) = 0$$
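The arithmetic in H and the two properties in I can be checked exactly with Python's `fractions.Fraction` (exact rational arithmetic, so no rounding error can hide a nonzero sum). This is an illustrative sketch: the class figures are those of H, the five scores 3, 7, 3, 4, 5 are those of the illustration table in I, and `sum_sq_dev` is a hypothetical helper name. Property 2 is only checked on a grid of candidate values of k here; the general proof is the one given in these notes.

```python
from fractions import Fraction

# H. Two STAT 401 classes: (number of As, class size).
classes = [(16, 20), (15, 30)]
total = sum(size for _, size in classes)

# Students as units of analysis: proportion of all students who get As.
pooled = Fraction(sum(a for a, _ in classes), total)

# The same value as a weighted average of class proportions, weights n_j / (n_1 + n_2).
weighted = sum(Fraction(size, total) * Fraction(a, size) for a, size in classes)

# Classes as units of analysis: unweighted mean of the class proportions.
unweighted = sum(Fraction(a, size) for a, size in classes) / len(classes)

assert pooled == weighted
print(float(pooled), float(unweighted))   # 0.62 0.65

# I. Properties of the mean, using the five scores from the illustration table.
x = [3, 7, 3, 4, 5]
xbar = Fraction(sum(x), len(x))           # exactly 22/5 = 4.4

def sum_sq_dev(values, k):
    """Sum of squared deviations of values from k."""
    return sum((v - k) ** 2 for v in values)

# Property 1: deviations from the mean sum to exactly zero.
assert sum(v - xbar for v in x) == 0

# Property 2 (grid check, not a proof): no k in 0.0, 0.1, ..., 10.0 does better than the mean.
grid = [Fraction(i, 10) for i in range(101)]
assert all(sum_sq_dev(x, k) >= sum_sq_dev(x, xbar) for k in grid)
print(float(sum_sq_dev(x, 4)), float(sum_sq_dev(x, xbar)), float(sum_sq_dev(x, 5)))
# 12.0 11.2 13.0
```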
Property 1 is illustrated in the first two columns below:

          X    X - 4.4   (X - 4.4)^2   (X - 4)^2   (X - 5)^2
          3     -1.4        1.96           1           4
          7      2.6        6.76           9           4
          3     -1.4        1.96           1           4
          4     -0.4        0.16           0           1
          5      0.6        0.36           1           0
TOTALS   22      0.0       11.20          12          13

$\bar X$ = 22/5 = 4.4

2. The mean is the single number that minimizes the sum of the squared differences between it and all the numbers in the sample. That is, for any number $k$ such that $k \neq \bar X$,
$$\sum_{i=1}^{n}(X_i - \bar X)^2 < \sum_{i=1}^{n}(X_i - k)^2$$
First, please verify the point by noting the last three columns above. Now let us use the SUMMATION NOTATION principles to prove that this is true in general. We begin by assuming the opposite and then prove that the opposite is impossible. So we assume that, for some $k \neq \bar X$ (all sums run from $i = 1$ to $n$):
$$
\begin{aligned}
\sum (X_i - \bar X)^2 &\ge \sum (X_i - k)^2 && \text{assumption}\\
\sum (X_i^2 - 2\bar X X_i + \bar X^2) &\ge \sum (X_i^2 - 2kX_i + k^2) && \text{expanding the squares}\\
\sum X_i^2 - \sum 2\bar X X_i + \sum \bar X^2 &\ge \sum X_i^2 - \sum 2kX_i + \sum k^2 && \text{Rule 2}\\
\sum \bar X^2 - \sum 2\bar X X_i &\ge \sum k^2 - \sum 2kX_i && \text{cancellation}\\
\sum \bar X^2 - 2\bar X \sum X_i &\ge \sum k^2 - 2k \sum X_i && \text{Rule 1}
\end{aligned}
$$
At this point note that, since by definition $\bar X = \frac{1}{n}\sum X_i$, it follows that $\sum X_i = n\bar X$:
$$
\begin{aligned}
\sum \bar X^2 - 2\bar X n\bar X &\ge \sum k^2 - 2kn\bar X && \text{definition of } \bar X\\
n\bar X^2 - 2n\bar X^2 &\ge nk^2 - 2kn\bar X && \text{Rule 3}\\
-\bar X^2 &\ge k^2 - 2k\bar X && \text{simplifying and dividing by } n > 0\\
0 &\ge k^2 - 2k\bar X + \bar X^2 && \text{adding } \bar X^2 \text{ to both sides}\\
0 &\ge (k - \bar X)^2 && \text{factoring}
\end{aligned}
$$
But since $k \neq \bar X$, $(k - \bar X)^2 > 0$, so the last line is impossible. Thus there is no number $k$ different from the mean such that the sum of squared deviations from it is as small as (or smaller than) the sum of squared deviations from the mean.
J. Now, imagine that we have a sample of assistant professors from a large university. We want a measure of central tendency of how many publications they authored during the last two years.
More specifically, "If we have a sample of 70 assistant professors and a total of 215 articles published (over half of them, 3 × 41 = 123, by three busybodies on the faculty), what would be the measure of central tendency for the junior faculty?" Referring to the data below, you will note that the mean equals 215/70 = 3.07 publications. Clearly it would be misleading (if not blatantly inaccurate, given that only 15 of the 70 published at least three articles) to say that the typical assistant professor published about 3 articles in the last two years. In such cases the MEDIAN (i.e., the score such that half of the units of analysis have higher scores and half have lower scores) provides a less misleading measure of central tendency.

Figure 1: Frequency of Assistant Professors according to their Publications during the Last Two Years.
[Bar chart of the number of assistant professors at each number of publications; the three professors with 41 publications are labeled "busy bodies."]

  Number of publications:   0    1    2    3    4   41
  Number of asst. profs.:  17   25   13    7    5    3

So we have a special case. When we need a measure of central tendency for interval/ratio data we generally want the mean. However, when the data are strongly skewed, the median is the preferred measure. Calculating a variable's median requires that one first order the units of analysis (here the assistant professors) from lowest (zero publications) to highest (41 publications) values on the variable. The median is the value of this variable for the middle-ranked unit.
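The figures quoted for the publication data can be reproduced with Python's standard `statistics` module. This is a sketch that simply expands the frequency table of Figure 1 into one value per assistant professor, ordered from lowest to highest, so that list positions correspond to ranks (index 0 is rank 1).

```python
from statistics import mean, median, mode

# Figure 1 as a frequency table: number of publications -> number of assistant professors.
freq = {0: 17, 1: 25, 2: 13, 3: 7, 4: 5, 41: 3}

# One value per professor, ordered from lowest to highest.
pubs = sorted(value for value, count in freq.items() for _ in range(count))

print(len(pubs), sum(pubs), round(mean(pubs), 2))   # 70 215 3.07
print(sum(1 for p in pubs if p >= 3))               # 15
print(pubs[34], pubs[35])                           # 35th and 36th ranked: 1 1
print(median(pubs), mode(pubs))                     # 1.0 1
```

For an even n, `statistics.median` averages the two middle values, which is why it prints `1.0` rather than `1`.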
In this case the middle-ranked assistant professors are those ranked 35th and 36th (i.e., precisely in the middle of the 70 assistant professors). Note from the distribution at the bottom of Figure 1 that all assistant professors from the 18th to the 42nd (including those ranked 35th and 36th) have values on the variable equal to one publication during the last two years, since the 17 professors with no publications occupy ranks 1 through 17. Accordingly, the median value on the publications variable equals "1."
[Cartoon: "Should we scare the opposition by announcing our mean height or lull them by announcing our median height?"]
AN ASIDE: The modal value on the publications variable also equals "1." Note that this does not usually happen. An even more special case would be when mode = median = mean. This happens, for example, when the variable's distribution is perfectly symmetric (so that the mean equals the median) and has its highest frequency at the mean/median.
K. A few more comments about MEDIANS:
1. If the number of your observations is odd, then the median is the middle observation after you have ordered them. (NOTE: Your variable must be at least ORDINAL for you to order them!) E.g., say we had a sample of five junior faculty and we ordered them according to the number of their publications:
  0   0   1   3   5
Here, the median (or middle observation) would clearly equal "1."
2. If our sample were of six assistant professors, the median would be calculated a little differently. Imagine that our data are as follows:
  0   0   1   2   3   5
The median would be the average of the TWO middle observations.
I.e., 1.5 = (1 + 2)/2.
3. Note that the median is the value at the 50th PERCENTILE of your distribution.
a. A general formula for determining the value one finds at the kth percentile of a distribution: it is this distribution's value at observation number
$$\frac{k \times n}{100} + \frac{1}{2},$$
where n = the total number of one's observations. Notice how this formula works nicely for both the odd and even examples illustrated above: with n = 5 the 50th percentile is the value at observation 2.5 + 0.5 = 3, and with n = 6 it is the value at observation 3 + 0.5 = 3.5, i.e., halfway between the 3rd and 4th observations.
b. The formula works fine, of course, until you find that the value at the 96th percentile is the value at the 67.7th observation (= [(96 × 70)/100] + 1/2), which falls somewhere between 4 (the 67th observation) and 41 (the 68th observation). In this case, you would want a value that equals 4 plus 0.7 of the distance between 4 and 41. Because the distance from 4 to 41 is 37, the value at the 96th percentile would equal 29.9 (= 4 + [0.7 × (41 - 4)]) publications.
4. Besides the median, the two next most important percentiles are the 25th and 75th percentiles, also known as the lower and upper QUARTILES of a distribution. The difference between the upper (or 3rd) and lower (or 1st) quartiles is called the INTERQUARTILE RANGE (or IQR). To calculate the IQR of our distribution of assistant professors, we must find the number of publications of the 18th and 53rd persons (= [(25 × 70)/100] + 1/2 and [(75 × 70)/100] + 1/2). That would be 1 (just barely) and 2 publications. The IQR would be the difference between these numbers, or 1 publication.
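A sketch of the percentile rule in 3.a, with the linear interpolation described in 3.b. `percentile` is a hypothetical helper name, and the notes do not say what to do when the computed position falls before the first or after the last observation, so the clamping below is an assumption. Library routines (for example `numpy.percentile`) default to other positioning rules, so their answers can differ slightly from this one.

```python
import math

def percentile(sorted_values, k):
    """Value at the kth percentile: observation number (k * n)/100 + 1/2 (1-based),
    interpolating linearly when that number falls between two observations."""
    n = len(sorted_values)
    pos = (k * n) / 100 + 0.5
    # Assumption: clamp positions outside 1..n to the smallest or largest value.
    if pos <= 1:
        return sorted_values[0]
    if pos >= n:
        return sorted_values[-1]
    lower = math.floor(pos)            # rank of the observation just below pos
    frac = pos - lower                 # how far pos lies toward the next observation
    below = sorted_values[lower - 1]   # convert 1-based rank to 0-based index
    above = sorted_values[lower]
    return below + frac * (above - below)

freq = {0: 17, 1: 25, 2: 13, 3: 7, 4: 5, 41: 3}
pubs = sorted(v for v, count in freq.items() for _ in range(count))

q1, med, q3 = (percentile(pubs, k) for k in (25, 50, 75))
print(q1, med, q3, "IQR =", q3 - q1)          # 1.0 1.0 2.0 IQR = 1.0
print(round(percentile(pubs, 96), 1))         # 29.9
print(percentile([0, 0, 1, 3, 5], 50), percentile([0, 0, 1, 2, 3, 5], 50))  # 1.0 1.5
```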
Here the meaning of the interquartile range is that the middle 50% of the junior faculty in your sample differ in their numbers of publications in the last two years by no more than one publication.
L. The IQR belongs to a larger class of MEASURES OF DISPERSION (i.e., measures that indicate how "widely dispersed" one's data are on a particular variable; here, how widely publications are dispersed among the junior faculty). Two other measures of dispersion are the following:
1. RANGE - the simplest measure, but also the least informative, since it only indicates the difference between the highest and lowest values. The range of publications is 41 - 0 = 41, which is a range large enough to suggest that there may be one or more outliers in the data. (An OUTLIER is an idiosyncratically large [or small] value on a variable.) The advantage of reporting a RANGE is that it shows your reader the magnitude of any outliers you may have. (IN PRACTICE one usually just reports the individual outliers in one's data and drops them from analysis.)
2. The VARIANCE - the average squared deviation from the mean. It is the measure of dispersion appropriate to interval and ratio variables. The STANDARD DEVIATION is the positive square root of the variance. The symbol for the population variance is the lower case sigma squared ($\sigma_X^2$). The sample variance is referred to with the lower case "s" squared ($s_X^2$).
The symbol for the estimated population variance is the lower case sigma squared with a "hat" over it ($\hat\sigma_X^2$). The symbols for the population, sample, and "estimated population" standard deviations are respectively the same, but without the superscripted "2" ($\sigma_X$, $s_X$, and $\hat\sigma_X$). Formulas for these variances are as follows (where $\mu_X$ is the population mean and $N$ the population size):
$$\sigma_X^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \mu_X)^2$$
$$s_X^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar X)^2$$
$$\hat\sigma_X^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar X)^2$$
NOTE: We shall talk about the last formula's "n - 1" within the next few weeks.
REMEMBER: Certain measures of central tendency and of dispersion are appropriate to certain types of variables. Be sure you KNOW WHICH GOES WITH WHICH!!! When means are used as measures of central tendency, variances (or standard deviations) should be used as measures of dispersion; when medians are used as measures of central tendency, interquartile ranges should be used as measures of dispersion. Note also that there is no measure of dispersion when the mode is the appropriate measure of central tendency (at least not when dispersion is conceived as variation "above" or "below" the variable's mode). This is because none of a nominal-level variable's attributes can be referred to as "closer to" or "further from" another of the variable's attributes. Accordingly, it is meaningless to speak of these attributes as being "dispersed" along some common continuum. Of course, a nominal variable's variability can be measured by noting the extent to which its units of analysis are evenly proportioned among its attributes.
For example, there is more variability in religion when half are Protestant and half Catholic than when 90 percent are Protestant. Yet such measures (e.g., the Index of Qualitative Variation) do not constitute measures of dispersion as defined here (i.e., measures of variation along a single dimension above and below a variable's measure of central tendency).
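A sketch that pairs each kind of variable with the dispersion measure discussed above, using the five scores from the table in I (whose sum of squared deviations is 11.20). `variance_n`, `variance_n_minus_1`, and `iqv` are hypothetical helper names. The two variance helpers follow the divide-by-n and divide-by-(n - 1) formulas in L.2; Python's `statistics.pvariance` and `statistics.variance` use the same two divisors. The notes give no formula for the Index of Qualitative Variation; the version below, K/(K - 1) × (1 - Σp²) for K attributes with proportions p, is a commonly used form and should be read as an assumption.

```python
import math
import statistics

def variance_n(values):
    """Divide-by-n variance: s^2 (or sigma^2 when values are the whole population)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def variance_n_minus_1(values):
    """Estimated population variance (sigma-hat squared): divide by n - 1."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def iqv(counts):
    """Index of Qualitative Variation, assumed form K/(K-1) * (1 - sum of p_j^2).
    0 when all units share one attribute, 1 when they are spread evenly."""
    k = len(counts)
    if k < 2:
        raise ValueError("need at least two attributes")
    total = sum(counts)
    return k / (k - 1) * (1 - sum((c / total) ** 2 for c in counts))

x = [3, 7, 3, 4, 5]                    # interval/ratio scores: variance and SD apply
s2 = variance_n(x)
sigma_hat2 = variance_n_minus_1(x)
print(round(s2, 2), round(sigma_hat2, 2))                        # 2.24 2.8
print(round(math.sqrt(s2), 3), round(math.sqrt(sigma_hat2), 3))  # the two standard deviations

assert math.isclose(s2, statistics.pvariance(x))
assert math.isclose(sigma_hat2, statistics.variance(x))

# Nominal variable (religion): no dispersion "around" the mode, but the IQV
# shows more variability for a 50/50 split than for a 90/10 split.
print(iqv([50, 50]), round(iqv([90, 10]), 2))    # 1.0 0.36
```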