A. STATISTICS make understanding the world SIMPLER!
1. As a researcher you want to understand something about a particular group of people, countries, plants, animals, or whatever. That is, you are interested in a certain POPULATION (every single instance of a phenomenon).
2. The first step in making life SIMPLE is to SAMPLE a subset of your population of interest. Then you do not have to consider everybody!
3. Your first item of business is to make sure that your sample is representative of the population you are interested in. The reason for this is to ensure that you can make INFERENCES from your sample to the population.
a. Notice the distinction here between DESCRIPTION of your sample and INFERENCES from your sample to your population:
[Diagram: a sample is drawn from the population. Descriptive statistics: the statistician describes the sample. Inferential statistics: the statistician infers from the sample to the population.]
b. STATISTICS are numbers that DESCRIBE your sample and that allow you to make INFERENCES about population PARAMETERS (those characteristics you could only know if you measured everyone in your population). Such characteristics are not impossible to know if your population is, for example, the senior leaders of Ames churches, but are nearly impossible to know if your population is that of the United States.
c. E.g., let's say we want to estimate how many Americans agree that "The Taliban is evil." First, we must obtain a sample of "Americans." Then we might calculate that 35% of the people in the sample agree. (This percentage would be a DESCRIPTIVE STATISTIC.) Finally, we might INFER that this proportion has a probability of, say, .95 of being within .02 of the true POPULATION PARAMETER. That's a pretty nice trick! The hitch is, of course, how could we be so sure of our statistic ..., so sure, even, that we can assign a probability to the confidence that we are correct?
d. THE TRICK: We can ensure that generalizations may legitimately be made from our sample to the population of interest if every element in this population had the same chance of being an element in the sample.
4. The definition of a SIMPLE RANDOM SAMPLE follows directly from this condition. (Definition: A group of n members of a population [of size N], having had the same chance of selection as every other group of n members.) NOTE the contrast between 'n' and 'N' here!!!
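The definition of a simple random sample can be sketched in a few lines of Python. This is only an illustration: the population of 1000 numbered people, the sample size of 50, and the helper name `simple_random_sample` are assumptions made for the example, not part of the notes.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that every group of n members
    has the same chance of being the sample (hypothetical helper)."""
    rng = random.Random(seed)          # seed only makes the example repeatable
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical population of N = 1000 numbered people.
population = list(range(1, 1001))
N = len(population)

sample = simple_random_sample(population, n=50, seed=1)
n = len(sample)

print(f"N = {N}, n = {n}")
```

Note the contrast again: `N` is the size of the population, `n` the size of the sample drawn from it.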
B. Once we have obtained a sample that is representative of our population of interest, there is a second way that statistics simplify things. Rather than examine all aspects of everything sampled, statisticians obtain information about specific characteristics of their samples, and represent this information in a DATA MATRIX. Data matrices look as follows:
             Var1   Var2   Var3
person #1      2      4      2
person #2      1      5      7
person #3      3      2      4
...
1. There will be exactly "n" rows in this data matrix, one for each unit of analysis (or phenomenon of interest) that has been sampled from the "N" of these units in the population. One's unit of analysis can be "the person" (as above), or "the married couple" (e.g., sampled from those attending a marital therapy clinic), "the corporation" (e.g., sampled from those listed among the Fortune 500), "the egg" (e.g., sampled from among clutches laid by sea turtles on California beaches), etc.
2. Each column in the data matrix corresponds to a distinct type of information regarding each unit of analysis. Because this information varies among units of analysis, statisticians commonly refer collectively to the information contained in any one column of this matrix as a "variable." Thus to say that a data matrix has five variables is to say that it has five columns.
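As a rough sketch, a data matrix can be held in Python as a list of rows, with one row per unit of analysis and one column per variable. The numbers below are purely illustrative, and the names `variables` and `data_matrix` are assumptions made for the example.

```python
# Column labels: one per variable.
variables = ["Var1", "Var2", "Var3"]

# One row per unit of analysis (here, per person); values are illustrative.
data_matrix = [
    [2, 4, 2],  # person #1
    [1, 5, 7],  # person #2
    [3, 2, 4],  # person #3
]

n = len(data_matrix)               # number of rows = sample size
num_variables = len(data_matrix[0])  # number of columns = number of variables

# A "variable" is everything contained in one column of the matrix.
column = variables.index("Var2")
var2 = [row[column] for row in data_matrix]

print(n, num_variables, var2)
```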
C. A VARIABLE is a characteristic of units of analysis that can have more than one attribute. For example, individuals may have the characteristic, gender; corporations may have the characteristic, centralization; etc. Variables may be discrete or continuous.
1. DISCRETE variables have an enumerable set of possible attributes. E.g., gender, number of children (can't have a half-child), income (can't have a half-cent), religious affiliation, etc.
2. CONTINUOUS variables have an infinite number of attributes. E.g., age (can be 22.23156 ... years old), anxiety, etc.
3. NOTE: In practice, variables such as income (since they may have a large number of different attributes) are generally treated as continuous in statistical analyses. And such variables as anxiety and prejudice (since they are difficult to measure except in very gross terms) may be considered discrete (e.g., low, medium, or high).
D. The statistician assigns numbers to each unit of analysis's attribute on each variable of interest. This assignment of each attribute to one and only one number is called OPERATIONALIZATION. Attributes must have both of the following characteristics:
1. EXHAUSTIVE: You must have a number for each potential attribute. For example, some studies may not find male and female exhaustive for their purposes and may add categories (and associated numbers) for androgynous and undifferentiated.
2. MUTUALLY EXCLUSIVE: You should not have an attribute to which you have assigned two numbers. E.g., you must be able to assign a single occupational code to the teacher who moonlights as a janitor.
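One way to picture operationalization is as a lookup table from attributes to numbers. The sketch below is an assumption-laden illustration: the code values and the helper `operationalize` are hypothetical. A Python dictionary gives each attribute exactly one number, and the function refuses any attribute that has no number, which is the "exhaustive" requirement.

```python
# Hypothetical coding scheme; the particular numbers are arbitrary.
GENDER_CODES = {
    "male": 1,
    "female": 2,
    "androgynous": 3,
    "undifferentiated": 4,
}

def operationalize(attributes, codes):
    """Replace each observed attribute with its assigned number.

    Raises ValueError when an attribute has no number, i.e. when the
    coding scheme is not exhaustive for the data at hand.
    """
    missing = sorted(set(attributes) - set(codes))
    if missing:
        raise ValueError(f"coding scheme is not exhaustive: {missing}")
    return [codes[attribute] for attribute in attributes]

observed = ["female", "male", "undifferentiated", "female"]
coded = operationalize(observed, GENDER_CODES)
print(coded)
```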
E. Once you have found mutually exclusive and exhaustive categories that allow you to operationalize your measure (by assigning a unique number to each subject in your sample), YOU MUST BE CAREFUL NOT TO PUT TOO MUCH FAITH IN THOSE NUMBERS. If you assign numbers according to people's ages (e.g., 40 to the attribute 40 years old), then if person X has the value 40 on the variable and person Y has the value 20 on the variable, you may conclude that person X is twice as old as person Y. However, if "Protestant" is assigned the value 1 and "Catholic" is assigned 2, then you would have trouble arguing that Catholics have "twice the religious affiliation" of Protestants. The issue here is one of LEVEL OF MEASUREMENT. And what is implied by this silly example is that the "higher" one's level of measurement, the more meaningful are the numbers one has assigned to attributes of one's variable. Let's be specific.
Traditionally, there are said to be 4 levels of measurement:
Table 1: Amount of Information Available for Variables at Each of the Four Levels of Measurement.

Level of                     Level of Information
Measurement    Difference   Direction   Magnitude   Proportion
nominal            X
ordinal            X            X
interval           X            X           X
ratio              X            X           X           X

NOTE: This table should be read, "If a variable is ____, then you can judge the ____ between two of its attributes."
1. NOMINAL - Attributes are exhaustive and mutually exclusive. That's all. Examples of nominal variables are gender, race, and religious affiliation. The assignment of numbers to nominal variables' attributes is ARBITRARY.
2. ORDINAL - In addition to being exhaustive and mutually exclusive, attributes of ordinal variables can be rank ordered. Examples of ordinal variables are prejudice, religiosity, anxiety, and occupational prestige. With such variables you cannot say how much more or less prejudiced, religious, etc. one person is than another because you have no units in terms of which to measure them. All you can do is rank them from low to high.
3. INTERVAL - Distances between interval variables' rank ordered attributes can be measured in terms of units (e.g., centuries in time, dollars of income, numbers of children). Nearly all interval-level variables are also ratio-level variables. Examples of strictly interval-level variables are time (unless one believes in the "Big Boom" theory of the universe) and, possibly, IQ. (As IQ scores have become a standard measure of intelligence, intervals between consecutive IQ scores have come to be regarded as equivalent.)
4. RATIO - In addition to indicating direction and distance, attributes of ratio-level variables have a "true zero point," allowing for statements that person X has "so-many times as much of" an attribute as person Y. Examples are age, number of children, years living in Ames, etc.
NOTE: You can always convert a higher-order variable to a lower-order one (e.g., age to Zodiac), but NOT the reverse! Thus in designing your own research, use the highest "level of measure" you think you might need.
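The one-way nature of this conversion can be sketched in Python. The cutoffs (30 and 60 years) and the function name `age_to_group` are hypothetical choices made only for illustration; the point is that many different ages collapse into the same ordinal code, so the original ages cannot be recovered from the codes.

```python
def age_to_group(age_years):
    """Convert a ratio-level age into an ordinal code (hypothetical cutoffs)."""
    if age_years < 30:
        return 1  # "young"
    if age_years < 60:
        return 2  # "middle-aged"
    return 3      # "older"

ages = [20, 40, 59, 65]
groups = [age_to_group(age) for age in ages]

# At the ratio level, proportions are meaningful: 40 is twice 20.
ratio_of_ages = ages[1] / ages[0]

# At the ordinal level only rank order survives: 40 and 59 share a code,
# so nothing in the codes tells them apart any longer.
same_group = age_to_group(40) == age_to_group(59)

print(groups, ratio_of_ages, same_group)
```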
F. MEASURES OF CENTRAL TENDENCY: Let's say that we have a sample with data on a NOMINAL variable (e.g., religious affiliation) and we want to give our reader a single statistic that provides a measure of central tendency (viz., a statistic used to convey an impression of what the typical measurement in the sample is like) on this variable. Because nominal variables' attributes are simply "different" from each other (e.g., Protestant is not "more" or "less" than Catholic), their measure of central tendency is always the MODE (i.e., the attribute most common among the units of analysis in one's sample). Thus if most of your respondents are Protestant, your sample's mode for the religious affiliation variable is "Protestant." If in your data matrix the value of 1 is assigned to this attribute, the modal value for the variable is "1."
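A minimal sketch of finding a mode in Python, using `collections.Counter` from the standard library. The six responses and the code 3 for "Other" are invented for the example; the codes 1 for Protestant and 2 for Catholic follow the notes.

```python
from collections import Counter

# Codes for the religious affiliation variable ("Other": 3 is hypothetical).
CODES = {"Protestant": 1, "Catholic": 2, "Other": 3}

# Invented responses from six respondents.
responses = ["Protestant", "Catholic", "Protestant",
             "Other", "Protestant", "Catholic"]

counts = Counter(responses)

# most_common(1) returns a list holding the (attribute, frequency) pair
# with the highest frequency.
mode_attribute, mode_count = counts.most_common(1)[0]
modal_value = CODES[mode_attribute]

print(mode_attribute, mode_count, modal_value)
```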
G. The measure of central tendency used with INTERVAL or RATIO (e.g., age) variables is usually the MEAN (or arithmetic average). You could always find the modal value for an interval- or ratio-level variable, of course. However, the mean is more informative than the mode in that it conveys centrality as a position between extremes. Conversely, it would be meaningless to calculate a mean for nominal or ordinal variables, because neither of these types of measures has UNITS associated with it.
Let's say that someone tells you that the mean age in her sample is five. You do not know for sure what this means unless you are told whether these are five years, five months (in a study of infants), five minutes (in one of fruit flies), etc.
1. Note that the mean is calculated by adding up all values on a variable and then by dividing by the sample size:
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
2. A more compact way of writing this uses the symbol sigma ($\Sigma$):
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$
3. It is easy to verify that, for any constant $c$,
$$\sum_{i=1}^{n} c X_i = c \sum_{i=1}^{n} X_i$$
This is what I shall refer to as RULE 1. Two other rules are:
RULE 2: $$\sum_{i=1}^{n} (X_i + Y_i) = \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i$$
and RULE 3: $$\sum_{i=1}^{n} c = nc$$
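The mean and the summation rules can be checked numerically. In the sketch below the rules are assumed to be the three standard ones (a constant factors out of a sum, a sum of sums splits apart, and summing a constant n times gives n times the constant); the data and the constant `c` are arbitrary illustrative values.

```python
import math

x = [3, 7, 3, 4, 5]       # illustrative values on one variable
y = [1, 2, 3, 4, 5]       # illustrative values on a second variable
c = 2.5                   # an arbitrary constant
n = len(x)

# The mean: add up all values, then divide by the sample size.
mean = sum(x) / n

# RULE 1 (assumed form): sum of c*X_i equals c times the sum of X_i.
rule1 = math.isclose(sum(c * xi for xi in x), c * sum(x))

# RULE 2 (assumed form): sum of (X_i + Y_i) equals sum of X_i plus sum of Y_i.
rule2 = math.isclose(sum(xi + yi for xi, yi in zip(x, y)), sum(x) + sum(y))

# RULE 3 (assumed form): summing the constant c over n units gives n*c.
rule3 = math.isclose(sum(c for _ in x), n * c)

print(mean, rule1, rule2, rule3)
```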
H. WARNING: Be careful when calculating a mean across two (or more) groups! Suppose that in one STAT 401 class 16 of the 20 students get As. In another 15 of 30 get As. Now consider the question, "What is the mean proportion of students who get As?" THIS IS A TRICK QUESTION!!!
NOTE: The answer depends on what your unit of analysis (phenomenon of interest) is. If STUDENTS are your units of analysis, you must divide the number of students who get As by the total number of students. (This will yield a mean as a proportion.) If CLASSES are your units of analysis, then you add the proportions and divide by the number of classes. (This will yield a mean of proportions.) Determining your unit of analysis is not a statistical issue. Instead it depends on what you wish to draw conclusions about: students or classes.
NOTE: When students are the units of analysis, the mean proportion is a WEIGHTED AVERAGE, where you "weight" the two proportions by class size:
$$\bar{p} = \frac{n_1}{n_1 + n_2}\, p_1 + \frac{n_2}{n_1 + n_2}\, p_2 = \frac{20}{50}(.80) + \frac{30}{50}(.50) = \frac{31}{50} = .62$$
NOTICE that the two weights are $\frac{n_1}{n_1 + n_2}$ for the first class and $\frac{n_2}{n_1 + n_2}$ for the second. These two quotients are, of course, each group's proportion of the total students.
ALSO NOTE: This is NOT the same as $(p_1 + p_2)/2 = .65$.
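The two answers to the trick question can be computed side by side. The sketch uses the figures from the example (16 of 20 and 15 of 30); the variable names are chosen only for illustration.

```python
import math

# Class 1: 16 of 20 students get As. Class 2: 15 of 30 get As.
a1, n1 = 16, 20
a2, n2 = 15, 30

p1 = a1 / n1   # proportion of As in class 1
p2 = a2 / n2   # proportion of As in class 2

# STUDENTS as units of analysis: all As divided by all students.
mean_as_proportion = (a1 + a2) / (n1 + n2)

# The same quantity written as a weighted average of p1 and p2,
# with each class weighted by its share of the total students.
w1 = n1 / (n1 + n2)
w2 = n2 / (n1 + n2)
weighted_average = w1 * p1 + w2 * p2

# CLASSES as units of analysis: the plain mean of the two proportions.
mean_of_proportions = (p1 + p2) / 2

print(round(mean_as_proportion, 2), round(mean_of_proportions, 2))
```

The weighted average reproduces the students-as-units answer (31 of 50), while the mean of proportions gives a different number because it counts the smaller class as heavily as the larger one.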
I. TWO IMPORTANT PROPERTIES OF THE MEAN:
1. The sum of the deviations from the mean is zero. Using summation notation, this is stated as
$$\sum (X_i - \bar{X}) = 0$$
This property is illustrated in the first two columns below:

          X    X - \bar{X}        (X - \bar{X})^2   (X - 4)^2   (X - 5)^2
          3    3 - 4.4 = -1.4          1.96             1           4
          7    7 - 4.4 =  2.6          6.76             9           4
          3    3 - 4.4 = -1.4          1.96             1           4
          4    4 - 4.4 = -0.4          0.16             0           1
          5    5 - 4.4 =  0.6          0.36             1           0
TOTALS   22               0.0         11.20            12          13

\bar{X} = 4.4
2. The mean is the single number which minimizes the sum of the squared differences between it and the numbers in the sample. That is, for any number 'k' such that $k \neq \bar{X}$,
$$\sum (X_i - \bar{X})^2 < \sum (X_i - k)^2$$
First, please verify the point by noting the last three columns above. Now let us use the SUMMATION NOTATION principles to prove that this is true in general:
We begin by assuming the opposite and then prove that the opposite is impossible:
$$
\begin{aligned}
\text{So we assume} && \sum (X_i - \bar{X})^2 &\ge \sum (X_i - k)^2 \\
\text{expanding the squares} && \sum (X_i^2 - 2\bar{X}X_i + \bar{X}^2) &\ge \sum (X_i^2 - 2kX_i + k^2) \\
\text{rule 2} && \sum X_i^2 - \sum 2\bar{X}X_i + \sum \bar{X}^2 &\ge \sum X_i^2 - \sum 2kX_i + \sum k^2 \\
\text{cancellation} && \sum \bar{X}^2 - \sum 2\bar{X}X_i &\ge \sum k^2 - \sum 2kX_i \\
\text{rule 1} && \sum \bar{X}^2 - 2\bar{X}\sum X_i &\ge \sum k^2 - 2k\sum X_i
\end{aligned}
$$
At this point note that, since by definition $\bar{X} = \frac{1}{n} \sum X_i$, it follows that $\sum X_i = n\bar{X}$.
$$
\begin{aligned}
\text{by the def. of } \bar{X} && \sum \bar{X}^2 - 2\bar{X}n\bar{X} &\ge \sum k^2 - 2kn\bar{X} \\
\text{rule 3} && n\bar{X}^2 - 2n\bar{X}^2 &\ge nk^2 - 2kn\bar{X} \\
\text{division and subtraction} && -\bar{X}^2 &\ge k^2 - 2k\bar{X} \\
\text{adding } \bar{X}^2 \text{ to both sides} && 0 &\ge k^2 - 2k\bar{X} + \bar{X}^2 \\
\text{factoring} && 0 &\ge (k - \bar{X})^2
\end{aligned}
$$
But $(k - \bar{X})^2 > 0$ whenever $k \neq \bar{X}$, so the assumption leads to a contradiction.
Thus there is no number 'k' different from the mean such that the sum of squared deviations from it is as small as, or smaller than, the sum of squared deviations from the mean.
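Both properties of the mean can be checked numerically on the five values used in the illustration (3, 7, 3, 4, 5). The sketch below is only a spot check on a grid of candidate values of k, not a substitute for the proof; the helper name `sum_of_squares` and the grid of candidates are assumptions made for the example.

```python
import math

x = [3, 7, 3, 4, 5]
mean = sum(x) / len(x)    # 22 / 5 = 4.4

# Property 1: the deviations from the mean sum to zero
# (up to floating-point rounding).
sum_of_deviations = sum(xi - mean for xi in x)

def sum_of_squares(k):
    """Sum of squared differences between k and every value in x."""
    return sum((xi - k) ** 2 for xi in x)

# Property 2: among candidate values k = 0.0, 0.1, ..., 10.0, the sum of
# squares is smallest at the mean.
candidates = [step / 10 for step in range(0, 101)]
best_k = min(candidates, key=sum_of_squares)

print(round(sum_of_squares(mean), 2), sum_of_squares(4), sum_of_squares(5), best_k)
```

The three sums of squares correspond to the totals of the last three columns of the illustration (11.20, 12, and 13).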
J. Now, imagine that we have a sample of assistant professors from a large university. We want a measure of central tendency of how many publications they authored during the last two years. More specifically, "If we have a sample of 70 assistant professors and a total of 215 articles published (over half from three busy bodies on the faculty), what would be the measure of central tendency for the junior faculty?" Referring to the data below, you will note that the mean equals 3.07 publications. Clearly it would be misleading (if not blatantly inaccurate, given that only 15 of 70 published at least three articles) to say that the typical assistant professor published about 3 articles in the last two years. In such cases the MEDIAN (i.e., the score such that half of the units of analysis have higher scores and half have lower scores) provides a less misleading central tendency measure.
Figure 1: Frequency of Assistant Professors according to their Publications during the Last Two Years.
[Figure: bar chart; vertical axis: number of asst. profs.; horizontal axis: number of publications; the bar at 41 publications is labeled "busy bodies."]

Number of publications:    0    1    2    3    4   41
Number of asst. profs.:   17   25   13    7    5    3

So we have a special case.
When we need a measure of central tendency for interval/ratio data we generally want the mean. However, when the data are strongly skewed, the median is the preferred measure.
Calculating a variable's median requires that one first order the units of analysis (here the assistant professors) from lowest (zero publications) to highest (41 publications) values on the variable. The median is the value of this variable for the middle-ranked unit. In this case the middle-ranked assistant professors are those ranked 35th and 36th (i.e., those precisely in the middle of the total 70 assistant professors). Note from the distribution at the bottom of Figure 1 that all assistant professors from the 18th to the 42nd (including those ranked 35th and 36th) have values on the variable equal to one publication during the last two years. Accordingly, the median value on the publications variable equals "1."
[Cartoon: "Should we scare the opposition by announcing our mean height or lull them by announcing our median height?"]
AN ASIDE: The modal value on the publications variable also equals "1." Note that this does not usually happen. An even more special case would be when the mode = median = mean. This case is guaranteed when the variable's distribution is perfectly symmetric (whereby the mean would equal the median) and has its highest frequency at the mean/median.
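The three measures of central tendency for the assistant professors can be recomputed from the frequency distribution (17, 25, 13, 7, 5, and 3 professors with 0, 1, 2, 3, 4, and 41 publications). This is a sketch using Python's standard `statistics` module; the variable names are illustrative.

```python
import statistics

# Number of publications -> number of assistant professors.
frequency = {0: 17, 1: 25, 2: 13, 3: 7, 4: 5, 41: 3}

# Expand the frequency table into one ordered value per professor.
publications = [value
                for value, count in sorted(frequency.items())
                for _ in range(count)]

n = len(publications)            # 70 assistant professors
total = sum(publications)        # 215 articles

mean = statistics.mean(publications)      # pulled upward by the three outliers
median = statistics.median(publications)  # average of the 35th and 36th values
mode = statistics.mode(publications)      # most frequent value

print(n, total, round(mean, 2), median, mode)
```

The three professors with 41 publications move the mean to about 3, while the median and the mode both stay at 1.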
K. A few more comments about MEDIANS:
1. If the number of your observations is odd, then the median is the middle observation after you have ordered them. (NOTE: Your variable must be at least ORDINAL for you to order them!) E.g., say we had a sample of five junior faculty and we ordered them according to the number of their publications:
0 0 1 3 5
Here, the median (or middle observation) would clearly equal "1."
2. If our sample were of six assistant professors, the median would be calculated a little differently. Imagine that our data are as follows:
0 0 1 2 3 5
The median would be the average of the TWO middle observations. I.e., 1.5 = (1 + 2)/2.
3. Note that the median is the value at the 50th PERCENTILE of your distribution.
a. A general formula for determining the value one finds at the kth percentile of a distribution is this distribution's value at the
$$\left( \frac{k \times n}{100} + \frac{1}{2} \right)\text{th observation,}$$
where n = the total number of one's observations. Notice how this formula works nicely for both the even and odd examples illustrated above: with n = 5 it points to the 3rd observation, and with n = 6 to the 3.5th (i.e., halfway between the 3rd and 4th).
b. The formula works fine, of course, until you find that the value at the 96th percentile is the value at the 67.7th ( = [(96 x 70)/100] + 1/2 ) observation, which falls somewhere between 4 (the 67th observation) and 41 (the 68th observation). In this case, you would want a value that equals 4 plus 0.7 of the distance between 4 and 41. Because the distance from 4 to 41 is 37, the value at the 96th percentile would equal 29.9 ( = 4 + [0.7 * (41 - 4)] ) publications.
4. Besides the median, the two next most important percentiles are the 25th and 75th percentiles, also known as the lower and upper QUARTILES of a distribution. The difference between the upper (or 3rd) and lower (or 1st) quartiles is called the INTERQUARTILE RANGE (or IQR). To calculate the IQR of our distribution of assistant professors, we must find the number of publications of the 18th and 53rd persons. That would be 1 (just barely) and 2 publications. The IQR would be the difference between these numbers, or 1 publication. Here the meaning of the interquartile range is that the middle 50% of the junior faculty in your sample differ in their numbers of publications in the last two years by no more than one publication.
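The percentile rule used above (take the value at the (k x n / 100 + 1/2)th observation, interpolating when that position falls between two observations) can be sketched as a small Python function. The function name `percentile_value` and the clamping of positions below the 1st or above the nth observation are assumptions added for the example; other textbooks and software packages use slightly different percentile conventions.

```python
import math

def percentile_value(sorted_values, k):
    """Value at the k-th percentile using the (k*n/100 + 1/2)-th observation.

    sorted_values must already be ordered from lowest to highest.
    Positions outside 1..n are clamped (an assumption of this sketch).
    """
    n = len(sorted_values)
    position = k * n / 100 + 0.5             # 1-based, possibly fractional
    position = min(max(position, 1.0), float(n))
    lower_rank = math.floor(position)
    fraction = position - lower_rank
    lower = sorted_values[lower_rank - 1]    # convert rank to 0-based index
    if fraction == 0:
        return lower
    upper = sorted_values[lower_rank]
    # Move the given fraction of the distance from lower to upper.
    return lower + fraction * (upper - lower)

# The assistant professors: publications -> number of professors.
frequency = {0: 17, 1: 25, 2: 13, 3: 7, 4: 5, 41: 3}
publications = [v for v, c in sorted(frequency.items()) for _ in range(c)]

median = percentile_value(publications, 50)
lower_quartile = percentile_value(publications, 25)   # 18th observation
upper_quartile = percentile_value(publications, 75)   # 53rd observation
iqr = upper_quartile - lower_quartile
p96 = percentile_value(publications, 96)              # 67.7th observation

print(median, lower_quartile, upper_quartile, iqr, round(p96, 1))
```

The same function reproduces the small odd and even examples: `percentile_value([0, 0, 1, 3, 5], 50)` picks the 3rd observation, and `percentile_value([0, 0, 1, 2, 3, 5], 50)` averages the 3rd and 4th.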
L. The IQR belongs to a larger class of MEASURES OF DISPERSION (i.e., measures that indicate how "widely dispersed" one's data are on a particular measure; here, on the dispersion of publications by the junior faculty). Two other measures of dispersion are the following:
1. RANGE - the simplest measure, but also the least informative, since it only indicates the difference between the highest and lowest values. The range of publications is 41-0=41, which is a range large enough to suggest that there may be one or more outliers in the data. (An OUTLIER is an idiosyncratically large [or small] value on a variable.) The advantage of reporting a RANGE is that it shows your reader the magnitude of any outliers you may have. (IN PRACTICE one usually just reports the individual outliers in one's data and drops them from analysis.)
2. The VARIANCE - the average squared deviation from the mean. It is the measure of dispersion appropriate to interval and ratio variables. The STANDARD DEVIATION is the positive square root of the variance.
The symbol for the population variance is the lower case sigma squared ($\sigma_X^2$). The sample variance is referred to with the lower case "s" squared ($s_X^2$). The symbol for the estimated population variance is the lower case sigma squared with a "hat" over it ($\hat{\sigma}_X^2$). The symbols for the population, sample, and "estimated population" standard deviations are respectively the same, but not squared (i.e., the same, but without the superscripted "2"). Formulas for these variances are as follows:
$$\sigma_X^2 = \frac{\sum_{i=1}^{N} (X_i - \mu_X)^2}{N} \qquad s_X^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n} \qquad \hat{\sigma}_X^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
where $\mu_X$ is the population mean.
NOTE: We shall talk about the last formula's "n-1" within the next few weeks.
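A short numerical sketch of the variance and standard deviation follows. It assumes the usual forms of the formulas: the sample variance divides the sum of squared deviations by n, and the estimated population variance divides it by n - 1. The data are the illustrative values 3, 7, 3, 4, 5. Note that Python's standard library names these differently: `statistics.pvariance` divides by n and `statistics.variance` divides by n - 1.

```python
import math
import statistics

x = [3, 7, 3, 4, 5]
n = len(x)
mean = sum(x) / n

# Sum of squared deviations from the mean.
sum_sq_dev = sum((xi - mean) ** 2 for xi in x)

# Assumed formulas: divide by n for the sample variance,
# by n - 1 for the estimated population variance.
sample_variance = sum_sq_dev / n
estimated_population_variance = sum_sq_dev / (n - 1)

# Standard deviations are the positive square roots of the variances.
sample_sd = math.sqrt(sample_variance)
estimated_population_sd = math.sqrt(estimated_population_variance)

# Cross-check against the standard library.
matches_pvariance = math.isclose(sample_variance, statistics.pvariance(x))
matches_variance = math.isclose(estimated_population_variance,
                                statistics.variance(x))

print(round(sample_variance, 2), round(estimated_population_variance, 2))
```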
REMEMBER: Certain measures of central tendency and of dispersion are appropriate to certain types of variables. Be sure you KNOW WHICH GOES WITH WHICH!!! When means are used as measures of central tendency, variances should be used as measures of dispersion; when medians are used as measures of central tendency, interquartile ranges should be used as measures of dispersion. Note also that there is no measure of dispersion when the mode is the appropriate measure of central tendency (at least not when dispersion is conceived as variation "above" or "below" the variable's mode).[1] This is because none of a nominal-level variable's attributes can be referred to as "closer to" or "further from" another of the variable's attributes. Accordingly, it is meaningless to speak of these attributes as being "dispersed" along some common continuum.

[1] Of course, a nominal variable's variability can be measured by noting the extent to which its units of analysis are evenly proportioned among its attributes. For example, there is more variability in religion when half are Protestant and half Catholic than when 90 percent are Protestant. Yet such measures (e.g., the Index of Qualitative Variation) do not constitute measures of dispersion as defined here (i.e., measures of variation along a single dimension above and below a variable's measure of central tendency).