Lecture Notes

Psych 5500/6500
Measures of Central Tendency
Fall, 2008
1
Measures of Central Tendency
Various ways of indicating the most typical or
average score.
1. Mean
2. Median
3. Mode
2
The Mean
$\bar{Y} = \frac{\sum Y}{n}$
‘$\bar{Y}$’ is the mean of Y
‘n’ is the number of scores in the sample
$\sum_{i=1}^{n} Y_i$ or $\sum Y$ means sum all of the scores
Some people use ‘n’ to represent the size of a sample and ‘N’ to
represent the size of a population. I use either ‘n’ or ‘N’ apparently
arbitrarily and then I depend upon context to make it clear.
3
Summation Symbols
Y = 3, 4, 5, 8
n = 4
$Y_1 = 3$, $Y_2 = 4$, $Y_3 = 5$, $Y_4 = 8$
$\sum_{i=1}^{n} Y_i = 3 + 4 + 5 + 8 = 20$
$\sum Y$ implies that you will start with
the first number and add all the way to
the last number.
$\sum_{i=2}^{3} Y_i = 4 + 5 = 9$
4
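The summation notation can be sketched in a few lines of Python. The helper name `summation` and its `start`/`stop` parameters are made up for illustration; they mimic the subscripts under and over the $\sum$ sign, which count from 1, while Python lists count from 0.

```python
# Scores from the slide: Y1 = 3, Y2 = 4, Y3 = 5, Y4 = 8.
Y = [3, 4, 5, 8]
n = len(Y)

def summation(scores, start=1, stop=None):
    """Add Y_start through Y_stop, using 1-based subscripts (hypothetical helper)."""
    if stop is None:
        stop = len(scores)          # default: go all the way to the last score
    # Y_i lives at list index i - 1, and slicing excludes its end index.
    return sum(scores[start - 1:stop])

total = summation(Y)          # i = 1 to n
partial = summation(Y, 2, 3)  # i = 2 to 3
print(total, partial)
```

Leaving out `start` and `stop` corresponds to writing $\sum Y$ with no subscripts: start with the first score and add all the way to the last.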
The Mean (Computation Example)
$\bar{Y} = \frac{\sum Y}{n}$
Y: 5, 4, 1, 3, 2, 7
$\sum Y = 5 + 4 + 1 + 3 + 2 + 7 = 22$
n = 6
$\bar{Y} = \frac{22}{6} = 3.6666666\ldots \approx 3.67$
5
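The same computation as a minimal Python sketch. The function name `mean` is just an illustrative choice; it follows the formula directly (sum of the scores divided by the number of scores).

```python
def mean(scores):
    """Sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)

Y = [5, 4, 1, 3, 2, 7]
y_bar = mean(Y)

# Show the pieces of the formula, then the mean reported to two decimal places.
print(sum(Y), len(Y), f"{y_bar:.2f}")
```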
Rounding Conventions
The sample mean was 3.666666…, with an infinite number of 6’s to the
right of the decimal point. I would like to establish the following
rules about rounding off your answers:
1. Go at least two places to the right of the decimal point (e.g.
rounding off to 3.67 or 3.667 is ok but 3.7 is not). If you are
using SPSS or having your calculator keep track of your
intermediate calculations it won’t be rounding off at all and that is
fine.
2. If the first number after that is ‘5’ or greater, round up; if it is ‘4’ or
less, don’t round up. Thus 3.666 is rounded to 3.67, while 3.333 is
rounded to 3.33.
Now, if you know something about the topic of ‘significant figures’ this
policy doesn’t make any sense. It will, however, keep all of you in
the same ballpark when it comes to computing answers and
handing them in to be graded.
6
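If you check your answers in Python, be aware that the built-in `round()` rounds a trailing 5 to the nearest even digit, which does not match rule 2. The sketch below uses the standard `decimal` module instead; the helper name `round_half_up` is hypothetical, and passing the value through `str()` is a convenience that assumes ordinary float input.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value, places=2):
    """Round so that a following digit of 5 or greater rounds up (hypothetical helper)."""
    quantum = Decimal(10) ** -places            # Decimal('0.01') when places=2
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)

print(round_half_up(22 / 6))   # the sample mean from the computation example
print(round_half_up(3.333))
print(round_half_up(3.125))    # a trailing 5: rule 2 says round up
```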
Mean (Interesting Property #1)
The mean is the balance point of a frequency
distribution
Y = 1, 2, 2, 3, 3, 3, 5, 6
$\bar{Y} = 3.13$
7
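One way to see the balance-point property: the distances of the scores below the mean exactly offset the distances of the scores above it, so the deviations from the mean add up to zero. A quick check in Python with the scores from this slide (variable names are illustrative):

```python
Y = [1, 2, 2, 3, 3, 3, 5, 6]
y_bar = sum(Y) / len(Y)    # 25 / 8, reported on the slide as 3.13

# Deviations are negative for scores below the mean and positive for scores above it.
deviations = [y - y_bar for y in Y]
balance = sum(deviations)
print(y_bar, balance)
```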
Effect of Outliers
One extreme score can have a big effect on the mean.
Change the last 6 to a 10: Y = 1, 2, 2, 3, 3, 3, 5, 10 and $\bar{Y} = 3.63$
8
Outliers (cont.)
Change the last 6 to a 50: Y = 1, 2, 2, 3, 3, 3, 5, 50 and $\bar{Y} = 8.63$
Thus one outlier can dramatically affect the mean, making it no
longer an effective representation of the majority of the scores.
9
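The effect of the outlier on the mean can be reproduced with a short Python sketch. Only the last score changes between the three data sets; the names `base` and `means` are illustrative.

```python
def mean(scores):
    return sum(scores) / len(scores)

base = [1, 2, 2, 3, 3, 3, 5]    # the seven scores that never change

# Swap in each version of the last score and recompute the mean.
means = {last: mean(base + [last]) for last in (6, 10, 50)}
for last, value in means.items():
    print(f"last score = {last:2d}, mean = {value}")
```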
Outliers and Skewed Data
An extreme score (extreme when compared to the
other scores in the distribution) is called an
outlier. A distribution that has a number of
extreme scores off in just one direction is said to
be skewed. In general the mean is not a good
measure of central tendency when you have an
outlier or with skewed data as it is affected by the
extreme scores off in one direction, making it no
longer representative of the majority of the scores.
10
The Median
The median is the middle score, the score that
half of the scores are less than and half of
the scores are greater than.
11
The Median (Computation)
Step 1: First put the scores in order from
smallest to largest.
Step 2: If n is odd then the median is the one
score in the middle, if n is even then the
median is the mean of the two middle
scores.
12
Median (Computation example)
Example when ‘n’ is odd.
Y = 1, 6, 5, 3, 2, 4, 2
Step 1:
1, 2, 2, 3, 4, 5, 6
Step 2: as n is odd (n=7) there is one score in
the middle. The median = 3.
13
Median (Computation example)
Example when ‘n’ is even.
Y = 12, 9, 10, 8, 11, 7
Step 1:
7, 8, 9, 10, 11, 12
Step 2: as n is even (n=6) there are two
scores in the middle, the median =
(9+10)/2=9.5
14
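The two steps can be sketched in Python as follows. The function name `median` is an illustrative choice; the standard library's `statistics.median` follows the same rule.

```python
def median(scores):
    ordered = sorted(scores)      # Step 1: smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                # Step 2, n odd: the one score in the middle
        return ordered[mid]
    # Step 2, n even: the mean of the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_example = median([1, 6, 5, 3, 2, 4, 2])
even_example = median([12, 9, 10, 8, 11, 7])
print(odd_example, even_example)
```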
Median (Interesting Property #1)
The median divides the area of the histogram
into two equal parts.
15
Effect of Outliers
The median is not affected by an outlier.
16
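A quick check in Python, reusing the scores from the earlier outlier slides and the standard library's `statistics.median`. The variable names are illustrative.

```python
import statistics

base = [1, 2, 2, 3, 3, 3, 5]

# Make the last score more and more extreme and recompute the median each time.
medians = {last: statistics.median(base + [last]) for last in (6, 10, 50)}
print(medians)
```

The two middle scores are both 3 no matter how large the last score becomes, so the median stays put while the mean climbs.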
Median: Special Case
Sometimes, when the median is a value that
occurs more than once in the data, then the
simple formula I gave doesn’t quite work. For
example, say your data are:
Y = 1, 2, 2, 2, 3, 4
The median is ‘2’ but there is only one score below
‘2’ while there are two scores above ‘2’. In this
case a median of 2 does not divide the area of
the distribution into two equal pieces (see next
slide).
17
Note we have 1+1/2+1/2+1/2 = 2.5 boxes below the median, while we have
2+1/2+1/2+1/2 = 3.5 boxes above the median.
18
If we tweak the value of the median a tad, then we get 1+2/3+2/3+2/3=3 boxes below
the median, and 2 + 1/3+1/3+1/3= 3 boxes above the median.
19
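The ‘tweaked’ value can be worked out with a short Python sketch. It assumes, as the boxes in the histogram suggest, that each score occupies an interval one unit wide centered on the score (so the 2’s cover 1.5 to 2.5). The function name `interpolated_median` and the `width` parameter are hypothetical.

```python
def interpolated_median(scores, width=1):
    """Point that splits the histogram area in half, assuming boxes of the given width."""
    ordered = sorted(scores)
    n = len(ordered)
    middle = ordered[n // 2]                     # the score the median falls in
    below = sum(1 for y in ordered if y < middle)
    tied = ordered.count(middle)                 # how many boxes sit on that score
    lower_edge = middle - width / 2
    # Move into the tied boxes far enough to put n/2 boxes of area below the median.
    return lower_edge + width * (n / 2 - below) / tied

tweaked = interpolated_median([1, 2, 2, 2, 3, 4])
print(round(tweaked, 2))
```

Here one score lies below the 2’s and three boxes sit on 2, so the median moves 2/3 of the way from 1.5 to 2.5, which matches the 2/3 and 1/3 splits described above.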
Final Word on Median
The ‘tweaking’ of the median to preserve its
definition of dividing the area of the
distribution into two equal parts is rarely
done. Usually the simpler formula I have
given (arrange the scores then find the
middle of that list) is used; this is what
SPSS does. Consequently, we will state
that the median of Y = 1, 2, 2, 2, 3, 4 is ‘2’.
20
Mean, Median, and Skewed
Data
• The median is often preferred over the mean
when you have skewed data.
• Price of homes:
$100,000
$130,000
$160,000
$180,000
$2,200,000
Mean = $554,000
Median = $160,000
21
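The home price example as a Python sketch, using the standard `statistics` module. The variable names are illustrative.

```python
import statistics

prices = [100_000, 130_000, 160_000, 180_000, 2_200_000]

mean_price = statistics.mean(prices)
median_price = statistics.median(prices)
print(f"Mean = ${mean_price:,.0f}")
print(f"Median = ${median_price:,.0f}")
```

Four of the five homes cost far less than the mean, which is why the median is the better description here.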
The Mode
The mode is the score that occurs most often.
Y= 2, 4, 5, 5, 5, 7, 8, 9
Mode = 5
Sometimes there is no mode, sometimes there
is more than one mode.
22
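A minimal Python sketch that handles all three cases (one mode, no mode, more than one mode). The helper name `modes` is hypothetical, and treating a sample in which no score repeats as having no mode is an assumption about the convention in use.

```python
from collections import Counter

def modes(scores):
    """Return every score tied for the highest frequency (hypothetical helper)."""
    counts = Counter(scores)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []            # assumption: no score repeats, so report no mode
    return sorted(y for y, c in counts.items() if c == top)

one_mode = modes([2, 4, 5, 5, 5, 7, 8, 9])
no_mode = modes([1, 2, 3, 4])
two_modes = modes([1, 1, 2, 4, 4])
print(one_mode, no_mode, two_modes)
```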
Mode (Semi-Interesting Property)
The mode is the peak of a histogram
23
Bimodal Distributions
The term bimodal is used when there are two peaks in the
distribution even if both peaks aren’t exactly the same
size. On a survey question measuring people’s views
on a very controversial topic—one that few people feel
neutral about--you might get a clump of low scores
(with its own mode) and a clump of high scores (with
its own mode) and the distribution could be called
bimodal even if the two peaks are not identical in
height (see graph below).
24
Nominal Scales and Central
Tendency
Racial background:
1=African American
2=Asian American
3=European American
4=Native American
Y= 1, 1, 2, 4
Mean=2 Median=1.5 Mode=1
Only the mode makes sense.
25
Ordinal Scales and the Mean
Size of household debt:
1=None ($0)
2=Tiny ($1 to $500)
3=Very Small ($501-$1000)
4=Large ($1000+)
One person had a debt of $200 (Y=2) and one person had a
debt of $2,000,000 (Y=4)
Y= 2, 4
Mean=3: you are saying that the average debt in the sample
was ‘Very Small’. This obviously isn’t working.
26
Ordinal Scales and the Median
Size of household debt:
1=None ($0)
2=Tiny ($1 to $500)
3=Very Small ($501-$1000)
4=Large ($1000+)
Y= 1, 3, 4
Median=3: you are saying that half the sample had a debt that
was very small or less, and half had a debt that was very
small or larger. This makes sense.
27
Ordinal Scales and the Mode
Size of household debt:
1=None ($0)
2=Tiny ($1 to $500)
3=Very Small ($501-$1000)
4=Large ($1000+)
Y= 1, 2, 3, 3, 3, 4
Mode=3: you are saying that the score that happened the
most in the sample was a ‘3’, this also makes sense.
28
Rank Scales and Central Tendency (1)
Within a sample:
Order of finish in a foot race: Y = 1,2,3,4
Mean=2.5, Median=2.5, no mode
You will get exactly the same values anytime you
race four people, so what good are they?
29
Rank Scales and Central Tendency (2)
When rank scores are used it is usually within a somewhat more
complicated experimental design (e.g. one involving two groups).
An example would be to take ten out-of-shape people, randomly
divide them into two groups of 5 people each, have one group do a
lot of training, then have all ten run a race and measure how they
place in the race (a rank measure). The data might look like this:
Training group: Y = 1, 2, 4, 5, 7 (median = 4)
No training group: Y = 3, 6, 8, 9, 10 (median = 8)
It looks like the ‘training group’ placed better in the race than the ‘no
training group’. To compare the performance of the two groups you
could compare the medians of the two groups (it would be
inappropriate to use the means of the groups because these are
ordinal-type numbers). There is no mode so you can’t use that.
30
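The comparison of the two groups can be sketched in Python with the standard library's `statistics.median`. The variable names are illustrative; lower ranks mean a better finish.

```python
import statistics

training = [1, 2, 4, 5, 7]         # finishing places of the training group
no_training = [3, 6, 8, 9, 10]     # finishing places of the no training group

median_training = statistics.median(training)
median_no_training = statistics.median(no_training)
print(median_training, median_no_training)
```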
Cardinal Scales and Central
Tendency
How many magazines various households
subscribe to:
Y= 1, 1, 2, 4
Mean=2 Median=1.5 Mode=1
They all make sense.
31
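All three measures for the magazine data, as a short Python sketch using the standard `statistics` module (variable names are illustrative):

```python
import statistics

Y = [1, 1, 2, 4]    # number of magazine subscriptions per household

mean_y = statistics.mean(Y)
median_y = statistics.median(Y)
mode_y = statistics.mode(Y)
print(f"Mean={mean_y} Median={median_y} Mode={mode_y}")
```

The code would produce the same three numbers for the racial background codes on the earlier nominal slide, since the scores are identical; it is the measurement scale, not the arithmetic, that decides which of them make sense.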
Selecting a Measure of Central
Tendency
1. The most important guideline for selecting which
measure of central tendency to use is to select
the one that does the best job of representing the
data given what you are trying to determine.
Sometimes you would be more interested in
knowing that most families in some sample had 2
children than you would in knowing that the
average number of children per household was 2.43. Common
sense, what you need to know, and which
measure best represents what you need to know,
will all determine which measure(s) you select.
32
Selecting a Measure of Central
Tendency
2. By far, more statistical tools (including the
ones we will be covering in this class) are
developed around the mean than for any
other measure of central tendency. Also,
more people understand the mean as ‘the
average’ than they do the other measures.
33
Selecting a Measure of Central
Tendency
3. The median does a better job than the
mean at describing skewed data. There
are many more tools you can apply to the
mean, however, and so it may make more
sense to make the data be less skewed so
you can use the mean. We will learn how
to deskewify the data later in this class
(don’t try to find that word in the dictionary).
34
Selecting a Measure of Central
Tendency
4. The measurement scale you use might
determine which measure of central
tendency would be appropriate (see the
earlier slides)
35