Measures of Central Tendency

Descriptive Statistics
Measures of Central Tendency
 A measure of central tendency is a value that
represents a typical, or central, entry of a data
set. The three most commonly used measures of
central tendency are:
 the mean
 the median
 the mode
The Mean
 The mean (arithmetic average) of a data set is the sum of the
data entries divided by the number of entries. To find the
mean of a data set, use one of the following formulas.
 Population (Parameter) Mean: μ = Σx / N
 Sample (Statistic) Mean: x̄ = Σx / n
 The lowercase Greek letter μ (pronounced “mu”)
represents the population mean and x̄ (read as “x bar”)
represents the sample mean. Σx is the sum of the data entries.
Note that N represents the number of entries in a population
and n represents the number of entries in a sample.
Finding a Sample Mean
 The prices (in dollars) for a sample of room air conditioners are
listed. What is the mean price of the air conditioners?
$500 $840 $470 $480 $420 $440 $440
Σx = 500 + 840 + 470 + 480 + 420 + 440 + 440 = 3590
x̄ = Σx / n = 3590 / 7 ≈ 512.86
The mean price is about $512.86.
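The calculation above can be sketched in a few lines of Python. This is only an illustration; the function name `sample_mean` is an assumption made for this example and does not come from the slides.

```python
def sample_mean(entries):
    """Return the sum of the entries divided by the number of entries."""
    if not entries:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(entries) / len(entries)


# Air conditioner prices (in dollars) from the example above.
prices = [500, 840, 470, 480, 420, 440, 440]

mean_price = sample_mean(prices)
print(round(mean_price, 2))  # 512.86
```

The value is rounded only when it is printed, so the stored `mean_price` keeps full precision.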
General Rounding Rule: In statistics, the basic rounding rule is to
carry full precision through the calculations and round only the
final answer. Rounding in the intermediate steps tends to increase
the difference between the computed answer and the exact one.
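A small sketch of why the rounding rule matters, reusing the air conditioner prices. The idea of multiplying the mean back by the number of entries is an assumption chosen for this illustration, not a step from the slides.

```python
prices = [500, 840, 470, 480, 420, 440, 440]
n = len(prices)

exact_mean = sum(prices) / n               # keep full precision
early_rounded_mean = round(exact_mean, 1)  # rounded in an intermediate step

# Multiply each mean back by n to try to recover the total of 3590.
total_from_exact = exact_mean * n
total_from_rounded = early_rounded_mean * n

print(round(total_from_exact, 2))
print(round(total_from_rounded, 2))
```

The first total recovers 3590, while the total built from the early-rounded mean is off by about 0.30.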
The Median
 The median of a data set is the value that lies in
the middle of the data when the data set is
ordered. If the data set has an odd number of
entries, the median is the middle data entry. If
the data set has an even number of entries, the
median is the mean of the two middle data
entries.
Finding the Median
 Find the median of the air conditioner prices given in the
previous example.
$420 $440 $440 $470 $480 $500 $840
Because there are seven entries (an odd number), the median is
the middle, or fourth, data entry. Therefore, the median air
conditioner price is $470.
Finding the Median
 What if we added $600 to our data? Find the median of the air
conditioner prices in this example.
$420 $440 $440 $470 $480 $500 $600 $840
Because there are now eight entries (an even number), the median
is the mean of the two middle entries, the fourth and the fifth.
Add those two entries and divide by 2 to find the median air
conditioner price.
Median = (470 + 480) / 2 = $475
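Both median cases (odd and even number of entries) can be sketched as follows. The function name `median` here is a local helper written for this example.

```python
def median(entries):
    """Return the middle value of the ordered data.

    Odd number of entries: the middle entry.
    Even number of entries: the mean of the two middle entries.
    """
    if not entries:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(entries)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


prices = [500, 840, 470, 480, 420, 440, 440]
print(median(prices))          # 470
print(median(prices + [600]))  # 475.0
```

The data is sorted inside the function, so the entries can be supplied in any order.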
The Mode
 The mode of a data set is the data entry that
occurs with the greatest frequency. A data set with
one mode is unimodal. If no entry is repeated, the data
set has no mode. If two entries occur with
the same greatest frequency, each entry is a
mode and the data set is called bimodal. A data set
with more than two modes is multimodal.
 The mode is the only measure of central
tendency that can be used to describe data at the
nominal level of measurement.
Finding the Mode
 Find the mode of the air conditioner prices in
our previous example.
$420 $440 $440 $470 $480 $500 $840
From the ordered data, you can see that the entry
440 occurs twice, whereas the other data entries
occur once. So the mode of the air conditioner
prices is $440.
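A minimal sketch of finding the mode, following the slide's convention that a data set with no repeated entry has no mode. The helper name `modes` is an assumption for this example; it returns a list so that bimodal and multimodal data can be reported too.

```python
from collections import Counter


def modes(entries):
    """Return every entry that occurs with the greatest frequency.

    Returns an empty list when no entry is repeated (no mode).
    """
    counts = Counter(entries)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # no entry is repeated, so there is no mode
    return sorted(value for value, count in counts.items() if count == highest)


prices = [420, 440, 440, 470, 480, 500, 840]
print(modes(prices))                   # [440]
print(modes(["red", "blue", "red"]))   # ['red']
```

The second call shows that the same counting approach works for nominal data, where a mean or median would not make sense.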
Measures of Central Tendency
 Although the mean, the median, and the mode
each describe a typical entry of a data set, there
are advantages and disadvantages of using each,
especially when the data set contains outliers.
 An outlier is a data entry that is far removed from
the other entries in the data set.
Comparing the Mean, the Median and
Mode
 Find the mean, the median, and the mode of the sample
ages of a class shown below. Which measure of central
tendency best describes a typical entry of this data set?
Are there any outliers?
Ages in a Class
20 20 20 20 20 20 21
21 21 21 22 22 22 23
23 23 23 24 24 65
 Mean = 475/20 = 23.75 ≈ 23.8
 Median = (21 + 22)/2 = 21.5
 Mode = 20, the entry that occurs most often (six times).
 Interpretation: The mean takes every entry into
account but is influenced by the outlier of 65. The
median also takes every entry into account, and it is
not affected by the outlier. In this case the mode
exists, but it doesn't appear to represent a typical
entry.
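The comparison can be checked with Python's standard `statistics` module. Recomputing the measures without the entry 65 is an extra step added here for illustration; it is not part of the original example.

```python
import statistics

ages = [20, 20, 20, 20, 20, 20, 21,
        21, 21, 21, 22, 22, 22, 23,
        23, 23, 23, 24, 24, 65]

print(statistics.mean(ages))    # 23.75
print(statistics.median(ages))  # 21.5
print(statistics.mode(ages))    # 20

# Drop the outlier (65) to see how much each measure moves.
ages_without_outlier = [age for age in ages if age != 65]
print(round(statistics.mean(ages_without_outlier), 1))  # 21.6
print(statistics.median(ages_without_outlier))          # 21
```

Removing the single outlier lowers the mean by more than two years but moves the median by only half a year, which is why the median is the better description of a typical age here.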
Graphical Comparison
Sometimes a graphical comparison can help you decide which measure of central tendency best
represents a data set. In this case the median best describes the data set.
Midrange
 The midrange is a rough estimate of the middle. It is found
by adding the lowest and the highest values in the data set and
dividing by 2:
Midrange = (lowest value + highest value) / 2
Because it uses only two values, it is a very rough estimate of
the average and can be affected by one extremely high or low value.
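A short sketch of the midrange, applied to the two data sets used earlier. The function name `midrange` is an assumption for this example.

```python
def midrange(entries):
    """Return (lowest value + highest value) / 2."""
    if not entries:
        raise ValueError("the midrange of an empty data set is undefined")
    return (min(entries) + max(entries)) / 2


prices = [500, 840, 470, 480, 420, 440, 440]
ages = [20, 20, 20, 20, 20, 20, 21, 21, 21, 21,
        22, 22, 22, 23, 23, 23, 23, 24, 24, 65]

print(midrange(prices))  # 630.0
print(midrange(ages))    # 42.5
```

For the ages, the midrange of 42.5 is far from every entry except the outlier, which shows how strongly one extreme value affects it.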
Weighted Mean and Mean of Grouped
Data
 Sometimes data sets contain entries that have a greater effect
on the mean than do other entries. To find the mean of such
data sets, you must find the weighted mean.
 A weighted mean is the mean of a data set whose entries have
varying weights. A weighted mean is given by
x̄ = Σ(x*w) / Σw
where w is the weight of each entry x.
Finding a Weighted Mean
 You are taking a class in which your grade is determined
from five sources: 50% from your test mean, 15% from your
nine weeks test, 20% from your semester exam, 10% from your
computer lab work, and 5% from your homework. Your scores
are 86 (test mean), 96 (nine weeks test), 82 (semester exam),
98 (computer lab work), and 100 (homework). What is the
weighted mean of your scores?
Source            Score, x   Weight, w   x*w
Test mean            86        0.50      43.0
Nine weeks exam      96        0.15      14.4
Semester exam        82        0.20      16.4
Computer lab         98        0.10       9.8
Homework            100        0.05       5.0
                             Σw = 1    Σ(x*w) = 88.6
Weighted mean = Σ(x*w) / Σw = 88.6 / 1 = 88.6
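The table can be reproduced with a small function that sums the products x*w and divides by the sum of the weights. The name `weighted_mean` and its argument order are assumptions for this sketch.

```python
def weighted_mean(scores, weights):
    """Return sum(x * w) / sum(w) for paired scores and weights."""
    if len(scores) != len(weights):
        raise ValueError("each score needs exactly one weight")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(x * w for x, w in zip(scores, weights)) / total_weight


# Test mean, nine weeks test, semester exam, computer lab, homework.
scores = [86, 96, 82, 98, 100]
weights = [0.50, 0.15, 0.20, 0.10, 0.05]

print(round(weighted_mean(scores, weights), 1))  # 88.6
```

Dividing by the sum of the weights means the function also works when the weights do not add up to 1, for example when they are given as points rather than fractions.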
What if data is presented in a
frequency distribution?
 The mean of a frequency distribution is approximated by
Sample: x̄ = Σ(x*f) / n, where n = Σf
Population: μ = Σ(x*f) / N, where N = Σf
Here x and f are the midpoint and frequency of each class,
respectively.
Guidelines for finding the mean of a
frequency distribution
Finding the mean of a frequency
distribution
 Use the frequency distribution below to approximate the
mean number of minutes that a sample of Internet subscribers
spent online during their most recent session.
Class midpoint, x   Frequency, f   x*f
12.5                 6              75.0
24.5                10             245.0
36.5                13             474.5
48.5                 8             388.0
60.5                 5             302.5
72.5                 6             435.0
84.5                 2             169.0
                    n = 50         Σ(x*f) = 2089.0
x̄ = Σ(x*f) / n = 2089.0 / 50 ≈ 41.8
So the mean time spent online was approximately 41.8 minutes.
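The same approximation can be sketched in Python using the class midpoints and frequencies from the table. The function name `grouped_mean` is an assumption for this example.

```python
def grouped_mean(midpoints, frequencies):
    """Approximate the mean of grouped data as sum(x * f) / sum(f)."""
    if len(midpoints) != len(frequencies):
        raise ValueError("each class needs one midpoint and one frequency")
    n = sum(frequencies)
    if n == 0:
        raise ValueError("the frequencies must not sum to zero")
    return sum(x * f for x, f in zip(midpoints, frequencies)) / n


midpoints = [12.5, 24.5, 36.5, 48.5, 60.5, 72.5, 84.5]
frequencies = [6, 10, 13, 8, 5, 6, 2]

print(sum(frequencies))                                 # 50
print(round(grouped_mean(midpoints, frequencies), 1))   # 41.8
```

The result is an approximation because every entry in a class is treated as if it were equal to the class midpoint.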
The Shapes of Distributions
 A graph reveals several characteristics of a frequency distribution.
One such characteristic is the shape of the distribution.
 A frequency distribution is symmetric when a vertical line can
be drawn through the middle of a graph of the distribution and
the resulting halves are approximately mirror images.
 A frequency distribution is uniform (or rectangular) when all
entries, or classes, in the distribution have equal frequencies. A
uniform distribution is also symmetric.
 A frequency distribution is skewed if the “tail” of the graph
elongates more to one side than to the other. A distribution is
skewed left (negatively skewed) if its tail extends to the left. A
distribution is skewed right (positively skewed) if its tail
extends to the right.
Shapes cont…
 When a distribution is symmetric and unimodal, the mean,
median, and the mode are equal. If a distribution is skewed
left, the mean is less than the median and the median is
usually less than the mode. If a distribution is skewed right,
the mean is greater than the median and the median is
usually greater than the mode.
 The mean generally falls in the direction in which the
distribution is skewed. For instance, when the distribution is
skewed left, the mean is to the left of the median.
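The relationship between the mean and the median can be turned into a rough check of skew direction. This is only a rule-of-thumb sketch, not a formal skewness measure; the helper `skew_direction` and the mirrored data set are assumptions made for this illustration.

```python
import statistics


def skew_direction(entries, tolerance=1e-9):
    """Guess the skew direction by comparing the mean with the median."""
    mean = statistics.mean(entries)
    median = statistics.median(entries)
    if abs(mean - median) <= tolerance:
        return "inconclusive"  # consistent with a symmetric distribution
    return "right" if mean > median else "left"


ages = [20, 20, 20, 20, 20, 20, 21, 21, 21, 21,
        22, 22, 22, 23, 23, 23, 23, 24, 24, 65]

# Reflect the ages around 42.5 to build a left-skewed data set.
mirrored = [85 - age for age in ages]

print(skew_direction(ages))             # right
print(skew_direction(mirrored))         # left
print(skew_direction([1, 2, 3, 4, 5]))  # inconclusive
```

For the class ages, the mean (23.75) lies to the right of the median (21.5), matching the long right tail created by the entry 65.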
Examples
Properties and Uses of Central
Tendency