Uploaded by Terry Ricky

# Measure of Central Tendency - Ungrouped and Grouped Data

CHAPTER 3: MEASURES OF CENTRAL TENDENCY
We can get a better understanding of a data set if we can locate the middle or centre of the
data, and also get an indication of its spread or dispersion. Knowing one of these without the
other is often of little use.
There are three statistics that are used to measure the centre of a dataset. They are the mode,
the mean and the median.
3.1 Measures of central tendency
If a dataset relates to more than one variable, one may construct a frequency distribution for
each of the variables. After producing a frequency distribution for a particular variable, the
next step in summarizing the values recorded for that variable is to indicate the point where
the “centre of the distribution” (in some sense) lies, ie. where the values in the dataset tend to
cluster. This is the purpose of a measure of central tendency, or average, or mean.
Since the term “centre of a distribution” may be interpreted in several different ways, there
are several different measures of central tendency, including arithmetic mean (usually just
called “mean” or “average”), geometric mean, harmonic mean, median, and mode.
A measure of central tendency may be calculated for a dataset relating to a population of N
units or for a dataset relating to a sample of n units. The formulas in the following sections
use capital letters and relate to populations; similar formulas using lower case letters relate to
samples.
3.2 Definition of the arithmetic mean
The arithmetic mean of a dataset is the value of the variable that each unit would have if
every unit to which the dataset relates had the same value and the total of their values was the
same as the actual total for the dataset, ie. it is the common value that every unit would have
if the total of the dataset were re-allocated equally amongst all the units to which the dataset
relates.
3.3 Calculating the mean of raw data
To calculate the (arithmetic) mean for a particular variable, add the values of all the
observations for that variable and divide by the number of observations.
Formula: $\bar{X} = \sum X / N$
where X denotes the value of an observation
3.4 Calculating the mean of an ungrouped frequency distribution
The procedure for calculating the mean in this case is as follows.
Step 1: Multiply each possible value (X) by its frequency (F), in order to calculate the class
total, ie. the total of all the observations in the class.
Step 2: Add all these products, in order to obtain the grand total of all the observations in the
dataset.
Step 3: Divide the grand total by the total frequency (N), ie. the total number of observations.
Formulas: $N = \sum F$ and $\bar{X} = \sum FX / N$
where X denotes the value of the variable for each class and F denotes the class
frequency.
Example 3.4.1:

| X | F | FX |
|---|---|---|
| 6 | 1 | 6 |
| 7 | 4 | 28 |
| 8 | 3 | 24 |
| 9 | 2 | 18 |
| 34 | 1 | 34 |
| Total | 11 | 110 |

N = 11
$\bar{X} = \sum FX / N = 110/11 = 10$
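The three steps for an ungrouped frequency distribution can be sketched in a few lines of Python. This is only an illustration: the dictionary `freq` and the function name `freq_mean` are names chosen here, not part of the chapter.

```python
# Ungrouped frequency distribution from Example 3.4.1: value X -> frequency F
freq = {6: 1, 7: 4, 8: 3, 9: 2, 34: 1}


def freq_mean(freq):
    """Mean of an ungrouped frequency distribution: (sum of F*X) / (sum of F)."""
    n = sum(freq.values())                             # N = sum of F
    grand_total = sum(x * f for x, f in freq.items())  # Steps 1-2: add the class totals FX
    return grand_total / n                             # Step 3: divide by N


print(freq_mean(freq))  # 10.0
```

The result agrees with the hand calculation above (110/11 = 10).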
Additional Examples
Example 3.4.2
The number of faulty products returned to an electrical goods store over a 21-day period is:
3, 5, 4, 3, 4, 5, 9, 9, 8, 8, 8, 6, 6, 3, 4, 7, 7, 1, 9, 1, 3
For this data set, find the:
a. mean
b. median
c. mode
d. 60th percentile
Solutions
a. Mean: $\bar{x} = (3 + 4 + 4 + 9 + 8 + 8 + \dots + 1)/21 = 113/21 \approx 5.38$ faulty products
b. Median: as $n = 21$, $(n + 1)/2 = 11$, therefore from the ordered set
1 1 3 3 3 3 4 4 4 5 5 6 6 7 7 8 8 8 9 9 9
the 11th score is 5.
c. Mode is 3, which occurs the most often.
d. $P_{60}$: $(60/100) \times 21 = 12.6$, so the 13th score is used, giving $P_{60} = 6$.
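The four answers for Example 3.4.2 can be checked with Python's standard `statistics` module. The sketch below uses the ordered set from solution (b) as its data. The helper `percentile_score` is a hypothetical name, and it assumes that the rank p/100 × n is rounded up to the next whole score (12.6 becomes the 13th score); other percentile conventions exist and can give slightly different values.

```python
import math
import statistics

# The 21 daily counts, taken from the ordered set in solution (b)
faulty = [1, 1, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 8, 9, 9, 9]


def percentile_score(data, p):
    """Score at the p-th percentile: rank = p/100 * n, rounded up (assumed rule)."""
    ordered = sorted(data)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 60 * 21 / 100 = 12.6 -> 13
    return ordered[rank - 1]                          # ranks are 1-based


print(round(statistics.mean(faulty), 2))  # 5.38
print(statistics.median(faulty))          # 5
print(statistics.mode(faulty))            # 3
print(percentile_score(faulty, 60))       # 6
```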
Example 3.4.3
If 6 people have a mean mass of 53.7 kg, find their total mass.
Solutions
$\sum \text{mass} / 6 = 53.7$ kg
$\therefore$ sum of masses $= 53.7 \times 6 = 322.2$ kg
3.5 Calculating the mean of a grouped frequency distribution
In a grouped frequency distribution, there is a loss of information; the precise value of each
observation is not known - only the class within which it falls, ie. the class limits.
In order to do any calculations relating to the distribution, it is necessary to use the
class midpoint to represent, or estimate, every observation in the class. In formulas, we use X
to refer to the class midpoint.
The procedure for estimating the mean in this case is as follows.
Step 1: Calculate the midpoint of each class.
Step 2: Multiply each class midpoint by its frequency, in order to estimate the class total.
Step 3: Add all these products, in order to estimate the grand total of all the observations in
the dataset.
Step 4: Divide the estimated grand total by the total frequency.
Formulas: $N = \sum F$ and $\bar{X} = \sum FX / N$
where X denotes the midpoint of each class and F denotes the class frequency.
Example 3.5.1: Estimate the mean for the data in Example 1.11.2 on roller bearings.

Diameters of a set of roller bearings

| Diameter (nearest mm) | Number of bearings (F) | Class midpoint (X) | FX |
|---|---|---|---|
| 10 - 19 | 7 | 14.5 | 101.5 |
| 20 - 29 | 16 | 24.5 | 392.0 |
| 30 - 39 | 30 | 34.5 | 1,035.0 |
| 40 - 49 | 14 | 44.5 | 623.0 |
| 50 - 59 | 8 | 54.5 | 436.0 |
| 60 - 69 | 3 | 64.5 | 193.5 |
| 70 - 79 | 2 | 74.5 | 149.0 |
| Total | 80 | | 2,930.0 |

$\bar{X} = \sum FX / N = 2,930.0/80 = 36.6$ mm
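The four-step estimate for a grouped distribution can be sketched as follows, using the roller-bearing classes. The names `classes`, `freqs` and `grouped_mean` are illustrative choices, and the midpoint is taken as the average of the stated class limits.

```python
# Stated class limits (nearest mm) and frequencies from the roller-bearing table
classes = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79)]
freqs = [7, 16, 30, 14, 8, 3, 2]


def grouped_mean(classes, freqs):
    """Estimate the mean of a grouped distribution from class midpoints."""
    midpoints = [(lo + hi) / 2 for lo, hi in classes]              # Step 1
    grand_total = sum(f * x for f, x in zip(freqs, midpoints))     # Steps 2-3
    return grand_total / sum(freqs)                                # Step 4


print(grouped_mean(classes, freqs))  # 36.625, i.e. 36.6 mm to one decimal place
```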
3.6 Properties of the arithmetic mean
(a) It is affected by outliers, or extreme observations, ie. values that are very different
from all the other values in the dataset.
Example 3.6.1: In Example 3.4.1, if the value 34 (which is an outlier) is removed from
the distribution, the mean becomes 76/10 = 7.6, which is much less than the original
mean of 10.
(b) It uses all the values in the dataset.
(c) It has a convenient mathematical formula.
(d) It presents a problem if the distribution is open-ended, ie. if it has a class such as
“Less than 20” or “Over 500”. One cannot determine the midpoint of such a class, so
one has to make an arbitrary judgement about what value to use as an estimate of the
class midpoint.
3.7 Definition of the median
The median of a dataset is the value of the variable that divides the observations into
two groups of equal size: the half whose values are less than the median and the half whose
values are greater than the median. It is, of course, equal to the second quartile and the 50th
percentile.
For a frequency distribution, the median class is the class that contains the median.
3.8 Calculating the median
For raw data, the procedure for calculating the median is as follows.
Step 1: Arrange the observations in order, from lowest to highest.
Step 2: If N is odd, find the middle observation; if N is even, find the middle two
observations and average them.
For frequency distributions, it is necessary to determine which class is the median
class. From there, one can determine or estimate the median.
For an ungrouped frequency distribution, the procedure for calculating the median is
as follows.
Step 1: Construct the cumulative frequency distribution.
Step 2: Determine the median class, which is the class containing the observation whose rank
is N/2 (rounded to a whole number, of course).
Step 3: Determine the median, which is the common value of all the observations in the
median class.
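A minimal sketch of the ungrouped procedure, applied to the distribution of Example 3.4.1, is given below. The names are hypothetical. One detail goes beyond the steps as written: when N is even, the sketch averages the two middle observations (as in the raw-data rule), since they may fall in different classes.

```python
from itertools import accumulate

# Ungrouped frequency distribution from Example 3.4.1: value X -> frequency F
freq = {6: 1, 7: 4, 8: 3, 9: 2, 34: 1}


def value_at_rank(values, cumulative, rank):
    """Return the value of the class that contains the observation with this rank."""
    for x, c in zip(values, cumulative):
        if rank <= c:
            return x


def freq_median(freq):
    values = sorted(freq)
    cumulative = list(accumulate(freq[x] for x in values))  # Step 1
    n = cumulative[-1]
    # Step 2: ranks of the middle observation(s); both are equal when N is odd
    lower = value_at_rank(values, cumulative, (n + 1) // 2)
    upper = value_at_rank(values, cumulative, (n + 2) // 2)
    return (lower + upper) / 2                              # Step 3


print(freq_median(freq))  # 8.0
```

Here N = 11, so the median is the 6th observation, which lies in the class X = 8.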
For a grouped frequency distribution, the procedure for estimating the median is as
follows.
Step 1: Construct the cumulative frequency distribution.
Step 2: Determine the median class, which is the class containing the observation whose rank
is N/2 (rounded to a whole number, of course).
Step 3: Estimate the median by using the following formula.
Formula:
Me = LL + W(N/2 - CB)/F
where
LL = lower real limit of median class
W = width of median class
N = total number of observations
CB = cumulated frequency below median class
F = frequency of median class
Example 3.8.1: Estimate the median for the data in Example 1.11.2 on roller bearings.

Diameters of a set of roller bearings

| Diameter (nearest mm) | Number of bearings (F) | Cumulative frequency (<) |
|---|---|---|
| 10 - 19 | 7 | 7 |
| 20 - 29 | 16 | 23 |
| 30 - 39 | 30 | 53 |
| 40 - 49 | 14 | 67 |
| 50 - 59 | 8 | 75 |
| 60 - 69 | 3 | 78 |
| 70 - 79 | 2 | 80 |
| Total | 80 | |

Median class is 30 - 39 (because it contains the 24th to 53rd observations and thus
includes the 40th and 41st)
LL = 29.5
W = 10
N/2 = 40
CB = 23
F = 30
Me = 29.5 + 10(40 - 23)/30
= 35.2 mm
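The interpolation formula Me = LL + W(N/2 - CB)/F can be sketched as a short function. The names below are illustrative, and the sketch assumes that all classes have the same width W and that the lower real limits are supplied directly (9.5, 19.5, ... for classes recorded to the nearest mm).

```python
# Lower real limits and frequencies for the roller-bearing classes
lower_real_limits = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5]
freqs = [7, 16, 30, 14, 8, 3, 2]


def grouped_median(lower_limits, width, freqs):
    """Estimate the median with Me = LL + W * (N/2 - CB) / F."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("the distribution has no observations")
    cb = 0  # cumulated frequency below the current class
    for ll, f in zip(lower_limits, freqs):
        if cb + f >= n / 2:  # first class whose cumulative frequency reaches N/2
            return ll + width * (n / 2 - cb) / f
        cb += f


print(round(grouped_median(lower_real_limits, 10, freqs), 1))  # 35.2
```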
3.9 Properties of the median
(a) It is not affected by the sizes of the observations, especially extreme values. For
example, the example in Section 3.6 shows that the mean is affected by an extreme
value but the median for that dataset is 8 (the sixth value) regardless of whether the
highest value in the dataset is 34, 10 or 500.
(b) It is readily used with frequency distributions having open-ended classes. Again, the
calculation of the median is not affected at all by the sizes of the values in the first or
last classes.
(c) Its formula is not very convenient mathematically.
(d) It can be used for data measured on an ordinal type of scale (where the mean cannot
be used). For example, the answers to an opinion-type question in a survey might be:

| Answer | No. of respondents | Cumulative frequency |
|---|---|---|
| Strongly agree | 14 | 14 |
| Agree | 38 | 52 |
| Undecided | 6 | 58 |
| Disagree | 18 | 76 |
| Strongly disagree | 11 | 87 |
| Total | 87 | |

The median value is the 44th, which is “Agree”, but one cannot calculate an
arithmetic mean for this data, of course.
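For ordinal data the median can be located by rank alone, without any arithmetic on the category labels. The sketch below does this for the survey table. The names are hypothetical, and for an even number of respondents it simply returns the category of the lower of the two middle observations, which is an assumption made here.

```python
# Ordered categories and their frequencies from the survey table
answers = ["Strongly agree", "Agree", "Undecided", "Disagree", "Strongly disagree"]
counts = [14, 38, 6, 18, 11]


def ordinal_median(categories, counts):
    """Return the category that contains the middle-ranked observation."""
    middle_rank = (sum(counts) + 1) // 2  # 44th of 87 observations
    running = 0
    for category, count in zip(categories, counts):
        running += count                  # cumulative frequency so far
        if middle_rank <= running:
            return category


print(ordinal_median(answers, counts))  # Agree
```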
3.10 Definition of the mode
The mode of a dataset is the observed value that occurs most frequently, ie. it is the
value of the variable that has the highest frequency.
For a frequency distribution, the modal class is the class that has the highest
frequency.
It is sometimes observed that a distribution has two modal classes, ie. two classes
share the highest frequency. Such a distribution is called bi-modal.
3.11 Calculating the mode
If given raw data, one must firstly construct an ungrouped frequency distribution.
Thereafter, the procedure for determining the mode is as follows.
Step 1: Determine the modal class.
Step 2: Determine the mode, which is the common value of all the observations in the modal
class.
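As an illustration of the raw-data case, the sketch below builds the ungrouped frequency distribution with `collections.Counter` and returns every value that shares the highest frequency, so a bi-modal dataset yields two values. The function name `modes` and the second test list are made up for this example.

```python
from collections import Counter


def modes(observations):
    """Return all values that have the highest frequency, in ascending order."""
    counts = Counter(observations)  # ungrouped frequency distribution
    top = max(counts.values())      # Step 1: highest frequency
    return sorted(x for x, f in counts.items() if f == top)  # Step 2


# Raw values behind the distribution of Example 3.4.1
data = [6, 7, 7, 7, 7, 8, 8, 8, 9, 9, 34]
print(modes(data))             # [7]
print(modes([1, 1, 2, 2, 3]))  # [1, 2], a bi-modal dataset
```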
For a grouped frequency distribution, the procedure for estimating the mode is as
follows.
Step 1: Determine the modal class.
Step 2: Estimate the mode by using the following formula.
Formula:
Mo = LL + Wd1/(d1 + d2)
where
LL = lower real limit of modal class
W = width of modal class
d1 = absolute difference between modal class frequency and
frequency of the immediately preceding class
d2 = absolute difference between modal class frequency and
frequency of the immediately following class
Example 3.11.1: Estimate the mode for the data in Example 1.11.2 on roller bearings.

Diameters of a set of roller bearings

| Diameter (nearest mm) | Number of bearings (F) |
|---|---|
| 10 - 19 | 7 |
| 20 - 29 | 16 |
| 30 - 39 | 30 |
| 40 - 49 | 14 |
| 50 - 59 | 8 |
| 60 - 69 | 3 |
| 70 - 79 | 2 |
| Total | 80 |

Modal class is 30 - 39
LL = 29.5
W = 10
d1 = 30 - 16 = 14
d2 = 30 - 14 = 16
Mo = 29.5 + 10*14/(14 + 16)
= 34.2 mm
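The formula Mo = LL + W d1/(d1 + d2) can be sketched in the same way. The names are illustrative. Two assumptions are made here that the chapter does not state: if two classes tie for the highest frequency the first is used, and a missing neighbouring class (when the modal class is first or last) is treated as having frequency 0.

```python
# Lower real limits and frequencies for the roller-bearing classes
lower_real_limits = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5]
freqs = [7, 16, 30, 14, 8, 3, 2]


def grouped_mode(lower_limits, width, freqs):
    """Estimate the mode with Mo = LL + W * d1 / (d1 + d2)."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])      # modal class (first if tied)
    before = freqs[i - 1] if i > 0 else 0                   # assumed 0 if no class before
    after = freqs[i + 1] if i < len(freqs) - 1 else 0       # assumed 0 if no class after
    d1 = abs(freqs[i] - before)
    d2 = abs(freqs[i] - after)
    return lower_limits[i] + width * d1 / (d1 + d2)


print(round(grouped_mode(lower_real_limits, 10, freqs), 1))  # 34.2
```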
3.12 Properties of the mode
(a) It is not affected by the sizes of the observations, especially extreme values.
(b) It is readily used with frequency distributions that have open-ended classes.
(c) It is not always “central”, eg. for a skewed distribution or a bi-modal distribution.
(d) It may be affected by the choice of class width when the distribution was constructed.
(e) Its formula is not very convenient mathematically.
(f) It can be used for data measured on a nominal type of scale (where the mean and
median cannot be used). For example, the modal value of the following distribution,
which relates to respondents’ colour preferences, is “green”:

| Colour | Number of respondents |
|---|---|
| Blue | 37 |
| Green | 44 |
| Purple | 14 |
| Red | 25 |
| Yellow | 11 |
| Total | 131 |

3.13 Comparison of measures of central tendency
If a distribution is symmetric and uni-modal, the three measures we have considered all have
the same value. If the distribution is skewed, they are generally not equal. In a positively
skewed distribution, they usually occur in the order: mode, median and mean; in a negatively
skewed distribution, they usually occur in the reverse order.
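The ordering described above can be checked on small datasets with the standard `statistics` module. In the sketch below, `skewed` holds the raw values behind Example 3.4.1 (positively skewed because of the outlier 34), while `symmetric` is a made-up symmetric, uni-modal list used only for contrast.

```python
import statistics

skewed = [6, 7, 7, 7, 7, 8, 8, 8, 9, 9, 34]  # raw values behind Example 3.4.1
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]      # hypothetical symmetric data

for name, data in [("skewed", skewed), ("symmetric", symmetric)]:
    mode = statistics.mode(data)
    median = statistics.median(data)
    mean = statistics.mean(data)
    print(f"{name}: mode={mode}, median={median}, mean={mean:.1f}")

# For the positively skewed data the order is mode < median < mean (7 < 8 < 10);
# for the symmetric data all three measures equal 3.
```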
Tutorial exercises
1. The following data shows the purchase price of each vehicle in a company’s fleet in
kina (X): 1,700; 2,000; 3,000; 3,000; 8,100; 1,500; 2,000; 2,800; 3,000; 3,700; 1,700;
2,000; 6,500; 2,900; 3,000; 4,200.
(a) Define a variable, Y = X/100 and calculate the mean, median and mode values for Y.
(b) Use the answers in (a) to calculate the mean, median and mode values for X.
2. The following table relates to readings of atmospheric pressure taken on 50 days by a
weather office.

| Pressure (millibars) | Number of days |
|---|---|
| 986 to 990 | 3 |
| 991 to 995 | 5 |
| 996 to 1,000 | 10 |
| 1,001 to 1,005 | 14 |
| 1,006 to 1,010 | 9 |
| 1,011 to 1,015 | 6 |
| 1,016 to 1,020 | 3 |

(a) If X denotes the pressure readings, estimate the mean, median and mode values for X.
(b) Define another variable, Y, by subtracting 1,000 from every X-value, ie.
Y = X - 1,000 for every observation. Draw up the frequency distribution for Y.
(c) Estimate the mean, median and mode values for Y.
(d) Compare the measures calculated in (c) with the corresponding measures calculated
(for X) in (a).