Chapter 3: Summary measures

MR. MARK ANTHONY GARCIA, M.S.
MATHEMATICS DEPARTMENT
DE LA SALLE UNIVERSITY
MEASURES OF CENTRAL TENDENCY
A measure of central tendency describes the center of a data set. These measures are the mean, the median and the mode.
Mean
The mean of a numeric variable is
calculated by adding the values of all
observations in a data set and then
dividing that sum by the number of
observations in the set. This provides the
average value of all the data.
Mean: Sample and Population
Mean = sum of all observations divided
by the number of observations
Example: Mean
Mount Rival hosts a soccer tournament
each year. This season, in all their 10
games, the lead scorer for the home
team scored 7, 5, 0, 7, 8, 5, 5, 4, 1 and
5 goals. What was the mean score?
Example: Mean
Since we included all 10 games of Mount Rival, we compute the population mean. It is given by
$\mu = \frac{\sum x}{N} = \frac{7 + 5 + 0 + 7 + 8 + 5 + 5 + 4 + 1 + 5}{10} = \frac{47}{10} = 4.7$
Example: Mean
A marathon race was completed by more
than 100 participants. A sample of 5
participants were taken and their race
times were recorded (in hours) as follows:
2.7, 8.3, 3.5, 5.1 and 4.9. What is the
mean race time of the 5 participants?
Example: Mean
The mean race time of the sample of 5 participants is 4.9 hours.
$\bar{x} = \frac{\sum x}{n} = \frac{2.7 + 8.3 + 3.5 + 5.1 + 4.9}{5} = \frac{24.5}{5} = 4.9$
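The two worked examples above can be checked with a few lines of Python. This is only a minimal sketch: the function name `arithmetic_mean` and the variable names are illustrative choices, not part of the original slides.

```python
def arithmetic_mean(values):
    """Sum of all observations divided by the number of observations."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

# Goals in all 10 games: the whole population, so this is the population mean.
goals = [7, 5, 0, 7, 8, 5, 5, 4, 1, 5]
# Race times (hours) of 5 of the 100+ runners: a sample, so this is the sample mean.
race_times = [2.7, 8.3, 3.5, 5.1, 4.9]

population_mean = arithmetic_mean(goals)
sample_mean = arithmetic_mean(race_times)

print(f"population mean = {population_mean:.1f}")
print(f"sample mean     = {sample_mean:.1f}")
```

The arithmetic is the same for a population and a sample; only the notation ($\mu$ and $N$ versus $\bar{x}$ and $n$) differs.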
Interpreting the Mean
Consider the mean first quiz scores for two
different freshman sections in Science:
Section X: 72.7
Section Y: 69.8
Which of the two sections got a higher
score?
Interpreting the Mean
The following are the times (in minutes)
that new employees need to learn a job
from two companies and their means:
Company A: 25, 19, 30, 27, 22 (mean = 24.6)
Company B: 24, 23, 27, 29, 24 (mean = 25.4)
Which company has the faster-learning new employees?
Remark: Mean
The mean is the most commonly used
measure of central tendency and is the
best measure when comparing two or
more sets of data. However, it is affected
by extreme values.
Median
The median of a set of observations
arranged in an increasing or decreasing
order of magnitude is the middle value
when the number of observations is odd or
the arithmetic mean of the two middle
values when the number of observations is
even.
Example: Median
On 5 term tests in sociology, a student has made grades of 82, 93, 86, 92, and 79. Find the median of these test grades.
No. of observations: n = 5 (Odd)
Ascending Order: 79, 82, 86, 92, 93
Median: 86
Example: Median
The nicotine contents for a random sample
of 6 cigarettes of a certain brand are found
to be 2.3, 2.7, 2.5, 2.9, 3.1, and 1.9
milligrams. Find the median.
Example: Median
No. of observations: n = 6 (Even)
Ascending Order: 1.9, 2.3, 2.5, 2.7, 2.9, 3.1
$\text{Median} = \frac{2.5 + 2.7}{2} = 2.6$
Position of the Middle Values
• If the number of observations (n) is odd, then the middle value is the $\left(\frac{n+1}{2}\right)$th observation.
• If the number of observations (n) is even, then the two middle values are the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2} + 1\right)$th observations.
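The position rules for the middle values translate directly into code. The sketch below is illustrative only; the function name `median` and the variable names are assumptions made for this example. The slides count positions from 1, while Python lists are indexed from 0, hence the `- 1` offsets.

```python
def median(values):
    """Middle value (odd n) or mean of the two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th observation, counted from 1.
        return ordered[(n + 1) // 2 - 1]
    # Even n: the (n / 2)th and (n / 2 + 1)th observations, counted from 1.
    lower_middle = ordered[n // 2 - 1]
    upper_middle = ordered[n // 2]
    return (lower_middle + upper_middle) / 2

grades = [82, 93, 86, 92, 79]                 # n = 5 (odd)
nicotine = [2.3, 2.7, 2.5, 2.9, 3.1, 1.9]     # n = 6 (even)

print(median(grades))
print(round(median(nicotine), 1))
```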
Interpreting the Median
Suppose that the median of the final exam
score of a particular class in Mathematics
is 62 points. What is the implication of this
value?
This means that 50% of the class scored
62 or more in the Mathematics final exam.
Remark: Median
The median is not affected by extreme
values. It is the best measure of center in
terms of position in an arranged sequence.
Often, it is used for curving or adjusting
values to fit in a normal distribution.
Mode
The mode of a set of observations is that
value which occurs most often or with the
greatest frequency.
Example: Mode
The following are the IQ scores of 10
teenagers:
89, 82, 84, 82, 87, 95, 79, 84, 82, 87
The mode of the data set is 82 with the
highest frequency 3.
Remark: Mode
• For some sets of data, there may be several values occurring with the greatest frequency, in which case we have more than one mode.
• The mode does not always exist. This is certainly true if each distinct observation occurs with the same frequency.
Example: Mode
Consider the following sets of data:
Data Set 1: 10, 20, 20, 30, 40, 40, 50, 60
Data Set 2: 7, 3, 6, 4, 6, 4, 3, 7, 4, 6, 3
For data set 1, there are two modes, 20
and 40. For data set 2, there are three
modes, 3, 4 and 6.
Example: Mode
Consider the following sets of data:
Data Set 1: 90, 97, 98, 97, 90, 98
Data Set 2: 89, 88, 92, 95, 98, 97, 91, 94
The two sets of data have no modes.
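The mode examples above can be reproduced with a short frequency count. This is a sketch under the convention used in these slides, where a data set has no mode when every distinct value occurs equally often; the function name `modes` is an illustrative choice, and other references may define the mode differently in that case.

```python
from collections import Counter

def modes(values):
    """Return the sorted list of modes, or an empty list if no mode exists."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Slide convention: if all distinct values share one frequency, there is no mode.
    if all(count == highest for count in counts.values()):
        return []
    return sorted(value for value, count in counts.items() if count == highest)

iq_scores = [89, 82, 84, 82, 87, 95, 79, 84, 82, 87]
two_modes = [10, 20, 20, 30, 40, 40, 50, 60]
three_modes = [7, 3, 6, 4, 6, 4, 3, 7, 4, 6, 3]
no_mode_a = [90, 97, 98, 97, 90, 98]
no_mode_b = [89, 88, 92, 95, 98, 97, 91, 94]

for data in (iq_scores, two_modes, three_modes, no_mode_a, no_mode_b):
    print(modes(data))
```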
Remark: Mode
The mode is not the best measure of
center since not all data sets can possess
this value. However, the mode is the only
measure that may also be used for
qualitative data.
Comparing two sets of data
Consider the following sets of data:
1st set of data: 9, 10 and 11
Mean = 10
2nd set of data: 1, 10 and 19
Mean = 10
What is the difference between the two
sets of data?
Measures of Variation
Any measure describing how spread out the given observations are relative to the mean is a measure of variation. These measures are the range, variance and standard deviation.
Measures of Variation: Situation
Consider the following measurements, in liters, for two samples of orange juice bottled by companies A and B. From which company would you buy, based on the following values?
Sample A: 0.97 1.00 0.94 1.03 1.11
Sample B: 1.06 1.01 0.88 0.91 1.14
Range
• The range is the difference between the highest and lowest values of the data set.
• However, it is not a good measure of variability, since it uses only the two extreme values.
• In the previous table, the ranges of the orange juice bottle contents for companies A and B are 0.17 and 0.26 liters, respectively.
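As a quick check of the two ranges, here is a minimal Python sketch. The function name `data_range` is illustrative, and the results are rounded to two decimals because the subtraction is done in floating point.

```python
def data_range(values):
    """Difference between the highest and the lowest value."""
    return max(values) - min(values)

# Bottle contents in liters, from the orange juice example.
sample_a = [0.97, 1.00, 0.94, 1.03, 1.11]
sample_b = [1.06, 1.01, 0.88, 0.91, 1.14]

range_a = round(data_range(sample_a), 2)
range_b = round(data_range(sample_b), 2)

print(range_a, range_b)
```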
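Variance and Standard Deviation
Population variance (the mean squared deviation from the population mean):
$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
Variance and Standard Deviation
Sample variance:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Variance and Standard Deviation
The standard deviation is the positive square root of the variance: $\sigma = \sqrt{\sigma^2}$ for a population and $s = \sqrt{s^2}$ for a sample.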
Example: Variance and Standard Deviation
Sample A: 0.97 1.00 0.94 1.03 1.11
Sample B: 1.06 1.01 0.88 0.91 1.14
Company A: $s^2 = 0.0043$ and $s = 0.065$
Company B: $s^2 = 0.0115$ and $s = 0.107$
Example: Variance and Standard Deviation
Based on the variance and standard deviation, company A bottles orange juice more consistently than company B because it has the lower variance and the lower standard deviation.
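The values quoted for the two companies can be reproduced with the sample formulas, which divide by n - 1. The sketch below writes the sample variance out by hand and compares it with `statistics.variance` from Python's standard library; the helper name `sample_variance` is an illustrative choice. Before rounding, the variances come out as about 0.00425 and 0.01145, which the slides report as 0.0043 and 0.0115.

```python
import math
from statistics import stdev, variance

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

# Bottle contents in liters, from the orange juice example.
sample_a = [0.97, 1.00, 0.94, 1.03, 1.11]
sample_b = [1.06, 1.01, 0.88, 0.91, 1.14]

for name, sample in (("A", sample_a), ("B", sample_b)):
    s2 = sample_variance(sample)
    s = math.sqrt(s2)  # the standard deviation is the square root of the variance
    print(f"Company {name}: s^2 = {s2:.5f}, s = {s:.3f}")
    # The standard library should agree with the hand-written formula.
    assert math.isclose(s2, variance(sample), rel_tol=1e-9)
    assert math.isclose(s, stdev(sample), rel_tol=1e-9)
```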
Interpreting Variance and
Standard Deviation
Given the variances and standard deviations of two or more sets of data, we say that a set of data is the most consistent, has the least variability, or is the least dispersed when it has the lowest variance or standard deviation.
Measures of Position
A measure that describes or locates the position of a certain non-central piece of data relative to the entire set of data is a measure of position, or measure of relative standing. Some of these measures are the percentiles and quartiles.
Percentiles
Percentiles divide the set of data into 100 equal parts. The ith percentile of the data set is usually denoted by $P_i$.
Position of 𝑃𝑖
The formula for the position of $P_i$ is given by
$\frac{i}{100}(n + 1)$
where $n$ is the number of observations in the set of data.
Example: Percentiles
The following are the scores of 11 students
in an exam:
65, 78, 85, 98, 54, 62, 72, 76, 83, 70, 69
Compute the following:
1. $P_{25}$
2. $P_{60}$
Computing Percentiles
1. Arrange the observations in the set of data in increasing order.
2. Compute the position of $P_i$.
3. If the position is a whole number, then identify the number in that position from the set of data. Otherwise, we use interpolation.
Example: Percentiles
Arrangement of the set of data in
increasing order:
54, 62, 65, 69, 70, 72, 76, 78, 83, 85, 98
Example: Percentiles
For $P_{25}$,
Position of $P_{25} = \frac{25}{100}(11 + 1) = 3$
This means that $P_{25}$ is the 3rd observation in the arranged set of data. Hence, $P_{25} = 65$.
Interpreting Percentiles
Since 𝑃25 = 65, we can say that 25% of
the 11 students have scores lower than 65.
Equivalently, we say that 75% of the 11
students have scores higher than 65.
Example: Percentiles
For $P_{60}$,
Position of $P_{60} = \frac{60}{100}(11 + 1) = 7.2$
Since the position is not a whole number, we use interpolation.
Example: Percentiles
Thus,
$P_{60} = \text{7th} + 0.2\,(\text{8th} - \text{7th})$
$P_{60} = 76 + 0.2(78 - 76) = 76.4$
Hence, 60% of the 11 students scored 76.4 or less. Equivalently, 40% of the 11 students scored more than 76.4.
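The three steps for computing a percentile (sort, find the position, interpolate if needed) can be sketched as follows. The function name `percentile` is illustrative. The slides do not say what to do when the position falls outside 1 to n, so raising an error in that case is an assumption made here.

```python
import math

def percentile(values, i):
    """ith percentile using position (i / 100)(n + 1) and linear interpolation."""
    ordered = sorted(values)                 # step 1: increasing order
    n = len(ordered)
    position = i * (n + 1) / 100             # step 2: position, counted from 1
    if position < 1 or position > n:
        # Assumption: the slides do not cover positions outside 1..n.
        raise ValueError("position falls outside the data set")
    whole = math.floor(position)
    fraction = position - whole
    if fraction == 0:                        # step 3a: whole-number position
        return ordered[whole - 1]
    below = ordered[whole - 1]               # step 3b: interpolate between neighbors
    above = ordered[whole]
    return below + fraction * (above - below)

scores = [65, 78, 85, 98, 54, 62, 72, 76, 83, 70, 69]

print(percentile(scores, 25))
print(round(percentile(scores, 60), 1))
```

Other textbooks and software packages use slightly different position formulas, so their percentiles may not match these values exactly.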
Percentiles
A percentile is best described as a comparison score. It's a common term in all kinds of testing of data, but many will be most familiar with percentiles as they relate to standardized testing in schools.
Quartiles
Quartiles divide the set of data into 4 equal parts. The ith quartile of the data set is usually denoted by $Q_i$.
Position of 𝑄𝑖
The formula for the position of $Q_i$ is given by
$\frac{i}{4}(n + 1)$
where $n$ is the number of observations in the set of data.
Computing Quartiles
1. Arrange the observations in the set of data in increasing order.
2. Compute the position of $Q_i$.
3. If the position is a whole number, then identify the number in that position from the set of data. Otherwise, we use interpolation.
Example: Quartiles
Again, consider the following scores of 11
students in an exam
54, 62, 65, 69, 70, 72, 76, 78, 83, 85, 98
Compute for 𝑄1 and 𝑄3 .
Example: Quartiles
Position of $Q_1 = \frac{1}{4}(11 + 1) = 3$
Position of $Q_3 = \frac{3}{4}(11 + 1) = 9$
$Q_1 = \text{3rd observation} = 65$
$Q_3 = \text{9th observation} = 83$
Example: Quartiles
We can conclude that 25% of the 11 students scored at most 65, since $Q_1$ is the 3rd observation, 65. Furthermore, 75% of the 11 students scored at most 83, since $Q_3$ is the 9th observation, 83.
Interquartile Range
• The interquartile range (IQR) is defined to be the range of the middle 50% of the data set.
• The formula for IQR is given by
$IQR = Q_3 - Q_1$
Example: IQR
Again, consider the following scores of 11
students in an exam
54, 62, 65, 69, 70, 72, 76, 78, 83, 85, 98
Compute the IQR.
Example: IQR
Since
𝑄1 = 65 and 𝑄3 = 83,
we have
𝐼𝑄𝑅 = 𝑄3 − 𝑄1 = 83 − 65 = 18
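The quartiles and the IQR for the exam scores can also be obtained from Python's standard library. This sketch assumes Python 3.8 or later, where `statistics.quantiles` is available; its `"exclusive"` method places the ith quartile at position i(n + 1)/4 and interpolates linearly, which agrees with the position formula used in these slides for this data set.

```python
from statistics import quantiles

scores = [65, 78, 85, 98, 54, 62, 72, 76, 83, 70, 69]

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = quantiles(scores, n=4, method="exclusive")
iqr = q3 - q1

print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr}")
```

Here `q2` is the second quartile, which is the median of the scores.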
BOX PLOT
A box plot or boxplot is a convenient way
of graphically depicting groups of
numerical data through their quartiles.
BOX PLOT
• If the median line is at the center of the box, then the data set is symmetric.
• If the median line is closer to the first quartile, then the data set is skewed to the right.
• If the median line is closer to the third quartile, then the data set is skewed to the left.
Measure of Skewness
A set of observations is symmetrically
distributed if its graphical representation
(histogram, bar chart) is symmetric with
respect to a vertical axis passing through
the mean. For a symmetrically distributed
population or sample, the mean, median
and mode have the same value. Half of all
measurements are greater than the mean,
while half are less than the mean.
Histogram: Symmetric
Measure of Skewness
A set of observations that is not
symmetrically distributed is said to be
skewed. It is positively skewed if a greater
proportion of the observations are less
than or equal to the mean; this indicates
that the mean is larger than the median.
The histogram of a positively skewed
distribution will generally have a long right
tail; thus, this distribution is also known as
being skewed to the right.
Histogram: Skewed to the Right
Measure of Skewness
On the other hand, a negatively skewed
distribution has more observations that are
greater than or equal to the mean. Such a
distribution has a mean that is less than
the median. The histogram of a negatively
skewed distribution will generally have a
long left tail; thus, the phrase skewed to
the left is applied here.
Histogram: Skewed to the Left
Measure of Skewness
The formula for the coefficient of skewness is given by
$SK = \frac{3(\bar{x} - \text{median})}{s}$
Interpreting Skewness
• If SK = 0, the data set has a symmetric distribution.
• If SK > 0, the data set has a positively skewed distribution. This means that more than 50% of the observations in the data set are less than the mean.
• If SK < 0, the data set has a negatively skewed distribution. This means that more than 50% of the observations in the data set are greater than or equal to the mean.
Example: Skewness
Suppose that $\bar{x} = 64$, median = 62 and $s = 1.5$. Then the measure of skewness is given by
$SK = \frac{3(\bar{x} - \text{median})}{s} = \frac{3(64 - 62)}{1.5} = 4$
Example: Skewness
Since the measure of skewness or coefficient of skewness is SK = 4, we say that the data set is positively skewed or skewed to the right, which means that more than 50% of the data set is less than the mean $\bar{x} = 64$.
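The coefficient of skewness and its interpretation can be written as a small helper. This is a sketch only; the function names `skewness_coefficient` and `describe_skewness` are illustrative and not from the slides. The formula itself is the one given above, which is often called Pearson's second coefficient of skewness.

```python
def skewness_coefficient(mean, median, std_dev):
    """SK = 3 * (mean - median) / s."""
    if std_dev <= 0:
        raise ValueError("the standard deviation must be positive")
    return 3 * (mean - median) / std_dev

def describe_skewness(sk):
    """Translate the sign of SK into the wording used in the slides."""
    if sk > 0:
        return "positively skewed (skewed to the right)"
    if sk < 0:
        return "negatively skewed (skewed to the left)"
    return "symmetric"

# Values from the skewness example: mean 64, median 62, s = 1.5.
sk = skewness_coefficient(64, 62, 1.5)

print(sk, describe_skewness(sk))
```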