Measures of Dispersion - Lyndhurst School District

advertisement
Chapter 5:
Measures of Dispersion
• Dispersion or variation in statistics is the degree to
which the responses or values obtained from the
respondents differ from each other. (spread is
another interchangeable term)
• Dispersion helps us to measure the extent to which
the values or responses deviate from each other.
• The tools used to measure the amount of variability
or spread in the categories or values of a variable are
called the measures of dispersion or measures of
variability.
Example: Which one of these data sets has more
variability?
A: 22, 24, 25, 28, 30
B: 1, 5, 12, 20, 32
Data set B is clearly more spread out.
Measures of Dispersion for Numerical Variables
• Dispersion in the case of a numerical variable is
the degree to which the numbers in a distribution
vary from the typical, or average value.
• Three typical methods to measure dispersion for
numerical variables are:
– Range
– Interquartile range
– Standard deviation
Range
• The range is the difference between the
highest value and lowest value in a data set.
• It is the simplest way to obtain information
about dispersion.
• Daily temperature is an example of range as it
is usually given in the form of the highest and
lowest values.
Example: Compute the range for the following
set of data: 12, 3, 17, 21, 11, 18, 20, 19, 6, 15
Range: 21 – 3 = 18
• There are a few disadvantages to using range:
1) The range does not give any information
about the distribution of the observations
within its two endpoint values.
2) The range can be skewed by outliers, just as
the mean can be.
For example, add the value 200 to the above
data set. Now the range is 197. That is not
representative of the data.
Interquartile Range
• Interquartile range is the range of the middle
50% of a distribution after the bottom 25%
and top 25% have been eliminated.
• Interquartile range is more resistant to outliers
than the range.
• In order to calculate the IQR, we first have to
find the quartiles of the distribution, which
are the values that divide the distribution into
four groups of roughly the same size.
Steps to calculate quartiles
1) Put the data in numerical order and find the
median (this also happens to be the second
quartile)
2) Find the median of the values whose position in
the ordered list is to the left of the median. This
value is the first quartile.
3) Find the median of the values whose position in
the ordered list is to the right of the median. This
value is the third quartile.
Note: When the number of observations is odd,
don’t include the median value in the calculations in
steps 2 and 3.
• If the position of a quartile is between two numbers,
average those numbers as you would to find a median.
• Subtract the first quartile from the third quartile to get
the IQR.
• When dealing with frequency distributions, there are
useful formulas to help us figure out the position of the
quartiles.
• If the position is a whole number, then the quartile lies on
that exact one observation.
• If the position is a decimal, then the quartile is the average.
Example: Find the interquartile range.
5, 6, 5, 10, 11, 16, 15, 13
Order the data: 5, 5, 6, 10, 11, 13, 15, 16
The median is 10.5
Q1 is 5.5.
Q3 is 14
IQR=14-5.5=8.5
Example: Find the IQR for the following data.
Standard Deviation
• Standard deviation measures the variability in
a distribution using the squared deviations
from the mean.
• A deviation is the difference between the
score and the mean of the data set.
• Deviations can be positive, negative, or 0.
Example: Find the deviations and the sum of the
deviations for the following data set: 25, 26, 30.
• Since the sum of the deviations from the mean is
always zero, their average will always be zero. We
need a different way to measure the average
deviations from the mean.
• To counteract that problem, we will square each
deviation and then find the average of the squared
deviations.
The Variance
• The variance is the average of the squared
deviations from the mean.
• Standard deviation is the square root of the
variance.
• The variance (and as a result standard deviation)
is calculated differently for the population and the
sample.
Notation
Steps to find the variance.
1) Find the sum of your data set.
2) Find the mean.
3) Calculate the deviations.
4) Square the deviations.
5) Sum the squared deviations.
6) Divide the sum of the squared deviations by
n if working with the population or n-1 if
working with the sample.
To find the standard deviation, take the square
root of step 6.
Example: Find the variance and the standard
deviation for the sample data: 25, 26, 30.
You try. Find the variance and standard
deviation for the population data: 2, 4, 8, 14.
Short-cut Formula for Variance and SD
• There are two quantities in the short-cut
formula that we need to clarify.
• Note: For the population variance and standard
deviation, divide by n stead of n-1
• Let’s retry the previous example of the sample
of 25, 26, and 30. We know the variance
should come out to be 7 and the sd 2.65.
Now try the example with 2, 4, 8, and 14.
Remember we are dealing with a population!
Download