Chapter 4: Variability

COURSE: JUST 3900
INTRODUCTORY STATISTICS
FOR CRIMINAL JUSTICE
Instructor:
Dr. John J. Kerbs, Associate Professor
Joint Ph.D. in Social Work and Sociology
© 2013 - - DO NOT CITE, QUOTE, REPRODUCE, OR DISSEMINATE WITHOUT
WRITTEN PERMISSION FROM THE AUTHOR:
Dr. John J. Kerbs can be emailed for permission at kerbsj@ecu.edu
Variability


• The goal for variability is to obtain a measure of how spread out the scores are in a distribution.
  • Variability indicates whether scores are clustered together or spread out over a large distance.
  • Variability is usually defined in terms of distance.
    • Example: the distance between two scores, or the distance between a score and the mean (average).
• A measure of variability usually accompanies a measure of central tendency as basic descriptive statistics for a set of scores.
Central Tendency and Variability
• Central tendency describes the central point of the distribution, and variability describes how the scores are scattered around that central point.
• Together, central tendency and variability are the two primary values used to describe a distribution of scores.

Variability

• Variability serves both as a descriptive measure and as an important component of most inferential statistics.
  • As a descriptive statistic, variability measures the degree to which the scores are spread out or clustered together in a distribution.
  • In the context of inferential statistics, variability provides a measure of how accurately any individual score or sample represents the entire population.

Variability (Continued)
• When the population variability is small, all of the scores are clustered close together, and any individual score or sample will necessarily provide a good representation of the entire set.
• On the other hand, when variability is large and scores are widely spread, it is easy for one or two extreme scores to give a distorted picture of the general population.

Measuring Variability

Variability can be measured with:
1. The range
2. The variance
3. The standard deviation

In all cases, variability is determined by measuring distance.
The Range

• The range is the total distance covered by the distribution, from the highest score to the lowest score, measured from the upper real limit (URL) of the highest score to the lower real limit (LRL) of the lowest score.
  • Range = URL for Xmax - LRL for Xmin
  • If scores span the values 1 to 5, then the range is 5.5 - 0.5 = 5 points.
• When scores are measured in whole numbers, the range is also a measure of the number of discrete measurement categories:
  • 1, 2, 3, 4, 5 = 5-point range
• Many computer programs use an alternative definition, calculated as the highest value minus the lowest value: 5 - 1 = 4
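The three range definitions can be compared side by side. The following is a minimal Python sketch, not a textbook implementation: the function names are illustrative, and it assumes whole-number scores, so each real limit lies half a unit (0.5) beyond the score.

```python
def range_real_limits(scores, unit=1):
    """Range = URL of Xmax - LRL of Xmin.

    Assumes scores are measured to the nearest `unit`, so each real
    limit sits half a unit beyond the extreme score.
    """
    half = unit / 2
    return (max(scores) + half) - (min(scores) - half)


def range_category_count(scores):
    """Number of whole-number measurement categories: Xmax - Xmin + 1."""
    return max(scores) - min(scores) + 1


def range_simple(scores):
    """Alternative definition used by many computer programs: Xmax - Xmin."""
    return max(scores) - min(scores)


scores = [1, 2, 3, 4, 5]
print(range_real_limits(scores))     # 5.0
print(range_category_count(scores))  # 5
print(range_simple(scores))          # 4
```

For whole-number scores the first two definitions always agree; the third is always one point smaller.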
The Range (Continued)

• Problems with using the range as a measure of variability in a distribution:
  • The range does not consider all of the scores in a distribution (it depends only on the two most extreme scores), which makes it less than optimal for describing an entire distribution.
  • Thus, the range is considered a crude and unreliable measure of variability.
• Although the textbook does not care which definition you use to measure the range, you need to know how to calculate all three, using 1) the range formula with real limits, 2) the whole-number approach for discrete categories, and 3) the computer approach (see the prior slide for details).
The Standard Deviation
• Standard deviation measures the standard (average) distance between a score and the mean.
• The calculation of standard deviation can be summarized as a four-step process:

The Standard Deviation (Continued)
1. Compute the deviation (distance from the mean) for each score, a.k.a. the deviation score: (X - μ)
2. Square each deviation: (X - μ)²
3. Compute the mean of the squared deviations. For a population, this involves summing the squared deviations (sum of squares, SS) and then dividing by N. The resulting value is called the variance (σ²), or mean square, and measures the average squared distance from the mean.
   For samples, variance (s²) is computed by dividing the sum of the squared deviations (SS) by n - 1, rather than N. The value n - 1 is known as the degrees of freedom (df) and is used so that the sample variance will provide an unbiased estimate of the population variance.
4. Finally, take the square root of the variance to obtain the standard deviation.
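The four steps translate almost line for line into code. The sketch below is one possible Python rendering; the function name `standard_deviation` and the `is_sample` flag are illustrative choices, and the scores are a small made-up set.

```python
import math


def standard_deviation(scores, is_sample=False):
    """Four-step standard deviation.

    is_sample=False divides SS by N (population);
    is_sample=True divides SS by n - 1 (sample, df = n - 1).
    """
    n = len(scores)
    mean = sum(scores) / n
    deviations = [x - mean for x in scores]   # Step 1: deviation scores
    squared = [d ** 2 for d in deviations]    # Step 2: squared deviations
    ss = sum(squared)                         # sum of squares (SS)
    denominator = n - 1 if is_sample else n
    variance = ss / denominator               # Step 3: mean squared deviation
    return math.sqrt(variance)                # Step 4: square root


scores = [1, 0, 6, 1]  # illustrative scores: mean = 2, SS = 22
print(round(standard_deviation(scores), 2))                  # 2.35
print(round(standard_deviation(scores, is_sample=True), 2))  # 2.71
```

Treating the same four scores as a sample gives a larger value because SS is divided by 3 instead of 4.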
Calculating The Standard Deviation
Standard Deviation for Population

Two ways to calculate SS: 1) using the definitional formula, and 2) using the computational formula.
• Use the definitional formula for SS when the mean (μ) is a whole number.
• Example for scores 1, 0, 6, 1 (μ = 8/4 = 2):

Score (X)    Deviation (X - μ)    Squared Deviation (X - μ)²
1            (1 - 2) = -1         (-1)² = 1
0            (0 - 2) = -2         (-2)² = 4
6            (6 - 2) = +4         (+4)² = 16
1            (1 - 2) = -1         (-1)² = 1

• SS = Σ(X - μ)² = 1 + 4 + 16 + 1 = 22
• Variance (σ²) = SS/N = 22/4 = 5.50
• Standard Deviation (σ) = √(SS/N) = √5.50 ≈ 2.35
Standard Deviation for Population

Two ways to calculate SS: 1) using the definitional formula, and 2) using the computational formula.
• Use the computational formula for SS when the mean (μ) is not a whole number.
• Example for scores 3, 1, 5, 1 (μ = 10/4 = 2.5):
  SS = ΣX² - [(ΣX)² / N]
  SS = (9 + 1 + 25 + 1) - [(10)² / 4]
  SS = 36 - 25 = 11
• Variance (σ²) = SS/N = 11/4 = 2.75
• Standard Deviation (σ) = √(SS/N) = √2.75 ≈ 1.66
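The definitional and computational formulas always yield the same SS; the computational form simply avoids working with fractional deviations by hand. A short Python sketch (the function names are illustrative) checks this for the scores 3, 1, 5, 1:

```python
import math


def ss_definitional(scores):
    """SS = sum of (X - mean)^2."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)


def ss_computational(scores):
    """SS = sum of X^2 minus (sum of X)^2 / N."""
    n = len(scores)
    return sum(x ** 2 for x in scores) - sum(scores) ** 2 / n


scores = [3, 1, 5, 1]
ss = ss_computational(scores)
variance = ss / len(scores)   # population variance: SS / N
sd = math.sqrt(variance)

print(ss_definitional(scores), ss)  # 11.0 11.0
print(variance)                     # 2.75
print(round(sd, 2))                 # 1.66
```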
Standard Deviation for Sample




• Compared with the calculations for populations, the variance (and thus the standard deviation) for samples is computed by dividing the sum of the squared deviations (SS) by n - 1, rather than N.
• The value n - 1 is known as the degrees of freedom (df), which determine the number of scores in the sample that are independent and free to vary.
• Degrees of freedom are used so that the sample variance will provide an unbiased estimate of the population variance.
  • Without correcting the denominator in the variance calculation with the proper degrees of freedom, the population variance will be underestimated (i.e., the estimate will be biased).
• All of the prior steps for calculating SS still apply, for both the definitional and computational formulas, except that you replace μ with M and N with n.
Standard Deviation for Sample

• Sample Variance (a.k.a. Estimated Population Variance): s² = SS/(n - 1)
  • In the context of inferential statistics, the variance that exists in a set of sample data is often classified as error variance to indicate that the variance represents unexplained or uncontrolled differences between scores.
• Sample Standard Deviation (a.k.a. Estimated Population Standard Deviation): s = √s² = √(SS/(n - 1))
• As "estimates," there is always the potential for bias.
  • A sample statistic is biased if the average value of the statistic either underestimates or overestimates the corresponding population parameter.
  • A sample statistic is unbiased if the average value of the statistic equals the population parameter.
  • NOTE: The average value of the statistic is obtained from all the possible samples for a specific sample size, n.
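The definitions of biased and unbiased statistics can be checked by brute force on a tiny population. The sketch below is an illustration under stated assumptions: the population of six scores is hypothetical, samples of size n = 2 are drawn with replacement, and exact fractions are used so the averages are not blurred by rounding.

```python
from fractions import Fraction
from itertools import product


def ss(scores):
    """Sum of squared deviations from the mean, computed exactly."""
    mean = Fraction(sum(scores), len(scores))
    return sum((x - mean) ** 2 for x in scores)


population = [0, 0, 3, 3, 9, 9]                   # hypothetical population
pop_variance = ss(population) / len(population)   # SS / N

n = 2
samples = list(product(population, repeat=n))     # all 36 possible samples

# Average each version of the sample variance over every possible sample.
avg_divide_by_n = sum(ss(s) / n for s in samples) / len(samples)
avg_divide_by_df = sum(ss(s) / (n - 1) for s in samples) / len(samples)

print(pop_variance)      # 14
print(avg_divide_by_n)   # 7  (underestimates: biased)
print(avg_divide_by_df)  # 14 (matches the population variance: unbiased)
```

Dividing by n underestimates the population variance on average, while dividing by n - 1 recovers it exactly in this setup.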
Standard Deviation for Sample



Consider the following sample (n = 7, M = 5):

Score (X)    Deviation (X - M)    Squared Deviation (X - M)²
1            (1 - 5) = -4         (-4)² = 16
6            (6 - 5) = +1         (+1)² = 1
4            (4 - 5) = -1         (-1)² = 1
3            (3 - 5) = -2         (-2)² = 4
8            (8 - 5) = +3         (+3)² = 9
7            (7 - 5) = +2         (+2)² = 4
6            (6 - 5) = +1         (+1)² = 1

The sample mean (M) = ΣX/n = 35/7 = 5
Because the mean is a whole number, you can use the definitional formula for the sum of squares (SS).
Standard Deviation for Sample




SS (sample) = Σ(X - M)² = 16 + 1 + 1 + 4 + 9 + 4 + 1 = 36
Variance (s²) = SS/(n - 1) = 36/(7 - 1) = 36/6 = 6
Standard Deviation (s) = √s² = √6 ≈ 2.45
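These hand calculations can be cross-checked in software. As one option, Python's standard-library `statistics` module provides `variance` and `stdev` (which divide SS by n - 1) alongside `pvariance` and `pstdev` (which divide by N); the sketch below applies both pairs to the sample above to show how much the choice of denominator matters.

```python
import statistics

sample = [1, 6, 4, 3, 8, 7, 6]

# Sample formulas: SS / (n - 1)
s_squared = statistics.variance(sample)
s = statistics.stdev(sample)

# Population formulas applied to the same scores: SS / N
sigma_squared = statistics.pvariance(sample)
sigma = statistics.pstdev(sample)

print(f"{s_squared:.2f} {s:.2f}")          # 6.00 2.45
print(f"{sigma_squared:.2f} {sigma:.2f}")  # 5.14 2.27
```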
Standard Deviation for Sample
with Computational Formula




Consider the following sample (n = 7, M = 5):

Score (X)    Score Squared (X²)
1            1
6            36
4            16
3            9
8            64
7            49
6            36
ΣX = 35      ΣX² = 211

(ΣX)² = 35² = 1225

SS (sample) = ΣX² - [(ΣX)²/n] = 211 - (1225/7) = 211 - 175 = 36
NOTE: Do NOT confuse the denominator in the SS formula (n) with the denominator of the sample variance (n - 1).
Variance (s²) = SS/(n - 1) = 36/(7 - 1) = 36/6 = 6
Standard Deviation (s) = √s² = √6 ≈ 2.45
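The two different denominators are easy to mix up, so it can help to see them on separate lines. The following Python sketch (the function name is illustrative) follows the computational formula for a sample, dividing by n inside SS and by n - 1 for the variance.

```python
import math


def sample_stats_computational(scores):
    """Return (SS, s^2, s) using the computational formula for SS."""
    n = len(scores)
    sum_x = sum(scores)                      # sum of X
    sum_x_sq = sum(x ** 2 for x in scores)   # sum of X^2
    ss = sum_x_sq - sum_x ** 2 / n           # SS formula divides by n
    variance = ss / (n - 1)                  # variance divides by df = n - 1
    return ss, variance, math.sqrt(variance)


ss, variance, sd = sample_stats_computational([1, 6, 4, 3, 8, 7, 6])
print(ss, variance, round(sd, 2))  # 36.0 6.0 2.45
```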
Properties of Standard Deviation

• If a constant is added to every score in a distribution, the standard deviation will not change.
  • If you visualize the scores in a frequency distribution histogram, then adding a constant will move each score so that the entire distribution is shifted to a new location.
  • The center of the distribution (the mean) changes, but the standard deviation remains the same.
• If each score is multiplied by a constant, the standard deviation will be multiplied by the same constant.
  • Multiplying by a constant multiplies the distance between scores, and because the standard deviation is a measure of distance, it is also multiplied.
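Both properties can be verified numerically. The sketch below reuses the seven-score sample from earlier in the chapter; the constants 10 and 3 are arbitrary choices for illustration.

```python
import math
import statistics

scores = [1, 6, 4, 3, 8, 7, 6]
shifted = [x + 10 for x in scores]  # add a constant to every score
scaled = [x * 3 for x in scores]    # multiply every score by a constant

sd = statistics.stdev(scores)
sd_shifted = statistics.stdev(shifted)
sd_scaled = statistics.stdev(scaled)

print(f"{sd:.2f} {sd_shifted:.2f} {sd_scaled:.2f}")  # 2.45 2.45 7.35

# Adding a constant moves the mean but leaves the standard deviation alone.
print(statistics.mean(scores), statistics.mean(shifted))  # 5 15
print(math.isclose(sd_scaled, 3 * sd))                    # True
```

With a negative multiplier the standard deviation is multiplied by the constant's absolute value, since a distance cannot be negative.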
The Mean and Standard Deviation
as Descriptive Statistics
• If you are given numerical values for the mean and the standard deviation, you should be able to construct a visual image (or a sketch) of the distribution of scores.
• As a general rule, about 70% of the scores will be within one standard deviation of the mean, and about 95% of the scores will be within two standard deviations of the mean.
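The rule of thumb can be tried on real scores by counting how many fall within one and two standard deviations of the mean. The sketch below does this for the seven-score sample used earlier; the helper name `proportion_within` is illustrative, and with so few scores the percentages only roughly track the 70% and 95% guideline.

```python
import statistics


def proportion_within(scores, k):
    """Proportion of scores within k sample standard deviations of the mean."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    inside = [x for x in scores if abs(x - mean) <= k * sd]
    return len(inside) / len(scores)


sample = [1, 6, 4, 3, 8, 7, 6]  # M = 5, s is about 2.45
print(f"{proportion_within(sample, 1):.0%}")  # 71%
print(f"{proportion_within(sample, 2):.0%}")  # 100%
```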
