Fall, 2008
Psych 5500/6500
Measures of Variability
1
Measures of Variability
We will look at three ways of measuring how much the scores differ from each other.
1. Mean Absolute Deviation
2. Variance
3. Standard Deviation
These approaches are based upon the concept that you can tell how much the scores differ from each other by looking at how much they differ from the mean.
2
Spatial Analogy
The two figures below represent the location of students within two rooms. In room A the variability of location is large; in room B it is small.
3
Spatial Analogy (cont.)
How might we want to measure the variability of location? One solution would be to find the average distance each student is from every other student. These distances are reflected below.
4
Spatial Analogy (cont.)
A simpler procedure would be to find the geographic center of the students, and measure the average distance each student is from that.
5
Variability
In a similar fashion, if we want to know how close scores are to each other we could measure how close each score is to every other score. A simpler approach would be to measure how close each score is to the mean (which is located in the center of the scores). If all the scores are similar in value then they will all be close to the mean; if the scores differ a great deal, some will be far from the mean.
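The two approaches described above can be compared in a short Python sketch. The two samples (`tight` and `spread`) and the function names are hypothetical, made up purely for illustration; they do not come from the lecture.

```python
from itertools import combinations


def mean_pairwise_distance(values):
    """Average absolute distance between every pair of scores."""
    pairs = list(combinations(values, 2))  # N * (N - 1) / 2 pairs
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def mean_distance_from_mean(values):
    """Average absolute distance of each score from the mean (N distances)."""
    mean = sum(values) / len(values)
    return sum(abs(y - mean) for y in values) / len(values)


# Hypothetical samples, invented for illustration only.
tight = [9, 10, 11]
spread = [0, 10, 20]

for name, sample in (("tight", tight), ("spread", spread)):
    print(name, mean_pairwise_distance(sample), mean_distance_from_mean(sample))
```

For these two samples, both measures rank `spread` as more variable than `tight`, but the second needs only N distances rather than N(N-1)/2.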
6
Variability: Example 1
Sample: Y = 11, 7, 10, 9, 11, 12. Mean: $\bar{Y} = 10$.
Step 1: determine how far each score is from the mean
 Y  -  $\bar{Y}$  =  deviation from mean
11  -  10         =   1
 7  -  10         =  -3
10  -  10         =   0
 9  -  10         =  -1
11  -  10         =   1
12  -  10         =   2
7
Mean (Interesting Property #2)
Given our spatial analogy, it would make sense to add up the distances (deviations) from the mean and use the average deviation as our measure of variability. This does not work, however, because the sum of the deviations from the mean (e.g. 1 + (-3) + 0 + (-1) + 1 + 2) always equals zero. The mean is the only value for which this is true (i.e. plug any number other than 10 into the table on the previous slide and you will not get a sum of deviations equal to zero). The problem here, the reason we end up with an average distance of zero, is that we allow both negative and positive distances from the mean; it would make more sense not to allow that.
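This property can be checked with a few lines of Python. It is a minimal sketch using the Example 1 scores; the function name `sum_of_deviations` is an assumption chosen for illustration.

```python
scores = [11, 7, 10, 9, 11, 12]   # Example 1
mean = sum(scores) / len(scores)  # 10.0


def sum_of_deviations(values, center):
    """Add up the signed distance of each value from `center`."""
    return sum(y - center for y in values)


print(sum_of_deviations(scores, mean))  # 0.0
print(sum_of_deviations(scores, 9))     # 6
print(sum_of_deviations(scores, 11))    # -6
```

Only the mean gives a sum of zero; any other center leaves a positive or negative total.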
8
Mean Absolute Deviation
One solution would be to use the absolute value of the deviations, and find the average (mean) of those deviations as our measure of variability.
 Y  -  $\bar{Y}$  =  deviations   |deviations|
11  -  10         =   1            1
 7  -  10         =  -3            3
10  -  10         =   0            0
 9  -  10         =  -1            1
11  -  10         =   1            1
12  -  10         =   2            2
Mean deviation = (1 + 3 + 0 + 1 + 1 + 2)/6 = 8/6 = 1.33
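The same calculation can be sketched in Python. The function name `mean_absolute_deviation` is chosen here for illustration only.

```python
scores = [11, 7, 10, 9, 11, 12]  # Example 1


def mean_absolute_deviation(values):
    """Average of the absolute deviations from the mean."""
    mean = sum(values) / len(values)
    absolute_deviations = [abs(y - mean) for y in values]
    return sum(absolute_deviations) / len(values)


print(round(mean_absolute_deviation(scores), 2))  # 1.33
```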
9
Variance
Another approach, the one used by the ‘variance’, is to square each deviation and find the average (mean) of those.
 Y  -  $\bar{Y}$  =  deviations   deviations²
11  -  10         =   1            1
 7  -  10         =  -3            9
10  -  10         =   0            0
 9  -  10         =  -1            1
11  -  10         =   1            1
12  -  10         =   2            4
Mean squared deviation = (1 + 9 + 0 + 1 + 1 + 4)/6 = 2.67
10
Variance (defined)
The variance, then, is the average squared distance each score is from the mean. This is technically stated as ‘the mean squared deviation from the mean’. The spatial analogy still applies, but we square each distance before finding the average.
11
Variance (continued)
The variance will be the mean of the squared deviations: (1 + 9 + 0 + 1 + 1 + 4)/6 = 16/6 = 2.67. If the scores are similar to each other the mean squared deviation will be small. If the scores differ a lot the mean squared deviation will be larger.
 Y  -  $\bar{Y}$  =  deviations   deviations²
11  -  10         =   1            1
 7  -  10         =  -3            9
10  -  10         =   0            0
 9  -  10         =  -1            1
11  -  10         =   1            1
12  -  10         =   2            4
12
Sum of Squares
To find the variance we need to sum the squared deviations and divide by N. That sum of squared deviations has a name.
 Y  -  $\bar{Y}$  =  deviations   deviations²
11  -  10         =   1            1
 7  -  10         =  -3            9
10  -  10         =   0            0
 9  -  10         =  -1            1
11  -  10         =   1            1
12  -  10         =   2            4
“Sum of the squared deviations” = 1 + 9 + 0 + 1 + 1 + 4 = 16
This is usually abbreviated to “Sum of Squares”, or simply “SS”.
13
Variance (computation)
The process we used to compute SS is called the ‘definitional formula’ for SS, and is:
$SS = \sum (Y - \bar{Y})^2$
The symbol for the variance of the sample is $S^2$.
$S^2 = \frac{SS}{N} = \frac{\sum (Y - \bar{Y})^2}{N} = \frac{16}{6} = 2.67$
Note that as we are summing squared values there is no way for SS (or the variance) to be a negative number.
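As a rough sketch, the definitional formula for SS and the variance (SS divided by N) translate into Python as follows. The function names `sum_of_squares` and `variance_n` are illustrative choices, not part of the lecture.

```python
def sum_of_squares(values):
    """Definitional formula: sum of the squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((y - mean) ** 2 for y in values)


def variance_n(values):
    """Variance as defined in this lecture: SS divided by N."""
    return sum_of_squares(values) / len(values)


scores = [11, 7, 10, 9, 11, 12]  # Example 1
print(sum_of_squares(scores))         # 16.0
print(round(variance_n(scores), 2))   # 2.67
```

Because every term being summed is a square, neither function can return a negative number.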
14
N or (N-1) ?
We have defined the variance of our sample as being:
$S^2 = \frac{SS}{N}$
You may have encountered a similar, but different, formula for variance that has (N-1) in the denominator. That is actually something different, and we will be covering it in the next lecture. Note that when SPSS gives you ‘variance’ it uses the (N-1) formula.
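The same distinction shows up in other software. As an illustration outside the lecture's SPSS context, Python's standard `statistics` module provides both versions: `pvariance` divides SS by N, while `variance` divides by N-1. The sketch below only shows that the two give different numbers for Example 1; what the (N-1) version means is left to the next lecture.

```python
import statistics

scores = [11, 7, 10, 9, 11, 12]  # Example 1, SS = 16

divide_by_n = statistics.pvariance(scores)          # 16 / 6
divide_by_n_minus_1 = statistics.variance(scores)   # 16 / 5

print(round(divide_by_n, 2))          # 2.67
print(round(divide_by_n_minus_1, 2))  # 3.2
```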
15
Variance: Example 2
Let’s look at another sample:
Example 2: Y = 5, 10, 18, 8, 7, 12
Compare that to our first sample where:
Example 1: Y = 11, 7, 10, 9, 11, 12
Note that example 2 has greater variability among its scores. As variance measures variability, the variance of example 2 should be greater than the variance of example 1.
16
Example 2 (Cont.)
Y = 5, 10, 18, 8, 7, 12. Again the mean is 10.
 Y  -  $\bar{Y}$  =  deviations   deviations²
 5  -  10         =  -5           25
10  -  10         =   0            0
18  -  10         =   8           64
 8  -  10         =  -2            4
 7  -  10         =  -3            9
12  -  10         =   2            4
SS = 106
$S^2 = \frac{106}{6} = 17.67$
Note how much the score of 18 added to the SS when its deviation of 8 was squared.
17
Effect of scores that are far from the mean
Because the variance is the average squared distance each score is from the mean, scores that are far from the mean have a disproportionate effect on the variance. A score that is 1 away from the mean adds $1^2 = 1$ to the SS; a score that is 10 away from the mean adds $10^2 = 100$ to the SS.
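To see this effect on the Example 2 scores, the sketch below lists how much each score contributes to SS. The variable names are illustrative.

```python
scores = [5, 10, 18, 8, 7, 12]    # Example 2
mean = sum(scores) / len(scores)  # 10.0

# Squared deviation of each score from the mean.
squared_deviations = [(y - mean) ** 2 for y in scores]
ss = sum(squared_deviations)      # 106.0

for y, squared in zip(scores, squared_deviations):
    # Show each score, its squared deviation, and its share of SS.
    print(y, squared, f"{squared / ss:.0%}")
```

The single score of 18 contributes 64 of the 106, about 60% of the SS.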
18
Mean (Interesting Property #3)
The mean of the scores will give you the smallest possible ‘sum of squared deviations’. In other words, if you used any number other than the mean (10) to compute SS in the previous examples then the resulting value of SS would have been larger.
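A quick way to convince yourself of this is to compute the sum of squared deviations around several candidate centers. The sketch below uses the Example 1 scores; the function name `ss_about` is hypothetical. The underlying identity is $\sum (Y - c)^2 = \sum (Y - \bar{Y})^2 + N(\bar{Y} - c)^2$, so any center c other than the mean adds a positive amount.

```python
def ss_about(values, center):
    """Sum of squared deviations measured from an arbitrary center."""
    return sum((y - center) ** 2 for y in values)


scores = [11, 7, 10, 9, 11, 12]  # Example 1, mean = 10

for center in (8, 9, 10, 11, 12):
    print(center, ss_about(scores, center))
# 8 40
# 9 22
# 10 16
# 11 22
# 12 40
```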
19
Variance: Example 3
Y = 10, 10, 10, 10, 10, 10. Again the mean is 10.
Note that the scores are identical, so the variance should be zero.
 Y  -  $\bar{Y}$  =  deviations   deviations²
10  -  10         =   0            0
10  -  10         =   0            0
10  -  10         =   0            0
10  -  10         =   0            0
10  -  10         =   0            0
10  -  10         =   0            0
SS = 0
$S^2 = \frac{0}{6} = 0$
20
Formulas for SS
The definitional formula for SS has the advantage of making it clear just exactly what ‘SS’ is, the ‘sum of the squared deviations’:
Definitional formula: $SS = \sum (Y - \bar{Y})^2$
The definitional formula has the disadvantage of being slow and cumbersome for large data sets. A much faster way to compute SS using a calculator is with the computational formula.
Computational formula: $SS = \sum Y^2 - \frac{(\sum Y)^2}{N}$
21
Computational Formula
Y = 11, 7, 10, 9, 11, 12
Computational formula: $SS = \sum Y^2 - \frac{(\sum Y)^2}{N}$
 Y    Y²
11   121
 7    49
10   100
 9    81
11   121
12   144
$\sum Y = 60$, $\sum Y^2 = 616$, $(\sum Y)^2 = 3600$, N = 6
$SS = 616 - \frac{3600}{6} = 616 - 600 = 16$
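The two formulas can be compared side by side in Python. This is a sketch with illustrative function names; it checks that both formulas give the same SS for Examples 1 and 2.

```python
def ss_definitional(values):
    """SS as the sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((y - mean) ** 2 for y in values)


def ss_computational(values):
    """SS as (sum of Y squared) minus (sum of Y, squared) divided by N."""
    n = len(values)
    sum_y = sum(values)
    sum_y_squared = sum(y * y for y in values)
    return sum_y_squared - sum_y ** 2 / n


example_1 = [11, 7, 10, 9, 11, 12]
example_2 = [5, 10, 18, 8, 7, 12]

print(ss_definitional(example_1), ss_computational(example_1))  # 16.0 16.0
print(ss_definitional(example_2), ss_computational(example_2))  # 106.0 106.0
```

The computational version never needs the individual deviations, only the running totals of Y and Y², which is why it is quicker on a calculator.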
22
Interpreting Variance
So what does knowing, for example, that a sample has a variance of ‘10’ tell us about the sample?
Well, it tells us that the average squared distance each score is from the mean is 10.
The variance also has meaning when it comes to comparing two samples. If sample A had a variance of 6 and sample B had a variance of 8, then the scores in sample B varied more than the scores in sample A.
And finally, if the variance of a sample equals zero, that tells us that all the scores were identical.
23
Standard Deviation
The last measure of variability we will consider is the standard deviation.
It is simply the positive square root of the variance. Its symbol is ‘S’.
$S = \sqrt{S^2}$
Example 1: $S = \sqrt{2.67} = 1.63$
Example 2: $S = \sqrt{17.67} = 4.20$
Example 3: $S = \sqrt{0} = 0$
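The three standard deviations can be reproduced with a short Python sketch. The function name `standard_deviation` is an illustrative choice; it divides SS by N, matching the definition of variance used in this lecture.

```python
import math


def standard_deviation(values):
    """Positive square root of the variance (SS divided by N)."""
    mean = sum(values) / len(values)
    variance = sum((y - mean) ** 2 for y in values) / len(values)
    return math.sqrt(variance)


examples = {
    "Example 1": [11, 7, 10, 9, 11, 12],
    "Example 2": [5, 10, 18, 8, 7, 12],
    "Example 3": [10, 10, 10, 10, 10, 10],
}

for name, sample in examples.items():
    print(name, round(standard_deviation(sample), 2))
# Example 1 1.63
# Example 2 4.2
# Example 3 0.0
```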
24
Interpreting Standard Deviation
So what does knowing, for example, that a sample has a standard deviation of ‘4’ tell us about the sample? Well, it tells us that the square root of the average squared deviation from the mean is ‘4’. As we will see in a future lecture, knowing the standard deviation is both interesting and comprehensible. Until then...
25
Interpreting Standard Deviation (cont.)
...at least you know that, as with the variance, if sample A has a standard deviation of 3 and sample B has a standard deviation of 5, then the scores in sample B differed more than the scores in sample A.
And that if a sample had a standard deviation of zero that means that all of the scores were identical.
26