Chapter 5: Variability and standard (z) scores

How do we quantify the variability of the scores in a sample?
[Figure: histogram of the ice dancing scores. x-axis: Ice Dancing Score (55 to 115); y-axis: Frequency (0 to 5).]
Method 1: range: difference between the highest and lowest scores
Scores, highest to lowest (n = 23):
111.15, 108.55, 106.6, 103.33, 100.06, 97.38, 96.67, 96.12,
92.75, 89.62, 85.36, 84.58, 83.89, 83.12, 80.47, 80.3,
79.31, 76.73, 74.25, 72.01, 68.87, 63.73, 59.64
Example: The range of the ice dancing scores is 111.15 - 59.64 = 51.51 points.
The range is easy to calculate, but it depends on only two scores (the highest and the lowest). So it’s not a very informative or reliable measure of variability.
Ice Dancing, compulsory dance scores, Winter Olympics
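The range calculation can be sketched in a few lines of Python. This is only an illustration: the function name `score_range` is an assumption, and the list simply copies the scores shown above.

```python
# Ice dancing compulsory dance scores, copied from the example above.
scores = [111.15, 108.55, 106.6, 103.33, 100.06, 97.38, 96.67, 96.12,
          92.75, 89.62, 85.36, 84.58, 83.89, 83.12, 80.47, 80.3,
          79.31, 76.73, 74.25, 72.01, 68.87, 63.73, 59.64]


def score_range(values):
    """Return the difference between the highest and lowest values."""
    return max(values) - min(values)


# Only two of the 23 scores (the maximum and the minimum) affect the result.
print(round(score_range(scores), 2))  # 51.51
```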
Method 2: The semi-interquartile range (Q): one half of the distance between the 75th percentile (P75) and the 25th percentile (P25).
Score     Percentile rank
111.15    98
108.55    93
106.6     89
103.33    85
100.06    80
97.38     76
96.67     72
96.12     67
92.75     63
89.62     59
85.36     54
84.58     50
83.89     46
83.12     41
80.47     37
80.3      33
79.31     28
76.73     24
74.25     20
72.01     15
68.87     11
63.73     7
59.64     2
Q1 = P25
Q3 = P75
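Example: ice dancing scores
The score at percentile rank p is interpolated between the nearest listed scores, S_L (with rank p_L) and S_H (with rank p_H):
$$P_p = S_L + \frac{p - p_L}{p_H - p_L}(S_H - S_L)$$
$$Q_1 = P_{25} = 76.73 + \frac{25 - 24}{28 - 24}(79.31 - 76.73) = 77.38$$
$$Q_3 = P_{75} = 96.67 + \frac{75 - 72}{76 - 72}(97.38 - 96.67) = 97.20$$
$$Q = \frac{Q_3 - Q_1}{2} = \frac{97.20 - 77.38}{2} = 9.91$$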
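The interpolation for Q1, Q3 and Q can be checked with a short Python sketch. The helper name `percentile_score` and its parameter names are assumptions made for illustration; the numbers come from the percentile rank table above.

```python
def percentile_score(p, p_low, p_high, s_low, s_high):
    """Linearly interpolate the score at percentile rank p.

    (p_low, s_low) and (p_high, s_high) are the nearest listed
    percentile ranks and scores below and above p.
    """
    return s_low + (p - p_low) / (p_high - p_low) * (s_high - s_low)


q1 = percentile_score(25, 24, 28, 76.73, 79.31)   # about 77.375
q3 = percentile_score(75, 72, 76, 96.67, 97.38)   # about 97.2025
semi_interquartile_range = (q3 - q1) / 2

print(round(semi_interquartile_range, 2))  # 9.91
```

The sketch keeps full precision for Q1 and Q3, whereas the worked example rounds them to two decimals first; both routes give Q = 9.91.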
Method 3: Variance: the mean of the squares of the deviation scores
deviation score: The difference between a score and the mean of the scores
Sum of squared deviation scores:
$$SS_X = \sum (X - \bar{X})^2$$
Formula for variance of a population of scores:
$$\sigma_X^2 = \frac{\sum (X - \mu_X)^2}{N} = \frac{SS_X}{N}$$
Formula for variance of a sample of scores:
$$S_X^2 = \frac{\sum (X - \bar{X})^2}{n} = \frac{SS_X}{n}$$
Example: find the variance of this sample of 7 numbers: 5,3,1,6,2,8,3
$$S_X^2 = \frac{\sum (X - \bar{X})^2}{n}$$
$$= \frac{(5-4)^2 + (3-4)^2 + (1-4)^2 + (6-4)^2 + (2-4)^2 + (8-4)^2 + (3-4)^2}{7}$$
$$= \frac{(1)^2 + (-1)^2 + (-3)^2 + (2)^2 + (-2)^2 + (4)^2 + (-1)^2}{7}$$
$$= \frac{1 + 1 + 9 + 4 + 4 + 16 + 1}{7} = \frac{36}{7} = 5.14$$
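The same definition, deviation scores squared, summed, and divided by n, can be written as a minimal Python sketch. The function names `sum_of_squares` and `variance_n` are assumptions chosen for this illustration.

```python
def sum_of_squares(values):
    """SS_X: sum of squared deviations of each score from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


def variance_n(values):
    """Variance with n in the denominator (the definition used here)."""
    return sum_of_squares(values) / len(values)


sample = [5, 3, 1, 6, 2, 8, 3]
print(sum_of_squares(sample))        # 36.0
print(round(variance_n(sample), 2))  # 5.14
```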
Calculating variance this way can be tedious.
Fortunately there’s a shortcut for calculating SSx:
$$SS_X = \sum (X - \bar{X})^2 = \sum X^2 - \frac{(\sum X)^2}{n}$$
$\sum (X - \bar{X})^2$: sum of squared deviations from the mean
$\sum X^2$: sum of squares
$\frac{(\sum X)^2}{n}$: sum squared, divided by n
Example: from this sample of 7 numbers: 5,3,1,6,2,8,3
From the deviation scores:
$$\sum (X - \bar{X})^2 = (5-4)^2 + (3-4)^2 + (1-4)^2 + (6-4)^2 + (2-4)^2 + (8-4)^2 + (3-4)^2$$
$$= (1)^2 + (-1)^2 + (-3)^2 + (2)^2 + (-2)^2 + (4)^2 + (-1)^2 = 1 + 1 + 9 + 4 + 4 + 16 + 1 = 36$$
From the shortcut:
$$(\sum X)^2 = (5 + 3 + 1 + 6 + 2 + 8 + 3)^2 = 28^2 = 784$$
$$\sum X^2 = 5^2 + 3^2 + 1^2 + 6^2 + 2^2 + 8^2 + 3^2 = 148$$
$$SS_X = \sum X^2 - \frac{(\sum X)^2}{n} = 148 - \frac{784}{7} = 148 - 112 = 36$$
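A small Python sketch of the shortcut, under the assumption that the scores are held in a plain list; the function name `sum_of_squares_shortcut` is hypothetical.

```python
def sum_of_squares_shortcut(values):
    """SS_X computed as (sum of squares) - (sum, squared) / n."""
    n = len(values)
    sum_x = sum(values)                   # sum of the scores
    sum_x2 = sum(x * x for x in values)   # sum of the squared scores
    return sum_x2 - sum_x ** 2 / n


sample = [5, 3, 1, 6, 2, 8, 3]
print(sum(x * x for x in sample))       # 148
print(sum(sample) ** 2)                 # 784
print(sum_of_squares_shortcut(sample))  # 36.0
```

The shortcut saves work by hand. In floating-point code it can lose precision when the mean is large relative to the spread, so the deviation-score form is often the safer choice there.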
$$SS_X = \sum (X - \bar{X})^2 = \sum X^2 - \frac{(\sum X)^2}{n}$$
Example: calculate the variance of this sample of 10 numbers:
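X      X^2
8      64
6      36
3      9
7      49
1      1
7      49
7      49
8      64
9      81
10     100

$n = 10$
$\sum X^2 = 502$
$\sum X = 66$
$(\sum X)^2 = 4356$
$\frac{(\sum X)^2}{n} = 435.6$
$SS_X = \sum X^2 - \frac{(\sum X)^2}{n} = 502 - 435.6 = 66.4$
$S_X^2 = \frac{SS_X}{n} = \frac{66.4}{10} = 6.64$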
standard deviation: the square root of the variance
Formula for standard deviation for a population of scores:
$$\sigma_X = \sqrt{\frac{\sum (X - \mu_X)^2}{N}} = \sqrt{\frac{SS_X}{N}}$$
Formula for standard deviation for a sample of scores:
$$S_X = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} = \sqrt{\frac{SS_X}{n}}$$
The standard deviation has the same units as the original scores (e.g., points or inches).
Warning! Point of future confusion!
The definition of variance and standard deviation has an n (or N) in the denominator.
$$\sigma_X = \sqrt{\frac{\sum (X - \mu_X)^2}{N}} = \sqrt{\frac{SS_X}{N}}$$
$$S_X = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} = \sqrt{\frac{SS_X}{n}}$$
Later, when we get into inferential statistics, we’ll start dividing by n-1:
$$s_X = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{SS_X}{n - 1}}$$
The first definition is the true average of the squared deviations from the mean. But this number is a biased estimate of the variance of the population: on average it comes out too small.
Divide by ‘n’ when you just want the standard deviation of your sample (or population).
Divide by ‘n-1’ when you want to use a sample to estimate the standard deviation of the population.
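Python's standard `statistics` module offers both versions, which makes the difference easy to see. The sketch below reuses the 7-number sample from the variance example; `pstdev` divides by n and `stdev` divides by n-1.

```python
import statistics

sample = [5, 3, 1, 6, 2, 8, 3]   # SS_X = 36, n = 7

# Divide by n: the standard deviation of these scores themselves.
sd_n = statistics.pstdev(sample)            # sqrt(36 / 7)

# Divide by n - 1: an estimate of the population standard deviation.
sd_n_minus_1 = statistics.stdev(sample)     # sqrt(36 / 6)

print(round(sd_n, 2))          # 2.27
print(round(sd_n_minus_1, 2))  # 2.45
```

The n-1 version is always a little larger, and the gap shrinks as n grows.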
Example: calculate the standard deviation of this sample of 10 numbers:
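X      X^2
8      64
6      36
3      9
7      49
1      1
7      49
7      49
8      64
9      81
10     100

$n = 10$
$\sum X^2 = 502$
$\sum X = 66$
$(\sum X)^2 = 4356$
$\frac{(\sum X)^2}{n} = 435.6$
$SS_X = 502 - 435.6 = 66.4$
$S_X^2 = \frac{66.4}{10} = 6.64$
$S_X = \sqrt{6.64} = 2.58$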
Example: calculate the standard deviation of this sample of 20 numbers:
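X      X^2
8      64
6      36
3      9
7      49
1      1
7      49
7      49
8      64
9      81
10     100
3      9
2      4
3      9
4      16
6      36
3      9
2      4
8      64
3      9
1      1

$n = 20$
$\sum X^2 = 663$
$\sum X = 101$
$(\sum X)^2 = 10201$
$\frac{(\sum X)^2}{n} = 510.05$
$SS_X = 663 - 510.05 = 152.95$
$S_X^2 = \frac{152.95}{20} = 7.65$
$S_X = \sqrt{7.65} = 2.77$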
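The worksheet steps used in the last two examples can be sketched as one Python function that returns every intermediate quantity. The function name `sd_worksheet` and the dictionary keys are assumptions made for this illustration.

```python
import math


def sd_worksheet(values):
    """Return each step of the computational formula (dividing by n)."""
    n = len(values)
    sum_x2 = sum(x * x for x in values)   # sum of the squared scores
    sum_x = sum(values)                   # sum of the scores
    sum_x_squared = sum_x ** 2            # the sum, squared
    ss_x = sum_x2 - sum_x_squared / n     # sum of squared deviations
    variance = ss_x / n
    return {
        "n": n,
        "sum_x2": sum_x2,
        "sum_x": sum_x,
        "sum_x_squared": sum_x_squared,
        "sum_x_squared_over_n": sum_x_squared / n,
        "ss_x": ss_x,
        "variance": variance,
        "sd": math.sqrt(variance),
    }


scores = [8, 6, 3, 7, 1, 7, 7, 8, 9, 10, 3, 2, 3, 4, 6, 3, 2, 8, 3, 1]
for name, value in sd_worksheet(scores).items():
    print(f"{name}: {round(value, 4)}")
```

Printed to four decimals, the variance is 7.6475 and the standard deviation about 2.7654, which round to the 7.65 and 2.77 shown in the example.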
Characteristic                          Range         Semi-interquartile range   Standard deviation
Frequency of use                        Some          Very little                Almost always
Mathematical tractability               Very little   Very little                Great
Sampling stability                      Worst         OK                         Best
Use with skewed distributions           Not so good   OK                         Interpret with caution
Most closely related central tendency   None          Median                     Mean
Use with open-ended distributions       No            OK                         No
Affected by sample size                 Yes           No                         No
Ease of calculation                     Easy          OK                         OK
Fun facts about the standard deviation:
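Adding a constant to each number in a sample does not change the standard deviation (or variance):
$$S_{X+b} = S_X$$
Multiplying each number in a sample by a positive constant multiplies the standard deviation by that same constant. For a negative constant, the standard deviation is multiplied by the constant's absolute value:
$$S_{aX} = |a| S_X$$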
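Both facts are easy to check numerically. The sketch below uses the 7-number sample from earlier with arbitrarily chosen constants (b = 10, a = 3, and a = -3); the variable names are illustrative only.

```python
import math
import statistics

sample = [5, 3, 1, 6, 2, 8, 3]
sd = statistics.pstdev(sample)           # divides by n

shifted = [x + 10 for x in sample]       # add a constant, b = 10
scaled = [3 * x for x in sample]         # multiply by a constant, a = 3
flipped = [-3 * x for x in sample]       # multiply by a negative constant

print(math.isclose(statistics.pstdev(shifted), sd))       # True
print(math.isclose(statistics.pstdev(scaled), 3 * sd))    # True
# A standard deviation is never negative, so a = -3 also gives 3 * sd.
print(math.isclose(statistics.pstdev(flipped), 3 * sd))   # True
```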
How big is a standard deviation?
For a normal (bell-shaped) distribution:
68.3% of the values fall within one standard deviation of the mean
95.4% of the values fall within two standard deviations of the mean
99.7% of the values fall within three standard deviations of the mean
One standard deviation above and below the mean is where the bend of the curve switches (the ‘inflection point’).
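These proportions can be reproduced with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch; the helper name `proportion_within` is an assumption.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)


def proportion_within(k):
    """Proportion of a normal distribution within k s.d. of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} s.d.: {100 * proportion_within(k):.2f}%")
# within 1 s.d.: 68.27%
# within 2 s.d.: 95.45%
# within 3 s.d.: 99.73%
```

Because any normal distribution can be rescaled to mean 0 and standard deviation 1, the same proportions hold whatever the mean and standard deviation are.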
Guess the mean and standard deviation
[Figures: five frequency histograms of scores, each shown first without and then with its mean and standard deviation.]
Histogram 1 (Score axis from 80 to 120): Mean = 99, s.d. = 8.0
Histogram 2 (Score axis from 0 to 100): Mean = 60, s.d. = 27.3
Histogram 3 (Score axis from -400 to 200): Mean = -99, s.d. = 99.7
Histogram 4 (Score axis from 497 to 502): Mean = 500, s.d. = 1.0
Histogram 5 (Score axis from -4 to 6): Mean = 1, s.d. = 1.9
Standard Scores (z scores)
Sometimes it is useful to compare scores across distributions that have
different means and standard deviations. A common way to do this is
to convert the scores into standard deviation units, or ‘z scores’.
The goal is to modify all of the scores so that the new mean is equal to
zero and the new standard deviation is equal to one.
To make the new mean zero, we subtract the mean from all scores.
Remember this shifts the mean but doesn’t change the standard
deviation.
To make the new standard deviation equal to 1, we then divide all of the
shifted scores by the standard deviation. Dividing would normally change the
mean, but since the mean is now zero, it stays at zero.
Here’s the formula for changing a sample of scores, X, to z:
$$z = \frac{X - \bar{X}}{S_X}$$
Example: Convert the following ten scores to z scores
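Step 1, calculate the mean and standard deviation:
X      X^2
23     529
4      16
12     144
42     1764
62     3844
93     8649
7      49
23     529
8      64
54     2916

$n = 10$
$\sum X^2 = 18504$
$\sum X = 328$
$(\sum X)^2 = 107584$
$\frac{(\sum X)^2}{n} = 10758.40$
$SS_X = 18504 - 10758.40 = 7745.60$
$S_X^2 = \frac{7745.60}{10} = 774.56$
$S_X = \sqrt{774.56} = 27.83$
$\bar{X} = \frac{328}{10} = 32.80$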
Example: Convert the following ten scores to z scores
Step 2, for each score, subtract the mean and divide by the standard deviation
Mean = 32.80, standard deviation = 27.83

X      X - mean    z
23     -9.80       -0.35
4      -28.80      -1.03
12     -20.80      -0.75
42     9.20        0.33
62     29.20       1.05
93     60.20       2.16
7      -25.80      -0.93
23     -9.80       -0.35
8      -24.80      -0.89
54     21.20       0.76
Check for yourself that the mean of z is 0, and the standard deviation is 1.
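One way to run that check is a short Python sketch of the two steps. The function name `z_scores` is an assumption; it divides by n when computing the standard deviation, as in the example.

```python
import statistics


def z_scores(values):
    """Subtract the mean, then divide by the standard deviation."""
    mean = sum(values) / len(values)
    sd = statistics.pstdev(values)   # n in the denominator
    return [(x - mean) / sd for x in values]


scores = [23, 4, 12, 42, 62, 93, 7, 23, 8, 54]
z = z_scores(scores)

print([round(v, 2) for v in z])
# [-0.35, -1.03, -0.75, 0.33, 1.05, 2.16, -0.93, -0.35, -0.89, 0.76]

# Up to floating-point rounding, the mean of z is 0 and its s.d. is 1.
print(abs(sum(z) / len(z)) < 1e-9)            # True
print(abs(statistics.pstdev(z) - 1) < 1e-9)   # True
```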
Z-transforming your scores doesn’t affect the shape of the distribution.
[Figure: two histograms of the same scores. Left: raw scores (x-axis: Score), Mean = 80, s.d. = 33.0. Right: the same scores as z scores (x-axis: z score), Mean = 0, s.d. = 1.]
The standard normal distribution
[Figure: the standard normal curve. x-axis: z score (-4 to 4); y-axis: Relative frequency.]
The standard normal distribution is a continuous distribution.
It has a mean of 0 and a standard deviation of 1
The total area under the curve is equal to 1
Table A (page 436) gives you the proportion of scores for given ranges in the standard normal distribution.
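Column 2: Area between 0 and z
[Figure: standard normal curve with the region between 0 and z shaded; area = 0.3413.]
Column 3: Area above z
[Figure: standard normal curve with the region above z shaded; area = 0.1587.]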
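The two table columns can also be computed directly. The sketch below uses `statistics.NormalDist` from the Python standard library; the function names are assumptions, and z is assumed to be zero or positive, as in a printed table. The areas 0.3413 and 0.1587 shown in the figures are the values for z = 1.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1


def area_between_0_and_z(z):
    """Column 2: area under the curve between 0 and z (z >= 0)."""
    return std_normal.cdf(z) - 0.5


def area_above_z(z):
    """Column 3: area under the curve above z."""
    return 1 - std_normal.cdf(z)


print(round(area_between_0_and_z(1), 4))  # 0.3413
print(round(area_above_z(1), 4))          # 0.1587
```

For any z of zero or more the two columns add up to 0.5, because half of the total area lies above the mean.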