Chapter 5: Variability and Standard (z) Scores

How do we quantify the variability of the scores in a sample?

[Figure: frequency histogram of the ice dancing scores; x-axis "Ice Dancing Score" (55 to 115), y-axis "Frequency" (0 to 5).]

Method 1: Range: the difference between the highest and lowest scores

Scores: 111.15, 108.55, 106.6, 103.33, 100.06, 97.38, 96.67, 96.12, 92.75, 89.62, 85.36, 84.58, 83.89, 83.12, 80.47, 80.3, 79.31, 76.73, 74.25, 72.01, 68.87, 63.73, 59.64

Data: Ice dancing, compulsory dance scores, Winter Olympics.

Example: The range of ice dancing scores is 111.15 - 59.64 = 51.51 points.

The range is easy to calculate, but it depends on only two scores, so it is not a very informative or reliable measure of variability.

Method 2: The semi-interquartile range (Q): one half of the distance between P75 and P25

  Score     Percentile rank
  111.15    98
  108.55    93
  106.6     89
  103.33    85
  100.06    80
  97.38     76
  96.67     72
  96.12     67
  92.75     63
  89.62     59
  85.36     54
  84.58     50
  83.89     46
  83.12     41
  80.47     37
  80.3      33
  79.31     28
  76.73     24
  74.25     20
  72.01     15
  68.87     11
  63.73     7
  59.64     2

Q1 = P25 and Q3 = P75. Neither 25 nor 75 appears in the percentile rank column, so each is interpolated between the two neighboring rows (L = lower, H = higher):

$P_p = S_L + \frac{p - p_L}{p_H - p_L}(S_H - S_L)$

Example: ice dancing scores

$Q_1 = P_{25} = 76.73 + \frac{25 - 24}{28 - 24}(79.31 - 76.73) = 77.38$

$Q_3 = P_{75} = 96.67 + \frac{75 - 72}{76 - 72}(97.38 - 96.67) = 97.20$

$Q = \frac{Q_3 - Q_1}{2} = \frac{97.20 - 77.38}{2} = 9.91$

Method 3: Variance: the mean of the squares of the deviation scores

Deviation score: the difference between a score and the mean of the scores.

Sums of squared deviation scores:

$SS_X = \sum (X - \mu_X)^2$ (population)
$SS_X = \sum (X - \bar{X})^2$ (sample)

Formula for variance of a population of scores:

$\sigma_X^2 = \frac{\sum (X - \mu_X)^2}{N} = \frac{SS_X}{N}$

Formula for variance of a sample of scores:

$S_X^2 = \frac{\sum (X - \bar{X})^2}{n} = \frac{SS_X}{n}$

Example: find the variance of this sample of 7 numbers: 5, 3, 1, 6, 2, 8, 3

The mean is 28/7 = 4, so

$S_X^2 = \frac{\sum (X - \bar{X})^2}{n} = \frac{(5-4)^2 + (3-4)^2 + (1-4)^2 + (6-4)^2 + (2-4)^2 + (8-4)^2 + (3-4)^2}{7}$

$= \frac{(1)^2 + (-1)^2 + (-3)^2 + (2)^2 + (-2)^2 + (4)^2 + (-1)^2}{7} = \frac{1 + 1 + 9 + 4 + 4 + 16 + 1}{7} = \frac{36}{7} = 5.14$

Calculating variance this way can be tedious.
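The three methods so far can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function names (`score_range`, `percentile_point`, `semi_interquartile_range`, `variance`) are made up for this example, and `percentile_point` assumes the percentile ranks are already tabulated, as they are on the slide, and simply interpolates between neighboring rows.

```python
# Ice dancing scores and their percentile ranks, copied from the slides.
scores = [111.15, 108.55, 106.6, 103.33, 100.06, 97.38, 96.67, 96.12, 92.75,
          89.62, 85.36, 84.58, 83.89, 83.12, 80.47, 80.3, 79.31, 76.73, 74.25,
          72.01, 68.87, 63.73, 59.64]
ranks = [98, 93, 89, 85, 80, 76, 72, 67, 63, 59, 54, 50, 46, 41, 37, 33, 28,
         24, 20, 15, 11, 7, 2]


def score_range(xs):
    """Method 1: highest score minus lowest score."""
    return max(xs) - min(xs)


def percentile_point(p, xs, prs):
    """Score at percentile rank p, by linear interpolation (hypothetical helper).

    Implements P_p = S_L + (p - p_L) / (p_H - p_L) * (S_H - S_L).
    """
    pairs = sorted(zip(prs, xs))  # ascending by percentile rank
    for (p_lo, s_lo), (p_hi, s_hi) in zip(pairs, pairs[1:]):
        if p_lo <= p <= p_hi:
            return s_lo + (p - p_lo) / (p_hi - p_lo) * (s_hi - s_lo)
    raise ValueError("p lies outside the tabulated percentile ranks")


def semi_interquartile_range(xs, prs):
    """Method 2: half the distance between P75 and P25."""
    q1 = percentile_point(25, xs, prs)
    q3 = percentile_point(75, xs, prs)
    return (q3 - q1) / 2


def variance(xs):
    """Method 3: mean of the squared deviation scores (divides by n)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


print(round(score_range(scores), 2))                      # 51.51
print(round(semi_interquartile_range(scores, ranks), 2))  # 9.91
print(round(variance([5, 3, 1, 6, 2, 8, 3]), 2))          # 5.14
```

The printed values agree with the worked examples on the slides. Library percentile functions may use a different interpolation rule and so may give slightly different quartiles.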
Fortunately there’s a shortcut for calculating SS_X:

$SS_X = \sum (X - \bar{X})^2 = \sum X^2 - \frac{(\sum X)^2}{n}$

Here $\sum (X - \bar{X})^2$ is the sum of squared deviations from the mean, $\sum X^2$ is the sum of squares, and $\frac{(\sum X)^2}{n}$ is the sum, squared, divided by n.

Example: from this sample of 7 numbers: 5, 3, 1, 6, 2, 8, 3

Using the definition:

$\sum (X - \bar{X})^2 = (5-4)^2 + (3-4)^2 + (1-4)^2 + (6-4)^2 + (2-4)^2 + (8-4)^2 + (3-4)^2$
$= (1)^2 + (-1)^2 + (-3)^2 + (2)^2 + (-2)^2 + (4)^2 + (-1)^2 = 1 + 1 + 9 + 4 + 4 + 16 + 1 = 36$

Using the shortcut:

$(\sum X)^2 = (5 + 3 + 1 + 6 + 2 + 8 + 3)^2 = 28^2 = 784$
$\sum X^2 = 5^2 + 3^2 + 1^2 + 6^2 + 2^2 + 8^2 + 3^2 = 148$
$SS_X = \sum X^2 - \frac{(\sum X)^2}{n} = 148 - \frac{784}{7} = 148 - 112 = 36$

Example: calculate the variance of this sample of 10 numbers:

  X:     8   6   3   7   1   7   7   8   9   10
  X^2:   64  36  9   49  1   49  49  64  81  100

  n = 10
  $\sum X^2 = 502$
  $\sum X = 66$
  $(\sum X)^2 = 4356$
  $(\sum X)^2 / n = 435.6$
  $SS_X = 502 - 435.6 = 66.4$
  $S_X^2 = SS_X / n = 6.64$

Standard deviation: the square root of the variance

Formula for standard deviation for a population of scores:

$\sigma_X = \sqrt{\frac{\sum (X - \mu_X)^2}{N}} = \sqrt{\frac{SS_X}{N}}$

Formula for standard deviation for a sample of scores:

$S_X = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} = \sqrt{\frac{SS_X}{n}}$

The standard deviation has the same units as the original scores (e.g. points, inches, etc.).

Warning! Point of future confusion!

The definitions of variance and standard deviation above have an n (or N) in the denominator:

$\sigma_X = \sqrt{\frac{\sum (X - \mu_X)^2}{N}} = \sqrt{\frac{SS_X}{N}}$

$S_X = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} = \sqrt{\frac{SS_X}{n}}$

Later, when we get into inferential statistics, we’ll start dividing by n-1:

$s_X = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{SS_X}{n - 1}}$

The first definition is the true average of the squared deviations from the mean. But when it is computed from a sample, this number is a biased estimate of the variance of the population.

Divide by ‘n’ when you just want the standard deviation of your sample (or population).
Divide by ‘n-1’ when you want to estimate the standard deviation of the population from a sample.
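The shortcut and the n versus n-1 distinction are easy to check numerically. The sketch below is illustrative only: the helper names (`sum_of_squares`, `sd_descriptive`, `sd_estimate`) are invented for this example. It cross-checks the results against `statistics.pstdev` (divides by n) and `statistics.stdev` (divides by n-1) from the Python standard library.

```python
import math
import statistics


def sum_of_squares(xs):
    """SS_X via the shortcut: sum of X^2 minus (sum of X)^2 / n."""
    n = len(xs)
    return sum(x * x for x in xs) - sum(xs) ** 2 / n


def sd_descriptive(xs):
    """Standard deviation of the scores themselves (divide by n)."""
    return math.sqrt(sum_of_squares(xs) / len(xs))


def sd_estimate(xs):
    """Estimate of the population standard deviation (divide by n - 1)."""
    return math.sqrt(sum_of_squares(xs) / (len(xs) - 1))


sample = [8, 6, 3, 7, 1, 7, 7, 8, 9, 10]  # the 10 numbers from the example

ss = sum_of_squares(sample)
print(round(ss, 2))                      # 66.4
print(round(ss / len(sample), 2))        # 6.64 (variance, dividing by n)
print(round(sd_descriptive(sample), 2))  # 2.58
print(round(sd_estimate(sample), 2))     # slightly larger, since we divide by n - 1

# Cross-check against the standard library.
assert math.isclose(sd_descriptive(sample), statistics.pstdev(sample))
assert math.isclose(sd_estimate(sample), statistics.stdev(sample))
```

Many statistics packages divide by n-1 by default, so it is worth checking which version a given function computes before comparing it with a hand calculation.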
Example: calculate the standard deviation of this sample of 10 numbers:

  X:     8   6   3   7   1   7   7   8   9   10
  X^2:   64  36  9   49  1   49  49  64  81  100

  n = 10
  $\sum X^2 = 502$
  $\sum X = 66$
  $(\sum X)^2 = 4356$
  $(\sum X)^2 / n = 435.6$
  $SS_X = 502 - 435.6 = 66.4$
  $S_X^2 = 66.4 / 10 = 6.64$
  $S_X = \sqrt{6.64} = 2.58$

Example: calculate the standard deviation of this sample of 20 numbers:

  X:     8   6   3   7   1   7   7   8   9   10   3   2   3   4   6   3   2   8   3   1
  X^2:   64  36  9   49  1   49  49  64  81  100  9   4   9   16  36  9   4   64  9   1

  n = 20
  $\sum X^2 = 663$
  $\sum X = 101$
  $(\sum X)^2 = 10201$
  $(\sum X)^2 / n = 510.05$
  $SS_X = 663 - 510.05 = 152.95$
  $S_X^2 = 152.95 / 20 = 7.65$
  $S_X = \sqrt{7.65} = 2.77$

Comparing the three measures of variability:

  Characteristic                          Range          Semi-interquartile range   Standard deviation
  Frequency of use                        Some           Very little                Almost always
  Mathematical tractability               Very little    Very little                Great
  Sampling stability                      Worst          OK                         Best
  Use with skewed distributions           Not so good    OK                         Interpret with caution
  Most closely related central tendency   None           Median                     Mean
  Use with open-ended distributions       No             OK                         No
  Affected by sample size                 Yes            No                         No
  Ease of calculation                     Easy           OK                         OK

Fun facts about the standard deviation:

Adding a constant to each number in a sample does not change the standard deviation (or variance):

$S_{X+b} = S_X$

Multiplying each number in a sample by a constant multiplies the standard deviation by that same constant (by its absolute value, if the constant is negative):

$S_{aX} = |a| S_X$

How big is a standard deviation?

For a normal (bell-shaped) distribution:
68.3% of the values fall within one standard deviation of the mean
95.4% of the values fall within two standard deviations of the mean
99.7% of the values fall within three standard deviations of the mean

One standard deviation above and below the mean is where the curve switches the direction of its bend (the ‘inflection point’).

Guess the mean and standard deviation

[Figures: five frequency histograms, each shown first without and then with its answer; x-axis "Score".]
  Histogram 1 (score axis 80 to 120): Mean = 99, s.d. = 8.0
  Histogram 2 (score axis 0 to 100): Mean = 60, s.d. = 27.3
  Histogram 3 (score axis -400 to 200): Mean = -99, s.d. = 99.7
  Histogram 4 (score axis 497 to 502): Mean = 500, s.d. = 1.0
  Histogram 5 (score axis -4 to 6): Mean = 1, s.d. = 1.9

Standard Scores (z scores)

Sometimes it is useful to compare scores across distributions that have different means and standard deviations. A common way to do this is to convert the scores into standard deviation units, or ‘z scores’.

The goal is to modify all of the scores so that the new mean is equal to zero and the new standard deviation is equal to one.

To make the new mean zero, we subtract the mean from all scores. Remember that adding or subtracting a constant shifts the mean but doesn’t change the standard deviation.

To make the new standard deviation equal to 1, we then divide all of the shifted scores by the standard deviation. Dividing would normally change the mean, but since the mean is now zero, it stays at zero.

Here’s the formula for changing a sample of scores, X, to z:

$z = \frac{X - \bar{X}}{S_X}$

Example: Convert the following ten scores to z scores

Step 1: calculate the mean and standard deviation:

  X:     23   4    12   42    62    93    7    23   8    54
  X^2:   529  16   144  1764  3844  8649  49   529  64   2916

  n = 10
  $\sum X^2 = 18504$
  $\sum X = 328$
  $(\sum X)^2 = 107584$
  $(\sum X)^2 / n = 10758.40$
  $SS_X = 18504 - 10758.40 = 7745.60$
  $S_X^2 = 7745.60 / 10 = 774.56$
  $S_X = \sqrt{774.56} = 27.83$
  $\bar{X} = 328 / 10 = 32.80$

Step 2: for each score, subtract the mean (32.80) and divide by the standard deviation (27.83):

  X     X - mean    z
  23    -9.80       -0.35
  4     -28.80      -1.03
  12    -20.80      -0.75
  42    9.20        0.33
  62    29.20       1.05
  93    60.20       2.16
  7     -25.80      -0.93
  23    -9.80       -0.35
  8     -24.80      -0.89
  54    21.20       0.76

Check for yourself that the mean of z is 0, and the standard deviation is 1.
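The two-step recipe can be followed literally in code, which also makes the suggested check straightforward. This is a minimal sketch, assuming the divide-by-n standard deviation used in the worked example; the helper names `mean` and `sd` are chosen for this illustration.

```python
import math


def mean(xs):
    """Arithmetic mean of a list of scores."""
    return sum(xs) / len(xs)


def sd(xs):
    """Standard deviation of the scores, dividing by n as in the example."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


scores = [23, 4, 12, 42, 62, 93, 7, 23, 8, 54]
m, s = mean(scores), sd(scores)
print(round(m, 2), round(s, 2))  # 32.8 27.83

# Step 1: subtract the mean. The mean becomes 0; the s.d. is unchanged.
centered = [x - m for x in scores]
assert math.isclose(sd(centered), s)

# Step 2: divide by the s.d. The s.d. becomes 1; the mean stays at 0.
z = [c / s for c in centered]
print([round(v, 2) for v in z])
# [-0.35, -1.03, -0.75, 0.33, 1.05, 2.16, -0.93, -0.35, -0.89, 0.76]

# The check suggested on the slide.
assert abs(mean(z)) < 1e-12
assert math.isclose(sd(z), 1.0)
```

If the n-1 standard deviation were used instead, the z scores would be slightly smaller in magnitude and their divide-by-n standard deviation would no longer be exactly 1.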
Z-transforming your scores doesn’t affect the shape of the distribution.

[Figure: two histograms with the same shape. Left: raw scores (x-axis "Score"), Mean = 80, s.d. = 33.0. Right: the same scores as z scores (x-axis "z score"), Mean = 0, s.d. = 1.]

The standard normal distribution

[Figure: the standard normal curve; x-axis "z score" (-4 to 4), y-axis "Relative frequency".]

The standard normal distribution is a continuous distribution.
It has a mean of 0 and a standard deviation of 1.
The total area under the curve is equal to 1.

Table A (page 436) gives you the proportion of scores for given ranges in the standard normal distribution:

Column 2: Area between 0 and z (shaded area in the figure = 0.3413)
Column 3: Area above z (shaded area in the figure = 0.1587)

These two shaded areas are the values for z = 1, and together they add up to 0.5, the half of the distribution that lies above the mean.
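The two table columns can also be computed directly instead of looked up. The sketch below is an illustration rather than a reproduction of Table A: it uses the standard relationship between the normal cumulative distribution and the error function, `math.erf`, and the function names are invented for this example.

```python
import math


def proportion_below(z):
    """Proportion of the standard normal distribution that lies below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def area_between_0_and_z(z):
    """Column 2: area between the mean (0) and z, for z >= 0."""
    return proportion_below(z) - 0.5


def area_above_z(z):
    """Column 3: area in the tail above z."""
    return 1.0 - proportion_below(z)


print(round(area_between_0_and_z(1.0), 4))  # 0.3413
print(round(area_above_z(1.0), 4))          # 0.1587

# Proportion within 1, 2 and 3 standard deviations of the mean.
for k in (1, 2, 3):
    print(k, round(2 * area_between_0_and_z(k), 4))
```

For any z of zero or more, the two columns add up to 0.5, because together they cover the whole upper half of the curve.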