Chapter 4
Variability

Variability
In statistics, our goal is to measure the amount of variability for a particular set of scores, a distribution.
- If all the scores are the same, there is no variability.
- If the differences between scores are small, variability is small.
- If the differences between scores are large, variability is large.

Variability
Variability provides a quantitative measure of the degree to which scores in a distribution are spread out or clustered together.
- Goal: to describe how spread out the scores are in a distribution

Copyright © 2002 Wadsworth Group. Wadsworth is an imprint of the Wadsworth Group, a division of Thomson Learning
Figure 4.1 Population distributions of heights and weights

Variability (cont.)
Variability will serve two purposes:
- Describe the distribution
  - Are the scores close together?
  - Are they spread out over a large distance?
- Measure how well an individual score (or group of scores) represents the entire distribution

Variability (cont.)
Variability provides information about how much error to expect when you are using a sample to represent a population.
Three measures of variability:
- Range
- Interquartile range
- Standard deviation

Range
The range is the difference between the upper real limit of the largest (maximum) X value and the lower real limit of the smallest (minimum) X value.
- The range is the most obvious way to describe how spread out the scores are.

Range (cont.)
Problem:
- The range is completely determined by the two extreme values and ignores the other scores in the distribution.
- It often does not give an accurate description of the variability for the entire distribution.
- It is considered a crude and unreliable measure of variability.

Interquartile Range and Semi-Interquartile Range
- The quartiles Q1, Q2, and Q3 divide the distribution into four equal parts, each containing 25% of the scores.
- The interquartile range is defined as the distance between the first quartile and the third quartile: interquartile range = Q3 − Q1
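As a rough illustration of the range and the interquartile range, the following Python sketch computes both for a small, made-up set of scores. It assumes whole-number scores, so each real limit lies 0.5 beyond the score. The quartiles come from `statistics.quantiles` in the standard library, which is only one of several conventions; textbook hand methods for locating quartiles can give slightly different values. The function names are illustrative.

```python
import statistics

# Hypothetical whole-number scores, chosen only for illustration.
scores = [1, 6, 4, 3, 8, 7, 6]


def real_limit_range(values, unit=1.0):
    """Upper real limit of the maximum minus lower real limit of the minimum.

    Assumes scores are measured in steps of `unit`, so each real limit
    lies half a unit beyond the score.
    """
    return (max(values) + unit / 2) - (min(values) - unit / 2)


def interquartile_range(values):
    """Distance between the first and third quartiles (Q3 - Q1).

    Quartile positions follow statistics.quantiles with its default
    'exclusive' method; other conventions may differ slightly.
    """
    q1, _q2, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


score_range = real_limit_range(scores)
iqr = interquartile_range(scores)

print("range:", score_range)
print("interquartile range:", iqr)
```

Replacing the largest score with a much more extreme value changes the range directly, while the interquartile range depends only on the middle 50% of the scores.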
Figure 4.2 The interquartile range

Interquartile Range (cont.)
- When the interquartile range is used to describe variability, it is commonly transformed into the semi-interquartile range.
- The semi-interquartile range is one-half of the interquartile range: (Q3 − Q1) / 2

Interquartile Range (cont.)
Because the semi-interquartile range is derived from the middle 50% of a distribution, it is less likely to be influenced by extreme scores and therefore gives a better and more stable measure of variability than the range.

Interquartile Range (cont.)
Limitations:
- Does not take into account the distances between individual scores
- Does not give a complete picture of how scattered or clustered the scores are

Standard Deviation
- The most commonly used and most important measure of variability
- Standard deviation uses the mean of the distribution as a reference point and measures variability by considering the distance between each score and the mean.

Standard Deviation (cont.)
- Are the scores clustered or scattered?
- Standard deviation approximates the average distance from the mean.

Standard Deviation (cont.)
- The goal of standard deviation is to measure the standard, or typical, distance from the mean.
- A deviation is the distance and direction from the mean: deviation score = X − μ

Standard Deviation (cont.)
Step 1: Determine the deviation, or distance from the mean, for each individual score.
- If μ = 50 and X = 53: deviation score = X − μ = 53 − 50 = +3

Standard Deviation (cont.)
- If μ = 50 and X = 45: deviation score = X − μ = 45 − 50 = −5

Standard Deviation (cont.)
Step 2: Calculate the mean of the deviation scores.
- Add the deviation scores
- Divide by N

Standard Deviation (cont.)
X    X − μ
8    +5
1    −2
3     0
0    −3
Deviation scores must add up to zero: Σ(X − μ) = 0 (here μ = 3)

Standard Deviation (cont.)
Step 3: Square each deviation score.
- Why? The average of the deviation scores will not work as a measure of variability.
- Why not? They always add up to zero.
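Steps 1 to 3 can be checked numerically. The sketch below uses the four scores from the deviation-score table (8, 1, 3, 0) and shows that the deviations cancel out while the squared deviations do not. The variable names are illustrative only.

```python
# Scores from the deviation-score table (X = 8, 1, 3, 0).
scores = [8, 1, 3, 0]

N = len(scores)
mu = sum(scores) / N  # population mean

# Step 1: deviation score = X - mu (sign gives direction, size gives distance).
deviations = [x - mu for x in scores]

# Step 2: the deviations sum to zero (up to floating-point rounding),
# so their mean cannot serve as a measure of variability.
mean_deviation = sum(deviations) / N

# Step 3: squaring removes the signs, so the values no longer cancel.
squared_deviations = [d ** 2 for d in deviations]

print("mean:", mu)
print("deviations:", deviations)
print("sum of deviations:", sum(deviations))
print("squared deviations:", squared_deviations)
```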
Step 3 (cont.): Using the squared values, you can now compute the mean squared deviation. This is called the variance.
- Variance = mean squared deviation

Standard Deviation (cont.)
By squaring the deviation scores:
- You get rid of the + and − signs
- You get a measure of variability based on squared distances
- This is useful for some inferential statistics
Note: The squared distance is not the best descriptive measure of variability.

Standard Deviation (cont.)
Step 4: Correct for having squared the distances by taking the square root.
- Standard deviation = √variance

Sum of Squared Deviations (SS)
Variance = mean squared deviation = SS / N
Definitional formula: SS = Σ(X − μ)²

Sum of Squared Deviations (SS): Definitional Formula
X    X − μ    (X − μ)²
1    −1        1
0    −2        4
6    +4       16
1    −1        1
ΣX = 8, μ = 2, SS = Σ(X − μ)² = 22

Computational Formula
SS = ΣX² − (ΣX)² / N

Computational Formula for SS
X    X²
1     1
0     0
6    36
1     1
ΣX = 8, ΣX² = 38
SS = ΣX² − (ΣX)² / N = 38 − (8)² / 4 = 38 − 64 / 4 = 38 − 16 = 22

Definitional vs. Computational?
- The definitional formula is the most direct way of calculating the sum of squares.
- However, if you have numbers with decimals, it can become cumbersome.
- The computational formula is most commonly used.

Formulas
- Variance = SS / N
- Standard deviation = √variance = √(SS / N)

Formulas (cont.)
Variance and standard deviation are parameters of a population and are identified with the Greek letter σ (sigma).
- Population standard deviation = σ = √(SS / N)
- Population variance = σ² = SS / N

Figure 4.4 Graphic presentation of the mean and standard deviation

Figure 4.5 Variability of a sample selected from a population

Figure 4.6 Largest and smallest distance from the mean

Example (pg. 94)
X    X − M    (X − M)²
1    −4       16
6    +1        1
4    −1        1
3    −2        4
8    +3        9
7    +2        4
6    +1        1
ΣX = 35, n = 7, M = 35 / 7 = 5, SS = Σ(X − M)² = 36

Degrees of Freedom
- With a sample of n scores, the first n − 1 scores are free to vary, but the final score is restricted.
- As a result, the sample is said to have n − 1 degrees of freedom.

Degrees of Freedom
Degrees of freedom, or df, for sample variance are defined as df = n − 1, where n is the number of scores in the sample.

Table 4.2 Reporting the mean and standard deviation in APA format
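The formulas in this chapter can be tied together in a short Python sketch. It computes SS with both the definitional and the computational formula for the population example (1, 0, 6, 1), then the population variance and standard deviation. It then repeats the calculation for the sample from the example on pg. 94 using df = n − 1. The function names are illustrative. Dividing SS by n − 1 for the sample variance is the standard formula, but it is labeled as an assumption here because the slides give only the definition of df.

```python
import math


def ss_definitional(scores):
    """SS = sum of squared deviations from the mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)


def ss_computational(scores):
    """SS = sum(X^2) - (sum(X))^2 / N."""
    n = len(scores)
    return sum(x ** 2 for x in scores) - sum(scores) ** 2 / n


# Population example from the SS slides: X = 1, 0, 6, 1
population = [1, 0, 6, 1]
N = len(population)
ss_pop = ss_definitional(population)
pop_variance = ss_pop / N            # sigma^2 = SS / N
pop_sd = math.sqrt(pop_variance)     # sigma = sqrt(SS / N)

# Sample example (pg. 94): X = 1, 6, 4, 3, 8, 7, 6
sample = [1, 6, 4, 3, 8, 7, 6]
n = len(sample)
df = n - 1                           # degrees of freedom
ss_sample = ss_computational(sample)
sample_variance = ss_sample / df     # assumption: s^2 = SS / (n - 1)
sample_sd = math.sqrt(sample_variance)

print("population SS:", ss_pop, "variance:", pop_variance, "sd:", round(pop_sd, 3))
print("sample SS:", ss_sample, "df:", df, "variance:", sample_variance,
      "sd:", round(sample_sd, 3))
```

Both SS functions return the same value for a given set of scores, which is the point of the definitional versus computational comparison: the two formulas differ only in how convenient the arithmetic is.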