Measures of Variability: Variance and Standard Deviation

Variability
• Central tendency: where the data cluster
• Variability: how the data are spread out across a variable’s distribution around the central tendency
• Variable must be interval or ratio (some would argue ordinal, too)
• NOT for nominal (the categories cannot be ordered, so “spread” is meaningless: they can be placed anywhere along the x-axis)
• Low variability: data are close to the mean, not spread out much
• High variability: data are spread out far away from the mean

Calculating Variability
• Range
• Interquartile range
• Variance
• Standard deviation

Variance
Formula for a set of data:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
where $x_i$ is each observation, $\mu$ is the mean, and $N$ is the number of observations
The average squared distance of each observation from the mean

Variance Calculation
• Find the mean. Set it aside.
• For each value: subtract the mean, then square the result. This is the squared difference of that value from the mean.
• Add up all the squared differences and divide by the number of squared differences you have (this is the mean of the squared differences)
• Low variance: data are close to the mean, not spread out much
• High variance: data are spread out far away from the mean

Standard Deviation
Formula for a set of data:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$
The square root of the variance: roughly the typical distance of each observation from the mean

Standard Deviation
• Another metric for how spread out the data are
• Square root of the variance
• A standardized measure of distance from the mean, expressed in the same units as the data
• Regardless of the unit of measurement, the distance from the mean is interpreted the same way
• SD of public approval = 2.5%
• SD of height = 2.5 inches
• Different units of measurement, but the same interpretation: a typical distance from the mean
• Low standard deviation: data are close to the mean, not spread out much
• High standard deviation: data are spread out far away from the mean

Order of Operations
• PEMDAS
• Parentheses, exponents, multiplication, division, addition, subtraction
• Work “inward to outward”
• Example: standard deviation
1. Find the mean
2. Subtract the mean from each value
3. Square each new value (original value − mean)
4. Add up all of the new values
5. Divide the sum of all the new values by the total number of new values (take the mean)
6. Take the square root of the number you get in step 5

Things that affect the amount of variance and the standard deviation
• Values of the outliers and number of outliers
• More outliers: higher SD and variance around the mean
• More extreme outliers (farther from the mean): higher SD and variance around the mean
• Number of observations
• More observations: any single extreme value has less influence on the SD and variance
• Fewer observations: any single extreme value has more influence on the SD and variance
• If we take a large number of samples and plot their means on a frequency distribution, the shape will be approximately “normal” (bell curve)

[Figure: normal curve showing the % of data under the curve. σ = standard deviation; μ = mean of population]
About 68.2% of an infinite set of sample means from a population would fall within 1 standard deviation of the population mean
If we took an infinite number of samples, the means of those samples would be the data we plot, and they would be normally distributed

Example Data: Height
• Height of a group of friends (in inches)
• 72
• 60
• 66
• 67
• 80
• 64
• 70
• Calculate the variance and standard deviation of the data

Next time…
• Why are we using these formulas?
• Brief discussion of the Central Limit Theorem
• Standard deviations, calculating sampling error, and the relationship to the population we care about
• Calculating confidence intervals (“margins of error” in polling)
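
Worked sketch: height example
The six steps from the Order of Operations slide can be tried on the height data. The following is a minimal sketch in Python, not part of the original slides. It assumes the group of friends is treated as the whole set of data, so the sum of squared differences is divided by the number of values, as the Variance Calculation slide describes. The variable names (heights, squared_diffs, and so on) are illustrative only.

```python
import math
import statistics

# Heights (in inches) from the Example Data slide.
heights = [72, 60, 66, 67, 80, 64, 70]

# Step 1: find the mean.
mean = sum(heights) / len(heights)

# Steps 2-3: subtract the mean from each value, then square the result.
squared_diffs = [(x - mean) ** 2 for x in heights]

# Steps 4-5: add up the squared differences and divide by how many there are.
# Assumption: the data are treated as a full population (divide by N).
variance = sum(squared_diffs) / len(squared_diffs)

# Step 6: take the square root of the variance.
std_dev = math.sqrt(variance)

print(f"mean = {mean:.2f}")
print(f"variance = {variance:.2f}")
print(f"standard deviation = {std_dev:.2f}")

# Cross-check against the standard library's population formulas.
assert math.isclose(variance, statistics.pvariance(heights))
assert math.isclose(std_dev, statistics.pstdev(heights))
```

Under these assumptions the script reports a mean of about 68.43 inches, a variance of about 35.39 squared inches, and a standard deviation of about 5.95 inches. If the friends were instead treated as a sample from a larger population, the usual convention is to divide by N − 1, which statistics.variance and statistics.stdev do.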