SD and Variance

Measures of Variability:
Variance and Standard Deviation
Variability
• Central tendency: where the data cluster
• Variability: how the data are spread out across a variable’s
distribution around the central tendency
• Variable must be interval or ratio (some would argue ordinal, too)
• NOT for nominal (cannot order, so the “spread” is meaningless
because categories can be placed anywhere along the x-axis)
• Low variability: data are close to the mean, not spread out much
• High variability: data are spread out far away from the mean
Calculating Variability
• Range
• Interquartile range
• Variance
• Standard deviation
Variance Formula for a set of data
The average squared distance of each
observation from the mean
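The formula itself did not survive on this slide. A reconstruction consistent with the description above and with the divide-by-the-count steps on the next slide is sketched below. The symbols are assumptions: x_i is an observation, μ is the mean of the data, and N is the number of observations. Many textbooks divide by N - 1 when the data are a sample.

```latex
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2
```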
Variance Calculation
• Find the mean. Set aside.
• For each value: subtract the mean. Then square the result. This is the
squared difference of that value from the mean
• Take all the squared differences you calculate, add them up, and
divide by the number of squared differences you have (the mean of
the squared differences)
• Low variance: data are close to the mean, not spread out much
• High variance: data are spread out far away from the mean
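The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a library routine: the function name `variance` and both data sets are made up for the example, and the function divides by the number of values, as the steps describe.

```python
def variance(values):
    """Mean of the squared differences from the mean (divides by N)."""
    n = len(values)
    mean = sum(values) / n                              # find the mean, set aside
    squared_diffs = [(x - mean) ** 2 for x in values]   # subtract the mean, then square
    return sum(squared_diffs) / n                       # mean of the squared differences


tight = [4, 5, 5, 6]     # made-up values close to their mean of 5
spread = [0, 2, 8, 10]   # made-up values far from their mean of 5

print(variance(tight))   # low variance
print(variance(spread))  # high variance
```

Both lists have the same mean, so the difference in the result comes only from how far the values sit from it.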
Standard Deviation Formula for a set of data
Roughly the typical distance of each observation from the
mean (the square root of the average squared distance)
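The formula itself did not survive on this slide. A reconstruction that matches the "square root of the variance" definition is sketched below. The symbols are assumptions: x_i is an observation, μ is the mean of the data, and N is the number of observations. Sample-based versions commonly divide by N - 1 instead.

```latex
\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}
```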
Standard Deviation
• Another metric for how spread out the data are
• Square root of the variance
• A standardized way to express distance from the mean
• Regardless of the unit of measurement, the distance from the mean is
interpreted the same way
• SD of public approval = 2.5%
• SD of height = 2.5 inches
• Different units of measurement, but the same interpretation: a typical
distance from the mean
• Low standard deviation: data are close to the mean, not spread out much
• High standard deviation: data are spread out far away from the mean
Order of Operations
• PEMDAS
• Parentheses, exponents, multiplication, division, addition, subtraction
• Work “inward to outward”
• Example: standard deviation
1: Find the mean
2: Subtract the mean from each value
3: Square each new value (original value - mean)
4: Add up all of the new values
5: Divide the sum of all the new values by the total number of new values
(take the mean)
6: Take the square root of the number you get in step 5
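To show how "inward to outward" maps onto the six steps, here is a small Python sketch that writes the standard deviation as one nested expression. The data values are illustrative and not from the slides; the comments mark which step each piece corresponds to.

```python
import math

values = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative values, not from the slides

mean = sum(values) / len(values)    # step 1: find the mean

sd = math.sqrt(                               # step 6: square root (outermost)
    sum((x - mean) ** 2 for x in values)      # steps 2-4: subtract, square, add up
    / len(values)                             # step 5: divide by the number of values
)

print(mean, sd)
```

The innermost parentheses (value minus mean) are evaluated first and the square root last, which is the same order as the numbered steps.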
Things that affect the amount of variance and
the standard deviation
• Values of the outliers and number of outliers
• More outliers: higher SD and variance around the mean
• More extreme outliers (farther from the mean): higher SD and variance
around the mean
• Number of observations
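• More observations: the sample mean varies less from sample to sample
(lower standard error); the SD of the data itself does not shrink just
because there are more observations
• Fewer observations: the sample mean varies more from sample to sample,
and the SD and variance estimates are less stable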
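The effect of outliers can be checked with a quick Python sketch using the standard library's `statistics.pstdev`, which divides by the number of values. All data below are made up for illustration.

```python
import statistics

base = [10, 11, 12, 13, 14]       # made-up values with no outliers
one_outlier = base + [40]         # one outlier added
two_outliers = base + [40, 40]    # a second outlier added
far_outlier = base + [80]         # one outlier, but farther from the mean

sd_base = statistics.pstdev(base)
sd_one = statistics.pstdev(one_outlier)
sd_two = statistics.pstdev(two_outliers)
sd_far = statistics.pstdev(far_outlier)

print(sd_base, sd_one, sd_two, sd_far)
```

In this small example the SD rises when an outlier is added, rises again with a second one, and rises most when the outlier is more extreme.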
• If we take a large number of samples and plot their means on a
frequency distribution, the shape will be approximately “normal” (bell
curve)
σ = standard deviation
μ = mean of population
• Percentages on the figure: % of data under the curve
• About 68.2% of an infinite set of sample means would fall within 1
standard deviation of the population mean
• If we took an infinite number of samples and plotted their means, the
plotted means would be normally distributed
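An infinite number of samples is not possible in practice, but a simulation with a few thousand samples gives a feel for the idea. The sketch below is an assumption-laden illustration: the population is hypothetical (flat, not bell-shaped), and the sample size of 30 and the count of 5,000 samples are arbitrary choices.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable

# Hypothetical population: 10,000 values from a flat (non-normal) distribution.
population = [rng.uniform(0, 100) for _ in range(10_000)]

# Draw many samples and record the mean of each one.
sample_means = [
    statistics.fmean(rng.sample(population, 30))
    for _ in range(5_000)
]

center = statistics.fmean(sample_means)          # mean of the sample means
sd_of_means = statistics.pstdev(sample_means)    # spread of the sample means

# Share of sample means that fall within 1 SD of their center.
within_one_sd = sum(
    abs(m - center) <= sd_of_means for m in sample_means
) / len(sample_means)

print(round(statistics.fmean(population), 2), round(center, 2))
print(round(within_one_sd, 3))
```

The mean of the sample means should land close to the population mean, and the share within one standard deviation should come out near the 68% shown on the curve, even though the population itself is not bell-shaped.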
Example Data: Height
• Height of a group of friends (in inches)
• 72
• 60
• 66
• 67
• 80
• 64
• 70
• Calculate the variance and standard deviation of the data
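One way to check a hand calculation is with Python's standard `statistics` module. This sketch assumes the divide-by-N version described in the calculation steps, which corresponds to `pvariance` and `pstdev`.

```python
import statistics

heights = [72, 60, 66, 67, 80, 64, 70]   # inches, from the slide

mean = statistics.fmean(heights)
variance = statistics.pvariance(heights)  # mean of squared differences (divides by N)
sd = statistics.pstdev(heights)           # square root of that variance

print(f"mean     = {mean:.2f} inches")          # 68.43
print(f"variance = {variance:.2f} inches^2")    # 35.39
print(f"SD       = {sd:.2f} inches")            # 5.95
```

`statistics.variance` and `statistics.stdev` divide by N - 1 instead, so they return slightly larger values for the same data.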
Next time…
• Why are we using these formulas?
• Brief discussion of the Central Limit Theorem
• Standard deviations and calculating sampling error and relationship to
the population we care about
• Calculating confidence intervals (“margins of error” in polling)