Variation
Measures of variation quantify how spread out the data is.
Variation is one of the core ideas in statistics.
Super-simple measure of variation
Range = highest value – lowest value
Not good for much, since it uses only the two most extreme values, but it gives us some idea of how spread out the data is.
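As a quick illustration, the range takes one line to compute. This is a minimal Python sketch; the data values are the ones used in the worked examples of this section, and the variable names are only illustrative.

```python
# Data values borrowed from the worked examples in this section.
data = [7, 8, 10, 11, 13, 25]

# Range = highest value - lowest value
data_range = max(data) - min(data)
print(data_range)  # 18
```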
Standard Deviation
Standard deviation is a measure of variation based on the mean.
Because of this, it can be strongly influenced by outliers, just like the mean.
Standard deviation is always positive or 0 (zero only if all the data values are the same).
The standard deviation has the same units as the data.
Calculating Standard Deviation
Definitional formula:
$$ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} $$
Notice we are measuring variation of the data from the mean.
This formula is for the sample standard deviation $s$, and is based on the sample mean $\bar{x}$ and the sample size $n$.
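The definitional formula can be sketched directly in Python. The function name `sample_std_definitional` is hypothetical, and the data values are the ones from the worked examples in this section; this is an illustration of the formula, not a replacement for a statistics library.

```python
import math

def sample_std_definitional(data):
    """Sample standard deviation: deviations from the mean, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data values")
    mean = sum(data) / n
    # Squared deviation of each value from the mean.
    squared_deviations = [(x - mean) ** 2 for x in data]
    return math.sqrt(sum(squared_deviations) / (n - 1))

s_definitional = sample_std_definitional([7, 8, 10, 11, 13, 25])
print(round(s_definitional, 1))  # 6.6
```

Note that the mean has to be computed first, which is exactly the step the shortcut formula avoids.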
Calculating Standard Deviation
Shortcut formula:
$$ s = \sqrt{\frac{n \sum x^2 - \left(\sum x\right)^2}{n(n - 1)}} $$
The advantage: No need to calculate the mean first
The disadvantage: It is less intuitive, because it no longer shows the deviations from the mean.
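A minimal Python sketch of the shortcut formula, assuming the same six data values used in the worked examples of this section. The function name `sample_std_shortcut` is hypothetical. Only the two running sums, of x and of x squared, are needed.

```python
import math

def sample_std_shortcut(data):
    """Sample standard deviation from the sums of x and x^2 (no mean needed)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data values")
    sum_x = sum(data)                    # sum of x
    sum_x2 = sum(x * x for x in data)    # sum of x^2
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

s_shortcut = sample_std_shortcut([7, 8, 10, 11, 13, 25])
print(round(s_shortcut, 1))  # 6.6
```

One caveat that goes beyond these notes: in floating-point arithmetic, subtracting two large, nearly equal sums can lose precision, so the shortcut form is mainly a convenience for hand calculation.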
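Example: Definitional Form
$\bar{x} = 12.3$

Data $x$    $x - \bar{x}$         $(x - \bar{x})^2$
7           7 - 12.3 = -5.3       28.09
8           8 - 12.3 = -4.3       18.49
10          10 - 12.3 = -2.3      5.29
11          11 - 12.3 = -1.3      1.69
13          13 - 12.3 = 0.7       0.49
25          25 - 12.3 = 12.7      161.29
Sum:                              215.34

$$ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{215.34}{5}} \approx 6.6 $$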
Example: Shortcut Form

Data $x$    $x^2$
7           49
8           64
10          100
11          121
13          169
25          625
Sums: 74    1128

$$ s = \sqrt{\frac{n \sum x^2 - \left(\sum x\right)^2}{n(n - 1)}} = \sqrt{\frac{6(1128) - 74^2}{6(6 - 1)}} = \sqrt{\frac{1292}{30}} \approx 6.6 $$
Population Standard Deviation
If we have the population data, we can calculate the population standard deviation.
To distinguish it, we use a different symbol.
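$$ \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} $$
Here $\mu$ is the population mean and $N$ is the population size.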
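To see how the population formula differs in practice, here is a small Python sketch. It assumes, purely for illustration, that the six values from the worked examples make up an entire population; the function name `population_std` is hypothetical.

```python
import math

def population_std(data):
    """Population standard deviation: divide by N rather than n - 1."""
    N = len(data)
    if N == 0:
        raise ValueError("need at least one data value")
    mu = sum(data) / N  # population mean
    return math.sqrt(sum((x - mu) ** 2 for x in data) / N)

# Assumption: these six values are treated as the whole population.
sigma = population_std([7, 8, 10, 11, 13, 25])
print(round(sigma, 2))  # 5.99
```

Dividing by N instead of n - 1 gives a slightly smaller result than the sample standard deviation for the same values.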
Variance
Sample Variance: $s^2$
Population Variance: $\sigma^2$
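Both variances are available in Python's standard `statistics` module. The sketch below uses the six data values from the worked examples in this section; treating them as either a sample or a population is an assumption made only to show the two calls side by side.

```python
import statistics

data = [7, 8, 10, 11, 13, 25]

sample_variance = statistics.variance(data)        # s^2, divides by n - 1
population_variance = statistics.pvariance(data)   # sigma^2, divides by N

print(round(sample_variance, 2))      # 43.07
print(round(population_variance, 2))  # 35.89

# Each variance is the square of the matching standard deviation.
sample_std = statistics.stdev(data)
print(abs(sample_std ** 2 - sample_variance) < 1e-9)  # True
```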
Understanding Standard Deviation
Main idea:
Bigger value, data is more spread out.
Smaller value, data is closer together.
Rule of Thumb
To very roughly approximate $s$:
$$ s \approx \frac{\text{range}}{4} $$
Rough interpretation:
“Most” data will be within two standard deviations of the mean. In other words,
Approximate highest value $= \bar{x} + 2s$
Approximate lowest value $= \bar{x} - 2s$
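The rule of thumb can be checked against the six data values from the worked examples in this section. This is a rough Python sketch with illustrative variable names; it shows how crude the range/4 estimate can be.

```python
import statistics

data = [7, 8, 10, 11, 13, 25]

# Rule of thumb: s is very roughly range / 4.
rough_s = (max(data) - min(data)) / 4
actual_s = statistics.stdev(data)

print(rough_s)             # 4.5
print(round(actual_s, 1))  # 6.6

# "Most" data should lie within two standard deviations of the mean.
mean = statistics.mean(data)
approx_lowest = mean - 2 * actual_s
approx_highest = mean + 2 * actual_s
print(round(approx_lowest, 1), round(approx_highest, 1))  # -0.8 25.5
```

Here the estimate 4.5 is noticeably below the actual value 6.6, which fits the "very roughly" in the rule. All six values do fall between the approximate lowest and highest values.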
Empirical Rule
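For data sets with a bell-shaped distribution,
About 68% of the data fall within one standard deviation of the mean.
About 95% of the data fall within two standard deviations of the mean.
About 99.7% of the data fall within three standard deviations of the mean.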
Example
For a particular fast-food store, the time people have to wait at the drive-through has a bell-shaped distribution with $\bar{x} = 3.5$ min and $s = 0.7$ min.
Then about 68% of people wait between $\bar{x} - s = 2.8$ min and $\bar{x} + s = 4.2$ min.
About 95% of people wait between $\bar{x} - 2s = 2.1$ min and $\bar{x} + 2s = 4.9$ min.
Almost everyone (about 99.7% of people) waits between $\bar{x} - 3s = 1.4$ min and $\bar{x} + 3s = 5.6$ min.
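The drive-through intervals can be reproduced with a short Python sketch. The function name `empirical_rule_intervals` and its `ndigits` parameter are hypothetical; the mean of 3.5 min and standard deviation of 0.7 min come from the example, and the percentages apply only when the distribution is bell-shaped.

```python
def empirical_rule_intervals(mean, s, ndigits=1):
    """Return the mean +/- 1s, 2s, and 3s intervals, keyed by approximate coverage."""
    coverage = {1: "68%", 2: "95%", 3: "99.7%"}
    return {
        label: (round(mean - k * s, ndigits), round(mean + k * s, ndigits))
        for k, label in coverage.items()
    }

intervals = empirical_rule_intervals(mean=3.5, s=0.7)
for label, (low, high) in intervals.items():
    print(f"About {label} wait between {low} and {high} min")
# About 68% wait between 2.8 and 4.2 min
# About 95% wait between 2.1 and 4.9 min
# About 99.7% wait between 1.4 and 5.6 min
```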
Homework
2.5: 3, 9, 21, 23, 25, 33