Variation

Measures of variation quantify how spread out the data is.

Variation is one of the core ideas in statistics.

Super-simple measure of variation

Range = highest value – lowest value

Not good for much, because it depends only on the two most extreme values, but it gives us some idea of how spread out the data is.
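As a quick illustration, the range takes only a line of Python. The data list is the six-value sample used in the worked examples later in these slides; the variable names are illustrative only.

```python
# Range = highest value - lowest value
data = [7, 8, 10, 11, 13, 25]  # sample reused from the worked examples

data_range = max(data) - min(data)
print(data_range)  # 18
```

Note that the single large value (25) accounts for most of that range.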

Standard Deviation

Standard Deviation is a measure of variation based on the mean

Because of this, it can be strongly influenced by outliers, just like the mean.

Standard Deviation is always positive or 0 (zero only if all the data are the same)

The standard deviation has the same units as the data

Calculating Standard Deviation

Definitional formula:

$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$

Notice we are measuring variation of the data from the mean.

This formula is for the sample standard deviation, and is based on the sample mean and the sample size.

Calculating Standard Deviation

Shortcut formula:

$s = \sqrt{\dfrac{n \sum x^2 - \left(\sum x\right)^2}{n(n - 1)}}$

The advantage: No need to calculate the mean first

The disadvantage: It is less intuitive, because the deviations from the mean no longer appear in the formula.

Example: Definitional Form

$\bar{x} = 12.3$

Data $x$    $x - \bar{x}$        $(x - \bar{x})^2$
7           7 - 12.3 = -5.3      28.09
8           8 - 12.3 = -4.3      18.49
10          10 - 12.3 = -2.3     5.29
11          11 - 12.3 = -1.3     1.69
13          13 - 12.3 = 0.7      0.49
25          25 - 12.3 = 12.7     161.29

$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{215.34}{5}} \approx 6.6$
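The definitional calculation can be checked with a short Python sketch. It follows the same steps (mean, deviations, squared deviations, divide by n - 1, square root); the variable names are illustrative. Because the code keeps the unrounded mean 12.333... rather than 12.3, the sum of squared deviations comes out near 215.33 instead of 215.34, which does not change the rounded answer.

```python
import math

data = [7, 8, 10, 11, 13, 25]

n = len(data)
mean = sum(data) / n                             # 12.333..., rounded to 12.3 in the slides
squared_devs = [(x - mean) ** 2 for x in data]   # (x - mean)^2 for each value
s = math.sqrt(sum(squared_devs) / (n - 1))       # sample standard deviation

print(round(mean, 1), round(s, 1))  # 12.3 6.6
```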

Example: Shortcut Form

Data $x$    $x^2$
7           49
8           64
10          100
11          121
13          169
25          625
Sums: 74    1128

$s = \sqrt{\dfrac{n \sum x^2 - \left(\sum x\right)^2}{n(n - 1)}} = \sqrt{\dfrac{6(1128) - 74^2}{6(6 - 1)}} = \sqrt{\dfrac{1292}{30}} \approx 6.6$
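A minimal Python sketch of the shortcut form, using the same six values, shows that only the two column sums are needed and that the mean is never computed. Variable names are illustrative.

```python
import math

data = [7, 8, 10, 11, 13, 25]

n = len(data)
sum_x = sum(data)                     # 74
sum_x2 = sum(x * x for x in data)     # 1128
numerator = n * sum_x2 - sum_x ** 2   # 6*1128 - 74**2 = 1292
s_shortcut = math.sqrt(numerator / (n * (n - 1)))

print(round(s_shortcut, 1))  # 6.6
```

Both forms give the same value up to rounding, since the shortcut is an algebraic rearrangement of the definitional formula.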

Population Standard Deviation

If we have the population data, we can calculate the population standard deviation.

To distinguish it, we use a different symbol.

$\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$
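To see the effect of dividing by N instead of n - 1, the sketch below uses Python's standard `statistics` module on the six example values. Treating those six values as a whole population is purely an assumption for illustration; in the examples above they are a sample.

```python
import statistics

data = [7, 8, 10, 11, 13, 25]

sample_sd = statistics.stdev(data)        # divides by n - 1
population_sd = statistics.pstdev(data)   # divides by N

print(round(sample_sd, 2), round(population_sd, 2))  # 6.56 5.99
```

The population formula always gives the smaller of the two values for the same data, because it divides by the larger number.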

Variance

The variance is the square of the standard deviation.

Sample Variance: $s^2$

Population Variance: $\sigma^2$
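As a small check that the variance is just the standard deviation squared, here is a sketch using the standard `statistics` module on the six example values. The population figure again assumes, for illustration only, that those values are the entire population.

```python
import math
import statistics

data = [7, 8, 10, 11, 13, 25]

sample_var = statistics.variance(data)        # s^2, divides by n - 1
population_var = statistics.pvariance(data)   # sigma^2, divides by N

# Squaring the standard deviation recovers the variance.
same = math.isclose(sample_var, statistics.stdev(data) ** 2)
print(round(sample_var, 2), round(population_var, 2), same)  # 43.07 35.89 True
```

Note that the variance is in squared units, which is one reason the standard deviation is usually easier to interpret.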

Understanding Standard Deviation

Main idea:

Bigger value, data is more spread out.

Smaller value, data is closer together.

Rule of Thumb

To very roughly approximate $s$:

$s \approx \dfrac{\text{range}}{4}$

Rough interpretation:

“Most” data will be within two standard deviations of the mean. In other words,

Approximate highest value = $\bar{x} + 2s$

Approximate lowest value = $\bar{x} - 2s$
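The rule of thumb can be tried on the six example values with a short Python sketch (names are illustrative). It also shows how rough the estimate can be: the value 25 pulls the actual standard deviation well above range/4.

```python
import statistics

data = [7, 8, 10, 11, 13, 25]

rough_s = (max(data) - min(data)) / 4   # range / 4 = 18 / 4 = 4.5
actual_s = statistics.stdev(data)       # about 6.56

mean = statistics.mean(data)
approx_low = mean - 2 * actual_s        # approximate lowest value
approx_high = mean + 2 * actual_s       # approximate highest value

print(rough_s, round(actual_s, 2))
print(round(approx_low, 2), round(approx_high, 2))
```

For this sample every value falls between the two approximate bounds, which matches the "most data within two standard deviations" reading.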

Empirical Rule

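For data sets with a bell-shaped distribution,

About 68% of the data fall within one standard deviation of the mean.

About 95% of the data fall within two standard deviations of the mean.

About 99.7% of the data fall within three standard deviations of the mean.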

Example

For a particular fast-food store, the time people have to wait at the drive-through has a bell-shaped distribution with $\bar{x} = 3.5$ min and $s = 0.7$ min.

Then about 68% of people wait between $\bar{x} - s = 2.8$ min and $\bar{x} + s = 4.2$ min.

About 95% of people wait between $\bar{x} - 2s = 2.1$ min and $\bar{x} + 2s = 4.9$ min.

Almost everyone (99.7% of people) waits between $\bar{x} - 3s = 1.4$ min and $\bar{x} + 3s = 5.6$ min.
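The drive-through intervals can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, with illustrative names; the percentages are the empirical-rule values quoted in the example.

```python
mean = 3.5   # mean wait, in minutes
s = 0.7      # standard deviation, in minutes

# Map each percentage to the interval mean +/- k standard deviations.
intervals = {
    pct: (round(mean - k * s, 1), round(mean + k * s, 1))
    for k, pct in [(1, 68), (2, 95), (3, 99.7)]
}

for pct, (low, high) in intervals.items():
    print(f"about {pct}% wait between {low} and {high} min")
```

The rounding to one decimal place matches the precision of the example and hides small floating-point errors.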

Homework

2.5: 3, 9, 21, 23, 25, 33
