Measuring Center and Spread

Section 1.3 Describing Quantitative Data with Numbers
Barry Bonds’ home run counts from 1986 to 2001, in chronological order:
16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42, 40, 37, 34, 49, 73
MEASURING CENTER
How can we use numbers to measure the center and spread rather than just looking at a graph?
Mean:
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n}$ (read “x bar”)
Bonds: $\bar{x} = \frac{567}{16} \approx 35.4375$
What if we excluded the outlier (73)?
Bonds: $\bar{x} = \frac{494}{15} \approx 32.9333$
The outlier raised his mean home run count by about 2.5 home runs!
Because one extreme observation can influence the mean, we say that it is NOT a resistant measure of center.
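The mean formula can be checked with a short Python sketch. The names `bonds` and `mean` are our own illustrative choices; the function simply adds the observations and divides by n, exactly as in the formula above.

```python
# Barry Bonds' home run counts, 1986-2001 (from the list above)
bonds = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42, 40, 37, 34, 49, 73]

def mean(values):
    """Sample mean (hypothetical helper): sum of the observations divided by n."""
    return sum(values) / len(values)

with_outlier = mean(bonds)                             # 567 / 16
without_outlier = mean([x for x in bonds if x != 73])  # 494 / 15

print(round(with_outlier, 4))                    # 35.4375
print(round(without_outlier, 4))                 # 32.9333
print(round(with_outlier - without_outlier, 4))  # 2.5042
```

The difference of about 2.5 home runs comes entirely from the single value 73.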
Median (M): the midpoint of a distribution. Half the observations are smaller than the median and half are larger.
To find the Median:
1. Arrange the observations in order from least to greatest
2. If the number of observations is odd, the median is the center observation
3. If the number of observations is even, the median is the mean of the two center observations
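The three steps above translate almost line for line into code. This is a minimal sketch; `median` is a hypothetical helper name, not a function from the text.

```python
def median(values):
    """Median via the three steps above (hypothetical helper name)."""
    ordered = sorted(values)   # Step 1: order from least to greatest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:             # Step 2: odd n -> the center observation
        return ordered[mid]
    # Step 3: even n -> mean of the two center observations
    return (ordered[mid - 1] + ordered[mid]) / 2

bonds = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42, 40, 37, 34, 49, 73]
print(median(bonds))                          # n = 16: (34 + 34) / 2 = 34.0
print(median([x for x in bonds if x != 73]))  # n = 15: 8th value = 34
```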
Bonds: Median (M)
n = 16 ← number of observations
Two middle observations: 34 and 34
So M = (34 + 34)/2 = 34
Without the outlier, n = 15
Center (8th) observation = 34
So M = 34
Notice that, unlike the mean, the median did not change when we removed the outlier. The median is a resistant measure of center.
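Python's standard-library `statistics` module provides `mean` and `median`. A quick sketch comparing how far each measure of center moves when 73 is dropped (the variable names are our own):

```python
import statistics

bonds = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42, 40, 37, 34, 49, 73]
no_outlier = [x for x in bonds if x != 73]

# How far does each measure of center move when the outlier is removed?
mean_shift = statistics.mean(bonds) - statistics.mean(no_outlier)
median_shift = statistics.median(bonds) - statistics.median(no_outlier)

print(f"mean shift:   {mean_shift:.2f}")    # mean shift:   2.50
print(f"median shift: {median_shift:.2f}")  # median shift: 0.00
```

The mean moves by about 2.5 while the median does not move at all, which is what "resistant" means here.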
Note: In a skewed distribution, the mean is pulled toward the tail. If a distribution is roughly symmetric, the mean and median should be about the same.
MEASURING SPREAD (also known as variability)
- Range (max – min)
- Interquartile range, or IQR (Q3 – Q1)
***To Calculate Quartiles Q1 and Q3:
1. Arrange the observations in increasing order and locate the median.
2. The first quartile Q1 is the median of the observations to the left of the median.
3. The third quartile Q3 is the median of the observations to the right of the median.
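The quartile steps above can be sketched in Python as follows. The helper names `median` and `quartiles` are hypothetical. Following the steps, when n is odd the center observation is the median and is left out of both halves. Calculators and statistical software do not all use this convention (some interpolate between observations), so their quartiles can differ slightly.

```python
def median(values):
    """Median of an already-sorted list (hypothetical helper)."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2

def quartiles(data):
    """Return (Q1, M, Q3) using the median-of-halves method (hypothetical helper)."""
    ordered = sorted(data)           # Step 1: increasing order
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]           # observations left of the median
    upper = ordered[half + n % 2:]   # observations right of the median (skips the center when n is odd)
    return median(lower), median(ordered), median(upper)

bonds = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42, 40, 37, 34, 49, 73]
print(quartiles(bonds))  # (25.0, 34.0, 41.0)
```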
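Bonds (in increasing order):
16 19 24 25 25 33 33 34 34 37 37 40 42 46 49 73
Q1 = (25 + 25)/2 = 25 (median of the 8 observations below M)
M = 34
Q3 = (40 + 42)/2 = 41 (median of the 8 observations above M)
[Figure: the ordered data marked at the 0th, 25th (Q1), 50th (M), 75th (Q3), and 100th percentiles]
***To Calculate IQR:
IQR = Q3 – Q1
Bonds IQR: 41 – 25 = 16
- The IQR is the range covered by the middle 50% of the observations.
- The first quartile is the 25th percentile: about 25% of the observations lie at or below it.
- The third quartile is the 75th percentile: about 75% of the observations lie at or below it.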
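FYI: Do NOT include the median itself in either half when finding the first and third quartiles. This matters when n is odd, because the median is then one of the observations.
For example (n = 17):
0 0 0 0 0 0 1 1 1 1 1 1 2 2 2 4 5
M = 1 (the 9th observation)
Q1 = 0 (median of the 8 observations below M)
Q3 = 2 (median of the 8 observations above M)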
Determining Outliers using IQR:
If an observation falls more than 1.5 × IQR above the third quartile or below the first quartile, it is considered an outlier.
Bonds: We suspected 73 was an outlier.
1.5 × 16 = 24
Q3 + 24 = 41 + 24 = 65, so any observation above 65 is an outlier.
Q1 – 24 = 25 – 24 = 1, so any observation below 1 is an outlier.
Therefore, 73 is an outlier (and no observation falls below 1).
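The 1.5 × IQR rule can be sketched in Python. The helpers `median` and `iqr_outliers` and the parameter `k` are hypothetical names; quartiles use the median-of-halves method from this section, and "more than" is coded as a strict inequality.

```python
def median(values):
    """Median of an already-sorted list (hypothetical helper)."""
    n = len(values)
    mid = n // 2
    return values[mid] if n % 2 else (values[mid - 1] + values[mid]) / 2

def iqr_outliers(data, k=1.5):
    """Flag observations more than k * IQR below Q1 or above Q3 (hypothetical helper)."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])            # median of the lower half
    q3 = median(ordered[half + n % 2:])    # median of the upper half
    spread = k * (q3 - q1)                 # 1.5 x IQR
    low, high = q1 - spread, q3 + spread   # outlier fences
    flagged = [x for x in ordered if x < low or x > high]
    return low, high, flagged

bonds = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42, 40, 37, 34, 49, 73]
print(iqr_outliers(bonds))  # (1.0, 65.0, [73])
```

Passing a different `k` lets you try stricter or looser fences, but the rule described in this section uses k = 1.5.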