AP Statistics
1.2 Measuring Center and Boxplots
Objectives: Given a data set, compute the mean and median as measures of center.
Explain what is meant by a resistant measure.
Identify situations in which the mean is the most appropriate measure of center and situations in which the median is the most appropriate measure.
Given a data set, find the quartiles.
Given a data set, find the five-number summary.
Use the five-number summary of a data set to construct a boxplot for the data.
Compute the interquartile range (IQR) of a data set.
Given a data set, use the 1.5 x IQR Rule to identify outliers.
1. The Mean (x̅)
To find the mean x̅ of a set of observations, add their values and divide by the number of observations. If the n observations are x₁, x₂, …, xₙ, their mean is:
x̅ = (x₁ + x₂ + … + xₙ) / n
OR in more compact notation,
x̅ = (1/n) Σ xᵢ
The Σ (capital Greek sigma) in the formula for the mean is short for “add them all up.”
The bar over the x indicates the mean of all the x values. Pronounce the mean x̅ as
“____________________.” Remember, the mean is nonresistant to outliers!
Finding x̅ on the calculator:
STAT → Edit → Insert data into L₁ (or Lₙ, it doesn’t matter) → STAT → CALC → 1-Var Stats (make sure you have the correct list) → Hit ENTER three times.
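As a rough illustration of the formula for the mean, the sketch below computes x̅ in Python and shows how one extreme value pulls it. The data values and the names `mean`, `values`, `x_bar`, and `x_bar_with_outlier` are made up for this example and are not from the textbook.

```python
# Hypothetical data, chosen only to illustrate the formula.
values = [2, 4, 6, 8]


def mean(xs):
    """Add the values and divide by the number of observations."""
    return sum(xs) / len(xs)


# (2 + 4 + 6 + 8) / 4
x_bar = mean(values)

# Adding one extreme observation drags the mean toward it,
# which is what "nonresistant" means.
x_bar_with_outlier = mean(values + [100])

print(x_bar)               # 5.0
print(x_bar_with_outlier)  # 24.0
```

One added value of 100 moves the mean from 5 to 24, even though four of the five observations are 8 or less.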
2. The Median (M)
1. Arrange all observations in order of size, from smallest to largest.
2. If the number of observations n is odd, the median M is the center observation in the ordered list. Find the location by counting (n + 1)/2 observations up from the bottom of the list.
3. If the number of observations n is even, the median M is the average of the two center observations in the ordered list. The location of the median is again (n + 1)/2 from the bottom of the list.
*Note: the formula (n + 1)/2 does NOT give the median, just the location of the median in the ordered list.
While the word average can be used to describe EITHER the mean or the median, it’s a fair assumption that when people say “average” they’re referring to the “mean”.
Finding M on the calculator:
2nd STAT (LIST) → MATH → median( → ENTER → 2nd 1 (L₁, end parentheses) → ENTER
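The three steps for finding the median can be sketched in Python as follows. This is one possible translation, not the calculator's method; the function name `find_median` and the sample data are made up for illustration.

```python
def find_median(xs):
    """Follow the handout's steps: sort, locate (n + 1)/2, read off M."""
    ordered = sorted(xs)        # Step 1: smallest to largest
    n = len(ordered)
    location = (n + 1) / 2      # 1-based position of the median, not its value

    if n % 2 == 1:
        # Step 2: n odd, so the location is a whole number.
        return ordered[int(location) - 1]

    # Step 3: n even, so the location ends in .5 and falls between
    # two center observations; average them.
    below = ordered[int(location) - 1]
    above = ordered[int(location)]
    return (below + above) / 2


# Hypothetical data sets.
odd_case = find_median([5, 1, 3])         # ordered: 1, 3, 5
even_case = find_median([5, 1, 3, 100])   # ordered: 1, 3, 5, 100

print(odd_case)   # 3
print(even_case)  # 4.0
```

The extreme value 100 barely moves the median (from 3 to 4), which is why the median is called a resistant measure.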
Mean vs Median
The mean and median of a symmetric distribution:
If the distribution is exactly symmetric:
In a skewed distribution:
Don’t confuse the “average” value of a variable (the mean) with its “typical” value, which we might describe with the median.
Measuring center alone can be misleading. For example:
The Quartiles
Range:
Percentile: (pth)
Quartiles:
Calculating the Quartiles:
1. Arrange the observations in increasing order and locate the median M in the ordered list of observations.
2. The first quartile Q₁ is the median of the observations whose positions in the ordered list are to the left of the location of the overall median.
3. The third quartile Q₃ is the median of the observations whose positions in the ordered list are to the right of the location of the overall median.
Example 1:
The highway mileages of 20 gasoline-powered two-seater cars are as follows:
24 28 28 25 25 20 16 16 23 15 13 22 17 28 23 19 26 29 23 32
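A minimal Python sketch of the quartile steps, applied to the highway mileages above. It uses the handout's rule (Q₁ and Q₃ are the medians of the observations to the left and right of the median's location); software and calculators sometimes use slightly different rules and can give slightly different quartiles. The function name `quartiles` is an illustrative choice.

```python
from statistics import median

highway = [24, 28, 28, 25, 25, 20, 16, 16, 23, 15,
           13, 22, 17, 28, 23, 19, 26, 29, 23, 32]


def quartiles(xs):
    """Return (Q1, M, Q3) using the median-of-each-half rule."""
    ordered = sorted(xs)                 # Step 1: increasing order
    n = len(ordered)
    half = n // 2
    left = ordered[:half]                # observations left of the median's location
    right = ordered[half + n % 2:]       # observations right of it (skips M itself when n is odd)
    return median(left), median(ordered), median(right)


q1, m, q3 = quartiles(highway)
print(q1, m, q3)  # 18.0 23.0 27.0
```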
The Five-Number Summary and Boxplots
The Five-Number Summary of a set of observations consists of:
These five numbers offer a reasonably complete description of center and spread. The five-number summary leads to the visual representation called a Boxplot.
Creating a Boxplot (Box and Whisker Plot)
A central box spans the quartiles Q₁ and Q₃.
A line in the box marks the median M.
Lines extend from the box out to the smallest and largest observations.
Example 2:
Use the ordered data from Example 1. Find the Minimum, Q₁, Median M, Q₃, and Maximum.
You can plot more than one Boxplot on one graph (as long as they’re comparable). A Boxplot can be horizontal or vertical.
Example 3:
Here are the city mileages of the 20 gasoline-powered two-seater cars from Example 2.
17 20 20 17 18 12 11 10 17 9 9 15 12 22 16 13 20 20 15 26
Find the five-number summary and plot the boxplot on the SAME graph as the highway boxplot.
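One way to check hand calculations is to compute both five-number summaries in Python. The sketch below assumes the handout's median-of-each-half rule for quartiles; the function name `five_number_summary` is an illustrative choice, and the results may differ slightly from software that uses another quartile rule.

```python
from statistics import median

highway = [24, 28, 28, 25, 25, 20, 16, 16, 23, 15,
           13, 22, 17, 28, 23, 19, 26, 29, 23, 32]
city = [17, 20, 20, 17, 18, 12, 11, 10, 17, 9,
        9, 15, 12, 22, 16, 13, 20, 20, 15, 26]


def five_number_summary(xs):
    """Return (minimum, Q1, M, Q3, maximum)."""
    ordered = sorted(xs)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])            # median of the lower half
    q3 = median(ordered[half + n % 2:])    # median of the upper half
    return ordered[0], q1, median(ordered), q3, ordered[-1]


highway_summary = five_number_summary(highway)
city_summary = five_number_summary(city)

print("highway:", highway_summary)  # (13, 18.0, 23.0, 27.0, 32)
print("city:   ", city_summary)     # (9, 12.0, 16.5, 20.0, 26)
```

Each of the five city values is lower than the matching highway value, so the city boxplot sits to the left of (or below) the highway boxplot on a shared axis.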
The Interquartile Range (IQR):
The distance between the quartiles is a more resistant measure of spread than the range, which may include an outlier. This distance is called the Interquartile Range: IQR = Q₃ – Q₁.
The 1.5 x IQR Rule for Outliers:
We call an observation a suspected outlier if it falls more than 1.5 x IQR above Q₃ or below Q₁.
Would 60 mpg be an outlier in Example 3?
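The 1.5 x IQR Rule can be sketched in Python for the city mileages from Example 3. The names `lower_fence`, `upper_fence`, and `is_suspected_outlier` are illustrative choices rather than textbook terms, and the quartiles again use the median-of-each-half rule.

```python
from statistics import median

city = [17, 20, 20, 17, 18, 12, 11, 10, 17, 9,
        9, 15, 12, 22, 16, 13, 20, 20, 15, 26]

ordered = sorted(city)
n = len(ordered)
half = n // 2
q1 = median(ordered[:half])           # 12.0
q3 = median(ordered[half + n % 2:])   # 20.0

iqr = q3 - q1                         # 8.0
lower_fence = q1 - 1.5 * iqr          # 12 - 12 = 0.0
upper_fence = q3 + 1.5 * iqr          # 20 + 12 = 32.0


def is_suspected_outlier(x):
    """True if x is more than 1.5 x IQR below Q1 or above Q3."""
    return x < lower_fence or x > upper_fence


print(is_suspected_outlier(60))  # True: 60 is above 32
print(is_suspected_outlier(26))  # False: the largest observed value is inside the fences
```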
HW: pg. 74–75: 1.27, 1.28, 1.30, 1.32; pg. 82–83: 1.33, 1.37, 1.38