AU5 - Lesson 3 notes (bwp)

Name: ________________________
Class: _________________
AU5: Notes #3 – Box-and-Whisker
Date: __________________
Vocabulary:
Box-and-Whisker Plot – a data display that divides a set of data into four parts. The
median, or second quartile, separates the set into two halves: the numbers that are below
the median and the numbers that are above the median. The first quartile is the median
of the lower half. The third quartile is the median of the upper half. The lower extreme
is the least data value and the upper extreme is the greatest data value.
[Figure: box-and-whisker plot with its five parts labeled 1 through 5]
Five Number Summary
1 = minimum value, lower extreme
2 = 1st quartile, lower quartile (25th percentile)
3 = 2nd quartile, median (50th percentile)
4 = 3rd quartile, upper quartile (75th percentile)
5 = maximum value, upper extreme
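The five-number summary can be computed directly from the definitions above. The following is a minimal Python sketch, assuming the median-of-halves convention (when the number of values is odd, the median itself is left out of both halves). The function name `five_number_summary` and the sample ages are made up for illustration and do not come from this lesson.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]        # values below the median position
    upper = values[n - half:]    # values above the median position
    return (values[0], median(lower), median(values), median(upper), values[-1])

# Hypothetical ages, for illustration only
ages = [12, 15, 18, 21, 24, 30, 35, 41, 60]
print(five_number_summary(ages))  # (12, 16.5, 24, 38.0, 60)
```

Other textbooks and software may use slightly different quartile conventions, so Q1 and Q3 can differ a little from these values.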
Interquartile Range is the distance between the first quartile and third quartile. It
accounts for the middle 50% of the data.
Outliers are values much lower or much higher than most of the data. In a box-and-whisker
plot, outliers are data that fall more than 1.5 times the interquartile range below the
first quartile or above the third quartile. Do not extend whiskers to any outliers.
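As a rough illustration of the 1.5 times IQR rule, the Python sketch below computes the IQR, the two cutoff values, the outliers, and the points where the whiskers should stop. It assumes the median-of-halves method for the quartiles described in the vocabulary above. The function name `box_plot_outliers` and the sample ages are hypothetical and are not part of this lesson.

```python
from statistics import median

def box_plot_outliers(data):
    """Apply the 1.5 x IQR rule; return IQR, cutoffs, outliers, whisker ends."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    q1 = median(values[:half])        # median of the lower half
    q3 = median(values[n - half:])    # median of the upper half
    iqr = q3 - q1
    low_cutoff = q1 - 1.5 * iqr
    high_cutoff = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_cutoff or v > high_cutoff]
    inside = [v for v in values if low_cutoff <= v <= high_cutoff]
    # Whiskers stop at the most extreme values that are not outliers
    return {
        "iqr": iqr,
        "low_cutoff": low_cutoff,
        "high_cutoff": high_cutoff,
        "outliers": outliers,
        "whiskers": (inside[0], inside[-1]),
    }

# Hypothetical ages, for illustration only
ages = [12, 15, 18, 21, 24, 30, 35, 41, 95]
result = box_plot_outliers(ages)
print(result["iqr"], result["outliers"], result["whiskers"])  # 21.5 [95] (12, 41)
```

With these made-up ages, 95 lies above the upper cutoff of 70.25, so the upper whisker would stop at 41 and 95 would be plotted as a separate point.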
Application:
1. Consider the following scenario. A television game show, “Fact or Fiction”, was
canceled after nine shows. Many people watched the nine shows and were rather upset
when it was taken off the air. A random sample of eighty viewers of the show was
selected. Viewers in the sample responded to several questions. The dot plot below shows
the distribution of ages of these eighty viewers:
a. Approximately where would you locate the mean (balance point) in the above
distribution?
b. How does the direction of the tail affect the location of the mean age compared to the
median age?
c. The mean age of the above sample is approximately 50. Do you think this age
describes the typical viewer of this show? Explain your answer.
d. Using the dot plot on the previous page, construct a box plot over the dot plot by
completing the following steps:
i. Locate the middle 40 observations and draw a box around these values.
ii. Calculate the median and then draw a line in the box at the location of the
median.
iii. Draw a line that extends from the upper end of the box to the largest
observation in the data set.
iv. Draw a line that extends from the lower edge of the box to the minimum value
in the data set.
e. Recall that the 5 values used to construct the box plot make up the 5–number summary.
What is the 5–number summary for this data set of ages?
Minimum Age ________
Lower quartile or Q1 ________
Median Age ________
Upper quartile or Q3 ________
Maximum Age ________
f. What percent of the data falls between the minimum value and Q1?
g. What percent of the data does the box part of the box plot capture?
h. What is the range of the middle fifty percent?
i. The IQR provides a summary of the variability for a skewed data distribution. The IQR
is a number that specifies the length of the interval that contains the middle half of the
ages of viewers. Do you think producers of the show would prefer a show that has a small
or large interquartile range? Explain your answer.
Try-It!
2. Thirty female users and twenty-five male users were selected at random from a
database of people who play a video game regularly. Each of them agreed to be part of a
research study and report their scores. A leadership score is based on a player’s answers
to leadership questions. A score of 1 to 40 is considered a beginning level leadership
score, a score of 41 to 60 is considered a middle level leadership score, and a score of
greater than 60 is considered an advanced level leadership score.
a. Use the following data to make a box plot of the female scores, and a box plot of the
male scores on the number line below.
Based on your box plots, compare the variability of the leadership scores for male and
female users, and state which users tend to have higher leadership scores.
3. Consider a box plot of the ages of 200 randomly selected people from Kenya:
A data distribution may contain extreme data. A box plot can be used to display extreme
data values that are identified as outliers. Each “*” in the box plot marks the age of one
of 4 people from this sample. Based on the sample, these 4 ages were considered outliers.
An outlier is defined to be any data value that is more than 1.5 × (IQR) away from the
nearest quartile.
a. Estimate the values of the 4 ages represented by an *.
b. What is the median age of the sample of ages from Kenya? What are the approximate
values of Q1 and Q3? What is the approximate interquartile range (IQR) of this sample?
c. Multiply the interquartile range (IQR) by 1.5. What value do you get?
d. Are there any age values that are less than Q1 − 1.5 × (IQR)? If so, these ages would
also be considered outliers.
e. Explain why there is no * on the low side of the box plot for ages of the people in the
sample from Kenya.