Uploaded by Justin Chandler

2-04+Guided+Notes+-+Using+Measures+of+Center+and+Measures+of+Variability (1)

Using Measures of Center and Measures of Variability
Objective
In this lesson, you will
Comparing Data Sets Using Measures of Center
Any statistical data can be organized and summarized using measures of center.
• mean = the (average / middle) of the data set
→ ____________ the sum of all the values in the data set by the total number of values.
• median = the (average / middle) of the data set
→ ___________ the values from least to greatest. The median divides the set into two halves: one set of values lower than the median and one set higher than the median.
The table shows the number of words Luis and Darcy each wrote for 10 random essays.
mean = (sum of values in data set) / (number of values in data set)
→ The mean number of words in Luis's essays is 2,108 / 10 = _________.
→ The mean number of words in Darcy's essays is 2,415 / 10 = _________.
There are an even number of values in each data set.
So, the median is the mean of the two middle values.
Luis: 178, 213, 198, 245, 236, 198, 221, 253, 189, 177
Darcy: 231, 210, 245, 259, 286, 245, 231, 244, 236, 228
→ When listed in order, the fifth and sixth values of Luis's set are ______ and ______, respectively.
So, the median number of words in Luis's essays is 411 / 2 = _________.
→ When listed in order, the fifth and sixth values of Darcy's set are ______ and ______, respectively.
So, the median number of words in Darcy's essays is 480 / 2 = _________.
Because both the mean and the median number of words are higher in Darcy's essays than in Luis's essays, I can
conclude that ______________ essays generally contain more words than _____________ essays.
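The mean and median calculations above can be checked with a short Python sketch. It uses the word counts from the table; the variable names (`luis`, `darcy`) and the choice of Python's standard `statistics` module are assumptions made for this illustration and are not part of the original notes.

```python
from statistics import mean, median

# Word counts from the table (10 essays each).
luis = [178, 213, 198, 245, 236, 198, 221, 253, 189, 177]
darcy = [231, 210, 245, 259, 286, 245, 231, 244, 236, 228]

for name, words in (("Luis", luis), ("Darcy", darcy)):
    ordered = sorted(words)
    # With 10 values, the median is the mean of the 5th and 6th ordered values.
    fifth, sixth = ordered[4], ordered[5]
    print(name, "sum:", sum(words), "mean:", mean(words))
    print(name, "middle values:", fifth, sixth, "median:", median(words))
```

Sorting first makes the two middle values easy to read off before they are averaged.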
If one data set has a higher mean or median than another data set, that suggests that the values in the
data set tend to be (higher / lower) than the values in the other data set.
Which measure of center to use:
• In many situations, the best measure of center is the mean.
• If there are ______________ (values that are significantly higher or lower than most of the data set), then the median is the measure that best summarizes the data.
• If the data is in a box plot, then the _______________ is the only measure of center known.
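To see how a single extreme value affects each measure of center, consider a small made-up data set. The values below are hypothetical and do not come from the notes. One unusually high value pulls the mean up noticeably, while the median moves only slightly.

```python
from statistics import mean, median

# Hypothetical data, invented for illustration only.
typical = [4, 5, 5, 6, 6, 7]
with_outlier = typical + [30]  # 30 is far above the rest of the values

print("without outlier: mean", mean(typical), "median", median(typical))
print("with outlier:    mean", mean(with_outlier), "median", median(with_outlier))
```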
Example 1:
The table shows the numbers of movies rented from two movie rental stores on 10 random days.
Store 1: 18, 14, 20, 22, 16, 15, 21, 20, 13, 15
Store 2: 35, 24, 28, 32, 27, 29, 34, 25, 33, 26
There are no outliers in either data set.
So, the best measure of center is the _____________.
Store ____ generally rents more movies than store _____.
Example 2:
The dot plots show the number of dogs groomed at two dog-grooming salons on nine random days.
In the data set for salon 1, the value ____ is an outlier. In the data set for salon 2, the values ____ and ____ are outliers.
So, use the ___________ to compare the data sets.
Salon 1 generally grooms (fewer / more) dogs per day than salon 2.
Luther and Nelson are seniors on the school basketball team.
The box plots represent the number of points they scored in 20 randomly selected games.
From the box plots, we get these values:
                 Nelson   Luther
minimum value    0        3
first quartile   6        8
median           14       10
third quartile   18       15
maximum value    24       18
So, we can infer that ______________ generally scores more points in a game than ______________ does.
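Because a box plot shows the median but not the individual values, this comparison rests on the medians. A minimal sketch, assuming the five-number summaries are stored in plain dictionaries (the names `nelson`, `luther`, and `players` and the dictionary keys are illustrative, not from the notes):

```python
# Five-number summaries read from the box plots.
nelson = {"min": 0, "q1": 6, "median": 14, "q3": 18, "max": 24}
luther = {"min": 3, "q1": 8, "median": 10, "q3": 15, "max": 18}

# Compare the two players by the median, the measure of center a box plot shows.
players = {"Nelson": nelson, "Luther": luther}
higher = max(players, key=lambda name: players[name]["median"])
print(higher, "has the higher median number of points per game.")
```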
Comparing Data Sets Using Measures of Variation
Measures of variation
→ describe the _________________ of the values
→ tell how much the values in the data set _________
interquartile range:
→ a (mean / median)-based measure
→ gives the spread of the middle _______% of the data
→ the difference of the _________ (third) and lower (first) quartiles:
❖ IQR = Q3 – Q1
mean absolute deviation:
→ a (mean / median)-based measure
→ represents the average distance of each value in the data set from the _________
Astrid and Warren often play dominoes with their parents.
The table shows their scores for 15 randomly picked games.
Astrid: 15, 20, 25, 15, 0, 15, 30, 40, 35, 20, 15, 25, 30, 10, 5
Warren: 0, 25, 15, 10, 20, 30, 10, 15, 5, 25, 20, 15, 20, 5, 10
To find the interquartile ranges, arrange each set of scores in ascending order.
Astrid: 0, 5, 10, 15, 15, 15, 15, 20, 20, 25, 25, 30, 30, 35, 40
Warren: 0, 5, 5, 10, 10, 10, 15, 15, 15, 20, 20, 20, 25, 25, 30
The first quartile is the ____th value, and the third quartile is the ____th value.
interquartile range = third quartile – first quartile
Astrid: interquartile range = _____ – _____ = _____
Warren: interquartile range = _____ – _____ = _____
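The quartiles and interquartile ranges can be checked with a short Python sketch. It assumes the convention these notes appear to use: the median splits the ordered data into a lower half and an upper half (the median itself is left out when the number of values is odd), and the quartiles are the medians of those halves. Other quartile conventions exist and can give slightly different values. The helper name `quartiles` is hypothetical.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    if len(values) < 2:
        raise ValueError("need at least two values to form two halves")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # values before the median position
    upper = ordered[-half:]   # values after the median position
    return median(lower), median(upper)

# Ordered scores as listed in the notes.
astrid = [0, 5, 10, 15, 15, 15, 15, 20, 20, 25, 25, 30, 30, 35, 40]
warren = [0, 5, 5, 10, 10, 10, 15, 15, 15, 20, 20, 20, 25, 25, 30]

for name, scores in (("Astrid", astrid), ("Warren", warren)):
    q1, q3 = quartiles(scores)
    print(name, "Q1:", q1, "Q3:", q3, "IQR:", q3 - q1)
```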
To find the mean absolute deviation of the scores:
mean absolute deviation = (sum of absolute deviations) / (number of values in data set)
→ Find the mean of the data set.
Astrid: 300 / 15 = _____
Warren: 225 / 15 = _____
→ Find the absolute deviation of each value by __________________ the value from the mean, and taking the absolute value.
→ Divide the sum of those absolute deviations by the number of values in the data set.
Astrid: mean absolute deviation = _____ / 15 ≈ 8.6̅
Warren: mean absolute deviation = _____ / 15 ≈ 6.6̅
The interquartile range of Astrid's data set is (greater / less) than that of Warren's data set.
The mean absolute deviation of Astrid's data set is (higher / lower) than that of Warren's data set.
These results indicate that Astrid's data is (more / less) spread out than Warren's data.
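The mean absolute deviation steps can be sketched in Python as follows. The function name `mean_absolute_deviation` is hypothetical; the scores are the ordered lists given earlier in the notes.

```python
from statistics import mean

def mean_absolute_deviation(values):
    """Average distance of each value from the mean of the data set."""
    center = mean(values)
    # Distance of each value from the mean, ignoring direction.
    deviations = [abs(v - center) for v in values]
    return sum(deviations) / len(values)

astrid = [0, 5, 10, 15, 15, 15, 15, 20, 20, 25, 25, 30, 30, 35, 40]
warren = [0, 5, 5, 10, 10, 10, 15, 15, 15, 20, 20, 20, 25, 25, 30]

for name, scores in (("Astrid", astrid), ("Warren", warren)):
    print(name, "mean:", mean(scores),
          "MAD:", round(mean_absolute_deviation(scores), 2))
```

Rounding is applied only when printing, so the function itself returns the unrounded value.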
Which measure of variation to use:
• In many cases, mean absolute deviation is the best measure to use.
→ A data set with a (higher / lower) mean absolute deviation has a greater spread.
→ If the mean absolute deviation is (higher / lower), the values are more clustered around the mean, indicating that the spread is not as great.
• If there are outliers, interquartile range is the best measure of variation to use.
→ The higher the interquartile range, the __________ spread out the data.
• If the data is in a box plot, then the (interquartile range / mean absolute deviation) is the only measure of variation that can be used.
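When only a box plot is available, the quartiles can still be read from it and subtracted. As a sketch, using the quartile values listed earlier for Nelson and Luther (the dictionary layout and the names `box_plot_quartiles` and `iqr` are assumptions made for illustration):

```python
# First and third quartiles read from the box plots earlier in the notes.
box_plot_quartiles = {
    "Nelson": {"q1": 6, "q3": 18},
    "Luther": {"q1": 8, "q3": 15},
}

# IQR = Q3 - Q1 for each player.
iqr = {name: q["q3"] - q["q1"] for name, q in box_plot_quartiles.items()}
print(iqr)
```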
Summary
What is a possible explanation for why median and interquartile range are the preferred measures of center
and variation when a data set contains outliers?