Unit 4: Describing Data
Central Tendency
S.ID.2: Use statistics appropriate to the shape of the data distribution to compare center and spread of two or more different data sets.
Essential Question:
How do I find the mean, median, mode, and range in a numerical set of data?
Vocabulary:
Mean: the average value of a numerical set of data. The symbol is x̄, which is read as “x-bar”.
Median: the middle number of a numerical set of data when written in order. If the data set has an even number of values, the median is the mean of the two middle values. Also known as the second quartile (Q2).
Mode: the value that occurs most frequently in a data set. There can be one mode or many. There can also be no mode in a data set.
Range: in a numerical set of data, the difference between the highest value and the lowest value.
Measure of Dispersion: describes the dispersion, or spread, of data.
Deviation from the Mean: the difference between a data value and the mean of the data set.
Example 1:
TEST SCORES: The test scores received by students on a history exam are listed below.
Find the mean, median, mode, and range:
65, 68, 71, 77, 81, 82, 86, 88, 93, 93, 95, and 97
MEAN =
MEDIAN =
MODE =
RANGE =
65, 68, 71, 77, 81, 82, 86, 88, 93, 93, 95, and 97
MEAN = 83
MEDIAN = 84
MODE = 93
RANGE = 32
The mean and median BEST represent the data given.
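The four measures in Example 1 can also be checked with a short Python sketch. The function name `describe` and the rule "no mode when no value repeats" are assumptions made for this illustration, not something the notes prescribe.

```python
from collections import Counter
from statistics import median

scores = [65, 68, 71, 77, 81, 82, 86, 88, 93, 93, 95, 97]

def describe(data):
    """Return the mean, median, mode(s), and range of a list of numbers."""
    ordered = sorted(data)
    counts = Counter(ordered)
    top = max(counts.values())
    # Assumed convention: if no value repeats, the data set has no mode.
    modes = sorted(v for v, c in counts.items() if c == top) if top > 1 else []
    return {
        "mean": sum(ordered) / len(ordered),
        "median": median(ordered),
        "modes": modes,
        "range": ordered[-1] - ordered[0],
    }

summary = describe(scores)
print(summary)
```

For the test scores this should agree with the answers above: mean 83, median 84, mode 93, range 32.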
Class work:
Use the red Math 1 books that are under your desk.
Pg. 365, #’s 1-4 (find the mean, median, mode, and range)
Lastly, do # 15 a, b, and c
Answers:
1. Mean = 5 Median = 5 Modes = 1, 5 Range = 10
2. Mean = 19, Med = 22, Mode = 25, Range = 18
3. Mean = 16, Med = 13.5, Modes = 8, 28 Range = 22
4. Mean = 2.8, Med = 2.8, Mode = none Range=3.1
15a. 24
15b. Mean = 41.25, Med = 41.5, Mode = none
15c. Both mean and median represent the data well – they are both close to all data points.
Homework:
Worksheet
HW Answers:
1. Mean: 58; Median: 59; Mode: none; Range: 85
2. Mean: 56; Median: 55; Mode: none; Range: 78
3. Mean: 45; Median: 36; Mode: none; Range: 85
4. Mean: 56; Median: 50; Mode: none; Range: 75
5. Mean: 61; Median: 65; Mode: none; Range: 86
6. Mean: 66; Median: 79.5; Modes: 80, 81; Range: 82
7. Mean: 45; Median: 41; Mode: none; Range: 87
8. Mean: 75; Median: 80; Mode: none; Range: 81
9. Mean: 48; Median: 43.5; Mode: none; Range: 85
10. Mean: 55; Median: 53; Mode: 96; Range: 83
Essential Question:
How do I find the mean absolute deviation in a numerical set of data?
Standard: S.ID.2
Finding the mean absolute deviation:
1. Find the mean of the data set.
2. Subtract the mean from each individual piece of data. Take the absolute value of each answer (MAKE IT POSITIVE!).
3. Add all of those numbers together and divide by the total number of data entries. Round to the nearest hundredth.
4. This value is the mean absolute deviation.
Example 2:
Find the mean absolute deviation of the data set: 67, 69, 69, 71, 74, 76
MEAN =
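As a way to check your work on Example 2, the steps for finding the mean absolute deviation can be sketched in Python. The function name `mean_absolute_deviation` is chosen here only for illustration.

```python
def mean_absolute_deviation(data):
    """Follow the steps from the notes: mean, absolute deviations, average."""
    # Step 1: find the mean of the data set.
    m = sum(data) / len(data)
    # Step 2: subtract the mean from each value and take the absolute value.
    deviations = [abs(x - m) for x in data]
    # Step 3: average the deviations and round to the nearest hundredth.
    return round(sum(deviations) / len(data), 2)

values = [67, 69, 69, 71, 74, 76]
mad = mean_absolute_deviation(values)
print(mad)  # 2.67
```

Here the mean is 71, the absolute deviations are 4, 2, 2, 0, 3, and 5, and their average is 16 / 6, which rounds to 2.67.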
Class work:
Textbook pg. 365, #’s 9-14 (just find the mean absolute deviation)
9. 1.68
10. 2.88
11. 4.67
12. 2.25
13. 2.44
14. 0.6
HW: Worksheet
Essential Questions:
How do I find the lower quartile (Q1), the upper quartile (Q3), and the interquartile range?
How do I compare summary statistics?
Vocabulary:
Second quartile (Q2): the median of an ordered data set
Upper quartile (Q3): the median of the upper half of an ordered data set
Lower quartile (Q1): the median of the lower half of an ordered data set
Interquartile Range: the difference between the upper quartile and the lower quartile of an ordered data set (Q3 - Q1)
Example 1:
The data sets below give the number of home runs by each player on the Bears and the Wildcats during a season of the Oakmont Baseball League. Compare the data using the mean, median, range, and interquartile range.
Bears: 28, 25, 21, 19, 18, 14, 10, 8, 7, 5, 3, 2
Wildcats: 20, 19, 18, 16, 15, 15, 12, 11, 9, 8, 6, 5, 4
Solution:
Bears:
Mean: 13.33
Median: 12
Range: 26
Lower Quartile: 6
Upper Quartile: 20
Interquartile Range: 14
Wildcats:
Mean: 12.15
Median: 12
Range: 16
Lower Quartile: 7
Upper Quartile: 17
Interquartile Range: 10
The Bears’ mean is greater than the Wildcats’ mean, so the Bears averaged more home runs per player than the Wildcats.
The Wildcats’ range is less than the Bears’ range, so the Wildcats’ data are less spread out than the Bears’ data.
The Wildcats’ interquartile range is less than the Bears’ interquartile range, so the middle 50% of the Wildcats’ data showed less variation than the middle 50% of the Bears’ data.
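The quartile values in this example can be reproduced with a small Python sketch. It assumes the "median of each half" method from the vocabulary, leaving the middle value out of both halves when the number of values is odd; that choice matches the Wildcats' answers above. The function name `quartiles` is hypothetical.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-each-half method."""
    if len(data) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]                   # lower half of the data
    upper = ordered[len(ordered) - half:]    # upper half (middle value skipped if odd)
    return median(lower), median(ordered), median(upper)

bears = [28, 25, 21, 19, 18, 14, 10, 8, 7, 5, 3, 2]
wildcats = [20, 19, 18, 16, 15, 15, 12, 11, 9, 8, 6, 5, 4]

for name, team in (("Bears", bears), ("Wildcats", wildcats)):
    q1, q2, q3 = quartiles(team)
    print(name, "Q1 =", q1, "Median =", q2, "Q3 =", q3, "IQR =", q3 - q1,
          "Range =", max(team) - min(team))
```

Other quartile conventions exist (calculators and spreadsheets may interpolate), so small differences from this method are possible on other data sets.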
Solution:
Average Mean = (10.5 + 11.6 + 13.4) / 3 = 11.83
Average median = 12.17
Average Range: 25.33
Average Interquartile Range: 15.33
What does the data mean??
Class work:
Red Mathematics 1 Textbook: pg. 371, #’s 1-4, 6, 8-9
Answers:
1. Mean= 8 Med = 8 R = 9 LQ = 5.5 UQ = 11 IQR= 5.5
2. Mean= 60.3 Med = 61 R = 55 LQ = 49 UQ = 71 IQR= 22
3. Mean= 4.7 Med = 4.9 R = 5.2 LQ = 3.1 UQ = 6.3 IQR= 3.2
4. Mean= 115 Med = 110 R = 119 LQ = 77 UQ = 148.5 IQR= 71.5
6. Answers may vary
8. Avg. Mean = 86.6
Avg. Median = 87.4
Avg. Range = 34.6
Avg. LQ = 80.2
Avg. UQ = 92.4
Avg. IQR = 12.2
The avg. mean is greater than the population mean. The avg. median, range, and IQR are less than the population values.
9. Avg. Mean = increases to 87.3
Avg. Median = increases to 88.75 and is now greater than the population median.
Avg. Range = decreases
Avg. IQR = decreases
Central Tendency Lab Instructions:
Form groups containing 5-6 people per group (move desks accordingly).
Measure the height of each group member in inches with a tape measure or ruler (I will provide those).
Record those heights (including yours) in the table provided on problem # 3.
Complete the worksheet front and back.
Each student will be turning in a completed worksheet.
Lab sheet answers:
Essential Questions:
How do I create a dot plot from a numerical data set?
How do I create a box and whisker plot from a numerical data set?
Discrete data - data that has only a finite number of values.
Dot plot - a graph that shows how discrete data are distributed using a number line.
Distribution - the way in which the data is spread out or clustered together.
Symmetric - the left and right halves of the graph are nearly mirror images of each other.
Skewed right - the peak of the data is to the left side of the graph.
Skewed left - the peak of the data is to the right side of the graph.
Twenty high school students were randomly selected from a very large high school. They were asked to keep a record for a week of the number of hours they slept each night.
These seven values were averaged to obtain an average night of sleep for each. The results are as follows:
9, 8, 8, 7.5, 6, 6, 4, 5.5, 7, 8, 5, 7.5, 6.5, 10, 8.5, 6.5, 5, 5.5, 7, and 7.5 hours.
Create a dotplot of these data and discuss what the display implies about the data.
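To preview what the dot plot might look like before drawing it by hand, here is a rough text version in Python. It is only a sketch: the number line runs down the page instead of across, and the tick spacing of 0.5 is an assumption based on the sleep data being recorded to the nearest half hour.

```python
from collections import Counter

hours = [9, 8, 8, 7.5, 6, 6, 4, 5.5, 7, 8, 5, 7.5, 6.5, 10, 8.5, 6.5, 5, 5.5, 7, 7.5]

def dot_plot(data, step=0.5):
    """Build a text dot plot: one row per tick on the number line."""
    counts = Counter(data)
    low = min(data)
    ticks = int(round((max(data) - low) / step))
    lines = []
    for i in range(ticks + 1):
        value = low + i * step
        # One '*' per data value at this tick; empty ticks are still shown.
        lines.append(f"{value:4.1f} | " + "*" * counts[value])
    return "\n".join(lines)

print(dot_plot(hours))
```

Rows with more stars mark where the data cluster, and empty rows show gaps, which is the kind of feature to discuss when describing the display.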
The ages of the Oscar winners for best actor and best actress (at the time they won the award) from 1996 to 2004 are as follows:
45, 39, 59, 33, 45, 25, 42, 24, 35, 32, 46, 32, 28, 34, 42, 27, 36, and 30.
Create a dot plot of these data and discuss what the display implies about the data.
Box and whisker plot - displays the data distribution based on a five number summary.
The five number summary consists of:
minimum value
Q1 (the first quartile)
median
Q3 (the third quartile)
maximum value
Construct a box and whiskers plot for the data set:
{5, 2, 16, 9, 13, 7, 10}
Construct a box and whiskers plot for the data set:
5 9 13 9 9 14 7 3 6 8 8 6 9 8 5
Assignment:
Pg. 453-458, #'s: ALL
OMIT #'S 7 AND 8 ON PG. 456!!!!!
Answers:
Find the mean, median, mode, range, IQR, make a dot plot and a box and whisker plot for the following data set:
105, 101, 102, 99, 110, 90, 100, 100, 108, 120, 32
Mean: 97
Median: 101
Mode: 100
Range: 88
IQR: 9
32, 90, 99, 100, 100, 101, 102, 105, 108, 110, 120
Five # Summary:
Minimum: 32
Q1: 99
Median: 101
Q3: 108
Maximum: 120
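The five number summary for this data set can be double-checked with a short Python sketch. It assumes quartiles are found as the median of each half, with the middle value left out of both halves when the count is odd; `five_number_summary` is a name made up for this example.

```python
from statistics import median

def five_number_summary(data):
    """Return min, Q1, median, Q3, and max for a list of numbers."""
    ordered = sorted(data)
    half = len(ordered) // 2
    return {
        "min": ordered[0],
        "q1": median(ordered[:half]),                  # median of the lower half
        "median": median(ordered),
        "q3": median(ordered[len(ordered) - half:]),   # median of the upper half
        "max": ordered[-1],
    }

data = [105, 101, 102, 99, 110, 90, 100, 100, 108, 120, 32]
summary = five_number_summary(data)
iqr = summary["q3"] - summary["q1"]
print(summary, "IQR =", iqr)
```

The box of the plot runs from Q1 to Q3 with a line at the median, and the whiskers extend to the minimum and maximum.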
Outliers
EQ: How do I find outliers and what effects do they have on the context of a data set?
Standard: S.ID.3 - Interpret differences in shape, center, and spread in the context of the data sets, accounting for possible effects of extreme data points (outliers).
Outlier: a data value that is significantly greater or lesser than other data values in a data set.
You are determining a lower and upper limit for the data. Any value outside of these limits is an outlier.
The lower limit is called the "lower fence".
The upper limit is called the "upper fence".
Calculating the Fences:
lower fence: Q1 - (IQR x 1.5)
upper fence: Q3 + (IQR x 1.5)
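The fence formulas can be tried out in Python. This sketch uses the earlier review data set (105, 101, 102, ...), whose five number summary gave Q1 = 99 and Q3 = 108. The function names `fences` and `find_outliers` are illustrative only.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) from the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data, q1, q3):
    """List every value that falls outside the fences."""
    lower, upper = fences(q1, q3)
    return [x for x in data if x < lower or x > upper]

data = [105, 101, 102, 99, 110, 90, 100, 100, 108, 120, 32]
lower, upper = fences(99, 108)
outliers = find_outliers(data, 99, 108)
print("Lower fence:", lower, "Upper fence:", upper, "Outliers:", outliers)
```

With an IQR of 9, the fences are 85.5 and 121.5, so 32 is the only outlier in that data set.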
Example 1: Make a box and whisker plot (find the five number summary) then calculate the upper and lower fences to determine if there are any outliers in the data set.
2, 5, 6, 6, 7, 9, 10, 11, 12, 12, 14, 28, 30
Five # Summary
Minimum Value:
Q1:
Median (Q2):
Q3:
Maximum Value:
Box and Whisker Plot:
Lower Fence: Q1 - (IQR x 1.5)
Upper Fence: Q3 + (IQR x 1.5)
Are there any outliers?
If any outliers are discovered, the idea is to remove them and recalculate the measures of central tendency, which decreases the spread of your data.
Example 2: Same directions as example 1.
10, 13, 17, 20, 22, 24, 24, 27, 28, 29, 35
Five # Summary
Minimum Value:
Q1:
Median (Q2):
Q3:
Maximum Value:
Box and Whisker Plot:
Lower Fence: Q1 - (IQR x 1.5)
Upper Fence: Q3 + (IQR x 1.5)
Are there any outliers?
Example 3: Same directions as example 1.
0, 7, 17, 17, 18, 24, 24, 24, 25, 27, 45
Five # Summary
Minimum Value:
Q1:
Median (Q2):
Q3:
Maximum Value:
Box and Whisker Plot:
Lower Fence: Q1 - (IQR x 1.5)
Upper Fence: Q3 + (IQR x 1.5)
Are there any outliers?
Assignment:
Pg. 467, #'s 1-4 AND pg. 468-473, #'s 1-12
HW answers:
1. b 2. d 3. a 4. c
2. IQR = 10 LF: 1 UF: 41
0 is an outlier
3. IQR = 12 LF: 10 UF: 58
9 and 59 are outliers
4. IQR = 11 LF: 15.5 UF: 59.5
no outliers
5. IQR = 6.5 LF: 8.75 UF: 34.75
8 is an outlier
6. IQR = 22.5 LF: 21.25 UF: 111.25
15, 20, and 115 are outliers
8. IQR = 4 LF: 10 UF: 26 no outliers
9. IQR = 15 LF: 22.5 UF: 82.5
At least 1 outlier on the lower side and at least 1 outlier on the upper side.
10. IQR = 5 LF: -0.5 UF: 19.5
At least 1 outlier greater than the upper fence.
11. IQR = 200 LF: 50 UF: 850
At least 1 outlier less than the lower fence.
12. IQR = 6 LF: -5 UF: 19
No outliers.
Class work assignment:
Pg. 103-105 ALL (thin books)
**problem 1 d. wants you to reconstruct the box and whisker plot by removing the outliers you found and then recalculating your five number summary to create a new box and whisker plot.
Answers:
EQ: How do I use and analyze histograms?
Histogram: a graphical way to display quantitative data using vertical bars.
Bin: an interval of data along the horizontal axis of a histogram
Frequency: the height of each bar (vertical axis), which is the number of data values included in each bin
Bin Intervals: bin intervals can be written as compound inequalities.
For example: 0, 5, 10, 15, 20, 25 across x-axis
Data points: 2, 4, 5, 6, 15, 23, 10, 11, 14, 12, 24
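Sorting the data points above into bins can be sketched in Python. The notes do not say which end of a bin includes the boundary value, so this sketch assumes each bin is the compound inequality low <= x < high (for example, 5 goes in the bin 5 <= x < 10). The function name `bin_counts` is hypothetical.

```python
def bin_counts(data, start, stop, width):
    """Count how many values fall in each bin low <= x < low + width."""
    edges = list(range(start, stop, width))
    counts = {(low, low + width): 0 for low in edges}
    for x in data:
        for low in edges:
            if low <= x < low + width:
                counts[(low, low + width)] += 1
                break
    return counts

points = [2, 4, 5, 6, 15, 23, 10, 11, 14, 12, 24]
counts = bin_counts(points, 0, 25, 5)
for (low, high), n in counts.items():
    # Each '#' stands for one data value, so the bar length is the frequency.
    print(f"{low} <= x < {high}: {'#' * n}")
```

The tallest bar here is the bin 10 <= x < 15, with a frequency of 4.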
Get a Student Text Volume 2 book...
Turn to page 463 and look at problem #3.
We are going to walk through some examples together.
Please draw the histogram at the bottom of pg. 463 in your notes.
Histogram Problem
A number of students were asked to calculate their average commute time one-way to school.
Their averages (in minutes) are listed in the table below.
2.3 1.8 21.1 5.2 10.2 7.0 29.4
12.4 15.3 12.1 41.9 22.1 9.3 15.2
18.4 21.6 12.0 10.8 31.5 15.9 6.2
30.0 23.6 17.6 12.3 15.2 19.4 8.1
32.5 29.1 21.4 12.0 9.3 16.3 22.7
Create a histogram with bin intervals of 5 minutes.
How many modes are there? List all modal groups.
Describe the shape of the histogram.
Make some conclusions about driving times based on your graph.
Class work / Homework: pg. 97-100 (thin book), #'s 1-2 ALL
Answers: