Stat 104 – Lecture 5
Display of Numerical Data
• Histogram
– A picture of the distribution of the data.
– Collects values into classes.
– Classes should be of equal width.
– Different class choices can yield different pictures.
Histogram
[Figure: example histogram; x-axis: Measurement; y-axis: Frequency.]
Constructing a Histogram
• Order data from smallest to largest using a stem and leaf display.
• Determine classes.
– equal width
– more data calls for more classes
Split Stem
Body Mass (kg) of Canidae
0 | 1,3,3,3,4,4,4
0*| 5,5,5,5,5,6,6,6,7,8,9,9
1 | 0,0,1,2,3
1*|
2 | 2,3
2*| 5
3 |
3*| 6
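
As a rough illustration of how a split-stem display like the one above could be produced, the sketch below assumes whole-number data, uses the tens digit as the stem, and puts leaves 0 to 4 on the plain stem and leaves 5 to 9 on the starred stem. The function name `split_stem_display` is made up for this example, and the data list is read off the display.

```python
from collections import defaultdict

# Canidae body masses (kg), read off the split-stem display above.
canidae = [1, 3, 3, 3, 4, 4, 4,
           5, 5, 5, 5, 5, 6, 6, 6, 7, 8, 9, 9,
           10, 10, 11, 12, 13,
           22, 23, 25, 36]

def split_stem_display(values):
    """Return the lines of a split stem-and-leaf display.

    Hypothetical helper: stem = tens digit, leaf = ones digit.
    Leaves 0-4 go on the plain stem, leaves 5-9 on the starred stem.
    """
    rows = defaultdict(list)
    for v in sorted(values):          # ordering the data is the first step
        stem, leaf = divmod(v, 10)
        rows[(stem, leaf >= 5)].append(leaf)

    lines = []
    for stem in range(max(values) // 10 + 1):
        for upper in (False, True):
            label = f"{stem}*" if upper else f"{stem} "
            leaves = ",".join(str(leaf) for leaf in rows.get((stem, upper), []))
            # Empty stems are still printed so gaps in the data stay visible.
            lines.append(f"{label}| {leaves}".rstrip())
    return lines

for line in split_stem_display(canidae):
    print(line)
```

Printing the empty stems matters: they show the gap between 13 kg and 22 kg that the histogram classes will also reveal.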
Class                    Freq
 0 ≤ Body Mass <  5        7
 5 ≤ Body Mass < 10       12
10 ≤ Body Mass < 15        5
15 ≤ Body Mass < 20        0
20 ≤ Body Mass < 25        2
25 ≤ Body Mass < 30        1
30 ≤ Body Mass < 35        0
35 ≤ Body Mass < 40        1
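
The class frequencies can be checked with a short script. This is only a sketch: it assumes the 28 Canidae values read off the split-stem display and the equal-width classes of 5 kg used in the table, with each class including its lower limit and excluding its upper limit.

```python
# Canidae body masses (kg), read off the split-stem display.
canidae = [1, 3, 3, 3, 4, 4, 4,
           5, 5, 5, 5, 5, 6, 6, 6, 7, 8, 9, 9,
           10, 10, 11, 12, 13,
           22, 23, 25, 36]

width = 5  # equal class width, in kg

# Count how many values fall in each class: lower <= mass < upper.
counts = {}
for lower in range(0, 40, width):
    upper = lower + width
    counts[(lower, upper)] = sum(lower <= m < upper for m in canidae)

for (lower, upper), freq in counts.items():
    print(f"{lower:>2} <= Body Mass < {upper:<2}  {freq:>2}")
```

Changing `width` (for example to 10) is a quick way to see that different class choices give different pictures of the same data.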
Histogram
[Figure: histogram of Body Mass (kg), panel title "Distributions"; x-axis: Body Mass (kg); y-axis: Count.]
Shape
• Symmetry (mirror image)
– Mounded, flat
• Skew (mounded on one side, with a longer tail on the other)
– Tail toward higher values (skewed right)
– Tail toward lower values (skewed left)
• Other
– Multiple peaks, outliers
Symmetric & Mounded
[Figure: Histogram of Octane Rating; x-axis: Octane (86 to 96); y-axis: Frequency (0 to 10).]
Skewed to Right
[Figure: histogram of pH of Pork Loins; x-axis: pH (5.0 to 7.0); y-axis: Frequency (0 to 80).]
Skewed to Left
[Figure: histogram of Flexibility Index of Young Adult Men; x-axis: Flexibility Index (1 to 10); y-axis: Frequency (0 to 20).]
Multiple Peaks
[Figure: histogram of Size of Diamonds (carats); x-axis: Size (carats), 0.1 to 0.4; y-axis: Frequency (0 to 15).]
Summarizing Numerical Data
• What is a “typical” value?
• Look for the center of the distribution.
• What do we mean by “center”?
Summary Measures
• Central Tendency
– Sample midrange
– Sample median
– Sample mean
Measures of Center
• Sample Midrange
– Average of the minimum and the maximum.
– Body mass of Canidae: (1 + 36)/2 = 18.5 kilograms
– Greatly affected by outliers.
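
A minimal sketch of the midrange calculation, assuming the Canidae values read off the split-stem display. The helper name `midrange` is chosen for this example. The second call drops the single largest value to show how strongly one extreme observation moves the result.

```python
# Canidae body masses (kg), read off the split-stem display.
canidae = [1, 3, 3, 3, 4, 4, 4,
           5, 5, 5, 5, 5, 6, 6, 6, 7, 8, 9, 9,
           10, 10, 11, 12, 13,
           22, 23, 25, 36]

def midrange(values):
    """Average of the smallest and largest values."""
    return (min(values) + max(values)) / 2

full = midrange(canidae)                      # (1 + 36) / 2
without_largest = midrange(sorted(canidae)[:-1])  # (1 + 25) / 2

print(full)             # 18.5
print(without_largest)  # 13.0
```

Removing one value out of 28 shifts the midrange by 5.5 kg, which is what "greatly affected by outliers" means in practice.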
Measures of Center
• Sample Median
– A value that divides the data into a lower half and an upper half.
– About half the data values are greater than the median and about half are less than the median.
Sample Median (n even)
Body Mass (kg) of Canidae
0 | 1,3,3,3,4,4,4
0*| 5,5,5,5,5,6,6,6,7,8,9,9
1 | 0,0,1,2,3
1*|
2 | 2,3
2*| 5
3 |
3*| 6
Median = (6 + 6)/2 = 6 kilograms
Sample Median (n odd)
Body Mass (kg) of Felidae
 0 | 2,2,2,3,3,3,4,4,4,4,4,5,5,5,5,7,8
 1 | 0,0,1,1,1,2,3,7
 2 | 1
 3 | 6
 4 | 0,7
 5 | 5
 9 | 6
16 | 2
17 | 8
Median = 8 kilograms
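
The two median cases can be sketched in a few lines. The data lists are read off the two stem-and-leaf displays (28 Canidae values, 33 Felidae values), and the function name `sample_median` is chosen for this example. For even n the two middle values are averaged; for odd n the single middle value is returned.

```python
# Body masses (kg), read off the stem-and-leaf displays.
canidae = [1, 3, 3, 3, 4, 4, 4,
           5, 5, 5, 5, 5, 6, 6, 6, 7, 8, 9, 9,
           10, 10, 11, 12, 13,
           22, 23, 25, 36]
felidae = [2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 7, 8,
           10, 10, 11, 11, 11, 12, 13, 17,
           21, 36, 40, 47, 55, 96, 162, 178]

def sample_median(values):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # n odd: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # n even: average the middle pair

print(len(canidae), sample_median(canidae))  # n = 28 (even): (6 + 6)/2
print(len(felidae), sample_median(felidae))  # n = 33 (odd): the 17th ordered value
```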
Measures of Center
• Formula for the sample mean
$\bar{y} = \dfrac{\text{Total}}{n} = \dfrac{\sum y_i}{n}$
Sample Mean
• Body mass of Canidae
• Total = 260
• n = 28
$\bar{y} = \dfrac{\text{Total}}{n} = \dfrac{260}{28} \approx 9.3$ kg
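
The same arithmetic as a short sketch, assuming the Canidae values read off the split-stem display: add up the values, count them, and divide.

```python
# Canidae body masses (kg), read off the split-stem display.
canidae = [1, 3, 3, 3, 4, 4, 4,
           5, 5, 5, 5, 5, 6, 6, 6, 7, 8, 9, 9,
           10, 10, 11, 12, 13,
           22, 23, 25, 36]

total = sum(canidae)   # Total = sum of all y_i
n = len(canidae)       # sample size
mean = total / n       # sample mean = Total / n

print(total, n, round(mean, 1))  # 260 28 9.3
```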
What does each measure?
• The sample midrange is midway between the smallest and largest values.
• The sample median divides the distribution into a lower and an upper half.
• The sample mean is the balance point of the distribution.
Which summary is “best”?
• For symmetric shapes the sample mean is most informative.
• For skewed shapes the sample median is better because it is less affected by outliers.
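
To see the difference on skewed data, the sketch below compares the mean and the median for the Felidae body masses, whose stem-and-leaf display has a long tail toward large values. The data list is read off that display, and the calculation uses Python's standard `statistics` module.

```python
import statistics

# Felidae body masses (kg), read off the stem-and-leaf display.
felidae = [2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 7, 8,
           10, 10, 11, 11, 11, 12, 13, 17,
           21, 36, 40, 47, 55, 96, 162, 178]

mean = statistics.mean(felidae)      # balance point, pulled toward the long tail
median = statistics.median(felidae)  # middle value, unaffected by how large the tail values are

print(f"mean   = {mean:.1f} kg")
print(f"median = {median} kg")

# How many species are lighter than the mean?
below_mean = sum(m < mean for m in felidae)
print(f"{below_mean} of {len(felidae)} values are below the mean")
```

A handful of very large cats pulls the mean to roughly 24 kg, well above the median of 8 kg, so the median is the better description of a typical value here.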