Stats_lecture_2 (Statistics lecture 2 slides)

Summarizing and Displaying
Measurement Data
If a study shows that daily use of a certain
expensive exercise machine resulted in an
average loss of 10 pounds, what more would
you want to know about the numbers than
just the average?
Imagine you wanted to compare the cost of
living in two different cities. You get local
papers and write down the rental costs of 50
apartments in each place. How would you
summarize the values in order to compare
the two places?




Realize that summarizing important features
of a list of numbers gives more information
than just the unordered list.
Understand the concept of the shape of a set
of numbers.
Learn how to make stemplots and histograms
Understand summary measures like the mean
and standard deviation
170, 163, 178, 163, 168, 165, 170, 155, 191, 178,
175, 185, 183, 165, 165, 180, 185, 165, 168, 152,
178, 183, 157, 165, 183, 157, 170, 168, 163, 165,
180, 163, 140, 163, 163, 163, 165, 178, 150, 170,
165, 165, 157, 165, 173, 160, 163, 165, 178, 173,
180, 196, 185, 175, 160, 168, 193, 173, 183, 165,
163, 175, 168, 160, 208, 157, 180, 170, 155, 173,
178, 170, 157, 163, 163, 180, 170, 165, 170, 170,
180, 168, 155, 175, 168, 147, 191, 178, 173, 170,
178, 185, 152, 170, 175, 178, 163, 175, 175, 165,
175, 175, 157, 163, 165, 160, 178, 152, 160, 170,
170, 160, 157
208, 196, 193, 191, 191, 185, 185, 185, 185, 183,
183, 183, 183, 180, 180, 180, 180, 180, 180, 178,
178, 178, 178, 178, 178, 178, 178, 178, 178, 175,
175, 175, 175, 175, 175, 175, 175, 175, 173, 173,
173, 173, 173, 170, 170, 170, 170, 170, 170, 170,
170, 170, 170, 170, 170, 170, 168, 168, 168, 168,
168, 168, 168, 165, 165, 165, 165, 165, 165, 165,
165, 165, 165, 165, 165, 165, 165, 165, 163, 163,
163, 163, 163, 163, 163, 163, 163, 163, 163, 163,
163, 160, 160, 160, 160, 160, 160, 157, 157, 157,
157, 157, 157, 157, 155, 155, 155, 152, 152, 152,
150, 147, 140
The Center
The Variability
The Shape




Mean (average): Total of the values, divided
by the number of values
Median: The middle value of an ordered list of
values
Mode: The most common value
Outliers: Atypical values far from the center
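As a small illustration of these definitions, the sketch below uses Python's standard `statistics` module on nine values (the same ones that appear in the range figure in these slides). The variable name `values` is just a choice for this example.

```python
from statistics import mean, median, multimode

# Nine illustrative values, taken from the range figure in these slides.
values = [75, 80, 90, 90, 100, 110, 110, 120, 125]

print("Mean:  ", mean(values))       # total of the values / number of values
print("Median:", median(values))     # middle value of the ordered list
print("Modes: ", multimode(values))  # most common value(s); ties are all reported
```

Here two values are tied for most common, so `multimode` reports both rather than picking one.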




Average: $2,827,104
Median: $950,000
Mode: $327,000 (also the minimum)
Outlier: $21.7 million (Alex Rodriguez of the
NY Yankees)
Some measures of variability:
 Maximum and minimum: Largest and
smallest values
 Range: The distance between the largest and
smallest values
 Quartiles: The medians of each half of the
ordered list of values
 Standard deviation: Think of it as the average
distance of all the values from the mean.
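The measures above can be sketched in a few lines of Python. The helper name `quartiles` is hypothetical, and it assumes one particular convention: when the number of values is odd, the middle value is counted in both halves. Other conventions leave it out and can give slightly different quartiles.

```python
from statistics import median

def quartiles(values):
    """Return (lower quartile, median, upper quartile) as medians of each half.

    Assumption: when the count is odd, the middle value belongs to both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = (n + 1) // 2                      # size of each half
    lower, upper = ordered[:half], ordered[n - half:]
    return median(lower), median(ordered), median(upper)

values = [75, 80, 90, 90, 100, 110, 110, 120, 125]
q1, med, q3 = quartiles(values)
print("Minimum:", min(values), "Maximum:", max(values))
print("Range:", max(values) - min(values))
print("Quartiles:", q1, med, q3)
print("Interquartile range:", q3 - q1)
```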



Don’t consider the average to be “normal”
Variability is normal
Anything within about 3 standard deviations
of the mean is “normal”
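One way to apply this rule of thumb is to express each value's distance from the mean in standard deviations. The sketch below is only an illustration; the function name `distances_in_sd` is made up for this example, and the values are the nine from the range figure in these slides.

```python
from statistics import mean, stdev

def distances_in_sd(values):
    """Return how many standard deviations each value lies from the mean."""
    m, s = mean(values), stdev(values)   # stdev divides by n - 1
    return [(v - m) / s for v in values]

values = [75, 80, 90, 90, 100, 110, 110, 120, 125]
for v, z in zip(values, distances_in_sd(values)):
    flag = "" if abs(z) <= 3 else "  <- more than 3 SDs from the mean"
    print(f"{v:>4}  {z:+.2f}{flag}")
```

For these values nothing is flagged: every value lies within 3 standard deviations of the mean.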
Range: from 75 (lowest) to 125 (highest)
Interquartile range: from 90 (lower quartile) to 110 (upper quartile)

125  Highest
120
110  Upper quartile
110
100  Median
90
90   Lower quartile
80
75   Lowest

Data: 90, 90, 100, 110, 110
Mean: 100
Deviations from mean: -10, -10, 0, 10, 10
Devs squared: 100, 100, 0, 100, 100
Sum of squared devs: 400
Sum of sq devs/(n-1): 400/4=100 (variance)
Square root of variance: 10
Therefore, the standard deviation is 10

Data: 50, 60, 100, 140, 150
Mean: 100
Deviations from mean: -50, -40, 0, 40, 50
Devs squared: 2500, 1600, 0, 1600, 2500
Sum of squared devs: 8200
Sum of sq devs/(n-1): 8200/4=2050 (variance)
Square root of variance: 45.3
Therefore, the standard deviation is 45.3
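The two worked examples follow the same recipe, which can be written as a short Python function. This is a minimal sketch; the function name `standard_deviation` is chosen for this example, and it divides by n - 1 as the slides do.

```python
from math import sqrt

def standard_deviation(values):
    """Sample standard deviation, following the steps on the slides."""
    n = len(values)
    mean = sum(values) / n                      # mean
    deviations = [v - mean for v in values]     # deviations from the mean
    squared = [d ** 2 for d in deviations]      # squared deviations
    variance = sum(squared) / (n - 1)           # sum of squared devs / (n - 1)
    return sqrt(variance)                       # square root of the variance

print(standard_deviation([90, 90, 100, 110, 110]))            # 10.0
print(round(standard_deviation([50, 60, 100, 140, 150]), 1))  # 45.3
```

Both lists have the same mean (100), but the second is far more spread out, and the standard deviation reflects that.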
The shape of a list of values will tell you
important things about how the values are
distributed.
To visualize the shape of a list of values, plot
them using:
◦ a stemplot (also called stem-and-leaf)
◦ a histogram
◦ or a smooth line (next lecture)


Divide the range into equal units, so that the
first few digits can be used as the stems.
(Ideally, 6-15 stems.)
Attach a leaf, made of the next digit, to
represent each data point. (Ignore any
remaining digits.)
Ages in years:
42.2, 22.7, 21.2, 65.4, 29.3, 22.3, 21.5, 20.7, 29.4, 23.1,
22.9, 21.5, 21.4, 21.3, 21.3, 21.2, 21.2, 21.1, 20.8, 30.2,
25.7, 24.5, 23.2, 22.3, 22.2, 22.2, 22.2, 22.1, 21.9, 21.8,
21.7, 21.7, 21.6, 21.4, 21.3, 21.2, 21.2, 21.2, 21.2, 21.2,
21.1, 21.1, 20.8, 20.7, 20.7, 20.1, 20.0, 19.5, 35.8, 26.1,
22.3, 22.2, 21.8, 21.5, 20.4, 47.5, 45.5, 30.6, 28.1, 27.4,
26.5, 24.1, 23.3, 23.3, 22.9, 22.9, 22.6, 22.4, 22.4, 22.3,
22.3, 22.0, 21.9, 21.9, 21.8, 21.7, 21.7, 21.7, 21.6, 21.6,
21.6, 21.5, 21.5, 21.5, 21.4, 21.2, 21.2, 21.2, 21.1, 21.1,
21.0, 20.9, 20.9, 20.8, 20.8, 20.8, 20.8, 20.8, 20.6, 20.6,
20.6, 20.5, 20.5, 20.5, 20.5, 20.4, 20.4, 20.3, 20.2, 19.9,
19.6, 63.2, 55.0
19 |
20 |
21 |
22 |
23 |

19 | 5
20 | 0123444
21 | 0111112222222222
22 | 01222
23 | 12
19 | 569
20 | 01234445555666777888888899
21 | 011111222222222223334445555556666777778889999
22 | 012222333334467999
23 | 1233
24 | 15
25 | 7
26 | 15
27 | 4
28 | 1
29 | 34
30 | 26
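A stemplot like the one above can be produced with a short Python sketch. The function name `stemplot` is hypothetical; it assumes the stem is the whole-number part and the leaf is the tenths digit, and it is shown on just ten of the ages from the list above.

```python
from collections import defaultdict
from math import floor

def stemplot(values):
    """Return stemplot lines: stem = whole number, leaf = tenths digit."""
    leaves = defaultdict(list)
    for v in values:
        tenths = floor(round(v * 10, 6))   # drop any digits past the tenths place
        stem, leaf = divmod(tenths, 10)
        leaves[stem].append(leaf)
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):   # keep empty stems too
        row = "".join(str(d) for d in sorted(leaves.get(stem, [])))
        lines.append(f"{stem} | {row}".rstrip())
    return lines

ages = [26.5, 24.1, 23.3, 23.3, 22.9, 22.9, 22.6, 22.4, 22.4, 22.3]
print("\n".join(stemplot(ages)))
```

Empty stems (here, 25) are kept on purpose so that gaps in the data stay visible in the shape.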
2| (20-24)
2| (25-29)
3| (30-34)
3| (35-39)
4| (40-44)
4| (45-49)
5| (50-54)
2|00000000000000111111111111111111111111111111111111111111222222222222222222222222222223333333334
2|56677899
3|01
3|6
4|2
4|57
5|
5|5
6|3
6|5




Shows the shape of a set of values, similar to
a stemplot
More useful for large data sets because you
don’t have to enter every value
X-axis: Range of possible values
Y-axis: The count of each possible value
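The counting behind a histogram can be sketched as follows. This example groups values into intervals of equal width, which is an assumption of the sketch (the slide counts each possible value separately); the function name `histogram_counts` is made up, and the data are the first ten values of the list at the start of these slides.

```python
from collections import Counter

def histogram_counts(values, width):
    """Count how many values fall in each interval [start, start + width)."""
    counts = Counter((v // width) * width for v in values)
    starts = range(min(counts), max(counts) + width, width)
    return {start: counts.get(start, 0) for start in starts}   # keep empty intervals

width = 10
heights = [170, 163, 178, 163, 168, 165, 170, 155, 191, 178]
for start, count in histogram_counts(heights, width).items():
    print(f"{start}-{start + width - 1}: {'#' * count}")
```

Each printed row plays the role of one bar: the label is the x-axis interval and the number of `#` marks is the count on the y-axis.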
[Figure: histogram; x-axis: inches (55 to 73), y-axis: count of inches (0 to 16)]
[Figure: histogram of Q1 (55 to 89) with separate series for Q3 (Female, Male); y-axis: count of Q3 (0 to 16)]
Range: 75 (lowest) to 125 (highest); interquartile range: 90 (lower quartile) to 110 (upper quartile); median: 100
[Figure: plot of these values with the lowest value, lower quartile, median, upper quartile, and highest value labeled]





Lowest: 140
First quartile: 163
Median: 168
Third quartile: 178
Highest: 208

Women: 140, 150, 152, 152, 155, 155, 155, 157, 157, 157, 157,
157, 157, 157, 160, 160, 160, 160, 160, 160, 163, 163,
163, 163, 163, 163, 163, 163, 163, 163, 163, 163, 165,
165, 165, 165, 165, 165, 165, 165, 165, 165, 165, 165,
165, 165, 168, 168, 168, 168, 168, 168, 168, 168, 168,
168, 168, 170, 170, 170, 170, 170, 170, 170, 170, 170,
170, 173, 173, 173, 173, 175, 175, 175, 175, 175, 175,
178, 178, 180, 180, 180, 208

Men: 147, 152, 163, 165, 168, 170, 170, 170, 173,
175, 175, 175, 178, 178, 178, 178, 178, 178, 178, 178,
180, 180, 180, 183, 183, 183, 183, 185, 185, 185, 185,
191, 191, 193, 196
                Women  Men
Lowest          140    147
First quartile  163    174
Median          165    178
Third quartile  170    183
Highest         208    196


Presidents: 67, 90, 83, 85, 73, 80, 78, 79, 68,
71, 53, 65, 74, 64, 77, 56, 66, 63, 70, 49,
56, 71, 67, 71, 58, 60, 72, 67, 57, 60, 90,
63, 88, 78, 46, 64, 81, 93
Vice-Presidents: 90, 83, 80, 73, 70, 51, 68,
79, 70, 71, 72, 74, 67, 54, 81, 66, 62, 63,
68, 57, 66, 96, 78, 55, 60, 66, 57, 71, 60,
85, 76, 98, 77, 88, 78, 81, 64, 66, 70
                Presidents  Vice-Presidents
Lowest age      46          51
Lower quartile  63          64
Median age      69          70
Upper quartile  78          79
Highest age     93          98
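The Presidents column of this summary can be reproduced from the listed ages with a short Python sketch. The function name `five_number_summary` is hypothetical; quartiles are taken as the medians of each half of the ordered list, and because there are 38 ages (an even count) the halves split cleanly.

```python
from statistics import median

def five_number_summary(values):
    """Return (lowest, lower quartile, median, upper quartile, highest)."""
    ordered = sorted(values)
    n = len(ordered)
    half = (n + 1) // 2      # if n is odd, the middle value sits in both halves
    lower, upper = ordered[:half], ordered[n - half:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

presidents = [67, 90, 83, 85, 73, 80, 78, 79, 68,
              71, 53, 65, 74, 64, 77, 56, 66, 63, 70, 49,
              56, 71, 67, 71, 58, 60, 72, 67, 57, 60, 90,
              63, 88, 78, 46, 64, 81, 93]

print(five_number_summary(presidents))
```

Running the same function on two groups and lining up the results, as the table does, is what makes the comparison easy to read.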