Chapter 2
Describing Distributions with Numbers
BPS - 3rd Ed.
Numerical Summaries
• Center of distribution
– mean
– median
• Spread of distribution
– five-point summary (& interquartile range)
– standard deviation (& variance)
Mean (Arithmetic Average)
• Traditional measure of center
• Notation (“xbar”): $\bar{x}$
• Sum the values and divide by the sample size (n)
$$\bar{x} = \frac{1}{n}\left(x_1 + x_2 + \cdots + x_n\right) = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Mean
Illustrative Example: “Metabolic Rate”
Data: Metabolic rates, 7 men (cal/day):
1792 1666 1362 1614 1460 1867 1439
$$\bar{x} = \frac{1792 + 1666 + 1362 + 1614 + 1460 + 1867 + 1439}{7} = \frac{11{,}200}{7} = 1600$$
Median (M)
• Half of the ordered values are less than or equal to the median value
• Half of the ordered values are greater than or equal to the median value
• If n is odd, the median is the middle ordered value
• If n is even, the median is the average of the two middle ordered values
Median
• Example 1 data: 2 4 6
Median (M) = 4
• Example 2 data: 2 4 6 8
Median = 5 (average of 4 and 6)
• Example 3 data: 6 2 4
Median ≠ 2 (order the values: 2 4 6, so Median = 4)
Location of the Median L(M)
Location of the median: L(M) = (n+1)÷2 ,
where n = sample size.
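Example: If 25 data values are recorded, the median is located at position (25+1)/2 = 13 in the ordered array.
When n is even, L(M) ends in .5 and the median is the average of the two ordered values on either side of that position.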
Median
Illustrative Example
Data: Metabolic rates, n = 7:
1792 1666 1362 1614 1460 1867 1439
L(M) = (7 + 1) / 2 = 4
Ordered array:
1362 1439 1460 [1614] 1666 1792 1867
Value of median = 1614 (the 4th ordered value)
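A minimal Python sketch of the location rule, assuming the data fit in a list. The function name `median` is chosen here for illustration. Note that position L(M) is counted from 1, while Python indexes from 0.

```python
def median(values):
    """Median via the location rule L(M) = (n + 1) / 2 on the ordered array."""
    ordered = sorted(values)            # the rule only works on ordered data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n odd: L(M) is a whole number; 1-based position (n + 1) / 2 is index n // 2
        return ordered[mid]
    # n even: L(M) ends in .5, so average the two middle ordered values
    return (ordered[mid - 1] + ordered[mid]) / 2


rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]
print(median(rates))          # metabolic rates, n = 7
print(median([2, 4, 6, 8]))   # even n
print(median([6, 2, 4]))      # unordered input is sorted first
```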
Comparing the Mean & Median
• Mean = median when data are symmetrical
• Mean ≠ median when data are skewed or have an outlier (the mean is ‘pulled’ toward the tail), while the median is more resistant
If we switch this:
1362 1439 1460 1614 1666 1792 1867
to this:
1362 1439 1460 1614 1666 1792 9867
the median is still 1614 but the mean goes from 1600 to 2742.9
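The effect of the single changed value can be checked with Python's standard `statistics` module. This is a small sketch; the variable names are chosen here for illustration.

```python
from statistics import mean, median

original = [1362, 1439, 1460, 1614, 1666, 1792, 1867]
with_outlier = original[:-1] + [9867]   # replace the largest value, 1867, with 9867

# The mean is pulled toward the outlier; the median does not move.
print(mean(original), median(original))
print(round(mean(with_outlier), 1), median(with_outlier))
```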
Question
• The average salary at a high tech company is $250K / year
• The median salary is $60K.
• How can this be?
• Answer: There are some very highly paid executives, but most of the workers make modest salaries
Spread = Variability
• Variability: the amount values spread above and below the center
• Can be measured in several ways:
– range (rarely used)
– 5-point summary & inter-quartile range
– variance and standard deviation
Range
• Based on smallest (minimum) and largest (maximum) values in the data set:
Range = max − min
• The range is not a reliable measure of spread (affected by outliers, biased)
Quartiles
• Three numbers which divide the ordered data into four equal sized groups.
• Q1 has 25% of the data below it.
• Q2 has 50% of the data below it. (Median)
• Q3 has 75% of the data below it.
Obtaining the Quartiles
• Order the data.
• Find the median
– This is Q2
• Look at the lower half of the data (those below the median)
– The “median” of this lower half = Q1
• Look at the upper half of the data
– The “median” of this upper half = Q3
Illustrative example: 10 ages
AGE (years) values, ordered array (n = 10):
05 11 21 24 27 | 28 30 42 50 52
Q1 = 21 (median of the lower half)
Q2 = average of 27 and 28 = 27.5
Q3 = 42 (median of the upper half)
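The "median of each half" procedure can be sketched in Python as follows. The function name `quartiles` is an illustrative choice, and the code assumes that when n is odd the median itself is left out of both halves. Library routines often use other interpolation conventions, so they may return slightly different quartiles for the same data.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3), taking Q1 and Q3 as the medians of the two halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]           # values below the median position
    upper = ordered[half + n % 2:]   # values above it; the median is skipped if n is odd
    return median(lower), median(ordered), median(upper)


ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
q1, q2, q3 = quartiles(ages)
print(q1, q2, q3)
```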
Weight Data: Sorted n = 53
100 101 106 106 110 110 119 120 120 123
124 125 127 128 130 130 133 135 139 140
148 150 150 152 155 157 165 165 165 170
170 170 172 175 175 180 180 180 180 185
185 185 186 187 192 194 195 203 210 212
215 220 260
Median: L(M) = (53+1)/2 = 27, placing it at 165
L(Q1) = (26+1)/2 = 13.5, placing it between 127 and 128 (127.5)
L(Q3) = 13.5 from the top, placing it between 185 and 185 (185)
Q1 = 127.5
Q2 = 165
Q3 = 185
Weight Data: Quartiles
Stemplot (stem = hundreds and tens, leaf = ones):
10 | 0166
11 | 009
12 | 0034578
13 | 00359
14 | 08
15 | 00257
16 | 555
17 | 000255
18 | 000055567
19 | 245
20 | 3
21 | 025
22 | 0
23 |
24 |
25 |
26 | 0
Q1 = 127.5
Q2 = 165
Q3 = 185
Five-Number Summary
• minimum = 100
• Q1 = 127.5
• M = 165
• Q3 = 185
• maximum = 260
Interquartile Range (IQR) = Q3 − Q1 = 185 − 127.5 = 57.5
IQR gives spread of middle 50% of the data
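As a check on the numbers above, here is a short Python sketch that computes the five-number summary and IQR for the 53 weights. The helper `five_number_summary` is a name chosen here, and it assumes the median-of-halves rule for the quartiles.

```python
from statistics import median

# Sorted weight data (n = 53), copied from the slide.
weights = [
    100, 101, 106, 106, 110, 110, 119, 120, 120, 123,
    124, 125, 127, 128, 130, 130, 133, 135, 139, 140,
    148, 150, 150, 152, 155, 157, 165, 165, 165, 170,
    170, 170, 172, 175, 175, 180, 180, 180, 180, 185,
    185, 185, 186, 187, 192, 194, 195, 203, 210, 212,
    215, 220, 260,
]


def five_number_summary(values):
    """Return (min, Q1, M, Q3, max) with Q1 and Q3 as medians of the halves."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]            # the 26 values below the median position
    upper = ordered[n // 2 + n % 2:]     # the 26 values above it (n is odd here)
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


summary = five_number_summary(weights)
iqr = summary[3] - summary[1]            # IQR = Q3 - Q1
print(summary, iqr)
```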
Boxplot
• Central box spans Q1 and Q3.
• A line in the box marks the median M.
• Lines extend from the box out to the minimum and maximum.
Weight Data: Boxplot
[Figure: boxplot of the weight data with min, Q1, M, Q3, and max labeled; horizontal axis "Weight", 100 to 275]
Quartile extrapolation
• Quartiles divide the data set into 4 segments: bottom, bottom middle, top middle, upper
• With small data sets, extrapolate values
• Illustrative data: 2, 4, 6, 8
2 | 4 | 6 | 8
(Q1 falls between 2 and 4, Q2 between 4 and 6, Q3 between 6 and 8)
Q1 = average of 2 and 4, which is 3
Q2 = average of 4 and 6, which is 5
Q3 = average of 6 and 8, which is 7
Boxplots are useful for comparing two groups (text p. 39)
Variances & Standard Deviation
• The most common measures of spread
• Based on deviations around the mean
• Each data value has a deviation, defined as $x_i - \bar{x}$
Fig 2.3: Metabolic Rate for 7 men, with
their mean (*) and two deviations shown
Variance
• Find the mean
• Find the deviation of each value
• Square the deviations
• Sum the squared deviations: we call this the sum of squares, or SS
• Divide the SS by n − 1 (gives typical squared deviation from mean)
Variance Formula
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2$$
Standard Deviation
Square root of the variance
$$s = \sqrt{s^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$
Variance and Standard Deviation
Illustrative Example
Data: Metabolic rates, 7 men (cal/day):
1792 1666 1362 1614 1460 1867 1439
$$\bar{x} = \frac{1792 + 1666 + 1362 + 1614 + 1460 + 1867 + 1439}{7} = \frac{11{,}200}{7} = 1600$$
Variance and Standard Deviation
Illustrative Example (cont.)

Observations   Deviations              Squared deviations
$x_i$          $x_i - \bar{x}$         $(x_i - \bar{x})^2$
1792           1792 − 1600 =  192      (192)²  = 36,864
1666           1666 − 1600 =   66      (66)²   =  4,356
1362           1362 − 1600 = −238      (−238)² = 56,644
1614           1614 − 1600 =   14      (14)²   =    196
1460           1460 − 1600 = −140      (−140)² = 19,600
1867           1867 − 1600 =  267      (267)²  = 71,289
1439           1439 − 1600 = −161      (−161)² = 25,921
               sum = 0                 SS = 214,870
Variance and Standard Deviation
Illustrative Example (cont.)
$$s^2 = \frac{214{,}870}{7-1} = 35{,}811.67$$
$$s = \sqrt{35{,}811.67} = 189.24 \text{ calories}$$
Notes:
(1) Use standard deviation s for descriptive purposes
(2) Variance & standard deviation calculated by calculator or
computer in practice
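As note (2) suggests, the computation is usually left to software. The following Python sketch follows the slide's steps one at a time (mean, deviations, squared deviations, SS, divide by n − 1, square root) and then cross-checks against the standard library. Variable names are chosen here for illustration.

```python
import math
import statistics

rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]   # cal/day, from the slide

n = len(rates)
xbar = sum(rates) / n                        # step 1: the mean
deviations = [x - xbar for x in rates]       # step 2: deviation of each value
squared = [d ** 2 for d in deviations]       # step 3: square the deviations
ss = sum(squared)                            # step 4: sum of squares (SS)
variance = ss / (n - 1)                      # step 5: divide SS by n - 1
s = math.sqrt(variance)                      # standard deviation

print(sum(deviations), ss, round(variance, 2), round(s, 2))

# statistics.variance and statistics.stdev also use the n - 1 divisor.
print(round(statistics.variance(rates), 2), round(statistics.stdev(rates), 2))
```

The deviations always sum to zero, which is a quick way to catch an arithmetic slip when working by hand.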
Summary Statistics
• Two main measures of central location
– Mean ($\bar{x}$)
– Median (M)
• Two main measures of spread
– Standard deviation (s)
– 5-point summary (interquartile range)
Choosing Summary Statistics
• Use the mean and standard deviation for reasonably symmetric distributions that are free of outliers.
• Use the median and IQR (or 5-point summary) when data are skewed or when outliers are present.
Example: Number of Books Read
Ordered data (n = 52):
0 0 0 0 0 0 0 0 0 1
1 1 1 1 2 2 2 2 2 2
2 2 2 3 3 3 3 4 4 4
4 4 4 5 5 5 5 5 5 6
10 10 12 13 14 14 15 15 20 20
30 99
L(M) = (52+1)/2 = 26.5
M = 3 (the 26th and 27th ordered values are both 3)
Illustrative example: “Books read”
5-point summary: 0, 1, 3, 5.5, 99
Note highly asymmetric distribution
[Figure: plot of the distribution; horizontal axis "Number of books", 0 to 100]
“xbar” = 7.06, s = 14.43
The mean and standard deviation give a false impression with asymmetric data
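To see why the mean misleads here, the sketch below rebuilds the 52 values from the slide and compares the summaries. The count of values below the mean is an extra check added for illustration; it is not a statistic quoted in the text.

```python
from statistics import mean, median, stdev

# The 52 "books read" values from the slide, grouped by repeated value.
books = (
    [0] * 9 + [1] * 5 + [2] * 9 + [3] * 4 + [4] * 6 + [5] * 6 + [6]
    + [10, 10, 12, 13, 14, 14, 15, 15, 20, 20, 30, 99]
)

xbar = mean(books)
s = stdev(books)          # sample standard deviation (n - 1 divisor)
m = median(books)

# How many of the 52 values fall below the mean?
below_mean = sum(1 for x in books if x < xbar)

print(len(books), round(xbar, 2), round(s, 2), m, below_mean)
```

Most of the values sit below the mean, so the mean is not a typical value for this data set, whereas the median of 3 is.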