Chapter 6
Chebychev’s Theorem
• The portion of any data set lying within k standard deviations (k > 1) of the mean is at least:
$$1 - \frac{1}{k^2}$$
• k = 2: In any data set, at least $1 - \frac{1}{2^2} = \frac{3}{4}$, or 75%, of the data lie within 2 standard deviations of the mean.
• k = 3: In any data set, at least $1 - \frac{1}{3^2} = \frac{8}{9}$, or about 88.9%, of the data lie within 3 standard deviations of the mean.
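A minimal Python sketch of the bound, plus a helper that measures the observed fraction for a concrete data set so the guarantee can be checked. The function names are illustrative, and the helper's use of the population standard deviation (`statistics.pstdev`) is an assumption about how σ is computed.

```python
from statistics import mean, pstdev


def chebychev_bound(k: float) -> float:
    """Minimum fraction of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebychev's Theorem applies only for k > 1")
    return 1 - 1 / k**2


def fraction_within(data, k: float) -> float:
    """Observed fraction of entries within k population standard deviations of the mean."""
    mu, sigma = mean(data), pstdev(data)
    return sum(1 for x in data if abs(x - mu) <= k * sigma) / len(data)


print(chebychev_bound(2))            # 0.75
print(round(chebychev_bound(3), 3))  # 0.889
```

For any data set, `fraction_within(data, k)` should never fall below `chebychev_bound(k)`. For the 15 test scores used later in this chapter, 14 of the 15 entries (about 93%) lie within 2 standard deviations of the mean.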
Example: Using Chebychev’s Theorem
The age distribution for Florida is shown in the
histogram. Apply Chebychev’s Theorem to the data
using k = 2. What can you conclude?
Solution: Using Chebychev’s Theorem
k = 2:
μ – 2σ = 39.2 – 2(24.8) = –10.4 (use 0, since age can’t be negative)
μ + 2σ = 39.2 + 2(24.8) = 88.8
At least 75% of the population of Florida is between 0 and 88.8 years old.
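The same arithmetic as a small Python function. The name `chebychev_interval` and its `floor` parameter are hypothetical; `floor` encodes the "use 0 since age can’t be negative" step.

```python
def chebychev_interval(mu, sigma, k, floor=None):
    """Interval mu ± k*sigma, optionally clipped below at `floor` (e.g., 0 for ages)."""
    low, high = mu - k * sigma, mu + k * sigma
    if floor is not None:
        low = max(low, floor)
    return low, high


low, high = chebychev_interval(39.2, 24.8, k=2, floor=0.0)
share = 1 - 1 / 2**2  # Chebychev bound for k = 2
print(f"At least {share:.0%} of ages lie between {low:.1f} and {high:.1f}")
# At least 75% of ages lie between 0.0 and 88.8
```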
Standard Deviation for Grouped Data
Sample standard deviation for a frequency distribution
• $s = \sqrt{\dfrac{\sum (x - \bar{x})^2 f}{n - 1}}$, where $n = \sum f$ (the number of entries in the data set)
• When a frequency distribution has classes, estimate the sample mean and standard deviation by using the midpoint of each class.
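A sketch of the grouped-data formula in Python, with a helper that turns class limits into midpoints. The function names and the three classes with their frequencies are hypothetical, chosen only to illustrate the calculation.

```python
from math import sqrt


def class_midpoints(classes):
    """Midpoint of each (lower limit, upper limit) class."""
    return [(lo + hi) / 2 for lo, hi in classes]


def grouped_mean_sd(values, freqs):
    """Sample mean and standard deviation of a frequency distribution.

    values: each entry x (or class midpoint); freqs: the matching frequencies f.
    """
    n = sum(freqs)  # n = Σf
    x_bar = sum(x * f for x, f in zip(values, freqs)) / n
    ss = sum((x - x_bar) ** 2 * f for x, f in zip(values, freqs))  # Σ(x − x̄)²f
    return x_bar, sqrt(ss / (n - 1))


# Hypothetical classes and frequencies, for illustration only
classes = [(0, 9), (10, 19), (20, 29)]
freqs = [3, 5, 2]
x_bar, s = grouped_mean_sd(class_midpoints(classes), freqs)
print(x_bar, round(s, 2))  # 13.5 7.38
```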
Example: Finding the Standard Deviation for Grouped Data
You collect a random sample of the
number of children per household in
a region. Find the sample mean and
the sample standard deviation of the
data set.
Number of Children in 50 Households
1 3 1 1 1 1 2 2 1 0
1 1 0 0 0 1 5 0 3 6
3 0 3 1 1 1 1 6 0 1
3 6 6 1 2 2 3 0 1 1
4 1 1 2 2 0 3 0 2 4
Solution: Finding the Standard Deviation for Grouped Data
• First construct a frequency distribution.
• Find the mean of the frequency distribution.
$$\bar{x} = \frac{\sum xf}{n} = \frac{91}{50} \approx 1.8$$
The sample mean is about 1.8 children.

x     f      xf
0     10     0(10) = 0
1     19     1(19) = 19
2     7      2(7) = 14
3     7      3(7) = 21
4     2      4(2) = 8
5     1      5(1) = 5
6     4      6(4) = 24
      Σf = 50    Σ(xf) = 91
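Building the frequency distribution from the raw data can be sketched with `collections.Counter`. The list reproduces the 50 values in the order they appear in the data table (order does not affect the frequencies). The code keeps the unrounded mean 91/50 = 1.82, which the slide rounds to 1.8.

```python
from collections import Counter

children = [
    1, 3, 1, 1, 1, 1, 2, 2, 1, 0,
    1, 1, 0, 0, 0, 1, 5, 0, 3, 6,
    3, 0, 3, 1, 1, 1, 1, 6, 0, 1,
    3, 6, 6, 1, 2, 2, 3, 0, 1, 1,
    4, 1, 1, 2, 2, 0, 3, 0, 2, 4,
]

freq = Counter(children)                      # x -> f
n = sum(freq.values())                        # Σf
sum_xf = sum(x * f for x, f in freq.items())  # Σ(xf)
x_bar = sum_xf / n

for x in sorted(freq):
    print(f"{x}  {freq[x]:2d}  {x * freq[x]:2d}")
print(f"Σf = {n}, Σ(xf) = {sum_xf}, mean = {x_bar}")  # Σf = 50, Σ(xf) = 91, mean = 1.82
```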
Solution: Finding the Standard Deviation for Grouped Data
• Determine the sum of squares.

x     f      x – x̄              (x – x̄)²            (x – x̄)²f
0     10     0 – 1.8 = –1.8     (–1.8)² = 3.24      3.24(10) = 32.40
1     19     1 – 1.8 = –0.8     (–0.8)² = 0.64      0.64(19) = 12.16
2     7      2 – 1.8 = 0.2      (0.2)² = 0.04       0.04(7) = 0.28
3     7      3 – 1.8 = 1.2      (1.2)² = 1.44       1.44(7) = 10.08
4     2      4 – 1.8 = 2.2      (2.2)² = 4.84       4.84(2) = 9.68
5     1      5 – 1.8 = 3.2      (3.2)² = 10.24      10.24(1) = 10.24
6     4      6 – 1.8 = 4.2      (4.2)² = 17.64      17.64(4) = 70.56

$$\sum (x - \bar{x})^2 f = 145.40$$
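The sum-of-squares table can be reproduced with a short loop. Like the worked solution, this sketch uses the rounded mean 1.8; the dictionary `freq` simply restates the frequency table.

```python
freq = {0: 10, 1: 19, 2: 7, 3: 7, 4: 2, 5: 1, 6: 4}  # x -> f
x_bar = 1.8  # rounded mean, as in the worked solution

sum_sq = 0.0
for x, f in sorted(freq.items()):
    dev = x - x_bar        # x − x̄
    term = dev ** 2 * f    # (x − x̄)²f
    sum_sq += term
    print(f"{x}  {f:2d}  {dev:5.1f}  {dev ** 2:6.2f}  {term:6.2f}")

print(f"Σ(x − x̄)²f = {sum_sq:.2f}")  # Σ(x − x̄)²f = 145.40
```

Repeating the loop with the unrounded mean `x_bar = 1.82` gives 145.38, so rounding x̄ first shifts the sum of squares only in the second decimal place.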
Solution: Finding the Standard Deviation for Grouped Data
• Find the sample standard deviation.
$$s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}} = \sqrt{\frac{145.40}{50 - 1}} \approx 1.7$$
The standard deviation is about 1.7 children.
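A sketch of the final step, with a cross-check: expanding the frequency table back into its 50 entries lets `statistics.stdev` (which uses the n − 1 divisor and the exact mean) compute s directly. Both agree to one decimal place.

```python
from math import sqrt
from statistics import stdev

freq = {0: 10, 1: 19, 2: 7, 3: 7, 4: 2, 5: 1, 6: 4}  # x -> f, from the frequency table
n = sum(freq.values())  # 50

# Slide method: sum of squares computed with the rounded mean 1.8
s_slide = sqrt(145.40 / (n - 1))

# Cross-check: rebuild the 50 entries and let the library compute the sample standard deviation
data = [x for x, f in freq.items() for _ in range(f)]
s_exact = stdev(data)

print(round(s_slide, 1), round(s_exact, 1))  # 1.7 1.7
```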
Quartiles
• Fractiles are numbers that partition (divide) an ordered data set into equal parts.
• Quartiles approximately divide an ordered data set into four equal parts.
  ◦ First quartile, Q1: About one quarter of the data fall on or below Q1.
  ◦ Second quartile, Q2: About one half of the data fall on or below Q2 (the median).
  ◦ Third quartile, Q3: About three quarters of the data fall on or below Q3.
Example: Finding Quartiles
The test scores of 15 employees enrolled in a CPR
training course are listed. Find the first, second, and
third quartiles of the test scores.
13 9 18 15 14 21 7 10 11 20 5 18 37 16 17
Solution:
• Order the data. Q2, the median, divides the ordered data set into two halves.
Ordered data: 5 7 9 10 11 13 14 15 16 17 18 18 20 21 37
Lower half: 5 7 9 10 11 13 14    Q2 = 15    Upper half: 16 17 18 18 20 21 37
Solution: Finding Quartiles
• The first and third quartiles are the medians of the lower and upper halves of the data set.
Lower half: 5 7 9 10 11 13 14 → Q1 = 10
Q2 = 15
Upper half: 16 17 18 18 20 21 37 → Q3 = 18
About one fourth of the employees scored 10 or less, about one half scored 15 or less, and about three fourths scored 18 or less.
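The median-of-halves procedure used in this example can be written as a short Python function. This is a sketch of the convention shown on these slides; other software may compute quartiles differently.

```python
from statistics import median


def quartiles(data):
    """Q1, Q2, Q3 as the medians of the lower half, the whole set, and the upper half.

    With an odd number of entries, the median itself belongs to neither half.
    """
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]        # lower half of the ordered entries
    upper = xs[(n + 1) // 2 :]  # upper half of the ordered entries
    return median(lower), median(xs), median(upper)


scores = [13, 9, 18, 15, 14, 21, 7, 10, 11, 20, 5, 18, 37, 16, 17]
print(quartiles(scores))  # (10, 15, 18)
```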
Interquartile Range
Interquartile Range (IQR)
• The difference between the third and first quartiles.
• IQR = Q3 – Q1
Example: Finding the Interquartile Range
Find the interquartile range of the test scores.
Recall Q1 = 10, Q2 = 15, and Q3 = 18
Solution:
• IQR = Q3 – Q1 = 18 – 10 = 8
The test scores in the middle half of the data set vary by at most 8 points.
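As a quick check of this interpretation, the sketch below lists the scores that fall between Q1 and Q3 (inclusive) and confirms that their spread equals the IQR. The quartile values are taken from the example rather than recomputed.

```python
scores = [13, 9, 18, 15, 14, 21, 7, 10, 11, 20, 5, 18, 37, 16, 17]
q1, q3 = 10, 18  # quartiles found in the example

iqr = q3 - q1
middle = sorted(x for x in scores if q1 <= x <= q3)
print(iqr, middle)                # 8 [10, 11, 13, 14, 15, 16, 17, 18, 18]
print(max(middle) - min(middle))  # 8
```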
Box-and-Whisker Plot
Box-and-whisker plot
• Exploratory data analysis tool.
• Highlights important features of a data set.
• Requires a five-number summary:
  ◦ Minimum entry
  ◦ First quartile, Q1
  ◦ Median, Q2
  ◦ Third quartile, Q3
  ◦ Maximum entry
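Python's standard library also provides `statistics.quantiles`. The sketch below builds the five-number summary of the 15 test scores with its `"exclusive"` method. For these scores it matches the slide values, but as the second call shows, the library convention and the median-of-halves method do not always agree, so it is worth checking which convention a tool uses.

```python
from statistics import quantiles

scores = [13, 9, 18, 15, 14, 21, 7, 10, 11, 20, 5, 18, 37, 16, 17]

q1, q2, q3 = quantiles(scores, n=4, method="exclusive")
summary = (min(scores), q1, q2, q3, max(scores))
print(summary)  # (5, 10.0, 15.0, 18.0, 37)

# The conventions can disagree on other data, e.g. [1, 2, 3, 4]:
print(quantiles([1, 2, 3, 4], n=4, method="exclusive")[0])  # 1.25, versus 1.5 by medians of halves
```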
Drawing a Box-and-Whisker Plot
1. Find the five-number summary of the data set.
2. Construct a horizontal scale that spans the range of
the data.
3. Plot the five numbers above the horizontal scale.
4. Draw a box above the horizontal scale from Q1 to Q3
and draw a vertical line in the box at Q2.
5. Draw whiskers from the box to the minimum and
maximum entries.
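[Figure: a box-and-whisker plot labeled with the minimum entry, the left whisker, the box from Q1 to Q3 with the median Q2 marked inside, the right whisker, and the maximum entry.]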
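Example: Drawing a Box-and-Whisker Plot
Draw a box-and-whisker plot that represents the 15 test scores.
Recall: Min = 5, Q1 = 10, Q2 = 15, Q3 = 18, Max = 37
Solution:
[Figure: box-and-whisker plot of the test scores, with the minimum at 5, the box from 10 to 18 with the median at 15, and the right whisker extending to 37.]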
About half the scores are between 10 and 18. By looking
at the length of the right whisker, you can conclude 37 is
a possible outlier.
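The slides judge the outlier by eye from the whisker length. A common numeric check, which is not stated on these slides and is added here as an assumption, is the 1.5 × IQR rule: entries below Q1 − 1.5·IQR or above Q3 + 1.5·IQR are flagged as possible outliers.

```python
scores = [13, 9, 18, 15, 14, 21, 7, 10, 11, 20, 5, 18, 37, 16, 17]
q1, q3 = 10, 18  # quartiles from the five-number summary

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr  # 10 − 12 = −2
upper_fence = q3 + 1.5 * iqr  # 18 + 12 = 30
outliers = [x for x in scores if x < lower_fence or x > upper_fence]
print(lower_fence, upper_fence, outliers)  # -2.0 30.0 [37]
```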
The Standard Score
Standard Score (z-score)
• Represents the number of standard deviations a given value x falls from the mean μ.
• $z = \dfrac{\text{value} - \text{mean}}{\text{standard deviation}} = \dfrac{x - \mu}{\sigma}$
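A minimal sketch of the formula in Python, plus a helper that standardizes a whole data set. The data set below is hypothetical (its mean is 5 and its population standard deviation is 2), and using the population standard deviation is an assumption that matches the μ and σ notation above.

```python
from statistics import mean, pstdev


def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean mu."""
    return (x - mu) / sigma


def standardize(data):
    """z-score of every entry, using the data set's own mean and population standard deviation."""
    mu, sigma = mean(data), pstdev(data)
    return [z_score(x, mu, sigma) for x in data]


# Hypothetical data set with mean 5 and population standard deviation 2
print(standardize([2, 4, 4, 4, 5, 5, 7, 9]))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```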
Example: Comparing z-Scores from Different Data Sets
In 2007, Forest Whitaker won the Best Actor Oscar at age 45 for his role in the movie The Last King of Scotland. Helen Mirren won the Best Actress Oscar at age 61 for her role in The Queen. The mean age of all Best Actor winners is 43.7, with a standard deviation of 8.8. The mean age of all Best Actress winners is 36, with a standard deviation of 11.5. Find the z-score that corresponds to the age of each actor or actress. Then compare your results.
Solution: Comparing z-Scores from Different Data Sets
• Forest Whitaker
$$z = \frac{x - \mu}{\sigma} = \frac{45 - 43.7}{8.8} \approx 0.15$$
0.15 standard deviations above the mean
• Helen Mirren
$$z = \frac{x - \mu}{\sigma} = \frac{61 - 36}{11.5} \approx 2.17$$
2.17 standard deviations above the mean
Solution: Comparing z-Scores from Different Data Sets
Forest Whitaker: z = 0.15
Helen Mirren: z = 2.17
The z-score corresponding to Helen Mirren’s age is more than two standard deviations from the mean, so her age is considered unusual. Compared to other Best Actress winners, she was relatively old, whereas Forest Whitaker’s age is only slightly higher than the average age of other Best Actor winners.
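The comparison can be reproduced in a few lines. The "unusual" cutoff |z| > 2 follows the rule stated above; the dictionary layout is only illustrative.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean mu."""
    return (x - mu) / sigma


winners = {
    "Forest Whitaker": (45, 43.7, 8.8),  # age, mean age of Best Actor winners, std dev
    "Helen Mirren": (61, 36, 11.5),      # age, mean age of Best Actress winners, std dev
}

for name, (age, mu, sigma) in winners.items():
    z = z_score(age, mu, sigma)
    label = "unusual" if abs(z) > 2 else "not unusual"
    print(f"{name}: z = {z:.2f} ({label})")
# Forest Whitaker: z = 0.15 (not unusual)
# Helen Mirren: z = 2.17 (unusual)
```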
Chapter 6 Summary
• Determined the quartiles of a data set
• Determined the interquartile range of a data set
• Created a box-and-whisker plot
• Interpreted other fractiles such as percentiles
• Determined and interpreted the standard score (z-score)