Solutions

Chapter One Review Exercises
Data Set A: 10, 7, 18, 13, 12, 17, 11, 14, 16, 22
Data Set B: 8, 13, 16, 10, 11, 7, 8, 11, 9, 17, 22
Data Set C: 7, 9, 19, 8, 20, 17, 19, 9, 22, 18, 10, 10
Data Set D: 17, 17, 7, 18, 8, 18, 9, 10, 22, 19, 22, 21, 20
1. a) For each of the data sets A-D find (i) the mean and (ii) the median.
(i)
The mean is found by summing the values and dividing by the number
of data points. So:
A: mean=140/10=14
B: mean=132/11=12
C: mean=168/12=14
D: mean=208/13=16
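The arithmetic above can be checked with a few lines of Python. This is only a sketch; the names `data_sets` and `mean` are chosen here for illustration and are not part of the exercise.

```python
# The four data sets as listed at the top of the exercises.
data_sets = {
    "A": [10, 7, 18, 13, 12, 17, 11, 14, 16, 22],
    "B": [8, 13, 16, 10, 11, 7, 8, 11, 9, 17, 22],
    "C": [7, 9, 19, 8, 20, 17, 19, 9, 22, 18, 10, 10],
    "D": [17, 17, 7, 18, 8, 18, 9, 10, 22, 19, 22, 21, 20],
}


def mean(values):
    """Sum of the values divided by the number of data points."""
    return sum(values) / len(values)


means = {name: mean(values) for name, values in data_sets.items()}

for name, values in data_sets.items():
    # Show the sum and the count so the division can be compared with the text.
    print(f"{name}: mean={sum(values)}/{len(values)}={means[name]:g}")
```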
(ii)
The median requires that the data be ordered:
A: 7,10,11,12,13,14,16,17,18,22
B: 7,8,8,9,10,11,11,13,16,17,22
C: 7,8,9,9,10,10,17,18,19,19,20,22
D: 7,8,9,10,17,17,18,18,19,20,21,22,22
The median is the middle number in the list, or halfway between the
two middle numbers if the list has an even number of data values. So:
A: median=13.5
B: median=11
C: median=13.5
D: median=18
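The middle-value rule can be written out as a short Python sketch. The function name `median` and the dictionary `data_sets` are illustrative choices, not something the exercise prescribes.

```python
def median(values):
    """Middle value of the ordered list, or the midpoint of the two
    middle values when the list has an even number of entries."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


data_sets = {
    "A": [10, 7, 18, 13, 12, 17, 11, 14, 16, 22],
    "B": [8, 13, 16, 10, 11, 7, 8, 11, 9, 17, 22],
    "C": [7, 9, 19, 8, 20, 17, 19, 9, 22, 18, 10, 10],
    "D": [17, 17, 7, 18, 8, 18, 9, 10, 22, 19, 22, 21, 20],
}

medians = {name: median(values) for name, values in data_sets.items()}

for name, m in medians.items():
    print(f"{name}: median={m}")
```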
b) Which of these averages better describes the data set?
In all four cases there is very little to choose. The data lies between 7 and 22
and the mean and median values are quite close.
2. a) For each of the data sets A-D find the lower and upper quartiles.
The quartiles are the medians of the values below and above the median position
in the ordered sets. When a set has an odd number of values (B and D), the
median itself belongs to neither half.
A: 7,10,11,12,13 | 14,16,17,18,22
B: 7,8,8,9,10,11,11,13,16,17,22
C: 7,8,9,9,10,10 | 17,18,19,19,20,22
D: 7,8,9,10,17,17,18,18,19,20,21,22,22
We therefore have
A: LQ=11, UQ=17
B: LQ=8, UQ=16
C: LQ=9, UQ=19
D: LQ=9.5, UQ=20.5
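A minimal Python sketch of this median-of-halves rule follows. It assumes that the median is left out of both halves when the number of values is odd, which is what the answers for B and D imply. The helper names are illustrative. Other quartile conventions exist and can give slightly different values.

```python
def median_of_sorted(ordered):
    """Median of a list that is already in ascending order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (LQ, UQ) as the medians of the lower and upper halves.

    Assumption: for an odd number of values the median is excluded
    from both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # values below the median position
    upper = ordered[-half:]   # values above the median position
    return median_of_sorted(lower), median_of_sorted(upper)


data_sets = {
    "A": [10, 7, 18, 13, 12, 17, 11, 14, 16, 22],
    "B": [8, 13, 16, 10, 11, 7, 8, 11, 9, 17, 22],
    "C": [7, 9, 19, 8, 20, 17, 19, 9, 22, 18, 10, 10],
    "D": [17, 17, 7, 18, 8, 18, 9, 10, 22, 19, 22, 21, 20],
}

results = {name: quartiles(values) for name, values in data_sets.items()}

for name, (lq, uq) in results.items():
    print(f"{name}: LQ={lq:g}, UQ={uq:g}")
```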
b) Draw boxplots for the four data sets on the same number line. Use the
boxplots to determine which sets will have the larger standard deviations and
to classify the sets as unimodal symmetric, bimodal symmetric, left-skewed or
right skewed.
C will have a larger standard deviation than A: they have the same mean and
the same high and low values, but A’s rectangle (the middle 50% of the data) is
shorter than C’s. B’s rectangle is only slightly longer than A’s, but some values
in B are much further from the mean than in A, so B will have a larger standard
deviation than A. Likewise, C will have a larger standard deviation than D.
Remember that the mean, the value used for the standard deviation, is not shown
on the boxplot. The center line is the median! Conclusion: A smaller than B
smaller than D smaller than C.
Boxplots A and C look symmetric; B looks skewed-to-the-right and D appears
to be skewed-to-the-left. This is based on looking at both the rectangle and the
dot-to-dot pieces in relation to the center line.
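The features read off the boxplots can be tabulated with a short Python sketch. The five-number summaries below are taken from the answers to 1a) and 2a); the variable names are illustrative. Box length is only a rough guide to the standard deviation, so treat the output as a reading aid rather than a calculation of spread.

```python
# (low, LQ, median, UQ, high) for each data set, from 1a) and 2a).
summaries = {
    "A": (7, 11, 13.5, 17, 22),
    "B": (7, 8, 11, 16, 22),
    "C": (7, 9, 13.5, 19, 22),
    "D": (7, 9.5, 18, 20.5, 22),
}

box_lengths = {}
for name, (low, lq, med, uq, high) in summaries.items():
    box_lengths[name] = uq - lq
    print(
        f"{name}: box length {uq - lq:g}, "
        f"box left/right of median {med - lq:g}/{uq - med:g}, "
        f"whiskers left/right {lq - low:g}/{high - uq:g}"
    )
```

A box or whisker that is clearly longer on the right of the median suggests right skew (as in B), and one that is longer on the left suggests left skew (as in D).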
3. a) Draw 4-bar histograms for each data set A-D using the bins 6-10, 11-15,
16-20 and 21-25.
[Figure: four histograms, one each for Data Sets A, B, C and D. Horizontal axis:
Values, with bins 6-10, 11-15, 16-20 and 21-25. Vertical axis: Frequency.]
b) Draw 4-bar histograms for each data set A-D using 4 equal-sized bins which
span only the numbers 7-22. [These are better than the bins in part a) which
include numbers such as 6 and 25 which lie outside the range of the data.]
[Figure: four histograms, one each for Data Sets A, B, C and D. Horizontal axis:
Values, with bins 7-10, 11-14, 15-18 and 19-22. Vertical axis: Frequency.]
c) Comment on whether the better choice in part b) actually gives a histogram
that says something different than the simpler 5-and-10 bin ranges of part a.
The histograms for Data Sets A and B are identical to those in part a); there is
little qualitative change in the histograms for Data Sets C and D. The extra
effort expended to get ‘perfect’ bins with no overlap at the ends isn’t really
worth it.
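The comparison between the two bin choices can be checked by counting the values in each bin. The following Python sketch does this for both sets of bins; the function `bin_counts` and the variable names are illustrative and not part of the exercise.

```python
def bin_counts(values, bins):
    """Count the values falling in each inclusive (low, high) bin."""
    return [sum(low <= v <= high for v in values) for low, high in bins]


data_sets = {
    "A": [10, 7, 18, 13, 12, 17, 11, 14, 16, 22],
    "B": [8, 13, 16, 10, 11, 7, 8, 11, 9, 17, 22],
    "C": [7, 9, 19, 8, 20, 17, 19, 9, 22, 18, 10, 10],
    "D": [17, 17, 7, 18, 8, 18, 9, 10, 22, 19, 22, 21, 20],
}

bins_part_a = [(6, 10), (11, 15), (16, 20), (21, 25)]
bins_part_b = [(7, 10), (11, 14), (15, 18), (19, 22)]

counts_a = {n: bin_counts(v, bins_part_a) for n, v in data_sets.items()}
counts_b = {n: bin_counts(v, bins_part_b) for n, v in data_sets.items()}

for name in data_sets:
    same = "unchanged" if counts_a[name] == counts_b[name] else "changed"
    print(f"{name}: part a {counts_a[name]}, part b {counts_b[name]} ({same})")
```

The bar heights for A and B come out the same under both choices. For C and D the empty second bin remains, and only the split between the two upper bins moves.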
d) Use the histograms to classify the sets as unimodal symmetric, bimodal
symmetric, left-skewed or right skewed. Is this consistent with the answer to
2b?
Data Set A looks symmetric, Data Set B looks skewed-to-the-right and Data
Sets C and D look bimodal (because of the empty spot in the histogram).
These conclusions for Data Sets A and B are the same as those from the
boxplots. But the bimodality seen in the histograms is something that is not
apparent from the boxplots. The boxplot merely says that approximately 25%
of the data values lie in each fourth; it does not indicate whether these are well
spread within a region or all at one end. This is seen in the histogram. A closer
inspection of the histograms shows that in C there are equal numbers of high
and low data values, so it is possibly bimodal symmetric. In D there are about
twice as many high values as low values, but the description ‘skewed-to-the-left’
is a bit of a stretch!
4. a) For each of the data sets A-D compute the standard deviation.
Recall that the means were found in #1 a) i). These are needed in the
calculations.
Data Set A
Data Value    Difference from Mean (14)    Squared Difference
7             7                            49
10            4                            16
11            3                            9
12            2                            4
13            1                            1
14            0                            0
16            2                            4
17            3                            9
18            4                            16
22            8                            64
                                           SUM = 172
Dividing by one less than the number of data points gives 172/9=19.11.
Taking the square root gives a standard deviation of 4.371625
Data Set B
Data Value    Difference from Mean (12)    Squared Difference
7             5                            25
8             4                            16
8             4                            16
9             3                            9
10            2                            4
11            1                            1
11            1                            1
13            1                            1
16            4                            16
17            5                            25
22            10                           100
                                           SUM = 214
Dividing by one less than the number of data points gives 214/10=21.40.
Taking the square root gives a standard deviation of 4.626013
Data Set C
Data Value    Difference from Mean (14)    Squared Difference
7             7                            49
8             6                            36
9             5                            25
9             5                            25
10            4                            16
10            4                            16
17            3                            9
18            4                            16
19            5                            25
19            5                            25
20            6                            36
22            8                            64
                                           SUM = 342
Dividing by one less than the number of data points gives 342/11=31.09.
Taking the square root gives a standard deviation of 5.575922
Data Set D
Data Value    Difference from Mean (16)    Squared Difference
7             9                            81
8             8                            64
9             7                            49
10            6                            36
17            1                            1
17            1                            1
18            2                            4
18            2                            4
19            3                            9
20            4                            16
21            5                            25
22            6                            36
22            6                            36
                                           SUM = 362
Dividing by one less than the number of data points gives 362/12=30.17.
Taking the square root gives a standard deviation of 5.492419
SUMMARY
A: Standard Deviation = 4.37
B: Standard Deviation = 4.63
C: Standard Deviation = 5.58
D: Standard Deviation = 5.49
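The tabulated steps (difference from the mean, square, sum, divide by one less than the number of data points, square root) can be reproduced with the following Python sketch. The function name `sample_std` is an illustrative choice. The standard library's `statistics.stdev` also divides by n - 1, so it is used here as a cross-check.

```python
import math
import statistics


def sample_std(values):
    """Standard deviation using the n - 1 divisor, as in the tables."""
    m = sum(values) / len(values)
    squared_differences = [(v - m) ** 2 for v in values]
    variance = sum(squared_differences) / (len(values) - 1)
    return math.sqrt(variance)


data_sets = {
    "A": [10, 7, 18, 13, 12, 17, 11, 14, 16, 22],
    "B": [8, 13, 16, 10, 11, 7, 8, 11, 9, 17, 22],
    "C": [7, 9, 19, 8, 20, 17, 19, 9, 22, 18, 10, 10],
    "D": [17, 17, 7, 18, 8, 18, 9, 10, 22, 19, 22, 21, 20],
}

std_devs = {name: sample_std(values) for name, values in data_sets.items()}

for name, values in data_sets.items():
    # Compare the hand-rolled result with the library routine.
    agrees = math.isclose(std_devs[name], statistics.stdev(values))
    print(f"{name}: standard deviation = {std_devs[name]:.2f} (library agrees: {agrees})")
```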
b) Is this consistent with your answer to 2b?
Yes. The standard deviations of A and B are much smaller than those of C and
D. This is expected from the sizes of the rectangles on the boxplot. B beats out
A and C beats out D for the reasons given in 2b).
5. Consider boxplots A, B and C as given along the same number line below:
Which of the following are true?
a) All the data values for A are greater than the median data value for B.
True. The Low value for A appears to be the same as the Median value of B.
b) One quarter of the data values for C are less than the median value for B.
True. The LQ for C appears to be the same as the Median value of B
c) A has a greater median than C.
False. The Median of C appears to be the same as the UQ for A.
d) Boxplot B is symmetric.
True. Folding the boxplot along the Median line gives a match on each
side.
e) Boxplot C is right skewed.
False. The boxplot indicates more on the left of the median and so is left-skewed.
f) In A, the mean is greater than the median.
False. Since the boxplot is symmetric, the mean and median are likely to
be very close.
g) In C, the mean is less than the median
True. Since the boxplot is left-skewed, the mean is likely to be less than
the median. The excess values to the left of the median are not balanced
by values above the median and will therefore pull the mean to a value
below the median.
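The rule of thumb used in f) and g) can be tried on real numbers. The boxplots for question 5 are not reproduced here, so this Python sketch uses Data Sets B and D from the earlier questions instead, which 2b) described as skewed-to-the-right and skewed-to-the-left. It illustrates a tendency, not a guarantee that holds for every data set.

```python
import statistics

# Data Sets B and D from the start of the exercises.
data_b = [8, 13, 16, 10, 11, 7, 8, 11, 9, 17, 22]
data_d = [17, 17, 7, 18, 8, 18, 9, 10, 22, 19, 22, 21, 20]

comparison = {}
for name, values in (("B", data_b), ("D", data_d)):
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        comparison[name] = "mean above median"
    elif mean < median:
        comparison[name] = "mean below median"
    else:
        comparison[name] = "mean equals median"
    print(f"{name}: mean={mean}, median={median}, {comparison[name]}")
```

For B the mean sits above the median, and for D it sits below, matching the direction in which each set was described as skewed.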