Chapter 2 – Visualizations of Data
Answers
1.
Number of Plastic Beverage Bottles per Week    Tally       Frequency
1                                              ||          2
2                                              ||||| |     6
3                                              |||         3
4                                              ||          2
5                                              |||         3
6                                              ||||| ||    7
7                                              ||||| |     6
8                                              |           1
The number of students who replied “2” was 6: the 30 students in total minus the 24 who gave other responses leaves 6.
2. There is not enough information given to determine the answer.
CK-12 Advanced Probability and Statistics
Concepts
3. a) Number of Liters per Person
The table has ten classes of the form [a, b); only the classes [60, 70), [70, 80), and [80, 90) are legible in the source.
Frequencies of the ten classes, in order: 4, 3, 0, 1, 3, 2, 0, 1, 1, 0
b) [Figure not reproduced]
c) As the number of bottles increases, the frequency generally decreases. The data is unimodal and skewed right.
4. a)
Class      Frequency   Relative frequency (%)   Cumulative frequency   Relative cumulative frequency (%)
0-25       7           50                       7                      50
25-50      1           7.1                      8                      57.1
50-75      3           21.4                     11                     78.6
75-100     1           7.1                      12                     85.7
100-125    1           7.1                      13                     92.9
125-150    0           0                        13                     92.9
150-175    0           0                        13                     92.9
175-200    0           0                        13                     92.9
200-225    1           7.1                      14                     100
b) [Figure not reproduced]
c) [Figure: Relative Frequency (% BTUs) plotted against Class (BTUs); not reproduced]
d) [Figure: ogive plot of Cumulative Relative Frequency (%) against BTUs; not reproduced]
e) The distribution is skewed to the right, meaning that recycling most of these materials saves a relatively small amount of energy. Only a few materials (aluminum cans, copper wire, and carpet) save a large amount of energy.
f) The relative frequencies total 99.8%. The total should be 100%; the difference comes from rounding each relative frequency to one decimal place.
g) The horizontal sections of the ogive plot correspond to classes with no data (frequency = 0), so the accumulated total does not change there.
h) Most of the data in the chart lies between 0 and 50 million BTUs per ton in energy savings, and this is the steepest part of the ogive plot. The steeper a section of the ogive, the more data it contains.
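The relative and cumulative columns of the table in question 4 can be recomputed from the frequencies alone. The short Python sketch below does this and shows where the 99.8% total in part f) comes from; it assumes each percentage is rounded to one decimal place, as the table appears to do.

```python
from itertools import accumulate

# Frequencies of the nine classes 0-25, 25-50, ..., 200-225 (question 4 table).
freqs = [7, 1, 3, 1, 1, 0, 0, 0, 1]
n = sum(freqs)  # 14 materials

# Relative frequency (%) of each class, rounded to one decimal place.
rel = [round(100 * f / n, 1) for f in freqs]

# Cumulative frequency: running total of the frequencies.
cum = list(accumulate(freqs))

# Relative cumulative frequency (%): running total as a share of n.
rel_cum = [round(100 * c / n, 1) for c in cum]

print(rel)                  # [50.0, 7.1, 21.4, 7.1, 7.1, 0.0, 0.0, 0.0, 7.1]
print(cum)                  # [7, 8, 11, 12, 13, 13, 13, 13, 14]
print(rel_cum)              # [50.0, 57.1, 78.6, 85.7, 92.9, 92.9, 92.9, 92.9, 100.0]
print(round(sum(rel), 1))   # 99.8, not 100, because each term was rounded
```

The cumulative column is computed from the unrounded counts, which is why it still reaches exactly 100% while the rounded relative frequencies sum to 99.8%.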
5. a) The outliers are the first and last bars of the histogram. The first bar shows at least one CEO with a salary of $0, and the last bar shows at least one CEO earning over $1 000 000. b) The salary of $300 000 appears most often; approximately 14 CEOs report this salary. c) Approximately 6 CEOs have a salary of $500 000.
6. There is an error in the table. The first value of the eighth bin should be 66.
The data is grouped into bins with an interval appropriate for the dataset. The histogram shows the center in the clustered region of 45 – 60. The histogram does not appear to be skewed, and there are no obvious outliers. The data appears to represent an approximately normal distribution.
7. A dataset of continuous data is easily represented using frequency tables, histograms, or frequency polygons if the dataset is large. The range of the data must also be wide enough to allow appropriate intervals (bin sizes) to be created.
8. A large dataset is best represented using frequency tables, histograms, frequency polygons, or ogive plots. A frequency table helps to order the data, but it does not give a visual representation that is easily interpreted. A histogram displays how the data are spread over the entire range of values, and its shape provides a great deal of information about the distribution of the data. The ogive plot has the cumulative frequency on its vertical (y) axis. This representation allows for the study of medians and quartiles.
9. When the distribution is strongly skewed or has extreme outliers, the mean is pulled toward the skewed end, which makes it a poor representation of the typical center of the dataset. When the distribution has a positive (right) skew, the mean is greater than the median.
When the distribution has a negative (left) skew, the mean is less than the median. In a normal distribution, the mean and the median are equal and lie at the center of the distribution.
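A quick way to see the pull of the tail is to compare the mean and median of small lopsided samples. The three data sets below are hypothetical, invented only for this illustration; the sketch uses Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical data sets, invented only to illustrate the pattern.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]        # one large value in the right tail
left_skewed = [1, 18, 19, 19, 20, 20, 20, 21]    # one small value in the left tail
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right", right_skewed), ("left", left_skewed), ("symmetric", symmetric)]:
    print(name, mean(data), median(data))
# right 4.75 3.0
# left 17.25 19.5
# symmetric 3 3
```

A single extreme value moves the mean noticeably while the median barely changes, which is why the median is preferred for skewed data.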
10. Determine the range of the data (the maximum value minus the minimum value). Decide how many classes you wish to display on your graph (usually 7 – 10 bins give a useful visual display of the distribution). Then divide the range by the number of bins to determine the width of each class.
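The three steps above can be sketched in Python. The data and the choice of 8 bins are hypothetical, and the rule for values that fall exactly on a boundary (lower edge included, upper edge excluded, except that the maximum goes in the last class) is an assumed convention. Many textbooks also round the width up to a convenient value; this sketch does not.

```python
def make_bins(data, num_bins):
    """Equal-width classes built from the range of the data (a sketch).

    Each class includes its lower edge and excludes its upper edge, except
    the last class, which also includes the maximum value (assumed rule).
    """
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("all values are equal; the range is zero")
    width = (hi - lo) / num_bins            # range divided by number of bins
    edges = [lo + i * width for i in range(num_bins + 1)]
    counts = [0] * num_bins
    for x in data:
        # Index of the class containing x; the maximum is put in the last class.
        i = min(int((x - lo) // width), num_bins - 1)
        counts[i] += 1
    return width, edges, counts


# Hypothetical data, used only to show the calculation.
scores = [10, 12, 15, 17, 18, 22, 25, 27, 31, 33, 38, 44, 50]
width, edges, counts = make_bins(scores, 8)
print(width)   # 5.0
print(counts)  # [2, 3, 1, 2, 2, 1, 1, 1]
```

With a range of 40 and 8 bins, each class is 5 units wide, so the classes run 10-15, 15-20, and so on up to 45-50.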
Answers
1. [Figure not reproduced]
2.
Material                        Kilograms   Approx. % of Total Weight
Plastics                        6.21        23
Lead                            1.71        6.33
Aluminum                        3.83        14.18
Iron                            5.54        20.52
Copper                          2.12        7.85
Tin                             0.27        1
Zinc                            0.60        2.22
Nickel                          0.23        0.85
Barium                          0.05        0.185
Other Elements and chemicals    6.44        23.85
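The percentage column is each material's share of the total weight: part divided by whole, times 100. The kilograms in the table add up to 27 kg. The Python sketch below recomputes the percentages; because it rounds to two decimal places, small differences from the table's approximate values are expected (for example, aluminum comes out as 14.19 rather than 14.18).

```python
# Kilograms of each material, copied from the table above.
kilograms = {
    "Plastics": 6.21, "Lead": 1.71, "Aluminum": 3.83, "Iron": 5.54,
    "Copper": 2.12, "Tin": 0.27, "Zinc": 0.60, "Nickel": 0.23,
    "Barium": 0.05, "Other elements and chemicals": 6.44,
}

total = sum(kilograms.values())  # about 27 kg

# Percentage of total weight = part / whole x 100
percent = {name: 100 * kg / total for name, kg in kilograms.items()}

print(round(total, 2))
for name, p in percent.items():
    print(f"{name:30} {p:6.2f}")
```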
3. [Figure not reproduced; surviving labels: Plastics, Lead, Aluminum, Iron, Copper, Tin, Zinc]
4. Answers will vary. Bar graphs are easier to analyze here because there are so many categories.
5. [Figure: graph by Grade (A, B, C, D, F); not reproduced]
6.
7.
Grade   # Students   Approximate % of Total
A       14           48.28
B       7            24.14
C       4            13.79
D       3            10.34
F       1            3.45
8. Answers will vary. Although bar graphs are easier to analyze, the relatively few categories make the pie chart easy to analyze as well.
9. [Figure: Highest Level of Education; not reproduced]
10.
Highest Level of Education     Median Income of Persons Age 25+   Approximate Percentage of Total Income
High School                    $20,321                            4.96
High School Graduate           $26,505                            6.47
Some College                   $31,056                            7.58
Associate’s Degree             $35,009                            8.55
Bachelor’s Degree or Higher    $49,303                            12.04
Bachelor’s Degree              $43,143                            10.53
Master’s Degree                $52,390                            12.79
Professional Degree            $82,473                            20.13
Doctorate Degree               $69,432                            16.95
11. [Figure not reproduced; surviving labels: High School, High school graduate, Some college, Associate's degree, Bachelor's degree or higher]
12. Answers will vary. Bar graphs are easier to analyze here because there are so many categories.
13. The median is resistant to outliers: extreme values have little effect on it. The median is therefore preferred when the data is skewed.
Answers
1. Dot plot: [Figure: horizontal axis Percentage; not reproduced]
2.
The distribution is uniform. The center of the data is approximately 25, with the data spread fairly evenly from 5 through 48.
3.
Stem-and-leaf plot (stems are tens, leaves are units):
0 | 5 5
1 | 1 2 2 3 4 5 9 9
2 | 0 1 3 5 5 6 6 7 8 8 9
3 | 0 2 3 3 4 5 6 9
4 | 0 1 2 2 5 8
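A stem-and-leaf display like the one above can be generated with a few lines of Python. This is a sketch that assumes non-negative whole numbers, with tens as stems and units as leaves, matching the layout of the plot; the 35 values are read back from the plot itself. It also reports the median of those values, which is 27.

```python
from collections import defaultdict
from statistics import median

def stem_and_leaf(values):
    """Stem-and-leaf rows for non-negative integers: stem = tens, leaf = units."""
    rows = defaultdict(list)
    for v in sorted(values):
        rows[v // 10].append(v % 10)
    lines = []
    for stem in range(min(rows), max(rows) + 1):   # keep empty stems visible
        leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

# The 35 values read back from the plot above.
values = [5, 5, 11, 12, 12, 13, 14, 15, 19, 19,
          20, 21, 23, 25, 25, 26, 26, 27, 28, 28, 29,
          30, 32, 33, 33, 34, 35, 36, 39,
          40, 41, 42, 42, 45, 48]

print("\n".join(stem_and_leaf(values)))
print(median(values))  # 27, the 18th of the 35 ordered values
```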
4.
27
5.
The distribution is left skewed with no outliers.
6.
The distribution is left skewed with one outlier.
7.
The distribution is symmetric with no apparent outliers.
8.
The distribution is right skewed with no apparent outliers.
9.
The first data set is symmetric with no apparent outliers. The second data set is symmetric with no apparent outliers. The third data set is bimodal. The fourth data set is evenly distributed.
10.
The first data set is centered on 52 with a large peak at 52. The second data set is centered on 52 with a peak at 52. The third data set is centered at 52 but has peaks at 25 and 85. The fourth data set has no single central peak; its values are spread evenly.
11.
The first dot plot has the smallest standard deviation.
12.
The third dot plot has the largest standard deviation.
13.
Dot plots are useful for small data sets that use categorical data. When the data describes qualitative observations, measures of spread or shape are not used. These characteristics are used to describe dot plots when the categories are numerical.
14.
a) Stem-and-leaf plot
3 | 2 3 6 7 8
4 | 0 1 3 3 4 4 5 5 5 5 6 6 7 7 7 8 8 8 8 9
5 | 0 0 0 0 0 0 1 1 2 3 3 3 5 5 5 6 6 6 6 7 7 8 8 9
6 | 0 1 1 1 2 2 3 9 9
7 | 0 4
b) Dot plot [Figure: horizontal axis Ages of CEOs; not reproduced]
c) The data set is symmetric.
d) Outliers could include 32, 33, and 74.
15.
The data set in this example is the measurement of the pulse rates of 15 teenagers. If one of the teenagers had their pulse rate measured after running a five-mile race, this measurement would be an outlier. If, however, all of the teenagers ran the five-mile race and had their pulse rates taken at the finish line, there would be no outliers.
16.
Yes. The outliers can be seen because they lie outside the main group of values.
17.
When using a five-number summary, use the interquartile range (IQR) to determine whether a data set contains an outlier. The IQR is found by subtracting the first quartile value from the third quartile value. Then multiply the IQR by 1.5. Any value less than the first quartile minus 1.5 × IQR is an outlier, and any value greater than the third quartile plus 1.5 × IQR is an outlier.
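The 1.5 × IQR rule described above can be sketched in Python. Quartiles here are computed as the medians of the lower and upper halves of the ordered data, leaving out the overall median when the count is odd; this is an assumed convention, and other software may report slightly different quartiles. The example uses the 17 exam scores from the stem-and-leaf plot in question 18.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3).

    Assumed convention: Q1 and Q3 are the medians of the lower and upper
    halves of the ordered data, with the overall median left out of both
    halves when the count is odd. Other software may use other rules.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return median(lower), median(data), median(upper)

def iqr_outliers(values):
    """Return the lower fence, the upper fence, and the values outside them."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [x for x in values if x < low_fence or x > high_fence]
    return low_fence, high_fence, outliers

# Exam scores read from the stem-and-leaf plot in question 18.
scores = [55, 61, 67, 75, 75, 76, 80, 80, 81, 84, 85, 87, 88, 90, 93, 93, 94]
print(quartiles(scores))     # (75.0, 81, 89.0)
print(iqr_outliers(scores))  # (54.0, 110.0, [])
```

With IQR = 89 − 75 = 14, the fences are 54 and 110; the lowest score, 55, lies just inside the lower fence, so this sketch finds no outliers.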
18.
a) Stem-and-leaf plot
5 | 5
6 | 1 7
7 | 5 5 6
8 | 0 0 1 4 5 7 8
9 | 0 3 3 4
b) Dot plot [Figure: horizontal axis Exam Scores; not reproduced]
19.
Web link does not exist.
Answers
1.
The independent variable is the explanatory variable and the dependent variable is the response variable. Therefore, when comparing the municipal waste of each state, the explanatory variable is the state name and the response variable is the amount of waste.
If comparing the percentage of each state in the union with the amount of waste, the percentage would be the explanatory variable and the amount of waste would be the response variable.
2.
13 386 000 tons
3.
[Figure: scatter plot with horizontal axis Percentage of State in Union; not reproduced]
4.
The direction is positive but there is a weak correlation between the two variables.
5.
There is a decrease in the recycling rate of plastic bottles made from PET and an increase in the recycling rate of HDPE.
6.
PET recycling fell from about 33% to about 22% between 1995 and 2001, a total decrease of about 10-12 percentage points.
7.
One explanation was that there was an increase in the use of HDPE in recycling containers and this type of recycled material is used more often in the production of plastic lumber, tables, roadside curbs, benches, truck cargo liners, trash receptacles, stationery (e.g. rulers) and other durable plastic products.
8.
This change was the most rapid from the middle of 1995 to the middle of 1996.
9.
Dot plots allow for the interpretation of shape, center, and spread but are suited only to small data sets. Stem-and-leaf plots are useful for seeing the shape of the distribution of data. Both of these plots are used for univariate data sets to determine whether the data is symmetric or skewed, to see any gaps, and to spot outliers. A scatter plot is useful for determining trends in data and the correlation between the explanatory and response variables. Scatter plots are used for bivariate data sets to see the general relationship between the variables.
10.
Median and IQR can be used to describe any set of data but are particularly useful for skewed data. When data is skewed or has extreme outliers, the mean is pulled toward the skewed end. This makes the mean not representative of the middle of the data.
Answers
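1. Five-number summary (box plot not reproduced): Min X = 35, Lower Quartile = 53, Median = 67.5, Upper Quartile = 75.5, Max X = 95
2.
$$\text{IQR} = Q_3 - Q_1 = 75.5 - 53 = 22.5$$
$$Q_1 - 1.5 \times \text{IQR} = 53 - 33.75 = 19.25 \qquad Q_3 + 1.5 \times \text{IQR} = 75.5 + 33.75 = 109.25$$
There are no data values less than 19.25 and none greater than 109.25. Therefore, there are no outliers.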
3. The third quarter of the data (between the median, 67.5, and the upper quartile, 75.5) is more densely concentrated in a smaller interval. The middle 50% of the data lies between 53 and 75.5. The data is very close to symmetric, although it skews slightly to the left.
4. The median of the data is 67.5. The mean should be pulled left, in the direction of the skew, and thus be smaller than the median. The mean of the data is 65.7.
5. Five-number summary (box plot not reproduced): Min X = 0, Lower Quartile = 72, Median = 82, Upper Quartile = 89, Max X = 105
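6.
$$\text{IQR} = Q_3 - Q_1 = 89 - 72 = 17$$
$$Q_1 - 1.5 \times \text{IQR} = 72 - 25.5 = 46.5 \qquad Q_3 + 1.5 \times \text{IQR} = 89 + 25.5 = 114.5$$
There are three data values, 0, 4, and 46, that are less than 46.5. These values are outliers for this data set. There are no data values greater than 114.5.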
7. The data in the lower 25% is widely spread compared with the other sections of the graph. The middle 50% of the data lies between 72 and 89. The data is moderately symmetric, although it does skew to the left.
8. The median of the data is 82. The mean should be pulled left, in the direction of the skew, and should be considerably smaller than the median. The mean of the data is 75.4.
9.
The median of the data for Utah is higher than that for Idaho, which indicates that the reservoirs in Idaho tend to be less full than those in Utah. The IQR for Idaho is 22.5, spanning capacities from 53 to 75.5. The IQR for Utah is 17, spanning capacities from 72 to 89. Therefore the middle 50% of the capacity percentages in Utah are more concentrated than those in Idaho.
Answers
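1. Five-number summary (box plot not reproduced): Min X = 3.12, Lower Quartile = 3.22, Median = 3.282, Upper Quartile = 3.393, Max X = 3.528
2.
$$\text{IQR} = Q_3 - Q_1 = 3.393 - 3.22 = 0.173$$
$$Q_1 - 1.5 \times \text{IQR} = 3.22 - 0.2595 = 2.9605 \qquad Q_3 + 1.5 \times \text{IQR} = 3.393 + 0.2595 = 3.6525$$
There are no outliers since there are no data values less than 2.9605 and none greater than 3.6525.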
3.
4. Five-number summary (box plot not reproduced): Min X = 0.8242, Lower Quartile = 0.8506, Median = 0.8670, Upper Quartile = 0.8963, Max X = 0.9320
When the prices per gallon are converted to prices per liter, the center and the measures of spread are all multiplied by 1/3.7854 ≈ 0.2642. The box plots for both datasets have the same shape, but the plot for US gallons is stretched out more.
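The effect of the unit change can be checked numerically. The Python sketch below converts the gallon prices listed in the table for question 5 to liter prices using the conversion factor 3.7854 from the answer above, then compares the two five-number summaries. The quartile rule (medians of the lower and upper halves, excluding the overall median when the count is odd) is an assumed convention; it happens to reproduce the summaries given in questions 1 and 4.

```python
from statistics import median, stdev

LITERS_PER_GALLON = 3.7854  # conversion factor used in the answer above

def five_number_summary(values):
    """Min, Q1, median, Q3, max (assumed quartile rule: medians of the halves,
    with the overall median excluded when the count is odd)."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]

# Average price of a gallon of gas (US$), from the table in question 5.
per_gallon = [3.458, 3.528, 3.26, 3.22, 3.282, 3.12, 3.352, 3.393, 3.252, 3.152, 3.309]
per_liter = [p / LITERS_PER_GALLON for p in per_gallon]

gal = five_number_summary(per_gallon)
lit = five_number_summary(per_liter)
print(gal)                          # (3.12, 3.22, 3.282, 3.393, 3.528)
print([round(x, 4) for x in lit])   # [0.8242, 0.8506, 0.867, 0.8963, 0.932]

# The center and the spread measures all shrink by the same factor.
print(round(1 / LITERS_PER_GALLON, 4))                       # 0.2642
print(round((lit[3] - lit[1]) / (gal[3] - gal[1]), 4))       # IQR ratio: 0.2642
print(round(stdev(per_liter) / stdev(per_gallon), 4))        # 0.2642
```

Dividing every value by the same positive constant keeps the order of the data unchanged, so each summary value, the IQR, and the standard deviation are all scaled by exactly that factor.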
5.
State            Average Price of a Gallon of Gas (US$)   Average Price of a Liter of Gas (US$)
Alaska           3.458                                    0.914
Washington       3.528                                    0.932
Idaho            3.26                                     0.861
Montana          3.22                                     0.851
North Dakota     3.282                                    0.867
Minnesota        3.12                                     0.824
Michigan         3.352                                    0.886
New York         3.393                                    0.896
Vermont          3.252                                    0.859
New Hampshire    3.152                                    0.833
Maine            3.309                                    0.874
6. This data was retrieved from the website http://fuelgaugereport.opisnet.com/sbsavg.html on July 5, 2014. The data will vary depending on the day.
State            Average Price of a Gallon of Gas (US$)
Alaska           4.121
Washington       4.120
Idaho            3.789
Montana          3.591
North Dakota     3.705
Minnesota        3.588
Michigan         3.869
New York         3.809
Vermont          3.756
New Hampshire    3.658
Maine            3.722
7. The prices are already in US$.
8. A dot plot and a stem-and-leaf plot are used when the dataset consists of a small number of values. A histogram and a box-and-whisker plot are used when the dataset is large.
9. Histograms and Box-and-Whisker Plots
10. The center of the distribution would change by the same scale factor. Measures of spread such as the range, the IQR, and the standard deviation also change proportionally, by the same scale factor.
The five-number summary would also change proportionally.
11. e
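12. a) The median is 121.5.
b) The lower quartile is 114. The upper quartile is 129.5.
c)
$$\text{IQR} = Q_3 - Q_1 = 129.5 - 114 = 15.5$$
d)
$$Q_1 - 1.5 \times \text{IQR} = 114 - 23.25 = 90.75 \qquad Q_3 + 1.5 \times \text{IQR} = 129.5 + 23.25 = 152.75$$
There are no outliers since no data values are less than 90.75 or greater than 152.75.
e) [Figure not reproduced]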