CK-12 Advanced Probability and Statistics Concepts: Chapter 2 Answer Key

Chapter 2 – Visualizations of Data

2.1 Histograms

Answer Key

Answers

1.

Number of Plastic Beverage Bottles per Week    Tally      Frequency
1                                              ||         2
2                                              |||| |     6
3                                              |||        3
4                                              ||         2
5                                              |||        3
6                                              |||| ||    7
7                                              |||| |     6
8                                              |          1

The number of students who replied “2” was 6. Thirty minus twenty-four is 6.

2. There is not enough information given to determine the answer.

CK-12 Advanced Probability and Statistics

Concepts

3. a)

Number of Liters per Person (ten classes of the form [60, 70), [70, 80), [80, 90), ...)

Frequency, in class order: 4, 3, 0, 1, 3, 2, 0, 1, 1, 0

b) [Figure]


c) As the number of bottles increases, the frequency generally decreases. The data is unimodal and skewed right.



4. a)

Class      Frequency   Relative frequency (%)   Cumulative frequency   Relative cumulative frequency (%)
0-25       7           50                       7                      50
25-50      1           7.1                      8                      57.1
50-75      3           21.4                     11                     78.6
75-100     1           7.1                      12                     85.7
100-125    1           7.1                      13                     92.9
125-150    0           0                        13                     92.9
150-175    0           0                        13                     92.9
175-200    0           0                        13                     92.9
200-225    1           7.1                      14                     100

b) [Graph: Relative Frequency (%) by Class]

c) [Graph: Relative Frequency (%) against BTUs]

d) [Ogive: Cumulative Relative Frequency (%) against BTUs]

e) The distribution is skewed to the right, meaning that recycling most of the materials saves a relatively small amount of energy. Only a few (aluminum cans, copper wire, and carpet) save a large amount.

f) Relative frequency total = 99.8%. The total should be 100%, but rounding the individual percentages introduces a small error.

g) A horizontal section of the ogive plot means the class contained no data (frequency = 0), so the accumulated total did not change.

h) Most of the data in the chart falls between 0 and 50 million BTUs per ton in energy savings. This is the steepest part of the ogive plot, which tells you that the steepest part contains most of the data.
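The columns of the table in 4 a) can be rebuilt from the frequency column alone. The following is a minimal sketch, assuming the nine class frequencies listed in the answer; the variable names are illustrative only.

```python
from itertools import accumulate

# Class frequencies from the table in 4 a), in class order (0-25 up to 200-225).
frequencies = [7, 1, 3, 1, 1, 0, 0, 0, 1]
total = sum(frequencies)

# Relative frequency of each class, as a percentage rounded to one decimal place.
relative = [round(100 * f / total, 1) for f in frequencies]

# Running totals give the cumulative frequency; dividing by the total gives
# the relative cumulative frequency that is plotted on the ogive.
cumulative = list(accumulate(frequencies))
relative_cumulative = [round(100 * c / total, 1) for c in cumulative]

# Adding the rounded percentages exposes the rounding error noted in part f).
rounded_total = round(sum(relative), 1)

print(relative)
print(cumulative)
print(relative_cumulative)
print(rounded_total)
```

The cumulative column repeats the value 13 wherever a class has frequency 0, which is the flat section of the ogive described in part g).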



5. a) The outliers are the first and last bars of the histogram. The first bar shows at least one CEO making a salary of $0, and the final bar shows at least one CEO making over $1 000 000.

b) The salary of $300 000 appears the most often. Approximately 14 CEOs report having this salary.

c) Approximately 6 CEOs have a salary of $500 000.

6. There is an error in the table. The first value of the eighth bin should be 66.

The data is grouped in bins such that the chosen interval is appropriate for the dataset. The histogram displays the center as being in the clustered region of 45 – 60. The histogram does not appear to be skewed and there are no obvious outliers. The data appears to represent a normal distribution.

7. A dataset that represents continuous data is easily represented using frequency tables, histograms, or frequency polygons if the dataset is large. The range of the data must also be great enough to allow appropriate intervals or bin sizes.

8. A large dataset is best when representing data using frequency tables, histograms, frequency polygons, or ogive plots. A frequency table helps to order the data, but it does not present a visual representation that is easily interpreted. A histogram displays the data as if it were spread uniformly within each class interval. The shape of the histogram provides a great deal of information about the distribution of the data. The ogive plot has the cumulative frequency on its y-axis. This representation allows for the study of medians and quartiles.



9. When the distribution’s shape is strongly skewed or has extreme outliers, the mean is pulled towards the skewed end, making it not very representative of the typical center of the dataset. When the distribution displays a positive skew, the mean is greater than the median.

When the distribution has a negative skew, the mean is less than the median. In a normal distribution, the mean is the center of the distribution.

10. Determine the range of the data (the maximum value minus the minimum value). Decide how many classes you wish to display on your graph (usually 7 – 10 bins provide a good visual display of the distribution). When you have decided, divide the range by the number of bins to determine the width of each class.
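The steps in answer 10 can be sketched in a few lines. This is only an illustration: the function name `class_width` and the sample values are hypothetical and do not come from the exercises.

```python
def class_width(data, n_bins):
    """Return the class width and the bin edges for n_bins equal-width classes."""
    low, high = min(data), max(data)
    width = (high - low) / n_bins  # range divided by the number of bins
    edges = [low + i * width for i in range(n_bins + 1)]
    return width, edges


# Hypothetical measurements, used only to illustrate the arithmetic.
sample = [12, 15, 21, 24, 30, 33, 38, 41, 47, 52]
width, edges = class_width(sample, 8)
print(width)
print(edges)
```

In practice the width is often rounded up to a convenient number so that the class boundaries are easy to read.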




2.2 Displaying Categorical Variables

Answers

1. [Graph; vertical axis from 0 to 7]

2.

Weight of Materials in a Typical Desktop Computer

Material                        Kilograms   Approx. % of Total Weight
Plastics                        6.21        23
Lead                            1.71        6.33
Aluminum                        3.83        14.18
Iron                            5.54        20.52
Copper                          2.12        7.85
Tin                             0.27        1
Zinc                            0.60        2.22
Nickel                          0.23        0.85
Barium                          0.05        0.185
Other Elements and chemicals    6.44        23.85
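The percentage column can be checked by dividing each weight by the total. The sketch below assumes the kilogram values listed above; the dictionary and variable names are illustrative.

```python
# Kilograms of each material, taken from the values listed above.
weights_kg = {
    "Plastics": 6.21,
    "Lead": 1.71,
    "Aluminum": 3.83,
    "Iron": 5.54,
    "Copper": 2.12,
    "Tin": 0.27,
    "Zinc": 0.60,
    "Nickel": 0.23,
    "Barium": 0.05,
    "Other elements and chemicals": 6.44,
}

# Total weight of the computer; the listed values add up to about 27 kg.
total_kg = sum(weights_kg.values())

# Share of the total weight for each material, as a percentage.
percent = {name: round(100 * kg / total_kg, 2) for name, kg in weights_kg.items()}

for name, share in percent.items():
    print(f"{name}: {share}%")
```

The computed shares agree with the answer key up to rounding in the last digit; for example, Aluminum comes out as 14.19 here rather than the 14.18 listed.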


3. [Chart: Percentage of Materials in a Typical Desktop Computer; legend entries include Plastics, Lead, Aluminum, Iron, Copper, Tin, Zinc]

4. Answers will vary. Bar graphs are easier to analyze here because there are so many categories.

5. [Bar graph: Grades for Statistics Class; horizontal axis: Grade (A, B, C, D, F); vertical axis from 0 to 16]

6.

7.

Grade   # Students   Approximate % of Total
A       14           48.28
B       7            24.14
C       4            13.79
D       3            10.34
F       1            3.45



8. Answers will vary. Although bar graphs are easier to analyze, the relatively few categories make the pie chart easy to analyze as well.

9. [Graph: Income of Persons age 25+ versus Highest Level of Education; horizontal axis: Highest Level of Education]



10.

Highest Level of Education     Median Income of Persons age 25+   Approximate Percentage of Total Income
High School                    $20,321                            4.96
High School Graduate           $26,505                            6.47
Some College                   $31,056                            7.58
Associate’s Degree             $35,009                            8.55
Bachelor’s Degree or Higher    $49,303                            12.04
Bachelor’s Degree              $43,143                            10.53
Master’s Degree                $52,390                            12.79
Professional Degree            $82,473                            20.13
Doctorate Degree               $69,432                            16.95

11. [Chart: Percentage of Total Income; legend entries include High School, High school graduate, Some college, Associate's degree, Bachelor's degree or higher]

12. Answers will vary. Bar graphs are easier to analyze here because there are so many categories.

13. The median is not affected by outliers, so it is the preferred measure of center, especially when the data is skewed.




2.3 Displaying Univariate Data


Answers

1.

Dot plot: [horizontal axis: Percentage, marked from 10 to 40]

2.

The distribution is uniform. The center of the data is approximately 25 with data somewhat evenly spread from 5 through to 48.

3.

Stem-and-leaf plot:

0 | 5 5
1 | 1 2 2 3 4 5 9 9
2 | 0 1 3 5 5 6 6 7 8 8 9
3 | 0 2 3 3 4 5 6 9
4 | 0 1 2 2 5 8

4.

27
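If question 4 asks for the median (an assumption, since the question text is not reproduced in this key), the value 27 can be confirmed by expanding the stem-and-leaf plot from answer 3 back into raw values. A minimal sketch, with illustrative names:

```python
from statistics import median

# Stems and leaves copied from the plot in answer 3 (stem = tens, leaf = ones).
plot = {
    0: [5, 5],
    1: [1, 2, 2, 3, 4, 5, 9, 9],
    2: [0, 1, 3, 5, 5, 6, 6, 7, 8, 8, 9],
    3: [0, 2, 3, 3, 4, 5, 6, 9],
    4: [0, 1, 2, 2, 5, 8],
}

# Rebuild each observation as 10 * stem + leaf and sort the results.
values = sorted(10 * stem + leaf for stem, leaves in plot.items() for leaf in leaves)

# With 35 observations, the median is the 18th value in sorted order.
middle = median(values)
print(len(values), middle)
```

The smallest and largest rebuilt values are 5 and 48, which matches the spread described in answer 2.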

5.

The distribution is left skewed with no outliers.

6.

The distribution is left skewed with one outlier.

7.

The distribution is symmetric with no apparent outliers.

8.

The distribution is right skewed with no apparent outliers.

9.

The first data set is symmetric with no apparent outliers. The second data set is symmetric with no apparent outliers. The third data set is bimodal. The fourth data set is evenly distributed.



10.

The first data set is centered on 52 with a large peak at 52. The second data set is centered on 52 with a peak at 52. The third data set is centered at 52 but has peaks at 25 and 85. The fourth data set has no center; all peaks are even.

11.

The first dot plot has the smallest standard deviation.

12.

The third dot plot has the largest standard deviation.

13.

Dot plots are useful with small data sets that use categorical data. When the data describes qualitative observations, measures of spread or shape are not used. These characteristics are used to describe dot plots when the categories are numerical.

14.

a) Stem-and-leaf plot

3 | 2 3 6 7 8
4 | 0 1 3 3 4 4 5 5 5 5 6 6 7 7 7 8 8 8 8 9
5 | 0 0 0 0 0 0 1 1 2 3 3 3 5 5 5 6 6 6 6 7 7 8 8 9
6 | 0 1 1 1 2 2 3 9 9
7 | 0 4

b) Dot Plot

[Dot plot: Ages of CEOs; horizontal axis from 30 to 90]

c) The data set is symmetric.

d) Outliers could include 32, 33, and 74.



15.

The data set in this example is the measurement of pulse rate of 15 teenagers. If one of the teenagers had their pulse rate measured after running a five mile marathon, this measurement would be an outlier. If, however, all of the teenagers were in a five mile marathon and had their pulse rates taken at the finish line, there would be no outliers.

16.

Yes. The outliers can be seen as they lie outside the main group of numbers.

17.

When using a five-number summary, use the interquartile range (IQR) to determine if a data set contains an outlier. The IQR is found by subtracting the first quartile value from the third quartile value. Then multiply the IQR by 1.5. If you subtract 1.5 x IQR from the first quartile value, any numbers less than this are outliers. If you add 1.5 x IQR to the third quartile value, any numbers greater than this are outliers.
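The 1.5 x IQR rule described in answer 17 can be written as a short check. This is a sketch only: the function names, the quartiles, and the data values are hypothetical and chosen to keep the arithmetic easy to follow.

```python
def outlier_fences(q1, q3):
    """Return the lower and upper cut-offs given the first and third quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(data, q1, q3):
    """List the values that fall below the lower fence or above the upper fence."""
    low, high = outlier_fences(q1, q3)
    return [x for x in data if x < low or x > high]


# Hypothetical quartiles: IQR = 30 - 20 = 10, so 1.5 x IQR = 15.
low, high = outlier_fences(20, 30)
flagged = find_outliers([2, 18, 22, 25, 29, 31, 50], 20, 30)
print(low, high)
print(flagged)
```

Values exactly equal to a fence are not flagged, because the rule in the answer uses "less than" and "greater than".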

18.

a) Stem-and-leaf

5 | 5
6 | 1 7
7 | 5 5 6
8 | 0 0 1 4 5 7 8
9 | 0 3 3 4

b) Dot Plot

[Dot plot: Exam Scores; horizontal axis from 10 to 100]

19.

Web link does not exist.



2.4 Displaying Bivariate Data


Answers

1.

The independent variable is the explanatory variable and the dependent variable is the response variable. Therefore comparing the municipal waste to each state would have the explanatory variable as the state name and the response variable as the amount of waste.

If comparing the percentage of each state in the union versus the amount of waste, the percentage would be the explanatory variable and the response variable would be the amount of waste.

2.

13 386 000 tons

3.

[Scatter plot: Percentage of State in Union vs Amount of Municipal Waste; horizontal axis: Percentage of State in Union, 0 to 60; vertical axis from 0 to 50000]

4.

The direction is positive but there is a weak correlation between the two variables.

5.

There is a decrease in the recycling rate of plastic bottles made from PET and an increase in the recycling rate of HDPE.



6.

PET recycling went from about 33% to about 22%, a total decrease of about 10-12 percentage points from 1995 to 2001.

7.

One explanation was that there was an increase in the use of HDPE in recycling containers and this type of recycled material is used more often in the production of plastic lumber, tables, roadside curbs, benches, truck cargo liners, trash receptacles, stationery (e.g. rulers) and other durable plastic products.

8.

This change was the most rapid from the middle of 1995 to the middle of 1996.

9.

Dot plots allow for the interpretation of shape, center, and spread but are only used for small sets of data. Stem and leaf plots are useful for seeing the shape of the distribution of data. Both of these plots are used for univariate data sets to determine if the data is symmetric or skewed, to see any gaps and spot outliers. A scatter plot is useful for determining trends in data and the correlation between the explanatory and response variables. Scatter plots are used for bivariate data sets to see the general relationship between the variables.

10.

Median and IQR can be used to describe any set of data but are particularly useful for skewed data. When data is skewed or has extreme outliers, the mean is pulled toward the skewed end. This makes the mean not representative of the middle




2.5 Box-and-Whisker Plots


Answers

1.

Min X: 35
Lower Quartile: 53
Median: 67.5
Upper Quartile: 75.5
Max X: 95

2.

$IQR = Q_3 - Q_1 = 75.5 - 53 = 22.5$

$Q_1 - 1.5 \times IQR = 53 - 33.75 = 19.25$

$Q_3 + 1.5 \times IQR = 75.5 + 33.75 = 109.25$

There are no data values less than 19.25 and none greater than 109.25. Therefore, there are no outliers.

3.

4.

The third quarter of the data is more densely concentrated in a smaller area. 50% of the data is between 53 and 75.5. The data is very close to being symmetric although the data does skew slightly to the left.

The median of the data is 67.5. The mean should be pulled left in the direction of the skewness and thus be smaller than the median. The mean of the data is 65.7.

5.

Min X: 0
Lower Quartile: 72
Median: 82
Upper Quartile: 89
Max X: 105

6.

$IQR = Q_3 - Q_1 = 89 - 72 = 17$

$Q_1 - 1.5 \times IQR = 72 - 25.5 = 46.5$

$Q_3 + 1.5 \times IQR = 89 + 25.5 = 114.5$

There are three data values 0, 4, and 46 that are less than 46.5. These values are outliers for this data set. There are no data values greater than 114.5.

7. The data in the lower 25% are widely spread compared to the other sections of the graph. 50% of the data is between 72 and 89. The data is moderately symmetric although it does skew to the left.




8.


The median of the data is 82. The mean should be pulled left in the direction of the skewness and should be considerably smaller than the median. The mean of the data is 75.4.

9.

The median of the data for Utah is higher than that of Idaho. This indicates that the reservoirs in Idaho are less full than those in Utah. The IQR of the dataset for Idaho is 22.5, spread out between capacities of 53 and 75.5. The IQR of the dataset for Utah is 17, spread out between capacities of 72 and 89. Therefore, the middle 50% of the capacity percentages in Utah are more concentrated than those of Idaho.




2.6 Effects on Box-and-Whisker Plots


Answers

1.

Min X: 3.12
Lower Quartile: 3.22
Median: 3.282
Upper Quartile: 3.393
Max X: 3.528

2.

$IQR = Q_3 - Q_1 = 3.393 - 3.22 = 0.173$

$Q_1 - 1.5 \times IQR = 3.22 - 0.2595 = 2.9605$

$Q_3 + 1.5 \times IQR = 3.393 + 0.2595 = 3.6525$

There are no outliers since there are no data values less than 2.9605 and none greater than 3.6525.

3.

4.

Min X: 0.8242
Lower Quartile: 0.8506
Median: 0.8670
Upper Quartile: 0.8963
Max X: 0.9320

The center and the measures of spread for the given dataset will be multiplied by a factor of 1/3.7854, or about 0.2642. The boxplots for both datasets will have the same shape, but the plot for US gallons will be stretched out more.
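The effect of the unit change can be checked directly. The sketch below assumes the five-number summary in US$ per gallon from answer 1 and the conversion factor 3.7854 liters per gallon quoted above; the variable names are illustrative.

```python
LITERS_PER_GALLON = 3.7854  # conversion factor quoted in the answer

# Five-number summary in US$ per gallon, from answer 1.
per_gallon = {"min": 3.12, "q1": 3.22, "median": 3.282, "q3": 3.393, "max": 3.528}

# Dividing every value by the same constant rescales the whole summary.
per_liter = {name: value / LITERS_PER_GALLON for name, value in per_gallon.items()}

# A measure of spread such as the IQR is rescaled by the same factor.
iqr_gallon = per_gallon["q3"] - per_gallon["q1"]
iqr_liter = per_liter["q3"] - per_liter["q1"]
ratio = iqr_liter / iqr_gallon

for name, value in per_liter.items():
    print(name, round(value, 4))
print(round(ratio, 4))
```

Because every summary value shrinks by the same factor, the box and whiskers keep their proportions and only the scale of the axis changes.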




5.

State            Average Price of a Gallon of Gas (US$)   Average Price of a Liter of Gas (US$)
Alaska           3.458                                    0.914
Washington       3.528                                    0.932
Idaho            3.26                                     0.861
Montana          3.22                                     0.851
North Dakota     3.282                                    0.867
Minnesota        3.12                                     0.824
Michigan         3.352                                    0.886
New York         3.393                                    0.896
Vermont          3.252                                    0.859
New Hampshire    3.152                                    0.833
Maine            3.309                                    0.874


6. This data was retrieved from the website http://fuelgaugereport.opisnet.com/sbsavg.html on July 5, 2014. The data will vary depending on the day.

State            Average Price of a Gallon of Gas (US$)
Alaska           4.121
Washington       4.120
Idaho            3.789
Montana          3.591
North Dakota     3.705
Minnesota        3.588
Michigan         3.869
New York         3.809
Vermont          3.756
New Hampshire    3.658
Maine            3.722

7. The prices are already in US$.



8. A dot plot and a stem-and-leaf plot are used when the dataset consists of a small number of values. A histogram and a box-and-whisker plot are used when the dataset is large.

9. Histograms and Box-and-Whisker Plots

10. The center of the distribution would change by the same scale factor. Calculations like the range, the IQR and the standard deviation will change proportionally by the same scale factor.

The five-number summary would also change proportionally.

11. e

12. a) The median is 121.5.

b) The lower quartile is 114. The upper quartile is 129.5.

c) $IQR = Q_3 - Q_1 = 129.5 - 114 = 15.5$

d) $Q_1 - 1.5 \times IQR = 114 - 23.25 = 90.75$ and $Q_3 + 1.5 \times IQR = 129.5 + 23.25 = 152.75$

There are no outliers since no data values are less than 90.75 or greater than 152.75.

e)

