MATH 1342 BPS 4th ed. HW Ch. 1 notes, last updated 09/09/07

Send corrections/comments to Mary Parker at mparker@austincc.edu

Homework Notes Chapter 1

These notes do not include full answers. Students are expected to read the full answers in StatsPortal before looking at these notes. These notes contain more details about what you should be thinking about and learning from each of the exercises included in the list here.

I am not promising to do this for all chapters. I am doing it for Chapter 1 because, for some students, this chapter seems so different from the material in their previous math courses that they don’t know what they are supposed to be looking for or thinking about.

HW Chapter 1: [1.1, 1.3, 1.5, 1.7, 1.9, 1.11, 1.13-1.22], 1.23, 1.27, 1.29, 1.31, 1.32, 1.39, 1.41, 1.44(M), 1.45(T)

1.1. Although the number of cylinders is a counting number, some people think it should be considered a categorical variable, because it is often used to divide cars into categories: 4-cylinder cars, 6-cylinder cars, etc. Certainly, when you graph this variable, it is just as reasonable to graph it as a categorical variable as it is to graph it as a quantitative variable.

1.3 b. Do you know why it would NOT be correct to make a pie chart if you DON’T add the “other” category? That’s because the categories listed here don’t represent all possible choices, so their percentages don’t add up to 100%. To construct a pie chart, you must have percentages that sum to 100%.
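Here is a minimal Python sketch of that check, assuming the percentages are stored in a dictionary. The category names and numbers are made up for illustration (they are not the data from exercise 1.3), and the function name `complete_for_pie_chart` is hypothetical.

```python
def complete_for_pie_chart(percents, tolerance=0.05):
    """Return a copy of percents that sums to 100, adding an "Other" slice if needed.

    percents maps category name -> percentage. The small tolerance allows
    for rounding in published percentages.
    """
    total = sum(percents.values())
    if total > 100 + tolerance:
        raise ValueError(f"percentages sum to {total}, more than 100%")
    completed = dict(percents)
    if total < 100 - tolerance:
        # The listed categories do not cover every possible choice,
        # so the leftover percentage becomes its own slice.
        completed["Other"] = round(100 - total, 1)
    return completed


# Hypothetical survey percentages that do not cover every possible answer.
survey = {"Choice A": 40.0, "Choice B": 25.5, "Choice C": 20.0}
print(complete_for_pie_chart(survey))
# {'Choice A': 40.0, 'Choice B': 25.5, 'Choice C': 20.0, 'Other': 14.5}
```

Only after the "Other" slice is added do the slices make up the whole pie.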

1.5. If you don’t understand the answer in the book, it is a good idea to consider another example. Here is a set of questions about a different data set that is provided in full, so it is easier to compare the questions and answers.

In a particular business college which offers three majors, the following table gives the numbers of men and women in various majors.

                 Men   Women   Total
Accounting        83      69     152
Administration   102     110     212
Finance           53      37      90
Total            238     216     454

Consider the difference between these two questions:

What percentage of the women are accounting majors?

What percentage of the accounting majors are women?

Each of these percentages comes from a fraction. They have the same numerators, but not the same denominators.

What percentage of the women are accounting majors?

Answer: 69/216 = 0.3194 = 31.94%

What percentage of the accounting majors are women?

Answer: 69/152 = 0.4539 = 45.39%

Notice that to compute both of these percentages, you need both totals (the total number of women and the total number of accounting majors), as well as the count of women accounting majors.
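To check the arithmetic, here is a short Python sketch that stores the business-college table as a dictionary and computes both percentages. The helper names `row_total` and `column_total` are illustrative, not from any library. The two calculations share the numerator 69 and differ only in the denominator.

```python
# Counts from the business-college table: major -> {sex: count}
table = {
    "Accounting": {"Men": 83, "Women": 69},
    "Administration": {"Men": 102, "Women": 110},
    "Finance": {"Men": 53, "Women": 37},
}


def row_total(table, major):
    """Total number of students in one major (a row total)."""
    return sum(table[major].values())


def column_total(table, sex):
    """Total number of students of one sex across all majors (a column total)."""
    return sum(counts[sex] for counts in table.values())


women_in_accounting = table["Accounting"]["Women"]  # the shared numerator, 69

# Percentage of the women who are accounting majors: denominator is all women.
pct_of_women = 100 * women_in_accounting / column_total(table, "Women")
# Percentage of the accounting majors who are women: denominator is all accounting majors.
pct_of_accounting = 100 * women_in_accounting / row_total(table, "Accounting")

print(f"{pct_of_women:.2f}%")       # 31.94%
print(f"{pct_of_accounting:.2f}%")  # 45.39%
```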

In the exercise in our text about young people and MP3 players, we had neither the totals nor the counts, so there was not enough information to answer the question in exercise 1.5.

1.6. Each student should know how to construct a histogram. The steps are illustrated in Example 1.5. If looking at that example does not make it completely clear to you how to construct a histogram, and how the histogram shows the shape of the data, then you should practice it by doing exercise 1.6. Pay attention specifically to each of these steps.

Look over the data to see the smallest and largest observations, so you know what range you need on your horizontal axis.

Decide on a reasonable “bin size” and a reasonable starting value. Here they tell you to use a bin size of 2 and start at 15. An example of unreasonable choices would be to use a bin size of 3.4 and start at 14.7. That’s a joke! Probably you wouldn’t choose something like that. But the point is to choose something that makes the resulting graph easy to read.

Write your classes, with some room beside each one. (Each observation should fit into exactly one category. For observations exactly on an endpoint, make some reasonable decision about whether you’ll put it into the lower interval or the upper interval and then do it consistently for the entire histogram.)

Go through the data, making a tally mark for each observation in the correct class.

When you are finished, count your tally marks so that you know the frequency for each class. (The number of observations in that interval is the frequency for that class.)

Look at the endpoints of your bins and at your lowest and highest frequencies, and let those guide how you choose the scales along the horizontal and vertical axes.

Draw your histogram, with your bins along the horizontal axis and your frequencies up the vertical axis.

Result:

This tally is only partially done, in order to illustrate how you should do it. I filled in the tally marks for Alabama through Colorado.

You should continue that. When you have put a tally mark for every state, count the marks and fill in the count column.

Classes                     Tally      Count
15.0 ≤ observation < 17     etc.           4
17.0 ≤ observation < 19     | etc.         4
19.0 ≤ observation < 21     | etc.         7
21.0 ≤ observation < 23     | | etc.      13
23.0 ≤ observation < 25     | etc.        14
25.0 ≤ observation < 27     | etc.         4
27.0 ≤ observation < 29     etc.           3
29.0 ≤ observation < 31     etc.           2
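The tallying steps can also be sketched in Python. This is a minimal illustration, assuming each class includes its lower endpoint and excludes its upper one, and using a handful of made-up observations rather than the state data. The function names `tally` and `text_histogram` are hypothetical. The text version draws the bars sideways, so the classes run down the page instead of along a horizontal axis.

```python
import math


def tally(observations, start, width, n_classes):
    """Count how many observations fall in each class.

    Class k covers start + k*width <= x < start + (k+1)*width, so a value
    exactly on an endpoint always goes into the upper class.
    """
    counts = [0] * n_classes
    for x in observations:
        k = math.floor((x - start) / width)  # which class x belongs to
        if not 0 <= k < n_classes:
            raise ValueError(f"{x} falls outside the classes")
        counts[k] += 1
    return counts


def text_histogram(counts, start, width):
    """One line per class: the class endpoints, then a bar of '#' as long as the count."""
    lines = []
    for k, count in enumerate(counts):
        low = start + k * width
        lines.append(f"{low:g} to {low + width:g} | {'#' * count}".rstrip())
    return lines


# Hypothetical observations (not the data from exercise 1.6); 17.0 is on an endpoint.
data = [16.2, 17.0, 18.4, 21.5, 22.1, 24.9]
counts = tally(data, start=15, width=2, n_classes=8)
print(counts)  # [1, 2, 0, 2, 1, 0, 0, 0]
for line in text_histogram(counts, start=15, width=2):
    print(line)
```

Because 17.0 sits exactly on an endpoint, the rule in `tally` sends it to the 17-to-19 class, which is the kind of consistent decision the steps ask you to make.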

1.7 The main thing that determines the shape of a histogram is the data you’re graphing, but the choice of bin size also has some effect on the shape. This problem illustrates that fact. The applet is a very ingenious little program that makes it easy to see the effect of changing the bin size. Please use it and answer the questions. If you can’t easily get to the Web to open StatsPortal when you’re working on this, the applet is also on the CD in the back of your text.

1.9. To find the midpoint here, you must recognize that this means the midpoint of the observed data, not the midpoint of the horizontal axis. There are either 50 or 51 observations, because there is one for each state. (If they include one for Washington, DC, then there are 51.) So we need the value that has about 25 observations below it and 25 observations above it. Now count how many observations fall in each class, starting from the lowest.

Between 0 and 5 percent are 23 observations.

Between 5 and 10 percent are 14 observations.

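We can stop here. Only 23 observations are below 5 percent, which is fewer than 25, but 23 + 14 = 37 observations are below 10 percent, which is more than 25. So the observation that has about 25 below it and 25 above it is somewhere in the interval from 5 to 10 percent, and the class that contains the midpoint is the class from 5 to 10 percent.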
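The same counting idea, written as a short Python sketch: add up the class counts from the lowest class until the running total reaches the middle position. It uses only the two class counts given above and assumes the classes are listed from lowest to highest; `class_containing_middle` is a hypothetical name.

```python
def class_containing_middle(class_counts, n):
    """Return the class holding the middle observation of n sorted observations.

    class_counts lists ((low, high), count) pairs from the lowest class up.
    It may stop early: we only need enough classes to pass the middle.
    """
    middle = (n + 1) / 2  # 25.5 when n = 50, 26 when n = 51
    running_total = 0
    for bounds, count in class_counts:
        running_total += count
        if running_total >= middle:
            return bounds
    raise ValueError("these classes do not reach the middle observation")


# The first two classes from the notes: 23 states below 5%, 14 from 5% to 10%.
first_classes = [((0, 5), 23), ((5, 10), 14)]
print(class_containing_middle(first_classes, 50))  # (5, 10)
print(class_containing_middle(first_classes, 51))  # (5, 10)
```

Either way you count the states (50 or 51), the answer is the same class.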

1.11. In this problem, they tell you to round the data and to split stems. Before we actually do that, let’s discuss why they tell you that.

Suppose they hadn’t said that and you started to make a stemplot of the data as given. Since the last digit would be the leaf and the digits before it would be the stem, we’d need stems all the way from 7 to 35 (for the 70s through the 350s). That’s 29 stems, far more classes than we would want for a graph unless the dataset had MANY observations. Generally speaking, it’s a good idea to have between 6 and 20 classes for a graph.

So first we round each observation to the nearest ten, so that its last digit is zero, and then we can use the tens digit as the leaf of the stemplot. Here are the rounded data:

140 160 110 150 130 100 100 80 150

170 200 170 100 170 360 150 150 260

Now, since the leaves are the tens digits, the stems are the hundreds digits. So there are only four stems:

0
1
2
3

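Now, four stems don’t give enough different classes for a good graph. We want between 6 and 20 classes. If we split each stem in two, we get eight classes and a better stemplot: the first line for each stem holds the leaves 0 through 4, and the second line holds the leaves 5 through 9. Here are the split stems:

0
0
1
1
2
2
3
3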

Now, I’ll put the first four rounded observations onto this to indicate how to get started:

0 |
0 | etc.
1 | 4 1 etc.
1 | 6 5 etc.
2 | etc.
2 |
3 |
3 |

See the answer key in StatsPortal for the plot and the answers to the rest of the questions in the problem.
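For checking your hand-drawn plot, here is a minimal Python sketch of a split-stem stemplot built from the rounded data above. It assumes the values are whole numbers already rounded to tens, uses the hundreds digit as the stem and the tens digit as the leaf, and keeps the leaves in the order the data are listed (the same order used in the partial plot). The function name `split_stemplot` is hypothetical.

```python
def split_stemplot(values):
    """Build a split-stem stemplot for whole numbers already rounded to tens.

    Stem = hundreds digit, leaf = tens digit. Each stem appears twice:
    leaves 0-4 on the first line and leaves 5-9 on the second.
    Leaves stay in the order the data are listed, as when working by hand.
    """
    stems = [v // 100 for v in values]
    lines = {}
    for stem in range(min(stems), max(stems) + 1):
        lines[(stem, 0)] = []  # leaves 0-4
        lines[(stem, 1)] = []  # leaves 5-9
    for v in values:
        stem, leaf = v // 100, (v // 10) % 10
        lines[(stem, 0 if leaf < 5 else 1)].append(leaf)
    return [
        f"{stem} | {' '.join(str(leaf) for leaf in leaves)}".rstrip()
        for (stem, _half), leaves in lines.items()
    ]


rounded = [140, 160, 110, 150, 130, 100, 100, 80, 150,
           170, 200, 170, 100, 170, 360, 150, 150, 260]
for line in split_stemplot(rounded):
    print(line)
```

The first four values reproduce the start shown above: 4 and 1 on the first 1 stem, 6 and 5 on the second.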

1.12 Notice that a timeplot has the values of the variable up the vertical axis and time along the horizontal axis, which is different from any other graph we have studied in this chapter. Usually we make timeplots using software. In CrunchIt, this is called an index plot. Since the point of a timeplot is to see whether there is a trend over time, we sometimes use timeplots whose horizontal axis is not labeled with the actual year (or month, etc.) but only with an “index number”: 1 for the first year shown, 2 for the second year shown, and so on. In that case, you won’t have anywhere to enter the column of years when you tell the software to make the timeplot. That’s OK. In MINITAB, you can choose options to display the actual year along the horizontal axis rather than just an index number, if you really want to.

1.27 It is hard to estimate percentages accurately from a pie chart. So here, when asked for the percentage of Mexican origin, you should see that the slice is more than half and less than three quarters of the whole, so you should estimate that percentage as larger than 50% and smaller than 75%. I’d estimate about 60%, but that could easily be off by as much as 5 percentage points. Use the same idea to estimate the percentage of Puerto Rican origin. Did you get an answer that was reasonably close to the answer in the book/ebook?

1.31. Just because this exercise asks you to ignore the outlier to answer the first question doesn’t mean that it is always a good idea to ignore outliers. In fact, the only time it is appropriate to ignore outliers completely is when you can confidently determine that they are actual errors in reporting the data or that they come from a case that doesn’t really belong with the other cases. Otherwise, you should report the results of your analysis both with and without the outliers. In this exercise, they ask you to ignore the outlier because they have judged that October 1987, when there was a stock market crash, is not useful in summarizing the performance of stocks overall. You may or may not agree with that judgment, but you can answer the questions they ask with “The center of the distribution, without the outlier of October 1987, is ….”
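A small Python sketch of reporting a summary both with and without a flagged outlier. The numbers are hypothetical monthly returns, not the exercise data, and the last value only stands in for a crash month. Here the center is summarized by the median and the spread by the smallest and largest values; which observation to exclude is your decision, passed in explicitly, not something the code detects.

```python
from statistics import median


def summarize(values):
    """Center (median) and spread (smallest and largest value) of a list."""
    return {"median": median(values), "min": min(values), "max": max(values)}


def with_and_without(values, excluded_index):
    """Summaries computed both with and without one flagged observation."""
    kept = [v for i, v in enumerate(values) if i != excluded_index]
    return {"with": summarize(values), "without": summarize(kept)}


# Hypothetical monthly returns in percent; the last one plays the role of a crash month.
returns = [1.2, -0.5, 2.3, 0.8, -1.1, 1.5, -22.0]
report = with_and_without(returns, excluded_index=6)
print(report["with"])     # {'median': 0.8, 'min': -22.0, 'max': 2.3}
print(report["without"])  # {'median': 1.0, 'min': -1.1, 'max': 2.3}
```

Reporting both lines lets a reader see how much the one unusual month changes the spread, even though the center moves only a little.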

1.32. Notice that it is pretty easy to determine that numbers 1 and 2 must go with graphs b and c.

Graph b illustrates a situation where one of the options is much more prevalent than the other. Is it clear to you which of 1 and 2 that must be? That leaves numbers 3 and 4 to go with graphs a and d. Notice that graph a is very skewed to the right. Which of numbers 3 and 4 is more likely to have a strongly skewed distribution?

1.39. To prepare to answer part a, just look at the counts. From the counts, you can see which category of drivers causes the MOST accidents. Does this make sense? Why or why not? Does it seem relevant that this category has a very large number of drivers?

This exercise was carefully selected to inspire you to think about the meaning of percentages and why we use them.

If these data don’t make this point clearly for you, let’s try something related to school. Consider Mr. Wendall’s marketing class, whose tests have 100 short questions; he grades students by taking off one point per missed question from a perfect grade of 100. Then consider Mr. Janow’s marketing class, whose tests have 50 questions; he also grades students by taking off one point per missed question from a perfect grade of 100.

Would it be fair to compare students from these two classes by using their grades on these tests?

Do you see why teachers usually grade tests on a percentage of the questions that are correct rather than the number of questions that are correct?
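Here is the grading comparison worked out in a short Python sketch, assuming a hypothetical student who misses 10 questions on each teacher's test. The grades come out the same, but the percentage of questions answered correctly does not. The function names are illustrative only.

```python
def grade(questions_missed):
    """Both teachers start at 100 and take off one point per missed question."""
    return 100 - questions_missed


def percent_correct(questions_missed, n_questions):
    """Share of the test's questions answered correctly, as a percentage."""
    return 100 * (n_questions - questions_missed) / n_questions


# A hypothetical student who misses 10 questions in each class.
wendall = (grade(10), percent_correct(10, n_questions=100))  # 100-question tests
janow = (grade(10), percent_correct(10, n_questions=50))     # 50-question tests
print(wendall)  # (90, 90.0)
print(janow)    # (90, 80.0)
```

A grade of 90 means 90% correct in one class but only 80% correct in the other, which is why comparing the raw grades would be unfair.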

1.41 I don’t think any of the software you have available easily does two timeplots on one graph. Make these timeplots by hand. The horizontal axis should have the time and the vertical axis the number of problems per 100 vehicles. Use two different colored pencils for the two types of vehicles.

1.44. Use MINITAB for this problem to help you get started working on MINITAB. The point of this problem is for you to see that you can look at the same data in more than one way and different types of graphs are appropriate depending on what way you want to look at the data.

For part a, if you want to describe the shape, center, and spread of a distribution, then a histogram is useful. (A stemplot with appropriately split stems could also have been used, but they didn’t ask for that.)

For part b, if you want to see whether there is a trend in the values over time, a histogram is not helpful at all but a timeplot is very useful.

When students are just getting used to timeplots, they tend to focus on the large amount of up-and-down movement, think that it shows cycles, and fail to notice that the overall trend of the graph is upward. But the up-and-down movement here isn’t really cycles. The ups and downs aren’t very regular, and most of them don’t last over several years. It is just that the number of attacks quite often differs a lot from one year to the next.

1.45 Some students ask whether they are really just being asked their preference here and any answer is OK. Yes, you are being asked your preference. But you are supposed to think of yourself in the role of a person who is trying to explain to someone else what information the data provides. Since you are giving a stemplot, that means you think that the shape, center, and/or spread of the distribution is important. So which of these stemplots do you think best conveys that information and why?
