CHAPTER 2
1. You have data on the summer earnings of a sample of 1,000 high school students. What kind
of graph should you use to describe the distribution of their earnings?
(a) Bar graph.
(b) Line graph.
(c) Histogram. (Quantitative data)
(d) Pie chart.
(e) None of these.
Here is a dotplot of the adult literacy rates in 177 countries in 2008, according to the United
Nations. For example, the lowest literacy rate was 23.6%, in the African country of Burkina
Faso.
2. The overall shape of this distribution is
(a) clearly skewed to the right
(b) clearly skewed to the left
(c) roughly symmetric
(d) no clear shape
3. The mean of this distribution (don't try to find it) is certainly
(a) very close to the median.
(b) clearly less than the median.
(c) clearly greater than the median.
(d) can't say because the mean is random.
4. Based on the shape of this distribution, what numerical measures would best describe it?
(a) the five-number summary.
(b) the mean and standard deviation.
(c) the mean and the quartiles.
(d) the median and the standard deviation.
(e) none of these
5. The mean of a distribution of scores is given to be x̄ = 43, with a standard deviation of s = 4. If five
is added to each of the values in the distribution, the new mean and standard deviation will be,
respectively:
(a) x̄ = 43 and s = 9
(b) x̄ = 43 and s = 9
(c) x̄ = 43 and s = 20
(d) x̄ = 48 and s = 2
(e) x̄ = 48 and s = 4
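A quick way to check the idea behind question 5 is to shift a small data set and recompute both statistics. The sketch below uses made-up scores (they are not from the worksheet) and Python's standard `statistics` module; adding a constant should move the mean by that constant and leave the standard deviation alone.

```python
from statistics import mean, stdev

# Hypothetical scores, chosen only for illustration (not from the worksheet).
scores = [39, 41, 43, 45, 47]

# Add five to every value, as in question 5.
shifted = [x + 5 for x in scores]

mean_before, sd_before = mean(scores), stdev(scores)
mean_after, sd_after = mean(shifted), stdev(shifted)

# The mean moves up by 5; the sample standard deviation is unchanged.
print(mean_before, mean_after)
print(sd_before, sd_after)
```

Shifting every value slides the whole distribution along the axis without changing how spread out it is.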
6. Last year, students in a Statistics class were given a survey and asked how many cans of soda
they had consumed the week before the survey. It turned out that these students consumed an
average of 6.25 cans of soda with a median of 4 cans and a standard deviation of 6.9 cans. A
histogram of the data looks like:
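Which histogram above would best represent the distribution of soda consumed the week
before the survey was taken? B. The mean (6.25) is well above the median (4), so the
distribution is skewed right, with most values piled up at the low end.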
7. If a bar graph is to be accurate, it is essential that
(a) the bars touch each other.
(b) the bars be drawn vertically.
(c) both horizontal and vertical scales be clearly marked in equal units.
(d) the bars all have the same width.
(e) the explanatory variable be plotted on the horizontal axis.
8. Which of these statements about the standard deviation s is true?
(a) s is always 0 or positive.
(b) s should be used to measure spread only when the mean x̄ is used to measure center.
(c) s is a number that has no units of measurement.
(d) Both (a) and (b), but not (c).
(e) All of (a), (b), and (c).
9. The five-number summary of the distribution of scores on the final exam in Psych 001 last
semester was 18 39 62 76 100. A total of 416 students took the exam. About how
many students had scores above 39?
(a) 416
(b) 312
(c) 104
(d) 400
(e) 250
10. The 5-number summary for a univariate data set is given by
{min = 5, Q1 = 18, Med = 20, Q3 = 40, max = 75}. If you wanted to construct a modified
boxplot for the dataset (that is, one that would show outliers, if any existed), what would be
the maximum possible length of the right side “whisker”?
(a) 33
(b) 35
(c) 45
(d) 53
(e) 55
11. Which of the following is likely to have a mean that is smaller than the median?
(a) The salaries of all National Basketball Association players.
(b) Amounts awarded by juries in lawsuits involving injuries.
(c) The prices of homes in a large city.
(d) The finishing times in a long-distance race in which most runners took a long time but a
few finished rather quickly.
(e) The scores of students (out of 100 points) on a very easy exam in which most get
nearly perfect scores but a few do very poorly.
(I like both (d) and (e).)
12. A biologist has gathered data on a population of bears in the forests of the northeast. A
frequency polygon plot of the weights of the sample of bears and their sex is given below.
Based on the plot, which statement below is TRUE?
[Figure: frequency polygon "Weights of Bears". Horizontal axis: Weight (0 to 480); vertical axis: Frequency (0 to 14); separate lines for Female and Male bears.]
(a) Since the distributions overlap, there is not much difference between the weights of male
and female bears.
(b) The female bears have a higher mean weight than the male bears and also exhibit more
variability in those weights.
(c) The female bears have a higher mean weight than the male bears and also exhibit less
variability in those weights.
(d) The male bears have a higher mean weight than the female bears and also exhibit
more variability in those weights.
(e) The male bears have a higher mean weight than the female bears and also exhibit less
variability in those weights.
14. Here are the yearly wages of 30 randomly selected full-time employed people who hold at
least a Bachelor’s degree. The data are in thousands of dollars, rounded to the nearest
thousand. They come from the Current Population Survey for March 2009.
69 84 102 41 57 61 68 97 91 217
43 78 63 58 43 48 39 57 23 41
62 46 80 51 75 95 19 31 185 32
(a) Make an appropriate graph of these data. Describe the overall shape of the distribution.
Are there any clear outliers?
Example below. Yes, there are definite outliers. Shape: relatively symmetric, but with a couple
of high outliers (or skewed right).
[Figure: dot plot of Wages (thousands of dollars), horizontal axis from 0 to 220.]
(b) Based on your findings in part (a), choose a numerical summary for this distribution.
Calculate your summary, and justify your choice.
Median/IQR. Median = 59.5 thousand dollars, IQR = 80-43 = 37 thousand dollars.
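The median and IQR above can be checked from the 30 wages, and the same numbers back up the outlier claim in part (a). This is a sketch only: it assumes the quartiles are the medians of the lower and upper halves of the sorted data (the usual textbook and calculator convention; other software may use a different rule) and uses the 1.5 × IQR rule to flag outliers.

```python
from statistics import median

# The 30 wages from question 14, in thousands of dollars.
wages = [69, 84, 102, 41, 57, 61, 68, 97, 91, 217,
         43, 78, 63, 58, 43, 48, 39, 57, 23, 41,
         62, 46, 80, 51, 75, 95, 19, 31, 185, 32]

data = sorted(wages)
half = len(data) // 2          # 15 values in each half (n is even)

med = median(data)
q1 = median(data[:half])       # median of the lower half (assumed convention)
q3 = median(data[-half:])      # median of the upper half (assumed convention)
iqr = q3 - q1

# 1.5 x IQR rule for flagging outliers.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

print(med, q1, q3, iqr)   # 59.5 43 80 37
print(outliers)           # [185, 217]
```

The two flagged wages are the reason the median and IQR are a safer summary here than the mean and standard deviation.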
1. For a normal distribution with mean 20 and standard deviation 5, approximately what percent
of the observations will be between 5 and 35?
(a) 50%
(b) 68%
(c) 95%
(d) 99.7%
(e) 100%
2. Two measures of center are marked on the density curve above.
(a) The median is at the dashed line and the mean is at the solid line.
(b) The median is at the solid line and the mean is at the dashed line.
(c) The mode is at the dashed line and the median is at the solid line.
(d) The mode is at the solid line and the median is at the dashed line.
(e) None of these is correct.
3. Items produced by a manufacturing process are supposed to weigh 90 grams. However, the
manufacturing process is such that there is variability in the items produced and they do not
all weigh exactly 90 grams. The distribution of weights can be approximated by a normal
distribution with a mean of 90 grams and a standard deviation of 1 gram. Using the 68–95–99.7
rule, what percentage of the items will either weigh less than 88 grams or more than 92 grams?
(a) 0.3%
(b) 3%
(c) 5%
(d) 95%
(e) 99.7%
4. Which of the following is least likely to have a nearly normal distribution?
(a) Heights of all female students taking Statistics at Franklin Academy.
(b) IQ scores of all students taking Statistics at Franklin Academy.
(c) The SAT Math scores of all students taking Statistics at Franklin Academy.
(d) Family incomes of all students taking Statistics at Franklin Academy.
(e) Time from conception to birth of all students taking Statistics at Franklin Academy.
5. Scores on the American College Testing (ACT) college entrance exam follow the normal
distribution with mean 18 and standard deviation 6. Wayne's standard score on the ACT was
-0.7. What was Wayne’s actual ACT score?
(a) 4.2
(b) -4.2
(c) 9.6
(d) 13.8
(e) 22.2
6. The test grades at a large school have an approximately normal distribution with a mean of
50. What must the standard deviation of the data be so that 80% of the students are within 12
points (above or below) of the mean?
(a) 5.875
(b) 9.375
(c) 10.375
(d) 14.5
(e) cannot be determined from the given information
The death rates from heart disease per 100,000 people in a group of developed countries were
recorded. The distribution is roughly described by this normal curve:
7. From this normal curve, we see that the mean heart disease death rate per 100,000 people is
about:
(a) 60
(b) 120
(c) 190
(d) 250
(e) 400
8. From the normal curve, we see that the standard deviation of the heart disease rate per
100,000 people is closest to
(a) 25
(b) 65
(c) 100
(d) 200
(e) 400
9. Which of the following are true statements?
I. In all normal distributions, the mean and median are equal.
II. All bell-shaped curves are normal distributions no matter what the
particular mean and standard deviation are.
III. Virtually all the area under a normal curve is within three standard
deviations of the mean, no matter what the particular mean and standard
deviation are.
(a) I only
(b) I and II
(c) II and III
(d) I, II, and III
(e) I and III
Technically, (e) is the answer, but based on what our book told you I’d take (d) too.
10. Suppose that adult women in China have heights that are normally distributed with mean 155
centimeters and standard deviation 8 centimeters. Adult women in Japan have heights which
are normally distributed with mean 158 centimeters and standard deviation 6 centimeters.
Which country has the higher percentage of women taller than 167 centimeters?
(a) China (z = 1.5)
(b) Japan (z = 1.5)
(c) The percentages are the same.
(d) It is not possible to tell from the information given.
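The z = 1.5 notes on the first two options can be reproduced directly. The sketch below standardizes the 167 cm cutoff under each country's stated mean and standard deviation, then uses `statistics.NormalDist` from the Python standard library (a stand-in for Table A or a calculator) to get the share of women above the cutoff.

```python
from statistics import NormalDist

cutoff = 167  # centimeters

# Height models as stated in question 10.
china = NormalDist(mu=155, sigma=8)
japan = NormalDist(mu=158, sigma=6)

# Standardize the cutoff under each model.
z_china = (cutoff - china.mean) / china.stdev
z_japan = (cutoff - japan.mean) / japan.stdev

# Proportion of women taller than the cutoff in each country.
p_china = 1 - china.cdf(cutoff)
p_japan = 1 - japan.cdf(cutoff)

print(z_china, z_japan)
print(round(p_china, 4), round(p_japan, 4))
```

Equal z-scores mean equal upper-tail areas, even though the two curves have different centers and spreads.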
11. Which one of the following would be a correct interpretation if you have a z-score of +2.0 on
an exam?
(a) It means that you missed two questions on the exam.
(b) It means that you got twice as many questions correct as the average student.
(c) It means that your grade was two points higher than the mean grade on this exam.
(d) It means that your grade was in the upper 2% of all grades on this exam.
(e) It means that your grade is two standard deviations above the mean for this exam.
12. Blood pressure for 47-year-old males in the United States is normally distributed
with a mean of 139 mg and a standard deviation of 26 mg. A doctor tells a 47-year-old male
patient that he is in the lowest 10% of all people in this population. Which one of the values
below is nearest to the patient’s actual blood pressure?
(a) 96
(b) 106
(c) 108
(d) 125
(e) 127
Part II: Short answer questions.
13. The lifetime of a certain brand of tires is approximately normally distributed, with a mean of
40,000 miles and a standard deviation of 2,500 miles under normal driving conditions. Tire
wear is greatly affected by road, weather and driver conditions along with proper
maintenance of the tires. A driver who aggressively accelerates or makes quick stops, for
example, will wear out a tire much more quickly. The brand carries a warranty of 33,000
miles under normal driving conditions, i.e., the company will replace a tire if it wears out
before this mileage limit is reached.
(a) What percent of the tires will fail before the warranty limit is reached?
Mean = 40,000. SD = 2,500. Value = 33,000.
Z-score = (33000-40000)/2500 = -2.8
Normcdf(-10, -2.8) = 0.0026, so about 0.26% of the tires will fail before the warranty limit.
(b) If the company sold 250,000 of these tires this year, approximately how many would it
expect to have to replace under the warranty conditions?
250,000 (0.0026) = 650, so roughly 650 tires.
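For anyone without a calculator handy, the tire calculation can be sketched in Python. `statistics.NormalDist` is used here in place of normcdf; the variable names are just for illustration.

```python
from statistics import NormalDist

# Tire lifetime model from question 13 (miles).
lifetime = NormalDist(mu=40_000, sigma=2_500)
warranty = 33_000
tires_sold = 250_000

# Standardized warranty limit.
z = (warranty - lifetime.mean) / lifetime.stdev

# Probability a tire wears out before the warranty limit (left-tail area).
p_fail = lifetime.cdf(warranty)

# Expected number of warranty replacements.
expected_replacements = tires_sold * p_fail

print(z, round(p_fail, 4), round(expected_replacements))
```

The left-tail area comes out near 0.0026, so the expected number of replacements is in the neighborhood of 640 to 650 tires, depending on how early the probability is rounded.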
14. Below is a histogram of the opening day stock price for Apple from January 1, 2000 to July
9, 2009; a total of 496 days.
[Figure: histogram "Opening Stock Prices for Apple". Horizontal axis: Dollars (0 to 180); vertical axis: Relative Frequency (%) (0 to 30).]
(a) Draw an appropriate density curve for summarizing the histogram above. How would
you describe the shape of this density curve?
Skewed right
(b) Where would the mean and median be located on the density curve you drew in part (a)?
Draw in their approximate locations.
Median = middle number or 50th percentile. So I’m going to count bars until I get to 50ish.
First bar looks like about 7%. Next one is about 28%, so I’m up to 35% total. Next bars are
6%, 6%, then 5%...that last one carries us over 50% total. That bar is at about a share
price of $50, so I’m guessing that’s the median.
The mean should be somewhat higher. Maybe $70.
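The bar-counting used for the median is just a running total of relative frequencies. Here is a small sketch of it, using the eyeballed bar heights quoted above (7%, 28%, 6%, 6%, 5%); those are estimates read off the histogram, not exact data.

```python
from itertools import accumulate

# Approximate heights (in percent) of the first five bars, as estimated above.
bar_percents = [7, 28, 6, 6, 5]

# Running total of the percentages, bar by bar.
running = list(accumulate(bar_percents))

# Position (1-based) of the first bar at which the running total reaches 50%.
median_bar = next(i for i, total in enumerate(running, start=1) if total >= 50)

print(running)      # [7, 35, 41, 47, 52]
print(median_bar)   # 5
```

The fifth bar is the one that pushes the total past 50%, which is why the median is placed at that bar's share price. The same running total works for any other percentile.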
(c) Based on the histogram, what is the approximate percentile of the opening price of $35?
Interpret this percentile in the context of this problem.
Same logic as the last one, looks like around 40th percentile or a bit higher.
(d) Based on the histogram, what is the approximate opening stock price which represents
the 97th percentile?
Counting down from 100%, it looks like about $180.
(e) The mean and standard deviation of the distribution of the opening day price for Apple
stock are $64.94 and $49.50, respectively. What is the z-score for the opening day price of
$107.40?
Interpret this z-score in the context of this problem.
(107.40 - 64.94)/49.50 = 0.86. An opening price of $107.40 was 0.86 standard deviations
above the mean opening price.
15. Syracuse, New York is the snowiest metropolitan area in the United States. Based on 59
years of data from the National Weather Center, the mean annual snowfall is 118.5 inches
with a standard deviation of 33.5 inches. The annual snowfall in Syracuse follows a roughly
normal distribution.
(a) Sketch a normal curve to illustrate the annual snowfall in Syracuse. Be sure to mark the
mean and the points that determine one, two, and three standard deviations away from the
mean.
(Usual Sketch)
(b) Use the 68-95-99.7 rule to estimate the percent of years where the annual snowfall was
between 85 inches and 185.5 inches. Illustrate your method clearly.
We did one just like this in class. 68% are within one standard deviation of the mean, so that
covers from 85 inches to 152 inches. From 152 to 185.5 (one to two standard deviations above
the mean) there is another (95-68)/2 = 13.5%.
So altogether it would be 68% + 13.5% = 81.5%.
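As a check on the rule-of-thumb arithmetic, the sketch below standardizes the two endpoints, repeats the 68-95-99.7 calculation, and compares it with the area from `statistics.NormalDist`, which stands in for normcdf here.

```python
from statistics import NormalDist

mean_snow, sd_snow = 118.5, 33.5   # inches, from question 15
low, high = 85, 185.5

# Endpoints in standard-deviation units.
z_low = (low - mean_snow) / sd_snow     # one SD below the mean
z_high = (high - mean_snow) / sd_snow   # two SDs above the mean

# 68-95-99.7 rule: the middle 68%, plus half of the band between 1 and 2 SDs.
rule_estimate = 68 + (95 - 68) / 2

# Area under the normal curve between the endpoints, as a percent.
snow = NormalDist(mu=mean_snow, sigma=sd_snow)
exact_percent = 100 * (snow.cdf(high) - snow.cdf(low))

print(z_low, z_high)
print(rule_estimate, round(exact_percent, 2))
```

The two answers differ by a few tenths of a percent because 68 and 95 are themselves rounded values.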
(c) Each year the city of Syracuse budgets enough money for snow removal to take care of
all but the snowiest 3% of years. It is willing to run some small risk of this happening,
especially in a tight budget year. How much would it have to snow in a particular year
in order for the city to exceed its snow-removal budget? (Sketch a normal curve and use
either your calculator or Table A.)
The top 3% is marked off by the 97th percentile. Invnorm(0.97) gives you the z-score:
1.881. How much snow is that? Use the z-formula:
1.88 = (x - 118.5)/33.5. Solving this for x, you should get 181.5 inches.
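The same two steps (find the z-score for the 97th percentile, then un-standardize) can be sketched with `NormalDist.inv_cdf`, which plays the role of invNorm here.

```python
from statistics import NormalDist

mean_snow, sd_snow = 118.5, 33.5   # inches, from question 15

# z-score that cuts off the top 3% (the 97th percentile of the standard normal).
z_97 = NormalDist().inv_cdf(0.97)

# Solve z = (x - mean) / sd for x.
snow_limit = mean_snow + z_97 * sd_snow

# The same percentile taken directly from the snowfall distribution.
snow_limit_direct = NormalDist(mu=mean_snow, sigma=sd_snow).inv_cdf(0.97)

print(round(z_97, 3), round(snow_limit, 1), round(snow_limit_direct, 1))
```

Both routes land at about 181.5 inches; going through the z-score first mirrors the Table A method.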
(d) In 2001, the snowfall totaled 59.4 inches. Was this an unusually low amount of snow for
Syracuse? Justify your answer. Include a sketch of a normal curve and some numerical
calculations to support your answer.
What percentile is 59.4 inches? Use the z-formula: z = (59.4 - 118.5)/33.5 = -1.76. So it’s a little
unusual (almost 2 standard deviations below the mean). For the percentile, we’d do
normcdf(-10, -1.76) = 0.0389. So this was at around the 4th percentile of all years. Pretty
low!
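A short sketch of the 2001 calculation, again with `statistics.NormalDist` standing in for normcdf. It works from the unrounded z-score, so the last digit can differ slightly from a table lookup at z = -1.76.

```python
from statistics import NormalDist

mean_snow, sd_snow = 118.5, 33.5   # inches, from question 15
snow_2001 = 59.4

# How many standard deviations below the mean was 2001?
z_2001 = (snow_2001 - mean_snow) / sd_snow

# Left-tail area: the proportion of years with less snow than 2001.
percentile_2001 = NormalDist(mu=mean_snow, sigma=sd_snow).cdf(snow_2001)

print(round(z_2001, 2), round(percentile_2001, 4))
```

A left-tail area just under 0.04 puts 2001 at about the 4th percentile, in line with the conclusion above.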