CHAPTER 2

1. You have data on the summer earnings of a sample of 1,000 high school students. What kind of graph should you use to describe the distribution of their earnings?
(a) Bar graph.
(b) Line graph.
(c) Histogram. (Quantitative data)
(d) Pie chart.
(e) None of these.

Here is a dotplot of the adult literacy rates in 177 countries in 2008, according to the United Nations. For example, the lowest literacy rate was 23.6%, in the African country of Burkina Faso.

2. The overall shape of this distribution is
(a) clearly skewed to the right
(b) clearly skewed to the left
(c) roughly symmetric
(d) no clear shape

3. The mean of this distribution (don't try to find it) is certainly
(a) very close to the median.
(b) clearly less than the median.
(c) clearly greater than the median.
(d) can't say because the mean is random.

4. Based on the shape of this distribution, what numerical measures would best describe it?
(a) the five-number summary.
(b) the mean and standard deviation.
(c) the mean and the quartiles.
(d) the median and the standard deviation.
(e) none of these

5. The mean of a distribution of scores is given to be x̄ = 43 with a standard deviation of s = 4. If five is added to each of the values in the distribution, the new mean and standard deviation will be, respectively:
(a) x̄ = 43 and s = 9
(b) x̄ = 43 and s = 9
(c) x̄ = 43 and s = 20
(d) x̄ = 48 and s = 2
(e) x̄ = 48 and s = 4

6. Last year, students in a Statistics class were given a survey and asked how many cans of soda they had consumed the week before the survey. It turned out that these students consumed an average of 6.25 cans of soda with a median of 4 cans and a standard deviation of 6.9 cans. A histogram of the data looks like: Which histogram above would best represent the distribution of soda consumed the week before the survey was taken?
B (skewed right, since the mean of 6.25 is well above the median of 4)

7. If a bar graph is to be accurate, it is essential that
(a) the bars touch each other.
(b) the bars be drawn vertically.
(c) both horizontal and vertical scales be clearly marked in equal units.
(d) the bars all have the same width.
(e) the explanatory variable be plotted on the horizontal axis.

8. Which of these statements about the standard deviation s is true?
(a) s is always 0 or positive.
(b) s should be used to measure spread only when the mean x̄ is used to measure center.
(c) s is a number that has no units of measurement.
(d) Both (a) and (b), but not (c).
(e) All of (a), (b), and (c).

9. The five-number summary of the distribution of scores on the final exam in Psych 001 last semester was 18 39 62 76 100. A total of 416 students took the exam. About how many students had scores above 39?
(a) 416
(b) 312
(c) 104
(d) 400
(e) 250

10. The 5-number summary for a univariate data set is given by {min = 5, Q1 = 18, Med = 20, Q3 = 40, max = 75}. If you wanted to construct a modified boxplot for the dataset (that is, one that would show outliers, if any existed), what would be the maximum possible length of the right side "whisker"?
(a) 33
(b) 35
(c) 45
(d) 53
(e) 55

11. Which of the following is likely to have a mean that is smaller than the median?
(a) The salaries of all National Basketball Association players.
(b) Amounts awarded by juries from lawsuits involving injuries.
(c) The prices of homes in a large city.
(d) The times in a long-distance race in which most runners took a long time but a few finished rather quickly.
(e) The scores of students (out of 100 points) on a very easy exam in which most get nearly perfect scores but a few do very poorly.
(I like both)

12. A biologist has gathered data on a population of bears in the forests of the northeast. A frequency polygon plot of the weights of the sample of bears and their sex is given below. Based on the plot, which statement below is TRUE?
[Figure: "Weights of Bears", a frequency polygon of Frequency against Weight, with separate lines for female and male bears]
(a) Since the distributions overlap, there is not much difference between the weights of male and female bears.
(b) The female bears have a higher mean weight than the male bears and also exhibit more variability in those weights.
(c) The female bears have a higher mean weight than the male bears and also exhibit less variability in those weights.
(d) The male bears have a higher mean weight than the female bears and also exhibit more variability in those weights.
(e) The male bears have a higher mean weight than the female bears and also exhibit less variability in those weights.

14. Here are the yearly wages of 30 randomly selected full-time employed people who hold at least a Bachelor's degree. The data are in thousands of dollars, rounded to the nearest thousand. They come from the Current Population Survey for March 2009.
69 84 102 41 57 61 68 97 91 217 43 78 63 58 43 48 39 57 23 41 62 46 80 51 75 95 19 31 185 32
(a) Make an appropriate graph of these data. Describe the overall shape of the distribution. Are there any clear outliers?
Examples. Yes, there are definite outliers. Shape: relatively symmetric, but with a couple of high outliers (or, skewed right).
[Figure: dot plot of Wages, horizontal axis from 0 to 220]
(b) Based on your findings in part (a), choose a numerical summary for this distribution. Calculate your summary, and justify your choice.
Median/IQR. Median = 59.5 thousand dollars, IQR = 80 - 43 = 37 thousand dollars.

1. For a normal distribution with mean 20 and standard deviation 5, approximately what percent of the observations will be between 5 and 35?
(a) 50%
(b) 68%
(c) 95%
(d) 99.7%
(e) 100%

2. Two measures of center are marked on the density curve above.
(a) The median is at the dashed line and the mean is at the solid line.
(b) The median is at the solid line and the mean is at the dashed line.
(c) The mode is at the dashed line and the median is at the solid line.
(d) The mode is at the solid line and the median is at the dashed line.
(e) None of these is correct.

3. Items produced by a manufacturing process are supposed to weigh 90 grams. However, the manufacturing process is such that there is variability in the items produced and they do not all weigh exactly 90 grams. The distribution of weights can be approximated by a normal distribution with a mean of 90 grams and a standard deviation of 1 gram. Using the 68-95-99.7 rule, what percentage of the items will either weigh less than 88 grams or more than 92 grams?
(a) 0.3%
(b) 3%
(c) 5%
(d) 95%
(e) 99.7%

4. Which of the following is least likely to have a nearly normal distribution?
(a) Heights of all female students taking Statistics at Franklin Academy.
(b) IQ scores of all students taking Statistics at Franklin Academy.
(c) The SAT Math scores of all students taking Statistics at Franklin Academy.
(d) Family incomes of all students taking Statistics at Franklin Academy.
(e) Time from conception to birth of all students taking Statistics at Franklin Academy.

5. Scores on the American College Testing (ACT) college entrance exam follow the normal distribution with mean 18 and standard deviation 6. Wayne's standard score on the ACT was -0.7. What was Wayne's actual ACT score?
(a) 4.2
(b) -4.2
(c) 9.6
(d) 13.8
(e) 22.2

6. The test grades at a large school have an approximately normal distribution with a mean of 50. What is the standard deviation of the data so that 80% of the students are within 12 points (above or below) the mean?
(a) 5.875
(b) 9.375
(c) 10.375
(d) 14.5
(e) cannot be determined from the given information

The death rates from heart disease per 100,000 people in a group of developed countries were recorded. The distribution is roughly described by this normal curve:

7.
From this normal curve, we see that the mean heart disease death rate per 100,000 people is about:
(a) 60
(b) 120
(c) 190
(d) 250
(e) 400

8. From the normal curve, we see that the standard deviation of the heart disease rate per 100,000 people is closest to
(a) 25
(b) 65
(c) 100
(d) 200
(e) 400

9. Which of the following are true statements?
I. In all normal distributions, the mean and median are equal.
II. All bell-shaped curves are normal distributions no matter what the particular mean and standard deviation are.
III. Virtually all the area under a normal curve is within three standard deviations of the mean, no matter what the particular mean and standard deviation are.
(a) I only
(b) I and II
(c) II and III
(d) I, II, and III
(e) I and III
Technically, E is the answer, but based on what our book told you I'd take D too.

10. Suppose that adult women in China have heights that are normally distributed with mean 155 centimeters and standard deviation 8 centimeters. Adult women in Japan have heights which are normally distributed with mean 158 centimeters and standard deviation 6 centimeters. Which country has the higher percentage of women taller than 167 centimeters?
(a) China (z = 1.5)
(b) Japan (z = 1.5)
(c) The percentages are the same.
(d) It is not possible to tell from the information given.

11. Which one of the following would be a correct interpretation if you have a z-score of +2.0 on an exam?
(a) It means that you missed two questions on the exam.
(b) It means that you got twice as many questions correct as the average student.
(c) It means that your grade was two points higher than the mean grade on this exam.
(d) It means that your grade was in the upper 2% of all grades on this exam.
(e) It means that your grade is two standard deviations above the mean for this exam.

12. The mean blood pressure for 47-year-old males in the United States is normally distributed with a mean of 139 mg and a standard deviation of 26 mg.
A doctor tells a 47-year-old male patient that he is in the lowest 10% of all people in this population. Which one of the values below is nearest to the patient's actual blood pressure?
(a) 96
(b) 106
(c) 108
(d) 125
(e) 127

Part II: Short answer questions.

13. The lifetime of a certain brand of tires is approximately normally distributed, with a mean of 40,000 miles and a standard deviation of 2,500 miles under normal driving conditions. Tire wear is greatly affected by road, weather and driver conditions along with proper maintenance of the tires. A driver who aggressively accelerates or makes quick stops, for example, will wear out a tire much more quickly. The brand carries a warranty of 33,000 miles under normal driving conditions, i.e., the company will replace a tire if it wears out before this mileage limit is reached.
(a) What percent of the tires will fail before the warranty limit is reached?
Mean = 40,000, SD = 2,500, value = 33,000. Z-score = (33000-40000)/2500 = -2.8. Normcdf(-10, -2.8) = 0.0026, or about 0.26% of the tires.
(b) If the company sold 250,000 of these tires this year, approximately how many would it expect to have to replace under the warranty conditions?
250,000 (0.0026) = 650

14. Below is a histogram of the opening day stock price for Apple from January 1, 2000 to July 9, 2009; a total of 496 days.
[Figure: histogram "Opening Stock Prices for Apple", Relative Frequency (%) from 0 to 30 against Dollars from 0 to 180]
(a) Draw an appropriate density curve for summarizing the histogram above. How would you describe the shape of this density curve?
Skewed right
(b) Where would the mean and median be located on the density curve you drew in part (a)? Draw in their approximate locations.
Median = middle number or 50th percentile. So I'm going to count bars until I get to 50ish. First bar looks like about 7%. Next one is about 28%, so I'm up to 35% total. Next bars are 6%, 6%, then 5%...that last one carries us over 50% total.
That bar is at about a share price of $50, so I'm guessing that's the median. The mean should be somewhat higher. Maybe $70.
(c) Based on the histogram, what is the approximate percentile of the opening price of $35? Interpret this percentile in the context of this problem.
Same logic as the last one, looks like around 40th percentile or a bit higher.
(d) Based on the histogram, what is the approximate opening stock price which represents the 97th percentile?
Counting down from 100%, it looks like about $180.
(e) The mean and standard deviation of the distribution of the opening day price for Apple stock are $64.94 and $49.50, respectively. What is the z-score for the opening day price of $107.40? Interpret this z-score in the context of this problem.
(107.40 - 64.94)/49.5 = 0.86. When the opening price was $107.40, it was 0.86 standard deviations above the mean opening price.

15. Syracuse, New York is the snowiest metropolitan area in the United States. Based on 59 years of data from the National Weather Center, the mean annual snowfall is 118.5 inches with a standard deviation of 33.5 inches. The annual snowfall in Syracuse follows a roughly normal distribution.
(a) Sketch a normal curve to illustrate the annual snowfall in Syracuse. Be sure to mark the mean and the points that determine one, two, and three standard deviations away from the mean.
(Usual Sketch)
(b) Use the 68-95-99.7 rule to estimate the percent of years where the annual snowfall was between 85 inches and 185.5 inches. Illustrate your method clearly.
We did one just like this in class. 68% are within one standard deviation of the mean, so that covers from 85 inches to 152 inches. From 152 to 185.5 there is another (95-68)/2 = 13.5%. So altogether it would be 68% + 13.5% = 81.5%.
(c) Each year the city of Syracuse budgets enough money for snow removal to take care of all but the snowiest 3% of years. It is willing to run some small risk of this happening, especially in a tight budget year.
How much would it have to snow in a particular year in order for the city to exceed its snow-removal budget? (Sketch a normal curve and use either your calculator or Table A.)
The top 3% is marked off by the 97th percentile. Invnorm(0.97) gives you the z-score: 1.881. How much snow is that? Use the z-formula: 1.88 = (x - 118.5)/33.5. Solving this for x, you should get 181.5 inches.
(d) In 2001, the snowfall totaled 59.4 inches. Was this an unusually low amount of snow for Syracuse? Justify your answer. Include a sketch of a normal curve and some numerical calculations to support your answer.
What percentile is 59.4 inches? Use the z-formula: z = (59.4 - 118.5)/33.5 = -1.76. So it's a little unusual (almost 2 standard deviations below the mean). For the percentile, we'd do normcdf(-10, -1.76) = 0.0389. So this was at around the 4th percentile of all years. Pretty low!
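
The calculator steps in problem 15 can be double-checked numerically. The sketch below is not part of the original key: it assumes Python's standard-library statistics.NormalDist as a stand-in for the calculator's normcdf and invnorm, and the variable names are only illustrative.

```python
from statistics import NormalDist

# Syracuse annual snowfall model from problem 15: mean 118.5 in, SD 33.5 in.
snow = NormalDist(mu=118.5, sigma=33.5)

# (b) Share of years between 85 and 185.5 inches
#     (one SD below the mean to two SDs above it).
share_between = snow.cdf(185.5) - snow.cdf(85.0)

# (c) Snowfall exceeded only in the snowiest 3% of years (97th percentile),
#     the counterpart of invnorm(0.97) followed by solving the z-formula.
budget_cutoff = snow.inv_cdf(0.97)

# (d) z-score and percentile for the 59.4 inches recorded in 2001,
#     the counterpart of normcdf(-10, z).
z_2001 = (59.4 - snow.mean) / snow.stdev
pct_2001 = snow.cdf(59.4)

print(f"(b) share between 85 and 185.5 in: {share_between:.3f}")
print(f"(c) budget cutoff: {budget_cutoff:.1f} in")
print(f"(d) z = {z_2001:.2f}, percentile = {pct_2001:.4f}")
```

The exact normal area for part (b) comes out slightly above the 81.5% from the 68-95-99.7 rule, because the rule rounds the areas within one and two standard deviations.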