Math 1107
Answers to some final review problems

Exercise 26 page 40.

We compute the row and column totals in the given table:

             Comm parlor   Elsewhere   No tattoo   Total
Has hep. C            17           8          18      43
No hep. C             35          53         495     583
Total                 52          61         513     626

In analyzing the given information we may answer questions such as:
(A) Is hepatitis C more prevalent in individuals who get tattooed?
(B) Among those who get tattoos, who is more likely to get hepatitis C?

(A) This question really asks whether there is an association between getting a tattoo and having hepatitis C. Let’s compute the percentage of individuals who have hepatitis C among those who get tattoos, and compare it to the general population. We compare

\[ \hat{p}_1 = \frac{17 + 8}{52 + 61} \approx .22, \]

the percentage of people in the sample who have hepatitis C among those who are tattooed, with

\[ \hat{p}_2 = \frac{43}{626} \approx .069, \]

the percentage of people in the sample who have hepatitis C. Since \(\hat{p}_1\) is much bigger than \(\hat{p}_2\), we may conclude that people who get tattooed are more likely to get hepatitis C, and therefore “getting a tattoo” and “getting hepatitis C” are dependent events.

(B) For this question we may compare the conditional probabilities of individuals getting hepatitis C given that they were tattooed in a commercial parlor or elsewhere. We have:

\[ \hat{p}_1 = \frac{17}{52} \approx .33, \qquad \hat{p}_2 = \frac{8}{61} \approx .13. \]

Again, it seems more likely to get hepatitis C if you get the tattoo in a commercial parlor than if you get it elsewhere.

This would be enough for a complete answer, but to show how you can approach the question with the knowledge from confidence intervals, we can try to answer how significant the difference is between the proportions we computed above.

For part (A) we will construct 95% confidence intervals for \(p_1\), the true proportion of individuals who have hepatitis C among those who have tattoos, and for \(p_2\), the true proportion of individuals who have hepatitis C in the general population.
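As a rough sketch of how such intervals can be computed, the Python snippet below applies the usual one-proportion z-interval, p_hat ± z·sqrt(p_hat(1 − p_hat)/n), to the counts in the table. The choice of this formula, the critical value 1.96, and the helper names `wald_interval` and `intervals_overlap` are assumptions made for illustration; a calculator routine may round slightly differently.

```python
from math import sqrt

# Counts taken from the table: (hepatitis C cases, group size).
groups = {
    "tattooed (any setting)": (17 + 8, 52 + 61),
    "whole sample": (43, 626),
    "commercial parlor": (17, 52),
    "elsewhere": (8, 61),
}

Z_95 = 1.96  # assumed critical value for 95% confidence


def wald_interval(successes, n, z=Z_95):
    """Return (p_hat, lower, upper) for a one-proportion z-interval."""
    p_hat = successes / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - margin, p_hat + margin


def intervals_overlap(a, b):
    """True if the intervals a = (lo, hi) and b = (lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


results = {name: wald_interval(x, n) for name, (x, n) in groups.items()}

for name, (p_hat, lo, hi) in results.items():
    print(f"{name:24s} p_hat = {p_hat:.3f}   95% CI = ({lo:.3f}, {hi:.3f})")

# Part (A): tattooed group versus the whole sample.
print("A overlap:", intervals_overlap(results["tattooed (any setting)"][1:],
                                      results["whole sample"][1:]))
# Part (B): commercial parlor versus elsewhere.
print("B overlap:", intervals_overlap(results["commercial parlor"][1:],
                                      results["elsewhere"][1:]))
```

The overlap check mirrors the informal comparison of intervals used in this exercise; it is not a formal two-proportion test.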
Notice that the conditions listed under computing confidence intervals for proportions are met: we have a simple random sample (we hope it is representative), the sample is large enough, and the numbers of successes (hep C cases) and failures are both greater than 10. We get the 95% confidence interval (.145, .298) for the proportion with hepatitis C among those who are tattooed, and (.049, .089) for the proportion with hepatitis C in the general population. Since the two intervals do not overlap, we may say with 95% confidence that the two population proportions are significantly different.

We can do a similar analysis for part (B). This time we compute 95% confidence intervals for the population proportion of those getting hepatitis C in the two different settings. We obtain (.199, .454), a 95% confidence interval for the percentage of people getting hepatitis C among those who get tattoos in a commercial parlor, and (.046, .216), a 95% confidence interval for the percentage of people getting hepatitis C among those who get tattoos elsewhere. Since the two intervals do overlap, we cannot conclude that there is a significant difference in the population proportion of those getting hepatitis C in commercial parlors vs. elsewhere.

Problem 20 page 91.

(a) If we look at the three boxplots we can conclude that in 2002 gas prices had a distribution skewed to the left, with three outliers. The median price was about $1.40/gal and the prices varied from about $1.00 to about $1.45/gal. In 2003 the prices were much higher than in 2002: notice that the lowest gasoline price in 2003 is at or even above the highest price in 2002. The distribution was a little skewed to the right, with a median price of about $1.50/gal and a spread of about $0.30. In 2004 the distribution was more symmetrical, with one outlier to the right. The median price was about $1.70/gal and the spread was about $0.50 (= $2.00 - $1.50).

(b) In 2004 prices seem to have been less stable, since the data has the largest spread and an interquartile range (IQR) almost as large as the 2003 prices.
In 2002 we observe a small range and a very small IQR, showing very little variability in the gasoline price.

Problem 28 page 93.

(a) Class C performed better on the test: from the boxplot we can see that about 75% of its scores (i.e., those at and above the first quartile) are above 60, whereas the other two classes have their median score at that level.
(b) The distributions A and B are symmetric, whereas C is skewed left.
(c) Class 1 corresponds to plot A (smaller IQR due to mound shape), class 2 to B, and class 3 to C (the skewness is noticeable).

Problem 28 page 160.

The scatterplot does not show a linear association between highway speed and traffic time delay in hours, therefore it is not meaningful to compute the correlation coefficient of the data.

Problem 27 page 244.

(a) The percent of variation in temperature explained by latitude is \(r^2 = .719\), that is, about 71.9%.
(b) Since the correlation is negative, as the latitude increases (as we move north), the average January temperature decreases.
(c) Using the formulae from page 169 we have:

\[ \hat{y}\,(\text{JanTemp}) = r\,\frac{s_y}{s_x}\cdot\text{latitude} + b_0 = -2.11\cdot\text{latitude} + 108.77 \]

(d) The slope, -2.11, means that for each increase in latitude of 1 degree, the average January temperature drops by approximately 2.11°F.
(e) The y-intercept would mean that at zero degrees latitude (i.e., at the Equator), the average January temperature is approximately 108.77°F. This seems a little too hot even for the Equator. Since the data was collected from U.S. cities, which are far from 0 degrees latitude N, we cannot conclude that the y-intercept gives us a meaningful value.
(f) Using the equation derived in (c) we get the predicted value for Denver: \(\hat{y} \approx 24.4\)°F.
(g) If a residual is positive, it means that the predicted value is below the actual observed value (residual = observed - predicted).

Problem 18 page 514.

Denote by X the weight of an egg.
Using the normal model we have:

(a) \(P(X > 62) = \text{normalcdf}(62, \text{E}99, 60.7, 3.1) \approx 0.3375\).
(b) Since we have to answer a question about the average weight of a dozen eggs, we use the Central Limit Theorem result (n = 12 is small, but the initial model is normal) and we get: \(P(\bar{X} > 62) = \text{normalcdf}(62, \text{E}99, 60.7, 3.1/\sqrt{12}) \approx 0.0732\).
(c) Now we look at the total weight of a dozen eggs, so the expected value of the total weight is \(12 \times 60.7 = 728.4\) grams, and the standard deviation is \(3.1\sqrt{12} \approx 10.74\) grams. The 68-95-99.7 rule marks off one, two and three standard deviations from the mean, respectively. That is, approximately 68% of dozens will have a total weight between 717.66 and 739.14 grams, approximately 95% between 706.92 and 749.88 grams, and approximately 99.7% between 696.18 and 760.62 grams.
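The three parts above can be checked with a short Python sketch that uses `statistics.NormalDist` from the standard library in place of the calculator's normalcdf. The variable names are illustrative, and the code assumes, as the solution does, that the twelve egg weights are independent draws from the same normal model.

```python
from math import sqrt
from statistics import NormalDist

MEAN_G = 60.7  # mean egg weight in grams (from the problem)
SD_G = 3.1     # standard deviation of egg weight in grams
N = 12         # eggs in a dozen

# (a) One egg: P(X > 62) is the upper tail of N(60.7, 3.1).
p_single = 1 - NormalDist(MEAN_G, SD_G).cdf(62)

# (b) Mean of a dozen: same center, sd divided by sqrt(n).
p_mean = 1 - NormalDist(MEAN_G, SD_G / sqrt(N)).cdf(62)

# (c) Total of a dozen: mean multiplied by n, sd multiplied by sqrt(n)
#     (this step relies on the independence assumption).
total_mean = N * MEAN_G
total_sd = SD_G * sqrt(N)

# 68-95-99.7 rule: bands of 1, 2 and 3 standard deviations around the mean.
bands = {
    k: (total_mean - k * total_sd, total_mean + k * total_sd)
    for k in (1, 2, 3)
}

print(f"(a) P(X > 62)     = {p_single:.4f}")
print(f"(b) P(Xbar > 62)  = {p_mean:.4f}")
print(f"(c) total: mean = {total_mean:.1f} g, sd = {total_sd:.2f} g")
for k, (lo, hi) in bands.items():
    print(f"    within {k} sd: {lo:.2f} to {hi:.2f} g")
```

Dividing the standard deviation by sqrt(12) for the mean, but multiplying it by sqrt(12) for the total, is the step most easily confused between parts (b) and (c).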