Math 1107
Answers to some final review problems
Exercise 26 page 40.
We compute the row and column totals in the given table:

              Comm. parlor   Elsewhere   No tattoo   Total
Has hep. C              17           8          18      43
No hep. C               35          53         495     583
Total                   52          61         513     626
In analyzing the given information we may answer questions such as:
(A) Is hepatitis C more prevalent among individuals who have tattoos?
(B) Among those who have tattoos, who is more likely to have hepatitis C: those tattooed in a commercial parlor or those tattooed elsewhere?
(A) This question asks whether there is an association between getting a tattoo and having hepatitis C. Let’s compute the percentage of individuals with hepatitis C among those who have tattoos and compare it to the percentage in the general population (the whole sample).
We therefore compare \( \hat{p}_1 = \frac{17 + 8}{52 + 61} = \frac{25}{113} \approx .22 \), the proportion of people in the sample who have hepatitis C among those who are tattooed, with \( \hat{p}_2 = \frac{43}{626} \approx .069 \), the proportion of people in the whole sample who have hepatitis C. Since \( \hat{p}_1 \) is much bigger than \( \hat{p}_2 \), we may conclude that people who get tattooed are more likely to have hepatitis C, and therefore “getting a tattoo” and “having hepatitis C” are dependent events.
(B) For this question we compare the conditional proportions of individuals with hepatitis C given that they were tattooed in a commercial parlor or elsewhere. We have \( \hat{p}_1 = \frac{17}{52} \approx .33 \) and \( \hat{p}_2 = \frac{8}{61} \approx .13 \). Again, it seems that you are more likely to get hepatitis C if you get your tattoo in a commercial parlor than if you get it elsewhere.
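The proportions in (A) and (B) are simple ratios of counts from the table. A minimal Python sketch that recomputes them (the variable names are illustrative, not from the text):

```python
# Counts from the Exercise 26 table: hep C cases and group totals
parlor_hep_c, parlor_total = 17, 52
elsewhere_hep_c, elsewhere_total = 8, 61
sample_hep_c, sample_total = 43, 626

# Part (A): all tattooed individuals (parlor + elsewhere) vs. the whole sample
p_hat_tattooed = (parlor_hep_c + elsewhere_hep_c) / (parlor_total + elsewhere_total)
p_hat_sample = sample_hep_c / sample_total

# Part (B): conditional proportions within each tattoo setting
p_hat_parlor = parlor_hep_c / parlor_total
p_hat_elsewhere = elsewhere_hep_c / elsewhere_total

print(f"(A) tattooed: {p_hat_tattooed:.3f}, whole sample: {p_hat_sample:.3f}")
print(f"(B) commercial parlor: {p_hat_parlor:.3f}, elsewhere: {p_hat_elsewhere:.3f}")
```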
This would be enough for a complete answer, but to show how you can approach the question using confidence intervals, we can also ask how significant the difference between the proportions computed above is.
For part (A) we construct 95% confidence intervals for \( p_1 \), the true proportion of individuals with hepatitis C among those who have tattoos, and for \( p_2 \), the true proportion of individuals with hepatitis C in the general population. Notice that the conditions listed for computing confidence intervals for proportions are met: we have a simple random sample (we hope it is representative), the sample is large enough, and the numbers of successes (hep C cases) and failures are both greater than 10 (25 and 88 among the tattooed, 43 and 583 in the whole sample).
We get a 95% CI for \( p_1 \) of (.145, .298) and for \( p_2 \) of (.049, .089). Since the two intervals do not overlap, we may say with 95% confidence that the two population proportions are significantly different.
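These intervals come from the large-sample formula \( \hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n} \) with \( z^* = 1.96 \), which is assumed here to be what the course's calculator command (1-PropZInt) computes. A minimal Python sketch for part (A), using 25 cases out of 52 + 61 = 113 tattooed people and 43 cases out of 626 people overall:

```python
from math import sqrt

Z_95 = 1.96  # z* for 95% confidence

def prop_ci(successes, n, z=Z_95):
    """Large-sample CI for a proportion: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hep C among the tattooed: 25 cases out of 113 people
ci_tattooed = prop_ci(17 + 8, 52 + 61)
# Hep C in the whole sample: 43 cases out of 626 people
ci_sample = prop_ci(43, 626)

print("tattooed:    ", tuple(round(v, 3) for v in ci_tattooed))
print("whole sample:", tuple(round(v, 3) for v in ci_sample))
print("intervals overlap:", ci_sample[1] >= ci_tattooed[0])
```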
We can do a similar analysis for part (B). This time we compute 95% confidence intervals for the population proportion of those getting hepatitis C in the two different settings. We obtain (.199, .454), a 95% confidence interval for the proportion of people getting hepatitis C among those who get tattoos in a commercial parlor, and (.046, .216), a 95% confidence interval for the proportion among those who get tattoos elsewhere. (The “elsewhere” group has only 8 hepatitis C cases, fewer than 10, so the large-sample condition is not fully met for that interval.) Since the two intervals overlap, this comparison of intervals does not allow us to conclude that there is a significant difference in the population proportion of those getting hepatitis C in commercial parlors vs. elsewhere.
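The same large-sample formula (again assumed to match the calculator's 1-PropZInt) reproduces the part (B) intervals. A minimal self-contained Python sketch:

```python
from math import sqrt

Z_95 = 1.96  # z* for 95% confidence

def prop_ci(successes, n, z=Z_95):
    """Large-sample CI for a proportion: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

ci_parlor = prop_ci(17, 52)     # tattooed in a commercial parlor
ci_elsewhere = prop_ci(8, 61)   # tattooed elsewhere (only 8 successes)

print("commercial parlor:", tuple(round(v, 3) for v in ci_parlor))
print("elsewhere:        ", tuple(round(v, 3) for v in ci_elsewhere))
# The intervals overlap when the upper end of one reaches past the lower end of the other
print("intervals overlap:", ci_elsewhere[1] >= ci_parlor[0])
```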
Problem 20 page 91.
(a) Looking at the three boxplots, we can conclude that in 2002 gas prices had a distribution skewed to the left, with three outliers. The median price was about $1.40/gal and the prices varied from about $1.00 to about $1.45/gal. In 2003 the prices were much higher than in 2002: notice that the lowest gasoline price in 2003 is at or even above the highest price in 2002. The distribution was slightly skewed to the right, with a median price of about $1.50/gal and a spread of about $0.30. In 2004 the distribution was more symmetric, with one outlier to the right. The median price was about $1.70/gal and the spread was about $0.50 ($2.00 - $1.50).
(b) In 2004 prices seem to have been less stable, since the data have the largest spread and an interquartile range (IQR) almost as large as that of the 2003 prices. In 2002 we observe a small range and a very small IQR, showing very little variability in the gasoline price.
Problem 28 page 93.
(a) Class C performed better on the test: from the boxplot we can see that about 75% of its scores (i.e., those at and above the first quartile) are above 60, whereas the other two classes have their median score at about that level.
(b) Distributions A and B are symmetric, whereas C is skewed left.
(c) Class 1 corresponds to plot A (smaller IQR due to the mound shape), class 2 to B, and class 3 to C (the skewness is noticeable).
Problem 28 page 160.
The scatterplot does not show a linear association between highway speed and traffic time delay in hours. Since the correlation coefficient measures only the strength of a linear association, it is not meaningful to compute it for these data.
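To see why r can mislead when the pattern is not linear, here is a minimal Python sketch on hypothetical data (not the textbook's speed and delay data): y is completely determined by x through a parabola, yet the correlation coefficient comes out as zero.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: a perfect but curved (quadratic) relationship
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(f"r = {r:.3f}")  # r says "no linear association" despite the exact curved pattern
```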
Problem 27 page 244.
(a) The percent of variation in temperature explained by latitude is \( r^2 = .719 \), i.e., about 71.9%.
(b) Since the correlation is negative, as the latitude increases (as we move north), the average January temperature tends to decrease.
(c) Using the formulas from page 169 we have:
\( \widehat{\text{JanTemp}} = \hat{y} = r\,\frac{s_y}{s_x}\,(\text{latitude}) + b_0 = -2.11\,(\text{latitude}) + 108.77 \)
(d) The slope, -2.11, means that for each 1-degree increase in latitude, the predicted average January temperature drops by approximately 2.11 °F.
(e) The y-intercept would mean that at zero degrees latitude (i.e., at the Equator), the average January temperature is approximately 108.77 °F. This seems too hot even for the Equator. Since the data were collected from U.S. cities, which are far from 0 degrees latitude, using the line at latitude 0 is an extrapolation, and we cannot expect the y-intercept to give a meaningful value.
(f) Using the equation derived in (c), we get the predicted January temperature for Denver: \( \hat{y} \approx 24.4 \) °F.
(g) A residual is the observed value minus the predicted value, so a positive residual means that the predicted value is below the actual observed value.
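A minimal Python sketch of parts (b), (c), (f) and (g): the sign of r is taken from the negative slope, the fitted line is used for prediction, and the residual is observed minus predicted. Denver's latitude is not given in this handout; 40 degrees N is an assumption made here for illustration, and the helper names are hypothetical.

```python
from math import sqrt

R_SQUARED = 0.719
SLOPE = -2.11       # from part (c)
INTERCEPT = 108.77  # from part (c)

# r has the same sign as the slope, so take the negative square root of r^2
r = -sqrt(R_SQUARED)

def predict_jan_temp(latitude_deg):
    """Predicted average January temperature (deg F) from the fitted line."""
    return INTERCEPT + SLOPE * latitude_deg

def residual(observed, latitude_deg):
    """Observed minus predicted; positive means the line under-predicts."""
    return observed - predict_jan_temp(latitude_deg)

# Assumption: Denver's latitude taken as 40 degrees N (not stated in the handout)
denver_latitude = 40
print(f"r = {r:.3f}")
print(f"predicted Denver January temperature: {predict_jan_temp(denver_latitude):.1f} F")
```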
Problem 18 page 514.
Denote by X the weight of an egg. Using the normal model we have:
(a) \( P(X > 62) = \mathrm{normalcdf}(62,\ 1\mathrm{E}99,\ 60.7,\ 3.1) \approx 0.3375 \)
(b) Since we have to answer a question about the average weight \( \bar{X} \) of a dozen eggs, we use the Central Limit Theorem result (n = 12 is small, but the initial model is normal), and we get:
\( P(\bar{X} > 62) = \mathrm{normalcdf}\left(62,\ 1\mathrm{E}99,\ 60.7,\ \frac{3.1}{\sqrt{12}}\right) \approx 0.0732 \)
(c) Now we look at the total weight of a dozen eggs, so the expected value of the total weight is \( 12 \times 60.7 = 728.4 \) grams, and, assuming the eggs' weights are independent, the standard deviation is \( 3.1\sqrt{12} \approx 10.74 \) grams. By the 68-95-99.7 rule, about 68%, 95%, and 99.7% of the totals lie within one, two, and three standard deviations of the mean, respectively. That is, the total weight of a dozen eggs will be between 717.66 and 739.14 grams approximately 68% of the time, between 706.92 and 749.88 grams approximately 95% of the time, and between 696.18 and 760.62 grams approximately 99.7% of the time.
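The three parts can be checked with Python's standard-library `statistics.NormalDist`, whose `cdf` plays the role of the calculator's normalcdf (the upper tail is taken as 1 - cdf instead of using an upper limit of 1E99). A minimal sketch, assuming independent egg weights for part (c):

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA, N = 60.7, 3.1, 12  # egg weight model (grams) and dozen size

# (a) a single egg heavier than 62 g
p_single = 1 - NormalDist(MU, SIGMA).cdf(62)

# (b) mean weight of a dozen: the standard deviation shrinks by sqrt(n)
p_mean = 1 - NormalDist(MU, SIGMA / sqrt(N)).cdf(62)

# (c) total weight of a dozen (independent eggs): mean n*mu, sd sigma*sqrt(n)
mean_total = N * MU
sd_total = SIGMA * sqrt(N)
ranges = {k: (mean_total - k * sd_total, mean_total + k * sd_total) for k in (1, 2, 3)}

print(f"(a) {p_single:.4f}  (b) {p_mean:.4f}")
print(f"(c) mean {mean_total:.1f} g, sd {sd_total:.2f} g")
for k, (low, high) in ranges.items():
    print(f"    within {k} sd: {low:.2f} to {high:.2f} g")
```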