Solutions to practice test #1

Math 1107
Practice test #1 (selected answer key)
1. a. [Bar chart (Series1) of the percentage for each category, including Eng., Phys. Sci., Psychology, and Educ.; vertical axis from 0 to 70.]
b. A pie chart would not be appropriate since the percentages do not add up to 100.
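2. Back-to-back stem-and-leaf display (Placebo leaves on the left, Calcium leaves on the right):
          Placebo | Stem | Calcium
                8 |   9  |
              9 2 |  10  | 2 7 7
      9 7 4 2 2 0 |  11  | 0 1 2 2
                3 |  12  | 3
                0 |  13  | 6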
The distribution in the Calcium group seems to be slightly skewed to the right,
whereas the distribution in the placebo group is more symmetric. However, the
centers (sample means) of the two distributions are quite close in value.
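As a quick check on the claim that the two centers are close, here is a minimal Python sketch. It rebuilds the data values from the stem-and-leaf display and computes each group's mean and median. Two readings are assumptions and are not stated in the solution. First, each leaf is taken as the units digit, so stem 11 with leaf 4 reads 114. Second, the lone leaf 8 is read as a placebo value on stem 9. The helper name `expand` is hypothetical.

```python
from statistics import mean, median

# Back-to-back display as {stem: [leaves]}, one dict per group.
# Assumption: each leaf is the units digit, so stem 11 with leaf 4 reads 114.
# Assumption: the lone leaf 8 belongs to the placebo group on stem 9.
placebo_rows = {9: [8], 10: [9, 2], 11: [9, 7, 4, 2, 2, 0], 12: [3], 13: [0]}
calcium_rows = {10: [2, 7, 7], 11: [0, 1, 2, 2], 12: [3], 13: [6]}


def expand(rows):
    """Turn {stem: [leaves]} into a sorted list of data values (hypothetical helper)."""
    return sorted(10 * stem + leaf for stem, leaves in rows.items() for leaf in leaves)


placebo = expand(placebo_rows)
calcium = expand(calcium_rows)

print(f"Placebo: n={len(placebo)}, mean={mean(placebo):.2f}, median={median(placebo)}")
print(f"Calcium: n={len(calcium)}, mean={mean(calcium):.2f}, median={median(calcium)}")
```

Under these readings, both means come out near 113. For the Calcium group the mean (about 113.33) sits above the median (111), which is consistent with the right skew described above.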
3. The median is $Q_2 = 50.7$ and the third quartile is $Q_3 = 58.1$. Therefore, the 1964, 1972, and 1984 elections, whose values exceed $Q_3$, were landslide elections.
4. a) [Side-by-side boxplots for additives A and B.]
b) Additive A increased mileage in at least 75% of the cars: its first quartile is 1, whereas the first quartile for additive B is -2. It is therefore the better choice if the goal is to increase mileage in the highest proportion of cars.
c) Additive B clearly has the higher mean increase. Its third quartile is much higher, and its values above the third quartile are much larger than the corresponding values for additive A. Moreover, the values below the first quartile are larger for additive B than for additive A.
Review exercises Part 1:
Ex. 5. a, b) Men's heights have a greater standard deviation, so they vary more than women's heights. Therefore, a man is more likely than a woman to qualify for Beanstalk membership. To confirm this, we compute the z-scores of the heights that men and women need in order to qualify:

$$z_{\text{man}} = \frac{74 - 69.1}{2.8} = 1.75, \qquad z_{\text{woman}} = \frac{70 - 64}{2.5} = 2.4.$$

The z-score for a woman's qualifying height is much larger than the z-score for a man's, so women are less likely than men to qualify for membership. This follows from the shape of the Normal curve: because it peaks in the middle, values closer to the center are more likely than values farther from the center.
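The z-score comparison can be taken one step further by converting each z-score into the proportion of people above the cutoff. The sketch below assumes that heights follow Normal models with the means and standard deviations used above; the units are assumed to be inches. It relies only on Python's standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist

# Means and standard deviations of heights from Ex. 5 (units assumed to be inches).
MEN_MEAN, MEN_SD = 69.1, 2.8
WOMEN_MEAN, WOMEN_SD = 64.0, 2.5

# Heights needed to qualify for membership.
MEN_CUTOFF, WOMEN_CUTOFF = 74, 70

# Standardize each cutoff: how many SDs above its own group's mean it lies.
z_man = (MEN_CUTOFF - MEN_MEAN) / MEN_SD
z_woman = (WOMEN_CUTOFF - WOMEN_MEAN) / WOMEN_SD

# Proportion of each group above its cutoff under a Normal model.
std_normal = NormalDist()  # mean 0, standard deviation 1
p_man = 1 - std_normal.cdf(z_man)
p_woman = 1 - std_normal.cdf(z_woman)

print(f"z_man = {z_man:.2f}, proportion of men who qualify = {p_man:.4f}")
print(f"z_woman = {z_woman:.2f}, proportion of women who qualify = {p_woman:.4f}")
```

Under these assumptions, about 4% of men but fewer than 1% of women clear the cutoff.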
Ex. 6. a) The distribution of the number of loaves sold is skewed to the right, with a center around 110 and a range of 145 - 95 = 50.
b) Because the distribution is skewed to the right, we expect the mean to be higher than the median.
c) [Boxplot of Sales, the number of loaves sold per day; axis from 90 to 140.]
Notice that the interquartile range is $IQR = Q_3 - Q_1 = 105.5 - 97 = 8.5$. By the rule for outliers, any value greater than $Q_3 + 1.5\,IQR = 105.5 + 12.75 = 118.25$ or smaller than $Q_1 - 1.5\,IQR = 97 - 12.75 = 84.25$ is an outlier. There are no values less than 95, but there are quite a few values above 118.25, so all of those are outliers. Moreover, values more than $3\,IQR$ above the third quartile, i.e., values above $105.5 + 3(8.5) = 131$, are extreme outliers.
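The 1.5 IQR and 3 IQR rules are simple enough to script. Below is a minimal sketch with a hypothetical helper, `outlier_fences`, applied to the quartiles Q1 = 97 and Q3 = 105.5 from this exercise:

```python
def outlier_fences(q1, q3):
    """Return the 1.5*IQR and 3*IQR fences used to flag outliers and extreme outliers."""
    iqr = q3 - q1
    return {
        "iqr": iqr,
        "lower": q1 - 1.5 * iqr,        # below this: outlier
        "upper": q3 + 1.5 * iqr,        # above this: outlier
        "extreme_lower": q1 - 3 * iqr,  # below this: extreme outlier
        "extreme_upper": q3 + 3 * iqr,  # above this: extreme outlier
    }


fences = outlier_fences(q1=97, q3=105.5)
print(fences)
```

The upper fences, 118.25 and 131, match the hand computation above.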
d) We cannot conclude that on about 68% of the days the bakery will sell between 94 and 112 loaves, because the data do not follow a Normal model: the distribution is strongly skewed, with many outliers.
Ex. 8. b) To find the percentage of storms that produce rain with pH over 6, we need the area to the right of 6 under the Normal curve with mean 4.9 and standard deviation 0.6. This is normalcdf(6, E99, 4.9, 0.6) = 0.0334.
c) We need to evaluate the area under the normal curve to the left of 4:
normalcdf(-E99,4,4.9,0.6)=0.0668.
d) This is an inverse problem: we are given the area and need to find the value that cuts off that area. Reread the problem: the lower the pH, the more acidic the rain. Since we are looking at the most acidic 20% of the storms, we need the lowest 20% of the pH values. We get this with invNorm(0.20, 4.9, 0.6) = 4.4.
e) To find the pH of the 5% least acidic storms, we need the top 5% of the pH values: invNorm(0.95, 4.9, 0.6) = 5.89.
f) We have $Q_1 \approx 4.5$ from invNorm(0.25, 4.9, 0.6) and $Q_3 \approx 5.3$ from invNorm(0.75, 4.9, 0.6), so the interquartile range is $IQR = Q_3 - Q_1 \approx 0.8$.
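The TI-83 calls in Ex. 8 can be reproduced with Python's standard-library `statistics.NormalDist`. In this sketch, `normalcdf` and `inv_norm` are hypothetical helpers that mimic the calculator's normalcdf and invNorm. A very large number stands in for the calculator's E99.

```python
from statistics import NormalDist


# Hypothetical helpers that mimic the TI-83 calls used in the solutions.
def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the Normal(mu, sigma) curve between lower and upper."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)


def inv_norm(area, mu=0.0, sigma=1.0):
    """Value that cuts off `area` to its left under Normal(mu, sigma)."""
    return NormalDist(mu, sigma).inv_cdf(area)


MU, SIGMA = 4.9, 0.6  # pH of rain: mean and standard deviation
BIG = 1e99            # plays the role of the calculator's E99

print(f"b) P(pH > 6)       = {normalcdf(6, BIG, MU, SIGMA):.4f}")
print(f"c) P(pH < 4)       = {normalcdf(-BIG, 4, MU, SIGMA):.4f}")
print(f"d) 20th percentile = {inv_norm(0.20, MU, SIGMA):.2f}")
print(f"e) 95th percentile = {inv_norm(0.95, MU, SIGMA):.2f}")
q1, q3 = inv_norm(0.25, MU, SIGMA), inv_norm(0.75, MU, SIGMA)
print(f"f) Q1 = {q1:.2f}, Q3 = {q3:.2f}, IQR = {q3 - q1:.2f}")
```

Carrying full precision gives an IQR of about 0.81. The 0.8 in part f) comes from rounding the quartiles first.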
Exercise 12. We are told that 51% of homes had a personal computer (PC) and 41.5% had access to the internet. Common sense tells us that a home must have a personal computer to have internet access. It is therefore reasonable to believe that some (if not all) of the homes with internet access also had a PC. So it does not make sense to add the two percentages, and we do not agree with the newspaper's conclusion.
Exercise 14. a) We treat the data as a categorical variable. We don’t know the distance
that was traveled from home. The bar graph appears below.
[Bar chart: Chart of Miles from home. Percent (0 to 30) for the categories less than 1, 1 to 5, 6 to 10, 11 to 15, 16 to 20, and over 20 miles.]
b) As explained above, we don't know how far the car traveled; we only know where the accident happened. It may have happened while the car was returning home.
Exercise 21. a) Here is a histogram of the arm/nose ratio made in Minitab.
Note: the data can be downloaded from the text website or the CD.
[Histogram of arm/nose ratio: Frequency (0 to 6) against arm/nose ratio (12.0 to 16.8).]
The distribution of ratios is skewed to the left, with a mean around 15 and a range of
approximately 5. There may be outliers to the left of the center.
b) The summary statistics appear below:
Variable         N   N*  Mean    SE Mean  StDev  Minimum  Q1      Median  Q3      Maximum
arm/nose ratio   18  0   14.978  0.295    1.252  11.800   14.350  15.250  15.750  16.900
Despite the skewness, the mean and median are fairly close in value. The data do not have high variability, as the fairly small standard deviation and the small $IQR = 15.75 - 14.35 = 1.4$ show. However, the 1.5 IQR rule for outliers gives:
$$Q_1 - 1.5\,IQR = 14.35 - 1.5(1.4) = 12.25$$
$$Q_3 + 1.5\,IQR = 15.75 + 1.5(1.4) = 17.85$$
The minimum ratio value, 11.8, is below 12.25, so it is an outlier. There are no outliers to the right of the center, since 16.9 < 17.85.
c) Based on the observations above, a ratio of 9.3 is indeed unusually small, since it is more than 3 IQR below the first quartile: $9.3 < Q_1 - 3\,IQR = 14.35 - 3(1.4) = 10.15$.
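The same fences can be turned into a small classifier. This is a minimal sketch with a hypothetical `classify` helper. It uses the Minitab quartiles and checks the three values discussed: the minimum 11.8, the maximum 16.9, and the new ratio 9.3.

```python
def classify(value, q1, q3):
    """Label a value using the 1.5*IQR (outlier) and 3*IQR (extreme outlier) rules."""
    iqr = q3 - q1
    if value < q1 - 3 * iqr or value > q3 + 3 * iqr:
        return "extreme outlier"
    if value < q1 - 1.5 * iqr or value > q3 + 1.5 * iqr:
        return "outlier"
    return "not an outlier"


Q1, Q3 = 14.35, 15.75  # quartiles of the arm/nose ratio from the Minitab output
for ratio in (11.8, 16.9, 9.3):
    print(ratio, classify(ratio, Q1, Q3))
```

Checking the 3 IQR fence before the 1.5 IQR fence lets the function report the stronger label when both apply.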
Exercise 33. a) There were $\frac{1101}{4002} \approx 27.5\%$ Republicans.
b) Since the sample was random and representative, yes, this is a reasonable estimate.
c) There are 1001 people under 30 and 1004 over 65 in total. Since these are nonoverlapping categories, the percentage of people under 30 or over 65 is $\frac{1001 + 1004}{4002} \approx 50.1\%$.
d) There were 409 Independents under 30, so the percentage is $\frac{409}{4002} \approx 10.2\%$.
e) Be careful how you read this question: it is not the same as the one above. Now we look at the percentage of people under 30 among the Independents: $\frac{409}{1497} \approx 27.3\%$.
f) This percentage is computed out of the people under 30, so we divide by the row total: $\frac{409}{1001} \approx 40.9\%$.
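The difference between marginal, joint, and conditional percentages comes down to the denominator. This is a minimal sketch using only the counts quoted in the solution; the full two-way table is not reproduced. The under-30 total is taken as 1001, the row total used as the divisor in part f). The helper `pct` is hypothetical.

```python
# Counts quoted in the solution (the full two-way table is not reproduced here).
TOTAL = 4002
REPUBLICANS = 1101
UNDER_30 = 1001               # row total for the under-30 group (divisor in part f)
OVER_65 = 1004
INDEPENDENTS = 1497           # column total for Independents
INDEPENDENTS_UNDER_30 = 409


def pct(part, whole):
    """Percentage rounded to one decimal place (hypothetical helper)."""
    return round(100 * part / whole, 1)


answers = {
    "a) Republicans (marginal)": pct(REPUBLICANS, TOTAL),
    "c) under 30 or over 65": pct(UNDER_30 + OVER_65, TOTAL),
    "d) Independent and under 30 (joint)": pct(INDEPENDENTS_UNDER_30, TOTAL),
    "e) under 30, given Independent": pct(INDEPENDENTS_UNDER_30, INDEPENDENTS),
    "f) Independent, given under 30": pct(INDEPENDENTS_UNDER_30, UNDER_30),
}
for label, value in answers.items():
    print(f"{label}: {value}%")
```

Parts d), e), and f) share the numerator 409. Only the denominator changes: the grand total, the Independents column total, and the under-30 row total.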
Exercise 23 page 125:
b) Using the 68-95-99.7 rule, the central 95% of diameters should lie between $10.4 - 2(4.7) = 1$ in and $10.4 + 2(4.7) = 19.8$ in.
c) A diameter of less than one inch is more than 2 standard deviations below the mean. The Normal model therefore gives a percentage of about $(100\% - 95\%)/2 = 2.5\%$.
d) A diameter of 5.7 in is one standard deviation below the mean. Approximately 68% of diameters lie within one standard deviation of the mean, that is, between 5.7 and 15.1 in. By symmetry, 34% lie between 5.7 and the center, 10.4.
e) Between 10.4 and 15.1 there are also 34% of diameters. Since over 10.4 there are 50%
of diameters, we have 50%-34%=16%.
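The 68-95-99.7 arithmetic in parts b) through e) can be sketched in a few lines. This assumes the mean 10.4 in and standard deviation 4.7 in used above. The helper `interval` is hypothetical.

```python
MEAN, SD = 10.4, 4.7  # diameters in inches

# Empirical rule: percent of a Normal model within 1, 2, 3 SDs of the mean.
EMPIRICAL = {1: 68.0, 2: 95.0, 3: 99.7}


def interval(k):
    """Interval of k standard deviations on either side of the mean (hypothetical helper)."""
    return MEAN - k * SD, MEAN + k * SD


low2, high2 = interval(2)
below_low2 = (100 - EMPIRICAL[2]) / 2     # c) left tail beyond 2 SDs
between_low1_and_mean = EMPIRICAL[1] / 2  # d) from 1 SD below the mean up to the mean
above_high1 = 50 - EMPIRICAL[1] / 2       # e) more than 1 SD above the mean

print(f"b) central 95%: {low2:.1f} to {high2:.1f} in")
print(f"c) {below_low2}%  d) {between_low1_and_mean}%  e) {above_high1}%")
```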
Exercise 25. The distribution is approximately uniform. Its tails are too heavy for the Normal model to be appropriate.
Exercise 28. a), b) Since the data are so dramatically skewed to the left, the median is a better measure of center and the IQR is a better measure of spread.
c) According to the Normal model, about 68% of neighborhoods would have a percent white within one standard deviation of the mean.
d) First, one standard deviation to the right of the mean gives 83.59 + 22.26 = 105.85, which is impossible for a percentage. This shows what skewness can lead to. One standard deviation to the left gives 83.59 - 22.26 = 61.33. From the histogram, we need to estimate the number of neighborhoods with a percentage of white between 61.33% and 100%. Estimating the heights of the bars in the histogram gives a total count of at least 400. Out of the 500 neighborhoods, this is at least 400/500 = 80%, very different from the 68% the Normal model predicts.
e) The Normal model is not a good model for this data. The distribution we have is
skewed to the left. Therefore no conclusions can be reached by using the Normal model.
Exercise 31. We use the TI-83 to answer these questions. All of them are inverse problems: we are given an area under the Normal model and need to find the z-scores that cut off that area.
a) invNorm(0.8)=0.84
b) invNorm(0.25)=-0.67
c) invNorm(0.03)=-1.88
d) Since the middle 90% of the area is given, the left endpoint has an area of 0.05 to its left, and invNorm(0.05) = -1.645. By symmetry, the right endpoint of the middle interval is 1.645.
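On the standard Normal model, invNorm(area) with no mean or SD corresponds to `inv_cdf` on Python's `statistics.NormalDist()`, which defaults to mean 0 and standard deviation 1. Below is a minimal sketch reproducing parts a) through d):

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard Normal model: mean 0, SD 1

print(f"a) {std_normal.inv_cdf(0.80):.2f}")  # 80% of the area to the left
print(f"b) {std_normal.inv_cdf(0.25):.2f}")  # 25% to the left
print(f"c) {std_normal.inv_cdf(0.03):.2f}")  # 3% to the left

# d) middle 90%: 5% in each tail, so the endpoints are symmetric about 0
left = std_normal.inv_cdf(0.05)
print(f"d) {left:.3f} to {-left:.3f}")
```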
Exercise 33. a) The percent of steers that weigh over 1250 is
normalcdf(1250, E99, 1152, 84)=.1217=12.17%
b) The percent of steers weighing under 1200 is normalcdf(0,1200, 1152, 84)=71.61%
c) Between 1000 and 1100: normalcdf(1000,1100, 1152, 84)=23.28%
Exercise 37. a) To find the 40th percentile, we compute invNorm(0.40, 1152, 84) = 1130.7 lbs.
b) The 99th percentile: invNorm(0.99, 1152, 84) = 1347.4 lbs.
c) For the IQR we need the first and third quartiles: IQR = invNorm(0.75, 1152, 84) - invNorm(0.25, 1152, 84) = 1208.66 - 1095.34 = 113.32 lbs.
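Exercises 33 and 37 both use a Normal model with mean 1152 lbs and SD 84 lbs. This sketch redoes them with `statistics.NormalDist`, using `cdf` in place of normalcdf and `inv_cdf` in place of invNorm.

```python
from statistics import NormalDist

steers = NormalDist(mu=1152, sigma=84)  # steer weights in lbs

# Exercise 33: areas, like normalcdf(lower, upper, mean, sd) on the TI-83
over_1250 = 1 - steers.cdf(1250)
under_1200 = steers.cdf(1200)
between = steers.cdf(1100) - steers.cdf(1000)

# Exercise 37: percentiles, like invNorm(area, mean, sd)
p40 = steers.inv_cdf(0.40)
p99 = steers.inv_cdf(0.99)
iqr = steers.inv_cdf(0.75) - steers.inv_cdf(0.25)

print(f"33 a) {over_1250:.2%}  b) {under_1200:.2%}  c) {between:.2%}")
print(f"37 a) {p40:.1f} lbs  b) {p99:.1f} lbs  c) IQR = {iqr:.2f} lbs")
```

Two small differences from the calculator work are worth noting. Part 33 b) uses the whole left tail rather than a lower bound of 0, but the area below 0 is negligible here. The IQR computed at full precision is about 113.31 lbs; the 113.32 above comes from subtracting quartiles that were rounded first.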