Answers to Review Questions for Final

Math 251, Review for Final, Autumn 2002
(Rough Answers)
The following questions are samples of the types of questions that may be on the final. There
may be questions on the final from topics not represented here. For further review, look at your
old tests and reviews, assigned homework, etc.
Material covered since the 3rd test will probably comprise about 30% of the points on the final test.
This material includes hypothesis tests for means (large samples and small samples), hypothesis
tests for proportions, hypothesis tests for the difference of two population means, the chi-square
test for goodness of fit, and analysis of variance. The rest of the test will consist of questions
chosen from the other material covered throughout the quarter.
1. (a) Which type of random variable is the number of consumers refusing to answer a telephone
survey and what possible values can it take?
Discrete -- can take any nonnegative integer value, i.e., {0, 1, 2, 3, 4, 5, ...}
(b) How many bridge hands are there that have 4 aces? What is the probability of getting such a
hand?
The total number of bridge hands is N(S) = 52C13 = 52!/(13! 39!) = 635,013,559,600
The number of hands with four aces is N(A) = 4C4 × 48C9 = 1 × 48!/(9! 39!) = 1,677,106,640
The probability of four aces is N(A)/N(S) ≈ .00264
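The counts in 1(b) can be checked with a short Python sketch. The variable names are illustrative only; `math.comb` is the standard-library binomial coefficient.

```python
from math import comb

# All 13-card hands that can be dealt from a 52-card deck.
total_hands = comb(52, 13)

# Hands holding all 4 aces: take the 4 aces, then 9 of the 48 other cards.
four_ace_hands = comb(4, 4) * comb(48, 9)

prob_four_aces = four_ace_hands / total_hands
print(total_hands, four_ace_hands, round(prob_four_aces, 5))
```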
2. For events A and B in a sample space S, we are told P(A) = .5 and P(B) = .3 and
P(A and B) = .15. Which of the following is true?
(a) A and B are independent events.
(b) P(A or B) = .8
(c) A and B are mutually exclusive events.
(d) All of the above.
ANSWER: A
[Check that P(A and B) = P(A)*P(B) = .15. Also, (b) is false because
P(A or B) = .5 + .3 - .15 = .65, and (c) is false because P(A and B) > 0.]
3. Which of the following is true about a binomial random variable for n trials with probability of
success on each trial given as p?
(a) The probability of n successes is p^n.
(b) Its variance is equal to np(1-p).
(c) The probability of no successes is (1-p)^n.
(d) All of the above.
ANSWER: D
[Look at formulas for binomial random variables!]
4. An hypothesis test on the mean reports a P-value of .031. Which of the following is true?
(a) The null hypothesis should be accepted if the level of significance is .03.
(b) The null hypothesis should be rejected if the level of significance is .05.
(c) There is almost a 97% chance of making a Type I error.
(d) All of the above.
ANSWER: B (Reject the null hypothesis if P-value ≤ level of significance α)
5. If a 95% confidence interval for the population mean has length 12 when the sample size is
100, what would the length of a 95% confidence interval from the same population be if the
sample size were 1600?
(a) 12
(b) 48
(c) 3
(d) 6
ANSWER: C (the length decreases by a factor of 4 which is the square root of 1600 over the
square root of 100)
6. A two-tailed hypothesis test on the mean of an approximately normal population is
conducted with a sample size of n=10. For what t-values should the null hypothesis be rejected
given that the level of significance is .05?
(a) t ≤ -1.96 or t ≥ 1.96
(b) t ≤ -2.262 or t ≥ 2.262
(c) t ≤ -1.833 or t ≥ 1.833
(d) t ≤ -2.228 or t ≥ 2.228
ANSWER: B (two-tailed test, find that the critical value is 2.262 with d.f. = 9)
7. (a) Given the data 9,12,15,17,17,19,23,44,57,61,63,70. Find the mean, median, range, and the
mode.
ANSWERS:
i) Σx = 407, therefore the mean is 407 ÷ 12 ≈ 33.92.
ii) The median is in the 6.5th place, therefore the median is (19+23) ÷ 2 = 21.
iii) The range is 70 - 9 = 61.
iv) The mode is 17 which is the most common data value.
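As a quick check of 7(a), here is a minimal sketch using Python's standard `statistics` module. The variable names are ours, not part of the original answer.

```python
import statistics

data = [9, 12, 15, 17, 17, 19, 23, 44, 57, 61, 63, 70]

mean = statistics.mean(data)        # sum of the data divided by n = 12
median = statistics.median(data)    # average of the 6th and 7th ranked values
data_range = max(data) - min(data)  # largest value minus smallest value
mode = statistics.mode(data)        # most frequent value

print(round(mean, 2), median, data_range, mode)
```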
b) If your score is at the 81st percentile on a national exam which was taken by 200,000 people,
approximately how many of those 200,000 test takers scored higher than you?
ANSWER: Approximately 19% of 200,000 which is 38,000.
8. In a state with 459,341 voters, a poll of 2300 voters finds that 45 percent support the
Republican candidate, where in reality, unknown to the pollster, 42 percent support the
Republican candidate.
(a) What is the value of the statistic of interest?
ANSWER: 45%
(a statistic is a numerical property of the sample)
(b) What is the value of the parameter of interest?
ANSWER: 42% (a parameter is a numerical property of the population)
(c) Describe the population of interest.
ANSWER: The population is all voters in the state.
(d) In general, is it true that given a certain population, the parameter of interest will not change
under repeated sampling? Explain.
ANSWER: True, the parameter does not depend on a specific sample, so it doesn’t change when
the sample changes.
9. (a) According to Chebychev’s theorem, how much data from any distribution can be more than
3 standard deviations from the mean?
ANSWER: At most 1/9 of the data.
(b) Given a population of size 4,800 with unknown distribution, at least how many data values
are within 4 standard deviations of the mean?
ANSWER: At least (15/16) × 4,800 = 4,500
10. The following ranked data represent the number of miles driven each day by a salesman over
a 30-day period.
31  37  43  44  44  55  58  65  65  66
71  74  75  75  78  81  81  81  82  84
86  86  87  89  89  92  92  93  99  101
Construct a relative frequency histogram for these data whose first class has class limits 30-44:
ANSWER:
Class limits   Frequency   Rel. Freq.
30-44          5           .167
45-59          2           .067
60-74          5           .167
75-89          13          .433
90-104         5           .167
Class Width = 15
See your text for constructing the relative frequency histogram.
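The frequency table can be reproduced with a few lines of Python. This is only a sketch: the list `miles` is the ranked data from the question, and the class limits are built from the stated first class (30-44) and class width of 15.

```python
miles = [
    31, 37, 43, 44, 44, 55, 58, 65, 65, 66,
    71, 74, 75, 75, 78, 81, 81, 81, 82, 84,
    86, 86, 87, 89, 89, 92, 92, 93, 99, 101,
]
class_width = 15
lower_limits = [30, 45, 60, 75, 90]

rows = []
for lower in lower_limits:
    upper = lower + class_width - 1                  # upper class limit, e.g. 44
    freq = sum(1 for m in miles if lower <= m <= upper)
    rel_freq = round(freq / len(miles), 3)           # relative frequency
    rows.append((lower, upper, freq, rel_freq))

for lower, upper, freq, rel_freq in rows:
    print(f"{lower}-{upper}: frequency {freq}, relative frequency {rel_freq}")
```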
11. Consider the sample of 30 numbers
31  37  43  44  44  55  58  65  65  66
71  74  75  75  78  81  81  81  82  84
86  86  87  89  89  92  92  93  121  133
for which Σx = 2258 and Σx^2 = 184670 (or Σ(x - x̄)^2 = 14,717.86667). Find:
(a) the sample mean
2258 ÷ 30 ≈ 75.27
(b) the sample variance
507.51
[Note: s^2 = (30(184670) - 2258^2)/(30 × 29)]
(c) the sample standard deviation
22.53
(the square root of the answer in (b))
(d) Given that Q1= 65, Q2= 79.5 and Q3= 87, construct a boxplot for the data.
For the box plot -- see text -- noting that the lower whisker starts at 31, the box
has hinges at 65 and 87 with the vertical line in the box at 79.5 (the median), and the upper
whisker ends at 133
(e) Find the interquartile range for the data.
IQR = 22
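The summary statistics in question 11 follow directly from the given sums. The sketch below uses the computational formula for the sample variance quoted in (b); the variable names are illustrative.

```python
import math

n = 30
sum_x = 2258       # given: sum of the data
sum_x2 = 184670    # given: sum of the squared data

mean = sum_x / n
# Computational formula: s^2 = (n * sum(x^2) - (sum x)^2) / (n (n - 1))
variance = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))
sd = math.sqrt(variance)

q1, q3 = 65, 87    # quartiles given in part (d)
iqr = q3 - q1

print(round(mean, 2), round(variance, 2), round(sd, 2), iqr)
```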
12. (True or False)
(a) True The median is a resistant measure because it is not influenced by extreme
observations.
(b) False The mean is a resistant measure because extreme measures on one side average out
with those on the other side.
(c) True The mean and median are equal in a symmetric distribution.
(d) True The mean is usually to the right of the median in a distribution that is skewed to the
right.
13. The following represent scores of a group of 15 students on Math and English tests.
Scores on English Test
73 75 77 77 78 79 80 81 82 83 84 85 85 86 89
Scores on Math test
72 75 79 83 84 85 87 88 90 91 92 93 93 97 98
(a) Construct stem-and-leaf plots, splitting stems 7, 8, 9 into two parts with leaves 0-4 on one part and 5-9 on the other part.
Math:
7 | 2
7 | 5 9
8 | 3 4
8 | 5 7 8
9 | 0 1 2 3 3
9 | 7 8
English:
7 | 3
7 | 5 7 7 8 9
8 | 0 1 2 3 4
8 | 5 5 6 9
9 |
9 |
Key: 9|8 = 98
(b) Which test scores seem to have a higher standard deviation? Explain. Don't compute!
The math test scores appear to have a higher standard deviation because they are more spread out
(the English test scores are bunched much more closely together than the math test scores).
14. Suppose the distribution of test scores for a certain test is normal with μ = 70 and σ = 12.
Suppose that 500 students wrote the test.
(a) What test score would have a z-score of -2.25?
ANSWER: x = 43
[Solve -2.25 = (x - 70) ÷ 12]
(b) What score would put a student at the 90th percentile?
ANSWER: the 90th percentile has a z-score of approximately 1.28, thus x ≈ 85.36,
i.e. P90 ≈ 85.36
[The x was found by solving 1.28 = (x - 70) ÷ 12]
(c) Approximately what number of students would have scores between 60 and 90?
ANSWER: About 375 students (.7492 × 500 ≈ 375)
z = (60 - 70)/12 ≈ -.833
z = (90 - 70)/12 ≈ 1.67
P(-.833 < z < 1.67) ≈ .9525 - .2033 = .7492
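The three parts of question 14 can be checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch only; because it does not round z to two decimals the way a printed table does, its results differ slightly from the table-based answers (for example, the 90th percentile comes out near 85.38 rather than 85.36).

```python
from statistics import NormalDist

scores = NormalDist(mu=70, sigma=12)
n_students = 500

# (a) score with z = -2.25: x = mu + z * sigma
x_at_z = scores.mean + (-2.25) * scores.stdev

# (b) 90th percentile: the score with 90% of the area to its left
p90 = scores.inv_cdf(0.90)

# (c) proportion between 60 and 90, then the expected number of students
prop_60_to_90 = scores.cdf(90) - scores.cdf(60)
expected_students = n_students * prop_60_to_90

print(x_at_z, round(p90, 2), round(prop_60_to_90, 4), round(expected_students))
```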
15. A study of behavior of a large number of drug offenders after treatment for drug abuse
suggests that the likelihood of conviction within a two-year period after treatment may depend on
the offender's education. The proportions of the total number of cases falling to four
education/conviction categories are shown in the following table:
                                Convicted   Not Convicted
10 or more years of education   .1          .3
9 or less years of education    .27         .33
Suppose a single offender is selected from the treatment program. Define the events:
A: The offender has 10 or more years of education.
B: The offender is convicted within 2 years of completion of treatment.
ANSWERS:
(a) P(A or B) = P(A)+ P(B) - P(A and B) = .4 + .37 - .1 = .67
(b) P(A and B) = .1
(c) P(B|A) = P(A and B) ÷ P(A) = .1 ÷ .4 = .25
(d) The probability that neither A nor B occurs is: 1 - P(A or B) = 1 - .67 = .33
(e) A and B are not independent because P(A)P(B) = .148 ≠ P(A and B)
(f) A and B are not mutually exclusive because P(A and B) ≠ 0.
(Note that these answers (a)-(d) can be found in different ways using the table above.)
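One way to read all of these answers off the table is sketched below in Python. The dictionary keys are labels we made up for the four cells; the probabilities are the ones given in the question.

```python
# Joint proportions from the table: (education, outcome) -> proportion.
joint = {
    ("10+", "convicted"): 0.10,
    ("10+", "not convicted"): 0.30,
    ("9-", "convicted"): 0.27,
    ("9-", "not convicted"): 0.33,
}

# Marginal probabilities: add across the relevant row or column.
p_a = sum(p for (edu, _), p in joint.items() if edu == "10+")
p_b = sum(p for (_, outcome), p in joint.items() if outcome == "convicted")
p_a_and_b = joint[("10+", "convicted")]

p_a_or_b = p_a + p_b - p_a_and_b      # addition rule
p_b_given_a = p_a_and_b / p_a         # conditional probability
p_neither = 1 - p_a_or_b              # complement of (A or B)

# Independence would require P(A)P(B) = P(A and B), up to rounding error.
independent = abs(p_a * p_b - p_a_and_b) < 1e-9
mutually_exclusive = p_a_and_b == 0

print(round(p_a_or_b, 2), round(p_b_given_a, 2), round(p_neither, 2),
      independent, mutually_exclusive)
```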
16. A business employs 600 men and 400 women. Five percent of the men and 10% of the
women have been working there for more than 20 years. If an employee is selected by chance,
what is the probability the employee is male, given that the length of employment is more than
20 years?
ANSWER. Let A = event that the employee is male and
B = event the employee has worked there more than 20 years
Then P(B) = [.05*600+.10*400]/1000 = .07
P(A and B) = (.05)(600)/1000 = .03
P(A|B) = P(A and B) ÷ P(B) = .03/.07 ≈ .4286 which is the probability an employee is male given
that the length of employment is more than 20 years.
(alternatively, this can be solved noting that 30 out of the 70 employees who have worked there
for more than 20 years are male).
17. (a) How many permutations are there of 30 objects taken 3 at a time?
30P3 = 30! ÷ 27! = 30 × 29 × 28 = 24,360
(b) In how many ways can a gold medal, silver medal and bronze medal be awarded to 30
competitors in a fencing competition?
30P3 = 30! ÷ 27! = 30 × 29 × 28 = 24,360
(c) How many menu possibilities are there in a restaurant that offers 5 different appetizers, 6
Salads, 12 main dishes and 10 desserts if one choice is made from each category?
ANSWER: 5 × 6 × 12 × 10 = 3,600
(d) Suppose that a large shipment of CD’s contains 5% defective CD’s. Suppose a customer
chooses 2 of these CD’s at random. What is the probability that:
i) Both CD’s will be good?
ANSWER: (.95)(.95) = .9025
ii) Both CD’s will be defective?
ANSWER: (.05)(.05) = .0025
iii) Exactly one CD is defective?
ANSWER: 1 – (.9025+.0025) = .095
iv) At least one CD is defective?
ANSWER: 1 - .9025 = .0975
(or .095+.0025 = .0975)
v) At least one CD is good?
ANSWER: 1 - .0025 = .9975 (or .9025+ .095 = .9975)
18. A jury pool consists of 13 men and 15 women. What is the probability that a randomly
chosen jury from this pool will consist of 5 men and 7 women?
Number of ways of choosing a jury:
28C12 = 30,421,755
Number of ways of choosing a jury with 5 men and 7 women from pool:
13C5 × 15C7 = 1287 × 6435 = 8,281,845
Probability of choosing a jury with 5 men and 7 women from pool:
8,281,845 ÷ 30,421,755 ≈ .2722
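The jury calculation is a counting argument (a hypergeometric probability), and can be verified with `math.comb`. The sketch assumes, as the answer does, a 12-person jury drawn without replacement from the pool of 28.

```python
from math import comb

men, women = 13, 15
jury_size = 12

all_juries = comb(men + women, jury_size)      # any 12 of the 28 people
five_men_seven_women = comb(men, 5) * comb(women, 7)

probability = five_men_seven_women / all_juries
print(all_juries, five_men_seven_women, round(probability, 4))
```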
19. Let x be the random variable that represents the number of heads observed when
5 fair coins are tossed. Make a probability distribution for x, and find the probability that
one will get more than 3 heads when tossing five fair coins.
x      0        1        2       3       4        5
p(x)   .03125   .15625   .3125   .3125   .15625   .03125
This was computed by using the binomial probability formula, i.e. p(x) = 5Cx (.5)^x (.5)^(5-x)
Thus, P(x > 3) = .15625+.03125 = .1875. Thus there is a probability of .1875 of getting
more than 3 heads when tossing 5 fair coins. Hence if one were to toss 5 fair coins 10000 times,
they would expect to have more than 3 heads (on average) 1875 of those times.
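The table for question 19 can be generated from the binomial formula. In this sketch, `binom_pmf` is a small helper we define ourselves, not a library function.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n_coins, p_heads = 5, 0.5
distribution = [binom_pmf(k, n_coins, p_heads) for k in range(n_coins + 1)]

# "More than 3 heads" means 4 or 5 heads.
p_more_than_3 = distribution[4] + distribution[5]

for heads, prob in enumerate(distribution):
    print(heads, prob)
print("P(x > 3) =", p_more_than_3)
```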
20. Consider the random variable whose probability distribution is given by
the following table.
x      3    7    8     11
p(x)   .1   .3   .45   ?
(a) Is this a discrete or continuous random variable?
ANSWER: Discrete
(b) P(x = 11) = 1 - (.1 + .3 + .45) = .15
(c) Construct a probability histogram for p(x), and compute the expected value of x and the
standard deviation of x.
The expected value is: E(x) = 3(.1) + 7(.3) + 8(.45) + 11(.15) = 7.65
The variance is:
σ^2 = 3^2(.1) + 7^2(.3) + 8^2(.45) + 11^2(.15) - 7.65^2 = 4.0275
Thus the standard deviation is σ = 4.0275^(1/2) ≈ 2.0069
See text for constructing histograms (Section 4.1, p. 164ff).
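The expected value and standard deviation in question 20 can be checked as follows. This is a minimal sketch; `dist` is just a dictionary mapping each value of x to its probability.

```python
import math

dist = {3: 0.10, 7: 0.30, 8: 0.45}
dist[11] = 1 - sum(dist.values())    # probabilities must add up to 1

expected = sum(x * p for x, p in dist.items())
# Variance via the shortcut formula: sum of x^2 p(x), minus the squared mean.
variance = sum(x ** 2 * p for x, p in dist.items()) - expected ** 2
sd = math.sqrt(variance)

print(round(dist[11], 2), round(expected, 2), round(variance, 4), round(sd, 4))
```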
21. The following sample data concerns the number of years a student studied German in school
versus their score on a proficiency test.
Years (x)        3   4   4   2   5   3   4   5   3   2
Test Score (y)   57  78  72  58  89  63  73  84  75  48
Note: Σx = 35, Σy = 697, Σx^2 = 133, Σy^2 = 50085, Σxy = 2554
(a) Find the equation of the least squares line for this data.
slope = (10 × 2554 - 35 × 697)/(10 × 133 - 35^2) ≈ 10.90476
y-intercept = (697 ÷ 10) - 10.90476(35 ÷ 10) ≈ 31.533
Thus the equation of the line is:
y ≈ 10.905x + 31.533
(b) Use your line from (a) to predict the score on the proficiency test of a person who had 3.5
years of German.
y ≈ 10.905 × 3.5 + 31.533 ≈ 69.7
(c) Use the regression line in (a) to predict the number of years of German required to achieve a
proficiency score of 75.
x ≈ (75 - 31.533) ÷ 10.905 ≈ 3.99 years
(d) Compute the correlation coefficient r for this data. What does this coefficient suggest about a
linear relationship between number of years German was studied in school and test scores for this
sample? That is, determine whether it is a good fit, and whether it indicates a positive or
negative linear relationship.
r = [(10)(2554) - (35)(697)] ÷ [(10(133) - 35^2)^(1/2) (10(50085) - 697^2)^(1/2)] ≈ .911
This value is reasonably close to +1, which means it represents a good linear relation with
positive slope (i.e. as x increases so does y). The closer r is to +1 (or to -1) the better the linear fit
will be.
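The least squares line and correlation coefficient in question 21 can be recomputed from the raw data with the same sum formulas. The sketch below uses plain Python; the function name `predict` is ours.

```python
import math

years = [3, 4, 4, 2, 5, 3, 4, 5, 3, 2]
scores = [57, 78, 72, 58, 89, 63, 73, 84, 75, 48]
n = len(years)

sum_x, sum_y = sum(years), sum(scores)
sum_x2 = sum(x * x for x in years)
sum_y2 = sum(y * y for y in scores)
sum_xy = sum(x * y for x, y in zip(years, scores))

# Least squares slope and intercept from the sums.
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
intercept = sum_y / n - slope * (sum_x / n)

# Pearson correlation coefficient r.
r = (n * sum_xy - sum_x * sum_y) / (
    math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
)

def predict(x):
    """Predicted test score for x years of German."""
    return slope * x + intercept

print(round(slope, 5), round(intercept, 3), round(r, 3), round(predict(3.5), 1))
```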
22. Cascade Airlines (a.k.a. “Crashcade” and now defunct) records showed that on average 10%
of prospective passengers will not claim their reservations on a certain flight.
Suppose that they booked 21 passengers for 20 seats on that flight.
(a) Find the mean and standard deviation for the number of passengers who will claim a
reservation.
μ = (21)(.9) = 18.9
σ^2 = 21(.9)(.1) = 1.89
Thus the mean number of passengers showing up is 18.9 with standard deviation σ ≈ 1.375
(b) Find the probability that all passengers who show up for the flight will receive a seat.
P(x ≤ 20) = 1 - P(x = 21) = 1 - (.9)^21 = 1 - .1094 = .8906
[We have solved for P(x ≤ 20) because if 20 or fewer passengers show up, they will all have
seats.]
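The overbooking answer treats the number of passengers who show up as a binomial random variable with n = 21 and p = .9. A short sketch of the same arithmetic, with illustrative variable names:

```python
import math

n_booked = 21
p_show = 0.9

mean = n_booked * p_show
variance = n_booked * p_show * (1 - p_show)
sd = math.sqrt(variance)

# Everyone gets a seat unless all 21 booked passengers show up.
p_all_seated = 1 - p_show ** n_booked

print(round(mean, 1), round(variance, 2), round(sd, 3), round(p_all_seated, 4))
```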
23. A developer wishes to test whether the mean depth of water below the surface in a large
development tract was less than 500 feet. For the sample data, n = 32 test holes, the sample mean
was 486 feet, and the standard deviation was s = 53 feet. Complete the test using the P-value
approach, and report the conclusion for a 1% level of significance.
Null Hypothesis: μ ≥ 500
Alternative Hypothesis: μ < 500
Standardized Test Statistic: z = [486 - 500] ÷ [53/(32)^(1/2)] ≈ -1.49
P-Value: P(z < -1.49) = .5 - .4319 = .0681
We would not reject the null hypothesis at a level of significance of .01 because the
P-value is larger than .01.
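The left-tailed z test in question 23 can be sketched in Python with `statistics.NormalDist`. Because the code does not round z to -1.49 before looking up the area, its P-value (about .068) differs slightly from the table-based .0681; the conclusion is the same.

```python
import math
from statistics import NormalDist

n, sample_mean, s = 32, 486, 53
mu_0 = 500      # value of the mean under the null hypothesis
alpha = 0.01    # level of significance

z = (sample_mean - mu_0) / (s / math.sqrt(n))
p_value = NormalDist().cdf(z)     # left-tailed test: area to the left of z
reject_null = p_value <= alpha

print(round(z, 2), round(p_value, 4), reject_null)
```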
24. A vendor was concerned that a soft drink machine was not dispensing 6 ounces per cup, on
average. A sample size of 40 gave a mean amount per cup of 5.95 ounces and a standard
deviation of .15 ounce.
(a) Find the P-value
This is a two-tailed test:
Null Hypothesis: μ = 6
Alternative Hypothesis: μ ≠ 6
The observed value is:
z = [5.95 - 6] ÷ [.15/(40)^(1/2)] ≈ -2.11
The P-value is: P(z < -2.11) + P(z > 2.11) = 2(.0174) = .0348
(b) For which of the following levels of significance would the null hypothesis be rejected?
(i) α = .10
(ii) α = .05
(iii) α = .01
Reject the null hypothesis in (i) and (ii) since the P-value is smaller than α; do not reject the null
hypothesis in (iii).
(c) For each case in part (b), what type of error has possibly been committed?
Possible Type I error may occur in (i) and (ii) while a Type II error may occur in (iii).
(d) Find a 98% confidence interval for the mean amount of soda dispensed per cup.
For c = .98, zc = 2.33 (approximately): look at the z value corresponding to an area of .99 on the table,
which leaves an area of .01 in each tail.
Thus the confidence interval, using the large sample method (n is at least 30) and centered at the
sample mean, has endpoints:
5.95 ± 2.33 × .15/(40)^(1/2) ≈ 5.95 ± .0553
and so the confidence interval is:
(5.8947, 6.0053)
(e) Supposing that the population standard deviation is σ = .15, what sample size would be
needed so that the margin of error in a 98% confidence interval is E = .01?
ANSWER: The formula to use is E = zc σ/(n)^(1/2), which implies n = (zc σ/E)^2, and so we compute
n = (2.33 × .15/.01)^2 = 1221.5025
thus we should use a sample size of 1222.
25. On June 7, 1999 a poll on the USA Today website showed that out of 2000 respondents, 71%
felt that Andre Agassi deserved to be ranked among the greatest tennis
players ever.
(a) Assuming that the 2000 respondents form a random sample of the population of tennis fans,
construct a 95% confidence interval for the proportion of all tennis fans who feel that Andre
Agassi should be ranked among the greatest tennis players ever.
ANSWER: (.6901, .7299)
To find this confidence interval we computed:
.71 ± 1.96(.71 × .29/2000)^(1/2) = .71 ± .0199, where the 1.96 = zc for c = .95 and p-hat is .71.
(b) Based on (a), would you be comfortable in saying that the poll is accurate to within plus or
minus 2 percent 19 times out of 20? Explain.
Yes, the 95% confidence interval is .71 ± .0199, hence intervals based on this size of random
sample with the given proportion should have an accuracy of ± 2% on average 19 times out of
20.
(c) In actuality, the survey was based on voluntary responses from readers of the USA Today
sports website. Do you think the 2000 respondents actually formed a random sample? Explain.
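The margin of error used in 25(a) and (b) is easy to recompute. A minimal sketch, taking zc = 1.96 from the table as the answer does:

```python
import math

p_hat = 0.71    # sample proportion
n = 2000        # number of respondents
z_c = 1.96      # table value for 95% confidence

margin = z_c * math.sqrt(p_hat * (1 - p_hat) / n)
interval = (p_hat - margin, p_hat + margin)

print(round(margin, 4), round(interval[0], 4), round(interval[1], 4))
```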
No -- the readership of the website is limited to those who have access to the site and chose to
visit it; moreover, the survey was not based on a random selection of even those users of the
website, but on those who chose to respond to the poll.
26. (a) Suppose that a February Gallup poll of 1200 randomly selected voters found that 53
percent support George W. Bush's energy policy. Conduct an hypothesis test at a level of
significance of α = .01 to test whether the true voter population support for George W. Bush's
energy policy in February was greater than 50 percent.
ANSWER:
Null Hypothesis: p ≤ .50
Alternative Hypothesis: p > .50
Critical Region: z ≥ 2.33
Test Statistic: z = (.53 - .5)/(.5 × .5/1200)^(1/2) ≈ 2.08
Conclusion: Because 2.08 does not fall in the critical region there is not sufficient evidence to
reject the null hypothesis at a level of significance of 1%.
(b) Report the P-value of the test in (a) and give a practical interpretation of it.
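P-value = P(z > 2.08) = 1 - .9812 = .0188. Thus, if the true level of support were 50%, there would
be only about a 1.9% chance of observing a sample proportion of 53% or higher. This is fairly strong
evidence that more than 50% of all voters support President G.W. Bush’s energy policy, although
it is not strong enough to reject the null hypothesis at the 1% level.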
27. A brand of paint claims that in one coat, 1 gallon will cover at least 350 square feet on
average. A random sample of ten 1-gallon cans produced the following data.
Area Covered (Square Feet): 342, 378, 358, 364, 381, 392, 339, 356, 386, 347
Note: for this data Σx = 3643 and Σx^2 = 1330395
(a) Conduct the hypothesis test:
H0: μ = 350 vs. Ha: μ > 350
at a level of significance of α = .05. Be sure to state the critical region, test statistic
and conclusion in your answer.
We assume the distribution is approximately normal, so we use the Student’s t-distribution with
d.f. = 9.
The critical region is: t ≥ 1.833
From the data, we calculate the standard deviation s ≈ 19.0032 (using the same method as we
used in 11(c)) and so the test statistic is
t ≈ (364.3 - 350) ÷ (19.0032/(10)^(1/2)) ≈ 2.380
Because 2.380 is in the critical region, we reject the null hypothesis and conclude that on average
a gallon of paint covers more than 350 square feet.
(b) Construct a 99% confidence interval for the mean.
tc = 3.25 for c = .99 and d.f. = 9, thus the confidence interval is
364.3 ± 3.25 × 19.0032/(10)^(1/2)
i.e.
(344.8, 383.8)
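The t statistic and confidence interval in question 27 can be recomputed from the raw data. The standard library has no t distribution, so this sketch simply takes the critical values 1.833 and 3.250 (d.f. = 9) from the table as given in the answer; the constant names are ours.

```python
import math
import statistics

area = [342, 378, 358, 364, 381, 392, 339, 356, 386, 347]
n = len(area)

mean = statistics.mean(area)
s = statistics.stdev(area)          # sample standard deviation (divides by n - 1)
std_err = s / math.sqrt(n)

mu_0 = 350
t_stat = (mean - mu_0) / std_err

T_CRIT_ONE_TAIL_05 = 1.833          # table value: right tail, alpha = .05, d.f. = 9
T_CRIT_CI_99 = 3.250                # table value: c = .99, d.f. = 9

reject_null = t_stat >= T_CRIT_ONE_TAIL_05
ci_99 = (mean - T_CRIT_CI_99 * std_err, mean + T_CRIT_CI_99 * std_err)

print(round(s, 4), round(t_stat, 3), reject_null,
      round(ci_99[0], 1), round(ci_99[1], 1))
```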
28. In a 1993 survey of 50 Education graduates and 50 Social Science graduates, the following
data were obtained for their average starting salaries.
Major             Mean     St. Dev
Education         22,554   2225
Social Sciences   20,348   2375
(a) Find a point estimate for the difference in average starting salaries for Education and Social
Science majors.
ANSWER: 22,554 - 20,348 = 2206
(b) Let μ1 be the population mean salary for the Education graduates and μ2 be the population
mean salary for the Social Science graduates.
Report the P-value for the hypothesis test H0: μ1 - μ2 = 1200 versus Ha: μ1 - μ2 > 1200.
First, [2225^2/50 + 2375^2/50]^(1/2) ≈ 460.24. Thus we compute the test statistic
z ≈ (2206 - 1200) ÷ 460.24 ≈ 2.19
Therefore, the P-value is: P(z > 2.19) = .5 - .4857 = .0143
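The two-sample z statistic in 28(b) can be sketched as follows, using `statistics.NormalDist` for the right-tail area. The software P-value (about .0144) differs in the last digit from the table value because z is not rounded to 2.19 first.

```python
import math
from statistics import NormalDist

n1, mean1, sd1 = 50, 22554, 2225    # Education graduates
n2, mean2, sd2 = 50, 20348, 2375    # Social Science graduates
hypothesized_diff = 1200

point_estimate = mean1 - mean2
std_err = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
z = (point_estimate - hypothesized_diff) / std_err
p_value = 1 - NormalDist().cdf(z)   # right-tailed test

print(point_estimate, round(std_err, 2), round(z, 2), round(p_value, 4))
```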
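(c) Based on (b), do you think there is sufficient evidence to believe that μ1 is more than $1200
greater than μ2? Explain.
Yes -- the P-value of .0143 is small: if the true difference were $1200, a sample difference of
$2206 or more would occur only about 1.4% of the time. At a level of significance of .05 (or any
level above about .015) we would reject the null hypothesis and conclude that the mean salary for
education graduates is more than $1200 above the mean salary for social science graduates.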
29. Suppose that the probability is .91 that a person who has reservations for a certain opera will
show up, and the decision of one person is independent from that of another. Suppose the opera
has sold 1243 tickets. What is the probability that at least 1140 people will show up for the opera?
ANSWER: use the normal approximation to the binomial distribution (section 5.6):
First, we check that np = 1243 × .91 = 1131.13 > 5 and nq = 1243 × .09 = 111.87 > 5 so that it is valid
to use the normal approximation to the binomial distribution.
Then,
μ = 1131.13
σ = (1243 × .91 × .09)^(1/2) ≈ 10.0897
and
using the continuity correction, we compute
P(x > 1139.5) = P(z > (1139.5 - 1131.13) ÷ 10.0897) = P(z > .82956) ≈ 1 - .7967 = .2033
That is, there is a 20.33% chance (approximately) that at least 1140 people will show up.
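The normal approximation with continuity correction can be sketched in Python as follows. The result (about .2034) is close to the table-based .2033 because the table rounds z to .83.

```python
import math
from statistics import NormalDist

n, p = 1243, 0.91
q = 1 - p

# Rule of thumb used in the answer: np and nq should both exceed 5.
approximation_ok = n * p > 5 and n * q > 5

mu = n * p
sigma = math.sqrt(n * p * q)

# "At least 1140" becomes x > 1139.5 after the continuity correction.
z = (1139.5 - mu) / sigma
p_at_least_1140 = 1 - NormalDist().cdf(z)

print(approximation_ok, round(mu, 2), round(sigma, 4), round(z, 4),
      round(p_at_least_1140, 4))
```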
30. (a) If you were to conduct an hypothesis test to determine if the means from several different
populations are equal using the method of analysis of variance, what assumptions would you
make on the populations? What distribution would you use to conduct your test?
The populations are normal with equal variances and the samples are independent. The test uses
the F distribution. (See text p. 524)
(b) Problem 3, p. 532. (See Answer in Text)
31. (a) A local radio station claims that 15 percent of all people in Riverside say it is their
favorite station, 65 percent of all people in Riverside listen to it occasionally, while 20 percent
never listen to it. Suppose you surveyed 200 randomly selected people in
Riverside and found that of those 200 people, 20 claimed it was their favorite station, 131 said
they listen to it occasionally, while 49 never listen to it. Conduct an hypothesis test at a level of
significance of .05 to determine whether the station's claim is correct. Make sure to state the
rejection region for your test.
ANSWER:
             Favorite   Occasional   Never
%            15         65           20
O            20         131          49
E            30         130          40
(O - E)^2    100        1            81
Because the expected number is at least 5 in each category, we can proceed with the chi-square
goodness of fit test as in section 10.1. The rejection region with d.f. = 2 and α = .05 is χ^2 ≥ 5.991.
The observed value χ^2 = Σ (O-E)^2/E = 100/30 + 1/130 + 81/40 = 5.366.
Because the observed value does not fall in the rejection region, we don’t have enough evidence
at the .05 level of significance to reject the radio station’s claim.
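The chi-square statistic in 31(a) can be recomputed directly from the observed counts and the claimed percentages. In this sketch the critical value 5.991 is taken from the table (d.f. = 2, α = .05) rather than computed.

```python
observed = [20, 131, 49]           # favorite, occasional, never
claimed = [0.15, 0.65, 0.20]       # proportions claimed by the station

n = sum(observed)
expected = [n * p for p in claimed]

# Goodness-of-fit statistic: sum of (O - E)^2 / E over the categories.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

CHI2_CRITICAL = 5.991              # table value for d.f. = 2, alpha = .05
reject_claim = chi_square >= CHI2_CRITICAL

print([round(e) for e in expected], round(chi_square, 3), reject_claim)
```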
(b) What are the assumptions one must make when using the chi-square test for Goodness-of-Fit?
ANSWER: sample must be randomly selected, and the expected frequency for each category
must be at least 5.
(c) For further practice, see, e.g. problem 3, p. 500.
32. List conditions that are needed on the population and on the random sample(s) in order to
make inferences in the following settings. In some cases, there may be no conditions required, so
just list none.
(a) Confidence interval for a mean from a large sample.
ANSWER: sample size is at least 30.
(b) Hypothesis test on a mean using a small sample.
ANSWER: population must be approximately normal. If population standard deviation is
known, use the normal distribution. If population standard deviation is not known, use the
sample standard deviation and the Student’s t-distribution.
(c) Hypothesis test on a proportion.
ANSWER: np and nq must be at least 5.
(d) Hypothesis test concerning two means from large independent samples.
ANSWER: samples from each population must be independent (as suggested) and each must be
of size at least 30.
33. Confidence intervals for variance and standard deviation. Do problem #11 on p. 307. See text
for answer.