  9 .

advertisement
251y0741 5/20/07
ECO 251 QBA1
FINAL EXAM, Version 1
MAY 7 and 10, 2007
Name KEY
Class ________________
Part I. Do all the Following (14 Points) Make Diagrams! Show your work! Illegible and poorly presented
sections will be penalized. Exam is normed on 75 points. There are actually 123+ possible points. If you haven’t
done it lately, take a fast look at ECO 251 - Things That You Should Never Do on a Statistics Exam (or Anywhere
Else).
$x \sim N(10,\ 7.9)$
Material in italics below is a description of the diagrams you were asked to make or a general explanation and will
not be part of your written solution. The x and z diagrams should look similar. If you know what you are doing,
you only need one diagram for each problem. General comment - I can't give you much credit for an answer with a
negative probability or a probability above one, because there is no such thing!!! In all these problems we must
find values of $z = \frac{x - \mu}{\sigma}$ corresponding to our values of x before we do anything else. A diagram for z will help
us because, if areas on both sides of zero are shaded, we will add probabilities, while, if an area on one side of zero
is shaded and it does not begin at zero, we will subtract probabilities. Note: All the graphs shown here are missing
a vertical line. They are also to scale. A hand drawn graph should exaggerate the distances of the points from
the mean. Note also that, because of the rounding error necessary to use a conventional normal table, the
results for x will disagree with results for z, especially for small values of z. There may also be discrepancies of
.0001 between the computer generated z results and those taken from the Normal table due to the fact that Minitab
carries its probabilities as more than 5 places.
1. $P(x \ge 14) = P\left(z \ge \frac{14 - 10}{7.9}\right) = P(z \ge 0.51) = .5 - .1950 = .3050$

For z make a diagram. Draw a Normal curve with a mean at 0. Indicate the mean by a vertical line! Shade the
entire area above 0.51. Because this is entirely on one side of zero, we must subtract the area between zero and 0.51
from the entire area above zero. If you wish, make a completely separate diagram for x . Draw a Normal curve with
a mean at 10. Indicate the mean by a vertical line! Shade the area above 14. This area is entirely to the right of the
mean (10), so we subtract the area between the mean and 14 from the half of the distribution that is above the mean.
2. $P(0 \le x \le 9) = P\left(\frac{0 - 10}{7.9} \le z \le \frac{9 - 10}{7.9}\right) = P(-1.27 \le z \le -0.13) = P(-1.27 \le z \le 0) - P(-0.13 \le z \le 0)$
$= .3980 - .0517 = .3463$
For z make a diagram. Draw a Normal curve with a mean at 0. Indicate zero by a vertical line! Shade the entire
area between -1.27 and -0.13. Because this is on one side of zero, we must subtract the area between -0.13 and zero
from the larger area from -1.27 to zero. If you wish, make a completely separate diagram for x . Draw a Normal
curve with a mean at 10. Indicate the mean by a vertical line! Shade the area from 0 to 9. This area is on one side
of the mean (10), so we subtract the area between 9 and the mean from the larger area between 0 and the mean.
3. $F(2.00)$ (Cumulative Probability) $= P\left(z \le \frac{2 - 10}{7.9}\right) = P(z \le -1.01) = P(z \le 0) - P(-1.01 \le z \le 0)$
$= .5 - .3438 = .1562$
A cumulative distribution is represented by the entire area below a point. For z make a diagram. Draw a Normal
curve with a mean at 0. Indicate zero by a vertical line! Shade the entire area below -1.01. Because this is on one
side of zero, we must subtract the area between -1.01 and zero from the entire area below zero. If you wish, make a
completely separate diagram for x . Draw a Normal curve with a mean at 10. Indicate the mean by a vertical line!
Shade the entire area below 2. This area is on one side of the mean (10), so we subtract the area between 2 and the
mean from the larger area below the mean. Because the Normal distribution is symmetrical around the mean and the
entire area under a Normal curve is 1, the area below the mean must be .5.
4. $P(-2 \le x \le 36) = P\left(\frac{-2 - 10}{7.9} \le z \le \frac{36 - 10}{7.9}\right) = P(-1.52 \le z \le 3.29) = P(-1.52 \le z \le 0) + P(0 \le z \le 3.29)$
$= .4357 + .4995 = .9352$
Note that $P(0 \le z \le 3.29)$ must be read from the area below the main table. For z make a diagram. Draw a Normal
curve with a mean at 0. Indicate the mean by a vertical line! Shade the entire area between -1.52 and 3.29.
Because this is on both sides of zero, we must add the area between -1.52 and zero to the area from zero to 3.29. If
you wish, make a completely separate diagram for x . Draw a Normal curve with a mean at 10. Indicate the mean
by a vertical line! Shade the area from -2 to 36. This area is on both sides of the mean (10), so we add the area
between -2 and the mean to the area between the mean and 36.
5. $P(-10 \le x \le 10) = P\left(\frac{-10 - 10}{7.9} \le z \le \frac{10 - 10}{7.9}\right) = P(-2.53 \le z \le 0) = .4943$
For z make a diagram. Draw a Normal curve with a mean at 0. Indicate the mean by a vertical line! Shade the
area between -2.53 and zero. Because this area begins at zero, we may read the probability directly from the
standard Normal distribution table. If you wish, make a completely separate diagram for x . Draw a Normal curve
with a mean at 10. Indicate the mean by a vertical line! Shade the area from 10 to 10. This area starts at the
mean (10), so we will not need to add or subtract areas.
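
If you want to check Problems 1 through 5 without a table, the following is a minimal sketch using Python's standard library. It assumes, as above, that 7.9 is the standard deviation of x. The helper name `prob_between` is made up for this illustration. Because no rounding of z to two places takes place, the results will differ from the table answers in the third or fourth decimal place.

```python
from statistics import NormalDist

# x ~ N(10, 7.9): mean 10 and standard deviation 7.9, as in Part I.
x = NormalDist(mu=10, sigma=7.9)


def prob_between(dist, lo, hi):
    """Return P(lo <= X <= hi) as a difference of cumulative probabilities."""
    return dist.cdf(hi) - dist.cdf(lo)


p1 = 1 - x.cdf(14)             # Problem 1: area above 14
p2 = prob_between(x, 0, 9)     # Problem 2: area from 0 to 9
p3 = x.cdf(2)                  # Problem 3: F(2.00), the area below 2
p4 = prob_between(x, -2, 36)   # Problem 4: area from -2 to 36
p5 = prob_between(x, -10, 10)  # Problem 5: area from -10 up to the mean

for label, p in [("1", p1), ("2", p2), ("3", p3), ("4", p4), ("5", p5)]:
    # No such thing as a negative probability or a probability above one.
    assert 0.0 <= p <= 1.0
    print(f"Problem {label}: {p:.4f}")
```

The subtraction inside `prob_between` is the same "area below the upper point minus area below the lower point" logic that the diagrams are meant to make visible.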
6. $x_{.34}$ Solution: $z_{.34}$ is, by definition, the value of z with a probability of 34% above it. Make a diagram. The
diagram for z will show an area with a probability of 100% - 34% = 66% below $z_{.34}$. It is split by a vertical line
at zero into two areas. The lower one has a probability of 50% and the upper one a probability of 66% - 50% =
16%. The upper tail of the distribution above $z_{.34}$ has a probability of 34%, so that the entire area below $z_{.34}$
adds to 50% + 16% = 66%. From the diagram, we want one point $z_{.34}$ so that $P(z \le z_{.34}) = .66$ or
$P(0 \le z \le z_{.34}) = .1600$. If we try to find this point on the Normal table, the closest we can come is
$P(0 \le z \le 0.41) = .1591$. So we will use $z_{.34} = 0.41$, though 0.42 might be acceptable.
Since $x \sim N(10,\ 7.9)$, if we bother with a diagram for $x_{.34}$, it would show 66% probability below $x_{.34}$ split in two
regions on either side of the mean (10) with probabilities of 50% below 10 and 16% above 10 and below $x_{.34}$, and
with 34% above $x_{.34}$. We already know that $z_{.34} = 0.41$, so the value of x can then be written $x_{.34} = \mu + z_{.34}\sigma$
$= 10 + 0.41(7.9) = 10 + 3.239 = 13.239$.
Check: $P(x \ge 13.239) = P\left(z \ge \frac{13.239 - 10}{7.9}\right) = P(z \ge 0.41) = P(z \ge 0) - P(0 \le z \le 0.41) = .5 - .1591 = .3409 \approx 34\%$

7. A symmetrical region around the mean with a probability of 32%.
Solution: This is something of a gift since you can reuse $z_{.34} = 0.41$. But let’s assume that you don’t know that.
Make a diagram. The diagram for z will show a central area with a probability of 32%. It is split in two by a
vertical line at zero into two areas with probabilities of 16%. The tails of the distribution each have a probability of
50% - 16% = 34%. From the diagram, we want two points $z_{.34}$ and $z_{.66} = -z_{.34}$ so that
$P(z_{.66} \le z \le z_{.34}) = .66 - .34 = .3200$. The upper point, $z_{.34}$, will have $P(0 \le z \le z_{.34}) = \frac{32\%}{2} = .1600$, and by
symmetry $z_{.66} = -z_{.34}$. From the interior of the Normal table the closest we can come to .1600 is
$P(0 \le z \le 0.41) = .1591$, which is slightly too low. The next best point would be 0.42 since $P(0 \le z \le 0.42) = .1628$,
but 0.41 is closer so we can say $z_{.34} = 0.41$, and our 32% symmetrical interval for z is -0.41 to 0.41.
Since $x \sim N(10,\ 7.9)$, the diagram for x (if we bother) will show 32% probability split in two 16% areas
on either side of 10, with 16% above 10 and 16% below 10. The interval for x can then be written
$x = \mu \pm z_{.34}\sigma = 10 \pm 0.41(7.9) = 10 \pm 3.239$ or 6.761 to 13.239.
To check this: $P(6.761 \le x \le 13.239) = P\left(\frac{6.761 - 10}{7.9} \le z \le \frac{13.239 - 10}{7.9}\right) = P(-0.41 \le z \le 0.41)$
$= 2P(0 \le z \le 0.41) = 2(.1591) = .3182 \approx 32\%$
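
For Problems 6 and 7 the table is read backwards (from a probability to a z). A minimal sketch of the same lookup with the standard library's inverse cumulative distribution follows. It is only a check under the assumption that 7.9 is the standard deviation, and it gives the unrounded z (about 0.4125) rather than the table value 0.41, so x comes out a little larger than 13.239.

```python
from statistics import NormalDist

z = NormalDist()                  # standard Normal
x = NormalDist(mu=10, sigma=7.9)  # x ~ N(10, 7.9)

# Problem 6: the point with 34% above it has 66% below it.
z_34 = z.inv_cdf(1 - 0.34)
x_34 = x.inv_cdf(1 - 0.34)        # same as 10 + z_34 * 7.9

# Problem 7: a symmetrical 32% region leaves 34% in each tail.
lower = x.inv_cdf(0.34)
upper = x.inv_cdf(0.66)
central = x.cdf(upper) - x.cdf(lower)

print(f"z.34 = {z_34:.4f}, x.34 = {x_34:.3f}")
print(f"32% interval for x: {lower:.3f} to {upper:.3f} (area {central:.4f})")
```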
II. (10 points+, 2 point penalty for not trying parts a) and b)) Show your work! Mark individual sections clearly.
Answers without work and/or reasons cannot be accepted!
(Webster) A local bar sells ’16 oz’ glasses of beer. A group of students buys 22 glasses of beer and, using their own
measuring cup, tries to estimate the mean contents. The measurements are below. (Webster)
     13.5    15.3
     13.8    15.4
     14.2    15.4
     14.3    15.6
     14.6    15.6
     14.6    15.8
     14.6    15.8
     14.9    15.9
     15.0    16.1
     15.1    16.7
     15.3    16.9
Using Minitab, I have computed the following: $\sum_{i=1}^{22} x = 334.4$ and $\sum_{i=1}^{20} x^2 = 4533.84$.
a) Note that the sum of squares is the sum of only the first 20 numbers. To show me that you know how to compute
a sum of squares, compute $\sum x^2$ using the total that I have given to you. (1)
b) Compute the sample variance of the contents of the 22 glasses given above. (2)
c) Compute a 99% confidence interval for the mean contents (3)
d) Is the mean significantly different from 16 oz.? Why? (1)
e) Because of all the hoopla around the students’ experiment only 100 glasses of beer were sold that night.
Assuming that 100 is the population size from which the sample of 22 was taken, recompute the confidence interval
in c). (2) [9]
f) Assume that the costs to the bar owner in drawing a glass of beer are $0.15 per glass plus $0.015 per oz, use the
mean and variance that you found in b) to compute the mean and variance of the costs to the bar owner. There will
be no credit in this section unless you use the mean and variance from b). (2.5) [11.5]
Solution: The entire computation spreadsheet appears in Problem IIIA below.
a) $\bar{x} = \frac{\sum x}{n} = \frac{334.4}{22} = 15.20$ and
$\sum_{i=1}^{22} x^2 = \sum_{i=1}^{20} x^2 + x_{21}^2 + x_{22}^2 = 4533.84 + 278.89 + 285.61 = 5098.34$
b) $s_x^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1} = \frac{5098.34 - 22(15.20)^2}{21} = \frac{15.46}{21} = 0.7362$, so $s_x = \sqrt{0.7362} = 0.8580$.
Of course, many people tried to use the definitional formula $s_x^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$. But since they had forgotten what
the formula meant, they computed $\frac{\sum x^2}{n - 1}$ instead. This is doubly self-defeating, since they had already
computed $\sum x^2$.
c) Because we do not know the population standard deviation of x, we must use t. Since our confidence level is
99%, the significance level is 1%. Since our sample size is $n = 22$, we have $n - 1 = 21$ degrees of freedom.
According to Table 18, $t^{(n-1)}_{\alpha/2} = t^{(21)}_{.005} = 2.831$. We also need the standard error of the mean
$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \sqrt{\frac{0.7362}{22}} = 0.1829$. The interval is thus $\mu = \bar{x} \pm t^{(n-1)}_{\alpha/2} s_{\bar{x}} = 15.20 \pm 2.831(0.1829) = 15.20 \pm 0.52$ or 14.68
to 15.72.
d) Since 16 is not on the confidence interval we can say with 99% confidence that our calculated mean is
significantly different from 16.
e) If we must assume that the population size is 100, our standard error becomes
$s_{\bar{x}} = \sqrt{\frac{N - n}{N - 1}}\,\frac{s}{\sqrt{n}} = \sqrt{\frac{100 - 22}{100 - 1}}\sqrt{\frac{0.7362}{22}} = \sqrt{0.787879}\,(0.1829) = 0.887625(0.1829) = 0.1623$. This means that the
interval is $\mu = \bar{x} \pm t^{(n-1)}_{\alpha/2} s_{\bar{x}} = 15.20 \pm 2.831(0.1623) = 15.20 \pm 0.46$ or 14.74 to 15.66.
f) The formula relating cost to ounces is $w = 0.015x + 0.15$. We know $\bar{x} = 15.20$ and $s_x^2 = 0.7362$. Since
$w = ax + b$, and $E(w) = aE(x) + b$ and $Var(w) = a^2 Var(x)$ apply to sample data, $\bar{w} = 0.015(15.2) + 0.15 = 0.378$.
$s_w^2 = 0.015^2(.7362) = .000166$.
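
As a check on the arithmetic in Problem II, here is a minimal sketch in Python that recomputes everything from the 22 measurements. The critical value 2.831 is copied from the t table cited above rather than computed, and the variable names are made up for this illustration.

```python
import math

# The 22 measurements in ounces, in the order listed in the problem.
x = [13.5, 13.8, 14.2, 14.3, 14.6, 14.6, 14.6, 14.9, 15.0, 15.1, 15.3,
     15.3, 15.4, 15.4, 15.6, 15.6, 15.8, 15.8, 15.9, 16.1, 16.7, 16.9]
n = len(x)

sum_x = sum(x)
sum_x2 = sum(v * v for v in x)

# Computational formula: s^2 = (sum x^2 - n * xbar^2) / (n - 1)
mean = sum_x / n
var = (sum_x2 - n * mean ** 2) / (n - 1)
std_err = math.sqrt(var / n)

T_CRIT = 2.831  # t(21) with .005 in each tail, taken from the table
ci = (mean - T_CRIT * std_err, mean + T_CRIT * std_err)

# Finite population correction when the population size is N = 100.
N = 100
std_err_fpc = std_err * math.sqrt((N - n) / (N - 1))
ci_fpc = (mean - T_CRIT * std_err_fpc, mean + T_CRIT * std_err_fpc)

# Cost w = a*x + b: the mean shifts and scales, the variance only scales.
a, b = 0.015, 0.15
cost_mean = a * mean + b
cost_var = a ** 2 * var

print(f"sum x = {sum_x:.1f}, sum x^2 = {sum_x2:.2f}")
print(f"mean = {mean:.2f}, variance = {var:.4f}")
print(f"99% interval: {ci[0]:.2f} to {ci[1]:.2f}")
print(f"with correction: {ci_fpc[0]:.2f} to {ci_fpc[1]:.2f}")
print(f"cost mean = {cost_mean:.3f}, cost variance = {cost_var:.6f}")
```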
III. Do at least 5 of the following 6 sections (at least 12 each) (or do items adding to at least 48 points - Anything
extra you do helps, and grades wrap around) . Show your work! Please indicate clearly what sections of the
problem you are answering! If you are following a rule like $E(ax) = aE(x)$ please state it! If you are using a
formula, state it! If you answer a 'yes' or 'no' question, explain why! If you are using the
Poisson or Binomial table, state things like n , p or the mean. Avoid crossing out answers that you think are
inappropriate - you might get partial credit. Choose the problems that you do carefully – most of us are unlikely to
be able to do more than half of the entire possible credit in this section!) This is not an opinion questionnaire.
Answers without reasons or supporting calculations or table references will not be accepted (except in multiple
choice questions)!!!! Answers that are hard to follow will be penalized. Note that some sections extend over
more than one page.
A. Remember the problem on the previous page. The data is repeated here. x is volume in ounces. y is simply the
order in which the beers were drawn. The students believe that the bartender got more generous as the evening
passed. To test this they computed the correlation of the volume with the order. You have $\sum x$ and $\sum x^2$ from
the previous problem. It is quite easy to compute $\sum y = 253$ and $\sum y^2 = 3795$, and Minitab says $\sum_{i=1}^{20} xy = 3237.3$.
Row     x      y        Row     x      y
  1    13.5    1         12    15.3   12
  2    13.8    2         13    15.4   13
  3    14.2    3         14    15.4   14
  4    14.3    4         15    15.6   15
  5    14.6    5         16    15.6   16
  6    14.6    6         17    15.8   17
  7    14.6    7         18    15.8   18
  8    14.9    8         19    15.9   19
  9    15.0    9         20    16.1   20
 10    15.1   10         21    16.7   21
 11    15.3   11         22    16.9   22
1) Note that the sum xy is the sum of only the first 20 numbers. To show me that you know how to compute a sum
of this type, compute $\sum xy$ using the total that I have given to you. (1)
2) Compute the sample covariance $s_{xy}$ between x and y. (2) Explain what this covariance tells us. (0.5)
3) Compute the sample correlation $r_{xy}$ between x and y. (2) Explain what this correlation tells us about the
relationship between the size of the beer and the order in which it was drawn. (1) [6.5]
4) Using the correlation and covariance you computed in 2) and 3) and the equation relating the size of the beer and
its cost that you used on the last page, find the covariance and correlation between cost of the beer to the bar owner
and the order in which it was drawn. (4) [10.5]
5) The true test of a correlation is a significance test which not only looks at the size of the correlation, but the size
of the sample. To do such a test compute $t = \frac{r}{s_r} = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}}$. On the assumption that there is no significant
correlation and using a 99% confidence level, 99% of the values of this t-ratio will fall between $-t^{(n-2)}_{.005}$ and $t^{(n-2)}_{.005}$.
If your value of the t ratio does not fall between these two values, you can say that there is significant correlation.
Find the appropriate values of t, compute the t-ratio and tell me if the correlation is significant. (5) [15.5]
1) You have already found $\sum_{i=1}^{22} x^2 = \sum_{i=1}^{20} x^2 + x_{21}^2 + x_{22}^2 = 4533.84 + 278.89 + 285.61 = 5098.34$, $\bar{x} = 15.20$ and
$s_x^2 = 0.7362$ ($s_x = \sqrt{0.7362} = 0.8580$). You have been given $\sum y = 253$ (which implies that $\bar{y} = \frac{253}{22} = 11.50$) and
$\sum y^2 = 3795$. You also have $\sum x = 334.4$ and $\sum x^2 = 5098.34$.
Now you need $\sum_{i=1}^{22} xy = \sum_{i=1}^{20} xy + x_{21}y_{21} + x_{22}y_{22} = 3237.3 + 350.7 + 371.8 = 3959.8$. The spreadsheet for computing these is at the end
of this section.
2) $s_{xy} = \frac{\sum xy - n\bar{x}\bar{y}}{n - 1} = \frac{3959.8 - 22(15.2)(11.5)}{21} = \frac{114.2}{21} = 5.438095$. This implies that x and y move together. It is
impossible to judge the strength or significance of the relationship from the covariance.
3) We need a variance and/or a standard deviation for y.
$s_y^2 = \frac{\sum y^2 - n\bar{y}^2}{n - 1} = \frac{3795 - 22(11.50)^2}{21} = \frac{885.50}{21} = 42.16667$ ($s_y = \sqrt{42.16667} = 6.493587$)
$r = \frac{s_{xy}}{s_x s_y} = \frac{5.438095}{\sqrt{0.7362}\sqrt{42.16667}} = \sqrt{\frac{5.438095^2}{0.7362(42.16667)}} = \sqrt{0.952639} = .9760$. Since the square of the
correlation is quite close to 1, this is quite strong. Note that $-1 \le r \le 1$. A value not in this range is a fatal error.
4) The formula relating cost to ounces is $w = 0.015x + 0.15$. We know that if $w = ax + b$ and $v = cy + d$,
$s_{wv} = ac\,s_{xy}$ and $r_{wv} = \mathrm{sign}(ac)\,r_{xy}$. We can say that $a = 0.015$ and $c = 1$ (since $v = y = 1y + 0$). So
$s_{wv} = 0.015(1)(5.438095) = 0.0816$ and $r_{wv} = \mathrm{sign}(0.015(1))(.9760) = .9760$.
5) $t = \frac{r}{s_r} = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}} = \frac{.9760}{\sqrt{\frac{1 - .952639}{20}}} = \frac{.9760}{\sqrt{.002368}} = \sqrt{\frac{.952639}{.002368}} = \sqrt{402.30} = 20.06$. The values of t that we
compare this to are $\pm t^{(n-2)}_{\alpha/2} = \pm t^{(20)}_{.005} = \pm 2.845$. Obviously 20.06 is not between these two values, so we reject the
hypothesis that there is no significant correlation between x and y.
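
The covariance, correlation and t-ratio above can be checked with a short script. This is a minimal sketch using only the standard library; the critical value 2.845 is copied from the t table rather than computed, and the variable names are made up for this illustration. Because it carries full precision, the t-ratio it prints agrees with the hand computation only to about one decimal place.

```python
import math

# x is volume in ounces, y is the order in which the beers were drawn.
x = [13.5, 13.8, 14.2, 14.3, 14.6, 14.6, 14.6, 14.9, 15.0, 15.1, 15.3,
     15.3, 15.4, 15.4, 15.6, 15.6, 15.8, 15.8, 15.9, 16.1, 16.7, 16.9]
y = list(range(1, 23))
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Computational formulas for the sample variances and covariance.
var_x = (sum(v * v for v in x) - n * mean_x ** 2) / (n - 1)
var_y = (sum(v * v for v in y) - n * mean_y ** 2) / (n - 1)
sum_xy = sum(a * b for a, b in zip(x, y))
s_xy = (sum_xy - n * mean_x * mean_y) / (n - 1)

r = s_xy / math.sqrt(var_x * var_y)

# Significance test: t = r / sqrt((1 - r^2) / (n - 2))
t_ratio = r / math.sqrt((1 - r ** 2) / (n - 2))
T_CRIT = 2.845  # t(20) with .005 in each tail, taken from the table
significant = abs(t_ratio) > T_CRIT

# Cost w = 0.015x + 0.15 against v = y: covariance scales by a*c,
# correlation keeps its size and takes the sign of a*c.
a, c = 0.015, 1
s_wv = a * c * s_xy
r_wv = math.copysign(1.0, a * c) * r

print(f"sum xy = {sum_xy:.1f}, s_xy = {s_xy:.6f}, r = {r:.4f}")
print(f"t = {t_ratio:.2f}, significant: {significant}")
print(f"s_wv = {s_wv:.4f}, r_wv = {r_wv:.4f}")
```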
The entire spreadsheet for computing the means, variances, covariances and correlations follows. You should have
only done a very few numbers, as specified.
Row      x       y      x^2      y^2      xy
  1     13.5     1     182.25      1     13.5
  2     13.8     2     190.44      4     27.6
  3     14.2     3     201.64      9     42.6
  4     14.3     4     204.49     16     57.2
  5     14.6     5     213.16     25     73.0
  6     14.6     6     213.16     36     87.6
  7     14.6     7     213.16     49    102.2
  8     14.9     8     222.01     64    119.2
  9     15.0     9     225.00     81    135.0
 10     15.1    10     228.01    100    151.0
 11     15.3    11     234.09    121    168.3
 12     15.3    12     234.09    144    183.6
 13     15.4    13     237.16    169    200.2
 14     15.4    14     237.16    196    215.6
 15     15.6    15     243.36    225    234.0
 16     15.6    16     243.36    256    249.6
 17     15.8    17     249.64    289    268.6
 18     15.8    18     249.64    324    284.4
 19     15.9    19     252.81    361    302.1
 20     16.1    20     259.21    400    322.0
 21     16.7    21     278.89    441    350.7
 22     16.9    22     285.61    484    371.8
Sum    334.4   253    5098.34   3795   3959.8
B. Let us define the following events. $A_1: y = 3$, $A_2: y = 5$, $B_1: x = -1$, $B_2: x = 2$.
        B1     B2
A1
A2
1) Create a joint probability table like the one above on the assumption that the four
events are the only ones that can occur, $P(A_1) = .4$, $P(B_1) = .3$ and that $A_1$ and $B_1$ are independent. (3)
2) Compute the variance of y and the covariance and correlation between x and y in 1). (3)
3) Repeat 1) on the assumption that $A_1$ and $B_1$ are mutually exclusive. (2)
4) Compute the variance of y and the covariance and correlation between x and y in 3). (3)
5) Use the addition rule to show that if $P(A_1) = .4$ and $P(B_1) = .3$, these two events cannot be collectively
exhaustive no matter what the relationship between these two events. (1) [12]
Solution: 1) If $P(A_1) = .4$ and $P(B_1) = .3$ and these two events are independent, $P(A_1 \cap B_1) = P(A_1)P(B_1)$
$= .4(.3) = .12$. If we fill in the table so that the probability of the events add to 1 we get
        B1     B2
A1     .12    .28     .4
A2     .18    .42     .6
       .30    .70    1.0
2) If we create a tableau as was done in Section K of the course, we get the following.
                   x
             -1       2      P(y)    yP(y)    y^2 P(y)
y     3      .12     .28      .4      1.2       3.6
      5      .18     .42      .6      3.0      15.0
P(x)         .30     .70     1.0      4.2      18.6
xP(x)      -0.30    1.40     1.1
x^2 P(x)    0.30     2.8     3.1
To summarize $\sum P(y) = 1$, $\mu_y = E(y) = \sum yP(y) = 4.2$ and $E(y^2) = \sum y^2 P(y) = 18.6$;
$\sum P(x) = 1$, $\mu_x = E(x) = \sum xP(x) = 1.1$ and $E(x^2) = \sum x^2 P(x) = 3.1$.
$\sigma_y^2 = Var(y) = E(y^2) - \mu_y^2 = 18.6 - (4.2)^2 = 18.6 - 17.64 = 0.96$. Note that we do not need
$\sigma_x^2 = Var(x) = E(x^2) - \mu_x^2 = 3.1 - (1.1)^2 = 3.1 - 1.21 = 1.89$. We also do not need to compute the covariance or
correlation, since, if x and y are independent, the covariance and correlation are zero. But, if you don’t believe me,
$E(xy) = \sum\sum xyP(x,y) = (-1)(3)(.12) + (2)(3)(.28) + (-1)(5)(.18) + (2)(5)(.42) = -0.36 + 1.68 - 0.90 + 4.20 = 4.62$.
So $\sigma_{xy} = E(xy) - \mu_x\mu_y = 4.62 - 1.1(4.2) = 4.62 - 4.62 = 0$. $\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x\sigma_y} = \frac{0}{\sigma_x\sigma_y} = 0$.
3) If $P(A_1) = .4$ and $P(B_1) = .3$ and these two events are mutually exclusive, $P(A_1 \cap B_1) = 0$. If we fill in the table
so that the probability of the events add to 1 we get
        B1     B2
A1      0      .4     .4
A2      .3     .3     .6
       .30    .70    1.0
4)
                   x
             -1       2      P(y)    yP(y)    y^2 P(y)
y     3       0      .4       .4      1.2       3.6
      5      .3      .3       .6      3.0      15.0
P(x)         .30     .70     1.0      4.2      18.6
xP(x)      -0.30    1.40     1.1
x^2 P(x)    0.30     2.8     3.1
To summarize $\sum P(y) = 1$, $\mu_y = E(y) = \sum yP(y) = 4.2$ and $E(y^2) = \sum y^2 P(y) = 18.6$;
$\sum P(x) = 1$, $\mu_x = E(x) = \sum xP(x) = 1.1$ and $E(x^2) = \sum x^2 P(x) = 3.1$.
$\sigma_y^2 = Var(y) = E(y^2) - \mu_y^2 = 18.6 - (4.2)^2 = 18.6 - 17.64 = 0.96$.
$\sigma_x^2 = Var(x) = E(x^2) - \mu_x^2 = 3.1 - (1.1)^2 = 3.1 - 1.21 = 1.89$.
$E(xy) = \sum\sum xyP(x,y) = (-1)(3)(0) + (2)(3)(.40) + (-1)(5)(.30) + (2)(5)(.30) = 0 + 2.4 - 1.5 + 3.0 = 3.90$.
So $\sigma_{xy} = E(xy) - \mu_x\mu_y = 3.9 - 1.1(4.2) = 3.90 - 4.62 = -0.72$.
$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x\sigma_y} = \frac{-0.72}{\sqrt{0.96}\sqrt{1.89}} = -\sqrt{\frac{(0.72)^2}{0.96(1.89)}} = -\sqrt{0.285714} = -0.5345$.
5) If the events $A_1$ and $B_1$ are collectively exhaustive, $P(A_1 \cup B_1) = 1$. But the addition rule says
$P(A_1 \cup B_1) = P(A_1) + P(B_1) - P(A_1 \cap B_1)$. But this means $P(A_1) + P(B_1) - P(A_1 \cap B_1) = 1$ or
$P(A_1) + P(B_1) = 1 + P(A_1 \cap B_1)$. Since $P(A_1 \cap B_1) \ge 0$, it must be true that if the two events are collectively
exhaustive $P(A_1) + P(B_1) \ge 1$. So if $P(A_1) = .4$ and $P(B_1) = .3$, since their probabilities do not
add to 1 or more, they cannot be collectively exhaustive.
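
The two tableaus in this section can be checked mechanically. The sketch below is only an illustration: it takes the values of x to be -1 and 2 (the values consistent with $E(x) = 1.1$ and $E(x^2) = 3.1$) and the values of y to be 3 and 5, and the function name `summarize` is made up here.

```python
import math


def summarize(joint, xs, ys):
    """joint[i][j] is P(y = ys[i] and x = xs[j]).

    Returns (var_y, covariance, correlation).
    """
    p_x = [sum(row[j] for row in joint) for j in range(len(xs))]
    p_y = [sum(row) for row in joint]
    mean_x = sum(x * p for x, p in zip(xs, p_x))
    mean_y = sum(y * p for y, p in zip(ys, p_y))
    var_x = sum(x * x * p for x, p in zip(xs, p_x)) - mean_x ** 2
    var_y = sum(y * y * p for y, p in zip(ys, p_y)) - mean_y ** 2
    e_xy = sum(ys[i] * xs[j] * joint[i][j]
               for i in range(len(ys)) for j in range(len(xs)))
    cov = e_xy - mean_x * mean_y
    corr = cov / math.sqrt(var_x * var_y)
    return var_y, cov, corr


xs = [-1, 2]  # B1, B2
ys = [3, 5]   # A1, A2

independent = [[0.12, 0.28],
               [0.18, 0.42]]
exclusive = [[0.0, 0.4],
             [0.3, 0.3]]

var_y_ind, cov_ind, corr_ind = summarize(independent, xs, ys)
var_y_exc, cov_exc, corr_exc = summarize(exclusive, xs, ys)

print(f"independent: Var(y) = {var_y_ind:.2f}, cov = {cov_ind:.4f}, corr = {corr_ind:.4f}")
print(f"exclusive:   Var(y) = {var_y_exc:.2f}, cov = {cov_exc:.4f}, corr = {corr_exc:.4f}")
```

Note that the marginal probabilities, and therefore both variances, are the same in the two tables; only the joint cells, and so the covariance, change.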
C. Answer the following 6 multiple choice questions. (These should be 2 each, but to discourage guessing, how
about 2.5 each for right answers and 0.5 penalty for wrong answers.)
1) Which of the following is a major difference between the binomial and the hypergeometric
distributions?
a) The sum of the outcomes can be greater than 1 for the hypergeometric.
b) *The probability of a success changes in the hypergeometric distribution.
c) The number of trials changes in the hypergeometric distribution
d) The outcomes cannot be whole numbers in the hypergeometric distribution
e) None of the above is correct.
2) The continuity correction factor is used when
a) The sample size is at least 5.
b) Both np and nq are at least 30.
c) *A continuous distribution is used to approximate a discrete distribution
d) A discrete distribution is used to approximate a continuous distribution
e) A binomial distribution is used to approximate the hypergeometric distribution
f) None of the above.
3) Suppose a population consisted of 20 items. How many different samples of n = 3 are possible?
a) 20
b) 40
c) 120
d) $20^3 = 8000$
e) *1140
f) 6840
g) None of the above is correct.
$C^{20}_3 = \frac{20!}{17!\,3!} = \frac{20 \cdot 19 \cdot 18}{3 \cdot 2 \cdot 1} = 1140$
4) The finite population correction factor is used when
a) * n/N is more than .05.
b) N is more than 1000
c) np is greater than 5.
d) n is more than 30.
e) None of the above
5) In the formula $\mu = \bar{x} \pm z_{\alpha/2}\sigma_{\bar{x}}$, the $\alpha/2$ is
a) The confidence level
b) *The area of one tail of the sampling distribution of the mean
c) The probability that the interval would not include the mean
d) The proportion of confidence intervals that will contain the mean
e) None of the above.
6) All Normal distributions have
a) A finite range
b) The same coefficient of variation
c) The same probability density function $f(x)$
d) *The same area between $\mu - \sigma$ and $\mu + 2\sigma$
e) All of the above must be true.
Explanation: $P(\mu - \sigma \le x \le \mu + 2\sigma) = P\left(\frac{(\mu - \sigma) - \mu}{\sigma} \le z \le \frac{(\mu + 2\sigma) - \mu}{\sigma}\right) = P(-1 \le z \le 2)$. This
computation does not depend on the value of the mean or standard deviation.
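
Two of the answers in this section are easy to confirm numerically. The following is a minimal sketch: `math.comb` counts the samples in question 3, and the helper `area` (a name made up for this illustration) evaluates the area in question 6 for several arbitrarily chosen means and standard deviations.

```python
from math import comb
from statistics import NormalDist

# Question 3: number of different samples of 3 from a population of 20.
samples = comb(20, 3)


def area(mu, sigma):
    """Area between mu - sigma and mu + 2*sigma for a Normal(mu, sigma)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + 2 * sigma) - dist.cdf(mu - sigma)


# Question 6: the parameter pairs below are arbitrary examples.
areas = [area(0, 1), area(10, 7.9), area(-3, 0.5)]

print(samples)
print([round(a, 4) for a in areas])
```

Every entry in `areas` is the same number, $P(-1 \le z \le 2)$, which is the point of the explanation above.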
D.
1) We are baking chocolate chip cookies again. Assume that all cookies are discarded that have less than 4
chips, but the mean is only 4.5, what proportion of cookies will be discarded? (1)
2) In view of the results of problem 1 and using only the distributions for which you have tables, what is
the lowest value the mean can take if we want to be sure that no more than 1% of the cookies are discarded?
3) Find $P(1 \le x \le 15)$ for the following distributions. If you must substitute one distribution for another,
show that the substitution criterion is satisfied.
a) Poisson with parameter of 2.5 (1)
b) Poisson with parameter of 36 (2)
c) Binomial with $n = 35$ and $p = .02$. (2)
d) Binomial with $n = 35$ and $p = .2$ (2)
e) Binomial with $n = 50$ and $p = .2$ (1)
f) Binomial with $n = 50$ and $p = .55$ (2)
g) Continuous uniform $c = 8$ and $d = 13$ (1) [13]
4) Find $P(x \ge 1)$ for the following.
a) Hypergeometric $N = 16$, $p = .25$, $n = 5$ (2)
b) Hypergeometric $N = 600$, $p = .25$, $n = 5$ (2)
c) Geometric $p = .25$ (1)
[19 – 58.5]
Solution:
1) We are baking chocolate chip cookies again. Assume that all cookies are discarded that have
less than 4 chips, but the mean is only 4.5, what proportion of cookies will be discarded? (1) For the Poisson
distribution with a parameter of 4.5, $P(x < 4) = P(x \le 3) = .34230$.
2) In view of the results of problem 1 and using only the distributions for which you have tables, what is
the lowest value the mean can take if we want to be sure that no more than 1% of the cookies are discarded? If we
just wander through the tables, as the mean rises, the probability $P(x \le 3)$ falls. If the mean is 10, $P(x \le 3)$ = .01034.
But if the mean is 10.5, $P(x \le 3)$ = .00715, which is below 1%. So the lowest value the mean can take is somewhere
between 10 and 10.5.
3) Find $P(1 \le x \le 15)$ for the following distributions. If you must substitute one distribution for another,
show that the substitution criterion is satisfied.
The substitution table in ‘Great Distributions I Have Known’ is copied below.
Replace           With                                             If
Binomial          Poisson with $m = np$                            $n/p > 500$
Hypergeometric    Binomial with $p = M/N$                          $N > 20n$
Binomial          Normal with $\mu = np$,                          $np > 5$ and $nq > 5$
                  $\sigma = \sqrt{npq}$
Poisson           Normal with $\mu = m$,                           $m > 25$
                  $\sigma = \sqrt{m}$
Hypergeometric    Normal with $p = M/N$, $\mu = np$,               $np > 5$ and $nq > 5$, but
                  $\sigma = \sqrt{npq\,\frac{N - n}{N - 1}}$       think about Binomial if $N > 20n$
There seems to be a fairly large minority that has decided that one can use the Normal distribution to
approximate any distribution without justification. Wake up!

Note that in all solutions below the continuity correction is used. No credit was lost for
solutions that did not use it. Credit was added at my discretion when it was used.
a) Poisson with parameter of 2.5 (1)
$P(1 \le x \le 15) = P(x \le 15) - P(x \le 0) = 1 - P(x \le 0) = 1 - .08208 = .9179$
For no good reason, as soon as they saw the word ‘Poisson,’ people started writing
$\frac{e^{-m} m^x}{x!}$. This very rarely gets you any credit. If you don’t know how to use the
tables, you shouldn’t be doing this section.
b) Poisson with parameter of 36 (2). Since the mean is over 25, we can replace the Poisson
distribution with the Normal. If we use the continuity correction, $P(1 \le x \le 15)$
$\approx P_N(0.5 \le x \le 15.5) = P\left(\frac{0.5 - 36}{\sqrt{36}} \le z \le \frac{15.5 - 36}{\sqrt{36}}\right) = P(-5.92 \le z \le -3.42)$
$= .5 - .4997 = .0003$
c) Binomial with $n = 35$ and $p = .02$. (2). Since $\frac{n}{p} = \frac{35}{.02} = 1750 > 500$, we can use the Poisson
distribution with a mean of $np = .02(35) = 0.7$. $P(1 \le x \le 15) = P(x \le 15) - P(x \le 0)$
$= 1 - .49659 = .5034$.
d) Binomial with $n = 35$ and $p = .2$ (2). Since the expected number of successes is
$\mu = np = .2(35) = 7 > 5$ and the expected number of failures is $n - np = 35 - 7 = 28 > 5$,
we can use the Normal distribution with standard deviation $\sigma = \sqrt{npq} = \sqrt{7(.8)} = \sqrt{5.60}$
$= 2.366$. $P(1 \le x \le 15) \approx P_N(0.5 \le x \le 15.5) = P\left(\frac{0.5 - 7}{2.366} \le z \le \frac{15.5 - 7}{2.366}\right)$
$= P(-2.75 \le z \le 3.59) = .4970 + .4998 = .9968$
e) Binomial with $n = 50$ and $p = .2$ (1) $P(1 \le x \le 15) = P(x \le 15) - P(x \le 0)$ = .96920 - .00001
= .9692
f) Binomial with $n = 50$ and $p = .55$. (2) The probability of failure is .45. 1 success corresponds
to 49 failures and 15 successes corresponds to 35 failures. $P(35 \le x \le 49)$
$= P(x \le 49) - P(x \le 34)$ = 1 - .99670 = .0033
g) Continuous uniform $c = 8$ and $d = 13$. (1) Make a diagram. You will have a rectangle with a
height of $\frac{1}{d - c} = \frac{1}{13 - 8} = \frac{1}{5}$. Since the entire area of the rectangle is between 1 and 15,
shade the whole bloody thing. The area of the rectangle is 1, so $P(1 \le x \le 15)$ = 1.
4) Find $P(x \ge 1)$ for the following.
a) Hypergeometric $N = 16$, $p = .25$, $n = 5$ (2) $M = Np = 4$. $1 - P(0) = 1 - \frac{C^4_0\, C^{12}_5}{C^{16}_5}$
$= 1 - \frac{12!/(7!\,5!)}{16!/(11!\,5!)} = 1 - \frac{12 \cdot 11 \cdot 10 \cdot 9 \cdot 8}{16 \cdot 15 \cdot 14 \cdot 13 \cdot 12} = 1 - .1813 = .8187$
b) Hypergeometric $N = 600$, $p = .25$, $n = 5$. (2) Since the sample is much less than 5% of the
population, use the binomial distribution with $p = .25$, $n = 5$. $1 - P(0) = 1 - .23730$
$= .7627$. The Normal distribution is not appropriate here because the mean is too small.
c) Geometric $p = .25$ (1). There are a number of ways to approach this like computing $1 - P(0)$.
However, the easiest way to do this is to note that if the first success must fall on a try
after the first one, there must be a failure on the first try, so the probability is $q = .75$. [19 – 58.5]
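
None of this replaces the tables on an exam, but the table values used in this section can be reproduced from the definitions. The following is a minimal sketch with the standard library; the function names are made up for this illustration, and the last lines compare the exact hypergeometric answer for $N = 600$ with the binomial substitute.

```python
from math import comb, exp, factorial


def poisson_cdf(k, m):
    """P(X <= k) for a Poisson distribution with mean m."""
    return sum(exp(-m) * m ** i / factorial(i) for i in range(k + 1))


def binom_cdf(k, n, p):
    """P(X <= k) for a Binomial distribution with n trials, success probability p."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def hypergeom_pmf(k, N, M, n):
    """P(X = k): k successes in a sample of n from N items, M of them successes."""
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)


# 1) Cookies with fewer than 4 chips when the mean is 4.5.
discarded = poisson_cdf(3, 4.5)

# 2) The 1% cutoff is crossed somewhere between means of 10 and 10.5.
at_10, at_10_5 = poisson_cdf(3, 10), poisson_cdf(3, 10.5)

# 3a) and 3e): P(1 <= x <= 15) = P(x <= 15) - P(x <= 0)
p3a = poisson_cdf(15, 2.5) - poisson_cdf(0, 2.5)
p3e = binom_cdf(15, 50, 0.2) - binom_cdf(0, 50, 0.2)

# 4a) and 4b): one minus the probability of no successes.
p4a = 1 - hypergeom_pmf(0, 16, 4, 5)
p4b_exact = 1 - hypergeom_pmf(0, 600, 150, 5)
p4b_binom = 1 - binom_cdf(0, 5, 0.25)

print(f"discarded = {discarded:.5f}")
print(f"P(x <= 3): mean 10 -> {at_10:.5f}, mean 10.5 -> {at_10_5:.5f}")
print(f"3a = {p3a:.4f}, 3e = {p3e:.4f}")
print(f"4a = {p4a:.4f}, 4b exact = {p4b_exact:.4f}, 4b binomial = {p4b_binom:.4f}")
```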
E.
1) Do the following confidence intervals.
(Webster) Assume that a real estate developer wants to estimate the mean family income in an area
where a mall is proposed. The developer assumes that the population standard deviation for income is
$7200
a) If a survey of 100 families yields a sample mean of $35500, create a 90% confidence
interval for the mean. (3)
b) Find a confidence interval using the data in a) with a 32% confidence level. (2)
c) Assume a 90% confidence level again and that the sample of 100 families comes from
a neighborhood of only 1000 families. Repeat a) (3)
d) (Extra Credit) Assume that your results in a) are correct and that there are 30000
families in the target area. Can you make the 90% confidence interval in a) into a
confidence interval for total income? (2)
d) Is the income found in a) significantly different from $36000? Why? (1) [9]
e) A Business Week article claims that 25% of CEOs are ‘outsiders.’ A survey of 350
corporations is taken to check this ‘fact’ and the survey finds that 77 of the 350 firms
have outsider CEOs. Create a 90% confidence interval for the proportion of firms that
have outsider CEOs. Does this conflict with the statement in Business Week? Why? (4)
2) The amount of time a bank teller spends with each customer has a population mean $\mu = 3.10$ minutes
and a population standard deviation $\sigma = 0.40$ minute.
a) If a random sample of 16 customers is selected from a large normally distributed
population, what is the probability that the average (mean) time spent per customer is less
than 3 minutes? (2.5)
b) How would this change if the sample was taken from a population of only 144
customers? Give a specific answer! (2)
c) If the teller puts in a 420 minute day, and the conditions in 2a) prevail, what is the
probability of serving 150 customers or more? (2.5) [20]
d) Under the circumstances in a), what is the probability that the time spent with 1 of the
16 customers is less than 3 minutes? (2)
e) Under the circumstances in a), what is the probability that the time spent with each of
the 16 customers in 2a) is less than 3 minutes apiece? (2) [26 – 84.5]
Solution: 1 Assume that a real estate developer wants to estimate the mean family income in an area
where a mall is proposed. The developer assumes that the population standard deviation for income is
$7200
a) If a survey of 100 families yields a sample mean of $35500, create a 90% confidence
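interval for the mean. (3) The relevant formula for this section is $\mu = \bar{x} \pm z_{\alpha/2}\sigma_{\bar{x}}$.
If $1 - \alpha = .90$, $\alpha = .10$ and $z_{\alpha/2} = z_{.05} = 1.645$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{7200}{\sqrt{100}} = 720$.
$\mu = 35500 \pm 1.645(720) = 35500 \pm 1184$ or 34316 to 36684.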
b) Find a confidence interval using the data in a) with a 32% confidence level. (2)
If $1 - \alpha = .32$, $\alpha = .68$ and $z_{\alpha/2} = z_{.34}$. We found $z_{.34} = 0.41$ on the first page of the exam.
$\mu = 35500 \pm 0.41(720) = 35500 \pm 295$
c) Assume a 90% confidence level again and that the sample of 100 families comes from
a neighborhood of only 1000 families. Repeat a) (3)
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}} = \frac{7200}{\sqrt{100}}\sqrt{\frac{1000 - 100}{1000 - 1}} = 720\sqrt{.9009} = 720(.9492) = 683.40$
$\mu = 35500 \pm 1.645(683.40) = 35500 \pm 1124$
d) (Extra Credit) Assume that your results in a) are correct and that there are 30000
families in the target area. Can you make the 90% confidence interval in a) into a
confidence interval for total income? (2) We have $\mu = 35500 \pm 1.645(720) = 35500 \pm 1184$ or
34316 to 36684. Just multiply each number by 30000. This gives us 1,029,480,000 to
1,100,520,000
d) Is the income found in a) significantly different from $36000? Why? (1) [9] Since 36000 is
included in the confidence interval, we cannot say that the average income is significantly
different from 36000.
e) A Business Week article claims that 25% of CEOs are ‘outsiders.’ A survey of 350
corporations is taken to check this ‘fact’ and the survey finds that 77 of the 350 firms
have outsider CEOs. Create a 90% confidence interval for the proportion of firms that
have outsider CEOs. Does this conflict with the statement in Business Week? Why? (4)
The observed proportion is \( \bar{p} = \frac{77}{350} = 0.22 \). A confidence interval for a proportion is
\( p = \bar{p} \pm z_{\alpha/2}\sqrt{\frac{\bar{p}\,\bar{q}}{n}} = 0.22 \pm 1.645\sqrt{\frac{.22(.78)}{350}} = 0.22 \pm 1.645\sqrt{.00049} = 0.22 \pm 1.645(.0221) \)
\( = 0.22 \pm 0.036 \) or .184 to .256. Since 25% is included in this interval, there is no conflict.
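The proportion interval for the CEO survey can be checked with a few lines of Python. This is a sketch rather than the key's method: the function name `proportion_interval` is hypothetical, and the multiplier 1.645 is passed in explicitly to mirror the table value.

```python
from math import sqrt


def proportion_interval(successes, n, z):
    """Large-sample z interval for a population proportion."""
    p_bar = successes / n
    se = sqrt(p_bar * (1 - p_bar) / n)  # sqrt(p_bar * q_bar / n)
    return p_bar - z * se, p_bar + z * se


# 77 of 350 firms have outsider CEOs; 90% confidence uses z_.05 = 1.645.
low, high = proportion_interval(77, 350, 1.645)

# The Business Week figure conflicts with the data only if it falls outside.
claim_inside = low <= 0.25 <= high
print(round(low, 3), round(high, 3), claim_inside)
```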
2) The amount of time a bank teller spends with each customer has a population mean \( \mu = 3.10 \) minutes
and a population standard deviation \( \sigma = 0.40 \) minute. This is essentially Exercise 7.71 on CD and the
solution was posted.
a) If a random sample of 16 customers is selected from a large normally distributed
population, what is the probability that the average (mean) time spent per customer is less
than 3 minutes? (2.5) The sample mean has a Normal distribution with a mean of 3.10 and a
standard deviation of \( \sigma_{\bar{x}} = \frac{0.40}{\sqrt{16}} = 0.10 \). So \( x \sim N(3.10, 0.40) \) and \( \bar{x} \sim N(3.10, 0.10) \).
\( P(\bar{x} < 3) = P\left(z < \frac{3 - 3.10}{.10}\right) = P(z < -1.00) = P(z < 0) - P(-1.00 \le z \le 0) = .5 - .3413 = .1587 \)
b) How would this change if the sample was taken from a population of only 144
customers? Give a specific answer! (2)
\( \sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{0.40}{\sqrt{16}}\sqrt{\frac{144-16}{144-1}} = \sqrt{\frac{.40^2(128)}{16(143)}} = \sqrt{0.008951} = .09461 \)
\( P(\bar{x} < 3) = P\left(z < \frac{3 - 3.10}{.09461}\right) = P(z < -1.06) = P(z < 0) - P(-1.06 \le z \le 0) = .5 - .3554 = .1446 \)
c) If the teller puts in a 420 minute day, and the conditions in 2a) prevail, what is the
probability of serving 150 customers or more? (2.5) [20] In order to do this, the teller must spend
an average of \( \frac{420}{150} = 2.80 \) minutes or less with a customer.
\( P(\bar{x} \le 2.80) = P\left(z \le \frac{2.80 - 3.10}{.10}\right) = P(z \le -3.00) = P(z \le 0) - P(-3.00 \le z \le 0) \)
\( = .5 - .4987 = .0013 \)
d) Under the circumstances in a), what is the probability that the time spent with 1 of the
16 customers is less than 3 minutes? (2) \( x \sim N(3.10, 0.40) \)
\( P(x < 3) = P\left(z < \frac{3 - 3.10}{.40}\right) = P(z < -0.25) = P(z < 0) - P(-0.25 \le z \le 0) \)
\( = .5 - .0987 = .4013 \)
e) Under the circumstances in a), what is the probability that the time spent with each of
the 16 customers in 2a) is less than 3 minutes apiece? (2) [26 – 84.5]
If we can assume that the times are independent, we have \( (.4013)^{16} = .000000452 \)
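The teller probabilities in a), b), d) and e) all come from the same Normal model with different standard errors, which the following sketch makes explicit. It is an illustration only: the function name `prob_mean_below` is hypothetical, and it evaluates the exact Normal CDF instead of rounding z to two places, so values can differ from table lookups in the third or fourth decimal.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA = 3.10, 0.40  # population mean and standard deviation (minutes)


def prob_mean_below(limit, n, N=None):
    """P(sample mean of n service times < limit).

    With n = 1 this is the probability for a single customer.
    If N is given, the finite population correction is applied.
    """
    se = SIGMA / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    return NormalDist(MU, se).cdf(limit)


p_mean_16 = prob_mean_below(3.0, 16)          # part a)
p_mean_fpc = prob_mean_below(3.0, 16, N=144)  # part b)
p_single = prob_mean_below(3.0, 1)            # part d)
p_all_16 = p_single ** 16                     # part e), assuming independence

for label, p in [("a", p_mean_16), ("b", p_mean_fpc),
                 ("d", p_single), ("e", p_all_16)]:
    print(label, p)
```

Note how shrinking the standard error in b) makes a mean below 3 minutes slightly less likely, while the single-customer probability in d) is much larger because one observation has the full standard deviation of 0.40.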
F.
1) (Dunleavy) Assume that for every 100,000 people treated in an emergency room, 5,000 are
misdiagnosed, and 1,000 misdiagnosed patients die. Overall, 90,000 of the 100,000 live through the
experience.
Define the following events: \( MD \) misdiagnosed; \( S \) survives; \( \overline{MD} \) not misdiagnosed; and
\( \overline{S} \) dies.
Don’t just guess, make a table. Distinguish clearly between conditional and joint probabilities.
a) What is the probability that a patient will be diagnosed correctly? (1)
b) What is the probability that a patient will be diagnosed correctly and live? (1)
c) What is the probability that a patient will die given that they were diagnosed correctly?
(2)
d) Are the diagnosis and survival independent? (1)
e) If a patient dies, what is the probability that the patient was not misdiagnosed? (2) [7]
f) What is the value of \( P(MD \cup \overline{S}) \)?
2) OK. I ran out of ideas, so here is a mini-Jorcillator problem.
Here is the joint probability table and the table relating joint events to failure times.
Failure of the phillinx in period 1, 2, 3, 4 are events A1, A2, A3, and A4.
Failure of the flubberall in period 1, 2, 3, 4 are events B1, B2, B3, and B4.
Failure of the jorcillator in period 1, 2, 3, 4 are events C1, C2, C3, and C4.
a) Assuming that the Jorcillator lives as long as one of its components lives, fill in the
second table with failure times of 1, 2, 3 and 4. (1)

          A1     A2     A3     A4    Total
B1       .20    .12    .04    .04     .40
B2       .15    .09    .03    .03     .30
B3       .10    .06    .02    .02     .20
B4       .05    .03    .01    .01     .10
Total    .5     .3     .1     .1     1.0

          A1     A2     A3     A4
B1
B2
B3
B4

b) Find \( P(C_1) \) (1)
c) Find \( P(C_2) \) (1)
d) Find \( P(C_3) \) (1)
e) Find \( P(C_4) \) (1)
f) Find \( P(C_3 \mid A_3) \) (1) [13 – 97.5]
Solution: 1) (Dunleavy) Assume that for every 100,000 people treated in an emergency room, 5,000 are
misdiagnosed, and 1,000 misdiagnosed patients die. Overall, 90,000 of the 100,000 live through the
experience.
Define the following events: \( MD \) misdiagnosed; \( S \) survives; \( \overline{MD} \) not misdiagnosed; and
\( \overline{S} \) dies.
Don’t just guess, make a table. Distinguish clearly between conditional and joint probabilities.
a) What is the probability that a patient will be diagnosed correctly? (1)
Facts: \( P(\overline{MD}) = 1 - .05 = .95 \). As always, the easiest way to do this type of problem is a box. Let us assume
that there are only 100 people available. The probability of being diagnosed correctly is \( \frac{95000}{100000} = .95 \).
This means that we can divide our group of 100 into 95 diagnosed correctly and 5 that were not. We also
know that \( P(S) = \frac{90000}{100000} = .90 \) so 90 out of 100 survive.

                      S    not S   Total
MD                                    5
not MD                               95
Total                90      10     100

If someone is misdiagnosed the probability of dying is \( \frac{1000}{5000} = .2 \). So out of the 5 misdiagnosed, 20% or 1 die.

                      S    not S   Total
MD                            1       5
not MD                               95
Total                90      10     100

Fill in the rest of the table.

                      S    not S   Total
MD                    4       1       5
not MD               86       9      95
Total                90      10     100

This can be converted into a joint probability table by dividing by 100.

                      S    not S   Total
MD                  .04     .01     .05
not MD              .86     .09     .95
Total               .90     .10    1.00

b) What is the probability that a patient will be diagnosed correctly and live? (1) \( P(\overline{MD} \cap S) = .86 \)
c) What is the probability that a patient will die given that they (?) were diagnosed correctly? (2)
\( P(\overline{S} \mid \overline{MD}) = \frac{.09}{.95} = .09474 \). No! You still cannot read a conditional probability directly from a joint
probability table.
d) Are the diagnosis and survival independent? (1) \( P(\overline{S} \mid \overline{MD}) = \frac{.09}{.95} = .09474 \). \( P(\overline{S}) = .10 \). Since these are not
identical, they are not independent. Better, just note that the joint probabilities inside the table are not the
products of the probabilities outside the table.
e) If a patient dies, what is the probability that the patient was not misdiagnosed? (2) [7]
According to Bayes’ rule \( P(\overline{MD} \mid \overline{S}) = \frac{P(\overline{S} \mid \overline{MD})\,P(\overline{MD})}{P(\overline{S})} = \frac{\frac{.09}{.95}(.95)}{.10} = \frac{.09}{.10} = .9 \)
f) What is the value of \( P(MD \cup \overline{S}) \)? I’m feeling lazy. Look at the table. 1 - .86 = .14.
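The box method above can be mirrored in a short Python sketch that fills in the table from the four given counts and then reads off each answer. It is offered as a check, not as the key's own procedure; all variable names are hypothetical, and part f) is computed as the complement of "diagnosed correctly and survives", matching the 1 - .86 step.

```python
# Counts per 100,000 emergency room patients, from the problem statement.
total = 100_000
misdiagnosed = 5_000
misdiagnosed_die = 1_000
survive = 90_000

# Fill in the rest of the table by subtraction.
die = total - survive                      # 10,000 die overall
correct = total - misdiagnosed             # diagnosed correctly
correct_die = die - misdiagnosed_die       # correctly diagnosed but die
correct_survive = correct - correct_die    # correctly diagnosed and survive

p_die = die / total
p_correct = correct / total                            # a) marginal
p_correct_and_survive = correct_survive / total        # b) joint
p_die_given_correct = correct_die / correct            # c) conditional
independent = abs(p_die_given_correct - p_die) < 1e-12 # d)
# e) Bayes' rule: P(correct | die) = P(die | correct) P(correct) / P(die)
p_correct_given_die = p_die_given_correct * p_correct / p_die
p_md_or_die = 1 - p_correct_and_survive                # f) complement

print(p_correct, p_correct_and_survive, round(p_die_given_correct, 5),
      independent, round(p_correct_given_die, 4), round(p_md_or_die, 4))
```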
2) OK. I ran out of ideas, so here is a mini-Jorcillator problem.
Here is the joint probability table and the table relating joint events to failure times.
Failure of the phillinx in period 1, 2, 3, 4 are events A1, A2, A3, and A4.
Failure of the flubberall in period 1, 2, 3, 4 are events B1, B2, B3, and B4.
Failure of the jorcillator in period 1, 2, 3, 4 are events C1, C2, C3, and C4.
a) Assuming that the Jorcillator lives as long as one of its components lives, fill in the
second table with failure times of 1, 2, 3 and 4. (1)

          A1     A2     A3     A4    Total
B1       .20    .12    .04    .04     .40
B2       .15    .09    .03    .03     .30
B3       .10    .06    .02    .02     .20
B4       .05    .03    .01    .01     .10
Total    .5     .3     .1     .1     1.0

Since whatever component lasts longest determines the result, an event like \( B_1 \cap A_4 \) will down it
in the 4th period. The table reads

          A1     A2     A3     A4
B1         1      2      3      4
B2         2      2      3      4
B3         3      3      3      4
B4         4      4      4      4

b) Find \( P(C_1) \) (1) .20.
This section is very simple logic. If you were unwilling to
figure it out, maybe you are allergic to logical thinking.
c) Find \( P(C_2) \) (1) .12 + .09 + .15 = .36
d) Find \( P(C_3) \) (1) .04 + .03 + .02 + .06 + .10 = .25
e) Find \( P(C_4) \) (1) .1 + .1 - .01 = .19 These add to 1.
f) Find \( P(C_3 \mid A_3) \) (1) \( P(C_3 \mid A_3) = \frac{P(C_3 \cap A_3)}{P(A_3)} = \frac{.04 + .03 + .02}{.1} = .9 \)
Incidentally \( P(A_3 \mid C_3) = \frac{P(C_3 \cap A_3)}{P(C_3)} = \frac{.04 + .03 + .02}{.25} = .36 \). So Bayes’ rule says
\( P(C_3 \mid A_3) = \frac{P(A_3 \mid C_3)\,P(C_3)}{P(A_3)} = \frac{.36(.25)}{.1} = .9 \)
[13 – 97.5]
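Because the jorcillator fails in the later of the two component failure periods, the distribution of C can be tabulated mechanically from the joint table. The following is a sketch under that reading of the problem; the layout (rows B1 to B4, columns A1 to A4) and all names are assumptions made for illustration.

```python
# Joint probabilities: rows are B1..B4, columns are A1..A4.
joint = [
    [.20, .12, .04, .04],
    [.15, .09, .03, .03],
    [.10, .06, .02, .02],
    [.05, .03, .01, .01],
]

# The device fails when its longest-lived component fails: period = max(a, b).
p_c = {period: 0.0 for period in range(1, 5)}
for b, row in enumerate(joint, start=1):
    for a, p in enumerate(row, start=1):
        p_c[max(a, b)] += p

# f) P(C3 | A3): restrict to column A3 and keep cells where max(3, b) == 3.
p_a3 = sum(row[2] for row in joint)
p_c3_and_a3 = sum(joint[b - 1][2] for b in range(1, 5) if max(3, b) == 3)
p_c3_given_a3 = p_c3_and_a3 / p_a3

# Reverse conditional and the Bayes' rule round trip.
p_a3_given_c3 = p_c3_and_a3 / p_c[3]
bayes_check = p_a3_given_c3 * p_c[3] / p_a3

print({k: round(v, 2) for k, v in p_c.items()},
      round(p_c3_given_a3, 2), round(p_a3_given_c3, 2))
```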