Practice Problems for Midterm #2
1. According to scientists, asteroids 500 meters in diameter or larger are expected to strike the
earth once every 10,000 years (on average). Fill in the blank: “Mathematically, there is a 10%
chance that the earth will be struck by a large asteroid within the next __________ years.”
Answer: $P(x \le x_0) = 0.1 = 1 - e^{-x_0/10{,}000} \Rightarrow x_0 = -10{,}000 \ln(0.9) = 1053.6$ years.
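The blank in problem 1 can be checked numerically. The sketch below is not part of the original answer key; it assumes, as the answer does, that the time to the next strike is exponential with a mean of 10,000 years, and the function name is illustrative only.

```python
import math

MEAN_YEARS_BETWEEN_STRIKES = 10_000  # from the problem statement


def exponential_quantile(prob, mean):
    """Return x0 such that P(x <= x0) = prob for an exponential with this mean."""
    # Solve prob = 1 - exp(-x0 / mean) for x0.
    return -mean * math.log(1.0 - prob)


x0 = exponential_quantile(0.10, MEAN_YEARS_BETWEEN_STRIKES)
print(f"{x0:.1f} years")  # 1053.6 years
```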
2. You have collected the following data from a population that is approximately normal in
distribution:
41.19 40.04 35.27 18.91 15.60
47.02 58.67 39.38 35.04 47.16
30.60 45.50 64.91 39.95 20.61
The sample has a mean of 38.66 and a standard deviation of 13.70. Suppose you plan to collect
one additional data point.
a) What is the probability that the sample point will be less than 13.452?
Answer: z13.452 = (13.452 - 38.66)/13.70 = -1.84. From the normal table, we see that
-1.84 corresponds to P(-1.84 < z < 0) = 0.4671. Therefore, P(x < 13.452) = 0.5 - 0.4671 =
0.0329.
b) Find the point x0 such that P(x > x0) =1.1%.
Answer: We want the upper tail to have a probability of 0.011. We must therefore find
0.5-0.011 = 0.489 in the table. This corresponds to a z of 2.29. Therefore x0 =
38.66 + 2.29 × 13.70 = 70.03.
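Both parts of problem 2 can also be evaluated without a printed table. This is a hedged sketch, assuming the additional data point is drawn from a normal distribution with the sample mean and standard deviation quoted above; it uses Python's standard-library NormalDist, so the result differs from the table answer only by rounding of z.

```python
from statistics import NormalDist

# Assumption: the new point follows N(38.66, 13.70^2), as in the answer above.
dist = NormalDist(mu=38.66, sigma=13.70)

p_below = dist.cdf(13.452)      # part a): P(x < 13.452)
x0 = dist.inv_cdf(1 - 0.011)    # part b): point with 1.1% in the upper tail

print(f"P(x < 13.452) = {p_below:.4f}")
print(f"x0 = {x0:.2f}")
```

Because inv_cdf uses the unrounded z value (about 2.2904 rather than 2.29), x0 comes out within a few hundredths of the table-based 70.03.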
3. Suppose there is a test for a disease which affects 1 person out of every 2000. The test
correctly identifies a sick person 99% of the time (i.e., 1% of the time, the test says that the sick
person is actually healthy). Unfortunately, the test returns false positives (a healthy person is
identified as being sick) 2% of the time. A friend tests positive for the disease. What is the
probability that your friend actually has the disease?
Answer: P(sick) = 1/2000 = 0.0005; P(failed test|sick) = 0.99; P(failed test|healthy) = 0.02.
Using Bayes’ Rule,
$$P(\text{sick} \mid \text{failed test}) = \frac{P(\text{sick})\,P(\text{failed test} \mid \text{sick})}{P(\text{sick})\,P(\text{failed test} \mid \text{sick}) + P(\text{healthy})\,P(\text{failed test} \mid \text{healthy})}$$
$$= \frac{0.0005 \times 0.99}{0.0005 \times 0.99 + 0.9995 \times 0.02} = 0.0242$$
4. Serious traffic accidents occur, on average, twice per weekday in a city. The city has
emergency services capable of handling four serious accidents in a day. When more than four
accidents occur, emergency services are requested from surrounding communities. What
percentage of weekdays will the city need help from surrounding communities?
Answer: We want P(x>4) = f(5)+f(6)+f(7)+f(8)+… and can use the Poisson distribution.
This is an infinite sum, so we simplify it by using the complement: P(x>4) = 1 - f(4) - f(3) - f(2) - f(1) - f(0).
$$f(4) = \frac{2^4 e^{-2}}{4!} = 0.0902; \quad f(3) = \frac{2^3 e^{-2}}{3!} = 0.1804; \quad f(2) = \frac{2^2 e^{-2}}{2!} = 0.2707;$$
$$f(1) = \frac{2^1 e^{-2}}{1!} = 0.2707; \quad f(0) = \frac{2^0 e^{-2}}{0!} = 0.1353$$
So, P(x>4) = 1-0.0902-0.1804-0.2707-0.2707-0.1353 = 0.0527 = 5.27%
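The complement calculation in problem 4 is easy to reproduce in code. The following is a minimal sketch, assuming accidents per weekday are Poisson with mean 2; the helper name poisson_pmf is illustrative and not from the original.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)


lam = 2        # average serious accidents per weekday
capacity = 4   # accidents the city can handle on its own

# P(x > 4) = 1 - [f(0) + f(1) + f(2) + f(3) + f(4)]
p_overflow = 1 - sum(poisson_pmf(k, lam) for k in range(capacity + 1))
print(f"{p_overflow:.4f}")  # 0.0527
```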
5. BRIEFLY evaluate the validity of the following statement: “Even if a variable is continuous
in nature, we cannot measure it in a continuous way. For example, we cannot measure things out
to an infinite number of decimal places. An implication of this is that continuous probability
distributions are only useful in theory and are not useful when applied to real-world situations.”
Answer: There are two strong arguments against this statement. First, many distributions have so many
discrete possibilities that they are nearly continuous (suppose we measure temperature to
three decimal places, for instance). In such cases, a continuous distribution is a reasonable
approximation. Second, the Central Limit Theorem says that the sample mean from any
distribution (even if it is discrete) will be approximately normal in distribution if n is large.
Thus a continuous distribution is the theoretical distribution for sample means. For both
these reasons, continuous distributions are of critical importance in real-world settings.
6. You are interested in selecting a simple random sample of 200 households from the phone
book. You know the number of entries in the book, but some of them are for businesses. A
colleague suggests the following method. You randomly select 200 numbers from a discrete
uniform distribution between 1 and the number of entries in the phone book. You then select the
entries corresponding to those numbers. If a selection happens to be a business, you simply
move to the next item in the phone book and include that in your sample. Does this
methodology meet the criteria for a simple random sample? BRIEFLY explain your answer.
Answer: Households immediately following businesses are more likely to be chosen than
those not following businesses. The strategy therefore does not meet the criteria.
7. Explain the difference between unbiasedness and efficiency. Which is more important in
choosing a point estimate? BRIEFLY justify your answer.
Answer: An unbiased estimate is one that is correct in expectation. An estimate is more
efficient than another estimate if its standard deviation is lower. Having an unbiased
estimate is usually considered more important than having the most efficient estimate.
8. You have collected the following data from an unknown population:
47 44 42  0  8  4  1 49  4  2
43 43 42  5 42 46  5  8 45  3
49  3  9 46  5 48 44  9  1 47
BRIEFLY explain how you would estimate the distribution of the sample mean. IN ONE
SENTENCE, justify your method. There is no need to do any calculations here.
Answer: The expected value of the sample mean would be the average of the numbers.
The standard deviation of the sample mean would be the sample standard deviation
divided by the square root of 30. The sample mean would be approximately normal in
distribution. This approach is justified by the Central Limit Theorem.
9. Suppose that you plan to flip a fair coin 1000 times and record the number of heads. What is
the probability that at least 520 heads will be recorded?
Answer: By the Central Limit Theorem, the sample proportion will be approximately normal in
distribution, with standard deviation
$$\sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.5(1-0.5)}{1000}} = 0.0158.$$
Then $z = \frac{0.52 - 0.5}{0.0158} = 1.27$. From the normal table, we see that z=1.27 corresponds to an area of 0.3980 between z=1.27 and z=0.
$P(\bar{p} > 0.52)$ = 0.5-0.3980 = 0.1020.
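For comparison, the normal approximation in problem 9 can be set against the exact binomial tail. This sketch is an addition, not the answer key's method: it computes the exact probability of at least 520 heads by summing binomial coefficients, and the normal approximation without rounding z to two decimals.

```python
import math
from statistics import NormalDist

n, p, threshold = 1000, 0.5, 520

# Normal approximation to the sample proportion, as in the answer above.
se = math.sqrt(p * (1 - p) / n)
approx = 1 - NormalDist(mu=p, sigma=se).cdf(threshold / n)

# Exact binomial tail. With p = 0.5 every outcome has probability 2**-n,
# so P(X >= 520) is a sum of binomial coefficients divided by 2**n.
exact = sum(math.comb(n, k) for k in range(threshold, n + 1)) / 2**n

print(f"normal approximation: {approx:.4f}")
print(f"exact binomial:       {exact:.4f}")
```

The exact value comes out somewhat higher than the approximation because the approximation above does not apply a continuity correction (that is, it uses 520 rather than 519.5 as the cutoff).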
10. A company purchases electronic switches from three different suppliers. 55% of the
switches come from supplier A, 30% from supplier B, and 15% from supplier C. Supplier A
switches are defective with probability 1.0%. Supplier B switches are defective with probability
2.0%. Supplier C switches are defective with probability 3.0%.
a) Assuming that the company manufactures a radio that uses exactly one of the switches,
what is the probability that a randomly chosen radio will have a defective switch?
Answer: P(def) = P(A)P(def|A) + P(B)P(def|B) + P(C)P(def|C)
= 0.55 × 0.01 + 0.30 × 0.02 + 0.15 × 0.03
= 0.016
b) Suppose that a radio is found to have a defective switch. Which supplier is most likely to
have provided the switch?
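Answer: Using Bayes’ Rule,
$$P(A \mid \text{def}) = \frac{P(A)\,P(\text{def} \mid A)}{P(A)\,P(\text{def} \mid A) + P(B)\,P(\text{def} \mid B) + P(C)\,P(\text{def} \mid C)} = \frac{0.55 \times 0.01}{0.55 \times 0.01 + 0.3 \times 0.02 + 0.15 \times 0.03} = 0.344$$
$$P(B \mid \text{def}) = \frac{P(B)\,P(\text{def} \mid B)}{P(A)\,P(\text{def} \mid A) + P(B)\,P(\text{def} \mid B) + P(C)\,P(\text{def} \mid C)} = \frac{0.3 \times 0.02}{0.55 \times 0.01 + 0.3 \times 0.02 + 0.15 \times 0.03} = 0.375$$
$$P(C \mid \text{def}) = \frac{P(C)\,P(\text{def} \mid C)}{P(A)\,P(\text{def} \mid A) + P(B)\,P(\text{def} \mid B) + P(C)\,P(\text{def} \mid C)} = \frac{0.15 \times 0.03}{0.55 \times 0.01 + 0.3 \times 0.02 + 0.15 \times 0.03} = 0.281$$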
Supplier B is the most likely to have supplied the switch.
c) What is the probability that the supplier in b) provided the switch?
Answer: See answer to b).
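The three Bayes' Rule calculations in problem 10 share one denominator, which makes them convenient to compute together. A minimal sketch follows, using only the supplier shares and defect rates given in the problem; the variable names are illustrative.

```python
# Supplier shares and defect rates from the problem statement.
priors = {"A": 0.55, "B": 0.30, "C": 0.15}
defect_rates = {"A": 0.01, "B": 0.02, "C": 0.03}

# Joint probabilities P(supplier and defective), then the total P(defective).
joint = {s: priors[s] * defect_rates[s] for s in priors}
p_defective = sum(joint.values())

# Posterior P(supplier | defective) for each supplier.
posterior = {s: joint[s] / p_defective for s in joint}
most_likely = max(posterior, key=posterior.get)

print(f"P(def) = {p_defective:.3f}")
for supplier, prob in posterior.items():
    print(f"P({supplier} | def) = {prob:.3f}")
print("most likely supplier:", most_likely)
```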
11. On average, 10 people go through a supermarket checkout line every hour. The probability
of someone entering the checkout line is the same for any two time intervals of equal length.
a) What is the probability that between 2 and 5 (inclusive) people will approach the checkout
line during a 30 minute period?
Answer: Using the Poisson distribution,
$$f(2)+f(3)+f(4)+f(5) = \frac{5^2 e^{-5}}{2!} + \frac{5^3 e^{-5}}{3!} + \frac{5^4 e^{-5}}{4!} + \frac{5^5 e^{-5}}{5!} = 0.576$$
b) What is the probability that no customers will approach the checkout line in the next 6
minutes?
Answer: Six minutes is one tenth of an hour, so the expected number of customers is 10 × 6/60 = 1.
$$f(0) = \frac{1^0 e^{-1}}{0!} = 0.368$$
12. You have conducted a study of spring term behavior and have found the following: 60% of
all students both attend class regularly and go to Goshen every week. Of the students who do not
go to Goshen every week, 85% attend class regularly. 75% of all students go to Goshen every
week. What percentage of students attend class regularly?
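Answer: $P(\text{attend} \cap \text{Goshen}) = 0.6$; $P(\text{attend} \mid \text{no Goshen}) = 0.85$; $P(\text{Goshen}) = 0.75$.
$$P(\text{attend}) = P(\text{attend} \cap \text{Goshen}) + P(\text{attend} \cap \text{no Goshen})$$
$$P(\text{attend} \cap \text{no Goshen}) = P(\text{attend} \mid \text{no Goshen})\,P(\text{no Goshen}) = 0.85 \times 0.25 = 0.2125$$
$$\Rightarrow P(\text{attend}) = 0.6 + 0.2125 = 0.8125.$$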
13. As part of a promotion, the supermarket randomly chooses customers who then receive a
20% discount on their purchases. The randomization process is such that the probability that a
given customer receives the discount is 0.12. What is the probability that more than 4 of the first
30 customers will receive the discount?
Answer: Using the Binomial distribution,
$$f(5)+f(6)+f(7)+\dots = 1 - f(0) - f(1) - f(2) - f(3) - f(4)$$
$$= 1 - \binom{30}{0}0.12^0\,0.88^{30} - \binom{30}{1}0.12^1\,0.88^{29} - \binom{30}{2}0.12^2\,0.88^{28} - \binom{30}{3}0.12^3\,0.88^{27} - \binom{30}{4}0.12^4\,0.88^{26}$$
$$= 0.288$$
14. You work in a building that has a notoriously slow elevator. On average, the elevator arrives
at a floor 3 minutes after the button is pressed. The probability that the elevator will arrive
during a given 5-minute interval is the same as for any other 5-minute interval.
a) What is the probability that you will have to wait longer than 5 minutes for the elevator to
arrive?
Answer: The exponential applies. $P(x > 5) = e^{-5/3} = 0.1889$.
b) What is the probability that you will have to wait for between 1 and 2 minutes?
Answer: $P(x > 2) = e^{-2/3} = 0.5134$ and $P(x > 1) = e^{-1/3} = 0.7165$, so
$P(1 < x < 2) = 0.7165 - 0.5134 = 0.2031$.
15. An oil well has produced an average of 100 barrels per day over the last 30 days. The
standard deviation of daily production is 10 barrels per day. The data appears to be normal in
distribution.
a) What is the probability that the well will produce at least 110 barrels tomorrow?
Answer: $z = \frac{110 - 100}{10} = 1$ gives an area of 0.3413 between z=1 and z=0. P(x>110) = 0.5 - 0.3413 = 0.1587.
b) What is the probability that the well will produce between 90 and 120 barrels tomorrow?
Answer: $z = \frac{90 - 100}{10} = -1$ and $z = \frac{120 - 100}{10} = 2$ give areas of 0.3413 and 0.4772. So
P(90<x<120) = 0.3413+0.4772 = 0.8185.
c) Suppose that a geologist originally told you that the well should produce an average of 105
barrels per day. Evaluate the validity of his assessment.
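Answer: The standard error of the 30-day average is $10/\sqrt{30} = 1.826$, so $z = \frac{100 - 105}{1.826} = -2.74$. The observed average of 100 lies 2.74 standard errors below 105, which would not necessarily be considered an outlier. It is far
enough away from the mean that we should be concerned, however.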
16. On average, it rains 55 days per year (assume 365 days per year). What is the probability
that it will rain 2 days or less over the next 7 days? Be sure to list any assumptions that you
make in answering the problem (i.e., What distribution did you use? What assumptions are
needed so that it is reasonable to use that distribution?).
Answer: P(rain) = 55/365 = 0.1507. Using the binomial distribution,
$$f(0) = \binom{7}{0} 0.1507^0\,0.8493^7 = 0.3187$$
$$f(1) = \binom{7}{1} 0.1507^1\,0.8493^6 = 0.3959$$
$$f(2) = \binom{7}{2} 0.1507^2\,0.8493^5 = 0.2107$$
f(0)+f(1)+f(2) = 0.9254
The binomial is a reasonable assumption when each day has a two-state outcome (rain or no rain),
the probability of rain is the same every day, and observations are independent.
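The binomial sum in problem 16 can be verified directly. This is a sketch under the same assumptions as the answer (seven independent days, each with rain probability 55/365); binomial_pmf is an illustrative helper, not something from the original.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


p_rain = 55 / 365   # assumed constant chance of rain on any given day
days = 7

terms = [binomial_pmf(k, days, p_rain) for k in range(3)]  # f(0), f(1), f(2)
p_at_most_two = sum(terms)

print([round(t, 4) for t in terms])
print(f"P(rain on 2 days or fewer) = {p_at_most_two:.4f}")
```

The same helper covers any "at most k" or "more than k" binomial question by changing the range of the sum or taking the complement.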
17. On average, the February temperature is 40 degrees with a standard deviation of 8 degrees.
What is the probability that the temperature will be between 36 and 50 degrees on a given day in
February? Be sure to list any assumptions that you make in answering the problem (i.e., What
distribution did you use? What assumptions are needed so that it is reasonable to use that
distribution?).
Answer: Using the normal distribution, we have z36 = (36-40)/8 = -0.5 and z50 = (50-40)/8 =
1.25. From the normal table, we see that z36 corresponds to an area of 0.1915 and z50 to an area of 0.3944. P(36<x<50) =
0.1915+0.3944 = 0.5859. In using the normal distribution, we assume that the distribution
of temperature is shaped, at least approximately, like the bell curve.
18. Consider the following data:
Day:            Monday  Tuesday  Wednesday  Thursday  Friday
Week 1 Sales      43      50        57         47       55
Week 2 Sales      37      58        56         44       42
Week 3 Sales      34      59        54         44       44
Week 4 Sales      32      54        59         57       49
Week 5 Sales      43      54        59         52       41
Week 6 Sales      31      54        51         58       59
Week 7 Sales      42      42        55         59       52
Week 8 Sales      49      45        54         47       42
Week 9 Sales      48      57        42         42       55
Week 10 Sales     33      45        41         54       41
Your boss asks you to calculate the average daily sales. How do you respond? What is the
probability of getting sales between 38 and 41 (inclusive) next Monday? Be sure to list any
assumptions that you make in answering the problem (i.e., What distribution did you use? What
assumptions are needed so that it is reasonable to use that distribution?).
Answer: The mean daily sales during the period is 48.44. This might be misleading because
Monday may be an outlier (it is the only day with any sales in the 30s and it is the only day without
sales in the 50s). The mean sales figures by day are
Day         Mean Sales
Monday      39.2
Tuesday     51.8
Wednesday   52.8
Thursday    50.4
Friday      48.0
One approach in addressing the potential outlier is to calculate the standard deviation of the five
daily means. In this case, s = 5.47. Monday is therefore (39.2-48.44)/5.47 = -1.69 standard deviations
away from the mean. For a small sample, this suggests (although it is far from conclusive) that
Monday may be an outlier. Coupling this with the intuition above (which basically says that the
ranges are vastly different) leads me to believe that Monday is quite likely an outlier. I would
therefore tell my boss that although the mean daily sales is 48.44, Monday averages 39.2 while the
rest of the week averages 50.75.
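The summary figures quoted in this answer can be reproduced from the table. The sketch below is an addition for checking purposes; it assumes the sales table is read column by column (one list per weekday) and uses sample standard deviations, which is what matches the 5.47 and 6.66 in the text.

```python
from statistics import mean, stdev

# Ten weeks of sales for each weekday, read down the columns of the table.
sales = {
    "Monday":    [43, 37, 34, 32, 43, 31, 42, 49, 48, 33],
    "Tuesday":   [50, 58, 59, 54, 54, 54, 42, 45, 57, 45],
    "Wednesday": [57, 56, 54, 59, 59, 51, 55, 54, 42, 41],
    "Thursday":  [47, 44, 44, 57, 52, 58, 59, 47, 42, 54],
    "Friday":    [55, 42, 44, 49, 41, 59, 52, 42, 55, 41],
}

day_means = {day: mean(values) for day, values in sales.items()}
all_sales = [x for values in sales.values() for x in values]
overall_mean = mean(all_sales)

# Spread of the five weekday means, and Monday's distance from the overall mean.
sd_of_day_means = stdev(list(day_means.values()))
monday_z = (day_means["Monday"] - overall_mean) / sd_of_day_means

rest_of_week = mean([x for day, v in sales.items() if day != "Monday" for x in v])
monday_sd = stdev(sales["Monday"])

print(day_means)
print(round(overall_mean, 2), round(sd_of_day_means, 2), round(monday_z, 2))
print(round(rest_of_week, 2), round(monday_sd, 2))
```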
There are several reasonable approaches to estimating the probability of getting between 38 and
41 sales next Monday. Treating Monday as an outlier, we must use Monday’s mean and standard
deviation (6.66) in the analysis. The data is clearly discrete, but we may choose to use the normal
distribution as an approximation, extending the interval by half a unit on each side. z37.5 = (37.5-39.2)/6.66 = -0.255 and z41.5 = (41.5-39.2)/6.66 =
0.345. From the normal table, z37.5 corresponds to an area of 0.10065 and z41.5 to an area of 0.13495 (in both cases, I interpolated
between the numbers). P(38 ≤ x ≤ 41) ≈ 0.10065 + 0.13495 = 0.2356.
An alternative approach is to assume that the data follows a discrete uniform distribution. This
seems reasonable when we observe that the numbers are spread pretty evenly between 30 and 50.
Given a 20 unit distribution and four outcomes of interest (38, 39, 40, and 41), we estimate
P(38 ≤ x ≤ 41) = 4/20 = 0.20.
The two estimates differ (about 0.24 versus 0.20) because the assumptions behind the discrete uniform and
normal distributions are so different. Ideally, we would gather as much data as possible to
determine what distribution is most appropriate. Note that we might also consider a
binomial distribution if we can get data on the number of sales attempted, etc.
19. Your company recently manufactured a lot of 1000 units. Those units were tested and 8
were found to be defective. Unfortunately, the defective units were subsequently mixed in with
the good units. Your task is to separate the good units from the bad. What is the probability that
you will have to test 999 units to separate the units? (Hint: once you test 999 units,
you know whether the last one is good or not….if you have 7 bad units out of 999, you know that
last one is bad….if you have 8 bad units out of 999, you know the last one is good).
Answer: You would have to test 999 only if there is one defective unit left after you have tested the
first 998 units (if you find 6 bad units out of the first 998 tested, you would stop knowing that the
last two were bad). So, the probability of having to test 999 units is equal to the probability that
exactly 1 unit is bad out of the last 2. Here, p = 8/1000 = 0.008.
$$f(1) = \binom{2}{1}(0.008)(1 - 0.008) = 0.01587.$$
So, there is a 1.587% chance that you would have to test
the 999 units.
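The binomial calculation above treats the last two units as independent draws. Strictly, the units are sampled without replacement, so the exact count of defectives among the last two is hypergeometric. The sketch below is an addition beyond the answer key and compares the two; the difference is small because the lot is large.

```python
import math

total, defective = 1000, 8
good = total - defective

# Binomial approximation used in the answer: two independent draws with p = 8/1000.
p = defective / total
binomial_approx = math.comb(2, 1) * p * (1 - p)

# Exact hypergeometric probability that exactly one of the two untested
# units is defective (the two units are a random pair from the lot).
exact = math.comb(defective, 1) * math.comb(good, 1) / math.comb(total, 2)

print(f"binomial approximation: {binomial_approx:.5f}")
print(f"exact (hypergeometric): {exact:.5f}")
```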
20. A bank screens credit applicants based on three factors: current debt, income, and prior
payment history. 40% of all applicants are rejected. 15% of applicants fail the debt test. 20% of
applicants fail the income test. 5% of applicants fail the payment history test. You know that a
certain customer applied and was rejected. What is the probability that the customer was
rejected due to low income? Comment on your ability to answer the question if 30% of all
applicants are rejected (and the other numbers are the same).
Answer: We want P(low income|rejection). We know from Venn diagrams and the derivation of
Bayes’ Rule that
$$P(\text{income} \mid \text{rejection}) = \frac{P(\text{income} \cap \text{rejection})}{P(\text{income} \cap \text{rejection}) + P(\text{debt} \cap \text{rejection}) + P(\text{history} \cap \text{rejection})} = \frac{0.20}{0.20 + 0.15 + 0.05} = 0.5$$
So there is a 50% chance that the applicant failed due to low income. We can use this
approach because the three failure rates add up exactly to the 40% rejection rate, so no
applicant failed more than one test (the failure events are mutually exclusive). If 30% of all
applicants are rejected, then
there must be some applicants that were rejected for multiple reasons. To answer the
question, we must clarify whether we are interested in “rejected due only to low income” or
“rejected due to low income and/or other reasons”. If the former, the answer is 0.20/0.30 =
66.67%. If the latter, we cannot answer without additional information.
21. Suppose that telemarketing sales are dependent on two factors: weather (when it’s raining,
more people are home) and time of day (if you call during prime time, people are less likely to
answer the phone). Those factors are independent. It rains with probability 0.1 and prime time
constitutes 40% of the normal calling hours. A telemarketer can make 10 calls per hour. The net
profit (including everything except telemarketer wages) per successful call is $9 and the
probabilities of success on a given call are as follows.
                 Raining   Not Raining
Prime Time        0.25        0.15
Not Prime Time    0.3         0.2
Telemarketers charge $15 per hour. What is the expected profit per hour of calling? Should you
implement a restricted calling plan? If so, what would you recommend? What is the expected
profit per hour of calling under the new plan?
Answer: The probabilities for the possible scenarios are
                 Raining             Not Raining
Prime Time       0.1 × 0.4 = 0.04    0.9 × 0.4 = 0.36
Not Prime Time   0.1 × 0.6 = 0.06    0.9 × 0.6 = 0.54
P(success on a randomly chosen call) = P(prime time & raining) × P(success | prime time & raining)
+ P(not prime time & raining) × P(success | not prime time & raining)
+ P(prime time & not raining) × P(success | prime time & not raining)
+ P(not prime time & not raining) × P(success | not prime time & not raining)
= 0.04 × 0.25 + 0.06 × 0.3 + 0.36 × 0.15 + 0.54 × 0.2
= 0.19
The expected profit per hour of calling is then 10 × 0.19 × $9 - $15 = $2.10. On average, 1.9 calls per
hour are successful, giving a net profit of $17.10 less the $15 paid to the telemarketer.
The lowest probabilities of success occur during prime time, so we might consider not making calls
during prime time. To answer this, we consider whether prime time calling is profitable or not.
P(success on a randomly chosen call during prime time) =
P(raining) × P(success | raining)
+ P(not raining) × P(success | not raining)
= 0.1 × 0.25 + 0.9 × 0.15 = 0.16
The expected profit per hour of calling during prime time is then 10 × 0.16 × $9 - $15 = -$0.60. So, we
should not make calls during prime time.
P(success on a randomly chosen call not during prime time) = 0.1 × 0.3 + 0.9 × 0.2 = 0.21.
Expected profit under the new plan = 10 × 0.21 × $9 - $15 = $3.90.
One might also consider not calling unless it is raining, but that would be difficult to implement. It
also might result in low morale because the employees would be on a very uncertain work schedule.
I therefore chose not to consider that possibility.
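The expected-profit comparison in problem 21 can be laid out as a short calculation. This is a minimal sketch using only the figures given in the problem; the labels "prime", "off", "rain", and "dry" and the function names are illustrative shorthand, not from the original.

```python
CALLS_PER_HOUR = 10
PROFIT_PER_SALE = 9.0   # dollars, excluding telemarketer wages
WAGE_PER_HOUR = 15.0    # dollars
P_RAIN = 0.1
P_PRIME = 0.4

# Success probabilities from the table: (time slot, weather) -> P(success).
p_success = {
    ("prime", "rain"): 0.25, ("prime", "dry"): 0.15,
    ("off", "rain"): 0.30,   ("off", "dry"): 0.20,
}


def success_given_slot(slot):
    """P(success) in a time slot, averaging over the weather."""
    return P_RAIN * p_success[(slot, "rain")] + (1 - P_RAIN) * p_success[(slot, "dry")]


def hourly_profit(p):
    """Expected profit per hour when each call succeeds with probability p."""
    return CALLS_PER_HOUR * p * PROFIT_PER_SALE - WAGE_PER_HOUR


p_prime = success_given_slot("prime")
p_off = success_given_slot("off")
p_any = P_PRIME * p_prime + (1 - P_PRIME) * p_off

print(f"all hours:       {hourly_profit(p_any):.2f}")
print(f"prime time only: {hourly_profit(p_prime):.2f}")
print(f"non-prime only:  {hourly_profit(p_off):.2f}")
```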
22. Briefly discuss the importance of the Central Limit Theorem.
Answer: The CLT is the cornerstone of most of statistics. It is important because the distribution
of the average of any sample is approximately normal in distribution for large n. Even if we have
no idea what the underlying distribution is, we can test certain hypotheses because of the CLT.
23. A city is investigating traffic patterns and observes that, on average, 20 cars go through an
intersection per hour.
a) What is the probability that no car will go through during the next five minutes? Answer
that question twice, using a different distribution each time. Be sure to list any assumptions
that you make in answering the problem (i.e., What distribution did you use? What
assumptions are needed so that it is reasonable to use that distribution?).
Answer: The average # of cars during a 5 minute period is 20 × 5/60 = 1.6667. Using the Poisson
distribution, $f(0) = \frac{1.6667^0 e^{-1.6667}}{0!} = 0.1889$. The average time between cars is 60/20 = 3
minutes. Using the exponential distribution, $P(x > 5) = e^{-5/3} = 0.1889$.
b) What is the probability that no more than three cars will go through the intersection during
the next ten minutes?
Answer: Average # of cars during a 10 minute interval = 3.33333. Using the Poisson
distribution, f(0)+f(1)+f(2)+f(3) = 0.0357+0.1189+0.1982+0.2202 = 0.573.
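Both parts of problem 23 can be checked with a few lines of code, which also shows why the Poisson and exponential answers in part a) agree. The sketch assumes cars arrive at a constant average rate of 20 per hour, independently of one another; the helper name is illustrative.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)


rate_per_minute = 20 / 60  # 20 cars per hour

# Part a), Poisson: zero cars in 5 minutes.
p_none_poisson = poisson_pmf(0, rate_per_minute * 5)

# Part a), exponential: gap until the next car exceeds 5 minutes (mean gap 3 minutes).
mean_gap = 1 / rate_per_minute
p_none_exponential = math.exp(-5 / mean_gap)

# Part b): at most three cars in 10 minutes.
p_at_most_three = sum(poisson_pmf(k, rate_per_minute * 10) for k in range(4))

print(f"{p_none_poisson:.4f} {p_none_exponential:.4f} {p_at_most_three:.3f}")
```

The two part a) values coincide because "no arrivals in 5 minutes" and "waiting more than 5 minutes for the next arrival" describe the same event.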
24. Briefly explain how you would choose a random sample of 10 items from a group of 100.
Answer: Assign a random number to each of the 100 items (using the Excel RAND() function, for
example) and then select the ten with the lowest random numbers.
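The random-key method described in this answer can be sketched in Python rather than Excel. This is an illustration only: the items are assumed to be labeled 1 through 100, and the function name is hypothetical.

```python
import random


def sample_by_random_keys(items, k):
    """Attach a random number to every item and keep the k smallest."""
    keyed = [(random.random(), item) for item in items]
    keyed.sort(key=lambda pair: pair[0])
    return [item for _, item in keyed[:k]]


items = list(range(1, 101))            # the group of 100, labeled 1..100
chosen = sample_by_random_keys(items, 10)
print(sorted(chosen))
```

In practice, Python's built-in random.sample(items, 10) draws a simple random sample without replacement directly; the version above just mirrors the steps in the answer.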
25. Conceptually, why is the normal distribution a reasonable approximation to the binomial
under some circumstances?
Answer: Consider an experiment in which we flip a coin 100 times and record the number
of heads. We then repeat that experiment many times and produce a histogram of the
results. Intuitively, we believe that the histogram would show a peak around 50, with
symmetric, declining tails on both sides. I.e., the histogram would have a bell-shaped
curve. Mathematically, it is a reasonable approximation because of the Central Limit
Theorem. We could rescale the histogram to be the sample mean with a head=1 and a
tail=0. By the CLT, we know that the average would be approximately normal in
distribution.
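The coin-flip picture in this answer can be checked without simulation by comparing the exact binomial probabilities for 100 flips with the matching normal curve. The sketch below is an illustrative addition; it uses the normal density with the same mean (50) and standard deviation (5) as the binomial.

```python
import math
from statistics import NormalDist

n, p = 100, 0.5
normal = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))


def binomial_pmf(k):
    """Exact probability of k heads in n flips."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# Largest difference between the exact probability of k heads and the
# normal density evaluated at k, over every possible number of heads.
max_gap = max(abs(binomial_pmf(k) - normal.pdf(k)) for k in range(n + 1))

print(f"P(50 heads) exact:  {binomial_pmf(50):.4f}")
print(f"normal density at 50: {normal.pdf(50):.4f}")
print(f"largest gap: {max_gap:.5f}")
```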
26. Briefly explain why the normal distribution might be considered the most important of all
distributions.
Answer: From the CLT, we know that the sample mean of any distribution is approximately
normal in distribution when n is large. The normal distribution therefore appears naturally
in nearly every scenario. It is consequently the most used and most important distribution.
27. What is the standard error of the mean and why is it an important concept?
Answer: The standard error of the mean is the standard deviation of the sample mean. It is
important because we can use it in conjunction with the sample mean to test hypotheses about
population parameters.