The Binomial Distribution

The Uniform and Normal Distribution
When a random variable X can take on any value in an interval, we call it a continuous random
variable and describe its outcomes using a probability density function. Instead of talking about the
probability that X takes on a specific value P(X=x) as we do in the discrete case, we talk about the
probability that X falls in an interval such as [c, d] and write it as $P(c \le X \le d)$. The probability
that X takes on a specific value in the continuous case is 0. The probability that X falls in an
interval [c, d] is determined by calculating the area under the density function that is bounded by c,
d, and the x-axis. In the continuous case, probabilities correspond to areas.
Uniform Distribution
The uniform distribution is possibly the simplest example of a continuous random variable. Its
probability density function (“density” for short) is given by
$$f(x) = \begin{cases} \dfrac{1}{b-a} & \text{for } a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$
The graph of this density is shown below:
[Figure: Uniform Density. A horizontal line at height 1/(b - a) over the interval from a to b.]
Notice that the height of the elevated horizontal line is precisely $\frac{1}{b-a}$, so that the total area under
the curve is 1.
The mean of a uniform random variable is $E(X) = \frac{a+b}{2}$. The variance is $\mathrm{Var}(X) = \frac{(b-a)^2}{12}$.
Example. Suppose U follows a uniform distribution on [3,8] (usually written U[3,8]). (a) What is
the probability that U falls between 4 and 6? (b) What is the probability that U falls between 0 and
5?
(a) [Figure: the U[3,8] density at height 1/5 over the interval from 3 to 8, with the AREA between 4 and 6 shaded.]
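For readers who want to check such areas numerically, here is a minimal Python sketch (Python is not used elsewhere in these notes, and the function names `uniform_prob`, `uniform_mean`, and `uniform_var` are made up for illustration). It treats the probability as the length of the overlap between [c, d] and [a, b], divided by b - a.

```python
def uniform_prob(a, b, c, d):
    """P(c <= U <= d) for U uniform on [a, b]: overlap length times height 1/(b - a)."""
    lo, hi = max(a, c), min(b, d)          # part of [c, d] that lies inside [a, b]
    return max(0.0, hi - lo) / (b - a)


def uniform_mean(a, b):
    """E(U) = (a + b) / 2."""
    return (a + b) / 2


def uniform_var(a, b):
    """Var(U) = (b - a)^2 / 12."""
    return (b - a) ** 2 / 12


p_a = uniform_prob(3, 8, 4, 6)   # part (a): area between 4 and 6
p_b = uniform_prob(3, 8, 0, 5)   # part (b): only the stretch from 3 to 5 has density
print(p_a, p_b, uniform_mean(3, 8), uniform_var(3, 8))
```

Both parts come out to 2/5. In part (b) only the stretch from 3 to 5 contributes, since U cannot fall below 3.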
The Normal Distribution
The Normal distribution is the most important continuous distribution in all of statistics. The “bell
curve,” as it is frequently referred to, has a density function given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} \quad \text{for } -\infty < x < \infty,$$
where $\mu$ and $\sigma$ are parameters ($\sigma > 0$) which affect the shape of the distribution. The choice of
Greek letters is prophetic. If X is a random variable whose density function is
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$, then X has a normal distribution with mean $E(X) = \mu$ and variance
$\mathrm{Var}(X) = \sigma^2$ (equivalently, the standard deviation of X is $\sigma$).
Notation: A normal random variable X with mean $\mu$ and variance $\sigma^2$ is described using the
shorthand notation $X \sim N(\mu, \sigma^2)$. The tilde (~) is read “is distributed as.”
Calculations with the Normal Distribution in Excel
Example. Suppose a Book Rep informs you that his/her publisher can provide you with a high
quality customized statistics textbook for your MBA students, who just happen to be attending an
elite, private Business School in North Texas whose name rhymes with the word “Sox.” However,
as the Rep acknowledges, this typically requires 6-8 weeks of production time after it is ordered.
Suppose the production time is normally distributed with a mean ($\mu$) of 7 weeks and a standard
deviation ($\sigma$) of 2 weeks.
(a) What is the probability that the production time really takes between 6 and 8 weeks?
(b) What is the probability that the production of the customized book takes more than 11 weeks?
Solution: Draw a picture! For part (a), we want the probability (read area) under the density
between 6 and 8. Be sure to shade in the desired area in your diagram.
Excel provides the cumulative distribution function for any normal distribution. This function
calculates the area to the left of a point x for a normal distribution with mean ($\mu$) and standard
deviation ($\sigma$). The function is NORMDIST(x, mean, standard deviation, true). For example,
the total area to the left of the point 6 in our book production example is =NORMDIST(6,7,2,true),
which equals .308538.
The area to the left of 8 in the same distribution is
=NORMDIST(8,7,2,true), which equals .691462. The area between 6 and 8 is therefore .691462 –
.308538 = .382925.
For part (b), draw another picture and shade the appropriate area.
(Answer: 1-NORMDIST(11,7,2,True) = .02275)
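The same two areas can be reproduced outside of Excel. The sketch below uses `NormalDist` from Python's standard `statistics` module as a stand-in for NORMDIST(x, mean, standard deviation, true); the variable names are illustrative only.

```python
from statistics import NormalDist

# Production time: normal with mean 7 weeks and standard deviation 2 weeks.
production = NormalDist(mu=7, sigma=2)

# Part (a): area between 6 and 8 = (area left of 8) - (area left of 6).
p_between_6_and_8 = production.cdf(8) - production.cdf(6)

# Part (b): area to the right of 11 = 1 - (area left of 11).
p_more_than_11 = 1 - production.cdf(11)

print(round(p_between_6_and_8, 6))   # about .382925, as in the text
print(round(p_more_than_11, 5))      # about .02275, as in the text
```

Note that `cdf` always returns the area to the left, exactly like NORMDIST with the last argument set to true, so right-tail and in-between areas are built by subtraction.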
Example. Suppose you know going into negotiations that production time for a customized STAT
book (used at an elite, private B-School in ……..) is normally distributed with a mean of 7 weeks
and a standard deviation of 2 weeks. You want to determine how much time to allocate for
production so that your book is done within the allotted time with a high probability, say 98%
(equivalently, a 2% chance that the book isn’t completed in the time allotted). How much time
should you allocate for production?
Solution: This problem does not ask for a probability, but rather states a probability (read area)
and asks for a value (time) that corresponds to this area. This is the inverse of the previous
problem, which started with values and asked for a probability. Still, it is important to draw a rough
picture and label all relevant information.
Excel provides the values that correspond to cumulative areas through the inverse cumulative distribution
function (NORMINV(area, mean, standard deviation)). This is the inverse function of
the cumulative distribution function. Given an area A ($0 < A < 1$), the function determines the value
x in the normal distribution such that the cumulative area to the left of x is A. In our example
above, we want to determine the value x in a normal distribution (with mean 7 and standard
deviation 2) such that 98% of the area is to the left of x. The desired x-value is
=NORMINV(.98,7,2), which equals 11.1075 weeks of production time.
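As a cross-check on the inverse calculation, here is a short Python sketch in which `NormalDist.inv_cdf` from the standard `statistics` module plays the role of NORMINV. The variable names are illustrative, and the round trip at the end is only there to show that the two functions undo one another.

```python
from statistics import NormalDist

production = NormalDist(mu=7, sigma=2)   # production time in weeks

# Value x with 98% of the area to its left (the NORMINV direction).
weeks_to_allocate = production.inv_cdf(0.98)

# Round trip: feeding x back into the cumulative function returns the area.
area_check = production.cdf(weeks_to_allocate)

print(round(weeks_to_allocate, 4))   # about 11.1075 weeks, as in the text
print(round(area_check, 4))
```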
Standardization
The case where $\mu = 0$ and $\sigma = 1$ is of special importance. It is called the standard normal and its
random variable is denoted by Z. Tables are available for Z (Appendix B, page 630).
One of the advantages of working with Normal random variables is that probability statements
regarding general normal random variables can be transformed into equivalent statements regarding
a standard normal. This is due to the following standardization rule.
Standardization Rule for Normal Random Variables
If $X \sim N(\mu, \sigma^2)$, then $\dfrac{X - \mu}{\sigma} \sim N(0, 1)$.
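To see the rule in action, the following sketch (an illustration using the book production numbers from earlier, with made-up variable names) computes the area to the left of 8 weeks two ways: directly from N(7, 4), and from the standard normal after converting 8 to a z-value.

```python
from statistics import NormalDist

mu, sigma = 7, 2          # production time parameters from the earlier example
x = 8                     # point of interest, in weeks

z = (x - mu) / sigma      # standardized value: (X - mu) / sigma

p_direct = NormalDist(mu, sigma).cdf(x)    # P(X <= 8) using N(7, 4)
p_standard = NormalDist(0, 1).cdf(z)       # P(Z <= 0.5) using N(0, 1)

print(z, p_direct, p_standard)
```

The two probabilities agree, which is why a single table for Z is enough to answer questions about any normal random variable.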
Sums of Normal Random Variables
In the previous book production example, we were only concerned with production time. In reality,
we probably need to be concerned with two things, the production time and the order/delivery time.
Example. Suppose the production time for the STAT book is as before (i.e., N(7,4)). The
order/delivery time (from a separate delivery company) follows a normal distribution with a mean
of 2 weeks and a standard deviation of 1 week (i.e., N(2,1)). What is the probability that the
textbook is at the campus bookstore within 12 weeks from the time it is ordered?
Solution. We’d like to combine the production time and the order/delivery time into a single
random variable representing the total time between placing the order and its arrival in the
bookstore. If we denote the random production time by PT (normally distributed with a mean of 7
and a standard deviation of 2) and the order/delivery time by ODT (normally distributed with a
mean of 2 and a standard deviation of 1), then the total time is PT+ODT. But what is the
distribution of PT+ODT? In general, this is a hard problem, but there is a simple case that occurs
quite frequently. It is the case where the two random variables are independent. In simple words,
independence means that the two outcomes are not related to one another in any way. This
condition is often simply assumed when it seems plausible. Assume PT and ODT are
independent (this makes sense in the context of our example because the delivery company is a
separate entity). Then we have the following rule, which will dramatically improve our ability to
obtain a solution.
Rule for Combining Independent Normal Random Variables
If $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$ are independent and a and b are any
two constants, then $aX + bY \sim N(a\mu_X + b\mu_Y,\; a^2\sigma_X^2 + b^2\sigma_Y^2)$.
We can now complete the problem. In our example, we apply the formula above with a=1, b=1.
PT+ODT has a normal distribution with mean 1×7+1×2=9, and variance $(1)^2 \times 4 + (1)^2 \times 1 = 5$. The
standard deviation of this distribution is therefore $\sqrt{5} = 2.236$ (approximately). The probability
that the book arrives in the bookstore within 12 weeks is =NORMDIST(12,9,sqrt(5),true), which is
equal to .9101 (approximately).
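The combination rule can be sketched in a few lines of Python. The helper name `combine` is hypothetical, and the code assumes independence, just as the rule does; it simply adds the scaled means and the scaled variances.

```python
from math import sqrt
from statistics import NormalDist


def combine(a, x, b, y):
    """Distribution of a*X + b*Y for INDEPENDENT normal X and Y."""
    mean = a * x.mean + b * y.mean
    variance = a ** 2 * x.variance + b ** 2 * y.variance   # variances add, not std devs
    return NormalDist(mean, sqrt(variance))


pt = NormalDist(7, 2)     # production time: mean 7, standard deviation 2
odt = NormalDist(2, 1)    # order/delivery time: mean 2, standard deviation 1

total = combine(1, pt, 1, odt)    # PT + ODT
p_within_12 = total.cdf(12)       # area to the left of 12 weeks

print(total.mean, round(total.stdev, 3), round(p_within_12, 4))
```

A common slip is to add the standard deviations (2 + 1 = 3) instead of the variances (4 + 1 = 5); the helper works with variances throughout to avoid that.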
An Application to Inventory Pooling. The monthly demand for a product at four outlet stores
follows a normal distribution with a mean of 100 and a variance of 625. Demands are independent
at the four stores. Unsatisfied demand is lost (i.e., you cannot backorder units). Each store is
allowed to place a single order at the end of the month. This order arrives right at the start of the
next month. There are no additional shipments per month.
(a) Suppose each outlet stocks with the goal of meeting monthly demand 90% of the time. How
much stock should each outlet have on hand at the start of each month? Calculate how much is
needed at the four stores collectively at the start of each month (this is the inventory held by the
entire chain at the start of a month).
(b) Now suppose the stores decide to pool (or share) their inventory from a common source. For
example, there could be a central warehouse that all four stores draw from as depicted below. How
much inventory is needed in the system’s central pool at the start of a month to satisfy customer
demand 90% of the time? [Here, observe that when the central pool is empty, all four stores are out
of stock simultaneously; compare this situation with that in (a)]. Is the total amount of inventory
held by the chain at the start of a month the same as in part (a)? Explain the difference, if any.
[Figure: Store 1, Store 2, Store 3, and Store 4 each drawing from a central Inventory Pool.]
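One way to set up the pooling calculation is sketched below. It rests on an assumption not spelled out in the problem: that "meeting demand 90% of the time" means stocking up to the 90th percentile of the relevant demand distribution. The variable names are illustrative, and the pooled demand is obtained from the combination rule for independent normals (means add, variances add).

```python
from math import sqrt
from statistics import NormalDist

stores = 4
mean_demand, var_demand = 100, 625      # monthly demand at one store
service_level = 0.90                    # assumed to mean the 90th percentile

# (a) Each store stocks to the 90th percentile of its own demand.
one_store = NormalDist(mean_demand, sqrt(var_demand))
stock_per_store = one_store.inv_cdf(service_level)
stock_separate_total = stores * stock_per_store

# (b) Pooled demand is the sum of four independent normals.
pooled = NormalDist(stores * mean_demand, sqrt(stores * var_demand))
stock_pooled = pooled.inv_cdf(service_level)

print(round(stock_per_store, 2), round(stock_separate_total, 2), round(stock_pooled, 2))
```

Under this reading, the pooled system needs less inventory than the four stores combined, because the standard deviation of total demand (50) is smaller than the sum of the four individual standard deviations (100).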
Sampling Distributions: Estimating a Population Mean
Until now, you have been given privileged information about a random variable. You have been
given its distribution, which has enabled you to know the “true” mean and “true” variance with
precision, even if it involved a minor calculation. In practice, you will not have such detailed
information regarding the distribution of a random variable.
Because the mean (expected value) of a random variable is one of the most important measures in
all of statistics, we need a way of estimating its value in the real world. This involves the intuitive
concepts of sampling and estimation, which are best motivated by a real example.
Example (From: Jon Danklefs, SMU MBA Class 45D, KIA Motors). In March of 2000, a
hailstorm ripped through North Texas and did serious damage to a number of homes and
businesses. KIA Motors’ Midlothian distribution center was particularly hard hit. Nearly 5000
exposed vehicles were hail-damaged.
The distributor initially authorized a local hail-dent remover to repair up to 200 of the vehicles.
Each car was fixed panel by panel using a “paintless” dent removal method, and then a detailed
invoice was prepared to document the cost of the repairs. After finishing approximately 180
vehicles to the distributor’s satisfaction, it was mutually agreed that the remaining cars should be
repaired using a flat rate per car.
Jon Danklefs was largely responsible for negotiating the appropriate flat rate. He already had a
sample of 180 cars whose repair costs were known. The costs for these 180 vehicles are listed in
the file dents.xls.
Using this information, come up with a reasonable estimate of the expected cost per vehicle. How
does your value compare with the true expected cost per vehicle? What are the risks (to both sides)
if the current invoice method is continued instead of using a flat rate?
A Solution
Our proposed solution method first involves drawing a random sample, which is a set of
observations drawn in such a manner that on each draw the remaining observations are all equally
likely to be selected. Assuming the cars were drawn this way (by Jon Danklefs), we have a random
sample “without replacement” from a finite population.[1] If the observations are replaced after each
draw, then we have a random sample with replacement. When the sample is small compared to the
population (less than 5%), this distinction is not critical. One may think of the population as nearly
infinite, which makes the replacement issue (with or without) unimportant.[2] This simplifies the
formulas we use (more complicated formulas are needed for the case of small finite populations
where items are drawn without replacement). Procedures for drawing random samples from both
finite and infinite populations are discussed in section 7.2 of your book. As is often the case in the
real world, our data was collected without our involvement. We will assume it constitutes a random
sample from an infinite population. This will be our standard assumption throughout the course.
[1] Some authors reserve the term simple random sampling for precisely this situation. Your book does not.
[2] One can make any finite population infinite by drawing with replacement. The term infinite refers to the number of
observations of the random variable, not the number of distinct outcomes of the random variable.
The second part of our solution involves selecting a formula or rule to convert the observed sample
values into an appropriate estimate. With a little thought, I suspect you’d come up with the
following estimate of the expected value:
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_{178} + x_{179} + x_{180}}{180} = \$215.66.$$
The value $\bar{x}$ is called the sample mean, and it is a sample estimate of the true expected cost or true
mean cost, $\mu_{\text{cost}}$.
As a practical matter, the KIA distributor might refuse to accept a fixed price higher than $215.66
per car. But the true mean cost $\mu_{\text{cost}}$ could actually be higher than $215.66, in which case $215.66
might be a good deal. The unfortunate truth is that we will never know with 100% accuracy how
close $215.66 is to $\mu_{\text{cost}}$ since the true cost of repairs on the remaining vehicles will not be
documented with invoices. But we can get a probabilistic idea of how close we are.
To understand how this is done, imagine we were to draw another random sample (now with
replacement) of size n=180 and compute another sample estimate of the true mean repair cost.
Would this second estimate likely be $215.66? What if we took a third random sample of size
n=180? In general, we can think of our original estimate $\bar{x} = \$215.66$ as a single draw from a
distribution of sample averages. In the language of statistics, we are interested in the distribution of
the estimator
$$\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_{178} + X_{179} + X_{180}}{180}$$
(remember, capital X’s stand for
random variables whose values have not yet been realized). We think of the estimator as a rule, and
the number we get from a particular sample as an estimate. Characterizing the distribution of the
estimator $\bar{X}$ (called its sampling distribution) is a critical step in understanding how close (in a
probabilistic sense) our computed sample mean $\bar{x} = \$215.66$ is to the true mean cost $\mu_{\text{cost}}$.
To get there, we need two theoretical facts about the distribution of our estimator $\bar{X}$.
Fact 1: The expected value of $\bar{X}$ satisfies $E(\bar{X}) = \mu_{\text{cost}}$. In simple terms, this means the
estimator’s theoretical average is “dead on” the value it is intended to estimate. This can be
shown using the rules for expectation given in an earlier lecture. Generally speaking,
estimators that have this property are said to be unbiased. Unfortunately, this is like
knowing that a manufacturing machine makes parts that are “correct on average” without
knowing what that average actually is.
Fact 2: The distribution of $\bar{X}$ is approximately Normal with a mean of $\mu_{\text{cost}}$ and a variance
of $\sigma^2 / n$, where $\sigma^2$ is the variance of the population we are sampling from. This is not at
all obvious and is a consequence of the Central Limit Theorem (CLT).
Fact 2 deserves some discussion, which will be supplemented by an Excel demonstration. What
makes it such a profound result is that it doesn’t depend on the distribution of the underlying
population we are sampling from.
An Excel Simulation/Demonstration of the Central Limit Theorem
In this demonstration, we use a random variable that is uniformly distributed on [100,300], denoted
by U[100, 300]. Consequently $\mu = \$200$ and $\sigma^2 = 3333.333$ (you can calculate these from the
formulas given for the uniform distribution earlier). I selected this distribution for two convenient
reasons: (1) it is clearly not normal; (2) we talked about it at the start of the lecture (so you know a
bit about it). Of course, since we know the distribution, we wouldn’t need to estimate $\mu$ because
we could actually compute it. However, I am using this distribution simply to demonstrate what the
Central Limit Theorem tells us about the distribution of our estimator $\bar{X}$.
Suppose we draw a random sample of 100 values from U[100, 300] and compute the sample mean.
We can actually do this using the computer (I will show you how). Now suppose we repeat this
step over and over. We would never actually do this in practice; we are only doing it here to prove
a point about the distribution of $\bar{X}$. If we take enough samples, we’ll get a pretty good idea of the
distribution of $\bar{X}$ for the case where the sample size is n = 100. I generated 10000 samples (each of
size n = 100) for our in-class example. Here’s a really geeky summarization:
Excel Experiment
Do 888 Sample = 1, 10,000
    Draw a Sample of n=100; Calculate the Sample Mean; Store the Mean.
888 End
Construct a histogram of the 10,000 sample means.
The number of samples (10,000) is arbitrary; I chose it because I figured it would give us a really
nice picture (meaning histogram) for the distribution of $\bar{X}$. The n stated in the CLT is the sample
size, n=100, not the number of samples. Observe that the distribution of $\bar{X}$ is quite different from
the underlying population (U[100,300]). It is approximately bell-shaped. Evidently, adding values
together and taking their average has this effect. The histogram gets more bell-shaped if we take a
larger sample size (a bigger n than 100).
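The in-class demonstration was run in Excel. For anyone who would like to repeat the experiment elsewhere, here is a rough Python equivalent. It is a sketch, not the original spreadsheet: the function name `simulate_sample_means` and the fixed random seed are assumptions, and instead of drawing a histogram it reports summary numbers that a bell-shaped histogram centered at 200 should produce.

```python
import random
from math import sqrt


def simulate_sample_means(num_samples=10_000, n=100, low=100.0, high=300.0, seed=0):
    """Draw num_samples samples of size n from U[low, high]; return the sample means."""
    rng = random.Random(seed)           # fixed seed so the run is repeatable
    means = []
    for _ in range(num_samples):
        total = sum(rng.uniform(low, high) for _ in range(n))
        means.append(total / n)         # store this sample's mean
    return means


sample_means = simulate_sample_means()

# Center and spread of the 10,000 sample means.
mean_of_means = sum(sample_means) / len(sample_means)
var_of_means = sum((m - mean_of_means) ** 2 for m in sample_means) / len(sample_means)

# Theory: mean 200 and variance 3333.333 / 100 for the sample mean.
sd_theory = sqrt(3333.333 / 100)
share_within_2sd = sum(abs(m - 200) <= 2 * sd_theory for m in sample_means) / len(sample_means)

print(mean_of_means, var_of_means, share_within_2sd)
```

If the sample means are approximately normal, roughly 95% of them should land within two standard deviations of 200, which is what `share_within_2sd` measures.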
The Central Limit Theorem (CLT)
Let $(X_1, X_2, \ldots, X_n)$ be a random sample from any infinite population with mean $\mu$ and variance
$\sigma^2$. As n becomes large, the distribution of $\bar{X} = \dfrac{X_1 + X_2 + \cdots + X_n}{n}$ is approximately Normal
with mean $\mu$ and variance $\dfrac{\sigma^2}{n}$ (in fancy notation, $\bar{X} \overset{\text{Approx.}}{\sim} N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$).
Recall that this rule does not require the underlying population distribution to have any particular
form. The underlying population above was uniform, which is relatively nondescript. However, if
the underlying distribution is normal to begin with, the distribution of $\bar{X}$ is exactly normal (for all
n), and we can dispense with the word “approximately.” This follows directly from the
combination rule for independent normal random variables. How big does n need to be for this
approximation to work? In this class, we will agree that n = 100 is big enough (although many
people say n = 30 is big enough). Nevertheless, the bigger the sample size, the better the
approximation.
Notice that the variance of $\bar{X}$ is shrinking as n becomes large in the statement of the Central Limit
Theorem above. Another way of stating the theorem that avoids this is to convert $\bar{X}$ to a standard
normal. The equivalent statement becomes
$$\frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \overset{\text{Approx.}}{\sim} N(0, 1) \quad \text{(for large n)}.$$
This is often a more convenient form to work with. The Central Limit Theorem does have direct
applications. Consider the following example.
Application: Suppose you run an insurance company that processes claims at 100 regional offices
around the country. Weekly claims at the regional offices are independent of one another and
follow a distribution with $\mu = 1000$ and $\sigma^2 = 22500$. What is the probability that the company
experiences more than 103,000 claims nationwide in a given week? (Note: This is only an average
of 1030 claims per regional office. Does exceeding 1030 at a regional office seem unusual given
$\sigma = 150$?).
Solution. Turn this into an equivalent statement involving average claims and then apply the CLT.
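A sketch of the calculation the solution hint points to, written in Python with illustrative variable names. It relies on the CLT approximation (n = 100 offices) rather than on any assumption about the shape of the claims distribution at a single office.

```python
from math import sqrt
from statistics import NormalDist

offices = 100
mu, variance = 1000, 22500            # weekly claims at one office
total_threshold = 103_000

# Equivalent statement about the average: total > 103,000  <=>  average > 1030.
avg_threshold = total_threshold / offices

# CLT: the average is approximately normal with mean mu and variance sigma^2 / n.
sd_of_average = sqrt(variance / offices)          # 150 / 10 = 15

# Standardize and take the right-tail area under the standard normal.
z = (avg_threshold - mu) / sd_of_average
p_exceed = 1 - NormalDist().cdf(z)

print(avg_threshold, sd_of_average, z, round(p_exceed, 5))
```

An average of 1030 is only 0.2 standard deviations above the mean for a single office, but it is 2 standard deviations above the mean for the average of 100 offices, so the nationwide event is far less likely than the per-office figure suggests.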
Assignment #1 (Due Saturday, July 12th)
1. Book, 6.19
2. Book, 6.20
3. Book, 6.23
4. Book, 6.25
5. Suppose you run an insurance company that processes claims at 25 regional offices around the
country. Weekly claims at the regional offices are independent of one another and follow a normal
distribution with $\mu = 100$ and $\sigma^2 = 400$.
(a) What is the probability that a single regional office experiences more than 110 claims in a given
week?
(b) What is the probability that the company experiences more than 2750 claims nationwide (an
average of 110 per office at all 25 offices) in a given week?
(c) What is the probability that at least 15 of the 25 offices experience more than 110 claims in a
week?
Note: The problems on Assignment 1 do not require the Central Limit Theorem. Problems
involving the Central Limit Theorem appear at the beginning of Assignment 2.