PowerPoint: Familiar Discrete Distributions.

Some Common
Discrete Random Variables
Binomial Random Variables
Binomial experiment
• A sequence of n trials (called Bernoulli trials),
each of which results in either a “success” or a
“failure”.
• The trials are independent and so the probability
of success, p, remains the same for each trial.
• Define a random variable Y as the number of
successes observed during the n trials.
• What is the probability p(y), for y = 0, 1, …, n ?
• How many successes may we expect? E(Y) = ?
Returning Students
• Suppose the retention rate for a school indicates
the probability a freshman returns for their
sophomore year is 0.65. Among 12 randomly
selected freshmen, what is the probability exactly 8 of
them return to school next year?
Each student either returns or doesn’t.
Think of each selected student as a trial,
so n = 12.
If we consider “student returns” to be a
success, then p = 0.65.
12 trials, 8 successes
• To find the probability of this event, consider the
probability for just one sample point in the event.
• For example, the probability the first 8 students return
and the last 4 don’t.
• Since independent, we just multiply the probabilities:
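$P((S, S, S, S, S, S, S, S, F, F, F, F))$
$= P(R_1 \cap R_2 \cap \cdots \cap R_8 \cap \overline{R}_9 \cap \overline{R}_{10} \cap \overline{R}_{11} \cap \overline{R}_{12})$
$= P(R_1) P(R_2) \cdots P(R_8) P(\overline{R}_9) \cdots P(\overline{R}_{12})$
$= (0.65)^8 (1 - 0.65)^4$
where $R_i$ is the event that student i returns.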
12 trials, 8 successes
• For the probability of this event, we sum the
probabilities for each sample point in the event.
• How many sample points are in this event?
• How many ways can 8 successes and 4 failures occur?
$C^{12}_{8} C^{4}_{4}$, or simply $C^{12}_{8}$
• Each of these sample points has the same probability.
• Hence, summing these probabilities yields
P(8 successes in 12 trials)
$= C^{12}_{8} (0.65)^8 (0.35)^4 \approx 0.237$
Binomial Probability Function
• A random variable has a binomial distribution
with parameters n and p if its probability function
is given by
$p(y) = C^n_y \, p^y (1 - p)^{n - y}$, for y = 0, 1, …, n
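A minimal Python sketch of the binomial probability function, using only the standard library. The name `binomial_pmf` is an illustrative choice, not something from the slides. It is checked here against the returning-students example (n = 12, p = 0.65, y = 8).

```python
from math import comb


def binomial_pmf(y: int, n: int, p: float) -> float:
    """p(y) = C(n, y) * p^y * (1 - p)^(n - y), for y = 0, 1, ..., n."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


# Returning-students example: exactly 8 of 12 freshmen return, p = 0.65.
p_8_of_12 = binomial_pmf(8, 12, 0.65)
print(round(p_8_of_12, 3))  # 0.237
```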
Rats!
• In a research study, rats are injected with a drug.
The probability that a rat will die from the drug
before the experiment is over is 0.16.
Ten rats are injected with the drug.
What is the probability that at
least 8 will survive?
Would you be surprised if at
least 5 died during the
experiment?
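One way to approach both questions is to sum binomial probabilities over the relevant values. The sketch below assumes the rats respond independently, so the number of survivors is binomial with n = 10 and p = 0.84, and the number of deaths is binomial with n = 10 and p = 0.16. The helper name `binomial_pmf` is illustrative.

```python
from math import comb


def binomial_pmf(y, n, p):
    """Binomial probability of exactly y successes in n trials."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


n = 10
p_die = 0.16
p_survive = 1 - p_die

# P(at least 8 of the 10 rats survive): sum p(y) for y = 8, 9, 10.
p_at_least_8_survive = sum(binomial_pmf(y, n, p_survive) for y in range(8, n + 1))

# P(at least 5 of the 10 rats die): sum p(y) for y = 5, ..., 10.
p_at_least_5_die = sum(binomial_pmf(y, n, p_die) for y in range(5, n + 1))

print(f"{p_at_least_8_survive:.4f} {p_at_least_5_die:.4f}")
```

A very small value for the second probability would suggest that 5 or more deaths would indeed be surprising.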
Quality Control
• For parts machined by a particular lathe, on
average, 95% of the parts are within the
acceptable tolerance.
• If 20 parts are checked, what is the probability that
at least 18 are acceptable?
• If 20 parts are checked, what is the probability that
at most 18 are acceptable?
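A short sketch of the two lathe questions, assuming the 20 checked parts are independent trials with p = 0.95. "At most 18" is computed as the complement of "19 or 20". The names used are illustrative.

```python
from math import comb


def binomial_pmf(y, n, p):
    """Binomial probability of exactly y successes in n trials."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


n, p = 20, 0.95

# P(Y >= 18) = p(18) + p(19) + p(20)
p_at_least_18 = sum(binomial_pmf(y, n, p) for y in range(18, n + 1))

# P(Y <= 18) = 1 - p(19) - p(20)
p_at_most_18 = 1 - binomial_pmf(19, n, p) - binomial_pmf(20, n, p)

print(f"{p_at_least_18:.4f} {p_at_most_18:.4f}")
```

The two events overlap at y = 18, so the two probabilities sum to 1 + p(18) rather than 1.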
Binomial Theorem
• As we saw in our Discrete class,
the Binomial Theorem allows us to expand
$(p + q)^n = \sum_{y=0}^{n} C^n_y \, p^y q^{n-y}$
• As a result, summing the binomial probabilities,
where q = 1- p is the probability of a failure,
$\sum_{y=0}^{n} P(Y = y) = \sum_{y=0}^{n} C^n_y \, p^y (1 - p)^{n-y} = (p + (1 - p))^n = 1$
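The fact that the binomial probabilities sum to 1 can also be checked numerically. The sketch below does this for a few (n, p) pairs taken from the examples in these slides. It is a floating-point check, not a proof.

```python
from math import comb, isclose


def binomial_pmf(y, n, p):
    """Binomial probability of exactly y successes in n trials."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


# Sum p(y) over y = 0, ..., n for each (n, p) pair.
totals = {
    (n, p): sum(binomial_pmf(y, n, p) for y in range(n + 1))
    for n, p in [(12, 0.65), (10, 0.16), (20, 0.95)]
}
all_sum_to_one = all(isclose(t, 1.0, rel_tol=1e-12) for t in totals.values())
print(all_sum_to_one)
```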
Mean and Variance
• If Y is a binomial random variable with
parameters n and p, the expected value and
variance for Y are given by
$E(Y) = np$ and $V(Y) = np(1 - p)$
Rats!
• In a research study, rats are injected with a drug.
The probability that a rat will die from the drug
before the experiment is over is 0.16.
Ten rats are injected with the drug.
• How many of the
rats are expected to
survive?
• Find the variance
for the number of
survivors.
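A small sketch applying the binomial mean and variance formulas to the number of survivors, where "success" is taken to mean a rat survives (p = 1 - 0.16 = 0.84). The formula results are compared with values computed directly from the probability function. Variable names are illustrative.

```python
from math import comb

n = 10
p_survive = 1 - 0.16

# Formulas: E(Y) = n p and V(Y) = n p (1 - p).
expected_survivors = n * p_survive
variance_survivors = n * p_survive * (1 - p_survive)

# Cross-check from the definition, using the binomial probability function.
pmf = [comb(n, y) * p_survive**y * (1 - p_survive) ** (n - y) for y in range(n + 1)]
mean_from_pmf = sum(y * w for y, w in enumerate(pmf))
var_from_pmf = sum((y - mean_from_pmf) ** 2 * w for y, w in enumerate(pmf))

print(round(expected_survivors, 3), round(variance_survivors, 3))  # 8.4 1.344
```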
Geometric Random Variables
Your 1st Success
• Similar to the binomial experiment, we consider:
• A sequence of independent Bernoulli trials.
• The probability of “success” equals p on each trial.
• Define a random variable Y as the number of the
trial on which the 1st success occurs.
(Stop the trials after the first success occurs.)
• What is the probability p(y), for y = 1,2, … ?
• On which trial is the first success expected?
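S = success, F = failure
• Consider the values of Y:
y = 1: (S), with p(1) = p
y = 2: (F, S), with p(2) = (q)(p)
y = 3: (F, F, S), with $p(3) = (q^2)(p)$
y = 4: (F, F, F, S), with $p(4) = (q^3)(p)$
and so on…
[Tree diagram: each trial branches into S or F; each S branch ends at one of the sample points (S), (F, S), (F, F, S), (F, F, F, S), and each F branch continues to the next trial.]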
Geometric Probability Function
• A random variable has a geometric distribution
with parameter p if its probability function is
given by
$p(y) = q^{y-1} p$
where $q = 1 - p$, for $y = 1, 2, \ldots$
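A minimal sketch of the geometric probability function. The name `geometric_pmf` and the value p = 0.2 are illustrative assumptions, not taken from the slides.

```python
def geometric_pmf(y: int, p: float) -> float:
    """p(y) = q^(y - 1) * p with q = 1 - p, for y = 1, 2, ..."""
    if y < 1:
        return 0.0
    return (1 - p) ** (y - 1) * p


p = 0.2  # illustrative success probability
first_four = [geometric_pmf(y, p) for y in range(1, 5)]
print([round(v, 4) for v in first_four])  # [0.2, 0.16, 0.128, 0.1024]
```

Each probability is the previous one multiplied by q, which is what makes the distribution "geometric".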
Success?
• Of course, you need to be clear on what you
consider a “success”.
• For example, the 1st success might mean finding
the 1st defective item!
[Tree diagram: D = defective item, G = non-defective item; the sample points are (D), (G, D), (G, G, D), and so on.]
Geometric Mean, Variance
• If Y is a geometric random variable with
parameter p, the expected value and variance for Y
are given by
$E(Y) = \frac{1}{p}$ and $V(Y) = \frac{1 - p}{p^2}$
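The geometric mean and variance formulas can be checked numerically by summing the series out to a large cutoff. In the sketch below, p = 0.2 and the cutoff of 2000 terms are illustrative assumptions; the neglected tail is negligible at this cutoff.

```python
p = 0.2  # illustrative success probability
q = 1 - p

ys = range(1, 2001)  # truncate the infinite sum at y = 2000
pmf = [q ** (y - 1) * p for y in ys]

mean = sum(y * w for y, w in zip(ys, pmf))
variance = sum((y - mean) ** 2 * w for y, w in zip(ys, pmf))

print(round(mean, 6), round(variance, 6))  # compare with 1/p and (1 - p)/p^2
```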
At least ‘a’ trials? (#3.55)
• For a geometric random variable and a positive integer a,
show
$P(Y > a) = q^a$
• Consider
$P(Y > a) = 1 - P(Y \le a)$
$= 1 - p(1 + q + q^2 + \cdots + q^{a-1})$
$= 1 - p \cdot \frac{1 - q^a}{1 - q} = 1 - (1 - q^a)$
$= q^a$, based on the sum of a
geometric series
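A quick numerical check of this tail formula, computing P(Y > a) as one minus the finite sum p(1) + … + p(a) and comparing it with q^a. The value p = 0.3 and the name `tail_probability` are illustrative.

```python
def tail_probability(a: int, p: float) -> float:
    """P(Y > a) = 1 - [p(1) + ... + p(a)] for a geometric random variable."""
    return 1 - sum((1 - p) ** (y - 1) * p for y in range(1, a + 1))


p = 0.3  # illustrative success probability
q = 1 - p

# Each tuple holds (a, tail computed from the sum, closed form q^a).
checks = [(a, tail_probability(a, p), q**a) for a in range(1, 8)]
tails_match = all(abs(tail - closed) < 1e-12 for _, tail, closed in checks)
print(tails_match)
```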
“Memoryless Property”
• For the geometric distribution
$P(Y > a + b \mid Y > a) = q^b = P(Y > b)$
• “at least 5 more trials?”
We note $P(Y > 7 \mid Y > 2) = q^5 = P(Y > 5)$.
That is, “knowing the first two trials were failures,
the probability a success won’t occur on the next
5 trials”
is identical to…
“just starting the trials and a success won’t occur
on the first 5 trials”
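The memoryless property can be illustrated numerically by computing the conditional probability as a ratio of tail sums. In this sketch, p = 0.25, the name `tail`, and the 500-term cutoff are illustrative assumptions.

```python
def tail(a: int, p: float, terms: int = 500) -> float:
    """Approximate P(Y > a) by summing p(y) for y = a + 1, ..., a + terms."""
    return sum((1 - p) ** (y - 1) * p for y in range(a + 1, a + 1 + terms))


p = 0.25  # illustrative success probability

# P(Y > 7 | Y > 2) = P(Y > 7) / P(Y > 2), since {Y > 7} is contained in {Y > 2}.
conditional = tail(7, p) / tail(2, p)
unconditional = tail(5, p)

print(round(conditional, 6), round(unconditional, 6))
```

Both values should agree with q^5 up to floating-point error.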
Negative Binomial Distribution
• Again, considering independent Bernoulli trials
with probability of “success” p on each trial…
• Instead of watching for the 1st success, let Y be the
number of the trial on which the rth success occurs.
(Stop the trials after the rth success occurs.)
• For a given value r, the probability p(y) is
$p(y) = C_{y-1,\,r-1} \, p^r (1 - p)^{y-r}$, for $y = r, r + 1, \ldots$
Negative Binomial
• To determine the probability the 4th success occurs
on the 7th trial, we compute
$p(7) = C_{6,3} \, p^4 (1 - p)^3$
• Note this is actually just the binomial probability of
3 successes during the first 6 trials, followed by one
more success:
$p(7) = \left[ C_{6,3} \, p^3 (1 - p)^3 \right] \cdot p$
“4th success, on the last trial”
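A sketch of the negative binomial probability function, together with a check that p(7) for r = 4 equals the binomial probability of 3 successes in 6 trials times p. The function names and the value p = 0.3 are illustrative assumptions.

```python
from math import comb


def negative_binomial_pmf(y: int, r: int, p: float) -> float:
    """Probability that the r-th success occurs on trial y (y = r, r + 1, ...)."""
    if y < r:
        return 0.0
    return comb(y - 1, r - 1) * p**r * (1 - p) ** (y - r)


def binomial_pmf(y, n, p):
    """Binomial probability of exactly y successes in n trials."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


p = 0.3  # illustrative success probability
direct = negative_binomial_pmf(7, 4, p)
via_binomial = binomial_pmf(3, 6, p) * p  # 3 successes in 6 trials, then a success

print(round(direct, 6), round(via_binomial, 6))
```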
Negative Binomial
• For the negative binomial distribution, we have
$E(Y) = \frac{r}{p}$ and $V(Y) = \frac{r(1 - p)}{p^2}$
• For example, if a success occurs 10% of the time
(i.e., p = 0.1), then to find the 4th success, we expect
to require 40 trials on average.
$E(Y) = \frac{4}{0.1} = 40$
Intuitively, wouldn’t you expect 40 trials?
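The value of 40 trials can be checked by summing y·p(y) directly from the negative binomial probability function with r = 4 and p = 0.1. The cutoff of 3000 trials is an assumption chosen so the neglected tail is negligible.

```python
from math import comb

r, p = 4, 0.1

ys = range(r, 3001)  # truncate the infinite sum at y = 3000
pmf = [comb(y - 1, r - 1) * p**r * (1 - p) ** (y - r) for y in ys]

mean = sum(y * w for y, w in zip(ys, pmf))
variance = sum((y - mean) ** 2 * w for y, w in zip(ys, pmf))

print(round(mean, 4), round(variance, 4))  # compare with r/p and r(1 - p)/p^2
```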
Poisson Random Variables
Number of occurrences
• Let Y represent the number of occurrences of an
event in an interval of size s.
• Here we may be referring to an interval of time,
distance, space, etc.
• For example, we may be interested in the number
of customers Y arriving during a given time
interval.
• We call Y a Poisson random variable.
Poisson R. V.
• A random variable has a Poisson distribution with
parameter λ if its probability function is given by
$p(y) = \frac{\lambda^y e^{-\lambda}}{y!}$
where y = 0, 1, 2, …
We’ll see that λ is the “average rate” at which
the events occur. That is, E(Y) = λ.
Queries
• If the number of database queries processed by a
computer in a time interval is a Poisson random
variable with an average of 6 queries per minute,
find the probability that 4 queries occur in a one
minute interval.
$p(4) = \frac{6^4 e^{-6}}{4!} \approx 0.13385$
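A minimal sketch of the Poisson probability function applied to the query example (an average of 6 queries per minute, y = 4). The name `poisson_pmf` is an illustrative choice.

```python
from math import exp, factorial


def poisson_pmf(y: int, lam: float) -> float:
    """p(y) = lam^y * e^(-lam) / y!, for y = 0, 1, 2, ..."""
    return lam**y * exp(-lam) / factorial(y)


p_4 = poisson_pmf(4, 6.0)
print(round(p_4, 5))  # 0.13385
```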
Fewer Queries
• As before, for the Poisson random variable with
an average of 6 queries per minute…
• find the probability there are fewer than 6 queries in
a one minute interval:
$P(Y < 6) = P(Y \le 5)$
= poissoncdf(6, 5) ≈ 0.44568
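The calculator command poissoncdf(6, 5) sums the Poisson probabilities for y = 0 through 5 with mean 6. A plain Python equivalent, with the illustrative name `poisson_cdf`, might look like this:

```python
from math import exp, factorial


def poisson_cdf(k: int, lam: float) -> float:
    """P(Y <= k) for a Poisson random variable with mean lam."""
    return sum(lam**y * exp(-lam) / factorial(y) for y in range(k + 1))


p_fewer_than_6 = poisson_cdf(5, 6.0)  # P(Y < 6) = P(Y <= 5)
print(round(p_fewer_than_6, 5))  # 0.44568
```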
Some Poisson Variables
• Number of incoming telephone calls to a
switchboard within a given time interval;
• Number of errors (incorrect bits) received by a
modem during a given time interval;
• Number of chocolate chips in one of Dr. Vestal’s
chocolate chip cookies;
• Number of claims processed by a particular
insurance company on a single day;
• Number of white blood cells in a drop of blood;
• Number of dead deer along a mile of highway.
Poisson mean, variance
• If Y is a Poisson random variable with
parameter λ, the expected value and variance
for Y are given by
$E(Y) = \lambda$ and $V(Y) = \lambda$
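As a numerical illustration that the Poisson mean and variance both equal the parameter, the sketch below sums the series for a mean of 6 (the rate from the query example). The cutoff at y = 100 is an assumption; the neglected tail is negligible there.

```python
from math import exp, factorial

lam = 6.0  # rate from the query example

ys = range(0, 101)  # truncate the infinite sum at y = 100
pmf = [lam**y * exp(-lam) / factorial(y) for y in ys]

mean = sum(y * w for y, w in zip(ys, pmf))
variance = sum((y - mean) ** 2 * w for y, w in zip(ys, pmf))

print(round(mean, 6), round(variance, 6))  # both should be close to 6
```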
Hypergeometric Random
Variables
Sampling without replacement
• When sampling with replacement, each trial
remains independent. For example,…
• If balls are replaced, P(red ball on 2nd draw) =
P(red ball on 2nd draw | first ball was red).
• If balls are not replaced, then given the first ball is red,
there is less chance of a red ball on the 2nd draw.
Though for a large population of balls,
the effect may be minimal.
n trials, y red balls
• Suppose there are r red balls, and N – r other balls.
• Consider Y, the number of red balls in n selections,
where now the trials may be dependent.
(for sampling without replacement, when sample
size is significant relative to the population)
• The probability that y of the n selected balls are red is
$p(y) = \frac{C^r_y \, C^{N-r}_{n-y}}{C^N_n}$
Hypergeometric R. V.
• A random variable has a hypergeometric
distribution with parameters N, n, and r
if its probability function is given by
$p(y) = \frac{C^r_y \, C^{N-r}_{n-y}}{C^N_n}$
where $0 \le y \le \min(n, r)$ and $n - y \le N - r$.
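A minimal sketch of the hypergeometric probability function. The name `hypergeometric_pmf` and the small urn (N = 10 balls, r = 4 red, n = 3 drawn) are illustrative assumptions chosen so the values are easy to verify by hand.

```python
from math import comb


def hypergeometric_pmf(y: int, N: int, n: int, r: int) -> float:
    """P(y red balls among n drawn without replacement from N balls, r of them red)."""
    if y < 0 or y > min(n, r) or n - y > N - r:
        return 0.0
    return comb(r, y) * comb(N - r, n - y) / comb(N, n)


# Illustrative urn: N = 10 balls, r = 4 red, n = 3 drawn.
probs = [hypergeometric_pmf(y, 10, 3, 4) for y in range(0, 4)]
print([round(v, 4) for v in probs])  # [0.1667, 0.5, 0.3, 0.0333]
```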
Hypergeometric mean, variance
• If Y is a hypergeometric random variable with
parameters N, n, and r, the expected value and variance for Y
are given by
$E(Y) = \frac{nr}{N}$ and $V(Y) = n \left(\frac{r}{N}\right) \left(\frac{N - r}{N}\right) \left(\frac{N - n}{N - 1}\right)$
Sample of 20
Suppose among a supply of 5000 parts produced during a
given week, there are 100 that don’t meet the required quality
standard. Twenty of the parts are randomly selected and
checked to see if they meet the standard. Let Y be the number
in the sample that don’t meet the standard.
a). Compute the probability exactly 2 of the sampled parts
fail to meet the quality standard.
b). Determine the mean, E(Y).
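One possible way to work parts a) and b) in Python, treating the nonconforming parts as the "red balls" (N = 5000, r = 100, n = 20). The binomial value with p = r/N is included only as a comparison, since the sample is small relative to the population. Variable names are illustrative.

```python
from math import comb

N, r, n = 5000, 100, 20

# a) Exactly 2 of the 20 sampled parts fail to meet the standard.
p_exactly_2 = comb(r, 2) * comb(N - r, n - 2) / comb(N, n)

# b) Mean of the hypergeometric distribution: E(Y) = n r / N.
expected_nonconforming = n * r / N

# Comparison: binomial approximation with p = r / N.
p_binomial = comb(n, 2) * (r / N) ** 2 * (1 - r / N) ** (n - 2)

print(f"{p_exactly_2:.4f} {expected_nonconforming:.2f} {p_binomial:.4f}")
```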