Chapter 4

Chapter 4 Random Variables and Discrete Probability Distributions
A random variable is a function that assigns a unique numerical value to
each outcome of the sample space. Its values vary according to the rules of
probability.
It is different from the variables that you saw in algebra, because the value it
will take on when an experiment is run is not known in advance (it is non-deterministic).
The probability distribution of a random variable gives all the values the
random variable can take on and the probability the random variable takes
on each of its values.
Discrete RV’s count the number of some object, occurrence or subject.
Ex: A package of 5 pens. Let Y be the RV that counts the number of pens
that are defective in the sample.
The probability distribution is given as follows:
Y       0     1     2     3     4     5
P(Y)   .75   .10   .06   .04   .03   .02

Y = number of pens that are defective in the package, so:
P(Y=0) = .75   (0 pens of 5 are defective)
P(Y=1) = .10
P(Y=2) = .06
P(Y=3) = .04
P(Y=4) = .03
P(Y=5) = .02
Note that all probabilities are between 0 and 1 and the sum of the
probabilities = 1.
These two properties will be true of all DRV’s.
What is the probability that at most 2 pens are defective?
P(Y ≤ 2) = P(Y=0) + P(Y=1) + P(Y=2) = .75 + .10 + .06 = .91
What is the probability that at least 1 pen is defective?
P( Y ≥ 1) = 1 – P(Y = 0) = 1 - .75 = .25
What is the probability that between 1 and 4 (inclusive) pens are
defective?
P(1 ≤ Y ≤ 4) = P(Y ≤ 4) – P(Y < 1) = .98 - .75 = .23
Equivalently, P(Y=1) + P(Y=2) + P(Y=3) + P(Y=4) = .10 + .06 + .04 + .03 = .23
Note that P(Y ≤ 4) – P(Y ≤ 1) = .98 - .85 = .13, which is not the correct
answer because it also subtracts P(Y=1).
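The three questions above can be checked with a short Python sketch. The dictionary `pmf` and the helper `prob` are names chosen here for illustration; they simply store the pen table and add up the probabilities of the values that satisfy a condition.

```python
# Pen example: probability distribution of Y (defective pens in a package of 5).
pmf = {0: 0.75, 1: 0.10, 2: 0.06, 3: 0.04, 4: 0.03, 5: 0.02}

def prob(event):
    """Sum p(y) over the values y for which event(y) is True."""
    return sum(p for y, p in pmf.items() if event(y))

at_most_2 = prob(lambda y: y <= 2)             # P(Y <= 2)
at_least_1 = 1 - pmf[0]                        # P(Y >= 1) = 1 - P(Y = 0)
between_1_and_4 = prob(lambda y: 1 <= y <= 4)  # P(1 <= Y <= 4)

print(round(at_most_2, 2), round(at_least_1, 2), round(between_1_and_4, 2))
# prints: 0.91 0.25 0.23
```

Writing the event as a condition on y makes it harder to subtract the wrong cumulative probability by accident.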
Chapter 4 Discrete Random Variables Expected Value, Variance and
Standard Deviation
Expected Values
Expected value = mean (weighted by probabilities)
E(X) = μ = μ_X = ∑ x * P(X = x) = ∑ x * p(x), where the sum is over all values x of X
The mean of a DRV is the average of its values, weighted by their probabilities.
It is interpreted as the value that will happen on average. It does not have to
be a value that the DRV can actually take.
For the pen example, recall the probability distribution function (pdf) of
Y, the RV that counts the number of defective pens in a package of 5:
Y       0     1     2     3     4     5
P(Y)   .75   .10   .06   .04   .03   .02
For this example:
E(Y) = 0*.75 + 1*.10 + 2*.06 + 3*.04 + 4*.03 + 5*.02 = 0.56
Does this mean that there are 0.56 defective pens in each package of 5?
No.
It means on average there are 0.56 defective pens in each package of 5.
So if you had 100 packages of 5 pens, there would be about 56 defective
pens.
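The "on average" interpretation can be illustrated with a small simulation. This is only a sketch: the seed and the number of simulated packages (100,000) are arbitrary choices made here, not values from the notes.

```python
import random

values = [0, 1, 2, 3, 4, 5]
probs = [0.75, 0.10, 0.06, 0.04, 0.03, 0.02]

# Exact expected value: sum of y * p(y).
mean = sum(y * p for y, p in zip(values, probs))

# Simulate many packages; the average number of defective pens per
# package should settle close to the expected value.
rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable
packages = rng.choices(values, weights=probs, k=100_000)
simulated_mean = sum(packages) / len(packages)

print(round(mean, 2))   # 0.56
print(simulated_mean)   # close to 0.56, but not exactly equal
```

No single package ever contains 0.56 defective pens; only the long-run average approaches that value.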
Variance(X) = V(X) = σ^2 = E[(X – μ)^2] = E(X^2) - μ^2 where μ = E(X)
Standard Deviation(X) = σ = √(σ^2)
For the pen example: Recall E(Y) = µ = .56
Y    P(Y)    Y-µ      (Y-µ)^2    (Y-µ)^2*p(y)
0    0.75    -0.56    0.3136     0.2352
1    0.1     0.44     0.1936     0.01936
2    0.06    1.44     2.0736     0.124416
3    0.04    2.44     5.9536     0.238144
4    0.03    3.44     11.8336    0.355008
5    0.02    4.44     19.7136    0.394272
Variance = 1.3664 (sum of the last column)
Stdev = 1.168931
Y    P(Y)    Y^2    Y^2*P(Y=y)
0    0.75    0      0
1    0.1     1      0.1
2    0.06    4      0.24
3    0.04    9      0.36
4    0.03    16     0.48
5    0.02    25     0.5
E(Y^2) = 1.68 (sum of the last column)
variance = 1.68 - 0.56^2 = 1.3664
Standard deviation = 1.168931
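Both routes to the variance (the definition E[(Y – μ)^2] and the shortcut E(Y^2) – μ^2) can be reproduced in a few lines of Python. The variable names below are illustrative only.

```python
import math

pmf = {0: 0.75, 1: 0.10, 2: 0.06, 3: 0.04, 4: 0.03, 5: 0.02}

mu = sum(y * p for y, p in pmf.items())                  # E(Y)

# Definition: weighted average of squared deviations from the mean.
var_definition = sum((y - mu) ** 2 * p for y, p in pmf.items())

# Shortcut: E(Y^2) minus the square of the mean.
e_y2 = sum(y ** 2 * p for y, p in pmf.items())           # E(Y^2)
var_shortcut = e_y2 - mu ** 2

sd = math.sqrt(var_definition)

print(round(e_y2, 2), round(var_definition, 4), round(var_shortcut, 4), round(sd, 6))
```

The two variance values agree up to floating-point rounding, which is a useful check when filling in such a table by hand.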
The Binomial Probability Distribution.
Ex: A test has 3 Multiple Choice questions. The probability of getting any
one question right is ¼ = .25. Assume that the questions are independent.
Let Y = # of questions right
S = {RRR, RRW, RWR, WRR, WWR, WRW, RWW, WWW}
Note: P(Y=3) = P(RRR) ≠1/8
The outcomes are not equally-likely!
A tree diagram may help.
P(Y=0) = P(WWW) = ¾*¾*¾ = 27/64 = .42
P(Y=1) = P(RWW or WRW or WWR) = 3 * (¾*¾*¼) = 27/64 = .42
P(Y=2) = P(WRR or RWR or RRW) = 3 * (¾*¼*¼) = 9/64 = .14
P(Y=3) = P(RRR) = ¼*¼*¼ = 1/64 = .02
Note: 27/64 + 27/64 + 9/64 + 1/64 = 1
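In place of a tree diagram, the eight outcomes in S can be enumerated in Python and their probabilities grouped by the number of right answers. This is a sketch using exact fractions; the names `p_right`, `p_wrong` and `dist` are chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

p_right = Fraction(1, 4)
p_wrong = 1 - p_right

# dist[k] accumulates P(Y = k), the probability of k right answers.
dist = {k: Fraction(0) for k in range(4)}

for outcome in product("RW", repeat=3):   # RRR, RRW, ..., WWW
    prob = Fraction(1)
    for answer in outcome:                # independent questions: multiply
        prob *= p_right if answer == "R" else p_wrong
    dist[outcome.count("R")] += prob

for k, prob in dist.items():
    print(k, prob)
```

The output lists 27/64, 27/64, 9/64 and 1/64, which shows that the 8 outcomes are not equally likely.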
A Binomial Random Variable counts the number of “successes” in n trials.
In the last example, it counts the number of correct answers in 3 questions.
Five Characteristics of a Bin R.V.
1. Fixed # n of identical trials. Ex. 3 Multiple Choice questions or
Selecting 10 people from a large population.
2. The outcome of each trial can be classified as a Success or Failure.
Success is not necessarily a good thing.
3. The trials of the experiment are independent. Outcomes of previous
trials do not affect future trials.
4. The probability of a Success at each trial = p, is the same for all trials.
This also means that the probability of a Failure is the same for all trials = 1-p = q.
5. The random variable counts the number of Successes in the n trials.
Clues that the RV is Binomial:
1. Random sample of size n
2. Sample comes from a large population, or sampling with replacement.
3. Each trial can be classified as a Success or Failure.
Binomial Formula
P(Y = x) = nCx * p^x * q^(n – x)
For x = 0, 1, 2, … , n
For the Multiple Choice test example, let Y be Bin(n=3, p = ¼).
This is like sampling with replacement: if you answer the first question correctly,
you can answer correctly again on the next one (R is replaced).
x = 0 means 0 questions correct.
P(Y=0) = 3C0 (¾)^3 (¼)^0
P(Y=0) = 1 (27/64) (1) = 27/64
x = 1 means 1 question correct.
P(Y=1) = 3C1 (¾)^2 (¼)^1
P(Y=1) = 3 (9/16) (¼) = 27/64
x = 2 means 2 questions correct.
P(Y=2) = 3C2 (¾)^1 (¼)^2
P(Y=2) = 3 (3/4) (1/16) = 9/64
x = 3 means 3 questions correct.
P(Y=3) = 3C3 (¾)^0 (¼)^3
P(Y=3) = 1 (1) (1/64) = 1/64
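The four hand calculations above follow one pattern, which can be written as a small Python function. This is a minimal sketch: `binomial_pmf` is a name made up here, and `math.comb(n, x)` plays the role of nCx.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p, x):
    """P(Y = x) for Y ~ Bin(n, p), using nCx * p^x * q^(n - x)."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)

p = Fraction(1, 4)   # exact fraction so the results match the hand work
table = [binomial_pmf(3, p, x) for x in range(4)]

for x, prob in enumerate(table):
    print(f"P(Y={x}) = {prob}")
```

Using `Fraction` keeps the answers as 27/64, 27/64, 9/64 and 1/64 rather than rounded decimals.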
On the TI83/84
[2nd] [DISTR] (VARS key)
0: binompdf(n, p, x)
binompdf (n, p, x) gives P(Y = x)
if Y is Bin(n=3, p = ¼ )
P(Y = 2) = binompdf(3, ¼, 2) = 9/64 = 0.141
On the TI83/84
[2nd] [DISTR] (VARS key)
binomcdf(n, p, x)
binomcdf(n, p, x) gives P(Y ≤ x)
if Y is Bin(n=10, p = .6)
P(Y ≤ 8) = binomcdf(10, .6, 8) = 0.9536
What if you were asked for P(Y > 8)?
P(Y > 8) = 1 – P(Y ≤ 8) = 1 - .9536 =.0464
Use pdf for the probability that Y is exactly equal to a number, and
use cdf for probabilities involving <, >, ≤ or ≥.
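For readers without the calculator, the two functions can be imitated in plain Python. The functions below are written here to mirror the calculator's argument order (n, p, x); they are illustrative definitions, not part of any library.

```python
from math import comb

def binompdf(n, p, x):
    """P(Y = x) for Y ~ Bin(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomcdf(n, p, x):
    """P(Y <= x): add up the pdf for 0, 1, ..., x."""
    return sum(binompdf(n, p, k) for k in range(x + 1))

exactly_2 = binompdf(3, 0.25, 2)     # P(Y = 2) for Bin(3, 1/4)
at_most_8 = binomcdf(10, 0.6, 8)     # P(Y <= 8) for Bin(10, .6)
more_than_8 = 1 - at_most_8          # P(Y > 8) by the complement

print(round(exactly_2, 3), round(at_most_8, 4), round(more_than_8, 4))
```

The cdf is just a running sum of the pdf, which is why a "greater than" probability is found by subtracting a cdf value from 1.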
Ex: It is known that 25% of the population are bald. A random sample of 20
people is taken.
1. What is the probability exactly 5 people of the sample are bald?
2. What is the probability at most 4 people of the sample are bald?
3. What is the probability more than 6 people of the sample are bald?
4. What is the probability at least 1 person of the sample is bald?
5. What is the probability between 3 and 10 (exclusive) people of the sample are
bald?
Clues that it is binomial: sampling from a large population so the trials will
be (in effect) independent, each trial is a success or failure. A success = the
person is bald. Here n = 20, p = .25 and Y counts the number of bald
people in the sample.
1. P(Y = 5) = binompdf(20, .25, 5) = .202
2. P(Y ≤ 4) = binomcdf(20, .25, 4) = .415
3. P(Y > 6) = 1 – P(Y ≤ 6) = 1 - binomcdf(20, .25, 6) = 1 - .786 = .214
4. P(Y ≥ 1) = 1 – P(Y = 0) = 1 - binompdf(20, .25, 0) = 1 - .003 = .997
5. P(3 < Y < 10) = P(4 ≤ Y ≤ 9) = P(Y ≤ 9) – P(Y ≤ 3) =
binomcdf(20, .25, 9) – binomcdf(20, .25, 3) = .986 - .225 = .761
The mean or expected value of a Binomial RV is μ = np.
The standard deviation of a Binomial RV is σ = √(npq)
Recall q = 1- p
In the above example (bald people):
µ = 20 * .25 = 5
σ = √(20 * .25 * .75) = √3.75 = 1.936
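The five answers and the mean and standard deviation for the bald example can be cross-checked by building the whole Bin(20, .25) distribution as a list and adding the relevant entries. This is a sketch; the list `pmf` and the other names are chosen here for illustration.

```python
from math import comb, sqrt

n, p = 20, 0.25

# pmf[k] = P(Y = k) for k = 0, 1, ..., 20
pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

exactly_5 = pmf[5]                    # 1. P(Y = 5)
at_most_4 = sum(pmf[:5])              # 2. P(Y <= 4): entries 0..4
more_than_6 = sum(pmf[7:])            # 3. P(Y > 6): entries 7..20
at_least_1 = 1 - pmf[0]               # 4. P(Y >= 1)
between_3_and_10 = sum(pmf[4:10])     # 5. P(3 < Y < 10): entries 4..9

mean = n * p                          # np
sd = sqrt(n * p * (1 - p))            # sqrt(npq)

for value in (exactly_5, at_most_4, more_than_6, at_least_1, between_3_and_10, sd):
    print(round(value, 3))
```

Slicing the list makes the endpoints explicit, which helps with strict versus non-strict inequalities.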
A general note: In many disciplines 0.05 is used as the conventional cutoff
between rare / unusual events and non-rare events. When the
probability of an event is less than 0.05 it is considered rare or unusual.
Ex2. It is known that 10% of the US population is left-handed. A random
sample of 15 people is taken.
1. What is the probability exactly 0 people are left-handed?
2. What is the probability exactly 1 person is left-handed?
3. What is the probability exactly 2 people are left-handed?
4. What is the probability fewer than 3 people are left-handed?
5. What is the probability of at least one left-handed person?
6. What are the mean and the standard deviation of the number of left-handed
people in the sample?
Answers:
Let X count the number of left-handed people in the sample. Since we
have a large population and each trial can be classified as a success or
failure, X is a binomial random variable. Its parameters are n = 15 and p =
0.10. Then the answers to the questions are:
1. P(X = 0) = binompdf(15, .10, 0) = .2059
2. P(X = 1) = .3432
3. P(X = 2) = .2669
4. P(X < 3) = P(X ≤ 2) = binomcdf(15, .1, 2) = .8159 (used cdf)
Adding the rounded pdf values instead gives
P(X < 3) = .2059 + .3432 + .2669 = .816 (the difference is round-off error)
5. P(X ≥ 1) = 1 – P(X < 1) = 1 – P(X = 0) = 1 - .2059 = .7941
6. μ = n*p = 15 * .1 = 1.5
σ = √(npq) = √(15 * .1 * .9) = √1.35 = 1.162
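The round-off effect in answer 4 can be reproduced directly: summing the unrounded probabilities and summing the values after rounding each to four decimals give slightly different totals. This is a sketch with illustrative variable names.

```python
from math import comb

n, p = 15, 0.10

# P(X = 0), P(X = 1), P(X = 2) at full floating-point precision.
pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(3)]

exact = sum(pmf)                               # what a cdf function returns
from_rounded = sum(round(v, 4) for v in pmf)   # .2059 + .3432 + .2669

print(round(exact, 4), round(from_rounded, 4))
```

The cdf value is the more accurate one, because each rounded term carries its own small error into the sum.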
Sample Problem
1. An allergist claims that 20% of her patients are allergic to
dandelions. Find the following:
a. What is the probability exactly 2 of her next 5 patients are
allergic to dandelions?
b. What is the probability none of her next 5 patients will be
allergic to dandelions?
c. What is the probability at least 1 of her next 5 patients will
be allergic to dandelions?
Answers:
X = number of her next 5 patients who are allergic to dandelions
X has a Binomial Distribution with n = 5 and p = .20
a. P(X = 2) = .2048
b. P(X = 0) = .3277
c. P(X ≥ 1) = 1 – P(X = 0) = .6723
Another Example:
There are 10 pens in a bag. Two of the pens do not work. Three pens are
randomly selected. What is the probability that all three pens work?
Your first instinct might be to try the Binomial distribution. Let Y be the
number of pens that work in the sample, so n = 3 and p = .8, and you want
to find P(Y = 3), so you would calculate: binompdf(3, .8, 3) = .512.
This however is incorrect, because you have a small population (10) and you
sample without replacement, so the trials are not independent and your p
changes from draw to draw. Draw a tree diagram!
The good news is you have already seen problems like this before in
chapter 4. The correct answer is to calculate how many ways you can get 3
pens that work divided by how many ways you can select 3 pens total.
We can still let Y be the number of pens that work in the sample, so
P(Y = 3) = 8C3 / 10C3 = 56/120 = .467 which is not too far from .512.
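The three ways of looking at this problem (the incorrect binomial guess, the counting argument, and the tree diagram with a changing p) can be compared with exact fractions. This is a sketch with names chosen here for illustration.

```python
from fractions import Fraction
from math import comb

# Binomial guess: treats the three draws as independent with p = 8/10.
binomial_guess = Fraction(8, 10) ** 3

# Counting: ways to choose 3 working pens / ways to choose any 3 pens.
by_counting = Fraction(comb(8, 3), comb(10, 3))

# Tree diagram: p changes after each working pen is removed from the bag.
by_tree = Fraction(8, 10) * Fraction(7, 9) * Fraction(6, 8)

print(float(binomial_guess), float(by_counting), float(by_tree))
```

The counting answer and the tree-diagram answer are the same fraction, 7/15, while the binomial guess is 64/125.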
The above is an example of the Hyper-Geometric Distribution. It is much
like the binomial distribution except now we have a small population so
our trials are not independent and our p = probability of a success at each
trial changes. Let n be the sample size, N be the population size and let M be
the number of successes in the population. Then Y has a hyper-geometric
distribution when Y counts the number of successes in the sample. The pdf
of Y is given by:
P(Y = x) = [ MCx * (N – M)C(n – x) ] / NCn
Where max (0, n – N + M) ≤ x ≤ min (n, M)
Note that M + (N – M) = N and x + (n – x) = n
Ex. In a lot of 28 gun cartridges, 8 were found to be contaminated and
20 were “clean.” A random sample of 6 cartridges is taken from the lot.
a. Find the probability that all 6 are clean.
b. Find the probability that at least one is contaminated.
c. Find the probability that exactly 4 are clean.
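One possible way to set this exercise up is sketched below in Python. It assumes a "success" is a clean cartridge, so N = 28, M = 20 and n = 6; the function `hypergeom_pmf` is a name made up here, following the hyper-geometric pdf with N, M, n and x as defined in these notes.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(N, M, n, x):
    """P(Y = x): x successes in a sample of n drawn without replacement
    from a population of N that contains M successes."""
    if not max(0, n - N + M) <= x <= min(n, M):
        return Fraction(0)            # x is outside the possible range
    return Fraction(comb(M, x) * comb(N - M, n - x), comb(N, n))

# Assumption: success = clean cartridge.
N, M, n = 28, 20, 6

all_clean = hypergeom_pmf(N, M, n, 6)          # a. all 6 are clean
at_least_one_contaminated = 1 - all_clean      # b. complement of (a)
exactly_4_clean = hypergeom_pmf(N, M, n, 4)    # c. 4 clean, 2 contaminated

for value in (all_clean, at_least_one_contaminated, exactly_4_clean):
    print(value, round(float(value), 4))
```

Defining success as a contaminated cartridge instead (M = 8) would give the same answers, with x counting contaminated cartridges.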