Chapter 4 – Probability Distributions
Defn: A random variable is a real-valued function whose domain is
the sample space of a random experiment. A random variable is
called discrete if it has either a finite or a countably infinite number
of possible values. If the number of possible values is uncountably
infinite, then the random variable is called continuous.
Example 1: If the random experiment is to flip a fair coin twice, the
sample space is
$$S = \{HH, HT, TH, TT\}.$$
Define a random variable X = number of heads that occur when I
flip a fair coin twice. This is an example of a discrete random
variable. Then $X(HH) = 2$, $X(HT) = 1$, $X(TH) = 1$, and $X(TT) = 0$.
Note: We will discuss continuous random variables in Chapter 5.
In Chapter 4, we discuss discrete distributions, including certain
useful families of discrete distributions.
Defn: The probability distribution of a discrete random variable X
is a set of ordered pairs of numbers. In each pair, the first number is
a possible value, x, of X, and the second number is the probability
that the observed value of X will be x when we perform the random
experiment. We can represent the distribution either as a function or
(if the number of possible values of X is relatively small) as a table.
As a function, we would write $f(x) = P(X = x)$, for all possible
values of X. The probability distribution must satisfy the following
two conditions:
i) $f(x) \ge 0$ for all x, and
ii) $\sum_{\text{all } x} f(x) = 1$.
The function f is also called the probability mass function (p.m.f.) of
the random variable.
Example: Continuing the above example, we can find the
probabilities associated with X by considering the probabilities
associated with the outcomes in the sample space. Since we assume
the coin is fair, each outcome in the sample space is equally likely to
occur. Therefore, we may represent the probability distribution of X
in the following table:

x    f(x) = P(X = x)
0    0.25
1    0.50
2    0.25

The conditions above translate into saying that every number in the
second column must be between 0 and 1, and the sum of the second
column must be 1.
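As a minimal sketch (Python, standard library only), the table can be reproduced by enumerating the four equally likely outcomes and adding up their probabilities for each value of X. The names `sample_space`, `num_heads`, and `pmf` are illustrative choices, not notation from the text; exact fractions are used so that condition ii) can be checked with an equality test.

```python
from fractions import Fraction
from itertools import product

# Sample space for two flips of a fair coin: HH, HT, TH, TT
sample_space = ["".join(flips) for flips in product("HT", repeat=2)]

def num_heads(outcome):
    """The random variable X: maps an outcome to its number of heads."""
    return outcome.count("H")

# The coin is fair, so each of the 4 outcomes has probability 1/4.
p_outcome = Fraction(1, len(sample_space))

# f(x) = P(X = x): add the probabilities of all outcomes with X(outcome) = x
pmf = {}
for outcome in sample_space:
    x = num_heads(outcome)
    pmf[x] = pmf.get(x, 0) + p_outcome

# Check the two conditions a p.m.f. must satisfy
assert all(prob >= 0 for prob in pmf.values())   # i)  f(x) >= 0 for all x
assert sum(pmf.values()) == 1                     # ii) the f(x) sum to 1

for x in sorted(pmf):
    print(x, float(pmf[x]))
```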
Defn: The cumulative distribution function (c.d.f.) (or distribution
function) for a random variable X is defined by
$$F(x) = P(X \le x), \quad -\infty < x < \infty.$$
Note that the c.d.f. is defined for all real values of x, even though the
random variable is discrete.
Example: Continuing the example of flipping a fair coin twice, we
want to find the c.d.f. of X and construct its graph. The graph will
be a step function.
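For reference, working from the table, F(x) = 0 for x < 0, F(x) = 0.25 for 0 ≤ x < 1, F(x) = 0.75 for 1 ≤ x < 2, and F(x) = 1 for x ≥ 2. A small Python sketch that evaluates F at arbitrary real x (the helper `cdf` is hypothetical) makes the step shape visible: F is flat between the possible values and jumps by f(x) at each of them.

```python
from fractions import Fraction

# p.m.f. of X = number of heads in two fair flips (from the table)
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def cdf(x):
    """F(x) = P(X <= x); defined for every real x, not only 0, 1, 2."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))

# Evaluate F just left of, at, and between the jump points
for x in [-1, 0, 0.5, 1, 1.99, 2, 3.7]:
    print(x, float(cdf(x)))
```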
Bernoulli Distribution
The simplest type of discrete distribution is one for which the r.v.
has two possible values.
Defn: A discrete r.v. X is said to have a Bernoulli distribution with
parameter p (X ~ Bernoulli(p)) if there are exactly two possible
values 0 and 1 of X, such that P(X = 1) = p, and P(X = 0) = 1 – p.
Example: Our random experiment is to flip a fair coin once. We
define X = number of heads. Then X ~ Bernoulli(0.5), so
P(X = 1) = 0.5 and P(X = 0) = 0.5.
Binomial Distribution
Assume that instead of flipping the fair coin once or twice, we flip it
10 times. The sample space of the experiment has 1024 possible
outcomes. The number of events that could be defined is then $2^{1024}$.
These numbers are rather unwieldy to work with directly. Hence we
define a random variable X to be the number of heads that occur
when we flip a fair coin 10 times. We want to be able to calculate
probabilities associated with X.
Defn: A discrete r.v. X is said to have a binomial distribution with
parameters n and p if
$$P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}, \quad x = 0, 1, \ldots, n.$$
Derivation of the Binomial Distribution:
A binomial experiment is a random experiment which satisfies the
following conditions:
1) The experiment consists of a fixed number, n, of trials.
2) The trials are identical to each other. (“Identical” means that the
trials are performed in the same way).
3) The trials are independent of each other, meaning that the
outcome of one trial gives us no information about the outcome of
any other trial.
4) Each trial has two possible outcomes, which we will call Success
and Failure.
5) The probability of Success is the same, p, for each of the trials.
Note that the author states only three conditions. Please use the five
conditions listed above when checking to see whether a random
experiment is a binomial experiment.
We let X = # of Successes in the n trials. The possible values of X
are 0, 1, 2, 3, …, n. For a given $x \in \{0, 1, 2, \ldots, n\}$, what is
$P(X = x)$?
One way that we can have exactly x Successes out of n trials is for
the first x trials to result in Success and the remaining n – x trials to
result in Failure. If the trials are independent (the outcome of one
trial is unrelated to the outcome of any other trial), then we may
multiply the probabilities of the individual trials, p for each Success
and 1 – p for each Failure, to get
$$P(x \text{ Successes followed by } n - x \text{ Failures}) = p^x (1 - p)^{n - x}.$$
Any other ordering of x Successes and n – x Failures will have the
same probability of occurring. How many such orderings are there?
Defn: Given a set of n objects, the number of ways to choose a
subset of x of the objects is given by the binomial coefficient:
$$\binom{n}{x} = \frac{n!}{x!\,(n - x)!}.$$
The number of different orderings of x Successes and n – x Failures
is the same as the number of ways of choosing x of the n trials to be
Successes.
Hence, the probability that there will be exactly x Successes in n
Bernoulli trials is given by:
$$P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}, \quad x = 0, 1, \ldots, n.$$
Example: Let’s go back to our random experiment of flipping a fair
coin 10 times. Let X = number of heads that occur. Does this
satisfy the conditions of being a binomial experiment?
We have
$$P(X = 5) = \binom{10}{5} (0.5)^5 (0.5)^5 = 0.24609375.$$
What about $P(X \le 5)$?
$$P(X \le 5) = \sum_{x=0}^{5} \binom{10}{x} (0.5)^x (0.5)^{10 - x} = 0.6230.$$
Clearly the calculations can become tedious.
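A short Python sketch of the formula, assuming Python 3.8+ for `math.comb`; `binomial_pmf` is an illustrative helper name, not from the text. It reproduces both numbers above by evaluating the p.m.f. and summing it over x = 0, …, 5.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p): C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.5
p_eq_5 = binomial_pmf(5, n, p)
# P(X <= 5) is the sum of the p.m.f. over x = 0, 1, ..., 5
p_le_5 = sum(binomial_pmf(x, n, p) for x in range(6))

print(p_eq_5)            # 0.24609375
print(f"{p_le_5:.4f}")   # 0.6230
```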
To find binomial probabilities using Excel: If X ~ binomial(n, p),
and we want to find P(X ≤ x), then in a cell of the worksheet, enter
=BINOMDIST(x, n, p, TRUE).
In our example, we want to find P(X ≤ 5). In cell A1, we enter
=BINOMDIST(5,10,0.5,TRUE)
We get 0.6230.
If we want P(X = 5), we enter
=BINOMDIST(5,10,0.5,TRUE)-BINOMDIST(4,10,0.5,TRUE)
We get 0.2461.
If we want P(X > 5), we enter
=1-BINOMDIST(5,10,0.5,TRUE)
We get 0.3770.
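Outside of Excel, the same three quantities take a few lines of Python. The helper `binomdist` below is hypothetical: it only mimics the argument order of Excel's BINOMDIST(x, n, p, cumulative) and is not part of any library. As in Excel, passing FALSE (here `False`) as the last argument returns P(X = x) directly, so the difference of two cumulative values is not strictly necessary; newer versions of Excel also offer BINOM.DIST with the same arguments.

```python
from math import comb

def binomdist(x, n, p, cumulative):
    """Hypothetical Python analogue of Excel's BINOMDIST(x, n, p, cumulative).

    cumulative=True  -> P(X <= x)
    cumulative=False -> P(X = x)
    """
    def pmf(k):
        return comb(n, k) * p**k * (1 - p) ** (n - k)
    if cumulative:
        return sum(pmf(k) for k in range(x + 1))
    return pmf(x)

# The three worksheet formulas from the text
p_le_5 = binomdist(5, 10, 0.5, True)                                # P(X <= 5)
p_eq_5 = binomdist(5, 10, 0.5, True) - binomdist(4, 10, 0.5, True)  # P(X = 5)
p_gt_5 = 1 - binomdist(5, 10, 0.5, True)                            # P(X > 5)

print(f"{p_le_5:.4f} {p_eq_5:.4f} {p_gt_5:.4f}")  # 0.6230 0.2461 0.3770
```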
To find binomial probabilities using Table 1 in the Appendix (p.
505): If X ~ binomial(n, p), and we want to find P(X ≤ x), we look
up the appropriate entry in the table.
Examples of Binomial Experiments:
1) Assume that the date is October 15, 2012. We want to predict
the outcome of the Presidential election. We will assume, for
simplicity, that there are only two candidates, President Barack
Obama and former Governor Mitt Romney. We select a simple
random sample of n = 1068 voters from the population of all
eligible, registered, and likely voters. We ask each voter in the
sample, “Do you intend to vote for President Obama?” Let X =
number of voters in the sample who plan to vote to re-elect
President Obama. Is this a binomial experiment? We need to check
to see whether each of the five conditions is satisfied.
2) A worn machine tool produces 1% defective parts. We select a
simple random sample of 25 parts produced by this machine, and let
X = number of defective parts in the sample.
3) I give a pop quiz to the class consisting of 10 multiple choice
questions, each with four possible responses, only one of which is
the correct response. A student has been goofing off all semester,
and comes to class totally unprepared for the quiz. He decides to
randomly guess the answer to each question. Let X = his score on
the quiz.
4) It is known that, of the entire population of adults in Florida, 5%
have a certain blood type. We select a random sample of Florida
adults and obtain blood samples to test. Let X = number of people in
the sample who have the blood type.
Example: p. 88, first example
Mean and Variance of a Discrete Distribution
Defn: The mean, or expectation, or expected value, of a discrete r.v.
X with possible values $x_1, \ldots, x_n$ is given by
$$\mu = \sum_{i=1}^{n} x_i f(x_i).$$
Defn: The variance of a discrete r.v. X is given by
$$\sigma^2 = \sum_{i=1}^{n} (x_i - \mu)^2 f(x_i).$$
The standard deviation of X is just the square root of the variance.
Note: It is generally easier to calculate the variance using the
equivalent formula
$$\sigma^2 = \sum_{i=1}^{n} x_i^2 f(x_i) - \mu^2.$$
Example: The random experiment is to flip a fair coin twice. Let X
= number of heads. We found the distribution of X earlier. We
want to find the expected number of heads, and the variance and
standard deviation of X. First, the mean is given by
$$\mu = \sum_{x=0}^{2} x f(x) = (0)(0.25) + (1)(0.50) + (2)(0.25) = 1.$$
After we have calculated the mean, we need to calculate the second
moment of the distribution:
$$\sum_{x=0}^{2} x^2 f(x) = (0)(0.25) + (1)(0.50) + (4)(0.25) = 1.5.$$
If we subtract the square of the mean from the second moment, we
obtain the variance:
$$\sigma^2 = \sum_{x=0}^{2} x^2 f(x) - \mu^2 = 1.5 - 1 = 0.50,$$
and the standard deviation
$$\sigma = \sqrt{\sum_{x=0}^{2} x^2 f(x) - \mu^2} = \sqrt{0.50} = 0.7071.$$
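The same arithmetic as a Python sketch, computing the variance both from the definition and from the shortcut formula to show that the two agree:

```python
from math import sqrt

# Distribution of X = number of heads in two fair flips
pmf = {0: 0.25, 1: 0.50, 2: 0.25}

mean = sum(x * f for x, f in pmf.items())               # mu = sum of x f(x)
second_moment = sum(x**2 * f for x, f in pmf.items())   # sum of x^2 f(x)
variance = second_moment - mean**2                      # shortcut formula
# Definition-based variance, for comparison: sum of (x - mu)^2 f(x)
variance_def = sum((x - mean) ** 2 * f for x, f in pmf.items())
std_dev = sqrt(variance)

print(mean, second_moment, variance, round(std_dev, 4))  # 1.0 1.5 0.5 0.7071
```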
Interpretation of the mean of the distribution: The random
experiment is to flip a fair coin twice and count the number of heads
that occur. If we perform this experiment repeatedly, very many
times, the average of the counts obtained will get closer and closer
to 1.
The above example is a special case of the binomial distribution,
with n = 2 and p = 0.5.
If X ~ Binomial(n, p), then $\mu = E[X] = np$ and
$\sigma^2 = Var(X) = np(1 - p)$. (Check: with n = 2 and p = 0.5,
np = 1 and np(1 – p) = 0.5, matching the values found above.)
Derivation of the mean of a binomial distribution (if you want to do
this; we will not go through it in class):
By definition,
$$\mu = \sum_{x=0}^{n} x f(x) = \sum_{x=0}^{n} x \frac{n!}{x!\,(n - x)!} p^x (1 - p)^{n - x}.$$
The first term in the sum (x = 0) is 0, giving
$$\mu = \sum_{x=1}^{n} x \frac{n!}{x!\,(n - x)!} p^x (1 - p)^{n - x}.$$
We will factor out n and p from the sum, and use the fact that, for
x ≥ 1,
$$\frac{x}{x!} = \frac{1}{(x - 1)!},$$
giving
$$\mu = np \sum_{x=1}^{n} \frac{(n - 1)!}{(x - 1)!\,((n - 1) - (x - 1))!} p^{x - 1} (1 - p)^{(n - 1) - (x - 1)}.$$
Next, we change the variable to z = x – 1. We find
$$\mu = np \sum_{z=0}^{n-1} \frac{(n - 1)!}{z!\,((n - 1) - z)!} p^{z} (1 - p)^{(n - 1) - z}.$$
But the sum is just the sum of the probabilities, over all possible
values, of a random variable Z that has a Binomial(n – 1, p)
distribution. Hence, the sum is 1, and we have
$$\mu = np.$$
Derivation of the variance (using the same technique) is left as an
exercise.
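The result can be checked numerically; this is a sanity check, not a proof. The sketch below computes the mean and variance directly from the definitions for a few illustrative (n, p) pairs and compares them with np and np(1 – p). It assumes Python 3.8+ for `math.comb`. Note that this naive p.m.f. multiplies a Python integer `comb(n, x)` by a float, which can overflow for very large n (for example n = 1068), so the pairs are kept small.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def mean_and_variance(n, p):
    """Mean and variance computed directly from the p.m.f. (the definitions)."""
    mu = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
    var = sum((x - mu) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))
    return mu, var

for n, p in [(2, 0.5), (10, 0.5), (25, 0.01), (40, 0.3)]:
    mu, var = mean_and_variance(n, p)
    # Compare with the closed forms np and np(1 - p)
    assert abs(mu - n * p) < 1e-9
    assert abs(var - n * p * (1 - p)) < 1e-9
    print(n, p, round(mu, 6), round(var, 6))
```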
Example 1: Assume that the date is October 15, 2012. We want to
predict the outcome of the Presidential election. We will assume,
for simplicity, that there are only two candidates, President Barack
Obama and Governor Mitt Romney. We select a simple random
sample of n = 1068 voters from the population of all eligible,
registered, and likely voters. We ask each voter in the sample, “Do
you intend to vote for President Obama?” Let X = number of voters
in the sample who plan to vote to re-elect President Obama.
The expected number of voters in the sample who will vote to re-elect
President Obama is
$$\mu = np = (1068)(p).$$
If his level of support in the 2012 election were the same as in the
2008 election, then we would expect that
$$\mu = np = (1068)(0.53) = 566.04$$
voters in the sample would vote to re-elect the President. The
standard deviation of the distribution would be
$$\sigma = \sqrt{np(1 - p)} = \sqrt{(1068)(0.53)(0.47)} = 16.3107.$$
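A minimal Python sketch of the two calculations, assuming, as the text does, that p = 0.53:

```python
from math import sqrt

n = 1068   # sample size
p = 0.53   # assumed level of support (the 2008 figure used in the text)

mu = n * p                       # expected count, np
sigma = sqrt(n * p * (1 - p))    # standard deviation, sqrt(np(1 - p))

print(f"mu = {mu:.2f}, sigma = {sigma:.4f}")  # mu = 566.04, sigma = 16.3107
```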
Example 2: A worn machine tool produces 1% defective parts. We
select a simple random sample of 25 parts produced by this
machine, and let X = number of defective parts in the sample.
X ~ Binomial(n = 25, p = 0.01). Therefore, the expected number of
defective parts in the sample would be $\mu = 0.25$, and the standard
deviation of X would be $\sigma = \sqrt{(25)(0.01)(0.99)} = 0.4975$.
What if we selected such a sample and found that there were 2
defective parts in the sample? The value x = 2 is about 3.5176
standard deviations greater than the expected value under the
assumption that the defect rate is 1%. We would then conclude that
the actual defect rate is likely to be higher than 1% (assuming that
the sampling was done randomly).
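A Python sketch of the calculation behind this conclusion. The last quantity goes one step beyond the text: it is the exact binomial probability of observing 2 or more defectives when the defect rate really is 1%, which comes out to roughly 0.026.

```python
from math import comb, sqrt

n, p = 25, 0.01   # X ~ Binomial(25, 0.01) under the 1% defect-rate assumption
mu = n * p
sigma = sqrt(n * p * (1 - p))

x_observed = 2
z = (x_observed - mu) / sigma   # number of standard deviations above the mean

# Exact P(X >= 2) = 1 - P(X = 0) - P(X = 1) if the rate really is 1%
p_at_least_2 = 1 - sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(2))

print(f"mu = {mu:.2f}, sigma = {sigma:.4f}, z = {z:.4f}")  # mu = 0.25, sigma = 0.4975, z = 3.5176
print(f"P(X >= 2) = {p_at_least_2:.4f}")                  # P(X >= 2) = 0.0258
```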
Defn: Let X be a discrete random variable with p.m.f. f(x). We
define the kth moment about the origin to be
$$\mu_k' = \sum_{\text{all } x} x^k f(x).$$
We also define the kth central moment (or the kth moment about the
mean) as
$$\mu_k = \sum_{\text{all } x} (x - \mu)^k f(x).$$
The first moment about the origin is just the mean of the
distribution. The second central moment is the variance of the
distribution. Third moments are related to the skewness of the
distribution (see page 89).
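The two definitions translate directly into code. In this Python sketch, `raw_moment` and `central_moment` are illustrative names, and a p.m.f. is represented as a dict from values to probabilities. The Bernoulli(0.1) line is an extra example (not from the text) of a skewed distribution; its third central moment, p(1 – p)(1 – 2p) = 0.072, is positive.

```python
def raw_moment(pmf, k):
    """k-th moment about the origin: sum of x**k * f(x) over all x."""
    return sum(x**k * f for x, f in pmf.items())

def central_moment(pmf, k):
    """k-th central moment: sum of (x - mu)**k * f(x) over all x."""
    mu = raw_moment(pmf, 1)
    return sum((x - mu) ** k * f for x, f in pmf.items())

# X = number of heads in two fair coin flips
pmf = {0: 0.25, 1: 0.50, 2: 0.25}
print(raw_moment(pmf, 1))      # 1.0 (the mean)
print(central_moment(pmf, 2))  # 0.5 (the variance)
print(central_moment(pmf, 3))  # 0.0 (symmetric distribution)

# Extra skewed example: Bernoulli(0.1), with a long tail to the right
bernoulli = {0: 0.9, 1: 0.1}
print(central_moment(bernoulli, 3))  # about 0.072
```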
Chebyshev’s (aka Tchebychev’s) Theorem
Theorem 4.1: If a probability distribution has mean µ and standard
deviation σ < +∞, then for any k ≥ 1, the probability of obtaining a
value of X that deviates from µ by at least kσ is at most $1/k^2$.
Symbolically, we write
$$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}.$$
Equivalently, we can say that
$$P(|X - \mu| < k\sigma) \ge 1 - \frac{1}{k^2}.$$
Example: A worn machine tool produces 1% defective parts. We
select a simple random sample of 25 parts produced by this
machine, and let X = number of defective parts in the sample.
X ~ Binomial(n = 25, p = 0.01). Therefore, the expected number of
defective parts in the sample would be $\mu = 0.25$, and the standard
deviation of X would be $\sigma = \sqrt{(25)(0.01)(0.99)} = 0.4975$.
Let k = 2. Then we find that
$$P(|X - \mu| < k\sigma) = P(\mu - 2\sigma < X < \mu + 2\sigma) \ge 1 - \frac{1}{4} = 0.75.$$
Assuming that the 1% defect rate is true, the probability that the
measured value of X will differ from the expected count, $\mu = 0.25$,
by less than $2\sigma = 0.995$ is at least 75%.
Let k = 3. Then we find that
$$P(|X - \mu| < k\sigma) = P(\mu - 3\sigma < X < \mu + 3\sigma) \ge 1 - \frac{1}{9} = 0.8889.$$
Assuming that the 1% defect rate is true, the probability that the
measured value of X will differ from the expected count, $\mu = 0.25$,
by less than $3\sigma = 1.4925$ is at least 88.89%.
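Chebyshev's bound holds for every distribution with finite variance, so for a specific distribution it can be quite conservative. The Python sketch below compares the bound with the exact binomial probability P(|X – µ| < kσ) for this example. For both k = 2 and k = 3, the only values of X inside the interval are 0 and 1, so the exact probability is about 0.974 in both cases, well above 0.75 and 0.8889.

```python
from math import comb, sqrt

n, p = 25, 0.01
mu = n * p
sigma = sqrt(n * p * (1 - p))

def pmf(x):
    """P(X = x) for X ~ Binomial(25, 0.01)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

for k in (2, 3):
    # Exact P(|X - mu| < k*sigma): sum the p.m.f. over integers in the open interval
    exact = sum(pmf(x) for x in range(n + 1) if abs(x - mu) < k * sigma)
    bound = 1 - 1 / k**2   # Chebyshev's lower bound
    assert exact >= bound
    print(f"k={k}: Chebyshev bound {bound:.4f}, exact {exact:.4f}")
```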
Poisson Distribution
This distribution provides a model for the occurrence of rare
events over a period of time, distance, or some other dimension.
Examples:
1) X = number of cars driving through an intersection in an hour.
2) X = number of accidents occurring at an intersection in a year.
3) X = number of alpha particles emitted by a sample of U-238
over a period of time.
The common characteristics of Poisson processes are these:
We divide the interval of time (distance, etc.) into a large number of
equal subintervals.
1) The probability of occurrence of more than one count in a
small subinterval is 0;
2) The probability of occurrence of one count in a small
subinterval is the same for all equal subintervals, and is
proportional to the length of the subinterval;
3) The count in each small subinterval is independent of other
subintervals.
We let X = count of occurrences in the entire interval.
Defn: A discrete r.v. X is said to have a Poisson distribution with
mean λ if the p.m.f. of the distribution is
$$f(x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, 3, \ldots.$$
The mean and variance of the distribution are
$$\mu = E[X] = \lambda \quad \text{and} \quad V(X) = \lambda.$$
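A Python sketch of the p.m.f., with a numerical check that the probabilities sum to 1 and that the mean and variance both equal λ. The support is infinite, so the sums are truncated at x = 59; for the illustrative value λ = 2, the omitted tail is far below floating-point precision.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """f(x) = e^(-lam) lam^x / x! for x = 0, 1, 2, ..."""
    return exp(-lam) * lam**x / factorial(x)

lam = 2.0
xs = range(0, 60)   # truncated support; the tail beyond 59 is negligible for lam = 2
probs = [poisson_pmf(x, lam) for x in xs]

total = sum(probs)
mean = sum(x * f for x, f in zip(xs, probs))
variance = sum((x - mean) ** 2 * f for x, f in zip(xs, probs))
print(round(total, 10), round(mean, 10), round(variance, 10))  # 1.0 2.0 2.0
```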
Note: We may derive the Poisson distribution as a limiting case of
the binomial distribution, with the number of trials going to infinity
and the probability of success on each trial going to 0 in such a way
that the mean of the distribution remains constant. This is done in
the textbook.
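Sketch of why the limit works: with $p = \lambda/n$, the binomial p.m.f. is $\binom{n}{x}(\lambda/n)^x(1 - \lambda/n)^{n - x}$. As $n \to \infty$ with x fixed, $n(n-1)\cdots(n-x+1)/n^x \to 1$, $(1 - \lambda/n)^{-x} \to 1$, and $(1 - \lambda/n)^n \to e^{-\lambda}$, leaving $e^{-\lambda}\lambda^x/x!$. The Python sketch below illustrates this numerically for λ = 2 (an illustrative choice) by printing the largest gap between the two p.m.f.s over x = 0, …, 10 as n grows.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**x / factorial(x)

lam = 2.0
gaps = []
for n in (10, 100, 1000, 10000):
    p = lam / n   # p shrinks as n grows, keeping the mean np equal to lam
    # Largest absolute difference between the two p.m.f.s over x = 0, ..., 10
    gap = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(11))
    gaps.append(gap)
    print(n, f"{gap:.6f}")
```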
Example 1: The number of cracks in a section of interstate highway
that are significant enough to require repair is assumed to follow a
Poisson distribution with a mean of 2 cracks per mile.
First, does this situation actually satisfy the Poisson conditions?
a) What is the probability that there are no cracks that require repair
in a 5-mile section of highway? (We can find this using Table 2 in
the Appendix, page 510.)
b) What is the probability that at least one crack requires repair in a
½-mile section of highway?
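One way to work both parts, sketched in Python. It assumes, as is standard for a Poisson process, that the number of cracks in a section of length L miles is Poisson with mean 2L, so λ = 10 for part a) and λ = 1 for part b). Part b) uses the complement P(X ≥ 1) = 1 – P(X = 0).

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**x / factorial(x)

rate_per_mile = 2.0   # mean number of cracks requiring repair per mile

# a) 5-mile section: the mean scales with length (assumption), so lambda = 2 * 5 = 10
p_none_5mi = poisson_pmf(0, rate_per_mile * 5)

# b) half-mile section: lambda = 2 * 0.5 = 1; use the complement of "no cracks"
p_at_least_one_half_mi = 1 - poisson_pmf(0, rate_per_mile * 0.5)

print(f"a) {p_none_5mi:.6f}")              # a) 0.000045
print(f"b) {p_at_least_one_half_mi:.4f}")  # b) 0.6321
```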
Example 2: Contamination is a problem in the manufacture of
optical storage disks. The number of particles of contamination
that occur on an optical disk has a Poisson distribution, and the
average number of particles per square centimeter of media surface
is 0.1. The area of a disk under study is 100 square centimeters.
Find the probability that 12 particles occur in the area under study.
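A Python sketch, assuming that the particle count over the whole area under study is Poisson with mean equal to the density times the area, λ = (0.1)(100) = 10:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**x / factorial(x)

density = 0.1          # mean particles per square centimeter
area = 100             # square centimeters under study
lam = density * area   # mean count for the whole area (assumed): 10 particles

print(f"P(X = 12) = {poisson_pmf(12, lam):.4f}")  # P(X = 12) = 0.0948
```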
Poisson Processes
Any random process that satisfies the Poisson conditions is called a
Poisson process.
Example: p. 107.
Example: p. 108.
Another example occurs in the insurance industry. Assume that an
insurance company has a large number of policyholders who have a
certain type of auto-insurance policy. Under certain general
assumptions, accidents occur for policyholders according to a
Poisson process with an expected value of λ. Hence, claims for
accidents also are modeled by a Poisson process.