Slide set 10
Stat 330 (Spring 2015)
Last update: January 28, 2015
Geometric distribution
Review: X = the number of repetitions of a Bernoulli experiment until we have the first success.
1. The pmf is: p_X(k) = P(X = k) = (1 − p)^{k−1} · p for k = 1, 2, . . . (k − 1 failures, then a success)
2. Expectation E[X] = 1/p, Variance Var[X] = (1 − p)/p^2
3. The cdf is: F_X(t) = P(X ≤ t) = 1 − (1 − p)^{⌊t⌋}
Example 1: Examine the following programming statement:
Repeat S until B
Solution: Assume P (B = true) = 0.1 and let X be the number of times S
is executed. Then, X has a geometric distribution with pmf:
P(X = k) = p_X(k) = 0.9^{k−1} · 0.1
How often is S executed on average? That is, what is E[X]? Here E[X] = 1/p = 1/0.1 = 10.
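A quick numerical check of Example 1 (a minimal sketch, assuming Python with scipy.stats available; scipy's geom counts the number of trials until the first success, matching X here):

from scipy.stats import geom

p = 0.1            # P(B = true), the "success" probability for each repetition
X = geom(p)        # number of executions of S until B is true

print(X.pmf(3))    # P(X = 3) = 0.9**2 * 0.1 = 0.081
print(X.mean())    # E[X] = 1/p = 10
print(X.var())     # Var[X] = (1 - p)/p**2 = 90
print(X.cdf(5))    # P(X <= 5) = 1 - 0.9**5 ≈ 0.41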
Geometric distribution: Example 2
Example 2. Watch the input queue at the alpha farm for a job that times
out. The probability that a job times out is 0.05. Let Y be the index of the
first job to time out; then Y ∼ Geo_{0.05}. What is the probability that
• the third job times out?
P(Y = 3) = 0.95^2 · 0.05 = 0.045
• Y is less than 3?
P(Y < 3) = P(Y ≤ 2) = 1 − 0.95^2 = 0.0975
• the first job to time out is between the third and the seventh?
P(3 ≤ Y ≤ 7) = P(Y ≤ 7) − P(Y ≤ 2) = (1 − 0.95^7) − (1 − 0.95^2) = 0.204
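The three probabilities above can be verified with the same library functions (a sketch; pmf and cdf refer to scipy.stats conventions):

from scipy.stats import geom

Y = geom(0.05)             # index of the first job that times out

print(Y.pmf(3))            # P(Y = 3)             ≈ 0.045
print(Y.cdf(2))            # P(Y < 3) = P(Y <= 2) ≈ 0.0975
print(Y.cdf(7) - Y.cdf(2)) # P(3 <= Y <= 7)       ≈ 0.204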
Geometric distribution: Example 2 (cont'd)
What is the expected value of Y, and what is Var[Y]?
Plugging p = 0.05 into the above formulas gives us:
E[Y] = 1/p = 1/0.05 = 20, i.e. we expect the 20th job to be the first to time out.
Var[Y] = (1 − p)/p^2 = 0.95/0.05^2 = 380: very spread out!
Interesting property of the Geometric distribution
If X ∼ Geo_p, then P(X > i + j | X > i) = P(X > j) for i, j = 0, 1, 2, . . .
That is, X is memoryless: it "does not remember that it has already counted up to i"!
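A numerical illustration of the memoryless property (a sketch; p = 0.05 is taken from Example 2, and i = 10, j = 5 are arbitrary choices):

from scipy.stats import geom

p, i, j = 0.05, 10, 5
X = geom(p)

lhs = X.sf(i + j) / X.sf(i)   # P(X > i + j | X > i); sf(k) = P(X > k)
rhs = X.sf(j)                 # P(X > j)
print(lhs, rhs)               # both equal (1 - p)**j = 0.95**5 ≈ 0.774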
Poisson distribution
Situation: The Poisson distribution follows from a certain set of assumptions
about the occurrence of “rare” events in time or space.
Examples:
X = # of alpha particles emitted from a polonium bar in an 8 minute
period.
Y = # of flaws on a standard size piece of manufactured product (e.g.,
100m coaxial cable, 100 sq.meter plastic sheeting)
Z = # of hits on a web page in a 24h period.
Definition: The Poisson probability mass function (pmf) is defined as:
p(x) = e^{−λ} λ^x / x!   for x = 0, 1, 2, 3, . . .
λ is called the rate parameter. We denote the cdf by Po_λ(t).
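A minimal sketch of the definition in code (assuming Python; the formula is compared against scipy.stats.poisson, and λ = 3 is an arbitrary choice purely for illustration):

from math import exp, factorial
from scipy.stats import poisson

lam = 3.0   # an arbitrary rate parameter

for x in range(5):
    formula = exp(-lam) * lam**x / factorial(x)   # e^{-lambda} lambda^x / x!
    print(x, formula, poisson.pmf(x, lam))        # the two columns agree

print(poisson.cdf(4, lam))                        # the cdf Po_lambda(4)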
Poisson pmf (cont’d)
Check that p(x) defined above is actually a probability mass function. How?
1. Obviously, all values of p(x) ≥ 0 for x ≥ 0.
2. Do all probabilities sum to 1?
Σ_{x=0}^{∞} p(x) = Σ_{x=0}^{∞} e^{−λ} λ^x / x! = e^{−λ} · Σ_{x=0}^{∞} λ^x / x! = e^{−λ} e^{λ} = 1
Expected value and variance of X ∼ Po_λ are:
• E[X] = Σ_{x=0}^{∞} x · e^{−λ} λ^x / x! = 0 + e^{−λ} Σ_{x=1}^{∞} λ^x / (x−1)!
  = e^{−λ} λ Σ_{x=1}^{∞} λ^{x−1} / (x−1)! = e^{−λ} λ Σ_{y=0}^{∞} λ^y / y! (substituting y = x − 1)
  = e^{−λ} λ e^{λ} = λ
• Var[X] = . . . = λ (left as an exercise)
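These three facts (the probabilities sum to 1, E[X] = λ, Var[X] = λ) can be verified numerically by truncating the infinite sums; a sketch with an arbitrary λ = 4:

from math import exp, factorial

lam = 4.0
N = 200                                               # truncation point; remaining mass is negligible
p = [exp(-lam) * lam**x / factorial(x) for x in range(N)]

total = sum(p)                                        # ≈ 1
mean = sum(x * p[x] for x in range(N))                # ≈ lambda
var = sum((x - mean)**2 * p[x] for x in range(N))     # ≈ lambda
print(total, mean, var)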
Poisson distribution: Example 3.22 (Baron)
New Accounts: Customers of an internet service provider initiate new accounts at the average rate of 10 accounts per day.
Part (a) What is the probability that more than 8 new accounts will be
initiated today?
The number of initiations per day, X, has a Poisson distribution with parameter λ = 10.
(The above treats an account initiation as a rare event within the time period of one day, because no two customers open an account at exactly the same time.)
Then we have P(X > 8) = 1 − Po_{10}(8) = 1 − 0.333 = 0.667
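A quick check of this number (a sketch; in scipy.stats, poisson.cdf(k, mu) gives Po_mu(k) and poisson.sf(k, mu) gives P(X > k)):

from scipy.stats import poisson

print(poisson.cdf(8, 10))      # Po_10(8) ≈ 0.333
print(poisson.sf(8, 10))       # P(X > 8) = 1 - Po_10(8) ≈ 0.667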
Poisson distribution: Example 3.22 (Baron) (cont'd)
Part (b) What is the probability that more than 16 new accounts will be
initiated in two days?
The number of initiations in a two-day period, Y, has a Poisson distribution with parameter λ = 20.
(Note carefully that the average number of initiations for a two-day period is 20.)
Then we have
P(Y > 16) = 1 − Po_{20}(16) = 1 − 0.221 = 0.779
Note that X and Y are random variables with different Poisson distributions
because the events they represent occur during different time intervals.
This is a key step in solving Poisson distribution related problems.
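The two-day calculation can be checked the same way; the point to notice in code is only that the rate is rescaled to the new time interval (a sketch):

from scipy.stats import poisson

rate_per_day = 10
lam_two_days = 2 * rate_per_day       # lambda grows with the length of the interval

print(poisson.sf(16, lam_two_days))   # P(Y > 16) ≈ 0.779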
Poisson distribution: Another Example
How do we choose λ in an example? Look at the expected value!
Example: A manufacturer of chips produces 1% defectives. What is the
probability that in a box of 100 chips no defective is found?
Solution: Let X be the number of defective chips found in the box. Model
X as a Binomial variable with distribution B_{100,0.01}. Then
P(X = 0) = \binom{100}{0} · 0.99^{100} · 0.01^0 = 0.366.
Approximation: On the other hand, a defective chip can be considered to
be a rare event, since p is small (p = 0.01). So, approximate X as a Poisson variable.
We need to obtain a value for λ!
Poisson distribution: Example (cont’d)
Note that we expect 100 · 0.01 = 1 chip out of the box to be defective.
We know that the expected value of X is λ. In this example, therefore, we
take λ = 1.
Then
P(X = 0) = e^{−1} · 1^0 / 0! = 0.3679.
Ramification: For larger k, however, the binomial coefficient \binom{n}{k} becomes hard to compute, and it is easier to use the Poisson distribution instead of the Binomial distribution.
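The exact Binomial value and the Poisson approximation for this example can be put side by side (a sketch using scipy.stats):

from scipy.stats import binom, poisson

n, p = 100, 0.01
lam = n * p                  # expected number of defectives, used as the Poisson rate

print(binom.pmf(0, n, p))    # exact:       0.99**100 ≈ 0.3660
print(poisson.pmf(0, lam))   # approximate: e**(-1)   ≈ 0.3679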
Poisson to approximate Binomial
Result (not a theorem): For large n, the Binomial distribution can be
approximated by the Poisson distribution, where λ is taken as np:
\binom{n}{k} p^k (1 − p)^{n−k} ≈ e^{−np} (np)^k / k!
Rule of thumb: use Poisson approximation if n ≥ 20 and (at the same time)
p ≤ 0.05.
Theorem: If {X_n} is a sequence of random variables s.t. X_n ∼ Bin(N_n, p_n) with N_n → ∞, p_n → 0 and N_n p_n → λ ∈ (0, ∞), then X_n → X ∼ Poisson(λ) in distribution.
Such a beautiful result requires very delicate mathematics.
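The convergence can be seen numerically by letting n grow while np stays fixed (a sketch with np = 2 and the pmfs compared at k = 3, both arbitrary choices):

from scipy.stats import binom, poisson

lam, k = 2.0, 3

for n in [10, 100, 1000, 10000]:
    p = lam / n                    # keep n*p = lambda fixed as n grows
    print(n, binom.pmf(k, n, p))   # approaches the limit below
print(poisson.pmf(k, lam))         # Poisson(2) pmf at k = 3 ≈ 0.1804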
Poisson to approximate Binomial (example)
Example: (Typos) Imagine you are supposed to proofread a paper. Let us
assume that there are on average 2 typos on a page and a page has 1000
words. This gives a probability of 0.002 for each word to contain a typo.
The number of typos on a page X is then a Binomial random variable, i.e. X ∼ B_{1000,0.002}.
The probability of no typo on a page is P(X = 0), i.e.
P(X = 0) = (1 − 0.002)^{1000} = 0.998^{1000} = 0.13506
alternatively
P(X = 0) = (1 − 2/1000)^{1000} ≈ e^{−2} = 0.13534
since (1 − x/n)^n → e^{−x}.
The probability of one typo on a page is
P(X = 1) = \binom{1000}{1} · 0.002 · 0.998^{999} = 0.27067
and
P(X = 1) = 1000 · (2/1000) · (1 − 2/1000)^{999} ≈ 2 · e^{−2} = 0.27067!
So basically, we are calculating this probability using the Poisson pmf with λ = 1000 · 0.002 = 2.
That is, use P(X = x) = e^{−λ} λ^x / x! to calculate
P(X = 1) ≈ e^{−2} 2^1 / 1! = 2 · e^{−2} = 0.27067
Poisson to approximate Binomial (example cont’d)
The probability of two typos on a page is P(X = 2), i.e.
P(X = 2) = \binom{1000}{2} (1 − 0.002)^{998} · 0.002^2 = 0.27094
alternatively, using X ≈ Po_2,
P(X = 2) ≈ e^{−2} 2^2 / 2! = 0.27067
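The whole typo example in one numerical comparison (a sketch; exact Binomial probabilities next to the Poisson(2) approximation for k = 0, 1, 2):

from scipy.stats import binom, poisson

n, p = 1000, 0.002
lam = n * p                  # 2 typos per page on average

for k in range(3):
    print(k, binom.pmf(k, n, p), poisson.pmf(k, lam))
# k = 0: 0.13506 vs 0.13534
# k = 1: 0.27067 vs 0.27067
# k = 2: 0.27094 vs 0.27067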