Probability

Discrete Math, CS 2800
Prof. Bart Selman (selman@cs.cornell.edu)
Module: Probability, Part d)
1) Probability Distributions
2) Markov and Chebyshev Bounds
Discrete Random Variable
A discrete random variable:
– Takes on one of a finite (or at least countable) number of different values.
– X = 1 if heads, 0 if tails
– Y = 1 if male, 0 if female (phone survey)
– Z = number of spots on the face of a thrown die
Continuous Random Variable
A continuous random variable (r.v.):
– Takes on any value in an infinite range (an interval) of values.
– W = % GDP grows (shrinks?) this year
– V = hours until a light bulb fails
For a discrete r.v., we have Prob(X = x), i.e., the probability that r.v. X takes on a given value x.
What is the probability that a continuous r.v. takes on a specific value? E.g., Prob(X_light_bulb_fails = 3.14159265 hrs) = ?? Answer: 0.
However, ranges of values can have nonzero probability, e.g., Prob(3 hrs <= X_light_bulb_fails <= 4 hrs) = 0.1.
Probability Distribution
The probability distribution is a complete probabilistic description
of a random variable.
All other statistical concepts (expectation, variance, etc.) are derived from it.
Once we know the probability distribution of a random variable, we know everything we can learn about it from statistics.
Probability Distribution
Probability function
– One way to express the probability distribution of a discrete random variable.
– Expresses the probability that X takes the value x as a function of x (as we saw before):
$$P_X(x) = P(X = x)$$
Probability Distribution
The probability function
– May be tabular:
$$X = \begin{cases} 1 & \text{w.p. } 1/2 \\ 2 & \text{w.p. } 1/3 \\ 3 & \text{w.p. } 1/6 \end{cases}$$
Probability Distribution
The probability function
– May be graphical:
[Figure: bar chart of P_X(x) for x = 1, 2, 3, with bar heights .50, .33, .17]
Probability Distribution
The probability function
– May be formulaic:
$$P(X = x) = \frac{4 - x}{6} \quad \text{for } x = 1, 2, 3$$
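To make the three forms concrete, here is a minimal Python sketch (not from the original slides) that stores the tabular form as a dictionary of exact fractions and checks it against the formula $(4 - x)/6$. The names `tabular` and `formula` are illustrative only.

```python
from fractions import Fraction

# Tabular form: value -> probability (exact fractions avoid rounding error).
tabular = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}

# Formulaic form: P(X = x) = (4 - x) / 6 for x = 1, 2, 3.
def formula(x: int) -> Fraction:
    return Fraction(4 - x, 6)

# Both forms describe the same distribution.
for x, p in tabular.items():
    assert formula(x) == p, f"mismatch at x={x}"

# A valid probability function is nonnegative and sums to 1.
assert all(p >= 0 for p in tabular.values())
print(sum(tabular.values()))  # 1
```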
1
2

3
X 
4
5

6
Probability Distribution: Fair die
w. p. 1 / 6
w. p. 1 / 6
w. p. 1 / 6
w. p. 1 / 6
w. p. 1 / 6
w. p. 1 / 6
.50
.33
.17
1
2
3
4
5
6
9
Probability Distribution
The probability function, properties
$$P_X(x) \ge 0 \text{ for each } x$$
$$\sum_x P_X(x) = 1$$
Cumulative Probability Distribution
Cumulative probability distribution
– The cdf is a function that gives the probability that a random variable does not exceed a given value:
$$F_X(x) = P(X \le x)$$
Does this make sense for a continuous r.v.?
Yes!
Cumulative Probability Distribution
Cumulative probability distribution
– The relationship between the cdf and the probability function:
$$F_X(x) = P(X \le x) = \sum_{y \le x} P_X(y)$$
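Cumulative Probability Distribution
Die-throwing: $P_X(x) = P(X = x) = 1/6$ for $x = 1, \ldots, 6$, and
$$F_X(x) = P(X \le x) = \sum_{y \le x} P_X(y)$$
– Tabular:
$$F_X(x) = \begin{cases} 0 & x < 1 \\ 1/6 & 1 \le x < 2 \\ 2/6 & 2 \le x < 3 \\ 3/6 & 3 \le x < 4 \\ 4/6 & 4 \le x < 5 \\ 5/6 & 5 \le x < 6 \\ 6/6 & x \ge 6 \end{cases}$$
– Graphical:
[Figure: staircase plot of F_X(x), jumping by 1/6 at each of x = 1, 2, 3, 4, 5, 6 and reaching 1 at x = 6]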
Cumulative Probability Distribution
The cumulative distribution function
– May be formulaic (die-throwing):
$$P(X \le x) = \frac{\lfloor \min(\max(x, 0), 6) \rfloor}{6}$$
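A small Python sketch, assuming the fair-die probability function above, that checks the floor formula against the defining sum $F_X(x) = \sum_{y \le x} P_X(y)$ at integer, non-integer, and out-of-range points. The helper names `cdf_by_sum` and `cdf_by_formula` are hypothetical.

```python
import math
from fractions import Fraction

# Fair die: P(X = x) = 1/6 for x = 1..6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def cdf_by_sum(x: float) -> Fraction:
    """F_X(x) = sum of P_X(y) over all y <= x."""
    return sum((p for y, p in pmf.items() if y <= x), Fraction(0))

def cdf_by_formula(x: float) -> Fraction:
    """F_X(x) = floor(min(max(x, 0), 6)) / 6."""
    return Fraction(math.floor(min(max(x, 0), 6)), 6)

# The two forms agree everywhere we probe, including between integers.
for x in [-2, 0, 0.5, 1, 2.7, 3, 5.99, 6, 10]:
    assert cdf_by_sum(x) == cdf_by_formula(x)
print(cdf_by_formula(3.5))  # 1/2
```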
Cumulative Probability Distribution
The cdf, properties
$$0 \le F_X(x) \le 1 \text{ for each } x$$
F_X(x) is non-decreasing.
F_X(x) is continuous from the right.
Example CDFs
[Figure: three example CDFs: (1) of a discrete probability distribution; (2) of a continuous probability distribution; (3) of a distribution that has both a continuous part and a discrete part]
Functions of a random variable
It is possible to calculate expectations and variances of functions of random variables:
$$E[g(X)] = \sum_x g(x)\, P(X = x)$$
$$V[g(X)] = \sum_x \big(g(x) - E[g(X)]\big)^2\, P(X = x)$$
Functions of a random variable
Example
– You are paid a number of dollars equal to the square root of the number of spots on a die.
– What is a fair bet to get into this game?

x     √x      P(X=x)   Product
1     1.000   1/6      0.167
2     1.414   1/6      0.236
3     1.732   1/6      0.289
4     2.000   1/6      0.333
5     2.236   1/6      0.373
6     2.449   1/6      0.408
Total (exact sum)      1.805

A fair bet is the expected payoff, E[√X] ≈ $1.81.
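The table can be reproduced with a short Python sketch, which also applies the variance formula for $g(X)$ from the previous slide. The function `g` is simply the payoff $\sqrt{x}$ from this example.

```python
import math

# Fair die; payoff g(x) = sqrt(x) dollars.
values = range(1, 7)
p = 1 / 6

def g(x: int) -> float:
    return math.sqrt(x)

# E[g(X)] = sum_x g(x) P(X = x)
mean = sum(g(x) * p for x in values)
# V[g(X)] = sum_x (g(x) - E[g(X)])^2 P(X = x)
var = sum((g(x) - mean) ** 2 * p for x in values)

print(round(mean, 3))  # 1.805
```

Since $E[(\sqrt{X})^2] = E[X] = 3.5$, the variance also equals $3.5 - E[\sqrt{X}]^2 \approx 0.24$.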
Functions of a random variable
Linear functions
– If a and b are constants and X is a random variable,
– it can be shown that:
$$E[aX + b] = aE[X] + b$$
$$V[aX + b] = a^2 V[X]$$
Intuitively, why does b not appear in the variance? And why $a^2$?
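As an exact sanity check (a sketch on one example, not a proof), the Python snippet below verifies both identities on a fair die using fractions. The constants `a = 3`, `b = 10` and the helpers `expect` and `variance` are arbitrary choices for illustration.

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die

def expect(f, pmf):
    """E[f(X)] for a discrete pmf given as {value: probability}."""
    return sum(f(x) * p for x, p in pmf.items())

def variance(f, pmf):
    """V[f(X)] = E[(f(X) - E[f(X)])^2]."""
    m = expect(f, pmf)
    return expect(lambda x: (f(x) - m) ** 2, pmf)

a, b = 3, 10  # arbitrary constants chosen for illustration
ex, vx = expect(lambda x: x, pmf), variance(lambda x: x, pmf)

# E[aX + b] = aE[X] + b and V[aX + b] = a^2 V[X], exactly.
assert expect(lambda x: a * x + b, pmf) == a * ex + b
assert variance(lambda x: a * x + b, pmf) == a ** 2 * vx
print(ex, vx)  # 7/2 35/12
```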
The Most Common
Discrete Probability Distributions
(some discussed before)
1) Bernoulli distribution
2) Binomial
3) Geometric
4) Poisson
Bernoulli distribution
The Bernoulli distribution is the “coin flip” distribution.
X is Bernoulli if its probability function is:
$$X = \begin{cases} 1 & \text{w.p. } p \\ 0 & \text{w.p. } 1 - p \end{cases}$$
X = 1 is usually interpreted as a “success.” E.g.:
– X = 1 for heads in a coin toss
– X = 1 for male in a survey
– X = 1 for defective in a test of a product
– X = 1 for “made the sale” when tracking performance
Bernoulli distribution
Expectation:
$$E[X] = p \cdot 1 + (1 - p) \cdot 0 = p$$
Variance:
$$V[X] = E[X^2] - E[X]^2 = \big(p \cdot 1^2 + (1 - p) \cdot 0^2\big) - p^2 = p - p^2 = p(1 - p)$$
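A minimal Python sketch computing the Bernoulli mean and variance directly from the probability function with exact fractions; `p = 3/10` is an arbitrary example value, and `bernoulli_moments` is a hypothetical helper name.

```python
from fractions import Fraction

def bernoulli_moments(p: Fraction):
    """Return (E[X], V[X]) for X = 1 w.p. p, 0 w.p. 1 - p."""
    pmf = {1: p, 0: 1 - p}
    mean = sum(x * q for x, q in pmf.items())
    second = sum(x ** 2 * q for x, q in pmf.items())  # E[X^2]
    return mean, second - mean ** 2                   # V[X] = E[X^2] - E[X]^2

mean, var = bernoulli_moments(Fraction(3, 10))
print(mean, var)  # 3/10 21/100
```

The result matches $p$ and $p(1 - p) = 0.3 \cdot 0.7$.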
Binomial distribution
The binomial distribution is just n independent Bernoullis
added up.
It is the number of “successes” in n trials.
If $Z_1, Z_2, \ldots, Z_n$ are independent Bernoulli r.v.s with the same p, then X is binomial:
$$X = Z_1 + Z_2 + \cdots + Z_n$$
Binomial distribution
The binomial distribution is just n independent Bernoullis added up. Example: testing for defects “with replacement”:
– Have many light bulbs
– Pick one at random, test for defect, put it back
– Pick one at random, test for defect, put it back
– If there are many light bulbs, we do not have to put them back (removing a few barely changes the chance of picking a defective one)
Binomial distribution
Let’s figure out a binomial r.v.’s probability function.
Suppose we are looking at a binomial with n=3.
We want P(X=0):
– Can happen one way: 000
– $(1-p)(1-p)(1-p) = (1-p)^3$
We want P(X=1):
– Can happen three ways: 100, 010, 001
– $p(1-p)(1-p) + (1-p)p(1-p) + (1-p)(1-p)p = 3p(1-p)^2$
We want P(X=2):
– Can happen three ways: 110, 011, 101
– $pp(1-p) + (1-p)pp + p(1-p)p = 3p^2(1-p)$
We want P(X=3):
– Can happen one way: 111
– $ppp = p^3$
Binomial distribution
So, the binomial r.v.'s probability function (n = 3) is:
$$X = \begin{cases} 0 & \text{w.p. } (1-p)^3 \\ 1 & \text{w.p. } 3p(1-p)^2 \\ 2 & \text{w.p. } 3p^2(1-p) \\ 3 & \text{w.p. } p^3 \end{cases}$$
In general:
$$P_X(x) = (\text{\# of ways}) \cdot p^x (1-p)^{n-x} = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}$$
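The general formula can be written with Python's `math.comb` (available since Python 3.8), which computes $n!/(x!\,(n-x)!)$. This sketch checks it against the n = 3 table for an arbitrary p and confirms that the probabilities sum to 1; `binomial_pmf` is a hypothetical helper name.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) = C(n, x) p^x (1 - p)^(n - x), with C(n, x) = n! / (x! (n - x)!)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Check the n = 3 table for an arbitrary p.
p = 0.3
expected = [(1 - p) ** 3, 3 * p * (1 - p) ** 2, 3 * p ** 2 * (1 - p), p ** 3]
for x in range(4):
    assert abs(binomial_pmf(x, 3, p) - expected[x]) < 1e-12

# The probabilities over x = 0..n add up to 1.
assert abs(sum(binomial_pmf(x, 10, 0.4) for x in range(11)) - 1) < 1e-12
```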
Binomial distribution
Typical shape of binomial:
[Figure: typical binomial probability function]
– Symmetric when p = 1/2 (skewed otherwise)
Expectation:
$$E[X] = E\Big[\sum_{i=1}^{n} Z_i\Big] = \sum_{i=1}^{n} E[Z_i] = \sum_{i=1}^{n} p = np$$
Variance (the $Z_i$ are independent, so their variances add):
$$V[X] = V\Big[\sum_{i=1}^{n} Z_i\Big] = \sum_{i=1}^{n} V[Z_i] = \sum_{i=1}^{n} p(1-p) = np(1-p)$$
Aside:
$$V(X + Y) = V(X) + V(Y) = 2V(X)$$
if V(X) = V(Y). And?
But
$$V(X + X) = V(2X) = 4V(X)$$
Hmm…
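A quick numerical check (a sketch, with n = 10 and p = 0.4 picked only for illustration) that computing the moments directly from the probability function agrees with np and np(1 − p). The helpers `binomial_pmf` and `binomial_moments` are hypothetical names.

```python
from math import comb

def binomial_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_moments(n, p):
    """Compute E[X] and V[X] directly from the probability function."""
    xs = range(n + 1)
    mean = sum(x * binomial_pmf(x, n, p) for x in xs)
    var = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in xs)
    return mean, var

n, p = 10, 0.4
mean, var = binomial_moments(n, p)
assert abs(mean - n * p) < 1e-9            # E[X] = np = 4
assert abs(var - n * p * (1 - p)) < 1e-9   # V[X] = np(1 - p) = 2.4
```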
Binomial distribution
A salesman claims that he closes a deal 40% of the time.
This month, he closed 1 out of 10 deals.
How likely is it that he did 1/10 or worse given his claim?
Binomial distribution
$$P_X(0) = \frac{10!}{0!\,10!}\, 0.4^{0}\, 0.6^{10} \approx 1 \cdot 1 \cdot 0.006 = 0.006$$
$$P_X(1) = \frac{10!}{1!\,9!}\, 0.4^{1}\, 0.6^{9} \approx 10 \cdot 0.4 \cdot 0.010 = 0.040$$
Note: $P(X \le 1) \approx 0.046$. Less than 5%, or 1 in 20. So, it's unlikely that his success rate is 0.4.
For comparison:
$$P_X(4) = \frac{10!}{4!\,6!}\, 0.4^{4}\, 0.6^{6} = \frac{10 \cdot 9 \cdot 8 \cdot 7}{4 \cdot 3 \cdot 2}\, (0.0256)(0.0467) \approx 0.251$$
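The salesman calculation can be reproduced in a few lines of Python. The values agree with the slide up to rounding (the slide rounds $0.6^{10}$ and $0.6^9$ before multiplying); `binomial_pmf` is a hypothetical helper name.

```python
from math import comb

def binomial_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.4  # 10 deals, claimed 40% close rate
p0 = binomial_pmf(0, n, p)
p1 = binomial_pmf(1, n, p)
tail = p0 + p1  # P(X <= 1): "1 out of 10 or worse"

print(f"{p0:.4f} {p1:.4f} {tail:.4f}")  # 0.0060 0.0403 0.0464
print(f"{binomial_pmf(4, n, p):.3f}")    # 0.251
```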
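Binomial and normal / Gaussian distribution
The normal distribution is a good approximation to the binomial distribution B(n, p) when n is “large” and the skew is small.
Prob. density function of the normal distribution with mean μ and variance σ²:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2 / (2\sigma^2)}$$
To approximate B(n, p), use $\mu = np$ and $\sigma^2 = np(1 - p)$.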
Geometric Distribution
A geometric distribution is usually interpreted as the number of time periods until a failure occurs.
Imagine a sequence of coin flips, and the random variable X is the flip number on which the first tails occurs.
The probability of a head (a success) is p.
Geometric
Let’s find the probability function for the geometric
distribution:
$$P_X(1) = 1 - p$$
$$P_X(2) = p(1 - p)$$
$$P_X(3) = p \cdot p(1 - p) = p^2(1 - p)$$
etc.
So, $P_X(x) = \, ?$
$$P_X(x) = p^{x-1}(1 - p) \qquad (x \text{ is a positive integer})$$
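A short Python sketch of the geometric probability function; p = 0.7 is an arbitrary illustrative value and `geometric_pmf` a hypothetical name. Since X has no upper limit, the check sums a long but finite prefix, which already lies within floating-point error of 1.

```python
def geometric_pmf(x: int, p: float) -> float:
    """P(X = x) = p^(x-1) (1 - p): x - 1 heads, then the first tails on flip x."""
    if x < 1:
        return 0.0
    return p ** (x - 1) * (1 - p)

p = 0.7  # probability of heads (a "success"), chosen for illustration
print([round(geometric_pmf(x, p), 3) for x in (1, 2, 3)])  # [0.3, 0.21, 0.147]

# The tail shrinks geometrically, so a long truncated sum is very close to 1.
partial = sum(geometric_pmf(x, p) for x in range(1, 200))
assert abs(partial - 1) < 1e-12
```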
Geometric
Notice, there is no upper limit on how large X can be.
Let's check that these probabilities add to 1:
$$\sum_{x=1}^{\infty} P_X(x) = \sum_{x=1}^{\infty} p^{x-1}(1 - p) = (1 - p)\sum_{x=1}^{\infty} p^{x-1} = (1 - p)\sum_{x=0}^{\infty} p^{x} = (1 - p)\cdot\frac{1}{1 - p} = 1$$
The last step uses the geometric series (for $0 \le p < 1$).
Geometric
Geometric series:
$$\sum_{x=0}^{\infty} p^{x} = \frac{1}{1 - p} \qquad (|p| < 1)$$
Differentiate both sides w.r.t. p:
$$\sum_{x=1}^{\infty} x\, p^{x-1} = \frac{1}{(1 - p)^2}$$
See Rosen page 158, example 17.
Expectation:
$$E[X] = \sum_{x=1}^{\infty} x P_X(x) = \sum_{x=1}^{\infty} x\, p^{x-1}(1 - p) = (1 - p)\sum_{x=1}^{\infty} x\, p^{x-1} = (1 - p)\cdot\frac{1}{(1 - p)^2} = \frac{1}{1 - p}$$
Variance:
$$V[X] = \frac{p}{(1 - p)^2}$$
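The expectation and variance formulas can be checked numerically by truncating the infinite sums. This Python sketch assumes the same coin-flip setup, with p = 0.7 chosen for illustration and `terms = 2000` as an arbitrary cutoff; `geometric_moments` is a hypothetical name.

```python
def geometric_moments(p: float, terms: int = 2000):
    """Approximate E[X] and V[X] by truncating the infinite sums."""
    xs = range(1, terms + 1)
    pmf = [p ** (x - 1) * (1 - p) for x in xs]
    mean = sum(x * q for x, q in zip(xs, pmf))
    second = sum(x * x * q for x, q in zip(xs, pmf))  # E[X^2]
    return mean, second - mean ** 2

p = 0.7
mean, var = geometric_moments(p)
assert abs(mean - 1 / (1 - p)) < 1e-9         # E[X] = 1 / (1 - p)
assert abs(var - p / (1 - p) ** 2) < 1e-9     # V[X] = p / (1 - p)^2
```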
Poisson distribution
The Poisson distribution is typical of random variables which
represent counts.
– Number of requests to a server in 1 hour.
– Number of sick days in a year for an employee.
The Poisson distribution is derived from the following underlying arrival time model:
– The probability of a unit arriving is uniform through time.
– Two items never arrive at exactly the same time.
– Arrivals are independent: the arrival of one unit does not make the next unit more or less likely to arrive quickly.
Poisson distribution
The probability function for the Poisson distribution with parameter λ is:
$$P_X(x) = P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} \qquad \text{for } x = 0, 1, 2, 3, \ldots$$
λ is like the arrival rate: higher means more/faster arrivals.
$$E[X] = V[X] = \lambda$$
Poisson distribution
Shape:
[Figure: Poisson probability functions for low, medium, and high λ]
Markov and Chebyshev bounds
Often, you don’t know the exact probability distribution of a random variable.
We would still like to say something about the probabilities involving that random variable…
E.g., what is the probability of X being larger (or smaller) than some given value?
We often can, by bounding the probability of events based on partial information about the underlying probability distribution: Markov and Chebyshev bounds.
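Theorem (Markov Inequality)
Let X be a nonnegative random variable with E[X] = μ. Then, for any t > 0,
$$P(X \ge t) \le \frac{\mu}{t}$$
Note: this relates the cumulative distribution to the expected value.
Hmm. What if $t \le \mu$? Then $\mu/t \ge 1$, and the bound only says $P(X \ge t) \le 1$.
But $t = 2\mu$ gives $P(X \ge 2\mu) \le 1/2$. Sure!
“Can’t have too much prob. to the right of E[X]”
Proof: equivalently, we show $t \cdot P(X \ge t) \le E[X]$ (for discrete X):
$$\begin{aligned}
t \cdot P(X \ge t) &= t \sum_{x:\,x \ge t} P(X = x) \\
&\le \sum_{x:\,x \ge t} x \cdot P(X = x) \\
&\le \sum_{x} x \cdot P(X = x) \\
&= E[X]
\end{aligned}$$
Where did we use X ≥ 0? 3rd line: the terms added there (those with x < t) are all nonnegative.
I.e., $P(X \ge t) \le E[X]/t$.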
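To see how loose the Markov bound can be, this Python sketch (exact fractions, with a fair die as the nonnegative r.v.) compares the true tail $P(X \ge t)$ with $\mu/t$ for several values of t. The helper `tail` is illustrative.

```python
from fractions import Fraction

# Fair die: a nonnegative r.v. with E[X] = 7/2.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
mu = sum(x * p for x, p in pmf.items())

def tail(t):
    """P(X >= t)."""
    return sum((p for x, p in pmf.items() if x >= t), Fraction(0))

# Markov: P(X >= t) <= mu / t for every t > 0.
for t in [Fraction(1, 2), 1, 2, Fraction(7, 2), 5, 6, 7, 12]:
    assert tail(t) <= mu / t

print(tail(6), mu / 6)  # 1/6 7/12
```

At t = 6 the true tail is 1/6, while Markov only guarantees at most 7/12.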
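Alt. proof of Markov Inequality
Define a discrete random variable Y:
$$Y = \begin{cases} 0 & X < t \\ t & X \ge t \end{cases} \qquad p_Y(y) = \begin{cases} P(X < t) & y = 0 \\ P(X \ge t) & y = t \end{cases}$$
Since X ≥ 0, we have Y ≤ X, so E[Y] ≤ E[X]:
$$E[Y] = 0 \cdot P(X < t) + t \cdot P(X \ge t) \le E[X] = \mu$$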
Example:
Consider a system with mean time to failure = 100 hours.
Use the Markov inequality to bound the reliability of the system, R(t), for t = 90, 100, 110, 200.
X: time to failure of the system; E[X] = 100.
By Markov, R(t) = P[X > t] ≤ P(X ≥ t) ≤ 100/t, with t = 90, 100, 110, 200:
$$P(X \ge 90) \le 100/90 \approx 1.11 \quad (\text{no information})$$
$$P(X \ge 100) \le 100/100 = 1$$
$$P(X \ge 110) \le 100/110 \approx 0.91$$
$$P(X \ge 200) \le 100/200 = 0.5$$
The Markov inequality is somewhat crude, since only the mean is assumed to be known.
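The reliability bounds can be tabulated with a tiny Python helper. `markov_upper_bound` is a hypothetical name; it caps the bound at 1 because the raw value μ/t carries no information once it exceeds 1.

```python
def markov_upper_bound(mean: float, t: float) -> float:
    """Markov bound on P(X >= t) for a nonnegative X, capped at 1
    (a probability can never exceed 1)."""
    return min(1.0, mean / t)

mttf = 100.0  # mean time to failure, in hours
for t in (90, 100, 110, 200):
    print(t, round(markov_upper_bound(mttf, t), 2))
# 90 1.0
# 100 1.0
# 110 0.91
# 200 0.5
```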
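Theorem (Chebyshev's Inequality)
Assume that the mean μ and the variance σ² are given.
The Chebyshev inequality gives a better estimate of the probability of events of interest: for any t > 0,
$$P(|X - \mu| \ge t) \le \frac{\sigma^2}{t^2}$$
Proof: Apply the Markov inequality to the nonnegative r.v. $(X - \mu)^2$ and the number $t^2$, and note that $(X - \mu)^2 \ge t^2$ exactly when $|X - \mu| \ge t$.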
Theorem (Chebyshev's Inequality)
Alternate form, with $E[X] = \mu_X$ and $V[X] = \sigma_X^2$:
$$P(|X - \mu_X| \ge t) \le \frac{\sigma_X^2}{t^2}$$
Theorem (Chebyshev's Inequality)
$$P(|X - \mu_X| \ge t) \le \frac{\sigma_X^2}{t^2}$$
Because
$$P(|X - \mu_X| \ge t) = P\big((X - \mu_X)^2 \ge t^2\big) \le \frac{E\big[(X - \mu_X)^2\big]}{t^2} = \frac{\sigma_X^2}{t^2}$$
Chebyshev inequality: Alternate forms
Yet two other forms of Chebyshev’s inequality.
They say something about the probability of being “k standard deviations from the mean”.
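Theorem (Chebyshev's Inequality)
$$P(|X - \mu_X| \ge k\sigma_X) \le \frac{1}{k^2}$$
$$P(|X - \mu_X| < k\sigma_X) \ge 1 - \frac{1}{k^2}$$
[Figure: the intervals μ_X ± 2σ_X, μ_X ± 3σ_X, μ_X ± 4σ_X around the mean contain at least 0.75, 0.889, and 0.9375 of the probability, respectively]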
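As a sketch (not from the slides), the Python code below compares the Chebyshev guarantee $1 - 1/k^2$ with the exact probability of falling within k standard deviations of the mean for a binomial r.v.; n = 100 and p = 0.4 are illustrative choices, and `binomial_pmf` is a hypothetical helper.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 100, 0.4
mu, sigma = n * p, sqrt(n * p * (1 - p))

for k in (2, 3, 4):
    # Exact P(|X - mu| < k sigma) versus the Chebyshev guarantee 1 - 1/k^2.
    inside = sum(binomial_pmf(x, n, p) for x in range(n + 1)
                 if abs(x - mu) < k * sigma)
    bound = 1 - 1 / k ** 2
    assert inside >= bound
    print(k, round(bound, 4), round(inside, 4))
```

The exact probabilities sit well above the guarantee, as expected: Chebyshev uses only the mean and the variance.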
Theorem (Chebyshev's Inequality), X ~ N(μ, σ²)
Facts:
[Figure: normal density with the intervals μ_X ± 2σ_X, μ_X ± 3σ_X, μ_X ± 4σ_X, which by Chebyshev contain at least 0.75, 0.889, and 0.9375 of the probability]
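For a normal r.v., the exact probability of falling within k standard deviations of the mean is $\operatorname{erf}(k/\sqrt{2})$, which Python's `math.erf` computes. This sketch prints it next to the Chebyshev guarantee to show how much slack the general bound leaves for this particular distribution.

```python
import math

# For X ~ N(mu, sigma^2): P(|X - mu| < k sigma) = erf(k / sqrt(2)).
for k in (1, 2, 3, 4):
    exact = math.erf(k / math.sqrt(2))
    chebyshev = max(0.0, 1 - 1 / k ** 2)  # the bound says nothing for k = 1
    assert exact >= chebyshev
    print(k, round(chebyshev, 4), round(exact, 4))
# 1 0.0 0.6827
# 2 0.75 0.9545
# 3 0.8889 0.9973
# 4 0.9375 0.9999
```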
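Example
$\mu_X = 60$, $\sigma_X = 6$.
$X \ge 84$ means $X - \mu_X \ge 24 = 4\sigma_X$, so by Chebyshev with k = 4:
$$P(|X - \mu_X| \ge 4\sigma_X) \le \frac{1}{k^2} = \frac{1}{16}$$
$$1000 \cdot \frac{1}{16} = 62.5 \le 70$$
Aside, “just” Markov, $P(X \ge t) \le \mu/t$:
$$P(X \ge 84) \le \frac{70}{84} \approx 0.8$$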