# Discrete Random Variables and Probability Distributions

```
Random Variables
• Random Variable (RV): A numeric outcome that results
from an experiment
• For each element of an experiment’s sample space, the
random variable can take on exactly one value
• Discrete Random Variable: An RV that can take on only
a finite or countably infinite set of outcomes
• Continuous Random Variable: An RV that can take on
any value along a continuum (but may be reported
“discretely”)
• Random Variables are denoted by upper case letters (Y)
• Individual outcomes for an RV are denoted by lower
case letters (y)
Probability Distributions
• Probability Distribution: Table, Graph, or Formula that
describes the values a random variable can take on, and their
corresponding probabilities (discrete RV) or densities
(continuous RV)
• Discrete Probability Distribution: Assigns probabilities
(masses) to the individual outcomes
• Continuous Probability Distribution: Assigns density at
individual points, probability of ranges can be obtained by
integrating density function
• Discrete Probabilities denoted by: p(y) = P(Y=y)
• Continuous Densities denoted by: f(y)
• Cumulative Distribution Function: F(y) = P(Y≤y)
Discrete Probability Distributions
Probability (Mass) Function:
  $p(y) = P(Y = y)$
  $p(y) \ge 0 \quad \forall y$
  $\sum_{\text{all } y} p(y) = 1$
Cumulative Distribution Function (CDF):
  $F(y) = P(Y \le y)$
  $F(b) = P(Y \le b) = \sum_{y=-\infty}^{b} p(y)$
  $F(-\infty) = 0, \quad F(\infty) = 1$
  $F(y)$ is monotonically non-decreasing in $y$
Example – Rolling 2 Dice (Red/Green)
Y = Sum of the up faces of the two dice. Table gives the value of y for all elements in S
Red\Green    1    2    3    4    5    6
    1        2    3    4    5    6    7
    2        3    4    5    6    7    8
    3        4    5    6    7    8    9
    4        5    6    7    8    9   10
    5        6    7    8    9   10   11
    6        7    8    9   10   11   12
Rolling 2 Dice – Probability Mass Function & CDF
  y     p(y)     F(y)
  2     1/36     1/36
  3     2/36     3/36
  4     3/36     6/36
  5     4/36    10/36
  6     5/36    15/36
  7     6/36    21/36
  8     5/36    26/36
  9     4/36    30/36
 10     3/36    33/36
 11     2/36    35/36
 12     1/36    36/36
  $p(y) = \frac{\text{\# of ways 2 dice can sum to } y}{\text{\# of ways 2 dice can result in}}$
  $F(y) = \sum_{t=2}^{y} p(t)$
Rolling 2 Dice – Probability Mass Function
[Figure: "Dice Rolling Probability Function"; p(y) on the vertical axis (0 to 0.18) against y = 2, ..., 12]
Rolling 2 Dice – Cumulative Distribution Function
[Figure: "Dice Rolling - CDF"; F(y) on the vertical axis (0 to 1) against y = 1, ..., 13]
Expected Values of Discrete RV’s
• Mean (aka Expected Value) – Long-run average
value an RV (or function of an RV) will take on
• Variance – Average squared deviation between a
realization of an RV (or function of an RV) and its mean
• Standard Deviation – Positive square root of the
Variance (in the same units as the data)
• Notation:
– Mean: E(Y) = μ
– Variance: V(Y) = σ²
– Standard Deviation: σ
Expected Values of Discrete RV’s
Mean: $E(Y) = \mu = \sum_{\text{all } y} y\,p(y)$
Mean of a function $g(Y)$: $E[g(Y)] = \sum_{\text{all } y} g(y)\,p(y)$
Variance: $V(Y) = \sigma^2 = E[(Y - E(Y))^2] = E[(Y - \mu)^2]$
  $= \sum_{\text{all } y} (y - \mu)^2 p(y) = \sum_{\text{all } y} (y^2 - 2y\mu + \mu^2)\,p(y)$
  $= \sum_{\text{all } y} y^2 p(y) - 2\mu \sum_{\text{all } y} y\,p(y) + \mu^2 \sum_{\text{all } y} p(y)$
  $= E[Y^2] - 2\mu(\mu) + \mu^2(1) = E[Y^2] - \mu^2$
Standard Deviation: $\sigma = +\sqrt{\sigma^2}$
Expected Values of Linear Functions of Discrete RV’s
Linear Functions: $g(Y) = aY + b$ ($a, b \equiv$ constants)
  $E[aY + b] = \sum_{\text{all } y} (ay + b)\,p(y) = a \sum_{\text{all } y} y\,p(y) + b \sum_{\text{all } y} p(y) = a\mu + b$
  $V[aY + b] = \sum_{\text{all } y} [(ay + b) - (a\mu + b)]^2 p(y) = \sum_{\text{all } y} (ay - a\mu)^2 p(y)$
  $= \sum_{\text{all } y} a^2 (y - \mu)^2 p(y) = a^2 \sum_{\text{all } y} (y - \mu)^2 p(y) = a^2 \sigma^2$
  $\sigma_{aY+b} = |a|\,\sigma$
Example – Rolling 2 Dice
  y     p(y)             y p(y)            y² p(y)
  2     1/36              2/36               4/36
  3     2/36              6/36              18/36
  4     3/36             12/36              48/36
  5     4/36             20/36             100/36
  6     5/36             30/36             180/36
  7     6/36             42/36             294/36
  8     5/36             40/36             320/36
  9     4/36             36/36             324/36
 10     3/36             30/36             300/36
 11     2/36             22/36             242/36
 12     1/36             12/36             144/36
 Sum    36/36 = 1.00    252/36 = 7.00     1974/36 = 54.833
  $\mu = E(Y) = \sum_{y=2}^{12} y\,p(y) = 7.0$
  $\sigma^2 = E[Y^2] - \mu^2 = \sum_{y=2}^{12} y^2 p(y) - \mu^2 = 54.8333 - (7.0)^2 = 5.8333$
  $\sigma = \sqrt{5.8333} = 2.4152$
Tchebysheff’s Theorem/Empirical Rule
• Tchebysheff: Suppose Y is any random variable
with mean μ and standard deviation σ. Then:
P(μ-kσ ≤ Y ≤ μ+kσ) ≥ 1-(1/k²) for k ≥ 1
– k=1: P(μ-1σ ≤ Y ≤ μ+1σ) ≥ 1-(1/1²) = 0 (trivial result)
– k=2: P(μ-2σ ≤ Y ≤ μ+2σ) ≥ 1-(1/2²) = 3/4
– k=3: P(μ-3σ ≤ Y ≤ μ+3σ) ≥ 1-(1/3²) = 8/9
• Note that this is a very conservative bound, but
that it works for any distribution
• Empirical Rule (Mound-Shaped Distributions)
– k=1: P(μ-1σ ≤ Y ≤ μ+1σ) ≈ 0.68
– k=2: P(μ-2σ ≤ Y ≤ μ+2σ) ≈ 0.95
– k=3: P(μ-3σ ≤ Y ≤ μ+3σ) ≈ 1
Proof of Tchebysheff’s Theorem
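Breaking the real line into 3 parts:
  i) $(-\infty, (\mu - k\sigma)^-]$   ii) $[(\mu - k\sigma), (\mu + k\sigma)]$   iii) $[(\mu + k\sigma)^+, \infty)$
Making use of the definition of Variance:
  $V(Y) = \sigma^2 = \sum_{y=-\infty}^{\infty} (y - \mu)^2 p(y)$
  $= \sum_{-\infty}^{(\mu - k\sigma)^-} (y - \mu)^2 p(y) + \sum_{(\mu - k\sigma)}^{(\mu + k\sigma)} (y - \mu)^2 p(y) + \sum_{(\mu + k\sigma)^+}^{\infty} (y - \mu)^2 p(y)$
In Region i): $y - \mu < -k\sigma \Rightarrow (y - \mu)^2 > k^2\sigma^2$
In Region iii): $y - \mu > k\sigma \Rightarrow (y - \mu)^2 > k^2\sigma^2$
  $\Rightarrow \sigma^2 \ge k^2\sigma^2 P(Y < \mu - k\sigma) + \sum_{(\mu - k\sigma)}^{(\mu + k\sigma)} (y - \mu)^2 p(y) + k^2\sigma^2 P(Y > \mu + k\sigma)$
  $\Rightarrow \sigma^2 \ge k^2\sigma^2 P(Y < \mu - k\sigma) + k^2\sigma^2 P(Y > \mu + k\sigma) = k^2\sigma^2 [1 - P(\mu - k\sigma \le Y \le \mu + k\sigma)]$
  $\Rightarrow \frac{\sigma^2}{k^2\sigma^2} = \frac{1}{k^2} \ge 1 - P(\mu - k\sigma \le Y \le \mu + k\sigma) \Rightarrow P(\mu - k\sigma \le Y \le \mu + k\sigma) \ge 1 - \frac{1}{k^2}$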
Moment Generating Functions (I)
Consider the series expansion of $e^x$:
  $e^x = \sum_{i=0}^{\infty} \frac{x^i}{i!} = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \ldots$
Note that by taking derivatives with respect to $x$, we get:
  $\frac{d e^x}{dx} = 0 + 1 + \frac{2x}{2!} + \frac{3x^2}{3!} + \ldots = 1 + x + \frac{x^2}{2!} + \ldots = e^x$
  $\frac{d^2 e^x}{dx^2} = 0 + 1 + \frac{2x}{2!} + \ldots$
Now, replacing $x$ with $tY$, we get:
  $e^{tY} = \sum_{i=0}^{\infty} \frac{(tY)^i}{i!} = 1 + tY + \frac{(tY)^2}{2} + \frac{(tY)^3}{6} + \ldots$
  $= 1 + tY + \frac{t^2 Y^2}{2} + \frac{t^3 Y^3}{6} + \ldots$
Moment Generating Functions (II)
Taking derivatives with respect to $t$ and evaluating at $t = 0$:
  $\left.\frac{d e^{tY}}{dt}\right|_{t=0} = \left[0 + Y + \frac{2tY^2}{2!} + \frac{3t^2Y^3}{3!} + \ldots\right]_{t=0} = \left[Y + tY^2 + \frac{t^2Y^3}{2!} + \ldots\right]_{t=0} = Y + 0 + 0 + \ldots = Y$
  $\left.\frac{d^2 e^{tY}}{dt^2}\right|_{t=0} = \left[0 + Y^2 + tY^3 + \ldots\right]_{t=0} = Y^2 + 0 + \ldots = Y^2$
Taking the expected value of $e^{tY}$, and labelling the function as $M(t)$:
  $M(t) = E[e^{tY}] = \sum_{\text{all } y} e^{ty} p(y) = \sum_{\text{all } y} \left[\sum_{i=0}^{\infty} \frac{(ty)^i}{i!}\right] p(y)$
  $\Rightarrow M'(t)|_{t=0} = E(Y), \quad M''(t)|_{t=0} = E[Y^2], \quad \ldots, \quad M^{(k)}(t)|_{t=0} = E[Y^k]$
M(t) is called the moment-generating function for Y, and can be used to
derive any non-central moment of the random variable (assuming M(t)
exists in a neighborhood around t=0).
It is also useful in determining the distributions of functions of random
variables.
Probability Generating Functions
Consider the function $t^Y$ and its derivatives:
  $\frac{d t^Y}{dt} = Y t^{Y-1}$
  $\frac{d^2 t^Y}{dt^2} = Y(Y-1)\,t^{Y-2}$
  $\frac{d^k t^Y}{dt^k} = Y(Y-1)\cdots(Y-(k-1))\,t^{Y-k}, \quad k \ge 3$
Let $P(t) = E[t^Y]$:
  $\Rightarrow P'(t)|_{t=1} = E(Y)$
  $\Rightarrow P''(t)|_{t=1} = E[Y(Y-1)]$
  $\Rightarrow P^{(k)}(t)|_{t=1} = E[Y(Y-1)\cdots(Y-(k-1))], \quad k \ge 3$
P(t) is the probability generating function for Y
Discrete Uniform Distribution
• Suppose Y can take on any integer value between a and b
inclusive, each equally likely (e.g. rolling a die, where a=1
and b=6). Then Y follows the discrete uniform distribution.
  $f(y) = \frac{1}{b-(a-1)}, \quad a \le y \le b$
  $F(y) = \begin{cases} 0 & y < a \\ \frac{\mathrm{int}(y)-(a-1)}{b-(a-1)} & a \le y \le b \\ 1 & y > b \end{cases}$   where int(x) = integer portion of x
  $E(Y) = \sum_{y=a}^{b} y \left(\frac{1}{b-(a-1)}\right) = \frac{1}{b-(a-1)} \left[\sum_{y=1}^{b} y - \sum_{y=1}^{a-1} y\right] = \frac{1}{b-(a-1)} \left[\frac{b(b+1)}{2} - \frac{(a-1)a}{2}\right] = \frac{b(b+1) - a(a-1)}{2(b-(a-1))}$
  $E[Y^2] = \sum_{y=a}^{b} y^2 \left(\frac{1}{b-(a-1)}\right) = \frac{1}{b-(a-1)} \left[\sum_{y=1}^{b} y^2 - \sum_{y=1}^{a-1} y^2\right] = \frac{1}{b-(a-1)} \left[\frac{b(b+1)(2b+1)}{6} - \frac{(a-1)a(2a-1)}{6}\right]$
  $= \frac{b(b+1)(2b+1) - a(a-1)(2a-1)}{6(b-(a-1))}$
  $\Rightarrow V(Y) = E[Y^2] - [E(Y)]^2 = \frac{b(b+1)(2b+1) - a(a-1)(2a-1)}{6(b-(a-1))} - \left[\frac{b(b+1) - a(a-1)}{2(b-(a-1))}\right]^2$
Note: When $a = 1$ and $b = n$:
  $E(Y) = \frac{n+1}{2} \qquad V(Y) = \frac{(n+1)(n-1)}{12} \qquad \sigma = \sqrt{\frac{(n+1)(n-1)}{12}}$
Bernoulli Distribution
• An experiment consists of one trial. It can result in one of
2 outcomes: Success or Failure (or a characteristic being
Present or Absent).
• Probability of Success is p (0 < p < 1)
• Y = 1 if Success (Characteristic Present), 0 if not
  $p(y) = \begin{cases} p & y = 1 \\ 1 - p & y = 0 \end{cases}$
  $E(Y) = \sum_{y=0}^{1} y\,p(y) = 0(1-p) + 1p = p$
  $E[Y^2] = 0^2(1-p) + 1^2 p = p$
  $\Rightarrow V(Y) = E[Y^2] - [E(Y)]^2 = p - p^2 = p(1-p)$
  $\sigma = \sqrt{p(1-p)}$
Binomial Experiment
• Experiment consists of a series of n identical trials
• Each trial can end in one of 2 outcomes: Success or
Failure
• Trials are independent (outcome of one has no
bearing on outcomes of others)
• Probability of Success, p, is constant for all trials
• The random variable Y, the number of Successes in
the n trials, is said to follow a Binomial Distribution with
parameters n and p
• Y can take on the values y=0,1,…,n
• Notation: Y~Bin(n,p)
Binomial Distribution
Consider outcomes of an experiment with 3 Trials:
  $SSS \Rightarrow y = 3: \quad P(SSS) = P(Y=3) = p(3) = p^3$
  $SSF, SFS, FSS \Rightarrow y = 2: \quad P(SSF \cup SFS \cup FSS) = P(Y=2) = p(2) = 3p^2(1-p)$
  $SFF, FSF, FFS \Rightarrow y = 1: \quad P(SFF \cup FSF \cup FFS) = P(Y=1) = p(1) = 3p(1-p)^2$
  $FFF \Rightarrow y = 0: \quad P(FFF) = P(Y=0) = p(0) = (1-p)^3$
In General:
1) # of ways of arranging y S's (and (n-y) F's) in a sequence of n positions: $\binom{n}{y} = \frac{n!}{y!(n-y)!}$
2) Probability of each arrangement of y S's (and (n-y) F's): $p^y (1-p)^{n-y}$
3) $\Rightarrow P(Y = y) = p(y) = \binom{n}{y} p^y (1-p)^{n-y}, \quad y = 0, 1, \ldots, n$
EXCEL Functions:
  p(y) is obtained by function: =BINOM.DIST(y, n, p, 0)
  F(y) is obtained by function: =BINOM.DIST(y, n, p, 1)
Binomial Expansion: $(a + b)^n = \sum_{i=0}^{n} \binom{n}{i} a^i b^{n-i}$
  $\Rightarrow \sum_{y=0}^{n} p(y) = \sum_{y=0}^{n} \binom{n}{y} p^y (1-p)^{n-y} = [p + (1-p)]^n = 1^n = 1 \Rightarrow$ "Legitimate" Probability Distribution
Binomial Distribution (n=10, p=0.10)
[Figure: p(y) on the vertical axis (0 to 0.5) against y = 0, 1, ..., 10]
Binomial Distribution (n=10, p=0.50)
[Figure: p(y) on the vertical axis (0 to 0.5) against y = 0, 1, ..., 10]
Binomial Distribution (n=10, p=0.8)
[Figure: p(y) on the vertical axis (0 to 0.35) against y = 0, 1, ..., 10]
Binomial Distribution – Expected Value
  $f(y) = \frac{n!}{y!(n-y)!} p^y q^{n-y}, \quad y = 0, 1, \ldots, n, \quad q = 1 - p$
  $E(Y) = \sum_{y=0}^{n} y \left[\frac{n!}{y!(n-y)!} p^y q^{n-y}\right] = \sum_{y=1}^{n} y \left[\frac{n!}{y!(n-y)!} p^y q^{n-y}\right]$   (Summand = 0 when y = 0)
  $\Rightarrow E(Y) = \sum_{y=1}^{n} \frac{y\,n!}{y(y-1)!(n-y)!} p^y q^{n-y} = \sum_{y=1}^{n} \frac{n!}{(y-1)!(n-y)!} p^y q^{n-y}$
Let $y^* = y - 1 \Rightarrow y = y^* + 1$.  Note: $y = 1, \ldots, n \Rightarrow y^* = 0, \ldots, n-1$
  $\Rightarrow E(Y) = \sum_{y^*=0}^{n-1} \frac{n(n-1)!}{y^*!\,(n-(y^*+1))!} p^{y^*+1} q^{n-(y^*+1)} = np \sum_{y^*=0}^{n-1} \frac{(n-1)!}{y^*!\,((n-1)-y^*)!} p^{y^*} q^{(n-1)-y^*}$
  $= np(p + q)^{n-1} = np[p + (1-p)]^{n-1} = np(1) = np$
Binomial Distribution – Variance and S.D.
  $f(y) = \frac{n!}{y!(n-y)!} p^y q^{n-y}, \quad y = 0, 1, \ldots, n, \quad q = 1 - p$
Note: $E[Y^2]$ is difficult (impossible?) to get directly, but $E[Y(Y-1)] = E[Y^2] - E(Y)$ is not:
  $E[Y(Y-1)] = \sum_{y=0}^{n} y(y-1) \left[\frac{n!}{y!(n-y)!} p^y q^{n-y}\right] = \sum_{y=2}^{n} y(y-1) \left[\frac{n!}{y!(n-y)!} p^y q^{n-y}\right]$   (Summand = 0 when y = 0, 1)
  $\Rightarrow E[Y(Y-1)] = \sum_{y=2}^{n} \frac{n!}{(y-2)!(n-y)!} p^y q^{n-y}$
Let $y^{**} = y - 2 \Rightarrow y = y^{**} + 2$.  Note: $y = 2, \ldots, n \Rightarrow y^{**} = 0, \ldots, n-2$
  $\Rightarrow E[Y(Y-1)] = \sum_{y^{**}=0}^{n-2} \frac{n(n-1)(n-2)!}{y^{**}!\,(n-(y^{**}+2))!} p^{y^{**}+2} q^{n-(y^{**}+2)} = n(n-1)p^2 \sum_{y^{**}=0}^{n-2} \frac{(n-2)!}{y^{**}!\,((n-2)-y^{**})!} p^{y^{**}} q^{(n-2)-y^{**}}$
  $= n(n-1)p^2 (p + q)^{n-2} = n(n-1)p^2 [p + (1-p)]^{n-2} = n(n-1)p^2$
  $\Rightarrow E[Y^2] = E[Y(Y-1)] + E(Y) = n(n-1)p^2 + np = np[(n-1)p + 1] = n^2p^2 - np^2 + np = n^2p^2 + np(1-p)$
  $\Rightarrow V(Y) = E[Y^2] - [E(Y)]^2 = n^2p^2 + np(1-p) - (np)^2 = np(1-p)$
  $\Rightarrow \sigma = \sqrt{np(1-p)}$
Binomial Distribution – MGF & PGF
  $M(t) = E[e^{tY}] = \sum_{y=0}^{n} e^{ty} \binom{n}{y} p^y (1-p)^{n-y} = \sum_{y=0}^{n} \binom{n}{y} (pe^t)^y (1-p)^{n-y} = [pe^t + (1-p)]^n$
  $M'(t) = n[pe^t + (1-p)]^{n-1} pe^t = np[pe^t + (1-p)]^{n-1} e^t$
  $\Rightarrow E(Y) = M'(0) = np[p(1) + (1-p)]^{n-1}(1) = np$
  $M''(t) = np(n-1)[pe^t + (1-p)]^{n-2} pe^t e^t + np[pe^t + (1-p)]^{n-1} e^t$
  $\Rightarrow E[Y^2] = M''(0) = np(n-1)[p(1) + (1-p)]^{n-2} p(1)(1) + np[p(1) + (1-p)]^{n-1}(1)$
  $= np[(n-1)p + 1] = n^2p^2 - np^2 + np = n^2p^2 + np(1-p)$
  $\Rightarrow V(Y) = E[Y^2] - [E(Y)]^2 = n^2p^2 + np(1-p) - (np)^2 = np(1-p)$
  $\sigma = \sqrt{np(1-p)}$
  $P(t) = E[t^Y] = \sum_{y=0}^{n} t^y \binom{n}{y} p^y (1-p)^{n-y} = \sum_{y=0}^{n} \binom{n}{y} (pt)^y (1-p)^{n-y} = [pt + (1-p)]^n$
Geometric Distribution
• Used to model the number of Bernoulli trials needed until
the first Success occurs (P(S)=p)
– First Success on Trial 1 ⇒ S, y = 1 ⇒ p(1) = p
– First Success on Trial 2 ⇒ FS, y = 2 ⇒ p(2) = (1-p)p
– First Success on Trial k ⇒ F…FS, y = k ⇒ p(k) = (1-p)^(k-1) p
  $p(y) = (1-p)^{y-1} p, \quad y = 1, 2, \ldots$
  $\sum_{y=1}^{\infty} p(y) = \sum_{y=1}^{\infty} (1-p)^{y-1} p = p \sum_{y=1}^{\infty} (1-p)^{y-1}$
Setting $y^* = y - 1$ and noting that $y = 1, 2, \ldots \Rightarrow y^* = 0, 1, \ldots$
  $\Rightarrow \sum_{y=1}^{\infty} p(y) = p \sum_{y^*=0}^{\infty} (1-p)^{y^*} = p \left[\frac{1}{1 - (1-p)}\right] = \frac{p}{p} = 1$
Geometric Distribution - Expectations
With $q = 1 - p$:
  $E(Y) = \sum_{y=1}^{\infty} y\,q^{y-1} p = p \sum_{y=1}^{\infty} \frac{d q^y}{dq} = p \frac{d}{dq} \left[\sum_{y=1}^{\infty} q^y\right] = p \frac{d}{dq} \left[q \sum_{y=1}^{\infty} q^{y-1}\right] = p \frac{d}{dq} \left[\frac{q}{1-q}\right]$
  $= p \left[\frac{(1-q)(1) - q(-1)}{(1-q)^2}\right] = \frac{p[(1-q) + q]}{(1-q)^2} = \frac{p}{(1-q)^2} = \frac{p}{p^2} = \frac{1}{p}$
  $E[Y(Y-1)] = \sum_{y=1}^{\infty} y(y-1)\,q^{y-1} p = pq \sum_{y=1}^{\infty} \frac{d^2 q^y}{dq^2} = pq \frac{d^2}{dq^2} \left[\sum_{y=1}^{\infty} q^y\right] = pq \frac{d^2}{dq^2} \left[\frac{q}{1-q}\right]$
  $= pq \frac{d}{dq} \left[\frac{1}{(1-q)^2}\right] = pq \left[-2(1-q)^{-3}(-1)\right] = \frac{2pq}{(1-q)^3} = \frac{2pq}{p^3} = \frac{2q}{p^2}$
  $\Rightarrow E[Y^2] = E[Y(Y-1)] + E(Y) = \frac{2q}{p^2} + \frac{1}{p} = \frac{2(1-p) + p}{p^2} = \frac{2-p}{p^2}$
  $\Rightarrow V(Y) = E[Y^2] - [E(Y)]^2 = \frac{2-p}{p^2} - \left(\frac{1}{p}\right)^2 = \frac{2-p-1}{p^2} = \frac{1-p}{p^2} = \frac{q}{p^2}$
  $\sigma = \sqrt{\frac{q}{p^2}} = \frac{\sqrt{q}}{p}$
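Geometric Distribution – MGF & PGF
  $M(t) = E[e^{tY}] = \sum_{y=1}^{\infty} e^{ty} q^{y-1} p = \frac{p}{q} \sum_{y=1}^{\infty} e^{ty} q^y = \frac{p}{q} \sum_{y=1}^{\infty} (qe^t)^y = \frac{p}{q}\,qe^t \sum_{y=1}^{\infty} (qe^t)^{y-1}$
  $= \frac{pqe^t}{q} \cdot \frac{1}{1 - qe^t} = \frac{pe^t}{1 - qe^t} = \frac{pe^t}{1 - (1-p)e^t}$
  $P(t) = E[t^Y] = \sum_{y=1}^{\infty} t^y q^{y-1} p = \frac{p}{q} \sum_{y=1}^{\infty} t^y q^y = \frac{p}{q} \sum_{y=1}^{\infty} (tq)^y = \frac{p}{q}\,tq \sum_{y=1}^{\infty} (tq)^{y-1}$
  $= \frac{ptq}{q} \cdot \frac{1}{1 - tq} = \frac{pt}{1 - tq} = \frac{pt}{1 - (1-p)t}$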
Negative Binomial Distribution
• Used to model the number of trials needed until the rth
Success (extension of Geometric distribution)
• Based on there being r-1 Successes in first y-1 trials,
followed by a Success
  $p(y) = \binom{y-1}{r-1} p^r (1-p)^{y-r}, \quad y = r, r+1, \ldots$
  $E(Y) = \frac{r}{p}$   (Proof Given in Chapter 5)
  $V(Y) = \frac{r(1-p)}{p^2}$   (Proof Given in Chapter 5)
Poisson Distribution
• Distribution often used to model the number of
incidences of some characteristic in time or space:
– Arrivals of customers in a queue
– Numbers of flaws in a roll of fabric
– Number of typos per page of text.
• Distribution obtained as follows:
– Break down the “area” into many small “pieces” (n pieces)
– Each “piece” can have only 0 or 1 occurrences (p=P(1))
– Let λ=np ≡ Average number of occurrences over “area”
– Y ≡ # occurrences in “area” is sum of 0s & 1s over “pieces”
– Y ~ Bin(n,p) with p = λ/n
– Take limit of Binomial Distribution as n → ∞ with p = λ/n
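Poisson Distribution - Derivation
  $p(y) = \frac{n!}{y!(n-y)!} p^y (1-p)^{n-y} = \frac{n!}{y!(n-y)!} \left(\frac{\lambda}{n}\right)^y \left(1 - \frac{\lambda}{n}\right)^{n-y}$
Taking limit as $n \to \infty$:
  $\lim_{n\to\infty} p(y) = \lim_{n\to\infty} \frac{n!}{y!(n-y)!} \left(\frac{\lambda}{n}\right)^y \left(1 - \frac{\lambda}{n}\right)^{n-y}$
  $= \frac{\lambda^y}{y!} \lim_{n\to\infty} \frac{n(n-1)\cdots(n-y+1)(n-y)!}{n^y (n-y)!} \left(1 - \frac{\lambda}{n}\right)^n \left(\frac{n-\lambda}{n}\right)^{-y}$
  $= \frac{\lambda^y}{y!} \lim_{n\to\infty} \frac{n(n-1)\cdots(n-y+1)}{(n-\lambda)^y} \left(1 - \frac{\lambda}{n}\right)^n$
  $= \frac{\lambda^y}{y!} \lim_{n\to\infty} \left(\frac{n}{n-\lambda}\right) \left(\frac{n-1}{n-\lambda}\right) \cdots \left(\frac{n-y+1}{n-\lambda}\right) \left(1 - \frac{\lambda}{n}\right)^n$
Note: $\lim_{n\to\infty} \frac{n}{n-\lambda} = \ldots = \lim_{n\to\infty} \frac{n-y+1}{n-\lambda} = 1$ for all fixed $y$
  $\Rightarrow \lim_{n\to\infty} p(y) = \frac{\lambda^y}{y!} \lim_{n\to\infty} \left(1 - \frac{\lambda}{n}\right)^n$
From Calculus, we get: $\lim_{n\to\infty} \left(1 + \frac{a}{n}\right)^n = e^a$
  $\Rightarrow \lim_{n\to\infty} p(y) = \frac{\lambda^y}{y!} e^{-\lambda} = \frac{e^{-\lambda} \lambda^y}{y!}, \quad y = 0, 1, 2, \ldots$
Series expansion of exponential function: $e^x = \sum_{i=0}^{\infty} \frac{x^i}{i!}$
  $\Rightarrow \sum_{y=0}^{\infty} p(y) = \sum_{y=0}^{\infty} \frac{e^{-\lambda} \lambda^y}{y!} = e^{-\lambda} \sum_{y=0}^{\infty} \frac{\lambda^y}{y!} = e^{-\lambda} e^{\lambda} = 1 \Rightarrow$ "Legitimate" Probability Distribution
EXCEL Functions:
  p(y): =POISSON(y, λ, 0)
  F(y): =POISSON(y, λ, 1)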
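Poisson Distribution - Expectations
  $f(y) = \frac{e^{-\lambda} \lambda^y}{y!}, \quad y = 0, 1, 2, \ldots$
  $E(Y) = \sum_{y=0}^{\infty} y \left[\frac{e^{-\lambda} \lambda^y}{y!}\right] = \sum_{y=1}^{\infty} y \left[\frac{e^{-\lambda} \lambda^y}{y!}\right] = \sum_{y=1}^{\infty} \frac{e^{-\lambda} \lambda^y}{(y-1)!} = \lambda e^{-\lambda} \sum_{y=1}^{\infty} \frac{\lambda^{y-1}}{(y-1)!} = \lambda e^{-\lambda} e^{\lambda} = \lambda$
  $E[Y(Y-1)] = \sum_{y=0}^{\infty} y(y-1) \left[\frac{e^{-\lambda} \lambda^y}{y!}\right] = \sum_{y=2}^{\infty} y(y-1) \left[\frac{e^{-\lambda} \lambda^y}{y!}\right] = \sum_{y=2}^{\infty} \frac{e^{-\lambda} \lambda^y}{(y-2)!}$
  $= \lambda^2 e^{-\lambda} \sum_{y=2}^{\infty} \frac{\lambda^{y-2}}{(y-2)!} = \lambda^2 e^{-\lambda} e^{\lambda} = \lambda^2$
  $\Rightarrow E[Y^2] = E[Y(Y-1)] + E(Y) = \lambda^2 + \lambda$
  $\Rightarrow V(Y) = E[Y^2] - [E(Y)]^2 = \lambda^2 + \lambda - [\lambda]^2 = \lambda$
  $\sigma = \sqrt{\lambda}$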
Poisson Distribution – MGF & PGF
  $M(t) = E[e^{tY}] = \sum_{y=0}^{\infty} e^{ty} \left[\frac{e^{-\lambda} \lambda^y}{y!}\right] = e^{-\lambda} \sum_{y=0}^{\infty} \frac{(\lambda e^t)^y}{y!} = e^{-\lambda} e^{\lambda e^t} = e^{\lambda(e^t - 1)}$
  $P(t) = E[t^Y] = \sum_{y=0}^{\infty} t^y \left[\frac{e^{-\lambda} \lambda^y}{y!}\right] = e^{-\lambda} \sum_{y=0}^{\infty} \frac{(\lambda t)^y}{y!} = e^{-\lambda} e^{\lambda t} = e^{\lambda(t - 1)}$
Hypergeometric Distribution
• Finite population generalization of Binomial Distribution
• Population:
– N Elements
– k Successes (elements with characteristic of interest)
• Sample:
– n Elements
– Y = # of Successes in sample (y = 0, 1, ..., min(n, k))
  $p(y) = \frac{\binom{k}{y} \binom{N-k}{n-y}}{\binom{N}{n}}, \quad y = 0, 1, \ldots, \min(n, k)$
  $E(Y) = n \left(\frac{k}{N}\right)$   (Proof in Chapter 5)
  $V(Y) = n \left(\frac{k}{N}\right) \left(\frac{N-k}{N}\right) \left(\frac{N-n}{N-1}\right)$   (Proof in Chapter 5)
```
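
The formulas in these slides can be checked numerically. The following is a minimal Python sketch, not part of the original slides: it rebuilds the two-dice distribution, recomputes its mean, variance, and CDF, compares the two-standard-deviation probability with the Tchebysheff bound, and evaluates a binomial probability against its Poisson limit. All function names (`dice_pmf`, `binomial_prob`, `binomial_pmf`, `poisson_prob`, `mean`, `variance`, `cdf`, `prob_within`) are illustrative choices, and exact fractions are used only so that the results match the tabulated values without rounding.

```python
from fractions import Fraction
from itertools import product
from math import comb, exp, factorial, sqrt


def dice_pmf():
    """PMF of Y = sum of two fair six-sided dice, as exact fractions."""
    pmf = {}
    for red, green in product(range(1, 7), repeat=2):
        y = red + green
        pmf[y] = pmf.get(y, Fraction(0)) + Fraction(1, 36)
    return pmf


def binomial_prob(n, p, y):
    """P(Y = y) for Y ~ Bin(n, p): C(n, y) p^y (1 - p)^(n - y)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


def binomial_pmf(n, p):
    """Full PMF of Y ~ Bin(n, p) over y = 0, 1, ..., n."""
    return {y: binomial_prob(n, p, y) for y in range(n + 1)}


def poisson_prob(lam, y):
    """P(Y = y) for a Poisson RV with mean lam: e^(-lam) lam^y / y!."""
    return exp(-lam) * lam**y / factorial(y)


def mean(pmf):
    """E(Y) = sum over all y of y p(y)."""
    return sum(y * p for y, p in pmf.items())


def variance(pmf):
    """V(Y) = E(Y^2) - [E(Y)]^2."""
    mu = mean(pmf)
    return sum(y * y * p for y, p in pmf.items()) - mu * mu


def cdf(pmf, b):
    """F(b) = P(Y <= b)."""
    return sum(p for y, p in pmf.items() if y <= b)


def prob_within(pmf, k):
    """P(mu - k sigma <= Y <= mu + k sigma), to compare with 1 - 1/k^2."""
    mu = float(mean(pmf))
    sigma = sqrt(float(variance(pmf)))
    return float(sum(p for y, p in pmf.items() if abs(y - mu) <= k * sigma))


if __name__ == "__main__":
    dice = dice_pmf()
    print("dice: E(Y) =", mean(dice), " V(Y) =", variance(dice))
    print("dice: F(7) =", cdf(dice, 7))
    print("dice: P(within 2 sd) =", prob_within(dice, 2), "vs. bound 0.75")

    b = binomial_pmf(10, Fraction(1, 10))
    print("Bin(10, 0.1): E(Y) =", mean(b), " V(Y) =", variance(b))

    # Poisson limit: Bin(n, lam/n) at y = 2 approaches Poisson(lam) as n grows.
    lam = 2.0
    for n in (10, 100, 10_000):
        print(n, binomial_prob(n, lam / n, 2), poisson_prob(lam, 2))
```

Working with a dictionary from outcome to probability keeps `mean`, `variance`, and `cdf` independent of which distribution is being examined, so the same helpers apply to the Bernoulli, geometric (truncated), or hypergeometric cases if their PMFs are tabulated the same way.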