Coupon Conspiracy
Heather Parsons
Dickens Nyabuti
Ben Speidel
Megan Silberhorn
Mark Osegard
Nick Thull
Introduction
Today we are going to explain two coupon-collecting examples that incorporate important statistical elements. We will begin with a well-known example of a coupon-collecting game: the McDonald's Monopoly Best Chance Game, which connects a real-world situation to our coupon-collecting examples.
McDonald’s Scam
List of Prizes Allegedly Won by Fraudulent Means
• Dodge Viper
• 1996: (1) $100,000 prize, (2) $1,000,000 prizes
• 1997: (1) $1,000,000 prize
• 1998: $200,000 prize
• 1999: (3) $1,000,000 prizes
• 2000: (3) $1,000,000 prizes
• 2001: (3) $1,000,000 prizes
McDonald’s Monopoly
The Odds of Winning
• Instant Win: 1 in 18,197
• Collect & Win: 1 in 2,475,041
• Best Buy Gift Cards: 1 in 55,000
Important Statistical Elements
• Expected Value – gives the average behavior of the random variable X
Denoted as: E[X] = Σ_{i} x_i P{X = x_i}
• Variance – measures the spread, variability, and dispersion of X
Denoted as: Var(X) = E[ (X − E[X])² ]
We will use these concepts in more detail in our
fair coupon collecting examples later in our
discussion.
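As a quick illustration of these two definitions (our own addition, not from the original slides), the sketch below computes E[X] and Var(X) for a fair six-sided die:

```python
# Illustrative example: E[X] and Var(X) for a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

# E[X] = sum over i of x_i * P{X = x_i}
expected = sum(x * p for x, p in zip(values, probs))

# Var(X) = E[(X - E[X])^2]
variance = sum((x - expected) ** 2 * p for x, p in zip(values, probs))

print(expected)  # 3.5
print(variance)  # ~2.9167
```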
Review
• Sample space – the set of all possible
outcomes of a random experiment (S)
• Event – any subset of the sample space
• Probability – a set function defined on the power set of S
So if A ⊆ S, then P(A) = the probability of A
Review Continued
Bernoulli Random Variable
• Toggles between 0 and 1 values
• Where 1 represents a success and 0 a failure
• Success probability = p
• Probability of Failure = (1-p)
Bernoulli Trials
• Experiments having two possible outcomes
• Independent sequences of Bernoulli
RV’s with the same success probability
E[X] = p
Var(X) = p(1-p)
Review Continued
Geometric Random Variables
Performing Bernoulli trials with success probability p, let
X = number of trials until the 1st success
Assume 0 < p < 1. Then
E[X] = 1/p
Var(X) = (1 − p)/p²
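A quick Monte Carlo sanity check of these two formulas (a sketch we are adding here; p and the sample size are arbitrary choices):

```python
import random

# Simulate geometric random variables: count trials until the 1st success.
random.seed(0)
p, N = 0.25, 200_000
samples = []
for _ in range(N):
    trials = 1
    while random.random() >= p:  # failure occurs with probability 1 - p
        trials += 1
    samples.append(trials)

mean = sum(samples) / N
var = sum((x - mean) ** 2 for x in samples) / N
print(mean, 1 / p)            # both near 4.0
print(var, (1 - p) / p ** 2)  # both near 12.0
```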
Expectations of Sums of Random Variables
• For single-valued R.V.'s:
E[X] = Σ_{i} x_i P{X = x_i}
• Suppose g : R² → R. For multiple R.V.'s:
E[g(X, Y)] = Σ_x Σ_y g(x, y) P{X = x, Y = y}
Properties of Sums of Random Variables
– Expectations of random variables can be summed.
– Recall:
E[g(X, Y)] = Σ_x Σ_y g(x, y) P{X = x, Y = y}
– Corollary (see the sketch below):
E[X + Y] = E[X] + E[Y]
– More generally:
E[ Σ_{i=1}^{n} X_i ] = Σ_{i=1}^{n} E[X_i]
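To make the corollary concrete, here is a small check we added (values are illustrative): linearity of expectation holds even when the summed variables are dependent.

```python
import random

# E[X + Y] = E[X] + E[Y], demonstrated with a dependent pair:
# X is a die roll and Y = X + another die roll, so Y depends on X.
random.seed(1)
N = 100_000
xs = [random.randint(1, 6) for _ in range(N)]
ys = [x + random.randint(1, 6) for x in xs]

mean_sum = sum(x + y for x, y in zip(xs, ys)) / N
print(mean_sum, sum(xs) / N + sum(ys) / N)  # nearly identical, near 10.5
```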
Sums of Random Variables
- A sum of random variables is itself a random variable:
Y = Σ_i X_i
where Y is a R.V. and X_i is the i-th instance of the random variable X.
Variance & Covariance of Random Variables
• Variance – a measure of spread and variability:
Var(X) = E[ (X − E[X])² ]
Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
• Facts:
i) Var(X) = E[X²] − (E[X])²
ii) Var(aX + b) = a² Var(X)
• Covariance – a measure of association between two r.v.'s:
Cov(X, Y) = E[ (X − E[X]) (Y − E[Y]) ]
Properties of Covariance
Where X, Y, Z are random variables and c is a constant:
• Cov(X, X) = Var(X)
• Cov(X, Y) = E[XY] − E[X] E[Y]
• Cov(X, Y) = Cov(Y, X)
• Cov(cX, Y) = c · Cov(X, Y)
• Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z)
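These properties are easy to spot-check numerically; the sketch below (our addition, with an arbitrary correlated pair) verifies the first two:

```python
import random

# Check cov(X,Y) = E[XY] - E[X]E[Y] and cov(X,X) = var(X) numerically.
random.seed(2)
N = 100_000
xs = [random.gauss(0, 1) for _ in range(N)]
ys = [2 * x + random.gauss(0, 1) for x in xs]  # Y is correlated with X

def mean(v):
    return sum(v) / len(v)

mx, my = mean(xs), mean(ys)
cov_def = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
cov_alt = mean([x * y for x, y in zip(xs, ys)]) - mx * my
print(cov_def, cov_alt)                   # both near 2.0
print(mean([(x - mx) ** 2 for x in xs]))  # cov(X,X) = var(X), near 1.0
```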
McDonald’s
Coupon Conspiracy I
Suppose there are M different types of coupons, each
equally likely.
Let
X = number of coupons one needs to collect in order
to get the entire set of coupons.
Problem:
Find the expected value and variance of X, i.e., E[X] and Var(X).
Solution:
Idea: Break X up into a sum of simpler random variables
Let
Xi = number of coupons needed after i distinct types have been
collected until a new type has been obtained.
Note:
X = Σ_{i=0}^{m−1} X_i
The X_i are independent, so
E[X] = Σ_{i=0}^{m−1} E[X_i]
Var(X) = Σ_{i=0}^{m−1} Var(X_i)
Observe:
If we already have i distinct types of coupons, then
P(next coupon is a new type) = (m − i)/m
Regarding each coupon selection as a trial,
X_i = number of trials until the next success,
so X_i is a geometric random variable with p = (m − i)/m. We know that
E[X_i] = 1/p = m/(m − i)
Var(X_i) = (1 − p)/p² = (i/m) / ((m − i)/m)² = im/(m − i)²
so
E[X] = Σ_{i=0}^{m−1} E[X_i] = Σ_{i=0}^{m−1} m/(m − i)
Var(X) = Σ_{i=0}^{m−1} Var(X_i) = Σ_{i=0}^{m−1} im/(m − i)²
so
E[X] = Σ_{i=0}^{m−1} m/(m − i) = m ( 1/m + 1/(m−1) + 1/(m−2) + ... + 1/2 + 1/1 )
Reversing the expression above,
E[X] = m ( 1 + 1/2 + 1/3 + ... + 1/m ) = m Σ_{i=1}^{m} 1/i
Euler's Constant
γ = lim_{m→∞} ( Σ_{i=1}^{m} 1/i − log m )
so
Σ_{i=1}^{m} 1/i ≈ log m
and therefore
E[X] = m Σ_{i=1}^{m} 1/i ≈ m log m
So,
Var(X) = Σ_{i=0}^{m−1} im/(m − i)² = m Σ_{i=1}^{m−1} i/(m − i)²
(the i = 0 term vanishes)
To simplify this, we will use a trick on the next slide.
Trick: add m to and subtract m from the numerator of the sum:
i/(m − i)² = (i − m + m)/(m − i)²
= −(m − i)/(m − i)² + m/(m − i)²
= m/(m − i)² − 1/(m − i)
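A one-line numeric check of this identity (our addition; the m and i values are arbitrary):

```python
# Verify i/(m-i)**2 == m/(m-i)**2 - 1/(m-i) for a few sample values.
for m, i in [(10, 3), (50, 7), (100, 99)]:
    lhs = i / (m - i) ** 2
    rhs = m / (m - i) ** 2 - 1 / (m - i)
    print(m, i, lhs, rhs, abs(lhs - rhs) < 1e-12)
```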
Applying the trick from the previous slide to
m Σ_{i=1}^{m−1} i/(m − i)²
we get
m Σ_{i=1}^{m−1} ( m/(m − i)² − 1/(m − i) )
Pulling out the m in the first sum,
m² Σ_{i=1}^{m−1} 1/(m − i)² − m Σ_{i=1}^{m−1} 1/(m − i)
Expanding the sums from the previous slide,
m² ( 1/(m−1)² + 1/(m−2)² + ... + 1/2² + 1/1² ) − m ( 1/(m−1) + 1/(m−2) + ... + 1/2 + 1/1 )
Reversing the order of the sums above,
m² ( 1/1² + 1/2² + ... + 1/(m−1)² ) − m ( 1 + 1/2 + ... + 1/(m−1) )
= m² Σ_{i=1}^{m−1} 1/i² − m Σ_{i=1}^{m−1} 1/i
By the Basel series, we will simplify this on the next slide.
Explaining the Basel Series
Basel Series:
1/1² + 1/2² + 1/3² + ... = π²/6
For large m, the partial sum Σ_{i=1}^{m−1} 1/i² approaches π²/6, so
Var(X) ≈ m² (π²/6) − m log m ≈ (π²/6) m²
In Conclusion
So the variance is approximately:
Var(X) ≈ (π²/6) m²
and the expected value is approximately:
E[X] ≈ m log m
These approximations hold when m, the number of different types of coupons, is large.
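As a check on these approximations (a sketch we are adding; not part of the original deck), one can compare the exact sums derived earlier with the large-m limits:

```python
import math

# Exact sums for E[X] and Var(X) versus the large-m approximations.
for m in [10, 100, 1000]:
    e_exact = sum(m / (m - i) for i in range(m))           # = m * H_m
    v_exact = sum(i * m / (m - i) ** 2 for i in range(m))  # exact variance sum
    print(m, e_exact, m * math.log(m))
    print(m, v_exact, math.pi ** 2 / 6 * m ** 2)
```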
McDonald’s
Coupon Conspiracy II
The 2nd Coupon Conspiracy
Given:
• A sample of n coupons
• m possible types
• X := number of distinct types of coupons in the sample
Find:
• E[X] (the expected value of X)
• Var(X) (the variance of X)
Redefining X
Decompose X into a sum of Bernoulli indicators.
Let X_i = 1 if type i is present, and X_i = 0 otherwise, for 1 ≤ i ≤ m.
Note: X = Σ_{i=1}^{m} X_i
Note: E[X] = Σ_{i=1}^{m} E[X_i]
Remark: X_1, X_2, ..., X_m are dependent
(knowing that one coupon type occurred lowers the opportunity for the other coupon types to occur).
Redefining Var(X)
Recall: Var(X) = Cov(X, X)
Var(X) = Var( Σ_{i=1}^{m} X_i ) = Cov( Σ_{i=1}^{m} X_i , Σ_{j=1}^{m} X_j )
= Σ_{i=1}^{m} Σ_{j=1}^{m} Cov(X_i, X_j)
(by the bilinearity of covariance)
Summary thus far
EX  
m
 EX 
i
i 1
Var  X    Cov  X i , X j 
m
m
i 1 j 1
Success Probability
Recall: X_i is Bernoulli with success probability p:
p = P{X_i = 1} = 1 − P{X_i = 0}
Note: P{X_i = 0} = Π_{j=1}^{n} P{type i does not occur on the jth trial} = ((m−1)/m)^n
Thus E[X_i] = p = 1 − ((m−1)/m)^n
Recap
E[X_i] = 1 − ((m−1)/m)^n
Var(X_i) = ((m−1)/m)^n ( 1 − ((m−1)/m)^n )
So,
E[X] = Σ_{i=1}^{m} E[X_i] = Σ_{i=1}^{m} ( 1 − ((m−1)/m)^n ) = m ( 1 − ((m−1)/m)^n )
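A Monte Carlo sanity check of this formula (our own sketch; m, n, and the replication count are arbitrary choices):

```python
import random

# Draw n coupons from m equally likely types, count distinct types,
# and compare the empirical mean with E[X] = m * (1 - ((m-1)/m)**n).
random.seed(3)
m, n, N = 10, 15, 100_000
counts = [len({random.randrange(m) for _ in range(n)}) for _ in range(N)]
print(sum(counts) / N)               # empirical mean
print(m * (1 - ((m - 1) / m) ** n))  # analytic value, about 7.94
```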
Var(X)
To calculate Var(X), we need to find Cov(X_i, X_j):
Cov(X_i, X_j) = E[X_i X_j] − E[X_i] E[X_j]
In particular, when i ≠ j:
Let A_k = the event that type k is present. Then
E[X_i X_j] = P{X_i X_j = 1} = P(A_i ∩ A_j) = 1 − P( (A_i ∩ A_j)^c )
By De Morgan,
(A_i ∩ A_j)^c = A_i^c ∪ A_j^c
Using the inclusion/exclusion rule, P(A ∪ B) = P(A) + P(B) − P(A ∩ B):
E[X_i X_j] = 1 − ( P(A_i^c) + P(A_j^c) − P(A_i^c ∩ A_j^c) )
E[X_i X_j] cont.
Recall:
P(A_i^c) = ((m−1)/m)^n = P(A_j^c)
P(A_i^c ∩ A_j^c) = ((m−2)/m)^n
So
E[X_i X_j] = 1 − ( P(A_i^c) + P(A_j^c) − P(A_i^c ∩ A_j^c) )
= 1 − ( ((m−1)/m)^n + ((m−1)/m)^n − ((m−2)/m)^n )
= 1 − 2 ((m−1)/m)^n + ((m−2)/m)^n
Cov(X_i, X_j)
Recall: E[X_i] = E[X_j] = 1 − ((m−1)/m)^n, and
E[X_i X_j] = 1 − 2 ((m−1)/m)^n + ((m−2)/m)^n
So
Cov(X_i, X_j) = E[X_i X_j] − E[X_i] E[X_j]
= ( 1 − 2 ((m−1)/m)^n + ((m−2)/m)^n ) − ( 1 − ((m−1)/m)^n )²
= ( 1 − 2 ((m−1)/m)^n + ((m−2)/m)^n ) − ( 1 − 2 ((m−1)/m)^n + ((m−1)/m)^{2n} )
= ((m−2)/m)^n − ((m−1)/m)^{2n}
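This covariance can also be spot-checked by simulation (our addition; parameters are illustrative). Note that it is negative, matching the earlier remark that the indicators are dependent:

```python
import random

# Estimate Cov(X_i, X_j) empirically and compare with
# ((m-2)/m)**n - ((m-1)/m)**(2*n), using types i = 0 and j = 1.
random.seed(4)
m, n, N = 10, 15, 200_000
ci = cj = cij = 0
for _ in range(N):
    seen = {random.randrange(m) for _ in range(n)}
    a, b = 0 in seen, 1 in seen
    ci += a
    cj += b
    cij += a and b
print(cij / N - (ci / N) * (cj / N))                  # empirical covariance
print(((m - 2) / m) ** n - ((m - 1) / m) ** (2 * n))  # analytic, about -0.0072
```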
Back to the Var(X)
Recall: Var(X) = Σ_{i=1}^{m} Σ_{j=1}^{m} Cov(X_i, X_j)
Writing out the m × m array of terms Cov(X_i, X_j), the diagonal holds the m terms
Cov(X_i, X_i), and each off-diagonal term Cov(X_i, X_j) with i < j appears twice
(once as Cov(X_i, X_j) and once as Cov(X_j, X_i)), so
Var(X) = Σ_{i=1}^{m} Cov(X_i, X_i) + 2 Σ_{i<j} Cov(X_i, X_j)
Var(X) in terms of Covariance
Var(X) = Σ_{i=1}^{m} Cov(X_i, X_i) + 2 Σ_{i<j} Cov(X_i, X_j)
Simplifying the summation (there are m(m−1)/2 pairs with i < j, and for i ≠ j every
Cov(X_i, X_j) is the same), we get:
Var(X) = Σ_{i=1}^{m} Cov(X_i, X_i) + m(m−1) Cov(X_i, X_j)
Var(X) equals
Var(X) = Σ_{i=1}^{m} Cov(X_i, X_i) + m(m−1) Cov(X_i, X_j)
Recall:
Σ_{i=1}^{m} Var(X_i) = m ((m−1)/m)^n ( 1 − ((m−1)/m)^n )
Cov(X_i, X_j) = ((m−2)/m)^n − ((m−1)/m)^{2n}
so
Var(X) = m ((m−1)/m)^n ( 1 − ((m−1)/m)^n ) + m(m−1) ( ((m−2)/m)^n − ((m−1)/m)^{2n} )
Simplified,
Var(X) = m ((m−1)/m)^n + m(m−1) ((m−2)/m)^n − m² ((m−1)/m)^{2n}
In Case you Forgot:
Independent Case (Conspiracy I):
E[X] ≈ m log m
Var(X) ≈ (π²/6) m²
Dependent Case (Conspiracy II):
E[X] = m ( 1 − ((m−1)/m)^n )
Var(X) = m ((m−1)/m)^n + m(m−1) ((m−2)/m)^n − m² ((m−1)/m)^{2n}
Analytic vs. Simulation Approaches
For our simulation, we will consider a fair set of the McDonald's coupons from Problem I.
We will analytically find E[X] and Var(X) and compare these values with the results of our computer simulation.
Analytic Approach
Given m = 10, compute
E[X] = m Σ_{i=1}^{m} 1/i ≈ m log m
Var(X) ≈ m² (π²/6) − m log m ≈ (π²/6) m²
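The original C++ simulation is not reproduced in the slides, so here is a minimal Python sketch of what it might look like: collect coupons until all m = 10 types appear, repeat many times, and compare against the analytic values above.

```python
import math
import random

# Simulate the fair coupon collector for m = 10 types.
random.seed(5)
m, N = 10, 100_000
draws_needed = []
for _ in range(N):
    seen, draws = set(), 0
    while len(seen) < m:
        seen.add(random.randrange(m))
        draws += 1
    draws_needed.append(draws)

mean = sum(draws_needed) / N
var = sum((d - mean) ** 2 for d in draws_needed) / N
h = sum(1 / i for i in range(1, m + 1))
print(mean, m * h)                                       # exact E[X] ~ 29.29
print(var, sum(i * m / (m - i) ** 2 for i in range(m)))  # exact Var(X) ~ 125.7
print(m * math.log(m), math.pi ** 2 / 6 * m ** 2)        # large-m approximations
```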
References
• Sheldon Ross, "Probability Models for Computer Science", Academic Press, 2002
• 2-3 pages from C++ books
• McDonald's
• Dr. Deckelman
Any Questions???