Poisson Distribution - English Premier League Soccer Scores 2006/7 Season

Poisson Distribution
Goals in English Premier Football League – 2006/2007 Regular Season
Poisson Distribution
• Distribution often used to model the number of occurrences of some characteristic in time or space:
– Number of customers arriving in a queue
– Number of flaws in a roll of fabric
– Number of typos per page of text
• Distribution obtained as follows:
– Break down the “area” into many small “pieces” (n pieces)
– Each “piece” can have only 0 or 1 occurrences (p=P(1))
– Let λ=np ≡ Average number of occurrences over “area”
– Y ≡ # occurrences in “area” is sum of 0s & 1s over “pieces”
– Y ~ Bin(n,p) with p = λ/n
– Take limit of Binomial Distribution as n → ∞ with p = λ/n
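The limiting argument in the steps above can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names are illustrative, and the values λ = 2.47 and y = 2 are chosen only as an example. It compares the Bin(n, λ/n) probability with the Poisson probability as n grows.

```python
from math import comb, exp, factorial

def binomial_pmf(y, n, p):
    """P(Y = y) for Y ~ Bin(n, p)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

def poisson_pmf(y, lam):
    """Limiting probability e^(-lam) * lam^y / y!."""
    return exp(-lam) * lam**y / factorial(y)

lam = 2.47   # illustrative value (assumption): an average of 2.47 occurrences
y = 2        # illustrative count (assumption)
errors = []
for n in (10, 100, 1000, 10000):
    b = binomial_pmf(y, n, lam / n)          # p = lam / n, as in the bullets
    errors.append(abs(b - poisson_pmf(y, lam)))
    print(f"n={n:>5}  binomial={b:.5f}  poisson={poisson_pmf(y, lam):.5f}")
```

The gap between the two probabilities shrinks as the number of “pieces” n increases, which is the behaviour the limit describes.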
Poisson Distribution - Derivation
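$$p(y) = \frac{n!}{y!(n-y)!}\, p^{y} (1-p)^{n-y} = \frac{n!}{y!(n-y)!} \left(\frac{\lambda}{n}\right)^{y} \left(1-\frac{\lambda}{n}\right)^{n-y}$$

Taking limit as $n \to \infty$:

$$\lim_{n\to\infty} p(y) = \lim_{n\to\infty} \frac{n!}{y!(n-y)!} \left(\frac{\lambda}{n}\right)^{y} \left(1-\frac{\lambda}{n}\right)^{n-y}$$

$$= \lim_{n\to\infty} \frac{\lambda^{y}}{y!}\, \frac{n(n-1)\cdots(n-y+1)(n-y)!}{n^{y}\,(n-y)!} \left(\frac{n-\lambda}{n}\right)^{n-y}$$

$$= \frac{\lambda^{y}}{y!} \lim_{n\to\infty} \frac{n(n-1)\cdots(n-y+1)}{(n-\lambda)^{y}} \left(1-\frac{\lambda}{n}\right)^{n} = \frac{\lambda^{y}}{y!} \lim_{n\to\infty} \left(\frac{n}{n-\lambda}\right)\left(\frac{n-1}{n-\lambda}\right)\cdots\left(\frac{n-y+1}{n-\lambda}\right) \left(1-\frac{\lambda}{n}\right)^{n}$$

Note: $\lim_{n\to\infty} \frac{n}{n-\lambda} = \cdots = \lim_{n\to\infty} \frac{n-y+1}{n-\lambda} = 1$ for all fixed y

$$\Rightarrow \lim_{n\to\infty} p(y) = \frac{\lambda^{y}}{y!} \lim_{n\to\infty} \left(1-\frac{\lambda}{n}\right)^{n}$$

From Calculus, we get: $\lim_{n\to\infty} \left(1+\frac{a}{n}\right)^{n} = e^{a}$

$$\Rightarrow \lim_{n\to\infty} p(y) = \frac{\lambda^{y}}{y!}\, e^{-\lambda} = \frac{e^{-\lambda}\lambda^{y}}{y!} \qquad y = 0,1,2,\ldots$$

Series expansion of exponential function: $e^{x} = \sum_{i=0}^{\infty} \frac{x^{i}}{i!}$

$$\Rightarrow \sum_{y=0}^{\infty} p(y) = \sum_{y=0}^{\infty} \frac{e^{-\lambda}\lambda^{y}}{y!} = e^{-\lambda} \sum_{y=0}^{\infty} \frac{\lambda^{y}}{y!} = e^{-\lambda} e^{\lambda} = 1 \;\Rightarrow\; \text{"Legitimate" Probability Distribution}$$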
EXCEL Functions:
p(y): =POISSON(y, λ, 0)
F(y): =POISSON(y, λ, 1)
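For readers working outside Excel, the two spreadsheet calls can be mirrored with a few lines of standard-library Python. This is a sketch under stated assumptions: the names `poisson_pmf` and `poisson_cdf` are illustrative, and the value of λ is chosen only as an example. It also checks numerically that the probabilities sum to 1.

```python
from math import exp, factorial

def poisson_pmf(y, lam):
    """p(y) = e^(-lam) * lam^y / y!, the counterpart of POISSON(y, lam, 0)."""
    return exp(-lam) * lam**y / factorial(y)

def poisson_cdf(y, lam):
    """F(y) = P(Y <= y), the counterpart of POISSON(y, lam, 1)."""
    return sum(poisson_pmf(k, lam) for k in range(y + 1))

lam = 2.47  # illustrative value (assumption)
# Truncating the infinite sum at y = 99 leaves a negligible tail for this lam.
total = sum(poisson_pmf(y, lam) for y in range(100))
print(poisson_pmf(2, lam), poisson_cdf(2, lam), total)
```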

Poisson Distribution - Expectations
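$$f(y) = \frac{e^{-\lambda}\lambda^{y}}{y!} \qquad y = 0,1,2,\ldots$$

$$E(Y) = \sum_{y=0}^{\infty} y \left[\frac{e^{-\lambda}\lambda^{y}}{y!}\right] = \sum_{y=1}^{\infty} y \left[\frac{e^{-\lambda}\lambda^{y}}{y!}\right] = \sum_{y=1}^{\infty} \frac{e^{-\lambda}\lambda^{y}}{(y-1)!} = \lambda e^{-\lambda} \sum_{y=1}^{\infty} \frac{\lambda^{y-1}}{(y-1)!} = \lambda e^{-\lambda} e^{\lambda} = \lambda$$

$$E[Y(Y-1)] = \sum_{y=0}^{\infty} y(y-1) \left[\frac{e^{-\lambda}\lambda^{y}}{y!}\right] = \sum_{y=2}^{\infty} y(y-1) \left[\frac{e^{-\lambda}\lambda^{y}}{y!}\right] = \sum_{y=2}^{\infty} \frac{e^{-\lambda}\lambda^{y}}{(y-2)!} = \lambda^{2} e^{-\lambda} \sum_{y=2}^{\infty} \frac{\lambda^{y-2}}{(y-2)!} = \lambda^{2} e^{-\lambda} e^{\lambda} = \lambda^{2}$$

$$\Rightarrow E[Y^{2}] = E[Y(Y-1)] + E(Y) = \lambda^{2} + \lambda$$

$$\Rightarrow V(Y) = E[Y^{2}] - [E(Y)]^{2} = \lambda^{2} + \lambda - [\lambda]^{2} = \lambda$$

$$\Rightarrow \sigma = \sqrt{\lambda}$$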
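The result that the mean and variance are equal can be confirmed numerically by summing the series directly. The sketch below is illustrative only: the helper name `poisson_moments`, the truncation point `y_max`, and the test value of λ are assumptions, not part of the slides.

```python
from math import exp, factorial

def poisson_pmf(y, lam):
    return exp(-lam) * lam**y / factorial(y)

def poisson_moments(lam, y_max=100):
    """Approximate E(Y) and V(Y) by truncating the infinite sums at y_max."""
    ys = range(y_max + 1)
    mean = sum(y * poisson_pmf(y, lam) for y in ys)
    second_moment = sum(y * y * poisson_pmf(y, lam) for y in ys)
    return mean, second_moment - mean**2   # V(Y) = E[Y^2] - [E(Y)]^2

mean, variance = poisson_moments(2.47)   # 2.47 is an illustrative lam
print(mean, variance)
```

Both values come out equal to the chosen λ up to floating-point error, which is why comparing a sample mean with a sample variance is a quick informal check of a Poisson model.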
Example – English Premier League
• Total Goals Per Game (Both Teams)
– Mean=2.47 Variance=2.49
• Goals by Team by Half
– Home Team, 1st Half: Mean=0.68 Variance=0.73
– Road Team, 1st Half: Mean=0.44 Variance=0.39
– Home Team, 2nd Half: Mean=0.77 Variance=0.75
– Road Team, 2nd Half: Mean=0.58 Variance=0.83*
*Does not reject based on Goodness-of-Fit test
Goals by Team by Half
Observed Counts
Goals    All   Home1   Road1   Home2   Road2
0        828     199     236     175     218
1        492     121     122     134     115
2        157      46      21      56      34
3         31      11       0      12       8
4          9       3       1       3       2
5+         0       0       0       0       0
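The summary statistics for a team-half can be recomputed from its column of observed counts. The sketch below uses the Home1 column; the helper name and the choice of the sample-variance divisor (n − 1) are assumptions, since the slides do not state which divisor was used.

```python
# Home1 observed counts: goals scored -> number of games (the 5+ count is 0)
home1_counts = {0: 199, 1: 121, 2: 46, 3: 11, 4: 3}

def mean_and_variance(counts):
    """Mean and sample variance of a frequency table {value: frequency}."""
    n = sum(counts.values())
    mean = sum(y * f for y, f in counts.items()) / n
    sum_sq_dev = sum(f * (y - mean) ** 2 for y, f in counts.items())
    return n, mean, sum_sq_dev / (n - 1)   # n - 1 divisor is an assumption

n, mean, variance = mean_and_variance(home1_counts)
print(n, round(mean, 2), round(variance, 2))
```

For this column the rounded results agree with the Mean=0.68 and Variance=0.73 reported for the home team in the first half.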
Expected Counts Under Poisson Model
Goals       All    Home1    Road1    Home2    Road2
0        818.97   192.72   244.22   175.30   212.99
1        506.47   130.84   107.97   135.63   123.31
2        156.60    44.42    23.87    52.47    35.69
3         32.28    10.05     3.52    13.53     6.89
4          4.99     1.71     0.39     2.62     1.00
5+         0.69     0.26     0.04     0.46     0.13
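Expected counts of this kind are the number of games multiplied by the Poisson probabilities, with λ estimated by the sample mean. The following sketch reproduces the Home1 column under that assumption; the function name is illustrative, and assigning the leftover tail probability to the 5+ cell is an assumption about how that cell was computed.

```python
from math import exp, factorial

observed_home1 = [199, 121, 46, 11, 3, 0]   # goals 0, 1, 2, 3, 4, 5+

def expected_poisson_counts(observed):
    n = sum(observed)
    # Estimate lam by the sample mean (the empty 5+ cell contributes nothing).
    lam = sum(y * f for y, f in enumerate(observed)) / n
    probs = [exp(-lam) * lam**y / factorial(y) for y in range(len(observed) - 1)]
    probs.append(1 - sum(probs))             # 5+ gets the remaining tail probability
    return lam, [n * p for p in probs]

lam, expected = expected_poisson_counts(observed_home1)
print(round(lam, 4), [round(e, 2) for e in expected])
```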
Goodness of Fit Tests (Lumping 3 and More Together for Team Halves)
Goals         Home1    Road1    Home2    Road2
0            0.2048   0.2766   0.0005   0.1181
1            0.7407   1.8229   0.0195   0.5597
2            0.0563   0.3444   0.2381   0.0804
3+           0.3263   2.1967   0.1563   0.4928
Chi-Square   1.3282   4.6407   0.4144   1.2509
P-value      0.7225   0.2001   0.9373   0.7408
For each cell, the contribution to the Chi-Square statistic is obtained by:

$$X^{2} = \frac{(\text{observed} - \text{expected})^{2}}{\text{expected}}$$

Under the null hypothesis that the Poisson model fits, the chi-square statistic follows the $\chi^{2}_{3}$ distribution.
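As a worked instance, the Home1 test can be recomputed from the rounded observed and expected counts. This is a sketch with illustrative function names: the 3+ cell pools the 3, 4 and 5+ categories, and the p-value uses the closed-form upper tail of the chi-square distribution with 3 degrees of freedom, the reference distribution stated above. Small differences from the tabled values are due to rounding of the expected counts.

```python
from math import erfc, exp, pi, sqrt

observed = [199, 121, 46, 14]               # Home1: 0, 1, 2, 3+ goals (11 + 3 + 0)
expected = [192.72, 130.84, 44.42, 12.02]   # 3+ cell = 10.05 + 1.71 + 0.26

def chi_square_stat(obs, exp_counts):
    """Sum of (observed - expected)^2 / expected over the cells."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp_counts))

def chi2_upper_tail_3df(x):
    """P(X > x) for a chi-square variable with 3 degrees of freedom."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)

x2 = chi_square_stat(observed, expected)
p_value = chi2_upper_tail_3df(x2)
print(round(x2, 3), round(p_value, 3))
```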
Correlations Among Goals Scored
Correlations
          Home1     Road1     Home2     Road2
Home1    1.0000    0.0491    0.0262   -0.0587
Road1    0.0491    1.0000   -0.0388   -0.0475
Home2    0.0262   -0.0388    1.0000   -0.0771
Road2   -0.0587   -0.0475   -0.0771    1.0000
t-test (ρ=0)
          Home1     Road1     Home2     Road2
Home1       n/a    1.0047    0.5239   -1.0774
Road1    1.0047       n/a   -0.7259   -0.8808
Home2    0.5239   -0.7259       n/a   -1.3910
Road2   -1.0774   -0.8808   -1.3910       n/a
The t-statistic for testing whether the "population correlations" are 0:

$$t_{obs} = \frac{r}{\sqrt{\dfrac{1-r^{2}}{n-2}}}$$

Under the hypothesis that $\rho = 0$, this statistic is distributed (approximately) N(0,1).
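The statistic is straightforward to compute. The sketch below uses hypothetical inputs (r = 0.10 from n = 102 pairs), not values from the tables, and the function names are illustrative; the two-sided p-value relies on the normal approximation stated above.

```python
from math import erfc, sqrt

def correlation_t_stat(r, n):
    """t_obs = r / sqrt((1 - r^2) / (n - 2)) for a sample correlation r from n pairs."""
    return r / sqrt((1 - r**2) / (n - 2))

def two_sided_normal_p(t):
    """P(|Z| > |t|) for Z ~ N(0, 1)."""
    return erfc(abs(t) / sqrt(2))

# Hypothetical inputs chosen only for illustration.
t_obs = correlation_t_stat(0.10, 102)
p_value = two_sided_normal_p(t_obs)
print(round(t_obs, 4), round(p_value, 4))
```

A value of |t| well below 2, as in this illustration, gives no evidence against a zero population correlation.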
Observed and Expected Counts - Total Goals Per Game
[Chart: Observed and Expected series; x-axis: Goals, y-axis: Frequency]