Poisson Distribution
Goals in English Premier Football League – 2006/2007 Regular Season

Poisson Distribution
• Distribution often used to model the number of incidences of some characteristic in time or space:
  – Arrivals of customers in a queue
  – Numbers of flaws in a roll of fabric
  – Number of typos per page of text
• Distribution obtained as follows:
  – Break down the "area" into many small "pieces" (n pieces)
  – Each "piece" can have only 0 or 1 occurrences (p = P(1))
  – Let λ = np ≡ Average number of occurrences over "area"
  – Y ≡ # occurrences in "area" is sum of 0s & 1s over "pieces"
  – Y ~ Bin(n, p) with p = λ/n
  – Take limit of Binomial Distribution as n → ∞ with p = λ/n

Poisson Distribution - Derivation

\[
p(y) = \frac{n!}{y!\,(n-y)!}\, p^{y} (1-p)^{n-y}
     = \frac{n!}{y!\,(n-y)!} \left(\frac{\lambda}{n}\right)^{y} \left(1-\frac{\lambda}{n}\right)^{n-y}
\]

Taking limit as n → ∞:

\[
\begin{aligned}
\lim_{n\to\infty} p(y)
 &= \lim_{n\to\infty} \frac{n!}{y!\,(n-y)!} \left(\frac{\lambda}{n}\right)^{y} \left(1-\frac{\lambda}{n}\right)^{n-y} \\
 &= \frac{\lambda^{y}}{y!} \lim_{n\to\infty} \frac{n(n-1)\cdots(n-y+1)\,(n-y)!}{n^{y}\,(n-y)!} \left(1-\frac{\lambda}{n}\right)^{n} \left(\frac{n-\lambda}{n}\right)^{-y} \\
 &= \frac{\lambda^{y}}{y!} \lim_{n\to\infty} \frac{n(n-1)\cdots(n-y+1)}{(n-\lambda)^{y}} \left(1-\frac{\lambda}{n}\right)^{n} \\
 &= \frac{\lambda^{y}}{y!} \lim_{n\to\infty} \left(\frac{n}{n-\lambda}\right) \left(\frac{n-1}{n-\lambda}\right) \cdots \left(\frac{n-y+1}{n-\lambda}\right) \left(1-\frac{\lambda}{n}\right)^{n}
\end{aligned}
\]

Note:
\[
\lim_{n\to\infty} \frac{n}{n-\lambda} = \cdots = \lim_{n\to\infty} \frac{n-y+1}{n-\lambda} = 1 \quad \text{for all fixed } y
\]
\[
\Rightarrow \lim_{n\to\infty} p(y) = \frac{\lambda^{y}}{y!} \lim_{n\to\infty} \left(1-\frac{\lambda}{n}\right)^{n}
\]

From Calculus, we get:
\[
\lim_{n\to\infty} \left(1+\frac{a}{n}\right)^{n} = e^{a}
\]
\[
\Rightarrow \lim_{n\to\infty} p(y) = \frac{\lambda^{y}}{y!}\, e^{-\lambda} = \frac{e^{-\lambda} \lambda^{y}}{y!}, \qquad y = 0, 1, 2, \ldots
\]

Series expansion of exponential function:
\[
e^{x} = \sum_{i=0}^{\infty} \frac{x^{i}}{i!}
\]
\[
\Rightarrow \sum_{y=0}^{\infty} p(y) = \sum_{y=0}^{\infty} \frac{e^{-\lambda} \lambda^{y}}{y!}
 = e^{-\lambda} \sum_{y=0}^{\infty} \frac{\lambda^{y}}{y!} = e^{-\lambda} e^{\lambda} = 1
 \quad \Rightarrow \text{"Legitimate" Probability Distribution}
\]

EXCEL Functions:
  p(y): =POISSON(y, λ, 0)
  F(y): =POISSON(y, λ, 1)

Poisson Distribution - Expectations

\[
f(y) = \frac{e^{-\lambda} \lambda^{y}}{y!}, \qquad y = 0, 1, 2, \ldots
\]

\[
E(Y) = \sum_{y=0}^{\infty} y\, \frac{e^{-\lambda} \lambda^{y}}{y!}
     = \sum_{y=1}^{\infty} y\, \frac{e^{-\lambda} \lambda^{y}}{y!}
     = \sum_{y=1}^{\infty} \frac{e^{-\lambda} \lambda^{y}}{(y-1)!}
     = \lambda e^{-\lambda} \sum_{y=1}^{\infty} \frac{\lambda^{y-1}}{(y-1)!}
     = \lambda e^{-\lambda} e^{\lambda} = \lambda
\]

\[
E[Y(Y-1)] = \sum_{y=0}^{\infty} y(y-1)\, \frac{e^{-\lambda} \lambda^{y}}{y!}
          = \sum_{y=2}^{\infty} y(y-1)\, \frac{e^{-\lambda} \lambda^{y}}{y!}
          = \sum_{y=2}^{\infty} \frac{e^{-\lambda} \lambda^{y}}{(y-2)!}
          = \lambda^{2} e^{-\lambda} \sum_{y=2}^{\infty} \frac{\lambda^{y-2}}{(y-2)!}
          = \lambda^{2} e^{-\lambda} e^{\lambda} = \lambda^{2}
\]

\[
\Rightarrow E(Y^{2}) = E[Y(Y-1)] + E(Y) = \lambda^{2} + \lambda
\]
\[
\Rightarrow V(Y) = E(Y^{2}) - [E(Y)]^{2} = \lambda^{2} + \lambda - [\lambda]^{2} = \lambda
\quad \Rightarrow \sigma = \sqrt{\lambda}
\]

Example – English Premier League
• Total Goals Per Game (Both Teams)
  – Mean=2.47 Variance=2.49
• Goals by Team by Half
  – Home Team, 1st Half: Mean=0.68 Variance=0.73
  – Road Team, 1st Half: Mean=0.44 Variance=0.39
  – Home Team, 2nd Half: Mean=0.77 Variance=0.75
  – Road Team, 2nd Half: Mean=0.58 Variance=0.83*
*Does not reject based on Goodness-of-Fit test

Goals by Team by Half

Observed Counts
Goals        0        1        2        3        4       5+
All        828      492      157       31        9        0
Home1      199      121       46       11        3        0
Road1      236      122       21        0        1        0
Home2      175      134       56       12        3        0
Road2      218      115       34        8        2        0

Expected Counts Under Poisson Model
Goals        0        1        2        3        4       5+
All     818.97   506.47   156.60    32.28     4.99     0.69
Home1   192.72   130.84    44.42    10.05     1.71     0.26
Road1   244.22   107.97    23.87     3.52     0.39     0.04
Home2   175.30   135.63    52.47    13.53     2.62     0.46
Road2   212.99   123.31    35.69     6.89     1.00     0.13

Goodness of Fit Tests (Lumping 3 and More Together for Team Halves)
Goals        0        1        2       3+   Chi-Square   P-value
Home1   0.2048   0.7407   0.0563   0.3263       1.3282    0.7225
Road1   0.2766   1.8229   0.3444   2.1967       4.6407    0.2001
Home2   0.0005   0.0195   0.2381   0.1563       0.4144    0.9373
Road2   0.1181   0.5597   0.0804   0.4928       1.2509    0.7408

For each cell, the contribution to the Chi-Square statistic is obtained by:
\[
X^{2} = \frac{(\text{observed} - \text{expected})^{2}}{\text{expected}}
\]
Under the null hypothesis that the Poisson model fits, the chi-square statistic follows the \(\chi^{2}_{3}\) distribution.

Correlations Among Goals Scored

Correlations
           Home1     Road1     Home2     Road2
Home1     1.0000    0.0491    0.0262   -0.0587
Road1     0.0491    1.0000   -0.0388   -0.0475
Home2     0.0262   -0.0388    1.0000   -0.0771
Road2    -0.0587   -0.0475   -0.0771    1.0000

t-test (ρ=0)
           Home1     Road1     Home2     Road2
Home1        n/a    1.0047    0.5239   -1.0774
Road1     1.0047       n/a   -0.7259   -0.8808
Home2     0.5239   -0.7259       n/a   -1.3910
Road2    -1.0774   -0.8808   -1.3910       n/a

The t-statistic for testing whether the "population correlations" are 0:
\[
t_{obs} = \frac{r}{\sqrt{\dfrac{1 - r^{2}}{n - 2}}}
\]
Under the hypothesis that ρ = 0, this statistic is distributed (approximately) N(0,1).

Observed and Expected Counts - Total Goals Per Game
[Figure: Frequency (vertical axis) against Goals (horizontal axis), with Observed and Expected series.]
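The limiting argument in the derivation can be checked numerically. The sketch below is an illustration, not part of the original slides: it compares Bin(n, λ/n) probabilities with the Poisson probabilities for growing n, using λ = 2.47 (the mean total goals per game quoted in the example). The function names and the choice to compare only y = 0, ..., 10 are assumptions made for this illustration.

```python
import math


def binom_pmf(y, n, p):
    """P(Y = y) for Y ~ Bin(n, p)."""
    return math.comb(n, y) * p**y * (1 - p) ** (n - y)


def poisson_pmf(y, lam):
    """P(Y = y) for Y ~ Poisson(lam)."""
    return math.exp(-lam) * lam**y / math.factorial(y)


def max_abs_gap(n, lam, y_max=10):
    """Largest |Bin(n, lam/n) - Poisson(lam)| probability gap over y = 0..y_max."""
    return max(
        abs(binom_pmf(y, n, lam / n) - poisson_pmf(y, lam))
        for y in range(y_max + 1)
    )


lam = 2.47  # mean total goals per game, taken from the example slide

# The gap should shrink as the "area" is cut into more and more pieces.
gaps = {n: max_abs_gap(n, lam) for n in (10, 100, 1000, 10000)}
for n, gap in gaps.items():
    print(f"n = {n:>5}: max gap = {gap:.6f}")
```

The printed gaps should fall steadily as n grows, which is the numerical counterpart of taking n → ∞ with p = λ/n.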
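The expectation results E(Y) = λ and V(Y) = λ can also be confirmed by summing the series directly. This is a minimal sketch under the assumption that truncating the infinite sums at y = 100 is accurate enough for the small means in this example; the helper name `poisson_moments` is hypothetical.

```python
import math


def poisson_pmf(y, lam):
    """P(Y = y) for Y ~ Poisson(lam)."""
    return math.exp(-lam) * lam**y / math.factorial(y)


def poisson_moments(lam, y_max=100):
    """Return (total probability, mean, variance) from truncated sums.

    The variance follows the slide's route:
    V(Y) = E[Y(Y-1)] + E(Y) - [E(Y)]^2.
    """
    ys = range(y_max + 1)
    total = sum(poisson_pmf(y, lam) for y in ys)
    mean = sum(y * poisson_pmf(y, lam) for y in ys)
    factorial_moment = sum(y * (y - 1) * poisson_pmf(y, lam) for y in ys)
    variance = factorial_moment + mean - mean**2
    return total, mean, variance


# Means quoted on the example slide (total goals and the four team-halves).
for lam in (2.47, 0.68, 0.44, 0.77, 0.58):
    total, mean, variance = poisson_moments(lam)
    print(f"lambda={lam}: sum={total:.6f}, mean={mean:.6f}, var={variance:.6f}")
```

For each λ the sum should be 1 and the mean and variance should both equal λ, up to rounding. This is why the example compares each sample mean with its sample variance as a first check on the Poisson model.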
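The goodness-of-fit calculation for one row can be sketched as follows, using the Home1 observed counts from the table. Assumptions made for this illustration: λ is estimated by the sample mean of the observed counts, categories 3, 4 and 5+ are lumped as in the table, and the P-value uses the closed-form upper tail of the chi-square distribution with 3 degrees of freedom, matching the reference distribution stated on the slide. A common refinement subtracts one more degree of freedom because the mean is estimated from the data; that is not what the table does.

```python
import math

# Home1 observed counts for 0, 1, 2, 3, 4, 5+ goals (from the slide).
observed = [199, 121, 46, 11, 3, 0]
n_games = sum(observed)

# Estimate lambda by the sample mean (the 5+ cell is empty, so no games are lost).
lam = sum(goals * count for goals, count in enumerate(observed)) / n_games


def poisson_pmf(y, lam):
    """P(Y = y) for Y ~ Poisson(lam)."""
    return math.exp(-lam) * lam**y / math.factorial(y)


# Expected counts for 0, 1, 2 goals; the 3+ cell takes the remainder.
expected = [n_games * poisson_pmf(y, lam) for y in range(3)]
expected.append(n_games - sum(expected))

# Lump observed 3, 4 and 5+ into a single 3+ cell.
observed_lumped = observed[:3] + [sum(observed[3:])]

# Cell contributions: (observed - expected)^2 / expected.
contributions = [
    (o - e) ** 2 / e for o, e in zip(observed_lumped, expected)
]
chi_square = sum(contributions)


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi2_sf_df3(chi_square)

print("lambda estimate:", round(lam, 4))
print("expected counts:", [round(e, 2) for e in expected])
print("contributions:  ", [round(c, 4) for c in contributions])
print("chi-square:", round(chi_square, 4), " P-value:", round(p_value, 4))
```

The results should land close to the Home1 row of the goodness-of-fit table (chi-square about 1.33, P-value about 0.72); small differences in the last digit come from rounding.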