Binomial

This is a brief description of the Binomial Distribution and how we can apply it to set thresholds for our models.

We know there is some uncertainty in predictions. Suppose we toss a coin 8 times. We expect it to come up heads 4 times. If it came up heads 5 times, or perhaps 3 times, we would think those outcomes lie within the realm of normal variation. But if it came up heads 6 times, we might suspect the fairness of the coin.

With a portfolio of 10,000 loans, our forecasting model might predict a default rate of 2% within one year, or 200 loans. If the actual number turned out to be 189 or 205, we would attribute the difference to normal variation in our uncertain world. But if the actual number was 165 or 375, should we also have expected it? Or would these numbers differ sufficiently from the expected 200 that we should suspect our model?

We can apply the Binomial Distribution to answer these questions. We can describe the distribution with these expressions:

U(N,j) = [N*(N-1)*(N-2)*...*(j+2)*(j+1)] / [(N-j)*(N-j-1)*(N-j-2)*...*2*1]

V(N,j,p) = U(N,j) * p^j * (1-p)^(N-j)

Note: If N=j, then U(N,j)=1.

V(N,j,p) is the probability that exactly j of N loans will default (or j of N coin tosses will come up heads), if the probability of default is p.

For our coins, U(8,6) = [8*7]/[2*1] = 28, and V(8,6,0.5) = 28 * (0.5)^6 * (1-0.5)^(8-6) = 0.1094. We can expect exactly 6 heads in 8 tosses about 11 percent of the time.

The average number of defaults is Np. The variance (square of the standard deviation) is Np(1-p).

In the eighteenth century, Abraham de Moivre proved mathematically that the normal (or Gaussian) distribution approximates the binomial distribution. Indeed, for populations where N>30 and p is not too close to 0 or 1, the two distributions are equivalent for nearly all practical purposes. The practical equivalence of the normal distribution implies we can use well-known proportions to define our thresholds of expected normal variation.
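To make the expressions above concrete, here is a minimal Python sketch that reproduces the coin example and the mean and standard deviation for the loan portfolio. The function names are hypothetical and chosen only for illustration. The sketch uses the standard library's math.comb for U(N,j) and assumes each trial is independent with the same probability p.

```python
import math


def binomial_probability(n: int, j: int, p: float) -> float:
    """Probability of exactly j successes in n trials, i.e. V(N, j, p)."""
    # math.comb(n, j) is the binomial coefficient, the same quantity as U(N, j).
    return math.comb(n, j) * p**j * (1 - p) ** (n - j)


def binomial_mean_and_std(n: int, p: float) -> tuple[float, float]:
    """Mean Np and standard deviation sqrt(Np(1-p)) of the number of successes."""
    mean = n * p
    std = math.sqrt(n * p * (1 - p))
    return mean, std


if __name__ == "__main__":
    # Coin example from the text: exactly 6 heads in 8 tosses of a fair coin.
    print(round(binomial_probability(8, 6, 0.5), 4))  # 0.1094

    # Loan example from the text: N = 10,000 loans, p = 0.02.
    mean, std = binomial_mean_and_std(10_000, 0.02)
    print(f"mean = {mean:.1f}, standard deviation = {std:.1f}")
```

For the loan portfolio this gives a mean of about 200 defaults and a standard deviation of about 14.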
In 95% of the cases, the actual outcome j will fall in the interval

Np - 2*sqrt(Np(1-p)) < j < Np + 2*sqrt(Np(1-p))

In 68% of the cases, the actual outcome j will fall in the interval

Np - 1*sqrt(Np(1-p)) < j < Np + 1*sqrt(Np(1-p))

For our N=10,000 loans with probability p=0.02 of default, Np = 200 and sqrt(Np(1-p)) = sqrt(196) = 14, so we calculate

95% interval: 172 < j < 228
68% interval: 186 < j < 214

About 95% of cases will have 172 to 228 defaults. About 68% of cases will have 186 to 214 defaults. Both 165 and 375 defaults fall outside the 95% interval, and 375 lies far outside it. If we see 375 actual defaults, then we should suspect the model.

For our model performance thresholds, we can use the 95% and 68% intervals.

Thresholds

Outside the 95% interval:
Actual < (Np - 2*sqrt(Np(1-p))) or (Np + 2*sqrt(Np(1-p))) < Actual

Between the 68% and 95% intervals:
(Np - 2*sqrt(Np(1-p))) <= Actual <= (Np - 1*sqrt(Np(1-p))) or (Np + 1*sqrt(Np(1-p))) <= Actual <= (Np + 2*sqrt(Np(1-p)))

Inside the 68% interval:
(Np - 1*sqrt(Np(1-p))) < Actual < (Np + 1*sqrt(Np(1-p)))

For more information, please see:

Charles Grinstead, J. Laurie Snell, Introduction to Probability, Second Revised Edition (American Mathematical Society, 2003, retrieved 2011-12-16 from http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/amsbook.mac.pdf); approximating Binomial Distribution with Normal Distribution: p. 329; Abraham de Moivre: p. 337.

Richard Williams, The Binomial Distribution (University of Notre Dame, 2004, retrieved 2011-12-16 from http://www.nd.edu/~rwilliam/stats1/x13.pdf), p. 11.
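As an illustration of the interval and threshold rules described in this note, the following Python sketch classifies an actual default count against the 68% and 95% intervals. The function names and the band labels it returns are hypothetical. The note itself does not name the three bands, so the labels are only descriptive.

```python
import math


def variation_intervals(n: int, p: float) -> dict[str, tuple[float, float]]:
    """Return the approximate 68% and 95% intervals around the expected count Np."""
    mean = n * p
    std = math.sqrt(n * p * (1 - p))
    return {
        "68%": (mean - 1 * std, mean + 1 * std),
        "95%": (mean - 2 * std, mean + 2 * std),
    }


def classify_actual(actual: float, n: int, p: float) -> str:
    """Place an actual count into one of the three threshold bands.

    The returned labels are illustrative names, not taken from the note.
    """
    intervals = variation_intervals(n, p)
    low_68, high_68 = intervals["68%"]
    low_95, high_95 = intervals["95%"]

    # Strictly outside the two-standard-deviation interval.
    if actual < low_95 or actual > high_95:
        return "outside 95% interval"
    # Strictly inside the one-standard-deviation interval.
    if low_68 < actual < high_68:
        return "inside 68% interval"
    # Everything else, boundaries included, lies between the two intervals.
    return "between 68% and 95% intervals"


if __name__ == "__main__":
    # Loan example from the note: N = 10,000 loans, p = 0.02.
    for actual in (189, 205, 180, 165, 375):
        print(actual, classify_actual(actual, 10_000, 0.02))
```

With the loan example, 189 and 205 fall inside the 68% interval, while 165 and 375 fall outside the 95% interval.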