Binomial Probability Distribution

For the binomial distribution, P is the probability of m successes out of N trials. Here p is the probability of a success and q = 1 - p is the probability of a failure; there are only two choices in a binomial process. Tossing a coin N times and asking for m heads is a binomial process.

P(m, N, p) = \binom{N}{m} p^m q^{N-m}, \qquad \binom{N}{m} = C(N, m) = \frac{N!}{m!(N-m)!}

The binomial coefficient keeps track of the number of ways ("combinations") we can get the desired outcome. For example, 2 heads in 4 tosses: HHTT, HTHT, HTTH, THHT, THTH, TTHH.

Does this formula make sense, e.g. if we sum over all possibilities do we get 1? To show that the distribution is normalized properly, first remember the Binomial Theorem:

(a + b)^k = \sum_{l=0}^{k} \binom{k}{l} a^{k-l} b^{l}

For this case a = q = 1 - p and b = p, and (by definition) a + b = 1. Therefore

\sum_{m=0}^{N} P(m, N, p) = \sum_{m=0}^{N} \binom{N}{m} p^m q^{N-m} = (p + q)^N = 1

Thus the distribution is normalized properly.

What is the mean of this distribution?

\mu = \frac{\sum_{m=0}^{N} m\, P(m, N, p)}{\sum_{m=0}^{N} P(m, N, p)} = \sum_{m=0}^{N} m \binom{N}{m} p^m q^{N-m}

A cute way of evaluating the above sum is to take the derivative of the normalization condition with respect to p:

\frac{\partial}{\partial p}\left[\sum_{m=0}^{N} \binom{N}{m} p^m (1-p)^{N-m}\right] = \sum_{m=0}^{N} \binom{N}{m}\left[m\, p^{m-1}(1-p)^{N-m} - (N-m)\, p^m (1-p)^{N-m-1}\right] = 0

Multiplying through by p and using (1-p)^{N-m-1} = (1-p)^{N-m}/(1-p) gives

\mu = \sum_{m=0}^{N} \binom{N}{m} m\, p^m (1-p)^{N-m} = \frac{p}{1-p} \sum_{m=0}^{N} \binom{N}{m} (N-m)\, p^m (1-p)^{N-m} = \frac{p}{1-p}\left[N(1) - \mu\right]

so that \mu(1-p) = p(N - \mu), i.e.

\mu = Np

What's the variance of a binomial distribution? Using a trick similar to the one used for the average we find:

\sigma^2 = \frac{\sum_{m=0}^{N} (m - \mu)^2 P(m, N, p)}{\sum_{m=0}^{N} P(m, N, p)} = Npq

Detection efficiency and its "error": suppose you observed m special events (or successes) in a sample of N events. The measured probability (sometimes called "efficiency") for a special event to occur is ε = m/N. What is the error (standard deviation σ_ε) in ε? Since N is a fixed quantity, it is plausible (we will show it soon) that the error in ε is related to the error (standard deviation σ_m) in m by σ_ε = σ_m / N. This leads to:

\sigma_\varepsilon = \frac{\sigma_m}{N} = \frac{\sqrt{Npq}}{N} = \frac{\sqrt{N\varepsilon(1-\varepsilon)}}{N} = \sqrt{\frac{\varepsilon(1-\varepsilon)}{N}}

This is sometimes called the "error on the efficiency". Thus you want to have as large a sample (N) as possible to reduce the uncertainty in the probability measurement! Note: σ_ε, the "error in the efficiency", goes to 0 as ε → 0 or ε → 1. (This is NOT a gaussian, so don't stick it into a Gaussian pdf to calculate a probability.)

Example: when a γ-ray goes through material there is a chance that it will convert into an electron-positron pair, e+e-. Let's assume the probability for conversion is 10%. If 100 γ's go through this material, on average how many will convert to e+e-?

μ = Np = 100(0.1) = 10 conversions

Consider the case where the γ's come from π0's; a π0 decays to two γ's most (98.8%) of the time. We can ask the following:

What is the probability that both γ's will convert?
P(2) = probability of 2/2 = (0.1)^2 = 0.01 = 1%

What is the probability that exactly one γ will convert?
P(1) = probability of 1/2 = [2!/(1!1!)](0.1)^1(0.9)^1 = 18%

What is the probability that neither γ will convert?
P(0) = probability of 0/2 = [2!/(0!2!)](0.1)^0(0.9)^2 = 81%

Note: P(2) + P(1) + P(0) = 100%.

Finally, the probability of at least one conversion is:
P(≥1) = 1 - P(0) = 19%
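These numbers are easy to check numerically. Here is a minimal Python sketch; the helper names binomial_pmf and efficiency_error are purely illustrative, not from the notes:

```python
from math import comb, sqrt

def binomial_pmf(m, N, p):
    """P(m, N, p) = C(N, m) * p^m * (1-p)^(N-m)."""
    return comb(N, m) * p**m * (1.0 - p)**(N - m)

# pi0 -> two gammas, each converting with p = 0.1
p = 0.1
print(binomial_pmf(2, 2, p))      # both convert:     0.01 (1%)
print(binomial_pmf(1, 2, p))      # exactly one:      0.18 (18%)
print(binomial_pmf(0, 2, p))      # neither converts: 0.81 (81%)
print(1 - binomial_pmf(0, 2, p))  # at least one:     0.19 (19%)

def efficiency_error(m, N):
    """sigma_eps = sqrt(eps*(1-eps)/N) for eps = m/N."""
    eps = m / N
    return sqrt(eps * (1.0 - eps) / N)

# 10 conversions observed out of 100 photons: eps = 0.10 +- 0.03
print(efficiency_error(10, 100))
```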
Poisson Probability Distribution

Another important discrete distribution is the Poisson distribution. Consider the following conditions:
a) p is very small and approaches 0. For example, suppose we had a 100-sided die instead of a 6-sided die; here p = 1/100 instead of 1/6. With a 1000-sided die, p = 1/1000, etc.
b) N is very large and approaches ∞. For example, instead of throwing 2 dice, we could throw 100 or 1000 dice.
c) The product Np is finite.

Typical Poisson processes: radioactive decay; the number of Prussian soldiers kicked to death by horses per year per army corps(!); quality control and failure-rate predictions.

A good example of the above conditions occurs when one considers radioactive decay. Suppose we have 25 mg of an element; this is about 10^20 atoms. Suppose the lifetime (τ) of this element is 10^12 years ≈ 5×10^19 seconds. The probability of a given nucleus decaying in one second is 1/τ = 2×10^-20 per second. For this example:
N = 10^20 (very large), p = 2×10^-20 (very small), Np = 2 (finite!)

We can derive an expression for the Poisson distribution by taking the appropriate limits of the binomial distribution:

P(m, N, p) = \frac{N!}{m!(N-m)!} p^m q^{N-m}

Using condition b) (N ≫ m) we obtain:

\frac{N!}{(N-m)!} = \frac{N(N-1)\cdots(N-m+1)(N-m)!}{(N-m)!} \approx N^m

q^{N-m} = (1-p)^{N-m} = 1 - p(N-m) + \frac{p^2 (N-m)(N-m-1)}{2!} - \cdots \approx 1 - pN + \frac{(pN)^2}{2!} - \cdots \approx e^{-pN}

Putting this all together we obtain:

P(m, N, p) \approx \frac{N^m p^m e^{-pN}}{m!} = \frac{\mu^m e^{-\mu}}{m!}

Here we've let μ = pN. It is easy to show that:
μ = Np = mean of a Poisson distribution
σ^2 = Np = μ = variance of a Poisson distribution.
Note: m is always an integer ≥ 0; however, μ does not have to be an integer. In a counting experiment, if you observe m events the estimate of the mean is μ = m ± √m.

Radioactivity example:
a) What's the probability of zero decays in one second if the average is μ = 2 decays/sec?

P(0, 2) = \frac{e^{-2}\, 2^0}{0!} = \frac{e^{-2}\cdot 1}{1} = e^{-2} = 0.135 → 13.5%

b) What's the probability of more than one decay in one second if the average is μ = 2 decays/sec?

P(>1, 2) = 1 - P(0, 2) - P(1, 2) = 1 - \frac{e^{-2}\, 2^0}{0!} - \frac{e^{-2}\, 2^1}{1!} = 1 - e^{-2} - 2e^{-2} = 0.594 → 59.4%

c) Estimate the most probable number of decays per second.
We want the m* for which ∂P(m, μ)/∂m = 0. To solve this problem it is convenient to maximize ln P(m, μ) instead of P(m, μ):

\ln P(m, \mu) = \ln\left(\frac{e^{-\mu}\mu^m}{m!}\right) = -\mu + m\ln\mu - \ln m!

In order to handle the factorial when we take the derivative we use Stirling's approximation, ln(m!) ≈ m ln m - m.
(For example: ln 10! = 15.10 vs. 10 ln 10 - 10 = 13.03, a 14% error; ln 50! = 148.48 vs. 50 ln 50 - 50 = 145.60, a 1.9% error.)

\frac{\partial}{\partial m}\ln P(m, \mu)\Big|_{m^*} = \frac{\partial}{\partial m}\left(-\mu + m\ln\mu - m\ln m + m\right)\Big|_{m^*} = \ln\mu - \ln m^* - 1 + 1 = 0 \;\Rightarrow\; m^* = \mu

In this example the most probable value for m is just the average of the distribution. Therefore if you observed m events in an experiment, the error on m is √m.
Caution: the above derivation is only approximate since we used Stirling's approximation, which is only valid for large m. Another subtle point is that, strictly speaking, m can only take on integer values while μ is not restricted to be an integer.

[Figure: comparison of binomial and Poisson distributions with mean μ = 1; one panel shows a binomial with N = 3, p = 1/3 and the other a binomial with N = 10, p = 0.1, each overlaid with a Poisson. Not much difference between them here!]
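A minimal Python check of the radioactivity example above, and of the statement that the Poisson is the small-p, large-N limit of the binomial (the helper poisson_pmf is purely illustrative):

```python
from math import comb, exp, factorial

def poisson_pmf(m, mu):
    """P(m, mu) = mu^m * e^(-mu) / m!"""
    return mu**m * exp(-mu) / factorial(m)

mu = 2.0  # average of 2 decays/sec
print(poisson_pmf(0, mu))                           # ~0.135 (13.5%)
print(1 - poisson_pmf(0, mu) - poisson_pmf(1, mu))  # ~0.594 (59.4%)

# Binomial with Np = 2 held fixed approaches the Poisson as N grows
N, p = 1000, 2.0 / 1000
print(comb(N, 2) * p**2 * (1 - p)**(N - 2))  # ~0.271
print(poisson_pmf(2, mu))                    # ~0.271
```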
Counting the number of cosmic rays that pass through a detector in 15 sec intervals: the data are compared with a Poisson using the measured average number of cosmic rays passing through the detector in eighty-one 15 sec intervals (μ = 5.4). Error bars are (usually) calculated using √n_i (n_i = number in a bin). Why? Assume we have N total counts and the probability to fall in bin i is p_i. For a given bin we have a binomial distribution (you're either in or out). The expected average number in a given bin is Np_i and the variance is Np_i(1 - p_i) = n_i(1 - p_i). If we have a lot of bins, the probability of an event falling into any given bin is small, so (1 - p_i) ≈ 1 and the error is ≈ √n_i.

[Figure: histogram of the number of cosmic rays in a 15 sec interval (81 intervals), overlaid with a Poisson with μ = 5.4.]

cosmic rays per 15 sec interval | occurrences
 0 |  0
 1 |  2
 2 |  9
 3 | 11
 4 |  8
 5 | 10
 6 | 17
 7 |  6
 8 |  8
 9 |  6
10 |  3
11 |  0
12 |  0
13 |  1

In our example the largest p_i = 17/81 = 0.21, so the largest correction factor is (1 - 0.21)^{1/2} = 0.88.

Gaussian Probability Distribution

The Gaussian probability distribution (or "bell-shaped curve" or Normal distribution) is perhaps the most used distribution in all of science. Unlike the binomial and Poisson distributions, the Gaussian is a continuous distribution. It is given by:

p(y) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(y-\mu)^2}{2\sigma^2}}

with μ = mean of the distribution (also at the same place as the mode and median), σ^2 = variance of the distribution, and y a continuous variable (-∞ < y < ∞).

The probability (P) of y being in the range [a, b] is given by an integral:

P(a \le y \le b) = \frac{1}{\sigma\sqrt{2\pi}} \int_a^b e^{-\frac{(y-\mu)^2}{2\sigma^2}}\, dy

Since this integral cannot be evaluated in closed form for arbitrary a and b (at least no one's figured out how to do it in the last couple of hundred years), the values of the integral have to be looked up in a table. The total area under the curve is normalized to one; in terms of the probability integral we have:

P(-\infty \le y \le \infty) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{(y-\mu)^2}{2\sigma^2}}\, dy = 1

Quite often we talk about a measurement being a certain number of standard deviations (σ) away from the mean (μ) of the Gaussian. We can associate a probability for a measurement to be |y - μ| ≥ nσ from the mean just by calculating the area outside of this region:

n      Prob. of exceeding ±nσ
0.67   0.5
1      0.32
2      0.05
3      0.003
4      0.00006

It is very unlikely (<0.3%) that a measurement taken at random from a gaussian pdf will be more than 3σ from the true mean of the distribution.

Central Limit Theorem

Why is the gaussian pdf so important? "Things that are the result of the addition of lots of small effects tend to become Gaussian." The above is a crude statement of the Central Limit Theorem. A more exact statement is:

Let Y_1, Y_2, ..., Y_n be an infinite sequence of independent random variables, each with the same probability distribution. Suppose that the mean (μ) and variance (σ^2) of this distribution are both finite. Then for any numbers a and b:

\lim_{n\to\infty} P\left(a < \frac{Y_1 + Y_2 + \cdots + Y_n - n\mu}{\sigma\sqrt{n}} < b\right) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-y^2/2}\, dy

(Actually, the Y's can even be from different pdf's!)

Thus the C.L.T. tells us that under a wide range of circumstances the probability distribution that describes the sum of random variables tends towards a Gaussian distribution as the number of terms in the sum → ∞. Alternatively, in terms of the sample mean Ȳ:

\lim_{n\to\infty} P\left(a < \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} < b\right) = \lim_{n\to\infty} P\left(a < \frac{\bar{Y} - \mu}{\sigma_m} < b\right) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-y^2/2}\, dy

Note: σ_m = σ/√n is sometimes called "the error in the mean" (more on that later).

For the CLT to be valid:
- μ and σ of the pdf must be finite.
- No one term in the sum should dominate the sum.

Best illustration of the CLT:
a) Take 12 numbers (r_i) from your computer's random number generator.
b) Add them together.
c) Subtract 6.
d) You get a number that is from a gaussian pdf!

The computer's random number generator gives numbers distributed uniformly in the interval [0, 1]. A uniform pdf in [0, 1] has μ = 1/2 and σ^2 = 1/12. With n = 12 terms in the sum:

P\left(a < \frac{Y_1 + \cdots + Y_n - n\mu}{\sigma\sqrt{n}} < b\right) = P\left(a < \frac{\sum_{i=1}^{12} r_i - 12(1/2)}{\sqrt{1/12}\,\sqrt{12}} < b\right) = P\left(a < \sum_{i=1}^{12} r_i - 6 < b\right) \approx \frac{1}{\sqrt{2\pi}} \int_a^b e^{-y^2/2}\, dy

Thus the sum of 12 uniform random numbers minus 6 is distributed as if it came from a gaussian pdf with μ = 0 and σ = 1 (see the sketch below).
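A minimal Python version of this recipe (the 5000-sample size is just for illustration):

```python
import random
from statistics import mean, stdev

# Each entry: sum of 12 uniform [0,1] numbers minus 6.
# By the CLT this should look Gaussian with mu = 0 and sigma = 1.
samples = [sum(random.random() for _ in range(12)) - 6.0
           for _ in range(5000)]

print(mean(samples))   # close to 0
print(stdev(samples))  # close to 1

# Fraction of samples beyond 3 sigma: small (~0.3% for a true Gaussian)
print(sum(abs(x) > 3 for x in samples) / len(samples))
```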
[Figure: histograms of A) 5000 single random numbers, B) 5000 pairs (r_1 + r_2), C) 5000 triplets (r_1 + r_2 + r_3), D) 5000 12-plets (r_1 + ... + r_12), and E) 5000 12-plets (r_1 + ... + r_12 - 6), the last compared with a Gaussian with μ = 0 and σ = 1. In this case n = 12 is already close enough to ∞.]

Example: an electromagnetic calorimeter is being made out of a sandwich of lead and plastic scintillator. There are 25 pieces of lead and 25 pieces of plastic, and each piece is nominally 1 cm thick. The spec on the thickness is ±0.5 mm, with the machining error uniform in [-0.5, 0.5] mm. The calorimeter has to fit inside an opening of 50.1 cm. What is the probability that it won't fit?

Since the machining errors come from a uniform distribution with a well-defined mean and variance, the Central Limit Theorem is applicable:

\lim_{n\to\infty} P\left(a < \frac{Y_1 + Y_2 + \cdots + Y_n - n\mu}{\sigma\sqrt{n}} < b\right) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-y^2/2}\, dy

Here n = 50 pieces, and for each piece the error has μ = 0 and σ = √(1/12) mm. The upper limit corresponds to many large machining errors, all +0.5 mm:

b = \frac{Y_1 + \cdots + Y_n - n\mu}{\sigma\sqrt{n}} = \frac{50(0.5) - 50(0)}{\sqrt{1/12}\,\sqrt{50}} = 12.2

The lower limit a corresponds to a total machining error of 1 mm, the excess over the nominal 50 cm at which the stack just fails to fit in the opening:

a = \frac{1 - 50(0)}{\sqrt{1/12}\,\sqrt{50}} = 0.49

The probability for the stack to be longer than 50.1 cm is:

P = \frac{1}{\sqrt{2\pi}} \int_{0.49}^{12.2} e^{-y^2/2}\, dy \approx 0.31

There's a 31% chance the calorimeter won't fit inside the opening (and a 100% chance someone will get fired if it doesn't fit inside the box...).

When Doesn't the Central Limit Theorem Apply?

Case I) The pdf does not have a well-defined mean or variance. For example, the Breit-Wigner distribution does not have a well-defined variance:

BW(m) = \frac{1}{2\pi}\,\frac{\Gamma}{(m - m_0)^2 + (\Gamma/2)^2}

It describes the shape of a resonance, e.g. the K*. It is normalized,

\int_{-\infty}^{\infty} BW(m)\, dm = 1,

and it has a well-defined average,

\int_{-\infty}^{\infty} m\, BW(m)\, dm = m_0,

but the variance is undefined, since

\int_{-\infty}^{\infty} m^2\, BW(m)\, dm \to \infty.

Case II) A physical process where one term in the sum dominates the sum:
i) Multiple scattering: as a charged particle moves through material it undergoes many elastic ("Rutherford") scatterings. Most scatterings produce small angular deflections (dσ/dΩ ~ θ^-4), but every once in a while a scattering produces a very large deflection. If we neglect the large scatterings, the angle θ_plane is gaussian distributed; the characteristic width of the θ distribution depends on the material thickness and the particle's charge and momentum.
ii) The spread in range of a stopping particle (straggling): a small number of collisions where the particle loses a lot of its energy dominates the sum.
iii) Energy loss of a charged particle going through a gas: described by a "Landau" distribution (very long "tail").
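To see Case I in action, here is a small illustrative Python experiment (a numerical demonstration, not a derivation): it draws values from the Γ = 2, m_0 = 0 Breit-Wigner, which is the standard Cauchy distribution, and shows that the sample mean does not settle down as the sample grows the way it would for a finite-variance pdf.

```python
import random
from math import pi, tan
from statistics import mean

def breit_wigner():
    """Standard Cauchy (Breit-Wigner with m0 = 0, Gamma = 2) via the inverse CDF."""
    return tan(pi * (random.random() - 0.5))

random.seed(1)
for n in (100, 10_000, 1_000_000):
    # For a finite-variance pdf the sample mean would narrow like 1/sqrt(n);
    # for the Breit-Wigner it stays just as scattered, so the CLT does not apply.
    print(n, mean(breit_wigner() for _ in range(n)))
```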