Gaussian Distribution

The Gaussian distribution is the most widely known and most widely used distribution:

$$P(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/2\sigma^2}$$

The mean is µ and the variance is σ². All Gaussians have the same shape and are symmetric, in contrast to the Binomial or Poisson distributions, and they are easily characterized. E.g.,

• 68.3% of the probability lies within 1 standard deviation of the mean
• 95.45% within 2 standard deviations
• 99.7% within 3 standard deviations
• FWHM = 2.35σ

Derivation of the Gauss Distribution

We consider two derivations of the Gauss function. First, the derivation starting from the binomial distribution. The appropriate limit in this case is N→∞ and r→∞, with p not too small and not too large. We have already seen that this limit leads to a symmetric distribution.

[Figure: Binomial with N=50, p=0.5 compared to a Gaussian with µ=25 and σ² = Np(1−p).]

We will need Stirling's approximation,

$$\ln n! \approx \ln\sqrt{2\pi n} + n\ln n - n, \quad\text{or}\quad n! \approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^n,$$

which we now substitute into the binomial formula:

$$f(r;N,p) = \frac{N!}{r!\,(N-r)!}\, p^r (1-p)^{N-r} \approx \frac{\sqrt{2\pi N}\,(N/e)^N}{\sqrt{2\pi r}\,(r/e)^r\,\sqrt{2\pi(N-r)}\,\left((N-r)/e\right)^{N-r}}\, p^r (1-p)^{N-r}$$

$$= \frac{1}{\sqrt{2\pi}}\sqrt{\frac{N}{r(N-r)}}\;\frac{N^N}{r^r\,(N-r)^{N-r}}\; p^r (1-p)^{N-r}$$

$$= \frac{1}{\sqrt{2\pi N}}\;\frac{N^{N+1}}{r^{r+1/2}\,(N-r)^{N-r+1/2}}\; p^r (1-p)^{N-r}$$

or

$$f(r;N,p) \approx \frac{1}{\sqrt{2\pi N}}\left(\frac{r}{N}\right)^{-r-1/2}\left(\frac{N-r}{N}\right)^{-N+r-1/2} p^r (1-p)^{N-r}.$$

This doesn't look much like the Gaussian yet …

Derivation, continued

Change variables to r = Np + ξ, so that ξ measures the distance between the mean of the binomial, Np, and the measured quantity, r. The variance of a binomial is Np(1−p), so the typical deviation of r from Np is given by

$$\sigma = \sqrt{Np(1-p)}.$$

Terms of the form ξ/r will therefore be of order 1/√N and will be small. Furthermore,

$$\ln\left(1+\frac{\xi}{N}\right) \approx \frac{\xi}{N} - \frac{1}{2}\left(\frac{\xi}{N}\right)^2,$$

and similarly with ξ/Np or ξ/N(1−p) in place of ξ/N. First we rewrite the result above in terms of ξ:

$$\left(\frac{r}{N}\right)^{-r-1/2} = \left(p + \frac{\xi}{N}\right)^{-r-1/2} = p^{-r-1/2}\left(1 + \frac{\xi}{Np}\right)^{-r-1/2}$$

$$\left(\frac{N-r}{N}\right)^{-N+r-1/2} = (1-p)^{-N+r-1/2}\left(1 - \frac{\xi}{N(1-p)}\right)^{-N+r-1/2}$$

so that

$$f(r;N,p) \approx \frac{1}{\sqrt{2\pi N p(1-p)}}\left(1 + \frac{\xi}{Np}\right)^{-r-1/2}\left(1 - \frac{\xi}{N(1-p)}\right)^{-N+r-1/2}.$$

Rewriting in exponential form and using the approximations from the previous page (with r = Np + ξ):

$$f(r;N,p) \approx \frac{1}{\sqrt{2\pi Np(1-p)}}\exp\left[\left(-r-\tfrac12\right)\ln\left(1+\frac{\xi}{Np}\right) + \left(-N+r-\tfrac12\right)\ln\left(1-\frac{\xi}{N(1-p)}\right)\right]$$

$$= \frac{1}{\sqrt{2\pi Np(1-p)}}\exp\left[\left(-Np-\xi-\tfrac12\right)\left(\frac{\xi}{Np} - \frac{\xi^2}{2(Np)^2}\right) + \left(-N(1-p)+\xi-\tfrac12\right)\left(-\frac{\xi}{N(1-p)} - \frac{\xi^2}{2(N(1-p))^2}\right)\right]$$

Keeping terms up to order ξ²/N and dropping smaller ones,

$$\approx \frac{1}{\sqrt{2\pi Np(1-p)}}\exp\left[-\xi - \frac{\xi^2}{2Np} + \xi - \frac{\xi^2}{2N(1-p)}\right] = \frac{1}{\sqrt{2\pi Np(1-p)}}\exp\left[-\frac{\xi^2}{2Np(1-p)}\right], \qquad \sigma^2 = Np(1-p).$$

A Different Derivation

Here we follow the argument used by Gauss. Gauss wanted to solve the following problem: what is the form of the function ϕ(xᵢ − µ) which gives maximum probability for µ = arithmetic mean of the observed values {xᵢ}?

$$f(\vec{x}\,|\,\mu) = \varphi(x_1-\mu)\,\varphi(x_2-\mu)\cdots\varphi(x_n-\mu)$$

is the probability to get {xᵢ}. Gauss wanted this function to peak at

$$\mu = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad\text{i.e.,}\qquad \left.\frac{df}{d\mu}\right|_{\mu=\bar{x}} = 0 \;\Rightarrow\; \left.\frac{d}{d\mu}\prod_{i=1}^{n}\varphi(x_i-\mu)\right|_{\mu=\bar{x}} = 0.$$

Assuming f(µ = x̄) ≠ 0,

$$\sum_i \frac{\varphi'(x_i-\bar{x})}{\varphi(x_i-\bar{x})} = 0.$$

Define ψ = ϕ′/ϕ and zᵢ = xᵢ − x̄. Then

$$\sum_i z_i = 0 \quad\text{and}\quad \sum_i \psi(z_i) = 0 \;\text{ for all possible } z_i, \;\text{ so }\; \psi \propto z.$$

Gauss' Derivation, continued

$$\psi = kz \;\Rightarrow\; \frac{1}{\varphi}\frac{d\varphi}{dz} = kz, \quad\text{or}\quad \varphi(z) \propto \exp\left(\frac{kz^2}{2}\right)$$

(with k < 0 for a normalizable distribution). We get the prefactor via normalization.

Lessons:
• The Binomial looks like a Gaussian for large enough N and moderate p (see the numerical check below).
• The Poisson also looks like a Gaussian for large enough ν.
• Gauss' formula follows from general arguments (maximizing the posterior probability).
• Gauss' formula is much easier to use than the Binomial or Poisson, so use it when you're allowed.
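As a quick numerical check of the binomial → Gaussian limit derived above, the following sketch compares the Binomial(N=50, p=0.5) probabilities with the Gaussian of the same mean and variance, the pairing shown in the figure; the particular r values printed are an arbitrary choice:

```python
import math

# Compare the Binomial(N, p) pmf with the Gaussian having mu = Np and
# sigma^2 = Np(1-p), the limiting form derived above.
N, p = 50, 0.5
mu, var = N * p, N * p * (1 - p)

def binom_pmf(r):
    return math.comb(N, r) * p**r * (1 - p)**(N - r)

def gauss_pdf(x):
    return math.exp(-(x - mu)**2 / (2 * var)) / math.sqrt(2 * math.pi * var)

for r in range(15, 36, 5):
    print(f"r={r:2d}  binomial={binom_pmf(r):.5f}  gaussian={gauss_pdf(r):.5f}")
```

Already at N = 50 the two agree to a few percent near the peak (e.g., 0.1123 vs. 0.1128 at r = 25).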
Comparison Gaussian-Poisson

Four events expected:

  Binomial (N=10, p=0.4):   ⟨r⟩ = 4     ⟨(r−µ)²⟩ = 2.4   ⟨(r−µ)³⟩ = 0.48
  Poisson (ν=4):            ⟨r⟩ = 4     ⟨(r−µ)²⟩ = 4     ⟨(r−µ)³⟩ = 4
  Gaussian:                 µ = 4       σ² = 2.4         ⟨(r−µ)³⟩ = 0

• In this case, the Binomial more closely resembles a Gaussian than does the Poisson.
• Note that for the Binomial we can change N and p independently.

Smaller number expected:

  Binomial (N=2, p=0.9):    ⟨r⟩ = 1.8   ⟨(r−µ)²⟩ = 0.18  ⟨(r−µ)³⟩ = −0.14
  Poisson (ν=1.8):          ⟨r⟩ = 1.8   ⟨(r−µ)²⟩ = 1.8   ⟨(r−µ)³⟩ = 1.8
  Gaussian:                 µ = 1.8     σ² = 0.18        ⟨(r−µ)³⟩ = 0

In general, we need to use the Poisson or Binomial when dealing with small statistics or p ≈ 0, 1.

Larger number expected:

  Binomial (N=100, p=0.1):  ⟨r⟩ = 10    ⟨(r−µ)²⟩ = 9     ⟨(r−µ)³⟩ = 7.2
  Poisson (ν=10):           ⟨r⟩ = 10    ⟨(r−µ)²⟩ = 10    ⟨(r−µ)³⟩ = 10
  Gaussian:                 µ = 10      σ² = 9           ⟨(r−µ)³⟩ = 0

For large numbers, the Gaussian is an excellent approximation.

Some Applications

When we don't know better, we use a Gaussian for unknown probability distributions, e.g., for the distribution of systematic deviations from the true values. This can sometimes be justified with the Central Limit Theorem.

When reporting uncertainties on a measurement, we quote ±1σ values. These are understood as Gaussian standard deviations, and therefore refer to a probability that our measurement lies within the uncertainty of the true value (a 68.3% central probability interval).

Over-applications

From a book review of The (Mis)behavior of Markets: A Fractal View of Risk, Ruin, and Reward by Benoit Mandelbrot and Richard L. Hudson (review by Ian Kaplan):

"Bachelier claimed that the change in market prices followed a Gaussian distribution. This distribution describes many natural features, like height, weight and intelligence among people. The Gaussian distribution is one of the foundations of modern statistics. If economic features followed a Gaussian distribution, a range of mathematical techniques could be applied in economics. Unfortunately, as Mandelbrot points out in The (Mis)behavior of Markets, the foundation of this new era of economics was rotten. … There are far more market bubbles and market crashes than these models suggest. The change in market prices does not follow a Gaussian distribution in a reliable fashion. Like income distribution, market statistics frequently follow a power law. When a graph is made of market returns (e.g., profit and loss), the curve will not fall toward zero as sharply as a Gaussian curve. The distribution of market returns has 'fat tails'. The 'fat tails' of the return curve reflect risk, where large losses and profits can be realized."

Gaussian Distribution

$$P(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/2\sigma^2}$$

The Gaussian distribution is very important in practice: many distributions resemble Gaussians, and the Gaussian is relatively easy to work with; it can be used to estimate uncertainties, etc. The Central Limit Theorem underlies much of this, so we look into its derivation to understand how it arises. We first introduce characteristic functions, which will be generally useful.

Characteristic Function

A characteristic function is a moment generating function,

$$\varphi(k) = \int dx\, e^{ikx}\, p(x).$$

It is simply the Fourier transform of the p.d.f. Expanding the exponential,

$$\varphi(k) = \int dx\, p(x)\left[1 + ikx - \frac{k^2 x^2}{2!} - \frac{i k^3 x^3}{3!} + \cdots\right] = 1 + ik\langle x\rangle - \frac{k^2}{2!}\langle x^2\rangle + \cdots + \frac{(ik)^n}{n!}\langle x^n\rangle + \cdots$$

so

$$\left.\frac{d^n\varphi(k)}{dk^n}\right|_{k=0} = i^n \langle x^n \rangle.$$
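As a sketch of this moment-generating property, the following checks the first two derivatives of φ(k) at k = 0 for a Poisson p.d.f. by direct summation and finite differences. The value of ν, the truncation point N_MAX, and the step size H are arbitrary choices:

```python
import math
import cmath

# Numerical check of d^n(phi)/dk^n |_{k=0} = i^n <x^n> for a Poisson pdf.
NU, N_MAX, H = 4.0, 100, 1e-4

def pmf(n):
    return math.exp(-NU) * NU**n / math.factorial(n)

def phi(k):
    # characteristic function phi(k) = sum_n e^{ikn} p(n), truncated at N_MAX
    return sum(cmath.exp(1j * k * n) * pmf(n) for n in range(N_MAX))

# first derivative by central differences: phi'(0) should equal i<n> = i*NU
d1 = (phi(H) - phi(-H)) / (2 * H)
# second derivative: phi''(0) should equal i^2 <n^2> = -(NU + NU^2)
d2 = (phi(H) - 2 * phi(0) + phi(-H)) / H**2

print(d1)  # ~ 4j
print(d2)  # ~ -20, since <n^2> = NU + NU^2 = 20 for NU = 4
```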
so d nϕ (k ) n n = i x dk n k =0 May 4, 2009 Data Analysis 14 Characteristic Function Characteristic function for a Gaussian: ∞ ϕ (k) = ∫ dx e ikx −∞ 1 2πσ (x− µ ) 2 e− 2σ 2 2 1 x µ 1 ∞ k 2σ 2 = ∫ dx exp − − + ikσ exp ikµ − 2πσ −∞ 2 σ σ 2 = k 2σ 2 − ikµ e e 2 ∞ where we have used ∫ e −z 2 / a 2 dz = a π −∞ so ϕ (k) = May 4, 2009 k 2σ 2 − ikµ e e 2 Data Analysis 15 Characteristic Function Suppose x is a random variable with pdf px (x) and y is an independent random variable with pdf py (y) and z = f (x, y). We are interested in the probability that z lies in the interval z → z + dz. Call this pz (z)dz the characteristic function of z is ϕ z (k) = ∫ e ikz pz (z)dz = ∫ ∫ e ikf (x,y ) px (x) dx py (y) dy Make sure this is clear Once we have the characteristic function, we can get the pdf for z with an inverse Fourier Transform pz (z) = 1 −ikz ∫ e ϕ z (k) dk 2π May 4, 2009 Data Analysis 16 Central Limit Theorem concrete example, suppose z = x + y ϕ z (k) = ∫ ∫ e ikx p(x) dx e iky q(y) dy or The characteristic function of a sum of r.v.s is ϕ z (k) = ϕ x (k) ϕ y (k) the product of the individual char. fns. We now use this to prove the CLT: Suppose we make n measurements of x.The average of the measurements is 1 a = ( x1 + x2 ++ xn ) n What is the distribution of a ? It's simpler to consider the distribution of a − µ , Q(a − µ ), where µ =< x > Φ(k) = ∫e ik (a−µ ) May 4, 2009 Q(a − µ ) da Data Analysis 17 Central Limit Theorem-cont. Φ(k) = ik [(x1 − µ )++(x n − µ )] n p(x ∫e 1 )dx1 p(x n )dx n ik (x − µ ) ik (x − µ ) = ∫ e n p(x1 )dx1 ∫ e n p(x n )dx n 1 n k n = ϕ where ϕ (k) is the characteristic function of x − µ n ϕ (k) = ∫ e ik(x− µ ) p(x) dx k2 k 2σ 2 2 = 1+ ik x − µ − (x − µ) + = 1− + 2 2 so 2 2 n 2 1k σ 1k σ Φ(k) = [ϕ (k /n)] = 1− + → 1− 2 2 n n →∞ 2 n n May 4, 2009 Data Analysis 2 = n →∞ k 2σ 2 − e 2n 18 Central Limit Theorem-cont. To get the pdf, we use an inverse Fourier transform k kσ − − 1 1 n 1 −ik(a− µ ) 2n Q(a − µ) = e = ∫ dk e ∫ dk e−ik(a− µ ) e 2ξ 2π 2π σ 2πξ 2 Q(a − µ) = P(a) = n 2πσ 2 2 2 n ( a− µ ) 2 − 2 e 2 σ The distribution of the average of a large number of measurements of a random variable x (given here by a) follows a Gaussian distribution. The width of the Gaussian is given by σ ξ= where σ is the standard deviation of x n May 4, 2009 Data Analysis The shape of the initial distribution is unimportant ! 19 Central Limit Theorem-Example 10 experiments where we sample 10 times randomly from a flat distribution. The data are shown as the black bars. The red bar gives the mean for the 10 samples. May 4, 2009 Data Analysis 20 Central Limit Theorem-Example The mean value from 1000 experiments each with 10 samplings of the distribution. The red curve is a Gaussian with: µ=0.5 and σ= 1 1 12 10 Do you understand how the factors arise ? May 4, 2009 Data Analysis 21 Central Limit Theorem - conclusion When results are presented, the uncertainties are usually quoted assuming Gaussian distributions: • For event counting, we have seen that the Binomial and Poisson reduce to the Gaussian distribution for large numbers of events (≥ 25 or so). The statistical error (1 Gaussian standard deviation) is then taken to be σ=√N (from Poisson distribution). • For other types of uncertainties (so-called systematic uncertainties or systematic errors), again a Gaussian distribution is often assumed to describe the distribution of the measured relative to the true. This is usually justified with the CLT, although it is a rather indirect use. 
Full Width at Half Maximum (FWHM)

This quantity is often used instead of σ to quantify the width of a distribution:

$$G(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/2\sigma^2}$$

The peak is at x = µ. The FWHM follows from

$$e^{-(x-\mu)^2/2\sigma^2} = 0.5 \;\Rightarrow\; x = \mu \pm \sigma\sqrt{2\ln 2} \;\Rightarrow\; \mathrm{FWHM} = 2\sqrt{2\ln 2}\,\sigma \approx 2.35\,\sigma.$$

[Figure: Gaussian pdf with the full width at half maximum indicated.]

Gaussian Used for Binomial or Poisson

The probability of r successes in N trials is

$$f(r; N, p) = \frac{N!}{r!\,(N-r)!}\, p^r q^{N-r}, \qquad q = 1-p,$$

where the binomial coefficient counts the number of combinations.

$$\text{Binomial:}\quad \mu = Np, \qquad \sigma = \sqrt{Np(1-p)}$$

$$\text{Poisson:}\quad f(n;\nu) = \frac{\nu^n e^{-\nu}}{n!}, \qquad \mu = \nu, \qquad \sigma = \sqrt{\nu}$$

For the Poisson, E[n] = ν by definition, and σ² = ν: variance = mean is its most important property.

Poisson Distribution, continued

[Figure: Poisson distributions for ν = 0.1, 0.5, 1.0, 2.0, 5.0, 10, 20, 50.]

Notes:
• As ν increases, the distribution becomes more symmetric.
• It is approximately Gaussian for ν > 20.
• The Poisson formula is much easier to use than the Binomial formula.

Gaussian Used for Binomial or Poisson

The Gaussian is a continuous distribution, whereas the Binomial and Poisson are discrete. We need to integrate the Gaussian to get the probability for a given outcome. E.g., for the Poisson,

$$f(n;\nu) = \frac{e^{-\nu}\nu^n}{n!} \quad\leftrightarrow\quad G(n;\, \mu=\nu,\, \sigma=\sqrt{\nu}) = \int_{n-0.5}^{n+0.5} \frac{1}{\sqrt{2\pi\nu}}\, e^{-(x-\nu)^2/2\nu}\, dx.$$

Comparison (see the sketch at the end of this section):

$$f(3;\, 0.5) = 0.013 \qquad G(3;\, 0.5,\, \sqrt{0.5}) = 0.0023$$
$$f(10;\, 9) = 0.12 \qquad G(10;\, 9,\, \sqrt{9}) = 0.12$$

Cumulative Distribution Function for a Gaussian

$$\mathrm{CDF} = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x'-\mu)^2/2\sigma^2}\, dx' = \frac{1}{2}\left[1 + \mathrm{erf}\!\left(\frac{x-\mu}{\sqrt{2}\,\sigma}\right)\right]$$

The 'error function' is available in many computer math libraries.

Sum and difference of two independent Gaussian-distributed quantities: for u = x + y,

$$p(u;\, \mu_x, \sigma_x, \mu_y, \sigma_y) = \frac{1}{\sqrt{2\pi}\,\sigma_u}\, e^{-(u-\mu_u)^2/2\sigma_u^2}, \qquad \mu_u = \mu_x + \mu_y, \qquad \sigma_u^2 = \sigma_x^2 + \sigma_y^2,$$

and for v = x − y,

$$p(v;\, \mu_x, \sigma_x, \mu_y, \sigma_y) = \frac{1}{\sqrt{2\pi}\,\sigma_v}\, e^{-(v-\mu_v)^2/2\sigma_v^2}, \qquad \mu_v = \mu_x - \mu_y, \qquad \sigma_v^2 = \sigma_x^2 + \sigma_y^2.$$

Multivariate Gaussian

$$\vec{\mu} = (\mu_1, \mu_2, \ldots, \mu_N)^T, \qquad \Sigma = \begin{pmatrix} \mathrm{cov}(x_1,x_1) & \mathrm{cov}(x_1,x_2) & \cdots & \mathrm{cov}(x_1,x_N) \\ \mathrm{cov}(x_2,x_1) & & & \vdots \\ \vdots & & \ddots & \\ \mathrm{cov}(x_N,x_1) & \cdots & & \mathrm{cov}(x_N,x_N) \end{pmatrix}$$

$$f(x_1, x_2, \ldots, x_N) = \frac{1}{(2\pi)^{N/2}\,|\Sigma|^{1/2}}\, \exp\left[-\frac{1}{2}(\vec{x}-\vec{\mu})^T\, \Sigma^{-1}\, (\vec{x}-\vec{\mu})\right]$$

Example (bivariate, centered at zero):

$$f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\, \exp\left[-\frac{1}{2(1-\rho^2)}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - \frac{2\rho x y}{\sigma_x \sigma_y}\right)\right]$$
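The Poisson-versus-integrated-Gaussian comparison quoted above can be reproduced with the error function; a minimal sketch (the function names are my own):

```python
import math

# The Poisson probability f(n; nu) versus the Gaussian with mu = nu and
# sigma = sqrt(nu), integrated over [n - 0.5, n + 0.5].
def poisson(n, nu):
    return math.exp(-nu) * nu**n / math.factorial(n)

def gauss_cdf(x, mu, sigma):
    # CDF = (1/2)[1 + erf((x - mu) / (sqrt(2) sigma))], as given above
    return 0.5 * (1 + math.erf((x - mu) / (math.sqrt(2) * sigma)))

def gauss_bin(n, nu):
    sigma = math.sqrt(nu)
    return gauss_cdf(n + 0.5, nu, sigma) - gauss_cdf(n - 0.5, nu, sigma)

print(poisson(3, 0.5), gauss_bin(3, 0.5))    # 0.0126 vs 0.0023: poor for small nu
print(poisson(10, 9.0), gauss_bin(10, 9.0))  # 0.1186 vs 0.1253: reasonable
```

As the comparison shows, the Gaussian approximation fails badly for small ν but is already adequate by ν ≈ 10.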