Gaussian Distribution

The Gaussian distribution is the most widely known distribution, and
the most widely used.
$$P(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\; e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
The mean is µ and the variance is σ².

All Gaussians have the same shape, symmetric about the mean, in contrast to the Binomial or Poisson distributions, and are easily characterized. E.g.,
•  68.3% of the probability lies within 1 standard deviation of the mean
•  95.45% within 2 standard deviations
•  99.7% within 3 standard deviations
•  FWHM = 2.35σ
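These coverage fractions are quick to check numerically; a minimal Python sketch using only the standard library, since P(|x − µ| < nσ) = erf(n/√2):

```python
# Coverage fractions of the Gaussian: P(|x - mu| < n*sigma) = erf(n/sqrt(2)).
from math import erf, sqrt

for n in (1, 2, 3):
    print(f"within {n} sigma: {100 * erf(n / sqrt(2)):.2f}%")
# within 1 sigma: 68.27%
# within 2 sigma: 95.45%
# within 3 sigma: 99.73%
```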
Derivation of Gauss Distribution
We consider two derivations of the Gauss function. First, the derivation starting from the binomial distribution. The appropriate limit in this case is N→∞ and r→∞, with p neither too small nor too large. We have already seen that this leads to a symmetric distribution.
[Figure: Binomial with N=50, p=0.5 compared to a Gaussian with µ = Np = 25, σ² = Np(1−p)]
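The figure can be reproduced numerically; a sketch (assuming scipy is available) comparing the Binomial pmf with the matching Gaussian:

```python
# Compare the Binomial(N=50, p=0.5) pmf with the Gaussian that has
# mu = N*p and sigma^2 = N*p*(1-p), as in the figure above.
import numpy as np
from scipy.stats import binom, norm

N, p = 50, 0.5
mu, sigma = N * p, np.sqrt(N * p * (1 - p))
for r in range(20, 31):
    print(f"r={r}: binomial={binom.pmf(r, N, p):.4f}  gaussian={norm.pdf(r, mu, sigma):.4f}")
```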
We will need Stirling's approximation:

$$\ln n! \approx \ln\sqrt{2\pi n} + n\ln n - n
\qquad\text{or}\qquad
n! \approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}$$

We now substitute this into the Binomial formula.
Gaussian - derivation
$$\begin{aligned}
f(r;N,p) &= \frac{N!}{r!\,(N-r)!}\; p^r (1-p)^{N-r} \\
&\approx \frac{\sqrt{2\pi N}\,(N/e)^N}{\sqrt{2\pi r}\,(r/e)^r\,\sqrt{2\pi(N-r)}\,\left((N-r)/e\right)^{N-r}}\; p^r (1-p)^{N-r} \\
&= \frac{1}{\sqrt{2\pi}}\sqrt{\frac{N}{r(N-r)}}\;\frac{N^N}{r^r (N-r)^{N-r}}\; p^r (1-p)^{N-r} \\
&= \frac{1}{\sqrt{2\pi N}}\;\frac{N^{N+1}}{r^{r+1/2}\,(N-r)^{N-r+1/2}}\; p^r (1-p)^{N-r}
\end{aligned}$$

or

$$f(r;N,p) \approx \frac{1}{\sqrt{2\pi N}}
\left(\frac{r}{N}\right)^{-r-1/2}
\left(\frac{N-r}{N}\right)^{-N+r-1/2}
p^r (1-p)^{N-r}$$
Doesn’t look much like the Gaussian …
Derivation-cont.
Change variables: r = Np + ξ. ξ measures the distance between the mean of the binomial, Np, and the measured quantity, r. The variance of a binomial is Np(1−p), so the typical deviation of r from Np is given by

$$\sigma = \sqrt{Np(1-p)}$$

Terms of the form ξ/r will therefore be of order 1/√N and will be small.
Furthermore,

$$\ln(1+\xi/N) \approx \xi/N - \tfrac{1}{2}\left(\xi/N\right)^2$$

First, the rewrite in terms of ξ:

$$\left(\frac{r}{N}\right)^{-r-1/2} = \left(p+\frac{\xi}{N}\right)^{-r-1/2}
= p^{-r-1/2}\left(1+\frac{\xi}{Np}\right)^{-r-1/2}$$

$$\left(\frac{N-r}{N}\right)^{-N+r-1/2}
= (1-p)^{-N+r-1/2}\left(1-\frac{\xi}{N(1-p)}\right)^{-N+r-1/2}$$
Derivation-cont.
$$f(r;N,p) \approx \frac{1}{\sqrt{2\pi N}}
\left(\frac{r}{N}\right)^{-r-1/2}
\left(\frac{N-r}{N}\right)^{-N+r-1/2}
p^r (1-p)^{N-r}$$
so
$$= \frac{1}{\sqrt{2\pi N p(1-p)}}
\left(1+\frac{\xi}{Np}\right)^{-r-1/2}
\left(1-\frac{\xi}{N(1-p)}\right)^{-N+r-1/2}$$
Rewrite in exponential form and use approximations from last page
$$f(r;N,p) \approx \frac{1}{\sqrt{2\pi Np(1-p)}}
\exp\!\left[\left(-r-\tfrac12\right)\ln\!\left(1+\frac{\xi}{Np}\right)
+\left(-N+r-\tfrac12\right)\ln\!\left(1-\frac{\xi}{N(1-p)}\right)\right]$$

$$= \frac{1}{\sqrt{2\pi Np(1-p)}}
\exp\!\left[\left(-Np-\xi-\tfrac12\right)\!\left(\frac{\xi}{Np}-\frac12\left(\frac{\xi}{Np}\right)^{\!2}\right)
+\left(-N(1-p)+\xi-\tfrac12\right)\!\left(-\frac{\xi}{N(1-p)}-\frac12\left(\frac{\xi}{N(1-p)}\right)^{\!2}\right)\right]$$

Expanding the products and keeping terms up to order ξ²/N (the terms linear in ξ cancel, and terms of order 1/√N are dropped):

$$f(r;N,p) \approx \frac{1}{\sqrt{2\pi Np(1-p)}}\;
\exp\!\left(-\frac{\xi^2}{2Np(1-p)}\right),
\qquad \sigma^2 = Np(1-p)$$
A different derivation
Here we follow the argument used by Gauss. Gauss wanted to solve the following problem: what is the form of the function ϕ(x_i − µ) that gives a maximum probability for µ = the arithmetic mean of the observed values {x_i}?

$$f(\vec x \mid \mu) = \varphi(x_1-\mu)\,\varphi(x_2-\mu)\cdots\varphi(x_n-\mu)$$

is the probability to get {x_i}. Gauss wanted this function to peak at

$$\mu = \bar x = \frac{1}{n}\sum_{i=1}^{n} x_i$$

$$\left.\frac{df}{d\mu}\right|_{\mu=\bar x} = 0
\quad\Rightarrow\quad
\left.\frac{d}{d\mu}\prod_{i=1}^{n}\varphi(x_i-\mu)\right|_{\mu=\bar x} = 0$$

Assuming f(µ = x̄) ≠ 0,

$$\sum_i \frac{\varphi'(x_i-\bar x)}{\varphi(x_i-\bar x)} = 0$$

Define ψ = ϕ′/ϕ and z_i = x_i − x̄. Then

$$\sum_i z_i = 0 \quad\text{and}\quad \sum_i \psi(z_i) = 0$$

for all possible z_i, so ψ ∝ z.
Gauss’ derivation-cont.
$$\psi = kz \;\Rightarrow\; \frac{1}{\varphi}\frac{d\varphi}{dz} = kz,
\quad\text{or}\quad
\varphi(z) \propto \exp\!\left(\frac{kz^2}{2}\right)
\qquad (k < 0 \text{ for a normalizable pdf})$$
We get the prefactor via normalization.
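The last step can be verified symbolically; a minimal sympy sketch (the symbol names are ours) solving ϕ′/ϕ = kz:

```python
# Solve phi'(z) = k*z*phi(z): the solution is proportional to exp(k*z**2/2),
# i.e. a Gaussian once we require k < 0 and normalize.
import sympy as sp

z = sp.symbols("z")
k = sp.symbols("k", negative=True)   # k < 0 for a normalizable pdf
phi = sp.Function("phi")

sol = sp.dsolve(sp.Eq(phi(z).diff(z), k * z * phi(z)), phi(z))
print(sol)   # Eq(phi(z), C1*exp(k*z**2/2))
```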
Lessons:
•  The Binomial looks like a Gaussian for large enough N (with p not too close to 0 or 1)
•  The Poisson also looks like a Gaussian for large enough ν
•  Gauss' formula follows from general arguments (maximizing posterior probability)
•  Gauss' formula is much easier to use than the Binomial or Poisson, so use it when you're allowed.
Comparison Gaussian-Poisson
Four events expected:

                            <r>    <(r−µ)²>    <(r−µ)³>
  Binomial (N=10, p=0.4)     4       2.4         0.48
  Poisson  (ν=4)             4       4           4
  Gaussian (µ=4, σ²=2.4)     4       2.4         0

•  In this case, the Binomial more closely resembles a Gaussian than does the Poisson
•  Note: for the Binomial, one can change N, p
Smaller number expected
                             <r>    <(r−µ)²>    <(r−µ)³>
  Binomial (N=2, p=0.9)      1.8      0.18       −0.14
  Poisson  (ν=1.8)           1.8      1.8         1.8
  Gaussian (µ=1.8, σ²=0.18)  1.8      0.18        0

In general, one needs to use the Poisson or Binomial when dealing with small statistics or p ≈ 0, 1.
Larger number expected
                             <r>    <(r−µ)²>    <(r−µ)³>
  Binomial (N=100, p=0.1)    10        9          7.2
  Poisson  (ν=10)            10       10         10
  Gaussian (µ=10, σ²=9)      10        9          0

For large numbers, the Gaussian is an excellent approximation.
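The three tables can be reproduced with scipy.stats (a sketch, not part of the original slides); the third central moment is obtained from the raw moments:

```python
# Mean, variance, and third central moment for the Binomial, Poisson, and
# matching Gaussian in the three cases tabulated above.
from scipy.stats import binom, poisson, norm

for N, p in [(10, 0.4), (2, 0.9), (100, 0.1)]:
    mu = N * p
    dists = [("Binomial", binom(N, p)),
             ("Poisson ", poisson(mu)),
             ("Gaussian", norm(mu, binom(N, p).std()))]
    print(f"N={N}, p={p}:")
    for name, d in dists:
        m, v = d.mean(), d.var()
        m3 = d.moment(3) - 3 * m * v - m**3   # third central moment
        print(f"  {name}: <r>={m:.2f}  <(r-mu)^2>={v:.2f}  <(r-mu)^3>={m3:.2f}")
```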
Some Applications
When we don't know better, we use a Gaussian for unknown probability distributions. E.g., the distribution of systematic deviations from the true values. This can sometimes be justified with the Central Limit Theorem.

When reporting uncertainties on a measurement, we quote ±1σ values. These are understood as Gaussian standard deviations, and therefore refer to the probability that our measurement lies within the quoted uncertainty of the true value (a 68.3% central probability interval).
Over-applications
From a book review of The (Mis)behavior of Markets: A Fractal View of Risk, Ruin, and Reward by Benoit Mandelbrot and Richard L. Hudson. Review by Ian Kaplan:
Bachelier claimed that the change in market prices followed a Gaussian distribution. This
distribution describes many natural features, like height, weight and intelligence among people.
The Gaussian distribution is one of the foundations of modern statistics. If economic features
followed a Gaussian distribution, a range of mathematical techniques could be applied in
economics.
Unfortunately, as Mandelbrot points out in The (Mis)behavior of Markets, the foundation of this
new era of economics was rotten. …There are far more market bubbles and market crashes than
these models suggest.
The change in market prices does not follow a Gaussian distribution in a reliable fashion. Like
income distribution, market statistics frequently follow a power law. When a graph is made of
market returns (e.g., profit and loss), the curve will not fall toward zero as sharply as a Gaussian
curve. The distribution of market returns has "fat tails". The "fat tails" of the return curve reflect
risk, where large losses and profits can be realized.
Gaussian Distribution
$$P(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\; e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
The Gaussian distribution is very important in practice: many distributions resemble Gaussians, and the Gaussian distribution is relatively easy to work with – it can be used to estimate uncertainties, etc. The Central Limit Theorem underlies much of this, so we look into the derivation to understand how it arises.

First we introduce characteristic functions. These will be generally useful.
Characteristic Function
A characteristic function is a moment-generating function:

$$\varphi(k) = \int dx\; e^{ikx}\, p(x)$$

It is simply the Fourier Transform of the p.d.f. Expanding the exponential,
$$\varphi(k) = \int dx\; p(x)\left[1 + ikx - \frac{1}{2!}k^2x^2 - \frac{i}{3!}k^3x^3 + \cdots\right]
= 1 + ik\,\langle x\rangle - \frac{k^2}{2!}\,\langle x^2\rangle + \cdots + \frac{(ik)^n}{n!}\,\langle x^n\rangle + \cdots$$
so
$$\left.\frac{d^n\varphi(k)}{dk^n}\right|_{k=0} = i^n\,\langle x^n\rangle$$
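As a check of this relation for the Gaussian (using the characteristic function derived on the next slide), a small sympy sketch:

```python
# d^n(phi)/dk^n at k=0 equals i^n <x^n>; check n=1,2 for the Gaussian.
import sympy as sp

k, mu = sp.symbols("k mu", real=True)
sigma = sp.symbols("sigma", positive=True)
phi = sp.exp(sp.I * k * mu - k**2 * sigma**2 / 2)

m1 = sp.simplify(sp.diff(phi, k).subs(k, 0) / sp.I)        # <x>
m2 = sp.simplify(sp.diff(phi, k, 2).subs(k, 0) / sp.I**2)  # <x^2>
print(m1, sp.simplify(m2 - m1**2))   # mu, sigma**2
```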
Characteristic Function
Characteristic function for a Gaussian:
$$\varphi(k) = \int_{-\infty}^{\infty} dx\; e^{ikx}\,\frac{1}{\sqrt{2\pi}\,\sigma}\; e^{-\frac{(x-\mu)^2}{2\sigma^2}}
= \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} dx\;
\exp\!\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}-ik\sigma\right)^{\!2}\right]
\exp\!\left[ik\mu - \frac{k^2\sigma^2}{2}\right]$$

where we have used

$$\int_{-\infty}^{\infty} e^{-z^2/a^2}\, dz = a\sqrt{\pi}$$

so

$$\varphi(k) = e^{ik\mu}\; e^{-\frac{k^2\sigma^2}{2}}$$
Characteristic Function
Suppose x is a random variable with pdf p_x(x), y is an independent random variable with pdf p_y(y), and z = f(x, y). We are interested in the probability that z lies in the interval z → z + dz; call this p_z(z) dz. The characteristic function of z is

$$\varphi_z(k) = \int e^{ikz}\, p_z(z)\, dz
= \int\!\!\int e^{ikf(x,y)}\, p_x(x)\, dx\; p_y(y)\, dy$$

(Make sure this is clear.)

Once we have the characteristic function, we can get the pdf for z with an inverse Fourier Transform:

$$p_z(z) = \frac{1}{2\pi}\int e^{-ikz}\, \varphi_z(k)\, dk$$
Central Limit Theorem
As a concrete example, suppose z = x + y:

$$\varphi_z(k) = \int\!\!\int e^{ikx}\, p(x)\, dx\; e^{iky}\, q(y)\, dy
\qquad\text{or}\qquad
\varphi_z(k) = \varphi_x(k)\,\varphi_y(k)$$

The characteristic function of a sum of r.v.s is the product of the individual characteristic functions.
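The product rule is easy to verify by Monte Carlo; a sketch for two independent flat random variables (sample size and k values ours):

```python
# Estimate E[exp(ik(x+y))] directly and as the product of the individual
# characteristic functions; the two should agree for independent x, y.
import numpy as np

rng = np.random.default_rng(1)
x, y = rng.random(200_000), rng.random(200_000)
for k in (0.5, 1.0, 2.0):
    lhs = np.mean(np.exp(1j * k * (x + y)))
    rhs = np.mean(np.exp(1j * k * x)) * np.mean(np.exp(1j * k * y))
    print(f"k={k}: {lhs:.4f} vs {rhs:.4f}")
```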
We now use this to prove the CLT. Suppose we make n measurements of x. The average of the measurements is

$$a = \frac{1}{n}(x_1 + x_2 + \cdots + x_n)$$

What is the distribution of a? It is simpler to consider the distribution of a − µ, Q(a − µ), where µ = ⟨x⟩:
$$\Phi(k) = \int e^{ik(a-\mu)}\, Q(a-\mu)\, da$$
Central Limit Theorem-cont.
$$\Phi(k) = \int e^{\frac{ik}{n}[(x_1-\mu)+\cdots+(x_n-\mu)]}\; p(x_1)\,dx_1\cdots p(x_n)\,dx_n
= \left[\int e^{\frac{ik}{n}(x-\mu)}\, p(x)\, dx\right]^{n}
= \left[\varphi\!\left(\frac{k}{n}\right)\right]^{n}$$

where φ(k) is the characteristic function of x − µ:

$$\varphi(k) = \int e^{ik(x-\mu)}\, p(x)\, dx
= 1 + ik\,\langle x-\mu\rangle - \frac{k^2}{2}\,\langle(x-\mu)^2\rangle + \cdots
= 1 - \frac{k^2\sigma^2}{2} + \cdots$$

so

$$\Phi(k) = \left[\varphi(k/n)\right]^{n}
= \left[1 - \frac{1}{2}\,\frac{k^2\sigma^2}{n^2} + \cdots\right]^{n}
\;\xrightarrow{\;n\to\infty\;}\; e^{-\frac{k^2\sigma^2}{2n}}$$
Central Limit Theorem-cont.
To get the pdf, we use an inverse Fourier transform:

$$Q(a-\mu) = \frac{1}{2\pi}\int dk\; e^{-ik(a-\mu)}\, e^{-\frac{k^2\sigma^2}{2n}}
= \frac{1}{\sqrt{2\pi}\,\xi}\; e^{-\frac{(a-\mu)^2}{2\xi^2}},
\qquad \xi = \frac{\sigma}{\sqrt{n}}$$

i.e.,

$$Q(a-\mu) = P(a) = \sqrt{\frac{n}{2\pi\sigma^2}}\; e^{-\frac{n(a-\mu)^2}{2\sigma^2}}$$

The distribution of the average of a large number of measurements of a random variable x (given here by a) follows a Gaussian distribution. The width of the Gaussian is ξ = σ/√n, where σ is the standard deviation of x. The shape of the initial distribution is unimportant!
Central Limit Theorem-Example
[Figure: 10 experiments in which we sample 10 times randomly from a flat distribution. The data are shown as the black bars; the red bar gives the mean of the 10 samples.]
Central Limit Theorem-Example
The mean value from 1000 experiments, each with 10 samplings of the distribution. The red curve is a Gaussian with

$$\mu = 0.5 \qquad\text{and}\qquad \sigma = \frac{1}{\sqrt{12}}\cdot\frac{1}{\sqrt{10}}$$

Do you understand how the factors arise?
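A sketch reproducing this with numpy (seed and array shapes ours):

```python
# 1000 experiments, each averaging 10 draws from a flat distribution on [0,1].
# The means should scatter like a Gaussian with mu=0.5, sigma=(1/sqrt(12))/sqrt(10).
import numpy as np

rng = np.random.default_rng(42)
means = rng.random((1000, 10)).mean(axis=1)
print(means.mean(), means.std())        # ~0.5, ~0.091
print(1 / np.sqrt(12) / np.sqrt(10))    # 0.0913: uniform sigma / sqrt(n)
```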
Central Limit Theorem - conclusion
When results are presented, the uncertainties are usually quoted assuming Gaussian distributions:
•  For event counting, we have seen that the Binomial and Poisson reduce to the Gaussian distribution for large numbers of events (≥ 25 or so). The statistical error (1 Gaussian standard deviation) is then taken to be σ = √N (from the Poisson distribution).
•  For other types of uncertainties (so-called systematic uncertainties or systematic errors), a Gaussian distribution is again often assumed to describe the distribution of the measured value relative to the true value. This is usually justified with the CLT, although it is a rather indirect use. Examples of systematic uncertainties: energy calibration, alignment, time dependence, …
Full Width Half Maximum (FWHM)
This quantity is often used instead of σ to quantify the width of a distribution:

$$G(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\; e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

Peak at x = µ. The FWHM is found by solving

$$e^{-\frac{(x-\mu)^2}{2\sigma^2}} = 0.5
\quad\Rightarrow\quad
x = \mu \pm \sigma\sqrt{2\ln 2}$$

$$\mathrm{FWHM} = 2\sigma\sqrt{2\ln 2} \approx 2.35\,\sigma$$

[Figure: Gaussian curve with the full width at half maximum indicated]
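The numerical constant follows directly from the solution above:

```python
# FWHM / sigma = 2*sqrt(2*ln 2) ~= 2.355
from math import log, sqrt
print(2 * sqrt(2 * log(2)))   # 2.3548...
```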
Gaussian used for Binomial or Poisson
Probability of r successes in N trials:

$$f(r;N,p) = \frac{N!}{r!\,(N-r)!}\; p^r q^{N-r}, \qquad q = 1-p$$

The factor N!/(r!(N−r)!) is the number of combinations (the binomial coefficient).

Binomial: µ = Np,  σ = √(Np(1−p))

$$f(n;\nu) = \frac{\nu^n e^{-\nu}}{n!}$$

Poisson: µ = ν,  σ = √ν. E[n] = ν by definition, and σ² = ν: variance = mean, the most important property.
Poisson Distribution-cont.
So,

$$f(n;\nu) = \frac{\nu^n e^{-\nu}}{n!},
\qquad E[n]=\nu \text{ by definition},
\quad \sigma^2=\nu$$

(variance = mean: the most important property).

[Figure: Poisson distributions for ν = 0.1, 0.5, 1.0, 2.0, 5.0, 10, 20, 50]

Notes:
•  As ν increases, the distribution becomes more symmetric
•  Approximately Gaussian for ν > 20
•  The Poisson formula is much easier to use than the Binomial formula.
Gaussian used for Binomial or Poisson
The Gaussian is a continuous distribution, whereas the Binomial and Poisson are discrete. We need to integrate the Gaussian to get the probability for a given outcome.
E.g., for the Poisson:

$$f(n;\nu) = \frac{e^{-\nu}\,\nu^n}{n!}
\qquad\longleftrightarrow\qquad
G(n;\mu=\nu,\sigma=\sqrt{\nu}) = \int_{n-0.5}^{n+0.5} \frac{1}{\sqrt{2\pi\nu}}\; e^{-\frac{(x-\nu)^2}{2\nu}}\, dx$$

Comparison:

f(3; 0.5) = 0.013    G(3; 0.5, √0.5) = 0.0023
f(10; 9)  = 0.12     G(10; 9, √9)   = 0.12
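A sketch reproducing the comparison with scipy (integrating the Gaussian over n − 0.5 to n + 0.5 via its CDF):

```python
# Poisson probability vs. Gaussian integrated over one unit bin around n.
from math import sqrt
from scipy.stats import norm, poisson

for n, nu in [(3, 0.5), (10, 9.0)]:
    f = poisson.pmf(n, nu)
    g = norm.cdf(n + 0.5, nu, sqrt(nu)) - norm.cdf(n - 0.5, nu, sqrt(nu))
    print(f"n={n}, nu={nu}: Poisson={f:.4f}  Gaussian={g:.4f}")
```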
Cumulative Distribution Function for Gaussian
$$\mathrm{CDF}(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma}\; e^{-\frac{(x'-\mu)^2}{2\sigma^2}}\, dx'
= \frac{1}{2}\left[1 + \mathrm{erf}\!\left(\frac{x-\mu}{\sqrt{2}\,\sigma}\right)\right]$$

The 'error function' is available in many computer math libraries.
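For example, in Python the error function is in the standard library:

```python
# Gaussian CDF via the error function.
from math import erf, sqrt

def gaussian_cdf(x, mu, sigma):
    return 0.5 * (1.0 + erf((x - mu) / (sqrt(2.0) * sigma)))

print(gaussian_cdf(1.0, 0.0, 1.0))   # 0.8413..., i.e. mu + 1 sigma
```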
Sum and difference of two independent Gaussian-distributed quantities:

u = x + y:

$$p(u;\mu_x,\sigma_x,\mu_y,\sigma_y) = \frac{1}{\sqrt{2\pi}\,\sigma_u}\; e^{-\frac{(u-\mu_u)^2}{2\sigma_u^2}},
\qquad \mu_u = \mu_x + \mu_y,\quad \sigma_u^2 = \sigma_x^2 + \sigma_y^2$$

v = x − y:

$$p(v;\mu_x,\sigma_x,\mu_y,\sigma_y) = \frac{1}{\sqrt{2\pi}\,\sigma_v}\; e^{-\frac{(v-\mu_v)^2}{2\sigma_v^2}},
\qquad \mu_v = \mu_x - \mu_y,\quad \sigma_v^2 = \sigma_x^2 + \sigma_y^2$$
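A quick Monte Carlo sanity check of these rules (sample sizes and parameter values ours):

```python
# The sum and difference of two independent Gaussians are Gaussian with
# mu_u = mu_x + mu_y, mu_v = mu_x - mu_y, and variance sigma_x^2 + sigma_y^2.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(1.0, 2.0, 1_000_000)
y = rng.normal(3.0, 1.5, 1_000_000)
print((x + y).mean(), (x + y).var())   # ~ 4.0, ~ 6.25
print((x - y).mean(), (x - y).var())   # ~ -2.0, ~ 6.25
```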
Multivariate Gaussian




$$\vec\mu^{\,T} = (\mu_1, \mu_2, \ldots, \mu_N)
\qquad
\Sigma = \begin{pmatrix}
\mathrm{cov}(x_1,x_1) & \mathrm{cov}(x_1,x_2) & \cdots & \mathrm{cov}(x_1,x_N) \\
\mathrm{cov}(x_2,x_1) & \ddots & & \vdots \\
\vdots & & \ddots & \vdots \\
\mathrm{cov}(x_N,x_1) & \cdots & \cdots & \mathrm{cov}(x_N,x_N)
\end{pmatrix}$$

$$f(x_1,x_2,\ldots,x_N) = \frac{1}{(2\pi)^{N/2}\,|\Sigma|^{1/2}}
\exp\!\left[-\frac{1}{2}\,(\vec x - \vec\mu)^{T}\, \Sigma^{-1}\, (\vec x - \vec\mu)\right]$$

Example: Bivariate (zero means):

$$f(x,y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}
\exp\!\left[-\frac{1}{2(1-\rho^2)}
\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - \frac{2\rho xy}{\sigma_x\sigma_y}\right)\right]$$
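A sketch evaluating the density directly and checking it against scipy (the parameter values are ours):

```python
# Multivariate Gaussian density, checked against scipy's multivariate_normal
# for a bivariate case with rho = 0.5.
import numpy as np
from scipy.stats import multivariate_normal

mu = np.zeros(2)
sx, sy, rho = 1.0, 2.0, 0.5
Sigma = np.array([[sx**2,     rho*sx*sy],
                  [rho*sx*sy, sy**2    ]])

def mv_gauss(x, mu, Sigma):
    d = x - mu
    N = len(mu)
    norm_const = 1.0 / ((2*np.pi)**(N/2) * np.sqrt(np.linalg.det(Sigma)))
    return norm_const * np.exp(-0.5 * d @ np.linalg.solve(Sigma, d))

pt = np.array([0.5, -1.0])
print(mv_gauss(pt, mu, Sigma))                 # direct evaluation
print(multivariate_normal(mu, Sigma).pdf(pt))  # should agree
```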