Introduction to Monte Carlo
Statistics — Benasque 2014
Carlos Mañá
Astrofísica y Física de Partículas, CIEMAT

[Cover image: Cornelia Parker, "Cold Dark Matter"]
What is the Monte Carlo method?

A numerical method, based on stochastic processes and grounded in the Theory of
Probability and Statistics, that takes us from a problem to its solution.

How does it work?

Design the experiment; then either do it... or simulate it on a computer; the
outcome addresses the problem.
What kind of problems can we tackle? Why Monte Carlo (incapability of analytic
treatment?), and what do we get (a magic box?, ..., more information?...).
Examples abound in particle physics, Bayesian inference,...
Important and general feature of Monte Carlo estimations

Example: estimation of π (... à la Laplace)

Draw N points uniformly inside a square S and count how many fall inside the
inscribed circle C. The probability that a draw falls inside the circumference is

    θ = area(C)/area(S) = π/4

X = {point falls inside the circumference} is the event of interest, so after N
trials

    X ~ Bi(x | N, θ)         θ ~ Be(θ | n + 1/2, N − n + 1/2)

and from the n accepted events we estimate

    π̃ = 4n/N          σ(π̃) = ( π̃(4 − π̃)/N )^{1/2}

    Throws (N)    Accepted (n)    π̃         σ(π̃)
    100           83              3.32       0.1503
    1000          770             3.08       0.0532
    10000         7789            3.1156     0.0166
    100000        78408           3.1363     0.0052
    1000000       785241          3.141      0.0016

... for large N:

    π̃ = 4n/N          σ(π̃) ≈ 1.6/√N

The N^{−1/2} dependence of the uncertainty is a general feature of Monte Carlo
estimations, regardless of the number of dimensions of the problem.
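A minimal sketch of this estimation (assuming NumPy; the seed and the exact counts
are illustrative, not those of the table):

```python
# Monte Carlo estimation of pi "a la Laplace": uniform draws in the square,
# count the hits inside the inscribed circle.
import numpy as np

rng = np.random.default_rng(seed=1)

for N in (100, 1000, 10_000, 100_000, 1_000_000):
    x, y = rng.uniform(-1.0, 1.0, size=(2, N))    # uniform draws in [-1,1]^2
    n = np.count_nonzero(x**2 + y**2 <= 1.0)      # accepted: inside the circle
    pi_hat = 4.0 * n / N                          # estimate 4n/N
    sigma = np.sqrt(pi_hat * (4.0 - pi_hat) / N)  # uncertainty ~ 1.6/sqrt(N)
    print(f"N={N:>8d}  n={n:>7d}  pi={pi_hat:.4f} +- {sigma:.4f}")
```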
Basis of Monte Carlo simulations

Design the experiment; get a sampling of size n from the stochastic process, that
is, a sequence {z1, z2, ..., zn} of random numbers. We need sequences of random
numbers, so we do not do the experiment but... generate sequences of
(pseudo-)random numbers on a computer...

"Anyone who considers arithmetical methods of producing random digits is, of
course, in a state of sin"  - J. von Neumann     (∞ Refs.)
X ~ UnD x | 0,1
Sampling moments:
1 n k
mk   x j
n j 1
1 nk
ck 
x j x j k

n  k j 1
Sample size: 106 events
Unx 0,1
  0.5000

 2  0.0833  112
 1   0.0

 2    1.2   6 5 
m1  0.5002
2
 sam
 0.0833
 1 sam  0.0025
 2 sam   1.2029
  0.0003
 ( xi , xi 1 )  0.00076
 ( xi , xi  2 )   0.00096
 ( xi , xi 3 )   0.00010
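A sketch of these sampling checks (assuming NumPy and SciPy; the printed values
will differ from the slide's, which used another generator):

```python
# Sampling moments and neighbour correlations of 10^6 uniform draws.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2)
x = rng.uniform(size=1_000_000)

print("m1       =", x.mean())          # expect 0.5
print("variance =", x.var())           # expect 1/12 = 0.0833
print("skewness =", stats.skew(x))     # expect 0.0
print("kurtosis =", stats.kurtosis(x)) # excess kurtosis, expect -6/5
for k in (1, 2, 3):                    # neighbour correlations, expect ~0
    print(f"rho(x_i, x_i+{k}) =", np.corrcoef(x[:-k], x[k:])[0, 1])
```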
We can generate pseudo-random sequences {z1, z2, ..., zk} with Z ~ Un(z | 0,1).
Great! But life is not uniform... In Physics, Statistics,... we want to generate
sequences from

    X = {x1, x2, ..., xn} ~ p(x | θ)

How?

    Inverse Transform
    Acceptance-Rejection ("Hit-Miss")
    "Importance Sampling",...
    + Tricks (Normal distribution, M(RT)² (Markov Chains), Gibbs sampling,...)

Usually a combination of them.
Method 1: Inverse Transform

We want to generate a sample of X ~ p(x | θ). Consider the distribution function

    F(x | θ) = P(X ∈ (−∞, x] | θ) = ∫_{−∞}^{x} p(x' | θ) dx'

and the random quantity Y = F(X | θ): Ω_X → [0,1]. How is Y distributed?

    F_Y(y) = P(Y ≤ y) = P(F(X | θ) ≤ y) = P(X ≤ F^{−1}(y | θ))
           = ∫^{F^{−1}(y|θ)} dF(x | θ) = y

so F_Y(y) = y; that is, Y = F(X | θ) ~ Un(y | 0,1).

Algorithm (repeat i)-ii) n times):
i)  Sample y_i ~ Un(y | 0,1)
ii) Transform x_i = F^{−1}(y_i | θ)

Then {x1, x2, ..., xn} ~ p(x | θ).
Examples:

Sampling of X ~ Ex(x | λ):   p(x | λ) = λ e^{−λx}   so   F(x | λ) = 1 − e^{−λx}

i)  u_i ~ Un(u | 0,1)
ii) x_i = F^{−1}(u_i | λ) = −(1/λ) ln u_i

[Figure: generated sample vs Ex(x | 1)]

Sampling of X ~ We(x | θ1, θ2):

    p(x | θ) = θ1 θ2 x^{θ2 − 1} e^{−θ1 x^{θ2}} 1_{[0,∞)}(x)
    F(x | θ) = (1 − e^{−θ1 x^{θ2}}) 1_{[0,∞)}(x)

i)  u_i ~ Un(u | 0,1)
ii) x_i = F^{−1}(u_i | θ) = ( −(1/θ1) ln u_i )^{1/θ2}

[Figure: generated sample vs We(x | 0.2, 1.7)]
... some fun ...

Problem:

1) Generate a sample {x1, x2, ..., xn} with n = 10^6 from each of:
       X ~ Un(x | 0,1)
       X ~ p(x) = (3/2) x² 1_{[−1,1]}(x)
       X ~ Ca(x | 0,1) = (1/π) 1/(1 + x²)

2) Get for each case the sampling distribution of
       Y_m = (1/m) Σ_{k=1}^{m} X_k     with m = {2, 5, 10, 20, 50}

3) Discuss the sampling distribution of Y_m in connection with the Law of Large
   Numbers and the Central Limit Theorem.

4) If U_i ~ Un(u | 0,1) and W_n = Π_{k=1}^{n} U_k ∈ [0,1], how is Z_n = −log W_n
   distributed?

5) If X_k ~ Ga(x | θ, k), how is Z = X_n + X_m distributed?

6) If X_i ~ Ga(x | α, β_i), how is Y = Σ_i X_i distributed?
   (all random quantities assumed to be independent)

[Figure: sampling distributions of Y_m, m = {1, 2, 5, 10, 20, 50}, for the three
densities of problem 1]
X ~ p( X  k  )  pk
For discrete random quantities
k  0, 1, 2, ...
F0  P( X  0  )  p0
F1  P( X  1 )  p0  p1
…
n
Fn  P( X  n  )   pk
k 1
…
p1
p0
0
p2
F0
F1
1
F2
Algorithm
i) Sample
n)
ii) Find
U ~ Un( x 0,1)
k Fk 1  ui  Fk
( F1  0)
x1, x2 ,... xn  ~
ui ~ Un(u 0,1)
xi  k
p( x  )
X ~ Po( X |  )
Example: Sampling of
50,000 events
e  k 
P( X  k  ) 
 P( X  k  1  )
k!
k
i ) ui ~ Un(u 0,1)

ii ) a ) pk 
pk 1
k
b) Fk  Fk 1  pk
Po(k 2)
p0  e  
from k  0 until
k
50,000 events
Bi (n 10,0.5)
Fk 1  ui  Fk
Sampling of
N n
N n
X ~ Bi ( X | N , )     1   
n
N  n 1 
P( X  n | N , ) 
1
n
P( X  n  1 | N , )
i ) ui ~ Un(u 0,1)
ii ) a ) pn 
N  n 1 
pn 1
n
1
b) Fn  Fn 1  pn
n
p0  1   
N
from n  0 until
Fn1  ui  Fn
Even though discrete distributions can be sampled this way, usually one can find
more efficient algorithms...

Example: X ~ Po(X | μ). With U_i ~ Un(u | 0,1), the product W_n = Π_{k=1}^{n} U_k ∈ [0,1]
is distributed as

    W_n ~ p(w_n | n) = (−log w_n)^{n−1} / Γ(n)

so that

    P(W_n ≤ a) = 1/Γ(n) ∫_{−log a}^{∞} e^{−x} x^{n−1} dx

and, for a = e^{−μ},

    P(W_n ≥ e^{−μ}) = e^{−μ} Σ_{m=0}^{n−1} μ^m / Γ(m+1) = P(X ≤ n−1 | μ)

Algorithm:
i)   generate u_i ~ Un(u | 0,1)
ii)  multiply them, w_n = u_1 ··· u_n, and go to i) while w_n > e^{−μ}
iii) deliver x = n−1 ~ Po(x | μ)
PROBLEMS:

Gamma distribution:
1) Show that if X ~ Ga(x | a, b) then Y = aX ~ Ga(y | 1, b).
2) Show that if b = n ∈ N then Y = Σ_{i=1}^{n} Z_i ~ Ga(y | 1, n) with
   Z_i ~ Ga(z | 1,1) = Ex(z | 1)   (→ generate exponentials).

Beta distribution:
1) Show that if X_1 ~ Ga(x_1 | a, b_1) and X_2 ~ Ga(x_2 | a, b_2), then
   Y = X_1/(X_1 + X_2) ~ Be(y | b_1, b_2).
Generalization to n dimensions

... trivial but ...

    F(x1, ..., xn) = F_n(xn | x_{n−1}, ..., x1) F_{n−1}(x_{n−1} | x_{n−2}, ..., x1) ··· F_2(x2 | x1) F_1(x1)

and

    p(x1, ..., xn) = p_n(xn | x_{n−1}, ..., x1) p_{n−1}(x_{n−1} | x_{n−2}, ..., x1) ··· p_2(x2 | x1) p_1(x1)

If the X_k are independent,

    p(x1, ..., xn) = Π_{i=1}^{n} p_i(x_i)     and     F(x1, ..., xn) = Π_{i=1}^{n} F_i(x_i)

... but if not ... there are n! ways to decompose the pdf and some are easier to
sample than others.

Example:

    (X, Y) ~ p(x, y) = 2 exp(−x/y) ;    x ∈ [0, ∞) ,  y ∈ [0, 1]

Marginal densities:

    p_y(y) = ∫_0^∞ p(x, y) dx = 2y
    p_x(x) = ∫_0^1 p(x, y) dy = 2x ∫_x^∞ e^{−u} u^{−2} du

    p(x, y) ≠ p_x(x) p_y(y)    →    not independent

Conditional densities:

    p(x, y) = p(x | y) p_y(y)    with    p(x | y) = p(x, y)/p_y(y) = (1/y) exp(−x/y)

    F_y(y) = y²         F_x(x | y) = 1 − exp(−x/y)        →  easy

    p(x, y) = p(y | x) p_x(x)                              →  difficult
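A sketch of the easy decomposition p(x | y) p_y(y) for this example (assuming
NumPy): F_y(y) = y² inverts to y = √u, and X | Y = y is exponential.

```python
# Two-step conditional sampling of (X, Y) ~ 2 exp(-x/y) on [0,inf) x [0,1].
import numpy as np

rng = np.random.default_rng(seed=6)
n = 100_000
y = np.sqrt(rng.uniform(size=n))          # y = F_y^{-1}(u1) = sqrt(u1)
x = -y * np.log(rng.uniform(size=n))      # x = F_x^{-1}(u2 | y) = -y ln(u2)

print(y.mean())   # E[Y] = int_0^1 2y^2 dy = 2/3
print(x.mean())   # E[X] = E[E[X|Y]] = E[Y] = 2/3 as well
```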
Properties:

• Direct and efficient in the sense that one u_i ~ Un(u | 0,1) gives one x_i ~ p(x | θ).
• Useful for many distributions of interest (Exponential, Cauchy, Weibull, Logistic,...).
• ... but in general F(x | θ) is difficult to invert, and numeric approximations
  are slow,...
Method 2: Acceptance-Rejection ("Hit-Miss")    J. von Neumann, 1951

Sample X ~ p(x | θ), with X ∈ Ω_X = [a, b] and 0 ≤ p(x | θ) ≤ max_x p(x | θ).

1) Enclose the pairs (x, y ≤ p(x | θ)) in a domain Ω = [α, β] × [0, k]
   (not necessarily a hypercube) with [a, b] ⊆ [α, β] and [0, max p(x | θ)] ⊆ [0, k].

2) Consider a two-dimensional random quantity Z = (Z1, Z2) uniformly distributed
   in the domain Ω = [α, β] × [0, k]:

       g(z1, z2 | θ) dz1 dz2 = dz1 dz2 / ((β − α) k)

3) Which is the conditional distribution P(Z1 ≤ x | Z2 ≤ p(x | θ))? That is, the
   distribution of the accepted values:

       P(Z1 ≤ x | Z2 ≤ p(x | θ)) = P(Z1 ≤ x, Z2 ≤ p(x | θ)) / P(Z2 ≤ p(x | θ))

         = ∫_{−∞}^{x} dz1 1_{[a,b]}(z1) ∫_0^{p(z1|θ)} dz2
           / ∫_{−∞}^{∞} dz1 1_{[a,b]}(z1) ∫_0^{p(z1|θ)} dz2

         = ∫_{−∞}^{x} p(z1 | θ) dz1 = F(x | θ)

Algorithm (repeat i)-ii) n times):
i)  Sample z1_i ~ Un(z1 | α, β) and z2_i ~ Un(z2 | 0, k)
ii) If z2_i ≤ p(z1_i | θ): accept, x_i = z1_i.
    If z2_i > p(z1_i | θ): reject and go back to i).

Then {x1, x2, ..., xn} ~ p(x | θ).

[Figure: points below the curve p(x | θ) in [a,b] × [0, max p] are accepted;
points above it are rejected]
Example (pedagogical; low efficiency): X ~ Be(x | a, b)

density:   p(x | a, b) = x^{a−1} (1 − x)^{b−1} ;   x ∈ [0, 1]

(normalisation ∫_0^1 p(x | a, b) dx = B(a, b) ... not needed)

Covering adjusted for maximum efficiency: the maximum is at x_max = (a−1)/(a+b−2),

    max_x p(x | a, b) = p(x_max | a, b) = (a−1)^{a−1} (b−1)^{b−1} / (a+b−2)^{a+b−2}

and the pairs (x, y) are generated in the domain
Ω = [α, β] × [0, k] = [0, 1] × [0, max p(x | a, b)].

Algorithm (repeat i)-ii) n times):
i)  Sample x_i ~ Un(x | 0,1) and y_i ~ Un(y | 0, max p(x | a, b))
ii) If y_i ≤ p(x_i | a, b): accept x_i.
    If y_i > p(x_i | a, b): reject x_i and go to i).

For a = 2.3, b = 5.7 [figure: accepted sample vs Be(x | 2.3, 5.7)]:

    n_gen = 2,588,191     n_acc = 1,000,000     e_ff = n_acc/n_gen = 0.3864

and, since area(Ω) = max p(x | a, b),

    ∫_X p(x | θ) dx = area(Ω) (n_acc/n_gen) = 0.016791  (± 0.000013)

to be compared with B(2.3, 5.7) = 0.01678944.
... weighted events???

We can do the rejection as follows:

i)  Assign a weight to each generated pair (x_i, y_i):

        w_i = p(x_i | θ)/k ,    0 ≤ w_i ≤ 1

    (if we know max p(x | θ), take k = max p(x | θ)).

ii) Acceptance-rejection: u_i ~ Un(u | 0,1);
    accept x_i if u_i ≤ w_i, reject x_i if u_i > w_i.

Obviously, events with a larger weight are more likely to be accepted.

• After step ii), all weights are w_i = 0 if rejected and w_i = 1 if accepted.
• Sometimes it is interesting not to apply step ii) and keep the weights:

    ∫ g(x) p(x | θ) dx = E[g(X)]_p ≈ (1/w) Σ_{i=1}^{n} g(x_i) w_i ;    w = Σ_{i=1}^{n} w_i
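A sketch of the weighted-events estimator (assuming NumPy; same illustrative
Be(x | 2.3, 5.7) target and bound as in the previous sketch):

```python
# Keep uniform candidates and their weights w_i = p(x_i)/k, with no rejection step.
import numpy as np

rng = np.random.default_rng(seed=8)
p = lambda x: x**1.3 * (1.0 - x)**4.7     # unnormalised Be(x|2.3,5.7)
k = 0.05                                  # bound with p(x) <= k on [0,1]

x = rng.uniform(size=1_000_000)           # candidates, never rejected
w = p(x) / k                              # event weights in [0,1]
print(np.sum(x * w) / np.sum(w))          # weighted E[X] estimate, expect 0.2875
```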
And suddenly...

• Many times we do not know max p(x | θ), so we start with an initial guess k1.
• After having generated N_t events in the domain Ω = [α, β] × [0, k1], we find a
  value x_m with p(x_m | θ) > k1: an incorrect estimation of k = max p(x | θ).

[Figure: p(x | θ) on [a, b] exceeding the covering level k1 around x_m]

Don't throw away all generated events... The pairs (x, y) have been generated in
Ω = [α, β] × [0, k1] with constant density, so with k2 = p(x_m | θ) > k1 we require

    N_t / ((β − α) k1) = (N_t + N_e) / ((β − α) k2)

that is, we have to generate

    N_e = N_t ( k2/k1 − 1 )

additional pairs (x, y) ... in the domain [α, β] × [k1, k2], with the
pseudo-distribution

    g(x | θ) = [ p(x | θ) − k1 ] 1_{p(x|θ) > k1}(x)

... and proceed with Ω_2 = [α, β] × [0, k2].
Properties:

• Easy to implement and generalize to n dimensions.
• The efficiency

      e_f = (# accepted events)/(# generated events)
          = (area under p(x | θ)) / (area of covering domain Ω = [α, β] × [0, k])

  depends on the covering domain; the better adjusted the covering is, the higher
  the efficiency (e_f → 1 is equivalent to the Inverse Transform). In general,

      ∫_X p(x | θ) dx = area(Ω) e_f ≈ area(Ω) N_acc/N_gen

[Figure: accepted points under p(x | θ), rejected points above it]

Straightforward generalization to n dimensions. Algorithm:

i)  Generate n+1 uniformly distributed random quantities

       (x_i^{1)}, x_i^{2)}, ..., x_i^{n)}; y_i)
           ~ Un(x^{1)} | α_1, β_1) × ··· × Un(x^{n)} | α_n, β_n) × Un(y | 0, k)

ii) Accept (x_i^{1)}, ..., x_i^{n)}) if y_i ≤ p(x_i^{1)}, ..., x_i^{n)} | θ);
    reject if not, and go back to i).
3D example (pedagogical; low efficiency): points inside a torus of radii
(r_i, r_o) centred at (0,0,0), r_i being the tube radius.

Cover the torus by a parallelepiped:

    [−(r_i + r_o), (r_i + r_o)] × [−(r_i + r_o), (r_i + r_o)] × [−r_i, r_i]

Algorithm (repeat n times):
i)  Generate
       x_i ~ Un(x | −(r_i + r_o), (r_i + r_o))
       y_i ~ Un(y | −(r_i + r_o), (r_i + r_o))
       z_i ~ Un(z | −r_i, r_i)
ii) Reject and go back to i) if

       x_i² + y_i² > (r_i + r_o)²    or    x_i² + y_i² < (r_o − r_i)²
       or    z_i² > r_i² − ( r_o − (x_i² + y_i²)^{1/2} )²

    otherwise accept (x_i, y_i, z_i).

Then {(x1, y1, z1), ..., (xn, yn, zn)} are uniform in the torus T(r_o, r_i; 0,0,0).

For r_i = 1, r_o = 3:

    N_generated = 10,786     N_accepted = 5,000     N_accepted/N_generated = 0.4636

    e_f = (volume torus)/(volume parallelepiped) = 2π² r_o r_i² / ( 8 r_i (r_o + r_i)² )

Knowing that V_parallelepiped = 128, we can estimate

    V_torus = e_f · V_parallelepiped = 59.34 ± 0.61      (exact: V_torus = 59.218)
Problem 3D: sampling of Hydrogen atom wave functions

    ψ_{n,l,m}(r, θ, φ) = R_{n,l}(r) Y_{l,m}(θ, φ)
    P(r, θ, φ | n, l, m) ∝ |R_{n,l}(r)|² |Y_{l,m}(θ, φ)|² r² sin θ

[Figure: samplings of the (3,1,0), (3,2,0) and (3,2,±1) states projected on the
(x,y), (x,z) and (y,z) planes]

Evaluate the energy using the Virial Theorem:

    ⟨T⟩ = −(1/2)⟨V⟩    →    E_n = (1/2)⟨V⟩_n ;      V(r) = −(e²/(4πε₀)) (1/r)

(problem for convergence of the estimations,...)
Problems with low efficiency. Example in n dimensions: n-dimensional sphere of
radius r = 1 centred at (0,0,...,0).

i)  Sample (x_i^{1)}, ..., x_i^{n)}) ~ Un(x^{1)} | −1,1) × ··· × Un(x^{n)} | −1,1)

ii) Acceptance-rejection on y² = (x_i^{1)})² + (x_i^{2)})² + ··· + (x_i^{n)})²:
    if y² ≤ 1, accept (x_i^{1)}, ..., x_i^{n)}) as an inner point of the sphere
    (or (x_i^{1)}/y, ..., x_i^{n)}/y) over the surface);
    if y² > 1, reject the n-tuple and go back to i).

    e_f = V(S_n)/V(H_n) = [ 2 π^{n/2} / (n Γ(n/2)) ] / 2^n   →(n→∞)→   0

    e_f(n = 4) ≈ 31%        e_f(n = 5) ≈ 16%
Why do we have low efficiency? Most of the samplings are done in regions of the
sample space that have low probability.

Example: sampling of X ~ p(x | θ) ∝ e^{−θx} ; x ∈ [0,1]. If we generate values of
x uniformly in Ω_x = [0,1], most candidates fall where p(x | θ) is small ... do a
more clever sampling: "Importance Sampling" (usually used in combination with
acceptance-rejection), e.g. through

    F(x | θ) = (1 − e^{−θx}) / (1 − e^{−θ})
Example of inefficiencies: usual problem in Particle Physics...

    dσ = ((2π)^4 / F) |iM_fi|² dΦ_n(p1, ..., pn | p)

i)   Sample the phase-space variables of dΦ_n(p1, ..., pn | p)
ii)  Acceptance-rejection on |iM_fi|² → sample of events with the dynamics of the process
iii) Estimate the cross-section from E[|iM_fi|²]

"Naïve" and simple generation of the n-body phase-space, which factorises as

    dΦ_n(p1, ..., pn | p) = dΦ_j(p1, ..., pj | q) dΦ_{n+1−j}(q, p_{j+1}, ..., pn | p) (2π)³ dm_q²

In the p rest-frame, with S_k = Σ_{i=k}^{n} m_i, m_{q_0} = M_p and m_{q_{n−1}} = m_n:

    dΦ_n(p1, ..., pn | p) = dΦ_{n−1}(p2, ..., pn | q1) dΦ_2(q1, p1 | p) dm_{q1}²
        (m2 + ··· + mn) ≤ m_{q1} ≤ M_p − m1 , sampled as
        m_{q1} = S_2 + u_1 (m_{q_0} − S_1) ,    u_1 ~ Un(0,1)

    dΦ_{n−1}(p2, ..., pn | q1) = dΦ_{n−2}(p3, ..., pn | q2) dΦ_2(q2, p2 | q1) dm_{q2}²
        (m3 + ··· + mn) ≤ m_{q2} ≤ m_{q1} − m2 , sampled as
        m_{q2} = S_3 + u_2 (m_{q1} − S_2) ,    u_2 ~ Un(0,1)

and, in general, for the k = 1, ..., n−2 intermediate states:

    (m_{k+1} + ··· + mn) ≤ m_{q_k} ≤ m_{q_{k−1}} − m_k
    m_{q_k} = S_{k+1} + u_k (m_{q_{k−1}} − S_k) ,    u_k ~ Un(0,1)

Then use the two-body element

    dΦ_2(p1, p2 | p) = (1/4) ( |p1| / ((2π)^6 m_p) ) dΩ_1

apply the Lorentz boosts to the overall centre-of-mass system, and weight each
event with

    WT = Π_{k=0}^{n−2} λ^{1/2}(m_{q_k}, m_{n−k}, m_{q_{k+1}}) / m_{q_k}
    λ(x, y, z) = [x² − (y + z)²][x² − (y − z)²]

... but with |iM_fi|², cuts,...! this is usually ("very") inefficient.
Method 3: Importance Sampling

Sample X ~ p(x | θ) ≥ 0 ;  x ∈ Ω_X.

1) Express p(x | θ) as

       p(x | θ) = g(x | θ1) h(x | θ2) ≥ 0 ;   x ∈ Ω_X

   with i) g(x | θ1) ≥ 0 and h(x | θ2) ≥ 0 on Ω_X, and ii) h(x | θ2) a probability
   density, so that

       p(x | θ) dx = g(x | θ1) h(x | θ2) dx = g(x | θ1) dH(x | θ2)

   In particular, take a convenient h(x | θ2) ≥ 0 (easiness) and define
   g(x | θ1) = p(x | θ)/h(x | θ2).

2) Consider a sampling of X ~ h(x | θ2) and apply the acceptance-rejection
   algorithm to g(x | θ1) ≥ 0.

How are the accepted values distributed? With Y uniform in the rejection step,

    P(X ≤ x | Y ≤ g(x | θ1)) = P(X ≤ x, Y ≤ g(x | θ1)) / P(Y ≤ g(x | θ1))

      = ∫_{−∞}^{x} h(x' | θ2) g(x' | θ1) dx' / ∫_{−∞}^{∞} h(x' | θ2) g(x' | θ1) dx'

      = ∫_{−∞}^{x} p(x' | θ) dx' / ∫_{−∞}^{∞} p(x' | θ) dx' = F(x | θ)

Algorithm:
i)  Sample x_i ~ h(x | θ2)
ii) Apply acceptance-rejection to g(x | θ1)

{x1, x2, ..., xn} is then a sample drawn from p(x | θ) = g(x | θ1) h(x | θ2).

... there are ∞ manners to choose h(x | θ2):

• Easiest choice: h(x | θ2) = 1_Ω(x)/area(Ω), easy to invert so we can apply the
  Inverse Transform ... but this is just the acceptance-rejection procedure.
• We would like to choose h(x | θ2) such that, instead of sampling uniformly the
  whole domain Ω_X, we sample with higher probability from those regions where
  p(x | θ) is "more important".

Take h(x | θ2) as "close" as possible to p(x | θ): then

    g(x | θ1) = p(x | θ)/h(x | θ2)

will be "as flat as possible" ("flatter" than p(x | θ)) and we shall have a higher
efficiency when applying the acceptance-rejection algorithm.
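A sketch of the idea (assuming NumPy; the target e^{−θx} on [0,1] with θ = 3 is my
own illustrative choice). Here h is taken exactly proportional to p, so g is
constant and every candidate is accepted: the extreme case of "the closer h is to
p, the flatter g and the higher the efficiency".

```python
# Importance sampling + rejection: sample from h, accept with probability g/max(g).
import numpy as np

rng = np.random.default_rng(seed=10)
theta = 3.0
p = lambda x: np.exp(-theta * x)                  # target (unnormalised)

norm = 1.0 - np.exp(-theta)                       # normalisation of h on [0,1]
h = lambda x: theta * np.exp(-theta * x) / norm   # truncated exponential density
x = -np.log(1.0 - norm * rng.uniform(size=100_000)) / theta   # inverse transform of h

g = p(x) / h(x)                                   # here g is constant: norm/theta
accept = rng.uniform(size=x.size) <= g / g.max()  # rejection on g: all accepted
print(accept.mean(), x[accept].mean())            # efficiency 1, E[X] under p
```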
Example: X ~ Be(x | a, b) with a = 2.3, b = 5.7

density:   p(x | a, b) = x^{a−1} (1 − x)^{b−1} 1_{[0,1]}(x)

1) We did "standard" acceptance-rejection.
2) "Importance Sampling": write

       m = [a] = 2   →   a = m + δ1 ,  δ1 = 0.3
       n = [b] = 5   →   b = n + δ2 ,  δ2 = 0.7

       p(x | a, b) = x^{a−1} (1 − x)^{b−1} = [ x^{δ1} (1 − x)^{δ2} ] [ x^{m−1} (1 − x)^{n−1} ]

   so that

       dP(x | a, b) ∝ [ x^{δ1} (1 − x)^{δ2} ] dBe(x | m, n)

2.1) Generate X ~ Be(x | m, n) with integer parameters: with U_k ~ Un(x | 0,1),

         Y_m = −log( Π_{k=1}^{m} U_k ) ~ Ga(x | 1, m)
         Z = Y_m/(Y_m + Y_n) ~ Be(x | m, n)

2.2) Acceptance-rejection on g(x) = x^{δ1} (1 − x)^{δ2}, which is concave on
     x ∈ [0,1] for 0 < δ1, δ2 < 1, with

         max_x g(x) = δ1^{δ1} δ2^{δ2} / (δ1 + δ2)^{δ1 + δ2}

[Figure: p(x | θ) = Be(x | 2.3, 5.7) compared with the envelopes Un(x | 0,1) and
Be(x | 2,5), and the corresponding distribution functions P(X ≤ x)]

Acceptance-rejection (envelope Un(x | 0,1)):
    n_gen = 2,588,191     n_acc = 1,000,000     eff = 0.3864
    B(2.3, 5.7) = eff × f_max = 0.016791 ± 0.000013

Importance Sampling (envelope Be(x | 2,5)):
    n_gen = 1,077,712     n_acc = 1,000,000     eff = 0.9279
    B(2.3, 5.7) = eff × f_max × B(2, 5) = 0.0167912 ± 0.0000045

to be compared with B(2.3, 5.7) = 0.01678944.
Method 4: Decomposition (... trick)

Idea: decompose the pdf as a sum of simpler densities

    p(x | θ) = Σ_{i=1}^{m} a_i p_i(x | θ) ;     a_i > 0 ,  p_i(x | θ) ≥ 0

Normalisation:

    1 = ∫ p(x | θ) dx = Σ_{i=1}^{m} a_i ∫ p_i(x | θ) dx = Σ_{i=1}^{m} a_i

Each density p_i(x | θ), i = 1, ..., m, has a relative weight a_i, so sample
X ~ p_i(x | θ) with probability p_i = a_i ... thus, more often from those with
larger weight.

Algorithm (repeat n times):
i)  Generate u_i ~ Un(u | 0,1) to select p_i(x | θ) with probability a_i
ii) Sample X ~ p_i(x | θ)

Note: sometimes the pdfs p_i(x | θ) can not be easily integrated ... normalisation
unknown → generate from f_i(x | θ) ∝ p_i(x | θ), evaluate the normalisation
integrals I_i during the generation process (numeric integration), and eventually
assign a weight w_i = a_i I_i to each event:

    p(x | θ) = Σ_{i=1}^{m} a_i I_i ( f_i(x | θ)/I_i ) = Σ_{i=1}^{m} (a_i I_i) p_i(x | θ)
Example:    X ~ p(x) = (3/8)(1 + x²) ;   x ∈ [−1, 1]

With g1(x) = 1 and g2(x) = x², normalisation gives p1(x) = 1/2 and
p2(x) = (3/2) x², so

    p(x) = (3/4) p1(x) + (1/4) p2(x)

Algorithm:
i)  Generate u_i ~ Un(u | 0,1)
ii) Sample from p1(x) = const if u_i ≤ 3/4 (75% of the times) and from
    p2(x) ∝ x² if 3/4 < u_i ≤ 1 (25% of the times)

[Figure: the two components p1(x) = const and p2(x) ∝ x², and the generated
sample vs p(x) ∝ 1 + x²]
Generalisation: extend the sum Σ_{i=1}^{m} g_i(x) to a continuous family
∫ g(y) dy and consider p(x | θ) as a marginal density:

    p(x | θ) = ∫_Ωy p(x, y | θ) dy = ∫_Ωy p(x | y, θ) p(y | θ) dy

Algorithm (repeat n times):
i)  Sample y_i ~ p(y | θ)
ii) Sample x_i ~ p(x | y_i, θ)

This is the structure in Bayesian analysis, p(θ | x) ∝ p(x | θ) π(θ), and in
experimental resolution:

    p_obs(y | θ, σ) = ∫_Y R(y | x, σ) p_t(x | θ) dx ;
    p(x, y | θ, σ) = R(y | x, σ) p_t(x | θ)
The Normal Distribution

    X ~ N(x | μ, σ) = (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)}

Standardisation: Z = (X − μ)/σ ~ N(z | 0,1) = (1/√(2π)) e^{−z²/2}, and X = μ + σZ.

but...

i)  Inverse Transform: F(z) is not an elementary function (it is an entire
    function; the series expansion is convergent but the inversion is slow,...).

ii) Central Limit Theorem: with U_i ~ Un(u | 0,1),

        Z = Σ_{i=1}^{12} U_i − 6 ;    Z ∈ [−6, 6] ,  E[Z] = 0 ,  V(Z) = 1

    is approximately Z ~ N(z | 0,1).

iii) Acceptance-Rejection: not efficient, although...

iv) Importance Sampling: easy to find good approximations.
Sometimes, going to higher dimensions helps: Inverse Transform in two dimensions
(G.E.P. Box, M.E. Muller; 1958)

i)  Consider two independent random quantities X1, X2 ~ N(x | 0,1):

        p(x1, x2) = (1/(2π)) e^{−(x1² + x2²)/2}

ii) Polar transformation: X1 = R cos Θ, X2 = R sin Θ, with
    (X1, X2) → (R, Θ) ∈ [0, ∞) × [0, 2π):

        p(r, θ) = (1/(2π)) e^{−r²/2} r = p(r) p(θ)     →   (R, Θ) independent

iii) Marginal distributions:

        F_r(r) = ∫_0^{2π} p(r, θ) dθ = 1 − e^{−r²/2}
        F_θ(θ) = ∫_0^{∞} p(r, θ) dr = θ/(2π)

... trivial inversion.

Algorithm (repeat n times):
i)   Generate u1, u2 ~ Un(u | 0,1)
ii)  Invert F_r(r) and F_θ(θ) to get r = √(−2 ln u1) and θ = 2π u2
iii) Obtain x1, x2 ~ N(x | 0,1) (independent) as

        x1 = √(−2 ln u1) cos(2π u2)
        x2 = √(−2 ln u1) sin(2π u2)
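A minimal Box-Muller sketch (assuming NumPy):

```python
# Box-Muller: two independent N(0,1) variates from two uniforms.
import numpy as np

rng = np.random.default_rng(seed=12)
u1, u2 = rng.uniform(size=(2, 100_000))

r = np.sqrt(-2.0 * np.log(u1))        # F_r^{-1}(u1)
phi = 2.0 * np.pi * u2                # F_theta^{-1}(u2)
x1, x2 = r * np.cos(phi), r * np.sin(phi)

print(x1.std(), x2.std(), np.corrcoef(x1, x2)[0, 1])   # ~1, ~1, ~0
```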
Example: 2-dimensional Normal random quantity, sampling from conditional
densities. (X1, X2) ~ N(x1, x2 | μ1, μ2, σ1, σ2, ρ). After the standardisation
W_i = (X_i − μ_i)/σ_i:

    p(w1, w2 | ρ) = p(w2 | w1, ρ) p(w1) = N(w2 | ρ w1, (1 − ρ²)) N(w1 | 0,1)

so Z1 (= W1) ~ N(z1 | 0,1) and Z2 = (W2 − ρ W1)/√(1 − ρ²) ~ N(z2 | 0,1).

i)  Generate z1, z2 ~ N(z | 0,1)
ii) Set
        x1 = μ1 + σ1 z1
        x2 = μ2 + σ2 ( ρ z1 + z2 √(1 − ρ²) )

... there are different ways.
n-dimensional Normal density

Factorisation theorem (Cholesky): if V (n×n) is symmetric and positive definite,
there is a unique lower triangular matrix C, with positive diagonal elements,
such that V = C Cᵀ. Then, since

    (x − μ)ᵀ V^{−1} (x − μ) = [C^{−1}(x − μ)]ᵀ [C^{−1}(x − μ)]

if X ~ N(x | μ, V), take X = μ + CY with Y ~ N(y | 0, I).

C is lower triangular (C_ij = 0 for j > i) with elements

    C_i1 = V_i1 / √V_11                                 ;  1 ≤ i ≤ n
    C_ij = ( V_ij − Σ_{k=1}^{j−1} C_ik C_jk ) / C_jj     ;  1 < j < i ≤ n
    C_ii = ( V_ii − Σ_{k=1}^{i−1} C_ik² )^{1/2}          ;  1 < i ≤ n

Algorithm:
0)  Determine the matrix C from V.
i)  Generate n independent random quantities {z1, z2, ..., zn}, each as
    z_i ~ N(z | 0,1).
ii) Get x_i = μ_i + Σ_{j=1}^{n} C_ij z_j, that is, x = μ + Cz ~ N(x | μ, V)
    (and conversely z = C^{−1}(x − μ)).

Example: 2-dimensional Normal random quantity

    V = ( σ1²     ρσ1σ2 )         C = ( σ1     0          )
        ( ρσ1σ2   σ2²   )             ( ρσ2    σ2√(1−ρ²)  )

since C_11 = √V_11 = σ1, C_21 = V_21/C_11 = ρσ2, C_12 = 0 and
C_22 = √(V_22 − C_21²) = σ2√(1 − ρ²). With z1, z2 ~ N(z | 0,1): x = μ + Cz.
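A sketch of n-dimensional Normal sampling via Cholesky (assuming NumPy; μ and V
are arbitrary illustrative choices, V symmetric and positive definite):

```python
# Multivariate Normal sampling: x = mu + C z with V = C C^T.
import numpy as np

rng = np.random.default_rng(seed=13)
mu = np.array([1.0, -2.0, 0.5])
V = np.array([[ 2.0,  0.6, -0.3],
              [ 0.6,  1.0,  0.2],
              [-0.3,  0.2,  0.5]])

C = np.linalg.cholesky(V)              # lower triangular, V = C C^T
z = rng.standard_normal((100_000, 3))
x = mu + z @ C.T                       # x = mu + C z for each event

print(x.mean(axis=0))                  # ~ mu
print(np.cov(x, rowvar=False))         # ~ V
```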
Example: interaction of photons with matter (Compton scattering)

Interaction of photons with matter: Photoelectric, Rayleigh (Thomson), Compton,
e+e− pairs, photonuclear absorption.

[Figure: radiography set-up, a photon beam crossing carbon and iron blocks onto a
photographic film]

1) What kind of process?

    σ_t = σ_Compton + σ_pairs + ···

    p_Compton = σ_Compton/σ_t ,    p_pairs = σ_pairs/σ_t , ...

(σ = cross-section = "probability of interaction" with an atom, expressed in cm²)

With u ~ Un(u | 0,1) and the cumulative values
F_1 = p_Compton, F_2 = p_Compton + p_pairs, ..., F_n = 1:

    u ≤ F_1          →  Compton
    F_1 < u ≤ F_2    →  pairs
    ...
2) Where do we have the first/next interaction?

In a thin slab of surface S and thickness δ there is a volume Sδ cm³ and, with
density ρ g cm⁻³, we have S δ ρ N_A/A atoms. The "interaction surface" covered by
the atoms is

    S_eff = σ S δ ρ (N_A/A)   cm²

so the probability to interact with an atom in the thin slab is

    p = P_I(δ) = S_eff/S = σ δ ρ (N_A/A) ≡ δ/λ

A distance x contains m = x/δ slabs of thickness δ, therefore

    P_{One I}(x ≤ x' < x + δ) = (1 − p)^m p ≈ e^{−mp} p = e^{−x/λ} λ^{−1} dx

    p(x | λ) = λ^{−1} e^{−x/λ} ;    x ∈ [0, ∞)

with the "mean free path" E[x] = λ:

    λ = A/(σ_t ρ N_A)   cm ;       σ_t = σ_Compton + σ_pairs + ···

Generate the distance until the next interaction,

    p_int(x | λ) = λ^{−1} e^{−x/λ} ,    F_int(x | λ) = 1 − e^{−x/λ}

... by Inverse Transform: u ~ Un(u | 0,1)  →  x = −λ ln u  cm.

In this example, we shall simulate only the Compton process, so I took the "Mean
Free Path" for Compton interaction only.
The integrated Compton cross-section per free electron (Klein-Nishina), with
a = E/m_e:

    σ_0(E) = σ_Thomson (3/4) { (1+a)/a² [ 2(1+a)/(1+2a) − ln(1+2a)/a ]
                                + ln(1+2a)/(2a) − (1+3a)/(1+2a)² }

    σ_atom = Z σ_0(E)     →     λ(E) = A / ( Z σ_0(E) ρ N_A )   cm
3) Compton interaction

easy: of the 8 variables of the two outgoing particles, the on-shell conditions
and energy-momentum conservation leave 2 (8−2−4): θ, φ

[Figure: incoming photon E_in, outgoing photon E_out at angle θ, recoil electron e−]

With a = E_in/m_e:

    E_out = E_in / ( 1 + a(1 − cos θ) )

    θ = 0 →  E_out = max E_out = E_in ;      θ = π →  E_out = min E_out = E_in/(1 + 2a)

Perturbative expansion in Relativistic Quantum Mechanics (Klein-Nishina):

    dσ/dx = (3 σ_Thomson / 8) f(x) ;     x = cos θ ∈ [−1, 1] ,  θ ∈ [0, π]

    f(x) = 1/(1 + a(1−x))² [ 1 + x² + a²(1−x)² / (1 + a(1−x)) ]

    σ_Thomson = 0.665 barn = 0.665 × 10⁻²⁴ cm²

φ ∈ [0, 2π) (already integrated over).
3.1) Generate polar angle for the outgoing photon ()
x  cos   1,1
1
f ( x) 
1  a (1  x )2

a 2 (1  x ) 2 
2

1  x  1  a (1  x ) 



complicated to apply inverse transform…
straight-forward acceptance-rejection very inefficient...
Use: Decomposition + inverse transform + acceptance-rejection
f ( x )   f1 ( x )  f 2 ( x )  f 3 ( x ) g ( x )
1
fn ( x) 
1  a (1  x )n
( 2  x 2 ) f1 ( x )
g ( x)  1 
1  f1 ( x )  f 2 ( x )
f n ( x)  0
easy to invert… Inverse Transform
g ( x)  0
“fairly flat”… Acceptance-rejection
f ( x)   f1 ( x)  f 3 ( x)  f 3 ( x) g ( x)
Probability Densities
1
pi ( x ) 
fi ( x)
wi
1

pi ( x ) dx  1
1
wi 

f i ( x ) dx
1
b  1  2a
1
fn ( x) 
1
1  a (1  x )n
1
ln b
a
2
w2  2 2
a (b  1)
2b
w3  3 2
a (b  1) 2
w1 
f ( x)   f1 ( x)  f 3 ( x)  f 3 ( x) g ( x) 
 w1 p1 ( x )  w2 p2 ( x )  w3 p3 ( x ) g ( x ) 
wT  w1  w2  w3
w
i  i
wT
1  2  3  1
 wT 1 p1 ( x )  2 p2 ( x )  3 p3 ( x ) g ( x ) 
 wT h( x) g ( x)  f ( x)
… we have everything...
1) Sample
X g ~ wT 1 p1 ( x)   2 p2 ( x)   3 p3 ( x) g ( x)
u  1
Decomposition
u ~ Un(u 0,1)
Inverse transform
x g ~ p1 ( x )
1  u  1  2
1  2  u
x g ~ p2 ( x )
x g ~ p3 ( x )
x
F ( x )i 
 p ( s) ds
i
u ~ Un(u 0,1)
1
ln 1  a (1  x )
F1 ( x )  1 
ln b
2
b 1
1
F2 ( x ) 

2(b  x ) 2a
 (b  1)2

1
F3 ( x ) 

1
2

4a (1  a ) 
 (b  x )

xg
xg
xg
1  a  bu

a
a (b2  1)
b
1  2au
b 1
b
1  4a(1  a )u12
acceptance-rejection u ~ Un(u 0, g M )
g M  maxg (x) 
 g ( x  1)  1 
b
1  b  b2
u  g ( xg )
accept
xg
u  g ( xg )
reject
xg
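A sketch of this composition + rejection sampler (assuming NumPy; a = E_in/m_e
for a 1 MeV photon is my own illustrative choice, and the formulas are those
derived above):

```python
# Klein-Nishina polar angle: decomposition + inverse transform + rejection on g.
import numpy as np

def sample_cos_theta(a, rng):
    b = 1.0 + 2.0 * a
    t = lambda x: 1.0 / (1.0 + a * (1.0 - x))           # f1(x)
    g = lambda x: 1.0 - (2.0 - x**2) * t(x) / (1.0 + t(x) + t(x)**2)
    w = np.array([np.log(b) / a, 2.0 / b, (b**2 - 1.0) / (2.0 * a * b**2)])
    alpha = np.cumsum(w) / w.sum()                      # mixture probabilities
    gM = g(-1.0)                                        # maximum of g on [-1,1]
    while True:
        u, v, ug = rng.uniform(size=3)
        if u <= alpha[0]:                               # x ~ p1, inverse transform
            x = (1.0 + a - b**v) / a
        elif u <= alpha[1]:                             # x ~ p2
            x = 1.0 - (b / (1.0 + 2.0 * a * v) - 1.0) / a
        else:                                           # x ~ p3
            x = 1.0 - (b / np.sqrt(1.0 + 4.0 * a * (1.0 + a) * v) - 1.0) / a
        if ug * gM <= g(x):                             # rejection on g
            return x

rng = np.random.default_rng(seed=14)
a = 1.0 / 0.511                                        # E_in = 1 MeV photon
xs = np.array([sample_cos_theta(a, rng) for _ in range(20_000)])
print(xs.mean())                                       # forward peaked: mean > 0
```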
2-body kinematics ... for the outgoing photon and electron, with
x_g = cos θ_γ ~ f(x):

    E_out = E_in / ( 1 + a(1 − x_g) ) ;      θ_γ = acos(x_g) ,  φ_γ ~ Un(φ | 0, 2π)

... for the electron:

    E_e^out = E_in ( 1 − 1/(1 + a(1 − x_g)) ) ;      tan θ_e = cot(θ_γ/2)/(1 + a)

(by momentum conservation the electron is emitted in the opposite azimuthal
half-plane). Angles are with respect to the direction of incidence of the
photon!! ... rotation to the overall frame.

[Figure: E_out/E_in distribution for 100,000 generated photons of E_in = 1 MeV,
and the trajectory of one photon in C (Z = 6, A = 12, ρ = 2.26 g cm⁻³) and
Fe (Z = 26, A = 55.85, ρ = 7.87 g cm⁻³)]
END OF FIRST PART

To come: Markov Chain Monte Carlo (Metropolis, Hastings, Gibbs,...)
Examples: Path Integrals in Quantum Mechanics, Bayesian Inference
Method 5: Markov Chain Monte Carlo

Example: X ~ Bi(k | n, p) with

    P(X = k | n, p) = C(n,k) p^k (1 − p)^{n−k} ;    k = 0, 1, 2, ..., n

    n = 10 ,  p = 0.45 ,  k = 0, 1, ..., 10   →   nbins = 11

Probability vector: π = (π_1, π_2, ..., π_{n+1}) with π_{k+1} = P(X = k),
π_i ≥ 0 and Σ_{i=1}^{n+1} π_i = 1.

Step 0) Generate N = 100,000 events uniformly over the 11 bins,
UnD(k | 1, ..., n+1 = 11), that is P(X = k) = 1/(n+1) = 1/11, giving populations

    (d_1, d_2, ..., d_{n+1}) = d ;    Σ_{i=1}^{n+1} d_i = N

and the sampling probability vector

    π^(0) = (π_1^(0), π_2^(0), ..., π_{n+1}^(0)) = (d_1/N, d_2/N, ..., d_{n+1}/N)

First step done: we already have the N = 100,000 events. But we want a sampling
from P(X = k) = π_k.

Second step: redistribute all N generated events, moving each event from its
present bin to a different one (or the same), to get eventually a sampling from
P(X = k) = π_k. HOW?

In one step: an event in bin i goes to bin j with probability P(X = j | n, p)
... but this is equivalent to sampling directly from Bi(k | n, p).
Sequentially: at every step of the evolution, we move all events from the bin
where they are to a new bin (may be the same) with a migration probability

    (P)_ij = P(i → j) = a(j | i)

Evolution of the populations and of the probability vector:

    step 0:     d^(0) = (d_1^(0), ..., d_{n+1}^(0))        π^(0) = (π_1^(0), ..., π_{n+1}^(0))
    step 1:     d^(1) = (d_1^(1), ..., d_{n+1}^(1))        π^(1) = (π_1^(1), ..., π_{n+1}^(1))
    ...
    step i:     d^(i)                                      π^(i)
    step i+1:   d^(i+1)                                    π^(i+1)
    ...

Transitions among the states of the system are driven by the transition matrix

    P = ( p_11  p_12  ...  p_1n )
        ( p_21  p_22  ...  p_2n )
        ( ...   ...   ...  ...  )
        ( p_n1  p_n2  ...  p_nn )         (P)_ij = P(i → j) = a(j | i)

    π^(1) = π^(0) P
    π^(2) = π^(1) P = π^(0) P²
    ...
    π^(k) = π^(k−1) P = π^(k−2) P² = ··· = π^(0) P^k

Goal: find a transition matrix P ∈ R^{n×n} that allows to go from π^(0) to the
desired π = (π_1, π_2, ..., π_N).

π^(i+1) depends upon π^(i) and not on π^(j<i): a Markov Chain with transition
matrix P. The transition matrix is a probability matrix:

    Σ_{j=1}^{n} p_ij = Σ_{j=1}^{n} P(i → j) = 1

(the probability to go from state i to whatever state, itself included, is 1).
Reminder: if the Markov Chain is...

irreducible: all the states of the system communicate among themselves, and
ergodic, that is, the states of the system are
    recurrent: being at one state, we shall return to it with p = 1;
    positive: the expected return time is finite;
    aperiodic: the system is not trapped in cycles;

then:
i)   There is a unique stationary distribution π with π = πP (unique fixed vector).
ii)  Starting at any arbitrary state π^(0), the sequence
     π^(0), π^(1) = π^(0)P, ..., π^(n) = π^(n−1)P = ··· = π^(0)P^n, ...
     tends asymptotically to the fixed vector π = πP.
iii) lim_{n→∞} P^n is the matrix whose rows are all equal to (π_1, π_2, ..., π_N).
... there are ∞ ways to choose the Probability Transition Matrix.

A sufficient condition (not necessary) for π to be a fixed vector of P is that
the Detailed Balance relation

    π_i (P)_ij = π_j (P)_ji

is satisfied. Why? It assures that π = πP:

    π P = ( Σ_{i=1}^{N} π_i P_i1 , Σ_{i=1}^{N} π_i P_i2 , ..., Σ_{i=1}^{N} π_i P_iN )

and, if π_i (P)_ij = π_j (P)_ji,

    Σ_{i=1}^{N} π_i P_ik = Σ_{i=1}^{N} π_k P_ki = π_k ;    k = 1, 2, ..., N
Procedure to follow

After specifying (P)_ij = P(i → j) = a_ij according to the Detailed Balance
condition, at each step (i → i+1) of the evolution of the probability vector
(state of the system) π^(i), with populations d^(i), and for all the 11 bins:

i)  For each event at bin i = 1, ..., 11 choose a candidate bin j to go among the
    11 possible ones as UnD(k | 1, 11).
ii) Accept the migration i → j with probability a_ij
    (leave the event at bin i with probability 1 − a_ij).

How do we take a_ij = a(j | i) so that lim_{i→∞} π^(i) = π = (p_0, p_1, ..., p_10)?
Detailed Balance condition:

    π_i (P)_ij = π_i a_ij = π_j a_ji = π_j (P)_ji

Take (Metropolis):

    π_i ≥ π_j :   a(j | i) = min{1, π_j/π_i} = π_j/π_i ;    a(i | j) = min{1, π_i/π_j} = 1
    π_i ≤ π_j :   a(j | i) = min{1, π_j/π_i} = 1 ;          a(i | j) = min{1, π_i/π_j} = π_i/π_j

In both cases π_i a(j | i) = π_j a(i | j), so Detailed Balance holds and
lim_{i→∞} π^(i) = π.
For instance, at step t ... for an event in bin i = 7, choose a bin j to go as
J ~ UnD(j | 1, 11):

    j = 2:   a_72 = a(7 → 2) = min{1, π_2/π_7} = min{1, P(X = 1)/P(X = 6)} = 0.026
             u ~ Un(u | 0,1):
             u ≤ 0.026  →  move the event to bin 2
             u > 0.026  →  leave the event in bin 7

    j = 6:   a_76 = a(7 → 6) = min{1, π_6/π_7} = min{1, P(X = 5)/P(X = 6)}
                  = min{1, 1.47} = 1   →  move the event to bin 6

[Figure: bin populations for n = 10, p = 0.45 after 20 steps, approaching
X ~ Bi(k | n = 10, p = 0.45) with E[X] = nθ = 4.5 and V[X] = nθ(1 − θ) = 2.475]

Convergence? Monitor, for instance, the Kullback-Leibler discrepancy

    D_KL[p | p̃] = Σ_k p_k log( p_k / p̃_k )

and watch for trends, correlations, "good mixing",...
Still freedom to choose the Transition Matrix P satisfying the Detailed Balance
condition π_i (P)_ij = π_j (P)_ji:

• Trivial election: (P)_ij = π_j (go i → j with probability π_j): select a new
  state migrating events from one bin to another with probability P(X = k) = π_k.

• Basis for Markov Chain Monte Carlo simulation:

      (P)_ij = q(j | i) a_ij

  where q(j | i) is a simple probability to choose a new possible bin j for an
  event that is at bin i, and a_ij is the probability to accept the proposed new
  bin j for an event at bin i, taken in such a way that the Detailed Balance
  condition is satisfied.

Metropolis-Hastings algorithm:

    a_ij = min{ 1, π_j q(i | j) / ( π_i q(j | i) ) }

Check (say π_j q(i | j) ≤ π_i q(j | i)):

    a_ij = π_j q(i | j)/(π_i q(j | i)) ;    a_ji = min{1, π_i q(j | i)/(π_j q(i | j))} = 1

    π_i (P)_ij = π_i q(j | i) a_ij = π_j q(i | j) = π_j q(i | j) a_ji = π_j (P)_ji

One can take q(j | i) ≠ q(i | j) (not symmetric), even q(j | i) = q(j), but better
as close as possible to the desired distribution for a high acceptance
probability: q(i → j) = π_j gives a_ij = 1.

If symmetric, q(j | i) = q(i | j): Metropolis algorithm,

    a_ij = min{ 1, π_j/π_i }
• For absolutely continuous distributions, X ~ π(x | θ) with probability density
function π(x | θ):

    p(x → x') = q(x' | x) a(x → x')

    a(x → x') = min{ 1, π(x' | θ) q(x | x') / ( π(x | θ) q(x' | x) ) } ;   x, x' ∈ Ω_X

Example: X ~ Be(x | 4, 2), π(s) ∝ s³(1 − s), starting from X ~ Un(x | 0,1) at
step 0, with p(s → s') = q(s' | s) a(s → s'):

    Metropolis-Hastings, q(s' | s) = 2s':
        a(s → s') = min{ 1, s'²(1 − s') / ( s²(1 − s) ) }

    Metropolis, q(s' | s) = Un(s' | 0,1):
        a(s → s') = min{ 1, s'³(1 − s') / ( s³(1 − s) ) }

[Figure: histograms at step 0, step 1, ... and after 20 steps, approaching
Be(x | 4, 2)]
In practice, we do not proceed as in the previous examples (meant for
illustration): once equilibrium is reached, different steps of a single chain
provide different samplings from the "same" distribution.

1) At step t = 0, choose an admissible arbitrary state x^(0) ∈ Ω_X with
   π(x^(0) | θ) > 0.
2) Generate a proposed new value x' from the distribution q(x' | x^(t−1), θ).
3) Accept the new value x' with probability

       a(x', x^(t−1)) = min{ 1, π(x' | θ) q(x^(t−1) | x', θ)
                              / ( π(x^(t−1) | θ) q(x' | x^(t−1), θ) ) }

   (Metropolis-Hastings, q(x' | x^(t−1), θ) ≠ q(x^(t−1) | x', θ), not symmetric);
   if q(x' | x^(t−1), θ) = q(x^(t−1) | x', θ) (symmetric),

       a(x', x^(t−1)) = min{ 1, π(x' | θ)/π(x^(t−1) | θ) }     (Metropolis)

4) If accepted, set x^(t) = x'; otherwise set x^(t) = x^(t−1). Then t → t+1 and
   go to 2).

Notes:
1) Need some initial steps to reach "equilibrium" (stable running conditions).
2) For samplings, take draws every few steps to reduce correlations (if important).
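A sketch of this single-chain procedure, applied to the Be(x | 4, 2) example
above with the uniform (symmetric) proposal (assuming NumPy; the burn-in and
thinning values are illustrative):

```python
# Metropolis chain for pi(s) ~ s^3 (1-s) with an independent uniform proposal.
import numpy as np

rng = np.random.default_rng(seed=15)
target = lambda s: s**3 * (1.0 - s)        # unnormalised density

n_steps, burn_in, thin = 200_000, 1_000, 5
x, chain = 0.5, []
for t in range(n_steps):
    x_new = rng.uniform()                  # symmetric proposal on [0,1]
    if rng.uniform() <= min(1.0, target(x_new) / target(x)):
        x = x_new                          # accept; otherwise keep x
    if t >= burn_in and t % thin == 0:
        chain.append(x)

chain = np.array(chain)
print(chain.mean())                        # E[X] = 4/6 = 0.667 for Be(4,2)
```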
Properties:

• "Easy" and powerful: virtually any density, regardless of the number of
  dimensions and analytic complexity.
• Changes of configuration depend only on the ratio

      r = π(s') q(s | s') / ( π(s) q(s' | s) )

  so the normalization is not relevant.
• The sampling is "correct" asymptotically: need previous sweeps ("burn-in",
  "thermalization") to reach the asymptotic limit.
• Correlations among different states: if an issue, take additional sweeps.

... nevertheless, in many circumstances it is the only practical solution...
And last...

1) Some ("many") times we do not have the explicit expression of the pdf but we
   know the conditional densities.
2) Usually, conditional densities have simpler expressions.
3) Bayesian structure (and in particular hierarchical models):

       p(θ, λ | x) ∝ p(x | θ, λ) p(θ | λ) p(λ)

... sampling from conditional densities: Markov Chain Monte Carlo with Gibbs
Sampling.
EXAMPLE: sampling from conditionals. Sampling of X ~ St(x | ν), ν ∈ R⁺:

    p(x | ν) = N ( 1 + x²/ν )^{−(ν+1)/2}

The distribution function involves the hypergeometric function,

    F(x | ν) = 1/2 + x F(1/2, (ν+1)/2; 3/2; −x²/ν) Γ((ν+1)/2) / ( √(πν) Γ(ν/2) )

so the inverse transform is not practical.

0) Consider that ∫_0^∞ u^{b−1} e^{−au} du = Γ(b) a^{−b}.

1) Introduce a new random quantity U ~ Ga(u | a, b) (extra dimension) with

       a = 1 + x²/ν ,     b = (ν+1)/2

       p(x, u | ν) = N u^{b−1} e^{−au} / Γ(b)

Marginals (∫dx ∫ p(x, u) du = 1):

    p(x | ν) = ∫_0^∞ p(x, u | ν) du = N a^{−b} ~ St(x | ν)
    p(u | ν) = ∫ p(x, u | ν) dx ∝ u^{b−3/2} e^{−u}

Conditionals:

    p(x | u, ν) = p(x, u | ν)/p(u | ν) = N(x | 0, σ² = ν (2u)^{−1})
    p(u | x, ν) = p(x, u | ν)/p(x | ν) = Ga(u | a, b)

Sampling (Gibbs):
1) Start at t = 0 with x_0 ∈ R.
2) At step t, sample
       U | X ~ Ga(u | a, b)    with a = 1 + x²_{t−1}/ν ,  b = (ν+1)/2
       X | U ~ N(x | 0, σ² = ν (2u)^{−1})

[Figure: sampling of X ~ St(x | ν = 5): E[X] = 0 and E[X²] = ν/(ν−2) = 1.667]
Yet another EXAMPLE... trivial marginals for the Behrens-Fisher problem (lect. 2):

    W = μ1 − μ2 ~ p(w | x1, x2)
      ∝ ∫ [ s1² + (x̄1 − w − u)² ]^{−(n1 − 1/2)} [ s2² + (x̄2 − u)² ]^{−(n2 − 1/2)} du

However,

    p(μ1, μ2 | x1, x2) = p(μ1 | x1) p(μ2 | x2)

with

    T_i = √(n_i − 1) (μ_i − x̄_i)/s_i ~ St(t_i | n_i − 1) ;    i = 1, 2

so, instead of sampling from p(w | x1, x2):

Algorithm (repeat n times):
1) t_i ~ St(t_i | n_i − 1) ,  i = 1, 2
2) μ_i = x̄_i + t_i s_i (n_i − 1)^{−1/2} ;     w = μ1 − μ2
Basic idea: we want a sampling of X = (X1, X2, ..., Xn) ~ p(x1, x2, ..., xn).

With s_i = {x1, ..., x_{i−1}, x_{i+1}, ..., xn}, define the marginal and
conditional densities

    p(s_i) = ∫ p(x1, x2, ..., xn) dx_i          p(x_i | s_i) = p(x1, x2, ..., xn)/p(s_i)

and sample the n-dimensional random quantity X from the conditional distributions

    p(x1 | x2, x3, ..., xn)
    p(x2 | x1, x3, ..., xn)
    ...
    p(xn | x1, x2, ..., x_{n−1})
First, sampling from conditional densities with p(x → x') = q(x' | x) a(x → x').
Consider:

1) The probability density function p(x1, x2, ..., xn) we wish to sample.
2) Proposal conditional densities

       q(x1 | x2, x3, ..., xn)
       q(x2 | x1, x3, ..., xn)
       ...
       q(xn | x1, x2, ..., x_{n−1})

3) An arbitrary initial state (x1(0), x2(0), ..., xn(0)) ∈ Ω_X.

4) Sampling of the X_k, k = 1, ..., n, random quantities at step t (usually fewer
   than n updates are needed): generate a possible new value x_k(t) of X_k from
   the conditional density

       q(x_k | x1(t), x2(t), ..., x_{k−1}(t), x_{k+1}(t−1), ..., xn(t−1))

   that is, propose a change of the system from the state

       s  = ( x1(t), ..., x_{k−1}(t), x_k(t−1), x_{k+1}(t−1), ..., xn(t−1) )
   to
       s' = ( x1(t), ..., x_{k−1}(t), x_k(t),   x_{k+1}(t−1), ..., xn(t−1) )

Metropolis-Hastings with desired density p(x1, x2, ..., xn): acceptance factor

    ρ = [ p(x1(t), ..., x_{k−1}(t), x_k(t), x_{k+1}(t−1), ..., xn(t−1))
        / p(x1(t), ..., x_{k−1}(t), x_k(t−1), x_{k+1}(t−1), ..., xn(t−1)) ]
      × [ q(x_k(t−1) | x1(t), ..., x_{k−1}(t), x_{k+1}(t−1), ..., xn(t−1))
        / q(x_k(t)   | x1(t), ..., x_{k−1}(t), x_{k+1}(t−1), ..., xn(t−1)) ]

so we shall accept the change with probability a(s → s') = min(1, ρ). At the end
of step t the sequences (x1(t), x2(t), ..., xn(t)) ∈ Ω_X converge towards the
stationary p.d.f. p(x1, x2, ..., xn), regardless of the starting sampling
(x1(0), x2(0), ..., xn(0)) ∈ Ω_X.

In particular,... Gibbs algorithm: take as conditional densities to generate a
proposed value the desired conditional densities,

    q(x_k | x1(t), ..., x_{k−1}(t), x_{k+1}(t−1), ..., xn(t−1))
        = p(x_k | x1(t), ..., x_{k−1}(t), x_{k+1}(t−1), ..., xn(t−1))

Then, since p(x_k | s_k) = p(x1, ..., xn)/p(s_k), the acceptance factor becomes

    ρ = [ p(..., x_k(t), ...) / p(..., x_k(t−1), ...) ]
      × [ p(x_k(t−1) | s_k) / p(x_k(t) | s_k) ] = 1

so we accept the "proposed" change with probability a(s → s') = 1.

After enough steps to erase the effect of the initial values in the samplings and
to achieve a good approximation to the asymptotic limit, we shall have sequences

    { x1^{1)}, x2^{1)}, ..., xn^{1)} }, ..., { x1^{m)}, x2^{m)}, ..., xn^{m)} }
        ~ p(x1, x2, ..., xn)
Example: X  { X 1 ,..., X n } ~ Di ( x | α)
Conjugated prior for Multinomial:
p( x | α ) 
( 0 )
1 1  2 1
x1
n
 (
k
x2
 n 1
 xn
)
k 1
n
xk  [0, 1]
x
k  0

k 1
k
k
1
 0
k
 k 1
p ( x k | s k , α )  xk
(1  S k  xk )
 n 1
sk  {x1 , x2 ,..., xk 1 , xk 1 , xn1}
n
Sk   x j
j 1
jk
n5
α  1, 2, 3, 4, 5
x (0)  0.211, 0.273, 0.262, 0.101, 0.152
x
k
(0)
k
1
xk
zk 
~ Be ( z |  k ,  n )
1  Sk
Example: Dirichlet and Generalised Dirichlet, conjugate priors for the Multinomial.

X = {X1, ..., Xn} ~ Di(x | α):

    p(x | α) = D(α) x1^{α1−1} x2^{α2−1} ··· xn^{αn−1}

    D(α) = Γ(α_0) / Π_{k=1}^{n} Γ(α_k) ,   α_0 = Σ_{k=1}^{n} α_k ;
    x_k ∈ [0,1] ,  α_k > 0 ,  Σ_{k=1}^{n} x_k = 1

Degenerate distribution: x_n = 1 − Σ_{k=1}^{n−1} x_k, so

    p(x | α) = D(α) [ Π_{i=1}^{n−1} x_i^{α_i − 1} ] ( 1 − Σ_{k=1}^{n−1} x_k )^{α_n − 1}

    E[X_i] = α_i/α_0 ;      V[X_i, X_j] = ( α_i α_0 δ_ij − α_i α_j ) / ( α_0² (α_0 + 1) )

Sampling: with Z_k ~ Ga(1, α_k), k = 1, ..., n:

    X_j = Z_j / Σ_{k=1}^{n} Z_k     →     X = {X1, ..., Xn} ~ Di(x | α)

X = {X1, ..., Xn} ~ GDi(x | α, β):

    p(x | α, β) = Π_{i=1}^{n−1} [ Γ(α_i + β_i) / ( Γ(α_i) Γ(β_i) ) ]
                  x_i^{α_i − 1} ( 1 − Σ_{k=1}^{i} x_k )^{γ_i}

    γ_i = β_i − α_{i+1} − β_{i+1} for i = 1, 2, ..., n−2 ;    γ_{n−1} = β_{n−1} − 1

    0 ≤ x_i ≤ 1 ,  x_n = 1 − Σ_{k=1}^{n−1} x_k ;    α_k, β_k > 0

    E[X_i] = ( α_i/(α_i + β_i) ) S_i ;        S_i = Π_{k=1}^{i−1} β_k (α_k + β_k)^{−1}

    V[X_i] = E[X_i] ( T_i − E[X_i] ) ;
    T_i = ( (α_i + 1)/(α_i + β_i + 1) ) Π_{k=1}^{i−1} (β_k + 1)(α_k + β_k + 1)^{−1}

Sampling: with Z_k ~ Be(α_k, β_k), k = 1, ..., n−1:

    X_1 = Z_1
    X_2 | X_1 = Z_2 (1 − X_1)
    X_3 | X_1, X_2 = Z_3 (1 − X_1 − X_2)
    ...
    X_k = Z_k ( 1 − Σ_{j=1}^{k−1} X_j ) ;      X_n = 1 − Σ_{j=1}^{n−1} X_j

    →     X = {X1, ..., Xn} ~ GDi(x | α, β)
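A sketch of the two constructive samplers above (assuming NumPy; the α and β
values are illustrative, α matching the earlier Dirichlet example):

```python
# Dirichlet via normalised gammas, Generalised Dirichlet via beta stick-breaking.
import numpy as np

rng = np.random.default_rng(seed=17)
alpha = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Dirichlet: X_j = Z_j / sum(Z), Z_k ~ Ga(1, alpha_k)
z = rng.gamma(shape=alpha, size=(100_000, alpha.size))
x = z / z.sum(axis=1, keepdims=True)
print(x.mean(axis=0), alpha / alpha.sum())   # compare with E[X_i] = alpha_i/alpha_0

# Generalised Dirichlet: X_k = Z_k (1 - sum of previous), Z_k ~ Be(alpha_k, beta_k)
a, b = np.array([1.0, 2.0, 3.0]), np.array([3.0, 2.0, 1.0])
zz = rng.beta(a, b, size=(100_000, a.size))
xg = np.empty((100_000, a.size + 1))
rest = np.ones(100_000)
for k in range(a.size):
    xg[:, k] = zz[:, k] * rest               # stick-breaking step
    rest -= xg[:, k]
xg[:, -1] = rest                             # X_n = remainder
print(xg.mean(axis=0))
```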
Quantum Mechanics: Path Integral formulation of Q.M. (R.P. Feynman)

Probability amplitude (propagator) to go from (x_i, t_i) to (x_f, t_f):

    K(x_f, t_f | x_i, t_i) = ∫_{paths x(t)} e^{(i/ħ) S[x(t)]} Dx(t) ;
    S[x(t)] = ∫_{t_i}^{t_f} L(x, ẋ; t) dt

    ψ(x_f, t_f) = ∫ K(x_f, t_f | x_i, t_i) ψ(x_i, t_i) dx_i ;    t_f > t_i

Local Lagrangians (additive actions) → Chapman-Kolmogorov:

    K(x_f, t_f | x_i, t_i) = ∫ K(x_f, t_f | x, t) K(x, t | x_i, t_i) dx

Feynman-Kac expansion theorem:

    K(x_f, t_f | x_i, t_i) = Σ_n e^{−(i/ħ) E_n (t_f − t_i)} φ_n(x_f) φ_n*(x_i)

Expected value of an operator A[x]:

    ⟨A⟩ = ∫ A[x(t)] e^{(i/ħ) S[x(t)]} Dx(t) / ∫ e^{(i/ħ) S[x(t)]} Dx(t)

One-dimensional particle:

    L(x, ẋ; t) = (1/2) m ẋ(t)² − V[x(t)]

Wick rotation t → −iτ:

    (i/ħ) S[x(t)]  →  −(1/ħ) ∫ [ (1/2) m ẋ(τ)² + V[x(τ)] ] dτ

Feynman-Kac expansion theorem:

    K(x_f, τ_f | x_i, τ_i) = Σ_n e^{−E_n (τ_f − τ_i)} φ_n(x_f) φ_n(x_i)

and, taking x_f = x_i = 0 and τ_i = 0,

    K(0, τ_f | 0, 0) = Σ_n e^{−E_n τ_f} φ_n(0) φ_n(0) ≈ e^{−E_1 τ_f} φ_1(0) φ_1(0) + ···

(dominated by the lowest state for large τ_f).
Discretised time: t_n = t_0 + nε, with t_0 = t_in, t_N = t_fin, and x_0 = x(t_in),
x_N = x(t_fin) fixed:

    D[x(t)]  →  A_N Π_{j=1}^{N−1} dx_j

    S_N(x_0, x_1, ..., x_N) = ε Σ_{j=1}^{N} [ (m/2) ( (x_j − x_{j−1})/ε )² + V(x_j) ]

    K(x_f, t_f | x_i, t_i) = A_N ∫ dx_1 ··· dx_{N−1} exp{ −S_N(x_0, x_1, ..., x_N) }

    ⟨A⟩ = ∫ Π_{j=1}^{N−1} dx_j A(x_0, x_1, ..., x_N) exp{ −S_N(x_0, ..., x_N) }
        / ∫ Π_{j=1}^{N−1} dx_j exp{ −S_N(x_0, ..., x_N) }
Goal: generate N_tray trajectories as (importance sampling)

    p(x_0, x_1, ..., x_N) ∝ exp{ −S_N(x_0, x_1, ..., x_N) }

and estimate

    ⟨A⟩ ≈ (1/N_tray) Σ_{k=1}^{N_tray} A( x_0^{(k)}, x_1^{(k)}, ..., x_{N−1}^{(k)}, x_N^{(k)} )

Very complicated... → Markov Chain Monte Carlo.
Harmonic potential: V[x(t)] = (1/2) k x(t)²

    S_N(x_0, x_1, ..., x_N) = ε Σ_{j=1}^{N} [ (m/2) ( (x_j − x_{j−1})/ε )² + (1/2) k x_j² ]

Parameters:

    x_0 = x(t_in) = 0 ;   x_N = x(t_fin) = 0 ;   x_j ∈ R (here Ω_{x_j} = [−10, 10]) ;
        j = 1, ..., N−1
    ε = 0.25      (ε → 0: continuous t)
    N = 2000      (Nε → ∞: to isolate the fundamental state)

Thermalisation and correlations:

    N_term = 1000 ;    N_tray = 3000 ;    N_util = 1000
Trajectory generation:

1) Generate an initial trajectory (x_0, x_1^{(0)}, ..., x_{N−1}^{(0)}, x_N).
2) Sweep over x_1, ..., x_{N−1}: propose x'_j ~ Un(x | −10, 10) and, since
   P(x_j → x'_j) ∝ exp{ −S_N(x_0, x_1, ..., x'_j, ..., x_N) }, accept it with
   probability

       a(x_j → x'_j) = min{ 1, P(x_j → x'_j)/P(x'_j → x_j) }
                     = min{ 1, exp{ S_N(x_{j−1}, x_j, x_{j+1}) − S_N(x_{j−1}, x'_j, x_{j+1}) } }

   (Metropolis; only the local terms of the action containing x_j are needed).
3) Repeat 2) N_term = 1000 times.
4) Repeat 2) N_tray = 3000 times and take one trajectory out of 3:
   N_util = 1000 trajectories {x_0 = 0, x_1, ..., x_{N−1}, x_N = 0}.
Virial Theorem:    ⟨T⟩ = (1/2) ⟨x V'(x)⟩

Harmonic potential, V[x(t)] = (1/2) k x(t)²:

    ⟨T⟩ = ⟨V⟩ = (1/2) k ⟨x²⟩      →      E = ⟨T⟩ + ⟨V⟩ = k ⟨x²⟩
X⁴ potential:

    V[x(t)] = (a²/4) [ (x/a)² − 1 ]²

    ⟨T⟩ = (1/2) ⟨x V'(x)⟩ = (1/2) [ ⟨x⁴⟩/a² − ⟨x²⟩ ]

    E = ⟨T⟩ + ⟨V⟩ = (3/(4a²)) ⟨x⁴⟩ − ⟨x²⟩ + a²/4
Harmonic potential (k = 1):

    E_0 = ⟨x²⟩_0 = 0.486       (exact: E_0 = 0.5)

X⁴ potential (ε = 0.25, N = 9000, a = 5):

    E_0 = (3/(4a²)) ⟨x⁴⟩_0 − ⟨x²⟩_0 + a²/4 = 0.668       (exact: E_0 = 0.697)