Ch 7 Probability Theory and Stochastic Simulation:
Frequentist statistics:
- Probability of observing E: $p(E) \approx N_E / N$
- Joint probability: $p(E_1 \cap E_2) = p(E_1)\, p(E_2 \mid E_1)$
- Expectation: $E(W) = \langle W \rangle \approx \frac{1}{N_{\mathrm{exp}}} \sum_{\nu=1}^{N_{\mathrm{exp}}} W_\nu$
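A minimal sketch of these frequency estimates in MATLAB (the event E, taken here as "a standard normal sample exceeds 1", and the number of trials are arbitrary illustrative choices):

    % Estimate p(E) and E(W) from N repeated random trials (frequentist view).
    N = 1e5;                      % number of trials
    W = randn(N,1);               % observations of the random variable W
    p_E = sum(W > 1) / N;         % p(E) ~ N_E / N
    E_W = sum(W) / N;             % E(W) ~ sample mean, same as mean(W)
    fprintf('p(E) ~ %.4f, E(W) ~ %.4f\n', p_E, E_W);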
Bayes' Theorem:
- $p(E_1)\, p(E_2 \mid E_1) = p(E_2)\, p(E_1 \mid E_2)$
- Bayes' Theorem is general.
Definitions:
- variance: $\mathrm{var}(W) = E\left[(W - E(W))^2\right] = E(W^2) - [E(W)]^2$
  o for independent X, Y: $\mathrm{var}(X + Y) = \mathrm{var}(X) + \mathrm{var}(Y)$
- standard deviation: $\sigma = \sqrt{\mathrm{var}(W)}$
- covariance: $\mathrm{cov}(X, Y) = E\{[X - E(X)][Y - E(Y)]\}$, for two random variables X and Y
- covariance matrix: the matrix of pairwise covariances of the components of a random vector (written out under the multivariate Gaussian below)
Important Probability Distributions Definitions:
- Discrete random variable
  o takes one of M values, $X_i \in \{X_1, X_2, \ldots, X_M\}$
  o $N(X_i)$ = number of observations of $X_i$
  o $\sum_{i=1}^{M} N(X_i) = T$, where T is the total number of observations
  o probability is defined by: $P(X_i) = \frac{N(X_i)}{T}$
  o normalization is defined by: $\sum_{i=1}^{M} P(X_i) = 1$
- Continuous random variable
  o the continuous version of the above, defined with integrals over a probability density p(x) instead of sums over discrete probabilities
  o normalization condition: $\int_{x_{lo}}^{x_{hi}} p(x)\, dx = 1$
  o expectation: $E(x) = \langle x \rangle = \int_{x_{lo}}^{x_{hi}} x\, p(x)\, dx$
- Cumulative probability distribution
  o basis of sampling arbitrary distributions with RAND in MATLAB
  o $F(x) = \int_{x_{lo}}^{x} p(x')\, dx' = u$
  o u is uniformly distributed, $0 \le u \le 1$; solving $F(x) = u$ for x yields samples distributed according to p(x)
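A minimal sketch of this inversion, using the exponential density $p(x) = \lambda e^{-\lambda x}$ (an arbitrary choice for illustration) because its cumulative distribution $F(x) = 1 - e^{-\lambda x}$ inverts in closed form:

    % Inverse-transform sampling: draw u ~ U(0,1), then solve F(x) = u for x.
    % For p(x) = lambda*exp(-lambda*x): F(x) = 1 - exp(-lambda*x), so x = -log(1-u)/lambda.
    lambda = 2;
    u = rand(1e5, 1);               % uniform samples from MATLAB's rand
    x = -log(1 - u) / lambda;       % samples distributed according to p(x)
    mean(x)                         % should be close to 1/lambda = 0.5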
Bernoulli trials
o Concept that observed error is the net sum of many small random
errors
Random Walk Problem
- key point: independence of coin tosses
- Main results: $\langle x \rangle = 0$, $\langle x^2 \rangle = n l^2$
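A quick numerical check of these results (the step length l, number of steps n, and number of walks are arbitrary illustrative values):

    % Unbiased 1-D random walk: n steps of length l, each +l or -l with probability 1/2.
    n = 1000; l = 1; Nwalks = 1e4;
    steps = l * sign(rand(Nwalks, n) - 0.5);   % each step is +/- l with equal probability
    x = sum(steps, 2);                         % net displacement of each walk
    mean(x)                                    % ~ 0
    mean(x.^2)                                 % ~ n*l^2 = 1000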
Binomial Distribution
- probability distribution: $P(n, n_H) = \binom{n}{n_H} p_H^{\,n_H} (1 - p_H)^{\,n - n_H}$
- binomial coefficient: $\binom{n}{n_H} = \frac{n!}{n_H!\,(n - n_H)!}$
- BINORND in MATLAB generates random numbers distributed according to the binomial distribution
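For example (binornd is a Statistics Toolbox function; the parameter values are arbitrary):

    % Draw samples of n_H, the number of heads in n tosses with P(heads) = p_H.
    n = 20; pH = 0.3;
    nH = binornd(n, pH, 1e4, 1);   % 10^4 binomially distributed random numbers
    mean(nH)                       % ~ n*pH = 6
    var(nH)                        % ~ n*pH*(1 - pH) = 4.2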
Gaussian (Normal) Distribution
- Take the binomial distribution and convert it into the probability of observing a net displacement x after n steps of length l:
  o $p(x; n, l) = \frac{n!}{\left(\frac{n + x/l}{2}\right)!\, \left(\frac{n - x/l}{2}\right)!} \left(\frac{1}{2}\right)^{n}$
- Evaluate in the limit $n \to \infty$: take the natural log and use Stirling's approximation
- After some algebra, Taylor expand the ln terms
- Take the exponential and normalize such that $\int_{-\infty}^{\infty} P(x; n, l)\, dx = 1$:
  o $P(x; \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \exp\!\left(-\frac{x^2}{2\sigma^2}\right)$, with $\sigma^2 = n l^2$
- The binomial distribution of the random walk reduces to the Gaussian distribution as $n \to \infty$
- Central Limit Theorem: for a sequence of random variables $\xi_j$, not necessarily normally distributed, with means $\mu_j$ and standard deviations $\sigma_j$, the statistic
  o $S_n = \frac{1}{\sqrt{n}} \sum_{j=1}^{n} \frac{\xi_j - \mu_j}{\sigma_j}$
  is normally distributed in the limit $n \to \infty$, with variance 1:
  o $P(S_n) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{S_n^2}{2}\right)$
- Non-zero mean (basis of randn):
  o $N(\mu, \sigma^2) = \frac{1}{\sigma \sqrt{2\pi}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$
Multivariate Gaussian Distribution (use of covariance matrix)
o Covariance matrix: $[\mathrm{cov}(\nu)]_{ij} = E\{[\nu_i - E(\nu_i)][\nu_j - E(\nu_j)]\}$
o The covariance matrix is always symmetric and positive semidefinite (positive definite whenever $\Sigma^{-1}$ exists, as the density below requires)
o For independent, identically distributed components: $\mathrm{cov}(\nu) = \sigma^2 I$
o For a linear transformation $\nu = A x$: $\mathrm{cov}(\nu) = \mathrm{cov}(A x) = A\, [\mathrm{cov}(x)]\, A^T$
o If $\nu$ is a random vector and c is a constant vector: $\mathrm{var}(c \cdot \nu) = \mathrm{var}(c^T \nu) = c^T [\mathrm{cov}(\nu)]\, c = c \cdot [\mathrm{cov}(\nu)] c$
o $P(\nu; \mu, \Sigma) = \frac{1}{(2\pi)^{N/2} |\Sigma|^{1/2}} \exp\!\left[-\frac{1}{2} (\nu - \mu)^T \Sigma^{-1} (\nu - \mu)\right]$
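Samples from this distribution are commonly generated with a Cholesky factor of Σ and randn, which follows from the $\mathrm{cov}(Ax) = A\,\mathrm{cov}(x)\,A^T$ rule above; a sketch with an arbitrary 2-D μ and Σ chosen for illustration:

    % Sample nu ~ N(mu, Sigma) via nu = mu + L*z, where Sigma = L*L' and z ~ N(0, I).
    mu    = [1; 2];
    Sigma = [2.0 0.6; 0.6 1.0];
    L  = chol(Sigma, 'lower');                 % Sigma = L*L'
    z  = randn(2, 1e4);                        % independent standard normal components
    nu = repmat(mu, 1, 1e4) + L*z;             % each column is one sample
    mean(nu, 2)                                % ~ mu
    cov(nu')                                   % ~ Sigma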
Poisson Distribution
- The Poisson distribution gives the probability of the total number of successes in n trials; it is another limiting case of the binomial distribution, derived in the limit $n \to \infty$ with the mean number of successes pn held fixed
- The total number of successes is itself a random variable, $\xi$
- $P(\xi; n, p) = \frac{(pn)^{\xi}}{\xi!}\, e^{-pn}$
  o p = probability of an individual success
  o n = number of trials
  o $\xi$ = total number of successes observed, $\xi = 0, 1, 2, \ldots$
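A quick check of this limiting behavior (binopdf and poisspdf are Statistics Toolbox functions; the parameter values are arbitrary):

    % Compare binomial(n, p) with Poisson(lambda = n*p) for large n and small p.
    n = 1000; p = 0.005; lambda = n*p;
    xi = 0:15;
    P_binom   = binopdf(xi, n, p);        % exact binomial probabilities
    P_poisson = poisspdf(xi, lambda);     % Poisson limit
    max(abs(P_binom - P_poisson))         % small: the two distributions nearly coincide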
Boltzmann/Maxwell Distributions
o $P(q) = \frac{1}{Q} \exp\!\left(-\frac{E(q)}{kT}\right)$
o Q is the normalization constant
o Replacing E(q) with the kinetic energy, we arrive at the Maxwell distribution:
o $P(\nu) \propto \exp\!\left(-\frac{m \nu^2}{2kT}\right)$
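Since the Maxwell distribution for a single velocity component is just a Gaussian with variance kT/m, sampling it only needs randn; a sketch with illustrative values of m and T:

    % Sample one Cartesian velocity component from the Maxwell distribution.
    kB = 1.380649e-23;                  % Boltzmann constant, J/K
    T  = 300;                           % temperature, K
    m  = 6.63e-26;                      % particle mass, kg (roughly an argon atom)
    v  = sqrt(kB*T/m) * randn(1e5, 1);  % v ~ N(0, kT/m)
    var(v)                              % ~ kB*T/m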
Brownian Dynamics and Stochastic Differential Equations
- velocity autocorrelation function:
  o $C_{V_x}(t \ge 0) \approx C_{V_x}(0)\, e^{-t/\tau_{V_x}}$
  o relaxation time: $\tau_{V_x} = \frac{2 \rho R^2}{9 \mu}$
  o $\langle V_x(t)\, V_x(0) \rangle = 2 D\, \delta(t)$
- Dirac delta function:
  o $\delta(t) = \lim_{\sigma \to 0} \frac{1}{\sigma \sqrt{2\pi}} \exp\!\left(-\frac{t^2}{2\sigma^2}\right)$
  o $\int_{-\infty}^{\infty} f(t)\, \delta(t)\, dt = f(0)$
- Langevin equation
- Wiener process
- Stochastic differential equations
  o Explicit Euler SDE method:
    $x(t + \Delta t) - x(t) = -\frac{1}{\zeta} \left.\frac{dU}{dx}\right|_{x(t)} (\Delta t) + [2D]^{1/2} (\Delta W_t)$
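A minimal sketch of this update for a particle in a harmonic well $U(x) = \frac{1}{2} k x^2$ (the potential, friction coefficient ζ, and diffusivity D are illustrative choices; D = kT/ζ would follow from the Einstein relation listed below):

    % Explicit Euler (Euler-Maruyama) integration of dx = -(1/zeta)*(dU/dx)*dt + sqrt(2*D)*dW.
    zeta = 1; D = 1; k = 2;            % friction coefficient, diffusivity, spring constant
    dUdx = @(x) k*x;                   % dU/dx for U(x) = 0.5*k*x^2
    dt = 1e-3; nsteps = 1e5;
    x = zeros(nsteps, 1);
    for j = 1:nsteps-1
        dW = sqrt(dt)*randn;           % Wiener increment, distributed N(0, dt)
        x(j+1) = x(j) - (1/zeta)*dUdx(x(j))*dt + sqrt(2*D)*dW;
    end
    var(x(round(end/2):end))           % ~ D*zeta/k at equilibrium (kT/k if D = kT/zeta)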
Ito’s Stochastic Calculus
o Example: Black-Scholes
o Fokker-Planck
o Einstein Relation
o Brownian motion in multiple dimensions
MCMC
o Stat Mech example
o Metropolis recipe (pg 497; a minimal sketch follows this list)
o Example: Ising Lattice
o Field theory
o Monte Carlo Integration
o Simulated annealing
o Genetic Programming
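A minimal sketch of the Metropolis recipe referenced above, here sampling a 1-D standard normal target (the target density and step size are arbitrary illustrative choices, not the Ising or statistical-mechanics examples):

    % Metropolis MCMC: propose a random move, accept with probability min(1, p_new/p_old).
    logp = @(x) -0.5*x.^2;               % log of the (unnormalized) target density
    nsteps = 1e5; dxmax = 1.0;
    x = zeros(nsteps, 1);
    for j = 1:nsteps-1
        xtrial = x(j) + dxmax*(2*rand - 1);          % symmetric random proposal
        if log(rand) < logp(xtrial) - logp(x(j))     % Metropolis acceptance test
            x(j+1) = xtrial;
        else
            x(j+1) = x(j);                           % reject: keep the current state
        end
    end
    mean(x), var(x)                      % ~ 0 and ~ 1 for the standard normal target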
Bayesian Statistics and Parameter Estimation
The goal of this material is to draw conclusions from data ("statistical inference") and to estimate model parameters. Basic definitions:
- Predictor variables: $x = [x_1\; x_2\; x_3 \ldots x_M]$
- Response variables: $y = [y_1\; y_2\; y_3 \ldots y_L]$
- θ: model parameters
- Main goal: match the model predictions to the observed responses by selecting θ.
Single-Response Linear Regression
- set of predictor variables, known a priori, for the kth experiment: $x^{[k]} = [x_1^{[k]}\; x_2^{[k]}\; x_3^{[k]} \ldots x_M^{[k]}]$
- measured response $y^{[k]}$
- assume a linear model: $y^{[k]} = \beta_0 + \beta_1 x_1^{[k]} + \beta_2 x_2^{[k]} + \ldots + \beta_M x_M^{[k]} + \varepsilon^{[k]}$
- the random error $\varepsilon^{[k]}$ accounts for the difference between the model and the observed response
- define $\theta = [\beta_0\; \beta_1\; \beta_2 \ldots \beta_M]^T$
- the response is: $y^{[k]} = x^{[k]} \cdot \theta^{(\mathrm{true})} + \varepsilon^{[k]}$ (with a leading 1 included in $x^{[k]}$ so the dot product picks up $\beta_0$)
- the model prediction is: $\hat{y}^{[k]} = x^{[k]} \cdot \theta$
- define the design matrix X, which contains all the information about every experiment (each row holds the predictor variables of one experiment):
  o $X = \begin{bmatrix} x^{[1]} \\ x^{[2]} \\ \vdots \\ x^{[N]} \end{bmatrix}$
- vector of predicted responses:
  o $\hat{y}(\theta) = \begin{bmatrix} \hat{y}^{[1]}(\theta) \\ \hat{y}^{[2]}(\theta) \\ \vdots \\ \hat{y}^{[N]}(\theta) \end{bmatrix} = X \theta$
Linear Least Squares Regression
- minimize the sum of squared errors: $S(\theta) = \sum_{k=1}^{N} \left[ y^{[k]} - \hat{y}^{[k]}(\theta) \right]^2$
- Setting the first derivative to zero (with a positive-definite second derivative, so the stationary point is a minimum) gives a linear system:
- $(X^T X)\, \theta_{LS} = X^T y \;\Rightarrow\; \theta_{LS} = (X^T X)^{-1} X^T y$ (review point?)
- $X^T X$ contains the information about the experimental design used to probe the parameter values
- $X^T X$ is a real, symmetric matrix that is positive semidefinite
- Solve with a standard linear solver, a QR decomposition, or some other method
- All this estimates the parameters, but does not give us the accuracy of our estimates
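A small worked example of the normal equations, with synthetic data generated from an assumed "true" θ purely for illustration (in MATLAB the backslash operator solves the same least-squares problem via QR without forming $X^T X$ explicitly):

    % Single-response linear least squares: y = X*theta_true + noise.
    N  = 50;
    x1 = linspace(0, 1, N)';
    X  = [ones(N,1) x1];                  % design matrix: leading column of 1s for beta_0
    theta_true = [2; -3];
    y  = X*theta_true + 0.1*randn(N,1);   % synthetic measurements
    theta_LS = (X'*X) \ (X'*y);           % normal equations (X'X) theta_LS = X'y
    theta_QR = X \ y;                     % equivalent solution via QR decomposition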
Bayesian view of statistical inference
- Probability is interpreted as a statement of belief, rather than only as a frequency of outcomes over repeated trials (e.g., draws from a random number generator)
Bayesian view of single-response regression
- Begin with $y^{[k]} = x^{[k]} \cdot \theta^{(\mathrm{true})} + \varepsilon^{[k]}$
- Repeating the experiment N times gives a vector of errors $\varepsilon$
- Gauss-Markov conditions: $E(\varepsilon^{[k]}) = 0$, $\mathrm{cov}(\varepsilon^{[k]}, \varepsilon^{[j]}) = \delta_{kj}\, \sigma^2$
- We also assume that the errors are normally distributed
- Probability of observing some response vector y:
  o $p(y \mid \theta, \sigma) = \left(\frac{1}{\sqrt{2\pi}}\right)^{N} \sigma^{-N} \exp\!\left[-\frac{1}{2\sigma^2} S(\theta)\right]$
- We use Bayes' Theorem to get the probability of θ and σ given the data:
  o posterior density: $p(\theta, \sigma \mid y) = \frac{p(y \mid \theta, \sigma)\, p(\theta, \sigma)}{p(y)}$
  o p(y) is a normalizing factor
  o we rewrite $p(y \mid \theta, \sigma)$ as the likelihood $l(\theta, \sigma \mid y)$
- In the Bayesian framework we want to maximize the posterior density
- Non-informative priors: $p(\theta, \sigma) = p(\theta)\, p(\sigma)$, with $p(\theta) \sim c$ (a constant) and $p(\sigma) \propto \sigma^{-1}$
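A sketch showing that, with these non-informative priors, maximizing the posterior over θ recovers the least-squares estimate (X, y, and N are assumed to come from the synthetic least-squares example above; fminsearch is base MATLAB):

    % Negative log posterior (up to an additive constant):
    %   -ln p(theta, sigma | y) = (N+1)*ln(sigma) + S(theta)/(2*sigma^2) + const
    S = @(theta) sum((y - X*theta).^2);
    neglogpost = @(p) (N+1)*log(p(3)) + S(p(1:2))/(2*p(3)^2);   % p = [theta; sigma]
    p_MAP = fminsearch(neglogpost, [0; 0; 1]);
    p_MAP(1:2)        % ~ theta_LS: the MAP estimate of theta matches least squares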
Nonlinear least squares
- The treatment via least squares still works; we just use numerical optimization of a cost function to find the parameters: (review point?)
  o $F_{\mathrm{cost}}(\theta) = \frac{1}{2} S(\theta) = \frac{1}{2} \sum_{k=1}^{N} \left[ y^{[k]} - f\!\left(x^{[k]}; \theta\right) \right]^2$
- use of a linearized design matrix
- Hessians (the first-order approximation recovers $X^T X$). Remember, to get convergence the approximate Hessian needs to be positive definite.
- Levenberg-Marquardt method: for ill-conditioned systems
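A minimal sketch, fitting an illustrative exponential-decay model $f(x; \theta) = \theta_1 e^{-\theta_2 x}$ by direct minimization of $F_{\mathrm{cost}}$ with base-MATLAB fminsearch (a dedicated routine such as lsqnonlin, if available, instead exploits the least-squares structure through the linearized design matrix):

    % Nonlinear least squares by minimizing the cost function numerically.
    xk = linspace(0, 2, 30)';
    theta_true = [3; 1.5];
    yk = theta_true(1)*exp(-theta_true(2)*xk) + 0.05*randn(size(xk));   % synthetic data
    f     = @(x, theta) theta(1)*exp(-theta(2)*x);          % model prediction f(x; theta)
    Fcost = @(theta) 0.5*sum((yk - f(xk, theta)).^2);       % F_cost(theta) = S(theta)/2
    theta_fit = fminsearch(Fcost, [1; 1])                   % ~ theta_true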
Generating Confidence Intervals
- t-statistic:
  o $t \equiv \frac{\bar{y} - \theta}{s / \sqrt{N}}$
  o $p(t \mid \nu) \propto \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu + 1}{2}}$
  o in the limit that ν approaches infinity, the t-distribution reduces to the normal distribution
- confidence intervals for model parameters:
  o $\theta_j = \theta_{M,j} \pm T_{\nu, \alpha/2}\; s\, \left\{ \left[ (X^T X)^{-1} \right]_{jj} \right\}^{1/2}$, with $X^T X$ evaluated at $\theta_M$
  o $\nu = N - \dim(\theta)$
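Continuing the linear least-squares example above (tinv is a Statistics Toolbox function; for a linear model the design matrix X needs no linearization around θ_M):

    % 95% confidence intervals for the fitted parameters theta_LS.
    alpha = 0.05;
    nu = N - numel(theta_LS);                       % degrees of freedom
    s2 = sum((y - X*theta_LS).^2) / nu;             % estimate of sigma^2
    C  = inv(X'*X);                                 % scaled parameter covariance factor
    halfwidth = tinv(1 - alpha/2, nu) * sqrt(s2 * diag(C));
    CI = [theta_LS - halfwidth, theta_LS + halfwidth]   % one row per parameter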
MCMC in Bayesian Analysis