Ch 7 Probability Theory and Stochastic Simulation

Frequentist statistics:
- Probability of observing event E: $p(E) \approx N_E / N$
- Joint probability: $p(E_1 \cap E_2) = p(E_1)\, p(E_2 \mid E_1)$
- Expectation: $E(W) = \langle W \rangle \approx \frac{1}{N_{exp}} \sum_{\nu=1}^{N_{exp}} W_\nu$

Bayes' Theorem: $p(E_1)\, p(E_2 \mid E_1) = p(E_2)\, p(E_1 \mid E_2)$
- Bayes' Theorem is general.

Definitions:
- variance: $\mathrm{var}(W) = E\big[(W - E(W))^2\big] = E(W^2) - [E(W)]^2$
  (for X, Y independent: var(X + Y) = var(X) + var(Y))
- standard deviation: $\sigma = \sqrt{\mathrm{var}(W)}$
- covariance: $\mathrm{cov}(X, Y) = E\{[X - E(X)][Y - E(Y)]\}$, for two random variables X and Y
- covariance matrix (see the multivariate Gaussian below)

Important Probability Distributions

Definitions:
- Discrete random variable
  o takes values $X_i \in \{X_1, X_2, \dots, X_M\}$
  o $N(X_i)$ = number of observations of $X_i$
  o $\sum_{j=1}^{M} N(X_j) = T$, where T is the total number of observations
  o probability is defined by $P(X_i) = N(X_i)/T$
  o normalization is defined by $\sum_{j=1}^{M} P(X_j) = 1$
- Continuous random variable
  o the continuous version of the above, defined by integrals instead of sums and differentials instead of increments
  o normalization condition: $\int_{x_{lo}}^{x_{hi}} p(x)\, dx = 1$
  o expectation: $E(x) = \langle x \rangle = \int_{x_{lo}}^{x_{hi}} x\, p(x)\, dx$
- Cumulative probability distribution
  o basis of rand in MATLAB
  o $F(x) = \int_{x_{lo}}^{x} p(x')\, dx' = u$
  o u is uniformly distributed on $0 \le u \le 1$
- Bernoulli trials
  o concept that an observed error is the net sum of many small random errors

Random Walk Problem
- key point: independence of the coin tosses
- main results: $\langle x \rangle = 0$, $\langle x^2 \rangle = n l^2$

Binomial Distribution
- probability distribution: $P(n, n_H) = \binom{n}{n_H} p_H^{n_H} (1 - p_H)^{n - n_H}$
- binomial coefficient: $\binom{n}{n_H} = \frac{n!}{n_H!\,(n - n_H)!}$
- binornd in MATLAB generates random numbers distributed according to the binomial distribution

Gaussian (Normal) Distribution
- take the binomial distribution and rewrite it as the probability of observing a net displacement x after n steps of length l:
  $p(x; n, l) = \frac{n!}{\left(\frac{n + x/l}{2}\right)!\,\left(\frac{n - x/l}{2}\right)!} \left(\frac{1}{2}\right)^n$
- evaluate in the limit $n \to \infty$: take the natural log and use Stirling's approximation
- after some algebra, Taylor expand the ln terms
- take the exponential and normalize such that $\int_{-\infty}^{\infty} P(x; n, l)\, dx = 1$:
  $P(x; \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{x^2}{2\sigma^2}\right)$, with $\sigma^2 = n l^2$
- the binomial distribution of the random walk reduces to the Gaussian distribution as $n \to \infty$
- Central Limit Theorem: for a sequence of random variables $\xi_j$ (not necessarily normally distributed) with mean $\mu_j$ and variance $\sigma_j^2$, the statistic
  o $S_n = \frac{1}{\sqrt{n}} \sum_{j=1}^{n} \frac{\xi_j - \mu_j}{\sigma_j}$
  o is normally distributed in the limit $n \to \infty$, with variance 1:
  o $P(S_n) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{S_n^2}{2}\right)$
- Non-zero mean (basis of randn):
  o $N(\mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$
- Multivariate Gaussian distribution (use of the covariance matrix)
  o covariance matrix: $[\mathrm{cov}(\nu)]_{ij} = E\{[\nu_i - E(\nu_i)][\nu_j - E(\nu_j)]\}$
  o the covariance matrix is always symmetric and positive definite
  o for independent components: $\mathrm{cov}(\nu) = \sigma^2 I$
  o $\mathrm{cov}(\nu) = \mathrm{cov}(A x) = A\,[\mathrm{cov}(x)]\,A^T$
  o if $\nu$ is a random vector and c is a constant vector: $\mathrm{var}(c \cdot \nu) = \mathrm{var}(c^T \nu) = c^T [\mathrm{cov}(\nu)]\, c = c \cdot [\mathrm{cov}(\nu)]\, c$
  o $P(\nu; \mu, \Sigma) = \frac{1}{(2\pi)^{N/2} |\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(\nu - \mu)^T \Sigma^{-1} (\nu - \mu)\right]$

Poisson Distribution
- gives the probability of the number of successes in n trials; the total number of successes is itself a random variable
- another limiting case of the binomial distribution, derived in the limit $n \to \infty$
- $P(\xi; n, p) = \frac{(pn)^\xi}{\xi!}\, e^{-pn}$
- p = probability of an individual success, n = number of trials, $\xi$ = total number of successes (each individual trial is a success/failure outcome, typically coded {1, 0})
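A minimal numerical sketch of the random-walk and binomial results above (illustrative only; step count, step length, and number of walks are assumed parameters, not from the notes). It samples the displacement via the binomial distribution (cf. binornd in MATLAB) and checks $\langle x \rangle = 0$ and $\langle x^2 \rangle = n l^2$ against the Gaussian limit:

```python
# Sketch: random-walk displacement sampled from the binomial distribution,
# compared against the Gaussian limit sigma^2 = n*l^2. Parameters are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
n, l, n_walkers = 1000, 1.0, 100_000      # steps per walk, step length, number of walks

# number of "heads" (steps to the right) is binomial(n, 1/2);
# net displacement is x = l * (n_heads - n_tails) = l * (2*n_heads - n)
n_heads = rng.binomial(n, 0.5, size=n_walkers)
x = l * (2 * n_heads - n)

print("<x>      :", x.mean())             # ~ 0
print("<x^2>    :", (x ** 2).mean())      # ~ n * l^2 = 1000
print("n * l^2  :", n * l ** 2)

# Gaussian limit: about 68% of walks should end within one sigma = sqrt(n)*l of zero
sigma = np.sqrt(n) * l
print("P(|x| < sigma) ~", np.mean(np.abs(x) < sigma))
```

Sampling the binomial count directly, rather than summing individual coin tosses, keeps the demonstration cheap while testing the same limiting behavior.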
Boltzmann/Maxwell Distributions
- $P(q) = \frac{1}{Q} \exp\left(-\frac{E(q)}{kT}\right)$
  o Q is the normalization constant
  o replacing E(q) with the kinetic energy, we arrive at the Maxwell distribution:
  o $P(\nu) \propto \exp\left(-\frac{m \nu^2}{2kT}\right)$

Brownian Dynamics and Stochastic Differential Equations
- velocity autocorrelation function
  o $C_{V_x}(t \ge 0) \approx C_{V_x}(0)\, e^{-t/\tau_{V_x}}$
  o $\langle V_x(t)\, V_x(0) \rangle = 2D\, \delta(t)$
  o $\tau_v = \frac{2 \rho R^2}{9 \mu}$
- Dirac delta function
  o $\delta(t) = \lim_{\sigma \to 0} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{t^2}{2\sigma^2}\right)$
  o $\int_{-\infty}^{\infty} f(t)\, \delta(t)\, dt = f(0)$
- Langevin equation
- Wiener process
- Stochastic differential equations
  o explicit Euler SDE method:
    $x(t + \Delta t) - x(t) = -\frac{1}{\zeta} \left.\frac{dU}{dx}\right|_{x(t)} (\Delta t) + [2D]^{1/2} (\Delta W_t)$
- Itô's stochastic calculus
  o example: Black-Scholes
  o Fokker-Planck equation
  o Einstein relation
  o Brownian motion in multiple dimensions
- MCMC
  o stat mech example
  o Metropolis recipe (pg 497)
  o example: Ising lattice
  o field theory
  o Monte Carlo integration
  o simulated annealing
  o genetic programming

Bayesian Statistics and Parameter Estimation
The goal of this material is to draw conclusions from data ("statistical inference") and estimate parameters.

Basic definitions
- predictor variables: $x = [x_1\ x_2\ x_3 \dots x_M]$
- response variables: $y = [y_1\ y_2\ y_3 \dots y_L]$
- $\theta$: model parameters
- main goal: match the model prediction to the observed response by selecting $\theta$

Single-Response Linear Regression
- set of predictor variables, known a priori for the kth experiment: $x^{[k]} = [x_1^{[k]}\ x_2^{[k]}\ x_3^{[k]} \dots x_M^{[k]}]$, with measurement $y^{[k]}$
- assume a linear model: $y^{[k]} = \beta_0 + \beta_1 x_1^{[k]} + \beta_2 x_2^{[k]} + \dots + \beta_M x_M^{[k]} + \varepsilon^{[k]}$
- the error $\varepsilon^{[k]}$ is responsible for the difference between model and observation
- define $\theta = [\beta_0\ \beta_1\ \beta_2 \dots \beta_M]^T$
- the response is: $y^{[k]} = x^{[k]} \cdot \theta^{(true)} + \varepsilon^{[k]}$
- the model prediction is: $\hat{y}^{[k]} = x^{[k]} \cdot \theta$
- define the design matrix X, which contains all information about every experiment (one row per experiment, with its predictor variables):
  $X = \begin{bmatrix} x^{[1]} \\ x^{[2]} \\ \vdots \\ x^{[N]} \end{bmatrix}$
- vector of predicted responses: $\hat{y}(\theta) = \left[\hat{y}^{[1]}(\theta)\ \ \hat{y}^{[2]}(\theta)\ \dots\ \hat{y}^{[N]}(\theta)\right]^T = X\theta$

Linear Least Squares Regression
- minimize the sum of squared errors: $S(\theta) = \sum_{k=1}^{N} \left[ y^{[k]} - \hat{y}^{[k]}(\theta) \right]^2$
- setting the first derivative to zero (with a positive second derivative) and using the expression above yields a linear system:
  $(X^T X)\, \theta_{LS} = X^T y \;\Rightarrow\; \theta_{LS} = (X^T X)^{-1} X^T y$ (review point?)
- $X^T X$ contains information about the experimental design used to probe the parameter values
- $X^T X$ is a real, symmetric matrix that is positive-semidefinite
- solve by a standard linear solve, QR decomposition, or some other method (a numerical sketch follows below)
- all of this estimates the parameters, but does not give us the accuracy of the estimates
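A minimal least-squares sketch (synthetic data and made-up "true" parameters; none of the numbers come from the notes): build the design matrix X for a single-response linear model and solve the normal equations for $\theta_{LS}$:

```python
# Sketch: single-response linear least squares via the normal equations.
import numpy as np

rng = np.random.default_rng(1)
N, M = 50, 2                                    # number of experiments, number of predictors
beta_true = np.array([1.0, 2.0, -0.5])          # [beta0, beta1, beta2], illustrative values

x_pred = rng.uniform(-1.0, 1.0, size=(N, M))    # predictor values x1^[k], x2^[k]
X = np.column_stack([np.ones(N), x_pred])       # design matrix; leading column of 1s carries beta0
y = X @ beta_true + rng.normal(0.0, 0.1, N)     # responses with Gaussian error epsilon^[k]

# normal equations: solve (X^T X) theta = X^T y rather than inverting X^T X explicitly
theta_ls = np.linalg.solve(X.T @ X, X.T @ y)

# numerically safer alternative that never forms X^T X at all
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print("theta_LS    :", theta_ls)
print("theta_lstsq :", theta_lstsq)              # both should be close to beta_true
```

Solving the linear system directly (rather than computing $(X^T X)^{-1}$) is cheaper and better conditioned, which matters when $X^T X$ is nearly singular.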
Bayesian view of statistical inference
- probability as a statement of belief (rather than a frequency of outcomes from, e.g., random number generators)

Bayesian view of single-response regression
- begin with $y^{[k]} = x^{[k]} \cdot \theta^{(true)} + \varepsilon^{[k]}$
- when we repeat this experiment multiple times, we get a vector of errors $\varepsilon$
- Gauss-Markov conditions: $E(\varepsilon^{[k]}) = 0$, $\mathrm{cov}(\varepsilon^{[k]}, \varepsilon^{[j]}) = \delta_{kj} \sigma^2$
- we also assume that the error is normally distributed
- probability of observing some response y:
  $p(y \mid \theta, \sigma) = (2\pi)^{-N/2}\, \sigma^{-N} \exp\left[-\frac{1}{2\sigma^2} S(\theta)\right]$
- we use Bayes' Theorem to get the probability of $\theta$ and $\sigma$
  o posterior density: $p(\theta, \sigma \mid y) = \frac{p(y \mid \theta, \sigma)\, p(\theta, \sigma)}{p(y)}$
  o $p(y)$ is a normalizing factor
  o in the Bayesian framework we rename $p(y \mid \theta, \sigma)$ the likelihood $l(\theta, \sigma \mid y)$
  o we want to maximize the posterior density
- non-informative priors: $p(\theta, \sigma) = p(\theta)\, p(\sigma)$, with $p(\theta) \sim c$ (constant) and $p(\sigma) \propto \sigma^{-1}$

Nonlinear least squares
- the treatment via least squares still works; we just use numerical optimization of a cost function to get there: (review point?)
  $F_{cost}(\theta) = \frac{1}{2} S(\theta) = \frac{1}{2} \sum_{k=1}^{N} \left[ y^{[k]} - f\!\left(x^{[k]}; \theta\right) \right]^2$
- use a linearized design matrix for the Hessian (the first-order approximation recovers $X^T X$); to get convergence, the approximate Hessian needs to be positive-definite
- Levenberg-Marquardt method: for ill-conditioned systems

Generating Confidence Intervals
- t-statistic
  o $t \equiv \frac{\bar{y} - \theta}{s / \sqrt{N}}$
  o $p(t \mid \nu) \propto \left(1 + \frac{t^2}{\nu}\right)^{-(\nu + 1)/2}$
  o in the limit that $\nu$ approaches infinity, the t-distribution reduces to the normal distribution
- confidence intervals for the model parameters
  o $\theta_j = \theta_{M,j} \pm T_{\nu, \alpha/2}\, s \left\{ \left[ (X^T X)^{-1} \right]_{jj} \right\}^{1/2}$
  o $\nu = N - \dim(\theta)$

MCMC in Bayesian Analysis
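As a companion to the last heading, here is a minimal random-walk Metropolis sketch (illustrative only; it is not the specific recipe on pg 497). It samples a one-parameter posterior $p(\theta \mid y) \propto \exp[-S(\theta)/(2\sigma^2)]$ with a flat prior; the synthetic data, proposal width, and chain length are all assumed values:

```python
# Sketch: random-walk Metropolis sampling of a 1-D posterior with a flat prior.
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(3.0, 0.5, size=20)         # synthetic data, "true" mean = 3.0
sigma = 0.5                               # assumed known measurement error

def log_posterior(theta):
    # flat prior => log posterior = -S(theta)/(2*sigma^2) + const
    return -np.sum((y - theta) ** 2) / (2.0 * sigma ** 2)

theta = 0.0                               # starting guess
samples = []
for _ in range(20_000):
    proposal = theta + rng.normal(0.0, 0.2)        # symmetric random-walk proposal
    log_alpha = log_posterior(proposal) - log_posterior(theta)
    if np.log(rng.uniform()) < log_alpha:          # Metropolis accept/reject step
        theta = proposal
    samples.append(theta)

chain = np.array(samples[5000:])          # discard burn-in
print("posterior mean :", chain.mean())   # should be close to y.mean()
print("posterior std  :", chain.std())    # should be close to sigma / sqrt(len(y))
```

Working with log densities avoids underflow, and because the proposal is symmetric the acceptance ratio reduces to the ratio of posterior densities, which is the defining feature of the Metropolis step.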