1.017/1.010 Class 11
Multivariate Probability

Multiple Random Variables

Recall the dart-tossing experiment from Class 4. Treat the two dart coordinates as two different scalar random variables x and y. In this experiment the outcome is the location where the dart lands. The random variables x and y both depend on this outcome (they are defined over the same sample space). In this case we can define the following events:

A = [x(\xi) \le x]
B = [y(\xi) \le y]
C = [x(\xi) \le x,\; y(\xi) \le y] = A \cap B = AB

x and y are independent if A and B are independent events for all x and y:

P(C) = P(AB) = P(A)P(B)

Another example: consider a time series constructed from a sequence of random variables defined at different times (a series of n seismic observations or stream flows x_1, x_2, x_3, \dots, x_n). Each possible time series can be viewed as an outcome \xi of an underlying experiment. Events can be defined as above:

A_i = [x_i(\xi) \le x_i]
A_{ij} = [x_i(\xi) \le x_i,\; x_j(\xi) \le x_j] = A_i \cap A_j = A_i A_j

x_i and x_j are independent if:

P(A_{ij}) = P(A_i A_j) = P(A_i) P(A_j)

Multivariate Probability Distributions

Multivariate cumulative distribution function (CDF), for x, y continuous or discrete:

F_{xy}(x, y) = P[(x(\xi) \le x)(y(\xi) \le y)]

Multivariate probability mass function (PMF), for x, y discrete:

p_{xy}(x_i, y_j) = P[(x(\xi) = x_i)(y(\xi) = y_j)]

Multivariate probability density function (PDF), for x, y continuous:

f_{xy}(x, y) = \frac{\partial^2 F_{xy}(x, y)}{\partial x \, \partial y}

If x and y are independent:

F_{xy}(x, y) = P[x(\xi) \le x] \, P[y(\xi) \le y] = F_x(x) F_y(y)
p_{xy}(x_i, y_j) = p_x(x_i) \, p_y(y_j)
f_{xy}(x, y) = f_x(x) \, f_y(y)

Computing Probabilities from Multivariate Density Functions

Probability that (x, y) lies in the region D:

P[(x, y) \in D] = \iint_{(x, y) \in D} f_{xy}(x, y) \, dx \, dy

Covariance and Correlation

Dependence between random variables x and y is frequently described with the covariance and correlation:

\mathrm{Cov}(x, y) = E[(x - \bar{x})(y - \bar{y})] = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} (x - \bar{x})(y - \bar{y}) \, f_{xy}(x, y) \, dx \, dy

\mathrm{Correl}(x, y) = \frac{\mathrm{Cov}(x, y)}{\mathrm{Std}(x)\,\mathrm{Std}(y)} = \frac{\mathrm{Cov}(x, y)}{[\mathrm{Var}(x)\,\mathrm{Var}(y)]^{1/2}}

Uncorrelated x and y: Cov(x, y) = Correl(x, y) = 0

Independence implies that x and y are uncorrelated (but not necessarily vice versa).

Examples

Two independent exponential random variables (parameters a_x and a_y), for x, y \ge 0:

f_{xy}(x, y) = f_x(x) f_y(y) = \frac{1}{a_x} \exp\!\left(-\frac{x}{a_x}\right) \frac{1}{a_y} \exp\!\left(-\frac{y}{a_y}\right) = \frac{1}{a_x a_y} \exp\!\left(-\frac{x}{a_x} - \frac{y}{a_y}\right)

a_x = E(x), a_y = E(y), Correl(x, y) = 0

Two dependent normally distributed random variables (parameters \mu_x, \mu_y, \sigma_x, \sigma_y, and \rho):

f_{xy}(x, y) = \frac{1}{2\pi |C|^{1/2}} \exp\!\left[-\frac{(z - \mu)' C^{-1} (z - \mu)}{2}\right]

z = vector of random variables = [x \;\; y]'
\mu = vector of means = [E(x) \;\; E(y)]'
C = covariance matrix = \begin{bmatrix} \sigma_x^2 & \rho\sigma_x\sigma_y \\ \rho\sigma_x\sigma_y & \sigma_y^2 \end{bmatrix}
\sigma_x = Std(x), \sigma_y = Std(y), \rho = Correl(x, y)
|C| = determinant of C = \sigma_x^2 \sigma_y^2 (1 - \rho^2)
C^{-1} = inverse of C = \frac{1}{|C|} \begin{bmatrix} \sigma_y^2 & -\rho\sigma_x\sigma_y \\ -\rho\sigma_x\sigma_y & \sigma_x^2 \end{bmatrix}

Multivariate probability distributions are rarely used except when:
1. The random variables are independent
2. The random variables are dependent but normally distributed

Exercise:
Use the MATLAB function mvnrnd to generate scatterplots of correlated bivariate normal samples. This function takes as arguments the means of x and y and the covariance matrix defined above (called SIGMA in the MATLAB documentation). Assume E[x] = 0, E[y] = 0, \sigma_x = 1, \sigma_y = 1. Use mvnrnd to generate 100 (x, y) realizations. Use plot to plot each of these as a point on the (x, y) plane (do not connect the points). Vary the correlation coefficient \rho to examine its effect on the scatter. Consider \rho = 0, 0.5, 0.9. Use subplot to put plots for all 3 \rho values on one page.
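A minimal MATLAB sketch of one way to carry out this exercise is given below. It assumes the Statistics Toolbox (which provides mvnrnd); the variable names, axis limits, and plot marker are illustrative choices rather than part of the exercise statement.

% Sketch of the mvnrnd exercise (assumes the Statistics Toolbox is installed).
% Parameter values follow the exercise statement; variable names, axis
% limits, and the marker style are illustrative choices.

mu    = [0 0];           % [E(x) E(y)]
sx    = 1;               % Std(x)
sy    = 1;               % Std(y)
rhos  = [0 0.5 0.9];     % correlation coefficients to compare
nreal = 100;             % number of (x, y) realizations

figure
for k = 1:length(rhos)
    rho = rhos(k);
    % Covariance matrix C defined above (called SIGMA in the MATLAB docs)
    C = [sx^2       rho*sx*sy;
         rho*sx*sy  sy^2];
    Z = mvnrnd(mu, C, nreal);    % nreal-by-2 matrix of [x y] samples

    subplot(1, 3, k)
    plot(Z(:,1), Z(:,2), '.')    % plot points only; do not connect them
    axis([-4 4 -4 4]), axis square
    xlabel('x'), ylabel('y')
    title(['\rho = ' num2str(rho)])
end

As a check, corrcoef(Z) gives the sample correlation matrix of each set of realizations; it should be close to the specified \rho, although sampling variability is noticeable with only 100 realizations.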
Copyright 2003 Massachusetts Institute of Technology. Last modified Oct. 8, 2003.