Random Variable and Probability Distribution

Outline of Lecture
Random Variable
– Discrete Random Variable.
– Continuous Random Variables.
Probability Distribution Function.
– Discrete and Continuous.
– PDF and PMF.
Expectation of Random Variables.
Propagation through Linear and Nonlinear Models.
Multivariate Probability Density Functions.
Some Important Probability Distribution Functions.

Random Variables
A random variable is a function that associates a numerical value with each outcome of an experiment.
– Function values are real numbers and depend on "chance".
The function that assigns a value to each outcome is fixed and deterministic.
– The randomness comes from the underlying randomness of the argument of the function X, i.e. of the outcome of the experiment.
– If we roll a pair of dice, then the sum of the two face values is a random variable.
Random variables can be discrete or continuous.
– Discrete: countable range.
– Continuous: uncountable range.

Discrete Random Variables
A random variable X and the corresponding distribution are said to be discrete if the number of values for which X has non-zero probability is finite (or countably infinite).
Probability Mass Function of X:
$$f(x) = \begin{cases} p_j & \text{when } x = x_j \\ 0 & \text{otherwise} \end{cases}$$
Probability Distribution Function of X:
$$F(x) = P(X \le x)$$
Properties of the Distribution Function:
– Monotonically increasing (non-decreasing).
– Right continuous.
– $0 \le F(x) \le 1$.
– $P(a < X \le b) = F(b) - F(a)$.

Examples
Let X denote the number of heads when a biased coin with probability of heads p is tossed twice.
– X can take the values 0, 1 or 2, with $P(X=0) = (1-p)^2$, $P(X=1) = 2p(1-p)$ and $P(X=2) = p^2$.
$$F(x) = \begin{cases} 0 & x < 0 \\ (1-p)^2 & 0 \le x < 1 \\ (1-p)^2 + 2p(1-p) = 1 - p^2 & 1 \le x < 2 \\ 1 & x \ge 2 \end{cases}$$
Let X denote the random variable equal to the sum of two fair dice.
– X can take any integer value between 2 and 12.

Continuous Random Variables and Distributions
X is a continuous random variable if there exists a non-negative function f(x), defined on the real line, having the property that
$$P(X \le x) = F(x) = \int_{-\infty}^{x} f(y)\,dy, \qquad F'(x) = f(x).$$
The integrand f(y) is called a probability density function.
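A minimal Python sketch of the two discrete examples, assuming exact rational arithmetic via `fractions.Fraction`. It builds the probability mass function for the number of heads in two tosses of a biased coin and for the sum of two fair dice, then accumulates a PMF into the distribution function F(x) = P(X ≤ x). The helper names `coin_pmf`, `dice_sum_pmf` and `cdf` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def coin_pmf(p):
    """PMF of X = number of heads in two tosses of a coin with P(head) = p."""
    q = 1 - p
    return {0: q * q, 1: 2 * p * q, 2: p * p}


def dice_sum_pmf():
    """PMF of X = sum of the face values of two fair six-sided dice."""
    pmf = {}
    for a, b in product(range(1, 7), repeat=2):
        pmf[a + b] = pmf.get(a + b, 0) + Fraction(1, 36)
    return pmf


def cdf(pmf, x):
    """Distribution function F(x) = P(X <= x): add up the PMF over support points <= x."""
    return sum((prob for value, prob in pmf.items() if value <= x), 0)


if __name__ == "__main__":
    p = Fraction(1, 3)
    coin = coin_pmf(p)
    for x in (-1, 0, 0.5, 1, 1.5, 2, 3):
        print(f"F({x}) = {cdf(coin, x)}")
    dice = dice_sum_pmf()
    # Support runs from 2 to 12 and the probabilities add up to 1.
    print(min(dice), max(dice), sum(dice.values()))
```

With p = 1/3 the distribution function is right continuous and steps up to 4/9 = (1 − p)² at x = 0, to 8/9 = 1 − p² at x = 1, and to 1 at x = 2.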
Properties: f ( x)dx 1 b P (a X b) F (b) F (a ) f ( x)dx a 6 Continuous Random Variables and Distributions Probability that a continuous random variable will assume any particular value is zero. a P( X a) f ( x)dx 0 a It does not mean that event will never occur. – Occur infrequently and its relative frequency will converge to zero. – f(a) large Probability mass is very dense. – f(a) small Probability mass is not very dense. f(a) is the measure of how likely it is that random variable will be a near a. P(a X a ) f ( x)dx 2 f (a) a 7 Difference Between PDF and PMF Probability density function does not defines a probability but probability density. – To obtain the probability we must integrate it in an interval. Probability mass function gives the true probability. – It does not need to be integrate to obtain the probability. a b Probability distribution function is either continuous or has a jump discontinuity. 1) P(a X b) 3) P(a X b) 2) P(a X b) 4) P(a X b) – Are they equal? 8 Statistical Characterization of Random Variables Recall, a random number denote the numerical attribute assigned to an outcome of an experiment. We can not be certain which value of X will be observed on a particular trial. Will average of all the values will be same for two different set of trials? x1 x2 x n xn y1 y2 y n yn Recall, probability approx. equal to relative frequency. – Approx. Np1 number of xi’s have value u1 x1 x2 x n xn np1u1 npmum n ui pi 9 Statistical Characterization of Random Variables Expected Value: –The expected value of a discrete random variable, x is found by multiplying each value of random variable by its probability and then summing over all values of x. Expected value of x: E[ x] xP( x) xf ( x) x x – Expected value is equivalent to center of mass concept. r mi ri mi – That’s why name first moment also. – Body is perfectly balanced abt. 
The expected value of x is the "balancing point" for the probability mass function of x.
– The expected value equals the point of symmetry when the pmf/pdf is symmetric.

Statistical Characterization of Random Variables
Law of the Unconscious Statistician (LOTUS): we can take the expectation of any function of a random variable. The expected value of y = g(x) is
$$E[y] = \sum_y y\,f_y(y) = \sum_x g(x)\,f_x(x)$$
This balance point is the value expected for g(x) over all possible repetitions of the experiment involving the random variable x.
The expected value of a continuous random variable with density function f(x) is given by
$$E(x) = \int_{-\infty}^{\infty} x\,f(x)\,dx$$

Example
Suppose we have agreed to pay $1 for each dot showing when a pair of dice is thrown. How much would we lose on average?

| Value of x | Frequency | Probability function P(x) | Distribution function P(X ≤ x) |
|---|---|---|---|
| 2 | 1 | 1/36 | 1/36 |
| 3 | 2 | 2/36 | 3/36 |
| 4 | 3 | 3/36 | 6/36 |
| 5 | 4 | 4/36 | 10/36 |
| 6 | 5 | 5/36 | 15/36 |
| 7 | 6 | 6/36 | 21/36 |
| 8 | 5 | 5/36 | 26/36 |
| 9 | 4 | 4/36 | 30/36 |
| 10 | 3 | 3/36 | 33/36 |
| 11 | 2 | 2/36 | 35/36 |
| 12 | 1 | 1/36 | 1 |
| Sum | 36 | 1.00 | |

Average amount we pay = ($2×1 + $3×2 + … + $12×1)/36 = $7
E(x) = $2(1/36) + $3(2/36) + … + $12(1/36) = $7

Example (Continued)
Suppose instead we had agreed to pay an amount equal to the square of the sum of the dots showing on a throw of the dice.
– What would the average loss be this time? Will it be ($7)² = $49.00?
Actually, we are now interested in calculating E[x²].
– E[x²] = ($2)²(1/36) + … + ($12)²(1/36) = $54.83 ≠ $49
– This result also emphasizes that (E[x])² ≠ E[x²].
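The two dice-payment calculations are a direct application of LOTUS, E[g(x)] = Σ g(x) P(x). A small exact-arithmetic sketch (function names hypothetical) builds P(x) for the sum of two fair dice and evaluates both payments.

```python
from fractions import Fraction
from itertools import product


def dice_sum_distribution():
    """P(x) for x = sum of dots on two fair dice (x = 2, ..., 12)."""
    dist = {}
    for a, b in product(range(1, 7), repeat=2):
        dist[a + b] = dist.get(a + b, Fraction(0)) + Fraction(1, 36)
    return dist


def expectation(dist, g=lambda x: x):
    """LOTUS: E[g(x)] = sum over x of g(x) * P(x)."""
    return sum(g(x) * p for x, p in dist.items())


if __name__ == "__main__":
    dist = dice_sum_distribution()
    mean = expectation(dist)                       # pay $x
    mean_sq = expectation(dist, lambda x: x * x)   # pay $x^2
    print("E[x]     =", mean)
    print("E[x^2]   =", mean_sq, "=", float(mean_sq))
    print("(E[x])^2 =", mean * mean)
```

E[x] comes out as exactly 7 and E[x²] as 329/6 ≈ 54.83, while (E[x])² = 49, so the squared payment cannot be obtained by squaring the average payment.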
Expectation Rules
Rule 1: E[k] = k, where k is a constant.
Rule 2: E[kx] = kE[x].
Rule 3: E[x ± y] = E[x] ± E[y].
Rule 4: If x and y are independent, E[xy] = E[x]E[y].
Rule 5: V[k] = 0, where k is a constant.
Rule 6: V[kx] = k²V[x].

Variance of Random Variable
The variance of a random variable x is defined as
$$V(x) = \sigma^2 = E[(x - \mu)^2]$$
Expanding the square, with $\mu = E[x]$:
$$V(x) = E[x^2 - 2\mu x + \mu^2] = E[x^2] - 2(E[x])^2 + (E[x])^2 = E[x^2] - (E[x])^2$$
This result is also known as the "Parallel Axis Theorem".

Propagation of Moments and Density Function through Linear Models
y = ax + b
– Given: μ = E[x] and σ² = V[x].
– To find: E[y] and V[y].
E[y] = E[ax] + E[b] = aE[x] + b = aμ + b
V[y] = V[ax] + V[b] = a²V[x] + 0 = a²σ²
Let us define
$$z = \frac{x - \mu}{\sigma}$$
Here a = 1/σ and b = −μ/σ. Therefore E[z] = 0 and V[z] = 1.
z is generally known as the "standardized variable".

Propagation of Moments and Density Function through Nonlinear Models
If x is a random variable with probability density function p(x) and y = f(x) is a one-to-one transformation that is differentiable for all x, then the probability density function of y is given by
– $p(y) = p(x)\,|J|^{-1}$ for all x given by $x = f^{-1}(y)$,
– where |J| is the absolute value of the determinant of the Jacobian matrix ∂y/∂x.
Example: Let
$$y = a x^2 \;(a > 0), \qquad p(x) = \frac{1}{\sigma_x\sqrt{2\pi}} \exp\left(-\frac{x^2}{2\sigma_x^2}\right)$$
NOTE: for each value of y there are two values of x, $x = \pm\sqrt{y/a}$, so the contributions of both roots are added:
$$p(y) = \frac{1}{\sigma_x\sqrt{2\pi a y}} \exp\left(-\frac{y}{2a\sigma_x^2}\right), \quad y > 0,$$
and p(y) = 0 otherwise.
We can also show that
$$E(y) = a\sigma_x^2 \quad\text{and}\quad V(y) = 2a^2\sigma_x^4$$

Random Variables
A single random variable describes one physical phenomenon.
– Example: a web server.
A random vector is just an extension of a random variable.
– A vector random variable X is a function that assigns a vector of real numbers to each outcome in the sample space.
– e.g. sample space = set of people.
– Random vector = [X = weight, Y = height of a person].
A random point (X, Y) carries more information than X or Y alone.
– It describes the joint behavior of X and Y.
The joint probability distribution function:
$$F(x, y) = P(\{X \le x\} \cap \{Y \le y\})$$
What happens as x or y → ∞ (or −∞)?

Random Vectors
Joint Probability Functions:
Joint Probability Distribution Function:
$$F(\mathbf{x}) = P[\{X_1 \le x_1\} \cap \{X_2 \le x_2\} \cap \dots \cap \{X_n \le x_n\}]$$
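Two results from this part of the lecture can be checked numerically: the standardized variable z = (x − μ)/σ should have mean 0 and variance 1, and the quadratic model y = ax² with x ~ N(0, σ_x²) should give E(y) = aσ_x² and V(y) = 2a²σ_x⁴. The sketch below uses Monte Carlo sampling with arbitrarily chosen values (μ = 2, σ = 1.5, a = 3), and also evaluates the joint distribution function F(x, y) = P(X ≤ x, Y ≤ y), taking the face values of two independent fair dice as an assumed example, to show what happens as its arguments grow. All function names are hypothetical.

```python
import math
import random


def moments(samples):
    """Sample mean and 1/n variance of a list of numbers."""
    n = len(samples)
    mean = math.fsum(samples) / n
    var = math.fsum((s - mean) ** 2 for s in samples) / n
    return mean, var


def propagate(mu=2.0, sigma=1.5, a=3.0, n=200_000, seed=1):
    """Monte Carlo check of the propagation results (parameter values are arbitrary).

    z = (x - mu) / sigma with x ~ N(mu, sigma^2): expect mean 0, variance 1.
    y = a * w**2 with w ~ N(0, sigma^2):          expect mean a*sigma^2, variance 2*a^2*sigma^4.
    """
    rng = random.Random(seed)
    z = [(rng.gauss(mu, sigma) - mu) / sigma for _ in range(n)]
    y = [a * rng.gauss(0.0, sigma) ** 2 for _ in range(n)]
    return moments(z), moments(y)


def joint_cdf(x, y):
    """F(x, y) = P(X <= x, Y <= y) for the face values X, Y of two fair dice (assumed example)."""
    hits = sum(1 for i in range(1, 7) for j in range(1, 7) if i <= x and j <= y)
    return hits / 36


if __name__ == "__main__":
    (z_mean, z_var), (y_mean, y_var) = propagate()
    print(f"z: mean {z_mean:.3f}, variance {z_var:.3f}")
    print(f"y: mean {y_mean:.3f} vs {3.0 * 1.5**2}, variance {y_var:.3f} vs {2 * 3.0**2 * 1.5**4}")
    # As y -> infinity, F(x, y) -> F_X(x); as both grow, F -> 1; below the support, F = 0.
    print(joint_cdf(3, math.inf), joint_cdf(math.inf, math.inf), joint_cdf(0, 5))
```

Letting y → ∞ in F(x, y) leaves P(X ≤ x), the marginal distribution of X (0.5 at x = 3 here), letting both arguments grow gives 1, and letting either one fall below the support gives 0.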
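Joint Probability Density Function:
$$f(\mathbf{x}) = \frac{\partial^n F(\mathbf{x})}{\partial x_1\,\partial x_2 \cdots \partial x_n}$$
Marginal Probability Functions: marginal probability functions are obtained by summing or integrating out the variables that are of no interest:
$$f_X(x) = \sum_y P(x, y) \quad\text{or}\quad f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy$$

Multivariate Expectations
$$E(X) = \int_{-\infty}^{\infty} x\,f_X(x)\,dx, \qquad f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy$$
$$E(X) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x\,f_{X,Y}(x, y)\,dy\,dx, \qquad E(Y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y\,f_{X,Y}(x, y)\,dx\,dy$$
$$E(g(X)) = \int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x)\,f_{X,Y}(x, y)\,dy\,dx$$
$$E(h(Y)) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(y)\,f_{X,Y}(x, y)\,dx\,dy$$
$$E(g(X, Y)) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\,f_{X,Y}(x, y)\,dx\,dy$$
What about g(X, Y) = X + Y? Substituting into the last formula and splitting the integral gives E(X + Y) = E(X) + E(Y).

Multivariate Expectations
Mean Vector:
$$\boldsymbol{\mu} = E[\mathbf{x}] = [E[x_1]\;\; E[x_2]\;\; \cdots\;\; E[x_n]]^T$$
The expected value of g(x_1, x_2, …, x_n) is given by
$$E[g(\mathbf{x})] = \sum_{x_n}\sum_{x_{n-1}}\cdots\sum_{x_1} g(\mathbf{x})\,f(\mathbf{x}) \quad\text{or}\quad \int_{x_n}\int_{x_{n-1}}\cdots\int_{x_1} g(\mathbf{x})\,f(\mathbf{x})\,d\mathbf{x}$$
Covariance Matrix:
$$\operatorname{cov}[\mathbf{x}] = P = E[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T] = E[\mathbf{x}\mathbf{x}^T] - \boldsymbol{\mu}\boldsymbol{\mu}^T$$
where $S = E[\mathbf{x}\mathbf{x}^T]$ is known as the autocorrelation matrix.
NOTE:
$$P = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n \end{bmatrix} R \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n \end{bmatrix}, \qquad R = \begin{bmatrix} 1 & \rho_{12} & \cdots & \rho_{1n} \\ \rho_{21} & 1 & \cdots & \rho_{2n} \\ \vdots & & \ddots & \vdots \\ \rho_{n1} & \rho_{n2} & \cdots & 1 \end{bmatrix}$$
R is the correlation matrix.

Covariance Matrix
The covariance matrix indicates the tendency of each pair of dimensions in a random vector to vary together, i.e. to "co-vary".
Properties of the covariance matrix:
– The covariance matrix is square.
– The covariance matrix is always positive semidefinite, i.e. $\mathbf{v}^T P \mathbf{v} \ge 0$ for every vector v; it is positive definite ($\mathbf{v}^T P \mathbf{v} > 0$ for v ≠ 0) unless some linear combination of the components has zero variance.
– The covariance matrix is symmetric, i.e. $P = P^T$.
– If x_i and x_j tend to increase together, then P_ij > 0.
– If x_i and x_j are uncorrelated, then P_ij = 0.

Independent Variables
Recall that two random variables are said to be independent if knowing the value of one tells you nothing about the other.
– The joint probability density function is the product of the marginal probability density functions.
– Cov(X, Y) = 0 if X and Y are independent.
– E(XY) = E(X)E(Y).
Two variables are said to be uncorrelated if Cov(X, Y) = 0.
– Independent variables are uncorrelated, but the converse is not true.
– Cov(X, Y) = 0 means the integral $\iint (x - \mu_X)(y - \mu_Y)\, f_{X,Y}(x, y)\,dx\,dy = 0$.
– It tells us that the distribution is balanced in some way, but says nothing about the distribution values.
– Example: (X, Y) uniformly distributed on the unit circle.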
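The covariance and correlation matrices, and the unit-circle example of variables that are uncorrelated but not independent, can be explored with a short pure-Python sketch. It estimates P from samples with the 1/n normalisation (an assumption; the 1/(n − 1) version differs only by the factor n/(n − 1)), forms R from P = D R D with D = diag(σ_i), and samples points uniformly on the unit circle by drawing a uniform angle. The function names are hypothetical.

```python
import math
import random


def mean_vector(samples):
    """Sample mean vector mu = E[x] of a list of equal-length tuples."""
    n, d = len(samples), len(samples[0])
    return [sum(s[k] for s in samples) / n for k in range(d)]


def covariance_matrix(samples):
    """P = E[(x - mu)(x - mu)^T], estimated with the 1/n normalisation."""
    mu = mean_vector(samples)
    n, d = len(samples), len(mu)
    return [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / n
             for j in range(d)] for i in range(d)]


def correlation_matrix(P):
    """R with R_ij = P_ij / (sigma_i * sigma_j), so that P = D R D with D = diag(sigma)."""
    sig = [math.sqrt(P[i][i]) for i in range(len(P))]
    return [[P[i][j] / (sig[i] * sig[j]) for j in range(len(P))] for i in range(len(P))]


def unit_circle_points(n=100_000, seed=2):
    """(X, Y) = (cos t, sin t) with t uniform on [0, 2*pi): uniform on the unit circle."""
    rng = random.Random(seed)
    return [(math.cos(t), math.sin(t))
            for t in (rng.uniform(0.0, 2.0 * math.pi) for _ in range(n))]


if __name__ == "__main__":
    pts = unit_circle_points()
    P = covariance_matrix(pts)
    print("P =", P)   # off-diagonal entries near 0: X and Y are uncorrelated
    print("R =", correlation_matrix(P))
    # ...yet X and Y are dependent: every point satisfies X^2 + Y^2 = 1,
    # so knowing X fixes |Y| = sqrt(1 - X^2).
    print(max(abs(x * x + y * y - 1.0) for x, y in pts))
```

The diagonal of P comes out near 1/2 and the off-diagonal entry near 0, while the last line shows that the constraint X² + Y² = 1 holds for every sample, so the joint density cannot be the product of the marginals.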
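Gaussian or Normal Distribution
The normal distribution is the most widely known and used distribution in the field of statistics.
– Many natural phenomena can be approximated by a normal distribution.
Central Limit Theorem:
– Given a distribution with mean μ and variance σ², the sampling distribution of the mean approaches a normal distribution with mean μ and variance σ²/N as the sample size N increases.
Normal Density Function:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$
[Figure: normal density curve peaking at x = μ (height 0.399 when σ = 1), with the axis marked at μ − 2σ, μ − σ, μ, μ + σ, μ + 2σ.]

Multivariate Normal Distribution
Multivariate Gaussian Density Function:
$$f(\mathbf{X}) = \frac{1}{(2\pi)^{n/2}\,|R|^{1/2}}\, \exp\!\left(-\frac{1}{2}(\mathbf{X} - \boldsymbol{\mu})^T R^{-1}(\mathbf{X} - \boldsymbol{\mu})\right)$$
where R here denotes the covariance matrix of X.
How to find an equal-probability surface?
$$\frac{1}{2}(\mathbf{X} - \boldsymbol{\mu})^T R^{-1} (\mathbf{X} - \boldsymbol{\mu}) = \text{constant}$$
Moreover, one is often interested in the probability that X lies inside such a quadratic hypersurface.
– For example, what is the probability of lying inside the 1σ ellipsoid?
Diagonalize $R^{-1} = C^T \Sigma C$, with C orthogonal and $\Sigma = \operatorname{diag}(1/\sigma_1^2, 1/\sigma_2^2, \dots, 1/\sigma_n^2)$, and let
$$\mathbf{Y} = C(\mathbf{X} - \boldsymbol{\mu}), \qquad z_i = \frac{Y_i}{\sigma_i}.$$
Then $(\mathbf{X} - \boldsymbol{\mu})^T R^{-1} (\mathbf{X} - \boldsymbol{\mu}) = \mathbf{Y}^T \Sigma \mathbf{Y} = z_1^2 + z_2^2 + \dots + z_n^2$, and the probability of lying inside the ellipsoid $z_1^2 + \dots + z_n^2 = c^2$ is
$$P\left(\sum_i z_i^2 \le c^2\right) = \int_V f(\mathbf{z})\,dV$$

Multivariate Normal Distribution
Y_i represents coordinates in the Cartesian principal-axis system, and σ_i² is the variance along the i-th principal axis.
The probability of lying inside the 1σ, 2σ or 3σ ellipsoid decreases with increasing dimensionality (the "curse of dimensionality"):

| n \ c | 1 | 2 | 3 |
|---|---|---|---|
| 1 | 0.683 | 0.955 | 0.997 |
| 2 | 0.394 | 0.865 | 0.989 |
| 3 | 0.200 | 0.739 | 0.971 |

Summary of Probability Distribution Functions

| Distribution | Type | Parameters | Probability function | Mean | Variance | Characteristics |
|---|---|---|---|---|---|---|
| Binomial | Discrete | 0 ≤ p ≤ 1, n = 0, 1, 2, … | $\binom{n}{x} p^x q^{n-x}$, q = 1 − p | np | npq | Skewed unless p = 0.5 |
| Hypergeometric | Discrete | N = 0, 1, 2, …; M = 0, …, N; n = 0, …, N | $\binom{M}{x}\binom{N-M}{n-x} / \binom{N}{n}$ | nM/N | nM(N − M)(N − n) / [N²(N − 1)] | Skewed |
| Poisson | Discrete | λ > 0 | $\lambda^x e^{-\lambda} / x!$ | λ | λ | Skewed positively |
| Normal | Continuous | −∞ < μ < ∞, σ > 0 | $\frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}$ | μ | σ² | Symmetric about μ |
| Standardized normal | Continuous | none | $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ | 0 | 1 | Symmetric about zero |
| Exponential | Continuous | λ > 0 | $\lambda e^{-\lambda x}$, x ≥ 0 | 1/λ | 1/λ² | Skewed positively |

A distribution is skewed if most of its values lie either to the right or to the left of its mean.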
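Several entries in the two tables can be reproduced numerically. Because z₁² + … + z_n² is chi-square distributed with n degrees of freedom, the probability of lying inside the cσ ellipsoid equals the regularized lower incomplete gamma function P(n/2, c²/2); the sketch evaluates it with the standard power series (a method chosen here, not one given in the lecture). It also checks the mean and variance columns of the summary table for the binomial, Poisson and hypergeometric rows by summing each PMF directly, with arbitrarily chosen parameters. Function names are hypothetical.

```python
import math


def reg_lower_gamma(s, x, terms=200):
    """Regularized lower incomplete gamma P(s, x) = gamma(s, x) / Gamma(s), via the series
    gamma(s, x) = x^s e^(-x) * sum_k x^k / (s (s+1) ... (s+k))."""
    if x <= 0:
        return 0.0
    term = 1.0 / s
    total = term
    for k in range(1, terms):
        term *= x / (s + k)
        total += term
    return total * math.exp(s * math.log(x) - x - math.lgamma(s))


def inside_ellipsoid(n, c):
    """P(z_1^2 + ... + z_n^2 <= c^2) for independent standard normal z_i (chi-square, n dof)."""
    return reg_lower_gamma(n / 2.0, c * c / 2.0)


def pmf_moments(pmf, support):
    """Mean and variance of a discrete distribution by direct summation over its support."""
    mean = sum(x * pmf(x) for x in support)
    var = sum((x - mean) ** 2 * pmf(x) for x in support)
    return mean, var


def binomial_pmf(n, p):
    return lambda x: math.comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(lam):
    return lambda x: lam**x * math.exp(-lam) / math.factorial(x)


def hypergeometric_pmf(N, M, n):
    return lambda x: math.comb(M, x) * math.comb(N - M, n - x) / math.comb(N, n)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, [round(inside_ellipsoid(n, c), 4) for c in (1, 2, 3)])
    print(pmf_moments(binomial_pmf(10, 0.3), range(11)))         # table: np = 3, npq = 2.1
    print(pmf_moments(poisson_pmf(4.0), range(60)))              # table: lambda = 4 for both
    print(pmf_moments(hypergeometric_pmf(20, 7, 5), range(6)))   # table: nM/N = 1.75, variance ~0.898
```

The computed ellipsoid probabilities agree with the curse-of-dimensionality table to within about 0.002 (the table rounds slightly; for example n = 3, c = 1 evaluates to about 0.1987), and for n = 2 the series reduces to the closed form 1 − e^(−c²/2).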
Properties of Estimators
Unbiasedness
– On average, the estimated value of the parameter equals its true value: $E[\hat{x}] = x$.
Efficiency
– The estimator has a relatively small variance.
– The estimated values should not vary much from sample to sample.
Sufficiency
– The estimator uses as much of the information available in the sample as possible.
Consistency
– As the sample size increases, the estimated value approaches the true value.
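Unbiasedness and consistency can be seen in a small simulation. The sketch below draws normal samples with an assumed true mean of 1 and true variance of 4 (function names hypothetical) and averages two variance estimators over many repeated samples of size n = 5: dividing the sum of squared deviations by n gives an estimator whose expected value is (n − 1)/n times the true variance, so it is biased, while dividing by n − 1 removes the bias. It also shows the sample mean approaching the true mean as the sample size grows.

```python
import random


def var_estimates(sample):
    """Return (biased, unbiased) variance estimates: divide by n and by n - 1."""
    n = len(sample)
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)
    return ss / n, ss / (n - 1)


def average_estimates(n=5, trials=100_000, mu=1.0, sigma=2.0, seed=3):
    """Average each estimator over many samples of size n to approximate its expectation."""
    rng = random.Random(seed)
    biased_total = unbiased_total = 0.0
    for _ in range(trials):
        b, u = var_estimates([rng.gauss(mu, sigma) for _ in range(n)])
        biased_total += b
        unbiased_total += u
    return biased_total / trials, unbiased_total / trials


def sample_mean(n, mu=1.0, sigma=2.0, seed=4):
    """Sample mean of n draws; a consistent estimator of mu."""
    rng = random.Random(seed)
    return sum(rng.gauss(mu, sigma) for _ in range(n)) / n


if __name__ == "__main__":
    biased, unbiased = average_estimates()
    print(f"true variance 4.0: /n estimator averages {biased:.3f}, /(n-1) estimator averages {unbiased:.3f}")
    for n in (10, 1_000, 100_000):
        print(n, sample_mean(n))
```

With n = 5 the divide-by-n estimator should average close to 4 × 4/5 = 3.2 and the divide-by-(n − 1) estimator close to 4.0, which is the unbiasedness property E[x̂] = x in action.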