Introduction to Random Variables

1. Random variables

1.1 A random variable and its realization

x is a random variable that takes different possible values. χ is a specific value, or realization, of x.

Example: x is the concentration of particulate matter in Chapel Hill on July 6, 1988. χ is the concentration measured at a monitoring station in Chapel Hill on July 6, 1988 by collecting particulate matter dust on a filter and sending the filter to a lab for analysis.

1.2 The cdf of a random variable

Let P[A] be the probability that the event A occurs. Note: we always have P[A] ∈ [0,1].

Then the cumulative distribution function (cdf) of a random variable x is defined as

  F_x(χ) = P[x ≤ χ]

Important properties:
  F_x(-∞) = 0   (why?)
  F_x(+∞) = 1   (why?)
  F_x(χ) is a non-decreasing function of χ.
    Proof: for b > a, P[x ≤ b] = P[x ≤ a] + P[a < x ≤ b] ≥ P[x ≤ a], hence F_x(b) ≥ F_x(a).

1.3 The pdf of a random variable

Definition: The probability density function (pdf) of a random variable may be defined as the derivative of its cdf,

  f_x(χ) = dF_x(χ)/dχ

Important properties:

Since F_x(χ) is non-decreasing, f_x(χ) ≥ 0.

  ∫_a^b f_x(χ) dχ = F_x(b) - F_x(a) = P[x ≤ b] - P[x ≤ a] = P[a < x ≤ b]

In other words, the area under the f_x(χ) curve between a and b is the probability of the event a < x ≤ b. Hence the pdf is really a density of probability, or a probability density function.

The normalization constraint: We must always have ∫_{-∞}^{+∞} f_x(χ) dχ = 1.
  Proof: since f_x(χ) = dF_x(χ)/dχ, we have ∫_{-∞}^{+∞} f_x(χ) dχ = F_x(+∞) - F_x(-∞) = 1 - 0 = 1.

1.4 The expected value of a random variable

The expected value of g(x) is

  E[g(x)] = ∫_{-∞}^{+∞} g(χ) f_x(χ) dχ

Examples:

The mean is the expected value of x:
  m_x = E[x] = ∫ χ f_x(χ) dχ

The variance is the expected value of (x - m_x)²:
  var_x = E[(x - m_x)²] = ∫ (χ - m_x)² f_x(χ) dχ

The expected value of (x - m_x)⁴ is
  E[(x - m_x)⁴] = ∫ (χ - m_x)⁴ f_x(χ) dχ

1.5 Exercises

Let f_x(χ) = k if χ ∈ [a,b], and 0 otherwise (o/w).

1) Use the normalization constraint to find k so that f_x(χ) is a pdf.
   ∫ f_x(χ) dχ = 1  ⟹  …  ⟹  k = 1/(b-a)

2) Assuming that f_x(χ) is the pdf of a random variable x, what is the expected value of x?
   E[x] = ∫ χ f_x(χ) dχ = … = (a+b)/2

3) Write the formula for the variance of x.
   var_x = E[(x - m_x)²] = ∫ (χ - m_x)² f_x(χ) dχ = ∫_a^b (χ - m_x)² /(b-a) dχ

2. Bivariate distributions

x and x' are two random variables that may be dependent on one another.

Example: x is the lead concentration in the drinking water at the tap of a house; x' is the lead concentration in the blood of the person living in that house.

2.1 The bivariate cdf

  F_{x,x'}(χ, χ') = P[x ≤ χ AND x' ≤ χ'] = P[x ≤ χ, x' ≤ χ']

2.2 The bivariate pdf

  f_{x,x'}(χ, χ') = ∂²F_{x,x'}(χ, χ') / (∂χ ∂χ')

2.3 Normalization constraint

  ∫∫ f_{x,x'}(χ, χ') dχ dχ' = 1

2.4 The marginal and conditional pdf

The marginal pdf of f_{x,x'}(χ, χ') with respect to χ is
  f_x(χ) = ∫ f_{x,x'}(χ, χ') dχ'

The marginal pdf of f_{x,x'}(χ, χ') with respect to χ' is
  f_{x'}(χ') = ∫ f_{x,x'}(χ, χ') dχ

The conditional pdf f_{x|x'}(χ | x' = χ') of x given that x' = χ' is
  f_{x|x'}(χ | x' = χ') = f_{x,x'}(χ, χ') / f_{x'}(χ') = f_{x,x'}(χ, χ') / ∫ f_{x,x'}(χ, χ') dχ

2.5 The expected value

  E[g(x, x')] = ∫∫ g(χ, χ') f_{x,x'}(χ, χ') dχ dχ'

Examples:

The expected value of x is
  m_x = E[x] = ∫∫ χ f_{x,x'}(χ, χ') dχ dχ'

The expected value of x' is
  m_{x'} = E[x'] = ∫∫ χ' f_{x,x'}(χ, χ') dχ dχ'

The variance of x is
  E[(x - m_x)²] = ∫∫ (χ - m_x)² f_{x,x'}(χ, χ') dχ dχ'

The variance of x' is
  E[(x' - m_{x'})²] = ∫∫ (χ' - m_{x'})² f_{x,x'}(χ, χ') dχ dχ'

The covariance between x and x' is
  E[(x - m_x)(x' - m_{x'})] = ∫∫ (χ - m_x)(χ' - m_{x'}) f_{x,x'}(χ, χ') dχ dχ'
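As a quick numerical illustration of these expected-value formulas, here is a minimal Monte Carlo sketch in Python. It assumes a specific bivariate model that is not stated at this point in the notes, namely x and x' independent and each uniform on [a, b] (the same uniform model used in the exercises of section 2.6); the values a = 2, b = 10 and the variable names are illustrative choices only.

```python
import numpy as np

# Assumed illustrative model: x and x' independent, each uniform on [a, b],
# so f_{x,x'}(chi, chi') = 1/(b-a)^2 on the square [a,b] x [a,b].
a, b = 2.0, 10.0
rng = np.random.default_rng(0)
n = 1_000_000

chi = rng.uniform(a, b, n)     # realizations of x
chi_p = rng.uniform(a, b, n)   # realizations of x'

m_x = chi.mean()                                     # E[x];          theory: (a+b)/2 = 6
var_x = ((chi - m_x) ** 2).mean()                    # E[(x-m_x)^2];  theory: (b-a)^2/12 ≈ 5.33
cov = ((chi - m_x) * (chi_p - chi_p.mean())).mean()  # covariance;    theory: 0 (independence)

print(m_x, var_x, cov)
```

The sample averages stand in for the double integrals of section 2.5; with a large number of trials they should come out close to the theoretical values noted in the comments.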
2.6 Exercises

Let f_{x,y}(χ, ψ) = k if χ ∈ [a,b] and ψ ∈ [a,b], and 0 o/w.

1) Use the normalization constraint to find k so that f_{x,y}(χ, ψ) is a pdf.
   ∫∫ f_{x,y}(χ, ψ) dχ dψ = 1  ⟹  …  ⟹  k = 1/(b-a)²

2) Assuming that f_{x,y}(χ, ψ) is the pdf of x and y, what is the expected value of x?
   E[x] = ∫∫ χ f_{x,y}(χ, ψ) dχ dψ = … = (a+b)/2

3) Calculate the variance of x.
   E[(x - m_x)²] = ∫_a^b dψ ∫_a^b dχ (χ - m_x)² /(b-a)² = … = (b-a)²/12

4) Write the formula for E[y].
   E[y] = ∫∫ ψ f_{x,y}(χ, ψ) dχ dψ = … = (a+b)/2

5) Calculate the covariance between x and y.
   E[(x - m_x)(y - m_y)] = ∫_a^b dψ ∫_a^b dχ (χ - m_x)(ψ - m_y) /(b-a)² = … = 0

6) Find the marginal pdf f_x(χ) of the random variable x.
   f_x(χ) = ∫ f_{x,y}(χ, ψ) dψ = ∫_a^b 1/(b-a)² dψ = 1/(b-a) if a ≤ χ ≤ b, and 0 otherwise

7) Calculate E[x] using f_x(χ).
   E[x] = ∫ χ f_x(χ) dχ = … = (a+b)/2   (same as in 2)

8) Find the marginal pdf f_y(ψ) of the random variable y.
   f_y(ψ) = ∫ f_{x,y}(χ, ψ) dχ = ∫_a^b 1/(b-a)² dχ = 1/(b-a) if a ≤ ψ ≤ b, and 0 otherwise

9) Find the conditional pdf of x given that y = ψ.
   f_{x|y}(χ | y = ψ) = f_{x,y}(χ, ψ) / f_y(ψ) = 1/(b-a) if a ≤ χ ≤ b and a ≤ ψ ≤ b, and 0 otherwise

10) Find the probability that x < (a+b)/2 given that y = b.
    P[x < (a+b)/2 | y = b] = ∫_{-∞}^{(a+b)/2} f_{x|y}(χ | y = b) dχ = ∫_a^{(a+b)/2} dχ /(b-a) = 1/2

Exercises on Conditional probability using discrete variables

Let a be a random variable taking values a1, a2, …, an.
Let b be a random variable taking values b1, b2, …, bm.
Let P[ai] represent the probability that a = ai.
Let P[bj] represent the probability that b = bj.
Let P[ai, bj] represent the probability that a = ai AND b = bj.

Then the probability P[ai | bj] that a = ai GIVEN that b = bj is

  P[ai | bj] = P[ai, bj] / P[bj]

Example 1: Consider the case where a takes values a1 or a2, and b takes values b1 or b2. In an experiment we record the values of a and b over 1000 trials, and we obtain the following distribution of counts:

        b1    b2
  a1   100   400
  a2   100   400

Number of trials =
P[a1] =        P[a2] =        P[b1] =        P[b2] =
P[a1, b1] =    P[a1, b2] =
P[a1 | b1] =   P[a1 | b2] =

Example 2: Redo the example with the following distribution:

        b1    b2
  a1   400   100
  a2   100   400

Number of trials =
P[a1] =        P[a2] =        P[b1] =        P[b2] =
P[a1, b1] =    P[a1, b2] =
P[a1 | b1] =   P[a1 | b2] =

Note that in this example the conditional probability did change, whereas it did not in Example 1. Why? The change in probability can be thought of as knowledge updating: P[a1] is the prior probability, while P[a1 | b1] is the updated probability once we know that b = b1.
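To make the contrast between the two examples concrete, here is a small Python sketch (an added illustration, not part of the original exercises) that turns the count tables above into the requested probabilities; the helper name table_probs is an arbitrary choice.

```python
import numpy as np

def table_probs(counts):
    """Compute P[a1], P[a1|b1] and P[a1|b2] from a 2x2 table of trial counts.

    counts[i, j] = number of trials with a = a(i+1) and b = b(j+1).
    """
    n = counts.sum()                    # total number of trials
    p_a1 = counts[0, :].sum() / n       # marginal P[a1]
    p_b1 = counts[:, 0].sum() / n       # marginal P[b1]
    p_b2 = counts[:, 1].sum() / n       # marginal P[b2]
    p_a1_b1 = counts[0, 0] / n          # joint P[a1, b1]
    p_a1_b2 = counts[0, 1] / n          # joint P[a1, b2]
    # conditional probabilities: P[a1 | bj] = P[a1, bj] / P[bj]
    return p_a1, p_a1_b1 / p_b1, p_a1_b2 / p_b2

example1 = np.array([[100, 400],
                     [100, 400]])
example2 = np.array([[400, 100],
                     [100, 400]])

print(table_probs(example1))  # (0.5, 0.5, 0.5): knowing b does not update P[a1]
print(table_probs(example2))  # (0.5, 0.8, 0.2): knowing b updates P[a1]
```

In Example 1 the rows of the table are proportional, so observing b leaves P[a1] unchanged; in Example 2 they are not, so P[a1 | b1] differs from the prior P[a1], which is the knowledge-updating idea described above.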