Chapter 3 Statistics 2A STA02A2

CHAPTER 3: Joint Distributions

Lecture notes sections (content from corresponding textbook sections):
1. Introduction (3.1)
2. Discrete random variables (3.2)
3. Continuous random variables (3.3)
4. Independent random variables (3.4)
5. Conditional distributions (3.5)

1. Introduction

In statistical analysis we are often interested in several random variables that are related to one another. For example, in ecological studies the counts of different species could be related to the predator count in the habitat, while in marketing, the sales revenue, marketing expenditure, and consumer behaviour and preferences are all related measurements. In such cases we not only look at the distribution of each variable individually (univariate analysis); we also look at the joint distribution of two or more variables (bivariate/multivariate analysis).

The joint behaviour of two random variables X and Y, discrete or continuous, is determined by the joint cdf:

F(x, y) = P(X ≤ x ∩ Y ≤ y) = P(X ≤ x, Y ≤ y)

The cdf gives the probability that the point (X, Y) belongs to a semi-infinite rectangle in the plane (figure on the left), and the probability that the point (X, Y) belongs to a given rectangle (figure on the right) is:

P(x1 ≤ X ≤ x2, y1 ≤ Y ≤ y2) = F(x2, y2) − F(x2, y1) − F(x1, y2) + F(x1, y1)

[Figures: left, the semi-infinite rectangle X ≤ a, Y ≤ b; right, the rectangle x1 ≤ X ≤ x2, y1 ≤ Y ≤ y2.]

In general, if X1, X2, …, Xn are jointly distributed random variables, their joint cdf is:

F(x1, x2, …, xn) = P(X1 ≤ x1, X2 ≤ x2, …, Xn ≤ xn)

2. Discrete Random Variables

2.1 Joint probability mass function

Let X and Y be discrete random variables defined on the same sample space, where X can take on the values x1, x2, … and Y can take on the values y1, y2, ….
The joint probability mass function (joint pmf) is defined as:

p(xi, yj) = P(X = xi, Y = yj)

Consider two discrete random variables X and Y with range −∞ < X < ∞ and −∞ < Y < ∞, with no further restrictions. The joint probability for a range of X and Y values is found through double summation:

P(x1 ≤ X ≤ x2, y1 ≤ Y ≤ y2) = Σ_{x=x1}^{x2} Σ_{y=y1}^{y2} p(x, y) = Σ_{y=y1}^{y2} Σ_{x=x1}^{x2} p(x, y)

Example
Consider two discrete random variables X and Y with joint pmf given in the following table, and presented in the graph below:

         Y = 0                  Y = 1                  Y = 2
X = 0    P(X=0, Y=0) = 1/6      P(X=0, Y=1) = 1/4      P(X=0, Y=2) = 1/8
X = 1    P(X=1, Y=0) = 1/8      P(X=1, Y=1) = 1/6      P(X=1, Y=2) = 1/6

[Figure: bar chart of the joint pmf, with bar heights 1/6, 1/4, 1/8 for x = 0 and 1/8, 1/6, 1/6 for x = 1, across y = 0, 1, 2.]

Note that Σ_{i=0}^{1} Σ_{j=0}^{2} p(xi, yj) = 1/6 + 1/4 + 1/8 + 1/8 + 1/6 + 1/6 = (4 + 6 + 3 + 3 + 4 + 4)/24 = 24/24 = 1.

2.2 Marginal probability mass function

The joint pmf contains all information about the distributions of both X and Y.
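The sum-to-1 check and the double-summation formula above can be verified numerically. The sketch below stores the tabulated pmf as a dictionary keyed by (x, y) and uses exact fractions to avoid floating-point error; the rectangular event P(0 ≤ X ≤ 1, 0 ≤ Y ≤ 1) is an arbitrary illustration, not one worked in the notes.

```python
# Joint pmf of the table above, stored as {(x, y): probability}.
# Fractions keep the arithmetic exact.
from fractions import Fraction as F

pmf = {
    (0, 0): F(1, 6), (0, 1): F(1, 4), (0, 2): F(1, 8),
    (1, 0): F(1, 8), (1, 1): F(1, 6), (1, 2): F(1, 6),
}

# A valid joint pmf must sum to 1 over all (x, y) pairs.
total = sum(pmf.values())
print(total)  # 1

# Double summation for a rectangular event, e.g. P(0 <= X <= 1, 0 <= Y <= 1):
prob = sum(pmf[(x, y)] for x in (0, 1) for y in (0, 1))
print(prob)  # 17/24
```

The same dictionary representation also makes marginal and conditional calculations one-liners, which is handy for checking hand computations.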
It is therefore possible to derive the marginal distribution of X (marginal pmf of X) and the marginal distribution of Y (marginal pmf of Y) from the joint pmf using the law of total probability, namely:

➢ Marginal pmf of X: pX(x) = Σ_j p(x, yj), summing over all j
➢ Marginal pmf of Y: pY(y) = Σ_i p(xi, y), summing over all i

Example
In the example above, the marginal pmf of X and the marginal pmf of Y, with both shown in one graph, are:

➢ Marginal pmf of X:
  o For X = 0: pX(0) = Σ_j p(0, yj) = 1/6 + 1/4 + 1/8 = (4 + 6 + 3)/24 = 13/24
  o For X = 1: pX(1) = Σ_j p(1, yj) = 1/8 + 1/6 + 1/6 = (3 + 4 + 4)/24 = 11/24

➢ Marginal pmf of Y:
  o For Y = 0: pY(0) = Σ_i p(xi, 0) = 1/6 + 1/8 = (4 + 3)/24 = 7/24
  o For Y = 1: pY(1) = Σ_i p(xi, 1) = 1/4 + 1/6 = (6 + 4)/24 = 10/24
  o For Y = 2: pY(2) = Σ_i p(xi, 2) = 1/8 + 1/6 = (3 + 4)/24 = 7/24

[Figure: bar chart of both marginal pmfs: pX(0) ≈ 0.54, pX(1) ≈ 0.46; pY(0) ≈ 0.29, pY(1) ≈ 0.42, pY(2) ≈ 0.29.]

2.3 Multivariate joint and marginal probability mass functions

In the case of several discrete random variables X1, X2, …, Xn, the joint and marginal functions are:

➢ Joint pmf of X1, X2, …, Xn: p(x1, x2, …, xn) = P(X1 = x1, X2 = x2, …, Xn = xn)
➢ Marginal pmf of, say, X1: pX1(x1) = Σ_{x2} Σ_{x3} … Σ_{xn} p(x1, x2, …, xn)
➢ Two-dimensional marginal pmf of, say, X1 and X2: pX1X2(x1, x2) = Σ_{x3} Σ_{x4} … Σ_{xn} p(x1, x2, …, xn)

2.4 Calculating probabilities

We can use the joint (and marginal) mass functions to calculate joint (and marginal) probabilities. The first step is to identify the valid values of X and Y for the required probability.

Example
Calculate P(X = 1, Y > 0) from the joint pmf:

         Y = 0                  Y = 1                  Y = 2
X = 0    P(X=0, Y=0) = 1/6      P(X=0, Y=1) = 1/4      P(X=0, Y=2) = 1/8
X = 1    P(X=1, Y=0) = 1/8      P(X=1, Y=1) = 1/6      P(X=1, Y=2) = 1/6

P(X = 1, Y > 0) = P(X = 1, Y = 1) + P(X = 1, Y = 2) = 1/6 + 1/6 = 2/6 = 1/3

3. Continuous Random Variables
3.1 Joint probability density function

Let X and Y be continuous random variables. The joint pdf f(x, y) is a piecewise continuous function of x and y that is nonnegative and satisfies:

∫_{-∞}^{∞} ∫_{-∞}^{∞} f(x, y) dx dy = ∫_{-∞}^{∞} ∫_{-∞}^{∞} f(x, y) dy dx = 1

Consider two continuous random variables X and Y with range −∞ < X < ∞ and −∞ < Y < ∞, with no further restrictions. The joint probability for a range of X and Y values is found through double integration:

P(x1 ≤ X ≤ x2, y1 ≤ Y ≤ y2) = ∫_{x1}^{x2} ∫_{y1}^{y2} f(x, y) dy dx = ∫_{y1}^{y2} ∫_{x1}^{x2} f(x, y) dx dy

Example
Show that the following function f(x, y) is a valid joint pdf of random variables X and Y:

f(x, y) = (12/7)(x^2 + xy), 0 ≤ x, y ≤ 1

∫_{-∞}^{∞} ∫_{-∞}^{∞} f(x, y) dy dx = ∫_0^1 ∫_0^1 (12/7)(x^2 + xy) dy dx
= ∫_0^1 (12/7)[x^2 y + (1/2)xy^2]_{y=0}^{y=1} dx
= ∫_0^1 (12/7)(x^2 + (1/2)x) dx
= (12/7)[(1/3)x^3 + (1/4)x^2]_{x=0}^{x=1}
= (12/7)(1/3 + 1/4) = (12/7)(7/12) = 1

Since f(x, y) is nonnegative and integrates to 1 over both variables, it follows that it is a valid pdf.

[Figure: surface plot of f(x, y) over the unit square.]

3.2 Joint cumulative distribution function

The joint cdf of X and Y is defined as:

F(x, y) = P(X ≤ x, Y ≤ y) = ∫_{-∞}^{x} ∫_{-∞}^{y} f(u, v) dv du = ∫_{-∞}^{y} ∫_{-∞}^{x} f(u, v) du dv

From the fundamental theorem of multivariable calculus it follows that f(x, y) = ∂^2 F(x, y) / ∂x ∂y.

Example
1) Derive F(x, y) from f(x, y) = (12/7)(x^2 + xy) for 0 ≤ x, y ≤ 1.

F(x, y) = ∫_0^x ∫_0^y (12/7)(u^2 + uv) dv du
= ∫_0^x (12/7)[u^2 v + (1/2)uv^2]_{v=0}^{v=y} du
= ∫_0^x (12/7)(u^2 y + (1/2)uy^2) du
= (12/7)[(1/3)u^3 y + (1/4)u^2 y^2]_{u=0}^{u=x}
= (12/7)((1/3)x^3 y + (1/4)x^2 y^2) for 0 ≤ x, y ≤ 1

2) Use the answer in (1) to derive f(x, y).
f(x, y) = ∂^2/∂x∂y (12/7)((1/3)x^3 y + (1/4)x^2 y^2) = ∂/∂y (12/7)(x^2 y + (1/2)xy^2) = (12/7)(x^2 + xy) for 0 ≤ x, y ≤ 1

3.3 Marginal cumulative distribution and probability density functions

The marginal cdf of a random variable X, FX(x), is:

FX(x) = P(X ≤ x) = ∫_{-∞}^{x} ∫_{-∞}^{∞} f(u, y) dy du

The marginal pdf of a random variable X, fX(x), is:

fX(x) = d/dx FX(x) = ∫_{-∞}^{∞} f(x, y) dy

Example
Derive fX(x) and fY(y) from f(x, y) = (12/7)(x^2 + xy) for 0 ≤ x, y ≤ 1.

1) fX(x) = ∫_0^1 (12/7)(x^2 + xy) dy = (12/7)[x^2 y + (1/2)xy^2]_{y=0}^{y=1} = (12/7)(x^2 + (1/2)x) for 0 ≤ x ≤ 1

2) fY(y) = ∫_0^1 (12/7)(x^2 + xy) dx = (12/7)[(1/3)x^3 + (1/2)x^2 y]_{x=0}^{x=1} = (12/7)(1/3 + (1/2)y) for 0 ≤ y ≤ 1

3.4 Multivariate joint and marginal probability density functions

In the case of several continuous random variables X1, X2, …, Xn, the marginal density functions are:

➢ Marginal pdf of, say, X1: fX1(x1) = ∫_{-∞}^{∞} … ∫_{-∞}^{∞} f(x1, x2, …, xn) dx2 dx3 … dxn
➢ Two-dimensional marginal pdf of, say, X1 and X2: fX1X2(x1, x2) = ∫_{-∞}^{∞} … ∫_{-∞}^{∞} f(x1, x2, …, xn) dx3 dx4 … dxn

3.5 Calculating probabilities

To calculate probabilities from a joint pdf of two variables, i.e., a bivariate density, we use the double integral. It is important that we incorporate any restrictions on the valid values of the two variables in the double integral. To do this, we must determine the range of the sample space and the range of the required joint probability. The first step is to draw the valid region of the sample space. The second step is to add the valid region of the event space of the required probability. This determines the successive integration ranges for X and Y, and also the order in which we will apply the double integral.

Exercise 1
Let X and Y be continuous random variables with joint pdf f(x, y), where 0 ≤ x ≤ y ≤ 1.

1) Draw the valid range of the sample space.
2) Draw the valid range of the event space to calculate P(X ≤ 0.5).

We can find P(X ≤ 0.5) by either (1) finding the marginal distribution of X (i.e., integrating out y) and then calculating the probability, or (2) using the double integral in a single step. For option (2) we need to decide whether we first integrate over y and then over x, or over x and then over y. To determine this, we must check how the two variables vary together.

3) Draw the valid range of the event space to calculate P(Y ≤ 0.5). We must check how the two variables vary together.

4) Draw the valid range of the event space to calculate P(X ≤ 0.5Y). Since we now have both variables in the event space, this adds an additional line to the graph.

3.6 Copulas

In all the preceding joint distribution examples, the joint pmf or pdf was given. The question we can ask is: how was that joint distribution created? If two variables X and Y are, say, univariate normal variables, does it necessarily follow that they follow a bivariate normal distribution? The answer is no, because the underlying dependence structure has an impact on the joint behaviour of the random variables. It is, however, true that if variables X and Y follow a bivariate normal distribution, then both variables are univariate normally distributed.

If we want to model the relationship between two variables, an easy way to do this is through linear regression models. However, this requires the assumption of linearity, which does not always hold. If a bivariate relationship between X and Y is such that they are closely correlated for small values of X and Y, and weakly correlated for large values of X and Y, the relationship is not constant.
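The claim that normal marginals do not imply joint normality can be illustrated with a classic textbook counterexample (not taken from these notes): let X be standard normal and flip a fair coin to decide the sign of Y = ±X. A simulation sketch:

```python
# Counterexample: X ~ N(0,1) and Y = S*X with S = +/-1 from a fair coin.
# Both marginals are standard normal, but (X, Y) is NOT bivariate normal:
# X + Y equals exactly 0 whenever S = -1, so the sum has an atom at zero,
# which no nondegenerate normal random variable can have.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.standard_normal(n)
s = rng.choice([-1.0, 1.0], size=n)
y = s * x

# Marginal of Y looks standard normal (mean ~ 0, sd ~ 1)...
print(round(float(y.mean()), 2), round(float(y.std()), 2))

# ...but X + Y is exactly 0 for about half the sample.
print(float(np.mean(x + y == 0)))
```

For a bivariate normal pair, every linear combination aX + bY would itself be continuous normal, so the point mass at zero rules joint normality out even though each marginal is normal.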
One approach to modelling this type of dependence and finding a joint probability distribution is through the use of copulas, which can capture more complex relationships. Copulas are based on Sklar's theorem, developed in 1959. The theorem states that any multivariate joint distribution can be written in terms of the univariate marginal distribution functions and a copula, which describes the dependence structure between the variables. To estimate a copula, we need to determine which copula to use and then find the parameter which best fits the data. Common copulas are the Gaussian, Clayton, Farlie-Morgenstern, Gumbel and Frank copulas.

Formally defined, a copula is a joint cdf of random variables that have uniform marginal distributions. A copula C(u, v) is nondecreasing in each variable because it is a cdf. The copula density is defined as:

c(u, v) = ∂^2 C(u, v) / ∂u ∂v ≥ 0

If X and Y are continuous random variables with cdfs FX(x) and FY(y), then U = FX(X) and V = FY(Y) are uniform random variables (by the probability integral transform). For a copula C(u, v), the joint cdf FXY(x, y) and joint pdf fXY(x, y) are:

➢ FXY(x, y) = C(FX(x), FY(y))
➢ fXY(x, y) = c(FX(x), FY(y)) fX(x) fY(y)

Marginal distributions alone do not determine the joint distribution. A bivariate joint distribution can be constructed from two marginal distributions and any copula, which captures the dependence between X and Y.

Farlie-Morgenstern family
If F(x) and G(y) are univariate cdfs, then the function H(x, y) is a bivariate cdf for any |α| ≤ 1:

H(x, y) = F(x)G(y)[1 + α(1 − F(x))(1 − G(y))]

Because lim_{x→∞} F(x) = lim_{y→∞} G(y) = 1, the marginal distributions of H(x, y) are F(x) and G(y). We can construct an infinite number of different bivariate distributions using H(x, y).
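The Farlie-Morgenstern construction can be checked symbolically. The sketch below uses uniform(0,1) marginals F(x) = x and G(y) = y as in the examples, with α = 1/2 an arbitrary choice inside |α| ≤ 1 (any admissible value works the same way); it verifies that setting one argument to its maximum recovers the other marginal, and that the implied density integrates to 1.

```python
# Symbolic check of the Farlie-Morgenstern bivariate cdf
# H(x, y) = F(x)G(y)[1 + alpha(1 - F(x))(1 - G(y))].
import sympy as sp

x, y = sp.symbols("x y")
alpha = sp.Rational(1, 2)   # arbitrary choice with |alpha| <= 1
Fx, Gy = x, y               # uniform(0,1) cdfs on [0, 1]

H = Fx * Gy * (1 + alpha * (1 - Fx) * (1 - Gy))

# Setting one argument to 1 recovers the other marginal cdf:
print(sp.simplify(H.subs(y, 1)))  # x
print(sp.simplify(H.subs(x, 1)))  # y

# The joint density h = d^2 H / dx dy integrates to 1 over the unit square:
h = sp.diff(H, x, y)
print(sp.integrate(h, (x, 0, 1), (y, 0, 1)))  # 1
```

Replacing `alpha` with −1 or +1 reproduces the two worked examples that follow.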
For example, use H(x, y) to construct a bivariate distribution with uniform(0,1) marginals, where α = −1:

➢ F(x) = x, 0 ≤ x ≤ 1 and G(y) = y, 0 ≤ y ≤ 1
➢ H(x, y) = xy[1 − (1 − x)(1 − y)] = x^2 y + xy^2 − x^2 y^2
➢ h(x, y) = ∂^2/∂x∂y (x^2 y + xy^2 − x^2 y^2) = ∂/∂x (x^2 + 2xy − 2x^2 y) = 2x + 2y − 4xy

For example, use H(x, y) to construct a bivariate distribution with uniform(0,1) marginals, where α = +1:

➢ F(x) = x, 0 ≤ x ≤ 1 and G(y) = y, 0 ≤ y ≤ 1
➢ H(x, y) = xy[1 + (1 − x)(1 − y)] = 2xy − x^2 y − xy^2 + x^2 y^2
➢ h(x, y) = ∂^2/∂x∂y (2xy − x^2 y − xy^2 + x^2 y^2) = ∂/∂x (2x − x^2 − 2xy + 2x^2 y) = 2 − 2x − 2y + 4xy

Exercise 2
1) Consider the following function of a random variable X, where x = 0, 1, 2, 3, and find the value of k that will ensure that p(x) is a valid pmf.

p(x) = k(x^3 + 1) for x = 0, 1, 2, 3; 0 otherwise

2) Consider the following joint density function f(x, y). Draw the graph, with coordinates, of the valid region of X and Y, as well as the event space to calculate P(X > Y).

f(x, y) = (x + y)/2 for x ≥ 0, y ≥ 0, and x + y ≤ 3; 0 otherwise

3) Calculate P(X > Y).

4. Independent Random Variables

In bivariate analysis, we not only look at the joint distribution of two random variables; we must also evaluate the dependence between the two variables. The concept of independent random variables is similar to the concept of statistical independence in probability analysis, which states that two events A and B are statistically independent if P(A ∩ B) = P(A)P(B). In other words, knowing the outcome of one event does not change or influence the outcome of the other event. The same concept applies to random variables.
In general, if two random variables X and Y are independent, then:

P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B) for all sets A and B

This means that n random variables X1, X2, …, Xn are independent if their joint pmf/pdf factors into the product of their marginal pmfs/pdfs, and their joint cdf factors into the product of their marginal cdfs.

➢ For Xi independent discrete random variables, it follows that:
  o p(x1, x2, …, xn) = pX1(x1) pX2(x2) … pXn(xn)
  o F(x1, x2, …, xn) = FX1(x1) FX2(x2) … FXn(xn)

➢ For Xi independent continuous random variables, it follows that:
  o f(x1, x2, …, xn) = fX1(x1) fX2(x2) … fXn(xn)
  o F(x1, x2, …, xn) = FX1(x1) FX2(x2) … FXn(xn)

Consider the case of two jointly continuous random variables X and Y. If they are independent, then F(x, y) = FX(x) FY(y).

Proof/Derivation:
F(x, y) = ∫_{-∞}^{x} ∫_{-∞}^{y} fXY(u, v) dv du
= ∫_{-∞}^{x} ∫_{-∞}^{y} fX(u) fY(v) dv du
= ∫_{-∞}^{x} fX(u) du ∫_{-∞}^{y} fY(v) dv
= FX(x) FY(y)

Consider the case of two independent random variables X and Y. Then any functions Z = g(X) and W = h(Y) are also independent.

Proof/Derivation:
Let A(z) be the set of x such that g(x) ≤ z, and let B(w) be the set of y such that h(y) ≤ w. Then:

P(Z ≤ z, W ≤ w) = P(X ∈ A(z), Y ∈ B(w)) = P(X ∈ A(z)) P(Y ∈ B(w)) = P(Z ≤ z) P(W ≤ w)

In conclusion:
➢ If we know two variables are independent, we find the joint distribution as the product of the marginal distributions.
➢ If we want to test whether two variables are independent, we check whether the joint distribution is equal to the product of the marginal distributions.

Example
Consider two discrete random variables X and Y with joint pmf:

         Y = 0                  Y = 1                  Y = 2
X = 0    P(X=0, Y=0) = 1/6      P(X=0, Y=1) = 1/4      P(X=0, Y=2) = 1/8
X = 1    P(X=1, Y=0) = 1/8      P(X=1, Y=1) = 1/6      P(X=1, Y=2) = 1/6

Use the marginal pmfs of X and Y to check if X and Y are independent.
pX(x) = 13/24 for x = 0; 11/24 for x = 1; 0 otherwise

pY(y) = 7/24 for y = 0; 10/24 for y = 1; 7/24 for y = 2; 0 otherwise

To check independence, we need to see if P(X = xi, Y = yj) = P(X = xi) P(Y = yj) for all possible values of X and Y. For example, consider pXY(1, 0):

P(X = 1, Y = 0) = 1/8 = 0.125
P(X = 1) P(Y = 0) = (11/24)(7/24) = 77/576 ≈ 0.134

Since 0.125 ≠ 0.134, the joint pmf does not factor into the product of the marginals, so X and Y are not independent.

5. Conditional Distributions

A conditional distribution is the distribution of values for one random variable that exists when we specify the values of another random variable. For example, suppose we have observed the value of a random variable Y and we need to define the distribution of another random variable X, given what we have observed for Y. We then use the conditional distribution of X given Y.

Consider two random variables X and Y.

➢ If X and Y are jointly distributed discrete random variables, the conditional probability that X = x (some value x) given that Y = y (some value y), where pY(y) > 0, is:

pX|Y(x | y) = P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y) = pXY(x, y) / pY(y)

➢ If X and Y are jointly distributed continuous random variables, the conditional density of X given Y, where fY(y) > 0, is:

fX|Y(x | y) = fXY(x, y) / fY(y)

The joint pmf/pdf can be expressed in terms of the marginal and conditional pmfs/pdfs, which leads to an extremely useful application of the law of total probability.
➢ X and Y discrete:
  o pXY(x, y) = pX|Y(x | y) pY(y), so pX(x) = Σ_y pXY(x, y) = Σ_y pX|Y(x | y) pY(y)
  o pXY(x, y) = pY|X(y | x) pX(x), so pY(y) = Σ_x pXY(x, y) = Σ_x pY|X(y | x) pX(x)

➢ X and Y continuous:
  o fXY(x, y) = fX|Y(x | y) fY(y), so fX(x) = ∫_{-∞}^{∞} fXY(x, y) dy = ∫_{-∞}^{∞} fX|Y(x | y) fY(y) dy
  o fXY(x, y) = fY|X(y | x) fX(x), so fY(y) = ∫_{-∞}^{∞} fXY(x, y) dx = ∫_{-∞}^{∞} fY|X(y | x) fX(x) dx

Example
Consider two discrete random variables X and Y with joint pmf:

         Y = 0                  Y = 1                  Y = 2
X = 0    P(X=0, Y=0) = 1/6      P(X=0, Y=1) = 1/4      P(X=0, Y=2) = 1/8
X = 1    P(X=1, Y=0) = 1/8      P(X=1, Y=1) = 1/6      P(X=1, Y=2) = 1/6

Given the marginal pmf pX(x), calculate pY|X(2 | 0) = P(Y = 2 | X = 0).

pX(x) = 13/24 for x = 0; 11/24 for x = 1; 0 otherwise

pY|X(y | x) = pXY(x, y) / pX(x)

pY|X(2 | 0) = pXY(0, 2) / pX(0) = (1/8) / (13/24) = 3/13

Exercise 3
Consider two continuous random variables X and Y with joint pdf f(x, y):

f(x, y) = 0.5 for 0 ≤ x ≤ y ≤ 2; 0 otherwise

1) Identify the valid range, with coordinates, of the sample space.
2) Find the marginal distribution, with valid range, of X.
3) Find the marginal distribution, with valid range, of Y.
4) Determine whether X and Y are independent.
5) Find the conditional distribution, with valid range, of fY|X(y | x).
6) Calculate fY|X(y | 1.5) and specify the valid range.
7) Calculate P(Y ≤ 1.7 | X = 1.5).
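The worked conditional-pmf example above (not the exercise) can be double-checked numerically with exact fractions: compute the marginal of X by summing the joint pmf over y, then divide.

```python
# Check of the worked example: for the joint pmf table used throughout this
# chapter, P(Y = 2 | X = 0) = p_XY(0, 2) / p_X(0) should equal 3/13.
from fractions import Fraction as F

pmf = {
    (0, 0): F(1, 6), (0, 1): F(1, 4), (0, 2): F(1, 8),
    (1, 0): F(1, 8), (1, 1): F(1, 6), (1, 2): F(1, 6),
}

# Marginal pmf of X at 0, by the law of total probability (sum over y).
p_x0 = sum(p for (x, y), p in pmf.items() if x == 0)

# Conditional pmf: p_{Y|X}(y | x) = p_{XY}(x, y) / p_X(x).
p_y2_given_x0 = pmf[(0, 2)] / p_x0
print(p_y2_given_x0)  # 3/13
```

The same pattern (marginalise, then divide) is exactly what parts 2) and 5) of Exercise 3 ask you to do analytically for the continuous case.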