Lecture 3: Review of statistics – two random variables
BUEC 333
Professor David Jacks

Two random variables

The most interesting questions in economics generally involve two (or more) variables, e.g. the relationship between stock prices and earnings. We can describe the probabilistic relationship between two (or more) RVs using three kinds of probability distributions:
1.) the joint distribution
2.) marginal distributions
3.) conditional distributions

The joint distribution

The joint distribution of discrete RVs X and Y is the probability that the two RVs simultaneously take on certain values, x and y, denoted Pr(X = x, Y = y).

Example: the relationship between weather and commuting time. Let C denote commuting time, which is either long (C = 1) or short (C = 0), and let W denote weather, which is either foul (W = 0) or fair (W = 1). Thus, there are four possible outcomes: (C = 0, W = 0); (C = 0, W = 1); (C = 1, W = 0); (C = 1, W = 1). The probabilities of each of these outcomes define the joint distribution of C and W:

                         Foul Weather (W = 0)   Fair Weather (W = 1)
Short Commute (C = 0)            0.15                   0.25
Long Commute (C = 1)             0.55                   0.05

Marginal distributions

When X and Y have a joint distribution, a marginal distribution is the probability distribution of X or Y alone. We can compute the marginal distribution of Y from the joint distribution of X and Y by adding up the probabilities of all possible outcomes where Y takes a particular value (y). That is, if X takes one of k possible values x_1, ..., x_k:

Pr(Y = y) = Σ_{i=1}^{k} Pr(X = x_i, Y = y)

Example: weather and commuting time. Summing down each column gives the marginal distribution of weather; summing across each row gives the marginal distribution of commuting time:

                         Foul Weather (W = 0)   Fair Weather (W = 1)   Total
Short Commute (C = 0)            0.15                   0.25           0.40
Long Commute (C = 1)             0.55                   0.05           0.60
Total                            0.70                   0.30           1.00

Conditional distributions

The distribution of a RV Y given that another RV X takes a specific value is called the conditional distribution of Y given X. The conditional probability that Y takes value y when X takes value x is written Pr(Y = y | X = x). In general,

Pr(Y = y | X = x) = Pr(X = x, Y = y) / Pr(X = x)

Example: what is the probability of a long commute (C = 1), given that the weather is foul (W = 0)? In other words, what is Pr(C = 1 | W = 0)? The joint probability Pr(C = 1, W = 0) is given by 0.55, whereas the marginal probability of foul weather is 0.70. So

Pr(C = 1 | W = 0) = 0.55/0.70 ≈ 0.79

Conditional expectation

The mean of the conditional distribution of Y given X is called the conditional expectation (or conditional mean) of Y given X. It is the expected value of Y, given that X takes a particular value. It is computed just like a regular (i.e. unconditional) expectation, but uses the conditional distribution instead of the marginal. If Y takes one of k possible values y_1, ..., y_k, then

E(Y | X = x) = Σ_{i=1}^{k} y_i Pr(Y = y_i | X = x)

From before, suppose a long commute takes 45 minutes and a short one 30 minutes: what is the expected length of the commute conditional on weather (fair or foul)?

Foul: 30*0.15/0.70 + 45*0.55/0.70 ≈ 41.79 minutes
Fair: 30*0.25/0.30 + 45*0.05/0.30 = 32.50 minutes
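These calculations are mechanical enough to verify numerically. Below is a minimal sketch (my own illustration, not part of the lecture) that encodes the joint distribution above as a Python dictionary; the helper names marginal_w, cond_c_given_w, and cond_mean_commute are hypothetical, chosen just for this example.

```python
# A minimal sketch of the joint distribution of commuting time C and
# weather W, stored as a dictionary mapping (c, w) outcomes to probabilities.
joint = {
    (0, 0): 0.15,  # short commute, foul weather
    (0, 1): 0.25,  # short commute, fair weather
    (1, 0): 0.55,  # long commute, foul weather
    (1, 1): 0.05,  # long commute, fair weather
}
minutes = {0: 30, 1: 45}  # commute length in minutes for C = 0, 1

# Marginal of W: add up the joint probabilities over all values of C.
def marginal_w(w):
    return sum(p for (c, w_), p in joint.items() if w_ == w)

# Conditional distribution of C given W = w: joint divided by marginal.
def cond_c_given_w(c, w):
    return joint[(c, w)] / marginal_w(w)

# Conditional expectation of commuting time given W = w.
def cond_mean_commute(w):
    return sum(minutes[c] * cond_c_given_w(c, w) for c in (0, 1))

print(marginal_w(0))         # 0.70
print(cond_c_given_w(1, 0))  # 0.7857... ≈ 0.79
print(cond_mean_commute(0))  # ≈ 41.79 minutes in foul weather
print(cond_mean_commute(1))  # 32.5 minutes in fair weather
```

Note that cond_c_given_w simply divides the joint probability by the marginal, mirroring the formula for the conditional distribution above.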
The law of iterated expectations

There is a simple relationship between conditional and unconditional expectations called the law of iterated expectations. Intuitively, an unconditional expectation is just a weighted average of conditional expectations, where the weights are the probabilities of the outcomes on which we are conditioning. E.g., the mean commuting time is just a weighted average of the mean commuting time in foul weather and the mean commuting time in fair weather, weighted by the probabilities of foul and fair weather.

For a RV Y and a discrete RV X that takes one of m possible values, the law of iterated expectations is

E(Y) = Σ_{i=1}^{m} E(Y | X = x_i) Pr(X = x_i)

or, more compactly, E(Y) = E[E(Y | X)]. In the example:

E(commuting time) = E(commuting time | foul weather)*Pr(foul weather) + E(commuting time | fair weather)*Pr(fair weather)
E(commuting time) = 41.79*0.70 + 32.50*0.30 = 39 minutes

Conditional variance

Just like before, we call the variance of the conditional distribution the conditional variance. It tells us how dispersed the distribution of a RV is, conditional on another RV taking a specific value. Again, it is calculated just like the unconditional variance, only we replace the unconditional mean with the conditional mean and the marginal distribution with the conditional distribution. So, if Y takes one of k possible values, then

Var(Y | X = x) = Σ_{i=1}^{k} [y_i – E(Y | X = x)]² Pr(Y = y_i | X = x)

Example: the conditional variance of commuting time in foul weather.

Var(commuting time | foul weather) = (45 – 41.79)²*0.55/0.70 + (30 – 41.79)²*0.15/0.70
Var(commuting time | foul weather) = 37.88 min²

Independence

Quite often, we are interested in quantifying the relationship between two RVs (in fact, linear regression methods do exactly this). When two RVs are completely unrelated, we say they are independently distributed (or simply independent). This implies that knowing the value of one RV (X) provides no information about the distribution of the other (Y).

X and Y are independent if the conditional distribution of Y given X equals the marginal distribution of Y, or Pr(Y = y | X = x) = Pr(Y = y). Equivalently, X and Y are independent if the joint distribution of X and Y equals the product of their marginal distributions, or Pr(Y = y, X = x) = Pr(Y = y)*Pr(X = x). The two statements are equivalent by the definition of the conditional distribution: if Pr(Y = y | X = x) = Pr(X = x, Y = y)/Pr(X = x) = Pr(Y = y), then multiplying through by Pr(X = x) gives Pr(X = x, Y = y) = Pr(X = x)*Pr(Y = y).

Covariance

A very common measure of association between two RVs is their covariance, a measure of the extent to which two RVs move together. In general,

Cov(X, Y) = σ_XY = E[(X – μ_X)(Y – μ_Y)]

In the discrete case, if X takes one of m values and Y takes one of k values, then

Cov(X, Y) = Σ_{i=1}^{k} Σ_{j=1}^{m} (x_j – μ_X)(y_i – μ_Y) Pr(X = x_j, Y = y_i)

Interpretation:
1.) If X and Y have positive covariance (σ_XY > 0), then X > μ_X when Y > μ_Y, and X < μ_X when Y < μ_Y on average, meaning X and Y tend to move in the same direction.
2.) Conversely, if σ_XY < 0, then X > μ_X when Y < μ_Y, and X < μ_X when Y > μ_Y on average, meaning X and Y tend to move in opposite directions.

An important caveat

3.) If X and Y have zero covariance (σ_XY = 0), this does not mean that X and Y are independent. But the opposite is true: if X and Y are independent, then σ_XY = 0; this implies independence is a stronger property than zero covariance. In fact, covariance is only a measure of linear association, so two RVs can be strongly (but non-linearly) related and still have zero covariance.

Covariance and correlation

Unfortunately, covariance is measured in units of X times units of Y, making interpretation difficult. A unit-less measure of association instead is the correlation between X and Y:

Corr(X, Y) = Cov(X, Y)/√(Var(X) Var(Y)) = σ_XY/(σ_X σ_Y)

Fun facts:
1.) Corr(X, Y) lies between –1 and 1.
2.) If Cov(X, Y) = 0, then Corr(X, Y) = 0 as well; X and Y are then said to be uncorrelated.

Example: weather and commuting time once more.

                         Foul Weather (W = 0)   Fair Weather (W = 1)   Total
30 min Commute (C = 0)           0.15                   0.25           0.40
45 min Commute (C = 1)           0.55                   0.05           0.60
Total                            0.70                   0.30           1.00

E(weather) = 0*0.70 + 1*0.30 = 0.30 (recalling that 1 = fair weather)
E(commuting time) = 39 minutes
Var(weather) = (0 – 0.3)²*0.70 + (1 – 0.3)²*0.30 = (0.09)*0.70 + (0.49)*0.30 = 0.21
Var(commuting time) = 54 (check this at home!!!)
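Both the law of iterated expectations and the "check this at home" variance can be confirmed in a few lines. This is a sketch under the same dictionary encoding as the earlier snippet (the names pr_w, pr_t, and cond_mean are mine, not the lecture's):

```python
# A self-contained check of the law of iterated expectations and of the
# unconditional variance of commuting time.
joint = {(0, 0): 0.15, (0, 1): 0.25, (1, 0): 0.55, (1, 1): 0.05}  # (c, w) -> prob
minutes = {0: 30, 1: 45}  # commute length in minutes for C = 0, 1

# Marginal of weather W and marginal of commuting time in minutes.
pr_w = {w: sum(p for (c, w_), p in joint.items() if w_ == w) for w in (0, 1)}
pr_t = {minutes[c]: sum(p for (c_, w), p in joint.items() if c_ == c) for c in (0, 1)}

# Conditional mean of commuting time given W = w.
def cond_mean(w):
    return sum(minutes[c] * joint[(c, w)] / pr_w[w] for c in (0, 1))

# Law of iterated expectations: E(T) = sum over w of E(T | W = w) * Pr(W = w).
lie = sum(cond_mean(w) * pr_w[w] for w in (0, 1))
direct = sum(t * p for t, p in pr_t.items())
print(lie, direct)  # both 39.0

# Unconditional variance of commuting time: E[(T - mu)^2].
mu = direct
print(sum((t - mu) ** 2 * p for t, p in pr_t.items()))  # 54.0
```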
Cov(weather, commuting time) = (0 – 0.3)(30 – 39)*0.15 + (0 – 0.3)(45 – 39)*0.55 + (1 – 0.3)(30 – 39)*0.25 + (1 – 0.3)(45 – 39)*0.05
Cov(weather, commuting time) = 0.405 – 0.99 – 1.575 + 0.21
Cov(weather, commuting time) = –1.95

That is, when the weather is fair, commuting time is shorter, but the magnitude is hard to interpret. But

Corr(weather, commuting time) = –1.95/√(0.21*54) ≈ –0.579

This is easier to interpret: –0.579 is a quite "large" negative number on a scale from –1 to 1.

Finally, some useful properties. If X, Y, and V are RVs and a, b, and c are constants:

E(a + bX + cY) = a + bμ_X + cμ_Y
Var(aX + bY) = a²σ_X² + b²σ_Y² + 2ab σ_XY
E(Y²) = σ_Y² + μ_Y²
Cov(a + bX + cV, Y) = b σ_XY + c σ_VY
E(XY) = σ_XY + μ_X μ_Y
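To close the loop, a last self-contained sketch (again my own, using the same hypothetical dictionary encoding) reproduces the covariance of –1.95 and the correlation of –0.579 directly from the joint distribution, and checks one of the properties above:

```python
# Covariance and correlation of weather and commuting time from the joint
# distribution, plus a check of E(XY) = sigma_XY + mu_X * mu_Y.
import math

joint = {(0, 0): 0.15, (0, 1): 0.25, (1, 0): 0.55, (1, 1): 0.05}  # (c, w) -> prob
minutes = {0: 30, 1: 45}  # commute length for C = 0, 1; weather: 0 = foul, 1 = fair

# Means of weather W and commuting time T under the joint distribution.
mu_w = sum(w * p for (c, w), p in joint.items())           # 0.30
mu_t = sum(minutes[c] * p for (c, w), p in joint.items())  # 39.0

# Covariance: sum of (w - mu_W)(t - mu_T) * Pr over the four outcomes.
cov = sum((w - mu_w) * (minutes[c] - mu_t) * p for (c, w), p in joint.items())

# Variances, then the unit-less correlation.
var_w = sum((w - mu_w) ** 2 * p for (c, w), p in joint.items())
var_t = sum((minutes[c] - mu_t) ** 2 * p for (c, w), p in joint.items())
corr = cov / math.sqrt(var_w * var_t)

print(cov)   # -1.95
print(corr)  # -0.579...

# One of the "useful properties" above: E(WT) = sigma_WT + mu_W * mu_T.
e_wt = sum(w * minutes[c] * p for (c, w), p in joint.items())
print(e_wt, cov + mu_w * mu_t)  # both 9.75
```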