CIS 2033
Based on Dekking et al., A Modern Introduction to Probability and Statistics, 2007
Instructor: Longin Jan Latecki
Chapter 10: Covariance and Correlation

As an example, take g(x, y) = xy for discrete random variables X and Y with the joint probability distribution given in the table. The expectation of XY is computed by weighting each product ab with its probability and summing over all entries of the table:
E[XY] = Σ_a Σ_b ab · P(X = a, Y = b).

With the rule above (linearity of expectation) we can compute the expectation of a random variable X with a Bin(n, p) distribution. X can be viewed as a sum X = X1 + X2 + ... + Xn of n Ber(p) random variables, so E[X] = E[X1] + ... + E[Xn] = np.

Proof that E[X + Y] = E[X] + E[Y] (discrete case):
E[X + Y] = Σ_a Σ_b (a + b) P(X = a, Y = b) = Σ_a a Σ_b P(X = a, Y = b) + Σ_b b Σ_a P(X = a, Y = b) = Σ_a a P(X = a) + Σ_b b P(Y = b) = E[X] + E[Y].

Var(X + Y) is generally not equal to Var(X) + Var(Y). Expanding the square gives
Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y),
where Cov(X, Y) = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X]E[Y] is the covariance of X and Y.

If Cov(X, Y) > 0, then X and Y are positively correlated. If Cov(X, Y) < 0, then X and Y are negatively correlated. If Cov(X, Y) = 0, then X and Y are uncorrelated.

Gustavo Orellana

Now let X and Y be two independent random variables. Independence gives E[XY] = E[X]E[Y], so Cov(X, Y) = E[XY] − E[X]E[Y] = 0. Hence X and Y are uncorrelated. We have proved that if X and Y are two independent random variables, then they are uncorrelated. In general, however, E[XY] is NOT equal to E[X]E[Y].

INDEPENDENT VERSUS UNCORRELATED. If two random variables X and Y are independent, then X and Y are uncorrelated. The converse is not true, as the following example shows.

In that example, Cov(X, Y) = E[XY] − E[X]E[Y] = 0, so X and Y are uncorrelated, but they are dependent.

The variance of a random variable X with a Bin(n, p) distribution: X is a sum of n independent Ber(p) random variables X1, ..., Xn, each with variance p(1 − p). Independent variables are uncorrelated, so all covariance terms vanish and
Var(X) = Var(X1) + ... + Var(Xn) = np(1 − p).

The covariance changes under a change of units: Cov(rX + s, tY + u) = rt Cov(X, Y). Therefore the covariance Cov(X, Y) may not always be suitable to express the dependence between X and Y. For this reason, there is a standardized version of the covariance, called the correlation coefficient of X and Y:
ρ(X, Y) = Cov(X, Y) / √(Var(X) Var(Y)), defined when Var(X) > 0 and Var(Y) > 0.
It remains unaffected by a change of units and is therefore dimensionless. It always satisfies −1 ≤ ρ(X, Y) ≤ 1.

The correlation coefficient is also called the Pearson correlation coefficient.

Examples of scatter diagrams with different values of the correlation coefficient (from Wikipedia).

Several sets of (x, y) points, with the correlation coefficient of x and y for each set (from Wikipedia).
Note that the correlation reflects the noisiness and direction of a linear relationship (top row), but not the slope of that relationship (middle row), nor many aspects of nonlinear relationships (bottom row). N.B.: the figure in the center has a slope of 0, but in that case the correlation coefficient is undefined because the variance of Y is zero.
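To make the definitions concrete, here is a small sketch in Python with exact fractions. It computes E[XY], Cov(X, Y) = E[XY] − E[X]E[Y] and ρ(X, Y) directly from a joint probability table. The two tables and all helper names are hypothetical illustrations, not the table from the slides.

In the first table, X is uniform on {−1, 0, 1} and Y = X². These variables are uncorrelated but dependent. The second table shows that rescaling X (for example, meters to centimeters) multiplies the covariance by 100 but leaves the correlation coefficient unchanged.

```python
from fractions import Fraction
from math import sqrt

# A joint pmf is a dict mapping (a, b) -> P(X = a, Y = b).

def expect(pmf, g):
    """E[g(X, Y)]: sum of g(a, b) * P(X = a, Y = b) over the table."""
    return sum(g(a, b) * p for (a, b), p in pmf.items())

def covariance(pmf):
    """Cov(X, Y) = E[XY] - E[X]E[Y]."""
    ex = expect(pmf, lambda a, b: a)
    ey = expect(pmf, lambda a, b: b)
    return expect(pmf, lambda a, b: a * b) - ex * ey

def variances(pmf):
    """Return (Var(X), Var(Y)) computed as E[(Z - E[Z])^2]."""
    ex = expect(pmf, lambda a, b: a)
    ey = expect(pmf, lambda a, b: b)
    return (expect(pmf, lambda a, b: (a - ex) ** 2),
            expect(pmf, lambda a, b: (b - ey) ** 2))

def correlation(pmf):
    """rho(X, Y) = Cov(X, Y) / sqrt(Var(X) Var(Y)); undefined if a variance is 0."""
    var_x, var_y = variances(pmf)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation coefficient undefined: zero variance")
    return float(covariance(pmf)) / sqrt(var_x * var_y)

def independent(pmf):
    """True iff P(X = a, Y = b) = P(X = a) P(Y = b) for every pair (a, b)."""
    px, py = {}, {}
    for (a, b), p in pmf.items():
        px[a] = px.get(a, 0) + p
        py[b] = py.get(b, 0) + p
    return all(pmf.get((a, b), 0) == px[a] * py[b] for a in px for b in py)

def rescale(pmf, r, s, t, u):
    """Joint pmf of (rX + s, tY + u), e.g. a change of units."""
    out = {}
    for (a, b), p in pmf.items():
        key = (r * a + s, t * b + u)
        out[key] = out.get(key, 0) + p
    return out

# Hypothetical table 1: X uniform on {-1, 0, 1} and Y = X**2.
y_is_x_squared = {(-1, 1): Fraction(1, 3), (0, 0): Fraction(1, 3), (1, 1): Fraction(1, 3)}

# Hypothetical table 2: X and Y tend to take the same value.
positively_related = {
    (0, 0): Fraction(2, 5), (1, 1): Fraction(2, 5),
    (0, 1): Fraction(1, 10), (1, 0): Fraction(1, 10),
}

# Change of units for X only: meters -> centimeters (r = 100).
in_centimeters = rescale(positively_related, 100, 0, 1, 0)

print(covariance(y_is_x_squared), independent(y_is_x_squared))      # 0 False
print(covariance(positively_related), covariance(in_centimeters))   # 3/20 15
print(correlation(positively_related), correlation(in_centimeters))
```

Exact `Fraction` arithmetic keeps the covariance comparisons exact. Only the final correlation is converted to a float, because it involves a square root.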
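The binomial results in these slides are E[X] = np and Var(X) = np(1 − p) for X with a Bin(n, p) distribution. They can be checked numerically with the sketch below, assuming Python with exact fractions. It builds the pmf of X1 + ... + Xn for independent Ber(p) variables by repeated discrete convolution, then compares its mean and variance with np and np(1 − p). The helper names and the values n = 10, p = 3/10 are illustrative choices, not taken from the slides.

```python
from fractions import Fraction
from math import comb

def bernoulli_pmf(p):
    """pmf of a Ber(p) random variable as a dict value -> probability."""
    return {0: 1 - p, 1: p}

def pmf_of_sum(pmf_x, pmf_y):
    """pmf of X + Y for independent X and Y (discrete convolution)."""
    out = {}
    for a, pa in pmf_x.items():
        for b, pb in pmf_y.items():
            out[a + b] = out.get(a + b, 0) + pa * pb
    return out

def binomial_via_bernoullis(n, p):
    """pmf of X1 + ... + Xn with X1, ..., Xn independent Ber(p)."""
    pmf = {0: Fraction(1)}
    for _ in range(n):
        pmf = pmf_of_sum(pmf, bernoulli_pmf(p))
    return pmf

def mean(pmf):
    """E[X] = sum of k * P(X = k)."""
    return sum(k * q for k, q in pmf.items())

def variance(pmf):
    """Var(X) = E[(X - E[X])^2]."""
    m = mean(pmf)
    return sum((k - m) ** 2 * q for k, q in pmf.items())

n, p = 10, Fraction(3, 10)
pmf = binomial_via_bernoullis(n, p)

# The convolution reproduces the Bin(n, p) formula C(n, k) p^k (1 - p)^(n - k).
assert all(pmf[k] == comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1))

print(mean(pmf), n * p)                  # 3 3
print(variance(pmf), n * p * (1 - p))    # 21/10 21/10
```

Building the distribution from Bernoulli pieces mirrors the argument in the slides: the sum representation alone yields both np and np(1 − p), without using the binomial formula directly.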