Covariance and Correlation
Professor Richard A. Levine
San Diego State University

Definitions
Relationship between two variables; joint distributions
• µ_X = E(X), µ_Y = E(Y)
• σ_X² = VAR(X) = E{(X − µ_X)²}; σ_Y² = VAR(Y) = E{(Y − µ_Y)²}
• Covariance: σ_XY = COV(X, Y) = E{(X − µ_X) · (Y − µ_Y)}
• Correlation: ρ = COV(X, Y) / (σ_X σ_Y)

Covariance
The sign of COV(X, Y) provides information on the X, Y relationship:
• Large values of X tend to be observed with large values of Y: COV(X, Y) positive
  • If X > µ_X, then Y > µ_Y is likely to be true, so the product of deviations will be positive
  • If X < µ_X, then Y < µ_Y is likely to be true, so the product of deviations will again be positive
• Large values of X tend to be observed with small values of Y: COV(X, Y) negative
• Small values of X tend to be observed with large values of Y: COV(X, Y) negative

Consequences
COV(X, Y) = E{(X − µ_X) · (Y − µ_Y)}
          = E(XY − µ_X Y − µ_Y X + µ_X µ_Y)
          = E(XY) − µ_X µ_Y
⇒ E(XY) = ρ σ_X σ_Y + µ_X µ_Y

X and Y independent
• E(XY) = E(X) · E(Y)
• COV(X, Y) = E(XY) − µ_X µ_Y = 0
• ρ = 0: no (linear) relationship
• VAR(X + Y) = VAR(X) + VAR(Y)

Variance of a sum, VAR(X + Y)
VAR(X + Y) = E{(X + Y)²} − {E(X + Y)}²
           = E{(X + Y) · (X + Y)} − E(X + Y) · E(X + Y)
           = E(X² + 2XY + Y²) − {E(X)}² − {E(Y)}² − 2E(X)E(Y)
           = VAR(X) + VAR(Y) + 2 COV(X, Y)

|ρ| ≤ 1
Why? Consider the quadratic
h(b) = E[{(Y − µ_Y) − b(X − µ_X)}²]
     = b² E{(X − µ_X)²} − 2b E{(X − µ_X)(Y − µ_Y)} + E{(Y − µ_Y)²}
     = b² σ_X² − 2b COV(X, Y) + σ_Y²
     ≥ 0, for every b
Since h(b) ≥ 0 for every b, the quadratic has at most one real root, so the discriminant b² − 4ac must be non-positive:
⇒ {2 COV(X, Y)}² − 4 σ_X² σ_Y² ≤ 0
⇒ −σ_X σ_Y ≤ COV(X, Y) ≤ σ_X σ_Y
⇒ −1 ≤ ρ ≤ 1

ρ = ±1 iff P(Y = c + dX) = 1
Why? |ρ| = 1 iff the discriminant equals 0, i.e., h(b) has a single (double) root b*.
h(b*) = 0 iff P[{(Y − µ_Y) − b*(X − µ_X)}² = 0] = 1
       iff P{(Y − µ_Y) − b*(X − µ_X) = 0} = 1
       iff P(Y = c + dX) = 1,
where d = b* and c = µ_Y − b* µ_X, with b* the root of h(b).
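Before turning to the regression line, a quick numerical sanity check of the identities so far: a minimal sketch assuming only NumPy, where the means, standard deviations, and ρ are illustrative choices rather than values from the slides.

    # Monte Carlo check of the covariance/correlation identities above.
    # All parameter values are illustrative choices.
    import numpy as np

    rng = np.random.default_rng(0)
    mu_x, mu_y = 1.0, -2.0
    sigma_x, sigma_y, rho = 2.0, 0.5, 0.7
    cov_xy = rho * sigma_x * sigma_y

    # Draw (X, Y) pairs from a bivariate normal with these moments.
    xy = rng.multivariate_normal(
        [mu_x, mu_y],
        [[sigma_x**2, cov_xy], [cov_xy, sigma_y**2]],
        size=1_000_000,
    )
    x, y = xy[:, 0], xy[:, 1]

    # COV(X, Y) = E{(X - mu_X)(Y - mu_Y)}; rho = COV(X, Y)/(sigma_X sigma_Y)
    print(np.mean((x - mu_x) * (y - mu_y)), cov_xy)
    print(np.corrcoef(x, y)[0, 1], rho)

    # E(XY) = rho sigma_X sigma_Y + mu_X mu_Y
    print(np.mean(x * y), cov_xy + mu_x * mu_y)

    # VAR(X + Y) = VAR(X) + VAR(Y) + 2 COV(X, Y)
    print(np.var(x + y), sigma_x**2 + sigma_y**2 + 2 * cov_xy)

With a million draws, each printed pair should agree to roughly two or three decimal places.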
Regression line
Least squares: find the line (slope b) that minimizes h(b).
h(b) = b² σ_X² − 2b ρ σ_X σ_Y + σ_Y²
Set h′(b) = 2b σ_X² − 2ρ σ_X σ_Y = 0 ⇒ b = ρ σ_Y / σ_X
h″(b) = 2σ_X² > 0, so this b is a minimizer.
So the least squares regression line (the best fit according to h(b)) is
y = µ_Y + ρ (σ_Y / σ_X)(x − µ_X)

Regression line and ρ
y = µ_Y + ρ (σ_Y / σ_X)(x − µ_X)
• If ρ > 0, the slope is positive
• If ρ < 0, the slope is negative
• If ρ = 0, h(ρ σ_Y / σ_X) = σ_Y², a constant
At the minimizer, h(ρ σ_Y / σ_X) = σ_Y²(1 − ρ²), so if ρ is close to +1 or −1, h(ρ σ_Y / σ_X) is relatively small. Vertical distances of the points from the line are then small, since h is the expected value of the square of those distances! In all, ρ measures the amount of linearity in the distribution of points. (A numerical check of the slope appears after the covariance inequality below.)

Uncorrelated ⇏ Independent
Independent random variables: f(x, y) = f_X(x) · f_Y(y)
Let Y have a density symmetric about zero and let X = SY, where S is independent of Y and takes the values +1 and −1 with probability 1/2 each. This means f_X(x) = (1/2) f_Y(x) + (1/2) f_Y(−x).
E(S) = 1 · P(S = 1) + (−1) · P(S = −1) = 0.5 − 0.5 = 0
COV(X, Y) = COV(SY, Y)
          = E(SY · Y) − E(SY) · E(Y)
          = E(S) · E(Y²) − E(S) · {E(Y)}²
          = 0
But X = SY, so |X| = |Y|: knowing Y determines X up to sign, so X and Y are not independent (simulated after the covariance inequality below).

Covariance inequality
From ρ = COV(X, Y)/(σ_X σ_Y) and |ρ| ≤ 1,
{COV(X, Y)}² ≤ VAR(X) · VAR(Y)
(For those who have taken more analysis, this is a version of the Cauchy–Schwarz inequality.)
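Returning to the regression line: a short sketch, under the same assumptions as the earlier block (NumPy, illustrative data-generating values), checking that the least-squares slope COV(X, Y)/VAR(X) matches ρ̂ s_Y/s_X on simulated data.

    # Empirical check that the least-squares slope equals rho * sigma_Y / sigma_X,
    # equivalently COV(X, Y)/VAR(X). Data-generating values are illustrative.
    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(1.0, 2.0, size=500_000)
    y = -2.0 + 0.25 * (x - 1.0) + rng.normal(0.0, 0.3, size=x.size)

    slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)   # COV(X, Y)/VAR(X)
    rho_hat = np.corrcoef(x, y)[0, 1]
    print(slope, rho_hat * np.std(y, ddof=1) / np.std(x, ddof=1))  # the two agree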
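Finally, a minimal simulation of the X = SY counterexample above, again assuming NumPy: the sample correlation of X and Y is near zero, while |X| = |Y| makes the dependence plain.

    # Simulate X = S*Y with S = +/-1 independent of Y (Y symmetric about 0).
    import numpy as np

    rng = np.random.default_rng(2)
    y = rng.normal(0.0, 1.0, size=1_000_000)   # density symmetric about zero
    s = rng.choice([-1.0, 1.0], size=y.size)   # P(S = +1) = P(S = -1) = 1/2
    x = s * y

    print(np.corrcoef(x, y)[0, 1])                   # ~ 0: uncorrelated
    print(np.corrcoef(np.abs(x), np.abs(y))[0, 1])   # 1: |X| = |Y| always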