
7. Lecture D: Covariance and Correlation

Covariance and Correlation
Outline: Terminology, Independence, Correlation, Two Things
Professor Richard A. Levine
San Diego State University
Definitions
Relationship between two variables; joint distributions
• µX = E(X), µY = E(Y)
• σX² = VAR(X) = E{(X − µX)²}; σY² = VAR(Y) = E{(Y − µY)²}
• Covariance: σXY = COV(X, Y) = E{(X − µX) · (Y − µY)}
• Correlation: ρ = COV(X, Y) / (σX σY)
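These definitions translate directly into sample estimates. A minimal pure-Python sketch, assuming simulated Gaussian data with an illustrative linear link (the model and sample size are not from the lecture):

```python
import math
import random

random.seed(0)

# Simulate (X, Y) pairs with a linear link plus noise (illustrative choice).
xs = [random.gauss(0, 1) for _ in range(10_000)]
ys = [0.5 * x + random.gauss(0, 1) for x in xs]

mu_x = sum(xs) / len(xs)
mu_y = sum(ys) / len(ys)

# Sample versions of the definitions above.
cov_xy = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / len(xs)
sd_x = math.sqrt(sum((x - mu_x) ** 2 for x in xs) / len(xs))
sd_y = math.sqrt(sum((y - mu_y) ** 2 for y in ys) / len(ys))
rho = cov_xy / (sd_x * sd_y)

print(round(cov_xy, 3), round(rho, 3))
```

For this model the true covariance is 0.5·VAR(X) = 0.5 and the true correlation is 0.5/√1.25 ≈ 0.447, so the printed estimates should land near those values.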
Covariance
The sign of COV(X, Y) provides information on the X, Y relationship:
• Large values of X tend to be observed with large values of Y: COV(X, Y) positive
• If X > µX, then Y > µY is likely to be true, so the product of deviations will be positive
• If X < µX, then Y < µY is likely to be true, so the product of deviations will again be positive
• If large values of X tend to be observed with small values of Y: COV(X, Y) negative
• If small values of X tend to be observed with large values of Y: COV(X, Y) negative
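The sign behavior is easy to see in simulation. A sketch with illustrative distributions and sample size (not from the lecture):

```python
import random

random.seed(1)
n = 10_000

def sample_cov(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

xs = [random.gauss(0, 1) for _ in range(n)]
noise = [random.gauss(0, 0.5) for _ in range(n)]

ys_up = [x + e for x, e in zip(xs, noise)]     # large X goes with large Y
ys_down = [-x + e for x, e in zip(xs, noise)]  # large X goes with small Y

print(round(sample_cov(xs, ys_up), 2), round(sample_cov(xs, ys_down), 2))
```

The first covariance comes out positive (near VAR(X) = 1), the second negative.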
Consequences
COV(X, Y) = E{(X − µX) · (Y − µY)}
= E(XY − µY X − µX Y + µX µY)
= E(XY) − µY µX − µX µY + µX µY
= E(XY) − µX µY
=⇒ E(XY) = ρσX σY + µX µY
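Because the derivation is purely algebraic, the same identity holds exactly for sample moments as well. A sketch with simulated data (the parameters are arbitrary choices):

```python
import math
import random

random.seed(2)
n = 50_000

def mean(v):
    return sum(v) / len(v)

# Illustrative parameters: µX = 1, σX = 2, and a linear link to Y.
xs = [random.gauss(1.0, 2.0) for _ in range(n)]
ys = [0.3 * x + random.gauss(0.5, 1.0) for x in xs]

mu_x, mu_y = mean(xs), mean(ys)
sd_x = math.sqrt(mean([(x - mu_x) ** 2 for x in xs]))
sd_y = math.sqrt(mean([(y - mu_y) ** 2 for y in ys]))
rho = mean([(x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)]) / (sd_x * sd_y)

e_xy = mean([x * y for x, y in zip(xs, ys)])
# Consequence from the slide: E(XY) = ρσXσY + µXµY
print(abs(e_xy - (rho * sd_x * sd_y + mu_x * mu_y)) < 1e-8)
```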
X and Y independent
• E(XY) = E(X) · E(Y)
• COV(X, Y) = E(XY) − µX µY = 0
• ρ = 0: no linear relationship
• VAR(X + Y) = VAR(X) + VAR(Y)
Variance of a sum, VAR(X + Y )
VAR(X + Y) = E{(X + Y)²} − {E(X + Y)}²
= E{(X + Y) · (X + Y)} − E(X + Y) · E(X + Y)
= E(X² + 2XY + Y²) − {E(X)}² − {E(Y)}² − 2E(X)E(Y)
= VAR(X) + VAR(Y) + 2COV(X, Y)
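This decomposition is also algebraic, so it holds to rounding error on any data set. A sketch with simulated correlated pairs (illustrative parameters):

```python
import random

random.seed(3)
n = 20_000

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return mean([(x - m) ** 2 for x in v])

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return mean([(x - ma) * (y - mb) for x, y in zip(a, b)])

xs = [random.gauss(0, 1) for _ in range(n)]
ys = [0.5 * x + random.gauss(0, 1) for x in xs]  # correlated with X
sums = [x + y for x, y in zip(xs, ys)]

# The identity is exact for sample moments, up to floating-point rounding.
lhs = var(sums)
rhs = var(xs) + var(ys) + 2 * cov(xs, ys)
print(abs(lhs - rhs) < 1e-8)
```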
|ρ| ≤ 1
Why? Consider the quadratic
h(b) = E[{(X − µX)b + (Y − µY)}²]
= b²E{(X − µX)²} + 2bE{(X − µX)(Y − µY)} + E{(Y − µY)²}
= b²σX² + 2b COV(X, Y) + σY² ≥ 0, for every b
Since h(b) ≥ 0 for every b, the quadratic has at most one real root, so the discriminant b² − 4ac must be non-positive:
=⇒ {2COV(X, Y)}² − 4σX²σY² ≤ 0
=⇒ −σX σY ≤ COV(X, Y) ≤ σX σY
=⇒ −1 ≤ ρ ≤ 1
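A small sanity check that ρ never leaves [−1, 1], whatever noisy, nonlinear relationship the data happen to have (the simulated relationships below are arbitrary):

```python
import math
import random

random.seed(4)

def corr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    c = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return c / math.sqrt(vx * vy)

# Throw nonlinear, noisy relationships at ρ: it stays inside [-1, 1].
inside = True
for _ in range(100):
    xs = [random.uniform(-5, 5) for _ in range(50)]
    ys = [random.choice([-1, 1]) * x ** 2 + random.gauss(0, 3) for x in xs]
    inside = inside and -1 <= corr(xs, ys) <= 1
print(inside)
```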
ρ = ±1 iff P(Y = c + dX ) = 1
Why? |ρ| = 1 iff the discriminant equals 0, that is, iff h(b) has a single real root b, at which h(b) = 0.
h(b) = 0 iff
P[{(X − µX)b + (Y − µY)}² = 0] = 1
iff
P{(X − µX)b + (Y − µY) = 0} = 1
iff
P(Y = cX + d) = 1
where c = −b and d = µX b + µY, with b being the root of h(b).
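A sketch confirming that exactly linear data give ρ = ±1, with the sign matching the slope (the lines Y = 7 + 3X and Y = 1 − 2X are illustrative choices):

```python
import math
import random

random.seed(5)

def corr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    c = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return c / math.sqrt(vx * vy)

xs = [random.gauss(0, 1) for _ in range(1_000)]
rho_pos = corr(xs, [3 * x + 7 for x in xs])   # Y = 7 + 3X, positive slope
rho_neg = corr(xs, [-2 * x + 1 for x in xs])  # Y = 1 - 2X, negative slope

print(round(rho_pos, 6), round(rho_neg, 6))
```

Up to floating-point rounding, the two correlations are exactly +1 and −1.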
Regression line
Least squares: find the slope b of the line y = µY + b(x − µX) that minimizes the expected squared vertical distance h(b) = E[{(Y − µY) − b(X − µX)}²].
h(b) = b²σX² − 2bρσX σY + σY²
Set h′(b) = 2bσX² − 2ρσX σY = 0 =⇒ b = ρσY/σX
h″(b) = 2σX² > 0, so this critical point is a minimum.
So the least squares regression line (best fit according to h(b)) is
y = µY + (ρσY/σX)(x − µX)
Regression line
y = µY + (ρσY/σX)(x − µX)
• If ρ > 0, the slope is positive
• If ρ < 0, the slope is negative
• If ρ = 0, the slope is zero and h(ρσY/σX) = σY², a constant
At the minimizer, h(ρσY/σX) = σY²(1 − ρ²), so if ρ is close to +1 or −1, h(ρσY/σX) is relatively small.
Vertical distances of a point from the line are small since h is the expected value of the square of those distances!
In all, ρ measures the amount of linearity in the distribution of points.
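The slope formula b = ρσY/σX applies directly to sample moments. A sketch assuming a simulated line y = 1 + 0.8x plus noise (all parameters are illustrative):

```python
import math
import random

random.seed(6)
n = 50_000

def mean(v):
    return sum(v) / len(v)

# Simulated line y = 1 + 0.8x plus Gaussian noise (illustrative parameters).
xs = [random.gauss(2, 1.5) for _ in range(n)]
ys = [1.0 + 0.8 * x + random.gauss(0, 1) for x in xs]

mu_x, mu_y = mean(xs), mean(ys)
sd_x = math.sqrt(mean([(x - mu_x) ** 2 for x in xs]))
sd_y = math.sqrt(mean([(y - mu_y) ** 2 for y in ys]))
rho = mean([(x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)]) / (sd_x * sd_y)

slope = rho * sd_y / sd_x        # slope from the slide: ρσY/σX
intercept = mu_y - slope * mu_x  # the line passes through (µX, µY)
print(round(slope, 2), round(intercept, 2))
```

With this much data the recovered slope and intercept come out close to the simulated 0.8 and 1.0.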
Uncorrelated ⇏ Independent
Independent random variables: f(x, y) = fX(x) · fY(y)
Let Y have a density symmetric about zero and X = SY. Here S is independent of Y and takes on values +1 and −1 with probability 1/2 each. This means fX(x) = (1/2)fY(x) + (1/2)fY(−x).
E(S) = 1 · P(S = 1) + (−1) · P(S = −1) = 0.5 − 0.5 = 0
COV(X, Y) = COV(SY, Y)
= E(SY · Y) − E(SY) · E(Y)
= E(S) · E(Y²) − E(S) · {E(Y)}²
= 0
but X = SY means |X| = |Y|, so X and Y are not independent.
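The counterexample simulates readily. A sketch assuming standard normal Y (symmetric about zero, as required) and a fair random sign S:

```python
import random

random.seed(7)
n = 100_000

def mean(v):
    return sum(v) / len(v)

ys = [random.gauss(0, 1) for _ in range(n)]      # density symmetric about zero
ss = [random.choice([-1, 1]) for _ in range(n)]  # S independent of Y
xs = [s * y for s, y in zip(ss, ys)]

mu_x, mu_y = mean(xs), mean(ys)
cov_xy = mean([(x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)])

print(abs(cov_xy) < 0.05)                             # uncorrelated (up to noise)
print(all(abs(x) == abs(y) for x, y in zip(xs, ys)))  # yet X pins down |Y| exactly
```

The sample covariance is statistically indistinguishable from zero, while |X| = |Y| holds exactly, so the pair cannot be independent.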
Covariance inequality
Since ρ = COV(X, Y)/(σX σY) and |ρ| ≤ 1,
{COV(X, Y)}² ≤ VAR(X) · VAR(Y)
(For those who have taken more analysis, this is a version of the Cauchy–Schwarz inequality.)
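A quick numerical check of the inequality on arbitrary data (the chosen distributions are illustrative):

```python
import random

random.seed(8)

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return mean([(x - m) ** 2 for x in v])

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return mean([(x - ma) * (y - mb) for x, y in zip(a, b)])

# The inequality holds for any data set, not just these particular draws.
holds = True
for _ in range(200):
    xs = [random.uniform(0, 10) for _ in range(30)]
    ys = [random.expovariate(1.0) for _ in range(30)]
    holds = holds and cov(xs, ys) ** 2 <= var(xs) * var(ys) + 1e-12
print(holds)
```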