Covariance and Correlation
Professor Richard A. Levine
San Diego State University
Outline
Terminology
Independence
Correlation
Two Things
Definitions
Relationship between two variables; joint distributions
• µX = E(X), µY = E(Y)
• σX² = VAR(X) = E{(X − µX)²}; σY² = VAR(Y) = E{(Y − µY)²}
• Covariance: σXY = COV(X, Y) = E{(X − µX) · (Y − µY)}
• Correlation: ρ = COV(X, Y) / (σX σY)
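As an aside not on the original slides, here is a minimal numpy sketch that checks these definitions on simulated data; the normal distributions, the 0.6 coefficient, and the sample size are arbitrary illustrative choices:

    import numpy as np

    rng = np.random.default_rng(0)
    # Simulate correlated pairs (X, Y) to illustrate the definitions.
    x = rng.normal(size=100_000)
    y = 0.6 * x + rng.normal(size=100_000)

    mu_x, mu_y = x.mean(), y.mean()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))   # COV(X, Y) from the definition
    rho = cov_xy / (x.std() * y.std())          # correlation from the definition

    print(cov_xy, np.cov(x, y, ddof=0)[0, 1])   # matches numpy's covariance
    print(rho, np.corrcoef(x, y)[0, 1])         # matches numpy's correlation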
Covariance
The sign of COV(X, Y) provides information on the X, Y relationship (illustrated in the simulation below):
• Large values of X tend to be observed with large values of Y: COV(X, Y) positive
• If X > µX, then Y > µY is likely to be true, so the product of deviations will be positive
• If X < µX, then Y < µY is likely to be true, so the product of deviations will be positive too
• If large values of X tend to be observed with small values of Y: COV(X, Y) negative
• If small values of X tend to be observed with large values of Y: COV(X, Y) negative
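A quick simulation (not from the slides; the noise and coefficients are arbitrary choices) showing both signs:

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(size=50_000)

    # Large X with large Y  ->  positive covariance
    y_pos = x + rng.normal(size=x.size)
    # Large X with small Y  ->  negative covariance
    y_neg = -x + rng.normal(size=x.size)

    print(np.cov(x, y_pos)[0, 1])   # ≈ +1
    print(np.cov(x, y_neg)[0, 1])   # ≈ −1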
Consequences
COV(X, Y) = E{(X − µX) · (Y − µY)}
          = E(XY − µX Y − µY X + µX µY)
          = E(XY) − µX µY
=⇒ E(XY) = ρ σX σY + µX µY
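A numerical check of E(XY) = ρσXσY + µXµY on simulated data (an illustrative sketch, not part of the slides; all distribution parameters are arbitrary):

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(loc=2.0, scale=1.5, size=200_000)
    y = 0.5 * x + rng.normal(loc=-1.0, size=x.size)

    rho = np.corrcoef(x, y)[0, 1]
    lhs = np.mean(x * y)                                 # E(XY)
    rhs = rho * x.std() * y.std() + x.mean() * y.mean()  # ρσXσY + µXµY
    print(lhs, rhs)                                      # the two sides agree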
X and Y independent (each bullet is checked numerically in the sketch below)
• E(XY) = E(X) · E(Y)
• COV(X, Y) = E(XY) − µX µY = 0
• ρ = 0, no linear relationship
• VAR(X + Y) = VAR(X) + VAR(Y)
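A sketch checking these properties for independently drawn samples (illustrative only; the normal and exponential choices are arbitrary):

    import numpy as np

    rng = np.random.default_rng(3)
    # X and Y drawn independently of each other
    x = rng.normal(loc=1.0, size=500_000)
    y = rng.exponential(scale=2.0, size=x.size)

    print(np.mean(x * y), x.mean() * y.mean())   # E(XY) ≈ E(X)·E(Y)
    print(np.cov(x, y)[0, 1])                    # COV(X, Y) ≈ 0
    print(np.var(x + y), x.var() + y.var())      # VAR(X+Y) ≈ VAR(X)+VAR(Y)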
Variance of a sum, VAR(X + Y)
VAR(X + Y) = E{(X + Y)²} − {E(X + Y)}²
           = E{(X + Y) · (X + Y)} − E(X + Y) · E(X + Y)
           = E(X² + 2XY + Y²) − {E(X)}² − {E(Y)}² − 2E(X)E(Y)
           = VAR(X) + VAR(Y) + 2COV(X, Y)
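The identity holds exactly for sample moments too, as this sketch (not from the slides; the data are arbitrary correlated draws) confirms:

    import numpy as np

    rng = np.random.default_rng(4)
    x = rng.normal(size=300_000)
    y = 0.8 * x + rng.normal(size=x.size)   # deliberately correlated with X

    lhs = np.var(x + y)
    rhs = np.var(x) + np.var(y) + 2 * np.cov(x, y, ddof=0)[0, 1]
    print(lhs, rhs)                         # equal up to floating-point error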
|ρ| ≤ 1
Why? Consider the quadratic
h(b) = E[{(X − µX)b + (Y − µY)}²]
     = b² E{(X − µX)²} + 2b E{(X − µX)(Y − µY)} + E{(Y − µY)²}
     = b² σX² + 2b COV(X, Y) + σY² ≥ 0, for every b
Since h(b) ≥ 0 for every b, the quadratic has at most one real root, so its discriminant (the b² − 4ac of the quadratic formula) must be non-positive:
=⇒ {2COV(X, Y)}² − 4σX² σY² ≤ 0
=⇒ −σX σY ≤ COV(X, Y) ≤ σX σY
=⇒ −1 ≤ ρ ≤ 1
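An empirical version of the argument (an illustrative sketch; the data and grid are arbitrary choices): the sample analogue of h(b) stays non-negative over a grid of b values, and the resulting ρ satisfies |ρ| ≤ 1.

    import numpy as np

    rng = np.random.default_rng(5)
    x = rng.normal(size=100_000)
    y = -0.7 * x + rng.normal(size=x.size)

    # Sample analogue of h(b) = E[{(X − µX)b + (Y − µY)}²]
    def h(b):
        return np.mean(((x - x.mean()) * b + (y - y.mean())) ** 2)

    bs = np.linspace(-3, 3, 601)
    print(min(h(b) for b in bs) >= 0)    # True: h is non-negative
    rho = np.corrcoef(x, y)[0, 1]
    print(abs(rho) <= 1)                 # True, as the discriminant argument guarantees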
ρ = ±1 iff P(Y = cX + d) = 1
Why? |ρ| = 1 iff the discriminant equals 0, i.e., h(b) has exactly one real root. At that root,
h(b) = 0 iff
P[{(X − µX)b + (Y − µY)}² = 0] = 1
iff
P{(X − µX)b + (Y − µY) = 0} = 1
iff
P(Y = cX + d) = 1
where c = −b and d = µX b + µY, with b being the root of h(b).
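A quick check (illustrative, not from the slides) that an exact linear relation forces ρ = ±1:

    import numpy as np

    x = np.linspace(0, 10, 1_000)
    y = 3.0 - 2.0 * x                  # exact linear relation with negative slope

    print(np.corrcoef(x, y)[0, 1])     # -1.0, up to floating-point error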
Regression line
Least squares: find the line (slope b) that minimizes the expected squared vertical deviation h(b).
Regression line
For the line y = µY + b(x − µX), the expected squared vertical deviation is the earlier quadratic with b replaced by −b:
h(b) = E[{(Y − µY) − b(X − µX)}²] = b² σX² − 2b ρ σX σY + σY²
Set
h′(b) = 2b σX² − 2ρ σX σY = 0 =⇒ b = ρσY / σX
and h′′(b) = 2σX² > 0, so this is a minimum.
So the least squares regression line (best fit according to h(b)) is
y = µY + (ρσY / σX)(x − µX)
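A sketch (not from the slides; the data are arbitrary) showing that the moment formula for the slope matches an off-the-shelf least squares fit:

    import numpy as np

    rng = np.random.default_rng(6)
    x = rng.normal(loc=5.0, scale=2.0, size=100_000)
    y = 1.0 + 0.4 * x + rng.normal(size=x.size)

    rho = np.corrcoef(x, y)[0, 1]
    slope = rho * y.std() / x.std()           # b = ρσY/σX
    intercept = y.mean() - slope * x.mean()   # line passes through (µX, µY)

    print(slope, intercept)
    print(np.polyfit(x, y, deg=1))            # least squares agrees: [slope, intercept]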
Regression line
y = µY + (ρσY / σX)(x − µX)
• If ρ > 0, the slope is positive
• If ρ < 0, the slope is negative
• If ρ = 0, h(ρσY/σX) = σY², a constant
At the minimizing slope, h(ρσY/σX) = σY²(1 − ρ²). So if ρ is close to +1 or −1, h(ρσY/σX) is relatively small: vertical distances of points from the line tend to be small, since h is the expected value of the square of those distances!
In all, ρ measures the amount of linearity in the distribution of points.
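The identity h(ρσY/σX) = σY²(1 − ρ²) can be verified on simulated data (an illustrative sketch; the strongly linear setup is an arbitrary choice):

    import numpy as np

    rng = np.random.default_rng(7)
    x = rng.normal(size=200_000)
    y = 0.9 * x + 0.3 * rng.normal(size=x.size)   # strongly linear: ρ near 1

    rho = np.corrcoef(x, y)[0, 1]
    b = rho * y.std() / x.std()
    h_min = np.mean(((y - y.mean()) - b * (x - x.mean())) ** 2)

    print(rho)                                # close to 1
    print(h_min, y.var() * (1 - rho ** 2))    # h at the minimizer = σY²(1 − ρ²)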
Uncorrelated ⇏ Independent
Independent random variables: f(x, y) = fX(x) · fY(y)
Let Y have density symmetric about zero and X = SY. Here S is independent of Y and takes on values +1 and −1 with probability 1/2 each. This means fX(x) = (1/2)fY(x) + (1/2)fY(−x).
E(S) = 1 · P(S = 1) + (−1) · P(S = −1) = 0.5 − 0.5 = 0
COV(X, Y) = COV(SY, Y)
          = E(SY · Y) − E(SY) · E(Y)
          = E(S) · E(Y²) − E(S) · {E(Y)}²   (since S is independent of Y)
          = 0
but X = SY, so |X| = |Y|: X and Y are not independent.
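The construction is easy to simulate (a sketch, not from the slides; the standard normal for Y is an arbitrary symmetric choice):

    import numpy as np

    rng = np.random.default_rng(8)
    n = 500_000
    y = rng.normal(size=n)                  # density symmetric about zero
    s = rng.choice([-1, 1], size=n)         # independent random sign
    x = s * y

    print(np.cov(x, y)[0, 1])               # ≈ 0: uncorrelated
    # Yet X and Y are clearly dependent: |X| = |Y| always.
    print(np.all(np.abs(x) == np.abs(y)))   # True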
Covariance inequality
ρ = COV(X, Y)/(σX σY) and |ρ| ≤ 1, so
{COV(X, Y)}² ≤ VAR(X) · VAR(Y)
(for those who have taken more analysis, this is a version of the Cauchy-Schwarz inequality)
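The inequality also holds exactly for sample moments, as a quick sketch (illustrative data only) shows:

    import numpy as np

    rng = np.random.default_rng(9)
    for _ in range(5):
        x = rng.normal(size=10_000)
        y = rng.uniform(-1, 1, size=x.size) + 0.5 * x
        cov = np.cov(x, y, ddof=0)[0, 1]
        print(cov ** 2 <= np.var(x) * np.var(y))   # always True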