Lecture 3:
Review of statistics – two random variables
BUEC 333
Professor David Jacks
Two random variables
The most interesting questions in economics
generally involve two (or more) variables, e.g. the
relationship between stock prices and earnings.
We can describe the probabilistic relationship
between two (or more) RVs using three kinds of
probability distributions:
1.) the joint distribution
2.) marginal distributions
3.) conditional distributions
The joint distribution
The joint distribution of discrete RVs X and Y is
the probability that the two RVs simultaneously
take on certain values, x and y; it is denoted
Pr(X = x, Y = y).
Example: the relationship between weather and
commuting time.
Let C denote commuting time, which is either long
(C = 1) or short (C = 0), and let W denote weather,
which is either foul (W = 0) or fair (W = 1).
Thus, there are four possible outcomes:
(C = 0, W = 0); (C = 0, W = 1);
(C = 1, W = 0); (C = 1, W = 1).
The probabilities of each of these outcomes define
the joint distribution of C and W.
                         Foul Weather (W = 0)   Fair Weather (W = 1)
Short Commute (C = 0)           0.15                   0.25
Long Commute (C = 1)            0.55                   0.05
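As an aside (not part of the original slides), the joint distribution is easy to work with numerically. Here is a minimal Python/NumPy sketch, with the array layout chosen purely for illustration:

```python
import numpy as np

# Joint distribution of commuting time C (rows: C = 0 short, C = 1 long)
# and weather W (columns: W = 0 foul, W = 1 fair).
joint = np.array([[0.15, 0.25],
                  [0.55, 0.05]])

assert np.isclose(joint.sum(), 1.0)  # the four joint probabilities sum to 1
print(joint[1, 0])                   # Pr(C = 1, W = 0) = 0.55
```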
Marginal distributions
When X and Y have a joint distribution, the marginal
distribution is the probability distribution of X or Y considered alone.
We can compute the marginal distribution of Y
from the joint distribution of X, Y by adding up the
probabilities of all possible outcomes where Y
takes a particular value (y).
That is, if X takes one of k possible values x_1, ..., x_k:

Pr(Y = y) = Σ_{i=1}^{k} Pr(X = x_i, Y = y)
Example: weather and commuting time.
The marginal distribution of weather is the bottom (Total) row of the table below,
and the marginal distribution of commuting time is the rightmost (Total) column.

                         Foul Weather (W = 0)   Fair Weather (W = 1)   Total
Short Commute (C = 0)           0.15                   0.25            0.40
Long Commute (C = 1)            0.55                   0.05            0.60
Total                           0.70                   0.30            1.00
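Continuing the illustrative NumPy sketch from above (again, not from the slides), the marginal distributions are just the row and column sums of the joint table:

```python
import numpy as np

joint = np.array([[0.15, 0.25],   # rows: C = 0, 1
                  [0.55, 0.05]])  # columns: W = 0, 1

p_C = joint.sum(axis=1)  # marginal of commuting time: [0.40, 0.60]
p_W = joint.sum(axis=0)  # marginal of weather:        [0.70, 0.30]
print(p_C, p_W)
```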
Conditional distributions
The distribution of a RV Y given that another RV
X takes a specific value is called the conditional
distribution of Y given X.
The conditional probability that Y takes value y
when X takes value x is written as
Pr(Y = y | X = x).
In general,

Pr(Y = y | X = x) = Pr(X = x, Y = y) / Pr(X = x).
Example: what is the probability of a long
commute (C = 1), given that the weather is foul
(W = 0) or in other words, Pr(C = 1 | W = 0)?
                         Foul Weather (W = 0)   Fair Weather (W = 1)   Total
Short Commute (C = 0)           0.15                   0.25            0.40
Long Commute (C = 1)            0.55                   0.05            0.60
Total                           0.70                   0.30            1.00
The joint probability Pr(C = 1, W = 0) is 0.55, whereas the
marginal probability of foul weather is 0.70, so
Pr(C = 1 | W = 0) = 0.55 / 0.70 ≈ 0.79.
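The same calculation in the illustrative NumPy sketch (not lecture code): dividing each column of the joint table by the corresponding marginal probability of W gives the conditional distribution of C given W.

```python
import numpy as np

joint = np.array([[0.15, 0.25],   # rows: C = 0, 1
                  [0.55, 0.05]])  # columns: W = 0, 1

p_W = joint.sum(axis=0)       # marginal of weather: [0.70, 0.30]
cond_C_given_W = joint / p_W  # column w holds Pr(C = c | W = w)
print(cond_C_given_W[1, 0])   # Pr(C = 1 | W = 0) = 0.55 / 0.70 ≈ 0.786
```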
Conditional expectation
The mean of the conditional distribution of Y
given X is called the conditional expectation (or
mean) of Y given X.
It is the expected value of Y, given that X takes a
particular value.
It is computed just like a regular (i.e.
unconditional) expectation, but uses the
conditional distribution instead of the marginal.
If Y takes one of k possible values y1, ... , yk, then
E[Y | X = x] = Σ_{i=1}^{k} y_i Pr(Y = y_i | X = x)
From before, suppose a long commute takes 45
minutes and a short one 30 minutes:
what is the expected length of the commute
conditional on weather (fair or foul)?
Foul: 30 * 0.15/0.7 + 45 * 0.55/0.7 = 41.79 minutes
Fair: 30 * 0.25/0.3 + 45 * 0.05/0.3 = 32.50 minutes
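A sketch of the same computation in NumPy (illustrative only; the 30- and 45-minute values are those of the example above):

```python
import numpy as np

joint = np.array([[0.15, 0.25],
                  [0.55, 0.05]])
minutes = np.array([30, 45])      # commute length for C = 0 and C = 1

cond = joint / joint.sum(axis=0)  # Pr(C | W), one column per weather state
E_C_given_W = minutes @ cond      # [E(C | W = 0), E(C | W = 1)]
print(E_C_given_W)                # ≈ [41.79, 32.50] minutes
```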
The law of iterated expectations
There is a simple relationship between conditional
and unconditional expectations called the law of
iterated expectations.
Intuitively, an unconditional expectation is just a
weighted average of conditional expectations
where the weights are the probabilities of outcomes
on which we are conditioning.
E.g., the mean commuting time is just a weighted
average of the mean time in foul weather and the mean
time in fair weather, with weights equal to the
probabilities of foul and fair weather.
For a RV Y and a discrete RV X that takes one of m
possible values, the law of iterated expectations is
E[Y] = Σ_{i=1}^{m} E[Y | X = x_i] Pr(X = x_i)

that is, E[Y] = E[E(Y | X)].
E(commuting time) =
E(commuting time | foul weather)*Pr(foul weather)
+
E(commuting time | fair weather)*Pr(fair weather)
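Numerically, the law of iterated expectations can be checked with the same illustrative NumPy sketch (not from the slides): the probability-weighted average of the two conditional means equals the unconditional mean.

```python
import numpy as np

joint = np.array([[0.15, 0.25],
                  [0.55, 0.05]])
minutes = np.array([30, 45])

p_W = joint.sum(axis=0)                # Pr(W = 0), Pr(W = 1)
E_C_given_W = minutes @ (joint / p_W)  # conditional means: [41.79, 32.50]

via_lie = E_C_given_W @ p_W            # weight the conditional means by Pr(W)
direct = minutes @ joint.sum(axis=1)   # unconditional mean from the marginal of C
print(via_lie, direct)                 # both equal 39.0 minutes
```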
Conditional variance
Just like before, we call the variance of the
conditional distribution the conditional variance.
It tells us how dispersed the distribution of a RV is,
conditional on another RV taking a specific value.
Again, it is calculated just like the unconditional
variance, only we replace the unconditional mean with the
conditional mean and weight by the conditional distribution.
So, if Y takes one of k possible values, then
Var(Y | X = x) = Σ_{i=1}^{k} (y_i – E[Y | X = x])² Pr(Y = y_i | X = x)
Example: the conditional variance of commuting in
foul weather.
Var(commuting time | foul weather)
= (45 – 41.79)² * 0.55/0.7 + (30 – 41.79)² * 0.15/0.7
= 37.88 min²
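The same conditional variance in the illustrative NumPy sketch (not lecture code):

```python
import numpy as np

joint = np.array([[0.15, 0.25],
                  [0.55, 0.05]])
minutes = np.array([30, 45])

p_C_given_foul = joint[:, 0] / joint[:, 0].sum()          # Pr(C | W = 0)
mean_foul = minutes @ p_C_given_foul                      # ≈ 41.79 minutes
var_foul = ((minutes - mean_foul) ** 2) @ p_C_given_foul  # ≈ 37.88 min²
print(mean_foul, var_foul)
```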
Independence
Quite often, we are interested in quantifying the
relationship between two RVs (in fact, linear
regression methods do exactly this).
When two RVs are completely unrelated, we say
they are independently distributed (or simply
independent).
This implies that knowing the value of one RV (X)
provides no information about the distribution of the other (Y).
X and Y are independent if the conditional
distribution of Y given X equals the marginal
distribution of Y, or Pr(Y = y | X = x) = Pr(Y = y).
Equivalently, X and Y are independent if the joint
distribution of X and Y equals the product of their
marginal distributions, or
Pr(Y = y, X = x) = Pr(Y = y) * Pr(X = x).
From the second definition above and the definition of the conditional
distribution: if X and Y are independent, then
Pr(Y = y | X = x) = Pr(X = x, Y = y) / Pr(X = x) = Pr(X = x) Pr(Y = y) / Pr(X = x) = Pr(Y = y),
so the two definitions are equivalent.
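One way to check independence numerically (an illustration, not from the slides) is to compare the joint table with the outer product of its marginals:

```python
import numpy as np

joint = np.array([[0.15, 0.25],
                  [0.55, 0.05]])

p_C = joint.sum(axis=1)
p_W = joint.sum(axis=0)
product = np.outer(p_C, p_W)        # Pr(C = c) * Pr(W = w) for every cell
print(np.allclose(joint, product))  # False: commuting time and weather are not independent
```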
Covariance
A very common measure of association between
two RVs is their covariance, a measure of the
extent to which two RVs move together.
In general, Cov(X,Y) = σXY = E[(X – μX)(Y – μY)].
In the discrete case, if X takes one of m values and
Y takes one of k values, then
Cov(X, Y) = Σ_{i=1}^{k} Σ_{j=1}^{m} (x_j – μX)(y_i – μY) Pr(X = x_j, Y = y_i)
Interpretation:
1.) If X and Y have positive covariance (σXY > 0)
then X > μX when Y > μY, and X < μX when Y < μY
on average, meaning X and Y tend to move in the
same direction.
2.) Conversely, if σXY < 0 then X > μX when
Y < μY, and X < μX when Y > μY on average,
meaning X and Y tend to move in opposite directions.
An important caveat
3.) If X and Y have zero covariance (σXY = 0), this
does not mean that X and Y are independent.
But the opposite is true: if X and Y are independent,
then σXY = 0; this implies independence is a
stronger property than zero covariance.
In fact, covariance is only a measure of linear
association: two RVs can be strongly but non-linearly
related and still have zero covariance.
Covariance and correlation
Unfortunately, covariance is measured in units of X
times units of Y, making interpretation difficult.
An alternative, unit-less measure of association is the
correlation between X and Y:
Cov X , Y 
 XY
Corr  X , Y    XY 

Var  X Var Y   X  Y
Fun facts:
1.) Corr(X,Y) lies between –1 and 1.
2.) If Cov(X,Y) = 0, then Corr(X,Y) = 0, and vice versa.
                          Foul Weather (W = 0)   Fair Weather (W = 1)   Total
30 min Commute (C = 0)           0.15                   0.25            0.40
45 min Commute (C = 1)           0.55                   0.05            0.60
Total                            0.70                   0.30            1.00
E(weather) = 0*0.7 + 1*0.3 = 0.3 (1 = fair weather)
E(commuting time) = 30*0.4 + 45*0.6 = 39 minutes
Var(weather) = (0 – 0.3)²*0.7 + (1 – 0.3)²*0.3
= (0.09)*0.7 + (0.49)*0.3 = 0.21
Var(commuting time) = 54 (check this at home!!!)
Cov(weather, commuting time) =
(0 – 0.3)(30 – 39)*0.15 +
(0 – 0.3)(45 – 39)*0.55 +
(1 – 0.3)(30 – 39)*0.25 +
(1 – 0.3)(45 – 39)*0.05
Cov(weather, commuting time) = 0.405 – 0.99 – 1.575 + 0.21
Cov(weather, commuting time) = –1.95
That is, when weather is fair, commuting time is
shorter, but the magnitude is hard to interpret.
But Corr(weather, commuting time) = –1.95 / √(0.21 * 54) ≈ –0.579
This is easier to interpret: –0.579 is a quite “large”
negative number on a scale from –1 to 1.
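The full correlation computation in the illustrative NumPy sketch (not lecture code), reproducing the numbers above:

```python
import numpy as np

joint = np.array([[0.15, 0.25],
                  [0.55, 0.05]])
c_vals = np.array([30, 45])  # commuting time in minutes
w_vals = np.array([0, 1])    # weather indicator (1 = fair)

mu_C = c_vals @ joint.sum(axis=1)                   # 39
mu_W = w_vals @ joint.sum(axis=0)                   # 0.3
var_C = ((c_vals - mu_C) ** 2) @ joint.sum(axis=1)  # 54
var_W = ((w_vals - mu_W) ** 2) @ joint.sum(axis=0)  # 0.21
cov = (c_vals - mu_C) @ joint @ (w_vals - mu_W)     # -1.95
print(cov / np.sqrt(var_C * var_W))                 # ≈ -0.579
```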
If X, Y, and V are RVs and a, b, and c are constants:
E a  bX  cY   a  b X  cY
Var aX  bY   a   b   2ab XY
2
 
2
X
2
2
Y
E Y 2   Y2  Y2
Cova  bX  cV , Y   b XY  c VY
E  XY    XY   X Y
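These rules can be checked by simulation. A minimal sketch (my own illustration, assuming two normally distributed RVs constructed to be correlated) verifying the variance rule:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 2.0, -3.0

X = rng.normal(size=100_000)
Y = 0.5 * X + rng.normal(size=100_000)  # correlated with X by construction

lhs = np.var(a * X + b * Y)
rhs = (a**2 * np.var(X) + b**2 * np.var(Y)
       + 2 * a * b * np.cov(X, Y, ddof=0)[0, 1])
print(lhs, rhs)  # approximately equal (up to sampling noise)
```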