Notes 21 - Wharton Statistics Department

Statistics 510: Notes 21
Reading: Sections 7.4-7.5
Schedule: I will e-mail homework 9 by Friday. It will not
be due in two weeks (that would be the Friday after
Thanksgiving).
I. Covariance, Variance of Sums and Correlations (Chapter
7.4)
The covariance between two random variables is a measure
of how they are related.
The covariance between X and Y, denoted by Cov(X, Y), is
defined by
Cov(X, Y) = E[(X - E[X])(Y - E[Y])].
Interpretation of Covariance: When Cov(X, Y) > 0, higher
than expected values of X tend to occur together with
higher than expected values of Y.
When Cov(X, Y) < 0, higher than expected values of X
tend to occur together with lower than expected values of
Y.
Example 1: From the Excel file stockdata.xls, we find that
the covariances of the monthly log stock returns, in
percentages, of Merck & Company, Johnson & Johnson,
General Electric, General Motors and Ford Motor
Company from January 1990 to December 1999 are

Pair           Covariance
Merck, J&J     36.17205
Merck, GE      26.85792
Merck, GM      15.92461
Merck, Ford    16.19535
J&J, GE        48.45063
J&J, GM        37.28136
J&J, Ford      40.37623
GE, GM         27.72861
GE, Ford       26.63151
GM, Ford       84.76252
Correlation: The magnitude of the covariance depends on
the variance of X and the variance of Y. A dimensionless
measure of the relationship between X and Y is the
correlation ρ(X, Y):
ρ(X, Y) = Cov(X, Y) / √(Var(X) Var(Y)).
The correlation is always between -1 and 1. If X and Y are
independent, then ρ(X, Y) = 0, but the converse is not true.
Generally, the correlation is a measure of the degree of
linear dependence between X and Y.
Note that for a > 0, b > 0,
ρ(aX, bY) = Cov(aX, bY) / √(Var(aX) Var(bY))
= ab Cov(X, Y) / (a√Var(X) · b√Var(Y))
= ρ(X, Y)
(this is what is meant by saying that the correlation is
dimensionless – if X and Y are measured in certain units,
and the units are changed so that X becomes aX and Y
becomes bY, the correlation is not changed).
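As a quick numerical illustration of this scale invariance, here is a minimal sketch (assuming Python with numpy; the simulated returns are arbitrary, not the stockdata.xls series) that estimates a correlation before and after a change of units:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(size=1000)  # y constructed to be positively related to x

def corr(u, v):
    # sample version of Cov(u, v) / sqrt(Var(u) Var(v))
    return np.cov(u, v)[0, 1] / np.sqrt(np.var(u, ddof=1) * np.var(v, ddof=1))

print(corr(x, y))           # a value strictly between -1 and 1
print(corr(3 * x, 10 * y))  # identical value: correlation is dimensionless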
Example 1 continued: From the Excel file stockdata.xls, we
find that the correlations of the monthly log stock returns,
in percentages, of Merck & Company, Johnson & Johnson,
General Electric, General Motors and Ford Motor
Company from January 1990 to December 1999 are

Pair           Correlation
Merck, J&J     0.419128
Merck, GE      0.296728
Merck, GM      0.182804
Merck, Ford    0.182194
J&J, GE        0.449641
J&J, GM        0.359493
J&J, Ford      0.381549
GE, GM         0.254941
GE, Ford       0.239957
GM, Ford       0.793547
Properties of covariance:
By expanding the right side of the definition of the
covariance, we see that
Cov(X, Y) = E[XY - E[X]Y - XE[Y] + E[X]E[Y]]
= E[XY] - E[X]E[Y] - E[X]E[Y] + E[X]E[Y]
= E[XY] - E[X]E[Y]
Note that if X and Y are independent, then
E[XY] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} xy f(x, y) dx dy
= ∫_{-∞}^{∞} ∫_{-∞}^{∞} xy f_X(x) f_Y(y) dx dy
= ∫_{-∞}^{∞} x f_X(x) ( ∫_{-∞}^{∞} y f_Y(y) dy ) dx
= ∫_{-∞}^{∞} x f_X(x) E[Y] dx
= E[X] E[Y]
Thus, if X and Y are independent,
Cov(X, Y) = E[XY] - E[X]E[Y] = 0.    (1.1)
The converse of (1.1) is not true. Consider the sample
space S = {(-2, 4), (-1, 1), (0, 0), (1, 1), (2, 4)} with each point
having equal probability. Define the random variable X to
be the first component of the sample point chosen, and Y
the second. Therefore, X(-2, 4) = -2, Y(-2, 4) = 4, and so
on. X and Y are dependent, yet their covariance is 0. The
former is true because
1/5 = P(X = 1, Y = 1) ≠ P(X = 1) · P(Y = 1) = (1/5)(2/5) = 2/25.
To verify the latter, note that
E(XY) = (1/5)[(-8) + (-1) + 0 + 1 + 8] = 0,
E(X) = (1/5)[(-2) + (-1) + 0 + 1 + 2] = 0,
and E(Y) = (1/5)[4 + 1 + 0 + 1 + 4] = 2.
Thus,
Cov(X, Y) = E(XY) - E(X)E(Y) = 0 - 0 · 2 = 0.
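This counterexample is easy to check numerically; a minimal sketch (again assuming numpy):

import numpy as np

# the five equally likely sample points (x, y); note that y = x**2 throughout
pts = np.array([(-2, 4), (-1, 1), (0, 0), (1, 1), (2, 4)])
x, y = pts[:, 0], pts[:, 1]

# exact moments under the uniform distribution on the five points
e_xy = np.mean(x * y)             # E(XY) = 0
cov = e_xy - x.mean() * y.mean()  # E(XY) - E(X)E(Y)
print(cov)  # 0.0, even though Y is completely determined by X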
Proposition 4.2 lists some properties of covariance.
Proposition 4.2:
(i) Cov(X, Y) = Cov(Y, X)
(ii) Cov(X, X) = Var(X)
(iii) Cov(aX, Y) = a Cov(X, Y)
(iv) Cov( Σ_{i=1}^n X_i , Σ_{j=1}^m Y_j ) = Σ_{i=1}^n Σ_{j=1}^m Cov(X_i, Y_j).
Combining properties (ii) and (iv), we have that
Var( Σ_{i=1}^n X_i ) = Σ_{i=1}^n Var(X_i) + 2 Σ_{i<j} Cov(X_i, X_j)    (1.2)
If X_1, ..., X_n are pairwise independent (meaning that X_i and
X_j are independent for i ≠ j), then Equation (1.2) reduces
to
Var( Σ_{i=1}^n X_i ) = Σ_{i=1}^n Var(X_i).
Example 2: Let X be a hypergeometric
(n, m, N - m) random variable – i.e., X is the number of
white balls drawn in n random draws from an urn without
replacement that originally consists of m white balls and
N - m black balls. Find the variance of X.
Let I_i = 1 if the ith ball selected is white, and I_i = 0 if the
ith ball selected is black.
Then X = I_1 + ... + I_n and
Var(X) = Var(I_1 + ... + I_n)
= Σ_{i=1}^n Var(I_i) + 2 Σ_{i<j} Cov(I_i, I_j).
To calculate E(I_i) and Var(I_i), note that I_i² = I_i, so that
E(I_i) = m/N, Var(I_i) = E(I_i²) - [E(I_i)]² = m/N - m²/N².
To calculate Cov(I_i, I_j), we use the formula
Cov(I_i, I_j) = E(I_i I_j) - E(I_i) E(I_j). Note that I_i I_j = 1 if
both the ith and jth balls are white, and 0 otherwise. Thus,
E(I_i I_j) = P(ith and jth balls are white). By considering the
sequence of experiments in the order look at the ith ball,
look at the jth ball, then look at the 1st ball, look at the 2nd
ball, ..., look at the (i-1)th ball, look at the (i+1)th ball, ...,
look at the (j-1)th ball, look at the (j+1)th ball, ..., look at
the nth ball, we see that
P(ith and jth balls are white)
= [m(m-1)(N-2)(N-3)···(N-n+1)] / [N(N-1)(N-2)···(N-n+1)]
= m(m-1) / (N(N-1)).
Thus,
E(I_i I_j) = m(m-1) / (N(N-1))
and
Cov(I_i, I_j) = E(I_i I_j) - E(I_i) E(I_j)
= m(m-1)/(N(N-1)) - (m/N)(m/N)
= (m/N)[(m-1)/(N-1) - m/N]
= (m/N) · (m-N)/((N-1)N)
and
Var(X) = Var(I_1 + ... + I_n)
= Σ_{i=1}^n Var(I_i) + 2 Σ_{i<j} Cov(I_i, I_j)
= n[m/N - m²/N²] + n(n-1) · (m/N) · (m-N)/((N-1)N)
= n (m/N)(1 - m/N)[1 - (n-1)/(N-1)].
Note that the variance of a binomial random variable with n
trials and probability p = m/N of success for each trial is
n (m/N)(1 - m/N), so the variance for the hypergeometric is
smaller by a factor of [1 - (n-1)/(N-1)]; this is due to the
negative covariance between I_i and I_j for the
hypergeometric.
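A simulation sketch (assuming numpy; the urn parameters are arbitrary) comparing the empirical variance of hypergeometric draws with this formula and with the corresponding binomial variance:

import numpy as np

rng = np.random.default_rng(2)
N, m, n = 50, 20, 10  # urn size, white balls, number of draws

# number of white balls in n draws without replacement, many replications
draws = rng.hypergeometric(m, N - m, n, size=200_000)

p = m / N
hyper_var = n * p * (1 - p) * (1 - (n - 1) / (N - 1))
binom_var = n * p * (1 - p)
print(draws.var(), hyper_var, binom_var)  # empirical ≈ formula < binomial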
II. Conditional Expectation (Chapter 7.5)
Recall that if X and Y are jointly discrete random variables,
the conditional probability mass function of X, given that
Y = y, is defined for all y such that P(Y = y) > 0 by
p_{X|Y}(x | y) = P(X = x | Y = y) = p(x, y) / p_Y(y).
It is natural to define, in this case, the conditional
expectation of X, given that Y = y, for all values of y such
that p_Y(y) > 0, by
E[X | Y = y] = Σ_x x P{X = x | Y = y} = Σ_x x p_{X|Y}(x | y).
The conditional expectation of X, given that Y = y,
represents the long run mean value of X in many
independent repetitions of experiments in which Y = y.
For continuous random variables, the conditional
expectation of X, given that Y = y, is defined by
E[X | Y = y] = ∫_{-∞}^{∞} x f_{X|Y}(x | y) dx,
provided that f_Y(y) > 0.
Example 3: Suppose that events occur according to a
Poisson process with rate λ. Let N be the number of
events occurring in the time period [0,1]. For p < 1, let X
be the number of events occurring in the time period [0, p].
Find the conditional probability mass function and the
conditional expectation of X given that N = n.
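A worked sketch of the standard solution (the original notes leave the work for these pages): for k ≤ n,

P(X = k | N = n) = P(X = k, N - X = n - k) / P(N = n),

and since the counts in [0, p] and (p, 1] are independent Poisson random variables with means λp and λ(1 - p),

P(X = k | N = n) = [e^{-λp}(λp)^k / k!] · [e^{-λ(1-p)}(λ(1-p))^{n-k} / (n-k)!] / [e^{-λ} λ^n / n!]
= (n choose k) p^k (1 - p)^{n-k}.

That is, given N = n, X is binomial(n, p), so E[X | N = n] = np; notably the answer does not depend on λ.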
III. Computing Expectations and Probabilities by
Conditioning (Section 7.5.2-7.5.3)
Let us denote by E[X | Y] that function of the random
variable Y whose value at Y = y is E[X | Y = y]. Note that
E[X | Y] is itself a random variable. An important property
of conditional expectations is the following proposition:
Proposition 7.5.1:
E[X] = E[E[X | Y]]    (1.3)
If Y is a discrete random variable, then equation (1.3) states
that
E[X] = Σ_y E[X | Y = y] P{Y = y}    (1.4)
If Y is a continuous random variable, then equation (1.3)
states that
E[X] = ∫_{-∞}^{∞} E[X | Y = y] f_Y(y) dy.
One way to understand equation (1.4) is to interpret it as
follows: To calculate E[X], we may take a weighted
average of the conditional expected value of X, given that
Y = y, each of the terms E[X | Y = y] being weighted by
the probability of the event on which it is conditioned.
Equation (1.4) is a "law of total expectation" that is
analogous to the law of total probability (Section 3.3, notes
6).
Example 4: A miner is trapped in a mine containing 3
doors. The first door leads to a tunnel that will take him to
safety after 3 hours of travel. The second door leads to a
tunnel that will return him to the mine after 5 hours of
travel. The third door leads to a tunnel that will return him
to the mine after 7 hours. If we assume that the miner is at
all times equally likely to choose any one of the doors,
what is the expected length of time until he reaches safety?
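A worked sketch of the classic conditioning argument (the solution is left blank in the original notes): let X be the time to reach safety and let Y be the door chosen first. Doors 2 and 3 return the miner to his starting point, where the problem begins afresh, so by Equation (1.4),

E[X] = (1/3)(3) + (1/3)(5 + E[X]) + (1/3)(7 + E[X]) = 5 + (2/3)E[X].

Solving gives E[X] = 15 hours.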
Example 5: A random rectangle is formed in the following
way: The base, X, is a uniform [0,1] random variable and,
after having generated the base, the height is chosen to be
uniform on [0, X]. Find the expected area of the rectangle.
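A worked sketch by conditioning on the base (the solution is left blank in the original notes): the area is A = XY, where Y | X = x is uniform on [0, x], so E[A | X = x] = x E[Y | X = x] = x · x/2 = x²/2. By Proposition 7.5.1,

E[A] = E[E[A | X]] = E[X²/2] = (1/2) ∫_0^1 x² dx = 1/6.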
IV. Conditional Variance (Section 7.5.4)
The conditional variance of X, given that Y = y, is the
expected squared difference between the random variable X
and its conditional mean, conditioning on the event that
Y = y:
Var(X | Y = y) = E[(X - E[X | Y = y])² | Y = y].
There is a very useful formula for the variance of a random
variable X in terms of the conditional mean and conditional
variance of X | Y :
Proposition 7.5.2:
Var(X) = E[Var(X | Y)] + Var(E[X | Y]).
Proof:
By the same reasoning that yields
Var(X) = E[X²] - (E[X])², we have that
Var(X | Y) = E[X² | Y] - (E[X | Y])². Thus,
E[Var(X | Y)] = E[E[X² | Y]] - E[(E[X | Y])²]
= E[X²] - E[(E[X | Y])²]    (1.5)
Also, as E[E[X | Y]] = E[X], we have that
Var(E[X | Y]) = E[(E[X | Y])²] - (E[X])²    (1.6)
Hence, by adding Equations (1.5) and (1.6), we have that
Var(X) = E[Var(X | Y)] + Var(E[X | Y]).
Example 6: Suppose that by any time t, the number of
people that have arrived at a train depot is a Poisson
random variable with mean λt. If the initial train arrives at
the depot at a time (independent of when the passengers
arrive) that is uniformly distributed over (0, T), what is the
mean and variance of the number of passengers that enter
the train?
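A worked sketch using Propositions 7.5.1 and 7.5.2 (the solution is left blank in the original notes): let N(t) be the number of arrivals by time t and let Y be the train's arrival time, uniform on (0, T). Given Y = t, the number of passengers is Poisson with mean λt, so

E[N(Y) | Y] = λY and Var(N(Y) | Y) = λY.

Hence E[N(Y)] = E[λY] = λT/2, and

Var(N(Y)) = E[Var(N(Y) | Y)] + Var(E[N(Y) | Y]) = E[λY] + Var(λY)
= λT/2 + λ²T²/12.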