Tsinghua Math Camp 2015
Probability & Statistics
Professor Wei Zhu
July 23rd
(1) Let's have some fun: A miner is trapped!
A miner is trapped in a mine containing 3 doors.
• The 1st door leads to a tunnel that will take him to safety after 3 hours.
• The 2nd door leads to a tunnel that returns him to the mine after 5 hours.
• The 3rd door leads to a tunnel that returns him to the mine after 7 hours.
At all times, he is equally likely to choose any one of the doors.
What is E(time to reach safety)?
Theorem (Law of Total Expectation):
E(X) = E_Y\left[ E_{X|Y}(X \mid Y) \right]
Exercise: Prove this theorem for the situation when X and Y are both discrete
random variables.
Special Case: If A_1, A_2, \cdots, A_k is a partition of the whole outcome space (*that is, these events are mutually exclusive and exhaustive), then:
E(X) = \sum_{i=1}^{k} E(X \mid A_i) P(A_i)
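Applying the special case above with A_i = {the miner initially chooses door i} gives E(T) = (1/3)(3) + (1/3)(5 + E(T)) + (1/3)(7 + E(T)), which solves to E(T) = 15 hours. The following is a small Monte Carlo sketch (a supplement to the notes; Python and the function name are illustrative choices) that checks this value by simulation:

```python
# Supplemental sketch: estimate the miner's expected time to safety by
# Monte Carlo and compare it with E(T) = 15 hours from total expectation.
import random

def time_to_safety(rng: random.Random) -> float:
    """Simulate one trapped-miner episode; return total hours until safety."""
    total = 0.0
    while True:
        door = rng.randrange(3)      # all three doors equally likely
        if door == 0:                # door 1: safety after 3 hours
            return total + 3
        elif door == 1:              # door 2: back to the mine after 5 hours
            total += 5
        else:                        # door 3: back to the mine after 7 hours
            total += 7

rng = random.Random(2015)
n = 100_000
estimate = sum(time_to_safety(rng) for _ in range(n)) / n
print(f"Monte Carlo estimate of E(T): {estimate:.2f} hours (theory: 15)")
```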
(2) *Review: MGF, its second function: the m.g.f. generates the moments.
Moments:
1st (population) moment: E(X) = \int x \cdot f(x)\, dx
2nd (population) moment: E(X^2) = \int x^2 \cdot f(x)\, dx
…
kth (population) moment: E(X^k) = \int x^k \cdot f(x)\, dx
Take the kth derivative of M_X(t) with respect to t and then set t = 0; we obtain the kth moment of X as follows:
\frac{d}{dt} M_X(t) \Big|_{t=0} = E(X)
\frac{d^2}{dt^2} M_X(t) \Big|_{t=0} = E(X^2)
…
\frac{d^k}{dt^k} M_X(t) \Big|_{t=0} = E(X^k)
Note: The above general rules can be easily proven using calculus.
Exercise: Prove the above general relationships.
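These relationships can also be checked symbolically. The sketch below (a supplement; it assumes the sympy library, which the notes do not reference) differentiates the N(μ, σ²) m.g.f. and recovers the first two moments, anticipating the worked example that follows:

```python
# Supplemental sketch: recover E(X) and E(X^2) from the N(mu, sigma^2)
# m.g.f. M_X(t) = exp(mu*t + sigma^2 t^2 / 2) by differentiating at t = 0.
import sympy as sp

t, mu = sp.symbols('t mu', real=True)
sigma = sp.symbols('sigma', positive=True)
M = sp.exp(mu * t + sigma**2 * t**2 / 2)     # m.g.f. of N(mu, sigma^2)

first_moment = sp.diff(M, t).subs(t, 0)      # d/dt M_X(t) at t = 0
second_moment = sp.diff(M, t, 2).subs(t, 0)  # d^2/dt^2 M_X(t) at t = 0

print(sp.simplify(first_moment))    # mu
print(sp.simplify(second_moment))   # mu**2 + sigma**2
```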
Example (proof of a special case): When X ~ N(\mu, \sigma^2), we want to verify the above equations for k = 1 and k = 2.
\frac{d}{dt} M_X(t) = \left(e^{\mu t + \frac{1}{2}\sigma^2 t^2}\right) \cdot (\mu + \sigma^2 t) (using the chain rule)
So when t = 0:
\frac{d}{dt} M_X(t) \Big|_{t=0} = \mu = E(X)
\frac{d^2}{dt^2} M_X(t) = \frac{d}{dt}\left[\frac{d}{dt} M_X(t)\right] = \frac{d}{dt}\left[\left(e^{\mu t + \frac{1}{2}\sigma^2 t^2}\right)(\mu + \sigma^2 t)\right] (using the Product Rule)
= \left(e^{\mu t + \frac{1}{2}\sigma^2 t^2}\right)(\mu + \sigma^2 t)^2 + \left(e^{\mu t + \frac{1}{2}\sigma^2 t^2}\right)\sigma^2
\frac{d^2}{dt^2} M_X(t) \Big|_{t=0} = \mu^2 + \sigma^2 = E(X^2)
This is consistent with \sigma^2 = Var(X) = E(X^2) - \mu^2.
(3) *Review: Joint distributions and independence
Definition. The joint cdf of two random variables X and Y is defined as:
F_{X,Y}(x, y) = F(x, y) = P(X \le x, Y \le y)
Definition. The joint pdf of two discrete random variables X and Y is defined as:
f_{X,Y}(x, y) = f(x, y) = P(X = x, Y = y)
Definition. The joint pdf of two continuous random variables X and Y is defined as:
f_{X,Y}(x, y) = f(x, y) = \frac{\partial^2}{\partial x \partial y} F(x, y)
Definition. The marginal pdf of the discrete random variable X or Y can be obtained by summation of their joint pdf as follows: f_X(x) = \sum_y f(x, y); f_Y(y) = \sum_x f(x, y).
Definition. The marginal pdf of the continuous random variable X or Y can be obtained by integration of the joint pdf as follows: f_X(x) = \int_{-\infty}^{\infty} f(x, y)\, dy; f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\, dx.
Definition. The conditional pdf of a random variable X or Y is defined as:
f(x|y) = \frac{f(x, y)}{f(y)}; \quad f(y|x) = \frac{f(x, y)}{f(x)}
Definition. The joint moment generating function of two random variables X and Y
is defined as
𝑀𝑋,π‘Œ (𝑑1 , 𝑑2 ) = 𝐸(𝑒 𝑑1 𝑋+𝑑2 π‘Œ )
Note that we can obtain the marginal mgf for X or Y as follows:
M_X(t_1) = M_{X,Y}(t_1, 0) = E(e^{t_1 X + 0 \cdot Y}) = E(e^{t_1 X}); \quad M_Y(t_2) = M_{X,Y}(0, t_2) = E(e^{0 \cdot X + t_2 Y}) = E(e^{t_2 Y})
𝑓(π‘₯|𝑦) =
Theorem. Two random variables X and Y are independent ⇔ (if and only if)
𝐹𝑋,π‘Œ (π‘₯, 𝑦) = 𝐹𝑋 (π‘₯)πΉπ‘Œ (𝑦) ⇔ 𝑓𝑋,π‘Œ (π‘₯, 𝑦) = 𝑓𝑋 (π‘₯)π‘“π‘Œ (𝑦) ⇔ 𝑀𝑋,π‘Œ (𝑑1 , 𝑑2 ) = 𝑀𝑋 (𝑑1 ) π‘€π‘Œ (𝑑2 )
Definition. The covariance of two random variables X and Y is defined as
COV(X, Y) = E[(X - \mu_X)(Y - \mu_Y)].
Theorem. If two random variables X and Y are independent, then we have COV(X, Y) = 0. (*Note: However, COV(X, Y) = 0 does not necessarily mean that X and Y are independent.)
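A standard illustration of this note (a supplemental example; the particular choice X ~ N(0, 1), Y = X² is not from the notes): Cov(X, Y) = E(X³) − E(X)E(X²) = 0, yet Y is completely determined by X. A short numeric sketch:

```python
# Supplemental sketch: zero covariance does not imply independence.
# X ~ N(0, 1) and Y = X^2 have Cov(X, Y) = 0 but are clearly dependent.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
y = x**2

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
print(f"sample Cov(X, Y) ~ {cov_xy:.4f}")                  # close to 0
print(f"P(Y > 1 | X > 1) = {np.mean(y[x > 1] > 1):.4f}")   # equals 1
print(f"P(Y > 1)         = {np.mean(y > 1):.4f}")          # about 0.317
```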
(4) *Definitions: population correlation & sample correlation
Definition: The population (Pearson Product Moment) correlation coefficient ρ is
defined as:
π‘π‘œπ‘£(𝑋, π‘Œ)
𝜌=
√π‘£π‘Žπ‘Ÿ(𝑋) ∗ π‘£π‘Žπ‘Ÿ(π‘Œ)
Definition: Let (X1 , Y1), …, (Xn , Yn) be a random sample from a given bivariate
population, then the sample (Pearson Product Moment) correlation coefficient r is
defined as:
π‘Ÿ=
∑(𝑋𝑖 − 𝑋̅)(π‘Œπ‘– − π‘ŒΜ…)
√[∑(𝑋𝑖 − 𝑋̅)2 ][∑(π‘Œπ‘– − π‘ŒΜ…)2 ]
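As a supplement (not part of the original notes), here is a minimal sketch that computes r exactly as defined above and checks it against numpy's built-in np.corrcoef; the sample data are illustrative:

```python
# Supplemental sketch: sample Pearson correlation, straight from the formula.
import numpy as np

def pearson_r(x: np.ndarray, y: np.ndarray) -> float:
    """Sample correlation coefficient r, following the definition above."""
    dx, dy = x - x.mean(), y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)    # linear relationship plus noise

print(pearson_r(x, y))
print(np.corrcoef(x, y)[0, 1])        # should agree to machine precision
```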
Left: Karl Pearson FRS (/ˈpɪərsən/; originally named Carl; 27 March 1857 – 27 April 1936) was an influential English mathematician and biometrician.
Right: Sir Francis Galton, FRS (/ˈfrɑːnsɪs ˈɡɔːltən/; 16 February 1822 – 17 January 1911) was a British scientist and statistician.
(5) *Definition: Bivariate Normal Random Variable
(𝑋, π‘Œ)~𝐡𝑁(πœ‡π‘‹ , πœŽπ‘‹2 ; πœ‡π‘Œ , πœŽπ‘Œ2 ; 𝜌) where 𝜌 is the correlation between 𝑋 & π‘Œ
The joint p.d.f. of (𝑋, π‘Œ) is
1
1
π‘₯ − πœ‡π‘‹ 2
π‘₯ − πœ‡π‘‹ 𝑦 − πœ‡π‘¦
𝑓𝑋,π‘Œ (π‘₯, 𝑦) =
exp {−
[(
) − 2𝜌 (
)(
)
2
2(1 − 𝜌 )
πœŽπ‘‹
πœŽπ‘‹
πœŽπ‘Œ
2πœ‹πœŽπ‘‹ πœŽπ‘Œ √1 − 𝜌2
𝑦 − πœ‡π‘Œ 2
+(
) ]}
πœŽπ‘Œ
Exercise: Please derive the mgf of the bivariate normal distribution.
Q5. Let X and Y be random variables with joint pdf
f_{X,Y}(x, y) = \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \left(\frac{x-\mu_X}{\sigma_X}\right)^2 - 2\rho \left(\frac{x-\mu_X}{\sigma_X}\right)\left(\frac{y-\mu_Y}{\sigma_Y}\right) + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2 \right] \right\}
where -\infty < x < \infty, -\infty < y < \infty. Then X and Y are said to have the bivariate normal distribution. The joint moment generating function for X and Y is
M(t_1, t_2) = \exp\left[ t_1 \mu_X + t_2 \mu_Y + \frac{1}{2}\left( t_1^2 \sigma_X^2 + 2\rho t_1 t_2 \sigma_X \sigma_Y + t_2^2 \sigma_Y^2 \right) \right].
(a) Find the marginal pdf’s of X and Y;
(b) Prove that X and Y are independent if and only if ρ = 0.
(Here ρ is indeed the population correlation coefficient between X and Y.)
(c) Find the distribution of (X + Y).
(d) Find the conditional pdfs f(x|y) and f(y|x).
Solution:
(a)
The moment generating function of X is
M_X(t) = M(t, 0) = \exp\left[ \mu_X t + \frac{1}{2}\sigma_X^2 t^2 \right].
Similarly, the moment generating function of Y is
M_Y(t) = M(0, t) = \exp\left[ \mu_Y t + \frac{1}{2}\sigma_Y^2 t^2 \right].
Thus, X and Y are both marginally normally distributed, i.e.,
X ~ N(\mu_X, \sigma_X^2), and Y ~ N(\mu_Y, \sigma_Y^2).
The pdf of X is
f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_X} \exp\left[ -\frac{(x - \mu_X)^2}{2\sigma_X^2} \right].
The pdf of Y is
f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma_Y} \exp\left[ -\frac{(y - \mu_Y)^2}{2\sigma_Y^2} \right].
(b)
If \rho = 0, then
M(t_1, t_2) = \exp\left[ \mu_X t_1 + \mu_Y t_2 + \frac{1}{2}(\sigma_X^2 t_1^2 + \sigma_Y^2 t_2^2) \right] = M(t_1, 0) \cdot M(0, t_2)
Therefore, X and Y are independent.
Conversely, if X and Y are independent, then
M(t_1, t_2) = M(t_1, 0) \cdot M(0, t_2) = \exp\left[ \mu_X t_1 + \mu_Y t_2 + \frac{1}{2}(\sigma_X^2 t_1^2 + \sigma_Y^2 t_2^2) \right]
On the other hand, the joint mgf is
M(t_1, t_2) = \exp\left[ \mu_X t_1 + \mu_Y t_2 + \frac{1}{2}(\sigma_X^2 t_1^2 + 2\rho\sigma_X\sigma_Y t_1 t_2 + \sigma_Y^2 t_2^2) \right]
Since the two expressions must agree for all t_1 and t_2, the cross term \rho\sigma_X\sigma_Y t_1 t_2 must vanish. Therefore, \rho = 0.
(c)
𝑀𝑋+π‘Œ (𝑑) = 𝐸[𝑒 𝑑(𝑋+π‘Œ) ] = 𝐸[𝑒 𝑑𝑋+π‘‘π‘Œ ]
Recall that 𝑀(𝑑1 , 𝑑2 ) = 𝐸[𝑒 𝑑1 𝑋+𝑑2 π‘Œ ], therefore we can obtain 𝑀𝑋+π‘Œ (𝑑)by 𝑑1 = 𝑑2 =
𝑑 in 𝑀(𝑑1 , 𝑑2 )
That is,
5
1
𝑀𝑋+π‘Œ (𝑑) = 𝑀(𝑑, 𝑑) = exp [πœ‡π‘‹ 𝑑 + πœ‡π‘Œ 𝑑 + (πœŽπ‘‹2 𝑑 2 + 2πœŒπœŽπ‘‹ πœŽπ‘Œ 𝑑 2 + πœŽπ‘Œ2 𝑑 2 )]
2
1 2
= exp [(πœ‡π‘‹ + πœ‡π‘Œ )𝑑 + (πœŽπ‘‹ + 2πœŒπœŽπ‘‹ πœŽπ‘Œ + πœŽπ‘Œ2 )𝑑 2 ]
2
∴ 𝑋 + π‘Œ ~𝑁(πœ‡ = πœ‡π‘‹ + πœ‡π‘Œ , 𝜎 2 = πœŽπ‘‹2 + 2πœŒπœŽπ‘‹ πœŽπ‘Œ + πœŽπ‘Œ2 )
(d)
The conditional distribution of X given Y = y is given by
f(x|y) = \frac{f(x, y)}{f(y)} = \frac{1}{\sqrt{2\pi}\,\sigma_X\sqrt{1-\rho^2}} \exp\left\{ -\frac{\left(x - \mu_X - \rho\frac{\sigma_X}{\sigma_Y}(y - \mu_Y)\right)^2}{2(1-\rho^2)\sigma_X^2} \right\}
Similarly, the conditional distribution of Y given X = x is
f(y|x) = \frac{f(x, y)}{f(x)} = \frac{1}{\sqrt{2\pi}\,\sigma_Y\sqrt{1-\rho^2}} \exp\left\{ -\frac{\left(y - \mu_Y - \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X)\right)^2}{2(1-\rho^2)\sigma_Y^2} \right\}
Therefore:
X|Y = y ~ N\left( \mu_X + \rho\frac{\sigma_X}{\sigma_Y}(y - \mu_Y),\ (1-\rho^2)\sigma_X^2 \right)
Y|X = x ~ N\left( \mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X),\ (1-\rho^2)\sigma_Y^2 \right)
Exercise:
1. Linear transformation: Let X ~ N(\mu, \sigma^2) and Y = aX + b, where a and b are constants. What is the distribution of Y?
2. The Z-Score Distribution: Let X ~ N(\mu, \sigma^2), and Z = \frac{X - \mu}{\sigma} (so that Z = aX + b with a = \frac{1}{\sigma}, b = -\frac{\mu}{\sigma}). What is the distribution of Z?
3. Distribution of the Sample Mean: If X_1, X_2, \ldots, X_n \overset{i.i.d.}{\sim} N(\mu, \sigma^2), prove that \bar{X} ~ N\left(\mu, \frac{\sigma^2}{n}\right).
4. Some musing over the weekend: We know from the bivariate normal distribution that if X and Y follow a joint bivariate normal distribution, then each variable (X or Y) is univariate normal, and furthermore, their sum (X + Y) also follows a univariate normal distribution. Now my question is: do you think the sum of any two (univariate) normal random variables (*even those that do not have a joint bivariate normal distribution) would always follow a univariate normal distribution? Prove your claim if your answer is yes, and provide at least one counterexample if your answer is no.
Exercise -- Solutions:
1. Linear transformation: Let X ~ N(\mu, \sigma^2) and Y = aX + b, where a and b are constants. What is the distribution of Y?
Solution:
M_Y(t) = E(e^{tY}) = E[e^{t(aX+b)}] = E(e^{atX + bt}) = E(e^{atX} \cdot e^{bt}) = e^{bt} \cdot E(e^{atX}) = e^{bt} \cdot M_X(at) = e^{bt} \cdot e^{a\mu t + \frac{a^2 \sigma^2 t^2}{2}} = \exp\left[ (a\mu + b)t + \frac{a^2 \sigma^2 t^2}{2} \right]
Thus,
Y ~ N(a\mu + b, a^2 \sigma^2)
2. Distribution of the Z-score:
Let X ~ N(\mu, \sigma^2), and Z = \frac{X - \mu}{\sigma} (so a = \frac{1}{\sigma}, b = -\frac{\mu}{\sigma}). What is the distribution of Z?
Solution (1), the mgf approach:
M_Z(t) = e^{-\frac{\mu}{\sigma} t} \cdot M_X\left(\frac{1}{\sigma} t\right) = e^{-\frac{\mu}{\sigma} t} \cdot e^{\mu \left(\frac{1}{\sigma} t\right) + \frac{1}{2} \sigma^2 \left(\frac{1}{\sigma} t\right)^2} = e^{\frac{t^2}{2}} → the m.g.f. for N(0, 1)
Thus, Z = \frac{X - \mu}{\sigma} ~ N(0, 1)
Now with one standard normal table, we will be able to calculate the probability of
any normal random variable:
P(a \le X \le b) = P\left( \frac{a - \mu}{\sigma} \le \frac{X - \mu}{\sigma} \le \frac{b - \mu}{\sigma} \right) = P\left( \frac{a - \mu}{\sigma} \le Z \le \frac{b - \mu}{\sigma} \right)
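The same standardization in code (a supplemental sketch; it uses scipy's standard normal c.d.f. in place of a printed table, and the values of μ, σ, a, b are illustrative):

```python
# Supplemental sketch: P(a <= X <= b) for X ~ N(mu, sigma^2) via the
# standard normal c.d.f. Phi: P = Phi((b - mu)/sigma) - Phi((a - mu)/sigma).
from scipy.stats import norm

mu, sigma = 10.0, 2.0
a, b = 9.0, 13.0

p = norm.cdf((b - mu) / sigma) - norm.cdf((a - mu) / sigma)
print(p)
```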
Solution (2), the c.d.f. approach:
F_Z(z) = P(Z \le z) = P\left( \frac{X - \mu}{\sigma} \le z \right) = P(X \le \sigma z + \mu) = F_X(\sigma z + \mu)
f_Z(z) = \frac{d}{dz} F_Z(z) = \frac{d}{dz} F_X(\sigma z + \mu) = f_X(\sigma z + \mu) \cdot \sigma
= \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(\sigma z + \mu - \mu)^2}{2\sigma^2}} \cdot \sigma = \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2}{2}} → the p.d.f. for N(0, 1)
Solution (3), the p.d.f. approach:
f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x - \mu)^2}{2\sigma^2}}, \quad x = \sigma z + \mu
Let the Jacobian be J:
J = \frac{dx}{dz} = \frac{d}{dz}(\sigma z + \mu) = \sigma
f_Z(z) = |J| \cdot f_X(x) = \sigma \cdot \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(\sigma z + \mu - \mu)^2}{2\sigma^2}} = \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2}{2}}
Thus, Z = \frac{X - \mu}{\sigma} ~ N(0, 1)
3. Distribution of the Sample Mean: If X_1, X_2, \ldots, X_n \overset{i.i.d.}{\sim} N(\mu, \sigma^2), then \bar{X} ~ N\left(\mu, \frac{\sigma^2}{n}\right).
Solution:
M_{\bar{X}}(t) = E(e^{t\bar{X}}) = E\left(e^{t \frac{X_1 + X_2 + \cdots + X_n}{n}}\right)
= E\left(e^{t^*(X_1 + \cdots + X_n)}\right), where t^* = t/n,
= M_{X_1 + \cdots + X_n}(t^*) = M_{X_1}(t^*) \cdots M_{X_n}(t^*) (by independence)
= \left(e^{\mu t^* + \frac{1}{2}\sigma^2 t^{*2}}\right)^n = e^{\mu n \frac{t}{n} + \frac{1}{2} n \sigma^2 \left(\frac{t}{n}\right)^2} = e^{\mu t + \frac{1}{2} \frac{\sigma^2}{n} t^2}
\therefore \bar{X} ~ N\left(\mu, \frac{\sigma^2}{n}\right)
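A supplemental simulation sketch of this result (parameter values are illustrative): sample means of n i.i.d. N(μ, σ²) draws should behave like N(μ, σ²/n):

```python
# Supplemental sketch: the sample mean of n i.i.d. N(mu, sigma^2) draws
# has mean mu and variance sigma^2 / n.
import numpy as np

mu, sigma, n = 5.0, 3.0, 25
rng = np.random.default_rng(123)
xbar = rng.normal(mu, sigma, size=(200_000, n)).mean(axis=1)

print(xbar.mean(), mu)              # close to mu
print(xbar.var(), sigma**2 / n)     # close to sigma^2 / n
```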