Lecture Notes #1
A Quick Review of Probability & Statistics
(0) Review of Probability, the Monty Hall Problem
(http://en.wikipedia.org/wiki/Monty_Hall_problem)
The Monty Hall problem is a probability puzzle loosely based on the American
television game show Let's Make a Deal and named after the show's original host,
Monty Hall.
Suppose you're on a game show, and you're given the choice of three doors: Behind
one door is a car; behind the others, goats. You pick a door, say No. 1 [but the door is
not opened], and the host, who knows what's behind the doors, opens another door
(*always a door you did not choose, and behind which there is no car), say No. 3,
which has a goat. He then says to you, "Do you want to switch (*i.e. pick door No. 2),
or to stay (*i.e., stay with door No. 1 you picked initially)?" Is it to your advantage to
switch your choice?
The answer will be clear by computing and comparing the following two
probabilities: (1) what is your winning chance if your strategy is to stay? (2) what is
your winning chance if your strategy is to switch?
Solutions: (*There are many possible approaches – but not all of them are correct, even if they produce the right numbers. Here we present one approach using conditional probability.)
P(Win by Stay)
= P(WBST | first door chosen has prize) · P(first door chosen has prize)
  + P(WBST | first door chosen has no prize) · P(first door chosen has no prize)
= 1 · (1/3) + 0 · (2/3) = 1/3

P(Win by Switch)
= P(WBSW | first door chosen has prize) · P(first door chosen has prize)
  + P(WBSW | first door chosen has no prize) · P(first door chosen has no prize)
= 0 · (1/3) + 1 · (2/3) = 2/3

Therefore switching wins with probability 2/3 while staying wins with probability only 1/3, so it is to your advantage to switch.
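As a quick sanity check, the two probabilities can also be estimated by simulation. The following is a minimal sketch in Python (assuming NumPy is available); the seed and sample size are illustrative only.

import numpy as np

rng = np.random.default_rng(0)
n_games = 100_000

car = rng.integers(0, 3, size=n_games)         # door hiding the car (0, 1, 2)
first_pick = rng.integers(0, 3, size=n_games)  # contestant's initial choice

# "Stay" wins exactly when the first pick already hides the car.
p_stay = np.mean(first_pick == car)

# "Switch" wins exactly when the first pick does NOT hide the car,
# because the host then removes the only other goat door.
p_switch = np.mean(first_pick != car)

print(f"P(win by stay)   ≈ {p_stay:.3f}   (theory: 1/3)")
print(f"P(win by switch) ≈ {p_switch:.3f}   (theory: 2/3)")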
(1) Normal Distribution
Q. Who invented the normal distribution?
* Left: Abraham de Moivre (26 May 1667 in Vitry-le-François, Champagne, France – 27 November 1754 in London, England) (*portrait from Wikipedia)
* Right: Johann Carl Friedrich Gauss (30 April 1777 – 23 February 1855)
<i> Probability Density Function (p.d.f.)
X ~ N(μ, σ²): X follows the normal distribution with mean μ and variance σ².

f(x) = (1/(√(2π) σ)) e^{−(x−μ)²/(2σ²)},  −∞ < x < ∞, x ∈ R

P(a ≤ X ≤ b) = ∫_a^b f(x) dx = area under the pdf curve bounded by a and b
<ii> Cumulative Distribution Function (c.d.f.)
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt

f(x) = [F(x)]′ = (d/dx) F(x)
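To make the pdf/cdf relationship concrete, here is a minimal sketch in Python (assuming SciPy is available) that evaluates P(a ≤ X ≤ b) for a normal random variable both from the cdf and by numerical integration of the pdf; the specific numbers are illustrative only.

import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

mu, sigma = 180, 10          # illustrative N(mu, sigma^2) population
a, b = 170, 200

# Using the cdf: P(a <= X <= b) = F(b) - F(a)
p_cdf = norm.cdf(b, loc=mu, scale=sigma) - norm.cdf(a, loc=mu, scale=sigma)

# Using the pdf: numerically integrate f(x) from a to b
p_pdf, _ = quad(lambda x: norm.pdf(x, loc=mu, scale=sigma), a, b)

print(p_cdf, p_pdf)   # the two results agree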
(2) Mathematical Expectation (Review)

Continuous random variable:
E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx

Discrete random variable:
E[g(X)] = ∑_{all x} g(x) P(X = x)
Properties of Expectations:
(1) E(c) = c, where c is a constant
(2) E[c*g(X)] = c*E[g(X)], where c is a constant
(3) E[g(X)+h(Y)] = E[g(X)]+E[h(Y)], for any X & Y
(4) E[g(X)*h(Y)] = E[g(X)]*E[h(Y)], if X & Y are independent – otherwise it is usually not true.
Special cases:

1) (population) Mean:  μ = E(X) = ∫_{−∞}^{∞} x · f(x) dx
Note: E(aX+b) = aE(X)+b, where a & b are constants

2) (population) Variance:
Var(X) = σ² = E[(X − μ)²] = ∫_{−∞}^{∞} (x − μ)² · f(x) dx = E(X²) − [E(X)]²
Note: Var(aX+b) = a²Var(X), where a & b are constants

3) Moment generating function:
M_X(t) = E(e^{tX}) = ∫_{−∞}^{∞} e^{tx} f(x) dx, when X is continuous.
M_X(t) = E(e^{tX}) = ∑_{all possible values x} e^{tx} f(x), when X is discrete.
For the normal distribution, X ~ N(μ, σ²), with f(x) = (1/(√(2π) σ)) e^{−(x−μ)²/(2σ²)}, −∞ < x < ∞, we have

M_X(t) = ∫_{−∞}^{∞} e^{tx} f(x) dx = e^{μt + σ²t²/2}
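The normal mgf can be checked numerically; below is a minimal sketch in Python (assuming NumPy is available) that compares a Monte Carlo estimate of E(e^{tX}) with the closed form e^{μt + σ²t²/2}; the parameter values are arbitrary.

import numpy as np

rng = np.random.default_rng(1)
mu, sigma, t = 2.0, 1.5, 0.3          # illustrative values

x = rng.normal(mu, sigma, size=1_000_000)
mgf_mc = np.mean(np.exp(t * x))                    # Monte Carlo estimate of E(e^{tX})
mgf_formula = np.exp(mu * t + 0.5 * sigma**2 * t**2)

print(mgf_mc, mgf_formula)   # the two values should be close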
4) Moment:
1st (population) moment: E(X) = ∫ x · f(x) dx
2nd (population) moment: E(X²) = ∫ x² · f(x) dx
…
kth (population) moment: E(X^k) = ∫ x^k · f(x) dx
Theorem. If X1, X2 are independent, then
f_{X1,X2}(x1, x2) = f_{X1}(x1) f_{X2}(x2)

Theorem. If X1, X2 are independent, then
M_{X1+X2}(t) = M_{X1}(t) M_{X2}(t)

Theorem. Under regularity conditions, there is a 1-1 correspondence between the pdf and the mgf of a given random variable X. That is,
pdf f(x) ⟷ mgf M_X(t)  (1-1)
e.g. Sgt. Jones wishes to select one army recruit into his unit. It is known that the IQ
distribution for all recruits is normal with mean 180 and standard deviation 10.
What is the chance that Sgt. Jones would select a recruit with an IQ of at least 200?
Sol) Let X represent the IQ of a randomly selected recruit.
X ~ N(μ = 180, σ = 10)
P(X ≥ 200) = P((X − 180)/10 ≥ (200 − 180)/10) = P(Z ≥ 2) ≈ 2.28%
e.g. Sgt. Jones wishes to select three army recruits into his unit. It is known that the
IQ distribution for all recruits is normal with mean 180 and standard deviation 10.
What is the chance that Sgt. Jones would select three recruits with an average IQ of
at least 200?
Sol) Let X1, X2, X3 represent the IQs of three randomly selected recruits.
Xi ~ i.i.d. N(μ, σ²), i = 1, 2, 3
X̄ ~ N(μ_X̄ = 180, σ_X̄ = 10/√3)
P(X̄ ≥ 200) = P((X̄ − 180)/(10/√3) ≥ (200 − 180)/(10/√3)) = P(Z ≥ 2√3) ≈ 0.03%
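Both tail probabilities above are easy to verify with software; here is a minimal sketch in Python (assuming SciPy is available). The numbers mirror the two examples: a single recruit's IQ, and the mean IQ of three recruits.

from math import sqrt
from scipy.stats import norm

mu, sigma = 180, 10

# Single recruit: X ~ N(180, 10^2)
p_one = norm.sf(200, loc=mu, scale=sigma)            # P(X >= 200)

# Average of three recruits: Xbar ~ N(180, (10/sqrt(3))^2)
p_avg = norm.sf(200, loc=mu, scale=sigma / sqrt(3))  # P(Xbar >= 200)

print(f"P(X >= 200)    ≈ {p_one:.4f}   (about 2.28%)")
print(f"P(Xbar >= 200) ≈ {p_avg:.5f}  (about 0.03%)")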
e.g. (Another practical example of mathematical expectation) A gambler goes to
bet. The dealer has 3 dice, which are fair, meaning that the chance that each face
shows up is exactly 1/6.
The dealer says: "You can choose your bet on a number, any number from 1 to 6.
Then I'll roll the 3 dice. If none show the number you bet, you'll lose $1. If one shows
the number you bet, you'll win $1. If two or three dice show the number you bet,
you'll win $3 or $5, respectively."
Is it a fair game? (*A fair game is such that the expected, or equivalently the long
term average, winning is zero.)
Solution: Let X be the number of dice that show the number you bet; then X ~ Binomial(3, 1/6), because the dealer rolls the 3 dice independently and the probability that each die shows the number you bet is 1/6. That is,

P(X = x) = C(3, x) (1/6)^x (5/6)^{3−x}, x = 0, 1, 2, 3

Let Y be the amount of money you get. Then,

P(Y = −1) = P(X = 0) = C(3, 0) (1/6)^0 (5/6)^3 = (5/6)^3 = 125/216
P(Y = 1) = P(X = 1) = C(3, 1) (1/6)^1 (5/6)^2 = 3 · (1/6) · (5/6)^2 = 75/216
P(Y = 3) = P(X = 2) = C(3, 2) (1/6)^2 (5/6)^1 = 3 · (1/6)^2 · (5/6) = 15/216
P(Y = 5) = P(X = 3) = C(3, 3) (1/6)^3 (5/6)^0 = (1/6)^3 = 1/216

Thus, your expected winning is

E(Y) = ∑_y y · P(Y = y) = (−1) · 125/216 + 1 · 75/216 + 3 · 15/216 + 5 · 1/216 = 0
Therefore, it is a fair game.
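The same expectation can be computed directly from the Binomial(3, 1/6) pmf; here is a minimal sketch in Python (assuming SciPy is available). The payoff table (−1, 1, 3, 5) is the one stated in the problem.

import numpy as np
from scipy.stats import binom

payoff = np.array([-1, 1, 3, 5])          # winnings when 0, 1, 2, 3 dice match
probs = binom.pmf([0, 1, 2, 3], n=3, p=1/6)

expected_winning = np.sum(payoff * probs)
print(probs)                 # [125, 75, 15, 1] / 216
print(expected_winning)      # 0.0 (a fair game)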
*(3) Joint distribution and independence
Definition. The joint cdf of two random variables X and Y is defined as:
F_{X,Y}(x, y) = F(x, y) = P(X ≤ x, Y ≤ y)

Definition. The joint pdf of two discrete random variables X and Y is defined as:
f_{X,Y}(x, y) = f(x, y) = P(X = x, Y = y)

Definition. The joint pdf of two continuous random variables X and Y is defined as:
f_{X,Y}(x, y) = f(x, y) = ∂²/∂x∂y F(x, y)

Definition. The marginal pdf of the discrete random variable X or Y can be obtained by summing their joint pdf as follows: f_X(x) = ∑_y f(x, y);  f_Y(y) = ∑_x f(x, y)

Definition. The marginal pdf of the continuous random variable X or Y can be obtained by integrating the joint pdf as follows: f_X(x) = ∫_{−∞}^{∞} f(x, y) dy;  f_Y(y) = ∫_{−∞}^{∞} f(x, y) dx
Definition. The conditional pdf of a random variable X or Y is defined as:
f(x|y) = f(x, y)/f(y);  f(y|x) = f(x, y)/f(x)
Definition. The joint moment generating function of two random variables X and Y
is defined as
𝑀𝑋,𝑌 (𝑡1 , 𝑡2 ) = 𝐸(𝑒 𝑡1 𝑋+𝑡2 𝑌 )
Note that we can obtain the marginal mgf for X or Y as follows:
M_X(t1) = M_{X,Y}(t1, 0) = E(e^{t1·X + 0·Y}) = E(e^{t1·X});  M_Y(t2) = M_{X,Y}(0, t2) = E(e^{0·X + t2·Y}) = E(e^{t2·Y})
Theorem. Two random variables X and Y are independent ⇔ (if and only if)
𝐹𝑋,𝑌 (𝑥, 𝑦) = 𝐹𝑋 (𝑥)𝐹𝑌 (𝑦) ⇔ 𝑓𝑋,𝑌 (𝑥, 𝑦) = 𝑓𝑋 (𝑥)𝑓𝑌 (𝑦) ⇔ 𝑀𝑋,𝑌 (𝑡1 , 𝑡2 ) = 𝑀𝑋 (𝑡1 ) 𝑀𝑌 (𝑡2 )
Definition. The covariance of two random variables X and Y is defined as
𝐶𝑂𝑉(𝑋, 𝑌) = 𝐸[(𝑋 − 𝜇𝑋 )(𝑌 − 𝜇𝑌 )].
Theorem. If two random variables X and Y are independent, then we have
𝐶𝑂𝑉(𝑋, 𝑌) = 0. (*Note: However, 𝐶𝑂𝑉(𝑋, 𝑌) = 0 does not necessarily mean that X
and Y are independent.)
Exercise
Q1. Let X1 & X2 be independent N(μ, σ²). Prove that X1 + X2 and X1 − X2 are independent using two approaches: (1) the pdf, (2) the mgf.
Solution:
(1)
The pdf approach: Let
𝑊 = 𝑋1 + 𝑋2
𝑍 = 𝑋1 − 𝑋2
Then we have
X1 = (W + Z)/2
X2 = (W − Z)/2
This is a 1-1 transformation between (X1, X2) and (W, Z).
Define the Jacobian J of the transformation as:
J = | ∂x1/∂w   ∂x1/∂z |  =  | 1/2    1/2 |  =  −1/2
    | ∂x2/∂w   ∂x2/∂z |     | 1/2   −1/2 |
Then the joint distribution of the new random variables is given by:
f_{W,Z}(w, z) = f_{X1,X2}((w+z)/2, (w−z)/2) |J|
Since 𝑋1 and 𝑋2 are independent, we have:
𝑓𝑋1 ,𝑋2 (𝑥1 , 𝑥2 ) = 𝑓𝑋1 (𝑥1 )𝑓𝑋2 (𝑥2 )
Thus we find:
f_{W,Z}(w, z) = f_{X1}((w+z)/2) f_{X2}((w−z)/2) |J|

= (1/√(2πσ²))² · exp{ −((w+z)/2 − μ)²/(2σ²) − ((w−z)/2 − μ)²/(2σ²) } · (1/2)

= (1/2) · (1/(2πσ²)) · exp{ −(w − 2μ)²/(4σ²) − z²/(4σ²) }

= [ (1/√(2π·2σ²)) e^{−(w−2μ)²/(4σ²)} ] · [ (1/√(2π·2σ²)) e^{−z²/(4σ²)} ]

= f_W(w) f_Z(z)

So W and Z are independent. Furthermore, we see that X1 + X2 ~ N(2μ, 2σ²) and X1 − X2 ~ N(0, 2σ²).
(2)
The mgf approach: Now we prove that
M_{X1+X2, X1−X2}(t1, t2) = M_{X1+X2}(t1) M_{X1−X2}(t2)

Solution:
Let W = X1 + X2 and Z = X1 − X2.
W and Z are independent if and only if M_{W,Z}(t1, t2) = M_W(t1) M_Z(t2).
Substituting for W & Z:

M_{W,Z}(t1, t2) = E(e^{t1·W + t2·Z}) = E(e^{t1(X1+X2) + t2(X1−X2)}) = E(e^{(t1+t2)X1 + (t1−t2)X2})

E(e^{(t1+t2)X1 + (t1−t2)X2}) = E(e^{(t1+t2)X1}) · E(e^{(t1−t2)X2}), because X1 & X2 are independent.

Using the normal mgf,
E(e^{(t1+t2)X1}) = e^{μ(t1+t2) + ½(t1+t2)²σ²}
E(e^{(t1−t2)X2}) = e^{μ(t1−t2) + ½(t1−t2)²σ²}

∴ E(e^{(t1+t2)X1 + (t1−t2)X2})
= e^{μt1 + μt2 + σ²t1²/2 + σ²t1t2 + σ²t2²/2} · e^{μt1 − μt2 + σ²t1²/2 − σ²t1t2 + σ²t2²/2}
= exp( 2μt1 + ½(2σ²)t1² + ½(2σ²)t2² )
= e^{2μt1 + ½(2σ²)t1²} · e^{0·t2 + ½(2σ²)t2²}
= M_{X1+X2}(t1) · M_{X1−X2}(t2)
= M_W(t1) · M_Z(t2)

∴ M_{W,Z}(t1, t2) = M_W(t1) M_Z(t2)
∴ W = X1 + X2 and Z = X1 − X2 are independent.
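A quick empirical check of Q1 is also possible; the following is a minimal sketch in Python (assuming NumPy is available) that simulates X1, X2 ~ i.i.d. N(μ, σ²), forms W = X1 + X2 and Z = X1 − X2, and looks at their sample correlation and sample variances (for jointly normal variables, zero correlation is equivalent to independence). Parameter values are illustrative.

import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n = 5.0, 2.0, 1_000_000     # illustrative parameter values

x1 = rng.normal(mu, sigma, size=n)
x2 = rng.normal(mu, sigma, size=n)
w, z = x1 + x2, x1 - x2

print(np.corrcoef(w, z)[0, 1])   # ≈ 0: W and Z are uncorrelated (hence independent here)
print(w.mean(), w.var())         # ≈ 2*mu and 2*sigma^2, i.e. W ~ N(2mu, 2sigma^2)
print(z.mean(), z.var())         # ≈ 0 and 2*sigma^2,    i.e. Z ~ N(0, 2sigma^2)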
Q2. 𝐿𝑒𝑡 𝑍 𝑏𝑒 𝑁(0,1), (1) please derive the covariance between 𝑍 𝑎𝑛𝑑 𝑍 2 ; (2) Are
𝑍 𝑎𝑛𝑑 𝑍 2 independent?
𝑺𝒐𝒍𝒖𝒕𝒊𝒐𝒏:
(1) cov(Z, Z²) = E[(Z − E(Z))(Z² − E(Z²))] = E[(Z − 0)(Z² − 1)] = E[Z³] = (∂³/∂t³)[exp(½ t²)] |_{t=0} = 0
(2) 𝑍 𝑎𝑛𝑑 𝑍 2 are not independent. You can do it in many ways, using (a) pdf, (b)
cdf/probabilities, or (c) mgf.
For example, 𝑃(𝑍 > 1, 𝑍 2 > 1) = 𝑃(𝑍 > 1) ≠ 𝑃(𝑍 > 1) ∗ 𝑃(𝑍 2 > 1) = [𝑃(𝑍 > 1)]2
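A simulation illustrates the same point; this is a minimal sketch in Python (assuming NumPy is available): the sample covariance of Z and Z² is near zero, yet P(Z > 1, Z² > 1) clearly differs from the product of the marginal probabilities, so Z and Z² are not independent.

import numpy as np

rng = np.random.default_rng(3)
z = rng.standard_normal(1_000_000)

print(np.cov(z, z**2)[0, 1])                    # ≈ 0: covariance of Z and Z^2

p_joint = np.mean((z > 1) & (z**2 > 1))         # = P(Z > 1), since Z > 1 implies Z^2 > 1
p_product = np.mean(z > 1) * np.mean(z**2 > 1)
print(p_joint, p_product)                       # clearly different, so not independent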
Q3. Let X ~ N(3, 4). Please calculate P(1 < X < 3).
Solution:
P(1 < X < 3) = P((1 − 3)/2 < (X − 3)/2 < (3 − 3)/2) = P(−1 < Z < 0) = P(0 < Z < 1) = 0.3413
𝑁𝑜𝑡𝑒: 𝐼𝑛 𝑡ℎ𝑒 𝑎𝑏𝑜𝑣𝑒, 𝑍~𝑁(0,1)
Q4. Students A and B plan to meet at the SAC Seawolf market between 12 noon and 1 pm tomorrow. The one who arrives first will wait for the other for 30 minutes, and then leave. What is the chance that the two friends will be able to meet at SAC during their appointed time period, assuming that each one will arrive at SAC independently, each at a random time between 12 noon and 1 pm?
Solution: 3/4, as illustrated below.
Let X and Y denote the arrival times (in hours, where 0 indicates 12 noon and 1 indicates 1 pm) of A and B, respectively. Our assumptions imply that X and Y each follow a Uniform[0, 1] distribution and, furthermore, that they are independent. Thus the joint pdf of X and Y is uniform over the unit square: f(x, y) = 1 if (x, y) is in the square region, and f(x, y) = 0 outside the square region. A and B will be able to meet iff |X − Y| ≤ 1/2, which corresponds to the region M between the lines X − Y = −1/2 and X − Y = +1/2.
That is:
P(A & B will meet) = P(|X − Y| ≤ 1/2) = ∬_M 1 dx dy = 1 − 1/8 − 1/8 = 3/4
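The 3/4 answer is easy to confirm by simulation; here is a minimal sketch in Python (assuming NumPy is available) that draws the two arrival times uniformly on [0, 1] and checks how often they fall within half an hour of each other.

import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000

x = rng.random(n)   # A's arrival time, Uniform[0, 1]
y = rng.random(n)   # B's arrival time, Uniform[0, 1]

p_meet = np.mean(np.abs(x - y) <= 0.5)
print(p_meet)       # ≈ 0.75 = 3/4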
*Definitions: population correlation & sample correlation
Definition: The population correlation coefficient ρ is defined as:
ρ = cov(X, Y) / √(var(X) · var(Y))
Definition: Let (X1 , Y1), …, (Xn , Yn) be a random sample from a given bivariate
population, then the sample correlation coefficient r is defined as:
r = ∑(Xi − X̄)(Yi − Ȳ) / √( [∑(Xi − X̄)²] [∑(Yi − Ȳ)²] )
*Definition: Bivariate Normal Random Variable
(𝑋, 𝑌)~𝐵𝑁(𝜇𝑋 , 𝜎𝑋2 ; 𝜇𝑌 , 𝜎𝑌2 ; 𝜌) where 𝜌 is the correlation between 𝑋 & 𝑌
The joint p.d.f. of (X, Y) (written here for μ_X = μ_Y = 0; see Q5 below for the general form) is

f(x, y) = (1/(2πσ_X σ_Y √(1 − ρ²))) exp[ −(1/(2(1 − ρ²))) ( x²/σ_X² + y²/σ_Y² − 2ρxy/(σ_X σ_Y) ) ]
Exercise: Please derive the mgf of the bivariate normal distribution.
Q5. Let X and Y be random variables with joint pdf
f_{X,Y}(x, y) = (1/(2πσ_X σ_Y √(1 − ρ²))) exp{ −(1/(2(1 − ρ²))) [ ((x − μ_X)/σ_X)² − 2ρ((x − μ_X)/σ_X)((y − μ_Y)/σ_Y) + ((y − μ_Y)/σ_Y)² ] }

where −∞ < x < ∞, −∞ < y < ∞. Then X and Y are said to have the bivariate normal distribution. The joint moment generating function for X and Y is

M(t1, t2) = exp[ t1·μ_X + t2·μ_Y + ½(t1²σ_X² + 2ρt1t2σ_Xσ_Y + t2²σ_Y²) ]
(a) Find the marginal pdf’s of X and Y;
(b) Prove that X and Y are independent if and only if ρ = 0.
(Here ρ is indeed, the <population> correlation coefficient between X and Y.)
(c) Find the distribution of (X + Y).
(d) Find the conditional pdf of f(x|y), and f(y|x)
Solution:
(a)
The moment generating function of X can be given by
M_X(t) = M(t, 0) = exp[ μ_X t + ½σ_X² t² ].
Similarly, the moment generating function of Y can be given by
M_Y(t) = M(0, t) = exp[ μ_Y t + ½σ_Y² t² ].
Thus, X and Y are both marginally normally distributed, i.e.,
X ~ N(μ_X, σ_X²), and Y ~ N(μ_Y, σ_Y²).
The pdf of X is
f_X(x) = (1/(√(2π) σ_X)) exp[ −(x − μ_X)²/(2σ_X²) ].
The pdf of Y is
f_Y(y) = (1/(√(2π) σ_Y)) exp[ −(y − μ_Y)²/(2σ_Y²) ].
(b)
If ρ = 0, then
M(t1, t2) = exp[ μ_X t1 + μ_Y t2 + ½(σ_X²t1² + σ_Y²t2²) ] = M(t1, 0) · M(0, t2)
Therefore, X and Y are independent.
Conversely, if X and Y are independent, then
M(t1, t2) = M(t1, 0) · M(0, t2) = exp[ μ_X t1 + μ_Y t2 + ½(σ_X²t1² + σ_Y²t2²) ]
On the other hand, the joint mgf is
M(t1, t2) = exp[ μ_X t1 + μ_Y t2 + ½(σ_X²t1² + 2ρσ_Xσ_Y t1t2 + σ_Y²t2²) ]
Equating the two expressions for all t1, t2 forces the cross term to vanish; therefore ρ = 0.
(c)
M_{X+Y}(t) = E[e^{t(X+Y)}] = E[e^{tX + tY}]
Recall that M(t1, t2) = E[e^{t1·X + t2·Y}]; therefore we can obtain M_{X+Y}(t) by setting t1 = t2 = t in M(t1, t2). That is,
M_{X+Y}(t) = M(t, t) = exp[ μ_X t + μ_Y t + ½(σ_X²t² + 2ρσ_Xσ_Y t² + σ_Y²t²) ]
= exp[ (μ_X + μ_Y)t + ½(σ_X² + 2ρσ_Xσ_Y + σ_Y²)t² ]
∴ X + Y ~ N(μ = μ_X + μ_Y, σ² = σ_X² + 2ρσ_Xσ_Y + σ_Y²)
(d)
The conditional distribution of X given Y=y is given by
f(x|y) = f(x, y)/f_Y(y) = (1/(√(2π) σ_X √(1 − ρ²))) exp{ −( x − μ_X − (σ_X/σ_Y)ρ(y − μ_Y) )² / (2(1 − ρ²)σ_X²) }.
Similarly, we have the conditional distribution of Y given X=x is
f(y|x) = f(x, y)/f_X(x) = (1/(√(2π) σ_Y √(1 − ρ²))) exp{ −( y − μ_Y − (σ_Y/σ_X)ρ(x − μ_X) )² / (2(1 − ρ²)σ_Y²) }.

Therefore:
X | Y = y ~ N( μ_X + ρ(σ_X/σ_Y)(y − μ_Y), (1 − ρ²)σ_X² )
Y | X = x ~ N( μ_Y + ρ(σ_Y/σ_X)(x − μ_X), (1 − ρ²)σ_Y² )
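These bivariate normal facts (the distribution of X + Y and the conditional mean/variance of Y given X) can be checked by simulation; below is a minimal sketch in Python (assuming NumPy is available) that constructs (X, Y) from two independent standard normals, which is one standard way to generate a bivariate normal pair. All parameter values are illustrative.

import numpy as np

rng = np.random.default_rng(5)
mu_x, mu_y, s_x, s_y, rho = 1.0, -2.0, 2.0, 3.0, 0.6
n = 2_000_000

# Construct a bivariate normal pair from independent standard normals
z1 = rng.standard_normal(n)
z2 = rng.standard_normal(n)
x = mu_x + s_x * z1
y = mu_y + s_y * (rho * z1 + np.sqrt(1 - rho**2) * z2)

# (c) X + Y ~ N(mu_x + mu_y, s_x^2 + 2*rho*s_x*s_y + s_y^2)
print((x + y).mean(), (x + y).var())
print(mu_x + mu_y, s_x**2 + 2 * rho * s_x * s_y + s_y**2)

# (d) Conditional distribution of Y given X near x0: mean mu_y + rho*(s_y/s_x)*(x0 - mu_x),
#     variance (1 - rho^2)*s_y^2 (checked on a thin slab around x0)
x0 = 2.0
mask = np.abs(x - x0) < 0.01
print(y[mask].mean(), mu_y + rho * (s_y / s_x) * (x0 - mu_x))
print(y[mask].var(), (1 - rho**2) * s_y**2)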
(4) Sampling from the Normal Population
*Example: We wish to estimate the distribution of heights of adult US males. It is believed that the height of an adult US male follows a normal distribution N(μ, σ²).
Def. Simple random sample: A sample in which every subject in the population has the same chance of being selected.
X: the random variable denoting the height of an adult male chosen randomly from the population.
So X ~ N(μ, σ²): the distribution of a randomly selected subject is the population distribution.
Theorem 1 Sampling from the normal population
Let X1, X2, …, Xn ~ i.i.d. N(μ, σ²), where i.i.d. stands for independent and identically distributed.

1. X̄ ~ N(μ, σ²/n)

2. A function of the sample variance as shown below follows the Chi-square distribution with (n − 1) degrees of freedom:

(n − 1)S²/σ² = ∑_{i=1}^{n}(Xi − X̄)²/σ² ~ χ²_{n−1}

*Reminder: The sample variance S² is defined as:

S² = ∑_{i=1}^{n}(Xi − X̄)²/(n − 1)
*Def 1: The Chi-square distribution is a special gamma distribution (*** Please
find out which one it is.)
*Def 2: Let Z1, Z2, …, Zk ~ i.i.d. N(0, 1); then W = ∑_{i=1}^{k} Zi² ~ χ²_k
3. X̄ & S² are independent.

4. The Z-score transformation of the sample mean follows the standard normal distribution:

Z = (X̄ − μ)/(σ/√n) ~ N(0, 1)

5. A T random variable with n − 1 degrees of freedom can be obtained by replacing the population standard deviation with its sample counterpart in the Z-score above:

T = (X̄ − μ)/(S/√n) ~ t_{n−1}
*Def. of t-distribution (Student’s t-distribution, first introduced by William Sealy
Gosset )
Let Z ~ N(0, 1) and W ~ χ²_k, where Z & W are independent. Then,

T = Z/√(W/k) ~ t_k
Wiki: William Sealy Gosset (June 13, 1876–October 16, 1937) is best known by his
pen name Student and for the Student's t-distribution.
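Theorem 1 can also be illustrated numerically; here is a minimal sketch in Python (assuming NumPy and SciPy are available) that draws many normal samples and compares the simulated moments of X̄, (n − 1)S²/σ², and the T statistic with their theoretical values. Sample size and parameters are arbitrary.

import numpy as np
from scipy.stats import chi2, t

rng = np.random.default_rng(6)
mu, sigma, n, reps = 180.0, 10.0, 5, 200_000

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)          # sample variance with divisor n-1

# 1. Xbar ~ N(mu, sigma^2/n)
print(xbar.mean(), xbar.var(), sigma**2 / n)

# 2. (n-1)S^2/sigma^2 ~ chi-square with n-1 degrees of freedom
w = (n - 1) * s2 / sigma**2
print(w.mean(), w.var(), chi2.mean(n - 1), chi2.var(n - 1))

# 5. T = (Xbar - mu)/(S/sqrt(n)) ~ t with n-1 degrees of freedom
T = (xbar - mu) / (np.sqrt(s2) / np.sqrt(n))
print(T.var(), t.var(n - 1))              # t_{n-1} variance is (n-1)/(n-3)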
(5) Point Estimators, MLE & MOME
Point Estimators
Example 1. Let X1, X2, …, Xn be a random sample from N(μ, σ2 ).
Please find a good point estimator for 1. μ  2. σ²
Solutions. 1. μ̂ = X̄    2. σ̂² = S²
These are the typical estimators for μ and σ². Both are unbiased estimators.
Property of Point Estimators
Unbiased Estimators. θ̂ is said to be an unbiased estimator for θ if E(θ̂) = θ.

E(X̄) = E( (X1 + X2 + ⋯ + Xn)/n ) = (E(X1) + E(X2) + ⋯ + E(Xn))/n = (μ + μ + ⋯ + μ)/n = μ
E(S²) = σ² (*Please make sure you can prove this on your own. We had discussed this before.)
Unbiased estimators may not be unique.
Example 2. E[∑ ai·Xi] = (∑ ai)μ
Let μ̃ = ∑ ai·Xi / ∑ ai; it is unbiased for μ because E(μ̃) = μ.
Variance of Unbiased Estimators.
var(X̄) = σ²/n, var(Xi) = σ²  ⟹  var(X̄) < var(Xi) when n > 1
Methods for deriving point estimators
1. Maximum Likelihood Estimator (MLE)
2. Method of Moment Estimator (MOME)
Example 1 (continued). Xi ~ i.i.d. N(μ, σ²), i = 1, 2, …, n
1. Derive the MLE for μ and σ².
2. Derive the MOME for μ and σ².
Solution.
1. MLE
[i] pdf: f(xi) = (1/(√(2π) σ)) e^{−(xi − μ)²/(2σ²)}, xi ∈ R, i = 1, …, n
[ii] likelihood function
L = f(x1, x2, …, xn) = ∏_{i=1}^{n} f(xi) = ∏_{i=1}^{n} (1/(√(2π) σ)) e^{−(xi − μ)²/(2σ²)} = (2πσ²)^{−n/2} e^{−∑_{i=1}^{n}(xi − μ)²/(2σ²)}
[iii] log likelihood function
l = ln L = −(n/2) ln(2πσ²) − ∑_{i=1}^{n}(xi − μ)²/(2σ²) = −(n/2) ln(2π) − (n/2) ln(σ²) − ∑_{i=1}^{n}(xi − μ)²/(2σ²)
[iv] Find the MLE’s by setting the derivatives to zero:

dl/dμ = ∑_{i=1}^{n}(xi − μ)/σ² = 0
dl/dσ² = −n/(2σ²) + ∑_{i=1}^{n}(xi − μ)²/(2σ⁴) = 0

⟹ μ̂ = X̄,  σ̂² = ∑_{i=1}^{n}(Xi − X̄)²/n
R. A. Fisher (http://en.wikipedia.org/wiki/Ronald_Fisher)
established the comprehensive theory for the maximum likelihood method.
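As a sanity check on the closed-form MLEs, one can maximize the normal log-likelihood numerically and compare; the following is a minimal sketch in Python (assuming NumPy and SciPy are available), with simulated data and arbitrary true parameter values.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(7)
data = rng.normal(loc=4.0, scale=2.0, size=500)    # simulated sample, true mu=4, sigma=2

def neg_log_likelihood(params):
    mu, log_sigma = params                         # optimize log(sigma) to keep sigma > 0
    return -np.sum(norm.logpdf(data, loc=mu, scale=np.exp(log_sigma)))

res = minimize(neg_log_likelihood, x0=[0.0, 0.0])
mu_hat_num, sigma2_hat_num = res.x[0], np.exp(res.x[1])**2

# Closed-form MLEs: mu_hat = sample mean, sigma2_hat = sum((x - xbar)^2)/n
mu_hat, sigma2_hat = data.mean(), data.var(ddof=0)

print(mu_hat_num, mu_hat)          # the numerical and closed-form MLEs agree
print(sigma2_hat_num, sigma2_hat)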
2. MOME
Order | Population moment | Sample moment
1st   | E(X)     | (X1 + X2 + ⋯ + Xn)/n
2nd   | E(X²)    | (X1² + X2² + ⋯ + Xn²)/n
…     | …        | …
kth   | E(X^k)   | (X1^k + X2^k + ⋯ + Xn^k)/n
Back to this example, we have two parameters, and we set up the equations for the first two moments:

E(X) = μ = X̄
E(X²) = μ² + σ² = (X1² + X2² + ⋯ + Xn²)/n

⟹ μ̂ = X̄,  σ̂² = (X1² + X2² + ⋯ + Xn²)/n − (X̄)²
Proof. (Now we show that the MOME and the MLE of σ² are the same for this normal population example.)

∑_{i=1}^{n} Xi²/n − (X̄)² = ∑_{i=1}^{n}(Xi − X̄ + X̄)²/n − (X̄)²
= [ ∑_{i=1}^{n}(Xi − X̄)² + 2X̄ ∑_{i=1}^{n}(Xi − X̄) + ∑_{i=1}^{n} X̄² ]/n − (X̄)²
= ∑_{i=1}^{n}(Xi − X̄)²/n    (note that ∑(Xi − X̄) = 0 and ∑ X̄² = nX̄²)

Therefore, the MLE and MOME for σ² are the same for the normal population.
E(σ̂²) = E( ∑_{i=1}^{n}(Xi − X̄)²/n ) = E( ((n − 1)/n) · ∑_{i=1}^{n}(Xi − X̄)²/(n − 1) ) = ((n − 1)/n) E(S²) = ((n − 1)/n) σ² → σ² as n → ∞

(Asymptotically unbiased)
Karl Pearson, who developed the MOME
method (http://en.wikipedia.org/wiki/Karl_pearson)
(Photo, from left to right: Egon Sharpe Pearson, Karl Pearson, Maria Pearson (née Sharpe), Sigrid Letitia Sharpe Pearson)
Example 3. Let Xi ~ i.i.d. Bernoulli(p), i = 1, …, n.
Please derive 1. The MLE of p
2. The MOME of p.
Solution.
1. MLE
[i] f(xi) = p^{xi}(1 − p)^{1−xi}, i = 1, …, n
[ii] L = p^{∑xi}(1 − p)^{n−∑xi}
[iii] l = ln L = (∑xi) ln p + (n − ∑xi) ln(1 − p)
[iv] Solving dl/dp = ∑xi/p − (n − ∑xi)/(1 − p) = 0, we obtain the MLE: p̂ = ∑_{i=1}^{n} Xi/n
2. MOME
Since there is only one population parameter, we only need to set up one equation,
in this case, the equation on the first moment will suffice:
E(X) = p = (X1 + X2 + ⋯ + Xn)/n  ⇒  The MOME is: p̂ = ∑_{i=1}^{n} Xi/n = X̄
Example 4. Let X1, …, Xn be a random sample from exp(λ)
Please derive 1. The MLE of λ
2. The MOME of λ.
Solution:
1. MLE:
f(x) = λ exp(−λx)

L = ∏_{i=1}^{n} f(xi) = ∏_{i=1}^{n} λ exp(−λxi) = λⁿ exp(−λ ∑_{i=1}^{n} xi)

l = ln L = n ln(λ) − λ ∑_{i=1}^{n} xi

∂l/∂λ = n/λ − ∑_{i=1}^{n} xi = 0

Thus the MLE is: λ̂ = 1/X̄
2. MOME:
E(X) = 1/λ
Thus setting E(X) = 1/λ = X̄, we have the MOME: λ̂ = 1/X̄
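To close, here is a minimal sketch in Python (assuming NumPy is available) that simulates an exponential sample and computes the estimator λ̂ = 1/X̄, which is both the MLE and the MOME here; the true rate used for the simulation is arbitrary.

import numpy as np

rng = np.random.default_rng(8)
lam_true = 0.5                                              # illustrative true rate
data = rng.exponential(scale=1 / lam_true, size=10_000)     # NumPy parameterizes by the mean 1/lambda

lam_hat = 1 / data.mean()                                   # MLE = MOME = 1 / sample mean
print(lam_hat, lam_true)                                    # estimate should be close to the true rate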