Lecture Notes #1: A Quick Review of Probability & Statistics

(0) Review of Probability: the Monty Hall Problem
(http://en.wikipedia.org/wiki/Monty_Hall_problem)

The Monty Hall problem is a probability puzzle loosely based on the American television game show Let's Make a Deal and named after the show's original host, Monty Hall. Suppose you're on a game show, and you're given the choice of three doors: behind one door is a car; behind the others, goats. You pick a door, say No. 1 [but the door is not opened], and the host, who knows what's behind the doors, opens another door (*always a door you did not choose, and behind which there is no car), say No. 3, which has a goat. He then says to you, "Do you want to switch (*i.e., pick door No. 2) or stay (*i.e., stay with door No. 1, which you picked initially)?" Is it to your advantage to switch your choice?

The answer becomes clear once we compute and compare the following two probabilities: (1) What is your winning chance if your strategy is to stay? (2) What is your winning chance if your strategy is to switch?

Solution: (*There are many possible approaches, but not all of them are correct even if they produce the right numbers. Here we present one approach using conditional probability.)

\[
P(\text{Win by Stay}) = P(\text{WBST} \mid \text{first door chosen has prize})\,P(\text{first door chosen has prize}) + P(\text{WBST} \mid \text{first door chosen has no prize})\,P(\text{first door chosen has no prize}) = 1\cdot\tfrac{1}{3} + 0\cdot\tfrac{2}{3} = \tfrac{1}{3}
\]
\[
P(\text{Win by Switch}) = P(\text{WBSW} \mid \text{first door chosen has prize})\,P(\text{first door chosen has prize}) + P(\text{WBSW} \mid \text{first door chosen has no prize})\,P(\text{first door chosen has no prize}) = 0\cdot\tfrac{1}{3} + 1\cdot\tfrac{2}{3} = \tfrac{2}{3}
\]

Since 2/3 > 1/3, it is to your advantage to switch.

(1) Normal Distribution

Q. Who invented the normal distribution?
[Figure. Left: Abraham de Moivre (26 May 1667, Vitry-le-François, Champagne, France – 27 November 1754, London, England). Right: Johann Carl Friedrich Gauss (30 April 1777 – 23 February 1855). *From Wikipedia.]

<i> Probability Density Function (p.d.f.)
$X \sim N(\mu, \sigma^2)$: X follows the normal distribution with mean $\mu$ and variance $\sigma^2$, with p.d.f.
\[
f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \qquad -\infty < x < \infty.
\]
\[
P(a \le X \le b) = \int_a^b f(x)\,dx = \text{area under the pdf curve bounded by } a \text{ and } b.
\]

<ii> Cumulative Distribution Function (c.d.f.)
\[
F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt, \qquad f(x) = F'(x) = \frac{d}{dx}F(x).
\]

(2) Mathematical Expectation (Review)

Continuous random variable: $E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx$

Discrete random variable: $E[g(X)] = \sum_{\text{all } x} g(x)\,P(X = x)$

Properties of Expectations:
(1) E(c) = c, where c is a constant.
(2) E[c·g(X)] = c·E[g(X)], where c is a constant.
(3) E[g(X) + h(Y)] = E[g(X)] + E[h(Y)], for any X and Y.
(4) E[g(X)·h(Y)] = E[g(X)]·E[h(Y)], if X and Y are independent; otherwise it is usually not true.

Special cases:
1) (Population) mean: $\mu = E(X) = \int x\, f(x)\,dx$.
   Note: E(aX + b) = aE(X) + b, where a and b are constants.
2) (Population) variance: $\mathrm{Var}(X) = \sigma^2 = E[(X-\mu)^2] = \int (x-\mu)^2 f(x)\,dx = E(X^2) - [E(X)]^2$.
   Note: Var(aX + b) = a^2\,Var(X), where a and b are constants.
3) Moment generating function:
   $M_X(t) = E(e^{tX}) = \int e^{tx} f(x)\,dx$, when X is continuous;
   $M_X(t) = E(e^{tX}) = \sum_{\text{all possible values of } x} e^{tx} f(x)$, when X is discrete.
   For the normal distribution, $X \sim N(\mu, \sigma^2)$ with $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x-\mu)^2/(2\sigma^2)}$,
   \[
   M_X(t) = \int e^{tx} f(x)\,dx = e^{\mu t + \frac{1}{2}\sigma^2 t^2}.
   \]
4) Moments:
   1st (population) moment: $E(X) = \int x \cdot f(x)\,dx$
   2nd (population) moment: $E(X^2) = \int x^2 \cdot f(x)\,dx$
   ...
   kth (population) moment: $E(X^k) = \int x^k \cdot f(x)\,dx$

Theorem. If $X_1, X_2$ are independent, then $f_{X_1, X_2}(x_1, x_2) = f_{X_1}(x_1)\, f_{X_2}(x_2)$.

Theorem. If $X_1, X_2$ are independent, then $M_{X_1 + X_2}(t) = M_{X_1}(t)\, M_{X_2}(t)$.
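The normal mgf formula above invites a quick numerical sanity check. The following short Python sketch (not part of the original notes; the parameter values and sample size are arbitrary illustrative choices) simulates draws from $N(\mu, \sigma^2)$ and compares the empirical average of $e^{tX}$ with the closed form $e^{\mu t + \sigma^2 t^2/2}$.

```python
# A minimal Monte Carlo check (assumed example, not from the notes) that the
# sample average of exp(t*X) for X ~ N(mu, sigma^2) matches the closed-form
# mgf exp(mu*t + sigma^2 * t^2 / 2) derived above.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, t = 1.0, 2.0, 0.3          # illustrative values, chosen arbitrarily
x = rng.normal(mu, sigma, size=1_000_000)

mc_estimate = np.exp(t * x).mean()                      # E[e^{tX}] by simulation
closed_form = np.exp(mu * t + 0.5 * sigma**2 * t**2)    # normal mgf

print(f"Monte Carlo estimate of E[e^(tX)]      = {mc_estimate:.4f}")
print(f"Closed form exp(mu*t + sigma^2 t^2/2)  = {closed_form:.4f}")
```

The two printed numbers should agree to a couple of decimal places for a sample this large.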
Theorem. Under regularity conditions, there is a 1–1 correspondence between the pdf and the mgf of a given random variable X. That is,
\[
\text{pdf } f(x) \ \overset{1\text{-}1}{\longleftrightarrow}\ \text{mgf } M_X(t).
\]

e.g. Sgt. Jones wishes to select one army recruit into his unit. It is known that the IQ distribution of all recruits is normal with mean 180 and standard deviation 10. What is the chance that Sgt. Jones selects a recruit with an IQ of at least 200?

Solution. Let X represent the IQ of a randomly selected recruit, so $X \sim N(\mu = 180, \sigma = 10)$. Then
\[
P(X \ge 200) = P\!\left(\frac{X - 180}{10} \ge \frac{200 - 180}{10}\right) = P(Z \ge 2) \approx 2.28\%.
\]

e.g. Sgt. Jones wishes to select three army recruits into his unit. It is known that the IQ distribution of all recruits is normal with mean 180 and standard deviation 10. What is the chance that Sgt. Jones selects three recruits with an average IQ of at least 200?

Solution. Let $X_1, X_2, X_3$ represent the IQs of the three randomly selected recruits, with $X_i \overset{i.i.d.}{\sim} N(\mu, \sigma^2)$, $i = 1, 2, 3$, where $\mu = 180$ and $\sigma = 10$. Then $\bar{X} \sim N(\mu^* = 180, \ \sigma^* = 10/\sqrt{3})$ and
\[
P(\bar{X} \ge 200) = P\!\left(\frac{\bar{X} - 180}{10/\sqrt{3}} \ge \frac{200 - 180}{10/\sqrt{3}}\right) = P(Z \ge 2\sqrt{3}) \approx 0.03\%.
\]

e.g. (Another practical example of mathematical expectation.) A gambler goes to bet. The dealer has 3 fair dice, meaning that each face shows up with probability exactly 1/6. The dealer says: "You can bet on a number, any number from 1 to 6. Then I'll roll the 3 dice. If none shows the number you bet on, you lose $1. If one shows the number you bet on, you win $1. If two or three dice show the number you bet on, you win $3 or $5, respectively." Is it a fair game? (*A fair game is one in which the expected, or equivalently the long-run average, winning is zero.)

Solution. Let X be the number of dice showing the number you bet on. Then $X \sim \mathrm{Binomial}(3, \tfrac{1}{6})$, because the dealer rolls the 3 dice independently and each die shows your number with probability 1/6. That is,
\[
P(X = x) = \binom{3}{x}\left(\frac{1}{6}\right)^{x}\left(\frac{5}{6}\right)^{3-x}, \qquad x = 0, 1, 2, 3.
\]
Let Y be the amount of money you win. Then
\[
P(Y = -1) = P(X = 0) = \binom{3}{0}\left(\tfrac{1}{6}\right)^{0}\left(\tfrac{5}{6}\right)^{3} = \left(\tfrac{5}{6}\right)^{3} = \tfrac{125}{216},
\]
\[
P(Y = 1) = P(X = 1) = \binom{3}{1}\left(\tfrac{1}{6}\right)^{1}\left(\tfrac{5}{6}\right)^{2} = 3\cdot\tfrac{1}{6}\cdot\left(\tfrac{5}{6}\right)^{2} = \tfrac{75}{216},
\]
\[
P(Y = 3) = P(X = 2) = \binom{3}{2}\left(\tfrac{1}{6}\right)^{2}\left(\tfrac{5}{6}\right)^{1} = 3\cdot\left(\tfrac{1}{6}\right)^{2}\cdot\tfrac{5}{6} = \tfrac{15}{216},
\]
\[
P(Y = 5) = P(X = 3) = \binom{3}{3}\left(\tfrac{1}{6}\right)^{3}\left(\tfrac{5}{6}\right)^{0} = \left(\tfrac{1}{6}\right)^{3} = \tfrac{1}{216}.
\]
Thus your expected winning is
\[
E(Y) = \sum_{y} y\,P(Y = y) = (-1)\cdot\tfrac{125}{216} + 1\cdot\tfrac{75}{216} + 3\cdot\tfrac{15}{216} + 5\cdot\tfrac{1}{216} = 0.
\]
Therefore, it is a fair game.

*(3) Joint distribution and independence

Definition. The joint cdf of two random variables X and Y is defined as
\[
F_{X,Y}(x, y) = F(x, y) = P(X \le x, Y \le y).
\]

Definition. The joint pdf of two discrete random variables X and Y is defined as
\[
f_{X,Y}(x, y) = f(x, y) = P(X = x, Y = y).
\]

Definition. The joint pdf of two continuous random variables X and Y is defined as
\[
f_{X,Y}(x, y) = f(x, y) = \frac{\partial^2}{\partial x\,\partial y} F(x, y).
\]

Definition. The marginal pdf of the discrete random variable X or Y is obtained by summing the joint pdf:
\[
f_X(x) = \sum_{y} f(x, y); \qquad f_Y(y) = \sum_{x} f(x, y).
\]

Definition. The marginal pdf of the continuous random variable X or Y is obtained by integrating the joint pdf:
\[
f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy; \qquad f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx.
\]

Definition. The conditional pdf of a random variable X or Y is defined as
\[
f(x \mid y) = \frac{f(x, y)}{f_Y(y)}; \qquad f(y \mid x) = \frac{f(x, y)}{f_X(x)}.
\]

Definition. The joint moment generating function of two random variables X and Y is defined as
\[
M_{X,Y}(t_1, t_2) = E\!\left(e^{t_1 X + t_2 Y}\right).
\]
Note that we can obtain the marginal mgf of X or Y as follows:
\[
M_X(t_1) = M_{X,Y}(t_1, 0) = E\!\left(e^{t_1 X + 0\cdot Y}\right) = E\!\left(e^{t_1 X}\right); \qquad
M_Y(t_2) = M_{X,Y}(0, t_2) = E\!\left(e^{0\cdot X + t_2 Y}\right) = E\!\left(e^{t_2 Y}\right).
\]
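To make the marginal and conditional pdf definitions concrete, here is a tiny numerical illustration (not from the notes) using a hypothetical joint pmf for two discrete random variables; the specific probability values are made up for illustration only.

```python
# Not from the notes: numerical illustration of marginal and conditional pmf's
# obtained from a hypothetical discrete joint pmf.
import numpy as np

# Rows index the values of X, columns index the values of Y.
x_vals = np.array([0, 1])
y_vals = np.array([0, 1, 2])
joint = np.array([[0.10, 0.20, 0.10],     # P(X=0, Y=y)
                  [0.20, 0.30, 0.10]])    # P(X=1, Y=y)

f_X = joint.sum(axis=1)                    # marginal pmf of X: sum over y
f_Y = joint.sum(axis=0)                    # marginal pmf of Y: sum over x
cond_X_given_Y0 = joint[:, 0] / f_Y[0]     # f(x | y=0) = f(x, 0) / f_Y(0)

print("f_X(x)     =", f_X)
print("f_Y(y)     =", f_Y)
print("f(x | y=0) =", cond_X_given_Y0)
# Independence would require f(x, y) = f_X(x) * f_Y(y) for every cell:
print("independent?", np.allclose(joint, np.outer(f_X, f_Y)))
```

For this particular table the product of the marginals does not reproduce the joint pmf, so X and Y are not independent, which previews the theorem stated next.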
Theorem. Two random variables X and Y are independent ⇔ (if and only if)
\[
F_{X,Y}(x, y) = F_X(x)\,F_Y(y) \iff f_{X,Y}(x, y) = f_X(x)\,f_Y(y) \iff M_{X,Y}(t_1, t_2) = M_X(t_1)\,M_Y(t_2).
\]

Definition. The covariance of two random variables X and Y is defined as
\[
\mathrm{COV}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)].
\]

Theorem. If two random variables X and Y are independent, then $\mathrm{COV}(X, Y) = 0$. (*Note: However, $\mathrm{COV}(X, Y) = 0$ does not necessarily mean that X and Y are independent.)

Exercise

Q1. Let $X_1$ and $X_2$ be independent $N(\mu, \sigma^2)$. Prove that $X_1 + X_2$ and $X_1 - X_2$ are independent in two ways: (1) the pdf approach, (2) the mgf approach.

Solution:

(1) The pdf approach. Let
\[
W = X_1 + X_2, \qquad Z = X_1 - X_2.
\]
Then
\[
X_1 = \frac{W + Z}{2}, \qquad X_2 = \frac{W - Z}{2}.
\]
This is a 1–1 transformation between $(X_1, X_2)$ and $(W, Z)$. The Jacobian of the transformation is
\[
J = \begin{vmatrix} \dfrac{\partial x_1}{\partial w} & \dfrac{\partial x_1}{\partial z} \\[6pt] \dfrac{\partial x_2}{\partial w} & \dfrac{\partial x_2}{\partial z} \end{vmatrix}
  = \begin{vmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\[3pt] \tfrac{1}{2} & -\tfrac{1}{2} \end{vmatrix} = -\frac{1}{2}.
\]
The joint pdf of the new random variables is then
\[
f_{W,Z}(w, z) = f_{X_1, X_2}\!\left(\frac{w+z}{2}, \frac{w-z}{2}\right)|J|.
\]
Since $X_1$ and $X_2$ are independent, we have $f_{X_1, X_2}(x_1, x_2) = f_{X_1}(x_1)\,f_{X_2}(x_2)$. Thus
\[
f_{W,Z}(w, z) = f_{X_1}\!\left(\frac{w+z}{2}\right) f_{X_2}\!\left(\frac{w-z}{2}\right)|J|
= \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^{2}\frac{1}{2}\,
\exp\!\left[-\frac{1}{2\sigma^2}\left(\frac{w+z}{2}-\mu\right)^{2} - \frac{1}{2\sigma^2}\left(\frac{w-z}{2}-\mu\right)^{2}\right]
\]
\[
= \frac{1}{2}\cdot\frac{1}{2\pi\sigma^2}\,
\exp\!\left[-\frac{(w-2\mu)^2}{4\sigma^2} - \frac{z^2}{4\sigma^2}\right]
= \frac{1}{\sqrt{2\pi\cdot 2\sigma^2}}\,e^{-\frac{(w-2\mu)^2}{4\sigma^2}}
\cdot \frac{1}{\sqrt{2\pi\cdot 2\sigma^2}}\,e^{-\frac{z^2}{4\sigma^2}}
= f_W(w)\,f_Z(z).
\]
So W and Z are independent. Furthermore, we see that $X_1 + X_2 \sim N(2\mu, 2\sigma^2)$ and $X_1 - X_2 \sim N(0, 2\sigma^2)$.

(2) The mgf approach. We now prove that
\[
M_{X_1+X_2,\,X_1-X_2}(t_1, t_2) = M_{X_1+X_2}(t_1)\,M_{X_1-X_2}(t_2).
\]
Let $W = X_1 + X_2$ and $Z = X_1 - X_2$. W and Z are independent if and only if $M_{W,Z}(t_1, t_2) = M_W(t_1)\,M_Z(t_2)$. Substituting for W and Z,
\[
M_{W,Z}(t_1, t_2) = E\!\left(e^{t_1 W + t_2 Z}\right) = E\!\left(e^{t_1(X_1+X_2) + t_2(X_1-X_2)}\right) = E\!\left(e^{(t_1+t_2)X_1 + (t_1-t_2)X_2}\right).
\]
Because $X_1$ and $X_2$ are independent,
\[
E\!\left(e^{(t_1+t_2)X_1 + (t_1-t_2)X_2}\right) = E\!\left(e^{(t_1+t_2)X_1}\right)E\!\left(e^{(t_1-t_2)X_2}\right),
\]
where, by the normal mgf,
\[
E\!\left(e^{(t_1+t_2)X_1}\right) = e^{\mu(t_1+t_2) + \frac{1}{2}(t_1+t_2)^2\sigma^2}, \qquad
E\!\left(e^{(t_1-t_2)X_2}\right) = e^{\mu(t_1-t_2) + \frac{1}{2}(t_1-t_2)^2\sigma^2}.
\]
Therefore
\[
M_{W,Z}(t_1, t_2)
= \exp\!\left(\mu t_1 + \mu t_2 + \frac{\sigma^2 t_1^2}{2} + \sigma^2 t_1 t_2 + \frac{\sigma^2 t_2^2}{2}\right)
  \exp\!\left(\mu t_1 - \mu t_2 + \frac{\sigma^2 t_1^2}{2} - \sigma^2 t_1 t_2 + \frac{\sigma^2 t_2^2}{2}\right)
\]
\[
= \exp\!\left(2\mu t_1 + \frac{1}{2}(2\sigma^2)t_1^2 + \frac{1}{2}(2\sigma^2)t_2^2\right)
= e^{2\mu t_1 + \frac{1}{2}(2\sigma^2)t_1^2}\cdot e^{0\cdot t_2 + \frac{1}{2}(2\sigma^2)t_2^2}
= M_{X_1+X_2}(t_1)\,M_{X_1-X_2}(t_2) = M_W(t_1)\,M_Z(t_2).
\]
Therefore $M_{W,Z}(t_1, t_2) = M_W(t_1)\,M_Z(t_2)$, and hence $W = X_1 + X_2$ and $Z = X_1 - X_2$ are independent.

Q2. Let $Z \sim N(0, 1)$. (1) Derive the covariance between Z and $Z^2$. (2) Are Z and $Z^2$ independent?

Solution:

(1)
\[
\mathrm{cov}(Z, Z^2) = E\big[(Z - E(Z))(Z^2 - E(Z^2))\big] = E[(Z - 0)(Z^2 - 1)] = E[Z^3]
= \left.\frac{\partial^3}{\partial t^3}\exp\!\left(\tfrac{1}{2}t^2\right)\right|_{t=0} = 0.
\]

(2) Z and $Z^2$ are not independent. You can show this in many ways, using (a) the pdf, (b) the cdf/probabilities, or (c) the mgf. For example,
\[
P(Z > 1, Z^2 > 1) = P(Z > 1) \ \ne\ P(Z > 1)\,P(Z^2 > 1) = 2\,[P(Z > 1)]^2.
\]

Q3. Let $X \sim N(3, 4)$. Please calculate $P(1 < X < 3)$.

Solution:
\[
P(1 < X < 3) = P\!\left(\frac{1-3}{2} < \frac{X-3}{2} < \frac{3-3}{2}\right) = P(-1 < Z < 0) = P(0 < Z < 1) = 0.3413,
\]
where $Z \sim N(0, 1)$.

Q4. Students A and B plan to meet at the SAC Seawolf market between 12 noon and 1 pm tomorrow. The one who arrives first will wait for the other for 30 minutes, and then leave. What is the chance that the two friends will be able to meet at the SAC during their appointed time period, assuming that each one arrives at the SAC independently, each at a random time between 12 noon and 1 pm? (A short simulation sketch is given below; the analytic solution follows it.)
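Before the analytic solution, here is a quick Monte Carlo sketch (not part of the original notes) that estimates the meeting probability by drawing both arrival times uniformly on [0, 1] and checking whether they differ by at most half an hour.

```python
# Not from the notes: Monte Carlo estimate of the meeting probability in Q4.
# A and B arrive at independent Uniform[0, 1] times (in hours after noon);
# they meet if and only if |X - Y| <= 1/2 (a 30-minute wait).
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000
x = rng.uniform(0, 1, n)   # A's arrival time
y = rng.uniform(0, 1, n)   # B's arrival time

p_meet = np.mean(np.abs(x - y) <= 0.5)
print(f"Estimated P(meet) = {p_meet:.4f}   (exact value: 3/4 = 0.75)")
```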
Solution: The answer is 3/4, as illustrated by the unit-square figure in the notes. Let X and Y denote the arrival times (in hours, where 0 indicates 12 noon and 1 indicates 1 pm) of A and B, respectively. Our assumptions imply that X and Y each follow the Uniform[0, 1] distribution and, furthermore, that they are independent. Thus the joint pdf of X and Y is uniform on the unit square: f(x, y) = 1 if (x, y) is in the square region, and f(x, y) = 0 outside it. A and B will be able to meet if and only if $|X - Y| \le 1/2$, i.e., (X, Y) falls in the region M bounded by the lines $X - Y = -1/2$ and $X - Y = +1/2$. That is,
\[
P(\text{A and B will meet}) = P\!\left(|X - Y| \le \tfrac{1}{2}\right) = \iint_M 1\,dx\,dy = 1 - \tfrac{1}{8} - \tfrac{1}{8} = \tfrac{3}{4},
\]
where the two subtracted 1/8's are the areas of the two corner triangles of the square in which the pair fail to meet.

*Definitions: population correlation and sample correlation

Definition: The population correlation coefficient ρ is defined as
\[
\rho = \frac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}}.
\]

Definition: Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be a random sample from a given bivariate population. The sample correlation coefficient r is defined as
\[
r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\left[\sum (X_i - \bar{X})^2\right]\left[\sum (Y_i - \bar{Y})^2\right]}}.
\]

*Definition: Bivariate Normal Random Variable
$(X, Y) \sim BN(\mu_X, \sigma_X^2;\ \mu_Y, \sigma_Y^2;\ \rho)$, where ρ is the correlation between X and Y. The joint p.d.f. of (X, Y) is
\[
f(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}
\exp\!\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_X}{\sigma_X}\right)^2 + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2 - \frac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y}\right]\right\}.
\]

Exercise: Please derive the mgf of the bivariate normal distribution.

Q5. Let X and Y be random variables with joint pdf
\[
f_{X,Y}(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}
\exp\!\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_X}{\sigma_X}\right)^2 - 2\rho\left(\frac{x-\mu_X}{\sigma_X}\right)\left(\frac{y-\mu_Y}{\sigma_Y}\right) + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2\right]\right\},
\]
where $-\infty < x < \infty$, $-\infty < y < \infty$. Then X and Y are said to have the bivariate normal distribution. The joint moment generating function of X and Y is
\[
M(t_1, t_2) = \exp\!\left[t_1\mu_X + t_2\mu_Y + \tfrac{1}{2}\left(t_1^2\sigma_X^2 + 2\rho t_1 t_2 \sigma_X\sigma_Y + t_2^2\sigma_Y^2\right)\right].
\]
(a) Find the marginal pdf's of X and Y.
(b) Prove that X and Y are independent if and only if ρ = 0. (Here ρ is indeed the <population> correlation coefficient between X and Y.)
(c) Find the distribution of X + Y.
(d) Find the conditional pdf's f(x|y) and f(y|x).

Solution:

(a) The moment generating function of X is given by
\[
M_X(t) = M(t, 0) = \exp\!\left[\mu_X t + \tfrac{1}{2}\sigma_X^2 t^2\right].
\]
Similarly, the moment generating function of Y is given by
\[
M_Y(t) = M(0, t) = \exp\!\left[\mu_Y t + \tfrac{1}{2}\sigma_Y^2 t^2\right].
\]
Thus X and Y are both marginally normally distributed, i.e., $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$. The pdf of X is
\[
f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_X}\exp\!\left[-\frac{(x-\mu_X)^2}{2\sigma_X^2}\right],
\]
and the pdf of Y is
\[
f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma_Y}\exp\!\left[-\frac{(y-\mu_Y)^2}{2\sigma_Y^2}\right].
\]

(b) If ρ = 0, then
\[
M(t_1, t_2) = \exp\!\left[\mu_X t_1 + \mu_Y t_2 + \tfrac{1}{2}\left(\sigma_X^2 t_1^2 + \sigma_Y^2 t_2^2\right)\right] = M(t_1, 0)\cdot M(0, t_2).
\]
Therefore X and Y are independent. Conversely, if X and Y are independent, then
\[
M(t_1, t_2) = M(t_1, 0)\cdot M(0, t_2) = \exp\!\left[\mu_X t_1 + \mu_Y t_2 + \tfrac{1}{2}\left(\sigma_X^2 t_1^2 + \sigma_Y^2 t_2^2\right)\right],
\]
while by definition
\[
M(t_1, t_2) = \exp\!\left[\mu_X t_1 + \mu_Y t_2 + \tfrac{1}{2}\left(\sigma_X^2 t_1^2 + 2\rho\sigma_X\sigma_Y t_1 t_2 + \sigma_Y^2 t_2^2\right)\right].
\]
Therefore ρ = 0.

(c)
\[
M_{X+Y}(t) = E\!\left[e^{t(X+Y)}\right] = E\!\left[e^{tX + tY}\right].
\]
Recall that $M(t_1, t_2) = E\!\left[e^{t_1 X + t_2 Y}\right]$; therefore we can obtain $M_{X+Y}(t)$ by setting $t_1 = t_2 = t$ in $M(t_1, t_2)$. That is,
\[
M_{X+Y}(t) = M(t, t) = \exp\!\left[\mu_X t + \mu_Y t + \tfrac{1}{2}\left(\sigma_X^2 t^2 + 2\rho\sigma_X\sigma_Y t^2 + \sigma_Y^2 t^2\right)\right]
= \exp\!\left[(\mu_X + \mu_Y)t + \tfrac{1}{2}\left(\sigma_X^2 + 2\rho\sigma_X\sigma_Y + \sigma_Y^2\right)t^2\right].
\]
Therefore $X + Y \sim N\!\left(\mu = \mu_X + \mu_Y,\ \sigma^2 = \sigma_X^2 + 2\rho\sigma_X\sigma_Y + \sigma_Y^2\right)$.

(d) The conditional distribution of X given Y = y is given by
\[
f(x \mid y) = \frac{f(x, y)}{f_Y(y)} = \frac{1}{\sqrt{2\pi}\,\sigma_X\sqrt{1-\rho^2}}
\exp\!\left\{-\frac{\left[x - \mu_X - \frac{\sigma_X}{\sigma_Y}\rho(y - \mu_Y)\right]^2}{2(1-\rho^2)\sigma_X^2}\right\}.
\]
Similarly, the conditional distribution of Y given X = x is
\[
f(y \mid x) = \frac{f(x, y)}{f_X(x)} = \frac{1}{\sqrt{2\pi}\,\sigma_Y\sqrt{1-\rho^2}}
\exp\!\left\{-\frac{\left[y - \mu_Y - \frac{\sigma_Y}{\sigma_X}\rho(x - \mu_X)\right]^2}{2(1-\rho^2)\sigma_Y^2}\right\}.
\]
Therefore:
\[
X \mid Y = y \ \sim\ N\!\left(\mu_X + \rho\frac{\sigma_X}{\sigma_Y}(y - \mu_Y),\ (1-\rho^2)\sigma_X^2\right),
\qquad
Y \mid X = x \ \sim\ N\!\left(\mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X),\ (1-\rho^2)\sigma_Y^2\right).
\]
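The results in (c) and (d) are easy to check numerically. The sketch below (not from the notes; all parameter values and the conditioning value x0 are arbitrary illustrative choices) simulates a bivariate normal sample and compares the empirical mean and variance of X + Y, and a crude estimate of E(Y | X ≈ x0), with the formulas derived above.

```python
# Not from the notes: simulation check of two bivariate-normal results above.
import numpy as np

rng = np.random.default_rng(1)
mu_x, mu_y = 1.0, -2.0
sig_x, sig_y, rho = 2.0, 3.0, 0.6        # illustrative parameters

cov = np.array([[sig_x**2,            rho * sig_x * sig_y],
                [rho * sig_x * sig_y, sig_y**2           ]])
xy = rng.multivariate_normal([mu_x, mu_y], cov, size=500_000)
x, y = xy[:, 0], xy[:, 1]

# (c) X + Y has mean mu_x + mu_y and variance sig_x^2 + 2*rho*sig_x*sig_y + sig_y^2.
s = x + y
print("mean of X+Y :", round(s.mean(), 3), "vs", mu_x + mu_y)
print("var  of X+Y :", round(s.var(), 3), "vs", sig_x**2 + 2*rho*sig_x*sig_y + sig_y**2)

# (d) E(Y | X = x0) should be close to mu_y + rho*(sig_y/sig_x)*(x0 - mu_x).
x0 = 2.0                                  # hypothetical conditioning value
near = np.abs(x - x0) < 0.05              # crude "X is approximately x0" slice
print("E(Y | X~x0) :", round(y[near].mean(), 3),
      "vs", mu_y + rho * (sig_y / sig_x) * (x0 - mu_x))
```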
(4) Sampling from the Normal Population

*Example: We wish to estimate the distribution of heights of adult US males. It is believed that the height of an adult US male follows a normal distribution $N(\mu, \sigma^2)$.

Def. Simple random sample: a sample in which every subject in the population has the same chance of being selected.

Let X be the random variable denoting the height of an adult male chosen at random from the population. Then $X \sim N(\mu, \sigma^2)$: the distribution of a randomly selected subject is the population distribution.

Theorem 1 (Sampling from the normal population). Let $X_1, X_2, \ldots, X_n \overset{i.i.d.}{\sim} N(\mu, \sigma^2)$, where i.i.d. stands for independent and identically distributed. Then:

1. $\bar{X} \sim N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$.

2. A function of the sample variance, as shown below, follows the chi-square distribution with (n − 1) degrees of freedom:
\[
\frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{\sigma^2} \sim \chi^2_{n-1}.
\]
*Reminder: the sample variance $S^2$ is defined as
\[
S^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}.
\]
*Def 1: The chi-square distribution is a special gamma distribution. (*** Please find out which one it is.)
*Def 2: Let $Z_1, Z_2, \ldots, Z_k \overset{i.i.d.}{\sim} N(0, 1)$. Then $W = \sum_{i=1}^k Z_i^2 \sim \chi^2_k$.

3. $\bar{X}$ and $S^2$ are independent.

4. The Z-score transformation of the sample mean follows the standard normal distribution:
\[
Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).
\]

5. A t random variable with n − 1 degrees of freedom is obtained by replacing the population standard deviation with its sample counterpart in the Z-score above:
\[
T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}.
\]

*Def. of the t-distribution (Student's t-distribution, first introduced by William Sealy Gosset): Let $Z \sim N(0, 1)$ and $W \sim \chi^2_k$, where Z and W are independent. Then
\[
T = \frac{Z}{\sqrt{W/k}} \sim t_k.
\]

Wiki: William Sealy Gosset (June 13, 1876 – October 16, 1937) is best known by his pen name Student and for the Student's t-distribution. [Photo of W. S. Gosset.]

(5) Point Estimators, MLE & MOME

Point Estimators

Example 1. Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$. Please find a good point estimator for
1. μ
2. σ²

Solutions. 1. $\hat{\mu} = \bar{X}$; 2. $\hat{\sigma}^2 = S^2$.

These are the typical estimators for μ and σ². Both are unbiased estimators.

Property of Point Estimators

Unbiased estimators. $\hat{\theta}$ is said to be an unbiased estimator for θ if $E(\hat{\theta}) = \theta$.
\[
E(\bar{X}) = E\!\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right) = \frac{E(X_1) + E(X_2) + \cdots + E(X_n)}{n} = \frac{\mu + \mu + \cdots + \mu}{n} = \mu.
\]
\[
E(S^2) = \sigma^2. \quad (\text{*Please make sure you can prove this on your own; we discussed it before.})
\]

An unbiased estimator may not be unique.

Example 2. Since $E\!\left[\sum a_i X_i\right] = \left(\sum a_i\right)\mu$, let $\tilde{\mu} = \dfrac{\sum a_i X_i}{\sum a_i}$; it is unbiased for μ because $E(\tilde{\mu}) = \mu$.

Variance of unbiased estimators:
\[
\mathrm{var}(\bar{X}) = \frac{\sigma^2}{n}, \qquad \mathrm{var}(X_i) = \sigma^2
\quad\Rightarrow\quad \mathrm{var}(\bar{X}) < \mathrm{var}(X_i) \text{ when } n > 1.
\]

Methods for deriving point estimators:
1. Maximum Likelihood Estimator (MLE)
2. Method of Moments Estimator (MOME)

Example 1 (continued). $X_i \overset{i.i.d.}{\sim} N(\mu, \sigma^2)$, i = 1, 2, ..., n.
1. Derive the MLE for μ and σ².
2. Derive the MOME for μ and σ².

Solution.

1. MLE

[i] pdf:
\[
f(x_i) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x_i - \mu)^2}{2\sigma^2}}, \qquad x_i \in \mathbb{R},\ i = 1, \ldots, n.
\]
[ii] Likelihood function:
\[
L = f(x_1, x_2, \ldots, x_n) = \prod_{i=1}^n f(x_i) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}
= (2\pi\sigma^2)^{-\frac{n}{2}} \exp\!\left(-\frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2}\right).
\]
[iii] Log-likelihood function:
\[
l = \ln L = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2}
= -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) - \frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2}.
\]
[iv] Find the MLE's by setting the partial derivatives to zero:
\[
\frac{\partial l}{\partial \mu} = \frac{\sum_{i=1}^n (x_i - \mu)}{\sigma^2} = 0, \qquad
\frac{\partial l}{\partial \sigma^2} = -\frac{n}{2}\frac{1}{\sigma^2} + \frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^4} = 0
\]
\[
\Rightarrow\quad \hat{\mu} = \bar{X}, \qquad \hat{\sigma}^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n}.
\]
(A numerical illustration of this maximization is given below.)

R. A. Fisher (http://en.wikipedia.org/wiki/Ronald_Fisher) established the comprehensive theory of the maximum likelihood method. [Photo of R. A. Fisher.]
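As a numerical check (not part of the original notes; the true parameters, sample size, and optimizer choice are arbitrary assumptions for illustration), the following Python sketch maximizes the normal log-likelihood numerically and compares the result with the closed-form MLEs $\bar{X}$ and $\sum (X_i - \bar{X})^2 / n$ derived above.

```python
# Not from the notes: numerical check that maximizing the normal log-likelihood
# reproduces the closed-form MLEs  mu_hat = x_bar  and  sigma2_hat = mean((x - x_bar)^2).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(3)
x = rng.normal(loc=5.0, scale=2.0, size=200)    # simulated sample (true mu=5, sigma=2)

def neg_log_lik(params):
    mu, log_sigma = params                       # optimize log(sigma) so sigma stays positive
    return -np.sum(norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)))

fit = minimize(neg_log_lik, x0=[0.0, 0.0], method="Nelder-Mead")
mu_hat, sigma2_hat = fit.x[0], np.exp(fit.x[1]) ** 2

print("numerical MLE (mu, sigma^2):", mu_hat, sigma2_hat)
print("closed form   (mu, sigma^2):", x.mean(), np.mean((x - x.mean()) ** 2))
```

The numerically optimized values should agree with the closed-form estimates up to the optimizer's tolerance.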
2. MOME

The method of moments matches population moments with their sample counterparts:

Order   Population moment   Sample moment
1st     $E(X)$              $\dfrac{X_1 + X_2 + \cdots + X_n}{n}$
2nd     $E(X^2)$            $\dfrac{X_1^2 + X_2^2 + \cdots + X_n^2}{n}$
...     ...                 ...
kth     $E(X^k)$            $\dfrac{X_1^k + X_2^k + \cdots + X_n^k}{n}$

Back to this example: we have two parameters, so we set up the equations for the first two moments:
\[
E(X) = \mu = \bar{X}, \qquad E(X^2) = \mu^2 + \sigma^2 = \frac{X_1^2 + X_2^2 + \cdots + X_n^2}{n}
\]
\[
\Rightarrow\quad \hat{\mu} = \bar{X}, \qquad \hat{\sigma}^2 = \frac{X_1^2 + X_2^2 + \cdots + X_n^2}{n} - (\bar{X})^2.
\]

Proof. (Now we show that the MOME and the MLE of σ² are the same for this normal-population example.)
\[
\frac{\sum_{i=1}^n X_i^2}{n} - (\bar{X})^2
= \frac{\sum_{i=1}^n (X_i - \bar{X} + \bar{X})^2}{n} - (\bar{X})^2
= \frac{\sum_{i=1}^n (X_i - \bar{X})^2 + \sum_{i=1}^n 2\bar{X}(X_i - \bar{X}) + \sum_{i=1}^n \bar{X}^2}{n} - (\bar{X})^2
= \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n},
\]
since $\sum_{i=1}^n (X_i - \bar{X}) = 0$ and $\sum_{i=1}^n \bar{X}^2 = n\bar{X}^2$. Therefore the MLE and the MOME of σ² are the same for the normal population. Moreover,
\[
E(\hat{\sigma}^2) = E\!\left(\frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n}\right)
= \frac{n-1}{n}\,E\!\left(\frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}\right)
= \frac{n-1}{n}\,E(S^2) = \frac{n-1}{n}\,\sigma^2 \ \xrightarrow{\ n \to \infty\ }\ \sigma^2,
\]
so $\hat{\sigma}^2$ is asymptotically unbiased.

Karl Pearson developed the MOME method (http://en.wikipedia.org/wiki/Karl_pearson). [Photo, from left to right: Egon Sharpe Pearson, Karl Pearson, Maria Pearson (née Sharpe), Sigrid Letitia Sharpe Pearson.]

Example 3. Let $X_i \overset{i.i.d.}{\sim} \mathrm{Bernoulli}(p)$, i = 1, ..., n. Please derive
1. The MLE of p
2. The MOME of p.

Solution.

1. MLE
[i] $f(x_i) = p^{x_i}(1-p)^{1-x_i}$, $x_i = 0, 1$, i = 1, ..., n.
[ii] $L = p^{\sum x_i}(1-p)^{\,n - \sum x_i}$
[iii] $l = \ln L = \left(\sum x_i\right)\ln p + \left(n - \sum x_i\right)\ln(1-p)$
[iv] Solving
\[
\frac{dl}{dp} = \frac{\sum x_i}{p} - \frac{n - \sum x_i}{1-p} = 0,
\]
we obtain the MLE: $\hat{p} = \dfrac{\sum_{i=1}^n X_i}{n}$.

2. MOME
Since there is only one population parameter, we need to set up only one equation; the equation for the first moment suffices:
\[
E(X) = p = \frac{\sum_{i=1}^n X_i}{n}
\quad\Rightarrow\quad \text{the MOME is } \hat{p} = \frac{X_1 + X_2 + \cdots + X_n}{n}.
\]

Example 4. Let $X_1, \ldots, X_n$ be a random sample from $\exp(\lambda)$. Please derive
1. The MLE of λ
2. The MOME of λ.
(A short simulation check follows the solution.)

Solution:

1. MLE: $f(x) = \lambda\exp(-\lambda x)$, $x > 0$.
\[
L = \prod_{i=1}^n f(x_i) = \prod_{i=1}^n \lambda\exp(-\lambda x_i) = \lambda^n \exp\!\left(-\lambda \sum_{i=1}^n x_i\right),
\qquad
l = \ln L = n\ln\lambda - \lambda\sum_{i=1}^n x_i,
\qquad
\frac{\partial l}{\partial \lambda} = \frac{n}{\lambda} - \sum_{i=1}^n x_i = 0.
\]
Thus the MLE is $\hat{\lambda} = \dfrac{1}{\bar{X}}$.

2. MOME: $E(X) = \dfrac{1}{\lambda}$. Setting $E(X) = \dfrac{1}{\lambda} = \bar{X}$, we obtain the MOME: $\hat{\lambda} = \dfrac{1}{\bar{X}}$.
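A quick simulation check of Example 4 (not from the notes; the true rate and sample size below are arbitrary illustrative choices): generate exponential data with a known rate and verify that $1/\bar{X}$ recovers it.

```python
# Not from the notes: simulation check of the exponential MLE/MOME  lambda_hat = 1 / X_bar.
import numpy as np

rng = np.random.default_rng(7)
true_lambda = 0.5
n = 10_000
x = rng.exponential(scale=1.0 / true_lambda, size=n)   # NumPy parameterizes by scale = 1/lambda

lambda_hat = 1.0 / x.mean()
print(f"true lambda      = {true_lambda}")
print(f"estimated lambda = {lambda_hat:.4f}   (MLE = MOME = 1 / sample mean)")
```

With n = 10,000 the estimate typically lands within about 1–2% of the true rate, consistent with the estimator being both the MLE and the MOME.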