Probability & Statistics
Professor Wei Zhu
July 24th

(I) The sum of two normal random variables may not be normal.

Counter example: Let Z ~ N(0, 1), and let the value of the random variable W depend on a fair coin flip:

W = Z if the flip is heads, and W = −Z if the flip is tails.

(a) Find the distribution of W.
(b) Is W + Z normal?
(c) Is the joint distribution of Z and W a bivariate normal?

Answer:

(a)
F_W(w) = P(W ≤ w)
       = P(W ≤ w | H)P(H) + P(W ≤ w | T)P(T)
       = P(Z ≤ w)·(1/2) + P(−Z ≤ w)·(1/2)
       = F_Z(w)·(1/2) + P(Z ≥ −w)·(1/2)
       = F_Z(w)·(1/2) + P(Z ≤ w)·(1/2)        (by the symmetry of N(0, 1))
       = F_Z(w)·(1/2) + F_Z(w)·(1/2)
       = F_Z(w).
So W ~ N(0, 1).

(b)
P(W + Z = 0) = P(W + Z = 0 | H)P(H) + P(W + Z = 0 | T)P(T)
             = 0·(1/2) + 1·(1/2)
             = 1/2,
since given heads W + Z = 2Z, which equals 0 with probability 0, while given tails W + Z = 0 with probability 1. A normal random variable cannot place probability 1/2 on a single point, so W + Z is not a normal random variable.

(c)
The joint distribution of Z and W is not bivariate normal. This can be shown by deriving the joint mgf of Z and W and comparing it with the joint mgf of the bivariate normal (which we already know from the previous lecture):

M(t1, t2) = E(e^{t1 Z + t2 W})
          = E[ E(e^{t1 Z + t2 W} | outcome of coin flip) ]
          = E(e^{t1 Z + t2 W} | H)P(H) + E(e^{t1 Z + t2 W} | T)P(T)
          = E(e^{(t1 + t2)Z})·(1/2) + E(e^{(t1 − t2)Z})·(1/2)
          = (1/2)( e^{(t1 + t2)²/2} + e^{(t1 − t2)²/2} ).

This is not the joint mgf of a bivariate normal. Therefore the joint distribution of Z and W is not bivariate normal. (Part (b) already implies this: if (Z, W) were bivariate normal, every linear combination, including W + Z, would be normal.)

(II) Other Common Univariate Distributions

Dear students, besides the Normal, Bernoulli and Binomial distributions, the following distributions are also very important in our studies.

1. Discrete Distributions

1.1. Geometric distribution

This is a discrete waiting-time distribution. Suppose a sequence of independent Bernoulli trials is performed and let X be the number of failures preceding the first success. Then X ~ Geom(p), with pdf

f(x) = P(X = x) = p(1 − p)^x, x = 0, 1, 2, …

1.2. Negative Binomial distribution

Suppose a sequence of independent Bernoulli trials is conducted. If X is the number of failures preceding the nth success, then X has a negative binomial distribution.

Probability density function:

f(x) = P(X = x) = C(n + x − 1, n − 1) p^n (1 − p)^x, x = 0, 1, 2, …,

where the binomial coefficient C(n + x − 1, n − 1) counts the ways of allocating the failures and successes among the first n + x − 1 trials.

E(X) = n(1 − p)/p,  Var(X) = n(1 − p)/p²,  M_X(t) = [ p / (1 − e^t (1 − p)) ]^n.

1. If n = 1, we obtain the geometric distribution.
2. X can also be seen to arise as the sum of n independent geometric variables.

1.3. Poisson distribution

Parameter: rate λ > 0.
MGF: M_X(t) = e^{λ(e^t − 1)}.
Probability density function:

f(x) = P(X = x) = e^{−λ} λ^x / x!, x = 0, 1, 2, …

1. The Poisson distribution arises as the distribution of the number of "point events" observed from a Poisson process. Examples:

[Figure 1: Poisson Example]

2. The Poisson distribution also arises as the limiting form of the binomial distribution: n → ∞, p → 0, with np → λ. (A numerical check of this limit is sketched at the end of this subsection.)

The derivation of the Poisson distribution (via the binomial) is underpinned by a Poisson process, i.e., a point process on [0, ∞); see Figure 1.

AXIOMS for a Poisson process of rate λ > 0:
(A) The numbers of occurrences in disjoint intervals are independent.
(B) The probability of 1 or more occurrences in any sub-interval [t, t + h) is λh + o(h) as h → 0 (to first order, this probability equals λ times the length of the interval).
(C) The probability of more than one occurrence in [t, t + h) is o(h) as h → 0 (i.e., this probability is negligible).

Note: o(h), pronounced "small order h", is standard notation for any function o(h) with the property

lim_{h → 0} o(h)/h = 0.
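The binomial-to-Poisson limit in point 2 above can be checked numerically. Below is a minimal sketch in Python (not part of the original notes; it assumes NumPy and SciPy are available, and the parameter values are chosen only for illustration). It compares the Binomial(n, λ/n) pmf with the Poisson(λ) pmf as n grows.

```python
# A minimal numerical check of the Poisson limit of the binomial:
# with np = lam held fixed, Binomial(n, lam/n) -> Poisson(lam) as n -> infinity.
import numpy as np
from scipy import stats

lam = 3.0                      # fixed rate lam = n * p (illustrative value)
ks = np.arange(0, 15)          # support points at which to compare the pmfs
poisson_pmf = stats.poisson.pmf(ks, lam)

for n in (10, 100, 1000):
    binom_pmf = stats.binom.pmf(ks, n, lam / n)
    max_gap = np.max(np.abs(binom_pmf - poisson_pmf))
    print(f"n = {n:4d}: max |Binomial - Poisson| pmf difference = {max_gap:.5f}")
# The maximum difference shrinks toward 0 as n grows, illustrating the limiting form.
```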
1.4. Hypergeometric distribution

Consider an urn containing M black and N white balls. Suppose n balls are sampled randomly without replacement and let X be the number of black balls chosen. Then X has a hypergeometric distribution.

Parameters: M, N > 0, 0 < n ≤ M + N.
Possible values: max(0, n − N) ≤ x ≤ min(n, M).

Probability density function:

f(x) = P(X = x) = C(M, x) C(N, n − x) / C(M + N, n),

E(X) = n · M/(M + N),  Var(X) = n · [(M + N − n)/(M + N − 1)] · MN/(M + N)².

The mgf exists, but there is no useful expression available.

1. The hypergeometric probability is simply

 (# samples with x black balls) / (# possible samples) = C(M, x) C(N, n − x) / C(M + N, n).

2. To see how the limits arise, observe that we must have x ≤ n (no more black balls than the sample size) and x ≤ M, i.e., x ≤ min(n, M). Similarly, we must have x ≥ 0 (we cannot have fewer than 0 black balls in the sample) and n − x ≤ N (we cannot have more white balls than the number in the urn), i.e., x ≥ n − N, i.e., x ≥ max(0, n − N).

3. If we sample with replacement, we would get X ~ B(n, p = M/(M + N)). It is interesting to compare moments:

 hypergeometric: E(X) = np,  Var(X) = np(1 − p) · (M + N − n)/(M + N − 1)
 binomial:       E(X) = np,  Var(X) = np(1 − p)

The factor (M + N − n)/(M + N − 1) is the finite population correction; when we sample all the balls in the urn (n = M + N), Var(X) = 0.

2. Continuous Distributions

2.1. Uniform Distribution

For X ~ U(a, b), the CDF for a < x < b is

F(x) = ∫_{−∞}^{x} f(u) du = ∫_{a}^{x} 1/(b − a) du = (x − a)/(b − a),

that is,

F(x) = 0 for x ≤ a,  F(x) = (x − a)/(b − a) for a < x < b,  F(x) = 1 for b ≤ x.

M_X(t) = (e^{tb} − e^{ta}) / (t(b − a)).

A special case is the U(0, 1) distribution:

f(x) = 1 for 0 < x < 1 and 0 otherwise,  F(x) = x for 0 < x < 1,
E(X) = 1/2,  Var(X) = 1/12,  M(t) = (e^t − 1)/t.

2.2. Exponential Distribution

PDF: f(x) = λ e^{−λx}, x ≥ 0, λ > 0.
CDF: F(x) = 1 − e^{−λx}, x ≥ 0.
MGF: M_X(t) = λ/(λ − t), t < λ.

This is the distribution of the waiting time until the first occurrence in a Poisson process with rate parameter λ > 0.

Exercise: Show that in a Poisson process with rate λ, the inter-arrival times (the times between two adjacent events) are i.i.d. Exponential(λ).

1. If X ~ Exp(λ), then P(X ≥ a + b | X ≥ a) = P(X ≥ b) (memoryless property).
2. The exponential can be obtained as the limiting form of the geometric distribution.

2.3. Gamma distribution

f(x) = (λ^α / Γ(α)) x^{α − 1} e^{−λx}, α > 0, λ > 0, x ≥ 0,

with the gamma function Γ(α) = ∫_{0}^{∞} t^{α − 1} e^{−t} dt.

mgf: M_X(t) = (λ/(λ − t))^α, t < λ.

Suppose Y1, …, Yk are independent Exp(λ) random variables and let X = Y1 + ⋯ + Yk. Then X ~ Gamma(k, λ) for integer k (a simulation sketch of this characterization appears after Section 2.5). In general, X ~ Gamma(α, λ), α > 0.

1. α is the shape parameter and λ sets the scale: if X ~ Gamma(α, 1) and Y = X/λ, then Y ~ Gamma(α, λ). That is, λ acts as a scale parameter (1/λ is the scale of the distribution).

[Figure 2: Gamma Distribution]

2. The Gamma(n/2, 1/2) distribution is also called the χ²_n distribution (chi-square with n degrees of freedom) for integer n; χ²_2 is the exponential distribution with λ = 1/2.

3. The Gamma(k, λ) distribution can be interpreted as the waiting time until the k-th occurrence in a Poisson process.

2.4. Beta distribution

Suppose Y1 ~ Gamma(α, λ) and Y2 ~ Gamma(β, λ) independently. Then

X = Y1/(Y1 + Y2) ~ Beta(α, β), 0 ≤ x ≤ 1.

Remark: the derivation will be given soon!

2.5. Standard Cauchy distribution

Possible values: x ∈ R.
PDF: f(x) = 1/(π(1 + x²)) (location parameter θ = 0).
CDF: F(x) = 1/2 + (1/π) arctan x.
E(X), Var(X), and M_X(t) do not exist.

The Cauchy is a bell-shaped distribution symmetric about zero for which no moments are defined. If Z1 ~ N(0, 1) and Z2 ~ N(0, 1) independently, then X = Z1/Z2 follows a Cauchy distribution.
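As a quick illustration of the sum-of-exponentials characterization of the Gamma distribution from Section 2.3, here is a minimal Monte Carlo sketch in Python (assuming NumPy and SciPy; the sample size and the values of k and λ are arbitrary illustrative choices, not from the notes). It compares simulated sums of k i.i.d. Exp(λ) variables with the Gamma(k, λ) distribution via a Kolmogorov–Smirnov test and the theoretical moments k/λ and k/λ².

```python
# Monte Carlo check: the sum of k iid Exp(lam) variables follows Gamma(k, lam),
# i.e. the waiting time until the k-th occurrence in a Poisson process of rate lam.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
lam, k, n_sim = 0.5, 4, 100_000          # illustrative parameter choices

# Each row holds one realization of (Y1, ..., Yk); summing across a row gives X.
y = rng.exponential(scale=1.0 / lam, size=(n_sim, k))
x = y.sum(axis=1)

# Compare the simulated sample with Gamma(shape=k, loc=0, scale=1/lam).
ks_stat, p_value = stats.kstest(x, "gamma", args=(k, 0, 1.0 / lam))
print(f"KS statistic = {ks_stat:.4f}, p-value = {p_value:.3f}")
print(f"sample mean = {x.mean():.3f}  vs  k/lam   = {k / lam:.3f}")
print(f"sample var  = {x.var():.3f}  vs  k/lam^2 = {k / lam**2:.3f}")
```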
(III) Let's Have Some Fun! The Post Office

A post office has two clerks, Jack and Jill. It is known that their service times are two independent exponential random variables with the same parameter λ, and that Jack spends on average 10 minutes with each customer. Suppose Jack and Jill are each serving one customer at the moment.

(a) What is the distribution of the waiting time of the next customer in line?
(b) What is the probability that the next customer in line will be the last customer (among the two customers being served and him/herself) to leave the post office?

Solution:

(a) Let X1 and X2 denote the service times of Jack and Jill, respectively. Then the waiting time of the next customer in line is Y = min(X1, X2), where the Xi are i.i.d. Exp(λ), i = 1, 2. Furthermore, E(X1) = 1/λ = 10, so λ = 1/10.

From the definition of an exponential random variable,

f_{Xi}(x) = λ e^{−λx}, x > 0, and P(Xi > x) = ∫_{x}^{∞} λ e^{−λu} du = e^{−λx}, x > 0, i = 1, 2.

F_Y(y) = P(Y ≤ y) = 1 − P(Y > y) = 1 − P(X1 > y, X2 > y) = 1 − P(X1 > y)·P(X2 > y) = 1 − (e^{−λy})² = 1 − e^{−2λy}.

Therefore f_Y(y) = (d/dy) F_Y(y) = 2λ e^{−2λy}, y > 0. Thus Y ~ Exp(2λ), where λ = 1/10.

Note: Because of the memoryless property of the exponential distribution, it does not matter how long the current customers have already been served by Jack or Jill.

(b) One of the two customers being served right now will leave first. Then, between the customer who is still being served and the next customer in line, the remaining service times follow the same exponential distribution because of the memoryless property of the exponential random variable. Therefore, each has the same chance of finishing first or last. That is, the next customer in line will be the last customer to leave with probability 0.5.

Exercise.

1. Let X be a random variable following the Bernoulli distribution with success probability p. Please
(1) write down the probability density function of X,
(2) derive its moment generating function,
(3) derive the moment generating function of a B(n, p) (Binomial with parameters n and p) random variable,
(4) derive that the sum of n i.i.d. Bernoulli(p) random variables follows B(n, p).

2. Let X ~ N(μ1, σ1²) and Y ~ N(μ2, σ2²). Furthermore, X and Y are independent of each other. Please derive the distribution of
(1) X + Y
(2) X − Y
(3) 2X − 3Y + 5

Solutions:

Q1.

(1) f(x) = p^x (1 − p)^{1 − x}, x = 0 or 1.

(2) M_X(t) = E(e^{tX}) = e^{0·t} P(X = 0) + e^{1·t} P(X = 1) = 1 − p + p e^t.

(3) Assume Y ~ B(n, p). Then

M_Y(t) = E(e^{tY}) = Σ_{k=0}^{n} e^{tk} P(Y = k) = Σ_{k=0}^{n} e^{tk} C(n, k) p^k (1 − p)^{n − k}
       = Σ_{k=0}^{n} C(n, k) (p e^t)^k (1 − p)^{n − k} = (1 − p + p e^t)^n.

(4) Assume X1, X2, …, Xn ~ Bernoulli(p) i.i.d. and let Y = Σ_{i=1}^{n} Xi. Then

M_Y(t) = E(e^{tY}) = E(e^{t Σ_{i=1}^{n} Xi}) = Π_{i=1}^{n} E(e^{t Xi}) = [E(e^{t X1})]^n   (since the Xi are i.i.d.)
       = (1 − p + p e^t)^n   (from the conclusion of part (2)).

Since a probability distribution can be identified by its mgf, comparing the results of parts (3) and (4) we conclude that the sum of n i.i.d. Bernoulli(p) random variables follows B(n, p).
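The conclusion of Exercise 1, part (4), can also be checked by simulation. The following sketch (Python with NumPy/SciPy; n, p, and the number of replications are illustrative choices, not from the notes) compares the empirical distribution of the sum of n i.i.d. Bernoulli(p) draws with the Binomial(n, p) pmf.

```python
# Simulation check for Exercise 1(4): a sum of n iid Bernoulli(p) draws
# is distributed as Binomial(n, p).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p, n_sim = 10, 0.3, 200_000           # illustrative parameter choices

# rng.binomial(1, p, ...) produces Bernoulli(p) draws; sum each row of n draws.
sums = rng.binomial(1, p, size=(n_sim, n)).sum(axis=1)
support = np.arange(n + 1)
empirical = np.bincount(sums, minlength=n + 1) / n_sim
theoretical = stats.binom.pmf(support, n, p)

print(" k  empirical  Binomial pmf")
for k in support:
    print(f"{k:2d}   {empirical[k]:.4f}     {theoretical[k]:.4f}")
```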
Q2.

First derive the mgf of a normal random variable. If Z ~ N(μ, σ²), then

M_Z(t) = E(e^{tZ}) = ∫_{−∞}^{∞} e^{tz} · (1/(√(2π) σ)) e^{−(z − μ)²/(2σ²)} dz
       = ∫_{−∞}^{∞} (1/(√(2π) σ)) e^{−[z² − 2(μ + σ²t)z + μ²]/(2σ²)} dz
       = exp(μt + σ²t²/2) ∫_{−∞}^{∞} (1/(√(2π) σ)) e^{−[z − (μ + σ²t)]²/(2σ²)} dz
       = exp(μt + σ²t²/2) · 1
       = exp(μt + σ²t²/2),

since (1/(√(2π) σ)) e^{−[z − (μ + σ²t)]²/(2σ²)} is the pdf of N(μ + σ²t, σ²) and therefore integrates to 1.

Then we have

M_X(t) = exp(μ1 t + σ1² t²/2),  M_Y(t) = exp(μ2 t + σ2² t²/2).

Since X and Y are independent, by theorem we have:

(1) M_{X+Y}(t) = M_X(t) M_Y(t) = exp[(μ1 + μ2)t + (σ1² + σ2²)t²/2]
⇒ X + Y ~ N(μ1 + μ2, σ1² + σ2²).

(2) M_{X−Y}(t) = M_X(t) M_Y(−t) = exp[(μ1 − μ2)t + (σ1² + σ2²)t²/2]
⇒ X − Y ~ N(μ1 − μ2, σ1² + σ2²).

(3) M_{2X−3Y+5}(t) = M_X(2t) M_Y(−3t) · e^{5t} = exp[(2μ1 − 3μ2 + 5)t + (4σ1² + 9σ2²)t²/2]
⇒ 2X − 3Y + 5 ~ N(2μ1 − 3μ2 + 5, 4σ1² + 9σ2²).
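Exercise 2, part (3), can likewise be verified by simulation. The sketch below (Python with NumPy; the values of μ1, σ1, μ2, σ2 and the number of replications are illustrative assumptions, not from the notes) checks that the sample mean and variance of 2X − 3Y + 5 match 2μ1 − 3μ2 + 5 and 4σ1² + 9σ2².

```python
# Simulation check for Exercise 2(3): with X ~ N(mu1, s1^2) and Y ~ N(mu2, s2^2)
# independent, 2X - 3Y + 5 ~ N(2*mu1 - 3*mu2 + 5, 4*s1^2 + 9*s2^2).
import numpy as np

rng = np.random.default_rng(2)
mu1, s1, mu2, s2 = 1.0, 2.0, -0.5, 1.5   # illustrative parameter values
n_sim = 500_000

x = rng.normal(mu1, s1, n_sim)
y = rng.normal(mu2, s2, n_sim)
w = 2 * x - 3 * y + 5

print(f"sample mean = {w.mean():.3f}  vs  2*mu1 - 3*mu2 + 5 = {2 * mu1 - 3 * mu2 + 5:.3f}")
print(f"sample var  = {w.var():.3f}  vs  4*s1^2 + 9*s2^2   = {4 * s1**2 + 9 * s2**2:.3f}")
```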