ST2131 Probability
Tay Yong Qiang, April 8, 2017

Combinatorial Analysis
Binomial Theorem: (x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}
Multinomial Expansion: (x_1 + \cdots + x_r)^n = \sum_{n_1 + \cdots + n_r = n} \binom{n}{n_1, \ldots, n_r} x_1^{n_1} x_2^{n_2} \cdots x_r^{n_r}
No. of Integer Solutions of x_1 + x_2 + \cdots + x_r = n:
1. Positive integers: \binom{n-1}{r-1} positive integer-valued vectors.
2. Nonnegative integers: \binom{n+r-1}{r-1} nonnegative integer-valued vectors.

Axioms of Probability
1. 0 \le P(E) \le 1 for every event E.
2. P(S) = 1.
3. If E_1, E_2, \ldots are mutually exclusive, then P(\bigcup_{k=1}^{\infty} E_k) = \sum_{k=1}^{\infty} P(E_k).

Inclusion-Exclusion Principle
P(A_1 \cup \cdots \cup A_n) = \sum_{i=1}^{n} P(A_i) - \sum_{1 \le i_1 < i_2 \le n} P(A_{i_1} A_{i_2}) + \cdots + (-1)^{r+1} \sum_{1 \le i_1 < \cdots < i_r \le n} P(A_{i_1} \cdots A_{i_r}) + \cdots + (-1)^{n+1} P(A_1 \cdots A_n)

Conditional Probability
P(B|A) = \frac{P(AB)}{P(A)}, \quad P(A) > 0

General Multiplication Rule
P(A_1 A_2 \cdots A_n) = P(A_1) P(A_2|A_1) P(A_3|A_1 A_2) \cdots P(A_n|A_1 \cdots A_{n-1})

Bayes' Formulae
Let A_1, \ldots, A_n partition the sample space. Then
1. P(B) = P(B|A_1)P(A_1) + \cdots + P(B|A_n)P(A_n)
2. P(A_i|B) = \frac{P(B|A_i)P(A_i)}{P(B|A_1)P(A_1) + \cdots + P(B|A_n)P(A_n)}

P(\cdot|A) is a Probability
Let A be an event with P(A) > 0. Then
1. 0 \le P(B|A) \le 1
2. P(S|A) = 1
3. If B_1, B_2, \ldots are mutually exclusive, then P(\bigcup_{k=1}^{\infty} B_k \,|\, A) = \sum_{k=1}^{\infty} P(B_k|A).

Independent Events
Two events A and B are independent if P(AB) = P(A)P(B); equivalently, P(A|B) = P(A) when P(B) > 0.
Events A_1, A_2, \ldots, A_n are independent if for every sub-collection A_{i_1}, A_{i_2}, \ldots, A_{i_r},
P(A_{i_1} A_{i_2} \cdots A_{i_r}) = P(A_{i_1}) P(A_{i_2}) \cdots P(A_{i_r}).

Properties of Independent Events
If A and B are independent, then so are A and B^c, A^c and B, and A^c and B^c.
If A, B and C are independent, then A is independent of any event formed from B and C, e.g. A is independent of B \cup C and of B \cap C.

Discrete Random Variables
The probability mass function of X is p_X(x) = P(X = x) if x = x_1, x_2, \ldots, and p_X(x) = 0 otherwise.

Properties of Probability Mass Function
1. p_X(x_i) \ge 0 for i = 1, 2, \ldots
2. p_X(x) = 0 for all other values of x.
3. \sum_{i=1}^{\infty} p_X(x_i) = 1

Cumulative Distribution Function
F_X(x) = P(X \le x), for x \in R.
Remark: If X has probability mass function p(1) = 1/4, p(2) = 1/2, p(3) = 1/8, p(4) = 1/8, then its c.d.f. is
F(a) = 0 for a < 1; 1/4 for 1 \le a < 2; 3/4 for 2 \le a < 3; 7/8 for 3 \le a < 4; 1 for 4 \le a.

Expectation
E(X) = \sum_{x} x\, p_X(x)
E[g(X)] = \sum_{i} g(x_i)\, p_X(x_i)
Property of Expected Values: E[aX + b] = aE(X) + b
Tail Sum Formula for Expectation (X nonnegative integer-valued):
E(X) = \sum_{k=1}^{\infty} P(X \ge k) = \sum_{k=0}^{\infty} P(X > k)

Variance
If X is a random variable with mean \mu, then var(X) = E[(X - \mu)^2] = E(X^2) - [E(X)]^2.
Note: E(X^2) \ge [E(X)]^2 \ge 0.
Standard Deviation: \sigma_X = \sqrt{var(X)}
Properties of Standard Deviation and Variance: var(aX + b) = a^2 var(X), SD(aX + b) = |a|\,SD(X)

Useful Formulae
\sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = e^{\lambda}
\sum_{i=1}^{\infty} i r^{i-1} = \frac{1}{(1-r)^2}, |r| < 1
\left(1 - \frac{\lambda}{n}\right)^n \approx e^{-\lambda} for large n

Discrete Random Variables
Bernoulli Random Variable, Be(p)
The experiment is performed once; define X = 1 if it is a success and X = 0 if it is a failure.
P(X = 1) = p, P(X = 0) = 1 - p
E(X) = p, var(X) = p(1 - p)

Binomial Random Variable, Bin(n, p)
Let X = number of successes in n Bernoulli(p) trials. For 0 \le k \le n,
P(X = k) = \binom{n}{k} p^k q^{n-k}
P(X \le i) = \sum_{k=0}^{i} \binom{n}{k} p^k (1-p)^{n-k}, i = 0, 1, \ldots, n
E(X) = np, var(X) = np(1 - p)

Geometric Random Variable, Geom(p)
Let X = number of trials required to obtain the first success. Note: X takes values 1, 2, \ldots
P(X = k) = p q^{k-1}, k \ge 1
E(X) = \frac{1}{p}, var(X) = \frac{1-p}{p^2}

Negative Binomial Random Variable, NB(r, p)
Let X = number of trials required to obtain r successes. Note: X takes values r, r + 1, \ldots
P(X = k) = \binom{k-1}{r-1} p^r q^{k-r}, k \ge r
E(X) = \frac{r}{p}, var(X) = \frac{r(1-p)}{p^2}
Note: Geom(p) = NB(1, p)
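A minimal numerical sketch of the Bayes' Formulae above, in Python; the priors and likelihoods are made-up illustration values, not from the notes.

# Hypothetical priors P(A_i) and likelihoods P(B | A_i) for a 3-event partition.
priors = [0.5, 0.3, 0.2]          # P(A_1), P(A_2), P(A_3); must sum to 1
likelihoods = [0.10, 0.40, 0.80]  # P(B | A_1), P(B | A_2), P(B | A_3)

# Law of total probability: P(B) = sum_i P(B | A_i) P(A_i)
p_b = sum(l * p for l, p in zip(likelihoods, priors))

# Bayes' formula: P(A_i | B) = P(B | A_i) P(A_i) / P(B)
posteriors = [l * p / p_b for l, p in zip(likelihoods, priors)]

print(p_b)              # 0.33
print(posteriors)       # the posteriors sum to 1
print(sum(posteriors))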
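As a sanity check of the Tail Sum Formula and the Geometric moments above, the sketch below truncates the infinite sums at a large cutoff (an arbitrary choice for illustration) and compares them with 1/p and (1-p)/p^2.

p = 0.3
q = 1 - p
K = 10_000  # truncation point; the tail beyond K is negligible for p = 0.3

# pmf of Geom(p): P(X = k) = p * q**(k-1), k >= 1
pmf = {k: p * q**(k - 1) for k in range(1, K + 1)}

mean_direct = sum(k * pk for k, pk in pmf.items())
# Tail sum formula: E(X) = sum_{k>=1} P(X >= k), and P(X >= k) = q**(k-1)
mean_tail = sum(q**(k - 1) for k in range(1, K + 1))
second_moment = sum(k * k * pk for k, pk in pmf.items())
variance = second_moment - mean_direct**2

print(mean_direct, mean_tail, 1 / p)   # all approx 3.3333
print(variance, (1 - p) / p**2)        # both approx 7.7778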
Poisson Random Variable
If X ~ Poisson(\lambda), X = 0, 1, 2, \ldots, then
P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}, k \ge 0
E(X) = \lambda, var(X) = \lambda
Note: \sum_{k=0}^{\infty} P(X = k) = e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = e^{-\lambda} e^{\lambda} = 1
Computing Poisson Distribution Function: P(X = i + 1) = \frac{\lambda}{i+1} P(X = i)

Hypergeometric Random Variable
A set of N balls, m red and N - m blue. Choose n of these balls without replacement and let X = number of red balls in the sample.
P(X = x) = \frac{\binom{m}{x}\binom{N-m}{n-x}}{\binom{N}{n}}, x = 0, 1, \ldots, n
A hypergeometric random variable with parameters n, N, m is denoted by H(n, N, m). Also,
E(X) = \frac{nm}{N}, var(X) = \frac{nm}{N}\left[\frac{(n-1)(m-1)}{N-1} + 1 - \frac{nm}{N}\right]

Continuous Random Variables

Properties of Distribution Function
1. F_X is a nondecreasing function: if a < b, then F_X(a) \le F_X(b).
2. \lim_{b \to \infty} F_X(b) = 1, \lim_{b \to -\infty} F_X(b) = 0
3. F_X is right continuous: for any b \in R, \lim_{x \to b^+} F_X(x) = F_X(b).

Uniform Distribution
If X is uniformly distributed over (a, b), then X ~ U(a, b):
f_X(x) = \frac{1}{b-a} for a < x < b, and 0 otherwise.
F_X(x) = 0 for x < a; \frac{x-a}{b-a} for a \le x < b; 1 for b \le x.
E(X) = \frac{a+b}{2}, var(X) = \frac{(b-a)^2}{12}

Exponential Random Variable
X ~ Exp(\lambda), \lambda > 0:
f_X(x) = \lambda e^{-\lambda x} if x \ge 0, and 0 if x < 0.
F_X(x) = 1 - e^{-\lambda x} for x \ge 0, and 0 for x < 0.
E(X) = \frac{1}{\lambda}, Var(X) = \frac{1}{\lambda^2}
Note: The exponential random variable is memoryless.

Gamma Distribution
X ~ \Gamma(\alpha, \lambda), \lambda, \alpha > 0:
f_X(x) = \frac{\lambda e^{-\lambda x} (\lambda x)^{\alpha - 1}}{\Gamma(\alpha)} for x \ge 0, and 0 for x < 0, where \Gamma(\alpha) = \int_0^{\infty} e^{-y} y^{\alpha - 1}\, dy.
E(X) = \frac{\alpha}{\lambda}, var(X) = \frac{\alpha}{\lambda^2}
Notes: \Gamma(1, \lambda) = Exp(\lambda); \Gamma(n) = (n-1)!; \Gamma(\tfrac{1}{2}) = \sqrt{\pi}

Beta Distribution
X ~ Beta(a, b):
f_X(x) = \frac{1}{B(a, b)} x^{a-1} (1-x)^{b-1} for 0 < x < 1, and 0 otherwise, where
B(a, b) = \int_0^1 x^{a-1} (1-x)^{b-1}\, dx = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}
E(X) = \frac{a}{a+b}, Var(X) = \frac{ab}{(a+b)^2 (a+b+1)}

Cauchy Distribution
X has a Cauchy distribution with parameter \theta, X ~ Cauchy(\theta), -\infty < \theta < \infty:
f_X(x) = \frac{1}{\pi} \cdot \frac{1}{1 + (x - \theta)^2}, -\infty < x < \infty

Normal Distribution
X ~ N(\mu, \sigma^2):
f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x-\mu)^2/(2\sigma^2)}, -\infty < x < \infty
F_X(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(t-\mu)^2/(2\sigma^2)}\, dt, -\infty < x < \infty
E(X) = \mu, Var(X) = \sigma^2

Standard Normal Distribution
Z ~ N(0, 1):
f_Z(x) = \phi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}
F_Z(x) = \Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\, dt
E(Z) = 0, Var(Z) = 1

Distribution of g(X)
Let X be a random variable with p.d.f. f_X and let g be a strictly monotonic function. Then Y = g(X) has p.d.f.
f_Y(y) = f_X(g^{-1}(y)) \left| \frac{d}{dy} g^{-1}(y) \right| if y = g(x) for some x, and f_Y(y) = 0 otherwise.

Expected Value of Sums of Random Variables
E[X] = \sum_{s \in S} X(s)\, p(s)
For random variables X_1, X_2, \ldots, X_n: E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i]

Approximation
Let X ~ Bin(n, p), where n is large (say n \ge 30).
Binomial to Poisson: suitable if n is large and p is small (p < 0.1) such that \lambda = np is moderate; then X is approximately Poisson(np).
Binomial to Normal: suitable if np(1 - p) \ge 10; then \frac{X - np}{\sqrt{npq}} \approx Z, i.e. Bin(n, p) \approx N(np, npq). Remember to apply continuity correction (c.c.).
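The Poisson recursion P(X = i+1) = \frac{\lambda}{i+1} P(X = i) above avoids computing factorials; a small sketch in Python (the function name poisson_cdf is mine, not from the notes):

import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), built from P(X = i+1) = lam/(i+1) * P(X = i)."""
    p = math.exp(-lam)   # P(X = 0)
    total = p
    for i in range(k):
        p *= lam / (i + 1)   # step from P(X = i) to P(X = i + 1)
        total += p
    return total

print(poisson_cdf(3, 2.0))   # approx 0.8571, matches summing e^{-2} 2^k / k! over k = 0..3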
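A sketch of the Binomial-to-Normal approximation above with continuity correction, compared against the exact binomial c.d.f.; \Phi is evaluated with math.erf, and the particular n, p, k values are arbitrary illustration choices.

import math

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Bin(n, p) by summing the pmf."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def phi(x):
    """Standard normal c.d.f. via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

n, p, k = 100, 0.4, 45
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

exact = binom_cdf(k, n, p)
approx_cc = phi((k + 0.5 - mu) / sigma)   # continuity correction: P(X <= 45) ~ P(Z <= (45.5 - mu)/sigma)
approx_no_cc = phi((k - mu) / sigma)

print(exact, approx_cc, approx_no_cc)     # the c.c. version is noticeably closer to the exact value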
Jointly Distributed Random Variables
Joint distribution function of X and Y: F_{X,Y}(x, y) = P(X \le x, Y \le y), \forall x, y \in R
Marginal distribution function of X: F_X(x) = \lim_{y \to \infty} F_{X,Y}(x, y)
Useful Formulae:
P(X > a, Y > b) = 1 - F_X(a) - F_Y(b) + F_{X,Y}(a, b)
P(a_1 < X \le a_2, b_1 < Y \le b_2) = F_{X,Y}(a_2, b_2) - F_{X,Y}(a_1, b_2) + F_{X,Y}(a_1, b_1) - F_{X,Y}(a_2, b_1)

Joint Discrete Random Variables
Marginal pmf of X: p_X(x) = P(X = x) = \sum_{y} p_{X,Y}(x, y)

Jointly Continuous Random Variables
Joint pdf of X and Y: P((X, Y) \in C) = \iint_{(x,y) \in C} f_{X,Y}(x, y)\, dx\, dy
Marginal pdf of X: f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy
P(a_1 < X \le a_2, b_1 < Y \le b_2) = \int_{a_1}^{a_2} \int_{b_1}^{b_2} f_{X,Y}(x, y)\, dy\, dx
F_{X,Y}(a, b) = P(X \le a, Y \le b) = \int_{-\infty}^{a} \int_{-\infty}^{b} f_{X,Y}(x, y)\, dy\, dx
f_{X,Y}(x, y) = \frac{\partial^2}{\partial x\, \partial y} F_{X,Y}(x, y)

Independent Random Variables (Jointly Continuous Case)
The following are equivalent:
1. X and Y are independent.
2. f_{X,Y}(x, y) = f_X(x) f_Y(y), \forall x, y \in R
3. F_{X,Y}(x, y) = F_X(x) F_Y(y), \forall x, y \in R
4. There exist g, h : R \to R such that f_{X,Y}(x, y) = g(x) h(y) for all x, y \in R.
Note: Neither g nor h has to be a pdf.

Conditional Distribution: Continuous Case
Conditional pdf of X given Y = y: f_{X|Y}(x|y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}, f_Y(y) > 0
Conditional cdf of X given Y = y: F_{X|Y}(x|y) = P(X \le x \,|\, Y = y) = \int_{-\infty}^{x} f_{X|Y}(t|y)\, dt

Sum of Independent Random Variables
If X and Y are continuous and independent, then
F_{X+Y}(x) = \int_{-\infty}^{\infty} F_X(x - t) f_Y(t)\, dt
f_{X+Y}(x) = \int_{-\infty}^{\infty} f_X(x - t) f_Y(t)\, dt
Sum of 2 Uniform Random Variables: if X and Y are independent U(0, 1), then f_{X+Y}(x) = x for 0 < x \le 1; 2 - x for 1 < x < 2; 0 otherwise.
Sum of 2 Gamma Random Variables: if X ~ \Gamma(\alpha, \lambda) and Y ~ \Gamma(\beta, \lambda) are independent, then X + Y ~ \Gamma(\alpha + \beta, \lambda).
Sum of n Exponential Random Variables: if X_1, X_2, \ldots, X_n are independent Exp(\lambda), then since \Gamma(1, \lambda) = Exp(\lambda), we have X_1 + X_2 + \cdots + X_n ~ \Gamma(n, \lambda).
Sum of 2 Normal Random Variables: if X ~ N(\mu_1, \sigma_1^2) and Y ~ N(\mu_2, \sigma_2^2) are independent, then X + Y ~ N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2).

Sum of Independent Discrete Random Variables
Sum of 2 Poisson Random Variables: if X ~ Poisson(\lambda) and Y ~ Poisson(\mu) are independent, then X + Y ~ Poisson(\lambda + \mu).
Sum of 2 Binomial Random Variables: if X ~ Bin(n, p) and Y ~ Bin(m, p) are independent, then X + Y ~ Bin(n + m, p).
Sum of 2 Geometric Random Variables: if X ~ Geom(p) and Y ~ Geom(p) are independent, then X + Y ~ NB(2, p). Note: Geom(p) = NB(1, p).

Properties of Expectation
If a \le X \le b, then a \le E(X) \le b.
Proposition 7.1: If X and Y are jointly discrete with joint pmf p_{X,Y}, then E[g(X, Y)] = \sum_x \sum_y g(x, y)\, p_{X,Y}(x, y).
If X and Y are jointly continuous with joint pdf f_{X,Y}, then E[g(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\, f_{X,Y}(x, y)\, dx\, dy.
Corollaries of Proposition 7.1:
1. If g(x, y) \ge 0 then E[g(X, Y)] \ge 0.
2. E[g(X, Y) + h(X, Y)] = E[g(X, Y)] + E[h(X, Y)]
3. E[g(X) + h(Y)] = E[g(X)] + E[h(Y)]
4. Monotone property: if X \le Y, then E(X) \le E(Y).
5. E(X + Y) = E(X) + E(Y)
6. E(a_1 X_1 + \cdots + a_n X_n) = a_1 E(X_1) + \cdots + a_n E(X_n)

Variance of a Sum
var\left(\sum_{k=1}^{n} X_k\right) = \sum_{k=1}^{n} var(X_k) + 2 \sum_{1 \le i < j \le n} cov(X_i, X_j)
Variance of a Sum under Independence: if X_1, X_2, \ldots, X_n are independent, then var\left(\sum_{k=1}^{n} X_k\right) = \sum_{k=1}^{n} var(X_k).

Sample Variance
Let X_1, \ldots, X_n be independent and identically distributed with E(X_i) = \mu and var(X_i) = \sigma^2, and let \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i be the sample mean. Then
S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2
is called the sample variance.
Results: var(\bar{X}) = \frac{\sigma^2}{n}, E[S^2] = \sigma^2
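A Monte Carlo sketch of the triangular density of the sum of two independent U(0, 1) variables above (the sample size and seed are arbitrary): by the convolution formula, P(X + Y \le 1) should be 1/2 and E(X + Y) should be 1.

import random

random.seed(0)
N = 200_000
sums = [random.random() + random.random() for _ in range(N)]

# F_{X+Y}(1) = integral of the triangular density over (0, 1] = 1/2
prop_below_1 = sum(s <= 1.0 for s in sums) / N
mean = sum(sums) / N

print(prop_below_1)  # approx 0.5
print(mean)          # approx 1.0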
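A simulation sketch of the Sample Variance results above (normal samples and the particular \mu, \sigma, n, seed are arbitrary choices): averaging S^2 with the n - 1 divisor over many samples should come close to \sigma^2.

import random

random.seed(1)
mu, sigma, n, reps = 5.0, 2.0, 8, 50_000

s2_values = []
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)   # sample variance S^2
    s2_values.append(s2)

print(sum(s2_values) / reps)   # approx sigma^2 = 4, illustrating E[S^2] = sigma^2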
Conditional Expectation
E[X|Y = y] = \sum_x x\, p_{X|Y}(x|y) = h(y) (discrete case)
E[X|Y = y] = \int_{-\infty}^{\infty} x\, f_{X|Y}(x|y)\, dx = h(y) (continuous case)

Expectation by Conditioning
E[X] = E[E(X|Y)] = \sum_y E(X|Y = y) P(Y = y) (discrete) or \int_{-\infty}^{\infty} E(X|Y = y) f_Y(y)\, dy (continuous)

Probabilities by Conditioning
P(A) = E(I_A) = E[E(I_A|Y)] = \sum_y P(A|Y = y) P(Y = y) (discrete) or \int_{-\infty}^{\infty} P(A|Y = y) f_Y(y)\, dy (continuous)

Conditional Variance
var(X|Y) = E[(X - E[X|Y])^2 \,|\, Y]
var(X) = E[var(X|Y)] + var(E[X|Y])

Covariance
cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]
If cov(X, Y) = 0, then X and Y are uncorrelated.
Alternative Formulae: cov(X, Y) = E(XY) - E(X)E(Y) = E[X(Y - \mu_Y)] = E[Y(X - \mu_X)]
Expectation of Independent Variables: if X and Y are independent, then for any g, h : R \to R, E[g(X)h(Y)] = E[g(X)]E[h(Y)].
Covariance of Independent Variables: if X and Y are independent, then cov(X, Y) = 0.
Properties of Covariance:
1. var(X) = cov(X, X)
2. cov(X, Y) = cov(Y, X)
3. cov\left(\sum_{i=1}^{n} a_i X_i, \sum_{j=1}^{m} b_j Y_j\right) = \sum_{i=1}^{n} \sum_{j=1}^{m} a_i b_j\, cov(X_i, Y_j)

Correlation Coefficient
The correlation coefficient of random variables X and Y is \rho(X, Y) = \frac{cov(X, Y)}{\sqrt{var(X)\, var(Y)}}, and -1 \le \rho(X, Y) \le 1.

Properties of Standard Normal
P(Z \ge 0) = P(Z \le 0) = 0.5
-Z ~ N(0, 1)
P(Z \le x) = 1 - P(Z > x)
P(Z \le -x) = P(Z \ge x)
If Y ~ N(\mu, \sigma^2), then X = \frac{Y - \mu}{\sigma} ~ N(0, 1).
If X ~ N(0, 1), then Y = aX + b ~ N(b, a^2), a, b \in R.

Moment Generating Functions
M_X(t) = E[e^{tX}] = \sum_x e^{tx} p_X(x) (discrete) or \int_{-\infty}^{\infty} e^{tx} f_X(x)\, dx (continuous)
Using MGF to find moments: M_X^{(n)}(0) = E(X^n)
Multiplicative Property: if X and Y are independent, then M_{X+Y}(t) = M_X(t) M_Y(t).
Uniqueness Property: if X and Y have m.g.f.s M_X and M_Y with M_X(t) = M_Y(t) for all t \in (-h, h), then X and Y have the same distribution.

MGF of Common Random Variables
Bernoulli: X ~ Be(p), M_X(t) = 1 - p + pe^t
Binomial: X ~ Bin(n, p), M_X(t) = (1 - p + pe^t)^n
Geometric: X ~ Geom(p), M_X(t) = \frac{pe^t}{1 - (1-p)e^t}
Poisson: X ~ Poisson(\lambda), M_X(t) = \exp(\lambda(e^t - 1))
Uniform: X ~ U(\alpha, \beta), M_X(t) = \frac{e^{\beta t} - e^{\alpha t}}{(\beta - \alpha)t}
Exponential: X ~ Exp(\lambda), M_X(t) = \frac{\lambda}{\lambda - t}, t < \lambda
Normal: X ~ N(\mu, \sigma^2), M_X(t) = \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right)

Joint Moment Generating Functions
Let X_1, \ldots, X_n be n random variables; then M(t_1, \ldots, t_n) = E[e^{t_1 X_1 + \cdots + t_n X_n}].
Independent MGFs: if X_1, \ldots, X_n are independent, then M(t_1, \ldots, t_n) = M_{X_1}(t_1) M_{X_2}(t_2) \cdots M_{X_n}(t_n).
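A numerical sketch of "Using MGF to find moments" above for the Poisson MGF M_X(t) = \exp(\lambda(e^t - 1)): central finite differences at t = 0 (the step size h is an arbitrary choice) recover E(X) = \lambda and E(X^2) = \lambda + \lambda^2.

import math

lam = 3.0
M = lambda t: math.exp(lam * (math.exp(t) - 1.0))   # MGF of Poisson(lam)

h = 1e-4
first_moment = (M(h) - M(-h)) / (2 * h)             # approximates M'(0) = E(X)
second_moment = (M(h) - 2 * M(0.0) + M(-h)) / h**2  # approximates M''(0) = E(X^2)

print(first_moment, lam)              # approx 3
print(second_moment, lam + lam**2)    # approx 12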
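A tiny discrete check of Expectation by Conditioning above; the 2 by 3 joint pmf table is a made-up example, not from the notes.

# Hypothetical joint pmf p_{X,Y}(x, y); the entries sum to 1.
joint = {
    (0, 0): 0.10, (1, 0): 0.20,
    (0, 1): 0.25, (1, 1): 0.15,
    (0, 2): 0.05, (1, 2): 0.25,
}

# Direct computation: E[X] = sum_x x * p_X(x)
e_x_direct = sum(x * p for (x, y), p in joint.items())

# Conditioning on Y: E[X] = sum_y E[X | Y = y] * P(Y = y)
ys = {y for (_, y) in joint}
e_x_cond = 0.0
for y0 in ys:
    p_y = sum(p for (x, y), p in joint.items() if y == y0)
    e_x_given_y = sum(x * p for (x, y), p in joint.items() if y == y0) / p_y
    e_x_cond += e_x_given_y * p_y

print(e_x_direct, e_x_cond)   # both 0.60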
Joint Distribution of Functions of Random Variables
Suppose X and Y have joint pdf f_{X,Y}, and let U = g(X, Y), V = h(X, Y). Then
f_{U,V}(u, v) = f_{X,Y}(x, y)\, |J(x, y)|^{-1}, where
J(x, y) = \begin{vmatrix} \frac{\partial g}{\partial x} & \frac{\partial g}{\partial y} \\ \frac{\partial h}{\partial x} & \frac{\partial h}{\partial y} \end{vmatrix}
Step 1: Determine the support of U and V.
Step 2: Find the Jacobian determinant of g and h.
Step 3: Find x and y in terms of u and v, i.e. find the 'inverses'.
Step 4: Substitute x, y and J(x, y) into f_{U,V}(u, v).
Note: J(x, y) is in terms of x and y; convert it to u and v using the inverses found.

Limit Theorems
Markov's Inequality: let X be a nonnegative random variable; then P(X \ge a) \le \frac{E(X)}{a}, a > 0.
Chebyshev's Inequality: let X be a random variable with E(X) = \mu and var(X) = \sigma^2; then for a > 0, P(|X - \mu| \ge a) \le \frac{\sigma^2}{a^2}.
Weak Law of Large Numbers: let X_1, \ldots, X_n be independent and identically distributed with E(X_i) = \mu; then for every \epsilon > 0,
P\left(\left|\frac{X_1 + \cdots + X_n}{n} - \mu\right| \ge \epsilon\right) \to 0 as n \to \infty.
Strong Law of Large Numbers: let X_1, X_2, \ldots be independent and identically distributed with E(X_i) = \mu; then
P\left(\frac{X_1 + X_2 + \cdots + X_n}{n} \to \mu \text{ as } n \to \infty\right) = 1, i.e. P\left(\lim_{n \to \infty} \frac{X_1 + \cdots + X_n}{n} = \mu\right) = 1.
Central Limit Theorem: let X_1, X_2, \ldots be independent and identically distributed with mean \mu and variance \sigma^2; then for large n,
\frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} \approx Z, where Z ~ N(0, 1).

Things to Note!
1. Remember to apply continuity correction whenever approximating a discrete distribution by a continuous one; this applies even when using the Central Limit Theorem.
2. When dealing with \iint_{(x,y) \in A} f_{X,Y}(x, y)\, dA, you can think of it as the volume bounded above by f_{X,Y} and below by the region A.
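A simulation sketch of the Central Limit Theorem above using sums of U(0, 1) variables (\mu = 1/2, \sigma^2 = 1/12); the sample sizes and seed are arbitrary. The standardized sum should satisfy P(Z \le 1) \approx \Phi(1) \approx 0.8413.

import math
import random

random.seed(2)
n, reps = 30, 20_000
mu, sigma = 0.5, math.sqrt(1.0 / 12.0)   # mean and standard deviation of U(0, 1)

count = 0
for _ in range(reps):
    s = sum(random.random() for _ in range(n))
    z = (s - n * mu) / (sigma * math.sqrt(n))   # standardized sum, approximately N(0, 1)
    if z <= 1.0:
        count += 1

phi_1 = 0.5 * (1.0 + math.erf(1.0 / math.sqrt(2.0)))   # Phi(1), approx 0.8413
print(count / reps, phi_1)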
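A closed-form comparison of Markov's and Chebyshev's inequalities above for X ~ Exp(\lambda), where the exact tail is P(X \ge a) = e^{-\lambda a}; the values of \lambda and a are arbitrary illustration choices.

import math

lam, a = 1.0, 3.0
mean = 1.0 / lam            # E(X)
var = 1.0 / lam**2          # var(X)

exact_tail = math.exp(-lam * a)       # P(X >= a) for Exp(lam)
markov_bound = mean / a               # Markov: P(X >= a) <= E(X)/a
# Chebyshev bounds the two-sided tail P(|X - mu| >= a - mu), which contains {X >= a} since a > mean here
chebyshev_bound = var / (a - mean)**2

print(exact_tail)        # approx 0.0498
print(markov_bound)      # 0.3333
print(chebyshev_bound)   # 0.25

Both bounds hold but are loose here, which is the usual picture: the inequalities trade sharpness for requiring only a mean (Markov) or a mean and variance (Chebyshev).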