2. Random Variables Dave Goldsman Georgia Institute of Technology, Atlanta, GA, USA 5/21/14 Goldsman 5/21/14 1 / 147 Outline 1 Intro / Definitions 2 Discrete Random Variables 3 Continuous Random Variables 4 Cumulative Distribution Functions 5 Great Expectations 6 Functions of a Random Variable 7 Bivariate Random Variables 8 Conditional Distributions 9 Independent Random Variables 10 Conditional Expectation 11 Covariance and Correlation 12 Moment Generating Functions Goldsman 5/21/14 2 / 147 Intro / Definitions Definition: A random variable (RV) is a function from the sample space to the real line. X : S → R. Example: Flip 2 coins. S = {HH, HT, T H, T T }. Suppose X is the RV corresponding to the # of H’s. X(T T ) = 0, X(HT ) = X(T H) = 1, X(HH) = 2. P (X = 0) = 14 , P (X = 1) = 12 , P (X = 2) = 41 . Notation: Capital letters like X, Y, Z, U, V, W usually represent RV’s. Small letters like x, y, z, u, v, w usually represent particular values of the RV’s. Thus, you can speak of P (X = x). Goldsman 5/21/14 3 / 147 Intro / Definitions Example: Let X be the sum of two dice rolls. Then X((4, 6)) = 10. In addition, 1/36 if x = 2 2/36 if x = 3 .. . P (X = x) = 6/36 if x = 7 ... 1/36 if x = 12 0 otherwise Goldsman 5/21/14 4 / 147 Intro / Definitions Example: Flip a coin. ( X ≡ 0 if T 1 if H Example: Roll a die. ( Y ≡ 0 if {1, 2, 3} 1 if {4, 5, 6} For our purposes, X and Y are the same, since P (X = 0) = P (Y = 0) = 21 and P (X = 1) = P (Y = 1) = 21 . Goldsman 5/21/14 5 / 147 Intro / Definitions Example: Select a real # at random betw 0 and 1. Infinite number of “equally likely” outcomes. Conclusion: P (each individual point) = P (X = x) = 0, believe it or not! But P (X ≤ 0.5) = 0.5 and P (X ∈ [0.3, 0.7]) = 0.4. If A is any interval in [0,1], then P (A) is the length of A. Goldsman 5/21/14 6 / 147 Intro / Definitions Definition: If the number of possible values of a RV X is finite or countably infinite, then X is a discrete RV. Otherwise,. . . A continuous RV is one with prob 0 at every point. Example: Flip a coin — get H or T . Discrete. Example: Pick a point at random in [0, 1]. Continuous. Goldsman 5/21/14 7 / 147 Discrete Random Variables Outline 1 Intro / Definitions 2 Discrete Random Variables 3 Continuous Random Variables 4 Cumulative Distribution Functions 5 Great Expectations 6 Functions of a Random Variable 7 Bivariate Random Variables 8 Conditional Distributions 9 Independent Random Variables 10 Conditional Expectation 11 Covariance and Correlation 12 Moment Generating Functions Goldsman 5/21/14 8 / 147 Discrete Random Variables Definition: If X is a discrete RV, its probability mass function (pmf) is f (x) ≡ P (X = x). Note that 0 ≤ f (x) ≤ 1, P x f (x) = 1. Example: Flip 2 coins. Let X be the number of heads. 1/4 if x = 0 or 2 f (x) = 1/2 if x = 1 0 otherwise Example: Uniform Distribution on integers 1, 2, . . . , n. X can equal 1, 2, . . . , n, each with prob 1/n. f (i) = 1/n, i = 1, 2, . . . , n. Goldsman 5/21/14 9 / 147 Discrete Random Variables Example/Definition: Let X denote the number of “successes” from n independent trials such that the P (success) at each trial is p (0 ≤ p ≤ 1). Then X has the Binomial Distribution with parameters n and p. The trials are referred to as Bernoulli trials. Notation: X ∼ Bin(n, p). “X has the Bin distribution” Example: Roll a die 3 indep times. Find P (Get exactly two 6’s). “success” (6) and “failure” (1,2,3,4,5) All 3 trials are indep, and P (success) = 1/6 doesn’t change from trial to trial. Let X = # of 6’s. Then X ∼ Bin(3, 16 ). Goldsman 5/21/14 10 / 147 Discrete Random Variables Theorem: If X ∼ Bin(n, p), then the prob of k successes in n trials is n k n−k P (X = k) = p q , k = 0, 1, . . . , n, k where q = 1 − p. Proof: Particular sequence of successes and failures: · · · F} SS · · · S} |F F {z | {z (prob = pk q n−k ) k successes n − k failures Number of ways to arrange the seq is Goldsman n k . Done. 5/21/14 11 / 147 Discrete Random Variables Back to the dice example, where X ∼ Bin(3, 16 ), and we want P (Get exactly two 6’s). n = 3, k = 2, p = 1/6, q = 5/6. 15 3 1 2 5 1 = P (X = 2) = 2 6 6 216 Goldsman k 0 1 2 3 P (X = k) 125 216 75 216 15 216 1 216 5/21/14 12 / 147 Discrete Random Variables Example: Roll 2 dice 12 times. Find P (Result will be 7 or 11 exactly 3 times). Let X = the number of times get 7 or 11. P (7 or 11) = P (7) + P (11) = 6 36 + 2 36 = 2 9. So X ∼ Bin(12, 2/9). 12 2 3 7 9 P (X = 3) = . 3 9 9 Goldsman 5/21/14 13 / 147 Discrete Random Variables Definition: If P (X = k) = e−λ λk /k!, k = 0, 1, 2, . . ., λ > 0, we say that X has the Poisson distribution with parameter λ. Notation: X ∼ Pois(λ). Example: Suppose the number of raisins in a cup of cookie dough is Pois(10). Find the prob that a cup of dough has at least 4 raisins. P (X ≥ 4) = 1 − P (X = 0, 1, 2, 3) 0 10 101 102 103 = 1 − e−10 + + + 0! 1! 2! 3! = 0.9897. Goldsman 5/21/14 14 / 147 Continuous Random Variables Outline 1 Intro / Definitions 2 Discrete Random Variables 3 Continuous Random Variables 4 Cumulative Distribution Functions 5 Great Expectations 6 Functions of a Random Variable 7 Bivariate Random Variables 8 Conditional Distributions 9 Independent Random Variables 10 Conditional Expectation 11 Covariance and Correlation 12 Moment Generating Functions Goldsman 5/21/14 15 / 147 Continuous Random Variables Example: Pick a point X randomly between 0 and 1. ( 1 if 0 ≤ x ≤ 1 f (x) = 0 otherwise P (a < X < b) = area under f (x) from a to b = b − a. Goldsman 5/21/14 16 / 147 Continuous Random Variables Definition: Suppose X is a continuous RV. f (x) is the probability density function (pdf) if R R f (x) dx = 1 (area under f (x) is 1) f (x) ≥ 0, ∀x (always non-negative) R If A ⊆ R, then P (X ∈ A) = A f (x) dx (probability that X is in a certain region A) Remarks: If X is a continuous RV, then P (a < X < b) = Rb a f (x) dx. An individual point has prob 0, i.e., P (X = x) = 0. Think of f (x) dx ≈ P (x < X < x + dx). Goldsman 5/21/14 17 / 147 Continuous Random Variables Note that f (x) denotes both pmf (discrete case) and pdf (continuous case) — but they are different: f (x) = P (X = x) if X is discrete. Must have 0 ≤ f (x) ≤ 1. f (x) dx ≈ P (x < X < x + dx) if X is continuous. Must have f (x) ≥ 0 (and possibly > 1). If X is cts, weR calculate the prob of an event by integrating, P (X ∈ A) = A f (x) dx. Goldsman 5/21/14 18 / 147 Continuous Random Variables Example: If X is “equally likely” to be anywhere between a and b, then X has the uniform distribution on (a, b). ( 1 b−a if a < x < b f (x) = 0 otherwise Notation: X ∼ U(a, b) Note: R R f (x) dx Goldsman = Rb 1 a b−a dx = 1 (as desired). 5/21/14 19 / 147 Continuous Random Variables Example: X has the exponential distriibution with parameter λ > 0 if it has pdf ( λe−λx if x ≥ 0 f (x) = 0 otherwise Notation: X ∼ Exp(λ) Note: R R f (x) dx Goldsman = R∞ 0 λe−λx dx = 1 (as desired). 5/21/14 20 / 147 Continuous Random Variables Example: Suppose X ∼ Exp(1). Then P (X ≤ 3) = R3 P (X ≥ 5) = R∞ 0 5 e−x dx = 1 − e−3 . e−x dx = e−5 . P (2 ≤ X < 4) = P (2 ≤ X ≤ 4) = P (X = 3) = Goldsman R3 3 R4 2 e−x dx = e−2 − e−4 . e−x dx = 0. 5/21/14 21 / 147 Continuous Random Variables Example: Suppose X is a cts RV with pdf ( cx2 if 0 < x < 2 f (x) = 0 otherwise Find c. = R2 Answer: P (0 < X < 1) = R1 Answer: 1 = R R f (x) dx 0 cx2 dx = 38 c, so c = 3/8. Find P (0 < X < 1). Goldsman 3 2 0 8 x dx = 1/8. 5/21/14 22 / 147 Continuous Random Variables Find P (0 < X < 1| 12 < X < 32 ). Answer: 1 3 P 0 < X < 1 < X < 2 2 1 P (0 < X < 1 and 2 < X < 32 ) = P ( 12 < X < 23 ) = = Goldsman P ( 12 < X < 1) P ( 12 < X < 23 ) R1 3 2 1/2 8 x dx = 7/26. R 3/2 3 2 dx x 1/2 8 5/21/14 23 / 147 Cumulative Distribution Functions Outline 1 Intro / Definitions 2 Discrete Random Variables 3 Continuous Random Variables 4 Cumulative Distribution Functions 5 Great Expectations 6 Functions of a Random Variable 7 Bivariate Random Variables 8 Conditional Distributions 9 Independent Random Variables 10 Conditional Expectation 11 Covariance and Correlation 12 Moment Generating Functions Goldsman 5/21/14 24 / 147 Cumulative Distribution Functions Definition: For any RV X, the cumulative distribution function (cdf) is defined for all x by F (x) ≡ P (X ≤ x). X discrete implies F (x) = X f (y) = {y|y≤x} X P (X = y). {y|y≤x} X continuous implies Z x F (x) = f (t) dt. −∞ Goldsman 5/21/14 25 / 147 Cumulative Distribution Functions Discrete cdf’s Example: Flip a coin twice. X = number of H’s. ( 0 or 2 w.p. 1/4 X = 1 w.p. 1/2 0 1/4 F (x) = P (X ≤ x) = 3/4 1 if x < 0 if 0 ≤ x < 1 if 1 ≤ x < 2 if x ≥ 2 (You have to be careful about “≤” vs. “<”.) Goldsman 5/21/14 26 / 147 Cumulative Distribution Functions Continuous cdf’s Theorem: If X is a continuous RV, then f (x) = F 0 (x). Proof: F 0 (x) = calculus. d dx Rx −∞ f (t) dt = f (x), by the fundamental theorem of Remark: If X is continuous, you can get from the pdf f (x) to the cdf F (x) by integrating. Goldsman 5/21/14 27 / 147 Cumulative Distribution Functions Example: X ∼ U(0, 1). ( f (x) = F (x) = Goldsman 1 if 0 < x < 1 0 otherwise 0 if x ≤ 0 x if 0 < x < 1 1 if x ≥ 1 5/21/14 28 / 147 Cumulative Distribution Functions Example: X ∼ Exp(λ). ( f (x) = Z ( f (t) dt = −∞ otherwise 0 x F (x) = Goldsman λe−λx if x > 0 1− 0 if x ≤ 0 e−λx if x > 0 5/21/14 29 / 147 Cumulative Distribution Functions Properties of all cdf’s F (x) is non-decreasing in x, i.e., a < b implies that F (a) ≤ F (b). limx→∞ F (x) = 1 and limx→−∞ F (x) = 0. F (x) is right-cts at every point x. Theorem: P (X > x) = 1 − F (x). Proof: 1 = P (X ≤ x) + P (X > x) = F (x) + P (X > x). Goldsman 5/21/14 30 / 147 Cumulative Distribution Functions Theorem: a < b ⇒ P (a < X ≤ b) = F (b) − F (a). Proof: P (a < X ≤ b) = P (X > a ∩ X ≤ b) = P (X > a) + P (X ≤ b) − P (X > a ∪ X ≤ b) = 1 − F (a) + F (b) − 1. Goldsman 5/21/14 31 / 147 Great Expectations Outline 1 Intro / Definitions 2 Discrete Random Variables 3 Continuous Random Variables 4 Cumulative Distribution Functions 5 Great Expectations 6 Functions of a Random Variable 7 Bivariate Random Variables 8 Conditional Distributions 9 Independent Random Variables 10 Conditional Expectation 11 Covariance and Correlation 12 Moment Generating Functions Goldsman 5/21/14 32 / 147 Great Expectations Great Expectations Mean (Expected Value) Law of the Unconscious Statistician Variance Chebychev’s Inequality Goldsman 5/21/14 33 / 147 Great Expectations Definition: The mean or expected value or average of a RV X is ( P xf (x) if X is discrete µ ≡ E[X] ≡ R x R xf (x) dx if X is cts The mean gives an indication of a RV’s central tendency. It can be thought of as a weighted average of the possible x’s, where the weights are given by f (x). Goldsman 5/21/14 34 / 147 Great Expectations Example: Suppose X has the Bernoulli distribution with parameter p, i.e., P (X = 1) = p, P (X = 0) = q = 1 − p. Then X E[X] = xP (X = x) = 1 · p + 0 · q = p. x Example: Die. X = 1, 2, . . . , 6, each w.p. 1/6. Then E[X] = X xf (x) = 1 · x 1 1 + · · · + 6 · = 3.5. 6 6 Example: Suppose X has the Geometric distribution with parameter p, i.e., X is the number of Bern(p) trials until you obtain your first success (e.g., F F F F S corresponds to X = 5). Then f (x) = (1 − p)x−1 p, x = 1, 2, . . ., and after a lot of algebra, E[X] = X x Goldsman xP (X = x) = ∞ X x(1 − p)x−1 p = 1/p. x=1 5/21/14 35 / 147 Great Expectations Example: X ∼ Exp(λ). f (x) = λe−λx , x ≥ 0. Then Z E[X] = xf (x) dx R Z ∞ = xλe−λx dx 0 Z ∞ −λx ∞ = −xe |0 − (−e−λx ) dx (by parts) 0 Z ∞ = e−λx dx (L’Hôpital’s rule) 0 = 1/λ. Goldsman 5/21/14 36 / 147 Great Expectations Law of the Unconscious Statistician (LOTUS) Definition/Theorem: The expected value of a function of X, say h(X), is ( P h(x)f (x) if X is discrete E[h(X)] ≡ R x R h(x)f (x) dx if X is cts E[h(X)] is simply a weighted function of h(x), where the weights are the f (x) values. Examples: E[X 2 ] = E[sin X] = Goldsman R R Rx 2 f (x) dx R (sin x)f (x) dx 5/21/14 37 / 147 Great Expectations Just a moment please. . . Definition: The kth moment of X is ( P xk f (x) if X is discrete k E[X ] = R xk R x f (x) dx if X is cts Definition: The kth central moment of X is ( P (x − µ)k f (x) X is discrete E[(X − µ)k ] = R x k R (x − µ) f (x) dx X is cts Goldsman 5/21/14 38 / 147 Great Expectations Definition: The variance of X is the second central moment, i.e., Var(X) ≡ E[(X − µ)2 ]. It’s a measure of spread or dispersion. Notation: σ 2 ≡ Var(X). p Definition: The standard deviation of X is σ ≡ + Var(X). Goldsman 5/21/14 39 / 147 Great Expectations Theorem: For any h(X) and constants a and b, we have E[ah(X) + b] = aE[h(X)] + b. Proof (just do cts case): Z E[ah(X) + b] = (ah(x) + b)f (x) dx Z = a h(x)f (x) dx + b f (x) dx R Z R R = aE[h(X)] + b. Remark: In particular, E[aX + b] = aE[X] + b. Similarly, E[g(X) + h(X)] = E[g(X)] + E[h(X)]. Goldsman 5/21/14 40 / 147 Great Expectations Theorem (easier way to calculate variance): Var(X) = E[X 2 ] − (E[X])2 Proof: Var(X) = E[(X − µ)2 ] = E[X 2 − 2µX + µ2 ] = E[X 2 ] − 2µE[X] + µ2 (by above remarks) = E[X 2 ] − µ2 . Goldsman 5/21/14 41 / 147 Great Expectations Example: X ∼ Bern(p). ( X = 1 w.p. p 0 w.p. q Recall E[X] = p. In fact, for any k, E[X k ] = 0k · q + 1k · p = p. So Var(X) = E[X 2 ] − (E[X])2 = p − p2 = pq. Goldsman 5/21/14 42 / 147 Great Expectations Example: X ∼ U(a, b). f (x) = Z E[X] = E[X 2 ] = a b x a Z 1 b−a , b x2 a < x < b. a+b 1 dx = b−a 2 1 a2 + ab + b2 dx = b−a 3 Var(X) = E[X 2 ] − (E[X])2 = Goldsman (a − b)2 (algebra). 12 5/21/14 43 / 147 Great Expectations Theorem: Var(aX + b) = a2 Var(X). A “shift” doesn’t matter! Proof: Var(aX + b) = E[(aX + b)2 ] − (E[aX + b])2 = E[a2 X 2 + 2abX + b2 ] − (aE[X] + b)2 = a2 E[X 2 ] + 2abE[X] + b2 −(a2 (E[X])2 + 2abE[X] + b2 ) = a2 (E[X 2 ] − (E[X])2 ) = a2 Var(X) Goldsman 5/21/14 44 / 147 Great Expectations Example: X ∼ Bern(0.3) Recall that E[X] = p = 0.3 and Var(X) = pq = (0.3)(0.7) = 0.21. Let Y = h(X) = 4X + 5. Then E[Y ] = E[4X + 5] = 4E[X] + 5 = 6.2 Var(Y ) = Var(4X + 5) = 16Var(X) = 3.36. Goldsman 5/21/14 45 / 147 Great Expectations Approximations to E[h(X)] and Var(h(X)) Sometimes Y = h(X) is messy, and we may have to approximate E[h(X)] and Var(h(X)). Use a Taylor series approach. To do so, let µ = E[X] and σ 2 = Var(X) and note that Y = h(µ) + (X − µ)h0 (µ) + (X − µ)2 00 h (µ) + R, 2 where R is a remainder term which we shall now ignore. Then h00 (µ)σ 2 E[(X − µ)2 ] 00 . h (µ) = h(µ) + E[Y ] = h(µ) + E[X − µ]h0 (µ) + 2 2 and (now we’ll use an even-cruder approximation) . Var(Y ) = Var h(µ) + (X − µ)h0 (µ) = [h0 (µ)]2 σ 2 . Goldsman 5/21/14 46 / 147 Great Expectations Approximations to E[h(X)] and Var(h(X)) Example: Suppose that X has pdf f (x) = 3x2 , 0 ≤ x ≤ 1, and that we want to test out out approximations on the “complicated” random variable Y = h(X) = X 3/4 . Well, it’s not really that complicated, since we can calculate the exact moments: Z 1 E[Y ] = x3/4 f (x) dx = Z x6/4 f (x) dx = Z 0 E[Y 2 ] = Z 1 3x11/4 dx = 4/5 0 1 0 1 3x7/2 dx = 2/3 0 Var(Y ) = E[Y 2 ] − (E[Y ])2 = 2/75 = 0.0267. Before we can do the approximation, note that Z 1 Z 1 xf (x) dx = 3x3 dx = 3/4 µ = E[X] = 0 2 0 2 2 σ = Var(X) = E[X ] − (E[X]) = 3/80 = 0.0375. Goldsman 5/21/14 47 / 147 Great Expectations Further, h(µ) = µ3/4 = (3/4)3/4 = 0.8059 h0 (µ) = (3/4)µ−1/4 = (3/4)(3/4)−1/4 = 0.8059 h00 (µ) = −(3/16)µ−5/4 = −0.2686. Thus, (0.2686)(0.0375) h00 (µ)σ 2 . = 0.8059 − = 0.8009 E[Y ] = h(µ) + 2 2 and . Var(Y ) = [h0 (µ)]2 σ 2 = (0.8059)2 (0.0375) = 0.0243, both of which are reasonably close to their true values. Goldsman 5/21/14 48 / 147 Great Expectations Chebychev’s Inequality Theorem: Suppose that E[X] = µ and Var(X) = σ 2 . Then for any > 0, P (|X − µ| ≥ ) ≤ σ 2 /2 . Proof: See text. Remarks: If = kσ, then P (|X − µ| ≥ kσ) ≤ 1/k 2 . P (|X − µ| < ) ≥ 1 − σ 2 /2 . Chebychev gives a crude bound on the prob that X deviates from the mean by more than a constant, in terms of the constant and the variance. You can always use Chebychev, but it’s crude. Goldsman 5/21/14 49 / 147 Great Expectations Example: Suppose X ∼ U(0, 1). f (x) = 1, 0 < x < 1. E[X] = a+b 2 = 1/2, Var(X) = (a−b)2 12 = 1/12. Then Chebychev implies 1 1 P X − ≥ ≤ . 2 122 In particular, for = 1/3, 1 3 1 (upper bound). ≤ P X − ≥ 2 3 4 Goldsman 5/21/14 50 / 147 Great Expectations Example (cont’d): Let’s compare the above bound to the exact answer. 1 1 P X − ≥ 2 3 1 1 = 1 − P X − < 2 3 1 1 1 = 1−P − <X − < 3 2 3 1 5 = 1−P <X< 6 6 Z 5/6 = 1− f (x) dx 1/6 2 = 1 − = 1/3. 3 (So Chebychev bound was pretty high in comparison.) Goldsman 5/21/14 51 / 147 Functions of a Random Variable Outline 1 Intro / Definitions 2 Discrete Random Variables 3 Continuous Random Variables 4 Cumulative Distribution Functions 5 Great Expectations 6 Functions of a Random Variable 7 Bivariate Random Variables 8 Conditional Distributions 9 Independent Random Variables 10 Conditional Expectation 11 Covariance and Correlation 12 Moment Generating Functions Goldsman 5/21/14 52 / 147 Functions of a Random Variable Functions of a Random Variable Problem Statement Discrete Case Continuous Case Inverse Transform Theorem Problem: You have a RV X and you know its pmf/pdf f (x). Define Y ≡ h(X) (some fn of X). Find g(y), the pmf/pdf of Y . Goldsman 5/21/14 53 / 147 Functions of a Random Variable Discrete Case: X discrete implies Y discrete implies X g(y) = P (Y = y) = P (h(X) = y) = P ({x|h(x) = y}) = f (x). x|h(x)=y Example: X is the # of H’s in 2 coin tosses. Want pmf for Y = h(X) = X 3 − X. x 0 1 2 P (X = x) 1 4 1 2 1 4 y = x3 − x 0 0 6 g(0) = P (Y = 0) = P (X = 0 or 1) = 3/4 and g(6) = P (Y = 6) = P (X = 2) = 1/4. ( 3/4 if y = 0 g(y) = 1/4 if y = 6 Goldsman 5/21/14 54 / 147 Functions of a Random Variable Example: X is discrete with f (x) = 1/8 if x = −1 3/8 if x = 0 1/3 if x = 1 1/6 if x = 2 0 otherwise Let Y = X 2 (Y can only equal 0,1,4). P (Y = 0) = f (0) = 3/8 P (Y = 1) = f (−1) + f (1) = 11/24 g(y) = P (Y = 4) = f (2) = 1/6 0, otherwise Goldsman 5/21/14 55 / 147 Functions of a Random Variable Continuous Case: X continuous implies Y can be continuous or discrete. Example: Y = X 2 (clearly cts) ( 0 if X < 0 Example: Y = is not continuous. 1 if X ≥ 0 Method: Compute G(y), the cdf of Y . Z G(y) = P (Y ≤ y) = P (h(X) ≤ y) = f (x) dx. {x|h(x)≤y} If G(y) is cts, construct the pdf g(y) by differentiating. Goldsman 5/21/14 56 / 147 Functions of a Random Variable Example: f (x) = |x|, −1 ≤ x ≤ 1. Find the pdf of the RV Y = X 2 . 2 G(y) = P (Y ≤ y) = P (X ≤ y) = 0 , 1 if y ≥ 1 (?) if 0 < y < 1 where √ √ (?) = P (− y ≤ X ≤ y) = Goldsman if y ≤ 0 Z √ y √ − y |x| dx = y. 5/21/14 57 / 147 Functions of a Random Variable Thus, G(y) = 0 if y ≤ 0 1 if y ≥ 1 y if 0 < y < 1 This implies ( g(y) = G0 (y) = 0 if y ≤ 0 or y ≥ 1 1 if 0 < y < 1 This is the U(0,1) distribution! Goldsman 5/21/14 58 / 147 Functions of a Random Variable Example: Suppose U ∼ U(0, 1). Find the pdf of Y = −`n(1 − U ). G(y) = P (Y ≤ y) = P (−`n(1 − U ) ≤ y) = P (1 − U ≥ e−y ) = P (U ≤ 1 − e−y ) Z 1−e−y f (u) du = 0 = 1 − e−y (since f (u) = 1) Taking the derivative, we have g(y) = e−y , y > 0. Wow! This implies Y ∼ Exp(λ = 1). We can generalize this result. . . Goldsman 5/21/14 59 / 147 Functions of a Random Variable Inverse Transform Theorem: Suppose X is a cts RV having cdf F (x). Then the random variable F (X) ∼ U(0, 1). Proof: Let Y = F (X). Then the cdf of Y is G(y) = P (Y ≤ y) = P (F (X) ≤ y) = P (X ≤ F −1 (y)) (the cdf is mono. increasing) = F (F −1 (y)) (F (x) is the cdf of X) = y. Uniform! Goldsman 5/21/14 60 / 147 Functions of a Random Variable Remark: This is a great theorem, since it applies to all continuous RV’s X. Corollary: X = F −1 (U ), so you can plug in a U(0,1) RV into the inverse cdf to generate a realization of a RV having X’s distribution. Remark: This is what we did in the example on the previous page. This trick has tremendous applications in simulation. Goldsman 5/21/14 61 / 147 Bivariate Random Variables Outline 1 Intro / Definitions 2 Discrete Random Variables 3 Continuous Random Variables 4 Cumulative Distribution Functions 5 Great Expectations 6 Functions of a Random Variable 7 Bivariate Random Variables 8 Conditional Distributions 9 Independent Random Variables 10 Conditional Expectation 11 Covariance and Correlation 12 Moment Generating Functions Goldsman 5/21/14 62 / 147 Bivariate Random Variables Bivariate Random Variables Discrete Case Continuous Case Bivariate cdf’s Marginal Distributions Now let’s look at what happens when you consider 2 RV’s simultaneously. Example: Choose a person at random. Look at height and weight (X, Y ). Obviously, X and Y will be related somehow. Goldsman 5/21/14 63 / 147 Bivariate Random Variables Discrete Case Definition: If X and Y are discrete RV’s, then (X, Y ) is called a jointly discrete bivariate RV. The joint (or bivariate) pmf is f (x, y) = P (X = x, Y = y). Properties: (1) 0P≤ P f (x, y) ≤ 1. (2) x y f (x, y) = 1. PP (3) A ⊆ R2 ⇒ P ((X, Y ) ∈ A) = (x,y)∈A f (x, y). Goldsman 5/21/14 64 / 147 Bivariate Random Variables Example: 3 sox in a box (numbered 1,2,3). Draw 2 sox at random w/o replacement. X = # of first sock, Y = # of second sock. The joint pmf f (x, y) is X=1 X=2 X=3 P (Y = y) Y =1 0 1/6 1/6 1/3 Y =2 1/6 0 1/6 1/3 Y =3 1/6 1/6 0 1/3 P (X = x) 1/3 1/3 1/3 1 fX (x) ≡ P (X = x) is the “marginal” distribution of X. fY (y) ≡ P (Y = y) is the “marginal” distribution of Y . Goldsman 5/21/14 65 / 147 Bivariate Random Variables By the law of total probability, P (X = 1) = 3 X P (X = 1, Y = y) = 1/3. y=1 In addition, P (X ≥ 2, Y ≥ 2) XX = f (x, y) x≥2 y≥2 = f (2, 2) + f (2, 3) + f (3, 2) + f (3, 3) = 0 + 1/6 + 1/6 + 0 = 1/3. Goldsman 5/21/14 66 / 147 Bivariate Random Variables Continuous Case Definition: If X and Y are cts RV’s, then (X, Y ) is a jointly cts RV if there exists a function f (x, y) such that (1) fR (x, R y) ≥ 0, ∀x, y. (2) R2 f (x, y) dx dy = 1. RR (3) P (A) = P ((X, Y ) ∈ A) = A f (x, y) dx dy. In this case, f (x, y) is called the joint pdf. If A ⊆ R2 , then P (A) is the volume between f (x, y) and A. Think of f (x, y) dx dy ≈ P (x < X < x + dx, y < Y < y + dy). Easy to see how all of this generalizes the 1-dimensional pdf, f (x). Goldsman 5/21/14 67 / 147 Bivariate Random Variables Easy Example: Choose a point at random in the interior of the circle inscribed in the unit square, e.g., C ≡ (x − 12 )2 + (y − 21 )2 = 14 . Find the pdf of (X, Y ). Since the area of the circle is π/4, ( 4/π if (x, y) ∈ C f (x, y) = 0 otherwise Application: Toss n darts randomly into the unit square. The probability that any individual dart will land in the circle is π/4. It stands to reason that the proportion of darts, p̂n , that land in the circle will be approximately π/4. So you can use 4p̂n to estimate π! Goldsman 5/21/14 68 / 147 Bivariate Random Variables Example: Suppose that ( f (x, y) = 4xy if 0 ≤ x ≤ 1, 0 ≤ y ≤ 1 otherwise 0 Find the prob (volume) of the region 0 ≤ y ≤ 1 − x2 . Z V 1 Z 1−x2 = 4xy dy dx 0 Z 0 1Z = √ 1−y 4xy dx dy 0 0 = 1/3. Moral: Be careful with limits! Goldsman 5/21/14 69 / 147 Bivariate Random Variables Bivariate cdf’s Definition: The joint (bivariate) cdf of X and Y is F (x, y) ≡ P (X ≤ x, Y ≤ y), for all x, y. PP discrete s≤x,t≤y f (s, t) F (x, y) = R y R x f (s, t) ds dt continuous −∞ −∞ Goldsman 5/21/14 70 / 147 Bivariate Random Variables Properties: F (x, y) is non-decreasing in both x and y. limx→−∞ F (x, y) = limy→−∞ F (x, y) = 0. limx→∞ F (x, y) = FY (y) = P (Y ≤ y) limy→∞ F (x, y) = FX (x) = P (X ≤ x). limx→∞ limy→∞ F (x, y) = 1. F (x, y) is cts from the right in both x and y. Going from cdf’s to pdf’s (continuous case). 1-dimension: f (x) = F 0 (x) = 2-D: f (x, y) = Goldsman ∂2 ∂x∂y F (x, y) = d dx Rx ∂2 ∂x∂y −∞ f (t) dt. Rx Ry −∞ −∞ f (s, t) dt ds. 5/21/14 71 / 147 Bivariate Random Variables Example: Suppose ( F (x, y) = 1 − e−x − e−y + e−(x+y) if x ≥ 0, y ≥ 0 if x < 0 or y < 0 0 The “marginal” cdf of X is ( FX (x) = lim F (x, y) = y→∞ 1 − e−x if x ≥ 0 0 if x < 0 The joint pdf is ∂2 F (x, y) ∂x∂y ∂ −x (e − e−y e−x ) = ∂y ( e−(x+y) if x ≥ 0, y ≥ 0 = 0 if x < 0 or y < 0 f (x, y) = Goldsman 5/21/14 72 / 147 Bivariate Random Variables Marginal Distributions — Distrns of X and Y . Example (discrete case): f (x, y) = P (X = x, Y = y). X=1 X=2 X=3 P (Y = y) Y =4 0.01 0.07 0.12 0.2 Y =6 0.29 0.03 0.48 0.8 P (X = x) 0.3 0.1 0.6 1 By total probability, P (X = 1) = P (X = 1, Y = any #) = 0.3. Goldsman 5/21/14 73 / 147 Bivariate Random Variables Definition: If X and Y are jointly discrete, then the marginal pmf’s of X and Y are, respectively, X fX (x) = P (X = x) = f (x, y) y and fY (y) = P (Y = y) = X f (x, y) x Goldsman 5/21/14 74 / 147 Bivariate Random Variables Example (discrete case): f (x, y) = P (X = x, Y = y). X=1 X=2 X=3 P (Y = y) Y =4 0.06 0.02 0.12 0.2 Y =6 0.24 0.08 0.48 0.8 P (X = x) 0.3 0.1 0.6 1 Hmmm. . . . Compared to the last example, this has the same marginals but different joint distribution! Goldsman 5/21/14 75 / 147 Bivariate Random Variables Definition: If X and Y are jointly cts, then the marginal pdf’s of X and Y are, respectively, Z Z fX (x) = f (x, y) dy and fY (y) = f (x, y) dx. R R Example: ( f (x, y) = e−(x+y) if x ≥ 0, y ≥ 0 otherwise 0 Then the marginal pdf of X is Z fX (x) = f (x, y) dy = R Goldsman Z ( ∞ e 0 −(x+y) dy = e−x 0 if x ≥ 0 otherwise 5/21/14 76 / 147 Bivariate Random Variables Example: ( f (x, y) = 21 2 4 x y 0 if x2 ≤ y ≤ 1 otherwise Note funny limits where the pdf is positive, i.e., x2 ≤ y ≤ 1. Z fX (x) = Z 1 f (x, y) dy = x2 R Z fY (y) = Goldsman Z f (x, y) dx = R 21 21 2 x y dy = x2 (1 − x4 ), 4 8 √ y √ − y 21 2 7 x y dx = y 5/2 , 4 2 if −1 ≤ x ≤ 1 if 0 ≤ y ≤ 1 5/21/14 77 / 147 Conditional Distributions Outline 1 Intro / Definitions 2 Discrete Random Variables 3 Continuous Random Variables 4 Cumulative Distribution Functions 5 Great Expectations 6 Functions of a Random Variable 7 Bivariate Random Variables 8 Conditional Distributions 9 Independent Random Variables 10 Conditional Expectation 11 Covariance and Correlation 12 Moment Generating Functions Goldsman 5/21/14 78 / 147 Conditional Distributions Conditional Distributions Recall conditional probability: P (A|B) = P (A ∩ B)/P (B) if P (B) > 0. Suppose that X and Y are jointly discrete RV’s. Then if P (Y = y) > 0, P (X = x|Y = y) = P (X = x ∩ Y = y) f (x, y) = P (Y = y) fY (y) P (X = x|Y = 2) defines the probabilities on X given that Y = 2. Definition: If fY (y) > 0, then fX|Y (x|y) ≡ pmf/pdf of X given Y = y. f (x,y) fY (y) is the conditional Remark: Usually just write f (x|y) instead of fX|Y (x|y). Remark: Of course, fY |X (y|x) = f (y|x) = Goldsman f (x,y) fX (x) . 5/21/14 79 / 147 Conditional Distributions Discrete Example: f (x, y) = P (X = x, Y = y). X=1 X=2 X=3 fY (y) Y =4 0.01 0.07 0.12 0.2 Y =6 0.29 0.03 0.48 0.8 fX (x) 0.3 0.1 0.6 1 Then, for example, f (x|y = 6) = Goldsman f (x, 6) f (x, 6) = = fY (6) 0.8 29 80 3 80 48 80 0 if x = 1 if x = 2 if x = 3 otherwise 5/21/14 80 / 147 Conditional Distributions Old Cts Example: f (x, y) = 21 2 x y, if x2 ≤ y ≤ 1 4 fX (x) = 21 2 x (1 − x4 ), if −1 ≤ x ≤ 1 8 fY (y) = 7 5/2 y , if 0 ≤ y ≤ 1 2 Then, for example, f ( 1 , y) 1 f (y| ) = 2 1 = 2 fX ( 2 ) Goldsman 21 8 · 21 4 1 4 · · 14 y (1 − 1 16 ) = 32 y, 15 if 1 4 ≤ y ≤ 1. 5/21/14 81 / 147 Conditional Distributions More generally, f (x, y) fX (x) f (y|x) = = 21 2 4 x y , 21 2 4 8 x (1 − x ) = 2y , 1 − x4 if x2 ≤ y ≤ 1 if x2 ≤ y ≤ 1. Note: 2/(1 − x4 ) is a constant with respect to y, and we can check to see that f (y|x) is a legit condl pdf: Z 1 x2 Goldsman 2y dy = 1. 1 − x4 5/21/14 82 / 147 Conditional Distributions Typical Problem: Given fX (x) and f (y|x), find fY (y). Steps: (1) f (x, y) = fX (x)f (y|x) (2) fY (y) = R R f (x, y) dx. Example: fX (x) = 2x, 0 < x < 1. Given X = x, suppose that Y |x ∼ U(0, x). Now find fY (y). Goldsman 5/21/14 83 / 147 Conditional Distributions Solution: Y |x ∼ U(0, x) ⇒ f (y|x) = 1/x, 0 < y < x. So f (x, y) = fX (x)f (y|x) 1 = 2x · , if 0 < x < 1 and 0 < y < x x = 2, if 0 < y < x < 1. Thus, Z fY (y) = 1 2 dx = 2(1 − y), 0 < y < 1. f (x, y) dx = R Goldsman Z y 5/21/14 84 / 147 Independent Random Variables Outline 1 Intro / Definitions 2 Discrete Random Variables 3 Continuous Random Variables 4 Cumulative Distribution Functions 5 Great Expectations 6 Functions of a Random Variable 7 Bivariate Random Variables 8 Conditional Distributions 9 Independent Random Variables 10 Conditional Expectation 11 Covariance and Correlation 12 Moment Generating Functions Goldsman 5/21/14 85 / 147 Independent Random Variables Independence Recall that two events are independent if P (A ∩ B) = P (A)P (B). Then P (A|B) = P (A ∩ B) P (A)P (B) = = P (A). P (B) P (B) And similarly, P (B|A) = P (B). Now want to define independence for RV’s, i.e., the outcome of X doesn’t influence the outcome of Y . Definition: X and Y are independent RV’s if, for all x and y, f (x, y) = fX (x)fY (y). Goldsman 5/21/14 86 / 147 Independent Random Variables Equivalent definitions: F (x, y) = FX (x)FY (y), ∀x, y or P (X ≤ x, Y ≤ y) = P (X ≤ x)P (Y ≤ y), ∀x, y If X and Y aren’t indep, then they’re dependent. Theorem: X and Y are indep if and only if f (y|x) = fY (y) ∀x, y. Proof: f (y|x) = fX (x)fY (y) f (x, y) = = fY (y). fX (x) fX (x) Similarly, X and Y indep implies f (x|y) = fX (x). Goldsman 5/21/14 87 / 147 Independent Random Variables Example (discrete): f (x, y) = P (X = x, Y = y). X=1 X=2 fY (y) Y =2 0.12 0.28 0.4 Y =3 0.18 0.42 0.6 fX (x) 0.3 0.7 1 X and Y are indep since f (x, y) = fX (x)fY (y), ∀x, y. Goldsman 5/21/14 88 / 147 Independent Random Variables Example (cts): Suppose f (x, y) = 6xy 2 , 0 ≤ x ≤ 1, 0 ≤ y ≤ 1. After some work (which can be avoided by the next theorem), we can derive fX (x) = 2x, if 0 ≤ x ≤ 1, and fY (y) = 3y 2 , if 0 ≤ y ≤ 1. X and Y are indep since f (x, y) = fX (x)fY (y), ∀x, y. Goldsman 5/21/14 89 / 147 Independent Random Variables Easy way to tell if X and Y are indep. . . Theorem: X and Y are indep iff f (x, y) = a(x)b(y), ∀x, y, for some functions a(x) and b(y) (not necessarily pdf’s). So if f (x, y) factors into separate functions of x and y, then X and Y are indep. Example: f (x, y) = 6xy 2 , 0 ≤ x ≤ 1, 0 ≤ y ≤ 1. Take a(x) = 6x, 0 ≤ x ≤ 1, and b(y) = y 2 , 0 ≤ y ≤ 1. Thus, X and Y are indep (as above). Goldsman 5/21/14 90 / 147 Independent Random Variables Example: f (x, y) = 21 2 4 x y, x2 ≤ y ≤ 1. “Funny” (non-rectangular) limits make factoring into marginals impossible. Thus, X and Y are not indep. Example: f (x, y) = c x+y , 1 ≤ x ≤ 2, 1 ≤ y ≤ 3. Can’t factor f (x, y) into fn’s of x and y separately. Thus, X and Y are not indep. Now that we can figure out if X and Y are indep, what can we do with that knowledge? Goldsman 5/21/14 91 / 147 Independent Random Variables Consequences of Independence Definition/Theorem (another Unconscious Statistician): Let h(X, Y ) be a fn of the RV’s X and Y . Then ( P P h(x, y)f (x, y) discrete E[h(X, Y )] = R xR y R R h(x, y)f (x, y) dx dy continuous Theorem: Whether or not X and Y are indep, E[X + Y ] = E[X] + E[Y ]. Goldsman 5/21/14 92 / 147 Independent Random Variables Proof (cts case): Z Z E[X + Y ] = (x + y)f (x, y) dx dy R R Z Z Z Z = xf (x, y) dx dy + R R Z = Z x R R Z f (x, y) dy dx + R Z = yf (x, y) dx dy R Z y R f (x, y) dx dy R Z xfX (x) dx + R yfY (y) dy R = E[X] + E[Y ]. Goldsman 5/21/14 93 / 147 Independent Random Variables Can generalize this result to more than two RV’s. Corollary: If X1 , X2 , . . . , Xn are RV’s, then X n n X E Xi = E[Xi ]. i=1 i=1 Proof: Induction. Goldsman 5/21/14 94 / 147 Independent Random Variables Theorem: If X and Y are indep, then E[XY ] = E[X]E[Y ]. Proof (cts case): Z Z xyf (x, y) dx dy E[XY ] = R R Z Z xyfX (x)fY (y) dx dy (X and Y are indep) = R R Z = Z xfX (x) dx R yfY (y) dy R = E[X]E[Y ]. Goldsman 5/21/14 95 / 147 Independent Random Variables Remark: The above theorem is not necessarily true if X and Y are dependent. See the upcoming discussion on covariance. Theorem: If X and Y are indep, then Var(X + Y ) = Var(X) + Var(Y ). Remark: The assumption of independence really is important here. If X and Y aren’t independent, then the result might not hold. Corollary: If X1 , X2 , . . . , Xn are indep RV’s, then X n n X Var Xi = Var(Xi ). i=1 i=1 Proof: Induction. Goldsman 5/21/14 96 / 147 Independent Random Variables Proof: Var(X + Y ) = E[(X + Y )2 ] − (E[X + Y ])2 = E[X 2 + 2XY + Y 2 ] − (E[X] + E[Y ])2 = E[X 2 ] + 2E[XY ] + E[Y 2 ] −(E[X])2 − 2E[X]E[Y ] − (E[Y ])2 = E[X 2 ] + 2E[X]E[Y ] + E[Y 2 ] −(E[X])2 − 2E[X]E[Y ] − (E[Y ])2 = E[X 2 ] − (E[X])2 + E[Y 2 ] − (E[Y ])2 = Var(X) + Var(Y ). Goldsman 5/21/14 97 / 147 Independent Random Variables Random Samples Definition: X1 , X2 , . . . , Xn form a random sample if Xi ’s are all independent. Each Xi has the same pmf/pdf f (x). iid Notation: X1 , . . . , Xn ∼ f (x) (“indep and identically distributed”) Goldsman 5/21/14 98 / 147 Independent Random Variables iid Example/Theorem: Suppose X1 , . . . , Xn ∼ f (x) with E[Xi ] = µ and Var(Xi ) = σ 2 . Define the sample mean as n 1X Xi . X̄ ≡ n i=1 Then X n n n 1 1X 1X E[X̄] = E Xi = E[Xi ] = µ = µ. n n n i=1 i=1 i=1 So the mean of X̄ is the same as the mean of Xi . Goldsman 5/21/14 99 / 147 Independent Random Variables Meanwhile,. . . X n 1 Xi Var(X̄) = Var n i=1 X n 1 = Var Xi n2 i=1 = = n 1 X Var(Xi ) (Xi ’s indep) n2 1 n2 i=1 n X σ 2 = σ 2 /n. i=1 So the mean of X̄ is the same as the mean of Xi , but the variance decreases! This makes X̄ a great estimator for µ (which is usually unknown in practice), and the result is referred to as the Law of Large Numbers. Stay tuned. Goldsman 5/21/14 100 / 147 Conditional Expectation Outline 1 Intro / Definitions 2 Discrete Random Variables 3 Continuous Random Variables 4 Cumulative Distribution Functions 5 Great Expectations 6 Functions of a Random Variable 7 Bivariate Random Variables 8 Conditional Distributions 9 Independent Random Variables 10 Conditional Expectation 11 Covariance and Correlation 12 Moment Generating Functions Goldsman 5/21/14 101 / 147 Conditional Expectation Conditional Expectation Usual definition of expectation. E.g., what’s the avg weight of a male? ( P yf (y) discrete E[Y ] = R y R yf (y) dy continuous Now suppose we’re interested in the avg weight of a 6’6" tall male. f (y|x) is the conditional pdf/pmf of Y given X = x. Definition: The conditional expectation of Y given X = x is ( P yf (y|x) discrete E[Y |X = x] ≡ R y R yf (y|x) dy continuous Note that E[Y |X = x] is a function of x. Goldsman 5/21/14 102 / 147 Conditional Expectation Discrete Example: f (x, y) X=0 X=3 fY (y) Y =2 0.11 0.34 0.45 Y =5 0.00 0.05 0.05 Y = 10 0.29 0.21 0.50 fX (x) 0.40 0.60 1 The unconditional expectation is E[Y ] = conditional on X = 3, we have P y yfY (y) = 6.15. But f (3, y) f (3, y) = = f (y|x = 3) = fX (3) 0.60 34 60 5 60 21 60 if y = 2 if y = 5 if y = 10 So the expectation conditional on X = 3 is X 5 21 34 E[Y |X = 3] = yf (y|3) = 2 +5 + 10 = 5.05. 60 60 60 y Goldsman 5/21/14 103 / 147 Conditional Expectation Old Cts Example: f (x, y) = 21 2 x y, if x2 ≤ y ≤ 1. 4 f (y|x) = 2y if x2 ≤ y ≤ 1. 1 − x4 Recall that Thus, Z E[Y |x] = yf (y|x) dy = R Thus, e.g., E[Y |X = 0.5] = Goldsman 2 3 · 2 1 − x4 63 15 64 / 16 Z 1 x2 y 2 dy = 2 1 − x6 · . 3 1 − x4 = 0.70. 5/21/14 104 / 147 Conditional Expectation Theorem (double expectations): E[E(Y |X)] = E[Y ]. Remarks: Yikes, what the heck is this!? The expected value (averaged over all X’s) of the conditional expected value (of Y |X) is the plain old expected value (of Y ). Think of the outside exp value as the exp value of h(X) = E(Y |X). Then the Law of the Unconscious Statistician miraculously gives us E[Y ]. Believe it or not, sometimes it’s easier to calculate E[Y ] indirectly by using our double expectation trick. Goldsman 5/21/14 105 / 147 Conditional Expectation Proof (cts case): By the Unconscious Statistician, Z E[E(Y |X)] = E(Y |x)fX (x) dx R Z Z yf (y|x) dy fX (x) dx = R R Z Z = yf (y|x)fX (x) dx dy R R Z = Z y R f (x, y) dx dy R Z = yfY (y) dy = E[Y ]. R Goldsman 5/21/14 106 / 147 Conditional Expectation Old Example: Suppose f (x, y) = ways. 21 2 4 x y, if x2 ≤ y ≤ 1. Find E[Y ] two By previous examples, we know that fX (x) = 21 2 x (1 − x4 ), if −1 ≤ x ≤ 1 8 fY (y) = 7 5/2 y , if 0 ≤ y ≤ 1 2 E[Y |x] = Goldsman 2 1 − x6 · . 3 1 − x4 5/21/14 107 / 147 Conditional Expectation Solution #1 (old, boring way): Z Z E[Y ] = yfY (y) dy = 0 R 1 7 7 7/2 y dy = . 2 9 Solution #2 (new, exciting way): E[Y ] = E[E(Y |X)] Z E(Y |x)fX (x) dx = R Z 1 = −1 2 1 − x6 · 3 1 − x4 21 2 7 4 x (1 − x ) dx = . 8 9 Notice that both answers are the same (good)! Goldsman 5/21/14 108 / 147 Conditional Expectation Example: An alternative way to calculate the mean of the Geom(p). Let Y ∼ Geom(p), e.g., Y could be the number of coin flips before H appears, where P (H) = p. Further, let ( 1 if first flip is H X = . 0 otherwise Based on the result X of the first step, we have E[Y ] = E[E(Y |X)] X = E(Y |x)fX (x) x = E(Y |X = 0)P (X = 0) + E(Y |X = 1)P (X = 1) = (1 + E[Y ])(1 − p) + 1(p). (why?) Solving, we get E[Y ] = 1/p (which is indeed the correct answer!) Goldsman 5/21/14 109 / 147 Conditional Expectation Theorem (expectation of the sum of a random number of RV’s): Suppose that X1 , X2 , . . . are independent RV’s, all with the same mean. Also suppose that N is a nonnegative, integer-valued RV, that’s independent of the Xi ’s. Then ! N X E Xi = E[N ]E[X1 ]. i=1 Remark: PN You have to be very careful here. In particular, note that E( i=1 Xi ) 6= N E[X1 ], since the LHS is a number and the RHS is random. Goldsman 5/21/14 110 / 147 Conditional Expectation Proof: By double expectation and the fact that N is indep of the Xi ’s, ! " !# N N X X E Xi = E E Xi N i=1 = = = = ∞ X n=1 ∞ X n=1 ∞ X n=1 ∞ X E i=1 N X i=1 E n X i=1 E n X ! Xi N = n P (N = n) ! Xi N = n P (N = n) ! Xi P (N = n) i=1 nE[X1 ]P (N = n) n=1 = E[X1 ] ∞ X nP (N = n). 2 n=1 Goldsman 5/21/14 111 / 147 Conditional Expectation Example: Suppose the number of times we roll a die is N ∼ Pois(10). If Xi denotes the value of the ith toss, then the expected total of all of the rolls is ! N X E Xi = E[N ]E[X1 ] = 10(3.5) = 35. 2 i=1 Theorem: Under the same conditions as before, ! N X Var Xi = E[N ]Var(X1 ) + (E[X1 ])2 Var(N ). i=1 Proof: See, for instance, Ross. Goldsman 2 5/21/14 112 / 147 Conditional Expectation Computing Probabilities by Conditioning Let A be some event, and define the RV Y as: ( 1 if A occurs Y = . 0 otherwise Then E[Y ] = X yfY (y) = P (Y = 1) = P (A). y Similarly, for any RV X, we have E[Y |X = x] = X yfY (y|x) y = P (Y = 1|X = x) = P (A|X = x). Goldsman 5/21/14 113 / 147 Conditional Expectation Further, since E[Y ] = E[E(Y |X)], we have P (A) = E[Y ] = E[E(Y |X)] Z = E[Y |x]dFX (x) R Z = P (A|X = x)dFX (x). R Goldsman 5/21/14 114 / 147 Conditional Expectation Example/Theorem: If X and Y are independent continuous RV’s, then Z P (Y < X) = FY (x)fX (x) dx, R where FY (·) is the c.d.f. of Y and fX (·) is the p.d.f. of X. Proof: (Actually, there are many proofs.) Let the event A = {Y < X}. Then Z P (Y < X) = P (Y < X|X = x)fX (x) dx R Z = P (Y < x|X = x)fX (x) dx ZR = P (Y < x)fX (x) dx R (since X, Y are indep). 2 Goldsman 5/21/14 115 / 147 Conditional Expectation Example: If X ∼ Exp(µ) and Y ∼ Exp(λ) are independent RV’s. Then Z P (Y < X) = FY (x)fX (x) dx R Z ∞ = (1 − e−λx )µe−µx dx 0 = Goldsman λ . 2 λ+µ 5/21/14 116 / 147 Conditional Expectation Example/Theorem: If X and Y are independent continuous RV’s, then Z P (X + Y < a) = FY (a − x)fX (x) dx, R where FY (·) is the c.d.f. of Y and fX (·) is the p.d.f. of X. The quantity X + Y is called a convolution. Proof: Z P (X + Y < a) = P (X + Y < a|X = x)fX (x) dx ZR P (Y < a − x|X = x)fX (x) dx = ZR P (Y < a − x)fX (x) dx = R (since X, Y are indep). 2 Goldsman 5/21/14 117 / 147 Conditional Expectation iid Example: Suppose X, Y ∼ Exp(λ). Note that −λ(a−x) if a − x ≥ 0 and x ≥ 0 1−e FY (a − x) = (i.e., 0 ≤ x ≤ a) 0 if otherwise Z FY (a − x)fX (x) dx ZRa = (1 − e−λ(a−x) )λe−λx dx P (X + Y < a) = 0 = 1 − e−λa − λae−λa , d P (X + Y < a) = λ2 ae−λa , da This implies that X + Y ∼ Gamma(2, λ). 2 Goldsman if a ≥ 0. a ≥ 0. 5/21/14 118 / 147 Conditional Expectation And Now, A Word From Our Sponsor. . . Congratulations! You are now done with the most difficult module of the course! Things will get easier from here (I hope)! Goldsman 5/21/14 119 / 147 Covariance and Correlation Outline 1 Intro / Definitions 2 Discrete Random Variables 3 Continuous Random Variables 4 Cumulative Distribution Functions 5 Great Expectations 6 Functions of a Random Variable 7 Bivariate Random Variables 8 Conditional Distributions 9 Independent Random Variables 10 Conditional Expectation 11 Covariance and Correlation 12 Moment Generating Functions Goldsman 5/21/14 120 / 147 Covariance and Correlation Covariance and Correlation These are measures used to define the degree of association between X and Y if they don’t happen to be indep. Definition: The covariance between X and Y is Cov(X, Y ) ≡ σXY ≡ E[(X − E[X])(Y − E[Y ])]. Remark: Cov(X, X) = E[(X − E[X])2 ] = Var(X). If X and Y have positive covariance, then X and Y move “in the same direction.” Think height and weight. If X and Y have negative covariance, then X and Y move “in opposite directions.” Think snowfall and temperature. Goldsman 5/21/14 121 / 147 Covariance and Correlation Theorem (easier way to calculate Cov): Cov(X, Y ) = E[XY ] − E[X]E[Y ]. Proof: Cov(X, Y ) = E[(X − E[X])(Y − E[Y ])] = E XY − XE[Y ] − Y E[X] + E[X]E[Y ] = E[XY ] − E[X]E[Y ] − E[Y ]E[X] + E[X]E[Y ] = E[XY ] − E[X]E[Y ]. Goldsman 5/21/14 122 / 147 Covariance and Correlation Theorem: X and Y indep implies Cov(X, Y ) = 0. Proof: Cov(X, Y ) = E[XY ] − E[X]E[Y ] = E[X]E[Y ] − E[X]E[Y ] (X, Y indep) = 0. Goldsman 5/21/14 123 / 147 Covariance and Correlation Danger Will Robinson: Cov(X, Y ) = 0 does not imply X and Y are indep!! Example: Suppose X ∼ U(−1, 1) and Y = X 2 (so X and Y are clearly dependent). But Z 1 1 dx = 0 and 2 −1 Z 1 1 3 E[XY ] = E[X ] = x3 · dx = 0, 2 −1 E[X] = x· so Cov(X, Y ) = E[XY ] − E[X]E[Y ] = 0. Goldsman 5/21/14 124 / 147 Covariance and Correlation Definition: The correlation between X and Y is Cov(X, Y ) σXY ρ = Corr(X, Y ) ≡ p . = σ Var(X)Var(Y ) X σY Remark: Cov has “square” units; corr is unitless. Corollary: X, Y indep implies ρ = 0. Theorem: It can be shown that −1 ≤ ρ ≤ 1. ρ ≈ 1 is “high” corr ρ ≈ 0 is “low” corr ρ ≈ −1 is “high” negative corr. Example: Height is highly correlated with weight. Temperature on Mars has low corr with IBM stock price. Goldsman 5/21/14 125 / 147 Covariance and Correlation Anti-UGA Example: Suppose X is the avg GPA of a UGA student, and Y is his IQ. Here’s the joint pmf. Goldsman f (x, y) X=2 X=3 X=4 fY (y) Y = 40 0.0 0.2 0.1 0.3 Y = 50 0.15 0.1 0.05 0.3 Y = 60 0.3 0.0 0.1 0.4 fX (x) 0.45 0.3 0.25 1 5/21/14 126 / 147 Covariance and Correlation X E[X] = xfX (x) = 2.8 x X E[X 2 ] = x2 fX (x) = 8.5 x Var(X) = E[X 2 ] − (E[X])2 = 0.66 Similarly, E[Y ] = 51, E[Y 2 ] = 2670, and Var(Y ) = 69. XX E[XY ] = xyf (x, y) x y = 2(40)(.0) + · · · + 4(60)(.1) = 140 Cov(X, Y ) = E[XY ] − E[X]E[Y ] = −2.8 Cov(X, Y ) = −0.415. ρ = p Var(X)Var(Y ) Goldsman 5/21/14 127 / 147 Covariance and Correlation Cts Example: Suppose f (x, y) = 10x2 y, 0 ≤ y ≤ x ≤ 1. Z x fX (x) = 10x2 y dy = 5x4 , 0 ≤ x ≤ 1 0 Z 1 5x5 dx = 5/6 E[X] = 0 E[X 2 ] = Z 1 5x6 dx = 5/7 0 Var(X) = E[X 2 ] − (E[X])2 = 0.01984 Goldsman 5/21/14 128 / 147 Covariance and Correlation Similarly, Z fY (y) = 1 10x2 y dx = y E[Y ] = 5/9, Z Var(Y ) = 0.04850 1Z x E[XY ] = 0 10 y(1 − y 3 ), 0 ≤ y ≤ 1 3 10x3 y 2 dy dx = 10/21 0 Cov(X, Y ) = E[XY ] − E[X]E[Y ] = 0.01323 Cov(X, Y ) ρ = p = 0.4265 Var(X)Var(Y ) Goldsman 5/21/14 129 / 147 Covariance and Correlation Theorems Involving Covariance Theorem: Var(X + Y ) = Var(X) + Var(Y ) + 2Cov(X, Y ), whether or not X and Y are indep. Remark: If X, Y are indep, the Cov term goes away. Proof: By the work we did on a previous proof, Var(X + Y ) = E[X 2 ] − (E[X])2 + E[Y 2 ] − (E[Y ])2 +2(E[XY ] − E[X]E[Y ]) = Var(X) + Var(Y ) + 2Cov(X, Y ). Goldsman 5/21/14 130 / 147 Covariance and Correlation Theorem: n n X X XX Var( Xi ) = Var(Xi ) + 2 i=1 i<j i=1 Cov(Xi , Xj ). Proof: Induction. Remark: If all Xi ’s are indep, then n n X X Xi ) = Var(Xi ). Var( i=1 Goldsman i=1 5/21/14 131 / 147 Covariance and Correlation Theorem: Cov(aX, bY ) = abCov(X, Y ). Proof: Cov(aX, bY ) = E[aX · bY ] − E[aX]E[bY ] = abE[XY ] − abE[X]E[Y ] = abCov(X, Y ). Theorem: n n X X XX Var( ai Xi ) = a2i Var(Xi ) + 2 i=1 i=1 i<j ai aj Cov(Xi , Xj ). Proof: Put above two results together. Goldsman 5/21/14 132 / 147 Covariance and Correlation Example: Var(X − Y ) = Var(X) + Var(Y ) − 2Cov(X, Y ). Example: Var(X − 2Y + 3Z) = Var(X) + 4Var(Y ) + 9Var(Z) −4Cov(X, Y ) + 6Cov(X, Z) − 12Cov(Y, Z). Goldsman 5/21/14 133 / 147 Moment Generating Functions Outline 1 Intro / Definitions 2 Discrete Random Variables 3 Continuous Random Variables 4 Cumulative Distribution Functions 5 Great Expectations 6 Functions of a Random Variable 7 Bivariate Random Variables 8 Conditional Distributions 9 Independent Random Variables 10 Conditional Expectation 11 Covariance and Correlation 12 Moment Generating Functions Goldsman 5/21/14 134 / 147 Moment Generating Functions Introduction Recall that E[X k ] is the kth moment of X. Definition: MX (t) ≡ E[etX ] is the moment generating function (mgf) of the RV X. Remark: MX (t) is a function of t, not of X! Example: X ∼ Bern(p). ( X = 1 w.p. p 0 w.p. q Then MX (t) = E[etX ] = X etx f (x) = et·1 p + et·0 q = pet + q. x Goldsman 5/21/14 135 / 147 Moment Generating Functions Example: X ∼ Exp(λ). ( f (x) = λe−λx if x > 0 otherwise 0 Then MX (t) = E[etX ] Z = etx f (x) dx R Z ∞ = λ e(t−λ)x dx 0 = Goldsman λ if λ > t. λ−t 5/21/14 136 / 147 Moment Generating Functions Big Theorem: Under certain technical conditions, dk k E[X ] = MX (t) , k = 1, 2, . . . . k dt t=0 Thus, you can generate the moments of X from the mgf. (Sometimes, it’s easier to get moments this way than directly.) Goldsman 5/21/14 137 / 147 Moment Generating Functions “Proof:” MX (t) = E[etX ] = E X ∞ (tX)k k=0 = 1 + tE[X] + k! t2 E[X 2 ] 2 = ∞ X (tX)k E k! k=0 + ··· This implies d MX (t) = E[X] + tE[X 2 ] + · · · dt and so d MX (t) = E[X]. dt t=0 Same deal for higher-order moments. Goldsman 5/21/14 138 / 147 Moment Generating Functions Example: X ∼ Exp(λ). Then MX (t) = E[X] = λ λ−t for λ > t. So λ d MX (t) = = 1/λ. dt (λ − t)2 t=0 t=0 Further, d2 2λ E[X ] = MX (t) = = 2/λ2 . dt2 (λ − t)3 t=0 t=0 2 Thus, Var(X) = E[X 2 ] − (E[X])2 = Goldsman 1 2 − = 1/λ2 . λ2 λ2 5/21/14 139 / 147 Moment Generating Functions Other Applications You can do lots of nice things with mgf’s. . . Find the mgf of a linear function of X. Find the mgf of the sum of indep RV’s. Identify distributions. Derive cool properties of distributions. Goldsman 5/21/14 140 / 147 Moment Generating Functions Theorem: Suppose X has mgf MX (t) and let Y = aX + b. Then MY (t) = etb MX (at). Proof: MY (t) = E[etY ] = E[et(aX+b) ] = etb E[e(at)X ] = etb MX (at). Example: Let X ∼ Exp(λ) and Y = 3X + 2. Then MY (t) = e2t MX (3t) = e2t Goldsman λ , if λ > 3t. λ − 3t 5/21/14 141 / 147 Moment Generating Functions Theorem: Suppose X1 , . . . , Xn are indep. Let Y = MY (t) = n Y Pn i=1 Xi . Then MXi (t). i=1 Proof: tY t MY (t) = E[e ] = E[e P Xi Y n tXi ] = E e i=1 = n Y E[etXi ] (Xi ’s indep) i=1 = n Y MXi (t). i=1 Goldsman 5/21/14 142 / 147 Moment Generating Functions Corollary: If X1 , . . . , Xn are i.i.d. and Y = Pn i=1 Xi , then MY (t) = [MX1 (t)]n . iid Example: Suppose X1 , . . . , Xn ∼ Bern(p). Then by a previous example, MY (t) = [MX1 (t)]n = (pet + q)n . So what use is a result like this? Goldsman 5/21/14 143 / 147 Moment Generating Functions Theorem: If X and Y have the same mgf, then they have the same distribution (at least in this course)! Proof: Too hard for this course. Example: The sum Y of n i.i.d. Bern(p) RV’s is the same as a Binomial(n, p) RV. By the previous example, MY (t) = (pet + q)n . So all we need to show is that the mgf of the matches this. Goldsman 5/21/14 144 / 147 Moment Generating Functions Meanwhile, let Z ∼ Bin(n, p). MZ (t) = E[etZ ] = X etz P (Z = z) z n X tz n = e pz q n−z z z=0 n X n = (pet )z q n−z z z=0 t = (pe + q)n (by the binomial theorem), and this matches the mgf of Y from the last pg. Goldsman 5/21/14 145 / 147 Moment Generating Functions Example: You can identify a distribution by its mgf. MX (t) = 3 t 1 e + 4 4 15 implies that X ∼ Bin(15, 0.75). Example: MY (t) = e −2t 3 3t 1 e + 4 4 15 implies that Y has the same distribution as 3X − 2, where X ∼ Bin(15, 0.75). Goldsman 5/21/14 146 / 147 Moment Generating Functions Theorem (Additive property of Binomials): If X1 , . . . , Xk are indep, with Xi ∼ Bin(ni , p) (where p is the same for all Xi ’s), then Y ≡ k X k X Xi ∼ Bin( ni , p). i=1 i=1 Proof: MY (t) = k Y MXi (t) (mgf of indep sum) i=1 = k Y (pet + q)ni (Binomial(ni , p) mgf) i=1 Pk = (pet + q) i=1 ni . P This is the mgf of the Bin( ki=1 ni , p), so we’re done. Goldsman 5/21/14 147 / 147