STAT511 — Spring 2014 Lecture Notes
Chapter 5: Joint Probability Distributions and Random Samples
February 19, 2014

5.1 Jointly Distributed Random Variables

Chapter Overview

• Jointly distributed rv's
  – Joint mass function and marginal mass function for discrete rv's
  – Joint density function and marginal density function for continuous rv's
  – Independent random variables
• Expectation, covariance, and correlation between two rv's
  – Expectation
  – Covariance
  – Correlation
  – Interpretations
• Statistics and their distributions
• Distribution of the sample mean
• Distribution of a linear combination

Joint Mass Function of Two Discrete RVs

Definition 1. Let X and Y be two discrete rv's defined on the sample space S of a random experiment. The joint probability mass function p(x, y) is defined for each pair of numbers (x, y) by

p(x, y) = P(X = x and Y = y)

Let A be a set of (x, y) pairs. The probability P[(X, Y) ∈ A] is obtained by summing the joint pmf over the pairs in A:

P[(X, Y) ∈ A] = Σ_{(x,y)∈A} p(x, y)

Example of Joint PMF

Example 5.1.1 (Exercise 5.3) A market has two checkout lines. Let X be the number of customers at the express checkout line at a particular time of day, and let Y denote the number of customers in the super-express line at the same time. The joint pmf of (X, Y) is given below:

 y \ x     0      1      2      3      4
   0     0.08   0.06   0.05   0.00   0.00
   1     0.07   0.15   0.04   0.03   0.01
   2     0.04   0.05   0.10   0.04   0.05
   3     0.00   0.04   0.06   0.07   0.06

What is P(X = 1, Y = 0)? What is P(X = 1, Y > 2)? What is P(X = Y)?

Marginal Probability Mass Function

Definition 2. The marginal probability mass functions of X and Y, denoted pX(x) and pY(y), respectively, are given by

pX(x) = P(X = x) = Σ_y p(x, y)
pY(y) = P(Y = y) = Σ_x p(x, y)

Example of Marginal Probability Mass Function

Example 5.1.1 (continued) Now let's find the marginal mass functions.

 y \ x     0      1      2      3      4    p(y)
   0     0.08   0.06   0.05   0.00   0.00   0.19
   1     0.07   0.15   0.04   0.03   0.01   0.30
   2     0.04   0.05   0.10   0.04   0.05   0.28
   3     0.00   0.04   0.06   0.07   0.06   0.23
 p(x)    0.19   0.30   0.25   0.14   0.12   1.00

What is P(X = 3)? What is P(Y = 2)?

Joint Probability Density Function of Two Continuous RVs

Definition 3. Let X and Y be continuous rv's. Then f(x, y) is the joint probability density function for X and Y if for any two-dimensional set A

P[(X, Y) ∈ A] = ∫∫_A f(x, y) dx dy

In particular, if A is the two-dimensional rectangle {(x, y) : a ≤ x ≤ b, c ≤ y ≤ d}, then

P[(X, Y) ∈ A] = ∫_a^b ∫_c^d f(x, y) dy dx

Geometrically, P[(X, Y) ∈ A] is the volume under the density surface above A.

Marginal Probability Density Function

Definition 4. The marginal probability density functions of X and Y, denoted fX(x) and fY(y), respectively, are given by

fX(x) = ∫_{−∞}^{∞} f(x, y) dy,  −∞ < x < ∞
fY(y) = ∫_{−∞}^{∞} f(x, y) dx,  −∞ < y < ∞

Independent Random Variables

Definition 5 (Independence of X and Y). Two random variables X and Y are said to be independent if for every pair of x and y values

p(x, y) = pX(x) · pY(y)   when X and Y are discrete, or
f(x, y) = fX(x) · fY(y)   when X and Y are continuous.

If X and Y are independent, we have

P(a < X < b, c < Y < d) = P(a < X < b) · P(c < Y < d)

If the condition is not satisfied for all (x, y), then X and Y are dependent.
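The joint pmf of Example 5.1.1 lends itself to direct computation. Below is a minimal sketch in Python (assuming NumPy is available; the array name `joint` and the index convention are my own) that stores the table with rows indexed by y and columns by x, recovers both marginals as in Definition 2, and evaluates the probabilities asked for above.

```python
import numpy as np

# Joint pmf from Example 5.1.1; rows index y = 0..3, columns index x = 0..4.
joint = np.array([
    [0.08, 0.06, 0.05, 0.00, 0.00],   # y = 0
    [0.07, 0.15, 0.04, 0.03, 0.01],   # y = 1
    [0.04, 0.05, 0.10, 0.04, 0.05],   # y = 2
    [0.00, 0.04, 0.06, 0.07, 0.06],   # y = 3
])

# Marginals: sum out the other variable (Definition 2).
p_x = joint.sum(axis=0)   # pX(x): [0.19, 0.30, 0.25, 0.14, 0.12]
p_y = joint.sum(axis=1)   # pY(y): [0.19, 0.30, 0.28, 0.23]

print("P(X=1, Y=0) =", joint[0, 1])
print("P(X=1, Y>2) =", joint[3, 1])                       # only y = 3 exceeds 2
print("P(X=Y)      =", sum(joint[k, k] for k in range(4)))
```

Comparing `joint` with `np.outer(p_y, p_x)` tests the independence condition of Definition 5 entrywise; here the two arrays differ (e.g., p(0, 0) = 0.08 while pX(0) · pY(0) = 0.0361), so X and Y are dependent.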
Example of Independence

Example 5.1.2 X follows an exponential distribution with λ = 2, Y follows an exponential distribution with λ = 3, and X and Y are independent. Find f(x, y).

f(x, y) = fX(x) · fY(y) = 2e^{−2x} · 3e^{−3y} = 6e^{−(2x+3y)},  x ≥ 0, y ≥ 0

Example 5.1.3 Toss a fair coin and a die. Let X = 1 if the coin is heads and X = 0 if the coin is tails, and let Y be the outcome of the die. If X and Y are independent, find p(x, y), and find the probability that the outcome of the die is greater than 3 and the coin is heads.

Since p(x, y) = pX(x) · pY(y) = (1/2)(1/6) = 1/12 for every pair:

 y \ x     0      1
   1     1/12   1/12
   2     1/12   1/12
   3     1/12   1/12
   4     1/12   1/12
   5     1/12   1/12
   6     1/12   1/12

P(X = 1, Y > 3) = P(X = 1) · P(Y > 3) = (1/2)(3/6) = 1/4

Examples Continued

Example 5.1.4 Given the following p(x, y), are X and Y independent?

 y \ x     0      1      2      3      4    p(y)
   0     0.04   0.08   0.16   0.04   0.08   0.4
   1     0.03   0.06   0.12   0.03   0.06   0.3
   2     0.01   0.02   0.04   0.01   0.02   0.1
   3     0.02   0.04   0.08   0.02   0.04   0.2
 p(x)    0.1    0.2    0.4    0.1    0.2    1.00

Example 5.1.5 Given fX(x) = 0.5x, 0 < x < 2, fY(y) = 3y², 0 < y < 1, and f(x, y) = 1.5xy², 0 < x < 2, 0 < y < 1, are X and Y independent?

fX(x) fY(y) = 0.5x · 3y² = 1.5xy² = f(x, y), so X and Y are independent.

More Than Two Random Variables

If X1, X2, ..., Xn are all discrete random variables, the joint pmf of the variables is the function

p(x1, ..., xn) = P(X1 = x1, ..., Xn = xn)

If the variables are continuous, the joint pdf is the function f such that for any n intervals [a1, b1], ..., [an, bn],

P(a1 ≤ X1 ≤ b1, ..., an ≤ Xn ≤ bn) = ∫_{a1}^{b1} ··· ∫_{an}^{bn} f(x1, ..., xn) dxn ··· dx1

Independence – More Than Two Random Variables

The random variables X1, X2, ..., Xn are independent if for every subset of the variables, the joint pmf or pdf of the subset is equal to the product of the marginal pmf's or pdf's.

Conditional Distributions

Definition 6. Let X and Y be two continuous rv's with joint pdf f(x, y) and marginal pdf's fX(x) and fY(y). Then for any x value for which fX(x) > 0, the conditional probability density function of Y given that X = x is

fY|X(y|x) = f(x, y) / fX(x),  −∞ < y < ∞

If X and Y are discrete, replacing pdf's by pmf's in this definition gives the conditional probability mass function of Y when X = x.

Example of Conditional Mass

Example 5.1.1 (continued) The joint mass function is given below. What is the conditional mass function of Y, given X = 1?

 y \ x     0      1      2      3      4    p(y)
   0     0.08   0.06   0.05   0.00   0.00   0.19
   1     0.07   0.15   0.04   0.03   0.01   0.30
   2     0.04   0.05   0.10   0.04   0.05   0.28
   3     0.00   0.04   0.06   0.07   0.06   0.23
 p(x)    0.19   0.30   0.25   0.14   0.12   1.00

Example 5.1.6 Given f(x, y) = (6/5)(x + y²), 0 ≤ x ≤ 1, 0 < y < 1, with marginal fX(x) = (6/5)x + 2/5. What is the conditional density of Y given X = 0?

fY|X(y|0) = f(0, y) / fX(0) = (6/5)y² / (2/5) = 3y²,  0 < y < 1

5.2 Expected Values, Covariance, and Correlation

Expected Values

Definition 7. Let X and Y be jointly distributed rv's with pmf p(x, y) or pdf f(x, y) according to whether the variables are discrete or continuous. Then the expected value of a function h(X, Y), denoted E[h(X, Y)] or µ_{h(X,Y)}, is

E[h(X, Y)] = Σ_x Σ_y h(x, y) · p(x, y)  in the discrete case;
E[h(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(x, y) · f(x, y) dx dy  in the continuous case.
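To make the discrete case of Definition 7 concrete, here is a small sketch (the helper `expect` is hypothetical, written for these notes) that evaluates E[h(X, Y)] as a double sum over the support, using the coin-and-die pmf of Example 5.1.3.

```python
# A minimal sketch of Definition 7 (discrete case): E[h(X, Y)] as a double sum.
# The pmf is the fair coin / fair die example (Example 5.1.3): p(x, y) = 1/12.

def expect(h, pmf):
    """E[h(X, Y)] = sum over the support of h(x, y) * p(x, y)."""
    return sum(h(x, y) * p for (x, y), p in pmf.items())

pmf = {(x, y): 1 / 12 for x in (0, 1) for y in range(1, 7)}

print(expect(lambda x, y: x * y, pmf))       # E(XY)
print(expect(lambda x, y: max(x, y), pmf))   # E[max(X, Y)]
print(expect(lambda x, y: x, pmf))           # E(X) = 0.5
```

Since X and Y are independent there, E(XY) = E(X)E(Y) = 0.5 · 3.5 = 1.75, which the double sum reproduces.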
Examples of Expected Values

• Example 5.2.1 The joint pmf is given below. What is E(XY)? What is E[max(X, Y)]?

 p(x, y)   x = 0   x = 1   x = 10
 y = 0      0.02    0.04    0.01
 y = 1      0.06    0.15    0.15
 y = 10     0.02    0.20    0.14
 y = 20     0.10    0.10    0.01

• Example 5.2.2 The joint pdf of X and Y is f(x, y) = 4xy, 0 < x < 1, 0 < y < 1. What is E(XY)?

Covariance

Definition 8. Let E(X) and E(Y) denote the expectations of the rv's X and Y. The covariance between X and Y, denoted Cov(X, Y), is defined as

Cov(X, Y) = E[(X − E(X))(Y − E(Y))]

i.e.,

Cov(X, Y) = Σ_x Σ_y [x − E(X)][y − E(Y)] p(x, y)  in the discrete case;
Cov(X, Y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − E(X))(y − E(Y)) f(x, y) dx dy  in the continuous case.

Properties of Covariance and Shortcut Formula

• Cov(X, X) = Var(X)
• Cov(X, Y) = Cov(Y, X)
• Cov(aX, bY) = ab Cov(X, Y) and Cov(X + a, Y + b) = Cov(X, Y); i.e., Cov(aX + b, cY + d) = ac Cov(X, Y)
• Shortcut formula: Cov(X, Y) = E(XY) − E(X)E(Y)
• If X and Y are independent, then Cov(X, Y) = 0. However, Cov(X, Y) = 0 does not imply independence.

Interpretation of Covariance

Like variance, covariance is a measure of variation.

• Covariance measures how much two random variables vary together.
  – As opposed to variance, which measures the variation of a single rv.
• If two rv's tend to vary together, the covariance between them will be positive.
  – For example, when one of them is above its expected value, the other tends to be above its expected value as well.
• If two rv's tend to vary in opposite directions, the covariance between them will be negative.
  – For example, when one of them is above its expected value, the other tends to be below its expected value.

Examples of Covariance

• Example 5.2.1 (Exercise) The joint pmf is given below. What is Cov(X, Y)?

 p(x, y)   x = 0   x = 1   x = 10
 y = 0      0.02    0.04    0.01
 y = 1      0.06    0.15    0.15
 y = 10     0.02    0.20    0.14
 y = 20     0.10    0.10    0.01

• Example 5.2.2 The joint pdf of X and Y is f(x, y) = 4xy, 0 < x < 1, 0 < y < 1. What is Cov(X, Y)?

• Example 5.2.3 Given the pmf below, what is Cov(X, Y)?

 p(x, y)   x = 0   x = 1   pY(y)
 y = 1      0.04    0.06    0.1
 y = 2      0.36    0.54    0.9
 pX(x)      0.4     0.6     1

Correlation

Definition 9. The correlation coefficient of two rv's X and Y, denoted Corr(X, Y), ρ_{X,Y}, or just ρ, is defined by

Corr(X, Y) = ρ_{X,Y} = Cov(X, Y) / (√Var(X) √Var(Y))

i.e.,

Corr(X, Y) = ρ_{X,Y} = Cov(X, Y) / (σX · σY)

where σX and σY are the standard deviations of X and Y, respectively.
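Definitions 8 and 9 are easy to mechanize for a discrete pmf. The sketch below (plain Python; the helper `E` is hypothetical) applies the shortcut formulas Cov(X, Y) = E(XY) − E(X)E(Y) and Corr(X, Y) = Cov(X, Y)/(σX σY) to the table of Example 5.2.1.

```python
# Sketch: covariance and correlation for the discrete pmf of Example 5.2.1,
# via the shortcut formulas Cov = E(XY) - E(X)E(Y), Corr = Cov/(sd_x * sd_y).
import math

xs = [0, 1, 10]
ys = [0, 1, 10, 20]
p = [[0.02, 0.04, 0.01],   # y = 0
     [0.06, 0.15, 0.15],   # y = 1
     [0.02, 0.20, 0.14],   # y = 10
     [0.10, 0.10, 0.01]]   # y = 20

def E(h):
    """E[h(X, Y)] as a double sum over the support (Definition 7)."""
    return sum(h(x, y) * p[i][j] for i, y in enumerate(ys) for j, x in enumerate(xs))

cov = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)
sd_x = math.sqrt(E(lambda x, y: x**2) - E(lambda x, y: x) ** 2)
sd_y = math.sqrt(E(lambda x, y: y**2) - E(lambda x, y: y) ** 2)
print("Cov(X, Y) =", cov, " Corr(X, Y) =", cov / (sd_x * sd_y))
```

Running the same computation on the pmf of Example 5.2.3 gives Cov(X, Y) = 0, as it must, since that table factors as pX(x) · pY(y).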
Properties and Shortcut Formula of Correlation

• For any two rv's X and Y, −1 ≤ Corr(X, Y) ≤ 1
• For a and c both positive or both negative, Corr(aX + b, cY + d) = Corr(X, Y)
• If X and Y are linearly related, i.e., Y = aX + b, then Corr(X, Y) = ±1
• Shortcut formula:

Corr(X, Y) = [E(XY) − E(X)E(Y)] / (√(E(X²) − (E(X))²) · √(E(Y²) − (E(Y))²))

• If X and Y are independent, Corr(X, Y) = 0. However, Corr(X, Y) = 0 does not imply independence.

Interpretation of Correlation

Correlation is a standardized measure.

• The correlation coefficient indicates the strength and direction of a linear relationship between two rv's.
• If X and Y are approximately positively linearly related, Corr(X, Y) will be close to 1.
• If X and Y are approximately negatively linearly related, Corr(X, Y) will be close to −1.
• If X and Y are not linearly related, Corr(X, Y) = 0. In particular, when X and Y are not related at all, i.e., independent, Corr(X, Y) = 0.

Examples of Correlation

• Example 5.2.1 (Exercise) The joint pmf is given below. Find Corr(X, Y).

 p(x, y)   x = 0   x = 1   x = 10
 y = 0      0.02    0.04    0.01
 y = 1      0.06    0.15    0.15
 y = 10     0.02    0.20    0.14
 y = 20     0.10    0.10    0.01

• Example 5.2.2 The joint pdf of X and Y is f(x, y) = 4xy, 0 < x < 1, 0 < y < 1. Find Corr(X, Y).

• Example 5.2.3 Given the pmf below, what is Corr(X, Y)?

 p(x, y)   y = −1   y = 0   y = 1   pX(x)
 x = −1     1/9      1/9     1/9     1/3
 x = 0      1/9      1/9     1/9     1/3
 x = 1      1/9      1/9     1/9     1/3

5.3 Statistics and their Distributions

Statistic

Definition 10 (Statistic). A statistic is any quantity whose value can be calculated from sample data; that is, a statistic is a function of random variables. We denote a statistic by an uppercase letter; a lowercase letter is used to represent the calculated or observed value of the statistic.

Examples:
• For two rv's X1 and X2, X̄ = (X1 + X2)/2 is a statistic.
• For 5 rv's X1, X2, ..., X5, Xmax = max(X1, X2, ..., X5) is a statistic.

Two or More Independent RVs

Definition 11 (Joint pmf and pdf for more than two rv's). For n rv's X1, X2, ..., Xn, the joint pmf is

p(x1, x2, ..., xn) = P(X1 = x1, X2 = x2, ..., Xn = xn)

and the joint pdf f satisfies, for any intervals [a1, b1], ..., [an, bn],

P(a1 ≤ X1 ≤ b1, ..., an ≤ Xn ≤ bn) = ∫_{a1}^{b1} ··· ∫_{an}^{bn} f(x1, ..., xn) dx1 ··· dxn

Definition 12 (Independence of more than two rv's). The random variables X1, X2, ..., Xn are said to be independent if for any subset of the Xi's, the joint pmf or pdf is the product of the marginal pmf's or pdf's.

Random Samples

Definition 13 (Random Sample). The rv's X1, X2, ..., Xn are said to form a (simple) random sample of size n if:
1. The Xi's are independent rv's.
2. Every Xi has the same probability distribution.
Such Xi's are said to be independent and identically distributed (iid).

• Example: Let X1, X2, ..., Xn be a random sample from the standard normal distribution, i.e., each Xi follows N(0, 1), and X1, X2, ..., Xn are independent.

Examples of Statistics of a Random Sample

• Example 5.3.1 (Sample mean) Take a random sample of size n from a specific distribution (say the standard normal). Then

X̄ = (X1 + X2 + ··· + Xn)/n is a statistic, the sample mean.

If X1 = x1, X2 = x2, ..., Xn = xn, then x̄ = (x1 + x2 + ··· + xn)/n is a value of the rv X̄.

• Example 5.3.2 (Sample variance) Take a random sample of size n from a specific distribution (say the standard normal), and let X̄ be the sample mean. Then

S² = Σ(Xi − X̄)² / (n − 1) is a statistic, the sample variance.

If X1 = x1, X2 = x2, ..., Xn = xn, then s² = Σ(xi − x̄)² / (n − 1) is a value of the rv S².

Deriving the Sampling Distribution of a Statistic

Definition 14 (Sampling Distribution). A statistic is a random variable; the distribution of a statistic is called the sampling distribution of the statistic. We may use probability rules to obtain the sampling distribution of a statistic, as the sketch below and Example 5.3.3 illustrate.
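Before working Example 5.3.3 by hand, here is a sketch of the brute-force approach suggested by Definition 14: enumerate every (x1, x2) pair, weight it by p(x1)p(x2) using independence, and accumulate the pmf of the statistic. It previews the tune-up charge distribution of Example 5.3.3 below; only the standard library is used.

```python
# Sketch of Definition 14: derive the exact sampling distribution of the
# sample mean for n = 2 iid draws by enumerating all outcome pairs.
from itertools import product
from collections import defaultdict

pmf = {40: 0.2, 45: 0.3, 50: 0.5}   # service-charge distribution (Example 5.3.3)

xbar_dist = defaultdict(float)
for (x1, p1), (x2, p2) in product(pmf.items(), repeat=2):
    xbar = (x1 + x2) / 2
    xbar_dist[xbar] += p1 * p2      # independence: p(x1, x2) = p(x1) p(x2)

for xbar in sorted(xbar_dist):
    print(f"P(X-bar = {xbar}) = {xbar_dist[xbar]:.2f}")
```

Replacing the statistic (x1 + x2)/2 by the sample variance gives the sampling distribution of S² in the same way.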
Example 5.3.3 (Example 5.20 in the textbook) A large automobile service center charges $40, $45, and $50 for a tune-up of four-, six-, and eight-cylinder cars, respectively. 20% of the tune-ups are done on four-cylinder cars, 30% on six-cylinder cars, and 50% on eight-cylinder cars. Let X be the service charge for a single tune-up. Then the distribution of X is:

 x      40    45    50
 p(x)   0.2   0.3   0.5

Now let X1 and X2 be the service charges of two randomly selected tune-ups. Find the distribution of:
1. X̄ = (X1 + X2)/2
2. S² = [(X1 − X̄)² + (X2 − X̄)²]/(2 − 1)

Example 5.3.3 (continued)

 x1    x2    p(x1, x2)    x̄      s²
 40    40      0.04       40      0
 40    45      0.06       42.5    12.5
 40    50      0.10       45      50
 45    40      0.06       42.5    12.5
 45    45      0.09       45      0
 45    50      0.15       47.5    12.5
 50    40      0.10       45      50
 50    45      0.15       47.5    12.5
 50    50      0.25       50      0

5.4 The Distribution of the Sample Mean

Sampling Distribution of the Sample Sum and the Sample Mean

Proposition (Mean and Std Dev of the Sample Sum). Let X1, X2, ..., Xn be a random sample from a distribution with mean µ and standard deviation σ, and let To = X1 + X2 + ··· + Xn be the sample sum. Then:
1. E(To) = nµ
2. Var(To) = nσ² and σ_To = √n σ

Proposition (Mean and Std Dev of the Sample Mean). Let X1, X2, ..., Xn be a random sample from a distribution with mean µ and standard deviation σ, and let X̄ = (X1 + X2 + ··· + Xn)/n be the sample mean. Then:
1. E(X̄) = µ
2. Var(X̄) = σ²/n and σ_X̄ = σ/√n

The sampling distribution related to the sample variance S² will be introduced in Chapter 7.

The Case of a Normal Population Distribution

Proposition. Let X1, X2, ..., Xn be a random sample from a normal distribution with mean µ and standard deviation σ (N(µ, σ)). Then To and X̄ both follow normal distributions:
1. To ∼ N(nµ, √n σ)
2. X̄ ∼ N(µ, σ/√n)

Example 5.4.1 (Example 5.25) Let X be the time it takes a rat to find its way through a maze, with X ∼ N(µ = 1.5, σ² = 0.35²) (in minutes). Suppose five rats are randomly selected, and let X1, X2, ..., X5 denote their times in the maze, a random sample from N(µ = 1.5, σ² = 0.35²). Let the total time be To = X1 + X2 + ··· + X5 and the average time be X̄ = (X1 + X2 + ··· + X5)/5. What is the probability that the total time of the 5 rats is between 6 and 8 minutes? What is the probability that the average time is at most 2.0 minutes?

Example 5.4.1 (continued)

To ∼ N(5 · 1.5 = 7.5, σ² = 5 · 0.35² = 0.6125), so

P(6 < To < 8) = P((6 − 7.5)/√0.6125 < Z < (8 − 7.5)/√0.6125) = Φ(0.64) − Φ(−1.92) = 0.7115

X̄ ∼ N(1.5, σ² = 0.35²/5), so

P(X̄ ≤ 2.0) = P(Z ≤ (2.0 − 1.5)/(0.35/√5)) = P(Z ≤ 3.19) = Φ(3.19) = 0.9993

Central Limit Theorem

Theorem 15. Let X1, X2, ..., Xn be a random sample from a distribution with mean µ and variance σ². Then if n is sufficiently large, X̄ has approximately a normal distribution with µ_X̄ = µ and σ²_X̄ = σ²/n, and To has approximately a normal distribution with µ_To = nµ and σ²_To = nσ². The larger the value of n, the better the approximation.

Rule of thumb: if n > 30, the Central Limit Theorem can be used.

Central Limit Theorem Examples

• Example 5.5.1 (Example 5.26 in the textbook) The amount of a particular impurity in a batch of a certain chemical product is a random variable with mean 4.0 g and standard deviation 1.5 g. If 50 batches are independently prepared, what is the (approximate) probability that the sample average amount of impurity X̄ is between 3.5 g and 3.8 g?

• Example 5.5.2 (Example 5.27 in the textbook) The number of major defects for a certain model of automobile is a random variable with mean 3.2 and standard deviation 2.4. Among 100 randomly selected cars of this model, how likely is it that the average number of major defects exceeds 4? How likely is it that the total number of major defects among all 100 cars exceeds 20?
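The CLT computation in Example 5.5.1 reduces to standardizing X̄. A minimal sketch, assuming SciPy is available for the standard normal cdf:

```python
# Sketch of the CLT approximation for Example 5.5.1: with n = 50 batches,
# X-bar is approximately N(mu, sigma/sqrt(n)), so standardize and use Phi.
from math import sqrt
from scipy.stats import norm

mu, sigma, n = 4.0, 1.5, 50
se = sigma / sqrt(n)                      # sigma_xbar = 1.5/sqrt(50) ~ 0.2121

z_lo = (3.5 - mu) / se                    # ~ -2.36
z_hi = (3.8 - mu) / se                    # ~ -0.94
print("P(3.5 < X-bar < 3.8) ~", norm.cdf(z_hi) - norm.cdf(z_lo))   # ~ 0.163
```

Example 5.5.2 works the same way with mu = 3.2, sigma = 2.4, and n = 100.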
5.5 The Distribution of a Linear Combination

Linear Combinations of RVs

Definition 16. Given a collection of n random variables X1, X2, ..., Xn and n numerical constants a1, a2, ..., an, the rv

Y = a1X1 + a2X2 + ··· + anXn = Σ_{i=1}^{n} aiXi

is called a linear combination of the Xi's.

1. The sample sum To = X1 + X2 + ··· + Xn is a linear combination with a1 = a2 = ··· = an = 1.
2. The sample mean X̄ = (X1 + X2 + ··· + Xn)/n is a linear combination with a1 = a2 = ··· = an = 1/n.

Mean and Variance of Linear Combinations

Proposition. Let X1, X2, ..., Xn have expectations µ1, µ2, ..., µn and variances σ1², σ2², ..., σn², respectively, and let Y = a1X1 + a2X2 + ··· + anXn be a linear combination of the Xi's. Then:
1. E(Y) = a1µ1 + a2µ2 + ··· + anµn
2. Var(Y) = Σ_{i=1}^{n} Σ_{j=1}^{n} ai aj Cov(Xi, Xj)
3. If X1, X2, ..., Xn are independent, Var(Y) = a1²σ1² + a2²σ2² + ··· + an²σn²

Corollary 17. If X1 and X2 are independent, then E(X1 − X2) = E(X1) − E(X2) and Var(X1 − X2) = Var(X1) + Var(X2).

The Case of Normal Random Variables

Proposition. If X1, X2, ..., Xn are independent, normally distributed rv's with means µ1, µ2, ..., µn and variances σ1², σ2², ..., σn², then any linear combination of the Xi's, Y = a1X1 + a2X2 + ··· + anXn, also has a normal distribution:

Y ∼ N(a1µ1 + a2µ2 + ··· + anµn, σ² = a1²σ1² + a2²σ2² + ··· + an²σn²)

Example

Example 5.5.3 (Example 5.30 in the textbook) Three grades of gasoline are priced at $1.20, $1.35, and $1.50 per gallon, respectively. Let X1, X2, and X3 denote the amounts (in gallons) of these grades purchased on a particular day. Suppose the Xi's are independent and normally distributed with µ1 = 1000, µ2 = 500, µ3 = 100, σ1 = 100, σ2 = 80, σ3 = 50. The total revenue from the sale of the three grades of gasoline on that day is Y = 1.2X1 + 1.35X2 + 1.5X3. Find the probability that the total revenue exceeds $2500.
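A quick check of Example 5.5.3, assuming SciPy for the standard normal cdf (the variable names are my own):

```python
# Sketch of Example 5.5.3: Y = 1.2 X1 + 1.35 X2 + 1.5 X3 is normal because the
# Xi's are independent normals; compute E(Y), Var(Y), then P(Y > 2500).
from math import sqrt
from scipy.stats import norm

a = [1.2, 1.35, 1.5]     # prices per gallon (the coefficients a_i)
mu = [1000, 500, 100]    # mean gallons purchased
sd = [100, 80, 50]       # std dev of gallons purchased

mean_y = sum(ai * mi for ai, mi in zip(a, mu))        # E(Y)   = 2025.0
var_y = sum(ai**2 * si**2 for ai, si in zip(a, sd))   # Var(Y) = 31689.0
z = (2500 - mean_y) / sqrt(var_y)                     # ~ 2.67
print("P(Y > 2500) =", 1 - norm.cdf(z))               # ~ 0.0038
```

The independence assumption is what lets the variance be the coefficient-weighted sum of the individual variances (item 3 of the proposition above).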