STAT355 - Probability & Statistics
Chapter 5: Joint Probability Distributions and Random Samples
Fall 2011

Outline
1. 5.1 Jointly Distributed Random Variables
2. 5.2 Expected Values, Covariance, and Correlation
3. 5.3 Statistics and Their Distributions
4. 5.4 The Distribution of the Sample Mean
5. 5.5 The Distribution of a Linear Combination

Two Discrete Random Variables
- The probability mass function (pmf) of a single discrete rv X specifies how much probability mass is placed on each possible value of X.
- The joint pmf of two discrete rvs X and Y describes how much probability mass is placed on each possible pair of values (x, y).

Definition
- Let X and Y be two discrete rvs defined on the sample space S of an experiment. The joint probability mass function p(x, y) is defined for each pair of numbers (x, y) by
      p(x, y) = P(X = x and Y = y)
- The marginal probability mass function of X, denoted by p_X(x), is given by
      p_X(x) = Σ_y p(x, y)   for each possible value x.
- Similarly, the marginal probability mass function of Y is
      p_Y(y) = Σ_x p(x, y)   for each possible value y.

Two Discrete Random Variables - Remarks
- It must be the case that p(x, y) ≥ 0 and Σ_x Σ_y p(x, y) = 1.
- Let A be any set consisting of pairs of (x, y) values (e.g., A = {(x, y) : x + y = 5} or {(x, y) : max(x, y) ≤ 3}).
Then the probability P[(X, Y) ∈ A] is obtained by summing the joint pmf over pairs in A:
      P[(X, Y) ∈ A] = Σ Σ_{(x,y) ∈ A} p(x, y)

Two Discrete Random Variables - Examples
A large insurance agency services a number of customers who have purchased both a homeowner's policy and an automobile policy from the agency. For each type of policy, a deductible amount must be specified. For an automobile policy, the choices are $100 and $250, whereas for a homeowner's policy, the choices are $0, $100, and $200. Suppose an individual with both types of policy is selected at random from the agency's files. Let X = the deductible amount on the auto policy, and Y = the deductible amount on the homeowner's policy. Possible (X, Y) pairs are then (100, 0), (100, 100), (100, 200), (250, 0), (250, 100), and (250, 200); the joint pmf specifies the probability associated with each one of these pairs, with any other pair having probability zero.

Suppose the joint pmf is given in the accompanying joint probability table:

      p(x, y)     y = 0    y = 100    y = 200
      x = 100      0.20      0.10       0.20
      x = 250      0.05      0.15       0.30

Then p(100, 100) = P(X = 100 and Y = 100) = P($100 deductible on both policies) = 0.10.
The probability P(Y ≥ 100) is computed by summing the probabilities of all (x, y) pairs for which y ≥ 100:
      P(Y ≥ 100) = p(100, 100) + p(250, 100) + p(100, 200) + p(250, 200) = 0.75

The possible X values are x = 100 and x = 250, so computing row totals in the joint probability table yields
      p_X(100) = p(100, 0) + p(100, 100) + p(100, 200) = 0.50
and
      p_X(250) = p(250, 0) + p(250, 100) + p(250, 200) = 0.50
The marginal pmf of X is then
      p_X(x) = 0.5 if x = 100 or 250, and 0 otherwise.
And the marginal pmf of Y is
      p_Y(y) = 0.25 if y = 0 or 100, 0.50 if y = 200, and 0 otherwise.

Two Continuous Random Variables
- The probability that the observed value of a continuous rv X lies in a one-dimensional set A (such as an interval) is obtained by integrating the pdf f(x) over the set A.
- Similarly, the probability that the pair (X, Y) of continuous rvs falls in a two-dimensional set A (such as a rectangle) is obtained by integrating a function called the joint density function.

Definition
Let X and Y be continuous rvs. A joint probability density function f(x, y) for these two variables is a function satisfying f(x, y) ≥ 0 and ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1. Then for any two-dimensional set A,
      P[(X, Y) ∈ A] = ∫∫_A f(x, y) dx dy
In particular, if A is the two-dimensional rectangle {(x, y) : a ≤ x ≤ b, c ≤ y ≤ d}, then
      P[(X, Y) ∈ A] = P(a ≤ X ≤ b, c ≤ Y ≤ d) = ∫_a^b ∫_c^d f(x, y) dy dx

Definition
The marginal probability density functions of X and Y, denoted by f_X(x) and f_Y(y), respectively, are given by
      f_X(x) = ∫_{−∞}^{∞} f(x, y) dy   for −∞ < x < ∞     (1)
      f_Y(y) = ∫_{−∞}^{∞} f(x, y) dx   for −∞ < y < ∞     (2)

Two Continuous Random Variables - Examples
A bank operates both a drive-up facility and a walk-up window. On a randomly selected day, let X = the proportion of time that the drive-up facility is in use (at least one customer is being served or waiting to be served) and Y = the proportion of time that the walk-up window is in use. Then the set of possible values for (X, Y) is the rectangle D = {(x, y) : 0 ≤ x ≤ 1, 0 ≤ y ≤ 1}.
Suppose the joint pdf of (X, Y) is given by
      f(x, y) = (6/5)(x + y²) if 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and 0 otherwise.
Verify that this is a legitimate pdf:
1. f(x, y) ≥ 0
2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1

Independent Random Variables
Definition
Two random variables X and Y are said to be independent if for every pair of x and y values
      p(x, y) = p_X(x) p_Y(y)   when X and Y are discrete, or
      f(x, y) = f_X(x) f_Y(y)   when X and Y are continuous.     (3)
If (3) is not satisfied for all (x, y), then X and Y are said to be dependent.

In the insurance situation,
      p(100, 100) = 0.10 ≠ (0.5)(0.25) = p_X(100) p_Y(100)
so X and Y are not independent. Independence of two random variables is most useful when the description of the experiment under study suggests that X and Y have no effect on one another. Then once the marginal pmfs or pdfs have been specified, the joint pmf or pdf is simply the product of the two marginal functions. It follows that
      P(a ≤ X ≤ b, c ≤ Y ≤ d) = P(a ≤ X ≤ b) P(c ≤ Y ≤ d)

Conditional Distributions
Definition
Let X and Y be two continuous rvs with joint pdf f(x, y) and marginal X pdf f_X(x). Then for any x value for which f_X(x) > 0, the conditional probability density function of Y given that X = x is
      f_{Y|X}(y | x) = f(x, y) / f_X(x),   −∞ < y < ∞

Exercise (5.1) 13
You have two lightbulbs for a particular lamp.
Let X = the lifetime of the first bulb and Y = the lifetime of the second bulb (both in 1000s of hours). Suppose that X and Y are independent and that each has an exponential distribution with parameter λ = 1.
1. What is the joint pdf of X and Y?
2. What is the probability that each bulb lasts at most 1000 hours (i.e., X ≤ 1 and Y ≤ 1)?
3. What is the probability that the total lifetime of the two bulbs is at most 2? [Hint: Draw a picture of the region A = {(x, y) : x ≥ 0, y ≥ 0, x + y ≤ 2} before integrating.]
4. What is the probability that the total lifetime is between 1 and 2?

Expected Values
Proposition
Let X and Y be jointly distributed rvs with pmf p(x, y) or pdf f(x, y) according to whether the variables are discrete or continuous. Then the expected value of a function h(X, Y), denoted by E[h(X, Y)] or μ_{h(X,Y)}, is given by
      E[h(X, Y)] = Σ_x Σ_y h(x, y) p(x, y)   if X and Y are discrete
      E[h(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(x, y) f(x, y) dx dy   if X and Y are continuous

Expected Values - Example
The joint pdf of the amount X of almonds and the amount Y of cashews in a 1-lb can of nuts was
      f(x, y) = 24xy if 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, x + y ≤ 1, and 0 otherwise.
If 1 lb of almonds costs the company $1.00, 1 lb of cashews costs $1.50, and 1 lb of peanuts costs $0.50, then the total cost of the contents of a can is
      h(X, Y) = (1)X + (1.5)Y + (0.5)(1 − X − Y) = 0.5 + 0.5X + Y
The expected total cost is
      E[h(X, Y)] = ∫∫ h(x, y) f(x, y) dx dy = ∫_0^1 ∫_0^{1−x} (0.5 + 0.5x + y) · 24xy dy dx

Covariance
- When two random variables X and Y are not independent, it is frequently of interest to assess how strongly they are related to one another.
Definition
The covariance between two rvs X and Y is
      Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)]
                = Σ_x Σ_y (x − μ_X)(y − μ_Y) p(x, y)   if X and Y are discrete
                = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − μ_X)(y − μ_Y) f(x, y) dx dy   if X and Y are continuous
The following shortcut formula for Cov(X, Y) simplifies the computations.
Proposition
      Cov(X, Y) = E(XY) − μ_X μ_Y

Covariance
- Since X − μ_X and Y − μ_Y are the deviations of the two variables from their respective mean values, the covariance is the expected product of deviations.
Remarks:
1. Cov(X, X) = E[(X − μ_X)²] = V(X).
2. If X and Y have a strong positive relationship to one another, then Cov(X, Y) should be quite positive.
3. For a strong negative relationship, Cov(X, Y) should be quite negative.
4. If X and Y are not strongly related, Cov(X, Y) is near 0.

Correlation
Definition
The correlation coefficient of X and Y, denoted by Corr(X, Y), ρ_{X,Y}, or just ρ, is defined by
      ρ_{X,Y} = Cov(X, Y) / (σ_X σ_Y)
where σ_X and σ_Y are the standard deviations of X and Y.
Proposition
If a and c are either both positive or both negative,
      Corr(aX + b, cY + d) = Corr(X, Y)
For any two rvs X and Y, −1 ≤ Corr(X, Y) ≤ 1.

Proposition
1. If X and Y are independent, then ρ_{X,Y} = 0, but ρ = 0 does not imply independence.
2. ρ = 1 or −1 iff Y = aX + b for some numbers a and b with a ≠ 0.
- This proposition says that ρ is a measure of the degree of linear relationship between X and Y, and only when the two variables are perfectly related in a linear manner will ρ be as positive or negative as it can be.
- A ρ less than 1 in absolute value indicates only that the relationship is not completely linear; there may still be a very strong nonlinear relation.
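The shortcut formula and the invariance of ρ under linear rescaling can be checked numerically on the insurance deductible joint pmf from the earlier example. This is a quick Python sketch, not part of the original slides; the names `pmf` and `E` are ours.

```python
# Numeric check of Cov(X, Y) = E(XY) - mu_X * mu_Y and of
# Corr(aX + b, cY + d) = Corr(X, Y), using the insurance joint pmf.
from math import sqrt, isclose

pmf = {(100, 0): 0.20, (100, 100): 0.10, (100, 200): 0.20,
       (250, 0): 0.05, (250, 100): 0.15, (250, 200): 0.30}

def E(h):
    """Expected value of h(X, Y) under the joint pmf."""
    return sum(h(x, y) * p for (x, y), p in pmf.items())

mu_x, mu_y = E(lambda x, y: x), E(lambda x, y: y)

# Definition: expected product of deviations.
cov_def = E(lambda x, y: (x - mu_x) * (y - mu_y))
# Shortcut formula.
cov_short = E(lambda x, y: x * y) - mu_x * mu_y
assert isclose(cov_def, cov_short)

sd_x = sqrt(E(lambda x, y: (x - mu_x) ** 2))
sd_y = sqrt(E(lambda x, y: (y - mu_y) ** 2))
rho = cov_def / (sd_x * sd_y)
print(round(cov_def, 2), round(rho, 4))   # → 1875.0 0.3015

# Rescaling: Cov(aX + b, cY + d) = ac Cov(X, Y), and rho is unchanged
# when a and c have the same sign (here both positive).
a, b, c, d = 2.0, 3.0, 5.0, -1.0
cov_scaled = E(lambda x, y: (a * x + b - (a * mu_x + b)) * (c * y + d - (c * mu_y + d)))
assert isclose(cov_scaled, a * c * cov_def)
assert isclose(cov_scaled / ((a * sd_x) * (c * sd_y)), rho)
```

The positive ρ ≈ 0.30 matches the intuition that a customer choosing a high auto deductible tends to choose a higher homeowner's deductible, though the linear relationship is weak.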
Exercise (5.2) 27
Annie and Alvie have agreed to meet for lunch between noon (12:00 pm) and 1:00 pm. Denote Annie's arrival time by X and Alvie's by Y, and suppose X and Y are independent with pdfs
      f_X(x) = 3x² if 0 ≤ x ≤ 1, and 0 otherwise
      f_Y(y) = 2y if 0 ≤ y ≤ 1, and 0 otherwise
What is the expected amount of time that the one who arrives first must wait for the other person? [Hint: h(X, Y) = |X − Y|.]

Exercise (5.2) 35
1. Use the rules of expected value to show that Cov(aX + b, cY + d) = ac Cov(X, Y).
2. Use part 1 along with the rules of variance and standard deviation to show that Corr(aX + b, cY + d) = Corr(X, Y) when a and c have the same sign.
3. What happens if a and c have opposite signs?

Random Samples
Definition
A statistic is any quantity whose value can be calculated from sample data.
- A statistic is a random variable and will be denoted by an uppercase letter; a lowercase letter is used to represent the calculated or observed value of the statistic.
Definition
The rvs X_1, X_2, ..., X_n are said to form a (simple) random sample of size n if
1. The X_i's are independent rvs.
2. Every X_i has the same probability distribution.
A random sample X_i, i = 1, ..., n is sometimes referred to as iid (independent and identically distributed).

Exercise (5.3) 39
It is known that 80% of all brand A zip drives work in a satisfactory manner throughout the warranty period (are "successes"). Suppose that n = 10 drives are randomly selected. Let X = the number of successes in the sample. The statistic X/n is the sample proportion (fraction) of successes. Obtain the sampling distribution of this statistic. [Hint: One possible value of X/n is 0.3.
What is the probability of this value? (What kind of random variable is X?)]

The Distribution of the Sample Mean
Notation: Let X_1, ..., X_n be iid rvs. The sample mean is denoted by
      X̄ = (1/n) Σ_{i=1}^{n} X_i
Proposition
Let X_1, X_2, ..., X_n be a random sample from a distribution with mean value μ and standard deviation σ. Then
1. E(X̄) = μ_X̄ = μ
2. V(X̄) = σ²_X̄ = σ²/n and σ_X̄ = σ/√n
In addition, with T_0 = X_1 + ... + X_n, E(T_0) = nμ.

The Central Limit Theorem (CLT)
Theorem
Let X_1, X_2, ..., X_n be a random sample from a distribution with mean μ and variance σ². Then if n is sufficiently large, X̄ has approximately a normal distribution with mean μ_X̄ = μ and variance σ²_X̄ = σ²/n, and T_0 also has approximately a normal distribution with mean μ_{T_0} = nμ and variance σ²_{T_0} = nσ².
Remark: The larger the value of n, the better the approximation.
Rule of Thumb: If n > 30, the Central Limit Theorem can be used.

CLT - Example
The CLT can be used to justify the normal approximation to the binomial distribution discussed earlier. We know that a binomial variable X is the number of successes in a binomial experiment consisting of n independent success/failure trials with p = P(S) for any particular trial. Define a new rv X_1 by
      X_1 = 1 if the first trial results in a success, 0 if the first trial results in a failure
and define X_2, X_3, ..., X_n analogously for the other n − 1 trials. Each X_i indicates whether or not there is a success on the corresponding trial.
Because the trials are independent and P(S) is constant from trial to trial, the X_i's are iid (a random sample from a Bernoulli distribution). The CLT then implies that if n is sufficiently large, both the sum and the average of the X_i's have approximately normal distributions.

Exercise (5.4) 55
The number of parking tickets issued in a certain city on any given weekday has a Poisson distribution with parameter μ = 50. What is the approximate probability that
1. between 35 and 70 tickets are given out on a particular day? [Hint: When μ is large, a Poisson rv has approximately a normal distribution.]
2. the total number of tickets given out during a 5-day week is between 225 and 275?

The Distribution of a Linear Combination
Definition
Given a collection of n random variables X_1, ..., X_n and n numerical constants a_1, ..., a_n, the rv
      Y = a_1 X_1 + ... + a_n X_n
is called a linear combination of the X_i's.

Proposition
Let X_1, X_2, ..., X_n have mean values μ_1, ..., μ_n, respectively, and variances σ_1², ..., σ_n², respectively.
1. Whether or not the X_i's are independent,
      E(a_1 X_1 + ... + a_n X_n) = a_1 E(X_1) + ... + a_n E(X_n) = a_1 μ_1 + ... + a_n μ_n
2. If X_1, ..., X_n are independent,
      V(a_1 X_1 + ... + a_n X_n) = a_1² V(X_1) + ... + a_n² V(X_n) = a_1² σ_1² + ... + a_n² σ_n²
3. For any X_1, ..., X_n,
      V(a_1 X_1 + ... + a_n X_n) = Σ_{i=1}^{n} Σ_{j=1}^{n} a_i a_j Cov(X_i, X_j)

Exercise (5.5) 73
Suppose the expected tensile strength of type-A steel is 105 ksi and the standard deviation of tensile strength is 8 ksi.
For type-B steel, suppose the expected tensile strength and standard deviation of tensile strength are 100 ksi and 6 ksi, respectively. Let X̄ = the sample average tensile strength of a random sample of 40 type-A specimens, and let Ȳ = the sample average tensile strength of a random sample of 35 type-B specimens.
1. What is the approximate distribution of X̄? Of Ȳ?
2. What is the approximate distribution of X̄ − Ȳ? Justify your answer.
3. Calculate (approximately) P(−1 ≤ X̄ − Ȳ ≤ 1).
4. Calculate P(X̄ − Ȳ ≥ 10). If you actually observed X̄ − Ȳ ≥ 10, would you doubt that μ_1 − μ_2 = 5?
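A numeric sketch of parts 2-4 of the exercise above, combining the linear-combination rules with the CLT: X̄ is approximately N(105, 8²/40), Ȳ is approximately N(100, 6²/35), and since the samples are independent, D = X̄ − Ȳ is approximately normal with mean 105 − 100 and variance 8²/40 + 6²/35. The helper `norm_cdf` is ours, built from the standard error function.

```python
# Approximate distribution of D = Xbar - Ybar and the two probabilities
# asked for in parts 3 and 4, via the CLT normal approximation.
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu_d = 105.0 - 100.0                    # E(Xbar - Ybar) = mu_1 - mu_2
var_d = 8.0**2 / 40 + 6.0**2 / 35       # independent samples: variances add
sd_d = sqrt(var_d)                      # ≈ 1.621

# Part 3: P(-1 <= D <= 1), standardizing both endpoints.
p_middle = norm_cdf((1 - mu_d) / sd_d) - norm_cdf((-1 - mu_d) / sd_d)
# Part 4: P(D >= 10).
p_tail = 1.0 - norm_cdf((10 - mu_d) / sd_d)
print(round(sd_d, 3), round(p_middle, 4), round(p_tail, 4))
```

The part-4 tail probability comes out around 0.1%: if μ_1 − μ_2 really were 5, observing X̄ − Ȳ ≥ 10 would be very unlikely, which is why such an observation would cast doubt on that hypothesis.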