Stat 330 (Spring 2015): slide set 11
Last update: January 28, 2015

Compound p.m.f.

Motivation: Real problems very seldom concern a single random variable. As soon as more than one variable is involved, it is not sufficient to model the variables only individually - their joint behavior is important.

Two-variable case: Let X and Y be two discrete random variables. The joint probability mass function is defined as

  pX,Y(x, y) := P(X = x ∩ Y = y).

Example: A box contains 5 unmarked PowerPC G4 processors of different speeds:

  2 of 400 MHz, 1 of 450 MHz, 2 of 500 MHz.

Select two processors out of the box (without replacement) and let

  X = speed of the first selected processor,
  Y = speed of the second selected processor.

Example (Cont'd)

Summary: For the sample space Ω we can draw a table of all possible combinations of processors. We distinguish between processors of the same speed by using the subscripts 1 and 2:

                       1st processor
  Ω           400_1  400_2   450   500_1  500_2
  400_1         -      x      x      x      x
  400_2         x      -      x      x      x
  450           x      x      -      x      x
  500_1         x      x      x      -      x
  500_2         x      x      x      x      -

In total we have 5 · 4 = 20 possible combinations. Since we draw at random, each of these combinations is equally likely. This yields the following probability mass function.

Example (Cont'd)

Probabilities pX,Y(x, y):

                        1st processor, x (MHz)
                         400    450    500
  2nd processor   400    0.1    0.1    0.2
  y (MHz)         450    0.1    0.0    0.1
                  500    0.2    0.1    0.1

Question 1: What is the probability for X = Y? (This might be important if we wanted to match the chips to assemble a dual-processor machine.)

Solution:

  P(X = Y) = pX,Y(400, 400) + pX,Y(450, 450) + pX,Y(500, 500) = 0.1 + 0 + 0.1 = 0.2.

Example (Cont'd) and Marginal p.m.f.

Question 2: What is the probability for X > Y?

Solution: This is asking: what is the probability that the first processor has a higher speed than the second?

  P(X > Y) = pX,Y(450, 400) + pX,Y(500, 400) + pX,Y(500, 450) = 0.1 + 0.2 + 0.1 = 0.4.

Marginal p.m.f.: To go from the joint probability mass function to the individual p.m.f.s, sum over the other variable:

  pX(x) = Σ_y pX,Y(x, y)   and   pY(y) = Σ_x pX,Y(x, y).

Example (Cont'd)

For the previous example the marginal probability mass functions are

  x (MHz):  400   450   500          y (MHz):  400   450   500
  pX(x):    0.4   0.2   0.4          pY(y):    0.4   0.2   0.4

Statistics for multivariate case

Recall: Expected value of X is ....?

Expected value for a function of several variables:

  E[h(X, Y)] := Σ_{x,y} h(x, y) pX,Y(x, y).

Continued Example: Let X, Y be as before.

Question 3: What is E[|X − Y|] (the average speed difference)?

Solution: Here we have the situation E[|X − Y|] = E[h(X, Y)], with h(X, Y) = |X − Y|. Using the definition above:

  E[|X − Y|] = Σ_{x,y} |x − y| pX,Y(x, y)
             = |400 − 400| · 0.1 + |400 − 450| · 0.1 + |400 − 500| · 0.2
             + |450 − 400| · 0.1 + |450 − 450| · 0.0 + |450 − 500| · 0.1
             + |500 − 400| · 0.2 + |500 − 450| · 0.1 + |500 − 500| · 0.1
             = 0 + 5 + 20 + 5 + 0 + 5 + 20 + 5 + 0 = 60.
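The brute-force enumeration behind this example is easy to check by machine. Below is a minimal Python sketch (not part of the original slides) that rebuilds the joint p.m.f. table from the 5 · 4 = 20 equally likely ordered draws and recomputes P(X = Y), P(X > Y), the marginals, and E[|X − Y|]; all variable names are illustrative.

```python
from itertools import permutations
from collections import Counter
from fractions import Fraction

# The five chips in the box: two 400 MHz, one 450 MHz, two 500 MHz.
chips = [400, 400, 450, 500, 500]

# All ordered draws of two distinct chips (without replacement):
# 5 * 4 = 20 equally likely outcomes.
draws = list(permutations(range(5), 2))
prob = Fraction(1, len(draws))

# Joint p.m.f.: pXY[(x, y)] = P(X = x and Y = y).
pXY = Counter()
for i, j in draws:
    pXY[(chips[i], chips[j])] += prob

speeds = sorted(set(chips))
print("joint p.m.f. (rows y, columns x):")
for y in speeds:
    print(y, [float(pXY[(x, y)]) for x in speeds])

# Marginal p.m.f.s: sum the joint p.m.f. over the other variable.
pX = {x: sum(pXY[(x, y)] for y in speeds) for x in speeds}
pY = {y: sum(pXY[(x, y)] for x in speeds) for y in speeds}
print("pX:", {x: float(p) for x, p in pX.items()})
print("pY:", {y: float(p) for y, p in pY.items()})

# Question 1 and Question 2.
print("P(X = Y) =", float(sum(p for (x, y), p in pXY.items() if x == y)))  # 0.2
print("P(X > Y) =", float(sum(p for (x, y), p in pXY.items() if x > y)))   # 0.4

# Question 3: E[|X - Y|] = sum over (x, y) of |x - y| * pXY(x, y).
print("E[|X - Y|] =", float(sum(abs(x - y) * p for (x, y), p in pXY.items())))  # 60
```

The printed table and values should agree with the joint p.m.f., the marginals, and the answers 0.2, 0.4, and 60 above.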
Statistics for multivariate (cont'd)

Motivation: For two variables we can measure how "similar" their values are.

Covariance between X and Y: Using the above definition of the expected value of h(X, Y), we define

  Cov(X, Y) = E[(X − E[X])(Y − E[Y])].

Remark: This definition looks very much like the definition of the variance of a single random variable. In fact, if we set Y := X in the above definition, then Cov(X, X) = Var(X).

Correlation: The correlation between two variables X and Y is

  ρ := Cov(X, Y) / sqrt(Var(X) · Var(Y)).

Statistics for multivariate (cont'd)

Facts and inference about correlation:

• ρ is between -1 and 1.
• If ρ = 1 or -1, Y is a linear function of X:
    ρ = 1  →  Y = aX + b with a > 0,
    ρ = −1 →  Y = aX + b with a < 0.
• ρ is a measure of linear association between X and Y: ρ near ±1 indicates a strong linear relationship, ρ near 0 indicates a lack of linear association.

Example (Cont'd)

In our previous example: what is ρ(X, Y) in our box with five chips?

Solution: Use the marginal p.m.f.s to compute

  • E[X] = E[Y] = 450
  • Var[X] = Var[Y] = 2000

The covariance between X and Y is

  Cov(X, Y) = Σ_{x,y} (x − E[X])(y − E[Y]) pX,Y(x, y)
            = (400 − 450)(400 − 450) · 0.1 + (450 − 450)(400 − 450) · 0.1 + ... + (500 − 450)(500 − 450) · 0.1
            = 250 + 0 − 500 + 0 + 0 + 0 − 500 + 0 + 250 = −500.

ρ therefore is

  ρ = Cov(X, Y) / sqrt(Var(X) · Var(Y)) = −500 / 2000 = −0.25,

so ρ indicates a weak negative (linear) association.

Example (Cont'd)

Recall: What is independence of events?

Definition: Two random variables X and Y are independent if their joint probability mass function pX,Y is equal to the product of the marginal p.m.f.s pX · pY.

Note: Random variables are independent if all events of the form {X = x} and {Y = y} are independent. We need

  pX,Y(x, y) = pX(x) · pY(y) for all possible combinations of x and y.

Question: Are X and Y independent?

Example (Cont'd)

Answer: Check one cell of the table:

  pX,Y(450, 450) = 0 ≠ 0.2 · 0.2 = pX(450) · pY(450).

Therefore, X and Y are not independent!

To prove that two variables are independent we need to check pX,Y(x, y) = pX(x) · pY(y) for all possible combinations of x and y. That means we need to find only one contradiction to prove that two variables are not independent!

More theorems on statistics

Theorem: If two random variables X and Y are independent, then

  E[X · Y] = E[X] · E[Y]   and   Var[X + Y] = Var[X] + Var[Y].

Theorem: For two random variables X and Y and three real numbers a, b, c it holds that

  Var[aX + bY + c] = a² Var[X] + b² Var[Y] + 2ab · Cov(X, Y).
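As a self-contained numerical check (again, not from the slides), the sketch below starts from the joint p.m.f. table, recomputes Cov(X, Y) = −500 and ρ = −0.25, exhibits the single cell that breaks independence, and verifies the last theorem for one arbitrary choice of a, b, c. The variable names and the choice a = 2, b = −1, c = 10 are illustrative assumptions.

```python
from math import sqrt

speeds = [400, 450, 500]

# Joint p.m.f. from the table above: pXY[(x, y)] = P(X = x, Y = y).
pXY = {
    (400, 400): 0.1, (450, 400): 0.1, (500, 400): 0.2,
    (400, 450): 0.1, (450, 450): 0.0, (500, 450): 0.1,
    (400, 500): 0.2, (450, 500): 0.1, (500, 500): 0.1,
}

# Marginals and their moments; up to floating-point rounding,
# E[X] = E[Y] = 450 and Var[X] = Var[Y] = 2000.
pX = {x: sum(pXY[(x, y)] for y in speeds) for x in speeds}
pY = {y: sum(pXY[(x, y)] for x in speeds) for y in speeds}
EX = sum(x * p for x, p in pX.items())
EY = sum(y * p for y, p in pY.items())
VarX = sum((x - EX) ** 2 * p for x, p in pX.items())
VarY = sum((y - EY) ** 2 * p for y, p in pY.items())

# Covariance and correlation: Cov(X, Y) = -500, rho = -0.25.
Cov = sum((x - EX) * (y - EY) * p for (x, y), p in pXY.items())
rho = Cov / sqrt(VarX * VarY)
print("E[X] =", EX, " Var[X] =", VarX)
print("Cov(X, Y) =", Cov, " rho =", rho)

# Independence fails at a single cell: pXY(450, 450) = 0 != pX(450) * pY(450) = 0.04.
print("pXY(450, 450) =", pXY[(450, 450)], "vs pX(450)*pY(450) =", pX[450] * pY[450])
independent = all(abs(pXY[(x, y)] - pX[x] * pY[y]) < 1e-12 for x in speeds for y in speeds)
print("X and Y independent?", independent)   # False

# Numerical check of Var[aX + bY + c] = a^2 Var[X] + b^2 Var[Y] + 2ab Cov(X, Y)
# for one arbitrary choice of a, b, c (the identity holds for any choice).
a, b, c = 2.0, -1.0, 10.0
EZ = sum((a * x + b * y + c) * p for (x, y), p in pXY.items())
VarZ_direct = sum((a * x + b * y + c - EZ) ** 2 * p for (x, y), p in pXY.items())
VarZ_formula = a ** 2 * VarX + b ** 2 * VarY + 2 * a * b * Cov
print("Var[aX + bY + c]: direct =", VarZ_direct, " formula =", VarZ_formula)
```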