Chapter 4: Mathematical Expectation

[Diagram from Chapter 1: take a sample from the population; use the sample to make inferences about the population.]

In Chapter 3, we learned about some PDFs and how they are used to quantify the distribution of items in a population. In this chapter, we are going to summarize information in the population relative to the PDFs. Specifically, we are going to look at the "expected value" of a random variable and find it using its PDF. These expected values represent parameters; thus, they summarize items in the population.

4.1: Mean of a Random Variable

Given the use of a PDF for a population, we may want to know what value of X we would expect to obtain on average. This is found through the expected value of X, E(X).

Definition 4.1: Let X be a random variable with PDF f(x). The population mean or expected value of X is

    μ = E(X) = Σ_x x f(x)

when X is discrete, and

    μ = E(X) = ∫_{-∞}^{∞} x f(x) dx

when X is continuous.

Notes:
o In calculus you learned about something similar. It may have been called the "moment about the y-axis" (Faires and Faires, 1988).
o You will often see μ written as μ_X to emphasize that the expected value is for the random variable X. This notation is helpful when there are multiple random variables.
o In the discrete case, you can think of μ as a weighted average. The weight for each x is f(x) = P(X=x).
o μ is a parameter since it summarizes possible values of a population.

Example: Let's Play Plinko! (plinko.xls)

Let X be a random variable denoting the amount won for one chip. Below are 5 different PDFs for the amount won, one for each column the chip is dropped above.

                      Drop Plinko chip in column above:
    x         $100      $500      $1,000    $0        $10,000
    $100      0.1667    0.1339    0.0822    0.0359    0.0176
    $500      0.3571    0.3080    0.2204    0.1287    0.0882
    $1,000    0.2976    0.2991    0.2862    0.2545    0.2353
    $0        0.1429    0.1920    0.2796    0.3713    0.4118
    $10,000   0.0357    0.0670    0.1316    0.2096    0.2471
    μ         $850.00   $1,136.16 $1,720.39 $2,418.26 $2,751.76

For example, when the chip is dropped above the $100 column,

    μ = E(X) = Σ_{x∈T} x f(x)
      = 100(0.1667) + 500(0.3571) + 1000(0.2976) + 0(0.1429) + 10000(0.0357)
      = 850

where T = {100, 500, 1000, 0, 10000}.

This is a good example of why you can think of μ as a weighted average.

Question: What is the optimal place to drop the chip in order to maximize your expected winnings?

Compare what actually occurred in 2001 to what is expected:

                      Drop Plinko chip in column above:
    x             $100    $500    $1,000  $0      $10,000
                  Count   Count   Count   Count   Count
    $100          0       3       2       5       2
    $500          0       3       12      5       6
    $1,000        1       3       12      19      11
    $0            0       2       9       21      8
    $10,000       0       1       4       9       4
    Total chips   1       12      39      59      31
    Average won   $1,000  $1,233  $1,492  $1,898  $1,748

Why are these averages different from what we would expect?
o The sample size is small. As the sample size gets bigger, we would expect the sample average to approach the population mean.
o There is variability from one sample to the next. This implies that we would not expect the same observed values for the years 2002, 2003, … when the game is played. More will be discussed about variability later.
o Possibly one of the underlying assumptions behind the calculation of the probabilities is incorrect. While we did not discuss these assumptions in class, the main one is that the probability of going to the left or right ONE slot is 0.5 as a Plinko chip hits a peg on the board. If you watch the Plinko game, you will notice that chips can often go more than ONE slot to the left or right after hitting a peg.
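As a quick check of the μ = $850 calculation, here is a minimal Maple sketch (this is not part of plinko.xls; the names x, p, and mu are mine, and because the f(x) values above are rounded to four decimals, the result is 849.82 rather than exactly 850):

> x := [100, 500, 1000, 0, 10000]:
> p := [0.1667, 0.3571, 0.2976, 0.1429, 0.0357]:  # f(x) for the $100 drop column
> mu := add(x[i]*p[i], i = 1..5);
                              mu := 849.82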
Example: Tire life (chapter4_calc_rev.mws)

The number of miles an automobile tire lasts before it reaches a critical point in tread wear can be represented by a PDF. Let X = the number of miles (in thousands) an automobile is driven until it reaches the critical tread wear point for one tire. Suppose the PDF for X is

    f(x) = (1/30)e^{-x/30}  for x > 0
    f(x) = 0                for x ≤ 0

[Plot of the tire life PDF: f(x) vs. x for 0 ≤ x ≤ 250.]

Find the expected number of miles a tire lasts until it reaches the critical tread wear point. In other words, this will be the average lifetime for all tires.

    μ = E(X) = ∫_{-∞}^{∞} x f(x) dx = ∫_0^∞ x (1/30)e^{-x/30} dx

Use integration by parts. Let u = x and dv = (1/30)e^{-x/30} dx, so that du = dx and v = -e^{-x/30}. Then

    E(X) = uv|_0^∞ - ∫_0^∞ v du = -x e^{-x/30}|_0^∞ + ∫_0^∞ e^{-x/30} dx

Note that by L'Hôpital's rule,

    lim_{x→∞} x/e^{x/30} = lim_{x→∞} 1/[(1/30)e^{x/30}] = 0

Then

    E(X) = (0 - 0) - 30e^{-x/30}|_0^∞ = -30(0 - 1) = 30

Thus, one would expect the tire to last 30,000 miles on average.

In Maple,

> assume(x>0);
> f := 1/30*exp(-x/30);
                        f := 1/30 exp(-1/30 x~)
> plot(f, x = 0..150, title = "Tire life PDF");
> mu := int(x*f, x = 0..infinity);
                              mu := 30

On a TI-89: [calculator screenshots not shown]

Sometimes, functions of random variables are of interest when finding the expected value. Section 4.2 will discuss one very important example of when this is of interest. In general, here is how the expected value can be found:

Theorem 4.1: Let X be a random variable with PDF f(x). The mean or expected value of the random variable g(X) is

    μ_{g(X)} = E[g(X)] = Σ_x g(x) f(x)

when X is discrete, and

    μ_{g(X)} = E[g(X)] = ∫_{-∞}^{∞} g(x) f(x) dx

when X is continuous.

Be very careful here! The function g(X) is NOT a PDF. It is a function of a random variable. Remember that Chapter 3 often used g(x) to denote a marginal PDF.

Example: Tire life (chapter4_calc_rev.mws)

Find E[|X-30|]. What does this mean in terms of the problem?

    E|X-30| = ∫_0^∞ |x-30| (1/30)e^{-x/30} dx
            = ∫_0^30 (30-x)(1/30)e^{-x/30} dx + ∫_30^∞ (x-30)(1/30)e^{-x/30} dx
            = ∫_0^30 e^{-x/30} dx - (1/30)∫_0^30 x e^{-x/30} dx
              + (1/30)∫_30^∞ x e^{-x/30} dx - ∫_30^∞ e^{-x/30} dx

Using integration by parts on the terms involving x, (1/30)∫ x e^{-x/30} dx = -x e^{-x/30} - 30e^{-x/30} + C. Evaluating the four terms gives

    E|X-30| = (30 - 30e^{-1}) - (30 - 60e^{-1}) + 60e^{-1} - 30e^{-1}
            = 60e^{-1} ≈ 22.0728

In Maple,

> int(abs(x-30)*f, x = 0..infinity);
                              60 exp(-1)
> evalf(int(abs(x-30)*f, x = 0..infinity));
                             22.07276647

Find E(X²). What does this mean in terms of the problem?

    E(X²) = ∫_{-∞}^{∞} x² f(x) dx = ∫_0^∞ x² (1/30)e^{-x/30} dx

Use integration by parts twice to obtain E(X²) = 1800. Notice that E(X²) = 1800 ≠ [E(X)]² = 900.

In Maple,

> int(x^2*f, x = 0..infinity);
                                 1800

Corollary: E(aX+b) = aE(X) + b for constants a and b.

pf: E(aX+b) = ∫_{-∞}^{∞} (ax+b)f(x) dx = a∫_{-∞}^{∞} x f(x) dx + b∫_{-∞}^{∞} f(x) dx = aE(X) + b(1) = aE(X) + b

Example: Tire life

While this example is not necessarily realistic, we can still do it to illustrate the corollary.

Find E[2X+1] = 2E[X] + 1 = 2(30) + 1 = 61.
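As a check, the same answer comes directly from Theorem 4.1 with g(x) = 2x + 1. A minimal Maple sketch, reusing the f defined above:

> int((2*x + 1)*f, x = 0..infinity);   # E(2X+1) via Theorem 4.1 directly
                                  61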
Theorem 4.2: Let X and Y be random variables with joint PDF f(x,y). The mean or expected value of the random variable g(X,Y) is

    μ_{g(X,Y)} = E[g(X,Y)] = Σ_x Σ_y g(x,y) f(x,y)

when X and Y are discrete, and

    μ_{g(X,Y)} = E[g(X,Y)] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} g(x,y) f(x,y) dx dy

when X and Y are continuous.

See the examples in Section 4.1 on your own.

There is one particular case which will be important in the next section: when g(X,Y) = XY. In the continuous case, notice the difference between E(XY) and E(X)E(Y):

    E(XY) = ∫∫ xy f(x,y) dx dy

    E(X)E(Y) = [∫∫ x f(x,y) dx dy][∫∫ y f(x,y) dx dy] = [∫ x g(x) dx][∫ y h(y) dy]

where g(x) and h(y) are the marginal PDFs of X and Y. When are these two quantities equal? Notice how E(X) and E(Y) are found.

Example: A sample from the tire life PDF (example_sample_tire.xls in Chapter 3)

Below is a sample that comes from a population characterized by the PDF

    f(x) = (1/30)e^{-x/30}  for x > 0, and 0 for x ≤ 0

for the tire life example.

    Tire Number    x
    1              8.7826
    2              2.8102
    3              71.202
    4              16.55
    5              23.581
    6              2.1657
    7              36.784
    8              14.432
    9              27.69
    10             10.743
    ⋮              ⋮
    999            34.008
    1000           52.044

Questions:
o How could you estimate E(X) using this sample, and what would you expect it to be approximately?
o How could you estimate E(X²) using this sample, and what would you expect it to be approximately?

4.2: Variance and Covariance

In addition to knowing what we would expect X to be on average, we may want to know something about the "variability" of X from μ. For example, in the tire tread wear example, we want to numerically quantify the average deviation of X from the mean. We already did this in terms of E[|X-30|] = E[|X-μ|]. There is another, more often used way of doing this in terms of the squared deviation, E[(X-μ)²].

Definition 4.3: Let X be a random variable with PDF f(x) and mean μ. The variance of X is

    σ² = E[(X-μ)²] = Σ_x (x-μ)² f(x)

when X is discrete, and

    σ² = E[(X-μ)²] = ∫_{-∞}^{∞} (x-μ)² f(x) dx

when X is continuous. The positive square root of the variance, σ, is called the standard deviation of X.

Notes:
o The variance is the expected squared deviation of X from μ.
o This is an example of using Theorem 4.1 with g(x) = (x-μ)².
o E[(X-μ)²] is equivalently written as E{[X-E(X)]²}, so that there is an expectation within an expectation. Once E(X) is found, it is a constant value, as was shown in the last section.
o Common notation that is often used here includes Var(X) = σ²_X = σ² = E[(X-μ)²]. The Var( ) part is just a nice way to replace the E[ ] part. The subscript X on σ²_X is often helpful when there is more than one random variable under consideration; thus, σ²_X denotes the variance of X.
o The reason for considering the positive square root of σ² is so that we can use the original units of X.
o σ² ≥ 0 and σ ≥ 0 for all possible values of X.

Theorem 4.2: The variance of a random variable X is

    σ² = Var(X) = E(X²) – μ² = E(X²) – [E(X)]²

pf: E[(X-μ)²] = E(X² - 2μX + μ²) = E(X²) - 2μE(X) + E(μ²) = E(X²) - 2μ² + μ² = E(X²) - μ²

Example: Let's Play Plinko! (plinko.xls)

Let X be a random variable denoting the amount won for one chip. Find the variance and standard deviation of X.

                      Drop Plinko chip in column above:
    x         $100       $500       $1,000     $0         $10,000
    $100      0.1667     0.1339     0.0822     0.0359     0.0176
    $500      0.3571     0.3080     0.2204     0.1287     0.0882
    $1,000    0.2976     0.2991     0.2862     0.2545     0.2353
    $0        0.1429     0.1920     0.2796     0.3713     0.4118
    $10,000   0.0357     0.0670     0.1316     0.2096     0.2471
    μ         $850.00    $1,136.16  $1,720.39  $2,418.26  $2,751.76
    σ         $1,799.31  $2,404.79  $3,246.57  $3,923.92  $4,170.28
    μ - 2σ   -$2,748.61 -$3,673.42 -$4,772.75 -$5,429.57 -$5,588.79
    μ + 2σ    $4,448.61  $5,945.74  $8,213.54  $10,266.10 $11,092.32
    μ - 3σ   -$4,547.92 -$6,078.21 -$8,019.33 -$9,353.49 -$9,759.06
    μ + 3σ    $6,247.92  $8,350.54  $11,460.12 $14,190.01 $15,262.59

For example, when dropping the chip above the $100 reservoir (#1 or #9 from p. 3.21 of the Section 3.1-3.3 notes),

    σ² = E[(X-μ)²] = Σ_{x∈T} (x-μ)² f(x)
       = (100-850)²(0.1667) + (500-850)²(0.3571) + (1000-850)²(0.2976)
         + (0-850)²(0.1429) + (10000-850)²(0.0357)
       = 3,237,500 dollars²

where T = {100, 500, 1000, 0, 10000}. To put this into units of just dollars, we take the positive square root to find σ = $1,799.31.
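Here is a minimal Maple sketch of the σ² and σ calculations for the $100 drop column (the names x, p, mu, and sigma2 are mine, not from plinko.xls; because the f(x) values are rounded to four decimals, the output differs slightly from the exact table values):

> x := [100, 500, 1000, 0, 10000]:
> p := [0.1667, 0.3571, 0.2976, 0.1429, 0.0357]:
> mu := add(x[i]*p[i], i = 1..5):               # about 850
> sigma2 := add((x[i] - mu)^2*p[i], i = 1..5);  # variance, about 3.24 million dollars^2
> sigma := sqrt(sigma2);                        # standard deviation, about $1,799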
What does this value really mean?

"Rule of thumb" for the number of standard deviations within which all possible observations (data) lie from the mean: 2 or 3. This is an ad hoc interpretation of Chebyshev's Rule (Section 4.4) and the Empirical Rule (not in our book). This rule of thumb is discussed now to help you understand standard deviations.

When someone drops one chip from the top of the Plinko board, I would generally expect the amount of money they will win to be between μ - 2σ and μ + 2σ. A more conservative expected range would be μ - 3σ to μ + 3σ.

Examine the application of the rule of thumb in the table above and how it makes sense when the chip is dropped above the $100 spot.

How are these calculations done in Excel? See the Plinko Probabilities worksheet of plinko.xls. Note that negative values are represented by "( )" instead of a negative sign.

Example: Time it takes to get to class

What time should you leave your home for class in order to make sure that you are never (or rarely) late?

Example: Tire life (chapter4_calc_rev.mws)

The number of miles an automobile tire lasts before it reaches a critical point in tread wear can be represented by a PDF. Let X = the number of miles (in thousands) an automobile is driven until it reaches the critical tread wear point for one tire. Suppose the PDF for X is

    f(x) = (1/30)e^{-x/30}  for x > 0, and 0 for x ≤ 0

[Plot of the tire life PDF: f(x) vs. x for 0 ≤ x ≤ 250.]

Find the variance and standard deviation of the number of miles driven until the tire reaches its critical wear point.

    σ² = Var(X) = E[(X-μ)²] = ∫_{-∞}^{∞} (x-μ)² f(x) dx = ∫_0^∞ (x-30)² (1/30)e^{-x/30} dx

where μ = 30. Instead of doing this integral, it is often a little easier to work with σ² = E(X²) - μ². Previously, we found E(X²) = 1800. Thus, σ² = 1800 – 30² = 900!

In Maple,

> int((x-mu)^2*f, x = 0..infinity);
                                  900
> int(x^2*f, x = 0..infinity) - mu^2;
                                  900

On a TI-89: [calculator screenshots not shown]

Notice that σ = √900 = 30. Putting this in terms of thousands of miles, the standard deviation is 30,000 miles, which is the same as the mean here. Generally for random variables, the mean and the standard deviation will NOT be the same! This just happens to be a special PDF called the "Exponential PDF" where this will always occur (see p. 166 of the book).

Using the rule of thumb,

    μ - 2σ = 30 - 2(30) = -30  and  μ + 2σ = 30 + 2(30) = 90

and

    μ - 3σ = 30 - 3(30) = -60  and  μ + 3σ = 30 + 3(30) = 120

Thus, one would expect all of the tires to last between -30 and 90 (thousand miles). A more conservative range would be -60 to 120. Of course, the negative values do not make sense for this problem. However, examine where the upper bound values fall on the PDF plot. Make sure you understand why these upper values make sense in terms of the plot!

Compare σ = 30 to E(|X-μ|) = 22.07 found earlier.

Suppose the PDF was changed to

    f(x) = (1/15)e^{-x/15}  for x > 0, and 0 for x ≤ 0

In this case, one can show that E(X) = 15, Var(X) = σ² = 225, and σ = 15.

[Plot of the μ = 15 PDF, followed by the same plot with the original PDF of f(x) = (1/30)e^{-x/30} for x > 0, where the y- and x-axis scales are drawn exactly the same.]

Notice the μ = 30 PDF is more spread out than the μ = 15 PDF. This is because the standard deviation (and variance) is larger for the μ = 30 PDF!
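A minimal Maple sketch to reproduce the side-by-side comparison on a single set of axes (the names f30 and f15 are mine):

> f30 := exp(-x/30)/30:  f15 := exp(-x/15)/15:
> plot([f30, f15], x = 0..225, legend = ["mu = 30", "mu = 15"],
>      title = "Tire life PDFs on the same scale");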
Question: Given the choice between a tire with μ = σ = 30 and a tire with μ = σ = 15, which would you choose?

Other common g(X) functions used with E[g(X)]:

o The skewness of a random variable X is defined to be

    E[(X-μ)³]/E[(X-μ)²]^{3/2} = E[(X-μ)³]/σ³

This quantity measures the lack of symmetry in a PDF. For example, the tire life PDF shown earlier is very skewed (non-symmetric). The PDF shown on p. 3.30 of the Section 3.1-3.3 notes is symmetric. Note that E[(X-μ)²]^{3/2} ≠ E[(X-μ)³] ≠ {E[(X-μ)]}³.

o The kurtosis of a random variable X is defined to be

    E[(X-μ)⁴]/E[(X-μ)²]² = E[(X-μ)⁴]/σ⁴

This quantity measures the amount of peakedness or flatness of a PDF.

[Plot of two PDFs: the red PDF has a higher peak than the blue PDF, which is flatter.]
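As an illustration not in the original notes, the skewness and kurtosis of the tire life PDF can be computed straight from these definitions in Maple; for this exponential PDF they work out to 2 and 9:

> assume(x > 0):  f := exp(-x/30)/30:
> mu := int(x*f, x = 0..infinity):
> sigma := sqrt(int((x - mu)^2*f, x = 0..infinity)):
> int((x - mu)^3*f, x = 0..infinity)/sigma^3;   # skewness
                                   2
> int((x - mu)^4*f, x = 0..infinity)/sigma^4;   # kurtosis
                                   9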
Theorem 4.3: Let X be a random variable with PDF f(x). The variance of the random variable g(X) is

    σ²_{g(X)} = Var[g(X)] = E{[g(X) - μ_{g(X)}]²} = Σ_x [g(x) - μ_{g(X)}]² f(x)

when X is discrete, and

    σ²_{g(X)} = Var[g(X)] = E{[g(X) - μ_{g(X)}]²} = ∫_{-∞}^{∞} [g(x) - μ_{g(X)}]² f(x) dx

when X is continuous.

Examine the examples in Section 4.2 on your own.

Covariance and correlation

Suppose there are two random variables, X and Y. It is often of interest to determine if they are independent (remember Chapter 2). If they are dependent, then we would like to quantify the amount and strength of dependence. Also, we would be interested in the type (positive or negative) of dependence.

Positive dependence means that "large" values of X tend to occur with "large" values of Y, and "small" values of X tend to occur with "small" values of Y.

[Diagram: if we could plot all values in the population, the points would trend upward from (small X, small Y) to (large X, large Y).]

Negative dependence means that "large" values of X tend to occur with "small" values of Y, and "small" values of X tend to occur with "large" values of Y.

[Diagram: the points would trend downward from (small X, large Y) to (large X, small Y).]

Thus, positive dependence means that as values of X increase, Y tends to increase as well (they move in the same direction). Negative dependence means that as values of X increase, Y tends to decrease (they move in opposite directions).

Example: High school and college GPA

Suppose I had a joint PDF which quantified the possible values high school and college GPAs can take on. Let X = the student's high school GPA and Y = the student's college GPA.

Questions:
o Would you expect there to be a relationship between X and Y? In other words, are X and Y independent or dependent?
o If they are dependent, would you expect a strong or weak amount of dependence?
o If they are dependent, would you expect positive or negative dependence? What would positive and negative dependence mean in terms of the problem?

The numerical measure of the dependence between two random variables is called the "covariance". It is denoted symbolically by σ_XY when X and Y denote the two random variables. Below are a few notes about it:
o σ_XY = 0 when there is independence.
o σ_XY > 0 when there is positive dependence.
o σ_XY < 0 when there is negative dependence.
o The further away σ_XY is from 0, the stronger the dependence.

Definition 4.4: Let X and Y be random variables with joint PDF f(x,y). Suppose E(X) = μ_X and E(Y) = μ_Y. The covariance of X and Y is

    σ_XY = E[(X-μ_X)(Y-μ_Y)] = Σ_x Σ_y (x-μ_X)(y-μ_Y) f(x,y)

when X and Y are discrete, and

    σ_XY = E[(X-μ_X)(Y-μ_Y)] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} (x-μ_X)(y-μ_Y) f(x,y) dx dy

when X and Y are continuous.

Common notation that is often used for the covariance is Cov(X,Y) = σ_XY. Below is a simplifying formula similar to the one used for the variance.

Theorem 4.4: The covariance of two random variables X and Y with means μ_X and μ_Y, respectively, is given by

    σ_XY = E(XY) – μ_X μ_Y = E(XY) – E(X)E(Y)

pf: E[(X-μ_X)(Y-μ_Y)] = E[XY - μ_Y X - μ_X Y + μ_X μ_Y] = E[XY] - μ_Y E[X] - μ_X E[Y] + μ_X μ_Y = E[XY] - μ_Y μ_X - μ_X μ_Y + μ_X μ_Y = E[XY] - μ_X μ_Y

Note that E(XY) ≠ μ_X μ_Y = E(X)E(Y) except under a particular condition to be discussed in the next section.

There is one problem with the covariance. Its measure of the strength of dependence (how far it is from 0) is not bounded above or below. The correlation coefficient, denoted by ρ_XY, fixes this problem. It is the covariance divided by the standard deviations of X and Y in order to provide a numerical value that is always between -1 and 1. Below are a few notes about it:
o -1 ≤ ρ_XY ≤ 1
o ρ_XY = 0 when there is independence.
o ρ_XY > 0 when there is positive dependence.
o ρ_XY < 0 when there is negative dependence.
o The closer ρ_XY is to 1, the stronger the positive dependence.
o The closer ρ_XY is to -1, the stronger the negative dependence.
o When X = Y, ρ_XY = 1. More generally, when X = a + bY for constants a and b > 0, ρ_XY = 1.
o When X = -Y, ρ_XY = -1. More generally, when X = a + bY for constants a and b < 0, ρ_XY = -1.

Definition 4.5: Let X and Y be random variables with covariance σ_XY and standard deviations σ_X and σ_Y, respectively. The correlation coefficient for X and Y is

    ρ_XY = σ_XY/(σ_X σ_Y) = Cov(X,Y)/√[Var(X)Var(Y)]

Sometimes one will see this denoted as Corr(X,Y).

Example: Grades for two courses (chapter4_calc_rev.mws)

Let X be a random variable denoting the grade in a math course and Y be a random variable denoting the grade in a statistics course. Suppose the joint PDF is

    f(x,y) = x² + 2y²  for 0<x<1 and 0<y<1, and 0 otherwise

Find the covariance and correlation coefficient to numerically measure the dependence between X and Y.

To find the covariance, I am going to use the σ_XY = E(XY) - μ_X μ_Y expression.

Find μ_X:

    E(X) = ∫_0^1 ∫_0^1 x f(x,y) dy dx = ∫_0^1 ∫_0^1 x(x² + 2y²) dy dx
         = ∫_0^1 ∫_0^1 (x³ + 2xy²) dy dx = ∫_0^1 [x³y + (2/3)xy³]_{y=0}^{1} dx
         = ∫_0^1 (x³ + (2/3)x) dx = [x⁴/4 + x²/3]_0^1 = 1/4 + 1/3 = 7/12

This could have been found a little easier using results from Section 3.4. In that section, we found that g(x) = x² + 2/3 is the marginal PDF for X. Thus,

    E(X) = ∫_0^1 x g(x) dx = ∫_0^1 x(x² + 2/3) dx = ∫_0^1 (x³ + (2/3)x) dx
         = [x⁴/4 + x²/3]_0^1 = 7/12

Find μ_Y, using the marginal PDF h(y) = 1/3 + 2y²:

    E(Y) = ∫_0^1 y h(y) dy = ∫_0^1 y(1/3 + 2y²) dy = ∫_0^1 (y/3 + 2y³) dy
         = [y²/6 + y⁴/2]_0^1 = 1/6 + 1/2 = 2/3

Find E(XY). Since both X and Y are involved in the expectation, the joint PDF must be used:

    E(XY) = ∫_0^1 ∫_0^1 xy f(x,y) dy dx = ∫_0^1 ∫_0^1 (x³y + 2xy³) dy dx
          = ∫_0^1 [x³y²/2 + xy⁴/2]_{y=0}^{1} dx = ∫_0^1 (x³/2 + x/2) dx
          = [x⁴/8 + x²/4]_0^1 = 1/8 + 1/4 = 3/8

Then

    σ_XY = E(XY) - μ_X μ_Y = 3/8 – (7/12)(2/3) = 3/8 – 7/18 = 27/72 – 28/72 = -1/72 ≈ -0.0139
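The marginal PDFs can also be obtained in Maple and then used to get E(X) and E(Y) with single integrals — a minimal sketch (the names g and h match the marginal notation above; run in a fresh session so x carries no assumptions):

> fxy := x^2 + 2*y^2:
> g := int(fxy, y = 0..1);    # marginal PDF of X: x^2 + 2/3
> h := int(fxy, x = 0..1);    # marginal PDF of Y: 1/3 + 2*y^2
> int(x*g, x = 0..1), int(y*h, y = 0..1);
                               7/12, 2/3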
To use www.integrals.com in order to find E(XY), one needs to split the problem into two parts. First, one needs to find the inner integral. Since I had decided previously to integrate first with respect to y, I will do it again. However, I need to integrate with respect to "x" on integrals.com. So suppose

    ∫_0^1 ∫_0^1 xy(x² + 2y²) dy dx  is rewritten as  ∫_0^1 ∫_0^1 ax(a² + 2x²) dx da

Thus, "a" represents "x" and "x" represents "y" in the original integral. To find the inner integral,

    ∫ ax(a² + 2x²) dx = a³x²/2 + ax⁴/2

Then

    [a³x²/2 + ax⁴/2]_{x=0}^{1} = a³/2 + a/2

Again, to use integrals.com, I need to integrate with respect to x. Thus, replace a in the above expression with x and enter the result into the appropriate box on the web page. Then

    [x⁴/8 + x²/4]_0^1 = 3/8

and the same result is obtained as before. This idea of splitting the integral into two parts can also be used with calculators that do not allow for multiple integration. On a TI-89: [calculator screenshots not shown]

All of these calculations are much easier in Maple!

> fxy := x^2 + 2*y^2;
                          fxy := x^2 + 2 y^2
> int(int(fxy, x = 0..1), y = 0..1);
                                  1
> E(XY) := int(int(x*y*fxy, x = 0..1), y = 0..1);
                             E(XY) := 3/8
> E(X) := int(int(x*fxy, x = 0..1), y = 0..1);
                             E(X) := 7/12
> E(Y) := int(int(y*fxy, x = 0..1), y = 0..1);
                             E(Y) := 2/3
> Cov(X,Y) := E(XY) - E(X)*E(Y);
                           Cov(X, Y) := -1/72
> Cov(X,Y) := int(int((x-E(X))*(y-E(Y))*fxy, x = 0..1), y = 0..1);
                           Cov(X, Y) := -1/72
> evalf(Cov(X,Y));
                            -.01388888889
> evalf(Cov(X,Y), 4);
                               -.01389

To find the correlation coefficient, I need to find the variances of X and Y in addition to the covariance between X and Y. To do this, I am going to use the shortcut formulas Var(X) = σ²_X = E(X²) - μ²_X and Var(Y) = σ²_Y = E(Y²) - μ²_Y. Since the individual means have already been found, I just need to find E(X²) and E(Y²).

Find E(X²):

    E(X²) = ∫_0^1 x² g(x) dx = ∫_0^1 x²(x² + 2/3) dx = ∫_0^1 (x⁴ + (2/3)x²) dx
          = [x⁵/5 + 2x³/9]_0^1 = 19/45

Find Var(X):

    Var(X) = E(X²) - μ²_X = 19/45 – (7/12)² = 59/720 ≈ 0.0819

Find E(Y²):

    E(Y²) = ∫_0^1 y² h(y) dy = ∫_0^1 y²(1/3 + 2y²) dy = ∫_0^1 (y²/3 + 2y⁴) dy
          = [y³/9 + 2y⁵/5]_0^1 = 23/45

Find Var(Y):

    Var(Y) = E(Y²) - μ²_Y = 23/45 – (2/3)² = 3/45 = 1/15 ≈ 0.0667

Then

    ρ_XY = σ_XY/(σ_X σ_Y) = (-1/72)/√(0.0819 × 1/15) ≈ -0.1879

From Maple:

> Var(X) := int(int((x-E(X))^2*fxy, x = 0..1), y = 0..1);
                            Var(X) := 59/720
> Var(Y) := int(int((y-E(Y))^2*fxy, x = 0..1), y = 0..1);
                             Var(Y) := 1/15
> Corr(X,Y) := Cov(X,Y)/sqrt(Var(X)*Var(Y));
                       Corr(X, Y) := -(5/354)·√177
> evalf(Corr(X,Y), 4);
                                -.1878

Describe the relationship between the math course grade (X) and the stat course grade (Y):
o Are math and stat course grades independent or dependent? Explain.
o If they are dependent, is there a strong or weak amount of dependence?
o If they are dependent, is there a positive or negative relationship between math and stat course grades?

On an exam, I may just ask you to describe the relationship between two random variables instead of prompting you with the above questions. In your explanation, you should still address these types of questions!

What we have developed is a way to understand the relationship between two different random variables. Where would this be useful? Suppose you want to study the relationships between:
o Humidity and temperature
o ACT and SAT scores
o White and red blood cell counts
o Winning percentage and the number of yards per game on offense for NFL teams
o …

4.3: Means and Variances of Linear Combinations of Random Variables

This section discusses some items we have already covered and also some new items. The main purpose here is for you to get comfortable with finding expectations and variances of functions of random variables.

Theorem 4.5: If a and b are constants, then E(aX+b) = aE(X) + b.

See the corollary in Section 4.1 for where this was first introduced. Note what happens if a and/or b is equal to 0.
Theorem 4.6: The expected value of the sum or difference of two or more functions of a random variable X is the sum or difference of the expected values of the functions. That is,

    E[g(X) ± h(X)] = E[g(X)] ± E[h(X)]

For example, let g(X) = aX² + bX + c and h(X) = dX + f for some constants a, b, c, d, and f. Then

    E[g(X) - h(X)] = E(aX² + bX + c - dX - f)
                   = aE(X²) + bE(X) + E(c) - dE(X) - E(f)
                   = aE(X²) + bE(X) + c - dE(X) - f

The main point to remember is that you can distribute an expectation through a linear combination of random variables. Note that you can NOT distribute an expectation through a product of random variables except under special conditions. For example, E(X²) ≠ E(X)E(X) usually.

Theorem 4.7: The expected value of the sum or difference of two or more functions of the random variables X and Y is the sum or difference of the expected values of the functions. That is,

    E[g(X,Y) ± h(X,Y)] = E[g(X,Y)] ± E[h(X,Y)]

For example, let g(X,Y) = aX² + bY + c and h(X,Y) = dX + f. Then

    E[g(X,Y) - h(X,Y)] = E(aX² + bY + c - dX - f)
                       = aE(X²) + bE(Y) + E(c) - dE(X) - E(f)
                       = aE(X²) + bE(Y) + c - dE(X) - f

Again, the main point to remember is that you can distribute an expectation through a linear combination of random variables, but you can not distribute an expectation through a product of random variables except under special conditions. For example, E(XY) ≠ E(X)E(Y) in general, as previously discussed.

Theorem 4.8: Let X and Y be two independent random variables. Then E(XY) = E(X)E(Y).

We discussed this case earlier. Notice that

    E(XY) = ∫∫ xy f(x,y) dx dy

Under independence, f(x,y) = g(x)h(y), so this simplifies to

    E(XY) = ∫∫ xy g(x)h(y) dx dy = ∫ y h(y) [∫ x g(x) dx] dy = [∫ x g(x) dx][∫ y h(y) dy]

since ∫ x g(x) dx has no values of y in it. Also, remember that

    E(X)E(Y) = [∫∫ x f(x,y) dx dy][∫∫ y f(x,y) dx dy] = [∫ x g(x) dx][∫ y h(y) dy]

Notice what happens to the covariance and correlation coefficient when X and Y are independent!

    Cov(X,Y) = σ_XY = E(XY) – E(X)E(Y) = 0 under independence!

    ρ_XY = σ_XY/(σ_X σ_Y) = 0/(σ_X σ_Y) = 0 under independence!

Often, the variance of linear combinations is important.

Theorem 4.9: If a and b are constants, then σ²_{aX+b} = a²σ²_X = a²σ². Stated differently, Var(aX + b) = a²Var(X).

pf: Var(aX+b) = E{[aX + b - E(aX+b)]²} = E{[aX + b - aμ - b]²} = E[a²(X-μ)²] = a²σ²

Theorem 4.10: If a and b are constants and X and Y are random variables with joint PDF f(x,y), then

    σ²_{aX+bY} = a²σ²_X + b²σ²_Y + 2abσ_XY

Stated differently, Var(aX + bY) = a²Var(X) + b²Var(Y) + 2abCov(X,Y).

pf: Var(aX+bY) = E{[aX + bY - aμ_X - bμ_Y]²} = E{[a(X-μ_X) + b(Y-μ_Y)]²} = a²E[(X-μ_X)²] + b²E[(Y-μ_Y)²] + 2abE[(X-μ_X)(Y-μ_Y)] = a²σ²_X + b²σ²_Y + 2abσ_XY

Corollary 1: If X and Y are independent, then Var(aX+bY) = a²Var(X) + b²Var(Y), since Cov(X,Y) = σ_XY = 0.

This corollary will help us in Section 10.8 when deriving the test statistic used in a hypothesis test for two means.

Example: Grades for two courses (chapter4_calc_rev.mws)

Let X be a random variable denoting the grade in a math course and Y be a random variable denoting the grade in a statistics course. Suppose the joint PDF is

    f(x,y) = x² + 2y²  for 0<x<1 and 0<y<1, and 0 otherwise

This is not necessarily a realistic example, but find Var(X + 2Y):

    Var(X+2Y) = Var(X) + 2²Var(Y) + 2(1)(2)Cov(X,Y)
              = 0.0819 + 4(0.0667) + 4(-0.0139) = 0.2931

In Maple,

> Var(X) + 2^2*Var(Y) + 2*1*2*Cov(X,Y);
                                211/720
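The same answer can be checked by applying the definition of variance directly to the single random variable X + 2Y — a minimal Maple sketch (the name m is mine):

> fxy := x^2 + 2*y^2:
> m := int(int((x + 2*y)*fxy, x = 0..1), y = 0..1):    # E(X + 2Y) = 23/12
> int(int((x + 2*y - m)^2*fxy, x = 0..1), y = 0..1);   # Var(X + 2Y)
                                211/720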
Theorem: If X₁, X₂, …, X_n are random variables with joint PDF f(x₁, x₂, …, x_n) and a₁, a₂, …, a_n are constants, then

    Var(a₁X₁ + a₂X₂ + … + a_nX_n) = Σ_{i=1}^{n} a_i² Var(X_i) + 2 ΣΣ_{i<j} a_i a_j Cov(X_i, X_j)
                                  = Σ_{i=1}^{n} a_i² Var(X_i) + ΣΣ_{i≠j} a_i a_j Cov(X_i, X_j)

Corollary: If X₁, X₂, …, X_n are independent random variables, then

    Var(a₁X₁ + a₂X₂ + … + a_nX_n) = Σ_{i=1}^{n} a_i² Var(X_i)

4.4: Chebyshev's Theorem

From Section 4.2: "Rule of thumb" for the number of standard deviations within which all possible observations (data) lie from the mean: 2 or 3. This is an ad hoc interpretation of Chebyshev's Rule (Section 4.4) and the Empirical Rule (not in our book).

The purpose of this section is to give a more precise definition and explanation of this powerful rule.

Theorem 4.11 (Chebyshev): The probability that any random variable X will assume a value within k standard deviations of the mean is at least 1 - 1/k². That is,

    P(μ - kσ < X < μ + kσ) ≥ 1 - 1/k²

Notes:
o There is no mention of the PDF for X here!!! Thus, this holds for ANY PDF!!!
o Below are some values of k and lower bounds for the probability which are often of interest.

    k    1 - 1/k²
    1    0
    2    0.75
    3    0.8889
    4    0.9375
    5    0.96

o While the probabilities for 2 or 3 may be lower than you would expect based on the rule of thumb, remember that the above table gives lower bounds only. For a given PDF, the probabilities could be higher!
o See p. 111 of the book for the proof.

Example: Tire life

The number of miles an automobile tire lasts before it reaches a critical point in tread wear can be represented by a probability distribution. Let X = the number of miles (in thousands) an automobile is driven until it reaches the critical tread wear point for one tire. Suppose the probability distribution for X is

    f(x) = (1/30)e^{-x/30}  for x > 0, and 0 for x ≤ 0

[Plot of the tire life PDF: f(x) vs. x for 0 ≤ x ≤ 250.]

Find P(μ-kσ ≤ X ≤ μ+kσ). Note that we could insert the values of μ and σ into the probability now; I will wait until the end to do that. Since μ-kσ could be less than 0 (outside the possible values of X), we should use the following expressions for the probability:

If μ-kσ > 0, then P(μ-kσ ≤ X ≤ μ+kσ) = -e^{-(μ+kσ)/30} + e^{-(μ-kσ)/30}

If μ-kσ < 0, then P(0 ≤ X ≤ μ+kσ) = -e^{-(μ+kσ)/30} + e^{-0/30} = -e^{-(μ+kσ)/30} + 1

Earlier, we found that μ = 30 and σ = 30. Then P(30-30k ≤ X ≤ 30+30k) can be found for various values of k. Note that for k ≥ 1, 30-30k ≤ 0, so we can find P(0 ≤ X ≤ 30+30k) instead.

    k    1 - 1/k²    P(0 ≤ X ≤ 30+30k)
    1    0           P(0 ≤ X ≤ 60) = 0.8647
    2    0.75        P(0 ≤ X ≤ 90) = 0.9502
    3    0.8889      P(0 ≤ X ≤ 120) = 0.9817
    4    0.9375      P(0 ≤ X ≤ 150) = 0.9933
    5    0.96        P(0 ≤ X ≤ 180) = 0.9975

Would you expect most tires to have a lifespan (time until the tire reaches critical tread wear) within 2 (or 3) standard deviations of the mean?

Question: Suppose the random variables X and Y both have means of 10. For X, the standard deviation is 2, and for Y, the standard deviation is 4. For which random variable would you expect possible observed values to be closer to the mean?
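A minimal Maple sketch to reproduce the probability column of the table (assuming the tire PDF from above, in a session where x carries no assumptions; the seq call is mine):

> f := exp(-x/30)/30:
> seq(evalf(int(f, x = 0..30 + 30*k), 4), k = 1..5);
                 0.8647, 0.9502, 0.9817, 0.9933, 0.9975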