Chapter 4: Mathematical Expectation

[Diagram from Chapter 1: a sample is taken from the population, and inference is drawn from the sample back to the population.]
In Chapter 3, we learned about some PDFs and how they are used to quantify the distribution of items in a population. In this chapter, we are going to summarize information about the population by using these PDFs. Specifically, we are going to look at the "expected value" of a random variable and find it using its PDF. These expected values are going to represent parameters. Thus, they will summarize items in the population.
 2005 Christopher R. Bilder
4.2
4.1: Mean of a Random Variable
Given a PDF for a population, we may want to know what value of X we would expect to obtain on average. This is found through the expected value of X, E(X).
Definition 4.1: Let X be a random variable with PDF f(x).
The population mean or expected value of X is
μ = E(X) = Σ_x x f(x)

when X is discrete, and

μ = E(X) = ∫_{-∞}^{∞} x f(x) dx

when X is continuous.
Notes:
o In calculus you learned about something similar. It
may have been called “moment about the y-axis”
(Faires and Faires, 1988).
o You will often see μ written as μ_X to emphasize that the expected value is for the random variable X. This notation is helpful when there are multiple random variables.
 2005 Christopher R. Bilder
4.3
o In the discrete case, you can think of μ as a weighted average. The weight for each x is f(x) = P(X=x).
o μ is a parameter since it summarizes possible values of a population.
Example: Let’s Play Plinko! (plinko.xls)
Let X be a random variable denoting the amount won for one chip. Below are 5 different PDFs for the amount won, one for each position where the chip can be dropped (each column corresponds to dropping the chip in the slot directly above the indicated prize).

                     Drop Plinko chip in column above:
    x     |   $100    |   $500    |  $1,000   |    $0     |  $10,000
----------|-----------|-----------|-----------|-----------|-----------
  $100    |  0.1667   |  0.1339   |  0.0822   |  0.0359   |  0.0176
  $500    |  0.3571   |  0.3080   |  0.2204   |  0.1287   |  0.0882
  $1,000  |  0.2976   |  0.2991   |  0.2862   |  0.2545   |  0.2353
  $0      |  0.1429   |  0.1920   |  0.2796   |  0.3713   |  0.4118
  $10,000 |  0.0357   |  0.0670   |  0.1316   |  0.2096   |  0.2471
  μ       |  $850.00  | $1,136.16 | $1,720.39 | $2,418.26 | $2,751.76
For example,

μ = E(X) = Σ_{x∈T} x f(x) = 100·0.1667 + 500·0.3571 + 1000·0.2976 + 0·0.1429 + 10000·0.0357 ≈ 850

where T = {100, 500, 1000, 0, 10000}.
 2005 Christopher R. Bilder
4.4
This is a good example of why you can think of μ as a weighted average.
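As a quick check, the same weighted average can be computed in Maple. The probabilities below are the rounded values from the table, so the result matches μ = $850.00 only approximately:

> xvals := [100, 500, 1000, 0, 10000]:
> probs := [0.1667, 0.3571, 0.2976, 0.1429, 0.0357]:
> add(xvals[i]*probs[i], i = 1..5);   # ≈ 849.82, i.e., $850 up to rounding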
Questions:
o What is the optimal place to drop the chip in order to maximize your expected winnings?
o Compare what actually occurred in 2001 to what is expected:
                 Drop Plinko chip in column above:
      x      | $100 | $500 | $1,000 |  $0  | $10,000
-------------|------|------|--------|------|--------
   $100      |  0   |  3   |   2    |  5   |   2
   $500      |  0   |  3   |  12    |  5   |   6
   $1,000    |  1   |  3   |  12    | 19   |  11
   $0        |  0   |  2   |   9    | 21   |   8
   $10,000   |  0   |  1   |   4    |  9   |   4
 Total chips |  1   | 12   |  39    | 59   |  31
 Average won |$1,000|$1,233| $1,492 |$1,898| $1,748

(Cell entries are counts of chips that won each amount x.)
Why are these averages different from what we would
expect?
o The sample size is small. As the sample size gets
bigger, we would expect the sample average to
approach the population mean.
o There is variability from one sample to the next. This implies that we would not expect the same number of observed values for the years of 2002, 2003, … when the game is played. More will be discussed about variability later.
o Possibly one of the underlying assumptions behind the calculation of the probabilities is incorrect. While we did not discuss these assumptions in class, the main one is that the probability of going to the left or right ONE slot is 0.5 as a Plinko chip hits a peg on the board. If you watch the Plinko game, you will notice that chips can often go more than ONE slot to the left or right after hitting a peg. A rough simulation of this chip-path model is sketched below.
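Here is a minimal Maple sketch of that assumed chip-path model. The board details (9 prize slots laid out as $100, $500, $1,000, $0, $10,000, $0, $1,000, $500, $100; 12 rows of pegs; a chip against a wall bouncing back toward the middle) are assumptions made only for illustration and are not specified in these notes:

> prize := [100, 500, 1000, 0, 10000, 0, 1000, 500, 100]:
> move := rand(0..1):   # 0 = one slot left, 1 = one slot right (probability 0.5 each)
> drop := proc(start) local pos, i;
>   pos := start;
>   for i to 12 do                   # assumed 12 rows of pegs
>     if pos = 1 then pos := 2       # assumed wall behavior
>     elif pos = 9 then pos := 8
>     else pos := pos + 2*move() - 1
>     end if
>   end do;
>   prize[pos]
> end proc:
> # Average winnings over many chips dropped above slot 1 (a $100 slot);
> # how close this comes to the $850 above depends on how well the assumed
> # board details match the model used in plinko.xls.
> evalf(add(drop(1), i = 1..10000)/10000);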
Example: Tire life (chapter4_calc_rev.mws)
The number of miles an automobile tire lasts before it
reaches a critical point in tread wear can be represented
by a PDF. Let X = the number of miles (in thousands)
an automobile is driven until it reaches the critical tread
wear point for one tire. Suppose the PDF for X is
 1  30x
 e
f(x)   30
 0

for x  0
for x  0
 2005 Christopher R. Bilder
4.6
0.035
0.03
f(X)
0.025
0.02
0.015
0.01
0.005
0
0
25
50
75
100
125
150
175
200
225
250
X
Find the expected number of miles a tire would last until it reaches the critical tread wear point. In other words, this will be the average lifetime for all tires.
μ = E(X) = ∫_{-∞}^{∞} x f(x) dx = ∫_0^∞ x·(1/30)e^(-x/30) dx
Use integration by parts with

u = x, so du = dx
dv = (1/30)e^(-x/30) dx, so v = ∫ (1/30)e^(-x/30) dx = -e^(-x/30)

Then

E(X) = uv|_0^∞ - ∫_0^∞ v du = -x e^(-x/30) |_0^∞ + ∫_0^∞ e^(-x/30) dx
 2005 Christopher R. Bilder
4.7
Note that by L'Hôpital's rule,

lim_{x→∞} x/e^(x/30) = lim_{x→∞} 1/[(1/30)e^(x/30)] = 0

so the -x e^(-x/30) term vanishes at both limits.
Then

E(X) = (0 - 0) - 30e^(-x/30)|_0^∞ = -30(0 - 1) = 30
Thus, one would expect the tire to last 30,000 miles on
average.
In Maple,

> assume(x>0);
> f:=1/30*exp(-x/30);
                        f := 1/30 exp(-1/30 x~)
> plot(f,x=0..150, title = "Tire life PDF");
> mu:=int(x*f,x=0..infinity);
                        mu := 30
The same computation can be done on a TI-89 (screenshots omitted).
Sometimes, functions of random variables are of interest when finding the expected value. Section 4.2 will discuss one very important example of when this is of interest. In general, here is how the expected value can be found:

Theorem 4.1: Let X be a random variable with PDF f(x). The mean or expected value of the random variable g(X) is

μ_g(X) = E[g(X)] = Σ_x g(x) f(x)

when X is discrete, and

μ_g(X) = E[g(X)] = ∫_{-∞}^{∞} g(x) f(x) dx

when X is continuous.
Be very careful here! The function g(X) is NOT a PDF. It is a function of a random variable. Remember that Chapter 3 often used g(x) to denote a marginal PDF.
Example: Tire life (chapter4_calc_rev.mws)
Find E[|X-30|]. What does this mean in terms of the
problem?
 2005 Christopher R. Bilder
4.10
E[|X - 30|] = ∫_0^∞ |x - 30|·(1/30)e^(-x/30) dx

= ∫_0^30 (30 - x)(1/30)e^(-x/30) dx + ∫_30^∞ (x - 30)(1/30)e^(-x/30) dx

= ∫_0^30 e^(-x/30) dx - ∫_0^30 (x/30)e^(-x/30) dx + ∫_30^∞ (x/30)e^(-x/30) dx - ∫_30^∞ e^(-x/30) dx

Integration by parts gives ∫ (x/30)e^(-x/30) dx = -x e^(-x/30) - 30e^(-x/30), so after evaluating each antiderivative and reordering,

E[|X - 30|] = [-30e^(-x/30) + x e^(-x/30) + 30e^(-x/30)]_0^30 + [-x e^(-x/30) - 30e^(-x/30) + 30e^(-x/30)]_30^∞
= [x e^(-x/30)]_0^30 + [-x e^(-x/30)]_30^∞
= (30e^(-1) - 0) + (0 + 30e^(-1))
= 60e^(-1) ≈ 22.0728
In Maple,

> int(abs(x-30)*f,x=0..infinity);
                        60 exp(-1)
> evalf(int(abs(x-30)*f,x=0..infinity));
                        22.07276647
Find E(X²). What does this mean in terms of the problem?
 2005 Christopher R. Bilder
4.11
E(X²) = ∫_{-∞}^{∞} x² f(x) dx = ∫_0^∞ x²·(1/30)e^(-x/30) dx

Use integration by parts twice to obtain:

E(X²) = 1800
Notice that E(X²) = 1800 while [E(X)]² = 30² = 900; these are different quantities.
In Maple,
> int(x^2*f,x=0..infinity);
1800
Corollary: E(aX+b) = aE(X) + b for some constants a and b.
pf:
E(aX + b) = ∫ (ax + b) f(x) dx
= ∫ ax f(x) dx + ∫ b f(x) dx
= a ∫ x f(x) dx + b ∫ f(x) dx
= aE(X) + b·1
= aE(X) + b
Example: Tire life
While this example is not necessarily realistic, we can
still do it to illustrate the corollary.
 2005 Christopher R. Bilder
4.12
Find E[2X+1] = 2E[X] + 1 = 230 + 1 = 61.
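A one-line Maple check of this, re-defining the tire life PDF f from earlier:

> f:=1/30*exp(-x/30):
> int((2*x+1)*f,x=0..infinity);
                        61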
Theorem 4.2: Let X and Y be random variables with joint PDF f(x,y). The mean or expected value of the random variable g(X,Y) is

μ_g(X,Y) = E[g(X,Y)] = Σ_x Σ_y g(x,y) f(x,y)

when X and Y are discrete, and

μ_g(X,Y) = E[g(X,Y)] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} g(x,y) f(x,y) dx dy

when X and Y are continuous.
See the examples in Section 4.1 on your own.
There is one particular case which will be important in the next section - when g(X,Y) = XY. In the continuous case, notice the difference between E(XY) and E(X)E(Y):

E(XY) = ∫_{-∞}^{∞} ∫_{-∞}^{∞} xy f(x,y) dx dy
 2005 Christopher R. Bilder
4.13
 
 



E(X)E(Y)     x f(x,y)dx dy     y f(x,y)dx dy 
  
   






   x g(x)dx    yh(y)dy 
 
  

When are these two quantities equal?
Notice how E(X) and E(Y) are found.
Example: A sample for the tire life PDF
(example_sample_tire.xls in Chapter 3)
Below is a sample that comes from a population
characterized by the PDF of
 1  30x
 e
f(x)   30
 0

for x  0
for x  0
for the tire life example.
Tire Number |    x
------------|---------
      1     |  8.7826
      2     |  2.8102
      3     | 71.202
      4     | 16.55
      5     | 23.581
      6     |  2.1657
      7     | 36.784
      8     | 14.432
      9     | 27.69
     10     | 10.743
      ⋮     |    ⋮
    999     | 34.008
   1000     | 52.044
Questions:
o How could you estimate E(X) using this sample, and what would you expect it to be approximately?
o How could you estimate E(X²) using this sample, and what would you expect it to be approximately? (A sketch is given below.)
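A minimal Maple sketch of these estimates, using just the 10 listed observations for illustration (the actual file has all 1000). With only 10 observations the estimates will be rough; with all 1000 values they should land near E(X) = 30 and E(X²) = 1800:

> xs := [8.7826, 2.8102, 71.202, 16.55, 23.581, 2.1657, 36.784, 14.432, 27.69, 10.743]:
> add(xs[i], i = 1..nops(xs))/nops(xs);     # sample mean estimates E(X)
> add(xs[i]^2, i = 1..nops(xs))/nops(xs);   # mean of squares estimates E(X^2)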
 2005 Christopher R. Bilder
4.15
4.2: Variance and Covariance
In addition to knowing what we would expect X to be on average, we may want to know something about the "variability" of X from μ. For example, in the tire tread wear example, we want to numerically quantify the average deviation from the mean for X. We already did this in terms of E[|X-30|] = E[|X-μ|]. There is another, more often used way of doing this in terms of the squared deviation, E[(X-μ)²].
Definition 4.3: Let X be a random variable with PDF f(x) and mean μ. The variance of X is

σ² = E[(X-μ)²] = Σ_x (x-μ)² f(x)

when X is discrete, and

σ² = E[(X-μ)²] = ∫_{-∞}^{∞} (x-μ)² f(x) dx

when X is continuous. The positive square root of the variance, σ, is called the standard deviation of X.
Notes:
o The variance is the expected squared deviation of X from μ.
o This is an example of using Theorem 4.1 with g(x) = (x-μ)².
o E[(X-μ)²] is equivalently written as E{[X-E(X)]²}, so that there is an expectation within an expectation. Once E(X) is found, it is a constant value, as was shown in the last section.
o Common notation that is often used here includes: Var(X) = σ_X² = σ² = E[(X-μ)²]. The Var() part is just a nice way to replace the E[ ] part.
o The subscript X on σ² is often helpful to use when there is more than one random variable under consideration. Thus, σ_X² denotes the variance of X.
o The reason for considering the positive square root of σ² is so that we can use the original units of X.
o σ² ≥ 0 and σ ≥ 0 for all possible values of X.
Theorem 4.2: The variance of a random variable X is

σ² = Var(X) = E(X²) - μ² = E(X²) - [E(X)]²

pf:
E[(X-μ)²] = E(X² - 2μX + μ²)
= E(X²) - 2μE(X) + E(μ²)
= E(X²) - 2μ² + μ²
= E(X²) - μ²
 2005 Christopher R. Bilder
4.17
Example: Let’s Play Plinko! (plinko.xls)
Let X be a random variable denoting the amount won for
one chip. Find the variance and standard deviation for
X.
                     Drop Plinko chip in column above:
    x     |   $100    |   $500    |  $1,000   |    $0     |  $10,000
----------|-----------|-----------|-----------|-----------|-----------
  $100    |  0.1667   |  0.1339   |  0.0822   |  0.0359   |  0.0176
  $500    |  0.3571   |  0.3080   |  0.2204   |  0.1287   |  0.0882
  $1,000  |  0.2976   |  0.2991   |  0.2862   |  0.2545   |  0.2353
  $0      |  0.1429   |  0.1920   |  0.2796   |  0.3713   |  0.4118
  $10,000 |  0.0357   |  0.0670   |  0.1316   |  0.2096   |  0.2471
  μ       |  $850.00  | $1,136.16 | $1,720.39 | $2,418.26 | $2,751.76
  σ       | $1,799.31 | $2,404.79 | $3,246.57 | $3,923.92 | $4,170.28
  μ-2σ    | -$2,748.61| -$3,673.42| -$4,772.75| -$5,429.57| -$5,588.79
  μ+2σ    | $4,448.61 | $5,945.74 | $8,213.54 | $10,266.10| $11,092.32
  μ-3σ    | -$4,547.92| -$6,078.21| -$8,019.33| -$9,353.49| -$9,759.06
  μ+3σ    | $6,247.92 | $8,350.54 | $11,460.12| $14,190.01| $15,262.59
For example, when dropping the chip above the $100 reservoir (#1 or #9 from p. 3.21 of Section 3.1-3.3),

σ² = E[(X-μ)²] = Σ_{x∈T} (x-μ)² f(x)
= (100-850)²·0.1667 + (500-850)²·0.3571 + (1000-850)²·0.2976 + (0-850)²·0.1429 + (10000-850)²·0.0357
≈ 3,237,500 dollars²
 2005 Christopher R. Bilder
4.18
where T = {100, 500, 1000, 0, 10000}.
To put this into units of just dollars, we take the positive square root to find σ = $1,799.31.
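As a check in Maple, using the rounded probabilities from the table (so the results differ slightly from the exact values above):

> xvals := [100, 500, 1000, 0, 10000]:
> probs := [0.1667, 0.3571, 0.2976, 0.1429, 0.0357]:
> mu := add(xvals[i]*probs[i], i = 1..5):
> sigma2 := add((xvals[i]-mu)^2*probs[i], i = 1..5);  # ≈ 3,236,000 dollars^2
> sqrt(sigma2);                                       # ≈ $1,799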
What does this value really mean?
“Rule of thumb” for the # of standard deviations all
possible observations (data) lies from its mean: 2 or
3. This is an ad-hoc interpretation of Chebyshev’s
Rule (Section 4.4) and the Empirical Rule (not in our
book). This rule of thumb is discussed now to help
you understand standard deviations.
When someone drops one chip from the top of the Plinko board, I would generally expect the amount of money they will win to be between μ - 2σ and μ + 2σ. A more conservative expected amount would be between μ - 3σ and μ + 3σ.
Examine the application of the rule of thumb in the table
and how it makes sense when the chip is dropped above
the $100 spot.
How are these calculations done in Excel? See the
Plinko Probabilities worksheet of plinko.xls. Note that
the negative values are represented by a “( )” instead of
a negative sign.
 2005 Christopher R. Bilder
4.19
 2005 Christopher R. Bilder
4.20
Example: Time it takes to get to class
What time should you leave your home for class in order
to make sure that you are never (or rarely) late?
Example: Tire life (chapter4_calc_rev.mws)
The number of miles an automobile tire lasts before it
reaches a critical point in tread wear can be represented
by a PDF. Let X = the number of miles (in thousands)
an automobile is driven until it reaches the critical tread
wear point for one tire. Suppose the PDF for X is
 1  30x
 e
f(x)   30
 0

for x  0
for x  0
[Plot of the tire life PDF f(x) for x from 0 to 250.]
Find the variance and standard deviation for the number of tire miles driven until it reaches its critical wear point.

σ² = Var(X) = E[(X-μ)²] = ∫_{-∞}^{∞} (x-μ)² f(x) dx = ∫_0^∞ (x-30)²·(1/30)e^(-x/30) dx

where μ = 30. Instead of doing this integral, it is often a little easier to work with σ² = E(X²) - μ². Previously, we found E(X²) = 1800. Thus,

σ² = 1800 - 30² = 900!
In Maple,
> int((x-mu)^2*f,x=0..infinity);
900
> int(x^2*f,x=0..infinity) - mu^2;
900
The same computations can be done on a TI-89 (screenshots omitted).
Notice that σ = √900 = 30. Putting this in terms of thousands of miles, the standard deviation is 30,000 miles, which is the same as the mean here. Generally for random variables, the mean and the standard deviation will not be the same!!! This just happens to be a special PDF called the "Exponential PDF" where this will always occur (see p. 166 of the book). Using the rule of thumb,

μ-2σ = 30 - 2·30 = -30 and μ+2σ = 30 + 2·30 = 90

and

μ-3σ = 30 - 3·30 = -60 and μ+3σ = 30 + 3·30 = 120
Thus, one would expect all of the tires to be between -30
and 90. A more conservative range would be -60 and
120. Of course, the negative values do not make sense
for this problem. However, examine where the upper
bound values fall on the PDF plot. Make sure you
understand why these upper values make sense in
terms of the plot!
Compare σ = 30 to E(|X-μ|) = 22.07 found earlier.
Suppose the PDF was changed to

f(x) = { (1/15)e^(-x/15)   for x > 0
       { 0                 for x ≤ 0

In this case, one can show that E(X) = 15, Var(X) = σ² = 225, and σ = 15. The PDF is shown below.
[Plot of the μ = 15 PDF f(x) for x from 0 to 225.]
Below is the same plot with the original PDF of f(x) = (1/30)e^(-x/30) for x > 0, where the y- and x-axis scales are drawn exactly the same.
[Plot of the original μ = 30 PDF f(x) on the same axis scales.]
Notice the μ = 30 plot has a PDF more spread out than the μ = 15 plot. This is because the standard deviation (and variance) is larger for the μ = 30 plot!
 2005 Christopher R. Bilder
4.24
Question: Given the choice between a tire with μ = σ = 30 and a tire with μ = σ = 15, which would you choose?
Other common g(X) functions used with E[g(X)]:

o The skewness for a random variable X is defined to be

  E[(X-μ)³] / {E[(X-μ)²]}^(3/2) = E[(X-μ)³] / σ³

  This quantity measures the lack of symmetry in a PDF. For example, the tire life PDF shown earlier is very skewed (non-symmetric). The PDF shown on p. 3.30 of the Section 3.1-3.3 notes is symmetric. Note that {E[(X-μ)²]}^(3/2), E[(X-μ)³], and {E[X-μ]}³ are, in general, all different quantities.

o The kurtosis of a random variable X is defined to be

  E[(X-μ)⁴] / {E[(X-μ)²]}² = E[(X-μ)⁴] / σ⁴

  This quantity measures the amount of peakedness or flatness of a PDF. The original notes compared a red PDF with a higher peak to a flatter blue PDF (plot not reproduced here). A quick Maple check of both quantities for the tire life PDF is given below.
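For the tire life PDF these expectations can be computed directly; the skewness works out to 2 and the kurtosis to 9 (values that hold for any exponential PDF):

> f:=1/30*exp(-x/30):
> int((x-30)^3*f, x=0..infinity)/30^3;   # skewness
                        2
> int((x-30)^4*f, x=0..infinity)/30^4;   # kurtosis
                        9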
 2005 Christopher R. Bilder
4.25
Theorem 4.3: Let X be a random variable with PDF f(x). The variance of the random variable g(X) is

Var[g(X)] = σ²_g(X) = E{[g(X) - μ_g(X)]²} = Σ_x [g(x) - μ_g(X)]² f(x)

when X is discrete, and

Var[g(X)] = σ²_g(X) = E{[g(X) - μ_g(X)]²} = ∫_{-∞}^{∞} [g(x) - μ_g(X)]² f(x) dx

when X is continuous.
Examine the examples in Section 4.2 on your own.
 2005 Christopher R. Bilder
4.26
Covariance and correlation
Suppose there are two random variables, X and Y. It is often of interest to determine if they are independent (remember Chapter 2). If they are dependent, then we would like to quantify the amount and strength of dependence. Also, we would be interested in the type (positive or negative) of dependence.
Positive dependence means that "large" values of X tend to occur with "large" values of Y, and "small" values of X tend to occur with "small" values of Y. If we could plot all values in a population, the dependence would look like this:

[Scatter plot: points trend upward, from (small X, small Y) to (large X, large Y).]
Negative dependence means that "large" values of X tend to occur with "small" values of Y, and "small" values of X tend to occur with "large" values of Y. If we could plot all values in a population, the dependence would look like this:

[Scatter plot: points trend downward, from (small X, large Y) to (large X, small Y).]
Thus, positive dependence means as values of X
increase, Y tends to increase as well (they move in the
same direction). Negative dependence means as values
of X increase, Y tends to decrease (they move in an
opposite direction).
Example: High school and college GPA
Suppose I had a joint PDF which quantified the possible values that high school and college GPAs can take on. Let X = the student's high school GPA and Y = the student's college GPA.
Questions:
o Would you expect there to be a relationship between X and Y? In other words, are X and Y independent or dependent?
o If they are dependent, would you expect there to be a strong or weak amount of dependence?
o If they are dependent, would you expect a positive or negative dependence? What would positive and negative dependence mean in terms of the problem?
The numerical measure of the dependence between two random variables is called the "covariance". It is denoted symbolically by σ_XY when X and Y denote the two random variables. Below are a few notes about it:
o σ_XY = 0 when there is independence.
o σ_XY > 0 when there is positive dependence.
o σ_XY < 0 when there is negative dependence.
o The further away σ_XY is from 0, the stronger the dependence.
Definition 4.4: Let X and Y be random variables with joint PDF f(x,y). Suppose E(X) = μ_X and E(Y) = μ_Y. The covariance of X and Y is

σ_XY = E[(X-μ_X)(Y-μ_Y)] = Σ_x Σ_y (x-μ_X)(y-μ_Y) f(x,y)

when X and Y are discrete, and

σ_XY = E[(X-μ_X)(Y-μ_Y)] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} (x-μ_X)(y-μ_Y) f(x,y) dx dy

when X and Y are continuous.

Common notation that is often used for the covariance is Cov(X,Y) = σ_XY.
Below is a simplifying formula similar to the one used for the variance.

Theorem 4.4: The covariance of two random variables X and Y with means μ_X and μ_Y, respectively, is given by

σ_XY = E(XY) - μ_X μ_Y = E(XY) - E(X)E(Y)

pf:
E[(X-μ_X)(Y-μ_Y)] = E[XY - μ_X Y - μ_Y X + μ_X μ_Y]
= E[XY] - μ_X E[Y] - μ_Y E[X] + μ_X μ_Y
= E[XY] - μ_X μ_Y - μ_Y μ_X + μ_X μ_Y
= E[XY] - μ_X μ_Y
 2005 Christopher R. Bilder
4.30
Note that E(XY) ≠ μ_X μ_Y = E(X)E(Y) except under a particular condition to be discussed in the next section.
There is one problem with the covariance. The measure of strength of dependence (how far it is from 0) is not necessarily bounded above or below. The correlation coefficient, denoted by ρ_XY, fixes this problem. It is the covariance divided by the standard deviations of X and Y in order to provide a numerical value that is always between -1 and 1. Below are a few notes about it:
o -1 ≤ ρ_XY ≤ 1
o ρ_XY = 0 when there is independence.
o ρ_XY > 0 when there is positive dependence.
o ρ_XY < 0 when there is negative dependence.
o The closer ρ_XY is to 1, the stronger the positive dependence.
o The closer ρ_XY is to -1, the stronger the negative dependence.
o When X = Y, ρ_XY = 1. More generally, when X = a + bY for constants a and b > 0, ρ_XY = 1.
o When X = -Y, ρ_XY = -1. More generally, when X = a + bY for constants a and b < 0, ρ_XY = -1.
Definition 4.5: Let X and Y be random variables with covariance σ_XY and standard deviations σ_X and σ_Y, respectively. The correlation coefficient for X and Y is

ρ_XY = σ_XY / (σ_X σ_Y) = Cov(X,Y) / √[Var(X)·Var(Y)]

Sometimes one will see this denoted as Corr(X,Y).
Example: Grades for two courses (chapter4_calc_rev.mws)
Let X be a random variable denoting grade in a math
course and Y be a random variable denoting grade in a
statistics course. Suppose the joint PDF is
f(x,y) = { x² + 2y²   for 0 < x < 1 and 0 < y < 1
         { 0          otherwise
 2005 Christopher R. Bilder
4.32
Find the covariance and correlation coefficient to numerically measure the dependence between X and Y. To find the covariance, I am going to use the σ_XY = E(XY) - μ_X μ_Y expression.
Find μ_X:

E(X) = ∫_0^1 ∫_0^1 x f(x,y) dy dx
= ∫_0^1 ∫_0^1 x(x² + 2y²) dy dx
= ∫_0^1 ∫_0^1 (x³ + 2xy²) dy dx
= ∫_0^1 [x³y + (2/3)xy³]_{y=0}^{y=1} dx
= ∫_0^1 [x³ + (2/3)x - 0 - 0] dx
= [x⁴/4 + x²/3]_0^1
= 1/4 + 1/3 - 0 - 0
= 7/12
This could have been found a little easier using
results from Section 3.4. In that section, we found
g(x) = x2 + 2/3 was the marginal PDF for X. Thus,
 2005 Christopher R. Bilder
4.33
E(X) = ∫_0^1 x g(x) dx
= ∫_0^1 x(x² + 2/3) dx
= ∫_0^1 [x³ + (2/3)x] dx
= [x⁴/4 + x²/3]_0^1
= 7/12
Find μ_Y: Using the marginal PDF h(y) = 1/3 + 2y²,

E(Y) = ∫_0^1 y h(y) dy
= ∫_0^1 y(1/3 + 2y²) dy
= ∫_0^1 [(1/3)y + 2y³] dy
= [y²/6 + y⁴/2]_0^1
= 1/6 + 1/2 - 0 - 0
= 2/3
Find E(XY):

Since both X and Y are involved in the expectation, the joint PDF must be used.

E(XY) = ∫_0^1 ∫_0^1 xy f(x,y) dy dx
= ∫_0^1 ∫_0^1 xy(x² + 2y²) dy dx
= ∫_0^1 ∫_0^1 (x³y + 2xy³) dy dx
= ∫_0^1 [(1/2)x³y² + (1/2)xy⁴]_{y=0}^{y=1} dx
= ∫_0^1 [(1/2)x³ + (1/2)x - 0 - 0] dx
= [x⁴/8 + x²/4]_0^1
= 1/8 + 1/4 - 0 - 0
= 3/8

Then XY = E(XY) - XY = 3/8 – (7/12)(2/3)
= 3/8 – 7/18 = 27/72 – 28/72 = -1/72 = -0.0139
To use www.integrals.com in order to find E(XY), one needs to split the problem up into two parts. First, one needs to find the inner integral. Since I had previously decided to integrate first with respect to y, I will do it again. However, I need to integrate with respect to "x" on integrals.com. So suppose

∫_0^1 ∫_0^1 xy(x² + 2y²) dy dx is rewritten as ∫_0^1 ∫_0^1 ax(a² + 2x²) dx da.

Thus, "a" represents "x" and "x" represents "y" in the original integral. The inner integral is

∫_0^1 ax(a² + 2x²) dx = [(1/2)a³x² + (1/2)ax⁴]_0^1 = (1/2)a³ + (1/2)a

Again, to use integrals.com, I need to integrate with respect to x. Thus, replace a in the above expression with x and enter the result into the appropriate box on the web page. Then

∫_0^1 [(1/2)x³ + (1/2)x] dx = [x⁴/8 + x²/4]_0^1 = 3/8

and the same result is obtained as before. This idea of splitting up the integral into two parts can also be used with calculators that do not allow for multiple integration. (The TI-89 screenshots are omitted here.)
All of these calculations are much easier in Maple!

> fxy:=x^2+2*y^2;
                        fxy := x^2 + 2 y^2
> int(int(fxy,x=0..1),y=0..1);
                        1
> E(XY):=int(int(x*y*fxy,x=0..1),y=0..1);
                        E(XY) := 3/8
> E(X):=int(int(x*fxy,x=0..1),y=0..1);
                        E(X) := 7/12
> E(Y):=int(int(y*fxy,x=0..1),y=0..1);
                        E(Y) := 2/3
> Cov(X,Y):=E(XY)-E(X)*E(Y);
                        Cov(X,Y) := -1/72
> Cov(X,Y):=int(int((x-E(X))*(y-E(Y))*fxy,x=0..1),y=0..1);
                        Cov(X,Y) := -1/72
> evalf(Cov(X,Y));
                        -.01388888889
> evalf(Cov(X,Y),4);
                        -.01389
To find the correlation coefficient, I need to find the variances of X and Y in addition to the covariance between X and Y. To do this, I am going to use the shortcut formulas of Var(X) = σ_X² = E(X²) - μ_X² and Var(Y) = σ_Y² = E(Y²) - μ_Y². Since the individual means have already been found, I just need to find E(X²) and E(Y²).
Find E(X²):

E(X²) = ∫_0^1 x² g(x) dx
= ∫_0^1 x²(x² + 2/3) dx
= ∫_0^1 [x⁴ + (2/3)x²] dx
= [x⁵/5 + (2/9)x³]_0^1
= 19/45

Find Var(X):

Var(X) = E(X²) - μ_X² = 19/45 - (7/12)² = 59/720 = 0.0819
Find E(Y²):

E(Y²) = ∫_0^1 y² h(y) dy
= ∫_0^1 y²(1/3 + 2y²) dy
= ∫_0^1 [(1/3)y² + 2y⁴] dy
= [y³/9 + (2/5)y⁵]_0^1
= 23/45

Find Var(Y):

Var(Y) = E(Y²) - μ_Y² = 23/45 - (2/3)² = 3/45 = 1/15 = 0.0667
Then

ρ_XY = σ_XY / (σ_X σ_Y) = (-1/72) / √(0.0819·(1/15)) = -0.1879
From Maple:

> Var(X):=int(int((x-E(X))^2*fxy,x=0..1),y=0..1);
                        Var(X) := 59/720
> Var(Y):=int(int((y-E(Y))^2*fxy,x=0..1),y=0..1);
                        Var(Y) := 1/15
> Corr(X,Y):=Cov(X,Y)/sqrt(Var(X)*Var(Y));
                        Corr(X,Y) := -5 √177 / 354
> evalf(Corr(X,Y),4);
                        -.1878
 2005 Christopher R. Bilder
4.41
Describe the relationship between math course grade
(X) and stat course grade (Y):
o Are math and stat course grades independent or
dependent? Explain.
o If they are dependent, is there a strong or weak
amount of dependence?
o If they are dependent, is there a positive or negative
relationship between math and stat course grades?
On an exam, I may just ask you to describe the
relationship between two random variables instead of
prompting you with the above questions. In your
explanation, you should still address these types of
questions!
What we have developed is a way to understand the relationship between two different random variables. Where would this be useful? Suppose you want to study the relationships between:
o Humidity and temperature
o ACT and SAT score
o White and red blood cell counts
o Winning percentage and the number of yards per game on offense for NFL teams
o …
 2005 Christopher R. Bilder
4.42
4.3: Means and Variances of Linear Combinations of
Random Variables
This section revisits some items we have already discussed and introduces some new ones. The main purpose here is for you to get comfortable with finding expectations and variances of functions of random variables.
Theorem 4.5: If a and b are constants, then

E(aX+b) = aE(X) + b

See the corollary in Section 4.1 for where this was first introduced. Note what happens if a and/or b is equal to 0.
Theorem 4.6: The expected value of the sum or difference of two or more functions of a random variable X is the sum or difference of the expected values of the functions. That is,

E[g(X) ± h(X)] = E[g(X)] ± E[h(X)]

For example, let g(X) = aX²+bX+c and h(X) = dX+f for some constants a, b, c, d, and f. Then

E[g(X) - h(X)] = E(aX² + bX + c - dX - f)
= aE(X²) + bE(X) + E(c) - dE(X) - E(f)
= aE(X²) + bE(X) + c - dE(X) - f
The main point to remember is that you can distribute an expectation through a linear combination of random variables. Note that you cannot distribute an expectation through a product of random variables except under special conditions. For example, usually E(X²) ≠ E(X)E(X), as checked below.
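A quick Maple illustration using the tire life PDF from earlier, where E(X²) = 1800 but E(X)·E(X) = 900:

> f:=1/30*exp(-x/30):
> int(x^2*f,x=0..infinity);
                        1800
> int(x*f,x=0..infinity)^2;
                        900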
Theorem 4.7: The expected value of the sum or difference of two or more functions of the random variables X and Y is the sum or difference of the expected values of the functions. That is,

E[g(X,Y) ± h(X,Y)] = E[g(X,Y)] ± E[h(X,Y)]

For example, let g(X,Y) = aX²+bY+c and h(X,Y) = dX+f. Then

E[g(X,Y) - h(X,Y)] = E(aX² + bY + c - dX - f)
= aE(X²) + bE(Y) + E(c) - dE(X) - E(f)
= aE(X²) + bE(Y) + c - dE(X) - f
Again, the main point to remember is that you can distribute an expectation through a linear combination of random variables. Note that you cannot distribute an expectation through a product of random variables except under special conditions. For example, E(XY) ≠ E(X)E(Y) as previously discussed (see Section 4.1).
Theorem 4.8: Let X and Y be two independent random
variables. Then E(XY) = E(X)E(Y).
We discussed this case in Section 4.1.

Notice that

E(XY) = ∫_{-∞}^{∞} ∫_{-∞}^{∞} xy f(x,y) dx dy

Under independence, f(x,y) = g(x)h(y), and this simplifies to

E(XY) = ∫_{-∞}^{∞} ∫_{-∞}^{∞} xy g(x)h(y) dx dy
= ∫_{-∞}^{∞} y h(y) [∫_{-∞}^{∞} x g(x) dx] dy
= [∫_{-∞}^{∞} y h(y) dy]·[∫_{-∞}^{∞} x g(x) dx]

since ∫ x g(x) dx has no values of y in it.
Also, remember that

E(X)E(Y) = [∫_{-∞}^{∞} ∫_{-∞}^{∞} x f(x,y) dx dy]·[∫_{-∞}^{∞} ∫_{-∞}^{∞} y f(x,y) dx dy]
= [∫_{-∞}^{∞} x g(x) dx]·[∫_{-∞}^{∞} y h(y) dy]
Notice what happens to the covariance and correlation coefficient when X and Y are independent!

Cov(X,Y) = σ_XY = E(XY) - E(X)E(Y) = 0 under independence!

ρ_XY = σ_XY / (σ_X σ_Y) = 0 / (σ_X σ_Y) = 0 under independence!
Often, the variance of linear combinations is important.

Theorem 4.9: If a and b are constants, then σ²_{aX+b} = a²σ_X² = a²σ². Stated differently, Var(aX + b) = a²Var(X).
pf:
Var(aX + b) = E{[aX + b - E(aX + b)]²} = E{[aX + b - aμ - b]²} = E[a²(X - μ)²] = a²E[(X - μ)²] = a²σ²
Theorem 4.10: If a and b are constants and X and Y are random variables with joint PDF f(x,y), then

σ²_{aX+bY} = a²σ_X² + b²σ_Y² + 2abσ_XY

Stated differently,

Var(aX + bY) = a²Var(X) + b²Var(Y) + 2abCov(X,Y).
pf:
Note that E(aX + bY) = aμ_X + bμ_Y. Then
Var(aX + bY) = E{[aX + bY - aμ_X - bμ_Y]²}
= E{[a(X - μ_X) + b(Y - μ_Y)]²}
= a²E[(X - μ_X)²] + b²E[(Y - μ_Y)²] + 2abE[(X - μ_X)(Y - μ_Y)]
= a²σ_X² + b²σ_Y² + 2abσ_XY
Corollary 1: If X and Y are independent, then Var(aX+bY) = a²Var(X) + b²Var(Y) since Cov(X,Y) = σ_XY = 0.

This corollary will help us in Section 10.8 when deriving the test statistic used in a hypothesis test for two means.
Example: Grades for two courses (chapter4_calc_rev.mws)
Let X be a random variable denoting grade in a math
course and Y be a random variable denoting grade in a
statistics course. Suppose the joint PDF is
f(x,y) = { x² + 2y²   for 0 < x < 1 and 0 < y < 1
         { 0          otherwise
This is not necessarily a realistic example, but find Var(X + 2Y):

Var(X+2Y) = Var(X) + 2²·Var(Y) + 2·1·2·Cov(X,Y)
= 0.0819 + 4·0.0667 + 4·(-0.0139)
= 0.2931
In Maple,

> Var(X)+2^2*Var(Y)+2*1*2*Cov(X,Y);
                        211/720
 2005 Christopher R. Bilder
4.48
Theorem: If X₁, X₂, …, Xₙ are random variables from a joint PDF of f(x₁, x₂, …, xₙ) and a₁, a₂, …, aₙ are constants, then

Var(a₁X₁ + a₂X₂ + … + aₙXₙ) = Σ_{i=1}^{n} aᵢ² Var(Xᵢ) + 2 ΣΣ_{i<j} aᵢaⱼ Cov(Xᵢ, Xⱼ)
= Σ_{i=1}^{n} aᵢ² Var(Xᵢ) + ΣΣ_{i≠j} aᵢaⱼ Cov(Xᵢ, Xⱼ)

Corollary: If X₁, X₂, …, Xₙ are independent random variables, then Var(a₁X₁ + a₂X₂ + … + aₙXₙ) = Σ_{i=1}^{n} aᵢ² Var(Xᵢ).
 2005 Christopher R. Bilder
4.49
4.4: Chebyshev’s Theorem
From Section 4.2:
“Rule of thumb” for the # of standard deviations all
possible observations (data) lies from its mean: 2 or 3.
This is an ad-hoc interpretation of Chebyshev’s Rule
(Section 4.4) and the Empirical Rule (not in our book).
The purpose of this section is to give a more precise definition and explanation of this powerful rule.
Theorem 4.11 (Chebyshev): The probability that any random variable X will assume a value within k standard deviations of the mean is at least 1 - 1/k². That is,

P(μ - kσ < X < μ + kσ) ≥ 1 - 1/k²
Notes:
o There is no mention of the PDF for X here!!! Thus, this holds for ANY PDF!!!
o Below are some values of k and lower bounds for the probability which are often of interest.

  k | 1 - 1/k²
 ---|----------
  1 |  0
  2 |  0.75
  3 |  0.8889
  4 |  0.9375
  5 |  0.96
o While the probabilities for k = 2 or 3 may be lower than you would expect based on the rule of thumb, remember that the above table gives lower bounds only. For a given PDF, the probabilities could be higher!
o See p. 111 of the book for the proof.
Example: Tire life
The number of miles an automobile tire lasts before it
reaches a critical point in tread wear can be represented
by a probability distribution. Let X = the number of miles
(in thousands) an automobile is driven until it reaches
the critical tread wear point for one tire. Suppose the
probability distribution for X is
 1  30x
 e
f(x)   30
 0

for x  0
for x  0
 2005 Christopher R. Bilder
4.51
0.035
0.03
f(X)
0.025
0.02
0.015
0.01
0.005
0
0
25
50
75
100
125
150
175
200
225
250
X
Find P(μ-kσ ≤ X ≤ μ+kσ). Note that we could insert the values of μ and σ into the probability. I will wait until the end to do that.

Since μ-kσ could be less than 0 (outside of the possible values of X), we should use the following expressions for the probability:

If μ-kσ > 0, then
P(μ-kσ ≤ X ≤ μ+kσ) = -e^{-(μ+kσ)/30} + e^{-(μ-kσ)/30}

If μ-kσ < 0, then
P(0 ≤ X ≤ μ+kσ) = -e^{-(μ+kσ)/30} + e^{-0/30} = -e^{-(μ+kσ)/30} + 1

Earlier, we found that μ = 30 and σ = 30. Then P(30-30k ≤ X ≤ 30+30k) can be found for various values of k. Note that for k ≥ 1, 30-30k ≤ 0, so we can find P(0 ≤ X ≤ 30+30k) instead.
  k | 1 - 1/k² | P(0 ≤ X ≤ 30+30k)
 ---|----------|-------------------------
  1 |  0       | P(0 ≤ X ≤ 60)  = 0.8647
  2 |  0.75    | P(0 ≤ X ≤ 90)  = 0.9502
  3 |  0.8889  | P(0 ≤ X ≤ 120) = 0.9817
  4 |  0.9375  | P(0 ≤ X ≤ 150) = 0.9933
  5 |  0.96    | P(0 ≤ X ≤ 180) = 0.9975
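These exact probabilities come from integrating the PDF; for this exponential PDF, P(0 ≤ X ≤ 30+30k) = 1 - e^{-(1+k)}. A quick Maple check of the whole column:

> f:=1/30*exp(-x/30):
> seq(evalf(int(f, x=0..30+30*k), 4), k=1..5);
                        .8647, .9502, .9817, .9933, .9975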
Would you expect most tires to have a lifespan (time until the tire reaches critical tread wear) within 2 (or 3) standard deviations of the mean?
Question:
Suppose the random variables X and Y both have means of 10. For X, the standard deviation is 2 and for Y the standard deviation is 4. For which random variable would you expect possible observed values to be closer to the mean?
 2005 Christopher R. Bilder
Download