2. Random Variables

Dave Goldsman
Georgia Institute of Technology, Atlanta, GA, USA
5/21/14
Outline
1. Intro / Definitions
2. Discrete Random Variables
3. Continuous Random Variables
4. Cumulative Distribution Functions
5. Great Expectations
6. Functions of a Random Variable
7. Bivariate Random Variables
8. Conditional Distributions
9. Independent Random Variables
10. Conditional Expectation
11. Covariance and Correlation
12. Moment Generating Functions
Intro / Definitions
Definition: A random variable (RV) is a function from the sample
space to the real line. X : S → R.
Example: Flip 2 coins. S = {HH, HT, TH, TT}.
Suppose X is the RV corresponding to the # of H's.
X(TT) = 0, X(HT) = X(TH) = 1, X(HH) = 2.
P(X = 0) = 1/4, P(X = 1) = 1/2, P(X = 2) = 1/4.
Notation: Capital letters like X, Y, Z, U, V, W usually represent RV’s.
Small letters like x, y, z, u, v, w usually represent particular values of
the RV’s.
Thus, you can speak of P (X = x).
Example: Let X be the sum of two dice rolls. Then X((4, 6)) = 10. In
addition,


P(X = x) =
  1/36  if x = 2
  2/36  if x = 3
  ...
  6/36  if x = 7
  ...
  1/36  if x = 12
  0     otherwise
Example: Flip a coin. Define X ≡ 0 if T, 1 if H.
Example: Roll a die. Define Y ≡ 0 if the roll is in {1, 2, 3}, 1 if it is in {4, 5, 6}.
For our purposes, X and Y are the same, since
P(X = 0) = P(Y = 0) = 1/2 and P(X = 1) = P(Y = 1) = 1/2.
Example: Select a real # at random betw 0 and 1.
Infinite number of “equally likely” outcomes.
Conclusion: P (each individual point) = P (X = x) = 0, believe it or not!
But P (X ≤ 0.5) = 0.5 and P (X ∈ [0.3, 0.7]) = 0.4.
If A is any interval in [0,1], then P (A) is the length of A.
Definition: If the number of possible values of a RV X is finite or
countably infinite, then X is a discrete RV. Otherwise,. . .
A continuous RV is one with prob 0 at every point.
Example: Flip a coin — get H or T . Discrete.
Example: Pick a point at random in [0, 1]. Continuous.
Discrete Random Variables
Definition: If X is a discrete RV, its probability mass function (pmf) is
f (x) ≡ P (X = x).
Note that 0 ≤ f(x) ≤ 1 and Σ_x f(x) = 1.
Example: Flip 2 coins. Let X be the number of heads.



f(x) = 1/4 if x = 0 or 2; 1/2 if x = 1; 0 otherwise.
Example: Uniform Distribution on integers 1, 2, . . . , n. X can equal
1, 2, . . . , n, each with prob 1/n. f (i) = 1/n, i = 1, 2, . . . , n.
Example/Definition: Let X denote the number of “successes” from n
independent trials such that the P (success) at each trial is p
(0 ≤ p ≤ 1). Then X has the Binomial Distribution with parameters n
and p.
The trials are referred to as Bernoulli trials.
Notation: X ∼ Bin(n, p). “X has the Bin distribution”
Example: Roll a die 3 indep times. Find
P (Get exactly two 6’s).
“success” (6) and “failure” (1,2,3,4,5)
All 3 trials are indep, and P (success) = 1/6 doesn’t change from trial
to trial.
Let X = # of 6's. Then X ∼ Bin(3, 1/6).
Theorem: If X ∼ Bin(n, p), then the prob of k successes in n trials is
P(X = k) = C(n, k) p^k q^(n−k), k = 0, 1, . . . , n,
where q = 1 − p.
Proof: A particular sequence of k successes followed by n − k failures,
SS···S FF···F, has prob p^k q^(n−k). The number of ways to arrange such a
sequence is C(n, k). Done.
Back to the dice example, where X ∼ Bin(3, 1/6), and we want
P(Get exactly two 6's).
n = 3, k = 2, p = 1/6, q = 5/6.
P(X = 2) = C(3, 2) (1/6)^2 (5/6)^1 = 15/216.
The full pmf is:

  k    P(X = k)
  0    125/216
  1    75/216
  2    15/216
  3    1/216
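As a quick sanity check, here's a small Python sketch that recomputes this pmf with math.comb; the loop just reproduces the table above.

from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 3, 1/6
for k in range(n + 1):
    # e.g., k = 2 gives 15/216 ≈ 0.06944
    print(k, binom_pmf(k, n, p), f"= {binom_pmf(k, n, p) * 216:.0f}/216")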
Example: Roll 2 dice 12 times.
Find P (Result will be 7 or 11 exactly 3 times).
Let X = the number of times get 7 or 11.
P(7 or 11) = P(7) + P(11) = 6/36 + 2/36 = 2/9.
So X ∼ Bin(12, 2/9).
P(X = 3) = C(12, 3) (2/9)^3 (7/9)^9.
Definition: If P(X = k) = e^{−λ} λ^k / k!, k = 0, 1, 2, . . ., λ > 0, we say that
X has the Poisson distribution with parameter λ.
Notation: X ∼ Pois(λ).
Example: Suppose the number of raisins in a cup of cookie dough is
Pois(10). Find the prob that a cup of dough has at least 4 raisins.
P(X ≥ 4) = 1 − P(X = 0, 1, 2, or 3)
         = 1 − e^{−10} (10^0/0! + 10^1/1! + 10^2/2! + 10^3/3!)
         = 0.9897.
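One way to verify that tail probability numerically, using only the Python standard library:

from math import exp, factorial

lam = 10.0
# P(X <= 3) for X ~ Pois(10)
p_le_3 = sum(exp(-lam) * lam**k / factorial(k) for k in range(4))
print(1 - p_le_3)  # ≈ 0.9897, matching the answer above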
Continuous Random Variables
Example: Pick a point X randomly between 0 and 1.
f(x) = 1 if 0 ≤ x ≤ 1; 0 otherwise.
P(a < X < b) = area under f(x) from a to b = b − a.
Definition: Suppose X is a continuous RV. f (x) is the probability
density function (pdf) if
∫_R f(x) dx = 1 (area under f(x) is 1)
f(x) ≥ 0, ∀x (always non-negative)
If A ⊆ R, then P(X ∈ A) = ∫_A f(x) dx (probability that X is in a
certain region A)
Remarks: If X is a continuous RV, then P(a < X < b) = ∫_a^b f(x) dx.
An individual point has prob 0, i.e., P (X = x) = 0.
Think of f (x) dx ≈ P (x < X < x + dx).
Note that f (x) denotes both pmf (discrete case) and pdf (continuous
case) — but they are different:
f (x) = P (X = x) if X is discrete.
Must have 0 ≤ f (x) ≤ 1.
f (x) dx ≈ P (x < X < x + dx) if X is continuous. Must have f (x) ≥ 0
(and possibly > 1).
If X is cts, we calculate the prob of an event by integrating: P(X ∈ A) = ∫_A f(x) dx.
Example: If X is “equally likely” to be anywhere between a and b, then
X has the uniform distribution on (a, b).
f(x) = 1/(b − a) if a < x < b; 0 otherwise.
Notation: X ∼ U(a, b)
Note: ∫_R f(x) dx = ∫_a^b 1/(b − a) dx = 1 (as desired).
Example: X has the exponential distribution with parameter λ > 0 if it has pdf
f(x) = λ e^{−λx} if x ≥ 0; 0 otherwise.
Notation: X ∼ Exp(λ)
Note: ∫_R f(x) dx = ∫_0^∞ λ e^{−λx} dx = 1 (as desired).
Example: Suppose X ∼ Exp(1). Then
P(X ≤ 3) = ∫_0^3 e^{−x} dx = 1 − e^{−3}.
P(X ≥ 5) = ∫_5^∞ e^{−x} dx = e^{−5}.
P(2 ≤ X < 4) = P(2 ≤ X ≤ 4) = ∫_2^4 e^{−x} dx = e^{−2} − e^{−4}.
P(X = 3) = ∫_3^3 e^{−x} dx = 0.
Example: Suppose X is a cts RV with pdf
f(x) = cx^2 if 0 < x < 2; 0 otherwise.
Find c.
Answer: 1 = ∫_R f(x) dx = ∫_0^2 cx^2 dx = (8/3)c, so c = 3/8.
Find P(0 < X < 1).
Answer: P(0 < X < 1) = ∫_0^1 (3/8)x^2 dx = 1/8.
Find P(0 < X < 1 | 1/2 < X < 3/2).
Answer:
P(0 < X < 1 | 1/2 < X < 3/2)
  = P(0 < X < 1 and 1/2 < X < 3/2) / P(1/2 < X < 3/2)
  = P(1/2 < X < 1) / P(1/2 < X < 3/2)
  = [∫_{1/2}^1 (3/8)x^2 dx] / [∫_{1/2}^{3/2} (3/8)x^2 dx] = 7/26.
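Both answers are easy to confirm by numerical integration; a rough Python sketch (assuming scipy is available for the quadrature):

from scipy.integrate import quad

c = 3/8
f = lambda x: c * x**2              # pdf on (0, 2)

total, _ = quad(f, 0, 2)            # should be 1, confirming c = 3/8
p_01, _  = quad(f, 0, 1)            # P(0 < X < 1) = 1/8
p_a, _   = quad(f, 0.5, 1)          # P(1/2 < X < 1)
p_b, _   = quad(f, 0.5, 1.5)        # P(1/2 < X < 3/2)

print(total, p_01, p_a / p_b)       # 1.0, 0.125, 7/26 ≈ 0.2692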
Cumulative Distribution Functions
Definition: For any RV X, the cumulative distribution function (cdf)
is defined for all x by F (x) ≡ P (X ≤ x).
X discrete implies F(x) = Σ_{y ≤ x} f(y) = Σ_{y ≤ x} P(X = y).
X continuous implies F(x) = ∫_{−∞}^x f(t) dt.
Discrete cdf’s
Example: Flip a coin twice. X = number of H’s.
X = 0 or 2 w.p. 1/4; X = 1 w.p. 1/2.

F(x) = P(X ≤ x) =
  0    if x < 0
  1/4  if 0 ≤ x < 1
  3/4  if 1 ≤ x < 2
  1    if x ≥ 2
(You have to be careful about “≤” vs. “<”.)
Continuous cdf’s
Theorem: If X is a continuous RV, then f(x) = F'(x).
Proof: F'(x) = (d/dx) ∫_{−∞}^x f(t) dt = f(x), by the fundamental theorem of calculus.
Remark: If X is continuous, you can get from the pdf f (x) to the cdf
F (x) by integrating.
Example: X ∼ U(0, 1).
f(x) = 1 if 0 < x < 1; 0 otherwise.

F(x) = 0 if x ≤ 0; x if 0 < x < 1; 1 if x ≥ 1.
Example: X ∼ Exp(λ).
f(x) = λ e^{−λx} if x > 0; 0 otherwise.

F(x) = ∫_{−∞}^x f(t) dt = 0 if x ≤ 0; 1 − e^{−λx} if x > 0.
Properties of all cdf’s
F (x) is non-decreasing in x, i.e., a < b implies that F (a) ≤ F (b).
limx→∞ F (x) = 1 and limx→−∞ F (x) = 0.
F (x) is right-cts at every point x.
Theorem: P (X > x) = 1 − F (x).
Proof:
1 = P (X ≤ x) + P (X > x) = F (x) + P (X > x).
Theorem:
a < b ⇒ P (a < X ≤ b) = F (b) − F (a).
Proof:
P(a < X ≤ b) = P({X > a} ∩ {X ≤ b})
             = P(X > a) + P(X ≤ b) − P({X > a} ∪ {X ≤ b})
             = 1 − F(a) + F(b) − 1 = F(b) − F(a).
Great Expectations
Mean (Expected Value)
Law of the Unconscious Statistician
Variance
Chebychev’s Inequality
Definition: The mean or expected value or average of a RV X is
µ ≡ E[X] ≡ Σ_x x f(x) if X is discrete; ∫_R x f(x) dx if X is cts.
The mean gives an indication of a RV’s central tendency. It can be
thought of as a weighted average of the possible x’s, where the
weights are given by f (x).
Example: Suppose X has the Bernoulli distribution with parameter
p, i.e., P (X = 1) = p, P (X = 0) = q = 1 − p. Then
E[X] = Σ_x x P(X = x) = 1 · p + 0 · q = p.
Example: Die. X = 1, 2, . . . , 6, each w.p. 1/6. Then
E[X] = Σ_x x f(x) = 1 · (1/6) + · · · + 6 · (1/6) = 3.5.
Example: Suppose X has the Geometric distribution with parameter p, i.e., X is the number of Bern(p) trials until you obtain your first success (e.g., FFFFS corresponds to X = 5). Then
f(x) = (1 − p)^{x−1} p, x = 1, 2, . . ., and after a lot of algebra,
E[X] = Σ_x x P(X = x) = Σ_{x=1}^∞ x (1 − p)^{x−1} p = 1/p.
Example: X ∼ Exp(λ). f(x) = λ e^{−λx}, x ≥ 0. Then
E[X] = ∫_R x f(x) dx
     = ∫_0^∞ x λ e^{−λx} dx
     = [−x e^{−λx}]_0^∞ − ∫_0^∞ (−e^{−λx}) dx   (by parts)
     = ∫_0^∞ e^{−λx} dx   (L'Hôpital's rule)
     = 1/λ.
Law of the Unconscious Statistician (LOTUS)
Definition/Theorem: The expected value of a function of X, say h(X),
is
E[h(X)] ≡ Σ_x h(x) f(x) if X is discrete; ∫_R h(x) f(x) dx if X is cts.
E[h(X)] is simply a weighted function of h(x), where the weights are
the f (x) values.
Examples: E[X^2] = ∫_R x^2 f(x) dx and E[sin X] = ∫_R (sin x) f(x) dx.
Just a moment please. . .
Definition: The kth moment of X is
E[X^k] = Σ_x x^k f(x) if X is discrete; ∫_R x^k f(x) dx if X is cts.
Definition: The kth central moment of X is
E[(X − µ)^k] = Σ_x (x − µ)^k f(x) if X is discrete; ∫_R (x − µ)^k f(x) dx if X is cts.
Definition: The variance of X is the second central moment, i.e.,
Var(X) ≡ E[(X − µ)^2]. It's a measure of spread or dispersion.
Notation: σ^2 ≡ Var(X).
Definition: The standard deviation of X is σ ≡ +√Var(X).
Theorem: For any h(X) and constants a and b, we have
E[ah(X) + b] = aE[h(X)] + b.
Proof (just do cts case):
E[ah(X) + b] = ∫_R (ah(x) + b) f(x) dx
             = a ∫_R h(x) f(x) dx + b ∫_R f(x) dx
             = aE[h(X)] + b.
Remark: In particular, E[aX + b] = aE[X] + b.
Similarly, E[g(X) + h(X)] = E[g(X)] + E[h(X)].
Theorem (easier way to calculate variance):
Var(X) = E[X^2] − (E[X])^2.
Proof:
Var(X) = E[(X − µ)^2]
       = E[X^2 − 2µX + µ^2]
       = E[X^2] − 2µE[X] + µ^2   (by above remarks)
       = E[X^2] − µ^2.
Example: X ∼ Bern(p).
X = 1 w.p. p; 0 w.p. q.
Recall E[X] = p. In fact, for any k,
E[X^k] = 0^k · q + 1^k · p = p.
So Var(X) = E[X^2] − (E[X])^2 = p − p^2 = pq.
Example: X ∼ U(a, b). f(x) = 1/(b − a), a < x < b.
E[X] = ∫_a^b x/(b − a) dx = (a + b)/2
E[X^2] = ∫_a^b x^2/(b − a) dx = (a^2 + ab + b^2)/3
Var(X) = E[X^2] − (E[X])^2 = (a − b)^2/12   (algebra).
Theorem: Var(aX + b) = a2 Var(X). A “shift” doesn’t matter!
Proof:
Var(aX + b) = E[(aX + b)^2] − (E[aX + b])^2
            = E[a^2 X^2 + 2abX + b^2] − (aE[X] + b)^2
            = a^2 E[X^2] + 2abE[X] + b^2 − (a^2 (E[X])^2 + 2abE[X] + b^2)
            = a^2 (E[X^2] − (E[X])^2)
            = a^2 Var(X).
Example: X ∼ Bern(0.3)
Recall that E[X] = p = 0.3 and
Var(X) = pq = (0.3)(0.7) = 0.21.
Let Y = h(X) = 4X + 5. Then
E[Y ] = E[4X + 5] = 4E[X] + 5 = 6.2
Var(Y ) = Var(4X + 5) = 16Var(X) = 3.36.
Approximations to E[h(X)] and Var(h(X))
Sometimes Y = h(X) is messy, and we may have to approximate
E[h(X)] and Var(h(X)). Use a Taylor series approach. To do so, let
µ = E[X] and σ^2 = Var(X) and note that
Y = h(µ) + (X − µ) h'(µ) + ((X − µ)^2 / 2) h''(µ) + R,
where R is a remainder term which we shall now ignore.
Then
E[Y] ≈ h(µ) + E[X − µ] h'(µ) + (E[(X − µ)^2] / 2) h''(µ) = h(µ) + h''(µ)σ^2/2,
and (now we'll use an even-cruder approximation)
Var(Y) ≈ Var(h(µ) + (X − µ) h'(µ)) = [h'(µ)]^2 σ^2.
Example: Suppose that X has pdf f(x) = 3x^2, 0 ≤ x ≤ 1, and that we want to test out our approximations on the "complicated" random variable Y = h(X) = X^{3/4}. Well, it's not really that complicated, since we can calculate the exact moments:
E[Y] = ∫_0^1 x^{3/4} f(x) dx = ∫_0^1 3x^{11/4} dx = 4/5
E[Y^2] = ∫_0^1 x^{6/4} f(x) dx = ∫_0^1 3x^{7/2} dx = 2/3
Var(Y) = E[Y^2] − (E[Y])^2 = 2/75 = 0.0267.
Before we can do the approximation, note that
µ = E[X] = ∫_0^1 x f(x) dx = ∫_0^1 3x^3 dx = 3/4
σ^2 = Var(X) = E[X^2] − (E[X])^2 = 3/80 = 0.0375.
Further,
h(µ) = µ^{3/4} = (3/4)^{3/4} = 0.8059
h'(µ) = (3/4)µ^{−1/4} = (3/4)(3/4)^{−1/4} = 0.8059
h''(µ) = −(3/16)µ^{−5/4} = −0.2686.
Thus,
E[Y] ≈ h(µ) + h''(µ)σ^2/2 = 0.8059 − (0.2686)(0.0375)/2 = 0.8009
and
Var(Y) ≈ [h'(µ)]^2 σ^2 = (0.8059)^2 (0.0375) = 0.0243,
both of which are reasonably close to their true values.
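A rough numerical sketch of this comparison in Python (assuming scipy is available); it recomputes the exact moments by quadrature and then the two Taylor approximations:

from scipy.integrate import quad

f = lambda x: 3 * x**2           # pdf of X on [0, 1]
h = lambda x: x**0.75            # Y = h(X) = X^(3/4)

# Exact moments by numerical integration
EY  = quad(lambda x: h(x) * f(x), 0, 1)[0]        # 4/5
EY2 = quad(lambda x: h(x)**2 * f(x), 0, 1)[0]     # 2/3
mu  = quad(lambda x: x * f(x), 0, 1)[0]           # 3/4
EX2 = quad(lambda x: x**2 * f(x), 0, 1)[0]        # 3/5
var_x = EX2 - mu**2                               # 3/80

# Taylor (delta-method) approximations
hp  = 0.75 * mu**(-0.25)
hpp = -(3/16) * mu**(-1.25)
EY_approx   = h(mu) + hpp * var_x / 2             # ≈ 0.8009 vs exact 0.8
VarY_approx = hp**2 * var_x                       # ≈ 0.0243 vs exact 0.0267

print(EY, EY2 - EY**2, EY_approx, VarY_approx)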
Chebychev’s Inequality
Theorem: Suppose that E[X] = µ and Var(X) = σ^2. Then for any ε > 0,
P(|X − µ| ≥ ε) ≤ σ^2/ε^2.
Proof: See text.
Remarks: If ε = kσ, then P(|X − µ| ≥ kσ) ≤ 1/k^2.
Equivalently, P(|X − µ| < ε) ≥ 1 − σ^2/ε^2.
Chebychev gives a crude bound on the prob that X deviates from the
mean by more than a constant, in terms of the constant and the
variance. You can always use Chebychev, but it’s crude.
Example: Suppose X ∼ U(0, 1). f (x) = 1, 0 < x < 1.
E[X] = (a + b)/2 = 1/2, Var(X) = (a − b)^2/12 = 1/12.
Then Chebychev implies
P(|X − 1/2| ≥ ε) ≤ 1/(12ε^2).
In particular, for ε = 1/3,
P(|X − 1/2| ≥ 1/3) ≤ 3/4   (upper bound).
Example (cont’d): Let’s compare the above bound to the exact answer.
P(|X − 1/2| ≥ 1/3) = 1 − P(|X − 1/2| < 1/3)
                   = 1 − P(−1/3 < X − 1/2 < 1/3)
                   = 1 − P(1/6 < X < 5/6)
                   = 1 − ∫_{1/6}^{5/6} f(x) dx
                   = 1 − 2/3 = 1/3.
(So the Chebychev bound was pretty high in comparison.)
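A tiny Monte Carlo check of the same comparison (plain Python; the sample size 10^6 is an arbitrary choice):

import random

n = 10**6
count = sum(abs(random.random() - 0.5) >= 1/3 for _ in range(n))
print(count / n)   # ≈ 0.333 (exact 1/3), versus the Chebychev bound 0.75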
Functions of a Random Variable
Problem Statement
Discrete Case
Continuous Case
Inverse Transform Theorem
Problem: You have a RV X and you know its pmf/pdf f (x).
Define Y ≡ h(X) (some fn of X).
Find g(y), the pmf/pdf of Y .
Discrete Case: X discrete implies Y discrete, so
g(y) = P(Y = y) = P(h(X) = y) = P({x | h(x) = y}) = Σ_{x: h(x)=y} f(x).
Example: X is the # of H's in 2 coin tosses. Want pmf for Y = h(X) = X^3 − X.

  x    P(X = x)   y = x^3 − x
  0    1/4        0
  1    1/2        0
  2    1/4        6

g(0) = P(Y = 0) = P(X = 0 or 1) = 3/4 and g(6) = P(Y = 6) = P(X = 2) = 1/4.
g(y) = 3/4 if y = 0; 1/4 if y = 6.
Example: X is discrete with
f(x) = 1/8 if x = −1; 3/8 if x = 0; 1/3 if x = 1; 1/6 if x = 2; 0 otherwise.
Let Y = X^2 (Y can only equal 0, 1, 4).
g(y) = P(Y = 0) = f(0) = 3/8;
       P(Y = 1) = f(−1) + f(1) = 11/24;
       P(Y = 4) = f(2) = 1/6;
       0 otherwise.
Continuous Case: X continuous implies Y can be continuous or
discrete.
Example: Y = X^2 (clearly cts).
Example: Y defined by Y = 0 if X < 0 and Y = 1 if X ≥ 0 is not continuous.
Method: Compute G(y), the cdf of Y:
G(y) = P(Y ≤ y) = P(h(X) ≤ y) = ∫_{x: h(x) ≤ y} f(x) dx.
If G(y) is cts, construct the pdf g(y) by differentiating.
Example: f (x) = |x|, −1 ≤ x ≤ 1.
Find the pdf of the RV Y = X 2 .
G(y) = P(Y ≤ y) = P(X^2 ≤ y) =
  0    if y ≤ 0
  1    if y ≥ 1
  (?)  if 0 < y < 1,
where (?) = P(−√y ≤ X ≤ √y) = ∫_{−√y}^{√y} |x| dx = y.
Thus,
G(y) = 0 if y ≤ 0; y if 0 < y < 1; 1 if y ≥ 1.
This implies
g(y) = G'(y) = 1 if 0 < y < 1; 0 if y ≤ 0 or y ≥ 1.
This is the U(0,1) distribution!
Example: Suppose U ∼ U(0, 1). Find the pdf of Y = −ln(1 − U).
G(y) = P(Y ≤ y)
     = P(−ln(1 − U) ≤ y)
     = P(1 − U ≥ e^{−y})
     = P(U ≤ 1 − e^{−y})
     = ∫_0^{1−e^{−y}} f(u) du
     = 1 − e^{−y}   (since f(u) = 1).
Taking the derivative, we have g(y) = e^{−y}, y > 0.
Wow! This implies Y ∼ Exp(λ = 1).
We can generalize this result. . .
Inverse Transform Theorem: Suppose X is a cts RV having cdf F (x).
Then the random variable F (X) ∼ U(0, 1).
Proof: Let Y = F(X). Then the cdf of Y is
G(y) = P(Y ≤ y)
     = P(F(X) ≤ y)
     = P(X ≤ F^{−1}(y))   (the cdf is mono. increasing)
     = F(F^{−1}(y))        (F(x) is the cdf of X)
     = y. Uniform!
Remark: This is a great theorem, since it applies to all continuous RV’s
X.
Corollary: X = F^{−1}(U), so you can plug a U(0,1) RV into the inverse cdf to generate a realization of a RV having X's distribution.
Remark: This is what we did in the example on the previous page.
This trick has tremendous applications in simulation.
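For instance, a minimal Python sketch of inverse-transform sampling for the Exp(λ) case worked above, where F^{−1}(u) = −ln(1 − u)/λ (the choice λ = 2 is arbitrary):

import math
import random

def exp_via_inverse_transform(lam):
    """Generate one Exp(lam) realization from a U(0,1) via X = F^{-1}(U)."""
    u = random.random()
    return -math.log(1 - u) / lam

lam = 2.0
sample = [exp_via_inverse_transform(lam) for _ in range(100_000)]
print(sum(sample) / len(sample))   # ≈ 1/lam = 0.5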
Bivariate Random Variables
Discrete Case
Continuous Case
Bivariate cdf’s
Marginal Distributions
Now let’s look at what happens when you consider 2 RV’s
simultaneously.
Example: Choose a person at random. Look at height and weight
(X, Y ). Obviously, X and Y will be related somehow.
Discrete Case
Definition: If X and Y are discrete RV’s, then (X, Y ) is called a jointly
discrete bivariate RV.
The joint (or bivariate) pmf is
f (x, y) = P (X = x, Y = y).
Properties:
(1) 0 ≤ f(x, y) ≤ 1.
(2) Σ_x Σ_y f(x, y) = 1.
(3) A ⊆ R^2 ⇒ P((X, Y) ∈ A) = Σ_{(x,y) ∈ A} f(x, y).
Example: 3 sox in a box (numbered 1,2,3). Draw 2 sox at random w/o
replacement. X = # of first sock, Y = # of second sock. The joint
pmf f (x, y) is
            X = 1   X = 2   X = 3   P(Y = y)
  Y = 1     0       1/6     1/6     1/3
  Y = 2     1/6     0       1/6     1/3
  Y = 3     1/6     1/6     0       1/3
  P(X = x)  1/3     1/3     1/3     1
fX (x) ≡ P (X = x) is the “marginal” distribution of X.
fY (y) ≡ P (Y = y) is the “marginal” distribution of Y .
By the law of total probability,
P(X = 1) = Σ_{y=1}^3 P(X = 1, Y = y) = 1/3.
In addition,
P(X ≥ 2, Y ≥ 2) = Σ_{x≥2} Σ_{y≥2} f(x, y)
                = f(2, 2) + f(2, 3) + f(3, 2) + f(3, 3)
                = 0 + 1/6 + 1/6 + 0 = 1/3.
Continuous Case
Definition: If X and Y are cts RV’s, then (X, Y ) is a jointly cts RV if
there exists a function f (x, y) such that
(1) f(x, y) ≥ 0, ∀x, y.
(2) ∫∫_{R^2} f(x, y) dx dy = 1.
(3) P(A) = P((X, Y) ∈ A) = ∫∫_A f(x, y) dx dy.
In this case, f (x, y) is called the joint pdf.
If A ⊆ R2 , then P (A) is the volume between f (x, y) and A.
Think of
f (x, y) dx dy ≈ P (x < X < x + dx, y < Y < y + dy).
Easy to see how all of this generalizes the 1-dimensional pdf, f (x).
Easy Example: Choose a point at random in the interior of the circle inscribed in the unit square, i.e., C ≡ {(x, y) : (x − 1/2)^2 + (y − 1/2)^2 ≤ 1/4}. Find the pdf of (X, Y).
Since the area of the circle is π/4,
f(x, y) = 4/π if (x, y) ∈ C; 0 otherwise.
Application: Toss n darts randomly into the unit square. The probability that any individual dart will land in the circle is π/4. It stands to reason that the proportion of darts, p̂_n, that land in the circle will be approximately π/4. So you can use 4p̂_n to estimate π!
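A bare-bones Python sketch of that dart-throwing estimator (n = 10^6 darts is an arbitrary choice):

import random

n = 10**6
in_circle = 0
for _ in range(n):
    x, y = random.random(), random.random()      # a dart in the unit square
    if (x - 0.5)**2 + (y - 0.5)**2 <= 0.25:      # did it land inside C?
        in_circle += 1

print(4 * in_circle / n)   # ≈ 3.14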
Example: Suppose that
f(x, y) = 4xy if 0 ≤ x ≤ 1, 0 ≤ y ≤ 1; 0 otherwise.
Find the prob (volume) of the region 0 ≤ y ≤ 1 − x^2.
V = ∫_0^1 ∫_0^{1−x^2} 4xy dy dx = ∫_0^1 ∫_0^{√(1−y)} 4xy dx dy = 1/3.
Moral: Be careful with limits!
Bivariate cdf’s
Definition: The joint (bivariate) cdf of X and Y is
F (x, y) ≡ P (X ≤ x, Y ≤ y), for all x, y.
F(x, y) = Σ_{s≤x, t≤y} f(s, t) in the discrete case; ∫_{−∞}^y ∫_{−∞}^x f(s, t) ds dt in the continuous case.
Properties:
F (x, y) is non-decreasing in both x and y.
limx→−∞ F (x, y) = limy→−∞ F (x, y) = 0.
limx→∞ F (x, y) = FY (y) = P (Y ≤ y)
limy→∞ F (x, y) = FX (x) = P (X ≤ x).
limx→∞ limy→∞ F (x, y) = 1.
F (x, y) is cts from the right in both x and y.
Going from cdf's to pdf's (continuous case):
1-D: f(x) = F'(x) = (d/dx) ∫_{−∞}^x f(t) dt.
2-D: f(x, y) = (∂^2/∂x∂y) F(x, y) = (∂^2/∂x∂y) ∫_{−∞}^x ∫_{−∞}^y f(s, t) dt ds.
Example: Suppose
F(x, y) = 1 − e^{−x} − e^{−y} + e^{−(x+y)} if x ≥ 0, y ≥ 0; 0 if x < 0 or y < 0.
The "marginal" cdf of X is
FX(x) = lim_{y→∞} F(x, y) = 1 − e^{−x} if x ≥ 0; 0 if x < 0.
The joint pdf is
f(x, y) = (∂^2/∂x∂y) F(x, y) = (∂/∂y)(e^{−x} − e^{−y} e^{−x}) = e^{−(x+y)} if x ≥ 0, y ≥ 0; 0 if x < 0 or y < 0.
Marginal Distributions — Distrns of X and Y.
Example (discrete case): f(x, y) = P(X = x, Y = y).

            X = 1   X = 2   X = 3   P(Y = y)
  Y = 4     0.01    0.07    0.12    0.2
  Y = 6     0.29    0.03    0.48    0.8
  P(X = x)  0.3     0.1     0.6     1

By total probability,
P(X = 1) = P(X = 1, Y = any #) = 0.3.
Definition: If X and Y are jointly discrete, then the marginal pmf’s of
X and Y are, respectively,
fX(x) = P(X = x) = Σ_y f(x, y)
and
fY(y) = P(Y = y) = Σ_x f(x, y).
Example (discrete case): f (x, y) = P (X = x, Y = y).
            X = 1   X = 2   X = 3   P(Y = y)
  Y = 4     0.06    0.02    0.12    0.2
  Y = 6     0.24    0.08    0.48    0.8
  P(X = x)  0.3     0.1     0.6     1

Hmmm. . . . Compared to the last example, this has the same marginals but a different joint distribution!
Definition: If X and Y are jointly cts, then the marginal pdf’s of X and
Y are, respectively,
fX(x) = ∫_R f(x, y) dy   and   fY(y) = ∫_R f(x, y) dx.
Example:
f(x, y) = e^{−(x+y)} if x ≥ 0, y ≥ 0; 0 otherwise.
Then the marginal pdf of X is
fX(x) = ∫_R f(x, y) dy = ∫_0^∞ e^{−(x+y)} dy = e^{−x} if x ≥ 0; 0 otherwise.
Example:
f(x, y) = (21/4) x^2 y if x^2 ≤ y ≤ 1; 0 otherwise.
Note the funny limits where the pdf is positive, i.e., x^2 ≤ y ≤ 1.
fX(x) = ∫_R f(x, y) dy = ∫_{x^2}^1 (21/4) x^2 y dy = (21/8) x^2 (1 − x^4), if −1 ≤ x ≤ 1
fY(y) = ∫_R f(x, y) dx = ∫_{−√y}^{√y} (21/4) x^2 y dx = (7/2) y^{5/2}, if 0 ≤ y ≤ 1
Conditional Distributions
Recall conditional probability: P (A|B) = P (A ∩ B)/P (B) if P (B) > 0.
Suppose that X and Y are jointly discrete RV’s. Then if P (Y = y) > 0,
P(X = x | Y = y) = P(X = x ∩ Y = y) / P(Y = y) = f(x, y) / fY(y).
P(X = x | Y = 2) defines the probabilities on X given that Y = 2.
Definition: If fY(y) > 0, then fX|Y(x|y) ≡ f(x, y)/fY(y) is the conditional pmf/pdf of X given Y = y.
Remark: Usually just write f(x|y) instead of fX|Y(x|y).
Remark: Of course, fY|X(y|x) = f(y|x) = f(x, y)/fX(x).
Discrete Example: f(x, y) = P(X = x, Y = y).

          X = 1   X = 2   X = 3   fY(y)
  Y = 4   0.01    0.07    0.12    0.2
  Y = 6   0.29    0.03    0.48    0.8
  fX(x)   0.3     0.1     0.6     1

Then, for example,
f(x | y = 6) = f(x, 6)/fY(6) = f(x, 6)/0.8 =
  29/80  if x = 1
  3/80   if x = 2
  48/80  if x = 3
  0      otherwise
Old Cts Example:
f(x, y) = (21/4) x^2 y, if x^2 ≤ y ≤ 1
fX(x) = (21/8) x^2 (1 − x^4), if −1 ≤ x ≤ 1
fY(y) = (7/2) y^{5/2}, if 0 ≤ y ≤ 1
Then, for example,
f(y | 1/2) = f(1/2, y)/fX(1/2) = [(21/4)(1/4) y] / [(21/8)(1/4)(1 − 1/16)] = (32/15) y, if 1/4 ≤ y ≤ 1.
More generally,
f(y|x) = f(x, y)/fX(x) = [(21/4) x^2 y] / [(21/8) x^2 (1 − x^4)] = 2y/(1 − x^4), if x^2 ≤ y ≤ 1.
Note: 2/(1 − x^4) is a constant with respect to y, and we can check to see that f(y|x) is a legit condl pdf:
∫_{x^2}^1 2y/(1 − x^4) dy = 1.
Typical Problem: Given fX (x) and f (y|x), find fY (y).
Steps: (1) f(x, y) = fX(x) f(y|x); (2) fY(y) = ∫_R f(x, y) dx.
Example: fX (x) = 2x, 0 < x < 1.
Given X = x, suppose that Y |x ∼ U(0, x). Now find fY (y).
Solution: Y|x ∼ U(0, x) ⇒ f(y|x) = 1/x, 0 < y < x. So
f(x, y) = fX(x) f(y|x) = 2x · (1/x), if 0 < x < 1 and 0 < y < x
        = 2, if 0 < y < x < 1.
Thus,
fY(y) = ∫_R f(x, y) dx = ∫_y^1 2 dx = 2(1 − y), 0 < y < 1.
Independent Random Variables
Independence
Recall that two events are independent if P (A ∩ B) = P (A)P (B).
Then
P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A).
And similarly, P (B|A) = P (B).
Now want to define independence for RV’s, i.e., the outcome of X
doesn’t influence the outcome of Y .
Definition: X and Y are independent RV’s if, for all x and y,
f (x, y) = fX (x)fY (y).
Equivalent definitions:
F (x, y) = FX (x)FY (y), ∀x, y
or
P (X ≤ x, Y ≤ y) = P (X ≤ x)P (Y ≤ y), ∀x, y
If X and Y aren’t indep, then they’re dependent.
Theorem: X and Y are indep if and only if f (y|x) = fY (y) ∀x, y.
Proof:
f(y|x) = f(x, y)/fX(x) = fX(x) fY(y)/fX(x) = fY(y).
Similarly, X and Y indep implies f (x|y) = fX (x).
Example (discrete): f (x, y) = P (X = x, Y = y).
          X = 1   X = 2   fY(y)
  Y = 2   0.12    0.28    0.4
  Y = 3   0.18    0.42    0.6
  fX(x)   0.3     0.7     1
X and Y are indep since f (x, y) = fX (x)fY (y), ∀x, y.
Example (cts): Suppose f(x, y) = 6xy^2, 0 ≤ x ≤ 1, 0 ≤ y ≤ 1.
After some work (which can be avoided by the next theorem), we can derive
fX(x) = 2x, if 0 ≤ x ≤ 1, and
fY(y) = 3y^2, if 0 ≤ y ≤ 1.
X and Y are indep since f (x, y) = fX (x)fY (y), ∀x, y.
Easy way to tell if X and Y are indep. . .
Theorem: X and Y are indep iff f (x, y) = a(x)b(y), ∀x, y, for some
functions a(x) and b(y) (not necessarily pdf’s).
So if f (x, y) factors into separate functions of x and y, then X and Y
are indep.
Example: f(x, y) = 6xy^2, 0 ≤ x ≤ 1, 0 ≤ y ≤ 1. Take
a(x) = 6x, 0 ≤ x ≤ 1, and b(y) = y^2, 0 ≤ y ≤ 1.
Thus, X and Y are indep (as above).
Example: f(x, y) = (21/4) x^2 y, x^2 ≤ y ≤ 1.
"Funny" (non-rectangular) limits make factoring into marginals impossible. Thus, X and Y are not indep.
Example: f(x, y) = c/(x + y), 1 ≤ x ≤ 2, 1 ≤ y ≤ 3.
Can't factor f(x, y) into fn's of x and y separately. Thus, X and Y are not indep.
Now that we can figure out if X and Y are indep, what can we do with
that knowledge?
Consequences of Independence
Definition/Theorem (another Unconscious Statistician): Let h(X, Y ) be
a fn of the RV’s X and Y . Then
E[h(X, Y)] = Σ_x Σ_y h(x, y) f(x, y) in the discrete case; ∫_R ∫_R h(x, y) f(x, y) dx dy in the continuous case.
Theorem: Whether or not X and Y are indep,
E[X + Y ] = E[X] + E[Y ].
Proof (cts case):
E[X + Y] = ∫_R ∫_R (x + y) f(x, y) dx dy
         = ∫_R ∫_R x f(x, y) dx dy + ∫_R ∫_R y f(x, y) dx dy
         = ∫_R x [∫_R f(x, y) dy] dx + ∫_R y [∫_R f(x, y) dx] dy
         = ∫_R x fX(x) dx + ∫_R y fY(y) dy
         = E[X] + E[Y].
Can generalize this result to more than two RV’s.
Corollary: If X1 , X2 , . . . , Xn are RV’s, then
E[Σ_{i=1}^n Xi] = Σ_{i=1}^n E[Xi].
Proof: Induction.
Theorem: If X and Y are indep, then E[XY ] = E[X]E[Y ].
Proof (cts case):
E[XY] = ∫_R ∫_R xy f(x, y) dx dy
      = ∫_R ∫_R xy fX(x) fY(y) dx dy   (X and Y are indep)
      = [∫_R x fX(x) dx] [∫_R y fY(y) dy]
      = E[X]E[Y].
Remark: The above theorem is not necessarily true if X and Y are
dependent. See the upcoming discussion on covariance.
Theorem: If X and Y are indep, then
Var(X + Y ) = Var(X) + Var(Y ).
Remark: The assumption of independence really is important here. If
X and Y aren’t independent, then the result might not hold.
Corollary: If X1, X2, . . . , Xn are indep RV's, then
Var(Σ_{i=1}^n Xi) = Σ_{i=1}^n Var(Xi).
Proof: Induction.
Proof:
Var(X + Y) = E[(X + Y)^2] − (E[X + Y])^2
           = E[X^2 + 2XY + Y^2] − (E[X] + E[Y])^2
           = E[X^2] + 2E[XY] + E[Y^2] − (E[X])^2 − 2E[X]E[Y] − (E[Y])^2
           = E[X^2] + 2E[X]E[Y] + E[Y^2] − (E[X])^2 − 2E[X]E[Y] − (E[Y])^2
           = E[X^2] − (E[X])^2 + E[Y^2] − (E[Y])^2
           = Var(X) + Var(Y).
Random Samples
Definition: X1, X2, . . . , Xn form a random sample if
the Xi's are all independent, and
each Xi has the same pmf/pdf f(x).
Notation: X1, . . . , Xn ∼ f(x), i.i.d. ("indep and identically distributed")
Example/Theorem: Suppose X1, . . . , Xn are i.i.d. ∼ f(x) with E[Xi] = µ and Var(Xi) = σ^2. Define the sample mean as
X̄ ≡ (1/n) Σ_{i=1}^n Xi.
Then
E[X̄] = E[(1/n) Σ_{i=1}^n Xi] = (1/n) Σ_{i=1}^n E[Xi] = (1/n) Σ_{i=1}^n µ = µ.
So the mean of X̄ is the same as the mean of Xi.
Meanwhile, . . .
Var(X̄) = Var((1/n) Σ_{i=1}^n Xi)
        = (1/n^2) Var(Σ_{i=1}^n Xi)
        = (1/n^2) Σ_{i=1}^n Var(Xi)   (Xi's indep)
        = (1/n^2) Σ_{i=1}^n σ^2 = σ^2/n.
So the mean of X̄ is the same as the mean of Xi, but the variance decreases! This makes X̄ a great estimator for µ (which is usually unknown in practice), and the result is referred to as the Law of Large Numbers. Stay tuned.
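To see the σ^2/n effect numerically, here's a rough Python sketch that draws many sample means of n iid Exp(1) observations (the choices n = 25 and 10,000 replications are arbitrary):

import random
import statistics

n, reps = 25, 10_000
xbars = [statistics.mean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

print(statistics.mean(xbars))      # ≈ µ = 1
print(statistics.variance(xbars))  # ≈ σ²/n = 1/25 = 0.04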
Conditional Expectation
Usual definition of expectation. E.g., what’s the avg weight of a male?
E[Y] = Σ_y y f(y) in the discrete case; ∫_R y f(y) dy in the continuous case.
Now suppose we're interested in the avg weight of a 6'6" tall male.
f(y|x) is the conditional pdf/pmf of Y given X = x.
Definition: The conditional expectation of Y given X = x is
E[Y | X = x] ≡ Σ_y y f(y|x) in the discrete case; ∫_R y f(y|x) dy in the continuous case.
Note that E[Y |X = x] is a function of x.
Discrete Example:

  f(x, y)   X = 0   X = 3   fY(y)
  Y = 2     0.11    0.34    0.45
  Y = 5     0.00    0.05    0.05
  Y = 10    0.29    0.21    0.50
  fX(x)     0.40    0.60    1

The unconditional expectation is E[Y] = Σ_y y fY(y) = 6.15. But conditional on X = 3, we have
f(y | x = 3) = f(3, y)/fX(3) = f(3, y)/0.60 =
  34/60  if y = 2
  5/60   if y = 5
  21/60  if y = 10
So the expectation conditional on X = 3 is
E[Y | X = 3] = Σ_y y f(y|3) = 2(34/60) + 5(5/60) + 10(21/60) = 5.05.
Old Cts Example:
f(x, y) = (21/4) x^2 y, if x^2 ≤ y ≤ 1.
Recall that
f(y|x) = 2y/(1 − x^4) if x^2 ≤ y ≤ 1.
Thus,
E[Y|x] = ∫_R y f(y|x) dy = [2/(1 − x^4)] ∫_{x^2}^1 y^2 dy = (2/3) (1 − x^6)/(1 − x^4).
Thus, e.g., E[Y | X = 0.5] = (2/3)(63/64)/(15/16) = 0.70.
Theorem (double expectations): E[E(Y |X)] = E[Y ].
Remarks: Yikes, what the heck is this!?
The expected value (averaged over all X’s) of the conditional expected
value (of Y |X) is the plain old expected value (of Y ).
Think of the outside exp value as the exp value of h(X) = E(Y |X).
Then the Law of the Unconscious Statistician miraculously gives us
E[Y ].
Believe it or not, sometimes it’s easier to calculate E[Y ] indirectly by
using our double expectation trick.
Proof (cts case): By the Unconscious Statistician,
E[E(Y|X)] = ∫_R E(Y|x) fX(x) dx
          = ∫_R [∫_R y f(y|x) dy] fX(x) dx
          = ∫_R ∫_R y f(y|x) fX(x) dx dy
          = ∫_R y [∫_R f(x, y) dx] dy
          = ∫_R y fY(y) dy = E[Y].
Old Example: Suppose f(x, y) = (21/4) x^2 y, if x^2 ≤ y ≤ 1. Find E[Y] two ways.
By previous examples, we know that
fX(x) = (21/8) x^2 (1 − x^4), if −1 ≤ x ≤ 1,
fY(y) = (7/2) y^{5/2}, if 0 ≤ y ≤ 1, and
E[Y|x] = (2/3) (1 − x^6)/(1 − x^4).
Solution #1 (old, boring way):
E[Y] = ∫_R y fY(y) dy = ∫_0^1 (7/2) y^{7/2} dy = 7/9.
Solution #2 (new, exciting way):
E[Y] = E[E(Y|X)]
     = ∫_R E(Y|x) fX(x) dx
     = ∫_{−1}^1 [(2/3) (1 − x^6)/(1 − x^4)] (21/8) x^2 (1 − x^4) dx = 7/9.
Notice that both answers are the same (good)!
Example: An alternative way to calculate the mean of the Geom(p).
Let Y ∼ Geom(p), e.g., Y could be the number of coin flips until H appears, where P(H) = p. Further, let
X = 1 if the first flip is H; 0 otherwise.
Based on the result X of the first step, we have
E[Y] = E[E(Y|X)]
     = Σ_x E(Y|x) fX(x)
     = E(Y|X = 0)P(X = 0) + E(Y|X = 1)P(X = 1)
     = (1 + E[Y])(1 − p) + 1(p).   (why?)
Solving, we get E[Y] = 1/p (which is indeed the correct answer!)
Theorem (expectation of the sum of a random number of RV’s):
Suppose that X1 , X2 , . . . are independent RV’s, all with the same
mean. Also suppose that N is a nonnegative, integer-valued RV, that’s
independent of the Xi ’s. Then
E[Σ_{i=1}^N Xi] = E[N] E[X1].
Remark: You have to be very careful here. In particular, note that E(Σ_{i=1}^N Xi) ≠ N E[X1], since the LHS is a number and the RHS is random.
Proof: By double expectation and the fact that N is indep of the Xi's,
E[Σ_{i=1}^N Xi] = E[E(Σ_{i=1}^N Xi | N)]
              = Σ_{n=1}^∞ E(Σ_{i=1}^N Xi | N = n) P(N = n)
              = Σ_{n=1}^∞ E(Σ_{i=1}^n Xi | N = n) P(N = n)
              = Σ_{n=1}^∞ E(Σ_{i=1}^n Xi) P(N = n)
              = Σ_{n=1}^∞ n E[X1] P(N = n)
              = E[X1] Σ_{n=1}^∞ n P(N = n). □
Example: Suppose the number of times we roll a die is N ∼ Pois(10). If Xi denotes the value of the ith toss, then the expected total of all of the rolls is
E[Σ_{i=1}^N Xi] = E[N] E[X1] = 10(3.5) = 35. □
Theorem: Under the same conditions as before,
Var(Σ_{i=1}^N Xi) = E[N] Var(X1) + (E[X1])^2 Var(N).
Proof: See, for instance, Ross. □
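A quick simulation sketch of the die example (assuming numpy is available for the Poisson generator):

import numpy as np

rng = np.random.default_rng()
reps = 100_000
totals = []
for _ in range(reps):
    n = rng.poisson(10)                               # N ~ Pois(10) rolls
    totals.append(rng.integers(1, 7, size=n).sum())   # sum of n die rolls

print(np.mean(totals))   # ≈ E[N]E[X1] = 10(3.5) = 35
print(np.var(totals))    # ≈ E[N]Var(X1) + (E[X1])^2 Var(N) = 10(35/12) + 12.25(10) ≈ 151.7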
Computing Probabilities by Conditioning
Let A be some event, and define the RV Y as: Y = 1 if A occurs; 0 otherwise.
Then
E[Y] = Σ_y y fY(y) = P(Y = 1) = P(A).
Similarly, for any RV X, we have
E[Y | X = x] = Σ_y y fY(y|x) = P(Y = 1 | X = x) = P(A | X = x).
Further, since E[Y] = E[E(Y|X)], we have
P(A) = E[Y] = E[E(Y|X)] = ∫_R E[Y|x] dFX(x) = ∫_R P(A | X = x) dFX(x).
Example/Theorem: If X and Y are independent continuous RV's, then
P(Y < X) = ∫_R FY(x) fX(x) dx,
where FY(·) is the c.d.f. of Y and fX(·) is the p.d.f. of X.
Proof: (Actually, there are many proofs.) Let the event A = {Y < X}. Then
P(Y < X) = ∫_R P(Y < X | X = x) fX(x) dx
         = ∫_R P(Y < x | X = x) fX(x) dx
         = ∫_R P(Y < x) fX(x) dx   (since X, Y are indep). □
Example: If X ∼ Exp(µ) and Y ∼ Exp(λ) are independent RV's, then
P(Y < X) = ∫_R FY(x) fX(x) dx
         = ∫_0^∞ (1 − e^{−λx}) µ e^{−µx} dx
         = λ/(λ + µ). □
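A two-line Monte Carlo sanity check in plain Python, with the arbitrary choices µ = 2 and λ = 3, so the target is λ/(λ + µ) = 0.6:

import random

mu, lam, n = 2.0, 3.0, 10**6
hits = sum(random.expovariate(lam) < random.expovariate(mu) for _ in range(n))
print(hits / n)   # ≈ λ/(λ+µ) = 3/5 = 0.6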
Example/Theorem: If X and Y are independent continuous RV's, then
P(X + Y < a) = ∫_R FY(a − x) fX(x) dx,
where FY(·) is the c.d.f. of Y and fX(·) is the p.d.f. of X. The quantity X + Y is called a convolution.
Proof:
P(X + Y < a) = ∫_R P(X + Y < a | X = x) fX(x) dx
             = ∫_R P(Y < a − x | X = x) fX(x) dx
             = ∫_R P(Y < a − x) fX(x) dx   (since X, Y are indep). □
Example: Suppose X, Y are iid Exp(λ). Note that
FY(a − x) = 1 − e^{−λ(a−x)} if a − x ≥ 0 and x ≥ 0 (i.e., 0 ≤ x ≤ a); 0 otherwise.
So for a ≥ 0,
P(X + Y < a) = ∫_R FY(a − x) fX(x) dx
             = ∫_0^a (1 − e^{−λ(a−x)}) λ e^{−λx} dx
             = 1 − e^{−λa} − λa e^{−λa},
and
(d/da) P(X + Y < a) = λ^2 a e^{−λa}, a ≥ 0.
This implies that X + Y ∼ Gamma(2, λ). □
And Now, A Word From Our Sponsor. . .
Congratulations! You are now done with the most difficult module of
the course!
Things will get easier from here (I hope)!
Covariance and Correlation
These are measures used to define the degree of association between
X and Y if they don’t happen to be indep.
Definition: The covariance between X and Y is
Cov(X, Y) ≡ σ_XY ≡ E[(X − E[X])(Y − E[Y])].
Remark: Cov(X, X) = E[(X − E[X])^2] = Var(X).
If X and Y have positive covariance, then X and Y move “in the same
direction.” Think height and weight.
If X and Y have negative covariance, then X and Y move “in opposite
directions.” Think snowfall and temperature.
Theorem (easier way to calculate Cov):
Cov(X, Y ) = E[XY ] − E[X]E[Y ].
Proof:
Cov(X, Y) = E[(X − E[X])(Y − E[Y])]
          = E[XY − XE[Y] − Y E[X] + E[X]E[Y]]
          = E[XY] − E[X]E[Y] − E[Y]E[X] + E[X]E[Y]
          = E[XY] − E[X]E[Y].
Theorem: X and Y indep implies Cov(X, Y ) = 0.
Proof:
Cov(X, Y ) = E[XY ] − E[X]E[Y ]
= E[X]E[Y ] − E[X]E[Y ] (X, Y indep)
= 0.
Danger Will Robinson: Cov(X, Y ) = 0 does not imply X and Y are
indep!!
Example: Suppose X ∼ U(−1, 1) and Y = X^2 (so X and Y are clearly dependent).
But
E[X] = ∫_{−1}^1 x (1/2) dx = 0 and
E[XY] = E[X^3] = ∫_{−1}^1 x^3 (1/2) dx = 0,
so
Cov(X, Y) = E[XY] − E[X]E[Y] = 0.
Definition: The correlation between X and Y is
ρ = Corr(X, Y) ≡ Cov(X, Y)/√(Var(X)Var(Y)) = σ_XY/(σ_X σ_Y).
Remark: Cov has “square” units; corr is unitless.
Corollary: X, Y indep implies ρ = 0.
Theorem: It can be shown that −1 ≤ ρ ≤ 1.
ρ ≈ 1 is “high” corr
ρ ≈ 0 is “low” corr
ρ ≈ −1 is “high” negative corr.
Example: Height is highly correlated with weight.
Temperature on Mars has low corr with IBM stock price.
Anti-UGA Example: Suppose X is the avg GPA of a UGA student, and
Y is his IQ. Here’s the joint pmf.
  f(x, y)   X = 2   X = 3   X = 4   fY(y)
  Y = 40    0.0     0.2     0.1     0.3
  Y = 50    0.15    0.1     0.05    0.3
  Y = 60    0.3     0.0     0.1     0.4
  fX(x)     0.45    0.3     0.25    1
E[X] = Σ_x x fX(x) = 2.8
E[X^2] = Σ_x x^2 fX(x) = 8.5
Var(X) = E[X^2] − (E[X])^2 = 0.66
Similarly, E[Y] = 51, E[Y^2] = 2670, and Var(Y) = 69.
E[XY] = Σ_x Σ_y xy f(x, y) = 2(40)(.0) + · · · + 4(60)(.1) = 140
Cov(X, Y) = E[XY] − E[X]E[Y] = −2.8
ρ = Cov(X, Y)/√(Var(X)Var(Y)) = −0.415.
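These bookkeeping-heavy sums are easy to check with a few lines of Python, working directly from the joint pmf table above:

import math

# joint pmf f(x, y) from the table
f = {(2, 40): 0.0,  (3, 40): 0.2, (4, 40): 0.1,
     (2, 50): 0.15, (3, 50): 0.1, (4, 50): 0.05,
     (2, 60): 0.3,  (3, 60): 0.0, (4, 60): 0.1}

EX  = sum(x * p for (x, y), p in f.items())
EY  = sum(y * p for (x, y), p in f.items())
EX2 = sum(x * x * p for (x, y), p in f.items())
EY2 = sum(y * y * p for (x, y), p in f.items())
EXY = sum(x * y * p for (x, y), p in f.items())

cov = EXY - EX * EY
rho = cov / math.sqrt((EX2 - EX**2) * (EY2 - EY**2))
print(cov, rho)   # -2.8 and ≈ -0.415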
Goldsman
5/21/14
127 / 147
Covariance and Correlation
Cts Example: Suppose f(x, y) = 10x^2 y, 0 ≤ y ≤ x ≤ 1.
fX(x) = ∫_0^x 10x^2 y dy = 5x^4, 0 ≤ x ≤ 1
E[X] = ∫_0^1 5x^5 dx = 5/6
E[X^2] = ∫_0^1 5x^6 dx = 5/7
Var(X) = E[X^2] − (E[X])^2 = 0.01984
Similarly,
fY(y) = ∫_y^1 10x^2 y dx = (10/3) y (1 − y^3), 0 ≤ y ≤ 1
E[Y] = 5/9, Var(Y) = 0.04850
E[XY] = ∫_0^1 ∫_0^x 10x^3 y^2 dy dx = 10/21
Cov(X, Y) = E[XY] − E[X]E[Y] = 0.01323
ρ = Cov(X, Y)/√(Var(X)Var(Y)) = 0.4265
Theorems Involving Covariance
Theorem: Var(X + Y ) = Var(X) + Var(Y ) + 2Cov(X, Y ), whether or
not X and Y are indep.
Remark: If X, Y are indep, the Cov term goes away.
Proof: By the work we did on a previous proof,
Var(X + Y) = E[X^2] − (E[X])^2 + E[Y^2] − (E[Y])^2 + 2(E[XY] − E[X]E[Y])
           = Var(X) + Var(Y) + 2Cov(X, Y).
Theorem:
Var(Σ_{i=1}^n Xi) = Σ_{i=1}^n Var(Xi) + 2 Σ_{i<j} Cov(Xi, Xj).
Proof: Induction.
Remark: If all Xi's are indep, then Var(Σ_{i=1}^n Xi) = Σ_{i=1}^n Var(Xi).
Theorem: Cov(aX, bY ) = abCov(X, Y ).
Proof:
Cov(aX, bY ) = E[aX · bY ] − E[aX]E[bY ]
= abE[XY ] − abE[X]E[Y ]
= abCov(X, Y ).
Theorem:
Var(Σ_{i=1}^n ai Xi) = Σ_{i=1}^n ai^2 Var(Xi) + 2 Σ_{i<j} ai aj Cov(Xi, Xj).
Proof: Put above two results together.
Example: Var(X − Y ) = Var(X) + Var(Y ) − 2Cov(X, Y ).
Example:
Var(X − 2Y + 3Z) = Var(X) + 4Var(Y) + 9Var(Z) − 4Cov(X, Y) + 6Cov(X, Z) − 12Cov(Y, Z).
Moment Generating Functions
Introduction
Recall that E[X^k] is the kth moment of X.
Definition: MX(t) ≡ E[e^{tX}] is the moment generating function (mgf) of the RV X.
Remark: MX(t) is a function of t, not of X!
Example: X ∼ Bern(p). X = 1 w.p. p; 0 w.p. q.
Then
MX(t) = E[e^{tX}] = Σ_x e^{tx} f(x) = e^{t·1} p + e^{t·0} q = pe^t + q.
Example: X ∼ Exp(λ).
f(x) = λ e^{−λx} if x > 0; 0 otherwise.
Then
MX(t) = E[e^{tX}] = ∫_R e^{tx} f(x) dx = λ ∫_0^∞ e^{(t−λ)x} dx = λ/(λ − t), if λ > t.
Big Theorem: Under certain technical conditions,
E[X^k] = (d^k/dt^k) MX(t) |_{t=0}, k = 1, 2, . . . .
Thus, you can generate the moments of X from the mgf. (Sometimes, it's easier to get moments this way than directly.)
"Proof:"
MX(t) = E[e^{tX}] = E[Σ_{k=0}^∞ (tX)^k/k!] = Σ_{k=0}^∞ E[(tX)^k]/k! = 1 + tE[X] + t^2 E[X^2]/2 + · · ·
This implies
(d/dt) MX(t) = E[X] + tE[X^2] + · · ·
and so
(d/dt) MX(t) |_{t=0} = E[X].
Same deal for higher-order moments.
Example: X ∼ Exp(λ). Then MX(t) = λ/(λ − t) for λ > t. So
E[X] = (d/dt) MX(t) |_{t=0} = λ/(λ − t)^2 |_{t=0} = 1/λ.
Further,
E[X^2] = (d^2/dt^2) MX(t) |_{t=0} = 2λ/(λ − t)^3 |_{t=0} = 2/λ^2.
Thus,
Var(X) = E[X^2] − (E[X])^2 = 2/λ^2 − 1/λ^2 = 1/λ^2.
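If you'd rather let software do the differentiating, here's a small symbolic sketch (assuming the sympy package is available):

import sympy as sp

t, lam = sp.symbols('t lambda', positive=True)
M = lam / (lam - t)                       # mgf of Exp(lambda)

EX  = sp.diff(M, t, 1).subs(t, 0)         # 1/lambda
EX2 = sp.diff(M, t, 2).subs(t, 0)         # 2/lambda**2
print(EX, EX2, sp.simplify(EX2 - EX**2))  # variance 1/lambda**2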
Other Applications
You can do lots of nice things with mgf’s. . .
Find the mgf of a linear function of X.
Find the mgf of the sum of indep RV’s.
Identify distributions.
Derive cool properties of distributions.
Theorem: Suppose X has mgf MX(t) and let Y = aX + b. Then
MY(t) = e^{tb} MX(at).
Proof:
MY(t) = E[e^{tY}] = E[e^{t(aX+b)}] = e^{tb} E[e^{(at)X}] = e^{tb} MX(at).
Example: Let X ∼ Exp(λ) and Y = 3X + 2. Then
MY(t) = e^{2t} MX(3t) = e^{2t} λ/(λ − 3t), if λ > 3t.
Theorem: Suppose X1, . . . , Xn are indep. Let Y = Σ_{i=1}^n Xi. Then
MY(t) = Π_{i=1}^n MXi(t).
Proof:
MY(t) = E[e^{tY}] = E[e^{t Σ Xi}] = E[Π_{i=1}^n e^{tXi}]
      = Π_{i=1}^n E[e^{tXi}]   (Xi's indep)
      = Π_{i=1}^n MXi(t).
Corollary: If X1, . . . , Xn are i.i.d. and Y = Σ_{i=1}^n Xi, then
MY(t) = [MX1(t)]^n.
Example: Suppose X1, . . . , Xn are iid Bern(p). Then by a previous example,
MY(t) = [MX1(t)]^n = (pe^t + q)^n.
So what use is a result like this?
Theorem: If X and Y have the same mgf, then they have the same
distribution (at least in this course)!
Proof: Too hard for this course.
Example: The sum Y of n i.i.d. Bern(p) RV's is the same as a Binomial(n, p) RV.
By the previous example, MY(t) = (pe^t + q)^n. So all we need to show is that the mgf of the Bin(n, p) matches this.
Meanwhile, let Z ∼ Bin(n, p).
MZ(t) = E[e^{tZ}] = Σ_z e^{tz} P(Z = z)
      = Σ_{z=0}^n e^{tz} C(n, z) p^z q^{n−z}
      = Σ_{z=0}^n C(n, z) (pe^t)^z q^{n−z}
      = (pe^t + q)^n   (by the binomial theorem),
and this matches the mgf of Y from the last page.
Example: You can identify a distribution by its mgf.
MX(t) = ((3/4)e^t + 1/4)^{15} implies that X ∼ Bin(15, 0.75).
Example:
MY(t) = e^{−2t} ((3/4)e^{3t} + 1/4)^{15} implies that Y has the same distribution as 3X − 2, where X ∼ Bin(15, 0.75).
Theorem (Additive property of Binomials): If X1, . . . , Xk are indep, with
Xi ∼ Bin(ni, p) (where p is the same for all Xi's), then
Y ≡ Σ_{i=1}^k Xi ∼ Bin(Σ_{i=1}^k ni, p).
Proof:
MY(t) = Π_{i=1}^k MXi(t)   (mgf of indep sum)
      = Π_{i=1}^k (pe^t + q)^{ni}   (Bin(ni, p) mgf)
      = (pe^t + q)^{Σ_{i=1}^k ni}.
This is the mgf of the Bin(Σ_{i=1}^k ni, p), so we're done.