A random variable X has a probability density function if there is a function f : R → R so that
\[
P(a < X \le b) = \int_a^b f(x)\,dx
\]
for any a < b. A random variable with a density is called continuous. We
remind the reader that the distribution of a continuous random variable is
determined by its density function.
The goal of this discussion is to determine the distribution of the random variable Y = h(X), where h is a differentiable function and X is a continuous random variable with density function f .
We first consider the case where the following hold:
1. h : D → R is differentiable on its domain D, which is an open interval
of R.
2. h is one-to-one, which means that h(x) = h(y) implies that x = y.
3. The support of X, defined as
\[
\operatorname{supp}(f) \stackrel{\mathrm{def}}{=} \{x : f(x) > 0\}, \tag{1}
\]
is contained in D.
For any set S ⊂ R, define h(S) = {y : y = h(x) for some x ∈ S}.
Theorem 1. Suppose that the conditions above hold. Then Y = h(X) is a
continuous random variable with density function
\[
f_Y(y) = f_X(h^{-1}(y)) \left| \frac{d}{dy} h^{-1}(y) \right| \mathbf{1}\{y \in h(D)\} , \tag{2}
\]
for all points y so that fX is continuous at h−1 (y).
Proof. Since h is one-to-one, it is either always increasing or always decreasing on D. Assume that h is increasing; the other case is similar. We begin by computing the distribution function of Y :
\[
\begin{aligned}
P(Y \le y) &= P(h(X) \le y) \\
&= P\left( X \le h^{-1}(y) \right) \\
&= \int_{-\infty}^{h^{-1}(y)} f_X(x)\,dx \\
&= \int_{-\infty}^{y} f_X(h^{-1}(z)) \, \frac{d}{dz} h^{-1}(z)\,dz && \text{by change of variables} \\
&= \int_{-\infty}^{y} f_X(h^{-1}(z)) \left| \frac{d}{dz} h^{-1}(z) \right| dz && \text{since } h \text{ is increasing.}
\end{aligned}
\]
Now suppose that y is such that h−1 (y) is a point of continuity for fX . Then
by the Fundamental Theorem of Calculus, FY is differentiable at y with
derivative
\[
f_Y(y) = \frac{d}{dy} F_Y(y) = f_X(h^{-1}(y)) \left| \frac{d}{dy} h^{-1}(y) \right| .
\]
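For instance, if X is exponential(1) and h(x) = √x on D = (0, ∞), then h^{-1}(y) = y² and (2) gives f_Y(y) = 2y e^{−y²} for y > 0. The following is a minimal Monte Carlo sketch of this computation in Python (assuming numpy is available), comparing a histogram of h(X) with the density from (2):

import numpy as np

# Check (2) with X ~ exponential(1) and h(x) = sqrt(x):
# h^{-1}(y) = y^2, so f_Y(y) = f_X(y^2) * 2y = 2*y*exp(-y^2) for y > 0.
rng = np.random.default_rng(0)
y = np.sqrt(rng.exponential(scale=1.0, size=1_000_000))  # samples of Y = h(X)

edges = np.linspace(0.0, 3.0, 31)
counts, _ = np.histogram(y, bins=edges)
emp = counts / (y.size * np.diff(edges))             # empirical density of Y
c = 0.5 * (edges[:-1] + edges[1:])                   # bin centers
print(np.max(np.abs(emp - 2 * c * np.exp(-c**2))))   # small, up to Monte Carlo error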
Now we consider the case where h is not one-to-one. We will assume the
following: For each y ∈ h(D), the set h−1 ({y}) = {x : h(x) = y} is a finite
set.
We recall the following theorem from calculus:
Theorem (Inverse Function Theorem). Let h : D → R be a differentiable function. Let y = h(x) for some x ∈ D. Suppose that h′(x) ≠ 0. Then there is an open interval I containing x and an open interval J containing y, so that h restricted to I is one-to-one, and there is a differentiable inverse h_x^{-1} : J → I.
Thus, for each x ∈ h^{-1}({y}), there is a function g_x defined in a neighborhood of y so that h ∘ g_x(y′) = y′ for all y′ in a neighborhood of y, and g_x ∘ h(x′) = x′ for all x′ in a neighborhood of x.
Assume that for each x ∈ h^{-1}({y}), we have h′(x) ≠ 0. Now, since h^{-1}({y}) is finite, say equal to {x_1, . . . , x_r}, letting g_i = g_{x_i}, there is an interval J containing y on which each of the g_i is defined. We can take J small enough so that the {g_i(J)} are disjoint intervals.
We have, for a ≤ y ≤ b with a < b and a, b ∈ J,
\[
\begin{aligned}
P(a \le Y \le b) &= P\Big( \bigcup_{x \in h^{-1}(\{y\})} \{ X \in g_x([a,b]) \} \Big) \\
&= \sum_{x \in h^{-1}(\{y\})} P( X \in g_x([a,b]) ) \\
&= \sum_{x \in h^{-1}(\{y\})} \int_{g_x([a,b])} f(u)\,du \\
&= \sum_{x \in h^{-1}(\{y\})} \int_a^b f(g_x(s)) \left| g_x'(s) \right| ds \\
&= \int_a^b \sum_{x \in h^{-1}(\{y\})} f(g_x(s)) \left| g_x'(s) \right| ds .
\end{aligned}
\]
[Figure 1. A many-to-one function: a level y whose preimage h^{-1}({y}) consists of three points x_1, x_2, x_3.]
Then taking a = y and b = y + ∆y, differentiating with respect to ∆y, and evaluating at ∆y = 0 yields
\[
f_Y(y) = \sum_{x \in h^{-1}(\{y\})} f(g_x(y)) \left| g_x'(y) \right| . \tag{3}
\]
Figure 1 shows a many-to-one function. Note how a little neighborhood
around y maps to neighborhoods surrounding the three points in h−1 ({y}).
For this y, the sum in (3) will have three terms.
Let us consider an example. Suppose that X has an exponential(1) distribution, and let
\[
h(x) =
\begin{cases}
3x & \text{if } 0 < x \le \frac{1}{3} \\
1 - 5\left(x - \frac{1}{3}\right) & \text{if } \frac{1}{3} < x < \frac{8}{15} \\
2\left(x - \frac{8}{15}\right) & \text{if } x \ge \frac{8}{15} .
\end{cases}
\]
The reader should graph this function. Let 0 < y < 1. Then there are three x so that h(x) = y, namely
\[
x = \frac{1}{3} y , \qquad x = -\frac{1}{5} y + \frac{8}{15} , \qquad x = \frac{1}{2} y + \frac{8}{15} .
\]
The three expressions on the right, viewed as functions of y, are then g_1(y), g_2(y), and g_3(y). Thus we have
\[
f_Y(y) = \frac{e^{-y/3}}{3} + \frac{e^{y/5 - 8/15}}{5} + \frac{e^{-y/2 - 8/15}}{2} , \qquad 0 < y < 1 .
\]
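This density can be sanity-checked by simulation. Below is a minimal Monte Carlo sketch in Python (assuming numpy is available), with h implemented piecewise as above:

import numpy as np

# The piecewise map h from the example above.
def h(x):
    return np.where(x <= 1/3, 3 * x,
           np.where(x < 8/15, 1 - 5 * (x - 1/3), 2 * (x - 8/15)))

rng = np.random.default_rng(1)
y = h(rng.exponential(scale=1.0, size=2_000_000))   # Y = h(X), X ~ exponential(1)

edges = np.linspace(0.05, 0.95, 19)
counts, _ = np.histogram(y, bins=edges)
emp = counts / (y.size * np.diff(edges))            # empirical density on (0, 1)
c = 0.5 * (edges[:-1] + edges[1:])
thy = np.exp(-c/3)/3 + np.exp(c/5 - 8/15)/5 + np.exp(-c/2 - 8/15)/2
print(np.max(np.abs(emp - thy)))                    # small, up to Monte Carlo error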
Here is another example. Suppose that X has the density
\[
f(x) = \frac{x}{2\pi^2} \, \mathbf{1}\{0 < x < 2\pi\} ,
\]
and consider the random variable Y = sin X. We will now find the density
of Y .
First take y > 0. Then
\[
\sin^{-1}(\{y\}) \cap (0, 2\pi) = \{ \arcsin(y), \ \pi - \arcsin(y) \} .
\]
This follows since, by convention, arcsin(y) is defined to take values in [−π/2, π/2] for y ∈ [−1, 1], and since sin(π − x) = sin(x).
Thus, using the notation as above, we have
\[
g_1(y) = \arcsin(y) , \qquad g_2(y) = \pi - \arcsin(y) .
\]
Since |g_1'(y)| = |g_2'(y)| = (1 − y²)^{−1/2}, we have
\[
\begin{aligned}
f_Y(y) &= \frac{\arcsin(y)}{2\pi^2} \frac{1}{\sqrt{1 - y^2}} + \frac{\pi - \arcsin(y)}{2\pi^2} \frac{1}{\sqrt{1 - y^2}} \\
&= \frac{1}{2\pi \sqrt{1 - y^2}} .
\end{aligned}
\]
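As a quick check, X can be simulated by inverting its distribution function F(x) = x²/(4π²) on (0, 2π), so that X = 2π√U for U uniform on (0, 1). A minimal sketch in Python (assuming numpy is available):

import numpy as np

# Simulate X with density x/(2*pi^2) on (0, 2*pi) via inversion:
# F(x) = x^2/(4*pi^2), so X = 2*pi*sqrt(U).
rng = np.random.default_rng(2)
x = 2 * np.pi * np.sqrt(rng.uniform(size=2_000_000))
y = np.sin(x)

edges = np.linspace(0.05, 0.90, 18)
counts, _ = np.histogram(y, bins=edges)
emp = counts / (y.size * np.diff(edges))      # empirical density of Y on (0, 1)
c = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(emp - 1 / (2 * np.pi * np.sqrt(1 - c**2)))))  # small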
Let us review some facts from multivariate calculus. Let h : D → R be a
one-to-one function, where D ⊂ Rn and R ⊂ Rn . We write
h(x1 , . . . , xn ) = (h1 (x1 , . . . , xn), . . . , hn(x1 , . . . , xn )) .
The total derivative of h at x = (x_1, . . . , x_n) is defined as the matrix
\[
Dh(x) =
\begin{bmatrix}
\frac{\partial h_1}{\partial x_1}(x) & \frac{\partial h_1}{\partial x_2}(x) & \cdots & \frac{\partial h_1}{\partial x_n}(x) \\
\frac{\partial h_2}{\partial x_1}(x) & \frac{\partial h_2}{\partial x_2}(x) & \cdots & \frac{\partial h_2}{\partial x_n}(x) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial h_n}{\partial x_1}(x) & \frac{\partial h_n}{\partial x_2}(x) & \cdots & \frac{\partial h_n}{\partial x_n}(x)
\end{bmatrix} .
\]
The Jacobian of h at x, which we will denote by J_h(x), is defined as
\[
J_h(x) \stackrel{\mathrm{def}}{=} \det Dh(x) . \tag{4}
\]
Now the change of variables formula says the following: Let h : D → R be a function which has continuous first partial derivatives. Then, for E ⊂ R,
\[
\int_E f(y)\,dy = \int_{h^{-1}(E)} f(h(x)) \left| J_h(x) \right| dx .
\]
Now we can state the formula for finding the density of Y = h(X), where
X is a random vector in Rn , and h is a one-to-one function defined on D ⊂
Rn , where the support of fX is contained in D.
Theorem 2. Let h be as above, and let X be a continuous random vector in R^n with density f_X. Then the density of Y = h(X) is given by
\[
f_Y(y) = f_X(h^{-1}(y)) \left| J_{h^{-1}}(y) \right| .
\]
A fact which is often very useful is that
\[
\left| J_{h^{-1}}(y) \right| = \frac{1}{\left| J_h(h^{-1}(y)) \right|} . \tag{5}
\]
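For instance, for a linear map h(x) = Ax with A an invertible n × n matrix, h^{-1}(y) = A^{-1}y and |J_{h^{-1}}(y)| = 1/|det A|, so Theorem 2 gives f_Y(y) = f_X(A^{-1}y)/|det A|. The following is a minimal numerical sketch in Python (assuming numpy is available; the particular matrix A is only an illustration), comparing the probability of a small box under Y = AX with the density from Theorem 2:

import numpy as np

# Theorem 2 for the linear map Y = A X, X a standard bivariate normal:
# f_Y(y) = f_X(A^{-1} y) / |det A|.
A = np.array([[2.0, 1.0], [0.0, 1.0]])
rng = np.random.default_rng(3)
x = rng.standard_normal((2_000_000, 2))
y = x @ A.T                                   # rows are samples of Y = A X

def f_Y(p):
    q = p @ np.linalg.inv(A).T                # h^{-1}(p) = A^{-1} p, row-wise
    fx = np.exp(-0.5 * np.sum(q**2, axis=-1)) / (2 * np.pi)
    return fx / abs(np.linalg.det(A))

y0, eps = np.array([0.5, -0.3]), 0.05
box = np.all(np.abs(y - y0) < eps, axis=1)    # Y lands in a small box around y0
print(box.mean() / (2 * eps)**2, f_Y(y0[None, :])[0])   # both approximate f_Y(y0)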
Let X,Y be independent standard Normal random variables. Let
(D, Θ) = h(X,Y ) = (X 2 + Y 2 , arctan(Y /X)) .
Then
\[
h^{-1}(d, \theta) = \left( \sqrt{d} \cos\theta, \ \sqrt{d} \sin\theta \right) .
\]
We have
\[
Dh^{-1}(d, \theta) =
\begin{bmatrix}
\frac{1}{2\sqrt{d}} \cos\theta & -\sqrt{d} \sin\theta \\
\frac{1}{2\sqrt{d}} \sin\theta & \sqrt{d} \cos\theta
\end{bmatrix} .
\]
Thus
\[
\left| J_{h^{-1}}(d, \theta) \right| = \frac{1}{2} \cos^2\theta + \frac{1}{2} \sin^2\theta = \frac{1}{2} .
\]
Then we have, for d > 0 and θ ∈ [0, 2π),
\[
f_{D,\Theta}(d, \theta) = \frac{1}{2\pi} e^{-\frac{1}{2}(d\cos^2\theta + d\sin^2\theta)} \cdot \frac{1}{2} = \frac{1}{2} e^{-d/2} \cdot \frac{1}{2\pi} .
\]
This shows that D and Θ are independent, D is exponential(1/2), and Θ is Uniform[0, 2π). [Why?]
Note we could run this in reverse: Suppose we start with D an exponential(1/2) random variable and Θ an independent Uniform[0, 2π) random variable. Then let
\[
g(d, \theta) = \left( \sqrt{d} \cos\theta, \ \sqrt{d} \sin\theta \right) .
\]
Then g^{-1}(x, y) = h(x, y), where h is defined as above. Finding the Jacobian of g^{-1} directly can be done, but it is perhaps easier to use (5):
\[
J_{g^{-1}}(x, y) = J_h(x, y) = \frac{1}{J_{h^{-1}}(h(x, y))} = \frac{1}{1/2} = 2 .
\]
Thus,
\[
f_{X,Y}(x, y) = \frac{1}{2} e^{-(x^2 + y^2)/2} \cdot \frac{1}{2\pi} \cdot 2 = \frac{1}{2\pi} e^{-(x^2 + y^2)/2} .
\]
Thus g(D, Θ) gives a pair of independent standard normal random variables.
[Why?]
This gives a method of simulating a pair of Normal random variables: it is relatively easy to simulate a uniform and an exponential random variable, and applying the function g to them gives a pair of independent Normal random variables.
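A minimal sketch of this method in Python (assuming numpy is available; recall exponential(1/2) has rate 1/2, i.e. mean 2):

import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000
d = rng.exponential(scale=2.0, size=n)        # D ~ exponential(1/2): mean 2
theta = rng.uniform(0.0, 2 * np.pi, size=n)   # Theta ~ Uniform[0, 2*pi)

# (X, Y) = g(D, Theta) = (sqrt(D) cos(Theta), sqrt(D) sin(Theta))
x = np.sqrt(d) * np.cos(theta)
y = np.sqrt(d) * np.sin(theta)

# Both coordinates should be approximately standard normal and uncorrelated.
print(x.mean(), x.std(), y.mean(), y.std(), np.corrcoef(x, y)[0, 1])

In effect this is the Box–Muller method, since −2 ln U is exponential(1/2) for U uniform on (0, 1).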