A random variable $X$ has a probability density function if there is a function $f : \mathbb{R} \to \mathbb{R}$ so that
$$P(a < X \le b) = \int_a^b f(x)\,dx$$
for any $a < b$. A random variable with a density is called continuous. We remind the reader that the distribution of a continuous random variable is determined by its density function.

The goal of this discussion is to determine the distribution of the random variable $Y = h(X)$, where $h$ is a differentiable function and $X$ is a continuous random variable with density function $f$. We first consider the case where the following hold:

1. $h : D \to \mathbb{R}$ is differentiable on its domain $D$, which is an open interval of $\mathbb{R}$.
2. $h$ is one-to-one, which means that $h(x) = h(y)$ implies that $x = y$.
3. The support of $X$, defined as
$$\operatorname{supp}(f) \stackrel{\text{def}}{=} \{x : f(x) > 0\}, \tag{1}$$
is contained in $D$.

For any set $S \subset \mathbb{R}$, define $h(S) = \{y : y = h(x) \text{ for some } x \in S\}$.

Theorem 1. Suppose that the conditions above hold. Then $Y = h(X)$ is a continuous random variable with density function
$$f_Y(y) = f_X\big(h^{-1}(y)\big)\,\Big|\frac{d}{dy}h^{-1}(y)\Big|\,\mathbf{1}\{y \in h(D)\}, \tag{2}$$
for all points $y$ such that $f_X$ is continuous at $h^{-1}(y)$.

Proof. Since $h$ is one-to-one, it is either always increasing or always decreasing on $D$. Assume that $h$ is increasing; the other case is similar. We begin by computing the distribution function of $Y$:
$$
\begin{aligned}
P(Y \le y) = P(h(X) \le y) &= P\big(X \le h^{-1}(y)\big) \\
&= \int_{-\infty}^{h^{-1}(y)} f_X(x)\,dx \\
&= \int_{-\infty}^{y} f_X\big(h^{-1}(z)\big)\,\frac{d}{dz}h^{-1}(z)\,dz \\
&= \int_{-\infty}^{y} f_X\big(h^{-1}(z)\big)\,\Big|\frac{d}{dz}h^{-1}(z)\Big|\,dz,
\end{aligned}
$$
where the third equality is the change of variables $x = h^{-1}(z)$, and the absolute value may be inserted because $h$, and hence $h^{-1}$, is increasing. Now suppose that $y$ is such that $h^{-1}(y)$ is a point of continuity for $f_X$. Then by the Fundamental Theorem of Calculus, $F_Y$ is differentiable at $y$ with derivative
$$f_Y(y) = \frac{d}{dy}F_Y(y) = f_X\big(h^{-1}(y)\big)\,\Big|\frac{d}{dy}h^{-1}(y)\Big| . \qquad \square$$

Now we consider the case where $h$ is not one-to-one. We will assume the following: for each $y \in h(D)$, the set $h^{-1}(\{y\}) = \{x : h(x) = y\}$ is finite. We recall the following theorem from calculus:

Theorem (Inverse Function Theorem). Let $h : D \to \mathbb{R}$ be a differentiable function, and let $y = h(x)$ for some $x \in D$. Suppose that $h'(x) \ne 0$. Then there is an open interval $I$ containing $x$ and an open interval $J$ containing $y$ so that $h$ restricted to $I$ is one-to-one, with a differentiable inverse $h_x^{-1} : J \to I$.

Thus, for each $x \in h^{-1}(\{y\})$, there is a function $g_x$ defined in a neighborhood of $y$ so that $h(g_x(y')) = y'$ for all $y'$ in a neighborhood of $y$, and $g_x(h(x')) = x'$ for all $x'$ in a neighborhood of $x$.

Assume that for each $x \in h^{-1}(\{y\})$ we have $h'(x) \ne 0$. Now, since $h^{-1}(\{y\})$ is finite, say equal to $\{x_1, \dots, x_r\}$, letting $g_i = g_{x_i}$ there is an interval $J$ containing $y$ on which each of the $g_i$ is defined. We can take $J$ small enough that the images $\{g_i(J)\}$ are disjoint intervals. For $a \le y \le b$ with $a < b$ and $a, b \in J$, we have
$$
\begin{aligned}
P(a \le Y \le b) &= P\Big(\bigcup_{i=1}^{r} \{X \in g_i([a,b])\}\Big) \\
&= \sum_{i=1}^{r} P\big(X \in g_i([a,b])\big) \\
&= \sum_{i=1}^{r} \int_{g_i([a,b])} f(u)\,du \\
&= \sum_{i=1}^{r} \int_a^b f(g_i(s))\,|g_i'(s)|\,ds \\
&= \int_a^b \sum_{i=1}^{r} f(g_i(s))\,|g_i'(s)|\,ds ,
\end{aligned}
$$
where the second equality holds because the events are disjoint, the sets $g_i([a,b])$ being contained in the disjoint intervals $g_i(J)$. Taking $a = y$ and $b = y + \Delta y$, differentiating with respect to $\Delta y$, and evaluating at $\Delta y = 0$ yields
$$f_Y(y) = \sum_{x \in h^{-1}(\{y\})} f(g_x(y))\,|g_x'(y)| . \tag{3}$$

[FIGURE 1. A many-to-one function: a level $y$ whose preimage consists of the three points $x_1$, $x_2$, $x_3$.]

Figure 1 shows a many-to-one function. Note how a little neighborhood around $y$ maps to neighborhoods surrounding the three points in $h^{-1}(\{y\})$. For this $y$, the sum in (3) has three terms.

Let us consider an example. Suppose that $X$ has an exponential(1) distribution, and let
$$h(x) = \begin{cases} 3x & \text{if } 0 < x \le \tfrac{1}{3} \\ 1 - 5\big(x - \tfrac{1}{3}\big) & \text{if } \tfrac{1}{3} < x < \tfrac{8}{15} \\ 2\big(x - \tfrac{8}{15}\big) & \text{if } x \ge \tfrac{8}{15} . \end{cases}$$
The reader should graph this function.
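For readers who would like to check their sketch, here is a minimal Python snippet (our own illustration, not part of the notes themselves) that plots $h$; the dashed line at height $0.5$ is an arbitrary level chosen to show that each $0 < y < 1$ has exactly three preimages.

\begin{verbatim}
import numpy as np
import matplotlib.pyplot as plt

def h(x):
    # Piecewise map from the example: increasing on (0, 1/3],
    # decreasing on (1/3, 8/15), increasing again on [8/15, oo).
    return np.where(x <= 1/3, 3 * x,
                    np.where(x < 8/15, 1 - 5 * (x - 1/3),
                             2 * (x - 8/15)))

xs = np.linspace(0.001, 1.2, 500)
plt.plot(xs, h(xs))
plt.axhline(0.5, linestyle="--")  # an arbitrary level y in (0, 1)
plt.xlabel("x")
plt.ylabel("h(x)")
plt.show()
\end{verbatim}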
Let $0 < y < 1$. Then there are three values of $x$ so that $h(x) = y$, namely
$$x = \frac{y}{3}\,, \qquad x = -\frac{y}{5} + \frac{8}{15}\,, \qquad x = \frac{y}{2} + \frac{8}{15}\,.$$
The three functions of $y$ on the right are $g_1(y)$, $g_2(y)$ and $g_3(y)$. Since $X$ is exponential(1), $f(x) = e^{-x}$ for $x > 0$, and (3) gives, for $0 < y < 1$,
$$f_Y(y) = \frac{e^{-y/3}}{3} + \frac{e^{\,y/5 - 8/15}}{5} + \frac{e^{-y/2 - 8/15}}{2}\,.$$

Here is another example. Suppose that $X$ has the density
$$f(x) = \frac{x}{2\pi^2}\,\mathbf{1}\{0 < x < 2\pi\}\,,$$
and consider the random variable $Y = \sin X$. We will now find the density of $Y$. First take $y > 0$. Then
$$\sin^{-1}(\{y\}) \cap (0, 2\pi) = \{\arcsin(y),\ \pi - \arcsin(y)\}\,.$$
This follows since, by convention, $\arcsin(y)$ is defined to take values in $[-\frac{\pi}{2}, \frac{\pi}{2}]$ for $y \in [-1, 1]$, and $\sin(\pi - t) = \sin t$. Thus, using the notation above, we have
$$g_1(y) = \arcsin(y)\,, \qquad g_2(y) = \pi - \arcsin(y)\,,$$
and $|g_1'(y)| = |g_2'(y)| = 1/\sqrt{1 - y^2}$. Thus we have, for $0 < y < 1$,
$$f_Y(y) = \frac{\arcsin(y)}{2\pi^2}\,\frac{1}{\sqrt{1 - y^2}} + \frac{\pi - \arcsin(y)}{2\pi^2}\,\frac{1}{\sqrt{1 - y^2}} = \frac{1}{2\pi\sqrt{1 - y^2}}\,.$$

Let us review some facts from multivariate calculus. Let $h : D \to R$ be a one-to-one function, where $D \subset \mathbb{R}^n$ and $R \subset \mathbb{R}^n$. We write
$$h(x_1, \dots, x_n) = \big(h_1(x_1, \dots, x_n), \dots, h_n(x_1, \dots, x_n)\big)\,.$$
The total derivative of $h$ at $x = (x_1, \dots, x_n)$ is defined as the matrix
$$Dh(x) = \begin{bmatrix} \frac{\partial h_1}{\partial x_1}(x) & \frac{\partial h_1}{\partial x_2}(x) & \cdots & \frac{\partial h_1}{\partial x_n}(x) \\ \frac{\partial h_2}{\partial x_1}(x) & \frac{\partial h_2}{\partial x_2}(x) & \cdots & \frac{\partial h_2}{\partial x_n}(x) \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial h_n}{\partial x_1}(x) & \frac{\partial h_n}{\partial x_2}(x) & \cdots & \frac{\partial h_n}{\partial x_n}(x) \end{bmatrix}\,.$$
The Jacobian of $h$ at $x$, which we will denote by $J_h(x)$, is defined as
$$J_h(x) \stackrel{\text{def}}{=} \det Dh(x)\,. \tag{4}$$
Now the change of variables formula says the following: let $h : D \to R$ be one-to-one with continuous first partial derivatives. Then for $E \subset R$,
$$\int_E f(y)\,dy = \int_{h^{-1}(E)} f(h(x))\,|J_h(x)|\,dx\,.$$
Now we can state the formula for finding the density of $Y = h(X)$, where $X$ is a random vector in $\mathbb{R}^n$, and $h$ is a one-to-one function defined on $D \subset \mathbb{R}^n$, where the support of $f_X$ is contained in $D$.

Theorem 2. Let $h$ be as above, and let $X$ be a continuous random vector in $\mathbb{R}^n$ with density $f_X$. Then the density of $Y = h(X)$ is given by
$$f_Y(y) = f_X\big(h^{-1}(y)\big)\,\big|J_{h^{-1}}(y)\big|\,.$$

A fact which is often very useful is that
$$\big|J_{h^{-1}}(y)\big| = \frac{1}{\big|J_h(h^{-1}(y))\big|}\,. \tag{5}$$

Let $X, Y$ be independent standard Normal random variables, and let
$$(D, \Theta) = h(X, Y) = \big(X^2 + Y^2,\ \arctan(Y/X)\big)\,.$$
(Here $\arctan(Y/X)$ should be read as the angle that the point $(X, Y)$ makes with the positive $x$-axis, taken in $[0, 2\pi)$, so that $h$ is indeed one-to-one.) Then
$$h^{-1}(d, \theta) = \big(\sqrt{d}\cos\theta,\ \sqrt{d}\sin\theta\big)\,.$$
We have
$$Dh^{-1}(d, \theta) = \begin{bmatrix} \frac{1}{2\sqrt{d}}\cos\theta & -\sqrt{d}\sin\theta \\ \frac{1}{2\sqrt{d}}\sin\theta & \sqrt{d}\cos\theta \end{bmatrix}\,.$$
Thus
$$\big|J_{h^{-1}}(d, \theta)\big| = \frac{1}{2}\cos^2\theta + \frac{1}{2}\sin^2\theta = \frac{1}{2}\,.$$
Then we have for $d > 0$ and $\theta \in [0, 2\pi)$:
$$f_{D,\Theta}(d, \theta) = \frac{1}{2\pi}\,e^{-\frac{1}{2}(d\cos^2\theta + d\sin^2\theta)} \cdot \frac{1}{2} = \frac{1}{2}\,e^{-d/2} \cdot \frac{1}{2\pi}\,.$$
This shows that $D$ and $\Theta$ are independent, $D$ is exponential(1/2), and $\Theta$ is Uniform$[0, 2\pi)$. [Why?]

Note we could run this in reverse: suppose we start with $D$ an exponential(1/2) random variable and $\Theta$ an independent Uniform$[0, 2\pi)$ random variable, and let
$$g(d, \theta) = \big(\sqrt{d}\cos\theta,\ \sqrt{d}\sin\theta\big)\,.$$
Then $g^{-1}(x, y) = h(x, y)$, where $h$ is defined as above. Now finding the Jacobian of $g^{-1}$ directly can be done, but it is perhaps easier to use (5):
$$\big|J_{g^{-1}}(x, y)\big| = \big|J_h(x, y)\big| = \frac{1}{\big|J_{h^{-1}}(h(x, y))\big|} = \frac{1}{1/2} = 2\,.$$
Thus,
$$f_{X,Y}(x, y) = \frac{1}{2}\,e^{-(x^2+y^2)/2} \cdot \frac{1}{2\pi} \cdot 2 = \frac{1}{2\pi}\,e^{-(x^2+y^2)/2}\,.$$
Thus $g(D, \Theta)$ gives a pair of independent standard Normal random variables. [Why?]

This gives a method of simulating a pair of Normal random variables: it is relatively easy to simulate a uniform random variable and an exponential random variable, and applying the function $g$ to them then gives a pair of Normal random variables.
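To make the recipe concrete, here is a minimal NumPy sketch (our own illustration; the variable names are arbitrary). Recall that exponential(1/2) means rate $1/2$, i.e. mean $2$, which in NumPy is scale=2.0.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Simulate D ~ exponential(1/2) (rate 1/2, mean 2) and
# Theta ~ Uniform[0, 2*pi), both easy to generate.
d = rng.exponential(scale=2.0, size=n)
theta = rng.uniform(0.0, 2 * np.pi, size=n)

# Apply g(d, theta) = (sqrt(d) cos(theta), sqrt(d) sin(theta)).
x = np.sqrt(d) * np.cos(theta)
y = np.sqrt(d) * np.sin(theta)

# Sanity checks: each coordinate should be approximately standard
# normal, and the two coordinates should be uncorrelated.
print(x.mean(), x.std())        # ~ 0.0, ~ 1.0
print(y.mean(), y.std())        # ~ 0.0, ~ 1.0
print(np.corrcoef(x, y)[0, 1])  # ~ 0.0
\end{verbatim}

Generating $D$ as $-2\ln U$ for $U$ uniform on $(0,1)$ recovers the classical Box-Muller transform.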