Urban Operations Research
Compiled by James S. Kang
Problem Set 1 Solutions
Fall 2001, 10/3/2001

1. LO Problem 2.3 (Ingolfsson, 1993; Kang, 2001)

Anyone who arrives at the transfer station observes two independent Poisson processes, A and B. Each arrival of the combined Poisson process comes from process A (B) with probability $\frac{\lambda_A}{\lambda_A+\lambda_B}$ ($\frac{\lambda_B}{\lambda_A+\lambda_B}$), independently of all other arrivals. Thus the combined Poisson process has an embedded Bernoulli process. To solve this problem, you need a good grasp of the fundamental properties of Poisson and Bernoulli processes. If you feel uncomfortable with the answers below, now is a good time to review Poisson and Bernoulli processes, for example, by reading Chapter 4 of “Fundamentals of Applied Probability Theory” by Alvin Drake.

(a) (i) The times between successive A train arrivals are independent exponential random variables with parameter $\lambda_A = 3$/hour. By the memoryless property, the fact that Bart arrives at a random time has no relevance. The time he has to wait until the next A train is still an exponential random variable with parameter $\lambda_A$. So if we call his waiting time X, then the PDF of X is given by
\[ f_X(x) = \lambda_A e^{-\lambda_A x} = 3e^{-3x}, \quad x \ge 0. \]
This is in fact a random incidence question, so we can also solve it by using formula (2.65) in the textbook. Let Y be the interarrival time of A trains. Then
\[ f_X(x) = \frac{1 - F_Y(x)}{E[Y]}, \]
where $E[Y] = \frac{1}{\lambda_A} = \frac{1}{3}$ and
\[ F_Y(x) = P\{Y \le x\} = \int_0^x \lambda_A e^{-\lambda_A y}\,dy = 1 - e^{-\lambda_A x} = 1 - e^{-3x}. \]
Hence
\[ f_X(x) = \frac{1 - (1 - e^{-3x})}{1/3} = 3e^{-3x}, \quad x \ge 0. \]

(ii) An easy way to answer this question is to first translate it into a Bernoulli process that one is familiar with, for example, a sequence of coin tosses. An A train then becomes a “head” and a B train becomes a “tail”. The problem now reads: What is the probability that one obtains at least 3 tails before a head is obtained? This probability is the same as the probability that the next three tosses result in tails (i.e. the next three trains are B trains). The outcomes of the fourth and subsequent tosses are irrelevant for the purposes of answering this question. Thus,
\[ P\{\text{at least 3 B trains arrive}\} = P\{\text{next 3 trains are B trains}\} = \left(\frac{\lambda_B}{\lambda_A+\lambda_B}\right)^3 = \left(\frac{6}{3+6}\right)^3 = \left(\frac{2}{3}\right)^3. \]
Another way to answer this question is to consider the complementary event. Let $N_B$ be a random variable denoting the number of B trains that arrive while Bart is waiting. Then
\[
\begin{aligned}
P\{N_B \ge 3\} &= 1 - P\{N_B = 0\} - P\{N_B = 1\} - P\{N_B = 2\} \\
&= 1 - \frac{\lambda_A}{\lambda_A+\lambda_B} - \frac{\lambda_B}{\lambda_A+\lambda_B}\cdot\frac{\lambda_A}{\lambda_A+\lambda_B} - \left(\frac{\lambda_B}{\lambda_A+\lambda_B}\right)^2 \frac{\lambda_A}{\lambda_A+\lambda_B} = \left(\frac{2}{3}\right)^3.
\end{aligned}
\]

(iii) In order for exactly 3 B trains to arrive while Bart is waiting, the next 3 trains should be B trains and the fourth one should be an A train.
\[ P\{\text{exactly 3 B trains}\} = P\{\text{next 3 trains are B trains, the 4th train is an A train}\} = \left(\frac{\lambda_B}{\lambda_A+\lambda_B}\right)^3 \frac{\lambda_A}{\lambda_A+\lambda_B} = \left(\frac{2}{3}\right)^3 \frac{1}{3}. \]

(b) The combined Poisson process of train arrivals has an arrival rate of $\lambda = \lambda_A + \lambda_B = 9$/hour. The probability of having exactly 9 arrivals during any hour ($t = 1$) in this process is obtained by using the Poisson PMF formula:
\[ P_K(9) = \frac{(\lambda t)^9 e^{-\lambda t}}{9!} = \frac{9^9 e^{-9}}{9!} \approx 0.1318. \]

(c) To answer this question, one needs to consider another Bernoulli process associated with the combined Poisson process. In this Bernoulli process, each hour is an independent trial and an hour is a success if exactly 9 trains arrive during that hour. From (b), a success occurs with probability $P_K(9) \approx 0.1318$. The question asks what the expected number of trials until the first success is. The answer is the expected value of a geometric random variable with parameter $P_K(9)$, i.e.
\[ E[\text{number of hours until exactly 9 trains per hour}] = \frac{1}{P_K(9)} \approx \frac{1}{0.1318} \approx 7.59. \]

(d) (i) An A train will be delayed if the time, denoted by Z, from the arrival of the A train to the moment of the arrival of the next B train is less than 30 seconds. By the memoryless property, this time has an exponential PDF with parameter $\lambda_B$.
Thus the probability that an A train is delayed is obtained by
\[ P_D = P\{Z \le 30 \text{ seconds}\} = \int_0^{(30 \text{ secs})(1 \text{ hr}/3600 \text{ secs})} \lambda_B e^{-\lambda_B t}\,dt = \int_0^{1/120} 6e^{-6t}\,dt = 1 - e^{-1/20}. \]
Since probabilities equal long-run frequencies, this is also approximately the fraction of A trains that are delayed over some reasonably long period, say a month.

(ii) A B train passenger that benefits from the delay policy does not have to wait at all for an A train, so her expected waiting time under the policy is zero. Without the policy, her mean waiting time would be $\frac{1}{\lambda_A} = \frac{1}{3}$ hour $= 20$ minutes. Therefore such a passenger’s mean waiting time reduction is 20 minutes.

Next, consider a passenger in an A train. With probability $P_D$ computed in (i), the passenger’s travel time will increase by the amount of time that the A train waits for a B train. Note that the mean increase in travel time for a passenger in an A train that is held for a B train is equal to $E[Z \mid Z \le 1/120]$. To compute this quantity, we invoke the total expectation theorem:
\[ E[Z] = E[Z \mid Z \le 1/120]\,P\{Z \le 1/120\} + E[Z \mid Z > 1/120]\,P\{Z > 1/120\}. \]
Since $E[Z] = \frac{1}{\lambda_B}$ and $E[Z \mid Z > 1/120] = \frac{1}{120} + \frac{1}{\lambda_B}$, we have
\[ \frac{1}{\lambda_B} = E[Z \mid Z \le 1/120]\,P_D + \left(\frac{1}{120} + \frac{1}{\lambda_B}\right)(1 - P_D). \]
Rearranging terms,
\[ E[Z \mid Z \le 1/120] = \frac{\frac{1}{\lambda_B} - \left(\frac{1}{120} + \frac{1}{\lambda_B}\right)(1 - P_D)}{P_D} \approx 0.00413 \text{ hours} = 14.9 \text{ seconds}. \]
Therefore the mean increase in travel time for an A train passenger is
\[ E[Z \mid Z \le 1/120] \times P_D = 14.9 \text{ seconds} \times (1 - e^{-1/20}) \approx 0.73 \text{ seconds}. \]

One might attempt to obtain the mean increase in travel time (i.e. the expected waiting time, denoted by $E[W]$) for an A train passenger as follows:
\[
\begin{aligned}
E[W] &= E[W \mid Z \le 1/120]\,P_D + E[W \mid Z > 1/120]\,(1 - P_D) \\
&\stackrel{?}{=} \frac{1}{2}\cdot\frac{1}{120} \times (1 - e^{-1/20}) + 0.
\end{aligned}
\]
This is not correct because $E[W \mid Z \le 1/120] \ne \frac{1}{2}\cdot\frac{1}{120}$. The restriction, $Z \le 1/120$, cuts off the right tail of the exponential density curve, $f_W(w)$, but what is left is NOT uniform.

Let us assume that it never happens that two or more A trains are delayed waiting for the same B train, i.e. we ignore the possibility that two A trains may arrive within 30 seconds of each other. Under this assumption, one A train is held for every B train receiving the benefits. Therefore the policy will lead to a net global travel time reduction if
\[
\begin{aligned}
& E[\text{total time reduction}] > E[\text{total time increase}] \\
\Rightarrow\ & E[N_{BA}] \times \tfrac{1}{3} \text{ hours} > E[N_A] \times 0.00413 \text{ hours} \\
\Rightarrow\ & E[N_{BA}] > 0.012 \times E[N_A],
\end{aligned}
\]
where $N_{BA}$ is the number of people on a B train who wish to transfer to an A train and $N_A$ is the number of people on an A train being held. In words, the policy is favored if the average number of people on a B train who wish to transfer is at least 1.2% of the average number of people on an A train. This is a condition that one would expect to hold true for most subway transfer stations.

You might want to think about the effect of ignoring the possibility of two A trains arriving within 30 seconds of each other. How would one assess the reasonableness of this simplification? How would the analysis change if one did not make this simplifying assumption?

2. LO Exercise 3.5 (Kang, 2001)

As in class, let U be a random variable denoting the distance from the center of the needle to the nearest line, and let Θ be a random variable denoting the acute angle between the needle and the vertical line as shown in the figure below. If the needle is thrown randomly, U is uniformly distributed between 0 and $\frac{d}{2}$, and Θ is also uniformly distributed over $[0, \frac{\pi}{2}]$. Let T be the event that the needle touches a line (or lines). If $l \le d$, $P(T) = \frac{2l}{\pi d}$ as we saw in class.

[Figure: needle geometry, with labels d, U, l/2, and Θ.]

We now compute $P(T)$ when $l > d$. Note that if $\frac{l}{2}\cos\theta \ge \frac{d}{2}$, i.e. $\theta \le \cos^{-1}\frac{d}{l}$, then the needle always touches at least one line no matter where it lands. For the sake of notational simplicity, let A represent the event that $0 \le \Theta \le \cos^{-1}\frac{d}{l}$. Using the total probability theorem, $P(T)$ is computed by
\[ P(T) = P(T \mid A)P(A) + P(T \mid A^c)P(A^c). \]
Since $\Theta \sim U[0, \frac{\pi}{2}]$, $P(A) = \frac{2}{\pi}\cos^{-1}\frac{d}{l}$.
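Because $P(T \mid A) = 1$, we have
\[ P(T) = \frac{2}{\pi}\cos^{-1}\frac{d}{l} + P(T \mid A^c)\left(1 - \frac{2}{\pi}\cos^{-1}\frac{d}{l}\right). \]
If we compute $P(T \mid A^c)$, we are all set. Given a θ, $P(T \mid A^c, \Theta = \theta)$ is computed by
\[ P(T \mid A^c, \Theta = \theta) = P\left(U \le \frac{l}{2}\cos\theta \,\Big|\, A^c\right) = \frac{2}{d}\cdot\frac{l}{2}\cos\theta = \frac{l}{d}\cos\theta, \]
because $U \sim U[0, \frac{d}{2}]$. Now we can compute $P(T \mid A^c)$ as follows:
\[ P(T \mid A^c) = \int_{\cos^{-1}\frac{d}{l}}^{\pi/2} \frac{l}{d}\cos\theta\, f_{\Theta \mid A^c}(\theta)\,d\theta, \]
where $f_{\Theta \mid A^c}(\theta)$ is the conditional PDF of Θ conditioned on $A^c$. Since $f_{\Theta \mid A^c}(\theta) = \frac{f_\Theta(\theta)}{P(A^c)}$,
\[
\begin{aligned}
P(T \mid A^c) &= \int_{\cos^{-1}\frac{d}{l}}^{\pi/2} \frac{l}{d}\cos\theta \cdot \frac{\frac{2}{\pi}}{1 - \frac{2}{\pi}\cos^{-1}\frac{d}{l}}\,d\theta \\
&= \frac{\frac{2l}{\pi d}}{1 - \frac{2}{\pi}\cos^{-1}\frac{d}{l}} \Big[\sin\theta\Big]_{\cos^{-1}\frac{d}{l}}^{\pi/2} \\
&= \frac{\frac{2l}{\pi d}}{1 - \frac{2}{\pi}\cos^{-1}\frac{d}{l}} \left(1 - \sin\left(\cos^{-1}\frac{d}{l}\right)\right).
\end{aligned}
\]
Now we have
\[ P(T) = \frac{2}{\pi}\cos^{-1}\frac{d}{l} + \frac{2l}{\pi d}\left(1 - \sin\left(\cos^{-1}\frac{d}{l}\right)\right). \]
Because $\sin\theta = \sqrt{1 - \cos^2\theta}$,
\[ \sin\left(\cos^{-1}\frac{d}{l}\right) = \sqrt{1 - \left(\cos\left(\cos^{-1}\frac{d}{l}\right)\right)^2} = \sqrt{1 - \left(\frac{d}{l}\right)^2}. \]
Therefore, $P(T)$ is simplified as
\[ P(T) = \frac{2}{\pi}\cos^{-1}\frac{d}{l} + \frac{2l}{\pi d}\left(1 - \sqrt{1 - \left(\frac{d}{l}\right)^2}\right). \]
To summarize,
\[
P(T) =
\begin{cases}
\dfrac{2l}{\pi d}, & \text{if } l \le d, \\[2ex]
\dfrac{2}{\pi}\cos^{-1}\dfrac{d}{l} + \dfrac{2l}{\pi d}\left(1 - \sqrt{1 - \left(\dfrac{d}{l}\right)^2}\right), & \text{if } l > d.
\end{cases}
\]

3. (Kang, 2001)

Let random variable Y denote the interarrival time of buses.
\[
\begin{aligned}
E[Y] &= 3 \times 0.4 + 5 \times 0.5 + 12 \times 0.1 = 4.9 \\
E[Y^2] &= 3^2 \times 0.4 + 5^2 \times 0.5 + 12^2 \times 0.1 = 30.5
\end{aligned}
\]

(a) Let V be the waiting time. Using Equation (2.66) in the textbook,
\[ E[V] = \frac{E[Y^2]}{2E[Y]} = \frac{30.5}{2 \times 4.9} \approx 3.11 \text{ minutes}. \]

(b) Consider N intervals, where N is very large. We can expect that the number of intervals with length of 3 minutes is 0.4N. Similarly, 0.5N and 0.1N are the numbers of intervals with length of 5 minutes and of 12 minutes, respectively. Therefore, the total length (minutes) of the N intervals is $3 \times 0.4N + 5 \times 0.5N + 12 \times 0.1N = 4.9N$. The probability that Mendel arrives during a 12-minute interval is the proportion of the total length, 4.9N, taken up by 12-minute intervals:
\[ P(\text{Mendel arrives during a 12-minute interval}) = \frac{12 \times 0.1N}{4.9N} \approx 0.245. \]

(c) Let W be the length of the interval in which Mendel arrives. We can compute $P(V < 1)$ by
\[ P(V < 1) = P(V < 1 \mid W = 3)P(W = 3) + P(V < 1 \mid W = 5)P(W = 5) + P(V < 1 \mid W = 12)P(W = 12). \]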
Given that Mendel arrives in a 3-minute interval, the probability that he waits less than one minute is $\frac{1}{3}$ because the moment of his arrival is totally random (uniformly distributed) over the 3-minute interval. Similarly, $P(V < 1 \mid W = 5) = \frac{1}{5}$ and $P(V < 1 \mid W = 12) = \frac{1}{12}$. Hence,
\[
\begin{aligned}
P(V < 1) &= \frac{1}{3}P(W = 3) + \frac{1}{5}P(W = 5) + \frac{1}{12}P(W = 12) \\
&= \frac{1}{3}\cdot\frac{3 \times 0.4}{4.9} + \frac{1}{5}\cdot\frac{5 \times 0.5}{4.9} + \frac{1}{12}\cdot\frac{12 \times 0.1}{4.9} \\
&\approx 0.204.
\end{aligned}
\]

4. LO Problem 3.13 (Chew, 1997; Kang, 2001)

Following the notation given in the text, $(X_1, Y_1)$ and $(X_2, Y_2)$ denote the locations of the response unit and incident, respectively. $S$ ($S^c$) denotes the set of points within (outside) the central square. Let $A = \{(X_1, Y_1) \in S\}$ and $B = \{(X_2, Y_2) \in S\}$.

[Figure: unit square with a shaded central square of side a; the outer regions are labeled R1 and R2.]

(a) Let us consider the case in which incidents and the response unit are uniformly, independently distributed over the entire square. In this case, the expected travel distance can be decomposed as
\[
\begin{aligned}
E[D] ={}& E[D \mid A \cap B]P(A \cap B) + E[D \mid A \cap B^c]P(A \cap B^c) \\
&+ E[D \mid A^c \cap B]P(A^c \cap B) + E[D \mid A^c \cap B^c]P(A^c \cap B^c).
\end{aligned}
\]
By symmetry, $E[D \mid A \cap B^c]P(A \cap B^c) = E[D \mid A^c \cap B]P(A^c \cap B)$. Hence,
\[ E[D] = E[D \mid A \cap B]P(A \cap B) + 2E[D \mid A \cap B^c]P(A \cap B^c) + E[D \mid A^c \cap B^c]P(A^c \cap B^c). \]
We know that $E[D] = \frac{2}{3}$ and $E[D \mid A \cap B] = \frac{2}{3}a$ from class. Since A and B are independent and $P(A) = P(B) = a^2$, we have
\[
\begin{aligned}
E[D] = \frac{2}{3} &= E[D \mid A \cap B]P(A)P(B) + 2E[D \mid A \cap B^c]P(A)P(B^c) + E[D \mid A^c \cap B^c]P(A^c)P(B^c) \\
&= \frac{2}{3}a(a^2)^2 + 2E[D \mid A \cap B^c]\,a^2(1 - a^2) + E[D \mid A^c \cap B^c](1 - a^2)^2.
\end{aligned}
\]

(b) (i) The set $B^c$ can be divided into two classes of identically-sized shapes: four rectangles of type $R_1$ (bordering the central square) and four rectangles of type $R_2$ (at corners of the unit-square). Hence,
\[ E[D \mid A \cap B^c] = 4E[D \mid A \cap R_1]P(R_1 \mid B^c) + 4E[D \mid A \cap R_2]P(R_2 \mid B^c). \]
Note that $P(R_1 \mid B^c) = P(R_1 \mid R_1 \cup R_2)\,P(R_1 \cup R_2 \mid B^c)$.
Since $P(R_1 \cup R_2 \mid B^c) = \frac{1}{4}$, we can rewrite $E[D \mid A \cap B^c]$ as
\[ E[D \mid A \cap B^c] = E[D \mid A \cap R_1]P(R_1 \mid R_1 \cup R_2) + E[D \mid A \cap R_2]P(R_2 \mid R_1 \cup R_2). \]

(ii) By the definition of the conditional probability,
\[
\begin{aligned}
P\{(X_2, Y_2) \in R_1 \mid (X_2, Y_2) \in R_1 \cup R_2\}
&= \frac{P\{(X_2, Y_2) \in R_1 \text{ and } (X_2, Y_2) \in R_1 \cup R_2\}}{P\{(X_2, Y_2) \in R_1 \cup R_2\}}
= \frac{P\{(X_2, Y_2) \in R_1\}}{P\{(X_2, Y_2) \in R_1 \cup R_2\}} \\
&= \frac{a(1-a)\frac{1}{2}}{a(1-a)\frac{1}{2} + \left(\frac{1}{2}(1-a)\right)^2}
= \frac{a}{a + \frac{1}{2}(1-a)}
= \frac{2a}{1+a}.
\end{aligned}
\]

(iii) Let $D_x$, $D_y$ be the travel distances in the x axis and in the y axis, respectively. From class, we know $E[D_x \mid A \cap R_1] = \frac{a}{3}$. If the locations of the response unit and incident are uniformly distributed over $S$ and $R_1$ respectively, $E[D_y \mid A \cap R_1] = \frac{a}{2} + \frac{1}{2}\cdot\frac{1-a}{2}$. Hence,
\[ E[D \mid A \cap R_1] = E[D_x \mid A \cap R_1] + E[D_y \mid A \cap R_1] = \frac{a}{3} + \frac{a}{2} + \frac{1}{2}\cdot\frac{1-a}{2} = \frac{1}{4} + \frac{7}{12}a. \]
Note that $E[D_y \mid A \cap R_2]$ is the same as $E[D_y \mid A \cap R_1]$. It is easy to see $E[D_x \mid A \cap R_2] = \frac{a}{2} + \frac{1}{2}\cdot\frac{1-a}{2}$. Therefore,
\[ E[D \mid A \cap R_2] = E[D_x \mid A \cap R_2] + E[D_y \mid A \cap R_2] = 2\left(\frac{a}{2} + \frac{1}{2}\cdot\frac{1-a}{2}\right) = \frac{1}{2} + \frac{1}{2}a. \]

(c) From (a), we have
\[ \bar{W}(a) = E[D \mid A^c \cap B^c] = \frac{\frac{2}{3} - \frac{2}{3}a(a^2)^2 - 2E[D \mid A \cap B^c]\,a^2(1 - a^2)}{(1 - a^2)^2}. \]
From (b), $E[D \mid A \cap B^c]$ is computed as follows:
\[
\begin{aligned}
E[D \mid A \cap B^c] &= E[D \mid A \cap R_1]P(R_1 \mid R_1 \cup R_2) + E[D \mid A \cap R_2]P(R_2 \mid R_1 \cup R_2) \\
&= \left(\frac{1}{4} + \frac{7}{12}a\right)\frac{2a}{a+1} + \left(\frac{1}{2} + \frac{1}{2}a\right)\left(1 - \frac{2a}{a+1}\right) \\
&= \frac{3 + 7a}{12}\cdot\frac{2a}{a+1} + \frac{1+a}{2}\cdot\frac{1-a}{a+1} \\
&= \frac{1}{2(a+1)}\left(\frac{7a^2}{3} + a + 1 - a^2\right) = \frac{1}{2(a+1)}\left(\frac{4}{3}a^2 + a + 1\right).
\end{aligned}
\]
$\bar{W}(a)$ is then given by
\[
\begin{aligned}
\bar{W}(a) &= \frac{\frac{2}{3}(1 - a^5) - \left(\frac{4}{3}a^2 + a + 1\right)a^2(1 - a)}{(1 - a^2)^2} \\
&= \frac{-2a^4 - a^3 - a^2 + 2a + 2}{3(1 + a)(1 - a^2)} \\
&= \frac{2a^3 + 3a^2 + 4a + 2}{3(1 + a)^2}.
\end{aligned}
\]
$\bar{W}(0)$ indicates the expected travel distance when no zero-demand zone exists, which should be equal to $E[D]$. Indeed, $\bar{W}(0) = \frac{2}{3}$.

$\bar{W}(1) = \frac{11}{12}$. If $a = 1$, the entire unit-square is a zero-demand zone. In this case, the response unit and incidents are uniformly distributed along the perimeter of the unit-square. $\bar{W}(1)$ is the expected travel distance from one point on the perimeter to another point on the perimeter.
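The algebra in (c) is easy to get wrong, so a short exact-arithmetic check can be useful. The sketch below is not part of the original solution: the function names are hypothetical, and it simply rebuilds W̄(a) from the decomposition in (a) and the pieces in (b) using Python's `fractions` module, then compares the result with the closed form at a few rational values of a (a = 1 is excluded from the rebuilt version because the denominator (1 − a²)² vanishes there).

```python
from fractions import Fraction

def w_bar_closed(a):
    """Closed form from (c): (2a^3 + 3a^2 + 4a + 2) / (3(1 + a)^2)."""
    return (2 * a**3 + 3 * a**2 + 4 * a + 2) / (3 * (1 + a) ** 2)

def w_bar_from_parts(a):
    """Rebuild W(a) from the decomposition in (a) and the pieces in (b).

    Hypothetical helper for checking only; requires 0 <= a < 1.
    """
    p_r1 = 2 * a / (a + 1)                        # P(R1 | R1 u R2), from (b)(ii)
    d_r1 = Fraction(1, 4) + Fraction(7, 12) * a   # E[D | A, R1], from (b)(iii)
    d_r2 = Fraction(1, 2) + a / 2                 # E[D | A, R2], from (b)(iii)
    d_in_out = d_r1 * p_r1 + d_r2 * (1 - p_r1)    # E[D | A, B^c]
    numerator = (Fraction(2, 3)
                 - Fraction(2, 3) * a * a**4
                 - 2 * d_in_out * a**2 * (1 - a**2))
    return numerator / (1 - a**2) ** 2

if __name__ == "__main__":
    for a in (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)):
        print(a, w_bar_closed(a), w_bar_from_parts(a))
    print(w_bar_closed(Fraction(1, 2)))  # 20/27
```

Exact rationals are used instead of floats so that agreement between the two expressions is an equality rather than a tolerance check.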
Let us compute $\bar{W}(1)$ in another way. Consider the locations of the response unit and incident. There are 16 possible cases to consider:
• The response unit and the incident are on the same edge of the square (4 cases). The expected travel distance between two locations is $\frac{1}{3}$.
• The response unit and the incident are on adjacent edges of the square (8 cases). The expected travel distance between two locations is $\frac{1}{2} + \frac{1}{2} = 1$.
• The response unit and the incident are on opposite edges of the square (4 cases). The expected travel distance between two locations is $\frac{1}{3} + 1 = \frac{4}{3}$.
Since all cases are equally likely,
\[ \bar{W}(1) = \frac{4}{16}\cdot\frac{1}{3} + \frac{8}{16}\cdot 1 + \frac{4}{16}\cdot\frac{4}{3} = \frac{11}{12}. \]

5. LO Problem 3.14 (Kang, 2001)

[Figure: unit square with a shaded central square; the surrounding regions are labeled R1 through R8, and Z1 and Z2 mark the x-distances used in part (b).]

(a) Given that both $(X_1, Y_1)$ and $(X_2, Y_2)$ lie outside the central square (the event $A^c \cap B^c$), the cases where the perturbation term is strictly positive are:
• $(X_1, Y_1) \in R_1$ and $(X_2, Y_2) \in R_5$
• $(X_1, Y_1) \in R_5$ and $(X_2, Y_2) \in R_1$
• $(X_1, Y_1) \in R_3$ and $(X_2, Y_2) \in R_7$
• $(X_1, Y_1) \in R_7$ and $(X_2, Y_2) \in R_3$
The probability of the first case is computed by
\[
P((X_1, Y_1) \in R_1 \cap (X_2, Y_2) \in R_5 \mid A^c \cap B^c)
= \frac{P((X_1, Y_1) \in R_1 \cap (X_2, Y_2) \in R_5)}{P(A^c \cap B^c)}
= \frac{\left(\frac{a(1-a)}{2}\right)^2}{(1 - a^2)^2}
= \frac{a^2}{4(a+1)^2}.
\]
By symmetry, the probabilities of the other three cases are equal to $\frac{a^2}{4(a+1)^2}$. Therefore,
\[ P(\bar{W}_E(a) > 0) = 4 \times \frac{a^2}{4(a+1)^2} = \left(\frac{a}{a+1}\right)^2. \]

(b) Consider the case where $(X_1, Y_1) \in R_1$ and $(X_2, Y_2) \in R_5$. Note that in this case, there is no extra travel distance in the y direction. The travel distance in the x direction is given by
\[
D_x^B \mid R_1 \cap R_5 = \min(Z_1 + Z_2,\ 2a - Z_1 - Z_2) =
\begin{cases}
Z_1 + Z_2, & \text{if } Z_1 + Z_2 \le a, \\
2a - Z_1 - Z_2, & \text{otherwise,}
\end{cases}
\]
where $Z_1$ and $Z_2$ are the x-distances from the left edges of $R_1$ and $R_5$ to the response unit and the incident, respectively (see the figure above).
\[
E[D_x^B \mid R_1 \cap R_5] = \int_0^a \int_0^{a - z_1} (z_1 + z_2) f_{Z_1}(z_1) f_{Z_2}(z_2)\,dz_2\,dz_1 + \int_0^a \int_{a - z_1}^a (2a - z_1 - z_2) f_{Z_1}(z_1) f_{Z_2}(z_2)\,dz_2\,dz_1.
\]
Since $f_{Z_1}(z_1) = f_{Z_2}(z_2) = \frac{1}{a}$,
\[
\begin{aligned}
E[D_x^B \mid R_1 \cap R_5]
&= \frac{1}{a^2}\int_0^a \int_0^{a - z_1} (z_1 + z_2)\,dz_2\,dz_1 + \frac{1}{a^2}\int_0^a \int_{a - z_1}^a (2a - z_1 - z_2)\,dz_2\,dz_1 \\
&= \frac{1}{a^2}\int_0^a \left[z_1 z_2 + \frac{1}{2}z_2^2\right]_0^{a - z_1} dz_1 + \frac{1}{a^2}\int_0^a \left[2a z_2 - z_1 z_2 - \frac{1}{2}z_2^2\right]_{a - z_1}^a dz_1 \\
&= \frac{1}{2a^2}\int_0^a \left(a^2 - z_1^2\right) dz_1 + \frac{1}{a^2}\int_0^a \left(a z_1 - \frac{1}{2}z_1^2\right) dz_1 \\
&= \frac{1}{2a^2}\left[a^2 z_1 - \frac{1}{3}z_1^3\right]_0^a + \frac{1}{a^2}\left[\frac{1}{2}a z_1^2 - \frac{1}{6}z_1^3\right]_0^a \\
&= \frac{1}{3}a + \frac{1}{3}a = \frac{2}{3}a.
\end{aligned}
\]
The expected travel distance in the x direction without the square barrier is $\frac{1}{3}a$. Therefore the extra travel distance, given the perturbation term is positive, is $\frac{2}{3}a - \frac{1}{3}a = \frac{1}{3}a$. This gives
\[ \bar{W}_E(a) = \frac{1}{3}a\left(\frac{a}{a+1}\right)^2. \]
For $a = 1$, the expected travel distance with the barrier is therefore
\[ \bar{W}(1) + \bar{W}_E(1) = \frac{11}{12} + \frac{1}{12} = 1. \]
To support this result, consider again the locations of the response unit and incident when $a = 1$ (see Problem 3.13 (c)). There are 16 possible cases to consider. Here, we focus on the extra travel distance due to the zero-demand zone barrier, which is additionally required compared to Problem 3.13 (c).
• The response unit and the incident are on the same edge of the square (4 cases). The expected extra travel distance between two locations is 0.
• The response unit and the incident are on adjacent edges of the square (8 cases). The expected extra travel distance between two locations is also 0.
• The response unit and the incident are on opposite edges of the square (4 cases). Note that the expected travel distance between two locations was $\frac{1}{3} + 1 = \frac{4}{3}$. However, since no travel is allowed through the zero-demand zone, the expected travel distance becomes $\frac{2}{3} + 1$ (we can obtain $\frac{2}{3}$ by using the same procedure as we used in (b)). Hence the extra travel distance between two locations is $\frac{1}{3}$.
Since all cases are equally likely, $\bar{W}_E(1)$ is computed by
\[ \bar{W}_E(1) = \frac{4}{16}\cdot 0 + \frac{8}{16}\cdot 0 + \frac{4}{16}\cdot\frac{1}{3} = \frac{1}{12}. \]

6. LO Problem 3.18 (Kang, 2001)

For this problem, we employ the notation used in class, which is a little different from the notation in the textbook. Let $G(a) \equiv E[D^p] \equiv E[|X_1 - X_2|^p]$.
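To make the quantity G(a) concrete, it can be estimated numerically straight from its definition. The following is a minimal sketch, not part of the original solution: the function name `g_numeric` and the grid size `n` are hypothetical choices, and it assumes X1 and X2 are independent and uniform on [0, a]. It applies a midpoint rule to the double integral of |x1 − x2|^p against the joint density 1/a². For p = 1 the estimate should land close to a/3, the value quoted from class in Problem 3.13.

```python
def g_numeric(a, p, n=200):
    """Midpoint-rule estimate of E[|X1 - X2|^p] for independent X1, X2 ~ U[0, a].

    Hypothetical helper for illustration; n is the number of grid cells per axis.
    """
    h = a / n
    mids = [(i + 0.5) * h for i in range(n)]  # cell midpoints on [0, a]
    total = 0.0
    for x1 in mids:
        for x2 in mids:
            total += abs(x1 - x2) ** p
    # Each cell has area h*h and the joint density is 1/a^2.
    return total * h * h / (a * a)

if __name__ == "__main__":
    print(g_numeric(3.0, 1))  # close to a/3 = 1.0
```

A deterministic grid is used rather than random sampling so that repeated runs give the same number.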
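Let us consider $G(a + \varepsilon)$, that is, $E[D^p]$ when the highway segment under consideration is extended by $\varepsilon$, where $\varepsilon$ is very small. Suppose $a < X_1 \le a + \varepsilon$ and $0 \le X_2 \le a$. Since $X_1$ and $X_2$ are independent, $G(a + \varepsilon)$ for this case is computed as follows:
\[ G(a + \varepsilon) = E[(X_1 - X_2)^p] = \int_a^{a+\varepsilon} \int_0^a (x_1 - x_2)^p f_{X_2}(x_2) f_{X_1}(x_1)\,dx_2\,dx_1, \]
where $f_{X_1}(x_1)$ and $f_{X_2}(x_2)$ are the probability density functions of $X_1$ and $X_2$, respectively. Because $X_1$ and $X_2$ are uniformly distributed over $(a, a+\varepsilon]$ and $[0, a]$ respectively, $f_{X_1}(x_1) = \frac{1}{\varepsilon}$ and $f_{X_2}(x_2) = \frac{1}{a}$. Thus,
\[
\begin{aligned}
G(a + \varepsilon) &= \frac{1}{a\varepsilon}\int_a^{a+\varepsilon} \int_0^a (x_1 - x_2)^p\,dx_2\,dx_1 \\
&= \frac{1}{a\varepsilon}\int_a^{a+\varepsilon} \left[\frac{-1}{p+1}(x_1 - x_2)^{p+1}\right]_0^a dx_1 \\
&= \frac{1}{a\varepsilon}\cdot\frac{1}{p+1}\int_a^{a+\varepsilon} \left(x_1^{p+1} - (x_1 - a)^{p+1}\right) dx_1 \\
&= \frac{1}{a\varepsilon}\cdot\frac{1}{p+1}\left[\frac{1}{p+2}x_1^{p+2} - \frac{1}{p+2}(x_1 - a)^{p+2}\right]_a^{a+\varepsilon} \\
&= \frac{1}{a\varepsilon}\cdot\frac{1}{(p+1)(p+2)}\left((a + \varepsilon)^{p+2} - \varepsilon^{p+2} - a^{p+2}\right) \\
&= \frac{1}{a\varepsilon}\cdot\frac{1}{(p+1)(p+2)}\left((p+2)a^{p+1}\varepsilon + o(\varepsilon)\right),
\end{aligned}
\]
where $o(\varepsilon)$ represents higher order terms of $\varepsilon$ satisfying $\lim_{\varepsilon \to 0} \frac{o(\varepsilon)}{\varepsilon} = 0$ (“pathetic terms”). Clearly, $G(a + \varepsilon) \approx \frac{a^p}{p+1}$ as $\varepsilon \to 0$.

When $0 \le X_1 \le a$ and $a < X_2 \le a + \varepsilon$, we also have $G(a + \varepsilon) \approx \frac{a^p}{p+1}$ as $\varepsilon \to 0$ by symmetry. If $0 \le X_1 \le a$ and $0 \le X_2 \le a$, then $G(a + \varepsilon) = G(a)$. Finally, we do not have to compute $G(a + \varepsilon)$ for the case where $a < X_1 \le a + \varepsilon$ and $a < X_2 \le a + \varepsilon$ because the associated probability is negligible. The following table summarizes the values of $G(a + \varepsilon)$ by case.

Case                                      Probability of a case                                                                  G(a + ε) given a case
----------------------------------------  -------------------------------------------------------------------------------------  ---------------------
$0 \le X_1 \le a$, $0 \le X_2 \le a$      $\frac{a}{a+\varepsilon}\cdot\frac{a}{a+\varepsilon} = \left(\frac{a}{a+\varepsilon}\right)^2$      $G(a)$
$a < X_1 \le a+\varepsilon$, $0 \le X_2 \le a$      $\frac{\varepsilon}{a+\varepsilon}\cdot\frac{a}{a+\varepsilon} = \frac{\varepsilon a}{(a+\varepsilon)^2}$      $\frac{a^p}{p+1}$
$0 \le X_1 \le a$, $a < X_2 \le a+\varepsilon$      $\frac{a}{a+\varepsilon}\cdot\frac{\varepsilon}{a+\varepsilon} = \frac{\varepsilon a}{(a+\varepsilon)^2}$      $\frac{a^p}{p+1}$
$a < X_1 \le a+\varepsilon$, $a < X_2 \le a+\varepsilon$      $\frac{\varepsilon}{a+\varepsilon}\cdot\frac{\varepsilon}{a+\varepsilon} = \left(\frac{\varepsilon}{a+\varepsilon}\right)^2$      We do not care.

Using the total expectation theorem, we obtain
\[
\begin{aligned}
G(a + \varepsilon) &= G(a)\left(\frac{a}{a+\varepsilon}\right)^2 + \frac{a^p}{p+1}\cdot\frac{\varepsilon a}{(a+\varepsilon)^2} + \frac{a^p}{p+1}\cdot\frac{\varepsilon a}{(a+\varepsilon)^2} + o(\varepsilon) \\
&= G(a)\left(\frac{a}{a+\varepsilon}\right)^2 + \frac{2a^p}{p+1}\cdot\frac{\varepsilon a}{(a+\varepsilon)^2} + o(\varepsilon) \\
&\approx G(a)\left(\frac{a}{a+\varepsilon}\right)^2 + \frac{2a^p}{p+1}\cdot\frac{\varepsilon a}{(a+\varepsilon)^2}.
\end{aligned}
\]
Here the last case contributes a term of order $\varepsilon^2$, which is absorbed into $o(\varepsilon)$.

From the formula of the sum of an infinite geometric series, we know
\[ \frac{a}{a+\varepsilon} = \frac{1}{1 + \frac{\varepsilon}{a}} = 1 - \frac{\varepsilon}{a} + \left(\frac{\varepsilon}{a}\right)^2 - \left(\frac{\varepsilon}{a}\right)^3 + \cdots. \]
Ignoring higher order terms of $\varepsilon$, we get
\[ \frac{a}{a+\varepsilon} \approx 1 - \frac{\varepsilon}{a}. \]
This gives the following approximations:
\[
\begin{aligned}
\left(\frac{a}{a+\varepsilon}\right)^2 &\approx \left(1 - \frac{\varepsilon}{a}\right)^2 = 1 - \frac{2\varepsilon}{a} + \frac{\varepsilon^2}{a^2} \approx 1 - \frac{2\varepsilon}{a}, \\
\frac{\varepsilon a}{(a+\varepsilon)^2} &= \frac{\varepsilon}{a}\left(\frac{a}{a+\varepsilon}\right)^2 \approx \frac{\varepsilon}{a}\left(1 - \frac{2\varepsilon}{a}\right) = \frac{\varepsilon}{a} - \frac{2\varepsilon^2}{a^2} \approx \frac{\varepsilon}{a}.
\end{aligned}
\]
Therefore, we can rewrite $G(a + \varepsilon)$ as
\[ G(a + \varepsilon) \approx G(a)\left(1 - \frac{2\varepsilon}{a}\right) + \frac{2a^p}{p+1}\cdot\frac{\varepsilon}{a} = G(a)\left(1 - \frac{2\varepsilon}{a}\right) + \frac{2a^{p-1}\varepsilon}{p+1}. \]
Rearranging terms, we have
\[ \frac{G(a + \varepsilon) - G(a)}{\varepsilon} \approx -\frac{2G(a)}{a} + \frac{2a^{p-1}}{p+1}. \]
If $\varepsilon \to 0$, we have the following differential equation:
\[ G'(a) = -\frac{2G(a)}{a} + \frac{2a^{p-1}}{p+1}. \]
“Judicious” guesses (or consultation with books on differential equations) lead us to the following solution:
\[ G(a) \equiv E[D^p] = \frac{2a^p}{(p+1)(p+2)}. \]
We can skip the derivation of the differential equation by directly using Equation (3.64) in the textbook. Once we obtain $G(a + \varepsilon)$, we can plug it into (3.64), which gives the same differential equation as above.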
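Since the solution is reached by a “judicious” guess, it is worth confirming that the guess actually satisfies the differential equation. The sketch below is an illustrative check rather than part of the original solution: the function names are hypothetical, the derivative of the closed form is taken by hand with the power rule, and exact rational arithmetic is used for a few positive integer values of p. It compares G'(a) with the right-hand side −2G(a)/a + 2a^(p−1)/(p+1).

```python
from fractions import Fraction

def g_closed(a, p):
    """Proposed solution G(a) = 2 a^p / ((p + 1)(p + 2))."""
    return 2 * a**p / ((p + 1) * (p + 2))

def g_prime(a, p):
    """Derivative of g_closed in a, computed by hand with the power rule."""
    return 2 * p * a ** (p - 1) / ((p + 1) * (p + 2))

def ode_rhs(a, p):
    """Right-hand side of the ODE: -2 G(a) / a + 2 a^(p-1) / (p + 1)."""
    return -2 * g_closed(a, p) / a + 2 * a ** (p - 1) / (p + 1)

if __name__ == "__main__":
    for p in range(1, 6):
        a = Fraction(7, 5)  # arbitrary positive test point
        print(p, g_prime(a, p) == ode_rhs(a, p))  # True for each p
    print(g_closed(Fraction(3), 1))  # 1, i.e. a/3 with a = 3
```

For p = 1 the closed form reduces to a/3, which matches the expected distance between two uniform points on a segment of length a used in Problem 3.13.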