Dynamic Multidrug Therapies for HIV: A Control Theoretic Approach

Lawrence M. Wein, Stefanos Zenios, and Martin A. Nowak

#3874-95-MSA, December 1995

Lawrence M. Wein
Sloan School of Management, M.I.T.
Cambridge, MA 02139

Stefanos Zenios
Operations Research Center, M.I.T.
Cambridge, MA 02139

Martin A. Nowak
Department of Zoology, University of Oxford
Oxford OX1 3PS, UK

ABSTRACT

Motivated by the inability of current drug treatment to provide long-term benefit to HIV-infected individuals, we derive HIV therapeutic strategies by formulating and analyzing a mathematical control problem. The model tracks the dynamics of uninfected and infected CD4+ cells and free virus, and allows the virus to mutate into various strains. At each point in time, several different therapeutic options are available, where each option corresponds to a combination of reverse transcriptase inhibitors. The controller observes the individual's current status and chooses among the therapeutic options in a dynamic fashion in order to minimize the total viral load. Our initial numerical results suggest that dynamic therapies have the potential to significantly outperform the static protocols that are currently in use; by anticipating and responding to the disease progression, the dynamic strategy reduces the total free virus, increases the uninfected CD4+ count, and delays the emergence of drug-resistant strains.

December 1995

Submitted to Journal of Theoretical Biology

1. Introduction

Optimal treatment of the human immunodeficiency virus of type 1 (HIV) infection is the subject of intense research activity. Rapid progress has been made in the development and testing of anti-HIV therapeutic agents, resulting in the approval by the Food and Drug Administration of five reverse transcriptase (RT) inhibitors (AZT,
ddI, ddC, d4T, 3TC) and one protease inhibitor (saquinavir). These potent drugs inhibit viral replication and lead to a rapid decline in viral abundance, often within days after starting therapy. Unfortunately, these drugs have had only limited success in delaying the onset of AIDS: continual viral replication of HIV, together with the high error rate of reverse transcription of viral RNA into DNA, leads to the emergence of drug-resistant virus strains, typically within weeks or months (depending upon the drug) of treatment initiation (McLeod & Hammer, 1992; Lagakos et al., 1993; Richman et al., 1994; Volberding et al., 1994; Wei et al., 1995). Moreover, multidrug resistance has been documented in combinations of RT inhibitors (Larder et al., 1993; Shafer et al., 1994, 1995; Kojima et al., 1995; Schuurman et al., 1995) and combinations of protease inhibitors (Condra et al., 1995), although Larder et al. (1995) have recently shown that the AZT-3TC combination is able to sustain in vivo antiviral effects for at least 24 weeks.

It would appear that increased effectiveness of HIV therapy could be achieved by developing dynamic multidrug approaches, where the combination of drugs received by a patient changes over time in response to the disease progression (e.g., current viral load, CD4+ count, mix of viral strains). In this paper we mathematically model the dynamic multiple drug therapy problem as an optimal control problem that can be informally described as follows: choose an optimal mix of drugs at each point in time in order to minimize the total viral load over some time horizon (e.g., one year) subject to a toxicity constraint on the drug regimen.

Several mathematical models have been developed that incorporate the effects of therapy on HIV-infected individuals.
In a series of papers (Perelson, 1989; Perelson et al., 1993; Kirschner & Perelson, 1994; Kirschner & Webb, 1995a; Kirschner et al., 1995), Perelson and Kirschner and their colleagues have studied the timing, frequency and intensity of AZT treatment. Agur (1989) focuses on the optimal tradeoff between the toxicity and efficacy of AZT. The mathematical models considered in these studies do not allow for virus mutation. Nowak et al. (1991), McLean & Nowak (1992), Nowak & May (1993), Frost & McLean (1994) and Kirschner & Webb (1995b) analyze descriptive (as opposed to optimization) models for the competitive interaction of AZT-sensitive and AZT-resistant strains of HIV; the latter three papers also include numerical simulations of alternating and/or combination therapies.

Our deterministic control problem, which is described in Section 2, considers a finite number of virus strains, or quasispecies, and allows mutations from one strain to another. The model incorporates uninfected CD4+ T-cells, and infected CD4+ cells and infectious free virus associated with each virus strain; in this context, the RT inhibitors prevent the free virus from successfully infecting uninfected CD4+ cells. A finite number of therapeutic options are allowed, where each option consists of the simultaneous application of one or more different RT inhibitors. Each drug option has a different efficacy against each virus strain, thereby allowing for complex drug-virus interactions. Because of the high dimensionality of the control problem, we resort to approximation methods in Section 3; more specifically, perturbation methods and the policy improvement algorithm are used to derive a closed form dynamic policy. Several other types of therapeutic approaches, such as protease inhibitors and the reconstitution of the immune system, are considered in Section 4. Results from numerical simulations are reported in Section 5 and discussed in Section 6. Concluding remarks can be found in Section 7.

2.
Problem Formulation

Our mathematical model incorporates $I$ different strains of HIV. For $i = 1, \ldots, I$, let $y_i(t)$ be the density of CD4+ cells infected by strain $i$ at time $t$, and let $v_i(t)$ denote the density of infectious free virus of strain $i$; noninfectious virions are ignored in our model. If we define $x(t)$ to be the density of uninfected CD4+ cells at time $t$, then the state of the system at time $t$ is given by $(x(t), y_1(t), \ldots, y_I(t), v_1(t), \ldots, v_I(t))$, which is denoted by $(x(t), y_i(t), v_i(t))$.

The controller has $J$ therapeutic options denoted by $j = 1, \ldots, J$, where each option corresponds to a combination of RT inhibitors. For example, $j = 1$ might represent AZT used with ddI, and $j = 2$ might be a combination of AZT, ddC and d4T. Our control variables are $d_j(t)$, which is the amount of option $j$ to use at time $t$. We assume that each virus strain has its own infectivity rate, $\tilde{\beta}_i$, which is the rate that it infects uninfected CD4+ cells; the reason for the "tilde" will become clear in the next section. The RT inhibitors reduce virus infections by blocking infectivity in the following manner. For $i = 1, \ldots, I$ and $j = 1, \ldots, J$, let $p_{ji}$ denote the efficacy of drug combination $j$ in blocking new infections by virus strain $i$. Under a generic drug policy $d_j(t)$, the infectivity of virus strain $i$ at time $t$ is $\tilde{\beta}_i [1 - \sum_{j=1}^{J} p_{ji} d_j(t)]$; we assume that the values of $p_{ji}$ and the toxicity parameters in equation (4) are chosen so that the infectivity of each strain is nonnegative under all feasible therapeutic strategies.

The dynamics of our system are described by the following set of ordinary differential equations:

$$\dot{x}(t) = \lambda - \Big(\mu + \sum_{i=1}^{I} \tilde{\beta}_i v_i(t) \Big[1 - \sum_{j=1}^{J} p_{ji} d_j(t)\Big]\Big) x(t), \tag{1}$$

$$\dot{y}_i(t) = \Big(\sum_{k=1}^{I} q_{ki} \tilde{\beta}_k v_k(t) \Big[1 - \sum_{j=1}^{J} p_{jk} d_j(t)\Big]\Big) x(t) - a_i y_i(t), \tag{2}$$

$$\dot{v}_i(t) = \pi_i y_i(t) - [k_i + \tilde{\beta}_i x(t)] v_i(t). \tag{3}$$

The rate at which uninfected CD4+ cells are invaded by virus strain $i$ at time $t$ is $\tilde{\beta}_i v_i(t) x(t)$,
and each of these potential infections leads to a reduction in free virus. The rate of successful infections by strain $i$ is $\tilde{\beta}_i [1 - \sum_{j=1}^{J} p_{ji} d_j(t)] v_i(t) x(t)$, and these infections cause a simultaneous decline in uninfected cells $x(t)$ and rise in infected cells. The mutation rate $q_{ij}$ in (2) is the fraction of reverse transcriptions of strain $i$ that result in a CD4+ cell infected by strain $j$. Hence, $\sum_{j=1}^{I} q_{ij} = 1$ and the diagonal terms of the mutation matrix are close to one in value, while the off-diagonal terms are nearly zero. Strain $i$ free virus is produced at rate $\pi_i$ by a cell infected by strain $i$, and thus free virus of strain $i$ replicates at rate $\pi_i y_i(t)$; $\pi_i$ has an alternative interpretation in terms of a lytic burst: $\pi_i / a_i$ virions are produced during the death of an infected cell. Notice that the quantities $p_{ji}$ can capture a wide range of drug-virus interactions, including both synergistic and antagonistic effects of combination therapy, as observed by St. Clair et al. (1991) and Larder et al. (1995), and surveyed by Wilson and Hirsch (1995).

Uninfected CD4+ cells increase because of (exponential) proliferation in peripheral tissue compartments (e.g., secondary lymphoid organs) and/or (linear) production from a pool of precursors (e.g., the thymus). For simplicity, we assume a constant production rate $\lambda$. In addition, we let $\mu$, $a_i$ and $k_i$ denote the respective natural death rates of uninfected CD4+ cells, cells infected by strain $i$ and free virus of strain $i$.

Because the primary focus of this paper is on therapeutic regimens and not on natural disease progression, we purposely do not incorporate the human immune response into the model. Hence, the mutants are assumed to escape from the drugs, not from the immune response.
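To make the bookkeeping in equations (1)-(3) concrete, the following is a minimal sketch of their right-hand side in plain Python. The function name `hiv_rhs` and the nested-list layout of `q` and `p` are illustrative assumptions, not notation from the paper; `q[k][i]` is read as the fraction of reverse transcriptions of strain k that yield a cell infected by strain i, and `p[j][i]` as the efficacy of drug combination j against strain i.

```python
def hiv_rhs(x, y, v, d, lam, mu, a, k, pi, beta, q, p):
    """Right-hand side of equations (1)-(3) for I strains and J drug options.

    x: uninfected CD4+ density; y[i], v[i]: infected cells and free virus of strain i
    d[j]: current dose of drug combination j
    beta[i]: infectivity (the "tilde" beta of the text); other names follow the text
    """
    n_strains, n_drugs = len(y), len(d)
    # Rate of successful infections by strain i:
    # beta_i * [1 - sum_j p_ji d_j] * v_i * x
    infections = [
        beta[i] * (1.0 - sum(p[j][i] * d[j] for j in range(n_drugs))) * v[i] * x
        for i in range(n_strains)
    ]
    # Equation (1): production, natural death, and loss to infection.
    dx = lam - mu * x - sum(infections)
    # Equation (2): infections by strain m are redistributed by the mutation matrix.
    dy = [
        sum(q[m][i] * infections[m] for m in range(n_strains)) - a[i] * y[i]
        for i in range(n_strains)
    ]
    # Equation (3): every potential infection (blocked or not) removes a virion.
    dv = [pi[i] * y[i] - (k[i] + beta[i] * x) * v[i] for i in range(n_strains)]
    return dx, dy, dv
```

Because each row of the mutation matrix sums to one, the sketch satisfies dx + sum(dy) = lam - mu*x - sum(a_i y_i): infection only moves cells between compartments.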
The model also ignores latently infected cells; although most plasma virus comes from actively infected cells (Ho et al., 1995; Wei et al.), this does not imply that latently infected cells are unimportant for the emergence of drug resistance. Finally, our model does not incorporate the new data from Shaw (1995), which suggests that the lymph system is the location of considerable production of plasma virus and many new infections.

We impose the toxicity and nonnegativity constraints

$$\sum_{j=1}^{J} \tau_j d_j(t) \le 1, \tag{4}$$

$$d_j(t) \ge 0, \tag{5}$$

where $\tau_j$ is a (normalized) measure of the toxicity per unit time of drug combination $j$. Hence, we assume that toxicity is additive across drugs, and that the toxicity threshold is constant over time (and is taken to be unity, without loss of generality). However, because the policies that emerge from our analysis typically use only one therapeutic option at each point in time, the additivity assumption in (4) is innocuous. Moreover, the dynamic policy can be easily adapted to the more realistic setting where the toxicity threshold is an exogenously specified function of either time (e.g., six months of intense therapy alternated with six months of light therapy) or the severity of the patient's side effects: we propose that the drug combination dictated by the index policy in the next section be employed at each point in time, but the dosage should be altered accordingly.

Let $T$ be the time horizon. Then the mathematical control problem is to choose the controls $\{d_j(t), t \ge 0\}$ to minimize

$$\int_0^T \sum_{i=1}^{I} v_i(t)\, dt \tag{6}$$

subject to (1)-(5).

Our control problem assumes that the decision maker, in choosing $d_j(t)$ at time $t$, can observe the current state $(x(t), y_i(t), v_i(t))$. Although recent technology makes it possible to quantify virus load, CD4+ cell counts and virus-infected cells in the blood of HIV-infected individuals, these techniques are not currently available for day-to-day treatment of infected patients at large. Moreover, quantification of virus load and infected cells in the lymph system is even more complicated.
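As an illustration of the constraints (4)-(5) and the objective (6), the sketch below checks whether a dose vector is admissible and approximates the total viral load from a sampled trajectory. The function names, the numerical tolerance and the use of the trapezoidal rule are assumptions made for this example only.

```python
def is_feasible(d, tau, tol=1e-12):
    """Return True if the dose vector d satisfies constraints (4) and (5)."""
    nonnegative = all(dj >= -tol for dj in d)              # constraint (5)
    toxicity = sum(tj * dj for tj, dj in zip(tau, d))      # left side of (4)
    return nonnegative and toxicity <= 1.0 + tol


def total_viral_load(times, v_samples):
    """Trapezoidal approximation of objective (6).

    times[n] is the n-th sample time; v_samples[n] lists v_i at that time.
    """
    total = 0.0
    for n in range(1, len(times)):
        dt = times[n] - times[n - 1]
        total += 0.5 * dt * (sum(v_samples[n]) + sum(v_samples[n - 1]))
    return total
```

With unit toxicities, a single drug at full dose and an even split between two drugs are both feasible, whereas giving both drugs at full dose violates (4).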
A more realistic control problem for current day-to-day treatment would allow the controller to see only a partial observation of the state (e.g., only the total CD4+ count $x(t) + \sum_{i=1}^{I} y_i(t)$ and the total virus $\sum_{i=1}^{I} v_i(t)$); problem (1)-(6) then becomes a nonlinear filtering problem; although partial observations greatly complicate the problem, reasonably good policies (e.g., the linearized Kalman filter on page 161 of Elliott, Aggoun and Moore, 1994) can be derived. However, because we are primarily interested in the potential value of employing dynamic multidrug therapies, we do not pursue this line of inquiry and hereafter assume that the controller can observe $(x(t), y_i(t), v_i(t))$ at time $t$.

3. Analysis

The control problem (1)-(6) does not appear to admit a closed form solution. Moreover, standard numerical techniques based on Pontryagin's maximum principle (Pontryagin et al., 1962) cannot solve a problem of realistic size (e.g., five drug options and 30 virus strains). Hence, we resort to an approximate method, which employs perturbation methods in conjunction with ideas from dynamic programming.

3.1. Overview. Let $V(x, y_i, v_i, t)$ denote the cost incurred (as given in (6)) from time $t$ to time $T$ under the optimal policy, given that the state of the system at time $t$ is $(x, y_i, v_i)$. For ease of notation, we suppress the dependence of $V$ on its arguments. Then the dynamic programming optimality equation (Bellman, 1957) is

$$\frac{\partial V}{\partial t} + \min_{\{d \in \Omega\}} \Big\{ \sum_{i=1}^{I} v_i + \frac{\partial V}{\partial x} \Big[\lambda - \Big(\mu + \sum_{i=1}^{I} \tilde{\beta}_i v_i \Big[1 - \sum_{j=1}^{J} p_{ji} d_j\Big]\Big) x\Big] + \sum_{i=1}^{I} \frac{\partial V}{\partial y_i} \Big[\Big(\sum_{k=1}^{I} q_{ki} \tilde{\beta}_k v_k \Big[1 - \sum_{j=1}^{J} p_{jk} d_j\Big]\Big) x - a_i y_i\Big] + \sum_{i=1}^{I} \frac{\partial V}{\partial v_i} \Big[\pi_i y_i - (k_i + \tilde{\beta}_i x) v_i\Big] \Big\} = 0, \tag{7}$$

where $\Omega$ denotes the set of admissible controls that satisfy (4)-(5). Our basic approach is as follows: we use asymptotic methods to obtain closed form approximate expressions for the $(2I+1)$-dimensional process $\{(x(t), y_i(t), v_i(t))\}$ under a static control policy (i.e., $d_j(t) = d_j$ for all $t$).
From these expressions, we derive a closed form expression for $V$ which has the same interpretation as the $V$ in (7), except that the (approximate) cost is under the static policy, not the optimal policy. Then one iteration of the policy improvement algorithm (Howard, 1960) is performed: we differentiate our approximate closed form expression for $V$ with respect to $x$ and $y_i$, substitute these derivatives into the right side of (7) and perform the embedded minimization in this equation. This approach yields a closed form dynamic drug control policy: it specifies how much of each drug combination to use at each point in time in terms of the current state, the current time, and all the problem parameters. If our closed form expression for $V$ was exact, then dynamic programming theory would imply that our proposed policy was better (yields a lower cost in (6)) than the static policy (in fact, if our expression for $V$ was exact then repeated policy improvement iterations would generate a sequence of policies that converges to the optimal policy). However, our expression is not exact, and we cannot draw this conclusion. Nevertheless, this philosophy (finding an approximate value for $V$ and performing one iteration of the policy improvement algorithm) has been used with considerable success in designing dynamic call acceptance/rejection protocols for stochastic models of telephone traffic (e.g., Ott and Krishnan, 1985; Key, 1990).

3.2. The Optimal Static Policy. Because our proposed dynamic policy takes a static policy as its starting point, it is natural to employ the optimal static policy. The static optimization problem is given by the following nonlinear program:

$$\text{minimize}_{\,x,\, y_i,\, v_i,\, d_j} \quad \sum_{i=1}^{I} v_i \tag{8}$$

subject to

$$\lambda - \Big(\mu + \sum_{i=1}^{I} \tilde{\beta}_i v_i \Big[1 - \sum_{j=1}^{J} p_{ji} d_j\Big]\Big) x = 0, \tag{9}$$

$$\Big(\sum_{k=1}^{I} q_{ki} \tilde{\beta}_k v_k \Big[1 - \sum_{j=1}^{J} p_{jk} d_j\Big]\Big) x - a_i y_i = 0, \tag{10}$$

$$\pi_i y_i - [k_i + \tilde{\beta}_i x] v_i = 0, \tag{11}$$

$$\sum_{j=1}^{J} \tau_j d_j \le 1, \tag{12}$$

$$d_j \ge 0. \tag{13}$$
We have been unable to find a closed form solution to (8)-(13); however, standard numerical techniques (e.g., Avriel, 1976) can be used to solve this problem. If the additivity constraint (12) is not realistic then one can change constraint (13) to $d_j = 0$ or $\tau_j^{-1}$ for all $j$; this new problem can be solved by direct enumeration of all $J + 1$ feasible solutions.

3.3. Asymptotic Analysis. Let $d^*$ denote the optimal static policy that solves (8)-(13). Under this policy, the infectivity of virus $i$ is $\tilde{\beta}_i \delta_i$, where $\delta_i = 1 - \sum_{j=1}^{J} p_{ji} d_j^*$. In this subsection we find a closed form approximate expression for the system trajectory under the optimal static policy. Consider equations (1)-(3), with $\delta_i$ taking the place of $1 - \sum_{j=1}^{J} p_{ji} d_j(t)$. Keeping in mind that we will use the solution to these equations to derive $V$, let us consider the initial conditions (in this subsection, $s$ is a generic time index and $t$ denotes a specific point in time)

$$x(t) = x, \quad y_i(t) = y_i, \quad v_i(t) = v_i. \tag{14}$$

To perform the perturbation analysis, we introduce the parameter $\epsilon$. Without loss of generality, let us assume that $\tilde{\beta}_1 = \min_{\{1 \le i \le I\}} \tilde{\beta}_i$. Then we let $\epsilon = \tilde{\beta}_1$ and define

$$\beta_i = \frac{\tilde{\beta}_i}{\epsilon}. \tag{15}$$

Although $\beta_1 = 1$, we retain this parameter in the model. The asymptotic analysis is based on the assumption that $\epsilon$ is small in value (i.e., much less than one). Typical values in the literature for the infectivity parameter $\tilde{\beta}$ are roughly $10^{-5}$; hence, the approximation should perform well.

We assume that our solution is of the form:

$$x(s) = x^{(0)}(s) + \epsilon x^{(1)}(s), \quad y_i(s) = y_i^{(0)}(s) + \epsilon y_i^{(1)}(s), \quad v_i(s) = v_i^{(0)}(s) + \epsilon v_i^{(1)}(s). \tag{16}$$

Although we could define and solve for higher order terms, just deriving the first order terms is quite cumbersome, at least without a computer. We use (15) to replace $\tilde{\beta}_i$ by $\epsilon \beta_i$ and use (16) to substitute in for the system state in equations (1)-(3) and (14). To solve for the unknown processes on the right side of (16), we collect terms of order $\epsilon^i$ for $i = 0, 1$.
Collecting the constant (i.e., $\epsilon^0$) terms yields the following system of differential equations:

$$\dot{x}^{(0)}(s) = \lambda - \mu x^{(0)}(s), \tag{17}$$

$$\dot{y}_i^{(0)}(s) = -a_i y_i^{(0)}(s), \tag{18}$$

$$\dot{v}_i^{(0)}(s) = \pi_i y_i^{(0)}(s) - k_i v_i^{(0)}(s), \tag{19}$$

subject to the initial conditions

$$x^{(0)}(t) = x, \quad y_i^{(0)}(t) = y_i, \quad v_i^{(0)}(t) = v_i. \tag{20}$$

It is worth noting that the mathematically perturbed system (17)-(20) corresponds precisely to the physical perturbation performed when giving potent RT inhibitors. In Section 4, we use data from Wei et al., Ho et al. and Perelson et al. (1995), who used RT and protease inhibitors to perturb the system, to estimate values for the various model parameters.

The solution to equations (17)-(20) is

$$x^{(0)}(s) = \frac{\lambda}{\mu} + \Big(x - \frac{\lambda}{\mu}\Big) e^{-\mu(s-t)}, \tag{21}$$

$$y_i^{(0)}(s) = y_i e^{-a_i(s-t)}, \tag{22}$$

$$v_i^{(0)}(s) = \frac{\pi_i y_i}{k_i - a_i} e^{-a_i(s-t)} + \Big(v_i - \frac{\pi_i y_i}{k_i - a_i}\Big) e^{-k_i(s-t)}. \tag{23}$$

The order $\epsilon$ terms lead to the following system of differential equations:

$$\dot{x}^{(1)}(s) = -\mu x^{(1)}(s) - x^{(0)}(s) \sum_{i=1}^{I} \beta_i \delta_i v_i^{(0)}(s), \tag{24}$$

$$\dot{y}_i^{(1)}(s) = x^{(0)}(s) \Big(\sum_{k=1}^{I} q_{ki} \beta_k \delta_k v_k^{(0)}(s)\Big) - a_i y_i^{(1)}(s), \tag{25}$$

$$\dot{v}_i^{(1)}(s) = \pi_i y_i^{(1)}(s) - k_i v_i^{(1)}(s) - \beta_i x^{(0)}(s) v_i^{(0)}(s), \tag{26}$$

subject to the initial conditions

$$x^{(1)}(t) = 0, \quad y_i^{(1)}(t) = 0, \quad v_i^{(1)}(t) = 0. \tag{27}$$

Now we substitute the solution (21)-(23) into (24)-(26) and solve equations (24)-(27) sequentially. The solution is given in Appendix A.

3.4. The Proposed Dynamic Policy. With the solution to the asymptotic systems (17)-(20) and (24)-(27) in hand, we can estimate the value function by $\int_t^T \sum_{i=1}^{I} (v_i^{(0)}(s) + \epsilon v_i^{(1)}(s))\, ds$. This integration is carried out in Appendix B. The value function is greatly simplified if we assume that the time horizon is independent of the time horizon. In practice, the time horizon $T$ is very large (i.e., in years) in relation to the time scale of the systems dynamics, which change on a daily basis (Ho et al.; Wei et al.); hence, this is a natural assumption. Consequently, we set $T = \infty$ and $t = 0$ in (46) to obtain a value function (and hence a policy) that is independent of the time horizon; this approximate, stationary value function is denoted by $V_c$, to distinguish it from the value function in (7). Differentiating (46) with respect to $x$ and $y_i$ yields (with $T = \infty$ and $t = 0$)

$$\frac{\partial V_c}{\partial x} = \epsilon \sum_{i=1}^{I} \beta_i \Big(\sum_{j=1}^{I} \frac{\pi_j q_{ij} \delta_i}{a_j k_j} - \frac{1}{k_i}\Big) \Big[\frac{v_i}{k_i + \mu} + \frac{\pi_i y_i}{(a_i + \mu)(k_i + \mu)}\Big], \tag{28}$$

$$\frac{\partial V_c}{\partial y_i} = \frac{\pi_i}{a_i k_i} + \epsilon \beta_i \pi_i \Big(\sum_{j=1}^{I} \frac{\pi_j q_{ij} \delta_i}{a_j k_j} - \frac{1}{k_i}\Big) \Big[\frac{\lambda}{a_i k_i \mu} + \frac{x - \lambda/\mu}{(a_i + \mu)(k_i + \mu)}\Big]. \tag{29}$$

Now we can substitute these derivatives into (7) and perform the minimization. Because the function to be minimized is linear in our controls $d_j(t)$, the solution $d^*(t)$ is one of the (at most $J + 1$) extreme points of the constraint set (4)-(5), and can be found in closed form. Let us define the dynamic quantities

$$c_i(t) = \beta_i v_i(t) \Big[\sum_{j=1}^{I} q_{ij} \frac{\partial V_c}{\partial y_j} - \frac{\partial V_c}{\partial x}\Big], \tag{30}$$

where the derivatives are given by (28)-(29) evaluated at the current state $(x(t), y_i(t), v_i(t))$. Then the proposed policy is: apply no therapy (i.e., $d_j^*(t) = 0$ for $j = 1, \ldots, J$) if

$$\max_{\{1 \le j \le J\}} \frac{\sum_{i=1}^{I} p_{ji} c_i(t)}{\tau_j} \le 0; \tag{31}$$

otherwise, use drug combination $j^*$ (i.e., $d_{j^*}^*(t) = \tau_{j^*}^{-1}$, $d_j^*(t) = 0$ for $j \ne j^*$), where

$$j^* = \arg\max_{\{1 \le j \le J\}} \frac{\sum_{i=1}^{I} p_{ji} c_i(t)}{\tau_j}. \tag{32}$$

We conclude this subsection with several remarks. The proposed therapeutic strategy is a dynamic index policy: drug combination $j$ has the dynamic index $\sum_{i=1}^{I} p_{ji} c_i(t) / \tau_j$, and at each point in time the policy uses the drug combination that possesses the largest index. The quantity $c_i(t)$ essentially represents the marginal increase in the total future viral load if we let one more CD4+ cell get infected by virus strain $i$ at time $t$. The index for drug combination $j$ is computed by weighting the dynamic marginal cost for each virus by the efficiency of drug combination $j$ for that virus, summing up over the virus strains, and dividing by the drug combination's toxicity.
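The selection rule in (31)-(32) can be sketched in a few lines once the marginal costs $c_i(t)$ are available. The sketch below takes those costs as given inputs and does not attempt to evaluate the closed form behind them; the function name `index_policy`, the list layout of `p` (`p[j][i]` is the efficacy of option j against strain i) and the tie-breaking rule (lowest index wins) are assumptions made for illustration.

```python
def index_policy(c, p, tau):
    """Choose doses from marginal costs c[i], efficacies p[j][i], toxicities tau[j].

    Returns (doses, indices): doses is all zeros when no index is positive;
    otherwise the option with the largest index is given at dose 1/tau[j].
    """
    n_drugs = len(tau)
    # Dynamic index of option j: sum_i p_ji * c_i / tau_j
    indices = [
        sum(p[j][i] * c[i] for i in range(len(c))) / tau[j]
        for j in range(n_drugs)
    ]
    best = max(range(n_drugs), key=lambda j: indices[j])
    doses = [0.0] * n_drugs
    if indices[best] > 0.0:
        doses[best] = 1.0 / tau[best]   # full dose up to the toxicity threshold
    return doses, indices
```

For example, with two strains, efficacies of 0.95 on the diagonal and 0.05 off it, and unit toxicities, the rule picks the drug aimed at whichever strain currently has the larger marginal cost.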
Hence, our dynamic policy uses information on the effectiveness of each drug against each virus, the current potential cost (in terms of total future viral load) of a new infection by each virus strain, and the toxicity of each drug combination, and summarizes this information in a succinct manner. An important advantage of index policies is their ease of use: the complexity of the derivation and implementation of an index policy is independent of the problem size; hence, policy (30)-(32) can easily be derived for a problem with 20 drug combinations and 150 virus strains. Although this policy is not the optimal solution to problem (1)-(6), the optimal solution to dynamic resource allocation problems is often characterized by index policies (e.g., Gittins, 1989).

Notice that if we took $\epsilon = 0$ in (30) then the policy would be independent of $x$ and $y_i$, whereas the proposed policy depends on the entire $(2I + 1)$-dimensional system state; this suggests that incorporating only the $\epsilon^0$ terms leads to a rather crude policy. Also, one of the drug combinations would always be administered (i.e., inequality (31) never holds) if $\epsilon = 0$. Inequality (31) was never satisfied in any of our numerical runs: the proposed dynamic policy always used one of the drug combinations.

It is possible to implement further iterations of the policy improvement algorithm. Let us denote the proposed policy in (30)-(32) by $d_j^{(1)}(t)$; although this control is state-dependent, it can be expressed solely as a function of time because the system is deterministic. If we define $\delta_i^{(1)}(t) = 1 - \sum_{j=1}^{J} p_{ji} d_j^{(1)}(t)$ then this quantity can be used as our starting policy for the next policy improvement iteration. Turning to the asymptotic analysis, we observe that the $\epsilon^0$-order system (17)-(20) and its solution (21)-(23) are independent of the control. If we substitute $\delta_i^{(1)}(t)$ in for $\delta_i$ in equations (24)-(25),
then the $\epsilon$-order system (24)-(27) is still a set of linear differential equations that can be easily solved numerically using the matrix exponential (Golub and Van Loan, 1989). Then we can carry out the calculations in Appendix B and equations (28)-(29) on a computer, and perform the minimization in (7) to get a new policy $d_j^{(2)}(t)$. Of course, higher order terms in the asymptotic expansion can also be performed relatively easily with a computer, and it is conceivable that, with enough higher order terms and enough policy iterations (typically, only a handful of the latter is required to get close to optimality), such a procedure would generate a policy that is very close to optimal. However, because the proposed policy in (30)-(32) performed well in our numerical study, we have not pursued this computational approach.

3.5. A Symmetric Case. To gain a better understanding of the proposed policy, we focus on the symmetric case where $I = J$, $a_i = a$, $k_i = k$, $\pi_i = \pi$, $\beta_i = \beta$, $\tau_j = 1$, $p_{ii} = \bar{p}$, $p_{ji} = p$ for $i \ne j$ (where $\bar{p} > p$), $q_{ii} = q$ and $q_{ij} = (1 - q)/(I - 1)$ for $i \ne j$. Hence, each virus strain has the same parameter values, and drug combination $i$ is targeted at strain $i$. Also, let $y = \sum_{i=1}^{I} y_i$ and $v = \sum_{i=1}^{I} v_i$ denote the total number of infected cells and the total free virus, respectively. Symmetry arguments imply that one of the two symmetric extreme point static policies is optimal (although perhaps not uniquely so): $d_j(t) = 0$ for all $j$ and $t$, or $d_j(t) = 1/I$ for all $j$ and $t$. Then $\delta_i = 1 - \hat{p}$ for all $i$, where $\hat{p}$ is the "average efficacy" of the static drug policy, which is zero for the first policy and $[\bar{p} + (I - 1)p]/I$ for the second policy. The best static policy (i.e., the one minimizing (8)) is characterized in terms of the problem parameters by equation (33), which is not legible in the source.
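The solution in the symmetric case reduces to: if $\pi(1 - \hat{p}) < a$ then apply no treatment if condition (34) holds; if $\pi(1 - \hat{p}) > a$ then apply no drugs if condition (35) holds. (The displayed inequalities (34) and (35), which involve $v$ and $x - y$, are not legible in the source.) Otherwise administer drug combination $j$, where $v_j(t) \ge v_i(t)$ for $i = 1, \ldots, I$. Hence, the proposed policy in the symmetric case always applies the drug combination that corresponds to the most prevalent virus. This therapeutic strategy certainly seems reasonable, although not obvious nor necessarily optimal. Also, the drug/no drug decision in (34)-(35) depends only on the relative value of the amount of free virus $v$ and the difference between the number of uninfected and infected cells. Hence, the "switching curve" (between the drug/no drug decision) in three dimensions (i.e., $x$, $y$ and $v$) can actually be expressed as a straight line in two dimensions.

4. Alternative Therapies

The tedious part of the analysis in Section 3 is the perturbation analysis that leads to the derivatives of the value function for a generic static policy. Now that these derivatives have been estimated, it is a relatively simple matter to consider other types of therapies. Here are two examples; much of the previous notation is reused.

Protease Inhibitors. Protease inhibitors render newly produced virions non-infectious. Suppose we have $J$ combinations of protease inhibitors, and the controller must decide how much of each to use subject to $\sum_{j=1}^{J} \tau_j d_j(t) \le 1$. The drugs' effectiveness is given by the matrix $p_{ji}$, and the resulting virus replication rate is $\pi_i [1 - \sum_{j=1}^{J} p_{ji} d_j(t)]$. An analysis similar to that in Section 3 yields the dynamic index

$$\frac{\sum_{i=1}^{I} p_{ji} \pi_i y_i(t)\, \partial V / \partial v_i}{\tau_j} \tag{36}$$

for drug combination $j$. At each point in time, the proposed therapeutic strategy uses the combination with the largest index if this index is positive; otherwise, no drugs are administered. Note that $\partial V / \partial v_i$ should be positive, so therapy should always be applied. Define $\delta_i = 1 - \sum_{j=1}^{J} p_{ji} d_j^*$, where $d_j^*$ solves the static optimization problem that is analogous to the one in Section 3.2. Differentiating the approximate value function in Appendix B with respect to $v_i$ gives a closed form expression for $\partial V / \partial v_i$, equation (37) (not legible in the source), and substitution of this quantity into (36) yields the proposed therapy in terms of the problem parameters and the current state. For the symmetric case considered in Section 3.5, the solution in (36)-(37) employs the protease inhibitor combination that corresponds to the largest value of $y_i(t)$ at each time $t$; that is, the therapy targets the virus strain that has infected the most cells.

Reconstituting the Immune System. Consider using a drug, such as interleukin-2 (IL-2), that reconstitutes the immune system. This drug affects our model by increasing $\lambda$, the production rate of uninfected CD4+ cells. Suppose the new production rate is $\lambda + \lambda(t)$, where our control $\lambda(t) \in [0, \bar{\lambda}]$. Then the optimization problem embedded in the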
Define 'i^f- where this if the proposed therapeutic strategy uses optimization problem that dj solves the static is analogous to the Section 3.2. Integrating the approximate value function in Appendix B with respect to Vi gives and substitution of W. 1 dvi ki tA ki(ki it) fegofe [p ajkj -iW-A). this quantity into (36) yields the problem parameters and the current 3.5, + state. h) (37) v proposed therapy in terms of the For the symmetric case considered in Section the solution in (36)-(37) employs the protease inhibitor combination that corresponds to the largest value of y,{t) at each time has infected the most that is, the therapy targets the virus strain that cells. Reconstituting the Immune System. (IL-2), that reconstitutes the immune the production rate of uninfected where our control t; A(r) € [0,A]. Consider using a drug, such as interleukin-2 system. This drug affects our model by increasing A, CD4 + cells. Suppose the new production rate Then the optimization problem embedded 14 is in the A + A(i), dynamic programming optimality equation simply to minimize A«-. The proposed policy is \*(t) and X*(t) = otherwise, where 4^ = \ if ^<0. ox (38) given in equation (28) (with d is is t = Define the 1). constants *= > If c t drug; for all i £*?-!-• then we never use the drug, and neither of these cases hold then a if the right side of (39) is if c, (») < dynamic policy for all is ~, if > a (that . is, and so this course, our cell will 7r t > a,; always given I\\," first quantity on ) value of nearly equal to ^/(q^A:,), and only if 7r £ - > a, for all i. If than unity free virus particle during its lifetime), fuel to the fire". hence, adding uninfected if is for the infected cells is greater produce more than one drug effectively "adds more suggest that some is then the "basic reproductive ratio each infected The the expected (with respect to the mutation probabilities q tJ we ignore mutations then the drug t then we always use the optimal. w/(ak). 
Since the mutation rates are very small, this quantity and i CD4 + Empirical results (see Section 5) cells in isolation is not desirable; of model has not incorporated an immune response, and thus may be omitting positive side effects of additional analysis can be performed for a CD4 + cells. Although we do not do so here, a similar therapy that reduces CD4 + cell production. Other therapies that can be analyzed include certain forms of immunotherapy (which would increase the death rate of CD4 + expansion of decrease y;, cells (Wilson free virus et al., and/or the death rate of infected cells), ex vivo 1995). which would simultaneously increase x and dynamic gene therapy (Nabel, 1994), which would increase i\ and for certain strains. Most importantly, we can also consider ously, with a joint toxicity constraint. RT inhibitors and production of CD4 + employing some of these therapies simultane- For example, cells if (Schwartz 15 one allowed the simidtaneous use of et a/., 1991), it may turn out to be beneficial to introduce CD4 + when the times cells at infectivity of the virus and the viral load are sufficiently suppressed. An 5. In this section, the under several simpler rather, we dynamic model policies. Example Illustrative is simulated under the proposed policy, as well as No attempt has been made to generate a model of realistic size; consider only two virus strains and two drug combinations in order to illustrate the nature and the impact of dynamic drug treatment. In a subsequent study, use data from multidrug clinical Parameter Values. 5.1. Table 1. a Ho et al. and Wei per day. About model are displayed in analyze larger models. The parameter 2% estimate a et al. of the total cells/mm 3 per day. The death rate equilibrium CD4 + contains roughly 5 x 10 6 human body for to values for our Most of these values were sequentially derived from existing data manner. 
cells trials to we plan CD4 + count is A/// = // mm CD4 + in the following production rale of roughly 1.8 x 10 9 population resides in the peripheral blood, and 3 . Hence, the 1.8 x 10 9 figure is comparable to 7 was chosen to be 0.007 per day, so that the virus-free 1000 cells per mm 3 , which corresponds to the CD4 + count an uninfected individual. Now we Wei et al. turn to the death rates a, of infected cells estimate the death rate of virus-producing recentby. using more accurate data. Perelson et al. and cells to A*; of free virus. Ho et al. and be about 0.35 per day. More (1995) show that the about 0.49. They were also able to get a rough estimate of 3.07 mean death for the rate is death rate of free virus. Some of the remaining parameters are derived by considering the quasi-steady state conditions before drug treatment. For typical pre-treatment values, (over 20 individuals) in Table 1 of Ho et a/.; we use the average the average pre-treatment 16 CD4 + values count was 180 mm per cells 3 and the average was present virus in mm was 134 virions per viral load 3 Only the wild-type . we can nonnegligible amounts before treatment. Hence, the wild-type virus, and set the consider only side of equations (l)-(3) equal to zero (reflecting the left + quasi-steady state) to obtain a set of four equations (equations (l)-(3) and x and four unknowns: the pre-treatment number of uninfected infectivity rate J3 and the replication rate solving these two equations for mm 3 per mm 3 per day and y , = J3 180/)i>/(a and the fraction of cells agreement with the estimate of for yields k 7r = v(k + 0x)/y — produced by an infected is produced and cell in + Substituting 180 K. y yields fSv) = = J3 (aX found is y/(x Embretson in - y for x in (1) 180a/i)/(lS0m> 11.86 cells per that are infected 5% — x and infected cells mm + et al. 3 y) = in and (2) 168.14 is -4 cells in close Finally, solving (3) (1993). 
Readers should keep in mind that most virus is produced in the lymph system, whereas our estimates for $k$ and $\beta$ are based on plasma concentrations.

We use the mutation rate calculated in Mansky and Temin (1995), $q_{12} = q_{21} = 3.4 \times 10^{-5}$. We also assume that drug combination $i$ is targeted at virus strain $i$. More specifically, we let $p_{11} = p_{22} = 0.95$ and $p_{12} = p_{21} = 0.05$, meaning that each drug combination is 95% effective at blocking infections of its own strain, but only 5% effective at blocking infections of the other strain; such a state of affairs might arise if virus 1 is an AZT-resistant strain that is dominant at time zero, and the two drug combinations correspond to two other (i.e., not AZT) RT inhibitors. The toxicity coefficients are set to $\tau_1 = \tau_2 = 1$, so that if $d_j(t) = 1$ then the amount of drug combination $j$ administered corresponds to the threshold toxicity level.

Notice that, until now, the parameter values are consistent with the symmetric case introduced in Section 3.5. Now we introduce asymmetry by letting $\pi_2 = 0.9\pi_1$; hence, we assume that virus 1 has a higher replication rate than virus 2. The pre-treatment equilibrium state was taken as the starting point of our simulation runs (see Table 1).

Table 1. Parameter values for the model. [Table body not recoverable from the source.]

Figure 1. System behavior under the dynamic policy. [Horizontal axis: time (days).]

Figure 2. System behavior under the continuous treatment policy. [Horizontal axis: time (days).]

Figure 1 shows the system behavior under the dynamic policy, and Figure 2 shows the same quantities under the continuous treatment policy that continually applies drug 1 (i.e., $d_1(t) = 1$, $d_2(t) = 0$ for all $t$). Under both policies, the shapes of the $v_i$ and $y_i$ curves were very similar, with the free virus $v_i$ lagging behind the infected cell count $y_i$ by several days; hence, the dynamics of the infected cells do not appear in Figures 1 and 2. We also simulated the model under the optimal static policy derived in Section 3.2; the solution was $d_1 = 0.53$, $d_2 = 0.47$. Although we do not include figures for the optimal static policy, the average (over one year) uninfected CD4+ count and average total viral load for the various policies are reported in Table 2. Readers should note that the optimal static policy was derived under the assumption that drug toxicities are additive; if this assumption is not valid then the static policy is not a feasible alternative.

[...] cells; this quantity peaks at three months, and the peak corresponds to more than a three-fold increase over the pre-treatment value. Not surprisingly, the dynamic policy initiates treatment with drug 1. The policy first uses drug 2 on day 18, and switches back and forth irregularly (e.g., it stays with drug 1 for over a week at the beginning of the second month) between the two drugs about once per day until the end of the third month. During the fourth month, the two virus strains feed on the large pool of uninfected cells and simultaneously emerge, peaking with a total viral load that is roughly four times the pre-treatment level. The drug treatment oscillates rapidly between the two options during the fourth and fifth months in an attempt to simultaneously control both viruses. The less fit virus 2 constitutes the majority of the free virus during the fourth and fifth months, and hence the dynamic policy exerts more of its effort in controlling the less fit virus. The large viral load in turn leads to more cell infections, and the uninfected CD4+ count decreases in the fourth month, bottoming out at a level that is about 55% higher than the pre-treatment level. This reduction in uninfected cells allows the drugs to bring both viruses back under control during the fifth month. After this time, these oscillations dampen (e.g., the total viral load peaks again at the end of the seventh month with a value of about 240) and a new steady state is slowly reached. Over the first six months, the dynamic policy allocated 58.7% of the time to drug 1 and 41.3% of the time to drug 2; hence, the dynamic policy expended more effort than the optimal static policy on the more fit virus.

Continuous Treatment. Under the continuous treatment policy pictured in Figure 2, we again see a rapid drop in virus 1 and a linear rise in the CD4+ count; however, in contrast to the dynamic policy, the continuous treatment policy suppresses virus 1 throughout the first six months. Virus 2 emerges during the second month, and reaches a very high level by the end of the month. This high viral load then drops the uninfected CD4+ count below its pre-treatment count, which in turn leads to a reduction of free virus 2 in the third month.

Comparison of Policies. Table 2 shows that the dynamic policy performs much better than the continuous treatment policy: on average (over the first six months) the continuous policy achieves a 17.4% reduction in viral load and a 31.6% increase in uninfected CD4+ count with respect to the drug-free equilibrium, whereas the dynamic policy achieves a 53.4% reduction in viral load and a 126.2% increase in uninfected CD4+ count. Moreover, the viral peaks and CD4+ valleys are less pronounced under the dynamic policy than under the continuous treatment policy. Finally, the dynamic policy, by frequently switching between the two drug options during the early months, delays the emergence of virus 2 from the second month until the fourth month. Although the results are not reported here, we also tested an alternating policy that uses drug 1 for the first three months and then uses drug 2 for the last three months. Not surprisingly, this policy did not perform well.
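To give a feel for the transient behavior discussed in this section, the following Python sketch integrates a reduced, single-strain version of the model with a forward-Euler scheme. It is not the two-strain simulation reported in the paper: mutation, the second strain and the dynamic index policy are all omitted, so it cannot reproduce Figures 1 and 2 or Table 2. The function name, the step size, and the way drug efficacy enters (scaling the infection term $\beta x v$ by one minus the efficacy, including in the virus equation) are assumptions made purely for illustration.

```python
def simulate_wild_type(days, efficacy=0.0, dt=0.01):
    """Forward-Euler integration of an assumed single-strain model.

    Assumed dynamics (no mutation, one strain, constant drug efficacy):
        dx/dt = lam - mu*x - b*x*v
        dy/dt = b*x*v - alpha*y
        dv/dt = pi*y - k*v - b*x*v
    with b = beta*(1 - efficacy). Returns (x, y, v) after `days` days,
    starting from the pre-treatment quasi-steady state.
    """
    lam, mu, alpha, k = 7.0, 0.007, 0.49, 3.07
    total_cd4, v0 = 180.0, 134.0

    # Pre-treatment equilibrium and derived rates (unrounded).
    y0 = (lam - mu * total_cd4) / (alpha - mu)
    x0 = total_cd4 - y0
    beta = alpha * y0 / (x0 * v0)
    pi = v0 * (k + beta * x0) / y0

    b = beta * (1.0 - efficacy)  # assumed effect of an infection-blocking drug
    x, y, v = x0, y0, v0
    for _ in range(int(round(days / dt))):
        infection = b * x * v
        dx = lam - mu * x - infection
        dy = infection - alpha * y
        dv = pi * y - k * v - infection
        x, y, v = x + dt * dx, y + dt * dy, v + dt * dv
    return x, y, v


if __name__ == "__main__":
    print("no drug, day 10:  ", simulate_wild_type(10.0))
    print("95% block, day 10:", simulate_wild_type(10.0, efficacy=0.95))
```

With no drug the state stays at the pre-treatment equilibrium, which is a useful sanity check on the derived parameters; with a 95% blocking efficacy the viral load falls and the uninfected count rises over the first days, which is the qualitative early response described for both policies. Resistance, and hence the later rebound, requires the second strain and is outside this sketch.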
We should note that the post-treatment steady state values of uninfected CD4+ cells and total virus load under all of the therapeutic strategies are not appreciably different than the pre-treatment steady state values; hence, the therapeutic benefits in our model are achieved during the transient domain. However, the viral dynamics of HIV-infected individuals undergoing dynamic drug treatment would probably exhibit transient behavior for many years. Therefore, the improvements over the transient domain in our model should be indicative of the improvements that can be realized in clinical practice.

Unequal Infectivity Rates. Because there seems to be some uncertainty about whether virus strains have different infectivity rates, we reduced the infectivity of virus 2 so that $\beta_2 = 0.9\beta_1$. The qualitative results were similar to our base case, although the policy expended a slightly larger fraction of its effort on virus 1, resulting in a somewhat larger proportion of virus 2 in the viral mix. More generally, numerous other simulation runs were generated by varying the parameters $\pi_i$, $\beta_i$, $\lambda$ and $\mu$, and the qualitative results displayed in Figures 1 and 2 remained intact; hence, the dynamic model appears to be robust.

7. Conclusions

We have used the control theory paradigm to model HIV in a therapeutic setting. Our model incorporates different virus strains, and a variety of therapeutic options are available. The approximation method, which uses perturbation analysis and the policy improvement algorithm, gives rise to a dynamic index policy: each drug combination has an associated dynamic index, and at each point in time the policy uses the drug combination with the largest index. The dynamic indices succinctly summarize three quantities: the efficacy of each drug combination on each virus strain, the toxicity of each drug combination, and the marginal benefit of blocking a new cell infection by each virus strain; the last of these three quantities changes over time as a function of an individual's CD4+ count, viral load and viral mix.

Numerical results for a two-virus, two-drug model suggest that dynamic multidrug therapies outperform their static counterparts: the total viral load is reduced, the uninfected CD4+ count is increased, and the emergence of drug-resistant strains is delayed. Although the individualized therapy that we propose is no doubt more difficult to implement than the static protocols that are currently in practice, the benefits may outweigh the costs of implementation. These benefits are achieved by frequently changing therapies over time in anticipation of and in response to the emergence of drug-resistant strains. In addition, the dynamic policy attempts to maintain the more fit virus strains at a relatively low level, while perhaps allowing the less fit strains to partially establish themselves.

Although our numerical results focus on the two-virus, two-drug case, the development of more realistic models using data from multidrug studies is planned for the future. It may turn out that the best way to delay the onset of AIDS is via the intelligent use of a wide range of therapies (RT inhibitors, protease inhibitors, immunotherapy, gene therapy, reconstitution of the immune system, etc.). The model and analysis presented here provides the framework for the development of such therapeutic strategies.

Finally, our approach to this problem circumvents the usual obstacles inherent in solving high-dimensional nonlinear control problems. This method, which appears to be new, has potential applications in a wide variety of control problems in epidemiology and ecology: besides allowing for mutation among multiple variants of entities (in this case, viruses), the approach can also incorporate discrete age classes (e.g., for optimal vaccinations of measles) and discrete spatial (e.g., lattice) structures (e.g., for dynamic control of spatial epidemics).

Acknowledgment

The first author was supported by NSF grant DDM-9057297 and EPSRC grant GR/J71786. He gratefully acknowledges a valuable discussion with Peter Whittle during the early course of this research, and thanks Denise Kirschner for a helpful conversation about parameter estimation. The second author was supported by the Wellcome Trust and Keble College.

REFERENCES

Agur, Z. (1989). A new method for reducing cytotoxicity of the anti-AIDS drug AZT. In Biomedical Modeling and Simulation, Levine, D. S., ed., IMACS, J. C. Baltzer AG, Scientific Publishing Co., 59-61.
Avriel, M. (1976). Nonlinear Programming. Prentice-Hall, Englewood Cliffs, NJ.
Bellman, R. E. (1957). Dynamic Programming. Princeton U. Press, Princeton, NJ.
Condra, J. H. et al. (1995). In vivo emergence of HIV-1 variants resistant to multiple protease inhibitors. Nature 374, 569-571.
Elliott, R. J., Aggoun, L. and Moore, J. B. (1994). Hidden Markov Models: Estimation and Control. Springer-Verlag, New York.
Embretson, J. et al. (1993). Analysis of human immunodeficiency virus-infected tissues by amplification and in situ hybridization reveals latent and permissive infections at single-cell resolution. Proc. Natl. Acad. Sci. USA 90, 357-361.
Frost, S. D. W. and McLean, A. R. (1994). Quasispecies dynamics and the emergence of drug resistance during zidovudine therapy of HIV infection. AIDS 8, 323-332.
Gittins, J. C. (1989). Multi-armed Bandit Allocation Indices. Wiley, New York.
Golub, G. H. and Van Loan, C. F. (1989). Matrix Computations. Johns Hopkins U. Press, Baltimore.
Ho, D. D. et al. (1995). Rapid turnover of plasma virions and CD4 lymphocytes in HIV-1 infection. Nature 373, 123-127.
Howard, R. A. (1960). Dynamic Programming and Markov Processes. MIT Press, Cambridge, MA.
Key, P. B. (1990). Optimal control and trunk reservation in loss networks. Prob. Engrg. Inf. Sci. 4, 203-242.
Kirschner, D., Lenhart, S. and Serbin, S. (1995). Optimal control of the chemotherapy of HIV. Preprint, Dept. of Math., U. of Tennessee, Knoxville, TN.
Kirschner, D. and Perelson, A. (1994). A model for the immune system response to HIV: AZT treatment studies. In Mathematical Populations Dynamics III, Theory of Epidemics, Arino, O., Axelrod, D. and Kimmel, M., eds., Wuerz Publ., Winnipeg, Manitoba.
Kirschner, D. and Webb, G. F. (1995a). A model for treatment strategy in the chemotherapy of AIDS. Submitted to Bull. Math. Biology.
Kirschner, D. and Webb, G. F. (1995b). Effects of drug resistance in chemotherapy strategies in the treatment of HIV infection. Preprint, Dept. of Math., Texas A&M U., College Station, TX.
Kojima, E. et al. (1995). Human immunodeficiency virus type 1 (HIV-1) viremia changes and development of drug-related mutations in patients with symptomatic HIV-1 infection receiving alternating or simultaneous zidovudine and didanosine therapy. J. Infectious Diseases 171, 1152-1158.
Lagakos, S., Pettinelli, C., Stein, D. and Volberding, P. A. (1993). The Concorde Trial. Lancet 341, 1276.
Larder, B. A., Kellam, P. and Kemp, S. D. (1993). Convergent combination therapy can select viable multidrug-resistant HIV-1 in vitro. Nature 365, 451-453.
Larder, B. A., Kemp, S. D. and Harrigan, P. R. (1995). Potential mechanism for sustained antiretroviral efficacy of AZT-3TC combination therapy. Science 269, 696-699.
Mansky, L. M. and Temin, H. M. (1995). Lower in vivo mutation rate of human immunodeficiency virus type 1 than that predicted from the fidelity of purified reverse transcriptase. J. Virology 69, 5087-5094.
McLean, A. R. and Nowak, M. A. (1992). Competition between zidovudine sensitive and zidovudine resistant strains of HIV. AIDS 6, 71-79.
McLeod, G. X. and Hammer, S. M. (1992). Zidovudine: 5 years later. Annals of Internal Medicine 117, 487-501.
Nabel, G. J. (1994). Gene therapy approaches to AIDS. AIDS 8, S61-S69.
Nowak, M. A. et al. (1991). Antigenic diversity thresholds and the development of AIDS. Science 254, 963-969.
Nowak, M. A. and May, R. M. (1993). AIDS pathogenesis: mathematical models of HIV and SIV infections. AIDS 7, S3-S18.
Ott, T. J. and Krishnan, K. R. (1985). State dependent routeing of telephone traffic and the use of separable routeing schemes. In Proc. 11th Int. Teletraffic Cong., Akiyama, M., ed., Elsevier, Amsterdam.
Perelson, A. S. (1989). Modeling the interaction of the immune system with HIV. In Mathematical and Statistical Approaches to AIDS Epidemiology, Castillo-Chavez, C., ed., Lecture Notes in Biomath. 83, Springer-Verlag, New York, 350-370.
Perelson, A. S., Kirschner, D. and DeBoer, R. (1993). The dynamics of HIV infection of CD4+ cells. Math. Biosciences 114, 81-125.
Perelson, A. S. et al. (1995). HIV-1 dynamics in vivo: virion clearance rate, infected cell lifespan, and viral generation time. Submitted to Science.
Pontryagin, L. S., Boltyanskii, V. G., Gamkrelidze, R. V. and Mishchenko, E. F. (1962). The Mathematical Theory of Optimal Processes. Interscience Publishers, New York.
Richman, D. D. et al. (1994). Resistance to AZT and ddC during long-term combination therapy in patients with advanced infection with human immunodeficiency virus. J. AIDS 7, 135-138.
St. Clair, M. H. et al. (1991). Resistance to ddI and sensitivity to AZT induced by a mutation in HIV-1 reverse transcriptase. Science 253, 1557-1559.
Schuurman, R. et al. (1995). Rapid changes in human immunodeficiency virus type 1 RNA load and appearance of drug-resistant virus populations in persons treated with lamivudine (3TC). J. Infectious Diseases 171, 1411-1419.
Schwartz, D. H., Skowron, G. and Merigan, T. C. (1991). Safety and effects of interleukin-2 plus zidovudine in asymptomatic individuals infected with human immunodeficiency virus. J. AIDS 4, 11-23.
Shafer, R. W. et al. (1994). Combination therapy with zidovudine and didanosine selects for drug-resistant human immunodeficiency virus type 1 strains with unique patterns of pol gene mutations. J. Infectious Diseases 169, 722-729.
Shafer, R. W. et al. (1995). Drug resistance and heterogeneous long-term virologic responses of human immunodeficiency virus type 1-infected subjects to zidovudine and didanosine combination therapy. J. Infectious Diseases 172, 70-78.
Shaw, G. M. (1995). Personal communication.
Volberding, P. A. et al. (1994). The duration of zidovudine benefit in persons with asymptomatic HIV infection. JAMA 272, 437-442.
Wei, X. et al. (1995). Viral dynamics in human immunodeficiency virus type 1 infection. Nature 373, 117-123.
Wilson, C. C. and Hirsch, M. S. (1995). Combination antiretroviral therapy for the treatment of human immunodeficiency virus type-1 infection. Proc. Ass. American Physicians 107, 19-27.
Wilson, C. C. et al. (1995). Ex vivo expansion of CD4 lymphocytes from human immunodeficiency virus type 1-infected persons in the presence of combination antiretroviral therapy. J. Infectious Diseases 172, 88-96.

APPENDIX A

The solution to (24)-(27) is given by equations (40)-(42), where the constants appearing in these equations are given by equations (43)-(45). [Displayed equations (40)-(45) are not recoverable from the source.]

APPENDIX B

The approximate value function is given by equation (46). [Displayed equation (46) is not recoverable from the source.]