Proceedings of the 31st Conference on Decision and Control, Tucson, Arizona, December 1992 - WP8 16:20

OPTIMAL CONTROL OF A HYBRID SYSTEM WITH PATHWISE AVERAGE COST*

Mrinal K. Ghosh†, Aristotle Arapostathis‡ and Steven I. Marcus§

Abstract. We study the ergodic control problem of switching diffusions representing a typical hybrid system that arises in numerous applications, such as fault tolerant control systems, flexible manufacturing systems, etc. Under certain conditions, we establish the existence of a stable Markov nonrandomized policy which is almost surely optimal for the pathwise long-run average cost criterion. We then study the corresponding Hamilton-Jacobi-Bellman (HJB) equation and establish the existence of a unique solution in a certain class. Using this, we characterize the optimal policy as a minimizing selector of the Hamiltonian associated with the HJB equations. We apply these results to a failure prone manufacturing system and show that the optimal production rate is of the hedging point type.

1. Introduction. We address the problem of controlling switching diffusions by continually monitoring the drift and jump rates of the continuous and discrete components, respectively. The objective is to minimize, almost surely, the pathwise long-run average (ergodic) cost over all admissible policies. A controlled switching diffusion is a typical example of a hybrid system, arising in numerous applications involving systems with multiple modes or failure modes, such as fault tolerant control systems, multiple target tracking, flexible manufacturing systems, etc. [9], [10], [12]. The state of the system at time t is given by a pair (X(t), S(t)) ∈ R^d × S, S = {1, 2, ..., N}. The continuous component X(t) is governed by a "controlled diffusion process" with a drift vector which depends on the discrete component S(t).
The discrete component S(t) is a "controlled Markov chain" with a transition matrix depending on the continuous component. The evolution of the process (X(t), S(t)) is governed by the following equations:

  dX(t) = b(X(t), S(t), u(t)) dt + σ(X(t), S(t)) dW(t),   (1.1)

  P(S(t + δt) = j | S(t) = i, X(s), S(s), s ≤ t) = λ_ij(X(t), u(t)) δt + o(δt),  i ≠ j,   (1.2)

for t ≥ 0, X(0) = X_0, S(0) = S_0, where b : R^d × S × U → R^d, σ : R^d × S → R^{d×d} and λ_ij : R^d × U → R are suitable functions, λ_ij ≥ 0 for i ≠ j, Σ_j λ_ij = 0, W(·) is a standard Brownian motion and u(·) is a nonanticipative control process (admissible policy). The latter is called a Markov policy if u(t) = v(X(t), S(t)) for a suitable function v. Our aim is to minimize almost surely (a.s.) over all admissible policies u(·) the quantity

  limsup_{T→∞} (1/T) ∫_0^T c(X(t), S(t), u(t)) dt,

where c is the running cost function. Under certain conditions, we will show that there exist a Markov policy v and a constant ρ* such that

  limsup_{T→∞} (1/T) ∫_0^T c(X(t), S(t), v(X(t), S(t))) dt = ρ*  a.s.,

and for any other admissible policy u(·),

  liminf_{T→∞} (1/T) ∫_0^T c(X(t), S(t), u(t)) dt ≥ ρ*  a.s.

This will establish that v is optimal in a much stronger sense; viz., the most "pessimistic" average cost under v is no worse than the most "optimistic" average cost under any other admissible policy. Also, under the conditions assumed in this paper, the optimal pathwise average cost coincides with the optimal expected average cost, so we will not distinguish between the two optimality criteria.

Our paper is organized as follows. In Section 2 we present a concise description of the problem. Section 3 is devoted to the study of recurrence and ergodicity of switching diffusions. The existence of an optimal policy is established in Section 4. The Hamilton-Jacobi-Bellman (HJB) equations are studied in Section 5. Using the results of Sections 2-5, a failure prone manufacturing system is analyzed in Section 6. Proofs are omitted due to length limitations.

† Department of Mathematics, Indian Institute of Science, Bangalore 560012, India.
‡ Department of Electrical and Computer Engineering, University of Texas, Austin, Texas 78712-1084.
§ Electrical Engineering Department and Systems Research Center, University of Maryland, College Park, MD 20742.
* This work was supported in part by the Texas Advanced Research Program under Grant No. 003658-186, in part by the Air Force Office of Scientific Research under Grants F49620-92-5-0045 and F49620-92-3-0083, and in part by the National Science Foundation under Grant CDR-8803012.

CH3229-2/92/0000-1061$1.00 © 1992 IEEE

2. Problem Description. We first show that the switching diffusion (1.1), (1.2) can be constructed on a given probability space. Our presentation follows [9], [10]; we repeat it here for the sake of clarity and completeness. Let U be a compact metric space and S = {1, 2, ..., N}. Let λ_lm : R^d × U → R, l, m = 1, ..., N, satisfy λ_lm ≥ 0 for l ≠ m and Σ_{m=1}^N λ_lm = 0 for any l ∈ S.

We make the following assumptions, which will be in effect throughout the paper.

(A1) (i) For each i ∈ S, b(·, i, ·) is bounded, continuous and Lipschitz in its first argument uniformly with respect to the third.
(ii) σ(·, ·) is bounded and Lipschitz, and for each k, a(·, k) = σ(·, k)σ'(·, k) is uniformly elliptic, i.e., there exists a constant ρ > 0 such that ξ' a(·, k) ξ ≥ ρ |ξ|².
(iii) For each l, m ∈ S, λ_lm(·, ·) is continuous and Lipschitz in its first argument uniformly with respect to the second. Also, there exist constants λ̄_0, λ_0, 0 < λ_0 < λ̄_0, such that for l, m ∈ S, l ≠ m, λ_0 ≤ λ_lm ≤ λ̄_0.

For any Polish space Y, B(Y) will denote its Borel σ-field and P(Y) the space of probability measures on Y endowed with the Prohorov topology, i.e., the topology of weak convergence. Let M(Y) be the set of all nonnegative integer-valued σ-finite measures on B(Y).
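The dynamics (1.1)-(1.2) can be simulated directly: an Euler-Maruyama step for the continuous component, together with rate-based sampling of the discrete jumps. The sketch below is illustrative only; the drift, diffusion coefficient, and rate matrix are hypothetical placeholders instantiating the roles of b, σ and λ, not model data from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model data for d = 1, S = {0, 1}.
def b(x, s, u):                      # drift b(x, s, u) of (1.1)
    return u - x if s == 0 else u - 0.5 * x

def sigma(x, s):                     # diffusion coefficient of (1.1)
    return 1.0

def rates(x, u):                     # switching-rate matrix of (1.2); rows sum to 0
    return np.array([[-0.5, 0.5],
                     [1.0, -1.0]])

def simulate(T=10.0, dt=1e-3, x0=0.0, s0=0, policy=lambda x, s: 0.0):
    n = int(round(T / dt))
    x, s = x0, s0
    xs = np.empty(n)
    ss = np.empty(n, dtype=int)
    for k in range(n):
        u = policy(x, s)
        # Euler-Maruyama step for the continuous component (1.1)
        x += b(x, s, u) * dt + sigma(x, s) * np.sqrt(dt) * rng.standard_normal()
        # jump of the discrete component with probability lambda_ij * dt, cf. (1.2)
        row = rates(x, u)[s]
        for j in range(len(row)):
            if j != s and rng.random() < row[j] * dt:
                s = j
                break
        xs[k], ss[k] = x, s
    return xs, ss

xs, ss = simulate()
```

The `policy` argument plays the role of a Markov policy v(x, s); the default constant control is only a stand-in.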
Let M_σ(Y) be the smallest σ-field on M(Y) with respect to which all the maps from M(Y) to N ∪ {∞} of the form μ ↦ μ(B), B ∈ B(Y), are measurable. M(Y) will always be assumed to be endowed with this measurability structure.

Let V = P(U) and let b̄ = [b̄_1, ..., b̄_d]' : R^d × S × V → R^d be defined by

  b̄_i(·, ·, v) = ∫_U b_i(·, ·, u) v(du).   (2.1)

Similarly, for i, j ∈ S and v ∈ V, λ̄_ij is defined analogously. If u(·) is a Dirac measure, i.e., u(t) = δ_{ū(t)} where ū(·) is U-valued, then it is called an admissible nonrandomized policy. An admissible policy is called feedback if u(·) is progressively measurable with respect to the natural filtration of (X(·), S(·)). A particular subclass of feedback policies is of special interest. A feedback policy u(·) is called a (homogeneous) Markov policy if u(t) = v̄(X(t), S(t)) for a measurable map v̄ : R^d × S → V. With an abuse of notation, the map v̄ itself is called a Markov policy. Let U, Π_M and Π_MD denote the sets of all admissible, Markov and Markov nonrandomized policies, respectively.

Fix x ∈ R^d and v ∈ V. For i, j ∈ S, i ≠ j, define consecutive, left-closed, right-open intervals Δ_ij(x, v) of the real line: Δ_12(x, v) = [0, λ̄_12(x, v)), Δ_13(x, v) = [λ̄_12(x, v), λ̄_12(x, v) + λ̄_13(x, v)), and so on. For fixed x and v, these are disjoint intervals, and the length of Δ_ij(x, v) is λ̄_ij(x, v).

Now, define a function h : R^d × S × V × R → R by

  h(x, i, v, z) = j - i  if z ∈ Δ_ij(x, v),  and 0 otherwise.   (2.3)

Let (X(t), S(t)) be the (R^d × S)-valued controlled switching diffusion process given by the following stochastic differential equations:

  dX(t) = b̄(X(t), S(t), u(t)) dt + σ(X(t), S(t)) dW(t),
  dS(t) = ∫_R h(X(t-), S(t-), u(t), z) p(dt, dz),   (2.4)

for t ≥ 0, with X(0) = X_0, S(0) = S_0, where
(i) X_0 is a prescribed R^d-valued random variable.
(ii) S_0 is a prescribed S-valued random variable.
(iii) W(·) = [W_1(·), ..., W_d(·)]' is a d-dimensional standard Wiener process.
(iv) p(dt, dz) is an M(R_+ × R)-valued Poisson random measure with intensity dt × m(dz), where m is the Lebesgue measure on R.
(v) p(·, ·), W(·), X_0 and S_0 are independent.
(vi) u(·) is a V-valued process with measurable sample paths satisfying the following nonanticipativity property: for each t ≥ 0, the σ-field generated by (u(s), W(s), p([0, s] × A), s ≤ t, A ∈ B(R)) is independent of the σ-field generated by the increments (W(s') - W(t), p((t, s'] × A), s' ≥ t, A ∈ B(R)). Such a process u(·) will be called an admissible (control) policy.

If (W(·), p(·, ·), X_0, S_0, u(·)) satisfying the above are given on a prescribed probability space (Ω, F, P), then under (A1) the equation (2.4) will admit an a.s. unique strong solution [11, Chap. 3], with X(·) ∈ C(R_+; R^d) and S(·) ∈ D(R_+; S), where D(R_+; S) is the space of right continuous functions on R_+ with left limits taking values in S. However, if u(·) is a feedback policy, then there exists a measurable map f : R_+ × C(R_+; R^d) × D(R_+; S) → V such that for each t ≥ 0, u(t) = f(t, X(·), S(·)) is measurable with respect to the σ-field generated by (X(s), S(s), s ≤ t). Thus u(·) cannot be specified a priori in (2.4); instead, one has to replace u(t) in (2.4) by f(t, X(·), S(·)), and (2.4) takes the form

  dX(t) = b̄(X(t), S(t), f(t, X(·), S(·))) dt + σ(X(t), S(t)) dW(t),
  dS(t) = ∫_R h(X(t-), S(t-), f(t, X(·), S(·)), z) p(dt, dz),   (2.5)

for t ≥ 0, with X(0) = X_0, S(0) = S_0. In general, (2.5) will not even admit a weak solution. However, if the feedback policy is Markov, then the existence of a unique strong solution can be established.

We now introduce some notation which will be used throughout the paper. Define L¹(R^d × S) = {f : R^d × S → R : for each i ∈ S, f(·, i) ∈ L¹(R^d)}. L¹(R^d × S) is endowed with the product topology of (L¹(R^d))^N. Similarly, we define C_b^∞(R^d × S), W^{2,p}_loc(R^d × S), etc. For f ∈ W^{2,p}_loc(R^d × S) and u ∈ U, we write

  L^u f(x, i) = (1/2) Σ_{j,k=1}^d a_jk(x, i) ∂²f(x, i)/∂x_j ∂x_k + Σ_{j=1}^d b_j(x, i, u) ∂f(x, i)/∂x_j + Σ_{j=1}^N λ_ij(x, u) f(x, j),

where a = σσ'; more generally, for v ∈ V, L^v f is defined by averaging the above with respect to v, as in (2.1).

The following result is proved in [10].

Theorem 2.1. Under a Markov policy v, (2.4) admits an a.s. unique strong solution such that (X(·), S(·)) is a Feller process with differential generator L^v.

A Markov policy v is called stable if the corresponding process (X(·), S(·)) is positive recurrent. In this case, the process will have a unique invariant probability measure, denoted by η_v ∈ P(R^d × S). The uniqueness of η_v is guaranteed by (A1) (ii) and (iii). We assume that the set of stable Markov policies is nonempty.
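The interval construction Δ_ij and the function h of (2.3) admit a direct computational sketch: a Poisson point at height z moves S from i to j exactly when z falls in Δ_ij. The left-to-right layout of the intervals and the example rate matrix below are illustrative conventions of ours, not the paper's exact choices.

```python
import numpy as np

def intervals(x, v, rates):
    """Consecutive, disjoint, left-closed right-open intervals Delta_ij(x, v),
    one per ordered pair i != j, of length lambda_ij(x, v)."""
    lam = rates(x, v)                        # N x N rate matrix, rows sum to 0
    N = lam.shape[0]
    out, left = {}, 0.0
    for i in range(N):
        for j in range(N):
            if i != j:
                out[(i, j)] = (left, left + lam[i, j])
                left += lam[i, j]
    return out

def h(x, i, v, z, rates):
    """h(x, i, v, z) = j - i if z lies in Delta_ij(x, v), else 0, cf. (2.3)."""
    for (a, j), (lo, hi) in intervals(x, v, rates).items():
        if a == i and lo <= z < hi:
            return j - i
    return 0

# Hypothetical example: lambda_01 = 0.5, lambda_10 = 1.0
example = lambda x, v: np.array([[-0.5, 0.5], [1.0, -1.0]])
```

With this layout, Δ_01 = [0, 0.5) and Δ_10 = [0.5, 1.5), so the total length of the intervals leaving state i equals the total jump rate out of i.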
The Optimization Problem. Let c : R^d × S × U → R_+ be the cost function. We make the following assumption on c.

(A2) For each i ∈ S, c(·, i, ·) is continuous.

We define c̄ : R^d × S × V → R_+ by

  c̄(x, i, v) = ∫_U c(x, i, u) v(du).   (2.9)

Let u(·) be an admissible policy and (X(·), S(·)) the corresponding process. The pathwise (long-run) average cost incurred under u(·) is

  limsup_{T→∞} (1/T) ∫_0^T c̄(X(t), S(t), u(t)) dt.   (2.10)

We wish to a.s. minimize (2.10) over all admissible policies. Our goal is to establish the existence of a stable Markov policy which is a.s. optimal. We will carry out our study under two alternate sets of hypotheses: (a) a condition on the cost which penalizes unstable behavior, (b) a blanket stability condition which implies that all Markov policies are stable. We will describe these conditions in Section 4.

3. Recurrence and Ergodicity of Switching Diffusions. Due to the interaction between the continuous and discrete components, the study of recurrence and ergodicity of switching diffusions is quite involved. Let v be a Markov policy, which will be fixed throughout this section unless explicitly mentioned otherwise. Let P^v : R_+ × R^d × S → P(R^d × S) denote the transition function of the corresponding process (X(·), S(·)). Also, P^v_{x,i} and E^v_{x,i} denote the probability measure and the expectation operator, respectively, on the canonical space of the process (X(·), S(·)) starting at (x, i) ∈ R^d × S. The following result plays a crucial role in recurrence.

Lemma 3.1. For any (t, x, i) ∈ R_+ × R^d × S, the support of P^v(t, x, i; ·) is R^d × S.

Let D ⊂ R^d be a bounded open set. Define the stopping times

  τ'_j = inf{t > 0 : S(t) = j},
  τ_i = inf{t > 0 : S(t) = i and S(u) ≠ i for some 0 < u < t},
  τ_{D,i} = inf{t ≥ 0 : (X(t), S(t)) ∉ D × {i}},
  τ_D = inf{t ≥ 0 : X(t) ∉ D}.

Lemma 3.2. Under (A1), the following hold:
(i) sup_{v∈Π_M, x∈R^d} E^v_{x,i} τ_i < ∞;
(ii) sup_{v∈Π_M, x∈D} E^v_{x,i} τ_{D,i} < ∞;
(iii) sup_{v∈Π_M, x∈D} E^v_{x,i} τ_D < ∞;
(iv) inf_{v∈Π_M, x∈K} E^v_{x,i} τ_D > 0, for any compact K ⊂ D.

It is well known that the properties of harmonic functions of a Markov process play an important role in recurrence and ergodicity [2]. Therefore we will study some properties of the harmonic functions of the process (X(·), S(·)) under the Markov policy v. Let D ⊂ R^d be an open set and f : D × S → R. The function f is called L^v-harmonic on D if it is bounded on compact subsets of D and, for all x ∈ D, i ∈ S,

  f(x, i) = E^v_{x,i} f(X(τ_V), S(τ_V))   (3.2)

for every neighborhood V of x having compact closure V̄ in D, where τ_V denotes the exit time of X(·) from V. Using (3.2) and well known arguments in Markov processes [8, Vol. I, p. 111], the following results can be proved.

Lemma 3.3. Let D ⊂ R^d be open. Then, under (A1):
(i) Every L^v-harmonic function in D is continuous in D.
(ii) If L^v f = 0, f ∈ W^{2,p}_loc(D × S), then f is L^v-harmonic. Conversely, if f is L^v-harmonic and f ∈ W^{2,p}_loc(D × S), then L^v f = 0 in D.
(iii) (Maximum Principle) Let D be connected and f ≥ 0 and L^v-harmonic in D. Then f is either strictly positive on D × S or identically zero.

Remark 3.1. (i) The condition λ_lm ≥ λ_0 > 0 plays a crucial role in the above results. For example, the maximum principle does not hold without it, as the following counterexample shows. Let d = 1, S = {1, 2}, λ_12(·) ≡ 0, λ_21(·) ≡ 1, b ≡ 0, D = R, f(x, 1) ≡ 0, f(x, 2) = cosh x. Then it is easily verified that L f(x, i) = 0, i = 1, 2, so f is nonnegative and harmonic, yet it vanishes on R × {1} without being identically zero.
(ii) To the best of our knowledge, this maximum principle is not known in the literature on partial differential equations.

We next establish Harnack's inequality for L^v-harmonic functions, which is a very important result in partial differential equations. This result will not play any role in recurrence or ergodicity, but it will be crucial in deriving a certain estimate in Section 5.

Theorem 3.1. Let Ω ⊂ R^d be a bounded open domain and D an open set whose closure is a compact subset of Ω. Let f ≥ 0, f ∈ W^{2,p}_loc(Ω × S) and L^v f = 0 in Ω. Then for any x, y ∈ D and i, j ∈ S, we have

  f(x, i) ≤ C f(y, j),

where C is a constant which depends only on d, N, the diameter of D, the Hausdorff distance between ∂D and ∂Ω, the bounds on b, σ, λ in (A1), and the ellipticity constant of σσ', and is independent of the L^v-harmonic function f and the Markov policy v.

Remark 3.2. The counterexample in Remark 3.1 shows again that Theorem 3.1 does not hold if we drop the condition λ_ij ≥ λ_0 > 0.

We now discuss the recurrence properties of switching diffusions. Our treatment closely follows [2]. The point (x, i) ∈ R^d × S is said to be recurrent if, given any ε > 0,

  P^v_{x,i}(X(t_n) ∈ B(x, ε), S(t_n) = i, for a sequence t_n ↑ ∞) = 1.

A point (x, i) is transient if

  P^v_{x,i}(|X(t)| → ∞, as t → ∞) = 1.

If all points of the switching diffusion are recurrent, then it is called recurrent; a transient switching diffusion is defined similarly. Under (A1) (ii), (iii), we show that a switching diffusion is either recurrent or transient.

Lemma 3.4. Under (A1), the following statements are equivalent:
(i) The switching diffusion is recurrent.
(ii) P^v_{x,i}(X(t) ∈ D, for some t ≥ 0) = 1, for all x ∈ R^d, i ∈ S and any non-empty open set D.
(iii) There exists a compact set K ⊂ R^d such that P^v_{x,i}(X(t) ∈ K, for some t ≥ 0) = 1, for all (x, i) ∈ R^d × S.
(iv) P^v_{x,i}(X(t_n) ∈ D, for a sequence t_n ↑ ∞) = 1, for all x ∈ R^d, i ∈ S and any non-empty open set D.
(v) There exist a point z ∈ R^d, a pair of numbers r_0, r_1 with 0 < r_0 < r_1, and a y ∈ ∂B(z, r_1) such that

  P^v_{y,i}(τ_{B(z,r_0)^c} < ∞) = 1, for any i ∈ S.   (3.1)

Theorem 3.2. Under (A1), for any Markov policy, the switching diffusion is either recurrent or transient. A recurrent switching diffusion admits a unique (up to a constant multiple) σ-finite invariant measure.
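The return-time criterion behind Lemma 3.4(v) can be probed numerically by Monte Carlo: simulate the process and record the first hitting time of a fixed ball. The model below is a hypothetical one-dimensional example with mean-reverting drift in both regimes (a stand-in for a stable Markov policy) and a constant switching rate; it is a sketch, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(1)

def return_time(x0=2.0, s0=0, radius=0.5, dt=1e-3, t_max=50.0):
    """First hitting time of the ball B(0, radius) by the continuous
    component, started outside it, under a mean-reverting drift."""
    x, s, t = x0, s0, 0.0
    while t < t_max:
        drift = -x if s == 0 else -0.5 * x   # both regimes pull toward 0
        x += drift * dt + np.sqrt(dt) * rng.standard_normal()
        if rng.random() < 0.5 * dt:          # switch regimes at rate 0.5
            s = 1 - s
        t += dt
        if abs(x) <= radius:
            return t
    return t_max                              # truncation safeguard

mean_tau = float(np.mean([return_time() for _ in range(200)]))
```

A finite empirical mean return time is consistent with positive recurrence; a transient example (e.g. a repelling drift) would instead pile up at the truncation horizon.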
The switching diffusion is called positive recurrent if it is recurrent and admits a finite invariant measure. A Markov policy v is called stable if the corresponding process is positive recurrent; the corresponding invariant probability measure is denoted by η_v.

Theorem 3.3. Let z, r_0, r_1 be as in Lemma 3.4(v). Then under (A1), the switching diffusion is positive recurrent if

  E^v_{y,i} τ_{B(z,r_0)^c} < ∞, for all y ∈ ∂B(z, r_1), i ∈ S.   (3.3)

Note that it may be very difficult to verify (3.3) for general b, σ, λ. One usually verifies (3.3) by constructing a Liapunov function [2]. For switching diffusions such a construction seems difficult, since it involves solving a system of ordinary differential equations in closed form. However, we present some criteria for positive recurrence and discuss some implications.

(A3) There exists a w ∈ C²(R^d × S), w ≥ 0, such that w(x, i) → ∞ as |x| → ∞ for each i, E^v_{x,i} w(X(t), S(t)) and E^v_{x,i} |L^v w(X(t), S(t))| are locally bounded, and

  L^v w(x, i) ≤ p - q w(x, i)

for some p > 0 and q > 0.

Theorem 3.4. Under (A1) and (A3), the process (X(·), S(·)) under the Markov policy v is positive recurrent.

(A4) There exists a C² function w : R^d × S → R_+ such that
(i) lim_{|x|→∞} w(x, i) = ∞ uniformly in i.
(ii) There exist a > 0, ε > 0 such that for |x| > a, L^u w(x, i) < -ε for all u ∈ U, i ∈ S, and |∇w(x, i)|² ≥ ρ^{-1}, where ρ is the ellipticity constant in (A1).
(iii) w(x, i) and |∇w(x, i)| have polynomial growth.

The proof of [4, Lemma 6.2.2, p. 150] can be closely paralleled to yield the following.

Theorem 3.5. Under (A1) and (A4), the process (X(·), S(·)) under any Markov policy v is positive recurrent. Thus, all Markov policies are stable.

4. Existence of an Optimal Policy. In this section we establish the existence of a stable Markov nonrandomized policy under certain conditions. We follow the methodology developed in [4], [5], [6], [7] for controlled diffusions. For switching diffusions, similar techniques carry through with some extra technical details. Let Π_SM and Π_SMD denote the sets of stable Markov and stable Markov nonrandomized policies, respectively. Since we look for an optimal policy in Π_SMD, it is natural to assume that it is nonempty. Let v ∈ Π_SMD. Then

  limsup_{T→∞} (1/T) ∫_0^T c(X(t), S(t), v(X(t), S(t))) dt = Σ_{i∈S} ∫_{R^d} c(x, i, v(x, i)) η_v(dx, i) =: ρ_v  a.s.   (4.1)

Set ρ* = inf_{v∈Π_SMD} ρ_v. We assume that ρ* < ∞. Consider the following conditions.

(A5) For each i ∈ S,

  liminf_{|x|→∞} inf_{u∈U} c(x, i, u) > ρ*.

(A6) There exists a w ∈ C²(R^d × S), w ≥ 0, such that
(i) w(x, i) → ∞ as |x| → ∞ uniformly in i.
(ii) For each v ∈ Π_M, E^v_{x,i} w(X(t), S(t)) and E^v_{x,i} |L^v w(X(t), S(t))| are locally bounded.
(iii) There exist p > 0, q > 0 such that L^u w(x, i) ≤ p - q w(x, i) for each u ∈ U.

Our main result is the following.

Theorem 4.1. Let (A1), (A2) hold. Under any one of the conditions (A4), (A5) or (A6), there exists a v* ∈ Π_SMD which is a.s. optimal.

5. Hamilton-Jacobi-Bellman Equations. In this section we study the HJB equations and characterize the optimal policy in terms of their solution. We work under the following assumption.

(A7) The cost function c is bounded, continuous and Lipschitz in its first argument uniformly with respect to the third.

We follow the vanishing discount approach, i.e., we derive the HJB equations for the ergodic criterion as a limit of the HJB equations for the discounted criterion as the discount factor vanishes. The results follow those of [6]; however, they differ in important technical details. For α > 0, x ∈ R^d, i ∈ S, let V_α(x, i) denote the discounted value function with discount factor α, i.e.,

  V_α(x, i) = inf_{u(·)} E^u_{x,i} [ ∫_0^∞ e^{-αt} c̄(X(t), S(t), u(t)) dt ].   (5.1)

Theorem 5.1. Under (A1), (A7), V_α is the unique solution in C²(R^d × S) ∩ C_b(R^d × S) of

  inf_{u∈U} { L^u V_α(x, i) + c(x, i, u) } = α V_α(x, i).   (5.2)

For i ∈ S, set

  G_i = { x ∈ R^d : inf_{u∈U} c(x, i, u) ≤ ρ* },  G = ∪_{i=1}^N G_i.   (5.3)

By (A5) and (A7), G is compact.
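The vanishing discount mechanism can be illustrated on a finite-state, discrete-time surrogate of the control problem (not the paper's continuous model): as the discount factor β tends to 1, the normalized value (1 - β) V_β(x) approaches the optimal average cost in every state, mirroring α V_α → ρ*. The two-state transition and cost data below are hypothetical.

```python
import numpy as np

# Hypothetical two-state, two-action surrogate: P[u] is the transition
# matrix under action u, c[x, u] the running cost.
P = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),
     1: np.array([[0.5, 0.5], [0.5, 0.5]])}
c = np.array([[1.0, 2.0],
              [0.0, 0.5]])

def discounted_value(beta, iters=20000):
    """Value iteration for the beta-discounted cost, the discrete-time
    analogue of the discounted HJB equation (5.2)."""
    V = np.zeros(2)
    for _ in range(iters):
        # Q[x, u] = c(x, u) + beta * E[V(next state)]
        Q = c + beta * np.stack([P[u] @ V for u in (0, 1)], axis=1)
        V = Q.min(axis=1)
    return V

# Vanishing discount: (1 - beta) * V_beta stabilizes as beta -> 1
rho_1 = (1 - 0.99) * discounted_value(0.99)
rho_2 = (1 - 0.999) * discounted_value(0.999)
```

The limit is the same in both states, which is the discrete-time counterpart of the fact that ρ* does not depend on the initial condition.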
The following result plays a very crucial role.

Lemma 5.1. Under (A1), (A5), (A7), there exists an α_0 ∈ (0, 1) such that for α ∈ (0, α_0], min_{x,i} V_α(x, i) is attained on the set G defined in (5.3). Also, |V_α(x, i) - V_α(y, j)| is bounded on compacta, uniformly over α ∈ (0, α_0].

The following result is proved in [10].

Theorem 5.2. Under (A1), (A5), (A7), there exist a function V : R^d × S → R and a scalar ρ ∈ R such that, for some fixed i_0 ∈ S,

  V(0, i_0) = 0,  V ∈ C²(R^d × S),   (5.4)

and (V, ρ) satisfies the HJB equations

  inf_{u∈U} [ L^u V(x, i) + c(x, i, u) ] = ρ.   (5.5)

Based on Lemma 5.1 and Theorem 3.1, the following results can be derived by closely following the methods in [6].

Lemma 5.2. Assume (A1), (A5), (A7). Let v ∈ Π_MD be such that, for each i,

  Σ_{j=1}^d b_j(x, i, v(x, i)) ∂V(x, i)/∂x_j + Σ_{j=1}^N λ_ij(x, v(x, i)) V(x, j) + c(x, i, v(x, i))
    = inf_{u∈U} [ Σ_{j=1}^d b_j(x, i, u) ∂V(x, i)/∂x_j + Σ_{j=1}^N λ_ij(x, u) V(x, j) + c(x, i, u) ]  a.e. (x, i).   (5.6)

Then v ∈ Π_SMD, the scalar ρ in (5.5) equals ρ*, and v in (5.6) is a.s. optimal.

Theorem 5.3. Among all pairs (Ṽ, ρ) ∈ W^{2,p}_loc(R^d × S) × R, 2 ≤ p < ∞, satisfying (5.5), the pair (V, ρ*), where V is the function in Theorem 5.2, is the unique one satisfying the properties (5.4).

Theorem 5.4. Under (A1), (A5), (A7), v ∈ Π_SMD is a.s. optimal if and only if (5.6) holds.

Remark 5.1. The boundedness condition on the cost function c can be dropped by mimicking the arguments in [6, p. 202].

We will now study the HJB equation under (A1), (A6) and (A7). Recall that under (A1), (A6), Π_M = Π_SM. We say that a function f : R^d × S → R is in the class O(w) if for each i ∈ S,

  limsup_{|x|→∞} |f(x, i)| / w(x, i) < ∞.

Theorem 5.5. Under (A1), (A6) and (A7), the equation (5.5) admits a unique solution (V, ρ) in the class W^{2,p}_loc(R^d × S) ∩ O(w), 2 ≤ p < ∞, satisfying V(0, i_0) = 0 for a fixed i_0 ∈ S.

Remark 5.2. (i) The statement of Theorem 5.5 holds under (A1), (A4) and (A7). (ii) For the stable case we have carried out our analysis under the Liapunov condition (A6). Analogous results can be derived under the weaker condition (A4).

6. An Application to a Manufacturing Model. We now use the results of the previous sections to analyze the manufacturing model studied in [1], [3], [10]. Suppose there is one machine producing a single commodity. We assume that the demand rate is a constant d > 0. Let the machine state S(t) take values in {0, 1}, S(t) = 0 or 1 according as the machine is down or functional. Let S(t) be a continuous time Markov chain with generator

  [ -λ_0   λ_0 ]
  [  λ_1  -λ_1 ]

where λ_0 > 0, λ_1 > 0; λ_0 and λ_1 are the infinitesimal repair and failure rates, respectively. The inventory X(t) is governed by the Ito equation

  dX(t) = (u(t) - d) dt + σ dW(t),   (6.1)

where σ > 0, u(t) is the production rate and W(t) is a one-dimensional Wiener process independent of S(·). The last term in (6.1) can be interpreted as "sales return", "inventory spoilage", "sudden demand fluctuations", etc. A negative value of X(t) represents backlogged demand. The production rate is constrained by

  u(t) = 0 if S(t) = 0,  u(t) ∈ [0, r] if S(t) = 1.

Let c : R → R_+ be the cost function, which is assumed to be convex and Lipschitz. Also c(x) = c'(|x|) for some increasing c' : R_+ → R_+; thus c satisfies (A5). We will show that a certain hedging point policy is stable. Therefore, by the results of Section 4, there exists an a.s. optimal Markov nonrandomized policy for the cost criterion

  limsup_{T→∞} (1/T) ∫_0^T c(X(t)) dt.

The HJB equations in this case are

  (σ²/2) V''(x, 0) - d V'(x, 0) + λ_0 (V(x, 1) - V(x, 0)) + c(x) = ρ,
  (σ²/2) V''(x, 1) + min_{u∈[0,r]} {(u - d) V'(x, 1)} + λ_1 (V(x, 0) - V(x, 1)) + c(x) = ρ.   (6.2)

The results of Section 5 ensure existence of a C² solution (V, ρ*) of (6.2), where ρ* is the optimal cost. Using the convexity of c(·), it can be shown that V(·, i) is convex for each i. Hence there exists an x* such that

  V'(x, 1) ≤ 0 for x ≤ x*,  V'(x, 1) ≥ 0 for x ≥ x*.   (6.3)

From (6.2), it follows that the value of u which minimizes (u - d) V'(x, 1) is

  u = r if x < x*,  u = 0 if x > x*.

At x = x*, V'(x*, 1) = 0 and therefore any u ∈ [0, r] minimizes (u - d) V'(x, 1). Thus, in view of Theorem 5.4, we can choose any u ∈ [0, r] at x = x*. To be specific, we choose u = d at x*, i.e., we just produce to meet the demand exactly. Thus, the following u* ∈ Π_SMD is optimal:

  u*(x, 0) = 0,  u*(x, 1) = r if x < x*, d if x = x*, 0 if x > x*.   (6.4)
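The hedging point policy (6.4) and the stability condition (6.6) for the zero-inventory policy are simple enough to state in code; the function names and packaging below are ours.

```python
def hedging_policy(x, s, x_star, r, d):
    """Production rate of the hedging point policy (6.4): no production
    while the machine is down (s == 0); otherwise produce at full rate r
    below the hedging point x_star, exactly meet the demand d at x_star,
    and stop above it."""
    if s == 0:
        return 0.0
    if x < x_star:
        return r
    if x == x_star:
        return d
    return 0.0

def zero_inventory_is_stable(r, d, lam0, lam1):
    """Stability condition (6.6) for the zero-inventory policy: the average
    production capacity r * lam0 / (lam0 + lam1) must exceed the demand d,
    i.e., r * lam0 > d * (lam1 + lam0)."""
    return r * lam0 > d * (lam1 + lam0)
```

The zero-inventory policy (6.5) is the special case x_star = 0; `zero_inventory_is_stable` compares the long-run fraction of up time λ_0/(λ_0 + λ_1) times the full rate r against the demand rate.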
The stability of the policy (6.4) follows from Lemma 5.2, provided we show that Π_SM is nonempty. We show that the zero-inventory policy v given by

  v(x, 0) = 0,  v(x, 1) = r if x < 0, d if x = 0, 0 if x > 0,   (6.5)

is stable if and only if

  r λ_0 > d(λ_1 + λ_0).   (6.6)

The condition (6.6) is very appealing from an intuitive point of view. Note that λ_0^{-1} and λ_1^{-1} are the mean sojourn times of the chain in states 0 and 1, respectively. In state 0 the mean inventory depletes at rate d, while in state 1 it builds up at rate (r - d). Thus, if (6.6) is satisfied, one would expect the zero-inventory policy to stabilize the system. Our analysis confirms this intuition.

We first show that under v the process (X(·), S(·)) has an invariant measure η_v with a strictly positive "density-mass". In view of Lemma 3.1, it would follow from the ergodic theory of Markov processes [13, Chap. 1] that (X(·), S(·)) is positive recurrent; thus v would be stable. To this end, we attempt to solve the adjoint system

  (L^v)* φ(x, i) = 0,  i = 0, 1,

in L¹(R × {0, 1}). Without any loss of generality, normalize the coefficients so that the second-order terms have coefficient 1. For x < 0, where v(x, 1) = r, the adjoint system reads

  φ''(x, 0) + d φ'(x, 0) - λ_0 φ(x, 0) + λ_1 φ(x, 1) = 0,
  φ''(x, 1) - (r - d) φ'(x, 1) - λ_1 φ(x, 1) + λ_0 φ(x, 0) = 0.   (6.10)

Seeking solutions of the form φ(x, ·) = c e^{sx} leads, for x > 0, to the decay exponents s_1 = d and s_2 = (d + (d² + 4(λ_0 + λ_1))^{1/2})/2, and, for x < 0, to the characteristic cubic

  s³ - (r - 2d) s² - [(r - d)d + λ_0 + λ_1] s + [r λ_0 - d(λ_1 + λ_0)] = 0.   (6.12)

Under (6.6), let s_3, s_4 be the positive roots of (6.12), ordered as 0 < s_3 < s_4, and set ψ(s) = s² + ds - λ_0. It can be verified that ψ(s_3) < 0 and ψ(s_4) > 0. Under (6.6), all the solutions of the adjoint system in L¹[0, ∞) can be parameterized by α, β ∈ R via the decaying exponentials e^{-s_1 x}, e^{-s_2 x}, and all the solutions in L¹(-∞, 0] by γ, δ ∈ R via e^{s_3 x}, e^{s_4 x}. We need to satisfy the matching conditions

  φ_+(0) = φ_-(0),   (6.13)

which is simply the continuity requirement, and (6.14), which should hold since S(t) has a unique invariant probability measure π = (λ_1/(λ_0 + λ_1), λ_0/(λ_0 + λ_1)). The conditions (6.13), (6.14) are equivalent to a set of linear equations, the first two of which are

  α - β = γ - δ,   (6.15a)
  λ_0 α + λ_1 β = -γ ψ(s_3) + δ ψ(s_4).   (6.15b)

Solving (6.15a)-(6.15d), we obtain a unique solution (α, β, γ, δ) satisfying

  0 < β < α,  0 < δ < γ.   (6.16)

It follows from (6.16) and the foregoing that

  φ_+(x) > 0 for x ≥ 0, and φ_-(x) > 0 for x < 0.

Note that if r λ_0 ≤ d(λ_1 + λ_0), then s_3 = 0 and there exists no positive solution φ_-(x) of (6.10) in L¹(-∞, 0].

In [3], Bielecki and Kumar have studied the mean square stability of the piecewise deterministic system, i.e., (6.1) with σ = 0. They have shown that under the analogous condition the policy (6.5) is mean square stable. Our analysis shows that additive noise in (6.1) retains the stability of the zero-inventory policy as long as strict inequality holds in (6.6).

Acknowledgement. The authors wish to thank Prof. S. R. S. Varadhan for explaining to us the work of Krylov and Safonov on Harnack's inequality.

REFERENCES

1. R. Akella and P. R. Kumar, Optimal control of production rate in a failure prone manufacturing system, IEEE Trans. Automat. Control AC-31 (1986), 116-126.
2. R. N. Bhattacharya, Criteria for recurrence and existence of invariant measures for multidimensional diffusions, Annals of Probability 6 (1978), 541-553.
3. T. Bielecki and P. R. Kumar, Optimality of zero-inventory policies for unreliable manufacturing systems, Oper. Res. 36 (1988), 532-546.
4. V. S. Borkar, Optimal Control of Diffusion Processes, Pitman Research Notes in Math. Series, No. 203, Longman, Harlow, UK, 1989.
5. V. S. Borkar and M. K. Ghosh, Ergodic control of multidimensional diffusions, I: The existence results, SIAM J. Control Optim. 26 (1988), 112-126.
6. V. S. Borkar and M. K. Ghosh, Ergodic control of multidimensional diffusions, II: Adaptive control, Appl. Math. Optim. 21 (1990), 191-220.
7. V. S. Borkar and M. K. Ghosh, Controlled diffusions with constraints, J. Math. Anal. Appl. 152 (1990), 88-108.
8. E. B. Dynkin, Markov Processes, Vols. I and II, Springer-Verlag, New York, 1965.
9. M. K. Ghosh, A. Arapostathis and S. I. Marcus, An optimal control problem arising in flexible manufacturing systems, Proc. 30th IEEE Conf. on Decision and Control, Brighton, England (1991), 1844-1849.
10. M. K. Ghosh, A. Arapostathis and S. I. Marcus, Optimal control of switching diffusions with application to flexible manufacturing systems, SIAM J. Control Optim. (to appear).