WP8 16:20
Proceedings of the 31st Conference on Decision and Control
Tucson, Arizona, December 1992
OPTIMAL CONTROL OF A HYBRID SYSTEM WITH PATHWISE AVERAGE COST*
Mrinal K. Ghosh†, Aristotle Arapostathis‡ and Steven I. Marcus§
Abstract. We study the ergodic control problem of switching diffusions representing a typical hybrid system that arises
in numerous applications such as fault tolerant control systems,
flexible manufacturing systems, etc. Under certain conditions,
we establish the existence of a stable Markov nonrandomized
policy which is almost surely optimal for the pathwise long-run average cost criterion. We then study the corresponding
Hamilton-Jacobi-Bellman (HJB) equation and establish the existence of a unique solution in a certain class. Using this, we
characterize the optimal policy as a minimizing selector of the
Hamiltonian associated with the HJB equations. We apply these
results to a failure prone manufacturing system and show that
the optimal production rate is of the hedging point type.
1. Introduction. We address the problem of controlling
switching diffusions by continually monitoring the drift and
jump rates of the continuous and discrete components, respectively. The objective is to minimize, almost surely, the pathwise
long-run average (ergodic) cost over all admissible policies. A
controlled switching diffusion is a typical example of a hybrid
system which arises in numerous applications of systems with
multiple modes or failure modes, such as fault tolerant control
systems, multiple target tracking, flexible manufacturing systems, etc. [9], [10], [12]. The state of the system at time t is given by a pair (X(t), S(t)) ∈ R^d × S, S = {1, 2, ..., N}. The continuous component X(t) is governed by a "controlled diffusion process" with a drift vector which depends on the discrete component S(t). The discrete component S(t) is a "controlled Markov chain" with a transition matrix depending on the continuous component. The evolution of the process (X(t), S(t)) is governed by the following equations:

    dX(t) = b(X(t), S(t), u(t)) dt + σ(X(t), S(t)) dW(t),    (1.1)

    P(S(t+δt) = j | S(t) = i, X(s), S(s), s ≤ t) = λ_ij(X(t), u(t)) δt + o(δt),  i ≠ j,    (1.2)

for t ≥ 0, X(0) = X_0, S(0) = S_0, where b, σ, λ are suitable functions, λ_ij ≥ 0 for i ≠ j, Σ_j λ_ij = 0, W(·) is a standard Brownian motion and u(·) is a nonanticipative control process (admissible policy). The latter is called a Markov policy if u(t) = v(X(t), S(t)) for a suitable function v. Our aim is to minimize almost surely (a.s.) over all admissible policies the quantity

    limsup_{T→∞} (1/T) ∫_0^T c(X(t), S(t), u(t)) dt,

where c is the running cost function. Under certain conditions, we will show that there exists a Markov policy v and a constant ρ* such that

    limsup_{T→∞} (1/T) ∫_0^T c(X(t), S(t), v(X(t), S(t))) dt = ρ*  a.s.,

and for any other admissible policy u(·),

    liminf_{T→∞} (1/T) ∫_0^T c(X(t), S(t), u(t)) dt ≥ ρ*  a.s.

This will establish that v is optimal in a much stronger sense; viz., the most "pessimistic" average cost under v is no worse than the most "optimistic" average cost under any other admissible policy. Also, under the conditions assumed in this paper, the optimal pathwise average cost coincides with the optimal expected average cost, so we will not distinguish between the two optimality criteria.
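The dynamics (1.1)-(1.2) are straightforward to simulate, which helps build intuition for the pathwise average cost. The sketch below is illustrative only and not from the paper: it uses made-up two-mode coefficients b, σ, λ and an arbitrary Markov feedback policy, with an Euler-Maruyama step for X(t) and a Bernoulli(λ δt) approximation of the switching probability in (1.2).

```python
import math
import random

# Hypothetical two-mode switching diffusion (coefficients are made up):
# mode 1 has a mean-reverting drift, mode 2 a shifted mean-reverting drift.
def b(x, s, u):            # drift b(x, s, u)
    return -x + u if s == 1 else 0.5 - x + u

def sigma(x, s):           # diffusion coefficient sigma(x, s)
    return 1.0

def lam(i, j, x, u):       # jump rates lambda_ij(x, u), i != j
    return 2.0 if (i, j) == (1, 2) else 1.0

def simulate(T=10.0, dt=1e-3, x0=0.0, s0=1, seed=0):
    rng = random.Random(seed)
    x, s = x0, s0
    for _ in range(int(T / dt)):
        u = -0.1 * x                      # an arbitrary Markov feedback policy
        x += b(x, s, u) * dt + sigma(x, s) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        j = 2 if s == 1 else 1
        if rng.random() < lam(s, j, x, u) * dt:   # P(jump) ~ lambda_ij dt + o(dt)
            s = j
        # the pathwise running cost would be accumulated here as c(x, s, u) * dt
    return x, s

x, s = simulate()
```

The per-step jump check mirrors (1.2): over a short interval δt the discrete component switches from i to j with probability λ_ij(X(t), u(t)) δt up to o(δt) terms.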
Our paper is organized as follows. In Section 2 we present a
concise description of the problem. Section 3 is devoted to the
study of recurrence and ergodicity of switching diffusions. The
existence of an optimal policy is established in Section 4. The
Hamilton-Jacobi-Bellman (HJB) equations are studied in Section 5. Using the results of Sections 2-5, a failure prone manufacturing system is analyzed in Section 6. Proofs are omitted
due to length limitations.
2. Problem Description. We first exhibit that the switching diffusion (1.1), (1.2) can be constructed on a given probability space. Our presentation follows [9], [10]; we repeat it here for the sake of clarity and completeness. Let U be a compact metric space and S = {1, 2, ..., N}. Let

    b = [b_1, ..., b_d]' : R^d × S × U → R^d,
    σ = [σ_jk] : R^d × S → R^{d×d},
    λ_lm : R^d × U → R,  l, m = 1, ..., N,

where λ_lm ≥ 0 for l ≠ m, and Σ_{m=1}^N λ_lm = 0 for any l ∈ S.

† Department of Mathematics, Indian Institute of Science, Bangalore 560012, India.
‡ Department of Electrical and Computer Engineering, University of Texas, Austin, Texas 78712-1084.
§ Electrical Engineering Department and Systems Research Center, University of Maryland, College Park, MD 20742.
* This work was supported in part by the Texas Advanced Research Program under Grant No. 003658-186, in part by the Air Force Office of Scientific Research under Grants F49620-92-5-0045 and F49620-92-3-0083, and in part by the National Science Foundation under Grant CDR-8803012.
CH3229-2/92/0000-1061$1.00 © 1992 IEEE
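The structural constraints on the rate matrix Λ = [λ_lm] (nonnegative off-diagonal entries, zero row sums) are mechanical to verify. A small helper, with a hypothetical 3-state rate matrix:

```python
# Quick validity check for a hypothetical rate matrix Lambda:
# nonnegative off-diagonal entries and zero row sums, as required above.
def is_rate_matrix(L, tol=1e-9):
    n = len(L)
    off_ok = all(L[i][j] >= 0.0 for i in range(n) for j in range(n) if i != j)
    rows_ok = all(abs(sum(row)) <= tol for row in L)
    return off_ok and rows_ok

Lam = [[-2.0, 1.5, 0.5],
       [ 1.0, -1.2, 0.2],
       [ 0.3,  0.7, -1.0]]
assert is_rate_matrix(Lam)
```

Assumption (A1) (iii) below additionally requires the off-diagonal rates to lie between positive constants λ_0 and Λ_0, which could be checked the same way.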
We make the following assumptions, which will be in effect throughout the paper.

(A1) (i) For each i ∈ S, b(·, i, ·) is bounded, continuous and Lipschitz in its first argument uniformly with respect to the third.
(ii) σ(·, ·) is bounded and Lipschitz, and for each k, σ(·, k)σ'(·, k) is uniformly elliptic, i.e., there exists a constant μ_0 > 0 such that σ(x, k)σ'(x, k) ≥ μ_0 I for all x.
(iii) For each l, m ∈ S, λ_lm(·, ·) is continuous and Lipschitz in its first argument uniformly with respect to the second. Also, there exist constants λ_0, Λ_0, 0 < λ_0 < Λ_0, such that for l, m ∈ S, l ≠ m, λ_0 ≤ λ_lm ≤ Λ_0.

For any Polish space Y, B(Y) will denote its Borel σ-field and P(Y) the space of probability measures on Y endowed with the Prohorov topology, i.e., the topology of weak convergence. Let
M(Y) be the set of all nonnegative integer-valued σ-finite measures on B(Y), and let B_M(Y) be the smallest σ-field on M(Y) with respect to which all the maps from M(Y) to N ∪ {∞} of the form μ ↦ μ(B), B ∈ B(Y), are measurable. M(Y) will always be assumed to be endowed with this measurability structure. Let V = P(U) and b̄ = [b̄_1, ..., b̄_d]' : R^d × S × V → R^d be defined by

    b̄_i(x, s, v) = ∫_U b_i(x, s, u) v(du).    (2.1)

Similarly, for l, j ∈ S, v ∈ V, λ̄_lj is defined as

    λ̄_lj(x, v) = ∫_U λ_lj(x, u) v(du).    (2.2)

For x ∈ R^d and v ∈ V, construct consecutive left-closed, right-open intervals Δ_ij(x, v) of the real line by

    Δ_12(x, v) = [0, λ̄_12(x, v)),  Δ_13(x, v) = [λ̄_12(x, v), λ̄_12(x, v) + λ̄_13(x, v)),

and so on. For fixed x and v, these are disjoint intervals, and the length of Δ_ij(x, v) is λ̄_ij(x, v). Now, define a function h : R^d × S × V × R → R by

    h(x, i, v, z) = j − i  if z ∈ Δ_ij(x, v),
                  = 0      otherwise.    (2.3)

Let (X(t), S(t)) be the (R^d × S)-valued controlled switching diffusion process given by the following stochastic differential equations:

    dX(t) = b̄(X(t), S(t), v(t)) dt + σ(X(t), S(t)) dW(t),
    dS(t) = ∫_R h(X(t−), S(t−), v(t), z) p(dt, dz),    (2.4)

for t ≥ 0 with X(0) = X_0, S(0) = S_0, where
(i) X_0 is a prescribed R^d-valued random variable.
(ii) S_0 is a prescribed S-valued random variable.
(iii) W(·) = [W_1(·), ..., W_d(·)]' is a d-dimensional standard Wiener process.
(iv) p(dt, dz) is an M(R_+ × R)-valued Poisson random measure with intensity dt × m(dz), where m is the Lebesgue measure on R.
(v) p(·, ·), W(·), X_0 and S_0 are independent.
(vi) v(·) is a V-valued process with measurable sample paths satisfying the following nonanticipativity property. Let F_t denote the σ-field generated by {X(s), S(s), v(s), W(s), p(A × B) : A ∈ B([0, s]), B ∈ B(R), s ≤ t}. Then for t ≥ s, W(t) − W(s) and p(A × B), A ∈ B((t, ∞)), B ∈ B(R), are independent of F_s.

Such a process v(·) will be called an admissible (control) policy. If v(t) is a Dirac measure, i.e., v(t) = δ_{u(t)} where u(·) is U-valued, then it is called an admissible nonrandomized policy. An admissible policy is called feedback if v(t) is progressively measurable with respect to the natural filtration of (X(·), S(·)). A particular subclass of feedback policies is of special interest. A feedback policy v(·) is called a (homogeneous) Markov policy if v(t) = ṽ(X(t), S(t)) for a measurable map ṽ : R^d × S → V. With an abuse of notation, the map ṽ itself is called a Markov policy. Let Π, Π_M and Π_MD denote the sets of all admissible, Markov and Markov nonrandomized policies, respectively.

If (W(·), p(·, ·), X_0, S_0, v(·)) satisfying the above are given on a prescribed probability space (Ω, F, P), then under (A1) the equation (2.4) will admit an a.s. unique strong solution [11, Chap. 3], with X(·) ∈ C(R_+; R^d) and S(·) ∈ D(R_+; S), where D(R_+; S) is the space of right continuous functions on R_+ with left limits taking values in S. However, if v(·) is a feedback policy, then there exists a measurable map

    f : R_+ × C(R_+; R^d) × D(R_+; S) → V

such that for each t ≥ 0, v(t) = f(t, X(·), S(·)) and is measurable with respect to the σ-field generated by (X(s), S(s), s ≤ t). Thus, v(·) cannot be specified a priori in (2.4). Instead, one has to replace v(t) in (2.4) by f(t, X(·), S(·)), and (2.4) takes the form

    dX(t) = b̄(X(t), S(t), f(t, X(·), S(·))) dt + σ(X(t), S(t)) dW(t),
    dS(t) = ∫_R h(X(t−), S(t−), f(t, X(·), S(·)), z) p(dt, dz),    (2.5)

for t ≥ 0 with X(0) = X_0, S(0) = S_0. In general, (2.5) will not even admit a weak solution. However, if the feedback policy is Markov, then the existence of a unique strong solution can be established.

We now introduce some notation which will be used throughout the paper. Define L^1(R^d × S) = {f : R^d × S → R : for each i ∈ S, f(·, i) ∈ L^1(R^d)}; L^1(R^d × S) is endowed with the product topology of (L^1(R^d))^N. Similarly, we define C_0^∞(R^d × S), W^{2,p}_{loc}(R^d × S), etc. For f ∈ W^{2,p}_{loc}(R^d × S), u ∈ U, we write

    L^u f(x, i) = (1/2) Σ_{j,k} a_jk(x, i) ∂²f(x, i)/∂x_j∂x_k + Σ_j b_j(x, i, u) ∂f(x, i)/∂x_j + Σ_{j∈S} λ_ij(x, u) f(x, j),

where a = σσ', and more generally, for v ∈ V,

    L^v f(x, i) = ∫_U L^u f(x, i) v(du).

The following result is proved in [10].

Theorem 2.1. Under a Markov policy v, (2.4) admits an a.s. unique strong solution such that (X(·), S(·)) is a Feller process with differential generator L^v.

A Markov policy v is called stable if the corresponding process (X(·), S(·)) is positive recurrent. In this case, the process will have a unique invariant probability measure, denoted by η_v ∈ P(R^d × S). The uniqueness of η_v is guaranteed by (A1) (ii) and (iii). We assume that the set of stable Markov policies is nonempty.
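The action of the generator can be sanity-checked numerically, since L^u f(x, i) = lim_{h→0} (E^u_{x,i} f(X(h), S(h)) − f(x, i))/h. The following Monte Carlo sketch is not from the paper: it uses made-up one-dimensional coefficients (d = 1, S = {1, 2}, drift b(x) = −x, σ = 1, constant rates λ_12 = λ_21 = 1) and the test function f(x, i) = x² + i.

```python
import math, random

# Monte Carlo check of the generator formula
#   L f(x,i) = (1/2) sigma^2 f'' + b f' + lambda_i,3-i * (f(x,3-i) - f(x,i)),
# for the made-up example b(x) = -x, sigma = 1, lambda_12 = lambda_21 = 1
# and f(x, i) = x**2 + i, which gives
#   L f(x, 1) = 1 - 2*x**2 + 1 = 2 - 2*x**2.
def mc_generator_estimate(x=0.5, i=1, h=0.01, n=200_000, seed=1):
    rng = random.Random(seed)
    f = lambda y, k: y * y + k
    acc = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        xh = x + (-x) * h + math.sqrt(h) * z            # one Euler step for X
        kh = (3 - i) if rng.random() < 1.0 * h else i   # switch 1 <-> 2 at rate 1
        acc += f(xh, kh)
    return (acc / n - f(x, i)) / h   # approximates L f(x, i)

est = mc_generator_estimate()        # should be close to 2 - 2*(0.5)**2 = 1.5
```

For a quadratic f the Euler step introduces no first-order bias, so the estimate differs from L f(x, 1) = 1.5 only by Monte Carlo noise and O(h) terms.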
The Optimization Problem. Let c : R^d × S × U → R_+ be the cost function. We make the following assumption on c.

(A2) For each i ∈ S, c(·, i, ·) is continuous.

We define c̄ : R^d × S × V → R_+ by

    c̄(x, i, v) = ∫_U c(x, i, u) v(du).    (2.9)

Let v(·) be an admissible policy and (X(·), S(·)) the corresponding process. The pathwise (long-run) average cost incurred under v(·) is

    limsup_{T→∞} (1/T) ∫_0^T c̄(X(t), S(t), v(t)) dt.    (2.10)

We wish to a.s. minimize (2.10) over all admissible policies. Our goal is to establish the existence of a stable Markov policy which is a.s. optimal. We will carry out our study under two alternate sets of hypotheses: (a) a condition on the cost which penalizes unstable behavior, and (b) a blanket stability condition which implies that all Markov policies are stable. We will describe these conditions in Section 4.

3. Recurrence and Ergodicity of Switching Diffusions. Due to the interaction between the continuous and discrete components, the study of recurrence and ergodicity of switching diffusions is quite involved. Let v be a Markov policy which will be fixed throughout this section unless explicitly mentioned otherwise. Let P^v : R_+ × R^d × S → P(R^d × S) denote the transition function of the corresponding process (X(·), S(·)). Also, P^v_{x,i} and E^v_{x,i} denote the probability measure and the expectation operator, respectively, on the canonical space of the process (X(·), S(·)) starting at (x, i) ∈ R^d × S. The following result plays a crucial role in recurrence.

Lemma 3.1. For any (t, x, i) ∈ R_+ × R^d × S, the support of P^v(t, x, i; ·) is R^d × S.

Let τ_j, τ̃_i, τ_D, τ_{D,i} be the stopping times defined as follows. Let D ⊂ R^d be a bounded open set. Define

    τ_j = inf{t > 0 : S(t) = j},
    τ̃_i = inf{t > 0 : S(t) = i and S(u) ≠ i for some 0 < u < t},
    τ_D = inf{t ≥ 0 : X(t) ∉ D},
    τ_{D,i} = inf{t ≥ 0 : (X(t), S(t)) ∉ D × {i}}.

Lemma 3.2. Under (A1), the following hold:
(i) sup_{v∈Π_M, x∈R^d} E^v_{x,i} τ_j < ∞.
(ii) sup_{v∈Π_M, x∈R^d} E^v_{x,i} τ̃_i < ∞.
(iii) sup_{v∈Π_M, x∈D} E^v_{x,i} τ_{D,i} < ∞.
(iv) sup_{v∈Π_M, x∈D} E^v_{x,i} τ_D < ∞.

It is well known that the properties of harmonic functions of a Markov process play an important role in recurrence and ergodicity [2]. Therefore, we will study some properties of the harmonic functions of the process (X(·), S(·)) under the Markov policy v. Let D ⊂ R^d be an open set and f : D × S → R. The function f is called L^v-harmonic on D if it is bounded on compact subsets of D and, for all x ∈ D, i ∈ S,

    f(x, i) = E^v_{x,i} f(X(τ_V), S(τ_V))    (3.1)

for every neighborhood V of x having compact closure V̄ in D. It is clear that if f is L^v-harmonic then

    f(x, i) = E^v_{x,i} f(X(τ_D'), S(τ_D'))    (3.2)

for every open D' with compact closure D̄' ⊂ D.

Lemma 3.3. Let D ⊂ R^d be open. Then, under (A1):
(i) Every L^v-harmonic function in D is continuous in D.
(ii) If L^v f = 0, f ∈ W^{2,d}_{loc}(D × S), then f is L^v-harmonic. Conversely, if f is L^v-harmonic and f ∈ W^{2,d}_{loc}(D × S), then L^v f = 0 in D.
(iii) (Maximum Principle) Let D be connected and f ≥ 0 and L^v-harmonic in D. Then f is either strictly positive on D × S or identically zero.

Remark 3.1. (i) The condition λ_lm ≥ λ_0 > 0 plays a crucial role in the above results. For example, the maximum principle does not hold without it, as the following counterexample shows. Let d = 1, S = {1, 2}, λ_11(·) ≡ 0, λ_21(·) ≡ 1, b ≡ 0, D = R, f(x, 1) ≡ 0, f(x, 2) = cosh x. Then it is easily verified that L^v f(x, i) = 0, i = 1, 2.
(ii) To the best of our knowledge, this maximum principle is not known in the literature on partial differential equations.

We now discuss the recurrence properties of switching diffusions. Our treatment closely follows [2]. The point (x, i) ∈ R^d × S is said to be recurrent if, given any ε > 0,

    P^v_{x,i}(X(t_n) ∈ B(x, ε), S(t_n) = i, for a sequence t_n ↑ ∞) = 1.

A point (x, i) is transient if

    P^v_{x,i}(|X(t)| → ∞ as t → ∞) = 1.

If all points of the switching diffusion are recurrent, then it is called recurrent. A transient switching diffusion is similarly defined. Under (A1) (ii), (iii), we show that a switching diffusion is either recurrent or transient. Using (3.2) and well known arguments in Markov processes [8, Vol. I, p. 111], the following results can be proved.

Lemma 3.4. Under (A1), the following statements are equivalent. (i) The switching diffusion is recurrent. (ii) P^v_{x,i}(X(t) ∈ D for some t ≥ 0) = 1, for all x ∈ R^d, i ∈ S and any non-empty open set D. (iii) There exists a compact set K ⊂ R^d such that P^v_{x,i}(X(t) ∈ K for some t ≥ 0) = 1 for all (x, i) ∈ R^d × S. (iv) P^v_{x,i}(X(t_n) ∈ D for a sequence t_n ↑ ∞) = 1, for all x ∈ R^d, i ∈ S and any non-empty open set D. (v) There exist a point z ∈ R^d, a pair of numbers r_0, r_1, 0 < r_0 < r_1, and a y ∈ ∂B(z, r_1) such that P^v_{y,i}(X(t) ∈ B̄(z, r_0) for some t ≥ 0) = 1, for any i ∈ S.

We next establish Harnack's inequality for L^v-harmonic functions, which is a very important result in partial differential equations. This result will not play any role in recurrence or ergodicity, but it will be crucial in deriving a certain estimate in Section 5.

Theorem 3.1. Let Ω ⊂ R^d be a bounded open domain and D an open set whose closure is a compact subset of Ω. Let f ≥ 0 and L^v f = 0, f ∈ W^{2,p}_{loc}(Ω × S). Then for any x, y ∈ D and i, j ∈ S, we have

    f(x, i) ≤ C f(y, j),

where C is a constant which depends only on d, N, the diameter of D, the Hausdorff distance between ∂D and ∂Ω, the bounds on b, σ, λ in (A1), and the ellipticity constant of σσ', and is independent of the L^v-harmonic function f and the Markov policy v.

Remark 3.2. The counterexample in Remark 3.1 again shows that Theorem 3.1 does not hold if we drop the condition λ_ij ≥ λ_0 > 0.
Theorem 3.2. Under (A1), for any Markov policy, the switching diffusion is either recurrent or transient.

A recurrent switching diffusion will admit a unique (up to a constant multiple) σ-finite invariant measure. The switching diffusion is called positive recurrent if it is recurrent and admits a finite invariant measure. A Markov policy v is called stable if the corresponding process is positive recurrent; the corresponding invariant probability measure is denoted by η_v.

Theorem 3.3. Let z, r_0, r_1 be as in Lemma 3.4(v). Then under (A1), the switching diffusion is positive recurrent if

    E^v_{y,i}[inf{t ≥ 0 : X(t) ∈ B̄(z, r_0)}] < ∞  for all y ∈ ∂B(z, r_1), i ∈ S.    (3.3)

Note that it may be very difficult to verify (3.3) for general b, σ, λ. One usually verifies (3.3) by constructing a Liapunov function [2]. For switching diffusions such a construction seems difficult, since it involves solving a system of ordinary differential equations in closed form. However, we present some criteria for positive recurrence and discuss some implications.

(A3) There exists a w ∈ C²(R^d × S), w ≥ 0, such that w(x, i) → ∞ as |x| → ∞ for each i, E^v_{x,i} w(X(t), S(t)) and E^v_{x,i}|L^v w(X(t), S(t))| are locally bounded, and

    L^v w(x, i) ≤ p − q w(x, i)

for some p > 0 and q > 0.

Theorem 3.4. Under (A1) and (A3), the process (X(·), S(·)) under the Markov policy v is positive recurrent.

(A4) There exists a C² function w : R^d × S → R_+ such that
(i) lim_{|x|→∞} w(x, i) = +∞, uniformly in i.
(ii) There exist a > 0, ε > 0 such that for |x| > a, L^u w(x, i) < −ε for all u ∈ U, i ∈ S, and |∇w(x, i)|² ≥ λ_0^{−1}, where λ_0 is the constant in (A1) (iii).
(iii) w(x, i) and |∇w(x, i)| have polynomial growth.

The proof of [4, Lemma 6.2.2, p. 150] can be closely paralleled to yield the following.

Theorem 3.5. Under (A1) and (A4), the process (X(·), S(·)) under any Markov policy v is positive recurrent. Thus, all Markov policies are stable.

4. Existence of an Optimal Policy. In this section we will establish the existence of a stable Markov nonrandomized policy under certain conditions. We will follow the methodology developed in [4], [5], [6], [7] for controlled diffusions. For switching diffusions, similar techniques carry through with some extra technical details.

Let Π_SM and Π_SMD denote the sets of stable Markov and stable Markov nonrandomized policies, respectively. Since we look for an optimal policy in Π_SMD, it is natural to assume that it is nonempty. Let v ∈ Π_SMD. Then

    limsup_{T→∞} (1/T) ∫_0^T c̄(X(t), S(t), v(X(t), S(t))) dt = Σ_{i∈S} ∫_{R^d} c̄(x, i, v(x, i)) η_v(dx, i) =: ρ_v  a.s.    (4.1)

Let ρ* = inf_{v∈Π_SMD} ρ_v. We assume that ρ* < ∞. Consider the following conditions.

(A5) Assume that for each i ∈ S,

    liminf_{|x|→∞} inf_{u∈U} c̄(x, i, u) > ρ*.

(A6) There exists a w ∈ C²(R^d × S), w ≥ 0, such that
(i) w(x, i) → ∞ as |x| → ∞, uniformly in i.
(ii) For each v ∈ Π_M, E^v_{x,i} w(X(t), S(t)) and E^v_{x,i}|L^v w(X(t), S(t))| are locally bounded.
(iii) There exist p > 0, q > 0 such that L^u w(x, i) ≤ p − q w(x, i) for each u ∈ U.

Our main result is the following.

Theorem 4.1. Let (A1), (A2) hold. Under any one of the conditions (A4), (A5) or (A6), there exists a v* ∈ Π_SMD which is a.s. optimal.

5. Hamilton-Jacobi-Bellman Equations. In this section, we will study the HJB equations and characterize the optimal policy in terms of their solution. We will work under the following assumption.

(A7) The cost function c is bounded, continuous and Lipschitz in its first argument uniformly with respect to the third.

We will follow the vanishing discount approach, i.e., derive the HJB equations for the ergodic criterion as a vanishing limit of the HJB equations for the discounted criterion as the discount factor approaches zero. The results follow those of [6]; however, they differ in important technical details. For α > 0, x ∈ R^d, i ∈ S, let V_α(x, i) denote the discounted value function with discount factor α, i.e.,

    V_α(x, i) = inf_{v(·)∈Π} E^v_{x,i} [∫_0^∞ e^{−αt} c̄(X(t), S(t), v(t)) dt].    (5.1)

The following result is proved in [10].

Theorem 5.1. Under (A1), (A7), V_α is the unique solution in C²(R^d × S) ∩ C_b(R^d × S) of

    inf_{u∈U} {L^u V_α(x, i) + c(x, i, u)} = α V_α(x, i).    (5.2)

For i ∈ S, set

    G_i = {x ∈ R^d : inf_{u∈U} c(x, i, u) ≤ ρ*},  G = ∪_{i=1}^N G_i.    (5.3)

By (A5) and (A7), G is compact. The following result plays a very crucial role.

Lemma 5.1. Under (A1), (A5), (A7), there exists an α_0 ∈ (0, 1) such that for α ∈ (0, α_0], min_{x,i} V_α(x, i) is attained on the set G defined in (5.3). Also, |V_α(x, i) − V_α(y, j)| is bounded on compacta, uniformly over α ∈ (0, α_0].

Theorem 5.2. Under (A1), (A5), (A7), there exist a function V : R^d × S → R and a scalar ρ ∈ R, obtained for some fixed i_0 ∈ S as the limits

    V(x, i) = lim_{α→0} [V_α(x, i) − V_α(0, i_0)],  ρ = lim_{α→0} α V_α(0, i_0),    (5.4)

along a subsequence, such that V ∈ C²(R^d × S) and (V, ρ) satisfies the HJB equations given by

    inf_{u∈U} [L^u V(x, i) + c(x, i, u)] = ρ.    (5.5)
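A discrete-time toy version (not in the paper) illustrates the vanishing discount argument behind Theorem 5.2. For a fixed policy on a made-up two-state chain with transition matrix P and cost vector c, the discounted value solves V_β = c + βPV_β, and as β ↑ 1, (1 − β)V_β(x) tends to the average cost ρ = Σ_x π(x)c(x), while V_β(x) − V_β(0) tends to a relative value function.

```python
# Toy discrete-time analogue of the vanishing discount limit (5.4):
# as beta -> 1, (1 - beta) * V_beta(x) -> rho and
# V_beta(x) - V_beta(0) -> a relative value function.
def discounted_value(beta, c, P, iters=20_000):
    V = [0.0] * len(c)
    for _ in range(iters):   # value iteration for the discounted cost
        V = [c[x] + beta * sum(P[x][y] * V[y] for y in range(len(c)))
             for x in range(len(c))]
    return V

P = [[0.5, 0.5], [0.5, 0.5]]   # doubly stochastic, so pi = (0.5, 0.5)
c = [1.0, 3.0]                 # running cost; average cost rho = 2.0
for beta in (0.9, 0.99, 0.999):
    V = discounted_value(beta, c, P)
    scaled, relative = (1 - beta) * V[0], V[1] - V[0]
    # scaled approaches rho = 2.0 while relative stays near 2.0
```

For this chain the limit can be checked by hand: V_β(0) = 1 + 2β/(1 − β), so (1 − β)V_β(0) = (1 − β) + 2β → 2 = ρ.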
Based on Lemma 5.1 and Theorem 3.1, the following results can be derived by closely following the methods in [6].

Lemma 5.2. Assume (A1), (A5), (A7). Let v ∈ Π_MD be such that, for each i, v(x, i) attains the minimum in (5.5), i.e.,

    L^{v(x,i)} V(x, i) + c(x, i, v(x, i)) = inf_{u∈U} [L^u V(x, i) + c(x, i, u)]  a.e. (x, i).    (5.6)

Then v ∈ Π_SMD, the scalar ρ in (5.5) equals ρ*, and v in (5.6) is a.s. optimal.

Theorem 5.3. Among all pairs (V̂, ρ) ∈ W^{2,p}_{loc}(R^d × S) × R, 2 ≤ p < ∞, satisfying (5.5), the pair (V, ρ*), where V is the function in Theorem 5.2, is the unique one satisfying the properties (5.4).

Theorem 5.4. Under (A1), (A5), (A7), v ∈ Π_SMD is a.s. optimal if and only if (5.6) holds.

Remark 5.1. The boundedness condition on the cost function c can be dropped by mimicking the arguments in [6, p. 202].

We will now study the HJB equation under (A1), (A6) and (A7). Recall that under (A1), (A6), Π_M = Π_SM. We say that a function f : R^d × S → R is in the class O(w) if, for each i ∈ S,

    limsup_{|x|→∞} |f(x, i)|/w(x, i) < ∞.

Theorem 5.5. Under (A1), (A6) and (A7), the equation (5.5) admits a unique solution (V, ρ) in the class W^{2,p}_{loc}(R^d × S) ∩ O(w), 2 ≤ p < ∞, satisfying V(0, i_0) = 0 for a fixed i_0 ∈ S.

Remark 5.2. (i) The statement of Theorem 5.5 holds under (A1), (A4) and (A7). (ii) For the stable case we have carried out our analysis under the Liapunov condition (A6). Analogous results can be derived under the weaker condition (A4).

6. An Application to a Manufacturing Model. We now use the results of the previous sections to analyze the manufacturing model studied in [1], [3], [10]. Suppose there is one machine producing a single commodity. We assume that the demand rate is a constant d > 0. Let the machine state S(t) take values in {0, 1}, S(t) = 0 or 1, according as the machine is down or functional. Let S(t) be a continuous time Markov chain with generator

    [ −λ_0   λ_0 ]
    [  λ_1  −λ_1 ],

where λ_0 > 0, λ_1 > 0; λ_0 and λ_1 are the infinitesimal repair and failure rates, respectively. The inventory X(t) is governed by the Ito equation

    dX(t) = (u(t) − d) dt + σ dW(t),    (6.1)

where σ > 0, u(t) is the production rate, and W(t) is a one-dimensional Wiener process independent of S(t). The last term in (6.1) can be interpreted as "sales return", "inventory spoilage", "sudden demand fluctuations", etc. A negative value of X(t) represents backlogged demand. The production rate is constrained by

    u(t) = 0 if S(t) = 0,  u(t) ∈ [0, r] if S(t) = 1.

Let c : R → R_+ be the cost function, which is assumed to be convex and Lipschitz. Also, c(x) = ĉ(|x|) for some increasing ĉ : R_+ → R_+. Thus, c satisfies (A5). We will show that a certain hedging point policy is stable. Therefore, by the results of Section 4, there exists an a.s. optimal Markov nonrandomized policy for the cost criterion

    limsup_{T→∞} (1/T) ∫_0^T c(X(t)) dt.

The HJB equations in this case are

    (σ²/2) V″(x, 0) − d V′(x, 0) + λ_0 [V(x, 1) − V(x, 0)] + c(x) = ρ,
    (σ²/2) V″(x, 1) + min_{u∈[0,r]} {(u − d) V′(x, 1)} + λ_1 [V(x, 0) − V(x, 1)] + c(x) = ρ.    (6.2)

The results of Section 5 ensure the existence of a C² solution (V, ρ*) of (6.2), where ρ* is the optimal cost. Using the convexity of c(·), it can be shown that V(·, i) is convex for each i. Hence there exists an x* such that

    V′(x, 1) ≤ 0 for x ≤ x*,  V′(x, 1) ≥ 0 for x ≥ x*.    (6.3)

From (6.2), it follows that the value of u which minimizes (u − d) V′(x, 1) is

    u = r if x ≤ x*,  u = 0 if x > x*.

At x = x*, V′(x*, 1) = 0 and therefore any u ∈ [0, r] minimizes (u − d) V′(x, 1). Thus, in view of Theorem 5.4, we can choose any u ∈ [0, r] at x = x*. To be specific, we choose u = d at x*, i.e., we just produce to meet the demand exactly. Thus, the following u* ∈ Π_SMD is optimal:

    u*(x, 0) = 0,
    u*(x, 1) = r if x < x*,  d if x = x*,  0 if x > x*.    (6.4)

The stability of the policy (6.4) follows from Lemma 5.2 provided we show that Π_SM is nonempty. We show that the zero-inventory policy v given by

    v(x, 0) = 0,  v(x, 1) = r if x < 0,  d if x = 0,  0 if x > 0,    (6.5)

is stable if and only if

    r λ_0 > d(λ_0 + λ_1).    (6.6)

The condition (6.6) is very appealing from an intuitive point of view. Note that λ_0^{−1} and λ_1^{−1} are the mean sojourn times of the chain in states 0 and 1, respectively. In state 0 the mean inventory depletes at a rate d, while in state 1 it builds up at the rate (r − d). Thus, if (6.6) is satisfied, one would expect the zero-inventory policy to stabilize the system. Our analysis confirms this intuition. We first show that under v the process (X(·), S(·)) has an invariant measure η_v with a strictly positive "density-mass". In view of Lemma 3.1, it would follow from the ergodic theory of Markov processes [13, Chap. 1] that (X(·), S(·)) is positive recurrent; thus, v would be stable. To this end, we attempt to solve the adjoint system

    (L^v)* φ(x, i) = 0.
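The stability condition (6.6) can also be probed by simulating (6.1) under the zero-inventory hedging point policy. The sketch below is illustrative and not from the paper; all parameter values are made up, chosen so that rλ_0 > d(λ_0 + λ_1) holds, and the pathwise average cost is estimated for the hypothetical choice c(x) = |x|.

```python
import math, random

# Simulation sketch of the inventory model (6.1) under the zero-inventory
# hedging point policy (x* = 0). Illustrative parameters satisfying (6.6):
# r*lam0 = 2.0 > d*(lam0 + lam1) = 1.0.
r, d, sig = 2.0, 0.5, 0.3
lam0, lam1 = 1.0, 1.0          # repair and failure rates

def u_policy(x, s):            # the policy (6.5): hedging point at 0
    if s == 0:
        return 0.0             # machine down: cannot produce
    return r if x < 0 else (d if x == 0 else 0.0)

def average_cost(T=200.0, dt=1e-3, seed=3):
    rng = random.Random(seed)
    x, s, tot = 0.0, 1, 0.0
    for _ in range(int(T / dt)):
        u = u_policy(x, s)
        x += (u - d) * dt + sig * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        if s == 0 and rng.random() < lam0 * dt:
            s = 1              # repair
        elif s == 1 and rng.random() < lam1 * dt:
            s = 0              # failure
        tot += abs(x) * dt     # running cost c(x) = |x|
    return tot / T

avg = average_cost()           # stays finite, consistent with stability
```

Rerunning with parameters violating (6.6) (e.g., r = 0.8) makes the inventory drift off to −∞ and the average cost grow with T, mirroring the dichotomy established analytically below.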
Without any loss of generality, assume that σ² = 2. Then for x > 0, the solutions of the adjoint system in L¹[0, ∞) can be parameterized in terms of α, β ∈ R, with exponents s_1 = d and s_2 = (d + √(d² + 4λ_0))/2. For x < 0, the adjoint system reads

    φ″(x, 0) + d φ′(x, 0) − λ_0 φ(x, 0) + λ_1 φ(x, 1) = 0,
    φ″(x, 1) − (r − d) φ′(x, 1) − λ_1 φ(x, 1) + λ_0 φ(x, 0) = 0.    (6.10)

Under (6.6), all the solutions of (6.10) in L¹(−∞, 0] can be parameterized in terms of γ, δ ∈ R, where ψ(s) = s² + ds − λ_0 and s_3, s_4 are the positive roots of the polynomial

    s³ − (r − 2d)s² − [(r − d)d + λ_0 + λ_1]s + [rλ_0 − d(λ_1 + λ_0)],

ordered by 0 < s_3 < s_4. It can be verified that ψ(s_3) < 0 and ψ(s_4) > 0. We need to satisfy

    φ_+(0) = φ_−(0)    (6.13)

and (6.14). Here (6.13) is simply the continuity requirement, whereas (6.14) should hold since S(t) has a unique invariant probability measure π which satisfies

    π_0 λ_0 = π_1 λ_1.

The conditions (6.13), (6.14) are equivalent to the set of linear equations

    α − β = γ − δ,    (6.15a)
    λ_0 α + λ_1 β = −γ ψ(s_3) + δ ψ(s_4).    (6.15b)

Solving (6.15a)-(6.15d), we obtain a unique solution (α, β, γ, δ) satisfying

    0 < β < α,  0 < δ < γ.    (6.16)

It follows from (6.16) and the foregoing that

    φ_+(x) > 0 for x ≥ 0,  and  φ_−(x) > 0 for x < 0.

Note that if rλ_0 ≤ d(λ_1 + λ_0), then s_3 = 0 and there exists no positive solution φ_−(x) of (6.10) in L¹(−∞, 0].

In [3], Bielecki and Kumar have studied the mean square stability of the piecewise deterministic system, i.e., (6.1) with σ = 0. They have shown that for

    r λ_0 / (λ_0 + λ_1) > d

the policy (6.5) is mean square stable. Our analysis shows that the additive noise in (6.1) retains the stability of the zero-inventory policy as long as strict inequality holds in the above.

Acknowledgement. The authors wish to thank Prof. S. R. S. Varadhan for explaining to us the work of Krylov and Safonov on Harnack's inequality.

REFERENCES
1. R. Akella and P. R. Kumar, Optimal control of production rate in a failure prone manufacturing system, IEEE Trans. Automat. Control AC-31 (1986), 116-126.
2. R. N. Bhattacharya, Criteria for recurrence and existence of invariant measures for multidimensional diffusions, Annals of Probability 6 (1978), 541-553.
3. T. Bielecki and P. R. Kumar, Optimality of zero-inventory policies for unreliable manufacturing systems, Oper. Res. 36 (1988), 532-546.
4. V. S. Borkar, Optimal Control of Diffusion Processes, Pitman Research Notes in Math. Series, No. 203, Longman, Harlow, UK, 1989.
5. V. S. Borkar and M. K. Ghosh, Ergodic control of multidimensional diffusions, I: The existence results, SIAM J. Control Optim. 26 (1988), 112-126.
6. ——, Ergodic control of multidimensional diffusions, II: Adaptive control, Appl. Math. Optim. 21 (1990), 191-220.
7. ——, Controlled diffusions with constraints, J. Math. Anal. Appl. 152 (1990), 88-108.
8. E. B. Dynkin, Markov Processes, Vols. I and II, Springer-Verlag, New York, 1965.
9. M. K. Ghosh, A. Arapostathis and S. I. Marcus, An optimal control problem arising in flexible manufacturing systems, Proc. 30th IEEE Conf. on Decision and Control, Brighton, England (1991), 1844-1849.
10. ——, Optimal control of switching diffusions with application to flexible manufacturing systems, SIAM J. Control Optim. (to appear).