Chapter 13 Stochastic Optimal Control The state of the system is represented by a controlled stochastic process. Section 13.2 formulates a stochastic optimal control problem. We shall consider stochastic differential equations of a type known as Itô equations, which are perturbed by Markov diffusion processes, and our goal will be to synthesize optimal feedback controls for systems subject to Itô equations. In Section 13.3, we shall extend the production planning model of Chapter 6. In Section 13.4, we solve an optimal stochastic advertising problem. In Section 13.5, we will introduce investment decisions in the consumption model of Example 1.3, and consider both risk-free and risky investments. In Section 13.6, we will conclude the chapter by mentioning stochastic optimal control problems involving jump Markov processes, which are treated in Sethi and Zhang (1994a, 1994c). 13.2 Stochastic Optimal Control We assume the state variables to be observable, and we use the dynamic programming or the HamiltonJacobi-Bellman framework rather than the stochastic maximum principle. Maximize subject to the Itô stochastic differential equation V(x,t), the value function satisfies By Taylor’s expansion, we have From (3.29), we can formally write The multiplication rules of the stochastic calculus are: 13.3 A stochastic Production Planning Model Xt = the inventory level at time t (state variable), Ut = the production rate at time t (control variable), S = the constant demand rate at time t ; S >0, T = the length of planning period, = the factory-optimal inventory level, = the factory-optimal production level, x0 = the initial inventory level, h = the inventory holding cost coefficient, c = the production cost coefficient, B = the salvage value per unit of inventory at time T, zt = the standard Wiener process, = the constant diffusion coefficient. Let V(x,t) be the value function. It satisfies the Hamilton-Jacobi-Bellman (HJB) equation Remark 13.1: If production rate were restricted to be nonnegative, then (13.44) would be changed to 13.3.1 Solution for the Production Planning Problem Since (13.51) must hold for any value of x , we must have where the boundary conditions are obtained by comparing (13.47) with the boundary condition V(x,T) = Bx of (13.43). To solve (13.52), we expand fractions to obtain by partial where Since S is assumed to be a constant, we can reduce (13.53) to By the change of variable defined by solution is given by the Remark 13.2 The optimal production rate in (13.59) equals the demand plus a correction term which depends on the level of inventory and the distance from the horizon time T. Since (y-1) < 0 for t < T, it is clear that for lower values of x, the optimal production rate is likely to be positive. However, if x is very high, the correction term will become smaller than –S, and the optimal control will be negative. In other words, if inventory level is too high, the factory can save money by disposing a part of the inventory resulting in lower holding costs. Figure 13.2: A Sample Path of Xt with X0=x0 >0 and B>0 13.4 A Stochastic Advertising Problem The model is : where Xt is the market share and Ut is the rate of advertising at time t . V(x) is the expected value of the discounted profits from time t to infinity. Since T = , the future looks the same from any time t, and therefore the value function does not depend on t. We can write the HJB equation as We obtain the explicit formula for the optimal feedback control as Eventually, the market share process hovers around the equilibrium level 13.5 An Optimal Consumption-Investment Problem Consider investing a part of Rich’s wealth in a risky security or stock that earns an expected rate of return that equals > r . The problem of Rich, known now as Rich Investor, is to optimally allocate his wealth between the risky-free savings account and the risky stock over time and also consume over time so as to maximize his total utility of consumption. The savings account is easy to model. Modeling the stock is more complicated. where is the average rate of return on stock, is the standard deviation associated with the return, and zt is a standard Wiener process. The price process Pt given by (13.74) is often referred to as a logarithmic Brownian Motion. Notation: Wt = the wealth at time t , Ct = the consumption rate at time t , Qt = the fraction of the wealth invested in stock at time t, 1-Qt= the fraction of the wealth kept in the savings account at time t , U(c) = the utility of consumption when consumption is at the rate c; the function U(c) is assumed to be increasing and concave, = the rate of discount applied to consumption utility, B = the bankruptcy parameter to be explained later. The term QtWtdt represents the expected return from the risky investment of QtWt dollars during the period from t to t+dt. The term QtWtdzt represents the risk involved in investing QtWt dollars in stock. The term (1-Qt)rWt dt is the amount of interest earned on the balance of (1-Qt)Wt dollars in the savings account. Finally, Ctdt represents the amount of consumption during the interval from t to t+dt . We shall say that Rich goes bankrupt at time T , when his wealth falls to zero at that time. T is a random variable, called a stopping time, since it is observed exactly at the instant of time when wealth falls to zero. Rich’s objective function is: See Sethi(1997a) for a detailed discussion of the bankruptcy parameter B . Simplify the problem by assuming: We also assume B =-. The condition (13.81) together with B =- implies a strictly positive consumpton level at all times and no bankruptcy. Karatzas, Lehoczky, Sethi, and Shreve(1986) assumed that the value function is strictly concave and, therefore, Vx is monotonically decreasing in x . This means that The function c(•) defined in (13.83) has an inverse X(•) such that (13.84) can be written as Note that c(X(c))=c, c’(X)X’(c)=1, and therefore, c’(X)= 1/ X’(c). Differentiation with respect to c yields the intended second-order, linear ordinary differential equation This problem and its many extensions have been studied in great detail. See,e.g., Sethi(1997a). 13.6 Concluding Remarks For stochastic optimal control application to manufacturing problems, see Sethi and Zhang (1994a) and Yin and Zhang (1997). For applications to problems in finance, see Sethi(1997a) and Karatzas and Shreve(1998). For applications in marketing, see Tapiero(1988). For applications in economics including economics of natural resources, see Derzko and Sethi(1981a), and Malliaris and Brock(1982).