9.3 Concept of Suboptimization and Principle of Optimality 551

suboptimization is shown in Fig. 9.7. Since the suboptimizations are to be done in the reverse order, the components of the system are also numbered in the same manner for convenience (see Fig. 9.3).

Figure 9.7 Suboptimization (principle of optimality).

The process of suboptimization was stated by Bellman [9.2] as the principle of optimality:

An optimal policy (or a set of decisions) has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.

Recurrence Relationship. Suppose that the desired objective is to minimize the $n$-stage objective function $f$, which is given by the sum of the individual stage returns:

$$\text{Minimize } f = R_n(x_n, s_{n+1}) + R_{n-1}(x_{n-1}, s_n) + \cdots + R_1(x_1, s_2) \qquad (9.9)$$

where the state and decision variables are related as

$$s_i = t_i(s_{i+1}, x_i), \quad i = 1, 2, \ldots, n \qquad (9.10)$$

552 Dynamic Programming

Consider the first subproblem by starting at the final stage, $i = 1$. If the input to this stage, $s_2$, is specified, then according to the principle of optimality, $x_1$ must be selected to optimize $R_1$. Irrespective of what happens to the other stages, $x_1$ must be selected such that $R_1(x_1, s_2)$ is an optimum for the input $s_2$. If the optimum is denoted as $f_1^*$, we have

$$f_1^*(s_2) = \underset{x_1}{\operatorname{opt}} \left[ R_1(x_1, s_2) \right] \qquad (9.11)$$

This is called a one-stage policy since once the input state $s_2$ is specified, the optimal values of $R_1$, $x_1$, and $s_1$ are completely defined. Thus Eq. (9.11) is a parametric equation giving the optimum $f_1^*$ as a function of the input parameter $s_2$.

Next, consider the second subproblem by grouping the last two stages together. If $f_2^*$ denotes the optimum objective value of the second subproblem for a specified value of the input $s_3$, we have

$$f_2^*(s_3) = \underset{x_2, x_1}{\operatorname{opt}} \left[ R_2(x_2, s_3) + R_1(x_1, s_2) \right] \qquad (9.12)$$

The principle of optimality requires that $x_1$ be selected so as to optimize $R_1$ for a given $s_2$. Since $s_2$ can be obtained once $x_2$ and $s_3$ are specified, Eq. (9.12) can be written as

$$f_2^*(s_3) = \underset{x_2}{\operatorname{opt}} \left[ R_2(x_2, s_3) + f_1^*(s_2) \right] \qquad (9.13)$$

Thus $f_2^*$ represents the optimal policy for the two-stage subproblem. It can be seen that the principle of optimality reduced the dimensionality of the problem from two [in Eq. (9.12)] to one [in Eq. (9.13)]. This can be seen more clearly by rewriting Eq. (9.13) using Eq. (9.10) as

$$f_2^*(s_3) = \underset{x_2}{\operatorname{opt}} \left[ R_2(x_2, s_3) + f_1^*\{t_2(s_3, x_2)\} \right] \qquad (9.14)$$

In this form it can be seen that for a specified input $s_3$, the optimum is determined solely by a suitable choice of the decision variable $x_2$. Thus the optimization problem stated in Eq. (9.12), in which both $x_2$ and $x_1$ are to be simultaneously varied to produce the optimum $f_2^*$, is reduced to two subproblems defined by Eqs. (9.11) and (9.13). Since the optimization of each of these subproblems involves only a single decision variable, the optimization is, in general, much simpler.

This idea can be generalized and the $i$th subproblem defined by

$$f_i^*(s_{i+1}) = \underset{x_i, x_{i-1}, \ldots, x_1}{\operatorname{opt}} \left[ R_i(x_i, s_{i+1}) + R_{i-1}(x_{i-1}, s_i) + \cdots + R_1(x_1, s_2) \right] \qquad (9.15)$$

which can be written as

$$f_i^*(s_{i+1}) = \underset{x_i}{\operatorname{opt}} \left[ R_i(x_i, s_{i+1}) + f_{i-1}^*(s_i) \right] \qquad (9.16)$$

where $f_{i-1}^*$ denotes the optimal value of the objective function corresponding to the last $i - 1$ stages, and $s_i$ is the input to the stage $i - 1$. The original problem in Eq. (9.15) requires the simultaneous variation of $i$ decision variables, $x_1, x_2, \ldots, x_i$, to determine the optimum value of $f_i = \sum_{k=1}^{i} R_k$ for any specified value of the input $s_{i+1}$. This problem, by using the principle of optimality, has been decomposed into $i$ separate problems, each involving only one decision variable. Equation (9.16) is the desired recurrence relationship valid for $i = 2, 3, \ldots, n$.
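To make the recurrence concrete, the following is a minimal sketch, not taken from the text, that applies Eq. (9.16) with opt = min to a small hypothetical resource-allocation problem and checks the result against a direct minimization of Eq. (9.9). The stage numbering follows the text (stage $n$ acts first, stage 1 last), but the transformation $t_i(s_{i+1}, x_i) = s_{i+1} - x_i$, the squared-deviation returns, and the names `TARGETS`, `R`, `t`, `solve`, `recover`, and `brute_force` are all illustrative assumptions.

```python
from itertools import product

# Hypothetical toy problem (an assumption, not from the text). A resource of
# s_{n+1} units passes through n stages numbered backward, so stage n acts
# first and stage 1 last. Stage i uses x_i units, leaving
# s_i = t_i(s_{i+1}, x_i) = s_{i+1} - x_i, and returns (x_i - target_i)^2.
TARGETS = {1: 2, 2: 1, 3: 3}  # assumed targets, keyed by stage number


def R(i, x, s_in):
    """Stage return R_i(x_i, s_{i+1}); s_in only limits feasibility here."""
    return (x - TARGETS[i]) ** 2


def t(i, s_in, x):
    """Stage transformation s_i = t_i(s_{i+1}, x_i), Eq. (9.10)."""
    return s_in - x


def solve(n, s_top):
    """Backward recursion of Eq. (9.16) with opt = min, for i = 1, ..., n."""
    states = range(s_top + 1)
    f_prev = {s: 0 for s in states}  # f_0^* = 0, so i = 1 reproduces Eq. (9.11)
    policy = {}
    for i in range(1, n + 1):
        f_cur = {}
        for s_in in states:  # every possible input s_{i+1}
            def value(x):
                return R(i, x, s_in) + f_prev[t(i, s_in, x)]
            x_best = min(range(s_in + 1), key=value)  # one decision variable only
            f_cur[s_in] = value(x_best)
            policy[i, s_in] = x_best
        f_prev = f_cur
    return f_prev[s_top], policy


def recover(n, s_top, policy):
    """Forward pass: read x_n^*, ..., x_1^* from the stored suboptimizations."""
    xs, s = {}, s_top
    for i in range(n, 0, -1):
        xs[i] = policy[i, s]
        s = t(i, s, xs[i])
    return xs


def brute_force(n, s_top):
    """Minimize Eq. (9.9) directly by varying all n decisions simultaneously."""
    best = float("inf")
    for xs in product(range(s_top + 1), repeat=n):  # xs = (x_n, ..., x_1)
        s, total = s_top, 0
        for i, x in zip(range(n, 0, -1), xs):
            if x > s:  # infeasible: uses more than is available
                break
            total += R(i, x, s)
            s = t(i, s, x)
        else:
            best = min(best, total)
    return best


f_opt, policy = solve(3, 6)
print(f_opt, recover(3, 6, policy), brute_force(3, 6))
```

The recursion examines each state and one decision at a time, while `brute_force` enumerates all $(x_3, x_2, x_1)$ combinations at once; agreement between the two is what the principle of optimality guarantees.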
9.4 COMPUTATIONAL PROCEDURE IN DYNAMIC PROGRAMMING

The use of the recurrence relationship derived in Section 9.3 in actual computations is discussed in this section [9.10]. As stated, dynamic programming begins by suboptimizing the last component, numbered 1. This involves the determination of

$$f_1^*(s_2) = \underset{x_1}{\operatorname{opt}} \left[ R_1(x_1, s_2) \right] \qquad (9.17)$$

The best value of the decision variable $x_1$, denoted as $x_1^*$, is that which makes the return (or objective) function $R_1$ assume its optimum value, denoted by $f_1^*$. Both $x_1^*$ and $f_1^*$ depend on the condition of the input or feed that component 1 receives from upstream, that is, on $s_2$. Since the particular value $s_2$ will assume after the upstream components are optimized is not known at this time, this last-stage suboptimization problem is solved for a "range" of possible values of $s_2$ and the results are entered into a graph or a table. This graph or table contains a complete summary of the results of suboptimization of stage 1. In some cases, it may be possible to express $f_1^*$ as a function of $s_2$. If the calculations are to be performed on a computer, the results of suboptimization have to be stored in the form of a table in the computer. Figure 9.8 shows a typical table in which the results obtained from the suboptimization of stage 1 are entered.

Figure 9.8 Suboptimization of component 1 for various settings of the input state variable $s_2$.

Next we move up the serial system to include the last two components. In this two-stage suboptimization, we have to determine

$$f_2^*(s_3) = \underset{x_2, x_1}{\operatorname{opt}} \left[ R_2(x_2, s_3) + R_1(x_1, s_2) \right] \qquad (9.18)$$

Since all the information about component 1 has already been encoded in the table corresponding to $f_1^*$, this information can then be substituted for $R_1$ in Eq. (9.18) to get the following simplified statement:

$$f_2^*(s_3) = \underset{x_2}{\operatorname{opt}} \left[ R_2(x_2, s_3) + f_1^*(s_2) \right] \qquad (9.19)$$

where $s_2 = t_2(s_3, x_2)$ follows from Eq. (9.10). Thus the number of variables to be considered has been reduced from two ($x_1$ and $x_2$) to one ($x_2$).
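The stage-1 tabulation described above can be sketched as follows. This is a minimal illustration, not the book's procedure: the return $R_1(x_1, s_2) = (x_1 - 2)^2$, the restriction of $x_1$ to integers between 0 and $s_2$, the grid of $s_2$ values, and the names `R1`, `suboptimize_stage1`, and `f1_closed` are all hypothetical. The dictionary plays the role of the Fig. 9.8 table, and the final assertion illustrates the remark that $f_1^*$ can sometimes be written directly as a function of $s_2$.

```python
# Hypothetical stage-1 data (an assumption, not from the text): the last
# component may use any integer amount x_1 between 0 and its input s_2,
# and its return R_1(x_1, s_2) = (x_1 - 2)^2 is to be minimized.
def R1(x1, s2):
    return (x1 - 2) ** 2


def suboptimize_stage1(s2_values):
    """Build a Fig. 9.8-style table s_2 -> (x_1^*, f_1^*), per Eq. (9.17)."""
    table = {}
    for s2 in s2_values:  # s_2 is unknown yet, so cover a range of values
        x_best = min(range(s2 + 1), key=lambda x1: R1(x1, s2))
        table[s2] = (x_best, R1(x_best, s2))
    return table


table1 = suboptimize_stage1(range(6))
print(" s_2   x_1*   f_1*")
for s2, (x1, f1) in table1.items():
    print(f"{s2:4d} {x1:6d} {f1:6d}")


# For this particular R_1, f_1^* also has a closed form in s_2.
def f1_closed(s2):
    return max(0, 2 - s2) ** 2


assert all(table1[s2][1] == f1_closed(s2) for s2 in table1)
```

A table keyed by $s_2$ is the natural storage choice here because the later stage only needs to look up $f_1^*$ for whatever $s_2$ its own decision produces.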
A range of possible values of $s_3$ must be considered, and for each one, $x_2^*$ must be found so as to optimize $[R_2 + f_1^*(s_2)]$. The results ($x_2^*$ and $f_2^*$ for different $s_3$) of this suboptimization are entered in a table as shown in Fig. 9.9.

Figure 9.9 Suboptimization of components 1 and 2 for various settings of the input state variable $s_3$.
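The two-stage step can be sketched in the same way: for each $s_3$ in a range, only $x_2$ is searched, and $f_1^*(s_2)$ is read from the stored stage-1 table rather than re-optimized, as in Eq. (9.19). As before, this is a minimal illustration under assumed data: $R_1 = (x_1 - 2)^2$, $R_2 = (x_2 - 1)^2$, $t_2(s_3, x_2) = s_3 - x_2$, the bound `S_MAX`, and the names `R1`, `R2`, `t2`, `table1`, and `table2` are hypothetical. `table2` corresponds to the Fig. 9.9 table.

```python
# Hypothetical two-stage data (an assumption, not from the text). Stage 2
# receives s_3, uses x_2, and passes s_2 = t_2(s_3, x_2) = s_3 - x_2 on
# to stage 1. Both returns are minimized.
def R1(x1, s2):
    return (x1 - 2) ** 2


def R2(x2, s3):
    return (x2 - 1) ** 2


def t2(s3, x2):
    return s3 - x2


S_MAX = 5  # assumed largest input state considered

# Stage-1 table (Fig. 9.8 analog): s_2 -> (x_1^*, f_1^*), Eq. (9.17).
table1 = {}
for s2 in range(S_MAX + 1):
    x1 = min(range(s2 + 1), key=lambda x: R1(x, s2))
    table1[s2] = (x1, R1(x1, s2))

# Stage-2 table (Fig. 9.9 analog): for each s_3, search over x_2 alone and
# look up f_1^*(s_2) in table1 instead of re-optimizing x_1, Eq. (9.19).
table2 = {}
for s3 in range(S_MAX + 1):
    def total(x2):
        return R2(x2, s3) + table1[t2(s3, x2)][1]
    x2 = min(range(s3 + 1), key=total)  # ties go to the smallest x_2
    table2[s3] = (x2, total(x2))

for s3, (x2, f2) in table2.items():
    print(f"s_3={s3}: x_2*={x2}, f_2*={f2}")
```

With these assumed returns, $s_3 = 2$ produces a tie between $x_2 = 0$ and $x_2 = 1$ (both give $f_2^* = 1$); `min` keeps the first candidate, so the table records $x_2^* = 0$. Any tie-breaking rule gives the same $f_2^*$, which is all the next stage needs.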