Int. J. Prod. Res., 1 September 2004, vol. 42, no. 17, 3619–3625

Optimal control of a remanufacturing system

K. NAKASHIMA†*, H. ARIMITSU†, T. NOSE† and S. KURIYAMA‡

An optimal control problem of a remanufacturing system under stochastic demand is studied. The system is formulated as a Markov decision process, which is a class of stochastic sequential processes in which the reward and transition probability depend only on the current state of the system and the current action. Such models have gained recognition in diverse fields including engineering, economics and communications. Each model consists of states, actions, rewards and transition probabilities. The paper considers two types of inventories: the actual product inventory in a factory and the virtual inventory used by customers. The state of the remanufacturing system is defined by both inventory levels. One can obtain the optimal production policy that minimizes the expected average cost per period. The paper also considers some scenarios under various conditions and shows an example of controlling the remanufacturing system.

1. Introduction

The escalating growth in consumer waste in recent years has started to threaten the environment. Currently, product recovery is practised in part because of the escalating deterioration of the environment and in part because of profit motives. Product recovery aims to minimize the amount of waste sent to landfill sites by recovering materials and parts from old or outdated products by means of recycling and remanufacturing. Product recovery includes collection, disassembly, cleaning, sorting, repairing, reconditioning, reassembling and testing (Gupta and Taleb 1994, Brennan et al. 1994).

The present paper deals with an optimal control problem of a remanufacturing system under stochastic variability such as demand. The system is formulated as a Markov decision process (MDP) (Howard 1960, Puterman 1994).
The MDP is a class of stochastic sequential processes in which the reward and transition probability depend only on the current state of the system and the current action. MDP models have gained recognition in such diverse fields as economics, communications and transportation. Each model consists of states, actions, rewards and transition probabilities. In the engineering field, for example, the approach is used for controlling production systems (Ohno and Ichiki 1987, Ohno and Nakashima 1995). Choosing an action as a production quantity in a state generates rewards and/or costs and determines the state at the next decision epoch through a transition probability function. One can then obtain the optimal production policy that minimizes the expected average cost per period in the optimal production control problem.

Revision received May 2004.
†Department of Industrial Management, Osaka Institute of Technology, 5-16-1 Omiya, Asahi-ku, Osaka, 535-8585 Japan.
‡Setsunan University, Faculty of Business Administration and Information, 17-8, Ikedanakamachi, Neyagawa, Osaka, 572-8508 Japan.
*To whom correspondence should be addressed. e-mail: nakasima@dim.oit.ac.jp
International Journal of Production Research ISSN 0020–7543 print/ISSN 1366–588X online © 2004 Taylor & Francis Ltd
http://www.tandf.co.uk/journals
DOI: 10.1080/00207840410001721772

The paper considers two types of inventories: the actual product inventory in a factory and the virtual inventory used by customers. It defines the state of the remanufacturing system by considering both inventory levels. It then obtains the optimal production policy that minimizes the expected average cost per period. The paper also considers some scenarios under various conditions and optimizes the remanufacturing system.

Section 2 briefly summarizes remanufacturing environments and the relevant literature. Section 3 considers a single-item remanufacturing system under stochastic demand.
The system is formulated as an undiscounted MDP to determine the optimal control policy that minimizes the expected average cost per period. Finally, the numerical results of controlling the remanufacturing system under various conditions are shown.

2. Remanufacturing environments and a literature review

The paper focuses on the operational aspect of product recovery in the remanufacturing environment with stochastic variability. Remanufacturing systems have to make operational, manufacturing, inventory, distribution and marketing-related decisions (Stock 1992, Kopicky et al. 1993). In general, the existing methods for traditional production systems cannot be applied directly to remanufacturing systems. Remanufacturing environments are characterized by their highly flexible structures. Flexibility is required to handle the uncertainties, e.g. collection and discard, that are likely to arise. In addition, note that the products used by consumers will be collected, recovered and used as parts for remanufacturing.

Gungor and Gupta (1999) and Moyer and Gupta (1997) review the literature in the area of environmentally conscious manufacturing and product recovery. They summarize much of the area including industrial examples, modelling and solutions. Minner (2001) notes that there are two well-known streams in the product recovery research area: stochastic inventory control (SIC) and material requirement planning (MRP). The present paper restricts itself to SIC.

As for periodic review models, Cohen et al. (1980) develop a product recovery model in which the collected products are used directly. Inderfurth (1997) discusses the effect of non-zero lead times for orders and recovery in a different model. As for continuous review models, Muckstadt and Isaac (1981) deal with a model for a remanufacturing system with non-zero lead times and a control policy with the traditional (Q, r) rule.
Van der Laan and Salomon (1997) suggest push and pull strategies for the remanufacturing system. Guide and Gupta (1999) present a queueing model to study a remanufacturing system. Kiesmuller (2003) discusses the optimal policy for a one-product recovery system with lead times. The policy is composed of the optimal manufacturing rate, remanufacturing rate and disposal rate over a finite planning horizon. All the above studies, however, treat demand and procurement as independent of each other within the system. Nakashima et al. (2002) deal with a product recovery system with a single class of product life cycle. They propose a new analytical approach to evaluate the system using a Markov chain and give numerical examples for various conditions.

3. Optimization of the system

We formulate a remanufacturing system with stochastic variability such as demand using a discrete-time Markov decision model. Consider a single process that produces a single-item product. The finished products are stocked in the factory and used to satisfy customer demand. Traditional inventory management focuses only on the inventory in the factory. The remanufacturing system, however, should also focus on the outdated products collected from customers. That is, remanufacturing producers have to consider the products in use as part of the future inventory. The products used by consumers are taken here to be the virtual inventory. It is important to manage the virtual inventory until products are collected and used in remanufacturing, as well as to control the inventory on hand.

3.1. The model

Figure 1 shows a remanufacturing production system. Remanufacturing preserves the product’s or the part’s identity, and performs the required disassembly and refurbishing operations to bring the product to a desired level of quality at a remanufacturing cost. On the other hand, we define normal manufacturing as producing the products using new resources.
The number of products made by normal manufacturing in period $t$, $P(t)$, is chosen as the action, $k$, i.e. $k = P(t)$. Products are produced by normal manufacturing and/or remanufacturing with the parts taken back from customers. All production begins at the start of the period and all products are completed by the end of the period. All the products bought by consumers are new. It is assumed that the number of finished products and that of the products bought by consumers are $I(t)$ and $J(t)$, respectively. If a backlog occurs, $I(t)$ is negative. Demand in successive periods, $D(t)$, is composed of independent random variables with a known identical distribution. When sold, products are remanufactured at the remanufacturing rate, $\lambda$, with the remanufacturing cost including the collecting cost, and products in use are discarded at the discarded rate, $\mu$, with an out-of-date cost. It is supposed that $\lambda + \mu \le 1$.

Figure 1. Remanufacturing system (diagram labels: normal production $P(t)$; inventory on hand $I(t)$ at the factory; demand $D(t)$; customer virtual inventory $J(t)$; rates $\lambda$ and $\mu$).

3.2. Formulation

The state of the system is denoted by:

$$ s(t) = (I(t), J(t)). \tag{1} $$

The transition of each inventory is given by:

$$ I(t+1) = I(t) + k + \lambda J(t) - D(t), \tag{2} $$

$$ J(t+1) = J(t) - \lambda J(t) - \mu J(t) + D(t). \tag{3} $$

The action space in $s(t)$, $K(s(t))$, is defined by:

$$ K(s(t)) = \{0, \ldots, \max\{0,\; I_{\max} - I(t) - \lambda J(t)\}\}. \tag{4} $$

The transition probability is defined as:

$$ P_{s(t)s(t+1)}(k) = \begin{cases} \Pr\{D(t) = d\} & \text{if } s(t+1) = (I(t) + k + \lambda J(t) - d,\; J(t) - \lambda J(t) - \mu J(t) + d), \\ 0 & \text{otherwise.} \end{cases} \tag{5} $$

The expected cost per period in state $s(t)$ under $k$, $r(s(t), k)$, is given by:

$$ r(s(t), k) = C_N k + C_R \lambda J(t) + C_H [I(t)]^{+} + C_B [-I(t)]^{+} + C_O \mu J(t), \tag{6} $$

where $[x]^{+} = \max\{x, 0\}$ and

$C_H$: holding cost per unit,
$C_N$: normal manufacturing cost of a new product,
$C_R$: remanufacturing cost of a product,
$C_B$: backlog cost per unit,
$C_O$: out-of-date cost per unit.

$S$ and $|S|$ denote the set of all possible states and the total number of states, respectively. Let us number the states as $s$ ($= 1, \ldots, |S|$).
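The state transition in equations (2) and (3), the action bound in equation (4) and the one-period cost in equation (6) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function and field names are hypothetical, the backlog term is read as C_B·max(-I(t), 0) and the out-of-date term as C_O·μ·J(t), and the fractional quantities λJ(t) and μJ(t) are kept as real numbers because the text does not say how they are rounded. Rounding the action bound down to an integer is likewise an assumption.

```python
from dataclasses import dataclass
from math import floor


@dataclass(frozen=True)
class Costs:
    """Unit costs of equation (6); field names are illustrative."""
    holding: float        # C_H
    normal: float         # C_N
    remanufacture: float  # C_R
    backlog: float        # C_B
    out_of_date: float    # C_O


def next_state(i, j, k, d, lam, mu):
    """Apply equations (2) and (3) for production k and realised demand d."""
    i_next = i + k + lam * j - d
    j_next = j - lam * j - mu * j + d
    return i_next, j_next


def max_action(i, j, lam, i_max):
    """Largest admissible k from equation (4), rounded down (an assumption)."""
    return max(0, floor(i_max - i - lam * j))


def period_cost(i, j, k, lam, mu, costs):
    """One-period cost r(s, k) of equation (6) under the reading stated above."""
    return (costs.normal * k
            + costs.remanufacture * lam * j
            + costs.holding * max(i, 0)
            + costs.backlog * max(-i, 0)
            + costs.out_of_date * mu * j)


if __name__ == "__main__":
    # Cost parameters and rates as given in the numerical example of the paper.
    costs = Costs(holding=1, normal=2, remanufacture=3, backlog=10, out_of_date=10)
    print(next_state(i=2, j=5, k=1, d=2, lam=0.2, mu=0.5))
    print(max_action(i=2, j=4, lam=0.2, i_max=5))
    print(period_cost(i=2, j=5, k=1, lam=0.2, mu=0.5, costs=costs))
```

Keeping the three pieces as separate pure functions makes it straightforward to enumerate states and actions when building the transition probabilities of equation (5), whatever rounding convention is eventually chosen.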
An undiscounted MDP that minimizes the expected average cost per period, $g$, is formulated as the following optimality equation:

$$ g + v_s = \min_{k \in K(s)} \Big\{ r(s, k) + \sum_{s' \in S} P_{ss'}(k)\, v_{s'} \Big\} \quad (s \in S), \tag{7} $$

where $v_s$ is the relative value when the production system starts from state $s$ (Howard 1960). An optimal production policy is determined as the set of actions $k$ that minimize the right-hand side of equation (7) for each state $s$, using a policy iteration method (Howard 1960, Puterman 1994).

4. Numerical results

This section shows a numerical example of controlling the remanufacturing system described in section 3. It is assumed that the maximum product inventory, $I_{\max}$, is 5 and that the maximum backlogged demand is 5, i.e. $I_{\min} = -5$. The maximum virtual inventory is 10. The cost parameters are as follows: $C_H = 1$, $C_N = 2$, $C_R = 3$, $C_B = 10$ and $C_O = 10$. The distribution of the demand is given by:

$$ \Pr\Big\{ D_n = \bar{D} - \tfrac{1}{2} Q + j \Big\} = \binom{Q}{j} \Big( \tfrac{1}{2} \Big)^{Q} \quad (0 \le j \le Q), \tag{8} $$

where $\bar{D} = 2$, $Q$ is an even number and the variance ($\sigma^2$) is $Q/4$.

4.1. Optimal control policy

One can obtain an optimal control policy that minimizes the expected average cost per period using a policy iteration method. It is assumed that the remanufacturing rate, $\lambda$, is 0.2 and the discarded rate, $\mu$, is 0.5. Table 1 shows the optimal control policy when the variance of demand is 0.5. Note that the optimal normal production quantity is limited by the number of remanufactured products. The minimum expected cost per period, $g$, is 11.5 under this optimal control policy.

4.2. Effect of the remanufacturing rate and of the variance of the demand

Figure 2 shows the behaviour of the minimum costs under two types of variance, i.e. $\sigma^2 = 0.5$ and 1.0.
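The two variance cases correspond to Q = 2 and Q = 4 in equation (8), because the variance is Q/4. A minimal Python sketch, with hypothetical function names and reading equation (8) as a binomial distribution with parameters Q and 1/2 shifted so that its mean equals the mean demand, tabulates the distribution and checks its mean and variance:

```python
from math import comb


def demand_pmf(mean_demand, q):
    """Return {demand: probability} for equation (8), read as a shifted binomial."""
    if q < 0 or q % 2 != 0:
        raise ValueError("Q must be a non-negative even number")
    shift = mean_demand - q // 2
    return {shift + j: comb(q, j) * 0.5 ** q for j in range(q + 1)}


def mean_and_variance(pmf):
    """Mean and variance of a discrete distribution given as a dict."""
    mean = sum(d * p for d, p in pmf.items())
    variance = sum((d - mean) ** 2 * p for d, p in pmf.items())
    return mean, variance


if __name__ == "__main__":
    for q in (2, 4):  # variance Q/4 = 0.5 and 1.0
        pmf = demand_pmf(mean_demand=2, q=q)
        print(q, pmf, mean_and_variance(pmf))
```

With a mean demand of 2, the case Q = 2 puts demand on {1, 2, 3} and the case Q = 4 on {0, ..., 4}, so the larger variance case also admits periods with no demand at all.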
As the remanufacturing rate increases, the minimum cost decreases in each variance case. All the minimum costs in the case $\sigma^2 = 0.5$ are less than those in the case $\sigma^2 = 1.0$. This also illustrates the importance of smoothing production.

Table 1. Optimal control policy: action, k, for each state (I, J). Negative I denotes a backlog.

  I \ J     0    1    2    3    4    5    6    7    8    9   10
   -5       7    6    6    5    5    4    4    3    3    2    2
   -4       6    5    5    4    4    3    3    2    2    1    1
   -3       6    5    4    4    3    2    2    1    1    0    0
   -2       5    4    4    3    3    2    1    0    0    0    0
   -1       4    3    3    2    2    1    1    0    0    0    0
    0       3    2    2    1    1    0    0    0    0    0    0
    1       2    1    1    0    0    0    0    0    0    0    0
    2       1    0    0    0    0    0    0    0    0    0    0
    3       0    0    0    0    0    0    0    0    0    0    0
    4       0    0    0    0    0    0    0    0    0    0    0
    5       0    0    0    0    0    0    0    0    0    0    0

Figure 2. Behaviour of minimum cost (minimum cost plotted against the remanufacturing rate, for variance = 0.5 and variance = 1.0).

5. Conclusions

This paper has dealt with the optimal control problem of the remanufacturing system under stochastic variability. It considered two types of inventories: the actual product inventory in the factory and the virtual inventory used by consumers. The system was formulated as an undiscounted MDP.
It obtained the optimal production policy that minimized the expected average cost per period using the policy iteration method. Finally, it showed examples of controlling the remanufacturing system optimally under various conditions. Numerical results illustrated the properties of the optimal control of the remanufacturing system. This means that the proposed approach is applicable to various systems by choosing the parameters according to their conditions. For future research, we suggest that the product lifetime should be considered to model remanufacturing systems more precisely.

Acknowledgement

The authors thank anonymous referees for valuable comments and helpful suggestions.

References

BRENNAN, L., GUPTA, S. M. and TALEB, K. N., 1994, Operations planning issues in an assembly/disassembly environment. International Journal of Operations and Production Management, 14, 57–67.
COHEN, M. A., NAHMIAS, S. and PIERSKALLA, W. P., 1980, A dynamic inventory system with recycling. Naval Research Logistics Quarterly, 27, 289–296.
GUIDE, JR, V. D. R. and GUPTA, S. M., 1999, A queueing network model for remanufacturing production systems. In Proceedings of the Second International Seminar on Reuse, Eindhoven, the Netherlands, 1–3 March, pp. 115–128.
GUNGOR, A. and GUPTA, S. M., 1999, Issues in environmentally conscious manufacturing and product recovery: a survey. Computers and Industrial Engineering, 36, 811–853.
GUPTA, S. M. and TALEB, K. N., 1994, Scheduling disassembly. International Journal of Production Research, 32, 1857–1866.
HOWARD, R. A., 1960, Dynamic Programming and Markov Processes (Cambridge, MA: MIT Press).
INDERFURTH, K., 1997, Simple optimal replenishment and disposal policies for a product recovery system with lead-times. OR Spektrum, 19, 111–122.
KIESMULLER, G. P., 2003, Optimal control of a product recovery system with lead time. International Journal of Production Economics, 81–82, 333–340.
KOPICKY, R. J., BERG, M. J., LEGG, L., DASAPPA, V. and MAGGIONI, C., 1993, Reuse and Recycling: Reverse Logistics Opportunities (Oak Brook: Council of Logistics Management).
MINNER, S., 2001, Strategic safety stocks in reverse logistics supply chains. International Journal of Production Economics, 71, 417–428.
MOYER, L. and GUPTA, S. M., 1997, Environmental concerns and recycling/disassembly efforts in the electronics industry. Journal of Electronics Manufacturing, 7, 1–22.
MUCKSTADT, J. A. and ISAAC, M. H., 1981, An analysis of single item inventory systems with returns. Naval Research Logistics Quarterly, 28, 237–254.
NAKASHIMA, K., ARIMITSU, H., NOSE, T. and KURIYAMA, S., 2002, Analysis of a product recovery system. International Journal of Production Research, 40, 3849–3856.
OHNO, K. and ICHIKI, K., 1987, Computing optimal policies for controlled tandem queueing systems. Operations Research, 35, 121–126.
OHNO, K. and NAKASHIMA, K., 1995, Optimality of a Just-in-Time production system. In Proceedings of APORS’94 (Singapore: World Scientific), pp. 390–398.
PUTERMAN, M. L., 1994, Markov Decision Processes (New York: Wiley).
STOCK, J. R., 1992, Reverse Logistics (Oak Brook: Council of Logistics Management).
VAN DER LAAN, E. A. and SALOMON, M., 1997, Production planning and inventory control with remanufacturing and disposal. European Journal of Operational Research, 102, 264–278.