Beer Game Computers Play: Can Artificial Agents Manage Supply Chain?¹

Steven O. Kimbrough, D.-J. Wu, Fang Zhong

Department of Operations and Information Management, The Wharton School, University of Pennsylvania
and
Department of Management, Bennett S. LeBow College of Business, Drexel University, Philadelphia, PA 19104, U.S.A.

May, 2000

¹ Corresponding author is D.J. Wu; his current address is 101 North 33rd Street, Academic Building, Philadelphia, PA 19104. Email: [email protected]

Abstract

In this study, we build an electronic supply chain that is managed by artificial agents. We investigate whether artificial agents do better than human beings when playing the MIT “Beer Game”, either by mitigating the bullwhip effect or by discovering good and effective business strategies. In particular, we study the following questions: Can agents learn reasonably good policies in the face of deterministic demand with fixed lead-time? Can agents track demand reasonably well in the face of stochastic demand with stochastic lead-time? Can agents learn and adapt in various contexts to play the game? Can agents cooperate across the supply chain? What kinds of mechanisms induce coordination, cooperation and information sharing among agents in the electronic supply chain when facing automated electronic markets?

KEY WORDS AND PHRASES: Artificial Agents, Beer Game, Supply Chain Management, Business Strategies.

1. Introduction

We propose a framework that treats an automated enterprise as an electronic supply chain that consists of various artificial agents. In particular, we view such a virtual enterprise as a multi-agent system (MAS). A classical example of supply chain management is the MIT Beer Game, which has attracted great attention from both supply chain management practitioners and academic researchers. There are several types of agents in this chain: for example, the retailer, wholesaler, distributor, and manufacturer.
Each self-interested agent tries to achieve its own goal, such as minimizing its inventory cost, by placing orders with its supplier. Each agent makes its own prediction of its customer's future demand based on its own observations. Owing to the lack of incentives for information sharing and the bounded rationality of each agent, the observed or experimental performance of real agents (human beings) managing such a supply chain is usually far from optimal from a system-wide point of view (Sterman, 1989). It would be interesting to see whether we can gain insights for such a community by using artificial agents to play the role of human beings. This paper intends to provide a first step in this direction.

We differ from current research on supply chains in the OM/OR area in that the approach taken here is an agent-based, information-processing model in an automated marketplace. In the OM/OR area, the focus is usually on optimal solutions that are based on some known demand assumptions, and it is generally very difficult to derive and to generalize such optimal policies when the environment changes.

2. Literature review

In the management science literature, Lee, Padmanabhan, and Whang (1997) identified four sources of the bullwhip effect and offered counteractions for firms. Chen (1999) shows that the bullwhip effect can be eliminated under an installation base-stock policy. Both build on the seminal work of Clark and Scarf (1960). A well-known phenomenon when human agents play this game is demand information distortion, or the “bullwhip effect”, across the supply chain from the retailer to the factory agent (Lee, Padmanabhan, and Whang, 1997; Lee and Whang, 1999). Under the assumption that the division managers share a common goal of optimizing the overall performance of the supply chain, acting as a team, Chen (1999) proves that the “Pass Order,” or “One for one” (“1-1”), strategy (order whatever is ordered by your customer) is optimal.
In the information technology literature, the idea of marrying multi-agent systems and supply chain management has been proposed by several researchers (Nissen, 2000; Yung and Yang, 1999). Our approach (Kimbrough, Wu and Zhong, 2000) differs from others in that we focus on agents learning in an electronic supply chain. We developed DragonChain, a general platform for multi-agent supply chain management. We implement the “MIT Beer Game” (Sterman, 1989) to test the performance of DragonChain.

3. Methodology and implementation discussion

4. Results

Our main idea is to replace human players with artificial agents by letting DragonChain play the Beer Game, and to see whether artificial agents can learn to mitigate the bullwhip effect by discovering good and efficient order policies.

Experiment 1. In the first round of our experiment using DragonChain, demand is deterministic, as set up in the classroom Beer Game: the customer demands 4 cases of beer per week in the first 4 weeks and 8 cases per week from week 5 onward. We fix the policy of the three other players (the Wholesaler, Distributor and Factory Manager) to the “1-1” rule, i.e., if current demand is x then order x from my supplier, while letting the Retailer agent adapt and learn. The goal of this experiment is to see whether the Retailer agent can match the other players’ strategy and learn the “1-1” policy.

The Retailer places orders according to different rules, such as x-1, x-2, or x+1, where x is the customer demand for each week. The rules are expressed as binary strings of 6 digits. The leftmost digit represents the sign of the rule function: 0 represents “-” and 1 represents “+”. The remaining five digits determine how much to add to or subtract from the demand. For example, the rule 101001 translates to “x+9”:

    1      01001
    “+”    9 in decimal

The search range therefore runs from x-31, x-30, …, x-1, through x, to x+1, …, x+31, for 63 distinct rules in total (the strings 000000 and 100000 both encode x).
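The rule encoding just described can be illustrated with a minimal Python sketch. The function names `decode_rule` and `order_quantity` are hypothetical and are not taken from DragonChain; the floor at zero follows the paper's statement that a negative order is replaced by 0.

```python
def decode_rule(bits: str) -> int:
    """Return the signed offset j of a 6-bit rule string, read as 'order x + j'.

    The leftmost bit is the sign (0 means minus, 1 means plus); the
    remaining five bits are the magnitude in binary (0 to 31).
    """
    if len(bits) != 6 or set(bits) - {"0", "1"}:
        raise ValueError("a rule must be a 6-character binary string")
    sign = 1 if bits[0] == "1" else -1
    return sign * int(bits[1:], 2)


def order_quantity(bits: str, demand: int) -> int:
    """Apply a rule to the observed demand x, never ordering below zero."""
    return max(0, demand + decode_rule(bits))


# The paper's worked example: 101001 is read as "x + 9".
print(decode_rule("101001"))         # 9
print(order_quantity("101001", 4))   # 13
print(order_quantity("000101", 4))   # x - 5 with x = 4 is negative, so 0
```

Enumerating all 64 strings with this decoder yields 63 distinct offsets, in line with the rule count stated in the text.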
Whenever x - j < 0 (j = 1, …, 31), we set the order for the current week to 0.

A genetic algorithm (GA) is used to help the Retailer agent find better rules that minimize its accumulated cost over 35 weeks. In any generation, the agent places orders according to every rule in the generation for all 35 weeks and obtains the corresponding fitness for each rule. We follow the Fitness Proportion Principle, which means that the best rule in the generation has the largest chance of being chosen as a parent of the next generation. Under this mechanism, the agent learns which rules are better generation by generation. The execution cycle of the GA used here is as follows:

Step 1. The GA randomly generates 20 rules, from 000000 to 111111.
Step 2. Each rule is assigned an absolute fitness according to the evaluation function.
Step 3. The rules in the current generation are sorted according to their fitness.
Step 4. Relative fitness is calculated for each rule by the following equation:

    Relative Fitness = (Absolute Fitness - Lowest absolute fitness in the generation) / (Highest absolute fitness - Lowest absolute fitness)

    Parent rules are then picked based on the Fitness Proportion Principle and offspring are created. The current generation is replaced with the newly generated offspring.
Step 5. “Mutation” is conducted on the new generation according to the mutation rate, and the cycle returns to Step 2.

Two types of GA operators are used here, Crossover and Mutation:

    Crossover
      Rules:      110010 001001 101011 000000 111110 011100 110000 000101
      Offspring:  110011 101010 101010 111111 001100 010000 011101 000100

    Mutation (rule -> result after one bit is flipped at the mutation point)
      110100 -> 100100
      000101 -> 000111

In the evaluation function, the Retailer Agent places its orders according to the selected rule for 35 weeks.
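The selection, crossover and mutation steps described above can be sketched in Python as follows. This is an illustration under stated assumptions, not DragonChain's code: the function names are hypothetical, absolute fitness is assumed to be "higher is better" (for example, the negative of accumulated cost), the default mutation rate is an arbitrary placeholder, and crossover and mutation each act at a single randomly chosen point.

```python
import random


def relative_fitness(absolute):
    """Rescale absolute fitness values to [0, 1] within one generation."""
    lo, hi = min(absolute), max(absolute)
    if hi == lo:
        # Assumption: if every rule scores the same, weight them equally.
        return [1.0] * len(absolute)
    return [(f - lo) / (hi - lo) for f in absolute]


def one_point_crossover(a, b, point):
    """Swap the tails of two equal-length rule strings after `point` bits."""
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(rule, rate, rng):
    """With probability `rate`, flip one randomly chosen bit of the rule."""
    if rng.random() >= rate:
        return rule
    i = rng.randrange(len(rule))
    flipped = "1" if rule[i] == "0" else "0"
    return rule[:i] + flipped + rule[i + 1:]


def next_generation(rules, absolute, rng, mutation_rate=0.05):
    """Build a new generation by fitness-proportional selection."""
    weights = relative_fitness(absolute)
    offspring = []
    while len(offspring) < len(rules):
        # Parents are drawn with probability proportional to relative fitness.
        p1, p2 = rng.choices(rules, weights=weights, k=2)
        point = rng.randint(1, len(p1) - 1)
        offspring.extend(one_point_crossover(p1, p2, point))
    return [mutate(r, mutation_rate, rng) for r in offspring[:len(rules)]]


rng = random.Random(0)
population = [format(rng.randrange(64), "06b") for _ in range(20)]
fitness = [-abs(int(r[1:], 2)) for r in population]  # toy stand-in for -cost
population = next_generation(population, fitness, rng)
```

As a spot check, `one_point_crossover("101011", "111110", 5)` returns `("101010", "111111")`. Both strings appear in the offspring row of the operator illustration, although whether those two rules were actually paired there is an inference.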
At the end of each week, its cost for that week is calculated as follows:

    InventoryPosition = CurrentInventory + NewShipment - CurrentDemand
    If InventoryPosition >= 0
        CurrentCost = InventoryPosition * 1
    Else
        CurrentCost = -InventoryPosition * 2

where CurrentInventory is the inventory at the beginning of each week, NewShipment is the amount of beer just shipped in, and CurrentDemand is the demand from customers for the current week. A negative inventory position (a backlog) is thus charged on its magnitude, at twice the holding rate.

Our initial results show that DragonChain can learn the “1-1” policy consistently. With a population size of 20, the Retailer agent finds the “1-1” strategy very quickly and converges at that point.

We then allow all four agents to make their own decisions simultaneously. In this case, the binary strings are longer than those of the previous experiment, because they are composed of the rules for all the agents. The length of the binary strings is now 6 * 4 = 24. For example:

    110001 001001 101011 000000

where 110001 is the rule for the Retailer, 001001 is the rule for the Wholesaler, 101011 is the rule for the Distributor, and 000000 is the rule for the Factory.

The same crossover and mutation operators as in the previous experiment are used, and each time only one point of the string is selected for crossover or mutation. In the evaluation function, each agent places orders according to its own rule for the 35 weeks, and the current cost is calculated for each agent separately. For the Wholesaler, Distributor and Factory, however, the current demand is the order from the agent immediately downstream: the Retailer, Wholesaler and Distributor, respectively. The four agents act as a team, and their goal is to minimize their total cost over 35 weeks. The genetic algorithm is again the key method by which the agents learn and find their optimal strategies. The result is very exciting: again, each agent in DragonChain finds and converges to the “1-1” rule separately, as shown in Figure 1.
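The weekly cost rule used by the evaluation function can be sketched in Python as below. The function and parameter names are hypothetical. The sketch assumes that a backlog (negative inventory position) is charged on its magnitude at rate 2, and that the end-of-week position, including any backlog, carries over as the next week's starting inventory; the paper does not spell out the carry-over.

```python
def weekly_cost(current_inventory, new_shipment, current_demand,
                holding_rate=1, backlog_rate=2):
    """Cost for one week: holding cost if stock remains, penalty if short."""
    position = current_inventory + new_shipment - current_demand
    if position >= 0:
        return position * holding_rate
    # Assumption: the shortfall is charged on its absolute size.
    return -position * backlog_rate


def accumulated_cost(initial_inventory, shipments, demands):
    """Sum weekly costs over a horizon, carrying the position forward."""
    inventory, total = initial_inventory, 0
    for shipment, demand in zip(shipments, demands):
        total += weekly_cost(inventory, shipment, demand)
        # Assumption: a backlog carries over as negative inventory.
        inventory = inventory + shipment - demand
    return total


# Three weeks: demand steps from 4 to 8 while shipments stay at 4.
print(accumulated_cost(4, [4, 4, 4], [4, 8, 8]))  # 4 + 0 + 8 = 12
```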
The agents' performance is certainly much better than that of human agents (MBA students from a premier business school), as shown in Figure 2.

Figure 1: Artificial agents in DragonChain are able to find the optimal “pass order” policy when playing the MIT Beer Game. (Plot: Order vs. Week, weeks 1 to 35; series: Retailer, Wholesaler, Distributor, Factory.)

Figure 2: Artificial agents in DragonChain perform much better than MBA students in the MIT Beer Game. (Plot: Accumulated Cost Comparison of Human Being and Agent, accumulated cost vs. weeks 1 to 27; series: MBA Group1, MBA Group2, MBA Group3, Agent.)

Although it is well known that the “1-1” policy is optimal for deterministic demand with fixed lead-time, no previous studies have tested whether 1-1 is a Nash equilibrium. We conduct further experiments to test this. The result is that each agent finds 1-1 and the outcome is stable; it therefore constitutes a Nash equilibrium.

We also conduct experiments to test the robustness of the results by allowing agents to search a much larger space: we introduce more agents into the supply chain. In these experiments, all the agents discover the “1-1” policy, but it takes longer to find as the number of agents increases. The following table summarizes the findings.

Table 1: Convergence generation as the number of agents increases.

                              Four Agents    Five Agents    Six Agents
    Convergence generation    4              11             21

Experiment 2. In the second round of experiments, we test the case of stochastic demand, where demand is randomly generated from a known distribution, e.g., uniformly distributed on [0, 15]. We have 16 * 31 = 496 possible rules in total. The goal of this experiment is to find out whether artificial agents can track the trend of demand and whether they can discover good dynamic order policies. Again we use “1-1” as a benchmark heuristic.
Our initial experiments show that DragonChain can discover dynamic order strategies (so far, the winning rule seems to be “x + 1” for each agent) that outperform “1-1”, as shown in Figure 3. Furthermore, as shown in Figure 4, an interesting and surprising discovery of our experiments is that when artificial agents play the Beer Game, there is no bullwhip effect in any test case: the bullwhip effect disappears.

Figure 3: When facing stochastic demand and penalty costs for all players as in the MIT Beer Game, artificial agents seem to discover a dynamic order policy that outperforms “1-1”. (Plot: Accumulated Cost vs. Week, weeks 1 to 36; series: Agent Cost, 1-1 Cost.)

Figure 4: The bullwhip effect is eliminated when using artificial agents to replace human beings in playing the MIT Beer Game. (Plot: Order vs. Week, weeks 1 to 35; series: Retailer, Wholesaler, Distributor, Factory.)

By calculating the variances of orders across the supply chain, we find that the bullwhip effect has been eliminated in all test cases, even under stochastic customer demand.

Table 2: Variance comparison for experiment 2.

                Retailer    Wholesaler    Distributor    Manufacturer
    Variance    26.0790     26.1345       25.2571        26.8671

As in experiment 1, we conduct a few more experiments to test the results of experiment 2. First, the Nash property remains under stochastic demand. Second, we increase the game period from 35 weeks to 100 weeks to see whether agents can beat “1-1” consistently. Again, the customer demand is uniformly distributed on [0, 15]. Agents find new strategies (x+1, x, x+1, x) that outperform the “1-1” policy. The agents are smart enough to adopt this new strategy to take advantage of the end effect of the game. Third, we test the statistical validity by rerunning the experiment with different random seed numbers.
The results are fairly robust.

Experiment 3. In the third round of experiments, we change the lead-time from a fixed 2 weeks to a value uniformly distributed from 0 to 4. We are now facing both stochastic demand and stochastic lead-time. The length of the rule strings is again 6 * 4 = 24. The evaluation function is changed in this experiment, because the week in which ordered beer is shipped in now depends on the lead-time drawn for the week of the order. For example, suppose that in the 12th week the Wholesaler Agent orders 20 units of beer from the Distributor Agent and the lead-time for that week is 3 weeks. Then the 20 units of beer are shipped in week 16, according to the equation:

    ShippingInWeek = WeekOfOrder + 1 + Lead-time

We add “1” on the right-hand side of the equation because the order is passed to the upper-level agent in the next week.

Following the steps described in the first experiment, the agents find better strategies within 30 generations. The best rules they have found so far are:

    Retailer:     x+1
    Wholesaler:   x+1
    Distributor:  x+2
    Factory:      x+1

The total cost over 35 weeks is 2555, which is much less than the 7463 incurred by the “1-1” policy. In the appendix, we show the evolution of effective adaptive supply chain management policies when facing stochastic customer demand and stochastic lead-time. We further test the stability of the strategies discovered by the agents and find that they are stable, and thus constitute a Nash equilibrium.

Figure 6: Artificial Retailer agent is able to track customer demand reasonably well when facing stochastic customer demand. (Plot: Order vs. Week, weeks 1 to 35; series: Customer Demand, Retailer.)

Figure 7: Artificial agents mitigate the bullwhip effect when facing stochastic demand and stochastic lead time. (Plot: Orders vs. Weeks, weeks 1 to 35; series: Retailer Order, Wholesaler Order, Distributor Order, Factory Order.)

Figure 9: Artificial agents seem to be able to discover dynamic ordering policies that outperform the “1-1” policy when facing stochastic demand and stochastic lead time. (Plot: Accumulated Cost vs. Week, weeks 1 to 35; series: 1-1 cost, Agent cost.)

Table 3: Variance comparison for experiment 3: No Bullwhip effect.

                Retailer    Wholesaler    Distributor    Manufacturer
    Variance    26.0790     26.1345       25.2571        26.8671

Table 4: The evolution of effective adaptive supply chain management policies when facing stochastic customer demand and stochastic lead-time.

    Generation    Retailer    Wholesaler    Distributor    Manufacturer    Total Cost
    0             x-0         x-1           x+4            x+2             7380
    1             x+3         x-2           x+2            x+5             7856
    2             x-0         x+5           x+6            x+3             6987
    3             x-1         x+5           x+2            x+3             6137
    4             x+0         x+5           x-0            x-2             6129
    5             x+3         x+1           x+2            x+3             3886
    6             x-0         x+1           x+2            x+0             3071
    7             x+2         x+1           x+2            x+1             2694
    8             x+1         x+1           x+2            x+1             2555
    9             x+1         x+1           x+2            x+1             2555
    10            x+1         x+1           x+2            x+1             2555
    11            x+1         x+1           x+2            x+1             2555
    12            x+1         x+1           x+2            x+1             2555
    13            x+1         x+1           x+2            x+1             2555
    14            x+1         x+1           x+2            x+1             2555
    15            x+1         x+1           x+2            x+1             2555
    16            x+1         x+1           x+2            x+1             2555
    17            x+1         x+1           x+2            x+1             2555
    18            x+1         x+1           x+2            x+1             2555
    19            x+1         x+1           x+2            x+1             2555
    20            x+1         x+1           x+2            x+1             2555

3. The Columbia Beer Game

The Columbia Beer Game is a modified version of the MIT Beer Game (Chen, 1999). The following table compares the major differences between the two.

                                   MIT Beer Game                               Columbia Beer Game
    Customer Demand                Deterministic                               Stochastic
    Information Delay              0                                           Fixed
    Physical Delay                 Fixed                                       Fixed
    Penalty Cost                   All agents have penalty costs               Only retailer incurs penalty cost
    Holding Cost                   Players all have the same holding costs     Players have decreasing costs upstream the supply chain
    Information Acknowledgement    All players have no knowledge about the     All players know the distribution of
                                   distribution of customer demand             customer demand
Our experiment now follows the settings of the Columbia Beer Game, except that our agents have no knowledge of the distribution of customer demand. Specifically, we set the information lead-time to (2, 2, 2, 0) and the physical lead-time to (2, 2, 2, 3), and customer demand is normally distributed with a mean of 50 and a standard deviation of 10. Other initial conditions are set as in Chen (1999). Since there is an additional information delay in the Columbia Beer Game, we incorporate it into our rules as well. Two more bits are appended to the rule bit-string to represent the information delay. For example, if the Retailer has the rule 10001101, it means “if demand is x in week t-1, then order x+3”, where t is the current week:

    1      00011    01
    “+”    3        “t - 1”

The master rule for the whole team now has a length of 8 * 4 = 32. Under the same learning mechanism as in the previous experiments, agents find the optimal policy in the literature (Chen, 1999): order whatever amount is ordered, with a time shift, i.e., Q_1 = D(t-1), Q_i = Q_{i-1}(t - l).

4. Summary and further research

This study makes the following contributions: (1) First and foremost, we test the concept of an intelligent supply chain that is managed by artificial agents. Agents are capable of playing the Beer Game by tracking demand, eliminating the bullwhip effect, discovering the optimal policies where they exist, and finding good policies under complex scenarios where analytical solutions are not available. (2) Such an electronic supply chain is adaptable to an ever-changing business environment. The findings of this research could have a high impact on the development of next-generation ERP or multi-agent soft enterprise computing, as depicted in Figure 7.
Ongoing research includes investigation of the value of information sharing in the supply chain; the emergence of trust in the supply chain; coordination, cooperation, bargaining, and negotiation in multi-agent systems; and the study of alternative agent learning mechanisms such as classifier systems.

Figure 7: A framework for multi-agent intelligent enterprise modeling. (Diagram labels: Pricing Agent, Investment Agent, Executive Community (StrategyFinder), Production Community (LivingFactory), Supply Chain Community (DragonChain), Factory Agent, Distributor Agent, E-Marketplace Community (eBAC), Retailer Agent, Wholesaler Agent, Bidding Agent, Contracting Agent, Auction Agent.)

5. References

1. Chen, F. “Decentralized supply chains subject to information delays”, Management Science, 45, 8 (August 1999), 1076-1090.
2. Kleindorfer, P., Wu, D.J., and Fernando, C. “Strategic gaming and the evolving electric power market”, forthcoming in European Journal of Operational Research, 2000.
3. Lee, H., Padmanabhan, P., and Whang, S. “Information distortion in a supply chain: The bullwhip effect”, Management Science, 43, 546-558, 1997.
4. Lee, H., and Whang, S. “Decentralized multi-echelon supply chains: Incentives and information”, Management Science, 45, 633-640, 1999.
5. Strader, T., Lin, F., and Shaw, M. “Simulation of order fulfillment in divergent assembly supply chains”, Journal of Artificial Societies and Social Simulations, 1, 2, 1998.
6. Kimbrough, S. “Introduction to a Selectionist Theory of Metaplanning”, Working Paper, The Wharton School, University of Pennsylvania, http://grace.wharton.upenn.edu/~sok.
7. Lin, F., Tan, G., and Shaw, M. “Multi-agent enterprise modeling”, forthcoming in Journal of Organizational Computing and Electronic Commerce.
8. Maes, P., Guttman, R., and Moukas, A. “Agents that buy and sell: Transforming commerce as we know it”, Communications of the ACM, March, 81-87, 1999.
9. Marschak, J., and Radner, R. Economic Theory of Teams, New Haven, CT: Yale University Press, 1972.
10. Nissen, M. “Supply chain process and agent design for e-commerce”, in: R.H. Sprague, Jr. (Ed.), Proceedings of The 33rd Annual Hawaii International Conference on System Sciences, IEEE Computer Society Press, Los Alamitos, California, 2000.
11. Robinson, W., and Elofson, G. “Electronic broker impacts on the value of postponement”, in: R.H. Sprague, Jr. (Ed.), Proceedings of The 33rd Annual Hawaii International Conference on System Sciences, IEEE Computer Society Press, Los Alamitos, California, 2000.
12. Sikora, R., and Shaw, M. “A multi-agent framework for the coordination and integration of information systems”, Management Science, 40, 11 (November 1998), S65-S78.
13. Sterman, J. “Modeling managerial behavior: Misperceptions of feedback in a dynamic decision making experiment”, Management Science, 35, 321-339, 1989.
14. Weiss, G. (ed.), Multi-agent Systems: A Modern Approach to Distributed Artificial Intelligence, Cambridge, MA: MIT Press, 1999.
15. Weinhardt, C., and Gomber, P. “Agent-mediated off-exchange trading”, in: R.H. Sprague, Jr. (Ed.), Proceedings of The 33rd Annual Hawaii International Conference on System Sciences, IEEE Computer Society Press, Los Alamitos, California, 1999.
16. Wooldridge, M. “Intelligent Agents”, in G. Weiss (ed.), Multi-agent Systems: A Modern Approach to Distributed Artificial Intelligence, Cambridge, MA: MIT Press, 27-77, 1999.
17. Wu, D.-J. “Discovering near-optimal pricing strategies for the deregulated electronic power marketplace using genetic algorithms”, Decision Support Systems, 27, 1/2, 25-45, 1999b.
18. Wu, D.-J. “Artificial agents for discovering business strategies for network industries”, forthcoming in International Journal of Electronic Commerce, 5, 1 (Fall 2000b).
19. Wu, D.J., and Sun, Y. “Multi-agent bidding, auction and contracting support systems”, Working Paper, LeBow College of Business, Drexel University, April, 2000.
20. Yan, Y., Yen, J., and Bui, T. “A multi-agent based negotiation support system for distributed transmission cost allocation”, in: R.H. Sprague, Jr. (Ed.), Proceedings of The 33rd Annual Hawaii International Conference on System Sciences, IEEE Computer Society Press, Los Alamitos, California, 2000.
21. Yung, S., and Yang, C. “A new approach to solve supply chain management problem by integrating multi-agent technology and constraint network”, in: R.H. Sprague, Jr. (Ed.), Proceedings of The 32nd Annual Hawaii International Conference on System Sciences, IEEE Computer Society Press, Los Alamitos, California, 1999.