Beer Game Computers Play:
Can Artificial Agents Manage Supply Chain? [1]
Steven O. Kimbrough, D.-J. Wu, Fang Zhong
Department of Operations and Information Management
The Wharton School
University of Pennsylvania
and
Department of Management
Bennett S. LeBow College of Business
Drexel University
Philadelphia, PA 19104
U.S.A.
May, 2000
[1] Corresponding author: D.-J. Wu. Current address: 101 North 33rd Street, Academic Building, Philadelphia, PA 19104. Email: [email protected]
Abstract
In this study, we build an electronic supply chain managed entirely by artificial agents. We investigate whether artificial agents can outperform human players in the MIT "Beer Game" by mitigating the bullwhip effect and by discovering good, effective business strategies. In particular, we study the following questions: Can agents learn reasonably good policies in the face of deterministic demand with fixed lead time? Can agents track demand reasonably well in the face of stochastic demand with stochastic lead time? Can agents learn and adapt to play the game in various contexts? Can agents cooperate across the supply chain? What kinds of mechanisms induce coordination, cooperation, and information sharing among agents in an electronic supply chain facing automated electronic markets?
KEY WORDS AND PHRASES: Artificial Agents, Beer Game, Supply Chain
Management, Business Strategies.
1. Introduction
We propose a framework that treats an automated enterprise as an electronic supply chain consisting of various artificial agents. In particular, we view such a virtual enterprise as a multi-agent system (MAS). A classical example of supply chain management is the MIT Beer Game, which has attracted great attention from supply chain management practitioners and academic researchers alike.
There are several types of agents in this chain: the retailer, wholesaler, distributor, and manufacturer. Each self-interested agent tries to achieve its own goal, such as minimizing its inventory cost, by placing orders with its supplier. Each agent makes its own prediction of its customer's future demand based on its observations. Because of the lack of incentives for information sharing and the bounded rationality of each agent, the observed experimental performance of human agents managing such a supply chain is usually far from optimal from a system-wide point of view (Sterman, 1989). It would be interesting to see whether we can gain insights for such a community by using artificial agents to play the role of human beings.
This paper provides a first step in this direction. We differ from current research on supply chains in the OM/OR area in that the approach taken here is an agent-based, information-processing model set in an automated marketplace. In the OM/OR literature, the focus is usually on optimal solutions derived under particular demand assumptions, and it is generally very difficult to derive and generalize such optimal policies when the environment changes.
2. Literature review
In the management science literature, Lee, Padmanabhan, and Whang (1997) identified four sources of the bullwhip effect and offered counter-actions for firms. Chen (1999) shows that the bullwhip effect can be eliminated under an installation base-stock policy. Both works build on the seminal work of Clark and Scarf (1960). A well-known phenomenon when human agents play this game is demand information distortion, the "bullwhip effect," across the supply chain from the retailer to the factory agent (Lee, Padmanabhan, and Whang, 1997; Lee and Whang, 1999). Under the assumption that the division managers share a common goal of optimizing the overall performance of the supply chain, acting as a team, Chen (1999) proves that the "Pass Order," or "one-for-one" ("1-1"), strategy (order whatever is ordered by your customer) is optimal.
In the information technology literature, the idea of marrying multi-agent systems and supply chain management has been proposed by several researchers (Nissen, 2000; Yung and Yang, 1999). Our approach (Kimbrough, Wu and Zhong, 2000) differs from others in that we focus on agents learning in an electronic supply chain. We developed DragonChain, a general platform for multi-agent supply chain management, and implemented the MIT Beer Game (Sterman, 1989) to test its performance.
3. Methodology, Implementation, and Results
Our main idea is to replace human players with artificial agents, letting DragonChain play the Beer Game to see whether artificial agents can learn to mitigate the bullwhip effect by discovering good and efficient ordering policies.
Experiment 1. In the first round of our experiments with DragonChain, under the deterministic demand of the classroom Beer Game (the customer demands 4 cases of beer in each of the first 4 weeks, then jumps to 8 cases from week 5 onward and holds that demand over time), we fix the policies of the three other players (the Wholesaler, Distributor, and Factory Manager) to the "1-1" rule, i.e., if current demand is x then order x from my supplier, while letting the Retailer agent adapt and learn. The goal of this experiment is to see whether the Retailer agent can match the other players' strategy and learn the "1-1" policy.
The Retailer places orders according to different rules, such as x-1, x-2, or x+1, where x is the customer demand for the week. The rules are encoded as binary strings of 6 bits. The leftmost bit represents the sign of the rule function (0 for "-", 1 for "+"); the remaining five bits give how much to add to or subtract from the demand. For example, the rule 101001 translates to "x+9": the leading 1 means "+", and the remaining bits 01001 equal 9 in decimal.
So the search space runs from x-31 through x+31, comprising 63 rules in total. Whenever x - j < 0 (j = 1, ..., 31), we set the order for the current week to 0. A genetic algorithm (GA) is used to help the Retailer agent find better rules that minimize its accumulated cost over 35 weeks. In each generation, the agent places orders according to every rule in the generation for all 35 weeks and obtains a corresponding fitness for each rule. We follow the fitness-proportion principle: the best rule in the generation has the largest chance of being chosen as a parent of the next generation. Under this mechanism, the agent learns, generation by generation, which rules are better.
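To make the encoding concrete, the following is a minimal Python sketch of this decoding; the function names are ours, and the clipping at zero follows the x - j < 0 convention just described.

    def decode_rule(bits: str) -> int:
        """Decode a 6-bit rule into a signed offset on demand.

        The leftmost bit is the sign (1 = '+', 0 = '-'); the remaining
        five bits give the magnitude in binary, e.g. '101001' -> +9.
        """
        sign = 1 if bits[0] == "1" else -1
        return sign * int(bits[1:], 2)

    def order_quantity(bits: str, demand: int) -> int:
        """Apply the rule 'x + offset' to this week's demand, floored at zero."""
        return max(0, demand + decode_rule(bits))

    assert decode_rule("101001") == 9          # the "x + 9" example above
    assert order_quantity("000100", 2) == 0    # "x - 4" with x = 2 clips to 0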
The execution cycle of the GA used here is as follows:
Step 1. The GA randomly generates 20 rules, from 000000 to 111111.
Step 2. Each rule is assigned an absolute fitness according to the evaluation function.
Step 3. The rules in the current generation are sorted by fitness.
Step 4. A relative fitness is calculated for each rule:
Relative Fitness = (Absolute Fitness - Lowest absolute fitness in the generation) / (Highest absolute fitness - Lowest absolute fitness)
Parent rules are then picked according to the fitness-proportion principle to create offspring, and the current generation is replaced with the newly generated offspring.
Step 5. "Mutation" is applied to the new generation according to the mutation rate; return to Step 2.
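Steps 2 through 4 can be sketched in Python as follows; treating the evaluation output so that higher absolute fitness corresponds to lower accumulated cost is our assumption, and the function names are illustrative.

    import random

    def relative_fitness(abs_fitness: list[float]) -> list[float]:
        """Step 4: scale absolute fitness into [0, 1] within the generation."""
        lo, hi = min(abs_fitness), max(abs_fitness)
        if hi == lo:                       # degenerate case: all rules tie
            return [1.0] * len(abs_fitness)
        return [(f - lo) / (hi - lo) for f in abs_fitness]

    def select_parents(rules: list[str], abs_fitness: list[float], k: int = 2) -> list[str]:
        """Fitness-proportionate selection: fitter rules are drawn more often."""
        return random.choices(rules, weights=relative_fitness(abs_fitness), k=k)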
Two types of GA operators are used here, crossover and mutation. Crossover exchanges the tails of two parent rules after a randomly chosen crossover point; for example, crossing 110010 and 001001 after position 5 yields the offspring 110011 and 001000. Mutation flips the bit at a randomly chosen position; for example, mutating 110100 at position 2 yields 100100, and mutating 000101 at position 5 yields 000111.
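In Python, the two operators can be sketched as below, reproducing the examples just given; the helper names are ours.

    def crossover(parent1: str, parent2: str, point: int) -> tuple[str, str]:
        """One-point crossover: swap the tails of two rules after `point` bits.

        crossover('110010', '001001', 5) -> ('110011', '001000')
        """
        return (parent1[:point] + parent2[point:],
                parent2[:point] + parent1[point:])

    def mutate(rule: str, position: int) -> str:
        """Flip the bit at `position` (1-indexed).

        mutate('110100', 2) -> '100100'; mutate('000101', 5) -> '000111'
        """
        i = position - 1
        flipped = "1" if rule[i] == "0" else "0"
        return rule[:i] + flipped + rule[i + 1:]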
In the evaluation function, the Retailer agent places its orders according to the selected rule for 35 weeks. At the end of each week, its cost for that week is calculated as follows:
InventoryPosition = CurrentInventory + NewShipment - CurrentDemand;
If InventoryPosition >= 0
    CurrentCost = InventoryPosition * 1
Else
    CurrentCost = -InventoryPosition * 2
where CurrentInventory is the inventory at the beginning of the week, NewShipment is the amount of beer just shipped in, and CurrentDemand is the customers' demand for the current week. (Note the sign: a negative inventory position is a backlog, penalized at twice the holding rate.)
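A runnable sketch of this evaluation for a single agent follows. The shipment stream is passed in as plain data here (in DragonChain it results from the upstream agents' responses to earlier orders), and the opening inventory of 12 cases is the usual classroom default rather than something stated above.

    def weekly_cost(inventory: int, shipment: int, demand: int) -> tuple[int, int]:
        """Return (cost, new inventory position) for one week: $1 per case
        held, $2 per case of backlog (negative position)."""
        position = inventory + shipment - demand
        cost = position if position >= 0 else -position * 2
        return cost, position

    def accumulated_cost(demands: list[int], shipments: list[int],
                         initial_inventory: int = 12) -> int:
        """Accumulate the 35-week cost that determines a rule's fitness."""
        inventory, total = initial_inventory, 0
        for demand, shipment in zip(demands, shipments):
            cost, inventory = weekly_cost(inventory, shipment, demand)
            total += cost
        return total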
Our initial results show that DragonChain learns the "1-1" policy consistently. With a population size of 20, the Retailer agent finds the "1-1" strategy very quickly and converges to it.
We then allow all four agents to make their own decisions simultaneously. In this case, the binary strings are longer than in the previous experiment because they concatenate the rules for all the agents, giving a length of 6 * 4 = 24 bits. For example, in the string
110001 001001 101011 000000
110001 is the rule for the Retailer, 001001 for the Wholesaler, 101011 for the Distributor, and 000000 for the Factory.
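Splitting the master rule back into per-role rules is straightforward; a minimal sketch, with the role order as listed above:

    ROLES = ("Retailer", "Wholesaler", "Distributor", "Factory")

    def split_team_rule(master: str, rule_len: int = 6) -> dict[str, str]:
        """Split a 24-bit team chromosome into one 6-bit rule per role."""
        assert len(master) == rule_len * len(ROLES)
        return {role: master[i * rule_len:(i + 1) * rule_len]
                for i, role in enumerate(ROLES)}

    # split_team_rule('110001001001101011000000')
    # -> {'Retailer': '110001', 'Wholesaler': '001001',
    #     'Distributor': '101011', 'Factory': '000000'}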
The same crossover and mutation operators as in the previous experiment are used, and each time only one point of the string is selected for crossover or mutation. In the evaluation function, each agent places orders according to its own rule for the 35 weeks, and the current cost is calculated for each of them separately. For the Wholesaler, Distributor, and Factory, the current demand is the order from the level below (the Retailer, Wholesaler, and Distributor, respectively). The four agents act as a team whose goal is to minimize their total cost over 35 weeks; the genetic algorithm is again the key method by which the agents learn and find their optimal strategies. The result is very encouraging: once again, each agent in DragonChain finds and converges to the "1-1" rule separately, as shown in Figure 1. This is certainly much better than human agents (MBA students at a premier business school), as shown in Figure 2.
[Plot omitted: order quantity vs. week (weeks 1-35) for the Retailer, Wholesaler, Distributor, and Factory.]
Figure 1: Artificial agents in DragonChain are able to find the optimal "pass order" policy when playing the MIT Beer Game.
[Plot omitted: accumulated cost vs. week (weeks 1-27) for MBA Groups 1-3 and the agent team.]
Figure 2: Artificial agents in DragonChain perform much better than MBA students in the MIT Beer Game.
Although it is well known that the "1-1" policy is optimal under deterministic demand with fixed lead time, no previous studies have tested whether "1-1" is a Nash equilibrium. We conducted further experiments to test this. Each agent finds "1-1" and the profile is stable, so it constitutes a Nash equilibrium. We also tested the robustness of the results by allowing agents to search a much larger space and by introducing more agents into the supply chain. In these experiments, all the agents discover the "1-1" policy, but it takes longer to find as the number of agents increases. Table 1 summarizes the findings.
Table 1: Generations to convergence as the number of agents increases.

                          Four Agents   Five Agents   Six Agents
Convergence generation         4             11           21
Experiment 2. In the second round of experiments, we test the case of stochastic demand, where demand is randomly generated from a known distribution, e.g., uniformly distributed on [0, 15]. There are 16 * 31 = 496 possible rules in total. The goal of this experiment is to find out whether artificial agents can track the trend of demand and whether they can discover good dynamic order policies. Again, we use "1-1" as a benchmark heuristic. Our initial experiments show that DragonChain can discover dynamic order strategies (so far, the winning rule appears to be "x + 1" for each agent) that outperform "1-1", as shown in Figure 3. Furthermore, as shown in Figure 4, an interesting and surprising discovery of our experiments is that when artificial agents play the Beer Game, the bullwhip effect disappears in all test cases.
[Plot omitted: accumulated cost vs. week (weeks 1-36) for the agents' policy vs. the "1-1" policy.]
Figure 3: When facing stochastic demand and penalty costs for all players as in the MIT Beer Game, artificial agents seem to discover a dynamic order policy that outperforms "1-1".
[Plot omitted: order quantity vs. week (weeks 1-35) for the Retailer, Wholesaler, Distributor, and Factory.]
Figure 4: The bullwhip effect is eliminated when artificial agents replace human beings in playing the MIT Beer Game.
By calculating the order variances across the supply chain, we find that the bullwhip effect is eliminated in all test cases, even under stochastic customer demand.
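The check itself is simple. A sketch of how one might test for the effect, reading "bullwhip" as order variance amplifying upstream; the function name is ours:

    from statistics import variance

    ECHELONS = ["Retailer", "Wholesaler", "Distributor", "Manufacturer"]

    def bullwhip_present(orders: dict[str, list[float]]) -> bool:
        """True if order variance strictly amplifies moving upstream,
        the signature of the bullwhip effect."""
        variances = [variance(orders[role]) for role in ECHELONS]
        return all(lo < hi for lo, hi in zip(variances, variances[1:]))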
Table 2: Variance comparison for experiment 2.

              Retailer   Wholesaler   Distributor   Manufacturer
Variance      26.0790     26.1345       25.2571        26.8671
As in experiment 1, we conducted a few more experiments to probe the results of experiment 2. First, the Nash property persists under stochastic demand. Second, we increased the game length from 35 weeks to 100 weeks to see whether agents can beat "1-1" consistently; again, customer demand is uniformly distributed on [0, 15]. The agents find new strategies (x+1, x, x+1, x) that outperform the "1-1" policy; they are smart enough to adopt this strategy to exploit the end effect of the game. Third, we tested statistical validity by rerunning the experiment with different random seeds. The results are fairly robust.
Experiment 3. In the third round of experiments, we changed the lead time from a fixed 2 weeks to uniformly distributed between 0 and 4 weeks, so the agents now face both stochastic demand and stochastic lead time. The length of the rule strings is again 6 * 4 = 24. The evaluation function changes in this experiment, because the week in which ordered beer is shipped in now depends on each week's realized lead time. For example, if in week 12 the Wholesaler agent orders 20 units of beer from the Distributor agent and the lead time for that week is 3 weeks, then the 20 units are shipped in week 16, according to the equation:
ShippingInWeek = WeekOfOrder + 1 + LeadTime
We add 1 on the right-hand side because the order is passed to the upstream agent in the following week.
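In code, the shipment scheduling is a one-line calculation; the uniform draw below reflects the 0-to-4-week lead time of this experiment.

    import random

    def shipping_week(week_of_order: int, lead_time: int) -> int:
        """One week of information delay plus the realized physical lead time."""
        return week_of_order + 1 + lead_time

    assert shipping_week(12, 3) == 16          # the example above

    lead_time = random.randint(0, 4)           # stochastic lead time, experiment 3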
Following the steps described in the first experiment, the agents find better strategies within 30 generations. The best rules found so far are: Retailer, x+1; Wholesaler, x+1; Distributor, x+2; Factory, x+1. The total cost over 35 weeks is 2555, much less than the 7463 incurred by the "1-1" policy.
Table 4 shows the evolution of effective adaptive supply chain management policies under stochastic customer demand and stochastic lead time. We further tested the stability of the strategies discovered by the agents and found that they are stable, thus constituting a Nash equilibrium.
[Plot omitted: order quantity vs. week (weeks 1-35) for customer demand and the Retailer's orders.]
Figure 6: The artificial Retailer agent is able to track customer demand reasonably well when facing stochastic customer demand.
[Plot omitted: order quantity vs. week (weeks 1-35) for the Retailer, Wholesaler, Distributor, and Factory.]
Figure 7: Artificial agents mitigate the bullwhip effect when facing stochastic demand and stochastic lead time.
[Plot omitted: accumulated cost vs. week (weeks 1-35) for the agents' policy vs. the "1-1" policy.]
Figure 9: Artificial agents seem to be able to discover dynamic ordering policies that outperform the "1-1" policy when facing stochastic demand and stochastic lead time.
Table 3: Variance comparison for experiment 3: no bullwhip effect.

              Retailer   Wholesaler   Distributor   Manufacturer
Variance      26.0790     26.1345       25.2571        26.8671
Table 4: The evolution of effective adaptive supply chain management policies when facing stochastic customer demand and stochastic lead time. (Generations 8 through 20 are identical and are shown as one row.)

Generation   Retailer   Wholesaler   Distributor   Manufacturer   Total Cost
    0          x-0         x-1          x+4           x+2            7380
    1          x+3         x-2          x+2           x+5            7856
    2          x-0         x+5          x+6           x+3            6987
    3          x-1         x+5          x+2           x+3            6137
    4          x+0         x+5          x-0           x-2            6129
    5          x+3         x+1          x+2           x+3            3886
    6          x-0         x+1          x+2           x+0            3071
    7          x+2         x+1          x+2           x+1            2694
   8-20        x+1         x+1          x+2           x+1            2555
4. The Columbia Beer Game
The Columbia Beer Game is a modified version of the MIT Beer Game (Chen, 1999). The following table compares the major differences between the two.

                       MIT Beer Game                      Columbia Beer Game
Customer Demand        Deterministic                      Stochastic
Information Delay      0                                  Fixed
Physical Delay         Fixed                              Fixed
Penalty Cost           All agents incur penalty costs     Only the retailer incurs a penalty cost
Holding Cost           All players have the same          Holding costs decrease moving
                       holding costs                      upstream in the supply chain
Information            Players do not know the            Players know the distribution
                       distribution of customer demand    of customer demand
Our experiment now follows the settings of the Columbia Beer Game, except that our agents have no knowledge of the distribution of customer demand. Specifically, we set the information lead time to (2, 2, 2, 0), the physical lead time to (2, 2, 2, 3), and customer demand is normally distributed with mean 50 and standard deviation 10. Other initial conditions are set as in Chen (1999).
Since there is an additional information delay in the Columbia Beer Game, we incorporate it into our rules as well: two more bits are appended to the rule bit-string to represent the information delay. For example, the Retailer rule 10001101 means "if demand was x in week t-1, then order x+3", where t is the current week: the leading 1 means "+", the next five bits 00011 equal 3, and the final two bits 01 give the one-week time shift (t-1).
The master rule for the whole team thus has length 8 * 4 = 32.
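A sketch of decoding this extended rule, following the 10001101 example; the function name is ours.

    def decode_delay_rule(bits: str) -> tuple[int, int]:
        """Decode an 8-bit rule into (offset, time shift).

        Bit 1: sign; bits 2-6: offset magnitude; bits 7-8: how many
        weeks back to look for the demand x. '10001101' -> (+3, 1):
        "order x + 3, where x is the demand in week t-1".
        """
        sign = 1 if bits[0] == "1" else -1
        offset = sign * int(bits[1:6], 2)
        shift = int(bits[6:], 2)
        return offset, shift

    assert decode_delay_rule("10001101") == (3, 1)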
Under the same learning mechanism as in the previous experiments, the agents find the optimal policy known from the literature (Chen, 1999): order whatever amount is ordered downstream, with a time shift, i.e., Q1(t) = D(t-1) and Qi(t) = Qi-1(t - l), where l is the information lead time.
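Read as code, this policy makes each non-retail echelon place a delayed copy of its downstream neighbor's order; a minimal sketch with week-indexed history lists (names are illustrative):

    def pass_order_with_shift(t: int, info_lead: int,
                              demand: list[int], downstream_orders: list[int],
                              is_retailer: bool) -> int:
        """Chen's (1999) optimal policy under information delay:
        Q1(t) = D(t-1) for the retailer, Qi(t) = Qi-1(t - l) upstream."""
        if is_retailer:
            return demand[t - 1]
        return downstream_orders[t - info_lead]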
5. Summary and further research
This study makes the following contributions. (1) First and foremost, we test the concept of an intelligent supply chain managed by artificial agents. The agents are capable of playing the Beer Game: tracking demand, eliminating the bullwhip effect, discovering the optimal policies when they exist, and finding good policies in complex scenarios where analytical solutions are not available. (2) Such an electronic supply chain is adaptable to an ever-changing business environment. The findings of this research could have a substantial impact on the development of next-generation ERP systems and multi-agent soft enterprise computing, as depicted in Figure 7.
Ongoing research includes investigating the value of information sharing in the supply chain; the emergence of trust in the supply chain; coordination, cooperation, bargaining, and negotiation in multi-agent systems; and alternative agent learning mechanisms such as classifier systems.
[Diagram omitted: an Executive Community (StrategyFinder) with Pricing and Investment Agents; a Production Community (LivingFactory); a Supply Chain Community (DragonChain) with Retailer, Wholesaler, Distributor, and Factory Agents; and an E-Marketplace Community (eBAC) with Bidding, Contracting, and Auction Agents.]
Figure 7: A framework for multi-agent intelligent enterprise modeling.
6. References
1. Chen, F., “Decentralized supply chains subject to information delays,”
Management Science, 45, 8 (August 1999), 1076-1090.
2. Kleindorfer, P., Wu, D.J., and Fernando, C. "Strategic gaming and the evolving electric power market", European Journal of Operational Research, forthcoming, 2000.
3. Lee, H., Padmanabhan, V., and Whang, S. "Information distortion in a supply chain: The bullwhip effect", Management Science, 43, 546-558, 1997.
4. Lee, H., and Whang, S. “Decentralized multi-echelon supply chains: Incentives
and information”, Management Science, 45, 633-640, 1999.
5. Strader, T., Lin, F., and Shaw, M. “Simulation of order fulfillment in divergent
assembly supply chains”, Journal of Artificial Societies and Social Simulations, 1,
2, 1998.
6. Kimbrough, S., "Introduction to a Selectionist Theory of Metaplanning," Working Paper, The Wharton School, University of Pennsylvania, http://grace.wharton.upenn.edu/~sok.
7. Lin, F., Tan, G., and Shaw, M. “Multi-agent enterprise modeling”, forthcoming
in, Journal of Organizational Computing and Electronic Commerce.
8. Maes, P., Guttman, R., and Moukas, A. “Agents that buy and sell: Transforming
commerce as we know it”, Communications of the ACM, March, 81-87, 1999.
9. Marschak, J., R. Radner. 1972. Economic Theory of Teams. Yale University
Press, New Haven, CT.
10. Nissen, M. “Supply chain process and agent design for e-commerce”, in: R.H.
Sprague, Jr., (Ed.), Proceedings of The 33rd Annual Hawaii International
Conference on System Sciences, IEEE Computer Society Press, Los Alamitos,
California, 2000.
11. Robinson, W., and Elofson, G. “Electronic broker impacts on the value of
postponement”, in: R.H. Sprague, Jr., (Ed.), Proceedings of The 33rd Annual
Hawaii International Conference on System Sciences, IEEE Computer Society
Press, Los Alamitos, California, 2000.
12. Sikora, R., and Shaw, M. “A multi-agent framework for the coordination and
integration of information systems”, Management Science, 40, 11 (November
1998), S65-S78.
13. Sterman, J. “Modeling managerial behavior: Misperceptions of feedback in a
dynamic decision making experiment”, Management Science, 35, 321-339, 1989.
14. Weiss, G. (ed.), Multi-agent Systems: A Modern Approach to Distributed Artificial Intelligence, Cambridge, MA: MIT Press, 1999.
15. Weinhardt, C., and Gomber, P. “Agent-mediated off-exchange trading”, in: R.H.
Sprague, Jr., (Ed.), Proceedings of The 33rd Annual Hawaii International
Conference on System Sciences, IEEE Computer Society Press, Los Alamitos,
California, 1999.
16. Wooldridge, M. “Intelligent Agents”, in G. Weiss (ed.), Multi-agent Systems: A
Modern Approach to Distributed Artificial Intelligence, Cambridge, MA: MIT
Press, 27-77, 1999.
17. Wu, D.-J. "Discovering near-optimal pricing strategies for the deregulated electric power marketplace using genetic algorithms", Decision Support Systems, 27, 1/2, 25-45, 1999b.
18. Wu, D.-J. “Artificial agents for discovering business strategies for network
industries”, forthcoming in, International Journal of Electronic Commerce, 5, 1
(Fall 2000b).
19. Wu, D.J., and Sun, Y. "Multi-agent bidding, auction and contracting support systems", Working Paper, LeBow College of Business, Drexel University, April 2000.
20. Yan, Y., Yen, J., and Bui, T. “A multi-agent based negotiation support system for
distributed transmission cost allocation”, in: R.H. Sprague, Jr., (Ed.), Proceedings
of The 33rd Annual Hawaii International Conference on System Sciences, IEEE
Computer Society Press, Los Alamitos, California, 2000.
21. Yung, S., and Yang, C. “A new approach to solve supply chain management
problem by integrating multi-agent technology and constraint network”, in: R.H.
Sprague, Jr., (Ed.), Proceedings of The 32nd Annual Hawaii International
Conference on System Sciences, IEEE Computer Society Press, Los Alamitos,
California, 1999.