Evaluating Resilience Strategies Based on an Evolutionary Multi-agent System
Kazuhiro Minami, Tomoya Tanjo, and Hiroshi Maruyama
Institute of Statistical Mathematics, Japan
December 4, 2013
CyberneticsCom 2013

We sometimes have an unexpected event
• 3.11 earthquake and tsunami
• 9.11
• Lehman financial shock in 2008
• We cannot completely prevent such disasters
• Instead, we should aim to design a system that contains damage and is readily recoverable to an acceptable level
7/31/2012 Kazuhiro Minami

Resilience: Definition
“Capacity of a (social-ecological) system to absorb a spectrum of shocks or perturbations and to sustain and develop its fundamental function, structure, identity, and feedbacks as a result of recovery or reorganization in a new context.”
-- Buzz Holling (1973)

Resilience = Resistance + Recovery
Longstaff et al., “Building Resilient Communities,” Homeland Security Affairs, Vol VI, No.3, 2010
Taoi-cho, Miyagi Pref.
http://www.bousaihaku.com/cgi-bin/hp/index2.cgi?ac1=B742&ac2=&ac3=1574&Page=hpd2_view
http://fullload.jp/blog/2011/04/post-265.php

Goal: How to make our systems more resilient against large unexpected events?
[Figure labels: Financial Crisis, Malicious Attackers, Natural Disasters, Civil Infrastructure, Financial Systems, Engineering Systems, Society, Organizations, New Technologies]

Biological science might be a major source of wisdom for resilience engineering
• Redundancy
  – Multiple pathways for metabolism
• Diversity
• Adaptability

Redundancy and diversity are heavily used techniques in Computer Science
• Maintain a backup system in a cloud service
  – Financial companies were able to continue their services after the 9.11 event
  – Many web sites maintain multiple copies of the server
• Software diversity makes it difficult for hackers to compromise multiple servers of the same service
  – Change compiler options or use different algorithms
• Ethernet uses a randomization technique to avoid repeated message collisions

However, applying those techniques to real-world systems is NOT so trivial
• The cost of replication would be high in non-ICT systems
• Replication sometimes decreases the quality of service
  – Inconsistency of data
  – Timely monitoring of a system is more difficult, so the adaptability of the system must be sacrificed
• Toyota’s supply chain system gave precedence to adaptability over redundancy

Multi-agent simulations based on a population genetics model
[Figure labels: Colony of n agents; C: “fit” configurations; Resource]
Each robot has ten binary features (e.g., 2-leg/4-leg, flying/non-flying, …), e.g., <0110111011>
Constraint C: a subset of the set of all 1,024 configurations (an element of its power set)
A robot is fit if its configuration is in C
• Resource Reserve R
  – Fit robots contribute to build up R
  – A robot consumes one unit of R to reconfigure one feature
• The colony is resilient if robots can survive a series of changing constraints C_1, C_2, …, C_t, …

Represent a changing environment as a sequence of dynamic constraints
[Figure: robots marked fit or unfit under constraint C_t at time t and under C_{t+1} at time t+1]

Need to pay a cost for adaptation
[Figure labels: Remove, Add, Resource, Unfit, Fit, System bitstring, 10110010, 10110011, Adaptation]
An adaptation in our
model is much faster than in biological systems

A robot could produce a clone or die
• Make a clone
  – when the amount of the resource is doubled
• Die
  – when the resource is used up

Metrics of resilience in our model
• Redundancy
  – How much resource does a robot maintain?
• Diversity
  – Diversity index
• Adaptability
  – How many bits can a robot flip at a time?

Multi-agent Simulations
• Define initial parameters
  – Population size
  – Bit length of a robot
  – Size and type of constraints
  – Initial amount of each robot’s resource
  – Initial diversity index
  – Adaptation strategy
    • Random or intelligent
    • #flips at a time
• Run the system for 100 time steps
• Examine how the population size and the diversity index vary over time

Diversity at the beginning helps a population survive longer
[Plot: #Agents vs. Time]
Parameter                  Value
Initial population size    100
Agent bit length           8
Constraint size            26
Constraint transition      continuous
Adaptation strategy        random
Adaptation speed           1

Two adaptation strategies
1. Random strategy (flip one bit randomly)
2. Intelligent strategy (flip one bit to be closer to the constraint)
[Figure label: Constraint 10110110]

If robots adapt intelligently, the population grows much faster
[Two plots: #Agents vs. Time]

If agents share the common resource, the sustainability of a system can be greatly improved
[Two plots: "Individual resources" and "Shared resource", each annotated "Sudden changes of the constraint"]

Summary
• Explore the design space parameterized by three resilience properties based on an evolutionary multi-agent system
  – Redundancy
  – Diversity
  – Adaptability
• Obtain quantitative initial results regarding design strategies for building resilient systems

Future work: Further possibilities for adaptation strategies
• Local vs Global
  – Local: Each robot makes its own decision independently of others
  – Global: There is a global coordination.
    Every robot must follow the order
  – Mixed
• Complete vs Incomplete knowledge on C
  – Complete knowledge: max 10 steps to become fit again
  – Incomplete knowledge: probabilistic (max 1023 steps if the landscape is stable)

Backup

We consider three types of constraints
1. Disruptive changes: a new constraint C_t is generated randomly at each time t
2. Small changes: a new constraint C_t is generated from C_{t-1} by adding a neighbor configuration to C_{t-1} or removing a configuration from C_{t-1}
3. Small changes with continuous topology: same as case 2, but all configurations in C_t are connected
[Figures: each case is illustrated at T = t-1, T = t, and T = t+1]

The diversity measure considers the population abundance of each type
$D = N^2 / \sum_i p_i^2$
where N is the size of the population and p_i is the number of individuals of type i
Example 1: if N = 5 and size(`1101`) = 5, then $D = 5^2 / 5^2 = 1$
Example 2: if N = 5, size(`1101`) = 3, and size(`1111`) = 2, then $D = 5^2 / (3^2 + 2^2) = 25/13 \approx 1.92$
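
The diversity index above can be checked with a few lines of Python. This is a minimal sketch, assuming a population is given as a list of configuration bitstrings and that the index is D = N^2 / sum of p_i^2, as the two examples imply. The function name `diversity_index` is illustrative and does not come from the original slides.

```python
from collections import Counter
from fractions import Fraction


def diversity_index(population):
    """Return D = N^2 / sum_i p_i^2 as an exact fraction.

    `population` is a list of configuration bitstrings (an assumed
    representation); p_i is the number of individuals of type i.
    """
    if not population:
        raise ValueError("population must be non-empty")
    n = len(population)
    counts = Counter(population)  # p_i for each distinct configuration
    return Fraction(n * n, sum(c * c for c in counts.values()))


# Example 1 from the slide: five identical robots.
example_1 = diversity_index(["1101"] * 5)
# Example 2 from the slide: three robots of one type, two of another.
example_2 = diversity_index(["1101"] * 3 + ["1111"] * 2)
print(example_1, example_2)
```

Using `Fraction` keeps 25/13 exact; convert with `float()` only for display. Under this definition D ranges from 1 (all robots identical) to N (all robots distinct).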
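
The two adaptation strategies (random and intelligent) and the complete-knowledge bound from the future-work slide can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' simulator: configurations are bitstrings, a constraint is a non-empty Python set of bitstrings, and "closer to the constraint" is read as reducing the Hamming distance to the nearest fit configuration. All function names are hypothetical.

```python
import random


def hamming(a, b):
    """Number of bit positions where two equal-length bitstrings differ."""
    return sum(x != y for x, y in zip(a, b))


def flip(config, i):
    """Return config with bit i flipped."""
    bit = "1" if config[i] == "0" else "0"
    return config[:i] + bit + config[i + 1:]


def random_step(config, rng):
    """Random strategy: flip one uniformly chosen bit."""
    return flip(config, rng.randrange(len(config)))


def intelligent_step(config, constraint):
    """Intelligent strategy (assumed reading): flip one bit that moves the
    robot closer to the nearest configuration in a non-empty constraint."""
    if config in constraint:
        return config
    # sorted() makes tie-breaking among equally near targets deterministic.
    target = min(sorted(constraint), key=lambda c: hamming(config, c))
    i = next(k for k in range(len(config)) if config[k] != target[k])
    return flip(config, i)


def steps_to_fit(config, constraint):
    """Count intelligent steps until the robot is fit (complete knowledge)."""
    steps = 0
    while config not in constraint:
        config = intelligent_step(config, constraint)
        steps += 1
    return steps


# Worst case for ten binary features: every bit must be flipped.
print(steps_to_fit("0000000000", {"1111111111"}))
```

Each intelligent step lowers the distance to the nearest fit configuration by exactly one, so the step count equals that initial distance and never exceeds the bit length. This matches the "max 10 steps" figure for ten-bit robots with complete knowledge of C.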