Dealing with the chaos monkey
Mobile Computing
Bruce Scharlau, University of Aberdeen, 2013

Background
You have a large international service built on top of web services in ‘the cloud’, which you rely upon.
What happens to your service if they disappear?
How will your customers respond?

We can place data elsewhere on the network
Use a web service to store data elsewhere: save photos to Flickr, and files to some other app in the cloud.
Files can be saved automatically, or at the user’s discretion with time values, etc. (Twitter, email apps, or photo capture).

Amazon Web Services suffered an outage lasting several days a few years ago: Netflix, one of the companies that used them, carried on while many others suffered the outage.

Netflix’s chaos monkey saved them
They had built a service that created random outages in the services they used. This forced them to provide a minimal service despite outages.
When Amazon went down, they were prepared.

Feed & grow your chaos monkey
• How often will remote data be accessed?
• How quickly does remote data need to appear?
• How often will the data be updated/edited?
• Where will minimal data be stored?
These answers will suggest solutions for you.

Remote data may not always be needed
What you put on remote servers depends upon your own product and how it is deployed.
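The chaos monkey idea described above (deliberately causing random outages in the services you depend on) can be sketched in a few lines. This is a minimal illustration, not Netflix’s Simian Army. The `Dependency` class, the `chaos_monkey` and `render_home` functions and the service names are all hypothetical stand-ins. A seeded random generator keeps runs repeatable, and the page degrades to a placeholder for the stopped service instead of failing outright.

```python
import random
from dataclasses import dataclass


@dataclass
class Dependency:
    """A hypothetical remote service the app relies on."""
    name: str
    up: bool = True

    def call(self) -> str:
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return f"data from {self.name}"


def chaos_monkey(deps: list[Dependency], rng: random.Random) -> Dependency:
    """Randomly pick one dependency and stop it, simulating an outage."""
    victim = rng.choice(deps)
    victim.up = False
    return victim


def render_home(deps: list[Dependency]) -> dict[str, str]:
    """Build the page from whichever dependencies still answer.

    A failed dependency is replaced by a placeholder rather than
    failing the whole page: the 'minimal service'.
    """
    page = {}
    for dep in deps:
        try:
            page[dep.name] = dep.call()
        except ConnectionError:
            page[dep.name] = "unavailable"
    return page


if __name__ == "__main__":
    deps = [Dependency("photos"), Dependency("recommendations"), Dependency("profile")]
    stopped = chaos_monkey(deps, random.Random(42))
    print(f"stopped {stopped.name}: {render_home(deps)}")
```

Running it repeatedly with different seeds is a crude way to check that no single dependency can take the whole page down.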
Remote data may not be instant
If remote data is not expected to appear instantly, then slower servers of your own may suffice for interim periods.

Remote data can be slowly edited
Remote data can be staged so that current versions are held locally and can be used when remote services fail.

Storing your own minimal data may be necessary
Remote web services help, but they are not the only route to success.

All depends upon data storage needs
• How often will the data be accessed?
• How quickly does the data need to appear?
• How often will the data be updated/edited?
• Will the data be added to over time?
• Will the data be deleted?
• How will the data need to be used?
These answers will suggest solutions for you.

Use caches to manage data
Caches come in different shapes and sizes: some can handle data before it is written to the database, and some can hold data while the database is being changed.

Remove 3rd party dependencies
Don’t make your app wait for third-party responses before it replies to the user.
Find a way to use the third party asynchronously, so that your speed isn’t determined by their response time.

Separate out functions, etc.
Keep functions in separate libraries to ease maintenance and development.
When everything is put in one component, it becomes entangled and causes problems with response rates.

Take this further and assume anything could fail
Servers die, power fails, things fail.
Build your system to withstand this and you’ll do fine: you will end up with a resilient infrastructure.

When code is ready then test
The Simian Army quick-start guide at https://github.com/Netflix/SimianArmy/wiki/Quick-Start-Guide will guide you.
Run automated tests on the code, and also test that it still works when services are stopped at random, etc.
http://www.codinghorror.com/blog/2011/04/working-with-the-chaos-monkey.html
http://techblog.netflix.com/search/label/chaos%20monkey

Run these tests when suitable staff are available
Run these tests when staff expect them, so that they can respond accordingly and learn from them.
Run them on the production side so that responses can be organised accordingly.
Better now than at 3am at the weekend…

Must be run against production
The chaos monkey must be run against production, as this is where it counts and where nuances exist that can’t be replicated in test environments.
All of this fits into the larger ‘devops’ approach to development:
http://www.ibm.com/developerworks/java/library/a-devops1/index.html

What will happen with your site?
Gather with your team, trace the app from the handset to the servers, and determine what happens if any part breaks.
What can be preloaded, cached, or stored elsewhere?
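The advice to cache data and to stop third-party response times from becoming your own can be combined into one pattern. This is a minimal sketch, assuming a hypothetical `fetch_remote` coroutine that stands in for a third-party web service. `CachedRemote` is also illustrative and does not come from the slides or from Simian Army. The app waits at most a fixed timeout (0.2 s, an arbitrary choice) using the standard `asyncio.wait_for`. If the service is slow or down, the app serves the last copy it stored locally, and it returns `None` only for data it has never fetched. This is a bounded wait rather than a fully fire-and-forget call.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class CachedRemote:
    """Wraps a remote call with a timeout and a last-known-good local cache."""

    def __init__(self, fetch: Callable[[str], Awaitable[str]], timeout: float = 0.2):
        self.fetch = fetch          # hypothetical third-party call
        self.timeout = timeout      # seconds we are willing to wait
        self.cache: dict[str, str] = {}

    async def get(self, key: str) -> Optional[str]:
        try:
            # Bound the wait so the third party's response time doesn't become ours.
            value = await asyncio.wait_for(self.fetch(key), self.timeout)
        except (asyncio.TimeoutError, ConnectionError):
            # Remote failed or was too slow: fall back to the local copy,
            # or None if this key has never been fetched.
            return self.cache.get(key)
        self.cache[key] = value     # keep the current version locally
        return value


async def demo() -> list[Optional[str]]:
    state = {"mode": "ok"}

    async def fetch_remote(key: str) -> str:
        # Hypothetical stand-in for a third-party web service.
        if state["mode"] == "down":
            raise ConnectionError("service unavailable")
        if state["mode"] == "slow":
            await asyncio.sleep(1.0)  # much longer than our timeout
        return f"{key}-v1"

    remote = CachedRemote(fetch_remote, timeout=0.2)
    results = [await remote.get("profile")]      # fresh value, now cached
    state["mode"] = "slow"
    results.append(await remote.get("profile"))  # timed out: cached copy
    state["mode"] = "down"
    results.append(await remote.get("profile"))  # service down: cached copy
    results.append(await remote.get("photos"))   # never cached: None
    return results


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ['profile-v1', 'profile-v1', 'profile-v1', None]
```

Serving a stale copy is a product decision. It suits data that can be slowly edited, but it may not suit data that users expect to see instantly.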