
Dealing with the chaos monkey
Mobile Computing
Bruce Scharlau, University of Aberdeen, 2013
Background
You have a large international service built on
top of web services in ‘the cloud’, which you
rely upon
What happens to your service if they
disappear? How will your customers respond?
We can place data elsewhere on
the network
Use a web service to store data elsewhere:
save photos to Flickr, files to some other
app in the cloud.
Files can be saved automatically, or at the user's
discretion with time values, etc. (Twitter,
email apps, or photo capture)
Amazon Web Services died for
several days a few years ago:
only one company that used
them carried on while others
suffered the outage
Netflix’s chaos monkey saved them
They had built a service to create random
outages of services they used.
This forced them to provide a
minimal service despite outages
When Amazon went down, they were prepared
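The idea of creating random outages and still providing a minimal service can be sketched in a few lines. This is only an illustration, not Netflix's actual tool: `ChaosProxy`, `fetch_recommendations`, and `homepage` are hypothetical names, and the outage is simulated by raising an exception at random.

```python
import random


class ServiceDown(Exception):
    """Raised when a (simulated) remote service is unavailable."""


class ChaosProxy:
    """Wraps a callable and makes it fail at random (hypothetical helper)."""

    def __init__(self, func, failure_rate, rng=None):
        self.func = func
        self.failure_rate = failure_rate
        self.rng = rng or random.Random()

    def __call__(self, *args, **kwargs):
        # Inject an outage with probability `failure_rate`.
        if self.rng.random() < self.failure_rate:
            raise ServiceDown(self.func.__name__)
        return self.func(*args, **kwargs)


def fetch_recommendations(user):
    # Stand-in for a call to a remote web service.
    return [f"personal pick for {user}"]


def homepage(user, recommender):
    # Degrade to a minimal service instead of failing outright.
    try:
        return recommender(user)
    except ServiceDown:
        return ["most popular titles"]


if __name__ == "__main__":
    flaky = ChaosProxy(fetch_recommendations, failure_rate=0.5)
    for _ in range(5):
        print(homepage("alice", flaky))
```

Because the failure is injected around the remote call rather than inside it, the same wrapper could be placed around any dependency whose loss the app is supposed to survive.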
Feed & grow your chaos monkey
• How often will remote data be accessed?
• How quickly does remote data need to
appear?
• How often will the data be updated/edited?
• Where will minimal data be stored?
These answers will suggest solutions for you
Remote data may not always be
needed
What you put on remote servers depends
upon your own product and
how it is deployed.
Remote data may not be instant
If remote data is not expected to be instant, then
slower servers of your own may suffice for
interim periods
Remote data can be slowly edited
Remote data can be staged so that current
versions are local and thus can be used when
remote services fail
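One way to stage remote data locally is to refresh a local copy on every successful remote read and fall back to that copy when the remote call fails. The sketch below assumes the data is JSON-serialisable and that remote failures surface as `OSError` (for example `ConnectionError`); `StagedStore` and `fetch_profile` are hypothetical names.

```python
import json
import os
import tempfile


class StagedStore:
    """Keeps the last good remote value in a local file (hypothetical helper)."""

    def __init__(self, fetch_remote, path):
        self.fetch_remote = fetch_remote
        self.path = path

    def get(self):
        try:
            data = self.fetch_remote()
        except OSError:
            # Remote service failed: fall back to the staged local version.
            # If nothing was ever staged, this open() raises FileNotFoundError.
            with open(self.path, encoding="utf-8") as fh:
                return json.load(fh)
        # Remote call worked: refresh the local copy for next time.
        with open(self.path, "w", encoding="utf-8") as fh:
            json.dump(data, fh)
        return data


if __name__ == "__main__":
    state = {"up": True}

    def fetch_profile():
        # Stand-in for a remote web service call.
        if not state["up"]:
            raise ConnectionError("remote service is down")
        return {"name": "alice", "version": 3}

    path = os.path.join(tempfile.mkdtemp(), "staged.json")
    store = StagedStore(fetch_profile, path)
    print(store.get())    # read from the remote service, staged locally
    state["up"] = False
    print(store.get())    # remote is down, served from the local copy
```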
Storing your own minimal data
may be necessary
Remote web services help, but are not the only
route to success
All depends upon data storage
needs
• How often will the data be accessed?
• How quickly does the data need to appear?
• How often will the data be updated/edited?
• Will the data be added to over time?
• Will the data be deleted?
• How will the data need to be used?
These answers will suggest solutions for you
Use caches to manage data
Caches come in different shapes and sizes,
and some can handle data before it is written
to the db
Some can hold data while the db is changed,
etc.
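A cache that holds data before it is written to the db is often called a write-behind cache. The sketch below is a minimal in-memory version, assuming the db can be treated as a dict-like store; `WriteBehindCache` is a hypothetical name, and a real system would also need to handle crashes and concurrent writers.

```python
class WriteBehindCache:
    """Buffers writes in memory and flushes them to the db later."""

    def __init__(self, db):
        self.db = db          # any dict-like store; a stand-in for a real db
        self.pending = {}     # writes not yet saved to the db

    def put(self, key, value):
        # Accept the write immediately without touching the db.
        self.pending[key] = value

    def get(self, key):
        # Pending writes are newer than whatever the db holds.
        if key in self.pending:
            return self.pending[key]
        return self.db.get(key)

    def flush(self):
        # Call this once the db is available again.
        self.db.update(self.pending)
        self.pending.clear()


if __name__ == "__main__":
    db = {"title": "old"}
    cache = WriteBehindCache(db)
    cache.put("title", "new")
    print(cache.get("title"), db["title"])   # cache is ahead of the db
    cache.flush()
    print(db["title"])                       # db has caught up
```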
Remove 3rd party dependencies
Don’t make your app wait for third party
responses before it replies to the user
Find a way to use the 3rd party in an asynchronous
manner so your speed isn’t determined by
their response time
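As a rough sketch of the asynchronous approach, the handler below starts the third-party call as a background task and replies straight away; the third-party result is picked up later. It uses Python's standard `asyncio` module. `third_party_lookup`, `handle_request`, and the `results` dict are hypothetical, and the slow service is simulated with `asyncio.sleep`.

```python
import asyncio

# Where third-party results land once they arrive (hypothetical store).
results = {}


async def third_party_lookup(request_id, delay):
    # Stand-in for a slow third-party web service.
    await asyncio.sleep(delay)
    results[request_id] = "third-party result"


async def handle_request(request_id, delay):
    # Start the third-party call in the background...
    task = asyncio.create_task(third_party_lookup(request_id, delay))
    # ...and reply to the user without waiting for it.
    # The task is returned so the caller keeps a reference to it.
    return {"page": "core content"}, task


async def main():
    reply, task = await handle_request("r1", delay=0.05)
    replied_before_third_party = "r1" not in results
    await task  # later on, the third-party result becomes available
    return reply, replied_before_third_party, results["r1"]


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The user-facing reply here depends only on the app's own work, so a slow or dead third party delays the extra data rather than the whole response.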
Separate out functions, etc
Keep functions in separate libraries to ease
maintenance and development
When everything is put in one component it
becomes entangled and causes problems
with response rates
Take this further and assume
anything could fail
Servers die, power fails, things fail.
Build your system to withstand
this and you’ll do fine
You will end up with a resilient infrastructure
When code is ready then test
https://github.com/Netflix/SimianArmy/wiki/Quick-Start-Guide will guide you
Run automated tests on the code, but also test that the
code still works when services are randomly stopped, etc.
http://www.codinghorror.com/blog/2011/04/working-with-the-chaos-monkey.html
http://techblog.netflix.com/search/label/chaos%20monkey
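Testing by randomly stopping services could look roughly like the sketch below. It does not use the Simian Army tools linked above: `Service`, `render_page`, and `chaos_test` are hypothetical stand-ins, and "stopping" a service just flips a flag. The random generator is passed in so a failing run can be repeated with the same seed.

```python
import random


class Service:
    """Stand-in for a dependency that can be stopped and started."""

    def __init__(self, name):
        self.name = name
        self.running = True

    def call(self):
        if not self.running:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} ok"


def render_page(services):
    # The app under test: it skips any dependency that is down.
    parts = ["core page"]
    for service in services:
        try:
            parts.append(service.call())
        except ConnectionError:
            parts.append(f"{service.name} unavailable")
    return parts


def chaos_test(services, rounds, rng):
    """Stops one random service per round and checks the page still renders."""
    for _ in range(rounds):
        victim = rng.choice(services)
        victim.running = False
        try:
            page = render_page(services)
            assert page[0] == "core page"
            assert f"{victim.name} unavailable" in page
        finally:
            victim.running = True  # restore the service for the next round
    return True


if __name__ == "__main__":
    deps = [Service("photos"), Service("search"), Service("payments")]
    print(chaos_test(deps, rounds=20, rng=random.Random(1)))
```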
Run these tests when suitable
staff are available
Run these tests when staff expect them so
that they can respond accordingly and learn
from them
Run them on the production side so that
responses can be organised accordingly
Better now than at 3am at the weekend…
Must be run against production
Chaos monkey must be run against
production as this is where it counts and
where nuances exist that can’t be replicated
in test environments
All of this fits into a larger ‘devops’ approach
to development
http://www.ibm.com/developerworks/java/library/a-devops1/index.html
What will happen with your site?
Gather with your team and track the app from
the handset to the servers and determine
what happens if any part breaks
What can be preloaded, cached, or stored
elsewhere?