Towards Operational Excellence
Adrian Hornsby
Principal Evangelist - Architecture
Amazon Web Services
@adhorn
© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.

What is Operational Excellence?
When your whole business is fundamentally dependent on technology, operational excellence is critical.

1995 (architecture diagram): customers, Internet, Web Server, Database, Inventory, Orders, Customer Service Tools, Fulfillment Center Tools

What is Operational Excellence?
• Happy customers!
• Consistently exceeding operational goals
• Anticipating and addressing problems
• Effectively responding to operational issues
• Continuously improving
…and doing all of this at significant scale.

How does a technology organization move toward OE?

Achieving Operational Excellence
Culture, Technology, Tools, Processes

Achieving Operational Excellence
Culture

Culture: Amazon Leadership Principles
1. Customer Obsession
2. Ownership
3. Invent and Simplify
4. Are Right, A Lot
5. Hire and Develop the Best
6. Insist on the Highest Standards
7. Think Big
8. Bias for Action
9. Frugality
10. Learn and Be Curious
11. Earn Trust
12. Dive Deep
13. Have Backbone; Disagree and Commit
14. Deliver Results
https://www.amazon.jobs/en/principles

Amazon Flywheel (diagram): Convenience, Fast Delivery, Innovation, Reduce Customer’s Costs, Wide Selection of Products

“What would Low-Flying-Hawk say?”

2 Pizza Team Responsibilities
Responsible for:
• Their product
Not responsible for*:
• Deployment tools
• CI/CD tools
• Monitoring tools
• Metrics tool
• Logging tools
• APM tools
• Infrastructure provisioning tools
• Security tools
• Database management tools
• Testing tools
• ….
*Unless their product belongs in the blue
You build it; you ship it

Achieving Operational Excellence
Tools

Tools to Operate the Cloud
• Test Automation
• Configuration Management
• Software Deployment
• Monitoring and Visualization
• Reporting
• Change Management
• Incident Management
• Trouble Ticketing
• Security Auditing
• Forecasting and Planning

Calling Houston…
• Website Deployment team
• “website-push” Perl script
• Command line tools
• Hand build
• Hand deploy to NFS
% /opt/amazon/customer-service/bin/request-refund

Breaking the monolith
✓ Small
✓ Focused
✓ Single-purpose
✓ Connected via HTTP API

Conway’s law
“Organizations which design systems … are constrained to produce designs which are copies of the communication structures of these organizations.”
- M. Conway
Organization (diagram) and Architecture (diagram): THEIR PRODUCT on top of Deployment tools, CI/CD tools, Monitoring tools, Metrics tool, Logging tools, APM tools, Infrastructure provisioning tools, Security tools, Database management tools, Testing tools, ….

You measure. You collect data. You listen to anecdotes.

Write Code → Wait → Build Code → Wait → Deploy to Test → Wait → Deploy to Prod

Brazil
• Centralized and hosted build system
• Generating artifacts to deploy

Apollo
• Deployment service
• No downtime deployments
• Health checking
• Versioned artifacts and rollbacks
https://www.allthingsdistributed.com/2014/11/apollo-amazon-deployment-engine.html

Pipelines
• Path code takes from check-in to production
• Where automation, testing, and approvals happen
• Enabler of continuous deployment

Example Pipeline and Stages (screenshot)
Stages shown: Packages, VersionSet, Gamma, PDX-Prod, Prod - Rest, each with a revision history and status.
Approval Workflow (screenshot): compliance verification, whitelisting, L1 approval, L2 approval, deploy when ready.

Hundreds of millions of deployments a year - as of 2019
https://aws.amazon.com/devops/

Achieving Operational Excellence
Processes

“Oh! Those tables always come back, and they’re always damaged. They’re not packaged right, so the surface of the table always gets scratched.”

People already have good intentions
If good intentions don’t work, what does?
Mechanisms

1902
Toyota will not allow any defect that they know about to go down the manufacturing line.
Image Source: https://www.autoguide.com/auto-news/2016/01/toyota-production-japan-may-stop-next-monthdue-to-steel-shortage.html

The Andon Cord

Andon Customer Service
Jeff Bezos 2012 Shareholder Letter
We noticed that you experienced poor video playback while watching the following rental on Amazon Video On Demand: Casablanca. We’re sorry for the inconvenience and have issued you a refund for the following amount: $2.99. We hope to see you again soon.

"Good intentions never work, you need good mechanisms to make anything happen."
Jeff Bezos

Good Mechanisms ≈ Complete Processes
(cycle diagram: Audit, Tools, Adoption)

Correction of Errors (COE)
• Mechanism to learn from our mistakes
  • technical flaws
  • process flaws
  • documentation flaws
  • organizational flaws
  • other flaws
• Mechanism to identify contributing factors to failures
• Mechanism to drive CONTINUOUS IMPROVEMENT

Anatomy of a COE
• What happened?
• What data do you have to support this?
  • Metrics and graphs
• What was the impact on customers and your business?
• What are the contributing factors?
  • Don’t stop at operators.
• What lessons did you learn?
• What corrective actions are you taking?
  • Action items
  • Related items (trouble tickets etc.)
https://www.youtube.com/watch?v=yQiRli2ZPxU

Audit
Weekly Operational Metrics Review
• Continuous inspection mechanism
• Maintains focus on operations
• Foundation of a healthy operations program
Typical Agenda (~15min)
• Share successes and failings
• Action item follow-up
• Review COEs
• Review key service metrics
• Identify new best practices
https://aws.amazon.com/blogs/opensource/the-wheel/

Continuous Improvement
Policy Engine
• Automated risk and opportunity analyzer
• Identifies potential risks to availability, infrastructure, security and more
  • Both inherited and direct
• Highlights potential opportunities to optimize resource utilization
• Extensible and configurable
• Provides single-pane-of-glass view into policy compliance
• Allows acknowledgment
• Reports roll up the organization hierarchy
Mechanism to propagate local learnings globally

In conclusion...
Achieving operational excellence requires:
• an operationally focused culture
• a rich set of tools
• the right processes

• Good Intentions Don’t Work
• Mechanisms Work

“The world, thankfully, is full of many high-performing, highly distinctive corporate cultures. We never claim that our approach is the right one – just that it’s ours – and over the last two decades, we’ve collected a large group of like-minded people. Folks who find our approach energizing and meaningful.”
Jeff Bezos - 2015 Amazon.com letter to shareholders

Thank you!
Adrian Hornsby
@adhorn
https://medium.com/@adhorn
https://dev.to/adhorn