
AWS-Summit-Online-OperationalExcellence-Adrian

Towards Operational Excellence
Adrian Hornsby
Principal Evangelist - Architecture
Amazon Web Services
@adhorn
© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
What is Operational Excellence?
When your whole business is fundamentally
dependent on technology,
operational excellence is critical.
1995
Database
Internet
Web Server
customers
Inventory
Orders
Customer Service
Tools
Fulfillment Center Tools
What is Operational Excellence?
• Happy customers!
• Consistently exceeding operational goals
• Anticipating and addressing problems
• Effectively responding to operational issues
• Continuously improving
…and doing all of this at significant scale.
How does a technology organization move
toward OE?
Achieving Operational Excellence
Culture
Technology
Tools
Processes
Achieving Operational Excellence
Culture
Culture: Amazon Leadership Principles
1. Customer Obsession
2. Ownership
3. Invent and Simplify
4. Are Right, A Lot
5. Hire and Develop the Best
6. Insist on the Highest Standards
7. Think Big
8. Bias for Action
9. Frugality
10. Learn and Be Curious
11. Earn Trust
12. Dive Deep
13. Have Backbone; Disagree and Commit
14. Deliver Results
https://www.amazon.jobs/en/principles
Amazon Flywheel
Convenience
Fast Delivery
Innovation
Reduce Customer’s Costs
Wide Selection of Products
“What would Low-Flying-Hawk say?”
2 Pizza Team Responsibilities
Responsible for:
• Their product
Not responsible for*:
• Deployment tools
• CI/CD tools
• Monitoring tools
• Metrics tool
• Logging tools
• APM tools
• Infrastructure provisioning tools
• Security tools
• Database management tools
• Testing tools
• ….
*Unless their product belongs in the blue
You build it; you ship it
Achieving Operational Excellence
Tools
Tools to Operate the Cloud
• Test Automation
• Configuration Management
• Software Deployment
• Monitoring and Visualization
• Reporting
• Change Management
• Incident Management
• Trouble Ticketing
• Security Auditing
• Forecasting and Planning
Calling Houston…
Website
Deployment team
“website-push” perl script
Command line tools
Hand build
Hand deploy to NFS
% /opt/amazon/customer-service/bin/request-refund
Breaking the monolith
✓ Small
✓ Focused
✓ Single-purpose
✓ Connected via HTTP API
Conway’s law
Organization
“Organizations which design systems … are constrained to
produce designs which are copies of the communication
structures of these organizations.”
— M. Conway
Architecture
THEIR PRODUCT
• Deployment tools
• CI/CD tools
• Monitoring tools
• Metrics tool
• Logging tools
• APM tools
• Infrastructure provisioning tools
• Security tools
• Database management tools
• Testing tools
• ….
You measure.
You collect data.
You listen to anecdotes.
Write Code → Wait → Build Code → Wait → Deploy to Test → Wait → Deploy to Prod
Brazil
• Centralized and hosted build system
• Generating artifacts to deploy
Apollo
• Deployment service
• No downtime deployments
• Health checking
• Versioned artifacts and rollbacks
https://www.allthingsdistributed.com/2014/11/apollo-amazon-deployment-engine.html
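As a rough illustration of how health checking and versioned artifacts can combine into a rollback, here is a minimal Python sketch. It is not how Amazon's deployment service works internally; the `deploy` function, the host names, and the `healthy` callback are all hypothetical, and the fleet is modeled as a plain dictionary.

```python
from typing import Callable, Dict


def deploy(versions: Dict[str, str], new_version: str,
           healthy: Callable[[str, str], bool]) -> bool:
    """Update hosts one at a time; restore every touched host if a check fails.

    `versions` maps host name -> artifact version (a hypothetical fleet model).
    `healthy(host, version)` stands in for a real health check.
    """
    previous: Dict[str, str] = {}  # host -> version it ran before this rollout
    for host in list(versions):
        previous[host] = versions[host]
        versions[host] = new_version
        # Only one host changes at a time, so the others keep serving traffic.
        if not healthy(host, new_version):
            # Artifacts are versioned, so rolling back means re-pointing
            # each touched host at the version it ran before.
            versions.update(previous)
            return False
    return True


fleet = {"host-a": "v1", "host-b": "v1", "host-c": "v1"}

# Assumed failure: v2 does not pass its health check on host-b.
rolled_out_v2 = deploy(fleet, "v2", lambda host, version: host != "host-b")
fleet_after_v2 = dict(fleet)

# v3 passes everywhere, so the whole fleet moves forward.
rolled_out_v3 = deploy(fleet, "v3", lambda host, version: True)
print(rolled_out_v2, fleet_after_v2, rolled_out_v3, fleet)
```

Updating one host at a time is a simplification of a no-downtime rollout; a real system would more likely update in batches and drain traffic first.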
Pipelines
• Path code takes from check-in to production
• Where automation, testing, and approvals happen
• Enabler of continuous deployment
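The idea of a pipeline as an ordered path with gates can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Stage` type, the `promote` function, the revision names, and the gate logic are all hypothetical and do not describe Amazon's Pipelines tool.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str
    gate: Callable[[str], bool]  # automated test or approval check for a revision


def promote(revision: str, stages: List[Stage]) -> List[str]:
    """Return the stages a revision reached, stopping at the first closed gate."""
    reached: List[str] = []
    for stage in stages:
        if not stage.gate(revision):
            break
        reached.append(stage.name)
    return reached


approved = {"rev-42"}  # hypothetical revisions that received manual approval

stages = [
    Stage("Build", lambda rev: True),
    Stage("Gamma", lambda rev: rev != "rev-13"),  # pretend rev-13 fails its tests
    Stage("Prod", lambda rev: rev in approved),   # approval gate before production
]

print(promote("rev-42", stages))  # passes every gate
print(promote("rev-13", stages))  # stops after Build
print(promote("rev-7", stages))   # tested, but not approved for Prod
```

Because every gate is just a function of the revision, automated tests and human approvals fit the same shape, which is what lets a check-in flow to production without manual hand-offs.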
Example Pipeline and Stages
[Figure: pipeline view with the stages Packages, VersionSet, Gamma, PDX-Prod, and Prod - Rest, each showing its revision history and status. Promotion to production passes through an approval workflow: compliance verification, whitelisting, L1 approval, L2 approval, then "Deploy when ready".]
Hundreds of millions of deployments a year - as of 2019
https://aws.amazon.com/devops/
Achieving Operational Excellence
Processes
“Oh! Those tables always come back, and they’re always damaged.
They’re not packaged right, so the surface of the table always gets
scratched.”
People already have good intentions
If good intentions don’t work, what does?
Mechanisms
1902
Toyota will not allow any defect that they know
about to go down the manufacturing line.
Image Source: https://www.autoguide.com/auto-news/2016/01/toyota-production-japan-may-stop-next-monthdue-to-steel-shortage.html
Andon Cord
The Andon Cord
Andon Customer Service
Jeff Bezos 2012 Shareholder Letter
We noticed that you experienced poor
video playback while watching the
following rental on Amazon Video On
Demand: Casablanca. We’re sorry for
the inconvenience and have issued you
a refund for the following amount:
$2.99. We hope to see you again soon.
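A proactive refund like the one in this email is a mechanism rather than an intention: a rule runs on playback data and acts without waiting for a complaint. The Python sketch below shows one way such a rule could look. The `Rental` type, the `rebuffer_ratio` metric, and the 10% threshold are assumptions made for illustration, not details from the shareholder letter.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rental:
    customer: str
    title: str
    price: float
    rebuffer_ratio: float  # assumed metric: share of playback time spent buffering


POOR_PLAYBACK_THRESHOLD = 0.10  # assumed cutoff for "poor video playback"


def proactive_refunds(rentals: List[Rental]) -> List[str]:
    """Build a refund notice for every rental whose playback was poor."""
    notices: List[str] = []
    for rental in rentals:
        if rental.rebuffer_ratio > POOR_PLAYBACK_THRESHOLD:
            notices.append(
                f"{rental.customer}: refunded ${rental.price:.2f} for {rental.title}"
            )
    return notices


rentals = [
    Rental("alice", "Casablanca", 2.99, rebuffer_ratio=0.25),
    Rental("bob", "Casablanca", 2.99, rebuffer_ratio=0.01),
]
notices = proactive_refunds(rentals)
print(notices)
```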
"Good intentions never work, you need
good mechanisms to make anything
happen."
Jeff Bezos
Good Mechanisms ≈ Complete Processes
Audit
Tools
Adoption
Correction of Errors (COE)
Mechanism to learn from our mistakes
• technical flaws
• process flaws
• documentation flaws
• organizational flaws
• other flaws
Mechanism to identify contributing factors to failures
Mechanism to drive CONTINUOUS IMPROVEMENT
Anatomy of a COE
• What happened?
• What data do you have to support this?
  • Metrics and graphs
• What was the impact on customers and your business?
• What are the contributing factors?
  • Don’t stop at operators.
• What lessons did you learn?
• What corrective actions are you taking?
  • Action items
  • Related items (trouble tickets etc.)
https://www.youtube.com/watch?v=yQiRli2ZPxU
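The questions above map naturally onto a structured record, which makes it easy to check that no section was skipped before a COE goes to review. The following Python sketch is one possible shape. The class name, field names, and the `missing_sections` helper are hypothetical and are not Amazon's actual COE template.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CorrectionOfErrors:
    what_happened: str
    supporting_data: List[str] = field(default_factory=list)      # metrics and graphs
    customer_impact: str = ""
    contributing_factors: List[str] = field(default_factory=list)
    lessons_learned: List[str] = field(default_factory=list)
    action_items: List[str] = field(default_factory=list)
    related_items: List[str] = field(default_factory=list)        # trouble tickets etc.

    def missing_sections(self) -> List[str]:
        """Names of sections that are still empty, in template order."""
        return [name for name, value in vars(self).items() if not value]


# Hypothetical draft with only the first three sections filled in.
draft = CorrectionOfErrors(
    what_happened="Checkout latency spiked during a deployment.",
    supporting_data=["p99 latency graph"],
    customer_impact="Some customers could not complete orders.",
)
print(draft.missing_sections())
```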
Audit
Weekly Operational Metrics Review
• Continuous inspection mechanism
• Maintains focus on operations
• Foundation of a healthy operations program
Typical Agenda (~15min)
• Share successes and failings
• Action items follow up
• Review COEs
• Review key service metrics
• Identify new best practices
https://aws.amazon.com/blogs/opensource/the-wheel/
Continuous Improvement
Policy Engine
• Automated risk and opportunity analyzer
• Identifies potential risks to availability, infrastructure, security and more
  • Both inherited and direct
• Highlights potential opportunities to optimize resource utilization
• Extensible and configurable
• Provides single-pane-of-glass view into policy compliance
• Allows acknowledgment
• Reports roll up the organization hierarchy
Mechanism to propagate local learnings globally
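To make the roll-up and acknowledgment ideas concrete, here is a minimal Python sketch of a policy check whose findings are counted at every level of an organization hierarchy. Everything in it is hypothetical: the policy names, the resource fields, the `PARENT` mapping, and the `roll_up` function are illustrative assumptions, not the internal Policy Engine.

```python
from collections import Counter
from typing import Any, Callable, Dict, FrozenSet, List, Tuple

Resource = Dict[str, Any]

# Each policy returns True when the resource complies (hypothetical rules).
POLICIES: Dict[str, Callable[[Resource], bool]] = {
    "multi-az": lambda r: r.get("zones", 0) >= 2,
    "encrypted": lambda r: bool(r.get("encrypted", False)),
}

# Hypothetical hierarchy: team -> parent organization.
PARENT: Dict[str, str] = {"team-a": "org-x", "team-b": "org-x"}


def violations(resource: Resource) -> List[str]:
    """Names of the policies this resource does not satisfy."""
    return [name for name, complies in POLICIES.items() if not complies(resource)]


def roll_up(resources: List[Resource],
            acknowledged: FrozenSet[Tuple[str, str]] = frozenset()) -> Dict[str, int]:
    """Count open violations per owner and per every ancestor of that owner."""
    counts: Counter = Counter()
    for resource in resources:
        for policy in violations(resource):
            if (resource["id"], policy) in acknowledged:
                continue  # acknowledged findings are not reported as open
            owner = resource["owner"]
            while owner is not None:
                counts[owner] += 1
                owner = PARENT.get(owner)
    return dict(counts)


resources = [
    {"id": "db-1", "owner": "team-a", "zones": 1, "encrypted": True},
    {"id": "db-2", "owner": "team-b", "zones": 3, "encrypted": False},
]
report = roll_up(resources)
report_after_ack = roll_up(resources, frozenset({("db-2", "encrypted")}))
print(report, report_after_ack)
```

Keeping policies in a plain dictionary is what makes the sketch extensible: adding a rule means adding one entry, and every level of the hierarchy picks it up on the next run.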
In conclusion...
Achieving operational excellence
requires:
an operationally focused culture
a rich set of tools
the right processes
• Good Intentions Don’t Work
• Mechanisms Work
“The world, thankfully, is full of many high-performing, highly
distinctive corporate cultures.
We never claim that our approach is the right one – just that
it’s ours – and over the last two decades, we’ve collected a large
group of like-minded people. Folks who find our approach
energizing and meaningful.”
Jeff Bezos - 2015 Amazon.com letter to shareholders
Thank you!
Adrian Hornsby
@adhorn
https://medium.com/@adhorn
https://dev.to/adhorn