Transforming IT
Operations: A Survey of
Effective Practices
Shawn Winnington-Ball
Information Systems And
Technology
03 December 2013
Introduction
• There are some fundamental problems in
IT that people are working to solve
• Let’s examine some of the ideas and
approaches
• Knowledge gleaned from various sources:
books, blogs, articles
• A selection of what I find compelling
IT/business
• IT is a critical function in the achievement
of business goals
– Business goals have become IT goals
• Past the point of no return: we can no longer
fall back to manual processes
– IT risk is therefore business risk
IT/business
• In our digital society, there is tremendous
value in using IT to create novel ways of
enhancing our experiences
– Digitalization (Gartner)
• Business success is tied to IT success,
and how creatively and capably the IT
hammer can be wielded
IT risks
• What are some of the risks that might
prevent us from achieving our IT goals?
– There’s too much work to do already
– Fixed culture: ‘we’ve always done it this way’
– Insufficiently resilient/secure IT infrastructure
– Silo mentality: the right people aren’t talking
about the right things
– Insufficient understanding of true priorities
The approaches
• From here on out, IT operations context
• The Phoenix Project: IT is in the toilet, and
the miraculous recovery
• The DevOps movement: bury the hatchet
• IT process improvement efforts, culture
change
IT is a mess
• The situation: too much to do, everything
chaotic, messy, unordered
• Where do you begin when overwhelmed?
• Tough to build a house with a jumbled pile
of bricks, lumber, screws and shingles
• The right work isn’t getting done: inefficient
practices and processes
Unclogging the pipes
• Analyze active work, see the big picture
– Who spends the time on this work?
– Which of the work is repeatable?
– Which of it requires specialized knowledge?
– What are the organization’s true priorities and
how does the work fit with them? Is there a
disconnect?
Unclogging the pipes
• Collect the work, categorize it
– Projects, Infrastructure, Changes, Unplanned
– Infrastructure development/maintenance work
is internal project work: call it as such
– 20000’ view: what are All The Things currently
underway?
– This is our Work in Progress, active tasks
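A minimal sketch of the categorization step, assuming each piece of work can be recorded with a title, an owner and one of the four categories named above. The `WorkItem` type and `work_in_progress` function are hypothetical names used only for illustration.

```python
from collections import Counter
from dataclasses import dataclass

# The four categories of work named on the slide.
CATEGORIES = ("project", "infrastructure", "change", "unplanned")


@dataclass
class WorkItem:
    title: str
    category: str
    owner: str
    done: bool = False


def work_in_progress(items):
    """Count active (not done) work items per category."""
    counts = Counter({category: 0 for category in CATEGORIES})
    for item in items:
        if item.category not in CATEGORIES:
            raise ValueError(f"unknown category: {item.category}")
        if not item.done:
            counts[item.category] += 1
    return dict(counts)


items = [
    WorkItem("Migrate mail", "project", "alice"),
    WorkItem("Patch switches", "infrastructure", "bob"),
    WorkItem("Restore database", "unplanned", "alice"),
    WorkItem("Retire old portal", "project", "carol", done=True),
]
print(work_in_progress(items))
```

Even a tally this simple gives the 20000' view: how much is underway, and how much of it is unplanned.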
Unclogging the pipes
• Clear the backlog: what is preventing the
work from getting done?
– Constraints and bottlenecks
– Systematically clear them
• Low-hanging fruit: cease unplanned work
– Underlying causes: why does IT break?
Unclogging the pipes
• Steady ongoing changes, make them less
prone to causing unplanned work
• Technical debt, taking shortcuts now will
cause pain later
• Control the release of work into IT
• Demand outstrips capacity: don’t auto-accept new commitments
Unclogging the pipes
• Determine total IT capacity. What
commitments can we reasonably take on?
– Isolate key projects and freeze ongoing efforts
for everything else
– Identify the work that only one person does
and standardize it, document the process
– Elevate preventative work: if it breaks often it
gets the most attention
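One way to picture the capacity question is a simple greedy pass: accept requests in priority order until the available hours run out, and defer the rest. This is only a sketch. The hour estimates, the priority numbers and the `plan_commitments` name are all assumptions made for illustration.

```python
def plan_commitments(capacity_hours, requests):
    """Accept requests in priority order until capacity is used up.

    requests: list of (name, priority, estimated_hours) tuples,
    where a lower priority number means more important.
    Returns (accepted, deferred) lists of request names.
    """
    accepted, deferred = [], []
    remaining = capacity_hours
    for name, _priority, hours in sorted(requests, key=lambda r: r[1]):
        if hours <= remaining:
            accepted.append(name)
            remaining -= hours
        else:
            deferred.append(name)
    return accepted, deferred


requests = [
    ("ERP upgrade", 1, 60),
    ("Wiki redesign", 3, 30),
    ("Storage refresh", 2, 50),
]
accepted, deferred = plan_commitments(100, requests)
print("accepted:", accepted)
print("deferred:", deferred)
```

The deferred list is the "say NO now" part: those requests are not rejected, only held until capacity frees up.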
Unclogging the pipes
• ‘Setting the tempo by our constraints’
– Say NO now but say YES later once the
backlog is clear
– It’s easy to be honest about your capabilities
when you have a clear picture
Free and clear
• What can these ideas bring about?
– Reduction in chaos
– Ordered approach to work, priorities-based
– No more uncontrolled change
– Honest assessment of true capabilities
DevOps overview
• What is DevOps? A collaborative
approach to how IT development and
operations relate
• Tension between creating and maintaining
– Development: fast, agile, creative
– Production: stable, predictable, resilient
• Reconciling different perspectives
DevOps overview
• Born from the Agile development
movement: fast code release, quick sprints
• Speed is of the essence: companies need
to keep up with competition, provide value
quicker and more often, more reliably
• The DevOps philosophy is summed up in
three guiding principles…
DevOps – First Way
1. Systems Thinking
– Performance of the entire system
– Fast flow of work: continuous integration,
deployment: small legos not big bricks
– Understand that value is generated in IT from
left to right: development to production,
always moving forward
– “Reduce friction, increase velocity” (Farr)
DevOps – Second Way
2. Amplify feedback loops
– Bring developers closer to their live code: if
sysadmin is on-call, why not the developer
– Shorten the time between learning of and
correcting failures
– When the system is broken, fix it before
completing the work itself
DevOps – Third Way
3. Culture of continual improvement and
learning
– Take risks, fail quickly, move on
– Prevent failures from reaching production
– The basis of improvement is practice and
repetition: make it habitual and widespread
– Test your supposed resilience: break things
on purpose to see what happens
DevOps: the toys
• Infrastructure as code: heavy use of
configuration management
• Versioned environments, automated
deployments
• Graph anything and everything
• DevOps isn’t tools but they are invaluable
to establishing the culture
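The core idea behind infrastructure as code can be sketched without any particular tool: describe the desired state as data, compare it with the actual state, and apply only the difference. The example below is an illustration under that assumption, not how any specific configuration management product works. The package names, versions and function names are hypothetical.

```python
def plan_changes(desired, actual):
    """Return the actions needed to make `actual` match `desired`.

    Both arguments map package name -> version string.
    """
    actions = []
    for pkg, version in sorted(desired.items()):
        if pkg not in actual:
            actions.append(("install", pkg, version))
        elif actual[pkg] != version:
            actions.append(("set_version", pkg, version))
    for pkg in sorted(set(actual) - set(desired)):
        actions.append(("remove", pkg, actual[pkg]))
    return actions


def apply_changes(actual, actions):
    """Apply planned actions to a copy of the actual state."""
    result = dict(actual)
    for action, pkg, version in actions:
        if action == "remove":
            result.pop(pkg, None)
        else:
            result[pkg] = version
    return result


desired = {"nginx": "1.4.4", "ntp": "4.2.6"}
actual = {"nginx": "1.2.1", "telnetd": "0.17"}

actions = plan_changes(desired, actual)
converged = apply_changes(actual, actions)
print(actions)
print(plan_changes(desired, converged))  # nothing left to do
```

Because the second plan is empty, running the process again changes nothing. That repeatability is what makes versioned environments and automated deployments practical.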
The Visible Ops
• Prescriptive guide based on ITIL
• ITIL doesn’t tell you where to begin;
daunting effort
• Authors provide 4 distinct phases of
process improvement
• Case study based: what do the shining
stars have in common?
The Visible Ops
• “80% of outages caused by operator and
application errors”
• Cultural problems
– Change management is made too tough
– “Cowboy culture”; misplaced sense of agility
– Reactive, always firefighting, never planning
– Constantly chasing audit requirements
The Visible Ops
• Characteristics of high-performing orgs
– High availability as measured by MTBF and MTTR
– High throughput of successful changes
– Investment early in IT lifecycle: release mgmt
– Visible audit controls
– IT ops and security working closely, mentor/mentee
– Low amounts of unplanned work
– Server to admin ratio > 100:1
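As a rough illustration of the two availability measures, the sketch below computes MTBF (mean time between failures) and MTTR (mean time to repair) from a list of outages over a reporting period. It assumes the common definitions of uptime divided by number of failures and downtime divided by number of failures. The dates and the `mtbf_mttr` name are made up for the example.

```python
from datetime import datetime, timedelta


def mtbf_mttr(period_start, period_end, outages):
    """Return (MTBF, MTTR) as timedeltas.

    outages: list of (start, end) datetimes that do not overlap
    and fall inside the reporting period.
    """
    if not outages:
        raise ValueError("no outages in period; MTBF and MTTR are undefined")
    downtime = sum((end - start for start, end in outages), timedelta())
    uptime = (period_end - period_start) - downtime
    count = len(outages)
    return uptime / count, downtime / count


outages = [
    (datetime(2013, 11, 3, 9, 0), datetime(2013, 11, 3, 10, 0)),   # 1 hour
    (datetime(2013, 11, 7, 14, 0), datetime(2013, 11, 7, 17, 0)),  # 3 hours
]
mtbf, mttr = mtbf_mttr(datetime(2013, 11, 1), datetime(2013, 11, 11), outages)
print("MTBF:", mtbf)
print("MTTR:", mttr)
```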
The Visible Ops
• “Stabilize the patient”
– Identify most problematic infrastructure
– Publish change policy: Thou Shalt Not Touch
– Create designated change windows
– Use Tripwire to verify compliance
– Create Change Advisory Board body comprising
stakeholders, use change request tracking system
– Initiate change management meetings (to authz
changes) and daily change briefings (to announce)
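A designated change window can be expressed as data and checked before any change is made. The sketch below assumes a hypothetical policy of Tuesday and Thursday evenings. The window times and the `in_change_window` name are illustrative and do not come from The Visible Ops.

```python
from datetime import datetime, time

# Hypothetical policy: changes allowed Tuesday and Thursday, 22:00 to 23:59.
# Keys follow datetime.weekday(): Monday is 0, so Tuesday is 1 and Thursday is 3.
CHANGE_WINDOWS = {
    1: (time(22, 0), time(23, 59)),
    3: (time(22, 0), time(23, 59)),
}


def in_change_window(when):
    """Return True if `when` falls inside a designated change window."""
    window = CHANGE_WINDOWS.get(when.weekday())
    if window is None:
        return False
    start, end = window
    return start <= when.time() <= end


print(in_change_window(datetime(2013, 12, 3, 22, 30)))  # Tuesday evening
print(in_change_window(datetime(2013, 12, 3, 14, 0)))   # Tuesday afternoon
```

A check like this could sit in front of a deployment script, so that the published policy is enforced rather than only announced.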
The Visible Ops
• “Catch and Release” & “Find Fragile
Artifacts”
– Interrogate all systems, ask many questions of them
– Find the systems that are unique, scary, important,
and historically problematic
– Determine how many unique configurations you
actually have
– Document systems and services and
interdependencies in a CMDB
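Counting unique configurations can be sketched as grouping hosts by a fingerprint of their configuration attributes. The attribute names, host names and the `group_by_configuration` function below are assumptions for illustration. A real inventory would draw on far more attributes.

```python
import hashlib
from collections import defaultdict


def group_by_configuration(hosts):
    """Group hostnames by a fingerprint of their configuration.

    hosts: dict mapping hostname -> dict of configuration attributes
    (string keys and string values).
    Returns a dict mapping fingerprint -> sorted list of hostnames.
    """
    groups = defaultdict(list)
    for name, config in hosts.items():
        # Sort the attributes so that key order does not affect the fingerprint.
        canonical = repr(sorted(config.items()))
        fingerprint = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
        groups[fingerprint].append(name)
    return {fp: sorted(names) for fp, names in groups.items()}


hosts = {
    "web1": {"os": "rhel6", "kernel": "2.6.32", "httpd": "2.2.15"},
    "web2": {"httpd": "2.2.15", "os": "rhel6", "kernel": "2.6.32"},
    "db1": {"os": "rhel5", "kernel": "2.6.18", "mysql": "5.0.95"},
}
groups = group_by_configuration(hosts)
print(len(groups), "unique configurations")
for members in groups.values():
    print(members)
```

Hosts that end up alone in a group are candidates for the "unique, scary" list, and the group count is a number to drive down when building golden builds.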
The Visible Ops
• Create a Repeatable Build Library
– Infrastructure as fuses; replace, don’t fix
– Engineer builds for fragile infrastructure
– Reduce unique configurations in production
– Create ‘Golden Builds’: system images
– Identify lowest common denominators across
the environment
The Visible Ops
• Continual Improvement
– Metrics: can’t manage what you can’t measure
– Fact-, not belief-based management
– MTTR and MTBF are key, affected by release stage
planning efforts
• Closed loop between phases 1-3
– Release, controls, resolution
LISA 2011
• SREs at Google: Tom Limoncelli
– Disconnect between dev and prod, competition brings
them closer out of necessity
– Faster feature release, pent-up waterfall methods no
longer suffice
– Dev teams run their own services for 6+ months
– SREs provide self-service to devs: systems, storage,
bandwidth, monitoring, docs: videos, wikis, SLA
metrics
LISA 2011
• Deployinator at Etsy: Erik Kastner, John
Goulah
– Speed and agility valued: 30+ code deploys/day
– “Be wrong as fast as possible”
– Graph everything that can be measured
– The entire company is on IRC, up to CEO
– Code push announcements are published via IRC bot
LISA 2011
• Puppet: Luke Kanies
– A pep talk for an obstinate, slow-moving sector
– Competition drives innovation: do it better and faster
than the next person
– Zynga was adding 1000 servers per week (!)
– Cloud computing is independence and self-service,
not doing it all yourself, relying on sub-contractors
LISA 2011
• Game Day: Jesse Robbins, Opscode
– Things happen, adjust your response to them
– Determine the MTTR on your own terms
– Rules:
• Preparation: goals: mitigate impact, reduce MTTR, improve MTBF
• Participation: all hands on deck, everyone suffers together
• Exercises: ‘trigger and expose latent’ defects, start small
– Work up to full data centre outage!
– Essentially positive outlook, can-do attitude
IT culture
• Tools, tools, tools is the typical mantra
• Discuss the ideas, habits and beliefs that
underpin our approach to our jobs and IT
• Technology is rapid, people aren’t
– “Give People priority. If a few more projects spent a third or more
of their time, effort and money on People aspects (consultation,
collaboration, walkthroughs, training, pilots, training, coaching,
training, support, feedback...) instead of Technology and ITIL
consultants, we might have some more successful ITSM
implementations.” (Rob England, itskeptic.org)
IT culture
• How do you compel people to change their
views and habits?
– Address ‘how is this time any different?’
– Address ‘how does this affect me?’ and ‘what
do I stand to gain from it?’
– Courage to tell it like it is: be honest and don’t
avoid conflict out of fear
– Be vulnerable, share your personal story
Conclusions
• Many great ideas on how to advance IT
operations to meet business goals
• Perhaps we just need ideas to flourish in
small pockets?
– Can’t ordain cultural change: find places
where it will grow and support the good ideas
– Organize more places to connect like-minded
people
Sources, inspiration
• The Visible Ops, The Phoenix Project – Behr, Kim, Spafford
• http://itskeptic.org (ITSM consultant, kinda grouchy, great critical perspective)
• http://blogs.pinkelephant.com/troy (ITSM consultant, several years of blog material)
• “SRE@Google Limoncelli”
• “Opscode Gameday LISA 2011”
• http://agile.dzone.com/articles/agile-its-second-decade-0
• http://itrevolution.com/learn-more-about-concepts-in-phoenix-project/
• http://itrevolution.com/nick-galbreath-on-integrating-information-security-into-devops/
• http://itrevolution.com/one-of-the-best-devops-talks-on-it-transformation-continuously-deployingculture-by-rembetsy-and-mcdonnell-velocity-london-2012/
• http://itrevolution.com/the-three-ways-principles-underpinning-devops/
• http://itrevolution.com/video-of-my-2012-puppetconf-keynote/
• http://noelbruton.wordpress.com/2013/11/23/the-phoenix-project-exposes-itils-anti-managementbackwardness
• https://speakerdeck.com/atalanta/how-not-to-do-devops
• http://venturebeat.com/2013/09/30/an-idiots-guide-to-devops
Questions
swball@uwaterloo.ca