

OpenStack Continuous Delivery Best Practices:

Putting what IBM learned from this Community into practice

Andrew Trossman, Distinguished Engineer, Cloud Management, IBM

Kendall Lock, Director of Cloud Management, IBM


Breaking the Addiction of Long Cycle Times

• Inertia is very powerful

• OpenStack and Cloud technologies enable much of the change, but…

• Modifying team behavior is key!!!


Modifying Behavior…Leveraging Technology

The 12 Step Program to Continuous Delivery


Step 1: Admit you have a problem

“I am powerless over my long release cycles, and my life has become unmanageable”

The Emotional Reaction

• “Big Rocks Make Big Splashes – we need big releases to Market”

• “Since the Release is 9 months long, I need to fight to the death now to get my favorite feature in”

• “Be careful about customer feedback – we are already full”

The Truth

• You can still make a “Big Splash” by accumulating delivered features

• Delivering incrementally means being able to change your mind – react to customers and competitors

• Software is not meat – we do not price it by weight


Step 2: Accept a methodology greater than yourself

The Continuous Delivery Pipeline


The Ugly Truth…Part 0

• Ask anyone on Planet Earth “Do you think you will do better with fewer features of high quality or more features of average quality?” and they will say “the former, of course”, yet stakeholders have a hard time changing their own behavior

• They don’t want to take a month to put Continuous Delivery automation in place – it “doesn’t offer direct customer value”

• “Totally agree we should do this…just have these one or two urgent customer things to deal with first” – this can quickly put 6 business mortgages on the project that can never get paid off

The Ugly Truth…Part 1

• Our team had a distinct set of roles:

• “I am the programmer – my job is to write features that someone could use”

• “I am the tester – my job is to try to break stuff by using my hands and other deadly weapons”

• So…test automation must be the other guy’s job…


• The developers thought test was beneath them

• The testers didn’t have programming skills to do automation (but did have the pessimistic mentality to understand what things to try to break)

Step 3: Turn your life over to Continuous Delivery

Deal with Common Objections

• “We have deadlines to meet and won’t get all of our code done if we also have to write a bunch of automation.”

• Deadlines can be like the call of the Siren – beware!

• Quality can never be beaten in with a mallet

• “Talk to ‘the other guys’ whose job it is to make that stuff.”


• The “Other Guys” aren’t coming…they are you!

• Automation is not a role, it is an accountable behavior

• “Running test automation will slow down our end-to-end build and therefore our team’s productivity will go down.”

• Builds have to be like breathing – you don’t even think about it.

• We had to force test automation into our culture – no new capability can be written without appropriate test automation in place

Developers now write test automation to check in code, and Testers pair up in the Scrums to identify which kinds of tests are needed


The Ugly Truth…Part 2

• Our build process was taking 6 hours end-to-end

• Very serialized, manual steps in the middle

• In our early stages, we found test blockers in more than 50% of our builds

• We used a manual “fire bucket brigade” to verify the build as green

• Without good test automation, we had to do manual testing of a bucket of scenarios to declare the build good (literally passing the baton from Beijing to North Carolina before declaring the answer)


Step 4: Take inventory of yourself

Determine the starting state of your pipe…and drive key ingredients!

What kind of automation do you have?

Build, Deployment, and Test Automation are Must-Haves

We chose BuildForge and Jenkins for build, Chef for deployment, and Rational Performance Tester and Selenium for test

How upgradable is your stuff?

“Continuous Delivery” can’t mean “Continuously Down for Maintenance”, so you need zero or short downtimes

Learn from OpenStack design patterns – multiple service instances and separation of code from data; leverage well-known HA techniques while you get there

Do you have missionaries to spread the word?

Find a leader or leaders who “get it”, and empower them

The executive sponsors have to eliminate the question of “if”, so the invested leaders get to focus on the “how”

Those leaders may not be the ones that traditionally had the title – look for them and get behind them to demonstrate the value of the new behavior


Build and Test infrastructure and how information flows through the system


Step 5: Admit your wrongs (and enact a plan to work on them)

Design for Disaster First

Fast Rollback is your new best friend – it is your safety net!

It is insurance against disasters (if a mistake gets through to your users), and it buys you time to build up your automation

It avoids the most common derail factor in the early stage (“Well, we had a bad problem in our app so we better go back to our long test cycles to make sure we never do that again”)

Developers are optimists – you want to keep that confidence rather than sacrifice it to the occasional mistake

Images made this much easier – rebooting a pre-configured working system could be done in a few minutes

Design for incrementals - feature switches, strangler patterns

Flip the switch back if the new thing is soft in the middle

May have side effect of customer acceptance testing
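The feature-switch idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the mechanism the team actually built: `FeatureSwitches`, the `new_checkout` switch name, and both checkout functions are hypothetical.

```python
class FeatureSwitches:
    """In-memory registry of on/off switches (hypothetical; a real system
    would more likely read these from configuration or a database)."""

    def __init__(self, defaults=None):
        self._flags = dict(defaults or {})

    def enable(self, name):
        self._flags[name] = True

    def disable(self, name):
        self._flags[name] = False

    def is_enabled(self, name):
        # Unknown switches default to off, so new code stays dark
        return self._flags.get(name, False)


def legacy_checkout(cart):
    # Existing, trusted code path
    return sum(cart)


def new_checkout(cart):
    # New code path under trial: a hypothetical 10% promotion
    return round(sum(cart) * 0.9, 2)


def checkout(cart, switches):
    # The switch decides at run time which path serves the request
    if switches.is_enabled("new_checkout"):
        return new_checkout(cart)
    return legacy_checkout(cart)
```

Calling `switches.disable("new_checkout")` is the "flip the switch back" step: the new code stays deployed, but traffic returns to the old path without a rebuild.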

The Ugly Truth…Part 3

• We said “ok, now test automation is part of our lives – go do it”

• But…it turns out that wasn’t quite enough direction for a distributed team to know what to do and how to do it well

• The initial result:

• We spent time up front deciding on the baseline of tests we wanted automated that covered our main use cases (Good)

• It took twice as long as we had hoped to get the baseline of test automation in place (Bad)

• Because our tests were designed and written by people who seemingly ranged in skill from “experienced professional” to “drunk guy hitting on your Mother”, about 1/3 of our builds were initially marked “Failed” when, after investigation, the test automation was the part that was broken, not the code (Worse)


Step 6: Remove your defects

Test automation is hard to do well…Tempest is setting a course

Lesson #1: Treat test automation like any other code

• Review it, approve it, reject or promote it

• We needed the idea of “non voting” test cases – those that don’t stop the build because they are still in hardening phase

Lesson #2: Tests are atomic, they shouldn’t depend on previous results

• Common test artifact repositories make it much easier to compose longer test scenarios from individual test cases

Lesson #3: Tests can’t have allergies – being sensitive to code tweaks will greatly slow you down

Use “immutable markers” where possible – APIs, window IDs, configuration labels; enforce those to stay consistent

Lesson #4: Understand what your users will vary, then design your tests to loop through new variations that you want to introduce

• Different images/software, different environment configurations, different shopping cart contents, different pricing promotions…

• We introduced an “Automation Lead” who designed and reviewed the work

Lesson #5: Load/stress test your system continuously
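The “non voting” idea from Lesson #1 can be pictured as a small gate over test results. The sketch below is an assumption about how such a gate might look, not a description of Tempest or of the team's tooling; `CaseResult` and `build_verdict` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    name: str
    passed: bool
    voting: bool = True  # non-voting cases are still in their hardening phase


def build_verdict(results):
    """Return (is_green, blockers, warnings).

    Only failures of voting cases block the build. Failures of non-voting
    cases are reported so they can be fixed and later promoted to voting.
    """
    blockers = [r.name for r in results if not r.passed and r.voting]
    warnings = [r.name for r in results if not r.passed and not r.voting]
    return (not blockers, blockers, warnings)
```

A failing non-voting case shows up in `warnings` while the build stays green, which keeps immature automation visible without letting it mark good code as “Failed”.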


Step 7: Remove shortcomings with humility

Make the behavior you want easy to do, so work does not serialize on the pipeline

Make it easy to perform a “sandbox build”

• OpenStack makes this easy to do through quick spawning of a virtual system with an image that has the tools already set up

• Think about “Test-as-a-Service” to avoid waiting for the “Big Bang”

Deliver your test harness as a Cloud service – allow developers to request a subset of testing against their private builds
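As a rough sketch of what “request a subset of testing” could mean in practice, the Python below selects tests by tag for a private build. Everything here is assumed for illustration: the catalog contents, the tag names, and the `select_tests` and `handle_request` functions are hypothetical, and a real service would sit behind an API and actually schedule the run.

```python
# Hypothetical catalog: test name -> tags describing what it covers
TEST_CATALOG = {
    "test_login": {"smoke", "ui"},
    "test_deploy_image": {"smoke", "provisioning"},
    "test_price_promotion": {"billing"},
}


def select_tests(requested_tags, catalog=TEST_CATALOG):
    """Return the sorted names of tests carrying at least one requested tag."""
    wanted = set(requested_tags)
    return sorted(name for name, tags in catalog.items() if tags & wanted)


def handle_request(build_id, requested_tags):
    """Describe the test run a developer asked for against a private build."""
    return {"build": build_id, "tests": select_tests(requested_tags)}
```

A developer working on billing could ask only for the `billing` tag against a sandbox build and get feedback without waiting for the full run.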

Publicize your results – successes and failures

• The “Build Website” should be at the top of your Browser Favorites

• Use “peer pressure” to drive constant learning and tuning

We dedicated some team members to build up our basic machinery – we rotated the “Ops” responsibility

• Builds must be like breathing…you do it repeatedly or you die


Step 8: Apologize to those you have harmed

Walk in your users’ shoes, and show them (quickly) that you care

Use what you build before your customers do wherever possible

Become your own best reference/case study

We forced the current system to be used for dev/test – if it doesn’t work, everyone gets blocked so mistakes become personal

Apply a publish/subscribe model for the end of your Continuous Delivery Pipeline

Have your application “poll” for updates, and allow each user to update at the appropriate frequency

Internal sites update daily to force constant pressure on reliable, incremental delivery

Customer beta sites may update less frequently (weekly) since digestion is slower

Update your customer production sites as frequently as you can to make them happy (and bring their friends)
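The polling model above, with each site updating at its own frequency, might be sketched as follows. This is a simplified illustration and not the team's implementation: `ReleaseChannel`, `Site`, and the build numbers are hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
from datetime import datetime, timedelta


class ReleaseChannel:
    """The publishing end of the pipeline: remembers the latest good build."""

    def __init__(self):
        self.latest_build = 0

    def publish(self, build_number):
        self.latest_build = max(self.latest_build, build_number)


class Site:
    """A subscriber that polls the channel at its own frequency."""

    def __init__(self, name, poll_interval, channel):
        self.name = name
        self.poll_interval = poll_interval
        self.channel = channel
        self.installed_build = 0
        self.last_poll = None

    def poll(self, now):
        """Update if the interval has elapsed and a newer build exists.

        Returns True when an update was applied.
        """
        if self.last_poll is not None and now - self.last_poll < self.poll_interval:
            return False  # too soon for this site, skip this cycle
        self.last_poll = now
        if self.channel.latest_build > self.installed_build:
            self.installed_build = self.channel.latest_build
            return True
        return False
```

An internal site created with `timedelta(days=1)` picks up every daily build, while a beta site created with `timedelta(weeks=1)` skips the builds in between and jumps straight to the latest one at its next poll.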


Step 9: Make direct amends, except when doing so would harm them

Make your feedback loop as tight as possible

Build a channel for your users to talk to you

Facebook “Like” concepts

Comment inline to your app

Web analytics to see where users derail

Talk back

Produce a “did you know” bubble or feed

Highlight the people who gave you your best ideas publicly – encourage participation

Nobody has time – whatever you do, keep it to 5 minutes or less

We implemented “why not try…” guides along with our live site to steer users to new things we wanted to try out

The Ugly Truth…Part 4

• The components of our product had different design heritage

• Some had data and code together on one disk

• Some had single instance services

• Some maintained local state

• We lacked the OpenStack design philosophy – everything multi-instance, code and data clearly separated

• Refactoring the design all at once is very expensive and risky, so we adopted “traditional HA” techniques for parts of our application (such as System Automation heartbeating and DRBD file system replication)


Step 10: Continue to inventory yourself, and promptly deal with shortcomings

Make your application “Production Friendly”

Roll forward whenever possible, roll backward when you must

We used Chef for scripting this automation, and employed the three key Cloud ingredients: Compute, Network, and Storage

Launch new instances and install new code

If needed, update data schema in a backward compatible way (add columns but don’t remove or rename)

Switch IP to new instance (can switch back if failure is detected later)
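The roll-forward steps above can be walked through with a small Python stand-in. The team scripted this with Chef; the sketch below is not Chef and not their code, only an illustration of why an additive schema change keeps rollback possible. The `orders` table, the `promo_code` column, and the `Router` class (standing in for the IP switch) are all hypothetical.

```python
import sqlite3


def migrate_add_column(conn):
    """Backward-compatible change: add a nullable column, never drop or rename."""
    columns = [row[1] for row in conn.execute("PRAGMA table_info(orders)")]
    if "promo_code" not in columns:
        conn.execute("ALTER TABLE orders ADD COLUMN promo_code TEXT")


def old_code_insert(conn, item):
    # The old release names its columns explicitly, so new columns do not break it
    conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))


def new_code_insert(conn, item, promo_code):
    conn.execute(
        "INSERT INTO orders (item, promo_code) VALUES (?, ?)", (item, promo_code)
    )


class Router:
    """Stands in for the IP switch: points traffic at one instance."""

    def __init__(self, active):
        self.active = active
        self.previous = None

    def switch_to(self, instance):
        self.previous, self.active = self.active, instance

    def switch_back(self):
        if self.previous is not None:
            self.active, self.previous = self.previous, self.active


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
old_code_insert(conn, "book")

migrate_add_column(conn)        # schema rolls forward, old rows are untouched
router = Router(active="v1")
router.switch_to("v2")          # traffic moves to the new instance
new_code_insert(conn, "pen", "SPRING")

router.switch_back()            # failure detected later: traffic returns to v1
old_code_insert(conn, "lamp")   # old code still works against the new schema
rows = conn.execute("SELECT item, promo_code FROM orders ORDER BY id").fetchall()
```

Because the migration only adds a column, the old release can keep writing after the switch back; a dropped or renamed column would have made that rollback fail.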

Work out automation around “lights out operation”

Dev/Test has no rules, but promoting things to Production incurs the overhead of all the IT management practices – integrating those tools and procedures automatically will eliminate a walk through some very thick mud

We used workflows attached to our provisioning steps to integrate the tools and procedures we have to follow


Step 11: Meditate to Continue Forward

As always – inspect what you want to occur

Shift tracking methodology from “pacing” metrics to “follow through” and “reactivity” metrics

Being on schedule becomes a much less interesting point of view

Customer acceptance and usage become front and center

Root cause any escapes from your Continuous Delivery Pipeline and don’t allow them to persist

Speed still matters…but it is more about efficiency

Constantly ask what could be done to improve team effectiveness, and then do what you determine


Step 12: Once awakened, carry the message to others in despair

Our experience in 6 months:

7:1 labor reduction (compared to the normal effort required to deploy and test our version)

More than 3300 builds

50% reduction in problem resolution time

Thanks for coming…and good luck with your journey!


Thank You
