Continuous Delivery on an Enterprise Java Stack
  Marc Fasel, Senior Consultant, Shine Technologies
  @marcfasel
  http://blog.shinetech.com

Shine Technologies
  Specialises in enterprise software development
  Blue-chip clients
  Technology focus
    Enterprise Java
    Ruby
    AWS
    Mobile development

What is Continuous Delivery?
  Rapid, repeatable, reliable, low risk deployments
  Automation of the software delivery process
  Extension of Continuous Integration
    Continuous check-in by developers
    Automated build
    Automated unit testing
    => Detect integration problems early

Business Motivation
  Put business in control of software release cycle
  Reduce risk
    Deliver small batches to production
  Increase agility
    Allow for short iterations
    Trigger deployments at any time

Continuous Delivery Project
  Blue-chip Australian retailer
  Website relaunch with nation-wide marketing campaign
  High-performance web application
    Expected load of 1,000,000 requests per hour
    24*7 availability
  Greenfield development
  Rapid release of new features

Deployment Status Quo
  Monthly deployments of software
  Highly manual process
    Expert knowledge required
    Tedious and unpleasant
    Failures due to manual process
  Not rapid, repeatable, reliable, and low risk

Implementation of Continuous Delivery
  1. Automated deploy to application server
  2. Automated acceptance testing
  3. Push-button promotion to user acceptance testing (UAT) environment
  4. Manual user acceptance testing
  5. Push-button release to production

Continuous Delivery Pipeline
  Source: Wikipedia

Starting Point
  Continuous Integration was already in place
  Agile practices were emerging
  Automated provisioning of virtual machines using Puppet was in place
    Machines were reliably the same
  Development and deployment were both managed by Shine => DevOps
    Access to production machines

Environments
  Diagram: Continuous Integration Server, Local; Deploy to Build; Promote to UAT; Promote to Production

Production Environment
  Cluster of 20 nodes
  Load Balancer
  Use of Puppet to create nodes
  Tomcat and Apache connected to a single Oracle database

1. Automated Deploy to Build Server
  Different parts work together: build artifact, configuration, data
  Build artifact is copied into application server
  Configuration files (Tomcat, Apache Httpd) are copied
  Manual non-destructive database migration beforehand

Database Migration
  Ideal: Automated database migration
    Database schema changes are in version control
    Automated execution of database scripts during deployment and promotion
  Not possible at client
    DBA team gets change requests with attached scripts
    Review of database changes
    Execution independent of software release cycle

2. Automated Acceptance Testing
  Selenium
  Run through major use cases
  Write code, don't record scripts
  Use page pattern to aid reusability
  Run with every deploy for quick feedback
  Tests only ran against Firefox

Problems with Automated Acceptance Testing
  Data must be set up/reset in the environment
  No collision with test data used by developers

3. Push-button Promotion to UAT
  Artifact, configuration, and data
  Promotion: Copy existing artifact, don't rebuild
  All environment-specific configuration external
  Database changes had been done beforehand

4. Manual User Acceptance Testing
  UAT for small features is no problem
  Bottleneck was the manual regression tests; even with a good test plan they took two hours of tedious work
  Batching of features was necessary to reduce the number of times the manual regression tests had to be performed
  Good balance was 2 week iterations with a few days of UAT and 1/2 day of manual regression testing

Automated Multi-browser Acceptance Testing
  Run Selenium tests against supported browsers and operating systems
  Maintenance of such an environment is time-consuming
  Testers were not technical enough to take charge of scripted acceptance testing
  => Reliance on manual acceptance tests and, painfully, manual regression tests

Manual Regression Test
  Manual regression test after UAT is done, otherwise:
    If UAT fails, manual regression test has to be redone => lots of work
  Time pressure was high because of the 1/2 day needed after UAT was done

5. Push-button Release to Production
  Same script as for UAT promotion: copy artifact, copy production configuration from svn, database setup

Zero Outage Deployment
  Deploy at any time
  Production deployment needs to be transparent to the user
  Round-Robin deployment in cluster
    Session Sharing
    Programmatically remove node from load balancer cluster

Session Sharing
  Different approaches possible
    Application servers allow for automatic session replication
    Sessions can be stored outside of application server, i.e.
    Keep sessions on client
  We chose to implement client-side session sharing

Load Balancer Visibility
  Programmatically take cluster nodes out of Load Balancer
    Remove from cluster
    Deploy new software version
    Add server back to cluster
  Creative solution: health check file in Apache HTTPD

Advantages of Zero-Outage Deployment
  Deploy any time vs. deployment window 7:00am Friday morning
  Deployment becomes Business-As-Usual
    If deployment fails we can redo it an hour later
  Support for agile development
  Controlled environment: no more manual steps
  Deliver hot fixes within hours, not days or weeks

Problem: Different Software Versions in Production
  Round-robin deployment means multiple software versions are in the same cluster
    User lands on server with new front-end feature
    User submits page
    User lands on server with previous version
    Error
  Only happens during deployment
  We chose to ignore this

Reporting
  Monitoring of # of exceptions in logs
  Monitoring of down time of servers

Disaster Recovery
  Roll Back
    Deploy latest working version
    No analysis of problem necessary
    Almost immediately back in business
  Roll Forward
    Fix the problem and release again
    Analysis may take time
    Not feasible for a 24*7 application

Disaster Recovery: Example
  Database table was missing index
  UAT database had large tables truncated
  Long running queries blocked each other
  Whole cluster stopped working
  Roll-back restored operations
  Root-cause analysis

Continuous Deployment
  Takes Continuous Delivery one step further
  Promote code to Production as soon as it is ready
  Reduce batch size to one feature per release
  Beyond two week sprints: deliver asap

Disadvantages of Continuous Deployment
  Business may request more deployments
  Manual testing is still the most important test gate for regression; the more you deploy, the less thorough this step will be
  Continuous Delivery is not for free:
    Manual testing still takes time
    Regression testing time is often more than testing new features

Not Covered
  Automated Performance Testing
    Spot-check by developers using JMeter was done after major rewrites of the software by external party
  Production database passwords in version control
  Systems that are not easily automated

Achievements
  Rapid, reliable, repeatable, low risk deployments
  Extension of existing agile process
  Two week releases
  Hotfixes any time
  Happy developers
  Happy business users

Questions
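
Sketch: round-robin deployment with a health check file

The round-robin deployment described under Load Balancer Visibility (remove from cluster, deploy new software version, add server back to cluster) could be sketched roughly as follows. This is a minimal Python illustration, not the script used on the project. The file name healthcheck.html, the artifact name app.war, and the one-directory-per-node layout are assumptions made for this example; local directories stand in for cluster nodes so the sketch runs anywhere.

```python
import shutil
import tempfile
from pathlib import Path

HEALTH_FILE = "healthcheck.html"  # assumed name of the file the load balancer polls
ARTIFACT_NAME = "app.war"         # assumed name of the deployed build artifact


def in_rotation(node: Path) -> bool:
    """A node receives traffic only while its health check file exists."""
    return (node / HEALTH_FILE).exists()


def rolling_deploy(nodes, artifact: Path) -> int:
    """Deploy one node at a time; return the fewest nodes that stayed in rotation."""
    min_live = sum(in_rotation(n) for n in nodes)
    for node in nodes:
        # Remove from cluster: without the file the health check fails.
        (node / HEALTH_FILE).unlink(missing_ok=True)
        min_live = min(min_live, sum(in_rotation(n) for n in nodes))
        # Deploy new software version: copy the existing artifact, do not rebuild.
        shutil.copy2(artifact, node / ARTIFACT_NAME)
        # Add server back to cluster: restore the health check file.
        (node / HEALTH_FILE).write_text("OK")
    return min_live


def demo():
    """Simulate a three-node cluster running v1 and roll out v2."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        nodes = []
        for i in range(3):
            node = root / f"node{i}"
            node.mkdir()
            (node / ARTIFACT_NAME).write_text("v1")
            (node / HEALTH_FILE).write_text("OK")
            nodes.append(node)
        artifact = root / "build-v2.war"
        artifact.write_text("v2")
        min_live = rolling_deploy(nodes, artifact)
        versions = [(n / ARTIFACT_NAME).read_text() for n in nodes]
        live = [in_rotation(n) for n in nodes]
    return min_live, versions, live


if __name__ == "__main__":
    print(demo())
```

A real script would also wait for in-flight requests to drain before copying the artifact, and would restart and smoke-test the server before restoring the health check file. Between iterations the cluster serves a mix of old and new versions, which is the situation noted under Problem: Different Software Versions in Production.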