Enterprise Service Bus - Proof of Concept Findings Document

HUIT Administrative Technology Services - Enterprise Applications

Revision History

- M. Thomas, 7/2/2014: Initial draft. Circulated to team members.
- M. Thomas, 7/2/2014: Incorporated comments from Bill Brickman including new sections "Fit With Existing Similar Products" and "Interaction with TWS/Maestro and ITO".
- M. Thomas, 7/3/2014: Incorporated comments from Lisa Justiniano, including new executive summary.
- M. Thomas, 7/8/2014: Incorporated comments from John Shen. Added sections on architecture.
- M. Thomas, 7/25/2014: Added Rob Parrot's OpenStack offer.
- M. Thomas, 11/17/2014: Added decision to revert to ServiceMix when it became apparent Fabric8 version 2 was not appropriate for our needs.

Executive Summary

A team of 6 HUIT | ATS | Enterprise Applications staff evaluated open source Enterprise Service Bus (ESB) software packages over a period of 6 to 8 weeks. Two packages were evaluated in depth: Red Hat JBoss Fuse and Apache ServiceMix. We determined that ESB software offers integration functionality that complements our existing tools and promises faster development or more flexibility for certain problem types. We identified a half dozen use cases where ESB software would be a better solution than the tools we currently use. We chose to move ahead with the free ServiceMix ESB and its default messaging back end (Apache ActiveMQ) and developed a plan for an inexpensive deployment on existing hardware.

The remainder of this document provides details of this Proof of Concept (POC) initiative. This “findings” document describes the purpose of the project, the challenges the team faced, and the potential benefits the team believes can be derived from implementing an ESB. It also records the observations made along the way that led the team to its final recommendations.
1 Introduction

The ESB project is a proof of concept for using Enterprise Service Bus (ESB) products within HUIT | ATS | Enterprise Applications. While we will consider any benefit an ESB might provide us, our main reason for taking a close look at ESB products is that we believe they can help us with integrations between existing systems. We suspect using an ESB might make some of our existing integrations simpler, more transparent and / or more stable. We also believe that we might be able to provide additional services to our customers at little or no cost.

We are not currently hoping to use an ESB to help us decompose an application into a series of cooperating services. Said another way, this project is not a precursor to moving towards Service Oriented Architecture (SOA).

1.1 Project Goals

Here are the project's goals:

1. Get experience with ESBs in general and with 2 or 3 specific ESB products.
2. Determine if an ESB would benefit ATS.
3. Evaluate ESBs to identify one that best fits our needs.
4. Secondary goal: Get our feet wet on Amazon Web Services.

1.2 Project Team

The following team members contributed to the proof of concept and to this findings document:

- John Marks (GMAS Practice)
- John Shen (PeopleSoft Practice)
- Sreeni Gunnala (Finance & Procurement Practice)
- Bill Brickman (Data Warehouse Practice)
- Mike Thomas (Lead Architect)
- Lisa Justiniano (Sponsor)

2 Products Under Consideration

2.1 The Products

We considered several products for evaluation, including Mule ESB, JBoss Fuse, Apache ServiceMix and Informatica. We also looked at Fabric8, a community version of Fuse with cloud deployment features. All are widely used, mature products.
After an initial screening we decided to evaluate JBoss Fuse and Apache ServiceMix in depth, because they are open source products and they are produced / supported by organizations with which we are comfortable. Fuse and ServiceMix are very similar products; Fuse is an enhanced and commercially supported version of ServiceMix produced by Red Hat.

Mule ESB is also open source, but it is a commercial product and is not free. With limited time and resources we had to constrain the number of products we evaluated. The fact that ServiceMix and Fuse are so similar was very appealing; it means we could adopt either and not be tied to a single vendor. This fact, together with our time constraint, led to a decision to exclude Mule ESB from the in-depth evaluation.

Informatica was considered because we already use it as our primary ETL tool. Informatica is not open source; it is a commercial product and the ESB functionality would require an additional license. In the end it was excluded from the in-depth evaluation because we had issues getting an evaluation license for the ESB features in the short time available to us.

2.2 The Components

Both ServiceMix and Fuse are made up of a series of major sub-components, virtually all of which are under the stewardship of the Apache Software Foundation. They both rely heavily on the following Apache projects: Karaf, Felix, Aries, CXF, Camel, and ActiveMQ.

Here is a block diagram showing the major components with a synopsis of what each component does. This diagram is for ServiceMix; Fuse is very similar.
[Figure: Apache ServiceMix block diagram. Legend: process / service, library, call. Components shown:]

- Apache Karaf: console, ssh and other tools; folder-based hot deploy; security; centralized logging
- Apache Felix: OSGi container; Java + XML bundles; class loading; service resolution
- Apache Aries: OSGi Blueprint container; pure XML bundles; service resolution; mapping XML to services
- Apache CXF: web services library
- Apache Camel: service integration library
- Apache MINA: networking library
- Netty: non-blocking IO library
- Quartz: scheduler
- etc.
- ActiveMQ: messaging runtime

One important thing to note from this diagram is that ServiceMix and Fuse provide two OSGi containers in which our code can run. One is a standard OSGi container that can run bundles of Java code augmented with Spring XML metadata. The other is an OSGi Blueprint container that can run bundles of Blueprint XML that contain no Java code.

3 Product Evaluation

3.1 Installation

3.1.1 Server

We performed stand-alone ServiceMix and Fuse installations on a variety of Windows and Unix hosts. We have installed multiple versions of both products on Windows 7, Red Hat Enterprise Linux 6.4, CentOS 6.5 and Amazon Linux 3.10. In all installations and testing we used the ActiveMQ messaging broker that was built into the ESB products, but we understand that for a production deployment a stand-alone ActiveMQ cluster would be advised.

There was not enough time in the proof of concept window to build out a redundant system of cooperating ESB nodes, though we confirmed that both products support this kind of deployment. The two products diverge in this area. ServiceMix deployments usually use Apache Cellar for redundancy; Fuse deployments mostly use Fuse Fabric.
Note, though, that the community version of Fabric, Fabric8 (pronounced fabricate), is available for use with ServiceMix.

Since ServiceMix and Fuse are mostly stateless when used correctly (pushing everything stateful through a message broker), the real effort in installing a redundant load-balancing architecture would be spent getting a bullet-proof ActiveMQ back end configured.

Both ServiceMix and Fuse were universally praised by project developers for their ease of installation. A stand-alone installation using the default ports could be done in less than five minutes, assuming an appropriate JVM is available. The only customization necessary was to write a small script to set JAVA_HOME and PATH so the correct JVM is found, but even that wasn't necessary on a developer workstation.

One installation-related issue was encountered with Fuse. We could not connect to the Fuse 6.1 Karaf console using our SSH Tectia client (though 6.0 worked fine with all ssh clients and PuTTY worked fine with 6.1). We determined the issue was due to an SSH crypto algorithm negotiation failure (client algorithms: crypticore128@ssh.com, aes128-cbc, aes192-cbc, aes256-cbc, 3des-cbc, seed-cbc@ssh.com; server algorithms: aes128-ctr). We did not pursue a work-around.

3.1.2 Development Environment

Developers could use an existing generic Eclipse installation to build code bundles, or they could choose to install JBoss Developer Studio, a version of Eclipse with JBoss branding and several plugins to assist in Fuse development. Some project developers were very keen on using the JBoss IDE because of the promised integration with Fuse and because it offers a drag and drop interface for constructing integrations. However:

- None of those who tried the IDE were able to achieve the simple task of deploying a bundle to Fuse from within the IDE.
- Multiple developers reported that the promise of drag and drop was hollow.
For a complex integration you had to switch out of drag and drop mode into XML editing mode, and once you took this step you could not go back to drag and drop.

Bundles are built using Maven. In nearly all cases the Maven that is built into Eclipse was sufficient. We did find one installation task (deploying Oracle JDBC drivers as an OSGi bundle) which required command-line Maven, so developers may also need to install a stand-alone Maven.

Many integrations could be done without any Java coding at all; for these integrations a text editor would be the only tooling necessary beyond the basic ESB. Even when Java code is required for an integration, a developer is not forced to use a complex IDE like Eclipse; command-line Maven would be sufficient.

3.2 Coding

The team did the following general types of integrations:

- web services and web service clients
- database queries and database updates
- local file system reads and writes
- SFTP reads and writes
- message producers and consumers (ActiveMQ)
- email notifications

Of the dozen or so integrations we worked through while learning, or wrote from scratch ourselves, only two or three required any Java code. We anticipate, however, that more Java code would be required as the integrations get more complex.

3.2.1 Learning Curve

The Java coding necessary for some types of integrations is straightforward, and the build and deployment process is automated, so anyone with a bit of Java experience should be able to pick it up. However, for groups with no Java expertise, installing and learning the tools (Eclipse, Maven) might be a barrier.

The first integration of a particular type (e.g. providing a web service, consuming a web service, querying a database, etc.) is difficult. The XML is exacting and if you have errors it is difficult to isolate the problem.
Online example code spans multiple versions of both products, going back ten years, and it is often difficult to tell if an example found on the internet should run unchanged on a particular ESB version. Once you have done an integration of a particular type, however, doing additional integrations of that type is almost trivial.

On average the developers involved in the proof of concept felt that about one third (36.5%) of their colleagues could produce useful code after 1 week and more than half (58.8%) after 2 weeks.

3.2.2 Coding Example

So what does one of these integrations look like? Most integrations were just a few lines of XML, once you exclude all the boilerplate. As an example, here is code which monitors a folder for incoming files, and when one is detected it:

- parses the newly-arrived file to extract a query parameter
- does a database "select count(*) from person where ... " using the query parameter
- parses the DB result set and writes the count to another file

The code, excluding boilerplate, is:

    <!-- Data source -->
    <reference id="myDS" interface="javax.sql.DataSource"
               filter="(osgi.jndi.service.name=jdbc/oracleds)" />

    <!-- Get the surname to query from the XML input file. -->
    <from uri="file:data/db/input"/>
    <setBody><xpath resultType="java.lang.String">/query/lastName</xpath></setBody>

    <!-- Do the SQL query. -->
    <to uri="sql:select count(*) as cnt from person where last_name=#?dataSource=myDS"/>

    <!-- Result is a list of maps (one per record). Get count and convert to string. -->
    <setBody><simple>${body[0][CNT]}</simple></setBody>
    <setBody><simple>${bodyAs(String)}</simple></setBody>

    <!-- Write count to a file. -->
    <to uri="file:data/db/output"/>

Surprisingly little code considering how much functionality it describes.
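For context, fragments like the ones in the example above would normally sit inside a Blueprint file. The following is a minimal sketch of what that boilerplate might look like, assuming the standard OSGi Blueprint and Camel Blueprint namespaces. The route id countBySurname is a hypothetical name chosen for illustration; the steps themselves are copied from the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<blueprint xmlns="http://www.osgi.org/xmlns/blueprint/v1.0.0">

  <!-- Look up the shared data source, published elsewhere as an OSGi service. -->
  <reference id="myDS" interface="javax.sql.DataSource"
             filter="(osgi.jndi.service.name=jdbc/oracleds)"/>

  <camelContext xmlns="http://camel.apache.org/schema/blueprint">
    <!-- "countBySurname" is an illustrative route id, not from the original code. -->
    <route id="countBySurname">
      <!-- Poll the input folder and extract the surname from each file. -->
      <from uri="file:data/db/input"/>
      <setBody><xpath resultType="java.lang.String">/query/lastName</xpath></setBody>

      <!-- Run the count query, using the message body as the parameter. -->
      <to uri="sql:select count(*) as cnt from person where last_name=#?dataSource=myDS"/>

      <!-- Pull the count out of the first result row and convert it to a string. -->
      <setBody><simple>${body[0][CNT]}</simple></setBody>
      <setBody><simple>${bodyAs(String)}</simple></setBody>

      <!-- Write the count to the output folder. -->
      <to uri="file:data/db/output"/>
    </route>
  </camelContext>

</blueprint>
```

A file like this could probably be activated through the folder-based hot deploy that Karaf provides, provided the Camel features it uses (here the file and SQL components) are installed in the container.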
To be fair, the above code relies on a data source (which could be shared by many integrations) that looks like:

    <bean id="dataSource" class="oracle.jdbc.pool.OracleDataSource">
      <property name="URL" value="jdbc:oracle:thin:@ace.cadm.harvard.edu:8103:ACEDV"/>
      <property name="user" value="aceuser"/>
      <property name="password" value="my_secret_password"/>
    </bean>

    <service interface="javax.sql.DataSource" ref="dataSource" id="unused">
      <service-properties>
        <entry key="osgi.jndi.service.name" value="jdbc/oracleds"/>
      </service-properties>
    </service>

3.3 Stability

While doing some performance testing we saw a few troubling issues with ServiceMix (we later confirmed that the same issues occur in Fuse):

- We saw bundles not running even though they appeared in the Karaf Console as "Active".
- The message "Caused by: java.lang.OutOfMemoryError: unable to create new native thread" appeared in the log files.
- On a couple of occasions people were unable to log in to the remote host with ssh until we killed the ServiceMix Java process.

We tracked these issues down (using the command "ps -eLf | grep java | wc -l") to the fact that ServiceMix was starting an ever-increasing number of threads as time went on, and eventually it ran into the Linux thread limit. By starting bundles one at a time we tracked down the offending bundle and found it was different because it started a Camel route from within another Camel route. The code looked like:

    <route id="startTheRoute">
      <from uri="timer://runOnce?fixedRate=true&amp;delay=10s" />
      <to uri="controlbus:route?routeId=workRoute&amp;action=start" />
    </route>

    <route id="workRoute">
      <from ... />
      <to ... />
    </route>

This coding pattern appears to be a fork-bomb; it will continuously create new threads because the "controlbus:route?action=start" URI appears to start the entire XML bundle (not just the "workRoute" route), creating another "startTheRoute" route and recursing from there.

The important takeaway is that bundles are not isolated from one another; one bundle can bring the entire system down. It probably does not make sense, then, to share a ServiceMix instance (or even a VM which is hosting a ServiceMix instance) among disparate groups.

This was the only stability issue we encountered in 6 weeks of working with ServiceMix and Fuse.

3.4 Compatibility

ServiceMix was the more compatible product. We tested two major versions of ServiceMix (4.5 and 5.0) and two minor versions of Fuse (6.0 and 6.1). Moving from one version to another required minor code changes with both products. Anything which ran on the most recent version of Fuse would also run on the most recent version of ServiceMix, but the converse was not necessarily true. Code found on the Internet was more likely to run unchanged on ServiceMix than on Fuse.

Most code compatibility issues with Fuse could be tracked down to the fact that we could not connect to the bundled ActiveMQ from within Fuse. As a work-around we had to create our own connection to Red Hat's version of ActiveMQ, called A-MQ.

3.5 Documentation

The documentation for both products was only fair. These are niche products, so we were not expecting much, but both ServiceMix and Fuse managed to fall below our expectations.

Much of the problem stems from the fact that both ESB products are compositions of other Apache products and the documentation for those modules ranges widely in both quality and coverage. The most important modules, Apache Camel and Apache CXF, are thoroughly documented and are widely, if informally, supported. There are books available for both Camel and CXF.
Some of the other modules have very spotty documentation. For example, the Apache Aries Blueprint documentation states that it aims to support 80% of usage and refers the reader to the OSGi specs for the other 20%. Even then, much of its documentation is simply placeholder text waiting to be fleshed out.

Because the ESB products do not provide their own documentation for the Apache sub-modules, when you run into an issue you are left poring through Apache's documentation. This material is not consistently formatted or even organized the same way across products. The impact of this was felt most acutely when trying to debug an error, because often you can't tell which module is producing the error. As a result you get to experience the differing styles and organization of several products' documentation as you search for a solution.

3.6 Community

Because the documentation was spotty, by necessity we relied heavily on the community surrounding ServiceMix, Fuse, and all their constituent products.

We found that online demos and sample code were available and that about half the time they would run on both products unchanged. When integrations wouldn't run it was a frustrating experience discovering what was wrong. Very little of the online material identified the product and version it was targeted at, or even the date the posting was made. The most useful of this material came from third party technical blogs written by people who appear to work professionally in the ESB space, but many of these authors seem to have lost interest and drifted away over the years.

We found the level of participation in online communities like discussion groups, technical forums, knowledge bases, and technical Q&A sites like Stack Overflow (http://stackoverflow.com/) was fairly low.
You could usually find several postings of interest, but not the large body of knowledge you might find with a more mainstream product like a relational database.

There wasn't a lot of difference between the Fuse community and the ServiceMix community in this regard. In fact there was so much overlap between the two online communities that you were as likely to be referred to documentation for one as the other.

In the case of Fuse we submitted questions to Red Hat technical support people on a couple of occasions and the responses were prompt (within 24 hours), on point, and at an acceptable level of detail.

Red Hat's Knowledge Base was very well executed but fairly limited in depth, despite being two years old. A search for "camel", the most complex and widely used of the Apache modules embedded in Fuse, returned only 110 articles. All the articles were tagged with exactly what product and version they dealt with. Surprisingly, many of the articles were tagged "ServiceMix", and there were only 250 articles in the entire knowledge base that were tagged "Fuse" or "ServiceMix".

3.7 Security

We did not do any serious testing of the ServiceMix and Fuse security sub-systems. We confirmed both products rely on Apache Karaf for authentication services. We added users to each product and confirmed a correct login was necessary to access administrative resources but we did not make any attempt to secure any of the integrations we developed.

Our understanding at this point is that Fuse and ServiceMix offer very similar security features and assurances, so security should not be a factor when making a decision between them.

Security is an area which must be explored more thoroughly before we could take either product to a production environment.
3.8 Performance

We did individual ad-hoc performance testing by invoking our bundles multiple times, either by triggering them with a very short timer or by triggering them with many input files landing at the same time. We did two sessions with all developers doing ad-hoc performance testing concurrently, one session for each product.

We saw no issues other than the fork-bomb issue covered in section 3.3. We were running with the default Java heap sizes and we saw smooth and consistent consumption of work. We were running with hundreds of transactions per minute, but these were all very lightweight integrations.

Resource consumption was very modest. We were unable to overwhelm an AWS "m1.small" instance, which has a single vCPU (about half a physical CPU), 1.7 GB memory, and intentionally constrained network performance.

3.9 Error Handling

Each of the modules available from within ServiceMix and Fuse provides its own error handling mechanisms. The most important module for integrations is Apache Camel and it provides three mechanisms:

- errorHandler: this is the default mechanism. The default error handler stops processing of the failed message and logs the exception; a dead letter channel can be configured in its place to move failed messages to a dead letter queue.
- try ... catch ... finally: looks like Java exception handling
- onException: looks like shell script exception handling

We need to do further investigation of error handling and come up with some best practices. In particular we must replace the default error handler, because it terminates the route and logs an error, but does not remove the input, so the process repeats until the bundle is deactivated manually.

One of the team members, John Shen, ran into a situation where his web service client was throwing exceptions because of errors being reported by the remote system. He could catch the exception but found no way to extract the actual error message coming from the remote system. He tried all three of Camel's documented error handling constructs, but none made the remote error message available.
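As a starting point for those best practices, the sketch below shows one way the default behavior might be replaced with a context-scoped onException block. It is an illustration only and was not tested as part of the proof of concept: the redelivery settings, the route id and the data/db/failed folder are all assumptions.

```xml
<camelContext xmlns="http://camel.apache.org/schema/blueprint">

  <!-- Context-scoped: applies to every route in this Camel context. -->
  <!-- useOriginalMessage keeps the original input, not a half-processed body. -->
  <onException useOriginalMessage="true">
    <exception>java.lang.Exception</exception>
    <!-- Assumed policy: retry twice, one second apart, then give up. -->
    <redeliveryPolicy maximumRedeliveries="2" redeliveryDelay="1000"/>
    <!-- Mark the failure as handled so the consumer does not keep retrying the input. -->
    <handled><constant>true</constant></handled>
    <log message="Integration failed: ${exception.message}" loggingLevel="ERROR"/>
    <!-- Hypothetical folder where failed inputs are parked for inspection. -->
    <to uri="file:data/db/failed"/>
  </onException>

  <!-- "guardedRoute" is an illustrative route id. -->
  <route id="guardedRoute">
    <from uri="file:data/db/input"/>
    <to uri="file:data/db/output"/>
  </route>

</camelContext>
```

Because the exception is marked as handled, the file consumer should treat the input as processed rather than picking it up again on the next poll, which is the repeat behavior described above. This sketch does not address the separate problem of extracting the remote system's error message.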
A solution might be to rework the integration to provide our own exception handling Java code (which would necessitate moving from the Aries OSGi Blueprint container to the Felix OSGi Spring container). Another solution might be to use the CXF module rather than the camel-cxf module.

We will have to find a work-around for this. We don't control the exception hierarchy on many of the systems we would like to integrate with, so access to the remote error message is important.

3.10 Logging and Monitoring

Logging for both ServiceMix and Fuse is available from within the Karaf console and is written to a series of plain text files which are automatically rotated. Logging can be produced from within Java code using the java.util.logging, SLF4J, Log4j or Apache Commons Logging APIs. Camel provides its own log command which can be used from within any Camel route. Logging is configured using a standard log4j configuration file.

The logging output was concise and consistent. Regardless of which module generates a log message, the messages were written using the same format and (by default) to the same place. A "sifting" appender is provided which, when enabled, can split logging for each bundle out to its own log file.

Monitoring of ServiceMix and Fuse is done using any JMX monitoring tool, such as JConsole. An add-on, Jolokia, can be used to bridge JMX to HTTP for monitoring tools which do not support JMX.

The ActiveMQ instances, where most program state is stored, provide their own web-based console where the message queues can be examined and monitored. You can also use a JMX monitoring tool for this purpose. For automated monitoring there are ActiveMQ add-ons which provide monitoring capabilities using standard protocols like SNMP.
Many common data center monitoring solutions like Nagios provide their own ActiveMQ monitoring solutions using JMX.

Camel can be used to integrate with SNMP endpoints so, while we haven't investigated this, it might be possible to have individual Camel-based bundles report their own state to a remote monitoring system.

3.11 Interaction with TWS/Maestro and ITO

We did not do any testing with Harvard's scheduling system, Tivoli Workload Scheduler (a.k.a. Maestro), or with ITO, our system operator messaging system. We realize that we will need to develop mechanisms and best practices for interacting with both, but the same mechanisms would work for both ServiceMix and Fuse.

We know we have some baseline functionality. We can call shell scripts from within Camel, so we have the ability to send ITO messages. We know we can create and detect files in the file system, giving TWS/Maestro the ability to start and hold jobs given the appropriate file presence tests in our Camel routes.

Hopefully we can discover better ways to do these things. For example, I would be surprised if TWS/Maestro does not have a RESTful API that a Camel route could query to see whether it is currently on hold.

4 Benefits

This section details some of the benefits we believe an ESB would provide to HUIT and to our customers.

Standards-based integration mechanism: Fuse or ServiceMix would give us an OSGi- and Blueprint-based integration mechanism rather than the current ad-hoc mix of general purpose scripting languages (e.g. Perl, bash), proprietary languages and utilities (SQL*Plus, SQL*Loader and Informatica), Oracle database links, and custom Java code. Developers are more willing to put effort into something that is standards-based because they expect the skills to be of use elsewhere.
Many of our current mechanisms have been embraced by some ATS practices and completely ignored by others. Informatica, for example, is in heavy use by the Data Warehouse practice, in light use by the Alumni and Student Financials practices, and ignored by everyone else. An ESB would provide us with an alternative that is useful to everyone in the organization. An ESB would not replace all of the existing integration mechanisms; clearly we are not going to replace the data warehouse's Informatica data loads with ESB transactions.

Developer-controlled toolset: While ETL tools like Informatica can provide much of the same functionality an ESB would provide, they are large and complex pieces of software and are usually encumbered by licensing restrictions. An individual developer does not ordinarily install their own copy of these tools for testing and experimentation; instead they make arrangements with an administrator to get a developer space in the shared instance. They must work carefully so they don't impact other developers. All of these factors introduce friction. Developers could spin up a local copy of ServiceMix or Fuse in just a few minutes, and know that they have exclusive access and that they can't adversely affect others on their team.

More flexible integration options: To interact with web services, LDAP/AD, SNMP, TCP sockets, etc. we currently have to write Java code. Camel gives us the ability to interact with pretty much anything that listens on a TCP port without writing any code. Because integrations are so much easier to develop, we will be able to provide integrations our customers want that previously would have fallen below the must-have / nice-to-have threshold.

Faster development: Once we have a centralized ESB in place, deploying new integrations becomes much faster.
Very little code to develop, no additional infrastructure to set up, and in many cases no security changes (network ACLs, users/groups/roles) means lower development effort and therefore faster turn-around times.

Decoupled systems: We would get the ability to decouple systems by introducing queues of work into existing integrations. Many interactions which are currently synchronous could be transitioned to being asynchronous, breaking these dependencies. This would yield reduced down-time for our existing systems and simplified release management planning and communication.

Batch processing to transactional: An ESB would give us the ability to move existing integrations from periodic (mostly nightly) batches to individual transactions, thereby getting fresh data in front of users faster. Many of our transactions happen in batch mode because it was too difficult to move those transactions between systems in real time. With an ESB and queues, sending transactions to another system becomes much easier because the burden is taken off of the source system.

Data flow fan-out: We have many cases where data flows in a chain among systems, like:

    System A -> System B -> System C

Sometimes these flows are not in place to enforce business logic; instead they exist because of a fear of performance impact on the source system. A publish and subscribe messaging broker like ActiveMQ gives us fan-out ability in these cases. The source system can post messages to a publish / subscribe queue (in JMS terms, a topic) and all interested systems can subscribe to it, without impacting the performance of the source system.

    System A -> Publish / Subscribe Queue -> System B
                                          -> System C

Systems that are currently late in the data flow chain will get fresh data faster.
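A fan-out of this kind might be sketched in Camel as follows. This is a minimal illustration, assuming the activemq component is already configured in the container; the destination name systemA.updates, the route ids and the folder names are hypothetical. Note that in ActiveMQ it is a topic, rather than a queue, that delivers a copy of each message to every subscriber.

```xml
<camelContext xmlns="http://camel.apache.org/schema/blueprint">

  <!-- System A publishes each change once and is then finished with it. -->
  <route id="systemAPublisher">
    <from uri="file:data/systemA/outbox"/>
    <to uri="activemq:topic:systemA.updates"/>
  </route>

  <!-- System B and System C each subscribe independently to the same topic. -->
  <route id="systemBSubscriber">
    <from uri="activemq:topic:systemA.updates"/>
    <to uri="file:data/systemB/inbox"/>
  </route>

  <route id="systemCSubscriber">
    <from uri="activemq:topic:systemA.updates"/>
    <to uri="file:data/systemC/inbox"/>
  </route>

</camelContext>
```

One design point to keep in mind: a plain topic subscriber only receives messages published while it is connected. If a downstream system must not miss messages during an outage, ActiveMQ's durable subscriptions or virtual topics would be worth evaluating.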
Centralized security: By pushing many (or even most) of our integrations through a single ESB we get a centralized point where we can enforce security, produce logging and notifications, and generate operations and management reports.

Improved transparency: In the current integration landscape, determining whether a specific scheduled integration has run is straightforward (via the TWS/Maestro console) but seeing the individual transactions might be very difficult. We use a combination of flag files, email notifications, database records, log files, etc. to record these transactions, but there is no overarching order or structure so it can be difficult for people unfamiliar with a particular application to locate this information.

Similarly, existing un-scheduled integrations (for example new records appearing in a table that is DB-linked) are pretty much invisible to all but those developers closest to the integration. By putting these transactions through a queue, this information can be easily made available to additional interested parties.

5 Considerations for the Future

5.1 Product Selection

Product selection is really a two part question, because we can choose the ESB (like ServiceMix or Fuse) and a message broker (like ActiveMQ) independently.

5.1.1 ESB

Four of five developers involved in the evaluation preferred ServiceMix over Fuse, citing compatibility in every case. The fifth developer felt Fuse was better because of the promise of more sophisticated developer tools.

Two of five developers felt we should evaluate additional products. Mule ESB and Oracle Service Bus were mentioned by name.

One developer identified functionality he thought was lacking in the products we evaluated: "We may need to look at ESBs allowing more complex data mapping translation and data lineage tracking.
We also need to look at what ESBs a lot of our vendor products are looking to use, assuming they aren’t specific to that product."

Transitioning existing code to an ESB is going to be a long process. Code will move to an ESB with much less friction when it already has to move for other reasons (e.g. migrating to new hardware or implementing new functional requirements). We should plan on standing up an ESB and then waiting a long time before anyone is ready to move onto it. It would not be a surprise if we had a working ESB deployed for 6 - 12 months before anyone actually started to use it. A commercial package like Oracle Service Bus might be prohibitively costly in this situation.

If we were asked to choose a product today we would choose ServiceMix and be happy with the choice. Given more time to evaluate additional products I think it likely that we would still wind up choosing ServiceMix, because it is a pleasure to work with and because it has so much segment mind-share. I don't believe Fuse or any other commercial product offers enough at this point to justify its cost.

5.1.2 Message Broker

Throughout this proof-of-concept we used ActiveMQ as our message broker, but we could use any other JMS-compliant message broker, or indeed any messaging service supported by Apache Camel.

We considered using Amazon's Simple Queue Service (SQS) because it is supported by the aws-sqs Camel module and it gives us redundancy and fail-over without any additional effort. We rejected this idea, however, because:

- SQS does not support encryption at rest so we would have to encrypt every message.
- SQS does not provide a console where administrators can monitor message queues.
Instead you must use the Amazon CloudWatch service, which provides some metrics like counts of messages received, messages delivered and messages deleted, but does not provide access to the messages themselves.

We didn't seriously consider any other message queuing systems in this proof-of-concept because both ServiceMix and Fuse are so clearly designed to integrate with ActiveMQ. Post-proof-of-concept we should, however, take a closer look at some other products like the following:

- RabbitMQ: An open source message broker from SpringSource (now Pivotal). Note that RabbitMQ's native protocol is AMQP rather than JMS.
- Open MQ: An open source JMS broker from Oracle. This is the one that is integrated into the GlassFish container.
- Oracle AQ: A message queueing system built into the Oracle database since 10g. Apparently this offers a JMS-compliant API, so we should be able to make Camel talk to it.
- Oracle WebLogic: The full version of WebLogic provides a built-in message broker (which it uses for EJB message-driven beans).

5.2 Architecture Options

Since the ESB (e.g. ServiceMix or Fuse) and its underlying message broker (ActiveMQ) can be deployed separately we have a nearly unlimited set of options for how we might architect this infrastructure. With just three applications we might have the following configurations:

[Figure: four example configurations, labeled A through D, each showing three applications (App) connected through one or more ESB instances to one or more message brokers (MQ).]

A. Multiple message brokers, private to specific groups of applications.
B. Single message broker, each application has its own ESB.
C. Single ESB and single message broker.
D. Shared message broker, ESBs private to groups of applications.

How we ultimately decide to architect the ESB(s) and message broker(s) will depend on the applications we are trying to integrate, the data they contain, and the populations which use and maintain them.
5.3 Deployment Location

We have three decisions to make with regard to the deployment location for an ESB. First, we need to decide whether to deploy a centralized instance which can be used by multiple Practices. With that decision in hand, we need to decide where to deploy the ESB itself and where to deploy the message broker component.

5.3.1 A Centralized Instance

All developers saw benefit in having a centralized instance of an ESB, either at the ATS Enterprise Applications level, the ATS level, or even at the HUIT level.

Two of five developers felt they would require high availability for an ESB used by their Practice. The other three felt they would benefit from 100% up-time, but did not feel it was absolutely necessary. This desire for high availability and the cost of the associated redundant infrastructure would argue for a centralized instance.

There are some arguments against a centralized instance:

- From our load testing session it is apparent that it is possible to crash the instance by introducing a bundle that asks too much of the container (a denial of service like our fork-bomb incident).
- Bundles must take care to properly identify themselves. Things like the optional id attribute of the <camelContext> and <route> elements, and the <description> element in a Camel route, would need to be enforced so error messages can be traced back to bundles:

    <camelContext xmlns="http://camel.apache.org/schema/blueprint" id="xxx-context">
      <route id="xxx-route">
        <description>Grants.gov call service.</description>
        ...
      </route>
    </camelContext>

- Logs will require extra attention. Deploying many bundles on an instance would mean a lot more logging to wade through when attempting to resolve issues. If the "sifting" logging appender is enabled, logging will be split out by bundle, so there will be many, many log files and the logging from a particular business process might be split out over several files.

5.3.2 Deploying the ESB

An ESB, by its nature, needs to connect to many existing systems.
The fact that most of ATS's data repositories are located on the un-routable Harvard 10.x.x.x subnet makes it difficult for a system on the outside to connect to these systems.

In the proof-of-concept, putting ServiceMix and Fuse on an Amazon EC2 instance created a difficult-to-bridge gap between the ESBs and the Harvard hosts with the data. The easiest way to get access to hosts at 60 Oxford Street (most with un-routable 10.x.x.x IP addresses) was to put the EC2 instance on the Cisco AnyConnect VPN, thereby giving it a 10.x.x.x IP address. This had the beneficial side effect of securing the pipe between these systems, but it was also an interactive process: after a reboot, someone had to log in to the EC2 instance and connect the VPN, using their own HUID and PIN for credentials.

Post-proof-of-concept, for an AWS deployment we would need a permanent VPN in place to get access to Harvard's 10.x.x.x network. This could be an IPsec VPN using Harvard's existing Fortinet FortiGate infrastructure, or we could build our own VPN using something like OpenVPN, but for the latter we would need a termination host inside Harvard's network.

In our performance testing the network and the remote systems we were calling appeared to be the performance bottleneck. We never saw any memory, disk, or CPU overcommit even though, as previously mentioned, we were using a very constrained "m1.small" Amazon EC2 instance.

An alternative to Amazon would be to deploy an ESB within the Harvard data center, either on an existing host, on a VM, or on the new HUIT OpenStack cloud. One potential location that has been identified is to co-locate this new service with the existing FTP service on the ASFTP server (formerly known as "drone").
Another potential location comes from the HUIT Cloud / DevOps Working Group; Robert Parrot has shown strong interest in this project and has offered centrally-funded OpenStack VMs at no cost (no additional details at this time, but we plan to research this further).

5.3.3 Deploying a Message Broker

Assuming we start with ActiveMQ as our message broker, the host or hosts we use for an ActiveMQ instance will need to store their data somewhere. For a redundant solution which provides fail-over, ActiveMQ supports either a shared-storage configuration or a shared-database configuration. Of these, shared-database seems to be the most common approach.

Undelivered messages are persisted to the shared storage or shared database so that ActiveMQ can survive an outage. Since this data is at rest we must be sure it is encrypted. I suspect Oracle RAC with the encryption option is probably our best alternative. With data at rest, we will also have to ensure the database is located on the secure 10.x.x.x subnet.

5.4 Fit With Existing Similar Products

Those of us familiar with the capabilities of Informatica are seeing a boundary between ServiceMix / Fuse and Informatica. We're not 100% sure where the boundary lies yet, but we feel the two are complementary and not duplicative.

Informatica has many capabilities which would take a lot of effort to replicate in ServiceMix or Fuse, for example the data mapping lineage and logging that Bill Brickman mentioned in section 5.1.1 of this document. Writing data transformations in Java would obscure the logic, and would often be a step backward in error monitoring and logging.

Simple movement of data: we love Camel. Making files, re-formatting, messaging: Camel surely wins. But populating a chain of related DB objects with data, applying calculations and joins? Informatica wins.
Bill even feels that taking some of the load off of Informatica and moving it to ServiceMix or Fuse would bring more developers to Informatica because we would be using it only where it shines: "I think if we could get Informatica to work within databases, and put ServiceMix as the delivery point, outside-to-database, and database-to-outside, for files, web service consumption and production, for messaging, I think folks might end up liking Informatica more, when it isn’t pushed past what it’s best at."

5.5 Potential Use Cases

This section identifies some potential use cases that the developers involved in the proof-of-concept felt were good candidates for an ESB.

5.5.1 Chart of Accounts Validator

An ESB would be a really easy way to make the existing CoA Validator servlet available as a standards-based service. We could provide CoA Validator services over both SOAP and REST, with XML and JSON payloads, with minimal development effort.

5.5.2 ACE Data Mart

One really powerful use case for an ESB would be to provide a web-service-based data mart. A Camel route could be used to retrieve data from an Oracle reporting database and make the data available as a set of REST services.

This might be the best solution for providing third party data feeds for Alumni data, rather than permitting third parties to directly access the REST APIs currently being built into the ACE application. This approach would have the following advantages:

- No need to pull additional data into ACE just because a third party wants to consume it. All data in the reporting database (including all Advance data) would be available.
- ACE remains nimble: we can change the application and the REST API without worrying about affecting third parties (release downtime, API changes).
- These data feeds would have no performance impact on the transactional systems like ACE and Advance because they are pulling from a reporting database.

5.5.3 FTP Server

The administrative systems FTP server (ASFTP) manages many flat file integrations among internal and external systems and is shared by several ATS practices. The PeopleSoft Practice, for example, has a number of vendor input / output file transfers which are handled by scripts on ASFTP.

ServiceMix and Fuse are really good at the following functions:

- monitoring folders for incoming files
- moving files after processing
- delivering files via FTP and SFTP
- notifying interested parties via email

Similar functions are implemented over and over again on the ASFTP server using various scripting languages, predominantly bash and Perl. Much of this code could be eliminated by using an ESB, but we would need to work out best practices for encryption:

- Encryption and decryption must be used on all flat files.
- Encryption and decryption should be done using CyberArk’s APIs for passwords and keys.

5.5.4 IDM Import Job

The IDM (Identity Management) Import job is a command-line Java process, kicked off by TWS/Maestro, which pulls data from the IDM database and writes it to an ID card production database. It is loaded with technical debt because it was written by a new-at-the-time GMAS team member who has subsequently departed. It uses tools and libraries that are not in use elsewhere, and as a result it is difficult to port when the environment in which it runs changes (new Java versions, new library versions, etc.).

5.5.5 Harvard Data Warehouse Data Extracts

The HDW currently does a lot of data extraction, packaging and delivery which could be implemented in an ESB:

- HDW has many file transfer hand-shaking and packaging problems that have been solved with complex Unix scripts using command line SCP / SFTP and various locking strategies.
Any shell script that currently uses .LCK files to synchronize itself is probably a good candidate for a move to an ESB.
- Many of the smaller, simpler database extracts currently implemented as Informatica wrapped with complex Unix scripts could be moved to an ESB. These generally produce flat files and transfer them to remote locations with SCP / SFTP.
- The HDW has customers with applications wanting database data access via web services. Currently this need is met with direct DB logins, a combination of database views to restrict the data they can see, and customer-written SQL to filter to the records of interest.

6 Recommendation

We definitely believe an ESB is worth pursuing. We think the products we evaluated are powerful tools that show potential for providing great benefit to ATS and our customers.

6.1 Products

At this point some real-world experience with a production quality ESB installation would be of more value than further research into other products we might adopt. We are at the stage where additional research won't lead us any closer to the perfect solution; until we actually try an ESB in production we won't fully know what features we are looking for.

I believe we should choose ServiceMix with an ActiveMQ back end for a traditional deployment, or Fabric8 plus ActiveMQ for a cloud deployment. We know this isn't the most feature-rich solution but it is lightweight, inexpensive, and might well meet all our needs.

We are not making a life-long commitment; I fully expect to re-evaluate this decision 18-24 months down the road. This discovery process will give us a much better idea of the weaknesses in ServiceMix/Fabric8 and ActiveMQ. This knowledge would be of great benefit when evaluating additional products like the various alternatives to ActiveMQ enumerated in section 5.1.2 of this document.
Late breaking news: After the team committed to using Fabric8, the new Fabric8 version 2.0 was released on November 4, 2014. Fabric8 version 2 is built on top of Kubernetes (a container orchestration framework from Google) and Docker. Docker is an application containerization product which provides lightweight virtual environments in which applications can run; much lighter weight than traditional VMs. Docker is based on Linux kernel namespaces, cgroups and the aufs "union" filesystem. This presents a few issues for us:

- Fabric8 version 2 does not run under Windows or Mac, so developers could not have their own sandbox environment on their laptop without running a VM hypervisor and a Linux VM.
- Fabric8 version 2 is much more complex than version 1. It now supports multiple Java code containers; Karaf is just one of several, including Tomcat and TomEE. Just building the Hello World application for Camel took almost an hour, mostly spent fetching packages from the Maven repositories.
- There appear to be only two choices for deployment at this time: Jube (a pure Java implementation of Kubernetes) or Red Hat's OpenShift v3 PaaS product. Jube appears to be recommended for development only.

The team has decided, as a result, to go back to ServiceMix.

6.2 First Integrations

So we have chosen a product. Now we have to choose some guinea pig integrations.

While much of the power of ServiceMix can be tapped without any Java coding, certain functions like providing web services or doing complex database retrievals do require some Java coding. We think, therefore, that uptake of an ESB in the ATS Practices will depend on previous Java experience, and that we should start with an ATS Practice which already has Java experience.

The real unknown at this point is how these products behave in a production environment. How stable are they?
How do they behave when there are failures in the hardware and systems around them? How easy is it to determine cause when there is a problem? What kind of resources do they consume, and is resource consumption linear as usage goes up?

Starting small, by replicating a single existing service, is probably the best approach to answering these questions. For example, we could implement a REST/XML Chart of Accounts validator and put it in production beside the existing servlet-based validator service. We could then port internal clients across to the new service before we make it available publicly. This would give us a chance to get answers about how these products behave in a production environment without too much risk.

Once that first integration is done we would extend to all the practices within ATS | Enterprise Applications. In fact, Lisa Justiniano intends to make deployment of an ESB-based integration a FY15 goal for each of her Practice Managers.

6.3 Architecture

We would like to start with the simplest possible architecture: a single ServiceMix instance and a single ActiveMQ instance. This will be lowest cost and will provide the biggest benefit in terms of:

- shared security, logging and auditing
- transparency (more instances split the information into more places)
- reduced deployment time (infrastructure and ACLs are already in place)

At some point in the future we may need to move to a more complex architecture for performance reasons or to isolate populations (of developers, of systems, or even of users) from one another.

6.4 Where to Host the Infrastructure

If we can get a ServiceMix / ActiveMQ instance up with zero new monthly cost, that would be ideal. Because of the additional cost and the difficulty of securing the network connection we believe that Amazon is not the appropriate place to host an ESB for ATS at this time.
It might be better to deploy an ESB within the Harvard data center, piggybacking onto existing hardware, but we must keep in mind that doing so might affect the performance of the existing systems.

Initially we felt we should start by deploying on the ASFTP (formerly known as "drone") servers. The ASFTP server is not heavily loaded now, so the additional load should not impact existing services as long as our code bundles are well-behaved. The only additional expense would be some additional disk space to handle the installs and logging.

However, a better alternative has presented itself. Rob Parrot at HUIT Innovation & Architecture has kindly offered a set of OpenStack VMs in the 60 Oxford Street data center for an initial deployment. Of course this would mean a change to Fabric8 so we can take advantage of its cloud deployment features, but it would align nicely with HUIT's goal of hosting 100% of new applications on the cloud. We would start in the data center so we have access to the databases and middleware instances the ESB would integrate with, and when the network issues are worked out we could move to an external OpenStack provider.

We could start without redundant hardware; our first integration is basically stateless, so we don't have to worry about redundant ActiveMQ instances. Over time, as need presents itself, we could move our single ActiveMQ instance to a redundant ActiveMQ using database-backed persistence, or to the JMS services provided directly by the Oracle database product.
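To make the later move to a redundant ActiveMQ with database-backed persistence more concrete, a broker configuration fragment might look roughly like the one below. It is a sketch only, not a tested configuration: the broker name, database host, SID and credentials are placeholders, the commons-dbcp data source class is an assumption about what is on the broker's classpath, and the Oracle JDBC driver would have to be installed separately.

```xml
<!-- Sketch of an activemq.xml fragment. Every name, host and credential
     below is a placeholder, not a value from the proof-of-concept. -->
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="
         http://www.springframework.org/schema/beans
         http://www.springframework.org/schema/beans/spring-beans.xsd
         http://activemq.apache.org/schema/core
         http://activemq.apache.org/schema/core/activemq-core.xsd">

  <broker xmlns="http://activemq.apache.org/schema/core" brokerName="esb-broker">
    <!-- Persist undelivered messages to a shared database instead of local files. -->
    <persistenceAdapter>
      <jdbcPersistenceAdapter dataSource="#oracle-ds"/>
    </persistenceAdapter>
    <transportConnectors>
      <transportConnector name="openwire" uri="tcp://0.0.0.0:61616"/>
    </transportConnectors>
  </broker>

  <!-- Shared database connection; assumed to live on the secure 10.x.x.x subnet. -->
  <bean id="oracle-ds" class="org.apache.commons.dbcp.BasicDataSource" destroy-method="close">
    <property name="driverClassName" value="oracle.jdbc.driver.OracleDriver"/>
    <property name="url" value="jdbc:oracle:thin:@db-host.example:1521:AMQDB"/>
    <property name="username" value="activemq"/>
    <property name="password" value="change-me"/>
  </bean>
</beans>
```

With two brokers sharing this configuration and the same database, whichever broker obtains the database lock acts as the master and the other waits as a standby. Clients would then list both brokers in a failover URI, for example failover:(tcp://broker-a:61616,tcp://broker-b:61616), where the host names are again hypothetical.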