WebSphere Transaction Cluster Facility

Redguides for Business Leaders

Jason Keenaghan
Sastry Duri
Barry Baker
Paul Dantzig
Jonathan Collins
Don Kallberg

Build optimized and scalable transaction processing solutions
Understand the limits of the relational model for transaction processing
Discover the business and IT benefits of IBM WTCF

Executive overview

As with many companies, you likely have an IT environment that includes multiple, diverse technology platforms, applications, and assets. Furthermore, you are probably under constant pressure to rationalize and reduce this environment to a smaller set of target platforms and infrastructure that can enable cost reduction and reduce complexity. Although this is a sound approach that enables an IT executive to deliver bottom-line results, there are likely needs and trends in your industry or in your business that continue to justify your requirement for a diverse set of platforms and technologies. These forces are driven largely by the diversity of the workload characteristics and non-functional requirements. Larger IT trends, such as the growth in mobile computing, will likely drive portions of your existing systems to their limits. These workloads are what will push you beyond your desired reduced set of platform patterns. For the workloads for which you anticipate dramatic volume growth, a different approach is required if you are going to achieve IT efficiency. The focus of this approach should be on selecting platforms and technologies that are workload-specific and optimized. In this IBM® Redguide™ publication, we introduce an example of a workload-optimized offering that can provide a high performing and highly scalable solution for a targeted and demanding style of transaction processing. IBM WebSphere® Transaction Cluster Facility (WTCF) is a new offering that provides the foundation for you to deploy mission-critical online transaction processing (OLTP) against shared data.
This offering provides consistent performance, high transaction throughput, and scalability to grow incrementally and smoothly with your business. By using and deploying a solution that is built from the ground up on workload-specific and optimized offerings, you should be able to reduce your costs for that particular workload over time, when compared to addressing the workload with general purpose offerings. Depending on the workload and the volume growth trend, those general purpose offerings might not meet your medium- to longer-term needs. IBM WebSphere Transaction Cluster Facility was born out of decades of experience in supporting high-volume, high-value transaction processing against shared data by using IBM System z® (the IT industry standard for mission-critical OLTP), and now provides similar capabilities on IBM Power Systems™. In this IBM Redguide, we explore the unique features and capabilities of this offering and explain why, for shared data transaction processing, this solution is optimized, more efficient, and more scalable than general purpose approaches.

© Copyright IBM Corp. 2012. All rights reserved.

Business drivers

This section explains the business drivers, such as transaction volumes, workloads, and platforms, that you should consider when you determine the best solution for your transaction requirements.

Growth of data and transaction volumes

Data and data-intensive applications are growing every day. According to The Essential CIO report (IBM Institute for Business Value, CIE03073-USEN-01 2011, page 12), “we live in a world deeply infused with data. Vast quantities are being generated and captured as the world’s economic and societal systems become more instrumented and interconnected. And, those same systems are becoming ever more connected and complex.
All the while, the pace of change is unabated.” The growth in data volume is just one dimension of the challenge. Additionally, the complexity, speed of change, and the business risk associated with ensuring the accuracy and secure management of the data are also growing. The web ushered in a phase of dramatic increase in new applications and in the creation of, and access to, new information. Over time, patterns emerged around web applications, n-tier architectures, service orientation, and business process management that helped IT executives execute on their vision. Now, mobile, social computing, and “The Internet of Things” are driving the next phase of dramatic growth in application and data patterns. The following website provides more information:

http://asmarterplanet.com/blog/2010/03/the-internet-of-things.html

To simultaneously cope with and achieve value from this growth in data, companies are undertaking numerous initiatives around analytics and Big Data, while making use of technologies including cloud, virtualization, optimized systems, and new transaction and data management offerings. Before things settle into proven patterns for how best to deal with the significant growth in data volume and application capabilities, numerous technologies and approaches will come and go; but in the end, new patterns will almost certainly emerge.

For an IT executive, this is further complicated by the pressure to return greater value to your company through IT, and to do so on a shorter time horizon and with a constrained budget. This is driving the need to focus on several areas:
- Areas that are currently growing the fastest
- Areas that will grow in the future

You need to assess the value of the application and data, and select optimal platforms and offerings to be able to support those workloads today and into the future.
In some cases, doing so might go against established pattern guidelines and biases for technology selection, but again, it is important to stay aware of the idea that things are changing and will continue to change. This awareness requires the ability to recognize when current approaches will not support future business requirements. One critical area of focus, as it relates to the growing volumes of data, is transaction processing and assessing whether your existing approaches and solutions to transaction processing will continue to meet your needs going forward. In the end, the need for application infrastructure and data management capabilities that are flexible and workload-optimized is real, immediate, and growing. And, the argument that one application or data architecture will meet your needs going forward is losing ground as the growth of data and new applications progresses. Your application foundation needs to be designed to efficiently deliver applications and services to the business that are robust, scalable, and provide the right level of transactional integrity. Because your workload and data requirements are becoming more varied, one approach cannot fit all of your needs. In the following section, we describe several key transaction processing characteristics to consider as you select platforms for the future.

Workload unique characteristics

Transaction processing is a broad category of computing that most would consider to be well understood but not adaptable for innovation. But, given the dramatic changes occurring, this could not be further from the truth. In recent years, to deal with the growth driven by the web, many new approaches, patterns, and offerings have emerged. These alternative approaches have been developed to deal with growing transaction volumes and limitations with existing approaches.
As your business expands to support new modes of customer interaction and as you try to capture more and more information from more and more sources to better enable your business, it is natural to expect that you will require new and different approaches to transaction processing. As you look at your current and future requirements for transaction processing, it is important to realize that a transaction is simply an abstract construct and not a universal unit of measure that would allow you to compare various platforms and various workloads. Platforms can be compared if you can hold the definition of the transaction constant, but if you cannot, then comparing platforms by raw transaction numbers is misleading. For example, 10,000 transactions per second on a system that manages customer profiles is not likely to be the same as 10,000 transactions per second on a system that manages bank accounts, bank transfers, and ATMs. Not only do these two applications differ significantly in the work they need to perform, there is also an inherent difference in the value of each transaction and in your tolerance for any loss of transaction integrity. For example, a customer might merely be annoyed by the loss of a transaction that was meant to update their profile with a new address, but they will most likely have a different reaction to the loss of a transaction associated with a change to one of their bank accounts.

Consider the primary transaction characteristics when determining the right approach:
- The size of the data set required to process a single transaction
- The ability to partition the data set
- Integrity requirements
- The volume of queries and updates
- Availability requirements

Each of these characteristics, for a given workload, will narrow your options with regard to meeting your needs.
It is important to realize that these characteristics cannot be viewed in isolation because they are interrelated. For example, if your transactions and data are such that you can effectively partition the data set, route each transaction to the appropriate partition, and have the transaction serviced entirely by that partition, then many solutions are available to you, both centralized and distributed in nature. But, if you look at the expected volume of updates and your integrity requirements together, then that will certainly narrow your options. This IBM Redguide is meant to introduce you to the IBM WebSphere Transaction Cluster Facility (WTCF). WTCF is not intended to address all of your transaction processing and data management requirements. Rather, it is targeted at supporting workloads and data sets that cannot be partitioned such that you can effectively route and isolate a transaction to a particular partition, and where the integrity of your data is paramount (for example, data loss or inconsistent data cannot be tolerated). This is especially true where the volume is high, growing, and expected to continue to grow, and where high availability is a must. Non-partitioned, high volume workloads are challenging to support; IBM WebSphere Transaction Cluster Facility is designed to handle these workloads effectively.

Platform choice and flexibility

IBM WebSphere Transaction Cluster Facility provides an optimized approach to support high value, mission-critical transaction processing and data management against shared data. It gives you a means to support a challenging workload in such a way that you never have to question the view of the data that IBM WTCF manages and transacts against. IBM System z is the leader in supporting these types of demanding workload characteristics against shared data.
And now, with IBM WebSphere Transaction Cluster Facility, IBM is providing a new offering to support some of these workloads on distributed platforms. An important note is that, although IBM WTCF is targeted at similar high volume shared data transaction and data management workloads, there are many differences between it and what is provided on IBM System z. Understanding the detailed differences in both functional and non-functional capabilities of the various hardware platforms and software offerings is therefore critical.

Introducing WTCF and key features

For a quarter of a century, the relational database management system (RDBMS) has been the dominant model for database management. But, today, non-relational or “Not Only SQL” (NoSQL) databases are gaining mindshare as an alternative model for database management, and are typically focused on performance, scale, and specific use cases like providing ad-hoc query support. Just as transaction rates have grown beyond recognition over the last decade, the volumes of data that are being stored have also increased massively. As such, the need to augment the traditional n-tier architecture with new forms of transaction and data management application infrastructure is immediate. To answer this need, IBM introduced a new scale-up offering called WebSphere Transaction Cluster Facility (WTCF) in September of 2011. This product is a combination of application infrastructure and data management that supports demanding transactional workloads that require a clustered, singular shared database while extending the value of WebSphere Application Server, IBM DB2®, and the DB2 pureScale® feature for transaction and database management.
With WTCF, low-latency, scalable, and continuously available transaction applications that have the strictest transaction and data integrity demands can be better supported on distributed systems, namely IBM Power Systems. WTCF provides the application infrastructure to augment existing applications or create new applications that have rigorous transaction and data integrity requirements in a flexible, scalable, and efficient manner. Figure 1 depicts how WTCF fits into a general application architecture, positioned between the Business Logic and Data/Resource tiers.

[Figure 1 diagram: Tier 1 Presentation (clients), Tier 2 Business Logic (application servers), and Tier 3 Data/Resource (resource managers, for example, databases, and existing enterprise information systems), with a WTCF cluster of Power nodes running applications, WTCF, WAS, and DB2, together with DB2 pureScale and Tivoli Monitoring. The diagram carries the following callouts:
- WTCF applications are WAS applications (or C++) that are co-located on clustered servers where the data is managed.
- WTCF provides a unique data management API to enable the creation of non-relational network data model of heterogeneous data structures.
- WTCF is an offering targeted logically between the data/resource and the business logic tiers for a select workload.
- WTCF is targeted at applications that require shared data.
- WTCF avoids some of the common performance/scalability limiters in relational: no joins, no inserts, no deletes, few indices, denormalized data model, no O/R mapping, and no derived data/aggregates.]

Figure 1 WTCF in a general application architecture

Online transaction processing (OLTP) systems are often at the heart of an enterprise’s IT infrastructure, enabling those revenue-generating activities that are the key to business operations.
Enterprises demand that their OLTP systems offer the highest levels of reliability, availability, serviceability (RAS), and even scalability to meet current and future needs. OLTP systems that are available 24x7x365 are not merely desirable; they are an absolute requirement. After all, enterprises are betting their businesses on these systems. A failure, or even slight performance degradation, can result in millions of dollars in lost revenue. WebSphere Transaction Cluster Facility (WTCF) provides the core middleware components for creating modern, innovative, and agile applications for the high volume OLTP environment. The various products that make up WTCF offer industry leading RAS capabilities within their respective markets. Additionally, the components and features that are unique to WTCF draw on decades of IBM experience in high volume OLTP on System z and translate that experience to the world of distributed systems. WTCF addresses the primary concerns of many CIOs and CTOs who must select the correct platform for their OLTP systems:
- Performance
- Scalability
- Flexibility, agility, and reduced time to market through reliable tooling
- Data integrity and security
- Manageability

The following sections provide more details about the key features and functions that are available in WTCF.

Roles of the various products and components of WTCF

WTCF is a pre-packaged offering that consists of several IBM Software offerings. These products, along with some unique WTCF components, have been coupled together to provide an optimized platform for extreme transaction processing based on a shared, always consistent, view of mission-critical information. Figure 2 shows the products and components in a typical WTCF cluster.
[Figure 2 diagram: Configuration Assistant, generated DB classes for Java and for C++, Java applications, C++ applications, WTCF Java library, WTCF C++ library, WebSphere Application Server, persistence server, shared main storage, DB2, monitoring agent, and Tivoli Monitoring.]

Figure 2 Products and components that comprise a node in the WTCF cluster

Each node within the WTCF cluster runs on the IBM AIX® operating system for Power processors. Figure 2 illustrates the individual components that exist as part of a WTCF node. The WTCF product includes the following components:
- WebSphere Application Server V8 is an industry leading application server for Java Enterprise Edition (EE) programming. WebSphere Application Server V8 provides not only the core runtime application server environment, but also those application development, deployment, and system management capabilities that you would expect from an enterprise application server, including the following items:
  – IBM Rational® Application Developer (RAD) Standard Edition for WebSphere Application Server V8
  – IBM Tivoli® Composite Application Manager (ITCAM) 7.2 for WebSphere Application Server V8
  – IBM Assembly and Deploy Tools for WebSphere Administration
- DB2 Enterprise Server Edition 9.7.2 is a renowned relational database management system (RDBMS) with unmatched reliability, availability, and serviceability features. The WTCF database is built on top of DB2, using DB2’s superior performance to provide a high-performing data store, while exploiting its capabilities in a unique, non-relational manner.
- DB2 pureScale Feature for Enterprise Server Edition 9.8.3 offers superior performance and near-linear scalability for a centralized database in a clustered distributed system.
- Configuration Assistant is a WTCF-unique Eclipse-based application for designing, defining, and configuring your WTCF database.
Additionally, using WTCF’s Configuration Assistant, you can generate database-specific classes in both Java and C++ for use by your applications when interacting with the WTCF database.
- WTCF Java and C++ libraries present a unique, object-oriented, application programming interface (API) for both Java and C++ applications. Application programmers use a combination of these simple database constructs and base classes along with database-specific generated classes to assemble and create high-performing transaction processing applications. WTCF libraries give context and definition to the otherwise opaque data that is stored in the underlying DB2 database.
- Persistence server is the liaison between the WTCF libraries and the underlying DB2 database. The persistence server is responsible for managing the transactional state of the database and ensuring the efficient use and management of database resources.
- Shared main storage facilities are provided with WTCF to allow for the caching of frequently accessed, primarily read-only data objects. Each node in the cluster maintains its own cache manager that is responsible for coordinating updates and accesses to the cache by multiple application processes that are connected to the node.
- IBM Tivoli Monitoring Agent for WTCF provides insight into the health of the various WTCF components in real time. The agent communicates performance information to the IBM Tivoli Monitoring product suite and can be used in conjunction with the Tivoli Enterprise Portal to continually monitor the transaction processing environment. In addition to the WTCF agent, standard monitoring agents can be installed to monitor the AIX operating system and the DB2 database.

Massively scalable centralized databases with WTCF

WTCF exploits the clustering capabilities of DB2 pureScale to provide a massively scalable OLTP system on an always-consistent, centralized database.
WTCF is architected for up to 256 nodes clustered together on a single image database. Each node in the WTCF cluster operates independently with its own unique persistence server, node manager, and cache manager components. DB2 pureScale provides the facilities for data management, serialization, and coordination of data accesses across nodes in the cluster. Figure 3 depicts the component architecture of a multinode WTCF installation.

[Figure 3 diagram: multiple WTCF nodes in one cluster.]

Figure 3 WTCF component architecture in a multinode cluster

Database structures optimized for high volume transaction processing

The WTCF offering enables the creation of non-relational OLTP solutions specifically tailored to provide unmatched scalability and performance. The data model that can be implemented with WTCF is a hybrid between a hierarchical and a networked database. Linkages among various types of database objects are defined to provide information about interactions and relationships among the objects. Additionally, these linkages are used for defining the supported navigational paths through the database. WTCF is focused on providing an object-oriented view of transactional data. In so doing, there is a logical progression of the types of WTCF database objects. Figure 4 shows the relationship between these various WTCF database objects. Starting with the smallest object, the objects are as follows:
- A field is the smallest unit of data in the WTCF database, representing one unique piece of information. Every field has a data type, such as character, string, or integer.
- A record is stored as an object of user data within a folder. User data is a collection of fields that represents one logical unit of information.
- A folder is a collection of logically related records. For example, a folder that represents one customer may contain a name record, an address record, and a phone number record.
- A cabinet is a collection of logically equivalent folders. For example, the collection of all customer folders would represent a cabinet.
- A file room is a collection of logically related cabinets with hierarchical relationships.
- A database represents the logical collection of all WTCF data that is available to an application.

[Figure 4 diagram: containment hierarchy of Database, File Room, Cabinet, Folder, Record, and Field, where each object contains 1...N objects of the next type.]

Figure 4 Conceptual view of WTCF database objects

Folders within WTCF represent the minimal unit of information that an application can request from the database, and upon which a serialization lock can be obtained. After a given folder has been opened, the application then has access to all of the records that the folder contains. Although the entire folder is accessible to the application, the entire contents of the folder may not be read into main storage at one time; this is managed by WTCF transparently to the application. Records stored within a given folder may be of the same type or they may be of different types. The ability to group heterogeneous record types together in the same folder can have tremendous benefits from the standpoint of transactional efficiency. Applications are able to access and manipulate logically related pieces of information that have different data structures, while minimizing the amount of I/O activity on the overall system.

The WTCF database administrator (DBA) is responsible for defining the types of folders that can exist in the database. Folders of the same type are said to reside in the same cabinet. For any given cabinet, there is a defined set of record types that are allowed to exist in any of its folders. Furthermore, the records of a given folder may contain references to other folders in other cabinets. It is these references that are used to establish the hierarchy of various types of folders.
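The containment rules described above can be pictured with a small sketch. This is only an illustration in Python, not the WTCF API (WTCF applications use generated Java or C++ classes); every class and method name here is hypothetical. It shows a cabinet that holds folders, each folder holding heterogeneous records, with the cabinet rejecting record types that were not defined for it.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass
class Record:
    """One logical unit of information: a typed collection of fields."""
    record_type: str
    fields: Dict[str, object]


@dataclass
class Folder:
    """A collection of logically related records, possibly of different types."""
    key: str
    records: List[Record] = field(default_factory=list)


@dataclass
class Cabinet:
    """A collection of folders of the same type (hypothetical model)."""
    name: str
    allowed_record_types: FrozenSet[str]
    folders: Dict[str, Folder] = field(default_factory=dict)

    def add_record(self, folder_key: str, record: Record) -> Folder:
        # Only record types defined for this cabinet may be stored in its folders.
        if record.record_type not in self.allowed_record_types:
            raise ValueError(
                f"{record.record_type!r} is not allowed in cabinet {self.name!r}"
            )
        folder = self.folders.setdefault(folder_key, Folder(folder_key))
        folder.records.append(record)
        return folder


@dataclass
class FileRoom:
    """A collection of logically related cabinets."""
    name: str
    cabinets: Dict[str, Cabinet] = field(default_factory=dict)


@dataclass
class Database:
    """The logical collection of all data available to an application."""
    name: str
    file_rooms: Dict[str, FileRoom] = field(default_factory=dict)


# A customer folder that groups three different record types together.
customers = Cabinet("Customer", frozenset({"name", "address", "phone"}))
customers.add_record("cust-1", Record("name", {"first": "Ada", "last": "Lee"}))
customers.add_record("cust-1", Record("address", {"city": "Austin"}))
customers.add_record("cust-1", Record("phone", {"number": "555-0100"}))
```

Opening the folder `cust-1` gives access to all three records at once, which mirrors the point above about grouping heterogeneous records to limit I/O.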
Figure 5 illustrates an example of a cabinet hierarchy that could be defined by a DBA for a sample transportation reservation system. In this example, the database is separated into two file rooms: Passengers and Inventory. The Passengers file room contains a cabinet called Reservation that consists of one folder for each reservation that has been made. A Reservation folder may consist of multiple types of records, for example, a passenger information record, an itinerary record, a travel history record, and so on. References to Reservation folders are maintained from inside other cabinets, for example, Name, Number, Seat, and Wait. Similarly, references to Seat and Wait folders are maintained from inside the Transport cabinet, thus creating a multi-level navigation hierarchy.

[Figure 5 diagram: database "testdb" with the file rooms "Passengers" and "Inventory" and their cabinets.]

Figure 5 Sample WTCF database layout

In addition, WTCF provides the ability to host logically separate databases through a mechanism known as multitenancy. Multitenancy allows multiple instances of database structures to exist for each configured tenant in a single WTCF database. All tenants in the database share a common database schema, but will access different copies of data transparently.

Application agility to drive business success

Modern object-oriented and service-oriented application architectures work best when the underlying databases follow a similar paradigm. For this reason, WTCF presents and manages all information in the database as a set of objects. The database administrator defines the overall structure of the WTCF database by using the configuration assistant. The result is a database configuration file that contains file room metadata, along with programming artifacts that are used directly by the application programmer.
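The multi-level navigation hierarchy described for the sample reservation layout (Figure 5) can be sketched as a small directed graph of cabinet references. The Python below is an illustration only: the cabinet names come from the example, while the dictionary layout and the `navigation_paths` helper are hypothetical and assume the references contain no cycles.

```python
from typing import Dict, List

# Which cabinets hold references to folders in which other cabinets,
# as stated for the sample reservation system.
REFERENCES: Dict[str, List[str]] = {
    "Name": ["Reservation"],
    "Number": ["Reservation"],
    "Seat": ["Reservation"],
    "Wait": ["Reservation"],
    "Transport": ["Seat", "Wait"],
    "Reservation": [],
}


def navigation_paths(start: str, target: str,
                     refs: Dict[str, List[str]] = REFERENCES) -> List[List[str]]:
    """Return every chain of cabinet references leading from start to target.

    Assumes the reference graph is acyclic, as in the sample layout.
    """
    if start == target:
        return [[start]]
    paths = []
    for next_cabinet in refs.get(start, []):
        for tail in navigation_paths(next_cabinet, target, refs):
            paths.append([start] + tail)
    return paths


paths = navigation_paths("Transport", "Reservation")
# paths == [["Transport", "Seat", "Reservation"],
#           ["Transport", "Wait", "Reservation"]]
```

In this picture, an application that starts at a Transport folder can reach a Reservation folder either through a Seat folder or through a Wait folder.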
The configuration assistant is used to do the following tasks:
- Define runtime attributes of all cabinets in the file room. These attributes are used online to control WTCF library routine operations, such as the following operations:
  – Inserting records into the correct location in an organized folder.
  – Comparing fields during a search for a particular record.
- Generate class definitions, in both C++ and Java, that you can use in WTCF applications. These generated database-specific classes include the following items:
  – Folder locators: Used to locate specific folders in the database based on predefined search criteria.
  – Folder indexers: Used to programmatically establish hierarchical relationships among folders in the database.
  – Records: Type-specific collections of fields that represent an object in the database.

Using the configuration assistant to generate database-specific classes for the application has the following benefits:
- Provides a level of encapsulation so that applications can work with data records as objects without knowledge of how those records are stored or serialized in a particular folder.
- Provides a single point of control to ensure consistency in the application data definitions and the actual format of the data as it is stored in the database.
- Reduces the time-to-market when you create new types of application data or modify existing types of data.
- Provides APIs that are data-aware and used by the underlying WTCF library routines.

In addition to the persistent data objects that can be stored in the WTCF database, WTCF includes a main storage cache for sharing frequently accessed common data across multiple operating system processes. The cache manager is responsible for managing the cache and all entities (or items) in the cache. Each entity that is included in the cache is referred to as an item. Cache items are byte streams that are referenced by keys; cache keys are text strings that represent a symbolic name for the cache item.
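As a rough picture of the key and item contract just described (text keys, byte-stream items), the following Python sketch models a cache within a single process. It is not the WTCF cache manager, which shares items across operating system processes on a node; the `ItemCache` class and its methods are hypothetical names used only for illustration.

```python
import threading
from typing import Dict, Optional


class ItemCache:
    """Single-process stand-in for a cache of byte-stream items keyed by text."""

    def __init__(self) -> None:
        self._items: Dict[str, bytes] = {}
        self._lock = threading.Lock()  # serializes access from multiple threads

    def put(self, key: str, item: bytes) -> None:
        # Keys are symbolic text names; items are opaque byte streams.
        if not isinstance(key, str):
            raise TypeError("cache keys must be text strings")
        if not isinstance(item, (bytes, bytearray)):
            raise TypeError("cache items must be byte streams")
        with self._lock:
            self._items[key] = bytes(item)

    def get(self, key: str) -> Optional[bytes]:
        # Returns None when no item is stored under the key.
        with self._lock:
            return self._items.get(key)


cache = ItemCache()
cache.put("station/StnA", b"\x01\x01")
```

Every caller that uses the key `"station/StnA"` sees the same item, which matches the expectation that all processes identify an item by an identical key.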
The WTCF cache is intended for data that has the following characteristics:
- The same item is accessed by multiple processes.
- Each process is able to identify the same item by using an identical key. That is, you do not expect different processes to refer to the same item by using different keys.
- Updates to items are not frequent; that is, new items are not added, the values of items are not changed, and items are not deleted on a frequent basis.

Comparing WTCF to relational

WebSphere Transaction Cluster Facility (WTCF) itself is implemented on top of DB2, a relational database. It exploits DB2 strengths in unique ways and adds new functionality well suited to developing scalable applications with unique data processing characteristics. Reservation systems are one example of scalable, high volume transaction processing platforms that can benefit from the capabilities offered by WTCF. At a minimum, these systems typically manage two types of databases: passenger reservations and inventory. In any given reservation transaction, both of these databases are frequently accessed by the application. We examine the following considerations that make these workloads unique and untamable through relational means:

Data is not partitionable

Data partitioning strategies achieve scale by first partitioning data and then directing requests to the most appropriate partition. These strategies win big when they “guess right.” The cost of getting data from a separate partition far exceeds the cost of getting data from within the same partition, and as a result these strategies under-perform when they “guess wrong.” Reservation requests are not amenable to this technique because any agent should be able to book any passenger on any flight or a combination of flights. Therefore, partitioning data by agent, by passenger, or by flight increases the average cost of booking a reservation for all.
Denormalized data

These workloads contain data hubs, a collection of related but dissimilar data items held together by a logical entity. The passenger name record (PNR) is a good example of one such record. Because of high performance requirements, such records should be accessed in the most efficient manner. WTCF achieves this by storing all the related data items in a single folder. Data denormalization, in general, leads to data duplication and associated maintenance headaches. Data normalization, however, requires costly join operations. For data hubs, WTCF gives the ability to denormalize data without data duplication.

Data too big to fit in memory

If one has to pick entities with contrasting access patterns and relative sizes, those would be seat inventory and PNRs. In a typical reservation system, seat inventory is the smaller of the two. Inventory is small enough that it might fit in main memory in its entirety. In addition, inventory access patterns are cache friendly. PNRs, however, are several orders of magnitude larger, and therefore cannot fit entirely in main memory. Further, PNR access patterns are unpredictable and exhibit very poor cache hit ratios. In a relational setting, assembling a PNR not only requires join operations, but these operations also result in a relatively higher number of disk accesses.

Anti-cluster behavior

With the advent of massively scalable databases with shared storage, DB2 pureScale enables nearly linear scalability. Therefore, data partitioning may not be required to build scalable reservation applications. However, the normalization that an operational system requires leads to data being distributed across multiple tables. Because clusters share table data at the page level, collisions on rows stored in a given page will lead to excessive page exchanges between cluster members, resulting in delayed or slow service times.
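The page-level collision effect can be illustrated with a deliberately simplified model. The Python sketch below is not how DB2 pureScale tracks pages; it only assumes that a page must be exchanged whenever a member updates a row on a page that a different member updated last. The function name, the row-to-page mapping, and the sample workload are all hypothetical.

```python
from typing import Dict, Iterable, Tuple


def count_page_transfers(updates: Iterable[Tuple[str, int]],
                         rows_per_page: int) -> int:
    """Count page exchanges between cluster members in a toy model.

    updates is a sequence of (member, row_id) pairs in arrival order.
    A transfer is counted when the page holding the row was last
    updated by a different member.
    """
    last_updater: Dict[int, str] = {}
    transfers = 0
    for member, row_id in updates:
        page = row_id // rows_per_page  # rows are packed into pages in order
        previous = last_updater.get(page)
        if previous is not None and previous != member:
            transfers += 1
        last_updater[page] = member
    return transfers


# Two members alternate updates to four distinct rows.
workload = [("A", 0), ("B", 1), ("A", 2), ("B", 3)]
shared_page = count_page_transfers(workload, rows_per_page=4)    # all rows on one page
separate_pages = count_page_transfers(workload, rows_per_page=1)  # one row per page
```

In this toy workload no two members ever touch the same row, yet packing the rows onto one page produces three exchanges, while one row per page produces none.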

Simple reservation data model

To understand WTCF features better, let us take a reservation system and model its data both in terms of WTCF artifacts and as relational tables. This gives us an opportunity to compare and contrast the two approaches and to compute the cost of creating a reservation. Consider a simple reservation system for trains that consists of the following types of information:

Seat inventory
Train inventory contains information about the availability of seats for booking.

Passenger records
A passenger record contains details about individual passengers, contact information, train bookings, seat assignment, services, and history.

Seat maps and wait lists
A seat map provides the status of each seat for every travel segment of a given train journey. A wait list consists of an ordered list of passengers waiting for seats in a given train on a given day.

Logical data model

We need the following logical entities to model our reservation system:
• A train station identified by name and station ID.
• A train identified by a train number.
• A train journey starting from a station, passing through zero or more stations before terminating at its final destination. It consists of an ordered list of stations, arrival and departure times, arrival status, and departure status.
• A travel segment consisting of an ordered pair of stations, where the first station indicates where travel begins and the second station indicates where it ends.
• An inventory counter consisting of a travel segment and a pair of positive integers representing available and booked seats.
• A train inventory consisting of train ID, day of travel at the origin station, and a set of inventory counters.
• A wait list that is an ordered list of passengers at a station for a given train.
• A passenger reservation consisting of an ordered list of passenger names, their contact information, their trips, services, and reservation history.
• A trip consisting of a trip ID unique within a reservation folder, a train ID, a travel segment, and the date and time of travel.
• A seat map consisting of passenger details and the travel segments for which they occupy a given seat in a train journey.
• A service consisting of a service ID unique within a reservation folder, trip ID, and seat assignment.

Our example puts all of these logical entities in context. Consider a train SP0011 named South Polar Express. The SP0011 train, operated by FastTrack, starts at the station StnA in the rail network ABC Rail and terminates at the station StnC in the network belonging to Polar Rail. The train SP0011 stops at the station StnB, belonging to ABC Rail, before terminating at StnC. For stations StnA and StnB, the network code is 1 and the station codes are 1 and 2 respectively; for StnC, the network code is 2 and the station code is 1. The travel segments for the train SP0011 are (StnA, StnB), (StnB, StnC), and (StnA, StnC).

WTCF realization

In this section, we describe a physical realization of the logical model described in “Logical data model” using WTCF artifacts. Figure 6 shows the cabinet hierarchy for the train file room.
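
Returning briefly to the SP0011 example in the logical data model: its three travel segments are exactly the ordered station pairs in which the origin precedes the destination along the journey. The short Python sketch below derives them from the ordered stop list. The function name `travel_segments` and the use of plain station-name strings are assumptions made for illustration and are not part of any WTCF interface.

```python
from itertools import combinations
from typing import List, Tuple


def travel_segments(stations: List[str]) -> List[Tuple[str, str]]:
    """Return every (origin, destination) pair in which the origin
    comes before the destination in the ordered list of stops."""
    return list(combinations(stations, 2))


# Ordered stops of train SP0011 from the example.
sp0011_stops = ["StnA", "StnB", "StnC"]
segments = travel_segments(sp0011_stops)

for origin, destination in segments:
    print(origin, "->", destination)
```

A journey with n stops yields n(n-1)/2 travel segments, so the number of inventory counters held for a train on a given day grows quadratically with the number of stops.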

Train Journey
• Hash on day of the year
• One record for each train
• Train number
• Managed References
  – Seat Map
  – Wait List
  – Train Inventory Counters

Seat Map
• One folder per train per day
• One record per passenger
• One passenger name
• Reference to corresponding PNR folder for reservation
• Car, compartment, seat numbers

Wait List
• One folder per train per day
• One record per wait listed passenger
• Wait list number
• Reference to corresponding PNR folder

PNR
• One folder per reservation
• Contains different types of records
  – Reservation-specific records (example, trip info, services booked)
  – Passenger-specific records (example, passenger details, contact information)

Train Inventory Counters
• One folder per train per day
• One record for each travel segment

Figure 6 Cabinet hierarchy

The hierarchy consists of the following elements:
• Train Journey is a root-level cabinet with the following characteristics:
  – There are 366 root folders, one for each day of the year (based on a leap year).
  – Each record in a folder represents a single train journey on that day. Records are distributed by using the DirectOrdinalSpecification distribution method.
  – Records in this cabinet contain references to folders in the lower-level Seat Map, Wait List, Train Inventory Counters, and Train Time Table cabinets.
• Seat Map is a mid-level cabinet with the following characteristics:
  – Folders exist only if there is a reference from a Train Journey record on a specific date.
  – Each record in a folder represents a single seat on the referenced train journey.
• Wait List is a mid-level cabinet with the following characteristics:
  – Folders exist only if there is a reference from a Train Journey record on a specific date.
  – Each record in a folder represents a single wait listed passenger on the referenced train journey.
• PNR, which represents a passenger name record, is a leaf-level cabinet with the following characteristics:
  – There is one folder for each passenger number.
  – Folders can contain one unique number record and any number of name, address, facts, or flight history records.
• Train Inventory Counter is a leaf cabinet with the following characteristics:
  – Folders exist only if there is a reference from a Train Journey record on a specific date.
  – Each record in a folder represents inventory counts for a travel segment.

Figure 7 shows how records are laid out in a reservation folder.

(Figure 7 depicts the record types held in a reservation folder, with record IDs 10 through 50, and the relational table that each record type maps to: PNRROOT, PAXINFO, PAXTRAN, PAXSERVICE, and PAXJOURNEY. There is one reservation info record per folder and one or more passenger, service, and journey records per folder.)

Figure 7 Reservations folder

Relational realization

Figure 8 shows a realization of the logical model in a relational setting. The relational data model depicted here is kept fairly simple for illustration purposes. A production implementation would most likely require a far greater number of tables to store all of the required application information. Therefore, the issues that are presented in the following section would actually be exacerbated by an increase in the number of tables and rows that are required to construct the logical entities of the reservation system.

• PNRPAX: pnrid, paxNum
• PAXTRAN: transactionId, pnrid
• PNRROOT: pnrid
• PAXGROUPMEMBERS: pnrid, paxNum, passengerGroup
• PAXGROUP: pnrid, passengerGroup
• PAXJOURNEY: transactionId, ticketNum
• PAXSERVICE: transactionId, passengerGroup, serviceNum, ticketNum
• PAXSEAT: transactionId, serviceNum, passengerGroup, ticketNum, carNum, seatNum, compartmentNum

Figure 8 A mapping of reservation folder records onto a set of relational tables

Cost of creating a reservation

A primary benefit of WTCF becomes apparent when you notice how passenger data is handled in the WTCF and relational realizations. In the WTCF realization, a passenger’s reservation data is maintained as one unit; in the relational realization, it is spread across multiple tables. Figure 7 shows this for a PNR folder, which contains the passenger reservation details. We followed recommended practices of relational design, normalized the data in the PNR folder, and mapped it onto the set of tables shown in Figure 8.

Now, we estimate the cost of creating a reservation record for a family of two adults and a child in terms of the number of table inserts:
1. Create a record in the PNRROOT table.
2. Create a record in the PAXTRAN table.
3. For each passenger, create a record in the PNRPAX table.
4. For each passenger, create a record in the PAXGROUPMEMBERS table.
5. Create three records in the PAXGROUP table:
   – One record for the ticketing group consisting of two adults and one child
   – One record for the service group consisting of two adults
   – One record for the service group consisting of one child
6. Create a record in the PAXJOURNEY table for the ticket issued.
7. Create two records in the PAXSERVICE table:
   – One record to define a service class for the two adults
   – A second record for a separate service class for the child
8. As a last step, add three records to the PAXSEAT table, one for each member of the family.

The total number of inserts into the relational tables is 17.
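
The tally above can be reproduced with a few lines of Python. This is a minimal sketch of the arithmetic only: the function `relational_inserts` and its parameters are hypothetical names introduced for illustration, the table names are those shown in Figure 8 (the transaction table appears there as PAXTRAN), and the per-table counts are taken directly from the eight steps.

```python
from typing import Dict


def relational_inserts(passengers: int, groups: int,
                       service_classes: int, tickets: int = 1) -> Dict[str, int]:
    """Rows inserted per table when one reservation is created,
    following the eight steps listed in the text."""
    return {
        "PNRROOT": 1,                   # step 1: one root record
        "PAXTRAN": 1,                   # step 2: one transaction record
        "PNRPAX": passengers,           # step 3: one per passenger
        "PAXGROUPMEMBERS": passengers,  # step 4: one per passenger
        "PAXGROUP": groups,             # step 5: ticketing and service groups
        "PAXJOURNEY": tickets,          # step 6: one per ticket issued
        "PAXSERVICE": service_classes,  # step 7: one per service class
        "PAXSEAT": passengers,          # step 8: one per passenger
    }


# Family of two adults and one child: three groups, two service classes.
family = relational_inserts(passengers=3, groups=3, service_classes=2)
total = sum(family.values())
print(total)  # prints 17
```

Writing the count this way makes it clear which terms scale with the number of passengers (three of the eight tables) and which are fixed per reservation.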
The total of 17 inserts does not take into account the cost of allocating space for new records and indexing newly created records. In WTCF, all the updates go into a single folder and require one update to the table. There is no overhead of allocating space for a row or indexing it because all rows are pre-allocated within WTCF.

To access a PNR in WTCF, you read records from the PNR folder. In a relational system, you have to assemble a PNR before presenting it to the application. This assembly typically requires joins of rather large tables for a reservation system of any significant size. In a typical reservation system, the number of passenger reservations is several orders of magnitude higher than that of other data items, and the reservations usually do not fit in memory. Past observations of real systems show that passenger tables have cache hit ratios of less than 1%. This further exacerbates the cost of join operations involving passenger data.

It is possible to denormalize the data and reduce the number of tables, and thus the number of inserts for creating a reservation. However, that means data is duplicated, which requires additional overhead to maintain consistency among the duplicated data. The denormalization that WTCF facilitates, by contrast, does not lead to data duplication in these circumstances. There is, however, a price for this efficiency: in WTCF, data can be accessed only through predefined access paths, which limits the ability to write arbitrary queries against the data.

When performing an availability request, multiple classes of inventory can be queried within the same folder. When making a booking, the updates that are required for inventory counters all occur in the same folder with minimal database activity.

Folders can grow to be very large. For example, a seat map folder for a train can hold several hundred to over a thousand passengers.
An internal folder indexing mechanism is automatically created when it is determined that the number of I/Os can be reduced by using an index that allows for efficiency in the following areas:
• Ordered insertion of new records
• Searching of existing records

Summary

WebSphere Transaction Cluster Facility (WTCF) effectively runs high volume online transaction processing within a distributed environment. WTCF represents decades of experience supporting demanding high volume transaction processing workloads. The management of data within the WTCF middleware allows for high volume transaction processing, which is common in many industries that support mission-critical systems. The clustered environment of WTCF provides for unmatched scalability and availability through the robustness of the DB2 engine extended in DB2 pureScale.

Along with DB2, WTCF allows you to more easily develop applications using the WebSphere Application Server environment. Although the application programming interfaces are tailored for transaction processing, they are also built on the object-oriented concepts of C++ and Java. Developers can quickly generate the initial database and database-specific classes using the WTCF Toolkit. This allows functions to be implemented faster in today’s demanding online transaction processing environments. The WTCF data objects are simple-to-use data designs for application developers and database administrators.

Relational database management systems are not designed to effectively handle workloads where a consistent view of the data is critical. The reason is the inefficient manner in which a relational database indexes, retrieves, and stores such data, which proves costly because of the large number of join operations that are required to sustain high transaction rates while maintaining data integrity.
WTCF has the performance characteristics and provides the data integrity guarantees needed for unique and complex transaction processing workloads. WTCF provides the infrastructure necessary to augment existing applications or create new applications that have rigorous transaction and data integrity requirements in a flexible, scalable, efficient manner, establishing a cost-effective platform to help IT executives meet the changing needs of the business today and into the future.

Other resources for more information

The following resources are helpful for more information:
• IBM WebSphere Transaction Cluster Facility Information Center:
  http://publib.boulder.ibm.com/infocenter/wtcfhelp/current
• IBM DB2 pureScale Feature Information Center:
  http://publib.boulder.ibm.com/infocenter/db2luw/v9r8/index.jsp
• WebSphere Transaction Cluster Facility product announcement:
  http://www.ibm.com/common/ssi/ShowDoc.jsp?docURL=/common/ssi/rep_ca/0/897/ENUS211-300/index.html&lang=en
• WebSphere Transaction Cluster Facility website:
  http://www.ibm.com/software/webservers/wtcf/

The team who wrote this guide

This guide was produced by a team of specialists from around the world working at the International Technical Support Organization (ITSO).

Jason Keenaghan is a Senior Software Engineer at IBM in Poughkeepsie, New York. He earned his Bachelor of Science degree from Boston College with a major in economics and a minor in computer science. He later earned an MBA from Marist College. Jason is currently one of the head architects for two leading high volume transaction processing platforms from IBM: z/Transaction Processing Facility and WebSphere Transaction Cluster Facility. He has over 14 years of experience designing and developing various components of these platforms including network communications, memory management, system error and recovery, database management, XML processing, SOA enablement, and enterprise integration.

Sastry Duri is a Senior Software Engineer at the IBM T. J. Watson Research Center in Hawthorne, New York. Dr. Duri earned a Ph.D. degree in computer science from the University of Illinois at Chicago. His professional interests include distributed and high-performance computing systems, mobile commerce applications, RFID-based supply chains, and sensor and actuator applications. He has represented IBM in the industry standard group EPCglobal ALE Working Group, a subsidiary of the Uniform Code Council (UCC), and in the OpenLS workgroup.

Barry Baker is a Software Portfolio Product Manager in the IBM Software Group. He is responsible for assessing what markets IBM should compete in, determining gaps in the IBM portfolio, and partnering with customers, sales, marketing, services, development, research, and analysts to formulate and execute product roadmaps and strategies regarding high volume transaction processing, including IBM z/Transaction Processing Facility and WebSphere Transaction Cluster Facility. He has over 12 years of experience working on design, development, support, test, product management, and strategy of z/TPF. He regularly meets with C-level and line-of-business executives of the z/TPF customer set.

Paul Dantzig is an IBM Research Relationship Manager for Financial Markets, Manager of High Volume Web Serving, and a Senior Technical Staff Member at the IBM T. J. Watson Research Center, Hawthorne, New York. He has also been an Adjunct Professor at Pace University since 1982. Currently, as part of IBM Research, he is a Research Advocate for JPMC and Verizon; Consultant and Developer on JPMC Remediation of Global Funds Control System; IBM Research Project Manager, architect, and designer of the WebSphere Transaction Cluster Facility.
Previous customer projects include RBC Benchmark (2011), Morgan Stanley Consulting (2010-2011), JPMC Global Funds Control System Analysis (2011, 2008), IBM Research Technical Advisor for eBay (2002-2007), The Chicago Mercantile Exchange Joint Study on Distributed Parallel Matching Engine (2005), and architect and mentor for the Shanghai Stock Exchange Portal Site (2002-2003). In addition, he was an IBM Research Relationship Manager for Sport and Event websites (1995-2003); IBM Research Relationship Manager for the Atlanta, Nagano, and Sydney Olympic Games; Architect for 2000 Sydney Olympic website; Architect and Application Development Manager for 1998 Nagano Olympic website; and Architect, Development Manager, and Programmer for Olympic Internet Result System for 1996 Atlanta Olympic Web Site Web Publishing Systems. Paul has patented many caching and transaction methods and published widely. He has won numerous awards within IBM for his contributions.

Jonathan Collins is a Product Line Manager for the WTCF and ALCS products in the IBM Software Group. He is responsible for financial management, acquisitions, pricing, marketing, customer executive interfacing, new business opportunities, partner enablement, vendor relationships, strategy, and product direction. Jonathan has over 10 years of experience working in the TPF organization, including product development, customer support, and product management.

Don Kallberg is a Senior Software Engineer at IBM in Poughkeepsie, New York, doing project management within the WebSphere Transaction Cluster Facility development team. Don has 23 years of experience with IBM, and has earned his Master's degree in Computer Science from Syracuse University and his Bachelor of Science degree from Worcester Polytechnic Institute.
Don has had various responsibilities including project management, software development, test, and customer consulting on the z/Transaction Processing Facility (TPF), TPF Operations Server, IBM Passenger Rail Reservation System (IPRRS), and virtual machine (VM) products.

Thanks to the following contributor for his involvement with this project:
Stephen Smith, International Technical Support Organization, Raleigh Center

Now you can become a published author, too!

Here’s an opportunity to spotlight your skills, grow your career, and become a published author, all at the same time! Join an ITSO residency project and help write a book in your area of expertise, while honing your experience using leading-edge technologies. Your efforts will help to increase product acceptance and customer satisfaction, as you expand your network of technical contacts and relationships. Residencies run from two to six weeks in length, and you can participate either in person or as a remote resident working from your home base.

Learn more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html

Stay connected to IBM Redbooks

• Find us on Facebook:
  http://www.facebook.com/IBMRedbooks
• Follow us on Twitter:
  http://twitter.com/ibmredbooks
• Look for us on LinkedIn:
  http://www.linkedin.com/groups?home=&gid=2130806
• Explore new IBM Redbooks® publications, residencies, and workshops with the IBM Redbooks weekly newsletter:
  https://www.redbooks.ibm.com/Redbooks.nsf/subscribe?OpenForm
• Stay current on recent Redbooks publications with RSS Feeds:
  http://www.redbooks.ibm.com/rss.html

Notices

This information was developed for products and services offered in the U.S.A. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area.
Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service. IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A. The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you. This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice. Any references in this information to non-IBM websites are provided for convenience only and do not in any manner serve as an endorsement of those websites. The materials at those websites are not part of the materials for this IBM product and use of those websites is at your own risk. 
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.

This document, REDP-4817-00, was created or updated on January 30, 2012.

Trademarks

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both.
These and other IBM trademarked terms are marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml

The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:
AIX®, DB2®, IBM®, Power Systems™, pureScale®, Rational®, Redbooks®, Redguide™, Redbooks (logo)®, System z®, Tivoli®, WebSphere®

The following terms are trademarks of other companies:

Java, and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.

Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

Linux is a trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.