Front cover WebSphere Transaction Cluster Facility Redguides

Front cover
WebSphere Transaction Cluster Facility
Redguides
for Business Leaders
Jason Keenaghan
Sastry Duri
Barry Baker
Paul Dantzig
Jonathan Collins
Don Kallberg
Build optimized and scalable transaction
processing solutions
Understand the limits of the relational
model for transaction processing
Discover the business and IT benefits of
IBM WTCF
Executive overview
As with many companies, you likely have an IT environment that includes multiple, diverse
technology platforms, applications, and assets. Furthermore, you are probably under
constant pressure to rationalize and reduce this environment to a smaller set of target
platforms and infrastructure that can enable cost reduction and reduce complexity. Although
this is a sound approach that enables an IT executive to deliver bottom-line results, there are
likely needs and trends in your industry or in your business that continue to justify your
requirement for a diverse set of platforms and technologies. These forces are driven largely
by the diversity of the workload characteristics and non-functional requirements.
Larger IT trends, such as the growth in mobile computing, will likely drive portions of your
existing systems to their limits. These workloads are what will push you beyond your desired
reduced set of platform patterns. For workloads where you anticipate dramatic volume
growth, a different approach is required if you are going to achieve IT efficiency. The focus of
this approach should be on selecting platforms and technologies that are workload-specific
and optimized.
In this IBM® Redguide™ publication, we introduce an example of a workload-optimized
offering that can provide a high performing and highly scalable solution for a targeted and
demanding style of transaction processing. IBM WebSphere® Transaction Cluster Facility
(WTCF) is a new offering that provides the foundation for you to deploy mission-critical online
transaction processing (OLTP) against shared data. This offering provides consistent
performance, high transaction throughput, and scalability to grow incrementally and smoothly
with your business. By using and deploying a solution that is built from the ground up on
workload-specific and optimized offerings, you should be able to reduce your costs for that
particular workload over time, when compared to addressing the workload with general
purpose offerings. Depending on the workload and the volume growth trend, those general
purpose offerings might not meet your medium- to longer-term needs.
IBM WebSphere Transaction Cluster Facility was born out of decades of experience in
supporting high-volume, high-value transaction processing against shared data by using IBM
System z® (the IT industry standard for mission-critical OLTP), and provides similar
capabilities now on IBM Power Systems™. In this IBM Redguide, we explore the unique
features and capabilities of this offering and describe the underpinnings regarding why, for
shared data transaction processing, this solution is in fact optimized, more efficient, and more
scalable than other general purpose approaches.
© Copyright IBM Corp. 2012. All rights reserved.
Business drivers
This section explains the business drivers, such as transaction volumes, workloads, and
platforms, that you should consider when you determine the best solution for your transaction
requirements.
Growth of data and transaction volumes
Data and data-intensive applications are growing every day. According to The
Essential CIO report (IBM Institute for Business Value, CIE03073-USEN-01 2011, page 12),
“we live in a world deeply infused with data. Vast quantities are being generated and captured
as the world’s economic and societal systems become more instrumented and
interconnected. And, those same systems are becoming ever more connected and complex.
All the while, the pace of change is unabated.”
The growth in data volume is just one dimension of the challenge. Additionally, the complexity,
speed of change, and business risk associated with ensuring the accuracy and secure
management of the data are also growing. The web ushered in a phase of dramatic increase in
new applications and in the creation of, and access to, new information.
Over time, patterns emerged around web applications, n-tier architectures, service
orientation, and business process management that helped IT executives execute on their
vision. Now, mobile, social computing, and “The Internet of Things” are driving the next phase
of dramatic growth in application and data patterns. The following website provides more
information:
http://asmarterplanet.com/blog/2010/03/the-internet-of-things.html
To simultaneously cope with and achieve value from this growth in data, companies are
undertaking numerous initiatives around analytics and Big Data, while making use of
technologies including cloud, virtualization, optimized systems, and new transaction and data
management offerings.
Before things settle into proven patterns for how best to deal with the significant growth in
data volume and application capabilities, numerous technologies and approaches will come
and go; but in the end, new patterns will almost certainly emerge.
For you as an IT executive, this is further complicated by the pressure to
return greater value to your company through IT, and to do so on a shorter time horizon and
with a constrained budget. This is driving the need to focus on several areas:
򐂰 Areas that are currently growing the fastest.
򐂰 Areas that will grow in the future. You need to assess the value of the application and data,
and select optimal platforms and offerings to be able to support those workloads today
and into the future.
In some cases, doing so might go against established pattern guidelines and biases for
technology selection, but it is important to remember that things are
changing and will continue to change. This awareness requires the ability to recognize when
current approaches will not support future business requirements. One critical area of focus,
as it relates to the growing volumes of data, is transaction processing and assessing whether
your existing approaches and solutions to transaction processing will continue to meet your
needs going forward.
In the end, the need for application infrastructure and data management capabilities that are
flexible and workload-optimized is real, immediate, and growing. And, the argument that one
application or data architecture will meet your needs going forward is losing ground as the
growth of data and new applications progresses. Your application foundation needs to be
designed to efficiently deliver applications and services to the business that are robust,
scalable, and provide the right level of transactional integrity. Because your workload and data
requirements are becoming more varied, one approach cannot fit all of your needs.
In the following section, we describe several key transaction processing characteristics to
consider as you select platforms for the future.
Workload unique characteristics
Transaction processing is a broad category of computing that most would consider to be well
understood and to offer little room for innovation. But, given the dramatic changes occurring, this
could not be further from the truth. In recent years, to deal with the growth driven by the web,
many new approaches, patterns, and offerings have emerged. These alternative approaches
have been developed to deal with growing transaction volumes and limitations with existing
approaches. As your business expands to support new modes of customer interaction and as
you try to capture more and more information from more and more sources to better enable
your business, it is natural to expect that you will require new and different approaches to
transaction processing.
As you look at your current and future requirements for transaction processing, it is important
to realize that a transaction is simply an abstract construct, not a universal unit of
measure that would allow you to compare various platforms and various workloads. Platforms
can be compared if you can hold the definition of the transaction constant, but if you
cannot, then comparing platforms by raw transaction numbers is
misleading.
For example, a system that processes 10,000 transactions per second to manage customer
profiles is not likely to be comparable to a system that processes 10,000 transactions per second
to manage bank accounts, bank transfers, and ATMs. Not only do these two applications
differ significantly in the work that they perform, there is also an inherent difference in the
value of each transaction and in your tolerance for any loss of transaction
integrity. A customer might merely be annoyed by the loss of a transaction
that was meant to update their profile with a new address, but they will most likely have a
different reaction to the loss of a transaction associated with a change to one of their bank
accounts.
Consider the primary transaction characteristics when determining the right approach:
򐂰 The size of the data set required to process a single transaction
򐂰 The ability to partition the data set
򐂰 Integrity requirements
򐂰 The volume of queries and updates
򐂰 Availability requirements
Each of these characteristics, for a given workload, will narrow your options with regard to
meeting your needs. It is important to realize that each of these characteristics cannot be
viewed in isolation because they are interrelated. For example, if your transactions and data
are such that you can effectively partition the data set and route each transaction to the
appropriate partition and the transaction can be entirely serviced by that partition, then many
solutions are available to you, both centralized and distributed in nature. But, if you look at the
expected volume of updates and your integrity requirements together, then that will certainly
narrow your options.
This IBM Redguide is meant to introduce you to the IBM WebSphere Transaction Cluster
Facility (WTCF). WTCF is not intended to address all of your transaction processing and data
management requirements. Rather, it is targeted at supporting workloads and data sets that
cannot be partitioned such that you can effectively route and isolate a transaction to a
particular partition and where the integrity of your data is paramount (for example, data loss
or inconsistent data cannot be tolerated). This is especially true where the volume is
high, growing, and expected to continue to grow, and where high availability is a must.
Non-partitioned, high volume workloads are challenging to support; IBM WebSphere
Transaction Cluster Facility is designed to handle these workloads effectively.
Platform choice and flexibility
IBM WebSphere Transaction Cluster Facility provides an optimized approach to
supporting high value, mission-critical transaction processing and data management against
shared data. It gives you a means to support this challenging workload in such a way that
you never have to question the consistency of the view of the data that IBM WTCF manages
and that your applications transact against.
IBM System z is the leader in supporting these types of demanding workload characteristics
against shared data. And now, with IBM WebSphere Transaction Cluster Facility, IBM is
providing a new offering to support some of these workloads on distributed platforms. An
important note is that although IBM WTCF is targeted at similar workloads, understanding the
detailed differences in both functional and non-functional capabilities of the various hardware
platforms and software offerings is critical. The reason is that, although IBM WTCF is targeted
at high volume shared data transaction and data management workloads, there are many
differences between it and what is provided on IBM System z.
Introducing WTCF and key features
For a quarter of a century, the relational database management system (RDBMS) has been
the dominant model for database management. But, today, non-relational or “Not Only SQL”
(NoSQL) databases are gaining mindshare as an alternative model for database
management, and are typically focused on performance, scale, and specific use cases like
providing ad-hoc query support. Just as transaction rates have grown beyond recognition
over the last decade, the volumes of data that are being stored have also increased
massively. As such, the need to augment the traditional n-tier architecture with new forms of
transaction and data management application infrastructure is immediate.
To answer this need, IBM introduced a new scale-up offering called WebSphere Transaction
Cluster Facility (WTCF) in September of 2011. This product is a combination of application
infrastructure and data management that supports demanding transactional workloads that
require a clustered, singular shared database while extending the value of WebSphere
Application Server, IBM DB2®, and the IBM DB2 pureScale® Feature for transaction and
database management. With WTCF, low-latency, scalable, and continuously available
transaction applications that have the strictest transaction and data integrity demands can be
better supported on distributed systems, namely IBM Power Systems.
WTCF provides the application infrastructure to augment existing applications or create new
applications that have rigorous transaction and data integrity requirements in a flexible,
scalable, and efficient manner.
Figure 1 on page 8 depicts how WTCF fits into a general application architecture, positioned
between the Business Logic and Data/Resource tiers.
[Figure 1: A three-tier architecture connected over a LAN. Tier 1 (Presentation) contains the
clients. Tier 2 (Business Logic) contains the application servers (WAS). Tier 3 (Data/Resource)
contains existing enterprise information systems and resource managers (for example,
databases). The WTCF cluster consists of Power nodes that each run Apps, WTCF, WAS, and
DB2, together with DB2 pureScale and Tivoli Monitoring. The figure carries these callouts:
WTCF applications are WAS applications (or C++) that are co-located on clustered servers
where the data is managed. WTCF provides a unique data management API to enable the
creation of a non-relational network data model of heterogeneous data structures. WTCF is an
offering targeted logically between the data/resource and the business logic tiers for a select
workload. WTCF is targeted at applications that require shared data. WTCF avoids some of the
common performance/scalability limiters in relational: no joins, no inserts, no deletes, few
indices, denormalized data model, no O/R mapping, and no derived data/aggregates.]
Figure 1 WTCF in a general application architecture
Online transaction processing (OLTP) systems are often at the heart of an enterprise’s IT
infrastructure, enabling those revenue-generating activities that are the key to business
operations. Enterprises demand that their OLTP systems offer the highest levels of reliability,
availability, serviceability (RAS), and even scalability to meet current and future needs.
Round-the-clock (24x7x365) availability of OLTP systems is not merely desirable; it is an absolute requirement. After all,
enterprises are betting their businesses on these systems. A failure, or even slight
performance degradation, can result in millions of dollars in lost revenue.
WebSphere Transaction Cluster Facility (WTCF) provides the core middleware components
for creating modern, innovative, and agile applications for the high volume OLTP environment.
The various products that make up WTCF offer industry leading RAS capabilities within their
respective markets. Additionally, the components and features that are unique to WTCF
build on decades of IBM experience in high volume OLTP on System z and
translate that experience to the world of distributed systems.
WTCF addresses the primary concerns of many CIOs and CTOs who must select the correct
platform for their OLTP systems:
򐂰 Performance
򐂰 Scalability
򐂰 Flexibility, agility, and reduced time to market through reliable tooling
򐂰 Data integrity and security
򐂰 Manageability
The following sections provide more details about the key features and functions that are
available in WTCF.
Roles of the various products and components of WTCF
WTCF is a pre-packaged offering that consists of several IBM Software offerings. These
products, along with some unique WTCF components, have been coupled together to provide
an optimized platform for extreme transaction processing based on a shared, always
consistent, view of mission-critical information. Figure 2 shows the products and components
in a typical WTCF cluster.
[Figure 2: Components of a WTCF node: Configuration Assistant, generated DB classes for Java,
generated DB classes for C++, Java applications, C++ applications, WebSphere Application
Server, WTCF Java library, WTCF C++ library, persistence server, shared main storage, DB2,
and the monitoring agent that reports to Tivoli Monitoring.]
Figure 2 Products and components that comprise a node in the WTCF cluster
Each node within the WTCF cluster runs on the IBM AIX® operating system for Power
processors. Figure 2 illustrates the individual components that exist as part of a WTCF node.
The WTCF product includes the following components:
򐂰 WebSphere Application Server V8 is an industry leading application server for Java
Enterprise Edition (EE) programming. WebSphere Application Server V8 provides not only
the core runtime application server environment, but also those application development,
deployment, and system management capabilities that you would expect from an
enterprise application server, including the following items:
– IBM Rational® Application Developer (RAD) Standard Edition for WebSphere
Application Server V8
– IBM Tivoli® Composite Application Manager (ITCAM) 7.2 for WebSphere Application
Server V8
– IBM Assembly and Deploy Tools for WebSphere Administration
򐂰 DB2 Enterprise Server Edition 9.7.2 is a renowned relational database management
system (RDBMS) with unmatched reliability, availability, and serviceability features. The
WTCF database is built on top of DB2, using DB2’s superior performance to provide a
high-performing data store, while exploiting its capabilities in a unique, non-relational
manner.
򐂰 DB2 pureScale Feature for Enterprise Server Edition 9.8.3 offers superior performance
and near-linear scalability for a centralized database in a clustered distributed system.
򐂰 Configuration Assistant is a WTCF-unique Eclipse-based application for designing,
defining, and configuring your WTCF database. Additionally, using WTCF’s Configuration
Assistant, you can generate database-specific classes in both Java and C++ for use by
your applications when interacting with the WTCF database.
򐂰 WTCF Java and C++ libraries present a unique, object-oriented, application programming
interface (API) for both Java and C++ applications. Application programmers use a
combination of these simple database constructs and base classes along with
database-specific generated classes to assemble and create high-performing transaction
processing applications. WTCF libraries give context and definition to the otherwise
opaque data that is stored in the underlying DB2 database.
򐂰 Persistence server is the liaison between the WTCF libraries and the underlying DB2
database. The persistence server is responsible for managing the transactional state of
the database and ensuring the efficient use and management of database resources.
򐂰 Shared main storage facilities are provided with WTCF to allow for the caching of
frequently accessed, primarily read-only data objects. Each node in the cluster maintains
its own cache manager that is responsible for coordinating updates and accesses to the
cache by multiple application processes that are connected to the node.
򐂰 IBM Tivoli Monitoring Agent for WTCF provides insight into the health of the various WTCF
components in real time. The agent communicates performance information to the IBM
Tivoli Monitoring product suite and can be used in conjunction with the Tivoli Enterprise
Portal to continually monitor the transaction processing environment. In addition to the
WTCF agent, standard monitoring agents can be installed to monitor the AIX operating
system and the DB2 database.
Massively scalable centralized databases with WTCF
WTCF exploits the clustering capabilities of DB2 pureScale to provide a massively scalable
OLTP system on an always-consistent, centralized database. WTCF is architected for up to
256 nodes clustered together on a single image database. Each node in the WTCF cluster
operates independently with its own unique persistence server, node manager, and cache
manager components. DB2 pureScale provides the facilities for data management,
serialization, and coordination of data accesses across nodes in the cluster.
Figure 3 depicts the component architecture of a multinode WTCF installation.
[Figure 3: Multiple WTCF nodes in a cluster; the detailed diagram is not reproduced here.]
Figure 3 WTCF component architecture in a multinode cluster
Database structures optimized for high volume transaction
processing
The WTCF offering enables the creation of non-relational OLTP solutions specifically tailored
to provide unmatched scalability and performance. The data model that can be implemented
with WTCF is a hybrid between a hierarchical and a networked database. Linkages among
various types of database objects are defined to provide information about interactions and
relationships among the objects. Additionally, these linkages are used for defining the
supported navigational paths through the database.
WTCF is focused on providing an object-oriented view of transactional data. In so doing,
there is a logical progression of the types of WTCF database objects. Figure 4 on page 12
shows the relationship between these various WTCF database objects. Starting with the
smallest object, the objects are as follows:
򐂰 A field is the smallest unit of data in the WTCF database representing one unique piece of
information. Every field has a data type, such as character, string, or integer.
򐂰 A record is a collection of fields that represents one logical unit of information. Each
record is stored as an object of user data within a folder.
򐂰 A folder is a collection of logically related records. For example, a folder that represents
one customer may contain a name record, an address record, and a phone number
record.
򐂰 A cabinet is a collection of logically equivalent folders. For example, the collection of all
customer folders would represent a cabinet.
򐂰 A file room is a collection of logically related cabinets with hierarchical relationships.
򐂰 A database represents the logical collection of all WTCF data that is available to an
application.
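[Figure 4: Containment hierarchy of WTCF database objects, from largest to smallest:
Database, File Room, Cabinet, Folder, Record, Field. Each level contains 1...N objects of the
level below it.]
Figure 4 Conceptual view of WTCF database objects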
Folders within WTCF represent the minimal unit of information that an application can request
from the database, and upon which a serialization lock can be obtained. After a given folder
has been opened, the application then has access to all of the records that the folder
contains. Although the entire folder is accessible to the application, the entire contents of the
folder may not be read into main storage at one time; WTCF manages this transparently to
the application.
Records stored within a given folder may be of the same type or they may be of different
types. The ability to group heterogeneous record types together in the same folder can have
tremendous benefits from the standpoint of transactional efficiency. Applications are able to
access and manipulate logically related pieces of information that have different data
structures, while minimizing the amount of I/O activity on the overall system.
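The folder, record, and cabinet concepts described above can be sketched in Java as follows. This is an illustrative model only and not the WTCF API: the class names `DataRecord`, `Folder`, and `Cabinet`, and the use of a `ReentrantLock` to stand in for the serialization lock on a folder, are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model, not the WTCF API: a record is a typed collection of fields.
class DataRecord {
    final String type;                                   // for example "Name" or "Address"
    final Map<String, Object> fields = new LinkedHashMap<>();
    DataRecord(String type) { this.type = type; }
    DataRecord with(String field, Object value) { fields.put(field, value); return this; }
}

// A folder groups logically related records, possibly of different types,
// and is the unit on which a lock is taken.
class Folder {
    final String key;
    final ReentrantLock lock = new ReentrantLock();
    private final List<DataRecord> records = new ArrayList<>();
    Folder(String key) { this.key = key; }
    void add(DataRecord r) { records.add(r); }
    int size() { return records.size(); }
    List<DataRecord> ofType(String type) {
        List<DataRecord> out = new ArrayList<>();
        for (DataRecord r : records) if (r.type.equals(type)) out.add(r);
        return out;
    }
}

// A cabinet holds folders of one kind and defines which record types they may contain.
class Cabinet {
    final String name;
    private final Set<String> allowedTypes;
    private final Map<String, Folder> folders = new LinkedHashMap<>();
    Cabinet(String name, Set<String> allowedTypes) {
        this.name = name;
        this.allowedTypes = allowedTypes;
    }
    Folder open(String key) { return folders.computeIfAbsent(key, Folder::new); }
    void insert(Folder folder, DataRecord r) {
        if (!allowedTypes.contains(r.type))
            throw new IllegalArgumentException(r.type + " is not allowed in cabinet " + name);
        folder.lock.lock();                              // serialize updates per folder
        try { folder.add(r); } finally { folder.lock.unlock(); }
    }
}

class FolderDemo {
    public static void main(String[] args) {
        Cabinet customers = new Cabinet("Customer", Set.of("Name", "Address", "Phone"));
        Folder f = customers.open("C-1001");
        customers.insert(f, new DataRecord("Name").with("last", "Smith"));
        customers.insert(f, new DataRecord("Address").with("city", "Austin"));
        System.out.println(f.size() + " records in folder " + f.key);
    }
}
```

Because the name and address records live in one folder, a single open gives the application both, which is the I/O saving that the preceding paragraph describes.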
The WTCF database administrator (DBA) is responsible for defining the types of folders that
can exist in the database. Folders of the same type are said to reside in the same cabinet. For
any given cabinet, there is a defined set of records that are allowed to exist in any of those
folders. Furthermore, the records of a given folder may contain references to
other folders in other cabinets. It is these references that are used to establish the hierarchy of
various types of folders.
Figure 5 on page 13 illustrates an example of a cabinet hierarchy that could be defined by a
DBA for a sample transportation reservation system. In this example, the database is
separated into two file rooms: Passengers and Inventory. The Passengers file room contains
a cabinet called Reservation that consists of one folder for each reservation that has been
made. A Reservation folder may consist of multiple types of records, for example, a
passenger information record, an itinerary record, a travel history record, and so on.
References to Reservation folders are maintained from inside other cabinets, for example,
Name, Number, Seat, and Wait. Similarly, references to Seat and Wait folders are maintained
from inside the Transport cabinet, thus creating a multi-level navigation hierarchy.
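The multi-level navigation described above can be pictured as following references from folder to folder. The sketch below is a hypothetical illustration, not the WTCF API: `FolderRef`, `ReferenceGraph`, and the folder keys are invented for this example, and only the cabinet names come from the sample layout.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical: a reference identifies a folder by cabinet name and folder key.
class FolderRef {
    final String cabinet;
    final String folderKey;
    FolderRef(String cabinet, String folderKey) {
        this.cabinet = cabinet;
        this.folderKey = folderKey;
    }
    @Override public String toString() { return cabinet + "/" + folderKey; }
}

// Holds the references that records in one folder keep to folders in other cabinets.
class ReferenceGraph {
    private final Map<String, List<FolderRef>> refs = new HashMap<>();

    void addReference(FolderRef from, FolderRef to) {
        refs.computeIfAbsent(from.toString(), k -> new ArrayList<>()).add(to);
    }

    List<FolderRef> referencesFrom(FolderRef from) {
        List<FolderRef> found = refs.get(from.toString());
        return found == null ? new ArrayList<FolderRef>() : found;
    }

    // Follow references for a fixed number of hops, for example
    // Transport -> Seat -> Reservation is two hops.
    List<String> navigate(FolderRef start, int hops) {
        List<FolderRef> frontier = new ArrayList<>();
        frontier.add(start);
        for (int i = 0; i < hops; i++) {
            List<FolderRef> next = new ArrayList<>();
            for (FolderRef f : frontier) next.addAll(referencesFrom(f));
            frontier = next;
        }
        List<String> result = new ArrayList<>();
        for (FolderRef f : frontier) result.add(f.toString());
        return result;
    }
}

class NavigationDemo {
    public static void main(String[] args) {
        ReferenceGraph db = new ReferenceGraph();
        FolderRef transport = new FolderRef("Transport", "T100");
        FolderRef seat = new FolderRef("Seat", "T100-12A");
        FolderRef reservation = new FolderRef("Reservation", "R-1");
        db.addReference(transport, seat);                 // Transport refers to Seat
        db.addReference(seat, reservation);               // Seat refers to Reservation
        db.addReference(new FolderRef("Name", "SMITH"), reservation);
        System.out.println(db.navigate(transport, 2));
    }
}
```

The same Reservation folder is reachable along two paths (by name, or by transport and seat), which is what makes the model a network rather than a strict tree.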
[Figure 5: Database "testdb" with two file rooms, "Passengers" and "Inventory". The cabinets
shown include Name, Number, Reservation, Transport, Seat, Wait, Inventory, and Group.]
Figure 5 Sample WTCF database layout
In addition, WTCF provides the ability to host logically separate databases through a
mechanism known as multitenancy. Multitenancy allows multiple instances of database
structures to exist for each configured tenant in a single WTCF database. All tenants in the
database share a common database schema, but each tenant transparently accesses its own
copy of the data.
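One way to picture multitenancy is a store in which every tenant sees the same cabinet definitions but its own data. The following sketch is an assumption-laden illustration, not how WTCF implements tenants: `MultiTenantStore`, `Session`, and the tenant names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the WTCF API: one shared schema, separate data per tenant.
class MultiTenantStore {
    private final Set<String> schemaCabinets;                       // shared by all tenants
    private final Map<String, Map<String, String>> dataByTenant = new HashMap<>();

    MultiTenantStore(Set<String> schemaCabinets) { this.schemaCabinets = schemaCabinets; }

    // A session is bound to one tenant, so application code never names the tenant again.
    Session sessionFor(String tenant) {
        dataByTenant.computeIfAbsent(tenant, t -> new HashMap<>());
        return new Session(tenant);
    }

    // Validates the cabinet against the shared schema and builds the storage key.
    private String path(String cabinet, String folderKey) {
        if (!schemaCabinets.contains(cabinet))
            throw new IllegalArgumentException("Unknown cabinet: " + cabinet);
        return cabinet + "/" + folderKey;
    }

    class Session {
        private final String tenant;
        Session(String tenant) { this.tenant = tenant; }
        void put(String cabinet, String folderKey, String value) {
            dataByTenant.get(tenant).put(path(cabinet, folderKey), value);
        }
        String get(String cabinet, String folderKey) {
            return dataByTenant.get(tenant).get(path(cabinet, folderKey));
        }
    }
}

class TenantDemo {
    public static void main(String[] args) {
        MultiTenantStore store = new MultiTenantStore(Set.of("Reservation"));
        MultiTenantStore.Session railA = store.sessionFor("railA");
        MultiTenantStore.Session railB = store.sessionFor("railB");
        railA.put("Reservation", "R-1", "Smith");
        railB.put("Reservation", "R-1", "Jones");
        System.out.println(railA.get("Reservation", "R-1") + " / " + railB.get("Reservation", "R-1"));
    }
}
```

Both sessions use the same cabinet name and folder key, yet each reads back only its own tenant's value.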
Application agility to drive business success
Modern object-oriented and service-oriented application architectures work best when the
underlying databases follow a similar paradigm. For this reason, WTCF presents and
manages all information in the database as a set of objects. The database administrator
defines the overall structure of the WTCF database by using the configuration assistant. The
result is a database configuration file that contains file room metadata, along with
programming artifacts that are used directly by the application programmer.
The configuration assistant is used to do the following tasks:
򐂰 Define runtime attributes of all cabinets in the file room. These attributes are used online
to control WTCF library routine operations, such as the following operations:
– Inserting records into the correct location in an organized folder.
– Comparing fields during a search for a particular record.
򐂰 Generate class definitions, in both C++ and Java, that you can use in WTCF applications.
These generated database-specific classes include the following items:
– Folder locators: Used to locate specific folders in the database based on predefined
search criteria.
– Folder indexers: Used to programmatically establish hierarchical relationships among
folders in the database.
– Records: Type-specific collections of fields that represent an object in the database.
Using the configuration assistant to generate database-specific classes for the application
has the following benefits:
򐂰 Provides a level of encapsulation so that applications can work with data records as
objects without knowledge of how those records are stored or serialized in a particular
folder.
򐂰 Provides a single point of control to ensure consistency in the application data definitions
and the actual format of the data as it is stored in the database.
򐂰 Reduces the time-to-market when you create new types of application data or modify
existing types of data.
򐂰 Provides APIs that are data-aware and used by the underlying WTCF library routines.
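To illustrate the encapsulation benefit listed above, the sketch below shows what a generated, database-specific record class might look like. It is not actual Configuration Assistant output: the class name `AddressRecord`, its fields, and the length-prefixed byte layout are assumptions chosen for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a generated record class. The application works with
// typed fields; only this class knows how the record is laid out as opaque bytes.
class AddressRecord {
    private final String street;
    private final String city;

    AddressRecord(String street, String city) {
        this.street = street;
        this.city = city;
    }

    String getStreet() { return street; }
    String getCity() { return city; }

    // Serialize as two length-prefixed UTF-8 strings (an assumed layout).
    byte[] toBytes() {
        byte[] s = street.getBytes(StandardCharsets.UTF_8);
        byte[] c = city.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + s.length + 4 + c.length);
        buf.putInt(s.length).put(s).putInt(c.length).put(c);
        return buf.array();
    }

    // Rebuild the typed record from the stored bytes.
    static AddressRecord fromBytes(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        String street = readString(buf);
        String city = readString(buf);
        return new AddressRecord(street, city);
    }

    private static String readString(ByteBuffer buf) {
        byte[] bytes = new byte[buf.getInt()];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}

class GeneratedClassDemo {
    public static void main(String[] args) {
        byte[] stored = new AddressRecord("1 Main St", "Austin").toBytes();
        AddressRecord loaded = AddressRecord.fromBytes(stored);
        System.out.println(loaded.getStreet() + ", " + loaded.getCity());
    }
}
```

If the layout changes, only the generated class is regenerated; application code that calls `getStreet()` and `getCity()` is unaffected, which is the single point of control that the list above refers to.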
In addition to the persistent data objects that can be stored in the WTCF database, WTCF
includes a main storage cache for sharing frequently accessed common data across multiple
operating system processes. The cache manager is responsible for managing the cache and
all entities (or items) in the cache.
Each entity that is included in the cache is referred to as an item. Cache items are byte
streams that are referenced by keys; cache keys are text strings that represent a symbolic
name for the cache item. The WTCF cache is intended for data that has the following
characteristics:
򐂰 The same item is accessed by multiple processes.
򐂰 Each process is able to identify the same item by using an identical key. That is, you do not
expect different processes to refer to the same item by using different keys.
򐂰 Updates to items are not frequent; that is, new items are not added, the values of items
are not changed, or items are not deleted on a frequent basis.
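The cache characteristics above (text keys, byte-stream items, many readers, infrequent updates) can be sketched with a concurrent map. This is a simplified, hypothetical model and not the WTCF cache manager: `ItemCache` is an invented name, and threads within one JVM stand in for the multiple operating system processes that share the real cache.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: cache keys are text strings, cache items are byte streams.
class ItemCache {
    private final ConcurrentHashMap<String, byte[]> items = new ConcurrentHashMap<>();

    // Read-mostly path: the loader runs only when the key is not yet cached.
    byte[] get(String key, Function<String, byte[]> loader) {
        byte[] item = items.computeIfAbsent(key, loader);
        return item == null ? null : item.clone();   // callers get a copy, not the shared item
    }

    // Infrequent path: replace the item for every reader that uses this key.
    void update(String key, byte[] newValue) {
        items.put(key, newValue.clone());
    }
}

class CacheDemo {
    public static void main(String[] args) {
        ItemCache cache = new ItemCache();
        Function<String, byte[]> loader =
            k -> ("fare table for " + k).getBytes(StandardCharsets.UTF_8);
        cache.get("fares/2012", loader);                       // first access loads the item
        byte[] again = cache.get("fares/2012", loader);        // second access is served from cache
        System.out.println(new String(again, StandardCharsets.UTF_8));
    }
}
```

Every reader must use the identical key string, which mirrors the second characteristic in the list above.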
Comparing WTCF to relational
WebSphere Transaction Cluster Facility (WTCF) itself is implemented on top of DB2, a
relational database. It exploits DB2 strengths in unique ways and adds new functionality well
suited to developing scalable applications with unique data processing characteristics.
Reservation systems are one example of scalable, high volume transaction processing
platforms that can benefit from the capabilities offered by WTCF. At a minimum, these
systems typically manage two types of databases: passenger reservations and inventory. In
any given reservation transaction, both of these databases are frequently accessed by the
application. We examine the following considerations that make these workloads unique and
difficult to handle through relational means:
򐂰 Data is not partitionable
Data partitioning strategies achieve scale by first partitioning data and then directing
requests to the most appropriate partition. These strategies win big when they “guess
right.” The cost of getting data from a separate partition far exceeds the cost of getting
data from within the same partition, and as a result these strategies under-perform when
they “guess wrong.” Reservation requests are not amenable to this technique because any
agent should be able to book any passenger on any flight or a combination of flights.
Therefore, partitioning data by agent, by passenger, or by flight increases the average cost
of booking a reservation for all.
򐂰 Denormalized data
These workloads contain data hubs, a collection of related but dissimilar data items held
together by a logical entity. The passenger name record (PNR) is a good example of one
such record. Because of high performance requirements, such records should be
accessed in the most efficient manner. WTCF achieves this by storing all the related data
items in a single folder. Data denormalization, in general, leads to data duplication and
associated maintenance headaches. Data normalization, however, requires costly join
operations. For data hubs, WTCF gives the ability to denormalize data without data
duplication.
򐂰 Data too big to fit in memory
If one has to pick entities with contrasting access patterns and relative sizes, those would
be seat inventory and PNRs. In a typical reservation system, seat inventory is the smaller
of the two. Inventory is small to a point that it might successfully fit in main memory in its
entirety. In addition, inventory access patterns are cache friendly. PNRs, however, are
several orders of magnitude larger, and therefore cannot fit entirely in main memory.
Further, PNR access patterns are unpredictable and exhibit very poor cache hit ratios. In a
relational setting, assembling a PNR record not only requires join operations, but these
operations result in a relatively higher number of disk accesses.
򐂰 Anti-cluster behavior
Massively scalable databases with shared storage, such as DB2 pureScale, enable nearly
linear scalability. Therefore, data partitioning might not be required to build scalable
reservation applications. However, the normalization that an operational system requires
leads to data distributed across multiple tables. Because clusters share table data at the
page level, collisions on rows stored in a given page lead to excessive page exchanges
between cluster members, resulting in delayed or slow service times.
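To make the "guess right" and "guess wrong" trade-off of data partitioning concrete, the
following minimal sketch computes the average cost of a request as a weighted sum of local
and cross-partition accesses. The function name and the unit costs (1 for a local access, 20
for a cross-partition access) are hypothetical values chosen only for illustration; they are not
measurements of any system.

```python
def expected_access_cost(p_local: float, local_cost: float, remote_cost: float) -> float:
    """Average cost per request when a fraction p_local of requests
    is routed to the partition that already holds the data."""
    if not 0.0 <= p_local <= 1.0:
        raise ValueError("p_local must be between 0 and 1")
    return p_local * local_cost + (1.0 - p_local) * remote_cost


# Hypothetical unit costs, for illustration only.
LOCAL_COST = 1.0
REMOTE_COST = 20.0

for p in (0.99, 0.75, 0.50):
    cost = expected_access_cost(p, LOCAL_COST, REMOTE_COST)
    print(f"fraction routed correctly = {p:.2f}, average cost = {cost:.2f}")
```

Under these assumed costs, the average cost rises quickly as the fraction of correctly routed
requests falls, which is the situation described for reservation requests that can touch any
agent, passenger, or flight.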
Simple reservation data model
To understand WTCF features better, let us take a reservation system and model its data in
terms of WTCF artifacts and also as relational tables. This gives us an opportunity to
compare and contrast both approaches and compute the cost of creating a reservation.
Consider a simple reservation system for trains that consists of the following types of
information:
򐂰 Seat inventory
Train inventory contains information about the availability of seats for booking.
򐂰 Passenger records
A passenger record contains details about individual passengers, contact information,
train bookings, seat assignment, services, and history.
򐂰 Seat maps and wait lists
A seat map provides the status of each seat for every travel segment of a given train
journey. A wait list consists of an ordered list of passengers waiting for seats in a given
train on a given day.
Logical data model
We need the following logical entities to model our reservation system:
򐂰 A train station identified by name and station ID.
򐂰 A train identified by a train number.
򐂰 A train journey starting from a station, passing through zero or more stations before
terminating at its final destination. It consists of an ordered list of stations, arrival and
departure times, arrival status, and departure status.
򐂰 A travel segment consisting of an ordered pair of stations, where the first station indicates
where travel begins and the second station indicates where it ends.
򐂰 An inventory counter consisting of a travel segment and a pair of non-negative integers
representing available and booked seats.
򐂰 A train inventory consisting of train ID, day of travel at the origin station, and a set of
inventory counters.
򐂰 A wait list that is an ordered list of passengers at a station for a given train.
򐂰 A passenger reservation consisting of an ordered list of passenger names, their contact
information, their trips, services, and reservation history.
򐂰 A trip consisting of a trip ID unique within a reservation folder, a train ID, a travel segment,
and the date and time of travel.
򐂰 A seat map consisting of passenger details and the travel segments for which each
passenger occupies a given seat in a train journey.
򐂰 A service consisting of a service ID unique within a reservation folder, trip ID, and seat
assignment.
Our example puts all of these logical entities in context. Consider a train SP0011 named
South Polar Express. The SP0011 train, operated by FastTrack, starts at the station StnA in
the rail network ABC Rail and terminates at the station StnC in the network belonging to
Polar Rail. The train stops at the station StnB, which belongs to ABC Rail, before
terminating at StnC. For stations StnA and StnB, the network code is 1 and the station codes
are 1 and 2 respectively; for StnC, the network code is 2 and the station code is 1. The travel
segments for the train SP0011 are (StnA, StnB), (StnB, StnC), and (StnA, StnC).
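The travel segments of a journey follow directly from its ordered list of stations: every pair
of stations in which the first is reached before the second is a segment. The following sketch
enumerates them for the SP0011 example. The function name is hypothetical and is not part of
any WTCF interface.

```python
from itertools import combinations


def travel_segments(stations):
    """Return every (board, alight) pair along an ordered list of stations.

    combinations() keeps the input order, so the first element of each pair
    is always reached before the second.
    """
    return list(combinations(stations, 2))


# Ordered stations of the SP0011 journey from the example.
sp0011_stations = ["StnA", "StnB", "StnC"]
segments = travel_segments(sp0011_stations)
print(segments)
```

A journey with n stations has n(n-1)/2 travel segments, so the three stations of SP0011 yield
the three segments listed above, each of which needs its own inventory counter.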
WTCF realization
In this section, we describe a physical realization of the logical model described in “Logical
data model” on page 16 using WTCF artifacts. Figure 6 shows the cabinet hierarchy for the
train file room.
Train Journey
• Hash on day of the year
• One record for each train
• Train number
• Managed references:
  – Seat Map
  – Wait List
  – Train Inventory Counters
Seat Map
• One folder per train per day
• One record per passenger
• One passenger name
• Reference to corresponding PNR folder for reservation
• Car, compartment, seat numbers
Wait List
• One folder per train per day
• One record per wait listed passenger
• Wait list number
• Reference to corresponding PNR folder
PNR
• One folder per reservation
• Contains different types of records
• Reservation-specific records (for example, trip info, services booked)
• Passenger-specific records (for example, passenger details, contact information)
Train Inventory Counters
• One folder per train per day
• One record for each travel segment
Figure 6 Cabinet hierarchy
The cabinet hierarchy consists of the following elements:
򐂰 Train Journey is a root-level cabinet with the following characteristics:
– There are 366 root folders, one for each day of the year (based on a leap year).
– Each record in a folder represents a single train journey on that day. Records are
distributed by using the DirectOrdinalSpecification distribution method.
– Records in this cabinet contain references to folders in the lower-level Seat Map, Wait
List, Train Inventory Counters, and Train Time Table cabinets.
򐂰 Seat Map is a mid-level cabinet with the following characteristics:
– Folders exist only if there is a reference from a Train Journey record on a specific date.
– Each record in a folder represents a single seat on the referenced train journey.
򐂰 Wait List is a mid-level cabinet with the following characteristics:
– Folders exist only if there is a reference from a Train Journey record on a specific date.
– Each record in a folder represents a single wait listed passenger on the referenced
train journey.
򐂰 PNR, which represents a passenger name record, is a leaf-level cabinet with the following
characteristics:
– There is one folder for each passenger number.
– Folders can contain one unique number record and any number of name, address,
facts, or flight history records.
򐂰 Train Inventory Counters is a leaf-level cabinet with the following characteristics:
– Folders exist only if there is a reference from a Train Journey record on a specific date.
– Each record in a folder represents inventory counts for a travel segment.
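The Train Journey cabinet is described as hashing on the day of the year into 366 root
folders. The following sketch shows one way such a mapping could look. It is an illustration
only: the function name and the zero-based numbering are assumptions, and the text does not
describe how the DirectOrdinalSpecification method is actually implemented.

```python
from datetime import date

ROOT_FOLDERS = 366  # one root folder per day of a leap year, as stated in the text


def root_folder_ordinal(travel_day: date) -> int:
    """Map a travel date to a zero-based root folder ordinal in 0..365.

    Assumption for illustration: the ordinal is simply the day of the year
    minus one, so in a non-leap year the last ordinal (365) stays unused.
    """
    return travel_day.timetuple().tm_yday - 1


for day in (date(2012, 1, 1), date(2012, 2, 29), date(2012, 12, 31)):
    print(day.isoformat(), "-> root folder", root_folder_ordinal(day))
```

Because the ordinal is computed directly from the date, locating the root folder for a train
journey requires no lookup structure of its own.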
Figure 7 shows how records are laid out in a reservation folder.
[Figure 7: layout of records in a reservation folder. The record types shown in the figure
carry record IDs 10, 20, 30, 40, and 50. The figure labels the records as follows:]
• One reservation info record per Reservation folder (maps to PNRROOT table)
• One or more passenger records per Reservation folder (maps to PAXINFO table)
• One or more service records per Reservation folder (maps to PAXTRAN table)
• One or more service records per Reservation folder (maps to PAXSERVICE table)
• One or more journey records per Reservation folder (maps to PAXJOURNEY table)
[Field labels that appear in the figure: PNR, PNRID, Pax ID, Title, First Name, Last Name,
Street, City, Tx. ID, TX ID, Term ID, Agent, Issue Date, Date, Time, PAXGrp, TKT Num.,
Dept Date, Exp Date, Origin, Dest, Serv. ID, Serv. Type, Cat. Num., Comp Num., Seat Num.,
Last SVC ID, Last use date, Last grp num., Last TX ID, Inventory, Seat Map, Train Journey.]
Figure 7 Reservations folder
Relational realization
Figure 8 on page 20 shows a realization of the logical model in a relational setting. The
relational data model depicted here is kept fairly simplistic for illustration purposes. A
production implementation would most likely require a far greater number of tables to store all
of the required application information. Therefore, the issues that are presented in the
following section would actually be exacerbated by an increase in the number of tables and
rows that are required to construct the logical entities of the reservation system.
Table              Columns
PNRPAX             pnrid, paxNum
PAXTRAN            transactionId, pnrid
PNRROOT            pnrid
PAXGROUPMEMBERS    pnrid, paxNum, passengerGroup
PAXGROUP           pnrid, passengerGroup
PAXJOURNEY         transactionId, ticketNum
PAXSERVICE         transactionId, passengerGroup, serviceNum, ticketNum
PAXSEAT            transactionId, serviceNum, passengerGroup, ticketNum, carNum,
                   seatNum, compartmentNum
Figure 8 A mapping of reservation folder records onto a set of relational tables
Cost of creating a reservation
A primary benefit of WTCF becomes apparent when you compare how passenger data is
handled in the WTCF and relational realizations. In the WTCF realization, a passenger’s
reservation data is maintained as one unit; in the relational realization, it is spread across
multiple tables. Figure 7 on page 19 shows the WTCF layout for a PNR folder, which contains
the passenger reservation details. Following recommended practices of relational design, we
normalized the data in the PNR folder and mapped it onto the set of tables shown in Figure 8.
Now, we estimate the cost of creating a reservation record for a family of two adults and a
child in terms of the number of table inserts:
1. Create a record in the PNRROOT table.
2. Create a record in the PAXTRAN table.
3. For each passenger, create a record in the PNRPAX table.
4. For each passenger, create a record in the PAXGROUPMEMBERS table.
5. Create three records in the PAXGROUP table:
– One record for ticketing group consisting of two adults and one child
– One record for service group consisting of two adults
– One record for service group consisting of one child
6. Create a record in the PAXJOURNEY table for the ticket issued.
7. Create two records in the PAXSERVICE table:
– One record to define a service class for two adults
– A second record for a separate service class for the child
8. Create three records in the PAXSEAT table, one for each member of the family.
The total number of inserts into the relational tables is 17. This does not take into account the
cost of allocating space for new records and indexing newly created records. In WTCF, all the
updates go into one single folder and require one update to the table. There is no overhead of
allocating space for a row or indexing it because all rows are pre-allocated within WTCF.
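The count of 17 inserts can be checked by tallying the rows written to each table in the eight
steps above. The sketch below does this for the family of two adults and one child. The
function and parameter names are hypothetical, and the table in step 2 is written as PAXTRAN,
the name used in Figure 8.

```python
def relational_inserts(passengers: int, groups: int, tickets: int, service_classes: int) -> dict:
    """Rows inserted per table when creating one reservation,
    following the eight steps listed in the text."""
    return {
        "PNRROOT": 1,                   # step 1: one root record
        "PAXTRAN": 1,                   # step 2: one transaction record
        "PNRPAX": passengers,           # step 3: one record per passenger
        "PAXGROUPMEMBERS": passengers,  # step 4: one record per passenger
        "PAXGROUP": groups,             # step 5: ticketing and service groups
        "PAXJOURNEY": tickets,          # step 6: one record per ticket issued
        "PAXSERVICE": service_classes,  # step 7: one record per service class
        "PAXSEAT": passengers,          # step 8: one seat record per passenger
    }


# Family of two adults and one child: three groups, one ticket, two service classes.
family = relational_inserts(passengers=3, groups=3, tickets=1, service_classes=2)
total_inserts = sum(family.values())
print(total_inserts)  # 17
```

In this tally, three of the eight terms grow with the number of passengers, whereas the WTCF
realization is described as writing all of the records into a single folder.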
To access a PNR in WTCF, you read records from the PNR folder. In a relational
system, you have to assemble a PNR before presenting it to the application. This assembly
typically requires joins of rather large tables for a reservation system of decent size. In a
typical reservation system, the number of passenger reservations is several orders of
magnitude higher than the number of other data items, and the reservations usually do not fit
in memory. Past observations of real systems show that passenger tables have cache hit
ratios of less than 1%. This further exacerbates the cost of join operations involving
passenger data.
It is possible to denormalize data and reduce the number of tables, and thus the number of
inserts for creating a reservation. However, denormalization in a relational setting duplicates
data, which requires additional overhead to maintain consistency among the copies. The
denormalization that WTCF facilitates does not lead to data duplication.
There is a price for this efficiency, however. In WTCF, data can be accessed only by using
predefined access paths, which limits the ability to write arbitrary queries against the data.
When performing an availability request, multiple classes of inventory can be queried within
the same folder. When making a booking, updates that are required for inventory counters all
occur in the same folder with minimal database activity.
Folders can grow to be very large. For example, a seat map folder for a train can hold
records for several hundred to over a thousand passengers. WTCF automatically creates an
internal folder index when it determines that an index can reduce the number of I/Os. The
index allows for efficiency in the following areas:
򐂰 Ordered insertion of new records
򐂰 Searching of existing records
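The following sketch illustrates why an ordered index helps with both areas. It keeps record
keys sorted so that the position for a new record and the position of an existing record are
both found by binary search. This is a toy in-memory model with hypothetical names; the text
does not describe the internal structure that WTCF actually uses, and an on-disk index would
be organized differently.

```python
import bisect


class OrderedFolderIndex:
    """Toy index that keeps record keys in sorted order."""

    def __init__(self):
        self._keys = []
        self._records = []

    def insert(self, key, record):
        # Binary search finds the slot that keeps the keys ordered.
        pos = bisect.bisect_right(self._keys, key)
        self._keys.insert(pos, key)
        self._records.insert(pos, record)

    def find(self, key):
        # Binary search again, instead of scanning every record in the folder.
        pos = bisect.bisect_left(self._keys, key)
        if pos < len(self._keys) and self._keys[pos] == key:
            return self._records[pos]
        return None

    def keys(self):
        return list(self._keys)


# Seat map records keyed by (car number, seat number); values are passenger names.
seat_map = OrderedFolderIndex()
seat_map.insert((2, 5), "passenger C")
seat_map.insert((1, 3), "passenger A")
seat_map.insert((1, 10), "passenger B")
print(seat_map.keys())
print(seat_map.find((1, 10)))
```

With a thousand passengers in a seat map folder, a binary search inspects about ten keys
rather than up to a thousand, which is the kind of saving that motivates building the index
only when the folder is large enough to benefit.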
Summary
WebSphere Transaction Cluster Facility (WTCF) effectively runs high volume online
transaction processing within a distributed environment. WTCF represents decades of
experience supporting demanding high volume transaction processing workloads. The
management of data within the WTCF middleware allows for high volume transaction
processing, which is common in many industries that support mission-critical systems.
The clustered environment of WTCF provides for unmatched scalability and availability
through the robustness of the DB2 Engine extended in DB2 pureScale. Along with DB2,
WTCF allows you to more easily develop applications using the WebSphere Application
Server environment.
Although the application programming interfaces are tailored for transaction processing, they
are also built on the object oriented concepts of C++ and Java. Developers can quickly
generate the initial database and database-specific classes using the WTCF Toolkit. This
allows functions to get implemented faster in today’s demanding online transaction
processing environments. The WTCF data objects give application developers and database
administrators data designs that are simple to use.
Relational database management systems are not designed to handle efficiently the high
volume workloads described in this guide, where a consistent view of the data is critical. The
manner in which a relational database indexes, retrieves, and stores data proves costly
because of the large number of join operations that are required to sustain high transaction
rates while maintaining data integrity. WTCF provides the performance characteristics and
the data integrity guarantees that unique and complex transaction processing workloads
require.
WTCF provides the infrastructure necessary to augment existing applications or create new
applications that have rigorous transaction and data integrity requirements in a flexible,
scalable, and efficient manner, establishing a cost-effective platform to help IT executives meet the
changing needs of the business today and into the future.
Other resources for more information
The following resources are helpful for more information:
򐂰 IBM WebSphere Transaction Cluster Facility Information Center:
http://publib.boulder.ibm.com/infocenter/wtcfhelp/current
򐂰 IBM DB2 pureScale Feature Information Center:
http://publib.boulder.ibm.com/infocenter/db2luw/v9r8/index.jsp
򐂰 WebSphere Transaction Cluster Facility product announcement:
http://www.ibm.com/common/ssi/ShowDoc.jsp?docURL=/common/ssi/rep_ca/0/897/ENUS211-300/index.html&lang=en
򐂰 WebSphere Transaction Cluster Facility website:
http://www.ibm.com/software/webservers/wtcf/
The team who wrote this guide
This guide was produced by a team of specialists from around the world working at the
International Technical Support Organization (ITSO).
Jason Keenaghan is a Senior Software Engineer at IBM in Poughkeepsie, New York. He
earned his Bachelor of Science degree from Boston College with a major in economics and a
minor in computer science. He later earned an MBA from Marist College. Jason is currently
one of the head architects for two leading high volume transaction processing platforms from
IBM: z/Transaction Processing Facility and WebSphere Transaction Cluster Facility. He has
over 14 years of experience designing and developing various components of these platforms
including network communications, memory management, system error and recovery,
database management, XML processing, SOA enablement, and enterprise integration.
Sastry Duri is a Senior Software Engineer at the IBM T. J. Watson Research Center in
Hawthorne, New York. Dr. Duri earned a Ph.D. degree in computer science from the
University of Illinois at Chicago. His professional interests include distributed and
high-performance computing systems, mobile commerce applications, RFID-based supply
chains, and sensor and actuator applications. He has represented IBM in the industry
standard group EPCglobal ALE Working Group, a subsidiary of the Uniform Code Council
(UCC), and in the OpenLS workgroup.
Barry Baker is a Software Portfolio Product Manager in the IBM Software Group. He is
responsible for assessing what markets IBM should compete in, determining gaps in the IBM
portfolio, and partnering with customers, sales, marketing, services, development, research,
and analysts to formulate and execute product roadmaps and strategies regarding high
volume transaction processing, including IBM z/Transaction Processing Facility and
WebSphere Transaction Cluster Facility. He has over 12 years of experience working on
design, development, support, test, product management, and strategy of z/TPF. He regularly
meets with C-level and line-of-business executives of the z/TPF customer set.
Paul Dantzig is an IBM Research Relationship Manager for Financial Markets, Manager of
High Volume Web Serving, and a Senior Technical Staff Member at the IBM T J Watson
Research Center, Hawthorne, New York. He has also been an Adjunct Professor at Pace
University since 1982. Currently, as part of IBM Research, he is a Research Advocate for
JPMC and Verizon; Consultant and Developer on JPMC Remediation of Global Funds
Control System; IBM Research Project Manager, architect, and designer of the WebSphere
Transaction Cluster Facility. Previous customer projects include RBC Benchmark (2011),
Morgan Stanley Consulting (2010-2011), JPMC Global Funds Control System Analysis
(2011, 2008), IBM Research Technical Advisor for eBay (2002-2007), The Chicago
Mercantile Exchange Joint Study on Distributed Parallel Matching Engine (2005), architect
and mentor for the Shanghai Stock Exchange Portal Site (2002-2003). In addition, he was an
IBM Research Relationship Manager for Sport and Event websites (1995-2003); IBM
Research Relationship Manager for the Atlanta, Nagano, and Sydney Olympic Games;
Architect for 2000 Sydney Olympic website; Architect and Application Development Manager
for 1998 Nagano Olympic website; and Architect, Development Manager, and Programmer
for Olympic Internet Result System for 1996 Atlanta Olympic Web Site Web Publishing
Systems. Paul has patented many caching and transaction methods and published widely. He
has won numerous awards within IBM for his contributions.
Jonathan Collins is a Product Line Manager for the WTCF and ALCS products in the IBM
Software Group. He is responsible for financial management, acquisitions, pricing, marketing,
customer executive interfacing, new business opportunities, partner enablement, vendor
relationships, strategy, and product direction. Jonathan has over 10 years of experience
working in the TPF organization, including product development, customer support, and
product management.
Don Kallberg is a Senior Software Engineer at IBM in Poughkeepsie, New York, doing
project management within the WebSphere Transaction Cluster Facility development team.
Don has 23 years of experience with IBM and has earned his Master’s degree in Computer
Science from Syracuse University and his Bachelor of Science degree from Worcester
Polytechnic Institute. Don has had various responsibilities including project management,
software development, test and customer consulting on the z/Transaction Processing Facility
(TPF), TPF Operations Server, IBM Passenger Rail Reservation System (IPRRS), and virtual
machine (VM) products.
Thanks to the following contributor for his involvement with this project:
Stephen Smith, International Technical Support Organization, Raleigh Center
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area. Any
reference to an IBM product, program, or service is not intended to state or imply that only that IBM product,
program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not give you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION
PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR
IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of
express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in any
manner serve as an endorsement of those websites. The materials at those websites are not part of the
materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring
any obligation to you.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs.
© Copyright IBM Corp. 2012. All rights reserved.
This document, REDP-4817-00, was created or updated on January 30, 2012.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of
International Business Machines Corporation in the United States, other countries, or
both. These and other IBM trademarked terms are marked on their first occurrence in
this information with the appropriate symbol (® or ™), indicating US registered or
common law trademarks owned by IBM at the time this information was published. Such
trademarks may also be registered or common law trademarks in other countries. A
current list of IBM trademarks is available on the Web at
http://www.ibm.com/legal/copytrade.shtml
The following terms are trademarks of the International Business Machines Corporation in the United States,
other countries, or both:
AIX®
DB2®
IBM®
Power Systems™
pureScale®
Rational®
Redbooks®
Redguide™
Redbooks (logo)
System z®
Tivoli®
WebSphere®
The following terms are trademarks of other companies:
Java, and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its
affiliates.
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel
SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its
subsidiaries in the United States and other countries.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.