Front cover

Data Resilience Solutions
for IBM i5/OS High Availability Clusters
Understand the scope of business
continuity problems and solutions
Learn about data resilience technologies,
their features and limitations
Determine the right technologies
for your availability needs
Steve Finnes
Bob Gintowt
Mike Snyder
ibm.com/redbooks
Redpaper
International Technical Support Organization
Data Resilience Solutions for IBM
i5/OS High Availability Clusters
February 2005
Note: Before using this information and the product it supports, read the information in “Notices” on page v.
First Edition (February 2005)
This edition applies to Version 5 Release 3 Modification 0 of IBM i5/OS.
This document created or updated on February 7, 2005.
© Copyright International Business Machines Corporation 2005. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule
Contract with IBM Corp.
Contents
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .v
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
The team that wrote this Redpaper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Become a published author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
Chapter 1. What is business continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Chapter 2. Major business continuity problem sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1 Problem categories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Selection criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Chapter 3. Overview of business continuity technologies . . . . . . . . . . . . . . . . . . . . . . . 7
3.1 Backup window reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.2 Workload balancing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.3 Application resilience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.4 Data resilience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Chapter 4. Applicability of a solution to a problem set . . . . . . . . . . . . . . . . . . . . . . . . . 13
Chapter 5. Detailed attributes of data resilience technologies. . . . . . . . . . . . . . . . . . . 15
5.1 Logical replication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
5.2 Switchable device . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
5.3 Cross-site mirroring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
5.4 IBM TotalStorage Enterprise Storage Server PPRC used with the iSeries Copy Services
for ESS toolkit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Chapter 6. Comparison characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Chapter 7. General guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Appendix A. Decision factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Primary decision factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Supporting decision factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Appendix B. Cautions, caveats, and other tips . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Basic single system availability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Backup window reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Multisystem HA solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Planned maintenance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Recovery for disaster outages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area. Any
reference to an IBM product, program, or service is not intended to state or imply that only that IBM product,
program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not give you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such provisions are
inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS
PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of
express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any
manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the
materials for this IBM product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring
any obligation to you.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrates programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs. You may copy, modify, and
distribute these sample programs in any form without payment to IBM for the purposes of developing, using,
marketing, or distributing application programs conforming to IBM's application programming interfaces.
Trademarks
The following terms are trademarks of the International Business Machines Corporation in the United States,
other countries, or both:
eServer®
ibm.com®
iSeries™
i5/OS™
Enterprise Storage Server®
FlashCopy®
IBM®
Redbooks (logo)™
Redbooks™
TotalStorage®
The following terms are trademarks of other companies:
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the
United States, other countries, or both.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product, and service names may be trademarks or service marks of others.
Preface
Choosing the correct set of data resilience technologies in the context of your overall
business continuity strategy can be complex and difficult. It is a given that business continuity
is an extremely broad topic. This IBM® Redpaper provides some insight into this broad topic
and then describes the technologies that support improved data resilience for end users. It
explains the capabilities, advantages, and limitations of the various technologies. It provides
information and techniques that you can use to select the best set of technologies available
on IBM eServer i5, to use in conjunction with the IBM i5/OS™ high availability clusters, to
satisfy your specific business continuity goals.
This IBM Redpaper is organized so that you can study the content from cover-to-cover
(recommended) or use specific sections for reference as needed. It begins with a discussion
about the business continuity requirements. This information helps you to determine and
prioritize your business requirements in the context of the specific problem sets of interest to
you.
Next, this IBM Redpaper presents an overview of the technologies that are related to
business continuity. This helps you to understand the technology categories and choices
within each category that are available to address the problem sets. Then the paper explains
how the technologies apply to the various business continuity requirements and how they
compare with one another. A detailed analysis is included to help position the various
technologies against your specific business requirements. Finally, conclusions are drawn
about the technologies, mapping solutions to the characteristics of end-user environments.
Although this paper does not describe the value proposition of high availability or contrast
technologies for other aspects of business continuity, you can find a starter set of references
for this type of material on the IBM eServer iSeries™ High Availability Web site at:
http://www-1.ibm.com/servers/eserver/iseries/ha/
The team that wrote this Redpaper
This Redpaper was produced by a team of specialists from IBM Rochester, Minnesota.
Steve Finnes
Bob Gintowt
Mike Snyder
IBM Rochester
Thanks to the following people for their contributions to this project:
Lou Antoniolli
Sue Baker
Selwyn Dickey
Janice Dunstan
Eric Hess
Mike McDermott
Jeff Palm
Stu Preacher
Jim Ranweiler
Larry Youngren
IBM Rochester
Become a published author
Join us for a two- to six-week residency program! Help write an IBM Redbook or Redpaper
dealing with specific products or solutions, while getting hands-on experience with
leading-edge technologies. You'll team with IBM technical professionals, Business Partners
and/or customers.
Your efforts will help increase product acceptance and customer satisfaction. As a bonus,
you'll develop a network of contacts in IBM development labs, and increase your productivity
and marketability.
Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us!
We want our papers to be as helpful as possible. Send us your comments about this
Redpaper or other Redbooks™ in one of the following ways:
• Use the online Contact us review redbook form found at:
ibm.com/redbooks
• Send your comments in an email to:
redbook@us.ibm.com
• Mail your comments to:
IBM Corporation, International Technical Support Organization
Dept. HYJ Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400
Chapter 1.
What is business continuity
Clients are continually faced with the complex task of determining which solution or
technologies to deploy that address the various business requirements which must be
supported by IT. In the case of business continuity requirements, the task is equally daunting.
Detailed business continuity requirements must be developed and documented, the solution
types identified, and the solution choices evaluated. This is a challenging task due in part to
the complexity of the problem. It is also partly due to the confusion associated with conflicting
requirements, mis-stated objectives, unrealistic expectations, and incomplete information.
Business continuity is the capability of a business to withstand outages and to operate
important services normally and without interruption in accordance with predefined
service-level agreements. To achieve a given desired level of business continuity, a collection
of services, software, hardware, and procedures must be selected, described in a
documented plan, implemented, and practiced regularly. The business continuity solution
must address the data, the operational environment, the applications, the application hosting
environment, and the end-user interface. All must be available to deliver a good, complete
business continuity solution.
Business continuity includes disaster recovery (DR) and high availability (HA). For example,
one aspect of a business continuity plan may be the set of resources, plans, services, and
procedures used to recover important applications and to resume normal operations for these
applications at a remote site. This is done in the event of a disaster that causes a complete
outage at the production site. This Disaster Recovery Plan includes a stated disaster
recovery goal (for example, resume operations within eight hours) and addresses acceptable
levels of degradation.
Another major aspect of business continuity goals for many customers is high availability.
This can be defined as the ability to withstand all outages (planned, unplanned, and
disasters) and to provide continuous processing for all important applications. The ultimate
goal is for the outage time to be less than 0.001% of total service time. The differences between
high availability and disaster recovery typically include more demanding recovery time
objectives (seconds to minutes) and more demanding recovery point objectives (zero end
user disruption). HA solutions are characterized by the fully automated failover to a backup
system to approach the objective of a nondisruptive continuation of productive end user
activity. HA solutions must have the ability to provide an immediate recovery point. At the
same time, they must provide a recovery time capability that is significantly better than the
recovery time that you would experience in a non-HA solution topology. For example,
recovering a single system from tape can take many hours. The HA solution should be able to
do this recovery in minutes. Frequently, the scope of disaster recovery involves the entire
system (operating system, the hosted applications, and the associated data). High availability
solutions are more granular and can be targeted to individual critical resources within a
system, for example a specific application instance.
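To put these objectives in perspective, the following minimal Python sketch converts availability percentages into yearly downtime budgets. The percentage values are illustrative examples, not product commitments.

# Convert an availability target into an allowed yearly outage budget.
# Illustrative values only; actual service-level targets vary by contract.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

def downtime_budget(availability_pct: float) -> float:
    """Return the allowed downtime in minutes per year."""
    return MINUTES_PER_YEAR * (100.0 - availability_pct) / 100.0

for pct in (99.0, 99.9, 99.99, 99.999):
    print(f"{pct:7.3f}% available -> {downtime_budget(pct):8.1f} minutes per year")

# The HA goal stated above (outage time below 0.001% of service time)
# corresponds to 99.999% availability, roughly 5.3 minutes per year.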
At the heart of the iSeries high availability solution is cluster technology. A cluster is a
collection of interconnected complete systems used as a single, unified computing resource.
The cluster provides a coordinated, distributed process across the systems to deliver the
solution. This results in higher levels of availability, some horizontal growth, and simpler
administration across the enterprise. You should expect cluster architectures and
implementations to deliver a complete availability solution. In the complete solution, you
must address the operational environment, the application hosting environment, application
resilience, and the end-user interfaces in addition to providing data resilience mechanisms.
IBM i5/OS cluster technology focuses on all aspects of the complete solution. The integrated
cluster resource services enable you to define a cluster of systems and the set of resources
that should be protected against outages. Cluster resource services detect outage conditions
and coordinate automatic movement of critical resources to a backup system.
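The following simplified Python sketch models the behavior just described: an outage is detected on the primary node, and the protected resource is moved to the first available backup in its recovery domain. All names (ClusterNode, ResourceGroup, failover) are hypothetical illustrations, not the i5/OS cluster resource services interfaces.

# Simplified model of cluster resource services behavior: detect an
# outage on the primary node and move the protected resource to the
# first available backup in the recovery domain. Hypothetical names;
# this is not the i5/OS API.

from dataclasses import dataclass, field

@dataclass
class ClusterNode:
    name: str
    active: bool = True

@dataclass
class ResourceGroup:
    resource: str                                         # e.g., an application or IASP
    recovery_domain: list = field(default_factory=list)   # ordered list of nodes

    def failover(self) -> ClusterNode:
        """Promote the first active backup node to primary."""
        for i, node in enumerate(self.recovery_domain[1:], start=1):
            if node.active:
                # Reorder the recovery domain so the backup becomes primary.
                self.recovery_domain.insert(0, self.recovery_domain.pop(i))
                return node
        raise RuntimeError("no active backup node available")

nodes = [ClusterNode("PROD"), ClusterNode("BACKUP1"), ClusterNode("BACKUP2")]
crg = ResourceGroup("payroll-app", recovery_domain=nodes)

nodes[0].active = False                # heartbeat loss detected on PROD
new_primary = crg.failover()
print(f"resource '{crg.resource}' now hosted on {new_primary.name}")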
Chapter 2.
Major business continuity
problem sets
The starting point for the selection process is to fully identify the set of availability problems
that you are attempting to address. For business continuity, these problems can be collected
into five major categories. Each problem is placed into one of these categories based on a set
of criteria, which is also explained in this chapter.
2.1 Problem categories
The problem sets are not mutually exclusive. Detailed analysis shows that elements of
each problem set overlap as illustrated in Figure 2-1. Each problem category is explained in
the following sections.
[Venn diagram showing five overlapping problem sets: backup window reduction, HA for planned outages, HA for unplanned outages, disaster recovery, and workload balancing]
Figure 2-1 Overlapping problem sets
Backup window reduction
The requirement for this category is to reduce or eliminate the amount of non-production time
required to do regular backups, such as nightly backups. Typically, the production window (a
metaphor for time) has grown, and the time needed for backups has also increased to
accommodate the growth in size and number of end-user objects that must be saved.
Although a daily outage for performing backups may be available, the time required for the
backups exceeds the available window.
Availability for planned outage events
The requirement for this category is to ensure that production can continue, even though an
outage is required for planned maintenance. Planned maintenance outages include hardware
service, hardware upgrade, system software service, system software upgrades, certain
system administrative tasks, application software service, and application software upgrades.
Planned outages may also be the result of environmental factors such as building power
upgrades and IT center relocation. Business operations demand that the applications and
associated data must continue to be available during the time of any planned maintenance.
Recovery from disaster related outage events
The requirement for this category is to ensure that an extended outage, due to some disaster,
does not adversely affect business. The business is recoverable, and the corporate
environment may be resumed with a minimal loss of data.
A solution is required to protect against local system outages as well as site disasters, such
as fire, flood, and tornado. Typically applications and associated data are relocated to a
backup system at a different physical location than the normal production system. However,
the business operations may tolerate IT services being unavailable for some amount of time,
while disaster recovery is performed, typically in the 12 to 48 hour range but sometimes less.
In addition, disaster recovery may involve some amount of manual processing since it is
assumed to be rarely needed.
High availability for unplanned outage events
The requirement for this category is to ensure that critical system resources, such as
applications, are continuously available during both unplanned outages and planned outages.
Unplanned outages may include system hardware failures, disk and disk subsystem failures,
power failures, operating system failures, application failures, and user errors. Applications
and associated data must be available 24 x 7.
Business operations cannot tolerate extended outages while a disaster recovery site is
brought online or while a backup system is provisioned. Business operations dictate that
failover processing to a backup server must be fully automated. Normal production is
expected to continue with acceptable degradation.
Workload balancing
Multiple workloads (from multiple applications, multiple instances of the same application, or
both) are hosted in a multiple server environment. Each application instance has a
pre-determined service-level objective such as response time goals. If one or more of the
application instances do not achieve their stated objective due to system overload, they can
be relocated to another server which has available capacity.
2.2 Selection criteria
Keep in mind that generalizations, such as those illustrated in Figure 2-1, can distort the
answer. This discussion is a guide to potentially rule out certain solution types. In practice,
consider all of the major and supporting decision factors such as:
• Up time requirements
• Recovery time objective
• Recovery point objective
• Resilience requirements
• Concurrent access requirements
• Geographic dispersion requirements
• Tolerance for end user disruption
• Outage type coverage
• Cost
• Service and support
You can find details about these and other decision factors in Chapter 6, “Comparison
characteristics” on page 23, and in Appendix A, “Decision factors” on page 29.
Chapter 3.
Overview of business continuity
technologies
Several technology choices are available to address the problem sets discussed in Chapter 2,
“Major business continuity problem sets” on page 3. Because the specific requirements for
each problem set are articulated across various environments, a single technology is not
capable of addressing all needs of all customers.
This chapter presents four categories of technology solutions:
• Backup window reduction
• Workload balancing
• Application resilience
• Data resilience
Backup window reduction and workload balancing tie primarily to the backup window
reduction and workload balancing problem sets described in Chapter 2, “Major business
continuity problem sets” on page 3. Application and data resilience predominantly apply to the
high availability (HA) and disaster recovery (DR) problem sets. Both are needed to achieve
high availability. In the broad context of business continuity, there are other categories that are
not addressed in this paper, such as error correcting hardware, communications error
recovery, and enablers for basic application error handling and recovery.
As stated earlier, the focus of this paper is on multiple system data resilience topologies.
However, to place the total solution into context, you must also consider the other aspects of
business continuity. This chapter introduces some basic techniques for backup window
reduction, workload balancing, the various types of application resiliency, and data resilience.
Each is covered briefly to remind you that, to achieve a total solution, you cannot address only
data resilience. Additional details regarding these other technologies, as well as the
advantages and disadvantages of the various solutions, are not provided in this paper.
3.1 Backup window reduction
The obvious techniques of reducing or eliminating the backup window involve either
decreasing the time to perform the backup or decreasing the amount of data backed up.
Classic examples of these techniques are:
• Improved tape technologies
Faster and denser tape technologies can reduce the total backup time.
• Parallel saves
Using multiple tape devices concurrently can reduce backup time by eliminating or
reducing serial processing on a single device.
• Saving to non-removable media
Saving to media that is faster than removable media, for example directly to direct access
storage device (DASD), can reduce the backup window. Data can be migrated to
removable media at a later time.
• Data archiving
Data that is not needed for normal production can be archived and taken offline. It is
brought online only when needed, perhaps for month-end or quarter-end processing. The
daily backup window is reduced since the archived data is not included.
• Saving only changed objects
Daily backups exclude objects that have not changed during the course of the day. The
backup window can be dramatically reduced if the percentage of unchanged objects is
relatively high. (A small sketch of this selection appears at the end of this section.)
Other save window reduction techniques leverage a second copy of the data (real or virtual).
These techniques include:
• Saving from a second system
Data resilience technologies, such as logical replication, that make available a second
copy of the data can be used to shift the save window from the primary copy to the
secondary copy. This technique can eliminate the backup window on the primary system.
Therefore, it does not affect production since the backup processing is done on a second
system.
• Save while active
In a single system environment, the data is backed up using save processing while
applications may be in production. To ensure the integrity and usability of the data, a
checkpoint is achieved that ensures a point-in-time consistency. The object images at the
checkpoint are saved, while allowing change operations to continue on the object itself.
The saved objects are consistent with respect to one another so that you can restore the
application environment to a known state.
Save while active may also be deployed on a redundant copy achieved through logical
replication. Employing such a technique can enable the save window to be eliminated
effectively.
• IBM TotalStorage® Enterprise Storage Server (ESS) FlashCopy® used in conjunction
with the iSeries Copy Services for ESS toolkit
This technology uses the ESS function of FlashCopy on an independent auxiliary storage
pool (IASP) basis. A point-in-time snapshot of the IASP is taken on a single ESS server.
The copy of the IASP is done within the ESS (in one set of logical disk units), and the host
is not aware of the copy. The toolkit enables bringing the copy on to the backup system for
purposes of doing saves or other offline processing. The toolkit also manages bringing the
second system back into the cluster in a nondisruptive fashion. The Copy Services toolkit
supports multiple IASPs from the same or multiple production systems being attached at
the same time.
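As a rough illustration of the changed-object technique mentioned earlier in this section, the following Python sketch selects only the objects modified since the previous save. The object catalog and timestamps are invented sample data, not a real system inventory.

# Illustrative changed-object selection for a daily backup: save only
# objects whose last-change timestamp is newer than the previous save.

from datetime import datetime

last_save = datetime(2005, 2, 6, 22, 0)

object_catalog = [
    {"name": "CUSTMAST", "changed": datetime(2005, 2, 7, 9, 15)},
    {"name": "ORDHIST",  "changed": datetime(2005, 1, 31, 3, 0)},
    {"name": "ITEMFILE", "changed": datetime(2005, 2, 7, 14, 42)},
]

to_save = [o for o in object_catalog if o["changed"] > last_save]
skipped = len(object_catalog) - len(to_save)

print("objects to save:", [o["name"] for o in to_save])
print(f"skipped {skipped} unchanged object(s); the backup window shrinks "
      f"roughly in proportion to the unchanged percentage")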
3.2 Workload balancing
The most common technologies for workload balancing involve moving work to available
resources. Contrast this with common performance management techniques that involve
moving resources to work that does not achieve performance goals. Example workload
balancing technologies (each with its own HA implications) are:
• Front-end routers
These routers handle all incoming requests and then use an algorithm to distribute work
more evenly across available servers. Algorithms may be as simple as a round-robin
distribution or more complex, based on actual measured performance. (A minimal
round-robin sketch appears at the end of this list.)
• Multiple application servers
An end user distributes work via some predefined configuration or policy across multiple
application servers. Typically, the association from requester to server is relatively static,
but the requesters are distributed as evenly as possible across multiple servers.
• Distributed, multi-part application
These applications work in response to end-user requests that actually flow across
multiple servers. The way in which the work is distributed is transparent to the end user.
Each part of the application performs a predefined task and then passes the work on to
the next server in sequence. The most common example of this type of workload
balancing is a three-tiered application with a back-end database server.
• Controlled application switchover
Work is initially distributed in some predetermined fashion across multiple servers. A
server may host multiple applications, multiple instances of the same application, or both.
If a given server becomes overloaded while other servers are running with excess
capacity, the operations staff moves applications or instances of applications with
associated data from the overloaded server to the underutilized server. Workload
movement can be manual or automated based on a predetermined policy.
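As a minimal sketch of the round-robin distribution mentioned in the front-end router item, the following Python fragment rotates incoming requests across a fixed set of servers. The server names are placeholders; real routers layer health checks and measured-load algorithms on top of this idea.

# Minimal round-robin front-end distribution: each incoming request is
# handed to the next server in a fixed rotation.

from itertools import cycle

servers = cycle(["APPSRV1", "APPSRV2", "APPSRV3"])  # placeholder names

def route() -> str:
    """Return the next server in the rotation."""
    return next(servers)

for req in range(7):
    print(f"request {req} -> {route()}")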
3.3 Application resilience
Application resilience can be classified according to the following categories:
• No application recovery
After an outage, end users must manually restart their applications. Based on the state of
the data, they determine where to restart processing within the application.
• Automatic application restart and manual repositioning within applications
Applications that were active at the time of the outage are automatically restarted.
However, the user must still determine where to resume within the application, based on
the state of the data.
• Automatic application restart and semi-automatic recovery
In addition to the applications automatically restarting, the end users are returned to some
predetermined “restart” point within the application. The restart point may be, for example,
a primary menu within the application. This is normally consistent with the state of the
resilient application data, but the user may have to advance within the application to
actually match the state of the data.
• Automatic application restart and automatic recovery to last transaction boundary
The user is repositioned within the application to the processing point that is consistent
with the last committed transaction. The application data and the application restart point
match exactly.
• Full application resilience with automatic restart and transparent failover
In addition to being repositioned to the last committed transaction, the end user continues
to see exactly the same window with the same data as when the outage occurred. There
is no data loss, signon is not required, and there is no perception of loss of server
resources. The user perceives only a delay in response time.
Important: You can combine any of these application resilience mechanisms with the data
resilience mechanisms described in the following section to provide a complete solution.
3.4 Data resilience
You can use a number of technologies to address the data resilience requirements described
in 2.1, “Problem categories” on page 4. This paper describes the four key multisystem data
resilience technologies. Since this is the focus of this paper, they are only introduced here.
You can find more details in Chapter 5, “Detailed attributes of data resilience technologies” on
page 15.
• Logical replication
A second copy of data is generated that is logically identical to the first. The replication is
done on an object basis (file, member, data area, program, and so on) near real time.
Where possible, the replication is done at the lowest unit of change for the object, for
example at the record level for a database file. Otherwise, the replication is done on the
entire object when a change is detected by the replication software. Logical replication is
normally accomplished using a business partner software product and is associated with a
data cluster resource group (CRG). This technology is a data- and object-based
replication solution.
Note: See the following HA Web site for a list of HA Business Partners:
http://www-1.ibm.com/servers/eserver/iseries/ha/
• Switchable device
A single copy of the data is maintained in an IASP. However, the data in an IASP can be
moved to a backup system in the event of an outage (scheduled or unscheduled) on the
system that is currently hosting the data. The data is then available for production
processing on the new primary system (previously the backup system). The switchable
device is controlled by a device CRG. This technology involves no replication and no
additional copies of the data.
• Cross-site mirroring (XSM)
A second copy of the data in the IASP is generated that is logically identical to the first.
Using the operating system geographic mirroring function, XSM mirrors data on disks at
sites that can be separated by a significant distance. This technology extends the
functionality of a device CRG beyond the limits of physical device connectivity. As data is
written to the production copy of an IASP, geographic mirroring replicates the change to a
second copy of the IASP through another system. The detailed comparison in Chapter 5,
“Detailed attributes of data resilience technologies” on page 15, through Chapter 7,
“General guidelines” on page 27, assumes an XSM environment where each copy of the
IASP is also protected by normal switched device technology. XSM with geographic
mirroring is an operating system storage management-based replication solution.
• ESS PPRC used in conjunction with the iSeries Copy Services for ESS toolkit
A second copy of the data is generated that is physically identical to the first. The ESS
peer-to-peer remote copy (PPRC) function is combined with the IASP as the basic storage
unit. These functions are augmented with an iSeries Technology Center (iTC) toolkit,
which provides a level of automation and operational protection. The toolkit enables
bringing the IASP copy online to the backup system. It also manages bringing the second
system back into the cluster in a nondisruptive fashion. In addition, the toolkit provides
function to reverse the PPRC direction on switchover or failover.
PPRC is a real-time remote copy technique that synchronously mirrors a primary set of
volumes (that are updated by applications) onto a secondary set of volumes. Typically the
secondary volumes are on a different ESS located at a remote location (the recovery site)
some distance away from the application site. Mirroring is done at a logical volume level.
This technology is a storage server-based replication solution.
Chapter 4.
Applicability of a solution to a
problem set
Using the business continuity problem decomposition and the categorization of data
resilience technologies, you can determine possible matches of technologies to specific
needs. Table 4-1 provides a starting point for this selection process. It is only one of several
tools that you should use, and therefore not an exclusive means, to select the correct set of
technologies for your requirements. See Chapter 5, “Detailed attributes of data resilience
technologies” on page 15, for additional tools. Use Table 4-1 to help you eliminate the
technologies that do not fit. After this initial analysis, perform a detailed analysis of the
complete requirement sets against the specific characteristic of each technology choice.
The data resilience technologies listed in Table 4-1 include those that are primarily targeted at
improved data resilience. The table does not include other special technologies that target
primarily save window reduction, workload balancing, or application resilience. The cells
marked N/A in Table 4-1 indicate that the technology for that column most likely is not
applicable to the problem set for that row.
Table 4-1 Possible technology mapping

Business continuity requirement   Logical replication   Switched disk   XSM           ESS toolkit for PPRC
Backup window reduction*          applicable            N/A             N/A           N/A
Planned maintenance               applicable            applicable      applicable    applicable
Recovery for disaster outage      applicable            N/A             applicable    applicable
HA for unplanned outage           applicable            applicable      applicable    N/A
Workload balancing                applicable            applicable      applicable    N/A

* Although the technologies marked N/A in this row cannot provide backup window reduction when
used alone, they can be augmented by other technologies to achieve these results. For example,
you can combine IBM TotalStorage Enterprise Storage Server® (ESS) FlashCopy with the
peer-to-peer remote copy (PPRC) function and then use the additional data copy on another
system to reduce backup time.
Chapter 5.
Detailed attributes of data
resilience technologies
While the previous chapters introduce several availability technologies, this chapter explores
each of the four key data resilience technologies in greater detail. It examines the
characteristics, advantages, limitations, and other considerations for each technology.
5.1 Logical replication
Logical replication is the most seasoned and widely deployed multisystem data resiliency
topology for high availability (HA) in the iSeries space. It is typically deployed via an HA
Business Partner (HABP) solution package. Replication is executed (via software methods)
on objects. Changes to the objects (for example file, member, data area, or program) are
replicated to a backup copy. The replication is near real time (nearly simultaneous). Typically, if the
object, such as a file, is journaled, replication is handled at a record level. For such objects as
user spaces that aren’t journaled, replication is handled typically at the object level. In this
case, the entire object is replicated after each set of changes to the object is complete.
Most logical replication solutions allow for additional features beyond object replication. For
example, you can achieve additional auditing capabilities, observe the replication status in
real time, automatically add newly created objects to those being replicated, and replicate
only a subset of objects in a given library or directory.
To build an efficient and reliable multisystem HA solution using logical replication,
synchronous remote journaling as a transport mechanism is preferable. With remote
journaling, IBM i5/OS continuously moves the newly arriving data in the journal receiver to the
backup server journal receiver. At this point, a software solution is employed to “replay” these
journal updates, placing them into the object on the backup server. After this environment is
established, there are two separate yet identical objects, one on the primary server and one
on the backup server.
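The replay step can be pictured with the following hedged Python sketch: journal entries that arrive on the backup side are applied, in strict sequence order, to the backup copy of the object. The entry layout and operation codes are invented for illustration and do not reflect the actual i5/OS journal entry format.

# Conceptual replay of journal entries on the backup server: apply each
# received entry, in sequence order, to the backup copy of the object.

backup_file = {}   # stands in for the backup copy of a database file

def apply_entry(entry: dict) -> None:
    op, key = entry["op"], entry["key"]
    if op in ("PUT", "UPDATE"):
        backup_file[key] = entry["data"]
    elif op == "DELETE":
        backup_file.pop(key, None)

received_entries = [
    {"seq": 101, "op": "PUT",    "key": 1, "data": "order A"},
    {"seq": 102, "op": "UPDATE", "key": 1, "data": "order A (shipped)"},
    {"seq": 103, "op": "DELETE", "key": 1, "data": None},
]

for entry in sorted(received_entries, key=lambda e: e["seq"]):
    apply_entry(entry)   # strict sequence order preserves integrity

print("backup copy after replay:", backup_file)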
With this solution in place, you can rapidly activate your production environment on the
backup server via a role-swap operation. Figure 5-1 illustrates the basic mechanics in a
logical replication environment.
Figure 5-1 Logical replication
A key advantage of this solution category is that the backup database file is “live”. That is, it
can be accessed in real time for backup operations or for other read-only application types
such as building reports. In addition, that normally means that minimal recovery is needed
when switching over to the backup copy.
The challenge with this solution category is the complexity that can be involved with setting up
and maintaining the environment. One of the fundamental challenges lies in not strictly
policing undisciplined modification of the live copies of objects residing on the backup server.
Failure to properly enforce such a discipline can lead to instances in which end users and
programmers make changes against the live copy so that it no longer matches the production
copy. Should this happen, the primary and the backup versions of your files are no longer
identical. Significant advances by iSeries HABPs, in the form of tools designed to simplify the
management aspects and perform periodic data validation, can help detect such behavior.
Another challenge associated with this approach is that objects that are not journaled must go
through a check point, be saved, and then sent separately to the backup server. Therefore,
the granularity of the real-time nature of the process may be limited to the granularity of the
largest object being replicated for a given operation.
For example, a program updates a record residing within a journaled file. As part of the same
operation, it also updates an object, such as a user space, that isn’t journaled. The backup
copy becomes completely consistent when the user space is entirely replicated to the backup
system. Practically speaking, if the primary system fails, and the user space object is not yet
fully replicated, a manual recovery process is required to reconcile the state of the
non-journaled user space to match the last valid operation whose data was completely
replicated.
Another possible challenge associated with this approach lies in the latency of the replication
process. This refers to the amount of lag time between the time at which changes are made
on the source system and the time at which those changes become available on the backup
system. Synchronous remote journal can mitigate this to a large extent. Regardless of the
transmission mechanism used, you must adequately project your transmission volume and
size your communication lines and speeds properly to help ensure that your environment can
manage replication volumes when they reach their peak. In a high volume environment,
replay backlog and latency may be an issue on the target side even if your transmission
facilities are properly sized.
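To make the sizing point concrete, here is a small illustrative calculation: given an assumed peak journal volume and line speed, it checks whether a replay backlog accumulates. All figures are invented examples, not measurements.

# Rough line-sizing check for replication traffic: if peak journal
# volume exceeds effective line throughput, a replay backlog (latency)
# accumulates on the target. All figures are invented examples.

peak_journal_mb_per_hour = 1800        # assumed peak write volume
line_mbit_per_sec = 2                  # assumed link speed
line_efficiency = 0.7                  # protocol and contention overhead

effective_mb_per_hour = line_mbit_per_sec / 8 * line_efficiency * 3600

print(f"peak volume:   {peak_journal_mb_per_hour} MB/hour")
print(f"line capacity: {effective_mb_per_hour:.0f} MB/hour")
if peak_journal_mb_per_hour > effective_mb_per_hour:
    backlog = peak_journal_mb_per_hour - effective_mb_per_hour
    print(f"backlog grows by about {backlog:.0f} MB for each peak hour")
else:
    print("line keeps pace at peak; latency stays bounded")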
5.2 Switchable device
The iSeries implementation of switchable device, independent auxiliary storage pools
(IASPs), supports both directory objects (such as the integrated file system (IFS)) and library
objects (such as database files). It is provided as part of i5/OS Option 41, High Availability
Switchable Resources. The IT community often refers to this topology as switched disks.
A key point about this solution is that it’s inherently a logical solution, as opposed to a purely
mechanical-switching solution. The architecture is deployed within the operating system as a
special class of auxiliary storage pool (ASP) that is independent of a particular host system.
The practical outcome of this architecture is that switching an IASP from one system to
another involves less processing time than a full initial program load (IPL). Figure 5-2
illustrates the concept of a switchable device.
The benefit of using IASPs lies in their operational simplicity. The single copy of data is always
current, meaning there is no other copy with which to synchronize. No in-flight data, such as
data that is transmitted asynchronously, can be lost. And, there is minimal performance
overhead. Role swapping or switching is relatively straightforward, although you may have to
account for the time required to vary on the IASP.
Another key benefit of using IASPs is zero transmission latency. The major effort associated
with this solution involves setting up the direct access storage device (DASD) configuration,
the data, and application structure. Making an IASP switchable is relatively simple. The IASP
is placed into a switchable tower that is attached to two servers (or partitions) via a
high-speed link (HSL) loop.
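The switchover flow can be summarized with the following hedged Python sketch. The step names and durations mirror the description above (vary off on the current host, switch ownership, vary on at the new host) but are descriptive assumptions, not actual CL commands or measured times.

# Conceptual IASP switchover: vary the pool offline on the current
# host, move ownership across the HSL loop, then vary it online on
# the new host. Step names and minutes are illustrative assumptions.

def switch_iasp(iasp: str, from_host: str, to_host: str) -> None:
    steps = [
        (f"vary off {iasp} on {from_host}", 1.0),
        (f"switch {iasp} hardware ownership to {to_host}", 0.5),
        (f"vary on {iasp} on {to_host} (includes recovery processing)", 3.0),
    ]
    total = 0.0
    for description, assumed_minutes in steps:
        total += assumed_minutes
        print(f"{description:60s} ~{assumed_minutes:.1f} min")
    print(f"estimated switchover time: ~{total:.1f} minutes (no full IPL)")

switch_iasp("PAYROLL_IASP", from_host="PROD", to_host="BACKUP")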
Figure 5-2 Switchable device
Limitations are also associated with the IASP solution. First, there’s only one logical copy of
the data in the IASP. This can be a single point of failure, although the data may be protected
using RAID 5 or mirroring. The data cannot be concurrently accessed from both hosts for
such things as read access or more importantly for backup to tape operations.
Certain object types, such as configuration objects, cannot be stored in an IASP. You need
another mechanism, such as periodic save/restore or logical replication, to ensure that these
objects are appropriately maintained.
Another limitation involves hardware associated restrictions. Examples include distance limits
in the HSL loop technology and outages associated with certain hardware upgrades. The
IASP cannot be brought online to a down level system. Other considerations include
database restrictions on cross-IASP relationships such as JOINs and referential integrity
rules. Therefore, up-front database design and analysis are essential.
5.3 Cross-site mirroring
This solution type involves the mirroring of IASP data via i5/OS storage management to a
second and perhaps remote server over a communications fabric. Cross-site mirroring (XSM)
is included in Option 41 of i5/OS Version 5 Release 3. It enables the switching or automatic
failover to a mirrored copy of the IASP (see Figure 5-3) in addition to locally switching the
IASP between systems. It addresses the single point of failure issue of the basic switchable
device structure. It also provides a means to develop a remote mirrored copy of your IASP
data via a function called geographic mirroring.
Figure 5-3 Cross-site mirroring
The benefits of this solution are essentially the same as the basic switchable device solution
with the added advantage of providing disaster recovery to a second copy at increased
distance. The biggest benefit continues to be operational simplicity. All of the data placed in
the production copy of the IASP, including the journal receivers, is mirrored to a second IASP
on a second, perhaps remote, system. The switching operations are essentially the same as
that of the switchable device solution with the added benefit that you can also switch to the
mirror copy of the IASP, making this a straightforward HA solution to deploy and operate. As in
the switchable device solution, objects not in the IASP must be handled via some other
mechanism and the IASP cannot be brought online to a down-level system.
XSM also provides real-time replication support for hosted integrated environments such as
Microsoft® Windows® and Linux®. This is not generally possible through journaling-based
logical replication.
A potential limitation of an XSM solution is performance impacts in certain workload
environments. As with any solution, when synchronous communications are used, you must
consider distance, bandwidth, and latency limitations associated with transmission times.
When running input/output (I/O)-intensive batch jobs, some performance degradation on the
primary system is possible. Also, be aware of the increased central processing unit (CPU)
overhead required to support XSM.
Another limitation of an XSM solution is that concurrent operations cannot access
the mirror copy of the IASP. For example, if you want to back up to tape from the
geographically mirrored copy, you must quiesce operations on the source system and detach
the mirrored copy. Then you must vary on the detached copy of the IASP, perform the backup
procedure, and then re-attach the IASP to the original production host. This mandates full
data resynchronization between the production and mirrored copies.
Depending on how long it takes to synchronize the primary and backup IASP copies, it may
be impractical to detach and then reattach the mirrored copy for a backup-to-tape operation
in certain environments. Your system is running exposed while doing the backups and when
synchronization is occurring. Synchronization is also required for any persistent transmission
interruption, such as the loss of all communication paths between the source and target
systems for an extended period of time. To minimize the potential for this situation, we
recommend that you use redundant transmission links. You should also use XSM in at least a
three-system configuration where the production copy of the IASP can be switched to
another system at the same site that can maintain geographic mirroring.
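The detach, back up, and reattach cycle described above can be summarized as an ordered checklist, sketched here in Python. The step wording paraphrases the text; it is not a toolkit procedure.

# The detach/backup/reattach cycle for saving from the XSM mirror copy,
# expressed as an ordered checklist. Wording paraphrases the text above.

detach_backup_cycle = [
    "quiesce applications using the production IASP copy",
    "detach the mirrored copy from geographic mirroring",
    "vary on the detached copy on the backup system",
    "perform the save-to-tape from the detached copy",
    "vary off the detached copy",
    "reattach the copy to the production host",
    "full resynchronization runs (system exposed until complete)",
]

for step_number, step in enumerate(detach_backup_cycle, start=1):
    print(f"step {step_number}: {step}")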
You can configure an XSM environment where both the production and mirrored copies of
the IASP are non-switchable between servers at each site. Although this applies to some
end-user environments, it is not the configuration assumed in this paper. When running with
such a configuration, your system is still exposed to some single points of failure.
5.4 IBM TotalStorage Enterprise Storage Server PPRC used
with the iSeries Copy Services for ESS toolkit
This solution type involves the replication of data at the storage controller level to a second
storage server using IBM TotalStorage Enterprise Storage Server (ESS) copy services. An
IASP is the basic unit of storage for the ESS peer-to-peer remote copy (PPRC) function.
PPRC generates a second copy of the IASP on another ESS. The toolkit comes as part of the
iSeries Copy Services for ESS services offering. It provides a set of functions to combine the
PPRC, IASP, and i5/OS cluster services for coordinated switchover and failover processing
through a cluster resource group (Figure 5-4).
Figure 5-4 ESS PPRC toolkit
This solution provides the benefit of the remote copy function and coordinated switching
operations, which gives you good data resiliency capability if the replication is done
synchronously. The toolkit enables you to attach the second copy to a backup server without
an IPL. No load source recovery is involved in the operations. You also have the ability to
combine this solution with other ESS-based copy services functions, such as FlashCopy, for
additional benefits such as save window reduction.
This solution also has limitations. The switchover processing, while mostly automated,
requires some manual intervention to coordinate actions between i5/OS and the ESS. When
you are done using the IASP on the backup system and switch back to the original primary via
a scheduled switchover with PPRC, then no IPL is necessary. However, if the switchover is
unscheduled, then an extra IPL on the failed system is required before it can accept the IASP
again. Because of the required manual processing, this solution isn’t defined as an HA
solution but principally as a disaster-recovery solution. It can be used for certain kinds of
planned outages.
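The difference between scheduled and unscheduled switchover can be sketched as follows in Python. The step names are descriptive placeholders based on the description above, not toolkit commands.

# Coordinated PPRC switchover, scheduled versus unscheduled, following
# the description above. Step names are descriptive placeholders.

def pprc_switch(scheduled: bool) -> list:
    steps = [
        "end application access to the IASP on the current primary",
        "reverse the PPRC copy direction between the two ESS units",
        "attach the IASP copy to the backup server (no IPL needed)",
        "resume production on the new primary",
    ]
    if not scheduled:
        # After a failure, the failed system needs an extra IPL before
        # it can accept the IASP again.
        steps.append("IPL the failed system before it rejoins the cluster")
    return steps

for kind, flag in (("scheduled", True), ("unscheduled", False)):
    print(f"{kind} switchover:")
    for step in pprc_switch(flag):
        print("  -", step)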
If the two ESS units are connected synchronously, you must also be aware of the distance
limitations associated with transmission times as with any solution when synchronous
communications are used. This approach requires tools and services to deploy.
Prior to ESS Release 2.4.0, asynchronous PPRC was never recommended as part of this
solution. Previous implementations of asynchronous PPRC did not preserve write order.
Therefore, a consistent view of the data as well as internal object consistency could not be
guaranteed. However, the recently announced IBM TotalStorage Global Mirror for ESS
provides an asynchronous PPRC solution that preserves the order of writes and provides
significantly longer distances.
A variation of this solution is to use ESS PPRC without IASPs and the toolkit. This is not
considered in this paper because such a solution involves a long recovery time due to load
source recovery processing and long IPL recovery steps. In addition, such a solution does not
protect you from simultaneously updating both copies of the data (a feature of the toolkit),
thereby eliminating identical copies of the data.
Chapter 6.
Comparison characteristics
With this high-level overview of solution applicability to the major problem sets in mind, it is
important to explore the detailed characteristics and attributes of each solution. For this
chapter, we selected some key characteristics to consider. However,
you may have other characteristics that are equally or more important to your environment. To
compare the various availability techniques that use some form of data resiliency, we use the
following characteristics in the technology comparison shown in Table 6-1:
• Primary use: This indicates whether the solution is primarily oriented for users with a high
availability (HA) requirement or for users who only have a disaster recovery requirement.
Disaster recovery (DR) involves recovering all important applications and resuming
normal operations for these applications at a remote site in the event of a disaster that
causes a complete outage at the production site. High availability enables your
environment to withstand all outages (planned, unplanned, and disasters) to provide
continuous processing for all important applications for a very high percentage of time.
• Characteristic of replication mechanism: This provides a brief description of the major
characteristics of the solution that generates a copy of the data onto auxiliary storage.
• Recovery time: This refers to the length of time that it takes to recover from an outage
(scheduled, unscheduled, or disaster) and resume normal operations for an application or
a set of applications.
• Recovery point: This indicates the point in time where recovery processing returns
control to end users. The recovery point is from the perspective of both data and
application processing.
• Ordering of changes: This indicates how changes are ordered on the backup system and
how the order relates to the original sequence of changes on the primary system.
Ordering of changes must be preserved to ensure data integrity and the internal
consistency of objects.
The ordering mechanism can also affect recovery time. For example, while there may not
be strict ordering as the system expects it, power-loss integrity can sometimes be
preserved by initial program load (IPL) processing to ensure object consistency. However,
if interdependent changes are not done in the correct order, IPL recovery may in fact
introduce data integrity issues. (A small illustration of order sensitivity follows this list.)
• Concurrent access: This is the level of access allowed to a secondary copy of data. This
row also addresses the real-time currency of the replicate copy, which describes the
© Copyright IBM Corp. 2005. All rights reserved.
23
nature of the lag time (or latency) between the secondary copy and the primary copy. A
value of 100% current would mean that, from a user’s perspective, the copy and primary
are updated simultaneously (0 latency).
• Geographic dispersion: This refers to distance limitations imposed between systems
and data copies.
• Number of backup systems: The value for this row indicates the maximum number of
backup systems that can be involved with the replication and failover process.
• Number of data copies allowed: The value in this row indicates the maximum number of
secondary data copies that can be processed by this solution.
• Cost factors: This row specifies whether there are any specific requirements for the direct
access storage device (DASD) used for the primary or secondary copies of the data. This
row also describes any additional cost considerations relative to the data storage.
• End-user disruption: This row indicates what the user will experience during normal
production and system recovery. For save window reduction, this includes any processing
from the time at which the save was initiated until the copy is captured and the user can
resume normal processing.
򐂰 Outage coverage: This may apply to a planned, unplanned, disaster, or save window
reduction.
򐂰 Cluster controlled resource: This row specifies whether the resource that is providing
the resilience falls under the control of cluster services (for single point of management,
automated failover, coordinated switchover, and so on).
򐂰 Risks: This row identifies potential risks or exposures that are involved with the solution.
When examining Table 6-1, consider the following notes:
򐂰 Since Table 6-1 only addresses data resilience techniques, not all of the decision factors
mentioned in Appendix A, “Decision factors” on page 29, are included in the comparison.
򐂰 In some cases, the distance limits are stated as “virtually unlimited”. While this is
technically true, the actual distance limits are gated by response-time degradation
tolerances, throughput impacts, characteristics of the communications fabrics, and other
factors. For example, for the IBM TotalStorage Enterprise Storage Server (ESS), we
recommend that you never use synchronous PPRC across distances greater than
100 km.
򐂰 Independent auxiliary storage pool (IASP) vary-on processing covers the time needed to
bring the device online to the system and any recovery processing needed on the IASP
contents.
Table 6-1 Technology comparison

Primary use
򐂰 Logical replication: HA (including DR)
򐂰 Switchable IASPs: HA (no DR)
򐂰 XSM with geographic mirroring: HA (including DR)
򐂰 ESS PPRC with IASP and iTC Toolkit: DR

Characteristic of replication mechanism
򐂰 Logical replication: Object-based replication; changes at record or object level based on the data and audit journals; logical copy of object-level changes for selected objects
򐂰 Switchable IASPs: No replication; one copy of the data that is switchable between systems
򐂰 XSM with geographic mirroring: Page-level replication as controlled by the operating system based on storage management writes; logical copy, since physical DASD configurations can differ
򐂰 ESS PPRC with IASP and iTC Toolkit: Sector-level replication of all pages written to disk; physical copy of an IASP based on disk I/O (cache based)

Recovery time considerations
򐂰 Logical replication: Apply lag plus replication switchover overhead; journal settings; no IPL required; minutes
򐂰 Switchable IASPs: IASP vary on; SMAPP or journal settings; no IPL required; minutes
򐂰 XSM with geographic mirroring: IASP vary on; SMAPP or journal settings; no IPL required; minutes
򐂰 ESS PPRC with IASP and iTC Toolkit: Quiesce time; manual steps plus vary on; SMAPP or journal settings; IPL sometimes required before the backup can be used again; tens of minutes

Recovery point considerations
򐂰 Logical replication: Transaction boundary with commitment control; mixed, audit and data journal; data or objects sent to the target are recovered; any changes not transmitted are lost (zero data loss with synchronous remote journal)
򐂰 Switchable IASPs: Transaction boundary with commitment control; last data written to the IASP; objects not in the IASP
򐂰 XSM with geographic mirroring: Transaction boundary with commitment control; last data written to the IASP; objects not in the IASP
򐂰 ESS PPRC with IASP and iTC Toolkit: Quiesce point for breaking PPRC; transaction boundary with commitment control; last data written to disk (some automation and protection from mistakes)

Ordering of changes
򐂰 Logical replication: Based on journal receiver content and the HABP ability to synchronize changes from the data and audit journals
򐂰 Switchable IASPs: Ordering preserved
򐂰 XSM with geographic mirroring: Ordering at system level; ordering preserved across the ASP group
򐂰 ESS PPRC with IASP and iTC Toolkit: Ordering at controller level; preserved at LUN set level for synchronous PPRC; no ordering for asynchronous PPRC until 2.4.0

Concurrent access
򐂰 Logical replication: Typically read only, possibly shared data; always some lag time in data currency; remote journal helps
򐂰 Switchable IASPs: No concurrent access, since there is no copy of the data
򐂰 XSM with geographic mirroring: No; requires resynchronization; second copy current
򐂰 ESS PPRC with IASP and iTC Toolkit: No concurrent access; copy current with synchronous PPRC, incoherent with asynchronous PPRC

Geographic dispersion
򐂰 Logical replication: Virtually unlimited
򐂰 Switchable IASPs: Limited (250 m)
򐂰 XSM with geographic mirroring: Virtually unlimited
򐂰 ESS PPRC with IASP and iTC Toolkit: Virtually unlimited

Number of backup systems
򐂰 Logical replication: 1 <= n < 127 (or BP maximum)
򐂰 Switchable IASPs: n = 1 (with switchable towers)
򐂰 XSM with geographic mirroring: 1 <= n <= 3 (2 or 3 with switchable towers)
򐂰 ESS PPRC with IASP and iTC Toolkit: 1 <= n <= 2 (2 with cascading PPRC)

Number of data copies allowed
򐂰 Logical replication: 127 (or BP maximum)
򐂰 Switchable IASPs: None
򐂰 XSM with geographic mirroring: 1
򐂰 ESS PPRC with IASP and iTC Toolkit: 2

Cost factors
򐂰 Logical replication: Any DASD configuration; HABP software; bandwidth; duplicate disks
򐂰 Switchable IASPs: Switchable tower (or IOP); i5/OS Option 41
򐂰 XSM with geographic mirroring: Any (flexible) DASD configuration; i5/OS Option 41; bandwidth; duplicate disks
򐂰 ESS PPRC with IASP and iTC Toolkit: External DASD (2 x Shark); bandwidth; i5/OS Option 41; PPRC; Toolkit; duplicate disks

End-user disruption
򐂰 Logical replication: Replication overhead; can automatically restart the application
򐂰 Switchable IASPs: Can automatically restart the application
򐂰 XSM with geographic mirroring: Geographic mirroring overhead; can automatically restart the application
򐂰 ESS PPRC with IASP and iTC Toolkit: PPRC and toolkit overhead; semi-automatic application restart

Outage coverage
򐂰 Logical replication: Planned, unplanned, disaster, save window
򐂰 Switchable IASPs: Planned, unplanned
򐂰 XSM with geographic mirroring: Planned, unplanned, disaster
򐂰 ESS PPRC with IASP and iTC Toolkit: Some planned outages, disaster

Cluster controlled resource
򐂰 Logical replication: Yes
򐂰 Switchable IASPs: Yes
򐂰 XSM with geographic mirroring: Yes
򐂰 ESS PPRC with IASP and iTC Toolkit: Yes, for switchable devices

Risks
򐂰 Logical replication: Loss of in-flight data; mismatch of data levels for various objects; monitoring the logical object replication environment
򐂰 Switchable IASPs: The disk subsystem is a single point of failure, therefore no protection against catastrophic disk failure
򐂰 XSM with geographic mirroring: In the asynchronous case, loss of the copy for some double-failure situations (OK if you can quiesce and vary off the mirror copy); resynchronization may yield a lengthy unprotected condition (especially with only two systems)
򐂰 ESS PPRC with IASP and iTC Toolkit: IPL on backup systems in some situations; somewhat complex; never use asynchronous PPRC unless using the new Global Mirror option
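For readers who want to filter these attributes programmatically, the following Python sketch encodes a condensed subset of Table 6-1. The attribute names and condensed values are an illustrative simplification of the table, not part of any product interface.

# Minimal sketch: a condensed subset of Table 6-1 as a lookup structure,
# so candidate technologies can be filtered against a few requirements.
TECHNOLOGIES = {
    "Logical replication": {
        "covers_disaster": True,
        "concurrent_access": True,   # typically read only on the copy
        "max_data_copies": 127,      # or HABP-imposed maximum
        "distance": "virtually unlimited",
    },
    "Switchable IASPs": {
        "covers_disaster": False,
        "concurrent_access": False,  # single copy of the data
        "max_data_copies": 0,
        "distance": "limited (250 m)",
    },
    "XSM geographic mirroring": {
        "covers_disaster": True,
        "concurrent_access": False,  # mirror copy requires resynchronization
        "max_data_copies": 1,
        "distance": "virtually unlimited",
    },
    "ESS PPRC with IASP and toolkit": {
        "covers_disaster": True,
        "concurrent_access": False,
        "max_data_copies": 2,
        "distance": "virtually unlimited",
    },
}

def candidates(**required):
    """Return the technologies whose attributes satisfy every requirement."""
    return [name for name, attrs in TECHNOLOGIES.items()
            if all(attrs.get(key) == value for key, value in required.items())]

# Example: which technologies cover disasters and allow concurrent access?
print(candidates(covers_disaster=True, concurrent_access=True))
# -> ['Logical replication']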
Chapter 7. General guidelines
Choosing a multisystem data resilience mechanism can be a complex and confusing
decision. Table 6-1 on page 25 helps you to compare the resilience technologies in detail.
This chapter provides general guidelines about when a particular mechanism may be best
suited for a given environment. The technologies are not mutually exclusive. The solution that
best fits a set of customer requirements may be achieved by deploying a combination of
available technologies.
Consider logical replication when:
򐂰 You need two or more copies of the data.
򐂰 You want concurrent access to the second data copy.
򐂰 You need backup window reduction.
򐂰 You need to selectively replicate objects within a library or directory.
򐂰 Your IT staff can monitor the state of the replication environment.
򐂰 Geographic dispersion between copies is needed, especially if you need distances
greater than what can be achieved by hardware solutions.
򐂰 You already have deployed a solution using logical object replication.
򐂰 You need a solution that has no special hardware configuration requirements.
򐂰 Failover and switchover times should not exceed tens of minutes.
򐂰 Transaction level integrity is important for all journaled objects.
Consider switchable independent auxiliary storage pools (IASPs) when:
򐂰 Only one copy of the data with hardware protection satisfies your requirement and you
have considered or addressed avoiding unplanned outages due to disk subsystem
failures.
򐂰 You need a simple, low cost and low maintenance solution.
򐂰 Disaster recovery (DR) is not needed. Coverage for planned and certain types of
unplanned outages is all that is required.
򐂰 The source and target system are at the same site.
򐂰 You want consistent failover and switchover times within minutes that do not depend on
transaction volumes.
򐂰 Transaction-level integrity is important for all objects. You need immediate availability of all
object changes with no loss of in-flight data. Objects not within an IASP either do not need
to be replicated or are handled via some other mechanism.
򐂰 You need the highest throughput environment.
򐂰 Your environment calls for multiple, independent databases that can be moved between
systems.
Consider cross-site mirroring when:
򐂰 You want a system-generated second copy of the data (at an IASP level).
򐂰 You need two copies of data, but do not need concurrent access to a second copy.
򐂰 A relatively low cost and low maintenance solution is desired, but you also need disaster
recovery.
򐂰 Geographic dispersion between copies is needed, but your distance requirement does not
adversely impact your acceptable production performance goals.
򐂰 You want consistent failover and switchover times within minutes that do not depend on
transaction volumes.
򐂰 Transaction-level integrity is important for all objects. You need immediate availability of all
object changes with no loss of in-flight data. Objects not within an IASP either do not need
to be replicated or are handled via some other mechanism.
򐂰 The unavailability of the second copy during resynchronization fits within your
service-level objectives.
Consider IBM TotalStorage Enterprise Storage Server (ESS) peer-to-peer remote copy
(PPRC) with IASP and Toolkit when:
򐂰 You desire a storage-based solution for DR, especially if multiple platforms are involved.
򐂰 You do not need complete high availability (HA), but seek to cover DR and some planned
outages for critical application data.
򐂰 Recovery times of one hour or more are acceptable. (Actual recovery times can be less.)
򐂰 You want two copies of data, but do not need concurrent access to a second copy.
򐂰 Geographic dispersion between copies is needed, but your distance requirement does not
adversely impact your acceptable production performance goals. Alternatively, consider
PPRC Global Mirror (asynchronous PPRC).
򐂰 Transaction-level integrity is important for all objects. You need availability of all object
changes with no loss of in-flight data.
Use a combination solution when no single solution meets all of your business continuity
requirements.
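As a rough illustration of how these guidelines can be applied, the following Python sketch maps a few simplified yes/no requirements onto the four technologies. It is only a sketch under the stated assumptions; a real selection weighs all of the factors in Table 6-1 and Appendix A and may still lead to a combination of technologies.

# Minimal decision sketch based on the guidelines in this chapter.
# The questions and their mapping to technologies are a simplification.
def suggest(needs_dr: bool,
            needs_concurrent_access: bool,
            same_site_only: bool,
            storage_based_dr: bool) -> str:
    if storage_based_dr:
        return "ESS PPRC with IASP and toolkit"
    if needs_concurrent_access:
        return "Logical replication"          # only option with a usable second copy
    if same_site_only and not needs_dr:
        return "Switchable IASPs"             # one copy, simple, low cost
    if needs_dr:
        return "Cross-site mirroring (XSM)"   # system-generated second copy
    return "Combination of solutions"

# Example: DR is required; no concurrent access to the copy is needed.
print(suggest(needs_dr=True, needs_concurrent_access=False,
              same_site_only=False, storage_based_dr=False))
# -> Cross-site mirroring (XSM)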
Appendix A. Decision factors
A user employs the decision factors in some form of decision tree to determine which data
resilience or replication mechanism best suits the user’s business continuity needs. The
group categorized as primary decision factors are those requirements that are most likely to
be common across a wide user audience. They are also most likely to carry more weight in
the decision process. An underlying assumption for all of these is that the mechanism does
not compromise the integrity of the data.
The group categorized as supporting decision factors are normally, but not always, secondary
requirements in determining an availability solution.
Primary decision factors
The primary decision factors are based on the following criteria.
Up-time requirements
Up-time requirements refer to the total amount of time that the system is available for
end-use applications. The value is stated as a percentage of total scheduled working hours.
Typically, the cost per outage hour is used as a determining factor in up-time requirements.
The values, with the corresponding downtime for a 24x365 shop, are listed below; a small
sketch after the list shows the arithmetic:
򐂰 <90% (downtime of 876 or more hours (36 days)/year)
򐂰 90 to 95% (downtime of 438 to 876 hours/year)
򐂰 95 to 99% (downtime of 88 (3.6 days) to 438 hours/year)
򐂰 99.1 to 99.9% (downtime of 8.8 to 88 hours/year)
򐂰 99.99% (downtime of about 50 minutes/year)
򐂰 99.999% (downtime of about 5 minutes/year)
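The downtime figures follow directly from the availability percentage. A minimal Python sketch of the arithmetic for a 24x365 shop:

# Minimal sketch: convert an availability percentage into annual downtime,
# matching the figures listed above.
HOURS_PER_YEAR = 24 * 365  # 8,760 hours

def downtime_hours(availability_percent: float) -> float:
    return HOURS_PER_YEAR * (1 - availability_percent / 100)

for pct in (90, 95, 99, 99.9, 99.99, 99.999):
    print(f"{pct}% available -> {downtime_hours(pct):.1f} hours/year down")
# 99.99% -> about 0.9 hours (roughly 50 minutes); 99.999% -> about 5 minutes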
Recovery time objective
The recovery time objective (RTO) indicates the length of time it takes to recover from an
outage (scheduled, unscheduled, or disaster) and to resume normal operations for an
application or a set of applications. The RTO may be different for scheduled and unscheduled
outages.
The values are:
򐂰 More than 4 days is acceptable
򐂰 1 to 4 days
򐂰 <24 hours
򐂰 <4 hours
򐂰 <1 hour
򐂰 Approaching zero (near immediate)
Recovery point objective
The recovery point objective (RPO) indicates the point in time to which recovery processing
returns the end users, from both a data and an application perspective.
The values are:
򐂰 Last save (weekly, daily, ...)
򐂰 Start of last shift (8 hrs)
򐂰 Last major break (4 hrs)
򐂰 Last batch of work (1 hour to tens of minutes)
򐂰 Last transaction (seconds to minutes)
򐂰 In-flight changes may be lost (power loss consistency)
򐂰 Now (near immediate)
Resilience requirements
The resilience requirements are the set of information and system capabilities required to be
made resilient. These entities remain available (for example via failover) even when the
system currently hosting them experiences an outage.
The values are:
1. Nothing needs to be made resilient
2. Application data
3. Application and system data
4. Application programs plus item 3
5. Preserve application state plus item 4
6. Preserve the application environment plus item 5
7. Preserve all communications, hardware devices, user clients plus item 6
Concurrent access requirements
Concurrent access requirements indicate the level of access that is required to secondary
copies of the data for other work activity offloaded from primary copies, such as saves and
batch reports. You must consider at least the frequency, duration, and access method.
The values are:
򐂰 None
򐂰 Seldom and during periods of non-production
򐂰 Infrequent but during normal production for short (seconds to minutes) durations
򐂰 Infrequent but during normal production for long durations
򐂰 Frequently during production for short durations
򐂰 Frequently during production for long durations
򐂰 Nearly all the time (near continuous)
The values for the access method are:
򐂰 None
򐂰 Read only
򐂰 Read with limited update
򐂰 Simultaneous concurrent update
Geographic dispersion requirements
Geographic dispersion requirements indicate the proximity that is required for secondary
copies of data, system services, and application environment with respect to the location of
the production version.
The values are:
򐂰 Same room
򐂰 100 meters or less (same building)
򐂰 100 to 200 meters (across the street)
򐂰 Less than 2 km (across the campus)
򐂰 2 km to 20 km (same city)
򐂰 20 km to 40 km (same region)
򐂰 >40 km but <=100 km
򐂰 100s of km
Tolerance for end user disruption (slowdowns, failover time, manual
restart procedures, and so on)
This factor indicates which impacts, caused by the availability solution and the corresponding
recovery processing, are acceptable to application end users.
The values for solution impact are:
򐂰 Not an issue (availability is of primary importance; performance can be affected as long
as the availability solution is delivered)
򐂰 Some performance degradation is acceptable
򐂰 Slight degradation in performance
򐂰 No perceived performance impact
The values for recovery processing impact are:
򐂰 Each user must manually restart applications and determine where to reposition
򐂰 Automatic application restart but end users must determine where to reposition
򐂰 Automatic restart and users are automatically repositioned to the last main menu
(application bookmarking)
򐂰 Automatic restart and automatic reposition to last transaction boundary (commit control)
򐂰 Automatic restart and automatic reposition to the last screen with all non-committed data
shown
Outage type coverage (planned, unplanned, disaster)
This factor indicates the types of outages for which recovery is to be provided. There are
implied levels of granularity for each of these. The user should itemize all specific outages
requiring protection.
The values are:
򐂰 None
򐂰 Site disasters only (building outage, regional disasters, and so on)
򐂰 Scheduled outages only (software maintenance, release upgrades, and so on)
򐂰 Unscheduled outages only (hardware device failure, system hardware failure, power
failure, software/application failures, human errors, and so on)
򐂰 All scheduled and unscheduled outages
Cost
Cost requirements indicate the allowed impact for solution cost on adoption and deployment.
Solution cost is the total cost of ownership which includes the initial cost to procure and
deploy the solution, the ongoing costs to use the solution, and any cost/performance impacts.
Cost is typically predicated on a thorough business impact analysis.
The values are:
򐂰 Cost is not a factor.
򐂰 Cost has slight bearing on the decision.
򐂰 Based on outage analysis, the solution cost must be contained within some budget.
򐂰 Cost is a significant factor in the decision.
򐂰 Unwilling or unable to spend anything on an availability solution.
Service and support
A high availability (HA) solution consists of both function and service. Due to the varying
customer environments in the market, along with the varying degrees of completeness of
solution types as discussed in this paper, service is a primary consideration factor to be used
in the selection of a solution. There are two types of services as explained in the following
sections.
The solution provider must be able to provide customer satisfaction data gathered using one
of the standard methodologies, such as a net satisfaction index (NSI). The provider must also
document a full suite of both partner and customer education resources on the topics of
deployment and utilization of the products being marketed in a given region.
Project management, planning, training, and testing
This is the deployment aspect of an HA service offering. It must be done by specialists who
are certified by the solution provider company. Also the solution provider company must be
able to provide the information about the certification or training process.
A project plan is the basis for an availability solution deployment. The solution provider must
demonstrate a complete project plan with deliverables, time tables, dependencies, and
owners as well as the metrics that define completion of the deployment of the project.
The number of certified specialists is also a critical factor. A typical HA solution project can
take around 30 person-days spread over three months. Therefore, it is critical that the solution
provider has sufficient resources available to completely implement a given HA project plan in
the time budgeted for the project.
Technical support
This service is provided by certified support specialists. The solution provider must offer
local-language customer support that is available 24x7 at a regional level. Due to the
sensitive nature of a high availability solution, it is important that
the solution provider demonstrate sufficient staffing resource to cover critical situations
simultaneously along with a documented procedure for escalation and closure.
Supporting decision factors
The following decision factors sometimes serve as the basis for additional requirements. You
can use them to help determine the correct data resilience solution.
Keep in mind that, for some situations, one or more of these factors may actually be primary
decision factors. Determine the correct set that fits your specific environment.
򐂰 Downtime window availability (nights, weekends, holiday weekends, and so on, or never)
򐂰 Switch frequency: How often you plan on exercising outage protection for planned
maintenance, such as weekly or monthly
򐂰 Save processing objectives, procedures, and so on
򐂰 Restore processing objectives, procedures, and so on
򐂰 Transaction boundary requirements
򐂰 Workload balancing objectives (movement of application or data to achieve desired
workload objectives)
򐂰 Usability of solution and associated skill requirements
This includes susceptibility to user errors or failure because of complex manual
procedures. It also includes the degree of automation required to achieve the desired
solution for this and other processing, as well as the amount of technical training and
depth of knowledge of the solutions required. This factor also covers the complexity of
deployment: that is, whether the solution can be easily installed, configured, and
implemented.
򐂰 Complementary services and support available
򐂰 The number of copies of the data that are needed (1, 2, more than 2)
򐂰 Reasons for having additional copies of data (concurrent save, offloading non-production
work to backup system, business intelligence, and so on)
򐂰 Configuration flexibility: Interconnection requirements, number of systems involved,
backup order, or centralized enterprise solution
򐂰 Objects covered or supported
򐂰 Performance factors (CPU overhead, response time, throughput, batch processing time,
and so on)
򐂰 Switch frequency (how often you will do a planned switchover)
򐂰 Integration of the solution into existing or planned environments
Appendix B. Cautions, caveats, and other tips
The tips, cautions, and caveats presented in this appendix may help you determine your
business continuity needs. Where appropriate, enlist the services of a qualified high
availability (HA) specialist to guide you through all aspects of establishing and implementing
an HA solution.
In general, when you change your IT environment, you may need to re-evaluate and possibly
adjust your business continuity solutions. Adding or changing hardware or applications may
affect what you have currently deployed.
Basic single system availability
For basic single system availability, consider the following points:
򐂰 Any good business continuity implementation is grounded in basic single system
availability characteristics. Use such techniques as journaling, commitment control, and
predictive error analysis to provide a strong foundation for any of the problem categories
described in Chapter 2, “Major business continuity problem sets” on page 3.
򐂰 Ensure that you employ the appropriate hardware-based protection mechanisms to avoid
outages caused by single points of failure. For example, determine whether you should
use a direct access storage device (DASD) protection mechanism, such as mirroring or
RAID, or dual power sources.
򐂰 Journal objects for single system integrity and recovery. Journaling is supported for
database as well as integrated file system (IFS) objects, data areas, and data queues. If
you are using logical replication, journaling enables real-time replication of new and
changed data at a record level. You need to achieve the appropriate balance between
increased journaling overhead and IPL recovery times. Therefore, do not journal objects
where data integrity and recovery are not an issue (such as for temporary files) or when
you do not need real-time, record-level replication.
Backup window reduction
For backup window reduction, note the following tips:
򐂰 If using FlashCopy, ensure that all of your applications are at a quiesce point (for example,
by ending the applications) before you use FlashCopy. This is the safest way to ensure the
consistency of the saved data.
򐂰 If you are using logical object replication, you can perform backups on the backup system
instead of the primary system. However, you must still ensure that the save is from a
known recovery point on the target system. You can achieve a quiesce point either by
quiescing your applications or, more typically, by suspending replication until all changes
are applied on the backup system. A save can be initiated at this point on the backup
system.
Then replication can be restarted so that the changes are again sent to the backup
system; however, the changes can be held and not applied until the save is completed.
Complete the save as quickly as possible and return to normal replication (a sketch of this
sequence follows) so that:
– Your replication processing does not fall too far behind.
– You do not run an exposed system (if you are also doing HA).
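The following Python sketch outlines that save sequence for a logical replication environment. It is a minimal illustration only: the object and method names (replication, suspend_apply, and so on) are hypothetical placeholders, not an actual HABP product API.

# Minimal sketch of the save-on-backup sequence described above.
def save_on_backup(replication, backup_system):
    # Reach a known recovery point: apply everything already received.
    replication.apply_all_pending_changes()
    # Hold further applies; the source keeps transmitting changes.
    replication.suspend_apply()
    try:
        # Save the quiesced copy on the backup system.
        backup_system.save_to_tape()
    finally:
        # Return to normal replication as quickly as possible.
        replication.resume_apply()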
Multisystem HA solution
For a multisystem HA solution, consider that:
򐂰 The best HA solutions are grounded on clustering technology. This enables better
automation, single point of control, system detection of failures, coordinated failover, and
coordinated switchover.
򐂰 Any good HA solution is exercised regularly. If you are not doing regular switching from
your primary server to your backup server, then you may not have the desired level of
availability in practice.
Planned maintenance
When performing planned maintenance, keep in mind the following points:
򐂰 In general, regardless of the data resilience mechanism chosen, it is always best to
quiesce application end users prior to a planned outage. This ensures the highest
predictability of data consistency.
򐂰 If you are using the IBM TotalStorage Enterprise Storage Server (ESS) toolkit with
peer-to-peer remote copy (PPRC) for planned maintenance, additional considerations
apply. You can switch over to the secondary copy of the independent auxiliary storage pool
(IASP) on the second ESS box, perform planned maintenance, and then switch back
through the use of the toolkit services. The switch processing is somewhat longer and
more complicated than the other mechanisms indicated.
򐂰 Determine if multiple copies of data and multiple targets for application hosting are needed
to provide resilience in planned outage scenarios. If exposure to an outage of the
production system during the time that the backup system is offline is assessed as too
risky, then use multiple targets for the application and data. Alternatively, you may use a
combination of solutions to avoid a loss of protection.
Recovery for disaster outages
Always exercise your disaster recovery plan. It is not enough simply to set up an environment
for disaster recovery. The plan must be exercised to ensure that the automated and manual
processing involved yields the desired results and that operations staff are familiar with the
plan.
Glossary
application resilience The application itself can
continue to provide end-user services even if the system
that originally hosted the application fails.
asynchronous remote journaling The journal entry is
replicated to the target system after control is returned to
the application that deposits the journal entry on the
source system.
business continuity The capability of a business to
withstand outages and operate important services
normally and without interruption in accordance with a
predefined service-level agreement.
cluster A collection of complete systems that work
together to provide a single, unified computing capability.
An iSeries cluster is made up of only iSeries servers.
cluster node A system that is a member of a cluster.
cluster resource group (CRG) A collection of related
cluster resources that defines actions to be taken during a
switchover or failover operation of the access point of
resilient resources. The group describes a recovery
domain and supplies the name of the CRG exit program
that manages the movement of an access point.
cross-site mirroring (XSM) A function of i5/OS High
Availability Switchable Resources (Option 41) that
provides geographic mirroring and the services to switch
over, or automatically cause failover, to a secondary copy,
potentially at another location, in the event of an outage at
the primary location.
data resilience The data remains accessible to the
application even if the system that originally hosted the
data fails.
disaster recovery The set of resources, plans,
services, and procedures used to recover mission critical
applications and to resume normal operations for these
applications at a remote site.
ESS Enterprise Storage Server; see IBM TotalStorage
Enterprise Storage Server.
failover A cluster event that causes cluster-critical
resources (for example, data and applications) to switch
over from the primary server to a backup system due to
the failure of the primary server or of the resource.
FlashCopy A hardware-based copy function that
provides a point-in-time volume copy within a single ESS.
geographic mirroring A subfunction of XSM that
generates a mirror image of an independent disk pool on
a system, which is (optionally) geographically distant from
the originating site for availability or protection purposes.
HABP See High Availability Business Partner.
high availability The ability to withstand all outages
(planned, unplanned, and disasters) and to provide
continuous processing for all mission critical applications.
High Availability Business Partner (HABP) Provides a
set of software, services, and solutions that enable high
availability for data and applications.
high-speed link (HSL) loop The system-to-tower
connectivity technology that is required to implement
switchable independent disk pools that reside on an
expansion unit (tower). The servers and towers in a
cluster using resilient devices on an external tower must
be on an HSL loop connecting with HSL cables.
IASP See independent auxiliary storage pool.
IBM TotalStorage Enterprise Storage Server (ESS)
Consists of a storage server and attached disk storage
devices. The storage server provides integrated caching
and RAID support for the attached disk devices. The disk
devices are attached via a Serial Storage Architecture
(SSA) interface. The ESS can be configured in a variety of
ways to provide scalability in capacity and performance.
independent auxiliary storage pool (IASP) Also
known as independent disk pool. One or more storage
units that are defined from the disk units or disk-unit
subsystems that make up addressable disk storage. An
independent disk pool contains objects, the directories
that contain the objects, and other object attributes such
as authorization and ownership attributes. An independent
disk pool can be made available (varied on) and made
unavailable (varied off) without restarting the system. An
independent disk pool can be either switchable among
multiple systems in a clustering environment or privately
connected to a single system.
journal A system object that identifies the objects being
journaled, the current journal receiver, and all the journal
receivers on the system for the journal. Journaling is the
process of recording, in a journal, the changes made to
objects, such as physical file members or access paths, or
the depositing of journal entries by system or user functions.
logical replication The process of generating a second
copy of data that is logically identical to the first. The
replication is done on an object basis (file, member, data
area, program, and so on) near real time.
outage (planned, unplanned, disaster) An event that
causes disruptive loss of IT resources. The outage can be
planned, unplanned, or the result of a disaster. Planned
outages include scheduled maintenance of systems and
software. Unplanned outages include unrecoverable
failures of hardware or software components as well as
environmental disruptions such as intermittent power
loss. Disasters typically result in loss of an entire site due
to natural events (such as a flood or hurricane) or human
caused events (for example, sabotage).
peer-to-peer remote copy (PPRC) A hardware-based
remote copy option that provides a synchronous volume
copy across storage subsystems for disaster recovery,
device migration, and workload migration.
PPRC See peer-to-peer remote copy.
recovery point objective (RPO) The point in time to
which recovery processing returns the end users (from
both a data and an application perspective).
recovery time objective (RTO) The length of time it
takes to recover from an outage and resume normal
operations for an application or a set of applications.
remote journal Remote journal management allows
you to establish journals and journal receivers on a
remote system or to establish journals and journal receivers on
independent disk pools that are associated with specific
journals and journal receivers on a local system. The
remote journaling function can replicate journal entries
from the local system to the journals and journal receivers
that are located on the remote system or independent disk
pools after they are established. Delivery of the journal
entries can be either synchronous or asynchronous. See
synchronous remote journaling and asynchronous remote
journaling.
resilience The ability of an object to recover readily, as
from failure, and to return to an original condition. See
also application resilience and data resilience.

RPO See recovery point objective.

RTO See recovery time objective.

SAN See storage area network.

SMAPP See system-managed access-path protection.

storage area network (SAN) A managed, high-speed
network that enables any-to-any interconnection of
heterogeneous servers and storage systems.

switchable device The physical resource that contains
a resilient resource, such as an independent disk pool,
and that can be switched between systems in a cluster.
The device can be an expansion unit that contains disk
units in a multiple-system environment. It can also be an
input/output processor (IOP) that contains disk units in a
logically partitioned (LPAR) environment.

switchover A cluster event that causes a cluster-critical
resource (for example, data or an application) to switch
over from the primary server to a backup system due to
manual intervention from the cluster management
interface. The resource becomes available for processing
on the backup system.

synchronous remote journaling The journal entry is
replicated to the target system concurrently with the entry
being written to the local receiver on the source system.

system In the context of this paper, a system is the set
of hardware and software that delivers an operational
environment for one or more applications. Each system
has exactly one operating system. A system may
therefore be a standalone server with a single operating
system, or it may be an LPAR in an LPAR environment.

system-managed access-path protection
(SMAPP) An i5/OS function that allows a user to specify
a goal for the maximum amount of time that the system
should use to recover access paths after an abnormal
system end. The system automatically protects access
paths so that they can be recovered within the time
specified.

XSM See cross-site mirroring.
Back cover
Data Resilience Solutions for IBM i5/OS High Availability Clusters
Redpaper
Choosing the correct set of data resilience technologies in the context
of your overall business continuity strategy can be complex and
difficult. It is a given that business continuity is an extremely broad
topic. This IBM Redpaper provides insight into this broad topic and
describes the technologies that provide improved data resilience for
end users. It explains the capabilities, advantages, and limitations of
the various technologies. Plus it provides information and techniques
that you can use to select the best set of technologies available on
the IBM eServer i5, in conjunction with IBM i5/OS high availability
clusters, to satisfy your specific business continuity goals.
This IBM Redpaper is organized so that you can study the content
from cover-to-cover (recommended) or use specific sections for
reference as needed. It begins with a discussion about the business
continuity requirements. This information helps you to determine and
prioritize your business requirements in the context of the specific
problem sets of interest to you.
Next this IBM Redpaper presents an overview of the technologies that
are related to business continuity. This helps you to understand the
technology categories and choices within each category that are
available to address the problem sets. Then, the paper explains how
the technologies apply to the business continuity requirements and
compare with one another. A detailed analysis helps position the
various technologies against your business requirements. Finally,
some conclusions are drawn about the technologies that map
solutions to the characteristics of end-user environments.
INTERNATIONAL
TECHNICAL
SUPPORT
ORGANIZATION
BUILDING TECHNICAL
INFORMATION BASED ON
PRACTICAL EXPERIENCE
IBM Redbooks are developed
by the IBM International
Technical Support
Organization. Experts from
IBM, Customers and Partners
from around the world create
timely technical information
based on realistic scenarios.
Specific recommendations
are provided to help you
implement IT solutions more
effectively in your
environment.
For more information:
ibm.com/redbooks