IBM Power Systems
High Availability Considerations: IBM i
Erik Rex, Cert Consultant Specialist, IBM Danmark ApS, rex@dk.ibm.com
© 2012 IBM Corporation

Downtime
[Figure: causes of downtime and the solution required for each: Disaster Recovery, or High Availability (Continuous Operations). Source: IBM HA Presentation, Eric Hess, April 2001]
Downtime refers to a period of time, or a percentage of a time span, during which a machine or system (usually a computer server) is offline or not functioning, usually as a result of either a system failure (such as a crash) or routine maintenance.
* "HA" generally refers to solutions that provide BOTH recovery and availability. Not all technologies provide a solution for both; iTera 5.0 HA does.
Reliability is not the same as Availability!

Business Continuity is:
The capability of a business to withstand outages and operate mission-critical services normally and without interruption, per a pre-defined Service Level Agreement
– The solution must address data, the operational environment, applications, the application hosting environment, and the end-user interface
– It requires a collection of services, software, hardware, and procedures to be selected, described in a documented plan, implemented, and practiced regularly
Includes both Disaster Recovery (DR) and High Availability (HA)
– DR addresses the set of resources, plans, services, and procedures to recover and resume mission-critical applications at a remote site in the event of a disaster
– HA is defined as the ability to withstand all outages (planned, unplanned, and disasters) and to provide continuous processing for all mission-critical applications

What is a Service Level Agreement (SLA)?
General:
– A contractual service commitment
– A document that describes the minimum performance criteria a provider promises to meet while delivering a service
– Typically also sets out the remedial action and any penalties that take effect if performance falls below the promised standard
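An availability commitment in an SLA is usually stated as a percentage of a time span, which maps directly onto the definition of downtime given above. The Python sketch below converts such a percentage into the downtime it allows. It assumes a 365-day year and that the percentage applies to every hour of that year (many real SLAs exclude scheduled maintenance windows); the function name is illustrative and not part of any IBM tool.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: 365-day year, no leap day


def allowed_downtime_minutes(availability_pct: float,
                             period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the downtime, in minutes, that an availability target permits
    over a period.

    Assumes the target covers every hour of the period, i.e. no scheduled
    maintenance windows are excluded (a simplification).
    """
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    unavailable_fraction = 1.0 - availability_pct / 100.0
    return unavailable_fraction * period_hours * 60.0


if __name__ == "__main__":
    # Print the yearly downtime budget for a few common availability targets.
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% availability -> {minutes:.1f} min/year ({minutes / 60:.2f} h)")
```

Under these assumptions a 99.9% target allows roughly 8.8 hours of downtime per year, while 99.99% allows under an hour, which illustrates how quickly the tolerated outage shrinks as the SLA target rises.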
SLA relative to Availability:
– A commitment to the business describing the level of availability for IT services that support critical business solutions
– Addresses when IT services are expected to be fully operational, when they may be running degraded, and when they won't be available
– Driven primarily by the importance of IT services in providing business solutions, by cost factors, and by realism; many factors are involved

Application Resilience
Combine with Data Resilience for a complete solution
Fully transparent:
• Full resilience with automatic restart & transparent failover
• Users repositioned to the last committed transaction
• No data loss, no sign-on required, no perceived loss of server; only a delay in response
Semi-transparent:
• Automatic application restart & recovery to the last transaction boundary
• The resilient data & the application restart point match exactly
Semi-automatic:
• Automatic application restart & recovery to some architected application "restart" point
• Normally consistent with the state of the data, but the user may have to manually match the application to the position of the data
Basic application failover:
• Automatic application restart after an outage
• User manually repositions within the application
No application recovery:
• Users manually restart the application with resilient data
• User determines where to resume work
[Figure callouts, from most to least transparent: "Huh? Did something happen?", "Checkpoint restart. Not too bad.", "Start over. Where's all my work?"; enabling layers: HA-enabled applications and iSeries clusters, iSeries clusters, data resiliency on a single server]

The Fundamentals: Recovery or Continuity?
[Figure: layered building blocks, top to bottom: Clustering; Application Resiliency; Data Resiliency (IASPs, replication); Transaction Integrity (journaling); Data Protection (RAID 5 and mirroring)]
Data replication alone is not sufficient for HA; clustering, automation, and application resiliency complete the equation.

Clustering for HA/DR
Clustering is the default deployment for HA/DR in the mainframe and UNIX marketplace
With PowerHA SystemMirror, clustering is available to our IBM i customers
Provides automated failover with minimal IT operations involvement
Planned and unplanned outage management

Clustering
A property of the operating system
Provides the logical connections between resilient data groups
Can enable the automation of physical and logical switching
Can enable a resilient application to be "switched", activated, and repositioned to a defined state
Enables the automatic sequencing of events that bring the user, application, and data to a coherent production state
Application design is the primary limiting factor
[Figure: cluster architecture layers: Application Resiliency (high-availability cluster-enabled applications); Data Resiliency (replication and switched IASPs); Cluster Management (iSeries Navigator or partner products); APIs; Cluster Resource Services, the base IBM i cluster functions from IBM (heartbeating, IP address takeover, reliable internal cluster communications, switchover administration, distributed activities)]

Data Resilience Technologies
Logical Replication – Business partner software product
• Vision Solutions, Quick EDD, iCluster …
Switchable Device – Switchable IASPs
Operating system storage management based replication – PowerHA SystemMirror Cross-Site Mirroring (XSM) with Geographic Mirroring
SAN storage server based replication – SAN Metro/Global Mirror used with PowerHA and the IBM i Advanced Copy Services toolkit

Logical Replication
Second copy of data is generated logically identical to the first
Replication done on an object basis (file, member, data area, program, etc.) in near real time
Normally done via a business partner software product
[Figure: production system replicating to a backup (target) system; tape]

Logical Replication
Widely deployed data resiliency topology for Power IBM i
– Typically deployed via an HA Business Partner solution package
– Replication done on an object basis (file, data area, program, etc.) in near real time
• Done at the lowest unit of change for the object, e.g. record level for database files
• Otherwise, done on the entire object when a change is detected by the replication software
• Solution packages can use IBM i Remote Journaling as an efficient, reliable transport mechanism
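The record-level, near-real-time replication described above can be pictured as an apply process on the backup system that replays captured changes in the order they occurred on the source. The Python sketch below is a hypothetical model for illustration only: the `JournalEntry` and `ReplicaApplier` names, the PUT/DEL operations, the `CUSTMAST` file name, and the sequence-gap check are assumptions, not the IBM i journal entry layout, the remote journal APIs, or the internals of any business partner product.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class JournalEntry:
    """Hypothetical, simplified record-level change captured on the source.

    Real IBM i journal entries carry much more information; this models
    only what the sketch needs.
    """
    seq: int                            # sequence number assigned on the source
    file: str                           # replicated database file
    key: str                            # record key within the file
    op: str                             # "PUT" (insert/update) or "DEL" (delete)
    after_image: Optional[dict] = None  # record contents after the change


class ReplicaApplier:
    """Replays record-level changes on the backup copy in source order."""

    def __init__(self) -> None:
        self.files: Dict[str, Dict[str, dict]] = {}  # file -> key -> record
        self.last_applied = 0                        # highest seq applied so far

    def apply(self, entries: Iterable[JournalEntry]) -> None:
        # Changes must be applied in the order they happened on the source,
        # otherwise an older after-image could overwrite a newer one.
        for entry in sorted(entries, key=lambda e: e.seq):
            if entry.seq <= self.last_applied:
                continue  # duplicate, e.g. resent after a link interruption
            if entry.seq != self.last_applied + 1:
                # A missing entry means the replica would silently diverge.
                raise RuntimeError(
                    f"sequence gap: expected {self.last_applied + 1}, got {entry.seq}")
            records = self.files.setdefault(entry.file, {})
            if entry.op == "PUT":
                records[entry.key] = dict(entry.after_image or {})
            elif entry.op == "DEL":
                records.pop(entry.key, None)
            else:
                raise ValueError(f"unknown operation {entry.op!r}")
            self.last_applied = entry.seq

    def lag(self, source_last_seq: int) -> int:
        """Number of source changes not yet applied on the backup."""
        return source_last_seq - self.last_applied


if __name__ == "__main__":
    applier = ReplicaApplier()
    # Entries may arrive out of order; CUSTMAST is a hypothetical file name.
    applier.apply([
        JournalEntry(2, "CUSTMAST", "C001", "PUT", {"name": "Acme", "credit": 500}),
        JournalEntry(1, "CUSTMAST", "C001", "PUT", {"name": "Acme", "credit": 100}),
        JournalEntry(3, "CUSTMAST", "C002", "PUT", {"name": "Beta", "credit": 50}),
    ])
    print(applier.files["CUSTMAST"]["C001"]["credit"])  # 500
    print(applier.lag(source_last_seq=5))               # 2
```

In this model, the `lag` value is the gap behind the "lag time between changes on source being available on backup server" consideration for logical replication: any change not yet applied is exposed if the source fails. The strict ordering check reflects why ordering of changes is one of the characteristics used to compare the technologies.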
Benefits of logical replication:
– Rapid activation of the production environment on the backup server via a role-swap operation
– Replicated data can be concurrently accessed for backups or other read-only applications
– Minimal recovery is needed when switching over to the backup copy
Considerations:
– Complexity of setup and maintenance
– Modification of "live" copies of objects on the backup server
– Lag time between changes on the source and their availability on the backup server
– Consistency between journaled and non-journaled objects

Switchable IASPs
Independent Auxiliary Storage Pools (IASPs)
– IBM i Option 41 – High Availability Switchable Resources
– Switch disks from one system to another
Benefits:
– Simplicity
– Data is always current (no copy to synchronize)
– No in-flight data to lose
– Minimal performance overhead
– Supports integrated environments (Windows, Linux) as well as IBM i
Considerations:
– Setup of DASD configuration, data, and application structure
– Single copy of data (mirroring recommended to protect data and reduce single points of failure)
– No concurrent access from both hosts
– Hardware restrictions (distance, concurrent maintenance)

Switchable Devices
PowerHA Basic Concepts (for IBM i clients)
[Figure: cluster nodes in an admin domain sharing an IASP that holds the application data (also known as a volume group)]
The underlying data resiliency is not based on replication; it is based on a pool of disk that is shared and switchable between nodes in the cluster.
PowerHA SystemMirror enables the cluster nodes to act as resources for the applications in the event of an outage.
The Admin Domain takes care of the SYSBAS data. (FYI: you do not use a software replication product to replicate the Admin Domain data.)

PowerHA SystemMirror Strategy
Deep Integration
– Cluster-aware IBM i
– SLIC-based event processing
– Centralized cluster topology management
Ease of deployment & ease of use
– Systems Director Navigator management
– Discovery-based deployment
– Cluster-wide security
Multi-Site & Disaster Recovery
– Differentiate with IBM storage
– Integrated IBM copy services
Solution Package Optimization
– Standard Edition, Enterprise Edition

PowerHA SystemMirror Cross-Site Mirroring (XSM) with Geographic Mirroring
Second copy of data in an IASP is generated logically identical to the first
Changes to the production IASP are replicated to the second copy of the IASP through another system
Operating system storage management based replication solution
[Figure: primary (source) system with production data mirrored to a backup (target) system holding the mirror copy]

PowerHA Cross-Site Mirroring (XSM) with Geographic Mirroring
Mirroring of IASP data via IBM i storage management to a second server
– XSM included in Option 41 of the OS
– Enables switching or automatic failover to the mirrored copy of the IASP
Benefits:
– Same as switched device
– Two copies of IASP data
– Can be local or remote (sync or async)
– Ease of deployment and operation
– Supports integrated environments (Windows, Linux) as well as IBM i client partitions
– Heartbeat monitoring with automated failover or manual switchover
Considerations:
– Performance impacts of synchronous operation: distance, bandwidth, latency
– Mirror copy cannot be concurrently accessed
– Lengthy full data resynchronization

PowerHA Basic Concepts: Admin domain, Geomirror
Internal disk is not switchable (switching requires LUNs); use geographic mirroring with internal disk configurations.

PowerHA Metro/Global Mirror with IBM i Advanced Copy Services toolkit
Replication of IASP data at the storage controller level to a backup SAN using Metro or Global Mirror
– Metro or Global Mirror generates a second copy of the IASP on another storage server
– The toolkit is part of the Advanced Copy Services for IBM i offering
– Combines Metro/Global Mirror, PowerHA, IASP, and IBM i cluster services
– Coordinated switchover/failover
Benefits:
– Remote copy and coordinated switching without an IPL
– Can combine with FlashCopy for backup window reduction
Considerations:
– Performance impacts of synchronous mode: distance, bandwidth, latency
– Mirror copy cannot be concurrently accessed
– Asynchronous mode requires IBM SAN Global Mirror
– Requires tools and services to deploy

PowerHA Enterprise Edition: Two-Node Cluster
[Figure: two-node cluster spanning a local site and a DR site in one admin domain. Labels: *SYSBAS (Prod), *SYSBAS (Backup I), and *SYSBAS (DR) on POWER7 IBM i; switchable production IASP; Metro Mirror or Global Mirror to a DR IASP; FlashCopy to a flash backup IASP; storage: DS8000, SVC, Storwize V7000 (SVC* and Storwize V7000* at the DR site)]
* Initially available in English only via PRPQ 5799-HAS

PowerHA Enterprise Edition: Three-Node Cluster
[Figure: three-node cluster in one admin domain. Labels: local site with *SYSBAS (Prod) and *SYSBAS (Backup I) on POWER7 IBM i and a switchable production IASP; DR site with *SYSBAS (DR) on POWER7 IBM i; LUN group (active); Metro Mirror or Global Mirror to a DR IASP; FlashCopy to a flash backup IASP; storage at both sites: DS8000, SVC*, Storwize V7000*]

Redundant VIOS I/O Virtualization
Redundant VIOS partitions provide two paths to attached SAN storage
– AIX, IBM i, and Linux partitions
– One set of disk
– Client partitions use MPIO
Redundant VIOS partitions provide access to mirrored SAN storage
– AIX, IBM i, and Linux partitions
– Mirrored set of disk
– Mirroring done by client partitions (e.g., IBM i)
[Figure: two VIOS partitions on the POWER Hypervisor serving client partitions]
Note: Redundant VIOS partitions are not supported on BladeCenter JS12, JS22, JS23, and JS43

PowerVM Can Help Manage Risk
Business and IT security and resiliency are as critical as ever, and must be dynamic and intelligent to match the speed of business change
PowerVM Live Partition Mobility
– Move running IBM i, AIX, and Linux partitions between systems
– Uses VIOS-virtualized SAN and network infrastructure
√ Eliminate planned outages and balance workloads across systems

IBM i Capacity BackUp (CBU) Licensing Example
Planning
CBU allows PowerHA license entitlements to fail over from the registered production server
– A minimum of 1 entitlement is required on the CBU box
– The CBU server allows the temporary transfer of entitlements from the primary server for non-concurrent use on the CBU server
– Round up when using partial processors: 3.5 processors = 4 entitlements
Example
No HA/DR required for Partition 1
– No PowerHA licenses
HA required for Partitions 2 and 3
– All processors in production server partitions 2 and 3 are licensed for PowerHA
– One key, 8 entitlements
– The license key is a permanent key installed on partitions 2 and 3
A single processor is licensed on the CBU server
– One key, one entitlement
– The license key is a temporary key for 8 cores, good for two years, installed on partitions 2 and 3
[Figure: production server and CBU server. Partition #1 unused on the CBU; Partitions #2 and #3 run as active standby on CBU cores. Licensed products: IBM i, 5250, PowerHA]

Key Solutions Comparison
Characteristics:
1. Primary use
2. Characteristic of replication mechanism
3. Recovery time
4. Recovery point
5. Ordering of changes
6. Concurrent access
7. Geographic dispersion
8. Number of backup systems
9. Number of data copies allowed
10. Cost factors
11. End user
12. Outage coverage
13. Cluster-controlled resource
14. Risks
Consider other decision factors

Applicability of Solution to Problem Set
Start to determine possible matches of technologies to specific needs:
1. Initial analysis to eliminate technologies that do not fit
2. After initial analysis, perform detailed analysis of complete requirement sets against the specific characteristics of each technology
[Table: business continuity requirements (backup window reduction, planned maintenance, recovery for disaster outage, HA for unplanned outage, workload balancing) versus data resilience technologies (logical replication, switched disk, PowerHA, PowerHA with Copy Services toolkit), with "n/a" marking combinations that do not apply]

Conclusions
When to consider Logical Replication?
– Need two or more copies of the data
– Want some level of concurrent access to the second data copy
– Need backup window reduction
– Already have a solution deployed using logical object replication
– Need a solution that has no special hardware configuration requirements
– Transaction-level integrity is important for all journaled objects
When to consider Switchable IASPs?
– A single copy of data meets requirements, and exposure to disk subsystem failures has been addressed
– Need a very simple, low-cost, low-maintenance solution
– No need for a DR solution
– Source and target systems will be at the same site
– Want consistent failover/switchover times within minutes, independent of transaction volumes
– Need transaction-level integrity for all objects; no loss of in-flight data
– Need the highest-throughput environment
– Need multiple, independent databases that can be moved between systems

Conclusions (2)
When to consider PowerHA SystemMirror (Cross-Site Mirroring)?
– Want a system-generated second copy of the data (at an IASP level)
– Need two copies of data, but do not need concurrent access to the second copy
– Want a relatively low-cost, low-maintenance solution, but also need disaster recovery
– Want consistent failover/switchover times within minutes, independent of transaction volumes
When to consider PowerHA Metro/Global Mirror with IASP and Toolkit?
– Want a storage-based solution for HA, especially if multiple platforms are involved
– Want consistent failover/switchover times within minutes, independent of transaction volumes
– Need two copies of data, but do not need concurrent access to the second copy

Conclusions (3)
When to consider a combination solution?
– When no single solution meets all of your business continuity requirements

Summary: Clustering PLUS Data replication PLUS Cluster-enabled replication EQUALS "real" HA
[Figure: cluster architecture layers: Application Resiliency (high-availability cluster-enabled applications); Data Resiliency (replication and switched IASPs); Cluster Management (iSeries Navigator or partner products); APIs; Cluster Resource Services, the base IBM i cluster functions from IBM (heartbeating, IP address takeover, reliable internal cluster communications, switchover administration, distributed activities)]
Data replication alone is not sufficient for HA; clustering, automation, and application resiliency complete the equation.

PowerHA for IBM i Resources
PowerHA website – www.ibm.com/systems/power/software/availability/
PowerHA Options for IBM i – Introduction (incl. 7.1) – http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/PRS4021
PowerHA and DS8000 Storage Integration on IBM i – http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/PRS4361
Lab Services – http://www-03.ibm.com/systems/services/labservices
Redbooks at www.redbooks.ibm.com
– PowerHA SystemMirror for IBM i Cookbook – SG24-7994 (Jan 2012)
– Implementing PowerHA for IBM i – SG24-7405-00 (Nov 2008)
– Clustering and IASPs for Higher Availability – SG24-5194-01
– Independent ASPs: A Guide to Moving Applications to IASPs – SG24-6802-00
– Independent ASP Performance Study on the IBM iSeries – REDP-3771-00
– Implementing SAP Applications on the IBM System i with IBM i5/OS – SG24-7166-00
IBM System Storage Solutions for IBM i
– Course code: AS930; Duration: 4.0 days
– www-304.ibm.com/jct03001c/services/learning/ites.wss/us/en?pageType=course_description&courseCode=AS930
Is your ISV solution registered as ready for PowerHA?
– http://www-304.ibm.com/isv/tech/validation/power/index.html
High Availability Clusters (PowerHA) and Independent Disk Pools for IBM i
– Course codes: AS541, OS830; Duration: 4.0 days
– www-304.ibm.com/jct03001c/services/learning/ites.wss/us/en?pageType=course_description&courseCode=AS541
Risk Self Assessment:
– www.ibm.com/smarterplanet/us/en/business_resilience_management/overview/index.html?re=2brf24
© Copyright IBM Corporation 2011

Special notices
This document was developed for IBM offerings in the United States as of the date of publication. IBM may not make these offerings available in other countries, and the information is subject to change without notice. Consult your local IBM business contact for information on the IBM offerings available in your area.

Information in this document concerning non-IBM products was obtained from the suppliers of these products or other public sources. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
IBM may have patents or pending patent applications covering subject matter in this document. The furnishing of this document does not give you any license to these patents. Send license inquiries, in writing, to IBM Director of Licensing, IBM Corporation, New Castle Drive, Armonk, NY 10504-1785 USA.

All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.

The information contained in this document has not been submitted to any formal IBM test and is provided "AS IS" with no warranties or guarantees either expressed or implied.

All examples cited or described in this document are presented as illustrations of the manner in which some IBM products can be used and the results that may be achieved. Actual environmental costs and performance characteristics will vary depending on individual client configurations and conditions.

IBM Global Financing offerings are provided through IBM Credit Corporation in the United States and other IBM subsidiaries and divisions worldwide to qualified commercial and government clients. Rates are based on a client's credit rating, financing terms, offering type, equipment type and options, and may vary by country. Other restrictions may apply. Rates and offerings are subject to change, extension or withdrawal without notice.

IBM is not responsible for printing errors in this document that result in pricing or information inaccuracies.

All prices shown are IBM's United States suggested list prices and are subject to change without notice; reseller prices may vary.

IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.

Any performance data contained in this document was determined in a controlled environment. Actual results may vary significantly and are dependent on many factors including system hardware configuration and software design and configuration.
Some measurements quoted in this document may have been made on development-level systems. There is no guarantee these measurements will be the same on generally available systems. Some measurements quoted in this document may have been estimated through extrapolation. Users of this document should verify the applicable data for their specific environment.
Revised September 26, 2006

Special notices (cont.)
IBM, the IBM logo, ibm.com, AIX, AIX (logo), AIX 5L, AIX 6 (logo), AS/400, BladeCenter, Blue Gene, ClusterProven, DB2, ESCON, i5/OS, i5/OS (logo), IBM Business Partner (logo), IntelliStation, LoadLeveler, Lotus, Lotus Notes, Notes, Operating System/400, OS/400, PartnerLink, PartnerWorld, PowerPC, pSeries, Rational, RISC System/6000, RS/6000, THINK, Tivoli, Tivoli (logo), Tivoli Management Environment, WebSphere, xSeries, z/OS, zSeries, Active Memory, Balanced Warehouse, CacheFlow, Cool Blue, IBM Systems Director VMControl, pureScale, TurboCore, Chiphopper, Cloudscape, DB2 Universal Database, DS4000, DS6000, DS8000, EnergyScale, Enterprise Workload Manager, General Parallel File System, GPFS, HACMP, HACMP/6000, HASM, IBM Systems Director Active Energy Manager, iSeries, Micro-Partitioning, POWER, PowerExecutive, PowerVM, PowerVM (logo), PowerHA, Power Architecture, Power Everywhere, Power Family, POWER Hypervisor, Power Systems, Power Systems (logo), Power Systems Software, Power Systems Software (logo), POWER2, POWER3, POWER4, POWER4+, POWER5, POWER5+, POWER6, POWER6+, POWER7, System i, System p, System p5, System Storage, System z, TME 10, Workload Partitions Manager and X-Architecture are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A full list of U.S. trademarks owned by IBM may be found at: http://www.ibm.com/legal/copytrade.shtml.

Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.
AltiVec is a trademark of Freescale Semiconductor, Inc.
AMD Opteron is a trademark of Advanced Micro Devices, Inc.
InfiniBand, InfiniBand Trade Association and the InfiniBand design marks are trademarks and/or service marks of the InfiniBand Trade Association.
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.
Linear Tape-Open, LTO, the LTO Logo, Ultrium, and the Ultrium logo are trademarks of HP, IBM Corp. and Quantum in the U.S. and other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries or both.
Microsoft, Windows and the Windows logo are registered trademarks of Microsoft Corporation in the United States, other countries or both.
NetBench is a registered trademark of Ziff Davis Media in the United States, other countries or both.
SPECint, SPECfp, SPECjbb, SPECweb, SPECjAppServer, SPEC OMP, SPECviewperf, SPECapc, SPEChpc, SPECjvm, SPECmail, SPECimap and SPECsfs are trademarks of the Standard Performance Evaluation Corp (SPEC).
The Power Architecture and Power.org wordmarks and the Power and Power.org logos and related marks are trademarks and service marks licensed by Power.org.
TPC-C and TPC-H are trademarks of the Transaction Processing Performance Council (TPC).
UNIX is a registered trademark of The Open Group in the United States, other countries or both.
Other company, product and service names may be trademarks or service marks of others.
Revised December 2, 2010