EMC® RecoverPoint
Release 3.3
Administrator’s Guide
P/N 300-010-641
REV A02

EMC Corporation
Corporate Headquarters: Hopkinton, MA 01748-9103
1-508-435-1000
www.EMC.com

Copyright © 2006 - 2010 EMC Corporation. All rights reserved.
Published March, 2010

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.

All other trademarks used herein are the property of their respective owners.

Contents

Preface
    Introduction
        Audience
        Related documentation
        Conventions used in this documentation
        Where to get help
        Online help

Chapter 1  Concepts
    RecoverPoint product family
        RecoverPoint
        RecoverPoint/SE
    RecoverPoint configurations
        CDP configurations
        CRR configurations
        CLR configurations
    RecoverPoint hardware and software
        RPAs
        Splitters
        RecoverPoint Management Applications
    RecoverPoint logical entities
        Consistency groups
        Copies
        Replication sets
        Journals
        Volumes
        Snapshots
        Links
    RecoverPoint performance
        Application regulation
        Replication modes
        RPO control
        RTO control
        Distributed consistency groups
        Load balancing
    RecoverPoint data recovery procedures
        Image access
        Failover
    RecoverPoint synchronization processes
        Initialization
        Full sweeps
        Volume sweeps
        Long initializations
        Short initializations
        First-time initializations
        Fast first-time initializations
    RecoverPoint data flow
        RecoverPoint replication phases
        The write phase
        The transfer phase
        The distribution phase
    RecoverPoint workflows
        Configuring replication
        Monitoring and managing RecoverPoint
        Moving operations to another site
        Event notification

Chapter 2  Getting Started
    Licensing overview
    The Getting Started Wizard
        Welcome screen
        Account Settings screen
        System Report Settings screen
    Managing RecoverPoint licenses
        Prerequisites
        Defining your license key in RecoverPoint
        Requesting an activation code
        Defining your activation code in RecoverPoint
        Upgrading your license
        Re-activating your license
        Viewing your license information
    Access control
        User authentication
        User authorization

Chapter 3  Starting Replication
    Adding splitters
        How to add splitters to the RecoverPoint system
    Creating new consistency groups
        The New Consistency Group Wizard
    Configuring replication policies
        Configuring consistency group policies
        Configuring copy policies
    Modifying existing settings and policies
        How to modify an existing consistency group
        How to modify an existing copy
        How to modify an existing replication set
        How to modify an existing journal
    Manually attaching volumes to splitters

Chapter 4  Managing and Monitoring
    RecoverPoint Management Application
        The System Pane
        The Traffic Pane
        The Navigation Pane
        The Component Pane
    Monitoring and analyzing system performance
        Monitoring and analyzing system performance

Chapter 5  Testing, Failover, and Migration
    Use cases
        First-time initialization
        First-time initialization from backup
        First-time failover
        Testing a replica
        Offloading a task
        Recovering from a disaster
        Recovering the production source
        Failing over to a replica temporarily
        Routine maintenance on production system
        Migration
    Bookmarking
        Creating a bookmark
        Applying bookmarks to multiple groups simultaneously
        Automatic periodic bookmarking
        Applying bookmarks using KVSS
    Accessing a replica
        Enabling image access
        Direct Image Access
        Image Access Enabled mode
    Failover commands

Chapter 6  Notification of Events
    Configuring event notification
    E-mail notification
    SNMP notification
        SNMP trap configuration
        OMSA support
    Syslog notification
    System reports
        Before you begin
        System report operations
        Best practice
    System alerts
        Before you begin
        System alert operations
    Collecting system information
        Process alternatives
        Process errors
        Splitter credentials
        How to collect system information

Chapter 7  Host Cluster Support
    Configuring RecoverPoint cluster support

Appendix A  Events
    Introduction
    Normal events
    Detailed events

Appendix B  Kutils Reference
    Introduction
        Usage
        Path designations
    Commands
        flushFS
        manage_auto_host_info_collection
        mount
        showFS
        show_vol_info
        show_vols
        sqlRestore
        sqlSnap
        start
        stop
        umount

Appendix C  Troubleshooting
    My host applications are hanging
        When does application regulation happen?
        How does application regulation work?
        How do I know application regulation is happening?
        What can I do to stop my group from being regulated?
    My copy is being regulated
        When does control action regulation happen?
        How do I know control action regulation is happening?
        How does control action regulation work?
        How do I release a copy from control action regulation?
        How do I verify that regulation is over?
    My copy has entered a high load state
        How do I know a copy is experiencing a high load?
        What is a permanent high load?
        When do permanent high loads occur?
        How do permanent high loads work?
        How can I tell a copy is under permanent high load?
        What can I do to come out of permanent high load?
        How do I verify that a permanent high load is over?
        What is a temporary high load?
        When do temporary high loads occur?
        How do temporary high loads work?
        How can I tell a copy is under temporary high load?
        What should I know about temporary high loads?
        How do I verify that a temporary high load is over?
    My RPA keeps rebooting
        When does reboot regulation happen?
        How does reboot regulation work?
        How do I know reboot regulation is happening?
        What should I do to stop reboot regulation?

Figures

1   Examples of snapshots and bookmarks
2   Automatic snapshot consolidation
3   Snapshot consolidation policy
4   Schematic of logged image access
5   RecoverPoint Management Application
6   Consistency Groups Tab
7   Normal replication to local and remote replica simultaneously

Tables

1   Journal size with snapshot consolidation equation legend
2   Snapshot consolidation policies
3   Image access modes
4   RecoverPoint license parameters
5   Add New User settings
6   Predefined users
7   LDAP Configuration settings
8   Add New Role settings
9   Permissions that may be granted or denied
10  Consistency Group General Settings
11  Copy General Settings
12  Consistency Group General Settings
13  Consistency Group Compression Policy Settings
14  Consistency Group Protection Policy Settings
15  Consistency Group Resource Allocation Policy Settings
16  Consistency Group Stretch Cluster / SRM Support Policy Settings
17  Consistency Group Advanced Policy Settings
18  Copy Protection Policy Settings
19  Copy Journal Policy Settings
20  Copy Advanced Policy Settings
21  General commands
22  Multiple consistency group commands
23  Specific consistency group commands
24  Status Tab
25  Copy commands
26  Journal Tab: Image information
27  Journal Tab: Journal information
28  Journal Tab: Sample images information
29  Journal Tab: Snapshot Consolidation Progress information
30  Splitter commands
31  Volume commands
32  vCenter Server commands
33  vCenter Server detail commands
34  Add vCenter Server Settings
35  Edit vCenter Server Settings
36  Log commands
37  Log filtering settings
38  Components monitored from the Management Application
39  Bottlenecks
40  Consolidated statistics output
41  Consolidation policies
42  Image access modes
43  Image access enabled mode
44  Failover commands
45  New Alert Rule settings
46  SNMP general settings
47  RecoverPoint SNMP trap variables
48  Syslog settings
49  Collect system information settings
50  Listing of normal events and their descriptions
51  Listing of detailed events and their descriptions
Preface

As part of an effort to improve and enhance the performance and capabilities of its product lines, EMC periodically releases revisions of its hardware and software. Therefore, some functions described here may not be supported by all versions of the software or hardware currently in use. For the most up-to-date information on product features, refer to your product release notes.

If a product does not function properly or does not function as described in the following sections, please contact your EMC representative.

Introduction

This help file is part of the EMC RecoverPoint documentation, and is intended for use by those who are responsible for administering the EMC RecoverPoint system.

Audience

Readers of this help file are expected to be familiar with the following topics:
◆ operating systems
◆ network topologies
◆ storage technologies
◆ enterprise-level applications

Documentation relevance per RecoverPoint product

Excluding the limitations described in EMC RecoverPoint and RecoverPoint/SE Release Notes, the procedures in this documentation are correct for both RecoverPoint and RecoverPoint/SE (see “RecoverPoint product family” on page 20). However, any procedure steps or guidelines that are only available as part of the new features of RecoverPoint/SE version 3.2 are included in, or excluded from, the general procedures by means of “In RecoverPoint/SE only” or “In RecoverPoint only” labels.

Note: These additional steps and guidelines are only applicable to procedures when the conditions in “New features of RecoverPoint/SE version 3.2” on page 20 are met. When these conditions are not met, all of the new features are unavailable, and therefore the general procedure steps and descriptions apply to both RecoverPoint and RecoverPoint/SE.
Related documentation

Related documents include:
◆ EMC RecoverPoint Deployment Manager Product Guide
◆ EMC RecoverPoint CLI Reference Guide
◆ EMC RecoverPoint Deploying RecoverPoint with SANTap and SAN-OS Technical Notes
◆ EMC RecoverPoint Deploying RecoverPoint with SANTap and NX-OS Technical Notes
◆ EMC RecoverPoint Deploying RecoverPoint with Connectrix AP-7600B and PB-48K-AP4-18 Technical Notes

Conventions used in this documentation

EMC uses the following conventions for special notices:

Note: A note presents information that is important, but not hazard-related.

EMC uses the following type style conventions in this help file:

Normal
    • Names of interface elements (such as names of windows, dialog boxes, buttons, fields, and menus)
    • Names of resources, attributes, pools, Boolean expressions, buttons, DQL statements, keywords, clauses, environment variables, functions, utilities
    • URLs, pathnames, filenames, directory names, computer names, links, groups, service keys, file systems, notifications
Bold
    • Names of commands, daemons, options, programs, processes, services, applications, utilities, kernels, notifications, system calls, man pages
    • Names of interface elements (such as names of windows, dialog boxes, buttons, fields, and menus)
    • What the user specifically selects, clicks, presses, or types
Italic
    • Full titles of publications referenced in text
    • Emphasis (for example a new term)
    • Variables
    • Values of parameters
Courier
    • System input and output (a command prompt indicates that you should input everything after the prompt)
<>    Angle brackets enclose parameter or variable values supplied by the user
[]    Square brackets enclose optional values
|     Vertical bar indicates alternate selections; the bar means “or”
{}    Braces indicate content that you must specify (that is, x, or y, or z)
...   Ellipses indicate nonessential information omitted from the example

Where to get help

EMC support, product, and licensing information can be obtained as follows.

Product information: For documentation, release notes, software updates, or for information about EMC products, licensing, and service, go to the Powerlink Web site (registration required) at:

http://powerlink.emc.com

Technical support: For technical support, go to EMC Customer Service on Powerlink. To open a service request through Powerlink, you must have a valid support agreement. Please contact your EMC representative for details about obtaining a valid support agreement or to answer any questions about your account.

Your suggestions: Your suggestions will help us continue to improve the accuracy, organization, and overall quality of the user publications. Please send your opinion of this guide to:

SSG_Documentation@EMC.com

Online help

To search, bookmark, or print sections of the documentation released with your RecoverPoint product version, select Help > Help Contents from the main menu of the EMC RecoverPoint Management Application GUI (see “Monitoring and analyzing system performance” on page 213). The RecoverPoint Help dialog box is displayed.

The following sections deal with the topics:
◆ “Searching help topics”
◆ “Viewing search results”
◆ “Printing help topics”
◆ “Bookmarking help topics”
◆ “Viewing bookmarked help topics”

The RecoverPoint Help dialog box contains the contents of the EMC RecoverPoint Administrator’s Guide. The RecoverPoint Help landing page contains links to the complete and most current RecoverPoint documentation on Powerlink.
16 EMC RecoverPoint Release 3.3 Administrator’s Guide Preface Searching help topics To search the RecoverPoint Help files for a specific term, type the term into the Search field at the top-left corner of the RecoverPoint Help dialog box, and click the Go button. Click one of the search results to display the corresponding content. Viewing search results To view the results of your last search, click the Search Results Tab in the bottom-left corner of the RecoverPoint Help dialog box. Printing help topics To print a topic, or a topic and all subtopics contained in the topic, click the Print button in the Contents Pane of the RecoverPoint Help dialog box. Introduction 17 Preface Bookmarking help topics Viewing bookmarked help topics 18 To bookmark help topics for later viewing, click the Bookmark button in the top-right corner of the RecoverPoint Help dialog box. To view bookmarked help topics, click the Bookmarks Tab in the bottom-left corner of the RecoverPoint Help dialog box. EMC RecoverPoint Release 3.3 Administrator’s Guide 1 Concepts Concepts This section explains the basic concepts of RecoverPoint, replicating with RecoverPoint and the workflows for configuring and managing replication volumes. The topics in this section are: ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ RecoverPoint product family ........................................................... 20 RecoverPoint configurations ............................................................ 22 RecoverPoint hardware and software............................................. 24 RecoverPoint logical entities ............................................................ 30 RecoverPoint performance ............................................................... 49 RecoverPoint data recovery procedures ......................................... 73 RecoverPoint synchronization processes ....................................... 78 RecoverPoint data flow ..................................................................... 
89 RecoverPoint workflows................................................................. 106 Concepts 19 Concepts RecoverPoint product family The RecoverPoint product family consists of: ◆ “RecoverPoint” - provides cost-effective, continuous data protection and continuous remote replication, enabling on-demand protection and recovery at any point in time. ◆ “RecoverPoint/SE”- ensure continuous data protection and continuous remote data replication for your EMC CLARiiON networked storage. Note: For the complete list of RecoverPoint and RecoverPoint/SE limitations, see the EMC RecoverPoint and RecoverPoint/SE Release Notes. RecoverPoint EMC RecoverPoint brings you continuous data protection and continuous remote replication for on-demand protection and recovery at any point in time. The advanced capabilities of RecoverPoint include policy-based management, application integration, and WAN acceleration. With RecoverPoint you'll implement a single, unified solution to protect and/or replicate data across heterogeneous storage. You'll simplify management and reduce costs, recover data at a local or remote site at any point in time, and ensure continuous replication to a remote site without impacting performance. RecoverPoint/SE EMC RecoverPoint/SE brings continuous data protection and continuous remote replication to your EMC CLARiiON networked storage. RecoverPoint/SE gives you on-demand protection and recovery at any point in time and advanced capabilities such as policy-based management and bandwidth optimization. With RecoverPoint/SE you'll implement a single unified solution for data protection, simplify management, reduce costs, and avoid data loss due to server failures or data corruption. New features of RecoverPoint/SE version 3.2 20 New features were introduced in RecoverPoint/SE version 3.2 that are only available when the following conditions are met: ◆ A new installation is performed using the RecoverPoint/SE version 3.2 Installer Wizard. 
◆ Only CLARiiON splitters (one per site) are required for replication, and are installed at each customer site.

Note: See “Documentation relevance per RecoverPoint product” on page 14.

RecoverPoint configurations
EMC RecoverPoint replicates data over any distance:
◆ Within the same site or to a local bunker site some distance away, see “CDP configurations” on page 22.
◆ To a remote site, see “CRR configurations” on page 23.
◆ To both a local and a remote site simultaneously, see “CLR configurations” on page 23.

The following sections deal with the topics:
◆ “CDP configurations”
◆ “CRR configurations”
◆ “CLR configurations”

CDP configurations
There are two types of CDP configurations:
◆ (Standard) CDP - in which all components (splitters, storage, RPAs, and hosts) exist at the same site.
◆ Stretch CDP - in which the production host exists at the local site, splitters and storage exist at both the bunker site and the local site, and the RPAs exist only at the bunker site. In this configuration, the repository volume and both the production and local journals exist at the bunker site.

RecoverPoint CDP can instantly recover data to any point in time (PIT) by leveraging bookmarks from the replica journal.

In CDP configurations, data can be replicated locally over a distance that does not exceed the limitation specified in the EMC RecoverPoint and RecoverPoint/SE Release Notes, and the data is transferred by Fibre Channel. By definition, writes from the splitter to the RPA are written synchronously, and snapshot granularity is set to per second, so the exact data size and contents ultimately depend on the number of writes made by the host application per second. Users can change the snapshot granularity to per write, if necessary. Users can also set the replication mode to synchronous when an RPO of zero is required.
CRR configurations
In CRR configurations, data is transferred between two sites over Fibre Channel or a WAN. In this configuration, the RPAs, storage, and splitters exist at both the local and the remote site. By default, the replication mode is set to asynchronous, and snapshot granularity is set to dynamic, so the exact data size and contents ultimately depend on the policies set by the user and on system performance. This provides protection to application-consistent and other specific points in time.

Note: Synchronous replication is only supported when the local and remote sites are connected via Fibre Channel; see the EMC RecoverPoint Deployment Manager Product Guide for limitations.

CLR configurations
Both RecoverPoint CDP and CRR feature bidirectional replication and a point-in-time recovery mechanism, which enables replica volumes to be rolled back to a previous point in time and used for read/write operations without affecting the ongoing replication process or data protection.

RecoverPoint hardware and software
The following replication hardware and software is used in the RecoverPoint solution:
◆ “RPAs”
◆ “Splitters”
◆ “RecoverPoint Management Applications”

RPAs
The RPA is RecoverPoint's intelligent data protection appliance. In RecoverPoint, RPAs manage all aspects of reliable data replication at all sites.

During replication for a given consistency group, an RPA at the production site makes intelligent decisions regarding when and what data to transfer to the replica site. It bases these decisions on its continuous analysis of application load and resource availability, balanced against the need to prevent degradation of host application performance and to deliver maximum adherence to the specified replication policy. The RPAs at the replica site distribute the data to the replica storage. In the event of failover, these roles can be reversed.
Moreover, RecoverPoint supports simultaneous bidirectional replication, where the same RPA can serve as the production RPA for one consistency group and the replica RPA for another.

Each RPA has the following dedicated interfaces:
◆ Fibre Channel. Used for data exchange with local host applications and storage subsystems. The RPA supports a dual-port Fibre Channel configuration, thereby providing redundant connectivity to the SAN-attached storage and the hosts.
◆ WAN. Used to transfer data to other sites (Ethernet).
◆ Management. Used to manage the RecoverPoint system (Ethernet).

You can access each RPA directly through an SSH connection to the RPA's dedicated box-management IP address. You can also access all RPAs in the RecoverPoint configuration through the virtual site-management IP address of each site in the RecoverPoint configuration. In other words, once RecoverPoint is configured, you can manage the entire installation from a single location.

RPA terminology
In RecoverPoint, the following terminology is used when referencing RPAs:

◆ Preferred RPA
Each consistency group must be assigned one or more preferred RPAs. A non-distributed consistency group has one preferred RPA, called the primary RPA. Distributed consistency groups have multiple preferred RPAs: a minimum of one primary RPA and one secondary RPA, and a maximum of three secondary RPAs, see “Distributed consistency groups” on page 58.

◆ RecoverPoint cluster
A group of inter-linked RPAs, working together closely, to provide replication services (so closely that, in many respects, they form a single computer). The RPAs in a RecoverPoint cluster are called nodes. The nodes of the RecoverPoint cluster are connected to each other through the local area network, a wide area network, or by Fibre Channel.
A RecoverPoint cluster can be deployed within a single site for CDP (or stretched CDP) configurations, or deployed in two sites for CRR and CLR configurations.

To scale up and support a higher throughput rate, more RPA nodes can be added to the RecoverPoint cluster. RecoverPoint can be deployed with two to eight nodes per site, as set during RecoverPoint system installation. The cluster size must be the same at both sites in a RecoverPoint installation (see the EMC RecoverPoint Deployment Manager Product Guide).

The RPA nodes are used to attain availability. If a node fails, the consistency groups using that node (i.e., the consistency groups whose Primary or Secondary RPA settings are set to the RPA that failed) will flip over to a different node in the RecoverPoint cluster.

The RecoverPoint cluster at each site is managed by a process called the site control. The RPA node that can be used to host the site control is selected using cluster leader arbitration (LEP) and can only be RPA1 or RPA2.

Physically, the RPA cluster can be located in the same facility as the host and storage subsystems. Alternatively, because RPAs have their own independent computing and storage resources, they can be located at a separate facility some distance away from the host or storage subsystems. This provides greater data protection in the event of a localized disaster. See the EMC RecoverPoint and RecoverPoint/SE Release Notes for limitations.

During normal operation, all RPAs in a cluster are active all of the time. Consequently, if one of the RPAs in a cluster goes down, the RecoverPoint system supports immediate switchover of the functions of that box to another RPA in the cluster.

◆ Primary RPA
The RPA that, whenever possible, handles replication for a consistency group. If an error occurs in the primary RPA, replication can, in most cases, be switched over to another RPA at the same site.
The primary RPA is set through the consistency group Policy tab (see “Copy General Settings” on page 135).

◆ RPA1
The RPA node that was designated as RPA1 of a RecoverPoint cluster during a RecoverPoint installation.

◆ RPA2
The RPA node that was designated as RPA2 of a RecoverPoint cluster during a RecoverPoint installation.

Note: Only the first two RPAs (RPA1 and RPA2) in a RecoverPoint cluster can host the site control services.

◆ Site management (also known as: site control)
The process that manages the RecoverPoint cluster at each site. In CDP configurations, there is only one site control. In CRR and CLR configurations, there is a separate site control for each site in the configuration. The active instance of the site control is run only by RPA1 or RPA2.

The user accesses the site control to manage and monitor RecoverPoint, using a site management IP address. To run the EMC RecoverPoint Management Application GUI, the user connects to a RecoverPoint cluster by opening a browser window and typing the site management IP of the RecoverPoint cluster into the browser address bar. To run the EMC RecoverPoint Command Line Interface, the user connects to a RecoverPoint cluster by opening an SSH connection to the site management IP of the RecoverPoint cluster.

To identify the site control, log into the RecoverPoint Management Application as a user with SE privileges, and click the RPAs section of the Navigation Pane. The ID column of the RPAs table displays an asterisk for each RPA acting as the site control.

Note: The RPA node that can be used to host the site control is selected using cluster leader arbitration (LEP) and can only be RPA1 or RPA2.

◆ Site management IP
A virtual, floating IP address assigned to the RPA that is currently active (runs the site control).
If this RPA fails, the floating IP address dynamically switches to the RPA that assumes operation (which will be either RPA1 or RPA2). Although using the site management IP is best practice, all management activities can also be performed on a specific RPA, by entering its dedicated IP address.

Splitters
A splitter is proprietary software that is installed on host operating systems, storage subsystems, or intelligent fibre switches. Splitters access replica volumes; i.e., volumes that contain data to be replicated. The primary function of a splitter is to “split” application writes so that they are sent to their normally designated storage volumes and to the RPA simultaneously. The splitter carries out this activity efficiently, with little perceptible impact on host performance, since all CPU-intensive processing necessary for replication is performed by the RPA.

The splitter function can be host-based (see “Host-based splitters” on page 28), intelligent fabric-based (SANTap, Brocade), or array-based (see “Array-based splitters” on page 28). This help file is written for RecoverPoint systems using the following host-based and array-based splitters: Windows, Solaris, and CLARiiON.
◆ For SANTap splitter procedures, see EMC RecoverPoint Deploying RecoverPoint with SANTap and NX-OS Technical Notes and EMC RecoverPoint Deploying RecoverPoint with SANTap and SAN-OS Technical Notes.
◆ For Brocade splitter procedures, see EMC RecoverPoint Deploying RecoverPoint with Connectrix AP-7600B and PB-48K-AP4-18 Technical Notes.
◆ For VMware splitter procedures, see EMC RecoverPoint Deploying RecoverPoint with VMware Technical Notes.
◆ For AIX splitter procedures, see EMC RecoverPoint Deploying RecoverPoint with AIX Technical Notes.

Host-based splitters
A host-based splitter is a splitter that is installed on the host, and performs the data splitting function.
To date, this feature is available for the following operating systems:
◆ Windows
◆ Solaris
◆ AIX

The RecoverPoint host-based splitter consists of two main components: a user space application, which is frequently referred to as the KDriver, and a kernel driver, which is frequently referred to as the splitter. In general, the KDriver is responsible for the control path and the splitter is responsible for the I/O path.

Note: Refer to the EMC support matrix on Powerlink for exact support statements, including OS versions and other caveats in supported and unsupported configurations.

Array-based splitters
An array-based splitter is a splitter that is used with EMC CLARiiON storage arrays to perform the data splitting function. To date, this feature is only available for EMC CLARiiON storage; therefore, the RecoverPoint array-based splitter is sometimes referred to as the CLARiiON splitter.

The array-based splitter can either be bundled with the CLARiiON's FLARE, or be added through a non-disruptive upgrade using a splitter enabling package.

Note: Refer to the EMC support matrix on Powerlink for exact support statements, including OS versions and other caveats in supported and unsupported configurations.

RecoverPoint Management Applications
The RecoverPoint Management Applications allow you to manage the RecoverPoint system. Management activities are the primary subject of this Administrator's Guide.

Site management provides access to all boxes in the local RPA cluster, as well as to the RPA cluster at the other site (i.e., the cluster to which the local cluster is joined).

Command line interface (CLI)
Management activities can be carried out interactively or by scripts using the command-line interface. For information about the command-line interface, refer to the EMC RecoverPoint CLI Reference Guide.

RecoverPoint Management Application
Management activities can also be carried out using any standard web browser to access the RecoverPoint Management Application. The rest of this help file describes how to use the features of the RecoverPoint Management Application to manage and monitor RecoverPoint replication. See “Monitoring and analyzing system performance” on page 213.

RecoverPoint logical entities
The following logical entities constitute your replication environment:
◆ “Consistency groups”
◆ “Non-distributed (regular) consistency groups”
◆ “Distributed consistency groups”
◆ “Copies”
◆ “Replication sets”
◆ “Journals”
◆ “Volumes”
◆ “Snapshots”
◆ “Links”

Consistency groups
A consistency group consists of one or more replication sets (see “Replication sets” on page 33). Each replication set consists of a production volume and the replica volumes to which it is replicating. The consistency group ensures that updates to the replicas are always consistent and in correct write order; that is, the replicas can always be used to continue working or to restore the production source, in case it is damaged.

The consistency group monitors all the volumes added to it to ensure consistency and write-order fidelity. If two data sets are dependent on one another (for instance, a database and a database log), they must be in the same consistency group.

Imagine a motion picture film. The video frames are saved on one volume, the audio on another, but neither makes sense without the other. The saves must be coordinated so that they are always consistent with one another. In other words, the volumes must be replicated together in one consistency group. That guarantees that, at any point, the saved data represents a true state of the film.
A consistency group consists of:
◆ Settings and policies: Settings, such as consistency group name, primary RPA, and reservation support; policies, such as compression, bandwidth limits, and maximum lag, that govern the replication process, see “Configuring replication policies” on page 143.
◆ Replication sets: A production source volume and the replica volumes to which it replicates, see “Replication sets” on page 33 and “The replica volumes” on page 36.
◆ Journals: Receive changes to data. Each copy has a single journal. Changes are distributed from the replica journal to storage. The replica journals also retain rollback data for their replica, see “Journals” on page 33, “The replica journal volumes” on page 37 and “The production journal volume” on page 36. The production journal does not contain rollback information. The system marking information is contained in the production journal.

Consistency group types
In RecoverPoint, consistency groups can be of the following types:
◆ “Non-distributed (regular) consistency groups”
◆ “Distributed consistency groups”

Note: Throughout the RecoverPoint documentation, the term “consistency group” is used to refer to groups when no differentiation is required between distributed and non-distributed groups.

How can I tell if a group is distributed or non-distributed?
You can tell that a group is distributed by clicking a consistency group name in the Navigation Pane and looking at the top of the consistency group's Status Pane. If the group is distributed, the text (primary) followed by a comma-separated list of numbers indicating the designated secondary RPAs is displayed. If the group is non-distributed, only one RPA is specified in this area.

You can tell which of all of your groups are distributed by selecting Consistency Groups in the Navigation Pane.
In the Active RPA column of the consistency group list, distributed consistency groups are displayed with the text (primary) followed by a comma-separated list of numbers indicating the designated secondary RPAs. For groups that are non-distributed, only one RPA is specified in this column.

Non-distributed (regular) consistency groups
New consistency groups are by default defined as non-distributed. Non-distributed consistency groups transfer data through one primary RPA, which is designated by the user during group creation and can be modified at any time through the group policy settings.

A maximum of 128 consistency groups can be defined in RecoverPoint, and a single RPA cannot be configured to have more than 64 consistency groups. In the event of RPA failure, groups that transfer data through one RPA will move to other RPAs in the cluster. In such a case, an RPA can temporarily hold up to 128 groups, and the data of all groups will continue to be transferred between sites. This state is temporary, however, as an RPA with more than 64 groups may run into high loads, and if this state is prolonged, group policies could be affected.

Each RPA has a maximum throughput rate (see the EMC RecoverPoint and RecoverPoint/SE Release Notes for this limit), which, together with the host write rate and available network resources, limits the maximum size of the consistency group. For a higher throughput rate, and to balance the load of extra-large consistency groups, define the consistency group as distributed through the group policy settings (see “Distributed consistency groups” on page 58).

For the complete set of limitations associated with consistency groups, see the EMC RecoverPoint and RecoverPoint/SE Release Notes.

Copies
A copy is a logical RecoverPoint entity that constitutes all of the volumes defined for replication at a given location (production, local, or remote).
Copy settings and policies include: a journal size limit setting that defines RTO, journal compression policies, and protection policies that define snapshot consolidation and the required protection window.

In CDP and CRR configurations, there is one production copy and one replica. In CLR configurations, there is one production copy and two replicas (one local copy at the production site and one remote copy at the disaster recovery site).

Note: The term “replica” is used to differentiate between production and non-production copies, whenever necessary. In CLR configurations there are two replicas, or non-production copies (also known as: targets).

The production copy consists of production volumes and the production journal, which may consist of one or more journal volumes. The non-production copies (i.e., replica copies) each consist of replica volumes and a replica journal, which may consist of one or more journal volumes.

In the RecoverPoint Management Application (GUI), new copies are defined using the New Consistency Group Wizard. In the RecoverPoint Command Line Interface (CLI), new copies are defined using the create_copy command.

Replication sets
Every SAN-attached storage volume in the production storage must have a corresponding volume at each copy. A production volume and its associated replica volumes are called a replication set. Each consistency group contains as many replication sets as there are volumes in the production storage to replicate. Data consistency and write-order fidelity are maintained across all volumes assigned to a consistency group, including volumes on different storage systems.

At least one volume must be added to the journal of each copy in a replication set, see “Journals” on page 33, “The replica journal volumes” on page 37 and “The production journal volume” on page 36.
Journals
One or more volumes are dedicated on the storage at each replica site for the purpose of holding images that are either waiting to be distributed, or that have already been distributed, to the replica storage. See “The production journal volume” on page 36 and “The replica journal volumes” on page 37.

Each journal holds as many images as its capacity allows. Subsequently, the oldest image (provided that it has already been distributed to the copy storage) is removed to make room for the newest one, in a first-in, first-out manner. The actual number of images in the journal varies, depending on the size of the images and the capacity of the storage dedicated to this purpose. See “Journal size without snapshot consolidation” on page 34.

You can address individual images in a replica journal. Hence, if required due to a disaster, you can roll back the stored data image to an earlier image that was unaffected by the disaster. Frequent small-aperture snapshots provide high granularity for achieving maximum data recovery in the event of such a rollback.

Journal size without snapshot consolidation
The recommended journal size when not utilizing snapshot consolidation is:

1.05 * [(Δ data per second) * (required rollback time in seconds) / (1 - image access log size)] + (reserved for marking)

For example, if:
Δ data per second = 5 Mb/s
required rollback time = 24 hr = 86 400 s
image access log = 0.20
reserved for marking = 1.5 GB (= 12 000 Mb)

Then the required journal size is:
1.05 * 5 Mb/s * 86 400 s / (1 - 0.20) + 1.5 GB = 567 000 Mb + 12 000 Mb = 579 000 Mb
579 000 Mb = 579 000/8 MB = 72 375 MB = 72.4 GB

You can use iostat (UNIX) or perfmon (Windows) to determine the value for Δ data per second. The default image access log size is 20%, see “Copy Journal Policy Settings” on page 155.

See the EMC RecoverPoint and RecoverPoint/SE Release Notes for the minimum and maximum journal size limitations.
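The sizing rule above can be checked with a few lines of Python. This is a minimal sketch, not an EMC tool: the function name and parameter names are hypothetical, and it assumes decimal units (1 GB = 1000 MB = 8000 Mb), which is what the arithmetic of the worked example implies.

```python
def journal_size_gb(delta_mbit_per_s, rollback_s,
                    image_access_log=0.20, marking_gb=1.5):
    """Estimate the journal size in GB without snapshot consolidation.

    delta_mbit_per_s -- average change rate in megabits per second
    rollback_s       -- required rollback time in seconds
    image_access_log -- fraction of the journal reserved for the image access log
    marking_gb       -- space reserved for marking, in GB
    Assumes decimal units: 1 GB = 1000 MB = 8000 Mb.
    """
    if not 0 <= image_access_log < 1:
        raise ValueError("image_access_log must be in [0, 1)")
    # 1.05 * (change rate * rollback time) / (1 - image access log), in Mb
    snapshot_mbit = 1.05 * delta_mbit_per_s * rollback_s / (1 - image_access_log)
    # Convert megabits to gigabytes, then add the marking reservation
    return snapshot_mbit / 8 / 1000 + marking_gb


if __name__ == "__main__":
    # Values from the worked example: 5 Mb/s, 24 hours, 20%, 1.5 GB
    print(round(journal_size_gb(5, 24 * 3600), 1))
```

With the values from the example, the function reproduces the 72.4 GB result. The result is an estimate only; the minimum and maximum journal sizes in the Release Notes still apply.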
Journal size with snapshot consolidation
Journal sizing with snapshot consolidation must take into account the incremental change of data over the period of consolidation, see “Snapshot consolidation” on page 39. Therefore, when snapshot consolidation is enabled, the formula for estimating the journal size is:

[[(Change rate * (Journal for period of continuous data protection)) + (Journal for daily backups)(Change rate * 24hrs)(1 - Locality of reference per day) + (Journal for weekly backups)(Change rate * 7 days)(1 - Locality of reference per week)] / [1 - Percentage of journal used for the image access log]] / Compression

or:

J = [[(C*(PW+24Hrs)) + (BD+6)(C*24Hrs)(1-LD) + (BW-1)(C*7days)(1-LW)] / (1 - IA)] / COMP

Where:

Table 1 Journal size with snapshot consolidation equation legend

Symbol  Denotes                         Description
J       Journal size                    The required journal size.
C       Change rate                     The average change rate based on iostat. (a)
LD      Locality of reference per day   The percentage of data that is repeated to the same location on the snapshots of a day, where the putative daily incremental backup size is C*24hrs(1-LD).
LW      Locality of reference per week  The percentage of data that is repeated to the same location in the snapshots of a week, where the putative weekly incremental backup size is C*7Days(1-LW).
PW      Protection window               The continuous protection rate specified for the Do not consolidate snapshots for at least setting, see “Copy Protection Policy Settings” on page 153.
BD      Number of daily backups         The value specified for the one snapshot per day for x days setting, see “Copy Protection Policy Settings” on page 153.
BW      Number of weekly backups        The value specified for the one snapshot per week for y weeks setting, see “Copy Protection Policy Settings” on page 153.
IA      Image access log percentage     Default = 20%. The value specified for the proportion of journal allocated for image access log setting, see “Copy Journal Policy Settings” on page 155.
COMP    Compression                     The value specified for the enable compression setting, see “Consistency Group Compression Policy Settings” on page 144, where COMP = 1 if compression is disabled and COMP = 2 if compression is enabled.

a. You can use iostat (UNIX) or perfmon (Windows) to determine the value for Δ data per second.

Note: For consolidation to occur, 25% of the total journal space allotted to snapshots (~75%, see “Journal size without snapshot consolidation” on page 34) must be free (must not contain snapshots). Otherwise, snapshot consolidation will not run and you will receive an appropriate event message. In other words, 25% of 75% of the total journal size must not contain snapshots for consolidation to occur.

See the EMC RecoverPoint and RecoverPoint/SE Release Notes for the minimum and maximum journal size limitations when snapshot consolidation is enabled. See “Journal size without snapshot consolidation” on page 34.

Volumes
In the EMC RecoverPoint Management Application, LUNs are represented as volumes. Therefore, this help file refers to LUNs when referencing the storage entity, and volumes when referencing the RecoverPoint entity.

The following types of volumes exist in all RecoverPoint configurations:
◆ “The production volumes”
◆ “The replica volumes”
◆ “The production journal volume”
◆ “The replica journal volumes”
◆ “The repository volume”

The production volumes
The production volumes are the volumes that are written to by the host applications at the production site, see “Replication sets” on page 33.

The replica volumes
The replica volumes are the volumes that the production volumes are replicated to, see “Replication sets” on page 33.

The production journal volume
The production journal volume stores information about the replication process (called marking information) that is used to make synchronization of the replication volumes at the two sites, when required, much more efficient.

Note: In RecoverPoint/SE only: The production and local replica journals and repository volume must all reside on the same CLARiiON array.

Note: Since the production journal contains the system marking information, the removal of a journal volume from the production site will cause a full-sweep synchronization.

The replica journal volumes
Each journal (see “Journals” on page 33) can consist of one or more journal volumes.

Note: In RecoverPoint/SE only: The production and local replica journals and repository volume must all reside on the same CLARiiON array.

If more than one volume at a time is added to the journal, it is recommended that all added volumes be the same capacity for best performance and efficiency. If the added volumes are the same or nearly the same capacity (at least 85% of the largest volume), data is striped across those journal volumes, improving performance. When striped, the capacity used in each journal volume is equal to the capacity of the smallest journal volume in the group of added volumes; the remaining capacity in those volumes is not used.

Volumes of very different capacities will be concatenated and not striped. In most cases, this will affect performance, but all capacity will be used.

If two groups of volumes of two different capacities are added, they are striped in two groups. If additional volumes are added afterwards, the new volumes will be considered as a group by themselves according to the criteria above. Existing volumes and newly added volumes will not be striped together.
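The striping rule above can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not RecoverPoint's actual algorithm: the function name is hypothetical, it treats all volumes added at one time as a single group, and it does not model the case in which two groups of different capacities are striped separately.

```python
def journal_layout(capacities_gb):
    """Classify a set of journal volumes that are added together.

    Returns a (layout, usable_gb) tuple, where layout is "striped" when the
    smallest volume is at least 85% of the largest, else "concatenated".
    Simplification: all volumes are treated as a single group.
    """
    if not capacities_gb:
        raise ValueError("at least one journal volume is required")
    largest = max(capacities_gb)
    smallest = min(capacities_gb)
    if smallest >= 0.85 * largest:
        # Striped: each volume contributes only the smallest volume's capacity
        return "striped", smallest * len(capacities_gb)
    # Concatenated: all capacity is used, but without the striping benefit
    return "concatenated", sum(capacities_gb)


if __name__ == "__main__":
    print(journal_layout([100, 90, 95]))  # similar capacities
    print(journal_layout([100, 50]))      # very different capacities
```

In the first call, 30 GB of the 300 GB added goes unused because every volume contributes only as much as the smallest one. That is why adding volumes of identical capacity is recommended.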
If the combined physical size of all journal volumes at a given copy is larger than the combined physical size of the journal volumes at the other copy, the protection window at the first copy will be larger than the protection window at the other copy.

Note: Journal volumes must not reside on LUNs that are virtually provisioned (thin LUNs).

The repository volume
A special volume, called the repository volume, must be dedicated on the SAN-attached storage at each site, for each RPA cluster. The repository volume serves all RPAs of the particular cluster and the splitters associated with that cluster. It stores configuration information about the RPAs and consistency groups, which enables a properly functioning RPA to seamlessly assume the replication activities of a failing RPA from the same cluster.

Note: In RecoverPoint/SE only: The production and local replica journals and repository volume must all reside on the same CLARiiON array.

Snapshots
A snapshot is a point in time marked by the system for recovery purposes. A snapshot includes only the data that has changed from the previous image. Upon being distributed, it creates a new current image on the remote storage.

A snapshot is the difference between one consistent image of stored data and the next. Snapshots are taken seconds apart. The application writes to storage; at the same time, the splitter provides a second copy of the writes to the RecoverPoint appliance. In asynchronous replication, the appliance gathers several writes into a single snapshot. The exact time for closing the snapshot is determined dynamically, depending on replication policies and the journal of the consistency group. In synchronous replication, each write is a snapshot. When the snapshot is distributed to a replica, it is stored in the journal volume, making it possible to revert to previous images by selecting the stored snapshots.
The snapshots at a copy are displayed in the copy Journal tab, see “The Journal Tab” on page 196.

For each consistency group, a Snapshot Granularity policy can be configured to regulate data transfer, and the following granularities can be defined:
◆ Fixed (per write): To create a snapshot from every write operation.
◆ Fixed (per second): To create one snapshot per second.
◆ Dynamic: To have the system determine the snapshot granularity according to available resources.

Bookmarks
A bookmark is a named snapshot. The bookmark uniquely identifies an image. Bookmarks can be set and named manually; they can also be created automatically by the system, either at regular intervals or in response to a system event. Bookmarked images are listed by name.

Figure 1 Examples of snapshots and bookmarks.

You can bookmark a snapshot at any time. Bookmarks are useful to mark particular points in time, such as an event in an application, or a point in time you wish to fail over to. The procedures for bookmarking are described in “Managing and Monitoring” on page 177.

The bookmarked snapshots at a copy are displayed in the copy Journal tab, see “The Journal Tab” on page 196.

Snapshot dilution
Only 1000 snapshots (a subset of all available snapshots) are displayed in the copy Journal tab. You can access images that do not appear in the Journal tab image list by specifying a point in time, by using previous/next, or by searching for an image in the search engine. The actual image that is accessed in this way is subject to the criteria set in the image search utility.

To accommodate the snapshot display limit of the image list, the system begins diluting the list of images as the number of images it is holding approaches the limit. For example, it may begin by diluting the journal of every second snapshot.
Ultimately, the objective of the dilution process is to maintain a journal in which the maximum number of images is displayed, distributed in a more-or-less even manner (e.g., by time, by number of snapshots) across the journal. The system does not dilute bookmarked images from the list, unless the number of bookmarked snapshots exceeds the snapshot maximum. In that case, the system drops the oldest bookmarked snapshots from the list of addressable snapshots.

The list of snapshots at a copy is displayed in the copy Journal Tab, see “The Journal Tab” on page 196.

Snapshot consolidation

RecoverPoint captures every write, enabling you to recover data from any point in time. Keeping days to months of every point in time, however, requires very large journals. Over time, re-writes to a single disk location consume a large amount of journal space. Additionally, the need to recover back to an exact point in time decreases as the data ages. The granularity of snapshots becomes less important over time.

Snapshot consolidation enables longer-term point-in-time recovery using the same storage consumption. Snapshot consolidation discards re-writes to the same disk location to save on journal space, which allows a longer history to be retained in the journal.

Using snapshot consolidation, you decide how long to retain every write captured, and at what point to consolidate these writes to a daily, weekly, or monthly snapshot. Snapshot consolidation allows you to retain the crucial per write or per second data of write transactions for a specified period of time (for example, the last 12 hours, days, weeks or months of transactions) and only then to start gradually coarsening the granularity of older snapshots at preset intervals (for example, to create daily, then weekly, and then monthly snapshots).

With snapshot consolidation:
◆ All writes to a single disk location are saved in their end-state at one specific point-in-time.
Therefore, all writes to the same disk location between the start-time and end-time of the consolidation are deleted.
◆ The amount of journal space saved depends on the I/O pattern, how many blocks get re-written, and other factors. The formula for journal size when snapshot consolidation is enabled is described in “Journal size with snapshot consolidation” on page 34.
◆ The time needed to consolidate snapshots depends on the number and size of the snapshots being consolidated.
◆ A minimum of 1GB of changes (writes) must have occurred and been transferred to the replica journal for the snapshot consolidation process to start.
◆ By default, 20% of the journal's capacity is dedicated to the image access log used during Logged access (although this default value can be modified by the user), another 5% of the journal's capacity is dedicated to the calculation of indexes that are used for Virtual access, and an additional ~1GB is dedicated to handling bursts during distribution. This means that only ~75% of the journal is available for the storage of snapshots. When snapshot consolidation is enabled, the consolidation process uses 25% of the remaining ~75%. Therefore, for consolidation to occur, 25% of 75% of the journal must be free (must not contain snapshots). Otherwise, snapshot consolidation will not run and you will get an appropriate event message (see “Journal size without snapshot consolidation” on page 34 and “Journal size with snapshot consolidation” on page 34).
◆ Snapshots can be consolidated:
• automatically through the RecoverPoint Management Application Copy Policy settings. See “Configuring copy policies” on page 152 for instructions on configuring automatic snapshot consolidation.
• manually through the RecoverPoint Command Line Interface consolidate_snapshots command. See “Manual snapshot consolidation” on page 44 for more information.
◆ You can set a snapshot consolidation policy for an individual snapshot. This policy determines how the snapshot is treated during both automatic and manual snapshot consolidation. This allows you to retain a specific snapshot in its original form for a specific duration. See “Automatic snapshot consolidation” on page 41 for more information.

This section deals with the following topics:
◆ “Automatic snapshot consolidation”
◆ “Enabling automatic consolidation”
◆ “Manual snapshot consolidation”
◆ “Snapshot consolidation policy”
◆ “Viewing consolidation results”

Automatic snapshot consolidation

Using automatic snapshot consolidation, you can define how snapshots are consolidated during four intervals. You can define the interval during which:
• snapshots are not consolidated. During this interval, RecoverPoint maintains continuous data (that is, all writes), allowing you to recover to any point-in-time. By default, automatic snapshot consolidation maintains all writes for 2 days.
• snapshots are consolidated into a daily snapshot. Once a day during this interval, snapshots that are older than the interval defined for maintaining continuous data are consolidated into a single daily snapshot. By default, automatic snapshot consolidation maintains 5 daily consolidations. You can also specify that the daily consolidation process run indefinitely. This setting disables weekly and monthly consolidations.
• snapshots are consolidated into a weekly snapshot. Once a week during this interval, snapshots that are older than the total interval defined for continuous data and daily consolidations are consolidated into a single snapshot. By default, automatic snapshot consolidation maintains 4 weekly consolidations.
• snapshots are consolidated into a monthly snapshot.
Once a month, snapshots that are older than the total interval defined for continuous data, the daily period, and the weekly period are collapsed into a monthly consolidation. RecoverPoint maintains as many monthly consolidations as the journal can hold. You can also specify that the weekly consolidation process run indefinitely. This setting disables monthly consolidations.

Figure 2 Automatic snapshot consolidation

The settings in Figure 2 maintain two days of continuous data, three daily consolidations, two weekly consolidations, and monthly consolidations up to the capacity of the journal. The snapshot in the example uses the default consolidation policy of ‘Always consolidate’. Refer to “Snapshot consolidation policy” on page 44 for more information.

Suppose that a snapshot is taken at 10 A.M. on 3/15. Here’s what happens:
◆ The original version of the snapshot remains in the journal for at least the next 48 hours.
◆ After 48 hours have elapsed, the original snapshot becomes a candidate for daily consolidation. It will be consolidated by the daily consolidation process. The consolidated daily snapshot remains in the journal for at least 3 days.
◆ After 3 days, the daily snapshot becomes a candidate for weekly consolidation. It will be consolidated by the weekly consolidation process. That consolidated weekly snapshot remains in the journal for at least 2 weeks.
◆ After 2 weeks, the weekly snapshot becomes a candidate for monthly consolidation. It will be consolidated by the monthly consolidation process. The consolidated monthly snapshot remains in the journal as long as space is available.

Enabling automatic consolidation

When a consistency group is created, automatic snapshot consolidation is disabled by default. Snapshots of this group are not included in the automatic snapshot consolidation process.

To enable automatic snapshot consolidation:
1.
In the RecoverPoint Management Application, select a copy in the Navigation Pane. In the Component Pane, click the Policy Tab, and then the Protection Tab to display the automatic snapshot consolidation settings.
2. Select the Enable RecoverPoint snapshot consolidation checkbox to begin consolidating the snapshots in your copy journal according to the default settings. If required, adjust these settings to your specific requirements according to the instructions in Table 18 on page 153.

Once enabled, automatic snapshot consolidation begins according to the values set on the Policy tab, provided the system is in the distribution phase and initialization is over.

Note: You cannot enable automatic consolidation on a consistency group that belongs to a group set.

For automatic consolidation to take place, the following conditions must be met:
◆ The total size of all snapshots between the specified start and end times must be at least 1 GB.
◆ Snapshots that account for 90% of the consolidation period must be available in the journal. For example, for daily consolidation to take place, the starting and ending snapshots must be at least 22 hours apart. Likewise, automatic consolidation will not take place if the snapshots in the journal exceed 110% of the consolidation period. For example, daily consolidation will not take place if the starting and ending snapshots are more than 26 hours apart.

Manual snapshot consolidation

Manual snapshot consolidation allows you to select specific points in time to use as the starting and closing snapshots in a snapshot consolidation. For example, a script can be run once a day at a specific time to create a bookmark and then to use that bookmark as the closing snapshot for a manual consolidation.

Use the consolidate_snapshots command in the RecoverPoint command line interface to manually consolidate snapshots. See the EMC RecoverPoint CLI Reference Guide for more information.
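The two conditions for automatic consolidation listed under “Enabling automatic consolidation” (at least 1 GB of snapshots, and a snapshot span between 90% and 110% of the consolidation period) can be sketched as a simple check. This is a minimal illustration, not RecoverPoint code: the function name can_consolidate and its parameters are hypothetical, and the sketch applies the percentages exactly, whereas the guide's daily example quotes the rounded figures of 22 and 26 hours.

```python
from datetime import datetime, timedelta

MIN_CHANGES_GB = 1.0      # minimum total snapshot size stated in the guide
MIN_SPAN_FRACTION = 0.90  # snapshots must cover at least 90% of the period
MAX_SPAN_FRACTION = 1.10  # and must not exceed 110% of the period


def can_consolidate(start: datetime, end: datetime,
                    period: timedelta, total_gb: float) -> bool:
    """Hypothetical check of the documented conditions for one consolidation.

    start/end are the times of the starting and ending snapshots, period is
    the consolidation period (for example, 24 hours for a daily consolidation),
    and total_gb is the combined size of the snapshots in between.
    """
    span = end - start
    big_enough = total_gb >= MIN_CHANGES_GB
    span_ok = period * MIN_SPAN_FRACTION <= span <= period * MAX_SPAN_FRACTION
    return big_enough and span_ok


daily = timedelta(hours=24)
opening = datetime(2010, 3, 15, 10, 0)

print(can_consolidate(opening, opening + timedelta(hours=24), daily, 2.5))
print(can_consolidate(opening, opening + timedelta(hours=20), daily, 2.5))
```

A script that creates a daily bookmark for manual consolidation could run a check of this kind first, so that it only attempts a consolidation when enough data has accumulated.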
Snapshot consolidation policy

You can set a consolidation policy for an individual snapshot. The consolidation policy determines how the snapshot is treated during both automatic and manual snapshot consolidation. You can set the consolidation policy for a snapshot when it is first created, and you can set or change the consolidation policy for a snapshot (both regular and consolidated) at any time after it is first created.

By applying a consolidation policy to a snapshot, you can determine if and when a snapshot is consolidated. Table 2 summarizes the effects of snapshot consolidation policy.

Table 2 Snapshot consolidation policies

Snapshots with        Consolidated in    Consolidated in    Consolidated in    Consolidated in
this policy ...       daily              weekly             monthly            manual
                      consolidations?    consolidations?    consolidations?    consolidations?
Always consolidate    Yes                Yes                Yes                Yes
Survive daily         No                 Yes                Yes                Yes
Survive weekly        No                 No                 Yes                Yes
Survive monthly       No                 No                 No                 Yes
Never consolidate     No                 No                 No                 No

Take note of the following best practices:
◆ Apply a specific consolidation policy to a single snapshot within the period defined by the policy. That is, in any one month (28-day span), apply the ‘Survive monthly’ consolidation policy to a single snapshot. In any one week, apply the ‘Survive weekly’ policy to a single snapshot. In any one day, apply the ‘Survive daily’ policy to a single snapshot.
◆ Do not apply the same (or longer) consolidation policy to two snapshots within the same time period. If you do, automatic consolidation will not take place between those snapshots. For example, if you apply the ‘Survive weekly’ policy to two snapshots in the same week, no automatic consolidations will take place between the snapshots.
◆ Do not apply consolidation policies in overlapping periods. For example, do not apply the ‘Survive weekly’ policy in the same 24-hour period that you apply the ‘Survive daily’ policy.
If you apply overlapping policies, the policy with the longer time span takes precedence.

Figure 3 shows a bookmark named Server restored created on January 1 with a consolidation policy of ‘Survive weekly’.

Figure 3 Snapshot consolidation policy

Using the automatic consolidation settings defined in Figure 2 on page 42, the bookmark would be treated as follows during automatic snapshot consolidation:
◆ The bookmark is not included in any automatic consolidation job that runs within 48 hours of the bookmark being created.
◆ After 48 hours elapse, daily consolidation jobs begin to run. A daily consolidation job runs once a day for the next three days. But because the bookmark has a policy of ‘Survive weekly’, the bookmark is not consolidated in any of these daily jobs. Instead, the bookmark remains untouched in the journal in its original form.
◆ After 3 days (5 days since the bookmark was originally created), daily snapshots begin to be consolidated into weekly snapshots. Still, this bookmark, since it has a policy of ‘Survive weekly’, is not consolidated into a weekly consolidation. Rather, it remains available in the journal in its original form.
◆ After 2 weeks and 3 days, monthly consolidation jobs begin. At this point, the bookmark is consolidated into a monthly consolidation and is no longer available in the journal (unless it is selected by the consolidation process to be the monthly snapshot).

Manual consolidation overrides any consolidation policy applied to a specific snapshot, with the exception of ‘Never consolidate’. Since the Server restored bookmark has a consolidation policy of ‘Survive weekly’, it will be included in a manual consolidation if it falls within the time or image range specified by the consolidation.

Viewing consolidation results

Consolidated snapshots are represented by an icon in the Journal Tab of the copy.
The icon indicates the type of consolidation (daily, weekly, monthly, or manual). Additionally, a tooltip indicates the amount of space saved by the consolidation. The following image shows an example of a manual consolidation with a closing time of 20:09:16 that saved 6.84GB of space. See “The Journal Tab” on page 196 for more information.

You can also use the get_images command to view the results of a consolidation. See the EMC RecoverPoint CLI Reference Guide for more information.

Links

A link is the communication pipe between a production and replica copy, through which data is transferred. In RecoverPoint, data transfer for each link can be over WAN or Fibre Channel. There can be one or two such links per consistency group, depending on the number of copies in the RecoverPoint installation:
◆ The remote link: In CRR or CLR configurations; the link between the production volumes and their corresponding remote replica volumes.
◆ The local link: In CDP or CLR configurations; the link between the production volumes and their corresponding local replica volumes.

This section answers the questions:
◆ “What are the main properties of a link?”
◆ “What can trigger a link to close?”
◆ “How can I tell whether a link is open or closed?”

What are the main properties of a link?

The link can be open, meaning that transfer is possible. The link can be closed, meaning that transfer to the replica is not possible.

What can trigger a link to close?

A link closes during a user-triggered or system-triggered pause in transfer.

User-triggered pauses in transfer happen when the user manually:
◆ Runs the pause transfer command.
◆ Changes the primary RPA of the consistency group.
◆ Selects to enable image access in direct access mode.
◆ Changes the direction of replication, failing over to the replica (see “Failover” on page 77).

System-triggered pauses in transfer happen when the system encounters one of the following states:
◆ High loads, both temporary and permanent (see “My copy has entered a high load state” on page 334).
◆ Initialization after a disaster (see “Initialization” on page 78).
◆ WAN unavailable.
◆ No communication with RPA at remote site (for remote replica).
◆ No communication with RPA at local site (for local replica).
◆ System failure (a failed process, where the system does not recognize the reason for failure).

How can I tell whether a link is open or closed?

You can verify whether a link is open or closed in the RecoverPoint Management Application. In the main Consistency Groups table or in the Status pane of a particular consistency group, when the state of Transfer for a copy is something other than N/A, Paused or Paused by system (in other words, when the state of Transfer is Active, Initializing, etc.), the link is open.

Note: This is relevant only for replica copies, not for the production copy.

RecoverPoint performance

Naturally, replication performance is subject to:
◆ Replication requirements
◆ WAN/FC link conditions
◆ Storage performance
◆ Host performance
◆ Change-rate of replicated data
◆ Number of RPAs installed per site

The following settings are available to users who want to control and monitor the performance of their replication environments, according to the aforementioned constraints:
◆ “Application regulation”
◆ “Replication modes”
◆ “RPO control”
◆ “RTO control”
◆ “Distributed consistency groups”
◆ “Load balancing”

Application regulation

Allow Regulation is a consistency group policy that enables RecoverPoint to control the acknowledgement of writes back to the host in the case of bottlenecks or insufficient resources that would otherwise prevent RecoverPoint from replicating the data.

When the Allow Regulation setting is disabled, the selected RPO (lag) is not guaranteed, and the system will try its best to replicate within the RPO setting, without affecting host performance.
In asynchronous replication mode (see “Asynchronous replication mode” on page 51), when enabled, the Allow Regulation setting slows host applications when approaching the lag policy limit (see “RPO control” on page 53) or a high load state (see “My copy has entered a high load state” on page 334). In synchronous replication mode (see “Synchronous replication mode” on page 51), this setting is automatically enabled, and cannot be modified.

For more information, see “My host applications are hanging” on page 330 and “Allow Regulation” on page 146.

Replication modes

Regardless of the replication mode, RecoverPoint is unique in its ability to guarantee a consistent replica at the target side under all circumstances, and in its ability to retain write order fidelity in multi-host heterogeneous SAN environments.

RecoverPoint replicates data in one of two replication modes:
◆ Asynchronous replication mode - the host application initiates a write, and does not wait for an acknowledgment from the remote RPA before initiating the next write. The data of each write is stored in the local RPA, and acknowledged at the local site. The RPA decides, based on the lag policy and system loads/available resources, when to transfer the writes in the RPA to the replica storage. This is the default replication mode.
In asynchronous replication mode, a Snapshot Granularity policy can be configured to regulate data transfer (see Snapshot Granularity in “Consistency Group Advanced Policy Settings” on page 149). The following granularities can be defined:
• Fixed (per write): To create a snapshot from every write operation.
• Fixed (per second): To create one snapshot per second.
• Dynamic: To have the system determine the snapshot granularity according to available resources.
See “Asynchronous replication mode” on page 51 for a more detailed description.
◆ Synchronous replication mode - the host application initiates a write, and then waits for an acknowledgment from the remote RPA before initiating the next write. This is not the default replication mode, and must be specified by the user. Replication in synchronous mode produces a replica that is always one hundred percent up-to-date with its production source. The trade-off is that, to ensure that no subsequent writes are made until an acknowledgement is received from the remote RPA, host applications can be regulated by RecoverPoint, and this could impact application performance. Alternatively, users can configure RecoverPoint to dynamically alternate between synchronous and asynchronous replication modes, according to predefined lag and/or throughput conditions. To do so, see “Dynamic sync mode” on page 52.
Synchronous replication mode is only supported for replication:
• to a local replica.
• to a remote replica over Fibre Channel.
Note: Synchronous replication mode is not supported for replication over the WAN.
See “Synchronous replication mode” on page 51 for a more detailed description.

Asynchronous replication mode

The host application initiates a write, and does not wait for an acknowledgment from the remote RPA before initiating the next write. The data of each write is stored in the local RPA, and acknowledged at the local site. The RPA decides, based on the lag policy and system loads/available resources, when to transfer the writes in the RPA to the replica storage. This is the default replication mode.

The primary advantage of asynchronous replication is its ability to provide synchronous-like replication without degrading the performance of host applications.

Asynchronous replication, however, is not the primary mode of replication in all situations. Asynchronous replication does not conserve bandwidth.
Furthermore, and particularly as volumes increase, more data can be lost, as larger chunks of data that have been acknowledged at the local site may not be delivered to the target side in the case of a disaster.

RecoverPoint replicates asynchronously only in situations in which doing so enables superior host performance without resulting in an unacceptable level of potential data loss.

Synchronous replication mode

The host application initiates a write, and then waits for an acknowledgment from the remote RPA before initiating the next write. This is not the default replication mode, and must be specified by the user. To configure RecoverPoint to replicate synchronously, see “Consistency Group Protection Policy Settings” on page 144.

In order to ensure that no subsequent writes are made until an acknowledgement is received from the remote RPA, host applications are regulated by RecoverPoint. If your applications cannot be regulated for any reason, choose asynchronous replication mode.

Replication in synchronous mode produces a replica that is always one hundred percent up-to-date with its production source.

Synchronous replication mode is efficient for replication both within the local SAN environment (as in CDP configurations, referred to as local replication), as well as for replication over Fibre Channel (as in CRR configurations, referred to as remote replication). However, when replicating synchronously, the longer the distance between the production source and the replica copy, the greater the latency.

In order to replicate data synchronously, your current RecoverPoint license must support synchronous replication. To verify that your current RecoverPoint license supports synchronous replication, from the main system menu of the RecoverPoint Management Application, select System > System Settings.
Click on Account Settings in the Navigation Pane, and verify that the word Supported is displayed next to Synchronous Replication in the License Usage section of this dialog box. If you wish to replicate synchronously and this feature is not supported in your current version of RecoverPoint, contact EMC Customer Service.

By default, new consistency groups are created with asynchronous mode enabled, and must be set to replicate synchronously through the RecoverPoint Management Application, see “Consistency Group Protection Policy Settings” on page 144.

Note: New consistency groups are created with the Measure lag when writes reach the target RPA (as opposed to the journal) setting enabled. When replicating synchronously, performance is substantially higher when this setting is enabled, see “Measure lag when writes reach the target RPA (as opposed to the journal)” on page 150.

To verify that a consistency group copy is replicating synchronously, check its transfer status in the Consistency Groups Tab, see Figure 6 on page 187 or the group’s Status Tab, see “The Status Tab” on page 190.

Dynamic sync mode

When replicating synchronously over longer distances, users can set RecoverPoint to replicate in dynamic sync mode, a submode of synchronous replication mode. In this mode, users can define group protection policies that will enable the group to automatically begin replicating asynchronously whenever the group’s data throughput or latency reaches a maximum threshold, and then automatically revert to synchronous replication mode when the throughput or latency falls below a minimum threshold.

You can also switch manually between replication modes, using the RecoverPoint CLI.
This is useful, for example, if you generally require synchronous replication, but wish to use CLI scripts and your system scheduler to manually switch between replication modes during different times in the day, like during your nightly backups.

When the replication policy is controlled dynamically by both throughput and latency (both Dynamic by latency and Dynamic by throughput are enabled), it is enough that one of the two values of Start async replication above is met for RecoverPoint to automatically start replicating asynchronously to a replica. However, both Resume sync replication below settings must be met before RecoverPoint will automatically revert to synchronous replication mode.

To prevent jittering, the values specified for Resume sync replication below must be lower than the values specified for Start async replication above, or the system will issue an error.

To check whether a replica is being replicated to synchronously, see “Multiple consistency group management” on page 186.

Note: Groups undergo a short initialization every time the replication mode changes (for example, from synchronous to asynchronous and vice-versa). During initialization, data is not transferred synchronously.

RPO control

The Recovery Point Objective (or RPO) is the point in time to which you are required to recover data, for a specific application, as defined by your organization. This is generally a definition of what an organization determines is an acceptable loss in a disaster situation.

For example: If a company’s data must be restored to within 3 hours of the disaster event and the time it takes to get the recovered data back into production is 6 hours:
◆ The RPO is 3 hours
◆ The RTO is 6 hours

There is a trade-off between the amount of data that a customer is willing to lose and its cost.
If the customer must have an RPO of zero, this means that replication must be synchronous (see “Synchronous replication mode” on page 51), or in other words, that each write must be replicated to the DR site before another write is made. This usually introduces additional cost in terms of the resources that are required for this to occur effectively (such as storage performance and replication bandwidth).

Each RecoverPoint configuration provides a different level of protection (in terms of RPO) in the case of logical, storage, local and regional disasters.

How do I control the RPO?

In asynchronous replication mode (see “Asynchronous replication mode” on page 51), RecoverPoint allows you to control the RPO for each consistency group through the Lag and Allow regulation settings in the consistency group protection policy section (see “Consistency Group Protection Policy Settings” on page 144).

Using the Lag setting, RPO can be expressed in terms of time, quantity of data, or number of writes. To guarantee the RPO setting, host applications can be throttled upon approaching the defined Lag setting, using the Allow regulation setting (see “Allow Regulation” on page 146).

In synchronous replication mode (see “Synchronous replication mode” on page 51), this setting is automatically enabled, and cannot be modified.

How do I monitor the RPO?

To monitor the RPO, in the EMC RecoverPoint Management Application:
◆ Click on a consistency group name in the Navigation Pane, click the Statistics Tab in the Components area, and click the Replication Performance Tab at the bottom of the Components area. In the Replication Performance tab, the consistency group's RPO can be monitored in terms of time, writes, or quantity of data.
◆ You can also click on System Monitoring in the Navigation Pane, and select the Groups tab to monitor the lag (RPO) of all consistency groups in RecoverPoint.
In the RecoverPoint Command Line Interface, run the get_group_statistics command, and identify the Lag output in the Link stats area.

RTO control

The Recovery Time Objective (or RTO) is the duration of time and a service level within which a business process must be restored after a disaster (or disruption) in order to avoid unacceptable consequences associated with a break in business continuity.

RTO includes the time for trying to fix the problem without a recovery, the recovery itself (RecoverPoint's role in an organization's RTO, controlled by the Maximum journal lag setting), tests and the communication to the users. Decision time for users is not included.

RecoverPoint's role in an organization's RTO can be defined as the amount of time it would take to enable physical access to the latest application-consistent image at a replica, or in other words, the amount of time that it would take to apply all of the snapshots in a replica journal to the replica storage and bring the replica storage up-to-date with the latest application-consistent image at production. In RecoverPoint, RTO is measured by data size.

Note: Since there is a trade-off between the amount of data that can be stored in a replica journal (the amount of available data recovery points) and the amount of time to which the replica can be up-to-date with production (the RTO setting), if there is no specific reason to modify the RTO setting, it is recommended to leave the default setting.

How do I control the RTO?

RecoverPoint allows you to control the data access aspects of RTO for each copy through the Maximum journal lag setting in the copy journal policy (see “Copy Journal Policy Settings” on page 155). When the system approaches the limit set in the Maximum journal lag policy, it moves to three-phase distribution mode (see “Three-phase distribution” on page 97).

How do I monitor the RTO?
In the RecoverPoint Management Application GUI, click on a copy name in the Navigation Pane and select the copy Journal Tab to display the current journal lag. The value displayed in the Journal Lag field indicates the RTO in terms of data size.

In the EMC RecoverPoint Command Line Interface, run the get_group_statistics command to display the journal lag.

Distributed consistency groups

Distributed consistency groups allow users to create and use consistency groups that require a total throughput and IOPS rate that exceeds the supported throughput and IOPS rate of a single RecoverPoint appliance, and prevent users from having to split data that requires strict write-order fidelity into multiple consistency groups. They do so by dividing the consistency group into four segments, and running these segments on one primary RPA and one to three additional secondary RPAs, as defined by the user.

Distributed groups can handle a much higher throughput and IOPS rate (see the EMC RecoverPoint and RecoverPoint/SE Release Notes for this limit) regardless of the amount of data being replicated. Up to eight distributed consistency groups can be defined in RecoverPoint, and the total number of distributed and non-distributed consistency groups is 128.

Distributed consistency group support is a function of your RecoverPoint license. See “How do I verify that distributed groups are supported by my RecoverPoint license?” on page 61 for more information.

To better understand the differences between distributed and non-distributed consistency groups, read “Consistency groups” on page 30. For the complete set of limitations associated with distributed consistency groups, see the EMC RecoverPoint and RecoverPoint/SE Release Notes.
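The numeric limits stated above (one primary RPA plus one to three secondary RPAs per distributed group, up to eight distributed groups, and 128 groups in total) can be checked against a planned layout before any groups are created. The sketch below is a planning aid under stated assumptions, not a RecoverPoint interface: the GroupPlan class, the validate function, and the RPA names are all hypothetical, and a group is treated as distributed simply when it lists at least one secondary RPA.

```python
from dataclasses import dataclass, field
from typing import List

MAX_SECONDARY_RPAS = 3      # one primary plus one to three secondary RPAs
MAX_DISTRIBUTED_GROUPS = 8  # limit on distributed groups stated in the guide
MAX_TOTAL_GROUPS = 128      # distributed and non-distributed groups combined


@dataclass
class GroupPlan:
    """Hypothetical description of a planned consistency group."""
    name: str
    primary_rpa: str
    secondary_rpas: List[str] = field(default_factory=list)

    @property
    def distributed(self) -> bool:
        # Assumption of this sketch: any secondary RPA makes the group distributed.
        return len(self.secondary_rpas) > 0


def validate(plans: List[GroupPlan]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no violations."""
    problems = []
    if len(plans) > MAX_TOTAL_GROUPS:
        problems.append(f"{len(plans)} groups planned; the limit is {MAX_TOTAL_GROUPS}")
    distributed = [p for p in plans if p.distributed]
    if len(distributed) > MAX_DISTRIBUTED_GROUPS:
        problems.append(
            f"{len(distributed)} distributed groups planned; "
            f"the limit is {MAX_DISTRIBUTED_GROUPS}"
        )
    for p in plans:
        if len(p.secondary_rpas) > MAX_SECONDARY_RPAS:
            problems.append(
                f"group {p.name} lists {len(p.secondary_rpas)} secondary RPAs; "
                f"the limit is {MAX_SECONDARY_RPAS}"
            )
    return problems


plans = [
    GroupPlan("orders", primary_rpa="RPA1", secondary_rpas=["RPA2", "RPA3"]),
    GroupPlan("archive", primary_rpa="RPA4"),
]
print(validate(plans))
```

The Release Notes remain the authority for throughput limits and for any further restrictions on distributed groups; this check covers only the three counts quoted in this section.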
This section answers the questions:
◆ “When should I set a consistency group as distributed?”
◆ “How do distributed consistency groups work?”
◆ “What should I know before setting a group as distributed?”
◆ “How do I verify that distributed groups are supported by my RecoverPoint license?”
◆ “How do I set a consistency group as distributed?”
◆ “How do I monitor each group segment’s performance?”

When should I set a consistency group as distributed?

You should consider setting a consistency group as distributed when:
◆ The maximum throughput rate of a single RPA is not sufficient to sustain the write-rate or peaks of the consistency group (see the EMC RecoverPoint and RecoverPoint/SE Release Notes for the maximum throughput rate of a single RPA).
◆ Your consistency group frequently experiences high loads.
◆ You expect that a consistency group will, in the future, require a higher throughput rate than that of a single RPA. In this case, it is preferable to initially create the consistency group as distributed, rather than modifying an existing consistency group after creation. The reason for this is explained in the journal configuration instructions in “What should I know before setting a group as distributed?” on page 59.

How do distributed consistency groups work?

Distributed consistency groups are divided into four segments, and these segments are transferred through one primary RPA and up to three secondary RPAs, as designated by the user.

RecoverPoint data recovery processes are affected in the following way:
◆ The primary RPAs at both sites (if two sites exist) are responsible for the receipt and handling of all system process requests.
◆ All of the marking information is handled by the primary RPA at the source-side.

The data flow for distributed consistency groups, in synchronous and asynchronous replication modes, is described in “The transfer phase” and “The distribution phase”.
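The division of a distributed group into four segments can be pictured with a short sketch. This guide does not document how RecoverPoint maps segments to RPAs, so the round-robin mapping, the function names, and the RPA labels below are all assumptions made for illustration only.

```python
from collections import Counter

SEGMENTS_PER_GROUP = 4  # from the text: a distributed group is divided into four segments


def assign_segments(primary_rpa, secondary_rpas):
    """Map the four segments onto one primary and one to three secondary RPAs.

    ASSUMPTION: round-robin order. The real RecoverPoint mapping is not
    described in this guide and may differ.
    """
    if not 1 <= len(secondary_rpas) <= 3:
        raise ValueError("a distributed group needs one to three secondary RPAs")
    rpas = [primary_rpa, *secondary_rpas]
    if len(set(rpas)) != len(rpas):
        raise ValueError("each RPA may be listed only once")
    return {segment: rpas[segment % len(rpas)] for segment in range(SEGMENTS_PER_GROUP)}


def busiest_rpa_segments(assignment):
    """Return the largest number of segments carried by any single RPA."""
    return max(Counter(assignment.values()).values())


if __name__ == "__main__":
    for secondaries in (["RPA2"], ["RPA2", "RPA3"], ["RPA2", "RPA3", "RPA4"]):
        mapping = assign_segments("RPA1", secondaries)
        print(len(secondaries) + 1, "RPAs:", mapping, "busiest carries", busiest_rpa_segments(mapping))
```

Under this assumed mapping, the busiest RPA still carries two segments when the group runs on two or three RPAs, and only one segment when it runs on four. Treat this as a way to reason about RPA counts, not as a description of the product's internal behavior.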
What should I know before setting a group as distributed?

When setting a group as distributed, the following limitations apply:
◆ The snapshot granularity of all links in the consistency group can be no finer than one second. See “Consistency Group Advanced Policy Settings” on page 149 for more information on snapshot granularity.
◆ Journal loss will occur when modifying a group’s topology (setting a non-distributed group as distributed, or setting a distributed group as non-distributed).
◆ When configuring journals for a distributed consistency group, keep the following in mind:
• All copies of distributed consistency groups must have a journal that is at least 20 GB in size. See “Journals” on page 33 for more information.
• The recommended journal size for distributed groups with snapshot consolidation enabled is at least 40 GB. See “Snapshot consolidation” on page 39 for more information.
• If the capacity of an existing copy journal is less than the minimum journal size required for distributed consistency groups (see the EMC RecoverPoint and RecoverPoint/SE Release Notes for this limit), the consistency group will need to be disabled and then enabled again after adding journal volumes, and this will cause a full sweep. See “How to resize an existing journal volume” on page 171 for more information.
◆ Distributed consistency groups are only supported if there is a Fibre Channel connection between all RPAs in a RecoverPoint cluster (per site). Therefore:
• In Fibre Channel environments, make sure all of the RPAs at each site are connected to the SAN through a Fibre Channel switch, and zoned together so that they see each other in the SAN.
• In iSCSI environments, make sure all RPAs are physically connected to each other through their HBA Fibre Channel ports.
Note: In iSCSI configurations, there can be a maximum of two RPAs per RecoverPoint cluster (i.e.
per site) because two of the four existing Fibre Channel ports in the RPA’s HBA are already connected directly to the storage. If more than two RPAs per RecoverPoint cluster are required, connect all of the RPAs in the cluster through Fibre Channel switches (two should be used for high availability) and zone them together.
◆ If any of the primary or secondary RPAs associated with a consistency group becomes unavailable, there will be a brief pause in transfer on all of the group’s primary and secondary RPAs, and all of the group segments will undergo a short initialization.
◆ Under certain circumstances (for example, if one of the primary or secondary RPAs becomes unavailable) two consistency group segments could be handled by the same RPA.
◆ In general, distributed consistency groups offer better performance than non-distributed (regular) consistency groups, as distributed groups run on a minimum of two RPAs (one primary RPA and one secondary RPA). There is only a small improvement in performance when a group is run on three RPAs. However, there is a steep improvement in performance when a group is run on four RPAs.

How do I verify that distributed groups are supported by my RecoverPoint license?

To verify that this feature is supported, select System > System Settings from the main RecoverPoint menu and click on the Account Settings tab. In the License Usage section of the account settings screen, you should see the text Distributed Groups: Supported. If this text does not appear, contact EMC Customer Support.

How do I set a consistency group as distributed?

Before you set a consistency group as distributed, read “What should I know before setting a group as distributed?” on page 59, and review the relevant limitations and performance statistics in the EMC RecoverPoint and RecoverPoint/SE Release Notes.
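The prerequisites listed in “What should I know before setting a group as distributed?” can be collected into a planning checklist before the setting is changed. The following is a minimal sketch, not a RecoverPoint API: the CopyPlan record, the precheck function, and the representation of snapshot granularity as a number of seconds are hypothetical, and the numeric limits are the ones quoted in this section (the Release Notes remain the authoritative source).

```python
from dataclasses import dataclass

MIN_JOURNAL_GB = 20            # minimum journal size per copy (from the text)
RECOMMENDED_JOURNAL_GB = 40    # recommended with snapshot consolidation (from the text)
FINEST_GRANULARITY_SEC = 1.0   # links can be no finer than one second (from the text)
MAX_DISTRIBUTED_GROUPS = 8     # system-wide limit (from the text)


@dataclass
class CopyPlan:                # hypothetical planning record, not a RecoverPoint object
    name: str
    journal_gb: float
    snapshot_consolidation: bool = False


def precheck(copies, link_granularity_sec, secondary_rpas, distributed_groups_in_use):
    """Return (errors, warnings) for a group that is about to be set as distributed."""
    errors, warnings = [], []
    for copy in copies:
        if copy.journal_gb < MIN_JOURNAL_GB:
            errors.append(f"{copy.name}: journal below {MIN_JOURNAL_GB} GB")
        elif copy.snapshot_consolidation and copy.journal_gb < RECOMMENDED_JOURNAL_GB:
            warnings.append(f"{copy.name}: journal below recommended {RECOMMENDED_JOURNAL_GB} GB")
    if any(g < FINEST_GRANULARITY_SEC for g in link_granularity_sec):
        errors.append("snapshot granularity finer than one second")
    if not 1 <= secondary_rpas <= 3:
        errors.append("select one to three secondary RPAs")
    if distributed_groups_in_use >= MAX_DISTRIBUTED_GROUPS:
        errors.append("distributed group limit reached")
    return errors, warnings


if __name__ == "__main__":
    plan = [CopyPlan("local", 25, snapshot_consolidation=True), CopyPlan("remote", 10)]
    print(precheck(plan, [1.0], secondary_rpas=2, distributed_groups_in_use=3))
```

The sketch deliberately separates hard limits (errors) from the recommended journal size (a warning). It does not check licensing or Fibre Channel connectivity, which must still be verified as described in this section.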
To set a non-distributed group as distributed, or set a distributed group as non-distributed:
1. Display the consistency group’s Advanced policy section.
2. Check or uncheck the Distribute group checkbox.
3. The Secondary RPA checkboxes are enabled or disabled accordingly.
4. If you are enabling this feature, select one to three secondary RPAs, by checking the relevant checkboxes in the Secondary RPA section.
Note: When modifying a group’s topology, journal loss will occur.

How do I monitor each group segment’s performance?

To monitor each group segment’s performance separately, and in relation to the group as a whole, run the detect_bottlenecks command in the EMC RecoverPoint Command Line Interface.

Load balancing

Load balancing is the process of assigning preferred RPAs to consistency groups. Assigning a preferred RPA to a consistency group dictates the RPA to “prefer” when transferring data for a specific group. During a group’s creation, one or more preferred RPAs (see “Distributed consistency groups” on page 58) must manually be assigned to each new group. Each group will always run on its preferred RPA/s, unless an RPA disaster occurs. If an RPA disaster occurs, the group will flip over to another (non-preferred) RPA, and then back as soon as possible.

Flipover (also known as switchover) can cause high loads on RPAs when the loads of all consistency groups defined in the system are not evenly distributed between RPAs, so these loads must be balanced. In other words, groups must be moved from RPA to RPA from time to time to re-arrange the load. On the other hand, to ensure consistency, consistency groups are initialized when moving from RPA to RPA. During switchover, all groups running on the preferred RPA are initialized once, when they move to the non-preferred RPA, and then another time, when they switch back to their preferred RPAs.
Therefore, any re-assignment of RPAs during replication should be carefully planned, so as not to affect the performance of the production host applications.

This section answers the questions:
◆ “When should I perform load balancing?”
◆ “How do I perform load balancing?”
◆ “How do I check that load balancing improved my group’s performance?”

When should I perform load balancing?

You should perform load balancing:
◆ When a new consistency group is added to the replication environment. After a new consistency group is added to the system (see “Creating new consistency groups” on page 132), wait a week (7 full days) for a long enough traffic history to accumulate before you perform load balancing.
◆ When a new RPA is added to a RecoverPoint cluster. Perform load balancing immediately after the RPA is added.
◆ If your system enters high load frequently. When load balancing is required, the high load event logs will display a message indicating so. When you see this message, perform load balancing.
◆ If the bottleneck detection tool recommends it. When load balancing is required, the bottleneck detection tool will display a message indicating so. When you see this message, perform load balancing.
◆ Periodically. To ensure that your system is always distributing loads evenly, a script can be created to periodically perform load balancing.

How do I perform load balancing?

There are two load balancing methods in RecoverPoint:
◆ “Manual load balancing”, which is performed by setting or changing a consistency group’s primary RPA setting (and secondary, if relevant).
◆ “Using RecoverPoint’s load balancing tool”, which analyzes multiple group performance over time, recommends a load balancing strategy based on this analysis, and optionally, automatically assigns preferred RPAs to specified consistency groups based on the recommendation.

See also:
◆ “When should I perform load balancing?” on page 63
◆ “Manual load balancing” on page 64
◆ “Using RecoverPoint’s load balancing tool” on page 65
◆ “How do I check that load balancing improved my group’s performance?” on page 72

Manual load balancing

To manually perform load balancing:
1. Perform a system analysis to identify consistency groups with a high change-rate, or run the balance_load CLI command to create a load balancing recommendation (see “How do I use the load balancing tool?” on page 67).
2. Based on your system analysis or the load balancing recommendation output (see “How do I check that load balancing improved my group’s performance?” on page 72), select the name of the group whose preferred RPAs you want to change from the Navigation Area, and click on the group’s Policy Tab.
3. Change the preferred RPAs of the consistency group:
• For regular (non-distributed) consistency groups, change the Primary RPA setting and click the Apply button. See “Non-distributed (regular) consistency groups” on page 32 for more information.
• For distributed consistency groups, you have the following options:
– Change the Primary RPA setting and click the Apply button.
– Change any or all secondary RPAs by clicking on the Advanced policy section, and then checking or unchecking the relevant secondary RPAs in the General section of the Advanced tab.
Note: See “Distributed consistency groups” on page 58 for more information.
4. Check the load balancing results to verify improved performance (see “How do I check that load balancing improved my group’s performance?” on page 72).
Note: Each time a consistency group’s preferred RPA (primary or secondary) is modified, the group undergoes an initialization to ensure consistency.
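When planning manual load balancing from your own system analysis, it can help to draft the assignment offline before touching any group policy. The sketch below is only a planning aid under stated assumptions: it uses a generic greedy heuristic (largest load first, onto the least-loaded RPA), which is not the algorithm used by the balance_load command, and the function names, group names, and throughput figures are hypothetical.

```python
import heapq


def propose_primary_rpas(group_throughput, rpas):
    """Greedy proposal: place the heaviest groups first, each on the least-loaded RPA.

    group_throughput maps a group name to its average throughput (any consistent unit).
    ASSUMPTION: a generic heuristic for illustration, not RecoverPoint's analysis.
    """
    heap = [(0.0, rpa) for rpa in rpas]      # (accumulated load, RPA name)
    heapq.heapify(heap)
    proposal = {}
    # Sort by descending load; break ties by name so the result is deterministic.
    for group, load in sorted(group_throughput.items(), key=lambda kv: (-kv[1], kv[0])):
        total, rpa = heapq.heappop(heap)
        proposal[group] = rpa
        heapq.heappush(heap, (total + load, rpa))
    return proposal


def required_changes(current, proposal):
    """Return only the groups whose preferred RPA would actually change."""
    return {g: rpa for g, rpa in proposal.items() if current.get(g) != rpa}


if __name__ == "__main__":
    throughput = {"db": 50, "mail": 30, "web": 20, "logs": 10}   # hypothetical figures
    current = {"db": "RPA1", "mail": "RPA1", "web": "RPA2", "logs": "RPA1"}
    proposal = propose_primary_rpas(throughput, ["RPA1", "RPA2"])
    print(proposal)
    print(required_changes(current, proposal))
```

Because every change to a preferred RPA triggers an initialization of the group, listing only the required changes makes it easier to weigh the benefit of a better balance against the cost of each re-assignment.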
Using RecoverPoint’s load balancing tool

RecoverPoint’s automated load balancing tool enables:
◆ More efficient preliminary group configuration, by automating the process of assigning preferred primary RPAs (and preferred secondary RPAs, see “Distributed consistency groups” on page 58) that would otherwise be performed manually, while also providing a recommendation on which users who prefer to balance RPA load manually can base their changes.
◆ More efficient ongoing group administration, by automating the task of balancing the loads of groups between RPAs, as workloads and circumstances change after the preliminary group configuration.

Since the modification of preferred RPA assignments causes flipovers that cause initializations, if the load balancing analysis finds that no significant changes in workload or number of consistency groups have occurred, the mechanism will not recommend preferred RPA re-assignment. Also, the load balancing mechanism smooths out I/O statistics, so that any sudden peaks or lapses in traffic do not cause it to recommend unnecessary changes.

This section answers the questions:
◆ “What is the load balancing tool good for?”
◆ “What should I know before I begin using the tool?”
◆ “How do I use the load balancing tool?”
◆ “How do I retrieve the load balancing results?”

What is the load balancing tool good for?

RecoverPoint’s load balancing tool automates the following tasks:
◆ Analyzing the performance of a specified set of, or all, consistency groups in the system over a substantial period of time, and displaying a recommended load balancing strategy based on the historical performance statistics.
◆ Saving the recommended load balancing strategy (containing the current traffic performance statistics) of any or all of the consistency groups in the system, at a particular point-in-time:
• for later reference and analysis, see “How do I retrieve the load balancing results?” on page 71 and “How do I check that load balancing improved my group’s performance?” on page 72.
• as reference material, from which to perform manual load balancing, see “Manual load balancing”.
◆ Automatically re-assigning preferred RPAs to all, or a subset of all, consistency groups in the system, based on the recommended load balancing strategy.

What should I know before I begin using the tool?

Before you begin using the load balancing tool, note the following:
◆ The balance_load command is only available on the site control RPA, or through the floating Site Management IP. This command cannot be run if site control is down. If site control goes down, and then back up again, five minutes must pass before this command can be run again.
◆ The load balancing analysis is performed on all RPAs in the environment. If there are three RPAs in total, and one of them goes down for a week, and it is the primary RPA of a group included in the analysis, the load balancing recommendation calculations are still based on the total number of RPAs, as if they were all working. In other words, the load balancing mechanism accumulates group statistics for its calculations, as if groups were always running on their preferred RPAs, even if flipovers happened during the week.
◆ The output of the load balancing analysis and recommendation are saved in a text file. By default, only the most recent ten files are saved, see “How do I retrieve the load balancing results?” on page 71 for more information.
◆ For best results, groups should be configured and replicating for seven days before running the load balancing tool, so that a long enough traffic history is available for the load balancing analysis. If seven days of traffic data are not available, any existing time period of data is used, and the load balancing recommendation is accompanied by a note indicating the period of time upon which the recommendation results are based, and noting that seven days are preferable.
◆ The load balancing analysis is performed on all RPAs at all sites.
◆ Groups can be excluded from the load balancing recommendation; however, even when excluded from the recommendation, all groups are included in the analysis.
◆ The load balancing tool is a smart mechanism that can identify cases in which distributed group segments do not actually require four separate RPAs to run on, and recommends RPA assignments accordingly. Distributed group segments can be run on different RPAs, or the same RPA, and each group segment is treated as a separate group in the load balancing analysis and recommendation. If you are not well acquainted with the distributed consistency group feature, it is recommended that you become acquainted with this feature before using the load balancing tool. See “Distributed consistency groups” on page 58 for more information.

How do I use the load balancing tool?

If you want to use the load balancing tool:
◆ To analyze system performance and display a load balancing recommendation containing the performance results on the screen, perform steps 1-5.
◆ To analyze system performance, display a load balancing recommendation containing the performance results on the screen, and retrieve the results from the webdownload directory for later reference or to perform manual load balancing (see “Manual load balancing” on page 64), perform steps 1-6, but answer NO when asked to apply the recommendation.
◆ To analyze system performance, display a load balancing recommendation containing the performance results on the screen, and automatically re-assign RPAs to consistency groups based on this recommendation, perform steps 1-6, and answer YES when asked to apply the recommendation.

To use the load balancing tool:
1. Make sure you are well acquainted with the information in “What should I know before I begin using the tool?” on page 66.
2. Open a direct SSH session with the RPA running the site control, or connect to the RPA running the site control through the site management IP.
3. Run the balance_load command.
4. Enter the consistency groups to exclude from the load balancing recommendation (enter multiple consistency groups as a comma-separated list), and press ENTER.
Note: The groups that you specify here will be preceded by an asterisk in the load balancing recommendation output, and their traffic statistics will still be included in the analysis upon which the recommendation is based.
5. The system lets you know that the analysis and recommendation process can take several minutes, and prompts you as to whether or not to start the process now. Type YES, and press ENTER, to run the analysis and recommendation process now.
[Wait for the process to complete]
At the end of the process:
a. If the load balancing recommendation is based on less than 7 days’ worth of data, a note is displayed indicating so.
b. One or more of the following load balancing recommendations are displayed:
– No action is necessary. The environment is stable, and groups are evenly distributed across all RPAs.
– Action is necessary. The environment is not stable because groups are not evenly distributed across all RPAs. To correct this, apply the load balancing recommendation or manually modify the Preferred RPA of each group.
– Action may be necessary.
The environment is stable, although groups are not evenly distributed across all RPAs, and this may affect future performance. To distribute groups evenly across all RPAs, apply the load balancing recommendation or manually modify the Preferred RPA of each group.
c. The current preferred and recommended RPA assignments are displayed per consistency group, along with each RPA’s average throughput and incoming write-rate, in a Recommended Load Distribution table.
Note: For distributed groups, each consistency group segment is treated as a separate group, and the performance statistics are displayed per group segment.
d. Two tables, Traffic per RPA before Application of Recommendation and Traffic per RPA after Application of Recommendation, are displayed to help you understand the performance implications on all RPAs if you choose to apply the recommendation. Each RPA’s average throughput and IOPS are displayed, and the RPA with the least amount of traffic is highlighted, in each of the tables.
e. If any non-distributed (regular) consistency groups have experienced a minute of throughput that exceeded the maximum throughput rate of a single RPA (as described in the EMC RecoverPoint and RecoverPoint/SE Release Notes) at least 50 times during the week, the load balancing recommendation indicates that these groups should be set as distributed. If any distributed consistency groups have experienced less than 40 minutes of throughput at a rate of over 60 MB/sec during the week, the load balancing recommendation indicates that these groups should be set as non-distributed.
f. If the total group throughput is exceeded, a message is displayed indicating that you should add additional RPAs to your RecoverPoint cluster, and run the balance_load command again after 7 days.
Note: See the EMC RecoverPoint and RecoverPoint/SE Release Notes for the total throughput per distributed consistency group and total throughput per RPA limitations.
6. Decide whether or not to apply the recommendation:
◆ If the load balancing recommendation was No action is necessary, a message is displayed indicating the location of the output file containing the analysis results and recommendations, and the process ends. See “How do I retrieve the load balancing results?” on page 71 for instructions on how to retrieve the results. You can save the results file, and use it later to:
• analyze system performance over an extended period of time.
• manually perform load balancing, see “Manual load balancing” on page 64.
Note: Only the past ten load balancing results are saved. The oldest results are replaced with the newest results.
◆ If the load balancing recommendation was Action is necessary or Action may be necessary, RecoverPoint asks whether you want to apply the recommended load balancing.
• Type NO, and press ENTER, to indicate that you do not wish to apply the recommended load balancing. In this case, a message is displayed indicating the location of the output file containing the analysis results and recommendations, and the process ends. See “How do I retrieve the load balancing results?” on page 71 for instructions on how to retrieve the results.
• Type YES, and press ENTER, to indicate that you do wish to apply the recommended load balancing. In this case:
– All relevant consistency group preferred RPAs are re-assigned according to the load balancing recommendation.
– An initialization process starts on each consistency group whose preferred primary or secondary RPAs are modified.
Note: For distributed consistency groups, if one group segment is initialized, all group segments are initialized.
– A normal scope event is logged indicating that the load balancing process has ended.
The event contains a copy of the load balancing recommendation output.
– A message is displayed indicating the location of the output file containing the analysis results and recommendations, and the process ends. See “How do I retrieve the load balancing results?” on page 71 for instructions on how to retrieve the results.
7. Optionally, you can now check the implications on performance, see “How do I check that load balancing improved my group’s performance?” on page 72.

How do I retrieve the load balancing results?

After using the load balancing tool, to retrieve the results:
1. Open a browser window and type the following path into your browser’s address bar:
<SiteMgmtIP/SiteControlIP>/info/load_balancing/
Note: The balance_load command is only available on the site control RPA, or through the floating Site Management IP.
2. Download the relevant text file/s and store them for later reference. The load balancing output is saved to text files with the name: balance_load_yyyy-mm-dd_hh-mm.txt.
Note: Ten load balancing output files are stored in this location at a time. When ten files have already been stored, the older files are removed to make way for newer files. Note that the date and time of the output file are indicated in the filename. The file names are suffixed with the text “applied” whenever the load balancing recommendation was actually applied by the user, through the load balancing tool.

How do I check that load balancing improved my group’s performance?

After load balancing manually or applying the recommendation produced by the load balancing tool, check the implications on performance:
1. Wait 7 days, for a new traffic history to be available.
2. Re-run the balance_load CLI command, see “How do I use the load balancing tool?” on page 67.
3.
Compare the old load balancing results and performance statistics to the new load balancing results and performance statistics, see “How do I retrieve the load balancing results?” on page 71.

RecoverPoint data recovery procedures

The following procedures are central in RecoverPoint:
◆ “Image access”
◆ “Failover”

Image access

When replicating normally, writes to the production source are also written to the journal of the replica or replicas. The state of the storage at the copy site is No access; that is, a host cannot access the storage.

To test a replica to verify that it is a reliable and consistent replica of the production storage image, it is necessary to access the image of the replica. Image access is also required in order to restore the production storage from the replica, to roll back to a previous state of the data, to recover files, and to fail over to the replica. Such access can be enabled only by suspending distribution of data at the copy, from the journal to the replica volume/s.

Access to the remote image can be either logged or virtual. To specify the image for which access is to be enabled, the user either selects an image from a list of images, or specifies a specific point-in-time. When specifying a point-in-time, a set of specific search parameters can also be defined.

This section deals with the following topics:
◆ “Image access modes”
◆ “Logged access”
◆ “Virtual access”
◆ “Direct access”
◆ “Disabling image access”

Image access modes

When image access is enabled, host applications at the copy site can access the replica. The access modes listed in the following table are available.

Table 3. Image access modes

Mode: Logged access (physical)
Description: In Logged access, the system rolls backwards (or forwards) to the snapshot (point in time) you select to access. There will be a delay while the successive snapshots are applied to the replica image to create the image you selected. The length of delay depends on how far the selected snapshot is from the snapshot currently being distributed to storage. Once the access is enabled, hosts will have direct access to the replica volumes, and the RPA will not have access; that is, distribution of snapshots from the journal to storage will be paused. When you disable image access, the writes to the volume while image access was enabled will be rolled back (undone). Then distribution to storage will continue from the accessed snapshot forward.

Mode: Virtual access (instant)
Description: In Virtual access, the system creates the image you wish to access in a separate virtual LUN (or in memory). Access is very fast, as the system does not actually roll to the image in storage, and the virtual volume and the physical volume have the same SCSI ID; therefore the switch from one to the other will be transparent to servers and applications. You can use Virtual access in the same way as logged access; however, it is not suitable if you need to run many commands or if you need data from large areas of the replica. When you disable image access, the virtual LUN and all writes to it are discarded.

Mode: Virtual access (instant) with Roll image in background
Description: In Virtual access with Roll image in background, the system creates the image you wish to access in a virtual volume, which is very fast, as in Virtual access. But, simultaneously in background, the system rolls to the physical image. Once the system has rolled to the image, the virtual volume is discarded, and the physical volume takes its place. At this point, the system continues as for Logged access. The virtual volume and the physical volume have the same SCSI ID; therefore the switch from one to the other will be transparent to servers and applications. When you disable image access, the writes to the volume while image access was enabled will be rolled back (undone). Then distribution to storage will continue from the accessed image forward.

Mode: Direct access
Description: Direct access allows host applications to write directly to the replica storage. These changes cannot be automatically undone; that is, the journal is deleted. To restore consistency between the replica and the production storage, it will be necessary to perform a full sweep synchronization. Direct access generally gives better performance than other types of image access, especially when the replica is written to many times. However, the journal is deleted, you cannot undo the writes, and if a disaster occurs and you lose your other replicas, you will not be able to remove the writes from this replica (as you can no longer synchronize to another replica).

Logged access

The journal consists of snapshots (represented by small vertical lines in the displayed image), which are collections of one or more writes to storage. Since the snapshots are always in strict write-order, writing the next snapshot to storage will create a valid image of the data at the next point in time. By the same token, rolling back to the previous snapshot (undoing the last write to storage) will create a valid image of the storage data as it was at the previous point in time.

Figure 4. Schematic of logged image access

Image access allows you to pick the exact point in time at which you wish to see the storage data. The selected snapshot marker ( ) in Figure 3 marks the snapshot you wish to access. In logged access, RecoverPoint will roll back (or forward) the snapshots, applying them to the image in storage until the selected image (point in time of the data) is reached.
Once this image is reached (it may take time if there are many snapshots to undo or to distribute), you can access it and even modify it (for instance, for testing or data analysis). Any changes made to the image during image access will be recorded in the image access log, so that the image can be restored to its exact state before image access. When you disable image access, all writes made during image access will be undone (according to the undo information in the image access log), and the image will roll forward to distribute writes in the journal to the replica.

Virtual access

Virtual access works on the same principle, except that the image is not created by rolling back snapshots and applying them to the stored image. Instead, the image is created on a virtual disk by pulling the needed data either from the snapshots or from the storage. The virtual disk is deleted when image access is disabled. Virtual access is quicker than logged access, but is not suitable for extensive processing or when large areas of the image must be accessed.

Direct access

Direct access is an image access mode that does not impose a limit on the amount of data that you can write to a replica's storage. This type of image access provides better system performance when accessing the replica, because no rollback information is written to the image access log in parallel with the ongoing disk I/Os.

During direct access, host applications write directly to the replica storage. These changes cannot be automatically undone, because when an image is directly accessed, the journal at the replica is deleted. To restore consistency between the replica and the production storage, a full-sweep synchronization is required. However, all of the writes made while direct access is enabled are stored as marking information, and the synchronization process after direct image access is, therefore, much more efficient.
Direct access generally gives better performance than other types of image access, especially when the replica is written to many times. However, the journal is deleted, you cannot undo the writes, and if a disaster occurs and you lose your other replicas, you will not be able to remove the writes from this replica (as you can no longer synchronize to another replica). If you wish to preserve a particular image of the replica, to give yourself added protection, you can back up or clone the image before beginning your offline processing.

Disabling image access

Before disabling image access, shut down host applications accessing the replica and unmount replica volumes from the replica host. Disabling image access restores the storage state to No access. Changes to the replica recorded in the image access log are automatically undone, so that the replica is restored to the state it was in before it was accessed.

Failover

Failing over a consistency group to a local or remote replica allows system operations to continue as usual from the replica. Hosts attached to the replica continue operations by running applications. If the former production site is still functional, it now serves as a replica of the (former) replica. Snapshots are now transferred from the (former) replica to the (former) production journal and from the production journal to the production storage. The journal of the replica becomes invalid (since the replica is now the source).

The same failover procedure can be used for planned maintenance at the production site while the replica site takes over normal operations. When the production storage has been restored or the planned maintenance is complete, system operations can be resumed at the original production source by failing back.

Note the special case of Recover Production, in which the production source is restored (resynchronized) from the selected replica.
Restoration starts from the snapshot (point in time) selected by the user. From that point in time forward, the production source is restored from the replica; that is, in the production source, data that is newer than the selected point in time will be rolled back and rewritten from the version in the replica. In this case, the replica’s journal is preserved and remains valid.

The specific failover use cases and procedures for carrying out different types of failovers are discussed in “Testing, Failover, and Migration” on page 223.

RecoverPoint synchronization processes

The following sections describe the data flow and logic of the RecoverPoint synchronization processes that are responsible for data consistency. The following sections deal with the topics:
◆ “Initialization”
◆ “Full sweeps”
◆ “Volume sweeps”
◆ “Long initializations”
◆ “Short initializations”
◆ “First-time initializations”
◆ “Fast first-time initializations”

Initialization

Initialization is the RecoverPoint process used to synchronize the data of the replica volumes with their corresponding production volumes, and ensure consistency. Generally, all synchronization processes in RecoverPoint are called Initialization. For specific scenarios, see “What types of initialization exist in RecoverPoint?” on page 78.

This section answers the questions:
◆ “What types of initialization exist in RecoverPoint?”
◆ “When does initialization occur?”
◆ “How does initialization work?”
◆ “How can I tell a consistency group is being initialized?”
◆ “What should I know about initialization?”

What types of initialization exist in RecoverPoint?

In specific cases, the initialization process can vary to suit the particular purpose efficiently, while promoting high performance.
In these cases initialization can be called:
◆ “Full sweeps”
◆ “Volume sweeps”
◆ “Long initializations”
◆ “Short initializations”
◆ “First-time initializations”
◆ “Fast first-time initializations”

When does initialization occur?

Generally, consistency groups are initialized whenever:
◆ A user runs the Start Transfer command after transfer is paused.
◆ A user changes the Primary RPA setting, causing a flipover.
◆ RecoverPoint starts transfer after transfer was paused (for example, after a WAN outage).
◆ RecoverPoint encounters OCWs or ICWs (over-complete writes and incomplete writes).

The initialization triggers vary per initialization type (see “What types of initialization exist in RecoverPoint?” on page 78). See each initialization type for a list of specific triggers.

How does initialization work?

The initialization mechanism works as follows:
1. The production and replica volumes are each divided into an equal number of data segments.
2. RecoverPoint checks the delta marking information and backlog information to see which segments of the consistency group replica volumes are dirty.
3. A small digital signature (hash) is created for every segment of both the replica and production volumes.
4. The production and replica hashes are compared.
5. RecoverPoint checks these signatures (to see which of the dirty segments are actually different).
6. RecoverPoint synchronizes (transfers the data of) ONLY the segments of the replica volumes that are dirty, and whose values are actually different from their corresponding production volume segments.

How can I tell a consistency group is being initialized?

When initialization occurs, RecoverPoint lets you know it is happening by:
◆ Logging the following events upon start and finish of initialization:
  • Synchronization started
  • Synchronization completed
◆ Displaying the process state and progress.
You can see this information:
  • In the RecoverPoint Command Line Interface, by running the get_group_states command, and verifying that the state of transfer is Initializing.
  • In the RecoverPoint Management Application, by clicking a consistency group name in the Navigation Pane and verifying that, in the consistency group Status Tab, the Transfer state of one or all copies is displayed as Init, followed by the progress of the initialization process, in percent.

How long does initialization take?

The time that it takes to transfer the data will vary, depending on the size of the volumes being replicated, network resources, storage performance, how consistent the replica volume data is with the production volume source at the point in time that the process is triggered, whether compression is enabled, and the compressibility of the production data.

During normal replication some consistency group segments differ, and a partial synchronization is required. During first-time initialization all of the consistency group segments differ, and a full synchronization is required.

What should I know about initialization?

During initialization:
◆ The host applications can be either running or not. Initialization of one consistency group does not interfere with the operation of other consistency groups.
◆ All of the production host writes are stored in the replica journal's image access log. These writes are applied to storage by the system automatically after initialization is complete.
◆ No snapshots are created, and therefore, no consistent PITs are available for recovery. In most cases, the initialization process is short, so this is not an issue.
◆ There must be one complete image at the remote side, in order to be able to fail over to the replica. This means that until the initialization process is complete, you will not be able to fail over in the case of a disaster at the production site.

Full sweeps
A full sweep is an initialization process (see “Initialization” on page 78), which is performed on all of the volumes in a consistency group, when the RecoverPoint system cannot identify which blocks are identical between the production and replica volumes, and must therefore mark all blocks for all volumes in the consistency group as dirty.

This section answers the questions:
◆ “When do full sweeps occur?”
◆ “How do full sweeps work?”
◆ “How long do full sweeps take?”
◆ “How can I tell that a full sweep is in progress?”
◆ “What should I know about full sweeps?”

When do full sweeps occur?

During normal operation of the RecoverPoint system, full sweep initializations should only occur the first time a consistency group is created. However, to guarantee consistency between a production source and its replica, there may be other cases in which a full sweep is required.

Full sweeps are required when:
◆ A user enables a disabled consistency group.
◆ A user accesses the replica in Direct access mode.
◆ A user removes a journal from a consistency group.
◆ A user changes the Proportion of journal allocated for image access log (20-80%) setting.
◆ There is no marking information due to:
  • A splitter malfunction (in which case the consistency group is still enabled but unavailable until the splitter functions again).
  • An I/O load that exceeds the system limit (see the EMC RecoverPoint and RecoverPoint/SE Release Notes for this limit).
  • An RPA being unable to perform marking (for example, the production journal is inaccessible).
  • A double (hardware) disaster (for example, a concurrent failure of a splitter and an RPA).
  • A production journal loss or malfunction.
  • A user running the Set Markers command.

Full sweeps also occur when:
◆ A user replaces a Brocade switch (whether there is one path or multiple paths to storage).
◆ A user swaps LUN numbers on the storage array, when the LUNs have already been exposed to RPAs (if not done according to procedure).
◆ In Brocade, a user binds and performs LUN discovery on the storage, causing a new ITL to be discovered for a specific volume.

How do full sweeps work?

Same as the initialization process, except that in a full sweep, ALL of the volume segments in the consistency group are marked as dirty (see “How does initialization work?” on page 79).

How long do full sweeps take?

See “How long does initialization take?” on page 80 for a detailed description.

How can I tell that a full sweep is in progress?

When a full sweep occurs, RecoverPoint lets you know it is happening by:
◆ Logging the event Next synchronization will be a full sweep before the following initialization events:
  • Synchronization started
  • Synchronization completed
◆ Displaying the process state and progress. You can see this information:
  • In the RecoverPoint Management Application, by clicking a consistency group name in the Navigation Pane and verifying that the state of Transfer in the consistency group Status Tab is Init, followed by the progress of the initialization process, in percent.
  • In the RecoverPoint Command Line Interface, by running the get_group_states command, and verifying that the state of transfer is Initializing.

Avoiding full sweep when swapping LUN numbers

Swapping the LUN numbers of LUNs that have already been exposed to an RPA cluster should be avoided when possible. When it cannot be avoided, it should be done according to the following procedure, to avoid a full sweep of the entire consistency group. Loss of the journal cannot be avoided.
1. Remove both LUNs from their respective replication sets. Do not disable the consistency group.
2. Swap the LUNs (on the storage array).
3. Refresh the SAN view. Use one of the following procedures:
   a. If using SANTap services. Refresh the SAN view of the SANTap switch.
Use the RecoverPoint CLI command refresh_santap_view.
   b. If using Brocade splitter agent, wait 5 minutes, to allow the switch time to update its SAN view.
   c. If using host-based splitter:
      AIX hosts. Use AIX shell command cfgmgr to rescan the SAN. Then from AIX command line, run rc.kdrv refresh_view.
      Solaris hosts. Use Solaris shell command drvconfig to rescan the SAN. Then from Solaris command line, run rc.kdrv refresh_view.
      Windows hosts. Click Start > Settings > Control Panel > System. On Hardware tab, double-click Device Manager. Right-click on Disk drives > Scan for hardware changes.
4. Add LUNs to their replication sets. They will automatically be attached to their splitters. A volume sweep of those LUNs occurs.

What should I know about full sweeps?

See “What should I know about initialization?” on page 80 for a detailed description.

Volume sweeps

A volume sweep is an initialization process (see “Initialization” on page 78), which is performed on a specific volume in a consistency group, whose algorithm is the same as that of a full sweep, except that in a volume sweep, only the segments of the specific volume/s are marked as dirty.

Note: A volume sweep on all volumes in a consistency group is called a full sweep.

This section answers the questions:
◆ “What are volume sweeps used for?”
◆ “When do volume sweeps occur?”
◆ “How do volume sweeps work?”
◆ “How long do volume sweeps take?”
◆ “How do I know that a volume sweep is in progress?”
◆ “What should I know about volume sweeps?”

What are volume sweeps used for?

Volume sweeps are used when the RecoverPoint system cannot identify which blocks are identical between the production and replica volume, and must therefore mark all blocks in the volume as dirty.

When do volume sweeps occur?

Volume sweeps occur whenever a user:
◆ Defines a new volume in RecoverPoint.
◆ Defines a new splitter in RecoverPoint.
◆ Enables a volume that has been disabled.
◆ Enables a splitter that has been disabled.
◆ Adds a new Replication Set to an enabled consistency group.
◆ In SANTap or Brocade, attaches a new host.
◆ Manually attaches a volume to a splitter.
◆ Adds a new replication set to an enabled consistency group that does not contain any additional replication sets.

How do volume sweeps work?

Same as the initialization process, except that in a volume sweep, ALL of the segments of the specific volume/s are marked as dirty (see “How does initialization work?” on page 79).

How long do volume sweeps take?

See “How long does initialization take?” on page 80 for a detailed description.

How do I know that a volume sweep is in progress?

When a volume sweep occurs, RecoverPoint lets you know it is happening by:
◆ Logging the event Next synchronization will be a volume sweep before the following initialization events:
  • Synchronization started
  • Synchronization completed
◆ Displaying the process state and progress. You can see this information:
  • In the RecoverPoint Management Application, by clicking a consistency group name in the Navigation Pane and verifying that the state of Transfer in the consistency group Status Tab is Init, followed by the progress of the initialization process, in percent.
  • In the RecoverPoint Command Line Interface, by running the get_group_states command, and verifying that the state of transfer is Initializing.

What should I know about volume sweeps?

See “What should I know about full sweeps?” on page 83 for a detailed description. Also, when a volume sweep is triggered, all of the volumes of the consistency group undergo a short initialization (see “Short initializations” on page 85) in parallel with the volume sweep.

Long initializations
See “One-phase distribution” on page 102 for answers to the questions:
◆ “When is one-phase distribution triggered?”
◆ “How do I know one-phase distribution is happening?”
◆ “What are the work-arounds to one-phase distribution?”

Also known as:

Long initialization is also known as One-phase distribution, Long init, Long re-sync, Non-consistent init, Init nc, and Init non-consistent.

Short initializations

A short initialization is an initialization process (see “Initialization” on page 78) that uses marking information to re-synchronize a copy's replica volumes with their production sources. Because this initialization process uses delta markers to synchronize the replica with production, the initialization process is much faster and more efficient.

This section answers the questions:
◆ “When do short initializations occur?”
◆ “How do short initializations work?”
◆ “How long do short initializations take?”
◆ “How do I know that a short initialization is in progress?”
◆ “What should I know about short initializations?”

Also known as:

Short initialization is also known as Short init, Short resync, Short resynchronization, and Resynchronization.

When do short initializations occur?

Short initializations generally occur when restarting transfer for a consistency group after a pause in transfer.

How do short initializations work?

See “How does initialization work?” on page 79 for a detailed description.

How long do short initializations take?

See “How long does initialization take?” on page 80 for a detailed description.

How do I know that a short initialization is in progress?

See “How can I tell a consistency group is being initialized?” on page 79 for a detailed description.

What should I know about short initializations?

See “What should I know about initialization?” on page 80 for a detailed description.
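The segment comparison described in “How does initialization work?”, and the way short initializations and sweeps differ only in which segments are marked dirty, can be illustrated with a minimal Python sketch. This is not RecoverPoint code: the segment size, the use of SHA-256 as the signature, and every function name below are assumptions made for illustration only.

```python
import hashlib

SEGMENT_SIZE = 4  # bytes per segment; unrealistically small, for illustration only


def split_segments(volume: bytes, size: int = SEGMENT_SIZE) -> list[bytes]:
    """Divide a volume into fixed-size data segments."""
    return [volume[i:i + size] for i in range(0, len(volume), size)]


def segment_hashes(volume: bytes) -> list[str]:
    """Create a small signature (hash) for every segment (SHA-256 is an assumption)."""
    return [hashlib.sha256(seg).hexdigest() for seg in split_segments(volume)]


def segments_to_transfer(production: bytes, replica: bytes, dirty: set[int]) -> list[int]:
    """Return the dirty segments whose production and replica hashes differ."""
    prod_hashes = segment_hashes(production)
    repl_hashes = segment_hashes(replica)
    if len(prod_hashes) != len(repl_hashes):
        raise ValueError("volumes must divide into an equal number of segments")
    return [i for i in sorted(dirty) if prod_hashes[i] != repl_hashes[i]]


def sweep_markers(volume: bytes) -> set[int]:
    """A sweep marks every segment of the volume as dirty."""
    return set(range(len(split_segments(volume))))


production = b"AAAABBBBCCCCDDDD"
replica = b"AAAAXXXXCCCCYYYY"

# Short initialization: only segments flagged by marking information are considered.
short_init = segments_to_transfer(production, replica, dirty={1, 2})

# Sweep: no usable marking information, so every segment is considered.
full_sweep = segments_to_transfer(production, replica, sweep_markers(production))
```

In this example, segment 2 is marked dirty but is identical on both sides, so it is not transferred. The sweep finds both differing segments, at the cost of considering every segment.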
First-time initializations

A first-time initialization is a full sweep (see “Full sweeps” on page 80) that happens when a consistency group is enabled. First-time initializations generally happen when the system is new, and all consistency group volumes need to undergo a full sweep before they can be used for data recovery purposes.

Note: During first-time initialization, the journal is unnecessary, since the replica does not contain previous data that can be used to construct a complete image, and a complete image must be transferred before failover is possible.

The following considerations apply to first-time initializations only:
◆ You can use the initialization from backup procedure to initially synchronize the replica with production, saving the time it would otherwise take to synchronize the data over a WAN or FC connection (see “First-time initialization from backup” on page 225).
◆ By default, RecoverPoint writes the first snapshot directly to the replica storage, bypassing the journal. You can override the default setting (using the Perform fast first-time initialization setting, see “Fast first-time initializations” on page 87) to write the initialization snapshot first to the journal (which is more time-consuming but provides greater data protection).
◆ To enable failover during initialization, it is recommended to disable the Allow distribution of snapshots that are larger than capacity of journal volumes setting.

See “Initialization” on page 78 for a more detailed description of initialization.

Fast first-time initializations

A fast first-time initialization is a first-time initialization (a full sweep that happens when a consistency group is enabled, see “Full sweeps” on page 80 and “First-time initializations” on page 86) in which the initialization snapshot is written directly to the replica storage, bypassing the journal. It occurs when the Perform fast first-time initialization setting is enabled.
Note: During first-time initializations, the journal is unnecessary, since the replica does not contain previous data that can be used to construct a complete image, and a complete image must be transferred before failover is possible.

See “Initialization” on page 78 for a more detailed description of initialization.

This section answers the questions:
◆ “When do fast first-time initializations occur?”
◆ “How do fast first-time initializations work?”
◆ “How long do fast first-time initializations take?”
◆ “How do I know a fast first-time initialization is occurring?”
◆ “What do I need to know about fast first-time initializations?”

When do fast first-time initializations occur?

Whenever a consistency group is enabled for the first time, while the Perform fast first-time initialization setting is enabled. By default, the Perform fast first-time initialization setting is enabled.

How do fast first-time initializations work?

The fast first-time initialization process is the same as the Initialization process (see “How does initialization work?” on page 79). However, in fast first-time initialization processes, RecoverPoint writes the initialization snapshot directly to the storage, bypassing the journal.

How long do fast first-time initializations take?

This process is substantially faster than the Initialization process performed through the journal (when the Perform fast first-time initialization setting is disabled), see “How long does initialization take?” on page 80.

How do I know a fast first-time initialization is occurring?

When a fast first-time initialization occurs, RecoverPoint lets you know it is happening by changing the state of Transfer to Init and the Image state to Long resync. To display these indicators:
◆ In the RecoverPoint Command Line Interface, run the get_group_states command, to view the image and transfer states.
◆ In the RecoverPoint Management Application, click a consistency group name in the Navigation Pane to display the consistency group Status Tab and view the Image and Transfer states.

What do I need to know about fast first-time initializations?

During fast first-time initializations, the distribution process is much faster, but no history is saved in the journal. Also, from the start of the process until the end of the process, the replica is not consistent with its production source. Therefore, if a disaster were to occur during this process, you would not be able to fail over to the replica and a full sweep would be required.

RecoverPoint data flow

The following sections describe the data flow and logic of the RecoverPoint phases that are responsible for replication.

RecoverPoint replication phases

The replication phases are the three major phases performed by RecoverPoint to guarantee data consistency and availability (RTO and RPO) during replication. There are three RecoverPoint replication phases:
◆ “The write phase” (the splitting phase)
◆ “The transfer phase”
◆ “The distribution phase”

Each of these phases is a process performed by each consistency group’s Primary RPA, and controlled by the policies and settings set by the user through the Hardware Management Wizard, the RecoverPoint Management Application, and the RecoverPoint Command Line Interface.

The write phase

The write phase is the RecoverPoint replication phase in which host writes are intercepted by the splitter and received by the local RPA, prior to transfer (see “The transfer phase” on page 90).

Generally, the flow of data for write transactions is as follows:
1. The production host writes data to the production volumes, and the write is intercepted by the splitter. The splitter sends the write data to the RPA.
2. Immediately upon receipt of the write data, the local RPA returns an ACK to the splitter.
3. The splitter then writes the data to the production storage volume.
4.
The storage system returns an ACK to the splitter upon successfully writing the data to storage.
5. The splitter sends an ACK to the host that the write has been completed successfully.

The sequence of events 1-5 can be repeated multiple times, and in parallel, for multiple writes.

Note: This is the flow of data for host-based splitters. For intelligent fabric and array splitters, the flow of data varies, per splitter.

The transfer phase

The transfer phase is the RecoverPoint replication phase in which host writes are sent from a source RPA to a target RPA, after “The write phase” and before “The distribution phase”.

The transfer phase differs for:
◆ “Non-distributed (regular) groups”
◆ “Asynchronous distributed groups”
◆ “Synchronous distributed groups”

For distributed and non-distributed (regular) consistency groups, the transfer phase is over when an ACK is received by the source RPA.

Note: The following examples are of CRR configurations. In CDP configurations (see “RecoverPoint configurations” on page 22), all of the following steps are performed within the local RPA using inter-process communications (IPC).

See “Consistency groups” on page 30 for more information on distributed and non-distributed consistency groups.

Non-distributed (regular) groups

For non-distributed consistency groups, the flow of data during the transfer phase is as follows:
1. After processing the data (for example, applying the various compression techniques), the source RPA sends the data to the target RPA.
2. The target RPA writes the data to the journal.
3. Upon the successful writing of the data to the RPA or the journal (depending on the value of the Measure lag when writes reach the target RPA (as opposed to the journal) setting), the target RPA returns an ACK to the source RPA, see “Measure lag when writes reach the target RPA (as opposed to the journal)” on page 150.
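The ordering of writes and ACKs in the write phase (host-based splitter) and in the transfer phase of a non-distributed group can be traced with a small Python sketch. It is a toy model under stated assumptions: the function names, the event strings, and the boolean ack_at_target_rpa (standing in for the Measure lag when writes reach the target RPA setting) are hypothetical and do not correspond to any RecoverPoint interface.

```python
def write_phase(data: bytes, rpa_buffer: list, storage: list, events: list) -> None:
    """Trace steps 1-5 of the write phase for a host-based splitter."""
    events.append("1 splitter intercepts host write and sends it to the RPA")
    rpa_buffer.append(data)
    events.append("2 RPA returns ACK to splitter")
    storage.append(data)
    events.append("3 splitter writes data to production storage")
    events.append("4 storage returns ACK to splitter")
    events.append("5 splitter returns ACK to host")


def transfer_phase(rpa_buffer: list, journal: list,
                   ack_at_target_rpa: bool, events: list) -> None:
    """Trace the transfer phase for a non-distributed group.

    ack_at_target_rpa is a hypothetical stand-in for the setting that decides
    whether the ACK is returned when the write reaches the target RPA or only
    after it has been written to the journal.
    """
    while rpa_buffer:
        data = rpa_buffer.pop(0)
        events.append("source RPA sends data to target RPA")
        if ack_at_target_rpa:
            events.append("target RPA returns ACK to source RPA")
        journal.append(data)
        events.append("target RPA writes data to journal")
        if not ack_at_target_rpa:
            events.append("target RPA returns ACK to source RPA")


events: list = []
rpa_buffer: list = []
production_storage: list = []
replica_journal: list = []

write_phase(b"block-1", rpa_buffer, production_storage, events)
transfer_phase(rpa_buffer, replica_journal, ack_at_target_rpa=False, events=events)
```

With ack_at_target_rpa=False the last event in the trace is the ACK, which follows the journal write. With True, the ACK would precede the journal write.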
Asynchronous distributed groups

For distributed consistency groups, in asynchronous replication mode, the flow of data during the transfer phase is as follows:
1. The source Primary RPA divides the consistency group into four segments.
2. The source Primary RPA routes the relevant consistency group segments to the appropriate secondary RPAs at the source side.
3. After processing the data (for example, applying the various compression techniques), the primary and secondary RPAs at the source side send their consistency group segments to their corresponding target RPAs.
4. Each target RPA writes the data of its consistency group segment to a different stream in the replica journal.
   Note: Upon the successful writing of the data to the target RPA or the replica journal (depending on the value of the Measure lag when writes reach the target RPA setting), each target RPA returns an ACK to its corresponding source RPA, see “Measure lag when writes reach the target RPA (as opposed to the journal)” on page 150.
5. All secondary RPAs at the source side send their ACKs back to the Primary RPA.

Synchronous distributed groups

For distributed consistency groups, in synchronous replication mode, the flow of data during the transfer phase is as follows:
1. The source Primary RPA sends the consistency group data to the Primary RPA at the target side.
2. The Primary RPA at the target side divides the consistency group data into four segments, and routes the relevant consistency group segments to the appropriate secondary RPAs.
3. Each target RPA writes the data of its consistency group segment to a different stream in the target journal.
   Note: Upon the successful writing of the data to the target RPA or the replica journal (depending on the value of the Measure lag when writes reach the target RPA setting), each secondary RPA at the target side returns an ACK to the Primary RPA at the target side, see “Measure lag when writes reach the target RPA (as opposed to the journal)” on page 150.
4. The Primary RPA at the target side returns an ACK to the Primary RPA at the source side.

Note: During initialization, the data flow of synchronous distributed groups is identical to that of asynchronous distributed groups, see “Asynchronous distributed groups” on page 91.

The distribution phase

The distribution phase is the RecoverPoint replication phase responsible for the writing of the production image to the target replica storage, which is performed by the target RPA, after “The transfer phase”. Since the replica storage is being written to during distribution, the state of the replica Storage is No Access.

By default, the system distributes in five-phase distribution mode (see “Five-phase distribution” on page 94). In rare cases the system switches to three-phase distribution mode (also called fast-forwarding, see “Three-phase distribution” on page 97), and in some initialization scenarios, the system switches to one-phase distribution mode (see “One-phase distribution” on page 102).

The replica journal history consists of snapshots that have already been distributed to the replica storage and snapshots that are still waiting for distribution in the queue of snapshots waiting for distribution. When data is received by the RPA faster than it can be distributed to the replica storage volumes, it accumulates in the queue of snapshots waiting for distribution of the replica journal.
The Maximum Journal Lag setting dictates the maximum amount of snapshot data (in MB or GB) that is permissible to retain in the replica journal before distribution to the replica storage. In other words, it is the amount of data that would have to be distributed to the replica storage before failover to the latest image could take place, or (in terms of RecoverPoint’s role in the RTO) the maximum time that would be required in order to bring the replica up-to-date with production.

Note: For distributed consistency groups, regardless of the distribution mode, each target RPA is responsible for the distribution of its own consistency group segments to the replica storage.

See “Consistency groups” on page 30 for more information on distributed and non-distributed consistency groups.

The following sections deal with the topics:
◆ “Five-phase distribution”
◆ “Three-phase distribution”
◆ “One-phase distribution”
◆ “How do I monitor distribution performance in RecoverPoint?”

Five-phase distribution

Five-phase distribution is the default distribution mode in RecoverPoint.

This section answers the questions:
◆ “How does the default (five-phase) distribution process work?”
◆ “How do I know five-phase distribution is happening?”
◆ “How do I monitor the performance of five-phase distribution?”

How does the default (five-phase) distribution process work?

The five-phase distribution process works in the following way:
1. The target RPA writes the newest data (the most current writes made by the host applications) to the beginning of the queue of snapshots waiting for distribution in the replica journal.
2. The target RPA reads the oldest data at the end of the queue of snapshots waiting for distribution to the replica storage.
3. The target RPA reads the current data of the replica volume.
4.
The target RPA writes the current data of the replica volume to the top of the distributed snapshot list, so that the replica volume can be rolled back.
5. The target RPA writes the newest data to the replica storage.

How do I know five-phase distribution is happening?

The following indicators are displayed in the RecoverPoint Management Application (GUI) during five-phase distribution:
◆ In the consistency group Status Tab, the Image value is either Distributing pre-replication image or Distributed followed by the date and timestamp of the snapshot that is currently being distributed.
◆ In the copy Journal Tab, the Current parameter value is either Distributing pre-replication image or Distributed followed by the date and timestamp of the snapshot that is currently being distributed.

The following indicators are displayed in the RecoverPoint Command Line Interface (CLI) during five-phase distribution:
◆ When the command get_group_state is run, the value of the Journal parameter is DISTRIBUTING IMAGES TO STORAGE.

How do I monitor the performance of five-phase distribution?

To monitor the performance (and duration) of the distribution process in five-phase distribution mode, see “How do I monitor distribution performance in RecoverPoint?” on page 104.

Three-phase distribution

Three-phase distribution is a distribution process that skips the steps necessary for the storing of data in the distributed snapshot list of the replica journal, in order to overcome journal or storage performance issues, or enforce RTO policies set by a user.

The system only switches to three-phase distribution when the write-rate of the production host is greater than the distribution-rate of five-phase distribution.
This, over time, can cause the queue of snapshots waiting for distribution to reach the maximum journal capacity (or the value of the Maximum Journal Lag setting), and leave no space for the distributed snapshot list to be saved.

In three-phase distribution, there are only three I/O operations as opposed to the five of five-phase distribution, which decreases the minimum throughput required to distribute write information to the replica storage, and significantly speeds up the distribution process. Typically, the system will do its best to stay in five-phase distribution mode, and will go back to five-phase distribution as soon as possible after entering three-phase distribution mode.

Note: Since the three-phase distribution process does not write the data to the distributed snapshot list of the replica journal, no PITs can be recovered prior to the time that the process is triggered. However, this is not cause for data-loss concern, because during this process all write-data is saved in the queue of snapshots waiting for distribution in the replica journal.

This section answers the questions:
◆ “When is three-phase distribution triggered?”
◆ “When does three-phase distribution end?”
◆ “What should I be aware of in three-phase distribution mode?”
◆ “How does three-phase distribution work?”
◆ “How do I know three-phase distribution is happening?”

Also known as:

Three-phase distribution is also known as: Fast-forwarding

When is three-phase distribution triggered?

The system switches to three-phase distribution mode (also called fast-forwarding):
◆ When there are performance issues with either the replica or journal storage.
◆ When the journal lag exceeds the Maximum Journal Lag setting (that defines the RTO policy) in the Journal Copy Policy, see “Copy Journal Policy Settings” on page 155.

When does three-phase distribution end?
If the system entered three-phase distribution mode because of performance issues with either the replica or journal storage, the system resumes five-phase distribution after a short period of time. If the system entered three-phase distribution mode because the journal lag exceeded the Maximum Journal Lag setting, the system resumes five-phase distribution as soon as the actual journal lag falls below the value of the Maximum Journal Lag setting.

What should I be aware of in three-phase distribution mode?

During three-phase distribution, no undo information is saved in the replica journal. Also, when a three-phase distribution process is triggered, some of the journal history is lost. The exact amount of journal history that is lost, and why, can be generally explained as follows.

The replica journal history consists of snapshots that have already been distributed to the replica storage and snapshots that are still in the queue of snapshots waiting for distribution. When data is received by the RPA faster than it can be distributed to the replica storage volumes, it accumulates in the queue of snapshots waiting for distribution of the replica journal.

The Maximum Journal Lag is the maximum amount of snapshot data (in MB or GB) that is permissible to retain in the replica journal before distribution to the replica storage. In other words, the amount of data that would have to be distributed to the replica storage before failover to the latest image could take place, or (in terms of RTO) the maximum time that would be required in order to bring the replica up-to-date with production.

When the Maximum Journal Lag value is exceeded, in order to accelerate the distribution process, the system starts clearing out all snapshots irrelevant to the data that would have to be distributed to the replica storage before failover to the latest image could take place, to ensure that the RTO policy is met.
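The switching rule described above can be sketched as a small Python model. This is an illustration only, not RecoverPoint code: the names `Mode` and `select_distribution_mode` and all parameters are hypothetical, and the storage-performance trigger is reduced to a simple boolean flag.

```python
from enum import Enum


class Mode(Enum):
    FIVE_PHASE = "five-phase"
    THREE_PHASE = "three-phase (fast forward)"


def select_distribution_mode(journal_lag_mb: float,
                             max_journal_lag_mb: float,
                             storage_is_slow: bool = False) -> Mode:
    """Pick a distribution mode from the current journal lag.

    Hypothetical model: three-phase is used while the lag exceeds the
    Maximum Journal Lag setting or while storage performance is a problem;
    otherwise the default five-phase mode is used.
    """
    if storage_is_slow or journal_lag_mb > max_journal_lag_mb:
        return Mode.THREE_PHASE
    return Mode.FIVE_PHASE


if __name__ == "__main__":
    max_lag_mb = 1024  # a Maximum Journal Lag of 1 GB, expressed in MB
    for lag in (200, 900, 1500, 1100, 800):
        mode = select_distribution_mode(lag, max_lag_mb)
        print(f"lag={lag} MB -> {mode.value}")
```

In this sketch the mode returns to five-phase as soon as the lag is no longer above the configured maximum, which mirrors the resume condition stated in “When does three-phase distribution end?”.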
For example:

Let’s say production has made writes up to the present point-in-time (Now). All of the snapshots in the replica journal between Now and point-in-time T1 are waiting for distribution, and all snapshots older than T1 (back to point-in-time T0) have already been distributed to the replica storage. The total journal lag (the maximum time that would be required in order to bring the replica up-to-date with production) is therefore the interval between point-in-time T1 and Now (or Now > T2 > T1), where point-in-time T1 represents the current state of the data at the replica storage.

Now let’s say that a Maximum Journal Lag policy of 1GB has been set by the user (see “Copy Journal Policy Settings” on page 155) and the journal lag exceeds the specified value. In this case:
1. RecoverPoint discards all of the snapshots that have already been distributed (T1>T0) to the replica storage, since this data has already been applied to the replica.
2. To ensure the RTO policy is met, RecoverPoint starts distributing in three-phase distribution mode until the journal lag falls below the maximum journal lag policy (T2).
3. When the journal lag falls below the maximum journal lag policy (T2), RecoverPoint resumes five-phase distribution.

Note: All of the snapshots between Now and T2 (the maximum journal lag), and all following snapshots, are retained in the replica journal and are available for data recovery.

How does three-phase distribution work?

The three-phase distribution process works in the following way:
1. The target RPA writes the newest data (the most current writes being made by the host applications) to the beginning of the queue of snapshots waiting for distribution in the replica journal.
2. The target RPA reads the oldest data at the end of the queue of snapshots waiting for distribution to the replica storage.
3. The target RPA writes the data to the replica volume.
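The difference between the five-phase and three-phase step lists can be made concrete with a toy Python model that counts I/O operations per distributed write. Everything here is a hypothetical sketch (the class `ReplicaCopy`, its attributes, and the block names are invented for illustration), and it models only the order of reads and writes, not the actual RPA implementation.

```python
from collections import deque


class ReplicaCopy:
    """Toy model of a replica volume and its journal (hypothetical names)."""

    def __init__(self, volume=None):
        self.volume = dict(volume or {})  # block address -> data on replica storage
        self.pending = deque()            # queue of snapshots waiting for distribution
        self.undo = []                    # distributed snapshot list (undo information)
        self.io_count = 0                 # I/O operations issued by the target RPA

    def receive(self, block, data):
        """Both modes: write the newest data into the journal queue."""
        self.pending.append((block, data))
        self.io_count += 1

    def distribute_one(self, five_phase=True):
        """Distribute the oldest queued write to the replica volume."""
        block, data = self.pending.popleft()  # read the oldest data from the queue
        self.io_count += 1
        if five_phase:
            old = self.volume.get(block)      # read current data of the replica volume
            self.undo.append((block, old))    # write it to the undo list for rollback
            self.io_count += 2
        self.volume[block] = data             # write the data to the replica volume
        self.io_count += 1


if __name__ == "__main__":
    for five_phase in (True, False):
        copy = ReplicaCopy({"blk0": "old"})
        copy.receive("blk0", "new")
        copy.distribute_one(five_phase=five_phase)
        print(f"five_phase={five_phase}: I/Os={copy.io_count}, undo={copy.undo}")
```

In the three-phase case the undo list stays empty, which is why no point-in-time images can be rolled back to while fast-forwarding is in effect.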
How do I know three-phase distribution is happening?

The following indicators are displayed in the RecoverPoint Management Application (GUI) during three-phase distribution:
◆ In the consistency group Status Tab, the text (fast forward) is displayed next to the current image time stamp.
◆ In the copy Journal Tab, (fast forward) is displayed next to the current image time stamp.

The following indicators are displayed in the RecoverPoint Command Line Interface (CLI) during three-phase distribution:
◆ When the command get_group_statistics is run, the value of the Mode parameter in the copy Journal section is Fast Forward.

How do I monitor the performance of three-phase distribution?

To monitor the performance (and duration) of the distribution process in three-phase distribution mode, see “How do I monitor distribution performance in RecoverPoint?” on page 104.

One-phase distribution

One-phase distribution is a distribution process in which the target RPA writes the initialization data directly to the replica volume (bypassing the journal). This process is used to save initialization time when saving a journal history is not critical (for example, in first-time initialization, when the first snapshot being transferred contains the whole image). When the initialization snapshot is too large for the capacity of the journal dedicated on storage, and the saving of a journal history is not critical, enabling this distribution mode saves the cost of adding journal volumes for the sole purpose of storing the initialization snapshot.

Note: During one-phase distribution, the distribution process is much faster, but no history is saved in the journal. Also, from the start of the process, and until the end of the process, the replica is not consistent with its production source.
Therefore, if a disaster were to occur during this process, you would not be able to fail over to the replica until a full sweep was performed.

This section answers the questions:
◆ “When is one-phase distribution triggered?”
◆ “How do I know one-phase distribution is happening?”
◆ “What are the work-arounds to one-phase distribution?”

Also known as:

One-phase distribution is also known as: Long initialization, Long init, Long re-sync, Non-consistent init, Init nc, Init non-consistent

When is one-phase distribution triggered?

The system switches to one-phase distribution mode only during initialization, and only in the following cases:
◆ During first-time initialization, when a group is first enabled (because the Perform fast first-time initialization setting is enabled by default).
◆ During a long resync (also known as non-consistent initialization), when the Allow distribution of snapshots that are larger than the capacity of journal volumes setting is enabled (this option is enabled by default, but can be changed by the user).

How do I know one-phase distribution is happening?

During one-phase distribution, for the replica being initialized:
◆ A warning event with the text initializing in long resync mode is logged.
◆ The image state is displayed as Long resync. You can display the image state:
  • By running the get_group_states command in the RecoverPoint Command Line Interface (CLI).
  • In the consistency group Status Tab of the RecoverPoint Management Application (GUI).

What are the work-arounds to one-phase distribution?

To avoid long initializations, perform one of the following procedures in the RecoverPoint Administrator's Guide:
◆ Use the initialization from backup procedure.
◆ Enable the Allow distribution of snapshots that are larger than the capacity of the journal setting.
◆ Add volumes to the replica journal. In this case, the additional journal volumes (space) must be permanent, or a full sweep will occur when the volumes are removed (even though the user really only needs this extra space for the duration of the init).

How do I monitor distribution performance in RecoverPoint?

To monitor distribution process performance:
1. Run the detect_bottlenecks command in the RecoverPoint Command Line Interface (CLI).
2. Select the 4) General detection including initialization and high load periods with peak writing analysis option.
3. Answer yes to both Do you want an advanced overview? and Do you want a detailed overview?
4. Specify the other required information.
5. Use your spacebar to scroll down until you reach a section that starts with the text: System overview of the copy:<copyname>
6. See “Five-phase distribution”. Note the process steps labeled as Distributor phase 1 and Distributor phase 2, and note their performance statistics.
7. See “Three-phase distribution”. Note the process labeled Fast forward distribution duration, and note the statistics.

RecoverPoint workflows

The following sections deal with the topics:
◆ “Configuring replication”
◆ “Monitoring and managing RecoverPoint”
◆ “Moving operations to another site”
◆ “Event notification”

Configuring replication

After completing the installation process (see EMC RecoverPoint Deployment Manager Product Guide), configure the RecoverPoint system to start replication. The entire process is described step-by-step in “Starting Replication” on page 127.

Monitoring and managing RecoverPoint

Once replication has been configured and started, the user normally does not need more than minimal involvement with the system. Monitoring replication is simple, and is described in “Managing and Monitoring” on page 177. RecoverPoint management activities provide access to the settings affecting RecoverPoint operation.
The settings, how they affect operation, and how to access them are described in “Managing and Monitoring” on page 177.

Moving operations to another site

In case of disaster, or simply to perform routine maintenance on your production site, you may wish to fail over operations to another site. Failover use cases and the procedures for carrying out different types of failovers are discussed in “Testing, Failover, and Migration” on page 223.

Event notification

RecoverPoint supports the following types of event notification:
◆ “E-mail notification”
◆ “SNMP notification”
◆ “Syslog notification”
◆ “System reports”
◆ “System alerts”
◆ “Collecting system information”

Getting Started

This section describes how to obtain and enter RecoverPoint license and activation codes and how to administer user settings. The topics in this section are:
◆ Licensing overview ..... 108
◆ The Getting Started Wizard ..... 110
◆ Managing RecoverPoint licences ..... 113
◆ Access control ..... 119

Licensing overview

The temporary license key that ships with RecoverPoint appliances is valid for seven days from initial installation. After that, you must have an activated license to replicate with RecoverPoint appliances.
◆ To enable RecoverPoint for seven days, see “Defining your license key in RecoverPoint” on page 113.
◆ To enable RecoverPoint permanently, see “Requesting an activation code” on page 114 and “Defining your activation code in RecoverPoint” on page 115.
◆ To upgrade RecoverPoint licenses, see “Upgrading your license” on page 116.
◆ To reactivate RecoverPoint licenses, see “Re-activating your license” on page 117.
◆ To display your RecoverPoint license information, see “Viewing your license information” on page 118.

The license controls the following system capabilities:

Table 4 RecoverPoint license parameters

Parameter               Description                                                 Range of Values
Expiration              When license expires                                        Number of days until expiration; Permanent license
Storage Type Supported  Types of storage arrays supported                           Unlimited; Homogeneous
Storage Arrays          Number of storage arrays supported at a single site         none
Cluster Size            Maximum number of RPAs installed at each site               2–8
Remote Replication      Maximum capacity of remote replica                          none
Local Replication       Maximum capacity of local replica                           none
Compression             Is compression of data transferred over the WAN supported?  Supported/not supported
Journal compression     Whether journal compression is enabled                      Supported/not supported

The Getting Started Wizard

The first time the Management Application GUI is run after a RecoverPoint installation, the Getting Started Wizard is displayed to guide users through the steps necessary to configure the basic RecoverPoint settings needed to deploy their applications properly. This wizard is only displayed once, and is no longer accessible after these initial settings have been defined. Although the wizard is no longer accessible, all of the settings defined through the wizard are. These settings and how they can be accessed are discussed in detail in each relevant section.

There are three screens in the Getting Started Wizard:
1. Welcome screen - provides users with a brief description of the wizard and its functionality, see “Welcome screen” on page 110.
2. Account Settings screen - allows users to enter and display their licensing information and activate the product, see “Account Settings screen” on page 110.
3.
System Report Settings screen - allows users to configure the RecoverPoint system report mechanism, see “System Report Settings screen” on page 111.

Welcome screen

The first time you access the RecoverPoint Management Application, the Getting Started Wizard is displayed. The Welcome Screen is the first screen of the Getting Started Wizard. It provides users with a brief description of the wizard and its functionality. Click the Next button in the welcome screen to configure your RecoverPoint license, see “Account Settings screen” on page 110.

Account Settings screen

The Account Settings screen is displayed via the:
◆ Getting Started Wizard, the first time you open the Management Application GUI after RecoverPoint is installed.
◆ System > System Settings > Account Settings Tab of the Management Application GUI, each subsequent time the Management Application GUI is displayed.

In the Account Settings screen:
◆ To enable RecoverPoint for seven days, see “Defining your license key in RecoverPoint” on page 113.
◆ To enable RecoverPoint permanently, see “Requesting an activation code” on page 114 and “Defining your activation code in RecoverPoint” on page 115.
◆ To upgrade RecoverPoint licenses, see “Upgrading your license” on page 116.
◆ To reactivate RecoverPoint licenses, see “Re-activating your license” on page 117.
◆ To display your RecoverPoint license information, see “Viewing your license information” on page 118.

Click the Next button in the Account Settings screen to configure the system alerts and reports mechanisms, see “System Report Settings screen” on page 111.

System Report Settings screen

The System Report Settings are displayed via the:
◆ Getting Started Wizard, the first time you open the Management Application GUI after RecoverPoint is installed.
◆ System > System Settings > System Report Settings Tab of the Management Application GUI, each subsequent time the Management Application GUI is displayed.

In the System Report Settings screen:
◆ Specify the Transfer Method through which you want to send the system report. Server addresses may be entered either in IP or DNS format. If entered in IP format, both IPv4 and IPv6 addresses are valid.
◆ To enable the automatic sending of weekly system reports, check the System Reports checkbox.
◆ To include system alerts in the system report, check the System Alerts checkbox.
◆ To encrypt the output with RSA encryption using a 256-bit key before sending, check the Encrypt checkbox.
◆ To compress the output before sending, check the Compress checkbox.

Note: See “System reports” on page 260 and “System alerts” on page 266 for more information about RecoverPoint reports and alerts.

Click the Finish button to close the Getting Started Wizard and apply your changes.

Managing RecoverPoint licences

The following sections guide you through the processes of:
◆ “Defining your license key in RecoverPoint”
◆ “Requesting an activation code”
◆ “Defining your activation code in RecoverPoint”
◆ “Upgrading your license”
◆ “Re-activating your license”
◆ “Viewing your license information”

Prerequisites

Before performing the following tasks:
◆ Make sure you have the email sent to you by your EMC account executive containing your RecoverPoint license information. You will need to have your Account ID, Software Serial IDs, Company Name, Contact Info (e-mail address) and RecoverPoint License Key readily available.
◆ Make sure you are logged into the RecoverPoint Management Application as admin.

Defining your license key in RecoverPoint

Before you begin

To start using RecoverPoint, you must define a valid license key in the RecoverPoint system. Within seven calendar days of defining a valid license key in the RecoverPoint system, you must also define a valid activation code. If a valid activation code is not defined within seven calendar days, dialog boxes will be blank and RecoverPoint will not work. See “Requesting an activation code” on page 114.

Before you begin:
◆ Read “Licensing overview” on page 108.
◆ Perform the “Prerequisites” on page 113.

How to define your license key in RecoverPoint

To define your license key in the RecoverPoint system:
1. If you are installing the RecoverPoint license from the Getting Started Wizard (see “The Getting Started Wizard” on page 110), skip this step. From the main menu of the RecoverPoint Management Application GUI, select System > System Settings and click the Account Settings link in the navigation area. The Account Settings screen is displayed.
2. Copy the Account ID, Software Serial IDs, Company Name, and Contact Info (e-mail address) from the email containing your license information to the appropriate fields.
3. Click the Update button. The Updating license key dialog box is displayed.
   a. Copy the license key from the email containing your license information to the License Key field.
   b. Click the OK button to exit the dialog box.
4. Click the Apply button. Your license key should now be displayed in the License Keys section of the dialog box.

RecoverPoint is now enabled for use for seven calendar days. To enable RecoverPoint permanently, continue with “Requesting an activation code” on page 114.

Requesting an activation code

Before you begin

To enable RecoverPoint permanently, you must define a valid license key and activation code in the RecoverPoint system.

Before you begin:
◆ Perform the “Prerequisites” on page 113.
◆ Perform the steps in “Defining your license key in RecoverPoint” on page 113.

How to request an activation code

To request an activation code:
1. If you are installing the RecoverPoint license from the Getting Started Wizard (see “The Getting Started Wizard” on page 110), skip this step. From the main menu of the RecoverPoint Management Application GUI, select System > System Settings and click the Account Settings link in the navigation area. The Account Settings screen is displayed.
2. Click the Obtain Activation Code link. The RecoverPoint Licensing Server Login page is displayed in your default browser window.
3. Open the email containing your license information. Copy your license key and account ID, and paste them into the appropriate fields on the Web page.
4. Click the Login button. The RecoverPoint Licensing Server License Details page is displayed.
5. Click the Obtain activation code button. The RecoverPoint Licensing Server Obtain Activation Code page is displayed.
6. Copy the required information from the email containing your license information into the appropriate fields.
7. Click the Obtain button. Your request is processed and your activation code is immediately sent to the specified email address.

To enable RecoverPoint permanently, proceed to “Defining your activation code in RecoverPoint” on page 115.

Defining your activation code in RecoverPoint

Before you begin

To enable RecoverPoint permanently, you must define a valid license key and activation code in the RecoverPoint system.

Before you begin:
◆ Perform the “Prerequisites” on page 113.
◆ Perform “Defining your license key in RecoverPoint” on page 113 and “Requesting an activation code” on page 114.

How to define your activation code in RecoverPoint

To define your activation code in the RecoverPoint system:
1. If you are activating RecoverPoint from the Getting Started Wizard (see “The Getting Started Wizard” on page 110), skip this step.
From the main menu of the RecoverPoint Management Application GUI, select System > System Settings and click the Account Settings link in the navigation area. The Account Settings screen is displayed.
2. Click the Update button. The Updating license key dialog box is displayed.
   a. Enter the license key and the activation code you received via email in the appropriate fields.
   b. Click the OK button to exit the Updating license key dialog box.
3. Click the Apply button. Your activation code is displayed in the License Keys section at the bottom of the screen.

Your RecoverPoint product is now permanently enabled.

Upgrading your license

Before you begin

To upgrade your RecoverPoint license, you will have to request the upgrade, and then re-define the license key and activation code in the RecoverPoint system.

Before you begin:
◆ Read “Licensing overview” on page 108.
◆ Perform the “Prerequisites” on page 113.

How to upgrade your RecoverPoint license

To upgrade your RecoverPoint license:
1. From the main menu of the RecoverPoint Management Application GUI, select System > System Settings and click the Account Settings link in the navigation area. The Account Settings screen is displayed.
2. Click the Obtain Activation Code link. The RecoverPoint Licensing Server Login page is displayed in your default browser window.
3. Open the email containing your license information. Copy your license key and account ID, and paste them into the appropriate fields on the Web page.
4. Click the Login button. The RecoverPoint Licensing Server License Details page is displayed.
5. Click the Request to upgrade version button. The RecoverPoint Licensing Server Request to Upgrade Version page is displayed.
6. Copy the required information from the email containing your license information into the appropriate fields.
7. Click the Send button.
Your request is processed and a new activation code is sent to the specified email address within 48 hours.
8. When the new activation code arrives, perform the process described in “Defining your activation code in RecoverPoint” on page 115.

Your new activation code is displayed in the License Keys section at the bottom of the screen, and your new RecoverPoint license is now installed.

Re-activating your license

Before you begin

If you format your repository volume, you may need to request a new activation code from the RecoverPoint licensing server and then re-activate your RecoverPoint license.

Before you begin:
◆ Read “Licensing overview” on page 108.
◆ Perform the “Prerequisites” on page 113.

How to re-activate your RecoverPoint license

To re-activate your RecoverPoint license:
1. From the main menu of the RecoverPoint Management Application GUI, select System > System Settings and click the Account Settings link in the navigation area. The Account Settings screen is displayed.
2. Click the Obtain Activation Code link. The RecoverPoint Licensing Server Login page is displayed in your default browser window.
3. Open the email containing your license information. Copy your license key and account ID, and paste them into the appropriate fields on the Web page.
4. Click the Login button. The RecoverPoint Licensing Server License Details page is displayed.
5. Click the Request to reactivate license button. The RecoverPoint Licensing Server Request to Reactivate License page is displayed.
6. Copy the required information from the email containing your license information into the appropriate fields.
7. Click the Send button. Your request is processed and a new activation code is sent to the specified email address within 48 hours.
8. When the new activation code arrives, perform the process described in “Defining your activation code in RecoverPoint” on page 115.
Your new activation code is displayed in the License Keys section at the bottom of the screen, and your new RecoverPoint license is now installed.

Viewing your license information

See Table 4 on page 108 for a detailed explanation of the system capabilities controlled by your RecoverPoint license.

To display your RecoverPoint license information:
1. If you are in the Getting Started Wizard (see “The Getting Started Wizard” on page 110), skip this step. From the main menu of the RecoverPoint Management Application GUI, select System > System Settings and click the Account Settings link in the navigation area. The Account Settings screen is displayed.
2. Your RecoverPoint license information, and the RecoverPoint capabilities defined by your license, are displayed in the License Usage section of the Account Settings screen, after the following procedures are performed:
   • “Defining your license key in RecoverPoint” on page 113
   • “Requesting an activation code” on page 114
   • “Defining your activation code in RecoverPoint” on page 115

Access control

This section discusses authentication and authorization of users in the RecoverPoint system. The following sections deal with the topics:
◆ “User authentication”
◆ “User authorization”

User authentication

RecoverPoint provides two independent mechanisms for authenticating users: appliance-based authentication and authentication via the organization’s LDAP (Lightweight Directory Access Protocol) server. The two authentication mechanisms can be used simultaneously, LDAP may be used exclusively, or appliance-based authentication can be used exclusively.

Password security

The command-line interface (CLI) command set_security_level sets the restrictions on passwords.
The possible settings are as follows:
◆ High: User passwords to access the RPA must have a minimum of fourteen characters, of which at least two must be lower case, at least two must be upper case, and at least two must be non-alphabetical (either digits or special characters). A password can only be reset once every 24 hours; all user passwords expire in 60 days; the same password cannot be reused until at least ten other passwords have been used.
◆ Low: User passwords to access the RPA must have a minimum of five characters, and they expire after 60 days.

Regardless of the security level, any user who tries unsuccessfully three times to log on will be locked out. To unlock the user, use the CLI command unlock_user. Only users with Security permission can unlock a user.

IMPORTANT: The default security level is set to Low. It is recommended that RecoverPoint administrators set the level to High to meet relevant security standards, such as those of the USA Department of Defense Security Technical Implementation Guides (DoD STIG).

Configuring authentication

To configure RecoverPoint users, from the System menu, select System Settings. Select User Settings, and click the Users tab. The same commands are used to configure users, whether they are authenticated by the RecoverPoint appliance or by an LDAP server.

To add, edit, or delete a RecoverPoint user:
1. From the System menu, select System Settings. In the System Settings Navigation Pane, select User Settings.
2. To add a user, click Add. Select Local User or LDAP User/Group.

To configure a Local User: Refer to Table 5 on page 120. Provide a Username and a Password, according to “Password security” on page 119. Select the Role. To limit access to specific consistency groups, click Limited to consistency groups and select the consistency groups to which you are granting this user access.

To configure an LDAP User or Group: Refer to Table 5 on page 120. Select either a user name or a group from the list.
Select the Role. To limit access to specific consistency groups, click Limited to consistency groups and select the consistency groups to which you are granting this user access.

To edit a user, click Edit and modify the password or permissions as needed.

To remove a user, select the user and click Remove.

Table 5 Add New User settings

Settings and descriptions:

Local User
    The following settings define a new local user on the RecoverPoint appliance.
User Name
    Name of new user to add for appliance-based authentication. A user name must start with a lower-case alphabetic character. All subsequent characters must be lower-case alphabetic, numeric, or hyphen (-). No other characters are legal in user names.
Password
    Password for new user. All printing ASCII characters are legal in passwords.
Confirm Password
LDAP User/Group
    To add a user or users that already exist in the Active Directory on the LDAP server, grant them access to the RecoverPoint appliance, and define their role (permissions). To be able to access RecoverPoint, every user in the Active Directory must be added to the RecoverPoint user list and assigned a role, or be a member of a group that has access.
User Name/s
    Name of user to add to RecoverPoint user list.
Groups
    Name of a group or groups (in Active Directory) to add to the authorized users of RecoverPoint.
User Settings
Role
    Permissions for RecoverPoint are granted on the basis of roles. Set the role for the new local user, LDAP user, or LDAP group.
Limited to consistency groups
    When checked, limits the access of this local user, LDAP user, or LDAP group to the specified consistency groups.
Consistency Group Name
    Check the consistency groups that the specified user or group may access.
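The User Name rule in Table 5 and the character requirements of the High security level (see “Password security” on page 119) can be checked before a user is created. The following Python sketch is only an illustration of those written rules: the function names are hypothetical, it is not part of RecoverPoint, and it does not model the reset-interval, expiry, or password-history rules.

```python
import re
import string

# Table 5 rule: first character is a lower-case letter; the rest are
# lower-case letters, digits, or hyphens.
USERNAME_RE = re.compile(r"[a-z][a-z0-9-]*")


def is_valid_username(name: str) -> bool:
    """Return True if the name satisfies the Table 5 User Name rule."""
    return USERNAME_RE.fullmatch(name) is not None


def meets_high_security_rules(password: str) -> bool:
    """Check only the length and character-class rules of the High level."""
    lower = sum(c in string.ascii_lowercase for c in password)
    upper = sum(c in string.ascii_uppercase for c in password)
    non_alpha = sum(c not in string.ascii_letters for c in password)
    return len(password) >= 14 and lower >= 2 and upper >= 2 and non_alpha >= 2


if __name__ == "__main__":
    for name in ("security-admin", "Admin", "2fast"):
        print(name, is_valid_username(name))
    for pw in ("ABcd12efghijkl", "short1A"):
        print(pw, meets_high_security_rules(pw))
```

A check like this can be useful in provisioning scripts that prepare user lists ahead of time; the appliance itself remains the authority on which names and passwords it accepts.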
Predefined users

The RecoverPoint appliance is shipped with the following local users already defined:

Table 6 Predefined users

User | Role | Initial Password | Permissions
security-admin | security | security-admin | Security: changing users and roles, security levels, LDAP configuration
admin | admin | admin | All, except security and webdownload
boxmgmt | boxmgmt | boxmgmt | Upgrade
monitor | monitor | monitor | None (i.e., read only)
webdownload | webdownload | webdownload | Web download

You cannot remove the preconfigured users, and you cannot change their permissions. It is recommended that you change the passwords. If you wish to implement a purely LDAP-based authentication system, you need not give out the passwords of any predefined users.

Only users with security permission can add users, and can remove and edit permissions for users that have previously been added.

Configuring LDAP-based authentication

To configure RecoverPoint to use the organization’s LDAP server for authentication, go to the RecoverPoint Management Application. From the System menu, select System Settings. Select User Settings, and click the LDAP Configuration tab. Enter settings in the LDAP Configuration dialog box. Refer to Table 7 on page 122.

Table 7 LDAP Configuration settings

Settings | Description
LDAP configurations: |
Enable Active Directory Support | Check to activate RecoverPoint authentication and authorization using an LDAP server.
Primary LDAP server | IP address of the primary LDAP server
Secondary LDAP server (optional) | IP address of secondary LDAP server
Base Distinguished Name | Node in the LDAP tree from which to start a search for users: dc=Klaba,dc=COM
Search Base Distinguished Name | Root of the LDAP user search tree. The suffix of the Search Base Distinguished Name must be the Base Distinguished Name. The format will be similar to the following: cn=Users,dc=Klaba,dc=COM
Binding Type: | To specify the type of binding (authentication against the Active Directory).
Use Anonymous | Select only if Active Directory is configured to permit anonymous binding to query the LDAP server.
Use the following user: | If Active Directory is configured to allow binding only by a specific user, use this option for binding to query the LDAP server.
Bind Distinguished Name | Distinguished name to use for initial binding when querying the LDAP server. The format of the Bind Distinguished Name will be similar to the following: cn=Administrator,cn=Users,dc=Klaba,dc=COM The bind distinguished name can be any user on the LDAP server who has read permission for the directory in the defined search base.
Password | Password of the bind distinguished name to use for initial binding when querying the LDAP server
Directory Access Protocol |
LDAP | To send LDAP query over non-secure connection.
LDAP over SSL | To send LDAP query over a secure connection.
User certificate from file | Path to Active Directory certificate to use for secure communication with LDAP server. RecoverPoint only accepts LDAP certificates in PEM format.

To format and install the certificate created on the LDAP server in PEM format, use the following procedure:

1. On the LDAP server, export a copy of the server certificate from the Active Directory server. Use the Certification Authority application's Copy File to … option to export the certificate in Base-64 Encoded X.509 (.cer) format.
2. Copy the server certificate to a system with OpenSSL Certificate Authority software installed. You can use any Linux or Windows system.
3. Log into the system where you copied the certificate, and run the following command:
   On Linux:
   > /opt/symas/bin/openssl x509 -in AD_certificate_name -out OpenLDAP_certificate_name
   On Windows:
   > openssl x509 -in drive:/path/AD_file.cer -inform d -out drive:/path/OpenLDAP_file.pem
   This step creates the PEM certificate.
4. Install the certificate on each RecoverPoint appliance. At the RecoverPoint Management Application, from the menu select System > System Settings > User Settings > LDAP Configuration tab. Enter the settings, including User certificate from file (the path to the certificate in PEM format).

Advanced settings | The advanced settings are optional and can be left at their default values.
Search scope | Base: to limit a search to the search base distinguished name. Use this option to shorten search times when you are certain that all users are at the search base level. One level: to limit a search to the search base distinguished name and the level immediately below it. Use this option to shorten search times when you are certain that all users are within one level of the search base level. Subtree: search entire subtree from the search base distinguished name down.
Search time limit | Default 30 sec.
Username attributes | Name of the attribute that contains the username in the user node of the Active Directory tree. Example: sAMAccountName
LDAP group attributes | Name of the attribute that contains the group name in the user node of the Active Directory tree. Example: memberOf
User object class | Object class of users in the Active Directory tree. Example: user

When you have completed entering data in the LDAP Configuration screen, click the Test configuration button to verify that you have configured correctly.

User authorization

User authorization grants or denies users access to resources managed by RecoverPoint. User authorization is identical, regardless of whether the user was authenticated by RecoverPoint or by an LDAP server.

User authorization can be limited to specific consistency groups. For details, refer to User Settings in Table 5 on page 120.

Each RecoverPoint user is defined by a user name, a password, and a role. A role is a named set of access permissions.
By assigning a role to users, the users receive all the access permissions defined by the role. Table 9 on page 125 lists the permissions that may be granted or denied to a role.

Configuring roles

To configure roles, from the System menu, select System Settings. Select User Settings, and click the Roles tab.

To delete a role, select a role and click Delete.

To change permissions of a role, click Edit and select or deselect permissions as needed.

To add a role, click Add and enter settings in the Add New Role dialog box. Refer to Table 8 on page 125 and Table 9 on page 125.

Table 8 Add New Role settings

Setting | Description
Role name | Name of the new RecoverPoint role to add.
Permissions | Select the access permissions to be granted to all persons who are assigned to this role.

Table 9 Permissions that may be granted or denied

Permission | Description
Splitter Configuration | Add or remove splitters, and attach or detach splitters to volumes.
Group Configuration | Create and remove consistency groups, modify all group settings except those that are included in the data transfer, target image, and failover permissions, bookmark images, and resolve settings conflicts.
Data Transfer | Enable and disable access to image, and undo writes to the image access log.
Target Image | Enable and disable access to an image, resume distribution, and undo writes to the image access log.
Failover | Modify replication direction (use temporary and permanent failover), initiate failover, verify failover
System Configuration | Configure and manage e-mail alerts, SNMP, System Reports, rules, licenses, serial ID, account ID, syslog, and other system configuration settings.
Security | All commands dealing with roles, users, LDAP configuration, and security level.
Boxmgmt | Install RecoverPoint appliances.
Upgrade | RecoverPoint appliance maintenance, including upgrading to a minor RecoverPoint release, upgrading to a major RecoverPoint release, replacing an RPA, and adding new RPAs.
Web download | Download RecoverPoint installation packages from the EMC web site.

3 Starting Replication

Once RPAs and (host-based, fabric-based, or storage-based) splitters are installed and configured, you define what you want to replicate and how. The topics in this section are:

◆ Adding splitters ... 128
◆ Creating new consistency groups ... 132
◆ Configuring replication policies ... 143
◆ Modifying existing settings and policies ... 158
◆ Manually attaching volumes to splitters ... 173

Adding splitters

Before you begin, make sure you are well acquainted with the concepts “Splitters” on page 27 and “Documentation relevance per RecoverPoint product” on page 14.

It is assumed that host-based, fabric-based, and array-based splitters have been installed as needed. For details, see splitter installation in the EMC RecoverPoint Deployment Manager Product Guide and technical notes for specific splitters.

Before you begin, add splitters to the RecoverPoint system. Later on, when volumes are added, they will be automatically attached to all of the splitters that have access to that volume.

Note: In RecoverPoint/SE only: Each installed CLARiiON splitter is automatically added to your configuration, and attached to all available volumes. If you are using RecoverPoint/SE, there is no need to perform the steps outlined in this section, and you can skip to “Creating new consistency groups” on page 132.
For boot-from-SAN groups: When a consistency group is configured to boot from SAN, special considerations and procedures are necessary. Please contact EMC Customer Service for more information.

For Brocade splitters: If you added splitters based on a Connectrix AP-7600B or PB-48K-AP4-18 switch, make sure you follow the instructions in “Configure RecoverPoint for replication over the Connectrix device” in EMC RecoverPoint Deploying RecoverPoint with Connectrix AP-7600B and PB-48K-AP4-18 Technical Notes.

For SANTap splitters: Switch login credentials must be defined for each splitter added to the RecoverPoint system. See “Splitter credentials” on page 269.

For CLARiiON splitters: Although the two storage processors of a CLARiiON splitter are listed as separate entities (CLARiiON Splitter 1-A and CLARiiON Splitter 1-B), they are managed as a single entity. If you add or remove a splitter, the second storage processor instance is automatically added or removed. If you attach or detach a volume at one instance, the same volume will automatically be attached or detached at the other storage processor instance. Navisphere login credentials must be defined for each CLARiiON splitter added to the RecoverPoint system. See “Splitter credentials” on page 269.

Note: If a production storage volume is rolled back by a CLARiiON SnapView session, the CLARiiON splitter will automatically initialize a full synchronization (full sweep) of the production storage volume.

A single CLARiiON splitter can be shared by up to four RPA clusters. While attaching a CLARiiON splitter to a fifth RPA cluster appears to succeed, the splitter is in an error state for the newly attached RPA cluster. All splitter operations for this RPA cluster fail and return the following error: Maximum RPA clusters per splitter exceeded.
Use the Remove Splitter command to remove the CLARiiON splitter from the fifth RPA cluster.

How to add splitters to the RecoverPoint system

Before you begin, make sure you are well acquainted with the concept of “Adding splitters” on page 128.

To add splitters to your RecoverPoint system:

1. In the Navigation Pane, select Splitters. Notice the splitters, if any, listed in the Component Pane; these are the splitters that have already been added to the RecoverPoint system.
2. Right-click on Splitters and select Add New Splitter. Alternatively, click the Add New Splitter button. The Add Splitter Wizard is displayed.
   Note: If this consistency group contains a boot-from-SAN volume, skip to Step f.
3. In the first screen of the Add Splitter Wizard:
   a. Click the Rescan Splitters button to refresh the list of available splitters.
   b. Select the splitter to add. You can also select multiple splitters, or all splitters.
      To select or deselect specific splitters, hold down the Ctrl key on your keyboard, and select the first splitter. While still holding down the Ctrl key, select the second, third, etc. splitter. Clicking on a selected splitter deselects it.
      To select a range of splitters, hold down the Shift key, and click the first splitter. Scroll down to the last splitter you want to select with the Shift key still down, and select it.
      To select all splitters, click on any splitter, and press Ctrl+A.
   c. Add a splitter for every host that writes to any volume in a consistency group. Best practice is to select and add all available splitters.
   d. If you did not add any CLARiiON or SANTap splitters, skip this step. If you added a CLARiiON or SANTap splitter, the Enter login credentials screen is displayed, prompting you to enter login credentials for each CLARiiON or SANTap splitter added to the system. For support purposes, it is recommended that you enter credentials as soon as possible.
   e. If you did not add splitters based on a Connectrix® AP-7600B or PB-48K-AP4-18 switch, skip this step. If you added splitters based on a Connectrix AP-7600B or PB-48K-AP4-18 switch, make sure you follow the instructions in “Deploy RecoverPoint for this switch” in EMC RecoverPoint Deploying RecoverPoint with Connectrix AP-7600B and PB-48K-AP4-18 Technical Notes.
   f. If this group does not contain a boot-from-SAN volume, skip this step. If this splitter is the remote-site splitter for a boot-from-SAN volume, check the Show boot-from-san peer for other site’s host checkbox.
      When you enable this setting, the list of splitters changes from those at the remote site to those at the production site. This happens because the boot-from-SAN volume at the remote site does not exist yet. It is created subsequently by replicating the production boot-from-SAN volume. Rather than configuring the remote-site boot-from-SAN volume, you specify the splitter on the production site and replicate the entire boot-from-SAN volume with the splitter to the remote site. You specify the remote-site splitter here for the benefit of the RecoverPoint system.
      Special considerations arise when attaching to a boot volume. Please contact EMC Customer Service for more information.
   g. Click the Next button to view a summary.
   h. Click the Finish button to exit the Add Splitter Wizard.
   If an available splitter is not added, a warning is displayed in the status line of the RecoverPoint Management Application.
4. Click on the warning link for more information. If more splitters are available for addition, the Current System Warnings dialog box is displayed with the relevant warnings.
5. Click on the warning to display a list of splitters that have not been added, and are still available for addition.
6. To add additional splitters to the RecoverPoint System, repeat this process.
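The CLARiiON splitter behavior described earlier in this section (two storage processor instances managed as one entity, and a limit of four RPA clusters per splitter) can be pictured with a small model. The following Python sketch is a toy illustration only: the class and method names are hypothetical, it does not call any RecoverPoint interface, and keeping a single shared set of attached volumes for all clusters is a simplification.

```python
MAX_RPA_CLUSTERS_PER_SPLITTER = 4
LIMIT_ERROR = "Maximum RPA clusters per splitter exceeded"


class ClariionSplitterModel:
    """Toy model of one CLARiiON splitter with two storage processors."""

    def __init__(self, name: str) -> None:
        # The two storage processors are listed as separate instances.
        self.instances = (f"{name}-A", f"{name}-B")
        self.clusters = []             # RPA clusters sharing this splitter
        self.error_clusters = set()    # clusters added beyond the limit
        self.attached_volumes = {inst: set() for inst in self.instances}

    def add_to_cluster(self, cluster: str) -> None:
        # Adding appears to succeed even beyond the limit; the splitter is
        # then in an error state for the newly attached cluster only.
        if len(self.clusters) >= MAX_RPA_CLUSTERS_PER_SPLITTER:
            self.error_clusters.add(cluster)
        self.clusters.append(cluster)

    def remove_from_cluster(self, cluster: str) -> None:
        # Stands in for the Remove Splitter command.
        self.clusters.remove(cluster)
        self.error_clusters.discard(cluster)

    def in_error_state(self, cluster: str) -> bool:
        return cluster in self.error_clusters

    def attach_volume(self, cluster: str, instance: str, volume: str) -> None:
        if instance not in self.attached_volumes:
            raise ValueError(f"unknown storage processor instance: {instance}")
        if self.in_error_state(cluster):
            raise RuntimeError(LIMIT_ERROR)
        # Attaching at one instance attaches at the other one as well.
        for peer in self.instances:
            self.attached_volumes[peer].add(volume)


if __name__ == "__main__":
    splitter = ClariionSplitterModel("CLARiiON Splitter 1")
    for number in range(1, 6):
        splitter.add_to_cluster(f"cluster-{number}")
    print(splitter.in_error_state("cluster-5"))
```

In this model, the fifth cluster stays in the error state until the splitter is removed from it, which mirrors the recovery step given in the text.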
Creating new consistency groups

Before you begin, make sure you are well acquainted with the concept of “Consistency groups” on page 30.

Note: In RecoverPoint only, you must define your splitters in the system before creating a consistency group. If you have not added splitters to your system, see “Adding splitters” on page 128.

The New Consistency Group Wizard

The New Consistency Group Wizard helps you to create consistency groups, and guides you through the following tasks:

◆ Configuring protection, resource allocation, stretch clusters, advanced settings and policies, and compression. See “How to configure a new consistency group” on page 133.
◆ Configuring copies: production source and local replica or remote replica (or both). See “How to configure the production copy” on page 134 and “How to configure the replica copies” on page 135.
◆ Specifying replication sets and adding volumes to each copy. See “How to add replication sets” on page 136.
◆ Adding journal volumes to each copy. See “How to configure journals” on page 138.

To display the New Consistency Group Wizard:

◆ Select Consistency Groups in the Navigation Pane in the main window of the RecoverPoint Management Application, right-click, and select the Add Group option.
-or-
◆ Click the Add Group button in the toolbar above the Component Pane of the RecoverPoint Management Application.

How to configure a new consistency group

This section describes how to create a new consistency group using the New Consistency Group Wizard.

Before you begin, make sure you are well acquainted with the concepts “Consistency groups” on page 30 and “Documentation relevance per RecoverPoint product” on page 14.

Note: In RecoverPoint/SE only: RecoverPoint/SE can automatically provision journal volumes. To do so, it allocates dedicated RAID groups. Up to six RAID groups can be configured per site.
If you are planning to use the automatic journal provisioning feature, make sure you have allocated five free hard disks per RAID group before beginning the following procedure.

To create a new enabled consistency group:

Note: To create a disabled consistency group (and manually add splitters, configure replication sets and volumes, attach volumes to the splitters, and start replication later), click the Finish button after Step 2 and skip Step 3.

1. Right-click on Consistency Groups in the Navigation Pane in the main window of the RecoverPoint Management Application and select the Add Group option. The Define the Consistency Group and its Settings screen of the New Consistency Group Wizard is displayed.
   In RecoverPoint only:
   • Creating a consistency group that contains a boot-from-SAN volume involves many special considerations. Please contact EMC Customer Service for more information.
   • Volumes attached to a CLARiiON splitter cannot be in the same consistency group with volumes attached to a host-based splitter. They can be in the same RPA cluster in different consistency groups. If, however, volumes in one consistency group at a site are attached to a CLARiiON splitter, all consistency groups on that site must reside on a CLARiiON array.
   • The same consistency group can use a CLARiiON splitter at one site and a different splitter at the other site.
2. Enter the following information into the Define the Consistency Group and its Settings screen of the New Consistency Group Wizard.

Table 10 Consistency Group General Settings

Setting | Values and description
Name | Enter a descriptive name for the consistency group.
Primary RPA | Select which RPA you prefer to replicate the consistency group. When the primary RPA is not available, the consistency group will switch to another RPA in the RPA cluster.
Whether data will transfer when replication is switched to another RPA depends on the Allow data transfer even when group is handled by non-preferred RPA policy (Table 17 on page 149).
Note: Best practice is to ensure that synchronous groups are set to use different RPAs than asynchronous groups. Mixing the two may result in low I/O rates for the synchronous groups. It is also recommended that dynamic sync and purely synchronous consistency groups reside on different RPAs, whenever possible. See “Replication modes” on page 50 for more information on synchronous, asynchronous, and dynamic sync groups.

The policy settings in the other sections of this screen are optional. The default values provide a practical configuration. It is recommended to accept the default settings unless there is a specific business need to set other policies. These settings can be changed at any later time by selecting the consistency group in the Navigation Pane and clicking on its Policy Tab. See “Configuring consistency group policies” on page 143.

Note: To create a disabled consistency group (and manually add splitters, configure replication sets and volumes, attach volumes to the splitters, and start replication later), click the Finish button now and skip the next step.

3. Click the Next > button. The Define the Production Copy and its Settings screen is displayed. See “How to configure the production copy” on page 134.

How to configure the production copy

Before you begin, make sure you are well acquainted with the concept of “Copies” on page 32.

To configure the production copy:

1. Specify the production site in the Production Site field.
2. Enter the following information into the General settings section:

Table 11 Copy General Settings

Setting | Values and description
Name | Enter a descriptive name for the copy in RecoverPoint.
Journal Compression | Note: Not available in RecoverPoint/SE.
Default = none
The journal of the production storage is used only after failing over to another replica, which becomes the production source, so that the previous production source becomes a replica. It is recommended to compress the journal when forcing asynchronous replication. If the RPA is also the production source for transmission across the WAN (to a remote replica) for some other consistency group, compressing journals will affect the transfer rate over the WAN and is not recommended.
Note: The following applies to journal compression at replicas, but not to the production source:
• To change the value of journal compression, the consistency group must be in one of the following states: disabled, direct image access, or distributing.
• If the consistency group is writing to this journal volume while you change the compression level, the existing journal will be lost.

The other policy settings are optional. The default values provide a practical configuration. It is recommended to accept the default settings unless there is a specific business need to set other policies. To change these settings at a later time, click on the copy in the Navigation Pane, and select its Policy Tab in the Component Pane. See “Configuring copy policies” on page 152.

3. Apply your settings by clicking the Next > button. The Define any Replica Copies and their Settings screen is displayed. See “How to configure the replica copies” on page 135.

How to configure the replica copies

Before you begin, make sure you are well acquainted with the concept of “Copies” on page 32.

Note: In RecoverPoint/SE only: Because there is a limitation of one splitter per site, in a three-copy configuration, make sure the local copy is stored on the same CLARiiON array as the production copy.

To configure replica copies:

1. To create a local replica, select the Create Local Copy at <SiteName> checkbox, and enter a name for the replica in the Name field.
2. To create a remote replica, select the Create Remote Copy at <SiteName> checkbox, and enter a name for the replica in the Name field.
   The other policy settings are optional. The default values provide a practical configuration. It is recommended to accept the default settings unless there is a specific business need to set other policies. If you do want to specify values for the available settings, note that they are identical to the settings for the production copy. Configure these settings, as required, according to the instructions in “Configuring copy policies” on page 152.
3. Click the Next > button. The Select the Production Volumes for which to Create Replication Sets screen is displayed. See “How to add replication sets” on page 136.

How to add replication sets

Before you begin, make sure you are well acquainted with the concepts “Replication sets” on page 33, “The production volumes” on page 36, “The replica volumes” on page 36, and “Documentation relevance per RecoverPoint product” on page 14.

To add a new replication set to a consistency group:

1. Click the Rescan button to update the list of available volumes at the production site.
2. Select one or more production volumes to replicate.
   Note: In RecoverPoint only: Only the masked volumes at the specified site are displayed in the available volumes list. Therefore, in the Volume Details area, ensure the selected volumes are seen by all RPAs. If they are not, mask the unseen LUNs to RecoverPoint WWNs, click the Rescan button to update the list of available volumes, and redo this step.
3. Click the Next > button. The first of the Add Volume from <SiteName> to <RSetNum> screens is displayed.
4. Click the Rescan button to update the list of available volumes at the replica site.
5. For each specified production volume, at each copy:
   a. Note the production volume specified in the Production Volume of <RSetNum> area at the top of the screen.
   b. From the Volumes at <SiteName> that can be Added to <RSetNum> list, select the volume to which you want to replicate the specified production volume.
      Note: The volume list only displays volumes at the site that are equal to, or greater than, the specified production volume in size.
      Use the Filter volumes by: fields to filter the volumes in the list by Product, Vendor, Name, UID or LUN.
      For best performance during failover, select a volume that is the same size as the one specified in the Production volume of <RSetNum> section. If a volume of the same size is not available, select a volume that is as similar in size as possible.
      Note: For CLARiiON splitter environments, you must select a volume that is exactly the same size as the one specified in the Production volume of <RSetNum> section.
      Note: In RecoverPoint only: Only the masked volumes at the specified site are displayed in the available volumes list. Therefore, in the Volume Details area, ensure the selected volumes are seen by all RPAs. If they are not, mask the unseen LUNs to RecoverPoint WWNs, click the Rescan button to update the list of available volumes, and redo this step.
   c. Click the Next > button. When all of the specified production volumes are assigned volumes at each copy, the Review Replication Set Configuration screen is displayed.
6. For each replication set, you can click the RSet<num> text in the Name column of the Replication Sets table and enter a descriptive name.
   Note: In RecoverPoint only, clicking on a cell in any of the copy columns (Production, Local, or Remote) displays the relevant volume’s information in the Volume Details section at the bottom of the screen.
7. Apply your settings by clicking the Next > button.
   • If you are using RecoverPoint, the Select Journal Volumes screen is displayed. See “How to configure journals” on page 138.
   • If you are using RecoverPoint/SE, the Select Journal Provisioning Method screen is displayed. See “How to configure journals in RecoverPoint/SE” on page 139.

How to replicate Oracle

For instructions on how to replicate an Oracle database, including using Oracle hot backup procedures with RecoverPoint bookmarks for point-in-time snapshots and quick testing and disaster recovery, refer to Replicating Oracle with EMC® RecoverPoint Technical Notes.

How to configure journals

Before you begin, make sure you are well acquainted with the concepts “Journals” on page 33, “The production journal volume” on page 36, “The replica journal volumes” on page 37, and “Documentation relevance per RecoverPoint product” on page 14.

To configure journals:

1. For each copy:
   a. Click the Rescan button to update the list of available volumes.
   b. Select the volumes that you want to add to the journal at the copy site. Multiple volumes can be selected.
      Use the Filter volumes by: fields to filter the volumes in the list by Product, Vendor, Name, UID or LUN.
      For best performance, select volumes that are identical in size. If identically sized volumes are not available, select volumes that are similar in size.
      Note: In RecoverPoint only: Only the masked volumes at the specified site are displayed in the available volumes list. Therefore, in the Volume Details area, ensure the selected volumes are seen by all RPAs. If they are not, mask the unseen LUNs to RecoverPoint WWNs, click the Rescan button to update the list of available volumes, and redo this step.
   c. Click the Next > button. The Create Consistency Group screen is displayed.
2. Review the settings in the Create Consistency Group screen, and verify that they are correct.
3. Make sure you have read “Starting replication” on page 141.
   Note: If you do not wish to start data transfer immediately, uncheck the Start data transfer immediately checkbox. Before you start transfer to any replica, make certain that the replica volumes are unmounted from any hosts and any volume groups are deported from the logical volume manager (AIX, HP-UX, Windows, and Solaris have volume managers built into the operating system; Veritas Volume Manager can be used with any of these operating systems).
4. To create the consistency group and apply all of the specified settings, click the Finish button.

Swapping LUN numbers of journal volumes

Swapping LUN numbers for LUNs that have already been exposed to an RPA cluster should be avoided when possible. When it cannot be avoided, it should be done according to the following procedure. Loss of the journal cannot be avoided.

1. Disable the consistency group. Disabling the consistency group causes a full sweep of the consistency group when it is enabled. This procedure will also cause journal loss.
2. Remove journals from the consistency group.
3. Swap the LUNs (on the storage array).
4. Add LUNs as journals.
5. Enable the consistency group. A full sweep of the consistency group occurs.

How to configure journals in RecoverPoint/SE

Before you begin, make sure you are well acquainted with the concepts “Journals” on page 33, “The production journal volume” on page 36, “The replica journal volumes” on page 37, and “Documentation relevance per RecoverPoint product” on page 14.

In RecoverPoint/SE you can:

◆ manually configure journal volumes by following the steps in “To manually configure journal volumes” on page 141.
◆ allow RecoverPoint/SE to automatically provision and configure them for you by following the steps in “To automatically provision journal volumes” on page 140.
To automatically provision journal volumes

To allow RecoverPoint/SE to automatically provision and configure the journal volumes for you (default option):

Note: Make sure you have allocated five free hard disks per RAID group, per site, before beginning the following procedure.

1. If this is the first consistency group you are creating, specify the number of RAID groups to create in the Select number of raid groups to create field. Otherwise, skip this step.
   Note: Once these RAID settings are defined, they cannot be modified or undone, as this option will not be displayed again.
2. Decide whether you wish to provide a pre-defined value for the size of each copy journal, or have RecoverPoint/SE calculate the required journal size based on bandwidth and a required protection window.
   • To pre-define the journal size at each copy, click the Predefined Journal Size radio button and enter a value for the journal size (in GB).
     Note: The specified journal size will be applied to every journal of every copy.
   • To have RecoverPoint/SE calculate the required journal size based on bandwidth and a required protection window, click the Bandwidth radio button, and enter both values (see “Required protection window” on page 153). The required journal size is displayed under the required protection window as the Calculated journal size.
     Note: The calculated journal size will be applied to every journal at every copy.
   Then click the Next > button. The specified RAID groups are created, and cannot be modified. The Create Consistency Group screen is displayed.
3. Review the settings in the Create Consistency Group screen, and verify that they are correct.
   Note: Make sure you have read “Starting replication” on page 141. If you do not wish to start data transfer immediately, uncheck the Start data transfer immediately checkbox.
Before you start transfer to any replica, make certain that the replica volumes are unmounted from any hosts and that any volume groups are deported from the logical volume manager (AIX, HP-UX, Windows, and Solaris have volume managers built into the operating system; Veritas Volume Manager can be used with any of these operating systems).

4. To create the consistency group and apply all of the specified settings, click the Finish button.

CAUTION: If you chose to start data transfer immediately, a full sweep synchronization process begins on all volumes in the consistency group.

To manually configure journal volumes

To manually configure journal volumes:

1. Select the Manually Select Journal Volumes radio button.
2. Click the Next > button. The Select Journal Volumes screen is displayed.
3. Follow the instructions in “How to configure journals” on page 138.

Starting replication

When a consistency group, copy, or volume is defined in RecoverPoint for the first time, its volumes are initialized (see “First-time initialization” on page 224). By default, RecoverPoint writes the first snapshot directly to the replica, without first writing it to the journal. You can override the default (refer to Perform fast first-time initialization in Table 17 on page 149) to write the initialization snapshot first to the journal. This option is more time-consuming but provides greater data protection.

Once first-time initialization is completed, each consistency group will be in one of the following states:

◆ Replicating: The consistency group is enabled, the splitter is replicating to the RPAs, and the RPAs are transferring to the replica journal or journals. The snapshots from the journal are then distributed to the replica storage.

◆ Marking: The consistency group is enabled, the splitter is replicating to the RPAs, but the RPAs are unable to transfer to the replica journal.
The location of the changes is stored in RPA1 and RPA2, as well as on the production journal volume. When contact with the remote site is restored, the remote replica is synchronized, but only at those locations that were marked as having changed. Then transfer and replication can resume. The following can cause the RPA to go to marking mode:

• WAN unavailable
• RPAs at remote site not available (for instance, loss of power)
• Transfer disabled manually
• High load (temporary bottleneck in replication environment)

◆ No marking/no replication: The splitter does not write to the RPAs. This can be caused by a manually disabled consistency group or by a disaster at the production site (no RPAs available).

Configuring replication policies

Replication with the RecoverPoint system is policy-driven. A replication policy, based on the particular business needs of your company, is uniquely specified for each consistency group, and each copy. The policy comprises a set of settings that collectively governs the way in which replication is carried out. Replication behavior changes dynamically during system operation in light of the policy, the level of system activity, and the availability of network resources.

The replication policy settings are presented in “Configuring consistency group policies” on page 143 and “Configuring copy policies” on page 152.
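The three consistency group states listed under “Starting replication” can be read as a decision over three conditions: whether the group is enabled, whether the splitter can write to the RPAs, and whether the RPAs can transfer to the replica journal. The following Python sketch is illustrative only. It is not part of RecoverPoint, and the names GroupState and group_state and their parameters are hypothetical.

```python
from enum import Enum


class GroupState(Enum):
    """Hypothetical labels for the three states described in the text."""
    REPLICATING = "Replicating"
    MARKING = "Marking"
    NO_MARKING = "No marking/no replication"


def group_state(group_enabled: bool,
                splitter_writes_to_rpas: bool,
                rpas_can_transfer: bool) -> GroupState:
    """Map the three conditions to a state (illustrative model only)."""
    # A disabled group, or a production site with no RPAs available,
    # means the splitter does not write to the RPAs at all.
    if not group_enabled or not splitter_writes_to_rpas:
        return GroupState.NO_MARKING
    # The splitter still writes to the RPAs, but transfer to the replica
    # journal is not possible (WAN down, remote RPAs down, transfer
    # disabled manually, or high load).
    if not rpas_can_transfer:
        return GroupState.MARKING
    return GroupState.REPLICATING


print(group_state(True, True, False).value)
```

In this model, any of the marking causes listed above is represented by setting rpas_can_transfer to False.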
Configuring consistency group policies

The tables in this section describe the available policy settings for consistency groups. They include:

◆ “Consistency Group General Settings”
◆ “Consistency Group Compression Policy Settings”
◆ “Consistency Group Protection Policy Settings”
◆ “Consistency Group Resource Allocation Policy Settings”
◆ “Consistency Group Stretch Cluster / SRM Support Policy Settings”
◆ “Consistency Group Advanced Policy Settings”

Table 12 Consistency Group General Settings

Setting: Name
The name of the consistency group.

Setting: Primary RPA
The RPA that you prefer to replicate the consistency group. When the primary RPA is not available, the consistency group will switch to another RPA in the RPA cluster. Whether data will transfer when replication is switched to another RPA depends on the Allow data transfer even when group is handled by non-preferred RPA policy (Table 17 on page 149).
Note: Best practice is to ensure that synchronous groups are set to use different RPAs than asynchronous groups. Mixing the two may result in low I/O rates for the synchronous groups. It is also recommended that dynamic sync and purely synchronous consistency groups reside on different RPAs, whenever possible. See “Replication modes” on page 50 for more information on synchronous, asynchronous, and dynamic sync groups.

Table 13 Consistency Group Compression Policy Settings

Setting: Enable Compression
Default = enabled
Compresses data before transferring it over the WAN. Can reduce transfer time significantly. Only available if the license supports compression. Not relevant for CDP (single-site) configurations.

Setting: Compression Level
Default = 10
Compression decreases transfer time, but increases computational effort.
1: Highest level of compression; requires more RPA resources
10: Fastest compression

Table 14 Consistency Group Protection Policy Settings

Setting: Asynchronous
Default = enabled
When enabled, RecoverPoint replicates consistency group data asynchronously, see “Asynchronous replication mode” on page 51.

Setting: Synchronous
Default = disabled
When enabled, RecoverPoint replicates consistency group data synchronously, see “Synchronous replication mode” on page 51.

Setting: Dynamic by latency
Default = disabled
Only relevant for synchronous replication mode. When enabled, RecoverPoint alternates between synchronous and asynchronous replication modes, as necessary, according to latency conditions (the number of milliseconds or microseconds between the time the data is written to the local RPA and the time that it is written to the RPA or journal at the remote site), see “Dynamic sync mode” on page 52.
Start async replication above: When the specified limit is reached, RecoverPoint automatically starts replicating asynchronously, see “Asynchronous replication mode” on page 51.
Resume sync replication below: When the specified limit is reached, RecoverPoint goes back to replicating synchronously, see “Synchronous replication mode” on page 51.

Setting: Dynamic by throughput
Default = disabled
Only relevant for synchronous replication mode. When enabled, RecoverPoint alternates between synchronous and asynchronous replication modes, as necessary, according to throughput conditions (the total writes that reach the local RPA, per copy, in kb/s), see “Dynamic sync mode” on page 52.
Start async replication above: When the specified limit is reached, RecoverPoint automatically starts replicating asynchronously, see “Asynchronous replication mode” on page 51.
Resume sync replication below: When the specified limit is reached, RecoverPoint goes back to replicating synchronously, see “Synchronous replication mode” on page 51.

Setting: System Optimized Lag
Default = enabled
This setting defines the RPO of the consistency group. When enabled, RecoverPoint determines the best lag for an efficient and practical solution. If any other solution is needed, contact EMC Customer Service.

Setting: Lag
Default = disabled
This setting defines the RPO of the consistency group, and is set manually, in MB, GB, writes, seconds, minutes, or hours. In RecoverPoint, lag starts being measured when a write made by the production host reaches the local RPA, and stops being measured when the write reaches either the target RPA or the target journal, depending on the Measure lag when writes reach the target RPA (as opposed to the journal) setting.
Note: When the Allow Regulation setting is disabled, the selected RPO is not guaranteed, but the system will try its best to replicate within the RPO setting, without affecting host performance.

Setting: Allow Regulation
Default = disabled
Allows RecoverPoint to control the acknowledgement of writes back to the host in the case of bottlenecks or insufficient resources that would otherwise prevent RecoverPoint from replicating data.
When enabled, slows host applications when approaching the lag policy limit. When the system cannot replicate the current incoming write-rate while guaranteeing the lag setting, the system delays acknowledgements to guarantee that RPO is always enforced. Additionally, if there is a bottleneck in the system, the system will regulate host applications instead of entering a high load state.
Note: For BFS groups (consistency groups configured to boot from the SAN) in Windows KDriver environments, enabling this policy is discouraged.
When disabled, the system will use policies and limits as guidelines and make an effort to meet them.
In synchronous replication mode (see “Synchronous replication mode” on page 51), although the Allow Regulation checkbox is disabled, this policy is automatically enabled, and cannot be modified.
In dynamic sync mode (see “Dynamic sync mode” on page 52), the Allow Regulation checkbox is enabled, but the user setting only applies when the group is replicating asynchronously. During synchronous replication, host applications are always regulated.
Note: Since host applications are always regulated in sync replication mode, replicating synchronously with BFS groups (consistency groups configured to boot from the SAN) in Windows KDriver environments is discouraged.

Setting: Minimize
Default = minimize lag
Only relevant for remote replication over the WAN or Fibre Channel.
Lag: To keep the lag (difference) between sites to a minimum; the system will use as much bandwidth as needed. Lag is the maximum offset between writing data to the local RPA and writing it to the RPA or journal at the remote site. Intervals between snapshots will depend on available bandwidth and I/O load.
Bandwidth: To use as little bandwidth as possible while making sure that the maximum lag policy is not exceeded (by keeping the lag in the RPA memory for as long as possible before reaching the specified lag setting).
Note: If Minimize = Bandwidth, in CRR specifications, lag must be set to System Optimized Lag; otherwise the system will issue an error.

Table 15 Consistency Group Resource Allocation Policy Settings

Setting: Priority
Default = Normal
Only relevant for remote replication over the WAN or Fibre Channel, when two or more consistency groups are using the same Primary RPA. Select the priority assigned to this consistency group.
The priority determines the amount of bandwidth allocated to this consistency group in relation to all other consistency groups. Possible values are: Idle, Low, Normal, High, and Critical.
If a consistency group is set to Idle, all other consistency groups that are set to a greater value will receive a greater share of the RPA resources in the event of contention. The consistency group set to Idle will only receive resources when no other consistency group requires them.
Note: Consistency groups with Idle settings are still provided some resources when other groups are replicating on the same RPA, even if the consistency group’s Primary RPA is heavily loaded. However, unless the consistency group’s Primary RPA is very heavily loaded, the effects of an Idle setting may not be noticeable.

Setting: Bandwidth Limitation
Default = unlimited
Used to limit the bandwidth that is made available to the consistency group. Only relevant for remote replication over the WAN.
Note: Bandwidth limitation is not supported for CDP configurations, or remote replication over Fibre Channel.
Unlimited: This consistency group may use as much available bandwidth as needed to meet policies.
Limited: This consistency group may use up to the specified amount of bandwidth.
The feature works as follows:
• The system sums the bandwidth limitations of all consistency groups on a single RPA, and limits the outgoing throughput from the RPA to that sum.
• If consistency groups with settings of ‘limited’ and ‘unlimited’ run on the same RPA, the effect is that there is no limit to the outgoing throughput from the RPA.
Note: When limiting the bandwidth, ensure that groups with limited bandwidth are not configured to run on an RPA with other consistency groups whose bandwidth is unlimited, or this feature will not work. See “Consistency Group General Settings” on page 143 for more information on setting the primary RPA.
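The two bandwidth limitation rules described for Table 15 (limits are summed per RPA, and a single unlimited group removes the limit) can be expressed in a few lines. The following Python sketch is only an illustration of that arithmetic. The function name rpa_outgoing_limit, the use of None to mean "unlimited", and the Mbps unit are assumptions made for the example and do not come from RecoverPoint.

```python
from typing import Optional, Sequence


def rpa_outgoing_limit(group_limits_mbps: Sequence[Optional[float]]) -> Optional[float]:
    """Return the effective outgoing limit for one RPA, or None if unlimited.

    Each entry is the Bandwidth Limitation of one consistency group running
    on the RPA; None stands for 'unlimited' (an assumption of this sketch).
    """
    # With no groups on the RPA there is nothing to sum; treat as unlimited.
    if not group_limits_mbps:
        return None
    # One unlimited group on the RPA means the RPA as a whole is unlimited.
    if any(limit is None for limit in group_limits_mbps):
        return None
    # Otherwise the RPA is limited to the sum of the per-group limits.
    return sum(group_limits_mbps)


print(rpa_outgoing_limit([10.0, 20.0]))   # all groups limited
print(rpa_outgoing_limit([10.0, None]))   # mixed limited and unlimited
```

The second call shows why the note above recommends keeping limited and unlimited groups on different RPAs: in the mixed case the per-group limit has no effect on the RPA's outgoing throughput.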
Table 16 Consistency Group Stretch Cluster / SRM Support Policy Settings

Setting: Desired cluster mode
Default = none

Setting: Use RecoverPoint/CE
Default = disabled
Check this option to enable stretch cluster support.

Setting: Group is managed by RecoverPoint CE. RecoverPoint can only monitor.
Only relevant if Use RecoverPoint/CE is enabled. Check this option to activate stretch cluster support. When activated, hosts in a Microsoft Cluster can automatically fail over from one site to the other. RecoverPoint assures that the application data is in the identical state at the original site and the failover site, so that the failover is transparent to the application.
When activated, all RecoverPoint user-initiated capabilities are disabled. The user cannot access images, change policies, or change volumes. Bookmarks cannot be created in the RecoverPoint Management Application, but they can be created using the RecoverPoint command-line interface bookmark commands.

Setting: Group is in maintenance mode. It is managed by RecoverPoint, CE can only monitor.
Only relevant if Use RecoverPoint/CE is enabled. Check this option for planned or unplanned maintenance of the RecoverPoint system. When activated, stretch cluster support is disabled and user-initiated RecoverPoint capabilities are enabled.
When activated, hosts are not able to fail over to the other site, although they can still fail over to another host within the same site.
When activated, RecoverPoint user-initiated capabilities, such as image access, image testing, changing policies, and creating bookmarks, are available.

Setting: Use SRM
Default = disabled
Check this option to enable VMware SRM support. This option is valid when a RecoverPoint Storage Replication Adapter for VMware Site Recovery Manager is installed on the vCenter Servers.
For more information about the RecoverPoint Adapter for VMware SRM, refer to the EMC RecoverPoint Adapter for VMware Site Recovery Manager Release Notes, available on Powerlink.

Setting: Group is managed by SRM. RecoverPoint can only monitor.
Only relevant if Use SRM is enabled. Check this option to activate VMware SRM support. When activated, VMware Site Recovery Manager manages the group and can perform failover and test failover from one site to the other.
When activated, all RecoverPoint user-initiated capabilities are disabled. The user cannot access images, change policies, or change volumes. Bookmarks cannot be created in the RecoverPoint Management Application, but they can be created using the RecoverPoint command-line interface bookmark commands.

Setting: Group is in maintenance mode. It is managed by RecoverPoint, SRM can only monitor.
Only relevant if Use SRM is enabled. Check this option for planned or unplanned maintenance of the RecoverPoint system. When activated, VMware SRM support is disabled and user-initiated RecoverPoint capabilities are enabled.
When activated, all RecoverPoint user-initiated capabilities, such as image access, image testing, changing policies, and creating bookmarks, are available.

Table 17 Consistency Group Advanced Policy Settings

Setting: Reservations Support
Default = enabled
Enable only if hosts are clustered or if one of the hosts runs AIX without reservations disabled. For Reservations Support settings for AIX hosts with host-based splitters, refer to EMC® RecoverPoint Deploying RecoverPoint with AIX hosts.
Setting: Allow data transfer even when group is handled by non-preferred RPA
Default = enabled
Each RPA cannot transfer the data of more than a specific number of consistency groups (see the Consistency groups in cluster setting in the General configuration limits table of the EMC RecoverPoint and RecoverPoint/SE Release Notes of each RecoverPoint version for this limit). If a consistency group contains both a local and remote replica, it counts as two consistency groups toward this limit.
To make sure that if an RPA fails, it can always switch over to another RPA, the system will not allow more than the maximum number of consistency groups to be configured in the entire system when Allow data transfer even when group is handled by non-preferred RPA = enabled.
When the primary RPA is unavailable for any reason, the consistency group is switched over to a non-preferred RPA.
Enable to have the non-preferred RPA perform all aspects of replication, including transferring data to the replica or replicas.
Disable to prevent the non-preferred RPA from transferring data. Marking will continue as usual. When operation is restored to the primary RPA, data transfer will resume.

Setting: Measure lag when writes reach the target RPA (as opposed to the journal)
Default = enabled
Only relevant for CRR configurations, and synchronous replication to a CDP copy. Not relevant for asynchronous replication to a CDP copy.
Note: It is recommended to leave this setting as is.
By default, your RecoverPoint system is configured to measure lag and generate ACKs when writes reach the remote RPA. Disable this setting to instruct RecoverPoint to measure lag and generate ACKs when writes reach the remote journal, instead of the remote RPA.
When enabled, this policy provides faster performance in both synchronous and asynchronous replication modes, by reducing both latency and lag.
When Allow Regulation is enabled (see “Allow Regulation” on page 146), and lag is reduced, so is the potential requirement to regulate the host applications.
In synchronous replication mode (see “Synchronous replication mode” on page 51) write performance is substantially higher when this policy is enabled. However, with this policy enabled, RecoverPoint does provide a slightly lower level of data security in the rare case of a simultaneous local and remote RPA disaster.
You can check the difference in write performance by selecting a group in the Navigation Pane and clicking the Statistics Tab in the Component Pane. Note the number of Incoming Writes in the bottom right section of the Statistics Tab during normal replication, and then compare this to the number of Incoming Writes in the same section when Measure lag when writes reach the target RPA (as opposed to the journal) is disabled.

Setting: Distribute group
Default = disabled
Note: Both enabling and disabling this setting cause the journal of all copies in the consistency group to be lost.
Allows group writes to be distributed across multiple RPAs, significantly heightening the maximum available RPA throughput, and therefore allowing for a significantly larger group.
For throughput and IOPS performance statistics (during synchronous and asynchronous replication) and feature limitations, see the EMC RecoverPoint and RecoverPoint/SE Release Notes. For more information on distributed consistency groups, see “Distributed consistency groups” on page 58.
Enable to specify secondary RPAs. When enabled, a minimum of one and a maximum of three secondary RPAs can be selected. There is only a small improvement in performance when a group is run on three RPAs. However, there is a steep improvement in performance when a group is run on four RPAs.
Note: Before changing this setting, make sure all preferred RPAs (both primary and secondary) are connected by Fibre Channel and can see each other in the SAN. For more information, see “What should I know before setting a group as distributed?” on page 59.

Setting: Snapshot Granularity
Default = fixed (per second)
Fixed (per write): To create a snapshot for every write operation, over a specific (local or remote) link.
Fixed (per second): To create one snapshot per second, over a specific (local or remote) link.
Dynamic: To have the system determine the snapshot granularity of a specific (local or remote) link, according to available resources.
Note: For distributed consistency groups, the snapshot granularity of all links in the consistency group can be no finer than one second. See “Distributed consistency groups” on page 58 for more information.

Setting: Perform fast first-time initialization
Default = enabled
Only relevant for initializations that occur for the first time.
Note: This setting is set per (local or remote) link.
During normal replication, RecoverPoint transfers data from production to the replica journal, and then from the replica journal to the replica storage. During first-time initialization, sending data through the journal is unnecessary, since the replica does not contain previous data that can be used to construct a complete image, and a complete image must be transferred before failover is possible.
When this policy is enabled, RecoverPoint transfers data directly to the replica storage. The data is not stored in the journal first, and consequently, the initialization process is substantially shorter. In this case, the replica is not consistent with production until the transfer of the whole image to the replica storage is complete.
Therefore, if a disaster were to strike at the production site before the transfer of the image was complete, you would not be able to fail over to the replica.
When this policy is disabled, RecoverPoint transfers data to the replica journal, and only then from the replica journal to the replica storage. Disabling this policy is useful, for example, when disabling and enabling an existing consistency group, causing the group to be initialized. In this case, RecoverPoint may be able to use the existing data at the replica site (journal and storage) to construct a complete image, which is required for failover purposes.
To enable failover during initialization, it is recommended to also disable the Allow distribution of snapshots that are larger than capacity of journal volumes policy (see “Allow distribution of snapshots that are larger than capacity of journal volumes” on page 157).

Configuring copy policies

The tables in this section describe the available policy settings for copies. They include:

◆ “Copy General Settings”
◆ “Copy Protection Policy Settings”
◆ “Copy Journal Policy Settings”
◆ “Copy Advanced Policy Settings”

Table 18 Copy Protection Policy Settings

Setting: Required protection window
Default = disabled
The protection window indicates how far in time the replica image can be rolled back. Click the checkbox to define a required protection window. Specify the length of the required protection window.
When the required protection window is defined, the status of the Current and the Predicted Protection Windows are displayed in the Journal Tab of the copy (see “The Journal Tab” on page 196). In addition, the system will raise an event in any of the following cases:
• Current Protection Window becomes insufficient.
• Current Protection Window becomes sufficient.
• Predicted Protection Window becomes insufficient.
• Predicted Protection Window was insufficient and becomes sufficient.

Setting: Enable RecoverPoint snapshot consolidation
Default = disabled
Snapshots are consolidated to allow for the storage of a longer history in the copy Journal (see “Automatic snapshot consolidation” on page 41).
The system will raise an event in any of the following cases:
• The specified group is not distributing.
• The specified snapshots have not reached the copy storage.
• The consolidation times are not far enough apart for significant changes to have occurred. There must be at least 1 GB of space between the snapshots being consolidated.
• Another consolidation is in progress on the same RPA.
Note: Snapshot consolidation cannot be enabled for a group that is part of a group set, see “Automatic periodic bookmarking” on page 237.
When RecoverPoint snapshot consolidation is enabled, the Predicted Protection Window is not calculated. For snapshot consolidation, the minimum journal size is 30 GB, see “Journal size with snapshot consolidation” on page 34.

Setting: Do not consolidate any snapshots for at least
Default = 2 days
Minimum = 12 hours
The period during which snapshot data is not to be consolidated. The period’s start time is always today, and the period’s end time is n hours / days / weeks / months ago.
The following conditions apply:
• Must be a minimum of 12 hours.
• If no daily or weekly consolidations are specified, the remaining snapshots are consolidated monthly.

Setting: Consolidate snapshots that are older than x to one snapshot per day for y days
Default = 5 days
Snapshots are consolidated every ~24 hours. Select the Indefinitely checkbox to consolidate all subsequent snapshots in ~24 hour intervals.
The following conditions apply:
• If the Indefinitely checkbox is not selected, and no weekly consolidations are specified, the remaining snapshots are consolidated monthly.
• If the Indefinitely checkbox is selected, weekly and monthly consolidations are disabled, and the remaining snapshots are consolidated daily.

Setting: Consolidate snapshots that are older than x to one snapshot per week for y weeks
Default = 4 weeks
Snapshots are consolidated every ~168 hours. Select the Indefinitely checkbox to consolidate all subsequent snapshots in ~168 hour intervals.
The following conditions apply:
• If the Indefinitely checkbox is not selected, the remaining snapshots are consolidated monthly.
• If the Indefinitely checkbox is selected, monthly consolidations are disabled, and the remaining snapshots are consolidated weekly.

Table 19 Copy Journal Policy Settings

Setting: Maximum Journal Lag
Default = unlimited
Defines the data access aspects of RTO for each copy.
Note: RTO includes the time for trying to fix the problem without a recovery, the recovery itself (RecoverPoint's role in an organization's RTO, which is controlled by this setting), tests, and the communication to the users. Decision time for users is not included.
When data is received by the RPA faster than it can be distributed to storage volumes, it accumulates in the journal. The Maximum Journal Lag is the maximum amount of snapshot data (in bytes, KB, MB, or GB) that is permissible to hold in the replica journal before distribution to the replica storage. In other words, it is the amount of data that would have to be distributed to the replica storage before failover to the latest image could take place. In terms of RTO, this is the maximum time that would be required in order to bring the replica up-to-date with production.
When the Maximum Journal Lag value is reached, the system switches to fast-forward (three-phase) distribution mode, and no longer retains rollback information. As soon as the lag is within the allowed limits, rollback data is retained again.
See “Three-phase distribution” on page 97 for more information.

Setting: Proportion of journal allocated for image access log (%)
Default = 20
A host may access its journal for testing. To test, it may be necessary for the host to write to the journal. These writes are written to the image access log and rolled back as soon as testing is completed. Proportion of journal allocated for image access log determines how much may be written to the journal.

Setting: Journal size limit (GB)
Default = 1200
For RecoverPoint to function correctly, the journal size limit must be set for each journal. If you need to increase any journal size beyond this limit, modify the size limit accordingly. Maximum size of any volume of the journal = 2 TB; maximum total journal size = 10 TB.

Table 20 Copy Advanced Policy Settings

Setting: Host OS
Default = Other/Mixed
Select the operating system of the host writing to the volumes in the consistency group. If the hosts are not all running the same operating system, select Other/Mixed.
If one or more of the volumes in the consistency group resides on an ESX server, set as follows:
• VMWare ESX Windows - when running a guest Windows OS with a fabric-based or CLARiiON splitter
• Windows - when running a guest Windows OS with a host-based splitter
• VMWare ESX - when running a guest OS other than Windows, with a fabric-based or CLARiiON splitter
Note: RecoverPoint does not support the replication of virtual machines running a guest OS other than Windows, with a host-based splitter.

Setting: Reservations Policy
Default = auto
The default value should be used except in the following case. Reservations Policy = SCSI-2 is required if all the following are true:
• Windows host-based splitter and Microsoft Cluster Servers earlier than Windows 2008 clusters
• DMX storage with microcode before 5772
Other values should not be used unless specifically instructed to do so by EMC Customer Service.
Setting: Failall variant
Default = auto
This value determines the type of error message returned to a host that is trying to access a volume for which image access is not enabled. Do not change this value unless specifically instructed to do so by EMC Customer Service.

Setting: Allow distribution of snapshots that are larger than capacity of journal volumes
Default = enabled
In order to fail over to a replica, there must be one complete and consistent image at the replica site. This image can consist of multiple snapshots, contained solely in the replica journal, solely in the replica storage, or a combination of both.
In certain situations (for example, after a lengthy communication outage on the WAN, or a first-time initialization) RecoverPoint may need to transfer a snapshot that is larger than the capacity of the journal.
When this policy is enabled, RecoverPoint starts writing the data of the snapshot (that is larger than the replica journal) to the replica storage while the additional data from the same snapshot is still being received by the replica journal. In this case, if a disaster were to strike at the production site before the complete image was transferred to the replica storage, it would not be possible to fail over to the replica.
If the ability to fail over to the pre-distribution image in the case of a disaster during the initialization process is a requirement, disable this policy.
When this policy is disabled, the system automatically pauses transfer when the last complete image is about to be removed from the replica, providing the opportunity to:
• increase the journal’s capacity (see “How to modify an existing journal” on page 170). When transfer is re-enabled, RecoverPoint will synchronize the writes that were made after transfer was paused.
• prepare a backup.
After the backup is prepared, re-enable this policy, secure in the knowledge that if a disaster should strike the production site during the initialization process (before the complete image was transferred to the replica storage), you could restore the last complete and consistent image from backup.

Modifying existing settings and policies
Before you begin, make sure you are well acquainted with the concepts of “RecoverPoint Management Application” on page 29, “Consistency groups” on page 30 and “Documentation relevance per RecoverPoint product” on page 14.
After a consistency group has been created in the RecoverPoint system, use the following procedures to modify its settings and policies. The following sections deal with the topics:
◆ “How to modify an existing consistency group”
◆ “How to modify an existing copy”
◆ “How to modify an existing replication set”
◆ “How to modify an existing journal”

How to modify an existing consistency group
The following sections deal with the topics:
◆ “How to modify the group replication policy”
◆ “How to add a new copy”
◆ “How to remove a copy”
◆ “How to add a new replication set”
◆ “How to remove a replication set”
◆ “How to enable a consistency group and start transfer”
◆ “How to disable a consistency group”

How to modify the group replication policy
To modify the policy settings of an existing consistency group, select the consistency group in the Navigation Pane, and click its Policy Tab in the Component Pane. All of the settings described in “Configuring consistency group policies” on page 143 can be modified through the consistency group’s Policy Tab.

How to add a new copy
To add a new copy to an existing consistency group:
1. Either:
• Right-click on the consistency group name in the Navigation Pane and select the Add copy option.
or
• Select a consistency group name in the Navigation Pane, and click the Add Copy button above the Component Pane.
2. Enter a name for the new copy in the Name field. The other policy settings are optional. The default values provide a practical configuration. It is recommended to accept the default settings unless there is a specific business need to set other policies. If you do want to specify values for the available settings, note that they are identical to the settings for the production copy. Configure these settings, as required, according to the instructions in “Configuring copy policies” on page 152.
3. Click the Finish button.

How to remove a copy
To remove a copy, either:
◆ Right-click on the copy name in the Navigation Pane and select the Remove Copy option.
or
◆ Select a copy in the Navigation Pane, and click the Remove Copy button above the Component Pane.
See also: “Copy commands” on page 193

How to add a new replication set
To add a new replication set to an existing consistency group:
1. Select a consistency group name in the Navigation Pane, and click the Add Replication Set button above the Component Pane. The Add Replication Set Wizard is displayed.
2. Follow the instructions in “How to add replication sets” on page 136.

How to remove a replication set
To remove a replication set from an existing consistency group:
1. Select the consistency group in the Navigation Pane.
2. Click the Replication Sets Tab in the Component Pane.
3. Select a replication set.
4. Click the Remove Replication Sets button above the Component Pane.

How to enable a consistency group and start transfer
To enable a disabled consistency group, and (optionally) start transfer:
1. Either:
• Right-click on the consistency group name in the Navigation Pane and select the Enable group option.
or
• Select Consistency Groups in the Navigation Pane.
In the Consistency Group Tab in the Component Pane, select one or more disabled consistency groups and click the Enable Group button above the Component Pane.
The Enabling group dialog box is displayed.
Note: Before you start transfer to any replica, make certain that the replica volumes are unmounted from any hosts and any volume groups are deported from the logical volume manager (AIX, HP-UX, Windows, and Solaris have volume managers built into the operating system; Veritas Volume Manager can be used with any of these operating systems).
2. Unless you wish to initialize the replica from a backup, or group configuration is not complete, click Yes. The replica will synchronize with the source (full sweep) and transfer will start.
If you wish to initialize the replica from a backup, clear the Start data transfer immediately checkbox and click Yes. Initialize the replica from the backup, then use the Clear Markers command (Table 25 on page 193). Unmount the replica volumes from any hosts, deport any volume groups, then activate transfer with the Start Transfer button.

By default, RecoverPoint writes the first snapshot directly to the replica, without first writing it to the journal. You can override the default (refer to Perform fast first-time initialization in Table 17 on page 149) to write the initialization snapshot first to the journal. This option is more time-consuming but provides greater data protection.
Once first-time initialization is completed, each consistency group will be in one of the following replication modes:
• Replicating: The consistency group is enabled, the splitter is replicating to the RPAs, and the RPAs are transferring to the replica journal or journals. If Image access is disabled (default state), the snapshots from the journal are also distributed to the replica storage.
• Marking: The consistency group is enabled, the splitter is replicating to the RPAs, but the RPAs are unable to transfer to the replica journal. The location of the changes is stored in RPA1, RPA2, as well as on the production journal volume. When contact with the remote site is restored, the remote replica is synchronized, but only at those locations that were marked as having changed. Then transfer and replication can resume.
The following can cause the RPA to go to marking mode:
– WAN unavailable
– RPAs at remote site not available (for instance, loss of power)
– Transfer disabled manually
– High load (temporary bottleneck in replication environment)
• No marking/no replication: The splitter does not write to the RPAs. This can be caused by a manually disabled consistency group or by a disaster at the production site (no RPAs available).

How to disable a consistency group
To disable an enabled consistency group, either:
◆ Right-click on the consistency group name in the Navigation Pane and select the Disable group option.
or
◆ Select Consistency Groups in the Navigation Pane. In the Consistency Group Tab in the Component Pane, select one or more enabled consistency groups and click the Disable Group button above the Component Pane.
Note: Disabling a group stops all replication, deletes journals, and causes a full sweep on all copies in the group when the group is re-enabled.
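
The three replication modes listed under “How to enable a consistency group and start transfer” can be read as a small decision rule. The following Python sketch is illustrative only and is not part of RecoverPoint: the function name, the enum, and the three boolean inputs are assumptions introduced here to mirror the conditions stated in the text.

```python
from enum import Enum


class ReplicationMode(Enum):
    REPLICATING = "Replicating"
    MARKING = "Marking"
    NO_MARKING_NO_REPLICATION = "No marking/no replication"


def replication_mode(group_enabled: bool,
                     production_rpas_available: bool,
                     can_transfer_to_replica_journal: bool) -> ReplicationMode:
    """Classify a consistency group into one of the three documented modes.

    All parameter names are hypothetical; they only restate the conditions
    given in the guide.
    """
    # The splitter writes to the RPAs only when the group is enabled and
    # RPAs are available at the production site.
    if not group_enabled or not production_rpas_available:
        return ReplicationMode.NO_MARKING_NO_REPLICATION
    # The splitter reaches the RPAs, but the RPAs cannot transfer to the
    # replica journal (WAN unavailable, remote RPAs unavailable, transfer
    # disabled manually, or high load): changes are only marked.
    if not can_transfer_to_replica_journal:
        return ReplicationMode.MARKING
    return ReplicationMode.REPLICATING


if __name__ == "__main__":
    # Example: group enabled, production RPAs up, WAN down.
    print(replication_mode(True, True, False).value)
```

The sketch deliberately collapses all four marking causes into a single input, because the guide treats them identically: in each case only the locations of changed data are recorded until transfer can resume.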

How to modify an existing copy
The following procedures will guide you through the process of modifying an existing copy:
◆ “How to modify the copy replication policy”
◆ “How to remove a copy”
◆ “How to disable a copy”
◆ “How to enable a copy”
◆ “How to modify the journal of a copy”

How to modify the copy replication policy
To modify the policy settings of an existing copy, select the copy in the Navigation Pane, and click its Policy Tab in the Component Pane. All of the settings described in “Configuring copy policies” on page 152 can be modified through the copy’s Policy Tab.

How to remove a copy
To remove a copy, either:
◆ Right-click on the copy name in the Navigation Pane and select the Remove Copy option.
or
◆ Select a copy in the Navigation Pane, and click the Remove Copy button above the Component Pane.
See also: “Copy commands” on page 193

How to disable a copy
To disable an enabled copy, either:
◆ Right-click on the copy name in the Navigation Pane and select the Disable Copy option.
or
◆ Select a copy in the Navigation Pane, and click the Disable Copy button above the Component Pane.
Note: The Production copy cannot be disabled.
Caution: Disabling a copy stops all replication, deletes journals, and causes a full sweep when the copy is re-enabled.
See also: “Copy commands” on page 193

How to enable a copy
To enable a disabled copy, either:
◆ Right-click on the copy name in the Navigation Pane and select the Enable Copy option.
or
◆ Select a copy in the Navigation Pane, and click the Enable Copy button above the Component Pane.
Note: Enabling a copy causes a full sweep synchronization of the volumes at the copy.
Before you start transfer to any replica, make certain that the replica volumes are unmounted from any hosts and any volume groups are deported from the logical volume manager (AIX, HP-UX, Windows, and Solaris have volume managers built into the operating system; Veritas Volume Manager can be used with any of these operating systems).
See also: “Copy commands” on page 193

How to modify the journal of a copy
To change the storage capacity of a journal at a copy by adding or removing journal volumes:
Note: If you are using this procedure because the capacity of your group's copy journals is less than the minimum journal size required for distributed groups (see the EMC RecoverPoint and RecoverPoint/SE Release Notes for this limit), after finishing the procedure, disable and then re-enable the consistency group (causing a full sweep). See “Distributed consistency groups” on page 58 and “Full sweeps” on page 80 for more information.
1. Select the group that the journal belongs to in the Navigation Pane and click the Add/Edit Journal Volumes button above the Component Pane (see “Copy commands” on page 193). The Journal Volumes Wizard is displayed.
2. For each copy:
a. Click the Rescan button to update the list of available volumes.
b. Select the volumes that you want to add to the journal at the copy site. Multiple volumes can be selected. For best performance, select volumes that are identical in size. If identically sized volumes are not available, select volumes that are similar in size. Use the Filter volumes by: fields to display the relevant volumes.
Note: In RecoverPoint only: Only the masked volumes at the specified site are displayed in the available volumes list. Therefore, in the Volume Details area, ensure the selected volumes are seen by all RPAs. If they are not, mask the unseen LUNs to RecoverPoint WWNs, click the Rescan button to update the list of available volumes, and redo this step.
c.
Click the Next > button. When all of the copies in the consistency group are assigned journals, the Journal Volumes Summary Page is displayed.
3. Apply your settings by clicking the Finish button.

How to modify an existing replication set
In the RecoverPoint system, data consistency and write-order fidelity are maintained across all volumes by replication sets (see “Replication sets” on page 33), and the maximum possible volume capacity is defined by the physical size of the smallest volume of the replication set. Therefore, when you wish to add storage capacity to a particular volume, you must resize all of the volumes in the replication set.
When changing the storage capacity of volumes in RecoverPoint, the following considerations apply:
◆ When a LUN’s storage capacity changes on the host storage system, the volume representing the LUN in RecoverPoint must be removed from, and then re-added to, the RecoverPoint system, for the change in LUN size to be reflected in RecoverPoint.
◆ When storage sub-systems do not support the resizing of LUNs on storage without losing the data contained on them, contact EMC Customer Service for instructions on how to resize volumes.
The following procedures will guide you through the process of modifying existing replication sets:
◆ “How to rename or remove a replication set”
◆ “How to enlarge the storage capacity of replica volumes”

How to rename or remove a replication set
To rename or remove a replication set:
1. Select a consistency group in the Navigation Pane.
2. Select the consistency group’s Replication Sets Tab in the Component Pane.
3. Double-click the replication set whose settings you want to modify. The Volumes Configuration dialog box is displayed.
In the Volumes Configuration dialog box, you can change the replication set name or remove the whole replication set and add another in its place.
You can also select one of the copies in the Navigation area to remove a volume at the copy and replace it with another.
Note: For CLARiiON splitter environments, select replica volumes that are exactly the same size as the Production volume.

How to enlarge the storage capacity of replica volumes
Read “How to modify an existing replication set” on page 166 before starting this procedure.
To enlarge the storage capacity of replica volumes:
1. In RecoverPoint: Disable the group whose replication set you wish to resize, to avoid a configuration settings conflict. To do so:
a. Right-click on the group name in the Navigation Pane, and select the Disable group option.
Note: The replica journal history is erased, and the replica can no longer be rolled back to a previous point-in-time.
b. Select the group containing the replication set that you wish to resize in the Navigation Pane, select the Replication Sets Tab, and double-click on the replication set. The Volumes Configuration dialog box is displayed.
c. In the Volumes Configuration dialog box, click the Remove Replication Set button.
d. Click the Apply button, but do not exit the Volumes Configuration dialog box. The replication set and its volumes are removed from the RecoverPoint system.
2. At the replica’s SAN: Dedicate more storage resources to the required LUNs, enlarging their storage capacity.
3. In RecoverPoint: While still in the Volumes Configuration dialog box, configure the new replication set. To do so:
a. Click the Add New Replication Set button.
b. For each copy:
   a. In the Replication Sets node of the Navigation Pane, select a copy, and click the Add Volume button. The Select Volume dialog box is displayed.
   b. In the Select Volume dialog box, click the Rescan button. The RecoverPoint SAN discovery utility automatically detects the change in the physical size of the volume.
Now add the new replication set, and its new volumes, to the RecoverPoint system.
Note: For CLARiiON splitter environments, select replica volumes that are exactly the same size as the Production volume.
   c. Double-click on the resized LUN. Click the OK button to apply the changes and exit the Select Volume dialog box. The new replication set is created, and the resized LUNs are added (as volumes) to the replication set. The size of the replication set is defined by the physical size of the smallest volume of the replication set.
c. Right-click on the disabled group’s name in the main Navigation Pane, and select the Enable group option. Make sure that the Start data transfer immediately checkbox is checked, and click the OK button.
Note: To ensure consistency between replica volumes and their production sources, a full sweep synchronization is triggered on all volumes in the consistency group.
For Windows only: Perform the following steps only if the volume is on a Windows host.
4. At the production host: Extend the production volumes in Windows. To do so, follow the instructions in “How to use Diskpart.exe to extend a data volume in Windows Server 2003, in Windows XP, and in Windows 2000” at: http://support.microsoft.com/kb/325590/en-us
5. In RecoverPoint: Bookmark the point-in-time that you wish to apply to the replica, and give it a name that signifies the end of this process. To do so:
a. In the Navigation Pane, right-click on the consistency group containing the new, larger volumes and select Bookmark Image. The Create a bookmark image for group dialog box is displayed. In the Create a bookmark image for group dialog box, type a bookmark name into the edit box, and click the OK button. This will become the first (pre-replication) image of the new volumes.
b.
From the Image Access menu, select Enable Image Access and select the bookmark that you created in the last step. After selecting the required image, select Logged Access, and click the Next button. In the summary screen, click the Finish button.
6. At the replica host: Ensure there are no writes to the replica’s storage, then clear the OS cache, invalidating the Windows partition table cache of the old replica volumes. To do so:
a. First-time initialization can cause changes in the partition table of the replica volume (or volumes). In a Windows environment, you must clear the OS cache before changing the partition table of a replica volume. To clear the OS cache, shut down all host applications, disable the LUN (or LUNs) on which the volume resides, and re-enable it. You can do so either using the relevant commands in the RecoverPoint kutils utility (see “Kutils Reference” on page 311), or from the “Disk drives” interface of the Windows Device Manager.
b. Shut down all host applications, and unmount all volumes.
7. In RecoverPoint: Disable access to the replica image, apply all writes from the image access log to the replica storage, and start transfer. To do so: From the Image Access menu, select Disable Image Access. Click the OK button.

How to modify an existing journal
There are two ways to change the storage capacity of an existing journal:
◆ Adding additional journal volumes to a journal in RecoverPoint; this process does not trigger a full sweep synchronization, nor does it erase all history in the existing journal. After this procedure, the replica can still be rolled back to a previous point-in-time.
Note: Setting a regular consistency group as a distributed consistency group, when the capacity of the group's copy journals is less than the minimum journal size required for distributed groups (see the EMC RecoverPoint and RecoverPoint/SE Release Notes for this limit), requires adding journal volumes and then disabling and re-enabling the consistency group, which does cause a full sweep. See “Distributed consistency groups” on page 58 and “Full sweeps” on page 80 for more information.
See “How to modify the journal of a copy” on page 165 for detailed instructions on how to perform this procedure.
◆ Resizing an existing journal volume’s LUN on storage; this process does trigger a full sweep synchronization, and it does erase all history in the existing journal. After this procedure, the replica cannot be rolled back to a previous point-in-time. See “How to resize an existing journal volume” on page 171 for detailed instructions on how to perform this procedure.

How to resize an existing journal volume
Read “How to modify an existing journal” on page 170 before starting this procedure.
To resize journals by resizing an existing journal volume:
1. In RecoverPoint:
a. Enable Image Access in Logged Access mode. Choose the ‘Select the latest image’ option (see “Accessing a replica” on page 242).
Note: This ensures that all writes up to the current point-in-time are transferred from the replica journal to the replica storage, and that only the host writes from this point forward will need to be synchronized at the end of this procedure.
b. Right-click on the copy name in the Navigation Pane, and select the Disable copy option.
Note: The replica journal history is erased, and the replica can no longer be rolled back to a previous point-in-time.
c.
Select the group that the journal volume belongs to in the Navigation Pane, select the Replication Sets Tab, and double-click on one of the replication sets in the replication set list. The Volumes Configuration dialog box is displayed.
d. In the Volumes Configuration dialog box, select the required journal volume’s VOL ID in the Navigation Pane, and click the Remove Volume button.
e. Click the Apply button, but do not exit the Volumes Configuration dialog box. The journal volume is removed from the RecoverPoint system.
2. At the replica’s SAN: Dedicate more storage resources to the required LUN.
3. In RecoverPoint: While still in the Volumes Configuration dialog box:
a. In the Journals node of the Navigation Pane, select the copy containing the journal volume, and click the Add New Journal Volume button. The Select Volume dialog box is displayed.
b. In the Select Volume dialog box, click the Rescan button. The RecoverPoint SAN discovery utility automatically detects the change in the physical size of the volume.
c. Double-click on the resized LUN. Click the OK button to apply the changes and exit the Select Volume dialog box. The volume is added to the RecoverPoint system.
Then, in the Main Interface Navigation Pane:
d. Right-click on the disabled copy’s name in the main Navigation Pane, and select the Enable copy option. Make sure that the Start data transfer immediately checkbox is checked, and click the OK button.
Note: Data transfer is briefly paused for the group, and a short full sweep synchronization may occur, but only the writes that occurred after image access was enabled to the replica will be synchronized.

Manually attaching volumes to splitters
Before you begin, make sure you are well acquainted with the concepts in “Splitters” on page 27 and “Documentation relevance per RecoverPoint product” on page 14.
Volumes need only be manually attached to splitters that are added to RecoverPoint after the addition of the volumes to a replication set. When volumes are added to a replication set, they are automatically attached to all splitters that see them in the SAN.

For boot-from-SAN groups: When a consistency group is configured to boot from SAN, special considerations and procedures are necessary. Contact EMC Customer Service for more information.

For Brocade splitters: If you added splitters based on a Connectrix AP-7600B or PB-48K-AP4-18 switch, make sure you follow the instructions in “Configure RecoverPoint for replication over the Connectrix device” in EMC RecoverPoint Deploying RecoverPoint with Connectrix AP-7600B and PB-48K-AP4-18 Technical Notes.

For CLARiiON splitters: Navisphere login credentials must be defined for each splitter added to the RecoverPoint system (see “Splitter credentials” on page 269).
If you attach a volume to an RPA cluster, and that volume is already attached to a different RPA cluster that shares the same CLARiiON splitter, the volume appears to attach successfully to the second RPA cluster, but then faults to the “Attached to other RPA cluster/s” error state. The volume cannot be used by the RPA cluster to which it was just attached. To correct this error, use the Detach command on the Splitter Properties dialog box to detach the volume from the RPA cluster.
Note: To avoid this problem, a volume should be masked to a single RPA cluster. A volume that is masked for one RPA cluster should not be masked for another RPA cluster.

For SANTap splitters: Switch login credentials must be defined for each splitter added to the RecoverPoint system (see “Splitter credentials” on page 269). If you added fabric splitters with SANTap service, create AVTs. Refer to the EMC RecoverPoint Deploying RecoverPoint with SANTap Technical Notes, “Creating AVTs” and “Attaching volumes” sections for details.
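
The CLARiiON note above (a volume should be masked to a single RPA cluster) can be checked ahead of time against your own masking inventory. The following Python sketch is one possible way to do so; it does not query RecoverPoint or the storage array. The function name, the cluster names, and the volume identifiers are all hypothetical, and the masking data is assumed to have been collected by other means.

```python
from collections import defaultdict


def find_multi_cluster_volumes(masking):
    """Return {volume_id: [cluster, ...]} for volumes masked to more than
    one RPA cluster.

    `masking` is a hypothetical inventory: it maps an RPA cluster name to
    the collection of volume IDs masked to that cluster.
    """
    clusters_by_volume = defaultdict(set)
    for cluster, volumes in masking.items():
        for volume in volumes:
            clusters_by_volume[volume].add(cluster)
    # Keep only the volumes that violate the single-cluster rule.
    return {
        volume: sorted(clusters)
        for volume, clusters in clusters_by_volume.items()
        if len(clusters) > 1
    }


if __name__ == "__main__":
    # Illustrative data only; names are made up for this example.
    inventory = {
        "cluster_A": {"LUN_10", "LUN_11"},
        "cluster_B": {"LUN_11", "LUN_12"},
    }
    for volume, clusters in find_multi_cluster_volumes(inventory).items():
        print(f"{volume} is masked to more than one RPA cluster: {clusters}")
```

Any volume reported by such a check would be a candidate for the “Attached to other RPA cluster/s” error state if it were attached through a shared CLARiiON splitter.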

To manually attach volumes to splitters:
1. Select Splitters in the Navigation Pane and double-click a splitter in the Component Pane. The Splitter Properties dialog box is displayed.
Note: If you have a SANTap or CLARiiON splitter, you can click the Credentials button in the Splitter Properties dialog box to enter login credentials at this time (see “Splitter credentials” on page 269).
2. Click the Rescan button.
3. Click the Attach button. The volumes that can be attached to the splitter are displayed. Select one or more volumes to attach to the splitter. Click OK to attach the selected volumes to the splitter.
CAUTION: When you attach a volume to a splitter, RecoverPoint ensures consistency between the replica and the production source by performing a full sweep synchronization of the volume. Checking the Attach as Clean checkbox overrides the default synchronization process by informing the system that the replica volume being attached to the splitter is known to be an exact image of its corresponding production volume. If Attach as Clean is checked, and the replica volume being attached is inconsistent with its corresponding production volume, it will remain inconsistent. To ensure consistency, it is best practice to use the default RecoverPoint synchronization process.
RecoverPoint automatically detects all paths from the splitter to the volume. If no path exists between the splitter and the volume, you cannot attach the volume to that splitter.
4. If there are volumes that have not yet been attached to a splitter, a warning is displayed in the status line of the main RecoverPoint Management Application interface window.
a. Click on the warning for more information. If there are volumes which are not attached to their splitter, a further warning is displayed.
b.
Click on this warning to display the list of splitters for which volumes are still available to be attached. To attach the volumes, in the Navigation Pane, select Splitters. In the Component Pane, double-click the splitter name. Click Attach to view the volumes available to be attached to that splitter.
5. After adding splitters and attaching volumes to them, enable the consistency group. To do so, follow the instructions in “How to modify an existing consistency group” on page 158.
You cannot replicate a volume until it is attached to a splitter. You cannot fail over to a local or a remote replica of a volume until it is attached to a splitter. For descriptions of the available splitter commands, see “Splitter management” on page 199.

4 Managing and Monitoring
This section describes how to manage and monitor replication in RecoverPoint. The topics in this section include:
◆ RecoverPoint Management Application, page 178
◆ Monitoring and analyzing system performance, page 213

RecoverPoint Management Application
Almost all of the information necessary for the routine monitoring and managing of the RecoverPoint system is displayed through the RecoverPoint Management Application. The information, commands, and settings used to monitor and manage RecoverPoint are displayed in numerous panes and tabs. How to use the panes to monitor and manage replication, disaster recovery, and related commands is described in the following sections.
Figure 5 RecoverPoint Management Application
Note: The RecoverPoint system will enter maintenance mode when undergoing one of the following operations: minor version upgrade, major version upgrade, replacing an RPA in an existing cluster, and adding new RPAs to existing clusters. When in this mode, RecoverPoint can only monitor the system; user-initiated capabilities are disabled. The title bar will indicate the name of the site that is in maintenance mode and the operation that is being performed. Once the system exits this mode, the RecoverPoint GUI will return to standard managing and monitoring function. For more information on these maintenance operations, refer to the EMC RecoverPoint Deployment Manager Product Guide.

The System Pane
The System Pane provides an overview of system health at a glance. The pane shows the status of major components of the RecoverPoint system environment, including the hosts, switches, storage devices, RPAs at two sites, and the WAN connection.
When the system detects a problem with one of these major components, the System Pane displays the following:
◆ An error icon on each component that is not functioning properly. For a warning, this icon is yellow.
◆ The first line of the warning or error message is displayed when you place your cursor over the image of the currently faulty component. Click the component’s image to display additional details about the specific error or warning.
◆ The status of a component, particularly until you have completed logging in to the system, may also be Unknown. In that case, a corresponding icon is displayed on the component.

Multipath monitoring
When multipath monitoring is active, the system analyzes network errors at the level of individual paths, and generates a warning in the System Pane (“The System Pane” on page 179) whenever there is not full redundancy between the RPA and splitters or volumes. Full redundancy is defined as follows:
◆ For RPA-to-volume links, there must be at least two distinct paths between each RPA and volume; that is, each RPA has access to at least two storage WWNs (and controllers, where relevant) via non-overlapping paths.
◆ For RPA-to-splitter links, there must be at least two distinct paths between each RPA and each splitter using different RPA ports and host (or switch) WWNs.
When RPA Multipath Monitoring is enabled, the system issues a warning upon logging in regarding any existing links without full redundancy. In addition, warning events are written to the log. By default, multipath monitoring is active for all replicas, for links both to storage and to splitters.
To enable or disable RPA Multipath Monitoring:
1. At the RecoverPoint Management Application, from the System menu, select System Settings > Miscellaneous Settings.
2. Select or clear the checkboxes to select the desired options. Then click Apply.
Note: In the RecoverPoint Management Application, warnings are displayed in the System Pane. Path information is displayed in the Volume Configuration and Splitter Properties dialog boxes.

The Traffic Pane
The Traffic Pane displays the amount of SAN and WAN traffic passing through the RPAs.

The Navigation Pane
The Navigation Pane allows you to navigate to the different tabs available in the Component Pane. In the Navigation Pane, click the component on which you wish to focus.
The corresponding component’s information is displayed in the Component Pane, in one or more dedicated tabs (see “The Component Pane” on page 185). The Navigation Pane does not display monitoring information, but it does provide access to a very large number of management commands by right-clicking on system component names in the navigation tree.

Multiple group management
The Navigation Pane is context sensitive; therefore, most of the commands used for the management of all groups are displayed when you right-click on the Consistency Groups node.
Note: Additional management commands can be found in the button area above the Component Pane, when Consistency Groups is selected, see “Multiple consistency group commands” on page 186.

Specific group management
Most of the commands used for the management of a specific consistency group are displayed when you right-click on a specific consistency group name.
Note: Additional management commands can be found in the button area above the Component Pane, when a specific consistency group is selected, see “Specific consistency group commands” on page 188.

Copy management
Most of the commands used for the management of replica copies are displayed when you right-click on a specific copy name.
Note: Additional management commands can be found in the button area above the Component Pane, when a copy is selected, see “Copy commands” on page 192.

Splitter management
Most of the commands used for the management of splitters are displayed when you right-click on Splitters in the Navigation Pane.
Note: Additional management commands can be found in the button area above the Component Pane, when the Splitter Tab is selected, see “Splitter commands” on page 199.
RPA management

The commands used for the management of RPAs are displayed above the Component Pane when you click RPAs in the Navigation Pane; see “RPA commands” on page 202.

Volume management

The commands used for the management of volumes are displayed above the Component Pane when you click Volumes in the Navigation Pane; see “Volume commands” on page 203.

vCenter Server management

Most of the commands related to vCenter Servers are displayed when you right-click vCenter Servers in the Navigation Pane.

Note: Additional management commands can be found in the button area above the Component Pane when the vCenter Servers Tab is selected; see “vCenter Server commands” on page 205.

Log management

Most of the commands used for the management of event logs are displayed when you right-click Logs in the Navigation Pane.

Note: Additional management commands can be found in the button area above the Component Pane when the Logs Tab is selected; see “Logging commands” on page 210.

The Component Pane

All RecoverPoint commands can be accessed through the button area above the Component Pane. The Component Pane is context-sensitive: the specific tabs that are displayed in the Component Pane depend on the entity selected in the Navigation Pane.

General commands

The following commands are accessed through the context-sensitive menu that is displayed when you right-click any component in the Navigation Pane.

Table 21 General commands

Name: Refresh
Displayed: Always
Description: Refreshes the display. Refresh is useful when changes have been made via the command-line interface. If the display is not updated automatically, Refresh forces an update.

Name: Expand All
Displayed: Always
Description: Expands and lists names of all replicas under the consistency group name.

Name: Collapse All
Displayed: Always
Description: Collapses names of all replicas under the consistency group name.
Multiple consistency group management

The following tools are available for the management of multiple consistency groups.

Multiple consistency group commands

Most of the commands listed here are accessible through the context-sensitive menu that is displayed when you right-click the Consistency Groups node in the Navigation Pane.

Note: Additional commands are available through buttons displayed at the top right corner of the Component Pane when the Consistency Groups node is selected and the criteria for display are met.

Table 22 Multiple consistency group commands

Name: Add Group
Displayed: Always
Description: Displays the New Consistency Group Wizard, which guides you through the process of creating a new consistency group; see “How to configure a new consistency group” on page 133.

Name: Group Sets
Displayed: Always
Description: Allows you to create, edit, and delete group sets; see “Automatic periodic bookmarking” on page 237.

Name: Remove Group
Displayed: Only available from the button area at the top of the Component Pane, and only if one or more specific consistency groups are selected in the Consistency Groups Tab of the Component Pane.
Description: Deletes the selected consistency group.
Caution: Cannot be undone.

Name: Disable / Enable Group
Displayed: Only available from the button area at the top of the Component Pane, if one or more consistency groups are selected in the Consistency Groups Tab of the Component Pane. Displayed as:
• Disable Group if selected groups are enabled.
• Enable Group if selected groups are disabled.
Description: Disables or enables the selected consistency groups.
Caution: Disabling a consistency group stops all replication, deletes journals, and causes a full sweep if the group is enabled again.
Name: Apply Parallel Bookmark
Displayed: Only available from the button area at the top of the Component Pane, and only if one or more specific consistency groups are selected in the Consistency Groups Tab of the Component Pane.
Note: All selected groups must be enabled and transfer must be active.
Description: Applies a bookmark with the same name to all selected groups. The bookmarked snapshots will be as close together in time as possible; see “Applying bookmarks to multiple groups simultaneously” on page 236.

The Consistency Groups Tab

When the Consistency Groups node is selected in the Navigation Pane, the Consistency Groups Tab is displayed in the Component Pane.

Figure 6 Consistency Groups Tab

The Consistency Groups Tab displays the status of every consistency group. Use this screen to monitor all consistency groups simultaneously. When RecoverPoint is configured to transfer data synchronously, the word (Synchronized) is displayed next to the transfer state in the Transfer column of this table; see Table 14, “Consistency Group Protection Policy Settings.”

Specific consistency group management

The following tools are available for the management of specific consistency groups.

Specific consistency group commands

Most of the commands listed here are accessible through the context-sensitive menu that is displayed when you right-click a specific consistency group name in the Navigation Pane.

Note: Additional commands are available through buttons displayed at the top right corner of the Component Pane when a specific consistency group is selected and the criteria for display are met.

Table 23 Specific consistency group commands

Name: Remove Group
Displayed: Only if a consistency group is selected.
Description: Deletes the consistency group.
Caution: Cannot be undone.

Name: Disable / Enable Group
Displayed: Only if a consistency group is selected. Displayed as:
• Disable if group is enabled.
• Enable if group is disabled.
Description: Disables or enables the selected consistency group.
Caution: Disabling a consistency group stops all replication, deletes journals, and causes a full sweep if the group is enabled again.

Name: Pause / Start Transfer
Displayed: Displayed as:
• Pause if RPA is transferring writes.
• Start if RPA transfer is paused.
Description: Applies to all replicas of the consistency group. Pauses or starts the transfer of writes from the host to the replica. The journal continues to distribute snapshots.
Usage: Best practice is not to use this command. Use consistency group policies to control the use of bandwidth. Pause Transfer may be used when WAN bandwidth is very limited and you wish to give the largest possible bandwidth to another consistency group. In that case, you may temporarily pause transfer for lower-priority consistency groups. When Pause Transfer is activated, the Pause Transfer button becomes the Start Transfer button. As soon as possible, use the Start Transfer button to resume normal transfer.

Name: Bookmark Image
Displayed: Only if group is enabled.
Description: Displays the Create a bookmark image for <Group> dialog box, enabling you to add a named bookmark to the current snapshot. When the snapshot is closed, it will be listed with the bookmark name that you give it.

Name: Clear Markers
Displayed: Only when right-clicking a group name in the Navigation Pane, when transfer to at least one copy in the group is paused.
Description: Select the replica or replicas for which to clear markers. For the selected consistency groups, Clear Markers clears all markers of the selected copy from the production journal volume; that is, it treats the selected replica as identical to the production source.
Caution: To clear markers, the production source and the selected replica volume must be absolutely identical. If they are not, the inconsistencies will remain. The Clear Markers command should be used only with extreme caution. It is useful when a production source and a replica have been synchronized manually by initializing from backup, and adequate bandwidth is not available to synchronize using the storage network. Otherwise, the best practice is not to use this command.

Name: Set Markers
Displayed: Only when right-clicking a group name in the Navigation Pane, when transfer to at least one copy in the group is paused.
Description: Select the replica or replicas, and the replication sets, for which to set markers. Set Markers causes the selected replica or replicas to be resynchronized by a full sweep. When replicas are inconsistent, the system synchronizes them automatically, even if the Set Markers command is not invoked. The Set Markers command is useful if you have accidentally cleared markers, or if you attached a replica as clean and then realize that it may not be clean. Otherwise, best practice is not to use this command.

Name: Add Replication Set
Displayed: Only in the button area above the Component Pane.
Description: Displays the New Replication Set Wizard, which guides you through the process of adding a new replication set to the consistency group; see “How to add a new replication set” on page 160.

Name: Remove Replication Set
Displayed: Only in the button area above the Component Pane, if the Replication Sets Tab and one of the replication sets in the table are selected.
Description: Removes the selected replication set, and all of its volumes, from the RecoverPoint configuration.

Name: Add/Edit Journal Volumes
Displayed: Only in the button area above the Component Pane.
Description: Displays the Journal Wizard, which guides you through the process of adding or removing volumes from the journal of a copy; see “How to modify the journal of a copy” on page 165.
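
The difference between the Clear Markers and Set Markers commands in Table 23 can be illustrated with a small conceptual sketch. It assumes that markers can be modelled as a set of block indices that the system still considers different between the production source and a replica. The class MarkerSet, its methods, and the block granularity are hypothetical names chosen for illustration only; they do not describe RecoverPoint's internal implementation or any RecoverPoint API.

```python
class MarkerSet:
    """Toy model (an assumption): blocks believed to differ between source and replica."""

    def __init__(self, total_blocks):
        self.total_blocks = total_blocks
        self.marked = set()

    def record_write(self, block):
        # A write that has not yet reached the replica leaves a marker.
        self.marked.add(block)

    def clear_markers(self):
        # Analogue of Clear Markers: the replica is declared identical.
        # Nothing is copied, so any real difference would go unnoticed.
        self.marked.clear()

    def set_markers(self):
        # Analogue of Set Markers: every block is marked, so the whole
        # volume must be resynchronized (a full sweep).
        self.marked = set(range(self.total_blocks))

    def blocks_to_sync(self):
        # Blocks that would still have to be synchronized, in order.
        return sorted(self.marked)


markers = MarkerSet(total_blocks=8)
markers.record_write(2)
markers.record_write(5)
pending = markers.blocks_to_sync()       # only the written blocks are pending

markers.clear_markers()
after_clear = markers.blocks_to_sync()   # nothing is pending any more

markers.set_markers()
after_set = markers.blocks_to_sync()     # every block is pending: full sweep
```

In this model, clearing markers while blocks 2 and 5 really do differ leaves the replica silently inconsistent, which is why the caution above requires the two volumes to be identical beforehand; setting markers errs in the safe direction at the cost of a full sweep.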
The Status Tab

To display a graphical representation of the consistency group transfer and failover status, select the consistency group in the Navigation Pane. Then, in the Component Pane, click the Status Tab. The following information is displayed:

Table 24 Status Tab

Num: 1
Label: <Group>: Running on <RPA2>
Displays: Group name and Primary RPA setting; see “The Policy Tab” on page 192.

Num: 2
Label: <Local Copy> at <New York>
Displays: Copy name and Site Name; see “The Policy Tab” on page 199.

Num: 3
Label: Replication modes and states
Displays: A visual representation of the current replication modes and states.
• A dashed green line means that the system is replicating asynchronously.
• A dashed green line on top of a solid line means that the system is replicating synchronously.
• A greyed-out line means that replication has stopped.
Note: The lines move in the direction of replication.

Num: 4
Label: Transfer :
Displays: Data transfer state: initializing (with percent completion), paused (by user or by system), or active. When replicating synchronously, the word synchronized is displayed in active transfer mode.

Num: 5
Label: Role :
Displays: The current role of the copy with regard to failover and regulation.
• Before failover: Production source and local/remote replica
• After failover: Replica at production and local/remote source
• During regulation: Regulated

Label: Storage :
Displays: The state of image access; see “Image access” on page 73.

Label: Image :
Displays: The image currently being distributed to storage; see “Image access” on page 73.

The Statistics Tab

To display traffic and replication performance statistics, select the consistency group in the Navigation Pane. Then, in the Component Pane, click the Statistics Tab.
The following traffic statistics are displayed in the Traffic sub-tab:

◆ Total traffic
◆ Application traffic
◆ Initialization traffic
◆ Incoming writes

If the consistency group contains a remote copy, the following replication performance statistics are displayed in a separate Replication Performance sub-tab:

◆ Bandwidth reduction
◆ Time lag
◆ Writes lag
◆ Data lag

The Replication Sets Tab

To understand the concept of replication sets, read “Replication sets” on page 33. To understand the concept of volumes in RecoverPoint, read “Volumes” on page 36.

To display the replication sets of a consistency group and their volumes, select the consistency group in the Navigation Pane and click the Replication Sets Tab in the Component Pane. The volumes of each replication set are displayed. The total replication set size (the size of the smallest volume in the replication set) and the volume ID of each volume of the replication set are also displayed.

Double-clicking a replication set opens the Volumes Configuration dialog box, described in “How to enlarge the storage capacity of replica volumes” on page 167 and “How to resize an existing journal volume” on page 171.

The Journals Tab

To display the configuration information of existing journals in a consistency group, select a consistency group in the Navigation Pane. Then, in the Component Pane, click the Journals Tab. The following information is displayed:

◆ The number of volumes that make up the journal at each copy
◆ The total size of the journal at each copy

Note: To edit a journal, click the Add/Edit Journal Volumes button at the top of the Component Pane.

The Policy Tab

To display the consistency group policies and settings, select a consistency group in the Navigation Pane. Then, in the Component Pane, click the Policy Tab. The policies and settings for the consistency group are displayed.
Policies and Settings are described in “Creating new consistency groups” on page 132.

Copy management

The following tools are available for the management of copies.

Copy commands

Most of the commands listed here are accessible through the context-sensitive menu that is displayed when you right-click a specific copy name in the Navigation Pane.

Note: Additional commands are available through buttons displayed at the top right corner of the Component Pane when a specific consistency group is selected and the criteria for display are met.

Table 25 Copy commands

Name: Add Copy
Displayed: Only if a specific group is selected in the Navigation Pane, and the group contains zero or one copies.
Description: Displays the New Copy Wizard, which guides you through the process of adding a new copy to the consistency group; see “How to add a new copy” on page 159.

Name: Remove Copy
Displayed: Always
Description: Deletes the copy.
Caution: Cannot be undone.

Name: Disable / Enable Copy
Displayed: Only if one or more copies are selected. Displayed as:
• Disable if copy is enabled.
• Enable if copy is disabled.
Description: Disables or enables the selected copy.
Caution: Disabling a copy stops all replication, deletes journals, and causes a full sweep when the copy is re-enabled.

Name: Enable Image Access / Access Another Image
Displayed: Displayed as:
• Enable Image Access if image access is disabled (during replication).
• Access Another Image if image access is already enabled.
Description: Enables hosts to access an image of a replica. After image access is enabled, use Access Another Image to select another snapshot to recover. See “Accessing a replica” on page 242.

Name: Disable Image Access
Displayed: Only if image access was enabled to the selected copy.
Description: Stops hosts from being able to access images of a replica, and enables RecoverPoint to resume replication.
Unmount the volume from the host at this replica before disabling image access. The effect of disabling image access depends on the access mode:
• If you were in Logged access mode, any writes made directly to the LUN while image access was enabled will be discarded. Distribution from the journal to the storage will continue from the accessed image forward.
• If you were in Virtual access mode, the virtual LUN and any writes to it will be discarded. Distribution will continue from the last snapshot that was distributed before the image access.
• If you were in Virtual access with Roll image in background, the virtual LUN, any changes to it, and any writes made directly to storage will be discarded. Distribution will continue from whatever snapshot the system has rolled to.
See “Accessing a replica” on page 242.

Name: Undo Writes
Displayed: Only if image access was enabled to the selected copy.
Description: Undoes the writes recorded in the image access log without disabling image access.

Name: Roll to Image
Displayed: Only available if image access was enabled, and Virtual access without Roll image in background was selected.
Description: Rolls the stored replica to the selected image.

Name: Enable Direct Access
Displayed: Only if image access was enabled to the selected copy.
Description: Enables hosts to access an image of a replica without imposing a limit on the amount of data that you can write to storage. In addition, Direct Image Access gives better system performance when accessing the replica, because no rollback information is written to the image access log in parallel with the ongoing disk I/Os. Hence, this option may be preferred if you want to carry out processing that generates a high volume of write transactions at the replica. It can also be used for testing the replicated images of BFS groups. See “Direct Image Access” on page 246.

Name: Move to Previous Point in Time
Displayed: Only if image access was enabled to the selected copy.
Description: Rolls the stored image back one snapshot.

Name: Move to Next Point in Time
Displayed: Only if image access was enabled to the selected copy.
Description: Rolls the stored image forward one snapshot.

Name: Start / Pause Transfer
Displayed: Displayed as:
• Pause if RPA is transferring writes.
• Start if RPA transfer is paused.
Description: Pauses or starts the transfer of writes from the host to the replica. The journal continues to distribute snapshots.
Usage: Best practice is not to use this command. Use consistency group policies to control the use of bandwidth. Pause Transfer may be used when WAN bandwidth is very limited and you wish to give the largest possible bandwidth to another consistency group. In that case, you may temporarily pause transfer for lower-priority consistency groups. When Pause Transfer is activated, the Pause Transfer button becomes the Start Transfer button. As soon as possible, use the Start Transfer button to resume normal transfer.

Name: Failover
Displayed: Only if image access was enabled to the selected copy.
Caution: This command will erase the journal at this site.
Description: Uses the selected (local or remote) replica as the source. Transfer from production will stop. If the system has only a local or only a remote copy, but not both, this replica will automatically become the production source and the production source will become the local or remote replica. If the system has three copies (production, local, and remote), transfer to the third copy will not be resumed until production is restored as the source. In a three-copy configuration, to convert the current source to the production source, select Set local copy as production or Set remote copy as production.

Name: Recover Production
Displayed: Only if image access was enabled to the selected copy.
Description: Repairs the production source using the replica as the source.
Recover Production is only available if the replica’s journal is still intact; therefore, Recover Production is not available if you used Direct Image Access, or after distributing a snapshot that is larger than the capacity of the journal (refer to Table 19 on page 155). Transfer from the production source will be paused. Transfer to a third copy will not resume until production is restored as the source. Host access to the selected replica will be blocked. You will only be able to restore the production source from the selected replica. While being restored, the role of the production replica will be “Production (being restored).” When the restore is completed, enable image access at the production source, and select the failover option Resume production. The production journal is discarded.

Name: Set Local/Remote Copy as Production
Displayed: Only after image access was enabled, and a failover was performed on the selected copy, in a three-copy configuration.
Description: Sets the current replica as the production source. If the local replica is converted to the production source and there is a remote replica, the remote replica will require a full sweep. If the remote replica is converted to the production source and there is also a local replica, you must delete either the original production source or the local replica before the remote replica can become a production source. In other words, having two remote replicas is not supported.

Name: Resume Production
Displayed: Only after image access was enabled, and after either a failover or a recover production was performed on the selected copy.
Description: Restores the production copy as the data source.

Name: Clear Markers
Displayed: Only available by right-clicking a copy name in the Navigation Pane when transfer to the copy is paused.
Description: Select the replica or replicas for which to clear markers.
Clears all markers of the selected copy from the production journal volume; that is, it treats the selected replica as identical to the production source.
Caution: To clear markers, the production source and the selected replica volume must be absolutely identical. If they are not, the inconsistencies will remain. The Clear Markers command should be used only with extreme caution. It is useful when a production source and a replica have been synchronized manually by initializing from backup, and adequate bandwidth is not available to synchronize using the storage network. Otherwise, the best practice is not to use this command.

Name: Set Markers
Displayed: Only available by right-clicking a copy name in the Navigation Pane when transfer to the copy is paused.
Description: Select the replica or replicas, and the replication sets, for which to set markers. Causes the selected replica or replicas to be resynchronized by a full sweep. When replicas are inconsistent, the system synchronizes them automatically, even if the Set Markers command is not invoked. The Set Markers command is useful if you have accidentally cleared markers, or if you attached a replica as clean and then realize that it may not be clean. Otherwise, best practice is not to use this command.

The Journal Tab

To display journal information for a replica, select that replica under its consistency group in the Navigation Pane. In the Component Pane, click the Journal Tab.

The following tables describe the information displayed in the Journal Tab for a replica.

Table 26 Journal Tab: Image information

Info: Current
Description: Snapshot currently being distributed to the journal.

Info: Storage
Description: Storage access status. The current condition of a volume at the replica storage can be:
• No access, during normal replication.
• Direct Access, during direct image access.
• Logged Access, during logged access.
Table 27 Journal Tab: Journal information

Info: Journal Lag
Description: Amount of data (represented by snapshots) in the replica journal that has not yet been distributed to the replica storage. The maximum journal lag setting defines the current recovery point objective. In the event of a disaster, this is the maximum amount of data loss that may be incurred.

Info: Compression Level
Description: This feature is relevant for both synchronous and asynchronous replication.
Default = none
When enabled, instructs the target RPA of the consistency group to compress the snapshots in the journal so that more images (that is, a longer protection window) can be saved with the same journal capacity (saving storage cost). Enabling this option is encouraged if you have cost considerations, a low incoming write rate, and/or limited bandwidth. Take note of the following:
• Compression is not available in RecoverPoint/SE.
• Compression is not relevant for the production journal (since the production journal does not contain snapshots).
• Enabling journal compression while a consistency group is enabled results in the loss of all snapshots in the journal.
• Compression impacts the CPU resources of the target RPA of the consistency group, and can impact that RPA’s ability to sustain its write load. If the target RPA of the consistency group for which you want to enable this option is also transferring the data of other consistency groups across the WAN, note that enabling this setting will affect that RPA’s transfer rate. See the EMC RecoverPoint Release Notes for throughput limitations.

Info: Required Protection Window
Description: Default = disabled
Indicates how far in time the replica image can be rolled back.

Info: Current Protection Window
Description: Indicates how far the replica journal can be rolled back.
If the Required Protection Window is defined (Table 18 on page 153), the Current Protection Window will be in one of the following statuses (indicated in parentheses after the Current Protection Window):
• Sufficient: Image can be rolled back far enough to meet the Required Protection Window.
• Insufficient: Image cannot be rolled back far enough to meet the Required Protection Window.
• Extending: Replication has not been running long enough for the image to be rolled back as far as the Required Protection Window.
If the Required Protection Window is not defined, the status will be N/A.

Info: Predicted Protection Window
Description: System’s prediction of the eventual size of the protection window. Note that there is no guarantee on how long it will take to reach the predicted protection window, and no guarantee that it will ever be reached (conditions may change before it is reached). If the Required Protection Window is defined, the Predicted Protection Window is in one of the following statuses:
• Sufficient: Predicted Protection Window is large enough to meet the Required Protection Window.
• Insufficient: Predicted Protection Window is not large enough to meet the Required Protection Window.
If the Required Protection Window is not defined, or replication has not been running long enough to make predictions, no status will be indicated. It can take 24 hours or longer of journal entries before the system finishes calculating the predicted protection window.
Note: When RecoverPoint snapshot consolidation is enabled, the Predicted Protection Window value is displayed as N/A (see “Snapshot consolidation” on page 39).

Info: Space Saved by Consolidation
Description: Amount of space saved by snapshot consolidation. This value is updated only after a consolidation process completes.

Table 28 Journal Tab: Sample images information

Info: Time
Description: Closing time of the snapshot.

Info: Size
Description: Size of snapshot.

Info: Bookmark Details
Description: For bookmarks, displays the bookmark icon and bookmark name.
For consolidated snapshots, displays the consolidated snapshot icon and consolidation type (manual, daily, weekly, monthly). A tool tip indicates how much space was saved by the consolidation.

Info: Consolidation Policy
Description: Consolidation policy applied to this snapshot (Never Consolidate, Survive Daily, Survive Weekly, Survive Monthly). A blank value indicates the default policy of Always Consolidate.

If a snapshot consolidation job is in process, the following information is displayed:

Table 29 Journal Tab: Snapshot Consolidation Progress information

Info: Consolidation type
Description: Type of consolidation: Manual, Daily, Weekly, Monthly.

Info: Consolidation range
Description: Start and end times of snapshots being consolidated.

Info: Progress
Description: Completion percentage. A pending status indicates that the consolidation is waiting for additional snapshots in the journal to be distributed to the user volume. The consolidation will begin automatically once the snapshots have been distributed.

Info: Stop
Description: Cancels the consolidation process. The consolidation process stops after it completes processing the PIT it is currently working on. Stopping a consolidation process returns the journal to the same state it was in before the consolidation started.

The Policy Tab

To display policies and settings of a replica, in the Navigation Pane, under a consistency group, select a replica. In the Component Pane, click the Policy Tab. The general and protection settings, and advanced policies for the selected replica can be modified. Policies and Settings are described in “How to configure the production copy” on page 134 and “How to configure the replica copies” on page 135.

Splitter management

The following tools are available for the management of splitters.

Splitter commands

The commands listed here are available through buttons that are displayed at the top right corner of the Component Pane.
Table 30 Splitter commands

Name: Add Splitter
Displayed: Always
Description: Adds a splitter. For instructions, refer to “Adding splitters” on page 128. After adding the splitter, use the Splitter Properties dialog box to attach the volumes to the splitter.

Name: Remove Splitter
Displayed: Only if a specific splitter is selected.
Description: Removes the selected splitter from the system. Only splitters that are not attached to volumes can be removed.

Name: Show Splitter Properties
Displayed: Only if a specific splitter is selected.
Description: Used to attach volumes to splitters; see “Manually attaching volumes to splitters” on page 173.

Name: Rescan Splitters
Displayed: Always
Description: Rescans the SAN for available splitters before attaching a splitter; see “Manually attaching volumes to splitters” on page 173.

The Splitter Tab

To display splitter information, from the Navigation Pane, select Splitters. All splitters that have been added to RecoverPoint are displayed in the Splitter Tab of the Component Pane, with the following information:

◆ Splitter status
◆ Name of splitter
  Note: The multi-cluster icon next to a CLARiiON splitter indicates that it is attached to multiple RPA clusters. A tooltip indicates the number of RPA clusters attached to the splitter (maximum of 4).
◆ Site
◆ Splitter Type (host/fabric/storage). Host OS is shown for host splitters, switch type is shown for fabric splitters, and storage type is shown for storage-based splitters.
◆ RPA Link status

To manage a splitter, double-click its name. The Splitter Properties dialog box is displayed. From the dialog box, you can attach volumes to splitters and detach them.
The dialog box displays the following additional information:

◆ Name of consistency group
◆ Status of consistency group
◆ Splitter type
◆ Status of link to RPA
◆ Site
◆ Paths between RPA and splitter
  • RPA#, RPA port, WWN of HBA port on host
◆ Attached volumes
  • Boot volume indicator
  • Consistency group
  • Copy
  • Replication set
  • Path from splitter to storage
  • Storage channel
  • Storage target
  • LUN (logical unit number)
  • User access to storage

Note: If a volume is masked to more than one RPA cluster sharing the same CLARiiON splitter, it can be attached to more than one RPA cluster. The volume, however, can only be used by the first RPA cluster to which it is attached. It is in an error state for all other RPA clusters, indicated by the "Attached to other RPA cluster/s" state in the Volume Access field. To correct this error, use the Detach command on the Splitter Properties dialog box to detach the volume from the RPA cluster. To avoid this problem, a volume should be masked to a single RPA cluster. A volume that is masked for one RPA cluster should not be masked for another RPA cluster.

To manage splitters from the Splitter Properties dialog box, refer to instructions in “Modifying existing settings and policies” on page 158.

RPA management

The following tools are available for the management of RPAs.

RPA commands

To display detailed information about a particular RPA, including performance statistics, double-click the RPA name or the RPA Properties icon. The following information and statistics are displayed:

◆ Version - Indicates RecoverPoint software release running on the RPA.
◆ Hardware Platform - Indicates the RPA hardware platform.
◆ RPA status - Indicates if the RPA is connected.
◆ Repository Volume - Indicates if the RPA can access the Repository Volume.
◆ Storage Link - Indicates if the RPA can access storage.
◆ LAN Interface - Indicates if the physical LAN port is alive, the interface card is functional, and communication with other local RPAs exists.
◆ Communication with the remote site - Indicates if the RPA has access to the other site.
◆ Data Link - Indicates if data can be replicated to the corresponding RPA on the second site.
◆ WAN Interface - Indicates if the physical WAN port is alive, the interface card is functional, and communication with other local RPAs exists.

The following statistics are displayed:
◆ Total traffic
◆ Application traffic
◆ Initialization traffic
◆ Incoming writes
◆ Bandwidth reduction
◆ CPU usage

In addition, the Fabric Interface settings are displayed for each RPA HBA port. These settings are needed when replacing a faulty RPA.
◆ Port number
◆ Port WWN
◆ Node WWN

The RPAs Tab

To display RPA information, from the Navigation Pane, select RPAs. The following information is displayed in the RPAs Tab of the Component Pane:
◆ Status
  Indicates if the RecoverPoint appliance, LAN interface card, and WAN interface card are alive.
◆ Site (location of the RPA)
◆ RPA ID
◆ WAN IP address of the RPA
◆ Management IP address of the RPA
◆ Connectivity
  Indicates if communication to all RPAs in the cluster is alive and if the Storage Link, Repository Volume, Data Link, and communication with remote site are all alive; if one RPA is down, connectivity of every RPA in the cluster will report an error.

Volume management

The following tools are available for the management of volumes.

Volume commands

The commands in the following table are available through a button that is displayed at the top right corner of the Component Pane.

Table 31  Volume commands

Name | Displayed | Description
Volume Properties | Only if a specific volume is selected | Displays properties of the specific volume.
Same as double-clicking on the volume in the Component Pane.

The Volumes Tab

To display volume information, in the Navigation Pane, select Volumes. All volumes in all consistency groups are displayed. The following settings are displayed for each volume:
◆ Status of the volume
◆ Site
◆ Consistency group
◆ Copy
◆ Replication set
◆ Volume type (Repository, Replication, or Journal)
◆ Volume size

To obtain more detailed information about a specific volume, double-click its row. The Volume Properties dialog box is displayed with the following additional information:
◆ For each path between the volume and the RPA:
  • RPA number
  • RPA port number
  • RPA port WWN
  • Storage controller
  • Serial number
  • LUN
◆ Storage vendor
◆ Storage system
◆ Volume ID
◆ UIDs
◆ Size of the volume

vCenter Server management

To display data from the VMware vCenter Server in the RecoverPoint Management Application, go to the Navigation Pane and select vCenter Servers.

In addition to displaying ESX servers and all their virtual machines, datastores, and RDM drives, the RecoverPoint vCenter Servers view also displays the replication status of each volume. The RecoverPoint vCenter Servers view is for monitoring only (read-only).

The Navigation Pane displays the vCenter Servers object and, under it, all VMware vCenter Servers registered with RecoverPoint. The view displays data extracted from the VMware vCenter Server together with RecoverPoint replication data. For detailed information about a particular vCenter Server, in the Navigation Pane, click the vCenter Server's IP address.

vCenter Server commands

When vCenter Servers is selected in the Navigation Pane, the commands in the following table are available through the buttons that are displayed at the top right corner of the Component Pane.
Table 32  vCenter Server commands

Name | Displayed | Description
Add vCenter Server | Always | To create a connection between RecoverPoint and a VMware vCenter Server, which allows RecoverPoint to display the VMware view of volumes configured for replication. To add a vCenter Server, in the dialog box that is displayed, enter the information shown in Table 34 on page 206.
Edit vCenter Server | When a specific vCenter Server is selected | To modify the credentials of an existing connection between RecoverPoint and a VMware vCenter Server. Use when one or more credentials of the vCenter Server have changed. In the dialog box that is displayed, the information in Table 35 on page 207 can be edited.
Remove vCenter Server | When a specific vCenter Server is selected | To remove a connection between RecoverPoint and a VMware vCenter Server.
Rescan | Always | To obtain the latest information from the VMware vCenter Servers registered with RecoverPoint, and update the RecoverPoint Management Application.

When a specific vCenter Server is selected in the Navigation Pane, the commands in the following table are available through the buttons that are displayed at the top right corner of the Component Pane.

Table 33  vCenter Server detail commands

Name | Displayed | Description
Edit vCenter Server | When a specific vCenter Server is selected | To modify the credentials of an existing connection between RecoverPoint and a VMware vCenter Server. Use when one or more credentials of the vCenter Server have changed. In the dialog box that is displayed, the information in Table 35 on page 207 can be edited.
Remove vCenter Server | When a specific vCenter Server is selected | To remove a connection between RecoverPoint and a VMware vCenter Server.
Expand | Always | To expand the display of virtual machines running on each ESX server and the storage devices the virtual machines are accessing.
Collapse | Always | To collapse the display of virtual machine and storage device details under each ESX server.

The following commands are available in the RecoverPoint Management Application menu bar under the vCenter Server menu item:
◆ “Add”
◆ “Edit”
◆ “Remove”
◆ “Rescan”

Add

To create a connection between RecoverPoint and a VMware vCenter Server, which allows RecoverPoint to display the VMware view of volumes configured for replication. To add a vCenter Server, in the dialog box that is displayed, enter the following information:

Table 34  Add vCenter Server Settings

Setting | Description
Site | RecoverPoint site where the vCenter Server is located.
IP | IP address of the vCenter Server. This is also the display name of the vCenter Server in the Navigation Pane.
Port | TCP port number of the vCenter Server.
Username | vCenter Server username.
Password | vCenter Server password.
Certificate | The best practice is to configure the vCenter Server to require the use of a certificate. If you wish to specify a certificate, browse to and select the certificate file. Default certificate locations:
  • Windows 2003 Server: C:\Documents and Settings\All Users\Application Data\VMware\VMware VirtualCenter\SSL\rui.crt
  • Windows 2008 Server: C:\Users\All Users\Application Data\VMware\VMware VirtualCenter\SSL\rui.crt
  Once RecoverPoint has read the certificate, it does not need further access to the location. For more information about the location of the vCenter’s security certificate, refer to Replacing vCenter Server Certificates VMware vSphere 4.0, available at www.vmware.com.

Edit

To modify the credentials of an existing connection between RecoverPoint and a VMware vCenter Server.
Use when one or more credentials of the vCenter Server have changed. In the dialog box that is displayed, the following information can be edited.

Table 35  Edit vCenter Server Settings

Setting | Description
IP | IP address of the vCenter Server.
Port | TCP port number of the vCenter Server.
Username | vCenter Server username.
Password | vCenter Server password.
Certificate | The best practice is to configure the vCenter Server to require the use of a certificate. If you wish to specify a certificate, browse to and select the certificate file. Default certificate locations:
  • Windows 2003 Server: C:\Documents and Settings\All Users\Application Data\VMware\VMware VirtualCenter\SSL\rui.crt
  • Windows 2008 Server: C:\Users\All Users\Application Data\VMware\VMware VirtualCenter\SSL\rui.crt
  Once RecoverPoint has read the certificate, it does not need further access to the location. For more information about the location of the vCenter’s security certificate, refer to Replacing vCenter Server Certificates VMware vSphere 4.0, available at www.vmware.com.

Remove

To remove a connection between RecoverPoint and a VMware vCenter Server.

Rescan

To rescan VMware vCenter Servers registered with RecoverPoint.

The vCenter Servers Tab

When vCenter Servers is selected in the Navigation Pane, the Component Pane displays all vCenter Servers registered with RecoverPoint, with their site name and username.

When, in the Navigation Pane, the IP address of an individual vCenter Server is selected, detailed information is displayed about the virtual machines running on that ESX server, and their volumes and raw device mappings.
The following information is displayed for an individual vCenter Server:
◆ Each ESX server in the vCenter Server and its IP address
◆ Each virtual machine running in the ESX server, with the following details:
  • Replication status (shown by an icon): fully configured for replication, partially configured for replication, or not configured for replication.
  • IP address
◆ Every LUN and raw device mapping accessed by each virtual machine, with the following details:
  • Replication status (shown by an icon): fully configured for replication or not configured for replication.
  • For LUNs or devices configured for replication by RecoverPoint, the following are displayed: consistency group, copy (Production, Local, Remote), replication set, and which datastore for each LUN or raw device mapping is configured for replication.

System Monitoring management

RecoverPoint monitors selected setting values to let the user know how close they are to their limits. The limits may be determined by the system, policies, licensing, or limitations of external technologies.

To view monitored settings and their limits, in the Navigation Pane, select System Monitoring. Monitored settings are displayed in the Component Pane. Displayed settings are divided into three categories: System, Group, and Splitters. Select the tab of the category of settings you wish to view. You can sort settings by any displayed column by clicking its column header. You can filter monitored settings by selecting a category in the Filter by: field and then entering the text to filter by in the text box provided.

The System Tab

The following information is shown:
◆ Severity: OK, minor, major, critical. Indicates how close a monitored setting is to its limit.
◆ Description: name of the setting value
◆ Site: replica site of the displayed setting
◆ RPA: RPA of the displayed setting
◆ Status: value of the setting and its limit
◆ Status bar: graphic display of the setting value and its limit.

The Group Tab

The following information is shown:
◆ Severity: OK, minor, major, critical. Indicates how close a monitored setting is to its limit.
◆ Description: name of the setting value
◆ Group name: name of the consistency group involved
◆ Status: value of the setting and its limit
◆ Status bar: graphic display of the setting value and its limit.

The Splitters Tab

The following information is shown:
◆ Severity: OK, minor, major, critical. Indicates how close a monitored setting is to its limit.
◆ Description: name of the setting value, including:
  • Number of RPA clusters attached to a splitter
  • Total number of volumes attached to a splitter
◆ Site: replica site of the displayed setting
◆ Host Name: name of the host involved
◆ Status: value of the setting and its limit
◆ Status bar: graphic display of the setting value and its limit.
◆ Context: additional limitations specific to a particular type of splitter (for example, Number of Brocade ITLs per DPC)

Event log management

RecoverPoint logs events that occur in the RecoverPoint system, and the event log can be viewed in the Management Application. In addition, RecoverPoint offers several options (email, SNMP, and syslog) for configuring event notification (“Notification of Events” on page 251). The system events are listed and described in “Events” on page 277.

Logging commands

The following commands are available through buttons that are displayed at the top right corner of the Component Pane.

Table 36  Log commands

Name | Displayed | Description
Log Event Properties | Only if a specific event log is selected | Opens the Event Details dialog box, which displays additional information about an individual log.
Log Filter | Always | Filters the logs displayed according to the criteria you specify. Filters are saved per user. Filtering criteria include:
  • From (time)
  • To (time)
  • Topics to include
  • Scope
  • Level
  • Text filter

Table 37  Log filtering settings

Setting | Values
Time | Select the time range of events you wish to display.
Topics | Select which events to include in the display, according to the following topics:
  • Site
  • RPA
  • Consistency group
  • Splitter
  • Management
Scope | Normal: To report only the root cause for an entire set of detailed and advanced events. In most cases, these events are sufficient for effective monitoring of system behavior.
  Detailed: This category includes all events, with respect to all components, that are generated for use by users.
  Advanced: In specific cases, for instance, for troubleshooting a problem, EMC Customer Service may ask you to retrieve information from the advanced log events. These events contain information that is intended primarily for the technical support engineers.
Level | Info: Messages are informative in nature, usually referring to changes in the configuration or normal system state.
  Warning: Message indicates a warning, usually referring to a transient state or an abnormal condition that does not degrade system performance.
  Error: Message indicates an important event that is likely to disrupt normal system behavior and/or performance.
Description Text | Enter words in the description text by which to filter displayed events. Match all and Match any options are available.

The Logs Tab

To view RecoverPoint system events, in the Navigation Pane, select Logs. The event log is displayed in the Component Pane, with most recent events first. You can sort events by any column displayed by clicking in the column header.
For additional information about a single event, double-click it. When viewing a single event, use the Prev and Next buttons to browse events.

Events can be filtered by time (From and To can be specified), by topic, by scope, by level, and by descriptive text. Note that by default, only events with Normal scope are displayed. To view all events that may be relevant to users, use the Log Filter command to change Scope to Detailed.

Monitoring and analyzing system performance

The following table lists the components of the RecoverPoint system that can be monitored from the Management Application. From the Management Application monitoring panes, no changes can be made to monitored settings.

Table 38  Components monitored from the Management Application

Information Type | Displayed | See
System SAN and WAN traffic overview | Always | “The Traffic Pane” on page 180
System health overview | Always | “The System Pane” on page 179
vCenter Servers | vCenter Servers selected in the Navigation Pane | “The vCenter Servers Tab” on page 208
System monitoring, system limits | System Monitoring selected in the Navigation Pane | “The System Tab” on page 209; “The Group Tab” on page 210; “The Splitters Tab” on page 210
Consistency group event logs | Logs selected in the Navigation Pane | “The Logs Tab” on page 212
System event logs | System > Collect System Information selected from the main system menu | “Collecting system information” on page 268
Consistency group transfer and failover status | When a consistency group is selected in the Navigation Pane, Status Tab | “The Status Tab” on page 190
Consistency group performance statistics | When a consistency group is selected in the Navigation Pane, Statistics Tab | “The Statistics Tab” on page 191

Monitoring and analyzing system performance

The RecoverPoint system produces extensive statistics that can be used to analyze system performance.
You can use the results of these analyses to ensure that system capacity is sufficient, to adjust system settings for optimal system performance, and to plan future expansions. The following features are available:
◆ “Detecting bottlenecks”
◆ “Exporting statistics”
◆ “Exporting consolidated statistics”
◆ “Statistics analysis tool”
◆ “Throttling I/O”

Detecting bottlenecks

This feature returns statistics about RecoverPoint system performance, by group, RPA (“box”), and site. The quantity and type of statistics depend on the filters specified in the CLI detect_bottlenecks command. The standard output includes the following set of statistics:
◆ WAN (or Fibre Channel) throughput
◆ Incoming data rate
◆ Output data rate, during (and not during) initialization (synchronization)
◆ Compression CPU utilization
◆ Percentage of time in transfer (group only)
◆ Percentage of time in initialization (group only)
◆ Lag (RPO) during transfer (group only)
◆ RPA utilization (RPA only)
◆ Link utilization (group only)
◆ Line latency between sites (site only)
◆ Packet loss (site only)

More detailed statistics are used primarily by EMC Customer Service, for analysis of system performance and problems. For more information on the types of statistics and filters available, refer to the EMC RecoverPoint CLI Reference Guide.

More important for the normal user, this feature analyzes the system data to detect any of a set of predefined problem types, called “bottlenecks”. The types of bottlenecks are presented in the following table.

Table 39  Bottlenecks

Type | Output/Notes
For boxes and sites
Box balance | “Boxes are not balanced.”, with data on the load handled by each box at the site. NOTE: Box balance is checked only if the time period defined is greater than 30 minutes.
Compression | “Compression level is too high.
The box resources cannot handle the current level.”
SAN target | “Box may be regulating the application. Consider reducing box load”, with data on the total amount of incoming data, the number of writes, and the amount of incoming data per write.
Box utilization | “Box utilization reached 80%.”
For consistency groups and links
Slow production journal | “Writing to the local journal volume was slow during this period.”, with data on the delay factor.
Journal phase 1 | “Journal is unable to handle the incoming data rate.”, with the required I/O rates for the journal and the replication volumes at local or remote copies, for both normal and fast-forward distribution modes.
Journal phase 2 | “Journal and replication volumes are unable to handle the incoming data rate.”, with data on the required I/O rates for the journal and the replication volumes at local or remote copies, for both normal and fast-forward distribution modes.
Journal regulation | “Remote storage is too slow to handle incoming data rate and regulate the distribution process.”, with data on the required I/O rates for the journal and the replication volumes at local or remote copies, for both normal and fast-forward distribution modes.
Unknown distribution problem | “Target site cannot handle the incoming data rate.”, with the required I/O rates for the journal and the replication volumes at the remote site, for both normal and fast-forward distribution modes.
Slow WAN | “WAN is too slow.”, with data on total throughput for the site, the identity of the boxes at which the problem appeared, and the throughput of that box (or boxes). NOTE: A slow WAN bottleneck is detected by group, but generates data by site and box.
Slow read source | “Reading rate from the source replication volume/s during synchronization is too slow.”, with the reading rate.
Slow read target | “Reading rate from the target replication volume/s during synchronization is too slow.”, with the reading rate.
Link utilization | “Link utilization reached 80%.”

In addition to the analysis of overall system behavior, the same type of analysis can be performed on specific periods of system behavior, such as periods of initialization (synchronization), high load periods, and peak periods of incoming data (writes).

In some cases, an action to correct the problem is explicitly recommended as part of the bottleneck report. In all cases, the detection of a bottleneck is intended to lead to a correction of the problem and an improvement in system performance.

Exporting statistics

This feature is used primarily to write unprocessed system statistics to a CSV file, according to the specified filters. The standard output includes the following set of statistics:
◆ WAN throughput
◆ Total incoming data rate
◆ Initialization (synchronization) output rate
◆ CPU utilization
◆ Compression ratio
◆ Percentage of time in transfer (group only)
◆ Percentage of time in initialization (group only)

The feature is activated by the export_statistics command in the CLI, which offers the following filtering settings: from, to, include_global_statistics, site, RPA, group, categories, frequency, and file. The standard output (as listed above) is produced when the categories setting is set to OVERVIEW. More detailed statistics are used primarily by EMC Customer Service, for remote analysis of system performance and problems.

Exporting consolidated statistics

The export_consolidated_statistics CLI command provides data series for a selection of important RecoverPoint operational statistics.
It enables advanced users, customer support representatives, and implementation specialists to analyze system traffic and workload trends, to identify correlations between spikes in two or more settings, and to discover the root causes of high loads and other significant system behaviors.

With the export_consolidated_statistics command, you specify the granularity at which to collect statistics (minute, hour, and/or day) and, for each granularity, the time frame over which to collect the statistics. The resulting CSV file organizes the output for each entry according to the standard bottleneck detection settings.

The following table lists the statistics that are output by the export_consolidated_statistics command. Each statistic applies to the sampled time interval of its row, that is, the interval between the “time from” and “time to” values on that row.

Table 40  Consolidated statistics output

Statistic | Unit of measure | Description
Incoming writes rate for link [a] | MB/sec | Throughput of writes that arrived at a link on the production side.
Incoming IOs rate for link | IOs/sec | Number of writes that arrived at a link on the production side. Average IO size can be computed as (Incoming writes rate for link) / (Incoming IOs rate for link).
Non - initialization output rate for link | MB/sec | Rate at which normal write traffic was transferred for the link.
Initialization output rate for link | MB/sec | Rate at which data was actually transferred for the link for purpose of synchronization.
Data synchronization rate for link | MB/sec | Rate at which data was checked for possible transfer for the link for purpose of synchronization. In first-time initialization, this is the rate at which the data from the replication set volumes is transferred. In other cases, comparison of signatures increases the rate.
Compression CPU utilization | % | Portion of processor time used to compress the incoming data over a link.
This statistic is relevant only for remote links.
Percentage time in transfer | % of time | Portion of time link was in active transfer state.
Percentage time of initialization | % of time | Portion of time link was in initializing transfer state.
RPO - lag in time between replicas during transfer after init | sec | Actual recovery point objective [b] for link, as measured in seconds. It is measured only when the transfer state for the link was active.
RPO - lag in data between replicas during transfer after init | MB | Actual recovery point objective for link, as measured in megabytes of data. It is measured only when the transfer state for the link was active.
RPO - lag in IOs between replicas during transfer after init | IOs | Actual recovery point objective for link, as measured in number of IOs. It is measured only when the transfer state for the link was active.
Group-Link utilization | % | Aggregate measure of portion of RPA capacity used by the link, based on number of IOs, writes throughput, and CPU utilization.
Distributor receiver regulation duration | % of time | Portion of time that the distribution process for a copy was forced to regulate the incoming data rate. A high value indicates that the journal volume storage is slow relative to the rate of the incoming data.
Distributor phase 1 thread load | % of time | Portion of time used to receive data and write it to the journal volumes for the link. The value is dependent on the performance of the journal volumes.
Distributor phase 1 effective speed | MB/sec | Rate at which data was received and written to the journal volumes for the link. The value is dependent on the performance of the journal volumes.
Distributor phase 2 thread load | % of time | Portion of time used to read data from the journal volumes and write to the replica volumes, using either normal or fast-forward distribution mode.
The value is dependent on the performance of the journal and replica volumes.
Distributor phase 2 effective speed | MB/sec | Rate at which data was read from the journal volumes and written to the replica volumes, using either normal or fast-forward distribution mode. The value is dependent on the performance of the journal and replica volumes.
Fast forward distribution duration | % of time | Portion of time at copy when distribution was in fast-forward mode.
WAN throughput from box (over IP) or Cross-site throughput from box (over FC) | Mb/sec | Outgoing throughput from RPA to remote site.
Total incoming writes rate for box | MB/sec | Throughput of writes that arrived at an RPA on the production side. For CLR, this is the sum of throughputs for all CDP and CRR links on this RPA (which is double the actual incoming writes rate).
Incoming IOs rate for box | IOs/sec | Number of writes that arrived at an RPA on the production side. For CLR, this is the sum of incoming IOs for all CDP and CRR links (which is double the actual number of incoming writes). Average IO size can be computed as (Total incoming writes rate for box) / (Incoming IOs rate for box).
Non - initialization output rate for box (average over all period) | MB/sec | Rate at which normal write traffic was transferred by the RPA.
Initialization output rate for box (average over all period) | MB/sec | Rate at which data was actually transferred by the RPA for purpose of synchronization.
Data synchronization rate for box (average over all period) | MB/sec | Rate at which data was checked for possible transfer by the RPA for purpose of synchronization. In first-time initialization, this is the rate at which the data from the replication set volumes is transferred. In other cases, comparison of signatures increases the rate.
Compression CPU utilization | % | Portion of processor time used to compress the incoming data over all links on the RPA.
Replication process CPU utilization – per box | % | Portion of processor time used for the replication process on the RPA, including compression and other replication activities.
Distributor receiver regulation duration | % of time | Portion of time that the distribution process across all copies served by the RPA was forced to regulate the incoming data rate. A high value indicates that the journal volume storage is slow relative to the rate of the incoming data.
Distributor phase 1 effective speed | MB/sec | Rate at which data was received and written to the journal volumes for all copies served by the RPA. The value is dependent on the performance of the journal volumes.
Distributor phase 2 effective speed | MB/sec | Rate at which data was read from the journal volumes and written to the replica volumes for all copies served by the RPA, using either normal or fast-forward distribution mode. The value is dependent on the performance of the journal and replica volumes.
SAN target thread load | % of time | Portion of time used by RPA for initial processing of writes, before assigning them to the relevant links.
Box utilization | % | Aggregate measure of portion of RPA capacity used, based on both IO load and processor utilization.
WAN throughput from site (over IP) or Cross-site throughput from site (over FC) | Mb/sec | Combined outgoing throughput from a site (to remote site).
Total incoming writes rate for site | MB/sec | Throughput of writes that arrived on the production side. For CLR, this is the sum of throughputs for all CDP and CRR links at the site (which is double the actual incoming writes rate).
Incoming IOs rate for site | IOs/sec | Number of writes that arrived on the production side. For CLR, this is the sum of incoming IOs for all CDP and CRR links at the site (which is double the actual number of incoming writes).
Average IO size can be computed as (Total incoming writes rate for site) / (Incoming IOs rate for site).
Non - initialization output rate for site (average over all period) | MB/sec | Rate at which normal write traffic was transferred by all RPAs at the site.
Initialization output rate for site (average over all period) | MB/sec | Rate at which data was actually transferred by all RPAs at the site for purpose of synchronization.
Data synchronization rate for site (average over all period) | MB/sec | Rate at which data was checked for possible transfer by all RPAs at the site for purpose of synchronization. In first-time initialization, this is the rate at which the data from the replication set volumes is transferred. In other cases, comparison of signatures increases the rate.
Compression CPU utilization | % | Portion of processor time used to compress the incoming data over all links for all RPAs at a site.
Distributor receiver regulation duration | % of time | Portion of time that the distribution process across all copies at a site was forced to regulate the incoming data rate. A high value indicates that the journal volume storage is slow relative to the rate of the incoming data.
Distributor phase 1 effective speed | MB/sec | Rate at which data was received and written to the journal volumes for all copies at the site. The value is dependent on the performance of the journal volumes.
Distributor phase 2 effective speed | MB/sec | Rate at which data was read from the journal volumes and written to the replica volumes for all copies at a site, using either normal or fast-forward distribution mode. The value is dependent on the performance of the journal and replica volumes.
Line latency between sites | msec | Time it takes for data to be transferred from a site to the other site.
Packet loss | % of packets | Measure of the reliability of the line between a site and the other site, as measured in portion of packets that must be resent. Latency and packet loss impact the effective throughput on a line between two sites.

a. In an RPA, a “link” is a logical channel from production to copy, either local or remote, for a given consistency group.
b. An “actual recovery point objective” is the time/data/writes that were waiting to be transferred over the link to the copy, averaged over the sampled time interval.

Statistics analysis tool

Using the export_consolidated_statistics command in the CLI together with the proprietary RecoverPoint MS Excel-based statistics analysis tool provides an effective and unified way to assess current system performance, and to plan for adequate future system capacity.

The statistics analysis tool enables graphical representation of the statistics in the CSV files created by the export_consolidated_statistics command. A typical use of the tool would be to show a plot, over time, of “RPO - lag in data between replicas during transfer after init” alongside “WAN throughput from site”. If the result shows that the lag increases concurrently with a decrease in WAN throughput, then the user can conclude that in order to meet the RPO objective consistently, WAN bandwidth or quality of service must be increased. Adding the “Box utilization” statistic to the same plot can be used to confirm that the problem is not caused by the alternative explanation of an overworked appliance.

These data can be analyzed using the statistics analysis tool, which can be configured to show desired time frames, consistency groups, appliances, sites or links, or any combination of these filters, in various graphical ways.
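
As an illustration of processing the exported statistics outside the Excel-based tool, the following Python sketch computes the average IO size described in Table 40 (incoming writes rate divided by incoming IOs rate) for each row of a CSV. It is a minimal sketch under stated assumptions: the column names and the sample values are hypothetical and are not taken from an actual export_consolidated_statistics file, and the conversion assumes 1 MB = 1024 KB.

```python
import csv
import io

# Hypothetical column names; the real CSV headers produced by
# export_consolidated_statistics may differ and must be checked.
WRITES_COL = "Total incoming writes rate for box (MB/sec)"
IOS_COL = "Incoming IOs rate for box (IOs/sec)"


def average_io_size_kb(writes_mb_per_sec, ios_per_sec):
    """Average IO size in KB = (incoming writes rate) / (incoming IOs rate)."""
    if ios_per_sec <= 0:
        return None  # no IOs in this sample, so the ratio is undefined
    # Assumes 1 MB = 1024 KB; adjust if the export uses decimal megabytes.
    return writes_mb_per_sec * 1024.0 / ios_per_sec


def average_io_sizes(csv_text):
    """Return the average IO size (KB) for each data row of the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        average_io_size_kb(float(row[WRITES_COL]), float(row[IOS_COL]))
        for row in reader
    ]


# Made-up sample data, for illustration only.
SAMPLE_CSV = (
    f'"{WRITES_COL}","{IOS_COL}"\n'
    "40,5120\n"
    "0,0\n"
)

if __name__ == "__main__":
    print(average_io_sizes(SAMPLE_CSV))
```

Note that for CLR configurations the table states that both site-level input statistics are doubled, so their ratio is unaffected by the doubling.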
The tool is provided both as a useful way to easily access and interpret the data series, and as a reference implementation showing how third parties can build their own tools to process the data in more specialized ways. Users who require more advanced BI-like analysis can also load the CSV files into most database systems, typically using database vendor-supplied loading utilities, and leverage SQL or SQL-based reporting engines to analyze the data.

The statistics analysis tool is available on the Powerlink website, at: http://Powerlink.EMC.com.

Throttling I/O

A RecoverPoint initialization or synchronization process can temporarily put a heavy I/O load on a storage device. In some situations, it is recommended to restrict use of the storage device by these processes in order to allow other processes (including, for example, normal application writes to a RecoverPoint replication volume that belongs to a different consistency group, but which is located on the same device) to function with minimum disruption.

Use the config_io_throttling command in the CLI to set the maximum rate at which any RPA can read from any storage device.

5 Testing, Failover, and Migration

This section explains how to test and work from replicas, how to recover from disasters, and how to migrate to a different storage system. The topics in this section include:

◆ Use cases
◆ Bookmarking
◆ Accessing a replica
◆ Failover commands
Use cases

The following use cases present the most common uses of image access and failover, with a brief outline of the process for each use case. The details for each step are given later in this section. The concept of bookmarking is explained in “Bookmarks” on page 38. The concepts of image access and failover are explained in “Image access” on page 73 and “Failover” on page 77.

First-time initialization

When a consistency group is initialized for the first time, the RecoverPoint system cannot identify which blocks are identical between the production and replica volumes, and must therefore mark all blocks for that volume. This is true both following the creation of a new consistency group and following the addition of a volume (or volumes) to an existing group.

While initializing, the RecoverPoint system efficiently determines which blocks are actually different between the production and replica copies, and sends only the data for those blocks to the replica storage, as the initialization snapshot.

The volumes at the local and remote site can be initialized while the host applications are either running or not. Initialization of one consistency group does not interfere with the operation of other consistency groups.

Initialization can be carried out automatically over IP or Fibre Channel. Alternatively, you can back up the current production data, manually transfer it to the remote site, and then copy the production image to the replica volumes, where it becomes the pre-replication image.

The state of data transfer prior to any type of initialization is always paused.

Note: First-time initialization can cause changes in the partition table of the replica volume (or volumes). In a Windows environment, you must clear the OS cache before changing the partition table of a replica volume.
To clear the OS cache, disable the LUN (or LUNs) on which the volume resides, and re-enable it. You can do so either using the relevant commands in the RecoverPoint kutils utility (see “Kutils Reference” on page 311), or from the “Disk drives” interface of the Windows Device Manager.

First-time initialization from backup

It is possible to initialize new consistency groups by creating a backup of your production volumes, physically transferring them to the remote site, and copying the backup images onto the remote storage volumes. During this process, applications at the production site can be running or not.

Although transfer is paused during this process, the host applications keep writing to storage unless the production host is completely shut down during the creation and physical transfer of the production image to the remote site. During this time, the production and replica volumes become inconsistent. Any writes made to the production volumes during the process are synchronized once the process is complete and transfer starts. Synchronizing only these changes results in a relatively small amount of additional traffic, and takes substantially less time than synchronizing the full volumes over IP or Fibre Channel.

To initialize a new consistency group from a backup image, when the production and replica volumes are not consistent:

At the production site:

1. Ensure that all splitters that have access to replication volumes in the group are attached to those volumes, see “Adding splitters” on page 128.
2. Create a new consistency group, replication set, and define replication and journal volumes, see “Creating new consistency groups” on page 132 and “Modifying existing settings and policies” on page 158.
3.
Select the new consistency group’s name in the Navigation Pane, right-click, and select Pause Transfer to stop the transfer of replicated data from production to the replica.
4. Select the new consistency group’s name in the Navigation Pane, right-click and select Clear Markers to inform the system that the copy volume at the remote site is known to be identical to its corresponding production volume, and a full volume sweep synchronization is not required.
   The Clear Markers dialog box is displayed.
   In the Clear Markers dialog box, select the remote copy, and click the OK button.
5. Create a block-based backup of the production volumes.
6. Physically transfer the backup to the remote site.

At the remote site:

7. From the Image Access menu, select Enable Image Access and specify the image that you wish to access. Do not select the pre-replication image.
   After selecting the required image:
   Select Logged Access, and click the Next button.
   In the summary screen, click the Finish button.
8. From the Image Access menu, select Enable Direct Access to enable the remote host to directly access the image selected in Step 7.

   CAUTION: In a three copy configuration, the journal of the third copy is erased, and all history for the third copy is lost.

9. Restore the backup onto the remote replica volumes. The backup image becomes the pre-replication image.
10. From the Image Access menu, select Disable Image Access, and check the Start data transfer immediately checkbox to resume replication. Click the OK button to finish the process.

   CAUTION: Upon start of transfer, the system synchronizes the data of the replica volume with the corresponding production data, which has presumably changed in the time it took to create the backup and manually transfer it.

At the local site, in a three copy configuration:

11.
In a three copy configuration, you will also need to start transfer to the third copy. To do so, select the Start Transfer icon.

First-time failover

If you are performing this procedure on an AIX host, see the EMC RecoverPoint AIX Technical Notes for the correct procedure.

Volumes attached to Windows hosts require additional steps when failing over for the first time. Best practice is to run a planned failover as soon as possible after initialization of a consistency group; that is, as soon as the transfer status changes from Initializing to Active.

Windows hosts

Before accessing a replica for the first time, it is necessary to update the disk information stored by the replica host. During initialization of a replication volume, the source volume is replicated in its entirety, including disk information (disk signature, partition table, and other disk information). This means that the disk information on the replica disk will have been created by the production host and not by the replica host. It is therefore necessary to invalidate the replica host’s disk information.

To update the disk information of a replica:

1. Enable image access to the replica.
2. From the Windows Control Panel, select Computer Management > Disk Management. Examine all listed disks. (Multiple paths may exist to each disk; consequently, disks may be listed many times.) Note the device name or LUN number of the replication volume.
3. In Computer Management, select Device Management > Disk drives. Disable all instances of the replication volume. Disks can be identified by their device name or LUN number. Run the Scan for hardware changes command.
   In scripts, volumes can be enabled and disabled as follows:
   • In Windows 2000 and 2008, use the kutils disable and enable commands.
   • In Windows 2003, use the Microsoft utility devcon.
4. In Computer Management, select Device Management > Disk drives. Enable all instances of the replica volume.
Run the Scan for hardware changes command. Verify that all relevant disks are displayed.
5. Mount the replication volume:
   a. In Computer Management > Disk Management, run Rescan disks.
   b. Find the replication volume and assign it a drive letter.
   c. Verify that the disk is now accessible from the host.
6. From the replica host, unmount the replication volume (unassign or remove the drive letter).
7. Disable image access (resume distribution).

Testing a replica

From time to time, it is a good practice to make sure that replicas can be used to restore data, recover from disaster, or seamlessly take over production. In most cases, while testing a replica, applications can continue to run on the production servers, and replication can continue as usual. The writes will be stored in the replica journal until testing is completed. When testing is completed and write access at the replica disabled, any writes made during testing will be rolled back; and the writes from production will be distributed from the journal to the replica. The entire process can be completed without application downtime and without loss of data at the replica.

To test a replica follow the steps below. For detailed instructions of the individual commands, refer to “Bookmarking” on page 235 and “Failover commands” on page 249.

1. From the Image Access menu, select Enable Image Access.
   If Virtual Access is appropriate, it is the quickest. If you are only testing images, Virtual Access without Roll image in background is preferred. However, if you need to test images for an extended period of time or need maximum performance while testing, select Logged Access (physical).
2. At the host, mount the replica volume you wish to access. If the volume is in a volume group managed by a logical volume manager, import the volume group.
3. If desired, run fsck (chkdsk on Windows) on the replica volumes.
4.
Access the volumes and test as desired. If you need to test longer than is possible with Logged Access (because the journal is full or will be full) or you require even better performance than Logged Access, Direct Image Access may be preferable. Refer to “Direct Image Access” on page 246 for details. Note the drawbacks of using Direct Image Access.
5. When testing is completed, unmount the replica volumes from the host. If using logical disk management, deport the volume groups. Then Disable Image Access at the replica. The writes to the replica will automatically be undone.

Offloading a task

The same procedure as for testing (“Testing a replica” on page 228) can be used to offload a task to a replica. For instance, if you need to run a large query on a database, and you do not want to tie up your source, you can run the query on a replica. Of course, this assumes that you do not need the very latest information (your data will be out of date by your lag time, possibly a few seconds, plus the length of time it takes to run the query).

Recovering from a disaster

To recover from a disaster (hardware failure or logical disaster), fail over to a replica. You have these options:

◆ Fail over temporarily to a replica only to use the replica’s journal to roll back to a previous point in time in the production storage (“Recovering the production source” on page 230).
◆ Fail over to a replica and work from there until the production source is repaired or you have recovered from the disaster at the production site (“Failing over to a replica temporarily” on page 231).
◆ Fail over permanently to a replica, making that the new production source (“Routine maintenance on production system” on page 233). Use this same process to migrate to a new production site.

Recovering the production source

To correct file or logical corruption by rolling back to a previous point in time, use Recover Production.
Access an image in the replica, verify that it is not corrupt (that is, that the image predates the corruption), then roll back to that point in time. Use the following procedure. For instructions of the individual commands, refer to “Bookmarking” on page 235 and “Failover commands” on page 249.

1. From the Image Access menu of the replica, select Enable Image Access. Use Virtual Access, if practical.
2. From the host, mount the replica volumes. If the volume is in a volume group managed by a logical volume manager, import the volume group.
3. If desired, run fsck (chkdsk on Windows).
4. Test the snapshots until you find one that you wish to roll back to (before the corruption or disaster).
5. Roll to the selected snapshot: use Virtual Access with Roll, Logged Access, or Roll to Image. You do not need to wait until rolling to the selected snapshot finishes.
6. At the copy’s Failover menu, select Recover production.
   In the Component Pane for the selected consistency group, the image status of the production source will be Distributing Pre-replication image and the role will be Production (being restored).
7. After the transfer status changes to Active, enable image access at the production source. Select any image after the Pre-replication image.
8. Unmount the replica volumes from the replica hosts. If the volume is in a volume group managed by a logical volume manager, deport the volume group.
   When image access is enabled at the production source, Resume Production will be enabled.
9. At the production source, click Resume Production (Failover Actions icon). The production journal is erased.
   The production source is rolled back to the selected image and normal replication from the production source is restored.

Failing over to a replica temporarily

Use the following procedure to temporarily fail over to a replica and continue working from the replica until the production site is available or repaired.

1. Enable image access.
   If there is any chance that the latest image of the replica is not usable, select Virtual Access with Roll image in background.
   If you do not need to test images, at the replica to which you wish to fail over, select Enable Image Access. When prompted, select the snapshot to which you wish to fail over. The latest snapshot is a logical choice. When prompted, select Logged Access.
   It can take a few minutes for the system to roll to the desired snapshot.
2. From the replica host, mount the replica volumes. If the volume is in a volume group managed by a logical volume manager, import the volume group.
3. If desired, run fsck (chkdsk on Windows).
4. If necessary, test and find a usable image (“Testing a replica” on page 228).
5. At the Failover Actions menu, select Failover to <local replica name> or Failover to <remote replica name>.
   The replica’s journal will be erased. In a CLR (three-copy) configuration, transfer to the third copy will pause until production is resumed at the production source.
6. Repair the production site as needed. In the meantime, your applications and business operations can continue at the replica. The production journal and the production storage (assuming they are online) will be kept up-to-date from the replica.
7. When repairs at the production site have been completed, select Enable Image Access at the production site. Then at the production site, select Resume Production.
   The production journal is erased. If you have three copies (production, local, and remote), transfer to the third copy will automatically be resumed.
8. Unmount the replica volumes from the replica hosts.
If the volume is in a volume group managed by a logical volume manager, deport the volume group.

Routine maintenance on production system

Failing over to a replica (“Failing over to a replica temporarily” on page 231) is also useful for performing maintenance on the production source, such as updating the storage system.

Migration

Use the following procedure if you are migrating from one production site to another or failing over permanently to another site. Migration requires advance planning to ensure risk-free migration without data loss. Contact EMC Customer Service for assistance before migrating.

For Windows hosts, to ensure an up-to-date image of the file system, you should flush all file systems that reside on replication volumes. Note that some applications, such as Exchange, have their own cache, which should be flushed as well. Flushing the file system does not flush application level data.

To gracefully shut down source-site host activities:

1. Close all applications that are using the consistency group’s volumes at the production site. Flush application data as necessary.
2. On Windows hosts at the source site, run (for each drive that is in the consistency group): kutils flushFS <drive letter>.
3. On the source-site hosts, run (for each drive that is in the consistency group): kutils umount <drive letter>.
   Note: If the host is boot-from-SAN, shut down the host machine as well.
4. Enable image access.
   If there is any chance that you do not wish to use the latest image, select Virtual Access with Roll image in background. Skip to Step 5.
   If you do not need to test images, at the replica you wish to fail over to, select Enable Image Access. When prompted, select the snapshot to which you wish to fail over. The last snapshot is a logical choice. When prompted, select Logged Access.
   It can take a few minutes for the system to roll to the desired snapshot.
5. From the replica host, mount the replica volumes.
If the volume is in a volume group managed by a logical volume manager, import the volume group.
6. If desired, run fsck (chkdsk on Windows).
7. If necessary, test and find a usable image (“Testing a replica” on page 228).
8. Click the Failover Actions menu, select Failover to <local replica name> or Failover to <remote replica name>.
   The replica’s journal will be erased. In a three-copy (CLR) configuration, the transfer to the third replica will pause until production is resumed at the production source.
9. From the Failover Actions menu, select Set local copy as production or Set remote copy as production.
   Only relevant to CLR (configuration with remote and local replica): If you fail over to the local replica, a full sweep will be required to resynchronize the remote replica. If you fail over to the remote replica, you will not be able to retain both the production source and the local replica. RecoverPoint does not support more than one remote replica.

   CAUTION: The production journal is erased.

Bookmarking

The concept of bookmarking is explained in “Bookmarks” on page 38.

Creating a bookmark

Take note of the following:

◆ You can only bookmark a snapshot for a consistency group that is enabled and actively transferring.
◆ latest is a reserved term and therefore cannot be used as a bookmark name.
◆ Some applications support a quiesced state. For best reliability, you should use the quiesced state when bookmarking a snapshot.

To create a bookmark:

1. In the Navigation Pane, select the consistency group you want to bookmark.
2. Click the Bookmark button.
3. Enter the following information on the Bookmark a snapshot of <group> dialog:
   1. Enter a descriptive name for the bookmark.
   2. Set the consolidation policy for the bookmark.
      The default consolidation policy for a snapshot is Always Consolidate, which means that the snapshot is consolidated the next time that the consolidation process runs.
      a. Check Set snapshot consolidation policy.
      b. Set the snapshot consolidation policy:

Table 41 Consolidation policies

Policy | Description
Never consolidate | Snapshot is never consolidated.
Snapshot must survive Daily/Weekly/Monthly consolidations | • Daily: Snapshot remains after daily consolidations, but is consolidated in weekly, monthly and manual consolidations. • Weekly: Snapshot remains after daily and weekly consolidations, but is consolidated in monthly and manual consolidations. • Monthly: Snapshot remains after daily, weekly and monthly consolidations, but is consolidated in manual consolidations.

4. Click OK.

Applying bookmarks to multiple groups simultaneously

To apply the same bookmark at a single point in time across multiple consistency groups, use the Parallel bookmark command. When the command is executed, the system immediately closes a snapshot on each of the consistency groups specified.

Note: You cannot use latest as a bookmark name, because latest is a reserved term. Some applications support a quiesced state. For best reliability, you should use the quiesced state when bookmarking a snapshot.

To apply parallel bookmarks:

1. In the Navigation Pane, select Consistency Groups. In the Component Pane, select all the consistency groups you wish to bookmark simultaneously. All selected consistency groups must be enabled and transfer must be active.
2. Click Parallel Bookmarks (upper right corner of the Component Pane).
3. Enter the following information on the Create Parallel Bookmark dialog:
   1. Enter a descriptive name for the bookmark.
   2. Set the consolidation policy for the bookmark.
      The default consolidation policy for a snapshot is Always Consolidate, which means that the snapshot is consolidated the next time that the consolidation process runs.
      1. Check Configure consolidation policy.
      2. Set the snapshot consolidation policy. Table 41 on page 236 describes the snapshot consolidation policies.
4. Click OK.

Automatic periodic bookmarking

A group set allows you to automatically bookmark a set of consistency groups so that the bookmark represents the same recovery point in each consistency group in the group set. This allows you to define consistent recovery points for consistency groups that are distributed across different RPAs.

The automatic periodic bookmark consists of the name you specified for the group set and an automatically incremented number. Numbers start at zero, are incremented up to 65535, then begin again at 0. The same bookmark name is used across all the groups.

To apply automatic bookmarks, the sources must be at the same site (replicating in the same direction) and transfer must be enabled for each consistency group included in the group set.

Group sets

A group set is a set of consistency groups to which the system applies parallel bookmarks at a user-defined frequency. Group sets are useful for consistency groups that are dependent on one another or that must work together as a single unit. The Group Set Details dialog box allows you to create, edit, or remove group sets.

Creating a group set

Note: Automatic bookmarking of group sets will succeed only if all groups are active and all sources are at the same site.

To create a group set:

1. In the Navigation Pane, select Consistency Groups. In the Component Pane, select all the consistency groups you wish to bookmark automatically.
2. Click Group Sets (upper right corner of the Component Pane).
   The Group Set Details dialog box is displayed.
3. In the Group Set Details dialog box, click Create.
4.
Enter a name for the automatic bookmarks. Select the consistency groups to be in the group set, and specify the bookmarking frequency. It is recommended that the interval between automatic bookmarks not be less than 30 seconds.
5. Click OK.

Editing a group set

Editing a group set only allows you to change the bookmarking frequency.

1. In the Navigation Pane, select Consistency Groups.
2. Click Group sets (upper right corner of the Component Pane).
   The Group Set Details dialog box is displayed.
3. In the Group Set Details dialog box, click Edit.
   You can edit only the bookmarking frequency.
4. Edit the bookmarking frequency as desired. The interval between automatic bookmarks should not be less than 30 seconds.
5. Click OK.

Removing a group set

Use the following procedure to remove group sets.

CAUTION: The selected group sets are removed as soon as you click OK.

1. In the Navigation Pane, select Consistency Groups.
2. Click Group sets (upper right corner of the Component Pane).
   The Group Set Details dialog box is displayed. All existing group sets are displayed.
3. Select the Group Set you wish to remove, and click Remove.
4. To remove the selected group sets and close the dialog box, click OK.

Applying bookmarks using KVSS

The RecoverPoint KVSS utility is a command-line utility that enables applying bookmarks to Windows 2003 and 2008-based applications that support Microsoft Volume Shadow Copy Service (VSS). Microsoft Exchange and SQL are examples of Windows applications that support Volume Shadow Copy Service.

A single bookmark can be used to bookmark Volume Shadow Copy Service-aware applications in many consistency groups. Volume Shadow Copy Service guarantees that the applications are in a consistent state at the point-in-time when each bookmark is applied to an image.
As a result, recovery using an image with a KVSS bookmark is faster than recovering from normal RecoverPoint images.

KVSS bookmarks are applied using the kvss.exe bookmark command. The working folder for running KVSS commands is %SystemDrive%/VssKashyaProvider/.

The syntax is as follows:

   kvss.exe bookmark writers=writer1[ writer2…] groups=group1[ group2…] bookmark=<bookmark_name> [policy=never | survive_daily | survive_weekly | survive_monthly | always] ip=<RecoverPoint_mgmt_ip_address> [type = [FULL|COPY]]

where:

writer = a VSS-aware host application
group = RecoverPoint consistency group
bookmark = name by which you can identify the applied bookmark
policy = consolidation policy to set for this snapshot. Valid values are:
   ◆ never; Snapshot is never consolidated.
   ◆ survive_daily; Snapshot remains after daily consolidations, but is consolidated in weekly, monthly, and manual consolidations.
   ◆ survive_weekly; Snapshot remains after daily and weekly consolidations, but is consolidated in monthly and manual consolidations.
   ◆ survive_monthly; Snapshot remains after daily, weekly, and monthly consolidations, but is consolidated in manual consolidations.
   ◆ always; Snapshot is consolidated in every consolidation process, whether manual or automatic.
   Note: The default policy is always. If the policy parameter is not specified, the snapshot is consolidated in both automatic and manual consolidation processes.
ip = RecoverPoint site-management IP
type = The shadow copy type: either FULL or COPY. This setting is optional. The default is COPY. The settings full and copy are implemented by the writer application. Generally, when type = full, backup logs are truncated; when type = copy, backup logs are not truncated.

Note: Values should be surrounded by quotation marks.
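
For scripted use, the syntax above can be assembled programmatically. The following Python sketch builds a kvss.exe bookmark command line from its documented parameters and rejects values that this section identifies as invalid (an unknown policy or type, or the reserved bookmark name latest). It is a sketch under stated assumptions: the function name is hypothetical, multiple writers or groups are joined with a single space inside one quoted value, the IP address is left unquoted to match the documented example, and the type value is quoted like the other values. None of these details is confirmed beyond the syntax shown in this section.

```python
VALID_POLICIES = ("never", "survive_daily", "survive_weekly",
                  "survive_monthly", "always")
VALID_TYPES = ("FULL", "COPY")


def build_kvss_bookmark_command(writers, groups, bookmark, ip,
                                policy=None, shadow_type=None):
    """Return a kvss.exe bookmark command line as a single string.

    Hypothetical helper: it only formats text and does not run kvss.exe.
    """
    if not writers or not groups:
        raise ValueError("at least one writer and one group are required")
    if bookmark.lower() == "latest":
        raise ValueError("'latest' is a reserved term")
    if policy is not None and policy not in VALID_POLICIES:
        raise ValueError("unknown consolidation policy: " + policy)
    if shadow_type is not None and shadow_type not in VALID_TYPES:
        raise ValueError("unknown shadow copy type: " + shadow_type)

    # Assumption: several names share one quoted, space-separated value.
    parts = [
        "kvss.exe", "bookmark",
        'writers="{}"'.format(" ".join(writers)),
        'groups="{}"'.format(" ".join(groups)),
        'bookmark="{}"'.format(bookmark),
    ]
    if policy is not None:
        parts.append('policy="{}"'.format(policy))
    parts.append("ip={}".format(ip))  # unquoted, as in the documented example
    if shadow_type is not None:
        parts.append('type="{}"'.format(shadow_type))
    return " ".join(parts)


if __name__ == "__main__":
    print(build_kvss_bookmark_command(
        ["Microsoft Exchange Writer"], ["exchange group"],
        "exchange hourly snapshot", "10.10.0.145", policy="survive_daily"))
```

Validating the policy before the command is issued catches misspellings such as a missing letter in survive_monthly at script time rather than at run time.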
You can use the vssadmin list writers command to obtain a list of registered writers on the host machine.

The following is an example of a command used to produce a bookmark for a Microsoft Exchange application:

   kvss.exe bookmark writers="Microsoft Exchange Writer" groups="exchange group" bookmark="exchange hourly snapshot" policy="survive_daily" ip=10.10.0.145

Note: To use KVSS in a Microsoft Cluster Server environment with Symmetrix DMX storage, the SPC-2 flag must be enabled on the Symmetrix ports.

Accessing a replica

When replicating normally, writes to the production source are also written to the journal of the replicas. The storage at the replica sites is not accessible (state of storage = No access), because the snapshots in the journal are being distributed to the storage at that site.

Figure 7 Normal replication to local and remote replica simultaneously

To do any of the following, you must access the replica:

◆ Test a replica
◆ Roll back the production source to a previous point in time
◆ Fail over to a replica
◆ Migrate permanently to a different production site

To enable a host to access a replica, enable image access to that replica; then mount the volume. If the access is logged (or virtual with roll), distribution of writes from the journal to the replica will stop. Writes will be collected in the journal until image access is disabled. If the journal is completely filled with writes, replication will be disabled.

Enabling image access

1. To enable image access, click the Image Access pull-down of the desired replica.
   The Enable Image Access drop-down menu is displayed.
2. Select Enable Image Access.
   The Enable Image Access dialog box is displayed.
3.
Select the method by which to specify the snapshot to access:
   • Select an image from the list
   • Specify desired point in time
   • Directly select the latest image
   Then select the specific image you wish to access. The snapshot you select depends on what you want to achieve:
   • To test an image, you may wish to start with the last image known to be valid.
   • To analyze data, you generally want the latest snapshot.
   • To fail over to a replica, you generally want the most recent snapshot that you know to be valid. For instance, if you are using Microsoft Volume Shadow Copy Service, you probably want to select the most recent shadow copy. The shadow copies will be bookmarked with the name that you assigned to shadow copies in the Microsoft Volume Shadow Copy Service configuration.
   • To restore the production source, select Production Restore.
   • Migration should be well planned in advance, and the snapshot to select for migration should be part of an overall migration strategy.
   After specifying the snapshot, the Image access mode dialog box is displayed. Select one of the options listed in the following table for the image access mode.

Table 42 Image access modes

Mode / Values and description

Logged access (physical)
    To fail over, migrate, or restore production from this replica, select Logged access. Logged access rolls backwards (or forwards) to the snapshot (point in time) you wish to recover. There will be a delay until the system rolls to the specified image. The length of the delay depends on how far the selected image is from the image currently being distributed to storage. Once access is enabled, hosts in the SAN will have direct access to the storage volumes, and the RPA will not have access; that is, distribution of images from the journal to storage will be paused. If you disable image access, the writes made to the storage volume while image access was enabled will be rolled back.
    Then distribution to storage will continue from the current image forward. If you wish to use the current image as is (with the writes to storage made now), fail over to this image or restore production.
Virtual access (instant)
    To test an image or restore a file to a previous state, select Virtual access. Virtual access creates the required parts of the image you wish to recover in a separate virtual volume (or in memory). Access is very fast, as the system does not actually roll to the image in storage. You can use virtual access in the same way as logged access; however, it is not suitable if you need to run many commands or if you need data from large areas of the replica. After you have tested an image and found it suitable, you may wish to use the Roll to Image command (Table 43 on page 247) to actually roll to the selected image. When you disable image access, the virtual volume and all writes to it are discarded.
Roll image in background
    To test an image and roll to it in the background. Virtual access with Roll image in background creates the required parts of the image in a virtual volume, but simultaneously rolls to the physical image in the background. Once the rollback is completed, the virtual volume is discarded, and the physical volume takes its place. The virtual volume and the physical volume have the same SCSI ID; therefore, the switch from one to the other will be transparent to servers and applications. If you disable image access, the writes made to the volume while image access was enabled will be discarded. Then distribution to storage will continue from the current image forward. If you wish to use the current image as is (with the writes to storage made now), fail over to this image or restore production.

After selecting the Image access mode and clicking Next, the Image access mode Summary box is displayed.
Check your choices. If necessary, go back and change any choices that are not as you desire. When satisfied, click Finish.
4. From the host, mount the volumes you wish to access. If desired, run fsck (chkdsk on Windows). Even if you use virtual access, mount the volume. The virtual image has the same SCSI ID as the actual image.

Possible courses of action at this point:
• Access another image
• Disable image access
• Undo writes
• Enable direct access
• Move to previous point in time
• Move to next point in time
• Fail over to local (or remote) replica
• Recover production

These courses of action are described in Table 43 on page 247 and Table 44 on page 249. If you selected virtual access, you can roll to the actual logged image using Roll to Image.

To set this replica as the production site, you must first fail over to the replica.

Undo information for any changes made to the replica by the host will be written to the image access log, and automatically undone when you disable image access. The quantity of data that can be written by the host application to the replica journal is limited by the capacity of the journal. About 5% of the journal is reserved for instant recovery (indexing) information, and approximately 1 GB is reserved for handling peaks during distribution. The remaining disk space is available for the journal and the image access log. The size of the image access log is, by default, 20% of the available disk space; however, this proportion can be modified (refer to Proportion of journal allocated for image access log (%) in Table 19 on page 155). The remaining available space is reserved for the journal. For virtual access, the maximum size for the image access log is approximately 40 GB. If the capacity for the image access log is reached, host access to the replica is blocked and target-side processing is halted.
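As a rough illustration of the space allocation just described, the following Python sketch estimates how much of a journal is available to the image access log. It is only an approximation under stated assumptions: the function name and parameters are hypothetical, "available disk space" is read as the journal capacity minus the 5% indexing reserve and the 1 GB peak reserve, and the figures the guide gives as "about" and "approximately" are treated as exact.

```python
INDEX_FRACTION = 0.05      # about 5% reserved for instant recovery (indexing)
PEAK_RESERVE_GB = 1.0      # approximately 1 GB reserved for distribution peaks
VIRTUAL_LOG_CAP_GB = 40.0  # approximate maximum image access log size for virtual access


def journal_allocation_gb(journal_gb, log_fraction=0.20, virtual_access=False):
    """Estimate (image_access_log_gb, journal_history_gb) for a journal of journal_gb.

    Hypothetical helper; log_fraction stands for the configurable proportion of the
    journal allocated for the image access log (20% by default).
    """
    available = max(journal_gb * (1 - INDEX_FRACTION) - PEAK_RESERVE_GB, 0.0)
    log_gb = available * log_fraction
    if virtual_access:
        # Assumption: the cap applies to the image access log only.
        log_gb = min(log_gb, VIRTUAL_LOG_CAP_GB)
    return log_gb, available - log_gb


if __name__ == "__main__":
    log_gb, history_gb = journal_allocation_gb(100)
    print("image access log: %.1f GB, journal: %.1f GB" % (log_gb, history_gb))
```

For a 100 GB journal, the estimate is 94 GB of available space, of which roughly 18.8 GB goes to the image access log under the default 20% setting.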
You can ensure continued use of the image access log by adding capacity to the journal (up to the limits for journal or image access log size).

Direct Image Access

Alternatively, you can use the Direct Image Access command, which does not impose a limit on the amount of data that you can write to storage. In addition, Direct Image Access gives better system performance when accessing the replica, because no rollback information is written to the image access log in parallel with the ongoing disk I/Os. Hence, this option may be preferred if you want to carry out processing that generates a high volume of write transactions at the replica. It can also be used for testing the replicated images of BFS groups.

Direct Image Access has the following drawbacks:
◆ Journal is cleared.
◆ After selecting Direct Image Access, you cannot roll back to an earlier image if, in the meantime, you discover corrupted data. Moreover, in the event of a disaster at the source side, you will be unable to remove any new data that you have written to the replica (unless you have a third replica with a journal).
◆ Transfer for the consistency group must be paused. Nonetheless, the system continues to write markers to the production journal volume, and it can use those markers later to resync the replica with the source.

Note: If you wish to preserve a particular image of the replica, to give yourself added protection, you can back up the image before beginning your offline processing.

Image Access Enabled mode

After enabling image access by selecting an image, the choices listed in the following table are available.

Table 43 Image access enabled mode

Command / Values and description

Access Another Image
    Select another snapshot to recover. You can roll forward or backward.
Disable Image Access
    Unmount the volume from the host at this replica before disabling. To disable image access.
    If you were in Logged access mode, any writes made directly to the LUN while image access was enabled will be discarded. Distribution from the journal to the storage will continue from the accessed image forward. If you were in Virtual access mode, the virtual LUN and any writes to it will be discarded. Distribution will continue from the last snapshot that was distributed before the image access. If you were in Virtual access with Roll image in background, the virtual LUN and any changes to it and any writes made directly to storage will be discarded. Distribution will continue from whatever snapshot the system has rolled to.
Undo Writes
    To undo the writes recorded in the image access log without disabling image access.
Roll to Image
    Only available for Virtual access without Roll image in background: To roll the stored replica to the selected image.
Enable Direct Access
    Caution: A full sweep will occur when you disable image access.
    To allow the host at this site to modify the replica. These writes are not logged in the image access log and cannot be undone except by a full sweep.
Move to previous point in time
    Roll the stored image back one snapshot.
Move to next point in time
    Roll the stored image forward one snapshot.

Failover commands

After enabling image access, the following possibilities for failing over to the enabled image are available.

Table 44 Failover commands

Command / Values and description

Fail over to Local/Remote copy
    Caution: This command will erase the journal at this site.
    To use the selected (local or remote) replica as the source. Transfer from production will stop.
    If the system has only a local or only a remote copy, but not both, this replica will automatically become the production source and the production source will become the local or remote replica. If the system has three copies (production, local, and remote), transfer to the third copy will not be resumed until production is restored as the source. In a three-copy configuration, to convert the current source to the production source, select Set local copy as production or Set remote copy as production.
Recover Production
    To repair the production source using the replica as the source. Recover Production is only available if the replica's journal is still intact; therefore, Recover Production is not available if you used Direct Image Access or after distributing a snapshot that is larger than the capacity of the journal (refer to Table 19 on page 155). Transfer from the production source will be paused. Transfer to a third copy will not resume until production is restored as the source. Host access to the selected replica will be blocked. You will only be able to restore the production source from the selected replica. While being restored, the role of the production replica will be "Production (being restored)." When the restore is completed, enable image access at the production source, and select the failover option Resume production. The production journal is discarded.
Set Local/Remote copy as production
    Only available after failing over to a local or remote replica in a three-copy configuration: To set the current replica as the production source. If the local replica is converted to the production source and there is a remote replica, the remote replica will require a full sweep. If the remote replica is converted to the production source and there is also a local replica, you must delete either the original production source or the local replica before the remote replica can become a production source.
    In other words, having two remote replicas is not supported.
Resume Production
    Available only after image access was enabled and after either a failover or a Recover Production operation was performed on the selected copy. Restores the production copy as the data source.

6 Notification of Events

This section explains how to configure RecoverPoint event notification.

Various events generate entries in the RecoverPoint system log. RecoverPoint reports events by logging them in the Management Application. RecoverPoint can also be configured to report events by e-mail, by sending SNMP traps to designated hosts, or by sending events to a syslog server. In addition, by default RecoverPoint sends system reports to EMC Customer Service.

The topics in this section are:
◆ Configuring event notification
◆ E-mail notification
◆ SNMP notification
◆ Syslog notification
◆ System reports
◆ System alerts
◆ Collecting system information
Configuring event notification

RecoverPoint supports the following types of event notification:
◆ "E-mail notification"
◆ "SNMP notification"
◆ "Syslog notification"
◆ "System reports"
◆ "System alerts"
◆ "Collecting system information"

E-mail notification

To configure email notification of system events, use the following procedure.
1. From the System menu, select System Settings > Alert Settings.
2. To enable sending emails, check Email System Enabled.
3. Specify an SMTP server's address to send the emails. The server address may be entered either in IP format or in DNS format.
4. Specify the email address of the sender. This address will be displayed as the sender of the email.
5. Specify which alerts you wish to send to whom. To do so, click the Add button. Then fill in the Add New Alert Rule dialog box. The settings are described in the following table.

Table 45 New Alert Rule settings

Setting / Values

Rule Enabled
    Enabled: To enable the specified rule
Topic
    Select which events to report, according to the component of the RecoverPoint system where the events occur:
    • All Topics
    • Site
    • RPA
    • Group
    • Splitter
Level
    Info: Messages are informative in nature, usually referring to changes in the configuration or normal system state.
    Warning: Message indicates a warning, usually referring to a transient state or an abnormal condition that does not degrade system performance.
    Error: Message indicates an important event that is likely to disrupt normal system behavior and/or performance.
Scope
    Normal: To report only the root cause for an entire set of detailed and advanced events. In most cases, these events are sufficient for effective monitoring of system behavior.
    Detailed: This category includes all events, with respect to all components, that are generated for use by users.
    Advanced: In specific cases (for instance, for troubleshooting a problem) EMC Customer Service may ask you to retrieve information from the advanced log events. These events contain information that is intended primarily for the technical support engineers.
Type
    Immediate: Each event notification is sent immediately as it occurs.
    Daily: Event notifications are sent once a day in a digest.

6. After specifying the Alert Rules, click Add to add the email addresses to be notified of events matching these alert rules.
   To edit an existing alert rule, select the rule and click Edit. To remove an alert rule, select it and click Remove.

Event notifications are sent by e-mail as specified.

SNMP notification

RecoverPoint supports the standard Simple Network Management Protocol (SNMP), including support for SNMP version 3 (SNMPv3). RecoverPoint supports various SNMP queries and can be configured to generate SNMP traps (notification events), which are sent to designated network management systems.

To configure SNMP notification of system events, use the following procedure.
1. From the System menu, select System Settings, and select SNMP Settings from the Navigation Pane. The SNMP Settings Pane of the System Settings dialog box is displayed.
2. Enter the desired values in the General Settings group box, according to the information in the following table.

Table 46 SNMP general settings

Setting / Range of values

Agent Enabled
    When selected, enables the RecoverPoint SNMP agent. The agent must be enabled to send SNMP traps (notification events).
Send Event Traps
    When selected, sends SNMP traps (notification events) to the RecoverPoint SNMP agent.
Event Trap Level
    Error: Only sends important messages indicating an event that is likely to disrupt normal system behavior or performance.
    Warning: In addition to Errors, sends warnings, usually referring to a transient state or an abnormal condition that does not degrade system performance.
    Info: In addition to Warnings and Errors, sends informative messages, usually referring to changes in the configuration or normal system state.
Trap Destination (local)
    (optional) The network management server at the local site to which you wish to deliver notifications. The address may be either in IP or DNS format. A DNS address will work only if a DNS server is configured in the RecoverPoint system.
Trap Destination (remote)
    (optional) The network management server at the remote site to which you wish to deliver notifications. The address may be either in IP or DNS format. A DNS address will work only if a DNS server is configured in the RecoverPoint system.

3. Enter the desired values in the Advanced Settings group box. If you are using SNMP version 1, enter the read-only community string (a type of password; but note that it is transmitted in cleartext). If you are using SNMP version 3, click Add, and enter user names and passwords. To remove a user name, click on it and click Remove. Click Apply to save your choices.
4. Click OK.

SNMP trap configuration

RecoverPoint supports the default MIB-II. The RecoverPoint MIB can be downloaded from powerlink.emc.com at the following location:
Home > Support > Software Downloads and Licensing > Downloads P–R > RecoverPoint

The application MIB OID is: 1.3.6.1.4.1.21658

The trap identifiers for RecoverPoint traps are as follows:
1. Info
2. Warning
3. Error

The product ID = Kashya

The RecoverPoint SNMP trap variables and their possible values are listed in the following table.

Table 47 RecoverPoint SNMP trap variables

Variable          OID       Description                        Value
dateAndTime       3.1.1.1   Date and time that trap was sent
eventID           3.1.1.2   Unique event identifier; the values are listed in Appendix A, page 277.
siteName          3.1.1.3   Name of site where event occurred
eventLevel        3.1.1.4   See values.                        3 = info, 4 = warning, 5 = warning off, 6 = error, 7 = error off
eventTopic        3.1.1.5   See values.                        1 = site, 2 = K-Box, 3 = group, 4 = splitter, 5 = management
hostName          3.1.1.6   Name of host
kboxName          3.1.1.7   Name of K Box
volumeName        3.1.1.8   Name of volume
groupName         3.1.1.9   Name of group
eventSummary      3.1.1.10  Short description of event
eventDescription  3.1.1.11  More detailed description of event

OMSA support

OpenManage Server Administrator (OMSA) support provides RecoverPoint customers with the ability to:
◆ Display Dell hardware event notifications, together with RecoverPoint event notifications, in real-time, in the same management console. The instructions for this procedure follow.
◆ Collect system information that includes Dell hardware configuration information. To do so, see "Collecting system information" on page 268.

Note: This feature is only relevant for systems in which the RPAs are running on Dell PowerEdge platforms.

RecoverPoint generates events that result in Simple Network Management Protocol (SNMP) traps. When an event with predefined characteristics (defined in RecoverPoint and OMSA MIBs) occurs on your system, the SNMP subagent sends information about the event, along with trap variables, to a specified management console.

To view OMSA and RecoverPoint events in real-time:
1. Configure the RecoverPoint SNMP agent to send event traps to your event management console.
   Note: In the RecoverPoint Management Application SNMP Settings dialog box, make sure that the Agent Enabled and Send Event Traps checkboxes are checked and that the IP address of the machine that has been dedicated as the event management console is defined as the Trap Destination value; see "SNMP notification" on page 255.
2.
Install a MIB browser on the machine dedicated as the event management console.
   Note: MIB browsers are used to manage SNMP-enabled devices and applications in a network. MIB browsers enable users to load MIBs, issue SNMP requests, and receive SNMP traps.
3. Open the MIB Browser on your management console, and:
   a. Enter the site management IP of the RPA cluster into the address bar.
      Note: The site management IP is a virtual IP address assigned to the RPA that is currently active. In the event of failure by this RPA, this IP address dynamically switches to the RPA that assumes operation.
   b. Enable the MIB browser's Trap Receiver.

In the Trap Receiver, both the RecoverPoint and OMSA events are displayed in real-time. RecoverPoint events are preceded by their severity level (Info, Warning, or Error); the Dell OMSA event OIDs have the prefix 1.3.6.1.4.1.674.

See also

The Dell OpenManage™ Server Administrator Version 1.9 SNMP Reference Guide can be found at the Dell website at:
http://support.dell.com/support/edocs/software/svradmin/1.9.2/en/SNMP/

Syslog notification

To configure notification of system events by syslog, use the following procedure.
1. From the System menu, select System Settings > Syslog Settings.
2. To enable sending notification by syslog, check Syslog Enabled.
3. Enter the desired values for the remaining fields in the Syslog Settings dialog box. Refer to the following table.

Table 48 Syslog settings

Setting / Range of values

Facility
    Select one of the available labels to be attached to all messages.
Level
    Info: Informative messages, usually referring to changes in the configuration or normal system state, will be sent in addition to warnings and errors.
    Warning: Warnings, usually referring to a transient state or an abnormal condition that does not degrade system performance, will be sent in addition to errors.
    Error: Only important messages indicating an event that is likely to disrupt normal system behavior or performance will be sent.
Target Host (local)
    (optional) Specify the syslog server at the local site to which you wish to deliver notifications. The address may be either in IP or DNS format. A DNS address will work only if a DNS server is configured in the RecoverPoint system.
Target Host (remote)
    (optional) Specify the syslog server at the remote site to which you wish to deliver notifications. The address may be either in IP or DNS format. A DNS address will work only if a DNS server is configured in the RecoverPoint system.

4. Click OK.

System reports

The system reports (SyR) mechanism provides one-way communication between a RecoverPoint installation and the EMC System Reports database. This mechanism supports two types of information: system alerts and system configuration reports.

System reports on the configuration and state of the RecoverPoint system are sent per site, every Sunday.

System alerts are sent in real-time (at the time that the events occur) to the EMC System Reports database, allowing EMC to provide pre-emptive support for RecoverPoint issues. The system report mechanism filters system alerts (see "System alerts" on page 266), and decides whether a service request should be opened with EMC Customer Service. If a service request is required, the system reports mechanism will automatically open one. One example of a possible alert rule: if an RPA is down for more than an hour, open a service request.

The system reports mechanism is enabled by default, but can be disabled at any time through the Management Application or CLI. By default, the system report and alerts are compressed and encrypted with RSA encryption using a 256-bit key before they are sent.
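The example alert rule mentioned above (an RPA that has been down for more than an hour) can be pictured as a simple threshold check. The following Python sketch is purely illustrative: the function name, its parameters, and the idea of tracking a "down since" timestamp are assumptions made for the example, and it does not describe how the system reports mechanism is actually implemented.

```python
from datetime import datetime, timedelta

# Threshold taken from the guide's example rule ("down for more than an hour").
RPA_DOWN_THRESHOLD = timedelta(hours=1)


def should_open_service_request(down_since, now, threshold=RPA_DOWN_THRESHOLD):
    """Return True when an RPA has been down for longer than the threshold.

    down_since is the time the RPA went down, or None if the RPA is up
    (hypothetical inputs, used only to illustrate the rule).
    """
    if down_since is None:
        return False
    return (now - down_since) > threshold


if __name__ == "__main__":
    went_down = datetime(2010, 3, 1, 8, 0)
    print(should_open_service_request(went_down, datetime(2010, 3, 1, 8, 30)))
    print(should_open_service_request(went_down, datetime(2010, 3, 1, 9, 30)))
```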
Note: The system report mechanism will only send system alerts and reports provided the SMTP settings are configured and the Software Serial IDs, provided with each RecoverPoint license, are entered into the Management Application GUI or CLI.

By default, system reports are sent through FTPS to SyR's FTPS server, but they can be configured for transfer through a customer's ESRS server (by way of SMTP), or a customer's designated SMTP server. System reports are sent in XML format.

The following sections describe the handling of system reports:
◆ "Before you begin"
◆ "System report operations"
◆ "Best practice"

Before you begin

Before you begin:
◆ Ensure that your Software Serial IDs have been entered into the Management Application; see "Entering software serial IDs" on page 261.
◆ Decide upon a transfer method for the system reports and configure the required transfer settings; see "Configuring a server for transfer" on page 262.

Entering software serial IDs

Two Software Serial IDs, one for each site in the RecoverPoint installation, are supplied with each RecoverPoint license. After the product's installation, you must enter these IDs into the Management Application GUI (or CLI) to enable the sending of system alerts and/or reports to the system report mechanism.

To enter your Software Serial IDs into the Management Application:
1. From the System menu, select System Settings > Account Settings.
2. Enter the Software Serial ID of the first site in the Software serial ID (<site1>) field.
3. Enter the Software Serial ID of the second site in the Software serial ID (<site2>) field.
4. Click the Apply button.

Configuring a server for transfer

By default, system reports are transferred through RecoverPoint's built-in FTPS server.
If FTPS is your required method of transfer, no further configuration is necessary and you can skip to the "System report operations" section.

Note: If you wish to transfer system reports using FTPS, ensure that ports 990 and 989 are open and available for FTPS traffic. If you wish to transfer system reports using SMTP or ESRS, ensure that port 25 is open and available for SMTP traffic.

To define ESRS or SMTP as your method of transfer, click System in your main RecoverPoint menu bar and select System Settings > System Report Settings.
◆ To transfer system reports through an SMTP server:
   a. In the Transfer Method section, select the SMTP radio button.
   b. In the SMTP Server Address field, specify the IP address or DNS name of your dedicated SMTP server, in IPv4 or IPv6 format.
   c. Click the Apply button.
◆ To transfer system reports through an ESRS gateway:
   a. In the Transfer Method section, select the ESRS radio button.
   b. In the ESRS Gateway IP Address field, specify the IP address of your ESRS gateway in IPv4 format.
   c. Click the Apply button.

System report operations

Most of the system report operations are performed through the System Report Settings screen of the RecoverPoint Management Application GUI. To access the system report settings, from the main RecoverPoint menu bar, click the System menu, and select System Settings > System Report Settings.

The following sections describe the handling of system reports:
◆ "Operations in the Management Application"
◆ "Additional CLI options"

Operations in the Management Application

The following operations can be performed through the Management Application interface.

Enabling system reports
To enable the automatic sending of weekly system reports:
1. Check the System Reports checkbox.
2. Click the Apply button.

Disabling system reports
To disable the automatic sending of weekly system reports:
1.
Uncheck the System Reports checkbox.
2. Click the Apply button.

Enabling system alerts
To enable the sending of system alerts within the system reports:
1. Check the System Alerts checkbox.
2. Click the Apply button.

Disabling system alerts
To disable the sending of system alerts within the system reports:
1. Uncheck the System Alerts checkbox.
2. Click the Apply button.

Encrypting system reports
To encrypt the output of the system reports and alerts with RSA encryption using a 256-bit key before sending:
1. Check the Encrypt checkbox.
2. Click the Apply button.

Compressing system reports
To compress the output of the system reports and alerts before sending:
1. Check the Compress checkbox.
2. Click the Apply button.

Note: Additional system report operations are available through the RecoverPoint CLI; see "Additional CLI options" on page 265.

Additional CLI options

The following system report operations can only be performed through the CLI. See the EMC RecoverPoint CLI Reference Guide for more information.

Viewing system reports
To view the current system report information, from the RecoverPoint Command Line Interface, run the get_system_report CLI command.

Sending system reports to a specified e-mail address
To send the current system report information to a specified e-mail address, from the RecoverPoint Command Line Interface, run the get_system_report CLI command.

Displaying the current system report settings
To display the current system report settings, from the RecoverPoint Command Line Interface, run the get_system_report_settings CLI command.
Best practice

The following configuration is the best practice recommended by EMC:
System Reports = enabled
System Alerts = enabled
Encrypt = enabled
Compress = enabled

System alerts

RecoverPoint appliances send system events (whose scope is normal) to the EMC System Reports database, in real-time, through the System Alerts mechanism, provided the SMTP settings are configured and the Software Serial IDs, supplied with each RecoverPoint license, are entered into the Management Application or CLI.

The system alert mechanism will filter these events to decide whether a service request should be opened with EMC Customer Service. If a service request is required, the system alerts mechanism will automatically open one. System alerts are used by EMC to provide pre-emptive support for RecoverPoint issues, and are deployed according to predefined alert rules. One example of a possible alert rule: if an RPA is down for more than an hour, send an alert. This mechanism is enabled by default, but can be disabled at any time through the Management Application GUI or the CLI.

The following sections describe the handling of system alerts:
◆ "Before you begin"
◆ "System alert operations"

Before you begin

System alerts are sent through a designated SMTP server. Before you begin, enable e-mail notifications and define the SMTP server through which alerts will be sent (see "E-mail notification" on page 253).

Two Software Serial IDs, one for each site in the RecoverPoint installation, are supplied with each RecoverPoint license. After the product's installation, you must enter these IDs into the Management Application GUI (or CLI) for the system alert mechanism to work.

To enter your Software Serial IDs into the Management Application:
1. From the System menu, select System Settings > Account Settings.
2. Enter the Software Serial ID of the first site in the Software serial ID (<site1>) field.
3.
Enter the Software Serial ID of the second site in the Software serial ID (<site2>) field. 4. Click the Apply button. 266 EMC RecoverPoint Release 3.3 Administrator’s Guide Notification of Events System alert operations All system alert operations can be accessed via the Alert Settings dialog box of the RecoverPoint Management Application GUI. To access this dialog box, from the System menu, select System Settings > Alert Settings. To view system alerts: 1. Select Logs in the Navigation Pane. 2. Identify the warning and error events whose scope is Normal. System alerts 267 Notification of Events Collecting system information The past thirty days of system information can be collected from RPAs and splitters. This information is used to analyze and resolve support cases, and can be collected through: ◆ the Deployment Manager, see EMC RecoverPoint Deployment Manager Product Guide. ◆ the Management Application GUI, see “How to collect system information” on page 269. Upon completion of the collection process, an output file is placed in: http://[RPA IP address]/info or https://[RPA IP address]/info To retrieve the output file, you must log in as a user with webdownload permissions. See “Access control” on page 119 for more information on users with webdownload permissions. Note: The system information collected from RPAs also includes Dell OMSA hardware configuration information, provided the RPAs are running on Dell PowerEdge platforms. The following sections describe the process of system information collection: ◆ ◆ ◆ ◆ Process alternatives “Process alternatives” “Process errors” “Splitter credentials” “How to collect system information” If for any reason (connectivity issues, etc.) the collection process fails to collect the specified host information, you can collect the information directly from individual hosts on which the feature is enabled. 
To do so: ◆ for Windows-based hosts; from the Program Files\KDriver\hic directory on the host, run: host_info_collector 268 EMC RecoverPoint Release 3.3 Administrator’s Guide Notification of Events ◆ for Solaris or AIX-based hosts; from the kdriver/info_collector directory on the host, run: info_collect.sh Process errors Errors will occur in the following cases: ◆ If connection with an RPA is lost while info collection is in process, no information is collected. In this case, run the process again. If the collection from the remote site failed because of a WAN failure, run the process locally at the remote site. Splitter credentials How to collect system information ◆ If a simultaneous info collection process is being performed on the same RPA, only the collector that established the first connection can succeed. ◆ If an FTP failure occurs, the entire process fails. In order to collect system information for SANTap and CLARiiON splitters, you must first enter splitter credentials. You can enter splitter credentials as part of the following procedures, through the specified interfaces: ◆ When collecting system information through the Collect System Information Wizard, see “How to collect system information” on page 269. ◆ When adding new splitters, through the Add Splitter Wizard, see “Adding splitters” on page 128. ◆ When managing existing splitters, through the Splitter Properties dialog box, see “Manually attaching volumes to splitters” on page 173. In the RecoverPoint GUI, system information is collected using the Collect System Information Wizard. To collect system information: 1. Select System > Collect System Information from the main system menu. The Collect System Information Wizard is displayed. Collecting system information 269 Notification of Events 2. Configure the collection process by entering the desired values into the fields of the first Collect System Information dialog box screen. Refer to the following table. 
Table 49 Collect system information settings

| Setting | Description |
|---|---|
| Include information from | Default=yesterday, the current hour. Specify the date and time (in GMT or your local time) of the first system information that you want to include in the output. Although the system information of the past thirty days is available for collection, only three days of system information can be collected at a time. Note: GMT is not adjusted for daylight saving time. |
| To | Default=today, the current hour. Specify the date and time (in GMT or your local time) of the latest system information that you want to include in the output. Although the system information of the past thirty days is available for collection, only three days of system information can be collected at a time. Note: GMT is not adjusted for daylight saving time. |
| Include system components | The system components whose system information to include in the output. Possible options are RPAs only, Splitters only, or Splitters and RPAs. |
| Include core files | Optional. Default=disabled. Whether or not to include core files in the output. Note: Core files may be large. Consequently, including these files in the collection process may substantially increase collection time. |
| Include both sites | Optional. Default=disabled. Whether to include the system information of components from both sites in the output. When disabled, only the system information of the site from which the collection process is triggered is collected. |
| Copy output file to an FTP server | Optional. Default=disabled. Whether to copy the collection process output file to an FTP server. When enabled, RecoverPoint will create a copy of the collection process output file, and upload it to the specified FTP server. |
| FTP server address | Optional. The IP or DNS address of the FTP server to which to upload the collection process output file. For example: 10.10.180.10 or ftp.EMC.com |
| Port | Optional. The port through which to access the specified FTP server. |
| Username | Optional. The username to use when logging into the specified FTP server. |
| Password | Optional. The password to use when logging into the specified FTP server. |
| Remote path | Optional. The path to the copy of the output file stored on the specified FTP server. For example: / (to access the rootdir) |
| Override default file name (Not recommended) | Optional. Default=disabled. Whether to override the default file name of the output file placed on the FTP server. When enabled, RecoverPoint renames the output file uploaded to the FTP server according to the new file name specified in the Filename field. Note: It is recommended to keep this setting disabled, and the default file name as is. |
| New file name | Optional. Only relevant when the Specify filename setting is enabled. The new file name for the output file placed on the FTP server. Note: It is recommended to keep the Specify filename setting disabled, and the default file name as is. |

3. Click the Next button to continue to the next screen of the Collect System Information Wizard.
4. If you set the Include system components field to RPAs only, skip this step. Otherwise, the Splitter Selection screen of the Collect System Information Wizard is displayed. In the Splitter Selection screen:
   a. Select the splitters whose system information you want to include in the collection process and click the Next button. A screen is displayed for each splitter for which login credentials must be defined.
   b. If you have already configured login credentials for all of the SANTap and CLARiiON splitters that you selected in the last step (see “Splitter credentials” on page 269), click Next until the Summary screen is displayed. Otherwise, enter credentials for each selected splitter, and click the Next button. When credentials have been defined for each selected SANTap and CLARiiON splitter, the Summary screen is displayed.
5. In the Summary screen:
   a. Review the displayed information to verify that your configuration settings are correct. To do so:
      – Verify that a green checkmark exists in the Status column of all listed splitters and RPAs, and that the text Action succeeded exists in the Details column of all listed splitters and RPAs.
      – If any splitter or RPA does not have a green checkmark in its Status column, see “Process alternatives” on page 268.
      Note: If required, click the Back button to edit your settings.
   b. Click the Next button to initiate the collection process.
      Note: You can click the Cancel button at any time during the collection process to immediately stop the process.
      The System Information Results screen of the Collect System Information Wizard is displayed.
6. In the System Information Results screen:
   a. Verify that the specified system information has been successfully collected. To do so:
      – Verify that a green checkmark exists in the Status column of all listed splitters and RPAs, and that the text Action succeeded exists in the Details column of all listed splitters and RPAs.
      – If any splitter or RPA does not have a green checkmark in its Status column, see “Process alternatives” on page 268.
   b. Retrieve the locally stored output file. To do so, click the Output file (HTTP) link or the Output file (HTTPS) link, and enter the username and password of a user with webdownload privileges. See “Access control” on page 119 for more information on users with webdownload permissions.
7. Click the Finish button to exit the wizard.
8. If you enabled the Copy output file to an FTP server option in Step 2, you can now retrieve the remote copy of the output file from the FTP server. To do so:
   a. Open a Web browser window.
   b. Enter the IP address or DNS name of the FTP server you specified in Step 2 into the address bar.
   c. At the login prompt, enter the Username and Password you specified in Step 2.
   d. Browse to the Remote path that you specified in Step 2.

Chapter 7 Host Cluster Support

The RecoverPoint system supports both local and remote high-availability host clusters. The topics in this section are:
◆ Configuring RecoverPoint cluster support

Note: When a RecoverPoint installation includes reservation-aware host clusters and host splitters, you must install a KDriver on any host that can access any replication volume on the storage, or on any host that could access a replication volume in the absence of the cluster configuration. You must then add the splitter to the RecoverPoint configuration, and attach that splitter to all volumes the splitter could access.

Configuring RecoverPoint cluster support
To configure RecoverPoint for applications running on a host cluster, follow the instructions for configuring replication, in “Starting Replication” on page 127. The following changes are required to the standard procedures:
◆ Place all volumes that are resources of the host cluster in a single consistency group. Simplify management by assigning the name of the host cluster as the name of the consistency group.
◆ When configuring the consistency group, in the Policy Tab, General Settings, enable Reservations Support. In Advanced Settings, verify that Global cluster mode = None.
◆ The journal volumes must not be resources of the host cluster.
◆ In Advanced Policies, Reservations Policy = SCSI-2 may be required. Refer to the instructions for Reservations Policy in “Configuring copy policies” on page 152.
◆ Assign each replication set the same name as its disk resource in the cluster.

Note: If all cluster nodes (hosts) at the side are down, you may not be able to create replication volumes. To correct this problem, either bring up the nodes, or run a rescan_san command (from the CLI), with volumes=FULL.

◆ Best practice: Perform first-time failover as soon as configuration is completed. First-time failover instructions differ between host clusters. Refer to the Technical Note for your host cluster to perform first-time failover.

Prior to enabling the consistency group, the System Pane may show an Error on the source-side storage and display the message “Volume cannot be accessed by any RPA.” The error status may also be displayed in the Status Tab and Replication Sets Tab display for the consistency group. The errors are removed upon enabling the consistency group.

Appendix A Events

This section presents a comprehensive list of the events that may occur during RecoverPoint operation, as reported in the logs.
◆ Introduction
◆ Normal events
◆ Detailed events

Introduction
RecoverPoint generates an event log in response to events in the RecoverPoint system. To view the events in the event log, see “Event log management” on page 210. In addition, RecoverPoint offers several options (e-mail, SNMP, and syslog) for sending event notifications (see “Notification of Events” on page 251).
Table 50 on page 279 and Table 51 on page 301 list system events and their descriptions. The events are divided into two tables, according to scope: normal and detailed.
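To make the structure of the event listings concrete, the following Python sketch models an event record with the five columns used in Table 50 (event ID, topic, level, description, trigger) and selects the warning and error events from a handful of rows. The sample rows are copied from Table 50; the Event class and the by_level helper are hypothetical names chosen for illustration and are not part of any RecoverPoint interface.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Event:
    """One row of an event table (hypothetical model, not a RecoverPoint API)."""
    event_id: int
    topic: str
    level: str
    description: str
    trigger: str


# A few rows copied from Table 50 (normal scope).
SAMPLE_EVENTS = [
    Event(1000, "Management", "Info",
          "User logged in. (User <user>)", "User login"),
    Event(1001, "Management", "Warning",
          "Login failed. (User <user>)", "User failed to login"),
    Event(1011, "Management", "Error",
          "Grace period expired. You must install an activation code "
          "to activate your RecoverPoint license.", "Grace period expired"),
    Event(3001, "RPA", "Warning",
          "RPA is no longer a cluster member. (RPA <RPA>)",
          "An RPA is known to be disconnected from site control"),
]


def by_level(events: Iterable[Event], *levels: str) -> List[Event]:
    """Return the events whose level exactly matches one of the given levels."""
    wanted = set(levels)
    return [e for e in events if e.level in wanted]


if __name__ == "__main__":
    for event in by_level(SAMPLE_EVENTS, "Warning", "Error"):
        print(event.event_id, event.topic, event.level, event.description)
```

Matching on the exact level string keeps "Warning" distinct from "Warning Off", so an event that clears a condition is not mistaken for the event that raised it.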
278 EMC RecoverPoint Release 3.3 Administrator’s Guide Events Normal events The normal events include both “root-cause” events (a single description for an event that can spawn many individual events) and other selected basic events. Table 50 Listing of normal events and their descriptions Event ID Topic Level Description Trigger 1000 Management Info User logged in. (User <user>) User login 1001 Management Warning Login failed. (User <user>) User failed to login 1003 Management Warning Failed to generate SNMP trap. (Trap contents) System failed to send SNMP trap 1004 Management Warning Failed to send email alert to specified address. (Address <email address>, Event summary <summary>) System failed to mail an email alert 1005 Management Warning Failed to update file. (File <file>) Failure to update local config file (passwords, ssh keys, syslog configuration, SNMP configuration) 1006 Management Info Settings changed. (User <user>, Settings <settings>) User changed settings 1007 Management Warning Settings change failed. (User <user>, Settings <settings>, Reason <reason>) Failure to change settings 1008 Management Info User action succeeded. (User <user>, Action <action>) User performed one of the following actions: bookmark_image, clear_markers, set_markers, undo_logged_writes, set_num_of_streams 1009 Management Warning User action failed. (User <user>, Action <action>, Reason <reason>) One of the following actions failed: bookmark_image, clear_markers, set_markers, undo_logged_writes, set_num_of_streams 1011 Management Error Grace period expired. You must install an activation code to activate your RecoverPoint license. Grace period expired 1014 Management Info User bookmarked an image. (Group <group>, Snapshot <bookmark>) User bookmarked image Normal events 279 Events Table 50 280 Listing of normal events and their descriptions Event ID Topic Level Description Trigger 1015 Management Warning RPA to storage multipathing problem. 
(RPA <RPA>, Volume <volume>) Single path only or more paths between RPA and volume is not available. 1016 Management Warning Off RPA multipathing problem fixed. (RPA <RPA>, Volume <volume>) All paths between RPA and volume are available. 1017 Management Warning RPA to host multipathing problem. (RPA <RPA>, Splitter <splitter>) One or more paths between RPA and splitter is not available. 1018 Management Warning Off RPA multipathing problem fixed. (RPA <RPA>, Splitter <splitter>) All paths between RPA and splitter are available. 1019 Management Warning User action performed successfully. (Markers cleared. Group <group>, <copy>) (Replication set attached as clean. Group <group>) User cleared markers or attached replication set as clean. 1021 RPA Error An error has occurred in the firmware of an HBA. Please collect system information and send the results to EMC Customer Service as soon as possible. To collect system information, log in as boxmgmt and run Collect System Info from the Diagnostics menu. An internal error occurred in the HBA firmware. 3001 RPA Warning RPA is no longer a cluster member. (RPA <RPA>) An RPA is known to be disconnected from site control 3005 RPA Error Settings conflict between sites. (Reason <reason>) Settings conflict between the sites was discovered 3006 RPA Error Off Settings conflict between sites resolved by user. (Using Site <site> settings) Settings conflict between the sites was resolved by user 3020 RPA Warning Off The link to RPA at the other site has been restored. 3021 RPA Warning Error occurred in link to RPA at the other site. 3030 RPA Warning RPA switched path to storage. (RPA <RPA>, Volume <volume>) A storage path change was initiated by the RPA 4056 Group Warning No image found in journal to match query. (Group <group>) No image was found in the journal to match the query. 
EMC RecoverPoint Release 3.3 Administrator’s Guide Events Table 50 Listing of normal events and their descriptions Event ID Topic Level Description Trigger 4090 Group Warning Image access log is 90% full. When log is full, writing by hosts at target side will be disabled. (Group <group>) Image access log is 90% full 4106 Group Warning Capacity reached -- cannot write additional markers for this group to <production volume>. Starting full sweep. (Group <group>) Disk space for markers filled (for the group) 4100 Group Warning Group created. Creating a new group modifies the load distribution across RPAs. To balance the write load across all RPAs, run the balance_load command in seven days, and apply the recommendation. New group has been created. 4117 Group Warning Virtual access buffer is 90% full. When buffer is full, writing by hosts at target side will be disabled. (Group <group>) Usage of virtual access buffer has reached 90%. 4131 Group Warning Transfer paused or synchronizing for unusually long time For given copy, time that transfer has been paused or synchronizing exceeds pre-set value. 4132 Group Warning Off Transfer has resumed (following long pause or synchronization) For copy on which transfer had been paused or synchronizing for unusually long time, transfer has now restarted. 4133 Group Error Copy regulation has started. User requests copy regulation. 4134 Group Error Off Copy regulation has ended due to a user action or internal timeout. User requests to end copy regulation. 4135 Group Info Data transfer to copy paused by user. User pauses transfer to copy. 4136 Group Info Data transfer to copy resumed by user. User resumes transfer to copy. 4137 Group Info Snapshot consolidation has been successful. System successfully consolidates snapshots, according to user’s request. 4138 Group Warning Snapshot consolidation has failed. System does not successfully consolidate snapshots, according to user’s request. 
4174 Group Info Migration of configuration data has started. Normal events 281 Events Table 50 282 Listing of normal events and their descriptions Event ID Topic Level Description Trigger 4175 RPA Info System has entered RPA addition maintenance mode. System begins RPA addition maintenance mode. 4176 RPA Info System has entered major version upgrade maintenance mode. System begins major version upgrade maintenance mode. 4177 RPA Info System has entered minor version upgrade maintenance mode. System begins major version upgrade maintenance mode. 4178 RPA Info System has entered RPA replacement maintenance mode. System enters RPA replacement maintenance mode. 4179 RPA Info System has exited RPA addition maintenance mode. System exits RPA addition maintenance mode. 4180 RPA Info System has exited major version upgrade maintenance mode. System exits major version upgrade maintenance mode. 4181 RPA Info System has exited minor version upgrade maintenance mode. System exits minor version upgrade maintenance mode. 4182 RPA Info System has exited RPA replacement maintenance mode. System exits RPA replacement maintenance mode. 4210 System Warning A virtual machine is no longer being replicated. The LUNs of the virtual machine are no longer configured for replication. 4211 System Warning A virtual machine is now partially replicated. Some LUNs of the virtual machine are no longer configured for replication. 4212 System Info A virtual machine is now fully replicated. All LUNs of the virtual machine are now configured for replication. 4213 System Error vCenter Server is not accessible. vCenter Server credentials may be incorrect or there may be a problem with the physical connection between the vCenter Server and RecoverPoint. 4300 Site Warning Writing rate to production journal is slow For 90% of last 10 minutes, the production journal performed poorly, which is defined as a delta marker (or, in SANTap, pwlhandler) flush > 0.5 second. 
EMC RecoverPoint Release 3.3 Administrator’s Guide Events Table 50 Listing of normal events and their descriptions Event ID Topic Level Description Trigger 4301 RPA Warning Box is unable to handle incoming data rate, due to high compression level WAN compression > 70% of total CPU capacity averaged over the last 10 minutes 4302 Group Warning Journal is unable to handle incoming data rate Accumulator distributorPhase1TotalTime > 0.65 4303 Group Warning Journal and replication volumes at copy In last 10 minutes of distribution, are unable to handle incoming data rate phase2 time > 80%, and fast forward > 10% 4304 Group Warning Remote storage is unable to handle incoming data rate; regulating distribution Accumulator distributorReceiverRegulation > 0.15 4305 Group Warning Remote site is unable to handle incoming data rate Incoming data rate exceeds distribution rate, and no problem detected with journal performance. 4306 Site Warning Data transfer rate between sites is slow One of the following accumulators exceeds a threshold: • vacancyObserverAccumulator • transmitterReceiverWaitingTime • transmitterReceiverCreditWaiting Time • transmitterMPIWaitTime 4307 Site Warning Reading rate from local replication volumes during resynchronization is slow Resync during last 10 minutes, and reading rate of the local replication volumes at copy is < 10 Mbytes/sec 4308 Site Warning Reading rate from replication volumes Resync during last 10 minutes, at copy during resynchronization is slow and reading rate of the replication volumes at copy is < 10 Mbytes/sec 4309 RPA Warning Box utilization reached 80% Utilization exceeds 80% 4310 Group Warning Link utilization reached 80% Utilization exceeds 80% 4311 Group Info Load balancing recommendation User ran the balance_load CLI command. 4400 Site Warning Off Writing rate to production journal is normal Thirty minutes after occurrence of Event 4300, the production journal is successfully handling incoming writes. 
Normal events 283 Events Table 50 284 Listing of normal events and their descriptions Event ID Topic Level Description Trigger 4401 RPA Warning Off Box handling incoming data rate successfully Thirty minutes after occurrence of Event 4301, RPA is successfully handling incoming data. 4402 Group Warning Off Journal handling incoming data rate successfully Thirty minutes after occurrence of Event 4302, journal is successfully handling incoming data. 4403 Group Warning Off Journal and replication volumes at copy handling incoming data rate successfully Thirty minutes after occurrence of Event 4303, journal and replication volumes are successfully handling incoming data. 4404 Group Warning Off Remote storage no longer regulating distribution Thirty minutes after occurrence of Event 4304, remote storage is successfully handling incoming data. 4405 Group Warning Off Remote site handling incoming data rate successfully Thirty minutes after occurrence of Event 4305, remote site is successfully handling incoming data. 4406 Site Warning Off Data transfer rate between sites is normal Thirty minutes after occurrence of Event 4306, data transfer between sites is no longer slow. 4407 Site Warning Off Reading rate from local replication volumes during synchronization is normal Thirty minutes after occurrence of Event 4307, reading rate from local replication volumes during synchronization is no longer slow. 4408 Site Warning Off Reading rate from replication volumes at copy during synchronization is normal Thirty minutes after occurrence of Event 4308, reading rate from replication volumes at copy during synchronization is no longer slow. 4409 RPA Warning Off Box utilization is normal Thirty minutes after occurrence of Event 4310, and current RPA utilization is < 80%. 4410 Group Warning Off Link utilization is normal Thirty minutes after occurrence of Event 4310, and current link utilization is < 80%. 5008 Splitter Warning Host shut down. 
(Host Splitter <splitter>) Host shutdown/restarted EMC RecoverPoint Release 3.3 Administrator’s Guide Events Table 50 Listing of normal events and their descriptions Event ID Topic Level Description Trigger 5010 Splitter Warning Splitter stopped -- depending on policy, writing by host may be disabled for some groups, and a full sweep may be required for other groups. (Splitter <splitter>) Splitter stopped by user without detaching volumes; policy implemented per volume 5011 Splitter Info Splitter stopped (Splitter <splitter>); full sweep will be required. Splitter stopped by user after removing volumes; volumes disconnected 5012 Splitter Warning Splitter stopped (Splitter <splitter>); writes to replication volumes disabled. Splitter stopped; host access to all volumes disabled 5017 Splitter Error Off Splitter version is supported. Splitter version is supported. 5018 Splitter Error Splitter version is not supported. Splitter version is not supported. 5052 Group Info RecoverPoint was successfully configured to replicate synchronously to one of the replicas of this group, but since then the splitter at the replica site has been replaced. The new splitter does not support synchronous replication, and consequentially, all replication has been stopped. Replace your current splitter with one that supports synchronous replication. If problem persists, contact EMC Customer Service. Splitter was downgraded to a version that does not support synchronous replication. 5053 Group Error RecoverPoint is configured to replicate synchronously to a replica of this group, and the splitter at the replica site now supports synchronous replication. Splitter was upgraded to a version that does support synchronous replication. 5054 Group Error Consistency group is configured with a LUN greater than 2 TB and a CLARiiON splitter version that does not support the LUN. Splitter was downgraded to a version that does not support LUNs greater than 2 TB. 
5055 Group Error Off Consistency group was configured with a LUN greater than 2 TB and a CLARiiON splitter version that did not support the LUN. Now the splitter version supports the LUN. Splitter was upgraded to a version that supports LUNs greater than 2 TB. Normal events 285 Events Table 50 286 Listing of normal events and their descriptions Event ID Topic Level Description Trigger 10000 - Info Changes occurring in system. Analyzing... - 10001 - Info System changes have occurred. System is now stable. - 10002 - Info System activity has not stabilized -issuing an intermediate report. - 10101 - Error Cause of system activity unclear. To obtain more information, filter events log using Detailed scope. - 10102 - Info Site control recorded internal changes that do not impact system operation. - 10201 - Info Settings have changed. - 10202 - Info System changes have occurred at the other site. - 10203 - Error RPA cluster was down. - 10204 - Error One or more RPAs are disconnected from the RPA cluster. - 10205 - Brief Error An error in communication has occurred in an internal process. - 10206 - Info Internal process was restarted. - 10207 - Error Internal process was restarted. - 10210 - Error Initialization is experiencing high load conditions. - 10211 - Error Temporary problem in Fibre Channel link between splitters and RPAs. - 10212 - Error Off Temporary problem in Fibre Channel link between splitters and RPAs -resolved. - 10501 - Info Synchronization completed. - 10502 - Info Access to target-side image enabled - EMC RecoverPoint Release 3.3 Administrator’s Guide Events Table 50 Listing of normal events and their descriptions Event ID Topic Level Description Trigger 10503 - Error Transferring latest snapshot before pausing transfer (no data loss). - 10504 - Info Journal cleared. - 10505 - Info Undoing of writes to image access log completed. - 10506 - Info Roll to physical image complete -logged access to physical image now enabled. 
- 10507 - Info Due to system changes, the journal was temporarily out of service. Journal is now available. - 10508 - Info All data flushed from local-side RPA; automatic failover will proceed. - 10509 - Info Initial long resync completed. - 10510 - Info Following a pause transfer, system now cleared to restart transfer. - 10511 - Info Finished recovering replication backlog. - 12001 - Error Splitter is down. - 12002 - Error Error occurred in all links to the other site; the other site may be down. - 12003 - Error Error occurred in link to RPA at the other site. - 12004 - Error Error in data link over WAN -- all RPAs unable to transfer replicated data to other site. - 12005 - Error Error in data link over WAN -- RPA unable to transfer replicated data to other site. - 12006 - Error RPA is disconnected from the RPA cluster. - 12007 - Error All RPAs are disconnected from the RPA cluster. - 12008 - Error RPA is down. - Normal events 287 Events Table 50 288 Listing of normal events and their descriptions Event ID Topic Level Description Trigger 12009 - Error Group entered high load. If high load persists, consider running the balance_load command and applying the load balancing recommendation, or manually modifying the preferred RPA of each group according to the recommendation. - 12010 - Error Journal error -- full sweep to be performed after error is corrected. - 12011 - Error Image access log or virtual buffer is full -- writing by hosts at target side is disabled. - 12012 - Error Unable to enable virtual access to image. - 12013 - Error Unable to enable access to specified image. - 12014 - Error Fibre Channel link between all RPAs and all splitters and storage is down. - 12016 - Error Fibre Channel link between all RPAs and all storage is down. - 12022 - Error Fibre Channel link between RPA and splitters or storage volumes (or both) is down. - 12023 - Error Fibre Channel link between RPA and all splitters and storage is down. 
- 12024 - Error Fibre Channel link between RPA and all splitters is down. - 12025 - Error Fibre Channel link between RPA and all storage is down. - 12026 - Error Error occurred in link to RPA at the other site. - 12027 - Error All replication volumes attached to the consistency group (or groups) are not accessible. - 12029 - Error Fibre Channel link between all RPAs and one or more volumes is down. - EMC RecoverPoint Release 3.3 Administrator’s Guide Events Table 50 Listing of normal events and their descriptions Event ID Topic Level Description Trigger 12033 - Error Repository volume is not accessible; data may be lost. - 12034 - Error Writes to storage occurred without corresponding writes to RPA. - 12035 - Error Error in WAN link to RPA cluster at other site. - 12036 - Error Renegotiation of transfer protocol requested. - 12037 - Error All volumes attached to the consistency group (or groups) are not accessible. - 12038 - Error All journal volumes attached to the consistency group (or groups) are not accessible. - 12039 - Error Long resync started. - 12040 - Error System has detected bad sectors in volume. - 12041 - Error Splitter is up. - 12042 - Error Splitter write may have failed (while group was transferring data). Synchronization will be required. - 12043 - Error Splitter writes may have failed. - 12044 - Error Problem with IP link between RPAs (in at least in one direction). - 12045 - Error Problem with all IP links between RPAs. - 12046 - Error Problem with IP link between RPAs. - 12047 - Error RPA network interface card (NIC) problem. - 12048 - Error Splitter version is not supported. - 12049 - Info RPA has entered maintenance mode. - Normal events 289 Events Table 50 290 Listing of normal events and their descriptions Event ID Topic Level Description Trigger 12050 - Warning RecoverPoint has dynamically started replicating asynchronously to one of the replicas of this group. The group will now be initialized. 
During initialization, data is not transferred synchronously. | -
12054 | - | Error | RPA to storage connectivity failure. | -
12055 | - | Error | RPA to CLARiiON storage connectivity failure. | -
12056 | - | Error | RPA to CLARiiON storage/splitter connectivity failure. | -
12072 | - | Error | Fibre channel link between <RPAs> is down. | -
14001 | - | Error Off | Splitter is up, and version is supported. | -
14002 | - | Error Off | All WAN links to other site restored. | -
14003 | - | Error Off | The link to RPA at the other site has been restored. | -
14004 | - | Error Off | Data link over WAN restored -- all RPAs able to transfer replicated data to other site. | -
14005 | - | Error Off | Data link over WAN restored -- RPA able to transfer replicated data to other site. | -
14006 | - | Error Off | Connection of RPA to the RPA cluster is restored. | -
14007 | - | Error Off | Connection of all RPAs to the RPA cluster is restored. | -
14008 | - | Error Off | RPA is up. | -
14009 | - | Error Off | Group exited high load -- initialization completed. If high load persists, consider running the balance_load command and applying the load balancing recommendation, or manually modifying the preferred RPA of each group according to the recommendation. | -
14010 | - | Error Off | Journal error corrected -- full sweep required. | -
14011 | - | Error Off | Image access log or virtual buffer no longer full. | -
14012 | - | Error Off | Virtual access to image enabled. | -
14013 | - | Error Off | No longer trying to access a diluted image. | -
14014 | - | Error Off | Fibre Channel link between all RPAs and all splitters and storage is restored. | -
14016 | - | Error Off | Fibre Channel link between all RPAs and all storage is restored. | -
14022 | - | Error Off | Fibre Channel link that was down between RPA and splitters or storage volumes (or both) is restored. | -
14023 | - | Error Off | Fibre Channel link between RPA and all splitters and storage is restored.
-
14024 | - | Error Off | Fibre Channel link between RPA and all splitters is restored. | -
14025 | - | Error Off | Fibre Channel link between RPA and all storage is restored. | -
14026 | - | Error | The link to RPA at the other site has been restored. | -
14027 | - | Error Off | Access to all volumes attached to the consistency group (or groups) is restored. | -
14029 | - | Error Off | Fibre Channel link between all RPAs and one or more volumes is restored. | -
14033 | - | Error Off | Access to repository volume is restored. | -
14034 | - | Error Off | Replication consistency in writes to storage has been restored. | -
14035 | - | Error Off | WAN link to RPA at other site is restored. | -
14036 | - | Error Off | Renegotiation of transfer protocol completed. | -
14037 | - | Error Off | Access to all replication volumes attached to the consistency group (or groups) has been restored. | -
14038 | - | Error Off | Access to all journal volumes attached to the consistency group (or groups) is restored. | -
14039 | - | Info | Long resync completed. | -
14040 | - | Error Off | System has detected correction of bad sectors in volume. | -
14041 | - | Error Off | System has detected that volume is no longer read only. | -
14042 | - | Error Off | Synchronization in progress to restore any failed writes in group. | -
14043 | - | Error Off | Synchronization in progress to restore any failed writes. | -
14044 | - | Error Off | Problem with IP link between RPAs (in at least one direction) corrected. | -
14045 | - | Error Off | All IP links between RPAs restored. | -
14046 | - | Error Off | IP link between RPAs restored. | -
14047 | - | Error Off | RPA network interface card (NIC) problem corrected. | -
14049 | - | Info | RPA is out of maintenance mode. | -
14050 | - | Info | RecoverPoint has dynamically resumed synchronous replication to one of the replicas of this group. The group will now be initialized. During initialization, data is not transferred synchronously.
-
14054 | - | Error Off | End of RPA to storage connectivity failure. | -
14055 | - | Error Off | End of RPA to CLARiiON storage connectivity failure. | -
14056 | - | Error Off | End of RPA to CLARiiON storage/splitter connectivity failure. | -
14072 | - | Error Off | Fibre channel link between <RPAs> has been restored. | -
16000 | - | Error | Transient root cause | -
16001 | - | Error | Splitter was down. Problem has been corrected. | -
16002 | - | Error | Error occurred in all WAN links to other site. Problem has been corrected. | -
16003 | - | Error | Error occurred in link to the RPA cluster at the other site. Problem has been corrected. | -
16004 | - | Error | Error occurred in data link over WAN -- all RPAs were unable to transfer replicated data to other site. Problem has been corrected. | -
16005 | - | Error | Error occurred in data link over WAN -- RPA was unable to transfer replicated data to other site. Problem has been corrected. | -
16006 | - | Error | RPA was disconnected from the RPA cluster. Connection has been restored. | -
16007 | - | Error | All RPAs were disconnected from the RPA cluster. Problem has been corrected. | -
16008 | - | Error | RPA was down. Problem has been corrected. | -
16009 | - | Error | Group entered high load. Problem has been corrected. If high load persists, consider running the balance_load command and applying the load balancing recommendation, or manually modifying the preferred RPA of each group according to the recommendation. | -
16010 | - | Error | Journal error occurred. Problem has been corrected -- full sweep required. | -
16011 | - | Error | Image access log or virtual buffer was full -- writing by hosts at target side was disabled. Problem has been corrected. | -
16012 | - | Error | Was unable to enable virtual access to image. Problem has been corrected.
-
16013 | - | Error | Was unable to enable access to specified image. Problem has been corrected. | -
16014 | - | Error | Fibre Channel link between all RPAs and all splitters and storage was down. Problem has been corrected. | -
16016 | - | Error | Fibre Channel link between all RPAs and all storage was down. Problem has been corrected. | -
16022 | - | Error | Fibre Channel link between RPA and splitters or storage volumes (or both) was down. Problem has been corrected. | -
16023 | - | Error | Fibre Channel link between RPA and all splitters and storage was down. Problem has been corrected. | -
16024 | - | Error | Fibre Channel link between RPA and all splitters was down. Problem has been corrected. | -
16025 | - | Error | Fibre Channel link between RPA and all storage was down. Problem has been corrected. | -
16026 | - | Error | Error occurred in link to the RPA cluster at the other site. Problem has been corrected. | -
16027 | - | Error | All volumes attached to the consistency group (or groups) were not accessible. Problem has been corrected. | -
16029 | - | Error | Fibre Channel link between all RPAs and one or more volumes was down. Problem has been corrected. | -
16033 | - | Error | Repository volume was not accessible. Problem has been corrected. | -
16034 | - | Error Off | Writes to storage occurred without corresponding writes to RPA. Problem has been corrected. | -
16035 | - | Error | Error occurred in link to the RPA cluster at the other site. Problem has been corrected. | -
16036 | - | Error | Renegotiation of transfer protocol was requested, and has already been completed. | -
16037 | - | Error | All replication volumes attached to the consistency group (or groups) were not accessible. Problem has been corrected. | -
16038 | - | Error | All journal volumes attached to the consistency group (or groups) were not accessible. Problem has been corrected. | -
16039 | - | Info | System ran long resync.
-
16040 | - | Error | System had detected bad sectors in volume. Problem has been corrected. | -
16041 | - | Error | System had detected that volume is read only. Problem has been corrected. | -
16042 | - | Error | Splitter write may have failed (while group was transferring data). Problem has been corrected. | -
16043 | - | Error | Splitter writes may have failed. | -
16044 | - | Error | There was a problem with an IP link between RPAs (in at least one direction). Problem has been corrected. | -
16045 | - | Error | There was a problem with all IP links between RPAs. Problem has been corrected. | -
16046 | - | Error | There was a problem with an IP link between RPAs. Problem has been corrected. | -
16047 | - | Error | There was an RPA network interface card (NIC) problem. Problem has been corrected. | -
16048 | - | Brief Error | Splitter version was not supported. Problem has been corrected. | -
16049 | - | Info | RPA temporarily entered maintenance mode, but has since exited. | -
16050 | - | Warning | RecoverPoint had dynamically started replicating asynchronously to one of the replicas of this group, but has since resumed synchronous replication. Consequentially, the group has been initialized twice. During initialization, data was not transferred synchronously. If this is not the expected behavior, contact EMC Customer Service. | -
16054 | - | Error Brief | End of brief RPA to storage connectivity failure. | -
16055 | - | Error Brief | End of brief RPA to CLARiiON storage connectivity failure. | -
16056 | - | Error Brief | End of brief RPA to CLARiiON storage/splitter connectivity failure. | -
16072 | - | Error | Fibre channel link between <RPAs> was down, but the problem has been corrected, and the link is back up again. | -
18001 | - | Error | Splitter was up and supported, but it is now down or not supported. | -
18002 | - | Error | All links to other site were temporarily restored, but problem has returned.
-
18003 | - | Error | Link to the RPA cluster at the other site was temporarily restored, but the problem has returned. | -
18004 | - | Error Off | Data link was temporarily restored, but problem has returned -- all RPAs are unable to transfer replicated data to other site. | -
18005 | - | Error Off | Data link was temporarily restored, but problem has returned -- RPA is currently unable to transfer replicated data to other site. | -
18006 | - | Error Off | Connection of RPA to the RPA cluster was temporarily restored, but problem has returned. | -
18007 | - | Error Off | All RPAs were temporarily restored to the RPA cluster, but problem has returned. | -
18008 | - | Error Off | RPA was temporarily up, but problem has returned -- RPA is down | -
18009 | - | Error Off | Group temporarily exited high load, but problem has returned | -
18010 | - | Error Off | Journal error was temporarily corrected, but problem has returned. | -
18011 | - | Error Off | Image access log or virtual buffer was temporarily no longer full, and writing by hosts at target side was re-enabled -- but problem has returned. | -
18012 | - | Error Off | Virtual access to image was temporarily enabled, but problem has returned. | -
18013 | - | Error Off | Access to image was temporarily enabled, but problem has returned. | -
18014 | - | Error Off | Fibre Channel link between all RPAs and all splitters and storage was temporarily restored, but problem has returned. | -
18016 | - | Error Off | Fibre Channel link between all splitters and all storage was temporarily restored, but problem has returned. | -
18022 | - | Error Off | Fibre Channel link that was down between RPA and splitters or storage volumes (or both) was temporarily restored, but problem has returned.
-
18023 | - | Error Off | Fibre Channel link between RPA and all storage was temporarily restored, but problem has returned. | -
18024 | - | Error Off | Fibre Channel link between RPA and all splitters was temporarily restored, but problem has returned. | -
18025 | - | Error Off | Fibre Channel link between RPA and all storage was temporarily restored, but problem has returned. | -
18026 | - | Error | Link to the RPA cluster at the other site was temporarily restored, but the problem has returned. | -
18027 | - | Error Off | Access to all journal volumes attached to the consistency group (or groups) was temporarily restored, but problem has returned. | -
18029 | - | Error Off | Fibre Channel link between all RPAs and one or more volumes was temporarily restored, but problem has returned. | -
18033 | - | Error Off | Access to repository volume was temporarily restored, but problem has returned. | -
18034 | - | Error Off | Replication consistency in writes to storage and writes to RPAs was temporarily restored, but problem has returned. | -
18035 | - | Error Off | Link to the RPA cluster at the other site was temporarily restored, but the problem has returned. | -
18036 | - | Error Off | Negotiation of transfer protocol was completed, but is now requested again. | -
18037 | - | Error Off | Access to all volumes attached to the consistency group (or groups) was temporarily restored, but problem has returned. | -
18038 | - | Error Off | Access to all replication volumes attached to the consistency group (or groups) had been temporarily restored, but problem has returned. | -
18039 | - | Info | Long resync was completed, but has now restarted. | -
18040 | - | Error Off | User marked volume as OK, but bad sectors problem persists. | -
18041 | - | Error Off | User marked volume as OK, but read-only problem persists. | -
18042 | - | Error Off | Synchronization had restored any failed writes in group, but problem has returned.
-
18043 | - | Error Off | Internal problem. | -
18044 | - | Error Off | Problem with IP link between RPAs (in at least one direction) was corrected, but problem has returned. | -
18045 | - | Error Off | Problem with all IP links between RPAs (in at least one direction) was corrected, but problem has returned. | -
18046 | - | Error Off | Problem with IP link between RPAs was corrected, but problem has returned. | -
18047 | - | Error Off | RPA network interface card (NIC) problem was corrected, but problem has returned. | -
18049 | - | Info | RPA temporarily exited maintenance mode, but has since re-entered. | -
18050 | - | Info | RecoverPoint had dynamically resumed synchronous replication to one of the replicas of this group, but has since started replicating asynchronously again. Consequentially, the group has been initialized twice. During initialization, data was not transferred synchronously. If this is not the expected behavior, contact EMC Customer Service. | -
18054 | - | Error | Recurring RPA to storage connectivity failure. | -
18055 | - | Error | Recurring RPA to CLARiiON storage connectivity failure. | -
18056 | - | Error | Recurring RPA to CLARiiON storage/splitter connectivity failure. | -
18072 | - | Error | Fibre channel link between <RPAs> was temporarily restored, but the problem has returned, and the link is back down again. | -

Detailed events

Detailed events contain more information than normal-scope events.

Table 51 Listing of detailed events and their descriptions
Event ID | Topic | Level | Description | Trigger
1002 | Management | Info | User logged out. (User <user>) | User logged off from the system
1010 | Management | Warning | Grace period expires in 1 day. You must install an activation code to activate your RecoverPoint license. | 1 day prior to grace period expiration
1012 | Management | Warning | License expires in 1 day.
You must obtain a new RecoverPoint license. | 1 day prior to RecoverPoint license expiration
1013 | Management | Error | License expired. You must obtain a new RecoverPoint license. | RecoverPoint license expired
2000 | Site | Info | Site management running on <RPA>. | Site control open; RPA has become cluster leader
3000 | RPA | Info | RPA has become a cluster member. (RPA <RPA>) | RPA connected to site control
3002 | RPA | Warning | Site management switched over to this RPA. (RPA <RPA>, Reason <reason>) | Leadership is transferred from RPA to RPA
3007 | RPA | Warning Off | RPA is up. (RPA <RPA>). | RPA that was previously down came up
3008 | RPA | Warning | RPA appears to be down. (RPA <RPA>) | RPA suspects that other RPA is down
3011 | RPA | Info | RPA access to volumes restored. (RPA <RPA>, Volume <volume>, Volume Type <type>) | Volumes that were inaccessible became accessible
3012 | RPA | Warning | RPA unable to access volumes. (RPA <RPA>, Volume <volume>, Volume Type <type>) | Volumes ceased to be accessible to the RPA
3013 | RPA | Warning Off | RPA access to <repository volume> restored. (RPA <RPA>, Volume <volume>) | Repository volume that was inaccessible became accessible
3014 | RPA | Warning | RPA unable to access <repository volume>. (RPA <RPA>, Volume <volume>) | Repository volume became inaccessible to a single RPA
3020 | RPA | Warning Off | WAN link to RPA at other site restored. (RPA at other site: <RPA>) | RPA regained the WAN connection with an RPA at the other site
3021 | RPA | Warning | Error in WAN link to RPA at other site. (RPA at other site: <RPA>) | RPA lost the WAN connection with an RPA at the other site
3022 | RPA | Warning Off | LAN link to RPA restored. (RPA <RPA>) | RPA regained the LAN connection with an RPA at the local site
3023 | RPA | Warning | Error in LAN link to RPA.
(RPA <RPA>) | RPA lost the LAN connection with an RPA at the local site, without losing connection through the repository volume
3035 | RPA | Info | An internal process restarted, starting control process. | A control process is triggered (for various reasons).
4000 | Group | Info | Group capabilities OK. (Group <group>) | Capabilities are full and previous capabilities are unknown
4001 | Group | Warning | Group capabilities minor problem. (Group <group>) | Capabilities are either: 1) not full temporarily on RPA on which the group is currently running, or 2) not full indefinitely on the RPA on which the group is not running
4003 | Group | Error | Group capabilities problem (Group <group>) | Capabilities are not full indefinitely on the RPA on which the group is running
4007 | Group | Info | Pausing data transfer. (Group <group>, Reason: <reason>) | Stop transfer by user
4008 | Group | Warning | Pausing data transfer. (Group <group>, Reason: <reason>) | Transfer stopped temporarily by system
4009 | Group | Error | Pausing data transfer. (Group <group>, Reason: <reason>) | Transfer stopped indefinitely by system
4010 | Group | Info | Starting data transfer. (Group <group>) | Start transfer requested by user
4015 | Group | Info | Transferring latest snapshot before pausing transfer (no data loss). (Group <group>) | In total storage disaster -- flushing buffer before stopping replication
4016 | Group | Warning | Transferring latest snapshot before pausing transfer (no data loss). (Group <group>) | In total storage disaster -- flushing buffer before stopping replication
4017 | Group | Error | Transferring latest snapshot before pausing transfer (no data loss). (Group <group>) | In total storage disaster -- flushing buffer before stopping replication
4018 | Group | Warning | Transfer of latest snapshot from source complete (no data loss).
(Group <group>) | In total storage disaster -- last snapshot from source site is available at target site
4019 | Group | Warning | Group in high load -- transfer to be paused temporarily. (Group <group>) | Disk manager high load
4020 | Group | Warning Off | Group is no longer in high load (Group <group>) | End of disk manager high load
4021 | Group | Error | Journal full -- initialization paused. To complete initialization, enlarge journal or allow long resync. (Group <group>) | In initialization -- journal is full and long resync is not allowed
4022 | Group | Error Off | Initialization resumed. (Group <group>) | End of situation, in initialization, where journal is full and long resync is not allowed
4023 | Group | Error | Journal full -- transfer paused. To restart transfer, first disable access to image. (Group <group>) | Access to image is enabled and journal is full
4024 | Group | Error Off | Transfer restarted. (Group <group>) | End of situation where access to image is enabled and journal is full
4025 | Group | Warning | Group in high load -- initialization to be restarted. (Group <group>) | Group in high load; initialization to be restarted
4026 | Group | Warning Off | Group is no longer in high load. (Group <group>) | End of high load for group
4027 | Group | Error | Group in high load -- Journal full, roll to physical image paused, transfer paused. (Group <group>) | No space left to which to write during roll.
4028 | Group | Error Off | Group is no longer in high load. (Group <group>) | Added journal capacity or disabled image access.
4040 | Group | Error | Journal error -- full sweep to be performed. (Group <group>) | Journal volume error
4041 | Group | Info | Group activated. (Group <group>, RPA <RPA>) | Group replication-ready; i.e., replication could take place if other factors are OK, such as network, RPAs, storage access.
4042 | Group | Info | Group deactivated.
(Group <group>, RPA <RPA>) | Group deactivated as a result of user action
4043 | Group | Warning | Group deactivated. (Group <group>, RPA <RPA>) | Group temporarily deactivated by system
4044 | Group | Error | Group deactivated. (Group <group>, RPA <RPA>) | Group deactivated indefinitely by system
4051 | Group | Info | Disabling access to image -- resuming distribution. (Group <group>) | Access to image is disabled (i.e., distribution is resumed) by user
4054 | Group | Error | Enabling access to image. (Group <group>) | Access enabled to image indefinitely by system
4057 | Group | Warning | Specified image removed from journal. Try a later image. (Group <group>) | Specified image was removed from the journal (i.e., FIFO).
4062 | Group | Info | Access enabled to latest image. (Group <group>, Failover site <site>) | Access enabled to latest image during automatic failover
4063 | Group | Warning | Access enabled to latest image. (Group <group>, Failover site <site>) | Access enabled to latest image during automatic failover
4064 | Group | Error | Access enabled to latest image. (Group <group>, Failover site <site>) | Access enabled to latest image during automatic failover
4080 | Group | Warning | Current lag exceeds maximum lag. (Group <group>, Lag <lag>, Maximum lag <max lag>) | Group’s lag exceeds maximum lag (when not regulating application)
4081 | Group | Warning Off | Current lag within policy. (Group <group>, Lag <lag>, Maximum Lag <max_lag>) | Group’s lag drops from above the maximum lag to below 90% of maximum.
4082 | Group | Warning | Starting full sweep. (Group <group>) | Group’s markers set
4083 | Group | Warning | Starting volume sweep. (Group <group>, Pair <pair>) | Volume's markers set
4084 | Group | Info | Markers cleared. (Group <group>) | Group’s markers cleared
4085 | Group | Warning | Unable to clear markers. (Group <group>) | Attempt to clear group’s markers failed
4086 | Group | Info | Initialization started.
(Group <group>) | Initialization started
4087 | Group | Info | Initialization completed. (Group <group>) | Initialization completed
4091 | Group | Error | Image access log is full -- writing by hosts at target side disabled. (Group <group>, Site <site>) | Image access log is full
4095 | Group | Info | Writing image access log to storage -- writes to log cannot be undone. (Group <group>) | Started marking to retain writes to image access log
4097 | Group | Warning | Maximum journal lag exceeded. Distribution in fast-forward -- older images removed from journal. (Group <group>) | Fast-forward started (causing loss of snapshots from before maximum journal lag was exceeded)
4098 | Group | Warning Off | Maximum journal lag within limit. Distribution normal -- rollback information retained. (Group <group>) | Five minutes passed since fast-forward stopped
4099 | Group | Info | Initializing in long resync mode. (Group <group>) | Started long resync
4107 | Group | Info | Integrity check completed; no inconsistencies found. | Integrity check completed successfully, and no inconsistencies were found.
4108 | Group | Info | Integrity check completed, inconsistencies found. | Integrity check completed successfully, but inconsistencies were found.
4109 | Group | Error | Integrity check was aborted by system. | The preferred RPA setting is modified or image access is enabled for a copy of the specified consistency group.
4110 | Group | Info | Enabling virtual access to image. (Group <group>) | User initiated enabling virtual access to an image
4111 | Group | Info | Virtual access to image enabled. (Group <group>) | Virtual access to an image has been enabled
4112 | Group | Info | Rolling to physical image. (Group <group>) | Rolling to the image (in background) while virtual access to the image is enabled.
4113 | Group | Info | Roll to physical image stopped.
(Group <group>) | Rolling to the image (i.e., in background, while virtual access to the image is enabled) is stopped.
4114 | Group | Info | Roll to physical image complete -- logged access to physical image now enabled. (Group <group>) | System completed roll to physical image.
4115 | Group | Error | Unable to enable access to virtual image, due to partition table error. (The partition table on at least one of the volumes in group <group> has been modified since logged access was last enabled to a physical image. To enable access to a virtual image, first enable logged access to a physical image.) | Attempt to pause on a virtual image is unsuccessful (due to a change in the partition table of a volume (or volumes) in the group).
4116 | Group | Error | Virtual access buffer is full -- writing by hosts at target side is disabled. (Group <group>) | Attempt to write to the virtual image is unsuccessful (because virtual access buffer usage is 100%).
4118 | Group | Error | Unable to enable virtual access to an image. (Group <group>) | Attempt to enable virtual access to the image is unsuccessful (due to insufficient memory).
4119 | Group | Error | Initiator issued an out of bounds I/O. Contact Technical Support. (Initiator <initiator wwn>, Group <group>, Volume <volume>) | Configuration problem
4120 | Group | Warning | Journal usage (with logged access enabled) now exceeds this threshold. (Group <group>, <journal usage threshold>) | Journal usage (with logged image access enabled) has passed a specified threshold.
4121 | Group | Error | Unable to gain permissions to write to replica. | RPAs unable to write to replication or journal volumes because they do not have proper permissions.
4122 | Group | Error Off | Trying to regain permissions to write to replica | User has indicated that the permissions problem has been corrected.
4123 | Group | Error | Unable to access volumes -- bad sectors encountered | RPAs unable to write to replication or journal volumes due to bad sectors on the storage.
4124 | Group | Error Off | Trying to access volumes that previously had bad sectors | User has indicated that the bad sectors problem has been corrected.
4125 | Group | Error | Journal capacity is currently insufficient for the required protection window. | User specified a required protection window, but journal does not support rollback of that size (even though it is full).
4126 | Group | Error Off | Journal capacity is currently sufficient for the required protection window. | User specified a required protection window; journal did not support rollback of that size, but now does.
4127 | Group | Warning | Journal capacity is predicted to be insufficient for the required protection window. | User specified a required protection window; system predicts that journal will not support rollback of that size.
4128 | Group | Warning Off | Journal capacity is predicted to be sufficient for the required protection window. | User specified a required protection window; system predicted that journal would not support rollback of that size, but now predicts that it will.
4129 | Group | Warning | Image access is enabled on group copy for unusually long time | For given copy, time that image access has been enabled on same image exceeds pre-set value.
4130 | Group | Warning Off | Image access on group copy is now disabled (or is enabled on a different image) | For copy, where image access had been enabled on same image for unusually long time, user has now disabled image access (or enabled access to a different image).
5000 | Splitter | Info | Splitters attached to volume. (Splitter <splitter>, Volume <volume>) | User attached a splitter to a volume
5001 | Splitter | Info | Splitters detached from volume.
(Splitter <splitter>, Volume <volume>) | User detached a splitter from a volume
5002 | Splitter | Error | RPA unable to access splitter. (Splitter <splitter>, RPA <RPA>) | RPA is unable to access a splitter
5003 | Splitter | Error Off | RPA access to splitter restored. (Splitter <splitter>, RPA <RPA>) | RPA can access a splitter that was previously inaccessible
5004 | OBSOLETE | - | - | -
5005 | OBSOLETE | - | - | -
5006 | OBSOLETE | - | - | -
5007 | OBSOLETE | - | - | -
5013 | Splitter | Error | Splitter is down. (Splitter <splitter>) | Connection to splitter lost with no warning; splitter crashed, or connection is down
5015 | Splitter | Error Off | Splitter is up. (Splitter <splitter>) | Connection to splitter regained after splitter crash
5016 | Splitter | Warning | Splitter has restarted (Splitter <splitter>) | Boot timestamp of splitter has changed
5030 | Splitter | Error | Splitter write is suspected of possible failure. (Splitter <splitter>, Group <group>) | Splitter write succeeded to RPA, but not necessarily to storage
5031 | Splitter | Warning | Splitter not splitting to replication volumes -- volume sweep/s will be required. (Host <Host>, Volumes <Volume Names>, Groups <Groups>) | Splitter not splitting to replication volumes
5032 | Splitter | Info | Splitter splitting to replication volumes. (Host <Host>, Volumes <Volume Names>, Groups <Groups>) | Splitter started splitting to replication volumes
5035 | Splitter | Info | Writes to replication volumes disabled. (Splitter <splitter>, Volumes <Volume Names>, Groups <Groups>) | Writes to replication volumes disabled
5036 | Splitter | Warning | Writes to replication volumes disabled. (Host <Host>, Volumes <Volume Names>, Groups <Groups>) | Writes to replication volumes disabled
5037 | Splitter | Error | Writes to replication volumes disabled.
(Splitter <splitter>, Volumes <Volume Names>, Groups <Groups>) | Writes to replication volumes disabled
5038 | Splitter | Info | Splitter delaying writes. (Splitter <splitter>, Volumes <volumes>, Groups <groups>) | -
5039 | Splitter | Warning | Splitter delaying writes. (Splitter <splitter>, Volumes <volumes>, Groups <groups>) | -
5040 | Splitter | Error | Splitter delaying writes. (Splitter <splitter>, Volumes <volumes>, Groups <groups>) | -
5041 | Splitter | Info | Splitter not splitting to replication volumes. (Splitter <splitter>, Volumes <volumes>, Groups <groups>) | Splitter not splitting to replication volumes as result of user decision
5042 | Splitter | Warning | Splitter not splitting to replication volumes. (Splitter <splitter>, Volumes <volumes>, Groups <groups>) | Splitter not splitting to replication volumes
5043 | Splitter | Error | Splitter not splitting to replication volumes. (Splitter <splitter>, Volumes <volumes>, Groups <groups>) | Splitter not splitting to replication volumes due to system action
5045 | Splitter | Warning | Simultaneous problems reported in splitter and RPA. Full-sweep resynchronization will be required upon restarting data transfer. | Marking backlog on splitter lost (as result of concurrent double disaster to splitter and RPA)
5046 | Splitter | Warning | Transient error -- re-issuing splitter write. | -
5050 | Splitter | Warning | Failed to collect system information. Verify that the correct login credentials have been defined for this splitter. | -
5051 | Splitter | Warning | No login credentials have been defined for this splitter. Define login credentials to extend the period in which system information is saved, from three days to thirty days. For support purposes, it is recommended that you do so as soon as possible. | -
6000 | Group | Error | An unrecognizable error has occurred. The specified image cannot be accessed. Try accessing a different image.
If you cannot access any other images, contact EMC Customer Service. - 6001 Group Error off The system has stopped trying to access an inaccessible image of a distributed group. No user action is required. - EMC RecoverPoint Release 3.3 Administrator’s Guide B Kutils Reference Kutils Reference This section presents information on the syntax and use of each of the commands available as part of the kutils utility. ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ Introduction ...................................................................................... flushFS ............................................................................................... manage_auto_host_info_collection ............................................... mount................................................................................................. showFS............................................................................................... show_vol_info .................................................................................. show_vols.......................................................................................... sqlRestore .......................................................................................... sqlSnap............................................................................................... start..................................................................................................... stop ..................................................................................................... umount .............................................................................................. Kutils Reference 312 315 316 317 318 319 320 321 323 326 327 328 311 Kutils Reference Introduction The kutils utility is installed automatically when you install a RecoverPoint splitter on a host (see the EMC RecoverPoint Deployment Manager Product Guide). 
When using fabric-based splitters, a standalone version of kutils can be installed separately on hosts.

The following sections describe the use of this utility:
◆ “Usage”
◆ “Path designations”
◆ “Commands”

Usage
A kutils command is always introduced with the kutils keyword. When this keyword is entered by itself, the kutils utility returns usage notes, as follows:

    C:\program files\kdriver\kutils> kutils
    Usage: kutils <command> <arguments>

Examples of the command usage are provided for each command. For commands that are available only on hosts running Windows, a Windows-type system prompt is always shown. For other commands, the example may use either a Windows or Unix system prompt.

Note: Mount points are not native to Windows. Although Windows supports reparse points, which allow the use of Unix-like mount points in the NTFS file system, kutils supports only drive letters.

Path designations
The path to a device can be designated in the following ways:
◆ device path
  Example: "SCSI\DISK&VEN_EMC&PROD_MASTER_RIGHT&REV_0001\5&133EF78A&0&000"
◆ storage path
  Example: "SCSI#DISK&VEN_EMC&PROD_MASTER_RIGHT&REV_0001#5&133EF78A&0&000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}"
◆ volume path
  Example: "\\?\Volume{33b4a391-26af-11d9-b57b-505054503030}\"

The particular designation used is noted in the description for each command.

In addition, some commands (e.g., showDevices, showFS) return the symbolic link for a device. The symbolic link generally provides additional information about the characteristics of the specific device.
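When scripting around kutils output, it can help to tell the designations apart automatically. The following Python sketch is a heuristic based only on the shapes of the examples in this guide; it is not part of kutils, and the function name classify_designation is hypothetical. It assumes that volume paths start with \\?\Volume followed by a GUID in braces, that storage paths use # separators and end with a GUID in braces, that symbolic links start with \Device\, and that any other string containing a backslash is a device path.

```python
import re

# 8-4-4-4-12 hexadecimal GUID in braces, as it appears in the guide's examples.
GUID = r"\{[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\}"


def classify_designation(text: str) -> str:
    """Guess which designation a string uses (heuristic, not a kutils API)."""
    path = text.strip().strip('"')
    # Volume path: \\?\Volume{GUID} with an optional trailing backslash.
    if re.fullmatch(r"\\\\\?\\Volume" + GUID + r"\\?", path):
        return "volume path"
    # Symbolic link: assumed to start with \Device\ as in the guide's examples.
    if path.startswith("\\Device\\"):
        return "symbolic link"
    # Storage path: '#' separators and a trailing GUID in braces.
    if "#" in path and re.search(GUID + "$", path):
        return "storage path"
    # Device path: backslash-separated identifier without '#' separators.
    if "\\" in path:
        return "device path"
    return "unknown"


if __name__ == "__main__":
    samples = [
        r"SCSI\DISK&VEN_EMC&PROD_POWER&\0000284500461065",
        r"SCSI#DISK&VEN_EMC&PROD_POWER&#0000284500461065#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}",
        "\\\\?\\Volume{33b4a391-26af-11d9-b57b-505054503030}\\",
        r"\Device\EmcPower\Power2",
    ]
    for sample in samples:
        print(classify_designation(sample), "<-", sample)
```

Because the rules are inferred from a handful of examples, treat an "unknown" or unexpected result as a prompt to check the string by hand rather than as an error in the device.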
The following are examples of symbolic links:

    "\Device\0000005c"
    "\Device\EmcPower\Power2"
    "\Device\Scsi\ql23001Port2Path0Target0Lun2"

Commands
The following sections present descriptions and examples of each of the kutils commands:
◆ “flushFS”
◆ “manage_auto_host_info_collection”
◆ “mount”
◆ “showFS”
◆ “show_vol_info”
◆ “show_vols”
◆ “sqlRestore”
◆ “sqlSnap”
◆ “start”
◆ “stop”
◆ “umount”

flushFS
This command initiates an OS-flush of the file system.

Parameters
drive_letter
    Drive designation for the file system that is to be flushed.

Usage
This command is available only on hosts running Windows.

Examples
To initiate an OS-flush of the file system on the device designated as drive E:

    C:\program files\kdriver\kutils> kutils flushFS E:
    Flushing buffers for drive E:...
    Flushed.

Related commands
None

manage_auto_host_info_collection
This command is used to display the current setting for automatic host info collection (HIC), or to enable or disable the feature.

Parameters
setting
    Possible values are ENABLE and DISABLE.

Usage
This command is not included in the kutils standalone version. When no parameter is appended to the command, it displays the current setting.

Examples
To display the current automatic host info collection setting:

    C:\program files\kdriver\kutils> kutils manage_auto_host_info_collection
    Automatic host info collection enabled.

Related commands
None

mount
This command mounts a file system.

Parameters
drive_letter
    Drive designation for the file system that is to be mounted.
path_to_device (optional)
    Volume path for the device.

Usage
This command is available only on hosts running Windows. For Windows 2003 and later, use mountvol.exe, which comes with the Windows operating system.
You must specify a volume path for the path_to_device parameter whenever the host has no previous record of a device mapping to the designated drive.

If the mount operation fails for any reason, use the following procedure:
1. Create a text file, rescan.txt, that contains the following single line:
       rescan
2. Run the command:
       diskpart.exe /s rescan.txt

Examples
To mount a device:

    C:\program files\kdriver\kutils> kutils mount E:
    Mounting drive E: as "\\?\Volume{33b4a391-26af-11d9-b57b-505054503030}\"...
    Mounted.

or

    C:\program files\kdriver\kutils> kutils mount E: \\?\Volume{33b4a391-26af-11d9-b57b-505054503030}\
    Mounting drive E: as "\\?\Volume{33b4a391-26af-11d9-b57b-505054503030}\"...
    Mounted.

Related commands
umount

showFS
This command presents the drive designation, and, as available, the device path, storage path, and symbolic link, for each mounted physical device.

Parameters
None

Usage
This command is available only on hosts running Windows.

Examples
To show the mounted devices according to drive letter:

    C:\program files\kdriver\kutils> kutils showFS
    Obtaining mapping... This could take several minutes.
    Drive C: \\.\PHYSICALDRIVE0:
        "IDE\DISKMAXTOR_6E040L0__________________________NAR61HA0\314530564C454547202020202020202020202020"
    Drive E: \\.\PHYSICALDRIVE5:
        "SCSI\DISK&VEN_EMC&PROD_POWER&\0000284500461065"
        "SCSI#DISK&VEN_EMC&PROD_POWER&#0000284500461065#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}"
        "\Device\EmcPower\Power2"

Related commands
None

show_vol_info
This command presents information on the specified volume, including: RecoverPoint name (if created in RecoverPoint), storage path, size, vendor, and product.

Parameters
volume_name
    The RecoverPoint name or storage path of the volume for which you want to display information.

Usage
Both the volume's RecoverPoint name and its storage path are legitimate values for the volume_name parameter.
This command is not included in the kutils standalone version.

Examples
To show information about a specific volume:

    $:/kdriver/bin/kenv.sh/kdriver/bin/kutils# kutils show_vol_info Vol2
    Name     Vol2
    Path     \??\SCSI#Disk&Ven_EMC&Prod_Power&#0000284500461069#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}
    Size     3450
    Vendor   EMC
    Product  SYMMETRIX

Related commands
show_vols

show_vols
This command presents information on all volumes (RecoverPoint and non-RecoverPoint) to which the host has access, including: RecoverPoint name (if created in RecoverPoint), size, and storage path.

Parameters
None

Usage
In the information returned by the command, a “-” is displayed under name for any volume that has not been created in the RecoverPoint context. This command is not included in the kutils standalone version.

Examples
To display information on all volumes to which the host has access:

    C:\program files\kdriver\kutils> kutils show_vols
    name    Size       Path
            39205MB    \??\IDE#DiskMaxtor_6E040L0__________________________NAR61HA0#314530564c454547202020202020202020202020#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}
    Vol6    862MB      \??\SCSI#Disk&Ven_EMC&Prod_Power&#0000284500461003#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}
    Vol1    3450MB     \??\SCSI#Disk&Ven_EMC&Prod_Power&#0000284500461065#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}
    . . .
            7MB        \??\SCSI#Disk&Ven_EMC&Prod_Power&#00002845004611920300010d00#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}

Related commands
show_vol_info

sqlRestore
This command is used to restore a snapshot previously created by the sqlSnap command. Additional information essential to the understanding and use of this command is presented in EMC Deploying with Microsoft SQL Server Technical Notes.

Parameters
database
    Name of the SQL Server database that is to be restored. Multiple databases can be specified using a comma to separate between the databases.
    Note: When there is more than one instance of Microsoft SQL on the same server, you can use the format <instance_name>.<database_name> to specify the database.
metadata_drive
    Directory from which the VDI metadata is read when restoring.
    Note: This directory must reside on one of the volumes being replicated.

Alternatively, the above parameters can be incorporated into the following single parameter:
file
    Name of a configuration file with the following format:
        database=<db1[,db2,…]>
        metadata_drive=<drive>

Usage
This command is available only on hosts running Windows. When restoring from a configuration file, the command reads only the database and metadata_drive parameters from within the file.

Example
To restore the VDI storage:

    C:\program files\kdriver\kutils> kutils sqlRestore database=db1,db2 metadata_drive=E

Alternatively, the syntax can specify a configuration file, from which the command reads only the first two parameters:

    C:\program files\kdriver\kutils> kutils sqlRestore file=sqlparams.file

where the structure of sqlparams.file is as follows:

    db1,db2
    E:\

Related commands
sqlSnap

sqlSnap
This command performs a VDI-based SQL Server snapshot. It includes a backup operation, used to put SQL Server into a quiescent state for taking a snapshot. Additional information essential to the understanding and use of this command is presented in EMC Deploying with Microsoft SQL Server Technical Notes.

Parameters
database
    Name of the SQL Server database that is to be replicated. Multiple databases can be specified, separated by commas.
    Note: When there is more than one instance of Microsoft SQL on the same server, you can use the format <instance_name>.<database_name> to specify the database.
metadata_drive
    Directory in which the VDI metadata is stored (and from which it is read when restoring).
    Note: This directory must reside on one of the volumes being replicated.
group
    Name of the RecoverPoint consistency group that contains the image to be bookmarked for this snapshot. You can specify multiple groups.
tag
    The label text to be used when bookmarking the VDI-enabled snapshot, or the name of the snapshot from which to restore.
policy
    Snapshot consolidation policy to set for this snapshot. Valid values are:
    • never: Snapshot is never consolidated.
    • survive_daily: Snapshot remains after daily consolidations, but is consolidated in weekly, monthly and manual consolidations.
    • survive_weekly: Snapshot remains after daily and weekly consolidations, but is consolidated in monthly and manual consolidations.
    • survive_monthly: Snapshot remains after daily, weekly and monthly consolidations, but is consolidated in manual consolidations.
    • always: Snapshot is consolidated in every consolidation process, whether manual or automatic.
ip
    IP address of the local RecoverPoint cluster management interface.
drives_to_flush
    Drive (or drives) where database files are stored (or to which they should be restored). Each drive is designated by a drive letter, e.g., “E”. When performing a backup, the system performs a file-system flush operation just before bookmarking a snapshot. You can specify multiple drives, separated by commas.

Alternatively, all of the above parameters can be incorporated into the following single parameter:
file
    Name of a configuration file with the following format:
        database=<db1[,db2,…]>
        metadata_drive=<drive>
        group=<group1[,group2,…]>
        tag=<bookmark_name>
        policy=<policy_name>
        ip=<mgmt_ip>
        drives_to_flush=<drive1[,drive2,…]>

Usage
This command is available only on hosts running Windows. To bookmark a VDI snapshot (while using the kutils sqlSnap utility), you must have IP connectivity to the RecoverPoint cluster in at least one of the sites.
You can verify this by pinging the RecoverPoint Cluster Management IP addresses from the host.

Examples
To take the VDI snapshot:

    C:\program files\kdriver\kutils> kutils sqlSnap database=db1,db2 metadata_drive=E group=group1,group2 bookmark=Hourly_VDI_9-12-04-2315 policy=survive_daily ip=192.168.0.1 drives_to_flush=E,F

This would be equivalent to running the following command:

    C:\program files\kdriver\kutils> kutils sqlSnap file=sqlparams.file

where the structure of sqlparams.file is as follows:

    database=db1,db2
    metadata_drive=E:\
    group=group1,group2
    tag=Hourly_VDI_9-12-04_2315
    policy=survive_daily
    ip=192.168.0.1
    drives_to_flush=E,F

In response (to either command), RecoverPoint immediately takes a bookmarked VDI-enabled snapshot, and returns a report on the result of the operation (i.e., success or error).

Related commands
sqlRestore

start
This command causes the host to split writes via the host-based splitter.

Parameters
None

Usage
This command is not included in the kutils standalone version.

Examples
To split writes via the host splitter:

    $:/kdriver/bin/kenv.sh/kdriver/bin/kutils# kutils start
    Splitter started Successfully

Related commands
stop

stop
This command causes the host-based splitter to stop splitting writes.

Parameters
None

Usage
This command is not included in the kutils standalone version.

Examples
To stop splitting writes via the host splitter:

    C:\program files\kdriver\kutils> kutils stop
    Stopping Service Succeeded

Related commands
start

umount
This command unmounts a file system.

Parameters
drive_letter
    Drive designation for the file system that is to be unmounted.

Usage
This command is available only on hosts running Windows 2008. For Windows 2003, use mountvol.exe, which comes with the Windows operating system.

Examples
To unmount a device:

    C:\program files\kdriver\kutils> kutils umount E:
    Unmounting drive E:... unmounted from "\\?\Volume{33b4a391-26af-11d9-b57b-505054503030}\"

Related commands
mount

C Troubleshooting

This section describes events that may occur during RecoverPoint operation, how to identify them, and the user actions necessary to mitigate them.
◆ My host applications are hanging
◆ My copy is being regulated
◆ My copy has entered a high load state
◆ My RPA keeps rebooting

My host applications are hanging
Some RecoverPoint users set a policy that enables RecoverPoint to control the acknowledgement of writes back to the host in the case of bottlenecks or insufficient resources that would otherwise prevent RecoverPoint from replicating the data.

If your host applications experience delays, loss of client connectivity, or slow response times, check whether the Allow Regulation setting in the consistency group Policy tab is checked. See “Application regulation” on page 49 for more information.

This section answers the questions:
◆ “When does application regulation happen?”
◆ “How does application regulation work?”
◆ “How do I know application regulation is happening?”
◆ “What can I do to stop my group from being regulated?”

When does application regulation happen?
Application regulation happens when a user enables the Allow Regulation Consistency Group Protection policy setting in the RecoverPoint Management Application, or sets the regulate_application parameter in the set_policy CLI command.

How does application regulation work?
The system slows host applications when approaching the lag policy limit (see “RPO control” on page 53). When the system cannot replicate the current incoming write-rate while guaranteeing the lag setting, the system slows host applications to guarantee that the RPO is always enforced. Additionally, if there is a bottleneck in the system, the system will regulate the host applications instead of entering a high load state (see “My copy has entered a high load state” on page 334).

How do I know application regulation is happening?
If your host applications experience delays, loss of client connectivity, or slow response times, check whether the Allow Regulation checkbox of the consistency group Policy tab is checked. If it is, your host applications are being regulated to ensure an RPO.

What can I do to stop my group from being regulated?
To come out of this state, uncheck the Allow Regulation checkbox.

Note: Before unchecking this checkbox, make sure you are familiar with all of the contents of “My host applications are hanging” on page 330, “Application regulation” on page 49 and “Allow Regulation” on page 146, and understand all of the implications of doing so.

My copy is being regulated
RecoverPoint includes a mechanism that protects the system from adverse effects and over-consumption of system resources when a system component is operating improperly. This mechanism is referred to as control action regulation.

This section answers the questions:
◆ “When does control action regulation happen?”
◆ “How do I know control action regulation is happening?”
◆ “How does control action regulation work?”
◆ “How do I release a copy from control action regulation?”
◆ “How do I verify that regulation is over?”

When does control action regulation happen?
Control action regulation happens when a system component is operating improperly and jittering (quickly changing) between two states for a set period of time.

How do I know control action regulation is happening?
You know the control action regulation mechanism has been enabled when:
◆ Event number 4133 (Copy regulation has started) is displayed in the event log.
◆ In the consistency group Status Tab of the GUI, the Role of a copy becomes Regulated and is displayed in red.

How does control action regulation work?
When control action regulation happens, the mechanism places the copy in a Regulated state to allow the environment to stabilize. In this state, the system protects itself by closing the link to the copy, limiting any adverse effects and over-consumption of system resources. The copy stays in the state it was in before regulation began for 30 minutes or until corrective action is taken.

How do I release a copy from control action regulation?
When control action regulation happens, you can:
◆ Release all groups at all copies from this state by running the unregulate_all_copies command from the CLI.
◆ Check previous event logs. Look for repetitive errors that may indicate a specific problem in the system.
◆ Check SAN/IP events outside of RecoverPoint, as instabilities may not originate from RecoverPoint.
◆ If regulation persists, collect all system information (see “Collecting system information” on page 238), and contact EMC Customer Support for further instruction.

How do I verify that regulation is over?
The following indicators can help you verify that your copy is no longer being regulated:
◆ Event 4132 (Copy regulation has ended due to a user action or internal timeout.) is displayed in the event log.
◆ The Role of a copy is no longer displayed in red in the consistency group Status Tab.
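The event-based checks for control action regulation can be sketched in Python as follows. This is a minimal illustration, not a RecoverPoint tool: it assumes that you have already extracted the event log into chronological (event ID, copy name) pairs, which is a hypothetical format chosen for the example. Only the event IDs 4133 and 4132 come from this guide; the function name and the copy names are made up.

```python
from typing import Iterable, Set, Tuple

REGULATION_STARTED = 4133  # "Copy regulation has started"
REGULATION_ENDED = 4132    # "Copy regulation has ended due to a user action or internal timeout."


def copies_still_regulated(events: Iterable[Tuple[int, str]]) -> Set[str]:
    """Return the copies whose latest regulation event is a start (4133).

    `events` must be in chronological order. Each item is a hypothetical
    (event_id, copy_name) pair; events with other IDs are ignored.
    """
    regulated: Set[str] = set()
    for event_id, copy_name in events:
        if event_id == REGULATION_STARTED:
            regulated.add(copy_name)
        elif event_id == REGULATION_ENDED:
            regulated.discard(copy_name)
    return regulated


if __name__ == "__main__":
    # Made-up sample log: copy_A was released, copy_B is still regulated.
    sample_log = [
        (4133, "copy_A"),
        (5013, "copy_A"),  # unrelated event, ignored
        (4133, "copy_B"),
        (4132, "copy_A"),
    ]
    print(sorted(copies_still_regulated(sample_log)))
```

An empty result only means that every 4133 event in the supplied log was followed by a 4132 event for the same copy; confirm in the consistency group Status Tab that the Role of the copy is no longer displayed in red.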
My copy has entered a high load state
High load is a system state that indicates resource depletion during replication. There are two kinds of high loads in RecoverPoint:
◆ “What is a permanent high load?” - In these cases, RecoverPoint stops and waits for a user action in order to come out of high load.
◆ “What is a temporary high load?” - In these cases, RecoverPoint tries to recover from the high load and will keep trying until the condition that triggered the high load changes.

This section answers the questions:
◆ “How do I know a copy is experiencing a high load?”
◆ “When do permanent high loads occur?”
◆ “How do permanent high loads work?”
◆ “How can I tell a copy is under permanent high load?”
◆ “What can I do to come out of permanent high load?”
◆ “How do I verify that a permanent high load is over?”
◆ “When do temporary high loads occur?”
◆ “How do temporary high loads work?”
◆ “How can I tell a copy is under temporary high load?”
◆ “What should I know about temporary high loads?”
◆ “How do I verify that a temporary high load is over?”

How do I know a copy is experiencing a high load?
You know that a copy has entered a high load state when:
◆ Warning events are logged specifying that the replica is experiencing high load.
◆ In the consistency group Status Tab of the GUI, the Transfer state of a copy becomes High load.

What is a permanent high load?
A permanent high load is a system state that happens during replication, when the size of the journal, or the queue of snapshots waiting for distribution of the journal at the replica copy, is insufficient.

When do permanent high loads occur?
A permanent high load generally happens in one of two cases:
◆ When a user accesses a replica in logged or virtual access modes for a long time (see “Image access” on page 53) and the queue of snapshots waiting for distribution of the journal reaches its maximum capacity (see “The distribution phase” on page 70). In this case, the system pauses transfer and waits for user input.
◆ When the system is in initialization mode, and the journal volume has reached its maximum capacity, while the Allow distribution of snapshots that are larger than capacity of journal volumes setting is disabled (also known as long initialization, or long resync).

How do permanent high loads work?
When any of the events described in “When do permanent high loads occur?” on page 335 occurs, the system stops transfer and waits for user input.

How can I tell a copy is under permanent high load?
The following indicators are displayed when your copy is experiencing a permanent high load:
◆ Warning events are logged specifying that the replica is experiencing high load.
◆ The Transfer state is displayed as High load. You can display the transfer state:
  • By running the get_group_states command in the RecoverPoint Command Line Interface.
  • In the consistency group Status Tab of the RecoverPoint Management Application.

What can I do to come out of permanent high load?
To release a copy from a permanent high load:
◆ If the user accessed a replica in logged or virtual access mode for a long time (see “Image access” on page 53) and the queue of snapshots waiting for distribution of the journal reached its maximum capacity (see “The distribution phase” on page 70), you should disable image access or enable direct access (see “Image access modes” on page 74).
◆ If the system was in initialization mode, and the journal volume became full, while the Allow distribution of snapshots that are larger than capacity of journal volumes setting was disabled, see “Long initializations” on page 85.

How do I verify that a permanent high load is over?
The following indicators can help you verify that your copy is no longer experiencing a permanent high load:
◆ Warning events are logged specifying that the replica is no longer experiencing high load.
◆ The Transfer state is no longer displayed as High load. You can display the transfer state:
  • By running the get_group_states command in the RecoverPoint Command Line Interface.
  • In the consistency group Status Tab of the RecoverPoint Management Application.

What is a temporary high load?
A temporary high load is a system state that happens during replication, when the RPA resources at the production site are insufficient.

When do temporary high loads occur?
Temporary high loads occur:
◆ In extreme cases, during unusually long periods of unusually heavy write loads
◆ When replica or journal volumes are not fast enough to handle distribution
◆ When the WAN is too slow
◆ When the compression level is too high

How do temporary high loads work?
When any of the events described in “When do temporary high loads occur?” on page 336 occurs, transfer is paused and restarted immediately. If resources are still low, the system waits five minutes and then tries to pause and start transfer again, repeating until the required resources are available.

Note: Upon every start of transfer, a short initialization occurs.

How can I tell a copy is under temporary high load?
The following indicators are displayed when your copy is experiencing a temporary high load:
◆ Warning events are logged specifying that the replica is experiencing high load.
◆ The Transfer state is displayed as high load, followed by a progress status.
  You can display the transfer state:
  • By running the get_group_states command in the RecoverPoint Command Line Interface.
  • In the consistency group Status Tab of the RecoverPoint Management Application.

What should I know about temporary high loads?
Temporary high loads are a common occurrence and are expected to happen from time to time. If the high load lasts for an extreme period of time or occurs too frequently (and will eventually impact the business RPO), contact EMC Customer Support for a mitigation plan.

How do I verify that a temporary high load is over?
The following indicators can help you verify that your copy is no longer experiencing a temporary high load:
◆ Warning events are logged specifying that the replica is no longer experiencing high load.
◆ The Transfer state is no longer displayed as High load. You can display the transfer state:
  • By running the get_group_states command in the RecoverPoint Command Line Interface.
  • In the consistency group Status Tab of the RecoverPoint Management Application.

My RPA keeps rebooting
Reboot regulation is a state of regulation that allows the system to detach an RPA from its RPA cluster in the event of frequent unexplained reboots or internal failures.

This section answers the questions:
◆ “When does reboot regulation happen?”
◆ “How does reboot regulation work?”
◆ “How do I know reboot regulation is happening?”
◆ “What should I do to stop reboot regulation?”

When does reboot regulation happen?
Reboot regulation happens when an RPA is frequently and unexpectedly rebooting, or undergoing a repeated internal failure.

How does reboot regulation work?
When an RPA behaves in the manner described in “When does reboot regulation happen?” on page 339, the system detaches the RPA from the RPA cluster.

How do I know reboot regulation is happening?
Reboot regulation is happening when the red icon is frequently displayed in the Connectivity column of the RPAs tab. The user also receives a message when logging into the RPA as a boxmgmt user.

What should I do to stop reboot regulation?
To stop reboot regulation, contact EMC Customer Service for further instructions.