Disaster Recovery of Oracle Fusion Middleware
and Oracle Database Server
with EMC RecoverPoint
Applied Technology
Abstract
Oracle database and application administrators face many challenges in managing the application and storage
resources necessary for Oracle operations. This white paper outlines how EMC® RecoverPoint provides cost-effective local and remote replication of Oracle Fusion Middleware and Oracle Database Server as part of a
disaster recovery solution.
August 2009
Copyright © 2009 EMC Corporation. All rights reserved.
EMC believes the information in this publication is accurate as of its publication date. The information is
subject to change without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION
MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE
INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Use, copying, and distribution of any EMC software described in this publication requires an applicable
software license.
For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com
All other trademarks used herein are the property of their respective owners.
Part Number h6450
Table of Contents
Executive summary
  Background
Introduction
  Audience
RecoverPoint
  Advantages of RecoverPoint
  Local and remote recovery
  Replication modes
  Oracle database protection with RecoverPoint
  Federated environments and consistency groups
Use case objectives
Oracle/EMC environment block diagram
RecoverPoint validation architecture
CLARiiON array-side preparations
System configuration and storage
  Hardware configuration
  Software configuration
  Oracle Fusion Middleware, web tier, and Database volume layout
  RecoverPoint CRR consistency groups
  RecoverPoint group sets and parallel bookmarks
OCFS2 configuration
  High availability for persistent stores
  Use high-availability storage for state data
Network configuration
Planning for disasters and planned downtime
  Initiate the replication process
    RecoverPoint software installation and configuration
    Starting replication
  Switchover procedures
  Switchback procedures
  Failover procedures
  Failback procedures
General recommendations
  Setting snapshots or manual bookmarks based on requirements
  Periodic DR testing
  Event notification
  General I/O and sizing
Conclusion
References and resources
  Oracle
  EMC
Appendix
  Oracle DR terminology
  RecoverPoint terminology
  RecoverPoint write splitters
  RecoverPoint image access modes
    Virtual access (instant)
    Virtual access (instant) with Roll image in background
    Logged access (physical)
    Disable Image Access
Executive summary
Oracle’s middleware products form a comprehensive, standards-based family of application infrastructure
software, ranging from the leading Java application server to SOA and Enterprise 2.0 portals. Pre-integration with
Oracle Applications, Database, and Enterprise Manager speeds implementation and lowers costs.
EMC® RecoverPoint is an important advancement in the area of data replication. RecoverPoint provides
local and remote replication for heterogeneous servers and storage and enables multiple applications to
have replication consistency with fine-grained control over local and remote recovery.
This white paper describes an application topology supporting core services provided by Oracle Fusion
Middleware and Oracle Database Server. The data replication and disaster recovery products represented
can address your specific application topology as it expands beyond the core represented here. This
disaster recovery validation test incorporates the use of Oracle Application Server SOA Suite and Oracle
Database Server 10gR2.
Oracle SOA Suite is a comprehensive, hot-pluggable software suite for the building, deployment, and
management of a service-oriented architecture. This includes service-oriented application development,
service-oriented applications and IT systems integration, and service-oriented management of business
processes. It plugs in to a heterogeneous IT infrastructure and enables enterprises to adopt SOA
incrementally.
These applications rely on hundreds of gigabytes or even terabytes of data, and they share one common
factor: they need a well-designed recovery plan in case of disaster. Oracle and EMC deliver key solutions
that address the protection of mission-critical applications.
Enterprise deployments need protection from unforeseen disasters and natural calamities. One protection
solution involves setting up a disaster recovery (DR) site at a geographically different location from the
production site. The DR site may have equal or fewer services and resources compared to the production
site. Application data, metadata, configuration data, and security data are replicated to the DR site on a
periodic basis. The DR site is normally in a passive mode; it is started when the production site is not
available. This deployment model is sometimes referred to as an active/passive model. This model is
normally adopted when the two sites are connected over a WAN and network latency does not allow
clustering across the two sites.
Background
This white paper describes the setup and testing environment deployed to validate EMC RecoverPoint 3.1
with both Oracle Fusion Middleware and an Oracle database. This test configuration validates that the
entire environment can be restored at a secondary site (DR site) in the event of a major production site
failure. This test validates third-party replication using EMC RecoverPoint and Oracle Database Server as
a joint solution that protects both database and non-database artifacts.
The white paper Disaster Recovery of Oracle Fusion Middleware with EMC RecoverPoint outlines an alternative
solution approach where Oracle Data Guard is used to protect Oracle database files.
Introduction
This white paper is a follow-on paper to Disaster Recovery of Oracle Fusion Middleware with EMC
RecoverPoint. It addresses and explains the benefits of using EMC RecoverPoint local and remote
replication to provide operational and disaster recovery for the Oracle Fusion Middleware and database
environments. RecoverPoint provides application-consistent recovery points that can be utilized in response
to a number of possible scenarios, enhancing the native availability of Oracle environments.
Audience
The intended audience for this paper includes both storage administrators and database administrators
seeking to understand best practices for failover and recovery of both Oracle Database Server and Oracle
Fusion Middleware environments. The reader will gain an understanding of the importance of using an
integrated solution to ensure effective recovery of all data files necessary to restart not only Oracle
databases, but also the applications they support. Detailed best practices are provided based upon EMC and
Oracle joint testing, giving IT organizations guidelines for developing a complete disaster
recovery solution specific to their business.
RecoverPoint
RecoverPoint is proven technology for high-availability Oracle environments, providing both local and remote
protection across SAN storage against many possible disaster scenarios. This
type of environment provides resiliency against failures within the data center infrastructure. It can also help
improve recovery from a regional disaster, with the added benefit of immediate
application recovery.
Oracle products are inherently highly available and provide enterprise-class reliability without
compromising security, performance, or scalability. To enhance the built-in availability features for
Oracle, consider the following requirements for a data protection solution:
• Protection from infrastructure failure (such as a storage array or SAN switch)
• Protection from local or regional disaster
• Protection from data corruption
Many companies are deploying continuous data protection (CDP) as a way to meet their recovery time
objectives (RTO) and recovery point objectives (RPO). A true CDP implementation ensures that all
changes to an application’s data are tracked and retained consistently. In effect, CDP creates an electronic
journal of application data snapshots, one for every instant in time that a modification occurs.
Advantages of RecoverPoint
RecoverPoint CDP preserves a record of the write transactions that take place within its environment,
providing crash-consistent or application-consistent recovery points. For local replication, RecoverPoint captures
every write with I/O splitter technology (see the “RecoverPoint write splitters” section in the appendix) and
preserves it in a local journal; for remote replication, transactions are grouped based on user-specified
policies, with significant write changes preserved in a journal at the DR site.
This preservation of writes ensures that if data is lost or corrupted, such as from a server failure, virus,
Trojan horse, software errors, or end-user errors, it is always possible to recover a clean copy of the
affected data. Another advantage of RecoverPoint is that this data recovery can be performed at either the
local or the remote location. These recovery points can be immediately accessed and mounted back to
production environments in seconds, much less time than is the case with disk-based snapshots, tape
backups, or archives.
Local and remote recovery
EMC RecoverPoint is a comprehensive data protection solution providing concurrent local and remote
(CLR) data protection. CLR integrates continuous remote replication (CRR) with local continuous data
protection (CDP), allowing users to recover applications to any point in time and protecting data against
catastrophic events that can bring entire data centers to a standstill. RecoverPoint
delivers superior data protection by allowing both local and remote replication with no application
performance degradation. As a result, organizations can deploy geographically dispersed data centers for
maximum protection from local or regional failure or disaster.
RecoverPoint CDP, CRR, and CLR use the same write-splitting methods, journaling technologies, and
appliance platform. All three make use of consistency groups, which let the user define sets of volumes
for which the inter-write order must be retained during replication and recovery. This ensures that data at
any point in time will be fully self-consistent.
Instead of compressing the data and sending it over an IP network to a remote
volume, RecoverPoint CDP writes the data to a local journal and then to a local volume. As there is no IP network involved,
and hence no latency concern, RecoverPoint CDP can synchronously track every write in the local journal
and distribute the write to the target volume, all without impacting application-server performance.
RecoverPoint CRR transfers significant writes, based on an application’s RPO/RTO, to a remote site where
they are saved in a history journal. Once the appliance receives the write, it will bundle this write up with
others into a package. Redundant blocks are eliminated from the package, and the remaining writes are
sequenced and stored with their corresponding timestamp and bookmark information. The package is then
compressed, and an MD5 checksum is generated for the package.
The package is then scheduled for delivery across the IP network to the remote appliance. Once the package is received,
the remote appliance verifies the checksum to ensure the package was not corrupted in transmission.
The data is then uncompressed and written to the journal volume. Once the data has been written to the
journal volume, it is distributed to the remote volumes, ensuring that write-order sequence is preserved.
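The packaging and verification steps described above can be sketched in a few lines of Python. This is an illustrative model only, not RecoverPoint's implementation or wire format. The `Write` record, the `build_package` and `open_package` functions, the JSON serialization, the rule that a later write to a block supersedes an earlier one, and the choice to checksum the compressed payload are all assumptions made for the example.

```python
import hashlib
import json
import zlib
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Write:
    """One captured write (hypothetical record layout)."""
    seq: int          # arrival order at the appliance
    block: int        # target block address
    data: str         # payload, kept as text for simplicity
    timestamp: float  # time the write was captured


def build_package(writes):
    """Bundle writes: drop superseded blocks, sequence, compress, checksum."""
    latest = {}
    for w in sorted(writes, key=lambda w: w.seq):
        latest[w.block] = w  # assumption: a later write to a block replaces an earlier one
    ordered = sorted(latest.values(), key=lambda w: w.seq)
    raw = json.dumps([asdict(w) for w in ordered]).encode("utf-8")
    payload = zlib.compress(raw)
    # Assumption: the checksum covers the compressed payload.
    return {"payload": payload, "md5": hashlib.md5(payload).hexdigest()}


def open_package(package):
    """Verify the checksum, then decompress and return writes in sequence order."""
    if hashlib.md5(package["payload"]).hexdigest() != package["md5"]:
        raise ValueError("package corrupted in transmission")
    records = json.loads(zlib.decompress(package["payload"]))
    return [Write(**r) for r in records]
```

In this sketch the receiving side rejects a package whose checksum does not match before any data is decompressed, which mirrors the order of operations in the text: verify, uncompress, then write to the journal.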
RecoverPoint CRR enables the user to perform backups at the remote site, eliminating the need to take the
production file systems offline.
With RecoverPoint, data recovery can be performed locally and/or remotely by rewinding the target
volumes back to a selected point in time by using earlier versions of data saved in the journal.
In this validation, RecoverPoint CRR provided protection for Oracle Fusion Middleware data. The testing validated that an entire
environment could be restored at a secondary or disaster recovery site in the event of a major production
site failure.
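The journal-based rewind described in this section can be illustrated with a toy model. The sketch below is not how RecoverPoint stores its journal. The `JournaledVolume` class, its method names, and the in-memory dictionary standing in for a volume are hypothetical, and the model assumes writes arrive with increasing timestamps.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class JournaledVolume:
    """Toy replica volume with an undo journal (hypothetical model)."""
    blocks: Dict[int, str] = field(default_factory=dict)
    # Each journal entry is (timestamp, block, previous contents or None).
    journal: List[Tuple[float, int, object]] = field(default_factory=list)

    def apply_write(self, timestamp: float, block: int, data: str) -> None:
        # Save the earlier version of the block before overwriting it,
        # so that the write can be undone later.
        self.journal.append((timestamp, block, self.blocks.get(block)))
        self.blocks[block] = data

    def rewind_to(self, point_in_time: float) -> None:
        # Undo, newest first, every write made after the selected point in time.
        while self.journal and self.journal[-1][0] > point_in_time:
            _, block, old_data = self.journal.pop()
            if old_data is None:
                self.blocks.pop(block, None)  # block did not exist before this write
            else:
                self.blocks[block] = old_data
```

Undoing entries in reverse order matters: if a block was written several times after the selected point, only the oldest saved version reflects the state at that point.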
Replication modes
When transferring data from the local site to the remote site, the RecoverPoint system automatically
switches between replication modes on the fly so that it always uses the method that best fits the
current load conditions and the replication policy. The available modes are:
• Continuous synchronous mode
• Continuous asynchronous mode
• Snapshot mode
Mode selection takes into account the application load, throughput capacity, and replication policy. Regardless of the replication
mode, RecoverPoint is unique in its ability to guarantee a consistent replica at the target side under all
circumstances, and in its ability to retain write order fidelity in multi-host heterogeneous SAN
environments. For more information regarding replication modes, please refer to the EMC RecoverPoint
Release 3.1 Administrator’s Guide.
Oracle database protection with RecoverPoint
EMC RecoverPoint supports both single-instance Oracle databases and Oracle Real Application Clusters (RAC) for local and
remote replication of SAN-attached Oracle volumes. The Oracle LUNs are grouped into a single
consistency group, and replication sets are created to map the production LUNs to the remote and/or local
copy LUNs. RecoverPoint then processes the consistency group based on the type of recovery required.
The following are among the options Oracle provides for protecting its databases, which can be integrated
with RecoverPoint replication.
• Application-consistent recovery from a shutdown (also known as “cold” backup)
The user creates a consistency group that represents the Oracle application. The consistency group
contains all volumes for the application, including data files, online redo log files, configuration files,
and optionally control files. This method produces a copy from which you can restore the database
with 100 percent reliability. Because normal operations must be halted while the “cold” backup is
being created, this method is not appropriate for systems that must operate on a 24 x 7 basis. In
addition, any changes protected by RecoverPoint that are made before or after the “cold” backup will
not be available as an application-consistent recovery point, but they can be recovered as a crash-consistent recovery point (see below).
When the database is shut down, the user will create a RecoverPoint bookmark for the specific
consistency group, identifying the image as a “cold” backup image. This bookmark can be used to
identify a point-in-time recovery image that represents a fully restorable and restartable Oracle
database image.
• Crash-consistent recovery during operations
This process enables the creation of crash-consistent images without requiring system shutdown. This
process is performed by default by RecoverPoint for all applications as part of the RecoverPoint write-splitting operations. As a write is issued to the production volumes, RecoverPoint splitters intercept it
and send a copy of the write to the RecoverPoint appliance for further processing. These captured
writes represent the on-disk consistent data, which is the same data that remains on external storage
even when an application crashes.
When Oracle is restarted from a server crash, an instance must first open the database and then execute
recovery operations. The instance automatically uses the redo log to recover the committed data in the
database buffers that was lost when the instance failed. Oracle also undoes any transactions that were
in progress on the failed instance when it crashed, then clears any locks held by the crashed instance
after recovery is complete. When Oracle is restarted from a RecoverPoint crash-consistent image, it
will perform the same recovery procedures.
• Application-consistent recovery during operation (also known as “hot” or “fuzzy” backup)
This process enables the creation of application-consistent images without requiring system shutdown.
All data files belonging to the relevant tablespaces, archive log files, and control files
must be flushed from the server’s in-memory buffers to disk. To ensure that Oracle can recover from these
images, Oracle must write additional information to the log file, information that is not required when crash-consistent images are sufficient.
This feature in RecoverPoint requires that the user script several commands, both to the Oracle Server
and to the RecoverPoint appliance. The procedure entails placing the appropriate tablespace or
database into backup mode (for example, ALTER TABLESPACE <tablespace_name> BEGIN BACKUP or ALTER
DATABASE BEGIN BACKUP). In “hot” backup mode, when a database block is modified,
the entire block is written to the online redo logs; in normal operations, only the changed bytes are
written. Also, the data file headers are not updated with the SCN when a checkpoint is performed.
Once Oracle backup mode is set, the script creates a RecoverPoint bookmark for the specific
consistency group to identify the image as an application-consistent image. Archived redo logs can be
used against this image.
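The scripted “hot” backup sequence can be outlined as follows. This is a minimal sketch under stated assumptions, not a supported procedure. The `run_sql` and `create_bookmark` callables are hypothetical stand-ins for whatever mechanism a site uses to send a statement to Oracle and to request a RecoverPoint bookmark, and no RecoverPoint command syntax is implied. The paper describes only entering backup mode and creating the bookmark; the END BACKUP and ARCHIVE LOG CURRENT steps are added here as common Oracle practice.

```python
from typing import Callable


def hot_backup_bookmark(run_sql: Callable[[str], None],
                        create_bookmark: Callable[[str], None],
                        bookmark_name: str) -> None:
    """Bookmark an application-consistent image while the database stays open."""
    # Step 1: place the whole database into backup mode.
    run_sql("ALTER DATABASE BEGIN BACKUP")
    try:
        # Step 2: mark this point in the consistency group's journal.
        create_bookmark(bookmark_name)
    finally:
        # Step 3 (assumption, not in the paper): always leave backup mode,
        # even if the bookmark request fails.
        run_sql("ALTER DATABASE END BACKUP")
    # Step 4 (assumption, not in the paper): archive the current redo log so
    # the redo generated during backup mode is available for recovery.
    run_sql("ALTER SYSTEM ARCHIVE LOG CURRENT")
```

Passing the two operations in as callables keeps the ordering logic testable without a database or an appliance, and makes it explicit that the connection details are site-specific.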
Federated environments and consistency groups
Federated environments are related applications that span multiple servers and storage arrays at the same
site. Each application has its own RPO and RTO policies that govern the protection and recoverability of
the application’s data. In RecoverPoint terminology, each application becomes a RecoverPoint
consistency group with its own policy, journal, and replication set. For a successful recovery of the
federated environment, the customer must ensure that the individual consistency groups have a common
recovery point across all of the applications.
To get a common recovery point, the user creates a “group set” that defines the RPO across all of the
RecoverPoint consistency groups that make up the federated environment. This is done through the use of a
common bookmark in each journal for the consistency groups that are part of the “group set.”
When the user recovers each of the consistency groups that make up the federated environment and selects
the common bookmark, the user will ensure that all of the application’s data is recovered to the exact same
point in time. This enables the data to be used for such advanced features as building a testing and
development environment, creating a federated backup image, data mining, and so forth.
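Selecting a common bookmark across the consistency groups of a group set can be sketched as a set intersection over the bookmark names in each journal. The function name, the `(name, timestamp)` bookmark representation, and the group names in the usage example are hypothetical and are not taken from the RecoverPoint interface.

```python
from typing import Dict, List, Optional, Tuple

Bookmark = Tuple[str, float]  # (bookmark name, timestamp), a hypothetical layout


def latest_common_bookmark(journals: Dict[str, List[Bookmark]]) -> Optional[str]:
    """Return the newest bookmark name present in every group's journal."""
    if not journals:
        return None
    common = None
    for bookmarks in journals.values():
        names = {name for name, _ in bookmarks}
        common = names if common is None else common & names
    if not common:
        return None

    def first_seen(name: str) -> float:
        # Earliest time at which any group recorded this bookmark.
        return min(ts for bms in journals.values() for n, ts in bms if n == name)

    return max(common, key=first_seen)


journals = {
    "web": [("nightly", 100.0), ("pre_patch", 200.0)],
    "middleware": [("nightly", 100.1), ("pre_patch", 200.2), ("mw_only", 250.0)],
    "database": [("nightly", 100.0), ("pre_patch", 200.1)],
}
print(latest_common_bookmark(journals))  # pre_patch
```

A bookmark that exists in only one journal, such as `mw_only` above, is ignored because recovering to it would leave the other tiers at a different point in time.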
Use of federated consistency groups provides the following:
• Allows application recovery to be tiered by service level
  - Multiple volumes per group
  - Mixed recovery point objectives within the same infrastructure
• Provides independent replication controls
  - Recover by group, either locally or remotely
  - Start/stop by group
• Enables grouping of optimization
  - Importance
  - Resource usage
  - Recovery point and recovery time objectives
• Each tier can have different service level agreements
  - Consistency groups per tier
  - Operational recovery of tier
• Enforces consistency across tiers
  - Federated environments
  - Recover to a known point for all applications
  - Disaster recovery for tier or application
  - Spans operating systems, applications, storage, and servers
• Enables advanced functions
  - Full environment cloning
  - Application upgrade testing
  - Data mining
  - Consistent production rebuild
The federated environment topology consists of three separate tiers: the web tier, the Oracle Fusion
Middleware SOA Suite application tier, and the database tier. For successful recovery, consistency must be
maintained between the middleware application tier and the database tier.
Use case objectives
Objective | Details
Validate the test system architecture deploying key Oracle and EMC technologies. | The architecture consists of SAN-based storage with multiple Linux servers deployed in a production/DR site mode.
Utilize EMC RecoverPoint to protect all tiers, that is, web, Oracle Fusion Middleware, and database. | Describe how RecoverPoint is set up and administered to protect the Oracle WebLogic application server, configuration information, and the WebLogic persistent stores (transaction logs), the SOA Suite binaries and configuration, and the Oracle database.
Ensure a very small to zero RTO and a very small RPO. | Detail how RecoverPoint CLR is used to create simultaneous local/remote replicas and bookmarks.
Provide no single point of failure. | Accomplish with clustered WebLogic Application servers, EMC storage, and highly available replication appliance (RPA) configurations.
Support the replication of WebLogic configuration and transaction logs (TLogs). | Achieve through EMC RecoverPoint CLR for local and remote replication.
Set up procedures to demonstrate planned switchover and switchback and unplanned failover and failback. | Show these procedures using EMC RecoverPoint.
Demonstrate ease of use in WebLogic and Oracle database environment failover and failback procedures. | Document these procedures with easy-to-follow failover and failback instructions.
Run the Oracle validation test suite, verifying failover data integrity. | Execute test program and procedures that validate RecoverPoint as a DR solution. This paper describes a test environment that is representative of DR protection of a production environment. The actual validation tests are not addressed here.
Oracle/EMC environment block diagram
Figure 1 depicts the major components and the relationships between the servers and applications on the
primary/production and disaster recovery sites. The configuration consists of a single web host and two
application servers with Oracle Fusion Middleware. They are clustered using Oracle WebLogic Server
clustering. EMC RecoverPoint protects all objects including the Oracle Home binaries, Oracle Fusion
Middleware, and the single instance database.
[Figure 1 diagram. Labels: Clients; Primary Site and Standby Site, each with Load Balancer, Web Host (Apache Server), App Host #1 and App Host #2 (Admin Server, Cluster, BPEL + ESB + OWSM), RecoverPoint Manager, RecoverPoint Appliance, and Database; links labeled “RecoverPoint (RP) CRR, Fusion Middleware & Logs” and “Data Guard Replication”.]
Figure 1. Topology diagram of an SOA Suite Application and RecoverPoint use case environment
RecoverPoint validation architecture
The RecoverPoint architecture for the middleware disaster recovery consists of four RecoverPoint
appliances (RPAs) attached to both primary and secondary storage through Fibre Channel, with two paths to
each appliance. The connections are managed through a Brocade DCX switch and a Brocade 7600 switch. The RPAs have an IP
connection on a private subnet through a Gigabit Ethernet D-Link switch to simulate a WAN link; this is in addition to
the management IP connection that runs through a Cisco Catalyst switch. Both storage arrays, the EMC
CLARiiON® CX4-480 and CX4-960, have the integrated RecoverPoint write-splitting technology. Each
storage array has a storage group created specifically to mask the LUNs to the RecoverPoint appliances as
well as the middleware hosts.
Figure 2. RecoverPoint use case architecture
All the hosts are zoned through the Brocade switches to the CX4-480 and CX4-960, using the switch explorer
GUI. The RecoverPoint appliances were also zoned to the storage in the same manner. For each LUN
masked to the RecoverPoint appliances, a replication set is created linking the source LUN to a target LUN.
These replication sets are all placed within the middleware consistency group.
The CX splitter is integrated with each storage processor (SP) of the CLARiiON array. For every write, it sends
one copy to the storage array and the other to the RecoverPoint appliance.
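The behavior of a write splitter can be pictured with a small model. The sketch below is purely illustrative and does not represent the CLARiiON splitter's code or interfaces. The `WriteSplitter` class, the dictionary standing in for array storage, and the list standing in for the path to the appliance are all hypothetical.

```python
class WriteSplitter:
    """Toy model of an array-based write splitter (illustrative only)."""

    def __init__(self, storage: dict, appliance_queue: list, protected_luns: set):
        self.storage = storage                  # (lun, block) -> data
        self.appliance_queue = appliance_queue  # copies bound for the appliance
        self.protected_luns = protected_luns    # LUNs under replication

    def write(self, lun: int, block: int, data: bytes) -> None:
        # Writes to protected LUNs are copied to the appliance;
        # writes to other LUNs pass straight through.
        if lun in self.protected_luns:
            self.appliance_queue.append((lun, block, data))
        # The original write always goes to the storage array.
        self.storage[(lun, block)] = data
```

Only writes to protected LUNs are copied, which is why each array needs a storage group that identifies the LUNs masked to the appliances.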
The following types of storage volumes are required for RecoverPoint configuration:
• Repository volume: This volume holds the configuration and marking information during replication.
At least one repository volume is required per site, and it must be accessible from all the RPAs at the site.
• Journal volume: This volume is used to store all the modifications. The application-specific
bookmarks and timestamp details are written to the journal volume. The size of the journal depends on
the write activity to the protected LUN(s) and the RPO required for the data. Best practices
information on the sizing and configuration of journal volumes is available in the EMC RecoverPoint
Installation Guide.
• Replication set: The association created between the source volume and the local and/or remote target
volumes is called the replication set. A consistency group contains one or more replication sets.
• Consistency group: The logical grouping of replication sets identified for replication is called a
consistency group. Consistency groups ensure that the updates to the associated volumes are always
consistent with write order preserved and that they can be used to restore the database to any point in
time.
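The relationship between replication sets and consistency groups in the list above can be expressed as a small data model. This is a sketch for illustration, not a RecoverPoint object model or API. The class names, field names, and validation rules simply restate the definitions given here, and any volume names used with them would be hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class ReplicationSet:
    """A source volume paired with its local and/or remote target volume."""
    source: str
    local_copy: Optional[str] = None
    remote_copy: Optional[str] = None

    def __post_init__(self):
        # A replication set links the source to at least one target.
        if self.local_copy is None and self.remote_copy is None:
            raise ValueError("a replication set needs a local and/or remote target")


@dataclass
class ConsistencyGroup:
    """A named group of replication sets that share one journal."""
    name: str
    journal_volume: str
    replication_sets: List[ReplicationSet] = field(default_factory=list)

    def validate(self) -> None:
        # A consistency group contains one or more replication sets.
        if not self.replication_sets:
            raise ValueError("a consistency group needs at least one replication set")

    def source_volumes(self) -> List[str]:
        # Volumes whose write order must be preserved together.
        return [rs.source for rs in self.replication_sets]
```

Modeling the group as the unit that owns the journal reflects the point made earlier in the paper: write order is preserved across all volumes in a group, so recovery is selected per group rather than per volume.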
Figure 3 shows the Middleware consistency group and its assigned replication sets in the
RecoverPoint Management Application GUI. The names of the volumes in the Middleware and DR
columns are derived from the “friendly” names of the LUNs returned by the CLARiiON array during the
SCSI discovery operations. The number in parentheses is the CX LUN ID number.
Figure 3. Consistency group and replication set definitions
The following RecoverPoint Management Application GUI screenshot shows the Middleware consistency
group storage volumes replicated by RecoverPoint (defined by replication sets).
Figure 4. Storage volumes in replication sets
CLARiiON array-side preparations
The integrated RecoverPoint write-splitting technology is enabled on each storage processor (SP) of the
CLARiiON array. The integrated write-splitter will intercept every write to a protected LUN and will send
a copy of the write to the RecoverPoint appliance and send the original to the protected LUN.
This setup has a CX4-480 as primary storage and a CX4-960 at the DR site. RecoverPoint 3.1 is installed
and configured to utilize the integrated RecoverPoint write-splitting technology in the CX4 arrays.
System configuration and storage
Hardware configuration
Table 1. Server types used
| Dell PowerEdge 2950 | Dell PowerEdge 900 |
|---|---|
| 2 Quad-Core CPUs, 2.50 GHz, 16 GB RAM, Brocade 8 GB HBAs | 4 Quad-Core CPUs, 2.93 GHz, 32 GB RAM, Brocade 8 GB HBAs |
Table 2. Fibre Channel switch types
| Primary | DR |
|---|---|
| Brocade DCX | Brocade 7600 |
Table 3. Storage types used
| Primary | DR |
|---|---|
| CLARiiON CX4-480, FLARE 28 | CLARiiON CX4-960, FLARE 28 |
Table 4. Server functions and types
| Server function | Server type |
|---|---|
| Production Site – Apache Web Server | Dell 2950 |
| Production Site – Oracle WebLogic Server Cluster 1 | Dell 2950 |
| Production Site – Oracle WebLogic Server Cluster 2 | Dell 2950 |
| Production Site – Oracle Database Server | Dell 900 |
| DR Site – Apache Web Server | Dell 2950 |
| DR Site – Oracle WebLogic Server Cluster 1 | Dell 2950 |
| DR Site – Oracle WebLogic Server Cluster 2 | Dell 2950 |
| DR Site – Oracle Database Server | Dell 900 |
Software configuration
Table 5. Software versions and configuration
| Software | Version | Configuration |
|---|---|---|
| OEL Enterprise Linux Server (Carthage) | Release 5.2 | x86_64 |
| EMC PowerPath® | 5.1 SP 2 | |
| EMC RecoverPoint CLR | 3.1 | |
| Oracle Database 10g Release 2 | 10.2.0.1 | Linux X86_64 |
| Oracle Database Server Patch Set | 10.2.0.4 | Patch Set 6810189, Linux X86_64 |
| Oracle Enterprise Manager 10g Grid Control | 10.2.0.4 | |
| Oracle Enterprise Manager 10g Grid Control Agent | 10.2.0.4 | Linux x86_64 |
| Oracle WebLogic Server | 9.2 MP3 | Linux x86 |
| Oracle Application Server SOA Suite 10g Release 3 | 10.1.3.1 | Linux x86 |
| Oracle Application Server Patch Set | 10.1.3.4 | Patch Set 7272722 |
| Oracle SOA Suite 10.1.3.4 for WebLogic Server 9.2 MP3 Patch | | Patch 7490612 |
| Oracle Patch – Resolve Prereq Installation Issue | 10.1.3.1 | |
| Apache HTTP Server | 2.2.3 | |
| Oracle JDeveloper 10g Release 3 | Studio Version 10.1.3 | |
| Apache ANT | 1.7.1 | |
Patch 633905
Linux x86_64
Windows XP – Service Pack 2
Oracle Fusion Middleware, web tier, and Database volume layout
Table 6. Volume layout
| Volume name | Tier | Size | Mounted on nodes | Mount point | Notes/Comments |
|---|---|---|---|---|---|
| VolWeb | Web Tier | 20G | webhost | /u01/app/oracle | Volume for Apache Install |
| VolAdmin | App Tier | 20G | apphost1 | /u01/app/oracle/wls/soaDomain/admin | Volume for Admin server Instances |
| VolWLS1 | App Tier | 20G | apphost1 | /u01/app/oracle/wls/soaDomain/mng1 | Volume for Managed Server Instance |
| VolWLS2 | App Tier | 20G | apphost2 | /u01/app/oracle/wls/soaDomain/mng2 | Volume for Managed Server Instance |
| VolData | App Tier | 20G | apphost1, apphost2 | /u01/app/oracle/data | Volume for TLogs and JMS Data |
| VolOrcl1 | App Tier | 20G | apphost1 | /u01/app/oracle/product | Volume for binaries, both Oracle and WebLogic |
| VolOrcl1 | App Tier | 20G | apphost2 | /u01/app/oracle/product | Volume for binaries, both Oracle and WebLogic |
| Oradata_1 | DB Tier | 100G | dbhost | /u01/app/oracle | Volume for Oracle binaries and Flash Recovery Area |
| Oradata_2 | DB Tier | 100G | dbhost | /u01/oradata/orclsoa/cg1_dbf_undo | Database Files and Undo |
| Oradata_3 | DB Tier | 50G | dbhost | /u01/oradata/orclsoa/cg2_redo | Database Online Redo Logs |
| Oradata_4 | DB Tier | 100G | dbhost | /u01/oradata/orclsoa/cg3_arch_ctl | Database Archive Logs and Controlfiles |
| Journal | RP | 50G | RP appliance | Raw Volume - Fusion Middleware Consistency Group Production | RecoverPoint metadata |
| Journal | RP | 50G | RP appliance | Raw Volume - Fusion Middleware Consistency Group CRR Replica | RecoverPoint metadata |
| Journal | RP | 50G | RP appliance | Raw Volume - Database Consistency Group, cg1_dbf_undo Production | RecoverPoint metadata |
| Journal | RP | 50G | RP appliance | Raw Volume - Database Consistency Group, cg1_dbf_undo CRR Replica | RecoverPoint metadata |
| Journal | RP | 50G | RP appliance | Raw Volume - Database Consistency Group, cg2_redo Production | RecoverPoint metadata |
| Journal | RP | 50G | RP appliance | Raw Volume - Database Consistency Group, cg2_redo CRR Replica | RecoverPoint metadata |
| Journal | RP | 50G | RP appliance | Raw Volume - Database Consistency Group, cg3_arch_ctl Production | RecoverPoint metadata |
| Journal | RP | 50G | RP appliance | Raw Volume - Database Consistency Group, cg3_arch_ctl CRR Replica | RecoverPoint metadata |
Please note the VolData volume is mounted simultaneously on both application hosts. This volume
contains the WebLogic persistent file-based store. File-based stores must be configured on shared storage.
The OCFS2 clustered file system is configured for the shared storage.
All server root volumes are installed on local disk. Although not used in this validation, RecoverPoint
does support the replication of root volumes installed in the SAN (boot from SAN).
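The shared-mount requirement for VolData can be derived mechanically from the volume layout. The sketch below, in Python, encodes a subset of the rows of Table 6 (the application and web tier volumes) and reports which volumes are mounted on more than one node and therefore need a clustered file system. The dictionary and function names are assumptions made for illustration.

```python
# Subset of Table 6: volume name -> (nodes the volume is mounted on, mount point).
VOLUME_LAYOUT = {
    "VolWeb": (("webhost",), "/u01/app/oracle"),
    "VolAdmin": (("apphost1",), "/u01/app/oracle/wls/soaDomain/admin"),
    "VolWLS1": (("apphost1",), "/u01/app/oracle/wls/soaDomain/mng1"),
    "VolWLS2": (("apphost2",), "/u01/app/oracle/wls/soaDomain/mng2"),
    "VolData": (("apphost1", "apphost2"), "/u01/app/oracle/data"),
}


def volumes_needing_cluster_fs(layout):
    """Return the volumes mounted on more than one node at the same time."""
    return sorted(
        name for name, (nodes, _mount_point) in layout.items() if len(set(nodes)) > 1
    )


print(volumes_needing_cluster_fs(VOLUME_LAYOUT))
```

With the layout shown, only VolData is reported, which matches the volume that is placed on OCFS2 in this configuration.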
RecoverPoint CRR consistency groups
For this use case three additional consistency groups for the database tier were created. The Oracle Fusion
Middleware tier consists of one consistency group as previously configured. The white paper Disaster
Recovery of Oracle Fusion Middleware with EMC RecoverPoint has further detail.
The Oracle Fusion Middleware has two types of data that are being replicated. The first is the SOA Suite,
WebLogic binaries, and configuration files. The second is the persistent file-based store used for the Java
Message Services (JMS) and Transaction Logs (TLogs). The persistent file-based store is used to recover
transactions.
The RecoverPoint consistency groups used for the database tier will be provisioned for 1) data files and
undo logs, 2) online redo logs, and 3) archived redo logs plus control files. This will provide for the various
strategies of disaster recovery according to the business needs of the organization, that is,
application-consistent recovery, crash-consistent recovery, and crash-and-application-consistent recovery.
The online redo logs change frequently. In a RecoverPoint configuration, placing the online redo logs in a
separate, lower-priority consistency group and giving priority to the data and archive redo log replication
sets (crash-consistent strategy) may improve replication performance and slightly decrease time between
snapshots (point-in-time recovery objective). The actual impact for all disaster recovery strategies depends
on the bandwidth and the percentage of writes being sent over the link to the replica site. Please refer to the
Replicating Oracle with EMC RecoverPoint Technical Notes.
Each consistency group has settings, such as the consistency group name, preferred RPA, and reservation
support, and policies, such as compression, bandwidth limits, and maximum lag, which govern the
replication process. In the configuration described above, the maximum lag must be defined. This is
the maximum offset between writing data to production storage and writing it to the RPA or journal at the
replication site. The EMC RecoverPoint Release 3.1 Administrator’s Guide has further detail on the
settings and policies.
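To make the lag definition concrete, the following Python sketch compares the time a write reached production storage with the time it reached the replication site and checks the offset against a configured maximum. It is an illustration of the definition only: the function name, the timestamps, and the 30-second limit are assumptions, and RecoverPoint evaluates its lag policy internally.

```python
from datetime import datetime, timedelta


def lag_exceeded(production_write_time, replica_write_time, max_lag):
    """Return True if the replica trails production by more than max_lag."""
    lag = replica_write_time - production_write_time
    return lag > max_lag


# Hypothetical values: a write lands at the replica 45 seconds after production.
written_at_production = datetime(2009, 8, 1, 12, 0, 0)
written_at_replica = datetime(2009, 8, 1, 12, 0, 45)
print(lag_exceeded(written_at_production, written_at_replica, timedelta(seconds=30)))
```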
RecoverPoint group sets and parallel bookmarks
RecoverPoint group sets are used throughout a federated environment. A federated environment consists of
related applications that may span multiple servers and storage arrays. Each application has its own RPO
and RTO policies that govern the protection and recoverability of the application’s data. After creating each
consistency group with its appropriate settings and policies, create a group set to ensure that the individual
consistency groups have a common consistency point across all of the applications.
The group set allows you to automatically bookmark a set of consistency groups so that the bookmark
represents the same recovery point of each consistency group in the group set. This allows you to define
consistent recovery points for consistency groups that are distributed across different RPAs. The automatic
periodic bookmark consists of the name you specified for the group set and an automatically incremented
number. Numbers start at zero, and are incremented up to 65535, then begin again at 0.
The same bookmark name is used across all consistency groups. To apply automatic bookmarks, the
sources must be at the same site (replicating in the same direction) and transfer must be enabled for each
consistency group included in the group set. The Group Set Details dialog box in the RecoverPoint
Management Application allows you to create, edit, or remove group sets.
When creating the group set, enter a name for the automatic bookmarks. Select the consistency groups to be
in the group set, and specify the bookmarking frequency. This will enable the parallel bookmarks. It is
recommended that the interval between automatic bookmarks not be less than 30 seconds.
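The naming of automatic bookmarks and the recommended minimum interval can be sketched as follows. This is a minimal Python illustration, not RecoverPoint behavior: the underscore separator between the group set name and the number, the function names, and the example name "fmw_db" are assumptions. Only the counter range (0 to 65535, then back to 0) and the 30-second recommendation come from the text above.

```python
from itertools import islice

MAX_COUNTER = 65535           # counter wraps to 0 after this value
MIN_INTERVAL_SECONDS = 30     # recommended lower bound for the bookmark interval


def automatic_bookmark_names(group_set_name, start=0):
    """Yield successive automatic bookmark names for a group set, indefinitely."""
    counter = start
    while True:
        # The separator is an assumption; the source only says "name plus number".
        yield f"{group_set_name}_{counter}"
        counter = 0 if counter == MAX_COUNTER else counter + 1


def interval_is_recommended(seconds):
    """Return True if the bookmarking interval respects the 30-second recommendation."""
    return seconds >= MIN_INTERVAL_SECONDS


# Show the wrap-around from 65535 back to 0.
print(list(islice(automatic_bookmark_names("fmw_db", start=65534), 3)))
```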
If you prefer to take a bookmark at a specific time other than the automated times, or choose to manually
take the parallel bookmarks, this can also be done through the RecoverPoint Management Application.
In the Navigation pane, select consistency groups. In the component pane, select all the consistency groups
to bookmark simultaneously. All selected consistency groups must be enabled and transfer must be active.
Click the parallel bookmarks icon in the upper right corner of the component pane. When prompted, enter
the name of the bookmark. Please do not use the name “latest” as it is a reserved word in RecoverPoint.
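The preconditions for a manual parallel bookmark can be collected into a simple check. The Python sketch below assumes a hypothetical status snapshot (a dictionary per consistency group with `enabled` and `transfer_active` flags); how such a snapshot would be obtained is not covered here. Treating the reserved name case-insensitively is a conservative assumption, not documented behavior.

```python
RESERVED_BOOKMARK_NAMES = {"latest"}


def parallel_bookmark_problems(bookmark_name, groups):
    """Return the reasons a parallel bookmark should not be taken (empty list if none).

    groups maps a consistency group name to a dict with the boolean keys
    'enabled' and 'transfer_active' (a hypothetical status snapshot).
    """
    problems = []
    cleaned = bookmark_name.strip()
    if not cleaned:
        problems.append("bookmark name is empty")
    # Case-insensitive match is an assumption made to stay on the safe side.
    if cleaned.lower() in RESERVED_BOOKMARK_NAMES:
        problems.append(f"'{cleaned}' is a reserved name")
    for name, status in sorted(groups.items()):
        if not status["enabled"]:
            problems.append(f"{name}: consistency group is not enabled")
        if not status["transfer_active"]:
            problems.append(f"{name}: transfer is not active")
    return problems


# Hypothetical snapshot of two consistency groups.
snapshot = {
    "Middleware": {"enabled": True, "transfer_active": True},
    "cg2_redo": {"enabled": True, "transfer_active": False},
}
print(parallel_bookmark_problems("latest", snapshot))
```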
The group set can be perceived as a single entity, but each consistency group functions as a separate unit
within the group set. The user recovers each of the consistency groups that make up the federated
environment and selects the common bookmark. This will ensure all of the application’s data is recovered
to the exact same point in time. This can be executed through the RecoverPoint Management Application or
using the command line interface (CLI).
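Selecting a common bookmark amounts to finding the bookmark names that appear in every consistency group's journal. A minimal Python sketch, assuming each journal is available as a list of bookmark names ordered from oldest to newest (the data and function name are hypothetical):

```python
def common_bookmarks(journals):
    """Return bookmark names present in every journal, in the order of the first journal.

    journals maps a consistency group name to its list of bookmark names,
    ordered from oldest to newest.
    """
    if not journals:
        return []
    shared = set.intersection(*(set(names) for names in journals.values()))
    first_journal = next(iter(journals.values()))
    return [name for name in first_journal if name in shared]


# Hypothetical journal contents for three consistency groups.
journals = {
    "Middleware": ["fmw_db_0", "fmw_db_1", "pre_switchover"],
    "cg1_dbf_undo": ["fmw_db_1", "pre_switchover", "nightly"],
    "cg2_redo": ["fmw_db_0", "fmw_db_1", "pre_switchover"],
}
print(common_bookmarks(journals))
```

Any name in the result is a candidate recovery point for the whole federated environment, because every group holds an image under that bookmark.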
Maintaining the autonomy of the consistency group in a group set provides the flexibility for each
application to maintain its own policies and settings to govern the protection and recoverability of the
application’s data. It enables the data to be used for such advanced features as building a testing and
development environment, creating a federated backup image, and data mining.
OCFS2 configuration
High availability for persistent stores
The WebLogic application servers are usually clustered for high availability. For the local site high
availability of the SOA Suite Topology, a persistent file-based store is used for the Java Message Services
(JMS) and Transaction Logs (TLogs). This file store needs to reside on shared disk that is accessible by all
members of the cluster, that is, by apphost1 and apphost2. The persistent file-based store can be migrated
along with its parent server as part of the whole server migration feature that provides both automatic and
manual migration.
WebLogic file stores provide two synchronous write policies: Cache-Flush and Direct-Write. With either
policy, a transaction does not complete until its writes have reached the disk. A third policy, Disabled,
improves performance, but the downside is possibly losing sent messages or generating duplicate messages
in the event of an operating system crash or hardware failure. This is due to the fact that, with the
Disabled policy, transactions are complete as soon as the writes are cached in memory, instead of waiting
for acknowledgement that the writes are written to disk.
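The difference between completing a write once it is cached in memory and completing it once it is on disk can be illustrated at the operating system level. The Python sketch below is not WebLogic code and does not reproduce any WebLogic policy. It only shows the general technique: a flush hands the data to the operating system cache, while `os.fsync` blocks until the operating system reports that the data has reached the storage device. The function name and file name are assumptions.

```python
import os
import tempfile


def append_record(path, record, wait_for_disk):
    """Append a record; optionally block until the OS reports it is on disk."""
    with open(path, "ab") as store:
        store.write(record)
        store.flush()                    # data is now in the OS cache only
        if wait_for_disk:
            os.fsync(store.fileno())     # wait for the write to reach the device
    return len(record)


with tempfile.TemporaryDirectory() as directory:
    store_path = os.path.join(directory, "store.dat")
    append_record(store_path, b"message-1\n", wait_for_disk=False)  # fast, not crash-safe
    append_record(store_path, b"message-2\n", wait_for_disk=True)   # slower, durable
    with open(store_path, "rb") as store:
        print(store.read())
```

A record written without waiting can be lost if the host crashes before the cache is written out, which is the trade-off the paragraph above describes.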
Use high-availability storage for state data
The server migration process moves or “migrates” services. Some state information associated with the
work in process at the time of failure is persisted to storage. To ensure high availability, it is critical that
such state information remains available to the server instance and the services it hosts after migration. It
should be stored in a shared storage system that is accessible to any potential machine to which a failed
migratable server might be migrated. For highest reliability, use a shared storage array solution (like EMC
CLARiiON) that is itself highly available and a SAN designed for high availability.
If independent local file systems resided on a shared LUN, there would be no means of cache
synchronization, and the file systems would eventually corrupt each other. Therefore, the shared storage
solution in a SAN as deployed in this white paper uses the Oracle Cluster File System (OCFS2) as the
host-based clustered or shared file system technology. OCFS2 is a symmetric shared disk cluster file system
that allows each node to read and write both metadata and data directly to the SAN. OCFS2 is used here,
but any shared file system technology can be used, such as Red Hat GFS, Veritas VxCFS, or IBM GPFS.
Network configuration
The EMC RecoverPoint system has been designed as a secure platform for CDP (local), CRR (remote), and
CLR (local and remote replication). EMC has invested in ensuring security for all aspects of its
RecoverPoint system, including the operating system, networking, and RecoverPoint software.
Security settings are divided into the following categories:
• RecoverPoint appliance (RPA) operating system and networking
• Access control settings to limit access by end users or by external product components
• Log settings related to the logging of events
• Communication security settings related to security for the product network communications
• Data security settings available to ensure protection of the data handled by the product
Secure serviceability is available to ensure control of service operations performed on the products by EMC
or its service partners.
Other security considerations list known issues, including false positive findings that appear when scanning
the product for vulnerabilities.
For further discussion and implementation detail, please refer to EMC RecoverPoint Release 3.1 and
Service Pack Releases Security Configuration Guide.
Planning for disasters and planned downtime
Initiate the replication process
RecoverPoint is an intelligent data protection and recovery solution that runs on out-of-band appliances
attached to the SAN and the IP network. RecoverPoint uses your enterprise’s existing network and storage
systems. To prepare for installation, you should be familiar with how the RecoverPoint system integrates
with your existing systems.
In preparation for RecoverPoint installation:
1. The RPAs connect to the hosts and storage subsystems using a Fibre Channel SAN. Before
   installing RecoverPoint, these subsystems should be in place.
2. RPAs are linked to the WAN interface using an Ethernet/IP connection (eth0).
3. Before you install RPAs and define consistency groups, ensure that sufficient volumes are
   available on the SAN-attached storage at each site for use by RecoverPoint. As part of the
   installation process, you must create a repository volume and journal volumes at both the primary
   and secondary sites. The journal is maintained on one or more SAN-attached storage volumes for
   each copy of a consistency group; that is, the production copy, together with a local copy or a
   remote copy, or both. To configure a consistency group, you must designate one or more volumes
   on the production storage to be replicated, together with corresponding volumes on the copy (or
   copies). RecoverPoint operation requires proper storage mapping (LUN masking) configuration.
4. The RecoverPoint system uses the Network Time Protocol (NTP) to synchronize the clocks across
   all of the machines that are attached to a given installation. It is highly recommended that you
   configure an external NTP server, which runs on Linux machines, for use by the RecoverPoint
   system prior to RecoverPoint installation.
5. Zoning on the Fibre Channel switch must be determined. It allows for communication between
   the host, storage, and appliances. Zoning instructions vary according to the type of HBA ports
   installed in your RPAs. Without the proper zoning, replication cannot be performed properly.
6. The RPAs should be installed independently at the primary and secondary sites.
RecoverPoint software installation and configuration
RecoverPoint is installed with a GUI wizard that helps you install new RecoverPoint clusters from a single
management point. At the end of the wizard, you will be able to start building a replication configuration.
Before you begin using the RecoverPoint Installer, ensure you have completed the following operations:
1. Identified the type of installation (one site or two sites).
2. Confirmed that all RPAs are available at the site(s).
3. Confirmed that at least two ports are available on each of two fabrics for each RPA.
4. Completed the integration with installed networks and storage.
5. Completed the Customer Site Planning Sheet.
6. Unpacked and physically installed the RPAs.
7. Assigned Management IP addresses for the RPAs (done when logging in as the user “boxmgmt”
   with the password “boxmgmt”). The IP addresses are used by the RecoverPoint Installer to
   complete the installation.
For detailed information, please refer to the EMC RecoverPoint Release 3.1 Installation Guide.
After completing system installation for the first time and verifying that all clocks are synchronized, verify
that you can access the management interface using both the RecoverPoint Management Application GUI
and the CLI.
Starting replication
It is assumed that host-based, fabric-based, and array-based splitters have been installed as needed. If not,
please refer to the EMC RecoverPoint Release 3.1 Installation Guide. Before you begin, add splitters to the
RecoverPoint system.
Create the consistency groups using the Add New Group Wizard. Define the settings and policies for the
consistency group.
Configure the copies. Enter the values for the General Settings, Protection Settings and Advanced Settings
for the production, local, and remote replicas. If you are using the Create New Consistency Group Wizard,
the Production Copy Settings dialog box appears.
In the Replication Set and Journal Volumes Configuration dialog box, click the Add New Replication Set
button to define a new replication set. A replication set consists of a source (normally production) volume
and a corresponding volume for each replica. Each consistency group contains as many replication sets as
there are source volumes in the consistency group.
Add the volumes to the splitters that have been installed. When the volumes are added, they will be
automatically attached to all of the splitters that have access to that volume.
Now that the consistency groups and their journals are defined and configured, splitters have been added,
and volumes have been attached to splitters, enable the groups. Enabling the groups initiates the
transfer (synchronization) of the production volumes to the replica volumes at the disaster recovery site.
When a consistency group is initialized for the first time, the RecoverPoint system must complete full
synchronization of all designated volumes. The volumes at the local and remote site can be initialized
while host applications are running.
Alternatively, the current production middleware installation can be backed up, and manually transferred to
the remote site. This is possible, because the RecoverPoint system can efficiently determine which blocks
are different between the production and replica copies. It sends only the data for those blocks to the
replica storage, as the initialization snapshot.
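The idea of sending only the blocks that differ can be sketched with per-block digests. The Python example below illustrates the general technique only. It is not a description of how RecoverPoint compares volumes, and the 4 KB block size, the use of SHA-256, and the function names are all assumptions.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, in bytes


def block_digests(data, block_size=BLOCK_SIZE):
    """Return one SHA-256 digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[offset:offset + block_size]).digest()
        for offset in range(0, len(data), block_size)
    ]


def differing_blocks(production, replica, block_size=BLOCK_SIZE):
    """Return the indexes of blocks that must be sent to make the replica match."""
    prod = block_digests(production, block_size)
    repl = block_digests(replica, block_size)
    longest = max(len(prod), len(repl))
    return [
        index for index in range(longest)
        if index >= len(prod) or index >= len(repl) or prod[index] != repl[index]
    ]


# A replica restored from an older backup: only the middle block has changed since.
production_image = b"a" * BLOCK_SIZE + b"b" * BLOCK_SIZE + b"c" * BLOCK_SIZE
restored_backup = b"a" * BLOCK_SIZE + b"x" * BLOCK_SIZE + b"c" * BLOCK_SIZE
print(differing_blocks(production_image, restored_backup))
```

In this example only one of the three blocks would cross the link, which is why seeding the remote site from a backup can shorten the first synchronization.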
For detailed procedures and further explanation, please refer to the EMC RecoverPoint Release 3.1
Administrator’s Guide.
Switchover procedures
Switchovers are planned operations, performed for testing and validation, that result in a smooth transition
of services and applications from one site to another. A switchover may be triggered by a maintenance
window or by a regulatory requirement to validate disaster recovery functionality. Switchovers can provide a
mechanism to establish, test, and prove SLA, RPO, and RTO requirements.
During a switchover, the current production site becomes the disaster recovery site, and the disaster
recovery site becomes the current production site.
The initial state of the production site is presumed to be up and functioning. The procedures begin with the
shutdown of the production site.
The procedures to execute the switchover are as follows.
1. The decision is made to initiate switchover to the disaster recovery site, and all participants in this
   process are advised.
2. Shut down the Oracle Fusion Middleware, that is, SOA Suite and WebLogic.
   a) Shut down the WebLogic Managed Servers.
      • Log in to the WebLogic Console. Choose SOADomain > Control. Select the managed
        servers, and click the Shutdown tab.
      • Choose the option Force Shutdown Now.
   b) Shut down the WebLogic processes on the application servers.
      • Log in to the WebLogic Administration Server application host. Using the command
        line, manually shut down the Node Manager process first and then shut down the
        Administration Server process.
      • Log in to the other nodes in the cluster and manually shut down the Node Manager
        process. In this configuration, the WebLogic Administration Server is only configured
        on one server.
3. Log in to the webhost server(s). Shut down the Apache httpd processes.
4. Unmount the file systems on all of the application and webhost servers.
5. Log in to the database server.
   a) Cleanly shut down the database and Listener. If the database is managed by Oracle Enterprise
      Manager 10g Grid Control, then log in to Oracle Grid Control and shut down the database and
      Listener. If not, use any variation of the standard Oracle commands, that is, “shutdown
      immediate” and “lsnrctl stop”.
   b) Unmount the database file systems.
6. Perform any network changes, DNS updates, or modifications to /etc/hosts, if necessary, at the disaster
   recovery site.
7. Log in to the RecoverPoint Management Application.
   a) Confirm current image selection on the production site:
      • All applications are now shut down. Take a parallel bookmark of the production site.
        The parallel bookmark is needed to identify the point across all of the consistency groups
        that represents the point in time the shutdown has completed. It causes any data held at
        the production site to be flushed to the remote site. Therefore if a recovery is needed of
        the production site prior to switchover, the parallel bookmark is guaranteed to have the
        latest data prior to the switchover. For detailed procedures to create the parallel
        bookmark, please refer to the EMC RecoverPoint 3.1 Administrator’s Guide, in the
        section, “Applying bookmarks to multiple groups simultaneously.”
      • A journal entry reflecting the parallel bookmark will be entered into each consistency
        group’s journal. In this configuration, there are five consistency groups, and therefore
        respectively, five journal entries.
      • To view the bookmark for each consistency group, in the Navigation pane, click on the
        consistency group name. Then choose the production site and click the Journal tab.
        When you see the journal entry for the bookmark, you can be assured that all the data has
        flowed through the pipe and has been replicated to the DR site.
   b) On the DR site for each consistency group in the group set:
      • Enable access to the replicated images and applications at the DR site.
      • Click on the first icon under the name of the DR site. Choose Enable Image Access.
      • Choose Select an image from the list. From the list that appears, select the named
        parallel bookmark.
      • Choose the type of access mode. The production site will be recovered from the DR site.
        Therefore, select Virtual access with instantaneous access to the image, and Roll
        image in the background to enable the recovery process.
      • Confirm the current image selection.
8. Log in to the database server.
   a) Mount the database file systems.
   b) Start the database and Listener. If the database is managed by Oracle Enterprise Manager 10g
      Grid Control, then log in to Oracle Grid Control and start the database and Listener. If not,
      use any variation of the standard Oracle commands, that is, “startup” and “lsnrctl start”.
9. Mount the file systems on all the application and webhost servers.
10. Log in to the webhost server(s). Start the Apache httpd processes.
11. Start the Oracle Fusion Middleware, that is, SOA Suite and WebLogic.
   a) Start the WebLogic processes on application servers.
      • Log in to the WebLogic Administration Server application host. Using the command
        line, manually start the Administration Server process first and then start the Node
        Manager process.
      • Log in to the other nodes in the cluster and manually start the Node Manager process. In
        this configuration, the WebLogic Administration Server is only configured on one server.
   b) Start the WebLogic Managed Servers.
      • Log in to the WebLogic Console. Choose SOADomain > Control. Select the managed
        servers, and click the Start tab.
12. The site is ready to perform work. Applications are started, and database has been switched over.
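At the tier level, the startup sequence at the DR site (steps 8 through 11) is the shutdown sequence at the production site (steps 2 through 5) in reverse, with each action inverted. The Python sketch below captures that ordering as data so that a runbook or checklist can be generated from a single list. It prints the plan only and does not run any command. The list and function names are assumptions.

```python
# Tiers in the order they are taken down: (tier, stop action, start action).
TIERS = [
    ("Oracle Fusion Middleware (SOA Suite and WebLogic)", "shut down", "start"),
    ("Apache httpd on the webhost server(s)", "shut down", "start"),
    ("application and webhost file systems", "unmount", "mount"),
    ("database and Listener", "shut down", "start"),
    ("database file systems", "unmount", "mount"),
]


def shutdown_plan(tiers=TIERS):
    """Return the ordered shutdown actions for the site being left."""
    return [f"{stop} {tier}" for tier, stop, _start in tiers]


def startup_plan(tiers=TIERS):
    """Return the ordered startup actions for the site taking over (reverse order)."""
    return [f"{start} {tier}" for tier, _stop, start in reversed(tiers)]


for number, action in enumerate(shutdown_plan() + startup_plan(), start=1):
    print(f"{number}. {action}")
```

Deriving both sequences from one list keeps them symmetric if a tier is added later. The RecoverPoint actions in step 7 sit between the two halves and are not modeled here.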
Switchback procedures
Testing and validation of the disaster recovery plan have been completed. SLA, RPO, and RTO
requirements have been verified, and compliance with regulatory requirements has been met and proven.
Planned maintenance is done. The next tasks are to prove the procedures to switch back from the disaster
recovery site to the primary site and, in this test scenario, to preserve all changes and propagate them to the
primary site.
During a switchback, the disaster recovery site, now acting as the primary site, returns to its initial function
as the DR site.
The initial state of the DR site is up and functioning. The production site is down. We begin by shutting
down all processes on the DR site in preparation to recover the production site from the DR site images
with RecoverPoint.
The procedures to execute the switchback are as follows.
1.
The decision is made to initiate switchback to the production site, and all participants in this process
are advised.
2.
Shut down the Oracle Fusion Middleware, that is, SOA Suite and WebLogic.
a)
Shut down the WebLogic Managed Servers.
ƒ
Log in to the WebLogic Console. Choose SOA Domain > Control. Select the managed
servers, and click the Shutdown tab.
ƒ
Choose the option to Force Shutdown Now.
b) Shut down the WebLogic processes on the application servers.
ƒ
Log in to the WebLogic Administration Server application host. Using the command
line, manually shut down the Node Manager process first and then shut down the
Administration Server process.
Disaster Recovery of Oracle Fusion Middleware and Oracle Database Server with EMC RecoverPoint
Applied Technology
24
ƒ
Log in to the other nodes in the cluster and manually shut down the Node Manager
process. In this configuration, the WebLogic Administration Server is only configured
on one server.
3.
Log in to the webhost server(s). Shut down the Apache httpd processes.
4.
Unmount the file systems on all of the application and webhost servers.
5.
Log in to the database server.
a)
Cleanly shut down the database and Listener. If the database is managed by Oracle Enterprise
Manager 10g Grid Control, then log in to Oracle Grid Control and shut down the database and
Listener. If not, use any variation of the standard Oracle commands, that is, “shutdown
immediate” and “lsnrctl stop”.
b) Unmount the database file systems.
6.
Perform any network, DNS updates, or modifications to /etc/hosts or DNS if necessary at the disaster
recovery site.
7.
Log in to the RecoverPoint Management Application.
a)
On the DR site:
ƒ
For each consistency group in the group set, to switch back to the production site, choose
the option Recover production from the DR site. RecoverPoint will automatically take a
snapshot of the latest image. The journal entry for the bookmark will be referred to as the
“Pre-replication Image”. This is the image restored on the production site.
ƒ
When the Recover production option is chosen, there will be several questions asking
whether you are sure you want to perform the production restore, and do you want to
continue. Respond Yes to these questions.
ƒ
During the initial phases of transfer, the connection will be paused for reconfiguration.
This can be seen in the Component pane of the RecoverPoint Management Application.
When the value Transfer changes from Paused to Active, you can now move to the
production site to complete the switchback recovery.
b) On the production site:
ƒ
Replication will continue from the DR site to the production site, until all data is
transferred. The amount of time to complete transfer is relative to the amount of changes
and bandwidth. In the Component pane for the selected consistency group the image
status of the production site will be “Distributing Pre-replication image” and the role will
be “Production (being restored)”.
ƒ
When the Pre-Replication image is recorded on the production site, all data has been
replicated, and syncing completed. To verify the status of the transfer, click on the entry
for the production site under each consistency group. Choose the Journal tab. The
bookmark entry will indicate the replication status. When it states “Synchronization
completed (Primary)”, replication is complete and you can continue with the next step.
ƒ
For each consistency group, enable image access. Choose logged access (physical) when
recovering the site on a permanent basis. Virtual images are temporary, as their name
indicates.
ƒ
After confirming the image, choose Resume production for each consistency group on
the production site. You will be asked if you choose to continue with this action. It will
be noted that there will be a pause while reconfiguration occurs. The response should be
Yes.
Disaster Recovery of Oracle Fusion Middleware and Oracle Database Server with EMC RecoverPoint
Applied Technology
25
▪
When “Resume production” is completed at the production site, the direction of the
replication flow reverts for each consistency group: it again originates from the production
site and flows to the DR site.
▪
The volumes are now visible on the production site and can be mounted read/write. The
RecoverPoint splitter is enabled, and all new writes will be sent to the appliance.
▪
Although not required, it is best practice to create a parallel bookmark for the group set
before restarting the applications on the production site. This provides an audit trail and
a bookmark of all applications at a consistent point in time, in case an unexpected
situation forces an immediate switch back to the DR site.
8.
Log in to the database server.
a)
Mount the database file systems.
b) Start the database and Listener. If the database is managed by Oracle Enterprise Manager 10g
Grid Control, then log in to Oracle Grid Control and start the database and Listener. If not,
use any variation of the standard Oracle commands, that is, “startup” and “lsnrctl start”.
9.
Mount the file systems on all the application and webhost servers.
10. Log in to the webhost server(s). Start the Apache httpd processes.
11. Start the Oracle Fusion Middleware, that is, SOA Suite and WebLogic.
a)
Start the WebLogic processes on the application servers.
▪
Log in to the WebLogic Administration Server application host. Using the command
line, manually start the Administration Server process first and then start the Node
Manager process.
▪
Log in to the other nodes in the cluster and manually start the Node Manager process. In
this configuration, the WebLogic Administration Server is only configured on one server.
b) Start the WebLogic Managed Servers.
▪
Log in to the WebLogic Console. Choose SOA Domain > Control. Select the managed
servers, and click the Start tab.
12. The production site is recovered. Data is being replicated to the DR site.
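The restart order in steps 8 through 11 matters: database first, then file systems, web tier, and finally the WebLogic processes. The following is a minimal Python sketch of that order as a dry-run checklist, not part of the validated procedure. The mount points and the build_restart_plan helper are hypothetical, and the script only prints the actions rather than executing them. The only literal commands are “startup” (issued in SQL*Plus) and “lsnrctl start”, as named in step 8b; the WebLogic and Apache actions are left as descriptions because start scripts vary by installation.

```python
"""Dry-run sketch of the production-site restart order (steps 8-11)."""

# Hypothetical mount points; substitute the file systems used at your site.
DB_MOUNTS = ["/u01/oradata"]
APP_MOUNTS = ["/u01/app/fmw"]


def build_restart_plan(db_mounts, app_mounts):
    """Return (where, action) pairs in the order the procedure prescribes."""
    plan = []
    # Step 8: database server first.
    for mount_point in db_mounts:
        plan.append(("database", f"mount {mount_point}"))
    plan.append(("database", "startup"))        # issued from SQL*Plus
    plan.append(("database", "lsnrctl start"))
    # Step 9: application and webhost file systems.
    for mount_point in app_mounts:
        plan.append(("app/web", f"mount {mount_point}"))
    # Step 10: web tier.
    plan.append(("web", "start Apache httpd"))
    # Step 11a: Administration Server before Node Manager, then other nodes.
    plan.append(("admin host", "start WebLogic Administration Server"))
    plan.append(("admin host", "start Node Manager"))
    plan.append(("other nodes", "start Node Manager"))
    # Step 11b: Managed Servers last, from the WebLogic Console.
    plan.append(("console", "start Managed Servers"))
    return plan


for where, action in build_restart_plan(DB_MOUNTS, APP_MOUNTS):
    print(f"[{where}] {action}")
```

Keeping the plan as data makes it easy to review the sequence with the DBA and application teams before anything is run.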
Failover procedures
The extent of the disaster and the anticipated length of time at the disaster recovery site directly affect
which type of failover procedure is required. If the failover was due to a catastrophe, such as fire,
flooding, or earthquake, then migration to the disaster recovery site is the likely scenario. In this case, the
roles of the sites change: the disaster recovery site becomes the permanent production site until the
previous production site is rebuilt or repaired.
With this type of failover, when the production site becomes available, all data and applications are
resynchronized. For the RecoverPoint procedures to execute a migration, refer to the
EMC RecoverPoint 3.1 Administrator’s Guide.
The following scenario addresses failover due to temporary loss of the site, unplanned downtime, or a
system crash. The procedures are very similar to those detailed in the previous
“Switchover procedures” and “Switchback procedures” sections. The major difference is that we begin
with no access to the production site, and no ability to gracefully shut down all applications and cleanly
switch over the database.
The procedures to execute the failover are as follows:
1.
Log in to the RecoverPoint Management Application. Enable access to the replicated images, as per step
7b in the “Switchover procedures” section that starts on page 22. The wizard will ask which image to
choose. For this exercise, we choose the named parallel bookmark image taken prior to switchover
from the list of bookmarks.
2.
Start the database and Listener. If the database is managed by Oracle Enterprise Manager 10g Grid
Control, then log in to Oracle Grid Control and start the database and Listener. If not, use any
variation of the standard Oracle commands, that is, “startup” and “lsnrctl start”.
3.
Continue with steps 9 through 11 of the “Switchover procedures” section, that is, mount the file
systems and start the Apache HTTP application.
4.
If necessary, perform any network or DNS updates, or modifications to /etc/hosts, at the disaster
recovery site.
5.
The site is ready for work.
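Where step 4 involves editing /etc/hosts, the change usually amounts to pointing existing host names at the addresses in use at the disaster recovery site. The following Python sketch illustrates one way to prepare such an edit for review. It is an illustration under stated assumptions: the remap_hosts helper, the host names, and the addresses (taken from the reserved documentation ranges) are all hypothetical, and the function only transforms text; it does not write to /etc/hosts.

```python
def remap_hosts(hosts_text, new_ips):
    """Return hosts-file text with the address replaced on every line that
    names a host listed in new_ips (a mapping of host name -> new address).
    Comments, blank lines, and unrelated entries are left untouched."""
    out = []
    for line in hosts_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            out.append(line)
            continue
        fields = stripped.split()
        names = fields[1:]  # everything after the address
        match = next((n for n in names if n in new_ips), None)
        if match is None:
            out.append(line)
        else:
            out.append(" ".join([new_ips[match]] + names))
    return "\n".join(out) + "\n"


# Hypothetical example entry and DR-site address.
current = (
    "127.0.0.1 localhost\n"
    "192.0.2.10 soahost1.example.com soahost1\n"
)
print(remap_hosts(current, {"soahost1": "198.51.100.10"}), end="")
```

Generating the new file content separately from applying it allows the change to be reviewed, and reversed with the opposite mapping at failback.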
Failback procedures
The systems on the production site are now available and a decision is made to return to the production site.
The applications and database at the disaster recovery site must be shut down, failed back, or re-instantiated
and restarted.
The procedures to execute the failback are as follows:
1.
Shut down the WebLogic Managed Servers, WebLogic processes, and Apache httpd services, and
unmount the file systems, as per steps 2 through 4 of the “Switchback procedures” section of this paper
starting on page 24.
2.
Cleanly shut down the database and Listener. If the database is managed by Oracle Enterprise
Manager 10g Grid Control, then log in to Oracle Grid Control and shut down the database and
Listener. If not, use any variation of the standard Oracle commands, that is, “shutdown immediate”
and “lsnrctl stop”. Unmount the database file systems.
3.
In this implementation, the failover was temporary. Therefore, to fail back, log in to the RecoverPoint
Management Application and repeat steps 7a and 7b of the “Switchback procedures” section.
4.
Whether the database must be re-instantiated or recovered is independent of RecoverPoint.
Depending on the available bandwidth and the size of the database, it may be better to
recover from backup and then synchronize only the changes with RecoverPoint. To implement this
procedure, please refer to the EMC RecoverPoint Release 3.1 Administrator’s Guide. If
re-instantiation is not required, then mount the database file systems and start the database and
Listener. If the database is managed by Oracle Enterprise Manager 10g Grid Control, then log in to
Oracle Grid Control and start the database and Listener. If not, use any variation of the standard
Oracle commands, that is, “startup” and “lsnrctl start”.
5.
After completion of these steps, the original production site will be available to resume work. To start
the applications repeat steps 9 through 11 of the “Switchover procedures”.
6.
If necessary, perform any network or DNS updates, or modifications to /etc/hosts, at the
production site. The production site is recovered. Data is being replicated to the DR site.
General recommendations
Setting snapshots or manual bookmarks based on requirements
The RPO policy for the Oracle Fusion Middleware binaries has been set based on size, number of writes,
and time, with a configured lag time of 12 hours. The lag time of 12 hours was decided upon based on the
knowledge that binaries have few expected modifications. The way to think about your policy is that you
are guaranteed a recovery point within 12 hours.
Suppose the binaries are scheduled to be patched. We want the patched binaries to be replicated as soon
as the patch has been applied, rather than up to the configured lag time later, so that if an event
forces a failover to the DR site, the correct version of the code is available there.
To ensure the configuration changes are replicated, a bookmark is created. A bookmark is a named
snapshot (image) that uniquely identifies the image at a point in time. The bookmark creates a
transactionally consistent snapshot that is transferred to the replica. Creating the bookmark forces
all data held in the RPA’s buffers at the production side to be flushed to the remote site. At the remote
site, a corresponding bookmark is recorded when replication is completed.
In this situation, it is best practice to create a bookmark prior to patching the binaries. If a recovery of the
binaries prior to patching is needed, the “pre-patch” bookmark will guarantee the latest data prior to the
patch.
After patching the binaries, create a “post-patch” bookmark, so a recovery using this bookmark is
guaranteed to have the latest data image after the patch.
By creating both a “pre-patch” and “post-patch” bookmark, it becomes simple to choose either bookmark,
depending on the point-in-time recovery required.
To create the bookmark, in the Navigation pane of the RecoverPoint Management Application, select the
consistency group. Verify on the Status tab that it is active. Click the Bookmark button and enter a
descriptive name.
Snapshots continue to be taken according to the configured RPO policy, independent of any manually
initiated bookmark.
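Choosing between the “pre-patch” and “post-patch” bookmarks comes down to picking the latest bookmark taken at or before the desired recovery point. The following Python sketch models that choice. It is only a model of the selection logic, not the RecoverPoint interface: the Bookmark class, the latest_before helper, and the timestamps are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Bookmark:
    """A named point-in-time image (hypothetical model)."""
    name: str
    taken_at: datetime


def latest_before(bookmarks: List[Bookmark], point: datetime) -> Optional[Bookmark]:
    """Return the most recent bookmark taken at or before `point`, if any."""
    candidates = [b for b in bookmarks if b.taken_at <= point]
    return max(candidates, key=lambda b: b.taken_at) if candidates else None


# Hypothetical patch window: bookmark before and after patching.
bookmarks = [
    Bookmark("pre-patch", datetime(2009, 8, 1, 1, 0)),
    Bookmark("post-patch", datetime(2009, 8, 1, 2, 0)),
]

# Recovering to a point during the patch window selects the pre-patch image.
chosen = latest_before(bookmarks, datetime(2009, 8, 1, 1, 30))
print(chosen.name)  # pre-patch
```

Descriptive bookmark names serve the same purpose in the Management Application, where bookmarked images are listed by name.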
Periodic DR testing
To meet governance and regulatory requirements, and to ensure that failover to the disaster recovery site
will succeed in times of catastrophe, it is good practice to verify that the replicas can be used to restore
data, recover from a disaster, or seamlessly take over. In most cases, while testing a replica, applications
can continue to run on the production servers, and replication can continue. The production writes will be
stored in the replica journal until testing is completed.
Upon completion of testing, write access to the replica is disabled, which results in any writes made during
testing being rolled back by RecoverPoint. However, any concurrent writes from the production applications
will be automatically distributed from the journal to the replica. This entire process can be completed
without application downtime and without loss of data at the replica.
In the RecoverPoint Management Application at the DR site:
1.
From the Image Access menu, select Enable Image Access. If you are only testing the image, and do
not expect a high rate of modification/change to the data, selecting Virtual Image Access without
Roll Image in Background is appropriate. If you expect to do more testing/forensics, over an
extended period of time or need maximum performance while testing, select Logged Image Access
(physical). Virtual access is instantaneous while physical access will require more time before the
image is available for use. For further discussion of the differences in the image access modes, please
refer to the “Appendix” section of this paper.
2.
At the host, mount the replica volume you wish to access. If the volume is in a volume group managed
by a logical volume manager, import the volume group.
3.
If desired, run “fsck” (“chkdsk” on Windows) on the replica volumes. This is optional.
4.
Access the volumes and test as desired.
5.
When testing is completed, unmount the replica volumes from the host. If using a logical volume
manager, deport the volume groups. Then select Disable Image Access at the replica. The writes
made to the replica during testing will automatically be undone.
Event notification
Complete event logging is provided for all RecoverPoint management operations and status changes.
Events are stored within the RecoverPoint System Log, which is accessible from the management
application. Event logging includes auditing information such as commands, command errors, events, and
the System Log. RecoverPoint also supports the following types of event notification: e-mail,
SNMP, Syslogs, System Reports, and System Alerts.
General I/O and sizing
The transfer rate of a single RecoverPoint appliance (RPA) is approximately 60 MB per second.
The iostat utility can be used to measure the data change rate. This figure is relevant in determining
the number of RPAs needed to meet RTO and service level agreements. It is also useful in determining
the size of the journals, the retention time, the bandwidth, and the degree of compression.
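As a rough illustration of how a measured change rate feeds into sizing, the following Python sketch estimates an RPA count from the approximate 60 MB per second per-appliance figure, and a transfer time from a data volume and link rate. This is a back-of-the-envelope sketch under simplifying assumptions: it ignores compression, journal sizing, and headroom, the function names are hypothetical, and the two- to eight-node bounds are taken from the cluster description in the Appendix. Actual sizing should follow EMC guidance.

```python
import math

RPA_MB_PER_SEC = 60          # approximate per-appliance transfer rate
MIN_RPAS, MAX_RPAS = 2, 8    # cluster size range described in the Appendix


def rpas_needed(peak_change_mb_per_sec: float) -> int:
    """Estimate the RPAs per cluster needed to keep up with the peak change rate."""
    count = max(MIN_RPAS, math.ceil(peak_change_mb_per_sec / RPA_MB_PER_SEC))
    if count > MAX_RPAS:
        raise ValueError("change rate exceeds what a single cluster can carry")
    return count


def transfer_hours(changed_gb: float, link_mb_per_sec: float) -> float:
    """Estimate hours to transfer `changed_gb` at a sustained link rate."""
    return changed_gb * 1024 / link_mb_per_sec / 3600


# Example inputs (hypothetical): 150 MB/s peak change rate measured with iostat,
# and 360 GB of accumulated changes sent over a 100 MB/s link.
print(rpas_needed(150))            # 3
print(transfer_hours(360, 100))
```

The same transfer-time estimate applies to the resynchronization from the DR site to the production site during switchback, where the duration depends on the amount of changes and the bandwidth.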
Group sets should be configured for consistency groups that are dependent on one another or that must
work together as a single unit, that is, in a federated environment. A group set provides the capability to
apply parallel bookmarks at a user-defined frequency. In this implementation, the data is being replicated
from both the web server and application hosts. On the application hosts, the rate of change varies for
different volumes. The WebLogic persistent store must be replicated at a higher rate than the SOA and
WebLogic binaries and configuration data. To implement group sets, please refer to
the EMC RecoverPoint 3.1 Administrator’s Guide.
Conclusion
Enterprise Oracle deployments need protection from unforeseen disasters and natural calamities. Oracle
provides Data Guard as a technology to remotely replicate Oracle database files synchronously or
asynchronously to allow for recovery of the Oracle database.
However, protecting the database alone is not enough to protect the business itself. Full application level
recovery is required to bring the business back to a production-ready state. To that end, Oracle works with
partners to validate enterprise replication technologies to protect Fusion Middleware environments on
which Oracle enterprise business applications run.
This white paper summarizes the results of a joint EMC-Oracle engineering effort to utilize EMC
RecoverPoint for local and remote replication of Oracle Fusion Middleware and Oracle Database Server
information. It highlights key concepts and setup and administration examples of how to deliver
application-aware recovery to specific points in time and provide continuous data protection, while also
incorporating additional features such as bandwidth compression to reduce overall TCO.
References and resources
Oracle
OCFS2 A Cluster File System for Linux, User’s Guide for Release 1.4
http://oss.oracle.com/projects/ocfs2/dist/documentation/v1.4/ocfs2-1_4-usersguide.pdf
Oracle Cluster File System 2 (OCFS2) User’s Guide
OCFS2 – Frequently Asked Questions
http://oss.oracle.com/projects/ocfs2/documentation/v1.2/
Oracle WebLogic Server 9.2 documentation on the Oracle website
http://e-generation.beasys.com/wls/docs92/admin.html
SOA Architect Center page on the Oracle website
http://www.oracle.com/technology/tech/soa/index.html
Oracle Data Guard Concepts and Administration 10g Release 2 (10.2)
http://download.oracle.com/docs/cd/B19306_01/server.102/b14239/toc.htm
EMC
The following white papers can be found on EMC.com. For a selection of other EMC RecoverPoint white
papers, go to our resource library.
• Introduction to EMC RecoverPoint 3.1: New Features and Functions
http://www.emc.com/collateral/software/white-papers/h2781-emc-recoverpoint-3-new-features.pdf
• Using EMC RecoverPoint Concurrent Local and Remote for Operational Disaster Recovery
http://www.emc.com/collateral/software/white-papers/h4175-recoverpoint-concurrent-local-remote-operdisaster-recovery-wp.pdf
• EMC RecoverPoint Family Overview
http://www.emc.com/collateral/software/white-papers/h2346-recoverpoint-ov.pdf
The following are available on Powerlink, EMC’s password-protected website for customers and partners:
• Enhancing Oracle Database Recovery with EMC RecoverPoint—Applied Technology
• Disaster Recovery of Oracle Fusion Middleware with EMC RecoverPoint
• Replicating Oracle with EMC RecoverPoint Technical Notes
• EMC RecoverPoint Release 3.1 Administrator’s Guide
• EMC RecoverPoint Release 3.1 and Service Pack Releases Security Configuration Guide
• EMC RecoverPoint Release 3.1 Installation Guide
Appendix
Oracle DR terminology
This appendix defines the following Oracle disaster recovery terminology:
•
Application Server host name: This paper differentiates between the terms Application Server host
name and network host name.
The Application Server host name is the host name that Oracle Application Server uses for the host
when Oracle Application Server is configured on the host. During installation, the installer
automatically retrieves the Application Server host name from the current host and stores it in the
Oracle Application Server configuration metadata on disk. A host can have only one Application
Server host name.
See also network host name later in this section.
•
Asymmetric topology: A disaster recovery configuration that is different across tiers on the
production site and standby site. In an asymmetric topology, the standby site can use less hardware;
for example, the production site could include four hosts with four Application Server instances while
the standby site includes two hosts with four Application Server instances. Alternatively, the standby
site can use fewer Application Server instances; for example, the production site could include four
Application Server instances while the standby site includes two. Another asymmetric topology might
include a different configuration for a database (for example, using a Real Application Clusters
database at the production site and a single-instance database at the standby site).
•
Disaster recovery: The ability to safeguard against natural or unplanned outages at a production site
by having a recovery strategy for applications and data to a geographically separate standby site.
•
Network host name: A host name assigned to an IP address that is resolved through DNS resolution.
The network host name is the host name by which a particular host is known within the host's network.
A host can have the same network host name and Application Server host name. A host can have only
one Application Server host name, but it can have multiple network host names.
See also Application Server host name earlier in this section.
•
Production site setup: The process of creating the production site. To create the production site using
the procedure described in this paper, you must plan and create Application Server host names and
network host names, create mount points and links on the hosts to the Oracle home directories on the
shared storage where the Oracle Application Server instances will be installed, install the binaries and
instances, and deploy the applications.
•
Site failover: The process of making the current standby site the new production site after the
production site becomes unexpectedly unavailable (for example, due to a disaster at the production
site). This paper also uses the term "failover" to refer to a site failover.
•
Site switchover: The process of reversing the roles of the production site and standby site.
Switchovers are planned operations done for periodic validation or to perform planned maintenance on
the current production site. During a switchover, the current standby site becomes the new production
site, and the current production site becomes the new standby site. This paper also uses the term
"switchover" to refer to a site switchover.
•
Site synchronization: The process of applying changes made to the production site at the standby site.
For example, when a new application is deployed at the production site, you should perform
synchronization so that the same application will be deployed at the standby site, also.
•
Standby site setup: The process of creating the standby site. To create the standby site using the
procedure described in this paper, you must plan and create Application Server host names and
network host names, perform a switchover operation (which replicates the Oracle home directories and
installations from the production site shared storage to the standby site shared storage), and create
mount points and links to the Oracle home directories on the standby shared storage.
•
Symmetric topology: An Oracle Application Server Disaster Recovery configuration that is
completely identical across tiers on the production site and standby site. In a symmetric topology, the
production site and standby site have the identical number of hosts, load balancers, instances, and
applications. The same ports are used for both sites. The systems are configured identically and the
applications access the same data. This paper describes how to set up a symmetric Oracle Application
Server disaster recovery topology for an enterprise configuration.
•
Topology: The production site and standby site hardware and software components that comprise an
Oracle Application Server disaster recovery solution.
RecoverPoint terminology
•
Bookmarks: A bookmark is a named snapshot. The bookmark uniquely identifies an image.
Bookmarks can be set and named manually; they can also be created automatically by the system
either at regular intervals or in response to a system event. Bookmarked images are listed by name.
•
Consistency group: A consistency group is a logical grouping of replication volumes that must be
consistent across one another. The need for consistency across these volumes could be due to the
volumes being used by the same application or needing to have the data on the volumes at the same
point in time when recovered due to data dependencies.
A consistency group is also used to determine replication direction and policies on a set of replication
volumes. Each consistency group is an independent entity and can have different replication direction
and policies than other consistency groups. This allows for synchronous and asynchronous replication
as well as bi-directional replication to exist in the same environment.
Consistency groups are a technology that groups together various objects, either on a single system or
across systems, so that when they are moved or copied, they are treated as a group. Either all of the
data in the group is captured or none of it is; a partial copy is never produced.
•
Continuous data protection (CDP): Local replication across heterogeneous environments. At the
local site, the CDP engine captures every I/O into the local CDP journal with I/O bookmarking to
capture application events. CDP provides instantaneous or on-demand any-point-in-time recovery
regardless of the array type.
•
Continuous local and remote data protection (CLR): Simultaneous block-level local replication and
asynchronous block-level remote replication for LUNs, with one copy residing locally in the same
SAN and the second copy residing remotely in a different SAN. Locally, every write is journaled.
With remote replication, significant groups of writes are journaled, which improves bandwidth efficiency.
With CLR, users can recover independently from the local or remote site. Recovery of one
copy locally or remotely can occur without affecting the other copy. This ability to fail over to a local
copy of data without impacting a remote site extends RecoverPoint disaster recovery to encompass
local as well as regional events.
•
Continuous remote replication (CRR): Remote replication, with remote site recovery. It provides
heterogeneous replication with policy-based bandwidth reduction and efficiencies in asynchronous or
synchronous replication environments. CRR implements bi-directional, heterogeneous, block-level
replication across any distance using asynchronous, synchronous, and snapshot modes.
•
Journal: Provides time-stamped recovery points with application-consistent bookmarks. It also
correlates system-wide events (port failure, system error, and so on) with potential corruption events,
which is very useful when performing root-cause analysis. These application and system bookmarks
are automatic, but users can also enter their own bookmarks into the system.
The RecoverPoint journal, an important component of the protection process, provides the following
capabilities:
▪ Tracks all data changes to every protected LUN. It saves each write so that it represents an
any-point-in-time image of a protected LUN.
▪ Utilizes bookmarks for application-aware recovery.
▪ Maintains all of the application, user, and environmental bookmarks associated with specific
point-in-time images.
▪ Repository for live data updates. Maintains a reserved space, called the target-side processing
space, which is used to store changes to an image that has been recovered.
▪ Provisioned from existing SAN LUNs. Can be configured from any SAN-accessible LUN, or from
a collection of “concatenated” LUNs. The size of the journal can be dynamically increased by
adding a new LUN without discarding the existing history contents.
▪ Dynamically compressed, which saves storage. The data is stored in a compressed format that can
be used to roll a protected LUN back to any point in time.
•
RecoverPoint appliance (RPA): Based on a standard Dell 1U server running a customized Linux
kernel. The appliance has four 4 Gb/s Fibre Channel ports that are used to attach into a single- or
dual-node (A/B) fabric. Each RPA also has two Ethernet ports: one is used for the management-control
network and one is used to communicate with a remote RecoverPoint appliance cluster. It is designed for
high availability with redundant power and cooling.
The RecoverPoint application software provides core functionality and management for the system.
Appliances are deployed in a two- to eight-node cluster configuration that allows active-active failover
between the nodes.
The RecoverPoint software is designed to avoid the split-brain issues that can arise with traditional
clustering technologies. All RecoverPoint appliances are in constant communication and use a shared
private SAN volume to maintain metadata state. If a RecoverPoint appliance fails, one of the other
RecoverPoint appliances will take over without interrupting any in-progress CDP or CRR operations.
•
Replication volumes: Volumes with data to be replicated. If the source and target replication volumes
differ in size, then the source must be the smaller of the two volumes. This is typical in heterogeneous
storage environments as well as some environments where different versions of storage management
software are used. Any excess size will not be replicated and will be hidden from the host servers.
•
Replication set: The association created between the source volume and the local and/or remote target
volumes is called the replication set. A consistency group contains one or more replication sets.
•
Repository volume: This volume holds the configuration and marking information during the
replication. At least one repository volume is required per site and is accessible from all RPAs at the
site.
•
Snapshot: A snapshot is the difference between one consistent image of stored data and the next.
Snapshots are taken seconds apart. The application writes to storage; at the same time, the splitter
provides a second copy of the writes to the RecoverPoint appliance. In asynchronous replication, the
appliance gathers several writes into a single snapshot. The exact time for closing the snapshot is
determined dynamically depending on replication policies and the journal of the consistency group. In
synchronous replication, each write is a snapshot. When the snapshot is distributed to a replica, it is
stored in the journal volume, so that it is possible to revert to previous images by using the stored
snapshots.
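The difference between asynchronous and synchronous snapshot formation described under Snapshot can be shown with a small Python sketch. It is a deliberately simplified model: in RecoverPoint the point at which an asynchronous snapshot closes is determined dynamically by replication policies and the journal, whereas this sketch assumes a fixed, hypothetical limit on writes per snapshot. The build_snapshots function is illustrative only.

```python
def build_snapshots(writes, synchronous, max_writes_per_snapshot=3):
    """Group an ordered list of writes into snapshots.

    Synchronous replication: every write is its own snapshot.
    Asynchronous replication: several consecutive writes are gathered into
    one snapshot (here, a fixed-size batch; an assumption of this sketch).
    Write order is preserved in both cases.
    """
    if synchronous:
        return [[w] for w in writes]
    size = max_writes_per_snapshot
    return [writes[i:i + size] for i in range(0, len(writes), size)]


writes = ["w1", "w2", "w3", "w4", "w5"]
print(build_snapshots(writes, synchronous=True))
# [['w1'], ['w2'], ['w3'], ['w4'], ['w5']]
print(build_snapshots(writes, synchronous=False))
# [['w1', 'w2', 'w3'], ['w4', 'w5']]
```

In the asynchronous case, recovery is possible to the boundary of any stored snapshot, which is why the number of writes gathered per snapshot affects the granularity of the available recovery points.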
RecoverPoint write splitters
The function of the splitter is to mirror writes from the application server to LUNs being protected by
RecoverPoint. When a write is requested from the application server it is split and sent to the RecoverPoint
appliance in one of three ways.
•
The first method utilizes a host splitter/driver. This host splitter is a lightweight driver that resides in
the I/O stack, below any file system and volume manager, and just above any multipath driver (such as
EMC PowerPath). The splitter looks at the destination for the write packet. If the write is to a LUN
that RecoverPoint protects, the splitter will send a copy of the write packet to the RecoverPoint
appliance. It does this by rewriting the target address inside the packet to redirect it to the
RecoverPoint appliance’s pseudo-LUN, and reissuing the write down the stack.
•
The second method is through an intelligent fabric switch, either the Brocade Connectrix® AP-7600B
or Connectrix ED-48000B with the SAS APIs, or one of the Cisco Connectrix MDS-9000 series
switches with the SANTap API. The switch intercepts all writes to LUNs being protected by
RecoverPoint, and sends a copy of that write to the RecoverPoint appliance.
•
The third method is through a CLARiiON array-based splitter, which is supported on CLARiiON CX3
arrays with FLARE 26 patch code, and CLARiiON CX4 arrays with FLARE 28 patch code. The array
intercepts all writes to the LUNs being protected by RecoverPoint, and sends a copy to the
RecoverPoint appliance.
In all cases, the original write travels through its normal path to the production LUN.
When the RecoverPoint appliance receives the copy of the write, it returns an acknowledgment (ACK).
The splitter (host, CLARiiON, or fabric) receives this ACK and holds it until the ACK from the
production LUN also arrives. Once both ACKs have been received, an ACK is sent back to the host, and
I/O continues normally.
Once the appliance has acknowledged the write, it will move the data into the local journal volume, along
with a timestamp and any application-, event-, or user-generated bookmarks for the write. When the data is
safely in the journal, it is then distributed to the target replica volumes, with care taken to ensure that write
order is preserved during this distribution.
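The splitter behavior described above, copying writes for protected LUNs and acknowledging the host only after both the appliance and the production LUN have acknowledged, can be modeled in a few lines of Python. This is a conceptual sketch only, not the splitter driver or any RecoverPoint API; the SplitWrite class, the destinations function, and the LUN names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class SplitWrite:
    """Tracks the two acknowledgments a split write is waiting for."""
    lun: str
    rpa_ack: bool = False   # ACK from the RecoverPoint appliance
    lun_ack: bool = False   # ACK from the production LUN

    def host_ack_ready(self) -> bool:
        # The host is acknowledged only after BOTH ACKs have arrived.
        return self.rpa_ack and self.lun_ack


def destinations(lun: str, protected_luns: Set[str]) -> List[str]:
    """Where a write to `lun` is sent: always the production LUN, plus a
    copy to the appliance when RecoverPoint protects that LUN."""
    targets = ["production-lun"]
    if lun in protected_luns:
        targets.append("rpa")
    return targets


print(destinations("lun7", {"lun7"}))   # ['production-lun', 'rpa']
print(destinations("lun9", {"lun7"}))   # ['production-lun']

write = SplitWrite(lun="lun7")
write.rpa_ack = True
print(write.host_ack_ready())           # False: still waiting on the LUN
write.lun_ack = True
print(write.host_ack_ready())           # True: ACK can go back to the host
```

Holding the host ACK until both acknowledgments arrive is what ensures that every write the host considers complete has also reached the appliance.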
RecoverPoint image access modes
To test and verify that the replica at the disaster recovery site is a reliable and consistent image of the
production site, it is necessary to access the image, as referenced in the procedures in this paper.
Image access is required to restore production from the disaster recovery site, and to roll back to a previous
state of the data. It is also required to temporarily operate systems from a replicated copy while
maintenance work is carried out on the production site and to fail over to the replica. When image access is
enabled, host applications at the copy site can access the replica.
The following is a discussion of several of the available image access strategies that were pertinent in this
implementation, that is, Logged, Virtual, Virtual with Roll, and Disable Image access.
Virtual access (instant)
In Virtual access, the system creates the image selected in a separate virtual LUN within the RecoverPoint
appliance. Performance is constrained by the RecoverPoint appliance; however access to the point-in-time
image is nearly instantaneous. The image can be used in the same way as logged access (physical), but
again, all data changes are temporary and are stored in a special place on the local journal.
Generally, this type of image access is chosen because the user is not sure which image, or point in time,
is needed and must access several images to conduct forensics and determine which replica is required.
As stated above, the image is accessed through the RecoverPoint appliance, so virtual access is not
recommended for heavy workloads or production work.
You will not be able to recover the production site from a virtual image. By definition, the image is
temporary. Generally when work is completed, the choice is made to disable image access.
If it is determined the image should be maintained, then access must be changed to Logged access using
Roll To Image. This can be done on the fly using a pull-down menu in the GUI or through the
RecoverPoint command line interface. When you disable image access, the virtual LUN and all writes to it
are discarded.
Virtual access (instant) with Roll image in background
In Virtual access with Roll image in background, the system first creates the image in a virtual volume
managed by the RecoverPoint appliance. This provides very fast access to the image, the same as in
Virtual access. Simultaneously, in the background, the system rolls to the physical image. Once the system has
completed this action, the virtual volume is discarded, and the physical volume takes its place. At this
point, the system continues to function as if you had chosen Logged image access initially.
The virtual volume and the physical volume have the same SCSI ID. The virtual LUNs owned by the
RecoverPoint appliance look like physical target LUNs but were dynamically created by the RPA. If you
execute a SCSI inquiry against the virtual LUNs, the data returned will be the same as if the inquiry were
issued against the physical LUNs. Zoning and all other characteristics will appear to have the same
configuration. The virtual LUNs can be mounted to the disaster recovery server and are seen by the
operating system as physical LUNs.
The switch from virtual to physical will be transparent to the servers and applications. The user will not
see any difference in access. Once the switch occurs, I/O is served directly by the physical volume instead
of being handled by the RecoverPoint appliance.
If you disable image access, the writes to the volume while image access was enabled will be rolled back
(undone). Then distribution to storage will continue from the accessed image forward.
This type of access is recommended when the decision is made to move from the current production site to
the disaster recovery site (for example, after a catastrophe) and immediate access to the image is required.
If the intention is to roll back in time, but you need immediate access to images to determine which image is
valid, this is also the best option. This option is also viable for a heavy workload. Lastly, as mentioned,
production cannot be recovered from a virtual-only image; either this type of access or Logged access is
required.
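The guidance on choosing among the three image access types can be summarized as a small decision function. The following Python sketch is illustrative only: the function name, its parameters, and the enum are hypothetical and are not part of RecoverPoint, and the rules are a simplification of the recommendations in this paper.

```python
from enum import Enum


class AccessMode(Enum):
    VIRTUAL = "Virtual access (instant)"
    VIRTUAL_WITH_ROLL = "Virtual access (instant) with Roll image in background"
    LOGGED = "Logged access (physical)"


def choose_access_mode(need_immediate_access: bool,
                       must_recover_production: bool,
                       heavy_workload: bool) -> AccessMode:
    """Simplified reading of the recommendations (hypothetical helper)."""
    if not need_immediate_access:
        # A delay is acceptable, so roll the physical replica to the image.
        return AccessMode.LOGGED
    if must_recover_production or heavy_workload:
        # Instant access now, physical image once the background roll completes.
        return AccessMode.VIRTUAL_WITH_ROLL
    # Short-lived inspection of one or more images; all writes are temporary.
    return AccessMode.VIRTUAL


# Example: browsing images for forensics, light workload, nothing to be kept.
print(choose_access_mode(True, False, False).value)
```

In practice the choice also depends on journal capacity and on how far back the required image lies, which this sketch does not model.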
Logged access (physical)
In Logged access, the system rolls backward (or forward) to the snapshot (point in time) you select to
access. There will be a delay while the successive snapshots are applied to the replica image to create the
image you selected. The length of delay depends on how far the selected snapshot is from the snapshot
currently being distributed to storage.
Once the access is enabled, hosts will have direct access to the replica volumes, and the RPA will not have
access; that is, distribution of snapshots from the journal to storage will be paused.
When you disable image access, the writes to the volume while image access was enabled will be rolled
back (undone). Then distribution to storage will continue from the accessed snapshot forward.
Logged access is the preferred image access for production. When recovering production from the disaster
recovery site, this image should be enabled.
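The behavior of Logged access can be illustrated with a toy model. The Python sketch below is not RecoverPoint code: the class, its methods, and the representation of snapshots as dictionaries are assumptions made for illustration. It shows three points from the description above: the roll delay grows with the distance between the selected snapshot and the one currently distributed, distribution is paused while access is enabled, and host writes are undone when access is disabled.

```python
class ReplicaModel:
    """Toy model of a replica volume fed by a journal of snapshots."""

    _ABSENT = object()  # marks a block that did not exist before a host write

    def __init__(self, snapshots):
        self.snapshots = snapshots  # ordered list of {block: value} write sets
        self.distributed = 0        # number of snapshots applied to storage
        self.blocks = {}            # current replica contents
        self.access_log = None      # undo records; None means "No access"

    def distribute_next(self):
        """Apply the next journal snapshot to storage."""
        if self.access_log is not None:
            raise RuntimeError("distribution is paused during image access")
        if self.distributed < len(self.snapshots):
            self.blocks.update(self.snapshots[self.distributed])
            self.distributed += 1

    def enable_logged_access(self, target):
        """Roll the replica to snapshot count `target`; return steps rolled."""
        if self.access_log is not None:
            raise RuntimeError("image access is already enabled")
        steps = abs(target - self.distributed)  # proxy for the roll delay
        # For simplicity the image is rebuilt from the start of the journal.
        self.blocks = {}
        for snap in self.snapshots[:target]:
            self.blocks.update(snap)
        self.distributed = target
        self.access_log = []
        return steps

    def host_write(self, block, value):
        """Host write to the replica; the prior value is kept for undo."""
        if self.access_log is None:
            raise RuntimeError("replica is in No access state")
        self.access_log.append((block, self.blocks.get(block, self._ABSENT)))
        self.blocks[block] = value

    def disable_image_access(self):
        """Undo host writes in reverse order and return to No access."""
        if self.access_log is None:
            return
        for block, previous in reversed(self.access_log):
            if previous is self._ABSENT:
                del self.blocks[block]
            else:
                self.blocks[block] = previous
        self.access_log = None


replica = ReplicaModel([{"a": 1}, {"b": 2}, {"a": 3}])
for _ in range(3):
    replica.distribute_next()
steps = replica.enable_logged_access(1)  # roll back two snapshots
replica.host_write("a", 99)
replica.disable_image_access()           # the host write is undone
replica.distribute_next()                # distribution resumes from snapshot 1
```

After the final call, the model holds the contents of the first two snapshots, showing that distribution continues forward from the accessed snapshot.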
Disable Image Access
Choosing to disable image access means all changes to the replica will be discarded. It
does not matter what type of access was initiated (logged or another type), or whether the image
chosen was the latest or an image back in time.
When the splitter is disabled, the LUN will be masked off. The operating system will still see the LUNs as
mounted, but if you try to access data or rescan the disks using a SCSI command, the servers will
report errors. Disabling the image is like yanking out the LUNs from underneath the application.
Applications usually report errors if data is still cached or is waiting to be flushed by the
operating system.
The cleanest way to ensure there will be no errors on the disaster recovery site is to first shut down the
applications, then unmount the file systems, and finally disable the image. Disabling image access restores the
storage state to No access. Changes to the replica recorded in the image access log are automatically
undone, so that the replica is restored to the state it was in before it was accessed.
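The ordering of these steps matters, and a script that automates them should stop as soon as one step fails so that the image is never disabled while file systems are still mounted. The following Python sketch shows that ordering only. The three callables are hypothetical placeholders supplied by the administrator; the sketch does not invoke any RecoverPoint, Oracle, or operating system command.

```python
def disable_image_cleanly(stop_applications, unmount_file_systems,
                          disable_image_access):
    """Run the three steps in order; stop at the first failure.

    Each argument is a caller-supplied function (hypothetical placeholder)
    that raises an exception if its step fails.
    Returns (completed_step_names, error_message_or_None).
    """
    steps = [
        ("stop applications", stop_applications),
        ("unmount file systems", unmount_file_systems),
        ("disable image access", disable_image_access),
    ]
    completed = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            # Later steps are skipped, so the LUNs are not pulled out from
            # under applications or mounted file systems.
            return completed, f"{name} failed: {exc}"
        completed.append(name)
    return completed, None


# Example with stand-in actions that only record the order of the calls.
calls = []
done, error = disable_image_cleanly(
    lambda: calls.append("apps stopped"),
    lambda: calls.append("file systems unmounted"),
    lambda: calls.append("image access disabled"),
)
```

If the unmount step raises an exception, the function returns before the disable step is attempted, leaving the image accessible while the problem is investigated.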
Using Disable Image Access effectively says the work done at the disaster recovery site is no longer
needed. Possible reasons are that the point-in-time image chosen was not the correct image, or that the
information sought was obtained and propagated by another means.