Unbreakable Architecture

Session id: 40180
Proven Techniques for
Maximizing Availability
Maximum Availability Architecture
Lawrence To, Shari Yamaguchi
High Availability Systems Group
Systems Technologies
Oracle Corporation
Agenda
• Achieving High Availability
• Maximum Availability Architecture (MAA)
• Solutions to Real World Questions
• Real MAA Deployments
• MAA in 10g
• Future MAA
• Q&A
Achieving High Availability
• Prevent outages before they occur.
• Tolerate outages, planned or unplanned, so they are transparent to the business.
• Recover quickly if an outage does occur.
Causes of Downtime
• Unplanned Downtime
– Computer Failures
– Data Failures: Human Error, Corruption, Storage Failure, Site Failure
• Planned Downtime
– System Changes, Data Changes: System Maintenance, Software Maintenance, Application Changes
High Availability is …
Maximum Availability Architecture
• Best Oracle High Availability Architecture
• Best Practices
– Building the configuration.
– Managing the configuration.
– Recovering from outages quickly.
– Restoring full fault tolerance.
• Continual Testing
• Evolves with new Oracle versions and features
Maximum Availability Architecture
• What to Use:
– High Availability Blueprint for Database, Oracle Application Server, Enterprise Manager, and more.
• How to Build, Manage, and Recover:
– Following configuration and operational best practices.
– Understanding outages and detailed recovery solutions.
– Restoring fault tolerance after an outage.
Unbreakable Architecture + Best Practices = Maximum Availability
Maximum Availability Architecture
[Figure: a WAN Traffic Manager in front of a Primary Site and a Secondary Site. Each site runs Oracle Application Server and a RAC database; Data Guard links the two sites over a Dedicated Network.]
MAA Was Created Based on…
• Real world customer requests and questions:
– What issues should we consider when choosing the optimal high availability architecture?
– What is Oracle’s best high availability architecture?
– How can we manage this high availability environment?
– What are the performance trade-offs?
– How do we recover from various outages?
Examples of Issues That Have Been Addressed
• What is the best solution to avoid service disruption for host and instance failures?
• Which Disaster Recovery solution should we adopt?
• What is the best way to configure the standby database over a network?
• How do you configure Oracle Application Server for high availability?
Best Solution to Avoid Service Disruption
Real Application Clusters
• Fast Failover
– Protection from local site system failures
– Faster than cold cluster failover solution
– Fast-start fault recovery (instance failure MTTR)
• Availability and Accessibility
– Allows for scheduled outages: add and remove nodes transparently
– Transparent Application Failover (TAF) provides uninterrupted service
Best Solution to Avoid Service Disruption
Real Application Clusters
• Higher Scalability
– All system resources from all nodes are leveraged
– Cache fusion eliminates need to partition data or modify the application – fully application transparent
– Connection load balancing distributes connection requests from application tier
• Manageability
– Provides a single image of the database to manage
Fast Instance Recovery
Performance stays constant as recovery gets faster.
[Chart: writes/sec and tps (axis 0 to 900) by fast_start_mttr_target setting: disabled, 300, 180, 90]
Which Disaster Recovery Option?
• Storage or Remote Mirroring, Geo-Clusters
– Vulnerable to human error and data failures.
– Latency.
• Streams and Replication
– Ideal for active-active configurations that may involve heterogeneous environments.
– Offers finer granularity on what gets replicated and when.
• Data Guard
– Provides comprehensive data protection, data availability, and data recovery benefits, along with an integrated management framework.
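Data Guard Architecture
[Figure: on the Primary Database, Transactions are written by LGWR to the Online Redo Logs, and ARCH copies them to Archived Redo Logs. Redo travels over Oracle Net to RFS on the Physical/Logical Standby Database, which writes the Standby Redo Logs; ARCH archives them to Archived Redo Logs and MRP/LSP applies the redo.]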
Choosing: Physical or Logical Standby
Questions and Recommendations
1. Do you require strict zero data loss?
– Yes: use a physical standby database.
– No: go to the next question.
2. Do you have any unsupported logical standby data types? Run this query:
SELECT DISTINCT OWNER, TABLE_NAME FROM DBA_LOGSTDBY_UNSUPPORTED
ORDER BY OWNER, TABLE_NAME;
– Rows returned: use a physical standby or investigate switching to a supported data type.
– No rows returned: go to the next question.
3. Do you need to have the standby database open for read and/or write access?
– Yes: evaluate a logical standby database.
– No: evaluate a physical standby database.
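The three questions above can be sketched as a small decision function. This is only an illustration of the slide's flow, not an Oracle tool: the names `Standby` and `recommend_standby` are hypothetical, and `unsupported_rows` is assumed to be the row count returned by the DBA_LOGSTDBY_UNSUPPORTED query.

```python
from enum import Enum


class Standby(Enum):
    """Hypothetical labels for the two recommendations on the slide."""
    PHYSICAL = "physical standby"
    LOGICAL = "logical standby"


def recommend_standby(zero_data_loss: bool,
                      unsupported_rows: int,
                      needs_open_access: bool) -> Standby:
    """Walk the three questions in order and return the first match."""
    # Question 1: strict zero data loss points to a physical standby.
    if zero_data_loss:
        return Standby.PHYSICAL
    # Question 2: any rows from DBA_LOGSTDBY_UNSUPPORTED point to a
    # physical standby (the slide's alternative, switching to a supported
    # data type, is a manual step this sketch does not model).
    if unsupported_rows > 0:
        return Standby.PHYSICAL
    # Question 3: a standby that must be open for read and/or write access
    # points to evaluating a logical standby, otherwise a physical one.
    return Standby.LOGICAL if needs_open_access else Standby.PHYSICAL
```

For example, `recommend_standby(False, 0, True)` returns `Standby.LOGICAL`, while any "Yes" to the first question returns `Standby.PHYSICAL` regardless of the other answers.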
Configuring Standby Over the Network
• Performance Case Examples
– Primary database in Tokyo and standby database in Kyoto (229 miles and 7ms RTT) in Maximum Protection mode ensures no data loss even in the face of a disaster, with minimal performance impact (2-3%).
– Primary database in San Francisco and standby database in New York (2582 miles and 78ms RTT) in Maximum Performance mode had only seconds of data loss, with minimal performance impact (1%).
• Best Practices are Key
– Assess bandwidth and latency
– Pick the appropriate transport mechanism and protection mode: ARCH, LGWR SYNC or LGWR ASYNC
– Set TCP Socket Buffer Sizes = Bandwidth x Round Trip Latency
– Set SDU = 32K
– Evaluate SSH port forwarding with compression
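The socket buffer rule above (Bandwidth x Round Trip Latency) is simple arithmetic, sketched here for the two RTT values quoted in the case examples. The function name `socket_buffer_bytes` is hypothetical, and the 100 Mbit/s link speed is an assumption chosen for illustration; the slides do not state the bandwidth used in either test.

```python
def socket_buffer_bytes(bandwidth_bits_per_sec: int, rtt_ms: int) -> int:
    """Return bandwidth x round trip latency, expressed in bytes."""
    # Bits per second times RTT in milliseconds gives bits in flight x 1000;
    # divide by 8 (bits per byte) and 1000 (ms per second) in one step.
    return bandwidth_bits_per_sec * rtt_ms // (8 * 1000)


# Assumed link speed (not from the slides): 100 Mbit/s.
ASSUMED_LINK_BPS = 100_000_000

tokyo_kyoto = socket_buffer_bytes(ASSUMED_LINK_BPS, 7)     # 7 ms RTT
sf_new_york = socket_buffer_bytes(ASSUMED_LINK_BPS, 78)    # 78 ms RTT
print(tokyo_kyoto, sf_new_york)  # prints: 87500 975000
```

Under that assumed bandwidth, the longer link needs a buffer roughly eleven times larger, which is why the slide asks for bandwidth and latency to be assessed before choosing a transport.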
Fast Redo Apply
Redo apply outperforms high production redo rates.
[Chart: Production Redo Rate and Standby Redo Apply Rate in MB/sec (axis 0 to 14) by Transaction Profile: High OLTP, Batch Load]
Fast SQL Apply
SQL Apply can manage high transaction rates.
[Chart: TPS (axis 0 to 300) by Consistency Model: Full, Read Only, None]
Oracle Application Server 10g High Availability
• Middle Tier
– Oracle Application Server OC4J and Web Cache clustering
– Redundant mid-tier servers front-ended by a load balancer
• Infrastructure
– Active Clusters, which incorporate Real Application Clusters
– Cold Failover Clusters
Oracle Application Server 10g HA Middle Tier
[Figure: Clients connect through a Load Balancer to Web Cache, then to the Application Server Tier (OC4J Clusters), then to the Database Tier.]
Oracle Application Server 10g
Active Clusters Infrastructure
MAA in 10g
• Continuing to Test and Validate Oracle Database and Application Server 10g
– Flashback capabilities, RAC, Data Guard with Real Time Apply
– Rolling upgrades and scheduled maintenance enhancements
– Incorporating best practices into the core 10g products
– Best practices formalized into Oracle Database and Application Server 10g documentation
– MAA White Paper updates
Future MAA
• Incorporating E-Business Suite
• Incorporating Collaboration Suite
• Continuing to work with:
– Internal Deployments
– Outsourcing Deployments
– Consultants
– Partners
– External Customers
MAA Test Lab
[Figure: the MAA configuration (Oracle Application Server, WAN Traffic Manager, Dedicated Network, RAC at the Primary Site and Secondary Site linked by Data Guard), labeled with Sun Microsystems, Hewlett-Packard, EMC, F5 Networks, and Shunra.]
MAA Information Sources
• Oracle Technology Network
– http://otn.oracle.com/deploy/availability/htdocs/maa.htm
– Maximum Availability Architecture
– Oracle9i Media Recovery Best Practices
– Oracle9i Data Guard: SQL Apply Best Practices
– Oracle9i Data Guard Role Management Best Practices
– Oracle9i Data Guard Primary Site and Network Configuration Best Practices
– Oracle9iAS Cluster configuration
• Oracle Consulting – Advanced Technologies Solutions (ATS) Group
– http://otn.oracle.com/consulting/9iServices
Next Steps
High Availability Sessions from Oracle
Tuesday in Moscone Room 304
Wednesday in Moscone Room 304
11:00 AM
8:30 AM
How Oracle Database 10g Revolutionizes Availability and Enables the Grid
Oracle Database 10g - RMAN and ATA Storage in Action
11:00 AM
3:30 PM
Oracle Recovery Manager (RMAN) 10g: Reloaded
Oracle Data Guard: Maximum Data Protection at Minimum Cost
1:00 PM
5:00 PM
Proven Techniques for Maximizing Availability
Oracle Database 10g Time Navigation: Human-Error Correction
4:30 PM
Data Guard SQL Apply: Back to the Future
For More Info On Oracle HA Go To http://otn.oracle.com/deploy/availability/
Next Steps
High Availability Sessions from Oracle
Thursday
• 8:30 AM in Moscone Room 304: Oracle Database 10g Data Warehouse Backup and Recovery: Automatic, Simple, Reliable
• 8:30 AM in Moscone Room 104: Building RAC Clusters over InfiniBand
Database HA Demos All Four Days In The Oracle Demo Campground
• Real Application Clusters
• Data Guard
• Database Backup & Recovery
• Flashback Recovery
• LogMiner, Online Redefinition, and Cross Platform Transportable Tablespaces
Reminder – please complete the OracleWorld online session survey.
Thank you.
QUESTIONS
ANSWERS