Session id: 40180
Proven Techniques for Maximizing Availability
Maximum Availability Architecture
Lawrence To, Shari Yamaguchi
High Availability Systems Group, Systems Technologies, Oracle Corporation

Agenda
– Achieving High Availability
– Maximum Availability Architecture (MAA)
– Solutions to Real World Questions
– Real MAA Deployments
– MAA in 10g
– Future MAA
– Q&A

Achieving High Availability
– Prevent outages before they occur.
– Tolerate outages, planned or unplanned, so they are transparent to the business.
– Recover quickly if an outage does occur.

Causes of Downtime
– Unplanned Downtime: Computer Failures, Data Failures
  Examples: Human Error, Corruption, Storage Failure, Site Failure
– Planned Downtime: System Changes, Data Changes
  Examples: System Maintenance, Software Maintenance, Application Changes
High Availability is …

Maximum Availability Architecture
– Best Oracle High Availability Architecture
– Best Practices
  – Building the configuration.
  – Managing the configuration.
  – Recovering from outages quickly.
  – Restoring full fault tolerance.
– Continual Testing
– Evolves with new Oracle versions and features

Maximum Availability Architecture
What to Use:
– High Availability Blueprint for Database, Oracle Application Server, Enterprise Manager, and more.
How to Build, Manage, and Recover:
– Following configuration and operational best practices.
– Understanding outages and detailed recovery solutions.
– Restoring fault tolerance after an outage.
Unbreakable Architecture + Best Practices = Maximum Availability

Maximum Availability Architecture
[Diagram: Oracle Application Server at each site, WAN Traffic Manager, Dedicated Network, RAC at the Primary Site, Data Guard, RAC at the Secondary Site]

MAA Was Created Based on…
Real world customer requests and questions:
– What issues should we consider for choosing the most optimal high availability architecture?
– What is Oracle’s best high availability architecture?
– How can we manage this high availability environment?
– What are the performance trade-offs?
– How do we repair from various outages?

Examples of Issues That Have Been Addressed
– What is the best solution to avoid service disruption for host and instance failures?
– Which Disaster Recovery solution should we adopt?
– What is the best way to configure the standby database over a network?
– How do you configure Oracle Application Server for high availability?

Best Solution to Avoid Service Disruption: Real Application Clusters
Fast Failover
– Protection from local site system failures
– Faster than cold cluster failover solution
– Fast-start fault recovery (instance failure MTTR)
Availability and Accessibility
– Allows for scheduled outages
– Add and remove nodes transparently
– Transparent Application Failover (TAF) provides uninterrupted service

Best Solution to Avoid Service Disruption: Real Application Clusters
Higher Scalability
– All system resources from all nodes are leveraged
– Cache fusion eliminates need to partition data or modify the application – fully application transparent
– Connection load balancing distributes connection requests from application tier
Manageability
– Provides a single image of the database to manage

Fast Instance Recovery
Performance stays constant as recovery gets faster.
[Chart: writes/sec and tps (axis 0 to 900) by fast_start_mttr_target setting: disabled, 300, 180, 90]

Which Disaster Recovery Option?
• Storage or Remote Mirroring, Geo-Clusters
  • Vulnerable to human error and data failures.
  • Latency.
• Streams and Replication
  • Ideal for active-active configurations that may involve heterogeneous environments.
  • Offers finer granularity on what gets replicated and when.
• Data Guard
  • Provides comprehensive data protection, data availability, and data recovery benefits, along with an integrated management framework.

Data Guard Architecture
[Diagram: Primary Database (Transactions, LGWR, Online Redo Logs, ARCH, Archived Redo Logs) connected over Oracle Net to a Physical/Logical Standby Database (RFS, Standby Redo Logs, MRP/LSP, ARCH, Archived Redo Logs)]

Choosing: Physical or Logical Standby
Questions and Recommendations
1. Do you require strict zero data loss?
   Yes – use a physical standby database.
   No – go to the next question.
2. Do you have any unsupported logical standby data types? Run this query:
   SELECT DISTINCT OWNER, TABLE_NAME FROM DBA_LOGSTDBY_UNSUPPORTED ORDER BY OWNER, TABLE_NAME;
   Rows returned – use a physical standby or investigate switching to a supported data type.
   No rows returned – go to the next question.
3. Do you need to have the standby database open for read and/or write access?
   Yes – evaluate a logical standby database.
   No – evaluate a physical standby database.

Configuring Standby Over the Network
Performance Case Examples
– Primary database in Tokyo and standby database in Kyoto (229 miles and 7ms RTT) in Maximum Protection mode ensured no data loss, even in the face of a disaster, with minimal performance impact (2-3%).
– Primary database in San Francisco and standby database in New York (2582 miles and 78ms RTT) in Maximum Performance mode had only seconds of data loss, with minimal performance impact (1%).
Best Practices are Key
– Assess bandwidth and latency
– Pick the appropriate transport mechanism and protection mode: ARCH, LGWR SYNC or LGWR ASYNC
– Set TCP Socket Buffer Sizes = Bandwidth x Round Trip Latency
– Set SDU = 32K
– Evaluate SSH port forwarding with compression

Fast Redo Apply
Redo apply outperforms high production redo rates.
[Chart: Production Redo Rate vs. Standby Redo Apply Rate, in MB/sec (axis 0 to 14), by Transaction Profile: High OLTP, Batch Load]

Fast SQL Apply
SQL Apply can manage high transaction rates.
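
As a worked illustration of the socket buffer rule from "Best Practices are Key" (TCP socket buffer size = bandwidth x round trip latency), here is a minimal Python sketch. The round-trip times come from the two case examples; the 100 Mbit/s link speed is an assumption made purely for illustration, since the slides do not state the bandwidth that was used, and the function name is hypothetical.

```python
def socket_buffer_bytes(bandwidth_bits_per_sec: float, rtt_ms: float) -> int:
    """Return bandwidth x round-trip latency (the bandwidth-delay product) in bytes."""
    bytes_per_sec = bandwidth_bits_per_sec / 8   # convert bits/s to bytes/s
    rtt_sec = rtt_ms / 1000                      # convert milliseconds to seconds
    return round(bytes_per_sec * rtt_sec)


if __name__ == "__main__":
    # ASSUMPTION: a 100 Mbit/s link; the slides give RTT only, not bandwidth.
    assumed_bandwidth = 100_000_000

    # RTT values taken from the performance case examples.
    cases = {
        "Tokyo to Kyoto (7 ms RTT)": 7,
        "San Francisco to New York (78 ms RTT)": 78,
    }
    for name, rtt_ms in cases.items():
        size = socket_buffer_bytes(assumed_bandwidth, rtt_ms)
        print(f"{name}: {size:,} bytes")
```

Under the assumed 100 Mbit/s link, the rule gives 87,500 bytes for the 7 ms path and 975,000 bytes for the 78 ms path, which shows why the longer-distance configuration needs a much larger buffer than typical operating system defaults. The actual values for a deployment depend on its measured bandwidth and latency.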
[Chart for Fast SQL Apply: TPS (axis 0 to 300) by Consistency Model: Full, Read Only, None]

Oracle Application Server 10g High Availability
Middle Tier
– Oracle Application Server OC4J and Web Cache clustering
– Redundant mid-tier servers front ended by a load balancer
Infrastructure
– Active Clusters which incorporates Real Application Clusters
– Cold Failover Clusters

Oracle Application Server 10g HA Middle Tier
[Diagram: Clients, Load Balancer, Web Cache, Application Server Tier, OC4J Clusters, Database Tier]

Oracle Application Server 10g Active Clusters Infrastructure

MAA in 10g
Continuing to Test and Validate Oracle Database and Application Server 10g
– Flashback capabilities, RAC, Data Guard with Real Time Apply
– Rolling upgrades and scheduled maintenance enhancements
– Incorporating best practices into the core 10g products
– Best practices formalized into Oracle Database and Application Server 10g documentation
– MAA White Paper updates

Future MAA
Incorporating E-Business Suite
Incorporating Collaboration Suite
Continuing to work with:
– Internal Deployments
– Outsourcing Deployments
– Consultants
– Partners
– External Customers

MAA Test Lab
[Diagram: the MAA configuration (Oracle Application Server at each site, WAN Traffic Manager, Dedicated Network, RAC at the Primary Site, Data Guard, RAC at the Secondary Site) labeled with Sun Microsystems, Hewlett-Packard, EMC, F5 Networks, Shunra]

MAA Information Sources
Oracle Technology Network
– http://otn.oracle.com/deploy/availability/htdocs/maa.htm
  Maximum Availability Architecture
  Oracle9i Media Recovery Best Practices
  Oracle9i Data Guard: SQL Apply Best Practices
  Oracle9i Data Guard Role Management Best Practices
  Oracle9i Data Guard Primary Site and Network Configuration Best Practices
  Oracle9iAS Cluster configuration
Oracle Consulting – Advanced Technologies Solutions (ATS) Group
– http://otn.oracle.com/consulting/9iServices

Next Steps
High Availability Sessions from Oracle
Tuesday in Moscone Room 304 and Wednesday in Moscone Room 304
11:00 AM, 8:30 AM
  How Oracle Database 10g Revolutionizes Availability and Enables the Grid
  Oracle Database 10g - RMAN and ATA Storage in Action
11:00 AM, 3:30 PM
  Oracle Recovery Manager (RMAN) 10g: Reloaded
  Oracle Data Guard: Maximum Data Protection at Minimum Cost
1:00 PM, 5:00 PM
  Proven Techniques for Maximizing Availability
  Oracle Database 10g Time Navigation: Human-Error Correction
4:30 PM
  Data Guard SQL Apply: Back to the Future
For More Info On Oracle HA Go To http://otn.oracle.com/deploy/availability/

Next Steps
High Availability Sessions from Oracle
Thursday
– 8:30 AM in Moscone Room 304: Oracle Database 10g Data Warehouse Backup and Recovery: Automatic, Simple, Reliable
– 8:30 AM in Moscone Room 104: Building RAC Clusters over InfiniBand
Database HA Demos All Four Days In The Oracle Demo Campground
– Real Application Clusters
– Data Guard
– Database Backup & Recovery
– Flashback Recovery
– LogMiner, Online Redefinition, and Cross Platform Transportable Tablespaces

Reminder – please complete the OracleWorld online session survey. Thank you.

Questions and Answers