Recommendation - OneStart

Assessment Results
IU Insight Discovery Workshop
Version 1.5
October 2006

IU and Oracle Confidential

Insight
Document Title: IU Insight Deliverable v1.5.doc
Revision date: 3/8/2016
July 2006

Oracle USA, Inc.
World Headquarters
500 Oracle Parkway
Redwood Shores, CA 94065
U.S.A.

Worldwide Inquiries:
Phone: +1 650.506.7000
Fax: +1 650.506.7200
www.oracle.com

Copyright © 2006, Oracle. All rights reserved.

This document is provided for information purposes only and the contents hereof are subject to change without notice. This document is not warranted error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without our prior written permission.

Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.

Contents

Executive Commitment and Partnership
Executive Overview
    Insight Approach and Scope
    About Indiana University
    Summary of Findings
    Summary of Recommendations
Recommendations
    High Availability, and Grid Based Architectures
        The DBA Infrastructure Instance
        The Oncourse CL Instance
    Improve planned outage windows
    Best Practices
        Disaster Recovery
        Monitoring and Maintenance
        Service Level Agreements
    Decision Support System
Business Analysis
List of Appendices

Executive Commitment and Partnership

Your success is how we define our success. We look forward to continually working with you to move from your current state to your desired state and value realization.

Through the Insight initiative, Oracle Executive Management is prepared to work in close partnership with IU to support this strategic effort by jointly defining the architecture, sharing best practices and reviewing ongoing project efforts.

On behalf of the entire Oracle Insight team, we thank you for the opportunity to work with you on your technology roadmap. We look forward to continued success.

Respectfully,
Jim Zemaitis
Regional Vice President
Oracle Corporation

Executive Overview

Insight Approach and Scope

The Oracle North American Strategic Accounts organization appreciates Indiana University’s (IU) participation in Oracle’s Insight program. The Insight program is designed to help Oracle’s most important clients realize increased value for their investment in Oracle. At IU’s request, a team of Oracle solution architects conducted a discovery workshop focused on infrastructure optimization.
This included a review of the current infrastructure, examination of the potential to support a flexible grid architecture, recommendations relating to administration and manageability best practices, and a supporting business case/ROI for infrastructure consolidation.

The discovery sessions were a dialogue during which both sides learned. At the end of the discovery sessions, the Oracle team provided and validated their initial analysis and recommendations. The first draft of this document will be reviewed by both teams together. The University’s comments will be incorporated into the final deliverable. As a result, this solution document is a collaborative effort between Oracle and IU, and we thank you for the opportunity.

Our goal is to give you access to the best thinking and experience from Oracle and our vast customer base to assist you in meeting your business goals. As part of the deliverable, we provide a CD with all documents, white papers, and information referenced during this Insight.

About Indiana University

Indiana University has eight campuses: the original campus in Bloomington, which is a residential campus; an urban campus in Indianapolis, which also includes the IU Medical Center; and six regional campuses in the Indiana cities of Gary, South Bend, Fort Wayne, Kokomo, Richmond, and New Albany.

IU has:
• More than 92,000 students on its eight campuses
• 922 degree programs
• Almost 475,000 living alumni, including 230,000 working in Indiana
• An annual operating budget of $2.2 billion
• 16,000 employees, including faculty and professional and support staff
• More than 150 research centers and institutes
• An endowment of more than $1 billion

Indiana University is internationally known for the quality of its academic programs and attracts students from all over the world.
At the same time, IU plays a key role in the economic and social well-being of Indiana residents, offering educational, cultural, and economic benefits to the state. Indiana University is a leader in fostering the multidisciplinary research essential to solving challenges of life and health. It is also a leader in forming the partnerships with business, industry, government, and other academic institutions that lead to important research and development and economic growth.

Summary of Findings

Indiana University has done an excellent job of “doing more with less” in an environment where state funding is declining and the demands of a 24 x 7 user community are increasing.

Indiana University is a “university in transition”. The trustees have begun a process of re-thinking the organization structure that has been in place for 30 years. The President of the University will be leaving in less than two years. The university is reassessing its priorities to face the changing demands of the educational landscape.

Several key factors in this reassessment, and initiatives communicated to the Insight team, include:
• An increase in on-line learning
• Growth in research
• Participation in the open-source community
• Reduction in state funding
• All IT activities at IU must create significant value
• Improving physical facilities
• Finding better ways to keep up with technology

The Insight team found that the U.I.T.S. infrastructure and U.I.T.S. support teams at IU displayed the following characteristics:
• Maximizing resources - doing more with less
• Good systems administration skills and knowledge, from both a hardware and an operating system perspective
• Leveraging of the IBM relationship for attractive pricing and support
• Good marketing of IU as a world-class research institution (9th largest supercomputer in the world dedicated to research)
• U.I.T.S. demonstrated that it has a clear, prioritized list of projects
• Departments are coordinated and work well together
• U.I.T.S. is respected and consulted by other groups
• Good DBA practices and skills, for example maintaining the discipline to support only two versions of the Oracle database for production systems

The Insight team also learned that there are several challenges facing the university:
• Do more with less
• Challenges relating to support of a 24 x 7 environment
• Reducing planned downtime
• Collaboration in the open-source community
  - Site failover issues
  - Staff support
  - Open-source development teams are located across time zones
  - No acceptable downtime
  - In addition to production, there is a requirement for both the development environment and the test environment to be available 24 x 7
• Need rolling upgrades
• Removing single points of failure
• Defining and scoping Service Level Agreements (SLAs)
• Develop a better working relationship with Oracle
• Data growth and data movement
• Challenges relating to identity management and security

It is with these initiatives and challenges in mind that the Insight team developed the recommendations described in the next section.

Summary of Recommendations

Implement high availability (HA). The Insight team found that there is a need for some database systems at IU to be highly available and to leverage clustering technology. This HA technology will provide:
• 24 x 7 availability
• Shortened maintenance windows
• More productive management
• Better, more accurate system monitoring
• Better utilization of hardware
• Capacity on demand

The Insight team recommends that U.I.T.S. first implement HA in the internal-facing DBA Infrastructure Instance of the Oracle database. This is a good system on which to learn the clustering technology. After the DBA Infrastructure Instance is made ready for HA, the Insight team recommends deployment of Oncourse CL on an HA database infrastructure.
There is a business requirement for Oncourse CL to be available 24 x 7, and therefore this is a good candidate for clustering. In this document, we have also included several recommendations and best practices relating to reducing planned maintenance windows, as well as several management and monitoring best practices.

The Insight team recommends an investigation to improve the decision support system (DSS). The large volume of data that is rebuilt every day, and the large amounts of data that flow from the OLTP systems to staging areas to the DSS, present an opportunity for process and architectural improvement. The Insight team acknowledges that the IU team is aware of this challenge, and we look forward to working together to craft a solution that brings significant value to IU. Technical details of this recommendation, and of all the recommendations made by the Insight team, can be found in the Recommendations section of this document.

After presenting the detailed recommendations in this document, we provide a Business Analysis section to highlight the costs and benefits associated with the detailed recommendations.

Because the scope of this engagement was limited to infrastructure, we did not go into detail during our discovery around the DSS processes and the reporting systems that rely on the DSS. Perhaps another Insight, focused on Business Intelligence, would provide additional value. For a Business Intelligence Insight, we would assemble a different Insight team, with deep knowledge of data warehousing and business intelligence. Additionally, an Insight focused on the Roadmap to Fusion, or on Security, may be of interest to IU going forward.

Recommendations

High Availability, and Grid Based Architectures

Oracle introduced Real Application Clusters (RAC) with the release of Oracle 9i. Oracle RAC allows multiple database nodes to access a single shared database and transact simultaneously.
This technology eliminates the database as a single point of failure. RAC is an important part of an overall strategy to reduce unplanned outages.

Higher service levels

Grid computing reduces the time and effort necessary to fail over, due to its active/active architecture. If a database node fails for any reason (CPU board failure, etc.), the remaining nodes continue to serve database requests. The system is still available. Users who were connected through the failed node are reconnected to another server within a few seconds, and users who were connected to other servers continue their work without any interruption, as shown in the illustration. This approach allows for continuous processing without interrupting users and eliminates downtime due to manual node failover.

In addition to providing high availability, Oracle RAC typically provides:
• Better scalability – Oracle RAC allows the database to scale horizontally. As the environment reaches its capacity limits, additional nodes can be added on demand, increasing capacity.
• Better utilization of hardware - Nodes can be shared amongst clusters in the overall environment, allowing unused resource cycles to be better utilized. For example, a system that runs nightly batch cycles can borrow capacity from a typical OLTP system that needs it only during the day.
• Reduced cost – Because of its scalability features, Oracle RAC can leverage cheaper commodity-type hardware, typically reducing the TCO of the environment.

For more information regarding costs, please see the Business Analysis section of this document.

For more information on grid computing and RAC, please see: http://www.oracle.com/technologies/grid/index.html or the appendices on the companion CD.
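
The failover behavior described in this section can be illustrated with a small model. The following Python sketch is a toy simulation only, written under stated assumptions: the `Cluster` class, its methods, and the node and user names are all hypothetical and do not correspond to any Oracle API. It shows the property the text relies on: when a node fails, only the sessions on that node are moved, and sessions on surviving nodes are untouched.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model of an active/active cluster (hypothetical, not Oracle code)."""
    nodes: set
    sessions: dict = field(default_factory=dict)  # user -> node name

    def connect(self, user):
        # Place the user on the surviving node with the fewest sessions.
        if not self.nodes:
            raise RuntimeError("no nodes available")
        load = {node: 0 for node in self.nodes}
        for node in self.sessions.values():
            if node in load:
                load[node] += 1
        target = min(sorted(load), key=load.get)
        self.sessions[user] = target
        return target

    def fail(self, node):
        # Drop the failed node, then reconnect only the users it was serving.
        self.nodes.discard(node)
        displaced = sorted(u for u, n in self.sessions.items() if n == node)
        for user in displaced:
            del self.sessions[user]
        for user in displaced:
            self.connect(user)
        return displaced


cluster = Cluster(nodes={"node1", "node2", "node3"})
for user in ["ann", "bob", "cy", "dee"]:
    cluster.connect(user)

before = dict(cluster.sessions)
moved = cluster.fail("node1")
```

In this model, `moved` lists the users who had to reconnect, and every other entry in `cluster.sessions` is identical to its value in `before`. A real deployment would depend on client-side failover configuration, which this sketch does not attempt to represent.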
The DBA Infrastructure Instance

The DBA Infrastructure instance, also known as the OEM Instance (Oracle Enterprise Manager), is the first system the Insight team recommends moving toward a high-availability architecture. The reason for this choice is that the DBA Infrastructure Instance is an internal-facing application, without a user community, and will provide an excellent environment in which to learn the nuances of Oracle cluster technology.

Key Findings

In a later section, we discuss the potential for improving the DSS architecture at U.I.T.S. Improving the architecture of the DSS at U.I.T.S. would allow the DBA staff to better leverage the features of Oracle RMAN, such as integral backups, central view of a backup catalog, flash backups and recoveries, improved backup performance, etc. Currently, backups of the DSS environment take too long to leverage the features of RMAN.

Definition - RMAN (Recovery Manager): An Oracle tool that allows you to perform physical database backups in a more controlled manner. With RMAN, the backups and recoveries are managed for you through the RMAN toolset as well as through a GUI using Enterprise Manager. Syntax is simplified and the scripting is powerful and consistent across platforms. This is the toolset that Oracle has been investing in and moving toward since Oracle 8i.

Currently, the backup and recovery is connected to an IBM Tivoli Storage Manager (TSM) environment. When the IU staff begins to use RMAN as their backup standard, this RMAN environment will require a higher level of availability. Currently, in this environment, the database is a single point of failure. The production OEM environment is currently on the 4-CPU, 8 GB LPAR esdb13 with an rperf rating of 8.69.

Issues:
• If RMAN were used as the U.I.T.S. backup and recovery standard, the RMAN database would need to be architected for high availability. This is because RMAN contains the information necessary to recover from an unplanned outage.
• There may also be licensing issues for the media manager integration.

Recommendation: Use Oracle RMAN for Backups

The Insight team recommends that U.I.T.S. use Oracle RMAN as its mechanism for backup and recovery. Oracle RMAN maintains an online catalog of all backups, allowing a view of the University’s backup history. New features in Oracle RMAN, most notably incremental backups, can significantly improve the backup and recovery time.

Benefits:
• Allows a catalog view of backup and recovery history
• Significantly improves backup and recovery time by leveraging incremental backups

Recommendation: Architect the DBA Infrastructure Instance for HA

Clustering the DBA Infrastructure instance would allow the RMAN database to be highly available. With the release of Oracle 9i, Oracle introduced the concept of Real Application Clusters. Oracle RAC allows multiple database nodes to connect simultaneously to the same shared storage and present a single database. Therefore, when one of the nodes fails, the database continues to process uninterrupted.

Besides high availability, Oracle RAC provides other benefits. Environments clustered with RAC can scale horizontally; as the environment nears its capacity, nodes can be added to the cluster. Because of the horizontal scalability, you can leverage commodity hardware for the architecture, often at a significant cost savings. Oracle RAC allows for better utilization of hardware as well; nodes can be shared between clusters as determined by the workload.

With the inception of Oracle 10g RAC, Oracle introduced two key new features:
• Oracle Cluster Ready Services (CRS)
• Oracle Automatic Storage Management (ASM)

Prior to Oracle 10g, third-party vendor clusterware was required to run Oracle RAC. With CRS and ASM, this is no longer a necessity. CRS maintains a repository of all clusters within the University.
This allows for ease in adding and removing nodes, sharing nodes between clusters, etc. ASM is a purpose-built volume manager for Oracle database files.

For more information on CRS, please see Oracle Tech Net: Clustering at http://www.oracle.com/technology/products/database/clustering/index.html or see the appendices on the companion CD.

For more information on ASM, please see Oracle Tech Net: ASM at http://www.oracle.com/technology/products/database/asm/index.html or see the appendices on the companion CD.

For the U.I.T.S. case, the esdb13 environment could be broken into two separate 2-CPU, 8 GB nodes. These nodes would connect to the same OEM Production instance, but be clustered together using Oracle RAC.

Figure 1 Two-Node Cluster

The nodes could either be LPARs, as they are today, or be implemented using commodity-type hardware. Further details of the economic impact behind this decision are included in the Business Analysis section of this document.

Benefits:
• Database is no longer a single point of failure
• Provides for horizontal scalability of the database
• Better utilization of hardware
• Potential home for other critical administration repositories, such as those required for Oracle’s Automatic Storage Management (ASM) and Grid Control

The Oncourse CL Instance

Key Findings

Using the DBA Infrastructure instance as the foundation for highly available systems at IU, this reference architecture can be applied to all systems requiring high availability. During the Insight discussions, the Oncourse CL system was targeted as one of the next systems that would require the highest levels of availability. The Oncourse CL system will likely have to support a larger number of users than it currently supports, so scalability will be a requirement. Additionally, due to the 24 x 7 nature of Oncourse CL, a highly available system will also be a requirement.
The current Oncourse CL production system resides on the 6-CPU, 8 GB LPAR esdb06. This LPAR is on an IBM p570 with 16 processors, 64 GB of memory and an rperf rating of 68.40.

Recommendation - Architect Oncourse CL for HA

To configure Oncourse CL for high availability, a three-node configuration with 2 CPUs and 8 GB per node could be utilized, as shown below:

Figure 2 Three-Node Cluster

Besides high availability, the Oncourse CL system is the perfect case study for horizontal scalability. As this application is new, its performance characteristics are not yet well defined. Oracle RAC would allow the architecture to grow in lock-step with Oncourse CL user adoption.

Benefits:
• Provide the Oncourse CL application on a highly available database
• Allow the Oncourse CL database resources to scale horizontally
• Allow the dynamic allocation of new system resources while the application is still accessible to users
• Clustering the Oncourse CL database allows for online maintenance and rolling upgrades

Improve planned outage windows

Key Findings

Reducing the duration of planned outage windows was identified as a key area of contention for the IU project teams. Outages for the PeopleSoft SIS can currently last hours. Many of these outages occur during the day in the middle of the week. IU works on many collaborative research projects where users may be worldwide, further complicating planning for downtime. The DBA teams understandably prefer to perform maintenance during the working day whenever possible, to avoid late night and weekend work.

Issues:
• Outages are occurring during the day and impacting development schedules
• Planning for downtime in off hours means IT staff must work on nights and weekends

Recommendation – Perform Rolling Upgrades

One of the prominent causes of system downtime has traditionally been hardware and software upgrades.
As customers rely more and more on the IT infrastructure to power their “always on” business, the concept of the “offline maintenance window” is gradually becoming a thing of the past. Today, customers need IT vendors to provide capabilities to perform such routine maintenance tasks without causing any business interruption. As the technology leader, Oracle Database 10g contains the most comprehensive set of features that enable customers to achieve this objective. We will now present an overview of those features that allow customers to perform software upgrades with little or no downtime.

Let’s examine the various types of software management operations typically performed in an Oracle environment:
• One-off patches
• Critical Patch Updates (CPUs)
• Patchsets
• Release and Version Upgrades
• Operating System Updates
• Application Updates

Each of these scenarios is described in detail in the following sections.

One-off patches

One-off patches are generally issued in response to critical bugs encountered by customers that need to be fixed immediately and cannot wait until the release of the next patchset. The decision regarding when and whether to release a one-off patch is made by Oracle Worldwide Support and the Defects and Diagnostic Resolution (DDR) group within Server Technologies, based on a set of pre-defined criteria, such as:
• The bug must have a significant impact on the customer’s ability to conduct normal business
• The bug leads to issues such as database hangs, crashes, or data corruption and must be fixed immediately

Each one-off patch is identified by a patch number (e.g. 3574504). Installing one-off patches does not change the version of the installed Oracle software. For example, even after applying patch number 3574504 on top of version 10.1.0.3 of the Oracle database software, the version of the updated software will continue to be 10.1.0.3.
However, a list of the installed one-off fixes is maintained in the Oracle install inventory and can be queried if desired.

Installing one-off patches has traditionally required the database to be shut down. However, starting with version 9.2.0.2, certain one-off patches can be applied across Real Application Clusters (RAC) instances in a “rolling” fashion, provided each instance uses a local installation of the Oracle software (i.e. the instances do not share an Oracle Home). The word “rolling” signifies that the patch can be installed on each instance, one at a time, without requiring other instances to go down. This allows the database to be accessible during the patch application process. However, it is important to note that the instance on which the patch is being applied must be brought down.

Another important point to remember is that not all one-off patches today are rolling updateable. Patches that modify common data structures, inter-instance messages, or database metadata (e.g. views, stored procedures, or any other on-disk structure) cannot be applied in a rolling fashion. Consequently, patches that can be applied in a rolling fashion are clearly labeled that way (on MetaLink and in the readme.txt file). Customers can also use the OPATCH utility to determine whether a given one-off patch is rolling updateable. All Oracle Clusterware patches are rolling updateable.

For further details on rolling patch updates, please refer to Database Rolling Updates with Real Application Clusters at http://www.oracle.com/technology/deploy/availability/pdf/Rolling_Patch_Update_Data_Sheet.pdf or see the appendices on the companion CD. Alternatively, see the Oracle documentation set: Oracle Database High Availability Architecture and Best Practices at http://st-doc.us.oracle.com/10/101/server.101/b10726/recover.htm#i1006430 or see the appendices on the companion CD.
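
The rolling procedure described above can be sketched as a simple loop. The Python below is an illustrative model only, not an Oracle tool: the `rolling_patch` function and the dictionary layout are hypothetical, and the "patch" step is a stand-in for the real patch application. It shows the defining property of a rolling update: only the instance being patched is down at any moment, so the database stays reachable through the others.

```python
def rolling_patch(instances, patch_id):
    """Apply a patch one instance at a time, keeping the others up.

    `instances` maps an instance name to {"up": bool, "patches": set}.
    Returns the lowest number of instances that were up at any step.
    (Hypothetical model; real patching is done with Oracle's own tools.)
    """
    if len(instances) < 2:
        raise ValueError("a rolling patch needs at least two instances")

    min_up = len(instances)
    for inst in instances.values():
        inst["up"] = False                  # bring down only this instance
        up_now = sum(1 for other in instances.values() if other["up"])
        min_up = min(min_up, up_now)
        inst["patches"].add(patch_id)       # stand-in for applying the patch
        inst["up"] = True                   # restart before moving on
    return min_up


rac = {name: {"up": True, "patches": set()}
       for name in ("inst1", "inst2", "inst3")}
lowest = rolling_patch(rac, "3574504")
```

With three instances, `lowest` is 2: two instances remain available throughout. The patch number is the example used in the text; whether a given real patch may be applied this way is determined by its labeling, as described above.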
Critical Patch Updates

In order to provide customers with scheduled, periodic software updates that address security and other critical issues, Oracle began releasing quarterly Critical Patch Updates (CPUs) in January 2005. CPUs contain cumulative fixes for a number of critical issues and, just like one-off patches, they do not alter the version of the installed Oracle software. For more information on the Critical Patch Updates program, please refer to MetaLink note 290738.1 at https://metalink.oracle.com/metalink/plsql/showdoc?db=Not&id=290738.1 or see the appendices on the companion CD.

Critical Patch Updates may or may not be rolling updateable, and the release/pre-installation note for each CPU will indicate this. For example, both Alert 68 and the CPU released in April this year were rolling updateable. In addition, the rolling update tests are often still in progress when a CPU is first released. As such, an updated version is later released once the tests are completed. The July 2005 CPU, which was initially released on July 12th, will soon be updated with the results of rolling update tests.

Patchsets

Patchsets and release upgrades are considered relatively major software upgrades. This type of software update typically requires changes in the database metadata, and it does cause the version of the installed Oracle software to change. For example, applying the 10.1.0.4 patchset on top of version 10.1.0.2 will update the version number of the updated software to 10.1.0.4.

Release and Version Upgrades

Generally speaking, release and patchset updates to the database software cannot be applied across RAC instances in a rolling fashion. The only exception to this rule is Oracle Clusterware, which can be upgraded to a new patchset or release in a rolling manner (e.g. Oracle Clusterware version 10.1 can be upgraded to 10.2 one machine at a time).
However, Oracle Database 10g provides another technology – Data Guard – that can reduce the downtime required to perform patchset and release upgrades to a few minutes, even when a rolling upgrade across RAC instances is not possible. Using Data Guard SQL Apply, customers can create a logical standby database that can subsequently be upgraded to the new version without impacting the current production or primary database. After the logical standby upgrade is completed, a Data Guard switchover operation may be executed to make this upgraded standby database the new primary database, and applications and users can be rerouted to this database, making the old primary database available for upgrade without causing any application outage. This feature is only available in Oracle Database 10g Patchset 1 onwards – i.e. it can only be used to upgrade from 10.1.0.3 or higher versions.

For more information on performing rolling upgrades using a logical standby database, please refer to:
• Technical White Paper: Oracle Database 10g Release 2 High Availability at http://www.oracle.com/technology/deploy/availability/pdf/TWP_HA_10gR2_HA_Overview.pdf or see the appendices on the companion CD
• Documentation: Oracle Database High Availability Architecture and Best Practices at http://st-doc.us.oracle.com/10/101/server.101/b10726/recover.htm#i1006387 or see the appendices on the companion CD
• MetaLink note 300479.1: Rolling Upgrades with Logical Standby at http://metalink.oracle.com/metalink/plsql/docs/rollup_10_1_0_4.pdf or see the appendices on the companion CD

Note that this mode of upgrade does require another database (i.e. the logical standby database) to be created, which can be either RAC or non-RAC. Also, while performing such upgrades on a RAC database, all instances must be taken offline. In other words, rolling patchset and release upgrades cannot be performed across RAC instances in the manner in which rolling patch updates are done, i.e. one instance at a time. But as stated above, they can be done using the Data Guard SQL Apply feature, which requires creation of a logical standby database. This is an important distinction to understand and explain to customers in order to set the right expectations.

Operating System Updates

Operating system updates may or may not require the machine to be taken out of service. Generally speaking, most major OS upgrades, such as going from Red Hat AS 2.3 to 3.0, do require the machine to be taken down, while some patches may be applied online. In either case, OS vendors will indicate whether the update can be applied online. Oracle Clusterware and Oracle Real Application Clusters support rolling upgrades of the OS when the version of the Oracle Database is certified on both releases of the OS. Alternatively, Data Guard standby databases (physical standby database or logical standby database) can always be used to perform rolling OS upgrades, using a strategy similar to that used for rolling database patchset and release upgrades.

Best Practices

During our discovery sessions, the Insight team heard from IU that they would like information regarding best practices relating to infrastructure management, specifically best practices relating to Disaster Recovery, Service Level Agreements (SLAs), and Maintenance and Monitoring. What follows is a discussion of those areas. When necessary, we reference appendices and white papers.

Disaster Recovery

During the Indiana University Insight discovery sessions, the team discussed the Disaster Recovery (DR) initiative underway and Oracle’s approach to protecting business-critical databases and applications from system failures, user errors, administration errors and data corruptions that might bring a production database down. Oracle Data Guard optimizes the primary-to-secondary replication and is the recommended approach for IU to achieve their disaster recovery goals.
Changes to the primary database are immediately replicated to the failover database, minimizing data loss. Oracle Data Guard is also a supported product with user interfaces, tool kits and monitors that prove invaluable to administrators. For more information on Data Guard, please see http://www.oracle.com/technology/deploy/availability/htdocs/DataGuardOverview.html or see the appendices on the companion CD.

Monitoring and Maintenance

Oracle Enterprise Manager 10g Grid Control is uniquely positioned to streamline the monitoring and administration of the IU infrastructure. All of the requirements highlighted during the Insight sessions are included in the Grid Control product, as well as several other features that will also prove valuable. IU already holds licenses for this product and some of its packs (Diagnostics and Tuning). Some of the features described may require additional pack licenses.

Oracle Enterprise Manager 10g Grid Control provides:

• System monitoring and tuning
• Service Level Management
• Inventory Monitoring
• Patch Management and Deployment (Database and O/S)
• Provisioning and Deployment (Database and O/S)
• Policy compliance

Organizing and automating these tasks with a tool can greatly ease the workload of the IU staff and allow them to spend their time on more productive projects. For more information on Oracle Enterprise Manager 10g Grid Control, please see http://www.oracle.com/technology/products/oem/index.html or see the appendices on the companion CD.

Service Level Agreements

For more information on SLA best practices, please see the appendices.

Decision Support System

Key Findings

Indiana’s Decision Support System (DSS) is rebuilt on a daily basis. Historical data for the rebuild is maintained in the feed systems, i.e., the OLTP systems from which the source data is generated. For this reason, there is little opportunity with the current architecture to archive information in the DSS system.
The Insight team does realize that in certain circumstances maintaining historical data in the OLTP systems may be desirable for reasons such as reporting, data retention policies and data availability. However, maintaining this volume of detailed historical data should be a choice, not a requirement. Because the total data volume continues to grow without approaching a steady state, backup windows eventually cannot be sustained. The growth also degrades the performance of the OLTP systems.

Copies of data are being generated to regenerate the DSS system, which creates a multi-step process. This process produces a growing amount of overhead and potential performance issues as the source data continues to grow. Additionally, it represents a physical cost, as systems and disk allocations are required to support the infrastructure.

With today’s technologies, it is possible to isolate and capture change data from source systems and apply it in a single step to the DSS target without introducing any other levels of complexity. The Insight team recommends that Indiana University consider alternative techniques in the construction and maintenance of their DSS systems, so that the capacity of their systems and environment is not exceeded.

Since Indiana has a working DSS system already, migrating to an improved technical position will take significantly less effort than building one from scratch. First, Indiana already has a well-defined set of source data. Second, Indiana has a well-defined operational DSS system currently in use. In essence, it is known where the data is being loaded from, where it is going, how frequently it is loaded and which transformations are necessary to achieve a usable state.
It should be possible to adopt newer technologies for the ETL process and to implement new internal data structures within the DSS without having to modify the existing reporting and analysis facilities.

Issues:

• DSS is rebuilt daily.
• Source OLTP systems maintain large amounts of historical data.
• The bulk of data affects backup windows and has performance implications.
• Multiple copies of the same data are being maintained.

Additionally, U.I.T.S. would like to investigate the potential of adopting a more holistic view of their business intelligence environment, as the U.I.T.S. team feels there are several areas where reporting can be improved.

Recommendation – Investigate Improving the DSS System

Implementing three concepts will go a long way towards the re-architecture of the DSS. They are:

• Altering the change-data capture on the source systems
• Streamlining the ETL process
• Implementing partitioning in the DSS schemas

These concepts are discussed in further detail in the sections below.

The proposed solution eliminates the need for using ‘flash copy’ and reduces the amount of data necessary to keep the DSS target updated on a daily basis. This would free up major portions of disk space that could be provisioned elsewhere. The new technologies make the OLTP and DSS systems more efficient, accurate and stable. IT will be able to archive older information from the OLTP systems, making them significantly more efficient. In addition, the IT staff will no longer have to do huge rebuilds of the DSS system. Because of the major reduction in data overhead, backups of the systems become significantly less stressful and less schedule constrained. The amount of data manipulation is significantly reduced, backup and recovery performance is significantly improved and the maintenance of all data is more granular.
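To make the contrast between a daily full rebuild and a change-data approach concrete, the following is a minimal sketch in Python. It is an illustration only and does not use Oracle Streams or any Oracle API: the dictionary-based "tables", the `(operation, key, value)` change-record shape, and the function names `full_rebuild` and `apply_changes` are all assumptions made for this example.

```python
# Hypothetical illustration: tables are modeled as dicts, and the change-record
# shape (operation, key, value) is an assumption, not an Oracle Streams format.

def full_rebuild(source):
    """Copy every source row into a fresh target; return (target, rows_moved)."""
    target = dict(source)
    return target, len(source)


def apply_changes(target, changes):
    """Apply only the captured changes to an existing target.

    Each change is (operation, key, value), where operation is one of
    "insert", "update" or "delete". Returns the number of rows moved.
    """
    for operation, key, value in changes:
        if operation == "delete":
            target.pop(key, None)
        else:
            # Inserts and updates both set the current value for the key.
            target[key] = value
    return len(changes)


if __name__ == "__main__":
    source = {i: f"row-{i}" for i in range(1000)}
    target, moved_by_rebuild = full_rebuild(source)

    # One day of activity on the source: three changes out of 1,000 rows.
    changes = [
        ("update", 5, "row-5-v2"),
        ("insert", 1000, "row-1000"),
        ("delete", 7, None),
    ]
    for operation, key, value in changes:
        if operation == "delete":
            del source[key]
        else:
            source[key] = value

    moved_by_delta = apply_changes(target, changes)
    print("rows moved by full rebuild:", moved_by_rebuild)
    print("rows moved by change apply:", moved_by_delta)
    print("target matches source:", target == source)
```

In this toy model the target ends up identical to the source in both cases; the difference is that the change-based path moves only the rows that actually changed.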
As an added benefit, the partitioning scheme provides a component level of high availability as a by-product of the partitioning implementation. Failed process resumption at point of failure is a built-in feature of the components of the proposed solution. This gives the IT department the ability to fix the underlying problem and resume processing where it left off. Every part of the proposed process includes components focused on reducing the management overhead of the existing system and significantly improving the performance and reliability of the process.

Alter the Change-Data Capture on Source Systems

The first component is to implement change-data capture on the source systems. Rather than sending copies of the OLTP data to the DSS, only data that has changed needs to be sent.

Benefits:

• Eliminate the data copies, significantly reducing the amount of data being transferred
• Transfer each piece of data to the DSS system only once
• Allow archival of the source OLTP data, significantly reducing the backup processing and improving the overall performance of the OLTP systems
• Conserve large amounts of disk space
• Allow migration of data to DSS in a single step

Streamline ETL Operations

Next, employ Oracle Streams technology to provide the ETL (Extract, Transformation, Loading) operations necessary to load the data into the DSS target. This technology provides a schedulable system that can pick up the change data from the source systems, transform data while it is being moved (at either the source system or the target system) and queue updates to the target DSS system so that a load can be restarted from the point of failure should a problem arise. An additional, optional component is Oracle Warehouse Builder (OWB), which would provide a sustainable, graphically oriented data mapping system that can be used to generate the scope of how the Oracle Streams process would be invoked.
The core OWB functionality is now a feature of the Oracle Database and is available to IU with no new licensing costs. The Oracle Streams process would use Advanced Queuing on the target DSS system to stage data to be loaded into the DSS tables.

Benefits:

• Schedulable system that can pick up the change data from the source systems. This allows IU to determine the refresh rate applied from the source OLTP systems to the target DSS systems.
• Provides transformations in the process of movement. This would allow IU to merge and match data from disparate systems and provide data consistency in the resulting DSS system. Data transformations are easily modified in a central facility, thus streamlining the data movement process.
• Can be restarted from point of failure. As the number of data sources increases and the volume of ETL data increases, restart at point of failure becomes increasingly important to provide data feeds in a timely and accurate fashion.
• Allows for the retirement of the ODS environments, allowing IU to free up disk and CPU resources for reallocation elsewhere.

Partition the DSS

In order to provide performance, maintenance and high availability benefits, Indiana should be using the partitioning capabilities of the database in the target DSS. IU is already licensed for partitioning, so this would present no new licensing costs. Partitioning can be used effectively by the Oracle Cost Based Optimizer (CBO) to significantly improve update and query performance. Because maintenance can be much more granular and isolated to smaller portions of the total data, maintenance time can be significantly reduced. An added advantage that comes at no extra cost with partitioning is that, when a partition becomes inaccessible for any reason, all the rest of the surviving data is still available to users while the inaccessible partition is restored.
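The partition-level isolation described above can be pictured with a small model. The sketch below is a hypothetical Python illustration, not Oracle code: the `PartitionedTable` class, its (year, month) partition key and the `offline` set are assumptions made for this example. It shows a query reading only the partition it needs, and continuing to work while a different partition is unavailable.

```python
class PartitionedTable:
    """Toy table partitioned by (year, month). Hypothetical model only."""

    def __init__(self):
        self.partitions = {}   # (year, month) -> list of rows
        self.offline = set()   # partitions currently being restored

    def insert(self, year, month, row):
        """Route the row to the partition for its month."""
        self.partitions.setdefault((year, month), []).append(row)

    def query_month(self, year, month):
        """Return the rows for one month, touching only that partition."""
        key = (year, month)
        if key in self.offline:
            raise RuntimeError(f"partition {key} is unavailable")
        return list(self.partitions.get(key, []))

    def total_rows(self):
        """Count rows across all partitions."""
        return sum(len(rows) for rows in self.partitions.values())


if __name__ == "__main__":
    table = PartitionedTable()
    for month in range(1, 13):
        for i in range(100):
            table.insert(2006, month, (month, i))

    june = table.query_month(2006, 6)
    print("rows read for June:", len(june), "of", table.total_rows())

    # Simulate one partition being restored; other months stay queryable.
    table.offline.add((2006, 3))
    print("June still readable:", len(table.query_month(2006, 6)) == 100)
    try:
        table.query_month(2006, 3)
    except RuntimeError as err:
        print("March query failed:", err)
```

A real database makes the routing decision in the optimizer rather than in application code; the model only illustrates why a query confined to one partition reads less data and is unaffected by the state of the others.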
Lastly, partitioning allows the creation of local indexes that work in conjunction with the Cost Based Optimizer to perform partition elimination during system usage, significantly improving performance by reducing result set sizes and system memory utilization.

Benefits:

• Improved performance – the CBO can leverage partitioning to significantly improve update and query performance.
• Easier maintenance – maintenance is much more granular and isolated to smaller portions of the total data.
• Higher levels of availability – when a partition becomes inaccessible for any reason, all the rest of the surviving data is still available to users.
• Use of local indexes – partition elimination can significantly improve performance by reducing result set sizes and system memory utilization.

Business Analysis

Like many institutions of higher education, Indiana University is learning how to cope and grow in an era of increasingly tight public funding, where funding for new information technology initiatives requires reducing the cost of existing products and services, and where demonstrating real value is the best currency with which to compete for increasingly scarce dollars. At the same time, IU is a strong contributor to the Open Source community, with support for projects like Kuali and Oncourse CL. The boundaries of the institution are expanding; IU is becoming a global supplier of educational services. Along with that expansion of boundaries come the requirements of a global services organization: the need to support 24x7 operations, with their ever-expanding requirements for storage and computing power, and ever-shrinking windows of downtime within which to perform system maintenance.

In an effort to help Indiana University grapple with the growing importance of its IT infrastructure, the Oracle Insight program reviewed the Decision Support System (DSS), the RMAN system and Oncourse CL.
Unlike many other organizations assisted by the Insight Program, Indiana University has negotiated extraordinarily aggressive pricing for its AIX server infrastructure, resulting in a core infrastructure cost that is highly optimized. Oracle confirmed this by evaluating multiple technology options, including several UNIX options as well as a commodity server infrastructure. However, as noted below and elsewhere in this document, while the University’s technology choices have resulted in a low-cost infrastructure, they do not necessarily result in a highly available database environment. In response, Oracle has focused in part on recommendations that optimize high availability, without significant focus on optimizing infrastructure cost.

High availability for Oracle databases is achieved through Real Application Clusters (RAC), which allows multiple database nodes or servers to connect simultaneously to a single shared storage and act as a single database. In a RAC environment, the failure of a node does not affect the ability of the database to continue uninterrupted processing. Environments clustered with RAC are also more readily scalable, as additional nodes can be added to the cluster. RAC can also lead to better utilization of the server infrastructure by combining multiple individual servers with low utilization into a pool of shared server resources.

The value proposition of Oracle RAC begins with the elimination of single points of failure. In a 24x7 global environment with interdependent systems, the outage of any critical system can ripple through the infrastructure. An outage in one system can impact the performance of other systems, creating transaction backlogs that threaten windows of application availability. Ultimately, these ripples can increase the risk to the integrity of the system.

Manageability is also an important feature of RAC.
In traditional infrastructures, applications and databases must be brought down when the underlying server infrastructure needs to be patched and maintained. With RAC, individual servers in a RAC cluster can be taken out of production for maintenance and repair without impacting the database or the application. In some instances, it is even possible to do rolling patching and upgrades of the database itself, further improving uptime and reducing dependence on increasingly scarce maintenance windows.

While implementation of RAC usually results in a significant reduction in the overall cost of computing, the University has so successfully optimized its server costs that there is no meaningful difference between the cost of an AIX infrastructure and that of a commodity server infrastructure utilizing RAC. This means that deploying RAC across AIX servers in order to disaggregate the application layer from the server infrastructure and achieve true high availability and manageability comes at a cost. The issue, therefore, is the value of high availability.

Oracle Recovery Manager (RMAN) High Availability

Oracle Recovery Manager (RMAN) is the Oracle-preferred method for efficiently backing up and recovering the Oracle database. Ideally, the RMAN environment should be engineered for high availability; unavailability of the backup system is an unnecessary risk to data integrity and to the ability of IT to manage backup and recovery windows.

While backups can be managed from single-server infrastructures, the failure of those servers – or, in the case of AIX LPARs, the failure of that LPAR – can mean failure of the scheduled backups and an increase in risk to the viability of the University’s applications infrastructure. Some applications are of sufficiently low value that restoring from days-old backups is not a significant issue.
For other applications, the loss of data may require the restoration of lost transactions or recovery of changed data through a lengthy and painstaking process which may take days, during which the application is unavailable.

Oncourse CL High Availability

Oncourse CL is a highly visible, important system that requires nearly ubiquitous availability. While this system can be managed within an AIX framework, it would profit from the availability, resiliency, manageability and horizontal scalability of a RAC solution.

While a RAC solution represents an increase in costs over a straight AIX solution using LPARs, the reality is that LPARs alone cannot provide database high availability across multiple servers – meaning that a database installed without RAC is hostage to the server on which it is installed. By utilizing RAC across LPARs on multiple servers, Indiana University can achieve true high availability and gain the ability to manage the server infrastructure without impact to the database layer.

The issue, of course, is one of cost and value. Given that deploying RAC will represent a cost above an AIX-only solution, the University must determine whether the high availability and manageability features of RAC justify the expense. The question turns on the risk of an outage, the impact of an outage, and the degree to which the University wants Oncourse CL to be an always-available system.

Decision Support & Data Management

Indiana University’s Decision Support System (DSS) is rebuilt on a daily basis.
Rearchitecting the DSS system and re-evaluating how data is managed and stored in other key systems represents a significant opportunity to reduce the amount of data transferred between and stored by various systems by:

• Reducing copies of data
• Transferring each piece of data to the DSS system only once
• Significantly reducing the backup requirement
• Reducing the amount of data stored across systems

The value of these changes can be understood from both a process management and a financial perspective. Eliminating the daily rebuilds of the DSS, for example, would reduce the amount of time each day during which the system is unavailable, meaning that queries and other processes that currently wait for the daily rebuilds would be free to run on a less restricted schedule.

The volume of data being replicated and rebuilt daily also drives cost. The data warehouse contains approximately 2 TB of data, which is rebuilt daily. The PeopleSoft system, with 1 TB of allocated storage, is replicated at least 6 times to support backup, development and operational data stores. Reducing the amount of duplicative data stored by half would free up 4 TB of storage, worth $80,000 (given a cost of $20,000 per TB for protected, usable storage). Further, the reduction in data stored and manipulated on a daily basis will significantly improve backup and recovery performance, eliminating the risk that the DSS, for example, cannot be backed up in the available time window, while ensuring that the DSS is positioned to support a global, 24x7 educational institution.
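The storage figure quoted above can be reproduced with a few lines of arithmetic. The sketch below is a hedged reading of the numbers, not a costing model from the Insight team: it assumes the duplicative data consists of the 2 TB warehouse plus six 1 TB PeopleSoft copies (8 TB in total), and the constant and function names are hypothetical.

```python
# Figures taken from the text; treating them as one 8 TB pool of duplicative
# data is an assumption made for this illustration.
WAREHOUSE_TB = 2           # data warehouse rebuilt daily
PEOPLESOFT_TB = 1          # allocated storage per PeopleSoft copy
PEOPLESOFT_COPIES = 6      # "replicated at least 6 times"
COST_PER_TB_USD = 20_000   # protected, usable storage


def storage_savings(reduction_fraction):
    """Return (total_tb, freed_tb, value_usd) for a given reduction fraction."""
    total_tb = WAREHOUSE_TB + PEOPLESOFT_TB * PEOPLESOFT_COPIES
    freed_tb = total_tb * reduction_fraction
    return total_tb, freed_tb, freed_tb * COST_PER_TB_USD


if __name__ == "__main__":
    total_tb, freed_tb, value_usd = storage_savings(0.5)
    print(f"duplicative data: {total_tb} TB")
    print(f"freed by halving: {freed_tb:.0f} TB")
    print(f"value of freed storage: ${value_usd:,.0f}")
```

Under these assumptions a 50 percent reduction frees 4 TB, which at $20,000 per TB matches the $80,000 figure; other reduction fractions can be passed to `storage_savings` to explore the range.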
List of Appendices

The following documents can be found on the companion CD:

• AppendixA-Grid.doc
• 2DayDBA.pdf
• 10.9.RACRollingUpgrade.doc
• 10.10.UpgradeWithLogicalStandby.doc
• asm r2 new features.pdf
• asm_10gr2_bptwp_sept05.pdf
• asmov.pdf
• asmwp.pdf
• DataGuard.pdf
• db_storage_consolidation_wp 12-05.pdf
• ds_rac.pdf
• EMConcepts.pdf
• Generic_SLA.doc
• MAA_WP_10gASMMigration.pdf
• MetalinkNote290738.1.doc
• Oracle Data Guard.doc
• oracle_real_application_clusters_10g-the_foundation_for_grid_computing.pdf
• Rolling_Patch_Update_Data_Sheet.pdf
• rollup_10_1_0_4.pdf
• take the guesswork out of db tuning 01-06.pdf
• TWP_HA_10gR2_HA_Overview.pdf
• twp_rac10gr2.pdf
