Information Management SW Services
BigInsights Stock Sizing Best Practices
May 2014
Prepared by Information Management

Copyright © 2012 IBM Information Management. All rights reserved. No part of this publication may be reproduced, transmitted, transcribed, stored in a retrieval system, or translated into any language in any form by any means without the written permission of IBM.

Licenses and Trademarks
Information Server is a registered trademark of IBM. Other product or corporate names may be trademarks or registered trademarks of other companies and are used only for explanation and to the owner's benefit, without intent to infringe.

IBM Information Management
71 S. Wacker Drive
Chicago, IL 60606 USA

Authors
Name | Title | Email Address
Marty Donohue | IM Services Big Data Consultant | donohuem@us.ibm.com

Revision History
Version | Date | Author | Remarks
1 | April 2014 | Marty Donohue | Draft Document
2 | May 2014 | Lynnette Zuccala | Final Document

06/30/16 7:39 PM IBM Information Management Page ii © 2012 IBM Information Management

Table of Contents
Table of Contents
1 Table of Figures
2 Table of Tables
3 Introduction
  3.1 Intended Audience
  3.2 Scope of this Document
  3.3 Organization of this Document
4 Background
5 Recommended Environments
  5.1 Lab (POC) / Sandbox Environment
  5.2 Development Environment
  5.3 Test Environment (QA)
  5.4 Mock Production (Performance Test)
  5.5 Production
6 Stock IBM Cluster Sizings
7 Custom Sizing
8 Summary
9 Appendix
  9.1 BigInsights and Data (Watson) Explorer Sizings
  9.2 Supported Platforms
  9.3 Reference Material
  9.4 Best Industry Practices for Solution
  9.5 Appendix: Change Management Process

1 Table of Figures
Figure 1 – Lab/Sandbox Environment Hardware Design – 5 Edge/Data Nodes
Figure 2 – Development Environment Hardware Design – 5 Edge/14 Data Nodes
Figure 3 – Test (QA) Environment Hardware Design – 8 Edge/14 Data Nodes
Figure 4 – Mock Production (Performance Test) Environment Hardware Design – 8 Edge/9 Data Nodes
Figure 5 – Mock Production (Performance Test) Environment Hardware Design with DR – 8 Edge Nodes/9 Data Nodes (with DR)
Figure 6 – Production Environment Hardware Design – 8 Edge Nodes/17 Data Nodes
Figure 7 – Production Environment Hardware Design with DR – 8 Edge Nodes/17 Data Nodes (with DR)

2 Table of Tables
Table 1 – Workload Components
Table 2 – Streams and Data (Watson) Explorer Sizing
Table 3 – Software Version Comparisons
Table 4 – Change Management

3 Introduction

3.1 Intended Audience
This document is intended for:
- Personnel involved in the BigInsights Shared Services project
- Individual IBM solution teams
- Individual non-IBM solution teams

The audience should have a high-level understanding of the Apache Hadoop (Hadoop) architecture as well as the IBM InfoSphere BigInsights platform before reading this document, as it refers to concepts such as management, edge, and data nodes that are specific to a Hadoop and BigInsights implementation.

3.2 Scope of this Document
The scope of this document is limited to the hardware and core software technology components of the BigInsights core solution infrastructure. The WAN infrastructure is beyond the scope of this document, as are data center aspects such as power, cooling, and rack placement of the hardware, which are the responsibility of the infrastructure group. This document addresses only the design of the solution infrastructure; it does not address the performance and scalability of the solution. The contents of this document are based on a series of discussions among lead architects, implementation experts, and subject matter experts (SMEs).

3.3 Organization of this Document
The document starts with a high-level background for a shared services strategy.
Various factors that influence the infrastructure architecture, such as high availability, storage, security, and encryption, are described in individual sections. Efforts have been made to provide relevant technology details while omitting unnecessary technical detail.

4 Background
Big Data implementations involve a shared services platform based on BigInsights. The initial migration will land files on the cluster, with the intention that applications will migrate their ETL and analysis workloads to BigInsights. A shared services platform supports these use cases:
- Data discovery
- Data mining / predictive model development
- Predictive model execution
- ETL data loads
- Large-scale computing execution
- Data archival

The workload split (weighted more toward analytics than data loading and offloaded high-compute processes) drives the recommendation of a predefined configuration that supports a specific amount of raw data landing on each environment. For each workload type there will be a need to lock down data visibility and access. For any environment where analytical and operational workloads run side by side, resources must be managed across the workloads to ensure SLAs for operational processes. There will be multiple projects for each type of workload, and multiple people or groups of people that will need access to them.

As new applications transfer their ETL/analytics workloads to the BigInsights cluster, the current load (CPU and memory consumption) should be monitored and analyzed, and additional nodes added as needed. This approach of growing hardware in an incremental, cost-effective, non-disruptive manner is optimal for an emerging technology such as Big Data, and supports a pipeline of future tenants about whose requirements little is known.
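The monitor-and-grow approach described above can be sketched as a simple capacity projection. This is an illustrative sketch only: the function names, thresholds, and per-node figures below are assumptions for the example, not IBM-published sizing rules.

```python
import math

# Illustrative sketch of the incremental-growth approach: project when the
# cluster will run out of usable capacity, and how many data nodes would
# cover the shortfall. All figures here are example assumptions.

def months_until_full(used_tb: float, usable_tb: float,
                      monthly_growth_tb: float) -> float:
    """Months before the cluster reaches its usable capacity."""
    if monthly_growth_tb <= 0:
        return float("inf")
    return max(usable_tb - used_tb, 0.0) / monthly_growth_tb

def nodes_to_add(shortfall_tb: float, usable_tb_per_node: float) -> int:
    """Whole data nodes needed to cover a projected capacity shortfall."""
    return math.ceil(shortfall_tb / usable_tb_per_node)

# Example: 60 TB used of 100 TB usable, growing 5 TB/month, and each new
# data node assumed to contribute ~16 TB usable after replication.
print(months_until_full(60, 100, 5))   # 8.0 months of headroom
print(nodes_to_add(40, 16))            # 3 nodes for a 40 TB shortfall
```

In practice the same projection would be fed by the CPU and memory monitoring the text calls for, not by storage alone.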
Consistent with industry standards, IBM recommends that eight (8) environments be instantiated for this platform. They are:

1. Lab – An environment for bringing in early-release versions of Big Data components for experimentation and validation of functionality. This environment is limited to use by infrastructure personnel for their experimentation with software and hardware platforms. There is no provision for High Availability (HA), other than what Apache Hadoop provides in the file system, or Disaster Recovery (DR).

2. Sandbox (Analytics) – An environment for free-form experimentation with data for unbounded information discovery (POC). Governance in this environment is limited to tracking user credentials and the data sources being brought in for information discovery. There is no provision for HA (other than what Apache Hadoop provides in the file system) or DR. This environment is designed to support 10 TB of data.

3. Development – An environment for the development of applications that address specific use cases in the Client domain on the Big Data platform. It is also the first step toward operationalizing the unbounded discovery and analytics accomplished in the Sandbox environment. Governance in this environment will track data source, usage, lineage, and user credentials. There is no provision for HA (other than what Apache Hadoop provides in the file system) or DR. This environment is designed to support 50 TB of data.

4. Test (QA) – An environment used by the Application and System Test groups for testing and verification of applications developed to address use cases in the Client domain. Governance in this environment will track data source, usage, lineage, and user credentials. This environment will have HA capabilities provisioned for all services so as to simulate functional execution in the presence of software and hardware failures. It is designed to support 50 TB of data.

5. Mock Production (Performance Test) – An environment that is a replica of the production environment in all aspects (data center, network, HA, DR) except that it is smaller in data size (50%) than production. Governance in this environment will track data source, usage, lineage, and user credentials. This environment will have HA and DR capabilities provisioned for all services so as to simulate functional execution in the presence of system and hardware failures. It is designed to support 50 TB of data.

6. Mock Production – Disaster Recovery – This environment is not being considered for immediate deployment. The details are provided for the Client's future consideration.

7. Production – The true production environment will adhere to enterprise SLAs, security, and governance, with HA and DR. This environment is designed to support 100 TB of data.

8. Production – Disaster Recovery – This environment will act as the backup environment in case of a Production environment data center failure.

All of these environments except Lab, Sandbox, and Development require HA to be configured. If there is a need to support a different geography with limited bandwidth to the primary cluster, consider creating a smaller cluster that serves that geography.

5 Recommended Environments
The following sections cover the hardware design for each environment for a typical ETL/Analytics mix (30/70). These sizings are intended as a starting point for when the exact nature of the workload (see the custom sizing questions) is unknown. The workload for these environments assumes HBase is not being used.
The Stock IBM Cluster Sizings section includes a sizing for HBase-based workloads.

5.1 Lab (POC) / Sandbox Environment
The lab/sandbox environment will require at least 5 edge nodes (3 BigInsights, 1 Streams, and 1 Data Explorer), a System Administration Console, and 5 data nodes. HA will not be required.

Figure 1 – Lab/Sandbox Environment Hardware Design – 5 Edge/Data Nodes

5.2 Development Environment
The development environment will require at least 5 edge nodes (3 BigInsights, 1 Streams, and 1 Data Explorer), a System Administration Console, and 14 data nodes. HA will not be required.

Figure 2 – Development Environment Hardware Design – 5 Edge/14 Data Nodes

5.3 Test Environment (QA)
The test environment will require at least 8 edge nodes (6 BigInsights [3 HA], 1 Streams, and 1 Data Explorer), a System Administration Console, and 14 data nodes. HA will be required.

Figure 3 – Test (QA) Environment Hardware Design – 8 Edge/14 Data Nodes

5.4 Mock Production (Performance Test)
The mock production environment will require at least 8 edge nodes (6 BigInsights [3 HA], 1 Streams, and 1 Data Explorer), a System Administration Console, and 9 data nodes. HA will be required.
Figure 4 – Mock Production (Performance Test) Environment Hardware Design – 8 Edge/9 Data Nodes

Figure 5 – Mock Production (Performance Test) Environment Hardware Design with DR – 8 Edge Nodes/9 Data Nodes (with DR)

5.5 Production
The production environment will require at least 8 edge nodes (6 BigInsights [3 HA], 1 Streams, and 1 Data Explorer), a System Administration Console, and 17 data nodes. HA will be required.

Figure 6 – Production Environment Hardware Design – 8 Edge Nodes/17 Data Nodes

Figure 7 – Production Environment Hardware Design with DR – 8 Edge Nodes/17 Data Nodes (with DR)

6 Stock IBM Cluster Sizings
The following three sizings cover two production scenarios as well as a development environment setup. The scope is BigInsights only and does not include the peripheral products (SPSS, Streams, Data Explorer, and Cognos).
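The capacity figures in the sizing table that follows can be sanity-checked with standard HDFS planning arithmetic: raw disk is nodes × data drives × drive size, and usable capacity divides out the HDFS replication factor (3 by default). Treat this as a hedged rule of thumb, not a derivation. The full-rack and half-rack figures for the traditional workload line up with it, but not every row in the table follows the same convention.

```python
# Rule-of-thumb HDFS capacity arithmetic (a sketch for sanity-checking the
# rack sizings below, not IBM's sizing tool).

def raw_disk_tb(data_nodes: int, data_drives_per_node: int,
                drive_tb: float) -> float:
    """Total raw disk across all data nodes."""
    return data_nodes * data_drives_per_node * drive_tb

def usable_tb(raw: float, replication: int = 3) -> float:
    """Approximate usable capacity after HDFS replication overhead."""
    return raw / replication

# Full rack, traditional workload: 18 data nodes x 12 data drives x 4 TB.
print(usable_tb(raw_disk_tb(18, 12, 4)))  # 288.0, matching the table
# Half rack: 9 data nodes.
print(usable_tb(raw_disk_tb(9, 12, 4)))   # 144.0, matching the table
```

As the table notes, actual capacity is then scaled up by the expected compression ratio.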
Workload type: Production System
Target workload: MapReduce – traditional, landing zone
  Full Rack: 18 DataNodes, 3 MgmtNodes. Total capacity: 288 TB raw; multiply by the expected compression ratio for actual capacity.
  Components:
    DataNode: 3650 M4 BD, E5-2650 processor (16 cores), 64 GB memory, 14 x 4 TB drives (12 for data, 2 for mirrored OS)
    MgmtNode: 3550 M4, E5-2650 processor (16 cores), 128 GB memory
    Switches: 1 G8264 10 GbE data switch, 1 G8052 management switch; 42U rack
  Half Rack: 9 DataNodes, 3 MgmtNodes. Total capacity: 144 TB raw; multiply by the expected compression ratio for actual capacity. (Components same as above.)
  Starter Rack: 3 DataNodes, 3 MgmtNodes. Total capacity: 72 TB raw. (Components same as above.)

Workload type: Production System
Target workload: NoSQL, HBase, heavy analytics
  Full Rack: 16 DataNodes, 5 MgmtNodes. Total capacity: 256 TB raw; multiply by the expected compression ratio for actual capacity.
  Components:
    DataNode: 3650 M4 BD, E5-2680 processor (20 cores), 256 GB memory, 14 x 2 TB drives (12 for data, 2 for mirrored OS)
    MgmtNode: 3550 M4, E5-2650 processor (16 cores), 128 GB memory
    Switches: 1 G8264 10 GbE data switch, 1 G8052 management switch; 42U rack
  Half Rack: 8 DataNodes, 5 MgmtNodes. Total capacity: 216 TB raw. (Components same as above.)
  Starter Rack: 3 DataNodes, 5 MgmtNodes. Total capacity: 72 TB raw. (Components same as above.)

Workload type: Development System
  Starter Rack: 3 DataNodes, 1 MgmtNode. Total capacity: 144 TB raw.
  Components:
    DataNode: 3650 M4 BD, E5-2650 processor (16 cores), 64 GB memory, 14 x 4 TB drives (12 for data, 2 for mirrored OS)
    MgmtNode: 3550 M4, E5-2650 processor (16 cores), 128 GB memory
    Switches: 1 G8264 10 GbE data switch, 1 G8052 management switch; 42U rack

Table 1 – Workload Components

7 Custom Sizing
If the customer requires sizing for their specific workload, the answers to the
questions below will provide input into the calculation for a custom sizing.

Data volume
- How much data will be stored on the cluster?

Data ingestion
- Bulk load: How much data, if any, will be bulk loaded? How quickly does the bulk load need to finish?
- Incremental load: Will data be incrementally loaded? How much data per hour?

Workload
- What percentage of the cluster will be used for MapReduce versus something else, such as HBase?
- MapReduce (if the cluster will be used for MapReduce):
  - How frequently will new Hadoop jobs be queued?
  - How quickly does each Hadoop job need to finish?
  - How much data will the typical MapReduce job consume? How much data will it produce?
- HBase (if the cluster will be used for HBase):
  - How much data will be stored in HBase?
  - What will be the rate of transactions against HBase?
  - Will workloads tend to be read-heavy or write-heavy?
  - What is the expected per-transaction response time?
- What other tools and technologies will be used on the cluster, such as Big SQL, Hive, etc.? (Once known, follow-on questions can be asked, such as the expected response times for Big SQL or Hive queries.)
- What else is known about the workload? What type of analytics will be performed? Will it be I/O intensive? Compute intensive? Memory intensive?

Cost and footprint
- How much is the customer willing to spend?
- How much floor space is available for racks?

8 Summary
Big Data implementations at IBM frequently do not have sufficient information for an accurate sizing. The Big Data architecture is one of expansion when necessary; current Hadoop market statistics indicate that 72% of capacity is idle. As a result, the conversation should begin from a starting point, with the recommendation that, as data is migrated to the Big Data platform, organizations monitor the cluster and add nodes as needed.
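The questionnaire answers above can feed a back-of-the-envelope storage-driven calculation like the following. All defaults (compression ratio, temporary-space headroom, disk per node) are illustrative assumptions for the sketch, not IBM-published sizing rules, and a real custom sizing would also weigh the ingestion-rate and workload answers.

```python
import math

# Hypothetical sizing sketch: turn a logical data volume into a data-node
# count. Every default below is an assumption for illustration only.

def size_cluster(data_tb: float,
                 replication: int = 3,
                 compression_ratio: float = 2.0,
                 temp_space_factor: float = 1.25,
                 raw_tb_per_node: float = 48.0) -> int:
    """Estimate data nodes needed for a given logical data volume.

    data_tb            logical data to land on the cluster
    compression_ratio  expected on-disk compression (2.0 = data halves)
    temp_space_factor  headroom for intermediate MapReduce output
    raw_tb_per_node    raw disk per data node (e.g. 12 x 4 TB drives)
    """
    on_disk = data_tb / compression_ratio * replication * temp_space_factor
    return math.ceil(on_disk / raw_tb_per_node)

# Example: 100 TB logical -> 100/2 * 3 * 1.25 = 187.5 TB on disk -> 4 nodes.
print(size_cluster(100))  # 4
```

Consistent with the summary above, the result is best treated as a starting point to grow from, not a final answer.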
9 Appendix

9.1 BigInsights and Data (Watson) Explorer Sizings
In every configuration below, the Streams and Data Explorer nodes are dual octo-core servers; each Data Explorer node has 128 GB of memory and 12 x 600 GB of disk; the System Administration Console runs on existing console hardware; and the NFS mount follows the customer standard specification. (The Size (TB) column is 0 for all rows in the source.) The remaining columns are:

Environment | Size | Equipment Type | Streams Memory | Streams Disk | # Streams Nodes | # Data Explorer Nodes
Lab | Small | ENTRY | 8 GB | 2 x 1 TB | 1 | 0.25
Lab | Medium | ENTRY | 16 GB | 2 x 1 TB | 1 | 0.5
Lab | Large | ENTRY | 16 GB | 2 x 1 TB | 1 | 1
Sandbox | Small | ENTRY | 32 GB | 2 x 2 TB | 1 | 0.25
Sandbox | Medium | ENTRY | 32 GB | 2 x 2 TB | 1 | 0.5
Sandbox | Large | ENTRY | 32 GB | 2 x 2 TB | 1 | 1
Dev | Small | VALUE | 32 GB | 2 x 2 TB | 1 | 0.25
Dev | Medium | VALUE | 32 GB | 2 x 2 TB | 1 | 0.5
Dev | Large | VALUE | 32 GB | 2 x 2 TB | 1 | 1
Test | Small | VALUE | 32 GB | 2 x 2 TB | 1 | 0.5
Test | Medium | VALUE | 64 GB | 2 x 2 TB | 1 | 1
Test | Large | VALUE | 64 GB | 2 x 2 TB | 1 | 2
Mock Production | Small | ENTERPRISE | 128 GB | 4 x 2 TB | 1 | 0
Mock Production | Medium | ENTERPRISE | 128 GB | 4 x 2 TB | 1 | 0
Mock Production | Large | ENTERPRISE | 128 GB | 4 x 2 TB | 1 | 0
Production | Small | ENTERPRISE | 128 GB | 4 x 2 TB | 1 | 2
Production | Medium | ENTERPRISE | 128 GB | 4 x 2 TB | 1 | 4
Production | Large | ENTERPRISE | 128 GB | 4 x 2 TB | 1 | 8

Table 2 – Streams and Data (Watson) Explorer Sizing

9.2 Supported Platforms
This section describes the platforms supported by BigInsights.
The supported platforms for BigInsights by version can be found at:
http://www-01.ibm.com/support/docview.wss?uid=swg27027565

The supported platforms for the Streams and Data Explorer software bundled with BigInsights can be found at:
http://www-01.ibm.com/support/docview.wss?uid=swg27036473

The supported platforms for the Guardium software can be found at:
http://www-01.ibm.com/support/docview.wss?uid=swg27035836

The following table compares the minimum supported software versions with the IBM-recommended versions at the time this document was created:

Product | Minimum Version | IBM Recommended
RH Linux | 5.5 | 6
Guardium | 9.1 | 9.1
Tivoli Workload Scheduler (TWS) | 9.1 | 9.1
Cognos | 10.2 | 10.2
Optim | 9.1.0.3 | 9.1.0.3
WebSphere* | 8.5 | 8.5

Table 3 – Software Version Comparisons
* Note: WebSphere is bundled with the BigInsights Enterprise Edition.

9.3 Reference Material

Big Data Networked Storage Solution for Hadoop
http://www.redbooks.ibm.com/redpapers/pdfs/redp5010.pdf

Implementing IBM InfoSphere BigInsights on IBM System x
http://www.redbooks.ibm.com/redbooks/pdfs/sg248077.pdf

Data security best practices: A practical guide to implementing data encryption for InfoSphere BigInsights
http://www.ibm.com/developerworks/library/bd-datasecuritybp/

IBM General Parallel File System (GPFS)
– IBM Internet: http://www-03.ibm.com/systems/software/gpfs
– IBM Information Center: http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.gpfs.v3r5.0.7.gpfs200.doc%2Fbl1adv_fposettings.htm

IBM InfoSphere BigInsights
– IBM Internet: http://www.ibm.com/software/data/infosphere/biginsights
– IBM Information Center: http://pic.dhe.ibm.com/infocenter/bigins/v2r1/index.jsp

IBM Integrated Management Module (IMM2) and Open Source xCAT
– IBM IMM2 User's Guide: ftp://ftp.software.ibm.com/systems/support/system_x_pdf/88y7599.pdf
– IMM and IMM2 Support on IBM System x and BladeCenter Servers, TIPS0849: http://www.redbooks.ibm.com/abstracts/tips0849.html
– SourceForge xCAT Wiki: http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Main_Page
– xCAT 2 Guide for the CSM System Administrator, REDP-4437: http://www.redbooks.ibm.com/abstracts/redp4437.html
– IBM Support for xCAT: http://www.ibm.com/systems/software/xcat/support.html

IBM Platform Computing
– IBM Internet: http://www.ibm.com/systems/technicalcomputing/platformcomputing/index.html
– Platform Symphony: http://www-03.ibm.com/systems/technicalcomputing/platformcomputing/products/symphony/
– IBM Platform Computing Integration Solutions, SG24-8081: http://www.redbooks.ibm.com/abstracts/sg248081.html
– Implementing IBM InfoSphere BigInsights on System x, SG24-8077: http://www.redbooks.ibm.com/abstracts/sg248077.html
– Integration of IBM Platform Symphony and IBM InfoSphere BigInsights, REDP-5006: http://www.redbooks.ibm.com/abstracts/redp5006.html
– SWIM Benchmark: http://www.ibm.com/systems/technicalcomputing/platformcomputing/products/symphony/highperfhadoop.html

IBM RackSwitch G8052 (1 GbE switch)
– IBM Internet: http://www.ibm.com/systems/networking/switches/rack/g8052
– IBM System Networking RackSwitch G8052, TIPS0813: http://www.redbooks.ibm.com/abstracts/tips0813.html

IBM RackSwitch G8264 (10 GbE switch)
– IBM Internet: http://www.ibm.com/systems/networking/switches/rack/g8264
– IBM System Networking RackSwitch G8264, TIPS0815: http://www.redbooks.ibm.com/abstracts/tips0815.html

IBM RackSwitch G8316 (40 GbE switch)
– IBM Internet: http://www.ibm.com/systems/networking/switches/rack/g8316/
– IBM System Networking RackSwitch G8316, TIPS0842: http://www.redbooks.ibm.com/abstracts/tips0842.html

IBM System x3550 M4 (Management Node)
– IBM Internet: http://www.ibm.com/systems/x/hardware/rack/x3550m4
– IBM System x3550 M4, TIPS0851
http://www.redbooks.ibm.com/abstracts/tips0851.html

IBM System x3630 M4 (Data Node)
– IBM Internet: http://www.ibm.com/systems/x/hardware/rack/x3630m4
– IBM System x3630 M4, TIPS0889: http://www.redbooks.ibm.com/abstracts/tips0889.html

IBM System x Reference Architecture for Hadoop: InfoSphere BigInsights
– IBM Internet: http://www.ibm.com/systems/x/solutions/analytics/bigdata.html
– Implementing IBM InfoSphere BigInsights on System x, SG24-8077: http://www.redbooks.ibm.com/abstracts/sg248077.html

9.4 Best Industry Practices for Solution
As the solution implementation progresses, it is extremely important to document best practices for each solution domain. To the extent possible, industry best practices should be adhered to in the testing, implementation, rollout, and monitoring phases.

Configuration Log
A configuration log is an up-to-date compilation of information and configuration details about the solution infrastructure. It should contain enough information that the infrastructure can be recreated from it. Whenever changes happen, the configuration log should be updated. The configuration log can be kept in hardcopy, softcopy, or both; a hardcopy should be maintained as a disaster backup. A configuration log is needed for a variety of reasons:
- Disaster recovery
- Troubleshooting
- Recreating infrastructure whose configuration has been destroyed
- Planning for infrastructure additions
- Modifying or expanding the solution infrastructure
- Recovering accidentally deleted licenses
- Recovering or reconfiguring infrastructure configurations

A suggested structure for the configuration log resembles a Microsoft Windows Explorer view of an online configuration log:
1. Detailed diagrams of the infrastructure (topology and the different connections)
2. A firmware log of all devices
3. A logbook of additions, deletions, and modifications to the infrastructure
4. A directory structure for hardware and software devices
5. An infrastructure profile
6. A script directory for any scripts created

Data Protection and Backup
When the solution is implemented, a backup of each hardware and software configuration needs to be kept on the host. Regularly test the ability to restore these backups.

Planning a Test Strategy
Every solution team should document its testing plans in its solution document. All testing activities and testing outcomes need to be documented.
- Standalone testing: the individual solution infrastructure just implemented is tested for basic functionality and connectivity.
- Testing after integration: carried out once the solution is integrated. The tests can be similar in nature, but are now run in the integrated environment.

9.5 Appendix: Change Management Process

The Problem
The most important factor to keep in mind is that a modification to any component of the solution can impact the overall behavior of the solution. Therefore, no change to any solution component should be applied directly to the production setup. The following are examples of changes:

S. No | Changes to Solution | Remarks
1 | Applying patches to Linux (file system, volume manager) | Affects performance and normal running
2 | Updating firmware on hardware devices such as SAN, NAS, storage network, appliances, and servers | First try this out in the test setup
3 | Installing / upgrading new hardware | Check against the compatibility matrix published by the product test labs
4 | Installing / upgrading new software | Check against the compatibility matrix

Table 4 – Change Management

Some tips for making changes:
- All changes must be documented
- Limit the number of changes applied at one time
- Automate the distribution of files / configuration
- Watch for system file changes
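The last tip, watching for system file changes, can be implemented with a simple checksum baseline. The sketch below is one hedged way to do it; the file paths shown are illustrative, not a BigInsights requirement, and dedicated tools exist for the same job.

```python
import hashlib
from pathlib import Path

# Illustrative change-detection sketch: record SHA-256 checksums of key
# files as a baseline, then re-scan after a change window and report drift.

def checksum(path: Path) -> str:
    """SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def baseline(paths):
    """Map each existing file to its current checksum."""
    return {str(p): checksum(p) for p in map(Path, paths) if p.is_file()}

def changed_files(old: dict, new: dict):
    """Files whose checksum differs, plus files added or removed."""
    keys = set(old) | set(new)
    return sorted(k for k in keys if old.get(k) != new.get(k))

# Usage (illustrative paths): snapshot before the change, compare after.
# before = baseline(["/etc/hosts", "/etc/fstab"])
# ... apply the change ...
# print(changed_files(before, baseline(["/etc/hosts", "/etc/fstab"])))
```

Each run's report belongs in the configuration logbook described in section 9.4, so that every detected drift is tied to a documented change.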