BigInsights_Stock_Sizing_Recommended_Practices_Document_v2.4.docx

Information Management SW Services
BigInsights Stock Sizing
Best Practices
May 2014
Prepared by
Information Management
Copyrights
© 2012 IBM Information Management
All rights reserved. No part of this publication may be reproduced, transmitted, transcribed, stored in a retrieval system, or translated
into any language in any form by any means without the written permission of IBM.
Licenses and Trademarks
Information Server is a registered trademark of IBM. Other product or corporate names may be trademarks or registered trademarks
of other companies and are used only for explanation and to the owner’s benefit, without intent to infringe.
IBM Information Management
71 S. Wacker Drive
Chicago, IL 60606
USA
Authors
Name: Marty Donohue
Title: IM Services Big Data Consultant
Email Address: donohuem@us.ibm.com

Revision History
Version | Date       | Author           | Remarks
1       | April 2014 | Marty Donohue    | Draft Document
2       | May 2014   | Lynnette Zuccala | Final Document
Table of Contents
1  Table of Figures
2  Table of Tables
3  Introduction
   3.1  Intended Audience
   3.2  Scope of this Document
   3.3  Organization of this Document
4  Background
5  Recommended Environments
   5.1  Lab (POC) / Sandbox Environment
   5.2  Development Environment
   5.3  Test Environment (QA)
   5.4  Mock Production (Performance Test)
   5.5  Production
6  Stock IBM Cluster Sizings
7  Custom Sizing
8  Summary
9  Appendix
   9.1  BigInsights and Data (Watson) Explorer Sizings
   9.2  Supported Platforms
   9.3  Reference Material
   9.4  Best Industry Practices for Solution
   9.5  Appendix: Change Management Process
1 Table of Figures
Figure 1 – Lab/Sandbox Environment Hardware Design – 5 Edge/Data Nodes
Figure 2 – Development Environment Hardware Design – 5 Edge/14 Data Nodes
Figure 3 – Test (QA) Environment Hardware Design – 8 Edge/14 Data Nodes
Figure 4 – Mock Production (Performance Test) Environment Hardware Design – 8 Edge/9 Data Nodes
Figure 5 – Mock Production (Performance Test) Environment Hardware Design with DR – 8 Edge Nodes/9 Data Nodes
Figure 6 – Production Environment Hardware Design – 8 Edge Nodes/17 Data Nodes
Figure 7 – Production Environment Hardware Design with DR – 8 Edge Nodes/17 Data Nodes
2 Table of Tables
Table 1 – Workload Components
Table 2 – Streams and Data (Watson) Explorer Sizing
Table 3 – Software Version Comparisons
Table 4 – Change Management
3 Introduction
3.1 Intended Audience
This document is meant for:
- Personnel who are involved in the BigInsights Shared Services project
- Individual IBM solution teams
- Individual non-IBM solution teams
The audience should have a high-level understanding of the Apache Hadoop (Hadoop) architecture and of the IBM InfoSphere BigInsights platform before reading this document, as it refers to concepts such as management, edge, and data nodes that are specific to a Hadoop and BigInsights implementation.
3.2 Scope of this Document
The scope of this document is limited to the hardware and core software technology components of the BigInsights core solution infrastructure. The WAN infrastructure is beyond the scope of this document, as are other data center aspects such as power, cooling, and rack placement of the hardware, which are the responsibility of the infrastructure group. This document does not address performance and scalability of the solution; it addresses only the design of the solution infrastructure. The contents of this document are based on a series of discussions between lead architects, implementation experts, and subject matter experts (SMEs).
3.3 Organization of this Document
The document starts with a high-level background for a shared services strategy. Various factors that influence the infrastructure architecture, such as high availability, storage, security, and encryption, are described in individual sections. Effort has been made to provide relevant technology details while omitting unnecessary technical detail.
4 Background
Big Data implementations involve a shared services platform based on the BigInsights platform. The initial migration will land files on the cluster, with the intention that applications will then migrate their ETL and analysis workloads to BigInsights.
A shared services platform supports these use cases:
- Data discovery
- Data mining / predictive model development
- Predictive model execution
- ETL data loads
- Large scale computing execution
- Data archival
The workload split (weighted more towards analytics than data loading and offloaded high-compute processes) drives the recommendation to use a predefined configuration that supports a specific amount of raw data landing on each environment. For each of the workload types there will be a need to lock down data visibility and access. In any environment where analytical and operational workloads run side by side, resources must be managed across the workloads to ensure SLAs for operational processes. There will be multiple projects for each type of workload, and multiple people or groups of people will need access to them. As new applications transfer their ETL/analytics workloads to the BigInsights cluster, the current load (CPU and memory consumption) should be monitored and analyzed, and additional nodes added as needed.
This approach of growing hardware in an incremental, cost-effective, non-disruptive manner is well suited to an emerging technology such as Big Data and to supporting a pipeline of future tenants with limited knowledge of their requirements.
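One low-effort way to track cluster growth against capacity is to script a periodic check of HDFS utilization. The sketch below is a minimal example, assuming it runs on a cluster node as the HDFS administrative user, that the hdfs CLI is on the PATH (older Hadoop levels use "hadoop dfsadmin" instead), and that the standard Apache Hadoop "hdfs dfsadmin -report" output format is in use. The 80% trigger is an illustrative assumption, not an IBM figure, and CPU and memory consumption should still be tracked with the site's own monitoring tooling.

```python
# Minimal sketch: flag when HDFS storage utilization suggests planning for more data nodes.
# Assumes the 'hdfs' CLI is available and that 'hdfs dfsadmin -report' prints a
# cluster-wide "DFS Used%" line (standard Apache Hadoop output). The threshold
# below is an illustrative assumption, not an IBM-published figure.
import re
import subprocess

ADD_NODE_THRESHOLD_PCT = 80.0  # assumed planning trigger

def hdfs_used_percent() -> float:
    """Return the cluster-wide DFS Used% as reported by dfsadmin."""
    report = subprocess.run(
        ["hdfs", "dfsadmin", "-report"],
        capture_output=True, text=True, check=True
    ).stdout
    match = re.search(r"DFS Used%:\s*([\d.]+)%", report)
    if not match:
        raise RuntimeError("Could not find 'DFS Used%' in the dfsadmin report")
    return float(match.group(1))

if __name__ == "__main__":
    used = hdfs_used_percent()
    print(f"Cluster DFS used: {used:.1f}%")
    if used >= ADD_NODE_THRESHOLD_PCT:
        print("Utilization above planning threshold - evaluate adding data nodes.")
```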
Consistent with industry standards, IBM recommends that eight (8) environments be instantiated for this platform. They are:
1. Lab - An environment for bringing in early release versions of Big Data
components for experimentation and validation of functionality. This
environment is limited to use by infrastructure personnel for their
experimentation with software and hardware platforms. There is no provision for
High Availability (HA), other than what Apache Hadoop provides in the file
system, or Disaster recovery (DR).
2. Sandbox (Analytics) - An environment for free-form experimentation with data
for unbounded information discovery (POC). The governance in this
environment is limited to tracking of user credentials and data sources that are
being brought in for information discovery. There is no provision for High
Availability (other than what Apache Hadoop provides in the file system) or
Disaster recovery. This environment is designed to support 10 TB of data.
3. Development - An environment for development of applications to address specific use cases in the Client domain, on the Big Data platform. This is also the environment that is the first step towards operationalization of the unbounded discovery and analytics accomplished in the Sandbox environment. The governance in this environment will track data source, usage, lineage, and user credentials. There is no provision for High Availability (other than what Apache Hadoop provides in the file system) or Disaster recovery. This environment is designed to support 50 TB of data.
4. Test (QA) - An environment used by the Application and System Test groups for testing and verification of applications developed to address use cases in the Client domain. The governance in this environment will track data source, usage, lineage, and user credentials. This environment will have HA capabilities provisioned for all services so as to simulate functional execution in the presence of software and hardware failures. This environment is designed to support 50 TB of data.
5. Mock Production (Performance Test) - An environment that is a replica of the production environment in all aspects - data center, network, HA, DR - except that it is at a smaller scale in terms of data size (50%) than production. The governance in this environment will track data source, usage, lineage, and user credentials. This environment will have HA and DR capabilities provisioned for all services so as to simulate functional execution in the presence of system and hardware failures. This environment is designed to support 50 TB of data.
6. Mock Production - Disaster Recovery - This environment is not being considered for immediate deployment. The details are provided for future consideration by the Client.
7. Production - The true production environment will adhere to enterprise SLA, security and governance, with HA and DR. This environment is designed to support 100 TB of data.
8. Production - Disaster Recovery - This environment will act as the backup environment in case of a Production environment data center failure.
All of these environments except Sandbox, Lab and Development require HA to be
configured.
If there is a need to support a different geography with limited bandwidth to the primary
cluster, consider creating a smaller cluster that serves that geography.
5 Recommended Environments
The following sections cover the hardware design for each of the environments, assuming a typical ETL/Analytics mix (30/70). These sizings are intended as a starting point for when the exact nature of the workload (see the custom sizing questions) is unknown. The workloads for these environments assume HBase is not being used.
The Stock IBM Cluster Sizings section includes a sizing for HBase-based workloads.
5.1 Lab (POC) / Sandbox Environment
The lab/sandbox environment will require at least 5 edge nodes (3-BigInsights, 1-Streams and 1-Data Explorer), a System Administration Console, and 5 data nodes. HA will not be required.
Figure 1 – Lab/Sandbox Environment Hardware Design – 5 Edge/Data Nodes
5.2 Development Environment
The development environment will require at least 5 edge nodes (3-BigInsights, 1-Streams and 1-Data Explorer), a System Administration Console, and 14 data nodes. HA will not be required.
Figure 2 – Development Environment Hardware Design - 5 Edge/14 Data Nodes
5.3 Test Environment (QA)
The test environment will require at least 8 edge nodes (6-BigInsights [3-HA], 1-Streams and 1-Data Explorer), a System Administration Console, and 14 data nodes. HA will be required.
Figure 3 – Test (QA) Environment Hardware Design – 8 Edge/14 Data Nodes
5.4 Mock Production (Performance Test)
The mock production environment will require at least 8 edge nodes (6-BigInsights [3-HA], 1-Streams and 1-Data Explorer), a System Administration Console, and 9 data nodes. HA will be required.
Figure 4 – Mock Production (Performance Test) Environment Hardware Design – 8 Edge/9 Data
Nodes
Figure 5 – Mock Production (Performance Test) Environment Hardware Design with DR – 8 Edge Nodes/9 Data Nodes
5.5 Production
The production environment will require at least 8 edge nodes (6-BigInsights [3-HA], 1-Streams and 1-Data Explorer), a System Administration Console, and 17 data nodes. HA will be required.
Figure 6 – Production Environment Hardware Design – 8 Edge Nodes/17 Data Nodes
Figure 7 – Production Environment Hardware Design with DR – 8 Edge Nodes/17 Data Nodes
6 Stock IBM Cluster Sizings
The following three sizings cover two production scenarios as well as a development environment setup. The scope is BigInsights only and does not include the peripheral products (SPSS, Streams, Data Explorer, and Cognos).
Workload type: Production System
Target workload: MapReduce (traditional, landing zone)
Configurations:
- Full Rack: 18 DataNodes, 3 MgmtNodes. Total capacity: 288 TB raw; multiply by the expected compression ratio for actual capacity.
- Half Rack: 9 DataNodes, 3 MgmtNodes. Total capacity: 144 TB raw; multiply by the expected compression ratio for actual capacity.
- Starter Rack: 3 DataNodes, 3 MgmtNodes. Total capacity: 72 TB raw.
Components (all configurations):
- DataNode: 3650 M4 BD, E5-2650 processor (16 cores), 64 GB memory, 14 x 4 TB drives (12 for data, 2 for mirrored OS)
- MgmtNode: 3550 M4, E5-2650 processor (16 cores), 128 GB memory
- Switches: 1 x G8264 10 GbE data switch, 1 x G8052 management switch
- 42U rack

Workload type: Production System
Target workload: NoSQL, HBase, heavy analytics
Configurations:
- Full Rack: 16 DataNodes, 5 MgmtNodes. Total capacity: 256 TB raw; multiply by the expected compression ratio for actual capacity.
- Half Rack: 8 DataNodes, 5 MgmtNodes. Total capacity: 216 TB raw.
- Starter Rack: 3 DataNodes, 5 MgmtNodes. Total capacity: 72 TB raw.
Components (all configurations):
- DataNode: 3650 M4 BD, E5-2680 processor (20 cores), 256 GB memory, 14 x 2 TB drives (12 for data, 2 for mirrored OS)
- MgmtNode: 3550 M4, E5-2650 processor (16 cores), 128 GB memory
- Switches: 1 x G8264 10 GbE data switch, 1 x G8052 management switch
- 42U rack

Workload type: Development System
Configuration:
- Starter Rack: 3 DataNodes, 1 MgmtNode. Total capacity: 144 TB raw.
Components:
- DataNode: 3650 M4 BD, E5-2650 processor (16 cores), 64 GB memory, 14 x 4 TB drives (12 for data, 2 for mirrored OS)
- MgmtNode: 3550 M4, E5-2650 processor (16 cores), 128 GB memory
- Switches: 1 x G8264 10 GbE data switch, 1 x G8052 management switch
- 42U rack
Table 1 – Workload Components
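As a worked illustration of the "multiply by expected compression ratio for actual capacity" note in Table 1, the short sketch below applies an assumed 2.5:1 compression ratio to the MapReduce full-, half-, and starter-rack raw capacities. The ratio is illustrative only; the achievable figure depends on the data and the compression codec in use.

```python
# Minimal sketch of the "multiply by expected compression ratio" step from Table 1.
# The 2.5:1 ratio is an illustrative assumption; the raw-capacity figures come
# from the MapReduce production configurations in the table above.
RAW_CAPACITY_TB = {"full rack": 288, "half rack": 144, "starter rack": 72}
EXPECTED_COMPRESSION_RATIO = 2.5  # assumed; depends on data and codec

for config, raw_tb in RAW_CAPACITY_TB.items():
    effective_tb = raw_tb * EXPECTED_COMPRESSION_RATIO
    print(f"{config}: {raw_tb} TB raw -> ~{effective_tb:.0f} TB of user data")
```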
7 Custom Sizing
If the customer requires a sizing for their specific workload, the answers to the questions below will provide input into the calculation of a custom sizing; a rough estimation sketch follows the questions.
Data volume
How much data will be stored on the cluster?
Data ingestion
Bulk load: How much data, if any, will be bulk loaded? How quickly does bulk load need to
finish?
Incremental load: Will data be incrementally loaded? How much data per hour?
Workload
What percentage of the cluster will be used for map/reduce versus something else like HBase?
Map/Reduce [if cluster will be used for M/R]
How frequently will new Hadoop jobs get queued?
How quickly does each Hadoop job need to finish?
How much data will the typical Hadoop M/R job consume? How much data will it
produce?
HBase [if cluster will be used for HBase]
How much data will be stored in HBase?
What will be the rate of transactions against HBase?
Will workloads tend to be read heavy or write heavy?
What is the expected per-transaction response time?
What other tools and technologies will be used on the cluster, such as Big SQL, Hive, etc.?
[Once we know, we can ask follow-on questions, like expected response times to process Big SQL or Hive queries.]
What else do we know about the workload? What type of analytics will be performed?
Will it be I/O intensive? Compute intensive? Memory intensive?
Cost and footprint
How much is the customer willing to spend?
How much floor space is available for racks?
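As a rough illustration of how the answers above feed a sizing calculation, the sketch below converts a data-volume answer into a storage-driven lower bound on the data-node count. The compression ratio, temporary-space headroom, and per-node disk figures are illustrative assumptions rather than IBM reference values, and a real sizing would also weigh the ingestion-rate, workload-mix, and response-time answers (which is why the stock configurations carry more nodes than this lower bound).

```python
# Back-of-the-envelope sketch that turns a data-volume answer into a
# storage-driven lower bound on data nodes. All defaults are illustrative
# assumptions, not IBM reference figures.
import math

def estimate_data_nodes(user_data_tb: float,
                        compression_ratio: float = 2.0,   # assumed codec/data mix
                        replication_factor: int = 3,      # HDFS default
                        temp_space_factor: float = 1.25,  # assumed scratch/shuffle headroom
                        disk_per_node_tb: float = 48.0):  # e.g. 12 x 4 TB data drives per node
    """Return a storage-driven lower bound on the number of data nodes."""
    on_disk_tb = (user_data_tb / compression_ratio) * replication_factor * temp_space_factor
    return math.ceil(on_disk_tb / disk_per_node_tb)

if __name__ == "__main__":
    # Example: 100 TB of user data, as in the Production environment target.
    print(estimate_data_nodes(100), "data nodes (storage lower bound only)")
```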
8 Summary
Big Data implementations at IBM frequently do not have sufficient information for an accurate sizing. The Big Data architecture is one of expansion when necessary, and current Hadoop market statistics indicate that 72% of capacity is idle. As a result, the conversation should begin with a reasonable starting point and the recommendation that, as data is migrated to the Big Data platform, organizations monitor the cluster and add nodes as needed.
9 Appendix
9.1 BigInsights and Data (Watson) Explorer Sizings
In all configurations, the Streams nodes are Dual Octo Core systems, each Data (Watson) Explorer node is specified at 128 GB of memory (Dual Octo Core) with 12 x 600 GB of disk, the System Administration Console runs on existing console hardware, and the NFS mount follows the customer standard specification.

Environment     | Size   | Equipment Type | Streams Memory | Streams Disk | # of Streams Nodes | # of Data Explorer Nodes
Lab             | Small  | ENTRY          | 8 GB           | 2 x 1 TB     | 1                  | 0.25
Lab             | Medium | ENTRY          | 16 GB          | 2 x 1 TB     | 1                  | 0.5
Lab             | Large  | ENTRY          | 16 GB          | 2 x 1 TB     | 1                  | 1
Sandbox         | Small  | ENTRY          | 32 GB          | 2 x 2 TB     | 1                  | 0.25
Sandbox         | Medium | ENTRY          | 32 GB          | 2 x 2 TB     | 1                  | 0.5
Sandbox         | Large  | ENTRY          | 32 GB          | 2 x 2 TB     | 1                  | 1
Dev             | Small  | VALUE          | 32 GB          | 2 x 2 TB     | 1                  | 0.25
Dev             | Medium | VALUE          | 32 GB          | 2 x 2 TB     | 1                  | 0.5
Dev             | Large  | VALUE          | 32 GB          | 2 x 2 TB     | 1                  | 1
Test            | Small  | VALUE          | 32 GB          | 2 x 2 TB     | 1                  | 0.5
Test            | Medium | VALUE          | 64 GB          | 2 x 2 TB     | 1                  | 1
Test            | Large  | VALUE          | 64 GB          | 2 x 2 TB     | 1                  | 2
Mock production | Small  | ENTERPRISE     | 128 GB         | 4 x 2 TB     | 1                  | 0
Mock production | Medium | ENTERPRISE     | 128 GB         | 4 x 2 TB     | 1                  | 0
Mock production | Large  | ENTERPRISE     | 128 GB         | 4 x 2 TB     | 1                  | 0
Production      | Small  | ENTERPRISE     | 128 GB         | 4 x 2 TB     | 1                  | 2
Production      | Medium | ENTERPRISE     | 128 GB         | 4 x 2 TB     | 1                  | 4
Production      | Large  | ENTERPRISE     | 128 GB         | 4 x 2 TB     | 1                  | 8
Table 2 – Streams and Data (Watson) Explorer Sizing
9.2 Supported Platforms
This section describes the platforms supported by BigInsights.
The supported platforms for BigInsights by version can be found at:
http://www-01.ibm.com/support/docview.wss?uid=swg27027565
The supported platforms for Streams and Data Explorer software bundled with BigInsights
can be found at:
http://www-01.ibm.com/support/docview.wss?uid=swg27036473
The supported platforms for the Guardium software can be found at:
http://www-01.ibm.com/support/docview.wss?uid=swg27035836
The following table compares the minimum supported software versions with the versions IBM recommends, as of the time this document was created:
Product                          | Minimum Version | IBM Recommended
RH Linux                         | 5.5             | 6
Guardium                         | 9.1             | 9.1
Tivoli Workload Scheduler (TWS)  | 9.1             | 9.1
Cognos                           | 10.2            | 10.2
Optim                            | 9.1.0.3         | 9.1.0.3
WebSphere*                       | 8.5             | 8.5
Table 3 – Software Version Comparisons
* Note: WebSphere is bundled with the BigInsights Enterprise Edition.
9.3 Reference Material
Big Data Networked Storage Solution for Hadoop
http://www.redbooks.ibm.com/redpapers/pdfs/redp5010.pdf
Implementing IBM InfoSphere BigInsights on IBM System x
http://www.redbooks.ibm.com/redbooks/pdfs/sg248077.pdf
Data security best practices: A practical guide to implementing data encryption for InfoSphere
BigInsights
http://www.ibm.com/developerworks/library/bd-datasecuritybp/
IBM General Parallel File System (GPFS)
– IBM Internet
http://www-03.ibm.com/systems/software/gpfs
– IBM Information Center
http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.gpfs.v3r5.0.7.gpfs200.doc%2Fbl1adv_fposettings.htm
IBM InfoSphere BigInsights
– IBM Internet
http://www.ibm.com/software/data/infosphere/biginsights
– IBM Information Center
http://pic.dhe.ibm.com/infocenter/bigins/v2r1/index.jsp
IBM Integrated Management Module (IMM2) and Open Source xCAT
– IBM IMM2 User's Guide
ftp://ftp.software.ibm.com/systems/support/system_x_pdf/88y7599.pdf
– IMM and IMM2 Support on IBM System x and BladeCenter Servers, TIPS0849
http://www.redbooks.ibm.com/abstracts/tips0849.html
– SourceForge xCAT Wiki
http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Main_Page
– xCAT 2 Guide for the CSM System Administrator, REDP-4437
http://www.redbooks.ibm.com/abstracts/redp4437.html
– IBM Support for xCAT
http://www.ibm.com/systems/software/xcat/support.html
IBM Platform Computing
http://www-03.ibm.com/systems/technicalcomputing/platformcomputing/products/symphony/
– IBM Internet
http://www.ibm.com/systems/technicalcomputing/platformcomputing/index.html
– IBM Platform Computing Integration Solutions, SG24-8081
http://www.redbooks.ibm.com/abstracts/sg248081.html
– Implementing IBM InfoSphere BigInsights on System x, SG24-8077
http://www.redbooks.ibm.com/abstracts/sg248077.html
– Integration of IBM Platform Symphony and IBM InfoSphere BigInsights, REDP-5006
http://www.redbooks.ibm.com/abstracts/redp5006.html
– SWIM Benchmark
http://www.ibm.com/systems/technicalcomputing/platformcomputing/products/symphony/highperfhadoop.html
IBM RackSwitch G8052 (1GbE Switch)
– IBM Internet
http://www.ibm.com/systems/networking/switches/rack/g8052
– IBM System Networking RackSwitch G8052, TIPS0813
http://www.redbooks.ibm.com/abstracts/tips0813.html
IBM RackSwitch G8264 (10GbE Switch)
– IBM Internet
http://www.ibm.com/systems/networking/switches/rack/g8264
– IBM System Networking RackSwitch G8264, TIPS0815
http://www.redbooks.ibm.com/abstracts/tips0815.html
IBM RackSwitch G8316 (40GbE Switch)
– IBM Internet
http://www.ibm.com/systems/networking/switches/rack/g8316/
– IBM System Networking RackSwitch G8316, TIPS0842
http://www.redbooks.ibm.com/abstracts/tips0842.html
IBM System x3550 M4 (Management Node)
– IBM Internet
http://www.ibm.com/systems/x/hardware/rack/x3550m4
– IBM System x3550 M4, TIPS0851
http://www.redbooks.ibm.com/abstracts/tips0851.html
IBM System x3630 M4 (Data Node)
– IBM Internet
http://www.ibm.com/systems/x/hardware/rack/x3630m4
– IBM System x3630 M4, TIPS0889
http://www.redbooks.ibm.com/abstracts/tips0889.html
IBM System x Reference Architecture for Hadoop: InfoSphere BigInsights:
– IBM Internet http://www.ibm.com/systems/x/solutions/analytics/bigdata.html
– Implementing IBM InfoSphere BigInsights on System x, SG24-8077
http://www.redbooks.ibm.com/abstracts/sg248077.html
9.4 Best Industry Practices for Solution
As the solution implementation progresses, it is extremely important to document best practices for each solution domain. To the best possible extent, industry best practices should be adhered to in the testing, implementation, rollout, and monitoring phases.
Configuration Log
A configuration log is an up-to-date compilation of information and configuration details about your solution infrastructure. Enough information about the infrastructure should exist so that the infrastructure can be recreated from it. Whenever changes happen, the configuration log should be updated. The configuration log can be kept in hardcopy, softcopy, or both; a hardcopy should be maintained as a disaster backup. The log is required for a variety of reasons:
- Disaster recovery
- Troubleshooting
- Recreating infrastructure whose configuration is destroyed
- Planning for infrastructure additions
- Modifying or expanding the solution infrastructure
- Recovering accidentally deleted licenses
- Recovering or reconfiguring infrastructure configurations
A suggested structure for the configuration log resembles a Microsoft Windows Explorer view of an online configuration log:
1. Detailed diagrams of your infrastructure
   - Topology
   - Different connections
2. Firmware log of all devices
3. A logbook of additions, deletions, or modifications to the infrastructure
4. A directory structure for H/W and S/W devices
5. Infrastructure profile
6. A script directory for any scripts created
Data Protection - Backup
- Regularly test the ability to restore
Backing up and Restoring Infrastructure Configurations
When the solution is implemented, a backup of each hardware and software configuration needs to be kept on the host.
Planning Test Strategy
Every solution team should document their testing plans in their solution document. All testing activities and testing outcomes need to be documented.
- Standalone Testing: The individual solution infrastructure just implemented is tested for basic functionality and connectivity.
- Testing after Integration: This testing is carried out after the solution is integrated. The nature of the tests can be similar, but they are now performed in the integrated environment.
9.5 Appendix: Change Management Process
The Problem
The most important factor to keep in mind is that a modification to any component of the solution could impact the overall behavior of the solution. Therefore, no change to any solution component should be applied directly to the production setup. The following are examples of changes:
S. No | Changes to Solution                                                           | Remarks
1     | Applying patches to Linux, file system, volume manager                       | Performance, normal running
2     | Updating FW on H/W devices like SAN, NAS, storage, network, appliances, servers | First try this out in the Test setup
3     | Installing / upgrading new H/W                                               | Check with the compatibility matrix published by the product test labs
4     | Installing / upgrading new S/W                                               | Check with the compatibility matrix
Table 4 – Change Management
Some Tips for Making Changes
- All changes must be documented
- Try to limit the number of changes applied at one time
- Automate the distribution of files / configuration
- Watch for system file changes