IBM® FileNet® Content Manager 5.2 High Volume Scalability
White Paper
February 2014
IBM® SWG Enterprise Content Management

© Copyright IBM Corporation 2014
Enterprise Content Management
www.ibm.com

This document is provided “as is” without warranty of any kind, either expressed or implied, including, but not limited to, the implied warranty of merchantability or fitness for a particular purpose. This document is intended for informational purposes only. It could include technical inaccuracies or typographical errors. The information herein and any conclusions drawn from it are subject to change without notice. Many factors have contributed to the results described herein and IBM does not guarantee comparable results. Performance numbers will vary greatly depending upon system configuration. All data in this document pertains only to the specific test configuration and specific releases of the software described.

CONTENTS

Introduction
Executive Summary
1. Test environment description
   Architectural Overview
   Test Bed Population
2. Population to 5 billion documents
   Methodology
   ECM Workload
3. High volume transactional workloads
   IBM FileNet Content Manager workload scalability
   IBM Content Navigator workload stability
4. Performance considerations and recommendations
   Database storage
   File Systems / File Stores
   JVMs
   Indexing
   Covering index for sweeps
   Consult the IBM FileNet CM Information Center performance tuning topics
Conclusion
References
Appendix
   Detailed Software Information
   Detailed Hardware Information
   Detailed Tunings

Introduction

IBM FileNet Content Manager (FileNet CM) is capable of managing enterprise volumes of business content. Many customers currently rely on FileNet CM to handle daily ECM workloads with repositories well into the hundreds of millions of documents, with several exceeding one billion documents in production. This paper demonstrates that FileNet CM, running on IBM POWER/AIX, DB2, GPFS and WebSphere Application Server, is capable of scaling to five billion documents and beyond while maintaining the industry-leading performance that customers have come to expect. It also provides configuration best practices and performance guidance as the basis for confidence in achieving similar results in production systems.

Executive Summary

The IBM FileNet CM product has matured over the past 15 years of continuous improvement, becoming an industry leader in Enterprise Content Management. FileNet CM is in production at thousands of the world’s largest companies, including many that store hundreds of millions of documents per object store, concurrently accessed by thousands of users. The stability and scalability of the FileNet CM platform have been well established, although concerns around the database and file storage layers when populations reach into billions of documents prompted further investigation. This paper reaffirms the ability of both the database and file storage to scale a FileNet CM object store into the billions of documents.

IBM FileNet CM achieved unprecedented levels of data and workload scalability during this study.
The FileNet CM repository successfully scaled to 5 billion document objects, consuming upwards of 120 terabytes of IBM XIV storage. Various operations were performed at pre-defined population checkpoints, demonstrating the scalability of the FileNet CM object store. Checkpoint operations included typical database maintenance tasks, the execution of custom property queries, FileNet CM sweeps, as well as IBM Content Navigator and API-driven workloads.

The sweep framework is a new feature in IBM FileNet CM 5.2, providing the ability to traverse a database table and perform a pre-defined action on content elements that meet specific criteria. Two common types of sweeps were performed in order to demonstrate linear scalability. The first was a sweep on the root document class, effectively sweeping over the entire multi-billion row Document table. The second was a more targeted sweep, specific to a single document class.

Overall system performance and scalability were demonstrated through the execution of a full mix of ECM operations driven by a custom Java API-based test driver. The ECM Workload was executed regularly throughout the population, scaling to over 120,000 transactions per minute.

System reliability was demonstrated by running a sustained workload via IBM Content Navigator (ICN). IBM Rational Performance Tester was used to drive a complex workload for 72 hours and successfully demonstrated system stability when using a large content repository.

The performance results reported in this paper present data and workloads run on an isolated network on specific operating environments and system configurations. Actual performance in real customer environments with production workloads may vary depending on many factors such as system configuration, workload characteristics, and data volume. The results presented here are not guaranteed to be repeatable in other systems.

1. Test environment description

Architectural Overview

The test environment consisted of a single IBM Power 7 780 (Model 9179-MHD) server with 5 native LPARs for the various software components. The Tivoli LDAP server was hosted on a local Microsoft Windows® virtual machine. All file storage and database data files were located on an IBM XIV “Generation 3” high performance storage device with SSD cache. All systems communicated through a private 10 Gbps VLAN. The IBM FileNet CM application was deployed in a three-node IBM WebSphere Network Deployment cluster, and each LPAR (node) contained a single FileNet CM Java Virtual Machine (JVM). See the Appendix at the end of this paper for more detailed environment information.

Figure 1 – Overall system architecture diagram

Test Bed Population

The IBM FileNet object store was populated with 120 custom document classes, each with between 10 and 60 custom properties. Custom metadata was created to simulate typical customer needs. Figure 2 describes the custom properties and their values, which were used for query performance validation as the data population increased. Section 2 (Custom Property Validation) shows the query performance results.

Custom Property Type                     Custom Property Values
Date Time                                Timestamp at time of document creation
String (default 128 character length)    Unique FileNet document ID (GUID)
Integer – low cardinality                Normal distribution (“Bell curve”) between 1-10
Integer – high cardinality               Random number between 1-1,000,000
ID                                       Unique FileNet document ID (GUID)
Float – Multi-value                      2 random float values per document
String – Multi-value                     2 random string values per document (out of 6 total)
String (256 character length)            ~300 byte long String (stored in the LOB space)

Figure 2 – Custom property descriptions.
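The values described in Figure 2 could be produced by a generator along the following lines. This is only an illustrative sketch in Python, not the population tool used in the study. The property names are borrowed from the queries listed in Section 2; the mean and spread of the bell curve, the six string values, the exact 300-character length, and the choice to reuse one GUID for both the string and ID properties are all assumptions made for the example.

```python
import random
import uuid
from datetime import datetime, timezone

# Hypothetical pool of six values; the paper only states "out of 6 total".
STRING_VALUES = ["red", "orange", "yellow", "green", "blue", "violet"]


def low_cardinality_value(rng: random.Random) -> int:
    """Bell-curve integer clamped to 1..10 (mean 5.5, sigma 1.5 are assumptions)."""
    value = round(rng.gauss(5.5, 1.5))
    return min(10, max(1, value))


def make_custom_properties(rng: random.Random) -> dict:
    """Build one document's custom property values in the spirit of Figure 2."""
    guid = "{" + str(uuid.uuid4()).upper() + "}"
    return {
        "datetime_prop": datetime.now(timezone.utc),    # timestamp at creation
        "string_prop": guid,                            # GUID stored as a string
        "low_card_prop": low_cardinality_value(rng),    # low-cardinality integer
        "high_card_prop": rng.randint(1, 1_000_000),    # high-cardinality integer
        "id_prop": guid,                                # GUID stored as an ID
        "float_prop_mv": [rng.random(), rng.random()],  # 2 random floats
        "string_prop_mv": rng.sample(STRING_VALUES, 2), # 2 distinct values of 6
        "long_string_prop": "x" * 300,                  # roughly 300 bytes
    }


if __name__ == "__main__":
    for name, value in make_custom_properties(random.Random(42)).items():
        print(name, "=", value)
```

Clamping the rounded Gaussian sample keeps every value inside 1-10 while leaving most of the mass near the middle, which is what makes equality predicates on this property return very large result sets compared with the high-cardinality integer.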
Content

The ECM Workload (see section 2, ECM Workload) population consisted of 10 million documents ranging from 10 KB to 5 MB, distributed across multiple document classes and consisting of multiple versions. The remainder of the large population consisted of approximately 1 billion 30 KB documents, 2 billion 8 KB documents and 2 billion zero-byte content elements.

Custom objects

The ECM Workload utilizes FileNet CM custom objects. A total of 10 million custom objects were populated, with roughly 1 million filed into folders.

Folders

The ECM Workload relies on an exact number of documents being filed into specific folders during browse and query operations. A total of 1.3 million folders were created, with a total of 7 million documents filed into these folders and a varying number of containees per folder ranging from 50 to 500.

LDAP

The IBM Tivoli Directory Server was pre-populated with 152,000 unique users and 5,000 groups. Each group contained a minimum of 20 members. The documents populated for the ECM Workload contained 6 additional non-default ACLs, consisting of 4 grant and 2 deny privileges.

2. Population to 5 billion documents

Methodology

The primary objective of these tests was to confirm that the IBM FileNet Content Platform Engine (CPE) would scale as the data population increased. At each population checkpoint, the following operations were performed to validate system performance:

• Custom property index creation and gathering updated database table statistics
• Execution of queries against various custom properties (cold and warm database)
• ECM Workload; full mix of operations performance test
• Execution of operations that sweep across the entire document repository

Population checkpoints occurred at various stages of the 5 billion document population. During the initial phase to reach 1 billion documents, checkpoints were performed every 250 million documents.
The system remained very stable up to the 1 billion milestone, so the checkpoint interval was widened to 500 million documents until 3 billion documents were populated. The final checkpoint was performed at the 5 billion document milestone.

Ingestion

Document ingestion was performed using a custom IBM FileNet CPE Java API client application over the Java EJB (IIOP) protocol. The population was performed by 4 population clients (JVMs), each executing 4 Java threads, for a total of 16 concurrent population threads. Each population client created documents for a specific document class.

Database Maintenance Operations

Index creation was performed without gathering statistics. After all custom property indexes were created, basic table statistics were gathered, followed by detailed statistics with default sampling. Figure 3 demonstrates the linear scalability of updating database statistics on a multi-billion row database table.

Figure 3 – Database table statistics (RUNSTATS) scalability results.

Custom Property Validation

The following custom property queries were manually executed by a stand-alone client application using the FileNet CPE Java API.
• SELECT TOP 100 low_card_prop FROM DOCUMENT WHERE low_card_prop = [INT];
• SELECT high_card_prop FROM DOCUMENT WHERE high_card_prop = [INT];
• SELECT datetime_prop FROM DOCUMENT WHERE datetime_prop >= [TIMESTAMP] AND datetime_prop <= [TIMESTAMP] ORDER BY datetime_prop;
• SELECT string_prop FROM DOCUMENT WHERE string_prop = '{GUID}';
• SELECT id_prop FROM DOCUMENT WHERE id_prop = {GUID};
• SELECT id_prop, string_prop FROM DOCUMENT WHERE id_prop = {GUID} OR string_prop = '{GUID}';
• SELECT Id FROM DOCUMENT WHERE [FLOAT] IN float_prop_mv;
• SELECT TOP 100 Id FROM DOCUMENT WHERE ('[COLOR]' IN string_prop_mv AND high_card_prop = [INT]);
• SELECT long_string_prop FROM DOCUMENT WHERE high_card_prop = [INT];

Custom property validation was performed twice for each query: once when the database was “cold” and then immediately afterwards when the database was “warm.” Between property validations, the FileNet CM applications were restarted and the database was deactivated to ensure that all memory buffers were cleared.

Figure 4 – As the database tables became larger, cold execution times gradually increased.

Figure 5 – As the database tables became larger, warm execution times remained stable as content was already loaded into the database bufferpool.

ECM Workload

The ECM Workload is a Java API-driven workload consisting of various create, delete, update, browse, query and retrieval operations with documents ranging in size from 10 KB to 5 MB. Figure 6 shows the distribution of operations in the workload. All content was stored in the file store. Each test execution was performed by a single test driver with 2,000 virtual users (client threads), with an average “think time” of 3 seconds, resulting in an effective workload of 40,000 transactions per minute (TPM).
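The 40,000 TPM figure follows directly from the user count and think time. The short Python sketch below shows the arithmetic under a simple closed-loop assumption (each virtual user issues one transaction, waits, then repeats); the function name and the optional response-time parameter are illustrative and not part of the test driver described above.

```python
def transactions_per_minute(virtual_users: int,
                            think_time_s: float,
                            response_time_s: float = 0.0) -> float:
    """Closed-loop estimate: each user completes one transaction per cycle.

    A cycle is the think time plus the time spent waiting for the response.
    With response_time_s left at 0 this gives the nominal (upper-bound) rate.
    """
    cycle_s = think_time_s + response_time_s
    return virtual_users * 60.0 / cycle_s


# 2,000 users with a 3-second think time, as in the ECM Workload.
print(transactions_per_minute(2000, 3.0))  # prints 40000.0
```

Because response time adds to every cycle, the achieved rate falls below the nominal rate as operations slow down, which is one reason stable response times matter when comparing throughput across population checkpoints.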
Operation Type    Percentage of total workload
Create/Delete     19%
Update            14%
Browse            14%
Query             25%
Retrieve          28%
Total             100%

Figure 6 – ECM Workload work mix operation breakdown

Figure 7 demonstrates the stability of each type of operation for the ECM Workload executed at various population intervals.

Figure 7 – ECM Workload response time stability as population increased.

Sweep Execution

A full Document table (DocVersion) sweep was performed along with a sweep based on a specific document class. The document class sweep covered roughly ¼ of the total population for a given interval, hence the shorter duration. Figure 8 shows how sweep execution scaled linearly as the population increased. Both sweeps were executed in preview mode to demonstrate raw performance capability without modifying the test bed.

Figure 8 – “Preview mode” document sweep scalability results

3. High volume transactional workloads

IBM FileNet Content Manager workload scalability

In addition to validating that the IBM FileNet Content Manager repository is able to maintain stable performance characteristics as a population grows to several billion documents, further tests were performed to evaluate the scalability of the IBM FileNet CM Java API at multiple checkpoints from 1 billion to 3 billion documents. Good response times were observed up to 120,000 transactions per minute, after which the test client became the bottleneck, preventing accurate reporting of operation response times.

Figure 9 – Scalability characteristics of the ECM Workload

IBM Content Navigator workload stability

The IBM Content Navigator (ICN) component was configured against the 5 billion document IBM FileNet Content Manager object store.
A full-mix workload of 6,000 transactions per minute (TPM) was run for 72 hours, successfully demonstrating the stability of ICN against a large FileNet CM object store. The ICN-driven workload included workflow and ICN Teamspace operations in addition to standard content-based operations.

Operation Type            Percentage of total workload
Create/Delete Document    15%
Update                    10%
Browse                    10%
View Property             15%
Search                    10%
Retrieve                  10%
Workflow                  20%
Social Features           5%
Teamspace                 5%
Total                     100%

Figure 10 – ICN work mix operation breakdown

4. Performance considerations and recommendations

The performance topics below were instrumental in achieving the goal of this study and should be strongly considered when attempting to scale a FileNet CM object store into the billions of documents. Many concepts discussed in this section can be leveraged by smaller scale deployments. IBM recommends that strict performance benchmarks be performed in a highly controlled test environment before implementing any of the suggested tunings below in a production environment.

Database storage

IBM suggests enabling DB2 table and index compression on any table that will grow to a significant number of rows. In addition to general space savings, compression allows for more efficient use of buffer pools and improved performance of table access, with minimal CPU overhead.

IBM recommends that DB2 databases be created with automatic storage and with dedicated table spaces for data, indexes and large objects (LOBs) for each object store. Ensuring that all table spaces are created on storage groups with multiple storage paths is crucial for optimal performance. Multiple storage paths allow parallel pre-fetching of table space data and more efficient access to the underlying storage devices, because multiple I/O paths (if available) can be accessed concurrently.
File Systems / File Stores

For large populations, IBM recommends that file store(s) use the large storage area structure. Multiple storage areas should be used to prevent too many documents from being stored in a single subdirectory of the underlying file system, which may adversely affect file system performance. Consult the IBM FileNet CM Information Center for more information and recommendations.

IBM’s General Parallel File System (GPFS) was used to provide the shared file system for each CPE instance. GPFS guidance recommends that each file system not exceed 1 billion documents. It is also important to ensure that the GPFS clients have sufficient cache available. For metadata-heavy workloads, consider creating dedicated NSDs for GPFS file system metadata. Please see the Appendix for more details on the specific tunings applied to this environment and the GPFS Wiki for more information about GPFS.

JVMs

Ample JVM heap space is critical for optimal application performance. Closely monitoring heap usage during the system validation phase of a new deployment, as well as periodically during the lifecycle of your application, will help ensure that excessive garbage collection is not impacting application response times. An inadequately sized heap can also create unnecessary load on the server(s) due to frequent garbage collections, potentially increasing IT infrastructure costs. IBM suggests enabling verbose garbage collection to monitor JVM garbage collection activity.

Indexing

Ensuring that sufficient SORTHEAP is available for in-memory sort operations is critical when creating large indexes. DB2 will perform a maximum of 6 concurrent sorts by default. Therefore, ensure that the SHEAPTHRESH_SHR value is at least 6 times the SORTHEAP value. If sorts cannot be performed in memory, they overflow into the temporary table space(s). Disk access is far slower than RAM, so these overflows can have a considerable impact on the performance of index creation.
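The sizing rule above can be checked with a few lines of arithmetic. The Python sketch below is illustrative only: the helper names are hypothetical, the example SORTHEAP value is an assumption rather than a setting used in this study, and the factor of 6 is simply the concurrent sort figure quoted in the text. Both DB2 parameters are expressed in 4 KB pages.

```python
PAGE_SIZE_BYTES = 4096           # SORTHEAP and SHEAPTHRESH_SHR are set in 4 KB pages
DEFAULT_CONCURRENT_SORTS = 6     # concurrent sort figure quoted in the text


def min_sheapthresh_shr(sortheap_pages: int,
                        concurrent_sorts: int = DEFAULT_CONCURRENT_SORTS) -> int:
    """Smallest SHEAPTHRESH_SHR (in pages) that lets every sort get a full SORTHEAP."""
    return sortheap_pages * concurrent_sorts


def sort_memory_ok(sortheap_pages: int, sheapthresh_shr_pages: int,
                   concurrent_sorts: int = DEFAULT_CONCURRENT_SORTS) -> bool:
    """True when the shared sort threshold satisfies the sizing rule."""
    return sheapthresh_shr_pages >= min_sheapthresh_shr(sortheap_pages, concurrent_sorts)


def pages_to_gib(pages: int) -> float:
    """Convert a page count to GiB for easier reading."""
    return pages * PAGE_SIZE_BYTES / 2**30


# Hypothetical example: a 1 GiB SORTHEAP (262,144 pages).
sortheap = 262_144
needed = min_sheapthresh_shr(sortheap)
print(needed, pages_to_gib(needed))       # prints 1572864 6.0
print(sort_memory_ok(sortheap, 1_000_000))  # prints False
```

A check like this is most useful before a large index build, since raising SORTHEAP on its own without raising the shared threshold leaves concurrent sorts competing for the same memory.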
In addition to sort overflows, temporary tables will be heavily utilized as sorting operations are completed and results are written to disk prior to the final sort merge. Therefore, having the temporary table space(s) spread across multiple high performance storage paths is strongly recommended.

Covering index for sweeps

A proper covering index is crucial for minimizing the impact of sweep execution on database resources. Having a covering index in place results in index-only access and minimizes physical I/O reads during sweep execution. A sufficiently sized bufferpool helps minimize physical I/O as well. Please consult the IBM FileNet CM Information Center for more information on creating sweep indexes.

Consult the IBM FileNet CM Information Center performance tuning topics

IBM suggests referring to the P8 performance tuning topics in the IBM FileNet CM Information Center early in the planning and testing phase of a new deployment, rather than after the environment is in production.

Conclusion

Enterprise organizations are experiencing content growth at ever-increasing rates. For more than a decade, IBM FileNet Content Manager (FileNet CM) has been continually refined and improved to handle large amounts of business-critical data, securely and efficiently. IBM FileNet CM has been proven to meet the needs of Enterprise Content Management customers worldwide with repositories populated well into the hundreds of millions, with several customers exceeding a billion documents. The traditional guidance of spreading large populations across multiple object stores adds unnecessary complexity and administrative overhead to a company’s IT organization.
Many valid reasons do exist for utilizing multiple object stores in a FileNet CM system, such as segmenting the operations of varying business units or meeting IT backup and restore policies, to name a few. This paper demonstrates that even for document populations well into the billions, it is not necessary to utilize multiple object stores in order to achieve and maintain industry-leading performance. Provided that the best practices and guidance outlined above are adhered to, it is possible to scale an IBM FileNet CM object store to five billion documents and beyond.

References

1. IBM FileNet Content Manager Software: http://www.ibm.com/software/products/en/filecontmana/
2. IBM FileNet P8 5.2 Information Center: http://publib.boulder.ibm.com/infocenter/p8docs/v5r2m0/index.jsp
   a. Performance Tuning Topics: http://publib.boulder.ibm.com/infocenter/p8docs/v5r2m0/topic/com.ibm.p8.performance.doc/p8ppt000.htm
   b. Monitoring IBM FileNet P8: http://publib.boulder.ibm.com/infocenter/p8docs/v5r2m0/topic/com.ibm.p8.sysmgr.admin.doc/overview_monitoring_p8.htm
   c. Troubleshooting the Content Platform Engine: http://pic.dhe.ibm.com/infocenter/p8docs/v5r2m0/topic/com.ibm.p8.ce.trouble.doc/p8pct000.htm
3. IBM DB2 V10.1 Information Center: http://pic.dhe.ibm.com/infocenter/db2luw/v10r1/index.jsp
4. IBM GPFS Wiki: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29

Appendix

Detailed Software Information

IBM FileNet P8 Content Manager 5.2.0.2 GA
IBM Content Navigator 2.0.2.1 GA
IBM WebSphere Network Deployment 8.5.0.2
• IBM JDK 1.7.0.4 (PPC64)
IBM DB2 v10.1.0.2 LUW Advanced Enterprise Server Edition
• Storage Optimization Feature Enabled
IBM GPFS v3.5.0.9
IBM AIX PPC64 7.1.0.2
IBM Tivoli Directory Server v6.3

Detailed Hardware Information

IBM AIX POWER 7 780
• M/T 9179-MHD
• 128 CPU cores (SMT disabled)
• 2TB RAM
• 8 Gbps fiber cards (12 I/O paths configured)
• 10 Gbps Ethernet cards
• All XIV hdisks’ queue_depth increased to 256.
• All XIV hdisks’ max_transfer increased to 100,000.

IBM XIV “Generation 3”
• 15 modules “full stack”
• 80x2TB SAS hard disks
• 6 TB SSD cache

Detailed Tunings

IBM FileNet Content Manager
• CPE
  o 8 GB min/max heap size
  o 4 GB nursery
  o Gencon garbage collection policy (WAS 8.x default)
  o Object Store JDBC data source pool max threads: 500
  o Global Configuration Database (GCD) JDBC data source pool max threads: 10
  o ORB thread pool max threads: 100
  o IBM DB2 JDBC JCC driver v4.15.82

IBM Content Navigator
• 8 GB min/max heap size

IBM GPFS
• pagepool: 8GB
• maxFilesToCache: 5000
• block size: 256KB
• 15 file systems with dedicated NSD for metadata each

IBM DB2
• INSTANCE_MEMORY: 256GB (fixed)
• Bufferpools
  o Data tablespace: 64GB fixed
  o Index tablespace: 32GB fixed
• Automatic maintenance tasks disabled
• Instance parameters
  o DB2_WORKLOAD=FILENET_CM
  o DB2_MINIMIZE_PREFETCH=YES (included by default in the above workload in future fix packs of DB2)

Author Information

Michael Bordash, ECM Server System Test Engineer

Contributors

Matthew Vest, ECM Server System Test & Performance Engineering Senior Manager
Dave Royer, ECM Performance Architect, Senior Software Engineer

Special thanks to the following members of the IBM FileNet CM development team:
Mike Winter, ECM Distinguished Engineer
Joseph Raby, ECM Development, Manager
David Skinner, ECM Development
Haibing Qiao, ECM Development

Disclaimer

The information in this publication is not intended as a substitution of the IBM FileNet product documentation provided by IBM. Please see http://www.ibm.com/software/data/content-management for more information about what publications are considered to be product documentation.

References in this publication to IBM products, programs or services do not imply that IBM intends to make these available in all countries in which IBM operates. Any reference to an IBM product, program, or service is not intended to state or imply that only IBM's product, program, or service may be used. Any functionally equivalent program that does not infringe any of IBM's intellectual property rights may be used instead of the IBM product, program or service.

Information in this publication was developed in conjunction with use of the equipment specified, and is limited in application to those specific hardware and software products and levels. The information contained in this publication was derived under specific operating and environmental conditions. While IBM has reviewed the information for accuracy under the given conditions, the results obtained in your operating environments may vary significantly. Accordingly, IBM does not provide any representations, assurances, guarantees, or warranties regarding performance.

Any information about non-IBM ("vendor") products in this document has been supplied by the vendor and IBM assumes no responsibility for its accuracy or completeness.

IBM, IBM FileNet Content Manager, DB2, WebSphere, AIX, Rational, and Tivoli are trademarks or registered trademarks of IBM Corporation in the United States, other countries, or both.
Java, and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. UNIX is a trademark of The Open Group. Windows is a registered trademark of Microsoft Corporation in the United States, other countries, or both. Other company, product, and service names may be trademarks or service marks of others.

© Copyright IBM Corporation 2014
Produced in the United States of America
All Rights Reserved

The e-business logo, the eServer logo, IBM, the IBM logo, IBM Directory Server, DB2, FileNet, FileNet Content Manager and WebSphere are trademarks of International Business Machines Corporation in the United States, other countries or both.

The following are trademarks of other companies: Solaris, Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United States, other countries or both. Windows and Windows 2008 Enterprise Edition are trademarks of Microsoft Corporation in the United States and/or other countries. Oracle 9i and all Oracle-based trademarks and logos are trademarks of the Oracle Corporation in the United States, other countries or both. Other company, product and service names may be trademarks or service marks of others.

INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PAPER “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

Information in this paper as to the availability of products was believed accurate as of the time of publication. IBM cannot guarantee that identified products will continue to be made available by their suppliers.

This information could include technical inaccuracies or typographical errors.
Changes may be made periodically to the information herein; these changes may be incorporated in subsequent versions of the paper. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this paper at any time without notice.

Any references in this document to non-IBM web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to:

IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY, USA 10504-1785

© Copyright IBM Corporation 2014

IBM
3565 Harbor Boulevard
Costa Mesa, CA 92626-1420
USA

Printed in the USA 01-07
All Rights Reserved.

IBM and the IBM logo are trademarks of IBM Corporation in the United States, other countries, or both. All other company or product names are registered trademarks or trademarks of their respective companies.

The IBM home page on the Internet can be found at ibm.com