Informatica Data Director Performance

© 2011 Informatica
Abstract
A variety of performance and stress tests are run on Informatica Data Director (IDD) to ensure performance and scalability
across a wide variety of business scenarios. This article describes these tests and provides guidance on performance and
capacity planning for IDD deployments. It also describes the IDD architecture, the three-tier architecture of a typical IDD
deployment, and a sample model and methodology for sizing IDD environments.
Supported Versions
- Informatica Data Director 9.0.1
Table of Contents
Introduction
Informatica Data Director Architecture
Testing Methodology
Test Environments
Performance Test Results
Scalability Test Results
IDD Capacity Planning
Conclusion
Introduction
Informatica conducts extensive performance and stress tests on Informatica Data Director (IDD) for various MDM Hub
scenarios. The results of these tests help in performance and capacity planning for the deployments.
The methodology and the test environments are chosen to ensure that all important IDD performance characteristics are
measured. You can use the guidance in this article, along with other tools and documents, for architecture, performance, and
capacity planning for your deployments.
This article includes an overview of the IDD architecture and the three-tier architecture of a typical IDD deployment. It also
includes a sample model and a methodology for sizing IDD environments based on the measured performance characteristics
and industry-standard usage patterns for sizing enterprise applications.
The tests described in this article are based on typical IDD usage patterns and should be used to understand the deployment
architecture and performance characteristics of IDD and to select appropriate hardware for the expected user loads. Actual
performance may vary from these estimates because of network load and bandwidth availability, differences between the
assumptions in the sample model and actual usage patterns, and other factors.
Informatica Data Director Architecture
IDD is a web application that uses the MDM Hub infrastructure for all of its metadata and configuration storage needs and
leverages the MDM Hub APIs for all data operations.
It is best to view the full IDD stack as a three-tier architecture. These tiers consist of the Informatica MDM Hub Store, the
Informatica MDM Hub Server(s) (including Cleanse-Match Servers), and the Informatica Data Director runtime.
The Hub Store is where business data is stored and consolidated. The Hub Store contains common information about all of
the databases that are part of an Informatica MDM Hub implementation. It resides in a supported database server
environment -- either Oracle (10g or 11g) or DB2 UDB (v8.1).
The Hub Server is the run-time component that manages core and common services for the Informatica MDM Hub. The Hub
Server is a J2EE application, deployed on a supported application server, that orchestrates the data processing within the
Hub Store. Supported application servers are JBoss 4.03, WebLogic 9.2 and 10.0, and WebSphere 6.1.0.x.
Data Director is a J2EE-based web application that is deployed in the same J2EE application archive as the Hub Server to
simplify configuration and provide the optimal configuration for Hub API performance. Data Director supports clients running
Firefox 3.5.x, Internet Explorer 7.0, and Safari 4.0.
Data Director represents the latest generation of highly interactive browser-based applications and uses AJAX extensively
to enhance the user experience. Data Director also uses Adobe Flash for several of its modules that require graphical
representation of data.
Testing Methodology
Informatica MDM Hub can be used in a variety of scenarios and use cases, and provides both batch and real-time APIs.
Informatica conducts extensive tests as part of its release process to ensure adequate performance for all of the MDM Hub
scenarios. The results of these tests and the performance benchmarks are available through Informatica support.
The Data Director testing methodology builds on the MDM Hub test results and is intended to test the performance
characteristics specific to IDD, such as web sessions, AJAX operations, and client-side (browser) script execution.
The tests are run on a small development data set to simplify the setup and reduce environment costs. This is possible
because IDD uses the MDM Hub APIs for all data access and manipulation. IDD performance does not directly depend on the
size of the data set, and the performance characteristics on large data sets can be derived from the MDM Hub API tests that
are run independently of the IDD tests.
We define three types of automated tests that are run regularly in the IDD release cycle:
Performance Tests
The goal of the performance tests is to measure the end-to-end (glass-back) response times as experienced by users. The
performance tests are performed with the WatiN scripting library driving Internet Explorer and cover around 70 different
user interaction scenarios in IDD. The response time is measured in the browser using JavaScript and browser event
instrumentation. The measurements include the browser XHTML/XML rendering time and JavaScript execution for all user
and AJAX interactions. Performance tests are typically run in a single test client configuration because of the high load that
HTML rendering and script execution put on the client. The main metric tracked for these tests is response latency. The
performance tests are used to improve functional performance and the user interaction experience.
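To illustrate the general approach (not the actual WatiN test scripts, which are .NET-based), the following Java sketch shows how a browser-driven test could capture both wall-clock latency and browser-side timing for a single interaction. The URL, the element locator, and the use of Selenium WebDriver with the Navigation Timing API in a modern browser are assumptions of this sketch, not part of the original test harness.

import org.openqa.selenium.By;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class ResponseTimeProbe {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        try {
            // Load the application (placeholder URL, not a real IDD endpoint).
            driver.get("http://localhost:8080/bdd/");

            // Browser-side page load time taken from the Navigation Timing API.
            long pageLoadMs = ((Number) ((JavascriptExecutor) driver).executeScript(
                    "return window.performance.timing.loadEventEnd"
                    + " - window.performance.timing.navigationStart;")).longValue();

            // Wall-clock latency of a single UI interaction (placeholder element id).
            long start = System.currentTimeMillis();
            driver.findElement(By.id("searchTab")).click();
            long interactionMs = System.currentTimeMillis() - start;

            System.out.println("Page load (browser timing): " + pageLoadMs + " ms");
            System.out.println("Interaction latency: " + interactionMs + " ms");
        } finally {
            driver.quit();
        }
    }
}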
Stress Tests
The goal of the stress tests is to measure the performance characteristics of the IDD server runtime under load. These tests
simulate the basic IDD user scenarios to create a representative user load profile with an 80:20 ratio of read to write
operations. The scalability tests are done using the WatiN library and JMeter. The tests are executed outside of the browser
to simulate the load efficiently, and both latency and throughput are measured. The tests gradually increase the number of
simulated users over a ramp-up period (typically 1 hour) to measure latency and response times under different loads. The
WatiN tests are run with no-wait users to achieve the maximum load with the minimum number of clients. The JMeter tests
are run with a large number of threads that simulate user think time.
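The following Java sketch is illustrative only (the IDD operation is stubbed out, and the user counts and durations are shortened assumptions): it shows the shape of such a load test, where simulated users are added gradually over a ramp-up period, each user mixes read and write operations in an 80:20 ratio and pauses for think time, and latency and throughput are aggregated at the end.

import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class LoadSimulator {
    static final int MAX_USERS = 100;            // simulated users added during ramp-up
    static final long RAMP_UP_MS = 60_000;       // 1-minute ramp-up for the sketch (the real tests use ~1 hour)
    static final long TEST_DURATION_MS = 120_000;
    static final long THINK_TIME_MS = 5_000;     // average observed IDD think time
    static final AtomicLong txCount = new AtomicLong();
    static final AtomicLong latencySumMs = new AtomicLong();

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newCachedThreadPool();
        long testEnd = System.currentTimeMillis() + TEST_DURATION_MS;
        for (int u = 0; u < MAX_USERS; u++) {
            pool.submit(() -> runUser(testEnd));
            Thread.sleep(RAMP_UP_MS / MAX_USERS); // gradually add simulated users
        }
        pool.shutdown();
        pool.awaitTermination(TEST_DURATION_MS, TimeUnit.MILLISECONDS);
        double seconds = TEST_DURATION_MS / 1000.0;
        System.out.printf("Approximate throughput: %.1f TPS, average latency: %d ms%n",
                txCount.get() / seconds, latencySumMs.get() / Math.max(1, txCount.get()));
    }

    static void runUser(long testEnd) {
        Random rnd = new Random();
        while (System.currentTimeMillis() < testEnd) {
            boolean read = rnd.nextDouble() < 0.8;          // 80:20 read-to-write mix
            long start = System.currentTimeMillis();
            performIddOperation(read);                      // placeholder for a real IDD request
            latencySumMs.addAndGet(System.currentTimeMillis() - start);
            txCount.incrementAndGet();
            sleepQuietly(THINK_TIME_MS);                    // JMeter-style think time; WatiN runs use no-wait users
        }
    }

    static void performIddOperation(boolean read) {
        // Placeholder: a real test issues an IDD search (read) or save (write) request here.
        sleepQuietly(read ? 100 : 300);
    }

    static void sleepQuietly(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}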
Survival Tests
The goal of the survival tests is to ensure that IDD can sustain significant user loads for extended periods of time. The tests
are run at 80% of full capacity with a large number of test threads that use fixed throughput timers to simulate many
occasional users with significant think time, which creates a large number of server sessions.
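As a sketch of fixed throughput pacing (illustrative only; the stubbed request and the exact target rate are assumptions), a scheduler can issue requests at a constant rate corresponding to roughly 80% of the measured capacity, independently of how long each individual request takes:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class SurvivalPacer {
    public static void main(String[] args) {
        double capacityTps = 26.0;            // throughput measured in the stress test environment (see results below)
        double targetTps = 0.8 * capacityTps; // survival tests run at ~80% of full capacity
        long periodMs = Math.round(1000.0 / targetTps);
        AtomicLong sent = new AtomicLong();
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(4);
        // Issue one (stubbed) request per period regardless of individual response
        // times, which is what a fixed throughput timer does.
        scheduler.scheduleAtFixedRate(
                () -> System.out.println("request #" + sent.incrementAndGet() + " (placeholder IDD call)"),
                0, periodMs, TimeUnit.MILLISECONDS);
        scheduler.schedule(scheduler::shutdown, 60, TimeUnit.SECONDS); // run for one minute in this sketch
    }
}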
Test Environments
The following test environment is used for the performance tests:
Server      | CPU                                  | OS                                  | Hub Config
DB Server   | Intel Core 2 Duo, 1.86 GHz, 2 GB RAM | Windows Server 2003, Service Pack 2 | Oracle 10.2
App Server  | Intel Core 2 Duo, 1.86 GHz, 2 GB RAM | Windows Server 2003, Service Pack 2 | JBoss AS 4.2.3
Test Client | Intel Core 2 Duo, 1.86 GHz, 2 GB RAM | Windows XP                          | IE 7
The following test environment is used for the stress and survival tests:
Server      | Hardware                              | OS                                  | Hub Config
DB Server   | Intel Core 2 Quad, 2.4 GHz, 8 GB RAM  | Linux Red Hat 4.1.1-52              | Oracle 10.2
App Server  | Intel Core 2 Quad, 2.4 GHz, 8 GB RAM  | Linux Red Hat 4.1.1-52              | JBoss AS 4.2.3
Test Client | Intel Core 2 Quad, 2.4 GHz, 8 GB RAM  | Windows Server 2003, Service Pack 2 | IE 7
Performance Test Results
All performance tests presented in this document have been demonstrated on the XU SP2 Patch B version of the MDM Hub.
The following table shows the performance test results for representative test scenarios.
Test Scenarios | Scenario Details | Latency (sec)
Creation of a new 'Person' tab in data view | Creates new 'Person' tab in 'Data view' | 2.1
Creation of a new 'Person' tab in data view | Removes tab | 1.4
Saving and deleting 'Person' record in 'Data View' | Saves 'Person' record with all fields filled and having 10 'Names', 10 'Phones' and 10 'Addresses' | 7.4
Saving and deleting 'Person' record in 'Data View' | Deletes person record | 1.4
Switching between data view tabs operations | Switching to search tab | 2.0
Switching between data view tabs operations | Switching to existing record tab | 2.3
Switching between data view tabs operations | Switching to copied record tab | 2.6
Search and subject area record opening from search tab | Search for person record with specified Display Name | 1.3
Search and subject area record opening from search tab | Opening record from search tab | 3.2
Search and subject area record opening from search tab | Closing tab in Data View | 2.2
Switching between child tabs in 'Person' record | Switching to 'Phone' children tab | 2.8
Switching between child tabs in 'Person' record | Switching to 'Names' children tab | 1.6
Switching between child tabs in 'Person' record | Switching to 'BillAddresses' children tab | 2.4
Basic search operation, navigation through search results | Basic search | 1.1
Basic search operation, navigation through search results | Basic search, search result navigation | 1.6
UI operations on search tab | Switch to 'Saved queries' tab | 1.2
UI operations on search tab | Switch to 'Active query' tab | 0.7
UI operations on search tab | Query builder open | 1.5
UI operations on search tab | Save Query As dialog opening time | 0.9
Xrefs dialog opening/closing | Open Xrefs dialog | 5.1
Basic HM operations | Open HM Tab for large graph | 4.5
The performance test results are specific to the different types of user interactions in IDD. The tests show that all of the
major user interactions complete within 1.5 to 2.5 seconds, depending on the complexity of the operation, with a few
outliers that perform significant operations on the data.
Scalability Test Results
Stress tests have been performed in both the performance and the stress test environments. We have demonstrated near-linear
scalability across the environments, with 13.4 TPS achieved in the performance test environment (2-core application server)
and 26.0 TPS achieved in the stress test environment (4-core application server), for a throughput of approximately
6.5 TPS per CPU core.
The stress test graph for the tests performed in the stress test environment shows that latency quickly stabilizes after the
initial system warm-up and remains very stable through the ramp-up period, even when the application server CPU is
saturated, as seen from the CPU usage graph.
IDD Capacity Planning
This section explains the methodology that we use to extrapolate the stress test results for capacity planning. For
simplicity, we assume that all users are in a single time zone or in adjacent time zones. A similar model can easily be
created for multiple time zones.
The model is based on estimating the number of user sessions and the throughput of each session. It relies on a small
number of input parameters that are usually easy to estimate in any business environment. Note that other approaches, such
as estimating the total number of transactions, can also be used.
Here are the parameters that are used in the sizing model.
Parameter | Description | Sample Value
Number of frequent users | Number of users that rely on IDD to do their job and use it constantly throughout their working day. These are typically data stewards that are responsible for data quality. | 20
Number of occasional users | Number of users that use IDD occasionally. These are typically business users that use IDD several times a day to access or manage the master data as part of regular business processes. | 1000
Percent of time active for the frequent users | Typically we use 50%, which represents the total amount of time that the frequent users would be actively working with the system. | 50%
Percent of time active for the occasional users | Typically we use 10%, which represents the total amount of time that the occasional users would be actively working with the system. | 10%
Work day | Duration of a normal workday for the business users. | 10 hours
Peak load period | Period of the day when the majority of operations are performed, and the percentage of operations that are performed during the peak load period. | 2 hours / 30%
Think time | Amount of time that the user needs to review or correct the data. The typical think time that we observed for IDD users averaged 5 seconds. | 5 seconds
The total number of active user sessions during the peak load period can be calculated as:
Number of Sessions = [Frequent User Sessions] + [Occasional User Sessions]
where
Frequent User Sessions = [# frequent users] * [% of time active]
and
Occasional User Sessions = [# occasional users] * [% of time active] * [work load %] * [work day] / [peak load time]
For our sample values, the number of sessions can be calculated as:
Number of Sessions = 20 * .5 + 1000 * .1 * .3 * 10 / 2 = 10 + 150 = 160
Given the assumptions on the think time and the typical response times demonstrated in the performance tests above, we can
calculate the throughput created by a single user as:
Single User Throughput = ([avg tx response time] + [think time]) / 60 sec
Single User Throughput = (1 sec + 5 sec) / 60 sec = .1 TPS
Thus, the 160 concurrent sessions would produce ~16 TPS (Sustained TPS) if the requests were evenly distributed
throughout the day. However, load patterns are random, so this assumption is not valid.
We typically recommend a factor of 3 to account for such discrepancies and to ensure that the system can provide
adequate response times at any point in time despite load fluctuations.
Peak TPS = 3 * [Sustained TPS] = 3 * 16 TPS = 48 TPS
Using these assumptions and the measured throughput of 6.5 TPS per core, we can calculate the number of application
server cores that would be required for this configuration to support the IDD load:
Number of Cores = CEILING([Peak TPS] / [Throughput per Core])
Number of Cores = CEILING(48 / 6.5) = 8 cores
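The following Java sketch (illustrative only) strings the sizing formulas together with the sample values from the parameter table, so the calculation can be repeated for other user populations or different per-core throughput figures.

public class IddSizingModel {
    public static void main(String[] args) {
        // Sample values from the sizing parameter table above.
        int frequentUsers = 20;
        int occasionalUsers = 1000;
        double frequentActive = 0.50;     // percent of time active, frequent users
        double occasionalActive = 0.10;   // percent of time active, occasional users
        double workDayHours = 10;
        double peakPeriodHours = 2;
        double peakLoadShare = 0.30;      // share of operations performed in the peak period
        double avgResponseSec = 1;        // typical response time from the performance tests
        double thinkTimeSec = 5;          // observed average think time
        double throughputPerCore = 6.5;   // TPS per core from the scalability tests
        double peakFactor = 3;            // factor that accounts for load fluctuations

        double frequentSessions = frequentUsers * frequentActive;
        double occasionalSessions = occasionalUsers * occasionalActive
                * peakLoadShare * workDayHours / peakPeriodHours;
        double sessions = frequentSessions + occasionalSessions;        // 10 + 150 = 160

        double singleUserTps = (avgResponseSec + thinkTimeSec) / 60;    // 0.1 TPS, as above
        double sustainedTps = sessions * singleUserTps;                 // ~16 TPS
        double peakTps = peakFactor * sustainedTps;                     // 48 TPS
        long cores = (long) Math.ceil(peakTps / throughputPerCore);     // CEILING(48 / 6.5) = 8

        System.out.printf("Sessions: %.0f, Sustained TPS: %.1f, Peak TPS: %.1f, Cores: %d%n",
                sessions, sustainedTps, peakTps, cores);
    }
}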
For our sample configuration we would recommend 2 Application servers with 4 CPU cores each to support the IDD load.
Conclusion
In this article, we described the performance and stress tests that are run on Informatica Data Director to ensure adequate
performance and scalability for a wide variety of business scenarios, and we showed that high throughput and low latencies
were achieved in the performance and scalability tests conducted in our performance labs.
We also included a framework for sizing IDD environments using the test results, and recommended a simple sizing model
that can be easily adapted to any specific scenario and modified to use the available data about user activity profiles and
user load.
Author
Dmitri Korablev
VP Strategy and Planning