Mapping Design optimization techniques

advertisement
Mapping Design optimization techniques
Mapping Design optimization techniques
Challenge
Optimizing PowerCenter to create an efficient execution environment.
Description
Although PowerCenter environments vary widely, most sessions and/or mappings can
benefit from the implementation of common objects and optimization procedures.
Follow these procedures and rules of thumb when creating mappings to help ensure
optimization.
General Suggestions for Optimizing
1. Reduce the number of transformations. There is always overhead involved
in moving data between transformations.
2. Consider more shared memory for large number of transformations.
Session shared memory between 12MB and 40MB should suffice.
3. Calculate once, use many times.
o
Avoid calculating or testing the same value over and over.
o
Calculate it once in an expression, and set a True/False flag.
o
Within an expression, use variable ports to calculate a value than can
be used multiple times within that transformation.
4. Only connect what is used.
o
Delete unnecessary links between transformations to minimize the
amount of data moved, particularly in the Source Qualifier.
o
This is also helpful for maintenance. If a transformation needs to be
reconnected, it is best to only have necessary ports set as input and
output to reconnect.
o
In lookup transformations, change unused ports to be neither input
nor output. This makes the transformations cleaner looking. It also
makes the generated SQL override as small as possible, which cuts
down on the amount of cache necessary and thereby improves
performance.
5. Watch the data types.
o
o
The engine automatically converts compatible types.
Sometimes data conversion is excessive. Data types are automatically
converted when types are different between connected ports.
Minimize data type changes between transformations by planning data
flow prior to developing the mapping.
1
Mapping Design optimization techniques
6. Facilitate reuse.
o
o
Plan for reusable transformations upfront.
Use variables. Use both mapping variables as well as ports that are
variables. Variable ports are especially beneficial when they can be
used to calculate a complex expression or perform a disconnected
lookup call only once instead of multiple times
o
o
Use mapplets to encapsulate multiple reusable transformations.
Use mapplets to leverage the work of critical developers and minimize
mistakes when performing similar functions.
7. Only manipulate data that needs to be moved and transformed.
o
Reduce the number of non-essential records that are passed through
the entire mapping.
o
Use active transformations that reduce the number of records as early
in the mapping as possible (i.e., placing filters, aggregators as close to
source as possible).
o
Select appropriate driving/master table while using joins. The table
with the lesser number of rows should be the driving/master table for
a faster join.
8. Utilize single-pass reads.
o
Redesign mappings to utilize one Source Qualifier to populate multiple
targets. This way the server reads this source only once. If you have
different Source Qualifiers for the same source (e.g., one for delete
and one for update/insert), the server reads the source for each
Source Qualifier.
o
o
Remove or reduce field-level stored procedures.
If you use field-level stored procedures, the PowerCenter server has to
make a call to that stored procedure for every row, slowing
performance.
Lookup Transformation Optimizing Tips
1. When your source is large, cache lookup table columns for those lookup tables
of 500,000 rows or less. This typically improves performance by 10 to 20
percent.
2. The rule of thumb is not to cache any table over 500,000 rows. This is only
true if the standard row byte count is 1,024 or less. If the row byte count is
more than 1,024, then the 500k rows will have to be adjusted down as the
number of bytes increase (i.e., a 2,048 byte row can drop the cache row count
to between 250K and 300K, so the lookup table should not be cached in this
case). This is just a general rule though. Try running the session with a large
lookup cached and not cached. Caching is often still faster on very large
lookup tables.
3. When using a Lookup Table Transformation, improve lookup performance by
placing all conditions that use the equality operator = first in the list of
2
Mapping Design optimization techniques
conditions under the condition tab.
4. Cache only lookup tables if the number of lookup calls is more than 10 to 20
percent of the lookup table rows. For fewer number of lookup calls, do not
cache if the number of lookup table rows is large. For small lookup tables(i.e.,
less than 5,000 rows), cache for more than 5 to 10 lookup calls.
5. Replace lookup with decode or IIF (for small sets of values).
6. If caching lookups and performance is poor, consider replacing with an
unconnected, uncached lookup.
7. For overly large lookup tables, use dynamic caching along with a persistent
cache. Cache the entire table to a persistent file on the first run, enable the
update else insert option on the dynamic cache and the engine will never have
to go back to the database to read data from this table. You can also partition
this persistent cache at run time for further performance gains.
8. Review complex expressions.

Examine mappings via Repository Reporting and Dependency Reporting within
the mapping.


Minimize aggregate function calls.
Replace Aggregate Transformation object with an Expression Transformation
object and an Update Strategy Transformation for certain types of
Aggregations.
Operations and Expression Optimizing Tips
1. Numeric operations are faster than string operations.
2. Optimize char-varchar comparisons (i.e., trim spaces before comparing).
3. Operators are faster than functions (i.e., || vs. CONCAT).
4. Optimize IIF expressions.
5. Avoid date comparisons in lookup; replace with string.
6. Test expression timing by replacing with constant.
7. Use flat files.

Using flat files located on the server machine loads faster than a database
located in the server machine.

Fixed-width files are faster to load than delimited files because delimited files
require extra parsing.

If processing intricate transformations, consider loading first to a source flat
file into a relational database, which allows the PowerCenter mappings to
access the data in an optimized fashion by using filters and custom SQL
Selects where appropriate.
8. If working with data that is not able to return sorted data (e.g., Web Logs),
consider using the Sorter Advanced External Procedure.
3
Mapping Design optimization techniques
9. Use a Router Transformation to separate data flows instead of multiple Filter
Transformations.
10. Use a Sorter Transformation or hash-auto keys partitioning before an
Aggregator Transformation to optimize the aggregate. With a Sorter
Transformation, the Sorted Ports option can be used, even if the original
source cannot be ordered.
11. Use a Normalizer Transformation to pivot rows rather than multiple instances
of the same target.
12. Rejected rows from an update strategy are logged to the bad file. Consider
filtering before the update strategy if retaining these rows is not critical
because logging causes extra overhead on the engine. Choose the option in
the update strategy to discard rejected rows.
13. When using a Joiner Transformation, be sure to make the source with the
smallest amount of data the Master source.
14. If an update override is necessary in a load, consider using a Lookup
transformation just in front of the target to retrieve the primary key. The
primary key update will be much faster than the non-indexed lookup override.
Suggestions for Using Mapplets
A mapplet is a reusable object that represents a set of transformations. It allows you
to reuse transformation logic and can contain as many transformations as necessary.
Use the Mapplet Designer to create mapplets.
1. Create a mapplet when you want to use a standardized set of transformation
logic in several mappings. For example, if you have several fact tables that
require a series of dimension keys, you can create a mapplet containing a
series of Lookup transformations to find each dimension key. You can then use
the mapplet in each fact table mapping, rather than recreate the same lookup
logic in each mapping.
2. To create a mapplet, add, connect, and configure transformations to complete
the desired transformation logic. After you save a mapplet, you can use it in a
mapping to represent the transformations within the mapplet. When you use a
mapplet in a mapping, you use an instance of the mapplet. All uses of a
mapplet are tied to the parent mapplet. Hence, all changes made to the parent
mapplet logic are inherited by every child instance of the mapplet. When the
server runs a session using a mapplet, it expands the mapplet. The server
then runs the session as it would any other session, passing data through each
transformation in the mapplet as designed.
3. A mapplet can be active or passive depending on the transformations in the
mapplet. Active mapplets contain at least one active transformation. Passive
mapplets only contain passive transformations. Being aware of this property
when using mapplets can save time when debugging invalid mappings.
4. Unsupported transformations that should not be used in a mapplet include:
4
Mapping Design optimization techniques
COBOL source definitions, normalizer, non-reusable sequence generator, preor post-session stored procedures, target definitions, and PowerMart 3.5-style
lookup functions.
5. Do not reuse mapplets if you only need one or two transformations of the
mapplet while all other calculated ports and transformations are obsolete.
6. Source data for a mapplet can originate from one of two places:

Sources within the mapplet. Use one or more source definitions connected
to a Source Qualifier or ERP Source Qualifier transformation. When you use
the mapplet in a mapping, the mapplet provides source data for the mapping
and is the first object in the mapping data flow.

Sources outside the mapplet. Use a mapplet Input transformation to define
input ports. When you use the mapplet in a mapping, data passes through the
mapplet as part of the mapping data flow.
7. To pass data out of a mapplet, create mapplet output ports. Each port in
an Output transformation connected to another transformation in the mapplet
becomes a mapplet output port.

Active mapplets with more than one Output transformations. You need
one target in the mapping for each Output transformation in the mapplet. You
cannot use only one data flow of the mapplet in a mapping.

Passive mapplets with more than one Output transformations. Reduce
to one Output Transformation; otherwise you need one target in the mapping
for each Output transformation in the mapplet. This means you cannot use
only one data flow of the mapplet in a mapping.
5
Download