Always use sorted input for very large data aggregations, or use the 64-bit PowerCenter server and allocate a large amount of memory. Sorted aggregations run much faster than unsorted aggregations.
Aggregator transformations do not sort data. The Aggregator uses a clustering algorithm, not a sort algorithm. When there are duplicate rows, the Aggregator may output data in a seemingly sorted order, but this is not guaranteed.
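As an illustrative sketch (Python, not PowerCenter's engine), the difference between sorted-input and unsorted aggregation comes down to how much must be cached: sorted input lets the aggregator finish each group as it streams past, while unsorted input forces it to hold every group in memory until the last row arrives.

```python
from itertools import groupby
from operator import itemgetter

rows = [("east", 10), ("east", 5), ("west", 7), ("west", 3)]  # already sorted by key

# Sorted-input aggregation: stream one group at a time, holding only
# the current group's accumulator in memory (what "Sorted Input" enables).
def aggregate_sorted(sorted_rows):
    for key, group in groupby(sorted_rows, key=itemgetter(0)):
        yield key, sum(amount for _, amount in group)

# Unsorted aggregation: every group's accumulator must stay cached until
# the final input row arrives, which is why it needs far more memory.
def aggregate_unsorted(rows):
    totals = {}
    for key, amount in rows:
        totals[key] = totals.get(key, 0) + amount
    return totals

print(list(aggregate_sorted(rows)))   # [('east', 15), ('west', 10)]
print(aggregate_unsorted(rows))       # {'east': 15, 'west': 10}
```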
Keep mappings as simple as possible; the smaller the mapping, the easier it is to tune and the better it performs. Divide and conquer is the best strategy for the fastest mapping performance. Sometimes multi-staging the work, or splitting the workload between the database and staging steps, can release upstream dependencies and increase parallelism.
Make sure to allocate a large amount of memory (as much as possible) for mapping objects that cache.
Aggregator transformations can be used to pivot (de-normalize) data; refer to article 11583.
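A rough sketch of the pivot idea in Python (the PowerCenter equivalent would group by the entity key in an Aggregator and pick each attribute with conditional expressions such as FIRST(value, attribute = 'phone')); the row layout and attribute names here are hypothetical:

```python
# Rows in normalized (customer, attribute, value) form.
rows = [
    (1, "phone", "555-0100"),
    (1, "email", "a@x.com"),
    (2, "phone", "555-0199"),
    (2, "email", "b@x.com"),
]

def pivot(rows, attributes=("phone", "email")):
    # Group by customer and spread each attribute into its own column,
    # de-normalizing many rows per customer into one record per customer.
    out = {}
    for cust, attr, val in rows:
        rec = out.setdefault(cust, {a: None for a in attributes})
        rec[attr] = val
    return out

print(pivot(rows))
```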
When replacing Perl code, break the code into units of work and use each unit as a design step in the mapping architecture. Develop the overall complex mapping, then break it apart into smaller, manageable steps.
Keep the mapping objects as streamlined as possible. Run the data through the transforms, not around them. This helps with the partitioning options at the session level, as well as the parallelism capabilities of the mapping.
When using a Sorter, Aggregator, Joiner, or Lookup transformation, keep the keys as small as possible (measured in precision). Much of the same mathematics that applies to computing relational database indexes also applies to the "indexed" fields that drive these operations.
Keep filter conditions simple; move complex condition expressions into Expression transformations. This keeps the filter fast. When a filter runs slowly, it is usually because of a complex condition.
Break complex conditions down into smaller parts. Use variable ports within an Expression transformation to build up complex logic. This keeps the mappings more maintainable.
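A minimal Python sketch of the same idea (the field names and business rules are invented for illustration): the staged version names each intermediate result the way variable ports would, so the final condition stays trivial.

```python
# One monolithic filter condition: hard to read, hard to tune.
def keep_monolithic(r):
    return (r["status"] == "A" and r["balance"] > 0) or \
           (r["status"] == "P" and r["days_open"] < 30)

# The same logic staged through intermediate values, analogous to
# variable ports in an Expression transformation feeding a simple Filter.
def keep_staged(r):
    v_active_funded = r["status"] == "A" and r["balance"] > 0
    v_recent_pending = r["status"] == "P" and r["days_open"] < 30
    return v_active_funded or v_recent_pending

sample = {"status": "A", "balance": 100, "days_open": 99}
print(keep_staged(sample))  # True
```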
Never have more than five (5) targets per mapping; each additional target degrades performance significantly. Complex maps usually demand multiple targets, but the more targets you have, the poorer the performance.
Complex architectures usually require update strategies within the mapping. Update Strategy transformations can impose a performance hit on the session, sometimes a significant one, so minimize their use for optimal performance.
If you have very large and complex mappings running with large data volumes (~50 million+ rows), use the 64-bit PowerCenter server for optimal performance. It provides access to far more addressable memory and better performance for large mappings.
Any mapping with 50+ objects is simply too large and MUST be broken down into multiple mappings.
To create complex output (say, a mainframe ASCII file), use a single flat file target with a single string port (4 KB if necessary), and format the string in one or more Expression transformations. Use the LPAD and RPAD functions to re-format data, and add record indicator columns on the output side.
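A sketch of the fixed-width formatting pattern, using Python's rjust/ljust in place of the expression-language LPAD/RPAD (the record layout, field names, and widths here are hypothetical):

```python
def format_record(rec_type, account_id, amount_cents):
    # rjust/ljust play the role of LPAD/RPAD: pad each field to a fixed
    # width so every output line has the same byte layout.
    return (
        rec_type.ljust(2)                   # record indicator, left-justified (RPAD)
        + str(account_id).rjust(10, "0")    # zero-padded key (LPAD equivalent)
        + str(amount_cents).rjust(12, "0")  # zero-padded amount (LPAD equivalent)
    )

line = format_record("D", 4711, 1999)
print(line)       # D 0000004711000000001999
print(len(line))  # 24
```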
Always set the master side of a Joiner to the smaller of the two sources (except when using a detail outer join or full outer join). This keeps the cached row set to a minimum.
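The reasoning can be seen in a simple hash-join sketch (Python, as an analogy for the Joiner's caching behavior): only the master side is built into the cache, while the detail side streams through row by row, so the smaller input should always be the master.

```python
def hash_join(master_rows, detail_rows):
    # Build phase: cache the (smaller) master side keyed by join key.
    cache = {}
    for key, payload in master_rows:
        cache.setdefault(key, []).append(payload)
    # Probe phase: stream the (larger) detail side; nothing else is cached.
    for key, payload in detail_rows:
        for m in cache.get(key, []):
            yield key, m, payload

pairs = list(hash_join([(1, "m1"), (2, "m2")],
                       [(1, "d1"), (1, "d2"), (3, "d3")]))
print(pairs)  # [(1, 'm1', 'd1'), (1, 'm1', 'd2')]
```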
Replace a lookup with a Joiner whenever you are faced with extremely large data sets (refer to article 19304).
Use a reusable Lookup transformation instead of defining the same lookup multiple times. This allows the lookup cache to be shared and improves performance.
If you are going to use a Sequence Generator and share it across multiple mappings (or run the session in parallel), set it to cache a minimum of 10,000 values.
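The caching idea can be sketched as follows (Python; the `allocator` callback standing in for the repository's block reservation is hypothetical): each consumer reserves a whole block of values at once, so concurrent sessions rarely contend for the shared counter.

```python
import threading

class CachedSequence:
    """Hands out IDs from a locally cached block, reserving a new block
    of `cache_size` values only when the current one is exhausted."""

    def __init__(self, allocator, cache_size=10_000):
        self._allocator = allocator    # returns the starting value of a fresh block
        self._cache_size = cache_size
        self._block = iter(())         # empty: first call reserves a block
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            try:
                return next(self._block)
            except StopIteration:
                start = self._allocator(self._cache_size)
                self._block = iter(range(start, start + self._cache_size))
                return next(self._block)
```

With a cache of 10,000, ten thousand rows cost a single round trip to the shared counter instead of ten thousand.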