PERFORMANCE TUNING IN PARALLEL ENVIRONMENTS
 

Any given system can be tuned to favor one application so much that it actually negatively impacts the performance of other applications. This phenomenon is exacerbated as we introduce parallel capabilities into the system.

Many factors affect the performance of an application:
RDBMS configuration and performance
Memory vs. system working set size
CPUs vs. system load
Data input/output throughput rates
Amdahl's law (an application is gated by its slowest component).
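As an illustration of that gating effect (the figures below are hypothetical): if a fraction P of a job can run in parallel across N nodes, Amdahl's law bounds the overall speed-up at

    speed-up = 1 / ((1 - P) + P / N)

so a job that is 90% parallelizable (P = 0.9) running on 8 nodes can achieve at most about 1 / (0.1 + 0.9/8) ≈ 4.7x, no matter how well the parallel portion is tuned.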

Best Practices:

Establish baselines (especially with I/O), use copy with no output

Avoid the use of only one flow for tuning/performance testing.

Prototyping can be a powerful tool.

Work in increments: change one thing at a time.

Evaluate data skew: repartition to balance the data flow

Isolate and Solve - determine which stage is causing a problem.

Distribute file systems (if possible) to eliminate bottlenecks

Do NOT involve the RDBMS in initial testing. (See above)

Understand and evaluate the tuning knobs available

Establishing a baseline:

Set up at least 3 configurations: sequential; max parallel; ½ max parallel

Use real data if possible; otherwise generate data from the table definition

Create or generate a dataset 2-3 times the size of available RAM (limit the test to 10-15 minutes)
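A minimal sketch of creating such a test dataset from the command line with the osh generator operator (the schema, record count, and paths below are invented; size the record count to reach 2-3 times RAM, and check the exact option names against your Orchestrate operator reference):

    # Generate a dummy dataset from a simple schema; adjust -records so the
    # resulting dataset is 2-3 times the size of physical RAM.
    osh "generator -schema record(cust_id:int32; payload:string[100];) -records 10000000 > /data/baseline_in.ds"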

Using the sequential configuration file:

_ Read dataset to copy (copy -f)

_ Rerun and watch for caching

_ Add a write to dataset

_ Run a read/sort/copy test (use a relatively random key for sort)
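A sketch of those four sequential runs from the command line (operator usage follows the notes above; the configuration and dataset paths are invented):

    # Point the framework at the single-node (sequential) configuration file.
    export APT_CONFIG_FILE=/config/1node.apt

    # 1. Read the dataset into a copy operator with no output (-f forces the run).
    time osh "copy -f < /data/baseline_in.ds"

    # 2. Rerun the same command and compare elapsed times to spot file-system caching.

    # 3. Add a write to a dataset.
    time osh "copy < /data/baseline_in.ds > /data/baseline_out.ds"

    # 4. Read/sort/copy, sorting on a reasonably random key.
    time osh "tsort -key cust_id < /data/baseline_in.ds | copy -f"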

Using the ½ max parallel configuration file:

_ Create a non-skewed dataset

_ Rerun the tests above
_ “Tune” the configuration to obtain a linear application speed-up

_ Review the entire I/O system

_ Review the configuration file to spread I/O activity
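Switching the degree of parallelism is just a matter of pointing at a different configuration file before rerunning the same tests, for example via APT_CONFIG_FILE (the path and node count here are illustrative):

    # Half of the maximum parallelism, e.g. 4 nodes on an 8-CPU SMP box.
    export APT_CONFIG_FILE=/config/4node.apt
    time osh "copy -f < /data/baseline_in.ds"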

Using the max parallel configuration file:

_ Create a non-skewed dataset


_ Rerun the tests above

_ Stress the system, looking for areas of contention

Buffering (Enterprise Edition and Server)
A facility added behind the scenes to optimize and regulate data flow. Its primary purpose is to match the rate at which data is produced upstream with the rate at which it is consumed downstream (see “Controlling the Buffers” below).

Partitioning/Sorting (Enterprise Edition)
Partitioners and sorts added behind the scenes so the developer does not have to worry about them, while assuring that the flow operates correctly. Automatic insertion can be disabled with APT_NO_PART_INSERTION and APT_NO_SORT_INSERTION (see the sketch after the next item).

Operator Combination (Enterprise Edition)
Operators combined behind the scenes into a single process to improve performance. Combination can be disabled with APT_DISABLE_COMBINATION.
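Both behaviours (automatic insertion and operator combination) can be switched off for diagnosis by setting the corresponding environment variables, typically to 1 (or True), as project/job environment variables or in the shell. A sketch, normally used only for testing rather than production:

    # Disable automatic partition and sort insertion (Enterprise Edition).
    export APT_NO_PART_INSERTION=1
    export APT_NO_SORT_INSERTION=1

    # Disable operator combination so each operator runs in its own process,
    # which makes per-operator CPU usage easier to attribute.
    export APT_DISABLE_COMBINATION=1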

Controlling the Buffers in DataStage Enterprise Edition

APT_BUFFER_MAXIMUM_TIMEOUT – set to 1 for releases prior to V7

Controls the speed of data flow after buffering

APT_BUFFER_MAXIMUM_MEMORY – default is 3M

Increase for large memory configurations to avoid buffering to disk

APT_BUFFER_DISK_WRITE_INCREMENT – default is 1M

Increase to create larger bursts of I/O during buffering to disk

APT_BUFFER_FREE_RUN – default is 0.5 × APT_BUFFER_MAXIMUM_MEMORY
Increase to reduce data flow impedance for large memory configurations
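A sketch of raising the buffer defaults for a large-memory configuration (the values are purely illustrative; the memory and increment values are given in bytes):

    # Allow each buffer up to 32 MB in memory before spilling to disk (default ~3 MB).
    export APT_BUFFER_MAXIMUM_MEMORY=33554432

    # Write 4 MB bursts when buffering does go to disk (default 1 MB).
    export APT_BUFFER_DISK_WRITE_INCREMENT=4194304

    # Let buffers run further ahead before applying back pressure
    # (expressed as a multiple of APT_BUFFER_MAXIMUM_MEMORY).
    export APT_BUFFER_FREE_RUN=2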

Controlling the Buffers in DataStage Server

Set BUFFERSIZE and TIMEOUT for in-process/inter-process row buffering – the default buffer size is 128K
Set for the project in the Administrator, or in the job properties for a particular job

Evaluating performance with Enterprise Edition

APT_DUMP_SCORE

Used to understand the details of a data flow.

APT_PM_PLAYER_TIMING

Used to understand the CPU characteristics of a data flow

APT_RECORD_COUNTS

Used to check for data skew across data partitions
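A sketch of enabling these diagnostics for a single run (set them as job/project environment variables or in the shell before starting the job; the output appears in the job log):

    # Print the score: how operators, partitions, and inserted sorts/buffers
    # were actually composed for this data flow.
    export APT_DUMP_SCORE=1

    # Report CPU time consumed by each player process.
    export APT_PM_PLAYER_TIMING=1

    # Report record counts per partition per operator, which exposes data skew.
    export APT_RECORD_COUNTS=1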

Evaluating performance with Server

Performance statistics – enabled in the “Tracing” panel of the “Job Run Options” dialog presented when a server job is run (from the Director or Designer)


The Configuration File

Tells DataStage how to exploit the underlying computer hardware. For any given system there is no single ideal configuration file, since different jobs vary in how they exercise that system.

General hints (assumes an SMP environment):

Avoid using the disks that are used for ‘landing’ input and output data as scratch and resource disks
Do not use NFS or other remotely mounted disks for scratch disk

Understand the file system underneath the mount points being used by the configuration file
Separate the I/O between nodes as much as possible to provide the maximum I/O bandwidth

Run your application using various configurations during volume testing to understand its behavior before moving to production.
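For reference, a minimal two-node configuration file for an SMP box, with resource and scratch disk spread across separate file systems (the host name and paths are invented; substitute your own mount points):

    {
        node "node1"
        {
            fastname "smp_host"
            pools ""
            resource disk "/fs1/datasets" { pools "" }
            resource scratchdisk "/fs1/scratch" { pools "" }
        }
        node "node2"
        {
            fastname "smp_host"
            pools ""
            resource disk "/fs2/datasets" { pools "" }
            resource scratchdisk "/fs2/scratch" { pools "" }
        }
    }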