The Lookup stage in DataStage

The lookup stage in Datastage 8 is an enhanced version of what was present in earlier Datastage releases. This article is going to take a deep dive into the new lookup stage and the various options it offers. Even though the lookup stage can’t be used in cases where huge amounts of data are involved(since it requires data to be present in the memory for operations), it still warrants its own place in job designs. This is because the lookup stage offers a bit more than the other conventional lookup stages like join and merge.

Lets look at the example shown below.

Source

Emp ID EmpName Dept

1001 AABB IT

1002 BBCC IT

1003 BBDD BS

Reference

Emp ID Salary Dept Quarter

1001 2000 IT Q1

1001 3000 IT Q2

1001 4000 IT Q3

Now if you use the lookup stage the with Emp iD as the key then the output would be as below

EMp ID Salary Dept EmpName Quarter

1001 2000 IT AABB Q1

But if you have a closer look at the data we can see that the reference table actually has three records for that ID. However your lookup stage actually only retrieved the one record. Now if you need to retrieve all 3 records for that ID then you will have to

- Go to the constraints page of the lookup stage
- Go to tab ‘Multiple rows returned from link’
- Select the refernce link

This will modify your output as below

EMp ID Salary Dept EmpName Quarter

1001 2000 IT AABB Q1

1001 3000 IT AABB Q2

1001 4000 IT AABB Q3

A point to be noted is that only one reference link in the lookup stage can return multiple rows. This can’t be done for more than one reference link and can only be done for in-memory lookups

There are a host of other options also available on the constraints page shown below

In addition to the lookup, the stage also gives us the option of checking if the data satisfies a particular condition like Salary > 2000,etc.. All such additional conditions that you want to check can be done in this area. How the job behaves during a lookup is determined by the ‘Condition Not Met’ or ‘Lookup Failure’ option. The four options available for this tab are Continue, Drop, Reject and Fail. Condition Not Met option will be applicable if you provide a condition check. If you do not provide such a check then the values in the ‘Condition Not Met’ option will not make a difference.

The ‘Continue’ option will allow the job to continue without failing and the retrieved refernce value will be populated as NULL. If the value is specified as ‘Drop’, then the records will be dropped from the data set if the lookup/condition has failed. If the option is specified as ‘Reject’, then all records that failed lookup will go to the reject link. You should remember to provide a reject link to the lookup stage if this option is set. Else your job will fail. If you specify the value as ‘Fail’, then the job will move to the aborted state whenever a lookup fails against the reference dataset.

The lookup stage gives us 3 different lookup options. The first is ‘Equality’ which is the normal look. The data is looked up for an exact match (Case sensitive). The second option is the Casesless match. It does exactly what the name indicates. The third and final option is the ‘Range’. This allows you to define a range lookup on the stream link or a reference link of a Lookup stage. On the stream link, the lookup compares the value of a source column to a range of values between two lookup columns. On the reference link, the lookup compares the value of a lookup column to a range of values between two source columns.