Although this isn't a particularly common requirement, there are projects that use the Java stages to align their job designs with a real-time architecture, or sometimes simply to reuse a piece of Java functionality already present in their systems. One thing you should be aware of is that the Java stages are not suited to jobs that process huge volumes of data and need fast execution rates. As mentioned earlier, you will find them used mainly for real-time interfacing. That said, using Java stages in Datastage is fairly simple. Let's look at the process involved in getting your Java transformer to work in your job design.
In the earlier 7.5 versions of Datastage, the Java stages were available as an additional pack. In all subsequent versions, however, the Java pack (as well as the Web Services pack) has been bundled with the rest of the default stages. You will find the Java stages under the Real Time tab in your palette.
Before you actually start using the Java stages, remember that a couple of initializations are required: you have to set the DATASTAGE_JVM and DATASTAGE_JRE environment variables appropriately in the Datastage Administrator (the values depend on the server on which Datastage resides, the Datastage version, and the Java version you are using). You will find the values for your particular configuration in the release notes.
The standard environment variable values for Information Server 8.1 with Java 1.4, as provided by IBM, are:
DATASTAGE_JRE=C:\IBM\InformationServer\_jvm\jre
DATASTAGE_JVM=bin\j9vm
DSHOME=C:\IBM\InformationServer\Server\DSEngine
The main work in using the Java stage is writing the Java code; the Java transformer itself doesn't require much configuration. The main details you have to enter in the transformer stage are the following:
The name of the custom class being used
The classpath containing your Java code or jar file. This can be any location accessible to the Datastage server; it does not have to be the IBM Datastage folder that holds the tr4j.jar file.
The input and output column definitions
Everything else depends on the Java code you write. When writing it, remember that your custom class must extend the Stage class. This is the class that provides all the functions that help you read and write the Datastage rows.
Three methods of the class have to be overridden to achieve the functionality you want. The code takes the following structure:
import com.ascentialsoftware.jds.Stage;

public class Sample extends Stage {

    public void initialize() {
        // Variable initializations and opening of database connections
        // are ideally done here.
    }

    public int process() {
        // This is where you read your Datastage rows and handle your
        // business logic. process() must return a status code telling the
        // engine whether to call it again (e.g. OUTPUT_STATUS_READY; check
        // the constant names in the Java Pack guide).
        return OUTPUT_STATUS_READY;
    }

    public void terminate() {
        // Any termination logic, such as deleting temporary files or
        // closing connections.
    }
}
A full list of the available methods for reading, writing, and rejecting rows is given in the Java pack documentation. There are also logging methods that take care of your warning messages and help with your debugging. Everything you need to write a program of your own is in the Java pack guide, and the sample programs will give you a place to start.
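To make the initialize/process/terminate lifecycle concrete without needing tr4j.jar, here is a self-contained sketch. The Stage base class below is a hypothetical stand-in for the real com.ascentialsoftware.jds.Stage: the readRow/writeRow names mirror the Java Pack API, but the harness, the in-memory row lists, and the "return 1 to keep processing" convention are assumptions of this sketch only, so verify the real signatures and status codes against the Java pack guide.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for com.ascentialsoftware.jds.Stage (shipped in
// tr4j.jar). It only exists to show how the three overridden methods are
// driven; it is NOT the real API.
abstract class Stage {
    private final Deque<String> input = new ArrayDeque<>();
    private final List<String> output = new ArrayList<>();

    public abstract void initialize();
    public abstract int process();     // sketch convention: 1 = more rows, 0 = done
    public abstract void terminate();

    protected String readRow()          { return input.poll(); }
    protected void   writeRow(String r) { output.add(r); }

    // Drives the lifecycle the way the Java transformer would: initialize
    // once, call process() until it signals completion, then terminate.
    public List<String> run(List<String> rows) {
        input.addAll(rows);
        initialize();
        while (process() == 1) { /* keep pulling rows */ }
        terminate();
        return output;
    }
}

public class SampleStage extends Stage {
    public void initialize() { /* open connections, set up variables */ }

    public int process() {
        String row = readRow();
        if (row == null) {
            return 0;                  // no more input rows
        }
        writeRow(row.toUpperCase());   // your business logic goes here
        return 1;
    }

    public void terminate() { /* delete temp files, close connections */ }

    public static void main(String[] args) {
        System.out.println(new SampleStage().run(List.of("a", "b")));
        // -> [A, B]
    }
}
```

The point of the sketch is the shape of process(): read a row, transform it, write it out, and return a status so the engine knows whether to call you again. In a real job the rows arrive as Row objects with typed column accessors rather than plain strings.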