By Slawomir Chodnicki, on June 7th, 2010
If you have read the article about using Java code in Kettle, you might be wondering whether it is possible to generate rows using the Modified JavaScript Value step in a Kettle transformation. In other words, could you use JavaScript to create a step that generates rows, just as the Table Input and Excel Input steps do? Well, it is possible. For purposes of illustration, let's create a short example that outputs all of our Java system properties as rows. Download the example transformation if you like.
First things first. The Modified JavaScript Value step is not an input step as such and will not execute without receiving some input first. Fortunately, it is easy to create a single empty row using the Generate Rows input step: just leave all fields empty and limit the output to one row. Once this is connected to the JavaScript step, the script will execute.
Now for the fun part: the JavaScript step. Its task is to output each Java system property as a row with the fields property_name and property_value, both of type String. The fields section of the JavaScript step editor therefore looks like this:
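Fieldname: property_name, Type: String
Fieldname: property_value, Type: String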
Now for the script. It shows how to create new rows and fill them with values.
// get the system properties
var props = java.lang.System.getProperties();
// get an enumeration for the property names
var e = props.propertyNames();
// loop through the properties
while (e.hasMoreElements()) {
var propName = e.nextElement();
var propValue = props.get(propName);
// create a new row object that fits our output size
var row = createRowCopy(getOutputRowMeta().size());
// find the index of the first field appended by this step
var idx = getInputRowMeta().size();
// fill each field (notice how the index is increased)
row[idx++] = propName;
row[idx++] = propValue;
// output the row
putRow(row);
}
// do not output the trigger row
trans_Status = SKIP_TRANSFORMATION;
The interesting part of the script is inside the loop. The function createRowCopy copies the currently processed row and extends it to the given size. The script passes in the output size of the step, i.e. the total number of fields leaving it. In this example that total is two, of course: zero fields coming in and two appended. But hardcoding numbers like that rarely feels right.
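To make the difference concrete, here is a small sketch (not part of the original transformation) contrasting a hardcoded row size with the dynamic lookup used above; the hardcoded variant silently breaks as soon as the step's field layout changes:
// brittle: only correct while the step outputs exactly two fields
var row = createRowCopy(2);
// robust: always matches the declared output layout of the step
var row = createRowCopy(getOutputRowMeta().size());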
After the row is constructed it must be filled with values. The row is really an object array, so the property name and value can simply be put into it, observing the order of the step's output field definition. Again, in this example it would be possible to hardcode the indices as 0 for property_name and 1 for property_value. But suppose that in a different scenario a non-empty input row really was copied and the script needed to append fields at the end of it. The script shows how to properly determine the index of the first field added by the JavaScript step: it is the number of fields in the input row. In the example this is 0, of course, so 0 is the index of the property_name field in the row. A call to the putRow function wraps things up and sends the new row downstream.
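As a quick sketch of that scenario, assume the incoming rows already carried three fields of their own (the field names do not matter here); the same index arithmetic then appends the two new fields behind them:
// hypothetical: getInputRowMeta().size() == 3, getOutputRowMeta().size() == 5
var row = createRowCopy(getOutputRowMeta().size()); // keeps the 3 input values, total length 5
var idx = getInputRowMeta().size(); // 3: the first slot after the existing input fields
row[idx++] = propName;  // lands at index 3 (property_name)
row[idx++] = propValue; // lands at index 4 (property_value)
putRow(row);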
The last line of the script is a bit curious. The predefined variable trans_Status has a special purpose in a JavaScript step: it controls what happens to the current row after it has been processed. It can take three predefined values: CONTINUE_TRANSFORMATION (the default) sends the current row downstream, SKIP_TRANSFORMATION silently drops it, and ERROR_TRANSFORMATION flags it as an error.
The possible values are available in the Transform Constants folder on the left side of the JavaScript step editor for quick access. In the example a single input row is used as a trigger to execute the JavaScript step. This trigger row is not supposed to be put downstream. Setting the trans_Status variable to SKIP_TRANSFORMATION achieves exactly that.
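As a side note, the same variable is commonly used for plain row filtering. The sketch below is not part of the example and assumes a hypothetical numeric input field named amount, which the JavaScript step exposes as a variable of the same name:
// pass rows with a positive amount downstream, drop all others
if (amount > 0) {
  trans_Status = CONTINUE_TRANSFORMATION;
} else {
  trans_Status = SKIP_TRANSFORMATION;
}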
When previewing the transformation, we see that all the system properties are generated nicely.
As a final word, I would like to point out that the presented technique is suitable only for moderate numbers of rows (perhaps a few million). A JavaScript step will always be outperformed by a proper Java plugin. That said, there are a few situations during development where it can make your life a bit easier: when working on a proof of concept, when you need to generate specific test data, or when you need a dummy placeholder for a plugin that is still under development by somebody else. Should you feel tempted to generate massive amounts of data with this technique, however, you are more likely to achieve acceptable performance by writing a proper Kettle plugin or by getting your data in the form of a data dump.