Configuring the XML Input stage

XML is widely used in enterprises for exchanging messages, and sooner or later you are going to have to extract data from XML messages. For this purpose we use the XML Input stage, which is one of the real time processing stages in the palette. The best way to explain how to use this stage is by example. Let’s have a look at the XML I created for this purpose.
<?xml version="1.0" encoding="UTF-8"?> <Order> <OrderNo>ABCD</OrderNo> <OrderDetail> <Item>Item1</Item> <Qty>2</Qty> <Total>24</Total> <UOM>Each</UOM> <Delivery>C</Delivery> </OrderDetail> <OrderDetail> <Item>Item2</Item> <Qty>4</Qty> <Total>60</Total> <UOM>Each</UOM> </OrderDetail> </Order>
Shown below is the picture of the job design
The XML file is read by means of a sequential file stage. The entire XML content is read as a single record into a column called XML_Input. To make this work, I removed all the newline characters from my XML document so that the whole document sits on one line, which keeps the configuration simple. Shown below is how the data is read in the sequential file stage.
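The newline removal mentioned above can be done with any tool before the file reaches the job. Here is a minimal Python sketch of one way to do it. The helper name flatten_xml is made up for illustration, and the sketch assumes that no element text spans several lines, because it also strips the indentation around each line break.

```python
def flatten_xml(text: str) -> str:
    """Join all lines of an XML document into one line.

    Assumption: element values never span several lines, so stripping the
    whitespace around each line break does not change any data.
    """
    return "".join(line.strip() for line in text.splitlines())


# A small multi-line document used only to demonstrate the helper.
sample = """<?xml version="1.0" encoding="UTF-8"?>
<Order>
  <OrderNo>ABCD</OrderNo>
</Order>
"""

single_line = flatten_xml(sample)
print(single_line)
```

The flattened text can then be written to the file that the sequential file stage reads as one record.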
Described below are the settings I’ve made in my XML Input stage:
- Selected the input column ‘XML_Input’ in Input->XML Source->XML_source_column
- Selected the ‘XML Document’ option in Input->XML Source->Column Content
- Checked the option ‘Repetition Element Required’ in Output->Transformation Settings
- Checked the ‘Replace Nulls with empty values’ option in Output->Transformation Settings
- Set the key column in the output tab as Item
- Provided the XPATH information of each column as shown below
Shown below is the output of the job as per the above settings.
As you can see, a new record is created for every new value of the ‘Item’ column. The Item column is the key, and the element it maps to is called the repetition element.
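To make the idea of a repetition element concrete, here is a rough Python sketch that mimics the behaviour described above using the standard xml.etree.ElementTree module. It is only an analogy, not how the stage works internally. The function name rows_per_item and the list of columns are assumptions for illustration (one column per element of the sample XML), and default="" stands in for the ‘Replace Nulls with empty values’ option.

```python
import xml.etree.ElementTree as ET

# The sample order from this post, on a single line.
XML_DOC = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<Order><OrderNo>ABCD</OrderNo>"
    "<OrderDetail><Item>Item1</Item><Qty>2</Qty><Total>24</Total>"
    "<UOM>Each</UOM><Delivery>C</Delivery></OrderDetail>"
    "<OrderDetail><Item>Item2</Item><Qty>4</Qty><Total>60</Total>"
    "<UOM>Each</UOM></OrderDetail></Order>"
)

# Assumed output columns that live under each OrderDetail element.
DETAIL_COLUMNS = ["Item", "Qty", "Total", "UOM", "Delivery"]


def rows_per_item(xml_text):
    """Emit one row for every OrderDetail that contains an Item element."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    order_no = root.findtext("OrderNo", default="")
    rows = []
    for detail in root.findall("OrderDetail"):
        if detail.find("Item") is None:
            continue  # no key value, so no record is produced
        row = {"OrderNo": order_no}
        for column in DETAIL_COLUMNS:
            # Missing elements become empty strings instead of nulls.
            row[column] = detail.findtext(column, default="")
        rows.append(row)
    return rows


for row in rows_per_item(XML_DOC):
    print(row)
```

Each occurrence of Item starts a new row, while OrderNo, which sits above the repeating structure, is simply repeated on every row.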
Now let’s look at what happens if we change the key column to ‘OrderNo’. All the other settings are kept as they are. The output now has only one record, since there is only one value of ‘OrderNo’. This is how you control when a new record is created. The modified output is shown below.
Let’s now change the key to ‘Delivery’. Only one of the two ‘OrderDetail’ structures has a Delivery value. As can be seen from the output below, only the record that had a value in Delivery has been retrieved from the XML. This is because we have checked the ‘Repetition Element Required’ option, which ensures that a record is dropped if its key column does not have a value.
We can force the output of the other record by unchecking the ‘Repetition Element Required’ option in Output->Transformation Settings. Such a change will output the second record also as shown below.
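The effect of the ‘Repetition Element Required’ option can be imitated in a few lines of Python. This is a sketch of the behaviour described in this post, not of the stage’s actual implementation. The function extract_rows and its required flag are hypothetical names chosen for the example, and only the Item and key columns are output to keep it short.

```python
import xml.etree.ElementTree as ET

# The sample order from this post, on a single line.
XML_DOC = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<Order><OrderNo>ABCD</OrderNo>"
    "<OrderDetail><Item>Item1</Item><Qty>2</Qty><Total>24</Total>"
    "<UOM>Each</UOM><Delivery>C</Delivery></OrderDetail>"
    "<OrderDetail><Item>Item2</Item><Qty>4</Qty><Total>60</Total>"
    "<UOM>Each</UOM></OrderDetail></Order>"
)


def extract_rows(xml_text, key, required):
    """Build one row per OrderDetail, keyed on a child element.

    required=True imitates a checked 'Repetition Element Required' option:
    an OrderDetail without the key element produces no row.
    """
    root = ET.fromstring(xml_text.encode("utf-8"))
    rows = []
    for detail in root.findall("OrderDetail"):
        if required and detail.find(key) is None:
            continue  # key element missing: drop this record
        rows.append({
            "Item": detail.findtext("Item", default=""),
            key: detail.findtext(key, default=""),
        })
    return rows


checked = extract_rows(XML_DOC, "Delivery", required=True)
unchecked = extract_rows(XML_DOC, "Delivery", required=False)
print(len(checked), "row(s) with the option checked")
print(len(unchecked), "row(s) with the option unchecked")
```

With the flag set, only the Item1 record survives. With it cleared, the Item2 record comes through too, with an empty Delivery value.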
I hope this basic example will set you on the right track in configuring your XML Input stage. The XML Output stage is explained in a separate post. Please do go through that too to get the complete picture of the XML stages.
Tip: If your XML contains namespaces and is a large document, I’d advise you to first import the XML definitions into DataStage. Then, while configuring the XML stages, import the namespace declarations as well as the column definitions and XPaths from the imported XML definition. This should make your configuration tasks much easier. The XML definitions can be loaded from XSD and XML files.