XML Processing Model - Language Syntax


Proposed by Alessandro Vernet, Orbeon Inc.
First version: July 13, 2006. Last revision: July 26, 2006.

Use case

To illustrate the language syntax, we consider here a "real life" example. The pipeline takes as input the name of a large file, uses a custom component that parses the file, and from its content generates a sequence of small XML documents. Each document is validated. If it is valid, it is imported into an XML database; otherwise, an error document is created. The pipeline returns a document with all the validation errors. See the complete pipeline for this use case.

Pipeline input and outputs

<p:pipeline xmlns:p="...">
   <p:declare-input port="dump-filename" name="file"/>
   <p:declare-output port="errors" ref="aggregated-errors"/>

pipeline := (declare-input*, declare-output*, statement+)
declare-input := ($port, $name)
declare-output := ($port, $ref)
statement := (step | for-each | choose) 

This pipeline has one input and one output. Seen from the outside, their names are 'dump-filename' and 'errors'. The input is assigned the name 'file', which is then used to reference that input inside the pipeline. The output comes from 'aggregated-errors', a name defined inside the pipeline.

Step

<p:step kind="vendor:parse-dump">
   <p:input port="filename" ref="file"/>
   <p:output port="documents" name="documents-to-import"/>
</p:step>

step := ($kind, input*, output*)
input := ($port, $ref | $href | {any content})
output := ($port, $name)

  • <p:input> and <p:output> are used in a step, whereas <p:declare-input> and <p:declare-output> declare the inputs and outputs of a pipeline. The different names reflect the different semantics of those elements. On <p:input> and <p:output>, the 'port' attribute always carries the input or output name defined by the component.
  • On the input, the ref="file" is a reference to the name 'file'. Instead of 'ref', the 'href' attribute can be used to get the data from a URI, as in: href="filename.xml".
  • On the output, name="documents-to-import" names the output. This name is referenced later in the pipeline.
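The name binding described above can be modelled in a few lines of Python. This is only a sketch under stated assumptions, not part of the proposal: documents are kept as plain strings, `run_step` is a hypothetical helper, and `parse_dump` is a made-up stand-in for the 'vendor:parse-dump' component.

```python
from typing import Callable, Dict, List

Documents = List[str]  # assumption: a sequence of documents, kept as strings


def run_step(component: Callable[[Dict[str, Documents]], Dict[str, Documents]],
             inputs: Dict[str, str],
             outputs: Dict[str, str],
             bindings: Dict[str, Documents]) -> None:
    """Run one step.

    inputs maps a component port to a 'ref' (an existing name);
    outputs maps a component port to a 'name' (a new binding).
    """
    # Resolve each ref="..." against the names already defined in the pipeline.
    resolved = {port: bindings[ref] for port, ref in inputs.items()}
    produced = component(resolved)
    # Each name="..." makes the port's result visible to later statements.
    for port, name in outputs.items():
        bindings[name] = produced[port]


def parse_dump(inputs: Dict[str, Documents]) -> Dict[str, Documents]:
    # Hypothetical stand-in for vendor:parse-dump: emits two tiny documents.
    filename = inputs["filename"][0]
    docs = [f"<doc source='{filename}' n='{i}'/>" for i in range(2)]
    return {"documents": docs}


# 'file' plays the role of the name given by <p:declare-input>.
bindings: Dict[str, Documents] = {"file": ["dump.txt"]}
run_step(parse_dump,
         inputs={"filename": "file"},
         outputs={"documents": "documents-to-import"},
         bindings=bindings)
```

After the call, 'documents-to-import' is a name that later statements can reference with 'ref', in the same way the step referenced 'file'.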

For-each

<p:for-each ref="documents-to-import" name="source-document">
   <p:for-each-output name="sequence-of-errors" ref="error"/>
 

for-each := ($ref | $href, $select?, $name, for-each-output*, statement+)
for-each-output := ($name, $ref)

In this example, <p:for-each> has two attributes, 'ref' and 'name'. Instead of 'ref' it can have an 'href' attribute, and it accepts an optional 'select' attribute:

  • The 'ref' attribute is a reference to the sequence of documents we want to iterate over. To be consistent with <p:input>, the 'href' attribute can be used instead of 'ref' to reference a URI.
  • If present, the optional 'select' attribute contains an XPath expression. It is evaluated on the sequence of documents pointed to by 'ref' or 'href'. The returned node-set must contain only elements, each of which is made into a document of the resulting sequence.
  • The name defined with the 'name' attribute is visible inside the <p:for-each> and is used to reference the document for the current iteration.

<p:for-each-output> has two attributes, 'name' and 'ref':

  • The 'name' attribute defines a name for the output of the <p:for-each> that will be referenced outside of the <p:for-each>.
  • The 'ref' attribute is a reference to a name defined inside the <p:for-each>.
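The iteration semantics can be sketched in Python as follows. Several things here are assumptions rather than part of the proposal: documents are strings, the 'select' expression is limited to the XPath subset supported by the standard library's ElementTree and is applied relative to each document's root element, and the per-iteration outputs are simply concatenated into one sequence. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Optional


def select_documents(documents: List[str],
                     select: Optional[str] = None) -> List[str]:
    """Apply the optional 'select' and turn each matched element into a document."""
    if select is None:
        return list(documents)
    selected = []
    for doc in documents:
        root = ET.fromstring(doc)
        # ElementTree only understands a limited XPath subset (assumption).
        for element in root.findall(select):
            selected.append(ET.tostring(element, encoding="unicode"))
    return selected


def for_each(documents: List[str],
             body: Callable[[str], Dict[str, List[str]]],
             output_ref: str,
             select: Optional[str] = None) -> List[str]:
    """Run body once per document and collect the name given by output_ref."""
    collected = []
    for current in select_documents(documents, select):
        local_names = body(current)  # names defined inside this iteration
        # Assumption: outputs of all iterations are concatenated in order.
        collected.extend(local_names[output_ref])
    return collected


def validate(current: str) -> Dict[str, List[str]]:
    # Made-up body: only the record with id="2" is reported as an error.
    record_id = ET.fromstring(current).get("id")
    errors = [f"<error record='{record_id}'/>"] if record_id == "2" else []
    return {"error": errors}


batches = ['<batch><record id="1"/><record id="2"/></batch>',
           '<batch><record id="3"/></batch>']
sequence_of_errors = for_each(batches, validate, "error", select="record")
```

Here `output_ref` plays the role of the 'ref' attribute on <p:for-each-output>, and the returned list corresponds to what the 'name' attribute exposes outside the loop.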

Choose/when/otherwise

<p:choose ref="is-valid">
   <p:choose-output name="error" ref="error"/>
   <p:when test="/validity != 'true'">...</p:when>
   Other <p:when>
   Optional <p:otherwise>
</p:choose>

choose := ($ref | $href, choose-output*, when+, otherwise?) 
choose-output := ($name, $ref)

when := ($test, statement+)
otherwise := (statement+)

<p:choose> evaluates the XPath expression in each 'test' attribute on the sequence of documents named 'is-valid'. Here again you could have href="..." instead of ref="...". The <p:when> elements are considered in order, and the first one whose test returns true() is executed. If none returns true() and there is a <p:otherwise>, then the <p:otherwise> is executed.
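The branch selection can be sketched in Python as shown here. This is an illustration under assumptions: the standard library has no full XPath engine, so each 'test' is modelled as a Python predicate; `is_invalid` hand-codes the meaning of /validity != 'true'; and returning an empty result when nothing matches and there is no <p:otherwise> is a guess, since the text above does not define that case.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Optional, Tuple

Branch = Callable[[str], Dict[str, List[str]]]


def choose(document: str,
           whens: List[Tuple[Callable[[str], bool], Branch]],
           otherwise: Optional[Branch] = None) -> Dict[str, List[str]]:
    """Execute the first branch whose test holds, else the optional otherwise."""
    for test, branch in whens:
        if test(document):
            return branch(document)  # later <p:when> elements are not tried
    if otherwise is not None:
        return otherwise(document)
    return {}  # assumption: nothing is executed when no branch applies


def is_invalid(document: str) -> bool:
    # Hand-written equivalent of the XPath test /validity != 'true':
    # true only if the root is <validity> and its string value is not 'true'.
    root = ET.fromstring(document)
    return root.tag == "validity" and "".join(root.itertext()) != "true"


def make_error(document: str) -> Dict[str, List[str]]:
    return {"error": ["<error/>"]}  # made-up error document


def no_error(document: str) -> Dict[str, List[str]]:
    return {"error": []}


rejected = choose("<validity>false</validity>", [(is_invalid, make_error)], no_error)
accepted = choose("<validity>true</validity>", [(is_invalid, make_error)], no_error)
```

The dictionary returned by the executed branch stands for the names defined inside it, from which <p:choose-output> would pick the one named by its 'ref' attribute.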

References