Code Generation using XML / XSLT

Automated code generation (in the vein of MDA) has long been a dream, and it is now taking another direction. Instead of trying to solve the world's problems, we are seeing a surge in DSLs / DSMs tailored to more constrained systems, where the results are tangible and the cost benefit is immediate. We discuss here the phase after modeling is complete - the translation procedure which generates equivalent code in a specific language. We will explore the possibility of generating structured language code using XSLT (eXtensible Stylesheet Language Transformations) when the model can be represented in XML format (or reduced to it through a mapping).

Template-based code generation is a widely adopted method in automatic code generation. These templates are governed by parameters that capture the possible variations in the output. Templates also make the generation method largely agnostic of the target language.

XSL Transformation is widely used to translate XML documents to other formats such as XML of a different schema, HTML and even PDF documents. So why not generate high-level or structured language code as the target?

Here I will illustrate how an XML representation of a model can be used along with XSL / XSLT to generate high-level language code. In the process we shall explore the pros and cons and briefly touch upon ways to extend this further. To begin with we will explore representing the model in XML.

Representing high-level language constructs in XML

XML is well suited for representing hierarchically structured information. Since high-level language code can be represented as an AST (abstract syntax tree), which is again hierarchical, it is quite possible to have an XML which maps to the AST. The model's XML need not even cover all the syntactic constructs of the target language. In fact the model can represent information in a much more abstract form that is language agnostic. XSL Transformation over multiple passes can then be performed to yield an XML representation of the model that closely resembles the target language's constructs.

Further, XML also gives us the flexibility to provide templates that represent idioms and programming patterns specific to a language while still providing the raw constructs. For e.g. we could have an XML Schema which can be translated into, say, an STL-based or a non-STL-based implementation in C++. We thus harness the extensibility of XML while also being able to use XML schema validation to ascertain the correctness of the input to each stage.

To what extent we abstract the schema and how many stages of transformation the original XML model undergoes depends on the application. For simplicity we will consider a model in XML that closely resembles the target language's constructs to illustrate the possibilities. The following sections talk in terms of XML, XSL and XPath as we explore this method of code generation, so familiarity with these should keep things simple. (Although not discussed here, employing an XSD helps define clearly the scope of the input, both for the program and the programmer.)

XML model for C Language Constructs

Here we illustrate a model for some of the basic constructs of the C language. The following are XML fragments for certain data representations, followed by the C code they translate to.

<Element Name="ENABLE" Type="byte" Value="1" Qualifier="#define"/>

<Element Name="today" Type="enum" EnumTag="DayOfWeek">Monday</Element>

<Element Name="wordMax" Type="word" Value="65535" Qualifier="const"/>

<Sequence Name="currentStock" Type="struct" StructTag="Stock">
        <Element Name="name" Type="string" Value="hiking boots"/>
        <Element Name="minStock"  Type="int" Value="10"/>
        <Element Name="maxStock" Type="int" Value="25"/>
</Sequence>

<Array Name="fibonacci" Type="int" Size="5" Qualifier="static">
        <Entry Value="0"/>
        <Entry Value="1"/>
        <Entry Value="1"/>
        <Entry Value="2"/>
        <Entry Value="3"/>
</Array>

#define ENABLE 1

enum DayOfWeek today = Monday;

const unsigned short wordMax = 65535;

struct Stock currentStock = {
 "hiking boots"
, 10
, 25
};

static int fibonacci[5] = {
  0 
, 1
, 1 
, 2 
, 3 
};

Each of the XML fragments illustrated above is explained here.
  1. ENABLE is a macro with the value '1'. The Qualifier attribute of the Element tag is used in distinguishing it from a variable definition. Here the Type attribute does not affect the generated code and only serves the model's own purposes. (For e.g. this could be mapped to a UI control based on the type.)
  2. today is a variable of enumerated type DayOfWeek initialized to the value Monday.
  3. wordMax is a constant variable of type unsigned short initialized to the value 65535. Notice that Type - word is mapped to C language's type unsigned short.
  4. currentStock is an instance of structure Stock with three fields - name, minStock and maxStock.
  5. fibonacci is an array of five ints initialized with the first five numbers of the Fibonacci sequence.
And now for the XSL which does the above transformation.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text" encoding="utf-8" indent="no"/>

    <xsl:template match="Element">
        <xsl:choose>
            <xsl:when test="'#define'=@Qualifier">
                <!-- The element is output as a macro definition -->
                <xsl:text>#define </xsl:text>
                <xsl:value-of select="@Name"/><xsl:text>&#32;</xsl:text><xsl:value-of select="@Value"/>
            </xsl:when>
            <xsl:otherwise>
                <!-- The element is output as a variable definition -->
                <xsl:call-template name="outputCQualifier"/>
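                <!-- Function Call!  -->
                <xsl:call-template name="outputType"/><xsl:text>&#32;</xsl:text>
                <xsl:value-of select="@Name"/><xsl:text> = </xsl:text>
                <!-- outputValue falls back to the element's text when @Value is absent (e.g. the enum) -->
                <xsl:call-template name="outputValue"/>
                <xsl:text>;</xsl:text>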
            </xsl:otherwise>
        </xsl:choose>
        <xsl:text>&#13;&#10;</xsl:text>
    </xsl:template>

    <xsl:template match="Sequence">
        <xsl:call-template name="outputCQualifier"/>
        <xsl:text>struct </xsl:text><xsl:value-of select="@StructTag"/><xsl:text>&#32;</xsl:text>
        <xsl:value-of select="@Name"/><xsl:text> =&#13;&#10;</xsl:text>
           <xsl:call-template name="outputValue"/>
        <xsl:text>;&#13;&#10;</xsl:text>
    </xsl:template>

    <xsl:template match="Array">
        <xsl:call-template name="outputCQualifier"/>
        <xsl:call-template name="outputType"/><xsl:text>&#32;</xsl:text>
        <xsl:value-of select="@Name"/>
        <xsl:text>[</xsl:text><xsl:value-of select="@Size"/><xsl:text>] =&#13;&#10;</xsl:text>
           <xsl:call-template name="outputValue"/>
        <xsl:text>;&#13;&#10;</xsl:text>
    </xsl:template>

    <xsl:template name="outputCQualifier">
        <xsl:if test="''!=@Qualifier"><!-- IF-THEN only -->
            <xsl:value-of select="@Qualifier"/><xsl:text>&#32;</xsl:text>
        </xsl:if>
    </xsl:template>

    <xsl:template name="outputType">
        <xsl:choose><!-- SWITCH-CASE using cascaded IF's -->
            <xsl:when test="@Type='byte'">unsigned char</xsl:when>
            <xsl:when test="@Type='word'">unsigned short</xsl:when>
            <xsl:when test="@Type='int'">int</xsl:when><!-- used by the fibonacci array -->
            <xsl:when test="@Type='enum'">
                <xsl:text>enum </xsl:text><xsl:value-of select="@EnumTag"/>
            </xsl:when>
            <xsl:otherwise>
                <xsl:message terminate="no">
                     <xsl:text>#missing type at node: </xsl:text>
                    <!-- Invoke XSL Code to formulate XPath of current node based on the Schema -->
                </xsl:message>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

    <xsl:template name="outputValue">
        <xsl:variable name="TagName" select="name()"/>
        <xsl:choose>
            <xsl:when test="$TagName='Element' or $TagName='Entry'">
                <xsl:choose>
                    <xsl:when test="boolean(@Value)"><!-- IF-THEN -->
                    <xsl:if test="@Type='string'"><xsl:text>"</xsl:text></xsl:if>
                    <xsl:value-of select="@Value"/>
                    <xsl:if test="@Type='string'"><xsl:text>"</xsl:text></xsl:if>
                    </xsl:when>
                    <xsl:otherwise><xsl:value-of select="."/></xsl:otherwise><!-- ELSE -->
                </xsl:choose>
            </xsl:when>
            <xsl:when test="$TagName='Sequence'">
                <xsl:text>{ </xsl:text>
                <xsl:for-each select="Element|Sequence|Array">
                    <xsl:call-template name="outputValue"/>
                    <xsl:if test="position() != last()"><xsl:text>,</xsl:text></xsl:if>
                    <xsl:text>&#32;</xsl:text>
                </xsl:for-each>
                <xsl:text>}</xsl:text>
            </xsl:when>
            <xsl:when test="$TagName='Array'">
                <xsl:text>{ </xsl:text>
                <xsl:for-each select="Entry">
                    <xsl:call-template name="outputValue"/>
                    <xsl:if test="position() != last()"><xsl:text>,</xsl:text></xsl:if>
                    <xsl:text>&#32;</xsl:text>
                </xsl:for-each>
                <xsl:text>}</xsl:text>
            </xsl:when>
        </xsl:choose>
    </xsl:template>

</xsl:stylesheet>
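
The placeholder comment inside the <xsl:message> of outputType could be filled in along the following lines. This is only a sketch: outputNodePath is a hypothetical template name, and the path is built from element names and sibling positions rather than from the schema. The stylesheet below is a stand-alone illustration that warns about every Element whose Type is not byte, word or enum.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text" encoding="utf-8"/>

    <!-- Suppress the built-in copying of text nodes -->
    <xsl:template match="text()"/>

    <!-- Warn about every Element whose Type this sketch does not know -->
    <xsl:template match="Element[not(@Type='byte' or @Type='word' or @Type='enum')]">
        <xsl:message terminate="no">
            <xsl:text>#missing type at node: </xsl:text>
            <xsl:call-template name="outputNodePath"/>
        </xsl:message>
    </xsl:template>

    <!-- Hypothetical helper: writes an XPath-like location of the current node -->
    <xsl:template name="outputNodePath">
        <!-- ancestor-or-self is visited in document order: root element first -->
        <xsl:for-each select="ancestor-or-self::*">
            <xsl:variable name="tag" select="name()"/>
            <xsl:text>/</xsl:text>
            <xsl:value-of select="$tag"/>
            <xsl:text>[</xsl:text>
            <!-- 1-based position among siblings with the same element name -->
            <xsl:value-of select="count(preceding-sibling::*[name() = $tag]) + 1"/>
            <xsl:text>]</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

The message then carries a location of the form /Root[1]/Sequence[1]/Element[2] (Root standing for whatever the document element is called), which is usually enough to find the offending node in the model.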

The magic wand called XSL

The XSL snippet above depicts the essentials for handling the above constructs. (There is much more required for supporting all variants of these constructs.)

A walk-through of this XSL would give a good idea of what kind of transformations can be achieved. It is quite easy to understand the working of this once you draw an analogy with processing in an AST.

Once you get past the mindset of procedural programming it is easy to understand how much more pliable this approach is compared to writing high-level language code to generate code.

Being a functional language, XSL lacks mutable variables, general-purpose loops (<xsl:for-each> only iterates over a set of nodes), global state variables etc. The first two can be worked around by recursively calling named templates, which play the role of functions in XSL (<xsl:template name="function-name">). The last, and many of the other deficiencies (in comparison to procedural languages), can be addressed by exploiting XSL constructs (for e.g. <xsl:message> can be used for reporting warnings / exceptions).
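
As an illustration of recursion standing in for a counted loop, here is a minimal sketch. It assumes, purely for the sake of the example, that an Array may list fewer Entry nodes than its Size and that the remainder should be zero-filled; padWithZeros is a hypothetical template name, and at least one Entry is assumed to be present.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text" encoding="utf-8"/>

    <xsl:template match="Array">
        <xsl:text>{ </xsl:text>
        <xsl:for-each select="Entry">
            <xsl:value-of select="@Value"/>
            <xsl:if test="position() != last()"><xsl:text>, </xsl:text></xsl:if>
        </xsl:for-each>
        <!-- The "loop counter" is passed as a parameter instead of being mutated -->
        <xsl:call-template name="padWithZeros">
            <xsl:with-param name="remaining" select="@Size - count(Entry)"/>
        </xsl:call-template>
        <xsl:text> }</xsl:text>
    </xsl:template>

    <!-- Emits ", 0" once per remaining slot, then stops when none are left -->
    <xsl:template name="padWithZeros">
        <xsl:param name="remaining" select="0"/>
        <xsl:if test="$remaining &gt; 0">
            <xsl:text>, 0</xsl:text>
            <xsl:call-template name="padWithZeros">
                <xsl:with-param name="remaining" select="$remaining - 1"/>
            </xsl:call-template>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>
```

For an input such as <Array Name="buf" Type="int" Size="4"><Entry Value="7"/></Array> this produces { 7, 0, 0, 0 }. Each recursive call receives a smaller value of remaining, which is how the absence of mutable variables is sidestepped.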

At worst any such deficiency can be sorted out by using XSLT extensions. Further, with support for invoking .NET / Java code from XSL becoming prevalent amongst XSLT processors, you are limited only by your imagination in what you can achieve with it ;).

You might wonder if the sample XSL presented here is rather big for the constructs it handles. But we need to weigh the plethora of details it handles - seamless hierarchical traversal, contextual translation, output formatting etc. - against how much code it would take to handle all of these in a high-level language. More importantly, as we support more and more constructs, templates etc., extending the XSL proves to be more elegant.

With so much done in XSL, there is hardly any non-trivial coding left to do!

More power with XPath

While the e.g. presented thus far deals well with some of the simple constructs, there are certain capabilities we've not covered that are crucial for this scheme to scale well. Programming languages support defining a type, value and such in one place and referring to it from elsewhere to avoid repeating the definition. We need a parallel capability in our model XML.

For this purpose we use XPath - a query language for selecting nodes and computing values from an XML document. This helps in representing variables, linked-lists, pointers and others in a very elegant way.

The following example shows how XPath helps in simplifying the XML model by enabling references to parts of the XML document.

<Sequence Type="struct" StructTag="stockChain">
        <Element Name="name" Type="string"/>
        <Element Name="minStock"  Type="int"/>
        <Element Name="maxStock" Type="int"/>
        <Ptr Name="pNextStockItem" Type="struct">
            <Ref StructTag="stockChain"/>
        </Ptr>
</Sequence>
<Sequence Type="struct" StructTag="warehouseStock">
        <Element Name="location" Type="string"/>
        <Element Name="turnAroundTime"  Type="int"/>
        <Ptr Name="pCurrentStock" Type="struct">
            <Ref StructTag="stockChain"/>
        </Ptr>
</Sequence>

Just as we have illustrated references to predefined types, we can represent predefined values as well. The only catch is that XSLT 1.0 inherently cannot evaluate an XPath stored in a string variable and mandates that it be specified inline in the stylesheet.

Now, provided the model's XML Schema stipulates a means to identify type definitions uniquely, we can use an XPath such as //Sequence[@StructTag="stockChain"] when processing Ptr nodes that have a child Ref node with a StructTag attribute.
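
A minimal sketch of how a stylesheet might resolve such a reference is given below. It assumes that the Sequence definitions sit under a single root element of some kind and that each Ptr encloses its Ref child; the warning text is made up for the example. Only the pointer members are emitted, to keep the sketch short.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text" encoding="utf-8"/>

    <!-- Suppress the built-in copying of text nodes -->
    <xsl:template match="text()"/>

    <xsl:template match="Ptr[Ref/@StructTag]">
        <!-- The tag is data held in a variable; the XPath using it is written inline -->
        <xsl:variable name="tag" select="Ref/@StructTag"/>
        <xsl:choose>
            <!-- The reference is valid only if a matching definition exists -->
            <xsl:when test="//Sequence[@StructTag = $tag]">
                <xsl:text>struct </xsl:text><xsl:value-of select="$tag"/>
                <xsl:text> *</xsl:text><xsl:value-of select="@Name"/>
                <xsl:text>;&#10;</xsl:text>
            </xsl:when>
            <xsl:otherwise>
                <xsl:message terminate="no">
                    <xsl:text>#unresolved struct reference: </xsl:text>
                    <xsl:value-of select="$tag"/>
                </xsl:message>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>
```

Note that a variable may appear inside an inline XPath expression, as $tag does here; what is not possible is handing the processor an entire XPath as a string and asking it to evaluate that. For larger models, declaring an <xsl:key> on StructTag and looking definitions up with key() would likely be a more efficient alternative to the // search.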

When to use XSL over other means of code generation?

A prerequisite for using this scheme is that the representation of the model follows a hierarchical structure similar to that of the AST of a high-level language, or can be easily transformed into one. In such cases, a code generator written in a high-level language needs to handle both traversing the hierarchy and all the details involved in code generation. This can quickly grow in code complexity and in the effort involved in scaling it.

Since the schema of the model is well defined at every stage of its transformation, it is easy to perform XML Schema validation using an XML Schema Definition (XSD) file on the input of every stage. This yields a more robust code generation framework. Further, this multi-phase transformation can easily be adapted to target code for different languages, as long as the source model XML is sufficiently agnostic of the target language.
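
To make the idea of stages concrete, here is a sketch of what an early pass might look like. It assumes a hypothetical, more abstract input node, <Constant Name="..." Kind="..." Value="..."/>, which is not part of the model shown earlier, and rewrites it into the Element form used by the C stylesheet. Everything else is copied through unchanged.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" encoding="utf-8" indent="yes"/>

    <!-- Identity rule: copy any node that has no more specific template -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Hypothetical abstract node lowered to the language-near Element form -->
    <xsl:template match="Constant">
        <Element Name="{@Name}" Type="{@Kind}" Value="{@Value}" Qualifier="const"/>
    </xsl:template>
</xsl:stylesheet>
```

The output of this pass is again XML, so it can be validated against the XSD of the next stage before the text-producing stylesheet runs. A pass targeting another language would replace only the later stages.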

One major problem with using XSL is the formatting of the output. XSL does not scale well to all kinds of formatting needs, especially when the formatting of the upcoming output depends on what output has been previously generated and how it was formatted. Not that it is impossible, but we would end up losing the simplicity we've gained by using this scheme.

The XSL scheme is not great in terms of performance either. But performance is usually not as major a constraint for code generators as functionality and extensibility are. Further, relying on XSL extension objects with bindings to high-level languages could worsen the performance even more.

Conclusion

Using XML / XSLT is a lesser-known method for source code generation. When the system's model can be easily expressed in hierarchical and semantically rich XML and the output is based on code templates, this method is very advantageous. With tools such as Altova XMLSpy, StylusStudio and Visual Studio's integrated XSLT Debugger (and more), the development of XSL can be accelerated manyfold.

This improvement from using XSL is most notable when hand-written code generators are considered as the alternative. To what extent this method competes with others, like using the m4 macro processor or StringTemplate, is worth investigating. Nevertheless the versatility, the development / testing effort involved and the intuitiveness of Code Generation using XML / XSLT can overshadow its weaknesses in many template-based source code generation scenarios.
