Not XML

One might hope that a DAV markup language could be cobbled together from some scheme that already exists, or supported by a slight modification of something we already know and use. (See some options and their history.) One might have hoped, for instance, that SGML (Standard Generalized Markup Language, the successor to GML) or XML (Extensible Markup Language, a simplified subset of SGML) could be used as a platform for encoding justification, arguments and evidence streams. XML has several desirable properties, including extensibility that allows definition of arbitrary new tags, support for Unicode, broad adoption and a widespread user base, rigorous specification, and existing applications in science, mathematics, measurement, and data handling. However, it does not seem possible to use XML for our purposes for a few reasons, mainly because (1) it doesn't allow overlapping elements, (2) it uses draconian error handling, and (3) it is ugly, unintuitive and fragile. Below is a synopsis of the main features of XML:

Supports essentially all Unicode

tag is a markup construct that begins with < and ends with >, of which there are three types:
start-tag, such as <section>
end-tag, such as </section>
empty-element tag, such as <line-break />

element is a document part that begins with a start-tag and ends with a matching end-tag, or is an empty-element tag

content is the string of characters, if any, between a start-tag and its end-tag
content may also contain markup, including other elements, called child elements

attribute is a markup construct consisting of a name–value pair that exists within a start-tag or empty-element tag
e.g., <img src="iggy.jpg" alt="Iggy Pop" />, where "src" and "alt" are the attributes
an attribute can appear at most once in an element
an attribute can only have a single value, although multiple values can be encoded as a delimited list
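The vocabulary above (element, content, child elements, attributes) can be seen in a minimal sketch using Python's standard xml.etree.ElementTree module. The sample document and the helper name `parses` are made up for illustration and are not part of any DAV proposal.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
doc = '<section id="intro"><title>Hello</title><line-break /></section>'

root = ET.fromstring(doc)

tag_name = root.tag                          # name of the outermost element
attributes = root.attrib                     # name-value pairs from the start-tag
child_names = [child.tag for child in root]  # child elements, in document order
title_text = root.find("title").text         # character content of <title>


def parses(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# An attribute may appear at most once in an element, so this is rejected.
duplicate_ok = parses('<img src="a.jpg" src="b.jpg" />')

# Multiple values have to be packed into one attribute value,
# here as a space-delimited list that the application splits itself.
classes = ET.fromstring('<p class="note warning" />').get("class").split()

print(tag_name, attributes, child_names, title_text, duplicate_ok, classes)
```

Note that splitting the delimited list is left entirely to the application; the parser only ever sees a single string value.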

Content including the characters <, >, and & must represent them as &lt;, &gt;, and &amp;
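As a small sketch of what this escaping looks like in practice, the following uses `escape` and `unescape` from Python's standard xml.sax.saxutils module. The sample string is made up for illustration.

```python
from xml.sax.saxutils import escape, unescape
import xml.etree.ElementTree as ET

# Hypothetical content containing all three special characters.
raw = "if a < b && b > c"

# Replace &, <, and > with their entity references.
escaped = escape(raw)

# Once escaped, the text can be placed inside an element and parsed;
# the parser hands back the original characters.
roundtrip = ET.fromstring("<code>" + escaped + "</code>").text

print(escaped)
print(roundtrip == raw, unescape(escaped) == raw)
```

Every author or tool that writes content has to remember this step, which is part of why hand-written XML tends to be fragile.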

Comments begin with <!-- and end with -->
comments cannot be nested (and cannot even include double-hyphen "--")
comments cannot include characters outside the character set of the document encoding
but can include un-escaped &
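The comment rules can be checked with a short sketch, assuming Python's standard xml.etree.ElementTree module (which delegates to the expat parser). The helper name `parses` and the sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Un-escaped & and < are allowed inside a comment.
plain_comment = parses("<doc><!-- R&D notes, a < b --></doc>")

# A double hyphen inside a comment is not allowed.
double_hyphen = parses("<doc><!-- first -- second --></doc>")

# Nesting fails for the same reason: the inner "<!--" contains "--".
nested = parses("<doc><!-- outer <!-- inner --> still outer --></doc>")

print(plain_comment, double_hyphen, nested)
```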

Draconian error handling (refuses to process a document if it has any errors)
allows only properly encoded legal Unicode characters
no appearances of < or & that aren't part of markup
tags delimiting elements are correctly nested, with none missing and none overlapping
tag names are case-sensitive; the start-tag and end-tag must match exactly
tag names cannot contain any of the characters !"#$%&'()*+,/;<=>?@[\]^`{|}~, nor a space character
tag names cannot begin with "-", ".", or a numeric digit
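To make the draconian behavior concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The helper name `well_formed_error` and the sample documents are made up for illustration. Each flawed sample breaks one of the rules listed above, and in each case the parser rejects the whole document instead of recovering.

```python
import xml.etree.ElementTree as ET


def well_formed_error(text):
    """Return None if text parses, otherwise the parser's error message."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as err:
        return str(err)


# Hypothetical samples: one well-formed document and five flawed ones.
samples = {
    "properly nested": "<p><b>bold <i>both</i></b></p>",
    "overlapping elements": "<p><b>bold <i>both</b> italic</i></p>",
    "missing end-tag": "<p><b>bold</p>",
    "case mismatch": "<Section>text</section>",
    "stray ampersand": "<p>AT&T</p>",
    "name starts with digit": "<2nd>text</2nd>",
}

results = {label: well_formed_error(text) for label, text in samples.items()}

for label, error in results.items():
    status = "accepted" if error is None else "rejected: " + error
    print(label + ": " + status)
```

The overlapping case is the one that matters most for our purposes: a span of bold text that starts before an italic span and ends inside it cannot be expressed directly as XML elements at all.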
