This section defines the format of the symbolic names used within STEMMA. This includes the Key attribute associated with the top-level entities and narrative text, the Dataset Name attribute, and the names of any Citation/Resource parameters.
Symbolic names are composed of ASCII alphanumeric characters (i.e. A-Z, a-z, 0-9), underscore, and hyphen, but must start with a letter or underscore. They are case-sensitive.
Q: What length limits should apply? Any limits should only be for the purpose of a sanity constraint. Q: Should any other characters be allowed?
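As an illustration only, the character rule above could be checked with a regular expression. This is a minimal sketch: the names SYMBOLIC_NAME and is_symbolic_name are hypothetical and not part of STEMMA, and no length limit is enforced because that question is still open.

```python
import re

# First character: ASCII letter or underscore.
# Remaining characters: ASCII letters, digits, underscore, or hyphen.
# No length limit is applied here (an open question in the text above).
SYMBOLIC_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")


def is_symbolic_name(name: str) -> bool:
    """Return True if name satisfies the symbolic-name character rule."""
    return SYMBOLIC_NAME.fullmatch(name) is not None
```

Because names are case-sensitive, a plain string comparison is enough to test two names for equality; no case folding should be applied.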
Entity Key names are local to the respective Dataset. If references need to be analysed across multiple Datasets (e.g. when comparing them) then their Key names should be decorated with the parent Dataset name in order to make them unique, e.g. TonysTree:Tony123. If multiple STEMMA Documents are loaded concurrently then a similar decoration must be applied using the Document names. This effectively defines a hierarchical namespace and the syntax of the decorations is designed to follow the precedent set by the URN standard.
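The decoration described above might be sketched as follows. The function name qualify_key and the Document name "FamilyDoc" are hypothetical, and placing the Document name in front of the Dataset name, again separated by a colon, is an assumption based on the phrase "a similar decoration" rather than something this section states.

```python
from typing import Optional


def qualify_key(key: str, dataset: str, document: Optional[str] = None) -> str:
    """Decorate a Key name with its Dataset name, and optionally a Document name.

    Assumption: the Document name is prefixed using the same colon
    separator that the text shows for the Dataset name.
    """
    parts = [dataset, key]
    if document is not None:
        parts.insert(0, document)
    return ":".join(parts)


print(qualify_key("Tony123", "TonysTree"))               # TonysTree:Tony123
print(qualify_key("Tony123", "TonysTree", "FamilyDoc"))  # FamilyDoc:TonysTree:Tony123
```

Since a colon is not a permitted character in a symbolic name, a decorated name can be split back into its components on the colon without ambiguity, provided Document names follow the same character rule (which this section does not confirm).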
All Key names are considered to be at the same scope level within their Dataset. That is, even though some entities may have named sub-elements, the names of those sub-elements are not considered to be subordinate to that of their parent entity. Although a nested naming scheme was considered, it would introduce an unjustifiable complexity since some cases involve three levels (e.g. Person->Narrative->Text) and access to the sub-elements is not constrained by those same parents. A reference such as Key.Key2 might otherwise require decomposition into separate attributes to facilitate XML query languages such as XPath, XSL Pattern, etc.
There are a number of vocabularies implemented in the attribute values and element data used by STEMMA, and these may be controlled (predefined) or partially controlled (extensible). These include type names, role names, mode names, and Property names. As the section Extended Vocabularies explains, these values constitute Fragment identifiers and may be qualified with their corresponding namespace URI. They are therefore case-sensitive and limited to the ‘unreserved character’ set defined in RFC3986 (URI: Generic Syntax). This is similar to the aforementioned character set but additionally includes period and tilde.
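For comparison with the symbolic-name rule, the unreserved set of RFC3986 consists of ASCII letters, digits, hyphen, period, underscore, and tilde. A minimal sketch of a check against that set is shown here; the name is_vocabulary_term is hypothetical, and the sketch assumes there is no restriction on the first character, since this section does not state one for vocabulary values.

```python
import re

# RFC 3986 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
# Assumption: any unreserved character may appear in any position.
UNRESERVED_TERM = re.compile(r"[A-Za-z0-9._~-]+")


def is_vocabulary_term(term: str) -> bool:
    """Return True if term uses only RFC 3986 unreserved characters."""
    return UNRESERVED_TERM.fullmatch(term) is not None
```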