Semantic Web Annotation Using an Instance Markup Language (IML)

This approach provides a simple means of semantically annotating traditional HTML Web pages so that specified terms can be machine-identified as instances of classes or properties of an ontology located elsewhere on the Web. Traditional HTML Web site developers are not a significant part of the Semantic Web development community, even though both groups build on W3C recommendations (standards). Because of this gap between the two communities, the great majority of HTML Web developers know essentially nothing of Semantic Web technologies and, unfortunately, there are very few incentives to draw them into the Semantic Web camp.

Several existing approaches relate to semantic markup of HTML Web pages. RDFa and XHTML 2 address this issue, but their recommended solutions are very complex and do not support semantic markup of HTML Web pages by typical HTML authors. Other approaches provide a way to mark up Web pages but use loosely defined concepts that do not follow a more formal logic, such as that provided by the W3C OWL description logic language, and as a result do not support reasoning.

The approach recommended here uses an Instance Markup Language (IML) to give HTML authors a way to begin participating immediately in the growth of the Semantic Web. Writing ontologies using the Resource Description Framework (RDF) and the Web Ontology Language (OWL) is a major hurdle in getting started with Semantic Web development. IML addresses this problem by allowing experienced Semantic Web developers to continue creating domain ontologies while allowing traditional HTML authors to leverage the formal semantics of those ontologies by linking terms on their HTML Web pages directly to existing remote domain ontologies.

One of the goals of this recommendation is to facilitate the Semantic Web vision of providing "discrete" answers to Web queries. For example, a standard Web search for "bordeaux wine" would return a large number of Web pages that mention the word Bordeaux, and the searcher would have to read each page hoping to find the desired specific information about Bordeaux wines. An IML semantic search using the same terms would return: Angelus, Armailhac, Ausone, Bellevue Mondotte, Bourgneuf, Croix de Labrie, Les Grandes Murailles, and any other specific references to Bordeaux wine that had been marked up using IML on any HTML Web page. The IML semantic query would also show related terms in the Wine ontology, such as winery, region, color, etc., making it easy to expand the search for more discrete information related to Bordeaux wines. Additionally, the query would provide a hot-link back to the original HTML Web page so you could read what the author had to say about the Bordeaux wine Croix de Labrie. Beyond general markup by wine enthusiasts on their own Web pages, commercial wine merchants would take advantage of IML to semantically mark up their HTML Web sites so that all of their Bordeaux wines would be found via an IML semantic search.

Some of the requirements for IML are:
- No modification needed to existing browsers, which will ignore the IML.
- The IML markup does not change the presentation of the HTML markup in any way.
- Web spiders would automatically search, collect and store the IML markup from HTML pages.
- Applications would use the stored IML markup to provide discrete search capabilities.
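
The last two requirements can be illustrated with a minimal sketch, written here in Python. It assumes a spider has already reduced each marked-up term to a record of ontology class, instance term, and source page. The InstanceIndex class, its method names, and the example.com page addresses are hypothetical and are not part of IML itself.

```python
from collections import defaultdict


class InstanceIndex:
    """Toy store for instance records collected by a spider."""

    def __init__(self):
        # ontology class name -> list of (instance term, source page URL)
        self._by_class = defaultdict(list)

    def add(self, class_name, instance, page_url):
        """Record that `instance` was marked up as a member of `class_name` on `page_url`."""
        self._by_class[class_name.lower()].append((instance, page_url))

    def search(self, class_name):
        """Return every recorded (instance, page URL) pair for the class, sorted by instance."""
        return sorted(self._by_class.get(class_name.lower(), []))


index = InstanceIndex()
# The page URLs below are placeholders, not real sites.
index.add("bordeaux", "Croix de Labrie", "http://example.com/tasting-notes.html")
index.add("bordeaux", "Angelus", "http://example.com/merchant.html")
index.add("bordeaux", "Ausone", "http://example.com/merchant.html")

# A "discrete" search lists the instances themselves, each with a link back to its page.
for instance, page in index.search("Bordeaux"):
    print(f"{instance}  ({page})")
```

A real application would also consult the ontology to offer related terms such as winery or region. This sketch covers only the lookup from a class to its collected instances.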

The Instance Markup Language is described below:

IML element: The IML element denotes that the HTML page contains IML semantic markup. It is located within the head element. It takes the form of <IML localuri /IML> where localuri is the address of the web page containing IML markup. Example: <IML /IML>. The IML element is simply a flag for automated spiders searching for IML content.

XMLNS: The xmlns element denotes the namespace of the ontology to which specific terms in the HTML document are to be linked. All ontologies are required to have their own unique namespace, which is the web location of that ontology. The xmlns element also provides a means to identify a shortcut for the ontology namespace to reduce typing, or cutting and pasting, each time the ontology is referenced in the HTML document. Example: <xmlns:wine="http://>. You can see the advantage of being able to type the designated shortcut "wine" for future references rather than the full wine namespace. 
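
As a rough illustration of how software reading the markup might resolve the shortcut, the Python sketch below expands a shortened reference into a full address. The namespace URL shown is a made-up placeholder rather than the address of the actual wine ontology, and the expand function and PREFIXES table are hypothetical names.

```python
# Hypothetical namespace; this is a placeholder, not the real wine ontology address.
PREFIXES = {"wine": "http://example.org/wine#"}


def expand(qualified_name, prefixes=PREFIXES):
    """Turn 'shortcut:term' into a full URI using the declared namespace shortcuts."""
    prefix, sep, local = qualified_name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"no namespace declared for {qualified_name!r}")
    return prefixes[prefix] + local


print(expand("wine:bordeaux"))
```

A reference whose shortcut was never declared raises an error rather than being guessed at, since a term linked to the wrong ontology would be worse than no link.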

INST: The inst element denotes the term within the HTML markup that is to be designated as an actual instance of a class or property of the referenced ontology. Example: <span inst="wine:bordeaux"> Croix de Labrie </span>. The shortcut "wine" references the wine ontology namespace. The term "bordeaux" indicates the actual class within the wine ontology. The term "Croix de Labrie" is the specific instance of the wine class bordeaux. The use of "span" allows the multi-word term "Croix de Labrie" to be treated as a single instance.
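
To suggest how a spider might pick such terms out of a page, here is a small Python sketch built on the standard library's html.parser module. It assumes inst is written as an ordinary, well-formed attribute on a span element. The InstExtractor class and the extract_instances function are hypothetical names, not part of IML.

```python
from html.parser import HTMLParser


class InstExtractor(HTMLParser):
    """Collects (inst value, term text) pairs from span elements carrying an inst attribute."""

    def __init__(self):
        super().__init__()
        self.instances = []  # finished (inst value, term text) pairs
        self._stack = []     # one entry per open span: None, or [inst value, text parts]

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            inst = dict(attrs).get("inst")
            self._stack.append([inst, []] if inst else None)

    def handle_data(self, data):
        # Text counts toward every annotated span that is currently open.
        for entry in self._stack:
            if entry is not None:
                entry[1].append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._stack:
            entry = self._stack.pop()
            if entry is not None:
                # Collapse runs of whitespace so a multi-word term is stored cleanly.
                text = " ".join("".join(entry[1]).split())
                self.instances.append((entry[0], text))


def extract_instances(html):
    """Return all (inst value, term text) pairs found in an HTML string."""
    parser = InstExtractor()
    parser.feed(html)
    parser.close()
    return parser.instances


page = '<p>I enjoyed the <span inst="wine:bordeaux"> Croix de Labrie </span> last night.</p>'
print(extract_instances(page))
```

Spans without an inst attribute are tracked only so that nested markup closes correctly, and they contribute no records.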

The two examples below show IML being used to provide semantic markup of a standard HTML Web page. The terms being marked up are arbitrarily shown in red.

Example One: Jason Pahlmeyer created a California Bordeaux wine called the Pahlmeyer Red Table Wine from the Napa Wine Company winery in Napa Valley Region, California.

Example Two: My name is John Flynn and I work for BBN Technologies. One of my close friends and co-workers is Mike Dean who is one of the Semantic Web pioneers and the developer of a semantic data triple-store called DAML-DB.

The actual IML markup that results in these examples is shown below:

<p><b>Example One: </b>Jason Pahlmeyer created a California Bordeaux wine called the
<font color="#FF0000">Pahlmeyer Red Table Wine <i4 /i4>
</font>from the <font color="#FF0000">Napa Wine Company <i3 /i3></font>winery in<font color="#FF0000">
Napa Valley Region<i3 /i3></font>, California.</p>

<p><b>Example Two:</b> My name is <font color="#FF0000">John<i1 /i1></font>
<font color="#FF0000">Flynn<i1 /i1></font> and I work for <font color="#FF0000">BBN
Technologies<i2 /i2></font>. One of my close friends and co-workers is
<font color="#FF0000">Mike<i1 /i1></font> <font color="#FF0000">Dean<i1></i1> </font>who is one
of the <font color="#FF0000">Semantic Web<i2 /i2></font> pioneers and the developer of a
semantic data triple-store called <font color="#FF0000">DAML-DB<i1 /i1></font>.</p>

A simple example of how instance markup might be collected by spiders and organized for search purposes or for access by software agents is shown below. The real power of this approach would come when instance data from thousands of HTML web pages were associated with Semantic Web ontologies.