ATE Dublin Core Spec

We selected the Dublin Core specification because it is simple (designed to be used by non-catalogers as well as resource description specialists), interoperable across many domains, readily extensible, well supported, consistent with HTML documents, and garnering international support. Its simplicity and coverage allow us to map the kinds of bibliographic support that ARTFL has provided for years onto an easy-to-manage encoding.

For the default functionality of PhiloLogic bibliographic control, we rely on a selected subset of the 15 base data elements of the Dublin Core. The 15 base elements are (shown with ATE-specific contents):

<head>

<meta name="DC.title" content="TITLE GOES HERE">

<meta name="DC.creator" content="AUTHOR GOES HERE">

<meta name="DC.publisher" content="PUBLISHER INFO HERE">

<meta name="DC.date" content="YEAR FIRST PUB">

<meta name="DC.type" content="GENRE OR TYPE">

<meta name="DC.identifier" content="SHORT CITATION">

<meta name="DC.contributor" content="Editor or other">

<meta name="DC.subject" content="TO BE DETERMINED">

<meta name="DC.format" content="ATE">

<meta name="DC.language" content="fr">

<meta name="DC.description" content="TO BE DETERMINED">

<meta name="DC.relation" content="TO BE DETERMINED">

<meta name="DC.coverage" content="TO BE DETERMINED">

<meta name="DC.source" content="TO BE DETERMINED">

<meta name="DC.rights" content="ARTFL:1998">

</head>

This information is mapped to a refer bibliography, which PhiloLogic uses to build the bibliographic data handler. The bibliography contains the following information in the standard layout:

%a Rousseau, J.-J.

%T Discours sur Sciences et Arts

%D 1750

%Y traite ou essai

%P In Oeuvres Completes, T.3. Paris, Gallimard, 1964.

%S DiScA
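As a rough illustration of this mapping, the sketch below reads DC meta tags from an HTML header and emits a refer-style record. The tag assignments in DC_TO_REFER are inferred from the sample record above and are an assumption, as are the names DCMetaParser and dc_to_refer; PhiloLogic's actual loader may work differently.

```python
from html.parser import HTMLParser

# Assumed mapping from DC element names to refer tags, inferred from the
# sample record above; this is not taken from PhiloLogic's own code.
DC_TO_REFER = {
    "DC.creator": "%a",
    "DC.title": "%T",
    "DC.date": "%D",
    "DC.type": "%Y",
    "DC.publisher": "%P",
    "DC.identifier": "%S",
}


class DCMetaParser(HTMLParser):
    """Collect (name, content) pairs from <meta name="DC.*"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        if name.startswith("DC."):
            self.fields.append((name, attr.get("content") or ""))


def dc_to_refer(html_text):
    """Return a refer-style record for the mapped DC fields in html_text."""
    parser = DCMetaParser()
    parser.feed(html_text)
    parser.close()
    lines = []
    # Emit fields in the order of the sample record; unmapped DC
    # elements (DC.language, DC.rights, ...) are skipped.
    for dc_name, refer_tag in DC_TO_REFER.items():
        for name, content in parser.fields:
            if name == dc_name:
                lines.append(f"{refer_tag} {content}")
    return "\n".join(lines)
```

Repeated elements, such as several DC.creator lines, each produce their own line in the output.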

At this time, the default bibliographic representation allows for searching on

    • Author (DC.creator)
    • Title (DC.title)
    • Date of Publication (DC.date)

and displays in addition to the above,

    • Publisher (DC.publisher)
    • Short Citation (DC.identifier)

We will certainly be implementing optional treatment for other DC fields, as defined below and/or by the default DC specification. Most important will be

    • Genre or type (DC.type)
    • Subject (DC.subject)

ATE adapts the Dublin Core specification to reflect the kinds of information that we typically want to have in basic database functionality. Some of these adaptations may not fit the semantics that are being defined by DC. Finally, as noted below, PhiloLogic does not currently do anything with a number of DC elements. We strongly recommend that these elements be used, as specified, for future compatibility. We expect that some or all of this information will be used in PhiloLogic and in other systems.

Recommended Dublin Core Information

The following indicates how ARTFL and other internal projects will code documents using Dublin Core for use in PhiloLogic.

All information should use HTML character entities for accents, and should give full names, titles and other data (rather than old ARTFL abbreviations), except where specified below. Following the ARTFL-specific backward compatibility notes is not required; they are provided here to facilitate our internal transition. Each field is discussed below.
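For projects that hold their bibliographic data as plain Unicode text, the conversion to HTML character entities can be scripted. The following is a minimal sketch using Python's standard html.entities table; the function name to_html_entities is hypothetical, and the fallback to numeric references for characters without a named entity is an assumption, not an ATE requirement.

```python
from html.entities import codepoint2name


def to_html_entities(text):
    """Replace non-ASCII characters with named HTML entities.

    Characters that have no named entity fall back to a numeric
    character reference. ASCII characters are left untouched.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)
        elif code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")
        else:
            out.append(f"&#{code};")
    return "".join(out)
```

Applied to a word such as "Poésie", this yields "Po&amp;eacute;sie" in the page source, that is, the entity form that would be placed in the content attribute.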

<meta name="DC.title" content="TITLE GOES HERE">

Using full HTML character entities, we want, as far as possible, to use the DC.title to reflect a bibliographic entry for the title of the document. In cases where this work is extracted from a "complete works" or other compilation, the title should refer to the individual work. Edition or compilation will appear in the DC.publisher field (see below). Example:

<meta name="DC.title" content="Poésie: Les Poèmes Dorés, Idylles et Légendes, Les Noces Corinthiennes">

It is preferred that the title be kept on a single line, however long, rather than split across several lines.

<meta name="DC.creator" content="AUTHOR GOES HERE">

Using full HTML character entities, we want, as far as possible, to use the DC.creator to reflect a bibliographic entry for the author(s), showing the last name first, followed by a comma, followed by the rest of the name. Thus Jules-Amédée Barbey d'Aurevilly should be encoded as:

<meta name="DC.creator" content="Aurevilly, Jules-Amédée Barbey d'">

There will be confusing cases, which will have to be resolved when they arise. Multiple authors should be encoded separately using multiple DC.creator lines.

<meta name="DC.publisher" content="PUBLISHER INFO HERE">

Using full HTML character entities, we want, as far as possible, to use the DC.publisher to reflect a bibliographic entry for the publication information related to the document. In most cases, this will be the printed edition of the document which was scanned or keyboarded. In the simple case of a reprinted document, without an editor, this would be encoded:

<meta name="DC.publisher" content="Paris: Droz, 1951.">

However, in order to remain backwards compatible with the layout of the existing ARTFL publication information, we will also duplicate editors' names in DC.publisher (and ALSO put this information in the DC.contributor field). Thus:

<meta name="DC.publisher" content="Ed. A. Garnier et J. Plattard. Paris: Droz, 1951.">

In cases where the document has been extracted from a collection or complete works, this should also be encoded here:

<meta name="DC.publisher" content="Oeuvres complètes de Montesquieu, Roger Caillois, ed. Vol.1&2, Paris: Bibliothèque de la Pléiade, 1949-1951.">

It is important to recognize that the information in DC.publisher is provided primarily as a bibliographic control for users of ARTFL databases and will not be used as a component for searching the bibliography. This is a non-standard use of the DC.publisher tag, which is supposed to denote:

The entity responsible for making the resource available in its present form, such as a publishing house, a university department, or a corporate entity.

<meta name="DC.date" content="YEAR FIRST PUB">

ARTFL has used a single four-digit date to indicate either year of first publication or year of composition. This is a vital field, since ARTFL results are sorted, by default, in chronological order. Ideally, the DC.date will have only a single year:

<meta name="DC.date" content="1789">

This is preferred because it provides backwards compatibility with the main ARTFL database. I expect that we will have to implement slightly amended date managers in the system to reflect 1) works that are published over a range of years, i.e., 1789-1794, and 2) works that cannot be assigned a single date. Thus, only where required, represent these as:

<meta name="DC.date" content="1789-1794">

<meta name="DC.date" content="1250c">

This use of DC.date does not correspond precisely to the usage of Dublin Core, which defines it as a "date associated with the creation or availability of the resource" and typically includes date:month, and can have time of day as well. PhiloLogic can be configured to treat the dates as either arithmetic values or as strings, so use of standard DC dates is supported (in string mode).
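To make the three accepted forms concrete, here is a small sketch of a date parser for arithmetic mode. It is not PhiloLogic code: the name parse_artfl_date is hypothetical, and reading the trailing "c" as a marker for an approximate date is an assumption based on the example above.

```python
import re

# Accepts "1789", "1789-1794", and "1250c". Treating the trailing "c"
# as "approximate" is an assumption drawn from the example in the text.
_DATE_RE = re.compile(r"^(\d{4})(?:-(\d{4})|(c))?$")


def parse_artfl_date(value):
    """Return (start_year, end_year, approximate) or raise ValueError."""
    match = _DATE_RE.match(value.strip())
    if match is None:
        raise ValueError(f"unsupported DC.date value: {value!r}")
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else start
    if end < start:
        raise ValueError(f"date range runs backwards: {value!r}")
    return start, end, match.group(3) == "c"
```

Sorting on the first element of the returned tuple would keep results in chronological order for all three forms. Full DC dates with month or time of day are rejected here, which mirrors the note that such values are only supported in string mode.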

<meta name="DC.type" content="GENRE OR TYPE">

The content of the DC.type is open. For internal ARTFL databases, the content should be selected from the following list of ARTFL classifications, which we received from INaLF:

correspondance

eloquence

melanges litteraires

memoires

pamphlet

poesie

recit de voyage

roman

theatre

traite ou essai

For backwards compatibility with ARTFL, we will NOT use HTML character entities for the DC.type specification. Thus, <meta name="DC.type" content="melanges litteraires">.
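A loader for internal ARTFL databases could check DC.type values against the list above. The sketch below is one possible approach, with hypothetical names (ARTFL_TYPES, normalize_type, is_artfl_type); stripping accents and lowercasing before the comparison is an assumption about how lenient such a check should be.

```python
import unicodedata

# The ARTFL classifications listed above, in their unaccented form.
ARTFL_TYPES = frozenset({
    "correspondance", "eloquence", "melanges litteraires", "memoires",
    "pamphlet", "poesie", "recit de voyage", "roman", "theatre",
    "traite ou essai",
})


def normalize_type(value):
    """Strip accents, lowercase, and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", value)
    unaccented = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(unaccented.lower().split())


def is_artfl_type(value):
    """True if value matches one of the ARTFL classifications."""
    return normalize_type(value) in ARTFL_TYPES
```

A value that fails the check is not necessarily wrong, since non-ARTFL databases may use other DC.type values.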

Note, this is NOT a standard use of the DC.type specification, which is supposed to select from an unspecified range of values, such as "home page, novel, poem, working paper, technical report, essay, dictionary" and so on. ATE and PhiloLogic can use any of these values, but DC is working on the DC.type element and has not yet come to a conclusion or even a list. For non-ARTFL databases, feel free to consult the DC site and select appropriate materials.

<meta name="DC.identifier" content="SHORT CITATION">

The ARTFL engine requires a "Short Citation" Code in order to provide a reasonable level of source identification in KWIC and Concordance reports. We will use (or abuse) the DC.identifier field, defined as:

A string or number used to uniquely identify the resource. Examples for networked resources include URLs and URNs (when implemented). Other globally-unique identifiers, such as International Standard Book Numbers (ISBN) or other formal names are also candidates for this element.

An ARTFL short citation code consists of no more than 16 characters, indicating the author's last name and a title code. Typically, these can use a string of consonants, which allows the user (or systems) to link to full bibliographic records. Example:

Mntgne, Essais1

Mntgne, Essais2

Mntgne, Essais3

Mrmntl, PoeFr

Rss, LeBea

Vltr, Toler

Vltr, EmRu2

Arnd, CoCom

Bnnt, ConNa

Chmfrt, JeInd

Thus, a short citation shall be entered as

<meta name="DC.identifier" content="Mntgne, Essais3">

Note that this is an ARTFL internal identifier only and does not attempt to build an Internet-compatible identifier. For other databases, any reasonably short string should work. The Dublin Core specification suggests that this might be an ISBN or URN.
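The 16-character limit on ARTFL short citations is easy to check mechanically. The following sketch shows one way to do so; the function names are hypothetical, and splitting on the comma reflects only the examples listed above, not a documented rule.

```python
MAX_SHORT_CITATION = 16  # limit stated for ARTFL short citation codes


def check_short_citation(value):
    """Return a list of problems; an empty list means none were found."""
    problems = []
    if not value.strip():
        problems.append("empty")
    if len(value) > MAX_SHORT_CITATION:
        problems.append(f"longer than {MAX_SHORT_CITATION} characters")
    return problems


def split_short_citation(value):
    """Split 'Author, Code' into (author, code).

    The author part is None when the value contains no comma, since
    the comma convention is only inferred from the examples.
    """
    author, sep, code = value.partition(",")
    if not sep:
        return None, value.strip()
    return author.strip(), code.strip()
```

For non-ARTFL databases, where any reasonably short string is acceptable, the length limit could be relaxed or dropped.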

<meta name="DC.contributor" content="Editor or other">

The DC specification reads:

A person or organization not specified in a CREATOR element who has made significant intellectual contributions to the resource but whose contribution is secondary to any person or organization specified in a CREATOR element (for example, editor, transcriber, and illustrator).

We are not currently using this specification in PhiloLogic, but it is clear that we will want to. In ARTFL databases, editors have been put in the publisher field. All other databases should use this field as defined.

<meta name="DC.subject" content="TO BE DETERMINED">

Currently not used in PhiloLogic. Defined as:

The topic of the resource. Typically, subject will be expressed as keywords or phrases that describe the subject or content of the resource. The use of controlled vocabularies and formal classification schemas is encouraged.

We will want to use this.

<meta name="DC.format" content="ATE">

Currently not used in PhiloLogic. Defined as:

The data format of the resource, used to identify the software and possibly hardware that might be needed to display or operate the resource. For the sake of interoperability, FORMAT should be selected from an enumerated list that is under development in the workshop series at the time of publication of this document.

ATE is the format. I am not certain about how useful this will be.

<meta name="DC.language" content="fr">

Currently not used in PhiloLogic. Defined as: Language(s) of the intellectual content of the resource. Where practical, the content of this field should coincide with RFC 1766. See: http://ds.internic.net/rfc/rfc1766.txt

Might be useful.

<meta name="DC.description" content="TO BE DETERMINED">

Currently not used in PhiloLogic. Defined as:

A textual description of the content of the resource, including abstracts in the case of document-like objects or content descriptions in the case of visual resources.

<meta name="DC.relation" content="TO BE DETERMINED">

Currently not used in PhiloLogic. Defined as:

The relationship of this resource to other resources... Users and developers should understand that use of this element is currently considered to be experimental.

For large textual databases, this should be handled internally. We will probably not implement anything in PhiloLogic for this.

<meta name="DC.coverage" content="TO BE DETERMINED">

Currently not used in PhiloLogic. Defined as:

The spatial and/or temporal characteristics of the resource. Formal specification of COVERAGE is currently under development. Users and developers should understand that use of this element is currently considered to be experimental.

Not planning to do anything with this, at the current time. Could be useful.

<meta name="DC.source" content="TO BE DETERMINED">

Currently not used in PhiloLogic. Defined as:

A string or number used to uniquely identify the work from which this resource was derived, if applicable. For example, a PDF version of a novel might have a SOURCE element containing an ISBN number for the physical book from which the PDF version was derived.

I suspect that we could move the contents that we currently have in DC.publisher here. Any suggestions?

<meta name="DC.rights" content="ARTFL:1998">

Currently not used in PhiloLogic. Defined as:

A link to a copyright notice, to a rights-management statement, or to a service that would provide information about terms of access to the resource. Formal specification of RIGHTS is currently under development. Users and developers should understand that use of this element is currently considered to be experimental.

So, we'll just specify the project and year of data creation, if it is an internal database project.

We might want to use this as a way to set file access control limits. In ARTFL, for example, we have documents that have a 3-page limit (both ways from a hit) and a 300-character limit. A controlled vocabulary could be defined.

ARTFL Project, University of Chicago.

Revision Date: date here, please

Leonid Andreev, leonid AT math DOT harvard DOT edu

Mark Olsen, mark AT barkov DOT uchicago DOT edu