
Dates and Calendars

1        Introduction

2        What does Micro-History Data Need?

3        Time Zones

4        Computer Usage of Dates

5        Imprecision and Granularity

6        Calendars

6.1       Dual Dates

7        ISO 8601

7.1       Quarters

7.2       Calendars

7.3       XML Representation

8        STEMMA Dates

8.1       Date-value strings

9        Events

9.1       STEMMA Events

10         Glossary

 


 

1      Introduction

This paper discusses the usage and representation of dates in micro-history data. It considers variations of dates around the world, both now and historically, and considers their expression in computer-readable form such as STEMMA®.

 

Clock times are not examined in great depth on the basis that they would nearly always be used to represent very recent events, and the modern 12-hour and 24-hour clock times are now ubiquitous.

 

2      What does Micro-History Data Need?

Some people reading this may wonder why dates are such an issue. Surely, you just need a convention for writing them that isn’t susceptible to the well-known day-month/month-day ambiguities. GEDCOM copes with a simple “dd MMM yyyy” format (e.g. “12 OCT 1954”), right?

 

Well, no, there’s more to it than that. Even GEDCOM suffers from portability problems because some products do not adhere strictly to its prescribed convention. However, that date format is still an English form and contrasts with the ISO 8601 international standard. The ISO standard, in turn, only copes with the Gregorian calendar and so cannot be used for other calendar systems such as the Julian calendar. On top of all this, the date will have some degree of imprecision or granularity associated with it that must be represented unambiguously.

 

At the heart of this is a fundamental requirement that some people may not appreciate at first. There is a written form of a date and a computer-readable form, and both are required in the micro-history data. The written form may have been transcribed from some external source but it is still essentially a humanly readable version of that date. A computer-readable version, on the other hand, may be a binary form (i.e. non-printable) or textual; the essential characteristic is that it has a strict definition and unambiguous interpretation. A computer-readable form of a date (including any imprecision or granularity) is required for any type of temporal analysis, including presentation of timelines, or calculation of event constraints.

3      Time Zones

Although Time Zones (TZ) and Daylight Saving Time (DST) are usually applied to local clock times, they can also apply to local calendar dates. Their importance for micro-history is going to be slim at best but the area should be clarified.

 

ISO 8601 does not include any specific TZ designator; time values are either ‘local time’ or expressed with an offset relative to UTC (Coordinated Universal Time). Local date/times should be interpreted in the context of the data location rather than the current location of the end-user, although this would only be significant when creating a timeline across TZ boundaries.

 

4      Computer Usage of Dates

It was mentioned above that computers will require a version of a date that they can read, sort in relation to other dates, and for which they understand the imprecision or granularity associated with it.

 

In reality, all events occur at a specific instant of time, and maybe finish at a different one. Ignoring Einstein for practical purposes, those instants in time are absolute across the globe, and are independent of who is interested in them, where they live, and how they count and represent their own dates and times. However, in practice, there are several things that affect this view of events.

 

Two people recording the same event in different cultures may be using different calendars. In other words, they may have different schemes for recording the passing of days, or the passing of time within a day, and different ways of expressing it. It is still the same event and the same instant in time, but there is more than one way of measuring and recording it, and that can make it difficult to correlate different measurements.

 

Even within the same calendar, there may be different schemes around the world. For the Gregorian calendar, this usually equates with Time Zones where the start of a 24-hour day occurs at a different instant in time. Again, this is merely a measurement difference. Saying an event occurred at 13:20 UTC or 5:20am in PST (US) doesn’t affect the event itself or the instant in time that it occurred at.
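As a minimal sketch of this point, the following Python fragment shows the two clock readings resolving to the same instant. Python is used here purely for illustration; the fixed UTC-8 offset for PST (ignoring DST) and the sample date are assumptions of the sketch.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset for Pacific Standard Time (UTC-8); DST is ignored in this sketch.
PST = timezone(timedelta(hours=-8), "PST")

# One instant in time, measured in two different zones (the date is arbitrary).
utc_reading = datetime(1956, 1, 9, 13, 20, tzinfo=timezone.utc)
pst_reading = utc_reading.astimezone(PST)

print(pst_reading.strftime("%H:%M %Z"))  # the local clock reading in PST
print(utc_reading == pst_reading)        # aware datetimes compare by instant
```

The comparison succeeds because only the measurement differs, not the instant being measured.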

 

Worse still, we even have different ways of writing a specific measured date or time around the world. Writing a date as 9-Jun-1956, or June 9th 1956, or 1956-06-09, or 9 juin 1956, or (ambiguously) 9/6/56 is simply a cultural difference or a personal preference. It’s not even a difference in the measurement scheme.

 

So, should we burden computers with all of this baggage? Should we expect a computer to be able to decipher all our human representations of a date and time, and from all possible calendars? This would not only be a bad design, in software terms, but it is simply impossible or impractical in the general case.

 

Finally, we have granularity and imprecision to think about. We may be identifying a specific month, or year, or century, rather than a specific day. We might also be unsure of the precise day, month, year, or century. These concepts might be implemented in software using a similar mechanism but their semantics are distinct.

 

The ISO 8601 standard defines a representation for the unambiguous exchange of dates and times. However, this has a number of problems, including the fact that it only relates to the Gregorian calendar. Anyway, that’s an exchange format and, in most situations, is not used for comparative purposes.

 

Computer languages and databases generally use a binary format for representing dates and times that is not only smaller and far more efficient, but also devoid of all the issues mentioned above. For instance, a timestamp is the representation of a date and time as an offset from some fixed epoch point. Examples might be milliseconds from 1970-01-01T00:00:00Z (which is a UTC epoch), or 100-nanosecond units from the Smithsonian base date (1858-11-17). Handling imprecision is a matter of having a value for both the upper and lower bounds, and granularity can be handled in a similar fashion despite the difference in semantics.
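The idea of a pair of bounding timestamps can be sketched as follows. This is only an illustration, assuming POSIX seconds from the 1970 UTC epoch and a granularity of one Gregorian month; the function name month_bounds is hypothetical.

```python
import calendar
from datetime import datetime, timezone
from typing import Tuple


def month_bounds(year: int, month: int) -> Tuple[int, int]:
    """Return (lower, upper) POSIX timestamps, in seconds, spanning one Gregorian month."""
    last_day = calendar.monthrange(year, month)[1]  # number of days in that month
    lower = datetime(year, month, 1, 0, 0, 0, tzinfo=timezone.utc)
    upper = datetime(year, month, last_day, 23, 59, 59, tzinfo=timezone.utc)
    return int(lower.timestamp()), int(upper.timestamp())


# January 1970 starts at the epoch itself and ends one second short of 31 days later.
lower, upper = month_bounds(1970, 1)
print(lower, upper)
```

A date known only to within a range (imprecision) could be held in the same two-value form, even though its meaning is different.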

 

In principle, then, the instant in time associated with any event can be represented in terms of one or two timestamps. In practice, though, there are problems with knowing the precise synchronisation between all possible calendars and that makes an algorithmic conversion between them and the Gregorian one very difficult. For instance, the Hindu calendar is in common usage but its dates do not always have an agreed correspondence in the Gregorian calendar. Government of India documents carry three dates: the Christian date, and two disparate Hindu dates.  That has provided an officially recognised synchronisation for the past 50 years, or so, but the association was unreliable before that.

 

From the point of view of micro-history data, each date should be recorded according to its native calendar (leaving conversions to happen in software when comparisons or collations are needed), but that requires international standards for representing dates and times from other calendars and they do not yet exist. We’ll revisit this topic below but let’s assume for now that something akin to ISO 8601 exists for every calendar. That would allow the easy representation of all those dates for exchange purposes.

 

Computers can easily follow their existing paradigm and represent these dates as an offset from some epoch in their respective calendars. The remaining problem is how to correlate an internal timestamp from one calendar epoch with an internal timestamp from a different one. For something like Gregorian and Julian calendars, this may simply be a matter of knowing the exact offset of the two chosen epochs. For a less precise correspondence, though, it may require an adjustment to the imprecision of the timestamp value. However, that should only occur during the correlation of those dates – neither the conversion nor the adjustment to the imprecision should be persisted back to the original data.
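For the Gregorian and Julian case, one way to sketch the correlation is to convert both dates to a common day count. The example below uses the Julian Day Number (JDN) with the widely published integer formulas; the choice of JDN as the common scale is an assumption of this sketch and is not something prescribed by STEMMA.

```python
def _shift(year: int, month: int):
    """Re-base the year so that it starts in March, which puts any leap day last."""
    a = (14 - month) // 12
    return year + 4800 - a, month + 12 * a - 3


def gregorian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a date in the (proleptic) Gregorian calendar."""
    y, m = _shift(year, month)
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - y // 100 + y // 400 - 32045


def julian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a date in the Julian calendar."""
    y, m = _shift(year, month)
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


# 2 September 1666 (Julian) and 12 September 1666 (Gregorian) are the same day.
print(julian_to_jdn(1666, 9, 2) == gregorian_to_jdn(1666, 9, 12))
```

The conversion is used only for the comparison; the original Julian value would remain untouched in the data.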

5      Imprecision and Granularity

These terms refer to distinct attributes of any date value. They’re differentiated in the glossary because they are often confused or lumped into one concept.

 

In micro-history, a recorded date value almost never represents a fleeting moment in time. It has a granularity associated with it which is usually a day (e.g. X was born on 1903-03-17) but it could be a month, a year, a century, etc. It can even be a yearly quarter (e.g. X’s birth was registered in 1903-Q1).

 

Irrespective of the granularity, there may be some degree of imprecision too. Most often, this is a range of values for the elements of the said granularity, e.g. January to July of 1866, or 1911 ± 1. STEMMA Dates incorporate both concepts and its main Web pages provide a table indicating how the two are used together.

 

There are also discontinuous forms of imprecision that must be treated separately. For example, you might know someone’s birthday was on 17th March but you don’t know the year. Similarly, you might know something happened during the spring of 1820-1825. The reason they need special treatment is that they cannot be represented as a valid date in computer systems or in any associated data standard. You may be able to set an upper and lower bound on the overall date but “granules” of specific information such as the day, month, season, etc., can only be used to annotate that range in some way. The aforementioned birthday case is known to cause some products to store a zero for the year, thus making the date useless for comparative purposes and probably illegal on some systems.
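A rough sketch of the "bounds plus annotation" approach is given below. The class name AnnotatedRange and its fields are hypothetical and do not correspond to any STEMMA element; the example simply shows a known day and month annotating a range of possible years.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class AnnotatedRange:
    """Overall bounds on a date, annotated with whatever 'granules' are known."""
    earliest: date
    latest: date
    month: Optional[int] = None  # known month, if any
    day: Optional[int] = None    # known day of the month, if any

    def candidates(self) -> List[date]:
        """List the dates inside the bounds that match the known month and day."""
        if self.month is None or self.day is None:
            return []
        found = []
        for year in range(self.earliest.year, self.latest.year + 1):
            try:
                d = date(year, self.month, self.day)
            except ValueError:  # e.g. 29 February in a non-leap year
                continue
            if self.earliest <= d <= self.latest:
                found.append(d)
        return found


# A birthday of 17th March in an unknown year between 1820 and 1825.
birthday = AnnotatedRange(date(1820, 1, 1), date(1825, 12, 31), month=3, day=17)
print(len(birthday.candidates()))
```

The bounds remain valid for comparative purposes, and no zero or placeholder year is ever stored.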

6      Calendars

A calendar is a mechanism by which dates are reckoned in a given culture, such as Gregorian or Julian. A list of worldwide calendars, including historical ones, may be found at: List_of_calendars. Another useful resource may be found at: Calendar FAQ.

 

As a small example, let’s look at the French Republican Calendar (also called the French Revolutionary Calendar) which was created during the French Revolution. It was partly designed to remove religious and royalist influences, and was one attempt at decimalisation in France.

 

It was adopted on 24 October 1793 but extended backwards to 22 September 1792. There was some initial debate over whether it should celebrate the revolution and begin in 1789 or celebrate the republic and begin in 1792. It was eventually decided that Year I of the French Republic began in 1792.

 

Years are usually written as Roman numerals, beginning with the year of the Republican Era. As a result, Roman numeral I indicates the first year of the republic, that is, the year before the calendar actually came into use. The first day of each year was that of the autumnal equinox.

 

There were twelve months, each divided into three ten-day weeks called décades. The tenth day, décadi, replaced Sunday as the day of rest and festivity. The five or six extra days needed to approximate the solar year were placed after the months at the end of each year.

 

Each day in the Republican Calendar was divided into ten hours, each hour into 100 decimal minutes, and each minute into 100 decimal seconds. Hence one hour was 144 conventional minutes, one minute was 86.4 conventional seconds, and one second was 0.864 conventional seconds.
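The arithmetic can be sketched as follows: a day holds 86,400 conventional seconds and 100,000 decimal seconds, so a conversion is a simple rescaling. The function name is hypothetical and the result is truncated to whole decimal seconds.

```python
from typing import Tuple


def to_decimal_time(hours: int, minutes: int, seconds: int) -> Tuple[int, int, int]:
    """Convert a conventional clock time to Republican decimal (hours, minutes, seconds)."""
    conventional = hours * 3600 + minutes * 60 + seconds  # out of 86,400 per day
    decimal_total = conventional * 100_000 // 86_400      # out of 100,000 per day
    dec_hours, remainder = divmod(decimal_total, 10_000)  # 100 x 100 seconds per hour
    dec_minutes, dec_seconds = divmod(remainder, 100)
    return dec_hours, dec_minutes, dec_seconds


print(to_decimal_time(12, 0, 0))  # noon
print(to_decimal_time(2, 24, 0))  # 144 conventional minutes
```

Noon falls at decimal hour 5, and 144 conventional minutes make exactly one decimal hour.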

 

Irrespective of how such dates were written in France, a computer exchange format only need know the individual elements (e.g. days, months, etc), their numeric ranges, and their relative ordering. Just as with the ISO 8601 all-numeric format, this results in a locale-independent representation that is also textually sortable. In this case, that might result in a format of “yy-mm-ddTh:mm:ss”. Note that the year only has a pair of digits since the calendar was not used for very long. The residual days in each year could be represented against a pseudo-13th month, or a special month value of 99. An epoch value corresponding to the start of the calendar would allow a timestamp concept to be used to internally represent all such dates and times as seconds from that epoch. The exact correspondence of that epoch with the (Gregorian-)computer one is then a separate issue that is independent of the data format.

 

The Julian Calendar was introduced in 46 BC as a replacement for the preceding Roman Calendar. Both of these have to deal with years BC. Although the ISO 8601 standard suggests that dates outside of a 4-digit range (including negative values) can be supported by mutual agreement of the respective parties, this isn’t a defined part of the standard. The presence of extra digits or a minus sign would change the assumed fixed width of the year field, and the minus sign may even be mistaken for a separator. The existing standard is also specific to the Gregorian calendar and should not be taken as applicable to older calendars. For representing such dates, a different base year may need to be used in order to retain the inherent sort capability proposed above. This could be an arbitrary base or one with an historical precedent such as AUC.

 

At the time of writing, there are no international standards for the computer-readable representation of dates and times other than for the Gregorian calendar. See ISO 8601.

 

6.1    Dual Dates

This is a convention for representing multiple years in a single date that is rather messy and can be very confusing. It is also sometimes referred to as double dating since it may also apply to the day of the month in order to represent the Julian-to-Gregorian leap-year correction.

 

The current Gregorian calendar was finally adopted by Britain and the British Empire (which included much of what is now the US) in September 1752[1]. Before that point, they used the Julian calendar. The start of the year was also changed from March 25 to January 1, although this didn’t quite happen at the same time in all locations. Until the Calendar Act 1750, there were alternative years depending on whether the Old Style (OS) or New Style (NS) calendar was in use.

 

The double years are mostly used to indicate both the Gregorian and Julian years in a date that precedes 25th March. For instance: 1 March 1740/41. This was common between 1582, when the first countries adopted the Gregorian calendar, and 1923, when the last European country adopted it. However, even before 1582, the year was still sometimes dual because different countries began their year on different dates. The British OS and NS just add an extra level of confusion to what should have been a straightforward calendar migration.
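A written double year such as the one above can be split into its two component years with a few lines of code. This sketch assumes the common "yyyy/yy" written form with consecutive years; the function name and the accepted layout ("day month yyyy/yy") are assumptions, and real transcriptions vary far more than this.

```python
import re
from typing import Tuple

_DUAL = re.compile(r"(\d{1,2}) ([A-Za-z]+) (\d{4})/(\d{2})")


def split_dual_year(text: str) -> Tuple[int, str, int, int]:
    """Split e.g. '1 March 1740/41' into (day, month name, first year, second year)."""
    match = _DUAL.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a recognised dual date: {text!r}")
    day, month, first, suffix = match.groups()
    first_year = int(first)
    second_year = first_year + 1  # double years are assumed to be consecutive
    if second_year % 100 != int(suffix):
        raise ValueError(f"inconsistent double year in {text!r}")
    return int(day), month, first_year, second_year


print(split_dual_year("1 March 1740/41"))
```

The original written form would still be stored verbatim; the split values only feed the machine-readable versions of the date.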

 

OK, so how does this affect storage of date values? Well, what was written in the records still needs to be recorded verbatim in our data – no change there. However, the machine-readable version of a date must be specific to a given calendar. For most uses (e.g. timelines, event correlation, etc), that will be the Gregorian calendar, although there is no reason why we shouldn’t store the additional Julian date too. What we need is a standard for presenting a date in a specific calendar (see Calendars), and what we don’t need is a standard that embodies the specific concept of dual dates as described here. So what about OS and NS? Well, these are particularly messy as they effectively split the concept of a Gregorian calendar into two, and we would need a separate designation for the “Gregorian OS” calendar (the NS version being the default).

 

Recognition of this format in modern genealogy varies. Family Historian accepts double years as an input format. IGI has no way of representing double dates. However, GEDCOM has a little-known syntax for representing an explicit Julian calendar entry using a “[J]” prefix, e.g. “[J] 1 Mar 1740”. See using_the_julian_and_gregorian_calendars for more details. STEMMA makes provision for an explicit calendar name, although it cannot make any recommendations about alternative names to “Gregorian”, or what their encoded values would look like, until there is a relevant international standard. However, STEMMA does provide a way of representing the same point in time as dates in multiple calendars, as would be required for dual dating.

 

In fact, these dual dates are just one specific case of a more general synchronised dates concept since several cultures have precedents for representing a given point in time according to multiple calendars.

 

In Israel, for instance, government documents usually carry a dual date embracing both the traditional Hebrew calendar and the Gregorian calendar.

 

The Indian Calendar Reform Committee, appointed in 1952, identified more than thirty well-developed calendars in systematic use across India. The two calendars most widely used in India today are the Vikrama calendar and the Shalivahana (or Saka) calendar. The Indian National Calendar is a formalised version of Shalivahana, created by the Indian Calendar Reform Committee in 1957, and it has an agreed synchronisation with the Gregorian calendar. Many official documents carry both a Gregorian and National Calendar date, and sometimes Vikrama too, resulting in a triple date.

 

7      ISO 8601

This ISO_8601 standard was first published by ISO in 1988 and concerns the exchange of date and time related data. The standard was designed to provide an unambiguous representation of dates and times when exchanged between different countries, particularly when they’re expressed numerically. A brief discussion of the potential ambiguities in the absence of a standard may be found at: date_and_time_format.

 

The standard was originally designed to replace older standards on numeric date/time representations (ISO 2014), week dates (ISO 2015), ordinal dates (ISO 2711), and a number of time-related standards. It was revised in 2000, and again in 2004.

 

The essential elements are:

  • All-numeric representation of dates and/or times. This removes any locale dependency caused by the use of day names or month names.
  • Fixed-width zero-padded fields to prevent ambiguous interpretations.
  • Ordering of fields from coarse-grained to fine-grained, thus giving a natural sorting capability.
  • Truncation of dates or times to cope with different granularities, e.g. yyyy-mm to represent a month rather than a day.
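The natural sorting capability that results from these elements can be seen with a plain string sort, as in the following small sketch (the sample values are arbitrary):

```python
# Fixed-width, coarse-to-fine, all-numeric strings sort chronologically as plain text.
iso_dates = ["1956-06-09", "1903-03-17", "1956-06", "1866", "1956-01-31"]
print(sorted(iso_dates))

# Locale-dependent written forms do not: the 1966 date sorts ahead of the 1956 one.
written = ["9-Jun-1956", "17-Mar-1966"]
print(sorted(written))
```

Note that a truncated value such as 1956-06 sorts immediately before the days of that month.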

 

This standard often gets criticised, although not always for good reasons. So what are its failings?

  • It deals exclusively with the Gregorian calendar.
  • It does not define an alternative XML representation of the data.
  • It does not allow yearly quarters to be specified.
  • The primitive syntax is overloaded with such variations as durations and repeating intervals, and has already caused ambiguities that had to be removed.
  • It addresses granularity but not imprecision.

 

The standard could be improved as follows:

  • Define a subset of ISO 8601 with its own specific designation. This would encompass the most common usages as described below (e.g. by W3C).
  • Define a true XML alternative to the existing alphanumeric string format.
  • Define a related standard that represents the basics of dates and times in an arbitrary number of calendars. Ideally, the ISO 8601 subset, mentioned above, would be one part of that standard relevant to the Gregorian calendar.
  • Enhance the existing ISO 8601 standard to represent yearly quarters (see below).

7.1    Quarters

The all-numeric ISO 8601 format does cope with some degree of granularity:

 

    yyyy-mm-dd          (for a known day, e.g. 1956-06-09)

    yyyy-mm                (for a known month)

    yyyy                        (for a known year)

 

Unfortunately, some historical records are registered on a quarterly basis and the above schemes do not allow their values to be recorded accurately. The most obvious example in the UK is the GRO index of civil registrations of births, marriages, and deaths. This requires an indication of the relevant yearly quarter in addition to the year itself. For instance:

 

    yyyy-Qq                  (e.g. 1956-Q2 for April-to-June of 1956)

 

The yyyy-Qq syntax is deliberately put forward here since it is a simple and unambiguous extension of the standard as it currently exists, and is totally in keeping with the way it already addresses week numbers within a year (i.e. yyyy-Www as in 1956-W25).

 

There is no known requirement to address days relative to each quarter in the way that the existing standard addresses days relative to each week in the ‘Week dates’ format.

 

FHISO have already approached ISO/TC 154 about incorporating this extended syntax in a future revision of the ISO 8601 standard.

 

Unfortunately, this syntax breaks the natural sorting capability since 1956-Q2 does not sort anywhere close to 1956-04 to 1956-06, but this was already broken by the standard’s syntax for week numbers.
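Both the proposed syntax and its sorting weakness can be sketched briefly. The function name quarter_months is hypothetical; it simply expands a yyyy-Qq value into its first and last months.

```python
import re
from typing import Tuple


def quarter_months(text: str) -> Tuple[str, str]:
    """Expand 'yyyy-Qq' into its first and last months in yyyy-mm form."""
    match = re.fullmatch(r"(\d{4})-Q([1-4])", text)
    if match is None:
        raise ValueError(f"not a yearly quarter: {text!r}")
    year, quarter = int(match.group(1)), int(match.group(2))
    first = 3 * (quarter - 1) + 1
    return f"{year:04d}-{first:02d}", f"{year:04d}-{first + 2:02d}"


print(quarter_months("1956-Q2"))

# 'Q' sorts after every digit, so the quarter lands after December of its year.
print(sorted(["1956-07", "1956-Q2", "1956-04", "1956-12"]))
```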

7.2    Calendars

Unless there is an international standard for representing dates in other calendars then there is a serious hindrance to the representation of historical and worldwide dates in a digital format. This, in turn, has repercussions for concepts like the Semantic Web as well as a standard representation for micro-history data.

 

The Wikipedia ISO_8601 page confuses the issue of calendars by pointing out that a bridging of the Gregorian and Julian calendars by the standard would contravene the requirement of continuity. In fact, even if there were no discontinuity, this would be wrong as those calendars are entirely separate entities.

 

The Society of American Archivists (SAA) adopted DACS (Describing Archives, A Content Standard) in 2004. This mentions alternative calendar systems but only from a written point of view as opposed to a digital one. Their Standards for Archival Description, Chapter 7 (Codes), does mention the Julian calendar but only in the context of ordinal dates.

 

The MSS Working Group discusses a number of issues related to date/time representation, including dates from non-Gregorian calendars.

 

The ISO standard should not only be explicit about its focus on the Gregorian calendar, but it should be supplemented with a related standard that can accommodate dates in other calendars. That related standard need only consider the representation of common dates and times in each calendar – i.e. excluding such things as durations, repeating intervals, etc – and adopt the same principles as discussed above. That is, all numeric and with fixed-width fields ordered from coarse-grained to fine-grained.

7.3    XML Representation

There are many applications where a simple date/time has to be represented in an unambiguous way. This is usually to the exclusion of all the variations and flexibility afforded by ISO 8601 and suggests a subset of it needs to be defined.

 

XML is already a globally accepted standard for the representation of structured data. In addition to the simple text-string syntax, there is a requirement for a more precise and more flexible representation of dates and times in XML. This would allow fields to be extracted and utilised without having to parse a text string (for instance, in an XSLT application). The main use of this would be for the aforementioned subset rather than the complete standard.

 

The W3CDTF discussion note NOTE-datetime examines the need for a subset of the overloaded ISO 8601 standard that can be used on the Internet. It does not actually propose an XML representation though.

 

The US Library of Congress Extended Date Time Format (EDTF) proposes a similar subsetting of the ISO 8601 standard but does not mention XML at all.

 

As a related example, consider the ISO 6709 standard for representing geographic coordinates. ISO 6709:1983 originally used a single alphanumeric string representation, just like ISO 8601. However, ISO 6709:2008 additionally supported location representation through the use of XML, although that XML merely embeds an amorphous string representation rather than its discrete fields.

 

<gpl:GPL_CoordinateTuple xmlns:gpl="http://www.isotc211.org/gpl">

<gpl:tuple srsName="urn:ogc:def:crs:EPSG:6.6:4326">

35.89421911 139.94637467

</gpl:tuple>

</gpl:GPL_CoordinateTuple>

 

Notice that it defines a special namespace in order to ensure uniqueness of the element tags but there is no associated schema definition. This is probably because there is no actual structure to the XML. In keeping with good XML design, the fields of a date value could be broken apart, e.g.:

 

<iso:datetime>

<iso:date>

<iso:year>2012</iso:year>

<iso:month>4</iso:month>

<iso:day>16</iso:day>

</iso:date>

</iso:datetime>
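With the fields broken apart like this, a consuming application can pick them out without any string parsing. The sketch below uses Python's standard ElementTree module; the namespace URI bound to the iso prefix is purely hypothetical, since no such XML vocabulary has been defined.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical namespace URI; no such vocabulary is actually defined by ISO.
NS = {"iso": "urn:example:iso-datetime"}

xml_text = """
<iso:datetime xmlns:iso="urn:example:iso-datetime">
  <iso:date>
    <iso:year>2012</iso:year>
    <iso:month>4</iso:month>
    <iso:day>16</iso:day>
  </iso:date>
</iso:datetime>
"""

root = ET.fromstring(xml_text)
date_element = root.find("iso:date", NS)

# Each field is read directly from its own element.
fields = {
    name: int(date_element.findtext(f"iso:{name}", namespaces=NS))
    for name in ("year", "month", "day")
}
value = date(fields["year"], fields["month"], fields["day"])
print(value.isoformat())
```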

 

 

8      STEMMA Dates

There are two distinct date representations in STEMMA: the date-value and the date-entity. These are equivalent for many purposes but the date-entity affords greater flexibility and scope. A date-value is encoded in a single text string, while a date-entity (see DATE_ENTITY) is a combination of elements and attributes.

 

<Date [ DATA_ATTRIBUTE ] … >

{ <Value [Calendar=’name’] [Margin=’err’]> std-date </Value>

|  { <Range [Calendar=’name’]> <Min> std-fulldate </Min>

   <Max> std-fulldate </Max> </Range> } } ...

[ NARRATIVE_TEXT ] ...

</Date>

 

The various evidence-related contexts where a date may be specified also allow a transcription of the original written form to supplement it.

 

The date-entity provides for both imprecision (via the Margin attribute) and granularity (via the syntax of the string specified by a std-date value). The table at the above link indicates how the imprecision is interpreted in conjunction with the granularity. It does not support discontinuous forms of imprecision in a computer-readable way.

 

The default value of the Calendar attribute is “Gregorian”. Although a general syntax is defined by STEMMA for strings representing dates in world calendars, the specifics for non-Gregorian calendars are not enumerated here.

8.1    Date-value strings

The STEMMA date-value syntax follows the main advantages of the ISO 8601 standard, i.e. locale independence, fixed-width fields, natural sorting, granularity. However, as explained above, it cannot extend the date-subset of ISO 8601 to deal properly with yearly-quarter granularity and still keep the natural sorting capability. The STEMMA date syntax is, therefore, not 100% compatible with the ISO 8601 syntax in the case of the Gregorian calendar.

 

For the Gregorian Calendar, the full STEMMA date syntax is:

 

yyyy-mm-dd[:[xx]]

 

where xx is an optional number of days expressing the granularity. The xx defaults to 1 if not present. There are truncated forms, just as with the ISO standard. In each instance, the granularity field occupies the same number of digits as the terminating date field, and defaults to 1 if not present. Hence:

 

yyyy-mm[:[xx]]

yyyy[:[xxxx]]

 

One consequence of this syntax is that the basic ISO date forms are still valid, e.g.

 

yyyy-mm-dd

yyyy-mm

yyyy

 

However, it gives much more freedom for expressing non-day granularities, including yearly quarters. For instance:

 

1901:0100     20th Century            (1901-2000)

1960:0010     1960's                        (1960-1969)

1956-04:03    1956-Q2                     (1956-04 to 1956-06)
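A minimal parser for the Gregorian forms of this syntax might look like the sketch below. It is an illustration only, not part of STEMMA itself: the function name and the returned (start, unit, count) tuple are assumptions, and no range checking of the fields is attempted.

```python
import re
from typing import Tuple

# yyyy[-mm[-dd]] optionally followed by ':' and an optional granularity count.
_DATE_VALUE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?(?::(\d*))?")


def parse_date_value(text: str) -> Tuple[str, str, int]:
    """Return (start, unit, count) for a Gregorian date-value string."""
    match = _DATE_VALUE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a date-value: {text!r}")
    year, month, day, count = match.groups()
    if day is not None:
        start, unit, width = f"{year}-{month}-{day}", "day", 2
    elif month is not None:
        start, unit, width = f"{year}-{month}", "month", 2
    else:
        start, unit, width = year, "year", 4
    # The granularity field has the same width as the terminating date field.
    if count and len(count) != width:
        raise ValueError(f"granularity field should have {width} digits: {text!r}")
    return start, unit, int(count) if count else 1  # the count defaults to 1


print(parse_date_value("1901:0100"))   # a century starting in 1901
print(parse_date_value("1901-01:03"))  # a quarter starting in January 1901
```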

 

This syntax is powerful but covers many more cases than we'll likely need - for any calendar system. It's tempting to think that the syntax should be used sparingly to represent only these common everyday notions but we could use it to represent concepts such as:

 

1967-06-22:93    Summer 1967    (1967-06-22 to 1967-09-22)

 

This is the astronomical definition of Summer (Summer Solstice to Autumnal Equinox) but the meteorological definition is more subjective. Hence, this is unlikely to be a notion we can map to a standard date syntax.

 

Another advantage of the STEMMA syntax is that it sorts correctly for all granularities, so long as there is a colon (‘:’) present since ‘:’ sorts after ‘-’. This may be ensured by simply appending any missing ‘:’ before entry into a system where comparisons are likely to occur, including hashtables and databases. For instance:

 

1901-01-31    (day)

1901-01:03    (quarter)

1901:              (year)

1901:0001     (year)

1901:0010     (decade)

1901:0100     (century)

 

In other words, for the same common starting point (a month in the first two cases, and a year for all these cases), the entries are sorted according to the size of their granularity.
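The effect of appending the missing colon can be checked with an ordinary string sort. In this sketch, the helper name sort_key is hypothetical and the colon is added only to the sort key, not to the stored value.

```python
def sort_key(value: str) -> str:
    """Append the ':' that the syntax allows to be omitted, for comparison only."""
    return value if ":" in value else value + ":"


values = ["1901:0100", "1901-01:03", "1901", "1901:0010", "1901-01-31", "1901:0001"]
ordered = sorted(values, key=sort_key)
print(ordered)
```

The result follows the same order as the list above, running from the day through to the century.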

 

The syntax accommodates different calendar systems using an optional calendar ID prefix. This defaults to “GR” which indicates the Gregorian calendar.

 

#GR#2001-04-16

 

Other examples might be:

 

#JU#1666-09-02      (Julian Calendar)

#FR#02-09-20          (French Republican Calendar)

 

The identification of the fields for other calendars, including their range and size, requires further research but the same principles will be upheld in all cases.
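Separating the calendar ID from the date fields is straightforward, as the following sketch suggests. It assumes that every calendar ID is two upper-case letters, which is only inferred from the examples above, and the function name is hypothetical.

```python
import re
from typing import Tuple

# Assumes two upper-case letters for the calendar ID, as in the examples above.
_PREFIX = re.compile(r"#([A-Z]{2})#(.+)")


def split_calendar(value: str) -> Tuple[str, str]:
    """Return (calendar ID, date string), defaulting the ID to 'GR' (Gregorian)."""
    match = _PREFIX.fullmatch(value)
    if match:
        return match.group(1), match.group(2)
    return "GR", value


print(split_calendar("#JU#1666-09-02"))
print(split_calendar("2001-04-16"))
```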

 

Although this allows the unambiguous exchange of dates from world calendars – ancient and modern – it does not provide any indication of how a formatted version is generated for display to the user. This may require the help of a Date Authority on the Internet but STEMMA’s http://stemma.parallaxview.co/date-mode namespace defines generic levels of formatting control (Short, Medium, Long, Full) that are well-established in the industry for the Gregorian case.

 

9      Events

The concept of an Event is common to several data models, although the specific implementations differ. Maybe the root concept in all cases is that an Event represents something that happened on a particular date. Questions like the following will differentiate the implementations:

 

  • Is there a Place associated with the event? Is it a single Place or can there be several?
  • Is there a single date associated with the Event, or multiple? Can it represent the start and end of a protracted Event?
  • Can it include a hierarchy of events in a complex case such as a world Event?
  • Can Events be ordered, irrespective of the imprecision (or absence) in their dates?
  • Are Events shared by multiple Person entities?
  • Does an Event reference Person entities, or do Person entities reference Events?

 

The latter issue has an interesting basis. If a product didn’t have the concept of an Event entity then it could still describe things that happened on particular dates. However, if the same occurrence affected the lives of multiple people then it would not have the capability to link them; there would be no focal point for that occurrence. That focal point – which would be an Event entity – must therefore have an id such that relevant Person entities can reference it.

 

9.1    STEMMA Events

A STEMMA Event usually represents something that happened at a particular Place on a particular date, but STEMMA generalises the concept in order to represent protracted Events, i.e. ones with both a start and an end, and hence a duration. Either boundary (start or end) can be replaced with a reference to a finer-level Event, thus creating a hierarchy.

 

For protracted Events, ones that use an <Until> element to specify an end date, STEMMA allows further intermediate Events to be nominated using IncludeEventRef elements. Hierarchical Events are useful for adding granularity and structure to a complex Event such as a world one.

 

Although the associated Place is syntactically optional, one must at least be provided through the inheritance mechanism. In other words, all Events must have an effective Place. This inheritance mechanism is analogous to a base-class and derived-class (aka super-class and sub-class) in object-orientated programming. A typical use might be for a national census. The base-Event would define the date and common parts of the census citation reference (e.g. “RG10” for the 1871 census of England and Wales). It could also specify an appropriate citation template if a later design incorporates them. A derived-Event could then specify, for instance, the details of a particular household during that census.

 

A particularly useful expressive feature of STEMMA is that it allows the imprecision around an Event date to be supplemented with constraints that order it in relation to other Events. For instance, you may only know a particular date to ± 5 years but you may also know that it occurred after Event A and before Event B. That extra information could later yield a more accurate date value. See EVENT_CONSTRAINTS.
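The way such constraints can tighten a date may be sketched as a simple intersection of ranges. This is an illustration only and not the STEMMA EVENT_CONSTRAINTS mechanism: years are plain integers, the bounds are treated as inclusive, and the function name and sample values are hypothetical.

```python
from typing import Optional, Tuple


def apply_constraints(
    estimate: int,
    margin: int,
    after: Optional[int] = None,
    before: Optional[int] = None,
) -> Tuple[int, int]:
    """Narrow an 'estimate ± margin' year range using the years of other Events."""
    lower, upper = estimate - margin, estimate + margin
    if after is not None:   # cannot be earlier than Event A
        lower = max(lower, after)
    if before is not None:  # cannot be later than Event B
        upper = min(upper, before)
    if lower > upper:
        raise ValueError("constraints are inconsistent with the estimate")
    return lower, upper


# 1850 ± 5, but known to fall after Event A (1848) and before Event B (1852).
print(apply_constraints(1850, 5, after=1848, before=1852))
```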

 

10  Glossary

This section clarifies some of the terms used in this paper in case they are unfamiliar to anyone.

 

Calendar

A mechanism by which dates are reckoned in a given culture. For instance, Gregorian, Julian, etc.

 

Date

A measurement of the passing of days. Dates are reckoned according to a calendar.

 

Daylight Saving Time (DST)

The annual adjustment of clocks in a given TZ to increase the daylight hours in the evening and so reduce those in the morning. See Daylight_saving_time.

 

Granularity

When a given date is recorded, it might refer to a particular day (e.g. a date of birth) or something broader such as a month, year, etc. This isn’t the same as not knowing a specific day and so contrasts with imprecision. For instance, some registrations may only be recorded on a monthly or quarterly basis. It may be used when talking collectively about a group of dates such as 19th century newspaper editions. This is similar to a ‘period’ in the GEDCOM model.

 

Imprecision

A particular date may not be known accurately. This usually means that the actual value has an upper and a lower limit. These could be represented explicitly, or via a plus/minus from some mean. It is possible to conceive of less-common discontinuous forms of imprecision such as knowing a date was in the summer during a range of possible years. This is similar to a ‘range’ in the GEDCOM model.

 

Time Zone (TZ)

The standard counting of a 24-hour day in a particular region. See Time_zone. The time in a given TZ was defined relative to GMT before 1972 but is now defined relative to UTC.

 

Timestamp

This term has several meanings (see Timestamp). In the context of this paper, it refers to the POSIX or UNIX interpretation which is an offset from a fixed time called the epoch (see Unix_time). Java’s Date class uses this representation and it means that its values are then locale-independent, and unaffected by either TZ or DST.

 

UTC

A time standard based on the International Atomic Time (TAI). See UTC. This succeeded GMT (Greenwich Mean Time) but can be viewed as synonymous for most daily purposes.


® STEMMA is a registered trademark of Tony Proctor.

[1] Calendar (New Style) Act 1750. http://www.legislation.gov.uk/apgb/Geo2/24/23.
