Sumero-Akkadian

By Miller Prosser, May 2017

Introduction

For over 3,000 years, numerous cultures throughout the Near East used a borrowed and adapted version of the Sumerian cuneiform logosyllabic writing system to express their native languages. This writing system, in various customized iterations, was used to write Sumerian, various dialects of Akkadian, Hittite, Hurrian, Ugaritic, Elamite, and others. It was employed over a broad chronological and geographical range, from personal names attested in early third-millennium Early Dynastic IIIa texts from Tell Abu Salabih and Fara to, likely, the first century A.D. (cf. Hunger and de Jong [2014] ZfA 104/2, pp. 182-94).

Over a long and complex history, the Sumerian values of the hundreds of signs were adopted and adapted by scribes of later languages. Additional values for signs also developed and entered various scribal traditions.[1] The result is a writing system that is very complex when viewed as a single system.

First, any given sign can express various values. These values fall into three basic functional categories: phonogram, logogram, and determinative. This tripartite classification is reductionist, but it provides a workable basis for both creating and formatting textual data.[2]

Why Create a Sign List in a Database?

The OCHRE Data Service of the Oriental Institute of the University of Chicago supports philological projects studying the languages of the ancient world. A number of these projects study texts written in Sumero-Akkadian cuneiform scripts. Validating data entry, providing a consistent display format, and creating a fully linked and defined network of textual data all required a standardized cuneiform sign list. This sign list would include every known Sumerian sign, with each sign defined by every possible reading and every possible allograph.[3]

The sign list is available at https://onlinepublications.uchicago.edu/signary/. This digital publication is based on data in the OCHRE database. It is live and dynamic, pulling data that has been published from OCHRE for consumption on the web.

Defining Terms

A phonogram, also called a syllabogram, is a sign that represents a syllable. There are four basic syllable types expressed by phonograms.

V – a single vowel: a, i, e, u

VC – a vowel-consonant combination: aš, ib, il etc.

CV – a consonant-vowel combination: nu, pa, ša, etc.

CVC – a consonant-vowel-consonant combination: bad, zar, tum, etc.
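The four syllable types can be checked mechanically by mapping each letter of a reading to V or C. The following is a minimal sketch, not part of OCHRE; it assumes readings are written with accent-style indices (ú, ù), which it strips, and it does not handle subscript-number indices such as u₂.

```python
import unicodedata

VOWELS = set("aeiu")  # the four vowels written by the script

def base_letters(reading: str) -> str:
    """Drop accent-style indices (ú, ù) but keep consonant diacritics (š, ṣ, ṭ, ḫ)."""
    decomposed = unicodedata.normalize("NFD", reading.lower())
    # U+0301/U+0300 are the combining acute/grave used as sign indices.
    kept = "".join(ch for ch in decomposed if ch not in "\u0301\u0300")
    return unicodedata.normalize("NFC", kept)

def syllable_type(reading: str) -> str:
    """Classify a phonogram reading as V, VC, CV, or CVC."""
    shape = "".join("V" if ch in VOWELS else "C" for ch in base_letters(reading))
    if shape not in {"V", "VC", "CV", "CVC"}:
        raise ValueError(f"{reading!r} has unexpected shape {shape!r}")
    return shape

for reading in ["a", "aš", "nu", "tum", "ṣú"]:
    print(reading, syllable_type(reading))
```

Anything outside the four shapes raises an error, which is one simple way a project might flag a mistyped phonogram.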

In transcription, phonograms are usually written in lowercase, often in italics. A sequence of phonograms in a single word is typically separated by dashes. The sequence of syllables “sounds out” the word. Vowel length is not indicated in the writing system, which leaves some uncertainty. Note that in the following examples the length of the a-vowel is not expressed in the cuneiform writing system and is not added as a level of interpretation to the transcription. Only in the normalized transcription do scholars indicate vowel length.

šar-ru, šarru, “king”

ḫu-ra-ṣú-um, ḫurāṣum, “gold”
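The point about vowel length can be made concrete by “sounding out” a word mechanically. The sketch below is an illustrative heuristic, not OCHRE's normalization: it joins the dash-separated signs, strips accent-style indices, and writes a vowel shared across a CV + VC boundary (ṣu + um) only once. It recovers the consonants and vowel qualities but cannot restore the long ā of ḫurāṣum, and it would also collapse plene spellings.

```python
import unicodedata

VOWELS = set("aeiu")

def strip_index_accents(sign: str) -> str:
    # Remove acute/grave accents used as sign indices (ú = u2, ù = u3),
    # keeping consonant diacritics such as š, ṣ, and ḫ.
    decomposed = unicodedata.normalize("NFD", sign)
    return unicodedata.normalize(
        "NFC", "".join(ch for ch in decomposed if ch not in "\u0301\u0300")
    )

def naive_join(transcription: str) -> str:
    """Join a dash-separated word, e.g. 'šar-ru' -> 'šarru'."""
    word = ""
    for sign in transcription.split("-"):
        sign = strip_index_accents(sign)
        # CV followed by VC with the same vowel: write the vowel once.
        if word and word[-1] in VOWELS and sign[0] == word[-1]:
            sign = sign[1:]
        word += sign
    return word

print(naive_join("šar-ru"), naive_join("ḫu-ra-ṣú-um"))
```

The second result lacks the macron of ḫurāṣum: vowel length must be supplied by the scholar in normalization.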

A logogram is a sign, or group of signs, that represents an entire word. A logographic writing does not represent a phonetic expression of the word: BAN, “bow,” i.e., the weapon.

(Behind this logogram lies a grammatical form of the Akkadian word qaštu.)

A phonetic complement can be added to a logogram to help disambiguate which word the scribe intended: BAN-tu, which would likely indicate qaštu.

A determinative is a sign that marks the semantic class of the word of which it is a part. Typical semantic classes include material, animal, vocation, and geographic element. There are only a few dozen determinatives. The determinative sign can occur at the beginning or end of the word. When transcribed, a determinative is usually superscripted. Capitalization of determinatives is not consistent across transcription traditions. There is usually no space between the superscripted determinative and the adjacent transcription. For text importation into OCHRE specifically, a space indicates a word separator; therefore, a determinative must not be separated from its word by a space.

GIŠBANMEŠ, “bows”

GIŠ is the determinative for items made of wood. MEŠ is usually classified as a determinative, although it is more accurately described as a grammatical determinative: a marker of plurality.

URUú-ga-ri-it, “(the city of) Ugarit”

URU is the determinative before city names.

Three determinatives are transcribed with abbreviations: d for DINGIR before divine names, m for DIŠ before male proper names, and f for MÍ before female proper names.

di-lu, “the god Ilu”

mḫa-am-mu-ra-ap-pi, “Hammurabi”

fša-ri-el-li, “Šarri-elli”

NOTE: DIŠ is sometimes abbreviated as l (a single lowercase letter L) instead of m; the l is meant to resemble the single vertical stroke of the DIŠ sign.
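Resolving these abbreviations to sign names is a simple lookup. In the sketch below, the superscript is hypothetically encoded with curly braces ({d}i-lu), a convention assumed only for illustration; OCHRE's importer relies on project-level settings instead. Because the braces stay attached to the word, no space is needed, which matches the rule above.

```python
import re

# Assumed encoding: superscript determinatives written as {…}.
# The four abbreviations follow the text above; other determinatives
# (GIŠ, URU, KI, MEŠ) are already written as sign names.
ABBREVIATIONS = {"d": "DINGIR", "m": "DIŠ", "l": "DIŠ", "f": "MÍ"}
DETERMINATIVE = re.compile(r"\{([^{}]+)\}")

def split_determinatives(word: str) -> tuple[list[str], str]:
    """Return (determinative sign names, remaining transcription)."""
    signs = [ABBREVIATIONS.get(d, d) for d in DETERMINATIVE.findall(word)]
    remainder = DETERMINATIVE.sub("", word)
    return signs, remainder

print(split_determinatives("{d}i-lu"))
print(split_determinatives("{URU}ú-ga-ri-it{KI}"))
```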

Scope

The Standardized Cuneiform Sign List represents an attempt to model a universal and comprehensive list of Sumero-Akkadian signs. The list is standardized in the sense that it represents an attempt to create an architecture into which every sign from every logosyllabic writing system in the Sumero-Akkadian tradition can be recorded. Signs and values used in Old Assyrian are recorded alongside signs and values used in Elamite, Hittite, and various dialects of peripheral Akkadian. As such, this database of signs represents more than a sign list for any given language, dialect, or period.

On a practical level, the sign list is used for importing and performing automated processing of textual data at the data ingestion stage. The import process recognizes the transcribed text based on project-level formatting choices, identifies the signs, and creates the relationships between the epigraphic units and discourse units. In short, when given a text transcription that follows the necessary transcription rules, the import process creates a fully linked and modelled database item, i.e. a text. 
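The recognition step can be pictured as splitting a line into words at spaces and into signs at dashes and dots, then looking each reading up in the sign list. The following is a minimal sketch under stated assumptions: the tiny SIGN_LIST is a toy subset, and the two dataclasses are simplified stand-ins, not OCHRE's actual data model.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Toy sign list (reading -> sign name); a real list holds every reading of every sign.
SIGN_LIST = {
    "iš": "IŠ", "tu": "TU", "an": "AN", "ni": "NI", "im": "IM",
    "UD": "UD", "DIŠ": "DIŠ", "KAM": "KAM",
}

@dataclass
class EpigraphicUnit:
    transcription: str      # the reading as typed, e.g. "iš"
    sign: Optional[str]     # linked sign name, or None if not recognized

@dataclass
class DiscourseUnit:
    transcription: str      # the whole word, e.g. "iš-tu"
    signs: list[EpigraphicUnit]

def import_line(line: str) -> list[DiscourseUnit]:
    """Split on spaces (words), then on '-' and '.' (signs), and link each sign."""
    words = []
    for word in line.split():
        units = [EpigraphicUnit(r, SIGN_LIST.get(r)) for r in re.split(r"[-.]", word)]
        words.append(DiscourseUnit(word, units))
    return words

for word in import_line("iš-tu UD.DIŠ.KAM an-ni-im"):
    print(word.transcription, [u.sign for u in word.signs])
```

A reading missing from the sign list is left unlinked (None), which is where a validating importer would stop and report the problem.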

Short Description of Sign Lists

Although it is something of a fiction, we refer to the logosyllabic writing system as a single thing. In reality, there were various versions of the writing system over time and in various locations. When we model these writing systems in a database environment, they become a single thing. This is not so different from the result of compiling a printed sign list.

Transcription Guidelines

Each project has the flexibility to decide some of the transliteration guidelines it wishes to follow. Once established, the guidelines must be followed rigorously. The import process parses the transcription and uses the guidelines to import, format, and link the textual content. In addition to the stylistic choices shown above, the user can also decide which epigraphic sigla to use. For example, projects typically use square brackets to indicate complete damage or loss and half brackets for partial damage. There are 40 categories or types of epigraphic sigla available for use.
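Because brackets can span several signs, a sigla parser has to track an open/closed state across the line rather than inspect each sign separately. This sketch assumes one hypothetical project configuration ("[ ]" for complete damage, "⸢ ⸣" at U+2E22/U+2E23 for partial damage) and assigns each sign the state in force at its first character; it is an illustration, not OCHRE's parser.

```python
# Hypothetical project settings; each project chooses its own sigla.
OPENERS = {"[": "damaged", "⸢": "partially damaged"}
CLOSERS = {"]", "⸣"}
SEPARATORS = {"-", ".", " "}

def damage_by_sign(line: str) -> list[tuple[str, str]]:
    """Return (sign, damage state) pairs with the sigla stripped out."""
    state, current, current_state = "intact", "", "intact"
    result = []
    for ch in line:
        if ch in OPENERS:
            state = OPENERS[ch]
        elif ch in CLOSERS:
            state = "intact"
        elif ch in SEPARATORS:
            if current:
                result.append((current, current_state))
            current = ""
        else:
            if not current:
                # Simplification: a sign takes the state at its first character.
                current_state = state
            current += ch
    if current:
        result.append((current, current_state))
    return result

print(damage_by_sign("[a-na] pa-⸢ni⸣"))
```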

Text as Data

To support researchers who wish to atomize their texts down to the level of sign readings and to record these observations as data in a database environment, it is necessary to establish a standardized sign list and a useful, meaningful user interface. The researcher must also adhere to a single formatting and transliteration schema: schemas cannot be mixed, and exceptions cannot be allowed.

Here are the first few lines of an Akkadian text from Ras Shamra as given to the import tool.

Recto

iš-tu UD.DIŠ.KAM an-ni-im

a-na pa-ni mam-mi-is-tam-ri

DUMU níq-me-pa LUGAL URUú-ga-ri-itKI

Note that the transcription given to the database does not contain additional codes to mark up the document. There are no complex prefixes to set off words or lines, and there are no tags or elements to define or classify the content. These cumbersome elements are removed from the interface. Instead, the import tool uses project-level specifications to ingest the formatted transcription. This strategy allows researchers to import existing transcriptions from their own files or from other sources with very little technical formatting.

Figure 1 – Transcription Options 

The user can load or copy text from a formatted document (such as a Word .docx file) and paste it into OCHRE. Alternatively, transcriptions can be typed directly into a pending content pane, from which the import tool will ingest the textual data.

The import tool examines the transcription and creates the following text in the database. Based on a few settings selected at the time of import, the database understands the section heading, the horizontal dividing line, each individual sign, and the relationship between signs and words. The import can assign line numbers automatically. 

Figure 2 -- Imported Text 
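Recognizing section headings and assigning line numbers can be sketched as a pass over the pasted lines. Here the set of heading words and the choice to restart numbering at each surface are hypothetical import settings, not OCHRE's documented behavior; the horizontal dividing line is not handled.

```python
# Hypothetical import settings: which lines count as section headings,
# and whether numbering restarts at each new section.
SECTION_HEADINGS = {"Obverse", "Reverse", "Recto", "Verso"}

def number_lines(lines: list[str], restart_per_section: bool = True):
    """Yield (section, line number, text), skipping blank lines."""
    section, number = None, 0
    for raw in lines:
        text = raw.strip()
        if not text:
            continue
        if text in SECTION_HEADINGS:
            section = text
            if restart_per_section:
                number = 0
            continue
        number += 1
        yield section, number, text

pasted = ["Recto", "", "iš-tu UD.DIŠ.KAM an-ni-im", "", "a-na pa-ni mam-mi-is-tam-ri"]
for row in number_lines(pasted):
    print(row)
```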

A click on any sign in a text reveals the links created on import. In this case, the transcription of the first sign in line one is recognized as the reading iš of the IŠ sign. The script unit that represents that sign is linked to the epigraphic unit. The transcription given at the time of import populates the transcription field. No allograph has been specified.

Figure 3 -- Linked Script Unit 

The Script Unit

In the OCHRE system, a script unit represents the minimal unit of writing. In alphabetic writing systems, the script unit is a letter. In the logosyllabic Sumero-Akkadian writing system, a script unit is a sign.

Each script unit is characterized by readings and allographs. An allograph is an alternate written form of a script unit. Any given script unit in the sign list may have various readings and allographs. Alphabetic writing systems typically have one reading per letter. Some, like the Hebrew square script, also have allographs, such as the final forms of certain letters. Logosyllabic writing systems typically have many readings per sign and can have multiple allographs. In the complex Sumero-Akkadian writing systems, the readings represent the various phonographic, logographic, and determinative values of the logosyllabic sign. Because the sign list represents the writing systems of various locations and periods, allographs are necessary to distinguish the various graphic realizations of signs. Allographs also allow the creation and use of language-specific Unicode fonts.

The IŠ sign, for example, is defined by eleven phonographic values and two logographic readings. At the time of this writing, this script unit had been linked to over twenty-three thousand epigraphic units across various research projects. 

Figure 4 -- The IŠ Script Unit 
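A script unit defined by typed readings and a list of allographs maps naturally onto a small record type. The sketch below is a simplified illustration, not OCHRE's schema. It uses the AN sign, whose readings include the phonogram an, the logograms AN and DINGIR, and the determinative d; the allograph name is hypothetical, and U+1202D is the Unicode code point for CUNEIFORM SIGN AN.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class ReadingType(Enum):
    PHONOGRAM = "phonogram"
    LOGOGRAM = "logogram"
    DETERMINATIVE = "determinative"

@dataclass
class Allograph:
    name: str
    codepoint: Optional[int] = None   # Unicode code point, if one exists

@dataclass
class ScriptUnit:
    name: str                                         # unique sign name
    readings: dict[str, ReadingType] = field(default_factory=dict)
    allographs: list[Allograph] = field(default_factory=list)

    def readings_of(self, kind: ReadingType) -> list[str]:
        """All readings of this sign in one functional category."""
        return [r for r, t in self.readings.items() if t is kind]

an = ScriptUnit(
    "AN",
    readings={"an": ReadingType.PHONOGRAM, "AN": ReadingType.LOGOGRAM,
              "DINGIR": ReadingType.LOGOGRAM, "d": ReadingType.DETERMINATIVE},
    allographs=[Allograph("AN (standard form, hypothetical label)", 0x1202D)],
)
print(an.readings_of(ReadingType.LOGOGRAM), chr(an.allographs[0].codepoint))
```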

Linking Data

Beginning with the smallest observable unit and moving to larger units like words, we model textual data as follows. A script unit, defined by its reading and allograph, is linked to an epigraphic unit. The epigraphic unit is the unit of observation that contains the various details observed for a single sign. It links to the script unit. The epigraphic unit can be described with metadata regarding damage, scribal mistakes, sign placement, and the level of certainty of reading.

Each epigraphic unit is linked to a discourse unit. Typically, a series of epigraphic units is linked to a single discourse unit, making up a word. The discourse unit, when it represents a word, is linked as an attested form in a project lexicon. The attested form is linked to its constituent epigraphic units. So the web of links spans from epigraphic, to discourse, to semantic.

Figure 5 -- Discourse Unit with Links 
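The chain of links from lexicon to signs can be traversed in a few lines. The classes below are simplified, hypothetical stand-ins for the epigraphic unit, discourse unit, and lexicon entry described above; the example links the lexeme ištu “from” to its attestation iš-tu.

```python
from dataclasses import dataclass, field

@dataclass
class EpigraphicUnit:
    sign: str                  # linked script unit, e.g. "IŠ"
    reading: str               # linked reading, e.g. "iš"
    damaged: bool = False      # one example of epigraphic metadata

@dataclass
class DiscourseUnit:
    parts: list[EpigraphicUnit]
    normalization: str = ""    # phonemic interpretation, if recorded

@dataclass
class LexiconEntry:
    lemma: str
    gloss: str
    attested_forms: list[DiscourseUnit] = field(default_factory=list)

def signs_behind(entry: LexiconEntry) -> list[list[str]]:
    """Follow the links from a lexeme down to the signs of each attestation."""
    return [[unit.sign for unit in form.parts] for form in entry.attested_forms]

ishtu = LexiconEntry("ištu", "from")
ishtu.attested_forms.append(
    DiscourseUnit([EpigraphicUnit("IŠ", "iš"), EpigraphicUnit("TU", "tu")], "ištu")
)
print(signs_behind(ishtu))
```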

The epigraphic observation includes multiple types of observations. It records the identification of the sign (IŠ), its reading (iš), and an allograph if necessary. In most cases, the identification of the sign is made by the import tool. Most scholars do not transcribe a text by first identifying the name of the sign. The scholar typically jumps immediately to an interpretive level by transcribing a reading of a sign. The epigraphic observation also includes any metadata specifying the certainty of the transcription.

The discourse observation can include various elements: a phonemic interpretation (sometimes called a normalization), a translation, and analytical properties.

Discourse units can be grouped together in larger discourse units, from compound words to phrases, sentences, and paragraphs. At every level of observation, the user has the option to record properties and metadata. Chief among these are time periods and discourse analysis, such as prosopographical, geographical, or economic analysis.

The level of semantic interpretation is kept separate from the epigraphic observation. While the import process may identify a sign as a logogram, it remains agnostic about the semantic force of the logogram; the semantic identification is made later. At the level of the discourse unit, where the discourse unit is a word, various wizards help the user perform grammatical parsing, prosopographical analysis, and geographic analysis. At this stage, the user can address the semantic interpretation of the word.

Figure 6 -- Lexical Analysis Wizard 

By performing grammatical analysis with the help of various tools, the user can create a lexicon. Following the data model outlined above, the words added to the lexicon link to attested forms in texts, which include links to constituent epigraphic units and script units. In the end, the database creates a vast network of textual data.

Figure 7 -- Lexical Entry 

Organization and Definition

Each sign in the standardized sign list in OCHRE has a unique name, usually the Sumerian sign name. MZL and ABZ sign numbers serve as aliases. As mentioned above, each sign is defined by readings and allographs. A sign may have a single reading or multiple readings. The readings are classified as either phonograms, logograms, or determinatives. Each reading is unique. There are no duplicated readings across all signs. In the Sumero-Akkadian logosyllabic writing systems, a single script unit (i.e. a sign) can be written in various ways. These various writings of the same sign are allographs. Sometimes an allograph is tied to a specific usage (like a specific phonogram) or period (e.g. Old Assyrian). The allograph tab on the script unit represents the various writings of the sign. Each allograph can have a name, a Unicode character code, and links to other script units where applicable. While each script unit and each reading must occur only once in the sign list, there is no such restriction on allographs. However, it seems unlikely that there will be a duplicated allograph.
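The uniqueness rule for readings can be enforced with a single pass that records which sign claims each reading. This is a minimal sketch, assuming the sign list is given as a mapping from unique sign names to their readings; as in the text, allographs are deliberately not checked.

```python
from collections import defaultdict

def validate_sign_list(signs: dict[str, list[str]]) -> list[str]:
    """Report every reading that is claimed more than once.

    Sign-name uniqueness is guaranteed by the dict keys; allographs are
    exempt from the uniqueness check.
    """
    owners = defaultdict(list)
    for sign, readings in signs.items():
        for reading in readings:
            owners[reading].append(sign)
    return [
        f"reading {reading!r} is claimed by {', '.join(names)}"
        for reading, names in owners.items()
        if len(names) > 1
    ]

print(validate_sign_list({"AN": ["an", "DINGIR", "d"], "IŠ": ["iš"]}))
print(validate_sign_list({"AN": ["an"], "X": ["an"]}))   # "X" is a made-up sign
```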

Numerical Values

For those logographic readings that represent a numeral (e.g., DIŠ), the reading is defined by a property that indicates the numerical value of the logogram. The database can therefore calculate the numerical values of signs.

For a complete list of numerical values in the Sumero-Akkadian writing system, see the OCHRE Signary.

Figure 8 -- Numerical Value 

Because various signs are sometimes used as logograms for a single number (for example, 6 or 60), the user should use the logographic name of the specific sign to be recorded. For example, one may transcribe 6DIŠ or 1ÀŠ for the number six, depending on which signs appear on the tablet. Further, the user can specify which allograph of ÀŠ was used. (There are currently three allographs for this sign, meant to represent the most common forms used by scribes.)

Various scribal practices and various transliteration schemes intersect to create many contradictions. Our system tries to strike a balance between an accurate sign list and a usable input method. We also try to accommodate the Unicode points for the various number signs. Our method allows the user to specify the exact numerical sign read in the text. For example, for the number 2, the user can transliterate 1MIN, 2DIŠ, or 1AŠAŠ.[4] To take advantage of the built-in numerical features, the user must supply a number before the reading, even if that number is 1. This multiplier cues OCHRE to interpret the logogram as a number. For example, if the text has the logographic value IGI for 1,000, then the user must provide 1IGI to the text import. Providing IGI without the preceding number will create a link to the logographic value IGI, but it will not tell OCHRE to treat this logogram as a numeral.
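The multiplier rule can be sketched as a small parser: a leading integer followed by a numeric logogram yields multiplier × value, while a bare logogram yields no numeral. The value table below contains only the readings mentioned above and is an illustrative assumption; in OCHRE the value is stored as a property of each numeric reading.

```python
import re
import unicodedata
from typing import Optional

# Values taken from the examples in the text; illustrative subset only.
NUMERIC_VALUES = {"DIŠ": 1, "MIN": 2, "AŠAŠ": 2, "ÀŠ": 6, "IGI": 1000}
MULTIPLIED = re.compile(r"^(\d+)(\D+)$")

def numeral_value(token: str) -> Optional[int]:
    """Value of a token like '6DIŠ'; None if it is not written as a numeral."""
    match = MULTIPLIED.match(unicodedata.normalize("NFC", token))
    if match is None:
        return None   # e.g. bare 'IGI' links to the logogram but is no numeral
    multiplier, logogram = int(match.group(1)), match.group(2)
    if logogram not in NUMERIC_VALUES:
        return None
    return multiplier * NUMERIC_VALUES[logogram]

for token in ["6DIŠ", "1ÀŠ", "1MIN", "2DIŠ", "1AŠAŠ", "1IGI", "IGI"]:
    print(token, numeral_value(token))
```

Note that 6DIŠ and 1ÀŠ give the same value while remaining distinct sign readings, which is what lets the user record exactly what is on the tablet.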

Numbers 70, 80, and 90 do not have discrete Sumerian signs; however, we have character codes for these three. Users who want these specific characters need to provide 1SEVENU, 1EIGHTU, or 1NINEU as if these were logograms. Users who do not need these specific characters can provide the normal numeric syntax in the pending content: 7U, 8U, and 9U.

Note also that the number 600 is a combined sign, referred to in the sign lists and grammars as GÍŠ-U (and sometimes as GEŠU, which strikes me as inaccurate). Our sign and logogram are GÉŠU. 

Fractions

Fractions are treated in the same way as the numerals described above. However, a number of specific complexities should be addressed. If users wish to link to specific signs for fractions, they must provide one of the accepted logographic readings AND the multiplier number before the logographic reading. For example, 1⅔DIŠ (for 0.667) and 1ŠUŠANA (for 0.333) are acceptable. The sole exception to the use of the multiplier is decomposed fractions: forms like 2/3DIŠ cannot be written with a multiplier such as 12/3DIŠ, because the multiplier would run into the numerator.
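The two fraction syntaxes can be told apart by trying the decomposed form (digits, slash, digits, logogram) before the multiplier form. This sketch uses only the readings named above and assumes DIŠ is the only base for decomposed fractions; it is not OCHRE's importer.

```python
import re
from fractions import Fraction
from typing import Optional

# Fraction readings and values as given in the text.
FRACTION_SIGNS = {"⅔DIŠ": Fraction(2, 3), "ŠUŠANA": Fraction(1, 3)}
DECOMPOSED = re.compile(r"^(\d+)/(\d+)DIŠ$")   # assumption: only DIŠ shown
MULTIPLIED = re.compile(r"^(\d+)(\D.*)$")

def fraction_value(token: str) -> Optional[Fraction]:
    decomposed = DECOMPOSED.match(token)
    if decomposed:
        # '2/3DIŠ': no multiplier; the digits before '/' are the numerator.
        return Fraction(int(decomposed.group(1)), int(decomposed.group(2)))
    multiplied = MULTIPLIED.match(token)
    if multiplied and multiplied.group(2) in FRACTION_SIGNS:
        return int(multiplied.group(1)) * FRACTION_SIGNS[multiplied.group(2)]
    return None   # bare '½' or '1/2' does not link to a sign reading

for token in ["1⅔DIŠ", "1ŠUŠANA", "2/3DIŠ", "½", "1/2", "12/3DIŠ"]:
    print(token, fraction_value(token))
```

The last case shows the ambiguity: 12/3DIŠ is read as twelve thirds (4), not as one times two thirds.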

NOTE: simply typing a fraction like ½ or 1/2 will not link to a sign reading, just like typing a number like 3 or 60 will not link to a sign reading.

NOTE: compound numbers should be transliterated as compound logograms, separated by dots, not plus signs. See the example below.
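As a minimal sketch of the dot-separated syntax, the parser below splits a compound on dots, reads each part as multiplier plus numeric logogram, and sums the parts. Treating the parts as additive is an assumption for illustration, and the value table holds only readings mentioned in this section; the sketch rejects plus signs as the NOTE requires.

```python
import re

# Values from this section; summing the parts is an illustrative assumption.
NUMERIC_VALUES = {
    "DIŠ": 1, "MIN": 2, "ÀŠ": 6, "U": 10,
    "SEVENU": 70, "EIGHTU": 80, "NINEU": 90, "GÉŠU": 600, "IGI": 1000,
}
PART = re.compile(r"^(\d+)(\D+)$")

def compound_value(token: str) -> int:
    if "+" in token:
        raise ValueError("join the parts of a compound number with dots, not '+'")
    total = 0
    for part in token.split("."):
        match = PART.match(part)
        if match is None or match.group(2) not in NUMERIC_VALUES:
            raise ValueError(f"not a numeral: {part!r}")
        total += int(match.group(1)) * NUMERIC_VALUES[match.group(2)]
    return total

print(compound_value("1GÉŠU.7U.3DIŠ"), compound_value("1GÉŠU.1SEVENU.3DIŠ"))
```

Both spellings of 70 give the same total while linking to different signs.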

Here are some examples illustrating how the text import interface will interpret numbers: 

[1] Cf. Hoffner (2008): 9-24.

[2] It should be noted that there are various transcription traditions for each type of sign. Typical variations include capitalization and italicization. OCHRE does not impose any specific tradition on any research project. However, each project must decide on a specific set of rules and adhere to it.

[3] For expediency’s sake, we began by documenting only the commonly used signs, leaving aside for the moment the signs that occur only in rare texts such as vocabulary lists. However, with the data model established and tested, the system is prepared to document all known signs in all dialects. For the OCHRE Data Service, this was more than an exercise; it was a prerequisite for our philology projects.

[4] When importing a text, we cannot allow the dash in AŠ-AŠ, because a dash is a syllable separator.