Mixed Orthographies and Related Cataloging Issues

This document suggests guidelines for cataloging resources in Slavic and other languages that use various Cyrillic scripts (e.g., Central Asian languages, older Romanian, etc.) that are published in mixed orthographies. By the term “mixed orthography” we understand the presence of more than one system of scripts represented in a resource’s main sources of information, most frequently the title page. As official cataloging documentation, as mentioned above, is silent on the question of mixed orthographies and scripts, this paper was developed by the ABC SEES Pre-Revolutionary Orthography Task Force in order to suggest a cataloging approach for the descriptive part of the bibliographic record.

Background

Many languages of Europe have gone through periods in their histories when their written forms were very unsettled. The causes include competition among dialects to become the standard, the influence of prestige languages spoken by the elite but not necessarily by the common folk, intentional competition among language planners, and other cultural factors, such as the influence of archaic liturgical languages (e.g., various recensions of Church Slavic).

These unsettled orthographic situations can be seen in competing alphabets (e.g., Latin, Cyrillic, and Arabic scripts), competing spelling conventions in a single script (e.g., spelling the sound s (as in Sam) as s, ss, ß, or sz in German and some Slavic languages in the 17th-18th centuries), the use of historically unjustified “fancy” letters for stylistic reasons, the introduction of idiosyncratic letters of limited use (e.g., the ʒ in Romani), or combinations of all of the above.

In the context of Slavic and other Eastern European languages, this orthographic turmoil, which was mostly a result of the competing influences of Western versus Eastern European languages, was especially conspicuous on the linguistic territory of the former Yugoslavia and the other Balkan states, Bulgaria and Albania, but also occurred in the Central European countries in the Czech, Slovak, Polish, and Hungarian speech areas. Similar phenomena are also seen in Central Asia, where Latin, Cyrillic, and Arabic and Persian scripts competed during the early 20th century.

The various forms of Slavic liturgical languages, collectively and somewhat imprecisely referred to as “Church Slavic”, present specific complications. Catalogers are often unsure whether a book is in, say, Old Church Slavic, Church Russian, or some other Slavic recension of Church Slavic. Strictly speaking, Old Church Slavic is the South Slavic language that was spoken in the Balkans in the 9th century and that continued for several centuries thereafter as a liturgical language. Church Russian (also often referred to as Church Slavic), on the other hand, is a highly stylized variant of Russian used by the Orthodox Church. Bibliographic records are very inconsistent on this point: Church Russian can be found coded as either Russian (rus) or Church Slavic (chu). As the cataloging documentation is silent on the fact that the term “Church Slavic” is often used to refer to vastly different linguistic manifestations, it is understandable that such confusion abounds.

A more recent development is the emergence of pseudo-languages. These are writings in which, for stylistic or artistic reasons, a language is intentionally disguised by the orthography in order to appear to be another language, e.g., when Russian is written so as to suggest that the writer is a speaker of Russian who is familiar only with the Ukrainian alphabet (example http://catalog.lib.ku.edu/cgi-bin/Pwebrecon.cgi?bbid=6883750). The author clearly uses such a graphical presentation intentionally to make a particular point, and this information should be preserved in as faithful a way as possible in the bibliographic record, especially in cases where it may affect discoverability in the catalog. Another example is when modern Russian is written with Church Slavic letters and Church Slavic-looking grammatical endings, so as to appear to be Church Slavic.

Main cataloging principles

The cataloger’s challenge when confronted with a resource in mixed orthographies is to transliterate the title page information using the available ALA-LC romanization schemes in as faithful a way as possible, while making the resource findable in a predictable way for the user. This is the approach that is encouraged by both AACR2 and RDA. However, because these two goals are not always compatible, we offer additional strategies for achieving both.

The Slavic Cataloging Manual Task Force therefore suggests the following approaches to the descriptive portion of the cataloging record. These suggestions are our opinion only and do not represent any official cataloging body, but reflect our understanding, from experience, of how to best provide access within the framework of current cataloging conventions. Differing practices may become apparent as interpretation of RDA develops. In addition, cataloging agencies may have legitimate differing interpretations of the rules involved and historical local practices. Our approach is not meant to be prescriptive or to invalidate other approaches.

Suggested practices

Title Proper transliteration

In most cases of mixed orthography we are dealing with problems in representing the transliterated title proper of the work, represented by MARC field 245 in OCLC records, when relying on the approved ALA-LC romanization charts for each language. For the most part, cases of mixed orthography, whether they arise from the various recensions of Church Slavic, from an unsettled orthographic history in a specific time period, or from a more modern pseudo-language, can be handled in the manner suggested below.

1) The cataloger should first determine the language represented in the primary source of information, which is usually the title page. This determination will generally be made based on the body of the whole work, which often provides a much clearer indication of the nature of the language than a title alone can provide. This will also be the language you assign to the language fixed field code. In the case of various recensions of Church Slavic, it is preferable to identify the language with the underlying one--e.g., the Church Slavic recension of Serbian should be identified as Serbian.

2) Strive to transliterate the title proper of the resource as faithfully as possible using the approved ALA-LC romanization scheme for what appears to be the predominant script of the text. This may not necessarily correspond to the script normally associated with the primary language (e.g., modern Russian or Romanian church texts with orthography heavily influenced by Church Slavic). There will be cases where not all characters will be found in the romanization chart for the predominant script and the cataloger will have to use judgment in applying romanization schemes from other charts. In cases where the orthography is so mixed as to make this determination very difficult, err on the side of identifying the script with the language of the main body of the text.

3) If the information to be transliterated contains letters for which no appropriate ALA-LC romanization chart can give guidance (see example #1 below), the cataloger will have to decide on an alternate transliteration for those letters. In most cases, it is advisable to substitute the romanization for the modern equivalent of the letter.

4) Make a 546 language note explaining the nature of the language/script represented, e.g., “Serbian in Russo-Serbian orthography”, “Romanian in Church Slavic script” (example #3 below), or “Russian in Ukrainian-style spelling”, if not otherwise obvious. In some cases this may only pertain to the title page information, e.g., “Title page in Russian in pseudo-Church Slavic” for an item with a title as shown in example #2 below.
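
Steps 2 and 3 depend on knowing which letters of a title fall outside the chart being applied. The following is only a minimal Python sketch of that check, not part of any official procedure. The function name letters_outside_chart and the MODERN_RUSSIAN set are hypothetical; the set is an illustrative stand-in (the 33 letters of the modern Russian alphabet), and the actual repertoire of whichever ALA-LC chart is being applied should be substituted for it.

```python
import unicodedata

# Assumption: an illustrative stand-in for "the letters covered by the chart
# being applied". Replace with the repertoire of the actual ALA-LC chart.
MODERN_RUSSIAN = set("абвгдеёжзийклмнопрстуфхцчшщъыьэюя")


def letters_outside_chart(title, chart_letters=MODERN_RUSSIAN):
    """List each letter of the title that is not in chart_letters.

    Returns (letter, code point, Unicode name) tuples in order of first
    appearance, so the cataloger can decide how to romanize each one.
    """
    found = []
    for ch in title:
        low = ch.lower()
        if low.isalpha() and low not in chart_letters and low not in found:
            found.append(low)
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in found
    ]


if __name__ == "__main__":
    # A fragment of the title shown in example #1.
    for letter, code_point, name in letters_outside_chart("къ цѣли"):
        print(letter, code_point, name)
```

Each letter reported this way still calls for the cataloger's judgment: it may be covered by another chart, or it may need the romanization of its modern equivalent as described in step 3.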

Title added entries

Once the title proper has been transliterated to satisfaction, the cataloger may then supply as many added entries as deemed necessary to allow an average user of the modern form of the language identified as per step 1 above to locate and identify the resource in an electronic catalog.

Optional: Vernacular script access points

Option 1:

Add matching vernacular fields for the title proper and added title entries if those characters are available for use in OCLC (officially called Alternate Graphic Representation: see http://www.loc.gov/marc/bibliographic/bd880.html). Note, however, that certain extended Cyrillic characters used in Church Slavic, older Cyrillic Romanian, and Central Asian Cyrillic languages are not yet available. When characters are unavailable, matching vernacular fields can still often be paired with the added transliterated entries (MARC field 246) containing the modern orthographic form of the title. Titles containing mixed Latin and Cyrillic characters can be handled in the same way by using a matching vernacular field, as long as all characters are available in OCLC. All character sets supported by OCLC’s Connexion client can be found at: https://www.oclc.org/content/dam/support/connexion/documentation/client/international/internationalcataloging.pdf.

Option 2:

Although somewhat laborious, valid Unicode characters that are otherwise unavailable in the OCLC Connexion client can be represented in the matching vernacular fields by their hexadecimal Numeric Character Reference (NCR). To create one, enter the character's hexadecimal Unicode code point in the Connexion matching vernacular field, preceded by &#x and followed by a semicolon, thus: &#xNCR; For example, ѧ (U+0467) is encoded as &#x0467;
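
Because an NCR is simply the character's Unicode code point written in hexadecimal between &#x and a semicolon, it can be generated mechanically. The following is a minimal Python sketch under stated assumptions: the function name to_ncr is hypothetical, and the set of characters to encode is supplied by the cataloger, since this sketch does not know which characters a given Connexion installation supports.

```python
def to_ncr(text, unavailable):
    """Replace each character listed in `unavailable` with its hexadecimal NCR.

    `unavailable` is an assumption of this sketch: the set of characters
    the cataloger has found cannot be entered directly in the client.
    All other characters are passed through unchanged.
    """
    return "".join(
        f"&#x{ord(ch):04X};" if ch in unavailable else ch
        for ch in text
    )


if __name__ == "__main__":
    # Little yus (U+0467) is encoded; the remaining letters are left as typed.
    print(to_ncr("ѧзыкъ", {"ѧ"}))
```

Passing the set explicitly keeps the decision about which characters need an NCR with the cataloger rather than hard-coding a guess about the client's character sets.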

The full set of valid Unicode characters can be found at: http://www.unicode.org/charts/. The basic Cyrillic chart is here: http://www.unicode.org/charts/PDF/U0400.pdf and the extended Cyrillic characters at: http://www.unicode.org/charts/PDF/UA640.pdf.

If the local catalog is properly Unicode compliant, these characters should display correctly in the OPAC. They will also display correctly in OCLC WorldCat if they are entered through OCLC in the matching vernacular field (880). Experimentation will be necessary to see whether your catalog can handle these characters. For more on NCRs, see http://www.loc.gov/marc/marbi/2006/2006-09.html.

Examples

Example #1. Item in a mixed Russo-Serbian script (link to WorldCat record):

Старый пчеларъ, или, кратко руководство къ цѣли сходномъ практическомъ пчеловодству у плетенымъ кошньицама …