Normalization and Diacritics in Authority Records


Normalization is the conversion of a text string to a standardized form used for comparison. Text strings that normalize to the same form are considered duplicates, so the access points they represent must be differentiated from each other. The goal of normalization is to ensure that each authorized access point is unique.

NACO practice regarding normalization (from DCM Z1):

“When a new authority record is added to the authority file or when a new field is added to an existing NAR, each new access point is compared against access points already in the file to determine whether the new access point is adequately differentiated from existing authorized access points. All partners involved in the exchange of LC/NAF authority data have agreed to a specific set of rules for normalization, and these rules are posted at: http://www.loc.gov/aba/pcc/naco/normrule-2.html . Briefly, the process of normalization removes all diacritics and most punctuation, and converts all letters to uppercase and all modified letters to their unmodified equivalents. Subfield delimiters and subfield codes are retained in the normalized form. The normalized form of the name differs from the authorized form of the name as an access point.”

Please note: when adding variant access points in modern orthography, keep NACO normalization rules in mind. Because diacritics are disregarded in normalization, access points that differ only in diacritics (a frequent case) normalize to the same form, and therefore only one form of the name (modern or old) can be chosen.

With the above in mind, a see reference tracing (4XX field) may not contain a string that normalizes to the same form as an established heading (1XX field) in an authority record.
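The rule above can be sketched as a simple check. This is a minimal illustration, not the NACO comparison procedure itself: the function names simple_key and find_conflicts are hypothetical, the key only strips combining diacritics and uppercases the string, and "Semyonov" is an invented variant used for contrast.

```python
import unicodedata


def simple_key(text):
    """Simplified stand-in for a normalized form: no diacritics, uppercase."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() is non-zero for combining marks such as the diaeresis
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.upper()


def find_conflicts(heading_1xx, variants_4xx):
    """Return the proposed 4XX strings whose key matches the 1XX heading."""
    heading_key = simple_key(heading_1xx)
    return [v for v in variants_4xx if simple_key(v) == heading_key]


# "Semenov" differs from the heading only by a diacritic, so it conflicts.
print(find_conflicts("Semënov", ["Semenov", "Semyonov"]))  # ['Semenov']
```

A variant reported by such a check could not be recorded as a 4XX in the same record as the heading; the full rules, including punctuation and subfield handling, are those posted at the URL quoted above.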

Examples of normalization (shown in mixed case for readability; the normalized form itself is in uppercase):

    • Language-specific diacritics (háčeks, umlauts, acutes, etc.) used in Slavic Latin-script languages and in romanization will be ignored for the purposes of normalization, e.g., Havel, Václav will normalize as Havel, Vaclav; Semënov will normalize as Semenov.

    • Special characters (e.g., Polish ł) will normalize as their base characters, i.e., L in this case, e.g., Wałęsa, Lech will normalize as Walesa, Lech.

    • Technical special characters added as a result of romanization (e.g., ligatures, hard and soft signs) will be ignored for the purposes of normalization, e.g., Rossiĭskai︠a︡ akademii︠a︡ nauk will normalize as Rossiiskaia akademiia nauk, and Vserossiĭskiĭ nauchno-issledovatelʹskiĭ konʺi︠u︡nkturnyĭ institut will normalize as Vserossiiskii nauchno-issledovatelskii koniunkturnyi institut.
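The three kinds of examples above can be reproduced with a short sketch. It is only an approximation under stated assumptions: normalize_sketch and its two lookup tables are hypothetical names, the special-letter table covers only the Polish ł mentioned above, and punctuation and subfield handling are deliberately left out because the posted normalization rules govern them.

```python
import unicodedata

# Letters with no Unicode decomposition must be mapped by hand.
# Only the Polish l with stroke is listed here (assumption: a complete
# table would follow the posted NACO normalization rules).
SPECIAL_LETTERS = {"\u0142": "l", "\u0141": "L"}

# Romanization soft sign (U+02B9) and hard sign (U+02BA) are dropped.
DELETED = {"\u02b9", "\u02ba"}


def normalize_sketch(text):
    """Strip diacritics and romanization marks, map special letters, uppercase."""
    kept = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.category(ch) == "Mn":
            # Nonspacing marks: acute, hacek, breve, ogonek, ligature halves
            continue
        if ch in DELETED:
            continue
        kept.append(SPECIAL_LETTERS.get(ch, ch))
    # Uppercase and collapse runs of whitespace; punctuation is left untouched
    return " ".join("".join(kept).upper().split())


examples = [
    "Havel, Václav",
    "Wałęsa, Lech",
    # Ligature halves written as escapes (U+FE20, U+FE21) for clarity
    "Rossi\u012dskai\ufe20a\ufe21 akademii\ufe20a\ufe21 nauk",
]
for name in examples:
    print(normalize_sketch(name))
```

Two strings that yield the same result from such a function would be treated as duplicates for the purposes described in this section.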


See also:

Names in Pre-Reform Orthography
Translation with the Same Name as Original Title


Revised: Oct. 7, 2015