How to detect errors in your calculations or your data
Introduction
Checksums and check digits are small data elements that are derived from larger data blocks and used to detect errors in generation, transmission or storage of those larger data blocks. A change in the data block should result in a change in the checksum. For convenience, a checksum should contain less digits than the data block, but this implies that the same checksum value can belong to a number of different data blocks. There are various algorithms for calculating checksums. Good checksum algorithms are specially sensitive to common errors in the data block, like switching two adjacent digits or changing one digit. Some checksum algorithms are designed to prevent phonetic errors[Verhoeff] like thirty — thirteen. Many checksum algorithms look likeΣ | w_{i} d_{i} = 0 mod m |
where | w_{i} | some position-dependent integer weight |
d_{i} | : digit at position i starting from the right | |
m | : some integer |
11 | |
Σ | i d_{i} = 0 mod 11 |
i=1 |
The last digit, the check digit, thus ranges from 0 to 10. The number 10 is represented by an “X”. ISBN-10 always detects the change of a single digit in the original data and switching two adjacent or nearly adjacent digits ab→ba or abc→cba, but not abc→cab.
The more recent 13-digit ISBN uses a different algorithm with modulo 10:
So ISBN-13 has a check digit ranging from 0 to 9. It does not even detect a single switch of two adjacent digits if they differ by 5.
The more recent 13-digit ISBN uses a different algorithm with modulo 10:
13 | ||||
Σ | { | i is odd: d_{i} i is even: 3d_{i} |
} | = 0 mod 10 |
i=1 |
Checksums are not always explicitly visible in the actual number. This is, for example, the case in the Rudolf Mosse Code Condenser for telegrams[Mosse]. The aim of the Mosse Code is to lower the costs of telegrams by reducing their size. First common words and phrases are expressed in small numbers using a dictionary, then an intermediate checksum is calculated per five 2-digit numbers, and finally the 2-digit numbers are encoded in 2-character codes. This final encoding is controlled by the checksums. The checksums themselves are not part of the final message, so do not increase the transmission costs directly, but they force the encoding to be not the most efficient one. When decoding an erroneous message an inconsistency occurs between the possible checksums indicated by the encoding scheme and the checksums of the decoded message. Other telegram condensers do show checksums in the final message, and sometimes even checksums that contain a check digit to check themselves.[Barto]
The calculation of a checksum has to be fast and not error-prone, so it is useful to mechanize this calculation.
Casting out nines and elevens
Checksums can also be used to check calculations. Best known are “casting out nines” and “casting out elevens” for addition, subtraction, multiplication and division (see Appendix).In 1911 Rudolf Malý patented a simple device for casting out nines to check calculations ([Malý], Figure 1). The digits of a number have to be dialed in one by one to get a check digit. In Malý's description the remainder of a division is not checked. Malý also mentions “regel-de-tri” as a separate operation that can be checked with his device.
Try the interactive simulation
of this device
of this device
Much later, a similar device for casting out elevens was marketed for 2,50 Dutch Guilders[SC1953] as the “Controlex” ([Reumhelm1953], Figure 2). It was made by “Reumhelm SA”, an company located in Amsterdam.
Try the interactive simulation
of this device
of this device
Because casting out elevens involves alternatingly adding and subtracting digits, the Controlex contains two setting slots (A and B). One starts by dialing in the right-most digit in slot A, which causes the digits in slot A to be covered and the digits in slot B to be revealed. The digits in slot B run in a direction opposite to those in slot A, so dialing in slot B in the same direction as in slot A results in a subtraction. After dialing in the second digit in slot B the digits in slot B are covered, and those in slot A are revealed, ready for adding the third digit. The check digit is read in the small window C. An X is used to represent 10. The slot BB can be used to add or subtract check digits modulo 11. The description of the Controlex also mentions checking powers and square roots.
A device for casting out threes appears in a Swiss patent by Jakow Trachtenberg (Figure 4). This patent also describes variants for casting out nines and elevens, which are more useful.
Credit cards
The 12-digit number on your credit card also contains a check digit. Its algorithm is attributed to Hans Peter Luhn, an IBM employee.12 | ||||
Σ | { | i is odd: d_{i} i is even: 2d_{i} mod 9 |
} | = 0 mod 10 |
i=1 |
Luhn invented a mechanical calculator for this checksum ([Luhn1960], Figure 4). It looks like an ordinary chain adder, but it does not have a tens carry and all chains add to the same number in the small window at the far right. If we number the slots from the right, starting with 1, the even-numbered input slots have a strange digit distribution. Where an odd slot shows a 1, an even one shows a 5, so entering a 5 in an even slot is equivalent to entering a 1 in an odd slot. This is because 2\times 5 = 1 mod 9. Similarly, the number 2 in an odd slot corresponds to 1 in an even slot. If a correct credit card number is entered, the number in the small window is zero. The device can also be used to calculate the check digit for an as yet incomplete credit card number. In that case, no number is entered into the rightmost slot and the digit is read from the small window. The scale behind this window runs opposite to the chains, so it shows the tens complement of the number entered, which is exactly what is needed when determining the check digit.
Before that, in 1944, Hans P. Luhn invented a relay-based checking device for calculators, [Luhn1947]. It used only modulo 3, probably because it employed hard-wired tables for multiplication and addition.
A credit card checksum device following the principles of the Controlex was patented in 1962 by the Pure Oil Company ([Zitnik], Figure 5).
Try the interactive simulation
of this device
of this device
Railroad cars
For American railroad cars a labeling scheme is prescribed by the Association of American Railroads. After converting letters to numbers, a check digit is calculated asΣ_{i=0} 2^{i} d_{i} mod 11.
A 1969 US Patent [Buck] describes a simple device for this calculation (Figure 6). Note the different distribution of numbers for each column, representing 2^{i} d_{i} mod 11, and the huge result “register” which is partially cut away to reduce the size of the image. The patent describes a version with a circular register, which would be more convenient, but does not show it.
Adding and bookkeeping machines
In bookkeeping, checksums can be used to assure the correctness of bank account numbers, article numbers and the like. But they can also be used to check the transfer of an intermediate balance sum from one sheet to another.As early as 1930, Paul J. Simones, of Dubuque, Iowa, obtained a patent for a checksum attachment for adding machines (Figure 7). The three-column adding machine used as checksum 4d_{2} - 3d_{1} - 2d_{0} mod 17.
Figure 8: Remington “Computing Machine”. Images from US Patent 1,810,329,
and an early Remington advertisement for a bookkeeping machine
(not showing the actual checksum attachment).
In 1931, the Remington Accounting Machine Corp. received a patent for a “Computing machine” which turns out to be a checksum attachment for a Remington/Wahl bookkeeping machine that enables “proof of pick-up” (Figure 8). This small device can be put on the carriage of a special typewriter, among ordinary column adding attachments. It works with modulo 11, the 10 being represented by an X, as can be seen in the lower left of Figure 8.
Figure 9: Electronic checksum attachment for Ascota mechanical
bookkeeping machines (courtesy of robotrontechnik.de)
The East-German-built Ascota 170 bookkeeping machine, which itself was electromechanical, could be extended with an electronic checksum device for account numbers (Figure 9). The device consisted of four boards with germanium transistors. Also Walther and Anker made special checksum devices for their electromechanical machines.
Punch card punches could also be equipped with checksum devices (Figure 10).
Punch card punches could also be equipped with checksum devices (Figure 10).
Manufacturers of relay-based or electronic computing equipment, like Bull and IBM, used checksums for checking data transfer.
In electronic binary computers, parity bits are used which can be regarded as the simplest form of checksum.
Complex codes
The Dutch mathematician Koos Verhoeff compiled statistics on the actual occurrence of errors in transferring numbers[Verhoeff]. He developed an checksum algorithm based on the dihedral group D_{5}. In this algorithm, any addition to the checksum depends on the intermediate value of the checksum. A slightly modified version of this algorithm was used on the Deutsche Mark bank notes. It appears in a patent for Philips[Philips]. Verhoeff is also mentioned as inventor on patents issued to Anker Werke [Anker].Read more about the Verhoeff algorithm in an appendix.
Epilogue
Several checksum and check digit calculators have been designed for manual use. Some of them were not very user-friendly. No surviving examples of the simple manual devices have been found. Checksum calculators attached to bookkeeping machines were often based on relays or electronic components, even if the bookkeeping machine was purely mechanical. These devices do survive, but are not always recognized.References
- [Mosse] J. Kähler, “Rudolf Mosse-Code mit Mosse-Condenser,” Rudolf Mosse, Berlin, 1922.
- [Barto] C.B. Barto, “Economie en Techniek van Codes en Code-Condensors,” PhD Thesis TH Delft, 1933.
- [Luhn1960] Hans P. Luhn, “Computer for verifying numbers,” US Patent 2,950,048, filed 6-1-1954, patented 23-8-1960, assigned to IBM.
- [Zitnik] Leonard C Zitnik, “Check digit computing apparatus”, US Patent 3,033,450, filed 18-12-1959, patented 8-5-1962, assigned to Pure Oil Comp.
- [Luhn1947] Hans P. Luhn, “Calculation checking device,” US Patent 2,425,549, filed 31-8-1944, patented 12-8-1947, assigned to IBM.
- [Malý] Rudolf Malý, “Kontrollrechenapparat,” German Patent 244,183, issued 29-4-1911.
- [Verhoeff] J. Verhoeff, “Error Detecting Decimal Codes”, (Mathematical Centre Tracts, 29). Mathematisch Centrum. Amsterdam 1969; second edition 1975.
- [Philips] J. Verhoeff, “Device for checking a group of symbols...”, US Patent 3,675,202, filed 1-6-1969, patented 4-7-1972, assigned to U.S. Philips Corp.
- [Anker] J. Verhoeff, “Data checking system”, US Patent 3,448,254, filed 28-7-1965, patented 3-6-1969, assigned to Anker Werke.
- [Buck] Norman R. Buck, “Check digit calculator”, US Patent 3,451,619, filed 11-9-1967, patented 24-6-1969, assigned to Westinghouse Air Brake Comp.