S2: A language of numbers

We are going to do a book review. Not really, but we are going to learn from the ground up with regards to how this language of numbers is connected to computing.

The best book I've come across is Code: The Hidden Language of Computer Hardware and Software by: Charles Petzold

This is a free online pdf. The pathological flow looks something like this.

Codes and Combinations

Morse code was invented by Samuel Finley Breese Morse (1791–1872), whom we shall meet more properly later in this book. The invention of Morse code goes hand in hand with the invention of the telegraph, which we'll also examine in more detail. Just as Morse code provides a good introduction to the nature of codes, the telegraph provides a good introduction to the hardware of the computer. Most people find Morse code easier to send than to receive. Even if you don't have Morse code memorized, you can simply use this table, conveniently arranged in alphabetical order:

Receiving Morse Code

Receiving Morse code and translating it back into words is considerably harder and more time consuming than sending because you must work backward to figure out the letter that corresponds to a particular coded sequence of dots and dashes. For example, if you receive a dash-dot-dash-dash, you have to scan through the table letter by letter before you finally discover that the code is the letter Y. The problem is that we have a table that provides this translation:

Alphabetical letter → Morse code dots and dashes

But we don't have a table that lets us go backward:

Morse code dots and dashes → Alphabetical letter

In the early stages of learning Morse code, such a table would certainly be convenient. But it's not at all obvious how we could construct it. There's nothing in those dots and dashes that we can put into alphabetical order. So let's forget about alphabetical order. Perhaps a better approach to organizing the codes might be to group them depending on how many dots and dashes they have. For example, a Morse code sequence that contains either one dot or one dash can represent only two letters, which are E and T:

A combination of exactly two dots or dashes gives us four more letters—I, A, N, and M:

A pattern of three dots or dashes gives us eight more letters:

And finally (if we want to stop this exercise before dealing with numbers and punctuation marks), sequences of four dots and dashes give us 16 more characters:

Taken together, these four tables contain 2 plus 4 plus 8 plus 16 codes for a total of 30 letters, 4 more than are needed for the 26 letters of the Latin alphabet. For this reason, you'll notice that 4 of the codes in the last table are for accented letters.

These four tables might help you translate with greater ease when someone is sending you Morse code. After you receive a code for a particular letter, you know how many dots and dashes it has, and you can at least go to the right table to look it up. Each table is organized so that you find the all-dots code in the upper left and the all-dashes code in the lower right.

Can you see a pattern in the size of the four tables? Notice that each table has twice as many codes as the table before it. This makes sense: Each table has all the codes in the previous table followed by a dot, and all the codes in the previous table followed by a dash. We can summarize this interesting trend this way:

Each of the four tables has twice as many codes as the table before it, so if the first table has 2 codes, the second table has 2 x 2 codes, and the third table has 2 x 2 x 2 codes. Here's another way to show that:

Of course, once we have a number multiplied by itself, we can start using exponents to show powers. For example, 2 x 2 x 2 x 2 can be written as 2 4 (2 to the 4th power). The numbers 2, 4, 8, and 16 are all powers of 2 because you can calculate them by multiplying 2 by itself. So our summary can also be shown like this:

This table has become very simple. The number of codes is simply 2 to the power of the number of dots and dashes.

We might summarize the table data in this simple formula: number of codes = 2 number of dots and dashes Powers of 2 tend to show up a lot in codes, and we'll see another example in the next chapter.

To make the process of decoding Morse code even easier, we might want to draw something like the big tree

like table shown here :

This table shows the letters that result from each particular consecutive sequence of dots and dashes. To decode a particular sequence, follow the arrows from left to right. For example, suppose you want to know which letter corresponds to the code dot-dash-dot. Begin at the left and choose the dot; then continue moving right along the arrows and choose the dash and then another dot. The letter is R, shown next to the last dot.

If you think about it, constructing such a table was probably necessary for defining Morse code in the first place. First, it ensures that you don't make the dumb mistake of using the same code for two different letters! Second, you're assured of using all the possible codes without making the sequences of dots and dashes unnecessarily long.

At the risk of extending this table beyond the limits of the printed page, we could continue it for codes of five dots and dashes and more. A sequence of exactly five dots and dashes gives us 32 (2x2x2x2x2, or 2^5 ) additional codes. Normally that would be enough for the 10 numbers and the 16 punctuation symbols defined in Morse code, and indeed the numbers are encoded with five dots and dashes. But many of the other codes that use a sequence of five dots and dashes represent accented letters rather than punctuation marks.

To include all the punctuation marks, the system must be expanded to six dots and dashes, which gives us 64 (2x2x2x2x2x2, or 2 6 ) additional codes for a grand total of 2+4+8+16+32+64, or 126, characters. That's overkill for Morse code, which leaves many of these longer codes "undefined." The word undefined used in this context refers to a code that doesn't stand for anything. If you were receiving Morse code and you got an undefined code, you could be pretty sure that somebody made a mistake.

Because we were clever enough to develop this little formula,

number of codes = 2^(number of dots and dashes)

we could continue figuring out how many codes we get from using longer sequences of dots and dashes:

Fortunately, we don't have to actually write out all the possible codes to determine how many there would be. All we have to do is multiply 2 by itself over and over again.

Morse code is said to be a binary (literally meaning two by two) code because the components of the code consist of only two things—a dot and a dash. That's similar to a coin, which can land only on the head side or the tail side. Binary objects (such as coins) and binary codes (such as Morse code) are always described by powers of two.

What we're doing by analyzing binary codes is a simple exercise in the branch of mathematics known as combinatorics or combinatorial analysis. Traditionally, combinatorial analysis is used most often in the fields of probability and statistics because it involves determining the number of ways that things, like coins and dice, can be combined. But it also helps us understand how codes can be put together and taken apart

Page updated

Report abuse

S2: A language of numbers

Codes and Combinations

Receiving Morse Code

felts