Binary Coded Morse

Here's a programming challenge.

Let's say you want to write a program to send CW. You would like to have a way to store the Morse Code translation of the entire character set (or even just the characters in your message) in an efficient manner, without having to manually program each dit, dah, and pause.

Welp...

The pauses between characters, words, and sentences are accounted for separately, leaving just dits and dahs to be encoded. This allows the entire alphabet to be represented in a binary format. To start, let's use zeros to store dits, and ones to store dahs. That gives us something like this:

A = dit dah = 01

B = dah dit dit dit = 1000

C = dah dit dah dit = 1010

CQ DE KI4MCW = 1010 - 1101 - - - 100 - 0 - - - 101 - 00 - 00001 - 11 - 1010 - 011

Using this approach, all of the characters (and I think the pro-signs too) used in International Morse Code contain fewer than eight dit/dah elements, so we can use single 8-bit integers to store the entire set. Obviously some characters are quite short, so we're going to be left with extra bits to fill in each byte. One solution would be to "left-justify" the symbols:

A = dit dah = 01 ==> [ 0 1 _ _ _ _ _ _ ]

Unfortunately, the remaining bits must also be either ones or zeros, so we would end up with something like this:

A = dit dah = 01 ==> [ 0 1 _ _ _ _ _ _ ] = [ 0 1 0 0 0 0 0 0 ]

We cannot tell from this where the end of the dits and dahs is - it looks like it should be "dit dah dit dit dit dit dit dit". We get the same problem if we "right-justify" the symbols:

A = dit dah = 01 ==> [ _ _ _ _ _ _ 0 1 ] = [ 0 0 0 0 0 0 0 1 ]
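
To see the ambiguity concretely, here is a small Python sketch that pads a few symbols to eight bits both ways, with no start or stop bit. The helper names pad_left and pad_right are made up for this illustration. Different characters collapse onto the same byte: left-justified, A and L both become 01000000, and right-justified, A, U, and V all become 00000001.

```python
# Dits are "0" and dahs are "1", as in the examples above.
SYMBOLS = {"A": "01", "L": "0100", "U": "001", "V": "0001"}

def pad_left(bits):
    """Left-justify: data bits first, then fill the byte with zeros."""
    return int(bits.ljust(8, "0"), 2)

def pad_right(bits):
    """Right-justify: zeros first, then the data bits."""
    return int(bits.rjust(8, "0"), 2)

for name, bits in SYMBOLS.items():
    print(name, format(pad_left(bits), "08b"), format(pad_right(bits), "08b"))
```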

What we need is some way to flag the beginning and ending of our symbols - a "stop bit" or a "start bit". Let's look at the "left-justified" approach again:

A = dit dah = 01 ==> [ 0 1 (stop) _ _ _ _ _ ]

The stop bit can only be a one or a zero. Unfortunately, both of those already have meaning, so as you read left-to-right, you would not be able to tell a symbol bit from a stop bit - the same problem we saw above. Let's look at the "right-justified" approach once more.

A = dit dah = 01 ==> [ _ _ _ _ _ (start) 0 1 ]

This is a little different - reading left-to-right, we start with unused bits, then hit the start bit, and only then do the data symbols begin. So as long as the empty bits and the start bit use different values, we can tell them apart, and then any bits afterward must be data symbols. Let's try this, using a one for the start bit, and fill the unused bits with zeros.

A = dit dah = 01 ==> [ _ _ _ _ _ (start) 0 1 ] = [ 0 0 0 0 0 (1) 0 1 ]

That might not =look= much better to the eye, but it is now super-easy to process. Moving left-to-right:

1. bit = 0: have not hit start bit yet - ignore

2. bit = 0: have not hit start bit yet - ignore

3. bit = 0: have not hit start bit yet - ignore

4. bit = 0: have not hit start bit yet - ignore

5. bit = 0: have not hit start bit yet - ignore

6. bit = 1: have not hit start bit yet - this is the start bit, data follows

7. bit = 0: have hit start bit - send a dit

8. bit = 1: have hit start bit - send a dah
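
The eight steps above can be sketched in Python roughly as follows. The function name decode_bcm is an assumption for this example, and it returns the elements as words rather than keying a transmitter.

```python
def decode_bcm(value):
    """Return the dits and dahs stored in one Binary Coded Morse byte."""
    elements = []
    got_start_bit = False
    for position in range(7, -1, -1):       # bit 7 (leftmost) down to bit 0
        bit = (value >> position) & 1
        if not got_start_bit:
            if bit == 1:
                got_start_bit = True         # start bit found, data follows
        elif bit == 0:
            elements.append("dit")
        else:
            elements.append("dah")
    return elements

print(decode_bcm(0b00000101))   # A -> ['dit', 'dah']
```

A byte of zero never hits a start bit, so it decodes to nothing at all, which makes zero a handy "no such character" value.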

The rest of the alphabet fills out the same way, resulting in decimal values as shown:

B = dah dit dit dit = 1000 ==> [ _ _ _ (start) 1 0 0 0 ] = [ 0 0 0 (1) 1 0 0 0 ] = [ 00011000 ] = 24

C = dah dit dah dit = 1010 ==> [ _ _ _ (start) 1 0 1 0 ] = [ 0 0 0 (1) 1 0 1 0 ] = [ 00011010 ] = 26

D = dah dit dit = 100 ==> [ _ _ _ _ (start) 1 0 0 ] = [ 0 0 0 0 (1) 1 0 0 ] = [ 00001100 ] = 12

Even numbers, pro-signs, and punctuation can be represented:

3 = dit dit dit dah dah = 00011 ==> [ _ _ (start) 0 0 0 1 1 ] = [ 0 0 (1) 0 0 0 1 1 ] = [ 00100011 ] = 35

SK = dit dit dit dah dit dah = 000101 ==> [ _ (start) 0 0 0 1 0 1 ] = [ 0 (1) 0 0 0 1 0 1 ] = [ 01000101 ] = 69

? = dit dit dah dah dit dit = 001100 ==> [ _ (start) 0 0 1 1 0 0 ] = [ 0 (1) 0 0 1 1 0 0 ] = [ 01001100 ] = 76
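
If you would rather not work out each value by hand, a short helper can do the packing. This is only a sketch: the name encode_bcm and the choice of "." and "-" as input notation are assumptions for the example. It starts with the start bit and shifts each element in behind it.

```python
def encode_bcm(pattern):
    """Pack a string of '.' (dit) and '-' (dah) into one BCM byte value."""
    if len(pattern) > 7:
        raise ValueError("only 7 elements fit in a byte alongside the start bit")
    value = 1                                # the start bit
    for element in pattern:
        if element not in ".-":
            raise ValueError("pattern may contain only '.' and '-'")
        value = (value << 1) | (1 if element == "-" else 0)
    return value

for name, pattern in [("A", ".-"), ("B", "-..."), ("C", "-.-."), ("D", "-.."),
                      ("3", "...--"), ("SK", "...-.-"), ("?", "..--..")]:
    print(name, format(encode_bcm(pattern), "08b"), encode_bcm(pattern))
```

Run as-is, it prints the same values listed above: 5, 24, 26, 12, 35, 69, and 76.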

With the alphabet down, the routine for sending CW from an arbitrary string becomes very simple. In pseudo code:

// load an array of constants with the Binary Coded Morse values
// associative array, for simplicity
bcmArray("A") = 5
bcmArray("B") = 24
...etc...

// calculate duration of each dit in ms
cwSpeed = 10      // start with words per minute

                  // Guesstimate, in rough numbers
                  // avg per letter:   3 dits
                  //                 + (2 dahs = 2*3 dits = 6 dits) 
                  //                 + (4 element gaps = 4 dits)
                  //                 = 3 + 6 + 4
                  //                 = 13 dit-lengths / letter
                  // avg per word:     (6 letters = 6*13 = 78 dits)
                  //                 + (5 letter gaps = 5*3 = 15 dits)
                  //                 + (1 word gap = 7 dits)
                  //                 = 78 + 15 + 7 
fudge = 100       //                 = 100 dit-lengths per word

msDit = (1/cwSpeed) * 60 * 1000 * (1/fudge)  // min    sec   ms    word
                                             // ---  * --- * --- * ----
                                             // word   min   sec   dits

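// calculate the duration of other elements
// (sendChar already sleeps msBetweenElems after the last element, so the
// character and word gaps are shortened to keep the totals at 3 and 7 dits)
msDah = msDit * 3
msBetweenElems = msDit
msBetweenChars = msDit * 2   // 1 + 2 = 3 dits between characters
msBetweenWords = msDit * 4   // 3 + 4 = 7 dits between words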

// main
sendString( "KI4MCW BEACON ON 7.021" )
return 0
// end main

function sendString( myString )
    for each myChar in myString
        if myChar = (a space) then sleep(msBetweenWords)
        else sendChar(myChar)
    end for
end function sendString

function sendChar( myChar )
    // look up BCM value from array
    bcmCode = bcmArray(myChar)
    if error (bcmCode is not valid) then return from function

    myGotStartBit = 0
    myKeyDownMs = 0 
    for each myBit in bcmCode, most significant (leftmost) bit first
        if myGotStartBit = 0 then
            if myBit = 1 then myGotStartBit = 1
        else 
            // this must be a data bit
            if myBit = 0 then myKeyDownMs = msDit
            else myKeyDownMs = msDah

            keyDown() ; sleep(myKeyDownMs) ; keyUp()

            // end of this element
            sleep(msBetweenElems)

        end if..else
    end for

    // end of this letter
    sleep(msBetweenChars)
end function sendChar

// end
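
For anyone who wants something runnable to poke at, here is a rough Python sketch of the same idea. Instead of keying a radio it builds a list of (key is down, milliseconds) steps, which makes it easy to inspect. The names key_plan and dits_per_word are assumptions for this example. The default of 100 matches the guesstimate above; the common PARIS convention works out to 50 dit-lengths per word, so pass 50 if you want that timing instead. Unlike the pseudo code, it lengthens the trailing gap rather than adding a separate sleep, so the spacing comes out at 1, 3, and 7 dit-lengths.

```python
# A few BCM values, start bit included (K = dah dit dah = 1 101, and so on).
BCM = {"K": 0b1101, "I": 0b100, "4": 0b100001,
       "M": 0b111, "C": 0b11010, "W": 0b1011}

def key_plan(text, table, wpm=10.0, dits_per_word=100):
    """Return a list of (key_is_down, milliseconds) steps for text."""
    dit = 60_000 / (wpm * dits_per_word)     # ms per dit
    plan = []

    def set_gap(ms):
        # Stretch the trailing key-up period instead of stacking gaps.
        if plan and not plan[-1][0]:
            plan[-1] = (False, max(plan[-1][1], ms))

    for char in text.upper():
        if char == " ":
            set_gap(7 * dit)                 # word gap
            continue
        code = table.get(char, 0)            # 0 = unknown, sends nothing
        # Data bits are everything below the start bit, leftmost first.
        for position in range(code.bit_length() - 2, -1, -1):
            is_dah = (code >> position) & 1
            plan.append((True, 3 * dit if is_dah else dit))
            plan.append((False, dit))        # element gap
        set_gap(3 * dit)                     # character gap
    return plan

for step in key_plan("KI4MCW", BCM):
    print(step)
```

To actually send, loop over the plan, switch the key line to match the first value of each step, and sleep for the second.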

For memory-constrained devices like microcontrollers, the BCM array can be stored in flash or EEPROM as appropriate.
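
The reason this works so well on small devices is that the whole table is one byte per character. As a stand-in for a flash or EEPROM array, here is a Python sketch that packs the values worked out above into a flat bytes object indexed by ASCII code. The table range ('0' through 'Z') and the name lookup are assumptions for the example, and only a handful of entries are filled in.

```python
# Values from the worked examples above; every other slot is 0 (no start bit).
CODES = {"A": 5, "B": 24, "C": 26, "D": 12, "3": 35, "?": 76}

FIRST, LAST = ord("0"), ord("Z")             # table covers ASCII '0'..'Z'
TABLE = bytes(CODES.get(chr(c), 0) for c in range(FIRST, LAST + 1))

def lookup(char):
    """Return the BCM byte for a single ASCII character, or 0 if unknown."""
    index = ord(char.upper()) - FIRST
    return TABLE[index] if 0 <= index < len(TABLE) else 0

print(len(TABLE), lookup("a"), lookup("?"), lookup("~"))
```

On a microcontroller the same layout would be a constant byte array, with the character code minus the first entry's code as the index.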