Word List Scoring

Notes on word list scoring and parsing strings of letters into words

Word list scoring can be combined with scoring by logs of tetragraph or pentagraph frequencies to evaluate the trial decrypts generated by hill-climbing programs. Word list scoring is especially helpful when solving patristocrats with convoluted plaintexts, and when solving Grandpre or numbered key ciphers with no crib.
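The notes don't say how the tetragraph half of the score is computed. A common approach, sketched here under that assumption, is to count every four-letter sequence in a sample of English text. A trial decrypt is then scored by summing the base-10 logs of the relative frequencies of its tetragraphs, using a small floor value (a hypothetical choice) for tetragraphs never seen in the sample. The function names and the one-sentence training text are illustrative only. A real table would come from a large corpus, and how it is weighted against the word list score is left open.

```python
import math
from collections import Counter


def tetragraph_table(corpus: str) -> tuple[dict[str, float], float]:
    """Build log10 relative frequencies of tetragraphs from a sample text (hypothetical helper)."""
    letters = "".join(c for c in corpus.upper() if c.isalpha())
    counts = Counter(letters[i:i + 4] for i in range(len(letters) - 3))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("corpus needs at least four letters")
    table = {tg: math.log10(n / total) for tg, n in counts.items()}
    # Assumed floor for unseen tetragraphs: a hundredth of a single occurrence.
    floor = math.log10(0.01 / total)
    return table, floor


def tetragraph_score(text: str, table: dict[str, float], floor: float) -> float:
    """Sum the log frequencies of every tetragraph in a trial decrypt; higher is more English-like."""
    letters = "".join(c for c in text.upper() if c.isalpha())
    return sum(table.get(letters[i:i + 4], floor) for i in range(len(letters) - 3))


table, floor = tetragraph_table("the theater was the last place they went then")
print(tetragraph_score("thetheater", table, floor) > tetragraph_score("qxzjqxzjqx", table, floor))  # True
```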

Parsing strings of letters into words, or into a combination of suffixes, words, and prefixes, is helpful for solving running key ciphers. You can drag thousands of “standard cribs” across a running key cipher and try to parse each resulting decrypted string into a suffix plus words plus a prefix.
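Dragging a crib can be sketched as follows, assuming the running key was applied with the ordinary Vigenère tableau, so that ciphertext = plaintext + key (mod 26). Subtracting a crib from the ciphertext at every offset gives a candidate fragment of the other text at that offset. Because the addition is symmetric, this works whether the crib is a guess at the plaintext or at the key. These fragments are the strings you would then try to parse. The helper names `encrypt` and `drag_crib` are hypothetical, and a Beaufort or variant tableau would need a different subtraction.

```python
def encrypt(plaintext: str, key: str) -> str:
    """Running key encryption with the Vigenère tableau: C = P + K (mod 26).
    Letters A-Z only; the key must be at least as long as the plaintext."""
    return "".join(
        chr((ord(p) - 65 + ord(k) - 65) % 26 + 65)
        for p, k in zip(plaintext.upper(), key.upper())
    )


def drag_crib(ciphertext: str, crib: str) -> list[tuple[int, str]]:
    """Subtract the crib at every offset; each result is a candidate fragment of the other text."""
    c, p = ciphertext.upper(), crib.upper()
    return [
        (i, "".join(chr((ord(c[i + j]) - ord(p[j])) % 26 + 65) for j in range(len(p))))
        for i in range(len(c) - len(p) + 1)
    ]


ciphertext = encrypt("MEETTHEMAT", "INGODDLIGH")
for offset, fragment in drag_crib(ciphertext, "THE"):
    print(offset, fragment)  # offset 4, where "THE" really sits, yields the key fragment "DDL"
```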

I’ve posted a short slideshow about parsing strings into words here.

Word lists for scoring trial decrypts and for parsing strings can be stored in a “trie” structure. Words are stored in a trie as single letters, each with pointers to the letters that can follow it, so words that begin with the same letters share the same path. I’ve posted an online program, Demo1, that constructs a trie from words entered by the user. As an example, you might want to enter these words:

the

last

they

then

theater

Then try searching the strings “theater” and “theateqx”.
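A minimal trie along these lines can be sketched in Python. Each node holds a dictionary of pointers to the letters that can follow it, plus a flag marking the end of a word. The class and method names are hypothetical and not Demo1's actual code. `word_lengths_at` walks the trie along a string from a starting position and reports the length of every stored word that begins there. It stops as soon as a letter has no matching child.

```python
class TrieNode:
    """One letter position: pointers to the letters that can follow, plus an end-of-word flag."""

    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}
        self.is_word = False


class Trie:
    def __init__(self, words=()) -> None:
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def word_lengths_at(self, s: str, start: int = 0) -> list[int]:
        """Lengths of all stored words that begin at s[start], shortest first."""
        node = self.root
        lengths = []
        for i in range(start, len(s)):
            node = node.children.get(s[i].lower())
            if node is None:  # no stored word continues with this letter
                break
            if node.is_word:
                lengths.append(i - start + 1)
        return lengths


trie = Trie(["the", "last", "they", "then", "theater"])
print(trie.word_lengths_at("theater"))   # [3, 7]
print(trie.word_lengths_at("theateqx"))  # [3]
```

The search on “theateqx” finds “the” and then stops at “q”, because no stored word continues “thete” with that letter.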

From a list of words stored in a trie, you can construct a word length table for a given string of letters as mentioned in the slideshow above. I’ve posted an online program, Demo2, that uses a list of about 60,000 words to construct a word length table for any short string that the user enters. You might try entering the string “advocatedaydreamerudition”.
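Assume the word length table records, for each position in the string, the lengths of all words that start at that position. It can then be built by walking a trie once from every starting position. The sketch below uses nested dictionaries as a compact trie. The function names and the short word list are hypothetical stand-ins for Demo2's list of about 60,000 words.

```python
END = ""  # end-of-word marker; can never equal a single letter of the string


def build_trie(words):
    """Nested-dict trie: each key is a letter, END marks a complete word."""
    root = {}
    for word in words:
        node = root
        for ch in word.lower():
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def word_length_table(s, trie):
    """For each position in s, list the lengths of all words starting there."""
    s = s.lower()
    table = []
    for start in range(len(s)):
        node, lengths = trie, []
        for i in range(start, len(s)):
            node = node.get(s[i])
            if node is None:  # no word continues with this letter
                break
            if END in node:
                lengths.append(i - start + 1)
        table.append(lengths)
    return table


words = ["a", "ad", "advocate", "advocated", "day", "daydream",
         "daydreamer", "dream", "dreamer", "erudition"]
s = "advocatedaydreamerudition"
for pos, lengths in enumerate(word_length_table(s, build_trie(words))):
    print(pos, s[pos], lengths)
```

With this list, position 0 gets [1, 2, 8, 9] (“a”, “ad”, “advocate”, “advocated”) and position 8 gets [3, 8, 10] (“day”, “daydream”, “daydreamer”).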

Parsing a string into a suffix plus zero or more words plus a prefix, as mentioned in the slideshow, is illustrated in an online program, Demo3. This program uses the same list of about 60,000 words as Demo2. Here’s an interesting string to try for a complete parse: “ingoddlightasymbol”.
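The suffix-plus-words-plus-prefix parse can be sketched under one assumption: a “suffix” is any proper ending of a listed word, and a “prefix” is any proper beginning, with either allowed to be empty. The sketch tries every split point for the suffix and the prefix, then fills the middle with whole words using memoized recursion. The function names and the eight-word list are hypothetical, so the parses found depend entirely on that list. A large list would produce many more.

```python
from functools import lru_cache


def build_affixes(vocab):
    """Proper suffixes and prefixes of every word, plus the empty string."""
    suffixes, prefixes = {""}, {""}
    for w in vocab:
        for k in range(1, len(w)):
            prefixes.add(w[:k])
            suffixes.add(w[k:])
    return suffixes, prefixes


def parse(s, words):
    """Return every (suffix, [words...], prefix) split of s."""
    s = s.lower()
    vocab = {w.lower() for w in words}
    suffixes, prefixes = build_affixes(vocab)
    max_len = max(map(len, vocab), default=0)

    @lru_cache(maxsize=None)
    def segment(i, j):
        """All ways to split s[i:j] into whole words, as a tuple of tuples."""
        if i == j:
            return ((),)  # zero words is a valid middle
        results = []
        for k in range(i + 1, min(j, i + max_len) + 1):
            if s[i:k] in vocab:
                for rest in segment(k, j):
                    results.append((s[i:k],) + rest)
        return tuple(results)

    parses = []
    for i in range(len(s) + 1):
        if s[:i] not in suffixes:
            continue
        for j in range(i, len(s) + 1):
            if s[j:] not in prefixes:
                continue
            for middle in segment(i, j):
                parses.append((s[:i], list(middle), s[j:]))
    return parses


words = ["seeing", "odd", "light", "a", "symbol", "symbolic", "in", "god"]
for p in parse("ingoddlightasymbol", words):
    print(p)
```

With this list, one complete parse is the suffix “ing” (from “seeing”) followed by “odd light a symbol” and an empty prefix. Another ends with “symbol” treated as a prefix of “symbolic”.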

A parsing demo program with a bigger list of words and phrases is posted in zip form here. You can download it, unzip it, and run the program “parse_demo.html” from your hard drive. The demo program will load the phrase list “demo_list.js” from your hard drive. The phrase list is 6 MB unzipped, so it will take a while to load.

These parsing demo programs also include buttons for forward (left-to-right) and reverse (right-to-left) word list scoring. In word list scoring, each word in the string is scored by adding the square of its length (always taking the longest word possible), skipping words that are too short. In basic word list scoring, one point is subtracted for each position at which no word is found. In extended word list scoring, each stretch of the string where no word is found is scored by subtracting the square of its length.

As an example, the string “theaterqqq” scores 49 for the word “theater” (the longest word at position 0), minus 3 for “qqq” in basic scoring (a total of 46) or minus 9 for “qqq” in extended scoring (a total of 40).
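Both scoring variants, in both directions, can be sketched in a single function. It rests on two assumptions. The minimum word length `min_len` (default 3 here) is a placeholder, since the demos' threshold for “too short” isn't stated. Reverse scoring is done by reversing the string and every word, so “the longest word starting here” becomes “the longest word ending here”. The function name and its parameters are hypothetical.

```python
def word_list_score(s, words, min_len=3, extended=False, reverse=False):
    """Greedy longest-word scoring: +len^2 per word found; unmatched letters
    cost 1 each (basic) or len^2 per unmatched stretch (extended)."""
    s = s.lower()
    vocab = {w.lower() for w in words if len(w) >= min_len}  # short words are skipped
    max_len = max(map(len, vocab), default=0)
    if reverse:
        # Right-to-left scan: reverse the string and the words, then scan forward.
        s = s[::-1]
        vocab = {w[::-1] for w in vocab}
    score, i, gap = 0, 0, 0
    while i < len(s):
        longest = 0
        for k in range(min(len(s), i + max_len), i, -1):  # try longest candidates first
            if s[i:k] in vocab:
                longest = k - i
                break
        if longest:
            score -= gap * gap if extended else gap  # close off any unmatched stretch
            gap = 0
            score += longest * longest
            i += longest
        else:
            gap += 1
            i += 1
    score -= gap * gap if extended else gap
    return score


words = ["the", "last", "they", "then", "theater"]
print(word_list_score("theaterqqq", words))                 # 46
print(word_list_score("theaterqqq", words, extended=True))  # 40
```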