Estimating Key Length

Estimating Key Length of a Numbered Key Cipher[1]

By MATANZA

BION, in the May-June 2010 issue of the Cryptogram magazine, introduced the "Numbered Key Cipher"[2] and provided three crypts for solving. A slightly different form of the article is accessible at this website. The cipher's plaintext alphabet is an extended key, repeated letters included, followed by the unused letters in alphabetical order; this alphabet is set against consecutive numbers, displaced relative to it, which serve as the ciphertext. Because a letter may occur more than once in the extended key, it can have several numeric substitutes.
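To make the construction concrete, here is a minimal Python sketch. The key "numberedkeycipher" and the offset of 7 are made up for illustration. Reading "displaced" as "numbering starts at an arbitrary offset and wraps around modulo the alphabet length" is an assumption, as is picking substitutes at random (a constructor would normally choose them by hand).

```python
import random
import string


def build_numbered_alphabet(key, offset):
    """Return (number, letter) pairs for the extended plaintext alphabet.

    The extended alphabet is the key letters in order, repeats kept,
    followed by the letters the key does not use, in alphabetical order.
    ASSUMPTION: the numbering is "displaced" by starting at `offset` and
    wrapping around modulo the alphabet length.
    """
    key = "".join(c for c in key.lower() if c in string.ascii_lowercase)
    residue = [c for c in string.ascii_lowercase if c not in key]
    letters = list(key) + residue
    size = len(letters)
    return [((offset + i) % size, c) for i, c in enumerate(letters)]


def substitutes(pairs):
    """Map each plaintext letter to every number that can stand for it."""
    table = {}
    for number, letter in pairs:
        table.setdefault(letter, []).append(number)
    return table


def encipher(plaintext, pairs, rng):
    """Replace each letter with one of its substitutes, picked at random."""
    table = substitutes(pairs)
    return [rng.choice(table[c]) for c in plaintext.lower() if c in table]


def decipher(numbers, pairs):
    """Each number stands for exactly one letter, so decoding is a lookup."""
    lookup = dict(pairs)
    return "".join(lookup[n] for n in numbers)


# Hypothetical key and offset, for illustration only.
pairs = build_numbered_alphabet("numberedkeycipher", offset=7)
ct = encipher("attack at dawn", pairs, random.Random(1))
print(" ".join(f"{n:02d}" for n in ct))
print(decipher(ct, pairs))  # attackatdawn
```

With this key the alphabet has 17 key letters plus 13 residue letters, 30 in all. The letter e occurs four times in the key and so has four substitutes, while each residue letter has exactly one.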

In solving, it helps to be able to estimate the length of the extended key and the number of residual letters (the unused letters that follow the key). Because the residual letters tend to have low frequencies, one can often spot where they lie in the number sequence and make good guesses at individual residual letters. A good estimate of the length can be made with the statistical method of the "expected number of blanks" presented by Solomon Kullback[3], where a "blank" is a ciphertext element (in this case a number) with zero frequency.

LENGTH OF TEXT, N   EXPECTED NUMBER OF     ROUNDED SUM
(key length)        BLANKS, B(N)           N + B(N)
                    (residue letters)      (plaintext alphabet length)
20                  14.5                   35
30                  11.9                   42
40                  10.1                   50
50                   8.9                   59
60                   8.0                   68
70                   7.2                   77
80                   6.6                   87
100                  5.7                   106
125                  4.9                   130
150                  4.4                   154
175                  4.0                   179
200                  3.6                   204

TABLE I. Expected number of blanks vs. text length

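In this application, N is the key length, B(N) is the number of residual letters, and N + B(N) is the length of the extended plaintext alphabet. The key plays the role of the text: the letters that never occur in it are its blanks, and they become the residue.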
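The B(N) column has a simple closed form. Treating a text of N letters as N independent draws in which letter i has probability p_i, letter i is absent with probability (1 - p_i)^N, so the expected number of blanks is B(N) = sum over the 26 letters of (1 - p_i)^N. The Python sketch below evaluates it. The frequency table is an assumption: a commonly quoted set of English letter frequencies, not the Elsy data behind Table I, so the output is close to but not identical with the table.

```python
# ASSUMPTION: commonly quoted English letter frequencies in percent;
# these are not the Elsy data used to compute Table I.
ENGLISH_PCT = {
    "a": 8.2, "b": 1.5, "c": 2.8, "d": 4.3, "e": 12.7, "f": 2.2,
    "g": 2.0, "h": 6.1, "i": 7.0, "j": 0.15, "k": 0.77, "l": 4.0,
    "m": 2.4, "n": 6.7, "o": 7.5, "p": 1.9, "q": 0.095, "r": 6.0,
    "s": 6.3, "t": 9.1, "u": 2.8, "v": 0.98, "w": 2.4, "x": 0.15,
    "y": 2.0, "z": 0.074,
}


def expected_blanks(weights, n):
    """Expected number of letters absent from a random n-letter text.

    Treating the text as n independent draws, letter i is missing with
    probability (1 - p_i) ** n; the expected count of missing letters is
    the sum of those probabilities. Weights are normalized to sum to 1.
    """
    total = sum(weights.values())
    return sum((1 - w / total) ** n for w in weights.values())


for n in (20, 50, 100, 200):
    b = expected_blanks(ENGLISH_PCT, n)
    print(f"N = {n:3d}   B(N) = {b:4.1f}   N + B(N) = {n + b:5.1f}")
```

Substituting a different frequency table shifts the numbers somewhat, which is why the reference used for Table I matters.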

For the first crypt, NK-1, the keyed alphabet is 27 letters long. This is rather short for a meaningful key-length analysis; the cipher is almost simple substitution.

For the second crypt, NK-2, the length of the keyed alphabet is 57 by implication: the largest number is 56, and adding one for 00 gives 57. Interpolating on the N + B(N) column of Table I, the expected number of blanks (rounded) is 9, so the key length is 57 - 9 = 48. The actual key was 48 letters long.
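The interpolation can be sketched in Python. Linear interpolation between adjacent rows is an assumption, and the sketch interpolates on the unrounded sums N + B(N) rather than on the rounded column; for NK-2 either choice gives 9 blanks.

```python
# Table I as (N, B(N)) pairs, copied from the article.
TABLE_I = [
    (20, 14.5), (30, 11.9), (40, 10.1), (50, 8.9), (60, 8.0), (70, 7.2),
    (80, 6.6), (100, 5.7), (125, 4.9), (150, 4.4), (175, 4.0), (200, 3.6),
]


def interpolate(xs, ys, x):
    """Piecewise-linear interpolation; xs must be increasing."""
    for i in range(len(xs) - 1):
        x0, x1 = xs[i], xs[i + 1]
        if x0 <= x <= x1:
            return ys[i] + (ys[i + 1] - ys[i]) * (x - x0) / (x1 - x0)
    raise ValueError(f"{x} is outside the table range")


def split_alphabet(alphabet_length):
    """Estimate (key length, residue length) from the alphabet length."""
    sums = [n + b for n, b in TABLE_I]  # the N + B(N) column, unrounded
    blanks = [b for _, b in TABLE_I]
    residue = round(interpolate(sums, blanks, alphabet_length))
    return alphabet_length - residue, residue


# NK-2: largest number 56, plus one for 00, gives an alphabet of 57.
print(split_alphabet(56 + 1))  # (48, 9)
```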

The third crypt, NK-3, is:

NK-3. Numbered key. Motley crew. (andalldr-)

26 08 09 23 21 28 03 09 14 12 36 09 30 36 33 02 11 09 30 03 19 20

20 16 14 25 38 01 33 03 07 23 32 08 33 32 33 13 13 30 00 34 19 28

36 20 15 07 08 30 34 33 12 16 07 08 28 34 14 03 36 10 21 14 32 08

36 03 33 38 38 22 28 36 37 09 33 07 27 09 03 31 07 08 09 33 25 24

00 33 19 18 21 14 26 27 26 27 09 24 03 37 11 03 35 19 24 25 38 14

25 38 35 19 20 10 28 32 20 32 27 30 00 33 15 36 33 19 19 36 03 14

01 01 24 15 38 29 28 26 35 03 20 13 18 33 25 36 11 15 35 20 00 37

20 03 26 33 34 19 09.

NK-3 compiled frequencies (each pair is a cipher number followed by its count):

00  4    08  6    16  2    24  4    32  5
01  3    09  9    17  0    25  5    33 13
02  1    10  2    18  2    26  5    34  4
03 11    11  3    19  8    27  4    35  4
04  0    12  2    20  8    28  6    36  9
05  0    13  3    21  3    29  1    37  3
06  0    14  7    22  1    30  5    38  6
07  5    15  4    23  2    31  1
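The compiled frequencies and the blanks can be checked with a short Python sketch. The ciphertext string is copied from NK-3; the variable names are illustrative.

```python
from collections import Counter

# NK-3 ciphertext, copied from the crypt.
NK3 = """
26 08 09 23 21 28 03 09 14 12 36 09 30 36 33 02 11 09 30 03 19 20
20 16 14 25 38 01 33 03 07 23 32 08 33 32 33 13 13 30 00 34 19 28
36 20 15 07 08 30 34 33 12 16 07 08 28 34 14 03 36 10 21 14 32 08
36 03 33 38 38 22 28 36 37 09 33 07 27 09 03 31 07 08 09 33 25 24
00 33 19 18 21 14 26 27 26 27 09 24 03 37 11 03 35 19 24 25 38 14
25 38 35 19 20 10 28 32 20 32 27 30 00 33 15 36 33 19 19 36 03 14
01 01 24 15 38 29 28 26 35 03 20 13 18 33 25 36 11 15 35 20 00 37
20 03 26 33 34 19 09
"""

numbers = [int(tok) for tok in NK3.split()]
freq = Counter(numbers)
top = max(numbers)

# Blanks: numbers from 00 up to the largest observed number that never occur.
blanks = [n for n in range(top + 1) if freq[n] == 0]

print(len(numbers), "numbers; largest is", top)  # 161 numbers; largest is 38
print("blanks:", [f"{n:02d}" for n in blanks])   # blanks: ['04', '05', '06', '17']
```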

For this crypt, the length of the keyed alphabet is not known; however, it can be estimated using Table I. The crypt is 161 characters long. Assuming the constructor used all the available substitutes, the number of blanks in the ciphertext would probably be the same as in the underlying plaintext, since only letters missing from the plaintext would then leave unused numbers. Interpolating for 161 in the N column of Table I, the expected number of blanks is four. Since the number series 00 to 38 contains four blanks (04, 05, 06 and 17), this implies a keyed alphabet length of 39. If not all substitutes were used, it might be a bit higher. (Actually, if the residue doesn't wrap around the end of the number sequence, the highest number plus one, in this case 38 + 1, is a good estimate.)
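The NK-3 estimate can be sketched the same way. Linear interpolation between rows of Table I is assumed, and the decision rule (take the largest number plus one when the observed blank count matches the rounded expectation) is a simplification of the reasoning above, not a general method.

```python
# Table I as (N, B(N)) pairs, copied from the article.
TABLE_I = [
    (20, 14.5), (30, 11.9), (40, 10.1), (50, 8.9), (60, 8.0), (70, 7.2),
    (80, 6.6), (100, 5.7), (125, 4.9), (150, 4.4), (175, 4.0), (200, 3.6),
]


def blanks_for_length(n):
    """Linear interpolation of B(N) between adjacent rows of Table I."""
    for (n0, b0), (n1, b1) in zip(TABLE_I, TABLE_I[1:]):
        if n0 <= n <= n1:
            return b0 + (b1 - b0) * (n - n0) / (n1 - n0)
    raise ValueError(f"N = {n} is outside Table I")


crypt_length = 161
observed_blanks = [4, 5, 6, 17]  # 04, 05, 06 and 17, from the tally
largest_number = 38

expected = blanks_for_length(crypt_length)
print(f"expected blanks: {expected:.2f}")  # expected blanks: 4.22
if round(expected) == len(observed_blanks):
    # The observed blanks account for all the expected ones, so the
    # alphabet probably runs from 00 to the largest observed number.
    print("estimated alphabet length:", largest_number + 1)  # 39
```

As the article notes further on, the true alphabet was slightly longer (a 26-letter key plus 14 residue letters), which is the "a bit higher" case.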

From Table I, for an alphabet length of around 39, the expected length of the residue would be 12 and the key length around 27. Since low-frequency letters tend to accumulate in the residue, it would be reasonable (and mostly correct) to assume that the key starts at 07 and ends around 35.

Using frequency matching on the residue, one could assume that 04, 05 and 06 are v, x and z; 03 is r, s or t; 02 is q; 01 is p; 00 is l or m; and that j and k may have zero frequency and sit at 39 and 40. A residue letter has only one number, so the doubled l of the crib (andalldr-) would appear as '00 00' if 00 were l. The crypt contains no '00 00', so 00 is more probably m.
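The '00 00' check, together with a search for any other doubled numbers, can be automated. This Python sketch scans the NK-3 ciphertext (copied from the crypt) for numbers that occur twice in a row, ignoring line breaks; the names are illustrative.

```python
# NK-3 ciphertext, copied from the crypt.
NK3 = """
26 08 09 23 21 28 03 09 14 12 36 09 30 36 33 02 11 09 30 03 19 20
20 16 14 25 38 01 33 03 07 23 32 08 33 32 33 13 13 30 00 34 19 28
36 20 15 07 08 30 34 33 12 16 07 08 28 34 14 03 36 10 21 14 32 08
36 03 33 38 38 22 28 36 37 09 33 07 27 09 03 31 07 08 09 33 25 24
00 33 19 18 21 14 26 27 26 27 09 24 03 37 11 03 35 19 24 25 38 14
25 38 35 19 20 10 28 32 20 32 27 30 00 33 15 36 33 19 19 36 03 14
01 01 24 15 38 29 28 26 35 03 20 13 18 33 25 36 11 15 35 20 00 37
20 03 26 33 34 19 09
"""

numbers = [int(tok) for tok in NK3.split()]

# Numbers that appear twice in a row anywhere in the crypt. A residue letter
# has a single substitute, so a doubled residue letter must show up here.
doubled = sorted({a for a, b in zip(numbers, numbers[1:]) if a == b})

print("doubled numbers:", [f"{n:02d}" for n in doubled])
# doubled numbers: ['01', '13', '19', '20', '38']
print("00 00 present:", 0 in doubled)  # 00 00 present: False
```

The other doubled pairs it lists are further candidates for doubled plaintext letters when placing the crib.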

00  4  m        08  6    16  2    24  4    32  5    (40)  0  k
01  3  p        09  9    17  0    25  5    33 13
02  1  q        10  2    18  2    26  5    34  4
03 11  r/s/t    11  3    19  8    27  4    35  4
04  0  v        12  2    20  8    28  6    36  9
05  0  x        13  3    21  3    29  1    37  3
06  0  z        14  7    22  1    30  5    38  6
07  5           15  4    23  2    31  1    (39)  0  j

Spotting the crib would clarify things further. (The actual key length was 26 and the residue 14; the low-frequency letter k was in the key.)

_________________________________________

References

[1] From a similar article in slightly different form: MATANZA (George Conlee), "Estimating Key Length of a Numbered Key Cipher", American Cryptogram Association (ACA), Cryptogram magazine (Cm), MJ 2011.

[2] BION, "The Numbered Key Cipher", ACA, Cm, MJ 2010. See also the similar article on this website.

[3] Solomon Kullback, Statistical Methods in Cryptanalysis, Aegean Park Press, Laguna Hills, CA, 1976, p. 27. Values in Table I were computed from English frequency data from Elsy rather than from the telegraphic text given in the reference.