Two-character Substitution Ciphers

Case 9. Two-character substitution ciphers. In ciphers of this type, two letters, numerals, or conventional signs are substituted for each letter of the text. There are many ways of obtaining the characters to be substituted but, in general, these ciphers may be considered as special varieties of Case 6 or Case 7. The ciphers which come under this case are not well suited to telegraphic correspondence because the cipher message will contain twice as many characters as the plain text. However, they are so used; an example is at hand in which two numerals are substituted for each letter, and this makes transmission by telegraph very slow.

Case 9 can be recognized by some or all of the following points: the number of characters in the cipher is always even; often only a few, say five to ten, of the letters of the alphabet appear; and either a frequency table for pairs of the cipher text can be made which resembles the normal single-letter frequency table, or groups of four letters will show a regular recurrence, from which the cipher can be solved as in Case 7.

Case 9-a

Message

RNTGN RAAGR NARNA GTGRA TGAAN NANGG RARAT NAANR NNNRN AAAGG AANGR NGGNN NRNAA AANRA TNANN NGGRN RNNRG TTGRG TGGRN ARNTG NNART GGRNR GRNNT GTGAA NNARN ARNRT TGAGG GAAAA NANNA RNAGA NGNAT NNNAT

This message contains 160 letters and it will be noted that the only letters used are A, G, N, R and T.
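The two observations just made (an even character count and a very small set of letters) can be checked mechanically. The following is a minimal sketch, not part of the original method; the function name `case9_indicators` and the threshold of ten distinct symbols are illustrative assumptions taken loosely from the "five to ten" remark above.

```python
MESSAGE = (
    "RNTGN RAAGR NARNA GTGRA TGAAN NANGG RARAT NAANR "
    "NNNRN AAAGG AANGR NGGNN NRNAA AANRA TNANN NGGRN "
    "RNNRG TTGRG TGGRN ARNTG NNART GGRNR GRNNT GTGAA "
    "NNARN ARNRT TGAGG GAAAA NANNA RNAGA NGNAT NNNAT"
)


def case9_indicators(cipher):
    """Return (length, distinct symbols, whether the text looks like Case 9)."""
    text = "".join(cipher.split())          # drop the spacing between groups
    symbols = sorted(set(text))
    # Assumed rule of thumb: even length and at most ten distinct symbols.
    likely = len(text) % 2 == 0 and len(symbols) <= 10
    return len(text), symbols, likely


length, symbols, likely = case9_indicators(MESSAGE)
print(length, "".join(symbols), likely)     # 160 AGNRT True
```

These indicators only suggest Case 9; the pair frequency table is still needed to confirm it.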

We may at once suspect a simple two-letter substitution cipher. It will simplify the work if we divide the cipher into groups of two letters and then, if we find that there are 26 or fewer different groups, assign an arbitrary letter to each group and work out the cipher by the method of Case 6.

RN TG NR AA GR NA RN AG TG RA TG AA NN AN GG RA RA TN AA NR NN NR NA AA GG AA NG RN GG NN NR NA AA AN RA TN AN NN GG RN RN NR GT TG RG TG GR NA RN TG NN AR TG GR NR GR NN TG TG AA NN AR NA RN RT TG AG GG AA AA NA NN AR NA GA NG NA TN NN AT

With arbitrary letters substituted, we have

A B C D E F A G B H B D I J K H H L D C I C F D K D M A K I C F D J H L J I K A A C N B O B E F A B I P B E C E I B B D I P F A Q B G K D D F I P F R M F L I S
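The regrouping and relabelling described above can be sketched as follows. This is an illustration under one assumption: arbitrary letters are handed out in order of first appearance, which is what the assignment RN=A, TG=B, NR=C suggests. The name `relabel_pairs` is hypothetical.

```python
import string

MESSAGE = (
    "RNTGN RAAGR NARNA GTGRA TGAAN NANGG RARAT NAANR "
    "NNNRN AAAGG AANGR NGGNN NRNAA AANRA TNANN NGGRN "
    "RNNRG TTGRG TGGRN ARNTG NNART GGRNR GRNNT GTGAA "
    "NNARN ARNRT TGAGG GAAAA NANNA RNAGA NGNAT NNNAT"
)


def relabel_pairs(cipher):
    """Split the cipher into two-letter groups and label each distinct group."""
    text = "".join(cipher.split())
    pairs = [text[i:i + 2] for i in range(0, len(text), 2)]
    labels = {}
    for pair in pairs:
        if pair not in labels:
            # Raises IndexError if more than 26 distinct groups occur,
            # in which case the method of Case 6 does not apply directly.
            labels[pair] = string.ascii_uppercase[len(labels)]
    return pairs, labels


pairs, labels = relabel_pairs(MESSAGE)
relabelled = "".join(labels[p] for p in pairs)
print(len(pairs), "groups,", len(labels), "distinct")   # 80 groups, 19 distinct
print(" ".join(relabelled))
```

With only 19 distinct groups the text is well within the 26-letter limit, so it can be treated as a simple single-alphabet substitution.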

Now, preparing a frequency table, with note of prefixes and suffixes we have:

Letter  Frequency       Prefix      Suffix
A        7  1111111     FMKAFF      BGKACBQ
B       10  1111111111  AGHNOAPIBQ  CHDOEIEBDG
C        6  111111      BDIIAE      DIFFNE
D        9  111111111   CBLFKFBKD   EICKMJIDF
E        4  1111        DBBC        FFCI
F        8  11111111    ECCEPDPM    ADDAAIRL
G        2  11          AB          BK
H        4  1111        BKHJ        BHLL
I        9  111111111   DCKJBEDFL   JCCKPBPPS
J        3  111         IDL         KHI
K        5  11111       JDAIG       HDIAD
L        3  111         HHF         DJI
M        2  11          DR          AF
N        1  1           C           B
O        1  1           B           B
P        3  111         III         BFF
Q        1  1           A           B
R        1  1           F           M
S        1  1           I
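A table of this kind tabulates, for each arbitrary letter, its count together with the letters that precede and follow it. The sketch below computes those three quantities from the relabelled text written without spaces; the function name `contact_table` and the dictionary layout are assumptions made for illustration.

```python
from collections import defaultdict

# The relabelled message, in ten-letter chunks for easier checking.
RELABELLED = (
    "ABCDEFAGBH" "BDIJKHHLDC" "ICFDKDMAKI" "CFDJHLJIKA"
    "ACNBOBEFAB" "IPBECEIBBD" "IPFAQBGKDD" "FIPFRMFLIS"
)


def contact_table(text):
    """Count each letter and collect the letters found before and after it."""
    table = defaultdict(lambda: {"count": 0, "prefix": "", "suffix": ""})
    for i, letter in enumerate(text):
        row = table[letter]
        row["count"] += 1
        if i > 0:                       # the first letter has no prefix
            row["prefix"] += text[i - 1]
        if i + 1 < len(text):           # the last letter has no suffix
            row["suffix"] += text[i + 1]
    return dict(table)


table = contact_table(RELABELLED)
for letter in sorted(table):
    row = table[letter]
    print(f"{letter} {row['count']:2d}  {row['prefix']:<10}  {row['suffix']}")
```

Rows such as P, whose prefix is I on all three occurrences, are what point to digraphs like TH.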

A brief study of this table and the distribution in the cipher leads to the conclusion that B, F and C are certainly vowels and are, if the normal frequency holds, equal to E, O, and A or I. Similarly D and I are consonants and we may take them as N and T. I is taken as T because of the combination IP (possibly TH) occurring three times. The next letter in order of frequency is A; it is certainly a consonant and may be taken as R on the basis of its frequency. Let us now try these assumptions on the first twenty-seven letters of the message, taking C as A and writing the alternative value I beneath it. We have

REAN_OR_E_ENT_____NATAON_N_
  I                I I

This is clearly the word REINFORCEMENTS and, using the letters thus found, the rest of the line becomes AMMUNITION AND. We have then the following letters determined:

Arbitrary letters  A B C D E F G H I J K L M
Plain text         R E I N F O C M T S A U D

If these be substituted we have for the message:

REINFORCEMENTS AMMUNITION AND RATIONS MUST ARRI_E _EFORE T_E FIFTEENT_ OR _E CANNOT _O_D OUT_.

From this the remainder of the letters are determined:

Arbitrary letters  N O P Q R S
Plain text         V B H W L X

Now let us substitute the two-letter groups for the arbitrary letters:

Arbitrary letters  K  O  G  M  B  E  P  C  R  H  D  F  A  J  I  L  N  Q  S
Two-letter groups  GG RG AG NG TG GR AR NR GA RA AA NA RN AN NN TN GT RT AT
Plain text         A  B  C  D  E  F  H  I  L  M  N  O  R  S  T  U  V  W  X

It is evident that the cipher was prepared with the letters of the word GRANT chosen by means of a square of this kind:

   G R A N T
G  A B C D E
R  F G H I K
A  L M N O P
N  Q R S T U
T  V W X Y Z

In each pair the first letter is taken from the top of the square and the second from the side. Thus TG=E, AN=S, etc., as we have already found.
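As a check on the square, the whole message can be deciphered from it directly. The sketch below assumes the pair order implied by TG=E and AN=S (column letter first, then row letter) and a 25-letter alphabet with J omitted, as the square shows; `build_table` and `decipher` are illustrative names.

```python
KEY = "GRANT"
ALPHABET = "ABCDEFGHIKLMNOPQRSTUVWXYZ"   # 25 letters, J omitted as in the square

MESSAGE = (
    "RNTGN RAAGR NARNA GTGRA TGAAN NANGG RARAT NAANR "
    "NNNRN AAAGG AANGR NGGNN NRNAA AANRA TNANN NGGRN "
    "RNNRG TTGRG TGGRN ARNTG NNART GGRNR GRNNT GTGAA "
    "NNARN ARNRT TGAGG GAAAA NANNA RNAGA NGNAT NNNAT"
)


def build_table(key=KEY, alphabet=ALPHABET):
    """Map each two-letter group to its plain-text letter."""
    table = {}
    for r, row_letter in enumerate(key):
        for c, col_letter in enumerate(key):
            # Column letter first, then row letter (so TG = E and AN = S).
            table[col_letter + row_letter] = alphabet[r * len(key) + c]
    return table


def decipher(cipher, table):
    text = "".join(cipher.split())
    return "".join(table[text[i:i + 2]] for i in range(0, len(text), 2))


table = build_table()
plain = decipher(MESSAGE, table)
print(table["TG"], table["AN"])   # E S
print(plain)
```

The output, read with word breaks supplied, agrees with the solution reached above by frequency analysis.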

Case 9-b

Message

1950492958312325281544184528152048115041
2252115345584913412450285525265933195222
5245113215621558414328613612652945565015
2342455850634554201915501853112115415828
1124174553455420595025524541321533492048
5018152364

An examination of the two-numeral groups which make up this message shows that they run from 11 to 36 and from 41 to 65, with eleven groups missing. Now the 11 to 36 combination is a very familiar one in numeral substitution ciphers (See Case 6-c) and it will be noted that 41 to 66 would give us a similar alphabet. Let us make a frequency table in this form:

Group  Frequency    Group  Frequency
11     11111        41     11111
12     1            42     1
13     1            43     1
14                  44     1
15     111111111    45     111111111
16                  46
17     1            47
18     111          48     11
19     111          49     111
20     1111         50     11111111
21     1            51
22     11           52     1111
23     111          53     111
24     11           54     11
25     111          55     1
26     1            56     1
27                  57
28     11111        58     11111
29     11           59     11
30                  60
31     1            61     1
32     11           62     1
33     11           63     1
34                  64     1
35                  65     1
36     1            66
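A tally of this kind is simple to produce by machine. The sketch below counts the two-numeral groups and lists which groups in the ranges 11 to 36 and 41 to 65 never occur; the variable names are illustrative only.

```python
from collections import Counter

MESSAGE_9B = (
    "1950492958312325281544184528152048115041"
    "2252115345584913412450285525265933195222"
    "5245113215621558414328613612652945565015"
    "2342455850634554201915501853112115415828"
    "1124174553455420595025524541321533492048"
    "5018152364"
)

# Split into two-numeral groups and count each one.
pairs = [MESSAGE_9B[i:i + 2] for i in range(0, len(MESSAGE_9B), 2)]
counts = Counter(pairs)

# Print the two alphabets side by side, using tally marks as in the table.
for low in range(11, 37):
    high = low + 30
    print(f"{low}  {'1' * counts[str(low)]:<10} {high}  {'1' * counts[str(high)]}")

# Groups within 11-36 and 41-65 that do not appear in the message.
expected = [str(n) for n in list(range(11, 37)) + list(range(41, 66))]
missing = [g for g in expected if g not in counts]
print(len(missing), "groups missing:", " ".join(missing))
```

The last line reports eleven missing groups, in agreement with the count given above.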

Each of these tables looks like the normal frequency table except for the position of 20 and 50. By their frequency these groups should represent T, yet T, as the twentieth letter of an alphabet beginning at 11 or 41, should apparently fall at 30 and 60. But suppose we put the alphabet and corresponding numerals in this form:

         1 2 3 4 5 6 7 8 9 0
1 or 4   A B C D E F G H I J
2 or 5   K L M N O P Q R S T
3 or 6   U V W X Y Z

Then A=11 or 41, J=10 or 40 and T=20 or 50 as we found. Using the above alphabet, the message may easily be read. Note that this cipher is made up of ten characters only, the Arabic numerals.
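Reading the message with this alphabet can be sketched as below. The table is built exactly as laid out above (first digit selects the row, second digit the column, with 0 as the tenth column); the names `ROWS`, `build_table` and `decipher` are illustrative assumptions.

```python
MESSAGE_9B = (
    "1950492958312325281544184528152048115041"
    "2252115345584913412450285525265933195222"
    "5245113215621558414328613612652945565015"
    "2342455850634554201915501853112115415828"
    "1124174553455420595025524541321533492048"
    "5018152364"
)

COLUMNS = "1234567890"                     # second digit, 0 being the tenth column
ROWS = {
    ("1", "4"): "ABCDEFGHIJ",              # first digit 1 or 4
    ("2", "5"): "KLMNOPQRST",              # first digit 2 or 5
    ("3", "6"): "UVWXYZ",                  # first digit 3 or 6
}


def build_table():
    table = {}
    for first_digits, letters in ROWS.items():
        for first in first_digits:
            # zip stops at the shorter sequence, so the U-Z row fills 1 to 6 only.
            for second, letter in zip(COLUMNS, letters):
                table[first + second] = letter
    return table


def decipher(digits, table):
    return "".join(table[digits[i:i + 2]] for i in range(0, len(digits), 2))


table = build_table()
plain = decipher(MESSAGE_9B, table)
print(table["11"], table["40"], table["50"])   # A J T
print(plain)
```

The plain text runs together without word breaks; it begins IT IS RUMORED HERE and ends with the letter X.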

Case 9-c

Message

11562546762542294432194929401514232172112979703115
49242135117424147875764625244451432548453179742533
40554615127573227945162748151170423519441378252149
25147645531548342126721525407516112578454642217415
49521979297015242143292544493319701875314079254829
45514914117321171554

An examination of this message shows it to consist of forty-four different two-figure groups running from 11 to 79. Let us prepare a frequency table of these groups.

Group  Frequency
11     111111
12     1
13     1
14     1111
15     111111111
16     11
17     1
18     1
19     1111
20
21     1111111
22     1
23     1
24     1111
25     11111111111
26     1
27     1
28
29     111111
30
31     111
32     1
33     11
34     1
35     11
36
37
38
39
40     1111
41
42     111
43     11
44     1111
45     11111
46     1111
47
48     1111
49     111111
50
51     11
52     1
53     1
54     1
55     1
56     1
57
58
59
70     1111
71
72     11
73     11
74     111
75     1111
76     111
77
78     111
79     11111

We at once note the resemblance between the frequency tables for the groups 11 to 19 and 21 to 29; for the groups 30 to 36 and 50 to 56; and for the groups 40 to 49 and 70 to 79. Also the groups 11 to 19 and 21 to 29 have a frequency fitting well with the normal frequency table of the letters A to I; the groups 41 to 49 and 71 to 79 have a frequency fitting well with the normal frequency table of the letters K to S; and the groups 31 to 36 and 51 to 56 have a frequency fitting well with the normal frequency table of the letters U to Z. We have J and T unaccounted for, but note what occurred in Case 9-b and that 40 and 70 would correspond well with T if they followed respectively 49 and 79. We may now make up a cipher table as follows:

         1 2 3 4 5 6 7 8 9 0
1 or 2   A B C D E F G H I J
4 or 7   K L M N O P Q R S T
3 or 5   U V W X Y Z

and this table will solve the cipher message.
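The same table can be applied arithmetically rather than by lookup. In the sketch below the first digit selects one of three rows of ten letters and the second digit gives the position in the row, 0 counting as the tenth; `ROW_OF_FIRST_DIGIT` and `decode_group` are hypothetical names introduced for illustration.

```python
MESSAGE_9C = (
    "11562546762542294432194929401514232172112979703115"
    "49242135117424147875764625244451432548453179742533"
    "40554615127573227945162748151170423519441378252149"
    "25147645531548342126721525407516112578454642217415"
    "49521979297015242143292544493319701875314079254829"
    "45514914117321171554"
)

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# First digit -> row of the table: 1 or 2 = A-J, 4 or 7 = K-T, 3 or 5 = U-Z.
ROW_OF_FIRST_DIGIT = {"1": 0, "2": 0, "4": 1, "7": 1, "3": 2, "5": 2}


def decode_group(group):
    row = ROW_OF_FIRST_DIGIT[group[0]]
    column = (int(group[1]) - 1) % 10      # second digit 0 is the tenth column
    # Groups such as 37 or 30 fall past Z and raise IndexError;
    # none of them occurs in this message.
    return ALPHABET[row * 10 + column]


groups = [MESSAGE_9C[i:i + 2] for i in range(0, len(MESSAGE_9C), 2)]
plain = "".join(decode_group(g) for g in groups)
print(len(set(groups)), "different groups")   # 44 different groups
print(plain)
```

The plain text again runs together without word breaks; it begins A ZEPPELIN VISITED CALAIS.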

In ciphers coming under Cases 9-b and 9-c, it is not uncommon to assign some of the unused numbers, such as 85, 93, etc., to whole words in common use or to names of persons or places. When such groups are found, their meaning must be guessed from the context; but if many messages in the same cipher are available, the meaning of these groups will soon be obtained. The appearance of such odd groups of figures in a message does not interfere materially with the analysis, and it will be apparent at once on deciphering the message that they represent whole words instead of letters.

Chapter IX

Other Substitution Methods

The foregoing cases by no means exhaust the possibilities of the substitution cipher but they cover practically all methods which are satisfactory for military purposes, having in mind conservation of time, the minimizing of mental strain, and the requirements that complicated apparatus and rules be avoided, and that the resulting cipher should be adapted to telegraphic correspondence.

A message may be re-enciphered two or more times using a different key word each time or it may be enciphered by one method and re-enciphered by another method, using the same or a different key word. Complicated cipher systems requiring the memorizing of, or reference to, numerous rules have been devised for special purposes. Such systems usually fail utterly if there are any errors in transmission and it will be seen later that such errors are very common.

There are several ingenious cipher machines by which complicated ciphers can be formed, but if the apparatus is available and fairly long messages are at hand for examination, it is usually possible to solve them. Such machines are not, as a rule, simple and small enough for field use; and it must always be remembered that a machine cipher operates on certain mechanical cycles, which can be determined if the machine is available.

A book by Commandant Bazeries, entitled “Etude sur la Cryptographie Militaire,” and a series of articles by A. Collon, entitled “Etude sur la Cryptographie,” which appeared in the Revue de L’Armée Belge, 1899–1902, give illustrations and details of operation of several of these cipher machines and the latter goes into the methods of deciphering messages enciphered with them. These methods of analysis require long messages, and as each one is adapted only to the product of a certain machine or apparatus, it is not considered advisable to include a discussion of them here. Those interested in such advanced cipher work must refer to these and other European authors on the subject.

The requirement that cipher messages should be adapted to telegraphic transmission practically excludes ciphers in which three or more letters or whole words are substituted for each letter of the plain text. Such ciphers might be used for the transmission of very short messages but in no other case.

The cipher of Case 7, with a key word or phrase longer than one-fourth of the message, the cipher after the method of Case 7, using a certain page of a book as a key, and the cipher with a running key, where each letter of the cipher is the key for enciphering the next letter, all look safe and desirable, theoretically, but, practically, the work of enciphering and deciphering is hopelessly slow, and errors in enciphering or transmission make deciphering very difficult. Incidentally the first and second of these ciphers can be solved by the special solution for Case 7, and the third can be solved by trying each of the twenty-six letters of the alphabet as the first key letter, and then continuing the work for five or six letters of the cipher. When the proper primary key letter is found, the solution of the next five or six letters of the cipher will make sense, and thereafter the cipher offers no difficulty.

There are numerous other methods of preparing what is virtually a very long, or even an indefinitely long key from a short key word, but all such cipher methods have the same practical disadvantages of slowness of operation and difficulty in deciphering, if errors of enciphering or transmission have been made.

The ciphers of Napoleon were long series of numbers representing letters, syllables and words. They were really codes; and a code based on these principles, but using letters instead of numerals, might be evolved very easily. The War Department Code, the Western Union Code, and, in fact, all codes are nothing but specialized substitution ciphers in which each code word represents a letter, word or phrase of the plain text.