Two-character Substitution Ciphers
Case 9. Two-character substitution ciphers. In ciphers of this type, two letters, numerals, or conventional signs are substituted for each letter of the text. There are many ways of obtaining the characters to be substituted but, in general, these ciphers may be considered as special varieties of Case 6 or Case 7. The ciphers which come under this case are not well suited to telegraphic correspondence because the cipher message will contain twice as many characters as the plain text. However, they are so used; an example is at hand in which two numerals are substituted for each letter, and this makes transmission by telegraph very slow.
Case 9 can be recognized by some or all of the following points: the number of characters in the cipher is always even; often only a few, say five to ten, of the letters of the alphabet appear; and either a frequency table for pairs of cipher characters can be made which resembles the normal single-letter frequency table, or groups of four letters will show a regular recurrence, from which the cipher can be solved as in Case 7.
Case 9-a
Message
RNTGN RAAGR NARNA GTGRA TGAAN NANGG RARAT NAANR NNNRN AAAGG AANGR NGGNN NRNAA AANRA TNANN NGGRN RNNRG TTGRG TGGRN ARNTG NNART GGRNR GRNNT GTGAA NNARN ARNRT TGAGG GAAAA NANNA RNAGA NGNAT NNNAT
This message contains 160 letters and it will be noted that the only letters used are A, G, N, R and T.
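As a quick check of these two observations, the following Python sketch tallies the letters of the message. It is an illustration only and not part of the original method; the variable names are assumptions chosen for this example.

```python
from collections import Counter

# Cipher message of Case 9-a, copied from the text above.
CIPHER = (
    "RNTGN RAAGR NARNA GTGRA TGAAN NANGG RARAT NAANR "
    "NNNRN AAAGG AANGR NGGNN NRNAA AANRA TNANN NGGRN "
    "RNNRG TTGRG TGGRN ARNTG NNART GGRNR GRNNT GTGAA "
    "NNARN ARNRT TGAGG GAAAA NANNA RNAGA NGNAT NNNAT"
)

letters = CIPHER.replace(" ", "")
letter_counts = Counter(letters)

print(len(letters))            # total number of cipher letters
print(len(letters) % 2 == 0)   # an even total is needed for pairing
print(sorted(letter_counts))   # the distinct letters in use
```

An even total and a very small alphabet are exactly the two recognition points given for Case 9.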
We may suspect a simple two-letter substitution cipher at once. It will simplify the work if we divide the cipher into groups of two letters and then, if we find that there are 26 or fewer distinct groups, assign an arbitrary letter to each group and work out the cipher by the method of Case 6.
RN TG NR AA GR NA RN AG TG RA TG AA NN AN GG RA RA TN AA NR NN NR NA AA GG AA NG RN GG NN NR NA AA AN RA TN AN NN GG RN RN NR GT TG RG TG GR NA RN TG NN AR TG GR NR GR NN TG TG AA NN AR NA RN RT TG AG GG AA AA NA NN AR NA GA NG NA TN NN AT
With arbitrary letters substituted, we have
A B C D E F A G B H B D I J K H H L D C I C F D K D M A K I C F D J H L J I K A A C N B O B E F A B I P B E C E I B B D I P F A Q B G K D D F I P F R M F L I S
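The division into pairs and the assignment of arbitrary letters can be sketched in Python as follows. The function names are illustrative, and the rule of lettering each new group in order of first appearance is inferred from the worked example, not stated as a rule in the text.

```python
import string

CIPHER = (
    "RNTGN RAAGR NARNA GTGRA TGAAN NANGG RARAT NAANR "
    "NNNRN AAAGG AANGR NGGNN NRNAA AANRA TNANN NGGRN "
    "RNNRG TTGRG TGGRN ARNTG NNART GGRNR GRNNT GTGAA "
    "NNARN ARNRT TGAGG GAAAA NANNA RNAGA NGNAT NNNAT"
)


def to_pairs(text):
    """Drop the spacing and cut the cipher into two-letter groups."""
    letters = text.replace(" ", "")
    return [letters[i:i + 2] for i in range(0, len(letters), 2)]


def assign_arbitrary_letters(pairs):
    """Give each distinct pair the next unused letter, in order of first
    appearance. Assumes at most 26 distinct pairs, as the text requires."""
    mapping = {}
    for pair in pairs:
        if pair not in mapping:
            mapping[pair] = string.ascii_uppercase[len(mapping)]
    return mapping


pairs = to_pairs(CIPHER)
pair_to_letter = assign_arbitrary_letters(pairs)
arbitrary = "".join(pair_to_letter[p] for p in pairs)

print(len(pair_to_letter))   # number of distinct two-letter groups
print(" ".join(arbitrary))   # the message in arbitrary letters
```

Since fewer than 26 distinct groups appear, the result can be treated as an ordinary single-alphabet substitution and attacked by the method of Case 6.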
Now, preparing a frequency table, with note of prefixes and suffixes we have:
| Letter | Frequency | Tally | Prefix | Suffix |
| A | 7 | 1111111 | FMKAFF | BGKACBQ |
| B | 10 | 1111111111 | AGHNOAPIBQ | CHDOEIEBDG |
| C | 6 | 111111 | BDIIAE | DIFFNE |
| D | 9 | 111111111 | CBLFKFBKD | EICKMJIDF |
| E | 4 | 1111 | DBBC | FFCI |
| F | 8 | 11111111 | ECCEPDPM | ADDAAIRL |
| G | 2 | 11 | AB | BK |
| H | 4 | 1111 | BKJH | BHLL |
| I | 9 | 111111111 | DCKJBEDFL | JCCKPBPPS |
| J | 3 | 111 | IDL | KHI |
| K | 5 | 11111 | JDAIG | HDIAD |
| L | 3 | 111 | HHF | DJI |
| M | 2 | 11 | DR | AF |
| N | 1 | 1 | C | B |
| O | 1 | 1 | B | B |
| P | 3 | 111 | III | BFF |
| Q | 1 | 1 | A | B |
| R | 1 | 1 | F | M |
| S | 1 | 1 | I | |
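A table of this kind can be built mechanically. The sketch below is a minimal Python illustration, with names of my own choosing; it rebuilds the arbitrary-letter sequence from the cipher message and then records, for each letter, its count, the letters preceding it, and the letters following it.

```python
from collections import defaultdict
import string

CIPHER = (
    "RNTGN RAAGR NARNA GTGRA TGAAN NANGG RARAT NAANR "
    "NNNRN AAAGG AANGR NGGNN NRNAA AANRA TNANN NGGRN "
    "RNNRG TTGRG TGGRN ARNTG NNART GGRNR GRNNT GTGAA "
    "NNARN ARNRT TGAGG GAAAA NANNA RNAGA NGNAT NNNAT"
)

# Rebuild the arbitrary-letter sequence (letters assigned by first appearance).
letters = CIPHER.replace(" ", "")
pairs = [letters[i:i + 2] for i in range(0, len(letters), 2)]
names = {}
for pair in pairs:
    if pair not in names:
        names[pair] = string.ascii_uppercase[len(names)]
seq = "".join(names[p] for p in pairs)


def contact_table(sequence):
    """Count each letter and collect its prefixes and suffixes in order."""
    table = defaultdict(lambda: {"count": 0, "prefix": "", "suffix": ""})
    for i, ch in enumerate(sequence):
        row = table[ch]
        row["count"] += 1
        if i > 0:
            row["prefix"] += sequence[i - 1]
        if i + 1 < len(sequence):
            row["suffix"] += sequence[i + 1]
    return table


table = contact_table(seq)
for letter in sorted(table):
    row = table[letter]
    print(f"{letter} {row['count']:>2} {row['prefix']:<10} {row['suffix']}")
```

The first letter of the message has no prefix and the last has no suffix, which is why some rows carry one entry fewer than their count.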
A brief study of this table and the distribution in the cipher leads to the conclusion that B, F and C are certainly vowels and are, if the normal frequency holds, equal to E, O, and A or I. Similarly D and I are consonants and we may take them as N and T. I is taken as T because of the combination IP (=possibly TH) occurring three times. The next letter in order of frequency is A; it is certainly a consonant and may be taken as R on the basis of its frequency. Let us now try these assumptions on the first two lines of the message. We have
| R | E | A | N | _ | O | R | _ | E | _ | E | N | T | _ | _ | _ | _ | _ | N | A | T | A | O | N | _ | N | _ |
|   |   | I |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   | I |   | I |   |   |   |   |   |
The first fourteen letters are clearly the word REINFORCEMENTS, with C read as I rather than A, and, using the letters thus found, the rest of the line becomes AMMUNITIONAND. We then have the following letters determined:
| Arbitrary letters | A | B | C | D | E | F | G | H | I | J | K | L | M |
| Plain Text | R | E | I | N | F | O | C | M | T | S | A | U | D |
If these be substituted we have for the message:
REINFORCEMENTS AMMUNITION AND RATIONS MUST ARRI_E _EFORE T_E FIFTEENT_ OR _E CANNOT _O_D OUT_.
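The substitution step can be illustrated with a short Python sketch. The arbitrary-letter string below is written out from the two-letter groups of the message, and the names `KNOWN` and `REMAINING` are assumptions made for this example only. Letters not yet determined are shown as underscores, as in the text.

```python
# Message in arbitrary letters, one letter per two-letter cipher group.
ARBITRARY = (
    "ABCDEFAGBH" "BDIJKHHLDC" "ICFDKDMAKI" "CFDJHLJIKA"
    "ACNBOBEFAB" "IPBECEIBBD" "IPFAQBGKDD" "FIPFRMFLIS"
)

# Values determined so far: A=R, B=E, C=I, ... M=D.
KNOWN = dict(zip("ABCDEFGHIJKLM", "REINFOCMTSAUD"))

partial = "".join(KNOWN.get(ch, "_") for ch in ARBITRARY)
print(partial)

# Values read off from the gaps in the partial text: N=V, O=B, P=H, Q=W, R=L, S=X.
REMAINING = dict(zip("NOPQRS", "VBHWLX"))
full = "".join({**KNOWN, **REMAINING}[ch] for ch in ARBITRARY)
print(full)
```

The final X is not part of the sense of the message; it falls where the message ends.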
From this the remainder of the letters are determined:
| Arbitrary letters | N | O | P | Q | R | S |
| Plain text | V | B | H | W | L | X |
Now let us substitute the two-letter groups for the arbitrary letters:
| Arbitrary letters | K | O | G | M | B | E | P | C | R | H | D | F | A | J | I | L | N | Q | S |
| Two-letter groups | GG | RG | AG | NG | TG | GR | AR | NR | GA | RA | AA | NA | RN | AN | NN | TN | GT | RT | AT |
| Plain text | A | B | C | D | E | F | H | I | L | M | N | O | R | S | T | U | V | W | X |
It is evident that the cipher was prepared with the letters of the word GRANT chosen by means of a square of this kind:
|   | G | R | A | N | T |
| G | A | B | C | D | E |
| R | F | G | H | I | K |
| A | L | M | N | O | P |
| N | Q | R | S | T | U |
| T | V | W | X | Y | Z |
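Thus TG=E, AN=S, etc., as we have already found; the first letter of each pair names the column and the second names the row. The square holds twenty-five letters, J being omitted.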
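Once the square is recognized, the whole message can be read directly. The following Python sketch builds the square from the key word GRANT and deciphers the message; the column-then-row reading of each pair is inferred from TG=E and AN=S, and the function name is illustrative.

```python
KEY = "GRANT"
ALPHABET = "ABCDEFGHIKLMNOPQRSTUVWXYZ"   # 25 letters; J does not appear in the square

CIPHER = (
    "RNTGN RAAGR NARNA GTGRA TGAAN NANGG RARAT NAANR "
    "NNNRN AAAGG AANGR NGGNN NRNAA AANRA TNANN NGGRN "
    "RNNRG TTGRG TGGRN ARNTG NNART GGRNR GRNNT GTGAA "
    "NNARN ARNRT TGAGG GAAAA NANNA RNAGA NGNAT NNNAT"
)


def build_pair_table(key=KEY, alphabet=ALPHABET):
    """Map each two-letter group (column letter, then row letter) to the
    plain-text letter found at that position in the square."""
    table = {}
    for r, row_letter in enumerate(key):
        for c, col_letter in enumerate(key):
            table[col_letter + row_letter] = alphabet[r * len(key) + c]
    return table


table = build_pair_table()
letters = CIPHER.replace(" ", "")
pairs = [letters[i:i + 2] for i in range(0, len(letters), 2)]
plain = "".join(table[p] for p in pairs)
print(plain)
```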
Case 9-b
Message
| 1950492958 | 3123252815 | 4418452815 | 2048115041 |
| 2252115345 | 5849134124 | 5028552526 | 5933195222 |
| 5245113215 | 6215584143 | 2861361265 | 2945565015 |
| 2342455850 | 6345542019 | 1550185311 | 2115415828 |
| 1124174553 | 4554205950 | 2552454132 | 1533492048 |
| 5018152364 |
An examination of the groups of two numerals each which make up this message, shows that we have 11 to 36 and 41 to 65 with eleven groups missing. Now the 11 to 36 combination is a very familiar one in numeral substitution ciphers (See Case 6-c) and it will be noted that 41 to 66 would give us a similar alphabet. Let us make a frequency table in this form:
| Group | Frequency | Group | Frequency |
| 11 | 11111 | 41 | 11111 |
| 12 | 1 | 42 | 1 |
| 13 | 1 | 43 | 1 |
| 14 | | 44 | 1 |
| 15 | 111111111 | 45 | 111111111 |
| 16 | | 46 | |
| 17 | 1 | 47 | |
| 18 | 111 | 48 | 11 |
| 19 | 111 | 49 | 111 |
| 20 | 1111 | 50 | 11111111 |
| 21 | 1 | 51 | |
| 22 | 11 | 52 | 1111 |
| 23 | 111 | 53 | 111 |
| 24 | 11 | 54 | 11 |
| 25 | 111 | 55 | 1 |
| 26 | 1 | 56 | 1 |
| 27 | | 57 | |
| 28 | 11111 | 58 | 11111 |
| 29 | 11 | 59 | 11 |
| 30 | | 60 | |
| 31 | 1 | 61 | 1 |
| 32 | 11 | 62 | 1 |
| 33 | 11 | 63 | 1 |
| 34 | | 64 | 1 |
| 35 | | 65 | 1 |
| 36 | 1 | 66 | |
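The two-column tally above can be reproduced with a short Python sketch. It is offered only as an illustration; the variable names are assumptions, and the pairing of each group with the group thirty higher simply follows the layout of the table.

```python
from collections import Counter

# Cipher message of Case 9-b, copied from the text above.
MESSAGE = """
1950492958 3123252815 4418452815 2048115041
2252115345 5849134124 5028552526 5933195222
5245113215 6215584143 2861361265 2945565015
2342455850 6345542019 1550185311 2115415828
1124174553 4554205950 2552454132 1533492048
5018152364
"""

digits = "".join(MESSAGE.split())
groups = [digits[i:i + 2] for i in range(0, len(digits), 2)]
counts = Counter(groups)

# Print 11 to 36 beside 41 to 66, one tally mark per occurrence.
for low in range(11, 37):
    high = low + 30
    left = "1" * counts[str(low)]
    right = "1" * counts[str(high)]
    print(f"{low} {left:<10} {high} {right}")
```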
Each of these tables looks like the normal frequency table except for the position of 20 and 50, which by their frequency should represent T but which, if the numbering ran straight on from A=11 or A=41, would be expected at 30 and 60. But suppose we put the alphabet and corresponding numerals in this form:
|   | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0 |
| 1 or 4 | A | B | C | D | E | F | G | H | I | J |
| 2 or 5 | K | L | M | N | O | P | Q | R | S | T |
| 3 or 6 | U | V | W | X | Y | Z |   |   |   |   |
Then A=11 or 41, J=10 or 40 and T=20 or 50 as we found. Using the above alphabet, the message may easily be read. Note that this cipher is made up of ten characters only, the Arabic numerals.
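To show the reading of the message with this alphabet, here is a minimal Python sketch. The names `build_table` and `ROW_DIGITS` are illustrative assumptions; the table itself is the one given above, with each row reachable by either of its two tens digits.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
COLUMN_DIGITS = "1234567890"                      # units digit; 0 comes last
ROW_DIGITS = [("1", "4"), ("2", "5"), ("3", "6")]  # alternative tens digits per row

MESSAGE = """
1950492958 3123252815 4418452815 2048115041
2252115345 5849134124 5028552526 5933195222
5245113215 6215584143 2861361265 2945565015
2342455850 6345542019 1550185311 2115415828
1124174553 4554205950 2552454132 1533492048
5018152364
"""


def build_table(row_digits, alphabet=ALPHABET):
    """Map every two-figure group to its letter; each row of ten letters
    can be reached through any of the tens digits listed for it."""
    table = {}
    for r, tens_choices in enumerate(row_digits):
        row_letters = alphabet[r * 10:(r + 1) * 10]
        for units, letter in zip(COLUMN_DIGITS, row_letters):
            for tens in tens_choices:
                table[tens + units] = letter
    return table


table = build_table(ROW_DIGITS)
digits = "".join(MESSAGE.split())
groups = [digits[i:i + 2] for i in range(0, len(digits), 2)]
plain = "".join(table[g] for g in groups)
print(plain)
```

Having two numerals for every letter lets the encipherer vary the group used, which flattens the tally somewhat but, as the table shows, does not hide the underlying alphabet.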
Case 9-c
Message
| 1156254676 | 2542294432 | 1949294015 | 1423217211 | 2979703115 |
| 4924213511 | 7424147875 | 7646252444 | 5143254845 | 3179742533 |
| 4055461512 | 7573227945 | 1627481511 | 7042351944 | 1378252149 |
| 2514764553 | 1548342126 | 7215254075 | 1611257845 | 4642217415 |
| 4952197929 | 7015242143 | 2925444933 | 1970187531 | 4079254829 |
| 4551491411 | 7321171554 |
An examination of this message shows it to consist of forty-four different two-figure groups running from 11 to 79. Let us prepare a frequency table of these groups.
| Group | Frequency |
| 11 | 111111 |
| 12 | 1 |
| 13 | 1 |
| 14 | 1111 |
| 15 | 111111111 |
| 16 | 11 |
| 17 | 1 |
| 18 | 1 |
| 19 | 1111 |
| 20 | |
| 21 | 1111111 |
| 22 | 1 |
| 23 | 1 |
| 24 | 1111 |
| 25 | 11111111111 |
| 26 | 1 |
| 27 | 1 |
| 28 | |
| 29 | 111111 |
| 30 | |
| 31 | 111 |
| 32 | 1 |
| 33 | 11 |
| 34 | 1 |
| 35 | 11 |
| 36 | |
| 37 | |
| 38 | |
| 39 | |
| 40 | 1111 |
| 41 | |
| 42 | 111 |
| 43 | 11 |
| 44 | 1111 |
| 45 | 11111 |
| 46 | 1111 |
| 47 | |
| 48 | 1111 |
| 49 | 111111 |
| 50 | |
| 51 | 11 |
| 52 | 1 |
| 53 | 1 |
| 54 | 1 |
| 55 | 1 |
| 56 | 1 |
| 57 | |
| 58 | |
| 59 | |
| 70 | 1111 |
| 71 | |
| 72 | 11 |
| 73 | 11 |
| 74 | 111 |
| 75 | 1111 |
| 76 | 111 |
| 77 | |
| 78 | 111 |
| 79 | 11111 |
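One way to compare the parts of this table is to line up the counts by tens digit. The Python sketch below is only an illustration, with names chosen for the example; it prints, for each tens digit that occurs, the counts for units digits 1 through 9 and then 0.

```python
from collections import Counter

# Cipher message of Case 9-c, copied from the text above.
MESSAGE = """
1156254676 2542294432 1949294015 1423217211 2979703115
4924213511 7424147875 7646252444 5143254845 3179742533
4055461512 7573227945 1627481511 7042351944 1378252149
2514764553 1548342126 7215254075 1611257845 4642217415
4952197929 7015242143 2925444933 1970187531 4079254829
4551491411 7321171554
"""

UNITS = "1234567890"   # units digits in table order, 0 last

digits = "".join(MESSAGE.split())
groups = [digits[i:i + 2] for i in range(0, len(digits), 2)]
counts = Counter(groups)


def profile(tens):
    """Counts of the ten groups sharing one tens digit, units 1..9 then 0."""
    return [counts[tens + units] for units in UNITS]


# Rows are printed in the pairs whose shapes are to be compared.
for tens in "124735":
    print(tens, profile(tens))
```

Printed this way, rows 1 and 2, rows 4 and 7, and rows 3 and 5 can be compared column by column.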
We at once note the resemblance between the frequency tables for the groups 11 to 19 and 21 to 29; for the groups 31 to 36 and 51 to 56; and for the groups 40 to 49 and 70 to 79. Also, the groups 11 to 19 and 21 to 29 have a frequency fitting well with the normal frequency table of the letters A to I; the groups 41 to 49 and 71 to 79 have a frequency fitting well with the normal frequency table of the letters K to S; and the groups 31 to 36 and 51 to 56 have a frequency fitting well with the normal frequency table of the letters U to Z. We have J and T unaccounted for, but note what occurred in Case 9-b and that 40 and 70 would correspond well with T if they followed 49 and 79 respectively. We may now make up a cipher table as follows:
|   | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0 |
| 1 or 2 | A | B | C | D | E | F | G | H | I | J |
| 4 or 7 | K | L | M | N | O | P | Q | R | S | T |
| 3 or 5 | U | V | W | X | Y | Z |   |   |   |   |
and this table will solve the cipher message.
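As a check on the table, the following Python sketch deciphers the message with it. The dictionary layout and the names `ROWS` and `TABLE` are assumptions made for the illustration; the letter assignments are those of the cipher table above.

```python
# Each key lists the alternative tens digits for a row of the cipher table.
ROWS = {"12": "ABCDEFGHIJ", "47": "KLMNOPQRST", "35": "UVWXYZ"}
UNITS = "1234567890"   # units digits in table order, 0 last

MESSAGE = """
1156254676 2542294432 1949294015 1423217211 2979703115
4924213511 7424147875 7646252444 5143254845 3179742533
4055461512 7573227945 1627481511 7042351944 1378252149
2514764553 1548342126 7215254075 1611257845 4642217415
4952197929 7015242143 2925444933 1970187531 4079254829
4551491411 7321171554
"""

# Two-figure group -> plain-text letter.
TABLE = {
    tens + units: letter
    for tens_digits, row in ROWS.items()
    for units, letter in zip(UNITS, row)
    for tens in tens_digits
}

digits = "".join(MESSAGE.split())
groups = [digits[i:i + 2] for i in range(0, len(digits), 2)]
plain = "".join(TABLE[g] for g in groups)
print(plain)
```

The letter X appears where the sense of the text breaks, apparently marking the end of each sentence.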
In ciphers coming under Cases 9-b and 9-c, it is not uncommon to assign some of the unused numbers, such as 85, 93, etc., to whole words in common use or to names of persons or places. In case such groups are found, the meaning must be guessed at from the context; but if many messages in the same cipher are available, the meaning of these groups will soon be obtained. The appearance of such odd groups of figures in a message does not interfere materially with the analysis, and it will be apparent at once on deciphering the message that they represent whole words instead of letters.
Chapter IX
Other Substitution Methods
The foregoing cases by no means exhaust the possibilities of the substitution cipher but they cover practically all methods which are satisfactory for military purposes, having in mind conservation of time, the minimizing of mental strain, and the requirements that complicated apparatus and rules be avoided, and that the resulting cipher should be adapted to telegraphic correspondence.
A message may be re-enciphered two or more times using a different key word each time or it may be enciphered by one method and re-enciphered by another method, using the same or a different key word. Complicated cipher systems requiring the memorizing of, or reference to, numerous rules have been devised for special purposes. Such systems usually fail utterly if there are any errors in transmission and it will be seen later that such errors are very common.
There are several ingenious cipher machines by which complicated ciphers can be formed, but if the apparatus is available and fairly long messages are at hand for examination, it is usually possible to solve them. Such machines are not, as a rule, simple and small enough for field use; and it must always be remembered that a machine cipher operates on certain mechanical cycles, which can be determined if the machine is available.
A book by Commandant Bazeries, entitled “Etude sur la Cryptographie Militaire,” and a series of articles by A. Collon, entitled “Etude sur la Cryptographie,” which appeared in the Revue de L’Armée Belge, 1899–1902, give illustrations and details of operation of several of these cipher machines and the latter goes into the methods of deciphering messages enciphered with them. These methods of analysis require long messages, and as each one is adapted only to the product of a certain machine or apparatus, it is not considered advisable to include a discussion of them here. Those interested in such advanced cipher work must refer to these and other European authors on the subject.
The requirement that cipher messages should be adapted to telegraphic transmission practically excludes ciphers in which three or more letters or whole words are substituted for each letter of the plain text. Such ciphers might be used for the transmission of very short messages but in no other case.
The cipher of Case 7, with a key word or phrase longer than one-fourth of the message, the cipher after the method of Case 7, using a certain page of a book as a key, and the cipher with a running key, where each letter of the cipher is the key for enciphering the next letter, all look safe and desirable, theoretically, but, practically, the work of enciphering and deciphering is hopelessly slow, and errors in enciphering or transmission make deciphering very difficult. Incidentally the first and second of these ciphers can be solved by the special solution for Case 7, and the third can be solved by trying each of the twenty-six letters of the alphabet as the first key letter, and then continuing the work for five or six letters of the cipher. When the proper primary key letter is found, the solution of the next five or six letters of the cipher will make sense, and thereafter the cipher offers no difficulty.
There are numerous other methods of preparing what is virtually a very long, or even an indefinitely long key from a short key word, but all such cipher methods have the same practical disadvantages of slowness of operation and difficulty in deciphering, if errors of enciphering or transmission have been made.
The ciphers of Napoleon were long series of numbers representing letters, syllables and words. They were really codes; and a code based on these principles, but using letters instead of numerals, might be evolved very easily. The War Department Code, the Western Union Code, and, in fact, all codes are nothing but specialized substitution ciphers in which each code word represents a letter, word or phrase of the plain text.