Elementary cryptanalysis - Helen Fouché Gaines

colors, so as to avoid any risk of taking off portions of plaintext along with a cryptogram.

For decipherment, the plan is the same, except that the cryptogram is written first, and the two alphabets of the key exchange their functions. Often, when the cipher alphabet in use is so incoherent that its letters are not quickly found, the decipherer will prepare for himself a special decipherment key, in which he places the letters of his cipher alphabet in straight alphabetical order, and allows the plaintext alphabet to grow mixed.

In taking up the decryptment of simple substitution, we may dispose summarily of the Caesar alphabets by pointing to Fig. 61. If we suspect that one of these has been used, we may verify the suspicion by taking some ten-or-fifteen-letter segment of the cryptogram, and, with each of its letters as a beginning, extend the ten or fifteen alphabets, a few letters at a time, until we come to the line which is purely plaintext. This process is popularly known as “running down the alphabet,” and whenever it results in a row of plaintext, we may quickly determine the amount of “shift,” set up the cipher alphabet, and start deciphering.
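The procedure of running down the alphabet can be sketched in a few lines of Python. This is only an illustration: the function name run_down is hypothetical, and the sample segment is the Caesar fragment Y B P R O B Q L quoted in the next paragraph.

```python
import string

ALPHABET = string.ascii_uppercase


def run_down(segment: str) -> list[str]:
    """Return 26 rows: the segment itself, then every letter advanced
    one more step through the normal alphabet on each following row."""
    letters = [c for c in segment.upper() if c in ALPHABET]
    rows = []
    for step in range(26):
        row = "".join(ALPHABET[(ALPHABET.index(c) + step) % 26] for c in letters)
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Print every row with its step number; the reader scans for plaintext.
    for step, row in enumerate(run_down("YBPROBQL")):
        print(f"{step:2d}  {row}")
```

For this segment the row at step 3 reads BESURETO, and the step number at which plaintext appears gives the amount of shift directly.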

The same thing is true of a pair of inverse normal alphabets which have merely been shifted with reference to each other. But in this case, the cryptogram (or that segment of it which is being investigated) must first be enciphered in the same kind of alphabet, that is, with a pair of inverse normal alphabets. To explain this, suppose that our cryptogram fragment is B Y K I L Y J O. If we encipher this with the pair of inverse alphabets which was shown at (b) of Fig. 59, we obtain a new cryptogram fragment Y B P R O B Q L. This new fragment is now a “Caesar,” and we may “run it down the alphabet” until we find its plaintext, which here appears three steps down as B E S U R E T O. This particular fragment was done with a pair of inverse normal alphabets in which the lower one began at C, instead of at Z. Most decryptors, in dealing with any kind of substitution, will make these two tests before trying anything else. When the guess proves correct, a great deal of paper work can be saved.
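The two-step test on the fragment B Y K I L Y J O may be sketched as follows. The function names are hypothetical, and the code assumes that alphabet (b) of Fig. 59 is the normal alphabet set over a reversed one beginning at Z, which is what the fragments quoted above imply.

```python
import string

ALPHABET = string.ascii_uppercase


def inverse_encipher(text: str, start: str = "Z") -> str:
    """Encipher with a normal alphabet set over a reversed normal alphabet.

    `start` is the letter of the reversed (lower) alphabet that stands
    under A of the upper alphabet.
    """
    base = ALPHABET.index(start)
    return "".join(
        ALPHABET[(base - ALPHABET.index(c)) % 26]
        for c in text.upper()
        if c in ALPHABET
    )


def run_down(segment: str) -> list[str]:
    """Advance every letter of an uppercase segment 0 to 25 steps."""
    return [
        "".join(ALPHABET[(ALPHABET.index(c) + step) % 26] for c in segment)
        for step in range(26)
    ]


fragment = "BYKILYJO"
as_caesar = inverse_encipher(fragment)           # lower alphabet beginning at Z
print(as_caesar)                                 # YBPROBQL
print(run_down(as_caesar)[3])                    # BESURETO
print(inverse_encipher("BESURETO", start="C"))   # BYKILYJO
```

The last line checks the statement that the fragment was produced with the lower alphabet beginning at C: enciphering the recovered plaintext that way gives back the original fragment.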

Concerning decryptment in the case of the less simple alphabets, the true vulnerability of simple substitution can be seen when the word “battalion,” enciphered in alphabet (f) of our Fig. 59, becomes T S B B S M Z E P. Since each letter of the alphabet may have only one substitute, the pattern of -atta- shows up clearly in its enciphered version -SBBS-. The decryptor knows instantly what kind of pattern it represents, since the letters S and B can have only one original each. The frequency with which these two letters have been used in his cryptogram will tell him approximately what their two originals ought to be, and, by making a few trials, he loses little time in arriving at a solution. As a matter of fact, a simple monoliteral substitution, given fewer than a hundred letters of text and no information whatever as to source or subject-matter, can be decrypted purely through the frequencies and other characteristics of its letters; and if, in addition, the original word-divisions have been preserved, we have the lengths and patterns of these words, plus the knowledge that individual letters have their favorite positions in words.
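The persistence of a word's pattern under simple substitution can be shown with a small sketch. The helper name pattern is hypothetical; it numbers each distinct letter in order of first appearance, so that any word and its encipherment in a single alphabet yield the same sequence.

```python
def pattern(word: str) -> tuple[int, ...]:
    """Number each distinct letter in order of first appearance."""
    first_seen: dict[str, int] = {}
    numbers = []
    for letter in word.upper():
        if letter not in first_seen:
            first_seen[letter] = len(first_seen)
        numbers.append(first_seen[letter])
    return tuple(numbers)


print(pattern("battalion"))   # (0, 1, 2, 2, 1, 3, 4, 5, 6)
print(pattern("TSBBSMZEP"))   # (0, 1, 2, 2, 1, 3, 4, 5, 6)
print(pattern("atta") == pattern("SBBS"))   # True
```

A decryptor working by machine might compare the pattern of a cipher word against a word list in this way; the frequency considerations described above would then choose among the words that agree.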

The “Crypt” with word-divisions. — Not infrequently, the cryptogram which retains its word-divisions can be read at sight, without putting pencil to paper, and this regardless of how short it may be. Again, even though based on normal text, it will prove more troublesome; and thus, in dealing with this type of simple substitution, we attack each individual example according to what appears at first glance to be its greatest weakness. The cryptogram shown in Fig. 62, for instance, would be attacked through its many short words, probably the simplest of the available methods. The words in question are those numbered 3 (RD), 4 (MD), 9 (QYR), 11 (RKV), 13 (DF), and 15 (DN). Among the two-letter words, it is noticeable that every one of these includes a letter D, used indiscriminately as the initial or final letter. We do not need to know much of cryptanalysis to guess that this letter represents the o found in such words as to, no, do, go, of, on, or. A comparison of the two three-letter words shows that these, also, have a common letter, R, which ends one of these words and begins the other. Of all words in English, the commonest is the. If RKV be assumed as the, then RD, already thought to contain o, will check as to, another extremely common word.
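The inspection of short words can be sketched as below. The letters are those of the cryptogram of Fig. 62; the word divisions are an assumption, reconstructed from the word numbers quoted in the text, and the function names are hypothetical.

```python
# Letters of Fig. 62; word divisions reconstructed from the numbered
# words cited in the text (an assumption, not part of the source).
CRYPTOGRAM = (
    "FDRJNU HVXXU RD MD SKVSO PJRK ZDYFZJX GSRRVT QYR WDARWDFV "
    "RKV DRKVT DF SZZDYFR DN NVOVTSX SAWVZR"
)


def short_words(text: str, max_len: int = 3) -> list[tuple[int, str]]:
    """Return (word number, word) for every word of max_len letters or fewer."""
    return [
        (number, word)
        for number, word in enumerate(text.split(), start=1)
        if len(word) <= max_len
    ]


def common_letters(words: list[str]) -> set[str]:
    """Letters that occur in every one of the given words."""
    if not words:
        return set()
    return set.intersection(*(set(w) for w in words))


shorts = short_words(CRYPTOGRAM)
print(shorts)
two = [w for _, w in shorts if len(w) == 2]
three = [w for _, w in shorts if len(w) == 3]
print(common_letters(two))     # {'D'}
print(common_letters(three))   # {'R'}
```

The letter shared by all the two-letter words is the candidate for o, and the letter shared by the two three-letter words is the candidate for t.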

Thus we are able to begin work by tentatively assuming that the four cryptogram letters R, K, V, and D, are the substitutes, respectively, for plaintext letters t, h, e, and o. These assumptions are tested by actually making the necessary substitutions directly on the cryptogram, as seen at (a) of Fig. 62. And we may be sure that they are correct when we see the 12th word clearly outlined as other. This word gives a new substitution: cipher letter T evidently represents r, occurring in three different words; the actual making of this substitution will cause the 8th word to show a very common ending: -tter.
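The making of substitutions directly on the cryptogram may be sketched as follows. The cipher letters are those of Fig. 62, with word divisions reconstructed from the word numbers in the text (an assumption); the function name substitute and the use of a dot for an unidentified letter are illustrative choices.

```python
# Letters of Fig. 62; word divisions reconstructed from the numbered
# words cited in the text (an assumption, not part of the source).
CRYPTOGRAM = (
    "FDRJNU HVXXU RD MD SKVSO PJRK ZDYFZJX GSRRVT QYR WDARWDFV "
    "RKV DRKVT DF SZZDYFR DN NVOVTSX SAWVZR"
)


def substitute(cryptogram: str, key: dict[str, str], unknown: str = ".") -> str:
    """Replace each cipher letter by its assumed original, or by a dot."""
    return "".join(
        key.get(c, unknown) if c.isalpha() else c for c in cryptogram
    )


key = {"R": "t", "K": "h", "V": "e", "D": "o"}
first_pass = substitute(CRYPTOGRAM, key).split()
print(first_pass[11])    # 12th word: othe.

key["T"] = "r"           # suggested by the 12th word
second_pass = substitute(CRYPTOGRAM, key).split()
print(second_pass[7])    # 8th word: ..tter
print(second_pass[11])   # 12th word: other
```

Each new assumption is added to the key and the whole cryptogram is substituted afresh, so that a wrong guess shows itself at once as an impossible word.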

If we now consider the other three-letter word, the 9th of the cryptogram, we see that QYR cannot represent any one of the common words not, got, out, yet, since the substitutes for o and e have already been determined. It may, however, represent the common word but, especially if we care to investigate the frequency in the cryptogram of its first letter, Q. This letter has been used only once; and its assumed original, b, is normally of very low frequency, and, in addition, is known to have a fondness for initial positions. The assumption of this word as but gives us the substitute for u, which appears to be Y.

Figure 62
Making Substitutions

(a)  1       2      3   4   5      6     7
     FDRJNU  HVXXU  RD  MD  SKVSO  PJRK  ZDYFZJX
     .ot...  .e...  to  .o  .he..  ..th  .o.....

     8       9    10        11   12     13
     GSRRVT  QYR  WDARWDFV  RKV  DRKVT  DF
     ..tte.  ..t  .o.t.o.e  the  othe.  o.

     14       15  16       17
     SZZDYFR  DN  NVOVTSX  SAWVZR.
     ...o..t  o.  .e.e...  ...e.t

(b)  ...DF  SZZDYFR  DN...      ...ZDYFZJX...
        on  account  of            counc..
        13  14       15            7

In addition to the points mentioned, it is not unusual to find that short words, by their very positions with reference to some longer word, will identify a whole sequence, as might happen with the sequence shown at (b) of the same figure. Good examples of this are: as well as, as soon as, in order to, and so on. In this particular case of (b), we began with only the identified o, and immediately were able to identify t; this alone should serve for spotting the whole sequence on account of, taking into consideration the doubled c. Notice what the identification of the word account will do toward identifying the 7th word.
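The spotting of a whole sequence can be sketched by testing a candidate phrase against consecutive cipher words and keeping the enlarged key when it fits. The names extend_key and substitute are hypothetical, and the starting key holds only the o and t mentioned for case (b).

```python
from typing import Optional


def extend_key(cipher_words: list[str], plain_words: list[str],
               key: dict[str, str]) -> Optional[dict[str, str]]:
    """Return the key enlarged by the phrase, or None if the phrase
    contradicts the word lengths, the key, or the one-substitute rule."""
    if [len(w) for w in cipher_words] != [len(w) for w in plain_words]:
        return None
    trial = dict(key)
    for c, p in zip("".join(cipher_words), "".join(plain_words)):
        if c in trial:
            if trial[c] != p:
                return None
        elif p in trial.values():
            return None
        else:
            trial[c] = p
    return trial


def substitute(word: str, key: dict[str, str]) -> str:
    """Show a cipher word with known letters filled in and dots elsewhere."""
    return "".join(key.get(c, ".") for c in word)


key = {"D": "o", "R": "t"}
sequence = ["DF", "SZZDYFR", "DN"]          # words 13, 14, 15
phrases = ["as well as", "as soon as", "in order to", "on account of"]

for phrase in phrases:
    result = extend_key(sequence, phrase.split(), key)
    print(phrase, "->", result)

new_key = extend_key(sequence, "on account of".split(), key)
print(substitute("ZDYFZJX", new_key))       # 7th word: counc..
```

Only on account of survives the test, and the five letters it adds to the key are enough to expose all but the ending of the 7th word.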