Among methods not illustrated by the given example, the examination of terminal sequences offers a very fertile field for research. When two or more of the affixes -tion, -ing, in-, and con- are present in the same text, as they practically always are, they will serve to identify one another, and may, in addition, be cross-compared with many of the short words, such as in, on, no, not, into, upon, can. The prefix sub- may serve to identify the word but. There is a whole group, -ment, -ence, -ance, -ency, -ancy; another group, pre-, re-, -er, de-, -ed, etc.; and a good comparison may be made among be-, -able, -ible, etc.

Still a third road to solution, especially popular with those who solve the “aristocrats,” is found in pattern words, that is, words having one or more letters repeated. The puzzler, working through a dictionary, prepares lists for his permanent use, one list for each “pattern.” One such list would contain PATTERN, FALLING, and all other seven-letter words in which the third and fourth letters are the same and all others different. Another would contain all words having the pattern of STATE, DEFER, ROBOT (second and fourth letters alike). Still another would hold all words of the pattern BANANA, ROCOCO, and so on. The solver, thus armed in advance, begins work by searching his cryptogram for words having repeated cipher letters, and attempts to identify these from the proper lists. He may also provide himself with non-pattern lists, in which the words are classified by length and contain no repeated letters. He may likewise keep “transposal lists” containing pairs of words (such as NIGHT and THING) which use the same letters in a different order. Such lists are admittedly troublesome to prepare, but they are extremely effective; they will break the most resistant of the “aristocrats,” or the shortest example of legitimate cipher.
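The pattern and transposal lists described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's procedure. The helper names `pattern_of`, `transposal_key`, `build_lists` and `candidates` are hypothetical, and the short word list stands in for a real dictionary. A pattern is written by giving each new letter the next label A, B, C, and so on, in order of first appearance. Under that scheme PATTERN and FALLING both become ABCCDEF. Words without repeated letters come out as ABCD and so on, which groups them by length and yields the non-pattern lists at no extra cost.

```python
from collections import defaultdict


def pattern_of(word: str) -> str:
    """Return a word's repetition pattern: each new letter gets the next of A, B, C, ...

    PATTERN and FALLING -> ABCCDEF; STATE, DEFER, ROBOT -> ABCBD.
    Words with no repeated letters give ABC..., so non-pattern lists come free.
    """
    labels: dict[str, str] = {}
    out = []
    for ch in word.upper():
        if ch not in labels:
            labels[ch] = chr(ord("A") + len(labels))
        out.append(labels[ch])
    return "".join(out)


def transposal_key(word: str) -> str:
    """Words with the same sorted letters (NIGHT, THING) are transposals."""
    return "".join(sorted(word.upper()))


def build_lists(words):
    """Prepare pattern lists and transposal lists from a word list."""
    patterns = defaultdict(list)
    transposals = defaultdict(list)
    for w in words:
        patterns[pattern_of(w)].append(w.upper())
        transposals[transposal_key(w)].append(w.upper())
    # A transposal list is only useful when at least two words share the letters.
    return dict(patterns), {k: v for k, v in transposals.items() if len(v) > 1}


def candidates(cipher_word: str, patterns) -> list[str]:
    """Look up plaintext words whose pattern matches a cipher word."""
    return patterns.get(pattern_of(cipher_word), [])


if __name__ == "__main__":
    words = ["PATTERN", "FALLING", "STATE", "DEFER", "ROBOT",
             "BANANA", "ROCOCO", "NIGHT", "THING"]
    patterns, transposals = build_lists(words)
    print(candidates("XQZQK", patterns))  # ['STATE', 'DEFER', 'ROBOT']
    print(transposals)                    # {'GHINT': ['NIGHT', 'THING']}
```

The cipher word XQZQK is invented for the example. Its repeated Q marks it as a candidate for the STATE, DEFER, ROBOT list.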

No matter how resistant the cryptogram, all that is really needed is an entry, the identification of one word, or of three or four letters. The experienced solver knows well that persistence will find this entry, and trusts largely to instinct and perseverance; the beginner, however, may feel at a loss for a “system,” and, if so, may, perhaps, be able to find suggestions for one in the next few paragraphs.

Figure 63
A Favorite Form of Frequency Count Combined With Contact Data

Letter  Left contacts         Right contacts        Freq./Variety
A       D S                   R W                   2/4
B
C
D       F R M Z W W V T Z R   R M S Y A F R F Y N   10/11
E
F       * Y D D Y             D Z V S R             5/6
G       X                     S                     1/2
(Etc.)

(* marks a missing contact, here the start of the cryptogram.)

Concerning the numbers: A has a frequency of 2 and a variety-count of 4. D has a frequency of 10 and a variety-count of only 11. (Yet D, with so little variety of contact, is a vowel!)

First of all, in any substitution problem, the letters of the cryptogram should be counted in order to find their frequencies. This is called a frequency count, and is usually made as follows. The decryptor first lays out the normal alphabet, either horizontally or vertically. He then goes through the cryptogram letter by letter, beginning with the first. For each occurrence of a letter, he places a tally mark beside that same letter in his prepared alphabet. The result of such a count, taken on the foregoing cryptogram, will be shown further on, when the same cryptogram appears again without its word-divisions.
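The tally procedure can be sketched in a few lines of Python. This is a minimal sketch, assuming the cryptogram is given as a string and that anything other than the letters A to Z (spaces, punctuation, digits) is ignored. The function names and the sample text are hypothetical.

```python
import string


def frequency_count(cryptogram: str) -> dict[str, int]:
    """Tally every letter of the cryptogram against a prepared A-Z alphabet."""
    # Lay out the normal alphabet in advance, each letter starting at zero.
    tallies = {letter: 0 for letter in string.ascii_uppercase}
    # Take the letters one by one, just as they are found.
    for ch in cryptogram.upper():
        if ch in tallies:  # skip spaces, punctuation, digits
            tallies[ch] += 1
    return tallies


def print_tallies(tallies: dict[str, int]) -> None:
    """Show the count vertically, with tally marks beside each letter."""
    for letter, n in tallies.items():
        print(f"{letter}  {'|' * n:<12} {n}")


if __name__ == "__main__":
    print_tallies(frequency_count("FDRD MX QZQ"))  # hypothetical cipher text
```

Keeping all 26 letters in the result, including those with a zero count, mirrors the prepared alphabet. It also makes the unused letters as visible as the frequent ones.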

If the problem seems likely to prove really difficult, there should also be a contact count. This is a list showing every letter together with the two letters which flanked it, left and right, each time it was used. Such a count is partly shown in Fig. 63. Like the frequency count, it may be prepared either vertically or horizontally. As with the frequency count, an alphabet may be laid out in advance to receive the contact letters as they are taken from the cryptogram. Specifically: The letter F comes first in the cryptogram; it has no left-hand contact, but is contacted on the right by D. We find the F of the prepared alphabet, and place beside it its contacts: *-D. The second letter of the cryptogram is D, flanked by F and R. We find the D of the prepared alphabet, and place beside it its contacts: F-R; and so on to the end of the cryptogram. Some solvers do not prepare an alphabet in advance, but simply write down each new letter as they come across it in the cryptogram. It should be added, too, that the few contacts included in Fig. 63 were taken from the undivided cryptogram. When word-divisions exist, and are known to be the correct ones, a great many solvers omit any contact which bridges two different words. Here, for instance, the second appearance of D is shown with contacts R-M. These solvers, knowing that this D stands at the end of a word, will leave the M-contact blank: R-*.
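The contact count can be sketched the same way. In this hypothetical helper, each letter receives a list of (left, right) contact pairs in the order they are met, with `*` standing for a missing contact as in the text. The optional `word_divisions` flag imitates the solvers who leave blank any contact that bridges two words. It assumes the word-divisions are marked by spaces and are correct. The sample text FDRD MX is invented so as to reproduce the first contacts described above: F first, D flanked by F and R, and a second D with contacts R-M standing at the end of a word.

```python
import string

LETTERS = set(string.ascii_uppercase)


def contact_count(cryptogram: str,
                  word_divisions: bool = False) -> dict[str, list[tuple[str, str]]]:
    """Return letter -> list of (left, right) contacts; '*' marks no contact.

    With word_divisions=False the text is treated as one unbroken run, as in
    Fig. 63. With word_divisions=True, contacts are not carried across spaces.
    """
    contacts: dict[str, list[tuple[str, str]]] = {c: [] for c in string.ascii_uppercase}
    text = cryptogram.upper()
    runs = text.split() if word_divisions else [text]
    for run in runs:
        letters = [ch for ch in run if ch in LETTERS]
        for i, ch in enumerate(letters):
            left = letters[i - 1] if i > 0 else "*"
            right = letters[i + 1] if i + 1 < len(letters) else "*"
            contacts[ch].append((left, right))
    return contacts


if __name__ == "__main__":
    undivided = contact_count("FDRD MX")
    divided = contact_count("FDRD MX", word_divisions=True)
    print(undivided["F"])  # [('*', 'D')]
    print(undivided["D"])  # [('F', 'R'), ('R', 'M')]
    print(divided["D"])    # [('F', 'R'), ('R', '*')]
```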

It will be noticed from the figure that the contact count is, in itself, a frequency count. It shows that A has been used twice (frequency 2), that B and C have not been used at all, that D has a frequency of 10, and so on. We may also make it a variety-count by noting beside each letter the number of different letters present among its contacts. Ordinarily, the vowels have more variety in their contacts than do the consonants, and take part in more reversals. A reversal is a pair of letters found in both orders, such as ER and RE. The uses of contact data will be examined more closely later on.
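Given contact lists, the frequency, the variety-count and the reversals can be read off mechanically. The sketch below types in the partial contacts of Fig. 63 as left and right contacts for each letter. That reading is an interpretation of the figure's layout, but it reproduces the figure's 2/4, 10/11, 5/6 and 1/2. Two further points are assumptions rather than the text's own definitions:

- A reversal is treated here as a letter pair found in both orders.
- The function names are hypothetical.

Because the figure is only partial, the reversal counts describe this fragment and nothing more.

```python
# Partial contact data from Fig. 63: letter -> (left contacts, right contacts).
# '*' marks a missing contact (the start of the cryptogram).
FIG_63 = {
    "A": ("DS", "RW"),
    "D": ("FRMZWWVTZR", "RMSYAFRFYN"),
    "F": ("*YDDY", "DZVSR"),
    "G": ("X", "S"),
}


def frequency_and_variety(contacts):
    """Return letter -> (frequency, variety-count).

    Frequency is the number of times the letter was used (one left contact
    per use); variety is the number of different letters among all its
    contacts, with '*' not counted.
    """
    result = {}
    for letter, (left, right) in contacts.items():
        variety = {c for c in left + right if c != "*"}
        result[letter] = (len(left), len(variety))
    return result


def digraphs(contacts):
    """Every digraph implied by the contacts, e.g. left contact D of A gives DA."""
    found = set()
    for letter, (left, right) in contacts.items():
        found.update(c + letter for c in left if c != "*")
        found.update(letter + c for c in right if c != "*")
    return found


def reversals(contacts):
    """Count, for each letter X, the partners Y such that both XY and YX occur."""
    seen = digraphs(contacts)
    return {
        x: len({d[1] for d in seen if d[0] == x and d[1] != x and d[1] + x in seen})
        for x in contacts
    }


if __name__ == "__main__":
    print(frequency_and_variety(FIG_63))
    # {'A': (2, 4), 'D': (10, 11), 'F': (5, 6), 'G': (1, 2)}
    print(reversals(FIG_63))
    # {'A': 0, 'D': 3, 'F': 1, 'G': 0}
```

In this fragment D, the letter the figure identifies as a vowel, takes part in three reversals (DF/FD, DR/RD, DM/MD).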

Now, turning to English frequencies: No matter what frequency table we examine, we always find that the letter E tops the list, with a frequency of over 12%. Except in telegraphic text, the letter T always has the second frequency, near 10%. After that, the frequency tables will disagree as to whether A or O should have the third frequency, or whether I should come before N, or S before R. But always the same nine letters, E T A O N I R S H, will constitute the high-frequency group. These nine letters make up about 70% of any English text. It is almost impossible to write a text, no matter how short, without using them in about that proportion, though in shorter texts L and D will sometimes creep up into the high-frequency class, taking the place of H. Following the high-frequency group, we find a group of letters which are always of moderate frequency, and a third group made up of low-frequency letters. The frequency tables themselves do not agree throughout. We could not expect, therefore, even with a 10,000-letter cryptogram, to make substitutions simply by following the frequency table and be absolutely sure of arriving at the correct solution, though we might very nearly do so. With a cryptogram of 2,000 letters we might still succeed to some extent. The “aristocrats,” however, are arbitrarily confined to lengths of between 75 and 100 letters. Even without deliberate manipulation, a text of this length will not always show E as a frequent letter, and may, for some reason, show Z or X with a fairly high frequency.
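The claim that E T A O N I R S H make up about 70% of English text is easy to check on any sample. What follows is a minimal Python sketch with hypothetical function names. On a text of only 75 to 100 letters, the share it reports may stray some way from 70%.

```python
HIGH_FREQUENCY = set("ETAONIRSH")


def letters_of(text: str) -> list[str]:
    """Keep only the letters A-Z, uppercased."""
    return [ch for ch in text.upper() if "A" <= ch <= "Z"]


def high_frequency_share(text: str) -> float:
    """Fraction of the letters that belong to the group E T A O N I R S H."""
    letters = letters_of(text)
    if not letters:
        return 0.0
    return sum(ch in HIGH_FREQUENCY for ch in letters) / len(letters)


def ranked(text: str) -> list[tuple[str, int]]:
    """Letters of the text ordered from most to least frequent (ties alphabetical)."""
    counts: dict[str, int] = {}
    for ch in letters_of(text):
        counts[ch] = counts.get(ch, 0) + 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    sample = "Now is the time for all good men to come to the aid of the party"
    print(f"{high_frequency_share(sample):.0%} of letters in ETAONIRSH")
    print(ranked(sample)[:9])
```

Comparing the nine top letters of `ranked` against the group E T A O N I R S H shows how far a short text has drifted from the usual class distinctions.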

However, the “class distinctions” among the letters are always, to some extent, dependable. High-frequency letters, moderate-frequency letters, and low-frequency letters, all tend to be very exclusive. They will exchange frequencies with letters of their own class, but all three classes are disinclined to welcome outsiders. The vowels, also, as we have seen, have their fraternity; if the frequency of E is lowered, some other vowel, even U and Y, will insist upon making up the difference, rather than yield this privilege to a consonant.

The high-frequency group, as mentioned, includes the nine letters E T A O N I R S H. Even in this exclusive circle, there are cliques — not ironclad, but clearly noticeable: