If the probable word is a pattern word, so much the better; but every word carries a pattern in the normal frequencies of its letters. For instance, the word CIPHER, considered in relation to a text of 100 letters, has, roughly, the frequency-pattern 3-7-2-5-12-6; or, considered in relation to a 200-letter text, a pattern which is approximately double the first: 6-14-4-10-24-12. A cryptogram supposed to contain this word may be prepared as recommended in [Chapter IX], with a frequency-figure written above each letter. The frequency-pattern of the word CIPHER, based on approximately the same amount of text, may then be written on a slip of paper and passed along below the frequency-figures shown for cryptogram-letters, in the hope of finding points at which the two sets of figures are, to some extent, alike. Wherever such points can be found, the suspected word can be assumed to be present there. So long as the method remains that of simple substitution, any substitutes which can be found in this way can have no other originals than those first determined; thus, their substitution throughout the cryptogram will serve to bring out other possibilities.
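The sliding of a frequency-pattern beneath the frequency-figures can be sketched in Python. This is only a minimal illustration under stated assumptions: the function names, the sum-of-absolute-differences score, and the use of `None` for a blank entry are introduced here and are not part of the method as described.

```python
from collections import Counter

# Frequency-pattern of CIPHER per 100 letters, as given in the text.
CIPHER_PER_100 = [3, 7, 2, 5, 12, 6]


def scale_pattern(pattern_per_100, text_length):
    """Scale a per-100-letter pattern to a text of the given length.

    A None entry (a blank) stays None.
    """
    return [None if p is None else p * text_length / 100
            for p in pattern_per_100]


def slide_pattern(cryptogram, pattern):
    """Slide the pattern along the cryptogram's frequency-figures.

    Returns (score, position) pairs, closest match first. The score is
    the sum of absolute differences, an assumed measure of likeness.
    """
    letters = [c for c in cryptogram.upper() if c.isalpha()]
    counts = Counter(letters)
    # The frequency-figure "written above" each cryptogram letter.
    figures = [counts[c] for c in letters]
    results = []
    for start in range(len(figures) - len(pattern) + 1):
        window = figures[start:start + len(pattern)]
        score = sum(abs(f - p) for f, p in zip(window, pattern)
                    if p is not None)
        results.append((score, start))
    return sorted(results)
```

A low score marks only a candidate position; the suspected word must still be tried there and its substitutes carried through the rest of the cryptogram.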
For the multiple-substitute cases, that is, those cases in which all or part of the letters may have more than one substitute, the frequencies of such letters as I, H, E, R, may be left blank (or cut in half, dependent upon just what the cipher is), and only the frequencies of C and P, standing two positions apart, need be considered. Particularly helpful, in this case, would be a probable word such as CRYPTOGRAM, in which five infrequent letters are standing at known distances apart. The frequency-pattern of this word, based on 100, can be expressed roughly as 3 - 2 2 - - 2 - - 2, and the attempt made to find points in the cryptogram at which five letters of somewhat these frequencies are standing at the given intervals apart.

The foregoing is based on the supposition that while the encipherer, having several substitutes per letter, will be able to conceal the true frequencies of his high-frequency letters, there is not much that he can do toward concealing his low frequencies. He can, of course, produce any frequencies that he likes by swamping his text with nulls; and this, in the hands of a clever operator, can be very effective, especially if the circumstances are such that he can keep his method a secret. But for the average practical purpose, the time consumed in the encipherment, and the increased length of the cryptograms, are highly undesirable features, especially if it be kept in mind that there are many other ciphers than simple substitution.

As to attack by analytical methods, the one device which is more likely than any other to prove applicable in all cases is the preparation of a digram count of exactly the kind we saw in Fig. 68. Such a chart will afford the means for studying carefully the contacts of any given letter; just what its variety seems to be; whether or not this seems disproportionate to its apparent frequency; whether or not it shows a tendency to touch letters of lower frequency, or to be present in reversals; and so on.
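A digram count and a summary of each letter's contacts might be prepared along the following lines. This is a rough sketch, not a reproduction of Fig. 68: the names `digram_chart` and `contact_summary`, and the choice of what to report (frequency, variety of contact, reversals), are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def digram_chart(text):
    """Count every pair of adjacent letters in the text."""
    letters = [c for c in text.upper() if c.isalpha()]
    return Counter(zip(letters, letters[1:]))


def contact_summary(text):
    """For each letter: its frequency, its variety of contact
    (number of different neighbours), and the letters with which
    it appears in reversals (XY and YX both present)."""
    letters = [c for c in text.upper() if c.isalpha()]
    chart = digram_chart(text)
    freq = Counter(letters)
    before = defaultdict(set)  # letters seen immediately before a letter
    after = defaultdict(set)   # letters seen immediately after a letter
    for first, second in chart:
        after[first].add(second)
        before[second].add(first)
    summary = {}
    for letter in freq:
        reversals = sorted(
            other for other in after[letter]
            if other != letter and (other, letter) in chart
        )
        summary[letter] = {
            "frequency": freq[letter],
            "variety": len(before[letter] | after[letter]),
            "reversals": reversals,
        }
    return summary
```

A letter whose variety is large in proportion to its frequency, or which touches chiefly letters of low frequency, can then be picked out from the summary for closer study.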
Many of these ciphers, however, make use of two letters to represent one. With these, it is the single-letter frequency count which is best made on a chart. That is, the cryptogram is first marked off into its pairs, and these pairs are counted in the same way as that described for digrams. But digrams, in this case, will be represented by four letters, and usually the number of different pairs is so large that the examination of digrams will have to be done by listing. For any cipher whatever in which the substitutes are two-digit numbers, a frequency count taken in chart form is usually far more convenient than one made by listing the numbers in advance. With only the ten digits, the 100 cells can be made larger than the 676 cells needed for letters, and the chart still be small and compact. The pairs of digits would be counted in exactly the same way as so many digrams. With numbers, it is sometimes possible to take the subsequent digram count, also, on a chart.

Solution, in many cases, involves pure guess-work. The decryptor, perhaps, has begun his examination by testing his cryptogram for some variation of the “Caesar” encipherment. He has counted the first hundred or so of his letters, and has discovered that his frequency count is not going to be that of an ordinary simple substitution; that is, it is evidently not going to be one which he would be able to mark off into sections of high, medium, and low frequencies (usually with several letters missing), which would certainly be the case had each plaintext letter been replaced always with a given substitute throughout the cryptogram. Perhaps he has then marked his cryptogram into pairs of numbers or letters, and finds that these, also, are not likely to furnish the kind of frequency count which betrays simple substitution or some other cipher with which he is familiar. At this point, he is likely to pause and consider the source of the cryptogram. Is this the work of an expert, or the work of an amateur? Is it worthwhile to make up the statistics? Or shall I try for some one of the novelties which I have met many times before?
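Returning to the chart count of two-digit substitutes mentioned earlier, the 100-cell chart might be taken as in the sketch below. The function names and the printed layout are assumptions for illustration; the row is the first digit of each pair and the column the second.

```python
def pair_chart(cryptogram):
    """Count two-digit substitutes on a 10 x 10 chart.

    chart[first][second] is the number of times the pair occurs.
    Non-digit characters (spaces, punctuation) are ignored.
    """
    digits = [int(c) for c in cryptogram if c in "0123456789"]
    if len(digits) % 2:
        raise ValueError("odd number of digits; cannot mark off into pairs")
    chart = [[0] * 10 for _ in range(10)]
    for first, second in zip(digits[0::2], digits[1::2]):
        chart[first][second] += 1
    return chart


def format_chart(chart):
    """Render the chart as text, with a dot for every empty cell."""
    lines = ["    " + "  ".join(str(d) for d in range(10))]
    for first, row in enumerate(chart):
        cells = "".join(f"{n:>3}" if n else "  ." for n in row)
        lines.append(f"{first}  {cells}")
    return "\n".join(lines)
```

For example, `print(format_chart(pair_chart("12 34 12 90")))` shows a 2 in row 1, column 2, and single entries for 34 and 90.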
One device which is particularly popular with amateurs is that of assigning to each letter the numerical value which represents its serial position in the normal (or reversed) alphabet, A having the value 1, B the value 2, and so on, and afterward representing each plaintext letter with two (or more) others which will express some arithmetical process. For instance, the letter C (value 3) might, in some one of these systems, have the substitute AB (1 plus 2), or the substitute DA (4 minus 1), or the substitute YD (25 plus 4 equals 29; and 29 minus 26 equals 3); and so on to infinity.
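The arithmetic of such a system can be illustrated with a short sketch. The names below are hypothetical, and the rule of wrapping around the 26-letter alphabet in both directions is an assumption extended from the YD example; an actual system might use any arithmetical process.

```python
def value(letter):
    """Serial position in the normal alphabet: A=1, B=2, ... Z=26."""
    return ord(letter.upper()) - ord("A") + 1


def letter_of(number):
    """Inverse of value(), wrapping around the alphabet (29 -> 3 -> C)."""
    return chr((number - 1) % 26 + ord("A"))


def decode_pair(pair, process="plus"):
    """Recover the plaintext letter from a two-letter substitute."""
    a, b = value(pair[0]), value(pair[1])
    return letter_of(a + b if process == "plus" else a - b)


def additive_substitutes(letter):
    """All 26 two-letter substitutes whose sum gives the letter."""
    target = value(letter)
    return [letter_of(a) + letter_of(target - a) for a in range(1, 27)]
```

Here `decode_pair("AB")`, `decode_pair("DA", "minus")`, and `decode_pair("YD")` all return `"C"`, matching the three examples given above.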
Other simple devices, hardly worth calling ciphers, which have been used in the columns of The Cryptogram under the title “Simple Substitution with Frills,” have included: (1) The use of false word divisions. (2) The simple reversal of an otherwise unmanipulated cryptogram. (3) The use of two given digrams, placed alternately at the ends of words. (4) The use of a new cipher alphabet for each new sentence. The first of these, of course, should have been suspected after examination of the apparent terminal letters. The second, theoretically, ought to be spotted if the method of solution includes a close investigation of digrams. As to the third device, any two digrams, used in the manner described, will attain impossible percentages; our leading digram, TH, in normal text, remains fairly close to three or four percent. It was the fourth device, however, which caused the greatest consternation among the younger solvers; in this case, the making of the frequency count will show what the trouble is: It begins very well, with the expected resemblance to a normal count, and suddenly begins to grow erratic.
Not every variation encountered in dealing with simple substitution is employed with the deliberate intention of creating difficulties. Those correspondents, for instance, who select some one letter, as X, and place it after each word as a word-separator, do so because they find it difficult to read their texts unless the word-divisions are present. As to whether or not this device does actually create difficulties: The person who is content to make use of simple substitution as his means of secret communication, is not usually inspired to employ more than one such letter. The length of an English word being somewhat shorter than five letters, any single letter placed religiously after each word will attain a frequency (based on the new length) of not less than 18%, where the letter E, at its very maximum, can rarely attain 15%. The decryptor, taking his preliminary frequency count, quickly discovers this one letter of enormous frequency. He might suspect German, or even French, and look for other characteristics of those languages. But having reason to believe that the language is English, he recognizes this letter instantly for what it is; he first makes sure that it is distributed throughout the cryptogram at an average interval of five or six letters, then calmly circles it out and deals with a case of word-divisions.
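The decryptor's check for a word-separator can be sketched as follows. The 18% threshold is taken from the text; the function names and the decision to report the average interval, leaving its judgment to the reader, are assumptions.

```python
from collections import Counter


def find_separator(cryptogram, min_share=0.18):
    """Return (letter, share, mean_interval) for a suspected
    word-separator, or None if no letter reaches min_share."""
    letters = [c for c in cryptogram.upper() if c.isalpha()]
    if not letters:
        return None
    letter, count = Counter(letters).most_common(1)[0]
    share = count / len(letters)
    if share < min_share:
        return None
    positions = [i for i, c in enumerate(letters) if c == letter]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    mean_interval = sum(gaps) / len(gaps) if gaps else None
    return letter, share, mean_interval


def circle_out(cryptogram, separator):
    """Remove the separator and return the cryptogram as words."""
    letters = "".join(c for c in cryptogram.upper() if c.isalpha())
    return [word for word in letters.split(separator) if word]
```

A share above the threshold is not proof in itself; the candidate should also recur at a fairly regular interval before it is circled out.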
Figure 76. "Alphabet" for Encipherment of Numbers

"Plaintext" .. 1 2 3 4 5 6 7 8 9 0
"CIPHER" ..... A B C D E F G H I J

Text ready for encipherment: WE HAVE WCBEW BALES.
Considering something of a more practical nature, there is another very common device, used with every conceivable kind of cipher, which is not in the least intended for the purpose of creating difficulties, yet invariably does in short cryptograms. The ordinary practice, when dealing with numbers, necessary punctuation marks, and so on, is to write these out in words: three hundred twenty five; quote; dollars. But where a given correspondence is likely to involve a great many of these, so that the ordinary practice is very wasteful, the encipherer is nearly always provided with a little “cipher alphabet” of the general kind indicated in Fig. 76, in which the ten digits, any desired punctuation marks, and any other needed symbols ($, %, @) have each a single substitute. In the “alphabet” of the figure, the number 325 will be enciphered CBE. But if this enciphered group CBE is always to be cleanly distinguishable from the rest of the text, a means must be found for making this distinction, and this is usually done by reserving some one letter to act solely as an indicator and never using this letter for any other purpose. This indicator-letter, as W, may then be placed at the beginning and end of the enciphered group CBE, and the resulting group, WCBEW, may be placed in the plaintext message, ready to receive whatever kind of encipherment is given to the rest of the letters. These groups, used in short cryptograms, can give about the same amount of trouble as would so many nulls. But where cryptograms are longer, with a great many such groups, the decryptor invariably spots them by means of the recurrent indicator. Sometimes one letter is used, and sometimes two (W. . .W, or K. . .W); but in either case, the indicator always appears as a pair of correlatives, and wherever the first of the pair is found, its companion is never far away. Some provision must, of course, be made for replacing the indicator letter in the plaintext alphabet. 
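The preparation of a number-group with its indicator can be sketched as below, using the digit alphabet of Fig. 76. The function name is hypothetical, and the sketch only wraps the digits; it makes no provision for replacing the indicator letter where it occurs in the ordinary plaintext, which would have to be handled separately.

```python
import re

# Fig. 76: digits 1 2 3 4 5 6 7 8 9 0 become A B C D E F G H I J.
DIGIT_ALPHABET = dict(zip("1234567890", "ABCDEFGHIJ"))


def prepare_numbers(plaintext, indicator="W"):
    """Replace each run of digits with its letter-group, placed
    between a pair of indicator letters (325 -> WCBEW)."""
    def wrap(match):
        group = "".join(DIGIT_ALPHABET[d] for d in match.group())
        return indicator + group + indicator

    return re.sub(r"[0-9]+", wrap, plaintext)
```

The result, for instance `prepare_numbers("WE HAVE 325 BALES.")`, is the text ready to receive whatever encipherment is given to the rest of the letters.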
In English, we ordinarily select J for any such omission; this is a letter which is rarely used, and, on those scattered occasions when it does occur, it can be replaced with I. Among the Latins, it is commoner to make use of K and W; these two letters are not used at all in their native languages, and can be replaced, respectively, with Q and VV. It is also possible to omit X, replacing it with KS, or V, replacing it with U. The fact that it is possible to shorten the message alphabet without appreciably impairing the clearness of its messages has given rise to what is probably the most practical of the simple substitution variations: two or three letters, as J, K, V, are omitted from the plaintext alphabet, while the cipher alphabet retains its full 26, and in this way some extra substitutes are provided which can be given to the more frequent letters. It is possible to dispense with as many as five letters, replacing J, K, X, V, W with I, Q, QS, U, UU, and assign the extra substitutes to E, T, A, O, N. Fig. 77 illustrates an alphabet of this kind. Here, the letters I J are to have the same substitute, and the letters K Q are to have the same substitute. This releases two extra substitutes which may be given to E and T.
Figure 77

                            j q
Plaintext:  a b c d e f g h i k l m n o p r s t u v w x y z E T
CIPHER:     C U L P E R Z Y X W V T S Q O N M K J I H G F D B A

Encipherment:  w e m u s t h a v e b e t t e r c o v e r a g e ...
               H E T J M K Y C I B U E A K B N L Q I E N C Z B ...
The foregoing is one of those cases in which the decryptor can learn a great deal by taking his frequency count in the form of a digram chart. And he knows, of course, that his cryptogram contains some two letters whose combined frequencies will reproduce the frequency of E, or of T.
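The alphabet of Fig. 77 can be tried out with a short sketch. The rule for choosing between the two substitutes of E and of T is not stated in the text; taking them in strict rotation is an assumption made here, though it does agree with the encipherment shown in the figure.

```python
from itertools import cycle

# Fig. 77: 24 plaintext letters (j shares with i, q shares with k),
# followed by the extra substitutes for e and t.
PLAIN = "abcdefghiklmnoprstuvwxyz" + "et"
CIPHER = "CULPERZYXWVTSQONMKJIHGFDBA"
SHARED = {"j": "i", "q": "k"}


def build_table():
    """Map each plaintext letter to its list of substitutes."""
    table = {}
    for p, c in zip(PLAIN, CIPHER):
        table.setdefault(p, []).append(c)
    return table


def encipher(plaintext):
    """Encipher, rotating through the substitutes of each letter
    (an assumed selection rule). Non-letters are dropped."""
    pickers = {p: cycle(subs) for p, subs in build_table().items()}
    out = []
    for ch in plaintext.lower():
        ch = SHARED.get(ch, ch)
        if ch in pickers:
            out.append(next(pickers[ch]))
    return "".join(out)
```

With this rule, `encipher("we must have better coverage")` gives the same 24 letters as the figure, and a count of the result shows E and B sharing between them the occurrences of plaintext e.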