Data for Solution of Ciphers in English
Table I.—Normal frequency table. Frequency for ten thousand letters and for two hundred letters. This latter is put in graphic form and is necessarily an approximation. Taken from military orders and reports, English text.
| Letter | 10,000 Letters | 200 Letters | Graphic form |
| A | 778 | 16 | 1111111111111111 |
| B | 141 | 3 | 111 |
| C | 296 | 6 | 111111 |
| D | 402 | 8 | 11111111 |
| E | 1277 | 26 | 11111111111111111111111111 |
| F | 197 | 4 | 1111 |
| G | 174 | 3 | 111 |
| H | 595 | 12 | 111111111111 |
| I | 667 | 13 | 1111111111111 |
| J | 51 | 1 | 1 |
| K | 74 | 2 | 11 |
| L | 372 | 7 | 1111111 |
| M | 288 | 6 | 111111 |
| N | 686 | 14 | 11111111111111 |
| O | 807 | 16 | 1111111111111111 |
| P | 223 | 4 | 1111 |
| Q | 8 | ||
| R | 651 | 13 | 1111111111111 |
| S | 622 | 12 | 111111111111 |
| T | 855 | 17 | 11111111111111111 |
| U | 308 | 6 | 111111 |
| V | 112 | 2 | 11 |
| W | 176 | 3 | 111 |
| X | 27 | ||
| Y | 196 | 4 | 1111 |
| Z | 17 | ||
Vowels AEIOU = 38.37%; consonants LNRST = 31.86%; consonants JKQXZ = 1.77%.
The vowels may be safely taken as 40%, consonants LNRST as 30% and consonants JKQXZ as 2%.
Order of letters: E T O A N I R S H D L U C M P F Y W G B V K J X Z Q.
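The group percentages and the order of letters above follow directly from the counts in Table I. The following is a minimal Python sketch of that arithmetic. The names `TABLE_I`, `group_percent`, `frequency_order` and `count_letters` are illustrative and not part of the original text. `count_letters` assumes the message uses only the 26 English letters once spaces and punctuation are dropped.

```python
from collections import Counter

# Counts per 10,000 letters, copied from Table I.
TABLE_I = {
    "A": 778, "B": 141, "C": 296, "D": 402, "E": 1277, "F": 197, "G": 174,
    "H": 595, "I": 667, "J": 51, "K": 74, "L": 372, "M": 288, "N": 686,
    "O": 807, "P": 223, "Q": 8, "R": 651, "S": 622, "T": 855, "U": 308,
    "V": 112, "W": 176, "X": 27, "Y": 196, "Z": 17,
}

def group_percent(counts, letters):
    """Percentage of all counted letters that fall in the given group."""
    total = sum(counts.values())
    return 100.0 * sum(counts.get(ch, 0) for ch in letters) / total

def frequency_order(counts):
    """Letters sorted from most to least frequent."""
    return sorted(counts, key=lambda ch: -counts[ch])

def count_letters(text):
    """Tally A-Z in a message, ignoring case, spaces and punctuation."""
    return Counter(ch for ch in text.upper() if "A" <= ch <= "Z")

print(round(group_percent(TABLE_I, "AEIOU"), 2))   # 38.37
print(round(group_percent(TABLE_I, "LNRST"), 2))   # 31.86
print(round(group_percent(TABLE_I, "JKQXZ"), 2))   # 1.77
print(" ".join(frequency_order(TABLE_I)))
```

The same two functions can be applied to the result of `count_letters(message)` to compare a message under examination with the normal figures.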
Table II.—Frequency table for telegraph messages, English text. This table varies slightly from the standard frequency table because the common word “the” is rarely used in telegrams and there is a tendency to use longer and less common words in preparing telegraph messages.
| Letter | 10,000 Letters | 200 Letters | Graphic form |
| A | 813 | 16 | 1111111111111111 |
| B | 149 | 3 | 111 |
| C | 306 | 6 | 111111 |
| D | 417 | 8 | 11111111 |
| E | 1319 | 26 | 11111111111111111111111111 |
| F | 205 | 4 | 1111 |
| G | 201 | 4 | 1111 |
| H | 386 | 8 | 11111111 |
| I | 711 | 14 | 11111111111111 |
| J | 42 | 1 | 1 |
| K | 88 | 2 | 11 |
| L | 392 | 8 | 11111111 |
| M | 273 | 6 | 111111 |
| N | 718 | 14 | 11111111111111 |
| O | 844 | 17 | 11111111111111111 |
| P | 243 | 5 | 11111 |
| Q | 38 | 1 | 1 |
| R | 677 | 14 | 11111111111111 |
| S | 656 | 13 | 1111111111111 |
| T | 634 | 13 | 1111111111111 |
| U | 321 | 6 | 111111 |
| V | 136 | 3 | 111 |
| W | 166 | 3 | 111 |
| X | 51 | 1 | 1 |
| Y | 208 | 4 | 1111 |
| Z | 6 | ||
In this table the vowels AEIOU = 40.08%, consonants LNRST = 30.77% and consonants JKQXZ = 2.25%.
Order of letters: E O A N I R S T D L H U C M P Y F G W B V K X J Q Z.
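The effect of dropping the word "the" can be seen by subtracting the Table I counts from the Table II counts, letter by letter. The sketch below does only this subtraction. The names `TABLE_I`, `TABLE_II` and `shift` are illustrative, and the figures are copied from the two tables.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Counts per 10,000 letters, in alphabetical order, copied from Tables I and II.
TABLE_I = dict(zip(ALPHABET, [
    778, 141, 296, 402, 1277, 197, 174, 595, 667, 51, 74, 372, 288,
    686, 807, 223, 8, 651, 622, 855, 308, 112, 176, 27, 196, 17,
]))
TABLE_II = dict(zip(ALPHABET, [
    813, 149, 306, 417, 1319, 205, 201, 386, 711, 42, 88, 392, 273,
    718, 844, 243, 38, 677, 656, 634, 321, 136, 166, 51, 208, 6,
]))

# Change in count per 10,000 letters when passing from ordinary text to telegrams.
shift = {ch: TABLE_II[ch] - TABLE_I[ch] for ch in ALPHABET}

# The two letters whose counts change the most, in either direction.
largest = sorted(ALPHABET, key=lambda ch: abs(shift[ch]), reverse=True)[:2]
for ch in largest:
    print(ch, shift[ch])
# T -221
# H -209
```

T and H account for by far the largest changes, which agrees with the remark that "the" is rarely used in telegrams; no other letter moves by more than 44 per 10,000.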
Table III.—Table of frequency of digraphs, duals or pairs (English). This table was prepared from 20,000 letters, but the figures shown are on the basis of 2,000 letters. For this reason they are, to a certain extent, approximate; that is, merely because no figures are shown for certain combinations, we should not assume that such combinations never occur but rather that they are rare. The letters in the horizontal line at the top and bottom are the leading letters; those in the vertical columns at the sides are the following letters. Thus in two thousand letters we may expect to find AH once and HA twenty-six times.
| | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z |
| A | 1 | 7 | 10 | 22 | 3 | 2 | 26 | 4 | 2 | 2 | 7 | 8 | 11 | 2 | 9 | 13 | 12 | 9 | 2 | 4 | 1 | 12 | ||||
| B | 5 | 1 | 2 | 1 | 1 | 1 | 1 | 2 | 2 | 1 | 3 | 1 | ||||||||||||||
| C | 6 | 1 | 1 | 14 | 2 | 11 | 11 | 3 | 2 | 3 | 1 | 1 | 1 | 1 | ||||||||||||
| D | 6 | 12 | 30 | 1 | 2 | 4 | 30 | 1 | 4 | 1 | 1 | 1 | 1 | 3 | ||||||||||||
| E | 11 | 14 | 16 | 12 | 2 | 6 | 33 | 10 | 2 | 6 | 18 | 14 | 12 | 1 | 7 | 36 | 11 | 12 | 2 | 16 | 5 | 1 | 1 | |||
| F | 3 | 2 | 8 | 2 | 1 | 2 | 2 | 1 | 3 | 25 | 3 | 1 | 1 | 1 | ||||||||||||
| G | 4 | 1 | 3 | 2 | 11 | 2 | 3 | 1 | ||||||||||||||||||
| H | 1 | 11 | 2 | 4 | 1 | 4 | 1 | 2 | 1 | 1 | 2 | 10 | 50 | 3 | 2 | |||||||||||
| I | 2 | 1 | 4 | 12 | 6 | 5 | 1 | 12 | 1 | 5 | 9 | 8 | 12 | 1 | 3 | 12 | 13 | 22 | 2 | 3 | 6 | 1 | 1 | |||
| J | 1 | |||||||||||||||||||||||||
| K | 1 | 1 | 2 | 2 | 1 | 1 | ||||||||||||||||||||
| L | 14 | 6 | 2 | 1 | 6 | 1 | 1 | 1 | 6 | 9 | 3 | 6 | 3 | 3 | 2 | 3 | 5 | |||||||||
| M | 7 | 3 | 13 | 2 | 2 | 3 | 4 | 1 | 10 | 4 | 1 | 1 | 2 | |||||||||||||
| N | 38 | 3 | 25 | 2 | 1 | 31 | 3 | 2 | 2 | 39 | 4 | 3 | 11 | 2 | ||||||||||||
| O | 1 | 1 | 12 | 4 | 8 | 8 | 3 | 12 | 18 | 2 | 4 | 7 | 8 | 3 | 7 | 13 | 15 | 22 | 2 | 6 | 1 | 5 | ||||
| P | 2 | 1 | 8 | 1 | 2 | 4 | 2 | 3 | 2 | 1 | 8 | 1 | 4 | 3 | 1 | |||||||||||
| Q | 2 | 1 | 1 | 1 | ||||||||||||||||||||||
| R | 16 | 1 | 3 | 3 | 40 | 3 | 6 | 2 | 6 | 1 | 2 | 1 | 25 | 8 | 2 | 2 | 8 | 11 | 2 | |||||||
| S | 16 | 1 | 3 | 25 | 1 | 2 | 17 | 1 | 2 | 1 | 12 | 7 | 2 | 9 | 11 | 6 | 11 | 1 | 6 | |||||||
| T | 25 | 1 | 3 | 12 | 13 | 5 | 2 | 3 | 20 | 2 | 1 | 24 | 8 | 2 | 16 | 20 | 11 | 6 | 2 | 2 | 7 | |||||
| U | 1 | 2 | 1 | 6 | 1 | 3 | 2 | 2 | 3 | 3 | 1 | 17 | 1 | 5 | 3 | 5 | 5 | 1 | ||||||||
| V | 3 | 1 | 5 | 5 | 3 | 2 | 5 | 1 | ||||||||||||||||||
| W | 1 | 2 | 8 | 1 | 1 | 1 | 1 | 2 | 4 | 2 | 3 | 3 | ||||||||||||||
| X | 1 | 4 | 2 | 1 | 1 | |||||||||||||||||||||
| Y | 3 | 2 | 2 | 4 | 1 | 1 | 8 | 1 | 2 | 1 | 3 | 1 | 7 | |||||||||||||
| Z | 1 | 1 | 1 | |||||||||||||||||||||||
| A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z |
Table IV.—Order of frequency of common pairs to be expected in a count of 2,000 letters of military or semi-military English text. (Based on a count of 20,000 letters).
| TH | 50 | AT | 25 | ST | 20 |
| ER | 40 | EN | 25 | IO | 18 |
| ON | 39 | ES | 25 | LE | 18 |
| AN | 38 | OF | 25 | IS | 17 |
| RE | 36 | OR | 25 | OU | 17 |
| HE | 33 | NT | 24 | AR | 16 |
| IN | 31 | EA | 22 | AS | 16 |
| ED | 30 | TI | 22 | DE | 16 |
| ND | 30 | TO | 22 | RT | 16 |
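A count of pairs in a message can be put on the same 2,000-letter basis for comparison with Tables III and IV. The following is a minimal sketch, assuming that pairs are counted continuously across word divisions; the tables do not state how the original count treated word divisions. The function name `digraph_counts` is illustrative.

```python
from collections import Counter

def digraph_counts(text, per=2000):
    """Count adjacent letter pairs and scale them to a basis of `per` letters.

    The first letter of each pair is the leading letter and the second is
    the following letter, as in Table III.
    """
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    if not letters:
        return {}
    pairs = Counter(a + b for a, b in zip(letters, letters[1:]))
    scale = per / len(letters)
    return {pair: n * scale for pair, n in pairs.items()}

# Usage: a six-letter text counted on a six-letter basis, so no scaling occurs.
counts = digraph_counts("THE THE", per=6)
print(counts["TH"], counts["HE"], counts["ET"])   # 2.0 2.0 1.0
```

A short message gives very rough figures when scaled up in this way, so only the leading pairs of a long count should be compared with Table IV.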
| HA | 26 | IT | 20 | VE | 16 |
Table V.—Table of recurrence of groups of three letters to be expected in a count of 10,000 letters of English text.
| THE | 89 | TIO | 33 | EDT | 27 |
| AND | 54 | FOR | 33 | TIS | 25 |
| THA | 47 | NDE | 31 | OFT | 23 |
| ENT | 39 | HAS | 28 | STH | 21 |
| ION | 36 | NCE | 27 | MEN | 20 |
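Groups such as EDT, OFT and STH seldom fall inside a single English word, which suggests that the count behind Table V ran across word divisions. The sketch below assumes so. It counts groups of three letters in a message and scales a Table V figure to the length of that message. The names `TABLE_V`, `trigraph_counts` and `expected_in` are illustrative.

```python
from collections import Counter

# Recurrences per 10,000 letters, copied from Table V.
TABLE_V = {
    "THE": 89, "AND": 54, "THA": 47, "ENT": 39, "ION": 36,
    "TIO": 33, "FOR": 33, "NDE": 31, "HAS": 28, "NCE": 27,
    "EDT": 27, "TIS": 25, "OFT": 23, "STH": 21, "MEN": 20,
}

def trigraph_counts(text):
    """Count every group of three consecutive letters, ignoring word divisions."""
    letters = "".join(ch for ch in text.upper() if "A" <= ch <= "Z")
    return Counter(letters[i:i + 3] for i in range(len(letters) - 2))

def expected_in(n_letters, trigraph):
    """Scale a Table V figure (per 10,000 letters) to a text of n_letters."""
    return TABLE_V[trigraph] * n_letters / 10000

found = trigraph_counts("USED THE ROAD OF THE FORT")
# EDT comes from "usED The" and OFT from "OF The".
print(found["THE"], found["EDT"], found["OFT"], found["FOR"])   # 2 1 1 1
print(expected_in(500, "THE"))   # 4.45
```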
Table VI.—Table of frequency of occurrence of letters as initials and finals of English words. Based on a count of 4,000 words; this table gives the figures for an average 100 words and is necessarily an approximation, like Table III. English words are derived from so many sources that it is not impossible for any letter to occur as an initial or final of a word, although Q, X and Z are rare as initials and B, I, J, Q, V, X and Z are rare as finals.
| Letters | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z |
| Initial | 9 | 6 | 6 | 5 | 2 | 4 | 2 | 3 | 3 | 1 | 1 | 2 | 4 | 2 | 10 | 2 | - | 4 | 5 | 17 | 2 | - | 7 | - | 3 | - |
| Final | 1 | - | - | 10 | 17 | 6 | 4 | 2 | - | - | 1 | 6 | 1 | 9 | 4 | 1 | - | 8 | 9 | 11 | 1 | - | 1 | - | 8 | - |
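When a message preserves its word divisions, the initials and finals can be tallied and put on the same basis of 100 words as Table VI. The following is a minimal sketch; the name `initials_and_finals` is illustrative. It assumes that a word is any unbroken run of letters, so an apostrophe or hyphen splits a word in two.

```python
import re
from collections import Counter

def initials_and_finals(text, per=100):
    """Return two dicts: initial and final letter counts per `per` words."""
    words = re.findall(r"[A-Z]+", text.upper())
    if not words:
        return {}, {}
    scale = per / len(words)
    initials = Counter(word[0] for word in words)
    finals = Counter(word[-1] for word in words)
    return (
        {ch: n * scale for ch, n in initials.items()},
        {ch: n * scale for ch, n in finals.items()},
    )

first, last = initials_and_finals("The enemy attacked at dawn")
print(first["A"], last["D"])   # 40.0 20.0
```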
It is practically impossible to find five consecutive letters in an English text without a vowel; between vowels we may expect from one to three consonants, with two as the general average. In any twenty letters we may expect to find from 6 to 9 vowels, with 8 as an average. Among themselves, the relative frequency of occurrence of each of the vowels (including Y when a vowel) is as follows:
| A | 19.5% | E | 32.0% | I | 16.7% |
| O | 20.2% | U | 8.0% | Y | 3.6% |
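The two rules about the spacing of vowels can be checked on any text by measuring the runs of consonants and counting the vowels in successive blocks of letters. The following is a minimal sketch; the names `consonant_runs` and `vowels_per_block` are illustrative. Y is treated as a consonant unless it is added to the `vowels` argument, since the code makes no attempt to decide when Y acts as a vowel.

```python
import re

def _letters(text):
    """Keep only A-Z, in upper case."""
    return "".join(ch for ch in text.upper() if "A" <= ch <= "Z")

def consonant_runs(text, vowels="AEIOU"):
    """Lengths of the unbroken runs of consonants, in order of appearance."""
    runs = re.split("[" + vowels + "]+", _letters(text))
    return [len(run) for run in runs if run]

def vowels_per_block(text, block=20, vowels="AEIOU"):
    """Number of vowels in each complete block of `block` letters."""
    letters = _letters(text)
    return [
        sum(ch in vowels for ch in letters[i:i + block])
        for i in range(0, len(letters) - block + 1, block)
    ]

sample = "STRENGTH OF THE ENEMY"
print(consonant_runs(sample))              # [3, 4, 3, 1, 2]
print(vowels_per_block(sample, block=9))   # [2, 3]
```

With the default block of 20 letters, a text in ordinary English should give counts that mostly lie between 6 and 9, and the largest value returned by `consonant_runs` should rarely reach five.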
The foregoing tables give all the essential facts about the mechanism of the English language from the standpoint of the solution of ciphers. The use to be made of these tables will be evident when the solution of different types of ciphers is taken up.