Data for Solution of Ciphers in English

Table I.—Normal frequency table. Frequency for ten thousand letters and for two hundred letters. This latter is put in graphic form and is necessarily an approximation. Taken from military orders and reports, English text.

Letter  10,000 Letters  200 Letters  Graphic form
A                  778           16  1111111111111111
B                  141            3  111
C                  296            6  111111
D                  402            8  11111111
E                 1277           26  11111111111111111111111111
F                  197            4  1111
G                  174            3  111
H                  595           12  111111111111
I                  667           13  1111111111111
J                   51            1  1
K                   74            2  11
L                  372            7  1111111
M                  288            6  111111
N                  686           14  11111111111111
O                  807           16  1111111111111111
P                  223            4  1111
Q                    8
R                  651           13  1111111111111
S                  622           12  111111111111
T                  855           17  11111111111111111
U                  308            6  111111
V                  112            2  11
W                  176            3  111
X                   27
Y                  196            4  1111
Z                   17

Vowels AEIOU = 38.37%; consonants LNRST = 31.86%; consonants JKQXZ = 1.77%.

The vowels may be safely taken as 40%, consonants LNRST as 30% and consonants JKQXZ as 2%.

Order of letters: E T O A N I R S H D L U C M P F Y W G B V K J X Z Q.
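As an illustration of how Table I may be applied, the following Python sketch counts the letters of a message and reports the three group percentages for comparison with the 40%, 30% and 2% figures given above. The function names and the sample message are assumptions made for the example; they are not part of the original tables.

```python
from collections import Counter
import string

def letter_profile(text):
    """Count the letters A-Z in text, ignoring case, spaces and punctuation."""
    letters = [ch for ch in text.upper() if ch in string.ascii_uppercase]
    return Counter(letters), len(letters)

def group_percent(counts, total, group):
    """Percentage of all counted letters that belong to the given group."""
    if total == 0:
        return 0.0
    return 100.0 * sum(counts[ch] for ch in group) / total

# Hypothetical sample message, far too short for reliable percentages.
sample = "SEND THREE MEN TO THE BRIDGE AT ONCE"
counts, total = letter_profile(sample)

vowels = group_percent(counts, total, "AEIOU")       # compare with about 40%
common = group_percent(counts, total, "LNRST")       # compare with about 30%
rare = group_percent(counts, total, "JKQXZ")         # compare with about 2%

print(f"{total} letters: vowels {vowels:.1f}%, LNRST {common:.1f}%, JKQXZ {rare:.1f}%")
```

A short message such as this sample will depart widely from the table; the percentages become dependable only as the count grows toward several hundred letters.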

Table II.—Frequency table for telegraph messages, English text. This table varies slightly from the standard frequency table because the common word “the” is rarely used in telegrams and there is a tendency to use longer and less common words in preparing telegraph messages.

Letter  10,000 Letters  200 Letters  Graphic form
A                  813           16  1111111111111111
B                  149            3  111
C                  306            6  111111
D                  417            8  11111111
E                 1319           26  11111111111111111111111111
F                  205            4  1111
G                  201            4  1111
H                  386            8  11111111
I                  711           14  11111111111111
J                   42            1  1
K                   88            2  11
L                  392            8  11111111
M                  273            6  111111
N                  718           14  11111111111111
O                  844           17  11111111111111111
P                  243            5  11111
Q                   38            1  1
R                  677           14  11111111111111
S                  656           13  1111111111111
T                  634           13  1111111111111
U                  321            6  111111
V                  136            3  111
W                  166            3  111
X                   51            1  1
Y                  208            4  1111
Z                    6

In this table the vowels AEIOU = 40.08%, consonants LNRST = 30.77% and consonants JKQXZ = 2.25%.

Order of letters: E O A N I R S T D L H U C M P Y F G W B V K X J Q Z.
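The group percentages and the order of letters follow directly from the figures for 10,000 letters. The Python sketch below repeats that arithmetic for Table II; the figures are transcribed from the table, while the dictionary and function names are assumptions made for the example.

```python
# Figures per 10,000 letters, transcribed from Table II (telegraph messages).
TABLE_II = {
    "A": 813, "B": 149, "C": 306, "D": 417, "E": 1319, "F": 205, "G": 201,
    "H": 386, "I": 711, "J": 42, "K": 88, "L": 392, "M": 273, "N": 718,
    "O": 844, "P": 243, "Q": 38, "R": 677, "S": 656, "T": 634, "U": 321,
    "V": 136, "W": 166, "X": 51, "Y": 208, "Z": 6,
}

def group_percent(table, group):
    """Share of the whole count taken by the letters in group, in percent."""
    total = sum(table.values())
    return round(100.0 * sum(table[ch] for ch in group) / total, 2)

def order_of_letters(table):
    """Letters arranged from most frequent to least frequent."""
    return " ".join(sorted(table, key=table.get, reverse=True))

vowels = group_percent(TABLE_II, "AEIOU")
common = group_percent(TABLE_II, "LNRST")
rare = group_percent(TABLE_II, "JKQXZ")
order = order_of_letters(TABLE_II)

print(vowels, common, rare)   # 40.08 30.77 2.25
print(order)
```

The same two functions may be applied to the figures of Table I, or to a count made from any other body of text.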

Table III.—Table of frequency of digraphs, duals or pairs (English). This table was prepared from 20,000 letters, but the figures shown are on the basis of 2,000 letters. For this reason they are, to a certain extent, approximate; that is, merely because no figures are shown for certain combinations, we should not assume that such combinations never occur but rather that they are rare. The letters in the horizontal line at the top and bottom are the leading letters; those in the vertical columns at the sides are the following letters. Thus in two thousand letters we may expect to find AH once and HA twenty-six times.

ABCDEFGHIJKLMNOPQRSTUVWXYZ
A17102232264227811291312924112
B512111122131
C61114211113231111
D61230124301411113
E111416122633102618141217361112216511
F328212213253111
G413211231
H111241412112105032
I21412651121598121312132223611
J1
K112211
L146216111693633235
M731322341104112
N3832521313223943112
O1112488312182478371315222615
P218124232181431
Q2111
R16133403626121258228112
S1613251217121127291161116
T25131213523202124821620116227
U1216132233117153551
V31553251
W128111124233
X14211
Y3224118121317
Z111
ABCDEFGHIJKLMNOPQRSTUVWXYZ

Table IV.—Order of frequency of common pairs to be expected in a count of 2,000 letters of military or semi-military English text. (Based on a count of 20,000 letters).

TH  50      AT  25      ST  20
ER  40      EN  25      IO  18
ON  39      ES  25      LE  18
AN  38      OF  25      IS  17
RE  36      OR  25      OU  17
HE  33      NT  24      AR  16
IN  31      EA  22      AS  16
ED  30      TI  22      DE  16
ND  30      TO  22      RT  16
HA  26      IT  20      VE  16
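A count of pairs from one's own text may be made as in the following Python sketch and reduced to the basis of 2,000 letters for comparison with Tables III and IV. It is assumed here that the letters are run together and that pairs are counted across word spaces; the function names and the sample message are likewise assumptions made for the example. The same function counts groups of three letters when n is 3.

```python
from collections import Counter

def ngram_counts(text, n):
    """Count every group of n consecutive letters, ignoring spaces and punctuation."""
    letters = "".join(ch for ch in text.upper() if "A" <= ch <= "Z")
    return Counter(letters[i:i + n] for i in range(len(letters) - n + 1))

def per_basis(count, total_letters, basis=2000):
    """Scale a raw count to the number expected in `basis` letters."""
    return count * basis / total_letters

# Hypothetical sample message; 21 letters once the spaces are dropped.
sample = "THE ENEMY IS AT THE BRIDGE"
pairs = ngram_counts(sample, 2)
triples = ngram_counts(sample, 3)

for pair, count in pairs.most_common(3):
    print(pair, count)
```

With a count of 20,000 letters, `per_basis(count, 20000)` reduces each figure to the basis used in the tables.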

Table V.—Table of recurrence of groups of three letters to be expected in a count of 10,000 letters of English text.

THE  89     TIO  33     EDT  27
AND  54     FOR  33     TIS  25
THA  47     NDE  31     OFT  23
ENT  39     HAS  28     STH  21
ION  36     NCE  27     MEN  20

Table VI.—Table of frequency of occurrence of letters as initials and finals of English words. Based on a count of 4,000 words; this table gives the figures for an average 100 words and is necessarily an approximation, like Table III. English words are derived from so many sources that it is not impossible for any letter to occur as an initial or final of a word, although Q, X and Z are rare as initials and B, I, J, Q, V, X and Z are rare as finals.

Letters  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
Initial  9  6  6  5  2  4  2  3  3  1  1  2  4  2 10  2  -  4  5 17  2  -  7  -  3  -
Final    1  -  - 10 17  6  4  2  -  -  1  6  1  9  4  1  -  8  9 11  1  -  1  -  8  -
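A count of initials and finals may be made from any text in which the word divisions are preserved. The Python sketch below is one way of doing it. It assumes that a word is any unbroken run of letters, so that an apostrophe or hyphen divides a word in two; the function name and the sample message are also assumptions made for the example.

```python
from collections import Counter
import re

def initials_and_finals(text):
    """Count the first and last letters of each word in text."""
    words = re.findall(r"[A-Z]+", text.upper())
    initials = Counter(word[0] for word in words)
    finals = Counter(word[-1] for word in words)
    return initials, finals, len(words)

# Hypothetical sample message of six words.
sample = "THE ENEMY IS AT THE BRIDGE"
initials, finals, n_words = initials_and_finals(sample)

# Reduce to the basis of 100 words for comparison with Table VI.
for letter, count in sorted(initials.items()):
    print(letter, round(100 * count / n_words))
```

Six words are of course too few for the reduction to 100 words to mean anything; the table itself rests on 4,000.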

It is practically impossible to find five consecutive letters in an English text without a vowel; in any five consecutive letters we may expect from one to three vowels, with two as the general average. In any twenty letters we may expect to find from 6 to 9 vowels, with 8 as an average. Among themselves, the relative frequency of occurrence of each of the vowels (including Y when a vowel) is as follows:

A, 19.5%    E, 32.0%    I, 16.7%
O, 20.2%    U,  8.0%    Y,  3.6%
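The statements about the spacing of vowels can be tested on any text with a sketch such as the following, which finds the longest run of letters without a vowel and counts the vowels in successive groups of twenty letters. Y is treated here as a consonant unless it is added to the `vowels` argument; that choice, the function name and the sample message are all assumptions made for the example.

```python
def vowel_stats(text, window=20, vowels="AEIOU"):
    """Return (longest run without a vowel, vowel counts per full window of letters)."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]

    longest = run = 0
    for ch in letters:
        run = 0 if ch in vowels else run + 1
        longest = max(longest, run)

    # Successive, non-overlapping groups; an incomplete final group is dropped.
    window_counts = [
        sum(ch in vowels for ch in letters[i:i + window])
        for i in range(0, len(letters) - window + 1, window)
    ]
    return longest, window_counts

# Hypothetical sample message of 50 letters.
sample = "ATTACK AT DAWN WITH ALL AVAILABLE TROOPS AND HOLD THE BRIDGE"
longest, window_counts = vowel_stats(sample)
print(longest, window_counts)   # 4 [7, 8]
```

In this sample the longest run without a vowel is four letters (LDTH, across two word spaces), and the two complete groups of twenty contain 7 and 8 vowels, within the limits stated above.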

The foregoing tables give all the essential facts about the mechanism of the English language from the standpoint of the solution of ciphers. The use to be made of these tables will be evident when the solution of different types of ciphers is taken up.