David Kahn’s notes on the personalities of letters

“Cryptanalysis rests upon the fact that the letters of language have ‘personalities’ of their own. To the casual observer, they may look as alike as troops lined up for inspection, but just as the sergeant knows his men as ‘the goldbrick,’ ‘the kid,’ ‘the reliable soldier,’ so the cryptanalyst knows the letters of the alphabet.” (99)

“The cryptanalyst would begin by counting each letter’s frequency (how often it occurs in the text) and its contacts (which letters it touches, and how many different ones).” (99)

“But whereas the relative frequencies may shift slightly, making, say, i more frequent than a in a particular case, the letters generally do not stray very far from their home areas in the frequency table. Thus, e, t, a, o, n, i, r, s, and h will normally be found in the high-frequency group; d, l, u, c, and m in the medium-frequency group; p, f, y, w, g, b, and v in the lows, and j, k, q, x, and z in the rare group. Furthermore, a sharp break in frequency usually sets off the highs from the mediums; the lowest of the highs, h, is normally 6 per cent, while the highest of the mediums, d, is only 4 per cent.” (100)

“Every letter has a cluster of preferred associations that constitute its most distinguishing characteristic. The cryptanalyst can spot these almost by eye if he sets up a contact chart for the high-frequency cipher letters … In the chart, the letter being counted stands at the left, with the other letters strung out in order of frequency in a line to the right. Each tally above a letter in the line means that the letter in that line has preceded the subject letter in one instance, while each tally below means that it has followed the subject letter.” (100)

“In a chart like this, plaintext e is about as hard to recognize under its cipher masquerade as a six-and-a-half-foot-tall man at a costume party.
It is president of this republic of letters because it leads all the rest in frequency, yet it is democratic enough to contact more different letters more often than any other letter, including a goodly number in the low-frequency bracket.” (101)

“Next most distinctive are the three high-frequency vowels, a, i, and o. Like rival dowagers at a society ball, they avoid one another as much as possible. … Which is which can often be ascertained by the fact that the plaintext digraph io is fairly frequent while the other five combinations (oi, ia, ai, oa, ao) are fairly rare.” (101)

“What about consonants? The easiest to spot is plaintext n because four fifths of the letters that precede it are vowels.” (101)

“… plaintext h. The digraph he is one of the most common in English, while eh is very rare; th is the most common of all, and ht is also fairly rare. … In telegraphic texts where the is deleted, plaintext h can usually be spotted because—just the opposite of n—it precedes vowels about ten times as often [as] it follows them.” (102)

“The only two high-frequency letters remaining to be identified are r and s. The basic difference between them is that r, rather like a social climber, associates much more with the vowels—dowagers a, i, and o as well as President e—than does s, while s, a proletarian at heart, mingles with the consonants, the blue-collar laborers of the alphabet.” (102)

“Naturally, in shorter cryptograms, solutions do not run quite as smoothly as the longer ones that allow the statistics of language enough play to become reliable. For these more difficult problems, expert solvers offer novices two tips: (1) make contact charts: the drudgery usually pays off in faster and more accurate identifications; (2) when stumped, and no likely plaintext values are visible, try something and see where it leads; even if it proves wrong, it has narrowed down the possibilities.
No cryptogram was ever solved by staring at it.” (104-105)

David Kahn, The Codebreakers: The Story of Secret Writing, Macmillan, New York, 1967.
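The counting Kahn describes, each letter's frequency plus its contacts laid out as a contact chart, can be sketched in Python. This is a minimal sketch under stated assumptions, not a reconstruction of any procedure Kahn printed: the function names (`clean`, `letter_frequencies`, `contact_chart`, `format_row`) are hypothetical, letters are upper-cased and everything else is discarded, and a contact is any pair of adjacent letters in what remains, so word divisions are ignored. Each chart row follows Kahn's layout (predecessors above the line, successors below, other letters in descending frequency) but prints counts in place of tally marks.

```python
from collections import Counter, defaultdict
import string


def clean(text):
    """Keep only the letters A-Z, upper-cased (assumption: spaces and punctuation are dropped)."""
    return [c for c in text.upper() if c in string.ascii_uppercase]


def letter_frequencies(text):
    """Map each letter to its share of the text in per cent, most frequent first."""
    letters = clean(text)
    total = len(letters)
    return {ch: 100.0 * n / total for ch, n in Counter(letters).most_common()}


def contact_chart(text):
    """Tally, for every letter, which letters precede it and which follow it."""
    letters = clean(text)
    before = defaultdict(Counter)  # before[x][y]: times y came just before x
    after = defaultdict(Counter)   # after[x][y]: times y came just after x
    for left, right in zip(letters, letters[1:]):
        after[left][right] += 1
        before[right][left] += 1
    return before, after


def format_row(text, subject):
    """Render one chart row: predecessor counts above the letter line, successor counts below."""
    order = list(letter_frequencies(text))  # letters in descending frequency
    before, after = contact_chart(text)
    subject = subject.upper()
    # A zero count prints as blank space, like an empty tally column.
    above = "".join(f"{before[subject][c] or '':>3}" for c in order)
    line = "".join(f"{c:>3}" for c in order)
    below = "".join(f"{after[subject][c] or '':>3}" for c in order)
    return "\n".join(["    " + above, f"{subject}:  " + line, "    " + below])


if __name__ == "__main__":
    sample = "No cryptogram was ever solved by staring at it."
    for letter, pct in letter_frequencies(sample).items():
        print(f"{letter} {pct:5.1f}%")
    print(format_row(sample, "t"))
```

The sample is Kahn's own closing sentence in plaintext. Assuming a simple (monoalphabetic) substitution, the ciphertext would show exactly the same counts, only under different letters, which is why the chart exposes a letter's "personality" through its disguise. A text this short is far too small for the statistics to be reliable, as Kahn warns.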
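Kahn's identifications build on one another: e stands out by the number of different letters it touches, and once a, i, and o are placed, n, h, r, and s can be told apart by how they touch the vowels. The sketch below turns those tests into numbers. It is a hedged illustration, not Kahn's method as printed: the function names are hypothetical, contacts are counted between adjacent letters with spaces and punctuation dropped, and `vowels` is an assumed input, the set of cipher letters the solver has so far tentatively taken as vowels, since the ciphertext itself does not reveal them. The reference values in the comments (four fifths for n, about ten for h in telegraphic text) are Kahn's rough guides, not fixed thresholds.

```python
from collections import Counter, defaultdict
import string


def contacts(text):
    """Return (before, after): before[x][y] counts y just before x; after[x][y] counts y just after x."""
    letters = [c for c in text.upper() if c in string.ascii_uppercase]
    before, after = defaultdict(Counter), defaultdict(Counter)
    for left, right in zip(letters, letters[1:]):
        after[left][right] += 1
        before[right][left] += 1
    return before, after


def variety(text):
    """Number of different letters each letter touches on either side (Kahn: highest for plaintext e)."""
    before, after = contacts(text)
    letters = set(before) | set(after)
    return {ch: len(set(before[ch]) | set(after[ch])) for ch in letters}


def vowel_share_before(text, subject, vowels):
    """Fraction of letters preceding `subject` that are in `vowels` (Kahn: about 4/5 for plaintext n)."""
    before, _ = contacts(text)
    subject, vowels = subject.upper(), {v.upper() for v in vowels}
    total = sum(before[subject].values())
    hits = sum(n for ch, n in before[subject].items() if ch in vowels)
    return hits / total if total else 0.0


def vowel_precede_follow_ratio(text, subject, vowels):
    """Times `subject` precedes a vowel divided by times it follows one (Kahn: about 10 for h)."""
    before, after = contacts(text)
    subject, vowels = subject.upper(), {v.upper() for v in vowels}
    precedes = sum(n for ch, n in after[subject].items() if ch in vowels)
    follows = sum(n for ch, n in before[subject].items() if ch in vowels)
    return precedes / follows if follows else float("inf")


def vowel_contact_share(text, subject, vowels):
    """Fraction of all contacts of `subject`, both sides, that are vowels (Kahn: higher for r than s)."""
    before, after = contacts(text)
    subject, vowels = subject.upper(), {v.upper() for v in vowels}
    total = sum(before[subject].values()) + sum(after[subject].values())
    hits = sum(n for side in (before, after)
               for ch, n in side[subject].items() if ch in vowels)
    return hits / total if total else 0.0
```

A solver might, for instance, compute `vowel_share_before(ciphertext, c, vowels)` for each high-frequency cipher letter `c` and treat the one nearest four fifths as a candidate for n. In the spirit of Kahn's second tip, such a guess is something to try and test against the rest of the text, not a settled identification, particularly in short cryptograms.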