David Kahn's notes on the personalities of letters

“Cryptanalysis rests upon the fact that the letters of language have ‘personalities’ of their
own. To the casual observer, they may look as alike as troops lined up for inspection, but
just as the sergeant knows his men as ‘the goldbrick,’ ‘the kid,’ ‘the reliable soldier,’ so
the cryptanalyst knows the letters of the alphabet.” (99)
“The cryptanalyst would begin by counting each letter’s frequency (how often it occurs
in the text) and its contacts (which letters it touches, and how many different ones).” (99)
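The two counts Kahn names can be tallied mechanically. The sketch below is a hypothetical helper, not Kahn's procedure (he describes tallying by hand). It assumes the text is lower-cased, that everything except a-z is dropped, and that word breaks are ignored, as in a cryptogram sent in blocks.

```python
import string
from collections import Counter, defaultdict

def frequencies_and_contacts(text):
    """Tally each letter's frequency and its contacts (the letters it touches).

    Hypothetical helper: lower-cases the text, keeps only a-z, and ignores
    word breaks, so the last letter of a word touches the first of the next.
    """
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    freq = Counter(letters)
    # contacts[x][y] counts how often y stood next to x, on either side.
    contacts = defaultdict(Counter)
    for a, b in zip(letters, letters[1:]):
        contacts[a][b] += 1  # b followed a
        contacts[b][a] += 1  # a preceded b
    return freq, contacts

freq, contacts = frequencies_and_contacts("Attack at dawn")
for letter, count in freq.most_common():
    # letter, its frequency, and how many *different* letters it touches
    print(letter, count, len(contacts[letter]))
```

`len(contacts[x])` gives the "how many different ones" figure, and `contacts[x]` keeps the individual tallies for later use.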
“But whereas the relative frequencies may shift slightly, making, say, i more frequent
than a in a particular case, the letters generally do not stray very far from their home
areas in the frequency table. Thus, e, t, a, o, n, i, r, s, and h will normally be found in the
high-frequency group; d, l, u, c, and m in the medium-frequency group; p, f, y, w, g, b,
and v in the lows, and j, k, q, x, and z in the rare group. Furthermore, a sharp break in
frequency usually sets off the highs from the mediums; the lowest of the highs, h, is
normally 6 per cent, while the highest of the mediums, d, is only 4 per cent.” (100)
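Kahn's groups can be encoded directly, and the "sharp break" can be checked on a sample by comparing the 9th- and 10th-ranked letters, since his high group has nine members. This is a minimal sketch. The names `KAHN_GROUPS`, `group_of`, `percent_frequencies` and `high_medium_gap` are hypothetical, and a short sample will not necessarily reproduce the break.

```python
import string
from collections import Counter

# Kahn's four frequency groups for English plaintext, copied from the quotation.
KAHN_GROUPS = {
    "high": "etaonirsh",
    "medium": "dlucm",
    "low": "pfywgbv",
    "rare": "jkqxz",
}
_LOOKUP = {c: name for name, members in KAHN_GROUPS.items() for c in members}

def group_of(letter):
    """Name of the group a lowercase plaintext letter normally falls in."""
    return _LOOKUP[letter]

def percent_frequencies(text):
    """Per-cent frequency of each letter a-z in text (non-letters ignored)."""
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    if not letters:
        raise ValueError("text contains no letters")
    counts = Counter(letters)
    return {c: 100.0 * counts[c] / len(letters) for c in string.ascii_lowercase}

def high_medium_gap(percents):
    """Drop in percentage points between the 9th- and 10th-ranked letters.

    On ordinary English this boundary should separate Kahn's highs from his
    mediums (h at about 6 per cent, d at about 4).
    """
    ranked = sorted(percents.values(), reverse=True)
    return ranked[8] - ranked[9]
```

On a cryptogram, the same two functions run on the cipher letters show where the break falls before any plaintext values are known.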
“Every letter has a cluster of preferred associations that constitute its most distinguishing
characteristic. The cryptanalyst can spot these almost by eye if he sets up a contact chart
for the high-frequency cipher letters … In the chart, the letter being counted stands at the
left, with the other letters strung out in order of frequency in a line to the right. Each tally
above a letter in the line means that the letter in that line has preceded the subject letter in
one instance, while each tally below means that it has followed the subject letter.” (100)
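One row of such a chart can be produced as follows. This is a sketch under stated simplifications. `contact_chart` and `format_chart` are hypothetical names. The line to the right lists only the letters that actually touch the subject, ordered by their frequency in the whole text with ties broken alphabetically. A slash stands in for each tally mark.

```python
import string
from collections import Counter

def contact_chart(text, subject):
    """Data for one row of a contact chart in the layout Kahn describes.

    Returns (order, before, after):
      order  - letters touching `subject`, most frequent in the text first
      before - before[x]: times x preceded subject (tallies above the line)
      after  - after[x]: times x followed subject (tallies below the line)
    """
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    freq = Counter(letters)
    before, after = Counter(), Counter()
    for a, b in zip(letters, letters[1:]):
        if b == subject:
            before[a] += 1
        if a == subject:
            after[b] += 1
    touching = set(before) | set(after)
    order = sorted(touching, key=lambda c: (-freq[c], c))  # ties alphabetical
    return order, before, after

def format_chart(subject, order, before, after):
    """Render the row as three text lines: tallies, letters, tallies."""
    width = max([1] + [before[c] for c in order] + [after[c] for c in order])
    above = "  " + " ".join(("/" * before[c]).ljust(width) for c in order)
    line = subject + " " + " ".join(c.ljust(width) for c in order)
    below = "  " + " ".join(("/" * after[c]).ljust(width) for c in order)
    return "\n".join([above, line, below])

print(format_chart("e", *contact_chart("there the then", "e")))
```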
“In a chart like this, plaintext e is about as hard to recognize under its cipher masquerade
as a six-and-a-half-foot-tall man at a costume party. It is president of this republic of
letters because it leads all the rest in frequency, yet it is democratic enough to contact
more different letters more often than any other letter, including a goodly number in the
low-frequency bracket.” (101)
“Next most distinctive are the three high-frequency vowels, a, i, and o. Like rival
dowagers at a society ball, they avoid one another as much as possible. … Which is
which can often be ascertained by the fact that the plaintext digraph io is fairly frequent
while the other five combinations (oi, ia, ai, oa, ao) are fairly rare.” (101)
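Once three cipher letters are suspected of being a, i and o, the io rule can sort them out. The helper `guess_i_and_o` below is hypothetical. It takes the commonest ordered digraph among the three as io and assigns the remaining letter to a. The result is arbitrary if none of the six digraphs occurs, so it is only as good as the sample.

```python
import string
from collections import Counter
from itertools import permutations

def guess_i_and_o(text, candidates):
    """Assign plaintext a, i, o to three distinct cipher letters.

    Heuristic from Kahn's remark: of the six digraphs these vowels can form,
    only io is fairly frequent, so the commonest ordered pair is read as i
    followed by o, and the leftover letter is taken as a.
    """
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    digraphs = Counter(zip(letters, letters[1:]))
    first, second = max(permutations(candidates, 2), key=lambda p: digraphs[p])
    (third,) = set(candidates) - {first, second}
    return {first: "i", second: "o", third: "a"}

# "nation ratio action lion" enciphered by shifting each letter one place on.
print(guess_i_and_o("obujpo sbujp bdujpo mjpo", "bjp"))
```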
“What about consonants? The easiest to spot is plaintext n because four fifths of the
letters that precede it are vowels.” (101)
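The n test reduces to a single fraction. In the hypothetical `vowel_share_before` below, `vowels` defaults to plaintext a, e, i, o, u purely for illustration. In an actual cryptogram it would be the set of cipher letters already identified as vowels.

```python
import string

def vowel_share_before(text, subject, vowels="aeiou"):
    """Fraction of the letters preceding `subject` that are in `vowels`.

    Kahn's figure for plaintext n is about four fifths.
    """
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    preceders = [a for a, b in zip(letters, letters[1:]) if b == subject]
    if not preceders:
        return 0.0
    return sum(a in vowels for a in preceders) / len(preceders)

print(vowel_share_before("and in on ant gnat", "n"))
```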
“… plaintext h. The digraph he is one of the most common in English, while eh is very
rare; th is the most common of all, and ht is also fairly rare. … In telegraphic texts where
the is deleted, plaintext h can usually be spotted because—just the opposite of n—it
precedes vowels about ten times as often as it follows them.” (102)
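The h test compares the two sides of a letter's vowel contacts. The helper `precede_follow_ratio` is a hypothetical sketch, and the `vowels` default has the same caveat as in the n test. Kahn puts the ratio at about ten for h in telegraphic texts. He describes n as the opposite case, a letter that mostly follows vowels.

```python
import string

def precede_follow_ratio(text, subject, vowels="aeiou"):
    """Times `subject` stands before a vowel, divided by times it stands after one."""
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    pairs = list(zip(letters, letters[1:]))
    before_vowel = sum(1 for a, b in pairs if a == subject and b in vowels)
    after_vowel = sum(1 for a, b in pairs if b == subject and a in vowels)
    if after_vowel == 0:
        return float("inf") if before_vowel else 0.0
    return before_vowel / after_vowel

print(precede_follow_ratio("ship chat whom oh", "h"))
```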
“The only two high-frequency letters remaining to be identified are r and s. The basic
difference between them is that r, rather like a social climber, associates much more with
the vowels—dowagers a, i, and o as well as President e—than does s, while s, a
proletarian at heart, mingles with the consonants, the blue-collar laborers of the
alphabet.” (102)
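The r/s distinction can be scored by the share of each letter's contacts, on both sides, that are vowels. The names `vowel_contact_share` and `label_r_and_s` are hypothetical. The rule is simply "the more vowel-friendly letter is r", which is a rough reading of Kahn's description rather than a threshold he gives.

```python
import string

def vowel_contact_share(text, subject, vowels="aeiou"):
    """Fraction of `subject`'s contacts, on either side, that are vowels."""
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    neighbours = []
    for a, b in zip(letters, letters[1:]):
        if a == subject:
            neighbours.append(b)
        if b == subject:
            neighbours.append(a)
    if not neighbours:
        return 0.0
    return sum(n in vowels for n in neighbours) / len(neighbours)

def label_r_and_s(text, x, y, vowels="aeiou"):
    """Guess which of two cipher letters is r: the one keeping more vowel company."""
    if vowel_contact_share(text, x, vowels) >= vowel_contact_share(text, y, vowels):
        return {x: "r", y: "s"}
    return {x: "s", y: "r"}

print(label_r_and_s("first rest arise", "r", "s"))
```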
“Naturally, in shorter cryptograms, solutions do not run quite as smoothly as the longer
ones that allow the statistics of language enough play to become reliable. For these more
difficult problems, expert solvers offer novices two tips: (1) make contact charts: the
drudgery usually pays off in faster and more accurate identifications; (2) when stumped,
and no likely plaintext values are visible, try something and see where it leads; even if it
proves wrong, it has narrowed down the possibilities. No cryptogram was ever solved by
staring at it.” (104-105)
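"Try something and see where it leads" is easy to support in code. The sketch below assumes a simple (monoalphabetic) substitution, in which the key must be one-to-one. The helper `trial_decrypt` is hypothetical. It applies a partial trial key and shows unguessed letters as dashes, so a bad guess tends to surface as an impossible-looking fragment.

```python
import string

def trial_decrypt(ciphertext, trial_key, unknown="-"):
    """Apply a partial, trial key to a simple-substitution cryptogram.

    trial_key maps cipher letters to guessed plaintext letters; letters with
    no guess yet are shown as `unknown`.
    """
    guesses = list(trial_key.values())
    if len(set(guesses)) != len(guesses):
        raise ValueError("two cipher letters share one plaintext guess")
    out = []
    for c in ciphertext.lower():
        if c in string.ascii_lowercase:
            out.append(trial_key.get(c, unknown))
        else:
            out.append(c)  # keep spaces and punctuation as they are
    return "".join(out)

print(trial_decrypt("Uif dbu", {"u": "t", "i": "h", "f": "e"}))
```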
David Kahn, The Codebreakers: The Story of Secret Writing, Macmillan, New York, 1967.