Singular Value Decomposition of Cryptograms

advertisement
Singular Value Analysis of Cryptograms
Author(s): Cleve Moler and Donald Morrison
Source: The American Mathematical Monthly, Vol. 90, No. 2 (Feb., 1983), pp. 78-87
Published by: Mathematical Association of America
Stable URL: http://www.jstor.org/stable/2975804 .
Accessed: 14/03/2011 13:52
Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at .
http://www.jstor.org/page/info/about/policies/terms.jsp. JSTOR's Terms and Conditions of Use provides, in part, that unless
you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you
may use content in the JSTOR archive only for your personal, non-commercial use.
Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at .
http://www.jstor.org/action/showPublisher?publisherCode=maa. .
Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed
page of such transmission.
JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of
content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms
of scholarship. For more information about JSTOR, please contact support@jstor.org.
Mathematical Association of America is collaborating with JSTOR to digitize, preserve and extend access to
The American Mathematical Monthly.
http://www.jstor.org
78
CLEVEMOLERAND DONALD MORRISON
[February
newsletter culminated in the creation of our very successful, very well-received newsletter,
FOCUS. Its editor, MarciaP. Sward,calls Ed the "Father of FOCUS".
Let me conclude by emphasizingother service to the Association. Ed was elected chairmanof
the Texas Section,but he left for Michiganbefore he took office. He was a visiting lecturerfor the
MAA, and served a five-yearterm as editor of the MathematicalNotes Section of the MONTHLY.
As chairmanof the Committeeon Publicationssince 1971,he guided the unprecedentedgrowthof
our journals and our series of books and monographswith skill, determination,and enthusiasm.
In all his activities, Ed Beckenbachenlisted the cooperation of his colleagues by his skill at
negotiation, his unfailing courtesy and considerationtoward others, and his common sense and
good humor. But Ed's cooperativeand accommodatingspirit at the committee table completely
disappeared in another of his roles. On the tennis court Ed was a hard contender, a tough
adversarywho showed 'em no mercy. Captain of the tennis team at Rice University back in the
twenties, and later the coach of the team, he spanned six decades with his favorite sport. Even
recently Ed and his wife Alice competed in national tournamentsof "superseniors";in more
peaceful moments they indulged their lifelong hobby, tending their hillside acre of plants ranging
from apricots to orchids. Incidentally, Alice surely holds a record for faithful attendance at
meetings of mathematicians,having started at the age of eleven as daughter of a one-time
presidentof the Association.
Ed Beckenbachclearly served the mathematicalcommunityvery well. We are all indebted to
him for his preeminentleadership.Ed Beckenbachis indeed a most worthy recipientof the Award
for DistinguishedServiceto Mathematics.
*
*
*
Edwin F. Beckenbachdied on September5, 1982.
SINGULAR VALUE ANALYSIS OF CRYPTOGRAMS
CLEVE MOLER AND DONALD MORRISON*
Departmentof ComputerScience, Universityof New Mexico, Albuquerque,NM 87131
1. Singular Value Analysis and Cryptanalysis.The singular value decomposition is a matrix
factorization which can produce approximationsto large arrays. Cryptanalysisis the task of
breaking coded messages. In this paper, we present an unusual merger of the two in which the
singularvalue decompositionmay aid the cryptanalystin discoveringvowels and consonants in
messagescoded in certainvariationsof simple substitutionciphers.
Texts in many languages, including English, have the property that vowels are frequently
followed by consonants, and consonants are frequently followed by vowels. We say a text is a
Cleve B. Moler received his Ph.D. at Stanford under George Forsythe. He has taught both mathematics and
computer science at Stanford, the University of Michigan, and the University of New Mexico. He recently succeeded
Morrison as Chairman of the Department of Computer Science at New Mexico. His academic and research interests
include: numerical analysis, mathematical software, and scientific computing.
Don Morrison earned his Ph.D. in mathematics at the University of Wisconsin in 1950, under C. C. MacDuffee.
He taught mathematics at Tulane, 1950-55, and computer science at the University of New Mexico, 1967 to the
present. He was a staff member, division supervisor and department manager at Sandia Corporation, 1955-71. He
has one wife, three children and numerous grandchildren. His present research interests include cryptology, data
structures, and design of algorithms.
*This work was partially supported by a grant from the National Science Foundation.
1983]
SINGULARVALUEANALYSISOF CRYPTOGRAMS
79
"vowel-follows-consonant"text, or more briefly, a "vfc" text if the proportion of vowels
following vowels is less than the proportionof vowels following consonants. That is, if
numberof vowel-vowelpairs < numberof consonant-vowel.pairs
numberof vowels
numberof consonants
Some languagesproduce better vfc texts than others and some deviations by individualletters
can be expected. In English text, the letter h is often preceded by other consonants to form a
single sound, as in ch, gh, ph, sh, and, especiallyth. The letters 1,n, m, and r are often followed by
consonants. Nevertheless,Englishis a predominantlyvfc language.In Hawaiian,every consonant
is followed by a vowel, so there are no consonant-consonantpairs. In the romajitransliterationof
Japanese,the only consonant-consonantpairs are ch, sh, ts, and n, followed by a consonant and
several double consonants. Russian has many sounds which, in transliteration, are of the
consonant-consonantor vowel-vowel form, such as sh, ch, ts, ya, and ye, but in the Cyrillic
alphabet, these are represented as single letters. Thus, Russian is also a predominantly vfc
language.
2. The Cryptanalyst'sProblem. A simple substitutioncryptogramis a coded message in which
the individualletters have been replacedby a permutationof themselves.When faced with such a
message, a cryptanalystmight first count the occurrencesof individualletters in the message and
compare the frequencieswith the known frequenciesin typical uncoded text. However, a fairly
long message is requiredbefore this techniquehas much chance of success.
Another aid in breakingthe code is a partitioningof the alphabet of the cryptograminto two
subsets, representingthe vowels and the consonants.In order that such a partitionbe plausible,it
ought to satisfy the vfc rule (1). This is the main task which we wish to considerhere-and is what
we call the cryptanalyst'sproblem. A trial and error solution, which tries all possible partitions
until one satisfying the vfc rule is found, is clearly prohibitive.
Let n be the numberof lettersin the alphabet.For any text, the digramfrequencymatrixis the
n-by-n arrayA with aij = the number of occurrencesof the ith letter followed by the jth letter.
Blanks and punctuation,if present, are ignored and the first letter of the text is assumedto follow
the last letter. In general,the matrixis not symmetric,but for each i
Eaij = Eaj1 =f
I
i
wherefi = the numberof occurrencesof the i th letter.
For any proposed partitioning of the alphabet into vowels and consonants, two column
vectors, v and c, can be defined by
vi = 1 if the ith letteris a vowel, 0 otherwise,
Ci = 1 if the i th letter is a consonant, 0 otherwise.
Note that v + c is a vector with all l's and that the inner product vTc is zero. Also note that the
value of the quadraticform vTAvis the numberof vowel-vowelpairs in the text.
Using A, v and c, the vfc rule (1) can be stated
vTAv
cTAv
vTA(v + c)
cTA(v + c)
Cross-multiplyingand cancelling the common term, we obtain
(2)
(vTAv)(cTAc) - (vTAc)(cTAv) < 0.
The cryptanalyst'sproblem, then, is: Given A, find a partitioningv and c so that (2) holds.
3. The SingularValue Decomposition. The singularvalue decomposition,or SVD, is a matrix
factorizationwhich numericalanalystsuse in a wide varietyof ways. Althoughits primaryuses are
80
CLEVEMOLERAND DONALD MORRISON
[February
in the analysis of systems of simultaneouslinear equations and in the computationof pseudoinverses, we will use it here to obtain "simple"approximationsto the digramfrequencymatrix. For
this purpose,we express the SVD as a sum of rank one matricesof the form
(3)
A
=
a1xXyT + a2x2yT + *-
+ anXnYnT
where
a1
xiTx=
a22
*
>
an > 0
uij (the Kroneckerdelta)
Yii
ij
The coefficients aj are known as the singularvalues and xj, and yj are the left and right singular
vectors, respectively.They can also be characterizedin terms of the solutions to the symmetric
eigenvalueproblem:
tT
o}
y}
It is not hard to see that the rank of a matrixis the numberof nonzero singularvalues. (In fact,
this is a particularlyuseful way to define rank.)
There is a fast, reliable algorithmfor computing the SVD [3], [6]. For a 26-by-26 matrix, the
computation of the singular values and correspondingleft and right vectors takes only a few
seconds on a medium speed modermcomputer.
The normalizationof the vectors xj and yj insures that all of the rank one matricesxjyT have
the same norm (that is, the sum of the squares of the elements). Consequently, the numerical
importanceof each term in the sum (3) can be measuredby the size of the coefficient aJ..If the a
decrease fairly rapidly as j increases, then an accurate approximationto A can be obtained by
truncatingthe series after only a few terms.Truncatingthe series after k nonzero terms providesa
rank k approximationto A. Such approximationsare related to those obtained by factor analysis
and have a wide varietyof applications.One unusualapplicationis to digital image processing[1].
4. Rank One Approximation.A rough approximationto a digram frequency matrix can be
obtained by taking only the first term in its SVD:
A
~A1 = a1xXlyT.
Let e be the vector of all l's and let f be the vector withf
letter in the text. Then Ae = ATe= f and so
a1 (yTe)xl
a-(-xaTe)x
e
=
fy
the numberof occurrencesof the i th
.
Consequently,if we were to assume that the digram frequency matrix were only rank one, we
would conclude that it was symmetric and that each row and column was proportional to the
frequencyvectorf.
Of course, in practice,A is not rank one. Nevertheless,the first left and right singularvectors
tend to be approximatelyequal and reflect the frequenciesof the letters in the text.
5. Rank Two Approximation.The rank two approximationobtained from the first two terms
of the SVD is the simplest approximationwhich takes into account the correlationbetween pairs
of letters in the text. The second left and right singularvectors contain the key to the solution of
the ciyptanalyst'sproblem.
Assume that A is approximatedby a matrix of rank two, so that
(4)
A
A2 = a xTyf + a2x2yT.
We propose to use the signs of the components of x2 and Y2 to partition the alphabet as follows:
1983]
SINGULARVALUEANALYSISOF CRYPTOGRAMS
{
i
I
0
0
()c
(
1
0
81
ifx2 > O andy,2 < O
otherwise,
f Xi2< 0 andyi2 > 0
otherwise,
if sign (Xi2) = sign (Yi2)
=
O otherwise.
The third category-the "neuter"letters-are the ones that cannot be classified as either vowels
or consonants. It turns out in practice that few letters fall into this category. In English text, for
example, the letter h is usually neuter. It might be possible to consider a finer partitioninvolving
"left vowel, right consonant" and so on, but we have not pursuedthis idea.
Using these definitions,we obtain a solution to the cryptanalyst'sproblem as follows.
THEOREM. Let A = A2 be a nonnegativerank 2 matrixwith the SVD expansionin (4). Let v and
c be definedby (5). Thenthe vfc rule,
(6)
is satisfied.
D = (vTAv)(cTAc) - (vTAc)(cTAv) < 0
Proof. Since A is a nonnegativematrix,it follows from the Perron-Frobeniustheorem [4] that
xl and Yi have nonnegativecomponents. Placing (4) in (2) and expanding produces eight terms.
The two terms involving only subscript 1 cancel, so do the two terms involving only subscript2:
D
(V"ZIY1v cTz2y2Cc + vTz2y2v cITzyIc
=1F2
T
-vZIy1C
T
T
CTZ2Y22V -
T
T
VTZ2Y2C CTZIYl V)
Of all the different inner products appearing in this expression, only two-namely, y2Tvand
CTZ2-are negative.Consequently,all four termsin the parenthesesare negative and D is negative.
6. Effects of Encipherment.When a message M is encoded by a simple substitutioncipher c,
(wherec is a permutationof the integers 1 to n) each occurrenceof the ith letter u, is replacedby
the c(i)th letter, uc(z). The resultingcryptogramis called MC.If A is the digramfrequencymatrix
of M, and C is the permutationmatrix C = (8c(i)j) which has in its ith row, the c(i)th row of the
identity matrix, then it is not difficult to show that the digram frequencymatrix of MCis CACT.
This matrix has in row c(i), column c(j), the frequency of the digram uiu1 in M, which it
represents.It follows that the singularvalues of CACT are the same as those of A, and thejth left
and right singularvectors are, respectively,Cxj and Cyj. These have the same coefficients as xj
and y1, but they are permutedby C; the coefficients appearingin row i in xJ ad yJ appearin row
c(i) in Cxj and Cyj.If the scheme describedabove, applied to A, classifies the letter ui as a vowel
in M, then applied to CACT, it will classify uc(,), the encoding of ui, as a vowel in Mc.
Simple substitutionciphersare, of course,very simple ciphers,and no self-respectingcryptanalyst regards them as a challenge. A more sophisticatedcipher is the k-alphabeticcipher (k is a
positive integer). In this cipher,k permutations,cl, ca.. ., Ckare used in cyclic fashion to encode
the letters in M to produce the cryptogram M,. A letter is said to be in position p, in M
(1 < p < k), if it is the mth letter of M and m is congruentp modulo k. If the letter ui occurs in
positionp, it is encoded as uc(i). Thus, the encoding of a letter depends,not only on what letter it
is, but also on its position in M.
The digramfrequencymatrixof a cryptogramencoded by a k-alphabetcipheris of little use in
decoding it, since the various occurrencesof a digramin MCdo not representthe same digramin
M, if they do not occur in the same position. Cryptanalystshave some clever ways to deduce the
probablevalue of the cycle length k. See, for example, Gaines [2] or Sinkov [5]. Armed with this
information,a cryptanalystcan calculate k digram frequencymatrices,Ap = (aijp) where ai jp is
CLEVE MOLER AND DONALD MORRISON
82
[February
CD CD CD CD CD CD CD CD CD CD CD CD CD CD CD CD CD CD CD CD CD CDCODDC
CDl rO CD CD CD rO rO
CDl -,O
r.CCD
-
CO CDCD
>-
r
CD CD CD 1r-C)
OCO
CD CD CD CD CD CD CD CD CD CD OCO
O
CD
CD CD rO CD rO
Dn
CCD
CD CD
xooooooooooooooooooooooooo
O CD r-
z
CM OCDLO CO CD CD
O,
0 or:00000
CD CD CD CD
z
4
D
D
q
C
CD
o
o
CO
O
CD CD
CO
O
O
CD r-Cr
D
CD CD CD CD CD CD CD CD CD CD CD CD CD CD
CD CD CD CD CD
CD CD
000000
00c\Jnj00000000000
LC
o
D
D
D
D
C)CCDCD\
D
D
D O CD o
D
o
tDk
-0
C
CD
nC
C
C
C
c0 oC
xoceooooooooooo C\i.
o.
~
o
, lo
L C rO CD CD CD
CD CD
CD -
O
CD CD CD C\
CD rO CD
L" rO CD CD CD) On
LIJ C
LO I-,t I-,
CD CD CD O
CD CD CD CD rO CD CD
CD
C
C\
O
O
CD CD CD CD
CD CD CD CD CD r
CY) CDJ r-
CY) CLO
CO CD m r
CD CD CD
O
-
m
CD CDC)
C\l
O CD CDI,
CD
0n CD) 0O CD
O
CD CD CD
CD CD CO CD
CDO
CY) CD CD CD
CD
CD CD CD CD CD CD
CD
CD
CD
C\J
C
C
O
O
ctCD
I
C
D
CD004 CD
"\
-4
CDC.D
00O
C
Cl
.
0
0
D O
M LO -,t CD CD CD
C\l
LU bI_ OCD-T)
Lr-00
O
C
D
O
-
C
)OCD CD C\
CDO
0D 0 D 00 CO CD CD
D 0DO
C\l C\
C\l CD rO CD CD CO md
M
1 r=
D
3
CD
CD C\l CD
>>
1983]
SINGULARVALUEANALYSISOF CRYPTOGRAMS
83
the frequencywith which the digram uiuj occurs in positions p, p + 1 modulo k. All of these
occurrencesare encoded into the same digramuc (i)uc +(i) in the cryptogramMc. By an argument
similar to that given above for the simple substitution,if Ap is the digram frequencymatrix for
is the corresponding digram
digrams occurring in position p, p + 1 in M, then CpApCp/+I
frequency matrix for the cryptogramMc. It has the same coefficients as Ap4,but its rows are
+. Its singularvalues are the same as those of Ap, and its
permutedby Cp,and its columns by Cp+
j th left and right singularvectors are, respectively,Cpxpj and Cp+ IYpj,where xpj and ypjare those
of Ap. If the classification scheme described above, applied to xpj and y(p- 1)j classifies u, as a
vowel, then it will also, applied to Cpxpjand Cpy(p+1)j, classify its encoding uc (j) as a vowel.
If this is done for eachp, then the cryptanalysthas, for eachp, a classificationof the letters in
position p of the cryptogram,into vowels and consonants. The classificationis, of course, only as
reliable as it would have been if it had been applied to the original message M, or, more
particularly,to those subsequences of M consisting of the digrams occurring in a particular
position p, p + 1. If these are representativesamples of the digramsoccurringin the language of
M, then the classificationis as reliableas it is when applied to equally representativeplaintexts, as
we have done in sections 7 and 8.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
81.7256
53.5189
45.1604
31.1073
19.9258
18.8196
16.8777
12.1174
10.1647
9.4667
7.6957
6.2491
4.7882
2.7120
2.1048
1.7556
1.6395
0.9360
0.8129
0.4996
0.3232
0.2069
0.0148
0.0000
0.0
0.0
FIG. 2. The singular values.
7. ExperimentalResults. Fig. 1 is the digram frequency matrix for Lincoln's Gettysburg
Address, a text of 1,148 characters.Notice, for example, that "th," with 47 occurrences,is the
most frequentpair, that "q" occurs only once, and that "j," "x" and "z" do not occur at all.
Fig. 2 gives the singularvalues of the matrixin Fig. 1. Since threeletters are missing, the matrix
has rank 23 at most, and 3 of the singularvalues should be zero. The subroutinefinds two exact
zero values and one value, the size of roundoff erroron the computer.(Exact zeros are printed as
0.0, while numbersless than 10' are printed as 0.00000.)
84
[February
CLEVEMOLERAND DONALD MORRISON
It certainlycannot be claimed that the singularvalues decrease rapidly. In fact, the rank two
approximationonly vaguelyresemblesthe originalmatrix.Nevertheless,useful informationcan be
obtained from the first two pairs of singularvectors.
Fig. 3 shows the first singularvectors xl and Yi, togetherwith the frequencyvectorf. It can be
seen that the components of the two singular vectors are roughly equal, and are roughly
proportional to the components of the frequency vector. Thus, even though the matrix is not
particularlywell approximatedby the first term of its SVD, the singularvectors still retain the
propertiespredictedby the rank one theory.
A
B
C
D
E
F
G
H
I
J
K
L
M
N
0
P
Q
R
S
T
U
V
W
0.3275
0.0470
0.1200
0.2011
0.4394
0.0875
0.0966
0.3481
0.1830
0.0000
0.0099
0.1203
0.0607
0.2165
0.2387
0.0522
0.0007
0.2954
0.1683
0.4453
0.0532
0.1167
0.1210
0.3221
0.0441
0.1135
0.2259
0.4517
0.1065
0.0799
0.3378
0.2339
0.0000
0.0097
0.1218
0.0468
0.2435
0.2563
0.0564
0.0054
0.2493
0.1391
0.4367
0.0597
0.0817
0.1044
0.0
X
0.0
Y
0.0339
0.0344
Z
0.0
0.0
102.
14.
31.
58.
165.
27.
28.
80.
68.
0.
3.
42.
13.
77.
93.
15.
1.
79.
44.
126.
21.
24.
28.
0.
X
10.
Y
0.
Z
FIG. 3. The first right and left singular vectors and the letter
frequency vector.
A
B
C
D
E
F
G
H
I
J
K
L
M
N
0
P
Q
R
S
T
U
V
W
-0.5097
0.0412
0.0719
0.1436
-0.3306
-0.0006
0.0521
0.3877
-0.2070
0.0000
0.0033
0.0298
0.0634
-0.0649
-0.3891
0.0385
-0.0010
0.1692
0.0801
0.3785
-0.0860
0.1884
0.1559
0.0
-0.0045
0.0
0.1574
-0.0386
-0.0788
-0.2163
0.5305
-0.0198
-0.0623
0.3293
0.1791
-0.0000
-0.0049
-0.0853
-0.0613
-0.3983
0.1070
-0.0535
-0.0062
-0.3402
-0.1462
-0.3878
-0.0516
-0.1405
-0.0112
0.0
-0.0215
0.0
FIG. 4. The second right and left singular
vectors. The sign patterns identify vowels
and consonants.
Fig. 4 shows the second singularvectors x2 and Y2. The alternatingsign patternspredictedby
the rank two theory are clearlyevident. Figs. 5 and 6 give a graphicalsummaryof the quantitative
informationin Fig. 4 by plotting each letter at a point in the two-dimensionalplane determined
by its components in x2 and Y2. Thus, "a" is plotted at coordinates (-0.5097,0.1574), "b" at
coordinates (0.0412, -0.0386), and so on. The more frequent letters are in Fig. 5 and the less
frequentlettersin Fig. 6. The box in both figureshas cornersat (? 0.1, ? 0.1).
Since they fall in the same quadrant(the second), the letters "a,""e," "i" and "o" should all
be classified as either vowels or consonants, and we have, of course, chosen to call them vowels.
The letters "h," " n," " u" and "y" must be called neuterbecause the correspondingsigns in the
two vectors agree. The letter "q" occurred only once. The classification of q as neuter is
neverthelessinteresting. Its one occurrenceis in the word "equal." One might expect it to be
classified as a consonant, since it occurs between two vowels. Note, however, that the algorithm
does not recognizeu as a vowel. It is classified as neuter.This is, no doubt, attributableto the very
high frequencyof the digramou, which accounts for 7 of the 21 occurrencesof u. The letter "h"
85
SINGULARVALUEANALYSISOF CRYPTOGRAMS
1983]
E
H
A
0
V
~~~
D
R
N
T
FIG. 5. The more frequent letters, plotted with coordinates from Fig. 4.
clearly shows a tendency to be followed by a vowel, and to be precededby a consonant. The letter
"n" shows a weak tendency to be followed by a consonant and a strong tendency to be preceded
by a vowel. The other three neuterletters occur infrequently.The letter "ij,I"x," and "z" are seen
not to occur at all. The remaining14 letters are classified as consonants.
If a simple substitutioncryptogramwere made from text such as the GettysburgAddress, the
digrammatrixA would be replacedby PAPT for some unknownpermutationmatrixP. As shown
in Section 6, the singularvectors of the transformedmatrix would classify the encoding of each
letter in the same way as x2 and Y2 classified the letters of the plaintext.
8. Other Languages.The same experiment was performed on texts of approximately one
thousand characters each, written in five other languages selected for their diversity. The
classificationsof letters resultingfrom these tests are tabulatedbelow.
Language
Vowels
Consonants
Hawaiian
AEIOU
HKLMNPW
Japanese
German
Spanish
Finnish
AEIOU
AEFOPU
ABO
AEIOUY
BDGHKMNRSTWYZ
BHIKLNRV
CDFGLMNQRSXYZ
DHJKLMNPRSTV
Neuter
Absent
CDGJMSTWZ
BHIJPTUV
B
BCDFGJQ
RSTVXYZ
CFJLPQVX
QXY
KW
CFGQWXZ
86
CLEVEMOLERAND DONALD MORRISON
[February
QK
GM
L
C
FIG. 6. The less frequent letters. The scale is 5 times that of Fig. 5.
None of the discrepanciesis very surprising.Severalof them are associatedwith letters which
occurredso infrequentlythat no significancecan be attachedto their classification.In the Finnish
example,B occurredonly once, at the beginningof a word of foreign origin. It followed a final N
in the precedingword and preceded an E. In the German text, P occurredonly once, and that
occurrencewas in the very GermanictrigramSPR. Occurringonly between two consonants, it is
not surprisingthat the analysisclassifiedit as a vowel. The most common neighborof F, on both
sides, in the selected sample, is R. The classification of I is confused because of the very high
frequencyof the diagramsIE and El. More generally,the Germanlanguage does not share the
aversionof the other languagesto consecutiveconsonants,and consequentlymany Germanletters
fall into the neuter category. The classification of Y as a vowel in the Finnish sample is not
surprising;Y has a value in Finnish that is hardly distinguishableto a non-Finnishear, from that
of U. The neuterclassificationof I and U in the Spanishexampleis clearlyattributableto the high
frequencyof vowel-voweldigramsin which U or I is the first letter, having a value equivalentto
W or Y in English.The perfect performanceof the algorithmin Japaneseand Hawaiianis clearly
the result of their rigorouslyobserved exclusion of consonant-consonantdigrams. Most of the
JapaneseHiraganacharactersare transliteratedas a consonant-voweldigram.
9. Conclusions.The second singularvectors in the singularvalue decompositionof the digram
frequencymatrixprovide the cryptanalystwith a helpful and surprisinglyreliableway to classify
the letters in a cryptogram as vowels or consonants, if the encoding algorithm is simple
WHAT IS THE GEOMETRYOF A SURFACE?
1983]
87
substitutionor k-alphabeticsubstitution,with k known, and if the text if vfc text. Texts writtenin
many naturallanguages,with certainexceptions,tend to be vfc texts. Near-exceptionsare German
(and, presumably,other germaniclanguages)in which the vfc characteris somewhat diminished
by the frequencyof consonant-consonantdigrams,and Spanish(and, presumably,other romance
languages)in which certainvowel-vowelpairs are frequent.Despite these deviations from the vfc
rule, the second singularvectors classify correctlymost of the letterswhich occur with a frequency
high enough to be statisticallysignificant.
A computer program which uses the SVD as the starting point in an automated, heuristic
approachto solving cryptogramsis describedby Schatz [7].
References
1. H. C. Andrews and C. L. Patterson, Outer product expansions and their uses in digital image processing, this
82 (1975) 1-12.
2. H. F. Gaines, Cryptanalysis, Dover, New York, 1956.
3. G. E. Forsythe, M. A. Malcolm, and C. B. Moler, Computer Methods for Mathematical Computations,
Prentice-Hall, Englewood Cliffs, NJ, 1977.
4. Richard S. Varga, Matrix Iterative Analysis, Prentice-Hall, Englewood Cliffs, NJ, 1962.
5. A. Sinkov, Elementary Cryptanalysis, A Mathematical Approach, New Mathematical Library, vol. 22,
Mathematical Association of America, Washington, D.C., 1968.
6. J. J. Dongarra, J. R. Bunch, C. B. Moler, and G. W. Stewart, LINPACK Users' Guide, Society for Industrial
and Applied Mathematics, Philadelphia, PA, 1979.
7. Bruce R. Schatz, Automated analysis of cryptograms, Cryptologia, 1 (1977) 116-142.
MONTHLY,
WHAT IS THE GEOMETRY OF A SURFACE?
ROGER FENN
School of Mathematicsand Physical Sciences, Universityof Sussex, Falmer, BrightonBNJ 9QH, England
1. In this article we shall assume the definition of a geometry proposed by Klein in his
Erlangenprogram, that is, a geometry is a group of transformationsof space together with the
propositionsleft invariantby this group. The space in question will be the two-dimensionalplane.
By a surfacewe shall mean a closed compact orientablesurfaceof genus y. That is a spherewith y
handles attached.The fact that two such surfacesare homeomorphicif and only if their generaare
equal was probably known to Riemann. A modern proof can be found in Massey's book [5,
Chapter 1].
The exact result which will be proved is the following:
THEOREM1. Let y > 1. Then there is a groupof hyperbolictranslationsof the hyperbolicplane
such that the space of orbitsunderthese translationsis homeomorphicto a surfaceof genus -y.
Moreover,thereis a compactpolygonin theplane whosetranslatesunderthe groupare distinctfor
distinct group elementsand whichform a networkof nonoverlappingpolygons covering the whole
hyperbolicplane.
Our aim is to prove the above by means as elementaryas possible, in particularwithout using
deep results from the theory of functions.
In order to illustrate the general result, we consider firstly the case where -y= 1. Here the
Roger Fenn is a Lecturer in Mathematics at the University of Sussex, England. He received his Ph.D. in 1968
under the direction of John Reeve and is presently writing a book on Geometric Topology. He believes people
should be at the same time renaissance polymaths and twentieth century specialists insofar as this is possible. He is
married, with three children, and enjoys playing chess, singing madrigals, and walking.
Download