
Karen Spärck Jones (1935-2007)
Professor (emeritus) of Computers and Information
University of Cambridge.
Yorick Wilks
Karen Spärck Jones contributed significantly to two
separate fields, Information Retrieval (IR) and Natural
Language Processing (NLP), and in her later years was
concerned with their relationship within general schemes of
representation in AI. She died on 4th April 2007 after a
recurrence of cancer, and was working until a week before her death.
Her major and lasting contributions will almost certainly be
her original PhD thesis and the inverse document frequency
(idf) measure of the relevance of terms (1972): the notion
that a document is relevant not only because key terms are
frequent in it, but because those terms are not frequent in
other, non-relevant, documents, a notion that is now part of
the basics of IR.
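The idf idea can be made concrete with a small sketch. A common modern statement of the weight is idf(t) = log(N / n_t), where N is the number of documents in the collection and n_t is the number of documents containing term t; the 1972 paper's own formulation may differ in detail (for example in the base of the logarithm), so this is an illustration rather than her exact definition. The function names, the naive tokenisation and the toy corpus below are all hypothetical. The scorer also assumes the tf·idf combination (term frequency multiplied by idf) that later became standard, which this article does not itself describe.

```python
import math
from collections import Counter


def idf_weights(documents):
    """Map each term to idf(t) = log(N / n_t).

    N is the number of documents; n_t is the number of documents that
    contain term t at least once. Tokenisation is a naive lowercase split,
    chosen only for illustration.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        # count each term once per document, however often it occurs there
        doc_freq.update(set(doc.lower().split()))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf_score(query, document, weights):
    """Hypothetical scorer: sum of (term frequency in document x idf) over query terms."""
    tf = Counter(document.lower().split())
    return sum(tf[term] * weights.get(term, 0.0) for term in query.lower().split())


corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the thesaurus groups synonyms",
]
weights = idf_weights(corpus)
print(weights["the"])        # 0.0: occurs in every document, so it carries no weight
print(weights["thesaurus"])  # log(3), about 1.0986: occurs in only one document
for doc in corpus:
    print(round(tf_idf_score("the cat", doc, weights), 3), doc)
```

For the query "the cat", the word "the" contributes nothing even though it is the most frequent word in the first two documents, while "cat" contributes because it is absent from the third document. This is the effect the paragraph describes: a term discriminates only insofar as it is rare elsewhere in the collection.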
Before going to the Computing Laboratory in 1968, she
wrote her thesis “Synonymy and Semantic Classification”
(1964) at the Cambridge Language Research Unit (CLRU),
run by Margaret Masterman, and under the supervision of
Masterman’s husband, the philosopher Richard
Braithwaite. This work was far ahead of its time (see Wilks
and Tait, 2005) but was not published until twenty years
later in the Edinburgh University AI series (1986), and she
had to be persuaded then that it was still relevant. It was in
fact the first application of statistical clustering methods to
lexical data (in her case the whole of Roget’s Thesaurus
on punched cards) and an ambitious attempt to create
some notion of primitive concepts for machine translation
on an empirical basis. It can now be seen as the
ancestor of a range of empirical semantics research, from
the semi-synonymous rows of terms (synsets) in WordNet
to much later work on statistical clustering to determine
semantic relationships. The historian in her produced an
extraordinary appendix to the thesis on artificial languages
for coding meaning. The algorithms she used were those of
the Theory of Clumps, which her husband Roger Needham had
developed and used in his own thesis work on automatic
classification. She used them again when she moved to the
University Computer Laboratory to begin work on IR,
since its then Director would not allow work explicitly on
AI or NLP, whereas he deemed IR respectable and
scientific.
Karen was born in 1935 in Huddersfield, Yorkshire, of
English and Norwegian parents. She studied history at
Cambridge, but moved to philosophy (then called Moral
Sciences) in her last year, so that when, after a brief spell
of school teaching, she accepted Margaret Masterman’s
invitation to join the Cambridge Language Research Unit,
it was a philosophy doctorate she started. Masterman
(2005) remained a major inspiration to her and is, along
with Roger Needham, thanked at the end of
Karen’s acceptance speech on receiving the ACL Lifetime
Achievement Award (2005); that speech remains the best
overview of the many interleaved themes in her work.
Her first published conference paper (1958), with
Masterman and Needham, is called “The analogy between
mechanical translation and library retrieval”, a title that
proved prescient for her whole career. At that time it referred to the use
of thesauri to resolve meaning problems in the two
technologies, but it was a link that preoccupied her all her
life and to which she returned with her “Information
Retrieval and Artificial Intelligence” (1999) where she
argued that AI in general, and NLP in particular, should
make more use of the statistical methodology of IR.
In 1968 the need for more serious computer facilities took
her out of CLRU and into the University Computing
Laboratory. By then she had been a Research Fellow of
Newnham College for three years and then a Royal Society
Fellow, and it was with this fellowship that she began her
new career in IR, a subject on which she became a world
authority. Eventually, Needham became Director of the
Laboratory and she was able to revisit her early interests
in NLP, taking on students and producing major work on
language front-ends to databases, automatic summarisation,
content retrieval from video, evaluation methods, and
belief revision.
Academic promotion was slow in coming: most of her
career was as an Assistant Director of Research on soft
money and it was only in 1999 that she was awarded a
personal professorship. Meanwhile she had taken on a
wider role: she managed much of the Alvey Research
Programme (from 1985) in the UK, she was an outstanding
President of the ACL (1994), took leading roles in the US
DARPA/NIST evaluation projects (1992), and later was on
the Advisory Committee for the DARPA TIDES Program.
She gained many later honours, some of which she did not
live to receive (though she had recorded acceptance
speeches): Fellowships of the American and European AI
Societies, the Fellowship of the British Academy, the ACL
Lifetime Achievement Award, the Lovelace Medal of the
British Computer Society, the SIGIR Salton Award, the
American Society for Information Science and
Technology’s Award of Merit, and the ACM-AAAI Allen
Newell Award.
In retirement she was as active as ever, returning again to
issues of representation, to her early interest in semantic
primitives (the subject of the last publication on her website,
2007), but always tempered by her powerful slogan “Words
stand only for themselves”. She remained finely balanced
on the issue of whether or not NLP can help IR, conscious
that most claimed non-statistical advantages can later be
reproduced by statistical means. And yet she wanted
NLP to matter. Although she had attributed the statistical
influence on AI to IR, she knew well that it was above all
Jelinek’s Machine Translation research at IBM that had
driven NLP to take up statistical methods. She nonetheless
remained sceptical that the tasks of machine understanding
could all be seen as “recovery processes”, in the way the answer
recovers the question, the document recovers the original
query, and the transcription recovers the speech signal. She
asked (2005): can we really see machine translation of
Shakespeare into Spanish as recovering his hidden Spanish
within the English? She produced a stimulating late paper
on how the Semantic Web movement faces up to these
questions. She never forgot that Masterman had been a
student of Wittgenstein, so that she herself was only one step
removed from him, nor how close her slogan above was to his
demand to look not for the meaning but for the use.
She also campaigned hard for more women to enter
computing, and was conscious that she, like Masterman
before her, had a husband with a more powerful formal
role; and we can now examine, in both cases, on which side
the more creative achievements lay. She was, with
Needham, an accomplished sailor and they built their house
themselves; she made wonderful things from objets
trouvés.
Masterman, M., Needham, R. M. and K. Spärck Jones
(1959) The analogy between mechanical translation and
library retrieval, In Proc. International Conference on
Scientific Information (1958), National Academy of
Sciences - National Research Council, Washington, D.C.
Masterman, M. (2005) Language, Cohesion and Form, (Ed.
Y. Wilks, with commentaries by Y. Wilks and K. Spärck
Jones), Cambridge University Press: Cambridge.
Spärck Jones, K. (1964/1986) Synonymy and Semantic
Classification, Ph.D. thesis, University of Cambridge;
reprinted, Edinburgh University Press AI series (Eds. S.
Michaelson and Y. Wilks): Edinburgh.
Spärck Jones, K. (1972) A statistical interpretation of term
specificity and its application in retrieval, Journal of
Documentation, 28.
Spärck Jones, K. (1999) Information retrieval and artificial
intelligence, Artificial Intelligence, 114.
Spärck Jones, K. (2007) Semantic primitives: the tip of the
iceberg, In Words and intelligence: Part II: Essays in
Honour of Yorick Wilks, (Eds. K. Ahmad, C. Brewster and
M. Stevenson), Springer: Berlin.
Wilks, Y. and J. Tait (2005) A Retrospective View of
Synonymy and Semantic Classification, In
Charting a New Course: Natural Language Processing and
Information Retrieval, Essays in Honour of Karen Spärck
Jones, (Ed. J. Tait) Springer: Berlin.