I. Introduction

In the history of English lexicography, one visual image dominates all others: James Murray, the editor of the Oxford English Dictionary, standing in his Oxford ‘Scriptorium’ in his academic robe, holding a book in one hand as he peers carefully at a small piece of paper. Presumably, this piece of paper is one of the lexicographic slips (note cards that contain quotations of a particular word), and Murray is engaged in the act of pondering the precise shade of meaning of a specific word. The remarkable aspect of this picture is not, however, our glimpse into the idealized view of a lexicographer ‘defining’ a word, but rather the background to this image. Murray is standing in a room surrounded by pigeonholes overflowing with pieces of paper; these papers, known as slips, formed the core source material of the Oxford English Dictionary and were the result of a massive distributed reading effort in which volunteer readers were given lists of books to peruse and asked to note interesting and unusual usages of words. (Cite actual letter here). These volunteer readers then returned their papers to Murray who, along with a small group of subeditors, would sit perched at the high table in the middle of this image and compose a definition from the raw materials before him.

Figure 1: James Murray in His Scriptorium

In this single image, we see the formation of three essential components of an idealized vision of the lexicographer’s task: that lexicographers should begin by consulting words in context rather than other dictionaries; a visual representation of the massive task of actually gathering, organizing and accessing the raw material for a dictionary; and the implicit notion that an intrepid reader or scholar could return to these source materials later to revise or refine definitions in the dictionary. Of course, the actual practice of lexicography rarely, if ever, conforms to this idealized vision.
There are at least three practical obstacles to this approach, all associated with managing and assimilating data. The image of Murray in his study examining a single specific paper drawn from the vast nest of pigeonholes behind him must give at least some pause to academics who can’t find an article offprint or a student paper from the previous week in the various stacks in their office. Indeed, when Murray took on the editorship of the Oxford English Dictionary, the slips were in such disarray that the decision was made to abandon previous efforts rather than try to re-sort and re-catalog the slips already in their possession. Further, all of the labor involved in creating the slips and a corresponding system for storage and access is only the first step in the lexicographer’s task: order must be brought to the chaos; the slips must be organized to correspond to word senses; the lexicographer must decide whether to order the information chronologically, semantically, etc.; and the actual writing remains to be done. Finally, the slip archive itself is more of an ideal than a practical reality; Murray himself consulted other dictionaries for the OED, and practical constraints of time, cost and labor compel most lexicographers to work with existing dictionaries as they try to revise, refine, or build new dictionaries.

The difference between this idealized vision and the actual practice of lexicography is amply illustrated by the fact that there is no equivalent image in the history of the lexicography of Ancient Greek to match the famous image of Murray. Indeed, an image of Henry Liddell from the 1875 issue of the British society magazine Vanity Fair offers only a caricature of an Oxford don in his robes with no books in sight and a caption that mentions only his college, Christ Church, while an image from a later memoir shows Henry Liddell simply reading an unidentified book at his desk.
In fact, the best equivalent image to reflect the intellectual shifts of Greek lexicography is not that of a single man in a room, but rather that of a stack of dictionaries, each pointing to the one that follows it. (BRUCE EXTRACT) This line of dictionaries begins with the Thesaurus Graecae Linguae of Stephanus in 1572 and runs in unbroken succession to the dictionaries most commonly in use today: the Liddell-Scott-Jones Greek lexicon (originally published in 1843, revised some nine times until 1940, and further augmented with three supplemental volumes, the most recent in 1994), the Italian Vocabolario della lingua greca (GI), the Spanish Diccionario griego-español (DGE) and the German Lexikon des frühgriechischen Epos (LFgrE), now in progress. In this progression, citations of actual passages where words appear have become an increasingly important and distinctive component of lexicon entries. In the work of Stephanus, where words were grouped by ‘family resemblance’, brief phrases were given as examples, without line or chapter references (although authors, and sometimes works, were cited). The later, alphabetic, editions of the Thesaurus (Valpy and Barker 1816-28, Hase 1831-65) introduced referenced citations, but these were very brief: often just one-word quotations from the early grammarians and lexicographers, rather than illustrations of usage. The first (modern) alphabetic Greek dictionary, and the first dictionary from Greek to a modern language, Schneider (1797-8), used more extensive citations, mostly from early epic, as examples. These provided the core source-material for subsequent Greek lexica: Passow (1831) drew on them for his citations, and Liddell and Scott (1843) in turn used his material as the basis for their own.
In their seven subsequent editions, Liddell and Scott steadily increased the number and range of quotations, drawing on the alphabetic Thesaurus of Valpy and Barker, and then on a variety of later sources, as the discoveries and textual editions of the nineteenth century unearthed new attestations, until the accretion of new material made a complete reworking necessary.1 In 1904, a proposal was made to the British Academy for the creation of a new Thesaurus, in order to organise the newly-discovered material.2 However, in a memorable phrase, which has frequently been cited, Hermann Diels (1905: 693) compared the task of collating the citations from the full corpus of ancient Greek literature to ‘in dieses Chaos den Nus hineinzubringen’,3 and the task was eventually abandoned as unfeasible, in favour of a further revision of Liddell and Scott’s lexicon, which was published in ten parts from 1925 to 1940 as its ninth edition, LSJ. This great work has proved to be the foundation of subsequent Greek lexicography, but it may perhaps be described as a magnificent failure, because so much new material has been incorporated into the structure of the eighth edition that the clarity of the semantic descriptions is often overwhelmed: see Zgusta (1987, 271-2), Glare (1987) and Chadwick (1994). Since then, the ever-increasing volume of new material has been collected in independent volumes: new citations were published in Supplements to LSJ (1968, 1996), and the historical range was extended by Lampe (1961-8) and Trapp (1994-9).

1 Zgusta (1987: 264-72) and Glare (1987) give contrasting accounts of the changes in Liddell and Scott’s approach. Their last (eighth) edition was published in 1897, the year of Scott’s death and a year before Liddell’s.
2 For a brief account of the discussions, see LSJ (1925: iv-vii).
3 ‘Bringing Νοῦς into this Chaos.’ The expression is cited in LSJ (1925: v), Berkowitz and Squitier (1990: vii), Pantelia (2000: title).

(END BRUCE EXTRACT) The practical, accretive nature of Greek lexicography and the more idealized vision of the lexicographer’s task encapsulated in the picture of Murray in his study converge with the emergence of large corpora of digitized literature. These corpora allow lexicographers to find and analyze lexicographic source material in ways that simply were not possible for Murray, Liddell or any other lexicographer of the pre-digital era. At its simplest level, the computer can automate the basic tasks of identifying the words in a corpus, constructing an index, and presenting passages where the words appear so that lexicographers can write the definitions. Electronic text corpora such as the one contained in the Thesaurus Linguae Graecae digital collection, founded in 1972, and the Perseus digital library can substantially ease the task of locating the passages where words are used and can transform the questions that a lexicographer can ask.4 If we can automate the task of executing searches in an electronic corpus and compiling the results in a useful fashion, lexicographers can spend more of their time doing the intellectual work necessary to thoroughly consider words and their meanings for a new lexicon. The creation of a citation file only begins to exploit the possibilities that electronic text corpora can contribute to the practice of lexicography and philology. We can also help lexicographers begin to provide answers to questions that would be difficult or impossible to answer without computational techniques. For example: How common or rare is a particular word? Is a word associated with a specific work, author, or genre? What grammatical or morphological features are commonly associated with different verbs?

4 Crane 1999.
Beginning as an NEH-funded post-doctoral researcher at the Perseus Project in 2001 and continuing as a professor at the University of Missouri-Kansas City, I have been engaged in the practical task of creating just such a database in partnership with a team at Cambridge University who are writing a new intermediate-level Greek-English lexicon. In our work, we created a database that allows the lexicographic team to complete a project that could not have been done by a staff of the same size, if at all, in a pre-digital era. It provides us with a method for examining all attestations of a word in our corpus, along with an overview of the authors, time periods, and genres where those words appear, while also providing tools that allow us to manage the chaos of information overload that comes from a massive unsorted list of citations. In this paper, I will describe the slips themselves, the two elements of general system architecture that have been essential for the creation of this system, and the merging of old and new resources that we have used to manage the potentially overwhelming abundance of information contained in the new lexicographic database.

The Citation Database

As noted above, any lexicon is based on the collections of word attestations that serve as the raw materials for the lexicographer, whether gathered from scratch as Murray’s team did or built from previous lexica as is the case in the history of Ancient Greek-English lexicography. To create our citation file, we extracted a key-word-in-context listing for every occurrence of every word in the corpus and matched the Ancient Greek passage with a parallel English translation from the Perseus digital library wherever such a translation was available. The key-word-in-context index is built using the Greek morphological analysis engine known as Morpheus. Greek is a highly inflected language and many inflected forms share few if any surface features with their dictionary form.
In order to identify the words contained in a corpus, we must take advantage of the Perseus morphological analysis system that allows us to determine, for example, that moloumetha is a future form of the Greek verb blôskô (to come or go) or even that mêtri is a form of the noun mêtêr (mother). These determinations are made using the Greek morphological analysis engine that is integrated with the Perseus digital library. This morphological analysis engine was developed for Greek texts by Greg Crane beginning in 1985 and it has been refined and extended over the years for Latin and other languages. Morpheus works by breaking down words into component parts and comparing these parts to morphological databases of stems and endings. Anne Mahoney describes the morphological analysis system as follows:

The original implementation, Greek Morpheus, can handle regular verbs and nouns, irregular verbs (in Greek, mostly suppletive) and nouns, verb prefixes (a very common kind of derivation), and the various dialects of Greek in common use in the archaic and classical periods. Virtually all inflections in Greek are endings, though many past-tense verb forms take a prefix (the "temporal augment") and some stems are formed by reduplication of the first consonant. Morpheus therefore assumes that inflected words can be divided into stems and endings. The stems are related to lexical headwords (e.g. the stems pemp- and pepomph- belong to the verb pempô, "send") so that tools using Morpheus can offer definitions as well as morphological analyses. For each stem, moreover, Morpheus knows the relevant grammatical category (the "conjugation" or "declension"), which determines the possible endings.
It can then recognize that pempoimi is a valid form, but pempeiên is not: both use endings for the first person singular, present optative active, but only the first of these endings is appropriate for the verb pempô.5

5 http://www.ldc.upenn.edu/exploration/expl2000/papers/mahoney/mahoney.htm See also Generating and Parsing Classical Greek, Helma’s article at cybergreek.uchicago.edu/Bootstrapping.pdf and http://portal.acm.org/citation.cfm?id=1596347

Once these determinations have been made, it is then possible for us to create an index with each dictionary form and the passages where a word such as pempô might appear. Following the general model of the Liddell and Scott dictionaries, these citations are sorted in chronological order (with one notable exception described below) and accompanied by frequency charts that show how often these words appear in different authors and genres, together with links to an on-line edition of the Liddell-Scott-Jones Greek-English Lexicon. In the example below, we find the top of the slip for the Greek word κλέπτω that illustrates these features.

In this citation file, the lexicographic team can clearly see that this word can be used in poetic registers, appearing frequently in works by authors such as Homer, Aristophanes, Euripides, Aeschylus, etc., but they can also see potentially interesting clusters in prose authors such as Aristotle and Xenophon. This example also provides a concrete illustration of the problem of scale that accompanies the ability to computationally generate a lexicographic slip. While it is a great luxury for lexicographers to have all of this source material at their disposal, it also introduces a secondary problem of potential information overload. In order to have any hope of completing a dictionary in a timely manner, lexicographers must also be able to move through their source material quickly, and they may not have the time to analyze and categorize all 295 passages where this word appears.
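The stem-and-ending analysis quoted from Mahoney, and the index of dictionary forms that it feeds, can be pictured with a minimal Python sketch. This is not the Morpheus implementation: the miniature `STEMS` and `ENDINGS` tables, the inflection-class labels, and the function names `analyse` and `build_index` are all hypothetical, and the transliterated forms from the discussion above stand in for Greek text.

```python
from collections import defaultdict

# Hypothetical miniature tables; the real Morpheus databases are far larger.
# Each stem maps to (headword, inflection class).
STEMS = {"pemp": ("pempô", "omega_verb")}
# Each ending maps to the set of inflection classes that allow it.
ENDINGS = {
    "ô": {"omega_verb"},
    "oimi": {"omega_verb"},
    "eiên": {"other_class"},  # placeholder label: an ending pempô cannot take
}

def analyse(form):
    """Return every headword whose stem + ending reproduces `form`."""
    lemmas = []
    for cut in range(1, len(form)):
        stem, ending = form[:cut], form[cut:]
        if stem in STEMS and ending in ENDINGS:
            headword, infl_class = STEMS[stem]
            if infl_class in ENDINGS[ending]:
                lemmas.append(headword)
    return lemmas

def build_index(passages):
    """Map each headword to the (citation, text) passages that attest it."""
    index = defaultdict(list)
    for citation, text in passages:
        for token in text.split():
            for headword in analyse(token):
                index[headword].append((citation, text))
    return dict(index)
```

Under these assumptions, `analyse("pempoimi")` finds the headword pempô, while `analyse("pempeiên")` finds nothing because the ending belongs to a different inflection class. A form that matches more than one stem would return several headwords, which is the ambiguity taken up later in the paper.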
Consider the example of the existing intermediate Liddell-Scott lexicon with approximately 32,000 entries. If the lexicographer is able to read all of the source material, analyze it, and write a definition in thirty minutes, a first draft of the complete dictionary will take almost eight person-years to complete, while an hour on each definition stretches the time required to create a first draft to sixteen person-years. Even with the more generous allocation of sixty minutes per entry, the lexicographer faced with a slip of 295 citations is granted something on the order of 10 seconds to read each citation and then some ten minutes to write the definition itself. If one assumes that each citation will take a minute to read and contextualize, almost five hours would be devoted to simply reading the citations before beginning to write.

Consider further the example of the Greek letter Pi. This letter is the most common initial letter in Classical Greek; the entries for it take up some 131 pages in the current intermediate Greek Lexicon and some ____ pages in the large Liddell, Scott, and Jones lexicon, and there are some 15,964 distinct lemmas in our lexicographic database for words that begin with Pi. The scope of Pi is such that, according to a memoir TITLE, Liddell wrote to his collaborator Scott in 1842 about the impending completion of the letter Pi: “You will be glad to hear that I have all but finished Π, that two legged monster, who must in ancient times have worn his legs astraddle else he never could have strode over so enormous a space as he has occupied and will occupy in Lexicons.” He then inserted a drawing of the creature in human shape, adding, “Behold the monster, as he has been mocking my waking and sleeping visions for the last many months.”6 Without some method to manage the volume of citations that we can generate, the Pi monster threatens to break the bounds of its initial letter and consume the entire lexicographic project.
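The workload estimates given earlier in this section can be checked with a short calculation. The entry count, the minutes per entry, and the 295 citations come from the discussion above; the 2,000-hour working year is an assumption introduced here only to convert hours into person-years, and the function names are illustrative.

```python
ENTRIES = 32_000        # approximate entries in the intermediate lexicon
HOURS_PER_YEAR = 2_000  # assumption: roughly 50 weeks of 40 hours
CITATIONS = 295         # passages on the κλέπτω slip

def person_years(minutes_per_entry):
    """Person-years needed for a first draft at a given pace per entry."""
    return ENTRIES * minutes_per_entry / 60 / HOURS_PER_YEAR

def reading_hours(seconds_per_citation, citations=CITATIONS):
    """Hours spent reading every citation on one slip."""
    return citations * seconds_per_citation / 3600

print(person_years(30))                # 8.0
print(person_years(60))                # 16.0
print(round(reading_hours(60), 1))     # 4.9
print(round(reading_hours(10) * 60))   # 49
```

At ten seconds per citation, reading the slip takes about 49 minutes, which leaves roughly ten minutes of a one-hour allocation for writing. A different assumed working year would scale the person-year figures proportionally.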
Clearly a citation file of this sort requires tools that allow lexicographers to optimize the amount of time they spend analyzing citations and that help them identify the more interesting and useful citations to which they can devote their time. Indeed, the key-word-in-context index brings us back to the point of aporia where Diels found himself in 1905, wondering how to bring order into the chaos. While large computational corpora give us the chance to revise and start from scratch, this approach digitizes chaos rather than simplifying matters. Our approach to bringing initial order to the chaos has focused on integrating the long tradition of lexicographic research into our citation file. We have done this in three ways: 1) by separating out passages that were cited in the large Liddell-Scott-Jones dictionary in both the slips and the frequency counts; 2) by integrating the cited passages from our citation file into the on-line edition of the LSJ dictionary; and 3) by integrating the citation file into the larger architecture of the Perseus Digital Library. <BRUCE EXCERPT> These three approaches combine to create the essential features that make it a highly-effective lexicographic tool: (1) the separate collection of citations from LSJ in the slips and frequency counts; (2) a new digital edition of the Liddell-Scott-Jones lexicon that integrates cited passages; (3) integration with the Perseus Digital Library architecture in a way that allows lexicographers to quickly check ambiguous lemma forms, missed LSJ citations, and citations from multiple editions and collections of texts.

6 Page 19 from Bruce’s scan

6.1. The LSJ collection: the ‘weave’

In order to make maximum use of the semantic sorting which has already been performed on the LSJ citations, we also display them in what we call a ‘weave’: that is, interwoven with the text of the LSJ entry itself.
The start of the weave display for the same word as shown in Figure 1, θέατρον, is shown in Figure 2. This display is more informative than the ‘list’ format illustrated in Figure 1, in two ways. Firstly, it gives us a check on accuracy: we can easily see whether any citations are missing. Secondly, it gives us semantic information: we can see the LSJ definitions next to each passage, and so compare their interpretations with ours. Because the citations are given in the order of the semantic groups of LSJ, we can benefit from the semantic sorting which has already been done, and make it the reference-point for our own revision. Three senses are visible here: the basic meaning of theatre as a place for dramatic performances (Herodotus), its use for political meetings (Thucydides), and a more abstract sense, the stage, the theatre, referring to the representations (Isocrates). The illustration does not show the full HTML page, which includes a fourth, collective, sense, spectators, audience.7

7 The non-LSJ slips show that the two concrete senses appear throughout Greek, while the abstract sense is much less common. The development of the collective sense is especially interesting, being the usual sense in Aristophanes and in Plato, who gives it a much more general application, to any kind of audience or group of spectators. A fifth sense, what is seen, spectacle, is not identified in LSJ, but appears in the New Testament.

<JEFF WRITING>This process is facilitated by a core architectural element of the Perseus Digital Library that we have termed an Abstract Bibliographic Object (ABO). An ABO is simply an abstract identifier for a work that is not connected to any particular instantiation of that work. In terms of modern library cataloguing, our ABO corresponds to the concept from FRBR cataloguing known as the ‘work’.8 Since long before the digital era, Classicists have referred to ancient texts by abstract citation schemes rather than referencing specific printed editions (although some of these schemes, such as the ones for Plato and Aristotle, have their origins in early printed editions). For example, in the sample slip for θέατρον above, the first cited passage is listed as Hdt. 6.67. The reader knows that Hdt. refers to the Greek historian Herodotus, while 6.67 directs him or her to section sixty-seven in book six of this work. The reader who wants to read the passage in context can go to any edition of Herodotus, either in the original or in a modern translation, and locate this passage.

8 Cite Allison Babeau’s Perseus paper on FRBR; David Mimno should have something here too.

We implemented this abstract referencing system in the underlying architecture of the Perseus Digital Library so that users could move between different translations, or between Greek and English versions of texts, within the library interface. This, in turn, allowed for the interconnection of texts and lexica within the digital library environment, so that when citations in reference works and grammars such as Liddell, Scott and Jones were encountered, they could be turned into bi-directional hyperlinks between the two sources, allowing readers to jump from lexicon to source text or source text to lexicon.9 The architecture that allowed for this bi-directional linking made it possible for us to take every citation in the Liddell-Scott lexicon and integrate the appropriate text fragment with the digital edition of the lexicon. This same architecture also allows us to integrate multiple editions of different texts and texts from different collections. Because each edition of a text is tied together by the abstract bibliographic object and the stable citation system used in Classical texts, we are able to display, for example, an English translation of book six, line 140 of Homer’s Iliad with any other edition that uses this same citation scheme.
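One way to picture the Abstract Bibliographic Object is as a key that groups every edition of a work, with the canonical citation acting as a second key into each edition. The sketch below is an illustration under stated assumptions rather than the Perseus implementation: the identifier `hdt.hist`, the edition labels, the placeholder passage strings, and the names `LIBRARY` and `resolve` are all hypothetical.

```python
# Hypothetical store: ABO identifier -> edition label -> citation -> passage.
LIBRARY = {
    "hdt.hist": {
        "greek": {"6.67": "<Greek text of Hdt. 6.67>"},
        "english": {"6.67": "<English translation of Hdt. 6.67>"},
    },
}

def resolve(abo, citation):
    """Return the passage at `citation` in every edition grouped under `abo`.

    Editions that lack the passage are skipped rather than raising an error.
    """
    editions = LIBRARY.get(abo, {})
    return {
        label: passages[citation]
        for label, passages in editions.items()
        if citation in passages
    }
```

Calling `resolve("hdt.hist", "6.67")` returns the Greek and the English passage together, which is the property that lets a lexicon citation link to any edition sharing the citation scheme.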
9 Cite Lexicon to Commentary and New Technologies for Reading

When each word-form is identified and the chunk of surrounding text selected, that specific sentence-file in Perseus is matched to the corresponding sentence-file in the English text. This enables a matching passage of English text to be displayed below each Greek one, helping the lexicographers to scan quickly through the texts. This architecture also allows us to work with documents encoded against various DTDs. Anne Mahoney describes the core Perseus architecture as follows:

The Perseus text processing system manages XML and SGML texts encoded according to various different DTDs. The key to the system is the mapping of specific SGML elements to abstract structural elements. If a user wishes to read Our Mutual Friend, book 3, chapter 6, or if a commentary refers to Iliad, book 22, line 361, the document management system can identify this section of the text by its citation scheme (by book and chapter, or book and line), no matter what DTD was used for Dickens or for Homer. … Using our system, digital librarians create partial mappings between elements in a DTD (e.g., div1, div2, and lb) and abstract structural elements (act, scene, and line) from which the text processing system generates lookup tables (indices) of the elements so mapped. Thus what is encoded as <div2 type="scene"> in one document and as <scene> in another are both indexed as an abstract, structural "scene." This mapping hides the use of different DTDs from the higher-level processing routines.10

In 1999 and 2000, when this system was initially being implemented with NEH support, we found that we were able to use it to quickly integrate texts from other collections such as the Library of Congress American Memory Collection and the Greek texts found in the Thesaurus Linguae Graecae collection.
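The mapping from DTD-specific elements to abstract structural elements that Mahoney describes can be sketched in a few lines of Python using the standard library XML parser. The mapping tables, the DTD labels `dtd_a` and `dtd_b`, the two sample documents, and the function name `index_units` are assumptions made for illustration; they are not taken from the Perseus document manager.

```python
import xml.etree.ElementTree as ET

# Hypothetical partial mappings, one per DTD:
# (tag, type attribute) -> abstract structural unit.
MAPPINGS = {
    "dtd_a": {("div1", "act"): "act", ("div2", "scene"): "scene"},
    "dtd_b": {("act", None): "act", ("scene", None): "scene"},
}

def index_units(xml_text, dtd):
    """List (abstract unit, n) pairs for every mapped element, in document order."""
    mapping = MAPPINGS[dtd]
    units = []
    for elem in ET.fromstring(xml_text).iter():
        key = (elem.tag, elem.get("type"))
        if key in mapping:
            units.append((mapping[key], elem.get("n")))
    return units

DOC_A = '<text><div1 type="act" n="1"><div2 type="scene" n="2"/></div1></text>'
DOC_B = '<text><act n="1"><scene n="2"/></act></text>'
```

Both `index_units(DOC_A, "dtd_a")` and `index_units(DOC_B, "dtd_b")` yield the same list of abstract units, so any routine built on that list does not need to know which DTD a document used.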
The TLG group gave permission for their collections to be used in the NEH project that initially funded this work, thereby allowing those authors and works which are not stored in Perseus to be included in their correct positions in the display. This seamless transition between the Greek texts from multiple sources, along with English translations, ensures that we have complete coverage of our corpus texts.

10 Cite Anne’s ‘Generalizing the Perseus Document Manager’

6.2. Checking ambiguous lemma-forms and missed citations

The Perseus architecture also allows us to provide a mechanism to check for situations where the automatic integration of the citations has failed, either because the slip-generation routine fails to recognise the passage corresponding to an LSJ citation, or because the morphological analysis engine fails to parse the correct lemma from which an inflectional form is derived. If such failure leads to serious loss of time, then the archive will be, in practical terms, of limited value. In order for it to be a usable research tool, we need to have facilities to cope immediately with the failures. The most common problem is failure of lemma-identification. This has two possible causes. Firstly, the morphological analyser cannot identify every word-form. It is limited by the size of its index, which includes about 97,000 Greek stems and 14,000 inflections. This enables it to recognise 69% of the word-forms in the Perseus texts, which constitute about 99% of the attestations. That gives a level of accuracy of about 85%: a good percentage, but still resulting in a substantial number of unresolved forms and missed citations. The second possible cause of failure is that the process of lemmatisation is itself fundamentally limited by the presence of ambiguous forms: ἄνα, for example, could be the vocative of ἄναξ, the Aeolic feminine of ἄνη, or the anastrophic form of ἀνά (or perhaps even a neuter plural of ἄνοος).
However, we find that, in practice, homonyms like ἄνα or λῆξις cause least difficulty, and complexities of verb inflection cause most. To meet these eventualities, the program is therefore designed to give us automatic feedback, by identifying the level of certainty in lemma-identification, and assigning a ‘weight’, or probability-number, to each attestation, which is based on the number of possible lemmas from which the form could be derived (as far as the program recognises). This is the basis for the totals of ‘unambiguous’ and ‘ambiguous’ citations shown in Figure 1.

The ambiguous forms must then be lemmatised manually. In practice, this does not take long: the eye can very quickly scan down a page of chronologically-arranged citations. In the initial conception of the slips, we had intended to disambiguate these words and enter corrections into Morpheus; however, the time involved drew too much labour away from the core lexicographic task. Indeed, the Perseus group has taken this up as a new, separate project to manually create parse-trees for a vast corpus of ancient Greek texts that can then be used to train a probabilistic parser. However, we needed to use the archive immediately, and so we required a strategy to cope with identification failures. Our solution was to combine the feedback with text-links. Every failure-report is accompanied by a hyperlink to the passage which was searched, so that we can check the text by clicking on the link. The small horizontal lines preceding all the text passages shown in Figures 1 and 2 are the hyperlinks. We have, as it were, embedded the slips archive within the digital library of texts. This allows us to check problems immediately, reducing the times when we have to leave our work-stations and consult the print editions. A similar procedure is used for failed identification of LSJ citations.
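The weighting of attestations described above can be illustrated with a small Python sketch. The text states only that the weight is based on the number of possible lemmas; the specific formula used here (one divided by the number of candidates) is an assumption, as are the function names `weight` and `tally`.

```python
from collections import Counter

def weight(candidate_lemmas):
    """Weight for one attestation: 1 / number of candidate lemmas (assumed formula)."""
    if not candidate_lemmas:
        raise ValueError("unparsed form: no candidate lemmas")
    return 1 / len(candidate_lemmas)

def tally(attestations, lemma):
    """Count unambiguous and ambiguous attestations of `lemma`.

    Each attestation is the list of lemmas the analyser proposed for one form.
    """
    counts = Counter()
    for candidates in attestations:
        if lemma in candidates:
            kind = "unambiguous" if len(candidates) == 1 else "ambiguous"
            counts[kind] += 1
    return dict(counts)
```

Under this assumption, an occurrence of ἄνα with its four candidate lemmas receives a weight of 0.25 for each of them, while a form with a single candidate receives 1.0 and is counted as unambiguous.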
The program indicates to us where it has failed to find the word-form in the cited passage, and we can then immediately check the text. This feature can be illustrated for the word ἀβᾰκής, speechless, calm, whose LSJ entry is shown in Figure 3. We can see from the absence of an inserted passage that Morpheus has missed Sappho fragment 72, and the feedback at the bottom of the page confirms this. By clicking on the hyperlink, the underlined ‘Sappho 72’, we move directly to the fragment, which is shown in Figure 4. In this fragment, the words which the analyser has identified are all underlined as parsed, and we can see that ἀβάκην is in fact there, but unrecognised (because it is a paroxytone accusative form not listed in the Morpheus index). So we still have fast access to the correct citation, even when the program has failed to identify the form. The consequent saving in time is substantial: this feature transforms the slips database from an ancillary tool with excellent but limited coverage into a dependable, ‘all-weather’ reference system.

6.4. Citation matching

In order to identify all the LSJ citations, we also need to match any variations in numbering. In general, the citation systems for Greek texts are remarkably stable: the LSJ line numbers for Homer and the tragedians, and the section numbers for the prose texts, are much the same in modern editions. However, the texts of many early poets, especially the lyricists, have been republished in new editions which give different fragment numbers. We have therefore compiled a concordance from LSJ to the modern editions of the lyric and iambic poets, and also to epic, comic, and tragic fragments, where modern editions differ from LSJ. This ‘poetry map’ is integrated in the electronic database. Its use can be demonstrated from the citation from Sappho shown in Figure 4 above.
LSJ cites this as fragment 72 in Bergk’s Poetae Lyrici Graeci, while TLG uses Lobel-Page’s Poetarum Lesbiorum Fragmenta, where it is fragment 120. By tagging it with the LSJ number, and also mapping that to the modern edition number, we can ensure that the LSJ citation is always recognised, even in cases like this where Morpheus fails to find the target-word.11

11 Cite Poetry Map

6.6. The slips: summary of lexicographic functions

At this point, this database is in active use as the lexicon team continues its writing work. As the database nears completion, however, we need to begin work on the next two phases: publication and a long-term archival strategy. The lexicon team has a contract with Cambridge University Press that allows for simultaneous publication of the lexicon in print form and as part of the freely available collections in the Perseus Digital Library. We also need to begin to think about the long-term preservation of the slip archive itself. The bulk of the slip archive was designed as static HTML with Unicode text, with an eye to long-term accessibility on different platforms and operating systems. Only the facilities for checking the missing citations require a functional digital library system behind them. We must, therefore, now consider whether we should create static versions of these dynamically generated pages while also beginning to work with digital repositories in libraries to find a suitable long-term home for this data. The lexicon itself is being authored in XML according to a DTD that lends itself to long-term preservation and that will also facilitate the process of digital preservation. At the same time, however, the XML also contains additional working notes within a <NOTE> tag that are not intended for publication but rather serve as an archive of the working papers of the lexicon team over more than a decade of work on the dictionary. These also should find a stable long-term repository.
The archive gives us a digital library tailored to our needs, with exceptionally fast access, because it displays the results of millions of searches, with the words collated with their contexts and indexed for reference. A lexicographically-useful size of passage is selected, set at three sentences, which gives us enough context to evaluate the word meanings. The database is proving indispensable in the writing of our lexicon articles, and has transformed the nature of the project, by allowing us to examine the texts as we write, and to compare the LSJ citations with the others. Pre-searching has proved to be a highly-effective way of utilising the limited time available for writing the dictionary. The HTML format is also very user-friendly: we can navigate very quickly between the two components of the double archive (the LSJ citations and the others). The failures of identification cause minimal problems, because every page of the archive is linked to the full texts. In sum, without this resource, it would have been impossible to write fresh definitions, unless we had a much larger team of writers and much more time.

IMAGE CREDIT http://commons.wikimedia.org/wiki/File:James-Murray.jpg (accessed January 10, 2010)