July, 1968 Report ESL-R-355 Copy 9 RAPID PRE -INDEXING BY MACHINE by William R. Kampe II The work described in this document was performed as part of Project Intrex under Research Grant NSFC-472 (Part) awarded to the Massachusetts Institute of Technology by the National Science Foundation and the Advanced Research Projects Agency of the Department of Defense. This grant is designated as M.I.T. DSR Project No. 70054. Electronic Department of Massachusetts Cambridge, Systems Laboratory Electrical Engineering Institute of Technology Massachusetts 02139 I i, FOREWORD Except for minor editorial changes, this report is the thesis submitted by Mr. William R. Kampe II to the Electrical Engineering Department, Massachusetts Institute of Technology, in partial fulfillment of the requirements for the degree of Master of Science. A few alterations in the original wording have been made throughout the text in an effort to enhance clarity, and several pages have been reformatted; otherwise the manuscript remains as submitted. J. F. Reintjes Professor of Electrical Engineering {V ABSTRACT This report describes the development of a new method of subject indexing by machine for documents in the Project INTREX catalog. The purpose of the system is to allow new documents to be placed online quickly in the computer-stored INTREX catalog. The system that is developed makes use of human-generated subject terms of existing INTREX documents as a basis for generating index terms for new documents. The pre--indexing system operates on only the title and abstract of a document in generating a pre--index for the document. The analysis of documents already containing human-generated subject indexes consisted of comparing the titles and abstracts of the documents to their subject indexes. A large dictionary with data about word usage was obtained from these comparisons. The dictionary served as a guide for the later pre -indexing of new documents. Three variations of the automatic pre -indexing method have been de veloped, tested, and evaluated. Two methods show promise for operational use in the INTREX system. v ACKNOWLEDGMENT I am greatly indebted to Professor J. Francis Reintjes, thesis supervisor, for his guidance, suggestions, and critical evaluation of all stages of this research. I thank Peter Kugel for his continued encouragement and for his endless patience in discussing new technical approaches. Thanks are also expressed to Messrs. Robert Kusik, Richard Marcus, and Alan Benenfeld for their explanations of the Project INTREX cataloging proce s s. In addition, I am grateful to Professor Lynwood Bryant, who spent much time studying the manuscript and making pertinent suggestions for improvement. I also thank the Publications and Drafting Personnel of the Electronic Systems Laboratory for their efforts in typing and assembling the final document. This research was supported by the Department of Electrical Engineering and by Project INTREX. Project INTREX is supported through grants from the Carnegie Corporation and the Council on Library Resources, Inc., and through Contract NSFC-472 from the National Science Foundation and Advanced Research Projects Agency. vi CONTENTS CHAPTER I CHAPTER II THE NEED FOR PRE-INDEXING page The INTREX Environment 1 Motivation for Pre-Indexing 4 Plan of the Research 4 PHASE I - DATA GATHERING 6 The Analysis of Documents Already Having Subject Indexes CHAPTER III CHAPTER IV CHAPTER V 1 8 Building a Dictionary 12 Possible Pre-Indexing Criteria 14 THE DEVELOPMENT OF PRE-INDEXING SCHEMES 18 The Form of the Title and Abstract for Pre-Indexing 18 Three Methods of Pre-Indexing 18 Method I - Usage Rate Only 20 Method II - Usage Rate and Frequency Thresholds 20 Method III - Context Decisions 20 Testing the Pre-Indexing Methods 23 RESULTS OF PRE-INDEXING TRIALS 24 Subjective Evaluation of the Informational Content of Pre-Indexes 24 The Role of Function Words 27 The Effect of Varying the Usage Rate Threshold 28 The Effect of New Words on Pre-Indexing 29 The Completeness-Relevance Trade-off 30 The Effect of Dictionary Size 30 CONCLUSIONS AND SUGGESTIONS FOR FURTHER RESEARCH 32 Streamlining the Pre-Indexing System 33 Applicability of Pre-Indexing to Other Subject Areas 34 Suggestions for Further Research 34 The Possibility of Using Word Stemming 35 vii -~~~~~~~~~~~~~~~~~~~ - *-·----~------- --- -- ~I-~~ CONTENTS (Contd.) APPENDIX A APPENDIX B APPENDIX C BREAKDOWN OF WORDS IN DICTIONARY BY USAGE RATE AND FREQUENCY RESULTS OF PRE-INDEXING TRIALS page 37 38 Method I 38 Method II 39 Method III 40 PROGRAM LISTINGS AND STRUCTURE 41 Phase I 41 Phase II 51 BIBLIOGRAPHY 61 viii LIST OF FIGURES 1. The Pre-Indexing System 2. The Target Set of Words for Pre-Indexing (Shaded Area) 7 3. The Pre-Index 7 4. Typical Subject Index 9 5. Typical Title and Abstract 9 6. TA List and SI List 10 7. Size of Dictionary as a Function of the Number of Documents Analyzed 13 Set of Words Included in the Dictionary Data Base (Shaded Area) 13 Frequency of Appearance for Words in the Dictionary Based on 80 Document Records 14 Average Word Usage Rate as a Function Frequency of Appearance 16 11. Word Inclusion Region for Method I 21 12. Word Inclusion for Method II 21 13. Word Zones for Method III 21 14. Results of Method I for 30 Documents 25 15. Results of Method II for 30 Documents 25 16. Results of Method III for 30 Documents 26 17. Comparison of Average Results for all Methods on 30 Documents 26 8. 9. 10. page ix 4 I CHAPTER I THE NEED FOR PRE-INDEXING This research involves an investigation of the feasibility of using automatic machine-generated subject indexes as a possible means of expediting the cataloging process for new documents to be added to the Project INTREX library. An online library in a fast-changing technical field needs a method of putting information about a new document into the library system quickly. A possible method is to generate an auto- matic pre-index from only the title and abstract of a new document. The pre-index is intended to serve as a temporary subject index until human catalogers can replace the pre-index with a full subject index based on an examination of the entire new document. The pre-indexing schemes investigated are based on past experience gained from a computer analysis of the human-generated indexes of a set of documents. The experience thus obtained through comparison of title and abstracts to corresponding subject indexes is used as a guide for generating, by machine, a pre-index for a new document to be added to the system. The pre-indexing scheme being proposed differs from previous work in automatic indexing in that the pre-indexing scheme makes extensive use of prior human indexing in an automatic fashion. The INTREX Environment Project INTREX (Information Transfer Experiments) seeks to exploit the multiaccess computer operated in an online mode as a basis for a machine-stored library. The INTREX collection will consist of approximately 10, 000 documents in selected fields of materials science and engineering. Each document will be cataloged in depth and the cata- log will be stored in a multiaccess computer. Full text of each docu- ment will be stored on microfilm and made accessible under computer control. A user of the library will be able to search the catalog by means of a time-shared computer system. Special remote consoles will allow him to obtain displays of the full text of documents, if he so desires. or hard copies At present, INTREX is still in the developmental stage. are no users of the system, as yet. Nevertheless, There over 500 documents have been fully cataloged and placed into the computer data base, accessible to the INTREX programmers on a time-sharing computer system. It is this data base which is used for this research effort. The INTREX programmers have written numerous programs to facilitate access to the data base. The cataloging record for each document of the collection comprises many of the standard items found on the typical catalog cards of conventional libraries. The title and author, publication date, publisher, and number of pages are included. In addition, an INTREX record for a document includes information such as the language in which the document is written, the abstract, and a subject index. Each particular type of information is placed in its own field (title, author, publisher, catalog number, and so forth) in a document record. Most fields in a document record can be obtained in a purely routine manner. This information comes with the docuiment and needs only to be transferred to the document record. The abstract, when included in the record, also arrives with the document and has been written by the author or an abstracting agency. Nearly all documents of the INTREX collection include an abstract. If not, the cataloger substitutes an excerpt. In a document record, the one field which requires analysis of the document in order to be generated is the subject index. ject index is of prime interest in pre-indexing, in some detail. Since the sub- it will be examined here Of 49 possible descriptive fields being used by INTREX for a document, the process of generating the subject index field alone consumes over half of an indexer's time even for short journal articles. For longer documents, time. the subject index takes a larger part of the total The purpose of pre-indexing by machine is to relieve the indexers of the pressing nature of the burden of writing the subject index for the document. In generating the subject index for a document, the indexers may draw upon the material of the full document, including the title, abstract, and text. The subject index comprises several subject terms which describe the content If the document. Each subject term may range in length from a single word to one or more noun phrases containing -3several words. To every subject term, the indexer assigns a relevancy weight, which indicates how the term is used and how important it is in the document. This weighting system is a key feature of Project INTREX, since it is hoped that the weights will allow more accurate retrieval of pertinent documents. A subject term may be labeled with any of five weights, from 0 through 4. The numbers 0 and 4 have special significance, whereas the numbers document. numbered 1 through 3 indicate specific subject matter of the A weight of 1 indicates that the subject term describes the primary topic of the document; weight (2) designates a secondary topic; and weight (3),a much less central topic. The special weight of 4 is assigned to terms representing mathematical tools, instrumental tools, or applications which are cited in the document but which are not central to it. The weight (0) designates a generic class to which the subject matter of a document belongs. Knowledge of the indexing behavior of the INTREX catalogers provides support for the concept of pre-indexing. The catalogers for Proj- ect INTREX are trained and competent in matters of library science. Nevertheless, they are not trained as subject experts in the area of materials science, which is the area of the INTREX document collection. This fact may influence the subject indexes which the indexers generate. Although in writing a subject index, the indexer may use any word that she desires in describing the document, the majority of words used will be those employed by the author of the document. It is natural that an indexer who is not a subject expert should adopt the language of the author in indexing. Since the title and abstract of a document usually provide an excellent indication of the subject matter of that document, a large number of the subject terms of a subject index are borrowed directly from the title and abstract of the document. Thus, the indexers perform derivative indexing. indexing, In pure derivative the words selected to portray the subject matter of a document are only those words actually used by the author. Salton at Cornell University' has experimented with automatic derivative indexing from Salton, G. (ed), "Information Storage and Retrieval, " Scientific Report No. ISR-11, Department of Computer Science, Cornell University, Ithaca, New York, June, 1966. -4His results are very encouraging to the notion title and abstract only. Salton found that automatic derivative indexing from of pre-indexing. the title and abstract only is very nearly as good as automatic indexing from full text. Motivation for Pre-Indexing Two factors combine to indicate the possibility of pre-indexing. One factor indicates that pre-indexing is desirable; that it is feasible. the other factor, First, the indexers spend a large part of their time in writing the subject index. It would be desirable if this process could be bypassed, at least temporarily, available to INTREX users quickly. in order to make the document Second, Salton's work and an ex- amination of typical subject indexes reveals that pre-indexing is feasible. Since the title and abstract will be placed with the document index into an automatic method for selecting pertinent the computer data base, words or phrases for a document could generate a pre-index quickly for the document before the subject indexing is completed. Plan of the Research Baskcally, the investigation of pre-indexing divides into two rather distinct phases, 1 shows. as Fig. In Phase I, a number of document Dictionary and Data \ Look-ups Indexed Documents Documents reProc ess G~atherin Indexing a / Phase I Phase 11 New Document Fig. 1 The Pre-lndexing System Pre-Index -5records is automatically analyzed by computer in order to build a dictionary and data base for later use in actual pre-indexing. Further, the analysis is intended to provide clues for designing the pre-indexing methods to be used in Phase II. In Phase II the results of the analysis are applied to the automatic pre-indexing of new documents. Also in Phase II, the pre-indexes that are generated are evaluated to obtain a measure of pre-indexing quality. CHAPTER II PHASE I - DATA GATHERING To analyze profitably the titles, abstracts, and subject indexes of a large number of documents in order to gain experience and data for later pre-indexing, it is necessary first to define the goals of pre-indexing. The goal of automatic pre-indexing is to extract from the title and abstract of a document the set of words that best describes the contents of the document. In this research, it is assumed that those words in the title and abstract of a document that also appear in the human-generated subject index for the document are the best possible choices for the words of the pre-index. Thus, the goal of pre-indexing is to select from the title and abstract all those words which the human indexers will eventually include in the subject index. Of course, the pre-index may not always achieve the goal. Figure 2 shows the relationship of the title-and-abstract words to the words of the subject index of a document. is the set of words TA n The target set of words for a pre-index PI SI (read: Abstract and Subject Index). the set of words common to Title/ Words which are eventually included in the human-generated subject index but are not found in the title and abstract of the document are ignored in this study (the set of words SI-SI nf TA). Two types of errors are possible when words are selected for a pre-index. cluded. The first is the omission of some words that should be in- As a result of such omissions, the pre-index is incomplete. The second type of error is the inclusion in the pre-index words that should not be included. This second type of error decreases the relevance of the words in the pre-index. Definitions for two measures of pre-index quality which indicate how well the two errors are avoided can now be formulated as follows (see Fig. 3): Completeness - The percentage of words in TA n SI, the target pre-index, that are actually included in the pre -index Relevance - The percentage of words in PI, the actual pre-index, that are also included in the target set, TA n SI -6- -7- Words of the subject index (SI) Words of the title and abstract (TA) Words that should be selected for a pre-index Fig. 2 The Target Set of Words for Pre-Indexing (Shaded Area) Words that failed to be selected SI Irrelevant words Fig. 3 The Pre-Index -8If both completeness and relevance of a pre-index are 100 percent, then the pre-index exactly matches the target set of words, TA n SI. In this. research the standards defined above are based on assumptions that must be remembered when trying to determine the true value of an automatic pre-index. First, the measurements use the human- generated index as reference standards of completeness and relevance. Second, the standards assume that enough information is contained in the title and abstract of a document to extract a valid pre-index from only those two sources. Unfortunately, the quality of the human indexing cannot be evaluated objectively until the INTREX catalog is used operationally. In the meantime it appears safe to assume that the human index is a valid index and reference standard. An examination of subject indexes for several documents shows that most of the words of the subject index appear in the title and abstract of a document. Furthermore, the work of Salton at Cornell indicates that use of only the title and abstract of a document is a good procedure for generating a derivative index by machine. There will be occasion te question the validity of the two measures of pre-indexing later, but for the time being they appear to provide a reasonable basis for an investigation of pre-indexing methods. Now what remains to be determined is a statistical basis for a pre-indexing system. The next section describes the analysis of many documents which already have a subject index, as well as a title and abstract, so that some decision rules may be devised for the selection of words for a pre-index. The Analysis of Documents Already Having Subject Indexes The purpose of analyzing document records which already contain subject indexes is twofold. First, the analysis should yield information for the formulation of possible decision rules for later auto-pre-indexing, and second, the analysis will result in a dictionary of all words that have appeared in the titles and abstracts of all the documents analyzed. Such a dictionary is employed in the pre-indexing of new documents. In this research, 80 documents already having subject indexes, as well as titles and abstracts, were analyzed by computer according to the techniques formulated below. The subject index of each document positron Iannihilation in potassium (1); weight magnetic fieldI orientation (0); subject terms Fig. 4 Typical Subject Index Title Effect of in magnetic| field Abstract The I angularl correlation ·** * * of potassium I metal Fig. 5 Typical Title and Abstract potassium -10comprises a list of subject terms. A term may include one word or several words combined into a grammatical phrase. of a typical subject index is shown in Fig. 4, the title and abstract are given in Fig. 5. A representation and representations of In order to compare the words in the title and abstract with the words in the subject index, the analysis system in the Phase-I part of the pre-indexing process must convert title and abstract into one list of words and the subject index The converted lists are shown in Fig. 6. into another. TA Effect SI (1) magnetic 1 annihilation Title Words . The list for and weight in potassium potassium The (0) magnetic angular correlation of ) Abstract Words 2 d Subject term and weight field orientation ! * 1: Fig. 6 TA List and SI List the title and abstract words is called TA; words, SI. the list for subject index The purpose of these lists is to allow simple counts of the usage of words in the title, abstract, and subject index. In both the title and the abstract sections of TA, each different word is entered only once, regardless of the number of tokens of that word that actually occur in, For instance, for example, the abstract of a document. the word "of" may occur four times in the abstract of a -11- document. of TA. However, "of" will appear only once in the abstract section If the word "of" also occurs in the title of the document, then "of" will also be entered once in the title section of TA. subject term is considered separately. In SI, each A subject term is reduced to a list of words in a manner similar to that for the listing of the title, The word lists for all subject terms are placed sequentially in SI, and each subject term fills its own block of SI. Thus, the word "of" may occur more than once in the entire SI list, but only once in each of several subject terms. Each subject term is also marked with its weightnumber. Two definitions help in understanding the counting procedures used in analyzing documents. First, an appearance of a word is the occur- rence of that word in either the title or abstract. A word may have only one title appearance and one abstract appearance per document, Note that title appearances are counted separately from abstract appearances, Given that a word appears in a document title or abstract, then the number of times that the word occurs in SI of the same document is defined as the usage of the word. A word may have a usage count greater than one since the word may be used in several subject terms. The usage count of a word having title appearances is kept separately from the usage count of the same word also having an abstract appearance, This distinction has been made because the significance of a word appearing in the abstract is different from the significance of the same word appearing in the title of a document. The usage for each word is also broken down into usage by the weight of the subject terms in which it occurs. Thus, a word may have a total usage of. three, with a usage of two in weight-2 terms and a usage of one in weight-4 terms. This distinction of usage by weight is made to determine if the weight usage of a word provides any clues for selecting words for a pre-index. not particularly significant. Results indicate that weight usage is Only the aggregate usage is meaningful. At first glance it may appear somewhat arbitrary to permit a word to have only one appearance in a title or abstract. However, allowing multiple appearances would create a difficult problem in counting the usage for a particular word. With multiple appearances, it would be impossible to determine just which appearance is responsible for the usage of that word in the subject index. Moreover, the important statistic is not so much how many tokens of a word appear in the title or abstract, but rather, given that the word appears at all, how likely is that word to be used in the subject index. Building a Dictionary In order to provide a future basis for judging the significance of words for pre-indexing, the data that are collected for appearances and usage of the words of the title and abstract must be stored in a dictionary. As each new document record is analyzed, the words from the title and abstract of the document, along with the appearance and usage data for those words, are added to the dictionary. Thus, the dictionary will be a list of all words that have appeared in a title or abstract to date, along with cumulative data about appearances and usage. data portion of the dictionary, In the title data and abstract data are separated in the same manner that the data were collected, although the dictionary includes each word only once. As stated previously, 80 document records were analyzed auto- matically by computer in the manner just described. Then the data contained in the dictionary were inspected in order to learn about the statistical basis for the usage of words in the subject index. at this stage, Note that the data that has been collected merely represent the behavior of the human indexers. One item of interest is the number of words that the dictionary contains, since the number of words in the dictionary may influence the usefulness of the dictionary for pre-indexing. Figure 7 shows the num- ber of words in the dictionary as a function of the number of document records analyzed. The size of the dictionary is growing less rapidly in the region of 80 documents than it was in the first few documents of the collection. New words were added to the dictionary at the rate of approximately 35 per document during the analysis of the first 20 documents. After 80 documents, words per document. the dictionary grew at the rate of 24 new Such a growth curve indicates that the dictionary includes many common words after only a small sample of documents has been analyzed, but that new words will be encountered frequently in new documents. Later, in the design of a pre-indexing scheme for new -13 documents, it will be necessary to provide for those words that are not included in the dictionary, but appear in the title or abstract of a new document. x x Z 1600 I-- cx 1200 X -/ u. 800 X Z 400 0/ 0 I l 20 I I 40 i I 60 I I 80 NUMBER OF DOCUMENT RECORDS Fig. 7 Size of Dictionary as a Function of the Number of Documents Analyzed The dictionary does not include all words that have been used in subject index terms. The dictionary is intended to provide information on the selection of words from the title and abstract. I Figure 8 shows aSI Fig. 8. Set of Words Included in the Dictionary Data Base (Shaded Area) again the relationship of the set of title and abstract words to the set of subject index words. the dictionary. All the words in the shaded portion are filed in They include all title and abstract words. The words -14in the unshaded portion are not included. Although the unshaded set of words may well be good words for inclusion in a subject index for future documents, it is not the purpose of the dictionary merely to list such words. Since the words represented by the unshaded portion of Fig. 8 are not directly connected with the words of the title and abstract, the words of the unshaded portion are omitted. Possible Pre-Indexing Criteria One possible clue for the selection of a word for a pre-index is the frequency of appearance of the word in document records already analyzed. The frequency of appearance of a word refers to the percent- age of the document records analyzed in which the word has made an appearance. Thus if a word has appeared in 40 of 80 abstracts, then the word has a 50 percent frequency of appearance in abstracts. The same word may have a frequency of appearance of only ten percent in titles. Figure 9 shows the number of words from the dictionary in Percentile Percentile of Frequency 100 90 - 99 80- 89 70- 79 60- 69 50- 59 40 - 49 30 - 39 20-29 10- 19 0- 10 Fig. 9 Number of words in percentile Title Abstract 0 0 0 2 3 2 0 1 0 0 3 0 6 410 1 3 3 3 6 16 71 2009 Frequency of Appearance for Words in the Dictionary Based on 80 Document Records -15various percentiles of frequency for both title appearances and abstract appearances. High-frequency words tend to be function words -- mostly prepositions and articles. a document. Such words convey little about the content of Titles tend not to include high-frequency words. Another possible criterion for pre-indexing is the usage rate of words in the dictionary. abstract), Given that a word has appeared in a title (or then the usage rate of a word is the ratio of the number of times it is used in subject indexes to the number of appearances in titles (or abstracts). That is, usage rate = (cumulative usage)/(no. of appearances). The usage rate, then, is just the average usage of the word per appearance. A word has two usage rates. appearances in titles; One usage rate is based on its the other, on its appearances in abstracts. A high usage rate for a word means that whenever the word appears in a title (or abstract) it has a high probability of being used in the subject index. Thus, there are two main types of information available for words in the dictionary -- usage rate and frequency. Figure 10 shows the usage rate of the words in the dictionary as a function of frequency of appearance. Since the usage of a word can be higher than the number of appear- ances, the usage rate has been truncated at a maximum value of 1. dashed lines in Fig. quency. The 10 are the average usage rates as a function of fre- Each word in the dictionary is actually represented by a point somewhere on the two-dimensional graphs. (The points that are shown in the figure are merely dummy points for purposes of illustration. ) The data represented in Fig. 10 are an aggregate of the usage data for the different weight usage counts. Although the usage-count data that is stored in the dictionary is separated by weight usage, the usage rates shown in Fig. all weights. 10 are determined from the total usage counts for A graph showing usage rate by the individual weights yields similar results. With title words, however, in weight-l subject terms. most of the usage occurs Title words have an extremely high proba- bility of being used in subject terms -- over 90 percent. The only pecu- liarity of abstract word use is that high-frequency words are seldom used in terms of weight (0) or weight (4), indicating that terms of such -16 - Usage Rate 1.00 x… X x_ xx…x _ - x AVERAGE USAGE RATE 0.75 - x ' total usage o. of appearances 0.50 x 0.25 o.oo x 20 60 40 no. of appearances no. of documents (a) Usage Rate 1.00 xxx x 100 80 Frequency / of Appearance (percent) Title Words X x / x AVERAGE USAGE RATE 0.75 0.75 _ total usage / / / no. of appearances / 50 x x x 0.25 -x 0.00 x x xxxxx x 40 20 (b) Fig. 10 60 80 100 no. of appearances Frequency no. of documents of Appearance (percent) Abstract Words Average Word Usage Rate as a Function Frequency of Appearance -17- weights tend to be quite specific. High-frequency "function" words, such as prepositions, are used in phrases of other weights in order to make such terms readable noun phrases, for example, "positron annihilation in potassium". There is a useful characteristic of word usage that does not show in Fig. 10, but is clearly demonstrated in Appendix A. Very few low- frequency words have usage rates at intermediate levels near the average usage rate. The vast majority of low-frequency words are either used very seldom or very often, Thus, the dictionary contains a large number of words that the indexers have used in the subject indexes of documents. However, the dictionary also contains a large number of words that the indexers have consistently failed to use in subject indexes. which never appear in subject terms, Such words include verbs, Hence, the dictionary contains information that will be useful when dictionary words are encountered in pre-indexing new documents. When a word encountered in a new document is found in the dictionary, the data on usage rates will indicate whether the human indexers have considered the word as useful for indexing. Of course, words that are not in the dictionary will be discov- ered in new documents. Rules for handling both known and new words will be necessary in generating a pre-index. The next chapter describes the actual methods to be used for pre- indexing. CHAPTER III THE DEVELOPMENT OF PRE-INDEXING SCHEMES All the automatic pre-indexes that have been evaluated have a very simple form. Basically, the pre-index is a list of words which is intended to represent the content of a document. All the pre-indexes generated are the result of simple word-by-word selection procedures to choose words from the titles and abstracts of documents. Only in one of three methods examined are words considered in context. A random sample of 30 documents with both titles and abstracts has been used for the testing of pre-indexing. The Form of the Title and Abstract for Pre-Indexing All pre-indexing methods tested in this research operate on a list of the title and abstract words in the document to be indexed. paring a title and abstract of a document for pre-indexing, In pre- we have made a slight change from the form of the TA list as described in the preceding chapter. For pre-indexing, a single list of title and abstract words in a document is prepared, but now the list gives all words in exact order of their occurrence, including multiple occurrences. Such a listing preserves information on context, which may be useful for preindexing. Three Methods of Pre-Indexing Three different algorithms for pre-indexing from a title and abstract word list have been developed. common features. The three methods have some The similarities in the methods lie in the handling of title words and in the handling of abstract words not found in the dictionary. The primary differences in the methods are in the way that the dictionary data are used. For all methods, the words of the title of the document are included in the pre-index. The data contained in the dictionary indicates that including the title words is a very sound decision rule. From the data contained in the dictionary it is found that title words have at least a -18- -19 90 percent chance of being used in the subject index of a document; hence title words should definitely be included in a pre -index. All methods of pre -indexing face the problem of new words, that is, the occurrence in title or abstract of a word which is not con- tained in the dictionary. All new words, clearly, will be low-frequency words, since the new words have not appeared in any documents previously. The usage-rate averages, as graphed in Fig. 10 of Chap- ter II, show that a low-frequency word in an abstract has only about a 30 percent chance of being used in the human-generated subject index of the document. Thus, when a new word appears in the abstract of a document, there is no valid reason to include it in the pre -index. Nevertheless, if all new words are excluded, then many of the words that belong in the pre-index will be omitted. As noted previously, even when a dictionary comprising the words from 80 documents is used, over 20 new words are encountered in the title and abstract of a new document. Therefore, it has been decided that in all three pre- indexing methods a new word will be included if it appears at least twice in the title and abstract. A manual inspection of several docu- ment records indicates that if the word is used at least twice, then there is a good chance that the word represents something important to the content of the document. New words which occur only once in the title and abstract appear much more likely to be only filler and not central to the content of the document. Actual attempts at pre- indexing support the notion that only a single occurrence of a new word is insufficient for the word to be included in the pre -index. The three methods differ mostly in the handling of words that are contained in the dictionary. The primary information available about words in the dictionary is the frequency of appearance of the word and the usage rate of the word, For example, the dictionary contains the information that the word "the" has appeared in 80 abstracts and has a cumulative usage count from abstracts of 27 in weight-1 terms, 13 in weight-2 terms, 46 in weight-3 terms, 22 in weight-4 terms, but only 1 in weight-0 terms. use this information in different ways. However, all usage data is aggregated over all subject-term weights. breakdown by individual weights. The three methods No use is made of the Thus only the aggregated usage -20 count of 109 is retrieved from the dictionary. From the number of appearances in abstracts and the total usage from abstracts, it is then known that the word "the" has a frequency of 100 percent in abstracts and a usage rate greater than 1,.0. Method I - Usage Rate Only The first method considers only the usage rate for a word which is found in the dictionary. for the pre-index. pre-index. Words with a high usage rate are selected Words with a low usage rate are excluded from the Figure 11 illustrates method I. Words with usage rates falling in the shaded area are included in the pre-index for the document. The exact level of usage rate, R, that is required for a word to be included in the pre -index is tested at three different threshold levels to determine what effect the usage rate threshold has on the completeness and relevance of a pre-index. The three threshold levels of the usage rate, R, that have been tested are 0.25, 0.50, and 0.75. Method II - Usage Rate and Frequency Thresholds In the second method, not only the usage rate of a wordis considered, but also the frequency of appearance of the word is considered. Since very high-frequency words tend to carry little in- formation, such words may possibly be excluded from the pre -index. Therefore, in method II the pre -index will include only low-frequency, information-bearing words. teria for method II. Figure 12 illustrates the selection cri- Words whose frequency and usage rate fall in the shaded area are included in the pre-index. words are excluded. All other dictionary In tests of this method, the usage -rate thresh- old, R, has been held constant at 0.5 so that the effect of varying the frequency threshold, F, may be observed. has been tested at 25 percent, 50 percent, The frequency threshold and 75 percent. Method III - Context Decisions The third method of pre-indexing is In the other two methods, somewhat more involved. a word is either selected or not selected for the pre-index purely on the basis of the dictionary data for that Inclusion Region 1.0 L i- 0 , °75 R 0.50 0.25 50 FREQUENCY Fig. 11 100 (PERCENT) Word Inclusion Region for Method I /Inclusion 0ii Region u 0.75 V 0.50 I D I 0.25 - I 50 F FREQUENCY Fig. 12 100 (PERCENT) Word Inclusion for Method II Strong zone 0.2 0 .50 FREQUENCY Fig. 13 150 F (PERCENT) Word Zones for Method III -22 word only. The third method of pre -indexing recognizes an "ambigu- ous" class of words, for which the pre-indexing method considers neighboring words before making a decision to include or exclude the ambiguous word. Figure 13 shows two thresholds, rate and a threshold frequency F, pre-indexing. R1 andR2 on usage being used in the third method of Only words with a frequency below the frequencythresh- old F are considered as candidates for the pre-index. Words whose usage rates fall in the zone between the two usage rate thresholds R1 and R2 are considered to be in the ambiguous zone. Weak words, which have usage rates lower than R1, are immediately excluded from the pre-index. Strong words, whose frequency and usage rates fall in the cross-hatched region above R2 in Fig. cluded in the pre-index. 13, are immediately in- Words with usage rates between R1 and R2 (the shaded region of Fig. 13) are marked for later decision. If such an ambiguous word neighbors a strong word on either side, then the ambiguous word is included in the pre-index. the ambiguous word is However, if surrounded only by weak words then the am- biguous word is excluded from the pre-index. This method was de- vised purely as an experiment to aid in the handling of words whose usage rate is at an intermediate level and for new words whose usage rate is unknown. For such words there is no sound basis for a se- lection decision, so content is considered in this method. In essence, this method recognizes simple phrases that are delimited by weak words such as articles, prepositions, and verbs. the phrase ". .the For example, from dynamic pulse hysteresis in...", method III will select the words "dynamic pulse hysteresis." In method III as in the other two methods, if a new word (a word not contained in the dictionary) occurs at least twice in the title and abstract of a document, then that word is included in the index. such a new word occurs only once, it is If considered to be an ambigu- ous word and is included only if it has a strong neighbor. This de- cision rule was adopted because a manual inspection of many abstracts indicated that although many meaningful words may appear only once in the title and abstract of a document, such words usually seem to occur next to other meaningful words. Through consideration of single- occurrence new words as ambiguous rather than weak, fewer -23 - meaningful words are omitted from a pre -index. Later pre -indexing trials showed that the primary difference between method II and method III lies in the treatment of new words with single occurrences. Recall that such words are excluded from a method II pre -index. In tests of method III the frequency threshold was held constant at 75 percent so that the effect of varying the ambiguous usage zone could be observed. For testing, the ambiguous zone was set at three different ranges -- 0. 2 to 0.5, 0.3 to 0.5, and 0.3 to 0. 6. Testing the Pre -Indexing Methods Thirty documents were automatically pre -indexed in order to test the above methods. To facilitate experimentation, the automatic pre-indexing system was developed to pre-index each document by all three methods with only one pass on a document. To provide further experimental efficiency, the pre -indexing system also preindexed under three different parameter sets for each method. In addition, the pre -indexing system also compared the different trial pre-indexes of each document to the human-generated subject index in order to yield immediate values of completeness and relevance for the pre-indexes. Completeness and relevance are the standards of quality as defined in the previous chapter. the results of the tests of pre-indexing. The next chapter describes CHAPTER IV RESULTS OF PRE -INDEXING TRIALS To test the pre -indexing methods that were described in the preceding chapter, used. 30 documents having titles and abstracts were For each of the documents, generated. nine different pre -indexes were The nine pre-indexes are the result of using the three methods, each with three parameter sets. The testing demonstrated that the methods cause significant differences in pre -indexing results, although changing parameters within a method has little effect on the quality of a pre -index as measured by the standards of completeness and relevance. A simple way to view the results for a pre-indexing method is to plot the completeness and relevance for each document pre -index as a point on a graph. The closer that a document pre -index is to 100 percent in both completeness and relevance, the better the preindex, at least in a primitive way. indicate that there is The results of pre-indexing trials cause to question the measure of completeness as it has been defined previously. Figs. 14, However, for the time being, 15, and 16 show the plots of completeness and relevance for pre-indexes by methods I, II, and III respectively. Although three different parameter sets were tested for each method, a representative parameter set has been selected to illustrate the results of each method. Figure 17 summarizes the results for all three parameter sets for each of the three methods. In Fig. 17 only the average re- sults of completeness and relevance are plotted. Appendix B gives the results of all pre-indexing trials in tabular form. Subjective Evaluation of the Informational Content of Pre-Indexes One of the first questions that comes to mind is, "Does an auto- matic pre-index appear to describe the subject content of the document?" An inspection of pre-indexes for several documents indicates that, indeed, the pre-indexes contain reasonably accurate subject matter from the title and abstract of documents. -24 - For example, from -25 - 100 x xxx X X x XK x x x x x x Z~ X X x ' X _x x Z . 60 - "' Z Parameter used: Usage-Rate Threshold 0.50 40 O U AVERAGE: ( COMPLETENESS RELEVANCE 20r 0 20 .! I I I I o- 86 percent 61 percent RELEVANCE Fig. 14 I , 60 40 I , 80 100 (PERCENT) Results of Method I for 30 Documents 100 Parameters used: Usage-Rate Threshold 0.50 Frequency Threshold 75 percent 80 x Z x ;60 - x Z - xx x Z K x x . 40 X X X x zX x t)r~~x xKX x K X X O 0 U AVERAGE: 20 - o 53 percent RELEVANCE 62 percent 20 20 0 ® COMPLETENESS 40 40 RELEVANCE Fig. 15 60 60 80 80 (PERCENT) Results of Method II for 30 Documents 100 100 , -26 j 100 - Parameters used: Usage-Rate Thresholds, R1 0.3; R2 0.6 Frequency Threshold 75 percent 80-x X x 60 - x Xx ) x x X _ X K 40 x x K 20 i ol 0 AVERAGE: ) COMPLETENESS 59 percent RELEVANCE 57 percent I 1 20 RELEVANCE Fig. 16 I ~ 100 I I 80 I I 60 I I 40 (PERCENT) Results of Method III for 30 Documents 100 1 METHOD (gB '\ 80 I C U METHOD /A A m 60o METHOD II B \ \ I O 40 U 20 _ 0 0 A, B, and C indicate the different parameter sets used. The values of the parameters are included in Appendix B. I 20 I 40 I RELEVANCE Fig. 17 I 60 I 80 1. 100 (PERCENT) Comparison of Average Results for all Methods on 30 Documents -27 the title and abstract of an article about "Dynamic Pulse Hysteresis in Magnetic Devices", the following is a partial list of terms in a pre - index generated by method III. instantaneous hysteresis magnetic risetime an applied pulse field dynamic pulse hysteresis magnetic devices quasi-static hysteresis magnetic damping phenomena by extended with The pre-indexing method overlooked some words that are pertinent, the pre-index contains a sig- such as "ferrite core", but in general, nificant number of the important words from the title and abstract. Only a few words of the pre -index words are not pertinent to the topic, such as "an", "by'', "with", and "extended". In general the pre- indexing methods are very good at eliminating verbs from pre -indexes. The word "extended" is included because it has a high usage rate from prior usage by the human indexers, who probably used it as an adjective rather than a verb, Overall, however, the pre-indexing system appears to do an effective job. The Role of Function Words The results from method I and method II, Figs. 14 and 15, apparently show that method I is far superior to method II. vance; as illustrated in Both methods yield very nearly the same measured rele - however, method I gives a completeness measure a full 33 percent higher than method II, Yet the only difference in the two methods is that method II eliminates high -frequency words from a pre-index, such as "in", "to", "for", "the", and "of". include all title words in the pre -index. Both methods Both methods have the same usage rate threshold for words found in the dictionary. Moreover, both methods treat new words in exactly the same way. But method II excludes numerous function words by excluding high-frequency words. Such function words have little informational value. In fact the two most frequent words that still carry information are "magnetic" and -28 'field", with frequencies just under 30 percent. Words of such low frequency have not been eliminated from pre-indexes. However, human indexers include high-frequency words in subject index terms in order to make noun phrases. (In INTREX each word of a noun phrase is being placed in an inverted file of subject words and its position within the phrase is being recorded. The purpose of this pro- cedure is to allow users of the INTREX system to state their requests INTREX plans to test and in terms of noun phrases, if they wish. evaluate the merit of this kind of capability.) When method II elimi- nates such high-frequency function words from a pre -index, the completeness of that pre-index is reduced. Nevertheless, mational value of that pre-index is unchanged, function words carry no information. the infor- since the excluded The comparison of method I with method II shows that fully 30 percent of all word occurrences that are eventually found in the human-generated subject index are merely informationless function words. Thus, the comparison of the two methods shows that the measures of completeness and relevance unfortunately do not fully indicate the quality of a pre-index. How- ever, if the problem of high-frequency words is kept in mind, then the measures of completeness and relevance still give at least a feeling for the relative results of pre -indexing trials. The Effect of Varying the Usage-Rate Threshold Within both methods I and III the usage-rate thresholds are varied to determine if the exact setting of the usage rate threshold affects the pre -index significantly. In method III, where high- frequency words are excluded from the pre -index, the settings of the usage -rate thresholds have very little effect on the completeness and relevance of the pre -indexes. This is to be expected, since only five percent of all low-frequency words found in the dictionary have usage rates that fall in the intermediate range from 0.20 to 0.60, the range in which the thresholds were varied. Thus, the particular setting of the usage-rate threshold has negligible effect on the pre-indexes generated by method III. However, in method I the setting of the simple usage-rate threshold seems to have a much greater effect on the pre -indexes generated by this method. This is not surprising -29when it is remembered that method I includes high-frequency words in the pre-index; their inclusion depends on the usage rates of such high frequency words. Since the usage rates of several of the higher frequency words fall in the range of trial variation of the usage rate threshold for method I, the completeness and relevance are more sensitive to the usage-rate threshold, The Effect of New Words on Pre-Indexing Examination of Fig. 17 shows a noticeable difference in the pre indexing results between method II and method III. However, from the preceding discussion, it appears that there should be little difference because of the different ways of setting the usage-rate thresholds for the two methods, since both methods eliminate high-frequency words. Also, recall that both methods include new words that appear at least twice. The difference in the results obtained from the two methods arises from the different way of handling new words that occur only once in the abstract of a document. In method II, such words are considered as ambiguous words and may be included in the pre-index under the proper circumstances. By including such single- occurrence new words, method III has a higher completeness than method II, since the inclusion of any word in a pre -index can only "help the measured completeness of that pre -index, On the other hand, some of the new words that method III selects for -inclusion in the preindex are not used in the human-generated subject index; hence the relevance of method II pre -indexes is lowered slightly. In the early experimental stages of this research, all methods of pre -indexing included all new words regardless of the number of occurrences of that word in the title and abstract of the document. However, changing the decision rule to require at least two occurrences of a word for definite inclusion in a pre -index appears to have significantly increased the quality of the pre -indexes. Under all methods of pre -indexing, the relevance of pre -indexes increased by approximately eight percent, while the completeness suffered byonly two to three percent when the decision rule was changed to require at least two occurrences. -30The Completenes s-Relevance Trade -off Further examination of Fig. 17 clearly shows that within a preindexing method, pleteness. relevance must be sacrificed in order to gain com- Moreover, comparing method II and method III, which differ most significantly in policy toward new words, that the same trade-off is made between methods. pected. ness. one also finds Such is to be ex- Adding any word to a pre-index can never reduce completeIn fact if every word of the title and abstract of a document is included in the pre-index, then the completeness for that pre -index will be 100 percent. But the relevance of that pre -index will be ap- proximately 35 percent, since only that percentage of words in the title and abstract of a document are used in the average humangenerated subject index. (The analysis of document records in Phase I showed that approximately 35 percent of the words of the title and abstract eventually appear in the human-generated subject index.) When words are eliminated from a pre -index in order to in- crease relevance, there is the statistical probability that some of the eliminated words are actually necessary to maintain completeness. Hence, completeness must be sacrificed if rele--ance is to be raised. Moreover, as is found by comparing method I with method II, a full 30 percent of completeness can be sacrificed without damaging the informational content of a pre -index merely by removing the high frequency words from the pre -index. The Effect of Dictionary Size Even with a dictionary containing all the words from the titles and abstracts of 80 documents, many new words are encountered in the pre -indexing of new documents. An examination of the curve of dictionary growth in Fig. 7 indicates that even if the dictionary were obtained from a much larger document base, new words in titles and abstracts would continue to have an important role in pre -indexes. However, the dictionary does contain a good collection of 40 very common words, which appear in over 30 percent of all document abstracts. Also, an examination of several pre -indexes reveals that the dictionary contains most of the verbs which are encountered in typical abstracts. The three methods of pre-indexing are very good at recognizing and excluding verbs from pre-indexes, for example, forms of "to be", "to report", "to have", and "to discuss". In fact a large portion of the selection decisions that the pre-indexing methods must make about words found in the dictionary result in the exclusion of those words, rather than the inclusion of the words. In addition, most decisions must be made about common words, since such words appear most frequently. Therefore, the dictionary used in this research is of sufficient size to provide a reasonable test of the postulated pre-indexing methods. Only a very much larger dictionary would alter the proportion of new words encountered, and it is doubtful that the results would be substantially different. CHAPTER V CONCLUSIONS AND SUGGESTIONS FOR FURTHER RESEARCH The primary conclusion to be drawn from this research is that automatic pre -indexing by machine is feasible with the application of simple techniques. Furthermore, the implementation of the pre - indexing methods developed here can be accomplished with computational efficiency if the pre -indexing system is designed as an operational system rather than as an experimental system. The first step in the design of an operational pre -indexing system is to select a suitable pre-indexing method. It is the opinion of this author that method I is not so useful for pre -indexing as are the other two methods. Recall that method I allows the inclusion of high -frequency function words in the pre -index for a document, In a pre -index which is whereas the other two methods exclude them. a list of descriptive words, function words have no place. The remaining pre-indexing methods, II and III, present a trade-off between completeness and relevance. Method III, which in- cludes a greater percentage of the new words that it encounters than does method II, tends to be slightly more complete, but somewhat less relevant. However, it is necessary before one judges the relative merits of methods I and II to look more closely at the figures for completeness for both of these methods. The graph of Fig. 17 shows method II with an average completeness of 53 percent; method III, 59 percent. From the prior examination of method I, recall that approximately 30 percent of the completeness measured for a pre -index is lost if highfrequency words are removed. That particular 30 percent of com- pleteness can be safely ignored in considering informational content, since the high-frequency words removed in methods II and III are all merely function words. Thus in terms of residual informational con- tent, the true average completeness of methods II and III probably lies in the range of 80 percent to 90 percent, after making a correction for the extraneous 30 percent. Thus, both methods II and III appear to -32 - -33 contain a substantial portion of the informational content of the title and abstract of a document. Nevertheless, method III, which includes more of the new words, would probably be the best pre-index for an operational system because of its greater completeness. The price that must be paid for the greater completeness is lessened relevance, but the larger preindex generated by method III should not create an undue burden. Another distinction between method II and method III is that method III recognizes and provides special processing for words having intermediate usage rates. Because a small dictionary was used in the pre -indexing tests, this difference in methods caused only very small differences in results from method II; with a greatly ex- panded dictionary, method III would probably become even more useful. Streamlining the Pre-Indexing System Much can be done to streamline the pre -indexing system for either operational use or further experimentation. Most of the pos- sible improvements should be made in the organization of the dictionary files, Presently, the files contain an excessive amount of data that were collected in the automatic analysis of documents. At the time the data were collected, there was no way to determine just what data would be useful, From an examination of the accumulated data, however, it appears that the breakdown of usage information by subject-term weights can be discarded, and only an aggregated form of usage data retained, Such a condensation would reduce the size of the files by 35 percent. The use of sorting techniques in organizing the dictionary would improve the efficiency of the pre -indexing system. Proper organization would allow the use of faster search techniques in dictionary lookups. Such organization of the dictio- nary and implementation of improved search would be necessary for an operational system, or even for extensive additional research, a pioneer experimental system with a relatively small dictionary, such refinements are not justified. In -34Additional streamlining can be performed on some of the programming. Several operations that at first appeared necessary for a preliminary experimental system are not clearly superfluous. Applicability of Pre-Indexing to other Subject Areas All of the documents used for this pre -indexing research fall in the area of materials science and engineering. Thus, the dictionary contains the special vocabulary used by scientists and engineers of that field. A question arises as to whether the dictionary and pre- indexing system developed in this research are useful in other subject areas. Unfortunately, there is no document collection from another field available for direct experimentation. However, the nature of the pre -indexing system would probably allow it to be effective in another technical field. The pre-indexing system does not work solely by choosing recognizable words from a title and abstract; In fact, the system also eliminates many common words. most word decisions made by the system are to eliminate a word rather than to include it. Many of the words very common to materials science are function words and therefore are also very common in other fields. such words, The pre-indexing system is good at eliminating since those words are the verbs and prepositions. Hence, the dictionary would still be at least partially effective in some other field. Suggestions for Further Research One item of obvious interest is the effect of dictionary size on the results of pre -indexing trials. The dictionary used in these tests contained approximately 2200 words. affect the results in two ways. A much larger dictionary -nay First, the number of new words en- countered in pre -indexing may be reduced. Second, the distribution of the usage rates of the known words may change, so that the setting of the usage-rate thresholds in the different methods may become more important. For example, in a dictionary built from 80 docu- ments, many of the words contained in the dictionary have appeared in documents only once. only either 1.0 or 0.0. Hence, the word can have a usage rate of But if the dictionary were to be based on the -35analysis of 400 documents then perhaps several of the low-frequency words would have a sufficient number of appearances, usage rates could fall around 0,5. so that their An analysis of the data contained in the dictionary as it grew to its present size indicates that any significant change in the distributions of the usage -rate data is far in the future, if in the future at all, A comparison of a dictionary based on 35 documents with a later dictionary based on 80 documents showed only a very minor change in the distribution of usage rates. In both cases, the usage rates fell primarily at the two extremes. The effect of reducing the number of new words is also hard to predict. Again it is very probable that a very much larger dictionary is necessary to substantially affect the number of new words encountered in the title and abstract of a document, Figure 7 shows that the number of new words per new document is dropping slowly after 80 documents. In fact, a base of 400 documents might be neces- sary to decrease the number of new words per document from over 20 at present to about 10. A shortage of computer time prevented tests with an expanded data base. However, first a streamlining of the pre -indexing system to a more operational form, followed by testing with a larger dictionary, would no doubt be worthwhile as a prelude to making such a preindexing system operational. The Possibility of Using Word Stemming The value of stemming in a pre -indexing system was not tested during these experiments. However, stemming mayhave two very useful effects on a pre -indexing system. One immediate effect is to reduce the size of the dictionary that is stored as the data base. Many different forms of a root with various endings are-presently stored separately in the dictionary. However, by stemming, all words that have the same root can be lumped together in one dictionary entry. For example, the entries for "magnetic" and "magne- tism" can be merged. Such a consolidation of data significantly re- duces storage requirements. Moreover, the stemming of words also decreases the number of new words encountered in a new document. With stemming, a new -36 ending on a known root would be considered as merely another instance of the known root, rather than as a completely new and unknown word. APPENDIX A BREAKDOWN OF WORDS IN DICTIONARY BY USAGE RATE AND FREQUENCY (Based on words from 80 documents) >1.00 368 6 - 3 - - 1 0.90 - 099 - - - - - - - 0.80 - 0.89 - - 0.70 - 0.79 1 - Usage 0.60 -0.69 1 Rate 0.50 -0.59 3 0.40 - 0.49 - - - - - - - - - 37 - - - - 0 to 9 10 to 19 30 to 39 40 to 49 50 to 59 60 to 69 70 to 79 80 to 89 90 to 99 100 to 1 - - 1 2 - - 0.30 - 0.39 - 0.20 -0.29 - - 0.100.19 0.00 -0.09 20 to 29 Frequency of Appearance (percent) (a) Title Words >1.00 660 20 4 0.90099 - - 1 - - 0.80 - 0.89 4 2 - - - 0.70 - 0.79 9 3 - - - Usage 0.60 - 0 69 22 4 1 1 Rate 0.50 - 0.59 52 3 2 - 0.40 -0.49 6 4 - 0.30 - 0.39 25 5 0.20 - 0.29 15 6 0.10-- 0.19 0.00 - 0.09 1210 0 to 9 - - - - - - - - - - 2 - 1 - 1 1 - - - - - - - - -_ 10 - - - - - - - - 1 - - - - - - - - - 19 9 5 - 3 1 - 1 - - 10 to 19 20 to 29 30 to 39 40 to 49 50 to 59 60 to 69 70 to 79 80 to 89 90 to 99 100 to Frequency of Appearance (per cent) (b) - Abstract Words -37 - APPENDIX B RESULTS OF PRE-INDEXING TRIALS C = Completeness in percent in percent R = Relevance Method I A Usage rate threshold: 0.25 Doc No. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 300 *302 304 305 364 383 401 402 415 437 606 619 620 621 622 623 625 719 721 72Z 723 724 818 905 907 908 909 911 1102 1106 Average C 82 93 87 89 78 97 96 100 90 72 88 91 87 78 78 75 84 91 83 90 94 94 97 93 81 95 89 91 92 88 R 72 80 86 92 88 45 48 19 41 63 69 79 48 55 82 29 43 51 56 4i 52 71 65 73 43 67 45 29 65 75 88 55 -38 - B C 0.50 0.75 R C 80 76 89 86 84 87 87 94 78 91 97 48 96 51 100 23 87 45 70 68 88 73 81 91 51 87 78 58 72 83 70 30 82 56 84 53 79 61 89 43 94 53 90 77 87 65 74 93 81 45 70 91 77 43 34 9] 84 68 88 75 C 74 73 85 82 75 94 96 100 87 76 86 87 87 64 71 70 77 80 76 59 94 81 81 65 68 78 73 91 80 79 R 84 88 89 100 96 58 59 33 57 73 87 86 58 57 92 36 52 61 69 37 64 79 78 74 49 77 53 44 80 90 61 79 69 86 -39Method II B C 25% 50% 75% 0.50 0.50 0.50 A Frequency thre shold: Usage rate threshold: Doc No. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 300 302 304 305 364 383 401 402 415 437 606 619 620 621 622 623 625 719 721 722 723 724 818 905 907 908 909 911 1102 1106 Average R C 40 74 34 74 45 89 95 53 42 87 60 68 57 61 36 20 65 72 32 63 46 76 38 69 58 64 41 64 34 96 55 58 43 41 40 47 47 79 35 40 65 69 30 58 42 62 65 87 38 48 38 62 48 63 66 58 57 80 88 63 C 42 39 45 61 45 71 57 57 65 36 51 44 65 41 37 "0 43 46 55 38 74 41 44 65 38 38 50 66 59 67 R 72 74 86 96 85 71 58 27 54 59 73 70 51 63 93 58 42 49 80 3B 66 62 60 79 39 56 50 45 74 89 C 42 42 51 66 47 71 57 57 65 36 51 44 65 49 40 70 45 50 55 40 74 43 44 80 38 38 50 66 59 67 R 66 75 85 96 86 61 53 24 72 58 66 67 57 64 90 50 42 50 72 37 62 64 55 82 36 53 45 42 64 84 67 52 64 53 62 47 -40 - Method III A C 75% Frequency threshold: 75% 75% "ambiguous" zone: 0.2 'to 0.5 0.3 to 0.5 Doc No. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 B 0.3 to 0.6 300 302 304 305 364 383 401 402 415 437 606 619 620 621 622 623 625 719 721 722 723 724 818 905 907 908 909 911 1102 o106 C 47 49 60 74 53 74 59 57 69 49 54 47 74 56 49 90 52 60 58 46 77 51 50 85 49 43 66 71 67 71 R 57 70 79 97 77 54 41 17 43 57 60 58 50 58 85 43 36 40 62 31 60 55 50 72 35 47 45 32 59 74 C 46 48 60 74 53 74 59 57 69 49 54 47 7153 47 90 52 60 58 46 77 48 49 85 47 43 66 71 67 71 R 58 71 80 97 77 55 43 17 45' 57 60 69 49 57 86 44 36 41 64 33 59 63 50 74 36 48 47 33 60 74 C 46 48 60 74 53 74 59 57 69 49 51 47 71 48 46 83 48 60 58 41 74 43 47 85 47 43 66 71 67 71 R 60 73 80 97 77 58 44 17 45 57 64 53 49 56 89 27 36 42 64 32 59 61 52 74 39 51 48 37 61 74 Average 60 55 60 56 59 57 APPENDIX C PROGRAM STRUCTURE AND LISTINGS The programs for both Phase I and Phase II of the pre -indexing system make extensive use of pointers in order to make data available to all subroutines. The programs are written in the AED lan- guage, which has convenient facilities for handling pointers, The programs are listed and further described by phase. The Phase I programs are used to analyze document subject indexes and generate the dictionary files. Phase II actually generates pre- indexes by the three different methods. All programs are designed to run on the Compatible TimeSharing Computer System at M.I. T. (CTSS). PHASE I The main program, MAIN1, sets up all the data handling arrays. MAIN1 makes room for a list of title and abstract words, a list of subject index words, and an array for storing appearance and usage data. Pointers to the various arrays are stored in a directory array PTRS. Thus most major subroutine calls need only the argument PT, which is a pointer to the array PTRS. which is useful to the subroutines. PTRS also contains other data A list of the important entries in PTRS follows: 0 - Location of TA, the array for title and abstract words, 1 - Location of SI, the array for subject index terms. 2 - Location of D, the array for appearance and usage data. 3 - TL, the length of the title section of TA. entered by a subroutine. 4 - AL, the length of the abstract section of TA. 5 - A scratch location sometimes used to pass arguments in subroutine calls. 6 - The number of the current document being operated on. 8 - Another scratch location. 9 - The word length in a dictionary lookup. 10 - The file number in a dictionary lookup. -41 - This is -42 11 - Another scratch location. 12 - Location of BOOK FILE. 13 - Location of dictionary file in use. Three major arrays are used in the analysis of a document record. These arrays are TA, SI, the title and abstract. and D. TA contains the words of SI contains the subject terms. D is filled with the usage and appearance data for the words in TA. The TA Array The words in the TA array are stored four characters per computer word, left justified and blank-padded. Thus a six-character word will fill two computer words and have two blank characters in the final two character positions. The words stored in TA are separated by a computer word of four blank ASCII characters. The length of a stored word is defined as the number of computer words it fills, including the fence of blanks which separate it from the next word. Thus, a four character word such as 'film" has a length of 2--one computer word for the characters plus one computer word for thie fence of blanks. The words "magnetic" and field have the same length of 3. The length of an array of words is defined as the sum of the lengths of all individual words in the array. The length of an array does not specify the number of words in an array. The SI Array The first computer word of the SI array is a header which tells the number of subject terms in SI. subject terms. The header is followed by the Each subject term also has a one computer word header which contains the length of the subject term in the decrement and the weight of the subject term in the address. The D Array The D array is used for recording the usage and appearance data for the words in the TA array. Three computer words in D are re- quired for each language word in TA. For the nth word in TA the -43 contents of D are as follows (the position of data within a word is indicated by the octal mask): D(3N) - Weight 0 usage from title, 7C29 - Weight 1 usage from title, 1777C - Weight 2 usage from title, 777CZO - Weight 4 usage from title, D(3N+1) 1777C10 - Weight 0 usage from abstract, 7C29 - Weight 1 usage from abstract 1777C - Weight 2 usage from abstract, 777C20 - Weight 4 usage from abstract, D(3N+2) 1777C10 - Weight 3 usage from title, 777C27 - Weight 3 usage from abstract, 777C9 - Title appearance, 777C18 - Abstract appearance, 777C Data are placed in the D array in this particular manner so that it is formatted for direct addition to the word data already contained in the dictionary files. Dictionary Files The dictionary files contain all words encountered in titles and abstracts, as well as the cumulative usage andappearance data for each word. The files are sorted only by word length. Each file is two tracks (864 computer words) long. files for each possible word length. such as..... M03 ... N04. Thus, there is a series of A file is specified by two names, The first name indicates the length of words contained in the file, in this case 3, The second name indicates the number of the file in the series of files for the given word length. In this case the file is the fourth of the dictionary files having a word length of three. For storage, each word in a file is followed directly by three computer words containing data about appearances and usage. The data words have exactly the format described for data storage array D. -44The Book File With the dictionary files as described above, a bookkeeping system is necessary to keep track of the number of files in each word length series, and to monitor the number of words stored in the last number file of each series. mation. The Book File maintains this vital infor- If a file word length is M, then the number of files in the series is stored in location 2M and the number of words contained in the last file of the series is contained in location (2M+l) of the Book File. Operating Phase I To operate Phase I, all Phase I programs must be loaded and started from a CTSS console. The files CATDIR FILE and CATREC FILE, which contain the document records, must be available. program will type the number of documents analyzed to date, request the number of a new document to be analyzed. The then The user re- sponds by typing a four digit document number for the new document to be analyzed. The computer will signify when the analysis is com- plete and wait :or another document number. The Phase I programs will not analyze a document which is missing a title, abstract, or subject index field. Phase I automatically collects all data and builds the dictionary files. -45·.- 4 L ~ · Li x IU.~~~~~~~~~~~~~~~~~~ .1-- u0~~~~~~~~~~~~~~~~~~~~~ ILL *4.U. ~ e, * W a %ft ,~~~~~~~~~~~~~~~~~~~~~~W U. or * In - 2 WZ U ~~~~~~~ux~~~~~~~~~~ ·.-. I I~ X s~0 / U.*4 W. sa ca U.4 W0 IA *0 )--. *U,/llW0 wOZ Z I• S eft a * ~~~ W. O 0 * 4W .t.J ' -'.* LiIm 'S~~n Z LLN WJ o 0 4~1*0i' ..J W Wuft n).ZZ34OUZ I-wl-0 L C oU.eaOt tL 40 wZ >2 CO Zw 0 00. ao 0 Iw - * I-Ce a. 3- *.I~me o W -JUUaQ )llr, C C Z- i ft In In f--W-I- I Xt IL.U.L -L-0 a U, 0-L UJ4 U * a.- ~ Li1- Z VI I :J -a.L) ... 4~ .k ma\Y · a SW~ * *2In-*r 1 - - 0 U ?Un - *O ~. O 0 W I L~O * W O~u 4W IU OW.54 -i4.*£JIJ o ft 2 U.WCIU Li~~n4 *) I LO .~ 4~. Li.i a J. *1-1 ftft£ 0 -* W * *QU\~~Iu.C *a. a . .a -eC ~9~~~~0·U a'"CaO I-I1Inr O t -O-i -- L -c ac. LI IS aOJ 0ccr Li a' a~cr aL 0u Li LI a ur 0 LU1. L ..-`O""'i *ZLi0 .4 ·ZC II £ i *In LICU --QL I .4Cl -4 Z Z I U L lM-J"* -- "L I .4 4 .~Ui~I I. 0 C In W ' ·VI CL i- X V) · r~~~0 a.Tr a. -C 1-r t a 0 LIr.~~~ 051-~~~~~ I' , . n 4 n In C) 4 _JtX0 ~. -- a- InC L-I ~. Li~ ZUL .aaILf1 ~a.OL ~LIIUt a. LZIU Q ft~r a L a a· . 4~ .04 4cu ON1( .4~~ .1-·0 u i LIL.Z U.U~ LnY 0 0 ' a.n4 co J U ft 42O 0 4iU S) iO.1a.ILL~r, 02 32D .A S a.ZCa. 0.1-U U ~ · u 0 4 1*1 *U.- , £aIJ OW U.L4 I0 v 1UNic.aW IL 0*(0Vl-ftL~.I-1 0( 1 ft £ Wc.Li3 X. V, w.aaaa4~-~..f-LL'J raa-S. *.O *0c-L * 3- - I luYI -*InX *~~ 0 O.C£0I 044Li~~C 1-Z LILIWWI-"I- InW 0 4 Z In <.L Co~ ~~~ ~ ~ ~ ~ ~~0 ~ *LiLi4~ *44. 44 f. ~ -0 Li4L4 ~ 00I ftft* ft~ Ont.0In 40 . . n Z -U . ~~4 3 2 ~ . 4W2Li3ia1*0 ~ ftf ~ .11 _ ~~ ~ fttJ.fttiL * ~ W ac 0U . N~~~~~~~~J 4 ~~~~~0a.I* 01Oftr .4.03a0Il .-.ft-ft -0.-.-4f~ u U~tftttt1W I W l. ~ U.Li WI-LL *~ -' 44 ,,*LW W 1-t * W J*-.J .2 * :o U U Z0 a0., a ft a,--' 4. - SL -'IL--I .' * -XI" UU. . , I~ I- A W ·_ - 0 41~~~~~~~~~~ · * N * z U - -J XUJU 0J C IXW *0 o *4/Y *U · Z- leW * -4W2 , l w Z oz' *40 .0~~~~~~~~~~~~~~~~~~ U, J. *4 ~. I)( LC cI SA LU .1 aI at >. 0 -4 4 L "0 . z Z ftJ. L).)0 · c 0-W c~ *X 1(U.0 -Wi--3 uL 'n Cft 04W c ZWWWft: ra.'-X Ut Xa o~~~~~~~~o Li ;W 3X ---C - U1 - .0~~~~~~~~~~U 1-1.11.12 0 1(0 n.LLI . U 4 a,. . -0Ix a0 0ft2 2 X f£a1-J r a .e 4 0(YIC x.II 1-C.W - Iv I a C £.4o *W 4 *.0114~ I" a.i1·-O .004 ~---- .14 InwWZ I ) I aW In ofZ L WU. X 2W I fo~~~O 2 iL 1 I.u ZC ~U O .LIJ : 1-IO *Z *_ - ·C LiL C L, U, *0 -- tt)u U J ZWC W LU CZ 0 UL. 0 JL -l -40 O~b- Z * Z Z 00 aX~O *0. r0e a. ZI ft 0 0 ft 1-1 0 -~Z -e *W4·u~Z4Ov uw a. 0,,.4 -w 344 41-'.4 - V4 W I~P~~ X cc 4 a. ~~ ~~~~~~ In X , IJ W'·U C " WC +" 4 0 - 1 (L UL.O.J' atIrZ LiL OZa. .4.43-IO ). f 4 a 5 UUIZ 1 IIZA aUbWf OP*3 u.L)*Liftflu. WWOUi.-OCa.OOX ft-lft "cOw~r u. a..I'OC) za.LiUiOi4W0.laza .L. Ii,. ,-Zft 01-4)- *0 ft a *4 */ .1W0 InZ L Ud -. *0 ~~ Wf~C~rt>I-.1I-. 0 z",0 -11 IU LO O Zc Z L.1ftZ Int ~~~~~~a ' '. a.~ L 0 IW Lt Z1. fC WI W ,0 luu~u C411 >c-1-01-~ zft *' n W£a~f cO * au % £' W0 Ia...JOuiLI 4 0. .0 .ft OW-0 0 Z. ft-0 . 00.WI.I4W O 3 Li 0 L * so w Or ft4 sL~O~ ga ~~ ULU. 3 0 U, . i, eJ C Ix ~~~~ 01- C 0L -- * 1 -U.Z I Uft0 Z 1 Zn uL 3, i.: Z04.-. W 4240 -I 3 0 Z a cnom cO 1-1-.JW -U Oft W L IW *4 t W-· ·. .0>3_U Li 0 2~ -w I * f 0 a. I-.I *~~~ w3 -a * W I *3 * U. 0 0eI)f Zg Li W.j -u.C C0e *a. U, 41W 0ftZ WL.)f* U, U.I Z~~~~~~~~~~~~~~~~~. U * W. 5.1 - U.W Li Z ulO er - 1o ...or. * * a w ,rr :0 zooOw 0 -CU - C IVI In 0UI . I~ .i W0o . 0 31- U J ZtW . * ofl -WI * 0 o~ r" r LiV)Z . L 0 cXN0 U Z-W , LZi -46 - * 'g * 44W CU. 0C Ud -1 NO* U ~ ' ,QO a ~ 4r~~~~~~~~~n~~ -Q Ww - * * a* :: U,*L, 0.U VI'A W... 5.40.U. 43 U) U,40 4-W .( D-1 C 4.4WU 00.-43 C, ~ W -C 4- 1 0I U S-Q 3.1) 434. 33 4~ .30.4 -cuO *~ *LL 700. 04Z 004- * Sa4 ur ~ ~ U..5C)U 4 3. 44311w L 0.0ZZ cc.1- .3 -7 W 43.u 3 0. ZZ *4dN* 4 4-4 0lwcc~rl~J 4 ~coZ0 UO0. Q I"'ll 0 -9 0 3.. 4 - -- J ~ I 4JI 0. 01.4 W0 t UI W ~ * 0.) W~ UN YU LI4 - - 0 I C a, I w- 0 U, -J -. 2 Z 004440N0.4-QJ~U 'O Z 0Z O . 44 NN00.L .JUII4.J-.4 t 4*44 *4. 4U)U.L43 4-4 .70.0 0.0.0.0 U. U.L r-t 4- - 3 0. tW U. 0. 0 W UN 4- 0.LI Ut U *. U . 0. Z 4-QU MU. to Z U U- . Z. 0.0.0 a 40.1 0 u~~0 . 4-0 00U. 0- -L0 0. "J W 4 X ~ )W *Zt- W 0U 40.4 'rc--rr·r-4r-*4·-~4 N *4C: *WNO 4414…I-u 0 % LI 0 0 0 3 04- 40044 3 .4rN . 4 4- 0. 04- UN4.t. Q a.ULC0 0Z 0> ID WCJ 40.1 -0 00 -J I 0. · ZO)W LL 00.M) Z ZWU. I 4/143 W4OU UN OZU.0 4 M UX0A ZU Z . ... z0 W ZU N'.-a LJ -t 00 43 0.W4ZX..a P ZU Z 4I X~ 0.4.) 5 a WN. 0WV 0. . M U04 V4 I 00Q 33.. N Z0 * 0u. 0 YNN 4104.U L~I a..1i- *Z a.- a.0.; 0.D1U 4-41 0. 04 4 W.0 - Z~~~~~~ 0.UWW ri~~0 04 . L) 0. *Z> 30 ZZ 0LI40 40.0 ZM 4- . N 4-*Z X 4-..L0 ·4C X . . *A 0.. U 4 4)( z0 LI rU.1 U. -wz3 0 I .Z.0. 4- -J WN 14W N 430 O 411 U 0 . N. 0 4N4..4. ZbO -J-DN-U, W e U, 0 0 D W.m~ -O 3 40L Ze W * W D 0, O U Z. U) 4-· * X U L a~~~~vrCC , u 4 4 0I 0 4K w 0.- 0 U4''J L co PO VI ~ 40z aU.444J. YV)L a 04-4-~L J.-L co U,(X I X K-W40- - 1. . 3t X I' ' .4--4 0. OW 0. U' U~~~~~~aaL 4-4I - I 4- 2 W WU W .Z4 (U.4 0at L. 04-C of4U. WKI . '4- 40 Z Wa 0 .4-0 a W 4~~ 23 N X U,",~U 0 4N.J 3 0 3)4~~~~~C, -JIZ II WU 4P m2O 3U.. 0a . XL) 4r(30 0.0 I44 .4 W -r~ .4~U 0. +3 K4 K~ * 0 43.L0W 0...J7ZIQ 3)4-)(W 000...0 4 0 0II C J X or.O U.I-,U 4 UX D IJO 07D 4LU owZ 43 C~r LP 44-70I3IL ~~~ 04 0 4- 0 44 . -J X Z U, W 31 4- A U. 0t 7 0. 4U, 4 4a4 0Z U. 4Z 0~~~~~~-' -- a. 0z 0 U U. 3·I~ .44 0.rO 0.WJ W I 0.03 Z9 'A 43 W Z 0 4 0 U, ru a,'4. 0r , 0 .4 r ~~~~~~~~~~~~~~~~~~~~~~ Z XC· Z IW Z 0 AU.D U L. W 3 0 4- U.-. ~ - .WU 0, * 0 ,C.I--4-C 0 0 O 4-14- 'JL4 aO ma. Z 04WCt Z~~x U Ut43C 04- C, -- )47ccL.4 74-I' C, ~ 4-Wv1XI 4 4 wX. 4 . X CD U, I-0 a 0~t Z U U U 0 40e Z w ~~~~~OT~~J ~~~~~~~~~ u~~ 3U, M4 .eU, 0 UaC - N Z4-~ M 00X · 0 . 0 03 4--40~~~~~~~~~C ccZO 00. . -14W be 5K 4 N . U U , X 0.U 07 IL) ZC Zc- * No~ 1 4..444W l 0.3.LJ 0.0. Q44U WW 0.0. ).LmcoL 4444A4 *0 ~~~~~~~~10~~0Y Z O. 0 0 0 Z- " 003W> U, 0J 00 a W OWN-I 0.~IC Z LZ KLI4 4i-L -4414 +* .... C~*~ WW-K4-0.4-·r a W 3J c - 43 .I * -~~~~~~~ N'I'U4UJ.c 004.4 0. 4-4-0.4r40. 0l·-. W4-43IZ.4 3 0.0.0.0 WOO g . Z 0. 3---- 0? ZOOuJ 4-O * 4-0.0 0.0.LI4*- (.4444 *4J S 0.4.44z~r * 0.UU lc X4 O*. W 43U4U 3aW W W0 44. Z O...0. 00.~ *0. W)ZC UK UO Ua 0.orWW >* 0rX .- 0 0. -.4n L041 44 4i4W41 M I U. a I0 O 0X0 0 i !:.a W Z % 43W 0 Z-4-N-U 4- W* D C OWn 07 03.. 'S1 Q9 . 54'Ld-04 *a * Z % LYZ 133 4- U, LU ZZU.0.- AL0 X *U.cc* M 0 ZI0 O c0 CW Si Z cc C -aq~44 -zV) .4M 0 KW 44 w07 .0. -47- x~~~~~ 4 0.W~~~~InYL W Q 4- W U 4 b 4 U) 0. 0 IL, U. o - 0 - z 0. 4 3 3- 4 co z 4 x u 41*4- 0. 0. 4 a £4~~o 2. -I Ovn0.In3 U .J - 4 x . 4 - U4f U 0 4-2 IU U u 4IV 4 .32 U 4 0.*.J 44*0 In * 2. w3-r 2.40. InZ2 *-EWY 0 In In 4,~ .2-4'-- 4 Z~ 4 W44 U 0v .45'U0.4 ..*"* ~ .. JW 4 ~ ~ ~4(0 4-. Lk 0 3- UU £4 z - 04 * 4 UY 2 0 0.0.OOU.40-400-Z 0.-2. uD k 40 0 L 4- - 4W In lo Z z 9 O 4U 0.. InU·~ 0.0 W' .32 .z o 0. 4V'o 4 *L cx z- mc%. *2N 0 2In~Y >lLUU~. w2.ru 4- 4-4 . WI)49 4 U, *U.4 0..02. ~r OWc 0I 44 0.~Ic 0...)0U4 *-.. WWL4U33r 0.-4.T.2 · 4- ..L 4 4 Q 2. . W . 0..03004 04343W4.~ -42. 4 14 . 4- - 4-4-4-04-4/43Z~ 4.3 LLZ0.L 4-U4-Z 4 4LI .30.J~c Ucn U 4d443U*, --0.WD340 4Q.c 034 4 4 W00I0.ZnU U4 4..4..J .L4-4-4U.4U4-4WU.Z.3a- 4 444 UwL Z. nU2.4Uvn . 44o z44ZW)U442 z cc 4I T 014Z 40n 4* I I 2.0- 1 W ).-....v ~ 0 WUU) 14 I .4 UII) £ 4W.I~OZ cum 0.4Uo0 InWZW0 O4v * ZLO 3 u -ZZ~W ru uO 4-.2 4-*4 L~3 4W 4--0.U O0WW2.r\-3 0004Z MmI 0 .ZZOr d ..In 44 4 a w Z3L, 0-? OW. )4. .) U. ZU 0 0. U U U 4: Z - 90~~~-Q U 0 0 0.U~ (3~ OU 2W-* 4-W U.~~~~~~I a. *4 ZL LVI c 0.1 .3 0. 414 14... 000 -42...J002 9 U 4PW -" ~~~~~~~W0 U IL440.~~~~~~~~~4 ..J*0 4-UUU-4--Z 2.2. 24-WO)4 Ob~~~~~~~~~~~~w 4. 3 J U- I0. 3.4.*C3 44 40~~~U 4a 0 -+ - U.QTc 4-U 4Z r-In - ZOZIZ 44-U ~ .30 44' 02.In0 u 2r c 0C I 0. .0 40 4 -·Lt 2.U L 24-.0. -I 4 ..J* LULILC I owclauO 1W( · -2 N I 0 Utc C ~ Q 44 V)4 O OwOr O~~~~~~~~~~~~~~~~zIL I WO..J 4--24 QL~ 4'-0.I*U 2,U-3V . *IW 44-2 -O ' W0.0.4 4-~nUn.4U.44n C U· 4 - -X 0 4Z4.4 4 .4 In -2.0 4 0.2.0.in-.22 4 -N *-44-. U ---4.. .3 0.0 4. £+ Q** .WaIn*4 C~YCQ 4 W3WU.2. 4.. t~ 4-U a4 Uw 0.1.X 0 W4 44a .1.- U-3I4W 4urr4'.O*4-F La-·lcn In * 2 L UWLUn 2.4 4- * ** W 4.30 4U. .4-* 4~ p 44 .3 0 ~WC Lm x0. 2.-W£ 4-.4 -1 ~~0. 40. U P In 20 U 4-. *4 4 4-Z0Z 0. w~r~ xrrZ~~~~cZ 24U 0u 4U.0.4~l~~ *U aU U 420 u -. 1 0.J 4. 0. U *.JW .34C3 4* £ 4WUOO *4U2. 0In 0Z 4-40 0 0.2.0WU44-- ~~~~~~~~Zw~~~ U2. 22 -4041W~ r~~ 0- 30 0 0.0. 0.U , 01 £43C Q-OQ O I *~r . ."n W0.I ~ - W) r - -T£ o3 00 4-4 ~r 44402 0. 2 .J0 0.4- -u.2.I-44 UY ) -.4-4W2 WI.In 2 *. - 0. l 'aU. 44. -O .2-. 0.Z Qn U, W. W Y4 Q I z 2cc 4 -2.0. 0Icn u, 4 wun WIL.-U Z4-U. U0 -w ~ ~ ~ ~~~~~ ~ ~ arU W "I4 ~~ ~ rr r 0. 3 4 2 o 3404-. wu w 413 Z)*u £44- - ~~~~~~~~~U, 44.1* Iw 41 2.I w-2 cc Z4 Lu a. OZWuOU.2.00W 0Wvc430--0U£ >. WIn 4 4 3 wo x4W 4-QU 0WW2 4-7IW 2 QUU-L -"1 ** 44 o. 4 N U. u~~~~~m 4--4 0.OW0. 0.4 2. LU 43 2-·U 0 rw ZU2.O 4wt-W4- U 4- 4 I * 4- 03 · * k444 .3rr *4 .4-44.4 U. W0. Z In . Z 4 4W wD 0.W4f Wu 3C10XQ 0. 0W0. .34 4- 4-44 .3~42.44-4 InU w 2. 44 -4 ow 0.4or z .. 0. U U . 0(3 -. -x - - Of at, 4W I- 34 4 Z 1 U. U 0 * I W.--.4U.4- * 4 0 2. 4- *j4444 G 4- YUJLU *4-~~W z>-. DU 04W4 O£4U. -r 2.0 UI 4-0. *In2.I 44Cc.3 *g~ 2. 4N-4 01--U ~Q*4 ~ ~~~44 ~34.3 P ZW-o U 0 2 0. In 4.U£4. w- 0 4-44- ~-z U rxwr-.I-4 Z44-4444--0.0. .14-0. 4 £40.44I 44- 4.04N4--4.12..0. LUr £4 UY -O or- 2 01 U1 - O~rw 3.12 2.- 0 4 woz 0. o 0.U WoZ 2. 2 0 LL~ 0 .·*Z 43 4. 0I 0 Z 4- *~w4 + £4 2 4 4I4C -~~~~~~m * 1 0e. 3* *4- 4I .3 c 0 40 4 44- -ZIA 0W W) WU(W-t U . 4 *U` 44 .. £4 - 0 - . 2 4 0. 4In U x 7 9 LW *Www0) Uz 4 .. o Is 4 Za Z ur 0-.0. .d J 4 41* 4J -l 4- cc ~~~~~~~~~~~ 0 ~ 2 4 0 O 0 U, 4 3 2 o - U.e0. of 4 2. 14L 24-0. U. U WY 2. 0. - 30L2. W.0 3.2. 0...U. . U.24 23 0.34 U w -4 U W-*230U 2. IUWUt 0 4 4 P InW U. U 4444 -48 - X - A0- 0 W - J~~~~~ 0 4 06~ CW~~~~~~~~~~~~~ 2 2. u06 0* * 0- 3 U UU u 0- 0 W U 00 4 W 0.Q 0 Uu*I tUQ U W U 0 W c 0 4 W W W - V0.U. M - U0.0.0.0.0.0-~ M MM Z XYYL , mmmma 0-----· 3333300 06006006 U~~U 0-0-0fl 06. NN~n06000 X 0. wr*S Z ZZ 01W Z CD! "I ~~ ~~~.0-I-. U,4* W~LYYILUYIW -5 * 0 W .1.1 3 3 - 0--00( U0.J 00002.0.0.---J~X 3 06 - 33U W 32 ~ ~ M. .-0. ", -z Z W .33 UQ ZOA. *-W - 0. 9L-U W Z 06U, -. I~0I IU ..J4*4*4*.... *.J0 UZ 1U04 W 0 U 0 U 0- 0. 0. 0-L.- W. W0 4* oo0-00.O. 06 I'll,, .. I U U, 32 : 0 I M L 0t AI00-0U Ua0ILZ .. I-U,0.0-030 . 0 W ~ 0- -X 0. A 0 0.V) . LU U .1 U. 4z U J W 0Z4 6 U, 4* U,0 060 C, 4*0.. 14*U 0 000 LUZ0-0 Z UZ .1U, 4 0.~~060 UJ 3 X ZUJOXXWW 0~~~~~~~~~~~~~~~ 3 -~~~~~~~ao Uur~~~~~~~C)Z OX U,0-LU U L0W0 U) UaU0 U,1. -J- f40606 U ZW.1 7 XU. 2 'J 32 X W4 U- Ua X 4 20 , 06 6 06 * U. ~CZ ~ ~ . U ~ ~ D U. XU O.-t4 U. V 4*0 Z06 2: U. 000 WW0.~ UM ZX M ~~J. 0~-.J ~~0 n U0- 0 L . CL 'JU cc ~~~ 6 SO U 0W 0. 06'U, 0 Z.ZW -J 0. 4 W 0- 0LU 006 0- X 060- 06 0 . W 0-0£ C, 6-Z 404* - 0X X Z00 * Z1 Z60 Z0 0.. LU at*.-0d ---- 6*,UQ 60 0 .U U, W0I 0 4. 0 UI LU"IZ 0 U, "I 0X 40 W.0UWU, 22. cc 0.2 0 *U Ko J0 Z 30- 0-t 2.~~~~~u W 0. a X U o 06 * 6 0; -J 2W0Z0 W0 -Z0.M - O X--J"0-" 0 -UZ M , L W -X 1 0 0 . 0 6 0 66 - L4X 66 - 0 M 4* W U. 3,0 - LZ 000 X 0- X61. X.0W -~~~~ 02~~~~~~( W~~~~~v 2. a. 3. 0 *U W .. 0 0 *60 0 0 00Xaa-. ; Z0.66~ a,*-0-~60,0~ 4 Z * ZUU, I0 Z J -J U U 06.00Z-I.,UJ 02 . 006 4 -. WC W)U 330 0 - CD ~~~~ U ZW0-W --04* 0.06 2XUL 0. 0up 3 4*3 OW 0t' . 4,U ~ 0 U, v U U, .0-. XOAJ..U '*. U tC~l 0. X X0 W0 X00- -.- U W W WLUO.W .1 Z- 00a. Z06-06 0 J I 0 6 W0 WJ A- ~ 0 0-0 W Z06%X W -0.2 X 0.0 -..04 U, W~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ X U WX X£ W6((L JLJ J U.U. ~ ~~ ~ ~~~~~ 0 Z~.00AU..L 60 W-- ~ ~cZ~ ~ ~~ ~LW X It3 ZU, 0. 0 UU 4 *Z 06 0.· U 4UZ * ~~~~~ - * .0 2U, * t0. Iu~~~~~~~~~I UJ~~~~~ CDZZw XZC. WOO 0-4W X 000 W££0.£ 4*£WUU,06-Z 0. 00 Z W.1.1~ X j0UOaU,4 0. 4*4*00-U 4* W60X0 I *~~~~.UZOL 3 4*2W322 ' MU, ICICW 04* I.. U, UU2. 3.062. 0-. .1 0-. .00 XJ 2.02. OU.L - 'X0. .1 £ 1.03*00. & W..J C,.0-) M06Z.W 06(20: UUU 20600.12W LU 33UwW 0406..2Z U 42 XU UZ X. ZJ 0~~~~~~~C X "I W..J 0/06 Q60 0- X0- 0- . . . 0.Z0.4*4*-o 33301.6(U U, .W~ - WOXW et U0-U, 0U 0- 0 ~~~ ~~~~~ 0~~~~~~~~W0 ~~~~~~~~ J 0. U. 0 U, V, 3 2XU I- X-0.4. .a£U 0.0.06- - W 4~ 06Ut06040'ULX*J0J U W J.1 W cc Z 0.Z4* W ~~~~~~~Z0-W0-0.w ~~~~~~~~~~~~~~~~~06UVI-X 4* U 0 0 2 . - 0 w J U, 0U U0.o 0 U c~~~~~~~~~~~~~~~~u ~ U .L ON010 10 ZW WJU M M WOUU.. 0-0-0-U W.... "., "IL W0-00--06..06 2 C %.~~~~~~~~~~ 'J -J 2.~~~~~~~LU C3 )< 0.~ C, **4 w 0 U 0. 0.0. I, 0 .1 4*0-W0.0 UU . - - 06 4 U, U, 061 0U, U r~*W.I. W c X.LL .1LXU.00. uZWM /06-~a .O X0-·U, U ~ U U) W0. W L 3 0.-4ZM 4 Z0W 0 0 Q 2 - w 0,00 A LLIL~~~~~~~~~~~~~~ rL J. 4* 20-20. U,I 0 .' 00 U i U0406 £0 "I cc4 W U. .. +W Z 49 £ *Z)1 L 00. 0 W00. 14U.U0 M 0 *. -J 0 M 4 WU 0 4 M CL .1 3606N Z-' I. J.. I 10. 0N Z W0.M WJJ J0£,06 06 0606.O.0U30--0-0I06.*-..J0C 3 P I.- W uj I U 20 * 0 ~~~~~ ~Lc~t~-t ~ 4*** - W 0 06 30 2 *W _* 0 U, 6 ZZUa 0 0. 4*o r~~r, 00-4W 0 I~m~~· 0 a -N0V~~n66. -,..J. *.W 6 W WW 0- 0aU, X-340 4 00AWZ it UWZL 2U .1 Z*I £0 O I WXO "W Ucc ID4 4 Z 0 UW XM w0WU 0toat0 W00-~ UM M Z' Z v n .. I 02 - W Z - WWU >10 ~el0. oU. X4 2-06 -IL IL * 'U Ur WLI 0C 0 UU 02.Ww , 0-a0s OX .- U 0-W. I 2* WI- W U U C W U.--W Uw " I"I UW O------0-.1 . *++++. U 0-0-0-0-0ILa0 IL0.Ow -at 3 W 0 J NJ666666U, W I %Z *~+,,O U. U Ua.- ~ ·Wtf 3 X 0 W IZ 3 Z X0 0 Q&U I . .6 W60 X0,...03., U L 0 U,0 .. -49W * ·m :° *9*· ~~~~~~~~~~~~~~~~~~~~~~~L-K0 4 S * *1 W W·0 4 9-4m U~v 3.0 SW C) .9L- 4 . . -wc IL30 63. *J * 5. * ) W3.4 u U, -03--i 0.0 . -J 60 Wm 4U'A .£ * O *-.9 *.9. * U9 * uv, 9 * Z 9. 1A 3 9a. Q 4 W -I 9 I uI U 4 9 IL..J 4 * * ~ 4- 9 CU 0 W -r.9-'*2* 4949J 4-N4W4· 49-999 0999 a v .4 - c v I u £9~u 00)-'M 6-0.9-4309-093.9- ~ .W U 9*0r~Q~uQL-0 U 33. 990 9-094 UWOW 3...9JW W 4L U - a '9 'L U 6 00 v.90 U 40 00 0 40 U O. UUU%4' 4u 0 U U .-I U 9 0 U 0000 U1'A. 00 U- 4444U4U90.4.40* NNNO9.9 000C009 9-7 OW-9X 933J33S0W3333 I W99 09- 0.~O .99.)-4 90. 0.00 09--9 3900 ~~~~~·N~n~~~uI 93. 9- 0 - 7 W) U9WWWW * W99.9WW 73.33.3..93.3.33.3 .-. 9--9-9-*09-9~-9cl·-uZU~~~in- Q w U 32 4 U W O r~d990-90J994 .1wC 4000000 '00000 0. .. Ja 9. . 4 .0 .. w~O~~~ 3..J9.-99. W0 u9L.r 0.u 9 J9J9 JU 99...9W W-* U .5~ 9-O 9- 9-I 9- Wu W 4W0. 49- 03 0999 IN 03 099 UVIA 9-00 AIU- 9-9 22 WW 04331~04 0.*I 9-U 7~UVU~ * 0442 -0.9 0 6- 0C 4* 4 6-9.9 r. o Zw. 310.2 0 *499 9-- Nl .. N -~r 4 ---7 WU.0.0. =00. U 0 0 A .-- 5 * 2-4w-99.,- 6 9flO~wO0. 3.000w...9Z.9- WI 1W9I U ?27 4 70. -9 *-WZOOU5.4WOOZJUbWO w-w9- i 0W -J 0 4.9 300w9-9-rc0.:v; J5.0 wI 4,I 4 99 . 46 X 0 00 WO 3-Z.9 1 . 0.Z w W 4 j06-0 3 3 0.9.92w 4 L 25..--3J. 9Zw1q 0 W.) - 3. Z. 9-.90003.5.04792: -000-5.300Z..-...90 DrOoo.9*vu *M rWw9J 9-0 U 4 4"WI 7wU 9-9- Z 4* 2-. -. WUO .32:.99 3.u.44*4..j --- 9-99- 0A 9U w OW (LI 9J9-9* 4.0. I 000 9-00-~ -Z 033lur -D 6- ,. ;I4Z4 w 0. 9- ~~~4 0.U 4 -J : 909 C) a'.0Li 9- owr 9.WUV9 M&Mom y21- . W 3yy~LU - w~z 4 3. .9- 9 o Z -'r 9-: c.z 0 w 9.a 'JV OWL Z 0I . 999 0 *4L 2 -Z9. 22 ZO 0 9*0. -w 0793wU-0. 0 0.40. * 9 -.- Z 9 ex -Z -C e 9 0 'L r4 0.. 0 .92 w 03. 0. . a 9 33 9%a...9 'A Z OJQ9.. 020.-Z W-bZI .4.0.W Z *.49w~rr-4w. 0. 0.0-9 -2 4 -I"33.09d92: 9.. 0-- J W u .WI4'-Al*9-.99-9 *wwO-49..9 goCU0 00 9-U U, 9 W. *.je UQ000.2: * 0r *u*9 0~~ 4.... 62: Z9-e I. W 02 0.- 94 .3. - U Z 9- fx 4 3 7N 4 U U Z 0I99 . 2 U, 4 00 1N,0U 4 9U 00.00W 0. 0Z 23.239-0 WW)w 0121W.-J0.00 3. *. J C. 6- 0 9-3. 9- w 0. ooo 4 0at X-U, Oz 0.W9 9..)3.9a. . 0 '. c 49- 0 .. J~ 4 9 3. z'uw .0 4W Z U -.. .9 CZL 0.0-U U. "UJ1-wMuw0?.9-9-b.9-.. U*..30WuLUU 0 . *9.5 w w -3 WZ 00.J4wi 00. * U,. U -wz 9ruw 04Z U. * 00. -0 uO ~ 9 ~ ~ 09- 3. 9-~ 7 O : ZZI * 0.00W 0. 93 wu W3..) .43VI 3 900 03 ac~ U,9 Z 0 = 29-Luc 09-3 02 0 0 0 . * 5.03 * 4 W % z0.. 9 3- * U WZ r * wf C)0 *4 wW19.4 *3. 3.0. ZO 'A. 3 * 0-0 * 9CD *o.. 09 443 a . * z-.)3 0. Z. u w *z OW3.*U. S9-0 .)W *O6-4 .. .0 WU .J 4. . 0. -9- .9900.0.. N 053. 9-. CI--0W 9 U- ,*40 -.-UJ2:2:23.9 3.*4 9..j 9- ZZ7Z-. .4000*3.5. 2:0002:5 000649 *9, 0 U. 0 5.IIv . 9.9 *Y 3.00- 0 0 V*~OUtJY5. 0000.5... - P 0L 9-0 9- 9 42 Owr 0 0 0 0.--~~Q4~UUU~~YCI-I~ 0 9- w 3.0 u. 0 0 4Z 9 I 2w .0 * _ - * 50 - so . a. -3.- rS * 9' Z -. @0 IL 64 * * 0 U W tD S o )I x S a.M-0 S44. * *0 * CL wILZ GL O -W * * U 4. i1 0.&W 4 .Z W *Pa WILO W1 - * O 2 *oa *Z OXQ >C f a *3 * a- a* * 0 -_LIC. 3X *- -- 2 0- - C, 3 - * O O O W WZ 4L 0 t JU . -, _ _ 0.tL .L 3*-o 1 X 0- ZW I.. MJ -N 0.).It L .V I 2 LI ':a f Zw.4 S - Z a. ar 2j.-q - O LW S 01-LW UN G 2 WWNL) 2:w * 1-U 4 OW S vx ZLI-Z X >- Z-Z -4W 4 IL)- I U - I O 2 rUN 0 I u r 0 j L I * - 0 Z - 0 *U 1 UI·U·- I UNN O- Z - -- 2ZLO I 4 S I @42 r a 'I 2'4 -r N 4 LI t4 W @ L W .L 9 N I ZLU -- W ZUN C) Loc~~~Y.L~Y VN-4UNI- w1 L . £ LI LI . VUN. 2 * . ZN- Q 49 - ON 0-N0 0a i .4 N_ M" C*Z 4 .JZ_-4 Z 0-ZO vr W40 IL-LI C L - 4 1- w2 2 2 w IL W IL wI NM4I L - W _ 'e 2 O a O 3L30 C, a. 2. a r N 2223IL--LI 4 3343LI34LI30 XZ'-.42Z01 1- 0 - 1-444. -Y 1- 0 + NI- Z IL. I Wa aO 2Z IL OU, Q IL - OL 3 *a:3 _ IL 4 * I-W N W+4-Z ODL Z VN 2 Z ZIEI 0-0. 04-.. ra a· O*Z2O 4..~ OUNNNLI4N QJNO -O Zaco *^ 0 ZCL+ _e, V S (> ts; Z_-49W 0 . -0 .JN.+JO-O0.32 .9 11a. NZ1-09ILLI934-UJ00. A04 O *-4 4*4NLIZI ~~~a I U LI I* ^. .. *M+w 1-2: .W Y _ W 2. Z 3 IL IL IW 2 . 9 41 L _ O - - - * - a. LU-4G) + 04 444 . 4 - -U " L 4 U, I2 * * 1-44 w-Uz .' 2W *W Q Z WLIU 2 LILLIwa.) 0LI w 4 S S' I > a.W -1 i-LIN > _ ILZ G X 1-1-01-4WOLI2IZa.00.U~a YIIOU~ IWL Z WZ ZW * . 0W O 0O r2 * N, 4< 3 04 Z CLV~) _ WLI.J W I-I W (3(3~eUZ(3WLZD-4- 4 LWIL ZZ W Z CL Qw LI , z Z ~ 04 Z a. 20 C W L O *. LIWU L 0-LULIA -O1 0u .a .ILaa Cf.Z O O YL NW WL O WO G ZILZ %%.%U'.- WILDO '444 -40~0 W0 .DX Z*** Li_ OGt-OO e -a. 0 .4LILLI~)WM* .. J," 04ILILI4W4QP W2:0N > 'A IL ~ I .0dZ C *@3L n-UN Owa~o 4 I W C ra: VW - zv3IW WIL O crOZ w W 4C W rs Sch W W 4 IL U - W UL 0aY a.1-- W L ) - a UZIo LL C zZC Z I ^<2 aX Z _Z (JIL W .Z ~~~~~~~~~~~~~~ U. It Of N IL I0-Ja ~ + C)r~ 1-0 UN D WW tYN OZOZ < - W Z aW Wa. U. 4 4 W La, W - W I- vt W . X ZZ4.- 1- IL 1Ja N ' 0 . V 0. * 3Z 0 L *O G f1 ' a - 4 a. -UN 4.-L OOwZLI 0 0 o o W4UJ '- D -- t 2 ,W4 Mla.It 0U0a31-.4-D N 0 o N 1Z""63xo t ZW W -OIV Z aW aL 0 tL Z SZI )-41I W wIZ ·-IQO W a * * *.IW3 4. *- 2 21 UN* N -o- * *Z 0.*WO a. *-*J C', Z Z Z1 Z oZW OZS W I I Z OW I w-W -X1-IV1 2: I 4 0.14.*.4 0IXt n :_ .W Z * - Wt~~~~~~~~~~~~~ Li U, Z C 1-22 O XL< Z Z 4 * I,0S W 1-4t a. 0- CL L ^t Z 4z WU X ZXLi O O - O _ Z J o 2I. * o --U4Z LL* WID-I WOO* _ - L 2: * Z MC I I.0 ZW 2 0 Z O Z * X SU W O IL O - t< C Z W M UN I NW t 4 .o UN 0 0 U 0 *· * *O UNNUN *)0 Na zA1I- x 4< -4--MLIt W -W OO *a4 a* 30 tl L S* * L *1 I *U * Er) C> c a 4 I * 2 O2 Wa.1*O V -2L*+W LI _t.I4.LD _A _ a..2u.C*,Q~ L\L aCa.ELa.4.- 9 a.4-I4 WILL) -a.--a a. tC3~lQ4-O a.IU. a M3-0 ILIJ. W-I 1WWUl N-3 a * W · t o 0 K 3 't O ~d4 0O *O * *u. *2 1 U U, at a ' *, * y 0J Z_ W-UV) -3 * a2:2 LC 304z W024 3 0. WW 2 -'0 * t>O -wU a*v ) a* X *, WW . 0.1 O -· ^ CC) * s 0Z W _. * > t,24 S * _a Z It IZ M. O-*Z L. .Z. C a* *'a*L 0 W cc... 3W a 0.Z. O g tIW * .- Wu, a a L * a a LID O Z Z LI W W -51PHASE II The programs of Phase II are very similar to those of Phase I. The structure of TA and SI are nearly identical to the same ar- The dictionary used as a reference for Phase II rays of Phase I. is just that generated in Phase I. The major difference in the pro- gramming for Phase II is first, that rather than place data into the dictionary, Phase II extracts data from the dictionary. Second, Phase II must operate on the extracted data to form a pre-index. Thus, changes have been made to the arrays, P indexes. and D array; also, two extra PI, have been provided to help in generating pre- Thus the directory array, PTRS, for Phase II contains two additional entries. PTRS(14) contains a pointer to P; PTRS(15) a pointer to PI. The D Array The TA. D array is still used to store the usage data for all words in' The usage data in D can than serve as an immediate reference for evaluation of pre-indexes. word for D However, in Phase II only one computer is used to store usage information, rather than the three computer words used in Phase I. The P Array The data extracted from the dictionary for all words of are placed in array PO One computer word in The data placed in P P is used for each word of TA. are the usage rate for the word and the frequency of appearance of the word. The PI Array The PI array is used for flags which mark those words se- lected for a pre-indexo The rightmost nine bits of a word in PI -52are used to indicate selected words. Three flags are reserved for each method in order to handle three parameter sets simultaneously. The PARA Array One further set of data is necessary to operate Phase II. Three sets of parameters must be specified for each pre-indexing method. PARA holds 18 parameters in total. Method I requires three param- eters for three trials; method II, six; and method III, nine. Operation of Phase II To operate Phase II, all Phase II programs must be loaded and started from a CTSS console. FILE must be available. The files CATDIR FILE and CATREC The program will request the user to input the parameter sets for each method. The parameters specified as two-digit integers on a scale from 1 to 31. must be The pro- gram converts the parameters to appropriate usage rate for frequency thresholds. Once the parameters have been typed in, quest a document number. the program will re- The user may type a four-digit number. When the pre-indexes have been run, the program will printout a summary evaluation and wait for the next document number. To obtain a full listing of the pre-indexes for a document, the user may type a 1 after the document number. The pre-indexing system will print a table showing each word of the title and abstract. and show whether the word has been included in a pre-index for each method and each parameter set. -53 - . 0o W-4 U, U U. cc . x .44 .9 Z a. -04 0 U V- 0 . a. W 0 W.O UJ C I C -ZW -? _ - o 4W . -J ~~~~0 ~ 4 4.0_o -'0 9- _ * uiZ· U. ~ I ~ ~ Z Z _Z Z. -f . .4 f >zz O Oa u 2 * OU* - * .L * 0W. *z 4'^ * O < Z *.4 0 . 4U. s CsQZ 0. L_ 0 W viW 0 I U a. 0 CL I . . _ t<YYWU<YXWO<YXWQ<Xff 0 La0. 0 F 2t * 4O.- P-N mU ' 0o 4- 4 4 . W 00 >3s 2 20 0o 4-42 U.4.-2U. . _ * .- %O. : o. " ._ -4 . _ 2 mo.mU X. O O.A I - W UU _4 -LU O . *N X ' 0o LzW- V.443 N.Qw 4mmm *'0 Z .m m ZW o 4 - W. -<4-- a, U W :- n _<.,_, 1boC*XN_ h0 §c 1 aX OX4 MLJQ_--We 0 * 0)0. W W ~ - U 0 92 2 W ~£ ~~..4 - C3 U, Z C;) 0 * g *N *W ~ V Tc. 444 U ~ . -9 * 0 Su ~ - C *2 ~ ~ <r z~~~4,4 * S *2. *. at . . _ W 0 - ''0 ( W _ _ - · , .4 .4 F.CC.) _ * y* r* * . .W -f . _ .. 44- _ Z. mQ: 0.*L O-0. 4 -W 0 4, . Q~G4- - - X O 2 a U.Z v f L) -54- %. 4 UI-- ~ ~~ ~ ~ CD U, W""' X Z Z *00M 1-40 0.0 WU, oC) C) 0. ~ ~ ~ 2 Z Z22 ILW M.,ZZ I( X - - .-J .4 WO INI-0 :- t c ~~Z L) U.0...1N0( 0I., 3323U.. 0 00*JI1 00 .3 I* j C(I - 01-0. 21JU.U. ZLI c~- n4 0 2-X £0. .. LI - 0L 2 *1*Wt * W.>O -. 0. . 1- . W 0.2 0 -00~0. X. X U > U ZW 2X-I U ; C) U 0. ILI u CO 'aC .111-C)JaZI.ah-Ua ZWw )O ~2 1Z *.Z4 Z . WW C- 00 3- ~'xw U.~O,.- wO > C*23> aZZC C Z f2IZW*U *u2 h-i Z4 .0000-U. L. · C Z ;IU, a-U u-' U, 0.,0 Z0U I -Ow Z- U...t) 0, I U)~..OU)I0. 2-0 U, '%OU,0 0 4I 0. &Z -00b 1. ' 2 0 U. 0.. 'LL U. O -.O- 2 LI 0. j . U) - )J)I ... > 0-.l L -0 ZI-Z 2. - U 1, U. 2w2I.-U) W ;. 1-J QIC C W .>..2 C.*ZwU W U. *U.. UOC. M.1 U. r-I0 c X 0...ZZ 00. 0 ruO, U. P 1U. *XI.L C..JCJ.3 WOO or< 0l... Z WO- LIZ 0 CcJ . 1-. 22 - W . 1 : 0 -Z U. V UI LI %. ? 0. U. W 0 4 2 > U LI U. a. I-. * 0* 1-1-10. W .00 N2U. 0 -Iai- -WO WC 01-Z 1- U0-0. U) -0 20. 204Z. Z - O. U, L,,CC CC U. $ u 0U. 1-IU . U W2£ U C) 0. 1Z w U -4 a. U -J * U.; Z 0. U,) II-. Ww :, 2 3. I~~~~~~~~~~~~nI 1-2 .I L· 1-I- 0 ,,,, - r-9 U.C10 wC ~ Z U 1- Li · . Z i=, i U - *2 Z (~~~~U)0.. . 1-0W1In J .U.-. ci "W0 w LIM -.. jwwO 22U. 1-- *.Jw L, -, 0.4 ~ 00~4 C CCC0 -*4-JO* W -ruwUcunfO~(AI.c~r~-~-I...3 O--(3-0.Z0.0 - Il a -WU-h-0 -W0.1-1-1-1-(0 oQ..-0 0.0. a 0. 0.04L4 .0 3CO9h0. 3CO h-J 2£ £ LZ £ UWL--U.UU £U-l&- 0.0U WY~ 0n U. LIOI) 00. LOLIW 0 02 Z * - 0 cr~~~~~~r U .L0(3.0 I4~c£U~~~~~~~~~~~~~~~ LII.J U.WUW (0 0.IU a, -J -~r C- -2 - U.) *aL~ur~~r -O~arO~~(LO~~ 000.U)+*~~r*LIc(U . 0.0 U.O - a U.0. I.a..*-I0 -ON *...W.1-00 000.C 3C00..) U. I(AU £ 2 U.I 0 I a.U, 1 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~m CD~ Ca. $* 90 Z a * U. a~~) In WLU)I.I 2*' u,.I 40.0.OZO0.0Z 0.4 1I UI. U. £U.0.0.~ 1- - aL @0 . . - C0 ~~~~ ~o~~~~ z * U, 0..) LI I.%J 0-* U. U. cc I.) d C)(0(0 2 p. U. (0 . .3 0 I *- (UIU CZW Z U. M, U. .0(0-0.1 WX -. Z' .. I 0.-LZ ::C) ' 2 ..... uw 0.. (01-0.0 0-- ..- Z CZIU.41 Z ui >. :1 X1-2 -0.W 0 90 0 -0* cLI .I( U 0 0. 0 U 4 LiOr. U... J~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~. 3 W. ( 10Z . - U,,.DuJ a0 )0LW (0e~4 0 W u~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~u *0 W ---L U0 Z 4 00 2 Z IO 40 -0 0 4 0 -W WZ 1 * -U 1 -C. 00 0.)N u 0 4 2 001 Z 0(U 1Of 00 0 O W .. 0 @0. . 400 -.* I0 Z;3 -0 0. *U. 0-C W 00.0 *0 1..(. 1C, *0o W .03 0. Z 1- Z -0-0 LI @0 02 -U"0. b2-U01 L 2 0.03. UJ L(U.L .) -+.0,Z> 1Z 04(I3 C ~ St. LI W-LU . LI LIL00.J.I 0II 01 A LW *U.U.)-~~~~~ 0 a 'LUI'. I.C aC 0U2 w0.0 00 00 U. * * 0 00C @4 0 02 atV *0. .2. 3 40 U. a.d 0 0 I I' .. J IOU. I 10. (I-C 20 0301 U.(~~ 1ZW00. I-I ?3 0' t LI (.1 N~~~ a LI4 . ' 0.0.U.I'-1-NWN1- 0U.L.-.I 0 0 * WU.LO.40 0O'OU*00UUJL 0 ~ .iuII 3-I .:1 k 0 0. .1 U. Z CL I 4 30 ( W . 0~~~~ L Z* 4 0 % . 4 0 W . .L 0.-C) 000U, 0 %~~01:U, .4LI 0. C 02 I.2 -. -0IL 0-I LWI (0 - (0u IC -I U. 2I 0. VPI)1- Ul U .J 4 1U X-0..0 0~a 4 X 10A U) o2- I. . ) - 2 ZMo 0(20 1 3. U).I l I . *C)0. .1 1 ( . 4J -- 100. 1-w3 Z0 - (0 5A.I-.-IZZ -.. 0 v Z ( .44U U I N .. U.K - X aC. . 2 U. -- 2 03T 0L1 0 T.U. I U 2 1 JIZ Z 0 - , -XI .~~~~ 0 *W1- . (LW 0. C)2 O--3 C Ul w U I (0 32 -2 . _ - 0.U. C.-0..L U. X2.I !1 02 wj0 - .. U, . U x (0 30 1U .2 UZr £J7 U 00.0.22 -w LI U.' £ .Z -. 0.0i 4U. Z U). 102 .J ~~0wW~~a.~ ~ .. . U. 0 2 U0U. L,-.0 U. WO 124 0I 1'I 0 ~~0U 22-L-1 "IL . X U U -J - ~ Z IC110 - 24 2.0-L 0 00 - IU....I U '0 lU. .03ZJ 0- 0 U 3 ( - -55- cr it 0~~~~~~~~~~~~~~~~~~~~~~~~0 U,~~~~~~~~~~~~~~~~~~~~~N J4 NJ Z LD U .~~~~~~~~~~~~~~~~~~~~~~C) 0J r Q Z 0 I wuC. -- ~ ~~ . ZIa: t -uu W4 o X 2 0. a -z' 0. 0 4C 4 ~ ~~~~~~ ~~ U,~ - o - 0-. U W a W 240D~ 0 Wu 2 4- 4 Z 0.4.4 4L44-iA · ZL .40.4-4 Z 43 4rU3 OLU.~0 NJ. y Il r W4r-*20LILI IL 2 NJ*41. X WZ *2WO ~ ~~ 4 4-4 z4 y0. 4-4 0. IA 22 .. d4IA 4Wu 4-N443 * 4N4 -s 0. IA U DL N44.4.INJL~P -I 4rrV 4 4.JN I *0.NJIA& 4a 4 4- ~ ~ ~ ~ ~ o .42 LI 0. 4~O~ .43044 L - ~ .~ W· C.) O4 U, NI & U CC)2 2 - W O4 W *t * ·9~~~~~~~~~~~~~~~ ~~~~~~~~ *~~~ ~~~~~~~ a N N * 4do£ A . 0. *0 * 0U *N *2D N N · K* * *J* * *2 * 4.. 24-0 0.~ '3 I N 0 4 ON 0. 4~ 0 U N 0 2 . 4.3 4-U., w 2 4- A .U U, 4- WJ NI N 3034 *WWO * 4 LI ~ ~~~ 0 2. 3*4 ~ -ILL2 0 W .-0 N '1 0. 4 3 0N-.0 I 0004.O Ij44 I.S....NN W4JJ0JL LU 0.A444433402 '4 0 0 4 4JN .4 40. c 0.2.4 U.4W~~~0 X 2 4ZN0IX J. LI LI U 0. .4 3 C) 1.4 4-L (J4~~~~~~~~~~~~~~~~~~~~~~~~~~N 00. U 0 4 0 N LI 0 LI U NJ· NJ 0W I'd Z ~ 4~ W4- W00 . 3 N 0 .00 -42 I.3 0..4. ILJ, NI-ID 20 044, 2 44 4-) 3U IA IA IAIA A 14-- 4 J 3 4. I'N OA .J. NN . 0 .01 .. 000). 44 4 c 444 22W2' 4N L. 0 U.UQQ Z 4 O 0 - 4- ON .. .J334 U 40 V 4~ ~ 4IP 00 .. J-4w4 .0a-.4J. 4NN.0-I 4-4.22c422·4'2 * U4c 0.004~~~~~~~~~~~~0 W W N .44N N W.I 4NNu N IL 0. 444 O0 W-2I ?Ir4 2. 0. 00 (3042 I 2041,041. N 0J 00 a 44 --0IA U00t 4 102 * X2 2.1 UX0.U t(-00Z 4 0ONL WN 0424 4- * N-A 4. 44N -N 0U0. ..-£3. 0 43 1 4 W0.0.IA0.2~~ZI Y 4 4 0 * VI - 04 N4 0 O I N 4LZ 2-.-3 .4IA . .. JN 2TU Q L 4- 0 0 000 4- 0 0 Z22 *0. 42 .W~2N *-4 .N . 0 0.0-2~WU 1W4t4- . . .. U. 0. 2 2 4 3 C N 4 4 0 02 4 N J. - LI N NO~ 41 4 0..0 0. NJ 0.J 0 .0. . 4.4 a V443) 44 3I 0.0 UC .J3N 4 0 .W rr-c l Qx ~ -·2 c ( ~ ·. IA -~ -0.2.0.41 44J 4-1c 0440.2c-'44c-4--4.441.4u~-NO -33224--- £24U.0.JNJ..J0.3NOI 41 20 UZWUw.- ~ .40. -23X WLL2 0 4- ~ 444-44A - .4 .4 4 4 WWWOW~ ~ .4 I ~ ~N 0.0.0..JE.~ 04-N4-40 WW414IW 0.0 42 N CL -4 4LI 00.~~~~~~0 C ~~~~ P.U. 2 0. 0 * W~. -3 1. 0 0 *1.4 *43 W LI * N *A N0 ~ z 4- 0.44434W 0 2 4 y 2N0 0-wOO U r0ZO Nw 40.IAO'0-4-40 0.20w4J444-4 WN 0.. 4- IA WU WO0r . W 00. WU> -J 00 C.-4 4.JNc 0.LI 0.-LI rr 0 4 a. ·o * ~ Z me ocrY -4 U, U ~L~ wY'~ 430 I4 OY00.-40 U.UU U.n - ~~ 0...J r1J N~YIA. 4-04 4440 nJ4(0.ONJ e U, 10 r 44 2 -04-N4222Y0 LU N .. JW~ N .4UUJLIJNJ-24 LI41.U. 4 U N24.uw N..UsCU· z ar Q~ NN a 4 -~ -0. * 04L WY 00.4 Z z 2t 4 c 0 2 r-· *c 04w -. 32 2 4 4 444 .4. 4-4-O 4... N N-I 2 ..j 2 NJ4 0 c -0U 24-U.-'04 LI4 41 ON c1r0 430 2 & 4-4.. N44 ir NJW .JNJI4- c Z&ILU-I .Z 20. *UJ1C o ..;. Z) W W 2- 3- cu 04-20.*O ~WNJ 44. XX 'O do-go 44 2.0NJ 0 0*r 2C * N-arm~ IL3Z 00N W CD n.4co .4 4- 02 CL 2-324rro2 2 U, NJ 4W~~~ A * 0 0 333 3 WW W 44 -56- Li -L (66 -L0 6/, 4i IW .6W 0 ~~466L C 0 W 6Z CLK 8 I 366VI VI VI 1 I 0 Li30 666-NZN- 3 -1 * - 6fl 3 0. 0 ~ ~ ~ ~ 0 uo (6 6/14- VIlNUO6O *..J3 .JZ *WU ~~t 360.610 £3 6 *· 4 3 6£ Li-L Li 6- I 01/14 6- 66 0 66V 0.6 0 II Li r1 1A~ a *1.. IU.IV TO 6 ) 66L66.1. 0..U.....J66A00... OO ZI).1(~r~2 3..i6 w · a~LYvf ~~~~~~~~~~ ~~(Z~~rLU~~~X ~ *~n~N.*3~~~1-CC*~ ru~~~~~~~~~~~~0d 6 P- 46-3 LiiI -'0 u 0 01 -4iZ00) 0.10.061 0U00 er Cn~Ii. 660000 (3-666666 66-0iLii3 ILC L 0 33300 LiI 4 1 I3 Z ZI 6U 461ii~~~ 01 6-01. 66 6/6 0.46C UV U. '0 KU. 0~ ZW 6 U. 0 661 666-UU 66 660. 0.ZO.VIVI.I 63(3…V LIU~cU U.3 301010~~C;Y)'1VQr-C WO~ O 10 61- 116116 U.I L ~ 2I 0 VI Z.01-3( 6U .. 0..I..0Z dZ~~~~~~~~~~~l~~V Z Z 01 0101 UO~~~~ 6V UILiOVI . 6 W6--6-0.+c 0.6-66-6h-U ZWXZZ0 0 1(I z 66 I26 VI 6 666 Z. -~Z 0 ua - 22 6( 66 .J6J ~ ' 4 " uZ 01 0OZ(I0W.VI LiU U C1~6 .6( ~ (3 n~ 66b 3 66 W 1 ' I 6 .01 Z 3U.U.6-U.U.16 x~~~~~ 6-6J6 ~ ~ uu 40.ulru : 0 666 V -' 0. 0.. W V 3 3 .o 0. U. X . - xur ~ C) J. J 44444- iiD~ Z~~Z Z NNN411O w ~I2: IZ ,D(D.4I 0 021111 W.J I 6-6---hj 66VI 66 6a Z . 6/6661 0 ocUUUd IIZd 0CLLC WI 0C'0.O9OOV 6641 0 *U64n )( -. 66V 00000 VI.V16U-6 6KOVIV 2. 66~Y'-0.J 0. 10 6VIVIO. W Q.J.J~J.I 6 L 0W~I-66. ZC -rYIUYVALI A6/ 1UUI U 66yLi 66- ry 0 11666666 O~~ ZC) 6:ZZZZ IXC) r +cI 6 6/6PL *?I 0 Z~~~~~~PZ~~~~4a W061Li6-VI0.3 V 4 ( - ~ Li-0 Elii 0. p.666-6/1 0Z _ 0 j 4: -- 2X-.U04) I.-W 0066066 6.04 4Li0 *4 Li 61 *K 10.01-0.** JO..0.0.OUJ.300. 6 ... 34 3 13 .133331 B66U.1.16660.66.366I-U.6~03333 1 66 ZZi' .CU.6-01W0166Zrl9 3 6 6 I.00 1(14. KOU.6WO~ CtU.3LU.U66UVI1O4W0 66q~~ *4*4* 61 3 .? 0 *VI 0 666 66. VI Q404C V 00&66.. 6631/ Li.J U. 06 6 66VI*U 0. *?-0*N 1- ~U I 6- VI Li L 3 - K3 0.6-L fm K .6-I 363 00000 Z, S d. Li* C 0 OC00 a01 .JU ~ 66 66C VII £ w66! 4 6.4 J Z w ~ N 4JLIzi I GOQOVI 00 4 3 '~ -0 -o .Wo u .aZ (0.40 66-Y 4nZE4w 66 6LI LiU..) y-44Z U.2.W6IO W PPIC)L 0 , a '.·Lu? *74-U.Li0 6-6/ NV 6.40K X Id . .J0W6-6--)CO 2 Li 0W4P/1Z0.30.LiVIU.?I.166-lJ A W 0 * ~ VI00 4 4 a C . *u 66-f Z40.4WXr0 wI 066' ~ ~ ~ W W - Z C) ( I iL C Z-L M- "L l . 4WIL - 4-a X c u Li (3 . 00.0~~~~~.0I Z (3Li 46/U6W J - oo 660 U~ ) 66 6. Li Z~~~~~~~ .: U.(f0 ZLUPIOWZ w w UJ-i0-VI JI" A I ZI O4UOW 'UZaW0. 3400o:,~Z W I Lia'fl 0 (3 wU lW No . 0 Li~~~-i~ 66 6IQC Z W -rI 34 & LiZ 2:2:(00 I~ IV 0~ VI VIQW~~^ VI~Cb VI~IJ~~ur re6.-U avxi 6/66/1~4Q4 ~~*1U LI(ls Licr~ 33 -57- 0 W W O~~~~ i.U u~~~W c 0 ' . Z l . C Z i-)4, 3. 0 -YIL LY :C 'a ZIL . '' I ul .)-NL ,I 0. LU· W0 0UJ ZI,-]L I '-L '"'1 I OU .0* II,.. 4*o o o *Ll Oc *ZI43- L WIl U C rc~~~~~~~~~~~~~~~~~~~~~~a O W ~~~~ ~ ~ ~ ~ ~ ~ '*~ ~ z,,o~~~~~~~~. .4.~~~~~ ~ OOILZ ..ZIu - .1 4a a U .)NC I M.. rr V - .b Z'L WILL4 0 I~~~~~~~~~~~~~~V aal a,, * · aaL in~~~~~~~~~~i* I0..Z . Z 3 W2 3 U UjI 0~. a~'-'":! · Q~~~~-0 6. 77Y aL ii u * *ILI * . '0 I 4 U fl ~ 0 - 44 7 I n - 70 3Z N 0 -1 (ZU .0L0 ' .J - IL Ia 0 L 3.J 0 0 0 *0I .0 I I nI 0 . ~ ~~~L j 0 1"'~ 7 7 I 0 .4 5 0 4 5- * .. 03 .5 j n.54 . a. 4 L 0( 5- 0C 0~U Inr Z 5-v IL 0. r0.. t c 5- u. (I… … .3 - 0 1IL I IL.'4 .. … … 3 ILO 0 p 1-4 -49 0 449I4 - 0000.10 I0LIILI5-n ~ 1J, k) 01UIWr4A..22 In) ZO *-L0 IL. 0.) 4 00I0.7 .. L.0 ? 0 W 0....." ' - .a*' 0SLl0I .0.Z.. 3 I L 0 0 41W- -L IL- 0 £4£05I5 *t 7 I . ZI 9 0 I J ~I ~~~~~~~~~~ 0 ~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~M ~~~~~~~~~ . . .00 ~ ~ '. ~-r OA ~-IuI0 320').LIZ.010 Z2rr I00 0- 7 4 94 . 04 i 1-11-0 I00.i0 .W 02. £i2-5 L IL OZ 7 4 . 1-. 1-O 5-5 I In U4 Ida. U ILa - 0 0 L .. . 2 7 -ZS.) .1s IL .e.J 5U N -~~~~~~~~~~~~~~~~~~~V 4 - 0 - 5I 4 ).5-IL 0 I I 41 IL ur~ ILJ.*-Q7L IL 0C *.4.. kL N 5-3 =)~~~~~ 0 Z 0.. -..ILOIL 2. 3t 441-.4 WO 0-OI 4.a.(,) 0 lr 54 OI6.1~~~~~~~~~ 0. 0. a4 acn IL *- 94 *0. 44145- 1- 0. II0. - u IL L 5 * ' . SO"' IL3 * * a 0 7. £45 5-DI ).. °° O 0 2 '5 n 4 W 5 L ua 0 0 0.0W~ 1IL W3 . S )( In co IL. 13. I,- *4. -J5- ~~~~~~ 0~ ~ .j.N um I -. *l aN ILC a1- Ia - Il Z LI X0.w .1 0 I04 LI I 0 0ZC~~~~~~~~~~~~~~~~~~~~~~~~~ 0 r5-5- IL * * - ILZ£J.-. - 0. 0n 17I 47L2. Ll IL 7~L ILJ ou.0 a. a 00 03 10 c ZII0 .10 0 00 I 1.9Wuh 12 - OS.404 0 400 '· rc 0 L 0 -58- =o 0 LU U v,I~~~~~~~~~~~~~~~~~~~~~~~~~~~U ru u~~~~~~~~~~~~~~ i l W 4 I U. a, I . 3Z < 0o Nt'9~ zU, u W 0.a~ .4 D Z LUU0, I. 4 W WJ .-Z_ . Wc -. c U, LU U H - ! 0 0.c SO U. 2. 0. .NU. L VU 3 MN)* w w0. WO 0.0.0 . U0-X * H-30.wUZ4 -. 044 0.1 · *4 W W 7- 3 W 0 0O W =w U *r 40-0.4. U 0. 0.644 3 Z . Z I. ~ "Cc ofI.M0.Hcc 4 ' __3 LUZO0 *** 3.4N L4 4.U.4U.S-U 0.0 .4. - * -04.W40.LILL..-W -C0L.H-.40uZ * 30-wOO I~---?-N * 0.H-J L~~ - : U,~ .1 - L C., L( o '4' 0. Z 0. H U' *HH 4'N4L -z N'J0.3 HNN 4H 4' * 00.110.-I -0.0U4 -. C43 * 0 HU 4-0-0( .0.>w0 -NZO * uNLU>U *-N 4QU> *U0. N9 U-4d0U~O0.UN-4 2 SO -l.4 30CL.0-4-0.0 ).>U *0 0.4-L 0. -0.0 .40.Z44.U40.U 041 0. LUUJ4WO 0.0-0.. ~ C IL W * WX LULUW U, Z WM 0.ULUuZ C > . W -. JI "I 0- IW "I 0 4LU Z UU 1VI W 0. * 0 U -. * H- av UI 3 Z6 CI64UOZ -1 4.4 ~ ..i 3 0 - Sr -W0 H-U£ 4 -4U L z CJ.)if ILZO4' Z...U .. . LUU0. *HHUNUN Z 0-3 U H-HI . *6N0 2 .>. UN. * Ua4 4 ZW 00 WU-WM 0W-6WM0 IMMM4 0. LJ * O'4.0 LU 17J. 64 030.).. 4 .4 H-. 3 0.646-Z .4 cwZt4 30Z0.N. HI WU.'4 0* 0.-40 WiA -. I. ..H0-LU04 4 LU UUO40. W0. He Z U0 NuN 4.4/4 HN(Z 02 5lr-40 4 H0.U.IJI4 ULL) IN&.L IL~LU440 UZ *UH-20. NZ 0 4-0 U 0 LU UN AZ . O4-m S 4W~3~) X 0*Z.uU.44 0.0U OUU HW-4110(~r ~- ~ . - ~~~~~~~~~~~~~~~~~~~~~~~~~~~L0. 03W*U W..O 3 ZiO 0 I-N~I0-Z--H-U4..LUU4N.C-4--.-.1 0. 5. U.4ULNH-U U, OI.~ L.C1 u..--0'-l II~ W-40.-Z--ZJ0. Z -0.30.0 Luf) 0. .4 0H-HH-0. U,4 *4UU4U H-HZZ4N . 4 34/,H-44UH-HU Z4-4-Z"I N N-k I U 0 4/X .4: 04<0ZC H- II -0 W LU 0W0. 0H 0. UNr 0Q 4444444 Ua HHLU LU NC 0at 'N. LU 0. 'N NLU - 30 0H-LU0. U,40(0 U,40 U, 2 0 0. 00N N 40S U. W 0. U LU LUI U UN 4 44.000 LI1 2. 4 QD XZU Z ZLU IL ZU N.U2.UNv N H-H-U .4 Nr 3 CI N 0u 000 U U 0. 0. LU ---N N e ~ rr- 44z 2. 0. -J LU H--4 4 ·0. -3 H0 U M N · N -- vI - 4. 4 * r rc IU,44W0 0 H'000 0 . N N N~ N 000 0. H -WYr 4 0Nf~t~JIT~t~t~t_" * 00 0.teOa-OU. 3OO0 -t4tJt -UU~n~0O0OQ00 O HOI* NNN I * .IOW-Q. 44X 0 .0 .'Z L* 0-4Hvu,4o NtU 0 ( N N L N I L .43N-444.ZZ 0(0(0( Z Zk0(4..J44Z NNNU Z U .4 0U,...JJ..J .J.44Z444WN.W r~rr 4-H4-NH- Z 4 o7 W LU0 LU0-0. 0 3 - 00.H-u..ZLU0. 2 -) I0C 3U NN4 H- U H, LU N 4. I00 .4 UZ 40 .. J 0 4. O N . 0. N "Ir HHUH-4 UUN0.2 L LULUWI-WOD NHI UN3 ox XM"Z0 4 30.4 0.4 cnL U,.-U, , NH -I -· UJ.4IO.4-IOU H-4.L OHH-Z 0 10g-14 N Z-M7UN 44, N4 4 4 Z 40Nu. 00-0. 0.·-·Z0.IU H-34..NN-H0 H NC LU 0 LU LUH-0 44--L 34 4Z JI H--..LU. 4.0. 0. 0 I . 0 4 40~~~~~~~~~~~~~ -Z WZVI' 44 - , OU.40 U, 0-U, .) I HLU 4. --- HZ 0 0 M I~~r~ZZLZZ~~Q~ N 0~~~~~~~~~~~~~~~~~'.I. 0 ! U -J 0.I 40. 0.C H 'N 4L UW -- I I. 0. UN+ U IL 2~~~~~~~~~1 0,0. U LU 0r N IL I4 U, U - - 1C.0H 34. 3-4.Q U3U H-IUI L 0-0.0 -U~ W Ca4 ..-. 44 4 LU .4 H-a H I0 H H0. IL U) LU U 0. U 0. C 3 4 3" NLUI N~4 NH-HU, 0 44Z N -·UN. - N N· 0.00 0 NC LU 0. HLU 0. H0N 0. H- .C LU 44-H-·-4 LNO U Iu~ U 0 0U U Z LUJL 0 0.LU Z 40. I O Z W 4 H-LU'NA HUNNC1'N. OU .W4. 034.0. 4 0' N·.. ON(NH CUU H-U 0-0 Z LU 0 -4 N4 UU Nr N UOO '00 U--'- I Nm *UU 4N NC( xH- -X U- ' ~ 0NH-0.0.0.~( L H H4 U - 3 H-H-.4I0. .N · N · -~ N 0. 010 N0. IL0. 0-4H-LUC0 0000NNNI NH-Z N 4-NH111* LUN NH N N 4 I WL-·-ruSN · AL U, LUL N 0.3 … … H-L~~(NUYr ON * 42.4 4 V*)UU. 'UOa ZNNNNNNN0ZZHZL U2.-H-LU .. 40.LC-Z … 0.0.---ar~ 3-' H -. 40.·- 440 0.0 4~w-LU.-IH-.-4N444 -V U .4.H 0.044H-Q N.NNCW NUUU0.0.0.--U H-I -4. H-OH~r-0.~ W4/N r-NH,-LU0.4UUU0.X0 0r K. - U UUUUUUU -00000000 4N-ttt -6 00 000000 VV U UJt~t-t .)-r·t Nt OO O OQ OOUOOUU O a LU HL O - 24 Z ONH-4. IUNWLU0.--r 02. 04cu 0-0 33 34W CJUN U,Z WU, W N4 . 0.I W -. W 0. 4.LU2 4. 0-U 0N 0-. CN U2. U M XUN 0. 0. H-.4. A UNLU0.40 M. 0. 0)-o.' LU 0 0tN.V -Z 4 U"~.~NZz ~~.. 0 0r 0 C L M U W W ILO0Z-~.u- 0 Lu 4 .. ~~~~~ U 3; MLU. -2 / 0. Z -u 0.2M.W. 4.04. 0 -J U I W0.0 U I0 w0. U C, UUU .. Ny U LUW3 II .4. .4 30. 4WI M- -J44U M2M. U- n WLWUU...4U-.u.0.--z.. :WU U LU * .0. WW IZZ ZZZU. U W U0.U, "J W IC"I c, I, .4I M Z 2 LU W U.4 OW )NM 0. 00.U. .. >U, , 00.0c WZLUO Q 14.H H-2.U 0 ( - -U, L N-2 IA0. UCL UW 39 CZ 0 UU HX 'a' I4 LU Os 34 *m N-)U .. NU Llu.IJJ ~U, . 0 ~rL U. .... ~ ~ ~ ~ U LU 1 U 3- UI . Z H-N' . H-4 W0WU .4 0. N-0.U I w *-N 6-4 H-..4--U .40.0 I LU -0(0.0.0 4W U'J-..I---..4>0 0-)-0. -000(30. N0.4 .0. 2.~N c14.U U A 0W4U, WJL 4a-J a. 0.0.0.-4 0Ul a. .. *.4.U * LUIS 00.0 -OZ .0(00/ rLU or * iN UO 0.4 Wu. *2. 0 63 Z. WO I, 4-0. U w 03 WI WLUN ~~~~~ ~~~~~~~~ 0 W - ' 144 J.H-NUZ S L44104. 40.1 3 0 WUwZ 1I L" U, ZULUM4' O. W Cc rq~~q * WH-44N2.44 v a 0 0.LcUH-O-4 1^ S ~ . U,"IN W.IL.0L LUCZ W UN 3~ C) U H-4 UW 40 LU I LU0. * 4 I4 IU"., It, MW-ZO a S U 4: CZWLUN(H- 0. o U. U I UW U, U LU -0 WI kOUNO 0-'I 0.0 0 -LU\L0. U NLUN N0.0U N 0(0.0ZN NHU, N402. 040.cc 0.4LU0-af~r 00 U N 40- I -J LU LuZ4Z * 0U 402.4 Z0.H-. HH-NC 4/00 4t W 0.7 s-- .0 . W LIU L J .U0 w~ 4 0MXX U U, N0-0 *U0.0 ZLU 0.0.4 II r 2 30 FLU LUU, 0-101.4 .L0.H NUN 4 U NZL. 400 H( 2 -.4. 4 100. 4-4 '0t~-YN-N-0 0-- 0-p H -59- Z ~ ~ ~~~~~~~~~ LIa.i CZ3I IIL) U- LI U a M LI (JNW N4 08 U. InI 0 3 0 8.2 U, U-.-3 D .. ZU -L Inf0ZZ 0 . iii I'coIn - 4 aZ WV -c IU, I Z 4L UiZ; 8W 11- InMZ W I In 8.~~~~~~~~~~I !-1 0.J 2-8. w.3 30 - N.N11~~~~~~ N-? 1-t ....8' 3120 W 8., 4I 4 1 W 2~ Wd8.0 I j at · °~~~~~~~~~~~~~~~o 0 ~~~~ * b JU I :EA. Z Z~~~~ OIL9 3,- J 0.X .. . 0t,. I"O Uo w o* -w04 'o- ,.' .-.- 8.-- 7' 0 N- - I N4wUJO , ILN-D WZ J810LZ W· e~l ""' 2" I- 11 -0 O CZ *I 3In 2 48.1N.W 4 U, W . -<2 4..~ I, Q 1-,-,,.1 A.11 x *,"I Z 4 :.2L 0 -. . LI 821-InLI I -08In -4 to- - 43., 8. w I.N ) o~~~: a. 3 WI w nnnI 2 NiMi..J 4 cc- X, LI* 1 C3i. 4 IA 2 - . .LI .~.~t~ NWU 42;8 444 U..8. N-InN .0-.42 8I 111.8 M.. ~ .0i~1 . II I? LI 3.I.NQL)8 8. 8.l a. J - LU s 70 . ,,. 1-8. . - LI -- 8J . U. LU -. -W I 133 13 N - I.% ~~~~~~~~~~~~~~~~~~~8--tt'" *In8.222LU -.- *8 - I a 1-4 2t N0nnn8 L ' U~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~,~ C}t. L,, 8 0 4i nZ4441-NLtIw .I-I-IP. I InN-> .N W .e... WWL 4 W I~II NW0 .8 LIIL L.n~ U. 8..I W-"4~ .8 . n8I. , -4 -or. ,= W04 8.In 8. . 8. I ~ ~ ~ ~ ~ ~ 0 ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Z 0 U,~~ ~~ ~ ~ 2 ,it., NI -4I ~n.W...iIL 0 0 ON IN LI)8. Il N . 2 LI 0LW 1 8. 4 U X30 4 X?2W-U, W I4 4LI I..3-.8.2U.O 4W · 8. ... cc~.~~~ ~~~~L w LUNWWLILIIJ 2 710 C.)S. or=u3 11 .1 M.--.8.8.O. .8 LI'4NLU1I8.i N4 In - 0 1 4 103 L -, . .,X. 3W 8 . 0-1 a wax U, W? 8. 8. W N ~ IL ..-- ..eO, 0W<ou~)-2 1W1.000 Z4Ii~~~8.W~~O 4240 2 0 - ) W3 -- r..0 00 L)L - ~ .41 0 II- J. LU4 W 8.NiNONNi o - 2 0 . In - N- 8W U, N In i .- . 1.-. - 4 -. UU J 'A-. z3 U. U..j - U, 33 1-8 P-3I1I3 N In333 011 8. ~~~~~~~~~~~~~~~~~~~~~~~~~~~,U Z Z~~~~~~~~~~ w 8.41- 41 Z J 8 48.WW441-W0U~~1-I-.~ X - U. LI U.. I- 02 4" .0 Z U- 1( ". 2 N 0 .08. 8.Ni4UJW1- Z 1 - "'(3M. U.. LI Z 8.. U ' 1 - . I-1-1-I-~1C -.-I~4I-8.1-1-~ 1-U- 1 .1. 2- 0 -U, : ,- )-o X Z- 40 U. ., -60 - 4.. * tr' : * 5*' , ,. lL O) 1 C * * G 0J W6 4W * *0o *0 O *Ub 5.. *0 J~ -, 43Cy0 I_ A LL'M.~ -. * P2 U O- Z-0*.nL , - * W 5 5* * 3W ~ * V. Z0-'- - Z v.... X~~~~~~~~~~~~~~~i z ,I,, - .4-J - .. O )34664.. W 0 Sr -0-0 *434.C * * 5 * wd ae 0-iW C4W o- *0 w .4.0 * * 0-(D S .L . 4 . 5 J * AW0-). ,, *! W egt c | Z A WW- * a0 *Z ? kga . w cJ 02- 0- : .. . 0- _,._. C NN 4. N _ _ -N41" - n4 O .4¢ U. 0 4(MO O 0 '.4 0) *O 2' -_: 43 C NN e . 0X*.u. N _ _ .. _ 0-0 -N N NZ N) '' 44_ v-. Cl - -6 '' ^ zz< 0-0 *5 X - N N N *4x .441J_ _ U. NsS 45- . - '-4-. J * * .w -_4 - )~ ,,3 00 Z * * 0- 4.50-4J0- *A .3_ .0 _ .e . n eK II * 00 O-1O ^-) ) 54^-"W + ^ o-.4 U.U._ 4-:.-Y '. _0 0- 0- F2 0--0 ."..W -0-0-0- JW 0 -0. . 4 2B ***I)4 --- U. U. 4X J 2%*. )U2v O 40 -- N U.LtW W ZZ WW 4 2> -- 0 -0- _ 2i_ L .)0- 22 44_ W 5 :.ZWU.4-U.U W Z W6 1 0 a - 0-4 u4 cf · _. _4- - ^ ^ - - _ U. C 4.4C + .6 W4 -....... V 4* * 4S-*4G4 4+, J, .* _. W Id .U, *4 0-. 0 * -O - * C- 4.I,'l="4 7 WXXs0Z U> Ws w Ci JU. Z I _ _,%IL .Z.). ^ _ 4 - 4N 4.-W Id Vt * 5 C --40-e1^ 0-0^ 00f +XFO.O~C -sVoul _N *t'N *00*0---' 4 M WM 4-44c-40 . .N I g 051-0^fi<c I5-4 W_,Uu Id.'N4-4 -_ *0 v 4 4- 5 5 * 0* - .^^0- Z …4 0 x w.fiz C6 1 C4, k 0- of W~ ~~~~~~ 4.4 .I U ' * .4<t . 41 4.J Z U. - 0. a. 4 - * _U * 0- .W0-? e *... - *, G. .Z- c4 G.0 tJ - O0 _- . -3o4 -W w,, V* L - Z4 Ofl444 - - Ww W U.F 5v4N w 4-AAA 0- * 0 W 2- .6 0.4 O W .w, * 4 *I,, 44-.JId~ W_ w c 0-W.. ,L 4 3ZIIcZ Z24 * 1 .FJ O C 0-..) 4a '.4 r W wC4 J CF-W0. ..0-0-1 IL. 04 0r c0-00* -W ( C c 1,X <..) O c. * 0W.4) _ 0-V0 . WI. 0 0- *02 0, Lr _L 'o*t . 0 C) oI) . 4 UI W C~ * 5* 1. 3 *4.Id.^ -N) C J* _ I_3 W0 U.Id)3.S0.I IOL 0. 44 c f FC40C C WI. - .C.Z <. CWZ I SC.-C . U 0 G . ^ L4 44 --- _4-0 * * * *W_ *0 5. %. A 4 -. , Q ' t-,** W Cg ~ 4 .6 *~~ .624 or~ 4 '.) * W *549 )-.3*-I ^-30..5W ;W-W Id4.. * _ ~' . W : W4 : Z Z 0 - a, * - * 4 o 0. CC (.45- z * * t- C- I , *0-.* * n4* 0 4 ^N: * * * r WC.) } _- t4 . *L * . W W6 cZ uJ Z U, ,. ,cJ * -O' ~~ ~ ~ ~-03. ~ ~ ~ ~ ~ ~ ~ ~~~ ~~~~~~~~~~~~~~~~~~~~L * C W im of G, . p.~~~~4 p. 4 t' ,l * . Ct -. Z O WU. CZ 0W i 0: W 41 4O * ~. .4 Z U. _ 4 W6 *O 5'.tzV 4S * * :0 - 0 * *0 I "'~~t * .- 4 U, c 50 * * * 5. *4 _ 4t -40 O *0 0 -04-0_. vcu f _ 4-- .. 0._ 4 _- 9 It 4.) ..11 5 S 000 §|"+C -- 44^4^----0000 1J44L4o L*sv-0-0.--00I4-.0 f osf ~ 5 -4 0-< .:. Oc 2. W 0 W2ZU< '. QZ 2.6.6 BIBLIOGRAPHY 1.' The American University, Machine Indexing: Problems and Progress (Papers presented at the Third Institute on Informatic Storage and Retrieval, February 13, 17, 1961), Washington, D. C., 1962, 3541. 2. Baxendale, P. B. "An Empirical Model for Computer Indexing," in Machine Indexing, American U., 1962, pp. 207-218. 3. Bohnert, L. M., "New Role of Machines in Document Retrieval: Definitions and Scope," in Machine Indexing, American U., 1962, pp. 8-21. 4. Luhn, H. P. (ed), Automation and Scientific Communication, Short Papers, Part 1, American Documentation Institute, Washington, D. C., 1963, pp. 1-128. 5. Luhn, H. P., "Keyword-In-Context Index for Technical Literature (KWIC Index)" in Amer. Documentation 11, 1960, pp. 288-295. 6. Maron, M. E., "Automatic Indexing: An Experimental Inquiry," Report No. P-2181, 1 September 1960 (rev. 2 Feb. 1961), p.31. 7. McCormick, E. M., "Why Computers?" in Machine Indexing, American U., 1962, pp. 220-232. 8. Overhage, C., Harnman, R. J. (ed). INTREX: Planning Conference, M.I.T. Press, Cambridge, Mass., 1965. 9. "Project INTREX: Semiannual Report," Massachusetts Institute of Technology, 15 Sept. 1967. 10. "Project INTREX: Semiannual Report," Massachusetts Institute of Technology, 15 March 1968. 11. Salton, G. (ed), "Information Storage and Retrieval," Scientific Rept. No. ISR-11, Department of Computer Science, Cornell University, Ithaca, New York, June, 1966. 12. Stevens, Mary E., "Automatic Indexing" A State-of-the-Art RePort," U. S. Department of Commerce, National Bureau of Standards, NBS Monograph 91, 30 March 1965. -61 -