INDEXES AND INDEXING

Ma. Theresa B. Villanueva
Head, Microforms and Digital Resource Center
Rizal Library, Ateneo de Manila University
April 15-16, 2013
James O’Brien Library, Ateneo de Naga University

DEFINITION OF TERMS

Index
- a tool that indicates to users the information, or the source of information, that they need
- a systematic guide designed to indicate subjects, topics, or features of documents in order to facilitate their retrieval

Indexing
- the process of identifying and assigning index terms to a document, either to describe its physical characteristics, to give facts about its creator or distribution, or to describe its content

General Purposes of Indexes
- To construct representations of documents in a form that users can easily browse
- To maximize users' searching success
- To minimize the time and effort spent finding information

Uses of Indexes
• facilitate reference to specific material, or locate wanted information
• serve as a filter that withholds irrelevant materials
• make the information storage and retrieval system useful to individuals
• disclose related information
• serve as a tool for current awareness services

Indexes may be grouped by arrangement (alphabetical, classified, concordance), by type of material (book, audiovisual, periodical/newspaper), and by physical form (card index, printed, microform, computerized).

By Arrangement
a. Alphabetical index – based on the orderly principle of the letters of the alphabet; used for arranging main headings as well as subheadings and cross-references.
b. Classified index – contents are arranged systematically by classes or subject headings.
c. Concordance – an alphabetical index of all the principal words appearing in a single text, or in the multi-volume work of a single author, with a pointer to the precise point at which each word occurs.
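The concordance definition translates almost directly into code. The sketch below is a minimal illustration, assuming that a "pointer" is a (line number, word position) pair and that a small hypothetical stop list decides which words count as principal words; it does not describe any particular concordance software.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop list; a real concordance would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}


def build_concordance(lines):
    """Map each principal word to its (line number, word position) pointers, both 1-based."""
    entries = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        words = re.findall(r"[a-z']+", line.lower())
        # Positions count every word, so the pointer marks the precise point in the line.
        for position, word in enumerate(words, start=1):
            if word not in STOP_WORDS:
                entries[word].append((line_no, position))
    # Alphabetical arrangement of the headwords.
    return dict(sorted(entries.items()))


text = [
    "The index is a guide to the text.",
    "A concordance indexes every word of the text.",
]
concordance = build_concordance(text)
for word, pointers in concordance.items():
    print(f"{word}: {pointers}")
```

Every occurrence keeps its own pointer, which is what separates a concordance from an ordinary subject index: "text" is listed with both of its positions rather than once.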
By Physical Form
a) Card index – an index in which 3” x 5” cards are used as the tools
b) Printed index – a tool for indexing, or for researching and retrieving information, that is in printed form
c) Microform index – an index to microforms such as microfiche and microfilm
d) Computerized index – uses computers to construct indexes

By Type of Material
a. Audiovisual material index
- textual labeling (index terms or descriptions) is needed along with image matching
- a search on words may retrieve a particular image related to the search term, which in turn can be used as input to find other related entries
b. Book index
- a list of words or groups of words, arranged alphabetically at the back of the book, giving the page location of the subject or name associated with each word
c. Periodical index/Newspaper index
- open-ended projects, usually performed by a group of people
- consistency is challenging, since each periodical issue may deal with unrelated topics by several authors, written in different styles and aimed at different users

Classified Index
Entry points are arranged in a hierarchy of related topics, starting with generic or broad topics and working down to the specific ones.
Examples:
- Index Medicus – classified index in the field of medicine and related disciplines
- Engineering Index – classified index in the field of engineering and related disciplines

Alphabetical Subject Index
An alphabetical subject index covers a number of different kinds of indexes. The arrangement is in alphabetical order and follows a familiar pattern.
Examples (periodical indexes):
- Readers' Guide to Periodical Literature (RGPL)
- Index to Philippine Periodicals (IPP)

Author Index
Entry points are names of persons, organizations, government agencies, institutions, etc.
Examples:
- Development Bank of the Philippines
- Philippine Chamber of Commerce and Industry
- Romulo, Carlos P.
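A book index of the kind described above can be sketched as a mapping from headings to page locators, arranged alphabetically. This minimal sketch assumes the indexer has already noted (heading, page) pairs while reading; collapsing consecutive pages into a range (12, 13, 14 become 12-14) is an illustrative convention chosen here, not something the slides prescribe, and the sample notes are hypothetical.

```python
from collections import defaultdict


def format_locators(pages):
    """Collapse page numbers into ranges, e.g. [4, 5, 6, 9] -> '4-6, 9'."""
    pages = sorted(set(pages))
    parts = []
    start = prev = pages[0]
    for page in pages[1:]:
        if page == prev + 1:  # still inside a run of consecutive pages
            prev = page
            continue
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = page
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ", ".join(parts)


def build_book_index(occurrences):
    """Turn (heading, page) pairs into alphabetically arranged index entries."""
    pages_by_heading = defaultdict(list)
    for heading, page in occurrences:
        pages_by_heading[heading].append(page)
    # Alphabetical arrangement, ignoring letter case.
    ordered = sorted(pages_by_heading.items(), key=lambda item: item[0].lower())
    return [f"{heading}, {format_locators(pages)}" for heading, pages in ordered]


# Hypothetical notes taken while reading a book.
occurrences = [("Thesauri", 12), ("Abstracts", 4), ("Thesauri", 13),
               ("Thesauri", 14), ("Abstracts", 9), ("Concordances", 7)]
for entry in build_book_index(occurrences):
    print(entry)
```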
INDEXING PRINCIPLES
Exhaustivity – the extent to which a document is analyzed to identify its subject content
Specificity – the extent to which a concept or topic in a document is identified by a precise term in the hierarchy of its genus-species relations
Consistency – the extent to which agreement exists on the terms to be used to index the contents of documents

Principle of Exhaustivity
• Exhaustive indexing – the use of various index terms to fully cover the major and minor themes of a document
• Selective indexing – the use of a few terms to cover only the main or major theme of a document
Exhaustivity results in high recall but low precision.

Principle of Specificity
Example:
Genus: Citrus fruits
Species: ORANGES, LEMONS, LIMES, GRAPEFRUITS
Specificity results in high precision but low recall.

Principle of Consistency
There are two types of consistency:
- Inter-indexer consistency – the agreement between or among indexers in assigning subject terms to a particular article
- Intra-indexer consistency – the extent to which one indexer is consistent with himself or herself in assigning subject terms

Indexing Methods
1. Derived or derivative indexing – a method by which words and phrases occurring in the title or text of a documentary unit are extracted by a human or computer to serve as indexing terms; also called extractive indexing.
2. Assigned indexing – a method by which terms, descriptors, or subject headings are selected by a human or computer to represent the topics or features of a documentary unit; assigned terms are oftentimes taken from a source other than the document itself.

Indexing Language
An indexing language is the language used by the indexer to represent the subject content of a document.
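Inter- and intra-indexer consistency can be put into numbers by comparing two sets of assigned terms. A minimal sketch, assuming the overlap formula often credited to Hooper in the indexing literature: terms in common divided by all distinct terms assigned, i.e. common / (A + B - common). The term lists are hypothetical, and treating two empty lists as full agreement is a design choice made here.

```python
def consistency(terms_a, terms_b):
    """Hooper-style consistency: common terms / all distinct terms assigned (0.0 to 1.0)."""
    a = {t.strip().lower() for t in terms_a}
    b = {t.strip().lower() for t in terms_b}
    if not a and not b:
        return 1.0  # Neither list has terms: treated here as full agreement.
    common = len(a & b)
    return common / (len(a) + len(b) - common)


# Hypothetical terms two indexers assigned to the same article.
indexer_1 = ["Rice", "Hybrid rice", "Food supply"]
indexer_2 = ["Hybrid rice", "Food supply", "Agriculture", "Philippines"]
print(consistency(indexer_1, indexer_2))  # prints 0.4
```

The same function measures intra-indexer consistency when both lists come from one indexer indexing the same article at different times.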
Purposes and Uses of Indexing Language
- to represent the subject content of a document, either by using the words of the author or by assigning appropriate descriptors from a controlled vocabulary
- to help users discriminate between terms and to reduce ambiguity in the language

Types of Indexing Language
1. Natural language – uses terms or words occurring in the printed text as index entries; sometimes called a derived-term system

Characteristics of using natural language:
• improves recall because it provides more access points, but reduces precision
• redundancy is greater
• uses more current terms
• tends to be favored by end-users

2. Controlled vocabulary – represents the general conceptual structure of one or more subject areas and presents a guide to the users of the index; categorized as an assigned-term system

A controlled vocabulary provides cross-references, such as USE references, to show three relationships of terms:
a) equivalence
b) hierarchical
c) associative
This is achieved by showing, under each term: broader terms (BT), narrower terms (NT), related terms (RT), use for (UF), and see also (SA).

Relationships of Terms
a. Equivalence relationship – implies that more than one term denotes the same concept

Equivalence relationship: Example 1
Use for (UF), or use reference (see reference)
EMPLOYEES
UF: Personnel; Staff; Workers
- refers from a non-usable term to the preferred descriptor

Equivalence relationship: Example 2
BIRTH CONTROL
UF: Family Planning
- the reference deals primarily with synonyms or variant forms of the preferred descriptor
- it is also used to lead the indexer to more general terms

Examples that indicate an equivalence relationship:
- Synonyms (e.g. Reason; Cause)
- Quasi-synonyms (e.g. Law; Law Management)
- Preferred spelling (e.g. Catalog; Catalogue)
- Acronyms and abbreviations (e.g. ASEAN; Association of Southeast Asian Nations)
- Current and established terms (e.g. Cellular Radio; Cellular Phone)
- Translation (e.g. Coconut Coir; Bunot)

b. Hierarchical relationship – refers to general-and-specific, or broad-and-narrow, relationships

Hierarchical relationship: Example 1
Broader term (BT)
Employees
BT: People
- shows the hierarchical relationship upward in the classification ranking
- it differs from the use for reference in that both the basic term and its broader term are descriptors, and both can be used

Hierarchical relationship: Example 2
CATS
BT: ANIMALS
"ANIMALS" is a broader term to "CATS" because all cats are animals.
Reference: http://publish.uwo.ca/~craven/677/thesaur/main05.htm

Hierarchical relationship: Example 3
Narrower term (NT)
Employees
NT: HOTEL EMPLOYEES; RAILROAD EMPLOYEES
- the reference is similar to the broader term reference, except that it goes down in the classification ranking

Hierarchical relationship: Example 4
Head
NT: NOSE
"NOSE" might be a narrower term to "HEAD", because noses are normally parts of heads.
Reference: http://publish.uwo.ca/~craven/677/thesaur/main05.htm

The hierarchical relationship covers three kinds of links:
- Genus-species relationship (represents class inclusion). Example: Animals, Domestic Animals, Cats (from broad to narrow)
- Whole-part relationship. Example: Hand, Fingers
- Instance relationship. Example: Mountains, Mount Apo

c. Associative relationship – refers to a non-hierarchical relationship of terms

Associative relationship: Example 1
Related term (RT)
EMPLOYEE
RT: EMPLOYMENT
- the reference points to a descriptor that can be used in addition to the basic term but is not in a hierarchical relationship with it

Associative relationship: other examples
Teachers – Students
Tables – Chairs
Education – Teaching
Men – Women

Scope Note (SN) and Qualifier
- used to inform users of a descriptor's usage restrictions or to clarify ambiguity; a scope note may also give additional instructions to indexers
Scope note examples:
INDEXING
SN: Assigning of natural language terms to documents
HOSPITALIZATION
SN: Assign also terms for the conditions for which patients were hospitalized, if applicable
Qualifier example:
Security (Law)
Security (Psychology)
Reference: http://publish.uwo.ca/~craven/677/thesaur/main08.htm

Functions of Controlled Vocabulary
• to control synonyms by choosing one form as the standard term
• to distinguish among homographs
• to link or bring together terms whose meanings are closely related (e.g. Cereals and Wheat)
• to control variant spellings

A controlled vocabulary may take the form of verbal expressions, as illustrated by subject headings lists and thesauri, or of coded (non-verbal) expressions, as shown by classification schemes.
- Subject headings lists – lists of terms representing several subject fields; some focus on specific fields
- Thesauri – another kind of authority device, covering more specific or narrower subject fields
- Classification schemes – generally contain coded expressions or notations for the relevant topics in a particular class or subclass

INDEXING GUIDELINES & PROCEDURES
Part 2

INDEXING PROCESS:
1. Recording of bibliographic data – recording the important information or elements that identify a particular document
The International Organization for Standardization (ISO) set a standard for bibliographic references: ISO 690-1975 (E), "Bibliographic References: Essential and Supplementary Elements".
- When indexing the contents of a collection of documents, locators should give complete information about each document.
- For periodical articles, each entry normally consists of the following essential elements:
  Name(s) of author(s), with forenames
  Title of the article
  Title of the periodical or source
  Volume number
  Issue number
  Date of the issue
  Page number(s)

Example:
Name(s) of author(s): Xian, Jie
Title of the article: Hybrid rice: a new hope towards a bountiful Philippines
Title of the periodical or source: Impact
Volume number: 46
Issue number: 9
Date of the issue: September 2007
Page number(s): 4-8

Format comparison (each entry is filed under its subject/topic heading):
ISO format:
________________ (subject/topic)
Xian, Jie. Hybrid rice: a new hope towards a bountiful Philippines. Impact, Vol. 46, no.9, S '12, p. 4-8.
Ateneo format:
________________ (subject/topic)
Hybrid rice: a new hope towards a bountiful Philippines. Xian, Jie. Impact 46 (9) : 4-8. S '12.
Other format:
________________ (subject/topic)
Xian, Jie. Hybrid rice: a new hope towards a bountiful Philippines. Impact 46 (9) : 4-8. S '12

2. Subject determination – determining the "aboutness" of the material and formulating a concept list
• Choose the most appropriate concepts; consider the users and the purpose of the index.
• No arbitrary limit should be set to the number of terms or descriptors that can be assigned to a document.
  - The number should be determined fully by the amount of information contained in the document.
  - It should be related to the expected needs of the users of the index.
• Modify the indexing guidelines and procedures if needed, but modifications should not compromise the structure or logic of the indexing language.
• Concepts should be as specific as possible. More general concepts may be preferred in some circumstances, depending on the following factors:
  – over-specificity might adversely affect the performance of the indexing system
  – if an idea is not fully developed, or is referred to only casually by the author, indexing at a more general level may be justified

3. Content/conceptual analysis – identifying the topics discussed in a document and determining which aspects of them its users will be interested in

Content Analysis
- Decide which topics in the item are relevant to the potential users of the document.
- Decide which topics truly capture the content of the document.
- Determine terms that come as close as possible to the terminology used in the document.
- Decide on index terms and the specificity of those terms.
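Once content analysis has produced a concept list, the controlled vocabulary is what turns it into index terms. The USE/UF, BT, NT and RT references described under Controlled Vocabulary can be sketched as a small lookup structure, with translation reduced to a lookup. The `Descriptor` and `Thesaurus` classes are hypothetical, the entries reuse the EMPLOYEES and BIRTH CONTROL examples from the slides (merging the EMPLOYEE/EMPLOYMENT related-term example into EMPLOYEES), and a real thesaurus would also carry scope notes and qualifiers.

```python
from dataclasses import dataclass, field


@dataclass
class Descriptor:
    """A preferred term with its thesaurus references (hypothetical structure)."""
    term: str
    uf: list = field(default_factory=list)  # Use for: non-preferred equivalents
    bt: list = field(default_factory=list)  # Broader terms (hierarchical, upward)
    nt: list = field(default_factory=list)  # Narrower terms (hierarchical, downward)
    rt: list = field(default_factory=list)  # Related terms (associative)


class Thesaurus:
    def __init__(self, descriptors):
        # Preferred terms, keyed case-insensitively.
        self.descriptors = {d.term.upper(): d for d in descriptors}
        # USE references are the reverse of UF: non-preferred term -> preferred term.
        self.use = {v.upper(): d.term for d in descriptors for v in d.uf}

    def translate(self, concept):
        """Translate one concept into its preferred descriptor, or None if absent."""
        key = concept.strip().upper()
        if key in self.descriptors:
            return self.descriptors[key].term
        return self.use.get(key)


thesaurus = Thesaurus([
    Descriptor("EMPLOYEES", uf=["Personnel", "Staff", "Workers"], bt=["PEOPLE"],
               nt=["HOTEL EMPLOYEES", "RAILROAD EMPLOYEES"], rt=["EMPLOYMENT"]),
    Descriptor("BIRTH CONTROL", uf=["Family Planning"]),
])

# Hypothetical concept list produced by content analysis.
for concept in ["Staff", "family planning", "Employees", "Hybrid rice"]:
    print(concept, "->", thesaurus.translate(concept))
```

A concept that comes back as None ("Hybrid rice" here) is the case the translation practices cover: it is either admitted into the indexing language or represented temporarily by a more general term.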
Parts of the document that have to be analyzed
- Title of the document/article – considered the basic indexing unit; it is the first stop in determining subject content
- Abstract – an information-packed miniature of the document; a good abstract can be a fundamental indicator of subject content
- Text itself – includes the introduction, summary, conclusion, section headings, and the first and last sentences of paragraphs
- Illustrations, diagrams, tables, and captions
- References – reference sources cited by the author may also be considered subject indicators

Factors that may affect content analysis:
- labor shortages or other critical time constraints
- the guidelines and policies imposed by the institution, which generally concern the selection of index content
- the indexer's decisions on which aspects of the subject will be emphasized and which will be de-emphasized

4. Translation
- involves converting terms in the natural language into standard terms drawn from a controlled vocabulary such as a thesaurus, subject headings list, etc.
- match terms in the concept list against those available in the controlled vocabulary

Practices to follow in the translation process:
- Concepts that are already represented in the indexing language should be translated into their preferred terms.
- Terms that represent new concepts should be checked for accuracy and acceptability against reference tools such as:
  ◦ dictionaries and encyclopedias
  ◦ thesauri (e.g. UNBIS Thesaurus)
  ◦ classification schemes (e.g. Library of Congress)
  ◦ established indexes (e.g. Readers' Guide to Periodical Literature)
- Subject specialists, particularly those with some knowledge of indexing or documentation, may also be consulted.
- If the concepts are not found in an existing thesaurus or classification scheme, they may be:
  • expressed by terms or descriptors that are admitted into the indexing language, or
  • represented temporarily by more general terms, with the new concepts proposed as candidates for later addition

Translation
- Group references to information that is scattered in the text of the document.
- Combine headings and subheadings into related multilevel headings.
- Direct users seeking information under terms not used to the terms that are used, by means of see references, and to related terms by means of see also references.
- Arrange the index into a systematic presentation.

Generating Index Entries
Index entries may be generated manually or by computer.
- Manual generation – generating index entries one by one using an ordinary or electric typewriter
- Machine generation – using computers to generate index entries; various software packages are available

Indexing Techniques for Periodicals
1. Topics that can be considered for indexing include:
- persons
- sports events
- economic news
- special features
- social trends
- local politics
- entertainment
- editorials and columns
- first and last events

• All articles that have permanent value should be indexed under all the topics and issues they deal with.
• Editorials should be indexed under their topics like any other article, but distinguished from other articles by adding (Ed.) or (E). The titles of editorials may also be indexed under the collective heading "Editorials".
• Letters to the editor, if considered indexable, should be indexed by topic, not under a caption that may have been assigned by the editor. It is advisable to index at least the name of the person who criticized an article as well as the author's response.

2. Preference and Forms of Headings, based on the International Organization for Standardization standard ISO 999

Personal Names:
– Provide as full a form as possible.
– Choose the most recent or most commonly used form of a personal name as the heading and add "see" cross-references from other forms.
– Personal names should take the form used in the document, but if the text is not consistent the indexer should adopt one form.
– Compound and multiple surnames, whether hyphenated or not, should be indexed under the first part, e.g. Lee Chua, Queena, Loren; Perez de Cuellar, Javier
– Persons normally identified by a title of honor or nobility should be indexed under the first name, e.g.
  Prince Charles see Charles, Prince of Wales
  Queen Elizabeth I see Elizabeth I, Queen of England

Corporate Bodies
• Names of corporate bodies should normally be indexed without transposition and in as full a form as necessary. An initial article is omitted unless specifically required for semantic or grammatical reasons, e.g. Lopez Museum
• Transposition may be used if it would help the users of the index, e.g. Department of Energy see Energy, Department of
• Choose the most recent, or the most commonly used, form of a corporate name as the main heading and add "see" cross-references from other forms, e.g. Philippine Normal College see Philippine Normal University

Geographic Names
• Geographic names should be as full as necessary for clarity, with additions to avoid confusion with otherwise identical names.
  Example: J.P. Rizal (Quezon City); J.P. Rizal (Marikina)
• An article or preposition should be retained in a geographic name of which it forms an integral part.
  Example: Santolan, Pasig City
• Where the article or preposition does not form an integral part of a name, it should be omitted.
  Example: New Day rather than The New Day

INDEXING STANDARDS
Part 3

Standards serve as models and guidelines for the analysis of documents, the construction and organization of indexes, indexing terminology, the construction and use of thesauri, etc. They promote consistency and uniformity.

A. International Organization for Standardization (ISO)
A network of the national standards institutes of 146 countries, on the basis of one member per country, with a Central Secretariat in Geneva, Switzerland, that coordinates the system.
- ISO 5963:1985 Documentation – Methods for examining documents, determining their subjects, and selecting indexing terms
- ISO 999:1996 Information and documentation – Guidelines for the content, organization and presentation of indexes
- ISO 4:1997 Information and documentation – Rules for the abbreviation of title words and titles of publications. The associated List of Serial Title Word Abbreviations includes title word abbreviations in over 50 languages.

B. National Information Standards Organization (NISO)
A nonprofit association accredited by the American National Standards Institute (ANSI) that identifies, develops, maintains, and publishes technical standards to manage information in our changing and ever-more digital environment.
NISO standards apply both traditional and new technologies to the full range of information-related needs, including retrieval, repurposing, storage, metadata, and presentation.

Standards developed by NISO:
– ANSI/NISO Z39.2-1994 (R2001) Information Interchange Format; equivalent international standard: ISO 2709
– ANSI/NISO Z39.19-2003 Guidelines for the Construction, Format, and Management of Monolingual Thesauri; equivalent international standard: ISO 2788

C. British Standards Institution (BSI)
As the National Standards Body of the UK, it develops standards and applies innovative standardization solutions to meet the needs of business and society.
Standards developed by BSI (related to library and information science):
– BS 1749:1985 Recommendations for alphabetical arrangement and the filing order of numbers and symbols
  • provides guidance on arranging entries within lists of all kinds, e.g. bibliographies, catalogues, directories and indexes
– BS ISO 999:1996 Information and documentation – Guidelines for the content, organization and presentation of indexes

Automatic Indexing
- refers to indexing by machine, or the analysis of text by means of computer algorithms. The focus is on automatic methods used behind the scenes with little or no input from individual searchers, with the exception of relevance feedback.
- It does not include searching options and techniques used by human searchers, such as creating effective search statements, adding weights to terms, specifying proximity requirements, using truncation or wild cards, or combining terms with Boolean or role operators.

Four Types of Approaches
• Statistical – based on counts of words, statistical associations, and collation techniques that assign weights and cluster similar words.
  Example: tf-idf (term frequency–inverse document frequency), which is frequently used in search engines.
  The intuitive philosophy behind tf-idf is that terms that are frequent in many documents are less suited to making discriminations, while terms that are frequent within a single document may indicate that this document has much information about the things the terms refer to.
  Source: Cleveland & Cleveland, 2001, p. 211
• Syntactical – stresses grammar and parts of speech, identifying concepts found in designated grammatical combinations, such as noun phrases.
• Semantic – systems concerned with the context sensitivity of words in the text.
  Example: What does "cat" mean in its context? House cats? Heavy earthmoving equipment?
• Knowledge-based – systems that go beyond thesaurus or equivalence relationships to knowing the relationships between words.
  Example: "tibia" is part of a leg, so the document is indexed under "leg injuries".

Human/Manual Indexing vs. Automatic Indexing
• Automatic methods have trouble handling synonyms, homonyms, and semantic relations; their conceptualizing is very poor.
• Human indexers go through cognitive processes that may be influenced by their background experience, education, training, intelligence, and common sense.
• Computers can, and humans cannot, organize all the words in a text and in a given database and perform statistical operations on them (e.g. tf-idf).

Websites for Indexers
Indexing Services
- H.W. Wilson Home Page (http://www.hwwilson.com/)
- Wright Information (http://mindspring.com/~jancw/)
- Susan Holbert Indexing Services (http://abbington.com/holbert/)
Special Formats and Subjects Indexing
- ASIS Thesaurus of Information Science (http://www.asis.org/Publications/Thesaurus/isframe.htm)
- The Library of Congress Thesauri (http://lcweb.loc.gov/pmei/lexico/liv/bsearch.html)
Standards
- National Information Standards Organization (http://www.niso.org/)
- ANSI/NISO Z39.14-1997 Guidelines for Abstracts (http://www.ansi.org/)
- ANSI Z39.4-1984 Basic Criteria for Indexes (http://www.ansi.org/)
Indexing Software
- HTML Indexer (for Windows): http://www.html-indexer.com/
- Cindex (for DOS, Windows, and Macintosh): http://www.indexres.com
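The tf-idf weighting cited under the statistical approach to automatic indexing can be sketched in a few lines. This sketch assumes the plain textbook variant, weight = (count of the term in the document) × log(N / number of documents containing the term); search engines use many refinements of it, and the three short documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term frequency within this document
        weights.append({term: count * math.log(n_docs / df[term])
                        for term, count in tf.items()})
    return weights


docs = [
    "rice farming and hybrid rice",
    "rice prices and inflation",
    "indexing and abstracting services",
]
w = tf_idf(docs)
print(sorted(w[0].items(), key=lambda kv: kv[1], reverse=True))
```

Because "and" occurs in every document its weight is zero, while "farming", found only in the first document, outweighs "rice" even though "rice" appears twice there. This is the discrimination effect described in the quotation from Cleveland & Cleveland.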