Indexing & retrieval

Approaches to indexing
• Key word indexing
• Concept indexing
• Social indexing
• Non-text indexing

Keyword Indexing

Keyword indexing (1)
Entity-oriented - draw terms from entity itself
Example: "How to succeed in graduate school"
Advantages:
• Quick
• Inexpensive
• No vocabulary lag
• Multiple access points
• Accuracy
• No intellectual effort needed

Keyword indexing (2)
Disadvantages:
• No control over synonyms, near synonyms
• No control over homographs

Keyword indexing (3)
Disadvantages:
• Dependent on authors for informative and accurate titles
Examples:
– "Artificial metalloenzymes based on the biotin−avidin technology: enantioselective catalysis and beyond"
– "The golden peaches of Samarkhand"

Keyword indexing (4)
Disadvantages:
• No control over word forms
Example: "Communicating in the library" or "Communications in libraries"

Keyword indexing (5)
Disadvantages:
• No cross reference structure

Historical key word indexing methodologies
• Uniterm cards
• Edge-notched cards
• Optical coincidence cards
• Key word in context (KWIC)
• Spatial indexing

Pre- versus post-coordinate indexing
Mortimer Taube
Pre-coordinate (12 terms):
China—Folklore    China—History    China—Politics
France—Folklore   France—History   France—Politics
Germany—Folklore  Germany—History  Germany—Politics
Russia—Folklore   Russia—History   Russia—Politics
Post-coordinate (7 terms):
China, France, Germany, Russia, Folklore, History, Politics

Post-coordinate index searching
History of France → France + History
Two sets of documents: France, History
Boolean AND search yields intersection of the two sets: France AND History

Advantages to Taube's system
• No need to develop a list of authorized terms; terms are pulled from the documents themselves
• No need to articulate rules of punctuation for representing complex concepts (France—History)
• No need to delineate citation order (France—History v. History—France)
• No need to formulate rules for subheadings ("May subdivide geog.")

Uniterm cards
One card per term
Document no. 102: "Arrest statistics of the Arizona State Police"
Cards: state, police
Document numbers posted on the cards: 31 102 53 24 75 96 107 68 49 70 34 95 117 59 115 147 109 11 102 23 91 85 96 87 68 49 115 107 79 60

Searching with uniterm cards
Query: looking for documents about state police
Cards consulted: state, police
Documents retrieved:
102 Arrest statistics of the Arizona State Police.
107 A short history of the Wisconsin State Police.
115 The modern police state.

Edge-notched cards
One card per bibliographic item
Card 1: Whirdeaux, Ima. Caring for your pet pterodactyl / by Ima Whirdeaux. Call no. Q54321 .W45. Notch labels: pet-care, pterodactyls
Card 2: Turner, Paige. Caring for your grizzly / by Paige Turner. Call no. Q12345 .T8. Notch labels: pet-care, bears

Pyramid coding for edge-notched cards
Coding the year 1947*
20 dots: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
10 dots: 9 5 2 0 8 4 1 7 3 6 9 5 2 0 8 4 1 7 3 6
*They hadn't heard of the Y2K problem yet.
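
The post-coordinate search on uniterm cards described above (one card per term, Boolean AND as the intersection of the cards' document numbers) can be sketched in a few lines. This is a minimal illustration, not a reconstruction of the slide's cards: the `cards` postings and the `search` helper are hypothetical, and only the three document titles come from the slides.

```python
# Hypothetical uniterm cards: each term maps to the set of document
# numbers posted on its card (illustrative postings, not the slide's).
cards = {
    "state": {31, 53, 102, 107, 115},
    "police": {11, 23, 102, 107, 115},
    "arizona": {102},
}

# Titles taken from the "Searching with uniterm cards" slide.
titles = {
    102: "Arrest statistics of the Arizona State Police",
    107: "A short history of the Wisconsin State Police",
    115: "The modern police state",
}


def search(cards, *terms):
    """Boolean AND: document numbers that appear on every term's card."""
    postings = [cards.get(term, set()) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


hits = search(cards, "state", "police")
for doc in hits:
    print(doc, titles[doc])
```

Document 115 is retrieved because both terms are posted for it, even though it is about a police state rather than a state police force. Coordinating terms at search time cannot tell the two apart.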
Optical coincidence cards
Pre-printed cards with numbers for entire database
Card: fleas
0 10 20 30 40 50 60 70 80 90
1 11 21 31 41 51 61 71 81 91
2 12 22 32 42 52 62 72 82 92
3 13 23 33 43 53 63 73 83 93
4 14 24 34 44 54 64 74 84 94
5 15 25 35 45 55 65 75 85 95
6 16 26 36 46 56 66 76 86 96
7 17 27 37 47 57 67 77 87 97
8 18 28 38 48 58 68 78 88 98
9 19 29 39 49 59 69 79 89 99

Key Word in Context (KWIC) Index
Figure labels: stop word, index word, stop word
Doc 15 title: "A comparison of OCLC and WLN hit rates for monographs and an analysis of the types of records retrieved"
                   CONTEXT  KEY WORDS                       POINTER
 ttems of remote users: an  analysis of the types of        15
 hit rates for monograph/A  comparison of OCLC and WLN      15
comparison of OCLC and WLN  hit rates for monographs and /  15
OCLC and WLN hit rates for  monographs and an analysi/      15
onographs/ A comparison of  OCLC and WLN hit rates for      15
arison of OCLC and WLN hit  rates for monographs and /      15
n analysis of the types of  records retrieved. A com/       15
 s of the types of records  retrieved. A comparison /       15
phs and an analysis of the  types of records retrieve/      15
  A comparison of OCLC and  WLN hit rates for monogra/      15

Key Word Out of Context (KWOC) Index
aardvark     101
baggage      123
banyan       128, 159, 179
coconut      955, 654
driving      196, 488, 788
elementary   455, 785
elephant     128, 465, 783
garage       678, 398
hardware     849, 483, 399
meter        768
nadir        877
noxious      112
opium        289
opus         985, 159, 849
people       629, 458
quark        137, 492
radar        968, 295
radio        430, 206, 749
stereo       294, 837, 873
television   745, 727, 883
ultraviolet  958, 774
zebra        276

Vector space model (VSM)
Each document represented by a vector
Figure: axes "technology" and "libraries"; vector for document entitled "Assistive technology for libraries"

Vector space model matching
Similarity between query and document vectors
Figure: axes "technology" and "libraries"; vector for document 1, vector for document 2, vector for query

VSM term weighting
Assign high weights to terms that appear frequently in the document but infrequently in the database
Term         Freq. w/in document  No. of documents with term
conclusion   low                  high
information  high                 high
blind        high                 low
Query: "I'm looking for articles about assistive technology for the blind."

VSM refinements
Adding semantic and syntactical parsing.
• Bill is going to the store to make a purchase.
• Bill is going to purchase the store.
• Bill is going to store his purchase.

Concept indexing
• Rather than pulling terms from documents, assign concept identifier (e.g. France—History) to documents dealing with history of France
• Requires intellectual effort
• Takes more time than key word indexing so less economical
• Avoids problems of false coordination and synonymy through use of vocabulary control

Vocabulary control (1)
One indexing term or phrase to represent a concept
– Unidentified flying objects not flying saucers
– Point user to correct term with "use" reference
– Reduces number of searches needed to find items about a particular topic

Vocabulary control (2)
One form of a word to represent the concept
– Dictionaries not dictionary

Vocabulary control (3)
One usage of a homographic term
– Fault (geologic) not fault (responsibility for error)
– Usage identified through scope note
– Consistency among indexers as well as one indexer over time
– Helps user to avoid false drops

Vocabulary control (4)
Syndetic structure
– Broader terms
– Narrower terms
– Related terms (see also)
– User can negotiate structure to find most appropriate term, as well as identify additional related terms of potential use in finding relevant documents

Social network indexing
• Tags
• Tag clouds
• User-created tags providing access to library resources

flickr
http://www.flickr.com/
Tags: architecture Bohemian South Country Czech Republic Europe European historical medieval old Old Town Other Keywords River Snow town Vltava
Tags (177,583 photos)
Tag clouds
Geotagging
Librarian tagging
Library using flickr: Peace Palace Library (PPL)

Social bookmarking: http://www.delicious.com
http://www.delicious.com/mauicclibrary
technology
• The economic case for open access in academic publishing
• Portable software for USB drives
• CU Researcher Finds 10,000-Year-Old Hunting Weapon in Melting Ice Patch

University of Pennsylvania
http://www.library.upenn.edu/
PennTags
• Item list with PennTags
• Adding a PennTag: Add to PennTags

Non-text indexing

Indexing Music
Indexing music transcription
1 1 5 5 6 6 5

Indexing Music - melodic contour
* R U R U R D
(R = repeat, drawn as -; U = up, drawn as /; D = down, drawn as \)

Query by humming

Query by humming (2)
Hummed queries → digital audio → pitch tracker → melodic contour → query engine → ranked list of matching melodies
MIDI songs → melody database → query engine
Source: Ghias, Asif; Logan, Jonathan; Chamberlin, David; and Brian C. Smith. 1995. Query by humming: musical information retrieval in an audio database. ACM Multimedia 95 - Electronic Proceedings. http://www.cs.cornell.edu/Info/Faculty/bsmith/query-by-humming.html

Indexing Music - melodic contour
http://www.musipedia.org/
Search string RURURD for contour * R U R U R D

Indexing images
Source: Trust Territory archives.
Indexing images - chair (1)
Indexing images - ?
Indexing images - chair (2)

Biometrics - face
Biometrics - differences
Biometrics - similarities
Look at ratios of distances between marker points

Indexing images
• Color
• Layout
• Shape

Indexing images by color
http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicColor.mac/qbic?selLang=English

Indexing images by layout
http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicLayout.mac/qbic?selLang=English

Indexing images by shape
http://shape.cs.princeton.edu/search.html

Search by Shape – Commercial Usage
http://www.youtube.com/watch?v=grShwnDXyUA

Search by Color
Exercise 1
Title? Artist?
http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicSearch.mac/qbic?selLang=English
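
The slides do not say how the color search linked above compares images. As a rough sketch under that caveat, one common way to index images by color is to reduce each image to a coarse color histogram and rank images by how much their histograms overlap with the query's. Everything below is assumed for illustration: the toy pixel lists, the bucket size, and the function names `color_histogram` and `histogram_intersection`.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Share of pixels falling in each coarsely quantized RGB bucket."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    return {bucket: n / total for bucket, n in counts.items()}


def histogram_intersection(h1, h2):
    """Overlap in [0, 1]: sum of the smaller share in each shared bucket."""
    return sum(min(share, h2.get(bucket, 0.0)) for bucket, share in h1.items())


# Hypothetical toy images, each a flat list of (R, G, B) pixels.
images = {
    "mostly_red": [(220, 20, 30)] * 9 + [(20, 20, 200)] * 1,
    "mostly_blue": [(10, 30, 210)] * 8 + [(230, 10, 10)] * 2,
    "red_and_blue": [(200, 10, 10)] * 5 + [(10, 10, 250)] * 5,
}
index = {name: color_histogram(px) for name, px in images.items()}

# Query "painted" by the user: 80% red, 20% blue.
query = color_histogram([(255, 0, 0)] * 8 + [(0, 0, 255)] * 2)
ranked = sorted(
    index,
    key=lambda name: histogram_intersection(query, index[name]),
    reverse=True,
)
print(ranked)
```

A histogram records only how much of each color is present, not where it sits in the picture, which is why layout and shape are treated as separate kinds of image index.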