The Future of the Online Catalog
Library Automation: Yesterday’s Technology, Tomorrow
Andrew K. Pace, NCSU Libraries
July 28, 2006

What I will cover:
– Online catalog: the problem
– Brief environmental scan
– Endeca: team, timeline, technology
– Usability, statistical results, relevance study
– Dis-integrated systems / Future Catalogs

What ILS Catalogs Do Well… (liberally stolen from Roy Tennant)
– Inventory control: What and where
– Known item searching

What ILS Catalogs Don’t Do Well… (liberally stolen from Roy Tennant, and augmented by me)
– Any search other than known item
– Most anything other than books (serials, e-resources, articles, digital objects)
– Logical groupings of results (e.g. FRBR)
– Faceted browsing
– Relevance ranking
– Sideways searching (suggestions, expansion of searches and search targets)

“OPAC Complainers”
“There is certainly no dearth of OPAC complainers. You have Andrew Pace (OPACs suck), and Roy Tennant (You Can’t Put Lipstick on a Pig) writing and presenting about the need for change (more simplicity) in the OPAC world. I can appreciate their arguments for a simpler OPAC (not to mention the rest of the system) but other then [sic] present their arguments, neither has much in the way of suggestions nor have they sparked a movement among librarians or the automation vendors to do anything about the situation.”
- ACRL Blog entry, Oct. 13, 2005

NextGen Library Search Tools
– RedLightGreen (RLG)
– OCLC Fictionfinder
– Vivisimo clustered search (Ex Libris, Serials Solutions)
– Grokker (EBSCO)
– Aquabrowser visual context
– Endeca Information Access Platform
– OCLC Custom Worldcat and OpenWorldCat
– Innovative Interfaces OPAC Pro & Encore
– Ex Libris Primo
– Polaris, AJAX-Enabled OPAC
– SirsiDynix Enterprise Portal System, FAST
– Talis, et al. Web Services
– Georgia Pines and the Library 2.0 Bandwagon

Endeca purchase decision
– Lots of topical searches and poor subject access
  – Keyword gives too many or too few results
  – leads to general distrust
  – Misunderstanding of authority headings
– No relevancy ranking of results
– Needed more responsiveness (speed)

Implementation Team
– 7 representative team members
  – Andrew Pace, IT, Chair
  – Emily Lynema, IT, ex officio (tech lead)
  – Cindy Levine, Research and Information Services
  – Erik Moore, IT, ex officio (ILS librarian)
  – Charley Pennell, Metadata and Cataloging
  – Shirley Rodgers, IT
  – Tito Sierra, Digital Library Initiatives
– Timeline
  – License / negotiation: Spring 2005
  – Acquire: Summer 2005
  – Implementation: August 2005 – January 12, 2006

Technical Overview
– Endeca ProFind co-exists with SirsiDynix Unicorn ILS and Web2 online catalog.
– Endeca indexes MARC records exported from Unicorn.
– Index is refreshed nightly with records added/updated during previous day.
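
The nightly refresh described in the Technical Overview can be sketched roughly as follows. This is only an illustration under stated assumptions, not NCSU's actual script: the record shape, the function names `merge_nightly` and `to_flat_text`, and the pipe-delimited output are all hypothetical, and the real flat-text format that Endeca reads is not given in these slides.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical flattened record: field name -> value (not a real MARC structure).
Record = Dict[str, str]


def merge_nightly(indexed: Dict[str, Record],
                  changed: Iterable[Tuple[str, Record]]) -> Dict[str, Record]:
    """Return a new record set with the day's added/updated records applied.

    Records are keyed by a record ID; a changed record replaces the copy
    already indexed, and a new record is simply added.
    """
    merged = dict(indexed)  # copy, so the previous night's set is left untouched
    for record_id, record in changed:
        merged[record_id] = record
    return merged


def to_flat_text(records: Dict[str, Record]) -> str:
    """Serialize every record as one line of text (assumed format)."""
    lines = []
    for record_id in sorted(records):
        fields = records[record_id]
        parts = [f"id={record_id}"] + [f"{k}={fields[k]}" for k in sorted(fields)]
        lines.append("|".join(parts))
    return "\n".join(lines)


# Example: one record updated, one added, one unchanged.
indexed = {"b1": {"title": "Old title"}, "b2": {"title": "Unchanged"}}
changed = [("b1", {"title": "New title"}), ("b3", {"title": "Added today"})]
merged = merge_nightly(indexed, changed)
flat = to_flat_text(merged)
```

The sketch writes out the whole merged set rather than only the changes, so whatever indexes it can rebuild from scratch each night.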

Endeca ProFind Overview
[Diagram, repeated on three slides: “Endeca ProFind Overview”, “Offline - Nightly”, and “Always Online”. Labels: Raw MARC data; NCSU exports and reformats; Flat text files; Data Foundry (Parse text files); Indices; Navigation Engine; HTTP; NCSU Web Application; HTTP; Client browser]

Integrating Endeca
– Endeca doesn’t understand MARC data / MARC-8 character encoding
  – translate to UTF-8 text files
– Each night a script updates the data indexed by Endeca:
  – Exports updated or new MARC records from Unicorn.
  – Reformats and merges these records with those already indexed.
  – Starts Endeca re-index – completely rebuilding index for the catalog.
– Process requires about 4 hours.
– Retain Web2 OPAC for some functionality
  – Authority searching - known items and cross-references
  – Detailed record pages – how to make Endeca -> Web2 link?

Quick Demo
http://catalog.lib.ncsu.edu

Some User Reaction
“This is absolutely the coolest thing I've seen all century.”
- Will Owen, Head of Systems (UNC Libraries)
“Also, I'm really digging the new NCSU library catalog. Very nice.”
- Educause staff (non-librarian)
“The new Endeca system is incredible. It would be difficult to exaggerate how much better it is than our old online card catalog (and therefore that of most other universities). I've found myself searching the catalog just for fun, whereas before it was a chore to find what I needed.”
- NCSU Undergrad, Statistics

Basic statistics (March – May 2006)
Requests by Search Type
– Search: 51%
– Search -> Navigation: 29%
– Navigation: 20%

Navigation statistics (March – May 2006)
Navigation Requests by Dimension (number of requests)
– Availability: 23,848
– LC Classification: 169,249
– Subject: Topic: 155,856
– Subject: Genre: 65,545
– Format: 74,985
– Library: 87,221
– Subject: Region: 59,248
– Author: 70,516
– Subject: Era and Language: 38,605 and 38,074 (the extracted chart does not show which value belongs to which)

Navigation statistics (March – May 2006)
Navigation by Dimensions
– LC Classification: 20%
– Subject: Topic: 19%
– Library: 11%
– Format: 9%
– Author: 9%
– Subject: Genre: 8%
– Subject: Region: 7%
– Subject: Era: 5%
– Language: 5%
– New: 4%
– Availability: 3%

Sorting statistics (March – May 2006)
Sorting Requests
– Pub Date: 53%
– Most Popular: 19%
– Title A-Z: 13%
– Author A-Z: 9%
– Call Number: 6%

Other interesting tidbits… (March 2006)
– Authority searching decreased 45%
– Keyword searching increased 230%
  – Caveat: default catalog search changed from title authority to keyword
– ~ 5% of keyword searches offered spelling correction or suggestion
  – 3.1% - automatic spell correction
  – 2.3% - “Did you mean…” suggestion

Usability Testing Trends
– 10 undergraduate students
  – 5 with Endeca catalog
  – 5 with old Web2 OPAC
– Endeca performed as well as OPAC for known-item searching
  – 89% Endeca tasks completed ‘easily’ (8/9)
  – 71% OPAC tasks completed ‘easily’ (15/21)
– Endeca performs better than OPAC for topical searching
  – 61% Endeca tasks completed ‘easily’ (19/31)
  – 3% Endeca tasks completed as ‘hard’ (1/31)
  – 33% OPAC tasks completed ‘easily’ (13/39)
  – 26% OPAC tasks completed as ‘hard’ (10/39)

A study in relevance
– Are search results in Endeca more likely to be relevant to a user’s query than search results in Web2 OPAC?
– 100 topical user searches from 1 month in fall 2005
– How many of top 5 results relevant?
  – 40% relevant in Web2 OPAC
  – 68% relevant in Endeca catalog

Relevance defined
– Relevance ranking in Endeca – select from a variety of modules and order them based on importance.
– Relevance most important in Keyword Anywhere - searches all fields.
– At NCSU…
  1. Original query term(s) (no thesaurus, stemming, spell correction)
  2. Exact phrase match
  3. Field ranking (Title higher than Author higher than Table of Contents)
  4. Number of fields that contain term(s)
  …

Future Plans
– Ongoing tweaks:
  – Continued usability testing
  – Relevance ranking algorithms & spell correction thresholds
  – Additional browsing options
– Endeca 2.0 ideas
  – FRBR-ized display
  – Discussions with OCLC regarding FAST (Faceted Access to Subject Terms) and FRBR
  – Patron-generated refinements (folksonomies?)
  – Enrich records with supplemental Web Services content – more usable TOCs, book reviews, etc.
  – The death of authority searching (?)
  – More integration with QuickSearch, other data repositories, and third-party discovery tools

Stuff to read…
– Rethinking how we provide bibliographic services for the University of California, by the Bibliographic Services Task Force
  http://libraries.universityofcalifornia.edu/sopag/BSTF/Final.pdf
– The Changing nature of the catalog and its integration with other discovery tools, by Karen Calhoun
  http://www.loc.gov/catdir/calhoun-report-final.pdf
– The Changing nature of the catalog and its integration with other discovery tools. Final report. March 17, 2006. Prepared for the Library of Congress by Karen Calhoun: A Critical review, by Thomas Mann
  http://www.guild2910.org/AFSCMECalhounReviewREV.pdf
– A “Next Generation” Catalog, Eric Morgan
  http://dewey.library.nd.edu/morgan/ngc/
– Metadata Research Center, SILS
  http://ils.unc.edu/mrc/
– University of Rochester eXtensible Catalog
– Toward a 21st Century Catalog, ITAL, Sept. 2006, by Antelman, Lynema, and Pace

From the Calhoun Report
"If one accepts the premise that library collections have value, then library leaders must move swiftly to establish the catalog within the framework of online information discovery systems of all kinds. Because it is catalog data that has made collections accessible over time, to fail to define a strategic future for library catalogs places in jeopardy the legacy of the world's library collections themselves. For this reason, the option of rejecting library catalogs is not considered in this report."

The library system pile
“Seams serve as perceptible boundaries that provide points of reference; without such boundaries readers get ‘lost at sea’ and don’t know where they are in relation to anything else; they can’t perceive either the extent of what they have or what they don’t have.”
- Thomas Mann

Wither or Whither the Catalog?

Reversal of fortune
[Diagram labels: OLD SEARCH MODEL; NEW SEARCH MODEL]

The library system puzzle
[Diagram labels: Serials; A&I / FT DBs; Catalog; Web]

The library system puzzle
[Diagram labels: Serials; A&I / FT DBs; Metasearch; ERM Systems; GS; Catalog; Guided Navigation; Digital Repositories; Web; Legacy ILS; IR]

Thank you.
http://www.lib.ncsu.edu/endeca
Andrew Pace, Head, IT
andrew_pace@ncsu.edu