Endeca @ NCSU Libraries
Andrew Pace & Emily Lynema
NCSU Libraries
May 24, 2006

Technical Overview
Endeca Information Access Platform (IAP) coexists with the SirsiDynix Unicorn ILS and the Web2 online catalog.
Endeca indexes MARC records exported from Unicorn.
The index is refreshed nightly with records added or updated during the previous day.

Endeca IAP Overview
[Diagram: Endeca Information Access Platform. Labels: NCSU exports and reformats; Data Foundry; Parse text files; Raw MARC data; MDEX Engine; Indices; Flat text files; HTTP; HTTP; Client browser; NCSU Web Application. The same diagram is repeated on two further slides titled "Offline - Nightly" and "Always Online".]

Integrating Endeca
Endeca doesn’t understand MARC data / MARC-8 character encoding: translate to UTF-8 text files.
Each night a script updates the data indexed by Endeca:
  – Exports updated or new MARC records from Unicorn.
  – Reformats and merges these records with those already indexed.
  – Starts Endeca re-index, completely rebuilding the index for the catalog.
Process requires about 7 hours.
Retain Web2 OPAC for some functionality:
  – Authority searching: known items and cross-references
  – Detailed record pages: how to make Endeca -> Web2 link?

Integrating Endeca - Future
MarcAdapter plugin for raw MARC data.
  – Create local field mappings and special handlers in Java.
  – Eliminate need for external MARC 21 translation and file merging.
Partial Updates
  – Update circulation data multiple times throughout the day.
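
The nightly "reformat and merge" step described under Integrating Endeca could look roughly like the sketch below. It is only an illustration, not the NCSU script: the flattened record shape, the "id" key, and the tab-separated "field=value" output are all assumptions, and the MARC-8 to UTF-8 translation is taken to have happened already.

```python
from typing import Dict, Iterable, List

# Hypothetical flattened record: field name -> UTF-8 text (not the real NCSU layout).
Record = Dict[str, str]


def merge_records(indexed: Iterable[Record], changed: Iterable[Record]) -> List[Record]:
    """Merge the day's new or updated records into the already-indexed set.

    Records are keyed on an assumed "id" field. A changed record replaces the
    indexed record with the same id; records with unseen ids are appended.
    """
    merged = {rec["id"]: rec for rec in indexed}
    for rec in changed:
        merged[rec["id"]] = rec
    return list(merged.values())


def to_flat_lines(records: Iterable[Record]) -> List[str]:
    """Render each record as one tab-separated line of field=value pairs."""
    return ["\t".join(f"{key}={value}" for key, value in rec.items()) for rec in records]


# Yesterday's indexed records plus today's export (made-up sample data).
indexed = [{"id": "1", "title": "Old title"}, {"id": "2", "title": "Kept"}]
changed = [{"id": "1", "title": "New title"}, {"id": "3", "title": "Added"}]

merged = merge_records(indexed, changed)
flat_lines = to_flat_lines(merged)
for line in flat_lines:
    print(line)
```

Because the whole merged set is written out again, this matches the full nightly rebuild described above; the partial updates listed under future plans would instead send only the changed records.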

Quick Demo
http://catalog.lib.ncsu.edu

Some Search Statistics (March 2006)
Requests by Search Type
  – Search: 55%
  – Search -> Navigation: 30%
  – Navigation: 15%
Searches by Search Key
[Bar chart. x-axis: Search Key (Keyword, ISBN, Title, Author, Subject, Multi-Field); y-axis: Requests. Bar values: 74971, 32776, 13563, 9872, 5838, 1141.]

Some Navigation Statistics (March 2006)
[Bar chart: Navigation by Dimensions. Axes: Dimension, Requests. The values are the same as in the Navigation Statistics (II) table.]

Navigation Statistics (II) (March 2006)
Dimension            Requests   Order (on page)
LC Classification    49931      2
Subject: Topic       44197      3
Library              23291      6
Format               20867      5
Author               17939      10
Subject: Genre       17720      4
Subject: Region      13607      7
Language             8653       9
Subject: Era         7451       8
Availability         6790       1

Other interesting tidbits… (March 2006)
Authority searching decreased 45%.
Keyword searching increased 230%.
  – Caveat: default catalog search changed from title authority to keyword.
~6% of keyword searches offered spelling correction or suggestion
  – 3.6%: automatic spell correction
  – 2.6%: “Did you mean…” suggestion

Usability Testing
10 undergraduate students
  – 5 with Endeca catalog
  – 5 with old Web2 OPAC
Endeca performed as well as OPAC for known-item searching in usability test
  – 89% Endeca tasks completed ‘easily’ (8/9)
  – 71% OPAC tasks completed ‘easily’ (15/21)
Endeca performed better than OPAC for topical searching in usability test.
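
As a small worked example, the Navigation Statistics (II) figures can be tabulated to compare how often each dimension was used with where it sits on the page. The request counts and page orders are copied from the slide; the variable names and the share calculation are illustrative additions.

```python
# Figures copied from the Navigation Statistics (II) slide:
# dimension -> (requests in March 2006, order on page).
navigation = {
    "LC Classification": (49931, 2),
    "Subject: Topic": (44197, 3),
    "Library": (23291, 6),
    "Format": (20867, 5),
    "Author": (17939, 10),
    "Subject: Genre": (17720, 4),
    "Subject: Region": (13607, 7),
    "Language": (8653, 9),
    "Subject: Era": (7451, 8),
    "Availability": (6790, 1),
}

total_requests = sum(requests for requests, _ in navigation.values())

# Rank dimensions from most to least used.
by_requests = sorted(navigation, key=lambda dim: navigation[dim][0], reverse=True)
usage_rank = {dim: rank for rank, dim in enumerate(by_requests, start=1)}

for dim in by_requests:
    requests, page_order = navigation[dim]
    share = 100 * requests / total_requests
    print(f"{dim:<18} {requests:>6} {share:5.1f}%  "
          f"usage rank {usage_rank[dim]:>2}  page order {page_order:>2}")
```

On these figures, Availability is listed first on the page but ranks last by requests, while Author is listed tenth and ranks fifth.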

Topical Searching Tasks
Topical Task Success   Web2   Endeca
Easy                   36%    58%
Medium                 7%     17%
Hard                   23%    3%
Failed                 34%    22%
[Chart: Average Topical Task Duration for Tasks 5-10, Web2 vs. Endeca; time axis from 00:00 to 03:36.]

Usability Testing Trends
Relevance *most* important
  – “Once I scroll through a page, I get pretty discouraged about the results...” (Web2 OPAC participant looking for resources on cat health)
‘Keyword’ term less intuitive / trusted than ‘Subject’ and ‘Title’
  – “[I used] Keyword in Title because that’s what I want the book to be mainly referring to. But I also could’ve went Keyword in Subject. But if I’d have went Keyword Anywhere it would have had too big of a field to look through.” (Web2 OPAC participant looking for resources on gene therapy)
When found, dimensions seem intuitive and useful
‘Did you mean’ seems intuitive

A study in relevance
Are search results in Endeca more likely to be relevant to a user’s query than search results in Web2 OPAC?
100 topical user searches from 1 month in fall 2005
How many of top 5 results relevant?
  – 40% relevant in Web2 OPAC
  – 68% relevant in Endeca catalog

Relevance defined
Relevance ranking in Endeca: select from a variety of modules and order them based on importance.
Relevance most important in Keyword Anywhere, which searches all fields.
At NCSU…
  1. Original query term(s) (no thesaurus, stemming, spell correction)
  2. Exact phrase match
  3. Field ranking (Title higher than Author higher than Table of Contents)
  4. Number of fields that contain term(s)
  …

Future Plans
Ongoing tweaks:
  – Continued usability testing
  – Relevance ranking algorithms & spell correction thresholds
  – Additional browsing options
Endeca 2.0 ideas
  – FRBR-ized display
  – Discussions with OCLC regarding FAST (Faceted Access to Subject Terms) and FRBR
  – Patron-generated refinements (folksonomies?)
  – Enrich records with supplemental Web Services content: more usable TOCs, book reviews, etc.
  – The death of authority searching (?)
  – More integration with QuickSearch, other data repositories, and third-party discovery tools

Thanks
http://www.lib.ncsu.edu/endeca
Andrew Pace, Head, IT, andrew_pace@ncsu.edu
Emily Lynema, Systems Librarian for Digital Projects, emily_lynema@ncsu.edu
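
Supplementary sketch for the "Relevance defined" slide: one way to picture "select modules and order them by importance" is a sort key with one entry per module, compared left to right. This is not Endeca's API or the actual NCSU configuration. The record fields, the token matching, and the scoring of each module are simplifying assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical record: field name -> text. Field names are assumptions.
Record = Dict[str, str]

# Field priority from the slide: Title above Author above Table of Contents.
FIELD_PRIORITY = ["title", "author", "toc"]


def relevance_key(record: Record, query: str) -> Tuple[int, int, int, int]:
    """Build a sort key with one score per module, most important first."""
    phrase = query.lower()
    terms = phrase.split()
    texts = {f: record.get(f, "").lower() for f in FIELD_PRIORITY}
    tokens = {f: text.split() for f, text in texts.items()}
    all_tokens = [tok for toks in tokens.values() for tok in toks]

    # 1. Original query terms: every term appears exactly as typed.
    original = int(all(t in all_tokens for t in terms))
    # 2. Exact phrase match in any field.
    exact = int(any(phrase in text for text in texts.values()))
    # 3. Field ranking: highest-priority field that contains every term.
    field_score = 0
    for position, field in enumerate(FIELD_PRIORITY):
        if all(t in tokens[field] for t in terms):
            field_score = len(FIELD_PRIORITY) - position
            break
    # 4. Number of fields that contain at least one term.
    field_count = sum(any(t in toks for t in terms) for toks in tokens.values())
    return (original, exact, field_score, field_count)


def rank(records: List[Record], query: str) -> List[Record]:
    """Order records best-first; later modules only break ties in earlier ones."""
    return sorted(records, key=lambda r: relevance_key(r, query), reverse=True)


# Made-up sample records for illustration.
records = [
    {"title": "Veterinary handbook", "author": "Smith", "toc": "cat health and dog health"},
    {"title": "Cat health", "author": "Jones", "toc": "nutrition"},
    {"title": "Health of the domestic cat", "author": "Lee", "toc": ""},
]
ranked = rank(records, "cat health")
for rec in ranked:
    print(rec["title"], relevance_key(rec, "cat health"))
```

Tuples compare element by element, so the exact-phrase module outranks field ranking here: the handbook with the phrase in its table of contents sorts above the title that contains both words but not the phrase.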