Endeca @ NCSU Libraries

Andrew Pace & Emily Lynema
NCSU Libraries
May 24, 2006
Technical Overview
Endeca Information Access Platform coexists with SirsiDynix Unicorn ILS and Web2 online catalog.
Endeca indexes MARC records exported from Unicorn.
Index is refreshed nightly with records added/updated during previous day.

Endeca IAP Overview
[Diagram: Raw MARC data -> NCSU exports and reformats -> Flat text files -> Data Foundry (parse text files) -> Indices -> MDEX Engine <- HTTP -> NCSU Web Application <- HTTP -> Client browser. Data Foundry and MDEX Engine sit inside the Endeca Information Access Platform.]
Endeca IAP Overview: Offline - Nightly
[Same diagram, repeated with the caption "Offline - Nightly".]
Endeca IAP Overview: Always Online
[Same diagram, repeated with the caption "Always Online".]
Integrating Endeca


Endeca doesn’t understand MARC data / MARC-8 character encoding – translate to UTF-8 text files
Each night a script updates the data indexed by Endeca:
– Exports updated or new MARC records from Unicorn.
– Reformats and merges these records with those already indexed.
– Starts Endeca re-index – completely rebuilding index for the catalog.
Process requires about 7 hours.
Retain Web2 OPAC for some functionality
– Authority searching - known items and cross-references
– Detailed record pages – how to make Endeca -> Web2 link?
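The merge step in the nightly script can be pictured as an upsert keyed on record ID. The following is a minimal sketch, not NCSU's actual script: the tab-delimited layout, the helper names, and the sample record IDs are all assumptions made for illustration, since the slides do not describe the flat-file format.

```python
def parse_flat_file(text):
    """Parse 'record_id<TAB>payload' lines into a dict keyed by record ID.

    The layout is hypothetical; it stands in for the reformatted UTF-8 text files.
    """
    records = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record_id, _, payload = line.partition("\t")
        records[record_id] = payload
    return records


def merge_records(indexed, updates):
    """Return a new dict in which new or updated records replace older versions."""
    merged = dict(indexed)   # copy, so the previous night's data is left untouched
    merged.update(updates)   # updated IDs overwrite, new IDs are added
    return merged


already_indexed = parse_flat_file("rec1\ttitle=Old title\nrec2\ttitle=Unchanged")
exported_today = parse_flat_file("rec1\ttitle=New title\nrec3\ttitle=Added today")
merged = merge_records(already_indexed, exported_today)
```

The merged set would then be written back out and handed to the full re-index, which is the step the slide reports as taking most of the 7 hours.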
Integrating Endeca - Future

MarcAdapter plugin for raw MARC data.
– Create local field mappings and special handlers in Java.
– Eliminate need for external MARC 21 translation and file merging.
Partial Updates
– Update circulation data multiple times throughout the day.
Quick Demo

http://catalog.lib.ncsu.edu
Some Search Statistics (March 2006)
Requests by Search Type
– Search: 55%
– Search -> Navigation: 30%
– Navigation: 15%
Searches by Search Key
[Bar chart: y-axis Requests (0 to 80,000), x-axis Search Key. Categories as extracted: Keyword, ISBN, Title, Author, Subject, Multi-Field. Bar values as extracted: 74971, 32776, 13563, 9872, 5838, 1141.]
Some Navigation Statistics (March 2006)
Navigation by Dimensions (requests per dimension)
Availability          6790
LC Classification    49931
Subject: Topic       44197
Subject: Genre       17720
Format               20867
Library              23291
Subject: Region      13607
Subject: Era          7451
Language              8653
Author               17939
Navigation Statistics (II) (March 2006)
Dimension            Requests   Order (on page)
LC Classification       49931    2
Subject: Topic          44197    3
Library                 23291    6
Format                  20867    5
Author                  17939   10
Subject: Genre          17720    4
Subject: Region         13607    7
Language                 8653    9
Subject: Era             7451    8
Availability             6790    1
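One way to read this table is to compare each dimension's usage rank with its position on the page. A minimal sketch using only the figures from the slide (the variable names are illustrative):

```python
# (dimension, requests in March 2006, order on page), copied from the slide
stats = [
    ("LC Classification", 49931, 2),
    ("Subject: Topic", 44197, 3),
    ("Library", 23291, 6),
    ("Format", 20867, 5),
    ("Author", 17939, 10),
    ("Subject: Genre", 17720, 4),
    ("Subject: Region", 13607, 7),
    ("Language", 8653, 9),
    ("Subject: Era", 7451, 8),
    ("Availability", 6790, 1),
]

# Rank dimensions by how often they were used, most requests first.
by_requests = sorted(stats, key=lambda row: row[1], reverse=True)

for usage_rank, (name, requests, page_order) in enumerate(by_requests, start=1):
    print(f"{name:<20} usage rank {usage_rank:>2}   page position {page_order:>2}")

most_used = by_requests[0][0]
least_used = by_requests[-1][0]
```

Availability is listed first on the page but receives the fewest requests, while Author, in tenth position, ranks fifth by usage.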
Other interesting tidbits…
(March 2006)
Authority searching decreased 45%.
Keyword searching increased 230%.
– Caveat: default catalog search changed from title authority to keyword.
~ 6% of keyword searches offered spelling correction or suggestion
– 3.6% - automatic spell correction
– 2.6% - “Did you mean…” suggestion
Usability Testing

10 undergraduate students
– 5 with Endeca catalog
– 5 with old Web2 OPAC

Endeca performed as well as OPAC for
known-item searching in usability test
– 89% Endeca tasks completed ‘easily’ (8/9)
– 71% OPAC tasks completed ‘easily’ (15/21)

Endeca performed better than OPAC for
topical searching in usability test.
Topical Searching Tasks
Topical Task Success: Web2
– Easy 36%, Medium 7%, Hard 23%, Failed 34%
Topical Task Success: Endeca
– Easy 58%, Medium 17%, Hard 3%, Failed 22%
Average Topical Task Duration
[Bar chart: average duration per task (Tasks 5 to 10), Web2 vs. Endeca; time axis from 00:00 to 03:36.]
Usability Testing Trends

Relevance *most* important
– “Once I scroll through a page, I get pretty discouraged about
the results...”
Web2 OPAC participant looking for resources on cat health

‘Keyword’ term less intuitive / trusted than ‘Subject’ and
‘Title’
– “[I used] Keyword in Title because that’s what I want the book
to be mainly referring to. But I also could’ve went Keyword in
Subject. But if I’d have went Keyword Anywhere it would have
had too big of a field to look through.”
Web2 OPAC participant looking for resources on gene therapy


When found, dimensions seem intuitive and useful
‘Did you mean’ seems intuitive
A study in relevance
Are search results in Endeca more likely to be relevant to a user’s query than search results in Web2 OPAC?
100 topical user searches from 1 month in fall 2005
How many of top 5 results relevant?
– 40% relevant in Web2 OPAC
– 68% relevant in Endeca catalog
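The measure used here resembles what information retrieval calls precision at 5. A minimal sketch of how such a figure could be computed, assuming each search has been hand-judged result by result and that results are pooled across searches (one plausible reading; the slides do not say how the percentages were aggregated). The judgments below are made up for illustration and are not the study's data.

```python
def precision_at_k(judgments, k=5):
    """Share of top-k results judged relevant, pooled over all searches.

    judgments: list of per-search lists of booleans, in ranked order.
    """
    top = [flag for search in judgments for flag in search[:k]]
    return sum(top) / len(top) if top else 0.0


# Two hypothetical searches, each with five judged results.
sample = [
    [True, True, False, True, False],
    [True, False, True, False, False],
]
score = precision_at_k(sample)  # 5 relevant out of 10 judged
```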
Relevance defined



Relevance ranking in Endeca – select from a variety of modules and order them based on importance.
Relevance most important in Keyword Anywhere - searches all fields.
At NCSU…
1. Original query term(s) (no thesaurus, stemming, spell correction)
2. Exact phrase match
3. Field ranking (Title higher than Author higher than Table of Contents)
4. Number of fields that contain term(s) …
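Ordering modules by importance behaves like a lexicographic sort: a later module only breaks ties left by the earlier ones. The sketch below illustrates that idea in plain Python. It is not Endeca's API or configuration syntax, and the record layout, field weights, and helper names are assumptions.

```python
# Hypothetical weights: Title higher than Author higher than Table of Contents.
FIELD_RANK = {"title": 3, "author": 2, "toc": 1}


def relevance_key(record, query):
    """Build a tuple of module scores; tuples compare left to right."""
    q = query.lower()
    terms = q.split()
    fields = {name: text.lower() for name, text in record.items()}

    # Module 1: every original query term appears as typed (no stemming or correction).
    has_terms = all(any(t in text.split() for text in fields.values()) for t in terms)
    # Module 2: the exact phrase appears in some field.
    has_phrase = any(q in text for text in fields.values())
    # Module 3: highest-ranked field containing any query term.
    matching = [name for name, text in fields.items()
                if any(t in text.split() for t in terms)]
    best_field = max((FIELD_RANK.get(name, 0) for name in matching), default=0)
    # Module 4: number of fields that contain a query term.
    return (has_terms, has_phrase, best_field, len(matching))


def rank(records, query):
    """Sort records so that the highest-scoring tuple comes first."""
    return sorted(records, key=lambda r: relevance_key(r, query), reverse=True)


records = [
    {"title": "Veterinary medicine", "author": "Smith", "toc": "gene therapy in cats"},
    {"title": "Gene therapy", "author": "Jones", "toc": "introduction"},
    {"title": "Therapy for the gene pool", "author": "Gene Brown", "toc": "therapy"},
]
ranked = rank(records, "gene therapy")
```

In this made-up data, the record with the phrase in its title comes first, the record with the phrase only in its table of contents comes second, and the record that matches in three fields but never as a phrase comes last, because the phrase module is consulted before the field count.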
Future Plans

Ongoing tweaks:
– Continued usability testing
– Relevance ranking algorithms & spell correction thresholds
– Additional browsing options

Endeca 2.0 ideas
– FRBR-ized display
– Discussions with OCLC regarding FAST (Faceted Access to
Subject Terms) and FRBR
– Patron-generated refinements (folksonomies?)
– Enrich records with supplemental Web Services content –
more usable TOCs, book reviews, etc.
– The death of authority searching (?)
– More integration with QuickSearch, other data repositories,
and third-party discovery tools
Thanks
http://www.lib.ncsu.edu/endeca
Andrew Pace, Head, IT
andrew_pace@ncsu.edu
Emily Lynema, Systems Librarian for Digital Projects
emily_lynema@ncsu.edu