The Future of the Online Catalog

Library Automation: Yesterday’s Technology, Tomorrow
Andrew K. Pace
NCSU Libraries
July 28, 2006
What I will cover:
Online catalog: the problem
 Brief environmental scan
 Endeca: team, timeline, technology
 Usability, statistical results, relevance study
 Dis-integrated systems / Future Catalogs

What ILS Catalogs Do Well…
(liberally stolen from Roy Tennant)
Inventory control: What and where
 Known item searching

What ILS Catalogs Don’t do Well…
(liberally stolen from Roy Tennant, and augmented by me)
Any search other than known item
 Most anything other than books (serials, e-resources, articles, digital objects)
 Logical groupings of results (e.g. FRBR)
 Faceted browsing
 Relevance ranking
 Sideways searching (suggestions, expansion of searches and search targets)

“OPAC Complainers”
“There is certainly no dearth of OPAC complainers.
You have Andrew Pace (OPACs suck), and Roy
Tennant (You Can’t Put Lipstick on a Pig) writing
and presenting about the need for change (more
simplicity) in the OPAC world. I can appreciate
their arguments for a simpler OPAC (not to
mention the rest of the system) but other then
[sic] present their arguments, neither has much in
the way of suggestions nor have they sparked a
movement among librarians or the automation
vendors to do anything about the situation.”
-ACRL Blog entry
Oct. 13 2005
NextGen Library Search Tools
 RedLightGreen (RLG)
 OCLC Fictionfinder
 Vivisimo clustered search (Ex Libris, Serials Solutions)
 Grokker (EBSCO)
 Aquabrowser visual context
 Endeca Information Access Platform
 OCLC Custom WorldCat and OpenWorldCat
 Innovative Interfaces OPAC Pro & Encore
 Ex Libris Primo
 Polaris, AJAX-Enabled OPAC
 SirsiDynix Enterprise Portal System, FAST
 Talis, et al.: Web Services
 Georgia Pines and the Library 2.0 Bandwagon
Endeca purchase decision

Lots of topical searches and poor subject
access
– Keyword gives too many or too few results –
leads to general distrust
– Misunderstanding of authority headings
No relevancy ranking of results
 Needed more responsiveness (speed)

Implementation Team

7 representative team members
– Andrew Pace, IT, Chair
– Emily Lynema, IT, ex officio (tech lead)
– Cindy Levine, Research and Information Services
– Erik Moore, IT, ex officio (ILS librarian)
– Charley Pennell, Metadata and Cataloging
– Shirley Rodgers, IT
– Tito Sierra, Digital Library Initiatives
Timeline
– License / negotiation: Spring 2005
– Acquire: Summer 2005
– Implementation: August 2005 – January 12, 2006
Technical Overview
Endeca ProFind co-exists with SirsiDynix
Unicorn ILS and Web2 online catalog.
 Endeca indexes MARC records exported
from Unicorn.
 Index is refreshed nightly with records
added/updated during previous day.

Endeca ProFind Overview
[Diagram, shown three times: once unlabeled, once labeled “Offline - Nightly”, and once labeled “Always Online”. Component labels: Raw MARC data; NCSU exports and reformats; Flat text files; Data Foundry (parse text files); Indices; Navigation Engine; HTTP; NCSU Web Application; HTTP; Client browser.]
Integrating Endeca


Endeca doesn’t understand MARC data / MARC-8 character encoding – translate to UTF-8 text files
Each night a script updates the data indexed by Endeca:
– Exports updated or new MARC records from Unicorn.
– Reformats and merges these records with those already indexed.
– Starts Endeca re-index – completely rebuilding index for the catalog.
Process requires about 4 hours.
Retain Web2 OPAC for some functionality
– Authority searching - known items and cross-references
– Detailed record pages – how to make Endeca -> Web2 link?
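The nightly “reformat and merge” step on this slide can be sketched roughly as follows. This is a minimal illustration, not the NCSU script: it assumes the records have already been decoded from MARC-8 into Python strings (for example by a MARC library), and the record IDs, field names, and pipe-delimited layout are hypothetical rather than the format Endeca's Data Foundry actually expects.

```python
def merge_records(indexed, updates):
    """Return a new mapping of record id -> fields with the nightly updates applied.

    New ids are added and existing ids are overwritten; the input is not modified.
    """
    merged = dict(indexed)
    merged.update(updates)
    return merged


def to_flat_text(records):
    """Serialize records to UTF-8 bytes, one pipe-delimited line per record.

    The "id=...|field=value" layout is a made-up example format.
    """
    lines = []
    for record_id in sorted(records):
        fields = records[record_id]
        parts = [f"id={record_id}"]
        parts += [f"{name}={value}" for name, value in sorted(fields.items())]
        lines.append("|".join(parts))
    return "\n".join(lines).encode("utf-8")


# Hypothetical data: two records already indexed, one updated and one new tonight.
already_indexed = {
    "b1": {"title": "Old title"},
    "b2": {"title": "Les Misérables"},
}
tonight = {
    "b1": {"title": "New title"},
    "b3": {"title": "Added today"},
}

merged = merge_records(already_indexed, tonight)
flat_file_bytes = to_flat_text(merged)
```

Because the whole index is rebuilt each night, the merged file always contains every record, not just the changed ones.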
Quick Demo

http://catalog.lib.ncsu.edu
Some User Reaction
“This is absolutely the coolest thing I've seen all
century.”
- Will Owen, Head of Systems (UNC Libraries)
“Also, I'm really digging the new NCSU library catalog.
Very nice."
- Educause staff (non-librarian)
“The new Endeca system is incredible. It would be
difficult to exaggerate how much better it is than our
old online card catalog (and therefore that of most
other universities). I've found myself searching the
catalog just for fun, whereas before it was a chore to
find what I needed.”
- NCSU Undergrad, Statistics
Basic statistics
(March – May 2006)
Requests by Search Type
Search: 51%
Search -> Navigation: 29%
Navigation: 20%
Navigation statistics
(March – May 2006)
Navigation Requests by Dimension (number of requests)
Availability          23,848
LC Classification    169,249
Subject: Topic       155,856
Subject: Genre        65,545
Format                74,985
Library               87,221
Subject: Region       59,248
Subject: Era          38,605
Language              38,074
Author                70,516
Navigation statistics
(March – May 2006)
Navigation by Dimensions
LC Classification    20%
Subject: Topic       19%
Library              11%
Author                9%
Format                9%
Subject: Genre        8%
Subject: Region       7%
Language              5%
Subject: Era          5%
New                   4%
Availability          3%
Sorting statistics
(March – May 2006)
Sorting Requests
Pub Date        53%
Most Popular    19%
Title A-Z       13%
Author A-Z       9%
Call Number      6%
Other interesting tidbits…
(March 2006)
Authority searching decreased 45%
 Keyword searching increased 230%

– Caveat: default catalog search changed from
title authority to keyword

~ 5% of keyword searches offered
spelling correction or suggestion
– 3.1% - automatic spell correction
– 2.3% - “Did you mean…” suggestion
Usability Testing Trends

10 undergraduate students
– 5 with Endeca catalog
– 5 with old Web2 OPAC

Endeca performed as well as OPAC for known-item
searching
– 89% Endeca tasks completed ‘easily’ (8/9)
– 71% OPAC tasks completed ‘easily’ (15/21)

Endeca performed better than OPAC for topical searching
– 61% Endeca tasks completed ‘easily’ (19/31)
– 3% Endeca tasks completed as ‘hard’ (1/31)
– 33% OPAC tasks completed ‘easily’ (13/39)
– 26% OPAC tasks completed as ‘hard’ (10/39)
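The percentages on this slide follow directly from the task counts in parentheses. A small sketch of the arithmetic, using only the counts given above (the dictionary layout and function names are illustrative):

```python
# (catalog, search type, rating) -> (tasks with that rating, total tasks attempted)
TASK_COUNTS = {
    ("Endeca", "known-item", "easily"): (8, 9),
    ("OPAC", "known-item", "easily"): (15, 21),
    ("Endeca", "topical", "easily"): (19, 31),
    ("Endeca", "topical", "hard"): (1, 31),
    ("OPAC", "topical", "easily"): (13, 39),
    ("OPAC", "topical", "hard"): (10, 39),
}


def percent(count, total):
    """Share of tasks as a whole-number percentage."""
    return round(100 * count / total)


def summarize(task_counts):
    """Map each (catalog, search type, rating) key to its rounded percentage."""
    return {key: percent(count, total) for key, (count, total) in task_counts.items()}


summary = summarize(TASK_COUNTS)
for (catalog, search_type, rating), value in summary.items():
    print(f"{catalog:6} {search_type:10} {rating:6} {value}%")
```

With only five students per group, these shares are best read as trends, as the slide title suggests, rather than statistically firm results.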
A study in relevance
Are search results in Endeca more likely to
be relevant to a user’s query than search
results in Web2 OPAC?
 100 topical user searches from 1 month in
fall 2005
 How many of top 5 results relevant?

– 40% relevant in Web2 OPAC
– 68% relevant in Endeca catalog
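The measure used here, the share of top-5 results judged relevant, can be sketched as below. The judgments in the example are invented purely to show the calculation; they are not the study data, and the function name is an assumption.

```python
def share_relevant_in_top(judgments_per_query, k=5):
    """Fraction of top-k result slots judged relevant, pooled over all queries.

    judgments_per_query: one list per query of True/False relevance judgments,
    in ranked order. A query returning fewer than k results still counts k
    slots, so missing results are treated as not relevant.
    """
    if not judgments_per_query:
        raise ValueError("need at least one query")
    relevant = sum(sum(1 for judged in judgments[:k] if judged)
                   for judgments in judgments_per_query)
    return relevant / (k * len(judgments_per_query))


# Hypothetical judgments for two queries (True = relevant).
sample = [
    [True, True, False, True, False],    # 3 of the top 5 relevant
    [True, False, False, False, False],  # 1 of the top 5 relevant
]
share = share_relevant_in_top(sample)  # 4 relevant out of 10 slots
```

Running the same calculation over the 100 sampled searches for each catalog is what yields a single comparable percentage per system.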
Relevance defined



Relevance ranking in Endeca – select from a
variety of modules and order them based on
importance.
Relevance most important in Keyword
Anywhere - searches all fields.
At NCSU…
1. Original query term(s) (no thesaurus, stemming,
spell correction)
2. Exact phrase match
3. Field ranking (Title higher than Author higher than
Table of Contents)
4. Number of fields that contain term(s) …
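One way to picture “select modules and order them by importance” is a sort key whose components are compared in that order. The sketch below is an illustration of that idea only, not Endeca's API or NCSU's configuration: the field names, weights, and matching rules are assumptions, and because this toy has no thesaurus, stemming, or spell correction, module 1 reduces to checking that the literal query terms appear.

```python
# Assumed field weights: Title higher than Author higher than Table of Contents.
FIELD_WEIGHT = {"title": 3, "author": 2, "toc": 1}


def rank_key(record, query):
    """Build a tuple compared left to right, mirroring the ordered modules."""
    phrase = query.lower()
    terms = phrase.split()
    fields = {name: record.get(name, "").lower() for name in FIELD_WEIGHT}

    # Fields in which every query term appears as a whole word.
    matching = [name for name, text in fields.items()
                if terms and all(term in text.split() for term in terms)]

    original_terms = bool(matching)                                    # module 1
    exact_phrase = any(f" {phrase} " in f" {text} "
                       for text in fields.values())                    # module 2
    best_field = max((FIELD_WEIGHT[n] for n in matching), default=0)   # module 3
    field_count = len(matching)                                        # module 4
    return (original_terms, exact_phrase, best_field, field_count)


def rank(records, query):
    """Return records ordered from most to least relevant."""
    return sorted(records, key=lambda r: rank_key(r, query), reverse=True)


# Hypothetical records.
records = [
    {"id": "a", "title": "History of art", "author": "Smith", "toc": ""},
    {"id": "b", "title": "Art history", "author": "", "toc": "Art history survey"},
    {"id": "c", "title": "Gardening", "author": "", "toc": "Notes on art history"},
]
ranked_ids = [r["id"] for r in rank(records, "art history")]
```

Because the exact-phrase module sits above field ranking, record "c" (phrase match in the table of contents only) outranks record "a" (both terms in the title, but not as a phrase).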
Future Plans

Ongoing tweaks:
– Continued usability testing
– Relevance ranking algorithms & spell correction thresholds
– Additional browsing options

Endeca 2.0 ideas
– FRBR-ized display
– Discussions with OCLC regarding FAST (Faceted Access to
Subject Terms) and FRBR
– Patron-generated refinements (folksonomies?)
– Enrich records with supplemental Web Services content –
more usable TOCs, book reviews, etc.
– The death of authority searching (?)
– More integration with QuickSearch, other data repositories,
and third-party discovery tools
Stuff to read…
Rethinking how we provide bibliographic services for the University of California, by the Bibliographic Services Task Force
http://libraries.universityofcalifornia.edu/sopag/BSTF/Final.pdf
The Changing nature of the catalog and its integration with other discovery tools, by Karen Calhoun
http://www.loc.gov/catdir/calhoun-report-final.pdf
“The Changing nature of the catalog and its integration with other discovery tools. Final report. March 17, 2006. Prepared for the Library of Congress by Karen Calhoun”: A Critical review, by Thomas Mann
http://www.guild2910.org/AFSCMECalhounReviewREV.pdf
A “Next Generation” Catalog, by Eric Morgan
http://dewey.library.nd.edu/morgan/ngc/
Metadata Research Center, SILS
http://ils.unc.edu/mrc/
University of Rochester eXtensible Catalog
Toward a 21st Century Catalog, ITAL, Sept. 2006, by Antelman, Lynema, and Pace
From the Calhoun Report

"If one accepts the premise that library
collections have value, then library leaders must
move swiftly to establish the catalog within the
framework of online information discovery
systems of all kinds. Because it is catalog data
that has made collections accessible over time,
to fail to define a strategic future for library
catalogs places in jeopardy the legacy of the
world's library collections themselves. For this
reason, the option of rejecting library catalogs is
not considered in this report."
The library system pile

“Seams serve as perceptible boundaries
that provide points of reference; without
such boundaries readers get ‘lost at sea’
and don’t know where they are in relation
to anything else; they can’t perceive
either the extent of what they have or
what they don’t have.”
-Thomas Mann
Wither or Whither the Catalog?
Reversal of fortune
OLD SEARCH MODEL
NEW SEARCH MODEL
The library system puzzle
[Diagram labels: Serials; A&I / FT DBs; Catalog; Web]
The library system puzzle
[Diagram labels: Serials; A&I / FT DBs; Metasearch; ERM Systems; GS; Catalog; Guided Navigation; Digital Repositories; Web; Legacy ILS; IR]
Thank you.
http://www.lib.ncsu.edu/endeca
Andrew Pace, Head, IT
andrew_pace@ncsu.edu