Search to Discovery: Finding Global Scholarly Resources with Primo
Pascal Calarco & Alison Hitchens, Library
December 6, 2011
Agenda
• The state of search in libraries (Pascal)
• Expanding Primo beyond the local
catalogue (Alison)
• Questions
2011
Library Information Systems: Milestones
[Timeline figure, axis 1960 to 2010. Milestones as labelled:]
• Discovery
• Metasearch
• Citation Linking
• ILS 3rd gen (Client-server; 1990s)
• ILS 2nd gen (Mainframe; 1980s)
• OCLC (library network; 1972)
• Early systems
• MARC
In the beginning, there was the card
catalog (1901+)
Indexes:
• Subject
• Author
• Title
• Interfiled cards, call
number access
Library of Congress National Union
Catalog (pre-1956)
Henriette Avram, Developer of MARC
• Programmer/analyst at the Library
of Congress
• Developed system for printing
card catalog information (MARC)
• Adopted as an ISO standard (ISO 2709) in 1973
Later, there was the Online Public
Access Catalog (OPAC)
• Machine Readable Cataloging (MARC)
• Inventory of the print/physical holdings of a
library
• Better than the card catalog: keyword searching
& Boolean functionality
• Non-intuitive; required training or intermediation
(information professional)
• Limited generally to single library
Library networks & resource sharing
Print to Electronic
Now: Electronic Almost Ubiquitous
• 85%+ of journal literature digital
• Hundreds of specialized scholarly
databases
• Mass print book digitization efforts
• Electronic books going mainstream
• Aggregated meta-indexes: 750 million
metadata records for journal/newspaper articles
Goal: improve user experience
• Users want to FIND not search
• Source required information to user
regardless of format or location
• Leverage our knowledge of academic
community @ uWaterloo
• Integrate into key services: LMS, CMS,
other library services
Database Content Silos
[Diagram labels: Content Silos; System Silos; ScienceDirect, Catalog, Web of Science, ILL, JSTOR, ETDs, EEBO, Website, Metasearch, eReserve]
Metasearch: an interim step
• aka Federated Search; emerged 2003
• Distributed search from one interface via
web services, SOAP/XML gateways
• Idiosyncratic and slow; vendors
implemented variously
• Relevancy of merged results problematic
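The distributed search pattern above can be sketched as follows. This is only an illustration under stated assumptions: the three `search_*` functions are hypothetical stand-ins for remote databases, and the round-robin merge is one simple way to combine lists that carry no comparable relevance scores, not the method of any particular metasearch product.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import zip_longest

# Hypothetical stand-ins for remote databases; a real metasearch tool would
# call each vendor's gateway over the network instead.
def search_catalog(query):
    return [f"catalog: {query} (book)", f"catalog: {query} (thesis)"]

def search_journals(query):
    return [f"journals: {query} (article)"]

def search_newspapers(query):
    return [f"newspapers: {query} (story 1)", f"newspapers: {query} (story 2)"]

SOURCES = [search_catalog, search_journals, search_newspapers]

def federated_search(query, sources=SOURCES):
    """Send the query to every source in parallel, then interleave the results."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        # map() preserves source order; the slowest source gates the response.
        result_lists = list(pool.map(lambda search: search(query), sources))
    merged = []
    # Round-robin merge: each source ranks with its own algorithm, so there are
    # no comparable scores to sort the combined list by.
    for tier in zip_longest(*result_lists):
        merged.extend(hit for hit in tier if hit is not None)
    return merged
```

Because the response waits on the slowest source and the merge cannot compare scores across sources, the sketch shows both weaknesses named on the slide: speed and the relevancy of merged results.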
Problems with catalog searching &
evolution to discovery
• UCLA & Berkeley: information retrieval & user
behavior (1986-1996)
• Google Books: “digitize the world’s knowledge”
(2002)
• Karen Schneider, Andrew Pace, Roy Tennant:
“The OPAC ‘Sucks’” (2002)
• Next generation catalogs -> Discovery (2008+)
Catalogs: Information Science Research
• Christine L. Borgman (1986) “Why are online catalogs hard
to use? Lessons learned from information retrieval studies”
Journal of the American Society for Information Science
• Ray R. Larson (1991) “The decline of subject searching:
Long-term trends and patterns of index use in an online
catalog” Journal of the American Society for Information
Science
• Ray R. Larson (1992) “Evaluation of advanced retrieval
techniques in an experimental online catalog” Journal of the
American Society for Information Science
• Ray R. Larson (1996) “Cheshire II: designing a next-generation
online catalog” Journal of the American Society
for Information Science
• Christine L. Borgman (1996) “Why are online catalogs still
hard to use?” Journal of the American Society for
Information Science
How Users Search: What We’ve Learned
• Most people make typos at least some of the
time
• Most searches are 2, 3, 4 words with no Boolean
operators
• Most searches use keyword
• Search is hesitant, iterative, often random
process of discovery
• Most people start elsewhere
• Few read help screens
• Few use advanced search – this is true even in
Google
The Google Effect
• Expectations for web search tools now:
– Radically simplified UI, fast results
– Aggregated content
– Relevant results on first page
– Natural Language queries
– Spelling correction/adaptation
The OPAC “Sucks”
• The OPAC lacks common features of most search
engines
– Relevance ranking vs. last in, first out
– Spell checking (related - did you mean?)
– Popular query operators like + and -
– Refine search
– Sort flexibility
– Faceting
– Citation indexing vs full text
– Developed for print materials, limitations with electronic
materials or atomized items (like articles)
– Difficult for certain known item search
Industry Trends
• Decouple the front end (search and
discovery) from the back end (inventory
and cataloguing)
• Service Oriented Architecture – many
programs loosely coupled
• Cloud services -- SaaS
• The 5th generation of library business
systems emerging now – hosted, cloud
solutions
Discovery Characteristics
• Enhanced Search Functionality
– Faceted browse
– Relevance ranking
– “Did you mean?” / Spell Checking
• auto-correction, resubmit search
– Content aggregation
• Integrating search for books, articles, etc.
– Single, Simple Search Box
– FRBR – Functional Requirements for Bibliographic
Records, grouping editions
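As an illustration of the “Did you mean?” feature listed above, here is a minimal sketch using Python's standard `difflib` module. The vocabulary list and the `did_you_mean` function are assumptions made for this example; a production discovery layer would draw suggestions from its own index and is unlikely to work exactly this way.

```python
import difflib

# Hypothetical vocabulary; a real system would draw on the terms in its index.
INDEXED_TERMS = ["catalogue", "metadata", "discovery", "librarian", "bibliography"]

def did_you_mean(query, vocabulary=INDEXED_TERMS, cutoff=0.75):
    """Return a corrected query, or None if no word needed correcting."""
    corrected = []
    changed = False
    for word in query.lower().split():
        if word in vocabulary:
            corrected.append(word)
            continue
        # Closest known term by similarity ratio, if any clears the cutoff.
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        if matches:
            corrected.append(matches[0])
            changed = True
        else:
            corrected.append(word)
    return " ".join(corrected) if changed else None
```

A search for "discovry metadata" would yield the suggestion "discovery metadata", which the interface can offer as a link or resubmit automatically.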
Discovery Characteristics, cont.
• Enhanced Experience
– Sometimes fun and engaging
– Interactive/Collaborative
– User centered design
• Enhanced Services
– Find it / Get it for me
– Book Covers / Synopsis
– Full text
– Availability on same page as results
Discovery Characteristics, cont.
• Enhanced Content
– Article Searching
– Commercial Data
– Merging Special Collections
– Harvesting Online Collections
• Grey Literature
• Free Content
• Enhanced Access
– Syndication - Getting into users' tools
• Course Management Systems
• Browser and Desktop Tool Bars
• Portals
Discovery Components
1. Next Generation Catalog
2. Next Generation “Unified Search” Aid
[Diagram labels: Full Text; Vendor Data; OAI; User Interface; OPAC; ILS Circ; MARC Data; Normalization & Apache SOLR/Lucene; MetaSearch; Primo]
[Content components, shown by phase (Phase I, Phase II, Future): TUG, OCUL, Others, Primo Central, HathiTrust, Archives, Geospatial, RACER]
Evolution of Discovery
[Diagram labels: Catalog, Metasearch, Primo, Primo Central]
Options for Expanding Primo
• Local ingestion of resources using FTP or
OAI harvesting
• Searching remote resources in Primo
using the Primo DeepSearch API*
• Subscribing to a large centralized index,
such as Primo Central
*Application Programming Interface
Local ingestion of records
• Example: HathiTrust Digital Library
– Harvest the public domain records from HathiTrust
Digital Library
– Normalize the records
– Index the records in our local Primo database
– Schedule updates from HathiTrust into Primo
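The harvest, normalize and index steps above can be sketched in a few lines. This is a simplified illustration, not Primo's actual pipeline: the sample XML is a made-up OAI-PMH `ListRecords` response, and `harvest`, `normalize` and `index` are hypothetical helper names. A real harvest would fetch each page over HTTP and keep requesting pages until no resumption token is returned.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up sample of an OAI-PMH ListRecords response (oai_dc metadata).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>  A History of Cataloguing </dc:title>
          <dc:date>c1901</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

def harvest(xml_text):
    """Parse one ListRecords page into (records, resumption_token)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "id": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": rec.findtext(".//dc:title", namespaces=NS),
            "date": rec.findtext(".//dc:date", namespaces=NS),
        })
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, token

def normalize(record):
    """Tidy whitespace so harvested records match local conventions."""
    return {key: (value or "").strip() for key, value in record.items()}

def index(records, store):
    """Stand-in for loading into the local index: a dict keyed by identifier."""
    for record in records:
        store[record["id"]] = record
    return store
```

Scheduled updates fit the same shape: OAI-PMH supports selective harvesting with a `from` date argument, so a recurring job can request only the records changed since the previous run.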
Normalization: creating local
sort field (Date – Oldest)
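To show what a "Date – Oldest" sort field involves, here is a small sketch of deriving a sortable year from free-text catalogue dates. It is an assumption for illustration only: Primo's normalization rules are configured within the product, and the function name and the `9999` sentinel for undated records are invented here.

```python
import re

def oldest_first_sort_key(raw_date):
    """Turn a free-text date into a four-digit year string for sorting."""
    match = re.search(r"\d{4}", raw_date or "")
    # Undated records get a high sentinel so they sort after dated ones.
    return match.group(0) if match else "9999"
```

Sorting the values "c1987", "[1850?]", "n.d." and "1901-1905" with this key puts "[1850?]" first and the undated "n.d." last.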
Primo Normalized XML (PNX)
Open source & Open platform
• Primo uses Lucene for its indexing
• SOLR exposes Lucene as a web service
and allows for faceting
• APIs and web services allow flexibility and
customization
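As a rough illustration of Solr exposing a Lucene index as a web service with faceting, the sketch below builds a query URL using standard Solr request parameters (`q`, `rows`, `wt`, `facet`, `facet.field`). The host, core name and field names are hypothetical, and nothing here implies that Primo's own index can be queried this way.

```python
from urllib.parse import urlencode

def solr_facet_url(base_url, query, facet_fields, rows=10):
    """Build a Solr select URL that asks for results plus facet counts."""
    params = [("q", query), ("rows", rows), ("wt", "json"), ("facet", "true")]
    # facet.field may be repeated, once per field to count values for.
    params += [("facet.field", field) for field in facet_fields]
    return f"{base_url}/select?{urlencode(params)}"
```

For example, `solr_facet_url("http://localhost:8983/solr/demo", "open access", ["format", "language"])` produces a URL whose JSON response would carry both the matching documents and counts per format and language, which is the data a facet sidebar is built from.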
We can’t index everything!
• Trying out a subscription to Primo Central,
a centralized index of scholarly journal
articles, newspapers, conference
proceedings etc.
• User sees one interface; user is searching
2 indexes
What is Primo Central Index?
• A centralized index
– of free and restricted resources
– primarily articles & e-books
– based on metadata & full-text provided by
publishers/aggregators
– based on the collections selected by the
library in the Primo Administration module
– created & maintained by our vendor, Ex Libris
What is Primo Central Index?
• A centralized index
– of records harvested using the same process
as our local Primo database
– created using the same PNX record structure
as our local Primo database
– indexed using the same indexing tools as our
local Primo database
Blending local and remote
resources
• Both local and remote results are represented in
the facets
• Blended relevance ranking
– Primo can be configured to boost high-ranking local
results, so that users don't miss them when our
4 million local records are ranked alongside hundreds
of millions of Primo Central records
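The idea of boosting local results in a blended ranking can be sketched as below. This is an assumption-laden toy, not Primo's ranking algorithm: it supposes both indexes return comparable numeric scores and applies a simple multiplicative boost, with `blend` and `local_boost` being names invented for the example.

```python
def blend(local_hits, central_hits, local_boost=1.5, limit=10):
    """Merge two scored result lists, multiplying local scores by a boost."""
    combined = [(score * local_boost, title, "local") for title, score in local_hits]
    combined += [(score, title, "central") for title, score in central_hits]
    # Highest adjusted score first.
    combined.sort(key=lambda hit: hit[0], reverse=True)
    return [(title, source) for _, title, source in combined[:limit]]
```

With a local hit scored 0.6 and a central hit scored 0.8, no boost (`local_boost=1.0`) ranks the central article first, while the default boost of 1.5 lifts the local item to the top. Choosing that factor is the tuning question: too low and local holdings are buried, too high and they crowd out better matches.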
Search = local resources &
Primo Central
How does it work?
• Ex Libris has created & indexed records for
millions of items based on information from the
publishers
• Primo searches Primo Central the same way it
searches the local database
• Full text availability is determined in advance by
our URL resolver SFX, i.e.
• Delivery of the resource uses menu for
New features: snippets give
context
If your search term is found in the
full-text, Primo supplies a snippet
highlighting the term
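A snippet of this kind can be approximated with a few lines of string handling. The sketch below is illustrative only: the `snippet` function, the `**` markers and the fixed character window are assumptions, and the way Primo actually generates snippets is not described in these slides.

```python
import re

def snippet(full_text, term, width=30):
    """Return text around the first match of term, with the match marked."""
    match = re.search(re.escape(term), full_text, flags=re.IGNORECASE)
    if match is None:
        return None
    start = max(match.start() - width, 0)
    end = min(match.end() + width, len(full_text))
    # Ellipses signal that the snippet was cut from a longer text.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(full_text) else ""
    return (prefix + full_text[start:match.start()]
            + "**" + match.group(0) + "**"
            + full_text[match.end():end] + suffix)
```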
New features: expanding the
search
Defaults to our library’s electronic subscriptions but users
can expand the search to all of Primo Central
New Facets & Facet Values
Added value:
bX Recommender
Trouble-shooting remote
resources
• We can view the PNX records using web
services but we have no control over the
content or the normalization rules
• Records have the same structure as our
local records but are missing local fields
and don’t reflect local policies
Assessing Primo Central
• Over 65 hours of one-on-one usability
testing and focus groups with
undergraduate students, graduate
students, faculty, staff and alumni
• Library staff survey
• Feedback form
• Statistics from Cognos
Looking to the future
• What other content should be added to
Primo?
• How can we improve/enhance the
interface?
• What is the right balance for boosting local
physical resources?
• How do we point users to resources that
can’t be searched using Primo?
Questions?
• Pascal Calarco
– Associate University Librarian, Digital &
Discovery Services
– pvcalarco@uwaterloo.ca
• Alison Hitchens
– Cataloguing & Metadata Librarian
– ahitchen@uwaterloo.ca