Union Catalogue Models
Presentation for IFLA Libraries for the Blind Section
Janifer Gatenby, Strategic Research

Agenda
• Needs
• Models
• Models and Needs
• Infrastructure
• Models and Infrastructure
• Advanced Needs
• Summary

Needs for Union Catalogues
• Discovery
• Web presence – Exposure
• Delivery
• Cataloguing
• Collection Development
• Advanced Needs
  – Gestalt effect – data quality, data mining
  – Collective Innovation

Purposes of Union Catalogues
Discovery and Exposure
• Unified point of discovery
  – larger resources are better exposed in general (single shop versus row of shops versus mall)
  – comprehensive resources are more attractive to users
    • more likely to use; more likely to contribute via WIKI
• Ease load on local OPAC
• Provide identity to a library cooperative

Purposes of Union Catalogues
FRBR – Alternative editions

Purposes of Union Catalogues
Without clustering

FRBR
With Clustering

FRBR
Expansion of search
• Global
• Group
• Local

Purposes of Union Catalogues
Worldcat.org Statistics
• March 2007 Reference: 14,118,777
• 03 2006 96%
  – Daily average: 455,444
• Links to libraries: 834,866 363%

Exposure
[Chart: Referrals to WorldCat.org pages, month over month, in millions, Jul to Dec, 2005 and 2006]

Exposure
Worldcat.org Referrals March / April 2007
[Pie chart: referral sources are library sites, education sites, Yahoo, Google books, Google, Google Scholar, and blogs, CNN, AltV +; percentages shown are 8%, 3%, 31%, 9%, 14%, 9%, 26%]

Delivery
• Resource sharing
• Sharing of users
  – Your users find resources elsewhere
  – Others' users find your resources
• Delivery becoming increasingly separated from discovery
• Increasing delivery options
  – Digitisation on demand, reference look-up, purchase, resolution
  – Union catalogues need to link
to multiple providers

Purposes of Union Catalogues
Appropriate Delivery Options
[Diagram: a Wanted Item is classified by the labels Digital, Physical, Free, Licensed, Access Restricted, In Print, Out of Print, Rare, Common, In Copyright, Out of Copyright; the resulting actions are Link, Resolve, Print on Demand, Copy, Loan, Digitize, Lookup, Authenticate / Authorise, Purchase / Pay]

International Discovery to Delivery
Digitisation on Demand
• Encourage use of eTen DoD
  – libraries only need a scanner
  – Software plug in on HTML page ++
  – “cover to cover” – books, pictures
  – Out of copyright; exceptions for blind, country specific
  – On demand / just in time
  – Payment options – direct, credit card, cash to library
  – Evolving service
• ILS interfaces – loan, reservation, recording URL
• Regional & international registration of digital masters

[Diagram: the Discovery Universe (Portals, OPACs, Union Catalogues +++) connects through Request Transfer Msg and OpenURL, via Union Catalogue RDS nodes, to the Delivery Universe (WorldCat RS; nuc national / regional services; Subito, BLDSC, CISTI Doc Del)]

Data Management
• shared data maintenance
  – copy cataloguing - efficiencies
  – ability to improve the data quality
  – FRBR grouping, authority control
  – data mining and statistical possibilities
• cooperative collection management
  – collection analysis
  – cooperative digitisation
  – cooperative storage, weeding, purchasing

Union catalogue models
• Virtual
  – Meta search engines
• Physical
  – Created and maintained by online shared cataloguing
  – Created and maintained via batch load from local systems
• Combinations
  – Partly virtual; partly physical
  – Online cataloguing and batch load
• Subset of a larger catalogue
  – WorldCat Group Catalog / WorldCat Local

Virtual Union Catalogue
• Advantages:
  – Light weight organization
  – Relatively easy to implement; easy to get agreements
  – Need metasearch engine and portal
  – Ad-hoc basis: easy to extend – easy to disconnect
• Disadvantages:
  – Slow
  – Limited common searching accesses
  – Limited sorting /
relevance
  – Duplicates
  – Searching inefficiencies
  – Varying quality
  – Availability of all systems
  – Configuration maintenance
• Search / Retrieve Standards: Z39.50, SRU, OpenSearch
• Maintenance Standards: none

Virtual Model
Z39.50 Searches - SUDOC
[Charts, January-April 2007, per target (Helka, NLA, Libris, BNE, DBD, ISSN, BN-Opale, L. C., RLG): connections and successful searches, scale 0 to 80 000; ratio of connections to successful searches, scale 0% to 100%]

Physical union catalogue
– system performance advantages of single resource
  • Up time & speed
– consistently indexed data offers better precision and recall in retrieval
– easier and cheaper to create and maintain data links centrally
  • TOC, index, article level data
  • thumbnails, reviews
– Ease load on local OPAC

Physical Model
Subset – WorldCat.org
• Speed of searching a single system
  – Searching at local / group / global levels
• Exposure
  – WorldCat.org partner network – Google, Yahoo, MSN, online bookshops, antiquarian bookshops, Ask.com etc.
• Enriched data
  – Audience level, holdings count, copyright
  – WorldCat Identities
  – Book covers, reviews, tables of contents
• Modern, Library 2.0 interface
  – Perpetual beta, rapidly evolving
  – Improved displays, faceted searching
  – Social networking features
• Registry of Digital Masters
• Collection Analysis
• Work Cluster Service

Models / Needs
[Table: rows Virtual, Physical, Subset and "standard" (Z39.50, SRU, RTM, NCIP, ISO ILL); columns Discovery, Exposure, Delivery, Cataloguing, Collection Development, Gestalt, Data mining, Innovation]
• Possible combinations
  – Physical union catalogue + global subset
  – Global subset + additional virtual component

Infrastructure
• Maintain portal
  – Load on Local OPACs
• Maintain physical catalogue
  – Local Extract
  – Loading and Conversion
  – Online maintenance
• Cataloguing Support
  – Error corrections
  – Quality police - consistency
• Global exposure
  – Export to union catalogue

Infrastructure
Physical union catalogue: Batch versus Online
• Batch data less consistent but converted & de-duplicated
• Not as up to date
• Most union catalogues have some batch contribution
• Optimal = SRU update

Infrastructure
Online Contribution to CBS
• Windows client WinIBW
• Web client
  – Holdings
  – Record capture
  – Simple new record input (Australia)
• Z39.50 / UCP
  – From Aleph systems in Germany
• SRU update
  – Real time update from an alternative interface
  – Q1 2006

Infrastructure
Batch Contribution to CBS
• FTP
  – Files sent to FTP server associated with CBS (PUSH)
• OAI PMH
  – CBS OAI server enacts transactions to retrieve pre-defined sets of data from another database (PULL)
• Email / SMTP

Infrastructure
[Diagram: External Output, Catalogue Mirror, OPAC, Documents Repository, Catalogue and Union catalogue, connected by Push, OAI Pull and SRU Update Push]

Infrastructure
[Diagram: NCC, PUSH, LOG, GGC, SRU UPDATE, M21, WorldCat; WorldCat identifiers, FRBR work id]

Infrastructure
Update protocols compared
• OAI & FTP
  – Suited for batch loading
  – OAI is used where the data source exposes the metadata to be harvested for discovery purposes
  – Where the data source does not expect the data to be changed by the harvesting site or does not care
  – Hence one way – no mechanism for response or future synchronisation
• SRU Update
  – Two databases are not identical, e.g. a union catalogue is not a slave of the contributing catalogues
  – Immediate availability of records
    • Scalability, no scheduling & batch maintenance
  – Interactive feedback – alignment, enrichment
    • Identifiers
    • Inter-record links – authorities
  – Diagnostics – diffused error resolution

Infrastructure
Subset – WorldCat.org
• Independent loading from institutions regardless of
  – Location
  – Data formats and standards
• No infrastructure
  – Lowest setup and maintenance costs
• WorldCat Local option
  – Ability to add virtual collections to search

Models / Infrastructure
[Table: rows Virtual, Physical, Subset; columns Maintain Portal, Load on local OPAC, Cataloguing Support, Local Extract, Loading & Conversion, Export to global, Data difference tolerance]

Faceted Searching
Sorting

Audience Level and Rareness

Data Mining

Collection Analysis: Gestalt effect

Innovation

We can do it !!
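
Appendix sketch: the slides list "Duplicates" as a disadvantage of the virtual model and note that batch data loaded into a physical union catalogue is "converted & de-duplicated". The Python below is a minimal illustration of that idea only, not a description of how CBS or WorldCat match records. The record fields (`library`, `isbn`, `title`, `author`), the matching rule in `match_key`, the function names and the sample data are all hypothetical.

```python
import re


def match_key(record):
    """Assumed matching rule: use the ISBN when present, else normalised title + author."""
    isbn = re.sub(r"[^0-9Xx]", "", record.get("isbn") or "")
    if isbn:
        return ("isbn", isbn.upper())
    text = f"{record.get('title', '')} {record.get('author', '')}".lower()
    # Collapse punctuation and whitespace so trivial differences do not block a match.
    return ("text", re.sub(r"[^a-z0-9]+", " ", text).strip())


def merge_into_union(records):
    """Keep one union entry per match key and collect the holding libraries on it."""
    union = {}
    for rec in records:
        entry = union.setdefault(
            match_key(rec),
            {"title": rec["title"], "author": rec["author"], "holdings": []},
        )
        if rec["library"] not in entry["holdings"]:
            entry["holdings"].append(rec["library"])
    return list(union.values())


# Made-up sample records from three contributing catalogues.
local_records = [
    {"library": "Library A", "isbn": "978-0-00-000000-2", "title": "Sample Title", "author": "Doe, J."},
    {"library": "Library B", "isbn": "9780000000002", "title": "Sample title", "author": "Doe, Jane"},
    {"library": "Library A", "isbn": None, "title": "Braille Music Primer", "author": "Smith, J."},
    {"library": "Library C", "isbn": "", "title": "Braille music primer.", "author": "Smith, J"},
]

union = merge_into_union(local_records)
for entry in union:
    print(entry["title"], "held by", ", ".join(entry["holdings"]))
```

A metasearch portal would have to run a step like this on every result set at query time, whereas a physical union catalogue can do it once at load time and store the merged record with its holdings.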