OCLC Cluster Service
Leiden, March 28 2007
Discussion Session With KB & UVA
Janifer Gatenby, Strategic Research

Agenda
• Welcome and Introductions
• Presentation
  – Clustering
  – Audience Level
  – Copyright / Rareness
  – FAST subject headings
• Discussion
• Lunch

Some slides from NCSU’s Endeca Test Catalog using OCLC work identifiers for Clustering

Some slides from PiCarta (Netherlands) Test Catalog using OCLC work identifiers for Clustering

Without clustering

With Clustering

Consolidation of Holdings
The above example shows 2 holdings, one per bibliographic record.
The consolidation of holdings permits Reservations (holds) and Requests at work level.

Dutch
• 6.7 million work identifiers / 7.7 million bib records
• Collapse rate of 13%
  – Av. 1.15 bibliographic records per work record
• Software adaptation less than 1 week

NCSU
• 1.64 million work identifiers / 1.7 million bib records
• Collapse rate of 3%
  – Av. 1.03 bibliographic records per work record

Method
OCLC #      OCLC Work ID   Title
65647794    20842726       Goldene vliess
27921612    30369321       Goldene vliess
5773235     19885466       Goldene vliess
36638149    12019603       Goldene vliess
36638149    12019603       Goldene vliess
36638149    12019603       Goldene vliess

Method
PPN         OCLC #      OCLC Work ID   Title            Comments
80637760    65647794    20842726       Goldene vliess   not in main group
124594883   27921612    30369321       Goldene vliess   not in main group
36330531    5773235     19885466       Goldene vliess   not in main group
80626203    36638149    12019603       Goldene vliess   in main group
18113649x   36638149    12019603       Goldene vliess   in main group
80540333    36638149    12019603       Goldene vliess   in main group

Fixing Mismatches
• Alternatives
  – Fix data at source
  – Apply name / title authority records
  – Enhance algorithm
    • Eliminate foreign articles
    • Convert “fünf”, “vijf”, “cinq” to “5” etc.
• At OCLC
  – Quality control
  – Office of Research

Authorities Ensure Matching
• Foreign union catalogue data
  – Non AACR2, not native MARC21, other language of cataloguing, non standard uniform titles
  – Requesting 1,000 name / title authority records per union catalogue
A bib record for a translation without a uniform title will match if there is a comprehensive author / title authority record.

Bib
100 …Rowling, J.K.
245 …La chambre secrète
…………….

Authority
Rowling, J.K.
The secret chamber
De geheime kamer
La chambre secrète
Die geheime kammer
……………

FRBR – Divide and conquer
• Creation of works (38 million)
• Algorithm
• Authority records
• Cleaning bibliographic records where necessary
• No manual links created
• Improved user interfaces

• Harvesting
• Loading IDs & records
• Authority records
• Improved user interfaces
• Suggestions for the improvement of the algorithm and records

ALA Mid Winter Meeting
• Representatives of 19 libraries with substantial holdings in WorldCat
• Clear Requirements
  – XML cluster record service
  – Minimum of daily update

Discussion

Phase 2
• Phase 1 – table
• Phase 2 – work record with enriched data
  – Audience level
  – Rareness
  – Copyright
  – FAST headings for faceted search

Audience Level and Rareness

OpenURL Request Transfer Message

Faceted Search

FAST headings
• Fully formed concepts
• Suitable for faceted search
  – LCSH “sentences” – breaking into concepts is tricky
http://www.oclc.org/research/projects/fast/

Discussion

Cluster
[Diagram; element labels in the order they appear on the slide]
  Cluster Identifier
  Instance/s
  Description
  Related Works
  Type
  Identifier/s + type
  WC Cluster Identifier
  Value
  Copyright estimate
  Instance/s
  Holdings count (rarity)
  Relationship (sequel etc.)
  OCLC Number

Cluster
[Diagram; element labels in the order they appear on the slide]
  WC Cluster Identifier
  Instances
  Author
  Description
  Title
  Related works
  About
  Audience
  WC identity ID
  Language
  Heading + type
  Display version
  Alternative Title/s
  Classification + type
  Holdings (rarity)
  Language
  Type

Deployment
• CBS 3.2 ++ incorporating cluster record in test due Easter
• Installation in LBS
• OCLC Distribution service – development to start in April
• PSI modifications to use cluster record
• Looking for testing partners
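
Worked example (illustrative)
The grouping shown on the Method slides and the figures on the Dutch / NCSU slide can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not OCLC's algorithm: it only groups records that already carry an OCLC work identifier. The function names are hypothetical, and "collapse rate" is assumed to mean (bib records − work identifiers) / bib records, a definition that reproduces the 13% and 1.15 quoted for the Dutch catalogue.

```python
from collections import defaultdict

# (PPN, OCLC number, OCLC work ID, title) rows copied from the second "Method" slide.
RECORDS = [
    ("80637760", "65647794", "20842726", "Goldene vliess"),
    ("124594883", "27921612", "30369321", "Goldene vliess"),
    ("36330531", "5773235", "19885466", "Goldene vliess"),
    ("80626203", "36638149", "12019603", "Goldene vliess"),
    ("18113649x", "36638149", "12019603", "Goldene vliess"),
    ("80540333", "36638149", "12019603", "Goldene vliess"),
]


def cluster_by_work_id(records):
    """Group local record identifiers (PPNs) under their OCLC work ID."""
    clusters = defaultdict(list)
    for ppn, _oclc_number, work_id, _title in records:
        clusters[work_id].append(ppn)
    return dict(clusters)


def collapse_rate(bib_count, work_count):
    """Assumed definition: share of bib records absorbed into a shared work."""
    return (bib_count - work_count) / bib_count


def records_per_work(bib_count, work_count):
    """Average number of bibliographic records per work identifier."""
    return bib_count / work_count


clusters = cluster_by_work_id(RECORDS)

# The "main group" is taken here to be the work ID with the most records.
main_group = max(clusters, key=lambda work_id: len(clusters[work_id]))

if __name__ == "__main__":
    for work_id, ppns in clusters.items():
        label = "in main group" if work_id == main_group else "not in main group"
        print(work_id, ppns, label)
    # Dutch figures from the slide: 7.7 million bib records, 6.7 million work IDs.
    print(f"collapse rate: {collapse_rate(7.7e6, 6.7e6):.0%}")
    print(f"records per work: {records_per_work(7.7e6, 6.7e6):.2f}")
```

With the six slide rows, the sketch yields four clusters, of which only work ID 12019603 holds more than one record. The three single-record clusters are the mismatches that the Fixing Mismatches slide proposes to resolve through source data, authority records, or algorithm changes.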