Implementing FRBR on Large Databases
Thomas Hickey, Diane Vizine-Goetz
OCLC Research
CNI 2002 Fall Task Force


What is FRBR

• IFLA study group report: Functional Requirements for Bibliographic Records
• Bibliographic model independent of cataloging rules
• Clusters bibliographic items into a four-level structure
  – Work
  – Expression
  – Manifestation
  – Item


Control of Entities in FRBR

[Diagram. Entities: Work, Expression, Manifestation, Item; Person, Corporate Body; Concept, Object, Event, Place. Surrogates: uniform titles, citations, names, subjects.]


Why FRBR?

• Potential to improve:
  – Cataloging
  – Discovery
  – Delivery
• By
  – Bringing versions of works together
  – Showing relationships of various kinds
  – Enabling users to navigate to level of interest


Research on FRBR & WorldCat

• Subsets
  – By library, region
  – Example/problem sets
    • Shakespeare, the Bible
    • Humphry Clinker
    • 1,000 random works
  – By genre
    • Dissertations
    • Fiction
• Whole file, 47 million bibliographic records


Our Approach

• Concentrating on work-level
  – Problems with expression-level clusters
• Efficient, maintainable, understandable
• Few, if any, false matches with correct cataloging
  – Err on the side of missed matches
  – Some accommodation of frequent variants
• Compare with manually clustered


The Algorithm

• A key is generated for each record
• Extract author, title
  – Look up in NACO authority file
  – Added entry information as needed
• Form a key from bibliographic record
  – Author, title, added entry information
  – These can be sorted, compared


Problems

• Many (17%) records do not have
  – Author main-entry
  – Uniform title
• In general these cannot be matched
  – Look at added entries
  – Information at the expression and manifestation levels
  – Handled separately
  – 180,000 clusters involving ~400,000 records


Top 10 WorldCat Clusters

# Recs   Author/Title Key
8,383    bible\n t
8,055    bible
6,174    bible\authorized
4,033    bible\o t\psalms
3,964    haggadah
3,477    great britain/treaties etc
2,402    bible\o t
2,248    koran
2,153    arabian nights


Top 10 from a Public Library

# Recs   Author/Title Key
89       bible\authorized
85       mother goose
84       chopin, frederic\1810 1849/piano music
81       schulz, charles m/peanuts
63       davis, jim/garfield
61       moore, clement clarke\1779 1863/night before christmas
60       mozart, wolfgang amadeus\1756 1791/instrumental music
58       bach, johann sebastian\1685 1750/cantatas
57       beethoven, ludwig van\1770 1827/sonatas
56       twain, mark\1835 1910/adventures of huckleberry finn


Results

• Manual estimate: 1.5 manifestations/work in WorldCat
• Algorithm: ~1.3
• 25,844 clusters have 20 or more records
• 401,659 clusters have 5 or more records


Preliminary Plans

• Build structures for FRBR into new catalog
• Expose FRBR clustering for searching
• Make visible in cataloging
  – As consensus on implementation is developed
  – As cataloging rules accommodate FRBR


Spin-offs

• NACO normalization code
  – Testbed
  – Server
• Authority work
  – ePrints UK
• FRBR in other projects
  – FictionFinder
  – NDLTD union catalog


Fiction Subset

• 2,665,662 WorldCat records
• 1,758,479 work clusters
• 1.5 records/cluster
• 3,866 clusters have 20 or more records
• 50,540 clusters have 5 or more records


Top 10 clusters for fiction

# Recs   Author/Title Key
1,296    defoe, daniel\1661 1731/robinson crusoe
1,248    carroll, lewis\1832 1898/alices adventures in wonderland
971      cervantes saavedra, miguel de\1547 1616/don quixote
828      stevenson, robert louis\1850 1894/treasure island
689      twain, mark\1835 1910/adventures of huckleberry finn
624      twain, mark\1835 1910/adventures of tom sawyer
618      swift, jonathan\1667 1745/gullivers travels
600      andersen, h c\hans christian\1805 1875/tales
581      stowe, harriet beecher\1811 1896/uncle toms cabin
570      arabian nights


FictionFinder

• Employs work clusters in a prototype system for searching and browsing bibliographic records for fiction
• Indexes records at the work level and organizes displays by work and expression (primarily language)
• Includes records for textual items; additional modes of expression (moving image, sound) to be added later


395 records for author “crichton, michael\1942” clustered into 17 entries

# Recs   Title Key
23       airframe
40       andromeda strain
5        binary
11       case of need
44       congo
26       disclosure
5        disclosure a novel
16       eaters of the dead
7        eaters of the dead the manuscript of ibn fadlan relating his experiences with the northmen in a d 922
27       great train robbery
47       jurassic park
25       lost world
37       rising sun
31       sphere
7        sphere a novel
19       terminal man
25       timeline
395      (total)


[Screenshots: Typical Results Set Display; Typical Work-level Display]


Benefits

• Aggregated displays for works and expressions
• Enhancement of (fiction) records at work level
  – with elements from records within the work cluster (e.g., summaries, genre terms, subject headings, class numbers)
  – with external data (e.g., literary prizes, prequels/sequels, evaluative content)


Challenges

• Identifying appropriate bibliographic data for systematically grouping or differentiating works and expressions
  – Works
    • Genre (graphic novel vs. novel)
    • Genre + mode of expression (audio book vs. radio play)
    • Degree of modification (abridgement of juvenile work vs. an adaptation for young children)
  – Expressions
    • Translators, illustrators, editors


Next Steps

• FRBR algorithm
  – Explore applications
  – Refine algorithm as needed
• FictionFinder
  – Add records for sound and image
  – Conduct user studies


Links

• Functional Requirements for Bibliographic Records Final Report
  – http://www.ifla.org/VII/s13/frbr/frbr.htm
• Experiments with the IFLA Functional Requirements for Bibliographic Records (FRBR)
  – http://www.dlib.org/dlib/september02/hickey/09hickey.html
• OCLC Research Activities and IFLA's Functional Requirements for Bibliographic Records
  – http://www.oclc.org/research/projects/frbr/index.shtm
• Implementing FRBR on Large Databases
  – http://staff.oclc.org/~vizine/CNI/OCLCFRBR.htm
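
The key-building steps listed under "The Algorithm" can be illustrated with a small sketch. This is not the OCLC implementation: the `normalize` function is only a rough stand-in for NACO normalization, the authority-file lookup and added-entry handling are omitted, and the record layout (`id`, `author`, `title`) and function names are hypothetical. The key shape (author subfields joined by `\`, then `/`, then the title) follows the keys shown in the cluster tables.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text, keep_first_comma=False):
    """Rough approximation of NACO-style normalization (an assumption,
    not the actual NACO rules): strip diacritics, lowercase, delete
    apostrophes, and turn other punctuation into single spaces."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch)).lower()
    text = text.replace("'", "").replace("\u2019", "")
    if keep_first_comma:
        # Keep the comma that separates surname from forename.
        head, sep, tail = text.partition(",")
        head = re.sub(r"[^a-z0-9]+", " ", head).strip()
        tail = re.sub(r"[^a-z0-9]+", " ", tail).strip()
        return head + (", " + tail if sep and tail else "")
    return re.sub(r"[^a-z0-9]+", " ", text).strip()


def work_key(author_parts, title):
    """Build an author/title key. `author_parts` is a list of heading
    subfields (name first, then dates and so on); it may be empty."""
    parts = [normalize(p, keep_first_comma=(i == 0))
             for i, p in enumerate(author_parts)]
    author = "\\".join(p for p in parts if p)
    title_key = normalize(title)
    return f"{author}/{title_key}" if author else title_key


def cluster(records):
    """Group record ids by work key; records sharing a key form a cluster."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[work_key(rec["author"], rec["title"])].append(rec["id"])
    return dict(clusters)


# Hypothetical sample records, for illustration only.
records = [
    {"id": 1, "author": ["Twain, Mark,", "1835-1910."],
     "title": "Adventures of Huckleberry Finn"},
    {"id": 2, "author": ["Twain, Mark,", "1835-1910"],
     "title": "ADVENTURES OF HUCKLEBERRY FINN."},
    {"id": 3, "author": ["Carroll, Lewis,", "1832-1898."],
     "title": "Alice's Adventures in Wonderland"},
    {"id": 4, "author": [], "title": "Arabian nights"},
]

clusters = cluster(records)
for key in sorted(clusters):
    print(len(clusters[key]), key)
```

Because the keys are plain strings, they can be sorted and compared directly, which is what makes this approach workable on a file of tens of millions of records. Exact string matching also errs toward missed matches rather than false ones: two records fall into the same cluster only when their normalized author and title agree completely.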