Formats and FRBR Catalogues – Where's our focus?
Trond Aalberg
NTNU and BIBSYS, Norway

Topics
• FRBRizing existing catalogues
  – The BIBSYS FRBR project
• Internal FRBR structures
  – How to structure and store FRBR data internally
• Exchange
  – How to express and exchange FRBR data externally
• What kind of specification do we want/need the FRBR to be for implementations?

The BIBSYS FRBR project
A case study in the use of the FRBR model on the BIBSYS database
• BIBSYS
  – Norwegian service center for libraries: Norwegian university libraries, the National Library, all college libraries, and a number of research libraries
  – Bibliographic database with circa 3.8 million records (8 million holdings)
  – BIBSYSMARC ~ NORMARC (subset but not proper subset of USMARC)
• Project cooperates with
  – Norwegian University of Science and Technology (project management, modeling and implementations)
  – The National Library of Norway (mapping FRBR – BIBSYSMARC)
  – OCLC (running the Work-Set algorithm on the BIBSYS database)
  – The National Database Project of Norwegian University Museums (CRM)
• Funded by the Norwegian Archive, Library and Museum Authority (1/9 2004 – 31/8 2005) and part of the Norwegian Digital Library Initiative

Motivation and objectives
• Large number of existing MARC-based bibliographic catalogues
  – FRBRizing existing catalogues is a major challenge and the key to a FRBRized bibliographic universe
  – Realistic FRBR prototypes can be used to validate the model
• “Holistic” view
  – Process the complete database (not an ideal subset)
  – From FRBR data model to test database and search prototype
  – Cover as much as possible of the BIBSYS data
• Findings
  – Possibilities and limitations
  – How to improve support for FRBR in BIBSYSMARC
  – Further research on specific problems

FRBRizing existing catalogues
• Def:
  – to implement aspects of the FRBR model
• Two different strategies:
  – Presentation layer only
    • Adding a system component that enables generation of FRBR
    • Run-time or preprocessed
  – Presentation and storage layer
    • Convert data to a FRBR “compatible” model

Levels of FRBRizing
• Different levels of FRBRizing
  – Implement group 1 entities and inherent relationships
  – Implement group 2 and 3 entities and inherent relationships
  – Implement other relationships
  – Implement FRBR attributes

Implementing FRBR
[Figure: Records 1 to 5 and the internal FRBR data structure]
• Build on ER approach
• Decompose and convert MARC to FRBR attributes

BIBSYSMARC Example record
*008 pv eng
*015 $a nf0113657
*020 $a 0-8222-1636-1$b h.
*082 $d 839.822[S]
*100 $a Ibsen, Henrik
*241 $a Et dukkehjem $w dukkehjem
*245 $a A doll's house $c by Henrik Ibsen ; adapted by Frank McGuinness
*260 $a New York $b Dramatists Play Service $c c1998
*300 $a 70 s.
*700 $a McGuinness, Frank
*096 $a NBO $c Småtr. 582 $n 02ga00027
*096 $a NBO $c Ibsensenteret $n 01ga20306

[Figure: fields of the example record, regrouped]
*100 $a Ibsen, Henrik
*241 $a Et dukkehjem $w dukkehjem
*008 pv eng
*245 $a A doll's house $c by Henrik Ibsen ; adapted by Frank McGuinness
*020 $a 0-8222-1636-1$b h.
*700 $a McGuinness, Frank
*245 $a A doll's house $c by Henrik Ibsen ; adapted by Frank McGuinness
*260 $a New York $b Dramatists Play Service $c c1998
*300 $a 70 s.

Whole/Part records in BIBSYS for numbered series and multi-volumed publications
*001 91087302x
*008 pv eng
*015 $alc90186650
*082 $c839.82/26
*100 $aIbsen, Henrik
*240 $aVerker$lEngelsk
*245 $aThe collected works of Henrik Ibsen$c[entirely revised and edited by William Archer]$wcollected works of Henrik Ibsen
*250 $aCopyright ed.
*260 $aLondon$bHeinemann$c1906-1912
*300 $a12 b.
*700 $aArcher, William$d1856-1924
*580 $aDette er et lenket flerbindsverk
*096kj$aUHS$bISS$c839.82 Ibs:Col$n75k005729
*096ga$aNBO$cIbsensenteret$n75ga27600
*096ga$aNBO$n75ga29508
*096ga$aNBO$cIbsensenteret$n74ga02037
*096ga$aNBO$cIbsensenteret$n85ga06639

*001 982396694
*008 pv eng
*241 $aNår vi døde vågner
*241 $aLille Eyolf
*241 $aJohn Gabriel Borkman
*245 $aLittle Eyolf ; John Gabriel Borkman ; When we dead awaken $cwith introductions by William Archer
*260 $c1907
*300 $aXXVIII, 456 s.
*491 $n91087302x$q11$v11
*096ga$aNBO$cIbsensenteret/11$n85ga06648
*096ga$aNBO$cIbsensenteret/11$n75ga29424
*096ga$aNBO$cIbsensenteret/11$n74ga02038

*001 980846714
*008 pv
*245 $aBrand$ctranslated and with introduction by C.H. Herford
*260 $c1906
*300 $aXIII, 262 s.
*491 $n91087302x$q3$v3
*700 $aHereford, C.H.
*096ga$aNBO$cNA/A 2001:579$n75ga27601
*096ga$aNBO$cIbsensenteret/3$n74ga02035
*096ga$aNBO$cIbsensenteret/3$n74ga02036

• *491 is used to implement an isPartOf reference
• Approx. 20% of the records

Preliminary results: W:E:M
Statistics from the BIBSYS database
[Figure: chart of the number of Manifestations, Expressions and Works (scale 1,000,000 to 4,000,000) for the 1:1 and 1:N cases]

Data quality problems
• Typical problems for “not normalized” data
  – Redundant information
    • The same information is duplicated in multiple records
  – Records are missing information
  – The same information is expressed in different ways
• Inherent problems with data quality
• Results from earlier work on the subset of “Ibsen” records (~3000)
  – Using manual inspection and corrections of entries (language, titles, etc.)
  – Based on knowledge about author, works, titles, ...
• Compared to results from automatic processing
• Numbers indicate
  – a high level of imprecise information
  – quality can significantly be improved

                             Works   Expressions
  With error corrections        84           747
  Without error corrections    865          1354

Typical problems (ordered from easy to difficult)
• Different capitalization (easy)
• Spelling errors
• Substrings
• Only selected values
• Indicative information
• Missing information (difficult)

Conversion process outlined
[Figure: conversion process. Labels: FRBR implementation model; Convert or extract from MARC fields to FRBR attributes; FRBR – BIBSYSMARC mapping; Identify entities and relationships]

FRBR in MARC catalogues
[Figure: a MARC record related to Work, Expression, Manifestation, Item, Group 2 and 3 entities, and Relationships. Cardinality labels: N:1, N:1, 1~1, 1:N, N:N, N:N]

FRBR attributes
• Each of the entities in the model has associated with it a set of characteristics or attributes
• Attributes serve as the means by which users formulate queries and interpret responses
• Derived from a logical analysis of the data that are typically reflected in bibliographic records
• Attributes are defined at a logical level
• Some are generally applicable, others are applicable only to subtypes
• Intended to be comprehensive but not exhaustive
• Not every instance will exhibit all attributes listed

Mapping MARC to FRBR
• FRBR attributes are the bridge between FRBR and other formats
• Functional Analysis of the MARC 21 Bibliographic and Holdings Formats
• Local mapping tables are needed
• Mapping is easy but conversion is difficult
• Depending on the purpose of mapping
  – Full conversion of data
  – Enable searching in different formats
  – Mapping tables need to be close to conversion processes
  – Requires refinement of many FRBR attributes
  – and generalization of others
• What structures/formats do we implement?
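
A local mapping table of the kind called for above can be sketched in a few lines of Python. This is a minimal illustration, not the BIBSYS project's implementation: it assumes a toy representation of a MARC record as (tag, subfield, value) triples, and the names MAPPING and decompose as well as the attribute labels are hypothetical. The table is deliberately partial. The 740 entry lists three candidate entities, which shows why a mapping table alone does not settle the conversion.

```python
from collections import defaultdict

# Hypothetical local mapping table (an assumption for illustration):
# (MARC tag, subfield) -> candidate (FRBR entity, attribute) pairs.
MAPPING = {
    ("245", "a"): [("Manifestation", "title")],
    ("020", "a"): [("Manifestation", "identifier")],
    ("300", "a"): [("Manifestation", "extent of the carrier")],
    # 740 maps to manifestation, expression and work title, so the
    # table cannot decide on its own where a given value belongs.
    ("740", "a"): [
        ("Manifestation", "title"),
        ("Expression", "title"),
        ("Work", "title"),
    ],
}


def decompose(record):
    """Sort subfield values into mapped, ambiguous and unmapped groups.

    Mapped values keep a tag such as "245$a" naming their MARC origin.
    """
    mapped = defaultdict(list)
    ambiguous = []
    unmapped = []
    for tag, code, value in record:
        candidates = MAPPING.get((tag, code))
        if candidates is None:
            unmapped.append((tag, code, value))
        elif len(candidates) == 1:
            mapped[candidates[0]].append((value, f"{tag}${code}"))
        else:
            ambiguous.append((tag, code, value, candidates))
    return dict(mapped), ambiguous, unmapped


# A few fields taken from the BIBSYSMARC example record.
record = [
    ("020", "a", "0-8222-1636-1"),
    ("245", "a", "A doll's house"),
    ("260", "a", "New York"),
    ("300", "a", "70 s."),
]

mapped, ambiguous, unmapped = decompose(record)
for (entity, attribute), values in mapped.items():
    print(entity, attribute, values)
print("unmapped:", unmapped)
```

Values that land in the ambiguous group are the ones that need record-level decisions or manual correction. Field 260 ends up unmapped here only because this toy table has no entry for it.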
Example: Manifestation.title
• 245 TITLE
  – $a Title
  – $b Other title information
  – $n Number of part of work
  – $p Title of part of work
• 246 PARALLEL TITLE (R)
  – $a Title proper/short title
  – $b Other title information
• 210 ABBREVIATED TITLE
  – $a Abbreviated title
  – $b Complementary information
• 740 ADDED ENTRY TITLE (R)
  – $a Title
  – $b Other title information
  – $n Number of part of work
  – $p Title of part of work
• 222 KEY TITLE
  – $a Key title
  – $b Complementary information
• Field names are translations of BIBSYSMARC field names
• 740 is also mapped to expression and work title
• Complex data that maps to a single element
• Generic category of information except for 740
• Somewhat comparable structure

Example: Manifestation.identifier
• 020 ISBN (R)
  – $a ISBN
  – $z Invalid ISBN
• 022 ISSN (R)
  – $a ISSN
  – $y Invalid ISSN
• 024 ISMN and ISRC (R)
  – $a Number
  – $x Type of number
  – $y Invalid number
• And 027, 028, ..
• Complex data that maps to a single element
• 020 and 022 comparable structure, but not 024

Example: 300 PHYSICAL DESCRIPTION
• $a Extent
  = Extent of the Carrier
  ~ Form of Carrier
  ~ Presentation Format (Visual Projection)
  ~ Foliation (Hand-Printed Book)
  ~ Collation (Hand-Printed Book)
• $b Illustrations (Other physical details)
  ~ Capture Mode
  ~ Colour (Image)
  ~ Playing Speed (Sound Recording)
  ~ Kind of Sound (Sound Recording)
• $c Format (Dimensions of the carrier)
  = Dimensions of the Carrier
* Mapped to manifestation
• Some FRBR attributes are too specific!

Prototype solution
• Substructure is not always important for searching
• Substructure is important for presentation
• Mix models (FRBR and MARC)?
• Classifying specific fields/subfields as belonging to a specific entity/attribute
  – Not possible for fields that map to several FRBR entities and/or attributes
• Decomposing record instances
  – Determine what belongs to what entity/attribute
  – Tag values in MARC records with FRBR entity/attribute
    • E.g. extend MARCXML with attributes that identify FRBR entity/attribute
  – Tag FRBR attribute values with original MARC field/subfield
• Prototype solution using XML:
  – Different records for different entities
  – Maintain MARC substructure
  – To avoid runtime selection of work and expression entities
  – To facilitate error corrections and improve overall FRBR group 1 structure

FRBR as ontology
• FRBR is a conceptual model
  – Mainly interpreted as a reference model
• Can be formalized to an ontology, e.g. using W3C OWL:
  – “This is a FRBR.Expression and it has a FRBR.Translation relationship to another FRBR.Expression”
• Using Topic Maps and FRBR as typology (example from another project)

TM prototype
• FRBR as ontology for music information:
  – Works and creators
  – Artists and recorded performances
  – Navigation as the main discovery/search strategy
• Model and represent music information as distinct entities and relationships using FRBR as “types”
  – Not including FRBR attributes
• Exchange and integrate fragments using P2P (TMRAP)
• Objective
  – Explore and evaluate the use of FRBR entities and relationships
  – P2P exchange and integration of rich music information
  – Identifiers in the domain of music
  – The use of FRBR as an ontology in Topic Maps
* Examples are based on the demo version of the Omnigator software from Ontopia

FRBR ontology

Work example

Conclusion
• What do we want FRBR to be?
  – A reference model for bibliographic catalogues
  – A conceptual model for understanding bibliographic records
  – An ontology for exchanging bibliographic information within the domain and with other domains
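
As a supplement to the “FRBR as ontology” slide, the quoted statement about two expressions can be sketched as typed entities and relationships. This is a plain-Python illustration under stated assumptions, not OWL or Topic Maps syntax: the identifiers, the triple layout, and the helper names are hypothetical, and the direction of the relationship (subject is the translation, object is the original) is an assumption as well.

```python
# Triples of the form (subject, predicate, object).
# All identifiers are hypothetical; FRBR supplies only the "types".
TRIPLES = {
    ("expr:lille-eyolf-no", "type", "FRBR.Expression"),
    ("expr:little-eyolf-en", "type", "FRBR.Expression"),
    # Assumed direction: the subject is a translation of the object.
    ("expr:little-eyolf-en", "FRBR.Translation", "expr:lille-eyolf-no"),
}


def is_expression(node):
    """Return True if the node is typed as a FRBR.Expression."""
    return (node, "type", "FRBR.Expression") in TRIPLES


def translations_of(original):
    """List expressions that have a Translation relationship to `original`."""
    return sorted(
        subject
        for subject, predicate, obj in TRIPLES
        if predicate == "FRBR.Translation"
        and obj == original
        and is_expression(subject)
    )


print(translations_of("expr:lille-eyolf-no"))
```

Because only entity and relationship types are used, such fragments carry no FRBR attributes, which matches the scope described for the Topic Maps prototype.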