Formats and FRBR Catalogues – Where's our focus?

Trond Aalberg
NTNU and BIBSYS
Norway
Topics
• FRBRizing existing catalogues
– The BIBSYS FRBR project
• Internal FRBR structures
– How to structure and store FRBR data internally
• Exchange
– How to express and exchange FRBR data externally
• What kind of specification do we want/need FRBR to be for implementations?
The BIBSYS FRBR project
A case study in the use of the FRBR model on the BIBSYS database
• BIBSYS
– Norwegian service center for libraries: Norwegian university libraries, the National Library, all college libraries, and a number of research libraries
– Bibliographic database with circa 3.8 million records (8 million holdings)
– BIBSYSMARC ~ NORMARC (subset but not proper subset of USMARC)
• Project cooperates with
– Norwegian University of Science and Technology (project management, modeling and implementations)
– The National Library of Norway (mapping FRBR – BIBSYSMARC)
– OCLC (running the Work-Set algorithm on the BIBSYS database)
– The National Database Project of Norwegian University Museums (CRM)
• Funded by the Norwegian Archive, Library and Museum Authority (1/9 2004 – 31/8 2005) and is a part of the Norwegian Digital Library Initiative
Motivation and objectives
• Large number of existing MARC-based bibliographic catalogues
– FRBRizing existing catalogues is a major challenge and the key to a FRBRized bibliographic universe
– Realistic FRBR prototypes can be used to validate the model
• ”Holistic” view
– Process the complete database (not ideal subset)
– From FRBR data model to test database and search prototype
– Cover as much as possible of the BIBSYS data
• Findings
– Possibilities and limitations
– How to improve support for FRBR in BIBSYSMARC
– Further research on specific problems
FRBRizing existing catalogues
• Definition:
– To implement aspects of the FRBR model
• Two different strategies:
– Presentation layer only
• Adding a system component that enables generation of FRBR
• Run-time or preprocessed
– Presentation and storage layer
• Convert data to a FRBR ”compatible” model
Levels of FRBRizing
• Different levels of FRBRizing
– Implement group 1 entities and inherent relationships
– Implement group 2 and 3 entities and inherent relationships
– Implement other relationships
– Implement FRBR attributes
Implementing FRBR
[Diagram: Record 1 to Record 5 and an internal FRBR data structure]
• Build on ER approach
• Decompose and convert MARC to FRBR attributes
BIBSYSMARC Example record
*008 pv eng
*015 $a nf0113657
*020 $a 0-8222-1636-1$b h.
*082 $d 839.822[S]
*100 $a Ibsen, Henrik
*241 $a Et dukkehjem $w dukkehjem
*245 $a A doll's house $c by Henrik Ibsen ; adapted by Frank McGuinness
*260 $a New York $b Dramatists Play Service $c c1998
*300 $a 70 s.
*700 $a McGuinness, Frank
*096 $a NBO $c Småtr. 582 $n 02ga00027
*096 $a NBO $c Ibsensenteret $n 01ga20306
*100 $a Ibsen, Henrik
*241 $a Et dukkehjem $w dukkehjem
*008 pv eng
*245 $a A doll's house $c by Henrik Ibsen ; adapted by Frank McGuinness
*020 $a 0-8222-1636-1$b h.
*700 $a McGuinness, Frank
*245 $a A doll's house $c by Henrik Ibsen ; adapted by Frank McGuinness
*260 $a New York $b Dramatists Play Service $c c1998
*300 $a 70 s.
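The decomposition of the example record can be sketched in code. This is a minimal sketch, assuming a line-based input of the form `*tag $a value $b value` and a hypothetical field-to-entity table (100 and 241 to Work, 008 and 700 to Expression, 020, 245, 260 and 300 to Manifestation). The slides do not state this assignment explicitly, and 245 appears in more than one group, so the table is an illustration only.

```python
import re
from collections import defaultdict

# Hypothetical field-to-entity table; an assumption for illustration only.
FIELD_TO_ENTITY = {
    "100": "work",
    "241": "work",
    "008": "expression",
    "700": "expression",
    "020": "manifestation",
    "245": "manifestation",
    "260": "manifestation",
    "300": "manifestation",
}

def parse_field(line):
    """Split '*245 $a Title $c Statement' into (tag, {subfield code: value})."""
    tag, _, rest = line.lstrip("*").partition(" ")
    subfields = {}
    for code, value in re.findall(r"\$(\w)\s*([^$]*)", rest):
        subfields[code] = value.strip()
    if not subfields and rest.strip():
        subfields["_"] = rest.strip()  # control field without subfield codes
    return tag, subfields

def decompose(lines):
    """Group the parsed fields of one record by the entity they are assigned to."""
    entities = defaultdict(list)
    for line in lines:
        tag, subfields = parse_field(line)
        entity = FIELD_TO_ENTITY.get(tag)
        if entity:
            entities[entity].append((tag, subfields))
    return dict(entities)

record = [
    "*008 pv eng",
    "*020 $a 0-8222-1636-1$b h.",
    "*100 $a Ibsen, Henrik",
    "*241 $a Et dukkehjem $w dukkehjem",
    "*245 $a A doll's house $c by Henrik Ibsen ; adapted by Frank McGuinness",
    "*260 $a New York $b Dramatists Play Service $c c1998",
    "*300 $a 70 s.",
    "*700 $a McGuinness, Frank",
]
parts = decompose(record)
```

A static table like this only works for fields that belong to exactly one entity; fields such as 245 that carry information about several entities need instance-level decisions.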
Whole/Part records in BIBSYS for numbered series and multi-volume publications
*001 91087302x
*008 pv eng
*015 $alc90186650
*082 $c839.82/26
*100 $aIbsen, Henrik
*240 $aVerker$lEngelsk
*245 $aThe collected works of Henrik Ibsen$c[entirely revised and edited by William Archer]$wcollected works of Henrik Ibsen
*250 $aCopyright ed.
*260 $aLondon$bHeinemann$c1906-1912
*300 $a12 b.
*700 $aArcher, William$d1856-1924
*580 $aDette er et lenket flerbindsverk
*096kj$aUHS$bISS$c839.82 Ibs:Col$n75k005729
*096ga$aNBO$cIbsensenteret$n75ga27600
*096ga$aNBO$n75ga29508
*096ga$aNBO$cIbsensenteret$n74ga02037
*096ga$aNBO$cIbsensenteret$n85ga06639
*001 982396694
*008 pv eng
*241 $aNår vi døde vågner
*241 $aLille Eyolf
*241 $aJohn Gabriel Borkman
*245 $aLittle Eyolf ; John Gabriel Borkman ; When we dead awaken $cwith introductions by William Archer
*260 $c1907
*300 $aXXVIII, 456 s.
*491 $n91087302x$q11$v11
*096ga$aNBO$cIbsensenteret/11$n85ga06648
*096ga$aNBO$cIbsensenteret/11$n75ga29424
*096ga$aNBO$cIbsensenteret/11$n74ga02038
*001 980846714
*008 pv
*245 $aBrand$ctranslated and with introduction by C.H. Herford
*260 $c1906
*300 $aXIII, 262 s.
*491 $n91087302x$q3$v3
*700 $aHereford, C.H.
*096ga$aNBO$cNA/A 2001:579$n75ga27601
*096ga$aNBO$cIbsensenteret/3$n74ga02035
*096ga$aNBO$cIbsensenteret/3$n74ga02036
*491 is used to implement an isPartOf reference
Approx. 20% of the records
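The isPartOf reference carried by *491 can be followed to collect the parts of a multi-volume publication. The following is a minimal sketch, assuming each record has already been reduced to a dictionary with the *001 identifier, the *491 $n parent identifier and the *491 $v volume number; this layout and the function name `build_part_index` are hypothetical, and volume numbers are assumed to be numeric.

```python
from collections import defaultdict

# Records reduced to a hypothetical dictionary layout:
# "id" from *001, "part_of" from *491 $n, "volume" from *491 $v.
records = [
    {"id": "91087302x", "title": "The collected works of Henrik Ibsen"},
    {"id": "982396694",
     "title": "Little Eyolf ; John Gabriel Borkman ; When we dead awaken",
     "part_of": "91087302x", "volume": "11"},
    {"id": "980846714", "title": "Brand",
     "part_of": "91087302x", "volume": "3"},
]

def build_part_index(records):
    """Map each parent record id to its part records, sorted by volume number."""
    parts = defaultdict(list)
    for rec in records:
        parent = rec.get("part_of")
        if parent is not None:
            parts[parent].append(rec)
    for children in parts.values():
        # Assumes numeric volume numbers; real data may need a more tolerant key.
        children.sort(key=lambda r: int(r["volume"]))
    return dict(parts)

index = build_part_index(records)
volumes = [(r["volume"], r["title"]) for r in index["91087302x"]]
```

With such an index, attributes that are only recorded on the parent record (for example the author in *100) can be looked up for each part.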
Preliminary results: W:E:M statistics from the BIBSYS database
[Chart: Works, Expressions and Manifestations; categories 1:1 and 1:N; axis ticks from 1000000 to 4000000]
Data quality problems
• Typical problems for “not normalized” data
– Redundant information
• The same information is duplicated in multiple records
– Records are missing information
– The same information is expressed in different ways
• Inherent problems with data quality
• Results from earlier work on the subset of ”Ibsen” records (~3000)
– Using manual inspection and corrections of entries (language, titles, etc)
– Based on knowledge about author, works, titles, ….
• Compared to results from automatic processing
• Numbers indicate
– a high level of imprecise information
– quality can be significantly improved
                            Works   Expressions
With error corrections         84           747
Without error corrections     865          1354
Typical problems
Ordered from easy to difficult:
• Different capitalization
• Spelling errors
• Substrings
• Only selected values
• Indicative information
• Missing information
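The easier problems in this list (capitalization, spelling errors, substrings) can be approached with simple string techniques. The sketch below is one possible approach, not the project's method: the function names and the 0.9 similarity threshold are assumptions, and it uses Python's standard `difflib` for approximate matching.

```python
import re
import unicodedata
from difflib import SequenceMatcher

def normalize(value):
    """Case-fold, replace punctuation with spaces and collapse whitespace."""
    value = unicodedata.normalize("NFC", value).casefold()
    value = re.sub(r"[^\w\s]", " ", value)
    return " ".join(value.split())

def similar(a, b, threshold=0.9):
    """True when two normalized strings are near-identical (possible spelling variants)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

def is_substring_variant(a, b):
    """True when one normalized string is properly contained in the other."""
    na, nb = normalize(a), normalize(b)
    return na != nb and (na in nb or nb in na)

# Capitalization and punctuation differences disappear after normalization.
same_title = normalize("Et Dukkehjem.") == normalize("ET  DUKKEHJEM")

# The two spellings of the translator's name in the Brand record are flagged as variants.
name_variant = similar("Herford, C.H.", "Hereford, C.H.")
```

The remaining items (only selected values, indicative information, missing information) cannot be handled by string comparison alone, which is consistent with their position at the difficult end of the list.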
Conversion process outlined
• FRBR implementation model
• Convert or extract from MARC fields to FRBR attributes
• FRBR – BIBSYSMARC mapping
• Identify entities and relationships
FRBR in MARC catalogues
[Diagram: a MARC-record related to Work, Expression, Manifestation, Item, Group 2 and 3 entities, and Relationships; cardinality labels in slide order: N:1, N:1, 1~1, 1:N, N:N, N:N]
FRBR attributes
• Each of the entities in the model has associated with it a set of characteristics or attributes
• Attributes serve as the means by which users formulate queries and interpret responses
• Derived from a logical analysis of the data that are typically reflected in bibliographic records
• Attributes are defined at a logical level
• Some are generally applicable, others are applicable only to subtypes
• Intended to be comprehensive but not exhaustive
• Not every instance will exhibit all attributes listed
Mapping MARC to FRBR
• FRBR attributes are the bridge between FRBR and other formats
• Functional Analysis of the MARC 21 Bibliographic and Holdings Formats
• Local mapping tables are needed
• Mapping is easy but conversion is difficult
• Depending on the purpose of mapping
– Full conversion of data
– Enable searching in different formats
– Mapping tables need to be close to conversion processes
– Requires refinement of many FRBR attributes
– and generalization of others
• What structures/formats do we implement?
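A local mapping table can be represented as a lookup from MARC field and subfield to FRBR entity and attribute. The following is a minimal sketch: the entries are taken from the examples in this presentation (245, 740, 020, 300), while the table layout and the function names are assumptions.

```python
# (MARC field, subfield) -> list of (FRBR entity, FRBR attribute).
# One MARC subfield can map to several entities, so values are lists.
MAPPING = {
    ("245", "a"): [("manifestation", "title")],
    ("740", "a"): [("manifestation", "title"),
                   ("expression", "title"),
                   ("work", "title")],
    ("020", "a"): [("manifestation", "identifier")],
    ("300", "a"): [("manifestation", "extent of the carrier")],
    ("300", "c"): [("manifestation", "dimensions of the carrier")],
}

def targets(field, subfield):
    """FRBR (entity, attribute) pairs a MARC subfield may map to."""
    return MAPPING.get((field, subfield), [])

def is_ambiguous(field, subfield):
    """A subfield is ambiguous when it maps to more than one entity."""
    return len({entity for entity, _ in targets(field, subfield)}) > 1

def reverse_lookup(entity, attribute):
    """All MARC (field, subfield) pairs that can supply a given FRBR attribute."""
    return sorted(key for key, pairs in MAPPING.items()
                  if (entity, attribute) in pairs)
```

The table illustrates why mapping is easy but conversion is difficult: it records where a value may go, but for an ambiguous entry such as 740 $a it cannot say which entity a particular record instance describes.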
Example: Manifestation.title
• 245 TITLE
– $a Title
– $b Other title information
– $n Number of part of work
– $p Title of part of work
• 246 PARALLEL TITLE (R)
– $a Title proper/short title
– $b Other title information
• 210 ABBREVIATED TITLE
– $a Abbreviated title
– $b Complementary information
• 740 ADDED ENTRY TITLE (R)
– $a Title
– $b Other title information
– $n Number of part of work
– $p Title of part of work
• 222 KEY TITLE
– $a Key title
– $b Complementary information
• Field names are translations of BIBSYSMARC field names
• 740 is also mapped to expression and work title
• Complex data that maps to a single element
• Generic category of information except for 740
• Somewhat comparable structure
Example: Manifestation.identifier
• 020 ISBN (R)
– $a ISBN
– $z Invalid ISBN
• 022 ISSN (R)
– $a ISSN
– $y Invalid ISSN
• 024 ISMN and ISRC (R)
– $a Number
– $x Type of number
– $y Invalid number
• And 027, 028, ..
• Complex data that maps to a single element
• 020 and 022 comparable structure, but not 024
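The difference in structure between 020/022 and 024 can be handled by describing each field's layout and converting all of them into one uniform identifier shape. This is a minimal sketch under assumptions: the `LAYOUT` table and the `identifiers` function are hypothetical, 024 $x is assumed to hold a scheme name such as "ISMN", and the sample 022 and 024 values in the usage lines are made up.

```python
# Per-field layout, following the slide: which subfield holds the valid value,
# which holds the invalid one, and where the identifier scheme comes from.
LAYOUT = {
    "020": {"valid": "a", "invalid": "z", "scheme": "ISBN"},
    "022": {"valid": "a", "invalid": "y", "scheme": "ISSN"},
    # For 024 the scheme is not fixed; it is read from subfield $x.
    "024": {"valid": "a", "invalid": "y", "scheme_subfield": "x"},
}

def identifiers(field, subfields):
    """Turn one MARC identifier field into (scheme, value, is_valid) tuples."""
    layout = LAYOUT[field]
    scheme = layout.get("scheme") or subfields.get(layout["scheme_subfield"], "unknown")
    result = []
    if layout["valid"] in subfields:
        result.append((scheme, subfields[layout["valid"]], True))
    if layout["invalid"] in subfields:
        result.append((scheme, subfields[layout["invalid"]], False))
    return result

# ISBN from the example record; the other two values are made up.
isbn = identifiers("020", {"a": "0-8222-1636-1", "b": "h."})
issn = identifiers("022", {"y": "0000-0000"})
other = identifiers("024", {"a": "X-0000", "x": "ISMN"})
```

Keeping the scheme and validity alongside the value preserves the substructure that a single Manifestation.identifier element would otherwise lose.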
Example: 300 PHYSICAL DESCRIPTION
• $a Extent
= Extent of the Carrier
~ Form of Carrier
~ Presentation Format (Visual Projection)
~ Foliation (Hand-Printed Book)
~ Collation (Hand-Printed Book)
• $b Illustrations (Other physical details)
~ Capture Mode
~ Colour (Image)
~ Playing Speed (Sound Recording)
~ Kind of Sound (Sound Recording)
• $c Format (Dimensions of the carrier)
= Dimensions of the Carrier
*Mapped to manifestation
• Some FRBR attributes are too specific!
Prototype solution
• Substructure is not always important for searching
• Substructure is important for presentation
• Mix models (FRBR and MARC)?
• Classifying specific fields/subfields as belonging to a specific entity/attribute
– Not possible for fields that map to several FRBR entities and/or attributes
• Decomposing record instances
– Determine what belongs to what entity/attribute
– Tag values in MARC records with FRBR entity/attribute
• E.g. extend MARCXML with attributes that identify FRBR entity/attribute
– Tag FRBR attribute values with original MARC field/subfield
• Prototype solution using XML:
– Different records for different entities
– Maintain MARC substructure
– To avoid runtime selection of work and expression entities
– To facilitate error corrections and improve overall FRBR group 1 structure
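The idea of extending MARCXML with attributes that identify the FRBR entity and attribute can be sketched as follows. The element names and namespace are those of MARCXML; the attribute names `frbr-entity` and `frbr-attribute` are hypothetical and not part of any schema, and assigning 241 to the work title is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARC_NS)

def tagged_datafield(tag, subfields, entity, attribute):
    """Build a MARCXML datafield carrying hypothetical FRBR annotations."""
    field = ET.Element(f"{{{MARC_NS}}}datafield", {
        "tag": tag, "ind1": " ", "ind2": " ",
        "frbr-entity": entity,        # hypothetical extension attribute
        "frbr-attribute": attribute,  # hypothetical extension attribute
    })
    for code, value in subfields:
        sub = ET.SubElement(field, f"{{{MARC_NS}}}subfield", {"code": code})
        sub.text = value
    return field

record = ET.Element(f"{{{MARC_NS}}}record")
record.append(tagged_datafield("241", [("a", "Et dukkehjem")], "work", "title"))
record.append(tagged_datafield(
    "245",
    [("a", "A doll's house"),
     ("c", "by Henrik Ibsen ; adapted by Frank McGuinness")],
    "manifestation", "title"))

def fields_for(record, entity):
    """Select the datafields tagged as belonging to one FRBR entity."""
    return [f for f in record if f.get("frbr-entity") == entity]

xml_text = ET.tostring(record, encoding="unicode")
```

Because the MARC field and subfield structure is kept intact, the original substructure remains available for presentation while the added attributes support entity-level selection.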
FRBR as ontology
• FRBR is a conceptual model
– Mainly interpreted as a reference model
• Can be formalized as an ontology, e.g. using W3C OWL:
– ”This is a FRBR.Expression and it has a FRBR.Translation relationship to another FRBR.Expression”
• Using Topic Maps and FRBR as typology (example from another project)
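The kind of statement quoted above can be written as subject, predicate, object triples. The sketch below uses plain Python tuples rather than an OWL or Topic Maps toolkit; the `ex:` identifiers and the `frbr:` names are illustrative assumptions, not taken from a published vocabulary.

```python
# Statements as (subject, predicate, object) triples.
# All identifiers and "frbr:" names are illustrative assumptions.
triples = {
    ("ex:dukkehjem", "rdf:type", "frbr:Work"),
    ("ex:dukkehjem-no", "rdf:type", "frbr:Expression"),
    ("ex:dolls-house-en", "rdf:type", "frbr:Expression"),
    ("ex:dukkehjem-no", "frbr:realizes", "ex:dukkehjem"),
    ("ex:dolls-house-en", "frbr:realizes", "ex:dukkehjem"),
    ("ex:dolls-house-en", "frbr:translationOf", "ex:dukkehjem-no"),
}

def objects(subject, predicate):
    """All objects stated for a given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}

def translations_of(expression):
    """Expressions that are stated to be translations of the given expression."""
    return {s for s, p, o in triples
            if p == "frbr:translationOf" and o == expression}

def check_translation_types():
    """Both ends of every translation statement must be typed as expressions."""
    return all(
        "frbr:Expression" in objects(s, "rdf:type")
        and "frbr:Expression" in objects(o, "rdf:type")
        for s, p, o in triples if p == "frbr:translationOf"
    )
```

A formal ontology would express the constraint in `check_translation_types` declaratively, as the domain and range of the translation relationship, instead of checking it in application code.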
TM prototype
• FRBR as ontology for music information:
– Works and creators
– Artists and recorded performances
– Navigation as the main discovery/search strategy
• Model and represent music information as distinct entities and relationships using FRBR as “types”
– Not including FRBR attributes
• Exchange and integrate fragments using P2P (TMRAP)
• Objective
– Explore and evaluate the use of FRBR entities and relationships
– P2P exchange and integration of rich music information
– Identifiers in the domain of music
– The use of FRBR as an ontology in Topic Maps
* Examples are based on demo version of Omnigator software from Ontopia
FRBR ontology
Work example
Conclusion
• What do we want FRBR to be?
– A reference model for bibliographic catalogues
– A conceptual model for understanding bibliographic records
– An ontology for exchanging bibliographic information within the domain and with other domains