The Future of MARC: dead or reviving?
Rebecca Guenther
NYTSL Fall Program
Nov. 4, 2011

Overview of presentation
- History of MARC
- The current bibliographic framework
- Efforts to evolve MARC
- XML formats
- Linked data explorations
- RDA changes
- LC Bibliographic Framework Transition Initiative

What is MARC 21?
- A syntax defined by an international standard for communications, with 2 expressions:
  - Classic MARC (MARC 2709)
  - MARCXML
- A data element set defined by content designation and semantics
- Many data elements are defined by external content rules; a common misperception is that MARC 21 is tied to AACR2
- It does not specify internal storage, and institutions do not store “MARC 21”
- A set of 5 formats for different purposes: Bibliographic, Authority, Holdings, Classification, Community Information

The current bibliographic environment
- Billions of rich descriptive records in MARC systems
- Many national formats have been harmonized with MARC 21
- Integrated library systems support MARC bibliographic, authority and holdings formats for different functions
- Wide sharing of records for 30+ years
- OCLC is a major source of records
- MARC records are being reused (sometimes converted) and repackaged
- Need to interact with descriptions in other formats/syntaxes

MARC successes
- Can carry data formulated by different cataloging rules and conventions
  - Multiple descriptive rules, different principles and models
  - Different subject thesauri
  - Multiple languages and scripts
- Cooperation in record exchange has resulted in widespread use and cost savings
- Richness of MARC records supports multifaceted retrieval
  - Coded data
  - Parsed data

Problems with MARC
- MARC 2709 syntax problems
- Limitation of available fields, subfields, indicator values, etc.
- Redundant data (fixed vs. variable fields)
- The longevity of the format complicates reuse of data tags; redundancies have built up over time
- Ability to link is limited
- Lack of explicit hierarchical levels

Efforts to streamline MARC 21
- Take advantage of XML
- Develop simpler (but compatible) alternatives
- Increasingly use MARC 21 in an XML structure
- Take advantage of freely available XML tools
- MODS and MADS
- Allow for interoperability with different XML metadata schemas
- Assemble coordinated set of tools

MARCXML
- MARCXML uses the MARC data element set in an XML syntax
- Lossless roundtrip conversions
- Simple flexible XML schema, no need to change when MARC 21 changes
- Continuity with current data and flexible transition options
- Problems with limitations in tagging persist
- http://lccn.loc.gov/2004012412/marcxml

MARC derivatives: MODS and MADS
- Attempts to deal with MARC limitations
- Eliminates some of the problems with MARC (e.g. lack of tags/subfield codes)
- More user-friendly (uses language tags)
- Repackages redundant data elements into one
- Can carry hierarchical data
- Less tied to cataloging rules
- Highly compatible with MARC but simpler, although retaining some richness
- Widely implemented, especially for digital projects
- Governed by Editorial Committee
- Example: http://lccn.loc.gov/2004012412/mods

Related XML schemas: METS
- METS: a container/information package
- Wrapper for MARCXML and MODS descriptions
- Allows for additional technical and preservation metadata
- Enables tracking of actions on the metadata itself
- Many use METS as a framework for digital libraries and their metadata
- Particularly useful for complex digital objects
- Allows for reuse of rich descriptions

Experimentation with “Linked data”
- Library of Congress Authorities & Vocabularies service: http://id.loc.gov
- Allows both human-oriented and programmatic access to LC authorities and vocabularies
- Actionable URIs associated with concepts
- First offering was Library of Congress Subject Headings, then Names, MARC code lists, Thesaurus of Graphic Materials, ISO 639-2, PREMIS vocabularies
- Advantages
  - Facilitate development and maintenance process for vocabularies
  - Expose vocabularies to wider communities
  - Experiment with Linked Data
  - Offer bulk downloads
- Example: http://id.loc.gov/authorities/sh85049843

Experimentation with Linked Data
- MADS in RDF
- MODS in RDF
- Linking vocabularies in id.loc.gov with other external vocabularies
- PREMIS OWL ontology
- Integration between ontologies and controlled vocabularies becomes possible

MARC Changes for RDA
- MARC community made many changes to accommodate RDA
- In some cases RDA was more granular than MARC, and data elements had to be examined as to whether such detail was needed
- Limitations in number of fields/subfields prevented complete crosswalking
- Need for additional experimentation to determine what needs to be accommodated
- http://www.loc.gov/marc/RDAinMARC29-9-1211.html

Challenges in adapting MARC for RDA
- RDA was changing as MARC was revised
- Not all MARC users will be using RDA
- Continuity with current data is important
- Not all RDA users will use the increased granularity: tension between simpler vs. more complex
- Impact of FRBR
- Financial constraints of too much change and scarce resources

Specific RDA changes
- RDA Content, Media, Carrier
  - Fields 336, 337, 338
  - Controlled vocabularies: codes or text
- Carrier characteristics
  - Additional values in 008
  - New subfields in 340
  - New fields for sound, video, digital

Authority changes
- Attributes of Names and Resources
- Changes to Authority format for uniform titles (works or expressions)
- Changes to Authority format for additional metadata about persons, families, organizations
- New fields for date, place, address, field of activity, occupation, gender, family information
- New fields for date, content type, language, form of work, medium of performance, key
- All elements for works/expressions also added to bibliographic

Other RDA changes
- Relationships between resources
  - Name to resource (RDA App I)
  - Resource to resource (RDA App J)
  - Name to name (RDA App K)
  - Uses MARC relators or subfield $i
- Production, publication, distribution
  - Field 264 with designation of function in indicator

URIs in MARC records
- Links to resources
  - Field 856 for link to resource or related resource
  - URIs available in numerous fields as a link to additional information, e.g. 505, 506, 583
- Links to values
  - Controlled vocabulary values may be identified by a URI
  - id.loc.gov
  - RDA vocabularies being established with URIs in Open Metadata Registry: http://rdvocab.info/

URIs in MARC records
- Do URIs need their own data element or are they self-identifying?
- Data elements where needed
  - Code lists (relators, countries, GACs, orgs)
  - RDA controlled vocabs (e.g. 336, 337, 338)
  - Fields with controlled lists (with $2)
  - Headings
- Approach (experimental)
  - Use same subfield where data is now
  - Both URI and textual data?

Results of the RDA test
- Feeling that MARC structure doesn’t allow for taking full advantage of RDA
- Not all RDA data elements have a distinct place in MARC
- RDA is element based; MARC groups elements that can’t live independently
- Concerns whether MARC can interoperate with other metadata in a semantic web world
- Limitations for showing relationships between entities and applying FRBR model

Evolving the bibliographic framework: issues to consider
- Actionable vs. descriptive data
- Parsed vs. text
- Controlled/access vs. transcribed
- Codes vs. words
- Library vs. non-library traditions
- My model vs. your model
- Stability vs. change
- Basic retrieval vs. scholar retrieval
- Cost of change

Bibliographic Framework Transition Initiative
- Rethinking bibliographic control because of technological and environmental changes
- Content and packaging of RDA suggest that a different carrier is needed to fully exploit it
- Reevaluate use of scarce resources and provide efficiencies in creating and sharing bibliographic metadata
- Analyze present and future environment
- Identify components of the bibliographic framework to support users
- Plan for an evolution to a future framework

Issues to be addressed
- Determine aspects of MARC that should be retained
- Experiment with Semantic Web and Linked Data technologies
- Foster reuse of existing rich metadata
- Allow for navigating relationships among entities
- Explore risks of action and inaction and pace of change
- Plan for migrating existing metadata into a new infrastructure

Components of a new bibliographic framework
- Based on Working Group on Bibliographic Control and RDA test
- Continue to support MARC during the transition and as long as is needed
- Broaden participation in a network of resources and be able to link patrons to all kinds of resources
- Follow an open and transparent process

Requirements
- Broad accommodation of content rules and data models
- Provide for types of data that accompany or support bibliographic data, e.g. holdings, preservation
- Accommodate textual and linked data with URIs
- Reconsider the relationship between internal storage, displays and input screens

Requirements
- Consider all sizes and types of libraries
- Continue maintenance of MARC until no longer necessary; minimize changes to only those needed for RDA
- Compatibility with existing records
- Provide transformations from MARC 21 to the new environment to enable experimentation

General approach
- Focus on the Web environment, Linked Data and RDF
- Integrate library data and other cultural heritage data on the Web
- Use of triplestores to provide more options for storing and retrieving data
- Allow the library environment to become more readily understandable by data creators and software developers

Explorations
- Develop interaction scenarios in the broader information community
- Develop use cases to scope its boundaries and interdependence with other initiatives, e.g. PREMIS, METS
- Develop ontologies for the description of resources
- Experiment collaboratively with new models
- Use existing partners for prototyping

Collaborations
- Close contact with MARC format partner institutions (national libraries)
- Review and comment from MARC advisory bodies (e.g. MARBI)
- Prototyping by networks and vendors
- Input on modeling with general resource description community

Timetable and next steps
- Provide funding through a 2-year grant
- Organize consultative groups and prototyping activities
- Develop models and scenarios
- Assemble and review ontologies
- Few real details on time frame

Community input
- Individuals and institutions can recommend members to serve on the advisory or technical committee
- Join and post thoughts to the bibliographic transition listserv (bibframe@loc.gov)
- Comments will be publicly available

Likely characteristics of post-MARC
- Web and linked data based
- High level simple core ontologies
- Modularized format that allows for extensions
- Application builders can pick from ontologies and extensions
- There should be a way to keep all elements of MARC
- MARC to post-MARC could be lossy
- Agnostic to cataloging rules
- Ability to output in various syntaxes

Conclusions
- MARC 21 has served the community well for wide sharing of bibliographic metadata
- Much effort will go into the new initiative
- There are widely differing views
- More questions than answers remain
  - How much of MARC will be retained?
  - Will the new format look like MODS, a derivative, or will it be completely new?
  - How will supporting data be accommodated?
  - How will systems change?
  - How long will it take?

Thank you!
Rebecca Guenther
rguenther52@gmail.com
http://www.meetyourdata.com
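
Supplementary illustration (not part of the original slides): the MARCXML slide notes that MARCXML carries the MARC data element set in an XML syntax and that the tagging limitations persist. The sketch below shows roughly what that looks like in practice. It assumes Python's standard xml.etree.ElementTree module and the MARCXML "slim" namespace; the sample record and the helper name datafields are made up for illustration and do not come from the presentation or from any real catalog.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record, invented for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">example-0001</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title /</subfield>
    <subfield code="c">by an example author.</subfield>
  </datafield>
  <datafield tag="336" ind1=" " ind2=" ">
    <subfield code="a">text</subfield>
    <subfield code="2">rdacontent</subfield>
  </datafield>
</record>"""

# Namespace used by the MARCXML "slim" schema.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def datafields(record_xml):
    """Return a list of (tag, ind1, ind2, [(code, value), ...]) tuples.

    Hypothetical helper: it only reads variable data fields and ignores
    the leader and control fields.
    """
    root = ET.fromstring(record_xml)
    fields = []
    for df in root.findall("marc:datafield", NS):
        subfields = [
            (sf.get("code"), sf.text or "")
            for sf in df.findall("marc:subfield", NS)
        ]
        fields.append((df.get("tag"), df.get("ind1"), df.get("ind2"), subfields))
    return fields


fields = datafields(SAMPLE)
for tag, ind1, ind2, subfields in fields:
    print(tag, repr(ind1), repr(ind2), subfields)
```

Note that the XML wrapper changes only the syntax: the fields are still identified by numeric tags, indicators and single-character subfield codes, which is one way to read the slide's point that tagging limitations persist in MARCXML.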