Emerging standards for libraries and publishers

Emerging Standards for Libraries and Publishers
Cliff Morgan, John Wiley & Sons Ltd
UKSG briefing session, 15-17 April 2002

What I’ll be covering
- Identifiers
- Metadata
- E-books

What I won’t be covering
- Graphics (e.g. JPEG, GIF, PNG, SVG)
- Character sets (ASCII, Unicode)
- Relationship models (RDF, Topic Maps/XTM)
- E-commerce (UN/EDIFACT, XML-edi, ebXML)
- XML stuff (Schemas, Xlink, XSL, XSLT, etc.)
- Usage stats standards (e.g. COUNTER, ANSI/NISO Z39.7-1995)
- Rights metadata (XrML, ODRL)

Identifiers
- ISSN
- ISBN
- SICI
- BICI
- PII
- DOI
- ISTC
- Multimedia identifiers

ISBN
- International Standard Book Number
- ISO 2108
- e.g. 0-471-92755-4
- Geog location/language - publisher/imprint - title (print format) - check character
- Has been a standard for > 30 years

New ISBN
- ISBN is being revised - 13 digits from 1/1/05
- Can double capacity by giving a 979 prefix
- Issues:
  - hexadecimal or decimal?
  - limit ISBN to print - do something else for electronic? versions? formats?
  - assign to components (e.g. chaps)?
  - should number be completely dumb?
  - metadata deposit at assignment?

ISSN
- International Standard Serial Number
- ISO 3297
- e.g. 0749-503X
- If publisher has not applied for an ISSN, any 3rd party can apply for their own data management needs
- Different media get different ISSNs, e.g. print ISSN is different from CD-ROM ISSN
- But different file formats don’t get different ISSNs, so offline is different from online, but PDF is same as HTML
- If online contains only abstracts of print full text, no new ISSN for e-version
- If use print and eISSNs, must change both if title changes
- http://www.issn.org:8080/English/pub/getting-checking

SICI
- Serial Item and Contribution Identifier
- ANSI/NISO Z39.56-1996 - reaffirmed
- e.g. issue = 0749-503X(20010115)18:1<>1.0.TX;2-X
  article = 0749-503X(20010115)18:1<1:YGPIWG>2.0.TX;2-X
  (Check digits in above examples have not been calculated.)
- Well used at issue level - bar codes
- Less used at article level

SICIs at Article Level
- Requires publication info - but publishers want to assign article IDs before publication
- Long-winded
- Unfortunate syntax for Internet transfer (<>, #) needs SGML entifying and hex encoding
- Unclear what to do with special characters in Title Code
- Not a unique ID if two untitled articles are on the same page (e.g. Letters)
- C = Contribution, not Component
- SICI allows identification of article, issue ToC, issue Index and article abstract (DPIs of 0, 1, 2, 3 respectively)
- No way of using SICI to identify any other component (such as Figure, Table, Section)
- Not surprising since it’s a canonicalisation nightmare
- http://sunsite.berkeley.edu/SICI/version2.html

BICI
- Book Item and Component Identifier
- ISO DSFTU (Draft Standard for Trial Use)
- e.g. 0387119787(1982)<174:ADTATO>2.2.TX;1-Q
- ISBN, date, location, title, component type, etc.
- Trial was Aug 2000 to Jan 2002 - not much evidence of use
- Many issues the same as for SICI, but also less business push

PII
- Publisher Item Identifier
- Proposed in 1995 by ACS, AIP, APS, IEEE and Elsevier, but never became a standard
- e.g. S0749-503X011234
- Some publishers use it as an internal ID since it doesn’t suffer from any of the SICI problems
- But no registration/maintenance agency

DOI
- Digital Object Identifier
- ANSI/NISO Z39.84-2000
- e.g. issue = 10.1002/yea.v18:1
  article = 10.1002/yea.1234
- Well established in academic journals publishing - esp. because of CrossRef
- 4.2 million DOIs deposited to date
- http://www.doi.org

Some publishing issues regarding DOIs
What are they assigned to?
- Need for matching URL, so can’t assign to anything you wouldn’t give a URL to
- Individual publishers need to decide their DOI structure
- Doesn’t have to be human-friendly but must be unique, easily generated, and matched with URL
- Application profiles for different genres
Processes
- Apply to Registration Agency (IDF, CDI, CrossRef, Enpia, LON) for Registrant Prefix
- For individual DOIs, batch-process generate DOIs and URLs from electronic metadata and send to RA for deposit
- DOIs never change (even if journal changes ownership) but matched URLs (or other locators) can

ISTC
- International Standard Textual Work Code
- ISO Committee Draft 21047 - circulated Oct 01, voting finished Jan 02: progressed to Enquiry stage
- http://www.nlc-bnc.ca/iso/tc46sc9/21047.htm
- e.g. 0A9-2002-1223F332-0 (RA + year + WorkID + check)
- A Work (= abstract creation) ID - replaces the ISWC(L)
- Creator-centric - authors may apply to ISTC Agency directly or via agents or via publisher
- Requires metadata deposit too
- Publishers therefore need to capture these numbers if they’ve been assigned to Works
- Will authors really bother with this?

A couple of non-text, non-graphic IDs you might want to know about
- ISAN
- ISWC

ISAN
- International Standard Audiovisual Number
- ISO Draft International Standard 15706
- e.g. 153C-7365-B36F-844C-N
- Can be issued to movies, trailers, TV programmes, episodes or series, ads, multimedia works if A/V component is significant
- http://www.nlc-bnc.ca/iso/tc46sc9/isan.htm
- Work has also started on a V-ISAN for Versions

ISWC
- International Standard Musical Work Code (used to be ISWC(T))
- ISO 15707
- e.g. T-034524680-1
- Identifies any musical work, including arrangements, movements, medleys, samples
- http://www.iswc.org/iswc/iswc/en/html/home.html

Metadata
- Resource discovery (Dublin Core, OAI-PMH), incl. linking (CrossRef)
- Product metadata (ONIX and ONIX for Serials)
- Preservation metadata (OAIS)
- I am not going to talk about library-specific sets such as MARC, Z39.50, AACR2, etc.

Dublin Core
- Defined Universal Bibliographic Language for Internet Navigation and Coherent Online Resource Exploration [not really!]
- ANSI/NISO Z39.85
- DC 1.1 (simple, unqualified set of 15 elements)
- Qualified set (DCQ? dcterms?) needed to do anything more than basic - not standard yet
- DC has been mandated by UK Government (“e-GMS”)
- Application Profiles will deal with defined local extensions via namespace declarations

OAI-PMH
- Open Archives Initiative Protocol for Metadata Harvesting
- Not really an archive in the sense of repository, more of a political statement and a metadata harvesting protocol
- Came out of the e-print community, but they welcome commercial publishers
- Supported by DLF and CNI
- Uses simple (unqualified) Dublin Core as its metadata
- e.g. <creator>Cliff Morgan</creator>
- Version 2 of protocol due for release June 2002
- http://www.openarchives.org

CrossRef metadata set
- CrossRef matches the metadata in a citation with the metadata in its Metadata Database (MDDB), which includes the DOI for the resource
- Participating publishers (91 of them) deposit the metadata with DOI into the MDDB
- To date, 3.7M DOIs, covering 5000+ journals
- http://www.crossref.org

New version
- Version 2 much more complicated - full schema is 113 pages long
- In addition to journals, covers books and conference proceedings, at whole title and chapter level
- Some element names are different from CrossRef 1.0

ONIX
- OnLine Information eXchange
- Latest release is 2.0
- Original focus was message format for books through the trade, but is fast becoming a universal metadata set for describing publications
- http://www.editeur.org
- ONIX being championed by a number of publishers and online retailers
- Swedish Royal Library using ONIX as an input medium

ONIX for Serials
- Provides rich cataloguing information for agents, librarians, users
- Supports alerting, despatch and library check-in
- Structured, multi-level bibliographic descriptions, including ToCs
- Descriptions for library holdings (direct to OPACs)
- Draft 2 just released this month
- Subscription Package Record provides product catalogue info about subscription packages
- Serial Title Record provides catalogue info about an individual serial
- Serial Item Record provides structured multilevel bibliographic description of serial parts

So is the CrossRef set like the ONIX for Serials set?
- No
- They both include metadata that can be used to describe journals, issues and articles
- But they don’t use the same element names
- CrossRef has mapped to ONIX but not to ONIX for Serials yet - but has said it will support it when released

OpenURL
- NISO Work Item
- Separates metadata for resource from metadata for location
- Resolver services (such as SFX, CrossRef) make the context-sensitive link
- Solves the “appropriate copy” problem, where more than one legit copy of an article may be available to a library, e.g. local holding, consortium, aggregator service, mirror site, publisher

OpenURL metadata
- OpenURL comprises BASEURL and QUERY
- BASEURL identifies the resolver; QUERY is a resource description
- e.g. (simplified):
  http://resolver.ukoln.ac.uk/genre=article&atitle=Information%20gateways:…&issn=14684527&volume=24&spage=40&aulast=Heery&aufirst=Rachel
- Genres defined as “referent-types”, such as book, chapter, journal, article, conf proc and paper, dissertation, patent, report; each has its own metadata spec
- High-level concept is the Bison-Futé model
  http://www.dlib.org/dlib/july01/vandesompel/07vandesompel.html

Preservation metadata
- OAIS (Open Archival Information System) underlies all digital preservation models
- Nothing to do with OAI
- Based on SIPs (Submission Info Packages), AIPs (Archival Info Packages) and DIPs (Dissemination Info Packages)
- The Producer wraps the stuff up in a SIP, it gets ingested into an AIP, and sent out as a DIP

Some other metadata activities
- LOM - Learning Object Metadata
- IMS - Instructional Management Systems (builds on LOM)
- PRISM - Publishing Requirements for Industry Standard Metadata
- MEG - cross-sectoral Metadata for Education Group
- SCORM - Sharable Content Object Reference Model - US DoD project, also builds on IMS/LOM

How are we supposed to cope with all these metadata sets?
- A publisher’s metadata becomes an important asset for describing product to the outside world, esp. for trading and linking
- If publishers have their publications in electronic form, the metadata will already be in the file, so it just needs extracting and mapping to whatever metadata set the publisher chooses
- Production issue: who checks the metadata?

E-books
- OEBPS - Open E-Book Publication Structure
- Three components:
  a) XML DTD for content
  b) DC-based metadata (but some noncompliant qualifier attributes)
  c) description of package’s structure, reading order, navigation
- Many OEB files are just (a)
- Version 2 being worked on, esp. M&I, and Rights

Formats
- Front runners are Adobe E-Book Reader (PDF based) and Microsoft Reader (.lit based)
- .lit limited to simple stuff, and not as robust as PDF, but can’t underestimate Microsoft
- New versions of Adobe will have built-in DOI capability

Text reflow
- Acrobat 5 introduced structured PDF
- The Holy Grail synthesis of structure and presentation
- Writes a PDF file in XML(ish)
- Asserts reading order
- Allows for reflow into different reader devices
- Works best for simple content only, but a good start

Conclusions
- There are lots of standards out there
- Some of them compete with one another
- Not all of them are formal
- They may change over time
- Publishing industry standards are not only developed by the publishing industry
- Not always easy to judge the winners
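
Worked example: check characters
The ISBN and ISSN slides mention a check character without showing how it is derived. The sketch below is an illustration only, not part of the original talk. It assumes the usual modulus-11 weighting for 10-digit ISBNs and ISSNs and the alternating 1/3 weighting for 13-digit numbers. The function names are made up for this example, and the 978 prefix used for converting an existing ISBN is an assumption not stated in the slides (they mention only 979 for added capacity).

```python
def mod11_check(body: str) -> str:
    """Check character for an ISBN-10 body (9 digits) or ISSN body (7 digits).

    Weights run from len(body) + 1 down to 2; a value of 10 is written 'X'.
    """
    weights = range(len(body) + 1, 1, -1)
    total = sum(int(digit) * weight for digit, weight in zip(body, weights))
    value = (11 - total % 11) % 11
    return "X" if value == 10 else str(value)


def isbn13_check(body: str) -> str:
    """Check digit for the first 12 digits of a 13-digit ISBN (weights 1, 3, 1, 3, ...)."""
    total = sum(int(digit) * (1 if index % 2 == 0 else 3)
                for index, digit in enumerate(body))
    return str((10 - total % 10) % 10)


def isbn10_to_isbn13(isbn10: str, prefix: str = "978") -> str:
    """Re-express a 10-digit ISBN as 13 digits (assumed 978 prefix, new check digit)."""
    digits = isbn10.replace("-", "").replace(" ", "")
    if len(digits) != 10:
        raise ValueError("expected a 10-character ISBN")
    body = prefix + digits[:9]  # drop the old check character
    return body + isbn13_check(body)


# Examples taken from the ISBN and ISSN slides
print(mod11_check("047192755"))           # check character of 0-471-92755-4
print(mod11_check("0749503"))             # check character of 0749-503X
print(isbn10_to_isbn13("0-471-92755-4"))  # 13-digit form of the same ISBN
```

Both slide examples carry the check character this sketch produces (4 for the ISBN, X for the ISSN), which is a quick way to confirm that a transcribed identifier has not been mistyped.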
