Technology, workflow, and protocols in collaboratively edited digital editions
Juan Garcés
British Library
eIS 20 June 2007

Overview
• Technology
  – XML
• Workflow
  – quality control
  – quality improvement
• Protocols
  – author attribution
  – identification and retrieval
• What is ‘text’?

Technology

XML
• Text Encoding Initiative
  – open standard and guidelines
  – de facto standard for Humanities texts
  – crucial: consistency (ODD), separation of critical perspective (?)
• challenge: OHCO data model only allows one hierarchy
• encoding disagreement
• texts are more complex

Desideratum
• simple editing environment that allows:
  – encoding of heterogeneous aspects of the text
  – multiple instances of the same ‘layer’ (disagreement)
  – analysis of interrelation between instances and layers

Workflow

Quality control: peer review/refereeing
• uphold standards of academic disciplines
• stricter application since the middle of the twentieth century
• anonymity (seldom ‘double-masked’) and independence
• criticisms:
  – slow process (sometimes iterative process)
  – susceptible to control by elites and to personal jealousy
  – lacks accountability
  – may be biased and inconsistent
  – failure to catch all fundamental errors
  – fraud

Quality control: wikipedia model
• mass-publication tool converted into mass-authoring tool
• everyone can edit contents
• mistakes are eradicated by community
• advantages:
  – timeliness
  – impressive workforce
  – democracy
• problems:
  – susceptible to spam and vandalism
  – always a work in progress
  – downplays individual contribution
  – deters participation by scholars

Quality control: hybrids
• alternatives to traditional peer review:
  – open peer review (reviewers’ names made known)
  – parallel open peer review
  – voluntary peer review (publication first)
  – extended peer review (beyond publication date)
• true hybrids:
  – content-appropriate marriage of community-oriented, collaborative editing and scholarly editorial process

Quality improvement: sequential print publication
[Diagram. Labels: Manuscript/surrogate; Editor 1, Edition 1; Editor 2, Edition 2; Editor 3, Edition 3]

Quality improvement: simultaneous digital publication
[Diagram. Labels: Manuscript/surrogate; Editor 1, Editor 2, Editor 3; improved Edition (three times)]

Protocols

Author attribution
• social, legal, and technological genealogy
  – social: 18th c. introduced a new concept of individualised authorship based on the idea of a creative genius working alone: the “privileged moment of individualization in the history of ideas, knowledge, literature, philosophy, and the sciences” (Foucault)
  – legal: “1710 Copyright Act”, or “Act for the Encouragement of Learning and the Securing the Property of Copies of Books to the Rightful Owners Thereof”
  – technological: coincides with the perfection of the movable-type printing press
• essential for evaluating professional output of Humanists (grant application, tenure, etc.)
• solutions for collaborative ‘authoring’:
  – hierarchy of authors (lead, assistant, etc.; pre-assigned?)
  – editing profile (contribution broken down into modular or granular input; how to quantify quality?)
  – peer assessment
• any solution needs to be accepted in professional evaluation scenarios!
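
The ‘editing profile’ option above (contribution broken down into granular input) can be sketched roughly as follows. This is only an illustration, not a proposal from the slides: the `Contribution` record, the contribution kinds, and the `units` size measure are all hypothetical, and the sketch counts quantity only, leaving the open question of how to quantify quality untouched.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Contribution:
    editor: str   # who made the change
    kind: str     # hypothetical category, e.g. "transcription" or "markup"
    units: int    # hypothetical size measure, e.g. lines touched


def editing_profiles(contributions):
    """Sum contributed units per editor and per kind of input."""
    profiles = defaultdict(lambda: defaultdict(int))
    for c in contributions:
        profiles[c.editor][c.kind] += c.units
    # Convert to plain dicts so the result is easy to inspect and compare.
    return {editor: dict(kinds) for editor, kinds in profiles.items()}


# Invented sample log, for illustration only.
log = [
    Contribution("Editor 1", "transcription", 120),
    Contribution("Editor 2", "markup", 40),
    Contribution("Editor 1", "markup", 15),
    Contribution("Editor 2", "markup", 10),
]
profiles = editing_profiles(log)
```

A per-editor breakdown of this kind could sit alongside a hierarchy of authors or peer assessment; it does not replace either.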
The Canonical Text Services (CTS) Protocol
• developed by Neel Smith in conjunction with the Center for Hellenic Studies (Washington, DC)
• defines a network service for identifying and working with texts
• permanence and citability of scholarly published works
  – they are “works possessing an explicitly identified edition and explicitly identified citation scheme, that can be irrevocably and identically replicated”
• digital library distributed objects accessible via a suite of network services (simple identification and retrieval)

The Canonical Text Services (CTS) Protocol
• hierarchical TextInventory (following FRBR, includes identification of how to validate a document):
  – TextGroup+ (author, collection)
  – Work+ (notional)
  – Edition/Translation* (specific versions)
  – Exemplar* (specific physical copies)
• hierarchical model for citation of sections of a work (recursively nesting <citation>, mapping XPath expression)
• requests
  – requests expressed as URL parameters
  – replies formatted as well-formed XML
  – requests: GetCapabilities, GetWorks, GetValidReff, GetDocumentMetadata, GetPassage, DownloadText

Desiderata
• impermanence (time stamps, editions)
• new entities (data repository vs. VRE scenario)
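
The CTS request style described above (requests expressed as URL parameters, replies as well-formed XML) can be sketched as follows. Only the six request names come from the slides. The endpoint `http://example.org/cts`, the parameter names `request` and `urn`, the identifier value, and the reply element names are assumptions made for illustration, not taken from the CTS specification.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
import xml.etree.ElementTree as ET

# Request names as listed on the slide.
CTS_REQUESTS = {
    "GetCapabilities", "GetWorks", "GetValidReff",
    "GetDocumentMetadata", "GetPassage", "DownloadText",
}

# Hypothetical endpoint; not a real service.
BASE_URL = "http://example.org/cts"


def build_request(name, **params):
    """Return a URL that expresses a CTS-style request as URL parameters."""
    if name not in CTS_REQUESTS:
        raise ValueError(f"unknown request: {name}")
    # The 'request' parameter name is an assumption for this sketch.
    query = {"request": name, **params}
    return f"{BASE_URL}?{urlencode(query)}"


def passage_text(reply_xml):
    """Extract passage text from a well-formed XML reply.

    The <reply>/<passage> element names are invented for this sketch.
    """
    root = ET.fromstring(reply_xml)
    node = root.find("passage")
    return "" if node is None or node.text is None else node.text.strip()


# Hypothetical identifier: text group, work, edition, then a citation.
url = build_request("GetPassage", urn="example:group.work.edition:1.1")
sample_reply = "<reply><passage> sample text </passage></reply>"
text = passage_text(sample_reply)
```

No network call is made here; a real client would fetch `url` and pass the response body to a parser written against the service's actual reply schema.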