2012 University of California Larry L. Sautter Award Application

EZID: Long-term Identifiers Made Easy

Submitted by:
Joan Starr, EZID Service Manager and Manager of Strategic and Project Planning
California Digital Library (CDL)
University of California Office of the President
joan.starr@ucop.edu
wk: 510-987-0469

Project team members:
Patricia Cruse, UC Curation Center Director
John Kunze, UC Curation Center Associate Director and EZID System Architect
Greg Janée, EZID System Designer
Scott Fisher, EZID User Interface Developer
Joan Starr, EZID Service Manager

Project Highlights

EZID (easy-eye-dee) is a key tool for researchers handling data. It gives them the ability to create and manage long-term identifiers, which are critically important to ensuring the widest usability of their data now and in the future. These identifiers are alphanumeric strings that stay constant and point to the digital location of an object (such as a dataset). Citation information can also be associated with these identifiers. When a researcher is writing a scholarly paper and cites the research data, she can refer to the actual data with this identifier, pointing to it directly. In this way, the researcher can track usage, get credit for her work, share her data, and have it reused for more research.

Most University of California (UC) Libraries are now offering EZID as a service to their campus researchers. EZID has also been adopted by government data centers, university-hosted research institutes, and publishers. The service is provided on a cost-recovery basis.

Project Description

EZID (easy-eye-dee) is a service that makes it simple to obtain and manage long-term identifiers for digital content. The service can create and resolve identifiers on behalf of the user and also allows the user to enter and maintain information about the identifier (metadata). EZID has both a user interface and an application programming interface (API).

EZID currently offers services for two globally unique, persistent identifier schemes: Digital Object Identifiers (DOIs) and Archival Resource Keys (ARKs).

DOIs are identifiers originating from the publishing world and in widespread use for journal articles. DOIs become persistent when the objects and the identifier forwarding information are maintained. CDL is able to offer DOIs because we are a founding member of DataCite, an international consortium established to promote data citation and data sharing.

ARKs are identifiers originating from the library, archive and museum community. Like DOIs, they become persistent when the objects and the identifier forwarding information are maintained. CDL hosts a resolution service that resolves ARKs maintained by EZID, as well as LSIDs and LSRNs, and is the professional home of the ARK scheme’s founder, John Kunze.

Over time, EZID will maintain more identifier types, increasing its feature set and value to researchers and publishers.

EZID has over 40 current clients, including six of the University of California campuses, the University of Washington, Cornell Institute for Social and Economic Research, NASA-Earth Sciences Data Information Center, National Center for Atmospheric Research, and the American Astronomical Society. For more information about clients, please see http://n2t.net/ezid/home/community.

How EZID helps Researchers

EZID benefits researchers at every point in the data-intensive research life cycle. Early in the data collection process, it can be very useful to assign identifiers to data elements and datasets. These objects may move many times before they are cited, so stable references are important for local and distributed research teams. Equally, identifiers can be a key part of data organization plans mandated by funders because they demonstrate commitment to long-term tracking.

We often recommend the use of ARKs for these early periods and for every object that needs to be tracked. Then, when the researcher is ready to cite only a small subset of the objects, it is time to get a DOI. Citations in published papers keep working even if the data moves, provided the researcher has access to a tool like EZID to maintain the target URL. After citation, the services built on top of EZID and EZID’s membership organization, DataCite, provide the researcher with citation tracking information.

How EZID makes UC competitive internationally

The DOIs that EZID clients are able to create and manage are the internationally recognized standard identification mark for scholarly communication. In addition, CDL’s collaboration in DataCite, as well as in other national and international data curation and citation organizations such as DataONE (http://www.dataone.org/), ensures that EZID supports best practices for data management across research disciplines.

In these national and international contexts, EZID stands out as a unique tool for identifier creation and management. It is the only such tool to offer multiple identifier schemes, and soon it will be the only tool to offer support for the DataCite metadata scheme in the user interface.

From an administrative and business perspective, EZID is also breaking new ground in seeking cost recovery. Because of its elegant solution to the identifier creation and management problem, EZID has been licensed for reselling by another DataCite member.

EZID: Open for Business

EZID is in production operation today with clients across the UC system, across the country and in Europe. With its RESTful API, it supports the large-scale operations that sensor-based research can demand, while also providing a friendly user interface for one-by-one requests for smaller identifier needs and information-seeking.
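
The RESTful API and the idea of maintaining a target URL can be pictured with a small sketch. The Python below builds (but does not send) two HTTP requests: one to create an identifier that points at a target URL, and one to update that target after the data moves. The host name, the `/id/` path, the `_target` field name, the plain-text `name: value` body format, and the sample identifier are all assumptions made for illustration and are not taken from this application; the EZID API documentation is the authority on the actual interface.

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical host; substitute the real service address.
BASE_URL = "https://ezid.example.org"


def to_anvl(metadata):
    """Serialize metadata as 'name: value' lines (assumed body format).

    Percent signs and line breaks are percent-encoded so that each
    metadata element stays on a single line.
    """
    def escape(text):
        # '%' must be escaped first so later escapes are not re-escaped.
        return (text.replace("%", "%25")
                    .replace("\n", "%0A")
                    .replace("\r", "%0D"))

    lines = []
    for name, value in metadata.items():
        safe_name = escape(name).replace(":", "%3A")
        lines.append(f"{safe_name}: {escape(value)}")
    return "\n".join(lines).encode("utf-8")


def build_request(method, identifier, metadata):
    """Build an HTTP request for one identifier without sending it."""
    url = f"{BASE_URL}/id/{quote(identifier, safe='/:')}"
    return Request(
        url,
        data=to_anvl(metadata),
        method=method,
        headers={"Content-Type": "text/plain; charset=UTF-8"},
    )


def create_request(identifier, target_url):
    """Request that would create an identifier (assumed: PUT)."""
    return build_request("PUT", identifier, {"_target": target_url})


def update_target_request(identifier, new_target_url):
    """Request that would repoint an identifier (assumed: POST)."""
    return build_request("POST", identifier, {"_target": new_target_url})


if __name__ == "__main__":
    # Placeholder identifier and URLs, for illustration only.
    req = create_request("ark:/99999/fk4example", "http://example.org/data/v1")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

The point of the sketch is that the identifier string stays fixed while only the `_target` value changes, which is why a citation keeps resolving after the data moves. A real client would also need to authenticate and to check the service's response.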

As will be described in the Technology Description below, EZID has been created using an open-source model. CDL has hosted the ARK identifier specification for many years, and if a campus or other organization wishes to operate an ARK server on its own, it is free to do so. EZID recovers costs for providing the identifier service on behalf of clients, and this includes the DOI component.

EZID: Measuring Success

The cost-recovery mandate is a built-in success criterion for any production service that works within this framework. In order to establish a pricing strategy, the service manager must develop a recharge proposal, including a budget and a three-year cost recovery plan based on the pricing scheme and expected revenue targets. This is submitted to the UC Office of the President (UCOP) Business Office for approval, and, if accepted, it becomes the basis of a pricing schedule that can be shared with prospective clients. The Business Office therefore monitors revenue targets.

EZID is hitting its revenue targets for the first years of operations. The pricing schedule is weighted in favor of UC clients, and so, in order to achieve full cost recovery, CDL will be implementing a marketing campaign to accelerate non-UC adoption.

Another key measure of EZID’s success is its reliability. In a sense, EZID is composed of three systems: the EZID service itself and the two identifier resolver services (ARKs and DOIs). Since inception, the EZID service and the ARK resolver system, both hosted at CDL, have had 99.9% uptime. The DOI resolver service is hosted at CNRI (http://www.cnri.reston.va.us/) on behalf of the DataCite Managing Agency, the German National Library of Science and Technology (http://www.tib-hannover.de/en/thetib/doi-registration-agency/). The DOI service takes advantage of the global Handle system, which includes a high degree of redundancy.
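
To put the 99.9% uptime figure in concrete terms, the short Python calculation below converts an uptime percentage into the downtime it allows over a year. It is a back-of-the-envelope illustration only: the function name is hypothetical, a 365-day year is assumed, and the result is the budget implied by the percentage, not a measured outage total for EZID.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a 365-day year


def downtime_hours_per_year(uptime_fraction):
    """Hours of downtime per year implied by an uptime fraction."""
    if not 0.0 <= uptime_fraction <= 1.0:
        raise ValueError("uptime_fraction must be between 0 and 1")
    return (1.0 - uptime_fraction) * HOURS_PER_YEAR


if __name__ == "__main__":
    hours = downtime_hours_per_year(0.999)
    print(f"99.9% uptime allows about {hours:.2f} hours of downtime per year")
```

Under these assumptions, 99.9% uptime corresponds to a little under nine hours of unavailability in a year.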

To increase the reliability of EZID and the ARK identifier system, the CDL is pursuing replication relationships with a number of sites in the United States and the United Kingdom.

Technology Description

EZID’s architecture is based on the classic client-server model, which permits people located anywhere within the UC system or the world to create, assign, and manage their identifiers. Many services need long-term identifiers but have only a few commercial services to turn to for their management. Often they also have no centralized way to ensure that their long-term identifiers are both short and globally unique. Departments save substantially because they need neither pay for nor install and operate their own identifier service.

Because EZID supports multiple kinds of identifiers with one technology, it doesn’t require expensive conversion of legacy identifiers to one particular kind of identifier. EZID is open-source and relies only on an extremely simple, scalable database (BerkeleyDB) that runs under Linux and Solaris. It has completely documented operational procedures and a RESTful API for automated operation at scale.

Timeframe

EZID was launched in June 2010 with an application programming interface (API), and a user interface (UI) was added in September 2010. This first version of EZID was an introductory pilot offered without cost exclusively to UC users. In April 2011, the CDL received approval for the business plan for EZID, and we were able to implement the cost-recovery program by enrolling system users from external institutions as well as UC campuses. In April 2012, EZID rolled out a new UI. Please visit: http://n2t.net/ezid/

Objective Customer Satisfaction Data

Continuous customer growth
100% renewal rate on paid subscriptions