EZID 2012 University of California Larry L. Sautter Award Application

2012 University of California Larry L. Sautter Award Application
EZID
Long-term Identifiers Made Easy
Submitted by:
Joan Starr, EZID Service Manager and Manager of Strategic and Project Planning
California Digital Library (CDL)
University of California Office of the President
joan.starr@ucop.edu
wk: 510-987-0469
Project team members:
Patricia Cruse, UC Curation Center Director
John Kunze, UC Curation Center Associate Director and EZID System Architect
Greg Janée, EZID System Designer
Scott Fisher, EZID User Interface Developer
Joan Starr, EZID Service Manager
Project Highlights
EZID (easy-eye-dee) is a key tool for researchers handling data. It gives them the ability to create and
manage long-term identifiers, which are critically important to ensuring the widest usability of their data
now and in the future. These identifiers are alphanumeric strings that stay constant and point to the
digital location of an object (such as a dataset). Citation information can also be associated with these
identifiers. When a researcher is writing a scholarly paper and cites the research data, she can refer to
the actual data with this identifier, pointing to it directly. In this way, the researcher can track usage, get
credit for her work, share her data, and have it reused for more research.
Most University of California (UC) Libraries are now offering EZID as a service to their campus
researchers. EZID has also been adopted by government data centers, university-hosted research
institutes, and publishers. The service is provided on a cost-recovery basis.
Project Description
EZID (easy-eye-dee) is a service that makes it simple to obtain and manage long-term identifiers for
digital content. The service can create and resolve identifiers on behalf of the user and also allow the
user to enter and maintain information about the identifier (metadata). EZID has both a user interface
and an application programming interface (API).
EZID currently offers services for two globally unique, persistent identifier schemes: Digital Object
Identifiers (DOIs) and Archival Resource Keys (ARKs). DOIs are identifiers originating from the publishing
world and in widespread use for journal articles. DOIs become persistent when the objects and identifier
forwarding information are maintained. CDL is able to offer DOIs because we are a founding member of
DataCite, an international consortium established to promote data citation and data sharing.
ARKs are identifiers originating from the library, archive and museum community. Like DOIs, they
become persistent when the objects and identifier forwarding information are maintained. CDL hosts a
resolution service that resolves ARKs maintained by EZID, as well as LSIDs and LSRNs, and is the
professional home of the ARK scheme’s founder, John Kunze. Over time, EZID will maintain more
identifier types, increasing its feature set and value to researchers and publishers.
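As a rough illustration of how the two schemes resolve, the sketch below turns an identifier string into a resolver URL. The resolver hosts (n2t.net for ARKs, doi.org for DOIs), the function name resolver_url, and the sample identifiers are assumptions made for this example and are not taken from the application text.

```python
# Assumed resolver hosts for illustration only.
RESOLVERS = {
    "ark": "http://n2t.net/",
    "doi": "https://doi.org/",
}


def resolver_url(identifier: str) -> str:
    """Return a resolver URL for an identifier written as 'ark:/...' or 'doi:...'."""
    scheme, sep, rest = identifier.partition(":")
    scheme = scheme.lower()
    if not sep or scheme not in RESOLVERS:
        raise ValueError(f"unsupported identifier scheme: {identifier!r}")
    if scheme == "doi":
        # The DOI proxy takes the bare DOI name without the 'doi:' label.
        return RESOLVERS["doi"] + rest
    # ARKs keep their 'ark:' label in the resolver URL.
    return RESOLVERS["ark"] + "ark:" + rest


# Made-up sample identifiers.
ark_url = resolver_url("ark:/99999/fk4example")
doi_url = resolver_url("doi:10.5072/FK2EXAMPLE")
```

Whichever scheme is used, the identifier string itself never changes; only the record held by the resolver does.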
EZID has over 40 current clients, including six of the University of California campuses, the University of
Washington, Cornell Institute for Social and Economic Research, NASA-Earth Sciences Data Information
Center, National Center for Atmospheric Research, and the American Astronomical Society. For more
information about clients, please see http://n2t.net/ezid/home/community.
How EZID helps Researchers
EZID benefits researchers at every point in the data-intensive research life cycle. Early in the data
collection process, it can be very useful to assign identifiers to data elements and datasets. These
objects may move many times before they are cited, so stable references are important for local and
distributed research teams. Equally, identifiers can be a key part of data organization plans mandated by
funders because they demonstrate commitment to long-term tracking. We often recommend the use of
ARKs for these early periods and for every object that needs to be tracked. Then, when the researcher is
ready to cite only a small subset of the objects, it is time to get a DOI. Citations in published papers
keep working even if the data moves, provided the researcher has access to a tool like EZID to maintain
the target URL.
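The idea that a citation keeps working while the data moves can be sketched with a small in-memory registry. This is only a model of the indirection, not EZID's implementation; the class names, method names, identifiers, and URLs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class IdentifierRecord:
    identifier: str          # constant string that appears in citations
    target: str              # current location of the object
    metadata: dict = field(default_factory=dict)


class IdentifierRegistry:
    """Hypothetical in-memory stand-in for an identifier service."""

    def __init__(self) -> None:
        self._records: dict[str, IdentifierRecord] = {}

    def create(self, identifier: str, target: str, **metadata: str) -> IdentifierRecord:
        if identifier in self._records:
            raise ValueError(f"identifier already exists: {identifier}")
        record = IdentifierRecord(identifier, target, dict(metadata))
        self._records[identifier] = record
        return record

    def update_target(self, identifier: str, new_target: str) -> None:
        # Only the target changes; the identifier itself stays the same.
        self._records[identifier].target = new_target

    def resolve(self, identifier: str) -> str:
        return self._records[identifier].target


registry = IdentifierRegistry()
cited_id = "doi:10.5072/FK2EXAMPLE"  # made-up identifier
registry.create(cited_id, "http://lab.example.org/data/set1", creator="A. Researcher")
before_move = registry.resolve(cited_id)

# The dataset moves to a new home; the published citation does not change.
registry.update_target(cited_id, "http://archive.example.org/datasets/set1")
after_move = registry.resolve(cited_id)
```

A paper that cites cited_id resolves to the new location after the move, without any change to the paper.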
After citation, the services built on top of EZID and EZID’s membership organization, DataCite, provide
the researcher with citation tracking information.
How EZID makes UC competitive internationally
The DOIs that EZID clients are able to create and manage are the internationally recognized standard
identification mark for scholarly communication. In addition, CDL’s collaboration in DataCite as well as
other national and international data curation and citation organizations such as DataONE
(http://www.dataone.org/) ensures that EZID supports best practices for data management across
research disciplines.
In these national and international contexts, EZID stands out as a unique tool for identifier creation and
management. It is the only such tool to offer multiple identifier schemes, and soon it will be the only
tool to offer support for the DataCite metadata scheme in the user interface. From an administrative
and business perspective, EZID is also breaking new ground in seeking cost recovery. Because of its
elegant solution to the identifier creation and management problem, EZID has been licensed for reselling by another DataCite member.
EZID: Open for Business
EZID is in production operation today with clients across the UC system, across the country and in
Europe. With its RESTful API, it supports the large-scale operations that sensor-based research can
demand, while also providing a friendly user interface for one-by-one requests for smaller identifier
needs and information-seeking.
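To give a feel for what scripted, large-scale use of a RESTful identifier API might look like, the sketch below prepares a batch of update requests without sending them. The host name, the /id/ path, the "target" field name, and the line-oriented "name: value" body are assumptions for illustration; the actual contract is defined in the EZID API documentation.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "https://ezid.example.org"  # hypothetical host, not the real service


def build_update_request(identifier: str, fields: dict) -> Request:
    """Prepare (but do not send) a PUT request carrying 'name: value' lines."""
    body = "".join(f"{name}: {value}\n" for name, value in fields.items())
    url = f"{BASE_URL}/id/{quote(identifier, safe=':/')}"
    return Request(
        url,
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; charset=UTF-8"},
    )


# One request per sensor reading, as a script might prepare them in bulk.
batch = [
    build_update_request(
        f"ark:/99999/fk4sensor{n}",  # made-up identifiers
        {"target": f"http://data.example.org/readings/{n}"},
    )
    for n in range(3)
]
```

Each prepared request could then be passed to urllib.request.urlopen, with authentication added as the service requires.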
As will be described in the Technology Description below, EZID has been created using an open-source
model. CDL has hosted the ARK identifier specification for many years, and if a campus or other
organization wishes to operate an ARK server on their own, they are free to do so. EZID recovers costs
for providing the identifier service on behalf of clients, and this includes the DOI component.
EZID: Measuring Success
The cost-recovery mandate is a built-in success criterion for any production service that works within
this framework. In order to establish a pricing strategy, the service manager must develop a recharge
proposal, including a budget and three-year cost recovery plan based on pricing scheme and expected
revenue targets. This is submitted to the UC Office of the President (UCOP) Business Office for approval,
and, if accepted, this then becomes the basis of a pricing schedule that can be shared with prospective
clients.
The Business Office monitors these revenue targets, and EZID is meeting them for its first years of
operation. The pricing schedule is weighted in favor of UC clients, so, in order to achieve full cost
recovery, CDL will be implementing a marketing campaign to accelerate non-UC adoption.
Another key measure of EZID’s success is its impressive reliability. In a sense, EZID is composed of three
systems: the EZID service itself, and the two identifier resolver services (ARKs and DOIs). Since inception,
the EZID service and the ARK resolver system, both hosted at CDL, have had 99.9% uptime. The DOI
resolver service is hosted at CNRI (http://www.cnri.reston.va.us/) on behalf of the DataCite Managing
Agency, the German National Library of Science and Technology
(http://www.tib-hannover.de/en/thetib/doi-registration-agency/). The DOI service takes advantage of the global Handle system, which
includes a high degree of redundancy. To increase the reliability of EZID and the ARK identifier system,
the CDL is pursuing replication relationships with a number of sites in the United States and the United
Kingdom.
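For a sense of scale, the 99.9% uptime figure can be converted into a downtime allowance. The application reports uptime since inception rather than per year, so the one-year period used below is an assumption chosen for illustration, and the function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours in a non-leap year


def allowed_downtime_hours(uptime_fraction: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime consistent with a given uptime fraction over a period."""
    if not 0.0 <= uptime_fraction <= 1.0:
        raise ValueError("uptime_fraction must be between 0 and 1")
    return (1.0 - uptime_fraction) * period_hours


yearly_downtime = allowed_downtime_hours(0.999)
print(f"{yearly_downtime:.2f} hours per year")  # prints "8.76 hours per year"
```

At 99.9% uptime, a service could therefore be unavailable for just under nine hours over a full year.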
Technology Description
EZID’s architecture is based on the classic client-server model, which permits people located anywhere
within the UC system or the world to create, assign, and manage their identifiers. Many services need
long-term identifiers but have only a few commercial services to turn to for their management. And
often they have no centralized way to ensure that their long-term identifiers are both short and globally
unique. Departments realize large savings because they need neither pay for nor install and operate
their own identifier service.
Because EZID supports multiple kinds of identifiers with one technology, it doesn’t require expensive
conversion of legacy identifiers to one particular kind of identifier. EZID is open-source and relies only
on an extremely simple, scalable database (BerkeleyDB) that runs under Linux and Solaris. It has
completely documented operational procedures and a RESTful API for automated operation at scale.
Timeframe
EZID was launched in June, 2010, with an application programming interface (API), and then a user
interface (UI) was added in September, 2010. This first version of EZID was an introductory pilot offered
without cost exclusively to UC users.
In April 2011, the CDL received approval for the business plan for EZID and we were able to implement
the cost-recovery program by enrolling system users from external institutions as well as UC campuses.
In April 2012, EZID rolled out a new UI. Please visit: http://n2t.net/ezid/
Objective Customer Satisfaction Data


Continuous customer growth
100% renewal rate on paid subscriptions