ORCID Technical Report
May 18, 2011

Development Approach

Alpha
• Completed Spring 2010
• Self-claim oriented
• Limited light integration with a few participant services
• Currently in use for demonstration

Phase 1
• In progress 2011
• Will be developed by ORCID, Inc.
• Will provide core for future production service
• Will focus on currently active researchers
• Development headed by Geoff Bilder

Phase 2
• Development 2012+
• Will address assertions by wide group of third parties
• Will extend capabilities for alternate roles and other types of contributions
• Will provide mechanisms for automatic de-duplication of third party donated records

ORCID Alpha Available for Demo
…This Alpha provides a test environment for illustrating use cases and gathering feedback from the community for services ORCID may provide. The Alpha does not represent ORCID's live system…

ORCID Alpha Features
Institutional Affiliations
Links to Published Work via DOI

ORCID Alpha Features
Profile Management and Privacy Controls
Crossref Lookup for Publications

Phase 1 Scope
ORCID will build a central registry of unique identifiers for researchers and scholars with the following scope:
• ORCID will focus on currently active researchers.
• Data will come from individuals and universities.
• ORCID will be a hybrid system of self- and organization-asserted identity.
• Data collected will be those needed for disambiguation - extra data for optionally creating full CV-like profiles might be added in the future.
• The system will provide basic matching and disambiguation of names.
• The ORCID system will, from the start, enable 3rd parties to build value added services using ORCID infrastructure.
• ORCID services will be developed based on the needs of the ORCID community.

Phase 1 Functionality
The development of the ORCID “alpha” and subsequent discussions with stakeholders have identified a number of additional changes that would need to be made to the system in order to meet common requirements. These include:
• Incorporate OAuth2 & profile exchange
• Privacy mechanism to support three-level control (private/protected/public) at field level (needed to support profile exchange)
• Authentication/authorization mechanism to support “delegated” management of profiles (e.g. a researcher can grant permission to a departmental secretary or librarian to edit a profile on their behalf)
• Include production-level publication lookup feature from CrossRef
• Expose minimal provenance information for metadata records

Phase 2: Issues of Assertion
• Consider disambiguation as a collection of “claims” by different “parties”
• Evaluating duplication, contradiction, and uniqueness among these claims can indicate the credibility of a record

Ongoing Research: Profile Exchange Group
• The work of the Profile Exchange sub-group:
  • Recommend a technological approach to reliably and efficiently merge researcher profiles from different databases
• Approach
  • Create a Gold Standard of test data: a set of records containing a number of known True Positive matches
  • For the purpose of creating the test set, a True Positive is determined by matching the md5 hash of lowercase email addresses, where the email address contains a partial match with the name of the author (this avoids the known problem of generic email addresses such as “lab@”, “admin@”, “info@”, etc.). We are not at all proposing that email/email hash would be used in a final system.
  • The test data will be loaded into a system from which contributors with matching technology can pull down the records and apply their methods.
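
The True Positive rule for the test set can be sketched roughly as follows. This is a minimal illustration only, not the sub-group's implementation: the record fields (id, name, email), the function names, the minimum token length, and the reading of "partial match" as "a name token appears in the part of the email before the @" are all assumptions, since the slides do not specify them. As noted above, the email hash is used only to build the Gold Standard, not proposed for a final system.

```python
import hashlib
import re
from collections import defaultdict

# Generic mailbox names called out on the slide; the set is illustrative.
GENERIC_LOCAL_PARTS = {"lab", "admin", "info"}


def email_hash(email):
    """MD5 hex digest of the lowercased, whitespace-stripped email address."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def email_matches_name(email, name, min_len=3):
    """Assumed 'partial match': some name token of at least min_len letters
    occurs in the local part (before the @) of the email address."""
    local = email.strip().lower().split("@", 1)[0]
    if local in GENERIC_LOCAL_PARTS:
        return False
    tokens = [t for t in re.split(r"[^a-z]+", name.lower()) if len(t) >= min_len]
    return any(t in local for t in tokens)


def true_positive_groups(records):
    """Group record ids by email hash, keeping only records whose email
    partially matches the author name, and only groups with 2+ records."""
    groups = defaultdict(list)
    for rec in records:
        if email_matches_name(rec["email"], rec["name"]):
            groups[email_hash(rec["email"])].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical records from two sources (ids and addresses are made up).
records = [
    {"id": "scopus:1", "name": "Jane Smith", "email": "J.Smith@example.edu"},
    {"id": "ieee:7", "name": "J. Smith", "email": "j.smith@example.edu"},
    {"id": "ieee:8", "name": "Alan Jones", "email": "lab@example.edu"},
    {"id": "scopus:2", "name": "A. Jones", "email": "lab@example.edu"},
]
print(true_positive_groups(records))
```

In this sketch the two Smith records share an email hash and the address contains the surname, so they form a True Positive pair; the two records sharing the generic "lab@" address are excluded.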

Profile Exchange Group Progress
• Mike Taylor is leading the R&D work in this space
  • Research Specialist at Elsevier Labs
  • Was responsible for ORCID Alpha Scopus integration
• Workspace created for Profile Exchange R&D
  • Elsevier has provided a server to host data
  • Access Innovation / Data Harmony donated an instance of their XIS XML document repository
• Datasets
  • Focus on High Energy Physics and Computer Science
  • Data contributed from Elsevier (Scopus) and IEEE
• DTD
  • Published DTD to ORCID working group wiki
  • Normalizing data onto common DTD to ensure like-to-like comparisons
• Current Activities
  • Research on datasets to develop clustering algorithms and relationships