Creating a Data Interchange Standard for Researchers, Research, and Research Resources: VIVO-ISF Dean B. Krafft Brian Lowe Coalition for Networked Information 10 December 2013 What is VIVO? • Software: An open-source semantic-web-based researcher and research discovery tool • Data: Institution-wide, publicly-visible information about research and researchers • Standards: A standard ontology (VIVO data) that interconnects researchers, communities, and campuses using Linked Open Data • Community: An open community with strong national and international participation VIVO Normalizes Complex Inputs NIH RePorter HR data Faculty Reporting Researcher. gov VP Research Univ. Communic ations Grants Tech transfer People Selfediting Grad School Research Facilities & Services Center/ Dept/ Program websites Data Publications other databases HPC Courses Other campuses Google Scholar Cross Ref Pubmed arXiv VIVO connects scientists and scholars with and through their research and scholarship SKE Knowledge Environment http://ske.las.ac.cn/ Customization The VIVO Community is now over 100 institutions worldwide Why is VIVO important? • It is the only standard way to exchange information about research and researchers across diverse institutions • It provides authoritative data from institutional databases of record as Linked Open Data • Structured VIVO data supports search, analysis and visualization across institutions and consortia • It is highly flexible and extensible to cover research resources, facilities, datasets, and more An HTTP request can return HTML or data Value for institutions and consortia • Common data substrate – Public, granular and direct – Discovery via external and internal search engines – Available for reuse at many levels • Distributed curation – – – – E.g., affiliations beyond what HR system tracks Data coordination across functional silos Feeding changes back to systems of record Direct linking across campuses • Data that is visible gets fixed Example: U.S. Dept. of Agriculture • Multiple agencies including Agricultural Research Service and U.S. Forest Service • VIVO portal for 45,000 intramural researchers • Goal to link to Land Grant universities and international agricultural research centers • Using VIVO as an integration tool to send data for federal STAR METRICS/SciENCV projects • RDF exposed via a SPARQL endpoint constitutes compliance VIVO Exploration and Analytics • Since VIVO is structured data, it can be navigated, analyzed, and visualized uniformly within or across institutions • VIVO can visualize the strengths of networks within and across institutions • You can create dashboards to help understand academic outputs and collaborations • VIVO can map research engagements and impact Providing the Context for Research Data • Context is critical to finding, understanding, and reusing research data • Contexts include: – Narrative publications – The researcher, research resources, grants, etc. – Dataset registries – Structured Knowledge Environments – The web of Linked Open Data VIVO Dataset Registries • VIVO/ANDS consortium in Australia – Link research data with researcher profiles and publications – Harvest to national registry • Datastar data registry tool – Add-on to VIVO or independent companion – Complement to other library data-related services – Institute for Museum and Library Services (IMLS) grant Melbourne Central Research Data Registry What is VIVO Today? • An open community hosted by the DuraSpace 501(c)3 with strong national and international participation, for which we are currently hiring a full-time VIVO Project Director • An open suite of software tools • A growing body of interoperable data • An ontology (VIVO-ISF) with a communitydriven process for extension VIVO and the Integrated Semantic Framework What is the Integrated Semantic Framework? • A semantic infrastructure to represent people based on all the products of their research and activities – To support both networking and reporting • A partnership between VIVO, eagle-i, and ShareCenter • A Clinical and Translational Information Exchange Project (CTSAConnect) – 18 Months (February 2012 – August 2013) – Funded by NIH NCATS via Booz Allen Hamilton CTSAconnect Team OHSU: Melissa Haendel, Carlo Torniai, Nicole Vasilevsky, Shahim Essaid, Eric Orwoll Harvard University: Daniela Bourges-Waldegg Sophia Cheng Cornell University: Jon Corson-Rikert, Dean Krafft, Brian Lowe Share Center: Chris Kelleher, Will Corbett, Ranjit Das, Ben Sharma University of Florida: Mike Conlon, Chris Barnes, Nicholas Rejack Stony Brook University: Moises Eisenberg, Erich Bremer, Janos Hajagos University at Buffalo: Barry Smith, Dagobert Soergel People and Resources genes affiliation anatomy roles techniques training publications protocols grants credentials manufacturer Connecting researchers, resources, and clinical activities Beyond Static CVs • Distributed data • Research and scholarship in context • Context aids in disambiguation • Contributor roles • Outputs and outcomes beyond publications Ontologies for Linked Data • First level text – Second level • Third level – Fourth level » Fifth Level Linked Data Vocabularies FOAF (people, organizations, groups) VCard (contact information) BIBO (publications) SKOS (terminologies) Open Biomedical Ontologies OBI (Ontology of Biomedical Investigations) RO (Relationship Ontology) ERO (eagle-i Research Resource Ontology) IAO (Information Artifact Ontology) Basic Formal Ontology Occurrent Process Continuant Role Spatial Region Site Szabolcs Toth http://www.flickr.com/photos/necccc/5726970855/ Relationships Person Position Org. Person Authorship Article Aggregate Data over Time Person Position time interval Org. Aggregate Data over Time Position 1 Person Org. 1 time Interval 1 Position 2 time Interval 2 Org. 2 Aggregate Data over Time Person VCard time interval Name Aggregate Data over Time VCard 1 Person Old Name time Interval 1 VCard 2 time Interval 2 New Name Aggregate Data over Time Person VCard time interval Authorship Beyond Publication Bylines • What are people doing? • Roles in projects, activities • Other kinds of scholarly contribution • Datasets, resources Person Role Project Roles and Outputs Project Person Role document /resource / etc. Application Examples: Search Application Examples: Search Scripps WashU VIVO VIVO UF VIVO IU VIVO Ponce eagle-I Research resources VIVO Harvard Profiles RDF Other VIVOs Cornell Ithaca VIVO Solr search index Weill Cornell VIVO vivo search .org Linked Open Data Alternate Solr index Iowa Loki RDF Digital Vita RDF Application Examples: Search Use Cases • Find publications supported by grants • Discover and re-use expensive equipment and resources • Demonstrate importance of facilities services to research results • Discover people with access to resources or with expertise in techniques Linking People through Terminologies http://cstaconnect.org/ Clinicians ICD9 codes Researchers ISF + UMLS linked data MeSH keywords Humanities and Artistic Works • Performances of a work • Translations • Collections and exhibits Steven McCauley and Theodore Lawless, Brown University http://www.vivoweb.org/files/vivo2013/friday_pm/ VIVO-Humanities_McCauley.pdf Collaborative Development • DuraSpace VIVO-ISF Working Group • Biweekly calls (Wed 2 pm ET) https://wiki.duraspace.org/display/VIVO/ - look for “Ontology Working Group” Interest Groups Linked Data for Libraries: Creating a Scholarly Resource Semantic Information Store (SRSIS) Linked Data for Libraries • On December 5, 2013, the Andrew W. Mellon Foundation made a two-year $999K grant to Cornell, Harvard, and Stanford starting Jan ‘14 • Partners will work together to develop an ontology and linked data sources that provide relationships, metadata, and broad context for Scholarly Information Resources • Leverages existing work by both the VIVO project and the Hydra Partnership The Project Team • Cornell: Dean Krafft, Jon Corson-Rikert, Brian Lowe, Simeon Warner, and 1.5 new FTE • Harvard: David Weinberger, Paul Deschner, and an outside consultant • Stanford: Tom Cramer and 1 new FTE “The goal is to create a Scholarly Resource Semantic Information Store model that works both within individual institutions and through a coordinated, extensible network of Linked Open Data to capture the intellectual value that librarians and other domain experts add to information resources when they describe, annotate, organize, select, and use those resources, together with the social value evident from patterns of usage.” Project timeline 2014 • Jan-June 2014: Initial ontology design; identify data sources; identify external vocabularies; begin SRSIS and Hydra ActiveTriples development • July-Dec 2014: Complete initial ontology; complete initial ActiveTriples development; pilot initial data ingests into Vitro-based SRSIS instance at Cornell Workshop – December 2014 • Hold a two-day workshop for 25 attendees from 10-12 interested library, archive, and cultural memory institutions • Demonstrate initial prototypes of SRSIS and ontology • Obtain feedback on initial ontology design • Obtain feedback on overall design and approach • Make connections to support participants in piloting this approach at their institutions • Understand how institutions see this approach fitting in with their own multi-institutional collaborations and existing cross-institutional efforts such as the Digital Public Library of America, VIVO, and SHARE Project timeline Jan-June 2015 • Pilot SRSIS instances at Harvard and Stanford • Populate Cornell SRSIS instance from multiple data sources including MARC catalog records, EAD finding aids, VIVO data, CuLLR, and local digital collections • Develop a test instance of the SRSIS Search application harvesting RDF across the three partner institutions • Integrate SRSIS with ActiveTriples Project timeline July-Dec 2015 • Implement fully functional SRSIS instances at Cornell, Harvard, and Stanford • Public release of open source SRSIS code and ontology • Public release of open source ActiveTriples Hydra Component • Create public demonstration of SRSIS Searchbased discovery and access system across the three SRSIS instances Project Outcomes • Open source extensible SRSIS ontology compatible with VIVO ontology, BIBFRAME, and other existing library LOD efforts • Open source SRSIS semantic editing, display, and discovery system • Project Hydra compatible interface to SRSIS, using ActiveTriples to support Blacklight search across multiple SRSIS instances Questions? For More Information: http://vivoweb.org @VIVOCollab