Creating a Data Interchange Standard for Researchers, Research, and Research Resources: VIVO-ISF
Dean B. Krafft
Brian Lowe
Coalition for Networked Information
10 December 2013
What is VIVO?
• Software: An open-source semantic-web-based
researcher and research discovery tool
• Data: Institution-wide, publicly-visible
information about research and researchers
• Standards: A standard ontology (VIVO data)
that interconnects researchers, communities,
and campuses using Linked Open Data
• Community: An open community with strong
national and international participation
VIVO Normalizes Complex Inputs
[Diagram labels: NIH RePorter, HR data, Faculty Reporting, Researcher.gov, VP Research, Univ. Communications, Grants, Tech transfer, People, Self-editing, Grad School, Research Facilities & Services, Center/Dept/Program websites, Data, Publications, other databases, HPC, Courses, Other campuses, Google Scholar, CrossRef, PubMed, arXiv]
VIVO connects scientists and scholars with
and through their research and scholarship
SKE Knowledge Environment
http://ske.las.ac.cn/
Customization
The VIVO Community is now over 100
institutions worldwide
Why is VIVO important?
• It is the only standard way to exchange
information about research and researchers
across diverse institutions
• It provides authoritative data from institutional
databases of record as Linked Open Data
• Structured VIVO data supports search, analysis
and visualization across institutions and consortia
• It is highly flexible and extensible to cover
research resources, facilities, datasets, and more
An HTTP request can return HTML or data
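As a rough illustration of the idea in this slide title, the sketch below picks a response format from an HTTP Accept header. It is a simplified, hypothetical negotiation routine (the function name, the list of supported types, and the default are assumptions), not VIVO's actual implementation, and it ignores wildcards such as text/* for brevity.

```python
# Hypothetical sketch of HTTP content negotiation; not VIVO's actual code.
SUPPORTED = ["text/html", "application/rdf+xml", "text/turtle"]


def choose_format(accept_header: str, default: str = "text/html") -> str:
    """Return the supported media type the client ranks highest."""
    best_type, best_q = default, 0.0
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        q = 1.0  # a missing q parameter means full preference
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media_type in SUPPORTED and q > best_q:
            best_type, best_q = media_type, q
    return best_type
```

Under these assumptions, a browser sending "Accept: text/html" would get the HTML page, while a harvester sending "Accept: text/turtle" would get RDF for the same URL.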
Value for institutions and consortia
• Common data substrate
– Public, granular and direct
– Discovery via external and internal search engines
– Available for reuse at many levels
• Distributed curation
– E.g., affiliations beyond what HR system tracks
– Data coordination across functional silos
– Feeding changes back to systems of record
– Direct linking across campuses
• Data that is visible gets fixed
Example: U.S. Dept. of Agriculture
• Multiple agencies including Agricultural Research
Service and U.S. Forest Service
• VIVO portal for 45,000 intramural researchers
• Goal to link to Land Grant universities and
international agricultural research centers
• Using VIVO as an integration tool to send data for
federal STAR METRICS/SciENcv projects
• RDF exposed via a SPARQL endpoint constitutes
compliance
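To make the SPARQL endpoint idea concrete, here is a minimal sketch that builds a query URL using the SPARQL protocol's standard "query" parameter. The endpoint address and the helper name are hypothetical; foaf:Person and rdfs:label are real terms, but whether a given VIVO installation exposes its data at this kind of URL is an assumption.

```python
from urllib.parse import urlencode

# Hypothetical endpoint address, used only for illustration.
ENDPOINT = "http://vivo.example.org/sparql"

QUERY = """\
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?person ?name
WHERE {
  ?person a foaf:Person ;
          rdfs:label ?name .
}
LIMIT 10
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as a GET URL using the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)
```

The resulting URL could be fetched with any HTTP client; the response format would depend on the endpoint and the Accept header sent.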
VIVO Exploration and Analytics
• Since VIVO is structured data, it can be
navigated, analyzed, and visualized uniformly
within or across institutions
• VIVO can visualize the strengths of networks
within and across institutions
• You can create dashboards to help understand
academic outputs and collaborations
• VIVO can map research engagements and
impact
Providing the Context for Research Data
• Context is critical to finding, understanding,
and reusing research data
• Contexts include:
– Narrative publications
– The researcher, research resources, grants, etc.
– Dataset registries
– Structured Knowledge Environments
– The web of Linked Open Data
VIVO Dataset Registries
• VIVO/ANDS consortium in Australia
– Link research data with researcher profiles and
publications
– Harvest to national registry
• Datastar data registry tool
– Add-on to VIVO or independent companion
– Complement to other library data-related services
– Institute for Museum and Library Services (IMLS)
grant
Melbourne Central Research Data Registry
What is VIVO Today?
• An open community hosted by the DuraSpace
501(c)(3) with strong national and international
participation, for which we are currently hiring
a full-time VIVO Project Director
• An open suite of software tools
• A growing body of interoperable data
• An ontology (VIVO-ISF) with a community-driven
process for extension
VIVO and the Integrated Semantic Framework
What is the Integrated Semantic Framework?
• A semantic infrastructure to represent people based on
all the products of their research and activities
– To support both networking and reporting
• A partnership between VIVO, eagle-i, and ShareCenter
• A Clinical and Translational Information Exchange
Project (CTSAconnect)
– 18 Months (February 2012 – August 2013)
– Funded by NIH NCATS via Booz Allen Hamilton
CTSAconnect Team
• OHSU: Melissa Haendel, Carlo Torniai, Nicole Vasilevsky, Shahim Essaid, Eric Orwoll
• Harvard University: Daniela Bourges-Waldegg, Sophia Cheng
• Cornell University: Jon Corson-Rikert, Dean Krafft, Brian Lowe
• ShareCenter: Chris Kelleher, Will Corbett, Ranjit Das, Ben Sharma
• University of Florida: Mike Conlon, Chris Barnes, Nicholas Rejack
• Stony Brook University: Moises Eisenberg, Erich Bremer, Janos Hajagos
• University at Buffalo: Barry Smith, Dagobert Soergel
People and Resources
[Diagram labels: genes, affiliation, anatomy, roles, techniques, training, publications, protocols, grants, credentials, manufacturer]
Connecting researchers, resources, and clinical activities
Beyond Static CVs
• Distributed data
• Research and scholarship in context
• Context aids in disambiguation
• Contributor roles
• Outputs and outcomes beyond publications
Ontologies for Linked Data
Linked Data Vocabularies
• FOAF (people, organizations, groups)
• VCard (contact information)
• BIBO (publications)
• SKOS (terminologies)
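A small sketch of how terms from these vocabularies can be mixed in one description. The namespace URIs are the published ones for FOAF, vCard, BIBO, and SKOS; the example.org resources, the helper function, and the choice of classes are illustrative assumptions rather than the VIVO-ISF modeling.

```python
# Published namespace URIs for the vocabularies named above.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "vcard": "http://www.w3.org/2006/vcard/ns#",
    "bibo": "http://purl.org/ontology/bibo/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def expand(curie: str) -> str:
    """Turn a prefixed name such as 'foaf:Person' into a full URI."""
    prefix, local = curie.split(":", 1)
    return NS[prefix] + local


# Hypothetical resources under example.org, for illustration only.
person = "http://example.org/person/1"
article = "http://example.org/article/1"
topic = "http://example.org/concept/ontologies"

triples = [
    (person, expand("rdf:type"), expand("foaf:Person")),
    (article, expand("rdf:type"), expand("bibo:Article")),
    (topic, expand("rdf:type"), expand("skos:Concept")),
]

# Serialize as N-Triples: one "<s> <p> <o> ." statement per line.
ntriples = "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)
```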
Open Biomedical Ontologies
• OBI (Ontology of Biomedical Investigations)
• RO (Relationship Ontology)
• ERO (eagle-i Research Resource Ontology)
• IAO (Information Artifact Ontology)
Basic Formal Ontology
• Occurrent
– Process
• Continuant
– Role
– Spatial Region
– Site
Szabolcs Toth
http://www.flickr.com/photos/necccc/5726970855/
Relationships
[Diagram: Person linked to Org. through a Position; Person linked to Article through an Authorship]
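The pattern in this Relationships slide, where a Person reaches an Org. or an Article through an intermediate Position or Authorship node rather than a direct link, can be sketched as follows. The class and field names are illustrative assumptions and do not reproduce the actual VIVO-ISF classes or properties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    """Intermediate node linking a person to an organization."""
    person: str
    org: str
    title: str


@dataclass(frozen=True)
class Authorship:
    """Intermediate node linking a person to an article."""
    person: str
    article: str
    rank: int  # place in the author list


# Invented example data.
positions = [Position("Person A", "Org. 1", "Professor")]
authorships = [
    Authorship("Person B", "Article X", rank=2),
    Authorship("Person A", "Article X", rank=1),
]


def authors_of(article: str) -> list[str]:
    """Follow Authorship nodes to list an article's authors in byline order."""
    matches = [a for a in authorships if a.article == article]
    return [a.person for a in sorted(matches, key=lambda a: a.rank)]
```

Because the relationship is its own node, it can carry details such as a job title or author rank that a direct person-to-article link could not.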
Aggregate Data over Time
[Diagram: Person, Position, Org., with a time interval on the Position]
[Diagram: Person with Position 1 (time interval 1, Org. 1) and Position 2 (time interval 2, Org. 2)]
[Diagram: Person, VCard, Name, with a time interval on the VCard]
[Diagram: Person with VCard 1 (old name, time interval 1) and VCard 2 (new name, time interval 2)]
[Diagram: Person, VCard, Authorship, time interval]
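One way to read the Aggregate Data over Time slides: each Position (or VCard) carries its own time interval, so a person's history accumulates instead of being overwritten. The sketch below assumes that reading; the class, the dates, and the helper function are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TimedPosition:
    """A position held at an organization during a time interval."""
    org: str
    start: date
    end: Optional[date] = None  # None means the position is still held


# Invented example history for one person.
history = [
    TimedPosition("Org. 1", date(2005, 9, 1), date(2010, 6, 30)),
    TimedPosition("Org. 2", date(2010, 7, 1)),
]


def orgs_on(day: date, positions: list[TimedPosition]) -> list[str]:
    """Return the organizations where a position was held on the given day."""
    return [
        p.org
        for p in positions
        if p.start <= day and (p.end is None or day <= p.end)
    ]
```

The same interval-per-record idea would apply to names: an old and a new VCard can coexist, each valid for its own interval.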
Beyond Publication Bylines
• What are people doing?
• Roles in projects, activities
• Other kinds of scholarly contribution
• Datasets, resources
[Diagram: Person, Role, Project]
Roles and Outputs
[Diagram: Project, Person, Role, document / resource / etc.]
Application Examples: Search
[Diagram labels: Scripps VIVO, WashU VIVO, UF VIVO, IU VIVO, Ponce VIVO, eagle-i research resources, Harvard Profiles RDF, other VIVOs, Cornell Ithaca VIVO, Weill Cornell VIVO, Iowa Loki RDF, Digital Vita RDF, Linked Open Data, Solr search index, alternate Solr index, vivosearch.org]
Use Cases
• Find publications supported by grants
• Discover and re-use expensive equipment and
resources
• Demonstrate importance of facilities services
to research results
• Discover people with access to resources or
with expertise in techniques
Linking People through Terminologies
http://cstaconnect.org/
[Diagram: Clinicians (ICD9 codes) and Researchers (MeSH keywords), connected through ISF + UMLS linked data]
Humanities and Artistic Works
• Performances of a work
• Translations
• Collections and exhibits
Steven McCauley and Theodore Lawless, Brown
University
http://www.vivoweb.org/files/vivo2013/friday_pm/VIVO-Humanities_McCauley.pdf
Collaborative Development
• DuraSpace VIVO-ISF Working Group
• Biweekly calls (Wed 2 pm ET)
https://wiki.duraspace.org/display/VIVO/
- look for “Ontology Working Group”
Interest Groups
Linked Data for Libraries: Creating a Scholarly Resource Semantic Information Store (SRSIS)
Linked Data for Libraries
• On December 5, 2013, the Andrew W. Mellon
Foundation made a two-year $999K grant to
Cornell, Harvard, and Stanford starting January 2014
• Partners will work together to develop an
ontology and linked data sources that provide
relationships, metadata, and broad context for
Scholarly Information Resources
• Leverages existing work by both the VIVO
project and the Hydra Partnership
The Project Team
• Cornell: Dean Krafft, Jon Corson-Rikert, Brian
Lowe, Simeon Warner, and 1.5 new FTE
• Harvard: David Weinberger, Paul Deschner,
and an outside consultant
• Stanford: Tom Cramer and 1 new FTE
“The goal is to create a Scholarly Resource Semantic
Information Store model that works both within
individual institutions and through a coordinated,
extensible network of Linked Open Data to capture the
intellectual value that librarians and other domain
experts add to information resources when they
describe, annotate, organize, select, and use those
resources, together with the social value evident from
patterns of usage.”
Project timeline 2014
• Jan-June 2014: Initial ontology design; identify
data sources; identify external vocabularies;
begin SRSIS and Hydra ActiveTriples
development
• July-Dec 2014: Complete initial ontology;
complete initial ActiveTriples development;
pilot initial data ingests into Vitro-based SRSIS
instance at Cornell
Workshop – December 2014
• Hold a two-day workshop for 25 attendees from 10-12
interested library, archive, and cultural memory
institutions
• Demonstrate initial prototypes of SRSIS and ontology
• Obtain feedback on initial ontology design
• Obtain feedback on overall design and approach
• Make connections to support participants in piloting this
approach at their institutions
• Understand how institutions see this approach fitting in
with their own multi-institutional collaborations and
existing cross-institutional efforts such as the Digital
Public Library of America, VIVO, and SHARE
Project timeline Jan-June 2015
• Pilot SRSIS instances at Harvard and Stanford
• Populate Cornell SRSIS instance from multiple
data sources including MARC catalog records,
EAD finding aids, VIVO data, CuLLR, and local
digital collections
• Develop a test instance of the SRSIS Search
application harvesting RDF across the three
partner institutions
• Integrate SRSIS with ActiveTriples
Project timeline July-Dec 2015
• Implement fully functional SRSIS instances at
Cornell, Harvard, and Stanford
• Public release of open source SRSIS code and
ontology
• Public release of open source ActiveTriples
Hydra Component
• Create public demonstration of SRSIS Search-based
discovery and access system across the three SRSIS
instances
Project Outcomes
• Open source extensible SRSIS ontology
compatible with VIVO ontology, BIBFRAME,
and other existing library LOD efforts
• Open source SRSIS semantic editing, display,
and discovery system
• Project Hydra compatible interface to SRSIS,
using ActiveTriples to support Blacklight
search across multiple SRSIS instances
Questions?
For More Information:
http://vivoweb.org
@VIVOCollab