Abstract
VIVO is:
• an organic approach to reflecting the research at an institution;
• the antithesis of a static set of web pages.
VIVO is not:
• a ‘set and forget’ collection of automated data ingests.
VIVO Cornell engages in a continual process of environmental monitoring and major and minor overhauls as we strive to deliver quality, useful results.
We are:
• Monitoring how Cornell researchers are representing their work on the Web and designing displays that complement those efforts.
• Developing tools to identify substandard data and assist in their cleanup.
• Scanning the Cornell Web to encourage the reuse of VIVO data and deliver on the promise of Linked Open Data.
Publications management: PubMed, Researcher ID, Google Scholar
Current production VIVO
Revised view: Mockup 1
1.
We coordinate and integrate our contribution to Cornell’s information discovery goals as researchers’ information practices change and as competing and complementary information products are introduced. Some researchers pay detailed attention to their web presence; others do not. With VIVO 1.5 we are designing display options to meet the varying needs of our researchers. We are also looking at how publications could be managed to meet both individual and institutional goals.
Revised view: Mockup 2
Revised view: Mockup 3
Faculty lab web page
2.
VIVO Cornell data come from heterogeneous, overlapping sources that reflect the diversity and complexity of our institution. In addition to manual data entry (with all its attendant quality issues), the Cornell data systems of record deliver duplicate and contradictory data that must be cleaned and reconciled.
We have developed a tool that semi-automates this process. We apply heuristic matching algorithms to VIVO data to cluster similar names (of people, journal titles and organizations). The URI Tool presents those results for manual review and resolution. We identify an authoritative version, or create one from online sources such as Ulrichs, and then merge all the variants into that name.
Journal titles can vary by one word, and we have researchers with the same name but in different Colleges, so this manual step is the only reliable way to clean a dataset of this size.
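The clustering step described above can be illustrated with a minimal sketch. It is not the URI Tool itself: the function names, the use of Python's standard difflib for similarity, and the 0.85 threshold are all assumptions made for illustration, and the output is only a list of candidate groups for a person to review.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(kept.split())


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def cluster_names(names, threshold=0.85):
    """Greedy clustering: a name joins the first cluster whose first
    member is similar enough, otherwise it starts a new cluster.
    The threshold is an illustrative assumption, not a tuned value."""
    clusters = []
    for name in names:
        for cluster in clusters:
            if similarity(name, cluster[0]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def merge_map(cluster, authoritative):
    """Map every variant in a reviewed cluster to the chosen authoritative name."""
    return {variant: authoritative for variant in cluster if variant != authoritative}
```

In this sketch, "Journal of Dairy Science" and "journal of dairy science." fall into one cluster because they are identical after normalization. Two different researchers who share a name would also be clustered together, which is why the candidates go to manual review before any merge.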
VIVO data sources and data feeds
3.
Because we are required to take data from Cornell’s systems of record, we cannot ‘clean up’ the data in VIVO; corrections must be made at the source. For example, several Colleges at Cornell use Activity Insight from Digital Measures. It is difficult to identify missing or malformed values using the Activity Insight administrative interface. We are developing a tool that presents the information in a format that College administrators can use to correct the data, which will then feed into VIVO.
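A report of this kind might be sketched as follows. The field names, the year check, and the record layout are hypothetical, since the source does not describe the Activity Insight export format; the sketch only shows the idea of listing, per record, the values that need correcting at the source.

```python
import re

# Hypothetical required fields; the real Activity Insight schema is not given in the source.
REQUIRED_FIELDS = ("name", "college", "title")
# Hypothetical format rule: a four-digit year beginning with 19 or 20.
YEAR_PATTERN = re.compile(r"^(19|20)\d{2}$")


def find_problems(record):
    """Return a list of human-readable problems found in one record."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            problems.append(f"missing {field}")
    year = record.get("year")
    if year is not None and str(year).strip():
        if not YEAR_PATTERN.match(str(year).strip()):
            problems.append(f"malformed year: {year!r}")
    return problems


def build_report(records):
    """Return (record id, problems) pairs for records that need correction."""
    report = []
    for record in records:
        problems = find_problems(record)
        if problems:
            report.append((record.get("id"), problems))
    return report
```

Records with no problems are left out of the report, so an administrator sees only the entries that need attention.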
Mockup 3: Expanded
AI feedback form
Data integrity: URI Tool
4.
We must regularly monitor and coordinate with our data providers to keep our processes up to date. We also pay attention to the continual changes in the information landscape. In an institution the size of Cornell it is easy for potentially duplicative initiatives to emerge. We have taken organizational approaches to maintaining accurate processes and minimizing wasteful duplication of effort.
This is a time-consuming process, but it is essential if VIVO is to be an accurate representation of research at Cornell. Here is a sample of the groups we meet with:
• The campus-level Activity Insight Users Group and the Activity Insight implementation teams at the College level
• The Communications staff of the Vice Provost for Research and in the Colleges
• The Web Manager for the College of Arts and Sciences; they do not use Activity Insight
• The appropriate staff from our data sources: Human Resources, the Registrar (for courses), the Office of Sponsored Programs, Cornell Cooperative Extension
• The Office of Land Grant Affairs
• Our institutional sponsors