University of Florida CTSI: Consuming and disambiguating
publications data from Microsoft Academic Search in VIVO.
Linking to a New Data Source
Using the Academic Search Data
Disambiguating publication authorships is a well-recognized
problem in the field of academic publishing. The task is further
complicated by the welter of available sources of publications
data. UF has taken steps to input publication data via hand input and automated ingests of Thomson Reuters data.
Future efforts to add new data sources, such as Microsoft
Academic Search, to VIVO will enrich publications data with
the end goal of creating complete, fully-disambiguated
publication records for each author.
The Microsoft Academic Search API can be accessed using an
API key, which can be requested from the Academic Search
web site.
On the Academic Search side, our process involves getting
JSON objects back from the RESTful interface using Python.
On the VIVO side, our process involves getting JSON objects
using SPARQL.
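The two retrieval steps above can be sketched in Python as follows. This is a minimal illustration, not the project's actual code: the endpoint URLs, the `AppId` and `AuthorQuery` parameter names, and the example query are all assumptions made for the sketch, and the real values would come from the Academic Search API documentation and the local VIVO installation. The `Accept` header and the `results`/`bindings` layout follow the standard SPARQL JSON results format.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Placeholder endpoints (assumptions, not the real service addresses).
ACADEMIC_SEARCH_URL = "https://academic-search.example.org/json/search"
VIVO_SPARQL_URL = "https://vivo.example.edu/sparql"

# Illustrative query only; a real VIVO query would select authorships.
EXAMPLE_QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?pub ?title WHERE { ?pub rdfs:label ?title } LIMIT 10
"""


def academic_search_request(api_key, author_name):
    """Build a GET request for an author's publications.

    The parameter names are hypothetical stand-ins for the real API's.
    """
    params = {"AppId": api_key, "AuthorQuery": author_name}
    return urllib.request.Request(ACADEMIC_SEARCH_URL + "?" + urlencode(params))


def vivo_sparql_request(query):
    """Build a SPARQL GET request that asks for JSON-formatted results."""
    return urllib.request.Request(
        VIVO_SPARQL_URL + "?" + urlencode({"query": query}),
        headers={"Accept": "application/sparql-results+json"},
    )


def fetch_json(request, timeout=30):
    """Send the request and decode the JSON body into Python objects."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def sparql_rows(result):
    """Flatten SPARQL JSON results into plain dicts of variable -> value."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in result["results"]["bindings"]
    ]
```

With real endpoints in place, `fetch_json(academic_search_request(key, name))` and `sparql_rows(fetch_json(vivo_sparql_request(EXAMPLE_QUERY)))` would return the two sets of JSON-derived objects to compare.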
This project was intended to be a proof of concept,
demonstrating our ability to create a programmatic link
between VIVO and the Microsoft Academic Search API, then
retrieve publications data about University of Florida
investigators. Project work was performed on a small subset of
investigators homed in our CTSI.
A hybrid record consisting of data elements from both services
is then constructed. Future work will involve serializing the
data back out to VIVO-compliant RDF/XML to enrich the VIVO
publication record.
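One way the hybrid record might be assembled is sketched below. The field names and the merge rule are assumptions for illustration: the poster does not say which source takes precedence, so this sketch keeps the hand-curated VIVO value when one exists and fills gaps from Academic Search, recording where each field came from.

```python
def build_hybrid_record(vivo_record, academic_record):
    """Merge two publication records into one hybrid record.

    Assumed rule: a non-empty VIVO value wins; otherwise the Academic
    Search value is used. The "_sources" entry records the origin of
    each field so a curator can review the merge later.
    """
    hybrid = {}
    sources = {}
    for key in sorted(set(vivo_record) | set(academic_record)):
        vivo_value = vivo_record.get(key)
        academic_value = academic_record.get(key)
        if vivo_value not in (None, ""):
            hybrid[key] = vivo_value
            sources[key] = "vivo"
        elif academic_value not in (None, ""):
            hybrid[key] = academic_value
            sources[key] = "academic_search"
    hybrid["_sources"] = sources
    return hybrid


# Hypothetical records with made-up field names and values.
vivo = {"title": "A Sample Paper", "year": "2011", "doi": ""}
academic = {"title": "a sample paper", "doi": "10.0000/example", "citations": 12}
record = build_hybrid_record(vivo, academic)
```

Keeping the per-field source alongside the merged values should make a later RDF/XML serialization step easier to audit, since each statement can be traced back to the service that supplied it.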
Microsoft Academic Search uses machine-learning algorithms
to disambiguate authorships, which sometimes leads to papers
being incorrectly attributed or grouped. Because UF’s data is
hand-curated and covers authors we have a personal interest in,
future work should involve sorting out incorrectly attributed
papers. We believe a hybrid approach (automation plus
hand entry) is needed to cover all the cases.
Additional efforts have centered around providing a list of
publications involving University of Florida authors back to the
Microsoft Academic Search team.
Future work on this project is expected to include correcting
and merging the details of publications that are present in
both systems.
Did it Work?
What is Academic Search?
Microsoft Academic Search is a free service developed by
Microsoft Research to help scholars, scientists, students, and
practitioners quickly and easily find academic content,
researchers, institutions, and activities.
Fetching data from two sources
Reading JSON objects
Our project is considered a success, as we’ve been able to
retrieve data from Academic Search, compare it to existing
VIVO data in order to match or otherwise disambiguate, then
ingest any new data into VIVO. We are also able to produce a
list of missing publications for the Academic Search team, and
are working on a process to provide this data to them.
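The comparison step described above could look something like the following sketch. Matching on a normalized title is an assumption chosen for simplicity; the poster does not state the matching rule actually used, and a production match would likely also compare authors, year, or identifiers.

```python
import re
import unicodedata


def normalize_title(title):
    """Reduce a title to lowercase ASCII letters and digits separated by spaces."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def compare_publications(vivo_titles, academic_titles):
    """Split titles into matched, new for VIVO, and missing from Academic Search."""
    vivo_index = {normalize_title(t): t for t in vivo_titles}
    academic_index = {normalize_title(t): t for t in academic_titles}
    shared = vivo_index.keys() & academic_index.keys()
    return {
        # Present in both systems: candidates for merging details.
        "matched": [vivo_index[k] for k in sorted(shared)],
        # Only in Academic Search: candidates for ingest into VIVO.
        "new_for_vivo": [
            academic_index[k] for k in sorted(academic_index.keys() - shared)
        ],
        # Only in VIVO: the list to send to the Academic Search team.
        "missing_from_academic_search": [
            vivo_index[k] for k in sorted(vivo_index.keys() - shared)
        ],
    }


# Hypothetical titles for illustration.
report = compare_publications(
    ["Gene Therapy: A Review.", "Local Clinical Study"],
    ["gene therapy a review", "An Unseen Paper"],
)
```

Exact matching on normalized titles will miss retitled or truncated records, so unmatched items are better treated as candidates for hand review than as confirmed gaps.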
Microsoft Academic Search takes full advantage of results from
the Bing search engine, indexing thousands more publications
than can be found at any other single source (almost 39 million
publications for 20 million authors at this time).
We believe that evaluation of publication details is simply a
matter of developing the proper code, likely in Python, since all
connections are already in place and required data is available.
(http://academic.research.microsoft.com/About/Help.htm)
Please contact any of the authors of this poster regarding this
work. All authors can be found in UF VIVO.
Nicholas Rejack¹, Erik Schmidt¹, Michael Conlon¹
¹ Clinical and Translational Science Institute, University of Florida