How a music service inspired a new
way of thinking about research
Jason Hoyt, PhD – Research Director
Impact and influence of Web 2.0-based Services on e-Research
Workshop
Edinburgh, UK
3 November 2009
Public Service Announcement
Support Open Access
Let’s talk about…
Idea behind Mendeley
Overall architecture
Mendeley in action
Clean data
Mendeley
Last.fm works like this:
1) Install “Audioscrobbler”
2) Listen to music
3) Last.fm builds your music profile and recommends music you might also like... and it’s the world’s biggest open music database
Last.fm            Mendeley
music libraries    research libraries
artists            researchers
songs              papers
genres             disciplines
Based in London, UK. We are 18 researchers, graduates and
software developers from...
...backed by co-founders and
former executives of:
We are young!
Mean age = ~25
Our users after 10 months in public beta (version 0.9.4.1):
Stanford University
Cambridge University
MIT
University of Edinburgh
University of Michigan
Harvard
Cornell University
Berkeley
University of Cologne
RWTH Aachen
Dartmouth College
University of Wisconsin
Fraunhofer Institute
ETH Zurich
University of Southampton
University College Dublin
Columbia University
Oxford University
Trinity College Dublin
Max Planck Society
Overall Architecture
Repository
Database
Web Service
Mendeley In Action
Adding your papers
You have different options to set up your library:
• Add single files or an entire folder
• “Watch a folder” to automatically import PDF files
• Add existing EndNote/BibTeX/RIS databases, or…
…drag & drop PDF files into the library pane…
… and Mendeley will try to extract the
document details automatically
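The “watch a folder” option can be pictured as a repeated scan that reports PDF files it has not seen before. The sketch below only illustrates that idea: the function name `find_new_pdfs` and the rescanning approach are assumptions for this example, not a description of how Mendeley does it.

```python
from pathlib import Path


def find_new_pdfs(folder, seen):
    """Return PDF paths in `folder` that are not yet in `seen`, and record them.

    Hypothetical helper: a real watcher might rely on operating-system
    file-change notifications instead of rescanning the directory.
    """
    new_files = []
    for path in sorted(Path(folder).iterdir()):
        is_pdf = path.is_file() and path.suffix.lower() == ".pdf"
        if is_pdf and path not in seen:
            seen.add(path)
            new_files.append(path)
    return new_files
```

Calling `find_new_pdfs` on a timer with the same `seen` set yields each PDF once, at which point it could be handed to metadata extraction.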
Document details lookup
You can also try to complete the document
details by querying various databases
(Crossref, PubMed, arXiv or Google Scholar)
Enter the DOI, PubMed ID, or arXiv ID and
click on the magnifying glass to start the lookup
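Before a lookup can be sent to the right database, the entered identifier has to be recognized as a DOI, a PubMed ID, or an arXiv ID. A minimal sketch of that step follows. The patterns are simplified assumptions, not the full identifier grammars, and `classify_identifier` is a hypothetical name.

```python
import re

# Simplified patterns (assumptions for illustration only).
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
ARXIV_NEW_RE = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")
ARXIV_OLD_RE = re.compile(r"^[a-z\-]+(\.[A-Z]{2})?/\d{7}(v\d+)?$")
PMID_RE = re.compile(r"^\d{1,8}$")


def classify_identifier(text):
    """Return (kind, value), where kind is 'doi', 'arxiv', 'pmid' or 'unknown'."""
    value = text.strip()
    # Drop an optional scheme prefix such as "doi:" or "arXiv:".
    for prefix in ("doi:", "arxiv:", "pmid:"):
        if value.lower().startswith(prefix):
            value = value[len(prefix):].strip()
            break
    if DOI_RE.match(value):
        return ("doi", value)
    if ARXIV_NEW_RE.match(value) or ARXIV_OLD_RE.match(value):
        return ("arxiv", value)
    if PMID_RE.match(value):
        return ("pmid", value)
    return ("unknown", value)
```

The returned kind could then decide which service (Crossref, PubMed or arXiv) receives the query.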
What is Mendeley?
Set up and manage your collections
Add tags & notes and edit document details
Library showing all your documents (citation or table view)
Filter your papers by authors,
keywords, tags, or publications
Annotate and highlight
Manage your library
Our Challenges
1. Extracting
2. Syncing
3. Verifying
4. Recommending
Clean Data
Is Our Biggest Challenge!
Clean Data
Dirty Data
1. Poor extraction
2. User input errors
3. Lookup errors (Open Access issue)
4. Duplicates & near-duplicates*
What can we do with
clean data?
Yummy Data
Implement our own reference checking
service for ourselves/others
APIs & mashups
Entity disambiguation (create
ontologies and semantic services)
Starting point for recommendations
Discover research statistics
What is your impact?
Lots of Data
6.99M documents added by users
20M+ documents from other sources
140M references extracted so far
50M documents by Q3 2010
How to Clean
Clean Up
Improve text extraction
“Wiki-fy” metadata with users
Create canonical documents
Canonical Documents
Saves room
Removes duplicates
Corrects errors
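One way to picture a canonical document: several user-supplied records for the same paper are merged into one, with the most common value winning for each field. The sketch below assumes that majority-vote rule purely for illustration. The slides do not say how canonical documents are built, and `canonical_record` is a hypothetical name.

```python
from collections import Counter


def canonical_record(records):
    """Merge duplicate metadata records by majority vote per field.

    Assumption: each record is a dict of field name to string value,
    and empty values count as missing.
    """
    fields = []
    for record in records:
        for key in record:
            if key not in fields:
                fields.append(key)
    canonical = {}
    for key in fields:
        values = [r[key] for r in records if r.get(key)]
        if values:
            canonical[key] = Counter(values).most_common(1)[0][0]
    return canonical


duplicates = [
    {"title": "The cat in the hat went home", "year": "2009"},
    {"title": "The cat in the h3t went home", "year": "2009"},
    {"title": "The cat in the hat went home", "year": ""},
]
merged = canonical_record(duplicates)
```

Here the single record in `merged` replaces three stored copies, and the extraction error "h3t" is outvoted by the two correct titles.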
Deduplication
Markov clustering
Affinity Propagation
Pairwise similarity
Fingerprinting
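Pairwise similarity can be sketched with a simple set measure. The example below uses Jaccard similarity over character trigrams, which is an assumption chosen for illustration. The slides do not name the similarity measure actually used.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams after lowercasing and collapsing spaces."""
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def jaccard_similarity(a, b, n=3):
    """Jaccard similarity of the n-gram sets of two strings, between 0.0 and 1.0."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


noisy = "The cat in the h3t went home"
clean = "The cat in the hat went home"
other = "The cat went home without a hat"
near_duplicate_score = jaccard_similarity(noisy, clean)
different_title_score = jaccard_similarity(clean, other)
```

The near-duplicate pair scores higher than the pair of genuinely different titles, which is the signal a deduplication step would threshold on. Comparing every pair grows quadratically with the number of documents, which is one reason fingerprints are attractive at scale.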
Fingerprinting
The cat in the h3t went home
The cat i the hat went home
The cat went home without a hat
Fingerprinting
The cat in the h3t went home = 010011001
The cat i the hat went home = 011011001
The cat went home without a hat = 011110011
Fingerprinting
The ct in th3 hat went hom[
1000101
The cat in the hat went home
1000101
1010101
1010101
1010101
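The fingerprints above behave like a similarity-preserving hash: strings that differ by a few characters get bit patterns that differ in few positions. The slides do not say which scheme is used, so the sketch below swaps in SimHash over character trigrams as an assumed stand-in. The 64-bit width and the use of MD5 as the per-trigram hash are likewise assumptions.

```python
import hashlib

BITS = 64  # fingerprint width (assumption)


def simhash(text, n=3):
    """Return a 64-bit SimHash fingerprint of the character n-grams of `text`."""
    normalized = " ".join(text.lower().split())
    count = max(len(normalized) - n + 1, 1)
    grams = [normalized[i:i + n] for i in range(count)]
    weights = [0] * BITS
    for gram in grams:
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        value = int.from_bytes(digest[:8], "big")
        # Each n-gram votes +1 or -1 on every bit position.
        for bit in range(BITS):
            weights[bit] += 1 if (value >> bit) & 1 else -1
    fingerprint = 0
    for bit in range(BITS):
        if weights[bit] > 0:
            fingerprint |= 1 << bit
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

Two documents would be treated as near-duplicate candidates when the Hamming distance between their fingerprints falls under a chosen threshold. For the patterns shown on the slide, `hamming_distance(0b1000101, 0b1010101)` is 1.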
Questions
Will this scale to 50M+ documents?
Will it scale to 1B references by Q3 2010?
Other Challenges
Image-based PDFs
Tables & Figures
Real-time recommendations
Entity disambiguation
jason.hoyt@mendeley.com
@jasonhoyt
www.mendeley.com