Dr. Duncan Hull http://twitter.com/dullhunk
European Bioinformatics Institute, EBI.ac.uk
e-Science workshop: The influence and impact of Web 2.0 on various applications
11th-12th May 2010, Edinburgh
EBI is an Outstation of the European Molecular Biology Laboratory.
• Introduction: Wellcome Trust Genome Campus
• The European Bioinformatics Institute (ebi.ac.uk)
• The Wellcome Trust Sanger Institute (sanger.ac.uk)
• The Library
• Problem: economics and “freakonomics” of publishing
• The unintended consequences of “publish or perish”
• Burying data in publication silos
• Obscuring identities and obstructing social applications
• Solution? Bibliography 2.0 with citeulike
• Incentives
• Disincentives
• Case study: What we’ve learnt
• Conclusions and future work
17.04.2020
Home of
The European Bioinformatics Institute
The Sanger Institute
Just outside Cambridge, UK
Literature ebi.ac.uk/citexplore
Genomes: ensembl.org
DNA + RNA sequences ebi.ac.uk/ena
Protein sequence uniprot.org
Protein structure ebi.ac.uk/pdbe
Transcriptomes e.g. ArrayExpress
Protein domains, families ebi.ac.uk/interpro
Small molecules ebi.ac.uk/chebi and ebi.ac.uk/chembl
Protein-protein interactions ebi.ac.uk/intact
Pathways reactome.org
Systems biomodels.net
~400 staff (research/services), publishing data on the web
e.g. Chemical Entities of Biological Interest (ChEBI)
Free database/ontology of 500,000 small molecules (many drugs)
• The Sanger Institute is a world-leading genome research institute, funded by charity (The Wellcome Trust), that uses DNA sequencing to further understanding of gene function in health and disease
• From THE human genome ten years ago to 1000 genomes today (2010)
• More Bio than Informatics (cf. EBI), with a progressive approach to Web 2.0, e.g.
• Daub, J., et al. (2008). The RNA wikiproject: community annotation of RNA families. RNA 14 (12), 2462-2464. DOI:10.1261/rna.1200508
• http://en.wikipedia.org/wiki/Wikipedia:WikiProject_RNA
~900 Sanger staff (total)
More later
Annual journal subscription budget
£500,000
(modest compared to the multi-million-pound journal budgets of university libraries)
“People respond to incentives, although not necessarily in ways that are predictable and manifest. Therefore, one of the most powerful laws in the universe is the law of unintended consequences. This applies to schoolteachers and Realtors and crack dealers as well as expectant mothers, sumo wrestlers, bible salesman, and the Ku Klux Klan…”
…and scientists too…
• Incentive: “publish or perish”
• Publications are rewarded with recognition, hiring, promotion, tenure, fame, funding, fortune, prizes, job satisfaction etc
• Unintended consequences :
• Valuable data gets damaged, destroyed or “buried” (see later)
• Inaccessible to data and text mining on the Web
• Copyright and toll-access journals
• Luddite scientists
• Minimal exploitation of social software for sharing data
• Minimal exploitation of Web 2.0 for sharing data
Why bury it [data] first and then mine it again?
Which gene did you mean?
BMC Bioinformatics. 2005 Jun 7;6:142
DOI:10.1186/1471-2105-6-142
Barend Mons, Wikiproteins http://proteins.wikiprofessional.org
• Gene names: e.g. Hexokinase, HK1, HK2, HK3
• Protein names: e.g. Hexokinase, HK1, HK2, HK3
• Chemical names: e.g. Glucose-6-phosphate, G6P, Glu, Gluc
• Author names: e.g. Mark Baker (see next slide)
• Poor precision and recall
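As a rough illustration of why ambiguous names hurt search, the sketch below computes precision and recall for a made-up query. The paper identifiers and the two sets are hypothetical, chosen only to show the arithmetic, and are not taken from any real search.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for retrieved items against the truly relevant ones."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: what fraction of what we found is actually relevant.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: what fraction of the relevant items we actually found.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: a search for an ambiguous name such as "HK1".
relevant = {"p1", "p2", "p3", "p4"}          # papers truly about the intended gene
retrieved = {"p1", "p2", "p7", "p8", "p9"}   # papers that merely match the string

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

In this toy case three of the five hits are about something else (lower precision), and two relevant papers that use a different synonym are missed (lower recall).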
http://pubmed.gov?term=Baker+M[author]
http://pubmed.gov?term=Mark+Baker[author]
etc.
Until we have unique author identifiers, it is difficult or impossible to reliably find the papers published by a particular person.
Open Researcher and Contributor ID http://orcid.org
“Tell me whenever Mark Baker publishes a paper”
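A minimal sketch of the problem, assuming a toy list of paper records: grouping by name merges different people, while grouping by a unique identifier keeps them apart. The records and the `author_id` values are invented for illustration and are not real ORCID identifiers; only the URL shape is taken from the queries shown on the slide.

```python
from collections import defaultdict
from urllib.parse import quote_plus


def pubmed_author_url(name):
    """Build an author query URL in the same shape as the slide's examples."""
    # "[" and "]" are left unescaped so the URL reads like the ones above.
    return "http://pubmed.gov?term=" + quote_plus(f"{name}[author]", safe="[]")


# Hypothetical records: two different people who share the name "Baker M".
papers = [
    {"title": "Paper A", "author": "Baker M", "author_id": "id-0001"},
    {"title": "Paper B", "author": "Baker M", "author_id": "id-0002"},
    {"title": "Paper C", "author": "Baker M", "author_id": "id-0001"},
]


def group_by(records, key):
    """Group paper titles by the given record field."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["title"])
    return dict(groups)


by_name = group_by(papers, "author")    # one bucket: both people merged
by_id = group_by(papers, "author_id")   # two buckets: one per person

print(pubmed_author_url("Baker M"))
print(by_name)
print(by_id)
```

With only the name to go on, an alert such as “Tell me whenever Mark Baker publishes a paper” cannot tell the two people apart; with an identifier it can.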
• Socialisation (escience > “we-science”)
• How many other people have read this paper?
• What are my friends / enemies reading?
• What other papers did they also read?
• Personalisation (escience > “me-science”)
• These are my publications
• This is my bibliography (stuff I’m reading / have read)
• Digital libraries are “document-centred” rather than “people-centred”
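The “who else read this?” and “what did they also read?” questions above can be sketched with a people-centred data structure. This is only an illustration under assumed data: the user names and article identifiers are hypothetical, and this is not a description of how any particular service implements it.

```python
from collections import Counter

# Hypothetical libraries: user -> set of article identifiers they have bookmarked.
libraries = {
    "alice": {"a1", "a2", "a3"},
    "bob": {"a1", "a3", "a4"},
    "carol": {"a2", "a5"},
}


def readers_of(article, libraries):
    """Who else has this article in their library?"""
    return sorted(user for user, items in libraries.items() if article in items)


def also_read(article, libraries):
    """Count the other articles bookmarked by people who bookmarked this one."""
    counts = Counter()
    for items in libraries.values():
        if article in items:
            counts.update(items - {article})
    return counts


print(readers_of("a1", libraries))
print(also_read("a1", libraries).most_common())
```

None of these questions can be answered from a document-centred catalogue alone, because they need a record of which person holds which paper.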
Torvik, V. I., and N. R. Smalheiser (2009). Author name disambiguation in MEDLINE. ACM Trans. Knowl. Discov. Data 3 (3), 1-29. DOI:10.1145/1552303.1552304
• http://www.citeulike.org
• Lack of personalisation of library data
• Lack of socialisation of library data
• Works a lot like http://www.delicious.com
Click Post to Citeulike
Tag it (optional) e.g. author tags
Journal picks is a group of 40+ invited users on campus, who select interesting papers
2,016 unique articles in journal picks
(less than one year)
3,880,055 unique articles total
Citeulike + ZeitGeist = CiteGeist http://www.citeulike.org/citegeist
• Selfish scientist (just organise my reference mess)
• What’s popular (interesting stuff CiteGeist)
• Serendipity (find papers you wouldn’t find normally)
• Increase visibility and PageRank of papers?
• Person-centred access points into the first / second page of Google results, e.g. http://www.google.com/search?q=carole+goble has the result below fairly high up the list: http://www.citeulike.org/group/10570/tag/carole-goble
• Privacy, don’t want to share with rivals
• (but can make collections private)
• Citeulike might go bust?
• But Springer sponsored
• Parsers are fragile
• easily (and deliberately) broken by publishers
• Valuable data in the hands of a commercial company?
• But Facebook? LinkedIn? Twitter etc?
• No academic reward for using it
• publication = “finished”
• Social software works best with network effects
• There are LOTS of other tools that do this…
www.refworks.com
www.zotero.org
www.hubmed.org
www.mendeley.com (“Last.fm of research”)
www.connotea.org
www.mekentosj.com (“iTunes for PDF files”)
• With significant vested financial interests
• Scopus http://www.scopus.com/
• ISI WOK http://isiknowledge.com
Wrote a review of these systems: Hull, D., S. R. Pettifer, and D. B. Kell (2008). Defrosting the digital library: Bibliographic tools for the next generation web. PLoS Comput Biol 4 (10), e1000204. DOI:10.1371/journal.pcbi.1000204
• “Publish or perish” has some unfortunate and unintended consequences in science
• Citeulike is an interesting Web 2.0 tool
• We’ve had some success using it (typical “long tail”)
• Weak incentives for use and many cultural barriers to adoption
• Technical barriers to adoption: many tools, messy data
• Future work
• Social network analysis, clickthroughs, tag analysis
• Any other ideas…
• But the times they are a changin’
• Citeulike or something like it will work much better if/when “publishing” incentives change over time…
• Mark Baker for organising this workshop
• EBI, Christoph Steinbeck (laboratory head)
• Carole Goble, University of Manchester
• The Sanger, Alex Bateman, Frances Martin, Tim Hubbard and all the contributors to the Journal Picks group
• Richard Cameron, Kevin Emamy and the rest of the citeulike team
• BBSRC for funding
• Any questions?