EMBL-EBI Powerpoint Presentation - National e

Bibliography 2.0: A case study from the Wellcome Trust Genome Campus

Dr. Duncan Hull http://twitter.com/dullhunk

European Bioinformatics Institute, EBI.ac.uk

e-Science workshop: The influence and impact of Web 2.0 on various applications

11th-12th May 2010, Edinburgh

EBI is an Outstation of the European Molecular Biology Laboratory.

Overview

• Introduction: Wellcome Trust Genome Campus

• The European Bioinformatics Institute (ebi.ac.uk)

• The Wellcome Trust Sanger Institute (sanger.ac.uk)

• The Library

• Problem: economics and “freakonomics” of publishing

• The unintended consequences of “publish or perish”

• Burying data in publication silos

• Obscuring identities and obstructing social applications

• Solution? Bibliography 2.0 with citeulike

• Incentives

• Disincentives

• Case study: What we’ve learnt

• Conclusions and future work

Wellcome to the Genome Campus

Home of

• The European Bioinformatics Institute

• The Sanger Institute

Just outside Cambridge, UK

EBI: a data hub for bioinformatics in Europe

Literature: ebi.ac.uk/citexplore

Genomes: ensembl.org

DNA + RNA sequences: ebi.ac.uk/ena

Protein sequence: uniprot.org

Protein structure: ebi.ac.uk/pdbe

Transcriptomes: e.g. ArrayExpress

Protein domains, families: ebi.ac.uk/interpro

Small molecules: ebi.ac.uk/chebi and ebi.ac.uk/chembl

Protein-protein interactions: ebi.ac.uk/intact

Pathways: reactome.org

Systems: biomodels.net

~400 staff (research/services), publishing data on the web

e.g. Chemical Entities of Biological Interest (ChEBI)

Free database/ontology of 500,000 small molecules (many drugs)

The Wellcome Trust Sanger Institute

• The Sanger Institute is a world-leading genome research institute that uses DNA sequencing to further the understanding of gene function in health and disease. It is funded by a charity (The Wellcome Trust)

• From THE human genome ten years ago to 1000 genomes today (2010)

• More Bio than Informatics (cf. EBI), with a progressive approach to Web 2.0, e.g.

• Daub, J., et al. (2008). The RNA WikiProject: community annotation of RNA families. RNA 14 (12), 2462-2464. DOI:10.1261/rna.1200508

• http://en.wikipedia.org/wiki/Wikipedia:WikiProject_RNA

~900 Sanger staff (total)

Shared Library

More later

Annual Journal subscription budget

£500,000

(modest compared to the multi-million-pound journal budgets of university libraries)

“People respond to incentives, although not necessarily in ways that are predictable and manifest.

Therefore, one of the most powerful laws in the universe is the law of unintended consequences. This applies to schoolteachers and Realtors and crack dealers as well as expectant mothers, sumo wrestlers, bagel salesmen, and the Ku Klux Klan…”

…and scientists too…

Unintended consequences, an example

• Incentive: “publish or perish”

• Publications are rewarded with recognition, hiring, promotion, tenure, fame, funding, fortune, prizes, job satisfaction etc.

• Unintended consequences:

• Valuable data gets damaged, destroyed or “buried” (see later)

• Inaccessible to data and text mining on the Web

• Copyright and toll-access journals

• Luddite scientists

• Minimal exploitation of social software for sharing data

• Minimal exploitation of Web 2.0 for sharing data

Why bury it [data] first and then mine it again?

Which gene did you mean?

BMC Bioinformatics. 2005 Jun 7;6:142

DOI:10.1186/1471-2105-6-142

Barend Mons, Wikiproteins http://proteins.wikiprofessional.org

• Gene names: e.g. Hexokinase, HK1, HK2, HK3

• Protein names: e.g. Hexokinase, HK1, HK2, HK3

• Chemical names: e.g. Glucose-6-phosphate, G6P, Glu, Gluc

• Author names: e.g. Mark Baker (see next slide)

• Poor precision and recall
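The "poor precision and recall" point can be made concrete with a small worked sketch. It assumes a literature search on an ambiguous gene symbol; the article identifiers and counts are invented for illustration and do not come from the talk.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Return (precision, recall) for one search result set."""
    true_positives = len(retrieved & relevant)
    # Precision: what fraction of the hits were actually wanted?
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: what fraction of the wanted articles were found?
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical article IDs: four articles are truly about the gene of interest.
relevant = {"p1", "p2", "p3", "p4"}
# A search on an ambiguous symbol finds two of them plus three unrelated hits.
retrieved = {"p1", "p2", "p7", "p8", "p9"}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Ambiguous names hurt both numbers at once: unrelated hits that share a symbol lower precision, while relevant papers that use a different synonym are missed and lower recall.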

Identity crisis: Mark Baker

http://pubmed.gov?term=Baker+M[author]

http://pubmed.gov?term=Mark+Baker[author]

etc.

Until we have unique author identifiers, it is difficult or impossible to reliably find the papers published by a particular person

Open Researcher and Contributor ID http://orcid.org

“Tell me whenever Mark Baker publishes a paper”
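A minimal sketch of why the name alone is not an identifier: it assumes author names are reduced to a "family name plus initials" key, similar in spirit to the "Baker M" search above. The function name and every person other than Mark Baker are hypothetical, invented purely for illustration.

```python
from collections import defaultdict


def initials_key(full_name: str) -> str:
    """Collapse 'Given [Middle] Family' into 'Family' plus initials, e.g. 'Baker M'."""
    parts = full_name.split()
    family, given = parts[-1], parts[:-1]
    initials = "".join(name[0].upper() for name in given)
    return f"{family} {initials}"


# Hypothetical authors (only 'Mark Baker' appears in the talk).
people = ["Mark Baker", "Mary Baker", "Michael Baker", "Anna Smith"]

# Group the distinct people by the key a name-based search would use.
by_key = defaultdict(list)
for person in people:
    by_key[initials_key(person)].append(person)

# Keys shared by more than one person cannot be resolved by name alone.
ambiguous = {key: names for key, names in by_key.items() if len(names) > 1}
print(ambiguous)
```

Three different people collapse onto one key here, which is the gap a unique author identifier such as an ORCID is meant to close.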

Social information (need identity for this)

• Socialisation (escience > “we-science”)

• How many other people have read this paper?

• What are my friends / enemies reading?

• What other papers did they also read?

• Personalisation (escience > “me-science”)

• These are my publications

• This is my bibliography (stuff I’m reading / have read)

• Digital libraries are “document-centred” rather than “people-centred”
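The social questions above ("How many other people have read this paper?", "What other papers did they also read?") can be sketched once each library is tied to an identified person. The user names, article IDs and function names below are hypothetical, and this is only an illustration of the idea, not how citeulike or any other service implements it.

```python
from collections import Counter

# Hypothetical personal libraries: user -> set of article IDs they have bookmarked.
libraries = {
    "alice": {"a1", "a2", "a3"},
    "bob": {"a1", "a3", "a4"},
    "carol": {"a1", "a4"},
    "dave": {"a2"},
}


def reader_count(article: str, libraries: dict) -> int:
    """How many people have this article in their library?"""
    return sum(article in papers for papers in libraries.values())


def also_read(article: str, libraries: dict) -> list:
    """Other articles held by readers of `article`, most shared first."""
    counts = Counter()
    for papers in libraries.values():
        if article in papers:
            counts.update(papers - {article})
    # Sort by descending count, then by ID, so ties come out in a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


print(reader_count("a1", libraries))
print(also_read("a1", libraries))
```

Both functions start from people and their libraries rather than from documents, which is the "people-centred" view the slide contrasts with document-centred digital libraries.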

Torvik, V. I. and N. R. Smalheiser (2009). Author name disambiguation in MEDLINE. ACM Trans. Knowl. Discov. Data 3 (3), 1-29. DOI:10.1145/1552303.1552304

A solution, citeulike.org?

• http://www.citeulike.org

• Lack of personalisation of library data

• Lack of socialisation of library data

• Works a lot like http://www.delicious.com

Click Post to Citeulike

Tag it (optional) e.g. author tags

Journal Picks is a group of 40+ invited users on campus who select interesting papers

2,016 unique articles in Journal Picks (less than one year)

3,880,055 unique articles total

Citeulike + ZeitGeist = CiteGeist http://www.citeulike.org/citegeist

Citeulike incentives

• Selfish scientist (just organise my reference mess)

• What’s popular (interesting stuff via CiteGeist)

• Serendipity (find papers you wouldn’t find normally)

• Increase visibility and PageRank of papers?

• Person-centred access points into the first / second page of Google results, e.g. http://www.google.com/search?q=carole+goble has the result below fairly high up the list:

http://www.citeulike.org/group/10570/tag/carole-goble

Citeulike disincentives

• Privacy, don’t want to share with rivals

• (but can make collections private)

• Citeulike might go bust?

• But Springer sponsored

• Parsers are fragile

• easily (and deliberately) broken by publishers

• Valuable data in the hands of a commercial company?

• But Facebook? LinkedIn? Twitter etc?

• No academic reward for using it

• publication = “finished”

• Social software works best with network effects

• There are LOTS of other tools that do this…

And the rest…

www.refworks.com

www.zotero.org

www.hubmed.org

www.mendeley.com “Last.fm of research”

www.connotea.org

www.mekentosj.com “iTunes for PDF files”

Giant corporate commercial competitors

• With significant vested financial interests

• Scopus http://www.scopus.com/

• ISI WOK http://isiknowledge.com

Wrote a review of these systems: Hull, D., S. R. Pettifer, and D. B. Kell (2008). Defrosting the digital library: Bibliographic tools for the next generation web. PLoS Comput Biol 4 (10), e1000204. DOI:10.1371/journal.pcbi.1000204

Conclusions

• “Publish or perish” has some unfortunate and unintended consequences in science

• Citeulike is an interesting Web 2.0 tool

• We’ve had some success using it (typical “long tail”)

• Weak incentives for use; many cultural barriers to adoption

• Technical barriers to adoption: many tools, messy data

• Future work

• Social network analysis, clickthroughs, tag analysis

• Any other ideas…

• But the times they are a changin’

• Citeulike or something like it will work much better if/when “publishing” incentives change over time…

Acknowledgements

• Mark Baker for organising this workshop

• EBI, Christoph Steinbeck (laboratory head)

• Carole Goble, University of Manchester

• The Sanger, Alex Bateman, Frances Martin, Tim Hubbard and all the contributors to the Journal Picks group

• Richard Cameron, Kevin Emamy and the rest of the citeulike team

• BBSRC for funding

• Any questions?
