The Changing Landscape of Scholarly Communication as it Relates to the Biosciences
Philip E. Bourne
University of California San Diego
[email protected]
www.sdsc.edu/pb
Keck Center Research Conference, October 29, 2009

Disclaimer
• I am neither an information scientist nor a computer scientist
• I got involved with the Public Library of Science (PLoS) and subsequently the promise of open access
• I co-founded a company, SciVee Inc., that is attempting to leverage the perceived changes in scholarly communication
• Every discipline is different – my views are broadly drawn from the biosciences

Scholarly Communication Group
• Can we improve the way science is disseminated and comprehended?
• Through openness can we increase the number of people interested in science?
Addressing these questions is made easier since this is a time of rapid change in a traditionally conservative STM market

Let's Start with a Few Drivers of Change

1. You Have Been Very Busy!
In the 5 minutes I have been talking so far ~50 papers have been indexed by PubMed
Drivers of Change

2. You Cannot Possibly Read a Fraction of the Papers You Should
Renear & Palmer 2009 Science 325:828-832

3. You Are Scanning More, Reading Less
Renear & Palmer 2009 Science 325:828-832

4. You place more emphasis on writing and less on reading, driven by blogs, H-factors…

5. The Internet has Changed Everything
• In 1993 there were very few electronic journals; by 2003 nearly all were on-line; by 2013 there will be little or no paper
• Traditional publishers have only really achieved an electronic print-like experience – the power of the medium is for the taking
• Web 2.0 has made us more open
• Web 3.0 will accelerate further change

What are the Responses to This Change?

Some Responses to Change (ack. Chicken and Egg Situation)
– STM publishers are worried
– Alternative business models have gained ground – open access, hybrid models, open review
– Scientific societies are worried
– In an electronic world, databases are becoming more like journals and journals are becoming more like databases
– New modes of knowledge and data access are gaining some ground
Responses to Change

Let's Start with Open Access
I Believe Open Access IF Broadly Accepted Could Profoundly Change Scholarly Discourse
It remains a big IF

Open Access: Taking Full Advantage of the Content
PLoS Comp. Biol. 2008 4(3) e1000037

Growth of PubMed Central
• Growing much more slowly than PubMed
• Compliance is an issue
Open Access

Open Access (Creative Commons License)
1. All published materials available on-line free to all (author pays model)
2. Unrestricted access to all published material in various formats, e.g. XML, provided attribution is given to the original author(s)
3. Copyright remains with the author

Assuming Open Access Takes Off What is Possible?

Mashups
Notion of traditional publications being associated with podcasts and video
www.scivee.tv
Mashups – www.scivee.tv

Pubcast – Video Integrated with the Full Text of the Paper

Pubcasts - A Unique Technology
Pubcasts - A Blend of Video, text, tables, figures, PowerPoints, comments, ratings…
ALL SYNCHRONIZED FOR RAPID LEARNING
Don’t understand what you are reading? Click and have the author pop-up and explain it!
See the scientists and the experiments behind the research papers and textbooks

Professional Profile
ICTP Trieste, December 2007

SciVee – Viral Projects
• Sweetwater School District
• “Postercasts”
• Science video competitions
• “CVcasts”

Postercasts

Assuming Open Access Takes Off What Else is Possible?

Semantic Tagging
Post Processing the Literature with BioLit
Nucleic Acids Research 2008 36(S2) W385-389
http://biolit.ucsd.edu

This is Literature Post-processing
Better to Get the Authors Involved
• Authors are the absolute experts on the content
• More effective distribution of labor
• Add metadata before the article enters the publishing process

Word 2007 Add-in for Authors
• Allows authors to add metadata as they write, before they submit the manuscript
• Authors are assisted by automated term recognition
  – OBO ontologies
  – Database IDs
• Metadata are embedded directly into the manuscript document via XML tags, OOXML format
  – Open
  – Machine-readable
• Open source, Microsoft Public License
http://www.codeplex.com/ucsdbiolit

Word 2007 Add-in
Example of What it Looks Like - Ontologies
• Inline Recognition, Highlighting, and Mark-up of Informative Terms
  – A recognized term will have a dotted, purple underline
  – Hovering generates a Smart Tag above the term
    • add mark-up for this term
    • ignore this term
    • view the term in the ontology browser
    • If a recognized term appears in more than one ontology, all instances of that term will be listed
  – Hovering over a marked-up term
    • option to apply mark-up to all recognized instances of term
    • stop recognizing a term
  – Pass ontology terms back to provider

Challenges
• Author use
  – Familiarity with ontologies, terms
  – Agreement between co-authors
• End-use of semantically enriched manuscript
• Need to combine with NLM XML standard

Challenges: Author Use
IF one or more publishers fast tracked a paper that had semantic markup I would argue it would catch on in no time

What are Other Responses to this Change?
Databases are becoming more like journals and journals are becoming more like databases
PLoS Comp. Biol. 2005 1(3) e34
Databases vs Journals

Journals are Becoming More Like Databases and Databases are Becoming More Like Journals
Electronic Supplements
Unstructured data are submitted as supplements
Biocuration
A great deal of money is spent extracting information from the literature to structure it in databases

Both are Under Stress
• PubMed contains 18,792,257 entries
• ~100,000 papers indexed per month
• In Feb 2009:
  – 67,406,898 interactive searches were done
  – 92,216,786 entries were viewed
• 1078 databases reported in NAR 2008
• MetaBase http://biodatabase.org reports 2,651 entries edited 12,587 times
Data as of April 14, 2009

• Journals have a pretty standardized interface
• Journals have a business model
• The quality is declining as numbers increase (?)
• Audience believes they are sustainable

• Efforts to make the interfaces different!
• Little attempt at a business model compared to the Web 2.0 world
• Quality is increasing (?)
• Not well sustained
PLoS Comp. Biol. 2008. 4(7): e1000136

• Read and write
• Web 2.0 influence, e.g. social bookmarking
• Read and write, e.g. Wikis
• New services, e.g. RESTful, widgets, semantic tagging
• Use of rich media
• Crowd review
• New metrics
• Use of rich media
• Crowd review emerging
PLoS Comp. Biol. 2008. 4(7): e1000136

If There is so Much Similarity Let's Do Another Mashup!
PLoS Comp. Biol. 2008. 4(7): e1000136

The Test Bed
http://www.plos.org/
http://www.pubmedcentral.nih.gov/
http://www.wwpdb.org/

The World Wide Protein Data Bank
http://www.wwpdb.org
• The single worldwide repository for data on the structure of biological macromolecules
• Free to all
• Paper not published unless data are deposited – strong data to literature correspondence
• Highly structured data conforming to an extensive ontology
• DOIs assigned to every structure

A Note in Passing
• Structural biologists have been fervent about making the data associated with their studies freely available
• For the most part they do not think the same way about the literature (knowledge) associated with the data – they hand it over without a second thought
• This latter point is true of scientists in general
• We will come back to this

The PLoS/PMC Corpus – Under the Hood
• Conforms well/partially to the NLM DTD – little markup of content
• PMC – some PDFs!
• The lack of conformance will come back to haunt us!

The Database View
Context
www.rcsb.org/pdb/explore/literature.do?structureId=1TIM

The Literature View – Web 3.0?
http://betastaging.rcsb.org

Take This Notion to its Logical Conclusion
Enter PLoS iStructure
An interactive journal

The Data – Knowledge Spectrum
[Figure: a spectrum from Data (Database) to Knowledge (Knowledgebase). Labels on the figure: Data Only; Data + Some Annotation; Data + Some Annotation + Some Integration; Data + Annotation; Annotation. Items shown: Wikis, Datapacks, Journals, PLoS iStructure.]

The Knowledge and Data Cycle
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB

The New Reader Workflow
1. User clicks on thumbnail
2. Metadata and a webservices call provide a renderable image that can be annotated
3. Selecting a feature provides a database/literature mashup
4. That leads to new papers

Take This Notion to its Logical Conclusion
Data Clustering via the Literature & Databases
[Figure labels: Cardiac Disease Literature; Immunology Literature; Shared Function]

Let Us Look Even Farther Into the Future
Consider the research contract

Today’s Academic Workflow
[Figure labels: Reviews; Curation; Feds; Research; [Grants]; Journal Article; Publishers; Poster Session; Conference Paper; Societies; Community Service/Data; Blogs]

Conclusion
• Scholarly output will come in more diverse forms, be solely in cyberspace, and should be uniquely attributable
• We need a DOI for people

DOIs for People
A Unique Identifier is Going to Happen
• Some scientists will resist
• The winner is not clear yet:
  – OpenID
  – MyBibliography
  – ResearcherID
  – ScopusID
  – CrossReg
I am Not a Scientist, I am a Number
PLoS Comp. Biol. 4(12) e1000247

What is the Role of the Publisher in this New World?
Consider first the relationship between scientist and publisher today

A paper when complete is thrown over a high wall to a publisher and essentially forgotten – Perhaps it is time to climb the wall?
Scientist and Publisher
uzar.wordpress.com

Publishers as a Contractor for All Aspects of Scholarly Output
[Figure labels: Scientist; Idea; Experiment; Data; Product]

Tomorrow’s Research Contract: Early Evidence
• Publisher hubs:
  – Elsevier portals
  – PLoS collections
• Data hubs
• Open Access/open review, e.g. Biology Direct
• NIH Roadmap requires data be accessible
• New Resources:
  – www.researchgate.net
  – Orwik

What Should We Be Doing As Scientists?
• Encourage open science with the realization that there must be a business model
• Examples:
  – Publish in OA forums
  – Deposit data and software in open forums
  – Care what happens after publication
The Final Word

Acknowledgements
• BioLit Team
  – Lynn Fink
  – Parker Williams
  – Marco Martinez
  – Rahul Chandran
  – Greg Quinn
• Microsoft Scholarly Communications
  – Pablo Fernicola
  – Lee Dirks
  – Savas Parastatidis
  – Alex Wade
  – Tony Hey
• wwPDB team
• SciVee Team
  – Apryl Bailey
  – Tim Beck
  – Leo Chalupa
  – Lynn Fink
  – Marc Friedman (CEO)
  – Ken Liu
  – Alex Ramos
  – Willy Suwanto
http://biolit.ucsd.edu
http://www.pdb.org
http://www.codeplex.com/ucsdbiolit
http://www.scivee.tv
[email protected]

Questions?