The Changing Landscape of Scholarly Communication as it Relates to the Biosciences

Philip E. Bourne
University of California San Diego
pbourne@ucsd.edu
www.sdsc.edu/pb
Keck Center Research Conference October 29, 2009
Disclaimer
• I am neither an information scientist nor a computer
scientist
• I got involved with the Public Library of
Science (PLoS) and subsequently the
promise of open access
• I co-founded a company, SciVee Inc., that
is attempting to leverage the perceived
changes in scholarly communication
• Every discipline is different – my views are
broadly drawn from the biosciences
Scholarly Communication
Group
• Can we improve the
way science is
disseminated and
comprehended?
• Through openness
can we increase the
number of people
interested in science?
Addressing these questions is
made easier since this is a
time of rapid change in a
traditionally conservative STM
market
Let's Start with a Few Drivers of
Change
1. You Have Been Very Busy!
In the 5 minutes I have been
talking so far ~50 papers have
been indexed by PubMed
Drivers of Change
2. You Cannot Possibly Read a
Fraction of the Papers You Should
Renear & Palmer 2009 Science 325:828-832
3. You Are Scanning More, Reading Less
Renear & Palmer 2009 Science 325:828-832
4. You place more emphasis
on writing and less on reading,
driven by blogs, H-factors…
5. The Internet has Changed
Everything
• In 1993 there were very few electronic journals;
by 2003 nearly all were on-line; by 2013 there
will be little or no paper
• Traditional publishers have only really achieved
a print-like electronic experience – the power of
the medium is there for the taking
• Web 2.0 has made us more open
• Web 3.0 will accelerate further change
What are the Responses to
This Change?
Some Responses to Change
(Acknowledging a Chicken-and-Egg Situation)
– STM publishers are worried
– Alternative business models have gained
ground – open access, hybrid models, open
review
– Scientific societies are worried
– In an electronic world, databases are
becoming more like journals and journals are
becoming more like databases
– New modes of knowledge and data access
are gaining some ground
Responses to Change
Let's Start with Open Access
I Believe Open Access, IF Broadly
Accepted, Could Profoundly Change
Scholarly Discourse
It remains a big IF
Open Access: Taking Full Advantage of the Content
PLoS Comp. Biol. 2008 4(3) e1000037
Growth of PubMed Central
• Growing much more slowly than PubMed
• Compliance is an issue
Open Access
Open Access
(Creative Commons License)
1. All published materials available on-line
free to all (author pays model)
2. Unrestricted access to all published
material in various formats, e.g., XML,
provided attribution is given to the
original author(s)
3. Copyright remains with the author
Open Access
(Creative Commons License)
1. All published materials available on-line
free to all (reader pays model)
2. Unrestricted access to all published
material in various formats, e.g., XML,
provided attribution is given to the
original author(s)
3. Copyright remains with the author
Open Access: Taking Full Advantage of the Content
PLoS Comp. Biol. 2008 4(3) e1000037
Assuming Open Access
Takes Off, What Is Possible?
Mashups
Notion of traditional publications being
associated with podcasts and video
www.scivee.tv
Mashups – www.scivee.tv
Pubcast – Video Integrated
with the Full Text of the Paper
Pubcasts - A Unique Technology
Pubcasts - A Blend of Video, text, tables,
figures, PowerPoints, comments, ratings…
ALL SYNCHRONIZED FOR RAPID LEARNING
Don’t understand what
you are reading?
Click and have the
author pop-up and
explain it!
See the scientists and
the experiments
behind the research
papers and textbooks
Professional Profile
ICTP Trieste, December 2007
SciVee – Viral Projects
• Sweetwater School District
• “Postercasts”
• Science video competitions
• “CVcasts”
Postercasts
Assuming Open Access Takes
Off, What Else Is Possible?
Semantic Tagging
Post Processing the Literature with BioLit
Nucleic Acids Research 2008 36(S2) W385-389
http://biolit.ucsd.edu
Semantic Tagging
ICTP Trieste, December 10, 2007
This is Literature Post-processing
Better to Get the Authors Involved
• Authors are the absolute experts on the
content
• More effective distribution of labor
• Add metadata before the article enters the
publishing process
Word 2007 Add-in for Authors
• Allows authors to add metadata as they write, before
they submit the manuscript
• Authors are assisted by automated term recognition
– OBO ontologies
– Database IDs
• Metadata are embedded directly into the manuscript
document via XML tags, OOXML format
– Open
– Machine-readable
• Open source, Microsoft Public License
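The idea of embedding metadata as XML around a recognized term can be pictured with a small sketch. It is not the add-in's OOXML schema: the `term` element, its `ontology` and `id` attributes, the `tag_text` helper and the one-entry dictionary are all hypothetical, chosen only to show a term being wrapped in open, machine-readable markup. (GO:0006915 is the Gene Ontology identifier for apoptotic process.)

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical term dictionary: surface form -> (ontology, identifier).
TERMS = {"apoptosis": ("GO", "GO:0006915")}
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, TERMS)) + r")\b", re.IGNORECASE)

def _append_text(parent: ET.Element, last, chunk: str) -> None:
    """Attach plain text to the parent (before any child) or to the tail of
    the most recent child, as ElementTree's mixed-content model requires."""
    if last is None:
        parent.text = (parent.text or "") + chunk
    else:
        last.tail = (last.tail or "") + chunk

def tag_text(text: str) -> ET.Element:
    """Return a <para> element in which each recognized term is wrapped in a
    hypothetical <term ontology="..." id="..."> element."""
    para = ET.Element("para")
    last = None  # most recently added <term> child
    pos = 0
    for match in PATTERN.finditer(text):
        _append_text(para, last, text[pos:match.start()])
        ontology, term_id = TERMS[match.group(1).lower()]
        last = ET.SubElement(para, "term", ontology=ontology, id=term_id)
        last.text = match.group(1)
        pos = match.end()
    _append_text(para, last, text[pos:])
    return para

# Prints the sentence with "apoptosis" wrapped in a <term> element.
print(ET.tostring(tag_text("Caspases drive apoptosis in these cells."), encoding="unicode"))
```

Because the markup travels inside the document itself, any downstream tool that reads XML can pick out the tagged terms without re-running term recognition.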
http://www.codeplex.com/ucsdbiolit
Word 2007 Add-in
Example of What it Looks Like - Ontologies
• Inline Recognition, Highlighting, and Mark-up
of Informative Terms
– A recognized term will have a dotted, purple underline
– Hovering generates a Smart Tag above the term
• add mark-up for this term
• ignore this term
• view the term in the ontology browser
• If a recognized term appears in more than one ontology, all
instances of that term will be listed
instances of that term will be listed
– Hovering over a marked-up term
• option to apply mark-up to all recognized instances of term
• stop recognizing a term
– Pass ontology terms back to provider
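Listing every ontology in which a recognized term appears amounts to inverting the ontologies into a single term index. The ontology names, identifiers and helper functions below are hypothetical placeholders, not real OBO content or the add-in's code, and the lookup handles single words only; real term recognition also has to cope with multi-word terms and synonyms.

```python
from collections import defaultdict

# Hypothetical ontology excerpts: ontology name -> {term: identifier}.
ONTOLOGIES = {
    "OntologyA": {"cell": "A:0001", "membrane": "A:0002"},
    "OntologyB": {"cell": "B:0100"},
}

def build_index(ontologies):
    """Invert ontology -> {term: id} into term -> [(ontology, id), ...]."""
    index = defaultdict(list)
    for name, terms in ontologies.items():
        for term, term_id in terms.items():
            index[term.lower()].append((name, term_id))
    return index

def recognize(text, index):
    """Return {term: [(ontology, id), ...]} for every indexed word in text,
    so a term found in several ontologies lists all of its matches."""
    hits = {}
    for raw in text.split():
        word = raw.strip(".,;:()").lower()
        if word in index:
            hits[word] = index[word]
    return hits

index = build_index(ONTOLOGIES)
print(recognize("The cell membrane was intact.", index))
# {'cell': [('OntologyA', 'A:0001'), ('OntologyB', 'B:0100')], 'membrane': [('OntologyA', 'A:0002')]}
```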
Challenges
• Author use
– Familiarity with ontologies, terms
– Agreement between co-authors
• End-use of semantically enriched
manuscript
• Need to combine with NLM XML standard
Challenges:
Author Use
IF one or more publishers fast-tracked
a paper that had semantic markup,
I would argue it would catch on
in no time
What are Other Responses to
this Change?
Databases are becoming more
like journals and journals are
becoming more like databases
PLoS Comp. Biol. 2005 1(3) e34
Databases vs Journals
Journals are Becoming More Like
Databases and Databases are Becoming
More like Journals
Electronic
Supplements
Unstructured data are
submitted as supplements
Biocuration
A great deal of money
is spent extracting information from the
literature to structure it in databases
Both are Under Stress
• PubMed contains
18,792,257 entries
• ~100,000 papers indexed
per month
• In Feb 2009:
– 67,406,898 interactive
searches were done
– 92,216,786 entries were
viewed
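The two PubMed figures in the talk, ~50 papers indexed during 5 minutes and ~100,000 papers per month, can be related with a little arithmetic. A minimal sketch, assuming a uniform indexing rate; the two calendars (a 30-day month around the clock versus 21 working days of 8 hours) are illustrative assumptions, not a description of how NLM schedules indexing.

```python
PAPERS_PER_MONTH = 100_000  # figure quoted in the talk

def papers_in_window(minutes: float, indexing_minutes_per_month: float) -> float:
    """Papers indexed in `minutes`, assuming a uniform rate over the month."""
    return PAPERS_PER_MONTH / indexing_minutes_per_month * minutes

ROUND_THE_CLOCK = 30 * 24 * 60  # assumption: 30-day month, 24 h/day = 43,200 min
WORKING_HOURS = 21 * 8 * 60     # assumption: 21 working days of 8 h = 10,080 min

print(f"{papers_in_window(5, ROUND_THE_CLOCK):.1f}")  # 11.6
print(f"{papers_in_window(5, WORKING_HOURS):.1f}")    # 49.6
```

Under the working-hours assumption the monthly volume comes out close to ~50 papers per 5 minutes; spread evenly around the clock it is nearer 12.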
• 1078 databases
reported in NAR 2008
• MetaBase
http://biodatabase.org
reports 2,651 entries
edited 12,587 times
Data as of April 14, 2009
Journals:
• Journals have a pretty standardized interface
• Journals have a business model
• The quality is declining as numbers increase (?)
• Audience believes they are sustainable
Databases:
• Efforts to make the interfaces different!
• Little attempt at a business model compared to the Web 2.0 world
• Quality is increasing (?)
• Not well sustained
PLoS Comp. Biol. 2008. 4(7): e1000136
• Read and write
• Web 2.0 influence, e.g., social bookmarking
• Read and write, e.g., Wikis
• New services, e.g., RESTful, widgets, semantic tagging
• Use of rich media
• Crowd review
• New metrics
• Use of rich media
• Crowd review emerging
PLoS Comp. Biol. 2008. 4(7): e1000136
If There Is So Much Similarity,
Let's Do Another Mashup!
PLoS Comp. Biol. 2008. 4(7): e1000136
The Test Bed
http://www.plos.org/
http://www.pubmedcentral.nih.gov/
http://www.wwpdb.org/
The World Wide Protein Data
Bank
http://www.wwpdb.org
• The single worldwide
repository for data on the
structure of biological
macromolecules
• Free to all
• Paper not published
unless data are deposited
– strong data to literature
correspondence
• Highly structured data
conforming to an
extensive ontology
• DOIs assigned to every
structure
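Assigning a DOI to every structure means a citation to the data can be resolved mechanically. A minimal sketch, assuming the `10.2210/pdbXXXX/pdb` pattern used for wwPDB entry DOIs and the public doi.org resolver; the helper names are hypothetical. DOI names are case-insensitive, so upper-casing the ID does not change what it resolves to.

```python
import re

# Classic four-character PDB IDs: a digit 1-9 followed by three alphanumerics.
PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")

def pdb_doi(pdb_id: str) -> str:
    """Build the DOI for a PDB entry, assuming the 10.2210/pdbXXXX/pdb pattern."""
    if not PDB_ID.fullmatch(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"10.2210/pdb{pdb_id.upper()}/pdb"

def doi_url(doi: str) -> str:
    """Turn a DOI into a URL on the public doi.org resolver."""
    return "https://doi.org/" + doi

print(doi_url(pdb_doi("1tim")))  # https://doi.org/10.2210/pdb1TIM/pdb
```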
A Note in Passing
• Structural biologists have been fervent about
making the data associated with their studies
freely available
• For the most part they do not think the same
way about the literature (knowledge) associated
with the data – they hand it over without a
second thought
• This latter point is true of scientists in general
• We will come back to this
The PLoS/PMC Corpus – Under
the Hood
• Conforms well/partially to the NLM DTD –
little markup of content
• PMC – some PDFs!
• The lack of conformance will come back to
haunt us!
The Database View
Context
www.rcsb.org/pdb/explore/literature.do?structureId=1TIM
The Literature View – Web 3.0?
http://betastaging.rcsb.org
Take This Notion to its Logical
Conclusion
Enter PLoS iStructure
An interactive journal
[Figure: The Data – Knowledge Spectrum, running from Data (Database) to Knowledge (Knowledgebase). Points along it: Data Only; Data + Some Annotation; Data + Some Annotation + Some Integration (PLoS iStructure); Data + Annotation; Annotation. Resources shown: Wikis, Datapacks, Journals]
The Knowledge and Data Cycle
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB, which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB
The New Reader
Workflow
1. User clicks on
thumbnail
2. Metadata and a
webservices call
provide a renderable
image that can be
annotated
3. Selecting a feature
provides a
database/literature
mashup
4. That leads to new
papers
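Steps 1 and 2 of the workflow can be sketched as a lookup from figure metadata to a structure ID, followed by building a service URL. As a stand-in for the unspecified web-services call, the sketch reuses the 2009-era RCSB literature-view address cited above (with an assumed `http://` scheme; it may no longer resolve), and the figure-to-structure mapping is hypothetical. It only builds the URL and makes no network request.

```python
from urllib.parse import urlencode

# Base address of the literature view cited in the talk (scheme assumed).
LITERATURE_VIEW = "http://www.rcsb.org/pdb/explore/literature.do"

# Hypothetical figure metadata: figure identifier -> PDB structure ID.
FIGURE_METADATA = {"fig1": "1TIM"}

def on_thumbnail_click(figure_id: str) -> str:
    """Look up the structure behind a clicked figure and return the URL of
    the service that would supply its database view."""
    structure_id = FIGURE_METADATA[figure_id]
    return f"{LITERATURE_VIEW}?{urlencode({'structureId': structure_id})}"

print(on_thumbnail_click("fig1"))
# http://www.rcsb.org/pdb/explore/literature.do?structureId=1TIM
```

Keeping the figure-to-structure link in metadata, rather than in the image itself, is what lets a reader move from a paper figure to live database content.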
Take This Notion to its Logical Conclusion
Data Clustering via the Literature &
Databases
[Figure: Cardiac Disease Literature and Immunology Literature, overlapping in Shared Function]
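The overlap in the figure can be read as a set intersection: collect the entities (for example, structures) mentioned in each body of literature and keep those common to both. This is a minimal sketch with made-up paper and entity labels; the talk does not say how the clustering was actually computed.

```python
from typing import Dict, Set

# Made-up corpora: paper label -> entities (e.g. structures) it mentions.
CARDIAC: Dict[str, Set[str]] = {"paper1": {"P1", "P2"}, "paper2": {"P2", "P3"}}
IMMUNOLOGY: Dict[str, Set[str]] = {"paper3": {"P3", "P4"}, "paper4": {"P5"}}

def entities(corpus: Dict[str, Set[str]]) -> Set[str]:
    """Union of every entity mentioned anywhere in a corpus."""
    return set().union(*corpus.values())

def shared_entities(a: Dict[str, Set[str]], b: Dict[str, Set[str]]) -> Set[str]:
    """Entities mentioned in both corpora: the overlap in the figure."""
    return entities(a) & entities(b)

print(sorted(shared_entities(CARDIAC, IMMUNOLOGY)))  # ['P3']
```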
Let Us Look Even Farther Into
the Future
Consider the research contract
Today’s Academic Workflow
[Figure: Research [Grants], Feds, Journal Article, Publishers, Poster Session, Conference Paper, Societies, Reviews, Curation, Community Service/Data, Blogs]
Conclusion
• Scholarly output will come in more diverse
forms, will exist solely in cyberspace, and
should be uniquely attributable
• We need a DOI for people
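A person identifier matters because outputs of very different kinds (papers, data sets, pubcasts, posters) can then hang off one key. The registry below is a hypothetical sketch of that data structure; the class, field names and placeholder identifiers are assumptions, not any of the identifier schemes named in the talk.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Output:
    """One piece of scholarly output, e.g. a paper, data set or pubcast."""
    kind: str
    identifier: str  # in practice a DOI; placeholders are used below

class AttributionRegistry:
    """Hypothetical registry: unique person identifier -> attributed outputs."""

    def __init__(self) -> None:
        self._by_person = defaultdict(list)

    def attribute(self, person_id: str, output: Output) -> None:
        """Record that `output` is attributable to `person_id`."""
        self._by_person[person_id].append(output)

    def outputs(self, person_id: str, kind: Optional[str] = None) -> List[Output]:
        """All outputs for a person, optionally filtered by kind."""
        items = self._by_person.get(person_id, [])
        return [o for o in items if kind is None or o.kind == kind]

registry = AttributionRegistry()
registry.attribute("person-0001", Output("paper", "example-paper-1"))
registry.attribute("person-0001", Output("dataset", "example-dataset-1"))
registry.attribute("person-0001", Output("pubcast", "example-video-1"))
print([o.kind for o in registry.outputs("person-0001")])
# ['paper', 'dataset', 'pubcast']
```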
DOIs for People
A Unique Identifier is Going to
Happen
• Some scientists will
resist
• The winner is not
clear yet:
– OpenID
– MyBibliography
– ResearcherID
– ScopusID
– CrossReg
I Am Not a Scientist, I Am a Number
PLoS Comp. Biol. 2008 4(12) e1000247
What is the Role of the
Publisher in this New World?
Consider first the relationship
between scientist and publisher
today
A paper, when complete, is thrown over a high wall to a publisher and essentially forgotten. Perhaps it is time to climb the wall?
Scientist and Publisher
uzar.wordpress.com
Publishers as a Contractor for All
Aspects of Scholarly Output
[Figure: Scientist; Idea; Experiment; Data; Product]
Tomorrow's Research Contract: Early
Evidence
• Publisher hubs:
– Elsevier portals
– PLoS collections
• Data hubs
• Open Access/open review e.g. Biology
Direct
• NIH Roadmap requires data be accessible
• New Resources:
– www.researchgate.net
– Orwik
The Final Word
What Should We Be Doing As Scientists?
• Encourage open science with the realization that there must be a business model
• Examples:
– Publish in OA forums
– Deposit data and software in open forums
– Care what happens after publication
Acknowledgements
• BioLit Team
– Lynn Fink
– Parker Williams
– Marco Martinez
– Rahul Chandran
– Greg Quinn
• Microsoft Scholarly Communications
– Pablo Fernicola
– Lee Dirks
– Savas Parastatidis
– Alex Wade
– Tony Hey
• wwPDB team
• SciVee Team
– Apryl Bailey
– Tim Beck
– Leo Chalupa
– Lynn Fink
– Marc Friedman (CEO)
– Ken Liu
– Alex Ramos
– Willy Suwanto
http://biolit.ucsd.edu
http://www.pdb.org
http://www.codeplex.com/ucsdbiolit
http://www.scivee.tv
pbourne@ucsd.edu
Questions?