Science and the Web
Simon Cockell (@sjcockell)
Bioinformatics Support Unit (@nclbsu)
http://bsu.ncl.ac.uk/
simon.cockell@newcastle.ac.uk

In the bad old days
• There was the internet
• But there was no web
  • No Facebook
  • No iPlayer
  • No Spotify
  • No Google
• But we still managed to use it for awesome science
• Yes, this really was the view of the world’s sequence databases that we used to have!

But we had social resources
• Email newsgroups
• MUDs (kind of like a text-only Second Life)
• Usenet Newsgroups
  • A kind of forerunner of today's message boards
  • Often unmoderated
  • Full of spam
  • All the science stuff of interest was under bionet.* or sci.*

How the web was won
• The internet has been designed by scientists for scientists pretty much since inception
• The internet ‘as we know it today’ arrived in the 1990s
  • OK that’s just not true, but it’s what most people associate with the internet
  • Aka ‘The Web’

And this WAS about science
• (Sir) Tim Berners-Lee is a scientist
• The World-Wide Web (W3) was developed to be a pool of human knowledge, which would allow collaborators in remote sites to share their ideas and all aspects of a common project
• Data and pages could be linked together for the first time with images, sounds, text
• As the data grew, it needed to be searched
• Search engines collated the data

And so Web 1.0 began
• We got used to:
  • Online shopping
  • Having a homepage
  • Getting our news online
  • Guestbooks on websites
  • Animated GIFs
  • Wonderful colour schemes:

So wtf is Web 2.0?
• 20% marketing nonsense to attract investors
• 20% buzzwords designed to annoy old people
  • ‘mashup’ etc
• 10% missing vowels & capitalisation
  • flickr, tumblr etc
• 50% useful
  • Specifically refers to ‘user generated content’
  • High availability and access to data
  • Networks of people and network effects
  • Openness
  • Tapping into the ‘wisdom of the crowds’

Concepts
• One of the social concepts most people are familiar with now is tagging:
• And the ubiquitous upvote:

How can we take this out of Facebook and into science?

Tagging is very useful
• Search and retrieval tool
• Can be applied to just about anything
  • Browser bookmarks
  • Uploaded YouTube videos
  • Academic Papers
  • Uploaded PowerPoint presentations
• And most importantly they can be SHARED with other users
• We call these tags ‘folksonomies’
  • “a system of classification derived from the practice and method of collaboratively creating and managing tags to annotate and categorize content”

Social bookmarking
• If you’re going to share your folksonomies, why not share your bookmarks too?
• Maybe you don’t want to share everything you have bookmarked in your browser…
• But what about the stuff that’s related to work?
• Many services exist for this, but the one that most people know about is http://delicious.com/
• See also:
  • diigo.com
  • pinterest.com

delicious.com

Scientific Literature
• Scientific papers are as amenable to bookmarking as websites
• Needs specialized service
http://www.nature.com/nm/journal/vaop/ncurrent/full/nm.2692.html
http://pubmed.org/22522561
http://dx.doi.org/10.1038/nm.2692

CiteULike
• A social bookmarking service for scientific literature
• Exactly like Delicious, but understands that different URLs can point to the same resource
• Allows you to exploit “network effects”
  • Someone who reads lots of the same papers you do is probably interested in the same things
  • They may have papers in their library you don’t know about
  • You may be interested in those papers too

CiteULike

Mendeley
• Similar service to CiteULike
  • Bookmark papers
  • Define folksonomy
  • Discover related literature
• Also –
  • Manages PDFs
  • Manages citations (plugins for Word etc)
See also:
Zotero (http://zotero.org)
Papers (http://mekentosj.com/papers)

Mendeley - App

Mendeley - Web

Mendeley - Citations

We have a data overload
• How can you define what is important?
• And how can you deal with it subsequently?

Knowledge Discovery
• No, not via Google
• One of the issues of information overload is having to go to multiple sites in order to collate the day-to-day information you might want to read
• This is already a solved problem
  • RSS
  • ‘Really Simple Syndication’

RSS

Who has RSS?
• You can have RSS feeds from
  • Journal publishers
  • Search results (PubMed)
  • Blogs
  • CiteULike, etc.
  • Actually just about any dynamically updated web resource.
• RSS is everywhere
• Pervasive, Useful, Centralised, Shareable

FOAF and Network Effects
• Already mentioned re: CiteULike
• FOAF = Friend Of A Friend
• Requires a network (obviously)
  • Can be of things (i.e. papers) or people
• There have been loads of attempts to build ‘Facebook for Scientists’
• Begs the question – how about Facebook?
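The network effect described for CiteULike and FOAF (someone who reads the same papers as you probably has others you would like) can be sketched in a few lines of Python. This is only an illustration, not how CiteULike or Mendeley actually work: the Jaccard overlap score, the function names, the `min_overlap` threshold and the example libraries are all assumptions made for the sketch.

```python
def jaccard(a, b):
    """Overlap between two sets: |a & b| / |a | b| (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_papers(me, libraries, min_overlap=0.25):
    """Suggest papers held by users whose libraries overlap with `me`'s.

    `libraries` maps a user name to the set of paper identifiers in that
    user's library. Returns (paper, score) pairs, best first.
    """
    mine = libraries[me]
    scores = {}
    for user, papers in libraries.items():
        if user == me:
            continue
        similarity = jaccard(mine, papers)
        if similarity < min_overlap:
            continue  # this reader's interests are too different from mine
        # Papers this similar reader has that are not in my library yet.
        for paper in papers - mine:
            scores[paper] = scores.get(paper, 0.0) + similarity
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical libraries; the identifiers are made up for illustration.
libraries = {
    "alice": {"doi:A", "doi:B", "doi:C"},
    "bob": {"doi:A", "doi:B", "doi:D"},
    "carol": {"doi:X", "doi:Y"},
}
suggestions = suggest_papers("alice", libraries)
```

Here "bob" shares two of the four papers he and "alice" hold between them, so his remaining paper is suggested to her, while "carol" shares nothing and contributes no suggestions. Note that this only works if each paper has one identifier, which is why a service that understands that different URLs point to the same resource matters.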
On the (awesome) power of Twitter

Q&A
• Concept dates back to internet dark ages
  • ‘Usenet’ groups, revolved around specific subject areas
• With the birth of the web, ‘forums’ proliferated
  • Usability nightmare, hard to navigate, harder to search
• Then came expertsexchange.com
  • That’s Experts Exchange dot com, not Expert Sex Change dot com
  • Pay people to answer specific queries
  • Results in *bad* content (people care about the money, not the content)
• Then came stackoverflow.com...

Stack Overflow

Stack Exchange
biology.stackexchange.com
academia.stackexchange.com

An example - BioStar

Half an hour later...

Online Collaboration (crowd sourcing)

Sharing your work
• Everything so far has been about sharing knowledge
  • Things you read
  • Things you’ve seen
  • Things you know
• Some people go further and share all their work
  • So-called ‘Open Science’
• While this is an extreme, it can have benefits
• Some aspects of open science are becoming mainstream
  • Open Access publishing
  • Data sharing (now required by many funding bodies)

#arseniclife
If this data was presented by a PhD student at their committee meeting, I'd send them back to the bench to do more cleanup and controls. #arseniclife

Open Access publishing
• Your work is not behind a 'paywall'
  • It can be very frustrating trying to get hold of papers from journals or publishers that the University does not have a site licence for
• Your work can be more widely read
  • Hard to argue that this is not desirable!
• The full text of your article is preserved and can even be analysed computationally to derive even more knowledge

Increasing pressure to share data
BBSRC expects research data generated as a result of BBSRC support to be made available with as few restrictions as possible in a timely and responsible manner to the scientific community for subsequent research.
Applicants should make use of existing standards for data collection and management and make data available through existing community resources or databases where possible.
http://www.bbsrc.ac.uk/organisation/policies/position/policy/data-sharing-policy.aspx

Data Sharing
• Many services for specific data types
  • GEO/ArrayExpress for microarrays
  • PRIDE for proteomics
  • SRA for high-throughput sequencing
• Also now facilities to share more generic research data:
  • FigShare

FigShare

Why share?
• Well you might as well ask
  • Why go to a conference?
  • “At a conference the most important things happen in the coffee break” – Hans Ulrich Obrist
  • Why talk to colleagues?
• The internet doesn’t have to be a distraction, it can be an extension of your peer group. A place to find and exchange relevant information, build your profile and do your job more efficiently

Your online “brand”
• In this day and age a future employer is going to Google you
• Own what Google “says” about you
• What you say and do in public forums is a matter of public record - it’s ok to engage and discuss, but be wary of getting mad
• Be consistent and professional in your online ‘persona’
• You might want to have a look at what your Facebook profile exposes to the world - it may be more than you think
• LinkedIn is a great place for an online CV and networking

Things we haven’t had time for...
• Loads more useful sites and services:
  • Nature Network
  • OpenWetWare
  • LinkedIn
  • SlideShare
  • Prezi
  • BioScreenCast
  • Google Plus/Groups/Docs
  • DropBox
  • ResearchBlogging