Subjects, Objects and Blogjects
A Data Centric Approach
Jeremy G. Frey
School of Chemistry, University of Southampton

Seek improved support for collaborative research groups
Many Individual Researchers
Peer Review within research group
User Generated Content
Peer Review

Traditionally the wider scale collaboration is mediated over space and time via the literature
Publication
Validation over time
Literature

Long Tail Science

Tablet of Stone
Permanent? Readable? Context? Useful? How long?

Recording & Exchanging Information
Context
The researcher's context is always much more complex, depending on discipline, culture and many other factors
Data item
In scientific experiments we strive to make the context of the data unambiguous, but in other areas of research this is not possible

Scientific Data from many experiments
Many things cannot be recorded directly but only via instrumentation and computation

Problem 1
The Data Deluge
The Semantic Web community is linking the data, but how is this communicated through the signs in the interface to the mind of the scientist?

Problem 2
Maintaining & Communicating context

Is a big IT system the solution?
OneNote + Tablet PC
Integrated Centralised System
Searchable up to a point

HT-Project
• High Throughput Chemistry
• Combined spectroscopic and reaction measurements at several sites
• Need management structure to be able to assign access
• Need to track samples and metadata about the samples and the experimental conditions

Oct 2007 Jeremy G. Frey, University of Southampton E-Science

Roles
• Project leader
• Researcher on the project
• Instrument Scientist
• Commission the array sample
• Make the array
• Characterize the array
• Make measurements on the array
• Analysis of data

Repository
• Produce a robust secure repository
• Based on database and DBMS
• Distributed access and role based access control
• Relatively easy upload and extraction of conditions
• Search based on samples, conditions, people, dates etc

But...
• No real system to discuss the data

“The internet wasn't created for mockery! It was created so scientists from different universities could share datasets....”
Simpson, H. The Simpsons (2005), Eds. Groening, M., Brooks, J.L. & Simon, S., Series 16, Episode 8, Original air date (US) 06-Feb-2005.
http://www.tvtome.com/tvtome/servlet/GuidePageServlet/showid146/epid-346864/

The CombeChem Project
• ‘End to End’ linking
  – Data (life-)cycle
• Do things ‘right’ at the start
  – Make sure the metadata is of high quality
  – Record properly at source in Digital Form
• Extensive provenance
  – Publication@Source
• The Chemistry Lab
  – People & Machines working together

Data on Paper
Data on Computer
[Diagram labels: Word TeX Paper PDF HTML Versatile PDF XML Web Semantic Web Semantic Electronic Notebook Semantic Tags Web 2.0 Laboratory Blog Book Tags]

Electronic Laboratory Notebooks
Permanent, documented and primary record of laboratory observations
Observations are never collected on note pads, filter paper or other temporary paper for later transfer into a notebook
If you are caught using the “scrap of paper” technique, your improperly recorded data may be confiscated by your TA

Data Sharing
Excerpted from the Onion: The Recording Industry Association of America announced Tuesday that it will be taking legal action against anyone discovered telling friends, acquaintances, or associates about new songs, artists, or albums. "We are merely exercising our right to defend our intellectual properties from unauthorized peer-to-peer notification of the existence of copyrighted material."
COSHH
Leverage off things we already have to do
“We have a cunning plan”

Electronic Laboratory Notebooks
meta
He is charged with expressing contempt for meta-data

[Diagram: plan and process record for one experiment. Combechem, 30 January 2004, gvh, hrm, gms]

Ingredient List
• Fluorinated biphenyl: 0.9 g
• Br11OCB: 1.59 g
• Potassium Carbonate: 2.07 g
• Butanone: 40 ml

Plan (To Do List)
• Dissolve 4-fluorinated biphenyl in butanone
• Add K2CO3 powder
• Heat at reflux for 1.5 hours
• Cool and add Br11OCB
• Heat at reflux until completion
• Cool and add water (30 ml)
• Extract with DCM (3 x 40 ml)
• Combine organics, dry over MgSO4 & filter
• Remove solvent in vacuo
• Fuse compound to silica & column in ether/petrol

Process Record
• Process steps shown: Add, Reflux, Cool, Liquid-liquid extraction, Dry, Filter (Buchner), Remove Solvent by Rotary Evaporation, Fuse, Column Chromatography
• Observation steps shown: Weigh, Measure, Annotate
• Weigh: sample of 4-fluorinated biphenyl, 0.9031 grammes
• Measure: butanone, 40 ml
• Annotate: Butanone dried via silica column and measured into 100 ml RB flask. Used 1 ml extra solvent to wash out container.
• Weigh: sample of K2CO3 powder, 2.0719 g
• Weigh: sample of Br11OCB, 1.5918 g
• Annotate: Started reflux at 13.30. (Had to change heater stirrer) Only reflux for 45 min, next step 14:15.
• Measure: water, 30 ml
• Annotate: Inorganics dissolve 2 layers. Added brine ~20 ml.
• Measure: DCM, 3 of 40 ml
• Annotate: Organics are yellow solution
• Annotate: Washed MgSO4 with DCM ~50 ml
• Other labels: Silica (g, excess), Ether/Petrol Ratio, image

Key
• Process, Input, Literal, Observation

Observation Types
• weight - grammes
• measure - ml, drops
• annotate - text
• temperature - K, °C

Future Questions
• Whether to have many subclasses of processes or fewer with annotations
• How to depict destructive processes
• How to depict taking lots of samples
• What is the observation/process boundary? e.g. MRI scan

Smart Tea Project
User Centred Design, Design by Analogy to ensure the correct information is captured simply and easily.
Simple Context Sensitive Interfaces

RDF plan/experiment model
• split experiment plan and instance
  – allow myExperiment to create plans and moreTea to create instances
• represent materials and steps as explicit instances and then reference them in a linked list
  – allow process chain patterns that can be semantically searched based on RDF graph
• associate URLs with experiments
  – allow future smart lab linkage to hardware reports / networked data stores / sensor reports / documents / blogs

moreTea
• moreTea project scope
  – Electronic Laboratory Notebook (ELN)
  – Help Chemists plan experiments [outside lab]
    • Specify experiment procedure
    • Plan materials needed (type, quantity)
    • COSHH risk assessment
  – Help Chemists record observations [inside lab]
    • Display experiment plan as a reminder
    • Record materials used (type, measurement)
    • Record observations (hand drawn)
  – Help Chemists write up and share experiments [outside lab]
    • Print experiments for cut and paste into paper lab books
    • Share electronically with other Chemists
    • Backup in a database
• RDF model and triple store
  – MySQL database with a table of RDF triples
  – JDBC connection using the Jena toolkit
  – RDF graph for experiments
    • Experiment
      » properties
      » materials (list of material URIs)
      » procedure first step
    • Material
      » properties
    • Procedure step
      » properties
      » material list
      » description of process
      » procedure next step (URI)

Example experiment
urn:moretea:experiments:1
  created = 2007-11-10T12:54:08.566Z
  description = 3,3',9-triethylbenzothiacarbocyanine Iodide
  owner = CN=Jeremy Frey, OU=Chemistry, O=University of Southampton, L=Southampton, ST=Hampshire, C=UK
  material list = urn:moretea:material:1, urn:moretea:material:2
  procedure first step = urn:moretea:step:1

Example materials
urn:moretea:material:1
  created = 2007-11-10T12:54:08.566Z
  description = 3-ethyl-2-methylbenzothiazolium iodide
  amount planned = 0.52
  amount used = 0.52
  coshh notes = R36/37/38: Irritant by all routes
urn:moretea:material:2
  created = 2007-11-10T12:54:08.566Z
  description = Triethyl propinate
  amount planned = 0.85
  amount used = 1.0
  coshh notes = R10, 20/21/22: Flammable, Irritant by all routes.

Example steps
urn:moretea:step:1
  created = 2007-12-11T12:54:08.968Z
  instructions = Purge glassware with N2
  material list =
  complete = true
  observation-notes = http://uri-to-image-store
  next-step = urn:moretea:step:2
urn:moretea:step:2
  created = 2007-12-11T12:57:12.968Z
  instructions = Add thiazolium salt and heat to reflux
  material list = urn:moretea:material:2 …
  complete = true
  observation-notes = thiazolium dissolved readily. RM turned purple
  next-step = urn:moretea:step:3
…

Tea RDF
• image store ID
• hardware device ID
• web URL
• …

moreTea Architecture

moreTea Security
• Security technology
  – GRIA service-oriented architecture
    • Developed by IT Innovation, open source LGPL
  – Standard SAFE-compatible security
    • PKI, X.509, HTTPS
  – Role-based Access control
    • SAML token support
    • Experiments have group access rights – [ Owner, Read, Read/Write ]
    • Groups – [ Organization, Dept, Project ]
    • Chemists belong to groups
  – Signed PDF experiment documents

Validation
• Increasing the value of data
• How to bring all the necessary information together to enable appropriate validation
• Increasingly difficult & expensive to achieve
Need provenance and context, otherwise it is just a collection of items

Quantities, Symbols and Units
Have demonstrated how to handle units within the semantic web framework (RDF)

Data Sharing: How to get started
Who can we call?

Plans
• Plans in advance are useful
• This is the way things are supposed to be done
• The Plan provides a digital context so increases the value of planning
• Key to our ‘Smart Lab’ approach….
• But is it the best way?

Laboratory “Blogs”
• Laboratory notebook is a Blog
• Encourage and facilitate collaboration
• Flexible
• Need data repositories behind the Blog
  – R4L
  – E-Bank
• Web 2.0 but not too many people

Lab Blog: How to get started
Who can we call?
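The linked-list procedure model from the RDF plan/experiment slides (an experiment points at a first step, and each step points at its next step) can be illustrated with a small sketch. This is not the moreTea implementation, which the slides say uses Jena over a MySQL triple table. It is a plain-Python illustration under stated assumptions: triples are held as tuples in memory, and the predicate names `procedure-first-step`, `next-step`, `instructions` and `material-list` are hypothetical labels modelled on the property names in the example, as are the helper function names. Step 3 is left out because its content is elided in the slides.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical predicate names, modelled on the property labels in the slides.
FIRST_STEP = "procedure-first-step"
NEXT_STEP = "next-step"
MATERIAL_LIST = "material-list"

# A few triples taken from the example experiment (in-memory stand-in for a triple store).
TRIPLES: List[Triple] = [
    ("urn:moretea:experiments:1", FIRST_STEP, "urn:moretea:step:1"),
    ("urn:moretea:step:1", "instructions", "Purge glassware with N2"),
    ("urn:moretea:step:1", NEXT_STEP, "urn:moretea:step:2"),
    ("urn:moretea:step:2", "instructions", "Add salt and heat to reflux"),
    ("urn:moretea:step:2", MATERIAL_LIST, "urn:moretea:material:2"),
]


def objects(triples: List[Triple], subject: str, predicate: str) -> List[str]:
    """Return every object of triples matching (subject, predicate, ?)."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def walk_procedure(triples: List[Triple], experiment: str) -> List[str]:
    """Follow first-step / next-step links and return the step URIs in order."""
    steps: List[str] = []
    seen = set()
    current = objects(triples, experiment, FIRST_STEP)
    # Stop at the end of the chain, or if a step repeats (guards against cycles).
    while current and current[0] not in seen:
        step = current[0]
        seen.add(step)
        steps.append(step)
        current = objects(triples, step, NEXT_STEP)
    return steps


def steps_using(triples: List[Triple], material: str) -> List[str]:
    """Find steps whose material list references the given material URI."""
    return [s for s, p, o in triples if p == MATERIAL_LIST and o == material]
```

Because materials and steps are explicit instances with their own URIs, a question such as "which steps used material 2" becomes a simple pattern match over the graph, which is the kind of process-chain search the slides describe.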
Implementation of e-lab book
• Blog based format
• Purpose built engine
• Fully flexible system with arbitrary metadata
• Full record of changes (not currently easily accessible)
http://chemtools.chem.soton.ac.uk/projects/blog/

“Bio Blogs”
http://blogs.openwetware.org/scienceintheopen

Discussion

Implementation of e-lab book
• One post, one item approach
• Procedures can be tracked back to starting materials (or forwards to products) by clicking through
• Aim to ultimately be interpretable by machine and human

Templates

LIVECOP LINK
<METADATA>
<TITLE>album09 jrh4880_19_competent_transformation_from_ligation</TITLE>
<SIZE_X>1300</SIZE_X>
<SIZE_Y>1026</SIZE_Y>
<THUMB_SRC>http://imgstore.chem.soton.ac.uk/albums/album09/jrh4880_19_competent_transformation_from_ligation.thumb.jpg</THUMB_SRC>
<PREVIEW_SRC>http://imgstore.chem.soton.ac.uk/albums/album09/jrh4880_19_competent_transformation_from_ligation.sized.jpg</PREVIEW_SRC>
<PICTURE_URL>http://imgstore.chem.soton.ac.uk/album09/jrh4880_19_competent_transformation_from_ligation</PICTURE_URL>
</METADATA>

Link to objects

Issues
• The Physical World
• Safety documentation
• Patent/IP – sign-off
• Trust
• Will computers survive in the laboratory?
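The <METADATA> record on the Templates slide is an example of blog content meant to be readable by machine as well as by people. As a rough sketch of the machine side, the record can be parsed with Python's standard `xml.etree.ElementTree` module. The function name `parse_image_metadata` and the output key names are assumptions made for this illustration; the blog engine's own handling of these records is not described in the slides.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Union

# The image record from the slide, with line-wrap spaces removed.
RECORD = """<METADATA>
<TITLE>album09 jrh4880_19_competent_transformation_from_ligation</TITLE>
<SIZE_X>1300</SIZE_X>
<SIZE_Y>1026</SIZE_Y>
<THUMB_SRC>http://imgstore.chem.soton.ac.uk/albums/album09/jrh4880_19_competent_transformation_from_ligation.thumb.jpg</THUMB_SRC>
<PREVIEW_SRC>http://imgstore.chem.soton.ac.uk/albums/album09/jrh4880_19_competent_transformation_from_ligation.sized.jpg</PREVIEW_SRC>
<PICTURE_URL>http://imgstore.chem.soton.ac.uk/album09/jrh4880_19_competent_transformation_from_ligation</PICTURE_URL>
</METADATA>"""


def parse_image_metadata(xml_text: str) -> Dict[str, Union[str, int]]:
    """Turn one <METADATA> record into a dictionary (key names are illustrative)."""
    root = ET.fromstring(xml_text)
    # Collect the text of each child element, keyed by its tag name.
    fields = {child.tag: (child.text or "").strip() for child in root}
    return {
        "title": fields["TITLE"],
        "width": int(fields["SIZE_X"]),
        "height": int(fields["SIZE_Y"]),
        "thumbnail": fields["THUMB_SRC"],
        "preview": fields["PREVIEW_SRC"],
        "picture": fields["PICTURE_URL"],
    }
```

A post that embeds such a record can then show the thumbnail to a human reader while still exposing the image size and source URL to software that follows the link.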
Remember we do have a physical world to keep in sync

Time Line View / An RDF graph of posts and links between them rendered using Welkin (simile.mit.edu/welkin)
Sortase Experiment
Map of the X-Ray Blog (comments not shown)

Impact on researchers
• Higher Quality Record
• Easier Collaboration
• Improved planning
• Improved discussions
• Efficiency gain in production of presentations/reports
• Change the nature of Professor/Student interactions

Meetings Blog
But we don’t usually blog the meetings

Influence on Meetings and Discussions
• Enable geographically / temporally separated discussions
• Meeting preparation much less of an imposition
• Posted material is discussed, comparison with older materials is easy
• Change from ‘can I look at your data’ to ‘have you seen my blog post’

Blog-jects
• Equipment becomes a first class member of the web
• Interacts well with Pub-Sub as items are attached to topics, and topics relate the Blog items
• With automation this evolves to a two-way communication
• Everything has a network connection – research equipment will catch up with the fridge & other commodity goods
• Live Copy essential

Comments and Annotation
A picture worth a thousand words!
Chemists like to sketch!

Ecology of Laboratory Web 2.0 and Semantic Web notebook tools
[Diagram labels: MyExperiment ? Lab Blog Semantic ELN]
[Diagram labels: MyExperiment Plans and Templates Lab Blog Data Data Repositories Processes Semantic ELN]

The ‘Scientific Blog’ to combine ELN & publication
Sharing Rich Media

Putting the idea of the book into action
Put your material for a book chapter out on the web and ask for comments and contributions!
How does (or could or should) this relate to the production of research papers?
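The Blog-jects slide describes equipment publishing items to topics, the blog relating items through those topics, and automation turning this into two-way communication. A toy in-process sketch of that pattern follows. It is only an illustration: the class names `Broker`, `LabBlog` and `Blogject`, the topic strings and the message fields are all hypothetical, and a real laboratory would use a networked publish/subscribe service rather than direct function calls.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[str, Dict[str, str]], None]


class Broker:
    """Minimal in-process publish/subscribe hub (stand-in for a networked service)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, item: Dict[str, str]) -> None:
        # Deliver the item to every handler registered for this topic.
        for handler in list(self._subscribers[topic]):
            handler(topic, item)


class LabBlog:
    """Collects published items as posts, keyed by the topic they arrived on."""

    def __init__(self, broker: Broker, topics: List[str]) -> None:
        self.posts: List[Dict[str, str]] = []
        for topic in topics:
            broker.subscribe(topic, self.on_item)

    def on_item(self, topic: str, item: Dict[str, str]) -> None:
        self.posts.append({"topic": topic, **item})


class Blogject:
    """A piece of equipment that both reports to topics and listens for commands."""

    def __init__(self, broker: Broker, name: str) -> None:
        self.broker = broker
        self.name = name
        self.commands: List[str] = []
        # Two-way communication: the device subscribes to its own command topic.
        broker.subscribe(f"commands/{name}", self.on_command)

    def on_command(self, topic: str, item: Dict[str, str]) -> None:
        self.commands.append(item["action"])

    def report(self, topic: str, text: str) -> None:
        self.broker.publish(topic, {"source": self.name, "text": text})
```

In use, a device would call `report("experiments/sortase", ...)` and the blog would pick the item up as a post under that topic, while anything publishing to `commands/<device name>` reaches the device itself.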
Separating Data from Interpretations: A crystallography example
Underlying data
Intellect & Interpretation
Access to ALL underlying data
eBank & eCrystals

Growing need for the global (virtual) equivalent of the “Tea Room”

Semantic Web
The Semantic Web is an extension of the current Web in which information and services are given well-defined meaning, better enabling computers and people to work in cooperation

Free the data! Free the services! Free the people!

These are the same people – if we can ‘talk’ to ourselves efficiently over time then that is a good start to be able to ‘talk’ to others
Information Providers
Information Consumers

Thanks
• RC UK, EPSRC, JISC for funding
• Colleagues and Students from the Schools of Chemistry, Electronics & Computer Science, Mathematics
• IBM, Microsoft
• www.combechem.org
• www.ecrystals.soton.ac.uk
• chemtools.chem.soton.ac.uk