Subjects, Objects and Blogjects A Data Centric Approach Jeremy G. Frey

Subjects, Objects and Blogjects
A Data Centric Approach
Jeremy G. Frey
School of Chemistry
University of Southampton
Seek improved
support for
collaborative
research
groups
Many Individual Researchers
Peer Review within research group
User Generated Content
Peer Review
Traditionally the
wider scale
collaboration is
mediated over space
and time via the
literature
Publication
Validation over time
Literature
Long Tail Science
Tablet of Stone
Permanent?
Readable?
Context?
Useful?
How long?
Recording & Exchanging Information
Context
The researcher's context is always much more
complex, depending on discipline, culture
and many other factors
Data
item
In scientific experiments we strive to make the
context of the data unambiguous but in other areas
of research this is not possible
Scientific Data
from many
experiments
Many things cannot be
recorded directly but
only via instrumentation
and computation
Problem 1
The Data Deluge
The Semantic Web community
is linking the data, but how is
this communicated through the
signs in the interface to the
mind of the scientist?
Problem 2
Maintaining &
Communicating context
Is a big IT system the
solution?
OneNote + Tablet PC
Integrated Centralised System
Searchable up to a point
HT-Project
• High Throughput Chemistry
• Combined spectroscopic and reaction
measurements at several sites
• Need management structure to be able to
assign access
• Need to track samples and metadata about
the samples and the experimental conditions
Oct 2007
Jeremy G. Frey, University of Southampton
E-Science
Roles
• Project leader
• Researcher on the project
• Instrument Scientist
• Commission the array sample
• Make the array
• Characterize the array
• Make measurements on the array
• Analysis of data
Repository
• Produce a robust secure repository
• Based on database and DBMS
• Distributed access and role based access
control
• Relatively easy upload and extraction of
conditions
• Search based on samples, conditions, people,
dates etc
But...
• No real system to discuss the data
“The internet wasn't created for mockery!
It was created so scientists from different
universities could share datasets....”
Simpson, H. The Simpsons (2005), Eds. Groening, M., Brooks, J.L. &
Simon, S., Series 16, Episode 8, Original air date (US) 06-Feb-2005.
http://www.tvtome.com/tvtome/servlet/GuidePageServlet/showid146/epid-346864/
The CombeChem Project
• ‘End to End’ linking
– Data (life-)cycle
• Do things ‘right’ at the start
– Make sure the metadata is of high
quality
– Record properly at source in
Digital Form
• Extensive provenance
– Publication@Source
• The Chemistry Lab
– People & Machines working
together
Data on Paper
Data on Computer
Word TeX
Paper
PDF
HTML
Versatile PDF
XML
Web
Semantic Web
Semantic Electronic Notebook
Semantic Tags
Web 2.0
Laboratory Blog Book
Tags
Electronic Laboratory Notebooks
Permanent,
documented
and primary
record of
laboratory
observations
Observations are never
collected on note pads,
filter paper or other
temporary paper for
later transfer into a
notebook
If you are caught using the
“scrap of paper” technique,
your improperly recorded data
may be confiscated by your TA
Data Sharing
Excerpted from the Onion:
The Recording Industry Association of America
announced Tuesday that it will be taking legal action
against anyone discovered telling friends, acquaintances,
or associates about new songs, artists, or albums.
"We are merely exercising our
right to defend our intellectual
properties from unauthorized
peer-to-peer notification of the
existence of copyrighted
material."
COSHH
Leverage off things we already have to
do
“We have a cunning plan”
Electronic
Laboratory
Notebooks
meta
He is charged with expressing contempt
for meta-data
Experiment plan and process record (diagram)
Ingredient List
• Fluorinated biphenyl: 0.9 g
• Br11OCB: 1.59 g
• Potassium carbonate: 2.07 g
• Butanone: 40 ml
Plan (To Do List)
• Dissolve 4-fluorinated biphenyl in butanone
• Add K2CO3 powder
• Heat at reflux for 1.5 hours
• Cool and add Br11OCB
• Heat at reflux until completion
• Cool and add water (30 ml)
• Extract with DCM (3 x 40 ml)
• Combine organics, dry over MgSO4 & filter
• Remove solvent in vacuo
• Fuse compound to silica & column in ether/petrol
Process Record (steps numbered 1 to 14)
• Processes: Weigh, Measure, Add, Reflux, Cool, Liquid-liquid extraction, Dry, Filter (Buchner), Remove solvent by rotary evaporation, Fuse, Column chromatography
• Inputs: sample of 4-fluorinated biphenyl, butanone, sample of K2CO3 powder, sample of Br11OCB, water, DCM, MgSO4, silica, ether/petrol
• Measured values: 0.9031 grammes, 40 ml, 2.0719 g, 1.5918 g, 30 ml, 3 of 40 ml, excess
• Annotations (text):
– Butanone dried via silica column and measured into 100 ml RB flask. Used 1 ml extra solvent to wash out container.
– Started reflux at 13.30. (Had to change heater stirrer.) Only reflux for 45 min, next step 14:15.
– Inorganics dissolve 2 layers. Added brine ~20 ml.
– Organics are yellow solution.
– Washed MgSO4 with DCM ~50 ml.
• Other observations: ether/petrol ratio (text), image
Key
• Node types: Process, Input, Literal, Observation
Observation Types
• weight: grammes
• measure: ml, drops
• annotate: text
• temperature: K, °C
Future Questions
• Whether to have many subclasses of processes or fewer with annotations
• How to depict destructive processes
• How to depict taking lots of samples
• What is the observation/process boundary? e.g. MRI scan
CombeChem, 30 January 2004, gvh, hrm, gms
Experiment plan and process record (detail of the first steps)
Ingredient List
• Fluorinated biphenyl: 0.9 g
• Br11OCB: 1.59 g
• Potassium carbonate: 2.07 g
• Butanone: 40 ml
Plan
• Dissolve 4-fluorinated biphenyl in butanone
• Add K2CO3 powder
• Heat at reflux for 1.5 hours
Process Record
• Processes: Weigh, Measure, Add, Reflux
• Inputs: sample of 4-fluorinated biphenyl, butanone, sample of K2CO3 powder
• Measured values: 0.9031 grammes, 40 ml, 2.0719 g
• Annotations (text):
– Butanone dried via silica column and measured into 100 ml RB flask. Used 1 ml extra solvent to wash out container.
– Started reflux at 13.30. (Had to change heater stirrer.) Only reflux for 45 min, next step 14:15.
Smart Tea Project - User Centred Design, Design by Analogy to ensure the
correct information is captured simply and easily.
Simple Context
Sensitive Interfaces
RDF plan/experiment model
• split experiment plan and instance
– allow myExperiment to create plans and moreTea to
create instances
• represent materials and steps as explicit instances
and then reference them in a linked list
– allow process chain patterns that can be semantically
searched based on the RDF graph
• associate URLs with experiments
– allow future smart lab linkage to hardware reports /
networked data stores / sensor reports / documents /
blogs
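The linked-list idea on this slide can be sketched with plain Python tuples standing in for RDF triples. This is a minimal illustration under stated assumptions, not the moreTea implementation: the predicate names (`firstStep`, `nextStep`, `process`) and the `urn:example:` identifiers are invented for the example, and a real system would hold the triples in an RDF store.

```python
# Triples are (subject, predicate, object) tuples.
# All identifiers and predicate names below are hypothetical.
TRIPLES = {
    ("urn:example:plan:1", "firstStep", "urn:example:step:1"),
    ("urn:example:step:1", "process", "Add"),
    ("urn:example:step:1", "nextStep", "urn:example:step:2"),
    ("urn:example:step:2", "process", "Reflux"),
    ("urn:example:step:2", "nextStep", "urn:example:step:3"),
    ("urn:example:step:3", "process", "Cool"),
}


def objects(triples, subject, predicate):
    """Return all objects matching the pattern (subject, predicate, ?)."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def process_chain(triples, plan):
    """Follow firstStep/nextStep links and return the process names in order."""
    chain = []
    seen = set()  # guards against accidental cycles in the linked list
    steps = objects(triples, plan, "firstStep")
    while steps and steps[0] not in seen:
        step = steps[0]
        seen.add(step)
        chain.extend(objects(triples, step, "process"))
        steps = objects(triples, step, "nextStep")
    return chain


def contains_pattern(chain, pattern):
    """True if `pattern` occurs as a contiguous run inside `chain`."""
    n = len(pattern)
    return any(chain[i:i + n] == pattern for i in range(len(chain) - n + 1))


chain = process_chain(TRIPLES, "urn:example:plan:1")
print(chain)
print(contains_pattern(chain, ["Reflux", "Cool"]))
```

Because each step is an explicit node, a search for a process pattern such as "Reflux then Cool" becomes a walk over the graph and does not depend on free-text matching.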
moreTea
• moreTea project scope
– Electronic Laboratory Notebook (ELN)
– Help Chemists plan experiments [outside lab]
• Specify experiment procedure
• Plan materials needed (type, quantity)
• COSHH risk assessment
– Help Chemists record observations [inside lab]
• Display experiment plan as a reminder
• Record materials used (type, measurement)
• Record observations (hand drawn)
– Help Chemists write up and share experiments [outside lab]
• Print experiments for cut and paste into paper lab books
• Share electronically with other Chemists
• Backup in a database
Example experiment
urn:moretea:experiments:1
created = 2007-11-10T12:54:08.566Z
description = 3,3`,9-triethylbenzothiacarbocyanine Iodide
owner = CN=Jeremy Frey, OU=Chemistry, O=University of Southampton, L=Southampton,
ST=Hampshire, C=UK
material list = urn:moretea:material:1, urn:moretea:material:2
procedure first step = urn:moretea:step:1
• RDF model and triple store
– MySQL database with a table of RDF triples
– JDBC connection using the Jena toolkit
– RDF graph for experiments
• Experiment
– properties
– materials (list of material URIs)
– procedure first step
» materials (list of material URIs)
» description of process
» procedure next step (URI)
• Material
– properties
• Procedure step
– properties
– material list
…
Example materials
urn:moretea:material:1
created = 2007-11-10T12:54:08.566Z
description = 3-ethyl-2-methylbenzothiazolium iodide
amount planned = 0.52
amount used = 0.52
coshh notes = R36/37/38: Irritant by all routes
urn:moretea:material:2
created = 2007-11-10T12:54:08.566Z
description = Triethyl propinate
amount planned = 0.85
amount used = 1.0
coshh notes = R10, 20/21/22: Flammable, Irritant by all routes.
Example steps
urn:moretea:step:1
created = 2007-12-11T12:54:08.968Z
instructions = Purge glassware with N2
material list =
complete = true
observation-notes = http://uri-to-image-store
next-step = urn:moretea:step:2
urn:moretea:step:2
created = 2007-12-11T12:57:12.968Z
instructions = Add thiazolium salt and heat to reflux
material list = urn:moretea:material:2
complete = true
observation-notes = thiazolium dissolved readily. RM turned purple
next-step = urn:moretea:step:3
…
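As a rough sketch of how such records might be used, the example below follows the `next-step` links from the first step and compares planned with used amounts. The URNs, field names and values are taken from this slide; holding the records in Python dictionaries, and the helper names `walk_procedure` and `amount_deviations`, are assumptions made for illustration only.

```python
# Records keyed by URN; only the fields needed for the sketch are kept.
MATERIALS = {
    "urn:moretea:material:1": {"amount planned": 0.52, "amount used": 0.52},
    "urn:moretea:material:2": {"amount planned": 0.85, "amount used": 1.0},
}

STEPS = {
    "urn:moretea:step:1": {
        "instructions": "Purge glassware with N2",
        "complete": True,
        "next-step": "urn:moretea:step:2",
    },
    "urn:moretea:step:2": {
        "instructions": "Add thiazolium salt and heat to reflux",
        "complete": True,
        # step 3 is elided on the slide, so the walk below stops here
        "next-step": "urn:moretea:step:3",
    },
}


def walk_procedure(steps, first_step):
    """Follow next-step links, stopping at an unknown or already visited step."""
    order = []
    current = first_step
    while current in steps and current not in order:
        order.append(current)
        current = steps[current].get("next-step")
    return order


def amount_deviations(materials):
    """Return used minus planned for every material where the two differ."""
    return {
        urn: round(m["amount used"] - m["amount planned"], 6)
        for urn, m in materials.items()
        if m["amount used"] != m["amount planned"]
    }


print(walk_procedure(STEPS, "urn:moretea:step:1"))
print(amount_deviations(MATERIALS))
```

Keeping the plan ("amount planned") and the record ("amount used") side by side is what makes this kind of deviation check straightforward.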
Tea RDF
image store ID
hardware device ID
web URL …
moreTea Architecture
moreTea Security
• Security technology
– GRIA service-oriented architecture
• Developed by IT Innovation, open source LGPL
– Standard SAFE-compatible security
• PKI, X.509, HTTPS
– Role-based Access control
• SAML token support
• Experiments have group access rights
– [ Owner, Read, Read/Write ]
• Groups
– [ Organization, Dept, Project ]
• Chemists belong to groups
– Signed PDF experiment documents
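The group access rights listed above can be sketched as a simple lookup. This is an illustrative model only: it does not use GRIA, SAML or X.509, the group names are hypothetical, and the assumption that Owner implies Read/Write, which in turn implies Read, is ours.

```python
from enum import Enum


class Right(Enum):
    # Assumed ordering: a higher value includes the rights below it.
    READ = 1
    READ_WRITE = 2
    OWNER = 3


def effective_right(user_groups, experiment_acl):
    """Return the strongest right granted through any of the user's groups, or None."""
    rights = [right for group, right in experiment_acl.items() if group in user_groups]
    return max(rights, key=lambda r: r.value, default=None)


def can_read(user_groups, experiment_acl):
    return effective_right(user_groups, experiment_acl) is not None


def can_write(user_groups, experiment_acl):
    right = effective_right(user_groups, experiment_acl)
    return right is not None and right.value >= Right.READ_WRITE.value


# Hypothetical access list for one experiment: group -> right.
ACL = {
    "project:ht-chemistry": Right.READ_WRITE,
    "dept:chemistry": Right.READ,
}

alice = {"org:southampton", "dept:chemistry", "project:ht-chemistry"}
bob = {"org:southampton", "dept:chemistry"}
carol = {"org:elsewhere"}

print(can_write(alice, ACL), can_write(bob, ACL), can_read(carol, ACL))
```

Chemists acquire rights only through group membership, so changing who can see an experiment means editing one access list, not every user.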
Validation
• Increasing the value of
data
• How to bring all the
necessary information
together to enable
appropriate validation
• Increasingly difficult &
expensive to achieve
Need provenance and context otherwise just
a collection of items
Quantities, Symbols and Units
Have demonstrated how to handle units
within the semantic web framework
(RDF)
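One way to picture units in a triple-based model is to store each quantity as a value plus a reference to a unit, with the unit carrying its dimension and scale. The sketch below is an assumption-laden illustration, not the scheme demonstrated in the project: the predicate names (`hasValue`, `hasUnit`), the `unit:` and `obs:` identifiers and the unit table are all hypothetical. The values 0.9031 g and 40 ml come from the example experiment earlier in the talk.

```python
# Hypothetical unit table: unit identifier -> (dimension, factor to the SI base unit).
UNITS = {
    "unit:gram": ("mass", 1e-3),
    "unit:kilogram": ("mass", 1.0),
    "unit:millilitre": ("volume", 1e-6),
    "unit:litre": ("volume", 1e-3),
}

# Quantities expressed as (subject, predicate, object) triples.
TRIPLES = [
    ("obs:weighing", "hasValue", 0.9031),
    ("obs:weighing", "hasUnit", "unit:gram"),
    ("obs:solvent", "hasValue", 40.0),
    ("obs:solvent", "hasUnit", "unit:millilitre"),
]


def quantity(triples, subject):
    """Return (value, unit) for a quantity node."""
    props = {p: o for s, p, o in triples if s == subject}
    return props["hasValue"], props["hasUnit"]


def convert(value, from_unit, to_unit):
    """Convert between units of the same dimension; refuse anything else."""
    from_dim, from_factor = UNITS[from_unit]
    to_dim, to_factor = UNITS[to_unit]
    if from_dim != to_dim:
        raise ValueError(f"cannot convert {from_dim} to {to_dim}")
    return value * from_factor / to_factor


value, unit = quantity(TRIPLES, "obs:weighing")
print(convert(value, unit, "unit:kilogram"))
```

Recording the unit as data alongside the number lets a machine reject a mass-to-volume comparison that a bare number would silently allow.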
Data Sharing:
How to get started
Who can we call?
• Plans in advance are
useful
• This is the way things are
supposed to be done
• The Plan provides a digital
context so increases the
value of planning
• Key to our ‘Smart Lab’
approach….
• But is it the best way?
Plans
Laboratory “Blogs”
• Laboratory notebook
is a Blog
• Encourage and
facilitate
collaboration
• Flexible
• Need data
repositories behind
the Blog
– R4L
– E-Bank
• Web 2.0 but not
too many people
Lab Blog
How to get started
Who can we call?
Implementation of e-lab book
• Blog based format
• Purpose built engine
• Fully flexible system with
arbitrary metadata
• Full record of changes (not
currently easily accessible)
http://chemtools.chem.soton.ac.uk/projects/blog/ “Bio Blogs”
http://blogs.openwetware.org/scienceintheopen
Discussion
Implementation of e-lab book
• One post, one item
approach
• Procedures can be
tracked back to starting
materials (or forwards to
products) by clicking
through
• Aim to ultimately be
interpretable by machine
and human
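The "one post, one item" approach makes the notebook a graph of linked posts, so tracking back to starting materials or forwards to products is a graph walk. The sketch below assumes each post records the posts it was made from; the post names and the helper functions are hypothetical and do not describe the blog engine's internals.

```python
# post -> the posts it was made from (its inputs). Names are hypothetical.
INPUTS = {
    "post:product": ["post:procedure-2"],
    "post:procedure-2": ["post:intermediate"],
    "post:intermediate": ["post:procedure-1"],
    "post:procedure-1": ["post:material-A", "post:material-B"],
}


def reachable(graph, start):
    """Return every node that can be reached from `start` by following links."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def invert(graph):
    """Turn 'made from' links into 'used to make' links."""
    out = {}
    for node, parents in graph.items():
        for parent in parents:
            out.setdefault(parent, []).append(node)
    return out


def starting_materials(inputs, post):
    """Posts upstream of `post` that have no inputs of their own."""
    return {p for p in reachable(inputs, post) if not inputs.get(p)}


def products(inputs, post):
    """Posts downstream of `post` that nothing else was made from."""
    outputs = invert(inputs)
    return {p for p in reachable(outputs, post) if not outputs.get(p)}


print(sorted(starting_materials(INPUTS, "post:product")))
print(sorted(products(INPUTS, "post:material-A")))
```

The same links serve both readers: a person clicks through them, and a program follows them.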
Templates
LIVECOP LINK
<METADATA>
<TITLE>album09 jrh4880_19_competent_transformation_from_ligation</TITLE>
<SIZE_X>1300</SIZE_X>
<SIZE_Y>1026</SIZE_Y>
<THUMB_SRC>http://imgstore.chem.soton.ac.uk/albums/album09/jrh4880_19_competent_transformation_from_ligation.thumb.jpg</THUMB_SRC>
<PREVIEW_SRC>http://imgstore.chem.soton.ac.uk/albums/album09/jrh4880_19_competent_transformation_from_ligation.sized.jpg</PREVIEW_SRC>
<PICTURE_URL>http://imgstore.chem.soton.ac.uk/album09/jrh4880_19_competent_transformation_from_ligation</PICTURE_URL>
</METADATA>
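A metadata record like this one is easy to read by machine. The sketch below parses it with Python's standard `xml.etree.ElementTree`; the record is the one shown on the slide with the line breaks that split the tags removed, and the function name `parse_image_metadata` is our own.

```python
import xml.etree.ElementTree as ET

BASE = "http://imgstore.chem.soton.ac.uk"
NAME = "jrh4880_19_competent_transformation_from_ligation"

# The record from the slide, rebuilt as one well-formed XML document.
RECORD = f"""<METADATA>
<TITLE>album09 {NAME}</TITLE>
<SIZE_X>1300</SIZE_X>
<SIZE_Y>1026</SIZE_Y>
<THUMB_SRC>{BASE}/albums/album09/{NAME}.thumb.jpg</THUMB_SRC>
<PREVIEW_SRC>{BASE}/albums/album09/{NAME}.sized.jpg</PREVIEW_SRC>
<PICTURE_URL>{BASE}/album09/{NAME}</PICTURE_URL>
</METADATA>"""


def parse_image_metadata(xml_text):
    """Return the record as a dict, with the pixel sizes converted to integers."""
    root = ET.fromstring(xml_text)
    fields = {child.tag: (child.text or "").strip() for child in root}
    for key in ("SIZE_X", "SIZE_Y"):
        fields[key] = int(fields[key])
    return fields


meta = parse_image_metadata(RECORD)
print(meta["TITLE"], meta["SIZE_X"], meta["SIZE_Y"])
print(meta["THUMB_SRC"])
```

A blog post can then embed the thumbnail and link to the full picture without copying the image itself.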
Link to objects
Issues
• The Physical World
• Safety documentation
• Patent/IP – sign-off
• Trust
• Will computers survive in the laboratory?
Remember
we do have a
physical
world to keep
in sync
Time Line View
An RDF graph of posts and the links between them, rendered using
Welkin (simile.mit.edu/welkin)
Sortase Experiment
Map of the X-Ray Blog
(comments not shown)
Impact on researchers
• Higher Quality Record
• Easier Collaboration
• Improved planning
• Improved discussions
• Efficiency gain in production of presentations/reports
• Change the nature of
Professor/Student
interactions
Meetings
Blog
But we don’t usually blog
the meetings
Influence on Meetings and
Discussions
• Enable geographically /
temporally separated
discussions
• Meeting preparation much
less of an imposition
• Posted material is discussed,
comparison with older
materials is easy
• Change from ‘can I look at
your data’ to ‘have you seen
my blog post’
Blog-jects
• Equipment becomes a first-class
member of the web
• Interacts well with Pub-Sub as
items are attached to topics, and
topics relate the Blog items
• With automation this evolves to
a two-way communication
• Everything has a network
connection – research
equipment will catch up with
the fridge & other commodity
goods
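The publish/subscribe idea can be sketched in a few lines: an instrument publishes items to a topic, the blog subscribes to that topic, and a second topic carries messages the other way. This is a toy in-memory model, not the messaging system used in the laboratory; the `Broker` class and the topic names are hypothetical.

```python
from collections import defaultdict


class Broker:
    """A tiny in-memory publish/subscribe hub (illustrative only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register `callback(topic, item)` for every item published to `topic`."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, item):
        """Deliver `item` to every subscriber of `topic`; return how many received it."""
        callbacks = list(self._subscribers.get(topic, []))
        for callback in callbacks:
            callback(topic, item)
        return len(callbacks)


broker = Broker()
blog_posts = []      # stands in for the lab blog
instrument_log = []  # stands in for the instrument's command queue

# The blog listens for readings; the instrument listens for requests.
broker.subscribe("lab/balance/readings", lambda topic, item: blog_posts.append((topic, item)))
broker.subscribe("lab/balance/commands", lambda topic, item: instrument_log.append(item))

# Instrument to blog: the balance posts a reading as a blog item.
broker.publish("lab/balance/readings", {"value": 0.9031, "unit": "g"})
# Blog to instrument: the reverse direction that automation makes possible.
broker.publish("lab/balance/commands", "tare")

print(blog_posts)
print(instrument_log)
```

Because items are attached to topics, the topic itself is what relates one blog item to another.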
Blog-jects
• Equipment becomes a
first-class member of the
web
• Interacts well with Pub-Sub
as items are attached to
topics, and topics relate
the Blog items
• With automation this
evolves to a two-way
communication
• Live Copy essential
Comments and
Annotation
A picture worth a thousand words!
Chemists like to sketch!
Ecology of
Laboratory
Web 2.0 and
Semantic Web
notebook tools
MyExperiment
?
Lab Blog
Semantic ELN
MyExperiment
Plans and
Templates
Lab Blog
Data
Data Repositories
Processes
Semantic ELN
The ‘Scientific Blog’ to combine ELN & publication
Sharing Rich Media
Putting
the idea
of the
book into
action
Put your material for a book
chapter out on the web and ask for
comments and contributions!
How does (or
could or
should) this
relate to the
production of
research
papers?
Separating Data from Interpretations: A crystallography
example
Underlying
data
Intellect &
Interpretation
Access to ALL underlying data
eBank & eCrystals
Growing need for the global (virtual)
equivalent of the “Tea Room”
Semantic Web
The Semantic Web is an
extension of the current
Web in which
information and services
are given well-defined
meaning, better enabling
computers
and people to work
in cooperation
Free the data!
Free the services!
Free the people!
These are the same people – if we
can ‘talk’ to ourselves efficiently
over time then that is a good start
to be able to ‘talk’ to others
Information
Providers
Information
Consumers
Thanks
• RCUK, EPSRC, JISC for funding
• Colleagues and Students from the
Schools of Chemistry, Electronics
& Computer Science,
Mathematics
• IBM, Microsoft
• www.combechem.org
• www.ecrystals.soton.ac.uk
• chemtools.chem.soton.ac.uk