The Chemical Knowledge Cycle … and its ramifications for e-Science

Or the other way round
Jeremy Frey
School of Chemistry
University of Southampton, UK
Talk
Comb-e-Chem
The Comb-e-Chem Project
“Smart Lab”
“National Crystallography Service”
“Cluster Computing”
“Dissemination & Publication”
March 2004
Comb-e-Chem Partners
•IBM
•IT Innovation
•NCS
•CCDC
•ECS
•Chemistry
•Stats
•Combi Centre
•Pfizer
•Bristol
•Chemistry
•GSK
•AZ
•Southampton
•IUPAC
•RSC
People
Chemistry (Southampton & Bristol)
Mike Hursthouse, Chris Frampton, Jon Essex, Jeremy Frey, Guy Orpen, Stephan Christensen, Thomas Gelbrich, Sam Peppe, Hongchen Fu, Graham Tizard, Suzanna Ward, Lefteris Danos, Jamie Robinson, Kieron Taylor, Chris Woods, Rob Gledhill
National Crystallography Service (NCS)
Simon Coles, Mark Light, Ann Bingham, Peter Horton
Electronics and Computer Science (Southampton)
Dave De Roure, Luc Moreau, Mike Luck, Hugo Mills, Graham Smith, Simon Miles, Nicky Harding, Gareth Hughes, Nick Humphries, monica schraefel, Terry Payne
IT Innovation (Southampton)
Mike Surridge, Ken Meacham, Steve Taylor, Daren Marvin
Statistics (Southampton)
Alan Welsh, Sue Lewis, Ralph Manson, Dave Woods
Rutherford Appleton Laboratory, Atlas Data Centre
IBM – Colin Bird, Syd Chapman
CombeChem Data and Knowledge Cycle
[Figure: cycle diagram. Labels: Design (statistics); Plan; Experiments; Smart Labs; High Throughput measurement; Data; Statistics; Analysis; Literature; Dissemination; E-Bank; Access to data; End-to-End Management]
Plans
Small set of fixed plans: NCS
Variable plans, written by chemist (difficult!): Tea
Ad-hoc, implied by process execution: SHG
A chemistry lab is a hostile environment without much room to maneuver
What can be captured automatically with sensors?
What must rely on manual annotation?
[Image labels: the chemist; the fume cupboard; competition for space; very precise scales, but not connected to any recording device; industrial support; critical data entry]
Big block to publication@source: if it’s not digital, it’s difficult to share
By Making Tea!
Getting not just the what and how, but the why
Making Tea: design elicitation through analogy
Developed and validated the analogy with chemists
Gave us a way to ask questions that would not otherwise have been possible
Let us maximize observation
Gave us repeatability
Derived rudiments of a process model, too
Provided lingua franca with chemists
Pervasive Grid – “Smart Flight”
Tablet?
Results
“I can go anywhere and it’s, like, this is me and my data. It’s all there! Bang!”
In real use, chemists were able to record their experiments
After about ten minutes of use, they forgot about it as a new thing, and just used it
Data model
Increasing detail
Plan: intended actions (guide to chemist, or [later] workflow)
Process record: measurements, processes, annotations
Provenance record: service invocations, secure time-stamps, etc…
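The three layers of the data model can be sketched as plain data structures. This is only an illustrative sketch, not the project's implementation: the class names, field names and the `log` helper are assumptions, and the timestamp is an ordinary UTC clock reading standing in for the secure time-stamps named on the slide.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class PlanStep:
    """Intended action: a guide to the chemist or, later, a workflow step."""
    description: str


@dataclass
class ProcessEntry:
    """What was actually done: a process plus its measurements and annotations."""
    process: str
    measurements: dict = field(default_factory=dict)  # e.g. {"mass_g": 0.9031}
    annotations: list = field(default_factory=list)   # free-text notes


@dataclass
class ProvenanceEntry:
    """Which service produced a record entry, and when (hypothetical fields)."""
    service: str
    timestamp: datetime


@dataclass
class ExperimentRecord:
    plan: list = field(default_factory=list)
    process_record: list = field(default_factory=list)
    provenance: list = field(default_factory=list)

    def log(self, entry, service):
        """Append a process entry and a matching provenance entry."""
        self.process_record.append(entry)
        # Plain clock reading; a real system would need a trusted time source.
        self.provenance.append(ProvenanceEntry(service, datetime.now(timezone.utc)))
```

Each call to `log` adds detail at two levels at once, which mirrors the "increasing detail" ordering of plan, process record and provenance record.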
Databases
Database will become the key method of handling all data
Metadata must be generated at inception and added as data traverses the workflow
Version control, audit and backup handled at the database level.
[Figure: database schema diagram. Tables and columns as recoverable from the slide:]
Apparatus: ApparatusID (PK), Rig Name, Laser Wavelength, Laser PulseRate, GateWidth, Sensitivity, PMTVoltage, Incident Angle
Operator: OperatorID (PK), Title, Surname, ForeName, PasswordHash, Position, Organisation Name, Phone, Email
Organisation: OrganisationID (PK), Organisation Name, Address, ContactName, ContactPhone, ContactEmail, Website
Solution: SolutionID (PK), SolventID (FK), SoluteID (FK), OperatorID (FK), OrganisationID (FK), Preparation Date, SoluteMass, SolventVolume, pH, pHControl, Notes
Chemical: ChemicalID (PK), CasNumber, CompoundName, Quantity, Supplier, Catalogue Number, LotNumber, PackDate, Purchase Date, Order Number, RMM, Purity, Notes
RunData: RunID (PK), Time, OperatorID (FK), ApparatusID (FK), SampleID (FK), InputPolarisationAngle, OutputPolarisationAngle, Azimuthal Angle, Surface Pressure, MonoChromatorWavelength, MonoInputSlit, MonoOutputSlit, Sample Temperature, NumberLaserShots, IsBackground, DataBlob, Notes
Run/Bkg Link: RunID, BkgID
Sample: SampleID (PK), Notes, ChemicalID (FK), SolutionID (FK), Concentration, pH
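As a rough illustration of how part of this schema might look in SQL, the sketch below creates four of the tables in an in-memory SQLite database. The choice of SQLite, the column types and the reduced column set are all assumptions made for the example; the slide does not say which database engine or types were used.

```python
import sqlite3

# Reduced, illustrative subset of the schema; types are assumptions.
SCHEMA = """
CREATE TABLE Operator (
    OperatorID INTEGER PRIMARY KEY,
    Surname    TEXT,
    ForeName   TEXT
);
CREATE TABLE Apparatus (
    ApparatusID     INTEGER PRIMARY KEY,
    RigName         TEXT,
    LaserWavelength REAL
);
CREATE TABLE Sample (
    SampleID      INTEGER PRIMARY KEY,
    Concentration REAL,
    pH            REAL,
    Notes         TEXT
);
CREATE TABLE RunData (
    RunID        INTEGER PRIMARY KEY,
    Time         TEXT,
    OperatorID   INTEGER REFERENCES Operator(OperatorID),
    ApparatusID  INTEGER REFERENCES Apparatus(ApparatusID),
    SampleID     INTEGER REFERENCES Sample(SampleID),
    IsBackground INTEGER,
    DataBlob     BLOB,
    Notes        TEXT
);
"""


def create_db():
    """Create an in-memory database with foreign-key checks switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
    conn.executescript(SCHEMA)
    return conn
```

With foreign keys enforced, a run cannot be stored against an operator, apparatus or sample that does not exist, which is one way of making sure metadata is present from inception.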
Smart (SHG) Lab: Data Flow and Processes
[Figure: data-flow diagram. Components: Client; Web Server; Agent Server; Logging PC; Lab Environment Logger; Rserve; MQBroker; Database Server; Broker Backup Agent (also does recall function); Broker Recall Agent; Backup server (SRB/ATLAS/Network). Flows and labels: live updates (lab environment, end-experiment trigger); initiate Rserve run and finished notify decisions; raw experimental data; experiment data; control data; data recall and update; end/start experiment/run; changes in lab environment; viewer traffic; periodic backups; backup/recall broker data]
Agent to listen for end of experiments, and auto trigger analysis
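The agent that listens for the end of an experiment and triggers analysis can be sketched as a simple message loop. Everything here is a stand-in: Python's `queue.Queue` plays the role of the message broker, the message fields (`type`, `run_id`) are invented for the example, and `fake_analysis` only returns a label where the real system would initiate an Rserve run.

```python
import queue


def run_agent(broker, analyse):
    """Consume messages until 'shutdown'; trigger analysis on 'end_experiment'."""
    results = []
    while True:
        msg = broker.get()
        if msg["type"] == "shutdown":
            break
        if msg["type"] == "end_experiment":
            results.append(analyse(msg["run_id"]))
        # Other message types (e.g. lab environment changes) are ignored here.
    return results


def fake_analysis(run_id):
    """Stand-in for initiating an analysis run; returns a label only."""
    return f"analysis started for run {run_id}"
```

A usage sketch: put an environment update, an `end_experiment` message and a `shutdown` message on the queue, then call `run_agent(broker, fake_analysis)`; only the end-of-experiment message triggers the analysis callback.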
Databases - Our experience
What do you do when the actual users keep changing their mind?
Is a traditional relational database suitable?
Danger of reinforcing scientific bias against relational databases for laboratory data.
RDF & RDFS!
Ingredient List
Fluorinated biphenyl: 0.9 g
Br11OCB: 1.59 g
Potassium Carbonate: 2.07 g
Butanone: 40 ml
[Figure: first part of the plan and process record, as recoverable from the slide]
Plan steps shown: Dissolve 4-fluorinated biphenyl in butanone (Add); Add K2CO3 powder (Add); Heat at reflux for 1.5 hours (Reflux)
Weigh: sample of 4-fluorinated biphenyl, 0.9031 grammes
Measure: butanone, 40 ml
Annotate: “Butanone dried via silica column and measured into 100ml RB flask. Used 1ml extra solvent to wash out container.”
Weigh: sample of K2CO3 powder, 2.0719 g
Annotate (reflux): “Started reflux at 13.30. (Had to change heater stirrer) Only reflux for 45min, next step 14:15.”
[Figure: full plan and process record for one synthesis, as recoverable from the slide]
Ingredient List
Fluorinated biphenyl: 0.9 g
Br11OCB: 1.59 g
Potassium Carbonate: 2.07 g
Butanone: 40 ml
Plan (To Do List), with the process type for each step
Dissolve 4-fluorinated biphenyl in butanone (Add)
Add K2CO3 powder (Add)
Heat at reflux for 1.5 hours (Reflux)
Cool and add Br11OCB (Cool, Add)
Heat at reflux until completion (Reflux)
Cool and add water (30ml) (Cool, Add)
Extract with DCM (3x40ml) (Liquid-liquid extraction)
Combine organics, dry over MgSO4 & filter (Dry, Filter)
Remove solvent in vacuo (Remove Solvent by Rotary Evaporation)
Fuse compound to silica & column in ether/petrol (Fuse, Column Chromatography)
Process Record (processes numbered 1 to 14 on the slide)
Weigh: sample of 4-fluorinated biphenyl, 0.9031 grammes
Measure: butanone, 40 ml
Annotate: “Butanone dried via silica column and measured into 100ml RB flask. Used 1ml extra solvent to wash out container.”
Weigh: sample of K2CO3 powder, 2.0719 g
Annotate (reflux): “Started reflux at 13.30. (Had to change heater stirrer) Only reflux for 45min, next step 14:15.”
Weigh: sample of Br11OCB, 1.5918 g
Measure: water, 30 ml
Annotate: “Inorganics dissolve 2 layers. Added brine ~20ml.”
Measure: DCM, 3 of 40 ml
Annotate: “Organics are yellow solution”
Measure: MgSO4; silica (one amount recorded as “excess”)
Annotate: “Washed MgSO4 with DCM ~ 50ml”
Filter (Buchner)
Column Chromatography: ether/petrol ratio; image
Key
Process; Input; Literal; Observation
Observation Types
weight - grammes
measure - ml, drops
annotate - text
temperature - K, °C
Future Questions
Whether to have many subclasses of processes or fewer with annotations
How to depict destructive processes
How to depict taking lots of samples
What is the observation/process boundary? e.g. MRI scan
Combechem, 30 January 2004, gvh, hrm, gms
Lessons
That we need two related ontologies
Plan – what is going to be done
Record – what was done
Not necessarily the same thing
Steps are added/repeated during the experiment
Different annotations required for each
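The gap between plan and record can be made concrete with a small comparison. This is a minimal sketch under stated assumptions: the two classes and the `unplanned_processes` function are hypothetical, and matching steps by process name alone is a simplification chosen for the example rather than anything the slides describe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedStep:
    process: str
    detail: str


@dataclass(frozen=True)
class RecordedStep:
    process: str
    annotation: str = ""


def unplanned_processes(plan, record):
    """Return recorded process names that occur more often than the plan called for."""
    remaining = {}
    for step in plan:
        remaining[step.process] = remaining.get(step.process, 0) + 1
    extras = []
    for step in record:
        if remaining.get(step.process, 0) > 0:
            remaining[step.process] -= 1  # this recorded step matches a planned one
        else:
            extras.append(step.process)   # added or repeated during the experiment
    return extras
```

For the tea-like synthesis example, a plan of Add, Add, Reflux set against a record of Weigh, Add, Measure, Add, Reflux reports Weigh and Measure as steps that exist only in the record.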
Process Record Ontology
NCS Grid Service Architecture
The “Grid Zone”
Security is fundamental
Who is using our experiments?
Insulate them from each other and from the rest of our institution
Process & Role based security
Use DMZ
This combination creates a “Grid Zone”
Cluster Computation
Needed for Design of Experiments
Stats computationally intensive
Simulations
Protein dynamics
Clusters, Cycle Stealing
Schools engagement – e-Malaria
•Combechem is compiling a large database of molecules. The database contains the properties of these molecules, e.g. their crystal structure or solvent accessible surface area (SASA). Some of these properties are measured from experiment while others are calculated from simulations run on the GRID.
Molecule ID   pKa       SASA      Melting Point
1CD34         2.3       Unknown   58
1CD35         1.3       Unknown   112
1CD36         Unknown   36543     96
1CD37         5.3       58435     78
1CD38         Unknown   Unknown   110
•Comberobots continually scan the database for empty fields. They can automatically submit simulations to calculate any unknown properties. These simulations run on the GRID by stealing the spare cycles of a heterogeneous network of computers.
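The scanning behaviour described for the Comberobots can be sketched in a few lines. The dictionary layout, the function names and the `submit` callback are assumptions for illustration; the values loosely follow the slide's example table, and no real job submission is attempted.

```python
UNKNOWN = None  # marker for an empty field

# Illustrative data, loosely following the example table on the slide.
molecules = {
    "1CD34": {"pKa": 2.3, "SASA": UNKNOWN, "melting_point": 58},
    "1CD35": {"pKa": 1.3, "SASA": UNKNOWN, "melting_point": 112},
    "1CD36": {"pKa": UNKNOWN, "SASA": 36543, "melting_point": 96},
    "1CD37": {"pKa": 5.3, "SASA": 58435, "melting_point": 78},
    "1CD38": {"pKa": UNKNOWN, "SASA": UNKNOWN, "melting_point": 110},
}


def find_unknown(db):
    """Return (molecule, property) pairs whose value is still unknown."""
    return [
        (mol, prop)
        for mol, props in db.items()
        for prop, value in props.items()
        if value is UNKNOWN
    ]


def submit_jobs(db, submit):
    """Call submit(molecule, property) once for every empty field."""
    return [submit(mol, prop) for mol, prop in find_unknown(db)]
```

In use, `submit` would wrap whatever mechanism hands a simulation to the grid; passing a function that just formats a string is enough to see which fields would be filled.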
The database of molecules can also be screened against pharmaceutical protein targets. To do this accurately requires knowledge of how the protein changes shape upon ligand binding. We can use the GRID to investigate protein conformational change via Replica Exchange simulations. Multiple simulations of the protein are run in parallel, each running under a different condition, e.g. temperature. Periodically the simulations running at neighbouring temperatures are tested and swapped. This enables conformations sampled at high temperatures, where there is rapid conformational change, to filter down to biologically relevant temperatures where conformational change occurs more slowly.
[Figures: HIV Protease; Nitrogen Regulatory Protein C]
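The "tested and swapped" step is usually implemented with a Metropolis acceptance test on neighbouring replicas. The sketch below uses that standard criterion, accepting a swap with probability min(1, exp[(β_i − β_j)(E_i − E_j)]) where β = 1/(k_B T); the slides do not state which test the project used, so treat this as an assumption. Function names, energy units (kcal/mol) and the even/odd pairing via `start` are illustrative choices.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def swap_probability(temp_i, temp_j, energy_i, energy_j):
    """Metropolis acceptance probability for exchanging two replicas."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_swaps(temps, energies, start=0, rng=random):
    """Test alternate neighbouring pairs; return the replica index held at each temperature.

    energies[r] is the current potential energy of replica r, and replica r
    starts at temps[r]. Use start=0 and start=1 on alternate sweeps.
    """
    order = list(range(len(temps)))
    for k in range(start, len(temps) - 1, 2):
        a, b = order[k], order[k + 1]
        p = swap_probability(temps[k], temps[k + 1], energies[a], energies[b])
        if rng.random() < p:
            order[k], order[k + 1] = b, a
    return order
```

A swap is always accepted when the colder replica currently has the higher energy, which is how low-energy conformations found at high temperature work their way down the temperature ladder.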
Dissemination & Publication
A different approach is required to provide data to the community
The grid provides the necessary medium
What & How do we want to make available?
Publication@Source
[Figure: dissemination diagram. Labels: Dissemination; Bibliography; Student; Journal; Professional Body; Archive; Institution; Laboratory]
The Data Trail
[Figure: workflow diagram. Labels: Raw data; Workflow; Process; Model; Derive; Plot; Provenance]
The graphical model of the workflow used as the front end of a typical workflow enactor can also act as the navigation tool for the provenance & publication.
The need for xtl-Prints
National Crystallography Service: 100s of structures
How do we disseminate?
[Figure labels: Combechem; Combichem; DATA; PUBLICATION; DISSEMINATION]
Crystallographic e-Prints
[Figure: e-Print data model. Labels: JOURNAL PUBLICATION; EBank (World); EPrint (Local); STRUCTURE REPORT; REPORT (EPrint); CIF; RESULTS; DATASET (Contains DATAFILES); DERIVED DATA; RAW DATA; INVESTIGATION; HOLDING]
Direct access to data: DERIVED DATA
Direct access to data: RAW DATA
Dolphin RDF Browser
RDF source and resource
Resource model
Schema model
Each statement
If literal: display object
If resource: add http request to that resource, when required
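The per-statement logic of the browser can be sketched without any RDF library. This is a hypothetical illustration only: statements are plain (subject, predicate, object) tuples, the `Literal` marker class and `browse` function are invented for the example, and no HTTP request is actually made; resources are merely collected for fetching when required.

```python
class Literal(str):
    """Marks an object value as a literal rather than a resource URI."""


def browse(statements, fetch_required=lambda uri: True):
    """Split statement objects into literals to display and resources to request."""
    to_display = []
    to_request = []
    for subject, predicate, obj in statements:
        if isinstance(obj, Literal):
            to_display.append((predicate, str(obj)))
        elif fetch_required(obj):
            to_request.append(obj)  # a real browser would issue an HTTP GET here
    return to_display, to_request
```

The `fetch_required` callback stands in for the "when required" condition, for example fetching a linked resource only once the user expands it.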
SVG active graphics
e-worries
WSRF
GTi
Must ensure this is not a problem for applications
The Semiotic Web
Chemists use signs and symbols as much as, if not more than, words
Icons have a great significance – The Periodic Table
People & Computers need to communicate with each other as well as themselves
Need a more powerful (general) concept than the semantic web & grid.
Changing the way we work
[Figure: linked e-Labs. Labels: E-Lab: X-Ray Crystallography; E-Lab: Combinatorial Synthesis; E-Lab: Properties Measurement; Samples; Laboratory Processes; Quantum Mechanical Analysis; Data Provenance; Authorship/Submission; Structures DB; Properties DB; Properties Prediction; Data Mining, QSAR, etc; Design of Experiment; Data Streaming; Visualisation; Agent Assistant]