Talk - European Bioinformatics Institute

Data Management and
Protein Production – from
Lab Notebooks to LIMS
Robert Esnouf
Oxford Protein Production Facility,
University of Oxford…
…and the PIMS development team
EBI, 23/9/2008
Introductory presentation
■ What is information management?
■ What types of laboratory process are there?
■ What information should be recorded?
■ Where can LIMS benefit protein scientists?
■ What are the potential pitfalls?
■ Introduction to PIMS concepts
■ Linking data upstream and downstream
■ Target selection and crystallization
What is information management?
■ A way of storing information so that it can
be retrieved later:
■ Mechanism varies from human memory
(surprisingly common) to sophisticated
relational database systems
■ Purpose of retrieval can include supporting the
next experiment, sharing data with
collaborators, publication and depositing data
■ In a laboratory setting information management
is a branch of bioinformatics
■ Automated experiments may require electronic
information management
Types of information management
■ Paper-based records
■ Well suited to independent research
■ Long-term archive
■ Electronic Laboratory Notebooks (ELN)
■ Central repository of information
■ Electronic version of paper systems
■ Laboratory Information Management
Systems (LIMS)
■ Relational database
■ Model for laboratory processes
■ Snapshot of current state of laboratory
What information could be recorded?
■ Progress (target) tracking
■ Records the status of a project
■ Progress through a (usually) linear workflow
■ Sample tracking
■ Where is a sample (which fridge, freezer etc.)?
■ Exchange of samples between labs
■ User auditing
■ Who did what and when?
■ Who authorized/checked work?
■ What is a sample?
■ What experiments have been done?
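Progress tracking and user auditing can be combined in one small record: a target moves along a (usually) linear workflow, and every move notes who made it and when. The sketch below is a minimal illustration under stated assumptions. The stage names, `TargetProgress` and `AuditEntry` are hypothetical and are not taken from PIMS.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical linear workflow; every laboratory defines its own stages.
STAGES = ["Selected", "Cloned", "Expressed", "Purified", "Crystallized"]

@dataclass
class AuditEntry:
    user: str          # who did it
    action: str        # what was done
    when: datetime     # when it was done

@dataclass
class TargetProgress:
    name: str
    stage: int = 0                              # index into STAGES
    log: list = field(default_factory=list)     # audit trail of AuditEntry records

    def advance(self, user: str) -> str:
        """Move the target to the next stage and record who did it and when."""
        if self.stage + 1 >= len(STAGES):
            raise ValueError(f"{self.name} is already at the final stage")
        self.stage += 1
        self.log.append(AuditEntry(user, f"advanced to {STAGES[self.stage]}", datetime.now()))
        return STAGES[self.stage]

t = TargetProgress("TargetX")
t.advance("alice")
t.advance("bob")
print(STAGES[t.stage], [e.user for e in t.log])  # Expressed ['alice', 'bob']
```

A strictly linear stage list is what makes this simple. It is also why pure target tracking struggles once work branches or loops back.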
What is a sample?
■ Is it a reagent?
■ Which batch number was it?
■ Does it have a use-by date?
■ How much is left?
■ What target does it “belong” to?
■ Does it belong to multiple targets (a complex)?
■ What is in the sample?
■ Is it a single entity?
■ Is it a mixture or a complex?
■ Record a full chemical description or how to
recreate the sample?
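The questions above translate naturally into fields on a sample record. The following is a minimal sketch that assumes volumes in microlitres. The class, field names and example values are hypothetical and are not the PIMS data model.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class Sample:
    name: str
    sample_type: str                          # e.g. "Reagent", "Plasmid"
    amount_ul: float                          # how much is left (assumed unit: microlitres)
    batch: Optional[str] = None               # batch number, for reagents
    use_by: Optional[date] = None             # use-by date, if any
    targets: list = field(default_factory=list)     # one target, or several for a complex
    components: list = field(default_factory=list)  # contents, if a mixture

    def is_complex(self) -> bool:
        return len(self.targets) > 1

    def is_mixture(self) -> bool:
        return len(self.components) > 1

    def expired(self, today: date) -> bool:
        return self.use_by is not None and today > self.use_by

    def withdraw(self, volume_ul: float) -> None:
        """Decrement the remaining amount, refusing to go below zero."""
        if volume_ul > self.amount_ul:
            raise ValueError(f"only {self.amount_ul} uL of {self.name} left")
        self.amount_ul -= volume_ul

buffer = Sample("Lysis buffer", "Reagent", 500.0, batch="B-017",
                use_by=date(2008, 12, 31), components=["Tris", "Sodium chloride"])
buffer.withdraw(120.0)
print(buffer.amount_ul, buffer.is_mixture(), buffer.expired(date(2008, 9, 23)))
# 380.0 True False
```

Here `components` records what is in the sample. The alternative raised above, recording how to recreate it, would instead link the sample to the experiment that produced it.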
What is an experiment?
■ Is it part of a fixed workflow?
■ Suited to repetitive, well-defined work
■ Maps directly onto progress tracking
■ Workflow should be relatively invariant
■ Is it an isolated process?
■ Take input sample(s)
■ Work on them
■ Record results
■ Produce output sample(s)
■ No concept of a workflow
(Electronic) laboratory notebooks
■ A store for unformatted information
■ Easy to enter data
■ Fairly easy for a single user to retrieve data
■ Difficult for others to retrieve data
■ Difficult to search for data
■ Difficult to share non-electronic data
■ Most data organized chronologically
■ Single projects can become fragmented
■ Data associated temporally
■ Non-electronic data can prove the date of
discovery
Blurring ELN and LIMS
■ Linking samples and targets
■ Sharing data
■ Controlled vocabulary
■ E. coli not Escherichia coli
■ Sodium chloride not NaCl
■ Fixed descriptions of protocols
■ Moving towards the LIMS relational
database
■ Model for laboratory workflows and processes
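A controlled vocabulary maps each free-text variant onto one preferred term, so that "Escherichia coli" and "E. coli" are stored identically and can be searched together. The sketch below assumes a simple synonym table. The table contents and `normalize` are illustrative only.

```python
# Hypothetical synonym table: each accepted variant maps to one preferred term.
CONTROLLED_TERMS = {
    "escherichia coli": "E. coli",
    "e. coli": "E. coli",
    "nacl": "Sodium chloride",
    "sodium chloride": "Sodium chloride",
}

def normalize(term: str) -> str:
    """Return the preferred term, or raise if the term is not in the vocabulary."""
    key = " ".join(term.strip().lower().split())   # ignore case and extra spaces
    if key not in CONTROLLED_TERMS:
        raise KeyError(f"'{term}' is not in the controlled vocabulary")
    return CONTROLLED_TERMS[key]

print(normalize("Escherichia  coli"), normalize("NaCl"))  # E. coli Sodium chloride
```

Rejecting unknown terms, rather than storing them as typed, is what separates this from a free-text notebook entry.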
Benefits of LIMS
■ Distributed projects
■ Information can be accessed anywhere
■ Collaborative projects
■ Different people record into same store
■ Miniaturized projects
■ Labelling of samples becomes impossible
■ Automated projects
■ Handling layouts in plates etc.
■ High-throughput processes
■ System managed by computer with automated
sample tracking
Potential pitfalls of LIMS
■ Data loss
■ Hardware failure – manageable
■ Data corruption – potentially catastrophic
■ Data integrity
■ Data need to be described properly
■ A LIMS can default to being an ELN
■ Extra burden of recording data
■ Takes time for no immediate benefit
■ Need easy and intuitive input – risk of sloppiness
■ Compliance
■ Unrecorded data are lost
■ Incomplete data may break data “chain”
Recording Gateway cloning protocols
■ Example: 96 constructs taken through PCR, Gateway cloning &
expression screening with 2 cell lines & 2 protocols:
■ Uses 34 96-well plates and 36 24-well plates and generates 480
images of colony wells, 1536 lanes on agarose gels and 416
lanes on SDS–PAGE gels
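A quick tally of the figures quoted above shows the scale of tracking needed for one batch of constructs. This sketch only adds up the slide's numbers. The expression-trial count further assumes that every construct is screened in every cell line with every protocol.

```python
# Figures quoted on the slide for one batch of 96 constructs
constructs, cell_lines, protocols = 96, 2, 2

# Assumes a full cross: every construct x every cell line x every protocol
expression_trials = constructs * cell_lines * protocols
plate_wells = 34 * 96 + 36 * 24        # 34 96-well plates + 36 24-well plates
gel_lanes = 1536 + 416                 # agarose lanes + SDS-PAGE lanes
colony_images = 480

print(expression_trials, plate_wells, gel_lanes, colony_images)  # 384 4128 1952 480
```

At several thousand wells and lanes per batch, labelling and recording by hand in a paper notebook quickly becomes impractical.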
What is PIMS?
■ BBSRC SPoRT funded two consortia:
■ Scottish Structural Proteomics Facility (SSPF)
■ Membrane Protein Structure Initiative (MPSI)
■ PIMS is funded to develop a laboratory
information management system (LIMS):
■ Funded by the BBSRC SPoRT initiative
■ Funding Jan 2005 – Dec 2009
■ Supports SSPF, MPSI, OPPF & YSBL
■ Developers in Daresbury, EBI, OPPF & YSBL
■ Support from OPPF, Dundee & Daresbury
■ http://www.pims-lims.org/
Why develop PIMS?
■ Longstanding need for rational data
management for protein production
■ Complex, ever-changing workflow
■ To exploit higher throughput
■ To aid collaboration and make data public
■ Academic LIMS (and industrial?)
■ LISA, HalX, SESAME, MOLE, (Beehive)
■ Specific to one site, hard to maintain
■ PIMS is a collaborative effort to find a
common solution
■ Most laboratories have some similar processes
■ All have some unique processes
■ PIMS is a fully featured LIMS, not just target tracking
Some ancient history
■ Starts with the 2001 Airlie House agreement
■ To share protein production data
■ TargetDB is a limited implementation
■ Detailed specification of terms
■ European-based projects
■ eHTPX: data exchange models (dictionaries)
■ HAL & HalX: LIMS concentrating on workflow
■ MOLE: LIMS built on a generic data model
■ SPINE encouraged collaboration
■ Loose PIMS consortium formed to seek a
common solution
■ BBSRC SPoRT provided funding opportunity
Technologies used
■ PIMS is used from a web browser
■ Mozilla Firefox or Internet Explorer
■ No client software to install (perhaps plugins)
■ Windows, Macintosh and Linux clients
■ PIMS requires a web and database server
■ Typically the same machine
■ Web server: Apache Tomcat
■ Development on free PostgreSQL
■ Now available for Oracle
■ Windows and Linux servers
■ Technologies used by developers
■ Java 1.5, Hibernate, JUnit, BioJava, dot, batik, AJAX, ...
Aspects of application development
(Diagram: crystallographic applications, graphics applications, robotics and LIMS)
Basic concepts of PIMS
PIMS uses a few simple key concepts which can be
linked together to model complex workflows
Targets
■ Describe sequences and store annotations
Constructs
■ Starting points for real experiments, link to targets
Samples
■ Tracked samples made & used by experiments
■ Samples have types, owners, locations etc.
Experiments
■ Take one (or more) samples and produce new
sample(s) as outputs
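The four concepts can be sketched as linked records: a construct points back to its target, and an experiment consumes input samples and produces output samples. This is a minimal illustration only. The class names mirror the slide, but every field (for example a construct defined as a residue range of its target) is a hypothetical choice and not the PIMS schema.

```python
from dataclasses import dataclass, field

@dataclass
class Target:
    name: str
    sequence: str
    annotations: dict = field(default_factory=dict)

@dataclass
class Construct:
    name: str
    target: Target      # every construct links back to a target
    start: int          # assumed: residue range of the target, 1-based, inclusive
    end: int

    def sequence(self) -> str:
        return self.target.sequence[self.start - 1:self.end]

@dataclass
class Sample:
    name: str
    sample_type: str
    owner: str
    location: str

@dataclass
class Experiment:
    name: str
    inputs: list                                   # samples consumed
    outputs: list = field(default_factory=list)    # samples produced

    def produce(self, name: str, sample_type: str, owner: str, location: str) -> Sample:
        """Create a new tracked sample as an output of this experiment."""
        s = Sample(name, sample_type, owner, location)
        self.outputs.append(s)
        return s

tgt = Target("T1", "MKTAYIAKQRQISFVKSHFSRQ")
con = Construct("T1-c1", tgt, start=3, end=10)
template = Sample("T1 cDNA", "DNA", "alice", "Freezer 1")
pcr = Experiment("PCR", inputs=[template])
product = pcr.produce("T1-c1 PCR product", "PCR product", "alice", "Freezer 2")
print(con.sequence(), product.sample_type)  # TAYIAKQR PCR product
```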
Experiments and protocols
A protocol is a reusable user-defined template
describing what you record for your experiments.
Parameters
■ Numerical values, free text values or true/false values, e.g.
the incubation temperature or the number of PCR cycles;
details of incubation conditions; whether a reagent was added
Input Samples
■ Samples or reagents used when performing an
experiment that you wish to track
Output Samples
■ Samples or reagents produced when performing
an experiment that you wish to track
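A protocol can be thought of as a template that declares its parameters and their kinds. Each experiment then supplies values that are checked against that template. The sketch below makes that assumption explicit. `ParameterDef`, `Protocol.new_experiment` and the PCR parameter names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ParameterDef:
    name: str
    kind: type   # int/float (numerical), str (free text) or bool (true/false)

@dataclass
class Protocol:
    name: str
    parameters: list                                   # ParameterDef entries
    input_types: list = field(default_factory=list)    # sample types consumed
    output_types: list = field(default_factory=list)   # sample types produced

    def new_experiment(self, **values) -> dict:
        """Validate values against this template and return an experiment record."""
        expected = {p.name: p.kind for p in self.parameters}
        unknown = set(values) - set(expected)
        if unknown:
            raise ValueError(f"not defined by {self.name}: {sorted(unknown)}")
        record = {}
        for name, kind in expected.items():
            if name not in values:
                raise ValueError(f"missing parameter '{name}'")
            v = values[name]
            # bool is a subclass of int in Python, so reject it unless true/false is wanted
            if not isinstance(v, kind) or (isinstance(v, bool) and kind is not bool):
                raise TypeError(f"'{name}' should be {kind.__name__}")
            record[name] = v
        return {"protocol": self.name, "parameters": record}

pcr = Protocol(
    "PCR",
    parameters=[ParameterDef("cycles", int), ParameterDef("anneal_temp_C", float),
                ParameterDef("notes", str), ParameterDef("dmso_added", bool)],
    input_types=["Template DNA", "Primer"],
    output_types=["PCR product"],
)
expt = pcr.new_experiment(cycles=30, anneal_temp_C=55.0, notes="standard", dmso_added=False)
print(expt["parameters"]["cycles"])  # 30
```

Because the template is reusable, every PCR experiment records the same parameters in the same form, and that consistency is what makes the data searchable later.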
Typing of PIMS items
Typing helps PIMS offer sensible choices: only a
plasmid can be used for transfection experiments…
Samples
■ Typed to show what they are
Input/Output samples for protocols
■ State what type of sample can be used and what
is produced
Experiments and protocols
■ An experiment type is defined by its protocol. A
protocol type links similar protocols together
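Typing lets the software offer only the samples a protocol can actually use. The sketch below assumes that each protocol declares a single accepted input type. The sample names, protocol name and `eligible_inputs` are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    name: str
    sample_type: str

@dataclass
class Protocol:
    name: str
    protocol_type: str   # groups similar protocols, e.g. "Transfection"
    input_type: str      # assumed: the one sample type this protocol accepts

def eligible_inputs(protocol: Protocol, samples: list) -> list:
    """Offer only the samples whose type matches the protocol's declared input type."""
    return [s for s in samples if s.sample_type == protocol.input_type]

stock = [Sample("T1 plasmid", "Plasmid"), Sample("T1 PCR product", "PCR product"),
         Sample("Lysis buffer", "Reagent")]
transfect = Protocol("Transient transfection", "Transfection", "Plasmid")
print([s.name for s in eligible_inputs(transfect, stock)])  # ['T1 plasmid']
```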
Experiments & samples → Workflows
(Diagram: Samples A, B, C, D, E1 and E2 linked through Expts 1 to 4 to form a workflow)
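Because each experiment consumes and produces samples, a workflow can be derived from the records themselves rather than being fixed in advance: an experiment must follow whichever experiments made its inputs. In the sketch below the wiring between the diagram's samples and experiments is assumed for illustration, and it uses Python's standard `graphlib` (3.9+) for the topological sort.

```python
from graphlib import TopologicalSorter

# Hypothetical wiring: experiment -> (input samples, output samples)
experiments = {
    "Expt 1": (["Sample A"], ["Sample B"]),
    "Expt 2": (["Sample B"], ["Sample C"]),
    "Expt 3": (["Sample A"], ["Sample D"]),
    "Expt 4": (["Sample C", "Sample D"], ["Sample E1", "Sample E2"]),
}

def workflow_order(expts: dict) -> list:
    """Order experiments so that each runs after the experiments that made its inputs."""
    producer = {s: name for name, (_, outs) in expts.items() for s in outs}
    graph = {name: {producer[s] for s in ins if s in producer}   # predecessors
             for name, (ins, _) in expts.items()}
    return list(TopologicalSorter(graph).static_order())

order = workflow_order(experiments)
print(order)
```

Samples with no recorded producer (Sample A here) are treated as starting material, so new branches can be added without redefining a fixed pipeline.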
The PIMS holder (plate experiments)
A holder groups samples. This allows PIMS to
perform plate experiments in groups
Samples
■ For plate experiments, the output samples of the
previous experiment are mapped to the input
samples of the next (provided the sample types
match!)
User interface for plate experiments
■ Gives graphical and spreadsheet views.
Allows editing, reformatting and spreadsheet
upload
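The well-to-well mapping between consecutive plate experiments can be sketched by treating a holder as a dictionary keyed by well label. This is an assumption for illustration only; `wells`, `map_plate` and the sample names are hypothetical. The type check mirrors the "provided the sample types match" rule.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    name: str
    sample_type: str

def wells(rows: str = "ABCDEFGH", cols: int = 12) -> list:
    """Well labels of a 96-well plate in row-major order: A01, A02, ..., H12."""
    return [f"{r}{c:02d}" for r in rows for c in range(1, cols + 1)]

def map_plate(previous_outputs: dict, expected_type: str) -> dict:
    """Map each well's output sample to the same well of the next plate experiment,
    refusing any well whose sample type does not match the next protocol's input."""
    mapping = {}
    for well, sample in previous_outputs.items():
        if sample.sample_type != expected_type:
            raise TypeError(f"{well}: {sample.sample_type} cannot be used as {expected_type}")
        mapping[well] = sample
    return mapping

# The dict keyed by well stands in for a holder (here, a 96-well plate)
pcr_plate = {w: Sample(f"PCR product {w}", "PCR product") for w in wells()}
next_inputs = map_plate(pcr_plate, "PCR product")
print(len(next_inputs), next_inputs["H12"].name)  # 96 PCR product H12
```

Mapping a whole plate in one call is what makes 96 experiments as cheap to record as one.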
Basic protocols used at OPPF
(Diagram: PCR, Verification, PCR Clean Up, InFusion, Transformation, Sequencing, Scale Up, Plasmid Prep, Lysis, Purification, Trial Expression and Concentration)
A workflow derived from PIMS
■ Experiments can read/write data
■ How was a sample produced?
■ Export to external target tracker software
■ PDF reports of how samples are made as a permanent record
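Answering "how was a sample produced?" amounts to walking backwards from a sample through the experiments that made it, down to starting material. The sketch below assumes a simple lookup table and an acyclic history. The records and `history` are hypothetical, and the indented text merely stands in for a report.

```python
# Hypothetical records: sample -> (experiment that made it, that experiment's inputs)
produced_by = {
    "T1 plasmid": ("InFusion", ["T1 PCR product", "Vector"]),
    "T1 PCR product": ("PCR", ["T1 cDNA", "Primers"]),
}

def history(sample: str, depth: int = 0) -> list:
    """Return an indented trace of how a sample was made (assumes no cycles)."""
    if sample not in produced_by:
        return ["  " * depth + f"{sample} (starting material)"]
    expt, inputs = produced_by[sample]
    lines = ["  " * depth + f"{sample} <- {expt}"]
    for s in inputs:
        lines.extend(history(s, depth + 1))
    return lines

print("\n".join(history("T1 plasmid")))
```

A sample reported as "starting material" may also be an unrecorded step. This is how incomplete data break the data chain.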
Target annotation at the OPPF
■ Used at OPPF and externally by YSBL & SSPF
■ OPAL freely available over the web
■ OPTIC DB managing annotations for 12226 targets
Albeck et al. (2006) Acta Cryst. D62, 1184
Example: predicting crystallizability
■ Simple to understand, calculated in advance
■ Helps to set priorities for lab work
Example: improving construct design
OPPF crystallization facility robots
■ 4°C: 1000 plates
■ 21°C: 10000 plates
Automation of crystallization trials
■ Live since Jan 2002
■ 22570 plates
■ 51845521 images
■ >28 TB of images
■ 351 registered users for the OPPF site
■ 100 OPPF/STRUBI
■ 118 Other Oxford
■ 50 Other UK
■ 83 Elsewhere
Mayo et al. (2005) Structure 13, 175
OPPF crystallization management
Controlled web-based access from anywhere, e.g. synchrotrons
xtalPIMS managing high-throughput
crystallization trials
■ Fusion of PIMS, Vault and eHTPX work
■ Can support multiple imaging and storage robots
Acknowledgments
PIMS PIs
■ Kim Henrick, Dave Stuart, Keith Wilson, Richard
Blake, Jim Naismith, Neil Isaacs
PIMS supported
■ Anne Pajon, Ed Daniel, Marc Savitsky, Susy Griffiths,
Katya Pilicheva
CCP4 supported
■ Chris Morris, Bill Lin
Other projects
■ OPPF: Jon Diprose
■ MPSI & SSPF: Petr Troshin, Jo van Niekerk
■ xtalPIMS: Ian Berry, Gael Seroul, Diederick De Vries