Structural Genomics UK - an Oxford perspective Dave Stuart, UK-China meeting, June 2002

Structural Genomics in Britain
Structure of presentation:
- e-science considerations (information content)
- survey
- some tasks and status
Structural Genomics
Information content – the scale of the problem
Human genomic DNA:     3.2 Gbases = 6.4 Gbits = ~1 Gbyte
Translated proteins:   100,000 (conservative estimate?)
  amino acids:         >30,000,000
  non-H atoms:         0.2 Gatoms
  parameters:          1 Gparameters
  parameter data:      2 Gbyte
  experimental data:   200 Gbyte
This would require:
3.2 Million Gbyte of X-ray data
Of course this assumes 1 structure / protein
For druggable targets, eg HIV-1 RT, may expect say >100 data sets
to be collected for 1 protein!
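As a rough check of the arithmetic on this slide, the sketch below recomputes the genome size in bytes and a few per-protein ratios from the quoted totals. It assumes 2 bits per base (four possible bases) and decimal units (1 Gbyte = 10^9 bytes); the derived ratios are simple divisions of the slide's figures and are not stated in the original.

```python
# Back-of-envelope check of the figures quoted on the slide.
# Assumptions (not from the slide): 2 bits per base, 1 Gbyte = 1e9 bytes.

GENOME_BASES = 3_200_000_000      # 3.2 Gbases
BITS_PER_BASE = 2                 # assumption: A/C/G/T fits in 2 bits

PROTEINS = 100_000                # "conservative estimate?"
AMINO_ACIDS = 30_000_000          # slide says ">30,000,000", so a lower bound
NON_H_ATOMS = 200_000_000         # 0.2 Gatoms
PARAMETERS = 1_000_000_000        # 1 Gparameters
XRAY_DATA_GBYTE = 3_200_000       # "3.2 Million Gbyte of X-ray data"

genome_bits = GENOME_BASES * BITS_PER_BASE
genome_gbyte = genome_bits / 8 / 1e9

residues_per_protein = AMINO_ACIDS / PROTEINS          # lower bound
parameters_per_atom = PARAMETERS / NON_H_ATOMS
xray_gbyte_per_protein = XRAY_DATA_GBYTE / PROTEINS    # 1 structure / protein

if __name__ == "__main__":
    print(f"genome: {genome_bits / 1e9:.1f} Gbits = {genome_gbyte:.1f} Gbyte")
    print(f"residues per protein (at least): {residues_per_protein:.0f}")
    print(f"parameters per non-H atom: {parameters_per_atom:.0f}")
    print(f"X-ray data per protein: {xray_gbyte_per_protein:.0f} Gbyte")
```

The 0.8 Gbyte result is what the slide rounds to "~1 Gbyte". With >100 data sets for a single druggable target, the per-protein figure would scale up accordingly.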
Structural Genomics
Information content – what we can do NOW
X-ray data collection at a 3rd generation synchrotron
(present technologies!)
3 sec / image
1,200 images / hour
~ 1,000 Gbyte / station / day
So for the planned day 1 beamlines at the new UK
synchrotron, Diamond
….
Up to 1,000 Tbytes / year of data,
to be shipped (GRID) / analysed (HPC) /
archived (???)
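The data-rate figures above can be cross-checked with a few lines of arithmetic. This is a sketch under stated assumptions: continuous operation (24 hours a day, 365 days a year, so an upper bound), decimal units, and three protein crystallography beamlines (the "3 out of 7 day one beamlines for HTP PX" quoted elsewhere in this talk). The per-image size is only what the quoted rates imply, not a figure from the slide.

```python
# Cross-check of the synchrotron data-rate figures.
# Assumptions: round-the-clock operation, 1 Tbyte = 1000 Gbyte = 1e6 Mbyte.

SECONDS_PER_IMAGE = 3
GBYTE_PER_STATION_PER_DAY = 1_000
PX_BEAMLINES = 3                  # day-one HTP PX beamlines quoted in the talk

images_per_hour = 3600 // SECONDS_PER_IMAGE
images_per_day = images_per_hour * 24

# Image size implied by the quoted daily volume (derived, not from the slide).
implied_mbyte_per_image = GBYTE_PER_STATION_PER_DAY * 1000 / images_per_day

# Yearly volume across all PX beamlines if they never stopped collecting.
tbyte_per_year = PX_BEAMLINES * GBYTE_PER_STATION_PER_DAY * 365 / 1000

if __name__ == "__main__":
    print(f"{images_per_hour} images / hour, {images_per_day} images / day")
    print(f"implied image size: {implied_mbyte_per_image:.1f} Mbyte")
    print(f"upper bound: {tbyte_per_year:.0f} Tbyte / year")
```

Under these assumptions the yearly total comes to about 1,095 Tbyte, in line with the "up to 1,000 Tbytes / year" estimate.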
Structural Genomics
The ultimate aim is to tackle more complex
proteins, macromolecular complexes and
macromolecular machines….
For example
The sheer complexity of these systems
poses problems – eg the BTV core has
1000 protein subunits, another system has
a mass of 66 MDa – detailed analysis of
these is now possible due to beam
characteristics at 3rd generation
synchrotron sources
(not just viruses, eg ribosome)
ESRF Grenoble
Grimes et al., Nature, 1998,
Gouet et al., Cell, 1999
Structural Genomics
International context, synchrotron end –
eg NIH programme example
Software Development:
Complex Instrumentation to simple GUI
Structural Genomics in Britain
Despite a strong history of structural
biology, there is still rather little
coordinated activity in structural genomics
in the UK.
Ongoing:
- BBSRC: current review of SB
- Wellcome Trust / Industry: SGC
- Oxford: MRC funded OPPF
- Daresbury: NWSGC
- The European perspective: SPINE
- e-science, BBSRC: SRS, EBI, Oxford, York
Diamond – the new UK synchrotron
• Joint project: Office of Science and
Technology, Wellcome Trust, France
• Science:
roughly 50/50 Biology/Physical Sciences
3 out of 7 day one beamlines for HTP PX
Extant UK synchrotron resources (i)
• SRS: new MAD beamline under
construction
• ESRF to build HTP public PX
beamline
Extant UK synchrotron resources (ii)
BM14 at the ESRF
Owned and run by UK in collaboration with EMBL
(UK: MRC France)
Broad energy range MAD beamline
Intention is to provide a test-bed
for automation
Automation investigated: expect to order EMBL
microdiffractometer and automatic sample changer
Oxford Protein Production Facility (OPPF)
• MRC funding, 3 years initially (~6 M UK £)
• ‘Pilot project’ for larger scale activity
associated with the Diamond synchrotron
Oxford Protein Production Facility
WWW.OPPF.OX.AC.UK
Management Group:
John Bell, Iain Campbell, Simon Davis, Robert Esnouf,
Jon Grimes, Karl Harlos, Louise Johnson, Yvonne Jones,
Ian Jones, David Kerr, Tony Monaco, Gavin Screaton,
Dave Stammers, Dave Stuart
Executive Committee:
Rob Esnouf, Jon Grimes, Karl Harlos, Yvonne Jones,
Dave Stammers, Dave Stuart (chair)
Project Manager:
Dr R. Owens
Staff: David Alderton, Rene Assenberg, Nick Berrow, Jon Diprose,
Sally Greening, Jo Nettleship, Nahid Rahman-Huq, Tom Walter
(Lester Carter, Mike Pickford)
WTC
OPPF
Building handover – April 2002
Aims / Philosophy of the OPPF
• Link in with existing biomedical research programmes which are
using, for instance, microarray and SAGE technologies
• Targets: mainly human proteins relevant to human health, plus
human viruses, driven by input from existing biology programmes
(funded by MRC and other bodies)
• Establish expression in bacteria/insect/mammalian systems
• Provide protein as a resource for functional studies and
structural studies (e.g. use GFP to track protein expression)
• Proteins will be ‘reagents’ for programmes aiming to look at
assemblies of several components (co-expression)
• Link in with NMR and cryo-EM
• Target 1000 clones per year into pipeline
Target definition
• Herpes viruses
• Proteins characteristic of immune cell function
• Zinc finger containing proteins / transcription
factors
• The cancer genome
• Protein modules (largely extracted from above)
OPPF tasks
• Bioinformatics. Data base construction, LIMS
integration.
• Protein expression/purification. Standardization to
pipeline 1000 target proteins per year.
• Crystallization. Automation of screening, detection
and optimization.
• Data collection/phasing. Data base integration with
synchrotron
(no direct support for X-ray, NMR, or EM)
OPPF – management structure
(Executive)
Tracking and scheduling with a Laboratory
Information Management System (LIMS)
Virtues evident
- from financial accounting to data mining
Effort considerable
- after investigation decided to go with a
commercial system
Barcodes
• Coding symbology likely to be adopted by the OPPF:
128C encoding 12 numeric digits
• Suggested usage format of the 12 digits:
XX YYY ZZZZZZ C
XX      Organisation identifier – the OPPF would take 44
        The range 90-99 is reserved, and will be used for three digit
        organisation identifiers when this becomes necessary
YYY     Object identifier – eg 998 for normal Greiner plates, 999 for
        shallow Greiner plates, 000 for people, etc.
        (Gives a range of 1000 object types)
ZZZZZZ  Object content identifier – a unique identifier for this item
        within this object set
        (Gives a range of 1000000 uniquely identified items per object)
C       Triple-add-triple checksum – helps prevent typos if manually
        entered
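The 12-digit format XX YYY ZZZZZZ C can be sketched in code as follows. The slide does not define "triple-add-triple", so this sketch assumes it means the alternating 3, 1, 3, ... weighted modulo-10 check digit used by UPC/EAN barcodes; if the OPPF scheme differs, only `triple_add_triple_checksum` would need to change. The function names are illustrative, not part of any OPPF software.

```python
# Sketch of the 12-digit barcode format XX YYY ZZZZZZ C.
# ASSUMPTION: "triple-add-triple" = UPC/EAN-style weights 3,1,3,... mod 10.

def triple_add_triple_checksum(digits: str) -> int:
    """Check digit for a string of decimal digits (weights 3,1,3,...)."""
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("payload must contain only the digits 0-9")
    total = 0
    for index, char in enumerate(digits):
        weight = 3 if index % 2 == 0 else 1   # triple, add, triple, ...
        total += weight * int(char)
    return (10 - total % 10) % 10


def make_barcode(organisation: int, object_type: int, item: int) -> str:
    """Build XX YYY ZZZZZZ C (without spaces) from its three fields."""
    if not 0 <= organisation <= 89:           # 90-99 reserved for 3-digit ids
        raise ValueError("two-digit organisation id must be in 00-89")
    if not 0 <= object_type <= 999:
        raise ValueError("object type must be in 000-999")
    if not 0 <= item <= 999_999:
        raise ValueError("item number must be in 000000-999999")
    payload = f"{organisation:02d}{object_type:03d}{item:06d}"
    return payload + str(triple_add_triple_checksum(payload))


def parse_barcode(code: str) -> tuple[int, int, int]:
    """Validate a 12-digit code; return (organisation, object_type, item)."""
    if len(code) != 12 or not (code.isascii() and code.isdigit()):
        raise ValueError("expected exactly 12 digits")
    payload, check = code[:11], int(code[11])
    if triple_add_triple_checksum(payload) != check:
        raise ValueError("checksum mismatch (possible typing error)")
    return int(payload[:2]), int(payload[2:5]), int(payload[5:])


if __name__ == "__main__":
    # OPPF (44), normal Greiner plate (998), item number 1
    code = make_barcode(44, 998, 1)
    print(code, parse_barcode(code))
```

A single mistyped digit always changes the weighted sum modulo 10 under this scheme, so `parse_barcode` rejects it, which is the "help prevent typos" role the slide gives the checksum.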
Global identification
A flexible solution would be:
• Coding symbology:
Unspecified
• Alphanumeric/Numeric:
The string must start with the
numeric organisation identifier (first two or three
characters), otherwise no constraint is imposed
• Length of encoded string: >2
(>3 for organisation identifiers beginning 90-99)
• Each organisation then provides a web-based facility to
translate their own code string into the relevant
documentation for each item made available to other
groups.
• A centrally-maintained list maps the organisation identifier
(first two or three digits of string) to the organisation name
and the URL of the documentation-providing facility
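The lookup described above could work roughly as follows: read the organisation prefix (two digits, or three when it begins 90-99), find the organisation in the central list, and hand the full code string to that organisation's web facility. In this sketch the registry contents, the URL and the function names are hypothetical placeholders; only the identifier 44 for the OPPF comes from the talk.

```python
# Sketch of resolving a global identifier to its documentation URL.
# The registry entry's URL is a HYPOTHETICAL placeholder, not a real service.
from urllib.parse import quote

# Centrally maintained list: organisation id -> (name, documentation base URL)
REGISTRY = {
    "44": ("OPPF", "https://example.org/oppf/lookup?code="),  # URL is made up
}


def organisation_prefix(code: str) -> str:
    """Return the leading organisation identifier of a code string."""
    # Identifiers starting 90-99 are three digits long, all others two.
    width = 3 if code[:1] == "9" else 2
    if len(code) <= width:
        raise ValueError("code must be longer than its organisation id")
    prefix = code[:width]
    if not (prefix.isascii() and prefix.isdigit()):
        raise ValueError("code must start with a numeric organisation id")
    return prefix


def documentation_url(code: str, registry=REGISTRY) -> str:
    """Map a code string to the owning organisation's documentation URL."""
    prefix = organisation_prefix(code)
    if prefix not in registry:
        raise KeyError(f"unknown organisation identifier {prefix}")
    _name, base_url = registry[prefix]
    # The rest of the string is unconstrained, so escape it for use in a URL.
    return base_url + quote(code, safe="")


if __name__ == "__main__":
    print(organisation_prefix("449980000011"))
    print(documentation_url("449980000011"))
```

Because nothing after the prefix is constrained, each organisation stays free to choose its own symbology and string layout, which is the flexibility the slide is after.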
Cloning issues
Cloning strategy: Gateway (present activity) or
Infusion
(LIC – ability to switch expression
system readily)
Expression: Bacteria / insects / mammalian
Tags: 1 or 2 tags plus fusion protein option via Gateway
(initial tests done)
Expression & purification issues
• Initial work with E. coli
• Activate Baculovirus infection route in year 1
• Mammalian cell route within 2 years (developments)
• Initial N-term His tag construct, HRV 3C protease
cleavable (Gateway issues, topo adaptation or
Infusion, under investigation – to avoid current nested
PCR and long primers)
Qiagen Biorobot 8000
- 96 well expression screening
- 96 well parallel protein purification – magnetic
& vacuum manifold technologies
- In particular Ni NTA
- or simply protein detection
Crystallisation issues
Technology: 96 well sitting drop
Drop volumes: nanolitre
Plates: currently Greiner
Reservoir dispensing for screens:
Robbins hydra, in-house adaptations
Drop dispensing:
Cartesian microsys (8 head), in-house
adaptations
Cartesian mods
Current drop size 100nl (will scale down)
Using lab scientists to test system.
…. They have voted with their feet!
Rate of equilibration…
Crystals are large enough…
Thanks to C. Nicholls
Storage:
TAP 10,000 plate robot
Imaging:
Integrated Veeco-Optimag system
1 tray - 96 images / minute
SPINE
Structural Proteomics IN Europe
3 Year grant – 13.7 M Euros
Technologies development:
cloning and expression, through to
crystallography and NMR
Biomedical targets:
-Human pathogens:
Bacterial: TB & Campylobacter
Viral: Herpes viruses & enzyme targets
-Human proteins:
Cancer related targets
Neurological development/disorders