Experimental Phasing with an Eye on Automation

Structural Biology: A Collaborative Necessity
Or:
Collaborative Computing – does it have a future?
Or:
What should MX Software deliver now?
• West Coast Crystallographic Meeting
• Monterey: March 11th 2007
How has MX changed in 2007?
• The Internet – questions can be asked and answered; information found
• Much more work done much faster, so desperate need for organisation of information
• Better languages, faster computers, better graphics
• But still need good, appropriate algorithms.
Has CCP4 helped develop New Algorithms?
• Yes… maybe, but developers are free spirits – very little “contracted” software
• CCP4MG
• Coot
• Acorn
• New density modification from KDC
• CCP4 provides distribution and support; author keeps copyright, references flagged
What helps algorithm development?
• Common data structures (formats?)
• Library routines for data handling and crystallographic operations (symmetry, FFTs, etc.):
Libraries must be accessible to developers, and there needs to be a way to update them, to add routines and to debug. They must be well documented and curated.
Cooperation – is it possible? desirable?
Advantages
• Can speed up developments if library routines are well documented and accessible
• Shared efforts for maintenance and distribution extend the code lifetime
• Common style helps users
• Organising crystallographic data is not easy and requirements change – maybe we can agree on and provide a better standard?
• MTZ model carries vital information with it.
Cooperation – is it possible? desirable?
Disadvantages
• Time consuming – consultation essential
• Needs commitment by developers of algorithms and libraries – often faster to make a quick kludgy fix than read the library; new routines may need to be added to libraries
• Harder to get credit, raise funds
• Licensing issues!
Friendly Discussion Amongst Developers??
The Future - Automation ?
• Chemists now use crystallography as a tool and the
software is robust.
• MX will often be used in the same way in future – a handy technique the user cannot be expected to understand or criticise.
• Obviously, automation modules must be designed by good crystallographers
(How will the good crystallographers be trained?)
What level of knowledge to assume – Discuss?
• Assessing the experiment
• Some understanding of crystal lattices, symmetry, point
groups & spacegroups
• Something about intensity statistics (at least that they exist!)
• I think it is important to know the structure factor equation
• Much basic information in
http://www.ccp4.ac.uk/docs.php
• (But there are still pathological cases.. See CCP4BB!)
• Acquiring some crystallographic know-how.
How much time will people devote to this?
Extracts from York tutorials given by Johan
Turkenburg (most slides taken from the web)
16 slides I thought essential follow!
Crystal: unit cell + lattice + symmetry
The unit cell in three dimensions.
The unit cell is defined by three vectors a, b, and c, and three angles α, β, γ.
[Figure: unit cell with axes a, b, c and angles α, β, γ]
α is the angle between b and c; β between a and c; γ between a and b.
Unit cells are usually defined in terms of the lengths of the three vectors and the
three angles.
For example, a=94.2Å, b=72.6Å, c=30.1Å, α=90°, β=102.1°, γ=90°.
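As a small worked example, the six cell parameters above are enough to compute the cell volume. The sketch below uses the standard general (triclinic) volume formula; the function name `cell_volume` is just an illustrative choice, not part of any CCP4 library.

```python
import math

def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit cell volume in Å^3; lengths in Å, angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    # General formula, valid for all seven crystal systems.
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)

# The monoclinic example cell from the slide.
volume = cell_volume(94.2, 72.6, 30.1, 90.0, 102.1, 90.0)
print(f"Cell volume: {volume:.0f} Å^3")
```

For a monoclinic cell with α = γ = 90° the formula reduces to a·b·c·sin β, which is a quick sanity check.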
The Seven Crystal Systems
The 230 space groups can be grouped into seven crystal systems

Crystal System  | Minimum Symmetry            | Bravais Lattices                                                                  | Unit Cell Geometry
1. Triclinic    | None                        | 1. Primitive (P)                                                                  | a ≠ b ≠ c; α ≠ β ≠ γ
2. Monoclinic   | One 2fold axis              | 2. Primitive (P), 3. Base-Centered (C)                                            | a ≠ b ≠ c; α = γ = 90° ≠ β
3. Orthorhombic | Three orthogonal 2fold axes | 4. Primitive (P), 5. Base-Centered (C), 6. Body-Centered (I), 7. Face-Centered (F) | a ≠ b ≠ c; α = β = γ = 90°
4. Tetragonal   | One 4fold axis              | 8. Primitive (P), 9. Body-Centered (I)                                            | a = b ≠ c; α = β = γ = 90°
5. Trigonal     | One 3fold axis              | 10. Primitive (P)                                                                 | a = b ≠ c; α = β = 90°, γ = 120°
                |                             | 11. Rhombohedral (R)                                                              | a = b = c; α = β = γ ≠ 90°
6. Hexagonal    | One 6fold axis              | 10. Primitive (P) (same lattice as trigonal P)                                    | a = b ≠ c; α = β = 90°, γ = 120°
7. Cubic        | Four 3fold axes             | 12. Primitive (P), 13. Body-Centered (I), 14. Face-Centered (F)                   | a = b = c; α = β = γ = 90°
Owing to symmetry requirements some unit cells may not be primitive.
In total only 14 different combinations of a, b, c and α, β, γ can exist: the 14 Bravais lattices.
Therefore we can have:
• P – primitive
• I – body centred
• A, B, C – centred on one pair of faces
• F – all-face centred unit cells
[Figure: P, I, F and A, B, C centred unit cells with axes a, b, c]
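The unit cell geometry of the seven crystal systems can be turned into a quick consistency check. The sketch below lists which lattice constraints a measured cell satisfies within a tolerance; the names `CONSTRAINTS` and `compatible_systems` and the tolerance are illustrative assumptions. Note that this only tests the metric: the true crystal system is decided by the symmetry of the diffraction pattern, not by the cell dimensions alone.

```python
def _eq(x, y, tol):
    return abs(x - y) <= tol

# Equality constraints only: "≠" in the table means "not required to be equal".
CONSTRAINTS = {
    "triclinic": lambda a, b, c, al, be, ga, t: True,
    "monoclinic": lambda a, b, c, al, be, ga, t: _eq(al, 90, t) and _eq(ga, 90, t),
    "orthorhombic": lambda a, b, c, al, be, ga, t: all(_eq(x, 90, t) for x in (al, be, ga)),
    "tetragonal": lambda a, b, c, al, be, ga, t: _eq(a, b, t)
    and all(_eq(x, 90, t) for x in (al, be, ga)),
    "hexagonal": lambda a, b, c, al, be, ga, t: _eq(a, b, t)
    and _eq(al, 90, t) and _eq(be, 90, t) and _eq(ga, 120, t),
    "rhombohedral": lambda a, b, c, al, be, ga, t: _eq(a, b, t) and _eq(b, c, t)
    and _eq(al, be, t) and _eq(be, ga, t),
    "cubic": lambda a, b, c, al, be, ga, t: _eq(a, b, t) and _eq(b, c, t)
    and all(_eq(x, 90, t) for x in (al, be, ga)),
}

def compatible_systems(cell, tol=0.05):
    """Return the lattice types whose metric constraints the cell satisfies."""
    return [name for name, check in CONSTRAINTS.items() if check(*cell, tol)]

print(compatible_systems((94.2, 72.6, 30.1, 90.0, 102.1, 90.0)))
```

The "hexagonal" entry also covers trigonal cells described on hexagonal axes, since both share the same primitive lattice.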
Symmetry Operators and Elements
Apart from the identity and translational symmetry, protein crystals can only contain
the following symmetry elements:
Proper rotation: rotate by 360°/n, where n = 2, 3, 4 or 6.
Screw rotation: rotate by 360°/n and translate by d(m/n) along the axis, where d is the unit cell edge.

Rotation   | Proper rotation symbol (n) | Screw rotation symbols (n with subscript m)
Two-fold   | 2                          | 2₁
Three-fold | 3                          | 3₁, 3₂
Four-fold  | 4                          | 4₁, 4₂, 4₃
Six-fold   | 6                          | 6₁, 6₂, 6₃, 6₄, 6₅
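To make the screw-axis definition concrete, here is a minimal sketch that applies a screw rotation about the c axis to a point in fractional coordinates. The rotation matrices are the usual ones for axes along c (the 3-fold and 6-fold ones assume hexagonal axes); the function name `screw` and the test point are illustrative only.

```python
from fractions import Fraction

# n-fold rotations about c, acting on fractional coordinates (x, y, z).
ROTATIONS_ABOUT_C = {
    2: ((-1, 0, 0), (0, -1, 0), (0, 0, 1)),
    3: ((0, -1, 0), (1, -1, 0), (0, 0, 1)),  # hexagonal axes
    4: ((0, -1, 0), (1, 0, 0), (0, 0, 1)),
    6: ((1, -1, 0), (1, 0, 0), (0, 0, 1)),   # hexagonal axes
}

def screw(n, m, xyz):
    """Rotate by 360/n degrees about c, then translate by m/n of the c edge."""
    rotation = ROTATIONS_ABOUT_C[n]
    rotated = [sum(r * x for r, x in zip(row, xyz)) for row in rotation]
    rotated[2] += Fraction(m, n)
    return tuple(rotated)

start = (Fraction(1, 10), Fraction(1, 5), Fraction(0))
point = start
for _ in range(4):
    point = screw(4, 1, point)  # four applications of a 4(1) screw axis
print(point)
```

Applying an n-fold screw n times returns the point to its starting x, y and moves it by m whole cell edges along the axis, which is why the operation is compatible with the lattice.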
Space group diagram P2₁2₁2₁
Know where Int Tab A (International Tables for Crystallography, Volume A) is!!
Indexing Conventions:
http://www.ccp4.ac.uk/dist/html/reindexing.html
Example:
• Reindexing (CCP4: General) - information about changing indexing regime
• etc
• All P3i and H3:
(h,k,l) is not equivalent to (-h,-k,l) or (k,h,-l) or (-k,-h,-l), so we need to check all 4 possibilities:
• real axes: (a,b,c) and (-a,-b,c) and (b,a,-c) and (-b,-a,-c)
• reciprocal axes: (a*,b*,c*) and (-a*,-b*,c*) and (b*,a*,-c*) and (-b*,-a*,-c*)
• i.e. reindex (h,k,l) to (-h,-k,l) or (h,k,l) to (k,h,-l) or (h,k,l) to (-k,-h,-l).
N.B. For trigonal space groups, symmetry equivalent reflections can be conveniently described as (h,k,l), (k,i,l) and (i,h,l) where i=-(h+k). Replacing the 4 basic sets with a symmetry equivalent gives a bewildering range of possibilities!
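The four alternative indexing schemes and the trigonal symmetry equivalents can be written out explicitly. This is only an illustrative sketch: the names `REINDEX_OPS`, `alternative_indexings` and `trigonal_equivalents` are made up here and are not CCP4 routines.

```python
# The four indexing choices listed for P3i and H3.
REINDEX_OPS = {
    "h,k,l": lambda h, k, l: (h, k, l),
    "-h,-k,l": lambda h, k, l: (-h, -k, l),
    "k,h,-l": lambda h, k, l: (k, h, -l),
    "-k,-h,-l": lambda h, k, l: (-k, -h, -l),
}

def alternative_indexings(hkl):
    """Return the index a reflection would carry under each indexing choice."""
    return {name: op(*hkl) for name, op in REINDEX_OPS.items()}

def trigonal_equivalents(hkl):
    """Three-fold related reflections (h,k,l), (k,i,l), (i,h,l) with i = -(h+k)."""
    h, k, l = hkl
    i = -(h + k)
    return [(h, k, l), (k, i, l), (i, h, l)]

for name, index in alternative_indexings((1, 2, 3)).items():
    print(name, index, trigonal_equivalents(index))
```

Each of the four choices expands into its own set of symmetry equivalents, which is where the "bewildering range of possibilities" comes from.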
Many choices of Asymmetric unit and unit cell
See http://www.ccp4.ac.uk/dist/html/alternate_origins.html
Unit cell = The smallest volume from which the entire crystal can be constructed
by translation only.
Diffraction Geometry
Diffraction lattice and symmetry do not mirror crystal symmetry exactly:
Use Reciprocal Space definitions to describe it.
• First we need to define the relation between real space and reciprocal space (i.e. crystal lattice and diffraction space).
• This requires us to look at Bragg planes and Miller indices.
Definitions used for reciprocal space
• To go from real to reciprocal space we define a set of
axes a*, b* and c* such that:
• a* is perpendicular to b and c (b.a* = c.a* = 0)
• b* is perpendicular to a and c (a.b* = c.b* = 0)
• c* is perpendicular to a and b (a.c* = b.c* = 0)
• a.a* = b.b* = c.c* = 1
• For an orthogonal system, the length of a* is 1/(length of a)
• The length of a reciprocal lattice vector d* is related to the interplanar spacing d in real space by |d*| = 1/d
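The definitions above can be checked numerically. The sketch below builds Cartesian real-space axes from the cell parameters (with a along x and b in the xy plane, one common convention assumed here), derives a*, b*, c* from cross products divided by the cell volume, and computes an interplanar spacing. All function names are illustrative.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def real_axes(a, b, c, alpha, beta, gamma):
    """Cartesian vectors for a, b, c (Å); a along x, b in the xy plane."""
    al, be, ga = (math.radians(x) for x in (alpha, beta, gamma))
    cx = c * math.cos(be)
    cy = c * (math.cos(al) - math.cos(be) * math.cos(ga)) / math.sin(ga)
    cz = math.sqrt(c * c - cx * cx - cy * cy)
    return (a, 0.0, 0.0), (b * math.cos(ga), b * math.sin(ga), 0.0), (cx, cy, cz)

def reciprocal_axes(a_vec, b_vec, c_vec):
    """a* = (b x c)/V, b* = (c x a)/V, c* = (a x b)/V, so that a.a* = 1 etc."""
    volume = dot(a_vec, cross(b_vec, c_vec))
    pairs = ((b_vec, c_vec), (c_vec, a_vec), (a_vec, b_vec))
    return tuple(tuple(x / volume for x in cross(u, v)) for u, v in pairs)

def d_spacing(hkl, recip):
    """Interplanar spacing d = 1/|h a* + k b* + l c*|."""
    s = tuple(sum(i * axis[n] for i, axis in zip(hkl, recip)) for n in range(3))
    return 1.0 / math.sqrt(dot(s, s))

real = real_axes(94.2, 72.6, 30.1, 90.0, 102.1, 90.0)
recip = reciprocal_axes(*real)
print(f"d(0,1,0) = {d_spacing((0, 1, 0), recip):.1f} Å")
```

In this monoclinic example b* is parallel to b, so d(0,1,0) equals b, whereas d(1,0,0) is shorter than a because a* is not parallel to a.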
Structure Factor Equation
Very useful IF you know atom positions
Very useful for understanding crystallography
Alternate representation: the structure factor can be represented by a 2-D vector.
[Figure: vector diagrams of the native structure factor FP and the derivative structure factor FPH]
Adding one (or more) atoms in known positions changes the structure factor in a known way.
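A minimal numerical sketch of this idea, assuming the standard structure factor sum F(hkl) = Σ f_j exp[2πi(hx_j + ky_j + lz_j)] with constant scattering factors and no temperature factors (a deliberate simplification). The atom positions and scattering factors below are invented for illustration.

```python
import cmath

def structure_factor(hkl, atoms):
    """Sum f * exp(2*pi*i*(hx + ky + lz)) over atoms given as (f, (x, y, z))."""
    h, k, l = hkl
    return sum(f * cmath.exp(2j * cmath.pi * (h * x + k * y + l * z))
               for f, (x, y, z) in atoms)

# Hypothetical native atoms and one heavy atom, in fractional coordinates.
native_atoms = [(6.0, (0.10, 0.20, 0.30)),
                (7.0, (0.45, 0.15, 0.80)),
                (8.0, (0.70, 0.60, 0.25))]
heavy_atoms = [(80.0, (0.25, 0.40, 0.10))]

hkl = (2, 1, 3)
FP = structure_factor(hkl, native_atoms)                 # native
FH = structure_factor(hkl, heavy_atoms)                  # heavy atom alone
FPH = structure_factor(hkl, native_atoms + heavy_atoms)  # derivative

print(f"|FP| = {abs(FP):.2f}, |FH| = {abs(FH):.2f}, |FPH| = {abs(FPH):.2f}")
```

Because the sum is linear in the atoms, FPH = FP + FH as complex numbers (2-D vectors), even though the measured amplitudes |FPH| and |FP| do not simply add.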
Symmetry in reciprocal space
• No translations
• So point groups!
• But: Centrosymmetry: Friedel’s law I(h,k,l) = I(-h,-k,-l)
• => 11 Laue groups
Systematic absences
• Translational symmetry, such as screw axes and lattice centring, leads to some reflections being ‘absent’. This can be shown using the Structure Factor Formula.
• If a space group has a 2₁ screw axis along b, then this will affect the reflections 0k0: only k=2n observed
• If a space group has a 6₂ or 6₄ screw axis along c, then this will affect the reflections 00l: only l=3n (i.e. l=(6/2)n) observed
• Beware – a non-crystallographic translation of (0.2,0.3,1/3) will ALSO give these absences
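The 2₁ case can be demonstrated directly from the structure factor sum. In this sketch (constant scattering factors, invented atom positions) every atom at (x, y, z) has a mate at (-x, y+1/2, -z), and the 0k0 amplitudes are computed for k = 1 to 6.

```python
import cmath

def structure_factor(hkl, atoms):
    """Sum f * exp(2*pi*i*(hx + ky + lz)) over atoms given as (f, (x, y, z))."""
    h, k, l = hkl
    return sum(f * cmath.exp(2j * cmath.pi * (h * x + k * y + l * z))
               for f, (x, y, z) in atoms)

def apply_2_1_along_b(xyz):
    """Symmetry mate generated by a 2(1) screw axis along b."""
    x, y, z = xyz
    return (-x, y + 0.5, -z)

# Hypothetical asymmetric unit: (scattering factor, fractional coordinates).
asymmetric_unit = [(6.0, (0.13, 0.11, 0.42)), (8.0, (0.31, 0.37, 0.77))]
cell_atoms = asymmetric_unit + [(f, apply_2_1_along_b(xyz)) for f, xyz in asymmetric_unit]

amplitudes = {k: abs(structure_factor((0, k, 0), cell_atoms)) for k in range(1, 7)}
for k, amplitude in amplitudes.items():
    print(f"0 {k} 0  |F| = {amplitude:.3f}")
```

Each pair of mates contributes f·exp(2πiky)·(1 + exp(πik)) to F(0k0), and the bracket vanishes for odd k, so only k=2n survives.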
Centric and Acentric reflections
Centrosymmetric zones
• If I(h,k,l) = I(-h,-k,-l) for a subset of reflections under the space group symmetry without invoking Friedel’s law, then these reflections are centric.
• In P2₁: I(h,k,l) = I(-h,k,-l), so for all k=0 reflections, I(h,0,l) = I(-h,0,-l)
• Most reflections are acentric
• This is relevant because:
1. Centric and acentric intensities have different statistical properties
2. Centric phases must be φ or φ+180°.
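The centric test can be expressed as a simple check: a reflection is centric if some rotation of the space group maps (h,k,l) onto (-h,-k,-l). A minimal sketch for P2₁ with the unique axis along b follows; the names `P21_ROTATIONS` and `is_centric` are illustrative, and only the rotational parts of the operators are needed for this test.

```python
# Rotational parts of the P2(1) operators (unique axis b): identity and 2-fold about b.
P21_ROTATIONS = [
    ((1, 0, 0), (0, 1, 0), (0, 0, 1)),
    ((-1, 0, 0), (0, 1, 0), (0, 0, -1)),
]

def transform(hkl, rotation):
    """Index of the symmetry-related reflection: row vector hkl times the rotation matrix."""
    return tuple(sum(hkl[i] * rotation[i][j] for i in range(3)) for j in range(3))

def is_centric(hkl, rotations):
    """True if a rotation maps hkl onto its Friedel mate -h,-k,-l."""
    friedel_mate = tuple(-index for index in hkl)
    return any(transform(hkl, rotation) == friedel_mate for rotation in rotations)

for hkl in [(3, 0, 5), (3, 1, 5), (0, 4, 0)]:
    print(hkl, "centric" if is_centric(hkl, P21_ROTATIONS) else "acentric")
```

In P2₁ only the h0l zone passes the test, which matches the statement that most reflections are acentric.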
Does CCP4 help guide Users through
this?
• We hope so, but the best critics are the users
themselves
• In general some knowledge is assumed
• As far as possible programmers try to illustrate
important information by presenting it graphically.
• Links to documentation where possible
CCP4 Main Page
• CCP4 Documentation
• Individual Program Documentation
• Tutorials
• Maths for Protein Crystallographers
• Crystallographic guidance
• Roadmaps through the Suite
• Talks
Example from CCP4 Main Page
• CCP4 Documentation
• Individual program documentation
• CCP4 Tutorial
• Maths for Protein Crystallographers
• Eleanor Dodson prepared a document containing all the
maths a protein crystallographer might need. It helps to
have this all together, and available on the web, so Maria
Turkenburg developed it further. It is distributed with the
suite as a set of documents in which certain symbols are
represented by small .gif-pictures. They are available here:
• Basic Maths for Protein Crystallographers
An aside- Project Book-keeping
• There is an urgent need for data management.
Each specific application program needs to define its
requirements and its product along with a book-keeping
header
e.g. Protein production needs sequence, so does
automatic model building – how to pass this info on via
intervening steps
– from laboratory to beamline to structure solution ?
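One way to picture such a book-keeping header is as a small record that every step receives, extends and passes on unchanged in its essentials. The sketch below is purely hypothetical: the class `ProjectHeader`, its fields, the step names and the file names are invented for illustration and do not describe any existing CCP4 format.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ProjectHeader:
    """Hypothetical book-keeping header carried from laboratory to structure solution."""
    project: str
    crystal: str
    sequence: str
    history: tuple = ()  # (program, product) pairs, oldest first

def record_step(header, program, product):
    """Return a new header with one more step logged; nothing else is altered."""
    return replace(header, history=header.history + ((program, product),))

header = ProjectHeader(project="demo", crystal="xtal1", sequence="MKTAYIAKQR")
header = record_step(header, "data processing", "merged.mtz")
header = record_step(header, "experimental phasing", "phased.mtz")

# The sequence entered at protein production is still available to model building.
print(header.sequence, [program for program, _ in header.history])
```

Making the record immutable means an intervening step cannot silently drop or overwrite information that a later step, such as automatic model building, will need.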
Brief Introduction to the Graphical User Interface
• Designed
- to keep a record of what has been done within a project
directory/folder
It is far from perfect but at least it exists!
- to provide easy access to the tasks required for each
crystallographic module
- to provide diagnostic information, mostly via graphs,
summaries, and as a last resort, log files
CANNOT “manage” your work pattern! You must do that..
GUI Structure Solution Modules
• Data Processing
• Experimental phasing
• Molecular Replacement
• Density Improvement
• Model building
• Refinement
• Structure Analysis
• Validation and Deposition
• Reflection, Coordinate, Map and Graphical utilities
• Clipper applications
• Program list
Needs logical updating! Currently underway – CCP4BB request for feedback soon.
Lots of Graphical analysis from
CCP4 software
Scala Analysis
(Scaling and Merging)
Use hklview to see diffraction zones
Intensity Distributions
The structure factor equation means we can predict
some properties of all INTENSITY distributions
• These should be inspected as soon as data are
processed
• Intensity distribution vs. resolution
• Wilson Plot
• Moments
• SFCHECK good too
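As an illustration of the moments check, the sketch below simulates ideal (Wilson) intensity distributions and computes the second moment <I²>/<I>², which is expected to be about 2 for acentric and about 3 for centric reflections. The data are simulated, not measured, and the function name `second_moment` is an illustrative choice; real programs also bin by resolution.

```python
import random

def second_moment(intensities):
    """Return <I^2> / <I>^2 for a list of intensities."""
    n = len(intensities)
    mean_i = sum(intensities) / n
    mean_i2 = sum(i * i for i in intensities) / n
    return mean_i2 / (mean_i * mean_i)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_reflections = 100_000

# Acentric: real and imaginary parts of F are independent Gaussians.
acentric = [rng.gauss(0, 1) ** 2 + rng.gauss(0, 1) ** 2 for _ in range(n_reflections)]
# Centric: F has a single Gaussian component (phase restricted to two values).
centric = [rng.gauss(0, 1) ** 2 for _ in range(n_reflections)]

print(f"acentric <I^2>/<I>^2 = {second_moment(acentric):.2f}")
print(f"centric  <I^2>/<I>^2 = {second_moment(centric):.2f}")
```

Values for acentric data well below 2 (approaching 1.5) are a classic warning sign of twinning, which is one reason to inspect these statistics as soon as data are processed.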
Intensity Analysis
Lead to: Refinement problems
What level of knowledge to assume – Validation
• Assessing how well the model describes the experiment and fits with expectation – COOT lists many tools
• Protein geometry – Ramachandran plots. Need a tool that is not part of refinement
• Sensible contacts (Molprobity, PISA etc)
• Density fit
• Unmodelled map features
• Critical faculties
Solving the structure –
Automation as it is now
• From data through experimental phases to
model
• ShelxD/ShelxE: Decisions made within
program, based on good methodology.
• Solve & Resolve. Decisions made by
programmer, based on expert knowledge.
• AutoSharp: links several programs using
scripting. Decisions made at scripting level?,
based on expert knowledge.
More Automation Procedures
• From homologous model to final model.
• Molrep: Input experimental data and model – output model of asymmetric unit. (MrBUMP – BALBES)
• ARP/wARP / Refmac5: Model building using refinement & map interpretation. This uses a GUI to set a protocol, interpreted into a C-shell script.
• Some CCP4 GUI tasks
Automation Thoughts
• Should procedures aim to be “black boxes”?
• Yes – but I think there are too many difficult cases for this.
• Can MX be automated? Will Automation lead to
rigidity?
• There is a danger of this – not so serious if the
approach is modular, linked by scripting..
• Will automation destroy our critical faculties?