Macromolecular Crystallography
and
Structural Genomics
– Recent Trends
Prof. D. Velmurugan
Department of Crystallography and Biophysics
University of Madras
Guindy Campus, Chennai – 25.
• Structural Genomics aims to identify as many
new folds as possible.
• This eventually requires faster ways of
determining the three dimensional structures as
there are many sequences before us for which
structural information is not yet available.
• Although the Molecular Replacement technique is
still used in crystallography for solving
homologous structures, it fails when the sequence
homology with the search model is too low.
• The Multiwavelength Anomalous Diffraction
(MAD) techniques have taken over the
conventional Multiple Isomorphous Replacement
(MIR) technique.
• With the advent of high-energy synchrotron sources,
powerful detectors for the diffracted intensities and
new methodologies for macromolecular structure
determination, the number of macromolecular structures
determined has risen steeply. On average, eight new
structures are deposited in the PDB every day, and the
PDB now holds around 29,000 entries.
• Instead of the three-wavelength strategy of MAD
experiments, single-wavelength anomalous diffraction
using sulphur anomalous scattering has recently been
proposed. This reduces the data collection time to
one third.
• Also, the judicious use of the radiation damage during
redundant data measurements in second generation
synchrotron source and also during regular data
collection in the third generation synchrotron source has
been pointed out recently (RIP & RIPAS).
Protein Structure Determination
• X-ray crystallography
• NMR spectroscopy
• Neutron diffraction
• Electron microscopy
• Atomic force microscopy
As the number of available amino acid
sequences far exceeds the number of available
three-dimensional structures, high throughput
is essential in every aspect of X-ray crystallography.
Procedure
Protein Crystal
The 14 Bravais lattices
(Blue numbers correspond to the crystal system)
1: Triclinic
2: Monoclinic
3: Orthorhombic
4: Rhombohedral
5: Tetragonal
6: Hexagonal
7: Cubic
Synchrotron radiation
More intense X-rays at shorter wavelengths mean higher
resolution & much quicker data collection
Diffraction Apparatus
Diffraction Principles
nλ = 2d sin θ
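Bragg's law can be turned into a small calculation. The sketch below is illustrative only: the function names and the example wavelength (Cu Kα, 1.5418 Å) are assumptions and do not come from the slides.

```python
import math

def bragg_angle_deg(d_spacing, wavelength, order=1):
    """Bragg angle theta (degrees) from n*lambda = 2*d*sin(theta)."""
    s = order * wavelength / (2.0 * d_spacing)
    if not 0.0 < s <= 1.0:
        raise ValueError("no diffraction possible: n*lambda exceeds 2*d")
    return math.degrees(math.asin(s))

def min_d_spacing(wavelength, two_theta_max_deg):
    """Smallest d-spacing (best resolution) recorded at the largest angle 2*theta."""
    return wavelength / (2.0 * math.sin(math.radians(two_theta_max_deg / 2.0)))

# Illustrative values: planes 2.0 Å apart, Cu K-alpha radiation (assumed).
theta = bragg_angle_deg(d_spacing=2.0, wavelength=1.5418)
```

For a fixed maximum detector angle, `min_d_spacing` gets smaller as the wavelength gets shorter, which is the sense in which shorter wavelengths give higher resolution.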
The diffraction experiment
The ratio of the amplitude of the wave scattered by an atom to that
scattered by a single electron
– atomic scattering factor
The ratio of the amplitude of the wave scattered by all the atoms in a unit cell to
that scattered by a single electron (the vector, with amplitude and phase, representing the overall
scattering from a particular set of Bragg planes), Fhkl – structure factor
The structure factor magnitude |F(hkl)| is represented by the length
of a vector in the complex plane.
The phase angle α(hkl) is given by the angle, measured
counterclockwise, between the positive real axis and the vector F.
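A reflection is the Fourier transform of the electron density in the unit cell:
$$F(h,k,l) = V \int_{x=0}^{1}\int_{y=0}^{1}\int_{z=0}^{1} \rho(x,y,z)\,\exp[2\pi i(hx + ky + lz)]\,dx\,dy\,dz$$
ρ(x, y, z) = the electron density
V = the volume of the unit cell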
|Fhkl| = the structure-factor amplitude (proportional to the
square root of the reflection intensity)
αhkl = the phase associated with the structure-factor amplitude
|Fhkl|
We can measure the amplitudes, but the phases are lost in
the experiment.
This is the phase problem.
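A minimal sketch of the discrete form of the structure factor may help here. It assumes point atoms with constant scattering factors f_j and fractional coordinates; the atom list is invented for illustration. It shows which part of F(hkl) an experiment records (the intensity) and which part it loses (the phase).

```python
import cmath
import math

def structure_factor(hkl, atoms):
    """F(hkl) = sum_j f_j * exp[2*pi*i*(h*x_j + k*y_j + l*z_j)] for point atoms."""
    h, k, l = hkl
    return sum(f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
               for f, x, y, z in atoms)

# Hypothetical atoms: (scattering factor, x, y, z) in fractional coordinates.
atoms = [(6.0, 0.10, 0.20, 0.30),
         (8.0, 0.40, 0.15, 0.70),
         (16.0, 0.25, 0.60, 0.05)]

F = structure_factor((1, 2, 3), atoms)
amplitude = abs(F)                         # |Fhkl|, recoverable from the data
phase_deg = math.degrees(cmath.phase(F))   # alpha_hkl, lost in the experiment
intensity = amplitude ** 2                 # what the detector actually measures
```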
The Fourier transform requires both
structure-factor amplitudes and phases
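Electron density calculation
$$\rho(x,y,z) = \frac{1}{V}\sum_{h}\sum_{k}\sum_{l} |F_{hkl}|\,\exp[i\alpha_{hkl} - 2\pi i(hx + ky + lz)]$$
The phases α_hkl are unknown.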
Patterson function
• Patterson space has the same dimension as
the real-space unit cell
• The peaks in the Patterson map are
expressed in fractional coordinates
• To avoid confusion, the x, y and z
dimensions of Patterson vector-space are
called (u, v, w).
What does Patterson function
represent?
• It represents a density map of the vectors
between scattering atoms in the cell
• The height of a Patterson peak is proportional to the
product of the electron counts of the two atoms it
connects; electron-rich (heavy) atoms therefore
contribute more to the Patterson map than
light atoms.
Patterson function –
no phase info required
Consider the phaseless terms (h, k, l, |F|²):
$$P(u,v,w) = \frac{1}{V}\sum_{h}\sum_{k}\sum_{l} |F_{hkl}|^{2}\,\exp[-2\pi i(hu + kv + lw)]$$
No phase term
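The following one-dimensional sketch illustrates that a Fourier synthesis using only intensities reproduces the map of interatomic vectors. The 32-point cell, the atom positions and the weights are all made up for the illustration, and the 1/N normalisation stands in for 1/V.

```python
import cmath
import math

N = 32
rho = [0.0] * N
rho[3], rho[10], rho[21] = 1.0, 8.0, 1.0   # light, heavy, light "atoms" (hypothetical)

def dft(values, sign):
    """Sum over the cell of values[x] * exp(sign * 2*pi*i*h*x/N) for each h."""
    n = len(values)
    return [sum(v * cmath.exp(sign * 2j * math.pi * h * x / n)
                for x, v in enumerate(values))
            for h in range(n)]

F = dft(rho, +1)                    # structure factors (amplitude and phase)
I = [abs(f) ** 2 for f in F]        # intensities: the phases are discarded
patterson = [p.real / N for p in dft(I, -1)]   # synthesis with |F|^2 only

# Direct definition: P(u) = sum over x of rho(x) * rho(x + u)
direct = [sum(rho[x] * rho[(x + u) % N] for x in range(N)) for u in range(N)]
```

The peak at u = 7 (the light atom at 3 to the heavy atom at 10) has height 8, while the light-to-light vector at u = 18 has height 1, which shows how vectors involving heavy atoms dominate the map.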
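Patterson map
Direct space (density and position):
$$\rho(\mathbf{r}) = \frac{1}{V}\sum_{hkl} F(\mathbf{S})\,\exp(-2\pi i\,\mathbf{r}\cdot\mathbf{S})$$
Reciprocal space (amplitudes and phases):
$$F(\mathbf{S}) = \int_{\mathrm{cell}} \rho(\mathbf{r})\,\exp(2\pi i\,\mathbf{r}\cdot\mathbf{S})\,d^{3}r$$
The density and the structure factors are related by Fourier transformation.
Patterson map (direct space):
$$P(\mathbf{u}) = \int_{\mathrm{cell}} \rho(\mathbf{r})\,\rho(\mathbf{r}+\mathbf{u})\,d^{3}r = \frac{1}{V}\sum_{hkl} I(\mathbf{S})\,\exp(-2\pi i\,\mathbf{u}\cdot\mathbf{S})$$
Intensities (reciprocal space):
$$I(\mathbf{S}) = F^{*}(\mathbf{S})\,F(\mathbf{S}) = |F(\mathbf{S})|^{2}$$
The Patterson map and the intensities are likewise related by Fourier transformation.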
Patterson map with symmetry
Space group P21, equivalent positions:
x, y, z
-x, y+1/2, -z
Harker vectors (differences between equivalent positions):
u, v, w = 2x, 1/2, 2z
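As a small worked example of the P21 Harker vector, the sketch below subtracts the symmetry mate from an atom position and reduces the result into the unit cell. The atom coordinates are hypothetical.

```python
def p21_mate(x, y, z):
    """Second equivalent position of space group P21: (-x, y + 1/2, -z)."""
    return (-x, y + 0.5, -z)

def harker_vector(x, y, z):
    """Vector between an atom and its P21 mate, reduced into the unit cell."""
    mate = p21_mate(x, y, z)
    return tuple((a - b) % 1.0 for a, b in zip((x, y, z), mate))

# Hypothetical heavy-atom position in fractional coordinates.
u, v, w = harker_vector(0.12, 0.30, 0.41)
```

Whatever the atom's y coordinate, v comes out as 1/2, so all these vectors lie on the Harker section v = 1/2, and x and z of the heavy atom follow from u/2 and w/2.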
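Diffracting a Cat
[Figure: Fourier transform (FT) of a cat image, giving diffraction data with phase information, compared with real diffraction data]
Reconstructing a Cat
[Figure: the FT back from diffraction data with phase information is easy; the FT back from real diffraction data is hard]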
The importance of phases
Phasing Methods
all assume some prior knowledge of the electron
density or structure
The Phase Problem
• Diffraction data only records intensity,
not phase information (half the
information is missing)
• To reconstruct the image properly you
need to have the phases (even approx.)
– Guess the phases (molecular replacement)
– Search phase space (direct methods)
– Bootstrap phases (isomorphous replacement)
– Use differing wavelengths (anomalous dispersion)
Acronyms for phasing techniques
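• MR: Molecular Replacement
• SIR: Single Isomorphous Replacement
• MIR: Multiple Isomorphous Replacement
• SIRAS: Single Isomorphous Replacement with Anomalous Scattering
• MIRAS: Multiple Isomorphous Replacement with Anomalous Scattering
• MAD: Multiwavelength Anomalous Diffraction
• SAD: Single-wavelength Anomalous Diffraction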
Direct methods
• Based on the positivity and atomicity of electron density that leads
to phase relationships between the (normalized) structure factors
(E).
• Used to solve small-molecule structures
• Proteins up to ~1000 atoms, at resolution better than 1.2 Å
• Used in computer programs (SnB, SHELXD, SHARP) to find the
heavy-atom substructure.
Jerome Karle and Herbert A. Hauptman
Nobel prize 1985 (chemistry)
Density modification procedures (e.g. solvent flattening and
averaging) can be carried out as part of a cyclic process
DM cycle
Phases and amplitudes (F, φ)
→ Fourier transformation → electron density ρ(r)
→ Map modification → modified electron density map ρmod(r)
→ Inverse Fourier transformation → calculated amplitudes and phases (Fcalc, φcalc)
→ Phase combination → new phases and amplitudes (F, φ), which re-enter the cycle
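The cycle can be sketched in one dimension as follows. This is a toy model under stated assumptions, not the algorithm of the dm program: the solvent mask is taken as known, map modification is plain solvent flattening to the mean solvent value, and "phase combination" is reduced to adopting the calculated phases while keeping the observed amplitudes. The density values and the names are invented.

```python
import cmath
import math

def dft(values, sign):
    """Fourier sum over a 1-D cell; sign=+1 gives F(h), sign=-1 the synthesis."""
    n = len(values)
    return [sum(v * cmath.exp(sign * 2j * math.pi * h * x / n)
                for x, v in enumerate(values))
            for h in range(n)]

def dm_cycle(f_obs_amp, phases, solvent_mask):
    """One density-modification cycle; returns (new_phases, modified_map)."""
    n = len(phases)
    # Fourier transformation: observed amplitudes + current phases -> rho(r)
    coeffs = [cmath.rect(a, p) for a, p in zip(f_obs_amp, phases)]
    rho = [c.real / n for c in dft(coeffs, -1)]
    # Map modification: set every solvent point to the mean solvent density
    solvent = [r for r, s in zip(rho, solvent_mask) if s]
    mean_solvent = sum(solvent) / len(solvent)
    rho_mod = [mean_solvent if s else r for r, s in zip(rho, solvent_mask)]
    # Inverse Fourier transformation: rho_mod(r) -> Fcalc, phi_calc
    f_calc = dft(rho_mod, +1)
    # Phase combination (simplified): adopt phi_calc, keep observed amplitudes
    return [cmath.phase(f) for f in f_calc], rho_mod

N = 32
solvent_mask = [not (4 <= x < 16) for x in range(N)]   # True = solvent point
true_rho = [0.3 if s else 1.0 + math.sin(0.9 * x) ** 2
            for x, s in zip(range(N), solvent_mask)]
f_true = dft(true_rho, +1)
f_obs_amp = [abs(f) for f in f_true]

# Start from the phases of a deliberately degraded map, then iterate.
degraded = [r + 0.4 * math.sin(1.7 * x) for x, r in enumerate(true_rho)]
phases = [cmath.phase(f) for f in dft(degraded, +1)]
for _ in range(10):
    phases, rho_mod = dm_cycle(f_obs_amp, phases, solvent_mask)
```

With the true phases as input the map is already flat in the solvent, so one cycle returns the true density unchanged. Real programs weight the phase combination by figures of merit rather than replacing the phases outright.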
Molecular Replacement (MR)
Used when there is a homology model available (sequence identity > 25%).
1. Orientation of the model in the new unit cell (rotation function)
2. Translation
Molecular Replacement (MR)
• MR works because the Fourier
transform works in both directions.
– Coordinates in PDB → model (density) → reflections, compared with those of the new protein
• Have to be careful of model bias
MR solution
Isomorphous replacement
• Why isomorphous replacement, making
heavy atom derivatives?
– Phase determination
• Calculating FH
FH= FPH-FP
If HA position is known, FH can be calculated from ρ(xH, yH,
zH) by inverse FT
• HA position determination – Patterson
function
HA shifts FP by FH
Isomorphous Replacement (SIR, MIR)
– Collect data on native crystals (no metals)
– Soak heavy-metal compounds into the crystals;
the heavy atoms bind at specific sites in the unit cell.
• e.g. Hg, Pt, Au compounds
– The unit cell must remain isomorphous
– Collect data on the derivatives
– As a result, only the intensity of the
reflections changes but not the indices
– Measure the reflection intensity differences
between native and derivative data sets.
– Find the position of the heavy atoms in the
unit cell from the intensity differences.
• generate vector maps (Patterson maps)
Native and heavy-atom derivative
• |FP + HA| – |FP| = |FHA|
• Must have at least two heavy atom derivatives
• The main limitations in obtaining accurate
phasing from MIR are non-isomorphism and
incomplete incorporation (low occupancy) of
the heavy-atom compound.
diffraction patterns superimposed
and shifted vertically.
Note: intensity differences for
certain reflections.
Note: the identical unit cell
(reflection positions).
This suggests isomorphism.
An isomorphous HA derivative changes only
the intensities of the reflections, not
their indices
Native crystal
HA derivative crystal
Once we have a heavy-atom structure ρH(r), we can use it to calculate
FH(S). In turn, this allows us to calculate phases for FP and FPH for each
reflection.
Harker diagram
Harker construction for SIR
[Figure: Harker construction showing FP, FPH and -FH, with the phase probability PPH(φP) plotted against φP]
The phase probability
distribution shows that SIR
results in a phase
ambiguity
We can use a second derivative to resolve the phase ambiguity
Harker construction for
multiple isomorphous replacement (MIR)
[Figure: Harker construction for MIR showing FP, FPH, -FH, FPH2 and -FH2, with the phase probabilities PPH(φP) and PPH2(φP) plotted against φP]
P(φP) = PPH(φP) · PPH2(φP)
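The two Harker constructions can be sketched numerically with the cosine rule applied to the triangle FPH = FP + FH. The code assumes error-free data and known complex FH values; every number below is synthetic, and the function names are illustrative.

```python
import cmath
import math

def phase_choices(fp_amp, fph_amp, fh):
    """Two protein phases (radians) consistent with |FP|, |FPH| and complex FH."""
    fh_amp, fh_phase = abs(fh), cmath.phase(fh)
    # |FPH|^2 = |FP|^2 + |FH|^2 + 2|FP||FH| cos(phiP - phiH)
    cos_delta = (fph_amp ** 2 - fp_amp ** 2 - fh_amp ** 2) / (2.0 * fp_amp * fh_amp)
    cos_delta = max(-1.0, min(1.0, cos_delta))   # guard against rounding
    delta = math.acos(cos_delta)
    return (fh_phase + delta, fh_phase - delta)

def angular_distance(a, b):
    """Smallest absolute difference between two angles (radians)."""
    return abs(cmath.phase(cmath.exp(1j * (a - b))))

def resolve_with_second_derivative(choices1, choices2):
    """Pick the phase from choices1 that agrees best with one of choices2."""
    pairs = [(a, b) for a in choices1 for b in choices2]
    return min(pairs, key=lambda ab: angular_distance(*ab))[0]

# Synthetic reflection: true protein phase 50 degrees, two derivatives.
true_fp = cmath.rect(100.0, math.radians(50.0))
fh1 = cmath.rect(20.0, math.radians(120.0))
fh2 = cmath.rect(15.0, math.radians(-30.0))

sir = phase_choices(abs(true_fp), abs(true_fp + fh1), fh1)      # ambiguous pair
second = phase_choices(abs(true_fp), abs(true_fp + fh2), fh2)   # second pair
phi_p = resolve_with_second_derivative(sir, second)
```

A single derivative leaves two candidate phases placed symmetrically about the heavy-atom phase; only the candidate shared by both derivatives survives.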
AS
Anomalous scattering leads to a breakdown of Friedel‘s law
[Figure: anomalous derivative; labels FPH(S), FPH(0) and FPH(-S)]
Anomalous scattering data can also be used to solve the phase ambiguity
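[Figure: Harker construction with anomalous scattering showing FP, F+PH, F-*PH, -F+H', -F+H'' and -F-*H'', with the phase probabilities P+(φP) and P-(φP) plotted against φP]
P(φP) = P+(φP) · P-(φP)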
Note that the anomalous differences are very
small; thus very accurate data are necessary
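To see how small these differences are, the sketch below compares |F(hkl)| with |F(-h,-k,-l)| for a toy structure with and without an anomalous scatterer. The atoms and the anomalous corrections (f' = -8, f'' = 4 electrons) are hypothetical round numbers, not tabulated values.

```python
import cmath
import math

def structure_factor(hkl, atoms):
    """F(hkl) for point atoms; a scattering factor may be complex (f0 + f' + i f'')."""
    h, k, l = hkl
    return sum(f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
               for f, x, y, z in atoms)

def bijvoet_difference(hkl, atoms):
    """|F(h,k,l)| - |F(-h,-k,-l)|; zero whenever Friedel's law holds."""
    minus = tuple(-i for i in hkl)
    return abs(structure_factor(hkl, atoms)) - abs(structure_factor(minus, atoms))

light = [(6.0, 0.10, 0.20, 0.30), (7.0, 0.40, 0.15, 0.70)]
# Third atom without and with anomalous terms (assumed f' = -8, f'' = 4).
normal = light + [(34.0, 0.25, 0.60, 0.05)]
anomalous = light + [(34.0 - 8.0 + 4.0j, 0.25, 0.60, 0.05)]

diff_normal = bijvoet_difference((1, 2, 3), normal)
diff_anomalous = bijvoet_difference((1, 2, 3), anomalous)
amplitude = abs(structure_factor((1, 2, 3), anomalous))
```

In this toy case the Bijvoet difference is under one percent of the amplitude, which is why accurate, redundant data are needed.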
Of course, there are errors in the data, determination of heavy atom
positions etc.
Blow and Crick developed a model in which all errors are
associated with |FPH|obs
[Figure: phase triangle of FP, FH and FPH, with phase angles φP and φPH and the lack of closure ε]
The triangle formed by FP, FPH and FH
fails to close
The 'lack of closure error' ε is a function
of the calculated phase angle φP
$$\varepsilon(\varphi_P) = \bigl|\,|F_{PH}|_{obs} - |F_{PH}|_{calc}\,\bigr|$$
The phase probability P(φP) is given by
$$P(\varphi_P) = \exp\!\left[-\frac{\varepsilon^{2}(\varphi_P)}{2E^{2}}\right]$$
The resulting phases have a
minimum error when the
best phase φbest, i.e. the centroid of
the phase distribution
$$\varphi_{best} = \int_{0}^{2\pi} \varphi_P\,P(\varphi_P)\,d\varphi_P$$
is used instead of the
most probable phase.
The quality of the phases is
indicated by the figure of merit m
$$m = \frac{\left|\int_{0}^{2\pi} P(\varphi_P)\,\exp(i\varphi_P)\,d\varphi_P\right|}{\int_{0}^{2\pi} P(\varphi_P)\,d\varphi_P}$$
m = 1: 0° phase error
m = 0.5: ~60° phase error
m = 0: all phases equally probable
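A numerical sketch of the phase probability and figure of merit follows. It makes two assumptions that should be read as such: the integrals are replaced by sums on a 1° grid, and φbest is taken as the argument of the centroid ∫P exp(iφ)dφ / ∫P dφ, the usual Blow and Crick form. The amplitudes and the error estimate E are synthetic.

```python
import cmath
import math

def phase_statistics(fp_amp, fph_obs, fh, E, steps=360):
    """Return (phi_best, m) from the lack-of-closure probability on a phase grid."""
    numerator = 0j
    denominator = 0.0
    for i in range(steps):
        phi = 2.0 * math.pi * i / steps
        fph_calc = abs(cmath.rect(fp_amp, phi) + fh)   # |FP exp(i phi) + FH|
        eps = fph_obs - fph_calc                        # lack of closure
        p = math.exp(-eps ** 2 / (2.0 * E ** 2))        # unnormalised P(phi)
        numerator += p * cmath.exp(1j * phi)
        denominator += p
    centroid = numerator / denominator
    return cmath.phase(centroid), abs(centroid)

# Synthetic single-derivative case: true phase 50 deg, heavy-atom phase 120 deg.
fh = cmath.rect(20.0, math.radians(120.0))
fph_obs = abs(cmath.rect(100.0, math.radians(50.0)) + fh)

phi_best, m = phase_statistics(100.0, fph_obs, fh, E=5.0)
_, m_flat = phase_statistics(100.0, fph_obs, fh, E=1.0e6)   # no phase information
```

For this single derivative the distribution is bimodal, so φbest lands midway between the two peaks, at the heavy-atom phase, and m is low; a very large E flattens P(φ) and drives m towards zero.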
Steps in MAD
• Introduce an anomalous scatterer
– Incorporate SeMet in place of Met
– Incorporate a heavy atom (HA), e.g. Hg, Pt, etc.
• Take your crystals to a synchrotron beam-line
(tunable wavelength).
• Collect data sets at 3 separate wavelengths:
the Se (or other HA) absorption peak, the edge and
a wavelength remote from the peak.
• Measure the differences in Friedel mates to get
an estimate of the phases for the Se atoms.
– These differences are quite small, so one needs to
collect a lot of data (completeness, redundancy)
to get a good estimate of the error associated
with each measurement.
• Use the Se positions to obtain phase estimates
for the protein atoms.
Atomic scattering factor: 3 terms
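The three terms are conventionally written as below. This is the standard textbook notation, supplied here as an assumption about what the slide's figure showed.

```latex
f(\theta, \lambda) = f_0(\theta) + f'(\lambda) + i\,f''(\lambda)
```

Here f_0 is the normal scattering factor, which falls off with scattering angle, while f' (dispersive) and f'' (absorptive) are the anomalous corrections, which vary strongly with wavelength near an absorption edge. MAD exploits that wavelength dependence by collecting at the peak, the edge and a remote wavelength.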
Advantages of MAD
• All data is collected from one crystal
– Perfect isomorphism
• Fast
• Easily interpretable electron density maps
obtained right away.
SAD
Single-wavelength anomalous diffraction (SAD)
phasing has become increasingly popular in protein
crystallography.
Two main steps –
1) obtaining the initial phases
2) improving the electron density map
calculated with initial phases.
• The essential point is to break the intrinsic phase ambiguity.
• Two kinds of phase information enable the discrimination of phase
doublets from SAD data prior to density modification:
– From heavy atoms (expressed by the Sim distribution)
– From direct-methods phase relationships (expressed by the
Cochran distribution)
The first example of solving an unknown protein by direct-method phasing of
2.1 Å OAS data:
Rusticyanin, MW: 16.8 kDa; SG: P21;
a = 32.43, b = 60.68, c = 38.01 Å; β = 107.82°;
Anomalous scatterer: Cu
[Figure panels: Mlphare + dm (OAS distribution, Sim distribution, solvent flattening); Oasis + dm (OAS distribution, Sim distribution, Cochran distribution, solvent flattening)]
Radiation damage Induced Phasing
(RIP)
• Radiation damage has been a curse of
macromolecular crystallography from its early
days.
• The X-ray radiation damage of crystals can be
caused by the breakage of covalent bonds as an
immediate consequence of the absorption of an
X-ray quantum (a primary effect) or by the
destructive effect of the propagation of radicals
throughout the crystal (a secondary effect).
• Total dose and dose rate play a role in the
amount of radiation damage inflicted on a
protein crystal.
• The most pronounced structural changes observed were
disulphide-bond breakage and associated main-chain
and side-chain movements as well as decarboxylation of
aspartate and glutamate residues.
• The structural changes induced on the sulphur atoms
were successfully used to obtain high-quality phase
estimates through an RIP (Radiation damage Induced
Phasing) procedure.
Radiation damage Induced Phasing
with Anomalous Scattering (RIPAS)
• Substructure solution and phasing
procedure using a combination of
anomalous scattering and radiation
damage induced isomorphous differences.
• RIPAS strategy is beneficial for both
locating the substructure and subsequent
phasing.
Experimental electron density before solvent flattening with
SAD (left), RIP (middle) and RIPAS (right) phases for the
(a) CS thaumatin data (thaumatin crystal soaked in a dilute
N-iodosuccinimide solution)
(b) IC thaumatin data (iodinated crystallized
thaumatin)
Methods of phase improvement
It is not always (!) possible to recognise features in a first electron
density map. There are however ways of improving the map
(phases):
• Solvent Flattening
• Histogram matching
• Non-crystallographic symmetry
(NCS) Averaging
these methods can result in dramatic
improvements in the clarity of the
electron density map.
1. Solvent flattening. Protein crystals contain large amounts of solvent;
this will in general be disordered, and so will not contribute to the crystal
diffraction.
By knowing the protein content of the crystal, it is therefore possible to
determine a threshold density below which the map is treated as solvent noise;
points with density below the threshold are set to a suitable average value.
This is particularly useful for locating molecular boundaries.
2. Averaging. If the asymmetric unit possesses more than one
molecule, the equivalencing of the various copies can lead to dramatic
improvement in the map and the phases.
Improvement in electron density after solvent
flattening and histogram matching
Before
Green =
solvent envelope
After
Interpretation of the Electron Density
(Building the Model)
• Lots of fun!
• Trace the main-chain
• Try to recognize the
amino acid sequence in
the density.
• Programs: XtalView, O
The effect of resolution on the quality of
the electron density map
[Figure: electron density maps at 2.0 Å, 1.5 Å and 1.2 Å resolution]
5.0 Å : see shape of molecule
3.0 Å : see main-chain and some side chains
2.5 Å : see main-chain carbonyls
1.5 Å : ~ atomic resolution.
[Figure: electron density at 3 Å, 2 Å and 1.2 Å (atomic resolution)]
Fitting side chains, adding waters
• If the density is good enough you can recognize alternate conformations for side chains.
• Hydrogens are not seen in the density, except in ultra-high-resolution structures (< 1.0 Å).
• Ordered waters are seen on the surface and occasionally in the interior of the protein.
At 2.0 Å resolution or better, expect ~1 water per residue.
Water molecules play a big role in protein stability and enzyme catalysis.
• Because the density depends on experimental phases, which have errors associated with them,
the first model can have many errors.
• Therefore it is essential to refine the atomic positions and their thermal parameters.
Chain Tracing
Electron
Density
Chain
Trace
Final
Model
Map coefficients used
to minimize model bias
2Fo – Fc : the most common map seen in papers.
Fo – Fc : (difference map) used with the above map to detect errors
$$\rho(x,y,z) = \frac{1}{V}\sum_{h}\sum_{k}\sum_{l} |F_{hkl}|\,\exp[i\alpha_{hkl} - 2\pi i(hx + ky + lz)]$$
Refinement Cycle
Refinement: Improving the agreement between the model and the experimental density.
Compare Fobs (From reflection Intensities) to Fcalc (Calculated from the model)
Least squares minimization
Simulated Annealing / Molecular dynamics
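R-factor = numerical indicator of the agreement between data and model, used to follow the progress of refinement
$$R = \frac{\sum_{hkl} \bigl|\,|F_o| - |F_c|\,\bigr|}{\sum_{hkl} |F_o|}$$
Fo = observed structure factor (Fobs, from the data)
Fc = calculated structure factor (Fcalc, from the model)
[Figure: refinement cycle: data → calculate map → fit model → refine → back to calculate map; plot of R against the number of refinement iterations]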
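The R-factor is simple to compute once observed and calculated amplitudes are paired reflection by reflection. The function below is a minimal sketch that assumes the two lists are already on the same scale; the four amplitudes are invented.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|) over matched reflections."""
    if len(f_obs) != len(f_calc):
        raise ValueError("observed and calculated lists must be the same length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator

# Hypothetical amplitudes for four reflections.
f_obs = [120.0, 85.0, 40.0, 10.0]
f_calc = [110.0, 90.0, 35.0, 15.0]
r = r_factor(f_obs, f_calc)   # 25 / 255, about 0.098
```

A perfect model gives R = 0, and refinement is followed by watching R fall as the model improves.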
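The best Fourier is calculated from
$$\rho_{best}(\mathbf{r}) = \frac{1}{V}\sum_{\mathbf{S}} m\,|F_P(\mathbf{S})|\,\exp[i\varphi_{best}(\mathbf{S})]\,\exp(-2\pi i\,\mathbf{r}\cdot\mathbf{S})$$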
Protein Data Bank growth
Molecular Biology: cloning of genes / over expression of proteins
Synchrotron Radiation: MAD phasing, smaller crystals
Cryo-cooling of crystals: collect data from 1 crystal,
increase order.
Instrumentation and software improvements
Increase in the number of labs using the technique
• Owing to the advent of synchrotron radiation and
the seleno-methionine derivatization technique, the
total number of protein structures deposited in the
PDB from 1980 onwards has increased
dramatically.
• MAD technique played a major role in this. At
present nearly 100 new structures are deposited every
week.
THANK YOU