ppt - ACA Summer School in Macromolecular Crystallography


Protein Crystal Data Phases Structure

Overview of the Phase Problem

Remember

We can measure reflection intensities

We can calculate structure factor amplitudes from the intensities

We can calculate the structure factors from atomic positions

We need phase information to generate the image

What is the Phase Problem

[Figure: the X-ray diffraction experiment takes the crystal in real space (x, y, z) to structure factors $F_{hkl}$ in reciprocal space. All phase information is lost.]

In the X-ray diffraction experiment photons are reflected from the crystal lattice (planes) in different directions giving rise to the diffraction pattern.

Using a variety of detectors (film, image plates, CCD area detectors) we can estimate intensities, but we lose any information about the relative phases of the different reflections.

Phases

Let's define a phase for an individual atom, $\phi_j$:

$\phi_j = 2\pi(hx_j + ky_j + lz_j)$

An atom at $x_j = 0.40$, $y_j = 0.25$, $z_j = 0.10$ for plane [213]:

$\phi_j = 2\pi[2(0.40) + 1(0.25) + 3(0.10)] = 2\pi(1.35)$

For k = 0 (a 2D case):

$\phi_j = 2\pi(hx_j + lz_j)$

For plane [201]:

$\phi_j = 2\pi[2(0.40) + 1(0.10)] = 2\pi(0.90)$
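The arithmetic in the two worked examples can be checked with a few lines of Python. This is only a sketch; the function names are made up for illustration, and the phase is reported in cycles (multiples of 2π) so the numbers compare directly with the slide.

```python
import math

def atom_phase_cycles(hkl, xyz):
    """Phase of one atom for reflection hkl, in cycles (multiples of 2*pi)."""
    h, k, l = hkl
    x, y, z = xyz
    return h * x + k * y + l * z

def atom_phase_degrees(hkl, xyz):
    """Same phase in degrees, reduced to the range 0 to 360."""
    return (360.0 * atom_phase_cycles(hkl, xyz)) % 360.0

atom = (0.40, 0.25, 0.10)                          # fractional coordinates x, y, z
cycles_213 = atom_phase_cycles((2, 1, 3), atom)    # about 1.35 cycles
cycles_201 = atom_phase_cycles((2, 0, 1), atom)    # about 0.90 cycles
degrees_201 = atom_phase_degrees((2, 0, 1), atom)  # about 324 degrees
print(cycles_213, cycles_201, degrees_201)
```

A phase of 0.90 cycles means the atom sits nine tenths of the way from one (201) plane to the next.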

Now to understand what this means….

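[Figure: the (201) planes drawn on the a-c face of the unit cell, with the atom at (0.4, y, 0.1) and reference points A to I marked. A second panel, "201 Phases", shows the phase increasing by 360° from one plane to the next: 360° = $2\pi$, 720° = $4\pi$, 1080° = $6\pi$.]

$\phi_D = 2\pi[2(0.40) + 1(0.10)] = 2\pi(0.90)$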

In General for Any Atom (x, y, z)

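[Figure: successive hkl planes with spacing $d_{hkl}$, carrying phases 0, $2\pi$ and $4\pi$; atom j at (x, y, z) lies between planes at phase $\phi$.]

Remember: We express any position in the cell as

(1) fractional coordinates: $\mathbf{p}_{xyz} = x_j\mathbf{a} + y_j\mathbf{b} + z_j\mathbf{c}$

(2) the sum of integral multiples of the reciprocal axes: $\mathbf{S}_{hkl} = h\mathbf{a}^* + k\mathbf{b}^* + l\mathbf{c}^*$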

Phase for Any Atom

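The phase is $2\pi$ times the projection of $\mathbf{p}_j$ onto the plane normal, measured in units of the plane spacing:

$\phi_j = 2\pi \, \dfrac{\mathrm{proj}(\mathbf{p}_j)}{d_{hkl}}, \qquad \mathrm{proj}(\mathbf{p}_j) = \dfrac{\mathbf{p}_j \cdot \mathbf{S}_{hkl}}{|\mathbf{S}_{hkl}|}$

$\mathbf{S}_{hkl} = h\mathbf{a}^* + k\mathbf{b}^* + l\mathbf{c}^*, \qquad |\mathbf{S}_{hkl}| = \dfrac{1}{d_{hkl}}$

$\mathbf{S}_{hkl} \cdot \mathbf{p}_j = (h\mathbf{a}^* + k\mathbf{b}^* + l\mathbf{c}^*) \cdot (x_j\mathbf{a} + y_j\mathbf{b} + z_j\mathbf{c}) = hx_j + ky_j + lz_j$

$\phi_j = 2\pi(hx_j + ky_j + lz_j)$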

Why Do We Need the Phase?

[Figure: Structure Factor and Electron Density, linked by the Fourier transform in one direction and the inverse Fourier transform in the other.]

In order to reconstruct the molecular image (electron density) from its diffraction pattern, both the intensity and the phase, which can assume any value from 0 to $2\pi$, of each of the thousands of measured reflections must be known.

Importance of Phases

Karle amplitudes with Karle phases

Hauptman amplitudes with Hauptman phases

Karle amplitudes with Hauptman phases

Hauptman amplitudes with Karle phases

Phases dominate the image!

Phase estimates need to be accurate
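The amplitude/phase swap can be tried on a toy problem. The sketch below is not the calculation behind the Karle and Hauptman images; it uses two random 1D signals as stand-ins, a plain discrete Fourier transform written out by hand, and a correlation coefficient as the measure of resemblance. All names are illustrative.

```python
import cmath
import math
import random

def dft(signal):
    """Plain discrete Fourier transform (O(n^2), fine for a toy example)."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(coeffs):
    """Inverse transform; the real part is returned."""
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def correlation(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den

random.seed(0)
n = 128
amp_donor = [random.gauss(0.0, 1.0) for _ in range(n)]    # supplies the amplitudes
phase_donor = [random.gauss(0.0, 1.0) for _ in range(n)]  # supplies the phases

F_amp, F_phase = dft(amp_donor), dft(phase_donor)
hybrid = idft([abs(a) * cmath.exp(1j * cmath.phase(p)) for a, p in zip(F_amp, F_phase)])

corr_with_phase_donor = correlation(hybrid, phase_donor)
corr_with_amp_donor = correlation(hybrid, amp_donor)
print(corr_with_phase_donor, corr_with_amp_donor)
```

The hybrid signal resembles the signal that supplied the phases far more than the one that supplied the amplitudes, which is the 1D analogue of the image swap on this slide.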

Understanding the Phase Problem

The phase problem can be best understood from a simple mathematical construct.

The structure factors ($F_{hkl}$) are treated in diffraction theory as complex quantities, i.e., they consist of a real part ($A_{hkl}$) and an imaginary part ($B_{hkl}$).

If the phases, $\phi_{hkl}$, were available, the values of $A_{hkl}$ and $B_{hkl}$ could be calculated from very simple trigonometry:

$A_{hkl} = |F_{hkl}| \cos(\phi_{hkl})$

$B_{hkl} = |F_{hkl}| \sin(\phi_{hkl})$

This leads to the relationship:

$(A_{hkl})^2 + (B_{hkl})^2 = |F_{hkl}|^2 = I_{hkl}$

Argand Diagram

$(A_{hkl})^2 + (B_{hkl})^2 = |F_{hkl}|^2 = I_{hkl}$

The above relationships are often illustrated using an Argand diagram (right).

From the Argand diagram, it is obvious that $A_{hkl}$ and $B_{hkl}$ may be either positive or negative, depending on the value of the phase angle, $\phi_{hkl}$.

Note: the units of $A_{hkl}$, $B_{hkl}$ and $F_{hkl}$ are in electrons.

[Figure 3. An Argand diagram of structure factor $F_{hkl}$ with phase angle $\phi_{hkl}$. The real ($A_{hkl}$) and imaginary ($B_{hkl}$) components are also shown.]

$F_{hkl} = A_{hkl} + iB_{hkl}$

$\phi_{hkl} = \tan^{-1}\left(\dfrac{B_{hkl}}{A_{hkl}}\right)$
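A short sketch of the conversions on the Argand diagram, with made-up numbers. One practical point: the plain inverse tangent cannot tell opposite quadrants apart, so the code uses `math.atan2`, which takes the signs of A and B into account. The function names are illustrative.

```python
import math

def to_components(amplitude, phase_deg):
    """Real (A) and imaginary (B) parts from |F| and the phase angle."""
    phi = math.radians(phase_deg)
    return amplitude * math.cos(phi), amplitude * math.sin(phi)

def to_amplitude_phase(a, b):
    """|F| and the phase angle (0 to 360 degrees) from A and B."""
    amplitude = math.hypot(a, b)
    phase_deg = math.degrees(math.atan2(b, a)) % 360.0
    return amplitude, phase_deg

a, b = to_components(10.0, 210.0)          # third quadrant: A and B both negative
amplitude, phase = to_amplitude_phase(a, b)
intensity = a * a + b * b                  # A^2 + B^2 = |F|^2 = I
print(a, b, amplitude, phase, intensity)
```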

The Structure Factor

Atomic scattering factors

$F_{hkl} = \sum_{j=1}^{N} f_j \, e^{2\pi i (hx_j + ky_j + lz_j)}$

Here $f_j$ is the atomic scattering factor.

The scattering factor for each atom type in the structure is evaluated at the correct $\sin\theta/\lambda$. That value is the scattering ability of that atom.

Remember: $\dfrac{\sin\theta}{\lambda} = \dfrac{1}{2d_{hkl}}$

We now have an atomic scattering vector with a magnitude $f_0$ and direction $\phi_j$.

[Figure: atomic scattering factor $f_0$ plotted against $\sin\theta/\lambda$.]

The Structure Factor

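Sum of all individual atom contributions

[Figure: Argand diagram in which the individual atom vectors $f_j$ are added head to tail to give the resultant $F_{hkl}$, with real component $A_{hkl}$ and imaginary component $B_{hkl}$.]

$\phi_j = 2\pi(hx_j + ky_j + lz_j)$

$F_{hkl} = \sum_{j=1}^{N} f_j \, e^{2\pi i (hx_j + ky_j + lz_j)} = \sum_{j=1}^{N} f_j \, e^{i\phi_j}$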
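The vector sum can be written out directly. The sketch below treats each $f_j$ as a constant (roughly the number of electrons), which ignores the fall-off with $\sin\theta/\lambda$; the three atoms and their coordinates are invented for illustration.

```python
import cmath
import math

def structure_factor(hkl, atoms):
    """Sum f_j * exp(i * phi_j) over all atoms; atoms = [(f_j, (x, y, z)), ...]."""
    h, k, l = hkl
    total = 0j
    for f_j, (x, y, z) in atoms:
        phi_j = 2.0 * math.pi * (h * x + k * y + l * z)  # phase of atom j
        total += f_j * cmath.exp(1j * phi_j)             # one vector on the Argand diagram
    return total

# Hypothetical three-atom structure; f_j held constant for simplicity.
atoms = [(6.0, (0.40, 0.25, 0.10)),
         (8.0, (0.15, 0.60, 0.35)),
         (7.0, (0.70, 0.10, 0.80))]

F = structure_factor((2, 1, 3), atoms)
F_bar = structure_factor((-2, -1, -3), atoms)
F_000 = structure_factor((0, 0, 0), atoms)

print(abs(F), math.degrees(cmath.phase(F)) % 360.0)  # amplitude and phase of F(213)
print(F.real, F.imag)                                # A and B components
```

With real scattering factors, F(-h,-k,-l) is the complex conjugate of F(hkl), so the two amplitudes are equal, and F(000) is simply the sum of the $f_j$.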



Electron Density

Remember the electron density (image of the molecule) is the Fourier transform of the structure factor $F_{hkl}$. Thus

$\rho(x, y, z) = \dfrac{1}{V} \sum_{hkl} F_{hkl} \, e^{-2\pi i [hx + ky + lz]}$

Using $e^{-i\alpha} = \cos\alpha - i\sin\alpha$ with $\alpha = 2\pi(hx + ky + lz)$, and $F_{hkl} = A_{hkl} + iB_{hkl}$:

$\rho(x, y, z) = \dfrac{1}{V} \sum_{hkl} \left\{ A_{hkl} \cos[2\pi(hx + ky + lz)] + B_{hkl} \sin[2\pi(hx + ky + lz)] \right\}$

Here V is the volume of the unit cell.

In practice, the electron density for one three-dimensional unit cell is calculated by starting at x, y, z = 0, 0, 0 and stepping incrementally along each axis, summing the terms as shown in the equation above for all hkl (as limited by the resolution of the data) at each point in space.
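The stepping procedure described above can be sketched in one dimension. This is a toy example under stated assumptions: a 1D "cell" of length 1, two invented point atoms with constant scattering factors, and a resolution limit expressed as a maximum index `h_max`. The structure factors are calculated from the known atoms, so their phases are available.

```python
import cmath
import math

def structure_factor_1d(h, atoms):
    """F_h = sum of f_j * exp(2*pi*i*h*x_j); atoms = [(f_j, x_j), ...]."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for f, x in atoms)

def density_1d(x, h_max, atoms, cell_length=1.0):
    """rho(x) = (1/V) * sum over h of [A_h cos(2*pi*h*x) + B_h sin(2*pi*h*x)]."""
    total = 0.0
    for h in range(-h_max, h_max + 1):   # all reflections inside the resolution limit
        F = structure_factor_1d(h, atoms)
        angle = 2.0 * math.pi * h * x
        total += F.real * math.cos(angle) + F.imag * math.sin(angle)
    return total / cell_length

atoms = [(6.0, 0.20), (8.0, 0.65)]       # hypothetical atoms: (f_j, fractional x)
grid = [i / 100 for i in range(100)]     # step along the axis from x = 0
rho = [density_1d(x, 10, atoms) for x in grid]
peak_x = grid[rho.index(max(rho))]
print(peak_x)                            # position of the highest density
```

The highest peak falls on the heavier atom and a second, lower peak on the lighter one. Raising `h_max` sharpens both peaks, which is the effect of higher-resolution data.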

Solving the Phase Problem

Small molecules

Direct Methods

Patterson Methods

Molecular Replacement

Macromolecules

Multiple Isomorphous Replacement (MIR)

Multiwavelength Anomalous Dispersion (MAD)

Single Isomorphous Replacement (SIR)

Single Wavelength Anomalous Scattering (SAS)

Molecular Replacement

Direct Methods (special cases)

Solving the Phase Problem

SMALL MOLECULES

The use of Direct Methods has essentially solved the phase problem for well-diffracting small molecule crystals.

MACROMOLECULES

Today, anomalous scattering techniques such as MAD or SAS are the most common techniques used for de novo structure determination of macromolecules.

Both techniques require the presence of one or more anomalous scatterers in the crystal.

SIR and SAS Methods

1. Need a heavy atom (lots of electrons) or an anomalous scatterer (large anomalous scattering signal) in the crystal.

• SIR - heavy atoms usually soaked in.

• SAS - anomalous scatterers usually engineered in as selenomethionine labels. Can also be soaked.

2. SIR: collect a native and a derivative data set (2 sets total). SAS: collect one highly redundant data set and keep anomalous pairs separate during processing.

• SAS - may want to choose a scatterer or wavelength that enhances the anomalous signal.

3. Must find the heavy atoms or anomalous scatterers

• can use Patterson analysis or direct methods.

4. Must resolve the bimodal ambiguity.

• use solvent flattening or similar technique

Heavy Atom Derivatives

Heavy atom derivatives MUST be isomorphous

Heavy atom derivatives are generally prepared by soaking crystals in dilute (2 - 20 mM) solutions of heavy atom salts (see Table II below for some examples).

Crystal cracking is generally a good indication that the heavy atom is interacting with the crystal lattice, and suggests that a good derivative can be obtained by soaking the crystal in a more dilute solution.

Once derivative data have been collected, the merging R factor ($R_{merge}$) between the native and derivative data sets can be used to check for heavy atom incorporation and isomorphism.

$R_{merge}$ values for isomorphous derivatives range from 0.05 to 0.15. Values below 0.05 indicate that there is little heavy atom incorporation. Values above 0.15 indicate a lack of isomorphism between the two crystals.
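The thresholds above translate into a simple check. The slides do not give the formula for the native-to-derivative R factor, so the sketch assumes the common form R = Σ| |F_PH| − |F_P| | / Σ|F_P| over matched reflections; the amplitudes are invented and the function names are illustrative.

```python
def merging_r(native_amps, derivative_amps):
    """Assumed form: sum of | |F_PH| - |F_P| | divided by sum of |F_P|."""
    numerator = sum(abs(fph - fp) for fp, fph in zip(native_amps, derivative_amps))
    return numerator / sum(native_amps)

def interpret(r_value):
    """Apply the rule-of-thumb ranges quoted on the slide."""
    if r_value < 0.05:
        return "little heavy atom incorporation"
    if r_value <= 0.15:
        return "isomorphous derivative"
    return "lack of isomorphism"

native = [100.0, 250.0, 80.0, 160.0]       # hypothetical |F_P| values
derivative = [110.0, 230.0, 88.0, 170.0]   # hypothetical |F_PH| values, same reflections
r_value = merging_r(native, derivative)
print(round(r_value, 3), interpret(r_value))
```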

Table II. Protein Residues and Their Affinities for Heavy Metals

Residue | Affinity for | Conditions
Histidine | K2PtCl4, NaAuCl4, EtHgPO4H2 | pH > 6
Tryptophan | Hg(OAc)2, EtHgPO4H2 |
Glutamic, Aspartic Acids | UO2(NO3)2, rare earth cations | pH > 5
Cysteine | Hg, Ir, Pt, Pd, Au cations | pH > 7
Methionine | PtCl4 2- anion |

Finding the Heavy Atoms or Anomalous Scatterers

The Patterson function

- an $F^2$ Fourier transform with $\phi = 0$

- vector map (u,v,w instead of x,y,z)

- maps all inter-atomic vectors

- get $N^2$ vectors!! (where N = number of atoms)

$P_{uvw} = \dfrac{1}{V} \sum_{hkl} |F_{hkl}|^2 \cos 2\pi(hu + kv + lw)$

The Difference Patterson Map

SIR - $|\Delta F|^2 = |F_{nat} - F_{der}|^2$

SAS - $|\Delta F|^2 = |F_{hkl} - F_{-h-k-l}|^2$

Patterson map is centrosymmetric

- see peaks at u,v,w & -u,-v,-w

Peak height proportional to $Z_i Z_j$

Peak u,v,w's give heavy atom x,y,z's

- Harker analysis

From Glusker, Lewis and Rossi

Origin (0,0,0) maps vector of atom to itself
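A one-dimensional sketch shows the properties listed for the Patterson map: it needs only $|F|^2$ (no phases), it has a large origin peak, and it is centrosymmetric with peaks at the interatomic vector and its inverse. The two equal atoms are invented, and the origin region is skipped when searching for the vector peak.

```python
import cmath
import math

def intensities_1d(atoms, h_max):
    """|F_h|^2 for h = -h_max..h_max; atoms = [(f_j, x_j), ...]."""
    table = {}
    for h in range(-h_max, h_max + 1):
        F = sum(f * cmath.exp(2j * math.pi * h * x) for f, x in atoms)
        table[h] = abs(F) ** 2
    return table

def patterson_1d(u, intensities, cell_length=1.0):
    """P(u) = (1/V) * sum over h of |F_h|^2 cos(2*pi*h*u); no phases needed."""
    return sum(i_h * math.cos(2.0 * math.pi * h * u)
               for h, i_h in intensities.items()) / cell_length

atoms = [(10.0, 0.10), (10.0, 0.35)]      # hypothetical pair, separated by 0.25
intens = intensities_1d(atoms, 10)

grid = [i / 100 for i in range(100)]
pmap = [patterson_1d(u, intens) for u in grid]

# Ignore the origin peak (every atom's vector to itself) when looking for vectors.
away_from_origin = [(p, u) for p, u in zip(pmap, grid) if 0.05 <= u <= 0.95]
peak_height, peak_u = max(away_from_origin)
print(peak_u)                             # 0.25 or its centrosymmetric mate 0.75
```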

Harker Analysis

Patterson symmetry = Space group symmetry minus translations

Example: space group $P2_1$

$P2_1$ space group symmetry operators: x,y,z and -x,1/2+y,-z

Vectors between symmetry-related positions:

(x,y,z) - (x,y,z) = 0, 0, 0

(-x,1/2+y,-z) - (x,y,z) = -2x, 1/2, -2z

(x,y,z) - (-x,1/2+y,-z) = 2x, -1/2, 2z

(-x,1/2+y,-z) - (-x,1/2+y,-z) = 0, 0, 0

Harker section v = 1/2: where to look for heavy atom vectors (±2x, 1/2, ±2z)

Automated programs SOLVE, SHELXD, BNP are available
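The $P2_1$ bookkeeping can be sketched in a few lines. The heavy atom site is invented; the helper names are illustrative and are not taken from SOLVE, SHELXD or BNP. The code forms the vectors between the two symmetry mates, confirms they lie on v = 1/2, and reads x and z back from a Harker peak at (2x, 1/2, 2z).

```python
def p21_equivalents(site):
    """Symmetry mates in P2(1): (x, y, z) and (-x, 1/2 + y, -z)."""
    x, y, z = site
    return [(x, y, z), (-x, 0.5 + y, -z)]

def harker_vectors(site):
    """Vectors between distinct symmetry mates, reduced into the unit cell."""
    mates = p21_equivalents(site)
    vectors = []
    for a in mates:
        for b in mates:
            if a is not b:
                vectors.append(tuple((ai - bi) % 1.0 for ai, bi in zip(a, b)))
    return vectors

def site_from_harker_peak(u, w):
    """Recover x and z from a peak at (2x, 1/2, 2z)."""
    return (u / 2.0) % 1.0, (w / 2.0) % 1.0

site = (0.12, 0.30, 0.21)                 # hypothetical heavy atom position
vectors = harker_vectors(site)
x_found, z_found = site_from_harker_peak(vectors[0][0], vectors[0][2])
print(vectors, x_found, z_found)
```

The y coordinate cannot be recovered from this section because the origin along the screw axis in $P2_1$ is arbitrary, and halving u and w leaves a choice between x and x + 1/2 (likewise for z) that corresponds to equivalent origins.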

A Note About Handedness

We identify each reflection by an index, hkl.

The hkl also tells us the relative location of that reflection in a reciprocal space coordinate system .

The indexed reflection has correct handedness if a data processing program assigns it correctly.

The identity of the handedness of the molecule in the crystal is related to the assignment of handedness of the data, which may be right or wrong!

Note: not all data processing programs assign handedness correctly!

Be careful with your data processing.

The Phase Triangle Relationship

[Figure: the phase triangle, drawn with vertices labeled L, M, N, O, Q and H. From Glusker, Lewis and Rossi.]

$\mathbf{F}_{PH} = \mathbf{F}_P + \mathbf{F}_H$

$\mathbf{F}_P$, $\mathbf{F}_{PH}$, $\mathbf{F}_H$ and $-\mathbf{F}_H$ are vectors (have direction)

Need value of $\mathbf{F}_H$

$\mathbf{F}_P$ <= obtained from native data

$\mathbf{F}_{PH}$ <= obtained from derivative or anomalous data

$\mathbf{F}_H$ <= obtained from Patterson analysis

The Phase Triangle Relationship

[Figure: the phase triangle with vertices L, M, N, O and Q. From Glusker, Lewis and Rossi.]

In simplest terms, isomorphous replacement finds the orientation of the phase triangle from the orientation of one of its sides. It turns out, however, that there are two possible ways to orient the triangle if we fix the orientation of one of its sides.

Single Isomorphous Replacement

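[Figure: Harker construction for SIR. The two circles intersect at X1 and X2; one intersection gives $\phi_{true}$ and the other $\phi_{false}$. From Glusker, Lewis and Rossi.]

Note:

$F_P$ = protein

$F_H$ = heavy atom

$F_{P1}$ = heavy atom derivative

The center of the $F_{P1}$ circle is placed at the end of the vector $-F_{H1}$.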

The situation of two possible SIR phases is called the “phase ambiguity” problem, since we obtain both a true and a false phase for each reflection.

Both phase solutions are equally probable, i.e. the phase probability distribution is bimodal.
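The two solutions follow from the law of cosines applied to the phase triangle $\mathbf{F}_{PH} = \mathbf{F}_P + \mathbf{F}_H$: $|F_{PH}|^2 = |F_P|^2 + |F_H|^2 + 2|F_P||F_H|\cos(\phi_P - \phi_H)$. The sketch below uses invented, error-free values for a single reflection; real data carry measurement errors, which is why phasing programs work with probability distributions rather than this exact construction.

```python
import cmath
import math

def sir_phase_choices(fp_amp, fph_amp, f_h):
    """Two possible protein phases (degrees) from |F_P|, |F_PH| and the vector F_H."""
    fh_amp = abs(f_h)
    phi_h = math.degrees(cmath.phase(f_h))
    cos_delta = (fph_amp ** 2 - fp_amp ** 2 - fh_amp ** 2) / (2.0 * fp_amp * fh_amp)
    cos_delta = max(-1.0, min(1.0, cos_delta))   # guard against rounding
    delta = math.degrees(math.acos(cos_delta))
    return [(phi_h + delta) % 360.0, (phi_h - delta) % 360.0]

# Hypothetical reflection: the "true" protein phase is 50 degrees.
f_p = cmath.rect(100.0, math.radians(50.0))
f_h = cmath.rect(30.0, math.radians(120.0))      # from the heavy atom model
f_ph = f_p + f_h                                 # only |F_PH| is measured

choices = sir_phase_choices(abs(f_p), abs(f_ph), f_h)
print(choices)   # one choice is the true phase, the other is the false one
```

The two choices are mirror images of each other about the heavy atom phase, so nothing in a single derivative distinguishes them.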

Resolving the Phase Ambiguity


Add more information:

(1) Add another derivative (Multiple Isomorphous Replacement)

(2) Use a density modification technique (solvent flattening)

(3) Add anomalous data (SIR with anomalous scattering)

Multiple Isomorphous Replacement

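Note:

$F_P$ = protein

$F_{H1}$ = heavy atom #1

$F_{H2}$ = heavy atom #2

$F_{P1}$ = heavy atom derivative #1

$F_{P2}$ = heavy atom derivative #2

The centers of the $F_{P1}$ and $F_{P2}$ circles are placed at the ends of the vectors $-F_{H1}$ and $-F_{H2}$, respectively.

[Figure: Harker construction for MIR with circle intersections X1, X2 and X3; $\phi_{true}$ lies at the common intersection X1 and $\phi_{false}$ at the others. From Glusker, Lewis and Rossi.]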

We still get two solutions, one true and one false for each reflection from the second derivative. The true solutions should be consistent between the two derivatives while the false solution should show a random variation.

Exact overlap at X1 is dependent on data accuracy and on heavy atom (HA) accuracy; the failure of the circles to meet exactly is called lack of closure.
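The consistency argument can be sketched by computing the two candidate phases from each derivative and keeping the pair that agrees best. The values are invented and error-free, so the agreement is exact here; with real data the residual gap plays the role of the lack of closure. Function names are illustrative.

```python
import cmath
import math

def phase_choices(fp_amp, fph_amp, f_h):
    """Two candidate protein phases (degrees) from one derivative."""
    fh_amp = abs(f_h)
    phi_h = math.degrees(cmath.phase(f_h))
    cos_delta = (fph_amp ** 2 - fp_amp ** 2 - fh_amp ** 2) / (2.0 * fp_amp * fh_amp)
    delta = math.degrees(math.acos(max(-1.0, min(1.0, cos_delta))))
    return [(phi_h + delta) % 360.0, (phi_h - delta) % 360.0]

def angular_difference(a, b):
    """Smallest separation between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def mir_phase(fp_amp, derivative_1, derivative_2):
    """Pick the candidate from derivative 1 that best matches one from derivative 2."""
    best = None
    for p1 in phase_choices(fp_amp, *derivative_1):
        for p2 in phase_choices(fp_amp, *derivative_2):
            gap = angular_difference(p1, p2)
            if best is None or gap < best[0]:
                best = (gap, p1)
    return best[1], best[0]

# Hypothetical reflection with a true protein phase of 50 degrees.
f_p = cmath.rect(100.0, math.radians(50.0))
f_h1 = cmath.rect(30.0, math.radians(120.0))
f_h2 = cmath.rect(25.0, math.radians(-10.0))
derivative_1 = (abs(f_p + f_h1), f_h1)   # (measured |F_PH|, heavy atom vector)
derivative_2 = (abs(f_p + f_h2), f_h2)

phase, closure_gap = mir_phase(abs(f_p), derivative_1, derivative_2)
print(phase, closure_gap)
```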

Solvent Flattening

Similar to noise filtering

Resolve the SIR or SAS phase ambiguity

From Glusker, Lewis and Rossi

B.C. Wang, 1985

Electron density can’t be negative

Use an iterative process to enhance true phase!

How Does Solvent Flattening Resolve the Phase Ambiguity?

1. Solvent flattening can locate and enhance the protein image, e.g. whatever is not solvent must be protein!

2. From the protein image, the phases of the structure factors of the protein can be calculated.

3. These calculated phases are then used to select the true phases from sets of true and false phases.

4. Thus, in essence, the phase ambiguity is resolved by the protein image itself!

5. The solvent flattening process was made practical by the introduction of the ISIR/ISAS program suite (Wang, 1985), and other phasing programs such as DM and PHASES are based on this approach.

Handedness Can be Determined by Solvent Flattening

The ISAS process is carried out twice, once with the heavy atom site(s) at their refined locations (+++), and once at their inverted locations (---).

Data | FOM (1) | Handedness | FOM (2) | R-Factor | Corr. Coef.
RHE | 0.54 | Correct | 0.82 | 0.26 | 0.958
RHE | 0.54 | Incorrect | 0.80 | 0.30 | 0.940
NP with I (3) | 0.54 | Correct | 0.80 | 0.27 | 0.955
NP with I (3) | 0.54 | Incorrect | 0.76 | 0.36 | 0.919
NP with I & S (4) | 0.56 | Correct | 0.82 | 0.24 | 0.964
NP with I & S (4) | 0.56 | Incorrect | 0.78 | 0.35 | 0.926

(1) Figure of merit before solvent flattening

(2) Figure of merit after one filter and four cycles of solvent flattening

(3) Four iodine atoms were used for phasing

(4) Four iodine and 56 sulfur atoms were used for phasing

Heavy Atom Handedness and Protein Structure Determination using Single-wavelength Anomalous Scattering Data, ACA Annual Meeting, Montreal, July 25, 1995.

Does the Correct Hand Make a Difference?

YES!

The wrong hand will give the mirror image!

Anomalous Dispersion Methods

All elements display an anomalous dispersion (AD) effect in X-ray diffraction

For elements such as e.g. C,N,O, etc., AD effects are negligible

For heavier elements, especially when the X-ray wavelength approaches an atomic absorption edge of the element, these AD effects can be very large.

The scattering power of an atom exhibiting AD effects is:

$f_{AD} = f_n + \Delta f' + i\Delta f''$

$f_n$ is the normal scattering power of the atom in the absence of AD effects

$\Delta f'$ arises from the AD effect and is a real factor (+/- signed) added to $f_n$

$\Delta f''$ is an imaginary term which also arises from the AD effect

$\Delta f''$ is always positive and 90° ahead of ($f_n + \Delta f'$) in phase angle

The values of $\Delta f'$ and $\Delta f''$ are highly dependent on the wavelength of the X-radiation.

In the absence of AD effects, $I_{hkl} = I_{-h-k-l}$ (Friedel's Law).

With AD effects, $I_{hkl} \neq I_{-h-k-l}$ (Friedel's Law breaks down).

Accurate measurement of Friedel pair differences can be used to extract starting phases if the AD effect is large enough.
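The breakdown of Friedel's Law is easy to reproduce numerically by giving one atom a complex scattering factor. In the sketch below the coordinates and the values of $f_n$, $\Delta f'$ and $\Delta f''$ are illustrative numbers, not tabulated ones, and the variable names are made up.

```python
import cmath
import math

def structure_factor(hkl, atoms):
    """atoms = [(f, (x, y, z)), ...]; f may be complex for an anomalous scatterer."""
    h, k, l = hkl
    return sum(f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
               for f, (x, y, z) in atoms)

light_atoms = [(6.0, (0.40, 0.25, 0.10)), (8.0, (0.15, 0.60, 0.35))]
site = (0.70, 0.10, 0.80)                   # hypothetical anomalous scatterer site
f_n, delta_f1, delta_f2 = 34.0, -8.0, 4.5   # illustrative f_n, delta f', delta f''

without_ad = light_atoms + [(f_n, site)]
with_ad = light_atoms + [(complex(f_n + delta_f1, delta_f2), site)]

hkl, minus_hkl = (2, 1, 3), (-2, -1, -3)
diff_without = abs(structure_factor(hkl, without_ad)) - abs(structure_factor(minus_hkl, without_ad))
diff_with = abs(structure_factor(hkl, with_ad)) - abs(structure_factor(minus_hkl, with_ad))
print(diff_without, diff_with)   # zero (to rounding) without AD, nonzero with AD
```

The difference is a small fraction of the amplitude itself, which is why accurate measurement of the Friedel pairs matters.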

Breakdown of Friedel’s Law

($F_{hkl}$, left) $F_n$ represents the total scattering by "normal" atoms without AD effects, f' represents the sum of the normal and real AD scattering values ($f_n + \Delta f'$), $\Delta f''$ is the imaginary AD component and appears 90° (at a right angle) ahead of the f' vector, and the total scattering is the vector $F_{+++}$.

($F_{-h-k-l}$, right) $F_{-n}$ is the inverse of $F_n$ (at $-\phi_{hkl}$) and -f' is the inverse of f'; the $\Delta f''$ vector is once again 90° ahead of f'. The resultant vector, $F_{---}$ in this case, is obviously shorter than the $F_{+++}$ vector.

Collecting Anomalous Scattering Data

Anomalous scatterers, such as selenium, are generally incorporated into the protein during expression of the protein or are soaked into the crystals in a manner similar to preparing a heavy atom derivative.

Bromine, iodine, xenon and traditional heavy atom compounds are also good anomalous scatterers.

The anomalous signal, the difference between $|F_{+++}|$ and $|F_{---}|$, is generally about one order of magnitude smaller than that between $|F_{PH}(hkl)|$ and $|F_P(hkl)|$.

Thus, the signal-to-noise (S/n) level in the data plays a critical role in the success of anomalous scattering experiments, i.e. the higher the S/n in the data the greater the probability of producing an interpretable electron density map.

The anomalous signal can be optimized by data collection at or near the absorption edge of the anomalous scatterer. This requires a tunable X-ray source such as a synchrotron.

The S/n of the data can also be increased by collecting redundant data .

The two common anomalous scattering experiments are Multiwavelength Anomalous Dispersion (MAD) and single wavelength anomalous scattering/diffraction (SAS or SAD).

The SAS technique is becoming more popular since it does not require a tunable X-ray source.

Increasing Number of SAS Structures

Increasing S/n with Redundancy

Multiwavelength Anomalous Dispersion

Note:

$F_P$ = protein

$F_{H1}$ = heavy atom

$F^+_{PH} = F_{+++}$

$F^-_{PH} = F_{---}$

$F^+_{H''} = \Delta f''_{+++}$

$F^-_{H''} = \Delta f''_{---}$

The centers of the $F^+_{PH}$ and $F^-_{PH}$ circles are placed at the ends of the vectors $-F^+_{H''}$ and $-F^-_{H''}$, respectively.

From Glusker, Lewis and Rossi

In the MAD experiment a strong anomalous scatterer is introduced into the crystal and data are recorded at several wavelengths (peak, inflection and remote) near the X-ray absorption edge of the anomalous scatterer. The phase ambiguity is resolved in a manner similar to the use of multiple derivatives in the MIR technique.

Single Wavelength Anomalous Scattering

The SAS method, which combines the use of SAS data and solvent flattening to resolve phase ambiguity was first introduced in the ISAS program (Wang, 1985). The technique is very similar to resolving the phase ambiguity in SIR data.

The SAS method does not require a tunable source, and successful structure determination can be carried out using a home X-ray source on crystals containing anomalous scatterers with sufficiently large $\Delta f''$, such as iron, copper, iodine, xenon and many heavy atom salts.

The ultimate goal of the SAS method is the use of S-SAS to phase protein data, since most proteins contain sulfur. However, sulfur has a very weak anomalous scattering signal, with $\Delta f''$ = 0.56 e for Cu X-rays.

The S-SAS method requires careful data collection and crystals that diffract to 2Å resolution.

A high symmetry space group (more internal symmetry equivalents) increases the chance of success.

The use of soft X-rays such as Cr K$\alpha$ ($\lambda$ = 2.2909 Å) doubles the sulfur signal ($\Delta f''$ = 1.14 e).

There are over 20 S-SAS structures in the Protein Data Bank.

What is the Limit of the SAS Method

Electron Density Maps of Rhe by Sulfur-ISAS

(Calculated using simulated data in 1983 )

[Figure panels: SAS Unresolved; ISAS Filter 1 Cycle 1; ISAS Filter 3 Cycle 8; $F_{cal}$]

$\Delta f''$ = 0.56 e using Cu K$\alpha$ X-rays

(Wang (1985), Methods Enzymol., 115, 90-112)

Molecular Replacement

Molecular replacement has proven effective for solving macromolecular crystal structures based upon the knowledge of homologous structures.

The method is straightforward and reduces the time and effort required for structure determination because there is no need to prepare heavy atom derivatives and collect their data.

Model building is also simplified, since little or no chain tracing is required .

The 3-dimensional structure of the search model must be very close (< 1.7 Å r.m.s.d.) to that of the unknown structure for the technique to work.

Sequence homology between the model and unknown protein is helpful but not strictly required. Success has been observed using search models having as low as 17% sequence similarity.

Several computer programs, such as AMoRe, X-PLOR/CNS and PHASER, are available for MR calculations.

Molecular Replacement

Use a model of the protein to estimate phases

Must be a structural homologue (RMSD < 1.7Å)

Two step process

1. find orientation of model (red ==> black)

2. find location of orientated model (black ==> blue)

px.cryst.bbk.ac.uk/03/sample/molrep.htm

Molecular Replacement

Use a model of the protein to estimate phases

Need to determine the model's orientation in X1's unit cell

Use a Patterson rotation search ($\alpha$, $\beta$, $\gamma$), zyz convention

The coordinate system is rotated by an angle $\alpha$ around the original z axis, then by an angle $\beta$ around the new y axis, and then by an angle $\gamma$ around the final z axis.
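The zyz rotation can be written as a product of three elementary matrices. The sketch assumes column vectors and rotations about successively new axes, which compose as Rz(first angle) · Ry(second angle) · Rz(third angle); individual MR programs may define the sense of rotation or the angle ranges differently, so this is not the convention of any particular package.

```python
import math

def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_y(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def euler_zyz(alpha, beta, gamma):
    """Rotation about z, then the new y, then the final z (angles in radians)."""
    return matmul(matmul(rot_z(alpha), rot_y(beta)), rot_z(gamma))

def rotate(matrix, vector):
    return [sum(matrix[i][k] * vector[k] for k in range(3)) for i in range(3)]

# Rotating a model coordinate; a rotation search repeats this over a grid of angles.
r = euler_zyz(math.radians(30.0), math.radians(45.0), math.radians(60.0))
rotated = rotate(r, [1.0, 0.0, 0.0])
print(rotated)
```

In a rotation search the model's Patterson is compared with the observed Patterson for each trial set of angles, and the best-scoring set gives the orientation.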

Molecular Replacement

Use a model of the protein to estimate phases

Need to determine the orientated model's location in X1's unit cell

Use an R-factor search

The orientated model is stepped through the X1 unit cell using small increments in x, y, and z (e.g. x => x + step)

Point where R is lowest represents the correct location

Other faster methods are available e.g. PHASER
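A one-dimensional toy version of the R-factor search is sketched below. It assumes a cell with an inversion centre at the origin, because without any symmetry the amplitudes would not depend on where the model sits. The three-atom model, the "true" shift of 0.12 and the step of 0.01 are all invented for illustration.

```python
import cmath
import math

def amplitudes(model, shift, h_max):
    """|F_h| for h = 1..h_max; each model atom at x also appears at -x."""
    result = []
    for h in range(1, h_max + 1):
        total = 0j
        for f, offset in model:
            x = shift + offset
            total += f * (cmath.exp(2j * math.pi * h * x) + cmath.exp(-2j * math.pi * h * x))
        result.append(abs(total))
    return result

def r_factor(observed, calculated):
    """R = sum | |Fo| - |Fc| | / sum |Fo|."""
    return sum(abs(o - c) for o, c in zip(observed, calculated)) / sum(observed)

model = [(8.0, 0.00), (6.0, 0.07), (7.0, 0.19)]   # hypothetical orientated model
observed = amplitudes(model, 0.12, 12)            # "data" generated at the true shift

steps = [i / 100 for i in range(50)]              # x => x + step, over half the cell
scores = [(r_factor(observed, amplitudes(model, t, 12)), t) for t in steps]
best_r, best_t = min(scores)
print(best_t, best_r)                             # lowest R marks the model's location
```

Only half the cell is searched because, with an inversion centre, shifts of t and t + 1/2 give the same amplitudes.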
