
BSTR521 - Biocrystallography Winter 2011

Multiple isomorphous replacement (MIR) (Ctd)

So far we have:

1. Measured a 'native dataset', giving us the $F_P$.

2. Measured one or more 'derivative datasets', giving the $F_{PH}$'s.

3. Calculated the 'difference Patterson' with coefficients $(F_P - F_{PH})^2$.

4. Solved the difference Patterson, giving us a preliminary set of positions and estimates of relative occupancies and perhaps even B-factors for these positions, i.e. $Z_j$, $x_j$, $y_j$, $z_j$, $B_j$.

5. Perhaps we have even optimized these 'heavy atom parameters' by some sort of phase-independent refinement.

But how do we get the best possible phases $\alpha_P$ from the available data?

2011 11 BSTR521 MIR-II V01 Page 1 Wim Hol


MIR: How to calculate the 'best' phase $\alpha_{best}$.

The so-called "Harker construction" allows the calculation of the phase of $F_P$ in the absence of errors. However, errors are aplenty since we have:

- measurement errors for $F_P$ and $F_{PH}$;

- non-isomorphism of $F_{PH}$ versus $F_P$;

- parameter errors for the calculation of $F_H$.

In a classic paper, Blow & Crick showed in 1959 how, with certain approximations, the Harker construction can be used to obtain phase probabilities and how these probabilities can be used to obtain the 'best phase', $\alpha_{best}$, and a 'figure-of-merit', $m$.


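MIR: Recapitulation & Summary of Some Useful Trigonometric Equations

$$\sin^2\alpha + \cos^2\alpha = 1$$

$$\sin(\alpha + \beta) = \sin\alpha\cos\beta + \cos\alpha\sin\beta$$

$$\sin(\alpha - \beta) = \sin\alpha\cos\beta - \cos\alpha\sin\beta$$

$$\cos(\alpha + \beta) = \cos\alpha\cos\beta - \sin\alpha\sin\beta$$

$$\cos(\alpha - \beta) = \cos\alpha\cos\beta + \sin\alpha\sin\beta$$

hence

$$\sin 2\alpha = 2\sin\alpha\cos\alpha$$

hence

$$\cos 2\alpha = \cos^2\alpha - \sin^2\alpha = 2\cos^2\alpha - 1 = 1 - 2\sin^2\alpha$$

and hence

$$\cos^2\alpha = \tfrac{1}{2}(1 + \cos 2\alpha) \quad\text{and}\quad \sin^2\alpha = \tfrac{1}{2}(1 - \cos 2\alpha)$$

and

$$\sin\alpha = 2\sin\tfrac{1}{2}\alpha\,\cos\tfrac{1}{2}\alpha$$

$$\cos\alpha = \cos^2\tfrac{1}{2}\alpha - \sin^2\tfrac{1}{2}\alpha = 2\cos^2\tfrac{1}{2}\alpha - 1 = 1 - 2\sin^2\tfrac{1}{2}\alpha$$

Also good to remember:

$$\exp i\alpha = \cos\alpha + i\sin\alpha$$

hence

$$\cos\alpha = \tfrac{1}{2}\left[\exp(i\alpha) + \exp(-i\alpha)\right] \quad\text{and}\quad \sin\alpha = \tfrac{1}{2i}\left[\exp(i\alpha) - \exp(-i\alpha)\right]$$

[An aside: $i = \exp(i\pi/2)$, which leads to $i^i = e^{-\pi/2} \approx 0.208$; fun to derive and to imagine what it means that "a complex number raised to a complex power is a real number", but this $i^i$ equation is not used anywhere later.]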


MIR: THE PRINCIPLE OF THE ISOMORPHOUS REPLACEMENT METHOD

Towards the Harker construction.

$F_P$ and $F_{PH}$ are the available measurements. Can $\alpha_P$ be determined from these measurements? Well, IF $F_H$ can be derived from $F_P$ and $F_{PH}$, $\alpha_P$ can almost be obtained:

In the figure above, $\angle OAB = \pi - (\alpha_P - \alpha_H)$.

Cosine rule in $\triangle OAB$:

$$F_{PH}^2 = F_P^2 + F_H^2 - 2 F_P F_H \cos[\pi - (\alpha_P - \alpha_H)]$$

or

$$F_{PH}^2 = F_P^2 + F_H^2 + 2 F_P F_H \cos(\alpha_P - \alpha_H)$$

or

$$\cos(\alpha_P - \alpha_H) = \frac{F_{PH}^2 - F_P^2 - F_H^2}{2 F_P F_H}$$

then

$$\alpha_P = \alpha_H \pm \arccos\left[\frac{F_{PH}^2 - F_P^2 - F_H^2}{2 F_P F_H}\right]$$

Hence 2 solutions, only one of which is correct.
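The two-fold ambiguity can be made concrete with a small numerical sketch. The function name `sir_phase_solutions` and the test reflection (amplitudes 100 and 30, phases 70° and 20°) are made up for illustration; they do not come from the lecture. The code simply evaluates the arccos expression above, clamping the cosine because real measurement errors can push it slightly outside [-1, 1].

```python
import cmath
import math


def sir_phase_solutions(f_p, f_ph, f_h, alpha_h):
    """Return the two candidate protein phases (radians) from the cosine rule."""
    cos_delta = (f_ph**2 - f_p**2 - f_h**2) / (2.0 * f_p * f_h)
    # With real data the ratio can fall slightly outside [-1, 1]; clamp it.
    cos_delta = max(-1.0, min(1.0, cos_delta))
    delta = math.acos(cos_delta)
    two_pi = 2.0 * math.pi
    return ((alpha_h + delta) % two_pi, (alpha_h - delta) % two_pi)


# Hypothetical, error-free reflection built from a known protein phase.
alpha_p_true = math.radians(70.0)
alpha_h = math.radians(20.0)
f_p, f_h = 100.0, 30.0
f_ph = abs(f_p * cmath.exp(1j * alpha_p_true) + f_h * cmath.exp(1j * alpha_h))

solutions = sir_phase_solutions(f_p, f_ph, f_h, alpha_h)
print([round(math.degrees(s), 1) for s in solutions])
```

For this made-up reflection the two candidates are 70° (the true phase) and 330°, placed symmetrically about $\alpha_H$ = 20°. Nothing in a single derivative tells us which one to pick.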


MIR: HARKER CONSTRUCTION

Single Isomorphous Replacement (SIR)

Clearly, there are two solutions for $\mathbf{F}_P$: the vectors OH and OG in the figure.

They correspond to

$$\alpha_P = \alpha_H \pm \arccos\left[\frac{F_{PH}^2 - F_P^2 - F_H^2}{2 F_P F_H}\right]$$


MIR: HARKER CONSTRUCTION

MULTIPLE ISOMORPHOUS REPLACEMENT (MIR)

With two derivatives the phase ambiguity of SIR is resolved:

Note that perfect data is assumed!

$\mathbf{F}_P$ is the vector OH.


MIR Step 6: Phase Probabilities.

Blow and Crick (1959) reasoned that it is plausible to assume that the probability of an arbitrary phase $\alpha$ for $F_{P,obs}$ is proportional to (for derivative $j$):

$$P_j(\alpha) \propto \exp\left[-\frac{\varepsilon_j^2(\alpha)}{2E_j^2}\right]$$

Here $\varepsilon_j(\alpha)$ is the "lack of closure" error for this reflection (of derivative $j$) with assumed phase $\alpha$ for $F_P$. And $E_j$ is the average lack-of-closure error for this derivative $j$.

Since the Harker construction showed that $\varepsilon_j(\alpha)$ usually has two minimum values, $P(\alpha)$ for a single derivative is usually bimodal; that is, for acentric reflections.

For $N$ heavy atom derivatives:

$$P(\alpha) \propto \prod_{j=1}^{N} \exp\left[-\frac{\varepsilon_j^2(\alpha)}{2E_j^2}\right] = \exp\left[-\sum_{j=1}^{N} \frac{\varepsilon_j^2(\alpha)}{2E_j^2}\right]$$

Take as $E_j^2$ the mean squared lack of closure of the centric reflections:

$$E_j^2 = \frac{1}{N}\sum_{\text{centrics}} \left(F_{PH,obs} - F_{PH,calc}\right)^2$$

(In this last expression $N$ is the number of centric reflections in the sum, not the number of derivatives.)
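A minimal numerical sketch of these phase probabilities follows. It assumes the usual definition of the lack of closure, $\varepsilon_j(\alpha) = |F_{PH,obs}| - |F_P \exp(i\alpha) + F_{H,calc}|$. The two derivatives, their heavy-atom structure factors and the value $E_j = 5$ are invented for illustration, and the function names are hypothetical.

```python
import numpy as np


def lack_of_closure(alpha, f_p, f_ph_obs, f_h_calc):
    """epsilon_j(alpha) = |F_PH,obs| - |F_P exp(i alpha) + F_H,calc| (assumed definition)."""
    f_ph_calc = np.abs(f_p * np.exp(1j * alpha) + f_h_calc)
    return f_ph_obs - f_ph_calc


def phase_probability(alpha, f_p, derivatives):
    """Blow & Crick style P(alpha); derivatives = [(F_PH,obs, complex F_H,calc, E_j), ...]."""
    log_p = np.zeros_like(alpha, dtype=float)
    for f_ph_obs, f_h_calc, e_j in derivatives:
        eps = lack_of_closure(alpha, f_p, f_ph_obs, f_h_calc)
        log_p -= eps**2 / (2.0 * e_j**2)
    p = np.exp(log_p - log_p.max())   # subtract the maximum for numerical stability
    return p / p.sum()                # normalise the sampled values to sum to 1


# Invented, error-free example: true protein phase 70 degrees, |F_P| = 100.
alpha = np.deg2rad(np.arange(360))
f_p = 100.0
f_true = f_p * np.exp(1j * np.deg2rad(70.0))
f_h1 = 30.0 * np.exp(1j * np.deg2rad(20.0))
f_h2 = 25.0 * np.exp(1j * np.deg2rad(130.0))
deriv1 = (abs(f_true + f_h1), f_h1, 5.0)
deriv2 = (abs(f_true + f_h2), f_h2, 5.0)

p_sir = phase_probability(alpha, f_p, [deriv1])          # bimodal
p_mir = phase_probability(alpha, f_p, [deriv1, deriv2])  # ambiguity resolved
print(np.argmax(p_mir))
```

With one derivative the curve has two equally high peaks (at 70° and 330° in this example); adding the second derivative leaves only the peak at the true phase.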


MIR Step 6: Calculating SIR lack-of-closure errors for each $\alpha_P$.

Lack-of-closure error (in red) as a function of the assumed phase $\alpha_P$ (shown for only one case, in blue) for $F_P$.


MIR Step 6: Calculating MIR phase probabilities.

Phase circles for the 112 (a) and 317 (b) reflections of horse oxy-haemoglobin, illustrating the difference between the most probable F and the "best" F. The curves show the corresponding phase probability distributions for the two reflections.

Note that the probability distribution for reflection 112 is much narrower than for reflection 317, and therefore the "figure-of-merit" m for reflection 112 is larger than for reflection 317: 0.74 versus 0.24. The figure-of-merit will be introduced a few pages below.

REF: Cullis, A. F., Muirhead, H., Perutz, M. F., Rossmann, M. G. & North, A. C. T. (1961). The structure of haemoglobin IX. A three-dimensional Fourier synthesis at 5.5 Å resolution: description of the structure. Proc. Roy. Soc. A265, 161.


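MIR Step 7: The 'best' Fourier.

From probabilities to best phase and figure-of-merit

Blow & Crick (1959) reasoned approximately as follows:

1. The mean square error, over the cell, $\langle\Delta\rho^2\rangle$ due to one reflection being an arbitrary $F_S$ instead of the true $F_T$ is obtained from:

$$\Delta\rho_{hkl}(xyz) = \frac{1}{V}\Big[(F_S - F_T)\exp\{-2\pi i(hx_j + ky_j + lz_j)\} + (F_S^* - F_T^*)\exp\{+2\pi i(hx_j + ky_j + lz_j)\}\Big]$$

$$\big(\Delta\rho_{hkl}(xyz)\big)^2 = \frac{4}{V^2}\,|F_S - F_T|^2 \cos^2\{\varphi - 2\pi(hx_j + ky_j + lz_j)\} = \frac{2}{V^2}\,|F_S - F_T|^2\,\big[1 + \cos 2\{\varphi - 2\pi(hx_j + ky_j + lz_j)\}\big]$$

where $\varphi$ denotes the phase of $F_S - F_T$. Averaged over the cell the cosine term vanishes, so that

$$\langle\Delta\rho^2\rangle = \frac{2}{V^2}\,|F_S - F_T|^2 \qquad (1)$$

2. We don't know $F_T$, but we know $F_T = |F_{P,obs}| \exp i\alpha$ and $P(\alpha)$. Then (1) becomes:

$$\langle\Delta\rho^2\rangle = \frac{2}{V^2}\;\frac{\int_0^{2\pi} \big|F_S - |F_{P,obs}| \exp i\alpha\big|^2\, P(\alpha)\, d\alpha}{\int_0^{2\pi} P(\alpha)\, d\alpha}$$

3. The 'best Fourier' has that $F_S$ for which $\langle\Delta\rho^2\rangle$ is minimal, or:

$$\frac{\partial \langle\Delta\rho^2\rangle_{hkl}}{\partial F_S} = 0$$

yielding:

$$F_S = F_{best} = |F_{P,obs}|\;\frac{\int_0^{2\pi} \exp(i\alpha)\, P(\alpha)\, d\alpha}{\int_0^{2\pi} P(\alpha)\, d\alpha} = m\,|F_{P,obs}| \exp i\alpha_{best}$$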


MIR Step 7: The Figure of Merit: m

1. Blow & Crick (1959) defined

$$F_{best} = m\,|F_{obs}| \exp i\alpha_{best}$$

with

$$m \exp i\alpha_{best} = \frac{\int_0^{2\pi} \exp(i\alpha)\, P(\alpha)\, d\alpha}{\int_0^{2\pi} P(\alpha)\, d\alpha}$$

- $m$ is the vector to the center of gravity of the phase probability curve plotted on a circle with radius 1

- $\alpha_{best}$ is the phase of that vector $m$

2. It can be shown that: $m = \langle\cos\Delta\alpha\rangle$ = mean value of the cosine of the error in the phase angle $\alpha_{best}$.
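The centroid definition above translates directly into a few lines of code. The following is only a sketch: it assumes $P(\alpha)$ is sampled on a uniform grid (so that $d\alpha$ cancels between numerator and denominator), and the function name and the two idealized test distributions are hypothetical.

```python
import numpy as np


def best_phase_and_fom(alpha, p):
    """Centroid of P(alpha) on the unit circle; returns (alpha_best, m)."""
    centroid = np.sum(p * np.exp(1j * alpha)) / np.sum(p)
    return np.angle(centroid) % (2.0 * np.pi), np.abs(centroid)


alpha = np.deg2rad(np.arange(360))   # uniform 1-degree grid

# Idealized unimodal case: all probability at 70 degrees.
sharp = np.zeros(360)
sharp[70] = 1.0

# Idealized SIR-like bimodal case: equal peaks at 70 and 330 degrees.
bimodal = np.zeros(360)
bimodal[[70, 330]] = 0.5

for name, p in (("sharp", sharp), ("bimodal", bimodal)):
    a_best, m = best_phase_and_fom(alpha, p)
    print(name, round(np.degrees(a_best), 1), round(m, 3))
```

In the unimodal case the best phase coincides with the most probable phase and m = 1. In the bimodal case the best phase lies midway between the two peaks (20°) and m drops to cos 50° ≈ 0.643, which shows how the 'best' phase can differ from either 'most probable' phase.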


MIR Step 7: From phase probabilities to $m$ and $\alpha_{best}$.

Two probability curves for the phase angle of two different reflections. The baseline for the curves is a circle with radius |r|=1. C is the centroid of the probability distribution on the circle. m is the vector that connects the center of the circle with C. In other words, the more spread-out the probability curve is, the poorer is the determination of the phase angle, and the shorter the vector m.

(a) the sharp peak of the probability curve positions C close to the circle;

(b) C is somewhat further away from the circle, m is shorter, and the phase of this reflection is somewhat less certain than in (a).

(From: Drenth, J., Principles of protein x-ray crystallography, 2nd Ed. (1999) p. 172)


MIR Step 8: Lack-of-closure error refinement

Optimization of phases $\alpha_P$ and heavy atom parameters in alternating approximate steps:

Once approximate heavy atom parameters are known, approximate phases can be calculated by the Blow and Crick procedure.

With these approximate phases the heavy atom parameters can be refined by minimizing the 'lack-of-closure error'.

This procedure can be cyclically repeated until convergence is reached, i.e. until no significant phase shifts and no significant shifts in heavy atom parameters occur any longer.

Note: This is obviously not an ideal procedure, since in one step we consider the approximate phases as 'true' and in the other step the approximate heavy atom parameters are considered 'true'.


MIR Step 8: Heavy atom parameter refinement with known phases.

Minimize E, the sum of the lack-of-closure errors, while keeping the protein phases constant and varying the heavy atom parameters:

$$E = \sum_{hkl} w_{hkl}\,\big(|F_{PH,obs}| - |F_{PH,calc}|\big)^2$$

Often: $w_{hkl} = m_{hkl}$ or $w_{hkl} = m_{hkl}^2$

And:

$$|F_{PH,calc}| = \big|\,|F_{obs}| \exp i\alpha_P + F_{H,calc}\,\big|$$

With $\alpha_P$ = usually the 'best phase', sometimes the 'most probable phase', and

$$F_{H,calc} = \sum_j Z_j\, f \exp\left[-B_j \left(\frac{\sin\theta}{\lambda}\right)^2\right] \exp\big[2\pi i\,(hx_j + ky_j + lz_j)\big]$$

and the heavy atom parameters $Z_j$, $B_j$, $x_j$, $y_j$, $z_j$ are optimized by a non-linear least squares procedure.
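The target function E can be sketched as follows. This is an illustration under simplifying assumptions, not the procedure of any particular program: space group P1 (no symmetry-related sites), a constant scattering factor `f_atom`, and invented reflections and site parameters. The names `f_h_calc` and `lack_of_closure_sum` are hypothetical.

```python
import numpy as np


def f_h_calc(hkl, sites, f_atom, s2):
    """Heavy-atom structure factors in P1.

    hkl    : (n, 3) array of Miller indices
    sites  : list of (Z, x, y, z, B) tuples (occupancy, fractional coords, B-factor)
    f_atom : scattering factor f (scalar or length-n array); an assumed input
    s2     : (sin(theta)/lambda)^2 for each reflection
    """
    hkl = np.asarray(hkl, dtype=float)
    total = np.zeros(len(hkl), dtype=complex)
    for occ, x, y, z, b in sites:
        phase = 2.0 * np.pi * (hkl @ np.array([x, y, z]))
        total += occ * f_atom * np.exp(-b * s2) * np.exp(1j * phase)
    return total


def lack_of_closure_sum(f_p_obs, alpha_p, f_ph_obs, fh, weights):
    """E = sum_hkl w (|F_PH,obs| - | |F_P,obs| exp(i alpha_P) + F_H,calc |)^2."""
    f_ph_calc = np.abs(f_p_obs * np.exp(1j * alpha_p) + fh)
    return float(np.sum(weights * (f_ph_obs - f_ph_calc) ** 2))


# Invented data: three reflections, one fully occupied site at x = 0.5.
hkl = np.array([[1, 0, 0], [0, 1, 0], [1, 1, 0]])
s2 = np.zeros(3)                      # ignore resolution fall-off in this toy case
true_sites = [(1.0, 0.5, 0.0, 0.0, 0.0)]
fh_true = f_h_calc(hkl, true_sites, 10.0, s2)

f_p_obs = np.array([100.0, 80.0, 60.0])
alpha_p = np.array([0.3, 1.2, 2.0])
f_ph_obs = np.abs(f_p_obs * np.exp(1j * alpha_p) + fh_true)
weights = np.ones(3)

e_true = lack_of_closure_sum(f_p_obs, alpha_p, f_ph_obs, fh_true, weights)
fh_shifted = f_h_calc(hkl, [(1.0, 0.52, 0.0, 0.0, 0.0)], 10.0, s2)
e_shifted = lack_of_closure_sum(f_p_obs, alpha_p, f_ph_obs, fh_shifted, weights)
print(e_true, e_shifted)
```

With the correct site the sum is zero, and displacing the site raises it. A non-linear least-squares routine would search the parameters $Z_j$, $B_j$, $x_j$, $y_j$, $z_j$ for the minimum of this quantity while the phases are held fixed.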


MIR Step 9: Finding all heavy atom sites

Usually the Difference Patterson, with coefficients $(F_{PH} - F_P)^2$, reveals only a relatively small subset of all sites, i.e. the well occupied ones. The subsequent steps needed to arrive at the full set of positions can vary quite a bit from case to case, but the typical way is as follows:

0) A few major sites are found from the Difference Patterson;

1) Refine the parameters (i.e. x, y, z, occupancy and temperature factor) of each of these sites;

2) Calculate the "best" phases ($\alpha_{best}$) and figure of merit, $m$, with these sites;

3) Calculate a Difference Fourier with $m\,(F_{PH} - F_P) \exp i\alpha_{best}$ and 'Double Difference' or 'Residual' Fouriers with $m\,(|F^{obs}_{PH}| - |F^{calc}_{PH}|) \exp i\alpha^{calc}_{PH}$ as coefficients for the derivative under consideration;

4) Cross-check the highest peaks in the Difference Fourier with the Difference Patterson, checking in particular the presence or absence of cross-vectors between major and weaker sites;

5) Check the Residual Fouriers for extra sites and for 'flatness' near the known sites;

6) Refine the parameters of all sites, check for convergence and "reasonableness" and go back to 2 above until no new sites are found.


MIR Step 9: Difference and Residual Fouriers

New positions show up in the Difference Fourier:

$$\Delta\rho = \sum_{hkl} m_{hkl}\,\big(|F_{PH,obs}| - |F_{P,obs}|\big) \exp i\alpha_P$$

New positions and errors in current positions show up in the "Residual Fourier", also called "Double Difference Fourier":

$$\Delta\Delta\rho = \sum_{hkl} m_{hkl}\,\big(|F_{PH,obs}| - |F_{PH,calc}|\big) \exp i\alpha_{PH,calc}$$

The Residual Fouriers show errors in:

- positions as a combination of positive and negative peaks next to the current position

- occupancies as peaks or holes at current positions

- temperature factors as concentric positive and negative circles.

At least, that is ...... ideally.


MIR Step 10: Calculate your "best" electron density map

With $|F_{obs}|$, $\alpha_{best}$ and $m$ known, calculate the 'best' Fourier:

$$\rho(\mathbf{x}) = \sum_{\mathbf{h}} m_{\mathbf{h}}\,|F_{obs}(\mathbf{h})| \exp i\alpha_{best}(\mathbf{h})$$

(the factor $1/V$ and the Fourier kernel $\exp[-2\pi i\,\mathbf{h}\cdot\mathbf{x}]$ are omitted for brevity), and happy tracing and building….

But don't forget to analyze your MIR statistics carefully!

And......it is a good habit to check your electron density in the neighborhood of the heavy atom positions, at positive as well as negative contour levels. Sometimes serious errors, often $F_{PH}$ versus $F_P$ scaling errors, are reflected as large peaks or holes at or near heavy atom sites.


MIR Statistics: Monitoring the Progress of Lack-of-Closure Error Refinement.

- The "reasonableness" of the heavy atom parameters, such as reasonable occupancies and temperature factors of the sites, and a reasonable relative scale factor (k) of $F_{PH}$ versus $F_P$.

- $k_{initial}$ should not deviate more than 2-5% from unity. The relative temperature factor (B) should not differ more than 5-10 Å$^2$ from zero.

- The Cullis R-factor for centric reflections:

$$R^{centric}_{Cullis} = \frac{\sum_h^{centric} \big|\,|F_{PH} \pm F_P| - F_H^{calc}\,\big|}{\sum_h^{centric} |F_{PH} \pm F_P|}$$

below 0.60: great; up to 0.80: often seen.

- "Phasing Power" = (r.m.s. $F_H$) / (r.m.s. lack-of-closure error); for centrics: over 1.0 useful derivative; over 2.0 super; below 1.0 still frequently used.

- Phase difference $\langle|\alpha_H - \alpha_P|\rangle \approx 90^\circ$ ideally, since $\alpha_H$ and $\alpha_P$ are uncorrelated. If this difference is not close to 90° (i.e. deviates by more than a few degrees) then $\alpha_P$ looks too much like $\alpha_H$ (which results in large peaks in $\rho_{protein}$ at the position of heavy atom sites) or too much like ($\alpha_H + \pi$) (which leads to deep holes in $\rho_{protein}$). The cause of this problem is usually an error in the scale of $F_{PH}$ versus $F_P$.

- The "Double Difference Fourier" $(F_{PH,obs} - F_{PH,calc}) \exp i\alpha_{PH,calc}$ should be flat, ideally. The double difference Fourier is also called the "Residual Fourier". The peaks and holes in such a difference Fourier can give very precise information about how to correct positions and occupancies, and even thermal parameters.
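Two of these statistics, the centric Cullis R-factor and the phasing power, are easy to sketch in code. The version below is a simplification with invented amplitudes: it uses only $|F_{PH} - F_P|$ as the isomorphous difference and so ignores the rare sign "crossover" reflections for which the sum $|F_{PH} + F_P|$ applies. Function names are hypothetical.

```python
import numpy as np


def cullis_r_centric(f_ph, f_p, f_h_calc):
    """Cullis R for centric reflections, ignoring sign crossovers (simplification)."""
    iso_diff = np.abs(np.asarray(f_ph, dtype=float) - np.asarray(f_p, dtype=float))
    return float(np.sum(np.abs(iso_diff - np.asarray(f_h_calc))) / np.sum(iso_diff))


def phasing_power(f_h_calc, lack_of_closure):
    """Ratio of r.m.s. |F_H,calc| to r.m.s. lack-of-closure error."""
    def rms(values):
        return np.sqrt(np.mean(np.square(np.asarray(values, dtype=float))))
    return float(rms(f_h_calc) / rms(lack_of_closure))


# Invented centric amplitudes for three reflections.
f_p = np.array([100.0, 100.0, 100.0])
f_ph = np.array([110.0, 95.0, 130.0])     # isomorphous differences 10, 5, 30
f_h = np.array([8.0, 6.0, 27.0])          # calculated heavy-atom amplitudes

r_cullis = cullis_r_centric(f_ph, f_p, f_h)
closure = np.abs(np.abs(f_ph - f_p) - f_h)   # centric lack of closure: 2, 1, 3
power = phasing_power(f_h, closure)
print(round(r_cullis, 3), round(power, 2))
```

For these made-up numbers the Cullis R is 6/45 ≈ 0.133, far better than one would expect from real data, where values of 0.6 to 0.8 are common as noted above.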


MIR Statistics: Statistics of one derivative

Porcine Growth Hormone K2OsCl6 Heavy-Atom Derivative Statistics (a)

Resolution limit (Å)   rms(F_H)/rms(E) (b)   R_c (c)   Number of reflections   Mean figure of merit (d)
10.7                   2.10                  0.49      89                      0.67
7.8                    2.05                  0.43      212                     0.70
5.9                    2.64                  0.44      367                     0.61
4.9                    3.97                  0.38      563                     0.76
4.1                    2.78                  0.55      801                     0.61
3.6                    2.49                  0.60      1049                    0.52
3.1                    1.97                  0.68      1270                    0.39
2.8                    1.54                  0.66      1446                    0.28
Total                  2.44                  0.53      5797                    0.48

(a) Reproduced by permission from ref. 16. Abbreviations: rms(F_H), the root mean squared calculated heavy-atom structure factor amplitude; rms(E), the root mean square lack of closure determined from centric reflections only; R_c, Cullis R-factor.

(b) The ratio (F_H-rms/E-rms) should get larger as the phasing model improves.

(c) A reliability index (R) of zero indicates perfect agreement between observed and calculated structure factors. R_c is the Cullis R for centric reflections.

(d) A figure of merit (m) of one indicates no error in phase angle.

REF: Sherin S. Abdel-Meguid 1996 "Structure Determination Using Isomorphous Replacement" In: C. Jones, B. Mulloy, and M.R. Sanderson (eds) Methods in Molecular Biology Volume 56: Crystallographic Methods and Protocols, Humana Press, Totowa, New Jersey, pp 168.


MIR Statistics

Table 2 - Dihydroorotate dehydrogenase A (DHODH A)

Refined heavy-atom parameters.

Derivative   Site*   Fractional coordinates   Occupancy   B factor (Å2)
Gold 1       A       0.928, 0.999, 0.784      0.22        23.6
Gold 1       B       0.927, 0.586, 0.688      0.17        18.8
Gold 2       A       0.929, 0.999, 0.787      0.58        12.6
Gold 2       B       0.927, 0.587, 0.684      0.53        7.8
Gold 2       C       0.545, 0.931, 0.715      0.22        21.2
Gold 2       D       0.546, 0.657, 0.603      0.20        35.1
Gold 2       E       0.808, 0.938, 0.784      0.17        21.2
Gold 2       F       0.808, 0.647, 0.639      0.19        23.9
Gold 3       A       0.929, 0.000, 0.786      0.66        20.1
Gold 3       B       0.927, 0.587, 0.685      0.61        17.0
Gold 4       A       0.929, 0.000, 0.788      0.95        20.5
Gold 4       B       0.927, 0.588, 0.683      0.89        17.4

*Gold atoms in sites A and B are bound to Cys23 residues; sites E and F correspond to binding to Cys130 residues. There are no obvious candidate residues for sites C and D.

1. Note that in this case all derivatives share two sites, the two highest occupied sites. In other words, these derivatives do not provide entirely independent phase information.

2. Note that the occupancies of the sites are very low in derivative "Gold 1".

REF: Paul Rowland, Finn S. Nielsen, Kaj Frank Jensen, and Sine Larsen (1997) The crystal structure of the flavin containing enzyme dihydroorotate dehydrogenase A from Lactococcus lactis. Structure, 5:239-252.


MIR Statistics

Table 1 - Dihydroorotate dehydrogenase A (DHODH A)

Data collection and phasing statistics.

                                        Native        Gold 1         Gold 2         Gold 3         Gold 4
Crystal size (mm3)                      0.5x0.5x0.5   1.0x0.8x0.2    0.5x0.4x0.2    0.8x0.8x0.2    0.4x0.4x0.4
Protein concentration (mg ml-1)*        18            30             40             18             18
PEG component†                          PEG 6000      PEG 4000       PEG 4000       PEG 6000       PEG 8000
KAu(CN)2 soaking conditions                           2.5 mM, 1 day  8 mM, 2 days   50 mM, 1 day   50 mM, 2 days
Total data collected (°)                180           180            120            180            160
Oscillation range per frame (°)         2.5           3              3              3              2
Exposure time per frame (min)           30            30             60             30             30
Resolution (Å)                          2.0           2.8            2.8            2.8            2.5
Observations                            179 624       67 880         44 519         68 497         80 280
Unique reflections                      50 757        18 583         16 641         18 804         25 489
Completeness (%) (all data / I>3σ(I))   99.3 / 80.3   99 / 92        89 / 56        99 / 89        96 / 85
R_merge (%)†† / R_iso (%)§              5.2           5.4 / 7.8      13.6 / 18.4    6.9 / 13.2     6.1 / 19.3
Highest resolution shell (Å)            2.03–2.00     2.85–2.80      2.85–2.80      2.85–2.80      2.54–2.50
Completeness (%) (all data / I>3σ(I))   90.8 / 45.4   90 / 75        81 / 25        88 / 66        88 / 61
R_merge (%)                             23.4          9.3            38.8           11.4           11.2
Heavy-atom sites                                      2              6              2              2
Phasing power (acentric/centric)‡                     0.75 / 0.48    1.48 / 0.95    2.80 / 1.90    2.62 / 1.86
R_Cullis (acentric/centric)#                          0.91 / 0.96    0.75 / 0.79    0.51 / 0.54    0.54 / 0.54

*The protein concentration in the mother liquor.

†The PEG component used in the mother liquor.

††R_merge = Σ|Ij – <Ij>| / Σ<Ij>, where Ij is the intensity of an observation of reflection j and <Ij> is the average intensity for reflection j.

§R_iso = Σ||F_PH| – |F_P|| / Σ|F_P|, where F_PH is the structure-factor amplitude of the derivative crystal and F_P is that of the native crystal.

‡Phasing power = root mean square (|F_H| / E), where F_H is the calculated structure-factor amplitude due to scattering by the heavy atoms and E is the residual lack of closure error.

#R_Cullis = lack of closure / isomorphous difference.

Note:

- 'Gold 1', with low occupancies and poor phasing power, was obtained by a 2.5 mM 1 day soak. (I.e., a variant of the '2+2' rule.)

- 'Gold 4' has the same two sites but with much higher occupancies and reasonable phasing power, and was obtained by a 50 mM soak for 2 days.

REF: Paul Rowland, Finn S. Nielsen, Kaj Frank Jensen, and Sine Larsen (1997) The crystal structure of the flavin containing enzyme dihydroorotate dehydrogenase A from Lactococcus lactis. Structure, 5:239-252.
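The R_merge and R_iso definitions in the footnotes to Table 1 can be sketched as follows. The intensities and amplitudes are invented, the function names are hypothetical, and R_iso is computed here on amplitudes that are assumed to be already scaled to each other.

```python
import numpy as np


def r_merge(observations):
    """R_merge = sum |I_j - <I_j>| / sum <I_j>, summed over all observations.

    observations: dict mapping a reflection index to its repeated intensity measurements.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        i = np.asarray(intensities, dtype=float)
        numerator += np.sum(np.abs(i - i.mean()))
        denominator += i.size * i.mean()   # <I_j> counted once per observation
    return numerator / denominator


def r_iso(f_ph, f_p):
    """R_iso = sum ||F_PH| - |F_P|| / sum |F_P| for already scaled amplitudes."""
    f_ph = np.abs(np.asarray(f_ph, dtype=float))
    f_p = np.abs(np.asarray(f_p, dtype=float))
    return float(np.sum(np.abs(f_ph - f_p)) / np.sum(f_p))


# Invented measurements: two reflections with repeated observations.
obs = {(1, 0, 0): [100.0, 110.0, 90.0], (0, 1, 0): [50.0, 50.0]}
print(r_merge(obs))                            # 20 / 400
print(r_iso([110.0, 90.0], [100.0, 100.0]))    # 20 / 200
```

Expressed as percentages these toy values are 5% and 10%, in the same range as the well-behaved derivatives in Table 1.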


Multiple Isomorphous Replacement Flow Diagram

Search for Derivatives

Scale $F_{PH}$ vs $F_P$

Determine Initial Heavy Atom Positions:

1. From Difference Patterson: by hand

2. From Difference Patterson: vector search

3. By Direct Methods

Optional: Refine Heavy Atom Parameters (without phase info)

Calculate (initial) phases

Refine Heavy Atom Parameters (with phases)

Combine Phase info

Add positions & Correct Parameters from Heavy Atom Difference and Residual Fouriers

Calculate 'best' Protein Electron Density

Density Modification

Build Model

Structure

Better Phases


Reflections on the past and future of heavy atom derivatives

- From the 1950s to the 1980s, heavy atom derivatives were crucial for solving the very first and numerous subsequent protein structures by multiple isomorphous replacement (MIR) and sometimes single isomorphous replacement (SIR) methods.

- However, obtaining a set of reasonably isomorphous derivatives proved, and often still proves, to be a formidable task. Soaking in a heavy-atom-containing solution tends to change the cell dimensions; isomorphism is then lost and MIR or SIR fails.

- However-square, if the heavy atom compound displays a number of well-occupied binding sites, the crystal obtained after soaking, no matter how different it is in cell dimensions or even space group (!) from the native crystal, can be extremely valuable for Heavy Atom Multiwavelength Anomalous Dispersion (Heavy Atom MAD)! This approach is to be kept in mind when solving new structures: heavy atom derivatives, if well-diffracting, can be useful for SIR or MIR when isomorphous, and even more useful for MAD whether the crystal remained isomorphous or not!

- Finally, it has been observed on a number of occasions that co-crystallization of proteins in the presence of heavy atom compounds yielded better diffracting crystals.

- Post-finally, it has also been observed on a number of occasions that soaking pre-grown crystals in solutions of heavy atom compounds improved the resolution (and often simultaneously changed the cell dimensions) of the crystals. Regardless of the change in cell dimensions, this is a double-winner: a crystal suitable for heavy atom MAD and of better resolution!
