NMR structure calculation

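NMR structure determination workflow
Bioinformatics and biochemical analysis
Sample preparation (protein expression, purification, labelling)
Preliminary NMR characterization
NMR data collection
Chemical shift assignment
Secondary structure analysis
NOE assignment
Structure calculation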
Solving structures by NMR
Sample Preparation
Structural restraints
•Cloning, expression, purification
•Isotope labelling
[15N], [13C/15N],[ 2H/13C/15N]
•NOE, H-bonds
•J-couplings
•Residual dipolar couplings, T1/T2
Resonance Assignments
•Chemical shifts
• Backbone
• Side chains
Secondary Structure
Structure Calculation
•Distance geometry
•Restrained molecular dynamics
•Simulated annealing
Chemical shift
Ensemble of 3D structures
Overview
• Structure representation
• Types of NMR data conversion into restraints
• Structure calculation methods
• Structure validation
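Levels of protein structure
• Biopolymers formed from amino acids linked by peptide bonds
• Primary, secondary, tertiary and quaternary structure
• The peptide bond has partial double-bond character and cannot rotate freely
• Rotatable backbone dihedral angles: φ, ψ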
The 20 common amino acid residues
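In NMR, a spin system usually refers to all the atoms of one amino acid residue
Determining protein solution structures by NMR
• Measure distance information between atoms (hydrogen atoms) and other restraint information to obtain a model of the spatial structure
• The chemical structure (amino acid sequence, i.e. primary structure) is known; the spatial structure (tertiary and quaternary structure) is what is determined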
Structure calculation
Conformation
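Structure calculation
• Basic approach
– Obtain conformational restraints from experiment: distances, dihedral angles
• The restraint information is incomplete
• The restraint information is imprecise
– Compute conformations that satisfy these restraints
• Distance geometry
• Restrained molecular dynamics simulation
• Simulated annealing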
NMR experimental observables
providing structural information
• Backbone conformation from chemical shifts
(Chemical Shift Index, CSI): φ, ψ
• Distance restraints from NOEs
• Hydrogen bond restraints
• Backbone and side chain dihedral angle restraints
from scalar couplings
• Orientation restraints from residual dipolar
couplings
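Secondary structure information from chemical shifts
• Secondary structure
– CSI: difference between the observed chemical shift and that of a random-coil polypeptide
– TALOS: predicts dihedral angles by database matching
– Can be used for structure calculation and analysis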
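Restraints
• Distance restraints
– 1H-1H NOE
– Hydrogen bonds
• Dihedral angle restraints
– Backbone
– Side chain
– Peptide bond: trans/cis
• Other
– Chirality: L-amino acids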
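Restraint generation
• NOE
– Assignment
• Unique: only one possibility
• Ambiguous: several possibilities
• one of the possibilities is correct
• several possibilities are all correct (peak overlap)
– Conversion to distances
• Experimentally detectable usually <5 Å
• Distances are imprecise: distance ranges
• A large number of distances → a precise structure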
NMR data 1: NOE
• For short mixing times, the NOE cross-peak intensity between two protons is proportional to $1/r^6$, where $r$ is their separation.
• $\mathrm{NOE} \propto \frac{1}{r^6}\, f(\tau_c)$
– For well-structured regions of a macromolecule, $f(\tau_c)$ can be considered constant (in practice this is assumed to hold for all parts of the molecule)
– Cross peaks are calibrated using a proton pair of known local geometry (distance)
– Because the relationship between NOE and distance rests on multiple simplifying assumptions, it is usually used only qualitatively (NOEs are classed in three bins: strong, medium and weak)
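The calibration step can be sketched in a few lines of Python. This is a minimal illustration that assumes the simple $1/r^6$ dependence with constant $f(\tau_c)$; the function name and the reference values (2.5 Å, intensity 100) are hypothetical and not taken from any particular program.

```python
def calibrate_distance(intensity, ref_intensity, ref_distance):
    """Estimate a distance from an NOE intensity, assuming intensity ~ 1/r**6.

    ref_intensity and ref_distance describe a proton pair of known,
    fixed geometry (hypothetical values in the example below).
    """
    if intensity <= 0 or ref_intensity <= 0:
        raise ValueError("intensities must be positive")
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


# Hypothetical reference pair: 2.5 Å apart, cross-peak intensity 100 (arbitrary units).
# A peak ten times weaker corresponds to a distance only ~1.47 times longer.
estimate = calibrate_distance(intensity=10.0, ref_intensity=100.0, ref_distance=2.5)
print(round(estimate, 2))
```

The sixth-root dependence is why large intensity errors translate into small distance errors, and also why the estimate is normally used only to pick a distance bin.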
Approaches to identifying NOEs
• 2D 1H-1H NOESY
• 3D or 4D 15N- or 13C-dispersed 1H-1H NOESY
Special NOESY experiments
• Filtered, edited NOE: based on selection of NOEs from two molecules with unique labeling patterns (e.g. labeled protein, unlabeled peptide). Only NOEs at the interface are observed.
• Transferred NOE: based on 1) faster build-up of NOEs in large versus small molecules; 2) fast exchange; 3) NOEs of bound state detected at resonance frequencies of free state. Only NOEs from the bound state are observed.
[Figure: ligand exchanging between free and bound states with rates kon and koff]
1H-1H distances from NOEs
[Figure: chain of residues A, B, C, D, ..., Z with the NOE classes below marked]
• Intra-residue
• Sequential
• Medium-range (helices)
• Long-range (tertiary structure)
Challenge is to assign all peaks in NOESY spectra
NMR data 1: NOE
• Conversion of NOEs into distances
– Strong: 1.8 - 2.7 Å
– Medium: 1.8 - 3.3 Å
– Weak: 1.8 - 5 Å
The lower bound is set by the van der Waals radii of the atoms
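A small Python sketch of the binning, using the three distance ranges listed above. Choosing the bin from an estimated distance is an assumption made here for illustration (in practice the class is usually picked from intensity thresholds), and `NOE_BOUNDS` and `classify_noe` are hypothetical names.

```python
# Distance bounds (Å) for the three NOE classes quoted on the slide.
NOE_BOUNDS = {
    "strong": (1.8, 2.7),
    "medium": (1.8, 3.3),
    "weak": (1.8, 5.0),
}


def classify_noe(distance_estimate):
    """Return (label, lower, upper) for the tightest class covering the estimate.

    Returns None when the estimate lies beyond the weak upper bound.
    """
    for label in ("strong", "medium", "weak"):
        lower, upper = NOE_BOUNDS[label]
        if distance_estimate <= upper:
            return label, lower, upper
    return None


print(classify_noe(3.0))  # falls in the medium bin
```

All three bins share the same lower bound, so a restraint only ever tightens the upper limit on the proton-proton distance.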
NOE pseudo-energy potential
• Generate “fake” energy potentials representing the cost of violating the distance or angle restraints. Here’s an example of a distance restraint potential:
$$
V_{NOE} =
\begin{cases}
K_{NOE}\,(r_{ij} - r_{ij}^{u})^2 & \text{if } r_{ij} > r_{ij}^{u} \\
0 & \text{if } r_{ij}^{l} \le r_{ij} \le r_{ij}^{u} \\
K_{NOE}\,(r_{ij} - r_{ij}^{l})^2 & \text{if } r_{ij} < r_{ij}^{l}
\end{cases}
$$
where $r_{ij}^{l}$ and $r_{ij}^{u}$ are the lower and upper bounds of our distance restraint, and $K_{NOE}$ is some chosen force constant, typically ~250 kcal mol$^{-1}$ nm$^{-2}$.
So it’s somewhat permissible to violate restraints, but it raises V.
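The flat-bottomed potential above can be written directly as a function. This is a minimal sketch: the function name `v_noe` is hypothetical, and distances are taken in nm so that they match the units of the quoted force constant.

```python
def v_noe(r, lower, upper, k_noe=250.0):
    """Flat-bottomed harmonic restraint energy (kcal/mol) for a distance r in nm.

    Zero inside [lower, upper]; quadratic in the size of the violation outside.
    """
    if r > upper:
        return k_noe * (r - upper) ** 2
    if r < lower:
        return k_noe * (r - lower) ** 2
    return 0.0


# A restraint of 0.18-0.33 nm violated by 0.1 nm costs about 2.5 kcal/mol.
print(v_noe(0.43, 0.18, 0.33))
```

Because the penalty grows with the square of the violation, small violations are tolerated while large ones dominate the total energy.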
NOE pseudo-energy potential
[Figure: $V_{NOE}$ plotted against $r_{ij}$; the potential is 0 between the lower and upper bounds and rises steeply with the degree of violation outside them]
The number of NOEs is more important
than the accuracy of individual NOEs
Structure calculation of protein G (56 aa) with increasing numbers of NOEs
Restraints and uncertainty
Large # of restraints = low
values of RMSD
Large # of restraints for
key hydrophobic side chains
Dealing with ambiguous restraints
• often not possible to tell which atoms are involved in a NOESY crosspeak, either
because of a lack of stereospecific assignments or because multiple protons have
the same chemical shift.
• sometimes an ambiguous restraint is included but is expressed ambiguously in the
restraint file, e.g. 3 HA --> 6 HB#, where the # wildcard indicates that the beta
protons of residue 6 are not stereospecifically assigned. This is quite commonly
done for stereochemical ambiguities.
• it is also possible to leave ambiguous restraints out and then try to resolve them
iteratively using multiple cycles of calculation. This is often done for restraints
that involve more complicated ambiguities, e.g. 3 HA-->10 HN, 43 HN, or 57
HN, where three amides all have the same shift.
• can also make stereospecific assignments iteratively using what are called floating
chirality methods.
Example of resolving an ambiguity
during structure calculation
[Figure: atom A resonates at 9.52 ppm; atoms B and C both resonate at 4.34 ppm. Range of inter-atomic distances observed in trial ensemble: A to B 9-11 Å, A to C 3-4 Å]
Due to resonance overlap between
atoms B and C, an NOE crosspeak
between 9.52 ppm and 4.34 ppm
could be A to C or A to B - this
restraint is ambiguous.
But if an ensemble generated with this
ambiguous restraint shows that A is never
close to B, then the restraint must be A to C.
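The reasoning in this example can be sketched as a simple filter over the trial ensemble. The function name, the 5 Å cutoff (taken from the usual NOE detection limit) and the distance values are illustrative assumptions, not the procedure of any specific program.

```python
def resolve_ambiguity(candidates, ensemble_distances, cutoff=5.0):
    """Keep candidate atom pairs that come within `cutoff` Å in at least one structure.

    ensemble_distances maps each candidate pair to the distances (Å)
    measured for that pair across the trial ensemble.
    """
    return [pair for pair in candidates if min(ensemble_distances[pair]) <= cutoff]


# Hypothetical distances echoing the example: A-B is always 9-11 Å, A-C 3-4 Å.
distances = {
    ("A", "B"): [9.0, 10.2, 11.0],
    ("A", "C"): [3.0, 3.4, 4.0],
}
print(resolve_ambiguity([("A", "B"), ("A", "C")], distances))
```

If more than one candidate survives the filter, the restraint stays ambiguous and is carried into the next calculation cycle.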
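Automated structure calculation
• Automated NOE assignment
– CANDID/CYANA
• Requires only the assigned chemical shift list and the NOE peak lists
• NOE assignment is performed automatically
• Automatically runs 7 structure calculation cycles (keeping 20 of 100 structures), progressively refining the assignments
• Correctness of the automated result depends on the completeness and correctness of the chemical shift assignments (at least >90%)
– SANE
• Automated NOE assignment that relies on an initial structure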
Practical improvements in
structure calculation
• Conventional approach relies on interactive assignment of NOEs:
very laborious
• ARIA: ambiguous restraints
– use all NOEs in a spectrum even when unassigned and allow automatic
assignment during successive structure calculation rounds
i.e. discarding NOEs that are inconsistent with emerging structure
• Combine with fully automated assignment procedures to arrive at
fully automated structure calculation
Iterative structure calculation with
assignment of ambiguous restraints
start with some set of
unambiguous NOEs and
calculate an ensemble
there are programs such as ARIA, with
automatic routines for iterative assignment
of ambiguous restraints. The key to success
is to make absolutely sure the restraints you
start with are right!
source: http://www.pasteur.fr/recherche/unites/Binfs/aria/
How many restraints to get a
high-resolution NMR structure?
• usually ~15-20 NOE distance restraints per residue, but the total # is
not as important as how many long-range restraints you have,
meaning long-range in the sequence: |i-j|> 5, where i and j are the two
residues involved
• good NMR structures usually have ≥ ~3.5 long-range distance
restraints per residue in the structured regions
• to get a very good quality structure, it is usually also necessary to have
some stereospecific assignments.
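A quick way to check a restraint list against this rule of thumb is sketched below. The restraint representation (pairs of residue numbers) and the function name are assumptions for illustration; the |i-j| > 5 criterion is the one given above.

```python
def long_range_per_residue(restraints, n_structured_residues, min_separation=5):
    """Count restraints with |i - j| > min_separation, per structured residue.

    restraints is a list of (i, j) residue-number pairs, one per NOE restraint.
    """
    long_range = [(i, j) for i, j in restraints if abs(i - j) > min_separation]
    return len(long_range) / n_structured_residues


# Hypothetical toy list: only (1, 10) and (3, 9) count as long-range.
toy_restraints = [(1, 2), (1, 10), (3, 9), (4, 8)]
print(long_range_per_residue(toy_restraints, n_structured_residues=4))
```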
NMR data 2: H-bonds
• Usually inferred from H2O/D2O exchange protection; hence it is not known a priori which groups form the H-bond. They are therefore used only during structure refinement, to improve convergence and the precision of the family of structures.
– significant impact on structure quality measures
Backbone Hydrogen Bonds
C=O
H-N
• NH chemical shift at low field (high ppm)
• Slow rate of NH exchange with solvent
• Characteristic pattern of NOEs
• (Scalar couplings across the H-bond)
When H-bonding atoms are known → can
impose a series of distance/angle constraints to
enforce standard H-bond geometries
NMR data 3: J couplings
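[Figure: Karplus curve of 3J(HN,Hα) (Hz, 0 to 10) versus φ (-180° to 180°), with labels for α, 3_10 and β conformations, and a Newman projection along the N-Cα bond]
$$^3J_{H^N H^\alpha} = 6.4\cos^2\theta - 1.4\cos\theta + 1.9, \qquad \theta = \phi - 60^\circ$$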
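The Karplus relation on this slide is easy to evaluate and to invert numerically. The sketch below assumes the coefficients 6.4, -1.4 and 1.9 and the offset θ = φ - 60° quoted above; the function names are hypothetical.

```python
import math


def karplus_hn_ha(phi_deg, a=6.4, b=-1.4, c=1.9):
    """3J(HN,Ha) in Hz for a backbone angle phi in degrees (theta = phi - 60)."""
    cos_theta = math.cos(math.radians(phi_deg - 60.0))
    return a * cos_theta ** 2 + b * cos_theta + c


def phi_from_j(j_obs, a=6.4, b=-1.4, c=1.9):
    """All phi values in [-180, 180) consistent with j_obs, rounded to 0.1 degree."""
    disc = b * b - 4.0 * a * (c - j_obs)
    if disc < 0:
        return []
    solutions = set()
    for sign in (1.0, -1.0):
        cos_theta = (-b + sign * math.sqrt(disc)) / (2.0 * a)
        if -1.0 <= cos_theta <= 1.0:
            theta = math.degrees(math.acos(cos_theta))
            for t in (theta, -theta):
                # Shift back from theta to phi and wrap into [-180, 180).
                solutions.add(round((t + 60.0 + 180.0) % 360.0 - 180.0, 1))
    return sorted(solutions)


print(karplus_hn_ha(-120.0))  # extended (beta-like) phi gives a large coupling
print(phi_from_j(6.0))        # an intermediate coupling matches four phi values
```

A coupling of 6 Hz is consistent with four different φ angles, which is why a single J value cannot pin down the dihedral angle on its own.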
Dihedral angles from
scalar couplings
[Figure: Karplus curve with several dihedral angles marked that all correspond to J = 6 Hz]
→ Must accommodate multiple solutions → multiple J values
But database shows few occupy higher energy conformations
Dihedral angle potential
• Convert J data into allowed dihedral angles and
introduce a restraining potential to maintain the
allowed angles
• Directly restrain against J-couplings
• $V = k_J\,(J_{obs} - J_{calc})^2$
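Restraining directly against the coupling can be sketched as follows. The back-calculation assumes a Karplus relation with the coefficients 6.4, -1.4 and 1.9 and θ = φ - 60°; the force constant of 1.0 and the function names are arbitrary choices for illustration.

```python
import math


def j_calc(phi_deg):
    """Back-calculate 3J(HN,Ha) in Hz from phi (degrees), assuming theta = phi - 60."""
    cos_theta = math.cos(math.radians(phi_deg - 60.0))
    return 6.4 * cos_theta ** 2 - 1.4 * cos_theta + 1.9


def v_j(phi_deg, j_obs, k_j=1.0):
    """Harmonic pseudo-energy V = k_J * (J_obs - J_calc)**2 for the current phi."""
    return k_j * (j_obs - j_calc(phi_deg)) ** 2


# With J_obs = 9.7 Hz, phi = -120 is penalty-free and phi = -60 is heavily penalized.
print(v_j(-120.0, 9.7), v_j(-60.0, 9.7))
```

Unlike a dihedral-angle restraint, this form never has to commit to one of the several angles compatible with the measured coupling.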
Orientational constraints from
residual dipolar couplings (RDC)
[Figure: inter-nuclear vector at an angle to the magnetic field H0; spectrum axes F1, F2, F3]
Reports angle of inter-nuclear vector relative to magnetic field H0
Requires medium to partially align molecules
Must accommodate multiple solutions → multiple orientations
Alignment tensor and RDC: $D^{AB}$
$$D^{AB}(\theta,\phi) = D_a^{AB}\left\{(3\cos^2\theta - 1) + \tfrac{3}{2}\,R\,\sin^2\theta\,\cos 2\phi\right\}$$
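The RDC expression can be evaluated for any orientation of the inter-nuclear vector in the alignment frame. This is a minimal sketch; the function name and the example values of the axial component (10 Hz) and rhombicity (0.2) are hypothetical.

```python
import math


def rdc(theta_deg, phi_deg, d_a, rhombicity):
    """Residual dipolar coupling for polar angles (theta, phi) in the alignment frame.

    d_a is the axial component of the alignment tensor (Hz), rhombicity is R.
    """
    theta = math.radians(theta_deg)
    phi = math.radians(phi_deg)
    axial = 3.0 * math.cos(theta) ** 2 - 1.0
    rhombic = 1.5 * rhombicity * math.sin(theta) ** 2 * math.cos(2.0 * phi)
    return d_a * (axial + rhombic)


# Two different orientations give the same coupling: theta and 180 - theta.
print(rdc(30.0, 40.0, d_a=10.0, rhombicity=0.2))
print(rdc(150.0, 40.0, d_a=10.0, rhombicity=0.2))
```

The identical values for the two orientations illustrate why a single RDC is compatible with multiple vector orientations.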
15N-1H dipolar couplings
[Figure: 15N-1H dipolar couplings plotted against residue number (0 to 100) in two alignment media: (A) 5% (w/v) DTDPC:DHPC (3:1), neutral; (a) + 3% CTAB, positive]
Structure refinement
• with NOEs: 7.3 ± 3.1 Å
• NOEs & RDC (A): 4.5 ± 2.1 Å
• NOEs & RDC (A) + (B): 3.4 ± 1.5 Å
Methods for structure calculation
• distance geometry (DG)
• restrained molecular dynamics (rMD)
• simulated annealing (SA)
• hybrid methods
Starting points for calculations
• to get the most unbiased, representative ensemble, it is wise to start the
calculations from a set of randomly generated starting structures.
• Alternatively, in some methods the same initial structure is used for each trial
structure calculation, but the calculation trajectory is pushed in a different initial
direction each time using a random-number generator.
DG--Distance geometry
• In distance geometry, one uses the NOE-derived distance restraints
to generate a distance matrix, which one then uses as a guide in
calculating a structure
• Structures calculated from distance geometry will produce the
correct overall fold but usually have poor local geometry (e.g.
improper bond angles, distances)
• Hence distance geometry must be combined with some extensive
energy minimization method to generate physically reasonable
structures
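Molecular dynamics simulation
• Simulates the random motion of the molecule so that it reaches the minimum of the energy V
• Restraints are converted into energy terms in MD
$$
V_{NOE} =
\begin{cases}
K_{NOE}\,(r_{ij} - r_{ij}^{u})^2 & \text{if } r_{ij} > r_{ij}^{u} \\
0 & \text{if } r_{ij}^{l} \le r_{ij} \le r_{ij}^{u} \\
K_{NOE}\,(r_{ij} - r_{ij}^{l})^2 & \text{if } r_{ij} < r_{ij}^{l}
\end{cases}
$$
$$V_{total} = V_{bond} + V_{angle} + V_{dihedr} + V_{vdW} + V_{coulomb} + V_{NMR}$$
• Simulated annealing: overcomes local energy minima
• Multiple structures are calculated, and several of the lower-energy structures are taken as the resulting ensemble (because the NMR information is incomplete and imprecise)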
Restrained molecular dynamics
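• Molecular dynamics involves computing the potential energy V as a function of the atomic coordinates (the forces on the atoms are its derivatives with respect to those coordinates). Usually V is defined as the sum of a number of terms:
$$V_{total} = V_{bond} + V_{angle} + V_{dihedr} + V_{vdW} + V_{coulomb} + V_{NMR}$$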
• the first five terms here are “real” energy terms corresponding to
such forces as van der Waals and electrostatic repulsions and
attractions, cost of deforming bond lengths and angles...these come
from some standard molecular force field like CHARMM or
AMBER
• the NMR restraints are incorporated into the VNMR term, which is a
“pseudoenergy” or “pseudopotential” term included to represent the
cost of violating the restraints
SA-Simulated annealing
• SA is essentially a special implementation of rMD and uses similar
potentials but employs raising the temperature of the system and
then slow cooling in order not to get trapped in local energy minima
• SA is very efficient at locating the global minimum of the target
function
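The heat-then-cool idea can be illustrated on a toy one-dimensional target function. Note the simplifications: this sketch uses Metropolis Monte Carlo moves rather than molecular dynamics, and the double-well `energy` function, the cooling schedule and all parameter values are hypothetical.

```python
import math
import random


def energy(x):
    """Toy target function: local minimum near x = +1, global minimum near x = -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def simulated_annealing(x0, t_start=2.0, t_end=0.01, cooling=0.995, step=0.5, seed=0):
    """Start hot, cool slowly, and return the lowest-energy point visited."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t_start
    while t > t_end:
        x_new = x + rng.uniform(-step, step)
        e_new = energy(x_new)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling
    return best_x, best_e


# Start in the local minimum at x = +1; the hot phase lets the walker cross the barrier.
best_x, best_e = simulated_annealing(x0=1.0)
print(best_x, best_e)
```

A plain downhill minimization started at x = +1 would stay in the local minimum; the high initial temperature is what allows the barrier near x = 0 to be crossed.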
Further refinements
• Refinement of structure including full force field
and e.g. explicit water molecules
– May improve structural quality but may also increase
experimental violations
NMR structure calculations
• Objective is to determine all conformations
consistent with the experimental data
• Programs that only do conformational search
lead to bad chemistry → use molecular force
fields to improve molecular properties
→ Some programs try to do both at once
→ Need a reasonable starting structure
• NMR data is not perfect: noise, incomplete data
→ multiple solutions (conformational ensemble)
NMR ensemble
• NMR methods do not calculate a single structure, but rather repeat structure
calculations many times to generate an ensemble of structures
• Structure calculations are designed to thoroughly explore all regions of
conformational space that satisfy the experimentally derived restraints
• At the same time, they often impose some physical reasonableness on the system,
such as bond angles, distances and proper stereochemistry.
• The ideal result is an ensemble which
A. satisfies all the experimental restraints (minimizes violations)
B. at the same time accurately represents the full permissible conformational
space under the restraints
C. looks like a real protein
NMR ensemble
The fact that NMR structures
are reported as ensembles gives
them a “fuzzy” appearance
which is both informative and
sometimes annoying
• Secondary structures well defined, loops variable
• Interiors well defined, surfaces more variable
• Trends the same for backbone and side chains
→ More dynamics at loops/surface
→ Constraints in all directions in the interior
Minimized average structure
• a minimized average is just that: a mean structure is calculated from the ensemble
and then subjected to energy minimization to restore reasonable geometry, which
is often lost in the calculation of a mean
• this is NMR’s way of generating a single representative structure from the data. It
is much easier to visualize structural features from a minimized average than from
the ensemble
• for highly disordered regions a minimized average will not be informative and
may even be misleading--such regions are sometimes left out of the minimized
average
• sometimes when an NMR structure is deposited in the PDB, there will be separate
entries for both the ensemble and the minimized average. It is nice when people
do this. Alternatively, a member of the ensemble may be identified which is
considered the most representative (often the one closest to the mean)
NMR structures include
hydrogen coordinates
• X-ray structures do not generally include hydrogen atoms in atomic
coordinate files, because the heavy atoms dominate the diffraction
pattern and the hydrogen atoms are not explicitly seen.
• By contrast, NMR restraints such as NOE distance restraints and
hydrogen bond restraints often explicitly include the positions of
hydrogen atoms. Therefore, these positions are reported in the PDB
coordinate files.
Assessing the quality of
NMR structures
• Number of experimental constraints
• RMSD of structural ensemble (subjective!)
• Violation of constraints- number, magnitude
• Molecular energies
• Comparison to known structures: PROCHECK
• Back-calculation of experimental parameters
Acceptance criteria: choosing
structures for an ensemble
• typical to generate 50 or more trial structures, but not all will converge to a final
structure that is physically reasonable or consistent with the experimentally
derived NMR restraints. We want to throw such structures away rather than
include them in our reported ensemble.
• these are typical acceptance criteria for including calculated structures in the
ensemble:
– no more than 1 NOE distance restraint violation greater than 0.4 Å
– no dihedral angle restraint violations greater than 5°
– no gross violations of reasonable molecular geometry
• sometimes structures are rejected on other grounds as well:
– too many residues with backbone angles in disfavored regions of Ramachandran space
– too high a final potential energy in the rMD calculation
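The first two acceptance criteria can be expressed as a simple filter. The dictionary layout used to describe each trial structure and the function name are assumptions for illustration; the thresholds are the ones listed above.

```python
def accept(structure, noe_cutoff=0.4, max_noe_violations=1, dihedral_cutoff=5.0):
    """Apply the typical acceptance criteria to one trial structure.

    structure["noe_violations"] lists distance violations in Å,
    structure["dihedral_violations"] lists angle violations in degrees.
    """
    big_noe = [v for v in structure["noe_violations"] if v > noe_cutoff]
    big_dihedral = [v for v in structure["dihedral_violations"] if v > dihedral_cutoff]
    return len(big_noe) <= max_noe_violations and not big_dihedral


# Hypothetical trial structures: only the first passes both criteria.
trials = [
    {"noe_violations": [0.1, 0.45], "dihedral_violations": [2.0]},
    {"noe_violations": [0.5, 0.6], "dihedral_violations": []},
    {"noe_violations": [], "dihedral_violations": [7.0]},
]
ensemble = [s for s in trials if accept(s)]
print(len(ensemble))
```

Geometry and energy checks would be added as further conditions in the same way.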
Precision of NMR Structures
(Resolution)
• judged by RMSD of superimposed ensemble of accepted structures
• RMSDs for both backbone (Cα, N, carbonyl C) and all heavy atoms (i.e.
everything except hydrogen) are typically reported, e.g.
bb 0.6 Å
heavy 1.4 Å
• sometimes only the more ordered regions are included in the reported
RMSD, e.g. for a 58 residue protein you will see RMSD (residues 5-58) if residues 1-4 are completely disordered.
Reporting ensemble RMSD
• Two major ways of calculating RMSD of the ensemble:
– pairwise: compute RMSDs for all possible pairs of structures in the ensemble,
and calculate the mean of these RMSDs
– from mean: calculate a mean structure from the ensemble and measure RMSD
of each ensemble structure from it, then calculate the mean of these RMSDs
– pairwise will generally give a slightly higher number, so be aware that these
two ways of reporting RMSD are not completely equal. Usually the Materials
and Methods, or a footnote somewhere in the paper, will indicate which is
being used.
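The two conventions can be compared on a toy ensemble. This sketch assumes the structures are already superimposed (no fitting step is performed), and the function names and coordinates are hypothetical.

```python
import math
from itertools import combinations


def rmsd(a, b):
    """RMSD between two structures given as equal-length lists of (x, y, z) tuples."""
    sq = sum((p - q) ** 2 for atom_a, atom_b in zip(a, b) for p, q in zip(atom_a, atom_b))
    return math.sqrt(sq / len(a))


def mean_structure(ensemble):
    """Coordinate-wise mean of pre-superimposed structures."""
    n = len(ensemble)
    return [
        tuple(sum(s[i][k] for s in ensemble) / n for k in range(3))
        for i in range(len(ensemble[0]))
    ]


def pairwise_rmsd(ensemble):
    """Mean RMSD over all pairs of structures."""
    pairs = list(combinations(ensemble, 2))
    return sum(rmsd(a, b) for a, b in pairs) / len(pairs)


def rmsd_from_mean(ensemble):
    """Mean RMSD of each structure from the mean structure."""
    mean = mean_structure(ensemble)
    return sum(rmsd(s, mean) for s in ensemble) / len(ensemble)


# Toy ensemble: three one-atom "structures" spread along the x axis.
toy = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)]]
print(pairwise_rmsd(toy), rmsd_from_mean(toy))
```

On this toy ensemble the pairwise value is larger than the value measured from the mean, in line with the caution above about comparing RMSDs reported under different conventions.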
Assessing structure quality
• run the ensemble through the program PROCHECK-NMR to assess
its quality
• high-resolution structure will have backbone RMSD ≤ ~0.8 Å, heavy
atom RMSD ≤ ~1.5 Å
• low RMS deviation from restraints (good agreement w/restraints)
• will have good stereochemical quality:
– ideally >90% of residues in core (most favorable) regions of Ramachandran plot
– very few “unusual” side chain angles and rotamers (as judged by those
commonly found in crystal structures)
– low deviations from idealized covalent geometry
Structural Statistics Tables
list of restraints, # and type
calculated energies
agreement of ensemble
structures with restraints
(RMS)
precision of structure
(RMSD)
sometimes also see listings of Ramachandran statistics,
deviations from ideal covalent geometry, etc.
Structure validation
XPLOR/CNS: Consistency with data?
convergence of structure calculation (eg rmsd over all atoms)
restraint violations?
Procheck: programme that analyses and evaluates a family of structures
i.e. is the structure consistent with what we know about structure ?
residue by residue output
covalent geometry
dihedral angles
non-bonded interaction
main chain H-bonds
stereochemistry
chirality
disulphide bonds
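Structure evaluation
• Energy
• Secondary structure
• Ramachandran plot
– As few amino acid residues as possible in disallowed regions
• RMSD (root-mean-square deviation)
– Indicates the degree of convergence of the structures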
Example of Procheck results
Cross validation
• Leaving out a percentage of experimental
constraints. Recalculating structures and
checking for consistency with unused data
– Can be done with the “same type of data”, e.g. NOEs
– More often used with NOEs and RDCs
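Structure calculation of Grx-C1
• CYANA: initial structure optimization
– Relatively simple force field
– Fast
– No starting structure required
– Resulting structures are relatively coarse
• Amber: structure refinement
– More detailed force-field parameters and a solvation model (or explicit solvent), giving more reasonable local conformations
– Requires a starting structure with the correct overall fold
– Computationally expensive and slow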
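Structure calculation of Grx-C1
• CANDID/CYANA to obtain the initial structure (fully automated)
– 2 CPUs, ~8 hours
• SANE-CYANA cycles for initial optimization (semi-automated)
– Manual analysis of violations and unassigned NOEs
– 2 CPUs, ~1 hour per cycle
– ~20-40 cycles
• SANE-AMBER cycles for structure refinement (semi-automated)
– 20 CPUs, ~15 hours per cycle
– ~10-30 cycles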
Structure calculation results
• PDB 1Z7P (ensemble), 1Z7R (mean) http://www.rcsb.org/pdb
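Structure calculation results
• Restraint statistics
– NOE: 4845
– Dihedral angles: 160
– Hydrogen bonds: 47
– Chirality: 287
• Violations
– Distance: none >0.2 Å
– Dihedral angle: none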
Structure evaluation
Ramachandran statistics
Most favored regions (%)            88.8
Additionally allowed regions (%)    10.7
Generously allowed regions (%)      0.5
Disallowed regions (%)              0.0
RMSD                    All residues    Regular secondary structure
Backbone heavy atoms    0.88            0.32
All heavy atoms         1.13            0.68
Software used
• NMRPipe: data processing
http://spin.niddk.nih.gov/bax/software/NMRPipe/
• NMRView: assignment and analysis
http://www.onemoonscientific.com/nmrview/
• CYANA: structure calculation, 500 Euro
http://www.las.jp/prod/cyana/eg/
• TALOS: backbone dihedral angle prediction from chemical shifts (part of NMRPipe)
http://spin.niddk.nih.gov/NMRPipe/talos/
• SANE: structure-based automated NOE assignment
J Biomol NMR, 2001 19(4) 321-9
• Amber: molecular dynamics simulation, used for structure refinement, $400
http://amber.scripps.edu/
• PROCHECK-NMR: structure analysis and evaluation
http://www.biochem.ucl.ac.uk/~roman/procheck_nmr/procheck_nmr.html
• MOLMOL: structure analysis and graphics
http://hugin.ethz.ch/wuthrich/software/molmol/
Other software
• Data processing
– Felix $???? http://www.accelrys.com/products/felix/index.html
– AZARA Free http://www.bio.cam.ac.uk/azara/
– PROSA (Free?) http://guentert.gsc.riken.go.jp/Software/Prosa.html
• Assignment and analysis
– Felix $???? http://www.accelrys.com/products/felix/index.html
– XEASY $ 200 http://hugin.ethz.ch/wuthrich/software/xeasy/index.html
– Sparky Free http://www.cgl.ucsf.edu/home/sparky/
– CARA Free http://www.nmr.ch
• Structure calculation
– CNS Free http://cns.csb.yale.edu/
– XPLOR Free http://xplor.csb.yale.edu/xplor/
– XPLOR-NIH Free http://nmr.cit.nih.gov/xplor-nih/
• Molecular graphics
– PyMol Free http://pymol.sourceforge.net/
– MolScript Free http://www.avatar.se/molscript/
– RasMol Free http://www.openrasmol.org/
– VMD Free http://www.ks.uiuc.edu/Research/vmd/