Multi-Scale Hierarchical Structure Prediction of Helical Transmembrane Proteins

Zhong Chen and Ying Xu

Department of Biochemistry and Molecular Biology and Institute of Bioinformatics

University of Georgia

Outline

1. Background information

2. Statistical analysis of known membrane protein structures

3. Structure prediction at residue level

4. Helix packing at atomistic level

5. Linking predictions at residue and atomistic levels

Membrane Proteins

• Roles in biological processes:

- Receptors;

- Channels, gates and pumps;

- Electric/chemical potential;

- Energy transduction.

• More than 50% of new drug targets are membrane proteins (MPs).

(Figure: example membrane proteins with helical structure and with beta structure.)

Membrane Proteins

20-30% of the genes in a genome encode MPs.

Fewer than 1% of the structures in the Protein Data Bank (PDB) are MPs, reflecting the difficulties in experimental structure determination.

Membrane Proteins

Prediction of transmembrane (TM) segments (α-helix or β-sheet) from sequence alone is very accurate (up to 95%).

Open question: prediction of the tertiary structure of the TM segments. How do these α-helices/β-sheets arrange themselves within the constraints of the lipid bilayer?

Helical structures are relatively easier to solve computationally

Membrane Protein Structures

• Difficult to solve experimentally

• Computational techniques could possibly play a significant role in solving MP structures, particularly helical structures

High Level Plan

• Statistical analysis of known structures:

- Unveil the underlying principles for MP structure and stability;

- Develop knowledge-based propensity scale and energy functions.

• Structure prediction at residue level

• Structure prediction at atomistic level: Monte Carlo (MC), molecular dynamics (MD)

→ Multi-scale, hierarchical computational framework

Part I: Statistical Analysis of Known Structures

Database for Known MP Structures: Helical Bundles

• Redundant database

- 50 PDB files

- 135 protein chains

• Non-redundant database (identity < 30%)

- 39 PDB files

- 95 protein chains (avg. length ~220 AA)

Bi-lipid Layer Chemistry

• Polar head group (glycerol, phosphate)

• Hydrophobic tail (fatty acid)

Statistics-based energy functions

• Length of bi-lipid layer: ~60 Å

- Central region (30 Å)

- Terminal regions

• Three energy terms:

- Lipid-facing potential

- Residue-depth potential

- Inter-helical interaction potential

(Figure: cross-section of the ~60 Å bilayer, showing a 30 Å central region between two terminal regions.)

Lipid-facing Propensity Scale

$$\mathrm{LF\_scale}(AA) = \frac{\text{fraction of } AA \text{ that are lipid-facing}}{\text{fraction of } AA \text{ that are in the interior}}$$

The most hydrophobic residues (ILE, VAL, LEU) prefer the surface of MPs in the central region but prefer interior positions in the terminal regions;

Small residues (GLY, ALA, CYS, THR) tend to be buried in the helix bundle;

Bulky residues (LYS, ARG, TRP, HIS) are likely to be found on the surface.

This propensity scale reflects both hydrophobic interactions and helix packing

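Table: lipid-facing propensity by residue. Column headers: Residue, Central, Termini. Values are listed in source order; the alignment between residues and values is not preserved.

Residues: ILE, VAL, SER, TRP, TYR, PRO, HIS, ASP, GLU, LEU, PHE, CYS, MET, ALA, GLY, THR, ASN, GLN, LYS, ARG

Values: 0.60, 1.61, 1.08, 0.93, 0.71, 0.44, 0.61, 0.51, 1.89, 1.04, 0.71, 1.97, 1.16 | 1.33, 1.30, 1.30, 1.38, 0.67, 0.80, 0.79, 1.01, 1.27, 1.56, 2.10, 1.02, 0.84, 0.79, 1.04, 1.11, 0.73, 1.44, 2.59, 1.42 | 0.84, 0.71, 0.89, 1.03, 0.37, 0.57, 0.69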

Helical Wheel and Moment Analysis

(Figure: helical wheel projection with thin per-residue vectors and the overall lipid-facing vector; X axis in Å, from -30 to 30.)

The magnitude of each thin vector is proportional to the LF propensity, and the overall lipid-facing vector is the sum of all thin vectors.

* Average prediction error: 41 degrees
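The vector sum used in the helical wheel analysis can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes an ideal α-helix with 100° of rotation per residue, and the propensities in `EXAMPLE_LF` are hypothetical placeholders rather than values from the propensity table. The helper `angular_error` is likewise an illustrative name.

```python
import math

# Hypothetical propensities for illustration only (one-letter residue codes).
EXAMPLE_LF = {"L": 1.3, "G": 0.6, "A": 0.8, "W": 1.5}


def lipid_facing_vector(sequence, lf_scale, step_deg=100.0):
    """Sum one thin vector per residue; return (angle in degrees, magnitude).

    Residue k points at angle k * step_deg on the helical wheel and its
    length is the lipid-facing propensity of that residue type.
    """
    x = y = 0.0
    for k, aa in enumerate(sequence):
        theta = math.radians(k * step_deg)
        weight = lf_scale[aa]
        x += weight * math.cos(theta)
        y += weight * math.sin(theta)
    return math.degrees(math.atan2(y, x)) % 360.0, math.hypot(x, y)


def angular_error(predicted_deg, reference_deg):
    """Smallest absolute difference between two directions, in degrees."""
    diff = abs(predicted_deg - reference_deg) % 360.0
    return min(diff, 360.0 - diff)
```

A helix built from a single residue type gives a near-zero moment once its residues cover the wheel evenly, so the predicted direction is driven entirely by how the propensities differ around the wheel.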

Lipid-facing vector prediction, state of the art:

- kPROT: avg. error ~41°

- Samatey scale: 61°

- Hydrophobicity scales: 65-68°

Residue-Depth Potential

$$V_{lp}(a, z) = -\ln \frac{N_{obs}(a, z)}{N_{exp}(a, z)}$$

(Figure: potential curves for LEU, TRP and GLU versus z (Å), for z from 0 to 40.)

- Hydrophobic residues tend to be located in the hydrocarbon core;

- Hydrophilic residues tend to be closer to the terminal regions;

- Aromatic residues prefer the interface region.
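A depth potential of this form, the negative logarithm of observed over expected counts, could be derived from a list of (residue, depth) observations roughly as follows. This is a sketch, not the authors' procedure: the slides do not say how the expected count is defined, so the code assumes it is the count expected if residue type and depth were independent, and the 5 Å bin width is an arbitrary choice.

```python
import math
from collections import Counter


def depth_potential(observations, bin_width=5.0):
    """Return {(residue, depth_bin): -ln(N_obs / N_exp)}.

    observations: iterable of (residue_name, z) with z in Å from the
    bilayer center. N_exp assumes residue type and depth are independent
    (an assumption made for this sketch).
    """
    n_obs = Counter()
    per_residue = Counter()
    per_bin = Counter()
    total = 0
    for residue, z in observations:
        depth_bin = int(abs(z) // bin_width)
        n_obs[(residue, depth_bin)] += 1
        per_residue[residue] += 1
        per_bin[depth_bin] += 1
        total += 1
    potential = {}
    for (residue, depth_bin), count in n_obs.items():
        n_exp = per_residue[residue] * per_bin[depth_bin] / total
        potential[(residue, depth_bin)] = -math.log(count / n_exp)
    return potential
```

Negative values mark depths where a residue type is found more often than expected; unobserved (residue, bin) combinations are simply absent from the result.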

TM Helix Tilt Angle Prediction

Major pVIII coat protein of the filamentous fd bacteriophage (1MZT)

(Figure: distribution over tilt angle θ, in degrees from 0 to 90.)

- Experimental value: 26 degrees

- GEM (global energy minimum) value: 23 degrees

Inter-Helical Pair-wise Potential

$$V_{pw}(i, j, r) = -\ln \frac{N_{obs}(i, j, r)}{N_{exp}(i, j, r)}$$

$$N_{exp}(i, j, r) = \frac{r}{r_{cutoff}} \, N_{obs}(i, j, r_{cutoff}), \qquad r_{cutoff} = 15.0\ \text{Å}$$

(Figure: pair-wise potential versus distance (Å), from 1 to 15, for the ILE-VAL, GLY-GLY and ARG-ARG pairs.)
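A pair-wise potential of this kind can be evaluated for one residue pair type as sketched below. The reading used here is an assumption: observed counts are taken to be cumulative (pairs separated by at most r), and the expected count is taken to scale linearly with r up to the 15 Å cutoff. The function name and input format are illustrative only.

```python
import math


def pairwise_potential(distances, r, r_cutoff=15.0):
    """Return -ln(N_obs(r) / N_exp(r)) for one residue pair type.

    distances: inter-helical distances (Å) observed for the pair type.
    N_obs(r) counts pairs with separation <= r (assumed cumulative), and
    N_exp(r) = (r / r_cutoff) * N_obs(r_cutoff) (assumed linear scaling).
    """
    n_cutoff = sum(1 for d in distances if d <= r_cutoff)
    n_obs = sum(1 for d in distances if d <= r)
    if n_obs == 0:
        return math.inf  # pair never observed this close
    n_exp = (r / r_cutoff) * n_cutoff
    return -math.log(n_obs / n_exp)
```

By construction the potential is zero at the cutoff, and it becomes negative wherever a pair type is found in contact more often than the linear reference predicts.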

Statistical energy potentials (summary)

1. Three residue-based statistical potentials were derived from the database: (a) lipid-facing propensity, (b) residue-depth potential, (c) inter-helical pair-wise potential;

2. The lipid-facing scale predicted the lipid-facing direction for a single helix with an uncertainty of about ±40°;

3. The residue-depth potential predicted the tilt angle for a single helix with high accuracy (23° predicted versus 26° experimental for 1MZT);

4. More data are needed to make the inter-helical pair-wise potential more reliable.

Part II: Structure Prediction at Residue Level

Key Prediction Steps

• Structure prediction by optimizing our statistical potential (a weighted sum of the three terms);

• Idealized and rigid helical backbone configurations;

• Monte Carlo moves: translations, rotations, and rotation about the helix axis;

• Wang-Landau sampling technique for the MC simulation;

• Principal component analysis.
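Of the Monte Carlo moves listed above, rotation of a rigid helix about its own axis is the least obvious to implement. One common way to do it is Rodrigues' rotation formula, sketched here; the function names and the coordinate representation (lists of three floats) are assumptions for illustration, not taken from the authors' code.

```python
import math


def rotate_about_axis(point, origin, axis, angle_deg):
    """Rotate `point` about the line through `origin` along `axis`."""
    norm = math.sqrt(sum(a * a for a in axis))
    k = [a / norm for a in axis]                  # unit axis vector
    v = [p - o for p, o in zip(point, origin)]    # point relative to the axis
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    k_dot_v = sum(ki * vi for ki, vi in zip(k, v))
    k_cross_v = [k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0]]
    # Rodrigues: v*cos + (k x v)*sin + k*(k.v)*(1 - cos)
    rotated = [v[i] * c + k_cross_v[i] * s + k[i] * k_dot_v * (1.0 - c)
               for i in range(3)]
    return [r + o for r, o in zip(rotated, origin)]


def rotate_helix(coords, axis_start, axis_end, angle_deg):
    """Apply the same rigid rotation to every coordinate of one helix."""
    axis = [e - s for s, e in zip(axis_start, axis_end)]
    return [rotate_about_axis(p, axis_start, axis, angle_deg) for p in coords]
```

Because the helix is treated as rigid, a move of this kind changes only which face of the helix points toward its neighbors and toward the lipid, leaving its position and tilt unchanged.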

Wang-Landau Method for MC

Observation: if a random walk in energy space is performed with probability proportional to the reciprocal of the density of states,

$$p(E) \propto \frac{1}{g(E)},$$

then a flat energy histogram is obtained.

The density of states is not known a priori.

In Wang-Landau, g(E) is initially set to 1 and modified “on the fly”. Monte Carlo moves are accepted with probability

$$p(E_1 \to E_2) = \min\left[\frac{g(E_1)}{g(E_2)},\ 1\right]$$

Each time an energy level E is visited, its density of states is updated by a modification factor f > 1, i.e.,

$$g(E) \to g(E) \cdot f$$

Wang-Landau Method for MC

Advantages:

1. Simple formulation and general applicability;

2. Entropy and free energy information derivable from g(E);

3. Each energy state is visited with equal probability, so energy barriers are overcome with relative ease.
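The Wang-Landau update rules can be tried on a toy system whose density of states is known exactly. The sketch below is not the helix-packing code: it assumes a hypothetical model of `n_spins` independent two-state spins whose energy is the number of "up" spins, so that g(E) is a binomial coefficient. The histogram flatness test and the f → √f refinement schedule are the standard choices for this algorithm and are not stated on the slides.

```python
import math
import random


def wang_landau(n_spins=8, ln_f_final=1e-4, flatness=0.8, seed=0):
    """Estimate ln g(E) - ln g(0) for the toy spin model."""
    rng = random.Random(seed)
    spins = [0] * n_spins
    energy = 0                       # energy = number of "up" spins
    ln_g = [0.0] * (n_spins + 1)     # g(E) = 1 initially, stored as ln g
    ln_f = 1.0                       # modification factor f = e
    while ln_f > ln_f_final:
        hist = [0] * (n_spins + 1)
        flat = False
        while not flat:
            for _ in range(1000):
                k = rng.randrange(n_spins)
                new_energy = energy + (1 if spins[k] == 0 else -1)
                # accept with probability min[g(E1) / g(E2), 1]
                delta = ln_g[energy] - ln_g[new_energy]
                if delta >= 0.0 or rng.random() < math.exp(delta):
                    spins[k] ^= 1
                    energy = new_energy
                ln_g[energy] += ln_f     # g(E) -> g(E) * f
                hist[energy] += 1
            flat = min(hist) >= flatness * sum(hist) / len(hist)
        ln_f /= 2.0                      # f -> sqrt(f), then reset histogram
    return [value - ln_g[0] for value in ln_g]
```

Working with ln g rather than g avoids overflow, since g(E) can span many orders of magnitude. For this toy model the estimate can be compared directly with ln C(n, E).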

Principal Component Analysis

Purpose:

- analyze the conformation variations during a simulation, and

- identify the most important conformational degrees of freedom.

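Covariance matrix:

$$C_{ij} = \left\langle (x_i - \langle x_i \rangle)(x_j - \langle x_j \rangle) \right\rangle$$

where the $x_i$ are the conformational coordinates and $\langle \cdot \rangle$ denotes an average over the simulation.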

* A large part of the system’s fluctuations can be described in terms of only a few PCA eigenvectors.

(Figure: bar chart over PCA eigenvector index, 0 to 13, with a vertical scale from 0 to 80.)
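The PCA step can be illustrated with a small pure-Python sketch that builds the covariance matrix of a set of conformations and extracts its leading eigenvector. Power iteration is used here in place of a full eigendecomposition to keep the example dependency-free; it is a stand-in, not the method named on the slides, and the function names and flat per-frame coordinate layout are assumptions.

```python
import math


def covariance(frames):
    """Covariance matrix of a list of equal-length coordinate vectors."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[i] for f in frames) / n for i in range(d)]
    return [[sum((f[i] - mean[i]) * (f[j] - mean[j]) for f in frames) / n
             for j in range(d)] for i in range(d)]


def leading_component(cov, iterations=500):
    """Return (eigenvalue, unit eigenvector) of the largest eigenvalue.

    Uses power iteration; the start vector is deliberately non-uniform,
    but the method still fails if it is exactly orthogonal to the answer.
    """
    d = len(cov)
    v = [1.0 / (i + 1) for i in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    cv = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * cv[i] for i in range(d))
    return eigenvalue, v


def explained_fraction(cov, eigenvalue):
    """Share of the total variance (trace of cov) carried by one component."""
    return eigenvalue / sum(cov[i][i] for i in range(len(cov)))
```

When a few components carry most of the trace, projecting conformations onto them gives low-dimensional coordinates such as PC1.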

A Model System: Glycophorin (GpA) Dimer

22 residues, 189 atoms

EITLIIFGVMAGVIGTILLISY

• GxxxG motif (GVMAG)

• Ridges-into-grooves

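Glycophorin (GpA) Dimer (1AFO)

(Figure: energy (kcal/mol) versus RMSD (Å) for the sampled structures, with structures A and B marked, and structural overlays of A and B. RED: experiment; GREY: simulation.)

A: GEM (global energy minimum), RMSD = 3.6 Å, E = -114.6 kcal/mol

B: LEM (local energy minimum), RMSD = 0.8 Å, E = -93.9 kcal/mol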

Helices A and B of Bacteriorhodopsin (1QHJ)

(Figure: energy (kcal/mol) versus RMSD (Å) for the sampled structures, with structures A and B marked, and structural overlays of A and B. RED: experiment; GREY: simulation.)

A: GEM, RMSD = 2.7 Å, E = -94 kcal/mol

B: LEM, RMSD = 0.9 Å, E = -86 kcal/mol

Bacteriorhodopsin (1QHJ)

(Figure: energy versus RMSD (Å), from 0 to 20, with a marked structure at RMSD = 5.0 Å; the computational prediction and the experimental structure are shown with helices labeled A to G.)

Residue-level structure prediction (Summary)

1. A computational scheme was established for TM helix structure prediction at residue level;

2. For two-helix systems, LEM structures very close to native structures (RMSD < 1.0 Å) were consistently predicted;

3. For a seven-helix bundle, a packing topology within 5.0 Å of the crystal structure was identified as one of the LEMs.

Part III: Structure Prediction at Atomistic Level

Key Prediction Steps

• Structure prediction by optimizing an atom-level energy potential:

- CHARMM19 force field for helix-helix interactions

- Knowledge-based energy function for lipid-helix interactions

• Idealized, rigid helix backbone with flexible side chains;

• Helix orientation constraint (i.e., N-terminus inside/outside the cell);

• MC moves: translations, rotations, rotation about the helix axis, and side-chain torsional rotation;

• Wang-Landau algorithm for the MC simulation

CHARMM19 Polar Hydrogen Force Field

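- Nonpolar hydrogen atoms are combined with the heavy atoms they are bound to;

- Polar hydrogen atoms are modeled explicitly.

$$V = V_{vdw} + V_{es} + V_{lp}$$

$$V_{vdw} = \sum_i \sum_j \varepsilon_{ij} \left[ \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{12} - 2 \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{6} \right]$$

$$V_{es} = \sum_i \sum_j \frac{q_i q_j}{4 \pi \varepsilon_0 D r_{ij}}$$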
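The van der Waals and electrostatic terms of a force field of this type can be evaluated between two rigid helices as sketched below. This is a simplified illustration, not CHARMM19 itself: it assumes a single well depth `eps` and minimum-energy distance `sigma` for every atom pair instead of per-type parameters, represents each atom as a hypothetical `(x, y, z, charge)` tuple, and uses an approximate conversion constant for 1/(4πε0).

```python
import math

# Approximate 1/(4*pi*eps0) in kcal*Å/(mol*e^2); assumed value for this sketch.
COULOMB = 332.06


def vdw_energy(r, eps, sigma):
    """eps * [(sigma/r)^12 - 2 (sigma/r)^6]; the minimum, -eps, is at r = sigma."""
    s6 = (sigma / r) ** 6
    return eps * (s6 * s6 - 2.0 * s6)


def electrostatic_energy(r, qi, qj, dielectric=1.0):
    """Coulomb energy q_i q_j / (4 pi eps0 D r) in kcal/mol."""
    return COULOMB * qi * qj / (dielectric * r)


def helix_helix_energy(helix_a, helix_b, eps, sigma, dielectric=1.0):
    """Sum both terms over all atom pairs between two helices."""
    total = 0.0
    for xa, ya, za, qa in helix_a:
        for xb, yb, zb, qb in helix_b:
            r = math.dist((xa, ya, za), (xb, yb, zb))
            total += vdw_energy(r, eps, sigma)
            total += electrostatic_energy(r, qa, qb, dielectric)
    return total
```

With the factor of 2 in the attractive term, `sigma` is the separation at which the pair energy is lowest, which is why `vdw_energy(sigma, eps, sigma)` returns `-eps`.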

2D Wang-Landau Sampling in PC1 and E Spaces

$$P(PC1) \propto \sum_E g(E, PC1) \exp(-E / kT)$$

(Figure: P(PC1) on a logarithmic scale, from 1.0E-08 to 1.0E+00, versus PC1 (Å), from -14 to 10, at 150 K and 300 K; features labeled A to F, with LEM1 and LEM2 marked.)
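Once a two-dimensional density of states g(E, PC1) is available, the distribution over PC1 at any temperature follows from Boltzmann reweighting, summing g(E, PC1) exp(-E/kT) over E. The sketch below assumes ln g is stored in a dictionary keyed by hypothetical `(energy, pc1_bin)` pairs, with energies in kcal/mol; the data layout and function name are illustrative.

```python
import math

K_B = 0.0019872  # Boltzmann constant in kcal/(mol*K)


def pc1_distribution(ln_g, temperature):
    """Return {pc1_bin: probability} at the given temperature (K).

    ln_g maps (energy, pc1_bin) to ln g(E, PC1). Weights are shifted by
    their maximum before exponentiation to avoid overflow.
    """
    kt = K_B * temperature
    log_weight = {key: value - key[0] / kt for key, value in ln_g.items()}
    shift = max(log_weight.values())
    weights = {}
    for (_, pc1_bin), lw in log_weight.items():
        weights[pc1_bin] = weights.get(pc1_bin, 0.0) + math.exp(lw - shift)
    norm = sum(weights.values())
    return {pc1_bin: w / norm for pc1_bin, w in weights.items()}
```

Because g(E, PC1) does not depend on temperature, a single Wang-Landau run can be reweighted to several temperatures, such as the 150 K and 300 K curves, without resampling.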

Effect of Helix-Lipid Interactions: Helices A&B of Bacteriorhodopsin

(Figure, two panels: distributions over RMSD (Å), from 0 to 10, at 150 K, 306 K and 524 K. One panel uses helix-helix interactions only; the other uses helix-helix and helix-lipid interactions.)

Helix-lipid interactions play a critical role in the correct packing of helices

Effect of Helix-Lipid Interactions: Helix A&B of Bacteriorhodopsin (BR)

(Figure: four LEM structures placed relative to the 30 Å hydrocarbon core region, with RMSD = 0.2 Å, 4.4 Å, 5.7 Å and 7.1 Å.)

All four LEM structures share essentially the same contact surfaces.

In the native structure, the polar N-termini of both helices are located outside the hydrocarbon core region, resulting in low helix-lipid energy.

Docking of a Seven-helix Bundle: Bacteriorhodopsin (1QHJ)

7 helices, 174 residues, 1619 atoms

• CHARMM19 + lipid-helix potential;

• One month of CPU time on one PC

(Figure panels: crystal structure; initial configuration.)

Potential Energy Landscape

(Figure: potential energy landscape with minima at RMSD = 3.0 Å, 8.0 Å, 8.4 Å, 4.7 Å and 6.6 Å.)

Global Energy Minimum Structure (RMSD = 3.0 Å)

RED: experiment

GREY: simulation

Atom-level Structure Prediction (Summary)

1. Wang-Landau algorithm proved to be effective for the energetics study of TM helix packing;

2. Prediction results for two-helix and seven-helix structures are highly promising (for the seven-helix bundle, the GEM is within 3.0 Å RMSD of the crystal structure);

3. Practical application of the Wang-Landau method to large systems requires further work.

Part IV: Linking Predictions at Residue and Atomistic Levels

Correspondence between simulations at two levels

(Figure: per-residue values from the CHARMM19 and knowledge-based potentials plotted against residue number, from 0 to 20.)

A multi-scale hierarchical modeling approach is feasible and practical:

• LEMs identified at the residue level can be used as candidates for atomistic simulation;

• PC vectors from the residue-level simulation can be used to improve search speed in the atomistic simulation.

Future Work

1. Further improvement of the residue-based folding potentials;

2. Speed-up and parallelization of Wang-Landau sampling;

3. Construct a hierarchical computational framework, and develop corresponding software package.

Acknowledgements

1. Funding from NSF/DBI, NSF/ITR, NIH, and Georgia Cancer Coalition

2. Dr. David Landau (Wang-Landau algorithm) and Dr. Jim Prestegard (NMR data generation) of UGA

3. Thanks to DIMACS for the invitation to speak here
