CSBSI_2010_ProteinStructureLab

Iowa State
University
Bioinformatics and Computational Biology
Graduate Program
Protein Structure Lab
Michael Zimmermann
Ataur Katebi
Ragothaman Yennamalli
CSBSI Short Course, June, 2010
Structures and Bioinformatics
Detailed genetic
information informs
organism wide views
Today’s Plan
1. What are molecular structures?
• Primary, Secondary, Tertiary, Quaternary Structure
• Why we need them
2. Where do we get them?
• PDB, NDB, and EMDB
• Homology modeling
3. How do they interact?
• DIP and Docking
4. How do we know what they do?
• Genome annotation (what you’ve been doing)
• Molecular motions
I. Molecular Dynamics
II. Normal Mode Analysis (Elastic Networks)
What Are Molecular Structures?
(and why are they important?)
Central Dogma
CGACGGGGACGA
CGGGGACCATTT
GCUGCCCCUGCU
GCCCCUGGUAAA
AAPAAPGK
DNA → RNA → Protein
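The DNA → RNA → Protein example on this slide can be reproduced with a few lines of Python. This is a minimal sketch, assuming the DNA shown is the template strand written 3'→5', so that the mRNA is its base-by-base complement, and assuming translation starts at the first base. The function names are illustrative and not part of the original slides.

```python
# Standard genetic code, with codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Template DNA base -> complementary RNA base.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_dna):
    """Pair each template base with its RNA complement (assumes a 3'->5' template)."""
    return "".join(DNA_TO_RNA[base] for base in template_dna)


def translate(mrna):
    """Read codons from the first base until the end or a stop codon."""
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[start:start + 3]]
        if residue == "*":  # stop codon
            break
        protein.append(residue)
    return "".join(protein)


template = "CGACGGGGACGA" + "CGGGGACCATTT"  # the two DNA lines from the slide
mrna = transcribe(template)
print(mrna)             # GCUGCCCCUGCUGCCCCUGGUAAA
print(translate(mrna))  # AAPAAPGK
```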
Protein secondary structure elements (1arl)
• (H) α-helices
• (E) β-sheets
• (C) Coils
• Molecules are too small to see
• Artistic depictions are informative
Size and Scale
http://learn.genetics.utah.edu/content/begin/cells/scale/
Protein Structure
α Helix
Parallel β sheet
Antiparallel β sheet
Diverse Tertiary Structures
Importance of the problem
• # of sequences >> # of structures
• Secondary structure may be used as an input for tertiary structure prediction
• The 1D problem is easier than the 3D problem
Scale of Sequence Versus Structure
How do we get them?
Databases or
Structure Prediction
Assignments of secondary structure
• Crystallographers assign (subjective)
• Automatic assignments from the PDB coordinates
– Dictionary of Secondary Structure of Proteins (DSSP)
– Kabsch and Sander 1983, based on positions of hydrogen bonds
• STRIDE assignments
DSSP assignments
1. (H) α helix
2. (E) Strand
3. (G) 3-10 helix
4. (I) π helix
5. (B) Bridge (single-residue strand)
6. (T) Turn
7. (S) Bend
8. (C) Coil
Some ambiguity
• Various translations of 8 DSSP states
into 3 secondary structure states
• Two versions of DSSP
– EMBL (Heidelberg) version
• Includes interchain hydrogen bonds
– PDB version
• Excludes interchain hydrogen bonds
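To make the ambiguity concrete, the sketch below reduces an 8-state DSSP string to 3 states under two different mappings. Both mappings are assumptions chosen for illustration (a permissive one that folds G, I into helix and B into strand, and a stricter one that keeps only H and E); the slides do not say which translation a given method uses, and the example string is made up.

```python
# Permissive reduction: every helix type -> H, strand and bridge -> E, rest -> C.
PERMISSIVE = {"H": "H", "G": "H", "I": "H",
              "E": "E", "B": "E",
              "T": "C", "S": "C", "C": "C"}

# Stricter reduction: only H stays helix and only E stays strand.
STRICT = dict(PERMISSIVE, G="C", I="C", B="C")


def reduce_dssp(states, mapping=PERMISSIVE):
    """Translate an 8-state DSSP string into 3 states (H, E, C).

    Symbols missing from the mapping (for example a blank or '-') become coil.
    """
    return "".join(mapping.get(state, "C") for state in states)


example = "CCHHHHGGGTTEEEEBSC"  # hypothetical DSSP assignment
print(reduce_dssp(example))          # CCHHHHHHHCCEEEEECC
print(reduce_dssp(example, STRICT))  # CCHHHHCCCCCEEEECCC
```

The same structure yields different 3-state strings depending on the mapping, which is why reported accuracies are only comparable when the reduction is stated.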
Improvement of prediction by using
multiple sequence alignments
• Zvelebil et al 1987
• Levin, Pascarella, Argos & Garnier 1993
• Rost & Sander 1993
• Accuracy of prediction based on single
sequences ~ 65%
• Accuracy of prediction using multiple
sequence alignments ~ 75% (for the most
successful methods)
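Accuracy figures like these are usually per-residue percentages over three states (often called Q3). The sketch below assumes that definition; the slides do not state it explicitly, and the two strings are invented for illustration.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted 3-state label matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("empty input")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


observed = "CCHHHHHHCCEEEECC"   # hypothetical assignment from a known structure
predicted = "CCHHHHCCCCEEECCC"  # hypothetical prediction
print(q3_accuracy(predicted, observed))  # 81.25
```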
New improved algorithm (GOR V)
Kloczkowski, Ting, Jernigan & Garnier
• New database of 513 non-redundant sequences
proposed by Cuff and Barton
• Additional statistics of triplets
• Resizable window (size of the window is adjusted to
the length of the sequence)
• Optimization of parameters
– Decision parameters to increase the accuracy of
prediction for β-sheets
• Multiple sequence alignments PSI-BLAST (FASTA +
CLUSTAL in an early version)
GOR V
>gi|42572793|ref|NP_974493.1| myb family transcription factor [Arabidopsis thaliana]
MDNHRRTKQPKTNSIVTSSSEVSSLEWEVV
SQEEEDLVSRMHKLVGDRWELIAGRIPGRT
AGEIERFWVMKN
GOR V server
http://gor.bb.iastate.edu/
References
• A. Kloczkowski, K.-L. Ting, R.L. Jernigan and J. Garnier. Protein secondary structure prediction based on the GOR algorithm incorporating multiple sequence alignment. Polymer, 2002, 43, 441-449
• A. Kloczkowski, K.-L. Ting, R.L. Jernigan and J. Garnier. Combining GOR V algorithm with evolutionary information for protein secondary structure prediction from amino acid sequence. Proteins: Structure, Function, and Genetics, 2002, 49, 154-166
Other methods
• PSIPRED (Neural Network)
http://bioinf.cs.ucl.ac.uk/psipred/psiform.html
• PHD (Neural Network)
http://cubic.bioc.columbia.edu/predictprotein/
• JPRED (Neural Network)
http://www.compbio.dundee.ac.uk/~wwwjpred/submit.html
• SAM-T99 (Hidden Markov Models)
http://www.cse.ucsc.edu/research/compbio/HMMapps/T99-query.html
• META servers
http://cubic.bioc.columbia.edu/predictprotein/submit_meta.html
» compare with actual structure
» problem of turning into 3D structure
Retrieving, Viewing,
and Analyzing
Molecular Structure
Files
Where to get Molecular Files
• http://www.rcsb.org/
• http://ndbserver.rutgers.edu
• http://www.emdatabank.org/
Molecule Files
• The Protein DataBank (PDB) file 1T3R

Record  Atom#  AtomType  Residue  ChainID  Residue#  X       Y       Z       Occupancy  B-Factor  Element
ATOM    8      N         GLN      A        2         25.279  22.419  34.914  1.00       21.01     N
ATOM    9      CA        GLN      A        2         23.872  22.620  34.516  1.00       17.82     C
ATOM    10     C         GLN      A        2         23.654  24.078  34.247  1.00       18.11     C
ATOM    11     O         GLN      A        2         23.996  24.956  35.114  1.00       20.40     O
ATOM    12     CB        GLN      A        2         22.926  22.138  35.611  1.00       19.10     C
ATOM    13     CG        GLN      A        2         21.447  22.401  35.328  1.00       18.52     C
ATOM    14     CD        GLN      A        2         20.558  21.549  36.121  1.00       21.32     C
ATOM    15     OE1       GLN      A        2         20.145  20.502  35.662  1.00       22.49     O
ATOM    16     NE2       GLN      A        2         20.336  21.926  37.380  1.00       21.05     N
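PDB ATOM records are fixed-width, so fields are best read by column position rather than by splitting on whitespace. The sketch below assumes the standard PDB column layout; `format_atom` is a hypothetical helper used only to build correctly aligned sample lines from the first two atoms shown on the slide, and `parse_atom` is an illustrative name, not a library function.

```python
import math


def format_atom(serial, name, res, chain, resseq, x, y, z, occ, bfac, elem):
    """Hypothetical helper: lay out one atom in PDB fixed-width columns.

    Handles atom names of up to three characters, which start in column 14.
    """
    return (f"ATOM  {serial:>5}  {name:<3} {res:>3} {chain}{resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{bfac:6.2f}          {elem:>2}")


def parse_atom(line):
    """Slice one ATOM record by column (0-based Python slices)."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
        "element": line[76:78].strip(),
    }


lines = [
    format_atom(8, "N", "GLN", "A", 2, 25.279, 22.419, 34.914, 1.00, 21.01, "N"),
    format_atom(9, "CA", "GLN", "A", 2, 23.872, 22.620, 34.516, 1.00, 17.82, "C"),
]
atoms = [parse_atom(line) for line in lines if line.startswith("ATOM")]

# Distance between the backbone N and CA atoms, in angstroms.
bond = math.dist(atoms[0]["xyz"], atoms[1]["xyz"])
print(atoms[0]["name"], atoms[1]["name"], round(bond, 3))
```

The N-CA distance comes out close to 1.48 Å, a typical covalent bond length, which is a quick sanity check that the X, Y and Z columns were read in the right order.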
• sdf
• mol2 (MOL2: SYBYL Tripos format)
• SMILES (convert to 3D with CORINA)
Molecular Visualization
• UIUC
• UCSF
• DeLano Scientific and Schrödinger
Homology
Modeling
Homology Modeling
• Use when sequence identity is > 35%
• 1233 known topologies (CATH)
• ≈70% of protein sequences (~50,000,000)
1. Template selection
2. Sequence-to-structure alignment
3. Model building
4. Model selection and refinement
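The 35% rule of thumb can be checked directly once a target has been aligned to a candidate template. The sketch below counts identical residues over aligned, non-gap columns; this denominator is an assumption (other definitions divide by the shorter sequence or the full alignment length). The target fragment is the start of the Arabidopsis sequence used earlier in these slides, and the template row is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


def suitable_for_homology_modeling(aligned_target, aligned_template, threshold=35.0):
    """Apply the slide's rule of thumb: sequence identity above 35%."""
    return percent_identity(aligned_target, aligned_template) > threshold


target = "MDNHRRTKQP-KTNSIV"    # fragment of the target, with one alignment gap
template = "MDNHKRSKQPAKSNAIV"  # hypothetical template row of the alignment
print(percent_identity(target, template))                # 75.0
print(suitable_for_homology_modeling(target, template))  # True
```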
Protein Machines
• Most biochemical processes taking place
in vivo are controlled by proteins:
– gene expression and regulation (nuclear
receptors)
– metabolic pathways (enzymes)
– immune system (antibodies)
– signal transduction (trans-membrane receptors)
– structural (collagen)
• Fully automated
• Highly specific
Classical Structure Determination
• Protein structures are solved mostly by:
– X-ray crystallography (or SAXS)
– NMR spectroscopy
– Cryo-EM
• All methods require a lot of human input
from highly trained specialists.
• Time-consuming
• $10,000 - $1,000,000 for one structure.
Homology Modeling
Template Detection
• Sequence-only methods:
– BLAST or FASTA scan against the PDB database.
– PSI-BLAST scan against a sequence database.
• Profile comparison:
– Profile-to-profile alignment against a structural database.
• Threading:
– Optimal fitting of the modeled sequence to structures from the PDB.
• Metaservers:
– Combination of all of the above (and others).
Modeling
• Template is used as a rigid scaffold.
• Modeling algorithm rebuilds missing parts (loops).
• Template is used as a semi-flexible scaffold.
• Usually a great number of models are generated.
• Modeller (A. Sali), Rosetta (D. Baker), CABS (A. Kolinski), UnRes (H. Scheraga), I-TASSER (Y. Zhang)
Homology Modeling Example
See “Homology Modeling.pdf”
How do they interact?
DIP: http://dip.doe-mbi.ucla.edu/dip/Main.cgi

ORGANISM                                  PROTEINS  INTERACTIONS  EXPERIMENTS
Drosophila melanogaster (fruit fly)       7482      22881         23178
Saccharomyces cerevisiae (baker's yeast)  4943      18440         23034
Escherichia coli                          1863      7447          8884
Caenorhabditis elegans                    2650      4043          4090
Homo sapiens (Human)                      1476      2292          3438
Helicobacter pylori                       712       1428          1430
Mus musculus (house mouse)                502       683           917
Rattus norvegicus (Norway rat)            163       215           315
Others (266)                              2098
An Introduction to
Docking
Outline
Introduction to DOCKING
Protein-protein docking
Protein-ligand docking
Protein-ligand Docking – “Hands-on”
What is docking
Prediction of the optimal physical configuration and interaction energy
of two molecules
A docking procedure:
1. Finds the orientation that maximizes the interaction
2. Searches for the minimum-energy conformation
3. Predicts structural rearrangement
Why docking?
• Predicting biomolecular interactions
• Computer-aided analysis saves time
• Automated prediction of molecular interactions is the key to rational drug design
• Measuring the relative strength of interactions in a cluster of interacting proteins
• Drug design: virtual screening
• Drug molecule database growth
Different types of docking
Protein-protein docking:
Two proteins of approximately the same size
Protein-ligand docking
A large molecule (the receptor)
and a small molecule (the ligand)
Rigid body and flexible docking
Rigid body docking:
bond angles, bond lengths, and
torsion angles of the components
are not modified
Flexible Docking:
Permits conformational change
Scoring function
Van der Waals:
A/r^12 - B/r^6, where A and B are constants and r is
the distance between the two atoms
H-bond:
Occurs when a hydrogen atom of one molecule, close to the
docking surface, interacts with an atom from the second
molecule upon docking
Electrostatics:
The most significant force; it draws parts of the
molecules closer together or pushes them further apart
according to their electrical charge.
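A pairwise score built from the van der Waals and electrostatic terms can be sketched as follows. This is a minimal illustration, not the scoring function of any particular docking program. It uses the conventional 12-6 form (repulsive A/r^12 minus attractive B/r^6); the constants A and B, the charges, and the dielectric value are all assumed for the example, and 332.06 is the usual conversion factor for energies in kcal/mol with distances in Å and charges in units of e.

```python
COULOMB_KCAL = 332.06  # kcal*angstrom/(mol*e^2), conventional conversion factor


def lennard_jones(r, a, b):
    """12-6 van der Waals term: repulsive a/r^12 minus attractive b/r^6."""
    return a / r**12 - b / r**6


def coulomb(r, q1, q2, dielectric=4.0):
    """Electrostatic term for two point charges (dielectric value is an assumption)."""
    return COULOMB_KCAL * q1 * q2 / (dielectric * r)


def pair_score(r, a, b, q1, q2, dielectric=4.0):
    """Sum of the two terms for one atom pair; lower means more favorable."""
    return lennard_jones(r, a, b) + coulomb(r, q1, q2, dielectric)


# The 12-6 term is most favorable at r_min = (2A/B)^(1/6), with depth -B^2/(4A).
a, b = 1.0, 2.0  # arbitrary illustrative constants
r_min = (2 * a / b) ** (1 / 6)
print(r_min, lennard_jones(r_min, a, b))  # 1.0 -1.0

# Opposite charges attract (negative energy); like charges repel (positive).
print(coulomb(3.0, 0.5, -0.5), coulomb(3.0, 0.5, 0.5))
```

A full scoring function would sum such terms over all receptor-ligand atom pairs and add a hydrogen-bond term.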
Protein-Protein Docking Examples
Based on performance in the most recent CAPRI (Critical Assessment
of Predicted Interactions) rounds:
• ZDOCK
• ClusPro
• AutoDock
• RosettaDock
• PatchDock
• HADDOCK
Protein-Ligand Docking Examples
• DOCK
• AutoDock
• MOE-Dock
• GOLD
• FlexX
• Glide
• Hammerhead
• FLOG
Docking Server: ClusPro
http://cluspro.bu.edu/
ClusPro is the first integrated, automated server that incorporates both docking and
discrimination steps for structural prediction of protein-protein complexes.
Using ClusPro, one can generate many relative orientations/conformations of the 2 proteins
→ filter using desolvation + electrostatic potentials → discriminate via clustering → find
the best fit (closest to the native structure from x-ray crystallography) between the 2
proteins.
Top-ranked ClusPro predictions → further manual refinement and discrimination using
existing biochemical constraints and analysis to eliminate false positives → test binding
affinity of promising protein pairs in vitro → lead compounds used as starting points for
drug development/optimization.
ClusPro can be used to screen databases of existing, recombinant, or de novo proteins
for their interaction with a protein target of interest.
ClusPro can be used to predict either:
• How a protein drug may bind (and either inhibit or stimulate) a receptor
• How 2 proteins bind, and, based on the structural details of the interaction, how to
design/screen for a drug that can inhibit that interaction
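The "discriminate via clustering" step can be illustrated with a simple greedy scheme: the pose with the most neighbors within an RMSD radius becomes a cluster center, its members are removed, and the process repeats. This is a simplified sketch of the general idea, not ClusPro's actual implementation. It assumes all poses are ligand coordinates in a fixed receptor frame (so no superposition is needed), and the poses and radius are made up.

```python
import math


def rmsd(pose_a, pose_b):
    """Root-mean-square deviation between two poses with matching atom order."""
    squared = sum((p - q) ** 2
                  for atom_a, atom_b in zip(pose_a, pose_b)
                  for p, q in zip(atom_a, atom_b))
    return math.sqrt(squared / len(pose_a))


def greedy_cluster(poses, radius):
    """Return (center_index, member_indices) pairs, largest neighborhoods first."""
    remaining = set(range(len(poses)))
    clusters = []
    while remaining:
        neighbors = {
            i: [j for j in remaining if rmsd(poses[i], poses[j]) <= radius]
            for i in remaining
        }
        # Pick the pose with the most neighbors; ties go to the lowest index.
        center = max(sorted(remaining), key=lambda i: len(neighbors[i]))
        clusters.append((center, sorted(neighbors[center])))
        remaining -= set(neighbors[center])
    return clusters


# Hypothetical two-atom ligand poses (x, y, z in angstroms).
poses = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.2, 0.0, 0.0), (1.2, 0.0, 0.0)],
    [(0.0, 0.3, 0.0), (1.0, 0.3, 0.0)],
    [(10.0, 0.0, 0.0), (11.0, 0.0, 0.0)],
    [(10.0, 0.1, 0.0), (11.0, 0.1, 0.0)],
    [(50.0, 50.0, 50.0), (51.0, 50.0, 50.0)],
]
print(greedy_cluster(poses, radius=0.5))
# [(0, [0, 1, 2]), (3, [3, 4]), (5, [5])]
```

Large clusters mark broad, well-populated energy minima, which is the rationale for ranking predictions by cluster size rather than by the single best score.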
Protein-protein docking
Cyclin docked to yeast transcription factor
Ubiquitin-conjugating enzyme docked to yeast transcription factor
DOCK program
• Protein-ligand docking
• Ligand flexibility is permitted
• Depends on the algorithm's ability to find the lowest-energy binding mode
• Force-field-based scoring: a function expressing the energy of a system as a sum
of diverse molecular mechanics (or other) terms
• An improved matching algorithm for rigid-body docking and an algorithm for
flexible ligand docking
Recap:
• Molecular Structures
• Structure Databases
• Homology Modeling
• Molecular Docking
Now, what can we learn from motion?
How do we know what they do?
• Genome annotation (what you’ve
been doing)
• Molecular motions
I. Molecular Dynamics
II. Normal Mode Analysis (Elastic
Networks)
Ribosome Simulation
http://www.pnas.org/content/102/44/15854.long
• tMD simulation of 2,640,030 atoms
• CPU time used ≈ 10^6 hours
• Accommodation occurs at ≈ 7/s
• Simulated for a total time of 20 ns (2E-8 s)
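The gap between the simulated time and the biological timescale follows from the numbers on this slide. The arithmetic below is a rough sketch: the rate and simulated time are taken from the bullets above, while the CPU cost is an assumption (the figure is read here as 10^6 hours) and the linear scaling of cost with simulated time is a simplification.

```python
def timescale_gap(simulated_s, events_per_s):
    """How many times longer one event takes than the simulated window."""
    seconds_per_event = 1.0 / events_per_s
    return seconds_per_event / simulated_s


simulated_s = 20e-9  # 20 ns of total simulated time, from the slide
events_per_s = 7.0   # accommodation rate, from the slide
gap = timescale_gap(simulated_s, events_per_s)

print(f"one accommodation event takes about {1.0 / events_per_s:.3f} s")
print(f"that is about {gap:.1e} times the simulated time")

# Assumption: CPU cost of 10^6 hours, scaling linearly with simulated time.
assumed_cpu_hours = 1e6
print(f"naive cost of one unbiased event: {assumed_cpu_hours * gap:.1e} CPU hours")
```

A gap of roughly seven orders of magnitude is why the event is driven with targeted MD rather than observed spontaneously, and why coarser models are attractive for systems of this size.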
How do we handle these large systems
when MD won’t do?
Coarse-Grained MD
Coarse-graining plug-in
Are existing issues with model parameters smoothed or compounded?
Elastic Network Models
Force Field Comparison
ENMs use Hookean springs for all interactions
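Elastic Network Models

$$\Gamma_{ij} = \begin{cases} -1 & i \ne j,\ d_{ij} \le r_c \\ 0 & i \ne j,\ d_{ij} > r_c \\ -\sum_{k=1,\,k \ne i}^{N} \Gamma_{ik} & i = j \end{cases}$$

$$V = \frac{\gamma}{2}\, \Delta R^{T}\, \Gamma\, \Delta R$$

$$\Delta R_i = A_i Q_i \cos(\omega_i t + \varepsilon_i)$$

$$\langle \Delta R_i \cdot \Delta R_j \rangle = \frac{1}{Z_N} \int \Delta R_i \cdot \Delta R_j \, e^{-V_{tot}/k_B T} \, d\{\Delta R\} = \frac{3 k_B T}{\gamma} \left[ \Gamma^{-1} \right]_{ij}$$

Symbol  Meaning
γ       Spring constant
k_B     Boltzmann constant
r_c     Cutoff radius
T       Temperature
V       Potential energy
N       # of points
λ       Eigenvalue
ε       Phase angle
Q       Eigenvector
ΔR      Fluctuation
ω       Frequency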
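The connectivity (Kirchhoff) matrix Γ and the harmonic potential V = (γ/2) ΔRᵀ Γ ΔR can be built in a few lines. This is a minimal sketch using scalar fluctuations per node, as in the Gaussian network form of the model. The four collinear points spaced 3.8 Å apart, the 7 Å cutoff, and the unit spring constant are assumptions chosen for illustration.

```python
import math


def kirchhoff(coords, cutoff):
    """Gamma[i][j] = -1 for pairs within the cutoff; the diagonal holds contact counts."""
    n = len(coords)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1.0
    for i in range(n):
        # Diagonal entry is minus the sum of the off-diagonal entries in the row.
        gamma[i][i] = -sum(gamma[i][k] for k in range(n) if k != i)
    return gamma


def potential(gamma_matrix, delta_r, spring=1.0):
    """V = (spring / 2) * dR^T Gamma dR for scalar fluctuations dR."""
    n = len(delta_r)
    quadratic = sum(delta_r[i] * gamma_matrix[i][j] * delta_r[j]
                    for i in range(n) for j in range(n))
    return 0.5 * spring * quadratic


# Hypothetical chain of four nodes, 3.8 angstroms apart along x.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
gamma = kirchhoff(coords, cutoff=7.0)
for row in gamma:
    print(row)

print(potential(gamma, [1.0, 1.0, 1.0, 1.0]))  # uniform shift costs no energy
print(potential(gamma, [1.0, 0.0, 0.0, 0.0]))  # moving one end node stretches one spring
```

Because each row of Γ sums to zero, a uniform displacement has zero energy; this zero mode is why fluctuations are computed from a pseudo-inverse of Γ rather than an ordinary inverse.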
Elastic Network Models
http://ignmtest.ccbb.pitt.edu/cgi-bin/anm/anm1.cgi
1. Locate a structure on PDB
2. Determine its primary function
3. Submit it to oANM
4. Relate the computed motions to known functions
Acknowledgements
• Secondary structure prediction slides generously provided by Dr. Andrzej Kloczkowski
• The homology modeling section of this presentation is based on a presentation by Mateusz Kurcinski