What about structure? - Protein Evolution (Rob Russell)

Protein Structure Prediction
Matthew Betts
Russell Group, University of Heidelberg, Germany
www.russelllab.org
(Diagram: Sequence, Structure, Function; functional questions such as: Active/inactive? Binds/does not bind? Substrate specificity?)
What is this about?
• What we do to find out what a protein
might be doing
• Looking at sequences, with a particular
emphasis on finding out something
about the protein structure
• Some background for practical work
Given a sequence, what should you look for?
• Functional domains (Pfam, SMART, COGs, CDD, etc.)
• Intrinsic features
– Signal peptide, transit peptides (SignalP)
– Transmembrane segments (TMpred, etc.)
– Coiled-coils (COILS server)
– Low-complexity regions, disorder (e.g. SEG, DisEMBL)
• Hints about structure?
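Low-complexity regions can be flagged by measuring how varied the residue composition is inside a sliding window. The sketch below is a simplified idea in the spirit of SEG, not SEG itself: it computes the Shannon entropy (in bits) of each 12-residue window and masks windows at or below 2.2 bits with "x". The window and threshold echo commonly cited SEG defaults but are assumptions here; SEG adds an extension step and boundary refinement that this sketch omits.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (bits) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq: str, window: int = 12, max_entropy: float = 2.2) -> str:
    """Replace residues covered by any low-entropy window with 'x' (SEG-like idea, not SEG)."""
    masked = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) <= max_entropy:
            for i in range(start, start + window):
                masked[i] = True
    return "".join("x" if m else aa for aa, m in zip(seq, masked))


# Hypothetical test sequence: a varied stretch followed by a poly-Gln run
print(mask_low_complexity("ACDEFGHIKLMN" + "Q" * 12))
```

On this toy input the mask spills a few residues into the varied stretch (IKLMN) as well as covering the poly-Gln run, because overlapping windows that are mostly Gln already fall below the threshold; real tools trim such boundaries more carefully.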
Given a sequence, what should you look for?
SMART domain ‘bubblegram’ for human fibroblast growth factor (FGF) receptor 1 (type P11362 into the web site: smart.embl.de). Annotated features:
• “Low sequence complexity” (linker regions? flexible? junk?)
• Signal peptide (secreted or membrane-attached)
• Transmembrane segment (crosses the membrane)
• Immunoglobulin domains (bind ligands?)
• Tyrosine kinase (phosphorylates Tyr)
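A transmembrane segment like the one in FGF receptor 1 shows up as a long hydrophobic stretch. A classic way to spot it is a Kyte–Doolittle hydropathy scan: average the hydropathy over a 19-residue window and flag windows whose mean exceeds about 1.6. This is a minimal sketch of that older approach, not of TMpred (which uses its own statistically derived scores); the toy sequence is hypothetical.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982)
KD = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def predict_tm_segments(seq, window=19, threshold=1.6):
    """Merged (start, end) spans, end exclusive, of windows with mean hydropathy > threshold."""
    segments = []
    for start in range(len(seq) - window + 1):
        mean = sum(KD[aa] for aa in seq[start:start + window]) / window
        if mean > threshold:
            end = start + window
            # Consecutive passing windows overlap, so extend the current span
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((start, end))
    return segments


# Hypothetical toy sequence: charged flanks around a 20-residue hydrophobic stretch
print(predict_tm_segments("D" * 10 + "L" * 20 + "K" * 10))
```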
What about structure?
• Intrinsic features generally mean trouble for structure determination, so they are usually skipped
• The knock-on effect is that structures of large, flexible, multi-domain proteins are rare
• Structure determination/prediction is therefore typically restricted to parts of proteins (with obvious exceptions)
Structure prediction
(Diagram: Sequence → algorithm → Structure)
Best predictions are by homology
• Is your sequence homologous to a
known structure?
• If yes, then often very good models of
structure can be constructed.
• This is what we will do in the practical
Homology Modelling
(Diagram: sequence + known structure of a homologue → algorithm → model)
Homology Modelling Steps
• Identify a homologue of known structure
• Get the best alignment of your sequence to the structure
• Model building
– Side-chain replacement
– Loop building
– Optimisation/relaxation/minimisation
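The second step, aligning your sequence to the template, is where most modelling errors start. As a minimal illustration, the sketch below is textbook Needleman–Wunsch global alignment with simple match/mismatch/gap scores. The scores (+1, −1, −2) are assumptions for illustration; real modelling pipelines use substitution matrices, affine gap penalties and usually profile-based alignments.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x, y):
        return match if x == y else mismatch

    # Fill the dynamic-programming matrix
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner
    i, j, top, bottom = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ACGT", "AGT"))
```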
Problems with loops
Two subtilisin-like serine proteases
Sanchez et al, Nature Struct. Biol. (Suppl), 7, 986-990, 2000
The Twilight Zone
Sander & Schneider (EMBL, ca. 1990)
1. Compared all known structures to each other using sequence comparison.
2. For each fragment of a particular length & sequence identity, simply asked the question: is the structure similar or different?
3. The line in the plot marks where one can be 90% confident that an alignment of a particular length & sequence identity corresponds to similar structures.
4. Below the line, structures can be either similar or different: the twilight zone.
(Basis for much of the sequence alignment statistics in use today)
Based on Sander & Schneider, Proteins, 9, 56, 1991
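The twilight-zone line is a threshold on sequence identity that depends on alignment length. The sketch below assumes the commonly quoted HSSP-style form of the Sander & Schneider curve, t(L) = 290.15·L^(−0.562) for L ≤ 80 and about 24.8% for longer alignments, capped at 100%. Those constants, and which confidence level they correspond to, are assumptions to check against the 1991 paper. Percent identity is counted over aligned, ungapped positions only.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over positions where neither sequence has a gap, plus that length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0, 0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs), len(pairs)


def hssp_threshold(length):
    """Identity (%) above which similar structure is likely; assumed HSSP-style curve."""
    if length <= 0:
        raise ValueError("alignment length must be positive")
    if length > 80:
        return 24.8
    return min(100.0, 290.15 * length ** -0.562)


def in_twilight_zone(identity, length):
    """True if the alignment falls below the (assumed) threshold curve."""
    return identity < hssp_threshold(length)


identity, length = percent_identity("AC-GT", "ACTGA")
print(identity, length, in_twilight_zone(identity, length))
```

Short alignments need very high identity before similarity is trustworthy, which is why a 75% identical 4-residue match still lands in the twilight zone here.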
Similar structures within the twilight zone
(Figure: pairs of similar structures with sequence identities of 80%, 8.8% and 4.4%)
…can we find these similarities without known structures if sequence searches fail?
Russell et al, J. Mol. Biol., 1997
Fold Recognition (‘Threading’)
>C562_RHOSH
TQEPGYTRLQITLHWAIAGL…
Does the sequence “fit” on any of a library of known 3D structures?
Fold Recognition (‘Threading’)
Jones, Taylor, Thornton, Nature, 358, 86-89, 1992.
Residue pair potentials
(Figure: examples of GOOD and BAD residue-pair contacts involving Phe, Asp and Arg)
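The threading idea can be sketched as: place the sequence on each template, sum a pair energy over the residue pairs the template puts in contact, and rank templates by the lowest total. Everything below is a toy: the energy values and the template "library" (a length plus a list of contacting positions) are hypothetical, and placement is ungapped, whereas Jones, Taylor & Thornton used statistically derived, distance-dependent potentials and allowed gaps in the sequence–structure alignment.

```python
# Toy contact energies (hypothetical numbers; negative = favourable)
TOY_PAIR_ENERGY = {
    frozenset("F"): -1.0,   # Phe-Phe: hydrophobic contact, GOOD
    frozenset("DR"): -0.8,  # Asp-Arg: salt bridge, GOOD
    frozenset("DF"): 0.6,   # Asp-Phe: charge packed against a hydrophobe, BAD
}


def pair_energy(a, b):
    return TOY_PAIR_ENERGY.get(frozenset((a, b)), 0.0)


def thread_score(fragment, contacts):
    """Sum pair energies over template contacts (i, j) for a sequence placed on the template."""
    return sum(pair_energy(fragment[i], fragment[j]) for i, j in contacts)


def best_ungapped_fit(seq, template_len, contacts):
    """Try every ungapped placement of the template along the sequence; keep the lowest energy."""
    return min(thread_score(seq[off:off + template_len], contacts)
               for off in range(len(seq) - template_len + 1))


def rank_folds(seq, library):
    """Rank templates (name -> (length, contacts)) by best threading energy, lowest first."""
    scored = [(best_ungapped_fit(seq, length, contacts), name)
              for name, (length, contacts) in library.items()
              if length <= len(seq)]
    return sorted(scored)


# Hypothetical template library
library = {"fold_A": (4, [(0, 3)]), "fold_B": (4, [(0, 1)]), "fold_C": (2, [(0, 1)])}
print(rank_folds("FDRF", library))
```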
Fold Recognition
Executive Summary
• Works some of the time
• Probably best at identifying distant
homologues, where sequence identity is
in the twilight zone
• Useful sites:
– 3D-PSSM, FUGUE, (Gen)-Threader
• Meta-predictions are the best: combine all methods and take a consensus
– E.g. bioinfo.pl/meta
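One simple way to turn several fold-recognition hit lists into a consensus is to give each fold 1/rank points per method and sum them. This is a toy consensus for illustration only; the method names and hit lists are hypothetical, and real meta servers use more elaborate schemes (for example, comparing the predicted models to each other).

```python
from collections import defaultdict


def consensus_folds(rankings):
    """Combine ranked fold lists (method -> [best, second, ...]) by summing 1/rank."""
    scores = defaultdict(float)
    for ranked_hits in rankings.values():
        for rank, fold in enumerate(ranked_hits, start=1):
            scores[fold] += 1.0 / rank
    # Highest total first; ties broken alphabetically for a stable result
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical outputs from three prediction methods
rankings = {
    "method_1": ["globin-like", "TIM barrel"],
    "method_2": ["globin-like", "Ig-like"],
    "method_3": ["TIM barrel", "globin-like"],
}
print(consensus_folds(rankings))
```

A fold that appears near the top for most methods wins even if no single method is confident, which is the intuition behind meta-prediction.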
If no homology…
• Is your sequence homologous to a
known structure?
• If not, then models are less accurate, but structural insights are still possible
• First, secondary structure prediction
Secondary-structure prediction
(Diagram: Sequence → algorithm → secondary structure)
• Neural networks
• Inductive logic programming
• Spin-glass theory
• Human intuition
Secondary-structure prediction
E.g. Chou & Fasman, 1974
• Helix forming: Glu, Ala, Leu
• Helix breaking: Pro, Gly
• Strand forming: Met, Val, Ile
• Strand breaking: Glu, Lys, Ser, His, Asn
• Etc.
Numerical approach + simple protocol = prediction of secondary structure
Claimed “80%” accuracy; in reality 50-60%.
The method was tested on the same proteins used to derive the parameters… a big no-no.
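The "numerical approach + simple protocol" can be sketched as a sliding-window scan: average a per-residue helix propensity over 6 residues and call the window helical if the mean exceeds 1.03. This is a simplified sketch, not the full Chou–Fasman rule set (which also uses nucleation counts, strand propensities and extension rules). The propensity values are the ones commonly reproduced from Chou & Fasman's tables and should be treated as approximate.

```python
# Helix propensities (P-alpha) as commonly reproduced from Chou & Fasman's tables;
# treat as approximate and check against the original papers before real use.
P_ALPHA = {
    "E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16, "F": 1.13, "Q": 1.11,
    "W": 1.08, "I": 1.08, "V": 1.06, "D": 1.01, "H": 1.00, "R": 0.98, "T": 0.83,
    "S": 0.77, "C": 0.70, "Y": 0.69, "N": 0.67, "P": 0.57, "G": 0.57,
}


def helix_windows(seq, window=6, threshold=1.03):
    """Mark residues covered by any window whose mean P-alpha exceeds threshold."""
    helix = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        if sum(P_ALPHA[aa] for aa in segment) / window > threshold:
            for i in range(start, start + window):
                helix[i] = True
    return "".join("H" if h else "-" for h in helix)


print(helix_windows("AAAAAAGGGGGG"))
```

Note that a method like this has almost no free parameters beyond the propensity table, so testing it on the proteins used to build that table inflates its apparent accuracy, exactly the problem described above.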
Homologous proteins add a lot of information: 70% accuracy!
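The extra information from homologues is typically used as a per-position profile: for each alignment column, how often each amino acid occurs. Profile-based predictors feed something like this to the predictor instead of a single sequence. The sketch below only builds the frequency profile from a gapped alignment (gaps ignored); the alignment is hypothetical.

```python
from collections import Counter


def column_profile(alignment):
    """Per-column residue frequencies from a gapped multiple alignment (gaps ignored)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        counts = Counter(residues)
        total = len(residues)
        # An all-gap column yields an empty dict
        profile.append({aa: n / total for aa, n in counts.items()})
    return profile


# Hypothetical alignment of a query with two homologues
for col, freqs in enumerate(column_profile(["AL-K", "AIGK", "VLGK"])):
    print(col, freqs)
```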
What about de novo or ab initio prediction?
• Can you simulate folding using physics to predict the structure of a protein?
• No, not usually.
• However, advances have been made…
• David Baker, co-workers and subsequent
followers: fragment based structure
prediction. De novo not ab initio
Predicting Fragments
Preferences are learned from all stretches with a similar structure
Assembling Fragments
Database of structures → fragments matching the target sequence → assembly of fragments → selection of best model
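The first stage, picking fragments that match the target sequence, can be sketched as follows: for each short window of the target, keep the best-matching windows from a database of known structures. Everything here is a toy: the three-entry database (sequence plus secondary structure) is hypothetical, matching counts identical residues only, and the final voting step is a crude stand-in for assembly. Fragment-based methods of the kind described above typically use longer fragments, profile-based matching and Monte Carlo assembly of backbone conformations, which this sketch does not attempt.

```python
from collections import Counter

# Hypothetical mini "database of structures": sequence + per-residue secondary structure
# (H = helix, E = strand, - = other); real libraries store backbone geometry from many structures.
TOY_FRAGMENT_DB = [
    ("AEELLKKL", "HHHHHHHH"),
    ("KVTVEVKV", "-EEEEEE-"),
    ("GSGSGSGS", "--------"),
]


def pick_fragments(target, db, frag_len=3, top_k=2):
    """For each target window, keep the top_k database windows by identical-residue count."""
    library = {}
    for pos in range(len(target) - frag_len + 1):
        query = target[pos:pos + frag_len]
        candidates = []
        for seq, ss in db:
            for start in range(len(seq) - frag_len + 1):
                window = seq[start:start + frag_len]
                score = sum(a == b for a, b in zip(query, window))
                candidates.append((score, ss[start:start + frag_len]))
        candidates.sort(key=lambda c: c[0], reverse=True)  # stable: ties keep db order
        library[pos] = candidates[:top_k]
    return library


def vote_secondary_structure(target, library):
    """Stand-in for assembly: score-weighted majority state of the fragments covering each residue."""
    votes = [Counter() for _ in target]
    for pos, fragments in library.items():
        for score, ss in fragments:
            for offset, state in enumerate(ss):
                votes[pos + offset][state] += score
    return "".join(v.most_common(1)[0][0] if v else "-" for v in votes)


target = "AEEL"
print(vote_secondary_structure(target, pick_fragments(target, TOY_FRAGMENT_DB)))
```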
The Prediction Irony
• General trend: increasing accuracy is more a function
of data than algorithms
• In other words: as we know more structure, and
indeed even sequence data, we get better at
predicting
• Probably we will have a perfect algorithm for protein
structure prediction when we know all of the answers
• Structural genomics & the generally increased pace of structure determination mean there aren’t many really “new” structures anymore
Things to Remember
• Methods have mostly been developed for soluble,
globular proteins or domains
• Problems with membrane proteins, low-complexity regions, etc.
• Many segments in proteins should be studied with other methods:
– Signal peptides
– TM regions
– Coiled-coils
– Intrinsic disorder (e.g. http://dis.embl.de)
What we use this for…
We aim to:
• Understand molecular interactions
• Predict molecular interactions
• Focus on those interactions of biomedical importance
• Apply tools to large datasets
• Use interaction networks predictively
– To predict new interactions
– To predict other details like pathologies, toxicities
Modelling or predicting interactions by homology
(Diagram: your favourite protein and your second favourite protein, each drawn N to C, are matched to known structures. Are the templates in contact?)
Example: modelled interaction of histidyl adenylate and tRNA synthetase
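The key question in the diagram, "templates in contact?", can be answered from the coordinates of a template complex: if any atoms of the two matched chains lie within a distance cutoff, the templates touch and an interaction can be modelled. This is a minimal brute-force sketch assuming atoms are given as (residue label, (x, y, z)) pairs in Å; the 5 Å cutoff is a common but arbitrary choice, and the coordinates are hypothetical. Requires Python 3.8+ for math.dist.

```python
import math


def interface_contacts(chain_a, chain_b, cutoff=5.0):
    """Residue pairs (a, b) with at least one atom pair closer than cutoff (coords in Å)."""
    contacts = set()
    for res_a, xyz_a in chain_a:
        for res_b, xyz_b in chain_b:
            if math.dist(xyz_a, xyz_b) < cutoff:
                contacts.add((res_a, res_b))
    return contacts


def templates_in_contact(chain_a, chain_b, cutoff=5.0):
    return bool(interface_contacts(chain_a, chain_b, cutoff))


# Hypothetical atoms from two template chains
chain_a = [("A1", (0.0, 0.0, 0.0)), ("A2", (10.0, 0.0, 0.0))]
chain_b = [("B1", (3.0, 0.0, 0.0)), ("B2", (30.0, 0.0, 0.0))]
print(interface_contacts(chain_a, chain_b), templates_in_contact(chain_a, chain_b))
```

The residue pairs returned also indicate which parts of each protein form the interface, which is what carries over to the modelled interaction.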
Prediction of Structures of Complexes
(Figure: a five-component complex modelled by combining X-ray structures, a two-hybrid network, and electron microscopy & mass spectrometry, with homology (e.g. BLAST) linking the data sources)
Russell et al, Curr. Opin. Struct. Biol. 2004
Aloy & Russell, Nature Rev. Mol. Cell. Biol. 2006
Taverner et al, Acc. Chem. Res. 2008
Adding Mechanisms to Interaction Networks
• Who interacts with whom?
• What does the interaction look like?
• How strong? How fast?
• Which piece from which protein?
(Figure: example network of RGS-3, RGS-4, Gα/q and Gα/i)
Bridging the information gap
Modelled complexes
Aloy & Russell, Nature Rev. Mol. Cell. Biol., 2006.
From Proteomics to Cellular Anatomy?
Kuehner et al, Science, 2009
Some Links
• www.russelllab.org/aas: Guide to the amino acids
• www.russelllab.org/gtsp: Guide to Structure Prediction
• meta.bioinfo.pl: Meta server (runs virtually all reliable prediction methods)
Structure Prediction Practical
www.russelllab.org/wiki
In groups of two or more you will attempt to answer
functional questions about a particular protein target
Acknowledgements
www.russelllab.org
Current group members
Rob Russell (the boss), Matthew Betts, Leonardo Trabuco, Oliver Wichmann,
Mathias Utz, Yvonne Lara
Alumni
Chad Davis, Olga Kalinina, Ricardo de la Vega, Victor Neduva, Evangelia Petsalaki, Damien Devos
Complex modeling & interactions collaborators
Patrick Aloy (IRB Barcelona)
Anne-Claude Gavin (EMBL Heidelberg)
Peer Bork (EMBL Heidelberg)
Luis Serrano (CRG Barcelona)
Achilleas Frangakis (Uni Frankfurt)
Bettina Boettcher (Edinburgh)