Protein Structure Prediction
Mason Bially
Types of Structure
• Primary Structure
– The linear amino acid sequence.
• Secondary Structure
– The local three-dimensional structure.
– Defined by hydrogen bonding patterns.
• Tertiary Structure
– The global three-dimensional structure.
– Defined in atomic coordinates.
– Determines the protein's function.
• Quaternary Structure
– The arrangement of multiple protein chains (subunits).
How do we find Secondary Structure?
• Two common algorithms:
– DSSP (Original, Slight Errors)
– STRIDE (Newer, Sliding Window)
• Requires the primary and tertiary structure.
– Because of this they give an exact assignment, not a guess.
• Finds hydrogen bonds.
– Uses potential energy functions.
• Based on amino acid locations and orientations.
• STRIDE is slightly more accurate.
– Returns one of 8 types of secondary structure for each amino acid: 3 helix types, 2 beta-sheet types, 2 turn types, and ‘other’.
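The "potential energy function" idea can be made concrete with the electrostatic hydrogen-bond model described for DSSP (Kabsch & Sander, 1983): partial charges sit on the C=O and N-H groups, and a pair counts as hydrogen-bonded when the energy falls below -0.5 kcal/mol. The sketch below assumes the atom coordinates (including the H position) are already known; the example geometry is hypothetical. STRIDE uses a different, empirically derived energy function that is not shown here.

```python
import math

# Constants from the DSSP electrostatic hydrogen-bond model (Kabsch & Sander, 1983).
Q1 = 0.42      # partial charge (e) placed on C and O of the C=O group
Q2 = 0.20      # partial charge (e) placed on N and H of the N-H group
F = 332.0      # conversion factor giving kcal/mol when distances are in angstroms
CUTOFF = -0.5  # kcal/mol; a lower energy counts as a hydrogen bond


def hbond_energy(c, o, n, h):
    """Electrostatic interaction energy between a C=O group and an N-H group.

    Each argument is an (x, y, z) coordinate tuple in angstroms.
    """
    return Q1 * Q2 * F * (
        1 / math.dist(o, n)
        + 1 / math.dist(c, h)
        - 1 / math.dist(o, h)
        - 1 / math.dist(c, n)
    )


def is_hbond(c, o, n, h):
    """True if the groups are close and well oriented enough to pass the cutoff."""
    return hbond_energy(c, o, n, h) < CUTOFF


# Hypothetical linear C=O ... H-N geometry with a 1.9 angstrom O...H distance.
c, o = (0.0, 0.0, 0.0), (1.23, 0.0, 0.0)
h, n = (3.13, 0.0, 0.0), (4.13, 0.0, 0.0)
print(round(hbond_energy(c, o, n, h), 2), is_hbond(c, o, n, h))  # prints -2.9 True
```

Because the energy depends on all four inter-atomic distances, it rewards both proximity and a favourable orientation of the two groups, which is why the assignment depends on the locations and orientations of the residues.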
X-Ray Crystallography
• Shoot X-rays through a protein crystal; the structure can be determined from the angles and intensities of the diffracted X-rays.
• Some proteins are challenging to crystallize (e.g., integral membrane proteins).
• Can handle very large proteins and complexes.
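The link between diffraction angle and structure is Bragg's law, n·λ = 2·d·sin θ, which relates the angle θ of a reflection to the spacing d of lattice planes in the crystal. A minimal sketch; the wavelength and angles below are illustrative values, not data from the talk.

```python
import math


def bragg_spacing(wavelength, theta_deg, order=1):
    """Spacing d between lattice planes from Bragg's law: order * wavelength = 2 * d * sin(theta).

    wavelength and the returned spacing share the same unit (e.g. angstroms);
    theta_deg is the Bragg angle between the incident beam and the planes.
    """
    if not 0 < theta_deg < 90:
        raise ValueError("Bragg angle must be between 0 and 90 degrees")
    return order * wavelength / (2 * math.sin(math.radians(theta_deg)))


# Illustrative values: 1.54 angstrom X-rays (typical of a copper-anode source)
# diffracted at several Bragg angles.
for theta in (5, 15, 30):
    print(theta, round(bragg_spacing(1.54, theta), 2))
```

Larger angles correspond to smaller spacings, so reflections measured at higher angles carry the finer (higher-resolution) detail of the structure.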
NMR Protein Spectroscopy
• Uses nuclear magnetic resonance, a phenomenon in which atomic nuclei in a magnetic field absorb electromagnetic radiation and re-emit it.
• Has difficulty with large proteins.
• Works on almost anything (including proteins with unstable tertiary structure).
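The frequency at which nuclei respond scales with the field strength: ν = (γ / 2π)·B₀, where γ is the gyromagnetic ratio of the nucleus. A minimal sketch for hydrogen nuclei (protons), using γ/2π ≈ 42.577 MHz/T; the field strength in the example is illustrative.

```python
# Gyromagnetic ratio of the proton (1H) divided by 2*pi, in MHz per tesla.
GAMMA_BAR_1H = 42.577


def larmor_frequency_mhz(field_tesla, gamma_bar=GAMMA_BAR_1H):
    """Resonance (Larmor) frequency nu = (gamma / 2*pi) * B0, in MHz."""
    return gamma_bar * field_tesla


# Illustrative: a 14.1 T magnet puts protons at roughly 600 MHz.
print(round(larmor_frequency_mhz(14.1), 1))
```

This is why NMR spectrometers are usually described by their proton frequency rather than their field strength.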
Why do we need Structure Prediction?
• Experimentally determining tertiary structure has problems.
– Slow and difficult.
– Some protein structures can’t be determined experimentally.
• We need to cover more ground, quicker.
– Drug design.
– Bioinformatics tool development.
– More detailed interactome information.
But isn’t it computationally hard?
• Yes.
• Secondary structure prediction.
– Machine learning methods.
• Tertiary structure prediction.
– Homology Modeling
– Fold Recognition (AKA Protein Threading)
– From scratch (AKA de novo, AKA ab initio)
Basis for Prediction
(Comparative Modeling)
• Protein structure (Secondary and Tertiary) is
evolutionarily more conserved than the DNA
or amino acid sequence.
– Structure is function; changing it would prevent the protein from doing its job.
• Therefore evolutionarily related proteins will probably share structure with each other, even when their sequences differ.
Secondary Structure prediction
• Early attempts. (~60%)
– Chou-Fasman
• Uses the observed propensity of each amino acid to occur in each type of secondary structure.
– GOR
• Bayesian inference applied to the same basic idea.
• Machine learning methods. (~70%)
– Neural networks.
– Support vector machines.
– Hidden Markov models.
• Future.
– Secondary structure also depends on the environment the protein is folded in.
– Including this metadata may improve methods.
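The propensity idea behind Chou-Fasman can be sketched in a few lines: estimate how over- or under-represented each amino acid is in each state from labelled examples, then label each residue by the state with the highest average propensity in a window around it. This is a deliberate simplification: the real Chou-Fasman method uses nucleation and extension rules with fixed thresholds, and the training data below is a hypothetical toy example.

```python
from collections import Counter


def propensities(sequences, labels):
    """Chou-Fasman-style propensity P(aa, s) for every residue type aa and state s.

    P(aa, s) = (fraction of residues in state s that are aa)
               / (fraction of all residues that are aa).
    Values above 1 mean aa is over-represented in state s.
    """
    overall = Counter()
    per_state = {}
    for seq, lab in zip(sequences, labels):
        for aa, state in zip(seq, lab):
            overall[aa] += 1
            per_state.setdefault(state, Counter())[aa] += 1
    total = sum(overall.values())
    table = {}
    for state, counts in per_state.items():
        n_state = sum(counts.values())
        table[state] = {
            aa: (counts[aa] / n_state) / (overall[aa] / total) for aa in overall
        }
    return table


def predict(seq, table, window=5):
    """Label each residue with the state whose mean propensity over a window
    centred on that residue is highest (unseen residues count as 1.0)."""
    half = window // 2
    states = sorted(table)
    result = []
    for i in range(len(seq)):
        segment = seq[max(0, i - half): i + half + 1]

        def mean_propensity(state):
            return sum(table[state].get(aa, 1.0) for aa in segment) / len(segment)

        result.append(max(states, key=mean_propensity))
    return "".join(result)


# Toy training data (hypothetical): H = helix, C = coil.
table = propensities(["AAAAGGGG"], ["HHHHCCCC"])
print(table["H"]["A"], predict("AAAGGG", table, window=3))  # prints 2.0 HHHCCC
```

GOR and the machine learning methods keep the same input (a window of neighbouring residues) but replace the hand-built averaging with statistical or learned models.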
Homology Modeling
• Requires primary structure and a template tertiary
structure.
– Relies on the idea that if one protein has a specific structure, so do its homologs (proteins with similar sequences).
• Only works with relatively similar sequences.
– Sequence identity above 50% is high quality.
• Comparable to a low-quality X-ray crystal structure.
– Sequence identity above 30% is medium quality.
• Anything lower degrades rapidly.
– Limited by availability of suitable templates.
– Limited by the ability to accurately align and choose distant
templates.
• Sometimes function/structure will diverge for seemingly
similar targets and templates.
– Will still happily generate models from incorrect templates.
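The quality thresholds above are stated in terms of sequence identity between target and template. A minimal sketch of computing it from a pairwise alignment and mapping it onto those tiers; note that identity is defined in several ways in practice, and this sketch assumes the denominator is the number of aligned columns where neither sequence has a gap. The tier names are a hypothetical labelling of the thresholds on this slide.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither row has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def expected_model_quality(identity):
    """Rough expectation from the thresholds on this slide."""
    if identity > 50:
        return "high"    # comparable to a low-quality X-ray structure
    if identity > 30:
        return "medium"
    return "low"         # quality degrades rapidly below ~30%


pid = percent_identity("ACDEFGHIKL", "ACDEYGHI-L")
print(round(pid, 1), expected_model_quality(pid))
```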
Homology Modeling
1. Template Selection and Sequence Alignment
– Crucial, but relatively simple if a similar sequence exists (BLAST).
– For edge cases: PSI-BLAST, HMM-based, or profile-profile alignment.
2. Model Generation
– Multiple methods.
– Construct the model by placing the amino acids where the aligned template suggests.
– Then refine by going back to the chemistry/physics and fixing errors.
3. Model Assessment
– Checks that the resulting fold is correct.
– Detects errors in alignment and template selection.
– Sometimes chooses the best of many potential models.
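The core of step 2, placing each target residue where the aligned template residue sits, can be sketched as copying coordinates along the alignment. This minimal sketch assumes one C-alpha coordinate per template residue and a gapped alignment as input; the coordinates in the example are hypothetical. Residues with no template partner come back as `None`; real tools must still build those loops and the side chains, then refine the model.

```python
def transfer_coordinates(target_aln, template_aln, template_coords):
    """Copy template C-alpha coordinates onto aligned target residues.

    target_aln / template_aln: rows of a pairwise alignment ('-' marks a gap).
    template_coords: one (x, y, z) per template residue, in sequence order.
    Returns a list of (target residue, coordinate or None).
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("alignment rows must have equal length")
    model = []
    t_index = 0  # index of the current residue in the ungapped template
    for t_res, tpl_res in zip(target_aln, template_aln):
        if t_res != "-":
            # Aligned to a template residue: reuse its position; otherwise unknown.
            coord = template_coords[t_index] if tpl_res != "-" else None
            model.append((t_res, coord))
        if tpl_res != "-":
            t_index += 1
    return model


# Hypothetical template "ACGD" with C-alpha atoms spaced 3.8 angstroms apart.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
for residue, xyz in transfer_coordinates("AC-DE", "ACGD-", coords):
    print(residue, xyz)
```

The `None` entries are exactly where the refinement step has to go "back to the chemistry/physics", since the template offers no guidance there.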
Fold Recognition
(AKA Protein Threading)
• Requires primary structure and a library of tertiary
structures.
– Relies on the idea that there are relatively few distinct protein folds (tertiary structure arrangements).
• Often feeds the best-matching structure back into homology modeling as a template to build the final model.
• Can use a number of different scoring functions.
– The most popular is free energy.
• Searches the library for the templates that minimize the scoring function, using:
– Threading.
– Dynamic programming (an optimization technique).
– Machine learning.
• Often finds a large number of results.
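How dynamic programming fits in can be shown with a toy threading score: each template is reduced to a string of per-position environments (here just B = buried, E = exposed), the target sequence is aligned onto each template so as to minimize a total energy, and the template with the lowest energy wins. Everything here is a hypothetical simplification in the spirit of 3D-1D profile methods: the environment classes, the hydrophobic grouping, the ±1 energies, and the gap penalty are assumptions, not a real force field.

```python
HYDROPHOBIC = set("AVILMFWC")  # hypothetical grouping used only for this toy score
GAP = 2.0                      # hypothetical energy penalty per gap


def pair_energy(aa, env):
    """Toy energy for placing residue aa at a template position with environment
    env ('B' = buried, 'E' = exposed): favourable (-1) when a hydrophobic residue
    is buried or a polar one exposed, unfavourable (+1) otherwise."""
    return -1.0 if (aa in HYDROPHOBIC) == (env == "B") else 1.0


def thread_energy(seq, env):
    """Lowest total energy of a global alignment of seq onto a template's
    environment string, found by dynamic programming."""
    n, m = len(seq), len(env)
    # dp[i][j] = best energy for aligning seq[:i] with env[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * GAP
    for j in range(1, m + 1):
        dp[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + pair_energy(seq[i - 1], env[j - 1]),  # place residue
                dp[i - 1][j] + GAP,  # residue with no template position
                dp[i][j - 1] + GAP,  # template position left empty
            )
    return dp[n][m]


def best_template(seq, library):
    """Name of the library template with the lowest threading energy."""
    return min(library, key=lambda name: thread_energy(seq, library[name]))


library = {"fold1": "BBBEEE", "fold2": "EEEBBB"}  # hypothetical fold library
print(best_template("LIVKDE", library))  # prints fold1
```

Ranking every template by this energy also explains why threading tends to return many candidates: each library entry gets a score, and several may be close to the best.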
How do we know these models work?
• CASP (Critical Assessment of Techniques for
Protein Structure Prediction)
– Every two years.
– Tests blind prediction algorithms.
• In many different categories.
– Since 1994.
• Other variations of the assessment exist.
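One of the main scores used in CASP to compare a prediction with the experimental structure is GDT_TS: the average, over cutoffs of 1, 2, 4 and 8 Å, of the fraction of C-alpha atoms that lie within that distance of their true position. A minimal sketch, assuming the model and native structure are already superimposed; the real GDT_TS searches over many superpositions to maximize each fraction, which this sketch does not do. The coordinates in the example are hypothetical.

```python
import math


def gdt_ts(model, native, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS on a 0-100 scale.

    model and native are equal-length lists of C-alpha (x, y, z) coordinates
    that are ASSUMED to be already superimposed.
    """
    if len(model) != len(native) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    distances = [math.dist(m, n) for m, n in zip(model, native)]
    # Count, for every cutoff, how many residues fall within it.
    within = sum(sum(d <= cut for d in distances) for cut in cutoffs)
    return 100.0 * within / (len(cutoffs) * len(distances))


# Hypothetical 5-residue example: deviations of 0.5, 1.5, 3, 6 and 10 angstroms.
native = [(0.0, 0.0, 0.0)] * 5
model = [(0.5, 0.0, 0.0), (0.0, 1.5, 0.0), (0.0, 0.0, 3.0), (6.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
print(gdt_ts(model, native))
```

Averaging over several cutoffs means the score rewards a roughly correct global fold even when some regions are far off, which suits a blind assessment across very different targets.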
Future
• Mix it all together!
• Including evolutionary information.
– Improves alignment.
– Helps find better folds.
• Structural information.
– Predicted secondary structure can help.
• Mixing with ab initio/de novo methods.
Questions?
• Computational Structural Biology: Methods and Applications
– By Torsten Schwede and Manuel C. Peitsch
• Images from Wikipedia or sources.