David Bernick
BME230
Identifying Functional signatures in Proteins – a computational approach
2/16/2016
Abstract
Residues within a protein sequence contribute in varying degrees to the structure that the
protein will adopt and the ability of the molecule to provide value to the organism.
Natural selection evaluates the function (value) of a protein while also imposing
pressure on the underlying scaffold (structure) that supports that function. In this
way, nature selects for protein function and structure interdependently. We have tools
today to computationally evaluate and
predict protein structure, but few techniques exist to computationally evaluate or classify
function. Protein design tools (Rosetta, SAM, etc.) can create computational ensembles
of putative sequences that fit a query backbone. These tools select for near-optimal
sequences by minimizing a score (energy) that the sequence would have when placed on
a query fold. The information used by these tools includes structure fragments drawn
from known protein structures, as well as scoring of atomic-scale interactions.
Function-specific information is not available to this process. These computational
ensembles can then provide a position-specific distribution of residues that can adopt
a protein fold, independent of any functional constraints. This study will use these
computational ensembles as a structure-only background distribution to highlight
function in natural sequences.
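As an illustration of what a position-specific distribution means here, the following is a minimal sketch, assuming the ensemble is available as a list of equal-length aligned sequences. The function name `position_distributions`, the `pseudocount` parameter, and the uniform fallback for columns with no standard residues are assumptions made for this example, not part of the study's tooling.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def position_distributions(sequences, pseudocount=0.0):
    """Return one residue-frequency dict per alignment column.

    `sequences` is a list of aligned, equal-length strings (hypothetical
    input format). Characters outside the 20 standard residues, such as
    gap symbols, are ignored when counting.
    """
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to equal length")

    distributions = []
    for column in zip(*sequences):
        counts = Counter(res for res in column if res in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        if total == 0:
            # Assumption: a column with no standard residues falls back
            # to a uniform distribution.
            uniform = 1.0 / len(AMINO_ACIDS)
            distributions.append({aa: uniform for aa in AMINO_ACIDS})
            continue
        distributions.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return distributions


# Toy ensemble of four aligned three-residue sequences.
toy_ensemble = ["ACD", "ACE", "AGE", "ACE"]
toy_profile = position_distributions(toy_ensemble)
```

Applying the same function to an alignment of natural sequences would give the natural distribution for the same positions.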
Previous use of structural ensembles by Pei (2003) [i], Larson (2003) [ii, iii, iv],
Koehl [v], and Kuhlman (2000) [vi] has contributed to improvements in function
prediction, homology searches, and in examining the energetic optimality of natural
protein structures. Koehl has made use of a Euclidean distance score as a measure of
the difference in position-specific distributions between aligned positions. This study
will extend this idea in order to measure both the structural and functional importance
of every position by considering a triangle of three distributions for every position.
The points of the triangle are the distribution found in nature, the distribution found
in our computational ensemble, and the background distribution found across all
proteins. We can then consider the length of each side as a measure of function, of
structure, and of the vector sum of function and structure. As a measure of confidence,
these lengths can be converted to Z-scores, normalizing for the variation seen across
all positions in the protein.
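The triangle described above can be sketched as follows. This is a minimal illustration, not the study's implementation. The assignment of sides is an assumption based on one reading of the text: natural versus ensemble is taken as function, ensemble versus background as structure, and natural versus background as the combined side. The function names and the choice of a population standard deviation for the Z-scores are likewise assumptions.

```python
import math
import statistics


def euclidean(p, q):
    """Euclidean distance between two residue distributions (dicts)."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys))


def triangle_sides(natural, ensemble, background):
    """Return the three side lengths for one position.

    Assumed labeling (not confirmed by the source):
      function  = natural  vs. ensemble
      structure = ensemble vs. background
      combined  = natural  vs. background
    """
    return {
        "function": euclidean(natural, ensemble),
        "structure": euclidean(ensemble, background),
        "combined": euclidean(natural, background),
    }


def z_scores(values):
    """Convert per-position lengths to Z-scores across the protein."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD; an assumption
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


# Toy example with a two-letter alphabet for readability.
natural = {"A": 1.0, "G": 0.0}
ensemble = {"A": 0.5, "G": 0.5}
background = {"A": 0.5, "G": 0.5}
sides = triangle_sides(natural, ensemble, background)
```

In this toy position the ensemble matches the background, so the structure side is zero and the whole deviation of the natural distribution falls on the function side.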
Additionally, this project can evaluate methods of creating these structural ensembles.
Ideally, a method will produce a good approximation of the true residue diversity
available at every position. Any noise introduced into the background distribution would
increase the variance in the structure distribution found across all positions in the
protein. This variation would be seen in the scores for both function and structure.
The method that provides the smaller variance in structure and function scores would
then be preferred.
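The selection rule above could be sketched as below. The text does not say how the two variances are combined, so summing them is an assumption made purely for illustration, as are the function names, the method labels, and the toy scores.

```python
import statistics


def score_variance(function_scores, structure_scores):
    """Combined spread of the per-position scores for one method.

    Assumption: the two population variances are simply added.
    """
    return statistics.pvariance(function_scores) + statistics.pvariance(
        structure_scores
    )


def preferred_method(scores_by_method):
    """Return the method name whose scores show the smallest variance.

    `scores_by_method` maps a method label to a pair
    (function_scores, structure_scores), one value per position.
    """
    return min(
        scores_by_method,
        key=lambda name: score_variance(*scores_by_method[name]),
    )


# Hypothetical per-position scores for two ensemble-generation methods.
toy_scores = {
    "flexible_backbone": ([0.1, 0.9, 0.5], [0.2, 0.8, 0.5]),
    "fixed_backbone": ([0.4, 0.6, 0.5], [0.45, 0.55, 0.5]),
}
choice = preferred_method(toy_scores)
```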
There are two methods under consideration for creating structural ensembles. The first
builds an ensemble of decoys at an elevated annealing temperature with a flexible
protein backbone. The second derives the ensemble computationally from natural
structures that adopt the query protein fold, using a fixed-backbone protocol. Both of
these techniques attempt to explore the sequence diversity available to the specific
fold.
Previous efforts by this author have shown some success with this technique using SH3,
PH, and TIM barrel domains. This study will examine 15 or more domains, with the
intention of eventually scaling up to include all known domains.
References
i
Pei J, Dokholyan NV, Shakhnovich EI, and Grishin NV, Using protein design for
homology detection and active site searches. (2003) PNAS 100 no 20:11361-11366
ii
Larson SM, Garg A, Desjarlais, JR and Pande VS Increased Detection of Structural
Templates Using Alignments of Designed Sequences (2003) Proteins: Structure,
Function and Genetics 51:390-396
iii
Larson SM, Garg A, Desjarlais, JR and Pande VS Increased Detection of Structural
Templates Using Alignments of Designed Sequences (2003) Proteins: Structure,
Function and Genetics 51:390-396
iv
Larson SM, Pande VS Sequence Optimization for Native State Stability Determines the
Evolution and Folding Kinetics of a Small Protein (2003) J. Mol. Biol. 332, 275-286
v
Koehl P and Levitt M, Protein Topology and stability define the space of allowed
sequences (2002) PNAS 99 no 3.
vi
Kuhlman B, Baker D, Native protein sequences are close to optimal for their structures
(2000) PNAS 97:10383-10388