Recursive domains in proteins
Teresa Przytycka, NCBI, NIH
Joint work with G. Rose & Raj Srinivasan, JHU

Domain
“Polypeptide chain (or a part of it) that can independently fold into stable tertiary structure” (Branden & Tooze, Introduction to Protein Structure)
Figure: two-domain protein.

The 3D structure of a protein domain can be described as a compact arrangement of secondary structures
Figure labels: alpha helix, beta strand.

These arrangements are far from random
There are not so many of them: the PDB contains about 17,000 structures and fewer than 1,000 different folds.
Figure: proportion of "new folds" (light blue) and "old folds" (orange) for a given year (fold = fold domain).

Possible sources of the restricted number of folds
• Evolutionary history.
– Given enough time, would domains look “more random”?
• Existence of general restrictions/rules which render some (compact) arrangements of secondary structures non-feasible.
– Can real protein domains be seen as sentences in a language, which can be generated by an underlying grammar?

Can protein domains be described using a set of folding rules?
We restrict our attention to all-beta domains:
• they admit a variety of topologies
• they are difficult to predict from sequence

Understanding β-folds
• Patterns in β-sheets – Richardson 1977
• Folding rules for β-sheets – Zhang and Kim 2000
• Hydrogen bonding pattern
• Polypeptide chain seems to avoid “complications”
Figure labels: parallel, anti-parallel, mixed.
• Properties of β-sandwiches – Woolfson D. N., Evans P. A., Hutchinson E. G., and Thornton J. M. 1993
Figure label: “forbidden” crossed conformation.

Expectations for good folding rules
• We need to look at fold properties that occur in non-homologous proteins.
• Preferably: they provide a model for the folding process.

Super-secondary structures as precursors of folding rules
• Super-secondary structure – a frequently occurring arrangement of a small number of secondary structures
• The occurrence of super-secondary structures in unrelated families supports the possibility of their independent formation.
Example 1: Hairpin, β-β-β unit
Example 2: Greek key and suggested folding pathway for it
Figure: folding pathway for the Greek key proposed by Ptitsyn.
Figure: pattern from a Greek vase.

Two levels of folding rules
• Primitive folding rules – based on super-secondary structures
• Closure operation – allows for hierarchical application of the primitive rules

Super-secondary structures: primitive folding rules
Figure labels: hairpin, hairpin rule, bridge, Greek key, direct wind, indirect wind.

Closure: composite rules
• Super-secondary structures are composed of secondary structures that are neighbors in the chain sequence.
• However, the presence of a super-secondary structure, such as a hairpin, in a protein structure implies that residues that are non-consecutive in the chain become neighbors in space.
Closure: a “shortcut” in the sequence due to a folding rule
Example 1: applying folding rules to the jelly roll

Recursive domains
A recursive domain is a part of a protein fold that can be generated using folding rules supported with the closure operation.
A protein that can be fully generated using folding rules has one recursive domain.

Examples
• Example 1
• Example 2
• Example 3
• Example 4

Graph theoretical tools and recursive domains
Fold graph:
Vertices – strands
Edges – two types:
– Neighbor edges: directed edges between strands that are neighbors in the chain or via the closure operation.
– Domain edges: edges between strands used in the same folding rule.
Recursive domains = connected components of the fold graph with the neighbor edges removed.

Can the rules generate all known folds?
Comparison with the partition for a computer-generated set of all possible 8-strand sandwiches.
Figure (control): distribution of recursive domains in all sandwich-like "folds". Axes: number of recursive domains (1 to 8) vs. number of generated "folds".
Figure (protein data): partition into recursive components for small (<= 10 strands) proteins. Axes: number of recursive domains (1 to 8) vs. number of folds. Annotation: one recursive fold.

Offenders
Hedgehog intein domain

Given a fold, is there a unique sequence of folding steps leading to it?
Usually no. Usually there are alternative sequences of folding steps leading to the construction of the same domain.
Do such alternative folding sequences correspond to alternative folding pathways?

Are the rules complete?
Probably not. For example, in a propeller each blade is in one recursive domain, but we do not have a rule that puts the blades together.

Conclusions: We are getting some idea how things work...
It is so nice outside. It would be nice to take the dog for a walk!
Nice… dog… walk

Conclusions
• Protein folds can be described by simple folding rules.
• The folding rules capture at least some aspects of fold simplicity and regularity.
• The sequence of folding steps leading to a given fold is usually not unique.
• The folding rules generate protein-like structures.

Future directions
• Can folding rules guide fold prediction?
• Would the hierarchical description of a fold provided by folding rules be useful for fold classification / comparison?
• Adding statistical evaluation of a recursive domain.

Acknowledgments
George Rose
Raj Srinivasan
Rohit Pappu
Venk Murthy
NIH, K01 grant
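
Supplementary sketch (fold graph and recursive domains)
The fold-graph slide defines recursive domains as connected components of the fold graph once neighbor edges are set aside. The Python sketch below illustrates only that step, under stated assumptions: strands are numbered by chain order, the domain edges are supplied by hand rather than derived from the folding rules, and the function name recursive_domains and the 6-strand example are hypothetical, not taken from the talk. It uses a plain union-find, which is one of several ways to compute connected components.

```python
from collections import defaultdict


def recursive_domains(strands, domain_edges):
    """Group strands into connected components using domain edges only.

    strands      -- iterable of strand labels (hypothetical: chain-order integers)
    domain_edges -- pairs of strands that took part in the same folding rule
    Neighbor edges are deliberately not passed in, so they cannot merge components.
    """
    parent = {s: s for s in strands}

    def find(s):
        # Follow parent links to the representative, shortening the path as we go.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in domain_edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    groups = defaultdict(list)
    for s in strands:
        groups[find(s)].append(s)
    return sorted(groups.values())


# Hypothetical 6-strand example: rules link strands 1-2, 2-3 and 4-5;
# strand 6 is not covered by any rule.
strands = [1, 2, 3, 4, 5, 6]
domain_edges = [(1, 2), (2, 3), (4, 5)]
print(recursive_domains(strands, domain_edges))
```

In this toy input the strands fall into three groups, so the fold would have three recursive domains. A fold fully generated by the rules would come back as a single group.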