Researchers “Shaping” Future of Proteomics

by Dennis Meredith

With the rough draft of the human genome now completed, as announced to great fanfare last month, researchers face the even more daunting task of figuring out how the 30,000 or so genes give rise to the biological protein machinery that makes humans uniquely human.

A central problem in this field, called “proteomics,” is how to mathematically describe the intricate folding of proteins. These molecules, central to all life, begin as one-dimensional linear strings of amino acids, but collapse immediately like intricate origami into three-dimensional proteins, such as the enzymes that are the workhorse catalysts of biochemical reactions in the cell.

Since last fall, researchers from Duke, UNC Chapel Hill, Stanford and North Carolina A&T have been tackling the problems of mathematically describing the shapes of these proteins, the complex contortions they undergo in carrying out cell processes and how their triggering molecules, called ligands, plug into them. The scientists are working under a $7 million National Science Foundation grant to develop new tools of “computational geometry” that will help biologists take the next key steps in understanding protein shape and function.

Principal investigator on the project is Duke Professor of Computer Science Herbert Edelsbrunner, and his Duke collaborators include Associate Professor of Biochemistry Homme Hellinga and Professor of Computer Science Pankaj Agarwal. They are joined by UNC professors Fred Brooks, Jack Snoeyink and Charles Carter; Stanford professors Jean-Claude Latombe, Leonidas Guibas and Michael Levitt; and North Carolina A&T Professor Solomon Bililign.

According to the project summary, “This research is expected to shed light on some of the most important unsolved biological puzzles: prediction of protein structure, simulation of protein folding and analysis of ligand to protein docking. These processes link form to function.
“Understanding them will pave the way to a post-genomic era in biological research in which the wealth of DNA sequence information is complemented by corresponding knowledge of geometric shape. Together, sequence and shape will provide a description of the biological function so critical for all life.”

According to Edelsbrunner, the prodigious scientific effort needed to move biology into the new realm of three-dimensionality is well worth it.

“Since we live in a three-dimensional world, everything is represented geometrically, so the aim of computational geometry and of this project is to represent geometric shapes and do computing about them,” he said.

In their work, the scientists will seek to combine two approaches to computing that have previously been treated as mutually exclusive, he explained.

“There are really two basic camps in the use of geometric algorithms -- the combinatorial and the numerical,” Edelsbrunner said. “In the combinatorial approach to computational geometry, logic is used at every step of the computations, critically relying on yes or no decisions, and the result being a mathematical tree with branches.

“Numerical approaches use approximations and avoid the problem in combinatorial approaches, in which a single small error can lead to the wrong branch and compound itself, with the algorithm breaking down. Our approach will be a combined one, using whichever method is better for the task.”

Such efforts are complicated by the fact that researchers will first have to understand where the scientific weaknesses lie in the techniques for mathematically modeling proteins.

“In addressing the protein-folding problem it is hard to pinpoint whether there’s an essential weakness in the physical understanding of proteins or in the computing needed to describe them,” Edelsbrunner said.
“In fact, maybe there’s not a particular culprit; maybe the whole system is just not good enough.”

Thus, said Edelsbrunner, he and his colleagues will seek not only to better understand the biology of protein folding, but also to speed up the computing tools, called algorithms, so that biologists can run more simulations of the intricate molecular motions proteins undergo as they fold.

“Say, if we can speed up a modeling algorithm by a factor of a thousand, then people can run more molecular dynamics faster, and rather than waiting a week for their results, they can see them in half an hour. Thus, they can perform more simulations and improve the system faster.”

Importantly, he said, simulating a protein effectively means simulating not just its shape, but how it moves.

“When you address the protein folding problem you must address molecular motion, and the simulation of that motion right now is not good enough,” he said.

For example, he said, in building a mathematical simulation of protein motion, researchers must now largely ignore the smaller vibrations of the thousands of individual atoms in the protein, although those vibrations can prove important to understanding how the larger protein molecule behaves.

“If you try to simulate molecular motion, you may go through thousands of iterations in which the whole system does nothing but vibration most of the time, and then occasionally something large moves,” he said.

Also, Edelsbrunner said, there are few shortcuts to developing the complex simulation techniques.

“The most difficulty we have with our software is that it is so labor-intensive,” he said. “It is sophisticated geometric software that takes a lot of time to develop.
And once you have it, it is great, but you always need an expert to create it.”

According to Edelsbrunner, the research group consists of teams of mathematicians and biologists, each contributing and teaching its expertise to make the software not only mathematically sound, but biologically useful. For example, while the mathematicians understand the intricacies of modeling techniques, the biologists understand the complex analytical technique of X-ray crystallography, in which beams of X-rays shone through crystals of pure protein yield information about the protein’s structure.

The interdisciplinary collaboration will yield not only useful new software for biologists, but also new talent.

“Part of the success of this project will be the education of postdoctoral students in this interdisciplinary area,” he said. “There is a huge lack of computing and mathematical skills in biology right now. And while there are many people interested in the field, there are no training programs. We want to help fill that need.”

The scientists also hope their research will yield new discoveries about the shape of the proteins they study and how they interact with ligands in biochemical reactions.

All in all, Edelsbrunner said, the project will be an experiment not just in modeling protein structures, but in creating fruitful collaborations among computer scientists and biologists to advance the understanding of life’s processes.

“There is certainly a great potential for progress, in which we improve one part of a modeling technique and it allows an advance that enables us to see how to improve or even redesign another part -- and step-by-step, we will build far more useful scientific tools,” Edelsbrunner said.