Project Summary

As a large fraction of the population, represented by the baby boomers in the United States and other countries, ages, addressing the needs of and providing the means for human health care becomes a major challenge facing society in the 21st century. As skyrocketing medical costs become a major constraint on the development of society and the well-being of future generations, we urgently need to develop and implement revolutionary ideas that can offer breakthroughs in providing cost-effective human health care. We propose to establish the Engineering Research Center for Design of Biotherapeutics.

The rapid progress of biotechnology (e.g., genome sequencing, global gene expression profiling, structural genomics, and proteomics) provides us with a global picture of the molecular and systemic blueprint of living organisms (e.g., how genes are expressed and how they interact to carry out the functions of a cell). Advances in computing technology and the computational sciences (e.g., algorithms, high performance computing, data mining, database analysis, artificial intelligence, robotics, visualization) provide the necessary tools and platforms to integrate, interpret, and creatively model such information, and to make predictions based on it. Our vision is to develop and apply quantitative and optimized engineering principles in computational measurement, design, and analysis to turn biomedical knowledge into discoveries in disease pathogenesis, new diagnostic markers, novel therapeutic drugs, and new health treatments. Biomedical researchers alone do not have the required expertise to accomplish these goals.

The specific aims are:

1. Global maps of binding properties of all known protein structures to all known prescription drugs. Many drugs have strong efficacy for diseases other than those for which they were originally developed.
Conversely, side effects of drugs often arise from unintended binding of drugs to protein targets other than those intended. The global maps we develop will provide clear landmarks for using existing drugs against new diseases and for reducing potential side effects when developing new drugs.

2. Developing design principles and computational tools. We will develop predictive models and computational tools that can be used to generate candidate compounds and peptides that inhibit biochemical interactions important for the pathogenesis of various diseases.

3. Discovery of novel diagnostic target proteins and therapeutic compounds. We will experimentally verify the binding properties and efficacy of the designed compounds and peptides, and further refine their properties so that they can be used in clinical trials. This will be an iterative process.

Project Description

4a. List of Participants and Other Supporters

Title of ERC: Engineering Research Center for Design of Biotherapeutics.

Lead Institution: University of Illinois at Chicago, Chicago, IL

Core Partner Institutions:
Argonne National Lab, Argonne, IL
Boston University, Boston, MA
University of California at San Diego, San Diego, CA
University of Illinois at Urbana-Champaign, Urbana, IL

Affiliated Outreach Institutions:
Chicago State University, Chicago, IL

Leadership Team
Director: Jie Liang, Bioengineering, University of Illinois at Chicago
Deputy Director: Prashant Banerjee, Mechanical Engineering, University of Illinois at Chicago
Educational Program Director: Richard Magin, Bioengineering, University of Illinois at Chicago
Educational Outreach Program Director: Evelyn Esquivel, Bioengineering, University of Illinois at Chicago
Industrial Collaboration and Technology Transfer Director: Dave Carley, Bioengineering, University of Illinois at Chicago
Administrative Director: Ying Zhong, Bioengineering, University of Illinois at Chicago

Thrusts

Theory and Algorithm Group.
Thrust Leader: Rong Chen, Information and Decision Science, UIC
Faculty Members in the Thrust:
Yang Dai, BioE, UIC
Bhaskar DasGupta, CS, UIC
Eric Jakobsson, NCSA, UIUC
Simon Kasif, Biomedical Engineering, Boston University
Jie Liang, BioE, UIC
Hui Lu, BioE, UIC
Dan Schoenfeld, ECE, UIC
Klaus Schulten, Physics, UIUC
Shankar Subramaniam, BioE, UCSD

Software, Simulation, and Application Group.
Thrust Leader: Bob Grossman, Mathematics, UIC
Faculty Members in the Thrust:
Prashant Banerjee, ME, UIC
Yang Dai, BioE, UIC
Tom DeFanti, CS, UIC
Eric Jakobsson, NCSA, UIUC
Simon Kasif, Biomedical Engineering, Boston University
Jie Liang, BioE, UIC
Bing Liu, CS, UIC
Hui Lu, BioE, UIC
Klaus Schulten, Physics, UIUC
Shankar Subramaniam, BioE, UCSD
Clement Yu, CS, UIC

Design and Application Group.
Thrust Leader: Michael Johnson, Medicinal Chemistry, UIC
Faculty Members in the Thrust:
Yang Dai, BioE, UIC
Alan Kozikowski, Medicinal Chemistry, UIC
Jie Liang, BioE, UIC
Hui Lu, BioE, UIC
Asra Malik, Pharmacology, UIC
Preb Prabhakar, Microbiology, UIC
Brenda Russell, Physiology, UIC
John Solaro, Physiology, UIC
Carol Westbrook, Medicine, Boston University

Non-Faculty Investigators from National Lab:
Brian Kay, Biological Sciences, Argonne National Lab
Andrzej Joachimiak, Midwest Center for Structural Genomics, Argonne National Lab

Advisory
Charles DeLisi (nominated), Biomedical Engineering, Boston U
Roderick Eckenhoff (nominated), Medicine, U Pennsylvania
Herbert Edelsbrunner (nominated), CS, Duke University
Zhipei Liang (nominated), ECE, UIUC
Ying Xu (nominated), Biochemistry, University of Georgia

Industrial/Agency Supporters
Abbott Laboratories (in contact), Pharmaceutical
Pfizer, Inc (in contact), Pharmaceutical
Eli Lilly (in contact), Pharmaceutical

4b. Vision and Rationale for the ERC.

The vision of the proposed ERC is to bring engineering design principles and computational strategies to bear on developing disruptive technology for drug discovery.
Our focus will build on the wealth of rapidly accumulating data from genome sequencing, proteomics mapping, and structural genomics. Our approach is to address two pressing problems that currently have no solutions.

The first problem is based on the observation that many drugs have strong efficacy for diseases other than those for which they were originally developed. Well-known examples include the antidepressant Prozac and the erectile dysfunction drug Viagra, each of which was originally developed for another disease. Alternative uses of existing prescription drugs provide a rapid route to drug development. However, they are currently discovered only through serendipity.

The second problem is based on the fact that side effects of drugs are a serious issue facing health care, and they often arise from unintended binding of drugs to protein targets other than those intended. Currently there is no systematic strategy for identifying potential side effects of drugs in development, and side effects are a major cause of the high costs associated with each new drug.

The proposed research will transform the process of drug discovery as currently practiced in the pharmaceutical and biotech industries. With the planned comprehensive mapping of protein surfaces and drugs contained in the global atlas of protein targets for all existing prescription drugs, we can systematically examine the potential efficacy of every prescription drug against every protein structure. The proposed research will provide a clear road map of unknown efficacies for existing drugs. In addition, it will bring computational power into the process of drug discovery and generate a clear, early picture of possible side effects of drugs under development.

The assembled research team is uniquely qualified to reach the goals outlined in this ERC preproposal [1-32].
An important recent development is the construction of comprehensive surface topographic maps of protein structures at UIC. A series of studies from the PI’s lab has identified the origin of pockets and voids of proteins [7, 9, 12, 18, 26] and points towards methods for identifying the small number of surfaces that are biologically important, as well as similarity measures that incorporate shape, sequence, and evolutionary information for discovering similar binding properties of protein surfaces.

Dr. Yang Dai’s research focuses on the development of several forms of learning models. The related work includes a new method that enables high quality prediction of drug activity based on large-scale sets of compound descriptors in Quantitative Structure-Activity Relationship (QSAR) analysis; a category of classification methods for predicting peptide binding to molecules; and a graphical learning model that allows the reconstruction of gene regulatory networks from modulated gene expression data. This model is particularly useful for identifying drug target genes and investigating responding genes in the drug response cascade.

Research in the Lu lab focuses on structure and dynamics modeling of protein binding, including protein-protein, protein-DNA, and protein-membrane interactions. For protein-protein interactions, we have developed statistical potentials that can discriminate true binding from false binding, and a threading-based interaction prediction protocol. Furthermore, we have analyzed the dynamic behavior of protein binding sites, which can be used in prediction. For protein-DNA interactions, we have combined machine learning techniques and biophysical analysis to build a systematic framework for predicting transcription factors and their binding sites in DNA promoter sequences. The results can be applied to predict drug effects on gene regulation, including both protein-drug binding and DNA-drug binding.
For protein-membrane interactions, we have focused on signaling proteins and how their binding affinity is changed by interacting with lipid and drug molecules.

Other major contributors include Dr. Bhaskar DasGupta at UIC, who has developed a series of theoretical frameworks, including information-compressed DNA coding for minimal-length discrimination of strains of microbes used as bioterror weapons, protein substructure comparison independent of sequence connections, and optimal solutions to protein-protein networks based on a formulation of coupled ODEs. Dr. Simon Kasif at Boston University has pioneered many important computational techniques essential to gene finding and gene functional annotation, including one of the first Hidden Markov Models used for genome sequencing. He brings a wealth of experience and in-depth expertise, with recent work on Gibbs random fields and Bayesian networks for protein-protein networks. Dr. Shankar Subramaniam at UCSD pioneered an evolutionary approach to systems biology, in which various phenotypes of cells are mapped to specific pathways in the regulatory networks of protein interactions. Dr. Klaus Schulten at UIUC is a world-leading expert in computer simulation of protein dynamics and pioneered parallel and interactive simulation of biomolecules. He brings insight and expertise in large-scale, realistic simulation of proteins and their interactions. Dr. Eric Jakobsson at NIH, on leave from UIUC, is a world leader in protein membrane simulation with a focus on ion channels. He brings expertise in ion channels, a large class of drug targets.

In addition, the proposed ERC will draw on the expertise in visualization and virtual reality of the EVL lab at UIC, where Associate Director Dr. Prashant Banerjee, who has worked closely with Dr.
Tom DeFanti in developing state-of-the-art visualization techniques, will lead the effort to develop an immersive virtual environment for protein-drug discovery, enabling iterative refinement of design strategies with the medicinal chemists on this proposal.

In addition to developing breakthrough enabling technology, the proposed ERC will also have a major impact on education. UIC currently has the only Bioinformatics PhD/MS degree program in the state of Illinois approved by the Illinois Board of Higher Education. With 7-8 incoming students annually, the proposed research will directly involve student teams in BioE, CS, ECE, ME, and other disciplines in all phases of the proposed design, computation, analysis, and experimental validation. Currently, two MD/PhD students are co-trained in the bioinformatics group and the UIC medical school, and one PhD student is supported by the cardiovascular training grant of the UIC physiology department. We plan to strengthen and expand our co-training of students in both computational and experimental skills.

The most important barrier to realizing the goals of the proposed ERC is the integration of the theory, computation, and experimental groups. To address this issue, we have assembled strong research teams in each of these areas with cross appointments, so that experts in different areas will provide timely advice and their voices will be attentively heard. Because of the innovative nature of the proposed research, and because of the human and financial resources these projects demand, an NSF ERC is the only route towards rapidly realizing these goals.

4c. Strategic Research Plan and Research Program.

We describe the key aspects of the proposed research plan.

Global atlas of protein targets for all existing prescription drugs.
Based on our computational methods for characterizing the binding surface properties of proteins, we are now ready to obtain comprehensive knowledge of both sequence and spatial similarity of protein pockets and voids on a genome scale. Using the newly developed shape similarity measure oRMSD [12] and the traditional cRMSD, we can discover functionally relevant similar surfaces for many proteins. An example is the strong similarity between the binding surface of HIV-1 protease and that of the chaperone protein HSP-90 [12]. This can be rationalized by the fact that both bind peptide or peptide-like substrates in cellular events. This functional relationship was previously unknown, and it points to the potential for identifying alternative target proteins for existing drugs. We believe the proposed global atlas will provide a clear road map of unknown efficacies for existing drugs [7, 12, 13].

Structural Basis of Disease-Causing nsSNPs.

An important result of analyzing protein structures is relating genotypes to phenotypes. From a study of nonsynonymous SNPs (nsSNPs), we found that non-disease-associated nsSNPs are under different selection pressure than disease-associated ones [14]. nsSNPs not associated with disease occur more frequently at positions where the wild-type residue already differs from the consensus residue, whereas disease-associated nsSNPs are more likely to be mutations from the consensus residue. This result suggests that consensus residues are phenotypically important. In addition, the majority (88%) of disease-associated nsSNPs are located in voids or pockets of proteins; they are infrequently observed in the buried interior (3.2%) or in convex/shallow surface regions. This suggests that protein-protein interaction and membrane binding interfaces are not a major source of disease-causing nsSNPs.
For SNPs mapped to the protein interior, disease-associated mutations are more likely to be at conserved residues, but nsSNPs found in pockets and voids are not strongly conserved. In contrast, nsSNPs in the control dbSNP database do not show these clear patterns [9, 14].

Global image of binding side effects of all known drugs on all known proteins.

Protein targets through systems biology.

To further identify drug potency and side effects, we plan to map all protein-protein and protein-DNA interactions onto regulatory, signaling, and metabolic pathways. Very rarely does a protein perform a single biological function. For example, our visual perception involves many proteins (e.g., rhodopsin, G-protein, phosphodiesterase) that form a complex network [23-25]. Understanding the structure and dynamics of this complex web of interactions is therefore crucial for identifying drug targets and drug effects. We will focus on protein interaction networks and transcription factor-DNA interactions that are important for developmental biology and for cancer development.

Origin of Pockets and Voids in Proteins.

From a comparative study of protein packing using off-lattice models and real proteins, we found that the widely used radius of gyration is a poor parameter for describing protein packing [26, 28, 32]. Instead, we developed an alpha contact number based on the weighted Delaunay triangulation and the alpha shape of the protein [26]. It characterizes protein packing well. We also showed that a protein-like scaling relationship between packing density and chain length is observed in off-lattice self-avoiding walks. Our results suggest that protein-like packing is a generic feature of random polymers satisfying a loose constraint on compactness. The main conclusion we draw is that proteins are not optimized by evolution to eliminate packing voids, and that the existence of a large number of pockets and voids is a natural result of the basic packing requirements of proteins.
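
For reference, the radius of gyration mentioned above is conventionally the root-mean-square distance of a set of atoms from their centroid. The following is a minimal sketch of that standard definition only, assuming equal weights for all atoms; the function name and the toy coordinates are hypothetical and are not part of the proposed software. It does not implement the alpha contact number, which requires a weighted Delaunay triangulation.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of 3D points from their centroid.

    Assumption: all points carry equal weight (no atomic masses).
    """
    n = len(coords)
    if n == 0:
        raise ValueError("coords must not be empty")
    # Centroid of the point set.
    cx = sum(x for x, _, _ in coords) / n
    cy = sum(y for _, y, _ in coords) / n
    cz = sum(z for _, _, z in coords) / n
    # Mean squared distance from the centroid.
    msd = sum(
        (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords
    ) / n
    return math.sqrt(msd)


# Hypothetical toy coordinates: two points one unit on either side of the origin.
toy = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
rg = radius_of_gyration(toy)
```

Because this quantity depends only on the overall spread of the coordinates, it carries no information about interior voids or pockets, which is consistent with the observation that it describes packing poorly.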
These findings point to the importance of discriminating pockets and voids that are biologically relevant from those formed by random packing.

Membrane Proteins.

Membrane proteins are an important class of proteins of direct medical relevance, as >50% of drug targets are G-protein coupled receptors, which are 7-helical transmembrane proteins. In our membrane protein studies, we plan to develop a set of interhelical three-body interaction parameters for helical membrane proteins. We plan to identify high-propensity triplets and determine whether they are involved in interhelical H-bond interactions. We also hope to discover spatial motifs of triplet interactions with similar parallel or antiparallel orientations and handedness. These spatial motifs will be examined for correlation with sequence motifs obtained from genome analysis. We will specifically study the role of interhelical H-bonds in the function and stability of the Ca$^{++}$-transporting ATPase as a pilot system. These studies are important because the results will be incorporated into computational methods for predicting membrane protein structure and will be useful for studying the details of membrane protein folding. The assembled PIs have extensive experience in studying membrane proteins [15-17, 19, 22, 30, 31].

Comprehensive global structural atlas of protein-protein interaction maps.

Protein drugs are a new class of therapeutic agents that can interfere with many important protein-protein interactions. They are the wave of the future and represent the fastest-growing area in the pharmaceutical industry. We plan to devise a method for engineering optimally biased peptide libraries, which will lead to the creation of novel protein drugs with low side effects. The PI’s lab has recently started several projects on protein-protein interactions and on peptide design [1, 2].

Experimental Design and Test.
To test whether existing drugs indeed have additional efficacy against alternative target proteins, we will first use in vitro binding assays to assess the binding affinities between purified target proteins and existing prescription drugs. Dr. Michael Johnson, a world expert in computational drug discovery who leads the $15 million NIH Center of Pharmaceutical Technology at UIC, will lead this effort.

Theoretical Development.

The outcome of the proposed ERC research will also include major algorithmic and theoretical model development. Specifically, we will introduce various geometric and topological descriptions of protein structures, including the recently developed mixed cell theory from Dr. Edelsbrunner’s group, for analyzing the patterns of protein-ligand and protein-protein interactions. Additionally, we will make major contributions to the development of approximation algorithms for protein structures, constrained sampling methods for high-dimensional spaces, and forced molecular dynamics simulations.

Organization and Structure.

Theory and Algorithm Group: formulate problems, develop theoretical models, and design efficient algorithms for protein structures, binding surfaces, reconstruction of evolutionary history, dynamics, and protein folding and binding (Chen, Dai, DasGupta, Jakobsson, Kasif, Liang, Lu, Schoenfeld, Schulten, Subramaniam).

Software, Simulation, and Database Group: develop and implement software tools for high performance computing, for interactive modeling and visualization, and for large-scale simulation of the interactions of all proteins versus all drugs (Banerjee, DeFanti, Grossman, Kasif, Liang, Liu, Lu, Schulten, Subramaniam, Tsai, Yu).

Design and Application Group: design small molecule compounds, peptide modulators, and protein drugs based on algorithm development and computational simulations.
Design and formulate biological experiments for data gathering, for verifying predictions, and for formulating new design strategies (Dai, Joachimiak, Kay, Johnson, Liang, Lu). Carry out final tests of the potency and efficacy of designed compounds, peptides, and protein drugs in test tube systems and in other assay systems. We will focus on disease systems that are currently research topics of team members. These include: inflammatory and immune responses and vaccine development for anthrax and other microbes (Prabhakar); cardiac muscle contraction and heart failure, for regulation of protein synthesis (Solaro, Russell); and inflammatory diseases and cancer metastasis (Malik and Westbrook).

4d. Education Program.

An independent bioinformatics degree program has been set up within the Bioengineering department. The program awards PhD and MS degrees in bioinformatics. The graduate curriculum includes 7 newly developed bioinformatics courses ranging from sequence analysis to structure and dynamics, as well as related mathematics and algorithms. Students are also required to take 2 to 3 biology courses from the UIC medical school and several algorithm and mathematics courses from the Engineering campus. Currently there are 30 graduate bioinformatics students, who will naturally be included in the proposed ERC. The students will participate in various components of the proposed research activities. Each year, we receive 50 to 100 applications from candidates with backgrounds ranging from engineering to physics, chemistry, and biology. We are able to recruit top-level graduate students. The three PhD graduates of our young program are in top labs as postdocs or in large pharmaceutical companies. Besides the formal training in the bioinformatics graduate program, we also teach bioinformatics and computational biology in the UIC medical school, for example through lectures in the courses “Biology methods” and “medical chemistry”. These activities will be expanded once the ERC has been set up.

Outreach Program.

High school students usually have little exposure to bioinformatics. Building on the successful experience of the UIC Bioengineering high school summer camp, we will work closely with the Chicago Public School system to increase the exposure of high school and community college students from the Chicago metropolitan area to bioinformatics research. The bioinformatics lectures and projects assigned to high school students in the past three BIOE summer camps were very well received. Our future outreach plan will be carried out through: (1) summer camp activities, (2) visits to local high schools, (3) hosting of interns in participating research labs, and (4) development of educational and career material on bioinformatics.

Diversity Program.

The lead institution, UIC, is an urban research institution. It has a much higher percentage of minority and women students, who are traditionally underrepresented in engineering, providing a great opportunity for our diversity program to train students in teaching, curriculum development, and research. Our outreach efforts will also focus on attracting students not traditionally represented in academic research. We will enhance our current collaboration with Truman College, part of the City Colleges of Chicago. We will co-develop curriculum and train their instructors to teach bioinformatics courses at the Associate Degree level.

4e. Industrial Collaboration and Technology Transfer Program.

We plan to disseminate our results through collaborations with industry and to transfer technologies developed during the funded project period. Extensive industrial collaborations and technology transfer activities already exist at participating labs at UIC, Boston U, UIUC, and UCSD. We will continue our summer intern program with Pfizer and explore additional opportunities in industry for our students and trainees.