Chapter 21
Predictive protein networks and identification of drugable targets in the beta-cell

Joachim Størling and Regine Bergholdt

Joachim Størling and Regine Bergholdt, Hagedorn Research Institute, Niels Steensensvej 1, DK-2820 Gentofte, Denmark. E-mail: jstq@hagedorn.dk, rber@hagedorn.dk

Abstract
A prerequisite for designing good drugs that succeed in clinical development, with the final goal of treating human diseases, is a detailed understanding of the mechanisms underlying disease. This is particularly true for complex diseases such as diabetes. It has become increasingly clear that complex traits or phenotypes are the result of an interplay between environmental factors and numerous genes and proteins that jointly affect the functionality of biological systems. Since interactions between proteins in networks and pathways make up biological systems, it is essential that we learn more about how networks and pathways are influenced by environmental factors and genetic variation, and how such influences cause disease. In this chapter, we discuss recent data, advances and ideas on how more valid drugable targets to treat diabetes may be predicted by the application of bioinformatics and systems biology.

Keywords: beta-cells, diabetes etiology, drug targets, GWAS, phenotype description, protein networks, systems biology

21.1 The need for new ways of identifying drugable targets
Tens of billions of euros and dollars are spent each year by the pharmaceutical industry on the development of new drugs to treat human diseases. However, drug discovery is an extremely expensive and risky business, and despite the enormous investment, the rate of failure of drug candidates in clinical development is dreadfully high.
One explanation for this is that the strongly restricted genetic and epigenetic backgrounds and environmental settings of the simple animal and in vitro cell systems used to model human disease and for preclinical drug testing differ greatly from the genetically, environmentally and epigenetically much more heterogeneous human population. Another explanation is that drug discovery has traditionally aimed at designing drugs against targets considered to affect simple biological systems or signalling pathways, and such an approach represents an exceedingly simplistic view of the mechanisms underlying complex human diseases [1]. Improving the success rate of drugs in clinical development will require new approaches to pinpoint more valid drug target candidates for preclinical testing. Obviously, a prerequisite for this will be an improved understanding of disease mechanisms and increased insight into the complex biological systems in tissues and cells in a heterogeneous human population. This is the true challenge. It calls for innovative ways of studying disease and disease model systems, and it highlights the need for systems biology and bioinformatics approaches.

21.2 How can drug target identification be optimized?
Improved prediction of valid drug targets will require increased insight into the specific biological and molecular systems in tissues and cells that are responsible for causing disease. Most human diseases, including type 1 and type 2 diabetes, which result from complete or relative destruction and dysfunction of the beta-cells, are caused by a complex interplay between environment and genes. The interaction between environmental factors and the genetic background of an individual affects susceptibility to disease and progression of disease. The response to drug treatment is also determined by the individual's specific environmental and genetic settings.
Different genes contributing to a specific phenotype may encode proteins involved in the same biological system or in its regulation. Therefore, causal genes in complex diseases can be expected to affect the functionality of the same protein networks and pathways. If we can improve the prediction, identification, functional validation and characterization of networks involved in disease, both in carefully selected model systems and in humans, we will have a greatly increased likelihood of choosing the most reliable drugable targets for drug development. This will increase the chance that a drug survives clinical development.

Current drugs to treat type 2 diabetes work by increasing beta-cell insulin secretion, decreasing the amount of glucose released from the liver, increasing the sensitivity of cells to insulin, decreasing the absorption of carbohydrates from the intestine, and slowing emptying of the stomach to delay the presentation of carbohydrates for digestion and absorption in the small intestine. Drugs increasing insulin output by the beta-cells have been widely used to treat type 2 diabetes and represent the existing group of diabetes drugs directly targeting the beta-cells. These medications belong to a class of drugs called sulfonylureas, which increase insulin secretion by inhibiting ATP-regulated K+ channels, leading to plasma membrane depolarization and an influx of Ca2+ that triggers insulin-containing vesicles to fuse with the plasma membrane and release insulin. Sulfonylureas are ineffective where there is absolute deficiency of insulin production, as in type 1 diabetes. Development of novel drugs targeting the beta-cell may represent new ways of increasing insulin secretion in type 2 diabetes and/or preserving beta-cell mass and insulin secretory capacity in type 1 diabetes.

How do we obtain a better knowledge of the pathological mechanisms, i.e.
which protein networks and pathways lie behind disease, and what kind of data can be exploited for this purpose? Knowledge about disease mechanisms and pathologies is to a large extent based on data from animal models and cell systems. However, translation of results from animal and in vitro experiments to humans is often difficult because the environmental and genetic settings of model systems are much too simple. Therefore, drug targets should preferentially be identified from a platform of human data.

“Integrative genomics” is an emerging, promising field for tackling complex disease. It provides increased knowledge about the functional mechanisms underlying disease and thereby an approach to increase our understanding of disease pathogenesis. Disease-associated networks today are, however, based on incomplete data: we have not yet characterized rare variation or copy number variation, and we do not know enough about non-coding RNAs, alternative splicing, genetic isoforms, heterogeneity among populations, or dynamics in molecular systems. Most biological systems are characterized by considerable redundancy, and therefore the analysis of genes and proteins in the context of their networks will provide the most important functional and quantitative information. Networks should be seen as a framework for exploring the context in which a given gene operates and for causally associating networks with physiological states associated with disease. This will lead to a more comprehensive understanding and view of disease than examination of individual components of the network. Integrating data such as DNA variations, gene expression data, DNA-protein binding, protein-protein interactions and molecular phenotypic data may allow construction of more comprehensive networks and thereby improve understanding of the molecular processes underlying disease.
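The idea of integrating several data types into one network can be illustrated with a minimal Python sketch. It is not a method taken from the studies cited in this chapter. It assumes that each data type has already been reduced to a list of gene pairs, and the gene names, evidence labels and the `min_support` threshold are hypothetical placeholders.

```python
from collections import defaultdict


def integrate_layers(layers):
    """Merge per-data-type edge lists into one undirected network.

    layers: dict mapping an evidence label to an iterable of (a, b) gene pairs.
    Returns a dict mapping each edge (a frozenset of two genes) to the set of
    evidence labels that support it.
    """
    network = defaultdict(set)
    for label, edges in layers.items():
        for a, b in edges:
            if a != b:  # ignore self-interactions in this sketch
                network[frozenset((a, b))].add(label)
    return dict(network)


def neighbours(network, gene, min_support=1):
    """Return partners of `gene` supported by at least `min_support` data types."""
    partners = set()
    for edge, labels in network.items():
        if gene in edge and len(labels) >= min_support:
            partners |= edge - {gene}
    return partners


# Hypothetical toy data; the gene names are placeholders, not real findings.
layers = {
    "protein_protein": [("GENE_A", "GENE_B"), ("GENE_B", "GENE_C")],
    "coexpression": [("GENE_B", "GENE_A"), ("GENE_C", "GENE_D")],
    "dna_protein_binding": [("GENE_A", "GENE_D")],
}
network = integrate_layers(layers)
print(sorted(neighbours(network, "GENE_A")))     # every partner of GENE_A
print(sorted(neighbours(network, "GENE_A", 2)))  # partners backed by two or more data types
```

Keeping the evidence labels on each edge, rather than collapsing them into a single score, makes it possible to ask later which interactions are supported by independent data types.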

21.2.1 GWAS and systems biology
That diabetes has a strong genetic component is underlined by the fact that the concordance rate for both type 1 and type 2 diabetes is up to ~70% in monozygotic twins [2, 3]. Genetic variation may influence protein networks, and thus cellular function, at several different levels. Changes in amino acid sequence, alterations in protein expression, modification of enzymatic activity etc. can be the result of genetic variation. Such changes to proteins can perturb the functionality of protein networks. Depending on the degree of disturbance of network function, this can lead to cellular malfunctioning, changes in phenotype, and ultimately to disease. However, genetic variation may account for different levels of risk for disease in different individuals, suggesting that integrative methods for gene discovery are necessary. With the advent in recent years of huge amounts of data from genome-wide association studies (GWAS), transcriptomics and proteomics experiments etc., increasing focus is now on interactions between DNA, RNA and proteins and whole-system physiology, as well as on integration of large-scale, high-throughput molecular and physiological data with clinical data.

Genome-wide association studies in complex diseases are producing an unprecedented amount of genetic data. However, identifying the individual genes can be difficult because each contributes only weakly to the pathology. Alternatively, identification of entire cellular systems involved in a particular disease could be attempted. Such a strategy should be feasible in many different complex diseases, since most genes exert their function as members of molecular networks, where groups of proteins contributing to disease may be expected to affect the same biological pathways.
This is supported by the experimental finding that the expression of genes involved in oxidative phosphorylation is coordinately downregulated in human diabetic muscle [4]. Analysis of an entire disease-related biological system might provide insight into the molecular etiology of the disease that would not emerge from isolated functional studies of single genes.

It is clear that the results of e.g. GWAS do not by themselves directly identify clinically useful drug targets, but by integrating GWAS data with other types of data and more refined phenotyping, this may well be possible. Genetic disease loci for diabetes typically confer only modest disease risk, and the causal genes are known for very few of them. Even replicated disease associations do not provide clues about the functional role of a given candidate gene. A genetic association is not enough for drug development strategies. There is no doubt that additional functional support is needed, such as evaluating potential causal genes in the broader biological context in which they operate. The most likely causal candidate gene for an association may or may not be the gene in closest proximity to the associated single nucleotide polymorphism (SNP). However, a combination of such knowledge with an evaluation of the biological function of the genes, e.g. in expression profiling studies under disease-relevant conditions and in functional studies, may provide insight into the mechanistic nature of complex traits beyond what human genetic association studies can do alone. Use of molecular traits can enhance the interpretation of GWAS results by putting them into a broader biological context and ultimately elucidate the networks defining disease-associated processes.
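The point that the causal gene need not be the one nearest to the associated SNP can be sketched in Python as follows. This is an illustrative sketch only. The 500 kb window, the gene coordinates and the set of differentially expressed genes are assumptions made for the example and are not taken from the cited studies.

```python
def positional_candidates(snp_pos, genes, window=500_000):
    """List genes within `window` base pairs of a SNP, nearest first.

    genes: list of (name, start, end) tuples on the same chromosome as the SNP.
    The default window size is an arbitrary assumption for illustration.
    """
    hits = []
    for name, start, end in genes:
        if start <= snp_pos <= end:
            distance = 0  # the SNP lies inside the gene
        else:
            distance = min(abs(snp_pos - start), abs(snp_pos - end))
        if distance <= window:
            hits.append((name, distance))
    return sorted(hits, key=lambda hit: hit[1])


def prioritise(candidates, differentially_expressed):
    """Rank candidates with functional support first, then by distance."""
    return sorted(
        candidates,
        key=lambda hit: (hit[0] not in differentially_expressed, hit[1]),
    )


# Hypothetical toy data; names and coordinates are placeholders.
genes = [
    ("GENE_NEAR", 1_010_000, 1_050_000),
    ("GENE_FAR", 1_300_000, 1_340_000),
    ("GENE_OUT", 2_000_000, 2_050_000),
]
candidates = positional_candidates(1_000_000, genes)
ranked = prioritise(candidates, differentially_expressed={"GENE_FAR"})
print(candidates)  # ordered by distance alone
print(ranked)      # the gene with expression support is moved to the top
```

In this toy case the more distant gene is ranked first once expression evidence is taken into account, which mirrors the argument that proximity alone is an insufficient criterion.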

21.2.2 Moving from genomes to networks
If genetic data are integrated with networks of physically and functionally interacting proteins, this is likely to increase the probability of identifying positional candidate disease genes and proteins (Fig. 21.1).

Figure 21.1. Mapping of genetic loci onto a human interaction network. The creation of networks based on protein-protein interactions of proteins encoded by genes in genetic regions associated with disease allows identification of “disease” networks, i.e. networks that are enriched for proteins encoded by genes in these regions.

Many disease-associated genes are known today. The challenging task now is to understand how they affect disease risk and how to select key proteins for drug development. As mentioned, diabetes involves multiple interacting genetic determinants, representing functional relationships between genes, in which the phenotypic effect of one gene may be modified by another. New strategies for detecting sets of marker loci that are linked to multiple interacting disease genes are therefore in demand. Data mining methods have been used to evaluate genetic interactions [5], and in that report the importance of the predicted genetic interactions was supported by comprehensive, high-confidence protein-protein interaction networks of the corresponding regions. This allowed identification of candidate genes of likely functional significance in type 1 diabetes, and it suggests genetic epistasis in a multi-factorial disease, supported by protein network analysis with implications for functionality [5, 6].

Another approach for selecting candidate genes of functional importance is transcriptional profiling. Intermediate between DNA variations and variation in phenotype are variations in gene expression, protein expression, protein state and metabolite levels.
Such intermediates are believed to respond to variations in DNA and then potentially lead to changes in phenotype and disease state. Following identification of genes, there is a huge demand for functional genomics. The number of identified susceptibility genes may continue to grow, and elucidating their function will be important for understanding the molecular pathogenesis of disease. The approaches used will vary according to the function of the genes, but may include expression studies and generation of transgenic and knockout animal models. Whereas the genome is rather static, interaction networks are more dynamic and dependent on the biological context. They might be active only under certain conditions, in certain cell types or at certain stages of development. Ideally, all conditions and cell types should be tested to capture this presumed variability.

Functional interaction networks (interactomes) may be a valuable tool for prioritizing positional candidate genes in genetic association or linkage intervals. If the intervals obtained for a disease are queried for functional interactions with each other and related to phenotype information for the disease, this holds promise for selection of putative disease genes for further investigation [7, 8]. Such studies have the potential to identify new, previously unrecognized components of disease mechanisms, as well as to pinpoint the most important protein complexes involved. Furthermore, many diseases have overlapping clinical manifestations or sub-phenotypes, and it could be speculated that this reflects genetic variation in the same functional pathways. The existence of so-called disease sub-networks has been suggested. It was demonstrated that proteins encoded by genes mutated in one inherited genetic disorder were likely to interact with proteins known to cause similar disorders, presumably by sharing common underlying biochemical mechanisms [7].
The feasibility of constructing such functional human gene networks has been demonstrated and applied to positional candidate gene identification [9]. It was shown that obvious candidate genes are not always involved, and that taking an unbiased approach to finding candidate genes, e.g. by using functional networks, may result in new testable hypotheses [9].

21.2.3 Moving from networks to phenotypes
A systematic, large-scale analysis of human protein complexes comprising gene products implicated in many different categories of human diseases has been used to create a “phenome-interactome network” [8]. This was the first study to explain disease phenotypes by genome-wide mapping of genetic loci onto a human interaction network. This strategy was expanded to include epistasis and statistical methods for evaluating the significance of the deduced networks [5]. With this method, protein interaction networks were used to examine whether gene products from interacting genetic regions could also be shown to interact in biological pathways. Support for physical interactions at the protein level was suggested for all the predicted genetic interactions [5], representing a novel exploration of integrative genomics. The resulting networks point directly to novel candidates visualized in the context of their interaction network, potentially providing even further biological insight.

Another study evaluated changes at the proteome level after exposure of pancreatic insulin-producing cells to proinflammatory cytokines, resembling the inflammatory milieu surrounding the islets in type 1 diabetes. That study demonstrated a large protein interaction network containing many of the differentially expressed proteins [10].
Despite the use of different species and model systems and unknown dynamic differences in the transcriptome and proteome, a significant overlap existed between the genes pinpointed in this study [10] and in other studies [5, 6]. This provides evidence that common networks and pathways can be identified using different model systems, and it underlines the power of integrating protein-protein interaction data with genetic data and expression profiling.

Major histocompatibility complex (MHC) fine-mapping data have been analyzed by the same approach to characterize the MHC susceptibility interactome [11]. This approach allowed identification of functionally important genes and gene-gene interactions independent of the genetic linkage disequilibrium that characterizes the MHC region, as protein-protein interactions are unlikely to depend on linkage disequilibrium between the genes encoding the proteins.

Approaches like these may be valuable in prioritizing candidate genes in linkage regions or in disease-associated regions in which the disease gene(s) are not known. Information on whether genes from the different observed loci interact at a functional level is potentially interesting. Obviously, the input information is crucial for the success of such an approach. Studies will be biased by the absence of complete functional information in databases for the majority of genes, and interaction databases are also far from complete. However, hypotheses generated with existing knowledge may be of value, and genes that would otherwise not have been predicted to be involved in the disease in question might be identified this way. The amount of data in databases is rapidly increasing. This includes increased knowledge regarding genes, proteins and the interactions among them, methods integrating high-throughput genomic and proteomic approaches, and text mining methods that extract functional relationships from the literature.
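The question of whether genes from different loci interact at the protein level can be sketched in a few lines of Python. This is a simplified illustration, not the statistical procedure used in the cited studies. It assumes that each gene belongs to exactly one locus, and the loci, genes and interactions are hypothetical placeholders.

```python
from collections import Counter


def cross_locus_interactions(loci, interactions):
    """Return interactions that connect genes from two different loci.

    loci: dict mapping a locus name to a set of gene names.
    interactions: iterable of (gene_a, gene_b) protein-protein interaction pairs.
    Assumes (for this sketch) that every gene belongs to exactly one locus.
    """
    gene_to_locus = {gene: locus for locus, genes in loci.items() for gene in genes}
    pairs = []
    for a, b in interactions:
        locus_a, locus_b = gene_to_locus.get(a), gene_to_locus.get(b)
        # Keep the pair only if both genes are positional candidates
        # and they come from different loci.
        if locus_a is not None and locus_b is not None and locus_a != locus_b:
            pairs.append((a, b, locus_a, locus_b))
    return pairs


def rank_by_cross_locus_partners(pairs):
    """Count, per gene, how many cross-locus interactions it takes part in."""
    counts = Counter()
    for a, b, _, _ in pairs:
        counts[a] += 1
        counts[b] += 1
    return counts


# Hypothetical toy data; locus and gene names are placeholders.
loci = {
    "locus_1": {"GENE_A", "GENE_B"},
    "locus_2": {"GENE_C"},
    "locus_3": {"GENE_D", "GENE_E"},
}
interactions = [
    ("GENE_A", "GENE_C"),  # links locus_1 and locus_2
    ("GENE_A", "GENE_B"),  # same locus, ignored
    ("GENE_C", "GENE_D"),  # links locus_2 and locus_3
    ("GENE_E", "GENE_X"),  # GENE_X is not in any locus, ignored
]
pairs = cross_locus_interactions(loci, interactions)
counts = rank_by_cross_locus_partners(pairs)
print(pairs)
print(counts.most_common(1))
```

A real analysis would additionally need a significance estimate, for example by comparison with randomly chosen loci, because an incomplete interaction database biases which pairs can be found at all.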

Candidate genes involved in putative interaction networks should be examined further, not only at the single-gene level but also in the context of the networks of which they form an integral part. For example, mRNA expression levels for each gene can be evaluated under different relevant conditions. Genes with differential regulation are believed to be the most important. This approach was recently used to evaluate predicted interaction networks in type 1 diabetes [6]. Differential regulation of several genes was demonstrated, e.g. after cytokine exposure of human pancreatic islets, supporting the prediction of the interaction network as a whole as a risk factor. In addition, enrichment of type 1 diabetes-associated SNPs in the individual interaction networks was measured to evaluate evidence of significant association at the network level. This method provided additional support, in an independent dataset, that some of the interaction networks could be involved in type 1 diabetes [6].

21.2.3 Future directions
Systems biology approaches complement more classical analyses of the genetics of complex diseases. They may shed light on the underlying biological pathways and help us understand the complex interplay between the multiple factors contributing to disease pathogenesis. Combining GWAS, protein networks, molecular biology studies and phenotype data in the search for functional candidates for observed genetic associations has been shown to be a feasible approach [5, 8]. Characterization of the phenotypic effects of SNPs on gene expression or on protein function or interaction will provide a more efficient approach to the identification of risk variants and will provide insights into possible mechanisms whereby these variants modify disease risk. Focusing on the interplay between many components in modules or systems may demonstrate how defects in such modules can lead to human disease.
Such an understanding is likely to be helpful in defining new key targets for prediction, prevention and improved therapeutic responsiveness. Elucidation of networks and signaling pathways associated with disease, and examination of the effects of combinations of experimental changes and variations, are important in drug discovery and a prerequisite for translating results into clinically useful predictors of disease and drug targets. Interaction networks can identify subnetworks corresponding to functional units in the biological system. Subnetworks associated with disease may link molecular biology to physiology and thereby to clinically relevant issues, and the aim is that predictive gene networks can lead directly to discovery of drug targets and biomarkers of disease.

To identify drug targets, it is necessary to understand how the causal genes function and act in their biological context. Genes identified in a GWAS may not be chemically suitable as drug targets. However, proteins in the same signaling pathway may constitute more rational and better drug targets. Disease-associated genetic loci, and the intermediate molecular phenotypes that are connected with these loci and cause disease, are obvious starting points for uncovering the drivers of disease. It is important to evaluate perturbations of networks and pathways, with the potential thereby to identify key steps or nodes that drive disease and that may act as targets for therapeutic intervention. To develop disease therapies by targeting a given gene, it is necessary to know whether activation, inhibition or partial activation leads to disease [12]. We can now begin to understand the context in which a gene operates and thereby suggest the best possible points of therapeutic intervention [12].

Figure 21.2. Strategy for drug target identification.
Genome-wide association scan data, alone or integrated with transcriptomics, proteomics or epigenetics data etc., are used as “input” data. Protein-protein interaction data and the application of bioinformatics and systems biology allow in silico generation of networks. Text mining analysis of these networks for enrichment of proteins with association to the disease phenotype leads to a score and ranking of each network. This results in a list of potential candidate proteins whose functional relevance can be tested in model systems using e.g. RNA interference. From the outcome of the functional studies, the most promising drugable targets are selected for drug development. Seen as a whole, this method starts from a platform of thousands of data points and narrows down the number of candidate proteins step by step, ultimately resulting in identification of a small number of plausible drug targets.

Systems biology approaches to developing drugs to treat human diseases are of high interest, and with the high cost of developing novel therapies, improved ways of selecting valid drug target candidates are extremely important. Novel and highly interdisciplinary systems biology approaches are likely to identify networks from which the most rational target can be selected. We are still far from a comprehensive understanding of the molecular pathogenesis of multi-factorial diseases. This makes it difficult to identify optimal strategies for intervention and treatment. The recent success of GWAS and the prospects for combining genetics with high-throughput genomics, as well as general advances in genome informatics, genotyping technology, statistical methodology and large clinical materials, are sources of optimism for the future.

References
1. Zhu, J., B. Zhang, E.E. Schadt, D.C. Rao, and C.C. Gu, A Systems Biology Approach to Drug Discovery, in Advances in Genetics. 2008, Academic Press. p. 603-635.
2. Hyttinen, V., J. Kaprio, L. Kinnunen, M. Koskenvuo, and J. Tuomilehto, Genetic liability of type 1 diabetes and the onset age among 22,650 young Finnish twin pairs: a nationwide follow-up study. Diabetes, 2003. 52(4): p. 1052-1055.
3. Ridderstråle, M. and L. Groop, Genetic dissection of type 2 diabetes. Molecular and Cellular Endocrinology, 2009. 297(1-2): p. 10-17.
4. Mootha, V.K., C.M. Lindgren, K.-F. Eriksson, A. Subramanian, S. Sihag, J. Lehar, P. Puigserver, E. Carlsson, M. Ridderstrale, E. Laurila, N. Houstis, M.J. Daly, N. Patterson, J.P. Mesirov, T.R. Golub, P. Tamayo, B. Spiegelman, E.S. Lander, J.N. Hirschhorn, D. Altshuler, and L.C. Groop, PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nature Genetics, 2003. 34(3): p. 267-273.
5. Bergholdt, R., Z. Størling, K. Lage, E. Karlberg, P. Òlason, M. Aalund, J. Nerup, S. Brunak, C. Workman, and F. Pociot, Integrative analysis for finding genes and networks involved in diabetes and other complex diseases. Genome Biology, 2007. 8: p. R253.
6. Bergholdt, R., C. Brorsson, K. Lage, J.H. Nielsen, S. Brunak, and F. Pociot, Expression Profiling of Human Genetic and Protein Interaction Networks in Type 1 Diabetes. PLoS ONE, 2009. 4(7): p. e6250.
7. Gandhi, T.K.B., J. Zhong, S. Mathivanan, L. Karthick, K.N. Chandrika, S.S. Mohan, S. Sharma, S. Pinkert, S. Nagaraju, B. Periaswamy, G. Mishra, K. Nandakumar, B. Shen, N. Deshpande, R. Nayak, M. Sarker, J.D. Boeke, G. Parmigiani, J. Schultz, J.S. Bader, and A. Pandey, Analysis of the human protein interactome and comparison with yeast, worm and fly interaction datasets. Nature Genetics, 2006. 38(3): p. 285-293.
8. Lage, K., E. Karlberg, Z. Størling, P. Olason, A. Pedersen, O. Rigina, A. Hinsby, Z. Tümer, F. Pociot, N. Tommerup, Y. Moreau, and S. Brunak, A human phenome-interactome network of protein complexes implicated in genetic disorders. Nature Biotechnology, 2007. 25(3): p. 309-316.
9. Franke, L., H. van-Bakel, L. Fokkens, E.D. de-Jong, M. Egmont-Petersen, and C. Wijmenga, Reconstruction of a functional human gene network, with an application for prioritizing positional candidate genes. American Journal of Human Genetics, 2006. 78(6): p. 1011-1025.
10. D'Hertog, W., L. Overbergh, K. Lage, G.B. Ferreira, M. Maris, C. Gysemans, D. Flamez, A.K. Cardozo, G. Van den Bergh, L. Schoofs, L. Arckens, Y. Moreau, D.A. Hansen, D.L. Eizirik, E. Waelkens, and C. Mathieu, Proteomics Analysis of Cytokine-induced Dysfunction and Death in Insulin-producing INS-1E Cells: New Insights into the Pathways Involved. Mol Cell Proteomics, 2007. 6(12): p. 2180-2199.
11. Brorsson, C., N.T. Hansen, K. Lage, R. Bergholdt, S. Brunak, and F. Pociot, Identification of T1D susceptibility genes within the MHC region by combining protein interaction networks and SNP genotyping data. Diabetes, Obesity and Metabolism, 2009. 11(s1): p. 60-66.
12. Schadt, E., B. Zhang, and J. Zhu, Advances in systems biology are enhancing our understanding of disease and moving us closer to novel disease treatments. Genetica, 2009. 136(2): p. 259-269.