Chapter 21
Predictive protein networks and identification of drugable targets in the beta-cell
Joachim Størling and Regine Bergholdt
Abstract
A prerequisite for designing good drugs that succeed in clinical
development, with the final goal of treating human diseases, is a detailed
understanding of the mechanisms underlying disease. This is particularly true for
complex diseases such as diabetes. It has become increasingly clear that
complex traits or phenotypes are the result of an interplay between
environmental factors and numerous genes and proteins that jointly affect the
functionality of biological systems. Since interactions between proteins in
networks and pathways make up biological systems, it is essential that we learn
more about how networks and pathways are influenced by environmental factors
and genetic variation, and how such influences cause disease. In this chapter,
we will discuss recent data, advances and ideas on how more valid drugable
targets for treating diabetes may be predicted by applying bioinformatics
and systems biology.
Keywords: beta-cells, diabetes etiology, drug targets, GWAS, phenotype
description, protein networks, systems biology
Joachim Størling and Regine Bergholdt, Hagedorn Research Institute, Niels Steensensvej 1,
DK-2820 Gentofte, Denmark. E-mail: jstq@hagedorn.dk, rber@hagedorn.dk
21.1 The need for new ways of identifying drugable targets
Tens of billions of Euros and dollars are spent each year by the pharmaceutical
industry on the development of new drugs to treat human diseases. However,
drug discovery is an extremely expensive and risky business, and despite the
enormous investment in drug discovery, the rate of failure of drug candidates in
clinical development is dreadfully high. One explanation for this is that the
strongly restricted genetic and epigenetic backgrounds and environmental
settings of the simple animal and in vitro cell systems used to model human
disease and for preclinical drug testing differ greatly from the genetically,
environmentally and epigenetically much more heterogeneous nature of the
human population. Another explanation is that drug discovery traditionally has
been aiming at designing drugs against targets considered to affect simple
biological systems or signalling pathways, and such an approach represents an
exceedingly simplistic view of the mechanisms underlying complex human
diseases [1]. An improvement of the success rate of drugs in clinical
development will require new approaches to pinpoint more valid drug target
candidates for preclinical testing. Obviously, a prerequisite for this will be an
improved understanding of disease mechanisms and increased insight into the
complex biological systems in tissues and cells in a heterogeneous human
population. This is the true challenge and entails innovative ways of studying
disease and disease model systems and highlights the need for systems biology
and bioinformatics approaches.
21.2 How can drug target identification be optimized?
Improved prediction of valid drug targets will require increased insight into the
specific biological and molecular systems in tissues and cells that are responsible
for causing disease. Most human diseases, including type 1 and 2 diabetes which
are the result of complete or relative destruction and dysfunction of the beta
cells, are caused by a complex interplay between environment and genes. The
interaction between environmental factors and the genetic background of an
individual affects susceptibility to disease and progression of disease. Also the
response to drug treatment is determined by the individual's specific
environmental and genetic settings. Different genes contributing to a specific
phenotype may encode proteins involved in the same biological system or in its
regulation. Therefore, causal genes in complex diseases can be expected to
affect the functionality of the same protein networks and pathways. If we can
improve the prediction, identification and functional validation and
characterization of networks involved in disease in carefully selected model
systems and in humans, we will have a greatly increased likelihood of choosing
the most reliable drugable targets for drug development. This will increase the
chance that a drug survives clinical development.
Current drugs to treat type 2 diabetes work by increasing beta-cell insulin
secretion, decreasing the amount of glucose released from the liver, increasing the
sensitivity of cells to insulin, decreasing the absorption of carbohydrates from the
intestine, or slowing emptying of the stomach to delay the presentation of
carbohydrates for digestion and absorption in the small intestine. Drugs
increasing insulin output by the beta-cells have been widely used to treat type 2
diabetes and represent the existing group of diabetes drugs directly targeting
the beta-cells. These medications belong to a class of drugs called sulfonylureas,
which increase insulin secretion by inhibiting ATP-regulated K+ channels, leading
to plasma membrane depolarization and influx of Ca2+ that triggers
insulin-containing vesicles to fuse with the plasma membrane and release insulin.
Sulfonylureas are ineffective where there is absolute deficiency of insulin
production as in type 1 diabetes. Development of novel drugs targeting the
beta-cell may represent new ways of increasing insulin secretion in type 2
diabetes and/or preserving beta-cell mass and insulin secretory capacity in type
1 diabetes.
How do we obtain better knowledge of the pathological mechanisms, i.e. of the
protein networks and pathways that lie behind disease, and what kind of data
can be exploited for this purpose? Knowledge about disease mechanisms
and pathologies is to a large extent based on data from animal models and cell
systems. However, translation of results from animal and in vitro experiments to
humans is often difficult due to the fact that the environmental and genetic
settings of model systems are much too simple. Therefore, drug targets should
preferentially be identified from a platform of human data. “Integrative
genomics” is an emerging, promising field to tackle complex disease. It provides
increased knowledge about functional mechanisms underlying disease and
thereby an approach to increase our understanding of disease pathogenesis.
Disease-associated networks today are, however, based on incomplete data: we
have not yet characterized rare variation or copy number variation, and we do not
know enough about non-coding RNAs, alternative splicing, genetic isoforms,
heterogeneity among populations, or dynamics in molecular systems.
Most biological systems are characterized by considerable redundancy and
therefore the analysis of genes and proteins in the context of their networks will
provide the most important functional and quantitative information. Networks
should be seen as a framework of how to explore the context in which a given
gene operates and to causally associate networks with physiological states
associated with disease. This will lead to a more comprehensive understanding
and view of disease as compared to examination of individual components of the
network. Integrating data such as DNA variations, gene expression data,
DNA-protein binding, protein-protein interactions and molecular phenotypic data
may allow construction of more comprehensive networks and thereby improve
understanding of the molecular processes underlying disease.
21.2.1 GWAS and systems biology
That diabetes has a strong genetic component is underlined by the fact that the
concordance rate for both type 1 and 2 diabetes is up to ~70% in monozygotic
twins [2, 3]. Genetic variation may influence protein networks and thus cellular
function at several different levels. Changes in amino acid sequence, alterations
in protein expression or modification in enzymatic activity etc. can be the result
of genetic variation. Such changes to proteins can cause perturbations of the
functionality of protein networks. Depending on the degree of disturbances of
network function, this can lead to cellular malfunctioning, changes in phenotype,
and ultimately to disease. However, genetic variation may account for different
levels of risk for disease in different individuals, suggesting that integrative
methods for gene discovery are necessary. With the advent in recent years of
huge amounts of data from genome-wide association studies (GWAS),
transcriptomics and proteomics experiments etc., the focus is increasingly on
interactions between DNA, RNA and proteins and whole-system physiology, as
well as on integration of large-scale, high-throughput molecular and physiological
data with clinical data. Genome-wide association studies in complex diseases are
producing an unprecedented amount of genetic data. However, identifying the
individual genes can be difficult because each only contributes weakly to the
pathology. Alternatively, identification of entire cellular systems involved in a
particular disease could be attempted. Such a strategy should be feasible in
many different complex diseases since most genes exert their function as
members of molecular networks where groups of proteins contributing to
disease may be expected to affect the same biological pathways. This is
supported by the finding that the expression of genes which
are all involved in oxidative phosphorylation is coordinately downregulated in
human diabetic muscle [4]. Analysis of an entire disease-related biological
system might provide insight into the molecular etiology of the disease that
would not emerge from isolated functional studies of single genes. It is clear
that results of e.g. GWAS do not themselves directly identify clinically useful drug
targets, but integrating GWAS data with other types of data and more refined
phenotyping may well make this possible.
Genetic disease loci for diabetes typically confer only modest disease risk, and
for only very few are the causal genes known. Even replicated disease
associations do not provide clues about the functional roles of a given candidate
gene. A genetic association is not enough for drug development strategies.
There is no doubt that additional functional support is needed such as evaluating
potential causal genes in the broader biological context in which they operate.
The most likely causal candidate gene for an association may or may not be
the gene in closest proximity to the associated single nucleotide polymorphism
(SNP). However, a combination of such knowledge with an evaluation of the
biological function of the genes, e.g. in expressional profiling studies under
disease relevant conditions and in functional studies, may provide insight into
the mechanistic nature of complex traits beyond what human genetic association
studies can do alone. Use of molecular traits can enhance the interpretation of
GWAS results by putting them into a broader biological context and ultimately
elucidate the networks defining disease associated processes.
21.2.2 Moving from genomes to networks
If genetic data are integrated with networks of physically and functionally
interacting proteins, this is likely to increase the probability of identifying
positional candidate disease genes and proteins (Fig. 21.1).
Figure 21.1. Mapping of genetic loci onto a human interaction network. The
creation of networks based on protein-protein interactions of proteins encoded
by genes in genetic regions associated with disease allows identification of
“disease” networks, i.e. networks that are enriched for proteins encoded by
genes in these regions.
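The enrichment idea behind Figure 21.1 can be illustrated with a small calculation. The following is a minimal sketch and not the scoring scheme used in the cited studies: it assumes a one-sided hypergeometric test, and the function name, the protein identifiers and the background set are hypothetical.

```python
from math import comb


def network_enrichment(network, locus_proteins, background):
    """Return (hits, p) for over-representation of locus-encoded proteins.

    All three arguments are sets of protein identifiers. p is the one-sided
    hypergeometric probability of finding at least `hits` locus-encoded
    proteins in a random protein set of the same size as the network.
    """
    network = network & background
    marked = locus_proteins & background
    n_total, n_marked, n_net = len(background), len(marked), len(network)
    hits = len(network & marked)
    tail = sum(
        comb(n_marked, k) * comb(n_total - n_marked, n_net - k)
        for k in range(hits, min(n_marked, n_net) + 1)
    )
    return hits, tail / comb(n_total, n_net)


# Toy example with made-up identifiers (assumption: 10 proteins in total,
# 5 of which are encoded by genes in disease-associated regions).
background = {f"P{i}" for i in range(1, 11)}
locus_proteins = {"P1", "P2", "P3", "P4", "P5"}
hits, p_value = network_enrichment({"P1", "P2"}, locus_proteins, background)
```

A low p-value would mark a network as a candidate "disease" network; with real data, correction for the number of networks tested would also be needed.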
Many disease-associated genes are known today; the challenging task now is to
understand how they affect disease risk and how to select key proteins for drug
development. As mentioned, diabetes involves multiple interacting genetic
determinants, representing functional relationships between genes, in which the
phenotypic effect of one gene may be modified by another. New
strategies for detecting sets of marker loci that are linked to multiple
interacting disease genes are therefore in demand. Data mining methods have been used
to evaluate genetic interactions [5], and the importance of predicted genetic
interactions was in this report supported by comprehensive, high-confidence
protein-protein interaction networks of the corresponding regions. This allowed
identification of candidate genes of likely functional significance in type 1
diabetes, representing a suggestion of genetic epistasis in a multi-factorial
disease supported by protein network analysis with implications for functionality
[5, 6]. Another approach for selecting candidate genes of functional importance
is transcriptional profiling. Intermediate between DNA variations and variation in
phenotype are variation in gene expression, protein expression, protein state
and metabolite levels. Such intermediates are believed to respond to variations
in DNA and then potentially lead to changes in phenotype and disease state.
Following identification of genes there is a huge demand for functional
genomics. The number of identified susceptibility genes may continue to grow,
and the elucidation of their function in the pathogenesis of diseases will be
important for understanding their molecular pathogenesis. Approaches used will
vary according to the function of the genes, but may include expression studies
and generation of transgenic and knockout animal models. Whereas the genome
is rather static, interaction networks are more dynamic and dependent on the
biological context. They might be active only under certain conditions, in certain
cell types or stages of development. Ideally, all conditions and cell types should
be tested to capture this presumed variability.
For prioritization of positional candidate genes in genetic association or linkage
intervals the use of functional interaction networks (interactomes) may be a
valuable method. If intervals obtained for a disease are queried for functional
interactions with each other and related to phenotype information for the
disease, this holds promise for selection of putative disease genes for further
investigation [7, 8]. Such studies have the potential of identifying new,
previously unrecognized components of disease mechanisms, as well as of pinpointing the most important protein complexes involved. Furthermore, many
diseases have overlapping clinical manifestations/sub-phenotypes and it could
be speculated that this may be represented by genetic variation in the same
functional pathways. The existence of so-called disease sub-networks has been
suggested. It was demonstrated that proteins encoded by genes mutated in one
inherited genetic disorder were likely to interact with proteins known to cause
similar disorders, presumably by sharing common underlying biochemical
mechanisms [7]. The feasibility of constructing such functional human gene
networks has been demonstrated and applied to positional candidate gene
identification [9]. It was shown that obvious candidate genes are not always
involved, and that taking an unbiased approach in finding candidate genes, e.g.
by using functional networks may result in new testable hypotheses [9].
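As an illustration of how positional candidates might be prioritized with an interactome, the sketch below ranks each gene by the number of other associated intervals that contain one of its interaction partners. This is a deliberately simple scoring rule chosen for illustration, not the method of the cited studies; the function name, locus names and gene symbols are hypothetical, and each gene is assumed to lie in exactly one interval.

```python
def rank_candidates(loci, interactions):
    """Rank genes by the number of *other* loci they interact with.

    loci: dict mapping a locus name to the set of genes in that interval.
    interactions: iterable of (gene_a, gene_b) pairs from an interactome.
    Assumes each gene belongs to exactly one locus.
    """
    gene_locus = {gene: locus for locus, genes in loci.items() for gene in genes}
    partners = {}
    for a, b in interactions:
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)

    scores = {}
    for gene, locus in gene_locus.items():
        other_loci = {
            gene_locus[p]
            for p in partners.get(gene, ())
            if p in gene_locus and gene_locus[p] != locus
        }
        scores[gene] = len(other_loci)
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy data (made up): gene X lies outside all associated intervals.
loci = {"locus1": {"A", "B"}, "locus2": {"C"}, "locus3": {"D"}}
interactions = [("A", "C"), ("A", "D"), ("A", "B"), ("C", "X")]
ranking = rank_candidates(loci, interactions)
```

In this toy example gene A ranks first because it connects to two other intervals, whereas B interacts only within its own interval and scores zero.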
21.2.3 Moving from networks to phenotypes
A systematic, large-scale analysis of human protein complexes comprising gene
products implicated in many different categories of human diseases has been
used to create a “phenome-interactome network” [8]. This was the first study to
explain disease phenotypes by genome-wide mapping of genetic loci onto a
human interaction network. This strategy was expanded to include epistasis and
statistical methods for evaluating the significance of deduced networks [5].
Protein interaction networks were by this method used to examine whether gene
products from interacting genetic regions could also be shown to interact in
biological pathways. Support for physical interactions at the protein level was
suggested for all the predicted genetic interactions [5], representing a novel
exploration of integrative genomics. The resulting networks point directly to
novel candidates visualized in context of their interaction network, potentially
providing even further biological insight. Another study evaluated changes at the
proteome level after exposure of pancreatic insulin-producing cells to proinflammatory cytokines resembling the inflammatory milieu surrounding the
islets in type 1 diabetes. That study demonstrated a large protein interaction
network containing many of the differentially expressed proteins [10]. Despite
use of different species and model systems and unknown dynamic differences in
the transcriptome and proteome, a significant overlap existed between genes
pinpointed in this study [10] and in other studies [5, 6]. This provides evidence that
common networks and pathways can be identified using different model systems
and underlines the power of integrating protein-protein interaction data with
genetic data and expression profiling.
Major histocompatibility complex (MHC) fine mapping data has been analyzed by
the same approach to characterize the MHC susceptibility interactome [11]. This
approach allowed identification of functionally important genes and gene-gene
interactions independent of the genetic linkage disequilibrium that characterizes
the MHC region, as protein-protein interactions are unlikely to depend on linkage
disequilibrium between the genes encoding the proteins. Approaches like these
may be valuable in prioritizing candidate genes in linkage regions or from
disease associated regions, in which the disease gene(s) are not known.
Information on whether genes from the different loci observed interact at a
functional level is potentially interesting. Obviously, the input information is
crucial for the success of such an approach. Studies will be biased because
complete functional information is absent from databases for the majority of
genes, and because interaction databases are also far from complete. However,
hypotheses generated with existing knowledge may be of value, and genes that
would otherwise not have been predicted to be involved in the disease in
question might be identified this way. Data amounts in databases are rapidly
increasing. This includes increased knowledge regarding genes, proteins,
interactions among
them, methods integrating high throughput genomic and proteomic approaches,
as well as text mining methods extracting functional relationships from the
literature.
Candidate genes involved in putative interaction networks should be further
examined not only at the single gene level, but also in the context of the
networks of which they form an integral part. mRNA expression levels for each
gene can be evaluated e.g. under different relevant conditions. Genes with
differential regulation are believed to be most important. This approach has
recently been used to evaluate predicted interaction networks in type 1 diabetes
[6]. Differential regulation of several genes was demonstrated, e.g. after
cytokine exposure of human pancreatic islets, supporting the prediction of the
interaction network as a whole as a risk factor. In addition, enrichment of type 1
diabetes associated SNPs in the individual interaction networks was measured
to evaluate evidence of significant association at network level. This method
provided additional support, in an independent dataset, that some of the
interaction networks could be involved in type 1 diabetes [6].
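Network-level association of this kind can be assessed in several ways. The sketch below shows one generic option, a permutation test, and should not be read as the procedure used in [6]. It assumes that SNP-level results have already been reduced to a set of genes carrying associated SNPs; the function name, parameters and gene identifiers are hypothetical.

```python
import random


def permutation_p(network, associated, background, n_perm=1000, seed=0):
    """Empirical p-value for the number of associated genes in a network.

    Random gene sets of the same size as the network are drawn from the
    background; p is the fraction of draws containing at least as many
    associated genes as the network itself (with the usual +1 correction).
    """
    rng = random.Random(seed)
    pool = sorted(background)  # sample() needs a sequence, not a set
    members = network & background
    observed = len(members & associated)
    at_least = 0
    for _ in range(n_perm):
        draw = rng.sample(pool, len(members))
        if sum(1 for gene in draw if gene in associated) >= observed:
            at_least += 1
    return (at_least + 1) / (n_perm + 1)


# Toy data (made up): 100 genes, 5 of which carry associated SNPs.
background = {f"G{i}" for i in range(100)}
associated = {"G0", "G1", "G2", "G3", "G4"}
p_enriched = permutation_p(set(associated), associated, background)
p_null = permutation_p({"G50", "G51", "G52"}, associated, background)
```

Real analyses would also have to account for gene length, SNP density and linkage disequilibrium, which this sketch ignores.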
21.2.4 Future directions
Systems biology approaches complement more classical analyses of the genetics
of complex diseases and may shed light on the underlying biological pathways
and help us understand the complex interplay between multiple factors
contributing to disease pathogenesis. Combining GWAS, protein networks,
molecular biology studies, and phenotype data in searching for functional
candidates for observed genetic associations has been shown to be a feasible
approach [5, 8]. Characterization of phenotypic effects of SNPs on gene
expression or on protein function or interaction will provide a more efficient
approach to the identification of risk variants and will provide insights into
possible mechanisms whereby these variants modify disease risk. Focusing on
interplay between many components in modules or systems may demonstrate
how defects in such modules can lead to human disease. Such an understanding
is likely to be helpful in defining new key targets for prediction, prevention and
improved therapeutic responsiveness. Elucidation of networks and signaling
pathways associated with disease and examination of the effects of
combinations of experimental changes and variations are important in drug
discovery, and a prerequisite in translation of results into clinically useful
predictors of disease and drug targets. Interaction networks can identify
sub-networks corresponding to functional units in the biological system.
Sub-networks associated with disease may link molecular biology to physiology and
thereby to clinically relevant issues, and the aim is that predictive gene
networks can lead directly to discovery of drug targets and biomarkers of
disease.
For identifying drug targets it is necessary to understand how the causal genes
function and act in their biological context. Identified genes from a GWAS may
not be chemically suitable as drug targets. However, proteins in the same
signaling pathway may constitute more rational and better drug targets. Disease
associated genetic loci and intermediate molecular phenotypes that are
connected with these loci and cause disease are obvious starting points to
uncover the drivers of disease. It is important to evaluate perturbations of
networks and pathways with the potential to thereby identify key steps or nodes
that drive diseases, and which may act as targets for therapeutic intervention.
To develop disease therapies by targeting a given gene it is necessary to know if
activation, inhibition or partial activation leads to disease [12]. We can now
begin to understand the context in which a gene operates and thereby suggest
the best possible points of therapeutic intervention [12].
Figure 21.2. Strategy for drug target identification. Genome-wide association
scan data alone or integrated with transcriptomics, proteomics or epigenetics
data etc. are used as “input” data. Protein-protein interaction data and the
application of bioinformatics and systems biology allow in silico generation of
networks. Text mining analysis of these networks for enrichment of proteins
with association to disease phenotype leads to a score and ranking of each
network. This results in a list of potential candidate proteins whose
functional relevance can be tested in model systems using e.g. RNA
interference. From the outcome of the functional studies, the most promising
drugable targets are selected for drug development. Seen as a whole, this
method will, from a platform of thousands of data points, step by step narrow
down the number of candidate proteins, ultimately resulting in identification
of a small number of plausible drug targets.
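The scoring and ranking step of the strategy in Figure 21.2 can be sketched in a few lines. This is an illustrative simplification under stated assumptions, not the published procedure: the output of the text mining analysis is represented as a plain set of phenotype-linked proteins, the score is simply the fraction of network members in that set, and all names and identifiers are hypothetical.

```python
def rank_networks(networks, phenotype_proteins):
    """Score each network by the fraction of members linked to the phenotype."""
    scored = []
    for name, members in networks.items():
        linked = members & phenotype_proteins
        score = len(linked) / len(members) if members else 0.0
        scored.append((name, score))
    # Best-scoring network first; ties are broken by name.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


def candidate_proteins(networks, ranking, top_n=1):
    """Collect the members of the top-ranked networks for functional testing."""
    selected = set()
    for name, _ in ranking[:top_n]:
        selected |= networks[name]
    return sorted(selected)


# Toy input (made up): three in silico networks and a text-mining result.
networks = {
    "net_a": {"P1", "P2", "P3", "P4"},
    "net_b": {"P5", "P6"},
    "net_c": {"P7", "P8", "P9"},
}
phenotype_proteins = {"P1", "P2", "P3", "P9"}
ranking = rank_networks(networks, phenotype_proteins)
candidates = candidate_proteins(networks, ranking, top_n=1)
```

The resulting candidate list corresponds to the proteins that would go forward to functional testing, e.g. by RNA interference, in the strategy described above.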
Systems biology approaches to develop drugs to treat human diseases are of high
interest, and with the high cost of developing novel therapies, improved ways of
selecting valid drug target candidates are extremely important. Novel and highly
interdisciplinary systems biology approaches are likely to identify networks from
which the most rational target can be selected. We are still far from a
comprehensive understanding of the molecular pathogenesis of multi-factorial
diseases. This makes it difficult to identify optimal strategies for intervention and
treatment. The recent success of GWAS and the prospects for combining
genetics with high-throughput genomics, as well as general advances in genome
informatics, genotyping technology, statistical methodology and large clinical
materials are sources of optimism for the future.
References:
1. Zhu, J., B. Zhang, E.E. Schadt, D.C. Rao, and C.C. Gu, A Systems Biology
Approach to Drug Discovery, in Advances in Genetics. 2008, Academic Press. p.
603-635.
2. Hyttinen, V., J. Kaprio, L. Kinnunen, M. Koskenvuo, and J. Tuomilehto, Genetic
liability of type 1 diabetes and the onset age among 22,650 young Finnish twin
pairs: a nationwide follow-up study. Diabetes, 2003. 52(4): p. 1052-1055.
3. Ridderstråle, M. and L. Groop, Genetic dissection of type 2 diabetes. Molecular
and Cellular Endocrinology, 2009. 297(1-2): p. 10-17.
4. Mootha, V.K., C.M. Lindgren, K.-F. Eriksson, A. Subramanian, S. Sihag, J.
Lehar, P. Puigserver, E. Carlsson, M. Ridderstrale, E. Laurila, N. Houstis, M.J.
Daly, N. Patterson, J.P. Mesirov, T.R. Golub, P. Tamayo, B. Spiegelman, E.S.
Lander, J.N. Hirschhorn, D. Altshuler, and L.C. Groop, PGC-1α-responsive
genes involved in oxidative phosphorylation are coordinately downregulated in
human diabetes. Nature Genetics, 2003. 34(3): p. 267-273.
5. Bergholdt, R., Z. Størling, K. Lage, E. Karlberg, P. Òlason, M. Aalund, J. Nerup,
S. Brunak, C. Workman, and F. Pociot, Integrative analysis for finding genes
and networks involved in diabetes and other complex diseases. Genome
Biology, 2007. 8: p. R253.
6. Bergholdt, R., C. Brorsson, K. Lage, J.H. Nielsen, S. Brunak, and F. Pociot,
Expression Profiling of Human Genetic and Protein Interaction Networks in Type
1 Diabetes. PLoS ONE, 2009. 4(7): p. e6250.
7. Gandhi, T.K.B., J. Zhong, S. Mathivanan, L. Karthick, K.N. Chandrika, S.S.
Mohan, S. Sharma, S. Pinkert, S. Nagaraju, B. Periaswamy, G. Mishra, K.
Nandakumar, B. Shen, N. Deshpande, R. Nayak, M. Sarker, J.D. Boeke, G.
Parmigiani, J. Schultz, J.S. Bader, and A. Pandey, Analysis of the human protein
interactome and comparison with yeast, worm and fly interaction datasets.
Nature Genetics, 2006. 38(3): p. 285-293.
8. Lage, K., E. Karlberg, Z. Størling, P. Olason, A. Pedersen, O. Rigina, A. Hinsby,
Z. Tümer, F. Pociot, N. Tommerup, Y. Moreau, and S. Brunak, A human
phenome-interactome network of protein complexes implicated in genetic
disorders. Nature Biotechnology, 2007. 25(3): p. 309-316.
9. Franke, L., H. van-Bakel, L. Fokkens, E.D. de-Jong, M. Egmont-Petersen, and C.
Wijmenga, Reconstruction of a functional human gene network, with an
application for prioritizing positional candidate genes. American Journal of
Human Genetics, 2006. 78(6): p. 1011-1025.
10. D'Hertog, W., L. Overbergh, K. Lage, G.B. Ferreira, M. Maris, C. Gysemans, D.
Flamez, A.K. Cardozo, G. Van den Bergh, L. Schoofs, L. Arckens, Y. Moreau,
D.A. Hansen, D.L. Eizirik, E. Waelkens, and C. Mathieu, Proteomics Analysis of
Cytokine-induced Dysfunction and Death in Insulin-producing INS-1E Cells:
New Insights into the Pathways Involved. Mol Cell Proteomics, 2007. 6(12): p.
2180-2199.
11. Brorsson, C., N.T. Hansen, K. Lage, R. Bergholdt, S. Brunak, and F. Pociot,
Identification of T1D susceptibility genes within the MHC region by combining
protein interaction networks and SNP genotyping data. Diabetes, Obesity and
Metabolism, 2009. 11(s1): p. 60-66.
12. Schadt, E., B. Zhang, and J. Zhu, Advances in systems biology are enhancing
our understanding of disease and moving us closer to novel disease treatments.
Genetica, 2009. 136(2): p. 259-269.