Lecture 10

Systems Biology
Today's lecture covers the following three topics:
1. Introduction to transcriptional networks
2. Regulation of the expression of the Lac
operon
3. Finding Biclusters in Bipartite Graphs
transcriptional networks
By the term transcriptional networks we generally mean gene regulatory networks.
Unlike protein-protein interaction networks, transcriptional networks are directed networks.
transcriptional networks: Basic mechanism of gene regulation
Most genes are regulated at the transcription level, and it is assumed that 5-10% of protein-coding genes encode regulatory proteins.
Some regulatory proteins play a targeted role, i.e. they take part in the regulation of a few genes.
Some regulatory proteins play a more general role in initiating transcription (for example, the eukaryotic transcription factors of type II, or RNA polymerase itself, which is essential for the transcription of all genes).
Dedicated regulatory proteins are considered to be those that affect up to 5% of the genes of a genome.
However, the boundary between generalist and dedicated regulatory proteins is blurred.
transcriptional networks
Experiments and methods used to determine regulatory relations
1. Complementary DNA microarrays
2. Oligonucleotide chips
3. Reverse transcription polymerase chain reaction
4. Serial analysis of gene expression
5. Chromatin Immunoprecipitation
6. Bioinformatics, e.g. identification of binding sites
Transcriptional Networks: Case study 1
An extended transcriptional regulatory network of Escherichia coli and
analysis of its hierarchical structure and network motifs
Hong-Wu Ma, Bharani Kumar, Uta Ditges, Florian Gunzer, Jan Buer and An-Ping Zeng
Nucleic Acids Research, 2004, Vol. 32, No. 22 6643–6649
This work combined data sets from 3 different sources:
1. RegulonDB (version 4.0, http://www.cifn.unam.mx/Computational_Genomics/regulondb/)
2. EcoCyc (version 8.0, www.ecocyc.org)
3. Shen-Orr, S.S., Milo, R., Mangan, S. and Alon, U. (2002) Network motifs in the transcriptional regulation network of Escherichia coli. Nature Genet., 31, 64–68.
Transcriptional Network: Case study 1
Comparison of the TRN of E. coli from three different data sources: (A) based on the number of genes, (B) based on the number of regulatory interactions
Transcriptional Network: Case study 1
A combined network that includes all 2624 interactions from the three data sets was produced.
In addition, this work extended the network by adding 23 genes and around 100 regulatory relationships found through a literature survey.
The final TRN includes 1278 genes and 2724 interactions altogether.
Transcriptional Network: Case study 1
This work discovered a hierarchical structure in the TRN.
The hierarchical structure was identified in the following way:
(1) genes which do not code for transcription factors (TFs), or which code for a TF that only regulates its own expression (auto-regulatory loop), were assigned to layer 1 (the lowest layer);
(2) then we removed all the genes in layer 1 and, from the remaining network, identified TFs which do not regulate other genes and assigned the corresponding genes to layer 2;
(3) we repeated step 2 to remove nodes which have been assigned to a layer and identified a new layer until all the genes were assigned to different layers. As a result, a nine-layer hierarchical structure was uncovered.
Source: BMC Bioinformatics 2004, 5:199, by related authors
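The layering steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `assign_layers`, the edge-list input format and the gene names in the usage example are hypothetical, and genes caught in a multi-gene regulatory cycle are simply left unassigned here.

```python
def assign_layers(edges):
    """Assign each gene to a hierarchy layer.

    edges: iterable of (regulator, target) pairs (hypothetical input format).
    Returns a dict {gene: layer}, with layer 1 as the lowest layer.
    """
    genes = set()
    targets = {}
    for regulator, target in edges:
        genes.update((regulator, target))
        targets.setdefault(regulator, set()).add(target)

    layers = {}
    remaining = set(genes)
    layer = 1
    while remaining:
        # A gene joins the current layer if, ignoring self-regulation,
        # it regulates no gene that is still unassigned.
        current = {
            g for g in remaining
            if not ((targets.get(g, set()) - {g}) & remaining)
        }
        if not current:
            # Assumption of this sketch: genes on a multi-gene cycle
            # cannot be layered this way and stay unassigned.
            break
        for g in current:
            layers[g] = layer
        remaining -= current
        layer += 1
    return layers


# Hypothetical toy network: tfA regulates tfB and g2; tfB regulates g1 and itself.
toy_edges = [("tfA", "tfB"), ("tfB", "g1"), ("tfB", "tfB"), ("tfA", "g2")]
toy_layers = assign_layers(toy_edges)
```

In the toy network, `g1` and `g2` land in layer 1, `tfB` in layer 2 (its self-loop is ignored) and `tfA` in layer 3.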
Transcriptional Network: Case study 1
The hierarchical structure implies the absence of cycles, i.e. feedback loops, in the network (though autoregulatory and inter-regulatory loops exist).
As the network is not complete, we cannot rule out that feedback loops will be found in the future; however, it seems there would not be many.
A possible biological explanation for the existence of this hierarchical structure is that the interactions in this particular TRN are between proteins and genes, without involving metabolites.
Only after a regulating gene has been transcribed, translated and possibly further modified by cofactors or other proteins can it regulate the target gene.
A feedback from the regulated gene at the transcriptional level may delay the target gene in reaching a desired expression level in a new environment.
Transcriptional Network: Case study 1
Feedback control may be mainly through other interactions (e.g.
metabolite and protein interaction) at post-transcriptional level rather
than through transcriptional interactions between proteins and genes.
For example, a gene at the bottom layer may code for a metabolic
enzyme, the product of which can bind to a regulator which in turn
regulates its expression. In this case, the feedback is through metabolite–
protein interaction to change the activity of the transcription factor and
then to affect the expression of the regulated gene.
Therefore, to fully understand the gene expression regulation, an
integrated network that includes different interactions is needed.
Transcriptional Network: Case study 1
To calculate network motifs in the E. coli TRN, this work removed all the loops in the network (including the autoregulatory loops and the two-gene regulatory loops). They then used the program Mfinder, developed by Kashtan et al., to generate the motif profiles.
The first four types are the so-called coherent FFLs (feed-forward loops), in which the direct effect of the up regulator is consistent with its indirect effect through the mid regulator.
In contrast, the last four types of FFLs are incoherent because the direct effect of the up regulator contradicts its indirect effect.
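The coherent/incoherent distinction can be made concrete with signs. The sketch below is an illustration only: it assumes each of the three FFL edges is encoded as +1 (activation) or -1 (repression), and the function and variable names are hypothetical. An FFL is then coherent when the sign of the direct edge equals the product of the signs along the indirect path.

```python
from itertools import product


def is_coherent(up_to_mid, mid_to_target, up_to_target):
    """Return True if the direct effect of the up regulator on the target
    has the same sign as its indirect effect through the mid regulator.

    Each argument is +1 (activation) or -1 (repression).
    """
    indirect_effect = up_to_mid * mid_to_target
    return up_to_target == indirect_effect


# Enumerate all 2^3 sign combinations of the three edges.
all_types = list(product((+1, -1), repeat=3))
coherent_types = [t for t in all_types if is_coherent(*t)]
incoherent_types = [t for t in all_types if not is_coherent(*t)]
```

Enumerating the sign combinations this way yields eight FFL types, four coherent and four incoherent, which matches the split described above.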
Transcriptional Network: Case study 1
(A) Gene gadA is regulated by six FFLs. (B) Gene lpd is regulated by five FFLs. (C) Gene slp is regulated by 17 regulators.
Transcriptional Network: Case study 2
Topological and causal structure of the yeast transcriptional regulatory
network
Nabil Guelzim, Samuele Bottani, Paul Bourgine & François Képès
nature genetics • volume 31 • may 2002
In this work, the yeast transcriptional network was constructed by manual inspection of the websites of MIPS, SwissProt, the Yeast Protein Database, the S. cerevisiae Promoter Database and the Saccharomyces Genome Database.
The network consists of 491 genes and 909 regulatory relations.
Transcriptional Network: Case study 2
Bold type indicates self-activation, bold italics indicates self-inhibition
and borders indicate essential genes. Thick lines represent activation,
thin lines represent inhibition and the dashed gray line represents dual
regulation.
Transcriptional Network: Case study 2
The indegree distribution of this yeast transcriptional network is exponential.
[Figure: typical exponential distribution on a linear scale]
[Figure: indegree distribution of the transcriptional network on a semi-log scale]
Open squares, full line: all 402 regulated genes (367 nonregulatory and 35 interregulatory genes), 909 connections; $p(k) = 157\,e^{-0.45k}$, R = 0.99
Filled circles, broken line: the subset of 35 interregulatory genes, 72 connections; $p(k) = 15\,e^{-0.43k}$, R = 0.94
Transcriptional Network: Case study 2
The outdegree distribution of this yeast transcriptional network follows a power law.
[Figure: typical power law distribution on a linear scale]
[Figure: outdegree distribution of the transcriptional network on a log-log scale]
Open squares, full line: all 124 regulating proteins, 909 connections; $P(k) = 23\,k^{-0.87}$, R = 0.95
Filled circles, broken line: the 37 regulating proteins that control regulatory genes, 72 connections; $P(k) = 19\,k^{-1.14}$, R = 0.99
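The choice of axes follows from the functional forms: an exponential $p(k) = a\,e^{-bk}$ becomes a straight line when $\ln p$ is plotted against $k$ (semi-log), whereas a power law $P(k) = c\,k^{-\gamma}$ becomes a straight line when $\ln P$ is plotted against $\ln k$ (log-log). The sketch below illustrates this with noise-free points generated from the two fitted curves quoted for the full network; the helper `linear_fit` and the sample range k = 1..10 are assumptions made for the illustration, not part of the original analysis.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical sample points generated from the quoted fitted curves.
ks = [float(k) for k in range(1, 11)]
indegree_counts = [157 * math.exp(-0.45 * k) for k in ks]   # exponential
outdegree_counts = [23 * k ** -0.87 for k in ks]            # power law

# Semi-log fit: ln p(k) = ln a - b * k
rate, log_a = linear_fit(ks, [math.log(p) for p in indegree_counts])

# Log-log fit: ln P(k) = ln c - gamma * ln k
gamma, log_c = linear_fit([math.log(k) for k in ks],
                          [math.log(p) for p in outdegree_counts])
```

On these synthetic points the fits recover the generating parameters (slope close to -0.45 with prefactor 157, and slope close to -0.87 with prefactor 23); real degree counts would scatter around the line, which is what the reported R values quantify.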
The operon
An operon is a functioning unit of genomic material containing a cluster of genes under the control of a single regulatory signal or promoter.
The genes are transcribed together into an mRNA strand and are either translated together in the cytoplasm or undergo trans-splicing to create monocistronic mRNAs that are translated separately.
As a result, the genes contained in the operon are either expressed together or not at all.
Originally, operons were thought to exist solely in prokaryotes, but since the discovery of the first operons in eukaryotes in the early 1990s, more evidence has arisen to suggest they are more common than previously assumed.
The Lac operon
The lac operon of E. coli consists of three genes: LacZ, LacY and LacA.
They encode enzymes needed for processing lactose.
LacI is an adjacent gene that encodes a regulator (a transcriptional repressor) of the Lac operon.
Besides the promoter and operator region, there is a region where a complex called CAP binds, which affects transcription positively.
LacZ codes for the enzyme β-galactosidase, and LacY codes for lactose permease, a transport protein that facilitates the flux of lactose through the cell membrane.
LacA is not directly involved in processing lactose.
Source: Models of cellular regulation by Baltazar D. Aguda and Avner Friedman
The Lac operon
The LacI tetramer binds at the operator, within the promoter-operator region, and stops transcription.
The CAP complex binds the CAP region and enhances the binding of RNA polymerase.
Static model of the regulation of the expression of the Lac operon
Source: Models of cellular regulation by Baltazar D. Aguda and Avner Friedman
The four cases of the static model:
1. cAMP binds and LacI is suppressed by allolactose
2. cAMP cannot bind and repressor protein LacI binds
3. cAMP binds and repressor protein LacI binds
4. cAMP cannot bind and LacI is suppressed by allolactose
Summary in Table
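The four cases of the static model can be written as a small truth table. The sketch below is an illustration only: the function name, the two boolean inputs and the three output labels are hypothetical, and the expression levels assigned to each case follow the standard textbook picture of the lac operon rather than values stated on the slides.

```python
def lac_expression(camp_cap_bound, laci_suppressed):
    """Qualitative expression level of the Lac operon.

    camp_cap_bound: True if the cAMP-CAP complex is bound at the CAP region.
    laci_suppressed: True if LacI is suppressed by allolactose
                     (so the repressor does not sit on the operator).
    """
    if not laci_suppressed:
        # Repressor bound: transcription is blocked whether or not CAP is bound.
        return "off"
    # Repressor released: CAP binding enhances RNA polymerase binding.
    return "high" if camp_cap_bound else "low"


# Enumerate the four cases of the static model.
cases = {
    (camp, suppressed): lac_expression(camp, suppressed)
    for camp in (True, False)
    for suppressed in (True, False)
}
```

Only the combination "cAMP binds and LacI is suppressed" gives high expression in this picture; the other three cases give low or no expression.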
3. Finding Biclusters in Bipartite Graphs
The technique of finding biclusters can be used to determine co-expressed gene groups.
Definition of a bicluster
Given an n×p data matrix X, where n is the number of objects (e.g. genes) and p is the number of conditions (e.g. arrays), a bicluster is defined as a submatrix X_IJ of X within which a subset of objects I exhibits similar behavior across the subset of conditions J.
An n×p data matrix X can be converted to a bipartite graph by applying a threshold: an object and a condition are joined by an edge when the corresponding entry passes the threshold.
Finding biclusters (densely connected regions) in a bipartite graph is a similar problem.
A graph G=(V,E) is bipartite if its vertex set V can be partitioned into two subsets V1, V2 such that each edge of E has one end vertex in V1 and the other in V2.
Biclusters are densely connected regions in a bipartite graph.
Edge list of the example graph (upper-case node, lower-case node):
C-d   A-a   G-g   I-f   K-k
D-c   A-b   G-h   I-g   L-i
D-d   B-a   H-e   I-h   L-j
E-c   B-b   H-f   J-f   L-k
E-d   C-a   H-g   J-g   M-l
F-c   C-b   H-h   K-h   M-m
F-d   D-a   I-e   G-f   N-l
G-d   D-b   K-i   C-c   N-m
K-j
Gene expression data can be represented as bipartite graphs
gene/cond.   cond0   cond1   cond2   cond3   cond4
YAL005C       2.85    3.34    0       0       0
YAL012W       0.21    0.03    0.18   -0.27   -0.32
YAL014C      -0.03   -0.07    0.28    0.32   -0.27
YAL015C      -0.25    0.58    0.77    0.28    0.32
YAL016W       0.11    0.04    0.75    0.82    0.21
YAL017W       0.24    0.31    0.95    0.12    0.18
YAL021C      -0.3     0.22    0.02   -0.64    0.06
By transforming the highest 5% of values to 1
gene/cond.   cond0   cond1   cond2   cond3   cond4
YAL005C       1       1       0       0       0
YAL012W       0       0       0       0       0
YAL014C       0       0       0       0       0
YAL015C       0       0       0       0       0
YAL016W       0       0       0       1       0
YAL017W       0       0       1       0       0
YAL021C       0       0       0       0       0
Before transforming, the data can be normalized.
Biclusters in gene expression data represent transcription modules/co-expressed gene groups.
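The thresholding step can be sketched as follows. This is a minimal illustration, not the lecture's code: the function names and the way the cutoff is derived from a fraction are assumptions. The 5% rule presumably applies to the full data set, so for the 7×5 excerpt the usage example passes an explicit, hypothetical cutoff of 0.82.

```python
def top_fraction_cutoff(matrix, fraction=0.05):
    """Smallest value that still belongs to the top `fraction` of all entries."""
    values = sorted((v for row in matrix for v in row), reverse=True)
    n_top = max(1, int(len(values) * fraction))  # keep at least one entry
    return values[n_top - 1]


def binarize(matrix, cutoff):
    """Replace entries >= cutoff by 1 and all others by 0."""
    return [[1 if v >= cutoff else 0 for v in row] for row in matrix]


# The 7 x 5 excerpt of expression values (rows YAL005C ... YAL021C).
excerpt = [
    [2.85, 3.34, 0, 0, 0],
    [0.21, 0.03, 0.18, -0.27, -0.32],
    [-0.03, -0.07, 0.28, 0.32, -0.27],
    [-0.25, 0.58, 0.77, 0.28, 0.32],
    [0.11, 0.04, 0.75, 0.82, 0.21],
    [0.24, 0.31, 0.95, 0.12, 0.18],
    [-0.3, 0.22, 0.02, -0.64, 0.06],
]

# Hypothetical cutoff chosen for the excerpt; on a full data set the cutoff
# would come from top_fraction_cutoff(full_matrix, 0.05).
binary_excerpt = binarize(excerpt, 0.82)
```

Each 1 in the binary matrix is an edge between a gene node and a condition node of the bipartite graph.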
• Tanay, A. et al. (2002) Discovering statistically significant biclusters in gene expression data. Bioinformatics, 18 (Suppl. 1), S136–S144.
• Ihmels, J. et al. (2002) Revealing modular organization in the yeast transcriptional network. Nat. Genet., 31, 370–377.
• Ben-Dor, A., Chor, B., Karp, R. and Yakhini, Z. (2002) Discovering local structure in gene expression data: the order-preserving sub-matrix problem. In Proceedings of the 6th Annual International Conference on Computational Biology, ACM Press, New York, NY, USA, pp. 49–57.
• Cheng, Y. and Church, G. (2000) Biclustering of expression data. Proc. Int. Conf. Intell. Syst. Mol. Biol., pp. 93–103.
• Murali, T.M. and Kasif, S. (2003) Extracting conserved gene expression motifs from gene expression data. Pac. Symp. Biocomput., 8, 77–88.
We propose a biclustering method incorporating DPClus
G/E  a b c d e f g h i j k l m
A    1 1 0 0 0 0 0 0 0 0 0 0 0
B    1 1 0 0 0 0 0 0 0 0 0 0 0
C    1 1 1 1 0 0 0 0 0 0 0 0 0
D    1 1 1 1 0 0 0 0 0 0 0 0 0
E    0 0 1 1 0 0 0 0 0 0 0 0 0
F    0 0 1 1 0 0 0 0 0 0 0 0 0
G    0 0 0 1 1 1 1 0 0 0 0 0 0
H    0 0 0 0 1 1 1 1 0 0 0 0 0
I    0 0 0 0 1 1 1 1 0 0 0 0 0
J    0 0 0 0 1 1 0 0 0 0 0 0 0
K    0 0 0 0 0 0 0 1 1 1 1 0 0
L    0 0 0 0 0 0 0 0 1 1 1 0 0
M    0 0 0 0 0 0 0 0 0 0 0 1 1
N    0 0 0 0 0 0 0 0 0 0 0 1 1
$(M_{CN})_{ik} = \sum_{j=0}^{|C|-1} (M_{BG})_{ij} \times (M_{BG})_{kj}$ (for $i \neq k$)
An example bipartite graph and its corresponding matrix
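The common neighbor count between two rows i and k of the bipartite-graph matrix is the number of columns in which both rows have a 1, with the diagonal left at 0. The sketch below computes it for the 14×13 example matrix; the function name and the string encoding of the rows are choices made for this illustration.

```python
# Rows A..N of the example bipartite-graph matrix, columns a..m.
GENES = "ABCDEFGHIJKLMN"
ROWS = [
    "1100000000000",  # A
    "1100000000000",  # B
    "1111000000000",  # C
    "1111000000000",  # D
    "0011000000000",  # E
    "0011000000000",  # F
    "0001111000000",  # G
    "0000111100000",  # H
    "0000111100000",  # I
    "0000110000000",  # J
    "0000000111100",  # K
    "0000000011100",  # L
    "0000000000011",  # M
    "0000000000011",  # N
]
m_bg = [[int(ch) for ch in row] for row in ROWS]


def common_neighbor_matrix(m_bg):
    """Entry (i, k) counts the columns where rows i and k both hold a 1.

    Diagonal entries (i == k) are set to 0.
    """
    n = len(m_bg)
    return [
        [
            0 if i == k else sum(a * b for a, b in zip(m_bg[i], m_bg[k]))
            for k in range(n)
        ]
        for i in range(n)
    ]


m_cn = common_neighbor_matrix(m_bg)
row_of = {gene: index for index, gene in enumerate(GENES)}
```

For instance, `m_cn[row_of["C"]][row_of["D"]]` is 4 because C and D share the neighbors a, b, c and d.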
BiClus: Biclustering method incorporating DPClus
     A B C D E F G H I J K L M N
A    0 2 2 2 0 0 0 0 0 0 0 0 0 0
B    2 0 2 2 0 0 0 0 0 0 0 0 0 0
C    2 2 0 4 2 2 1 0 0 0 0 0 0 0
D    2 2 4 0 2 2 1 0 0 0 0 0 0 0
E    0 0 2 2 0 2 1 0 0 0 0 0 0 0
F    0 0 2 2 2 0 1 0 0 0 0 0 0 0
G    0 0 1 1 1 1 0 3 3 2 0 0 0 0
H    0 0 0 0 0 0 3 0 4 2 1 0 0 0
I    0 0 0 0 0 0 3 4 0 2 1 0 0 0
J    0 0 0 0 0 0 2 2 2 0 0 0 0 0
K    0 0 0 0 0 0 0 1 1 0 0 3 0 0
L    0 0 0 0 0 0 0 0 0 0 3 0 0 0
M    0 0 0 0 0 0 0 0 0 0 0 0 0 2
N    0 0 0 0 0 0 0 0 0 0 0 0 2 0
Common neighbor matrix of the bipartite graph
For each row i (i = 0 to |G|-1) of M_CN, we calculate
$threshold_i = avg_i + (max_i - avg_i) \times G_{margin}$
and set $(M_{SG})_{ik} = (M_{SG})_{ki} = 1$ if $(M_{CN})_{ik} \geq threshold_i$ and threshold_i is not an indeterminate number (for k = 0 to |G|-1).
Here, $avg_i = SUM_i / n_i$, where SUM_i is the sum of the entries in row i of M_CN and n_i is the number of non-zero entries in row i of M_CN, and max_i is the maximum value of the entries in row i of M_CN.
G_margin is a user-defined value; for 0 ≤ G_margin ≤ 1, threshold_i lies between avg_i and max_i.
BiClus: Biclustering method incorporating DPClus
     A B C D E F G H I J K L M N
A    0 1 1 1 0 0 0 0 0 0 0 0 0 0
B    1 0 1 1 0 0 0 0 0 0 0 0 0 0
C    1 1 0 1 1 1 0 0 0 0 0 0 0 0
D    1 1 1 0 1 1 0 0 0 0 0 0 0 0
E    0 0 1 1 0 1 1 0 0 0 0 0 0 0
F    0 0 1 1 1 0 0 0 0 0 0 0 0 0
G    0 0 0 0 1 0 0 1 1 1 0 0 0 0
H    0 0 0 0 0 0 1 0 1 1 0 0 0 0
I    0 0 0 0 0 0 1 1 0 1 0 0 0 0
J    0 0 0 0 0 0 1 1 1 0 0 0 0 0
K    0 0 0 0 0 0 0 0 0 0 0 1 0 0
L    0 0 0 0 0 0 0 0 0 0 1 0 0 0
M    0 0 0 0 0 0 0 0 0 0 0 0 0 1
N    0 0 0 0 0 0 0 0 0 0 0 0 1 0
This matrix represents a simple graph
BiClus: Biclustering method incorporating DPClus
Simple graph derived from the common neighbor matrix.
We can use DPClus to find clusters in the simple graph.
BiClus: Biclustering method incorporating DPClus
Clustering by DPClus
BiClus: Biclustering method incorporating DPClus
Finally determined biclusters
Evaluation of BiClus
- Using synthetic data
- Using real data
Evaluation of BiClus
Synthetic data: artificially embedded biclusters with noise
Evaluation of BiClus
Synthetic data: artificially embedded biclusters with overlap
Evaluation of BiClus
Let M1, M2 be two sets of biclusters. The gene match score of M1 with respect to M2 is given by the function
$S^*_G(M_1, M_2) = \frac{1}{|M_1|} \sum_{(G_1, C_1) \in M_1} \max_{(G_2, C_2) \in M_2} \frac{|G_1 \cap G_2|}{|G_1 \cup G_2|}$
A systematic comparison and evaluation of biclustering methods for gene expression data
Amela Prelić, Stefan Bleuler, Philip Zimmermann, Anja Wille, Peter Bühlmann, Wilhelm Gruissem, Lars Hennig, Lothar Thiele and Eckart Zitzler
BIOINFORMATICS, Vol. 22 no. 9 2006, pages 1122–1129
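The gene match score averages, over the biclusters of M1, the best Jaccard overlap between each bicluster's gene set and any gene set in M2. A minimal sketch follows; the representation of a bicluster as a (gene set, condition set) pair of Python sets, the function name and the toy data are assumptions of this illustration, and gene sets are assumed to be non-empty.

```python
def gene_match_score(m1, m2):
    """Gene match score of bicluster set m1 with respect to m2.

    Each bicluster is a (genes, conditions) pair of sets; only the gene sets
    enter the score. Returns 0.0 for an empty m1 (assumption of this sketch).
    """
    if not m1:
        return 0.0
    total = 0.0
    for genes1, _conditions1 in m1:
        # Best Jaccard index between this gene set and any gene set in m2.
        total += max(
            (len(genes1 & genes2) / len(genes1 | genes2)
             for genes2, _conditions2 in m2),
            default=0.0,
        )
    return total / len(m1)


# Hypothetical toy example: one embedded bicluster is recovered, one is missed.
embedded = [({"g1", "g2"}, {"c1"}), ({"g3", "g4"}, {"c2"})]
found = [({"g1", "g2"}, {"c1"})]

relevance = gene_match_score(found, embedded)
recovery = gene_match_score(embedded, found)
```

The score is not symmetric: in the toy example `gene_match_score(found, embedded)` is 1.0 because everything that was found matches an embedded bicluster, while `gene_match_score(embedded, found)` is 0.5 because only one of the two embedded biclusters was recovered.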
Evaluation of BiClus
[Figure: effect of noise on the relevance of BCs. Synthetic data: artificially embedded biclusters with noise. x-axis: noise level (0 to 0.3); y-axis: avg matching score (0 to 1.2); series: SAMBA, BiClus]
Evaluation of BiClus
[Figure: regulatory complexity, relevance of BCs. Synthetic data: artificially embedded biclusters with overlap. x-axis: overlap degree (0 to 9); y-axis: avg matching score (0 to 1.2); series: SAMBA, BiClus]
Gasch, A.P. et al. (2000) Genomic expression programs in the response of yeast cells to environmental changes. Mol. Biol. Cell, 11, 4241–4257.
Gene expression data were collected from the above work.
Evaluation of BiClus
Real gene expression data of yeast
[Figure labels: 0.001, 0.002, 0.003, 0.01]
P-values represent the statistical significance of the functional richness of the modules.
P-values were calculated using FuncAssociate: The Gene Set Functionator, from
http://llama.med.harvard.edu/cgi/func/funcassociate