TF_regulation

Analysis of the yeast
transcriptional regulatory network
Transcription Factor (TF)
• A TF is a protein that binds to DNA sequences and
regulates the transcription of the corresponding genes.
• The binding site of a TF is usually a short segment
of a specific promoter sequence.
• The activity of a TF is regulated according to the cell’s
needs, largely through signal transduction. It may not
be directly observable, but it is reflected in the expression
of the genes it regulates.
Expression regulatory network
Identifying the expression regulatory network is a crucial
step towards understanding the cellular regulation
system.
• Inferring the network from microarray data alone
• Inferring the network from microarray data and TF-TG
(Target Gene) information
Ihmels J, Friedlander G, Bergmann S, Sarig O, Ziv Y, Barkai N.
Revealing modular organization in the yeast transcriptional network.
Nat Genet. 2002 Aug;31(4):370-7.
Segal E et al. Module networks: identifying regulatory modules and
their condition-specific regulators from gene expression data.
Nat Genet. 2003 Jun;34(2):166-76.
TF Activity
• Using TF-TG relations benefits regulatory network
identification.
• The expression level of a TF is not a good measure of its
activity. The level of activated TF protein, rather than
its expression level, is what controls gene expression.
• The activity of a transcription factor is regulated
according to the cell’s needs, largely through signal
transduction. It may not be directly observable, but it is
reflected in the expression of the genes it regulates.
Identify TF Activity by NCA
Network Component Analysis
Liao JC et al. Network component analysis: reconstruction
of regulatory signals in biological systems.
Proc Natl Acad Sci U S A. 2003 Dec 23;100(26):15522-7.
NCA compared with PCA, ICA
NCA Model
Without further constraints, [E] cannot be uniquely
decomposed to [A] and [P].
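The non-uniqueness can be illustrated numerically. The following is a minimal sketch with made-up random matrices (the sizes, the seed, and the mixing matrix X are assumptions for illustration only): for any invertible X, the pair (A X, X⁻¹ P) reproduces the same E as (A, P).

```python
import numpy as np

# Hypothetical toy sizes: 6 genes, 2 TFs, 5 conditions.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 2))   # connectivity matrix (genes x TFs)
P = rng.normal(size=(2, 5))   # regulatory signals (TFs x conditions)
E = A @ P                     # expression matrix (genes x conditions)

# Any invertible 2 x 2 matrix gives another valid factorization of E.
X = np.array([[2.0, 1.0],
              [0.5, 3.0]])
A2 = A @ X
P2 = np.linalg.inv(X) @ P

same_E = np.allclose(A2 @ P2, E)          # both pairs reproduce E
different_factors = not np.allclose(A2, A)  # but the factors differ
print(same_E, different_factors)
```

Constraining which entries of [A] must be zero is what rules out most such X, which is why the criteria on the zero pattern matter.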
Criteria for Unique NCA [E] = [A][P]
1. The connectivity matrix [A] must have full-column rank.
2. When a node in the regulatory layer is removed along
with all of the output nodes connected to it, the resulting
network must be characterized by a connectivity matrix
that still has full-column rank. This condition implies that
each column of [A] must have at least L-1 zeros.
3. [P] must have full row rank. In other words, each
regulatory signal cannot be expressed as a linear
combination of the other regulatory signals.
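The three criteria can be checked mechanically for a given zero pattern. The sketch below is an illustration, not the procedure from Liao et al.: the function name `nca_identifiable`, the boolean pattern `Z`, and the use of random values to stand in for a generic [A] are all assumptions made here.

```python
import numpy as np

def nca_identifiable(Z, P=None, seed=0):
    """Check the NCA criteria for a boolean pattern Z (genes x TFs).

    Z[i, j] is True where A[i, j] is allowed to be non-zero.
    Assumes at least two TFs. Ranks are evaluated on a randomly
    filled matrix with the pattern Z, as a stand-in for a generic A.
    """
    Z = np.asarray(Z, dtype=bool)
    n_tfs = Z.shape[1]
    rng = np.random.default_rng(seed)
    A = np.where(Z, rng.normal(size=Z.shape), 0.0)

    # Criterion 1: A has full column rank.
    if np.linalg.matrix_rank(A) < n_tfs:
        return False

    # Criterion 2: remove TF j and every gene connected to it;
    # the reduced matrix must still have full column rank (L - 1).
    for j in range(n_tfs):
        keep_rows = ~Z[:, j]
        keep_cols = np.arange(n_tfs) != j
        reduced = A[np.ix_(keep_rows, keep_cols)]
        if reduced.shape[0] == 0 or np.linalg.matrix_rank(reduced) < n_tfs - 1:
            return False

    # Criterion 3: P (if supplied) has full row rank.
    if P is not None and np.linalg.matrix_rank(P) < n_tfs:
        return False
    return True

# Toy patterns: each TF with its own two genes, versus a fully connected network.
Z_sparse = np.kron(np.eye(3, dtype=bool), np.ones((2, 1), dtype=bool))
Z_full = np.ones((6, 3), dtype=bool)
print(nca_identifiable(Z_sparse), nca_identifiable(Z_full))
```

The fully connected pattern fails criterion 2 because removing any TF removes every gene, which matches the requirement that each column of [A] contain enough zeros.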
Criterion 2
Estimation of [E]=[A][P]
Iteratively estimate [A] and [P]:
A0 → P1 → A1 → P2 → … until convergence
Convergence criterion: decrease in least-squares error < cutoff
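The alternating scheme can be sketched as follows. This is a simplified illustration under stated assumptions, not the published NCA implementation: the function name `nca_als`, the random starting point A0, the cutoff value, and the synthetic data are all hypothetical, and no normalization of [A] or [P] is applied.

```python
import numpy as np

def nca_als(E, Z, n_iter=200, cutoff=1e-8, seed=0):
    """Alternating least squares for E ~ A P with A restricted to pattern Z."""
    Z = np.asarray(Z, dtype=bool)
    n_genes = Z.shape[0]
    rng = np.random.default_rng(seed)
    A = np.where(Z, rng.normal(size=Z.shape), 0.0)   # A0: random, zeros enforced
    history = []
    prev = np.inf
    for _ in range(n_iter):
        # P-step: with A fixed, solve min ||E - A P||^2 for P.
        P = np.linalg.lstsq(A, E, rcond=None)[0]
        # A-step: with P fixed, fit each gene on its allowed TFs only.
        for i in range(n_genes):
            idx = np.flatnonzero(Z[i])
            if idx.size:
                A[i, idx] = np.linalg.lstsq(P[idx].T, E[i], rcond=None)[0]
        rss = float(np.sum((E - A @ P) ** 2))
        history.append(rss)
        if prev - rss < cutoff:   # decrease in least-squares error below cutoff
            break
        prev = rss
    return A, P, history

# Synthetic example: 9 genes, 3 TFs, 12 conditions (all values made up).
rng = np.random.default_rng(1)
Z = np.array([[1, 0, 0], [1, 1, 0], [0, 1, 0],
              [0, 1, 1], [0, 0, 1], [1, 0, 1],
              [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=bool)
A_true = np.where(Z, rng.normal(size=Z.shape), 0.0)
P_true = rng.normal(size=(3, 12))
E = A_true @ P_true + 0.05 * rng.normal(size=(9, 12))
A_hat, P_hat, history = nca_als(E, Z)
print(len(history), history[-1])
```

Each half-step is an exact least-squares solve, so the residual sum of squares cannot increase from one iteration to the next. The recovered [A] and [P] are still only defined up to a scaling of each TF, which this sketch does not fix.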
NCA, infer TF activity in Yeast
[E] = [A] [P]
How should the restrictions on CS be defined, i.e. which CS{i,j} = 0?
Identify the TF-TG relation by
ChIP-chip experiment
Yeast cell cycle regulation
441 genes vs. 33 transcription factors
Inference of regulatory network by Two-stage
constrained factor analysis
Yu T, Li KC.
Inference of transcriptional regulatory network by
two-stage constrained space factor analysis.
Bioinformatics. 2005 Nov 1;21(21):4033-8.
Inference of regulatory network by two-stage constrained factor analysis
Shortcoming of Liao et al.’s approach:
E = AP
Let Cij = I{Aij ≠ 0}, the constraint on where the loading
matrix A can be non-zero.
C comes from a very noisy source.
Estimate C, A, and P simultaneously.
Model setting
Gene expression matrix (Gene x Condition)
= Regulation strength matrix B (Gene x TF, to be estimated)
× TF activity matrix T (TF x Condition, to be estimated)
+ Error matrix
Constrained by:
$c_{i,j}\, b_{i,j} = b_{i,j}, \quad \forall\, i, j$
Connection constraint matrix $C_{N \times K}$ (Gene x TF)
1: connection; 0: no connection
Up to here, it is the NCA model by Liao et al.
Model Fitting
However, we do not assume full knowledge of C.
We require C to be bounded by CMIN and CMAX:
CMIN: higher-confidence set, from biological evidence
CMAX: lower-confidence set, from ChIP data
Model Fitting
Difficulties:
Simultaneous estimation of both the structure and coefficients amounts to
finding optimum in a very complex function.
The number of parameters to be estimated is overwhelming.
Solution:
Find a reasonable local optimum.
Use the high-confidence set to find a starting point as close to the global
optimum as possible.
Implementation:
Stepwise model fitting.
Start with a network backbone containing only the high-confidence set, and
grow the network gradually, drawing new connections from the low-confidence set.
1. Set C = CMIN and estimate each activity profile tk by the
consensus of the expression of the regulated genes.
2. Estimate B and T by alternating least squares, using
ridge regression.
3. Is the reduction of total RSS in the last few steps too small?
NO: from (CMAX-C), find the TF-gene pair that best agrees
with the current estimates of B and T, add it to C, and return to step 2.
YES: go to step 4.
4. Fix the estimate of T and regress each gene expression profile
on the activity profiles of the TFs that are associated with it
in CMAX. Use BIC and p-values to select TFs.
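The final selection step can be sketched for a single gene as below. This is an illustrative assumption about how such a selection might be coded, not the authors' implementation: the function name `select_tfs_by_bic`, the exhaustive subset search, the no-intercept regression, and the synthetic profiles are all hypothetical, and the p-value filter mentioned above is omitted.

```python
import itertools
import numpy as np

def select_tfs_by_bic(y, T, candidates):
    """Pick the subset of candidate TFs whose activity profiles best explain y.

    y: expression profile of one gene (length n).
    T: fixed TF activity matrix (TFs x conditions).
    candidates: indices of TFs associated with this gene in CMAX.
    BIC is computed as n*log(RSS/n) + k*log(n), with no intercept term.
    """
    n = y.size
    floor = 1e-12                      # guards against log(0) for perfect fits
    best_subset = ()
    best_bic = n * np.log(max(float(np.sum(y ** 2)), floor) / n)   # empty model
    for k in range(1, len(candidates) + 1):
        for subset in itertools.combinations(candidates, k):
            X = T[list(subset)].T      # conditions x k design matrix
            coef = np.linalg.lstsq(X, y, rcond=None)[0]
            rss = float(np.sum((y - X @ coef) ** 2))
            bic = n * np.log(max(rss, floor) / n) + k * np.log(n)
            if bic < best_bic:
                best_bic, best_subset = bic, subset
    return best_subset, best_bic

# Synthetic, deterministic profiles over 16 conditions (made-up values).
t = 2 * np.pi * np.arange(16) / 16
T = np.vstack([np.sin(t), np.cos(t), np.sin(2 * t), np.cos(2 * t)])
y = 2.0 * T[0] - 1.5 * T[2] + 0.1 * np.sin(3 * t)   # driven by TF 0 and TF 2
selected, bic = select_tfs_by_bic(y, T, candidates=(0, 1, 2, 3))
print(selected)
```

An exhaustive search is only practical when each gene has a handful of candidate TFs; with many candidates a stepwise search would be the more realistic choice.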
Result
Data:
Regular growth ChIP data;
cell-cycle microarray data;
99 TFs enter our study.
Start with 891 evidenced relationships and 29154
lower-confidence relationships.
Final network has 3846 TF-gene connections.
TF’s that exhibit correlated expression and activity:
Time-shifting between a TF’s activity profile and
its expression profile:
(1) Fit the activity profile with a cubic spline.
(2) Interpolate the spline to get the shifted profile.
(3) Compute the correlation between the expression
profile and the shifted activity profile.
(4) Maximize the absolute correlation with respect to
the shift (in minutes).
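The four steps above can be sketched with SciPy's `CubicSpline`. The function name `best_shift`, the sign convention (a positive shift means activity lags behind expression), the grid of candidate shifts, and the sinusoidal test profiles are assumptions for illustration; time points whose shifted position falls outside the sampled range are dropped rather than extrapolated.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def best_shift(times, activity, expression, shifts):
    """Return (shift, correlation) maximizing |corr(expression(t), activity(t + shift))|."""
    spline = CubicSpline(times, activity)          # step 1: fit the activity profile
    best_s, best_r = 0.0, 0.0
    for s in shifts:
        shifted_t = times + s
        # Keep only time points where the shifted profile is inside the data range.
        ok = (shifted_t >= times[0]) & (shifted_t <= times[-1])
        if ok.sum() < 3:
            continue
        shifted_activity = spline(shifted_t[ok])   # step 2: shifted profile
        r = np.corrcoef(expression[ok], shifted_activity)[0, 1]   # step 3
        if abs(r) > abs(best_r):                   # step 4: maximize |correlation|
            best_s, best_r = float(s), float(r)
    return best_s, best_r

# Made-up example: 7-minute sampling, 60-minute period, activity lags by 10 minutes.
times = np.arange(0.0, 120.0, 7.0)
expression = np.sin(2 * np.pi * times / 60.0)
activity = np.sin(2 * np.pi * (times - 10.0) / 60.0)
shift, corr = best_shift(times, activity, expression, shifts=np.arange(-15, 16))
print(shift, corr)
```

For periodic profiles the search range should stay well below half a period, since a shift of half a period yields an equally strong correlation of opposite sign.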
TF’s that have activity lagging behind expression:
SWI4
Between-TF regulations: