Analysis of the yeast transcriptional regulatory network

Transcription Factor (TF)
A TF is a protein that binds to specific DNA sequences and regulates the transcription of the corresponding genes.
The binding site of a TF is usually a short segment of the promoter sequence.
The activity of a TF is regulated according to the cell's needs, largely through signal transduction.
TF activity may not be directly observable, but it is reflected in the expression of the genes the TF regulates.

Expression regulatory network
Identifying the expression regulatory network is a crucial step towards understanding the cellular regulation system.
Two approaches:
Inferring the network from microarray data alone
Inferring the network from microarray data and TF-TG (Target Gene) information
Ihmels J, Friedlander G, Bergmann S, Sarig O, Ziv Y, Barkai N. Revealing modular organization in the yeast transcriptional network. Nat Genet. 2002 Aug;31(4):370-7.
Segal E et al. Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat Genet. 2003 Jun;34(2):166-76.

TF Activity
Using TF-TG relations benefits regulatory network identification.
The expression level of a TF is not a good measure of its activity.
The level of activated TF protein, rather than the TF's expression level, is what controls gene expression.

Identify TF Activity by NCA
Network Component Analysis (NCA)
Liao JC et al. Network component analysis: reconstruction of regulatory signals in biological systems. Proc Natl Acad Sci U S A. 2003 Dec 23;100(26):15522-7.

NCA compared with PCA, ICA

NCA Model
[E] = [A][P], where [E] is the gene expression matrix, [A] is the connectivity matrix, and [P] holds the regulatory signals.
Without further constraints, [E] cannot be uniquely decomposed into [A] and [P].

Criteria for Unique NCA
[E] = [A][P]
1. The connectivity matrix [A] must have full column rank.
2.
When a node in the regulatory layer is removed along with all of the output nodes connected to it, the resulting network must be characterized by a connectivity matrix that still has full column rank. This condition implies that each column of [A] must have at least L-1 zeros, where L is the number of regulatory nodes.
3. [P] must have full row rank. In other words, no regulatory signal can be expressed as a linear combination of the other regulatory signals.

Criterion 2

Estimation of [E] = [A][P]
Iteratively estimate [A] and [P]: A0 → P1 → A1 → P2 → … until convergence.
Convergence criterion: the decrease in least-squares error falls below a cutoff.

NCA, infer TF activity in Yeast
[E] = [A][P]
How should the restrictions on CS be defined, i.e. which entries CS{i,j} = 0?
Identify the TF-TG relations from ChIP-chip experiments.

Yeast cell cycle regulation
441 genes vs. 33 transcription factors

Inference of regulatory network by two-stage constrained factor analysis
Yu T, Li KC. Inference of transcriptional regulatory network by two-stage constrained space factor analysis. Bioinformatics. 2005 Nov 1;21(21):4033-8.

Shortcoming of Liao et al.'s approach:
E = AP
Let $C_{ij} = I\{A_{ij} \neq 0\}$, the constraint on where the loading matrix A can be non-zero.
C comes from a very noisy source.
Remedy: estimate C, A, and P simultaneously.

Model setting
Gene expression matrix (Gene x Condition) = Regulation strength matrix $B$ (Gene x TF, to be estimated) times TF activity matrix $T$ (TF x Condition, to be estimated) + Error matrix (Gene x Condition)
Constrained by: $c_{i,j}\, b_{i,j} = b_{i,j}, \ \forall i, j$
Connection constraint matrix $C$ ($N \times K$, Gene x TF): 1 = connection; 0 = no connection
Up to here, it is the NCA model by Liao et al.

Model Fitting
However, we do not assume full knowledge of C. We require C to be bounded by CMIN and CMAX:
CMIN: higher-confidence set, from biological evidence
CMAX: lower-confidence set, from ChIP data

Model Fitting
Difficulties:
Simultaneous estimation of both the structure and the coefficients amounts to finding the optimum of a very complex function.
The number of parameters to be estimated is overwhelming.
Solution:
Find a reasonable local optimum.
Use the high-confidence set to find a starting point as close to the global optimum as possible.
Implementation:
Stepwise model fitting. Start with a network backbone containing only the high-confidence set, and grow the network gradually, drawing new connections from the low-confidence set.

Stepwise fitting procedure
(1) Set C = CMIN and estimate each activity profile t_k by the consensus of the expression of the genes it regulates.
(2) Estimate B and T by alternating least squares, using ridge regression.
(3) Is the reduction of total RSS over the last few steps too small?
    NO: from (CMAX - C), find the TF-gene pair that best agrees with the current estimates of B and T, add it to C, and return to step (2).
    YES: go to step (4).
(4) Fix the estimate of T and regress each gene expression profile on the activity profiles of the TFs associated with it in CMAX. Use BIC and p-values to select TFs.

Result
Data: regular-growth ChIP data; cell-cycle microarray data; 99 TFs enter our study.
Start with 891 evidenced relationships and 29154 lower-confidence relationships.
The final network has 3846 TF-gene connections.

TFs that exhibit correlated expression and activity:

Time-shifting between a TF's activity profile and its expression profile:
(1) Fit the activity profile with a cubic spline.
(2) Interpolate the spline to obtain the shifted profile.
(3) Compute the correlation between the expression profile and the shifted activity profile.
(4) Maximize the absolute correlation with respect to the shift, measured in minutes.

TFs whose activity lags behind expression:

TFs whose activity lags behind expression: SWI4

Between-TF regulations:
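
The four time-shifting steps listed earlier (spline fit, shifted interpolation, correlation, maximization over the shift) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the natural boundary conditions of the spline, the 1-minute search grid, and the synthetic sine-wave profiles are all assumptions made for illustration; no yeast data are used.

```python
import bisect
import math

def natural_cubic_spline(xs, ys):
    """Return a function evaluating a natural cubic spline through (xs, ys).
    Natural boundary conditions (zero second derivative at both ends) are an
    assumption; the slides only say 'cubic spline'."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Tridiagonal system for the interior second derivatives m[1..n-1].
    diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
    rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
           for i in range(1, n)]
    for j in range(1, n - 1):            # forward elimination (Thomas algorithm)
        w = h[j] / diag[j - 1]
        diag[j] -= w * h[j]
        rhs[j] -= w * rhs[j - 1]
    m = [0.0] * (n + 1)                  # m[0] = m[n] = 0 (natural ends)
    for j in range(n - 2, -1, -1):       # back substitution
        m[j + 1] = (rhs[j] - h[j + 1] * m[j + 2]) / diag[j]

    def evaluate(x):
        # Locate the knot interval [xs[i], xs[i+1]] containing x.
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        a, b = xs[i + 1] - x, x - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)
    return evaluate

def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0

def best_time_shift(times, expression, activity, max_shift, step=1.0):
    """Scan shifts in [-max_shift, max_shift] and return (shift, correlation)
    with the largest absolute correlation. A positive shift means that
    activity(t + shift) tracks expression(t), i.e. activity lags expression."""
    spline = natural_cubic_spline(times, activity)       # step (1)
    best = (0.0, 0.0)
    n_steps = int(round(max_shift / step))
    for k in range(-n_steps, n_steps + 1):
        shift = k * step
        # Step (2): shifted activity, kept only where it stays inside the data.
        pairs = [(e, spline(t + shift)) for t, e in zip(times, expression)
                 if times[0] <= t + shift <= times[-1]]
        if len(pairs) < 3:
            continue
        r = pearson([p[0] for p in pairs], [p[1] for p in pairs])  # step (3)
        if abs(r) > abs(best[1]):                                  # step (4)
            best = (shift, r)
    return best

# Synthetic illustration only: activity is built to lag expression by 10 minutes.
times = [7.0 * i for i in range(18)]
expression = [math.sin(2 * math.pi * t / 60.0) for t in times]
activity = [math.sin(2 * math.pi * (t - 10.0) / 60.0) for t in times]
shift, corr = best_time_shift(times, expression, activity, max_shift=15.0)
print(shift, corr)
```

The search range should stay well below half the period of the profiles. For a periodic signal, a shift of half a period gives an equally strong negative correlation, which the absolute-value criterion cannot tell apart from the true lag.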