Integration of all datasets

Results from the 52 datasets are integrated by a second layer of the Bayesian network. Each dataset is weighted by how well it recovers the regulatory gold-standard pairs specific to erythropoiesis [1-3]. The final posterior probability of a regulatory relationship is calculated according to Bayes' rule:

$$P(FR_{i,j} = 1 \mid E_1, E_2, \ldots, E_n) = \frac{1}{Z}\, P(FR_{i,j} = 1) \prod_{k=1}^{n} P(E_k \mid FR_{i,j} = 1) \qquad (2)$$

where $FR_{i,j} = 1$ denotes that gene $i$ regulates gene $j$, $n = 52$ is the number of datasets, $Z$ is the normalization constant, and $P(E_k \mid FR_{i,j} = 1)$ is the score $S(i,j)$ in dataset $k$ as inferred from the first-layer dynamic Bayesian network.

Regulatory likelihood within each dataset

For each dataset, data were converted into pairwise regulatory scores $S(i,j)$, which correspond to the likelihood of gene $i$ regulating gene $j$. Four methods were used to determine this score: DBN, time-lagged correlation, Lasso regularization and TSVD. For all of these methods, the score $S$ is not symmetric because regulatory relationships are directional, i.e. $S(i,j) \neq S(j,i)$.

Dynamic Bayesian Network (DBN): The method described in [4] is used to determine the regulatory likelihood score in each time-course dataset retrieved from GEO, with the transcriptional time lag fixed to one. Briefly, a statistical analysis is used to determine the regulator-target gene pairs across different time slices. Instead of calculating a correlation between a transcription factor and a gene, DBN first uses the time difference between the initial change in the expression of a regulatory gene and that of its potential target gene to estimate the transcriptional time lag between the two genes. DBN then calculates the conditional probabilities of the target gene and its potential regulator gene changing together in a time-lagged manner.
This particular DBN implementation [4] allows the number of potential regulators to be limited, which reduces the search space.

Time-Lagged Correlation: The Pearson product-moment correlation coefficient is used in the time-lagged correlation analysis, with a time lag of one. The direction of the interaction is determined by the order of the two genes in the time-lagged analysis.

Lasso regularization: We formulate the regulatory network inference problem within each dataset as a regularization problem and solve it using Lasso regularization [5]. Briefly, the time-course data are described by the following equation:

$$E_k(t_{i+1}) - E_k(t_i) = \sum_{j=1}^{N} \left( E_j(t_i)\, R_{j,k} \right) + \varepsilon \qquad (1)$$

where $E_k(t_i)$ represents the expression level of gene $k$ at the $i$th time point. $R_{j,k}$ is the regression coefficient indicating how strongly gene $j$ affects gene $k$; it is normalized to [0,1] and then used as the inferred probability. $N$ is the number of genes and $\varepsilon$ is the error caused by uncontrollable factors such as measurement errors. The Lasso parameter $\lambda$ was set to 0.25.

Truncated Singular Value Decomposition (TSVD): Similar to Lasso regularization, TSVD was used to calculate a regulatory regression score within each dataset [6-8]. The truncation parameter was set to 0.003 times the maximal singular value in the diagonal matrix.

In the subsequent steps, we evaluated the performance of each of the above base learners and found that DBN performed best; DBN was therefore used as the first layer in the graphical model.

1. Guan, Y., et al., A genomewide functional network for the laboratory mouse. PLoS Computational Biology, 2008. 4(9): p. e1000165.
2. Huttenhower, C., et al., Exploring the human genome with functional maps. Genome Research, 2009. 19(6): p. 1093-1106.
3. Guan, Y., et al., Tissue-specific functional networks for prioritizing phenotype and disease genes. PLoS Computational Biology, 2012. 8(9): p. e1002694.
4. Zou, M. and S.D. Conzen, A new dynamic Bayesian network (DBN) approach for identifying gene regulatory networks from time course microarray data. Bioinformatics, 2005. 21(1): p. 71-79.
5. Tibshirani, R., Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 1996: p. 267-288.
6. Hansen, P.C., T. Sekii, and H. Shibahashi, The modified truncated SVD method for regularization in general form. SIAM Journal on Scientific and Statistical Computing, 1992. 13(5): p. 1142-1150.
7. Zhu, F., et al., Computed tomography perfusion imaging denoising using gaussian process regression. Phys Med Biol, 2012. 57(12): p. N183-98. http://www.ncbi.nlm.nih.gov/pubmed/22617159
8. Zhu, F., et al., Lesion Area Detection Using Source Image Correlation Coefficient for CT Perfusion Imaging. Biomedical and Health Informatics, IEEE Journal of, 2013. 17(5). http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6484091
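
As a rough illustration of how the two layers described in the methods fit together, the following Python sketch computes directed per-dataset scores and combines them in the form of Equation (2). It is not the authors' implementation and rests on several assumptions: lag-one Pearson correlation stands in for the DBN base learner, the absolute correlation is used so that scores fall in [0, 1], each score is treated as $P(E_k \mid FR_{i,j} = 1)$ with $1 - S$ assumed for $P(E_k \mid FR_{i,j} = 0)$ so that $Z$ can be computed, the per-dataset weighting is omitted, and the function names and prior value are hypothetical.

```python
import numpy as np


def lagged_pearson_scores(expr, lag=1):
    """Directed scores S[i, j] = |corr(gene i at t, gene j at t + lag)|.

    expr has shape (n_genes, n_timepoints). Taking the absolute value is an
    assumption made here so that scores fall in [0, 1].
    """
    expr = np.asarray(expr, dtype=float)
    earlier = expr[:, :-lag]  # candidate regulators at time t
    later = expr[:, lag:]     # candidate targets at time t + lag

    def standardize(x):
        centered = x - x.mean(axis=1, keepdims=True)
        sd = centered.std(axis=1, keepdims=True)
        # Constant series carry no information; give them zero scores.
        return np.divide(centered, sd, out=np.zeros_like(centered), where=sd > 0)

    n_pairs = earlier.shape[1]
    corr = standardize(earlier) @ standardize(later).T / n_pairs
    return np.clip(np.abs(corr), 0.0, 1.0)


def integrate_posterior(scores, prior, eps=1e-12):
    """Combine per-dataset scores into a posterior, following Equation (2).

    scores has shape (n_datasets, n_genes, n_genes) and is treated as
    P(E_k | FR = 1); P(E_k | FR = 0) = 1 - score is an assumption of this
    sketch. prior is P(FR = 1). Sums of logs avoid underflow when many
    datasets are multiplied together.
    """
    scores = np.clip(np.asarray(scores, dtype=float), eps, 1.0 - eps)
    log_pos = np.log(prior) + np.log(scores).sum(axis=0)
    log_neg = np.log1p(-prior) + np.log1p(-scores).sum(axis=0)
    log_z = np.logaddexp(log_pos, log_neg)  # log of the normalizer Z
    return np.exp(log_pos - log_z)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three synthetic time courses: 4 genes, 10 time points each.
    datasets = [rng.normal(size=(4, 10)) for _ in range(3)]
    scores = np.stack([lagged_pearson_scores(d) for d in datasets])
    posterior = integrate_posterior(scores, prior=0.01)
    print(posterior.round(3))
```

Because the regulator is always taken from the earlier time point, the score matrix is generally asymmetric, which mirrors the directionality of $S(i,j)$ noted in the methods. Working in log space is a design choice for numerical stability when the product runs over all 52 datasets.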