Leibniz-Institute for Natural Product Research and Infection Biology, Hans-Knoell-Institute, Jena, Germany

R. Guthke, M. Hecker, W. Schmidt-Heck, S. Lambeck, S. Hummert, S. Priebe:
Assessment of a Dynamic Network Model Inference Methods

Worldwide Activities in the Assessment of Network Inference Methods
e.g. DREAM = Dialogue for Reverse Engineering Assessments and Methods
http://wiki.c2b2.columbia.edu/dream/index.php/The_DREAM_Project
• Special issue on DREAM2 (12/2007): Annals of the New York Academy of Sciences, Vol. 1115, "Reverse Engineering Biological Networks: Opportunities and Challenges in Computational Methods for Pathway Inference", ed. Gustavo Stolovitzky and Andrea Califano
• DREAM3 challenges (15/09/2008); DREAM3 conference, 29/10-02/11/2008, Cambridge, MA

Inference by Cyclic Operation of Experimental and Modeling Work
[Diagram elements: Experiment; Hypotheses; Data Pre-processing; Feature Selection; Literature & Databases; Model Optimization; Model Validation]

Model Optimization
[Diagram elements: Experimental Data (pre-processed, selected); Model Structure Search; Model Parameter Fit]

Model Optimization: Model Structure Optimization Methods
NetGenerator

NetGenerator Inference Algorithm
Guthke et al., Bioinformatics, 2005; Toepfer et al., Lect Notes Bioinf, 2007
BioControl Jena GmbH
Model:
\[ \frac{dx_i}{dt} = \sum_{j=1}^{n} a_{ij} x_j + b_i , \quad i = 1, \dots, n \]
Algorithm: heuristic optimization algorithm (combined local search heuristics) minimizing
- the model fit error mse and
- the number N of non-zero model parameters
Prior knowledge is considered by regularisation.
\[ \chi^2 = \frac{mse}{n - N} \]
Generalized Cross-Validation index:
\[ GCV = \frac{mse}{\left(1 - N/n\right)^2} \]
where mse is the mean square error, n the number of observations (sample size), and N the number of parameters to be estimated.

Application of the NetGenerator Algorithm to Poor Transcriptome Time Series Data
4 experimental examples with # microarrays m = 5, 6, 15, 8
→ networks with # nodes (genes, variables) n = 6, 4, 5, 7

Infection 1/2
Response of human blood cells (PBMCs) to infection by the pathogen Escherichia coli (m = 5, n = 6)
[Figure panels: cluster analysis; representative gene expression profiles; network model]
Guthke et al., Bioinformatics (2005)

Infection 2/2
Stress response of the human-pathogenic fungus Aspergillus fumigatus to a temperature shift (m = 6, n = 4)
[Figure panels: cluster analysis; representative gene expression profiles; network model with nodes Temp, erg11, hsp30, rpl3, cat2]
Guthke et al., Lect Notes Bioinf (2007)

Model Optimization: Model Structure Optimization Methods
JCell (U Tübingen), A. Zell et al.

JCell
Modelling of
• activation and inhibition of transcription by Hill kinetics
• mRNA degradation by linear kinetics
Spieth C et al., Bioinformatics (2006)
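As a rough illustration of the two rate terms named for JCell above (Hill kinetics for activation and inhibition of transcription, linear kinetics for mRNA degradation), the following Python sketch models a single gene driven by one regulator. The function names (`hill_activation`, `hill_inhibition`, `mrna_rate`, `simulate_mrna`) and the parameters `v_max`, `k`, `h`, `gamma` are assumptions made for illustration only; the exact rate law used in JCell is not given here, and the explicit Euler integration is just the simplest choice.

```python
def hill_activation(x, k, h):
    """Hill activation term: 0 at x = 0, 0.5 at x = k, saturating towards 1."""
    return x**h / (k**h + x**h)


def hill_inhibition(x, k, h):
    """Hill inhibition term: 1 at x = 0, 0.5 at x = k, decaying towards 0."""
    return k**h / (k**h + x**h)


def mrna_rate(m, regulator, v_max, k, h, gamma, activating=True):
    """dm/dt = Hill-regulated transcription minus linear degradation gamma * m."""
    hill = hill_activation if activating else hill_inhibition
    return v_max * hill(regulator, k, h) - gamma * m


def simulate_mrna(m0, regulator, n_steps, dt, **params):
    """Integrate dm/dt with explicit Euler steps for a constant regulator level."""
    m, trajectory = m0, [m0]
    for _ in range(n_steps):
        m = m + dt * mrna_rate(m, regulator, **params)
        trajectory.append(m)
    return trajectory


# Hypothetical parameter values, chosen only for illustration.
traj = simulate_mrna(0.0, regulator=1.0, n_steps=2000, dt=0.05,
                     v_max=2.0, k=1.0, h=2, gamma=0.1)
print(traj[-1])  # approaches the steady state v_max * hill / gamma
```

With a constant regulator the mRNA level relaxes towards the balance between transcription and degradation, which is why the degradation term is enough to keep the model bounded.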
Explicit structure optimization by a global search heuristic driven by a memetic algorithm (evolutionary approach with local improvement procedures for problem search)

JCell versus NetGenerator: DREAM2 "5-Gene-Challenge"
Comparison for the DREAM2 "5-Gene-Challenge": m = 15; n = 5 genes transfected into yeast; qRT-PCR data after perturbation, 2 replicates
[Figure: expression level (0 to 0.035) of the mRNA of genes 1 to 5 versus time in minutes (0 to 280)]
[Figure: mean and std of the replicates with the NetGenerator and JCell fits; panels for gene 1 and genes 2-5]
SSE = standard squared error
SSE_JCell ≈ 0.0025 > SSE_NetGenerator ≈ 0.0017
[Figure panels: original network; network inferred by JCell; network inferred by NetGenerator]

NetGenerator: DREAM3 "Gene Expression Prediction Challenge"
m = 8, n = 7
• Data: gene expression time series for 5113 genes at eight time points for four different strains of yeast
• Prior structural knowledge: regulatory associations between transcription factors and target genes from the YEASTRACT database
• October 15, 2008: notification of predictors of their scores and ranks in DREAM3

Model Optimization: Model Structure Optimization Methods
LASSO = Least Absolute Shrinkage and Selection Operator
Tibshirani 1996, van Someren 2002 & 2006

LASSO Inference Method
Least Absolute Shrinkage and Selection Operator; Tibshirani 1996, van Someren 2002 & 2006
Model:
\[ x_i[t+1] - x_i[t] = \sum_{j=1}^{n} a_{ij} x_j[t] + b_i , \quad i = 1, \dots, n \]
Algorithm: minimization of both
- the squared residuals
- a weighted penalty term for the parameter values
\[ \hat{\beta}_\lambda = \arg\min_{\beta} \left\{ \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{m} x_{i,j} \, \beta_j \Big)^2 + \lambda \sum_{j=1}^{m} |\beta_j| \right\} \]
Realized by adaptation of Grandvalet's EM algorithm, 1998

Modified LASSO Inference Method
The modified LASSO penalizes parameters depending on the prior knowledge about gene regulatory effects (inhibitory; none/undefined → usual LASSO; activating)
\[ \hat{\beta}_\lambda = \arg\min_{\beta} \left\{ \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{m} x_{i,j} \, \beta_j \Big)^2 + \lambda \sum_{j=1}^{m} \big( 1 - \Theta(V_j \, \mathrm{sgn}(\beta_j)) \big) |\beta_j| \right\} \]
Hecker et al., NiSIS (2006)

Application of the LASSO Algorithm to Experimental Data

Anti-TNF-alpha therapy of Rheumatoid Arthritis
Response of patients to Etanercept, a TNF-alpha blocker
m = 12, 9 + prior knowledge, n = 20
Hecker et al., NiSIS (2006); Koczan et al., Arth Res Ther (2008)

Assessment using Synthetic (in silico) Data
• Different numbers of sampling time points: m = 5, 10 and 20
• Synthetic networks with n = 3, 5, 10 and 20 genes
Linear model for simulation of gene expression data:
\[ \frac{dx_i}{dt} = \sum_{j=1}^{n} a_{ij} x_j + b_i , \quad i = 1, \dots, n \]
Solve the system of ordinary differential equations (ODE) to simulate data

Generation of Synthetic Data
• Scale-free networks (after Barabási and Albert's model, 1999) were randomly created
• They follow a power-law degree distribution of the form
\[ P(k) \sim k^{-\gamma} , \quad \gamma = 3 \]

Pre-processing of Artificial Data
Addition of noise: the original values of the numerical solution are disturbed by a Gaussian distributed value ε_ik with different standard deviations σ
\[ y_{ik} = x_i(t_k) \, (1 + \varepsilon_{ik}) \]
Filter out low-informative data sets, characterized by
• constant time series
• strongly correlated data (co-linearity check: Pearson's correlation coefficient > 0.98)
Then apply the network inference algorithms to the data (without prior knowledge; for LASSO λ = 10^{-7})

Scores for comparison: original vs inferred
TP: true positives; FP: false positives; TN: true negatives; FN: false negatives
The tool found a connection
- that is also in the given net: TP
- that is also in the given net, but with the wrong sign: FPS
- where the given net has no connection: FPN
The tool did not find a connection
- although there is one in the given net: FN
- where the given net also has none: TN
[Figure: example original and inferred networks with connections labelled TN, TP, TP, FPS, FN, FPN]

Scores to assess the network inference method
• Sensitivity = Se = TP/(TP+FN+FPS)
• Specificity = Sp = TN/(TN+FPN)
• Precision = Pr = TP/(TP+FPS+FPN)
• MeanSeSp = (Se+Sp)/2
• Jaccard = TP/(TP+FPN+FPS+FN)
• F-measure = 2*Se*Pr/(Se+Pr)
• AUC(ROC: Se vs (1-Sp))
• AUC(ROC: Pr vs (1-Sp))

Results: Assessment of network inference
[Pie charts: shares of TP, TN, FN, FPN and FPS for 500 3-gene, 500 5-gene, 100 10-gene and 100 20-gene networks; NetGenerator (m = 5 non-equidistant time points: 0, 1, 2, 5, 10; σ = 0.05) and LASSO (m = 5 equidistant time points: 0, 2, 4, 6, 8; σ = 0.05)]

Results: Assessment of network inference
[Pie charts: shares of TP, TN, FN, FPN and FPS for 500 3-gene, 500 5-gene, 100 10-gene and 100 20-gene networks; NetGenerator (m = 20 time points: 0, 1, ..., 20; σ = 0.05) and LASSO (m = 20 time points: 0, 1, ..., 20; σ = 0.05)]

Results: Assessment of network inference
m = 5, 10, 20; n = 3
A higher number of sampling time points compensates for a higher noise level σ (NetGenerator results; example: 3-gene networks)
[Figure: Jaccard index and (Se+Sp)/2 (0 to 1) versus # sampling time points m = 5, 10, 20 for noise levels σ = 0, σ = 0.05, σ = 0.1]

Results: Assessment of network inference
A higher number of sampling time points compensates for a higher noise level σ
Examples of NetGenerator results
[Figure: (Se+Sp)/2 (0 to 1) versus # sampling time points m = 5, 10, 20 for noise levels σ = 0, σ = 0.05, σ = 0.1, with network panels: original synthetic network; (1) 5 time points, σ = 0.0; (2) 5 time points, σ = 0.1; (3) 20 time points, σ = 0.1]

Why were networks not correctly inferred?
1. Non-identifiability: different networks can be fitted to the same data set, i.e. the inferred model is not unique (e.g. the measured data set may be insufficient)
2. The model fit may be insufficient

Why were networks not correctly inferred?
The number of sampling time points may be insufficient:
Original structure → simulation of sampling → inference by NetGenerator → inferred structure
• K = 5; simulated sampling at t_k = {0, 1, 3, 7, 20} h
• K = 8; simulated sampling at t_k = {0, 1, 3, 7, 10, 14, 18, 20} h
[Figure: inferred structures, labelled false and true]

Both methods infer cyclic structures
m = 10; n = 3, 5, 10, 20
[Figure: fraction of feedback connections found (0 to 100%) versus number of genes (3, 5, 10, 20) for NetGenerator, LASSO with λ = 1e-007 and LASSO with λ = 0.005; 10 time points, no noise. Network panels: original network; network inferred by NetGenerator; network inferred by LASSO, λ = 1e-007 (FP high); network inferred by LASSO, λ = 0.005]

Results: Assessment of network inference
Gene-gene connections a_ij are inferred more reliably than perturbation connections b_i (using NetGenerator)
\[ \frac{dx_i}{dt} = \sum_{j=1}^{n} a_{ij} x_j + b_i , \quad i = 1, \dots, n \]
[Pie charts, 20 genes, 100 nets, for σ = 0.0 and σ = 0.05:
 gene-gene connections a_ij, σ = 0.0: TP 5%, FN 12%, FP_N 8%, FP_S 0%
 gene-gene connections a_ij, σ = 0.05: TP 7%, FN 10%, FP_N 7%
 gene-gene connections a_ij, TN in the two charts: 74% and 76%
 perturbation connections b_i, first chart: TP 24%, TN 22%, FN 5%, FP_N 47%, FP_S 1%
 perturbation connections b_i, second chart: TP 12%, TN 22%, FN 9%, FP_N 48%, FP_S 9%]

Conclusions for dynamic network inference from poor time series data
• Increasing the number of time points results in more correctly inferred connections, especially when the data are noisy
• The influence of noise is stronger for LASSO and, in general, for small nets
• Cyclic structures (feedback connections) are detectable
• Perturbation connections are detected less reliably than gene-gene connections (for bigger networks)
• Integration of prior knowledge is necessary (in particular for perturbation connections)

Acknowledgement
Leibniz-Institute for Natural Product Research and Infection Biology, Hans-Knoell-Institute, Jena, Germany
Projects: HepatoSys, BioChancePlus, FORSYS, DFG, Industry,...

Application to Synthetic Data Simulated by Linear and Nonlinear Models
Example: 2-gene network
Linear model:
\[ \frac{dx_1}{dt} = a_{11} x_1 + a_{12} x_2 + b_1 , \qquad \frac{dx_2}{dt} = a_{21} x_1 + a_{22} x_2 + b_2 \]
Nonlinear model:
\[ \frac{dx_1}{dt} = a_{11} x_1 + a_{12} \frac{x_2}{1 + x_2} + b_1 , \qquad \frac{dx_2}{dt} = a_{21} \frac{x_1}{1 + x_1} + a_{22} x_2 + b_2 \]
Choose parameter values (different topologies)

Results: Influence of Noise (NetGenerator)
2 genes          Without noise    Noise, std 0.05    Noise, std 0.1
Sensitivity      0.92             0.89               0.84
Specificity      0.86             0.67               0.54
Precision        0.91             0.83               0.77

Results: Influence of the Number of Genes (NetGenerator)
Scale-free topologies (Barabási AL et al. (1999): Physica A 272, 173-187), requiring preconditions
Number of genes n    3       5       10      20
Sensitivity          0.75    0.56    0.41    0.34
Specificity          0.62    0.63    0.78    0.90
Precision            0.78    0.59    0.46    0.41
F-measure            0.76    0.57    0.43    0.37

Results: Linear Data vs. Nonlinear Data (NetGenerator)
Without noise, 2 genes
                 Linear data    Nonlinear data
Sensitivity      0.92           0.79
Specificity      0.86           0.59
Precision        0.91           0.78
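
The sensitivity, specificity and precision values reported above follow the score definitions given earlier (with false positives split into wrong-sign FPS and no-connection FPN). A minimal Python sketch of that bookkeeping is shown here. It assumes that the original and the inferred network are available as signed adjacency matrices (nested lists with entries -1, 0, +1); the function names `count_edges` and `scores` and the example matrices are hypothetical and not taken from the presented tools.

```python
def count_edges(original, inferred):
    """Count TP, TN, FN, FPS and FPN over all entries of two signed matrices."""
    counts = {"TP": 0, "TN": 0, "FN": 0, "FPS": 0, "FPN": 0}
    for row_o, row_i in zip(original, inferred):
        for o, i in zip(row_o, row_i):
            if i == 0:
                # No connection inferred: missed edge (FN) or correct absence (TN).
                counts["FN" if o != 0 else "TN"] += 1
            elif o == 0:
                counts["FPN"] += 1   # inferred, but absent in the original
            elif o == i:
                counts["TP"] += 1    # inferred with the correct sign
            else:
                counts["FPS"] += 1   # inferred with the wrong sign
    return counts


def ratio(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def scores(c):
    """Scores as defined on the slide, computed from the count dictionary."""
    se = ratio(c["TP"], c["TP"] + c["FN"] + c["FPS"])
    sp = ratio(c["TN"], c["TN"] + c["FPN"])
    pr = ratio(c["TP"], c["TP"] + c["FPS"] + c["FPN"])
    return {
        "Se": se,
        "Sp": sp,
        "Pr": pr,
        "MeanSeSp": (se + sp) / 2,
        "Jaccard": ratio(c["TP"], c["TP"] + c["FPN"] + c["FPS"] + c["FN"]),
        "F": ratio(2 * se * pr, se + pr),
    }


# Hypothetical 3-gene example (rows: target gene, columns: regulator).
original = [[-1, 1, 0], [0, -1, 1], [1, 0, -1]]
inferred = [[-1, 1, 0], [0, -1, -1], [0, 1, -1]]
counts = count_edges(original, inferred)
print(counts)
print(scores(counts))
```

Counting a wrong-sign connection against both sensitivity and precision, as the definitions do, is a stricter choice than treating any detected connection as a hit.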