Assessment of a Dynamic Network Model Inference Methods

Leibniz-Institute for Natural Product
Research and Infection Biology
Hans-Knoell-Institute
Jena, Germany
R. Guthke, M. Hecker, W. Schmidt-Heck, S. Lambeck, S. Hummert, S. Priebe:
Assessment of a Dynamic Network Model
Inference Methods
Guthke et al.: Assessment of a Dynamic Network Model Inference Methods
World-wide Activities in the Assessment
of Network Inference Methods, e.g. DREAM
= Dialogue for Reverse Engineering Assessments and Methods
http://wiki.c2b2.columbia.edu/dream/index.php/The_DREAM_Project
Special Issue on DREAM2 (12/2007):
Annals of the New York Academy of Sciences, Vol. 1115 Reverse Engineering Biological
Networks: Opportunities and Challenges in Computational Methods for Pathway Inference
Ed. Gustavo Stolovitzky and Andrea Califano
DREAM3 challenges (15/09/2008),
DREAM3 conference 29/10/-02/11/2008, Cambridge/MA
Inference by Cyclic Operation of Experimental and Modeling Work
[Diagram elements: Experiment, Hypotheses, Data-Preprocessing, Feature Selection, Literature & Databases, Model Optimization, Model Validation]
Model Optimization
Experimental Data (pre-processed, selected)
Model Structure Search
Model Parameter Fit
Model Optimization
Model Structure Optimization Methods
NetGenerator
NetGenerator Inference Algorithm
Guthke et al., Bioinformatics, 2005,
Toepfer et al., Lect Notes Bioinf, 2007
BioControl Jena GmbH
Model:
$$ \frac{dx_i}{dt} = \sum_{j=1}^{n} a_{ij} x_j + b_i, \quad i = 1, \dots, n $$
Algorithm:
Heuristic optimization algorithm (combined local search heuristics) minimizing
- the model fit error mse and
- the number N of non-zero model parameters
Considering prior knowledge by regularisation
$$ \chi^2 = \frac{mse}{n - N} $$
GCV (Generalized Cross-Validation Index):
$$ GCV = \frac{mse}{(1 - N/n)^2} $$
mse: mean square error
n: number of observations (sample size)
N: number of parameters to be estimated
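The trade-off between fit error and parameter count can be illustrated with a minimal sketch. It assumes the usual form of the generalized cross-validation index, mse / (1 - N/n)^2; the function names and the numbers in the usage lines are hypothetical and are not taken from NetGenerator.

```python
def mean_square_error(observed, predicted):
    """Mean of the squared differences between data and model output."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sum(squared) / len(squared)


def gcv(mse, n_obs, n_params):
    """Generalized cross-validation index, assumed here as mse / (1 - N/n)^2."""
    if not 0 <= n_params < n_obs:
        raise ValueError("need 0 <= N < n")
    return mse / (1.0 - n_params / n_obs) ** 2


# Hypothetical comparison: same fit error, sparse versus dense model.
sparse = gcv(mse=0.01, n_obs=30, n_params=6)
dense = gcv(mse=0.01, n_obs=30, n_params=12)
print(sparse, dense)
```

With equal fit error, the model with more non-zero parameters receives the larger index, which is why a criterion of this kind favours sparse structures.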
Application of the NetGenerator Algorithm to Poor Transcriptome Time Series Data
4 experimental examples with # microarrays m = 5, 6, 15, 8
→ networks with # nodes (genes, variables) n = 6, 4, 5, 7
Infection 1/2
Response of human blood cells (PBMCs) to infection by the pathogen Escherichia coli (m = 5, n = 6)
[Figure panels: cluster analysis; representative gene expression profiles; network model]
Guthke et al., Bioinformatics (2005)
Infection 2/2
Stress response of the human-pathogenic fungus Aspergillus fumigatus to a temperature shift (m = 6, n = 4)
[Figure panels: cluster analysis; representative gene expression profiles; network model with nodes Temp, erg11, hsp30, rpl3, cat2]
Guthke et al., Lect Notes Bioinf (2007)
Model Optimization
Model Structure Optimization Methods
JCell (U Tübingen), A. Zell et al.
JCell
Modelling of
• activation and inhibition of transcription by Hill kinetics
• mRNA degradation by linear kinetics
Spieth C et al., Bioinformatics (2006)
Explicit structure optimization by a global search heuristic driven by a memetic algorithm (an evolutionary approach with local improvement procedures for problem search)
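As a rough illustration of the two kinetic terms named above, the sketch below uses the textbook Hill functions together with a linear degradation term. The exact rate law used in JCell is not given on the slide, so the function names, the parameters (v_max, k, h, degradation) and the single-regulator form are assumptions.

```python
def hill_activation(x, k, h):
    """Textbook Hill activation: rises from 0 to 1, half-maximal at x = k."""
    return x ** h / (k ** h + x ** h)


def hill_inhibition(x, k, h):
    """Textbook Hill inhibition: falls from 1 to 0, half-maximal at x = k."""
    return k ** h / (k ** h + x ** h)


def mrna_rate(m, regulator, v_max, k, h, degradation, activating=True):
    """dm/dt = Hill-regulated transcription minus linear mRNA degradation."""
    regulation = hill_activation if activating else hill_inhibition
    return v_max * regulation(regulator, k, h) - degradation * m


# Hypothetical values: regulator at its half-saturation level.
print(mrna_rate(m=1.0, regulator=2.0, v_max=4.0, k=2.0, h=3, degradation=0.5))
```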
JCell versus NetGenerator: DREAM2 "5-Gene-Challenge"
Comparison for the DREAM2 "5-Gene-Challenge"
m = 15; n = 5 genes transfected into yeast;
qRT-PCR data after perturbation, 2 replicates
[Figure: expression level (0 to 0.035) of the mRNA of genes 1 to 5 versus time in minutes (0 to 280)]
JCell versus NetGenerator: DREAM2 "5-Gene-Challenge"
[Figure panels: Gene 1 and Genes 2-5; mean and std for replicates with the NetGenerator and JCell fits]
SSE = standard squared error
SSE_JCell ≈ 0.0025 > SSE_NetGenerator ≈ 0.0017
JCell versus NetGenerator: DREAM2 "5-Gene-Challenge"
[Figure panels: original network; JCell inferred; NetGenerator inferred]
NetGenerator: DREAM3 "Gene Expression Prediction Challenge"
m = 8, n = 7
Data: gene expression time series for 5113 genes at eight time points for four different strains of yeast
Prior structural knowledge: regulatory associations between transcription factors and target genes from the YEASTRACT database
October 15, 2008: notification of predictors about their DREAM3 scores and ranks
Model Optimization
Model Structure Optimization Methods
LASSO = Least Absolute
Shrinkage and Selection Operator
Tibshirani 1996, van Someren 2002 & 2006
LASSO Inference Method
Least Absolute Shrinkage and Selection Operator
Tibshirani 1996, van Someren 2002 & 2006
Model:
$$ x_i[t+1] - x_i[t] = \sum_{j=1}^{n} a_{ij} x_j[t] + b_i, \quad i = 1, \dots, n $$
Algorithm:
Minimization of both
- the squared residuals
- a weighted penalty term for the parameter values
$$ \beta_\lambda = \arg\min_{\beta} \left\{ \sum_{i=1}^{n} \left( y_i - \sum_{j=1}^{m} x_{i,j} \beta_j \right)^2 + \lambda \sum_{j=1}^{m} |\beta_j| \right\} $$
Realized by adaptation of Grandvalet's EM-algorithm, 1998
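The objective above can be minimized in several ways. The sketch below uses plain cyclic coordinate descent with soft-thresholding, which is a substitute for, not a reproduction of, the adapted Grandvalet EM-algorithm named on the slide. The helper names, the choice to penalize the offset b_i like every other parameter, and the fixed iteration count are assumptions.

```python
import numpy as np


def soft_threshold(rho, thresh):
    """Shrink rho towards zero by thresh; values within the band become 0."""
    return np.sign(rho) * max(abs(rho) - thresh, 0.0)


def lasso_coordinate_descent(X, y, lam, n_iter=500):
    """Minimize sum((y - X @ beta)**2) + lam * sum(|beta|) coordinate by coordinate."""
    beta = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot explain anything
            # Residual with the contribution of coordinate j added back.
            partial = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ partial
            beta[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
    return beta


def infer_row(series, i, lam):
    """Fit row i of the difference-equation model.

    series has shape (m, n): m time points, n genes.
    Returns (a_i, b_i): incoming gene-gene weights and the offset.
    """
    X = np.hstack([series[:-1], np.ones((series.shape[0] - 1, 1))])
    y = series[1:, i] - series[:-1, i]
    beta = lasso_coordinate_descent(X, y, lam)
    return beta[:-1], beta[-1]
```

Calling infer_row once per gene yields one row of the interaction matrix at a time; entries shrunk exactly to zero correspond to absent connections.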
Modified LASSO Inference Method
The modified LASSO penalizes parameters depending on the prior knowledge about gene regulatory effects
(inhibitory; no/undefined → usual LASSO; activatory)
$$ \beta_\lambda = \arg\min_{\beta} \left\{ \sum_{i=1}^{n} \left( y_i - \sum_{j=1}^{m} x_{i,j} \beta_j \right)^2 + \lambda \sum_{j=1}^{m} \left( 1 - \Theta(V_j \, \mathrm{sgn}(\beta_j)) \right) |\beta_j| \right\} $$
Hecker et al., NiSIS (2006)
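The effect of the prior-dependent penalty can be seen in a small sketch. It assumes that Θ is the Heaviside step with Θ(x) = 1 for x > 0 and 0 otherwise, and that V_j is coded as +1 (activatory), -1 (inhibitory) or 0 (no or undefined knowledge). This coding is an assumption chosen so that V_j = 0 reproduces the usual LASSO penalty, as the slide states.

```python
def heaviside(x):
    """Step function, assumed to be 0 at x = 0 so that V_j = 0 gives plain LASSO."""
    return 1.0 if x > 0 else 0.0


def sign(x):
    """Sign of x as -1, 0 or +1."""
    return (x > 0) - (x < 0)


def modified_lasso_penalty(beta, prior, lam):
    """lam * sum((1 - Theta(V_j * sgn(beta_j))) * |beta_j|)."""
    return lam * sum(
        (1.0 - heaviside(v * sign(b))) * abs(b) for b, v in zip(beta, prior)
    )


# Hypothetical parameters and prior knowledge.
beta = [0.5, -0.3, 0.2, -0.4]
prior = [+1, +1, 0, -1]
print(modified_lasso_penalty(beta, prior, lam=2.0))
```

Parameters whose sign agrees with the prior (the first and last entries) are not penalized; the second contradicts its prior and the third has no prior, so both are penalized as in the usual LASSO.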
Application
of the LASSO Algorithm
to Experimental Data
Anti-TNF-alpha therapy of Rheumatoid Arthritis
Response of patients to Etanercept, a TNF-alpha blocker
m = 12, 9 + prior knowledge, n = 20
Hecker et al., NiSIS (2006); Koczan et al., Arth Res Ther (2008)
Assessment using Synthetic (in silico) Data
• Different numbers of sampling time points: m = 5, 10 and 20
• Synthetic networks with n = 3, 5, 10 and 20 genes
Linear model for simulation of gene expression data:
$$ \frac{dx_i}{dt} = \sum_{j=1}^{n} a_{ij} x_j + b_i, \quad i = 1, \dots, n $$
Solve the system of ordinary differential equations (ODE) to simulate data
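A minimal sketch of this simulation step is shown below. It integrates the linear model with a classical fourth-order Runge-Kutta scheme and returns the state at the requested sampling times. The solver actually used for the slides is not stated, so the integrator, the function name and the one-gene example values are assumptions.

```python
import numpy as np


def simulate_linear_network(A, b, x0, t_points, substeps=100):
    """Integrate dx/dt = A x + b with RK4 and sample at t_points."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.asarray(x0, dtype=float).copy()

    def rhs(state):
        return A @ state + b

    samples = [x.copy()]
    for t0, t1 in zip(t_points[:-1], t_points[1:]):
        h = (t1 - t0) / substeps
        for _ in range(substeps):
            k1 = rhs(x)
            k2 = rhs(x + 0.5 * h * k1)
            k3 = rhs(x + 0.5 * h * k2)
            k4 = rhs(x + h * k3)
            x = x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        samples.append(x.copy())
    return np.array(samples)  # shape (m, n)


# Hypothetical one-gene example with non-equidistant sampling times.
data = simulate_linear_network([[-1.0]], [1.0], [0.0], [0, 1, 2, 5, 10])
print(data.ravel())
```

For this one-gene case the exact solution is x(t) = 1 - exp(-t), which gives a simple check of the integrator.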
Generation of Synthetic Data
• Scale-free networks (after Barabási and Albert's model, 1999) were randomly created
• They follow a power-law degree distribution of the form:
$$ P(k) \sim k^{-\gamma}, \quad \gamma = 3 $$
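Preferential attachment, the growth rule behind the Barabási-Albert model, can be sketched in a few lines. The version below builds only an undirected skeleton; how edge directions, signs and weights were assigned for the synthetic networks is not described on the slide, and the function name and parameters are hypothetical.

```python
import random


def barabasi_albert_edges(n_nodes, m_links, seed=None):
    """Grow a graph in which each new node attaches to m_links existing nodes,
    chosen with probability proportional to their current degree."""
    if not 1 <= m_links < n_nodes:
        raise ValueError("need 1 <= m_links < n_nodes")
    rng = random.Random(seed)
    edges = []
    targets = list(range(m_links))  # the first new node links to the seed nodes
    degree_pool = []  # every node appears once per incident edge
    for new in range(m_links, n_nodes):
        edges.extend((new, t) for t in targets)
        degree_pool.extend(targets)
        degree_pool.extend([new] * m_links)
        chosen = set()
        while len(chosen) < m_links:
            chosen.add(rng.choice(degree_pool))
        targets = list(chosen)
    return edges


edges = barabasi_albert_edges(n_nodes=20, m_links=2, seed=1)
print(len(edges))
```

Sampling from degree_pool is what makes high-degree nodes more likely to gain further links; for large networks this rule leads to the power-law degree distribution with exponent 3.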
Pre-processing of Artificial Data
Addition of noise (the original values of the numerical solution are disturbed by a Gaussian-distributed value ε_ik with different standard deviations σ):
$$ y_{ik} = x_i(t_k) (1 + \varepsilon_{ik}) $$
Filter out low-informative datasets, characterized by:
• constant time series
• strongly correlated data (co-linearity check: Pearson's correlation coefficient > 0.98)
Then apply the network inference algorithms to the data
(without prior knowledge; for LASSO λ = 10^{-7})
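The two pre-processing steps can be sketched as follows. The noise model follows the multiplicative form y_ik = x_i(t_k)(1 + ε_ik); the filter rejects a dataset if any gene is constant or if any pair of genes has a Pearson correlation above 0.98. Function names and the array layout (time points in rows, genes in columns) are assumptions.

```python
import numpy as np


def add_relative_noise(x, sigma, rng):
    """Multiply each value by (1 + eps) with eps drawn from N(0, sigma^2)."""
    return x * (1.0 + rng.normal(0.0, sigma, size=x.shape))


def is_low_informative(y, r_max=0.98):
    """True if a time series is constant or two series are strongly correlated.

    y has shape (m, n): m time points, n genes.
    """
    if np.any(y.max(axis=0) == y.min(axis=0)):
        return True  # at least one constant time series
    r = np.corrcoef(y, rowvar=False)
    off_diagonal = r[~np.eye(r.shape[0], dtype=bool)]
    return bool(np.any(off_diagonal > r_max))


rng = np.random.default_rng(0)
clean = np.array([[1.0, 3.0], [2.0, 1.0], [3.0, 2.0]])
noisy = add_relative_noise(clean, sigma=0.05, rng=rng)
print(is_low_informative(noisy))
```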
Scores for comparison original vs inferred
TP – True Positives, FP – False Positives;
TN – True Negatives, FN – False Negatives
The tool found a connection
- that is also in the given net, with the same sign: TP
- that is also in the given net, but with the wrong sign: FPS
- where the given net has no connection: FPN
The tool did not find a connection
- although there is one in the given net: FN
- where the given net also has none: TN
[Figure: example original and inferred networks with edges labelled TN, TP, TP, FPS, FN, FPN]
Scores to assess the network inference method
Sensitivity=Se=TP/(TP+FN+FPS)
Specificity=Sp=TN/(TN+FPN)
Precision =Pr= TP/(TP+FPS +FPN)
MeanSeSp = (Se+Sp)/2
Jaccard = TP/(TP+FPN+FPS+FN)
F-measure=2*Se*Pr/(Se+Pr)
AUC(ROC: Se vs (1-Sp))
AUC(ROC: Pr vs (1-Sp))
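The counting rules and the scores listed above can be combined into a short sketch. Networks are represented here as signed adjacency matrices with entries -1, 0 or +1, which is an assumed encoding; the AUC scores are omitted because they need a ranking of connections rather than a single inferred network.

```python
def count_outcomes(original, inferred):
    """Classify every possible connection as TP, TN, FN, FPN or FPS."""
    counts = {"TP": 0, "TN": 0, "FN": 0, "FPN": 0, "FPS": 0}
    for row_o, row_i in zip(original, inferred):
        for o, i in zip(row_o, row_i):
            if i == 0:
                counts["TN" if o == 0 else "FN"] += 1
            elif o == 0:
                counts["FPN"] += 1  # found, but no connection in the given net
            elif o == i:
                counts["TP"] += 1
            else:
                counts["FPS"] += 1  # found, but with the wrong sign
    return counts


def scores(c):
    """Scores as defined on the slide; assumes no denominator is zero."""
    se = c["TP"] / (c["TP"] + c["FN"] + c["FPS"])
    sp = c["TN"] / (c["TN"] + c["FPN"])
    pr = c["TP"] / (c["TP"] + c["FPS"] + c["FPN"])
    return {
        "Se": se,
        "Sp": sp,
        "Pr": pr,
        "MeanSeSp": (se + sp) / 2,
        "Jaccard": c["TP"] / (c["TP"] + c["FPN"] + c["FPS"] + c["FN"]),
        "F": 2 * se * pr / (se + pr),
    }


# Hypothetical 3-gene example.
original = [[0, 1, 0], [-1, 0, 1], [0, 0, 0]]
inferred = [[0, 1, 1], [1, 0, 0], [0, 0, 0]]
print(scores(count_outcomes(original, inferred)))
```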
Results: Assessment of network inference
[Figure: pie charts of the fractions of TP, TN, FN, FPN and FPS for 500 3-gene networks, 500 5-gene networks, 100 10-gene networks and 100 20-gene networks
 NetGenerator (m = 5 non-equidistant time points: 0, 1, 2, 5, 10; σ = 0.05)
 LASSO (m = 5 equidistant time points: 0, 2, 4, 6, 8; σ = 0.05)]
Results: Assessment of network inference
[Figure: pie charts of the fractions of TP, TN, FN, FPN and FPS for 500 3-gene networks, 500 5-gene networks, 100 10-gene networks and 100 20-gene networks
 NetGenerator (m = 20 time points: 0, 1, ..., 20; σ = 0.05)
 LASSO (m = 20 time points: 0, 1, ..., 20; σ = 0.05)]
Results: Assessment of network inference
m = 5, 10, 20; n = 3
↑ # sampling time points compensates ↑ noise level σ
(NetGenerator results; example 3-gene networks)
[Figure: Jaccard index and (Se+Sp)/2 (0 to 1) versus # sampling time points m = 5, 10, 20, for noise levels σ = 0, 0.05, 0.1]
Results: Assessment of network inference
↑ # sampling time points compensates ↑ noise level σ
Examples for NetGenerator results
[Figure: (Se+Sp)/2 (0 to 1) versus # sampling time points m = 5, 10, 20, for noise levels σ = 0, 0.05, 0.1; network panels: original synthetic network; (1) 5 time points, σ = 0.0; (2) 5 time points, σ = 0.1; (3) 20 time points, σ = 0.1]
Why were networks not correctly inferred?
1. Non-identifiability: different networks can be fitted to the same data set, i.e. the inferred model is not unique (e.g. the measured data set may be insufficient)
2. The model fit may be insufficient
Why were networks not correctly inferred?
The number of sampling time points may be insufficient:
Original structure → Simulation of sampling → Inference by NetGenerator → Inferred structure
K = 5; simulated sampling at t_k = {0, 1, 3, 7, 20} h
K = 8; simulated sampling at t_k = {0, 1, 3, 7, 10, 14, 18, 20} h
[Figure: inferred structures with connections marked true or false]
Both methods infer cyclic structures
m = 10; n = 3, 5, 10, 20
[Figure: fraction of feedback connections found (0% to 100%) versus number of genes (3, 5, 10, 20) for NetGenerator, LASSO with λ = 1e-007 and LASSO with λ = 0.005; 10 time points, no noise; network panels: original network and network inferred by NetGenerator]
Both methods infer cyclic structures
[Figure: fraction of feedback connections found (0% to 100%) versus number of genes (3, 5, 10, 20) for NetGenerator, LASSO with λ = 1e-007 and LASSO with λ = 0.005; 10 time points, no noise; network panels: original network and network inferred by LASSO with λ = 1e-007 (FP high)]
Both methods infer cyclic structures
[Figure: fraction of feedback connections found (0% to 100%) versus number of genes (3, 5, 10, 20) for NetGenerator, LASSO with λ = 1e-007 and LASSO with λ = 0.005; 10 time points, no noise; network panels: original network and network inferred by LASSO with λ = 0.005]
Results: Assessment of network inference
Gene-gene connections a_ij are more reliably inferred than perturbation connections b_i (using NetGenerator)
$$ \frac{dx_i}{dt} = \sum_{j=1}^{n} a_{ij} x_j + b_i, \quad i = 1, \dots, n $$
[Figure: four pie charts (20 genes, 100 nets; σ = 0.0 and σ = 0.05; gene-gene connections a_ij and perturbation connections b_i). Slice labels: FP_S 0%, TP 5%, FP_N 8%, FN 12% (σ = 0.0); TP 7%, FP_N 7%, FN 10% (σ = 0.05); TN 74%; TN 76%; FP_S 1%; TP 24%, FP_N 47%, TN 22%, FN 5%; TP 12%, FP_S 9%, TN 22%, FN 9%, FP_N 48%]
Conclusions for dynamic network inference from poor time series data
• Increasing the number of time points results in more correctly inferred connections, especially when the data are noisy
• The influence of noise is stronger for LASSO and, in general, for small networks
• Cyclic structures (feedback connections) are detectable
• Perturbation connections are detected worse than gene-gene connections (for bigger networks)
• Integration of prior knowledge is necessary (in particular for perturbation connections)
Acknowledgement
Leibniz-Institute for Natural Product
Research and Infection Biology
Hans-Knoell-Institute
Jena, Germany
Projects: HepatoSys, BioChancePlus, FORSYS,
DFG, Industry,...
Application to synthetic Data
Simulated by linear and nonlinear models
For example, a 2-gene network:
linear model
$$ \frac{dx_1}{dt} = a_{11} x_1 + a_{12} x_2 + b_1 $$
$$ \frac{dx_2}{dt} = a_{21} x_1 + a_{22} x_2 + b_2 $$
nonlinear model
$$ \frac{dx_1}{dt} = a_{11} x_1 + a_{12} \frac{x_2}{1 + x_2} + b_1 $$
$$ \frac{dx_2}{dt} = a_{21} \frac{x_1}{1 + x_1} + a_{22} x_2 + b_2 $$
Choose parameter values (different topologies)
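The difference between the two 2-gene models lies only in the cross terms, which saturate in the nonlinear case. The sketch below writes both right-hand sides as functions and evaluates them at one state; the parameter values are hypothetical and do not come from the slides.

```python
def linear_rhs(x, a, b):
    """Right-hand side of the linear 2-gene model."""
    x1, x2 = x
    return (
        a[0][0] * x1 + a[0][1] * x2 + b[0],
        a[1][0] * x1 + a[1][1] * x2 + b[1],
    )


def nonlinear_rhs(x, a, b):
    """Right-hand side of the nonlinear model with saturating cross terms."""
    x1, x2 = x
    return (
        a[0][0] * x1 + a[0][1] * x2 / (1 + x2) + b[0],
        a[1][0] * x1 / (1 + x1) + a[1][1] * x2 + b[1],
    )


# Hypothetical parameters: self-degradation plus mutual activation.
a = [[-1.0, 2.0], [0.5, -1.0]]
b = [0.1, 0.0]
state = (1.0, 1.0)
print(linear_rhs(state, a, b))
print(nonlinear_rhs(state, a, b))
```

At this state the cross terms of the nonlinear model are half as large as in the linear model, because x/(1 + x) equals 0.5 at x = 1.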
Results – Influence of Noise (NetGenerator)
2 genes:
              Without noise   Noise, std. 0.05   Noise, std. 0.1
Sensitivity   0.92            0.89               0.84
Specificity   0.86            0.67               0.54
Precision     0.91            0.83               0.77
Results – Influence of the Number of Genes (NetGenerator)
Scale-free topologies (Barabási AL et al. (1999): Physica A 272, 173-187), requiring preconditions
Number of genes n =   3      5      10     20
Sensitivity           0.75   0.56   0.41   0.34
Specificity           0.62   0.63   0.78   0.90
Precision             0.78   0.59   0.46   0.41
F-measure             0.76   0.57   0.43   0.37
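As a consistency check, the F-measure values can be recomputed from the reported sensitivity and precision with F = 2·Se·Pr/(Se+Pr). The short sketch below does this for the four network sizes; the dictionary layout is only an illustration.

```python
def f_measure(se, pr):
    """Harmonic mean of sensitivity and precision."""
    return 2 * se * pr / (se + pr)


# n genes -> (sensitivity, precision, reported F-measure)
reported = {
    3: (0.75, 0.78, 0.76),
    5: (0.56, 0.59, 0.57),
    10: (0.41, 0.46, 0.43),
    20: (0.34, 0.41, 0.37),
}

for n_genes, (se, pr, f_reported) in reported.items():
    print(n_genes, round(f_measure(se, pr), 2), f_reported)
```

The recomputed values agree with the reported ones to two decimals for all four network sizes.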
Results – Linear Data vs. Nonlinear Data (NetGenerator)
Without noise, 2 genes:
              Linear data   Nonlinear data
Sensitivity   0.92          0.79
Specificity   0.86          0.59
Precision     0.91          0.78