Subject
Title: Specialization of BioLogical Models by Experimental Design Strategies
Thesis supervisor
Name: Olivier ROUX
Email: olivieroux@icloud.com
Phone: +33 2 40 37 69 79
Research unit (with its accreditation): IRCCyN
Home institution (university, school...): Ecole Centrale de Nantes
Number of theses in progress: 3

Co-supervisor (if any)
Name: Carito GUZIOLOWSKI
Email: Carito.guziolowski@irccyn.ecnantes.fr
Phone: (+33) 2 40 37 69 78
Research unit (with its accreditation): IRCCyN
Home institution (university, school...): Ecole Centrale de Nantes
Number of theses in progress: 2

Planned funding:
Summary of the thesis subject (3 to 5 lines)
High-throughput, massively parallel technologies allow us to observe different cellular
components under a specific condition. This information is of great value for generating and
validating cell models. However, high-throughput measurements are in many cases taken over
population aggregates, while the modeled events occur at the level of individual cells. The
aim of this thesis proposal is to provide a novel characterization of single-cell systems as
a family of logic models subject to possible realizations. To achieve this, we aim to
generate this family of cell-specific models from population phosphoproteomics data by using
logic programming. The logic behaviors of this family will then be specialized by proposing
efficient experimental strategies. This family of models may provide remarkable benefits to
medical research by elucidating the mechanisms involved in cell dysfunction that appear in a
mutually exclusive fashion in specific cell populations.
Description of the thesis subject (2 pages maximum)
Supervision: Carito Guziolowski (60%) and Olivier Roux (40%)
This PhD subject addresses a problem directly relevant to health and wellbeing: developing
biological models and confronting them with experimental data. Modeling the complex nature
of a cell is a key challenge of this decade. The recent development of high-throughput,
massively parallel technologies allows us to observe different cellular components under a
specific condition. This information is of great value for generating and validating cell
models. However, high-throughput measurements, such as microarrays or phosphoproteomics,
are in many cases taken over population aggregates, while the signaling and transcriptional
events that the models represent occur at the level of individual cells. Over the past
years, several studies have demonstrated high heterogeneity between individual cells.
Currently, there is a need for methods that take this heterogeneity into account when
modeling at the single-cell level and that integrate high-throughput data sets at the
population level into single-cell models. The goal of this PhD work is to generate
cell-specific (logic) models by automatically proposing efficient experimental strategies
that provide relevant measurements. Modeling cell behavior at a large scale by integrating
experimental data at the population level, in spite of its heterogeneity, noise and
sparsity, is one of the main steps towards the design and control of in-cell biological
systems. Manipulating these systems in silico can provide remarkable benefits to medical
research through the understanding of the complex mechanisms involved in cell dysfunction.
Qualitative approaches, despite their simplicity, allow us to model large-scale biological
systems, unlike quantitative methods. Among them, logic models are able to capture
interesting and relevant cell behaviors, as several authors have shown in recent years
(Mbodj et al., Mol. Biosyst., 2013; Morris et al., Methods Mol. Biol., 2013; Melas et al.,
Osteoarthr. Cartil., 2014). Due to factors such as the sparsity or the uncertainty of
experimental measurements, the model is often non-identifiable. In contrast to quantitative
modeling, the use of logic models makes it possible to tackle the model identification
problem for large-scale biological systems. In fact, in a recent study, we showed that
thousands of Boolean (early response) logic models fit several phosphoproteomic
perturbation experiments similarly well.
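The non-identifiability described above can be illustrated on a toy scale. The following Python sketch is only an illustration under stated assumptions: it considers a single readout driven by two inputs, and the names `all_boolean_functions`, `consistent_models` and the observations are hypothetical, not taken from the study mentioned. It enumerates every Boolean function of two inputs and keeps those that reproduce a sparse set of measurements.

```python
from itertools import product

# All (a, b) input combinations for a hypothetical two-input readout.
INPUTS = list(product([0, 1], repeat=2))

def all_boolean_functions():
    """Yield every Boolean function of two inputs as a truth-table dict."""
    for outputs in product([0, 1], repeat=len(INPUTS)):
        yield dict(zip(INPUTS, outputs))

def consistent_models(observations):
    """Return the truth tables that reproduce every observed input -> output pair."""
    return [f for f in all_boolean_functions()
            if all(f[x] == y for x, y in observations.items())]

# Assumed data: only two of the four input conditions were measured.
observations = {(0, 0): 0, (1, 1): 1}
family = consistent_models(observations)
print(len(family))  # several distinct models fit the same data
```

With two unmeasured conditions, the data cannot tell AND, OR and the two single-input functions apart; on real networks the same effect compounds across many nodes.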
The aim of this PhD proposal is to explore the limits of logic model identifiability by
proposing strategies to narrow down the number of logic models that fit the data equally
well. The main goal is to select specific experiments that will increase the system's
observability, and to validate our results by showing that performing such experiments
allows us to obtain precise model behaviors. Experiment selection will be achieved through
exhaustive searches over the full space of admissible behaviors, using state-of-the-art
solvers for constraint logic programs. After taking technological limitations into account
(not all experimental designs are achievable for a particular biological system), we expect
to reduce the variability among models and to exhaustively characterize their possible
mutually exclusive behaviors. This reduced space of logical models may better represent
cellular heterogeneity, which may have a crucial impact on cellular function.
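As a rough illustration of experiment selection, the Python sketch below picks, among the feasible experiments, the one that best discriminates a family of equally fitting models. It is a minimal sketch and not the method of this proposal: the four candidate models, the feasible set and the minimax criterion (smallest worst-case group of models that still agree) are all assumptions made for the example, and a plain enumeration stands in for the constraint logic programming solvers mentioned above.

```python
# Hypothetical family of Boolean models for one readout c given inputs (a, b).
MODELS = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "only_a": lambda a, b: a,
    "only_b": lambda a, b: b,
}

def split(models, experiment):
    """Group model names by the readout each model predicts for one experiment."""
    groups = {}
    for name, f in models.items():
        groups.setdefault(f(*experiment), []).append(name)
    return groups

def best_experiment(models, feasible):
    """Pick the feasible experiment whose largest group of agreeing models is smallest."""
    return min(feasible,
               key=lambda e: max(len(g) for g in split(models, e).values()))

# Assumed technological limitation: the condition (0, 1) cannot be performed.
feasible = [(0, 0), (1, 1), (1, 0)]
chosen = best_experiment(MODELS, feasible)
print(chosen, split(MODELS, chosen))
```

In this toy case the conditions (0, 0) and (1, 1) are uninformative because all four models agree on them, so the sketch selects (1, 0), which halves the family whatever the measured outcome.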
PhD thesis Objectives
We aim to study the variability of logical models learned from multiple
phosphoproteomic perturbation experiments with two approaches. First, we will generalize
the learning to include models with dynamic behaviors from time-series data. In fact, the
large variability of early response models could be reduced by imposing additional
constraints on dynamic behaviors. Second, we will implement experimental design strategies
that propose de novo experiments to reduce the variability of equivalent models based on
exhaustive optimization criteria. Our preliminary results in the context of the first
approach show that a method based on constraint logic programming (Answer Set Programming,
ASP) can learn a family of dynamic-patterned logical models from time-series data,
proposing solutions similar to those that metaheuristic-based fuzzy logic or synchronous
simulation methods would produce, but exhaustively and more efficiently. This is still
preliminary work on a toy model, which needs to be improved and extended to larger case
studies. Concerning the second approach, we have previously proposed a Python package,
caspo (http://bioasp.github.io/caspo), based on ASP, that learns logic models from data.
This package allows for design analyses that take into account the exponential number of
logic models that fit the data well. However, iterating the design method from caspo
performs poorly when applied to real data, since search spaces become too large to explore
and information on the system is not always available. All in all, we have begun to explore
both research paths, and our preliminary results show that methods inspired by logic
programming can offer a solution to these problems. The main objective of this thesis is to
extend, apply, and validate these methods on real-case studies. In that context, one
possibility envisaged to boost our methods is to combine ASP optimization methods, inspired
by artificial intelligence techniques, with metaheuristic local search methods. On the
modeling side, the constraints imposed to tackle this problem can be enriched with abstract
interpretation frameworks; in addition, multiple logic behaviors can be modeled with
Probabilistic Boolean Network (PBN) approaches
(Trairatphisan et al. Cell Commun Signal. 2013).
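The idea behind the first approach, that dynamic constraints can shrink the family of early response models, can be sketched in Python on a toy network. Everything here is an assumption made for illustration: the three-node network (a constant stimulus a, a node b that copies a, and a readout c), the four candidate rules, the synchronous update scheme and the observed series. It is not the ASP encoding referred to above.

```python
# Hypothetical candidate rules for the readout c as a function of (a, b).
CANDIDATES = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "only_a": lambda a, b: a,
    "only_b": lambda a, b: b,
}

def simulate(f, steps=2):
    """Synchronously update (a, b, c): a is a constant stimulus, b copies a, c = f(a, b)."""
    a, b, c = 1, 0, 0
    trace = [c]
    for _ in range(steps):
        b, c = a, f(a, b)  # both right-hand sides use the previous state
        trace.append(c)
    return trace

observed = [0, 0, 1]  # assumed time series of the readout c

# Models that only match the last time point versus the whole series.
fit_final = [n for n, f in CANDIDATES.items() if simulate(f)[-1] == observed[-1]]
fit_series = [n for n, f in CANDIDATES.items() if simulate(f) == observed]
print(fit_final, fit_series)
```

All four candidate rules reach the same final value, while only two of them also reproduce the intermediate time point, which is the kind of reduction that time-series constraints are expected to bring.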
Our methods will be applied to infer logic models of signaling pathways from
phosphoproteomics data. We take a particular interest in this type of experiment because
cell dysfunction is in many cases related to the deregulation of signaling pathways, and
effects on these pathways are better measured by observing protein activation or
de-activation rather than gene expression. In particular, we will focus on three case
studies based on published works: (a) the DREAM8 challenge time-series phosphoproteomic
data of four breast cancer cell lines, (b) in-silico generated perturbation data based on
the HepG2 liver cancer cell line model, and (c) two phosphoproteomic datasets
(ligand-screening and combinatorial follow-up) of primary human hepatocytes.