IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 56, NO. 7, JULY 2008 3327

Inference of Noisy Nonlinear Differential Equation Models for Gene Regulatory Networks Using Genetic Programming and Kalman Filtering

Lijun Qian, Senior Member, IEEE, Haixin Wang, Member, IEEE, and Edward R. Dougherty, Member, IEEE

Abstract—A key issue in genomic signal processing is the inference of gene regulatory networks. These are used both to understand the role of biological regulation in phenotypic determination and to derive therapeutic strategies for genetic-based diseases. In this paper, gene regulatory networks are inferred via evolutionary modeling based on time-series microarray measurements. A nonlinear differential equation model is adopted. It includes random noise parameters for intrinsic noise arising from stochasticity in transcription and translation and for external noise arising from factors such as the amount of RNA polymerase, levels of regulatory proteins, and the effects of mRNA and protein degradation. An iterative algorithm is proposed for model identification. Genetic programming is applied to identify the structure of the model and Kalman filtering is used to estimate the parameters in each iteration. Both standard and robust Kalman filtering are considered. The effectiveness of the proposed scheme is demonstrated by using synthetic data and by using microarray measurements pertaining to yeast protein synthesis.

Index Terms—Gene regulatory network, genetic programming, Kalman filter.

I. INTRODUCTION

THE ultimate goal of the genomic revolution is to understand the genetic relations behind phenotypic characteristics of organisms. Such an understanding relies on a blueprint that specifies the manner in which genes and proteins interact to make a complex living system [10]. A critical step in obtaining such a blueprint is to identify the interactions among genes via the modeling of gene regulatory networks (GRNs).
In light of the recent development of high-throughput DNA microarray technology, it has become possible to discover GRNs, which are complex and nonlinear in nature. Specifically, the increasing availability of microarray time-series data makes possible the characterization of dynamic nonlinear regulatory interactions among genes. The synthesis and analysis of GRNs constitute a major component of genomic signal processing [1]. Because GRN models are difficult to deduce solely by means of experimental techniques, computational and mathematical methods are indispensable. Much research has been done on GRN modeling by linear differential/difference equations using time-series data, for example, [8], [9], [11]–[13], [15], [16], to name a few. The basic idea is to approximate the combined effects of different genes by means of a weighted sum of their expression levels.

Manuscript received June 26, 2007; revised December 30, 2007. The associate editor coordinating the review of this manuscript and approving it for publication was Xiaodong Cai. This work was supported in part by the Department of Electrical and Computer Engineering at Prairie View A&M University and the National Science Foundation (CCF-0514644) and the National Cancer Institute (R01 CA-104620). L. Qian and H. Wang are with the Department of Electrical Engineering, Prairie View A&M University, Prairie View, TX 77446 USA (e-mail: LiQian@pvamu.edu; HWang@pvamu.edu). E. R. Dougherty is with the Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843 USA, the Computational Biology Division of the Translational Genomics Research Institute, Phoenix, AZ 85004 USA, and the Department of Pathology of the University of Texas M. D. Anderson Cancer Center, Houston, TX 77030 USA (e-mail: edward@ee.tamu.edu). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TSP.2008.919638
In [13], a connectionist model is used to model small gene networks operating in the blastoderm of Drosophila. In [8], the concentrations of mRNA and protein are modeled by linear differential equations. A simple form of linear additive functions, $\dot{x}_i = \sum_{j=1}^{n} w_{ij} x_j + b_i$, is suggested by [9], where the degradation rate of gene $i$'s mRNA and environmental effects are assumed to be incorporated in the parameters $w_{ij}$ and $b_i$, and their influence on gene $i$'s expression level is assumed to be linear. A method to obtain a continuous linear differential equation model from sampled time-series data is proposed in [15]. For added biological realism (all concentrations saturate at some point), a sigmoid (squashing) function may be included in the equation. It has been shown that this sort of quasi-linear model can be solved by first applying the inverse of the squashing function [11]. Because GRNs are nonlinear in nature, nonlinear differential equation models, such as an S-system [26], can model much more complicated GRN behavior [14]. A linear model may satisfactorily model gene behavior if the GRN is operating around a steady state and the linear model corresponds to the linearized model (from the nonlinear model) at that steady state. In addition, the linear approximation holds only when the GRN has slow dynamics around that steady state. A possible way to make the GRN model hold not only in the vicinity of the steady state but also over a large range is to include nonlinear terms such as second-order polynomials (e.g., $x_j x_k$ and $x_j^2$). In our study, a GRN is modeled by continuous nonlinear ordinary differential equations (ODEs). Compared to linear models, identification of a nonlinear differential equation model is computationally more intensive and can require more data; however, the range of nonlinear behaviors exhibited by GRNs can be more thoroughly understood with nonlinear differential equations.
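The remark that a quasi-linear model can be solved by first inverting the squashing function [11] can be illustrated with a minimal sketch. It assumes a noise-free, discrete-time weight matrix model $x(t+1) = \sigma(W x(t))$ with a logistic $\sigma$; the function names and the synthetic data are hypothetical and are not taken from [11]. Once the logit is applied to the next state, every row of $W$ follows from ordinary least squares.

```python
import numpy as np


def sigmoid(u):
    """Logistic squashing function."""
    return 1.0 / (1.0 + np.exp(-u))


def logit(y):
    """Inverse of the logistic function; y must lie strictly inside (0, 1)."""
    return np.log(y / (1.0 - y))


def fit_weight_matrix(past, future):
    """Estimate W in future = sigmoid(past @ W.T) by linear least squares.

    past, future: arrays of shape (num_samples, n). Applying the inverse
    squashing function to `future` makes the model linear in W, so every
    row of W is recovered from a single least-squares solve.
    """
    targets = logit(future)                        # equals past @ W.T
    W_T, *_ = np.linalg.lstsq(past, targets, rcond=None)
    return W_T.T


# Noise-free synthetic check: 3 genes, 20 observed state transitions.
rng = np.random.default_rng(0)
W_true = rng.normal(scale=0.8, size=(3, 3))
past = rng.uniform(0.1, 0.9, size=(20, 3))
future = sigmoid(past @ W_true.T)
W_hat = fit_weight_matrix(past, future)
```

With noisy data the logit amplifies errors near 0 and 1, which is one reason the text moves on to genuinely nonlinear models.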
When more time-series data become available owing to advances in microarray or other technologies, and assuming continued improvement in computational capability, it can be expected that continuous nonlinear dynamic models will play a critical role in revealing complicated gene behavior.

1053-587X/$25.00 © 2008 IEEE

Assuming there are $n$ genes of interest and $x_i$ denotes the state (such as the microarray reading) of the $i$th gene,¹ the dynamics of the GRN may be modeled as

$$\dot{x}_i = f_i(x_1, x_2, \ldots, x_n), \quad i = 1, \ldots, n. \qquad (1)$$

In this study we assume the functions $f_i$ are of the form

$$f_i(x_1, \ldots, x_n) = \sum_{k=1}^{L_i} (c_{ik} + \xi_{ik})\, f_{ik}(x_1, \ldots, x_n) + \eta_i \qquad (2)$$

where $f_{ik}$ is the $k$th component of the nonlinear function $f_i$, $c_{ik}$ are its coefficients, $\xi_{ik}$ and $\eta_i$ are parameter noise and external noise, respectively, and it is assumed that $\xi_{ik}$ and $\eta_i$ are white Gaussian noise. In general, the component $f_{ik}$ can be any nonlinear function. Popular choices with important biological implications include S-systems [27] and sigmoid functions [7]. For example, let $L_i = 2$, $c_{i1} = \alpha_i$, $c_{i2} = -\beta_i$, $f_{i1} = \prod_{j=1}^{n} x_j^{g_{ij}}$, $f_{i2} = \prod_{j=1}^{n} x_j^{h_{ij}}$, and do not consider noise. Then (1) becomes the well-known S-system [27], [38], that is

$$\dot{x}_i = \alpha_i \prod_{j=1}^{n} x_j^{g_{ij}} - \beta_i \prod_{j=1}^{n} x_j^{h_{ij}} \qquad (3)$$

where $\alpha_i$ and $\beta_i$ are coefficients, and $g_{ij}$ and $h_{ij}$ are kinetic orders. If a sigmoid function is chosen, for example, $L_i = 1$, $c_{i1} = 1$, $f_{i1} = 1/\left(1 + \exp\left(-\sum_{j=1}^{n} w_{ij} x_j\right)\right)$, and noise is excluded, (1) is the differential equation counterpart of the well-known weight matrix model² given by

$$\dot{x}_i = \frac{1}{1 + \exp\left(-\sum_{j=1}^{n} w_{ij} x_j\right)}. \qquad (4)$$

In this work, polynomials are chosen as the nonlinear components of the proposed model, and ODEs with polynomial terms are used in our test cases. The polynomials serve as universal approximators. To mitigate the effect of "the curse of dimensionality," only second-degree polynomials are selected. An advantage of low-degree polynomial models is that even when there is some model mismatch, they may be sufficiently accurate to represent many real systems, and thus they are widely used in practice [32]. We note that a similar GRN model has been adopted by [28], but without noise in the model.
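To make the roles of the parameter noise and the external noise in (2) concrete, the sketch below integrates such a model with the Euler-Maruyama scheme. The Ito interpretation of the multiplicative parameter noise, the function name `simulate_noisy_grn`, and the two-gene second-degree polynomial example are assumptions for illustration only, not the authors' simulator.

```python
import numpy as np


def simulate_noisy_grn(x0, terms, coeffs, q, r, dt, steps, rng):
    """Euler-Maruyama integration of dx_i/dt = sum_k (c_ik + xi_ik) f_ik(x) + eta_i.

    terms[i]  : list of callables f_ik(x) for gene i
    coeffs[i] : 1-D array of nominal coefficients c_ik
    q[i]      : 1-D array of parameter-noise intensities (variance rate of xi_ik)
    r[i]      : external-noise intensity (variance rate of eta_i)
    White noise is integrated as Wiener increments with variance intensity * dt;
    the parameter noise multiplies f_ik evaluated at the start of the step (Ito form).
    """
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for _ in range(steps):
        dx = np.zeros_like(x)
        for i in range(x.size):
            f = np.array([f_ik(x) for f_ik in terms[i]])
            drift = coeffs[i] @ f
            param_noise = f @ (np.sqrt(q[i] * dt) * rng.standard_normal(f.size))
            ext_noise = np.sqrt(r[i] * dt) * rng.standard_normal()
            dx[i] = drift * dt + param_noise + ext_noise
        x = x + dx
        path.append(x.copy())
    return np.array(path)


# Hypothetical two-gene model with second-degree polynomial terms:
#   dx1/dt = -0.5 x1 + 0.2 x1 x2,   dx2/dt = 0.3 x1 - 0.4 x2^2
terms = [[lambda x: x[0], lambda x: x[0] * x[1]],
         [lambda x: x[0], lambda x: x[1] ** 2]]
coeffs = [np.array([-0.5, 0.2]), np.array([0.3, -0.4])]
rng = np.random.default_rng(0)

deterministic = simulate_noisy_grn([1.0, 0.5], terms, coeffs,
                                   q=[np.zeros(2), np.zeros(2)], r=[0.0, 0.0],
                                   dt=0.01, steps=100, rng=rng)
noisy = simulate_noisy_grn([1.0, 0.5], terms, coeffs,
                           q=[np.full(2, 1e-3), np.full(2, 1e-3)], r=[1e-3, 1e-3],
                           dt=0.01, steps=100, rng=rng)
```

Setting every noise intensity to zero recovers the deterministic (nominal) model, which matches the text's reading of the noise-free case as the nominal one.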
The proposed model includes all the major characteristics of a gene regulatory network: it is nonlinear, dynamic, and noisy. To the best of our knowledge, no previous work has used the same model. The rationale behind the proposed model is twofold: first, the proposed model is general and sufficiently flexible to include many well-known models and new models yet to be found; second, the noisy nature of GRNs is modeled explicitly. The deterministic model (without noise) corresponds to the nominal case, while the various stochastic effects are included as noise disturbances. For example, there is considerable experimental evidence indicating the presence of significant stochasticity in transcriptional regulation in both eukaryotes and prokaryotes [4]. The inherent stochasticity of biochemical processes (transcription and translation) is modeled as noise in the parameters, $\xi_{ik}$, which corresponds to the "intrinsic noise" mentioned in the literature [5]. Other effects, such as those from genes not included in the microarray, the amount of RNA polymerase, levels of regulatory proteins, and the effects of mRNA and protein degradation, are modeled by the external noise $\eta_i$ [5]. Previous work has modeled these noise types by Gaussian white noise processes [6]. The inclusion of noise also enables the proposed model to account for the fact that GRNs are robust to noise, meaning that the relationships among genes are not greatly affected by small changes caused by noise.

¹In this paper, we consider the case where the states ($x_i$) are the microarray readings. Thus the measurement equation is not needed.

²In [11], the weight matrix model is a difference equation model rather than a differential equation model.

The nonlinear functions $f_i$ need to be identified from time-series microarray measurements such that the identification error is minimized and the simplest model structure is selected. In this paper, the criteria for selecting $f_i$ are represented by a fitness function, and modeling a GRN becomes a nonlinear optimization problem (minimization of the fitness function). We provide a framework to infer the proposed nonlinear ODE model with noise from time-series data, in which genetic programming and Kalman filtering are applied. Both synthetic data and experimental data from microarray measurements are used to evaluate the proposed method. Although the proposed method is tested only with polynomials as the nonlinear terms, it is expected to perform similarly well for other choices of nonlinear terms in the proposed model, dependent, of course, on sufficient data for more complex nonlinear models.

The remainder of the paper is organized as follows. The proposed framework and the iterative algorithm are described in Section II. Section III presents the robust Kalman filter that mitigates the effect of inaccuracy in noise statistics estimation. Simulation results are given in Section IV. Discussions of applying the proposed method to nonlinear models other than the polynomial case are provided in Section V. Section VI contains some concluding remarks.

II. METHODOLOGY AND ALGORITHM DESCRIPTION

Several design challenges have to be addressed when solving the nonlinear optimization problem. A common difficulty in GRN inference is that the problem is under-determined. In a typical microarray experiment, the number of sampled data points is much smaller than the number of genes involved. For example, there are thousands of genes and only 17 data points in the yeast data set [34]. Hence, the system is under-determined and there are infinitely many solutions. As pointed out in several previous studies, such as [10], choosing a solution from the many plausible ones is a difficult task.
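The under-determination described above can be seen in a small numerical sketch. It assumes a linear regression of one gene's rate on the expression levels of many genes, with fewer samples than unknown weights; the sizes and random data are arbitrary. Least squares returns one exact solution (the minimum-norm one), and adding any null-space vector yields another solution that fits the data equally well.

```python
import numpy as np

rng = np.random.default_rng(1)
num_points, num_genes = 5, 12                  # fewer samples than unknown weights
A = rng.normal(size=(num_points, num_genes))   # rows: expression snapshots
b = rng.normal(size=num_points)                # observed rates for one gene

# Minimum-norm solution returned by least squares.
w_min, *_ = np.linalg.lstsq(A, b, rcond=None)

# The last right-singular vector lies in the null space of A (rank 5 < 12),
# so adding any multiple of it leaves the fit unchanged.
_, _, Vt = np.linalg.svd(A)
null_vec = Vt[-1]
w_other = w_min + 3.0 * null_vec

print(np.allclose(A @ w_min, b), np.allclose(A @ w_other, b))  # True True
```

Choosing among such equally good solutions requires extra structure, which is what the three aspects listed next provide.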
In [8], two algorithms (minimum weight solutions to linear equations and Fourier transform for stable systems) are provided to construct the GRN model from time-series data. A different approach is proposed by [9], where singular value decomposition is used to generate an initial solution that is then refined by robust regression. Another approach is to apply cubic interpolation between successive measurements to increase the total amount of data to the point that the linear equations become over-determined [12]. These techniques for linear models do not apply to nonlinear models.

QIAN et al.: INFERENCE OF NOISY NONLINEAR DIFFERENTIAL EQUATION MODELS FOR GENE REGULATORY NETWORKS 3329

Fig. 1. Block diagram of GRN identification using GP and Kalman filtering.

In this study, we have developed a systematic method to infer a GRN represented by a nonlinear ODE with large dimensionality using rather short time-series data. We rely on three aspects of our approach to address this issue:

1) The identification problem is decoupled into sub-problems, with the $i$th sub-problem focusing on the $i$th gene. Because the time-series data of the other genes are fixed (from measurements) when we focus on an individual gene, we can solve the identification problem one gene at a time. This approach makes the inference of large GRNs feasible. Similar decoupling procedures have been used in previous studies such as the inference of S-system models [18], [19]. In the $i$th sub-problem for the $i$th gene, the number of parameters to be estimated is $L_i$.

2) According to a recent result by Sontag [20], $2r+1$ measurements are enough for identification of a set of differential equations with $r$ unknown parameters (if experiments are designed properly, such as the one mentioned in [20]). The $2r+1$ bound is an upper bound. In our case, the number of data points needed to estimate the $L_i$ parameters within the $i$th sub-problem is at most $2L_i + 1$. For example, the 17 data points in the yeast data set [34] allow us to estimate up to 8 parameters in the $i$th equation. Since a GRN tends to be a sparse network, we do not expect many terms on the right-hand side of each ODE (8 usually being more than enough).

3) The Kalman filter provides optimal estimation with excellent convergence speed. Thus, relatively short time-series data are sufficient for the Kalman filter to converge. In the simulations (Section IV), we show that the squared error of the estimation quickly converges close to zero.

There are, of course, limitations to the method when it comes to the number $n$ of genes. Given the computational environment currently being utilized, we are confident that the algorithm can handle $n \le$ … genes. This is more than sufficient for the number of genes envisioned in the application at which we are ultimately aiming, namely, the use of control theory to derive intervention strategies that beneficially affect network dynamics. As applied thus far in the context of discrete Markovian regulatory networks, using dynamic programming in the finite-horizon case [2] or developing a stationary policy in the infinite-horizon case [3], the number of genes has been kept small for computational reasons, typically no more than 15. The initial set of genes can be selected via existing biological knowledge, via some data-driven method that finds a gene family with substantial intergene interaction, or via a hybrid of the two.

In this paper, a two-step nested optimization procedure is proposed to identify the nonlinear differential equation for each individual gene. Genetic programming (GP) is applied to determine the nonlinear terms (global optimization [32], [37]), and then the parameters associated with each term are estimated by Kalman filtering (local optimization) in each iteration.
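The parameter budget in aspect 2) and the size of the candidate term set can be computed directly. This is a minimal sketch under two assumptions: the $2r+1$ bound is applied exactly as stated in the text, and the candidate set consists of all monomials of degree one and two in the $n$ gene states without a constant term (a degree-2 polynomial model). The function names are hypothetical.

```python
from itertools import combinations_with_replacement


def max_identifiable_params(num_points):
    """Largest r with 2r + 1 <= num_points, following the bound used in the text."""
    return (num_points - 1) // 2


def degree2_terms(num_genes):
    """Candidate monomials of degree 1 and 2 (no constant) as tuples of gene indices."""
    genes = range(num_genes)
    linear = [(j,) for j in genes]
    quadratic = list(combinations_with_replacement(genes, 2))  # x_j * x_k with j <= k
    return linear + quadratic


print(max_identifiable_params(17))   # 8, the yeast example
print(len(degree2_terms(4)))         # 14
print(len(degree2_terms(12)))        # 90
```

For 12 genes each decoupled sub-problem therefore selects at most 8 of 90 candidate terms, which illustrates why the sparsity assumption matters.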
Such a decomposition of the problem into a structural part solved by GP and a parameter optimization part solved by Kalman filtering reduces the complexity significantly and speeds up convergence. The optimization procedures are illustrated in Fig. 1. Note that a similar method has been used in [28]; however, recursive least squares (RLS) rather than Kalman filtering is applied in [28], since noise is not modeled in that study. In a related work (again in which noise is not considered), a genetic algorithm is embedded in genetic programming, with the latter being employed to discover and optimize the model structure and the former being used to optimize its parameters [33]. Note that most of the previous linear and nonlinear differential equation models fix the linear or nonlinear terms in the equations, so that the inference problem becomes a parameter estimation problem. Our approach assumes only a quasi-structure for the model, i.e., we provide candidate terms and let genetic programming and Kalman filtering decide which terms should appear in the model and what the corresponding parameters are.

A. Genetic Programming

Within each sub-problem, the nonlinear terms in the equation first have to be determined. Genetic programming [21] is a type of evolutionary algorithm. All evolutionary algorithms work with a population of individuals, where each individual may be a solution of the optimization problem. GP operates on a tree structure, which is flexible enough to represent relationships efficiently. The leaves of a tree represent variables or constants, while the other nodes implement operators. An example of a tree structure encoding a nonlinear differential equation is shown in Fig. 2, where two operations, multiplication (×) and addition/subtraction (+/−), are used. Mutation and crossover operations may be performed to generate offspring. Selection of better-performing individuals (with smaller fitness values, thus minimizing the identification error while favoring the simplest model structure) ensures that the population evolves toward a solution of the optimization problem.

Fig. 2. Example tree structure of a nonlinear differential equation.

B. Kalman Filter

Another step in determining the equation of each gene is to estimate the parameters, while recognizing that noise effects need to be mitigated. As living systems are optimized to function in the presence of noise, the corresponding mathematical models that attempt to explain these systems should be robust relative to noise. Untreated noise in GRN inference may lead to impractical GRN models and eventually to incorrect biological or medical conclusions. Thus, noise modeling is essential for better descriptions of GRNs. Noise models not only can give more realistic simulations of biological systems but also can serve as a basis for analyzing the robustness of mathematical models with respect to noise. In this paper, noise is modeled via Gaussian white noise processes, and Kalman filtering is employed to optimize the GRN model by mitigating the effects of noise, thereby enhancing the robustness and stability of the obtained model.

Kalman filtering provides minimum-mean-square-error estimation of the state of a stochastic linear system disturbed by Gaussian white noise. In our proposed scheme, the Kalman filter is applied to estimate the coefficients of the GRN model. Although the proposed GRN model is nonlinear in the states, it is linear in its coefficients. In addition, the filtering problem is fully decoupled, so that the Kalman filter can be applied to each individual equation. The corresponding state and measurement equations are

$$\mathbf{c}_i(t) = \mathbf{c}_i(t-1) + \mathbf{w}_i(t-1) \qquad (5)$$

$$z_i(t) = \mathbf{H}_i(t)\,\mathbf{c}_i(t) + v_i(t) \qquad (6)$$

where the $L_i$-dimensional state vector (containing the parameters to be estimated) is $\mathbf{c}_i = [c_{i1}, \ldots, c_{iL_i}]^T$. The vector $\mathbf{w}_i$ represents the process noise (uncertainties in the parameters). Its covariance matrix is $\mathbf{Q}_i = E[\mathbf{w}_i \mathbf{w}_i^T]$. The measurement $z_i(t)$ is the time derivative $\dot{x}_i(t)$, which is calculated from the sampled data. $\mathbf{H}_i(t)$ contains all the modules, i.e., $\mathbf{H}_i(t) = [f_{i1}(\mathbf{x}(t)), \ldots, f_{iL_i}(\mathbf{x}(t))]$.
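The tree representation used by GP can be sketched with a small node class. It is a minimal illustration, assuming the operator set of Fig. 2 (multiplication and addition/subtraction) and standard subtree crossover; the class and helper names are hypothetical and are not the implementation of [21] or of this paper. In the proposed scheme the tree supplies the candidate terms, while their coefficients are left to the Kalman filter.

```python
import copy
import random
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """GP tree node: internal nodes are operators, leaves are variables or constants."""
    op: str                                    # '+', '-', '*', 'var' or 'const'
    children: List["Node"] = field(default_factory=list)
    value: Union[int, float] = 0               # gene index for 'var', number for 'const'


def evaluate(node, x):
    """Evaluate the expression encoded by `node` at the state vector x."""
    if node.op == "var":
        return x[node.value]
    if node.op == "const":
        return node.value
    left, right = (evaluate(child, x) for child in node.children)
    if node.op == "+":
        return left + right
    if node.op == "-":
        return left - right
    if node.op == "*":
        return left * right
    raise ValueError(f"unknown operator {node.op!r}")


def subtrees(node, parent=None, index=None):
    """Yield (parent, child_index, subtree) for every node, root first."""
    yield parent, index, node
    for i, child in enumerate(node.children):
        yield from subtrees(child, node, i)


def crossover(a, b, rng):
    """Subtree crossover: copy `a` and replace one of its subtrees by a copy of one from `b`."""
    offspring = copy.deepcopy(a)
    parent, index, _ = rng.choice(list(subtrees(offspring)))
    _, _, donor = rng.choice(list(subtrees(b)))
    if parent is None:                         # the root itself was selected
        return copy.deepcopy(donor)
    parent.children[index] = copy.deepcopy(donor)
    return offspring


def var(j):
    return Node("var", value=j)


# 0.5 * (x1 * x2) - x3, with genes indexed from 0.
tree_a = Node("-", [Node("*", [Node("const", value=0.5), Node("*", [var(0), var(1)])]), var(2)])
tree_b = Node("+", [Node("*", [var(0), var(0)]), var(1)])   # x1^2 + x2
x = [2.0, 3.0, 4.0]
print(evaluate(tree_a, x))   # -1.0
offspring = crossover(tree_a, tree_b, random.Random(0))
```

Deep copies keep the parents intact, so selection can still compare them with their offspring in the next generation.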
$v_i(t)$ is the measurement noise (external noise in the GRN). Its covariance matrix is $\mathbf{R}_i = E[v_i^2]$. The noise vectors $\mathbf{w}_i$ and $v_i$ are statistically independent. For example, if the equation for the $i$th gene in the GRN model has the form

$$\dot{x}_i = c_{i1} x_1 + c_{i2} x_1 x_2 \qquad (7)$$

then $\mathbf{H}_i(t) = [x_1(t),\ x_1(t)x_2(t)]$, $z_i(t) = \dot{x}_i(t)$, and the state vector is $\mathbf{c}_i = [c_{i1}, c_{i2}]^T$. Both $\mathbf{H}_i$ and $z_i$ are calculated from the measurement data obtained from microarray experiments.

The implementation of the Kalman filter (for the equation of the $i$th gene) is given by the following equations [23] (the subscript $i$ is dropped for simplicity):

$$\hat{\mathbf{c}}^-(t) = \hat{\mathbf{c}}^+(t-1) \qquad (8)$$

$$\mathbf{P}^-(t) = \mathbf{P}^+(t-1) + \mathbf{Q} \qquad (9)$$

$$\hat{\mathbf{c}}^+(t) = \hat{\mathbf{c}}^-(t) + \mathbf{K}(t)\left[z(t) - \mathbf{H}(t)\hat{\mathbf{c}}^-(t)\right] \qquad (10)$$

$$\mathbf{K}(t) = \mathbf{P}^-(t)\mathbf{H}^T(t)\left[\mathbf{H}(t)\mathbf{P}^-(t)\mathbf{H}^T(t) + \mathbf{R}\right]^{-1} \qquad (11)$$

$$\mathbf{P}^+(t) = \left[\mathbf{I} - \mathbf{K}(t)\mathbf{H}(t)\right]\mathbf{P}^-(t) \qquad (12)$$

where $\mathbf{K}$ is the Kalman filter gain and $\mathbf{P}$ is the covariance matrix of the estimation error. The superscripts $-$ and $+$ indicate the a priori and a posteriori values of the variables, respectively; $\hat{\mathbf{c}}^-$ and $\hat{\mathbf{c}}^+$ are the prior and posterior estimates. $\mathbf{Q}$ and $\mathbf{R}$ are the covariance matrices of the parameter noise and external noise, respectively. The initial conditions are $\hat{\mathbf{c}}^+(0) = E[\mathbf{c}(0)]$ and $\mathbf{P}^+(0) = E[(\mathbf{c}(0) - \hat{\mathbf{c}}^+(0))(\mathbf{c}(0) - \hat{\mathbf{c}}^+(0))^T]$. A block diagram of the Kalman filter is given in Fig. 3.

Fig. 3. Block diagram of Kalman filter.

In general, the Kalman filter may be interpreted as a one-step predictor with an appropriate gain calculator [22]. Specifically, the block "one-step predictor" corresponds to (10), the block "Kalman filter gain calculator" corresponds to (11), and the block "Riccati equation solver" corresponds to (12).

Convergence of the Kalman filter is an important issue [23]. The rate of convergence is defined as the number of iterations needed to obtain the optimum estimates. Convergence of the Kalman filter includes the convergence of the estimates $\hat{\mathbf{c}}$ and the convergence of the estimation error covariance $\mathbf{P}$. Convergence is studied in detail in the simulations (Section IV).

In practice, noise statistics (such as the covariance matrices) may not be known and need to be estimated. The Kalman filter is sensitive to errors in the estimated noise statistics; poor estimates of the noise covariance can result in filter divergence. A robust Kalman filter is presented in Section III to compensate for the uncertainties in the estimates of the noise covariance.

In [17], the EM algorithm is used to estimate both the state transition matrix and the observation matrix in a linear state-space model of a GRN, and the Kalman filter is applied as a smoother. The Kalman filter is also applied in [31], where a two-stage method is implemented to infer a GRN from time-series data. First, a genetic algorithm and the expectation maximization algorithm are used to cluster the genes; then a linear state-space model is adopted and the Kalman filter is applied to estimate and predict gene expressions. However, in both [17] and [31], a linear (rather than nonlinear) state-space model of a GRN is adopted. Furthermore, the sensitivity of the Kalman filter with respect to inaccurate noise statistics is not discussed.

C. Proposed Iterative Algorithm

The task of identifying GRNs may be considered as an optimization problem. The goal is to minimize the identification error and keep the model as simple as possible, which may be achieved by minimizing the following fitness function:

$$\text{fitness} = a \sum_{t=1}^{N} \left(x_i(t) - \hat{x}_i(t)\right)^2 + b\,p \qquad (13)$$

where $N$ is the number of data points, $x_i(t)$ is the target time series, $\hat{x}_i(t)$ is the time series given by the differential equation represented by a GP individual, and $p$ is a penalty term. The penalty term depends on the specific model chosen for a GRN. In this paper, $p$ is chosen as the number of terms on the right-hand side of (2). The weights $a$ and $b$ govern the joint optimization of the identification error and the model complexity of the GRN. If $b$ is small relative to $a$, a very fine-grained model with many terms will be obtained. Conversely, if $b$ is large relative to $a$, a very rough model with few terms will be obtained. The scale of the first term on the right-hand side of (13) will vary depending on the characteristics of the data set (such as the number of time points, the number of genes, and the amplitude of the signal).
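The filter (8)-(12) and the fitness (13) fit together as in the sketch below: the filter estimates the coefficients of one candidate structure from $z(t)$ and the candidate terms, and the fitness then scores that structure. It assumes the random-walk model (5)-(6), forward-difference derivative estimates, forward-Euler simulation of $\hat{x}_i$, illustrative weights `a=1.0` and `b=0.1`, and a hypothetical one-regulator example; none of these choices is taken from the paper.

```python
import numpy as np


def kalman_parameter_estimate(H_rows, z, Q, R, c0, P0):
    """Kalman filter for c in z(t) = H(t) c + v(t), with c a random walk.

    H_rows: (T, L) array; row t holds the candidate terms evaluated at x(t).
    z:      (T,) array of derivative estimates for one gene.
    Q:      (L, L) parameter-noise covariance; R: scalar measurement variance.
    """
    c_post = np.asarray(c0, dtype=float)
    P_post = np.asarray(P0, dtype=float)
    L = c_post.size
    for H_t, z_t in zip(H_rows, z):
        H_t = np.asarray(H_t, dtype=float).reshape(1, L)
        c_prior = c_post                       # prediction: parameters carried over
        P_prior = P_post + Q                   # prediction: covariance grows by Q
        S = H_t @ P_prior @ H_t.T + R          # innovation variance (1 x 1)
        K = P_prior @ H_t.T / S                # gain, as in (11)
        c_post = c_prior + K @ (z_t - H_t @ c_prior)        # update, as in (10)
        P_post = (np.eye(L) - K @ H_t) @ P_prior            # as in (12)
    return c_post, P_post


def fitness(X, i, terms, coeffs, dt, a=1.0, b=0.1):
    """Fitness (13) for gene i: a * squared error + b * number of terms.

    Gene i is simulated by forward Euler while the other genes are taken
    from the data, mirroring the decoupled sub-problem.
    """
    x_hat = np.empty(X.shape[0])
    x_hat[0] = X[0, i]
    for t in range(X.shape[0] - 1):
        state = X[t].copy()
        state[i] = x_hat[t]
        rate = sum(c * f(state) for c, f in zip(coeffs, terms))
        x_hat[t + 1] = x_hat[t] + dt * rate
    return a * np.sum((X[:, i] - x_hat) ** 2) + b * len(terms)


# Hypothetical data: gene 0 obeys dx0/dt = -0.5 x0 + 0.2 x0 x1; x1 is "measured".
dt = 0.1
terms = [lambda s: s[0], lambda s: s[0] * s[1]]
c_true = [-0.5, 0.2]
X = np.empty((17, 2))
X[:, 1] = np.linspace(1.0, 2.0, 17)
X[0, 0] = 1.0
for t in range(16):
    X[t + 1, 0] = X[t, 0] + dt * sum(c * f(X[t]) for c, f in zip(c_true, terms))

H_rows = np.array([[f(X[t]) for f in terms] for t in range(16)])
z = (X[1:, 0] - X[:-1, 0]) / dt                # forward-difference derivative
c_hat, _ = kalman_parameter_estimate(H_rows, z, Q=1e-10 * np.eye(2), R=1e-6,
                                     c0=np.zeros(2), P0=100.0 * np.eye(2))
f_model = fitness(X, 0, terms, c_hat, dt)
f_bloated = fitness(X, 0, terms + [lambda s: s[1] ** 2], list(c_hat) + [0.0], dt)
```

Here `c_hat` recovers the true coefficients, and the model with an extra, useless term scores worse only through the penalty, which is the trade-off that (13) is designed to express.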
In this study, we choose the two weights by means of trial and error to balance their effects.

Since this is a global nonlinear optimization problem, a nested optimization structure is adopted, in which GP is applied to determine the nonlinear terms (global optimization) and Kalman filtering is employed to estimate the parameter of each term (local optimization) in each iteration. Such a decomposition into a structural part solved by GP and a parameter optimization part solved by Kalman filtering reduces the complexity significantly and speeds up convergence [35]. The detailed procedure of the proposed iterative algorithm is illustrated in Fig. 4. The GP process has four operations: reproduction, crossover, mutation, and selection. Kalman filtering is employed to estimate the parameters in every generation.

Fig. 4. Genetic programming process with Kalman filter.

III. ROBUSTNESS ANALYSIS

The standard Kalman filter provides optimal estimates if the noise is white Gaussian and the statistics (covariance matrices) of the noise are known. The Kalman filter is optimal in the sense that it minimizes the trace of the estimation error covariance $\mathbf{P}$. Unfortunately, the noise covariance matrices are usually unknown, or at least not known exactly, in practical situations such as microarray experiments. Hence, it is critical to design a Kalman filter that is robust to the uncertainties in the noise statistics. Robust Kalman filter designs that consider uncertainties in noise statistics can be found in [24] for continuous-time systems and in [25] for discrete-time systems. We follow the approach of [25] to derive the performance index of the robust Kalman filter and propose a genetic-algorithm-based search procedure to find the optimal robust Kalman filter gain.

Define the estimation error as

$$\tilde{\mathbf{c}}(t) = \mathbf{c}(t) - \hat{\mathbf{c}}^+(t). \qquad (14)$$

From the standard Kalman filter, with the gain $\mathbf{K}$ and measurement matrix $\mathbf{H}$ taken at steady state, the dynamics of the estimation error can be derived as

$$\tilde{\mathbf{c}}(t) = (\mathbf{I} - \mathbf{K}\mathbf{H})\tilde{\mathbf{c}}(t-1) + (\mathbf{I} - \mathbf{K}\mathbf{H})\mathbf{w}(t-1) - \mathbf{K}v(t). \qquad (15)$$
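The error recursion (15) can be checked numerically. The sketch runs a fixed-gain filter on a simulated random-walk parameter vector and propagates (15) alongside it; the two error sequences agree to rounding error. The time-invariant 2x2 measurement matrix (two measurements per step) and the arbitrary fixed gain are assumptions of this illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
L = 2
H = np.array([[1.0, 0.5], [0.2, 1.0]])      # time-invariant measurement matrix (assumed)
K = np.array([[0.4, -0.1], [0.0, 0.3]])     # arbitrary fixed gain (assumed)
Q = 0.01 * np.eye(L)
R = 0.1 * np.eye(2)
A = np.eye(L) - K @ H

c = np.zeros(L)              # true parameters, following the random walk (5)
c_hat = np.zeros(L)          # fixed-gain posterior estimate
err_direct = c - c_hat       # error propagated with (15)
max_gap = 0.0
for t in range(200):
    w = rng.multivariate_normal(np.zeros(L), Q)
    v = rng.multivariate_normal(np.zeros(2), R)
    c_next = c + w                                   # state equation
    z = H @ c_next + v                               # measurement equation
    c_hat = c_hat + K @ (z - H @ c_hat)              # predict (copy) then update
    err_direct = A @ err_direct + A @ w - K @ v      # recursion (15)
    c = c_next
    max_gap = max(max_gap, np.max(np.abs((c - c_hat) - err_direct)))

print(max_gap < 1e-9)   # True
```

Because the recursion holds for any fixed gain, the steady-state covariance that follows can be evaluated for candidate gains other than the standard one, which is what the robust design exploits.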
Then $\mathbf{P}$, the estimation error covariance at steady state, satisfies the following equation:

$$\mathbf{P} = (\mathbf{I} - \mathbf{K}\mathbf{H})\mathbf{P}(\mathbf{I} - \mathbf{K}\mathbf{H})^T + (\mathbf{I} - \mathbf{K}\mathbf{H})\mathbf{Q}(\mathbf{I} - \mathbf{K}\mathbf{H})^T + \mathbf{K}\mathbf{R}\mathbf{K}^T. \qquad (16)$$

$\mathbf{P}$ may be decoupled into two parts,

$$\mathbf{P} = \mathbf{P}_1 + \mathbf{P}_2 \qquad (17)$$

where $\mathbf{P}_1$ and $\mathbf{P}_2$ represent the estimation error covariance due to process and measurement noise, respectively. It is straightforward to verify that

$$\mathbf{P}_1 = (\mathbf{I} - \mathbf{K}\mathbf{H})\mathbf{P}_1(\mathbf{I} - \mathbf{K}\mathbf{H})^T + (\mathbf{I} - \mathbf{K}\mathbf{H})\mathbf{Q}(\mathbf{I} - \mathbf{K}\mathbf{H})^T \qquad (18)$$

$$\mathbf{P}_2 = (\mathbf{I} - \mathbf{K}\mathbf{H})\mathbf{P}_2(\mathbf{I} - \mathbf{K}\mathbf{H})^T + \mathbf{K}\mathbf{R}\mathbf{K}^T. \qquad (19)$$

Suppose the noise covariances contain uncertainties

$$\bar{\mathbf{Q}} = \mathbf{Q} + \Delta\mathbf{Q} = (1 + \alpha)\mathbf{Q} \qquad (20)$$

$$\bar{\mathbf{R}} = \mathbf{R} + \Delta\mathbf{R} = (1 + \beta)\mathbf{R} \qquad (21)$$

where $\alpha$ and $\beta$ are random variables with $E[\alpha] = E[\beta] = 0$, $E[\alpha^2] = \sigma_\alpha^2$, $E[\beta^2] = \sigma_\beta^2$, and $\alpha$ and $\beta$ are uncorrelated, $E[\alpha\beta] = 0$. The corresponding steady-state error covariance matrix becomes

$$\bar{\mathbf{P}} = (\mathbf{I} - \mathbf{K}\mathbf{H})\bar{\mathbf{P}}(\mathbf{I} - \mathbf{K}\mathbf{H})^T + (\mathbf{I} - \mathbf{K}\mathbf{H})\bar{\mathbf{Q}}(\mathbf{I} - \mathbf{K}\mathbf{H})^T + \mathbf{K}\bar{\mathbf{R}}\mathbf{K}^T. \qquad (22)$$

Using a similar decomposition as before,

$$\bar{\mathbf{P}} = \bar{\mathbf{P}}_1 + \bar{\mathbf{P}}_2 \qquad (23)$$

where $\bar{\mathbf{P}}_1$ and $\bar{\mathbf{P}}_2$ represent the estimation error covariance due to the uncertain process and measurement noise, respectively. Each of them contains a nominal term and a term due to noise uncertainty, i.e.,

$$\bar{\mathbf{P}}_1 = \mathbf{P}_1 + \Delta\mathbf{P}_1 \qquad (24)$$

$$\bar{\mathbf{P}}_2 = \mathbf{P}_2 + \Delta\mathbf{P}_2 \qquad (25)$$

where $\Delta\mathbf{P}_1$ and $\Delta\mathbf{P}_2$ satisfy

$$\Delta\mathbf{P}_1 = (\mathbf{I} - \mathbf{K}\mathbf{H})\Delta\mathbf{P}_1(\mathbf{I} - \mathbf{K}\mathbf{H})^T + \alpha(\mathbf{I} - \mathbf{K}\mathbf{H})\mathbf{Q}(\mathbf{I} - \mathbf{K}\mathbf{H})^T \qquad (26)$$

$$\Delta\mathbf{P}_2 = (\mathbf{I} - \mathbf{K}\mathbf{H})\Delta\mathbf{P}_2(\mathbf{I} - \mathbf{K}\mathbf{H})^T + \beta\mathbf{K}\mathbf{R}\mathbf{K}^T. \qquad (27)$$

Comparing (26) and (27) with (18) and (19), we obtain the simple relation

$$\Delta\mathbf{P}_1 = \alpha\mathbf{P}_1, \qquad \Delta\mathbf{P}_2 = \beta\mathbf{P}_2. \qquad (28)$$

The standard Kalman filter minimizes the performance index $J = \operatorname{Tr}(\mathbf{P})$. The variation of the performance index is given by

$$\Delta J = \alpha \operatorname{Tr}(\mathbf{P}_1) + \beta \operatorname{Tr}(\mathbf{P}_2). \qquad (29)$$

The mean of $\Delta J$ is zero. The variance of $\Delta J$ is

$$E[(\Delta J)^2] = \sigma_\alpha^2 [\operatorname{Tr}(\mathbf{P}_1)]^2 + \sigma_\beta^2 [\operatorname{Tr}(\mathbf{P}_2)]^2. \qquad (30)$$

In order to make the filter robust to the noise uncertainties, the variance of the change in the performance index needs to be minimized. In addition, the filter should perform well under nominal conditions. Hence, the following weighted performance index is adopted to address the tradeoff between nominal and off-nominal conditions [25]:

$$J_r = \rho_1 \operatorname{Tr}(\mathbf{P}) + \rho_2 \left\{\sigma_\alpha^2 [\operatorname{Tr}(\mathbf{P}_1)]^2 + \sigma_\beta^2 [\operatorname{Tr}(\mathbf{P}_2)]^2\right\} \qquad (31)$$

where $\rho_1 \ge 0$ and $\rho_2 \ge 0$ are weighting factors. When $\rho_2 = 0$, the filter becomes the standard Kalman filter; when $\rho_1 = 0$ and $\rho_2 > 0$, the filter is optimal under off-nominal conditions.

A gradient descent method is suggested by [25] to search for the robust Kalman filter gain. The authors of [25] point out that "special care has to be taken to come up with the gradient descent step size and the perturbation size to find the partial derivative. Computationally this method is time consuming but this is a straightforward method of realizing a new Kalman gain". However, [25] does not specify how the step size should be chosen. Because it is very difficult to choose a step size that avoids local minima, and the method in [25] is computationally expensive, a genetic algorithm (GA) is used in this paper to search for the robust Kalman filter gain. Note that our method is different from the gradient descent in [25]. In addition, our method has the capability of avoiding local minima and converges fast, because GA is computationally efficient and the search interval is limited to a reasonable range. The procedure of our approach is as follows:

1) Find the standard Kalman filter gain $\mathbf{K}_s$ at steady state.
2) Generate candidate robust Kalman filter gains in an interval around $\mathbf{K}_s$. Calculate their respective performance indices (31). Keep a small percentage of the top candidates for the next generation, perform mutation on another small percentage of the candidates, and perform crossover on the bulk of the candidates. Go to the next generation.
3) Stop if the performance index cannot be further improved or the maximum number of iterations is reached.

The intuition behind choosing the search interval around $\mathbf{K}_s$ is that the new gain is expected not to be very far from the standard Kalman filter gain. Simulation results show that the proposed robust Kalman filter gives much better parameter estimates than the standard Kalman filter when the noise covariances are not known exactly.

IV. SIMULATION EVALUATION

In the simulation study, both synthetic data and real microarray measurements are used to evaluate the proposed algorithm. A robust Kalman filter is also tested against a standard Kalman filter when the noise covariance matrices are not fixed.

A. Synthetic Data

In this part of the simulation, we use data of a metabolic network, called the E-cell system (a part of the biological phospholipid pathway), that consists of three substances, and we compare our algorithm with the approach in [28], where GP and RLS estimation were used without considering noise. This network can be approximated by the polynomial ODE system (32). The last equation of (32) is added to the synthetic model to test whether the proposed method creates false positives. Here gene $x_4$ is not involved in regulatory interactions with other genes; it is included to see whether it is omitted from the obtained GRN.

Assuming parameter and external noise in the E-cell network, the equations become (33), where $\xi_{ik}$ and $\eta_i$ are parameter noise and external noise, respectively, with covariance matrices $\mathbf{Q}_i$ and $\mathbf{R}_i$. It is assumed that $\xi_{ik}$ and $\eta_i$ are uncorrelated for all $i$ and $k$. Since there are three substances in the E-cell system (plus the added gene $x_4$), it is assumed that the tree structure should include a subset of the following terms on the right-hand side of the differential equation: $x_1$, $x_2$, $x_3$, $x_4$, $x_1^2$, $x_2^2$, $x_3^2$, $x_4^2$, $x_1x_2$, $x_1x_3$, $x_1x_4$, $x_2x_3$, $x_2x_4$, $x_3x_4$. In other words, a degree-2 polynomial model is adopted. First, 1000 individuals are produced and ranked according to their fitness values. The 5% of individuals with the minimum fitness values are kept for the next generation; crossover is performed on 80% of the individuals and mutation on 10%, and the remaining 5% are used for other operations. The coefficients in the E-cell model are determined by Kalman filtering. Fig. 5 shows the convergence of the Kalman filter for the E-cell model.

Fig. 5. Convergence of the Kalman filter for the E-cell model: estimation error versus number of iterations.

TABLE I. OBTAINED PARAMETERS BY GP + RLS AND GP + KF WHEN NOISE IS PRESENT

The resulting models using GP+RLS and GP+KF are listed in Table I. The true network structure is obtained by both methods (under reasonable noise levels). Since gene $x_4$ is not involved with the other genes, only a parameter of a term in $x_4$ alone is nonzero for it in the obtained GRN. In other words, there are no (false) edges between $x_4$ and the other genes. Because predicting true edges is only useful if the number of false edges predicted is reasonably low, it is important to examine the true positive rate versus the false positive rate as the noise level increases. Results similar to (but different from) the ROC analysis in [36] for the E-cell simulation are given in Fig. 6. The proposed method is quite robust to increased noise: even when the noise level grows to 10, the true positive rate is still 100% and the false positive rate is below 25%.

Fig. 6. True positive rate versus false positive rate as the noise level increases (the covariances are 0, 0.1, 1.0, 10, 30, 50, and 90, respectively, and the same values are used for $\mathbf{Q}$ and $\mathbf{R}$).

The concentration levels of the three substances are shown in Fig. 7. We observe that under noisy conditions, GP plus Kalman filtering performs well, and Kalman filtering is a much better choice than the RLS algorithm when noise is present.

Fig. 7. E-cell simulation by RLS and Kalman filtering.

B. Robust Kalman Filter

We now test the robust Kalman filter discussed in Section III using the E-cell model (given in Section IV-A); however, instead of fixed covariance matrices, it is assumed that the covariance matrices are not known exactly, that is, the covariance matrices of $\xi$ and $\eta$ are $(1 + \alpha_i)\mathbf{Q}_i$ and $(1 + \beta_i)\mathbf{R}_i$, where $\mathbf{Q}_i$ and $\mathbf{R}_i$ are the same as in Section IV-A, and $\alpha_i$ and $\beta_i$ are zero-mean random variables with given variances. The random variables $\alpha_i$ and $\beta_i$ are uncorrelated for all $i$. A genetic algorithm (GA) is used to search for the optimal robust Kalman filter gain for the objective function defined by (31), with fixed weighting factors $\rho_1$ and $\rho_2$. The results are summarized in Table II. When there are uncertainties in the noise covariance matrices, the robust Kalman filter gives much more accurate parameter estimates than the standard Kalman filter.
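The GA search for a robust gain described in Section III can be sketched as follows. It assumes a time-invariant, full-rank 2x2 measurement matrix (so that a steady state exists for the random-walk model), an elementwise search interval $\mathbf{K}_s \pm 0.5|\mathbf{K}_s|$, 10% elitism, blend crossover, and occasional Gaussian mutation; all of these settings and all function names are illustrative assumptions, not the configuration used for Table II. The steady-state covariances (18) and (19) are obtained by vectorizing the Lyapunov-type equations.

```python
import numpy as np


def lyap(A, W):
    """Solve X = A X A^T + W for stable A by vectorization."""
    n = A.shape[0]
    x = np.linalg.solve(np.eye(n * n) - np.kron(A, A), W.reshape(-1))
    return x.reshape(n, n)


def steady_state_gain(H, Q, R, iters=500):
    """Standard steady-state gain for c(t) = c(t-1) + w, z = H c + v (Riccati iteration)."""
    L = H.shape[1]
    P = np.eye(L)
    for _ in range(iters):
        P_prior = P + Q
        K = P_prior @ H.T @ np.linalg.inv(H @ P_prior @ H.T + R)
        P = (np.eye(L) - K @ H) @ P_prior
    return K, P


def robust_index(K, H, Q, R, var_a, var_b, rho1, rho2):
    """Weighted index (31); returns inf when I - KH is not stable."""
    A = np.eye(H.shape[1]) - K @ H
    if np.max(np.abs(np.linalg.eigvals(A))) >= 1.0:
        return np.inf
    P1 = lyap(A, A @ Q @ A.T)          # process-noise part, (18)
    P2 = lyap(A, K @ R @ K.T)          # measurement-noise part, (19)
    var_dJ = var_a * np.trace(P1) ** 2 + var_b * np.trace(P2) ** 2   # (30)
    return rho1 * np.trace(P1 + P2) + rho2 * var_dJ


def ga_robust_gain(K_s, cost, delta=0.5, pop=60, gens=80, seed=0):
    """GA search for a gain inside K_s +/- delta*|K_s| (elementwise)."""
    rng = np.random.default_rng(seed)
    lo, hi = K_s - delta * np.abs(K_s), K_s + delta * np.abs(K_s)
    popn = rng.uniform(lo, hi, size=(pop,) + K_s.shape)
    popn[0] = K_s                                  # include the standard gain
    for _ in range(gens):
        order = np.argsort([cost(K) for K in popn])
        elite = popn[order[: pop // 10]]           # keep the best 10% unchanged
        parents = popn[order[: pop // 2]]          # the better half breeds
        children = []
        while len(children) < pop - len(elite):
            i, j = rng.integers(len(parents), size=2)
            w = rng.uniform(size=K_s.shape)
            child = w * parents[i] + (1.0 - w) * parents[j]    # blend crossover
            if rng.uniform() < 0.1:                             # occasional mutation
                child = child + 0.05 * (hi - lo) * rng.standard_normal(K_s.shape)
            children.append(np.clip(child, lo, hi))
        popn = np.concatenate([elite, np.array(children)])
    return min(popn, key=cost)


H = np.array([[1.0, 0.5], [0.2, 1.0]])
Q = 0.01 * np.eye(2)
R = 0.1 * np.eye(2)
K_s, P_s = steady_state_gain(H, Q, R)


def robust_cost(K):
    return robust_index(K, H, Q, R, var_a=0.5, var_b=0.5, rho1=0.5, rho2=0.5)


K_r = ga_robust_gain(K_s, robust_cost)
```

Because the standard gain is seeded into the first generation and the elite is carried over unchanged, the returned gain never scores worse than $\mathbf{K}_s$ on (31); with $\rho_2 = 0$ the index reduces to $\operatorname{Tr}(\mathbf{P})$ of the standard filter.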
It is also notable that, when the noise covariances are not known exactly, the robust Kalman filter achieves a level of performance similar to that of the standard Kalman filter when the noise covariances are known exactly.

TABLE II. Parameters obtained and the corresponding performance index by GP + KF and GP + RKF when there are uncertainties in the noise covariance matrices (GP + KF uses the standard KF gain K; GP + RKF uses the robust KF gain).

C. Scalability Analysis

To study the scalability of the proposed method, a synthetic GRN with 50 genes is used (the detailed nonlinear ODE model is available at http://www.old.pvamu.edu/edir/lijun/GRN50.html). The proposed method is tested under various noise levels and different lengths of the available time-series data. Fig. 8 shows that the mean square error between the exact model and the obtained model decreases as the length of the available time series increases and as the noise level decreases, as expected. The method performs well when the noise level is not too high and the available time series are not too short.

Fig. 8. Inference of a 50-gene synthetic GRN.

D. Yeast Data

We consider time-series gene-expression data corresponding to yeast protein synthesis. The data for 12 genes (HAP1, CYB2, CYC7, ROX1, CYT1, HAP2/3/4, CYC1, COX5A, COX5B_ex1, GPD2) are selected because the relations among them have been revealed by biological experiments. For example, HAP1 represses the nuclear-encoded cytochrome gene CYC7 under anaerobic conditions, CYB2 activates CYC7, and HAP1 is a repressor that represses other genes [29]. The states of the 12 genes are represented by twelve state variables, one per gene, in the order listed above. The time-series microarray measurements (raw data) from [34] for the 12 genes of interest are shown in Fig. 9; the experiments provide 17 sampling data points for each gene.
The data are plotted on a log scale for convenience of representation only. The sampling points are evenly spaced, with an observation interval of 10 minutes. The measurement data originally come from http://www.genomics.stanford.edu/yeast_cell_cycle/full_data.html, where related references are also available.

Fig. 9. Microarray measurement data of the 12 genes of interest (17 data points per gene, sampled every 10 minutes).

The nonlinear terms in the differential equation model are assumed to be degree-2 polynomials. Because the measurements have large values (ranging from several hundred to more than a thousand) and the rates of change of the genes are not large (as observed from the traces in Fig. 9), the parameters of the GRN model are expected to be small. The noise covariance matrices Q and R are set to be diagonal. In the simulation, 1000 individuals are produced in each generation, and 100 generations are computed to reach the minimum fitness values. The following model is obtained by the proposed algorithm (all noise terms are dropped from the equations for simplicity of presentation):

The detailed interactions among the 12 genes deduced from the obtained model are shown in Fig. 10. The obtained model offers the following benefits:

1) The obtained relationships among genes agree with biological experimental findings (to the best of our knowledge). For example, we observe that the heme activator protein (HAP1) represses gene CYC7, and HAP1 is known to behave as a repressor [29]. CYC7 is expressed under hypoxic conditions and is activated by CYB2. We also observe that HAP1 activates COX5B; HAP1 is known to function as a homodimer to activate oxygen-dependent expression of COX5B [30]. In the model, ROX1 activates HAP4, HAP4 activates HAP2, and HAP2 and HAP4 are the only two genes that activate CYT1, so ROX1 activates CYT1 indirectly through HAP4. This is consistent with the known activation of CYT1 by the HMG-domain site-specific DNA binding protein ROX1.
Budding yeast HAP2 is required, in concert with HAP3 and HAP4, to form a heterotrimeric CCAAT-binding transcriptional activation complex at the UAS2 element of CYC1. All of the above results agree with the findings in [30], where detailed biological explanations are provided.
2) The obtained model reveals not only qualitative but also quantitative relationships among genes.
3) The obtained model shows that both negative and positive feedback exist in the GRN. For instance, many genes (HAP1, CYC7, ROX1, CYT1, HAP2/3/4, CYC1, COX5B_ex1) regulate themselves by negative feedback, whereas COX5A regulates itself by positive feedback. More interestingly, there are also both negative and positive feedback loops between genes. For example, CYB2 activates CYC7 and CYC7 represses CYB2, forming a negative feedback loop. In contrast, HAP4 and CYC1 activate each other through a positive feedback loop; they do not grow out of control because they are also suppressed by many other genes.
4) The obtained model also shows in detail how genes work together to regulate other genes. For example, the equation for CYC7 shows that CYB2 activates CYC7, and the equation for CYB2 shows that HAP1 represses CYB2; together, these show that HAP1 represses CYC7 through CYB2. This kind of detail is not available in many other existing models.
5) One gene may play opposite roles when collaborating with different genes. For instance, the equation for CYT1 shows that CYC7 represses CYT1 by itself; however, CYC7 activates CYT1 when HAP2/4 is activated. In this case, the net effect of CYC7 on CYT1 cannot be determined during this time course.
6) Sometimes it is possible to determine a gene’s effect during this time course even if it plays opposite roles when collaborating with different genes.
For example, the equation for ROX1 shows that HAP3 stimulates the production of the HMG-domain site-specific DNA binding protein ROX1 when collaborating with HAP4; however, HAP3 represses the synthesis of ROX1 when collaborating with another gene, CYC1. Because the repressive interaction dominates throughout the entire time course, the collective effect is that HAP3 represses ROX1 during this time course.

In general, the proposed model shows the versatility of GRNs and, we hope, helps us better understand their structure and dynamics. Note that the number of possible choices for the right-hand side of each equation is very large, so the computational cost is considerable: on a PC with a 3-GHz Intel Pentium 4 processor, it takes about 6 hours to obtain the model.

V. DISCUSSIONS

The proposed approach of decoupling and state-space formulation for nonlinear system identification is not restricted to the polynomial case; it may be applied to many other nonlinear models. For example, it may be applied to the sigmoid model of GRNs, which is not based on polynomials. In this section, we demonstrate the use of the proposed method for the GRN model based on sigmoidal functions, which can be written as (34), where the model has two scalar parameters, a weight value for the effect of gene j on gene i, and an offset parameter for gene i, and its two noise terms are the intrinsic noise and the external noise, respectively. Again, the nonlinear identification problem can be decoupled into sub-problems, with the ith sub-problem focusing on the ith gene. Because the time-series data of the other genes are fixed values when we focus on an individual gene, the identification problem can be solved one gene at a time, so the sigmoidal model (34) decouples into one sub-problem per gene. The fitness function of the ith sub-problem is given by (35), where the error between the target time series and the time series given by the obtained differential equation is accumulated over the available data points.
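The decoupling argument can be made concrete with a short sketch of one sub-problem's fitness evaluation. It rests on three assumptions made for illustration: the rate law takes the common sigmoidal form dx_i/dt = alpha / (1 + exp(-(sum_j w_ij x_j + b_i))) - beta x_i with noise omitted, which may differ in detail from (34); the fitness is the sum of squared errors between the target and simulated series; and the simulated series comes from forward-Euler integration in which only gene i evolves while every other gene takes its measured value. The names `sigmoid_rate` and `decoupled_fitness`, and the two-gene network in the usage example, are hypothetical.

```python
import math


def sigmoid_rate(x, i, w_i, b_i, alpha, beta):
    """Assumed sigmoidal rate law for gene i, with the noise terms omitted:
    dx_i/dt = alpha / (1 + exp(-(sum_j w_ij x_j + b_i))) - beta * x_i."""
    z = sum(w * xj for w, xj in zip(w_i, x)) + b_i
    return alpha / (1.0 + math.exp(-z)) - beta * x[i]


def decoupled_fitness(data, i, w_i, b_i, alpha, beta, dt):
    """Sum of squared errors between the measured and simulated series of gene i.

    data[k][j] is the measured level of gene j at sample k. Only gene i is
    simulated (one forward-Euler step per sample); every other gene is read
    from the data, which is what allows each gene to be fitted separately.
    """
    xi = data[0][i]  # start from the measured initial value of gene i
    sse = 0.0
    for k in range(len(data) - 1):
        x = list(data[k])
        x[i] = xi  # simulated value for gene i, measured values elsewhere
        xi = xi + dt * sigmoid_rate(x, i, w_i, b_i, alpha, beta)
        sse += (data[k + 1][i] - xi) ** 2
    return sse


# Hypothetical two-gene network, simulated with the same Euler scheme.
W = [[0.0, 1.5], [-2.0, 0.0]]
b = [0.0, 1.0]
alpha, beta, dt = 1.0, 0.5, 0.1
data = [[0.2, 0.8]]
for _ in range(30):
    x = data[-1]
    data.append([x[j] + dt * sigmoid_rate(x, j, W[j], b[j], alpha, beta)
                 for j in range(2)])

print(decoupled_fitness(data, 0, W[0], b[0], alpha, beta, dt))  # 0.0 for the true weights
print(decoupled_fitness(data, 0, [0.0, 0.5], b[0], alpha, beta, dt) > 0.0)  # True
```

One Euler step per sampling interval keeps the sketch short; with coarse sampling, such as the 10-minute yeast data, a finer integration step between samples would usually be preferable.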
The model for each individual gene is then given by (36) and (37).

Fig. 10. Interactions among the 12 genes of yeast.

Now, instead of determining all parameters of the network simultaneously, only the parameters of a single gene's equation need to be estimated in each sub-problem, which greatly reduces the computational complexity. To apply a linear estimator, the above equation can be rearranged as (38). Equation (38) is linear in the parameters, so a linear estimator may be applied to estimate them in each iteration. When noise is not considered, a recursive least squares (RLS) estimator may be used. If the noise is modeled as a Gaussian white noise process with known statistics, the Kalman filter may be used to obtain the optimal estimate.

VI. CONCLUSIONS AND FUTURE WORK

The induction of GRN models from a sequence of microarray measurements has become attractive owing to the growing availability of time-series data. In this paper, a continuous nonlinear ordinary differential equation model with parameter noise and external noise is proposed. GRN inference is decoupled into sub-problems, each targeting an individual gene. A joint genetic programming and Kalman filtering approach is then proposed to infer the nonlinear differential equations from time-series data. Simulations with synthetic and yeast data demonstrate the effectiveness of the proposed algorithm. The algorithm addresses the tradeoff between diversification (flexibility to explore new regions) and intensification (convergence in local regions): genetic programming provides the model flexibility needed to reduce the bias (systematic error) of the model, and the Kalman filter provides fast convergence and reduces the stochastic error of the model by mitigating the effect of noise during parameter estimation. A robust Kalman filter is also presented to compensate for inaccurate estimates of the noise statistics.
The results obtained in this paper for the yeast GRN reveal many interesting phenomena in GRNs. The inference of GRNs by the proposed algorithm provides insight into a wide range of biological processes. Specifically, the obtained nonlinear dynamic model answers whether (qualitatively) or how much (quantitatively) a gene or external perturbation contributes to the behavior transition of other genes or regulators (proteins) in instances such as disease development or recovery, aging processes, cell differentiation, or other cellular phenomena. In addition, it characterizes how parameter (intrinsic) noise and external noise affect gene expression. Although only polynomials are used as test cases in this study, the proposed methodology can be applied to a broad range of models.

It is important to design control strategies that manipulate some of the genes in a GRN and drive the system to desired target states. The GRN model obtained with the methodology in this paper will be used as a tool to study the dynamics and steady states of the GRN under various control designs, with direct applications in therapeutic target discovery. This will be one of our research thrusts in the future. Time delay is ubiquitous in gene regulatory activities, and incorporating it may capture the system dynamics more effectively; we plan to add time delay to our nonlinear model in future work. In addition, the statistics of the parameter noise and external noise may not be known at all. In that case, even a robust Kalman filter may not be appropriate for estimating the parameters. Instead, an H∞ filter may be employed to provide robust parameter estimation even without knowledge of the noise statistics. This will be one of our future efforts.

REFERENCES
[1] E. R. Dougherty, A. Datta, and C. Sima, “Research issues in genomic signal processing,” IEEE Signal Process. Mag., vol. 22, no. 6, pp. 46–68, 2005.
[2] A. Datta, A. Choudhary, M. L. Bittner, and E. R. Dougherty, “External control in Markovian genetic regulatory networks,” Machine Learning, vol. 52, no. 1–2, pp. 169–191, 2003.
[3] R. Pal, A. Datta, and E. R. Dougherty, “Optimal infinite horizon control for probabilistic Boolean networks,” IEEE Trans. Signal Process., vol. 54, no. 6, pt. 2, pp. 2375–2387, 2006.
[4] T. Kepler and T. Elston, “Stochasticity in transcriptional regulation: Origins, consequences, and mathematical representations,” Biophys. J., vol. 81, no. 6, pp. 3116–3136, Dec. 2001.
[5] P. Swain, M. Elowitz, and E. Siggia, “Intrinsic and extrinsic contributions to stochasticity in gene expression,” Proc. Natl. Acad. Sci. USA, vol. 99, pp. 12795–12800, 2002.
[6] J. Hasty, J. Pradines, M. Dolnik, and J. J. Collins, “Noise-based switches and amplifiers for gene expression,” Proc. Natl. Acad. Sci. USA, vol. 97, pp. 2075–2080, 2000.
[7] H. de Jong, “Modeling and simulation of genetic regulatory systems: A literature review,” J. Computat. Biol., vol. 9, no. 1, pp. 67–103, 2002.
[8] T. Chen, H. L. He, and G. M. Church, “Modeling gene expression with differential equations,” in Pacific Symp. Biocomput., 1999, vol. 4, pp. 29–40.
[9] M. K. S. Yeung, J. Tegnér, and J. J. Collins, “Reverse engineering gene networks using singular value decomposition and robust regression,” Proc. Natl. Acad. Sci. USA, vol. 99, pp. 6163–6168, 2002.
[10] V. Filkov, “Identifying gene regulatory networks from gene expression data,” in Handbook of Computational Molecular Biology. Boca Raton, FL: CRC Press, 2005.
[11] D. C. Weaver, C. T. Workman, and G. D. Stormo, “Modeling regulatory networks with weight matrices,” in Pacific Symp. Biocomput., 1999, vol. 4, pp. 112–123.
[12] P. D’haeseleer, X. Wen, S. Fuhrman, and R. Somogyi, “Linear modeling of mRNA expression levels during CNS development and injury,” in Pacific Symp. Biocomput., 1999, vol. 4, pp. 41–52.
[13] E. Mjolsness, D. H. Sharp, and J. Reinitz, “A connectionist model of development,” J. Theor. Biol., vol. 152, no. 4, pp. 429–453, Oct. 1991.
[14] L. F. A. Wessels, E. P. Van Someren, and M. J. T. Reinders, “A comparison of genetic network models,” in Pacific Symp. Biocomput., 2001, vol. 6, pp. 508–519.
[15] I. Tabus, C. D. Giurcaneanu, and J. Astola, “Genetic networks inferred from time series of gene expression data,” in Proc. 1st Int. Symp. Control, Commun. Signal Process., Hammamet, Tunisia, 2004, pp. 755–758.
[16] M. J. L. de Hoon, S. Imoto, K. Kobayashi, N. Ogasawara, and S. Miyano, “Inferring gene regulatory networks from time-ordered gene expression data of Bacillus subtilis using differential equations,” in Pacific Symp. Biocomput., 2003, vol. 8, pp. 17–28.
[17] R. Yamaguchi, R. Yoshida, S. Imoto, T. Higuchi, and S. Miyano, “Finding module-based gene networks in time-course gene expression data with state space models,” IEEE Signal Process. Mag., vol. 24, no. 1, pp. 37–53, 2007.
[18] Y. Maki et al., “Inference of genetic network using the expression profile time course data of mouse P19 cells,” in Proc. Genome Informatics 2002, 2002, vol. 13, pp. 382–383.
[19] S. Kimura, M. Hatakeyama, and A. Konagaya, “Inference of S-system models of genetic networks from noisy time-series data,” Chem-Bio Inform. J., vol. 4, no. 1, pp. 1–14, 2004.
[20] E. D. Sontag, “For differential equations with r parameters, 2r + 1 experiments are enough for identification,” J. Nonlinear Sci., vol. 12, pp. 553–583, 2002.
[21] J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge, MA: MIT Press, 1992.
[22] S. Haykin, Adaptive Filter Theory, 4th ed. Englewood Cliffs, NJ: Prentice-Hall, 2001.
[23] M. Grewal and A. Andrews, Kalman Filtering: Theory and Practice. Englewood Cliffs, NJ: Prentice-Hall, 1993.
[24] S. Sasa, “Robustness of a Kalman filter against uncertainties of noise covariances,” in Proc. American Control Conf., 1998, pp. 2344–2348.
[25] S. Kosanam and D. Simon, “Kalman filtering with uncertain noise covariances,” in Proc. IASTED Int. Conf. Intelligent Syst. Control, 2004, pp. 375–379.
[26] M. A. Savageau, “Rules for the evolution of gene circuitry,” in Pacific Symp. Biocomput., 1998, vol. 3, pp. 54–65.
[27] M. A. Savageau, “20 years of S-systems,” in Canonical Nonlinear Modeling: S-Systems Approach to Understand Complexity, E. Voit, Ed. New York: Van Nostrand Reinhold, 1991, pp. 1–44.
[28] S. Ando, E. Sakamoto, and H. Iba, “Evolutionary modeling and inference of gene network,” Inf. Sci., vol. 145, pp. 237–259, 2002.
[29] P. Woolf and Y. Wang, “A fuzzy logic approach to analyzing gene expression data,” Physiol. Genomics, vol. 3, pp. 9–15, 2000.
[30] J. Schneider and L. Guarente, “Regulation of the yeast CYT1 gene encoding cytochrome c1 by HAP1 and HAP2/3/4,” Molecular Cellular Biol., vol. 11, no. 10, pp. 4934–4942, 1991.
[31] Z. Chan, N. Kasabov, and L. Collins, “A two-stage methodology for gene regulatory network extraction from time-course gene expression data,” Expert Systems With Applications, vol. 30, pp. 59–63, 2006.
[32] O. Nelles, Nonlinear System Identification. New York: Springer, 2001.
[33] H. Cao, L. Kang, and Y. Chen, “Evolutionary modeling of systems of ordinary differential equations with genetic programming,” Genetic Programming and Evolvable Machines, vol. 1, no. 4, pp. 309–337, 2000.
[34] L. Qian, Supplemental materials, yeast data set. Dept. Electr. Comput. Eng., Prairie View A&M Univ., Prairie View, TX, 2007 [Online]. Available: http://old.pvamu.edu/edir/lijun/SupMatTSP2007.html
[35] H. Wang, L. Qian, and E. Dougherty, “Inference of gene regulatory networks using genetic programming and Kalman filter,” presented at the GENSIPS Conf., College Station, TX, 2006.
[36] D. Husmeier, “Sensitivity and specificity of inferring genetic regulatory interactions from microarray experiments with dynamic Bayesian networks,” Bioinformatics, vol. 19, no. 17, pp. 2271–2282, 2003.
[37] A. Tsakonas, “A comparison of classification accuracy of four genetic programming-evolved intelligent structures,” Inf. Sci., vol. 176, pp. 691–724, 2006.
[38] H. Wang, L. Qian, and E. Dougherty, “Inference of gene regulatory networks using S-system: A unified approach,” in Proc. IEEE CIBCB, 2007, pp. 82–89.

Lijun Qian (M’01–SM’08) received the B.E. degree from Tsinghua University, Beijing, China, the M.S. degree from the Technion–Israel Institute of Technology, Haifa, and the Ph.D. degree from Rutgers University, New Brunswick, NJ. He is an Assistant Professor in the Department of Electrical and Computer Engineering at Prairie View A&M University (PVAMU), Prairie View, TX. Before joining PVAMU, he was a Researcher at the Mathematical Science Research Center of Bell Labs, Murray Hill, NJ. His major research interests are in network theory, control theory, and genomic signal processing.

Haixin Wang (M’07) received the B.S. degree in electrical and mechanical engineering from Shandong University of Science and Technology, China, in 1997. Since 2005, he has been pursuing the Ph.D. degree in the Department of Electrical and Computer Engineering at Prairie View A&M University, Prairie View, TX. His research interests include bioinformatics, statistical signal processing, and genetic algorithms.

Edward R. Dougherty (M’05) received the M.S. degree in computer science from the Stevens Institute of Technology, Hoboken, NJ, and the Ph.D. degree in mathematics from Rutgers University, New Brunswick, NJ. He is a Professor in the Department of Electrical and Computer Engineering at Texas A&M University, College Station, TX, where he holds the Robert M. Kennedy Chair and is Director of the Genomic Signal Processing Laboratory. He is also the Director of the Computational Biology Division of the Translational Genomics Research Institute in Phoenix, AZ. He is the author of 14 books, editor of five others, and author of more than 200 journal papers. He has contributed extensively to the statistical design of nonlinear operators for image processing and the consequent application of pattern recognition theory to nonlinear image processing. His research in genomic signal processing is aimed at diagnosis and prognosis based on genetic signatures and at using gene regulatory networks to develop therapies based on the disruption or mitigation of aberrant gene function contributing to the pathology of a disease.
Prof. Dougherty has been awarded the Doctor Honoris Causa by the Tampere University of Technology in Finland. He is a fellow of SPIE, has received the SPIE President’s Award, and served as the editor of the SPIE/IS&T Journal of Electronic Imaging. At Texas A&M, he has received the Association of Former Students Distinguished Achievement Award in Research, been named a Fellow of the Texas Engineering Experiment Station, and been named Halliburton Professor of the Dwight Look College of Engineering.