Continuous Hidden Process Model for Time Series Expression Experiments Supplementary Materials Yanxin Shi, Michael Klutstein, Itamar Simon, Tom Mitchell and Ziv Bar-Joseph 1. Detailed derivations Inference: Inference on CHPM is a special case of Kalman filter where the transition matrix is fixed to identity matrix and the covariance matrices for transition model and emission model are diagonal. Here we only state the final results without proof. Please see e.g. [Jor02, Mur02] for derivations. Forward pass Let Pt|t and Vt|t denote the mean and variance of the Pt given G1:t, respectively, where G1:t is the observed genes’ expression levels from time point 1 to t in a specific dataset and Pt is the hidden activity levels of processes at time point t in the same dataset. In our case, Pt is an n-dimensional column vector each dimension corresponding to a process and Gi is an m-dimensional vector each dimension corresponding to a gene. First, we compute the predicted mean and variance: Pt|t 1 Pt 1|t 1 Vt|t 1 Vt 1|t 1 Q where Q is the transition co-variance matrix. Next, we compute the following quantities: et Gt WPt|t 1 St WVt|t 1W ' R K t Vt|t 1W ' St 1 Lt log N (et , 0, St ) where W is our gene-process association weight matrix and R is the observation noise matrix in emission model. Then, the estimates of mean and variance are: Pt|t Pt|t 1 Kt et Vt|t ( I KtW )Vt|t 1 Backward pass First we compute: Pt 1|t Pt|t Vt 1|t Vt|t Q J t Vt|tVt 1|t 1 Then, our expected activity levels of processes can be computed as: Pt|T Pt|t J t ( Pt 1|T Pt 1|t ) By running a forward pass followed by a backward pass, we can infer the expected activity levels for processes and other sufficient statistics. This inference algorithm is performed for each dataset so that we can infer the hidden activity levels for each process in all datasets. For implementation, we used Kevin Murphy’s MATLAB BNT toolbox. Learning: When the weights are fully specified, CHPM resembles a factorial HMM [10] and inference and learning can be performed by maximizing the expected complete log likelihood. However, in our case the weights (and the edges corresponding to non zero weights) are unknown. Since one of the goals of CHPM is to identify new process-gene associations, we define a penalized complete log-likelihood score Score(o, h | W , ) where o, h represent all hidden and observed variables, respectively, and includes all model parameters other than the association weights W. D n m n Score(o, h : W , ) log( P(od , hd | W , )) 1 | wi , j | d 1 j 1 m n i 1 j 1 2 ( wi , j 0)( Ei , j 1 (1 Ei , j ) 0 ) i 1 j 1 where hd and od are the unobserved activity levels for biological processes and the observed expression levels for genes in dataset d, respectively. Assuming the inferred expected activity levels of all biological processes are fixed, by maximizing this score we can both determine the values for the parameters in the model and the structure of the model. Let P j ,d ,t E ( Pj ,d ,t | G1:n ,d ,1:T , , W ) , i.e. the inferred activity level of process j at time point t in dataset d, given all observed data and model parameters. This can be inferred in E-step for each dataset independently. Then, the penalized likelihood score can be rewritten as: 2 D n Td P j ,d ,1 1 ( P j ,d ,t P j ,d ,t 1 ) 2 1 2 Score(o, h : W , ) log( log ) log( log 2 d ) d 2 2 2 2 d 2 2 d d 1 j 1 d 1 j 1 t 2 D n n D m Td log( (Gi ,d ,t wi , j P j ,d ,t ) 2 j 1 2 2 d d 1 i 1 t 1 m m n 1 log 2 d ) 1 | wi , j | 2 i 1 j 1 n 2 ( wi , j 0)( Ei , j 1 (1 Ei , j ) 0 ) C i 1 j 1 where C is some constant not dependent on the parameters. Other notations are the same as the main paper. By this score, we can estimate the variance terms 2 and the process smoothness term 2 in closed form by zeroing the first derivative of the score defined above. It’s easy to show that the updating rule is: n 2 d P j , d ,1 j 1 n n 2d 2 Td ( P j , d ,t P j ,d ,t 1 ) 2 j 1 t 2 n(Td 1) After estimating 2 and 2 , we estimate the association weight matrix W and the observation noise 2 d . Unlike other parameters, using MLE W and 2 d do not have a closed form. Also, as we discussed in main paper, MLE of W will result in many nonzero or negative weight for each gene. Therefore, we use coordinate ascent to estimate W and 2 d . In the first step, we fix W and estimate 2 d . In the second step, we fix 2 d and estimate W. The second step has been discussed in detail in the main paper. Here, we only give the updating rule for 2 d . Assuming W is fixed, 2 d is updated by zeroing the first derivative of the score: n ( G wi , j P j ,d ,t ) 2 i , d ,t D m Td score 1 j 1 2 0 log( log ) d 2 2 d 2 d d 1 i 1 t 1 2 2 d Since we have two distinct noise terms a 2 d and b 2 d for each dataset representing the noise for genes with and without incoming edges, respectively, these two terms can be updated by considering only the corresponding set of genes: a 2 d b2d Td n (Gi,d ,t wi, j P j ,d ,t )2 i{genes with edges} t 1 j 1 Td # genes _ with _ edges Td n (Gi,d ,t wi, j P j ,d ,t )2 i{genes without edges} t 1 j 1 Td # genes _ without _ edges Reference [Jor02] M.I. Jordan. An introduction to probabilistic graphical models, 2002. In preparation. [Mur01] K.P. Murphy, The Bayes Net Toolbox for MATLAB (2001). [Mur02] K.P. Murphy, Dynamic Bayesian Networks: Representation, Inference and Learning, 2002. Ph.D. Thesis, UC Berkeley.