Modeling Clinical Time Series Using Gaussian Process Sequences
Zitao Liu, Lei Wu, Milos Hauskrecht
Department of Computer Science, University of Pittsburgh

Motivation

Development of accurate models of complex clinical time series data is critical for understanding a disease and its dynamics, and subsequently for patient management and clinical decision making.

[Figure: value-vs-time plots of clinical series linking disease understanding, patient management, and decision making.]

Goal

"Develop accurate models of complex clinical time series!" Specifically, a prediction model that can:
1. Handle missing values
2. Deal with irregular time sampling intervals
3. Make accurate long-term predictions

Background

• Gaussian Process (GP)
A GP is an extension of the multivariate Gaussian distribution to distributions over functions [Rasmussen & Williams, 2006]. It is defined by two components, (m(x), k(x, x')):
  Mean function: m(x) = E[f(x)]
  Covariance function: k(x, x') = E[(f(x) - m(x))(f(x') - m(x'))]

GP regression equations:
  Estimated mean: \bar{f}_* = K(x_*, x)[K(x, x) + σ^2 I]^{-1} y
  Estimated covariance: Cov(f_*) = K(x_*, x_*) - K(x_*, x)[K(x, x) + σ^2 I]^{-1} K(x, x_*)

• Linear Dynamical System (LDS)
  z_{t+1} = A z_t + w_t,  w_t ~ N(0, Q)
  y_t = C z_t + v_t,      v_t ~ N(0, R)
  z_1 ~ N(π_1, V_1)
or equivalently p(z_{t+1} | z_t) = N(A z_t, Q) and p(y_t | z_t) = N(C z_t, R), where Y is the time series of observations and Z are the hidden states driving the dynamics.

• Discrete non-linear model (GPIL) [Turner et al., 2010]
  z_{t+1} = r(z_t) + w_t,  w_t ~ N(0, Q)
  y_t = u(z_t) + v_t,      v_t ~ N(0, R)
  z_1 ~ N(π_1, V_1)
where r(·) is an unknown transition function and u(·) is an unknown measurement function.

Problem Statement

We define the time series prediction/regression function for clinical time series as

  g: (Y_obs, t) → y,

where Y_obs = (y_i, t_i)_{i=1}^n is a sequence of past observation-time pairs such that 0 ≤ t_i < t_{i+1}, y_i is a p-dimensional observation vector made at time t_i, n is the number of past observations, and t > t_n is the time at which we would like to predict the observation y. The series is irregularly sampled: in general, t_{i+1} - t_i ≠ t_i - t_{i-1}.

State Space Gaussian Process (SSGP) Model

We consider a Gaussian process q(t) whose mean function is formed by a combination of a fixed set of basis functions with coefficients β:

  q(t) = f(t) + h(t)^T β,  f(t) ~ GP(0, K(t, t')).

In this definition, f(t) is a zero-mean GP, h(t) denotes a set of fixed basis functions, for example h(t) = (1, t, t^2, ...), and β has a Gaussian prior, β ~ N(b, I). Therefore q(t) is another GP, defined by:

  q(t) ~ GP(h(t)^T b, K(t, t') + h(t)^T h(t')).

• Idea Illustration
The time series is split into windows: observations in window i are modeled by a local GP whose mean coefficients β_i are emitted from a hidden state z_i, and the hidden states evolve according to an LDS.

Figure 1. Graphical representation of the state-space Gaussian process model. Shaded nodes y_{i,j} denote (irregular) observations and shaded nodes t_{i,j} denote the times associated with each observation. Each rectangle (plate) corresponds to a window, which is associated with its own local GP; s_i is the number of observations in window i, and f_{i,j} is the Gaussian field.

Joint distribution:

  p(z, β, Y) = p(z_1) ∏_{i=2}^m p(z_i | z_{i-1}) ∏_{i=1}^m p(β_i | z_i) ∏_{i=1}^m ∏_{j=1}^{s_i} p(y_{i,j} | β_i)

Parameter set: Ω = {Θ, {β_i}, A, C, R, Q, π_1, V_1}, where Θ denotes the covariance function parameters.

• Choice of Covariance Functions (K = K_1 + K_2)
  Mean-reverting property: K_1(t, t') = θ_1 exp(-φ_1 |t - t'|)
  Periodicity: K_2(t, t') = θ_2 exp(-φ_2 sin^2(ω(t - t')))

• Learning
Learn Θ: gradient-based methods on the log marginal likelihood

  log p(Y | Θ) = -(1/2) Y^T K^{-1} Y - (1/2) log|K| - (n/2) log 2π,

whose gradient with respect to each covariance parameter Θ_j is

  ∂/∂Θ_j log p(Y | Θ) = (1/2) ( Y^T K^{-1} (∂K/∂Θ_j) K^{-1} Y - Tr(K^{-1} ∂K/∂Θ_j) ).

Learn Ω \ Θ: EM algorithm maximizing the expected complete log-likelihood E_{β,z}[log p(β, z, Y)].

• Prediction
To support the prediction inference at time t, we need the following steps (a code sketch follows the list):
1. Split Y_obs and t into windows.
2. For windows that do not contain t, extract the last values in those windows as β's and feed them into the Kalman filter to infer the most recent hidden state z_k, where k is the index of the last window that does not contain t.
3. Get β_{k+1} = C A z_k from z_{k+1} = A z_k and β_{k+1} = C z_{k+1}.
4. If t is in window k+1, use the observations (y_{k+1}, t_{k+1}) in window k+1 together with β_{k+1} to make the prediction,
  ŷ = β_{k+1} + K(t, t_{k+1}) K(t_{k+1}, t_{k+1})^{-1} (y_{k+1} - β_{k+1});
otherwise, find the window index i to which t belongs; the prediction at t is ŷ = C A^{i-k} z_k.
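To make steps 3 and 4 concrete, here is a minimal Python/numpy sketch of the prediction computation. It is an illustration under stated assumptions, not the authors' implementation: the windowing and Kalman filtering of steps 1-2 are assumed to have already produced the filtered state z_k, the kernel uses the K_1 + K_2 form from the covariance section, and the function names, hyperparameter values, matrices, and toy data are all hypothetical.

```python
import numpy as np

# Covariance K = K1 + K2 from the "Choice of Covariance Functions" section.
# The hyperparameter values (theta1, phi1, theta2, phi2, omega) are
# hypothetical placeholders, not values estimated in the paper.
def kernel(t, s, theta1=1.0, phi1=0.5, theta2=0.3, phi2=1.0, omega=2 * np.pi / 7):
    d = np.subtract.outer(np.asarray(t, float), np.asarray(s, float))
    k1 = theta1 * np.exp(-phi1 * np.abs(d))               # mean reversion
    k2 = theta2 * np.exp(-phi2 * np.sin(omega * d) ** 2)  # periodicity
    return k1 + k2

def predict(t, z_k, A, C, next_window=None, steps_ahead=1, jitter=1e-6):
    """Steps 3-4 of the prediction procedure.

    z_k         -- Kalman-filtered hidden state of window k (output of steps 1-2)
    next_window -- (times, values) observed so far in window k+1, or None
    steps_ahead -- i - k, the number of windows between window k and time t
    """
    if next_window is not None and steps_ahead == 1:
        # Step 3: propagate the state and emit the window-level mean,
        # beta_{k+1} = C A z_k.
        beta = C @ (A @ z_k)
        tk1, yk1 = (np.asarray(v, float) for v in next_window)
        # Step 4: local GP correction around beta using the observations
        # already available in window k+1.
        Kxx = kernel(tk1, tk1) + jitter * np.eye(len(tk1))
        Ksx = kernel([t], tk1)
        return float(beta + (Ksx @ np.linalg.solve(Kxx, yk1 - beta))[0])
    # Otherwise (t beyond window k+1): pure state-space extrapolation,
    # y_hat = C A^(i-k) z_k.
    return float(C @ np.linalg.matrix_power(A, steps_ahead) @ z_k)

# Toy usage with a hypothetical 2-d hidden state and scalar observations.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
C = np.array([1.0, 0.5])
z_k = np.array([2.0, 1.0])                          # filtered state from step 2
window = ([1.0, 2.0, 3.5], [2.4, 2.6, 2.3])         # (times, values) in window k+1
print(predict(4.0, z_k, A, C, next_window=window))  # t falls in window k+1
print(predict(4.0, z_k, A, C, steps_ahead=3))       # far horizon: C A^3 z_k
```

Here `steps_ahead` plays the role of i - k: when the target time falls beyond window k+1, the sketch falls back to the pure state-space extrapolation ŷ = C A^{i-k} z_k, mirroring the second branch of step 4.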
Experiments

• Data
Figure 2. Time series for six tests from the Complete Blood Count (CBC) panel for one of the patients.

• Evaluation Metric
Root Mean Square Error (RMSE):

  RMSE = ( (1/n) ∑_{i=1}^n |y_i - ŷ_i|^2 )^{1/2}

• Results
Figure 3. Root Mean Square Error (RMSE) on CBC test samples.

Future Work

• Study and model dependencies among multiple time series
• Extend to switching-state and controlled dynamical systems

Acknowledgement

This research work was supported by grants R01LM010019 and R01GM088224 from the National Institutes of Health. Its content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

References

• M. Hauskrecht, M. Valko, I. Batal, G. Clermont, S. Visweswaran, and G. F. Cooper, "Conditional outlier detection for clinical alerting," in AMIA Annual Symposium Proceedings, 2010, p. 286.
• C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006.
• R. Turner, M. P. Deisenroth, and C. E. Rasmussen, "State-space inference and learning with Gaussian processes," in AISTATS, vol. 9, 2010, pp. 868-875.
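As a closing illustration of the evaluation metric defined in the Experiments section, here is a minimal RMSE sketch; the held-out values and baseline predictions are hypothetical placeholders, not the CBC data from the poster.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Square Error: ((1/n) * sum |y_i - y_hat_i|^2)^(1/2)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean(np.abs(y_true - y_pred) ** 2)))

# Hypothetical held-out values vs. a constant-mean baseline prediction.
print(rmse([2.4, 2.6, 2.3, 2.8], [2.5, 2.5, 2.5, 2.5]))
```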