TRACKING MULTIPLE CELLS BY CORRESPONDENCE RESOLUTION IN A SEQUENTIAL BAYESIAN FRAMEWORK

Nilanjan Ray, Gang Dong, Scott T. Acton

C. L. Brown Dept. of Electrical & Computer Engineering, Dept. of Biomedical Engineering
University of Virginia, Charlottesville, Virginia, USA

ABSTRACT

We propose a multi-target tracking (MTT) algorithm in a sequential Bayesian framework that computes cell velocities from video microscopy. Unlike traditional tracking methods, our formulation does not involve the estimation of target states. Instead, we estimate one-to-one target correspondences with a sequential Markov chain Monte Carlo (MCMC) algorithm. The proposed probabilistic framework also automatically accounts for a variable number of targets. We have tested the proposed tracking algorithm on two in vitro and one in vivo microscopy experiments. The three experiments show that the method holds promise, with low false positive and false negative rates as well as low rates of correspondence error.

1. INTRODUCTION

Cell velocity analysis from in vitro flow chamber assays and in vivo video microscopy is a crucial task in important biomedical application areas [3, 4]. Computing cell velocities manually from video microscopy is extremely tedious and prone to human fatigue and bias. The essential precursor to velocity computation is a robust multi-target tracking (MTT) algorithm. In this paper we propose a novel MTT algorithm based on a sequential Bayesian framework, motivated by MTT in the context of cell velocity analysis. Classically, tracking refers to target state estimation over a period of time. However, we take a different route to multiple cell tracking: instead of estimating the cell state (position, velocity, etc.), we estimate the target correspondence between two consecutive video frames in a sequential MCMC framework.
The target correspondence may be defined as a mapping from the set of targets on a given video frame to the set of targets on the subsequent video frame. We will refer to this mapping as the track-map. The motivation to estimate the track-map directly, rather than the target state, is that cell positions can be obtained reasonably accurately by straightforward detection methods. In many microscopy scenarios, these detection techniques yield a high rate of detection. However, they also generate false positives. A tracking method therefore requires two post-processing steps: elimination of the false positives and retrieval of the track-map.

In contrast to our approach, the typical sequential Bayesian tracking framework always involves estimation of the target state. For example, multiple hypothesis tracking (MHT) methods [6] compute both the probability associated with each measurement-to-target association hypothesis and the posterior probability of the target states given the measurements available up to the current time (and a given data hypothesis). The MHT method then computes the target state posterior density by multiplying these two probabilities and summing over all possible association hypotheses. As time progresses, the number of hypotheses grows exponentially, and so does the computational complexity. The joint probabilistic data association (JPDA) method [6] is a subclass of MHT that prunes the association hypotheses by clustering the target states. JPDA is typically defined for systems with linear and Gaussian dynamics. When the system involves non-Gaussian probability distributions and non-linear target state dynamics, the particle filter (PF) provides a solution for MHT [1]. However, the computation becomes formidable in the PF context when the number of targets is large, and there is no straightforward way to add or remove targets dynamically.
The reversible jump MCMC method [2] provides an avenue to compute the posterior target state density in such cases with a variable number of targets. Even so, the sequential nature of the problem, together with the varying number of targets, renders reversible jump MCMC computation nearly intractable, because samples must be stored for different numbers of targets. These factors have led us to rethink the choice of tracking framework for cell motility analysis. Our proposed tracking framework has the following characteristics: (1) we formulate the problem in a sequential Bayesian framework that does not involve target state estimation; (2) we simultaneously estimate the track-map and refine the available target detection results; (3) our formulation completely bypasses intensive reversible jump MCMC computation, yet handles a variable number of targets in a sound probabilistic framework; and (4) the implementation uses an MCMC sampler, a hybrid of Gibbs sampling and the Metropolis-Hastings (MH) algorithm, that efficiently creates samples of one-to-one track-maps.

2. PROPOSED MTT FORMULATION

The proposed MTT formulation assumes that all targets are detected, but that the detection also produces false positives. The proposed tracking framework simultaneously refines the set of crudely detected targets to eliminate false positives and estimates the track-map.

2.1 Simultaneous Track-map Estimation and Detection Refinement

Let $d_t$ and $d_{t-1}$ be the sets of detected targets on frames t and t-1, respectively. We now define $f_t$, the track-map, as follows:

$$ f_t : d_{t-1} \rightarrow d_t \cup \{o\}, \qquad (1) $$

such that the following restricted mapping is one-to-one:

$$ f_t^{\mathrm{restricted}} : \{ e : e \in d_{t-1} \text{ and } f_t(e) \neq o \} \rightarrow d_t . \qquad (2) $$

We denote a null element by o; $f_t(e) = o$ means that the target e in set $d_{t-1}$ does not find its match in set $d_t$. We also define the following mapping for the refinement of detection:

$$ l_t : d_t \rightarrow \{0, 1\}, \qquad (3) $$

where 0 denotes "not a target" (a false positive) and 1 denotes a target (a true positive). We are interested in a sequential Bayesian maximum a posteriori (MAP) estimation of $(l_t, f_t)$ from the joint density $p(l_t, f_t \mid Z_{1:t})$, where $Z_{1:t}$ denotes all the observations (measurements) accumulated up to frame t. The posterior density can be written as:

$$ p(l_t, f_t \mid Z_{1:t}) \propto p(Z_t \mid l_t, f_t) \sum_{l_{t-1}} \sum_{f_{t-1}} p(l_t, f_t \mid l_{t-1}, f_{t-1}) \, p(l_{t-1}, f_{t-1} \mid Z_{1:t-1}). \qquad (4) $$

Given $f_{t-1}$, we can uniquely construct a "backward" track-map $g_{t-1} : d_{t-1} \rightarrow d_{t-2} \cup \{o\}$ as follows:

$$ g_{t-1}(a) = \begin{cases} b, & \text{if } b \in d_{t-2} \text{ and } f_{t-1}(b) = a, \\ o, & \text{otherwise.} \end{cases} \qquad (5) $$

Similarly, given $g_t$ (or $g_{t-1}$), we can uniquely define $f_t$ (or $f_{t-1}$) as well. We choose the following density as the state evolution dynamics for the system:

$$ p(l_t, f_t \mid l_{t-1}, f_{t-1}) \equiv p(l_t, f_t \mid l_{t-1}, g_{t-1}); \qquad (6) $$

then (4) can be rewritten as:

$$ p(l_t, f_t \mid Z_{1:t}) \propto p(Z_t \mid l_t, f_t) \sum_{l_{t-1}} \sum_{g_{t-1}} p(l_t, f_t \mid l_{t-1}, g_{t-1}) \, p(l_{t-1}, g_{t-1} \mid Z_{1:t-1}). \qquad (7) $$

If we now approximate the density $p(l_{t-1}, g_{t-1} \mid Z_{1:t-1})$ by a set of samples $\{l_{t-1}^m, g_{t-1}^m\}_{m=1}^{M}$, then (7) can be expressed as follows:

$$ p(l_t, f_t \mid Z_{1:t}) \propto \sum_{m=1}^{M} p(Z_t \mid l_t, f_t) \, p(l_t, f_t \mid l_{t-1} = l_{t-1}^m, g_{t-1} = g_{t-1}^m). \qquad (8) $$

Our aim is to generate samples $\{l_t^m, f_t^m\}_{m=1}^{M}$ from the mixture density on the right-hand side of (8) in order to represent the posterior density $p(l_t, f_t \mid Z_{1:t})$. Once the samples are obtained, we can estimate the track-map and the refined detection. To carry the recursion (8) forward to the next frame (t+1), we construct the samples $\{l_t^m, g_t^m\}_{m=1}^{M}$ from the samples $\{l_t^m, f_t^m\}_{m=1}^{M}$. This is straightforward, because the samples of $f_t$ uniquely determine those of $g_t$ and vice versa. The following MCMC algorithm generates samples $\{l_t^m, g_t^m\}_{m=1}^{M}$ given the set of samples $\{l_{t-1}^m, g_{t-1}^m\}_{m=1}^{M}$:

    {l_t^m, g_t^m}_{m=1}^M = MCMC[{l_{t-1}^m, g_{t-1}^m}_{m=1}^M]
    for m = 1 : M
        Choose u with uniform distribution in {1, ..., M}
        Generate l_t^m and f_t^m from p(Z_t | l_t, f_t) p(l_t, f_t | l_{t-1} = l_{t-1}^u, g_{t-1} = g_{t-1}^u)
        Construct g_t^m from f_t^m
    end

Before we elaborate on the second step inside the loop of MCMC, we need some notation for multiple targets: $l_t^m = \{l_{t,n}^m\}_{n=1}^{|d_t|}$, $g_t^m = \{g_{t,n}^m\}_{n=1}^{|d_t|}$, $f_t^m = \{f_{t,n}^m\}_{n=1}^{|d_{t-1}|}$, and likewise for the other quantities, where $|d_t|$ and $|d_{t-1}|$ are the numbers of initial detections on frames t and t-1, respectively.

A hybrid of the Gibbs and MH algorithms generates $l_t^m$ and $f_t^m$ from $p(Z_t \mid l_t, f_t) \, p(l_t, f_t \mid l_{t-1} = l_{t-1}^u, g_{t-1} = g_{t-1}^u)$, where the second density factors as

$$ p(l_t \mid f_t, l_{t-1} = l_{t-1}^u, g_{t-1} = g_{t-1}^u) \, p(f_t \mid l_{t-1} = l_{t-1}^u, g_{t-1} = g_{t-1}^u). $$

To implement this sampling algorithm, we use the following form for the conditional density:

$$ p(l_t \mid f_t, l_{t-1}, g_{t-1}) = p(l_t \mid f_t) \propto \min\Big( \sum_{n=1}^{|d_t|} l_{t,n}, \; \sum_{k=1}^{|d_{t-1}|} 1(f_{t,k} \neq o) \Big), \qquad (9) $$

where 1(.) is the indicator function. We also assume that the measurement is independent of the track-map and that it factors over the targets, so:

$$ p(Z_t \mid l_t, f_t) = p(Z_t \mid l_t) = \prod_{n=1}^{|d_t|} p(Z_t \mid l_{t,n}). \qquad (10) $$

While generating the m-th sample, a new proposal value for the n-th detected cell is accepted or rejected according to the MH ratio

$$ \frac{ \min\Big( \sum_{i<n} l_{t,i}^m + {l_{t,n}^{m-1}}' + \sum_{i>n} l_{t,i}^{m-1}, \; \sum_{k=1}^{|d_{t-1}|} 1(f_{t,k} \neq o) \Big) \, p\big(Z_t \mid l_{t,n} = {l_{t,n}^{m-1}}'\big) }{ \min\Big( \sum_{i<n} l_{t,i}^m + \sum_{i \geq n} l_{t,i}^{m-1}, \; \sum_{k=1}^{|d_{t-1}|} 1(f_{t,k} \neq o) \Big) \, p\big(Z_t \mid l_{t,n} = l_{t,n}^{m-1}\big) }, \qquad (11) $$

where ${l_{t,n}^{m-1}}'$ is the logical complement of $l_{t,n}^{m-1}$. The proposal ${l_{t,n}^{m-1}}'$ is symmetric and deterministic.
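The single-site MH update of the detection labels in (9) through (11) can be sketched in Python as below. This is a minimal sketch, not the authors' Matlab implementation. The function name `mh_label_sweep`, the log-domain bookkeeping, and the treatment of the case where the prior (9) is zero for both the current and the proposed labeling (the sketch then uses the likelihood ratio alone) are assumptions made for this illustration. The track-map sample $f_t$ enters only through the number of its non-null entries.

```python
import math
import random


def mh_label_sweep(labels, n_matched, loglik_false, loglik_true, rng=random):
    """One Metropolis-Hastings sweep over the detection labels l_t (sketch).

    labels       -- 0/1 labels of the previous sample, l_t^{m-1}
    n_matched    -- number of non-null entries of the current track-map,
                    i.e. sum_k 1(f_{t,k} != o)
    loglik_false -- log p(Z_t | l_{t,n} = 0) for every detection n
    loglik_true  -- log p(Z_t | l_{t,n} = 1) for every detection n
    rng          -- any object with a random() method (default: the random module)
    Returns the new labels l_t^m.
    """
    labels = list(labels)  # entries before n are already updated (l^m), the rest are l^{m-1}
    n_true = sum(labels)
    for n in range(len(labels)):
        old = labels[n]
        new = 1 - old  # deterministic, symmetric proposal: the logical complement
        # Prior (9): min(number of labelled targets, number of matched targets).
        prior_old = min(n_true, n_matched)
        prior_new = min(n_true - old + new, n_matched)
        # The likelihood (10) factors over detections, so only detection n changes.
        ll_old = loglik_true[n] if old == 1 else loglik_false[n]
        ll_new = loglik_true[n] if new == 1 else loglik_false[n]
        if prior_old == 0 and prior_new == 0:
            # Assumption (not from the paper): the prior is flat here, use the likelihood alone.
            log_ratio = ll_new - ll_old
        elif prior_new == 0:
            log_ratio = -math.inf
        elif prior_old == 0:
            log_ratio = math.inf
        else:
            log_ratio = math.log(prior_new / prior_old) + ll_new - ll_old
        # Accept with probability min(1, ratio), the MH ratio (11).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            labels[n] = new
            n_true += new - old
    return labels
```

Updating `labels` in place within the sweep reproduces the mix of already-updated entries ($i < n$) and previous-sample entries ($i > n$) that appears in (11). Passing a seeded `random.Random` instance as `rng` makes a run reproducible.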
Next, we generate samples for the track-map $f_t$ from the conditional distribution $p(f_t \mid l_t, l_{t-1}, g_{t-1})$ as follows:

    f_t^m = GEN_TRACKMAP[l_t, l_{t-1}, g_{t-1}]
    Choose u in {1, ..., M} with uniform distribution
    {j_1, ..., j_{|d_{t-1}|}} = RandomPermutation[{1, ..., |d_{t-1}|}]
    S = {k : l_t(k) = 1}
    for n = 1 : |d_{t-1}|
        if l_{t-1, j_n}^u = 1
            if S is empty
                f_{t, j_n}^m = o
            else
                P = {h(k, j_n, g_{t-1}(j_n)) : k in S}
                Choose f_{t, j_n}^m in S with probability P
                S = S \ {f_{t, j_n}^m}
            end
        else
            f_{t, j_n}^m = o
        end
    end

Here h(k, j, i) is the so-called "motion model". Its three arguments k, j and i represent, respectively, the coordinates $(x_k, y_k)$, $(x_j, y_j)$ and $(x_i, y_i)$ of three cell centers on frames t, t-1 and t-2. The function h(.) can have the following form:

$$ h(k, j, i) = \exp\!\Big( -\frac{ \big( \arg((x_k,y_k),(x_j,y_j)) - \arg((x_j,y_j),(x_i,y_i)) \big)^2 }{2\sigma_2^2} \Big) \exp\!\Big( -\frac{ \big( \mathrm{dist}((x_k,y_k),(x_j,y_j)) - \mathrm{dist}((x_j,y_j),(x_i,y_i)) \big)^2 }{2\sigma_1^2} \Big), \qquad (12) $$

where "dist" represents the Euclidean distance between two points and "arg" represents the signed angle of the displacement between two points, so that the difference of the two "arg" terms is the signed angle between successive displacements. Motion model (12) thus tries to preserve the direction and the speed of a target. $\sigma_1$ and $\sigma_2$ are the standard deviations of the Gaussian factors. When i in the argument of h(.) is the null element o, we may define h(.) as:

$$ h(k, j, o) = \exp\!\Big( -\frac{ \big( \mathrm{dist}((x_k,y_k),(x_j,y_j)) - d \big)^2 }{2\sigma_1^2} \Big) \exp\!\Big( -\frac{ \big( \arg((x_k,y_k),(x_j,y_j)) - \theta \big)^2 }{2\sigma_2^2} \Big), \qquad (13) $$

where d and $\theta$ are user-defined values. The GEN_TRACKMAP algorithm creates a track-map sample whose restriction (2) is one-to-one, because each cell assigned from S is removed from S before the next assignment. To speed up the computation for a large number of targets, "gating" may be used by suitably defining (12); for example, h(.) may be set to zero when "dist" exceeds a certain value.

After generating the samples for the track-map and the detection refinement function, we estimate $f_t$ and $l_t$ as follows:

$$ \hat{f}_t = \{\hat{f}_{t,n}\}_{n=1}^{|d_{t-1}|} = \{ \mathrm{mode}[\{f_{t,n}^m\}_{m=1}^{M}] \}_{n=1}^{|d_{t-1}|}, \qquad (14) $$

and

$$ \hat{l}_t = \{\hat{l}_{t,n}\}_{n=1}^{|d_t|} = \{ \mathrm{mode}[\{l_{t,n}^m\}_{m=1}^{M}] \}_{n=1}^{|d_t|}. \qquad (15) $$

The operator "mode" selects the most frequent sample value. When the mode is not unique, we choose one of the tied values uniformly at random. Because MCMC samples obey the law of large numbers [7], estimates such as (14) and (15) are justified.

2.2 Occlusions

Occlusions are common in video microscopy: a cell visible (detected) in one frame becomes invisible (undetected) in the next frame, and it may remain invisible for a number of consecutive frames. The proposed tracking framework accommodates occlusions simply by deferring the track-map decisions for those cells that do not find a match. To achieve this deferral, we redefine the track-map as

$$ f_t : (d_{t-1} \cup r_{t-1}) \rightarrow (d_t \cup \{o\}), \qquad (16) $$

and, as before, the restriction of $f_t$,

$$ f_t^{\mathrm{restricted}} : \{ e : e \in d_{t-1} \cup r_{t-1} \text{ and } f_t(e) \neq o \} \rightarrow d_t, \qquad (17) $$

is one-to-one. The detection refinement becomes

$$ l_t : (d_t \cup r_t) \rightarrow \{0, 1\}. \qquad (18) $$

The "backward" track-map is defined as

$$ g_t : d_t \rightarrow (d_{t-1} \cup r_{t-1} \cup \{o\}), \qquad (19) $$

and the set $r_t$ is defined as

$$ r_t = \{ e : e \in (d_{t-1} \cup r_{t-1}) \text{ and } \mathrm{mode}[\{f_{t,e}^m\}_{m=1}^{M}] = o \}. \qquad (20) $$

The set $r_t$ acts as an accumulator for those cells that cannot be matched with cells in the immediately following frame. The same algorithms, MCMC and GEN_TRACKMAP, apply in this case as well. It is also possible to purge cells that are, say, k frames old from the accumulator by defining $r_t$ as

$$ r_t = \{ e : e \in (d_{t-1} \cup r_{t-1}) \text{ and } \mathrm{mode}[\{f_{t,e}^m\}_{m=1}^{M}] = o \} \setminus (d_{t-k} \cup r_{t-k}). \qquad (21) $$

3. RESULTS AND DISCUSSION

The proposed algorithm has been tested on three types of cell video sequences: (1) human monocytes observed in an in vitro assay, where the cells roll on human P-selectin; (2) in vitro microbubble data, where ultrasound contrast microbubbles roll in a flow chamber under bright-field illumination and their adhesion is the property under investigation; and (3) in vivo natural killer T (NKT) cell data, where NKT cells migrate along liver sinusoids and the average velocity and maximal brightness are of specific importance. Each test sequence contains 150 frames acquired at 30 Hz. Sequences (1), (2) and (3) have on average 35, 20 and 25 cells, respectively, inside the tracking region in each frame. We use the following measurement models:

$$ p(Z_t \mid l_{t,n} = 0) \propto \exp\!\Big( -\frac{(I_t(x_n, y_n) - \mu_b)^2}{2\sigma_b^2} \Big), \qquad p(Z_t \mid l_{t,n} = 1) \propto \exp\!\Big( -\frac{(I_t(x_n, y_n) - \mu_f)^2}{2\sigma_f^2} \Big), \qquad (22) $$

where $\mu_b$ and $\mu_f$ are the means and $\sigma_b$ and $\sigma_f$ are the standard deviations of the background and the foreground (cell) image intensity, respectively. $I_t(x_n, y_n)$ denotes the average image intensity within a circle (cell shape) centered at $(x_n, y_n)$ on frame t.

The proposed MTT method, with sample sizes (M) of 10, 100, 1000 and 5000, is applied to each type of data. A Matlab implementation on a 2.4 GHz Pentium 4 PC with 1 GB RAM took on average 1.8 s, 4.2 s, 31.0 s and 148.4 s per frame to track the monocyte sequence with 10, 100, 1000 and 5000 samples, respectively. The Table summarizes the tracking performance. As performance measures, it reports the numbers of false positives (FP, incorrectly detected targets), false negatives (FN, missed targets) and correspondence errors (CE). A CE is an incorrect correspondence between cells in two consecutive frames. The total error is the sum of the three types of errors, and the error rate is the ratio of the number of errors to the number of cells tracked per frame.
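The measurement model (22) can be illustrated with the short sketch below, which computes the average intensity $I_t(x_n, y_n)$ inside a disk and the corresponding log-likelihoods for $l_{t,n} = 0$ and $l_{t,n} = 1$. The helper names, the list-of-rows image representation and the fixed disk radius are assumptions made for this example. The paper specifies only a circle matching the cell shape.

```python
import math


def disk_mean_intensity(image, x, y, radius):
    """Average intensity I_t(x, y) over pixels within `radius` of (x, y).

    `image` is a list of rows (image[row][col]); x is the column, y the row.
    """
    total, count = 0.0, 0
    r2 = radius * radius
    row_lo, row_hi = max(0, math.floor(y - radius)), min(len(image), math.floor(y + radius) + 1)
    col_lo, col_hi = max(0, math.floor(x - radius)), min(len(image[0]), math.floor(x + radius) + 1)
    for row in range(row_lo, row_hi):
        for col in range(col_lo, col_hi):
            if (col - x) ** 2 + (row - y) ** 2 <= r2:  # pixel centre inside the disk
                total += image[row][col]
                count += 1
    if count == 0:
        raise ValueError("disk does not overlap the image")
    return total / count


def label_log_likelihoods(image, centers, radius, mu_b, sigma_b, mu_f, sigma_f):
    """Unnormalized log p(Z_t | l_{t,n} = 0) and log p(Z_t | l_{t,n} = 1), as in (22)."""
    loglik_false, loglik_true = [], []
    for x, y in centers:
        intensity = disk_mean_intensity(image, x, y, radius)
        loglik_false.append(-((intensity - mu_b) ** 2) / (2.0 * sigma_b ** 2))  # background
        loglik_true.append(-((intensity - mu_f) ** 2) / (2.0 * sigma_f ** 2))   # cell
    return loglik_false, loglik_true
```

The sketch keeps only the exponential terms of (22). If $\sigma_b \neq \sigma_f$, including the Gaussian normalizers would shift the log-likelihood difference between the two labels by $\log(\sigma_b / \sigma_f)$. The two returned lists are the per-detection factors that enter (10) and the MH ratio (11).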
Figure 1 shows two consecutive frames of the tracking display for the human monocyte sequence. Figure 2 shows the displacement of each monocyte along the direction of blood flow (assumed horizontal in the video) in the flow chamber over the entire sequence.

In future work, we wish to extend the proposed algorithm to general multi-target tracking problems, where targets can be dynamically added to and deleted from the set $d_t$ of currently detected targets. We hope to use the powerful tools from the theory of marked point processes [5] for this purpose. We also plan to perform detailed performance and computational comparisons with popular MTT techniques such as JPDA and MHT.

REFERENCES

[1] A. Doucet, B. Vo, and C. Andrieu, "Particle filtering for multi-target tracking and sensor management," Proc. Int. Conf. on Information Fusion, Annapolis, MD, 2002.
[2] P. J. Green, "Reversible jump Markov chain Monte Carlo computation and Bayesian model determination," Biometrika, vol. 82, pp. 711-732, 1995.
[3] M. A. Mackey and F. Ianzini, "Development of the large-scale digital cell analysis system," Radiation Protection Dosimetry, vol. 99, pp. 289-293, 2002.
[4] K. Ley and D. Vestweber, Eds., The Selectins: Initiators of Leukocyte Endothelial Adhesion. Amsterdam, The Netherlands: Harwood, 1997, pp. 63-104.
[5] J. Møller and R. P. Waagepetersen, Statistical Inference and Simulation for Spatial Point Processes. Boca Raton, FL: Chapman and Hall/CRC, 2004.
[6] L. D. Stone, C. A. Barlow, and T. L. Corwin, Bayesian Multiple Target Tracking. Boston, MA: Artech House, 1999.
[7] L. Tierney, "Markov chains for exploring posterior distributions," Ann. Statist., vol. 22, pp. 1701-1786, 1994.

Figure 1. Tracking display for two consecutive frames. The bounding box is 50 x 100 pixels.

Figure 2. Cell displacement computed by the MTT method.

Table. Tracker performance on the three types of test sequences.
FP: false positive; FN: false negative; CE: correspondence error.