Proceedings of the 7th Annual ISC Graduate Research Symposium
ISC-GRS 2013
April 24, 2013, Rolla, Missouri

FAST MOTION FLY TRACKING BY ADAPTIVE-LBP AND CASCADED DATA ASSOCIATION

Mingzhong Li, Zhaozheng Yin, Ruwen Qin
Department of Engineering Management, Department of Computer Science
Missouri University of Science and Technology, Rolla, MO 65409
ml424@mst.edu, yinz@mst.edu, qinr@mst.edu

ABSTRACT

Learning the behavior patterns of fruit flies can inform us about the molecular mechanisms and biochemical pathways that drive human behavior, because fly behaviors reflect motivations analogous to those of humans. We built a glass chamber to house flies and recorded their behaviors in time-lapse videos. To address several challenges in data analysis, such as low image contrast, small object size and fast object motion, we propose an adaptive Local Binary Pattern feature to detect flies and develop a cascaded data association approach with fine-to-coarse gating region control to track flies in the spatial-temporal domain. Our approach is validated on two long video sequences, achieving Multiple Object Tracking Accuracy of 0.973 and 0.984 and handling fast motion well, which shows its potential to enable automatic characterization of biological processes.

Index Terms: Multi-object tracking, adaptive LBP feature, fast motion, cascaded data association

1. INTRODUCTION

Behavioral analysis of model organisms can inform us about the molecular mechanisms and biochemical pathways that drive human behavior. Specifically, studying the fruit fly offers distinct advantages over studying humans, including sophisticated behaviors that mimic human motivations, short generation times, and the ability to mutate and alter genes to study the role of specific genes. We have established a novel behavioral paradigm in which flies are housed in a 7in x 7in x 1.5in open field with water and food provided (Fig.1). Within the glass chamber, we diffuse and change the light to simulate the day/night transition and control the temperature and air pressure to simulate different weather conditions. Flies are free to move anywhere they choose, and may walk, fly, and interact with other males and females. These behaviors depend on the positions of the flies relative to one another, but it is difficult to assess them manually over months, which motivates us to develop an automated multiple object tracking approach to quantitatively analyze the behaviors of flies in time-lapse images.

Figure.1. Flies in the chamber

There are three main challenges for our fly tracking problem: (1) the contrast between the flies and their surrounding background is low (Fig.1.1-1.3), making automatic object detection hard; (2) the size of a fly (around 3-6 pixels in Fig.1) is small and the appearances of flies are nearly indistinguishable from each other, so we cannot extract rich feature descriptors to build distinctive object models, making appearance-based object tracking methods [9] unsuitable here; (3) the flies can fly at a maximum speed of 1.7 meters/second [4], or about 30 pixels/frame in videos captured by a 120fps video camera with a resolution of 480x848 pixels. The motion blur caused by fast motion (Fig.1.4) makes object detection hard. Furthermore, the displacement of a fast-moving fly between two consecutive frames is 5 to 10 times its object size, challenging data-association-based tracking methods that rely on good object detection performance and continuous motion [1, 7].
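As a rough check of the displacement figures above, the following sketch converts the maximum flight speed into pixels per frame. It assumes that the 7 in side of the chamber spans the full 480-pixel image height (Sec. 2.2.2 gives 240 pixels as half the height of the chamber); the constant and function names are illustrative and not part of the original system.

```python
# Rough per-frame displacement of a fly at top speed.
# The pixel scale below is an assumption, not a value stated in the paper.
MAX_SPEED_M_PER_S = 1.7        # maximum flight speed quoted from [4]
FRAME_RATE_FPS = 120           # camera frame rate
CHAMBER_SIDE_M = 7 * 0.0254    # 7 in chamber side, converted to meters
CHAMBER_SIDE_PX = 480          # assumed: the side spans the 480-pixel image height


def displacement_px_per_frame(speed=MAX_SPEED_M_PER_S, fps=FRAME_RATE_FPS,
                              side_m=CHAMBER_SIDE_M, side_px=CHAMBER_SIDE_PX):
    """Convert a speed in meters/second into pixels travelled per frame."""
    meters_per_frame = speed / fps
    pixels_per_meter = side_px / side_m
    return meters_per_frame * pixels_per_meter


if __name__ == "__main__":
    print(round(displacement_px_per_frame(), 1))
```

Under this assumed scale the result is roughly 38 pixels per frame, i.e., about 6 to 13 times the 3-6 pixel size of a fly, which is the same order as the displacement quoted above.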
In this paper, we address these challenges with the following contributions: (1) An Adaptive Local Binary Pattern (ALBP) feature is developed to classify pixels into objects and background, addressing the difficulties of fly detection caused by low image contrast, lighting fluctuation and motion blur; (2) A cascaded data association approach is proposed to match objects between consecutive images, link short trajectories within image subsequences, and connect long trajectories between image subsequences; (3) A fine-to-coarse mechanism to control gating regions in the spatial-temporal domain is proposed to effectively remove false positive trajectories and link broken trajectories caused by fast motion.

2. METHODOLOGY

2.1. Introducing ALBP Feature for Fly Detection

Because the background varies with the light transition over time and with light diffusion and reflection on the glass, it is not easy to build and update an accurate background model to detect flies by background subtraction. Simply thresholding the images in Fig.2(b) by the Otsu method [6] does not work well either, as shown in Fig.2(c). To distinguish the small contrast between flies and their surrounding background, we explore the Local Binary Pattern (LBP, [5]) feature that characterizes the local spatial structure of the image texture. Given the center pixel $I_c$, a binary code is computed by comparing $I_c$ with its neighboring pixels $I_n$:

$$ \mathrm{LBP} = \sum_{n=0}^{N-1} s(I_n - I_c)\, 2^n \tag{1} $$

where $s(x)$ is a step function, i.e., $s(x) = 0$ if $x < 0$ and $s(x) = 1$ otherwise, and $N$ is the number of neighbors (e.g., $N = 8$ for a 3x3 neighborhood). Due to the fluctuation of intensity values, different flies in an image may exhibit different LBP features. We train and apply a Support Vector Machine (SVM) classifier on the LBP features to classify image pixels into flies and background. However, classification using the LBP feature does not generate good results, as shown in Fig.2(d).

To increase the robustness to intensity fluctuation, we introduce a threshold $T$ into the step function in Eq.1, i.e., $s(x) = 0$ if $x < T$ and $s(x) = 1$ otherwise. When the same $T$ is used to compute the thresholded LBP for all image pixels, the classification does not work well, as shown in Fig.2(e). This is because the contrast between flies and their background varies in different regions of an image. Therefore, we propose an Adaptive LBP (ALBP) feature by adapting the threshold $T$ at different locations $(x, y)$:

$$ T(x,y) = \begin{cases} T_L, & \text{if } \mu(x,y) \le \mu_L, \\ T_H, & \text{if } \mu(x,y) \ge \mu_H, \\ \dfrac{T_H - T_L}{\mu_H - \mu_L}\,\bigl(\mu(x,y) - \mu_L\bigr) + T_L, & \text{otherwise,} \end{cases} \tag{2} $$

where $\mu(x,y)$ computes the mean intensity value within a patch around $(x,y)$ (e.g., the patch size is 11x11 in this paper). The parameters $(\mu_H, \mu_L, T_H, T_L)$ in Eq.2 are learned from a training set of fly pixels $\{I_i\}$ with their corresponding $\{\mu_i\}$: $\mu_H = \max_i \mu_i$ and $T_H = I_{i^*} - \mu_H$, where $i^* = \arg\max_i \mu_i$. Similarly, we define $\mu_L$ and $T_L$.

We train and apply an SVM classifier on the ALBP feature to classify pixels into flies and background, as shown in Fig.2(f), which outperforms the results of Otsu thresholding, the original LBP and the constant-threshold LBP. Finally, nearby fly pixels are grouped into fly objects. The detected flies corresponding to Fig.2(a) are shown in Fig.2(g).

Figure.2. Fly detection. (a) Input image; (b) Zoom-in details of four sub-images in (a); (c) Segmentation by Otsu thresholding (white and black denote fly and background pixels, respectively); (d) Classify each pixel into fly or background by its LBP feature; (e) Classify pixels by constant thresholded LBP features; (f) Classify pixels by our adaptive LBP feature; (g) Detected flies by grouping nearby classified fly pixels.

2.2.
Cascaded Data Association System with Fine-to-Coarse Gating Region Control

A cascaded data association approach consisting of four steps is proposed to link tracklets at growing levels of discontinuity, as schematically illustrated in Fig.3:

(1) The video sequence is divided into many subsequences (Fig.3(a)). Each subsequence consists of ten thousand images in this paper. There are two reasons to apply this divide-and-conquer technique: (a) it reduces a large-scale optimization problem to many small-scale solvable optimization problems, and (b) it enables fast parallel processing of the subsequences. For example, monitoring flies from their birth to death lasts four months. About 1.2 billion images will be recorded at 120fps, which would result in billions of tracklets (short trajectories) and make it infeasible on a normal workstation to optimally link all the tracklets in a single optimization problem with billions of variables.
(2) Detected flies in consecutive frames are matched into tracklets. Due to camouflage and fast motion, some flies are not detected in the current image and are re-detected in later images, resulting in broken tracklets in the spatio-temporal domain (Fig.3(b)).
(3) The short tracklets within each subsequence are linked into long trajectories by iteratively solving a Linear Assignment Problem (LAP) with fine-to-coarse gating region control in the spatio-temporal domain (Fig.3(c)).
(4) The long trajectories of all subsequences are sequentially connected by solving a series of LAPs to form the complete trajectories of all flies (Fig.3(d)).

Figure.3. The scheme of our cascaded data association approach for object tracking.

2.2.1. Matching Flies between Frames into Tracklets

We leverage the Hungarian algorithm [3] to match flies between consecutive frames and generate tracklets. The cost function used in the Hungarian algorithm is defined as:

$$ c(F_i^t, F_j^{t-1}) = \bigl\| L(F_i^t) - L(F_j^{t-1}) \bigr\| + \bigl\| L(F_i^t) - \hat{L}(F_j^{t|t-1}) \bigr\| \tag{3} $$

where $F_i^t$ and $F_j^{t-1}$ represent the ith fly in frame $t$ and the jth fly in frame $t-1$, respectively. Function $L(x)$ retrieves the location of fly $x$, and $\|\cdot\|$ is the L2 norm. $\hat{L}(F_j^{t|t-1})$ is the predicted location of the jth fly in frame $t$ based on its previous trajectory. We use a linear motion model for the prediction. After sequentially performing the matching on every pair of consecutive frames, we generate many tracklets within a subsequence (Fig.3(b)).

2.2.2. Associate Tracklets within Each Subsequence

Denote the ith tracklet by $T_i = \{l_i^{s_i}, l_i^{s_i+1}, \ldots, l_i^{e_i}\}$, where $l_i^{s_i}$ and $l_i^{e_i}$ represent the head and tail locations of the tracklet in frames $s_i$ and $e_i$, respectively. There are three possible associations for tracklets within a subsequence:

(1) Linking. Two tracklets generated by the same fly are broken in the matching step (Sec.2.2.1) due to missed detection. The cost to link the tail of $T_i$ with the head of $T_j$ is

$$ c(T_i \to T_j) = \frac{\| l_i^{e_i} - l_j^{s_j} \|}{\tau_s} + \frac{| e_i - s_j |}{\tau_t} + \bigl| \theta_i^{e_i} - \theta_j^{s_j} \bigr|, \quad \text{when } \| l_i^{e_i} - l_j^{s_j} \| < \tau_s \text{ and } 0 < s_j - e_i < \tau_t \tag{4} $$

where $\tau_s$ and $\tau_t$ denote the gating regions in the spatial and temporal domains, respectively, and $\theta_i^{e_i}$ and $\theta_j^{s_j}$ denote the trajectory orientations at the tail of $T_i$ and the head of $T_j$, respectively.
(2) Disappearing. The tail of a tracklet is not linked to any other tracklet. The cost for a tracklet disappearance, $c(T_i \to \emptyset)$, is defined in terms of the linking costs $c(T_i \to T_j)$ over the other tracklets $T_j$ (Eq.5).
(3) Appearing. The head of a tracklet is not linked to any other tracklet. The cost for a tracklet appearance, $c(\emptyset \to T_j)$, is defined analogously in terms of the linking costs $c(T_i \to T_j)$ over the other tracklets $T_i$ (Eq.6).

The actual associations within a subsequence are determined by solving a Linear Assignment Problem (LAP):

$$ \arg\min_{a}\; c^{T} a, \quad \text{s.t. } Q^{T} a = \mathbf{1} \tag{7} $$

where $a$ is a binary vector whose elements indicate which associations are selected in the optimal solution, $c$ is the vector of the corresponding association costs, and $Q$ is a binary matrix in which the nonzero elements of each row indicate which tracklets are involved in that association. The constraint ensures that each tracklet appears in only one association in the optimal solution.

We propose a fine-to-coarse algorithm (Fig.3(c)) to increase the gating regions gradually and iteratively associate tracklets within a subsequence:

Algorithm: Iterative Tracklet Association
Initialization: $k = 0$, $\tau_t^{(0)} = 120$;
Repeat
    Solve the LAP problem in Eq.7;
    $k \leftarrow k + 1$;
    $\tau_t^{(k)} = q\, \tau_t^{(k-1)}$;
until no change happens to the association.

Here $q$ controls the increasing rate of the gating region (e.g., $q = 2$ in this paper). $\tau_s$ is set as a constant in this paper (240 pixels, half the height of the chamber). The iterative algorithm resolves the most confident associations first and then gradually resolves the less confident ones, which helps to handle fast motion. As a comparison, Fig.3(e) shows an example in which the tracklets are associated incorrectly when the gating region is large and the LAP problem is solved only once.

2.2.3. Generate Long Trajectories

Trajectory association between consecutive subsequences is solved by formulating a LAP similar to that of the tracklet association.

3. EXPERIMENTS

In this section we first introduce the evaluation metrics used to validate the reliability of our tracking system. Second, we demonstrate the performance of our approach based on these metrics. Finally, we present biological statistics that test our hypothesis and further support the reliability of our approach.

3.1. Datasets

Two videos captured by a GoPro Hero2 camera with a resolution of 480x848 and a frame rate of 120fps were used as our experimental dataset. Dataset 1 has 21000 frames on 52 flies and dataset 2 has 220000 frames on 82 flies. In Video #2, the air pressure inside the chamber is changed at around 25 min in order to observe how the flies adjust their activity. The results show that our system produces tracking results that are accurate enough to reveal the change of activity patterns in the statistical data.

3.2.
Performance Evaluation Metrics

Three well-known metrics are adopted to evaluate the performance of our approach: (1) Tracker Purity (TP, [8]), the ratio of the number of frames in which a tracklet correctly follows the ground truth to the total number of frames of the tracklet; (2) Target Effectiveness (TE, [8]), the ratio of the number of frames in which the object is correctly tracked to the total number of frames in which the object exists; (3) Multiple Object Tracking Accuracy (MOTA, [2]), which accounts for the number of missed detections, false positives and ID switches.

3.3. Quantitative Evaluation

Figure.4 shows the TP of all computer-generated trajectories of dataset 1 and 2 in Fig.4(a)(b), respectively. We sort the TP values and show some examples of the trajectory evaluation in Fig.4(c)-(e), where the worst TP and OP are analyzed along with the best TP and OP. By comparing the 3D plots of these trajectories with their 2D footprints on the background image, we can tell that in most cases our approach tracks fast motion well, except for a few false positive cases. Some of them occur in areas where the chamber background has a color similar to that of the flies, so that flies are camouflaged around the joint edges of two glass surfaces (e.g., the false positives at the top of the chamber in Fig.4(c)). Such background regions are excluded from selection most of the time. However, when some of the flies move across those areas, the similar background features are treated as motion features at the lower level of the cascaded linking system and are thus forwarded to the next level. This may be solved by modifying the chamber material. Fast motion is detected and followed by the trackers with confidence, as shown by the yellow ellipses in Fig.4(c)-(e).

Figure.4. Evaluation of TP Metrics. (a) Tracker Purity evaluation for Video#1; (b) Tracker Purity evaluation for Video#2; (c)-(e) Tracker #1, #10 and #50 with their 2D trajectories and 3D plotting, demonstrating the segments of false positives and fast motion.

Figure.5 shows the TE evaluation of all ground truth targets (flies) in Video #1 and #2, in Fig.5(a)(b). In Fig.5(c)-(e) we plot the ground truth trajectories of targets #1, #10 and #50 as black lines, both in 3D and in 2D space. The colored shadows in red, green and blue around the ground truth trajectories represent the segments that are successfully tracked. From this analysis, we found that most missed detections happen in areas where the chamber structure creates low foreground-background contrast, such as the edges and corners. When the flies move along these areas, the similar background color hides them from detection. Fast motion detection also proves satisfactory under TE evaluation and backward checking, as shown by the yellow ellipses in Fig.5(c)-(e).

Figure.5. Evaluation of TE Metrics. (a) Target Effectiveness evaluation for Video#1; (b) Target Effectiveness evaluation for Video#2; (c)-(e) Target #1, #10 and #50 with their 2D trajectories and 3D plotting, demonstrating the segments of fast motion and missed detection.

Table.1. Quantitative Evaluation of our Approach

           #Frames   #Flies   TP       TE       MOTA
Video#1    20000     52       0.9831   0.9310   0.9728
Video#2    220000    82       0.9657   0.9275   0.9841

From Fig.4-5 we observe that in general our approach tracks the fast-moving flies very well. We summarize the quantitative evaluation of our approach on the two datasets in Table 1. We achieve Multiple Object Tracking Accuracy of 0.973 and 0.984 for dataset 1 and 2, respectively (the perfect MOTA is 1), showing the high performance of our approach. The overall performance on Video #1 is better than on Video #2 in terms of TP and TE. One of the main reasons is that more flies participate in the experiment in Video #2, which increases the occurrence of high-speed movements and the chances of camouflage.
Another reason is that interactions between flies are more frequent in Video #2 than in Video #1, which greatly increases the occurrence of body overlap (occlusion) and high-speed chasing (multiple highly blurred candidates), the most challenging cases for our system.

3.4. Biological Statistics

As stated at the beginning of this section, we manually lowered the pressure at around 25 min in Video #2, as shown in Fig.6(a). We hypothesize that the flies become more restless and are thus less likely to stay stationary. We apply our tracking approach to this video to test whether the tracking results reveal this behavior. To save computation, we ran the experiment only on the period from the 140000th frame (about 19 min 26 sec) to the end.

Figure.6. (a) the diagram of air pressure change; (b) the number of walking flies vs. time; (c) the distance covered by walking flies vs. time.

Figure.6(b) and (c) illustrate the activity patterns of walking flies. An abrupt rise after 25 min is noticeable in Fig.6(b), showing that a growing number of flies begin to walk from a stationary state when they sense the air pressure change. This matches our prediction. Fig.6(c) shows a growing trend after 25 min, with a peak at around 26 min 30 sec. This observation also matches our hypothesis. In future research, we will run experiments with more flies in the dataset to test different patterns of biological behavior and the related motion tracking statistics.

4. CONCLUSION AND FUTURE WORKS

We propose a novel adaptive LBP feature to detect tiny flies with low image contrast and develop a cascaded data association approach with fine-to-coarse gating region control to track fast-moving flies. The high performance of our approach shows its potential to enable automated analysis of fly behavior. In the near future, we plan to acquire high-resolution high-speed cameras to handle extremely fast motion and to redesign the chamber to reduce the effects of camouflage. Meanwhile, adaptive local frame rate selection based on the flies' activity level is an intriguing topic for future research, which could improve efficiency and thus lower the time cost of large-scale data analysis.

5. REFERENCES

[1] R. Bise et al., "Reliable Cell Tracking By Global Data Association," IEEE Intl. Symposium on Biomedical Imaging, 2011.
[2] R. Kasturi et al., "Framework for Performance Evaluation of Face, Text and Vehicle Detection and Tracking in Video: Data, Metrics, and Protocol," IEEE Trans. on PAMI, 31(2), pp. 319-336, 2009.
[3] H. W. Kuhn, "Variants of the Hungarian method for assignment problems," Naval Research Logistics Quarterly, 3:253-258, 1956.
[4] J. H. Marden et al., "Aerial Performance of Drosophila Melanogaster from Populations Selected for Upwind Flight Ability," J. of Experimental Biology, 200:2747-2755, 1997.
[5] T. Ojala et al., "Multiresolution gray-scale and rotation invariant texture classification with Local Binary Patterns," IEEE Trans. on PAMI, 24(7), pp. 971-987, 2002.
[6] N. Otsu, "A Threshold Selection Method from Gray-Level Histograms," IEEE Trans. on Systems, Man, and Cybernetics, 9(1):62-66, 1979.
[7] D. Padfield et al., "Coupled Minimum-Cost Flow Cell Tracking," International Conference on Information Processing in Medical Imaging (IPMI), 2009.
[8] K. Smith et al., "Evaluating Multi-Object Tracking," in Proc. of IEEE Conf. on CVPR Workshop on Empirical Evaluation Methods in Computer Vision, 2005.
[9] A. Yilmaz et al., "Object Tracking: a Survey," ACM Computing Surveys, 38(4), 45 pages, 2006.

6. ACKNOWLEDGEMENT

Special thanks to the Intelligent Systems Center for supporting me and the research presented in this paper.
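As an illustration of the detection feature of Sec. 2.1, the sketch below computes a thresholded LBP code for one pixel and the location-dependent threshold of Eq.2. It is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: the neighbor ordering, the border handling, and all function and parameter names (`adaptive_threshold`, `local_mean`, `thresholded_lbp`, `albp`, `mu_l`, `mu_h`, `t_l`, `t_h`) are hypothetical, and the learned parameters are simply passed in.

```python
# Offsets (dy, dx) of the N = 8 neighbors in a 3x3 neighborhood.
# The neighbor ordering (and hence the bit order) is an assumption.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def adaptive_threshold(mu, mu_l, mu_h, t_l, t_h):
    """Piecewise-linear threshold as a function of the local mean mu (Eq.2)."""
    if mu <= mu_l:
        return t_l
    if mu >= mu_h:
        return t_h
    return (t_h - t_l) / (mu_h - mu_l) * (mu - mu_l) + t_l


def local_mean(img, y, x, patch=11):
    """Mean intensity in a patch x patch window centered at (y, x).

    The window is clipped at the image border (an assumption).
    """
    half = patch // 2
    rows = range(max(0, y - half), min(len(img), y + half + 1))
    cols = range(max(0, x - half), min(len(img[0]), x + half + 1))
    values = [img[r][c] for r in rows for c in cols]
    return sum(values) / len(values)


def thresholded_lbp(img, y, x, thresh=0.0):
    """LBP code of pixel (y, x): bit n is set when I_n - I_c >= thresh.

    thresh = 0 gives the plain LBP of Eq.1. Neighbors outside the image are
    replaced by the nearest border pixel (an assumption).
    """
    h, w = len(img), len(img[0])
    center = img[y][x]
    code = 0
    for n, (dy, dx) in enumerate(NEIGHBORS):
        ny = min(max(y + dy, 0), h - 1)
        nx = min(max(x + dx, 0), w - 1)
        if img[ny][nx] - center >= thresh:
            code |= 1 << n
    return code


def albp(img, y, x, mu_l, mu_h, t_l, t_h, patch=11):
    """ALBP code: thresholded LBP with the location-dependent threshold of Eq.2."""
    t = adaptive_threshold(local_mean(img, y, x, patch), mu_l, mu_h, t_l, t_h)
    return thresholded_lbp(img, y, x, t)
```

In a full pipeline, the codes of all pixels would be fed to the SVM classifier; that step is outside the scope of this sketch.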
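The frame-to-frame matching of Sec. 2.2.1 can be sketched as follows. The cost combines the distance to a fly's previous location with the distance to its linearly predicted location. For self-containment, an exhaustive search over assignments stands in for the Hungarian algorithm; it returns the same minimum-cost matching but is only practical for a handful of flies, and a real system would more likely call a dedicated solver such as SciPy's `linear_sum_assignment`. The function names and the track representation (the last two locations of each fly) are assumptions made for this example.

```python
import math
from itertools import permutations


def _dist(a, b):
    """Euclidean (L2) distance between two 2D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def predict_linear(prev2, prev1):
    """Linear motion model: extrapolate one frame ahead from the last two locations."""
    return (2 * prev1[0] - prev2[0], 2 * prev1[1] - prev2[1])


def match_cost(curr, prev, predicted):
    """Distance to the previous location plus distance to the predicted location."""
    return _dist(curr, prev) + _dist(curr, predicted)


def match_frames(curr_flies, tracks):
    """Return {track index: detection index} with minimal total cost.

    tracks is a list of (location at t-2, location at t-1) pairs and
    curr_flies a list of detections at frame t. Exhaustive search replaces
    the Hungarian algorithm here and assumes at least as many detections
    as tracks.
    """
    costs = [[match_cost(c, t[1], predict_linear(t[0], t[1])) for c in curr_flies]
             for t in tracks]
    best = min(
        permutations(range(len(curr_flies)), len(tracks)),
        key=lambda perm: sum(costs[j][i] for j, i in enumerate(perm)),
    )
    return dict(enumerate(best))


if __name__ == "__main__":
    tracks = [((0, 0), (1, 0)), ((10, 10), (10, 9))]
    detections = [(10, 8), (2, 0)]
    print(match_frames(detections, tracks))
```

In the usage example, the first track moves right and the second moves up, so each detection is matched to the track whose prediction it coincides with.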
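The fine-to-coarse idea of Sec. 2.2.2 (start with a narrow temporal gate, commit the confident links, then widen the gate) can be sketched as below. This is a simplified illustration rather than the authors' method: greedy selection of the cheapest non-conflicting links stands in for solving the LAP, the orientation term of the linking cost is omitted, unlinked tracklets are simply kept instead of being assigned explicit appearance or disappearance costs, and the dictionary keys and function names are hypothetical.

```python
import math


def link_cost(a, b, tau_s, tau_t):
    """Gated cost of linking the tail of tracklet a to the head of tracklet b.

    Uses only the distance and time-gap terms; returns None when the pair
    falls outside the spatial gate tau_s or the temporal gate tau_t.
    """
    dist = math.hypot(a["end_loc"][0] - b["start_loc"][0],
                      a["end_loc"][1] - b["start_loc"][1])
    gap = b["start"] - a["end"]
    if dist < tau_s and 0 < gap < tau_t:
        return dist / tau_s + gap / tau_t
    return None


def associate_once(tracklets, tau_s, tau_t):
    """One pass: greedily accept the cheapest links (a stand-in for the LAP)."""
    candidates = []
    for i, a in enumerate(tracklets):
        for j, b in enumerate(tracklets):
            if i != j:
                cost = link_cost(a, b, tau_s, tau_t)
                if cost is not None:
                    candidates.append((cost, i, j))
    used_tail, used_head, links = set(), set(), {}
    for cost, i, j in sorted(candidates):
        if i not in used_tail and j not in used_head:
            links[i] = j
            used_tail.add(i)
            used_head.add(j)
    return links


def merge(tracklets, links):
    """Merge linked tracklets into longer ones by following chains of links."""
    heads = set(links.values())
    merged = []
    for i, t in enumerate(tracklets):
        if i in heads:
            continue  # absorbed into the chain that starts at its predecessor
        j = i
        while j in links:
            j = links[j]
        merged.append({"start": t["start"], "start_loc": t["start_loc"],
                       "end": tracklets[j]["end"], "end_loc": tracklets[j]["end_loc"]})
    return merged


def fine_to_coarse(tracklets, tau_s=240, tau_t0=120, q=2):
    """Repeat association with a temporal gate that grows by a factor q.

    Stops as soon as a pass produces no new link.
    """
    tau_t = tau_t0
    while True:
        links = associate_once(tracklets, tau_s, tau_t)
        if not links:
            return tracklets
        tracklets = merge(tracklets, links)
        tau_t *= q
```

With three fragments of one trajectory separated by gaps of 50 and 200 frames, the first pass (gate 120) joins only the first two fragments, and the second pass (gate 240) attaches the third.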