Vehicle Motion Pattern Analysis for Surveillance

by Chaowei Niu

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Science at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY, February 2006.

© Massachusetts Institute of Technology 2006. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, January 20, 2006

Certified by: W. Eric L. Grimson, Bernard Gordon Professor of Medical Engineering, Thesis Supervisor

Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Students

Motion Pattern Analysis for Far-field Vehicle Surveillance

by Chaowei Niu

Submitted to the Department of Electrical Engineering and Computer Science on January 20, 2006, in partial fulfillment of the requirements for the degree of Master of Science

Abstract

The main goal of this thesis is to analyze the motion patterns in far-field vehicle tracking data collected by multiple stationary, non-overlapping cameras. The specific focus is to fully recover the camera network's topology, i.e. the graph structure relating cameras and the typical transition times between them; then, based on the recovered topology, to learn the traffic patterns (i.e. sources/sinks, transition probabilities, etc.); and finally to detect unusual events. I will present a weighted statistical method to learn the environment's topology. First, an appearance model is constructed from the combination of normalized color and overall size to measure the appearance similarity of moving objects across non-overlapping views. Then, based on the similarity in appearance, weighted votes are used to learn the temporally correlating information. By exploiting the statistical spatio-temporal information weighted by the similarity in an object's appearance, this method can automatically learn the possible links between the disjoint views and recover the topology of the network. After the network topology has been recovered, we gather statistics about motion patterns in this distributed camera setting. Finally, we explore the problem of how to detect unusual tracks using the information we have inferred.

Thesis Supervisor: W. Eric L. Grimson
Title: Bernard Gordon Professor of Medical Engineering

Acknowledgments

First of all, I would like to thank my advisor, Professor Eric Grimson, for his constant wise guidance through my graduate research, and for maintaining an open and interactive environment for me to thrive in. His ever-positive attitude toward work and life sets the standards that I always strive to achieve. Thanks to Chris Stauffer for providing the tracker and to Kinh Tieu for providing the map plotting code. I would also like to thank the fellow graduate students in the group: Gerald Dalley, Biswajit Bose, Tomas Izo, Joshua Migdal, Xiaoxu Ma, and Xiaogang Wang. I want to thank my parents for their love and support, and especially for encouraging me all along to pursue my own independent development. Finally, thanks to my husband for his kind understanding and eternal support.

Contents

1 Introduction
  1.1 Wide-area Surveillance Problem
  1.2 Observations
  1.3 Related Work
  1.4 Thesis Organization

2 The Appearance Model
  2.1 Normalized Color Model
    2.1.1 Comprehensive Color Normalization Algorithm
    2.1.2 Color Model
  2.2 Size Model
  2.3 Joint Probability Model

3 Weighted Cross Correlation Model
  3.1 Cross Correlation Function
  3.2 Cross Correlation Model
  3.3 Weighted Cross Correlation Model

4 Experiments and Problems
  4.1 Real Data
  4.2 Simulated Data
  4.3 Problems

5 Mutual Information and Estimation
  5.1 Mutual Information
    5.1.1 Entropy
    5.1.2 Relative Entropy and Mutual Information
    5.1.3 Data Processing Inequality
  5.2 Mutual Information Estimation
  5.3 Overall Review of the Algorithm

6 More Experiments
  6.1 Simulated Network (continued)

7 Information Inference and Unusual Track Detection
  7.1 Transition Probability Learning
    7.1.1 Markov Process
    7.1.2 Transition Probability Learning
  7.2 Source and Sink Learning
  7.3 Unusual Track Detection

8 Summary and Discussion

List of Figures

1-1 Tracking examples. The bounding box shows the moving object. The first row shows tracking through the same view, the middle row shows tracking through overlapping camera views, and the bottom row shows tracking through non-overlapping camera views.
2-1 Examples of observations.
2-2 Color histograms of one car's two observations before color normalization.
2-3 Color histograms of the same car's two observations after color normalization.
3-1 Example of a case in which cross correlation does not work.
3-2 General and weighted cross correlation functions applied to data from two cameras located at an intersection; (b) has a clear peak suggesting a possible link with an 11-second transition time between those cameras, which (a) lacks.
4-1 (a), (b), (c) are the three non-overlapping cameras we have used. The cameras' relative location is shown in (d) using the shaded quadrangle.
4-2 Detected sources/sinks. Black arrows indicate direct links between source/sink 3 and source/sink 4, and between source/sink 6 and source/sink 7.
4-3 Cross correlation functions between different views.
The left plot gives the cross correlation between camera b, source/sink 3 and camera c, source/sink 4, with a 3-second transition time; the right plot shows the correlation between camera c, source/sink 6 and camera a, source/sink 7, with a 4-second transition time.
4-4 Statistics of the simulated data.
4-5 Cross correlation for each pair of the observers from 17, 18, ..., to 26. The column index from left to right is: observer 17, observer 18, ..., observer 26; the row index from top to bottom is: observer 17, observer 18, ..., observer 26.
4-6 The recovered topology based on the weighted cross correlation; the red cross indicates the false link based on the ground truth.
6-1 (a) The adjacency matrix of the mutual information. (b) The recovered corresponding topology.
6-2 The fully recovered simulated network topology.
7-1 Transition probability of the network. The number means the observer. The width of the link is proportional to the magnitude of the transition probability: the thicker the link, the higher the transition probability between the two observers.
7-2 Transition probability of the network between 8am and 9am.
7-3 Transition probability of the network between 12pm and 1pm.
7-4 Transition probability of the network between 6pm and 7pm.
7-5 Transition probability of the network for sedans.
7-6 Transition probability of the network for buses.
7-7 Transition probability of the network for gas trucks.
7-8 Transition probability of the network for panel trucks.
7-9 Source and sink distribution. The size and color of each number correspond to the probability of that observer being a source or a sink.
7-10 Gas truck source and sink distribution.
The size and color of each number correspond to the probability of that observer being a source or a sink.
7-11 Panel truck source and sink distribution. The size and color of each number correspond to the probability of that observer being a source or a sink.

List of Tables

6.1 The learned associated transition time

Chapter 1

Introduction

With the development of technology, interest in multi-camera visual surveillance applications is rapidly increasing. These applications include tracking moving objects throughout a set of views, classifying those moving objects into different categories (i.e. cars, people, animals), learning the network topology, gathering statistics about the moving objects, and finally detecting and interpreting uncommon activities of the moving objects. In this thesis, we focus on the last three applications, assuming the first two tasks have been solved.

1.1 Wide-area Surveillance Problem

Consider the problem of wide-area surveillance, such as traffic monitoring and activity classification around critical assets (e.g. an embassy, a troop base, critical infrastructure facilities like oil depots, port facilities, airfield tarmacs). We want to monitor the flow of movement in such a setting from a large number of cameras, typically without overlapping fields of view (FOV). To coordinate observations in these distributed cameras, we first need to know the connectivity of movement between fields of view (i.e. when an object leaves one camera, it is likely to appear in a small number of other cameras with some probability). In some instances, one can carefully site and calibrate the cameras so that the observations are easily coordinated. In many cases, however, cameras must be rapidly deployed and may not last for long periods of time. Hence we seek a passive way of determining the topology of the camera network. That is, we want to determine the graph structure relating cameras, and the typical transitions between cameras, based on noisy observations of moving objects in the cameras. If we can in fact determine the "blind" links (i.e. links that connect the disjoint views and cannot be observed directly) between camera sites, we can gather statistics about patterns of usage in this distributed camera setting. We can then record site usage statistics and detect unusual movements. To determine the network topology and to answer these questions, we must first solve the tracking problem, i.e. we must maintain a moving object's identity from frame to frame, through the same camera view, through overlapping camera views, and through non-overlapping camera views, as shown in Figure 1-1. The bounding box shows the moving object. In the field of view (FOV), vehicles tend to appear and disappear at certain locations. These locations may correspond to garage entrances, or to the edge of a camera view, and are called sources and sinks, respectively [20]. Based on the visible tracking trajectories, one can easily learn the links between each source and sink [1]. Tracking through the same view and through overlapping views has been widely studied [2][3][4][5]. However, little attention has been paid to non-overlapping tracking correspondence. A good understanding of activity requires knowledge of the trajectories of moving objects.
For the field out of view, however, the tracking correspondences are unavailable; even the tracking trajectories are unavailable, which makes this problem harder.

1.2 Observations

In this thesis, we first focus on how to learn the non-overlapping network topology, which means detecting the possible "blind" links between disjoint views and determining the transition time (i.e., the time between disappearing at one location and reappearing at the other location). Our learning is based on the following observations:

Figure 1-1: Tracking examples. The bounding box shows the moving object. The first row shows tracking through the same view, the middle row shows tracking through overlapping camera views, and the bottom row shows tracking through non-overlapping camera views.

1. Physical characteristics of moving objects do not change. For example, a red sedan in one view is still a red sedan in another disjoint view; it cannot become a white SUV.

2. Vehicles running on the same route roughly share the same speed and other trajectory characteristics. In real road traffic, most vehicles simply follow the flow: they slow down and stop at a red light, and speed up when the light turns green. This makes it reasonable to assume that the transition time from one location to another is Gaussian distributed.

3. The trajectories of moving objects are highly correlated across non-overlapping views (i.e. vehicles do not move randomly between different views). To illustrate, suppose a vehicle wants to go from location A to location C through location B. It will go directly from A to B and then to C, instead of doing loops between A and B (i.e. from A to B, then to A, then to B) before finally going to C.

Given these three observations, we propose to use a weighted cross-correlation technique to learn the non-overlapping network topology. First, an appearance model is constructed from the combination of the normalized color and overall size models to measure a moving object's appearance similarity across the non-overlapping views. Then, based on the similarity in appearance, the votes are weighted to exploit the temporally correlating information. From the learned correlation function, the possible links between disjoint views can be detected and the associated transition times can be estimated. Given the possible (i.e. candidate) links, we can finally recover the network topology using the estimated mutual information. This method combines the appearance information and the statistical information of the observed trajectories, which overcomes the disadvantages of approaches that use only one of them. It avoids camera calibration, and it avoids solving the tracking correspondence between disjoint views.

1.3 Related Work

One possible approach to learning the connectivity or spatial adjacency of the camera network is to use calibrated camera networks [14][17]. Jain et al. [14] used calibrated cameras and an environmental model to obtain the 3D location of a person. Collins et al. [17] developed a system consisting of multiple calibrated cameras and a site model, and then used region correlation and location on the 3D site model for tracking. This kind of method usually requires detecting the same landmarks with known 3D coordinates from different cameras and using a complex site model. Another possible approach is to solve the tracking correspondence problem directly.
Ali et al. [9] use MAP estimation over trajectories and camera pose parameters to calibrate and track with a network of non-overlapping cameras. Huang and Russell [7] present a Bayesian foundation for computing the probability of identity, which is expressed in terms of appearance probabilities. Their appearance model is treated as the product of several independent models, such as lane, size, color, and arrival time. They used a simple Gaussian model to measure the transition probability between two disjoint views. Javed et al. [8] adopted Huang and Russell's method [7] and used Parzen windows to estimate the inter-camera space-time (i.e., transition time between two views) probabilities, and then solved the correspondence problem by maximizing the posterior probability of space-time and appearance. Kang et al. [15] used a combination of an appearance model and a motion model to track moving objects continuously using both stationary and moving cameras, then learned the homography between the stationary and moving cameras and the affine transform derived from the stabilization. The above methods require that we establish the correspondence for individual tracks between non-overlapping views. The correspondence assignment problem can be solved in O(n^3) time by formulating it as a weighted bipartite matching problem, which is difficult and time consuming. However, appearance information between different views is still quite useful and should not be discarded. Other approaches estimate the spatio-temporal information using statistical models [10][11][16][13]. Petty et al. [11] proposed to estimate transition time from aggregate traffic parameters in a freeway scenario. Westerman et al. [16] used cumulative arrivals at successive detector sites to estimate vehicle arrivals. Ellis [13] proposed a two-stage algorithm to learn the topology: first detecting entry and exit zones in each view, then temporally correlating the disappearance and reappearance of tracked objects between those views to detect possible links. For these statistical methods, performance rests only on the appearing and leaving times of the detected moving objects at each source/sink; they will not perform well under fairly heavy traffic conditions.

1.4 Thesis Organization

This thesis is organized as follows. In Chapter 2, we introduce the joint probability model (i.e. appearance model) for measuring the similarity in appearance between detected moving objects. In Chapter 3, the cross-correlation method for learning the spatio-temporal information is constructed; then the proposed "weighted" cross-correlation method for learning the possible links and their associated transition times is discussed. Chapter 4 gives the experimental results and associated problems. Chapter 5 presents mutual information and how to fully recover the network topology based on the estimated mutual information, followed by the results presented in Chapter 6. Given the recovered network topology, Chapter 7 discusses how to learn the transition probability and the source/sink information, and finally how to detect unusual tracks.

Chapter 2

The Appearance Model

To coordinate observations in the distributed cameras, we need to know the connectivity of movement between fields of view (i.e. when an object leaves one camera, it is likely to appear in a small number of other cameras with some probability), which means we need to know the network topology.
In the following three chapters, we will focus on how to recover the camera network topology. The far-field vehicle tracking system we have been using is provided by Chris Stauffer [3]. The input to the tracking system is the video sequence, and the output is a set of tracking sequences, where each track is a sequence of observations of the same object (supposedly) in the field of view. These tracks are provided as input to our topology learning system. Some sample observations are shown in Figure 2-1.

Figure 2-1: Examples of observations

In different views, the same object can appear dramatically different, not only in size but in color as well. In order to relate the appearance of an object from view to view, the appearance model (i.e. color model and size model) should be learned first. Learning the appearance model is carried out by assuming that there exist some known correspondences between disjoint views. One way to achieve the correspondence is by driving the same car around the environment. Another possible way is to manually detect distinctive vehicles (i.e. yellow cab, FedEx truck, blue bus) across the disjoint views. Since we only need to model color and overall size, unlike traditional appearance-based correspondence methods, which require a significant number of known correspondences, only a small number of the best matches are needed in the training phase.

2.1 Normalized Color Model

Various methods have been proposed to model the color change of moving objects from one camera to another. For far-field vehicle surveillance, since a vehicle is the only moving object and usually contains one color, a single color model per vehicle is sufficient. However, under different views, the same color may appear dramatically different due to lighting geometry and illuminant color (Figure 2-2). Based on this consideration, we adopt a normalized color model. First, we use the comprehensive color normalization (CCN) algorithm proposed by Finlayson et al. [18] to preprocess the input color images.

2.1.1 Comprehensive Color Normalization Algorithm

The light reflected from a surface depends on the spectral properties of the surface reflectance and of the illumination incident on the surface. In the case of Lambertian surfaces, the light is simply the product of the spectral power distribution of the light source with the percent spectral reflectance of the surface. Assuming a single point-source light, the illumination, surface reflectance, and sensor functions combine to form a sensor response:

$$\bar{p}^x = (\bar{e}^x \cdot \bar{n}^x) \int_\omega E(\lambda)\, S^x(\lambda)\, \bar{F}(\lambda)\, d\lambda \tag{2.1}$$

where λ is wavelength, p̄ is a 3-vector of sensor responses (the r, g, b pixel values), F̄ is the 3-vector of response functions (red, green, and blue sensitive), and E is the illumination striking the surface reflectance S^x at location x. Integration is over the visible spectrum ω. A bar denotes a vector quantity. The light reflected at x is proportional to E(λ)S^x(λ) and is projected onto the sensor array. The precise power of the reflected light is governed by the dot-product term ē^x · n̄^x. Here, n̄^x is the unit vector corresponding to the surface normal at x and ē^x points in the direction of the light source; the length of ē^x models the power of the incident light at x. Note that this implies that the function E(λ) is actually constant across the scene. Substituting $\bar{q}^x = \int_\omega S^x(\lambda) E(\lambda) \bar{F}(\lambda)\, d\lambda$ allows us to simplify the above formula into:

$$\bar{p}^x = (\bar{e}^x \cdot \bar{n}^x)\, \bar{q}^x \tag{2.2}$$

It is now understood that q̄^x is the part of a scene that does not vary with lighting geometry (but does change with illuminant color).
Equation 2.2, which deals only with point-source lights, is easily generalized to more complex lighting geometries. Suppose the light incident at x is a combination of m point-source lights with lighting direction vectors ē^{x,i} (i = 1, 2, ..., m). In this case, the camera response is equal to:

$$\bar{p}^x = \sum_{i=1}^{m} (\bar{e}^{x,i} \cdot \bar{n}^x)\, \bar{q}^x$$

Of course, all the lighting vectors can be combined into a single effective direction vector:

$$\bar{e}^x = \sum_{i=1}^{m} \bar{e}^{x,i}$$

This conveys the intuitive idea that the camera response to m lights equals the sum of the responses to each individual light. Since we now understand that the dependency between camera response and lighting geometry is a scalar relationship, depending on ē^x · n̄^x, it is straightforward to normalize it away: when p̄^x = (r, g, b), the normalization returns (r/(r+g+b), g/(r+g+b), b/(r+g+b)). Hence, we can define the function R():

$$R(I)_{ij} = \frac{I_{ij}}{\sum_{k=1}^{3} I_{ik}}$$

where I is an N × 3 image matrix with N image pixels, whose columns contain the intensities of the 3 RGB color channels.

Let us now consider the effect of illuminant color. If we hold the lighting geometry vectors ē^x fixed and assume the camera sensors are delta functions, F_i(λ) = δ(λ − λ_i), i = 1, 2, 3, then under E(λ) the camera response is p_i^x = (ē^x · n̄^x) S^x(λ_i) E(λ_i), and under a different E'(λ) it is p'_i^x = (ē^x · n̄^x) S^x(λ_i) E'(λ_i). Combining the two equations, we get:

$$p'^x_i = \frac{E'(\lambda_i)}{E(\lambda_i)}\, p^x_i, \qquad i = 1, 2, 3$$

This equation informs us that, as the color of the light changes, the values recorded in each color channel scale by a factor (one factor per channel). It is straightforward to remove the image's dependence on illuminant color with the function C():

$$C(I)_{ij} = \frac{N}{3} \cdot \frac{I_{ij}}{\sum_{k=1}^{N} I_{kj}}$$

where I is again an N × 3 image matrix with N image pixels. The factor N/3 ensures that the total sum of all pixels after the column normalization is N, the same as after the row normalization. The comprehensive normalization procedure is defined as a loop:

1. I_0 = I
2. Iterate I_{i+1} = C(R(I_i)) until I_{i+1} = I_i.

After the iteration converges, we obtain an image independent of lighting geometry and illuminant color. An example is shown in Figures 2-2 and 2-3. Because the HSV color model is closer to the way humans tend to perceive color, the example is shown in the HSV color model. Figure 2-2 shows the color histograms of one car's two observations before color normalization; the two histograms for Hue and Saturation are quite different. After the normalization, however, the histograms for Hue and Saturation match well (Figure 2-3).

Figure 2-2: Color histograms of one car's two observations before color normalization

2.1.2 Color Model

After the color normalization procedure, we can use a color histogram (in the HS color space) to fit a multivariate Gaussian distribution modeling the color change P_color between any two different scenes:

Figure 2-3: Color histograms of the same car's two observations after color normalization

$$P_{color}\big(o_{c_1}^{h,s},\, o_{c_2}^{h,s} \mid o_{c_1} = o_{c_2}\big) = \mathcal{N}\big(\mu_{h,s},\, \Sigma_{h,s}\big)$$

where c1 and c2 denote cameras 1 and 2, o_{c1} and o_{c2} are the detected observations under camera 1 and camera 2 respectively, h and s are the H and S information contained in an observation, μ_{h,s} and Σ_{h,s} are the mean and variance respectively, and o_{c1} = o_{c2} means the two observations were actually generated by the same object. For each pair of different views, there is a multivariate Gaussian distribution associated with it. To learn the parameters of the Gaussian for each pair of camera views, we used the quadratic distances of the normalized color histograms (i.e. the H and S histograms) to compute the mean and variance.
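For concreteness, here is a minimal numpy sketch of the comprehensive normalization loop just described, assuming the image has been flattened to an N × 3 RGB matrix; the function names and tolerances are illustrative, not from the thesis code.

```python
import numpy as np

def R(I, eps=1e-12):
    # Row normalization: removes the dependence on lighting geometry
    # by scaling each pixel so its three channels sum to 1.
    return I / (I.sum(axis=1, keepdims=True) + eps)

def C(I, eps=1e-12):
    # Column normalization: removes the dependence on illuminant color
    # by rescaling each channel; the N/3 factor keeps the total pixel
    # mass equal to N, matching the row-normalized image.
    N = I.shape[0]
    return (N / 3.0) * I / (I.sum(axis=0, keepdims=True) + eps)

def comprehensive_color_normalization(I, tol=1e-6, max_iter=100):
    """Alternate R() and C() until the image stops changing."""
    I = I.astype(np.float64)
    for _ in range(max_iter):
        I_next = C(R(I))
        if np.max(np.abs(I_next - I)) < tol:
            break
        I = I_next
    return I_next

# Usage: pixels is an (N, 3) float array of RGB values.
pixels = np.random.rand(1000, 3)
normalized = comprehensive_color_normalization(pixels)
```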
2.2 Size Model

For far-field surveillance, even after successful detection, there are often very few image pixels per object, which makes it difficult to model shape change across cameras. However, we know for sure that a sedan in one scene cannot be a truck in another scene, which means overall size information still plays an important role in correspondence. Here we use the width and length of the bounding box to measure the overall size. This estimate of size is somewhat simplistic. However, given that objects are fairly small in far-field settings, it is unlikely that we can recover shape detail, so all we rely on are overall size measures. Ideally, we would fit a best ellipse to the shape, to account for orientation relative to the camera, but in general, given the small image size of objects, we find width and length to suffice. We again adopt a multivariate Gaussian distribution to model the size change P_size:

$$P_{size}\big(o_{c_1}^{w,l},\, o_{c_2}^{w,l} \mid o_{c_1} = o_{c_2}\big) = \mathcal{N}\big(\mu_{w,l},\, \Sigma_{w,l}\big)$$

where w_{c1} and l_{c1} are the detected vehicle's width and length under camera 1, and μ_{w,l} and Σ_{w,l} are the mean and variance respectively. The imaging transformation of a perspective camera distorts a number of geometric scene properties. As a result, objects appear to grow larger as they approach the camera center and become smaller when they are far away from the camera [19]. So, as a simple normalization, the average size over the whole trajectory is used when building the size model. The parameters of this Gaussian distribution can be estimated using the same procedure as described in Section 2.1.2.

2.3 Joint Probability Model

Given two observations o_i^a and o_j^b, where o_i^a is observation a from camera i and o_j^b is observation b from camera j, the similarity in appearance between the two observations can be calculated as the probability that they were actually generated by the same object. This is called the "appearance probability", denoted P(o_i^a, o_j^b | a = b). It is important to note that the appearance probability is not the probability that a = b. Assuming that the color and size information of each observation are independent, the similarity in appearance between two observations can be described as the product of the color and size similarities:

$$P\big(o_i^a, o_j^b \mid a = b\big) = P_{color}\big(o_i^a, o_j^b \mid a = b\big) \cdot P_{size}\big(o_i^a, o_j^b \mid a = b\big)$$

Now we know how to model the appearance change of objects from view to view, and how to measure the similarity in appearance of two observations. This result will be used to help exploit the statistical spatio-temporal information (see Chapter 3).
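To make the joint model concrete, the sketch below scores a candidate pair of observations as the product of the color and size terms. It assumes the per-camera-pair Gaussians have already been fit; the exact feature parameterization (histogram distances for color, width/length differences for size) is an assumption made for illustration, and all names are hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal

def appearance_probability(obs_i, obs_j, color_model, size_model):
    """P(o_i, o_j | same object) = P_color * P_size.

    obs_* are dicts holding normalized H/S histograms and bounding-box
    width/length averaged over the track; *_model hold the Gaussian
    parameters learned for this pair of cameras.
    """
    # Color term: quadratic distances between normalized H and S histograms.
    dh = np.sum((obs_i["hist_h"] - obs_j["hist_h"]) ** 2)
    ds = np.sum((obs_i["hist_s"] - obs_j["hist_s"]) ** 2)
    p_color = multivariate_normal.pdf([dh, ds],
                                      mean=color_model["mu"],
                                      cov=color_model["cov"])
    # Size term: track-averaged bounding-box width and length.
    p_size = multivariate_normal.pdf([obs_i["width"] - obs_j["width"],
                                      obs_i["length"] - obs_j["length"]],
                                     mean=size_model["mu"],
                                     cov=size_model["cov"])
    # Color and size are assumed independent, so the joint is their product.
    return p_color * p_size
```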
Chapter 3

Weighted Cross Correlation Model

If we can determine the "blind" links (i.e. links that connect the disjoint views) between camera sites, we can then gather statistics about patterns of usage in this distributed camera setting. This would then allow us to detect unusual movements, to classify types of activities, and to record site usage statistics. In this chapter, we will discuss how to incorporate the appearance similarity information into the cross correlation function, and then use it to estimate the possible blind links between disjoint views.

3.1 Cross Correlation Function

In statistics, the term cross correlation is sometimes used to refer to the covariance cov(X, Y) between two random vectors X and Y, in order to distinguish that concept from the "covariance" of a random vector X, which is understood to be the matrix of covariances between the scalar components of X. In signal processing, the cross correlation (or sometimes "cross-covariance") is a standard method of estimating the degree to which two series are correlated, commonly used to find features in an unknown signal by comparing it to a known one [12]. Consider two discrete series x(i) and y(i), where i = 0, 1, 2, ..., N − 1. The cross correlation r at delay d is defined as:

$$r(d) = \frac{\sum_{i} \big(x(i) - m_x\big)\big(y(i-d) - m_y\big)}{\sqrt{\sum_{i} \big(x(i) - m_x\big)^2}\; \sqrt{\sum_{i} \big(y(i-d) - m_y\big)^2}}$$

where m_x and m_y are the means of the corresponding series. If the above is computed for all delays d = 0, 1, 2, ..., N − 1, it results in a cross correlation series of twice the length of the original series. There is the issue of what to do when the index into the series is less than 0 or greater than or equal to the number of points (i − d < 0 or i − d ≥ N). The most common approaches are to either ignore these points or to assume the series x and y are zero for i < 0 and i ≥ N. In many signal processing applications the series is assumed to be circular, in which case the out-of-range indices are "wrapped" back within range, i.e. x(−1) = x(N − 1), x(N + 5) = x(5), etc. The range of delays d, and thus the length of the cross correlation series, can be less than N; for example, the aim may be to test correlation at short delays only. The denominator in the expression above serves to normalize the correlation coefficients such that −1 ≤ r(d) ≤ 1, the bounds indicating maximum correlation and 0 indicating no correlation. A high negative correlation indicates a high correlation, but with the inverse of one of the series.
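To make the definition concrete, here is a minimal numpy sketch of the normalized cross correlation above, using the zero-padding convention for out-of-range indices; names are illustrative.

```python
import numpy as np

def cross_correlation(x, y, max_delay):
    """Normalized cross correlation r(d) for delays d = 0..max_delay-1.

    Out-of-range indices are treated as zero, one of the common
    boundary conventions discussed above.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    denom = np.sqrt(np.sum((x - mx) ** 2) * np.sum((y - my) ** 2))
    r = np.zeros(max_delay)
    for d in range(max_delay):
        # y is shifted by d; pairs with i - d < 0 contribute nothing.
        num = sum((x[i] - mx) * (y[i - d] - my) for i in range(d, len(x)))
        r[d] = num / denom
    return r  # each r[d] lies in [-1, 1]
```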
3.2 Cross Correlation Model

As mentioned in the Introduction, we rely on two observations: the transition time from one location to another is Gaussian distributed, and the trajectories of moving objects are highly correlated across non-overlapping views. Under these two observations, the sequences of vehicles appearing under connected cameras (i.e. cameras between which routes directly exist) are highly correlated. Since the cross correlation function can capture the degree of correlation between two signals, we present a simple cross-correlation model to estimate the existence of possible blind links and the associated transition times between different cameras. For each traffic source/sink (i.e. locations where objects tend to appear in a scene and locations where objects tend to disappear from a scene), traffic can be represented as a discrete flow signal V_i(t), defined as the list of observations (see Figure 2-1) appearing in a time interval around time t at source/sink i. The cross-correlation function R_{i,j}(T) between the signals V_i(t) and V_j(t) can indicate the possibility of a link, and can be used to estimate the transition time if such a link exists. If there is a possible link between source/sink i and j, there should be a clear peak in R_{i,j}(T) at time T = t, where t denotes the transition time from location i to location j. In this sense, a possible "blind" link from location i to location j has been learned. However, there are some limitations to this method. For example, it does not perform well under heavy traffic conditions. To illustrate this problem, consider an extreme situation (see Figure 3-1). Suppose that at source/sink A a yellow school bus leaves every 5 minutes starting at 8am, that at source/sink B a blue police car appears every 5 minutes starting at 8:01am, and that there is no link between A and B. If we use the cross correlation method directly, a possible link will nonetheless be learned, with a learned transition time of 60 seconds.

Figure 3-1: Example of a case in which cross correlation doesn't work

Intuitively, at different source/sinks, only those observations which look similar in appearance should be counted when deriving the spatio-temporal relation. In order to fix this problem, we propose a weighted cross correlation technique.

3.3 Weighted Cross Correlation Model

The weighted cross correlation technique is defined as:

$$R_{i,j}(T) = \sum_{t} P\big(o_i^{t},\, o_j^{t+T} \mid o_i = o_j\big)$$

Specifically, for each pair consisting of a vehicle disappearing at source/sink i at time t and a vehicle appearing at source/sink j at time t + T, we calculate the similarity in appearance between the two observations and add it as a vote to R_{i,j}(T). Peak values can then be detected using the threshold estimated as:

threshold = mean(R_{i,j}(T)) + w · std(R_{i,j}(T))   (3.4)

where w is a user-defined constant. In this work, we assume there is only one popular transition time if there is a link between i and j. People in real life tend to choose the shortest path between the start location and the destination, which makes a single transition time reasonable under the assumption of constant velocity. Although we assume only one popular transition time between two disjoint views, this weighted cross correlation model can be applied to cases with multiple transition times, which result in multiple peaks in R(T). In our implementation, the transition time is assigned the time associated with the highest detected peak. Figure 3-2 gives an example where weighted cross correlation detects a valid link while general cross correlation fails. After applying the general and weighted cross correlation functions to the data from two cameras located at an intersection, the results are shown in Figure 3-2 (a) and (b), respectively. Plot (b) has a clear peak, which suggests a possible link with an 11-second transition time between those cameras; plot (a) does not.

Figure 3-2: After applying the general and weighted cross correlation function on the data from two cameras located at an intersection, the results are shown in (a) and (b), respectively. (b) has a clear peak which suggests a possible link with transition time 11 seconds between those cameras, which (a) doesn't.

In this chapter, we learned how to use the weighted cross correlation model to estimate the possible blind links and the associated transition times between disjoint views. We will present the experimental results in the next chapter, using both real tracking data and synthetic tracking data. A sketch of the procedure follows.
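The sketch below is an illustration under the assumptions of this chapter, not the thesis implementation: departures and arrivals are per-source/sink event lists, appearance_probability is a similarity function in the spirit of the Chapter 2 sketch, and the one-second binning is an assumed discretization.

```python
import numpy as np

def weighted_cross_correlation(departures, arrivals, appearance_probability,
                               max_delay, bin_sec=1.0):
    """Accumulate appearance-weighted votes R[T] over candidate transition times.

    departures: list of (time, observation) pairs leaving source/sink i;
    arrivals:   list of (time, observation) pairs appearing at source/sink j.
    """
    R = np.zeros(int(max_delay / bin_sec))
    for t_dep, obs_dep in departures:
        for t_arr, obs_arr in arrivals:
            T = t_arr - t_dep
            if 0 <= T < max_delay:
                # Each candidate pair votes with its similarity in appearance.
                R[int(T / bin_sec)] += appearance_probability(obs_dep, obs_arr)
    return R

def detect_transition_time(R, w=2.0):
    """Peak test with threshold = mean(R) + w * std(R); w = 2 in the thesis."""
    threshold = R.mean() + w * R.std()
    peaks = np.nonzero(R > threshold)[0]
    if peaks.size == 0:
        return None                           # no link detected
    return int(peaks[np.argmax(R[peaks])])    # time of the highest peak
```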
Chapter 4

Experiments and Problems

In order to evaluate the proposed weighted cross correlation method, we have tested it on both real data and synthetic data.

4.1 Real Data

Figure 4-1: (a), (b), (c) are the three non-overlapping cameras we have used. The cameras' relative location is shown in (d) using the shaded quadrangle.

Figure 4-2: Detected sources/sinks. Black arrows indicate direct links between source/sink 3 and source/sink 4, and between source/sink 6 and source/sink 7.

For the real data experiment, we used three non-overlapping cameras distributed around several buildings. The layout of the environment and the cameras' locations are shown in Figure 4-1. For each camera, we have 1 hour of vehicle tracking data, obtained from a tracker based on [3], every day for six days. There are in total 213 observations in camera (a), 1056 observations in camera (b), and 1554 observations in camera (c). In our cameras, all the streets are two-way streets, i.e. each source is also a sink. For simplicity, we merge sources and sinks into groups of source/sinks. The detected source/sinks in each camera are learned by clustering the spatial distribution of each observation's trajectory's beginning and ending points (i.e. the appearing coordinate and the disappearing coordinate). The detected source/sinks are shown in Figure 4-2. For each source/sink, there is an associated Gaussian distribution with a mean and variance. From the cameras' spatial relationship, we know that there exist direct links between source/sink 3 and source/sink 4, and between source/sink 6 and source/sink 7, and that there is no other direct link among these source/sinks. Visible links can be easily learned using trajectory information; our goal is to learn the "blind" links. Because we only focus on learning the "blind" links between disjoint views, we know that the transition time must be non-negative, which is determined by the nature of traffic flow: the same vehicle must first disappear at one specific location before it can reappear at another location. If overlapping views were considered, however, the transition time could be negative. For each pair of source/sinks, we used the disappearing vehicles at one sink and the appearing vehicles at the other source to calculate the weighted cross correlation function. A possible link is detected if there exists a significant peak in the cross-correlation function (see equation 3.4; in our experiments, w is set to 2). Only two possible links were detected, as shown in Figure 4-3. The left plot gives the cross correlation between camera b, source/sink 3 and camera c, source/sink 4, with a transition time of 3 seconds; the right plot shows the correlation between camera c, source/sink 6 and camera a, source/sink 7, with a transition time of 4 seconds.

Figure 4-3: Cross correlation functions between different views. The left plot gives the cross correlation between camera b, source/sink 3 and camera c, source/sink 4, with transition time 3 seconds; the right plot shows the correlation between camera c, source/sink 6 and camera a, source/sink 7, with transition time 4 seconds.

Notice that the detected "blind" links don't include links like the one from source/sink 10 to source/sink 6 through source/sink 7. The reason is that we have used the visible trajectories' information: to check a possible "blind" link between source/sink 10 and source/sink 6, we would use the observations that leave the scene through source/sink 10 and the ones that enter the scene through source/sink 6, which would not give the link through source/sink 7. So the cameras' topology can be fully recovered.
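For reference, the source/sink detection described above can be sketched as clustering of track endpoints. The thesis does not name the clustering algorithm, so a Gaussian mixture from scikit-learn stands in here, since each source/sink carries a Gaussian mean and variance; all names and parameters are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def detect_source_sinks(tracks, n_zones):
    """Cluster track endpoints into source/sink zones.

    tracks: list of trajectories, each an (L, 2) array of image coordinates.
    Each fitted component is one source/sink zone with its own Gaussian
    mean and covariance, as used above.
    """
    # Stack each track's appearing and disappearing coordinates.
    endpoints = np.vstack([np.stack([trk[0], trk[-1]]) for trk in tracks])
    gmm = GaussianMixture(n_components=n_zones, covariance_type="full")
    gmm.fit(endpoints)
    return gmm   # gmm.means_ and gmm.covariances_ give the zone Gaussians
```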
4.2 Simulated Data

We also tested our algorithm on a simulated network. This simulator synthetically generates the traffic flow in a set of city streets, allowing for stop signs, traffic lights, and differences in traffic volume (i.e. morning and afternoon rush hours have a higher volume, as does lunch traffic). The network includes 101 cameras located at road intersections (including cross and T intersections). For each camera, there are two observers that look in opposite directions along the traffic flow (i.e. observers 1 and 2 belong to camera 1, observers 3 and 4 belong to camera 2, etc.). Every observer can be treated as a source/sink. Tracking data has been simulated 24 hours a day for 7 week days, including 2597 vehicles (Figure 4-4).

Figure 4-4: Statistics of the simulated data

Transition time may change with road conditions; for example, it will be larger during rush hour than during non-rush hour. So in our experiment, we pick only one particular hour of data (10am to 11am) each day for 5 days. For each camera, the only information we have is that vehicles appear and then disappear from this location at roughly the same time (i.e. the duration is very short), so we can treat the flow signal like a delta function. For each pair of observers, we first calculate the cross correlation function. A possible link is detected if there exists a significant peak in the cross-correlation function (see equation 3.4; in our experiments, w is set to 2). Figure 4-5 shows the cross correlation results for each pair of observers from 17, 18, ..., to 26; each row is the parent observer, each column is the child observer, and detected possible links are highlighted with black backgrounds.

Figure 4-5: Cross correlation for each pair of the observers from 17, 18, ..., to 26. The column index from left to right is: observer 17, observer 18, ..., observer 26; the row index from top to bottom is: observer 17, observer 18, ..., observer 26.

From the detected links, however, the topology was not correctly recovered (see Figure 4-6). For example, there are detected links from observer 25 to 22, from observer 22 to 23, and from observer 25 to 23. We don't know whether the link from observer 25 to 23 actually passes through observer 22, or whether there exists another direct link between them.

Figure 4-6: The recovered topology based on the weighted cross correlation; the red cross indicates the false link based on the ground truth.

4.3 Problems

Unlike the real data, each camera view here has only one source/sink and we have no information about any visible links, so we don't know where the vehicles are coming from and where they are going. Hence all the vehicles have been used to calculate the cross correlation function. In order to get rid of those "fake" links and recover the true topology, we turn to mutual information.

Chapter 5

Mutual Information and Estimation

From the previous chapter, we know that the weighted cross correlation function can only help us detect "possible" blind links, which may include "fake" links. In this chapter, we will focus on how to solve this problem, and discuss how to use mutual information to remove the fake links and fully recover the network topology.

5.1 Mutual Information

Mutual information is a measure of the amount of information that one random variable contains about another random variable. It is the reduction in the uncertainty of one random variable due to knowledge of the other.

5.1.1 Entropy

First, we introduce the concept of entropy, which is a measure of the uncertainty of a random variable and of the amount of information required on average to describe the random variable. Let X be a random variable and p(x) be the probability distribution function of X.

Definition [21]: The entropy H(X) of a discrete random variable X is defined as

$$H(X) = -\sum_{x} p(x) \log p(x)$$

This definition is related to the definition of entropy in thermodynamics: the higher the entropy of a random variable, the more uncertain that random variable is. Next, we introduce two related concepts: relative entropy and mutual information.

5.1.2 Relative Entropy and Mutual Information

Relative entropy is a measure of the distance between two distributions. In statistics, it arises as the expected logarithm of the likelihood ratio.
The relative entropy D(p‖q) is a measure of the inefficiency of assuming that the distribution is q when the true distribution is p. For example, if we knew the true distribution of the random variable, we could construct a code with average description length H(p). If, instead, we used the code for a distribution q, we would need H(p) + D(p‖q) bits on average to describe the random variable.

Definition [21]: The relative entropy, or Kullback-Leibler distance, between two probability distributions p(x) and q(x) is defined as:

$$D(p \,\|\, q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}$$

It can easily be shown that relative entropy is always non-negative, and is zero if and only if p = q. However, it is not a true distance between distributions, since it is not symmetric and does not satisfy the triangle inequality. Nonetheless, it is often useful to think of relative entropy as a "distance" between distributions. Now we are ready to introduce mutual information, which is a measure of the amount of information that one random variable contains about another random variable: the reduction in the uncertainty of one random variable due to knowledge of the other.

Definition [21]: Consider two random variables X and Y with a joint probability distribution p(x, y) and marginal probability distribution functions p(x) and p(y). The mutual information I(X; Y) is the relative entropy between the joint distribution and the product distribution p(x)p(y):

$$I(X;Y) = \sum_{x}\sum_{y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$$

From the definition of mutual information, it can easily be shown that I(X; Y) ≥ 0, with equality if and only if X and Y are independent (i.e. the joint distribution is the same as the product of the marginal distributions). In other words, the higher the mutual information between two variables, the more likely the two variables are dependent. Also, it can be shown [21] that for a Markov-chain-type topology between three random variables X → Y → Z, we have I(X; Y) ≥ I(X; Z). Mutual information can also be interpreted through the concept of entropy: the mutual information between X and Y is the uncertainty of X minus the uncertainty of X given knowledge of Y. In other words, mutual information is the amount by which the uncertainty about X decreases when Y is given: the amount of information Y contains about X. Mutual information has been used in many fields to recover topology [23][24], i.e. to find the dependency relationships between the variables involved. Graphical models provide a useful methodology for expressing the dependency structure of a set of random variables [27]. Random variables can be treated as nodes of a graph, while edges between nodes indicate dependency, which can be estimated by the pairwise mutual information. It has been shown that the graph with the maximum edge weight is the optimum tree dependency approximation.

5.1.3 Data Processing Inequality

As mentioned before, for a Markov-chain-type topology between three random variables X → Y → Z, we have I(X; Y) ≥ I(X; Z); this is called the data processing inequality. Considering our camera network problem, the mutual information between neighboring cameras should be greater than that between non-neighboring cameras. We can use this property to refine the network topology.
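The three quantities above are straightforward to compute for discrete distributions. A small numpy sketch, with base-2 logarithms so the results are in bits:

```python
import numpy as np

def entropy(p):
    """H(X) = -sum p(x) log2 p(x)."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def kl_divergence(p, q):
    """D(p||q) = sum p(x) log2(p(x)/q(x)); requires q > 0 wherever p > 0."""
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

def mutual_information(pxy):
    """I(X;Y) = D(p(x,y) || p(x)p(y)) for a joint probability table pxy."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    return kl_divergence(pxy.ravel(), (px * py).ravel())

# Independent variables give I = 0; this dependent joint gives I > 0.
pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])
print(mutual_information(pxy))  # ~0.278 bits
```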
5.2 Mutual Information Estimation

In order to calculate the mutual information, we need to estimate the joint and marginal distributions of X and Y, which is computationally hard. However, with the assumption that X and Y are jointly Gaussian distributed with correlation coefficient ρ_xy, the mutual information can be computed analytically (ρ_xy captures the linear dependence between X and Y regardless of their joint distribution) [26][25]:

$$I(X;Y) = -\frac{1}{2} \log_2 \big(1 - \rho_{xy}^2\big) \tag{5.5}$$

From Chapter 3, we already know how to estimate the weighted cross correlation R_{i,j}(T). So if there exists a clear peak in R_{i,j}(T) at time T = T_peak, the correlation coefficient can be estimated from the value of R_{i,j}(T) at that peak. Because the cross correlation function assumes the signals are transient, which is not accurate in our case, we use the median of R_{i,j}(T) instead of the mean.

5.3 Overall Review of the Algorithm

To implement the proposed algorithm, four steps proceed sequentially:

1. For each possible pair of source/sinks, learn the cross correlation function;
2. Detect the possible links using the peak detection algorithm;
3. For the detected links, estimate the cross correlation coefficients; otherwise, set the cross correlation coefficient to 0;
4. Recover the network topology based on the estimated mutual information.

A compact sketch of steps 3 and 4 follows.
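This sketch applies the Gaussian assumption of Section 5.2: pairwise mutual information is derived from the estimated correlation coefficients, and a simple threshold stands in for the two-way clustering used in Chapter 6; names are illustrative.

```python
import numpy as np

def gaussian_mutual_information(rho):
    """I(X;Y) = -0.5 * log2(1 - rho^2) under the jointly Gaussian assumption."""
    return -0.5 * np.log2(1.0 - np.asarray(rho) ** 2)

def recover_topology(rho_matrix, mi_threshold):
    """Keep candidate links whose estimated mutual information is high.

    rho_matrix[i, j] holds the correlation coefficient estimated at the
    detected peak of R_ij(T), or 0 where no peak was detected (step 3).
    """
    mi = gaussian_mutual_information(rho_matrix)
    return mi > mi_threshold   # boolean adjacency matrix of the network
```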
Chapter 6

More Experiments

In this chapter, we use the estimated mutual information to recover the simulated network topology based on the weighted cross correlation function.

6.1 Simulated Network (continued)

As we discussed in Chapter 4, for the simulated network (where there is only one source/sink per camera view), the topology cannot be correctly recovered using only the weighted cross correlation function. So after the cross correlation function has been learned, the mutual information is estimated, as shown in Figure 6-1(a), with intensities corresponding to the magnitude of the mutual information: the brighter the entry, the higher the mutual information. From the data processing inequality, we know that the mutual information for neighboring cameras is higher than the mutual information for non-neighboring cameras. So we can cluster the mutual information values into two clusters based on magnitude; the cluster with higher mutual information is used to recover the network topology. Figure 6-1(b) is the recovered topology for observer 17 to observer 26. We can see that the link from observer 25 to 23 actually passes through 22, which is consistent with the ground truth. Table 6.1 shows the learned associated transition time for each link. Finally, the fully recovered topology of the simulated network is shown in Figure 6-2; the numbers are the indices of the observers.

Figure 6-1: (a) The adjacency matrix of the mutual information. (b) The recovered corresponding topology.

For the real data, since there are multiple source/sinks per camera view, which means we can obtain the information of visible trajectories, we can successfully recover the topology without using mutual information. If there were only one source/sink per camera view (i.e. when zooming in), or if every camera view were treated as one large source/sink, however, mutual information would be needed to learn the network topology.

Table 6.1: The learned associated transition times (columns: parent observer, child observer, transition time in seconds).

Figure 6-2: The fully recovered simulated network topology

Chapter 7

Information Inference and Unusual Track Detection

In the previous chapter, we discussed how to recover the network topology using the far-field vehicle tracking data. Now, given the network topology, we can learn the transition probability information and the source/sink information, and finally detect unusual tracks.

7.1 Transition Probability Learning

7.1.1 Markov Process

A discrete-time stochastic process [29][28] is a collection X_n, for n ∈ 1 : N, of random variables ordered by the discrete time index n. In general, the distribution for each of the variables X_n can be arbitrary and different for each n. There may also be arbitrary conditional independence relationships between different subsets of variables of the process; this corresponds to a graphical model with edges between most nodes. Now we consider discrete-time first-order Markov chains [30], in which the state changes at certain discrete time instants, indexed by an integer variable n. At each time step n, the Markov chain has a state, denoted X_n, which belongs to a finite set S of possible states, called the state space. Without loss of generality, and unless stated otherwise, we assume that S = {1, ..., m} for some positive integer m. The Markov chain is described in terms of its transition probabilities p_ij: whenever the state happens to be i, there is probability p_ij that the next state is equal to j. Mathematically,

$$p_{ij} = P(X_{n+1} = j \mid X_n = i)$$

The key assumption underlying Markov processes is that the transition probabilities p_ij apply whenever state i is visited, no matter what happened in the past, and no matter how state i was reached. Mathematically, we assume the Markov property, which requires that

$$P(X_{n+1} = j \mid X_n = i,\, X_{n-1} = i_{n-1}, \ldots, X_0 = i_0) = P(X_{n+1} = j \mid X_n = i) = p_{ij}$$

for all times n, all states i, j ∈ S, and all possible sequences i_0, ..., i_{n−1} of earlier states. Thus, the probability law of the next state X_{n+1} depends on the past only through the value of the present state X_n. The transition probabilities p_ij must of course be non-negative and sum to one:

$$\sum_{j=1}^{m} p_{ij} = 1, \quad \text{for all } i.$$

We generally allow the probabilities p_ii to be positive, in which case it is possible for the next state to be the same as the current one. Even though the state does not change, we still view this as a state transition of a special type (a self-transition).

Specification of Markov Models

A Markov chain model is specified by identifying:

1. The set of states S = {1, ..., m};
2. The set of possible transitions, namely those pairs (i, j) for which p_ij > 0;
3. The numerical values of those p_ij that are positive.

The Markov chain specified by this model is a sequence of random variables X_0, X_1, X_2, ..., that take values in S and satisfy

$$P(X_{n+1} = j \mid X_n = i,\, X_{n-1} = i_{n-1}, \ldots, X_0 = i_0) = p_{ij}$$

for all times n, all states i, j ∈ S, and all possible sequences i_0, ..., i_{n−1} of earlier states. All of the elements of a Markov chain model can be encoded in a transition probability matrix, which is simply a two-dimensional array whose element at the ith row and jth column is p_ij:

$$P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1m} \\ p_{21} & p_{22} & \cdots & p_{2m} \\ \vdots & & & \vdots \\ p_{m1} & p_{m2} & \cdots & p_{mm} \end{pmatrix}$$

It is also helpful to lay out the model in the so-called transition probability graph, whose nodes are the states and whose arcs are the possible transitions. By recording the numerical values of p_ij near the corresponding arcs, one can visualize the entire model in a way that makes some of its major properties readily apparent.
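Fitting such a model to tracking data, as done in Section 7.1.2 below, reduces to counting transitions along each track and normalizing rows. A minimal sketch, assuming tracks are given as sequences of 0-based observer indices; names are illustrative.

```python
import numpy as np

def fit_transition_matrix(tracks, n_states):
    """Estimate p_ij from observed state sequences.

    tracks: list of observer-index sequences, e.g. [49, 57, 61, ...].
    Returns an (n_states, n_states) row-stochastic matrix.
    """
    counts = np.zeros((n_states, n_states))
    for track in tracks:
        for i, j in zip(track[:-1], track[1:]):
            counts[i, j] += 1   # one observed transition i -> j
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no outgoing transitions are left uniform so that the
    # matrix stays stochastic.
    P = np.where(row_sums > 0, counts / np.maximum(row_sums, 1), 1.0 / n_states)
    return P
```

Conditioning on time of day or vehicle type, as in the figures below, amounts to filtering the track list before counting.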
7.1.2 Transition Probability Learning

Once we know the connectivity of the network, we can fit a first-order Markov model to it, and hence learn the transition probabilities from node to node. In the real world, traffic patterns do not remain the same all the time: we would not expect morning rush-hour traffic to have the same pattern as evening non-rush-hour traffic. In other words, the transition probabilities of the network change with time. Therefore, we would also like to learn the transition probabilities as a function of time. We continue to use the simulated network to demonstrate. As stated before, this simulator synthetically generates the traffic flow in a set of city streets, allowing for stop signs, traffic lights, and differences in traffic volume (i.e. morning and afternoon rush hours have a higher volume, as does lunch traffic). The network includes 101 cameras located at road intersections (including cross and T intersections). For each camera, there are two observers that look in opposite directions along the traffic flow (i.e. observers 1 and 2 belong to camera 1, observers 3 and 4 belong to camera 2, etc.), so there are 202 observers in total. Every observer can be treated as a source/sink. Tracking data has been simulated 24 hours a day for 7 week days, including 2597 vehicles. The learned transition probabilities are shown in Figure 7-1. The number means the observer; the width of each link is proportional to the magnitude of the transition probability — the thicker the link, the higher the transition probability between the two observers. We have studied the transition probability for every two hours from morning to evening. Figure 7-2 shows the transition probability of the network between 8am and 9am, Figure 7-3 between 12pm and 1pm, and Figure 7-4 between 6pm and 7pm. From these figures, we can see that the transition probability does change with time. The traffic patterns do not remain the same for all types of vehicles either: we would not expect bus traffic to have the same pattern as gas truck traffic. In other words, the transition probabilities of the network change with the kind of vehicle, so we also learn the transition probabilities as a function of vehicle type. In the simulator, there are seven kinds of vehicles: sedan, bus, SUV, minivan, pickup, gas truck, and panel truck. The results are shown in Figures 7-5 to 7-8: Figure 7-5 shows the transition probabilities for sedans, Figure 7-6 for buses, Figure 7-7 for gas trucks, and Figure 7-8 for panel trucks. From these figures, we can see that different types of vehicles have quite different transition probabilities. The gas truck in particular shows activity only in limited regions.

Figure 7-1: Transition probability of the network. The number means the observer. The width of the link is proportional to the magnitude of the transition probability. The thicker the link, the higher the transition probability between the two observers.

Figure 7-2: Transition probability of the network between 8am and 9am.

Figure 7-3: Transition probability of the network between 12pm and 1pm.
Figure 7-1: Transition probability of the network. Each number labels an observer; the width of a link is proportional to the magnitude of the transition probability, so the thicker the link, the higher the transition probability between the two observers.

Figure 7-2: Transition probability of the network between 8am and 9am (legend as in Figure 7-1).

Figure 7-3: Transition probability of the network between 12pm and 1pm (legend as in Figure 7-1).

Figure 7-4: Transition probability of the network between 6pm and 7pm (legend as in Figure 7-1).

Figure 7-5: Transition probability of the network for the sedan (legend as in Figure 7-1).

Figure 7-6: Transition probability of the network for the bus (legend as in Figure 7-1).

Figure 7-7: Transition probability of the network for the gas truck (legend as in Figure 7-1).

Figure 7-8: Transition probability of the network for the panel truck (legend as in Figure 7-1).

7.2 Source and Sink Learning

In the field of view, vehicles tend to appear and disappear at certain locations. These locations may correspond to garage entrances or to the edge of a camera view, and have been called sources and sinks, respectively [20]. Source/sink information is also important for motion pattern analysis: it gives an overall sense of which types of vehicles tend to appear and disappear at specific locations, and it reveals the existence of certain infrastructure. For example, if we observe that gas trucks always disappear at one location, we may conclude that there is likely a gas station nearby. It also helps us detect some unusual events. In this section we therefore learn the source and sink distribution for all the data, and also the source and sink distribution for each type of vehicle.

Figures 7-9, 7-10 and 7-11 show the source and sink distributions for all the tracking data, for the gas-truck tracks, and for the panel-truck tracks, respectively. In the figures, the size and color of each number correspond to the probability that the observer is a source or a sink: the larger and brighter the number, the higher that probability. From these figures we can see that different vehicle types yield different source/sink distributions. For example, the gas truck tends to appear only at observers 200 and 75, and to disappear only at observers 188 and 71; the panel truck, by contrast, tends to appear at observers 17, 172, 187, etc., and to disappear at observers 18, 171, 188, etc.

Figure 7-9: Source and sink distribution. The size and color of each number correspond to the probability that the observer is a source or a sink.

Figure 7-10: Gas truck source and sink distribution (legend as in Figure 7-9).

Figure 7-11: Panel truck source and sink distribution (legend as in Figure 7-9).
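A sketch of how these distributions could be tallied from the tracking data, under the same hypothetical track fields as in the previous sketch: the first observer on each track counts toward the source distribution and the last toward the sink distribution, overall and per vehicle type.

import numpy as np
from collections import defaultdict

def source_sink(tracks, m):
    """Empirical source/sink distributions over m observers,
    overall ("all") and broken down by vehicle type."""
    src = defaultdict(lambda: np.zeros(m))
    snk = defaultdict(lambda: np.zeros(m))
    for t in tracks:
        for key in ("all", t.vehicle_type):   # overall + per-type tallies
            src[key][t.observers[0]] += 1     # where the track appeared
            snk[key][t.observers[-1]] += 1    # where it disappeared
    norm = lambda v: v / v.sum() if v.sum() > 0 else v
    return ({k: norm(v) for k, v in src.items()},
            {k: norm(v) for k, v in snk.items()})

# e.g. src, snk = source_sink(tracks, 202); snk["gas truck"][106] -> ~0.02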
7.3 Unusual Track Detection

The ultimate goal of most surveillance systems is the automatic detection of unusual activities, thereby triggering alarms. How do we define unusual? The Merriam-Webster dictionary defines "unusual" as "uncommon, rare". For motion pattern analysis, unusual tracks are tracks that differ from the normal ones. Given the large number of observations, once we have the statistics (i.e., the normal patterns) of the tracking data, we can examine a track in the following respects:

1. Does the track include an unusual path given the time and vehicle type? In other words, is the track associated with a very low transition probability?
2. Does the track come from an unusual source, or disappear at an unusual sink, given the vehicle type?
3. Does the track have a repeated pattern?

If the answer to any of these questions is yes, the track is flagged. Below we discuss in detail how to detect an unusual track using two examples; a sketch of a detector combining the three tests follows the two cases.

Case 1: For the surveillance problem, a gas truck is a special kind of vehicle. In most cases it follows a specific route, say appearing at a certain location and driving to the gas station, then reappearing from the gas station and driving to a specific sink. When it deviates from this routine route, there may be a situation we should pay attention to. For example, suppose we detect a gas truck with the following sequence: observer 72, observer 70, observer 104, observer 106. From the source and sink distribution we know that, for a gas truck, the probability of disappearing at observer 106 is 0.02, which is very small. We may flag this as an unusual track.

Case 2: For the surveillance problem, we should also pay attention to repeated motion patterns. Consider this scenario: a terrorist plans to bomb a critical asset, say a power supply facility. He transports supplies in a pickup truck from his hideout to the parking lot of the facility in several separate, consecutive trips to load and unload supplies. The track looks like: observer 50, 58, 62, 149, 150, 61, 57, 49, 50, 58, 62, 149, 150, 61, 57, 49, 50, 58, 62, 149, 150, 61, 57, 49, 50, 58, 62, 149, 150. This track repeats under cameras 25, 29, 31, and 75 seven times, and we need to find and flag it. In the one week of data there is, among pickup trucks, exactly 1 track with a pattern repeated 7 times, 8 tracks with a pattern repeated 6 times, and 46 tracks with a pattern repeated 5 times. The track of interest ranks first among all the repeated tracks and stands out from the rest; therefore, we can flag it as unusual.
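Putting the three tests together, here is a minimal sketch of a track-level detector (our own simplification, not the thesis's method; thresholds are illustrative and the track fields are the same hypothetical ones as above). It flags a track if any step has very low transition probability for its vehicle type, if its source or sink is unlikely for that type, or if a short cycle of observers repeats many times.

def is_unusual(track, P_by_type, src, snk,
               p_min=0.01, q_min=0.05, max_reps=2):
    """Flag a track whose transitions, endpoints, or repetition
    pattern are unusual for its vehicle type."""
    s, v = track.observers, track.vehicle_type  # s: list of observer indices
    P = P_by_type[v]
    # Test 1: any individual transition with very low probability?
    if any(P[i, j] < p_min for i, j in zip(s[:-1], s[1:])):
        return True
    # Test 2: unlikely source or sink for this vehicle type?
    if src[v][s[0]] < q_min or snk[v][s[-1]] < q_min:
        return True
    # Test 3: does a short cycle of observers repeat many times?
    for period in range(2, len(s) // 2 + 1):
        reps = 1
        while ((reps + 1) * period <= len(s)
               and s[:period] == s[reps * period:(reps + 1) * period]):
            reps += 1
        if reps > max_reps:
            return True
    return False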
Chapter 8

Summary and Discussion

In this thesis we have studied how to learn the motion patterns of vehicles using far-field vehicle tracking data. The first and most important step is to recover the network's topology. To solve this problem, we proposed a weighted cross-correlation technique. First, an appearance model is constructed from the combination of the normalized color model and the overall size model, to measure a moving object's appearance similarity across non-overlapping views. Then, based on the similarity in appearance, the votes are weighted to exploit the temporally correlating information. From the learned correlation function, the possible links between disjoint views can be detected and the associated transition times estimated. Based on the learned cross correlation, the network topology can then be recovered from the estimated mutual information.

This method combines the appearance information and the statistical information of the observed trajectories, which overcomes the disadvantages of approaches that use only one of them. It avoids camera calibration and avoids solving the tracking correspondence between disjoint views. However, our algorithm is based on three assumptions: (a) the appearance of the moving objects does not change; (b) the objects move at a roughly constant velocity; (c) the trajectories of the moving objects are highly correlated across non-overlapping views. If any of these three assumptions fails, the proposed algorithm may produce incorrect results. Another limitation of this method is that it can only recover a topology with a single dominant transition time between disjoint views; if the transition time is multi-modal, one possible remedy is to estimate the mutual information directly, which means estimating the joint and marginal distributions of the variables.

After we discover the topology of the network, we gather statistics about motion patterns in this distributed camera setting. This allows us to record site usage statistics, to classify types of activities, and to detect unusual movements. First, we fit a first-order Markov model to the network to learn the transition probabilities from node to node, as a function of time as well as of vehicle type. Then we infer the source/sink distribution, overall and as a function of vehicle type. Finally, we explore how to detect unusual tracks using the information we have inferred.

In the future, we would like to explore the problem of recovering a network topology with multiple transition times; estimating the mutual information directly, from the joint and marginal distributions of the variables, is a promising approach in that setting. Another direction is to find a more general way to define unusual tracks using the statistics of the tracking data.

Bibliography

[1] Chris Stauffer and Eric Grimson, "Learning Patterns of Activity Using Real-Time Tracking", IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 22(8):747-757, 2000.

[2] D. B. Reid, "An Algorithm for Tracking Multiple Targets", IEEE Transactions on Automatic Control, 24(6):843-854, December 1979.

[3] Chris Stauffer, "Adaptive Background Mixture Models for Real-Time Tracking", Proc. IEEE Conf. on Computer Vision and Pattern Recognition, July 1999.

[4] M. Isard and J. MacCormick, "BraMBLe: A Bayesian Multiple-Blob Tracker", IEEE International Conference on Computer Vision, October 2001.

[5] O. Williams, A. Blake, and R. Cipolla, "A Sparse Probabilistic Learning Algorithm for Real-Time Tracking", IEEE International Conference on Computer Vision, October 2003.

[6] Chris Stauffer and Kinh Tieu, "Automated Multi-Camera Planar Tracking Correspondence Modeling", Proc. IEEE Conf. on Computer Vision and Pattern Recognition, July 2003.

[7] T. Huang and S. Russell, "Object Identification in a Bayesian Context", Proc. of IJCAI, 1997.

[8] Omar Javed, Zeeshan Rasheed, Khurram Shafique, and Mubarak Shah, "Tracking Across Multiple Cameras with Disjoint Views", IEEE International Conference on Computer Vision, October 2003.
[19] Biswajit Bose and Eric Grimson, "Ground Plane Rectification by Tracking Moving Objects", Proc. Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance (VS-PETS), Nice, France, October 2003.

[20] Chris Stauffer, "Estimating Tracking Sources and Sinks", Conference on Computer Vision and Pattern Recognition Workshop, 2003.

[21] T. M. Cover and J. A. Thomas, "Elements of Information Theory", Wiley, 1991.

[22] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, "Introduction to Algorithms", MIT Press, 1990.

[23] M. Singhal, "A Dynamic Information-Structure Mutual Exclusion Algorithm for Distributed Systems", IEEE Transactions on Parallel and Distributed Systems, 1992.

[24] D. R. Brillinger, "Second-Order Moments and Mutual Information in the Analysis of Time Series", in Recent Advances in Statistical Methods, Imperial College Press, London, 2002.

[25] S. Kullback, "Information Theory and Statistics", Dover, 1968.

[26] Wentian Li, "Mutual Information Functions Versus Correlation Functions", Journal of Statistical Physics, 60(5-6):823-837, 1990.

[27] Finn Jensen, "An Introduction to Bayesian Networks", Springer, 1996.

[28] Queensland University of Technology, "Introduction to Markov Chains", courtesy of QUT.

[29] Jeff Bilmes, "What HMMs Can Do", UWEE Technical Report UWEETR-2002-2003, 2002.

[30] Dimitri P. Bertsekas and John N. Tsitsiklis, "Introduction to Probability", Athena Scientific, 2002.