Context-dependent Detection of Unusual Events in Videos by Geometric Analysis of Video Trajectories

Longin Jan Latecki (latecki@temple.edu)
Computer and Information Sciences, Temple University, Philadelphia
Nilesh Ghubade and Xiangdong Wen (nileshg@temple.edu)

Agenda
• Introduction
• Mapping of video to a trajectory
• Relation: motion trajectory and video trajectory
• Discrete curve evolution
• Polygon simplification
• Key frames
• Unusual events in surveillance videos
• Results

Main Tools
• Mapping the video sequence to a polyline in a multi-dimensional space.
• The automatic extraction of relevant frames from videos is based on polygon simplification by discrete curve evolution.

Mapping of video to a trajectory
Mapping of the image stream to a trajectory (polyline) in a feature space.
Each frame (Frame 0, ..., Frame N) is represented by its bins (Bin 0, ..., Bin n), with three values per bin:
• The bin’s frequency count
• X-coordinate of the bin’s centroid
• Y-coordinate of the bin’s centroid

Used in our experiments: Red-Green-Blue (RGB) bins
Each frame is a 24-bit color image (8 bits per color intensity):
• Bin 0 = color intensities from 0-31
• Bin 1 = color intensities from 32-63
• ...
• Bin 7 = color intensities from 224-255
Three attributes per bin:
• Row of the bin’s centroid
• Column of the bin’s centroid
• Frequency count of the bin
(8 bins per color level * 3 attributes per bin) * 3 color levels = 72 features

Theoretical Results: motion trajectory and video trajectory
Consider a video in which an object (a set of pixels) is moving on a uniform background. The object is visible in all frames and it is moving with a constant speed on a linear trajectory. Then the video trajectory in the feature space is a straight line.
If n objects are moving with constant speeds on linear trajectories, then the video trajectory is a straight line in the feature space.

Theoretical Results: motion trajectory and video trajectory
Consider a video in which an object (a set of pixels) is moving on a uniform background. Then the trajectory vectors are contained in a plane.
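The bin representation and the two statements above (a straight line for constant-speed linear motion, a plane for general motion of one object) can be checked numerically on synthetic frames. The following is a minimal sketch under assumptions that are not in the slides: empty bins are encoded as zeros, the object is a small bright square, and the helper names bin_features, synthetic_video and trajectory_rank are hypothetical.

```python
import numpy as np

def bin_features(frame, n_bins=8):
    """Map an H x W x 3 uint8 frame to a vector of n_bins * 3 * 3 values.

    For every color channel and intensity bin: centroid row, centroid column,
    pixel count. Empty bins are encoded as three zeros (an assumption).
    """
    height, width, channels = frame.shape
    rows, cols = np.indices((height, width))
    bin_width = 256 // n_bins                 # 32 intensity levels per bin
    feats = []
    for c in range(channels):
        bin_index = frame[:, :, c] // bin_width
        for b in range(n_bins):
            mask = bin_index == b
            count = int(mask.sum())
            if count == 0:
                feats += [0.0, 0.0, 0.0]
            else:
                feats += [rows[mask].mean(), cols[mask].mean(), float(count)]
    return np.array(feats)

def synthetic_video(positions, size=64, side=4):
    """Frames with one bright square at the given (row, col) positions."""
    frames = []
    for r, c in positions:
        frame = np.full((size, size, 3), 10, dtype=np.uint8)   # uniform background
        frame[r:r + side, c:c + side] = 200                    # the moving object
        frames.append(frame)
    return frames

def trajectory_rank(positions):
    """Shape of the video trajectory and rank of its consecutive step vectors."""
    trajectory = np.array([bin_features(f) for f in synthetic_video(positions)])
    steps = np.diff(trajectory, axis=0)
    return trajectory.shape, np.linalg.matrix_rank(steps, tol=1e-8)

# Constant speed on a line: all steps are equal, so they span one dimension.
line = [(5 + t, 5 + 2 * t) for t in range(20)]
shape_line, rank_line = trajectory_rank(line)

# Motion on a circle: the steps are no longer parallel but stay in a plane.
angles = 2 * np.pi * np.arange(20) / 20
circle = [(30 + int(np.rint(15 * np.cos(a))), 30 + int(np.rint(15 * np.sin(a))))
          for a in angles]
shape_circle, rank_circle = trajectory_rank(circle)

print(shape_line, rank_line, rank_circle)
```

In this setup every feature is an affine function of the square's position, which is why the step vectors of the linear motion should have rank 1 and those of the circular motion rank 2, even though each frame is mapped to 72 features.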
If n objects are moving, then the dimension of the trajectory is at most 2n.
If a new object suddenly appears in the movie, the dimension of the trajectory increases by at least 1 and at most 3.
MovingDotMovieWithAdditionalDot.avi

Robust Rank Computation
Using singular value decomposition, based on:
• C. Rao, A. Yilmaz, and M. Shah. View-Invariant Representation and Recognition of Actions. Int. J. of Computer Vision 50, 2002.
• S. M. Seitz and C. R. Dyer. View-invariant analysis of cyclic motion. Int. J. of Computer Vision 16, 1997.
err^2(M) = \sum_{i=3}^{n} \sigma_i^2, where \sigma_1 >= ... >= \sigma_n are the singular values of M.
We compute err in a window of 11 consecutive frames in our experiments.

MovingDotMovieWithAdditionalDot.avi
[Figure: MovingDotMovieWithAdditionalDotBins, norm distance (err) for a window of 11 frames versus frame number; vertical axis scaled by 10^-21.]

Interpolation of video trajectory
MovingDotMovie_Clockwise.avi
MovingDotMovieWithAdditionalDot.avi

Polygon simplification
Frames with decreasing relevance:
Relevance Ranking    Frame Number
0                    1
1                    100
98                   12
99                   5

Discrete Curve Evolution
P = P^0, ..., P^m
P^(i+1) is obtained from P^i by deleting the vertices of P^i that have minimal relevance measure
K(v, P^i) = K(u, v, w) = |d(u,v) + d(v,w) - d(u,w)|,
where u and w are the neighbors of v in P^i.

Discrete Curve Evolution: preservation of position, no blurring
Discrete Curve Evolution: robustness with respect to noise
Discrete Curve Evolution: extraction of linear segments

Key Frame Extraction
Key frames and rank

Security1
Bins Matrix
Distance Matrix
[Figure: err for security1 video; security1Bins, norm distance for a window of 11 frames versus frame number; vertical axis scaled by 10^-3.]
M. S. Drew and J. Au: http://www.cs.sfu.ca/~mark/ftp/AcmMM00/

Predictability of video parts: Local Curveness computation
We divide the video polygonal curve P into parts T_i. For videos with 25 fps, each T_i contains 25 frames.
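The windowed rank error from the Robust Rank Computation slide can be sketched with a singular value decomposition. This is only an illustration under stated assumptions: err^2 is taken to be the sum of the squared singular values from the third one on (the residual of the best rank-2 fit), and M is taken to hold the trajectory points of one window relative to the window's first frame. The slides do not spell out how M is built, and the names rank_error and windowed_rank_error are hypothetical.

```python
import numpy as np

def rank_error(M, rank=2):
    """Sum of squared singular values beyond the first `rank` ones."""
    s = np.linalg.svd(M, compute_uv=False)    # singular values, descending
    return float(np.sum(s[rank:] ** 2))

def windowed_rank_error(trajectory, window=11, rank=2):
    """Rank error for every window of consecutive trajectory points."""
    trajectory = np.asarray(trajectory, dtype=float)
    errors = []
    for start in range(len(trajectory) - window + 1):
        chunk = trajectory[start:start + window]
        M = chunk[1:] - chunk[0]              # assumed construction of M
        errors.append(rank_error(M, rank))
    return np.array(errors)

# Synthetic 5-dimensional trajectory: planar motion, then from frame 30 on
# an extra component appears along a third direction.
t = np.arange(60, dtype=float)
points = np.zeros((60, 5))
points[:, 0] = t
points[:, 1] = np.sin(t)
points[30:, 2] = t[30:] - 29.0

err = windowed_rank_error(points, window=11)
print(err[:20].max(), err[25])
```

Windows that lie entirely in the planar part give an error at the level of rounding noise, while windows that straddle frame 30 give a clearly positive value, which is the kind of peak the err plots are meant to expose.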
We apply discrete curve evolution to each T_i until three points remain: a, b, c.
Curveness measure of T_i: C(T_i, P) = |d(a, b) + d(b, c) - d(a, c)|
b is the most relevant frame in T_i and the first vertex of T_{i+1}.

security7
[Figure: err for security7; security7Bins, norm distance for a window of 11 frames versus frame number; vertical axis scaled by 10^-4.]
[Figure: 2D projection by PCA of video trajectory for security7.]

Mov3
[Figure: Mov3Bins, norm distance for a window of 11 frames versus frame number; vertical axis scaled by 10^-4.]
Mov3: Rustam waving his hand.
Bins Matrix: Key frames = 1 378 52 142 253 235 148 31 155 167
Distance Matrix: Key frames = 1 378 253 220 161 109 50 155 149 270

Hall_monitor
[Figure: err for hall_monitor; HallMonitorBins, norm distance for a window of 11 frames versus frame number; vertical axis scaled by 10^-5.]
Hall Monitor: two persons entering and exiting a hall.
Bins Matrix: Key frames = 1 300 35 240 221 215 265 241 278 280
Distance Matrix: Key frames = 1 300 37 265 241 240 235 278 280 282

CameraAtLightSignal.avi

Multimodal Histogram
• Original image of Lena
• Gray-scale image: multimodal
• Histogram of Lena
• Segmented image
Image after segmentation: we get an outline of her face, hat, etc.

Thank you
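The local curveness computation from the slides (discrete curve evolution on each part T_i until three points a, b, c remain) can be sketched as follows. This is a simplified illustration, not the authors' implementation: it deletes one vertex per step, always keeps the two endpoints of a part, uses the Euclidean distance for d, and splits the trajectory into fixed non-overlapping parts instead of starting T_{i+1} at b. The function names are hypothetical.

```python
import numpy as np

def relevance(u, v, w):
    """K(u, v, w) = |d(u, v) + d(v, w) - d(u, w)| with Euclidean d."""
    return abs(np.linalg.norm(u - v) + np.linalg.norm(v - w)
               - np.linalg.norm(u - w))

def evolve(points, keep=3):
    """Delete the least relevant interior vertex until `keep` remain.

    Returns the indices of the surviving vertices; endpoints are never
    deleted (an assumption of this sketch).
    """
    idx = list(range(len(points)))
    while len(idx) > keep:
        scores = [relevance(points[idx[j - 1]], points[idx[j]], points[idx[j + 1]])
                  for j in range(1, len(idx) - 1)]
        del idx[1 + int(np.argmin(scores))]
    return idx

def curveness(part):
    """Curveness of one part and the index of its most relevant frame b."""
    part = np.asarray(part, dtype=float)
    ia, ib, ic = evolve(part, keep=3)
    return relevance(part[ia], part[ib], part[ic]), ib

def local_curveness(trajectory, part_len=25):
    """Curveness of consecutive fixed-length parts (25 frames at 25 fps)."""
    trajectory = np.asarray(trajectory, dtype=float)
    return [curveness(trajectory[s:s + part_len])
            for s in range(0, len(trajectory) - part_len + 1, part_len)]

# One straight part followed by one part with a sharp corner at its frame 12.
straight = np.array([(float(t), 0.0) for t in range(25)])
bent = np.array([(25.0 + min(t, 12), float(max(t - 12, 0))) for t in range(25)])
results = local_curveness(np.vstack([straight, bent]))
print(results)
```

A part that follows a straight segment gets a curveness near zero, while the part containing the corner gets a large value and reports the corner frame as b, which matches the idea that predictable video parts have low curveness.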