Extraction of key frames from videos

Context-dependent Detection of Unusual Events in Videos by Geometric Analysis of Video Trajectories

Longin Jan Latecki (latecki@temple.edu)
Computer and Information Sciences, Temple University, Philadelphia
Nilesh Ghubade and Xiangdong Wen (nileshg@temple.edu)

Agenda
• Introduction
• Mapping of video to a trajectory
• Relation between motion trajectory and video trajectory
• Discrete curve evolution
• Polygon simplification
• Key frames
• Unusual events in surveillance videos
• Results

Main Tools
• Mapping the video sequence to a polyline in a multi-dimensional space.
• Automatic extraction of relevant frames from videos by polygon simplification with discrete curve evolution.

Mapping of Video to a Trajectory
The image stream is mapped to a trajectory (polyline) in a feature space. Each frame, Frame 0 through Frame N, is represented by its bins Bin 0 through Bin n, and each bin contributes three values: the x-coordinate of the bin's centroid, the y-coordinate of the bin's centroid, and the bin's frequency count.

Used in our experiments: Red-Green-Blue (RGB) Bins
Each frame is a 24-bit color image (8 bits per color intensity):
• Bin 0 = color intensities 0-31
• Bin 1 = color intensities 32-63
• ...
• Bin 7 = color intensities 224-255
Three attributes per bin:
• Row of the bin's centroid
• Column of the bin's centroid
• Frequency count of the bin
(8 bins per color level × 3 attributes per bin) × 3 color levels = 72 features per frame.

Theoretical Results: Motion Trajectory and Video Trajectory
Consider a video in which an object (a set of pixels) is moving on a uniform background. If the object is visible in all frames and moves with a constant speed on a linear trajectory, then the video trajectory in the feature space is a straight line. Likewise, if n objects are moving with constant speeds on linear trajectories, the video trajectory is a straight line in the feature space.
Consider again a video in which a single object (a set of pixels) is moving on a uniform background, now along an arbitrary path. Then the trajectory vectors are contained in a plane.
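A minimal NumPy sketch of the RGB-bin mapping, assuming frames arrive as H × W × 3 uint8 arrays. The attribute order per bin (row, column, count), the channel and bin order, and setting an empty bin's centroid to (0, 0) are assumptions of this sketch; the slides do not fix them. The usage part builds a hypothetical synthetic video (`moving_square_video` is an illustrative helper, not from the slides) of a square moving at constant speed on a uniform background, the setting of the first theoretical result, and checks that consecutive trajectory vertices differ by the same vector, i.e. lie on a straight line.

```python
import numpy as np


def frame_features(frame: np.ndarray, n_bins: int = 8) -> np.ndarray:
    """Map one H x W x 3 uint8 RGB frame to a vector of 3 * n_bins * 3 features.

    For each color channel and each intensity bin, store
    (row of the bin's centroid, column of the bin's centroid, frequency count).
    Assumption: an empty bin gets centroid (0, 0).
    """
    h, w, _ = frame.shape
    rows, cols = np.mgrid[0:h, 0:w]          # pixel coordinates
    bin_width = 256 // n_bins                # 32 intensity levels per bin when n_bins = 8
    feats = []
    for c in range(3):                       # R, G, B
        bins = frame[:, :, c].astype(np.int64) // bin_width   # bin index 0..n_bins-1
        for b in range(n_bins):
            mask = bins == b
            count = int(mask.sum())
            if count:
                feats.extend([rows[mask].mean(), cols[mask].mean(), count])
            else:
                feats.extend([0.0, 0.0, 0])
    return np.asarray(feats, dtype=float)


def video_trajectory(frames) -> np.ndarray:
    """One row per frame: the vertices of the video polyline in feature space."""
    return np.stack([frame_features(f) for f in frames])


def moving_square_video(n_frames=10, size=64, side=8, step=(2, 1)):
    """Hypothetical test video: a square moving at constant speed on a uniform background."""
    frames = []
    for t in range(n_frames):
        f = np.empty((size, size, 3), dtype=np.uint8)
        f[:] = (10, 200, 100)                              # uniform background color
        r, c = 5 + step[0] * t, 5 + step[1] * t            # linear motion, always fully visible
        f[r:r + side, c:c + side] = (250, 30, 60)          # object color
        frames.append(f)
    return frames


traj = video_trajectory(moving_square_video())
print(traj.shape)                                          # (10, 72)
steps = np.diff(traj, axis=0)
print(np.allclose(steps, steps[0]))                        # True: vertices lie on a straight line
```

Because the object stays fully visible, every frequency count stays constant and every bin centroid is an affine function of the object's position. With linear, constant-speed motion the vertices are therefore evenly spaced along a line.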
If n objects are moving, then the dimension of the trajectory is at most 2n. If a new object suddenly appears in the movie, the dimension of the trajectory increases by at least 1 and at most 3.
[Video: MovingDotMovieWithAdditionalDot.avi]

Robust Rank Computation
Using singular value decomposition, based on:
C. Rao, A. Yilmaz, and M. Shah. View-Invariant Representation and Recognition of Actions. Int. J. of Computer Vision 50, 2002.
M. Seitz and C. R. Dyer. View-Invariant Analysis of Cyclic Motion. Int. J. of Computer Vision 16, 1997.

$\mathrm{err}^2(M) = \sum_{i=3}^{n} \sigma_i^2$, where $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_n$ are the singular values of the matrix M formed from the feature vectors of the frames in the window.
We compute err in a window of 11 consecutive frames in our experiments.
[Figure: MovingDotMovieWithAdditionalDotBins: Norm Dist (err) for a window of 11 frames vs. frame number (frames 0-160; y-axis scale ×10^-21).]

Interpolation of Video Trajectory
[Videos: MovingDotMovie_Clockwise.avi, MovingDotMovieWithAdditionalDot.avi]

Polygon Simplification
[Figure: relevance ranking of frame numbers; frames shown with decreasing relevance.]

Discrete Curve Evolution
$P = P^0, \dots, P^m$, where $P^{i+1}$ is obtained from $P^i$ by deleting the vertices of $P^i$ that have minimal relevance measure
$K(v, P^i) = K(u, v, w) = |d(u, v) + d(v, w) - d(u, w)|$,
where u and w are the neighbors of v in $P^i$ and d is the distance between vertices.

Properties of discrete curve evolution:
• Preservation of position, no blurring
• Robustness with respect to noise
• Extraction of linear segments

Key Frame Extraction
Key frames and rank: security1
[Figure: security1 key frames (Bins Matrix and Distance Matrix) and err for the security1 video: Norm Dist for a window of 11 frames vs. frame number (frames 0-400; y-axis scale ×10^-3).]
M. S. Drew and J. Au: http://www.cs.sfu.ca/~mark/ftp/AcmMM00/

Predictability of Video Parts: Local Curveness Computation
We divide the video polygonal curve P into parts T_i. For videos with 25 fps, each T_i contains 25 frames.
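The discrete curve evolution defined above can be sketched for a video trajectory stored as a NumPy array with one row per frame. Assumptions of this sketch: one vertex is deleted per step (ties go to the earliest vertex), the two endpoints are never deleted, and d is the Euclidean distance. The names `relevance` and `dce_ranking` are illustrative. Reading the deletion order backwards gives a relevance ranking, so the first k entries can serve as k key frames. Indices here are 0-based, while the slides number frames from 1.

```python
import numpy as np


def relevance(u, v, w) -> float:
    """K(u, v, w) = |d(u, v) + d(v, w) - d(u, w)| with Euclidean d (assumption)."""
    d = np.linalg.norm
    return float(abs(d(u - v) + d(v - w) - d(u - w)))


def dce_ranking(points: np.ndarray) -> list:
    """Discrete curve evolution on an open polyline (one vertex per frame).

    Each step deletes the interior vertex of minimal relevance K(v, P^i);
    endpoints are kept. Returns vertex indices in order of decreasing
    relevance: the endpoints, then the vertices that survived longest.
    """
    alive = list(range(len(points)))          # vertices of the current polygon P^i
    removed = []
    while len(alive) > 2:
        ks = [relevance(points[alive[j - 1]], points[alive[j]], points[alive[j + 1]])
              for j in range(1, len(alive) - 1)]
        j = 1 + int(np.argmin(ks))            # position of the least relevant interior vertex
        removed.append(alive.pop(j))
    return alive + removed[::-1]


# A small polyline with a sharp corner at vertex 3 and a slight bump at vertex 1.
pts = np.array([[0, 0], [1, 0.1], [2, 0], [3, 0], [3, 1], [3, 2]], dtype=float)
print(dce_ranking(pts))                       # [0, 5, 3, 1, 2, 4]
key_frames = dce_ranking(pts)[:3]             # the three most relevant vertices
```

The corner (vertex 3) outranks the small bump (vertex 1), and collinear vertices go first. This is the behavior that makes the surviving vertices good key frame candidates.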
We apply discrete curve evolution to each T_i until three points remain: a, b, c. The curveness measure of T_i is
$C(T_i, P) = |d(a, b) + d(b, c) - d(a, c)|$,
where b is the most relevant frame in T_i and becomes the first vertex of T_{i+1}.

security7
[Figure: err for security7: Norm Dist for a window of 11 frames vs. frame number (frames 0-400; y-axis scale ×10^-4).]
[Figure: 2D projection by PCA of the video trajectory for security7.]

Mov3: Rustam waving his hand.
[Figure: err for Mov3: Norm Dist for a window of 11 frames vs. frame number (frames 0-400; y-axis scale ×10^-4).]
Key frames (Bins Matrix) = 1, 378, 52, 142, 253, 235, 148, 31, 155, 167
Key frames (Distance Matrix) = 1, 378, 253, 220, 161, 109, 50, 155, 149, 270

Hall_monitor: two persons entering and exiting a hall.
[Figure: err for hall_monitor: Norm Dist for a window of 11 frames vs. frame number (frames 0-300; y-axis scale ×10^-5).]
Key frames (Bins Matrix) = 1, 300, 35, 240, 221, 215, 265, 241, 278, 280
Key frames (Distance Matrix) = 1, 300, 37, 265, 241, 240, 235, 278, 280, 282

[Video: CameraAtLightSignal.avi]

Multimodal Histogram
[Figures: original image of Lena; gray-scale image; its multimodal histogram; segmented image.]
After segmentation we get an outline of her face, hat, etc.

Thank you
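The two unusual-event measures used in this deck, the windowed residual err from the robust rank computation (11-frame windows) and the local curveness C(T_i, P), can be sketched in a few lines. Assumptions: the feature vectors in each window are mean-centered before the SVD, so err measures how far the window departs from a plane (the slides do not say whether M is centered); d is Euclidean; DCE deletes one vertex per step; and each part T_i starts at the vertex b found in the previous part, so consecutive parts overlap. All function names are hypothetical, and the test trajectories are low-dimensional stand-ins for real 72-dimensional ones.

```python
import numpy as np


def window_err(traj: np.ndarray, window: int = 11) -> np.ndarray:
    """err(M) for every run of `window` consecutive frames of the trajectory.

    M = the window's feature vectors (one row per frame), mean-centered
    (an assumption). err^2(M) = sum of sigma_i^2 for i >= 3 (1-based).
    """
    errs = []
    for start in range(len(traj) - window + 1):
        m = traj[start:start + window]
        m = m - m.mean(axis=0)                      # assumption: center the window
        s = np.linalg.svd(m, compute_uv=False)      # singular values, descending
        errs.append(np.sqrt(np.sum(s[2:] ** 2)))    # sigma_3, sigma_4, ...
    return np.asarray(errs)


def relevance(u, v, w) -> float:
    """K(u, v, w) = |d(u, v) + d(v, w) - d(u, w)| with Euclidean d."""
    d = np.linalg.norm
    return float(abs(d(u - v) + d(v - w) - d(u - w)))


def part_curveness(part: np.ndarray):
    """Evolve one part T_i by DCE until three vertices a, b, c remain.

    Returns (index of b inside the part, C(T_i, P) = K(a, b, c)).
    """
    alive = list(range(len(part)))
    while len(alive) > 3:
        ks = [relevance(part[alive[j - 1]], part[alive[j]], part[alive[j + 1]])
              for j in range(1, len(alive) - 1)]
        alive.pop(1 + int(np.argmin(ks)))           # drop the least relevant interior vertex
    a, b, c = alive
    return b, relevance(part[a], part[b], part[c])


def local_curveness(traj: np.ndarray, part_len: int = 25) -> list:
    """Curveness of consecutive parts T_i of `part_len` frames (25 at 25 fps)."""
    values, start = [], 0
    while start + part_len <= len(traj):
        b, c = part_curveness(traj[start:start + part_len])
        values.append(c)
        start += b                                  # b of T_i is the first vertex of T_{i+1}
    return values


t = np.arange(60, dtype=float)
circle = np.stack([np.cos(t / 5), np.sin(t / 5), np.zeros_like(t)], axis=1)  # planar motion
helix = np.stack([np.cos(t / 5), np.sin(t / 5), t / 10], axis=1)             # leaves the plane
print(np.allclose(window_err(circle), 0))           # True
print(bool((window_err(helix) > 1e-6).all()))       # True

bend = np.array([[min(k, 30), max(k - 30, 0)] for k in range(60)], dtype=float)
print(local_curveness(bend))                        # nonzero only for the part containing the corner
```

A planar path keeps err at zero while a path that leaves the plane does not, mirroring the dimension argument in the theoretical results. The curveness values stay at zero on straight stretches and become positive only for the part that contains the turn.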
