Extraction of key frames from videos

Context-dependent Detection of Unusual Events in Videos by Geometric Analysis of Video Trajectories
Longin Jan Latecki
(latecki@temple.edu)
Computer and Information Sciences
Temple University, Philadelphia
Nilesh Ghubade and Xiangdong Wen
(nileshg@temple.edu)
Agenda
• Introduction
• Mapping of video to a trajectory
• Relation: motion trajectory ↔ video trajectory
• Discrete curve evolution
• Polygon simplification
• Key frames
• Unusual events in surveillance videos
• Results
Main Tools
• Mapping the video sequence to a polyline in a multi-dimensional space.
• The automatic extraction of relevant frames from videos is based on polygon simplification by discrete curve evolution.
Mapping of video to a trajectory
• Mapping of the image stream to a trajectory (polyline) in a feature space.
• Representing each frame as a sequence of per-bin attributes:

Frame 0: Bin 0 ... Bin n
...
Frame N: Bin 0 ... Bin n

Each bin contributes three values: the x-coordinate of the bin's centroid, the y-coordinate of the bin's centroid, and the bin's frequency count.
Used in our experiments
• Red-Green-Blue (RGB) bins

Each frame is a 24-bit color image (8 bits per color channel):
• Bin 0 = color intensities 0-31
• Bin 1 = color intensities 32-63
• ...
• Bin 7 = color intensities 224-255

Three attributes per bin:
• Row of the bin's centroid
• Column of the bin's centroid
• Frequency count of the bin

(8 bins per color channel * 3 attributes per bin) * 3 color channels = 72 features
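
The feature layout described above can be sketched in a few lines of NumPy. This is only an illustration, not the code used in the experiments: the function name `frame_features`, the ordering of the 72 values, and the convention that an empty bin contributes (0, 0, 0) are all assumptions.

```python
import numpy as np

def frame_features(frame, bins_per_channel=8):
    """Return the 72 features of one H x W x 3 uint8 frame.

    For every color channel and every intensity bin, three attributes are
    stored: row of the bin's centroid, column of the bin's centroid, and
    the bin's pixel count.
    """
    height, width, channels = frame.shape
    bin_width = 256 // bins_per_channel           # 32 intensity levels per bin
    rows, cols = np.indices((height, width))      # coordinates of every pixel
    features = []
    for c in range(channels):
        bin_index = frame[:, :, c] // bin_width   # bin number of every pixel
        for b in range(bins_per_channel):
            mask = bin_index == b
            count = int(mask.sum())
            if count == 0:
                # Assumption: an empty bin is encoded as (0, 0, 0).
                features.extend([0.0, 0.0, 0.0])
            else:
                features.extend([rows[mask].mean(), cols[mask].mean(), float(count)])
    return np.asarray(features)
```

Stacking `frame_features(frame)` for all frames of a video gives one point per frame, and joining consecutive points gives the polyline in the 72-dimensional feature space.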
Theoretical Results:
Motion trajectory ↔ video trajectory
Consider a video in which an object (a set of pixels) is
moving on a uniform background. The object is visible in
all frames and it is moving with a constant speed on a
linear trajectory. Then the video trajectory in the feature
space is a straight line.
If n objects are moving with constant speeds on linear
trajectories, then the video trajectory is a straight line in the
feature space.
Consider a video in which an object (a set of pixels) is
moving on a uniform background.
Then the trajectory vectors are contained in a plane.
If n objects are moving, then the dimension of the
trajectory is at most 2n.
If a new object suddenly appears in the movie, the
dimension of the trajectory increases at least by 1 and
at most by 3.
MovingDotMovieWithAdditionalDot.avi
Robust Rank Computation
Using singular value decomposition, based on:
C. Rao, A. Yilmaz, and M. Shah.
View-Invariant Representation and Recognition of Actions.
Int. J. of Computer Vision 50, 2002.
S. M. Seitz and C. R. Dyer.
View-Invariant Analysis of Cyclic Motion.
Int. J. of Computer Vision 16, 1997.
$\mathrm{err}^2(M) = \sum_{i=3}^{n} \sigma_i^2$, where $\sigma_1 \ge \dots \ge \sigma_n$ are the singular values of M.
We compute err in a window of 11 consecutive frames in our experiments.
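
A minimal sketch of the windowed err computation follows. It assumes that M holds the trajectory vectors of a window, taken here as each frame's feature vector minus the first frame of the window; the slides do not spell out how M is built, so that choice and the names `window_err` and `err_curve` are hypothetical.

```python
import numpy as np

def window_err(features, start, window=11, rank=2):
    """err(M) for the window of frames beginning at index `start`.

    Assumption: the rows of M are the feature vectors of the window minus
    the feature vector of the window's first frame.
    """
    block = np.asarray(features[start:start + window], dtype=float)
    M = block[1:] - block[0]
    sigma = np.linalg.svd(M, compute_uv=False)    # singular values, descending
    # Sum of squared singular values from the third one onwards.
    return float(np.sqrt(np.sum(sigma[rank:] ** 2)))

def err_curve(features, window=11):
    """err for every full window along the video trajectory."""
    features = np.asarray(features, dtype=float)
    return np.array([window_err(features, s, window)
                     for s in range(len(features) - window + 1)])
```

For a trajectory that stays in a plane, err is zero up to rounding; a window in which the trajectory leaves that plane, for example because a new object appears, gives a larger value.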
MovingDotMovieWithAdditionalDot.avi
[Figure: MovingDotMovieWithAdditionalDotBins. Norm Dist for a window of 11 frames versus frame number (y-axis scaled by 10^-21; frames 0 to 160).]
Interpolation of video trajectory
MovingDotMovie_Clockwise.avi
MovingDotMovieWithAdditionalDot.avi
Polygon simplification
Relevance Ranking    Frame Number
0                    1
1                    100
...                  ...
98                   12
99                   5

Frames are listed in order of decreasing relevance.
Discrete Curve Evolution
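P = P_0, ..., P_m
P_{i+1} is obtained from P_i by deleting the vertices of P_i that have minimal relevance measure
K(v, P_i) = K(u, v, w) = |d(u, v) + d(v, w) - d(u, w)|,
where u and w are the neighbors of v in P_i.
[Figure: vertex v with its neighbors u and w.]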
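
The evolution step can be sketched as below. This is a simplified illustration under stated assumptions: d is taken to be the Euclidean distance, one vertex is removed per iteration (the first one in case of ties), and the two endpoints are never removed. The names `relevance` and `discrete_curve_evolution` are hypothetical.

```python
import numpy as np

def dist(a, b):
    """Euclidean distance between two points of the feature space."""
    return float(np.linalg.norm(a - b))

def relevance(u, v, w):
    """K(u, v, w) = |d(u, v) + d(v, w) - d(u, w)|."""
    return abs(dist(u, v) + dist(v, w) - dist(u, w))

def discrete_curve_evolution(points, keep):
    """Delete the least relevant interior vertex until `keep` vertices remain.

    Returns the indices of the surviving vertices in their original order.
    """
    points = np.asarray(points, dtype=float)
    alive = list(range(len(points)))
    while len(alive) > max(keep, 2):
        scores = [relevance(points[alive[j - 1]], points[alive[j]], points[alive[j + 1]])
                  for j in range(1, len(alive) - 1)]
        del alive[1 + int(np.argmin(scores))]     # +1 skips the first endpoint
    return alive
```

Applied to a video trajectory, the surviving indices are the candidate key frames, and the order in which vertices are deleted gives a relevance ranking of the frames.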
Discrete Curve Evolution:
Preservation of position, no blurring
Discrete Curve Evolution:
robustness with respect to noise
Discrete Curve Evolution:
extraction of linear segments
Key Frame Extraction
Key frames and rank
Security1
Bins Matrix
Distance Matrix
err for security1 video
[Figure: security1Bins. Norm Dist for a window of 11 frames versus frame number (y-axis scaled by 10^-3; frames 0 to 400).]
M. S. Drew and J. Au: http://www.cs.sfu.ca/~mark/ftp/AcmMM00/
Predictability of video parts:
Local Curveness computation
We divide the video polygonal curve P into parts T_i.
For videos with 25 fps: T_i contains 25 frames.
We apply discrete curve evolution to each T_i
until three points remain: a, b, c.
Curveness measure of T_i:
C(T_i,P) = |d(a, b) + d(b, c) - d(a, c)|
b is the most relevant frame in T_i
and the first vertex of T_{i+1}
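
The local curveness computation can be sketched as follows, assuming Euclidean d and one vertex removed per evolution step. Starting each new part at the frame b of the previous part follows the description above; ignoring trailing frames that do not fill a whole part is an assumption of this sketch, as are the names `reduce_to_three` and `local_curveness`.

```python
import numpy as np

def dist(a, b):
    return float(np.linalg.norm(a - b))

def reduce_to_three(part):
    """Evolve one part until three vertices a, b, c remain; return their indices."""
    alive = list(range(len(part)))
    while len(alive) > 3:
        scores = [abs(dist(part[alive[j - 1]], part[alive[j]])
                      + dist(part[alive[j]], part[alive[j + 1]])
                      - dist(part[alive[j - 1]], part[alive[j + 1]]))
                  for j in range(1, len(alive) - 1)]
        del alive[1 + int(np.argmin(scores))]
    return alive

def local_curveness(trajectory, part_len=25):
    """Return a list of (start_frame, frame_of_b, C) for successive parts T_i."""
    trajectory = np.asarray(trajectory, dtype=float)
    results, start = [], 0
    while start + part_len <= len(trajectory):
        part = trajectory[start:start + part_len]
        ia, ib, ic = reduce_to_three(part)
        a, b, c = part[ia], part[ib], part[ic]
        curveness = abs(dist(a, b) + dist(b, c) - dist(a, c))
        results.append((start, start + ib, curveness))
        start += ib                               # b is the first vertex of the next part
    return results
```

Parts with a curveness close to zero are nearly straight and therefore predictable; parts with a large curveness are candidates for unusual events.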
security7
err for security7
[Figure: security7Bins. Norm Dist for a window of 11 frames versus frame number (y-axis scaled by 10^-4; frames 0 to 400).]
2D projection by PCA of video trajectory for security7
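
A 2D view of a video trajectory can be obtained roughly as follows. This is a generic PCA via singular value decomposition, not necessarily the procedure used for the slide; the function name `pca_project_2d` is hypothetical and no feature scaling is applied.

```python
import numpy as np

def pca_project_2d(trajectory):
    """Project an (n_frames x n_features) trajectory onto its first two principal axes."""
    X = np.asarray(trajectory, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T
```

Plotting the two returned columns against each other, with consecutive frames connected, shows the polyline in the plane of largest variance.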
Mov3
[Figure: Mov3Bins. Norm Dist for a window of 11 frames versus frame number (y-axis scaled by 10^-4; frames 0 to 400).]
Mov3: Rustam waving his hand.
Bins Matrix: key frames = 1 378 52 142 253 235 148 31 155 167
Distance Matrix: key frames = 1 378 253 220 161 109 50 155 149 270
Hall_monitor
err for hall_monitor
[Figure: HallMonitorBins. Norm Dist for a window of 11 frames versus frame number (y-axis scaled by 10^-5; frames 0 to 300).]
Hall Monitor: two persons entering and exiting a hall.
Bins Matrix: key frames = 1 300 35 240 221 215 265 241 278 280
Distance Matrix: key frames = 1 300 37 265 241 240 235 278 280 282
CameraAtLightSignal.avi
Multimodal Histogram
Histogram of Lena
Segmented Image
Image after segmentation: we get an outline of her face, hat, etc.
Gray Scale Image - Multimodal
Original Image of Lena
Thank you