Video summarization using color layout descriptor to
extract key frames
Spoorthy B.
M.Tech, Computer Science and Engineering
Nitte Meenakshi Institute of Technology
Bangalore, India
spoorthybasappa@gmail.com
Abstract—Video summarization is the process of removing redundant frames and generating the most informative key frames of a video. Clustering methods are commonly used to generate video summaries, but they are computationally too complex for real-time applications. In this paper we propose a method with lower computational complexity than clustering and aggregation methods, and which needs no user-defined parameters. We use the color layout descriptor as the frame feature, apply an adaptive threshold for shot boundary detection, extract the key frames, and then eliminate similar key frames, so that we obtain a non-redundant, most informative static video summary.
Keywords—video summary, adaptive threshold, key frames.
Dr. Jharna Majumdar
Dean R&D, HOD, M.Tech Computer Science and Engineering
Nitte Meenakshi Institute of Technology
Bangalore, India
jharnamajumdar@gmail.com
I. INTRODUCTION
Internet videos have gained enormous popularity. The amount of video content on sites such as YouTube and Yahoo Videos is increasing day by day, and we need efficient tools for fast browsing of internet videos. Usually every video contains a lot of redundant information, which can be removed to make the video data more suitable for retrieval, indexing and storage. This approach of removing redundancies from a video and generating a condensed version of it is called video summarization. A video summary contains the most important and relevant content of a video while preserving the original message of the video. Video summaries can be generated in two different ways [3]: static and dynamic. Static video summaries deal with the extraction of key frames from the video; key frames are still frames extracted from the video which hold its most important content and are representative of the video. A dynamic video summary contains small shots that are accumulated in a time-ordered sequence.
Clustering methods are commonly used for static video summaries: similar frames are clustered and representative frames are extracted per cluster. K-means [4], K-medoids and Delaunay triangulation [4,5,6] are the most common clustering methods. Clustering methods are complicated to apply to frame features, they are computationally very complex, and they need user-defined parameters such as the number of clusters or the desired number of summarized frames.
In this paper we propose a fast and effective approach for video summary generation which is more efficient than the complicated clustering methods. Our approach is based on a low-level frame color feature with an adaptive threshold technique.
Fig 1: Steps of the proposed method
II. PROPOSED METHOD
The detailed explanation of the proposed work is given here [6],[7]. Fig. 1 gives the overview of the proposed method. First the input video is sampled to reduce redundant frame comparisons. Next the CLD feature is extracted from each pre-sampled frame. Using the extracted CLD features, the video shot boundaries are detected with an adaptive threshold technique. The key frames are then extracted within the shot boundaries, and finally, by applying the histogram difference method and comparing with a threshold, similar key frames are eliminated.
I. Sampling of the input video
The video contains many redundant frames; our aim is to remove them in order to reduce the computational time spent on frame comparison. The frames can be sampled by splitting the input video into time segments [clustering paper], or by the very simple and commonly used method adopted in this paper: taking every 10th frame of the input video, so that we get uniform samples with a fixed sample rate.
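The pre-sampling step is straightforward. The following Python sketch is illustrative only (the paper does not prescribe an implementation) and assumes the video has already been decoded into a sequence of frames:

```python
def sample_frames(frames, rate=10):
    """Uniformly pre-sample a video by keeping every `rate`-th frame.

    The paper keeps every 10th frame, which yields uniform samples
    at a fixed sample rate and cuts the number of frame comparisons.
    """
    return frames[::rate]

# Example: a 95-frame video reduces to 10 pre-sampled frames
# (frames 0, 10, 20, ..., 90).
frames = list(range(95))           # stand-ins for decoded frames
sampled = sample_frames(frames)
```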
II. Extraction of the frame feature
Frame feature extraction is the method of extracting properties of the frames; using the frame feature we can determine the shot boundaries. Commonly used frame features are pixel difference, histogram difference, template matching, edge change, etc. In this paper we use the Color Layout Descriptor (CLD) as the frame feature. The CLD has been designed to represent the spatial layout of colors inside images efficiently and compactly. Obtaining the CLD feature requires several steps, which are shown in Fig. 2.
Fig 2: CLD feature extraction method
Step 1: Image Partitioning
The input frame in the RGB color space is divided into 64 blocks (an 8x8 grid) to guarantee invariance to resolution and scaling.
Fig 3: Image partitioning
Step 2: Representative color selection
After the image partitioning stage a single representative color is selected from each block. Any method of selecting the representative color can be applied, but the standard recommends using the average of the pixel colors in a block as the representative color, since it is simpler and the description accuracy is sufficient in general. The selection results in a tiny image icon of size 8x8. This tiny image icon is converted from the RGB color space to the YCbCr color space.
Fig 4: Representative color selection
Step 3: DCT Transformation
Each of the three 8x8 Y, Cb and Cr matrices is transformed by an 8x8 DCT, so three sets of 64 DCT coefficients are obtained. The 2D DCT of an 8x8 array f(x, y) is calculated as

F(u, v) = (1/4) C(u) C(v) Σx=0..7 Σy=0..7 f(x, y) cos[(2x+1)uπ/16] cos[(2y+1)vπ/16],

where C(k) = 1/√2 for k = 0 and C(k) = 1 otherwise.
Step 4: Zigzag Scanning
Zigzag scanning is performed on each DCT matrix to convert the 2D array into a 1D array; its purpose is to group the low-frequency coefficients at the top of the vector.
Fig 5: Zigzag scanning
Finally, the zigzag-scanned DCT coefficients are concatenated into a feature vector of a certain number of the most representative Y, Cb and Cr coefficients. We use a 12-dimensional CLD feature for video shot detection, taking the first 6 Y, 3 Cb and 3 Cr coefficients [4]:

Fi = (Y1, ..., Y6, Cb1, Cb2, Cb3, Cr1, Cr2, Cr3),

where Fi is the CLD feature vector of the ith frame. The CLD feature vector is extracted for every pre-sampled frame.
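The four steps above can be sketched end to end. The following Python/NumPy sketch is illustrative, not the authors' implementation; it assumes frame dimensions divisible by 8, BT.601 RGB-to-YCbCr conversion, and the standard MPEG-7 default split of 6 Y, 3 Cb and 3 Cr coefficients:

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal 1D DCT-II basis matrix; dct2 applies it separably."""
    m = np.zeros((n, n))
    for u in range(n):
        c = np.sqrt(1.0 / n) if u == 0 else np.sqrt(2.0 / n)
        for x in range(n):
            m[u, x] = c * np.cos((2 * x + 1) * u * np.pi / (2 * n))
    return m

def dct2(block):
    """2D DCT of an 8x8 block (Step 3)."""
    m = dct_matrix(block.shape[0])
    return m @ block @ m.T

def zigzag(block):
    """Zigzag scan: 2D coefficients -> 1D, low frequencies first (Step 4)."""
    n = block.shape[0]
    coords = [(r, c) for r in range(n) for c in range(n)]
    # Within each anti-diagonal d = r + c, odd diagonals run top-down,
    # even diagonals bottom-up, as in the JPEG scan order.
    coords.sort(key=lambda rc: (rc[0] + rc[1],
                                rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))
    return np.array([block[r, c] for r, c in coords])

def cld_feature(frame_rgb, ny=6, ncb=3, ncr=3):
    """12-dimensional CLD feature vector of one RGB frame (HxWx3)."""
    h, w, _ = frame_rgb.shape
    bh, bw = h // 8, w // 8
    icon = np.zeros((8, 8, 3))
    # Steps 1-2: partition into 64 blocks, average color per block.
    for i in range(8):
        for j in range(8):
            block = frame_rgb[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            icon[i, j] = block.reshape(-1, 3).mean(axis=0)
    # Convert the 8x8 icon from RGB to YCbCr (ITU-R BT.601).
    r, g, b = icon[..., 0], icon[..., 1], icon[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Steps 3-4: DCT, zigzag scan, keep the leading coefficients.
    feats = []
    for chan, k in ((y, ny), (cb, ncb), (cr, ncr)):
        feats.extend(zigzag(dct2(chan))[:k])
    return np.array(feats)
```

For a uniform mid-gray frame the descriptor reduces to the DC coefficients of each channel, with all AC terms near zero, which is a quick sanity check of the pipeline.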
III. Video shot boundary detection using an adaptive threshold technique
Using video shot boundary detection we can summarize the video effectively and obtain the most informative frames. The input video is divided into shots, where a shot represents a sequence of video frames taken continuously by one camera. Shot boundaries are detected at the positions where shot transitions occur. In our paper the difference value between two successive frames is calculated using the CLD feature, and that difference value is compared with the adaptive threshold value. If the difference value is greater than the adaptive threshold value, there is a shot transition between those frames.
The frame difference is calculated as the normalized Euclidean distance between successive CLD frame features:

d(i) = sqrt( (1/n) Σk=1..n ( Fi+1(k) − Fi(k) )² ), n = 12,

where Fi(k) is the kth component of the CLD feature vector of the ith frame.
The adaptive threshold value is computed using the formula [4]

Ti = α · µ + Tconst,

where α and Tconst are empirically determined parameters and µ represents the mean of the frame-difference values.
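The boundary-detection rule can be sketched as follows. This is an illustrative Python sketch; the constant normalization factor of the distance is omitted here, since the adaptive threshold Ti scales with the distances either way:

```python
import numpy as np

def shot_boundaries(features, alpha=1.5, t_const=0.0):
    """Detect shot boundaries from successive CLD feature differences.

    `features`: one CLD vector per pre-sampled frame.
    `alpha` and `t_const` are the empirically chosen parameters of the
    adaptive threshold Ti = alpha * mu + Tconst, where mu is the mean
    of the frame-difference values.
    Returns the indices of the first frame of each new shot.
    """
    feats = np.asarray(features, dtype=float)
    diffs = np.linalg.norm(np.diff(feats, axis=0), axis=1)  # d(i)
    ti = alpha * diffs.mean() + t_const                     # adaptive threshold
    # A shot transition lies between frames i and i+1 when d(i) > Ti.
    return [i + 1 for i, d in enumerate(diffs) if d > ti]
```

With two abrupt feature jumps in an otherwise static sequence, the two jump positions are reported as boundaries.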
IV. Key frame selection
Key frames are commonly chosen to be the first or last frame of a shot; some methods use an arbitrary single frame inside a shot. In our approach we select the last frame of each shot as its key frame.
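Given the detected boundaries, the selection rule is a one-liner; a small illustrative sketch (the function name and boundary convention are assumptions, matching the sketch above where a boundary index is the first frame of a new shot):

```python
def select_key_frames(num_frames, boundaries):
    """Pick the last frame of every shot as its key frame.

    `boundaries[k]` is the index of the first frame of shot k+1, so the
    last frame of each shot is the frame just before each boundary,
    plus the final frame of the video for the last shot.
    """
    return [b - 1 for b in boundaries] + [num_frames - 1]

# e.g. 15 frames with boundaries at 5 and 10 -> key frames 4, 9, 14
```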
V. Elimination of similar key frames
After the key frames are selected, similar key frames may appear at different temporal positions in the input video. To eliminate these similar key frames we use a simple technique. First the histogram difference [3] is calculated between all the selected key frames and a distance matrix is computed. These distance values are compared with a predefined threshold Tdist; if a frame's distance value is less than Tdist, that frame is removed from the summary. Finally the remaining key frames represent the summary of the input video.
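The elimination step can be sketched as below. This is a greedy illustrative variant: the paper compares a full distance matrix against Tdist, while this sketch compares each frame only against already-kept frames, which removes near-duplicates the same way. The bin count and threshold value are assumptions:

```python
import numpy as np

def eliminate_similar(key_frames, t_dist, bins=16):
    """Drop near-duplicate key frames using gray-level histogram distance.

    Each frame is an HxW grayscale array with values in [0, 256).
    A frame whose L1 histogram distance to an already-kept key frame
    falls below the predefined threshold `t_dist` is removed.
    """
    kept_frames, kept_hists = [], []
    for frame in key_frames:
        hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
        hist = hist / hist.sum()  # normalized gray-level histogram
        # Keep the frame only if it is far enough from every kept frame.
        if all(np.abs(hist - h).sum() >= t_dist for h in kept_hists):
            kept_frames.append(frame)
            kept_hists.append(hist)
    return kept_frames
```

For two identical black frames and one white frame, one of the duplicates is eliminated and two key frames remain.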
Fig. 6 below shows the key frames obtained: the first rows contain the key frames before elimination, and the last row contains the key frames obtained after the similar key frames are eliminated.
Fig 6: Key frames extracted before (1st and 2nd rows) and after (3rd row) elimination of similar key frames
III. EXPERIMENTAL RESULTS
We implemented the proposed algorithm in Visual Basic 6.0 and VC++. Different input videos downloaded from the Open Video website [10] were provided as input, and we experimented with different α and Tconst values until a satisfactory key-frame video summary was obtained. Our results are compared with the results of the K-means clustering method and with manual comparison. The table below gives the results for the different input videos.

Input Video       Total # of frames   # of selected key frames   # of frames after elimination of similar key frames
Cartoon.avi       406                 4                          3
Movie.avi         622                 21                         8
Exotic Terrane    2938                16                         8
New Horizon       1816                18                         6
Hurricane Force   2390                10                         5

Fig 7: Final summary of 3 different videos (one video per row)
CONCLUSION
In this paper we proposed an efficient technique that uses the color layout descriptor for frame feature extraction and an adaptive threshold technique to extract the key frames which represent the input video. We used a sampling technique to reduce redundant frame comparisons and eliminated similar key frames using the histogram difference method. The result is a good video summary with the most informative key frames. The method was compared with the clustering method; compared to clustering, our method is more efficient and has lower computational complexity.
REFERENCES
[1] B. T. Truong and S. Venkatesh, "Video abstraction: A systematic review and classification," ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 3, no. 1, pp. 1-37, 2007.
[2] T. Chheng, "Video summarization using clustering," University of California.
[3] A. V. Kumthekar and J. K. Patil, IJSRET, vol. 2, issue 4, July 2013.
[4] Stevica, Marko, Sasa V., "Video summarization using color feature and efficient threshold technique," ISSN 0033-2097, R 89 NR 2a/2013.
[5] A. Nasreen, "Key frames extraction using edge change ratio for shot segmentation," IJARCCE, vol. 2, issue 11, Nov. 2013.
[6] M. Imran, R. Hashim, A. Irtaza, and N. E. Abd Khalid, "CBIR using colour layout descriptor and Coiflets wavelets," ISSN 2090-4304.
[7] H. A. Jalab, "Image retrieval system based on color layout descriptor and Gabor filters," ICOS 2011, September 25-28, 2011, Langkawi, Malaysia.
[8] H. Eidenberger, "Statistical analysis of content-based MPEG-7 descriptors for image retrieval," Multimedia Systems, vol. 10, no. 2, pp. 84-97, 2004.
[9] T. Deselaers, D. Keysers, and H. Ney, "Features for image retrieval: A quantitative comparison," DAGM Symposium for Pattern Recognition, 2004, pp. 228-236.
[10] The Open Video Project. Online: http://www.openvideo.org