Video summarization using color layout descriptor to extract key frames

Spoorthy B.
Computer Science Engineering, M.Tech
Nitte Meenakshi Institute of Technology
Bangalore, India
spoorthybasappa@gmail.com

Dr. Jharna Majumdar
Dean, R&D, HOD of Computer Science Engineering, M.Tech
Nitte Meenakshi Institute of Technology
Bangalore, India
jharnamajumdar@gmail.com

Abstract—Video summarization is the process of removing redundant frames and generating the most informative key frames of a video. Clustering methods are commonly used to generate video summaries, but they are computationally too complex for real-time applications. In this paper we propose a method with lower computational complexity than clustering and aggregation methods, one that needs no user-defined parameters. We use the color layout descriptor (CLD) as the frame feature and an adaptive threshold for shot boundary detection; we then extract key frames and eliminate the similar ones, yielding a non-redundant, informative static video summary.

Keywords—video summary, adaptive threshold, key frames.

I. INTRODUCTION

Internet videos have gained enormous popularity, and video content grows day by day on sites such as YouTube and Yahoo Video, so an efficient tool is needed for fast browsing of internet videos. Almost every video contains a great deal of redundant information that can be removed to make the video data more suitable for retrieval, indexing and storage. This approach of removing redundancies from a video and generating a condensed version of it is called video summarization. A video summary must contain the most important and relevant content of the video while preserving its original message.

Video summaries can be generated in two different ways [3]: static and dynamic. Static video summaries deal with the extraction of key frames from the video; key frames are still frames that hold the most important content of the video and are representative of it. A dynamic video summary consists of short shots assembled in time order.

Clustering methods are commonly used for static video summaries: similar frames are clustered and representative frames are extracted per cluster. K-means [4], K-medoids and Delaunay triangulation [4,5,6] are the most common clustering methods. Clustering methods are complicated to apply to frame features, computationally expensive, and they need user-defined parameters such as the number of clusters or the desired number of summary frames. In this paper we propose a fast and effective approach to video summary generation that is more efficient than these complicated clustering methods. Our approach is based on a low-level frame color feature combined with an adaptive threshold technique.

The proposed method works as follows. First the input video is sampled, to reduce redundant frame comparisons. Next the CLD feature is extracted from each pre-sampled frame. Using the CLD features and an adaptive threshold technique, the video shot boundaries are detected. Key frames are then extracted at the shot boundaries, and finally similar key frames are eliminated by computing histogram differences and comparing them with a threshold. Fig. 1 shows the steps of our method.

Fig 1: Steps of the proposed method

II. PROPOSED METHOD

The stages of the proposed method are explained in detail below; a compact end-to-end sketch of the whole pipeline is given first.
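The following is a minimal sketch of the pipeline, written in Python with OpenCV purely for illustration (the authors' implementation is in Visual Basic 6.0 and VC++, see the experimental results section). The helper functions cld_feature, detect_shots and remove_similar are not from the paper; they are sketched in the corresponding subsections below.

import cv2

def summarize(path, sample_rate=10):
    # Step 1: uniform pre-sampling, keeping every 10th frame as in the paper.
    cap = cv2.VideoCapture(path)
    frames = []
    i = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if i % sample_rate == 0:
            frames.append(frame)
        i += 1
    cap.release()
    # Step 2: one CLD feature vector per pre-sampled frame.
    feats = [cld_feature(f) for f in frames]
    # Step 3: shot boundaries via the adaptive threshold.
    bounds = detect_shots(feats)
    # Step 4: the last frame of each shot is taken as its key frame.
    keys = [frames[b - 1] for b in bounds] + [frames[-1]]
    # Step 5: drop near-duplicate key frames by histogram difference.
    return remove_similar(keys)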
I. Sampling of the input video: A video contains many redundant frames; removing them reduces the computation time spent on frame comparison. The frames can be sampled by splitting the input video into time segments [2], or by the very simple and commonly used method adopted in this paper: taking every 10th frame of the input video, which gives uniform samples at a fixed sampling rate.

II. Extraction of the frame feature: Frame feature extraction captures the properties of a frame; using the frame feature we can determine the shot boundaries. Commonly used frame features are pixel difference, histogram difference, template matching, edge change ratio, etc. In this paper we use the color layout descriptor (CLD) as the frame feature [6],[7]. The CLD is designed to represent the spatial layout of colors inside an image efficiently and compactly. The CLD feature is obtained in four steps, shown in Fig. 2.

Fig 2: CLD feature extraction method

Step 1: Image partitioning. The input frame in RGB color space is divided into 64 (8x8) blocks to guarantee invariance to resolution and scaling.

Fig 3: Image partitioning

Step 2: Representative color selection. After the partitioning stage, a single representative color is selected from each block. Any selection method can be applied, but the standard recommends the average of the pixel colors in a block as its representative color, since this is simple and its description accuracy is sufficient in general. The selection results in a tiny 8x8 image icon, which is converted from RGB to YCbCr color space.

Fig 4: Representative color selection

Step 3: DCT transformation. Each of the three 8x8 Y, Cb and Cr matrices is transformed by an 8x8 DCT, so three sets of 64 DCT coefficients are obtained. The 2D DCT of an 8x8 block f(x, y) is computed with the standard formula:

$F(u,v) = \frac{C(u)\,C(v)}{4} \sum_{x=0}^{7}\sum_{y=0}^{7} f(x,y)\,\cos\frac{(2x+1)u\pi}{16}\,\cos\frac{(2y+1)v\pi}{16}$

where $C(k) = 1/\sqrt{2}$ for $k = 0$ and $C(k) = 1$ otherwise.

Step 4: Zigzag scanning. Zigzag scanning converts each 8x8 DCT matrix into a 1D array, grouping the low-frequency coefficients at the top of the vector.

Fig 5: Zigzag scanning

Finally the zigzag-scanned DCT coefficients are concatenated into a feature vector of a certain number of the most representative Y, Cb and Cr coefficients. We use a 12-dimensional CLD feature for video shot detection, extracted from every pre-sampled frame [4]:

$F_i = (Y_0, \ldots, Y_5,\ Cb_0, Cb_1, Cb_2,\ Cr_0, Cr_1, Cr_2)$

where $F_i$ is the CLD feature vector of the i-th frame (six Y, three Cb and three Cr coefficients).

III. Video shot boundary detection using an adaptive threshold technique: Shot boundary detection lets us summarize the video effectively and obtain the most informative frames. The input video is divided into shots, where a shot is a sequence of video frames taken continuously by one camera; shot boundaries are detected at the positions where shot transitions occur. Using the CLD feature we calculate the difference between two successive frames and compare it with an adaptive threshold value; if the difference is greater than the adaptive threshold, there is a shot transition between those frames. The frame difference is the normalized Euclidean distance between successive CLD feature vectors:

$D(F_i, F_{i+1}) = \sqrt{\frac{1}{12}\sum_{k=1}^{12}\big(F_i(k) - F_{i+1}(k)\big)^2}$

The adaptive threshold value is computed with the formula [4]:

$T_i = \alpha \cdot \mu + T_{const}$

where $\alpha$ and $T_{const}$ are empirically determined parameters and $\mu$ is the mean of the frame-difference values. Sketches of the CLD extraction and shot detection stages are given below.
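Below is an illustrative sketch of the CLD extraction of Steps 1-4, again in Python/OpenCV. Two points are assumptions rather than details from the paper: the resize with area interpolation stands in for partitioning plus block averaging (it computes the same per-block means), and the 6/3/3 split of Y, Cb and Cr coefficients is the common MPEG-7 default for a 12-dimensional CLD.

import numpy as np
import cv2

def zigzag(block):
    # Scan an 8x8 matrix in JPEG zigzag order, low frequencies first.
    idx = sorted(((x, y) for x in range(8) for y in range(8)),
                 key=lambda p: (p[0] + p[1],
                                p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [block[x, y] for x, y in idx]

def cld_feature(frame_bgr):
    # Steps 1-2: shrink to an 8x8 icon; INTER_AREA averages each block,
    # which matches the recommended representative color. Then convert
    # to YCbCr (OpenCV calls the space YCrCb and orders channels Y, Cr, Cb).
    icon = cv2.resize(frame_bgr, (8, 8), interpolation=cv2.INTER_AREA)
    icon = cv2.cvtColor(icon, cv2.COLOR_BGR2YCrCb).astype(np.float32)
    y, cr, cb = cv2.split(icon)
    feats = []
    # Steps 3-4: 8x8 DCT per channel, zigzag scan, keep the first
    # 6 Y, 3 Cb and 3 Cr coefficients (assumed split, 12-D in total).
    for plane, n in ((y, 6), (cb, 3), (cr, 3)):
        feats.extend(zigzag(cv2.dct(plane))[:n])
    return np.asarray(feats)

And a sketch of the shot boundary detection. The root-mean-square form of the normalized Euclidean distance and the default values of alpha and t_const are assumptions here, since the paper determines the parameters empirically.

def detect_shots(feats, alpha=2.0, t_const=0.05):
    feats = np.asarray(feats)
    # Normalized Euclidean (RMS) distance between successive CLD vectors.
    diffs = np.sqrt(((feats[1:] - feats[:-1]) ** 2).mean(axis=1))
    # Adaptive threshold T = alpha * mu + T_const, with mu the mean
    # frame difference; a difference above T marks a shot transition.
    t = alpha * diffs.mean() + t_const
    return [i + 1 for i, d in enumerate(diffs) if d > t]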
IV. Key frame selection: Key frames are usually chosen as the first or last frame of a shot; some methods use a single frame from inside the shot. In our approach we select the last frame of each shot as the key frame.

V. Elimination of similar key frames: After key frame selection, similar key frames may still appear at different temporal positions in the input video. To eliminate them we use a simple technique: first the histogram difference [3] is calculated between all pairs of selected key frames and a distance matrix is computed; these distance values are then compared with a predefined threshold T_dist, and any frame whose distance is less than T_dist is removed from the summary. The remaining key frames represent the summary of the input video. Fig. 6 shows the key frames obtained before elimination (first and second rows) and after the elimination of similar key frames (third row). A sketch of this elimination step is given below.

Fig 6: Key frames extracted before (1st and 2nd rows) and after (3rd row) the elimination of similar key frames
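The following sketch of the elimination step comes with the same caveats: the paper computes a full distance matrix over all key-frame pairs, while this illustrative version keeps a frame only if its histogram distance to every already-kept frame is at least T_dist; the grayscale 64-bin histogram, the L1 distance and the default t_dist value are all assumptions.

import numpy as np
import cv2

def remove_similar(key_frames, t_dist=0.3):
    # Normalized grayscale histogram per key frame.
    hists = []
    for f in key_frames:
        g = cv2.cvtColor(f, cv2.COLOR_BGR2GRAY)
        h = cv2.calcHist([g], [0], None, [64], [0, 256]).flatten()
        hists.append(h / h.sum())
    kept = []
    for i, h in enumerate(hists):
        # Keep a frame only if it differs enough from every kept frame.
        if all(np.abs(h - hists[j]).sum() >= t_dist for j in kept):
            kept.append(i)
    return [key_frames[i] for i in kept]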
III. EXPERIMENTAL RESULTS

We implemented the proposed algorithm in Visual Basic 6.0 and VC++. Different input videos downloaded from the Open Video Project website [10] were used, and we experimented with different α and T_const values until a satisfactory key-frame video summary was obtained. Our results were compared with those of the K-means clustering method and with manual comparison. The table below gives the results for the different input videos.

Input Video        Total # of frames   # of selected key frames   # of key frames after elimination
Cartoon.avi              406                     4                              3
Movie.avi                622                    21                              8
Exotic Terrane          2938                    16                              8
New Horizon             1816                    18                              6
Hurricane Force         2390                    10                              5

Fig 7: Final summary of 3 different videos (one video per row)

CONCLUSION

In this paper we proposed an efficient technique that uses the color layout descriptor for frame feature extraction and an adaptive threshold technique to extract the key frames that represent the input video. We used a sampling technique to reduce redundant frame comparisons and eliminated similar key frames using a histogram difference method. The result is a good video summary with the most informative key frames. Compared with the clustering method, our method is more efficient and has lower computational complexity.

REFERENCES

[1] Truong, B.T., Venkatesh, S., "Video abstraction: A systematic review and classification", ACM Transactions on Multimedia Computing, Communications and Applications, 3 (2007), No. 1, 1-37.
[2] Tommy Chheng, "Video summarization using clustering", University of California.
[3] A.V. Kumthekar, J.K. Patil, IJSRET, Vol. 2, Issue 4, July 2013.
[4] Stevica, Marko, Sasa V., "Video summarization using color feature and efficient threshold technique", ISSN 0033-2097, R 89 NR 2a/2013.
[5] Azar Nasreen, "Key frames extraction using edge change ratio for shot segmentation", IJARCCE, Vol. 2, Issue 11, Nov. 2013.
[6] Muhammad Imran, Rathiah Hashim, Aun Irtaza, Noor Elaiza Abd Khalid, "CBIR Using Colour Layout Descriptor and Coiflets Wavelets", ISSN 2090-4304.
[7] Hamid A. Jalab, "Image Retrieval System Based on Color Layout Descriptor and Gabor Filters", ICOS 2011, September 25-28, 2011, Langkawi, Malaysia.
[8] Eidenberger, H., "Statistical analysis of content-based MPEG-7 descriptors for image retrieval", Multimedia Systems, 10 (2004), No. 2, 84-97.
[9] Deselaers, T., Keysers, D., Ney, H., "Features for Image Retrieval: A Quantitative Comparison", DAGM Symposium for Pattern Recognition, (2004), 228-236.
[10] The Open Video Project. Online: http://www.openvideo.org