International Journal of Electrical, Electronics and Computer Systems (IJEECS)
ISSN (Online): 2347-2820, Volume-1, Issue-2, 2013

GRADUAL TRANSITION DETECTION IN MOVIE VIDEOS: A CHALLENGE FOR SHOT BOUNDARY DETECTION ALGORITHM

1 Shraddha C. Nistane, 2 Krishna K. Warhade
1 M.E. student, Department of Electronics & Telecommunication, MITCOE, Pune
2 Professor, Department of Electronics & Telecommunication, MITCOE, Pune
Email: 1 shraddhanistane@gmail.com, 2 krishna.warhade@mitcoe.edu.in

ABSTRACT: Automatic shot boundary detection has been an active research area for nearly a decade and has led to high-performance detection algorithms for hard cuts, fades and wipes. However, detecting gradual transitions remains a major challenge in the presence of camera and object motion. In this paper, a review of different gradual transition detection methods is presented. In particular, the review focuses on dissolve detection in the presence of camera and object motion.

Index Terms: Shot boundary detection, gradual transition, dissolve detection.

I. INTRODUCTION

The increased availability and usage of online digital video has created a need for automated video content analysis techniques. Most research on video content involves automatically detecting the boundaries between camera shots. Shot transition detection is used to split up a film into basic temporal units called shots; a shot is a series of interrelated consecutive pictures taken contiguously by a single camera and representing a continuous action in time and space [1]. This operation is of great use in software for the post-production of videos. It is also a fundamental step of automated indexing and of content-based video retrieval or summarization applications, which provide efficient access to huge video archives. For example, a user may choose a representative picture from each scene to create a visual overview of the whole film and, by processing such indexes, a search engine can process search queries.

A digital video consists of frames that are presented to the viewer's eye in rapid succession to create the impression of movement. Each frame within a digital video can be uniquely identified by its frame index, a serial number. A shot is a sequence of frames shot uninterruptedly by one camera. Several film transitions are commonly used in film editing to juxtapose adjacent shots; in the context of shot transition detection, they are usually grouped into two types: abrupt transitions and gradual transitions.

An abrupt transition (AT) is a sudden transition from one shot to another, i.e., one frame belongs to the first shot and the next frame belongs to the second shot. Abrupt transitions are also known as hard cuts or simply cuts [2]. Fig.1 shows consecutive frames with an abrupt transition from the Star Wars movie.

Fig.1. Consecutive frames with abrupt transition from the Star Wars I movie

A gradual transition (GT) is a transition in which the two shots are combined using chromatic, spatial or spatial-chromatic effects, so that one shot is gradually replaced by another. Gradual transitions are also known as soft transitions and can be of various types, e.g., wipes, dissolves and fades [2]. Fig.2 shows consecutive frames of a dissolve transition.

Fig.2. Consecutive frames of dissolve transitions

Although cut detection appears to be a simple task for a human being, it is a non-trivial task for computers. Cut detection would be a trivial problem if each frame of a video were enriched with additional information about when and by which camera it was taken. Possibly no algorithm for cut detection will ever be able to detect all cuts with certainty unless it is provided with powerful artificial intelligence.

While most algorithms achieve good results with hard cuts, many fail to recognize soft cuts. Hard cuts usually go together with sudden and extensive changes in the visual content, while soft cuts show slow and gradual changes. A human being can compensate for this lack of visual diversity by understanding the meaning of a scene. While a computer treats a black line wiping a shot away as "just another regular object moving slowly through the ongoing scene", a person understands that the scene ends and is replaced by a black screen.

II. CHALLENGES IN SHOT BOUNDARY DETECTION

The major challenge in video segmentation is the detection of gradual transitions in the presence of motion. A gradual shot transition occurs when the change takes place over a sequence of frames. The most common gradual transitions are dissolves and fades (in and out). A dissolve is a shot transition in which the first shot gradually disappears while the second shot gradually appears. A fade is a shot transition in which the first shot gradually disappears (fade out) before the second shot gradually appears (fade in). During a fade, the two shots are spatially and temporally well separated by some monochrome frames. Several reviews on shot boundary detection have been published in the last decade.

III. CONTENT BASED VIDEO INDEXING AND RETRIEVAL

There are four main processes involved in content-based video indexing and retrieval [3-5]: video content analysis, video structure parsing, summarization or abstraction, and indexing. Each process poses many challenges. We briefly review these research issues below.

A. Video content analysis

Video content analysis is the capability of automatically analyzing video to detect and determine temporal events based not only on a single image but also on text, audio and speed. Video content analysis is used in a wide range of areas, including entertainment, health care, retail, automotive, transport, home automation, safety, security, networks, multimedia, Internet communication, mobile communication, distance education, sports and news. Many different functionalities can be implemented in video content analysis. Video motion detection is one of the simpler forms, in which motion is detected with respect to a fixed background scene. More advanced functionalities include video tracking, object detection, motion detection and face recognition.

B. Video structure parsing

Digital video needs to be properly processed before it can be inserted into a video server. These tasks include compressing, parsing and indexing a video sequence. Video parsing is the process of detecting scene changes or the boundaries between camera shots in a video stream. Video parsing is similar to text document parsing, but it requires a higher level of content analysis based on pixels, color, edges, motion, objects, etc. Shot boundary detection algorithms process the visual information contained in video frames and segment the video into groups of frames with similar visual information.

C. Video summarization

Due to rapid progress in network and multimedia technology, a large number of videos are available on the Internet. Video summarization plays an important role in this context. It helps in efficient storage, quick browsing and retrieval of large collections of video data without losing important aspects. This process is similar to the extraction of keywords or summaries in text document processing; that is, we need to extract a subset of video data from the original video, such as key frames or highlights, as entries for shots, scenes or stories. By combining the structure information extracted in video parsing with the key frames extracted in video abstraction, we can build a visual table of contents for a video.

D. Video indexing

The structure and content found in content analysis, video parsing and video summarization are referred to as metadata. Efficient and effective handling of video documents depends on the availability of indexes, and manual indexing is not feasible for large video collections. Based on the metadata, we can build video indices and the table of contents. After indexing, the metadata is passed to a clustering process in which sequences or shots are classified into different visual categories or an indexing structure. As in many other information systems, we need schemes and tools that use the indices and content metadata to query, search and browse large video databases. Researchers have developed numerous schemes and tools for video indexing and query. However, robust and effective tools tested by thorough experimental evaluation with large data sets are still lacking. Therefore, in the majority of cases, retrieving or searching video databases by keywords or phrases will be the mode of operation. In some cases, retrieval can be performed by content similarity defined by low-level visual features of, for instance, key frames, and by example-based queries.

IV. RELATED WORK

In this section, we describe various algorithms in the area of video segmentation, especially gradual transition detection.

Yang Xu et al. have proposed 3-DWT based motion suppression for video shot boundary detection [6]. In this method, an adaptive threshold is selected so that gradual transitions and motion are adequately discriminated. Dramatic motion is characterized within the 3-DWT framework: motion intensity is extracted and a motion suppression value (MSV) is defined, which is integrated into histogram-based and edge-based methods for video shot boundary detection. This method detects gradual transitions more efficiently and also solves the problem of motion information that suffers from noise and illumination changes.

Detection of gradual transitions in video sequences using B-spline interpolation has been proposed by Jeho Nam and Ahmed H. Tewfik [7]. The focus of this method is to recover the original transition behavior of an edit effect even when it has been distorted by motion and other pre-processing operations. In this technique, the existing abrupt shot boundaries are detected. No single defined framework exists for dealing with all gradual transitions; to overcome this problem, B-spline interpolation curve fitting is used for estimating silent production.

Jun Li et al. have proposed DWT-based shot boundary detection using a support vector machine [8]. This technique extracts color and edge features in different directions from the wavelet transform coefficients. A multi-class support vector machine (SVM) classifier is then used to classify the video into three categories: cut transition (AT), gradual transition (GT) and normal sequence (NF). To enhance the robustness of the algorithm, the feature vector is formed from all frames within a temporal window. Numerical experiments on a variety of videos with different characteristics show that the technique is capable of detecting and discriminating shot transitions.

Video shot boundary detection using generalized eigenvalue decomposition and Gaussian transition detection has been proposed by Ali Amiri and Mahmood Fathy [1]. In this method, a novel shot boundary detection algorithm is designed using generalized eigenvalue decomposition.
After comparing video frames, a distance function is calculated. This distance function shows abrupt changes at hard cuts and semi-Gaussian behavior in gradual transitions, and transitions are detected from it. The approach significantly increases the effectiveness of the shot boundary detection task while reducing the computational cost, and it achieves good recall, in the range of 92.4% to 97.2%. However, its accuracy is not satisfactory because it cannot handle fast camera motion and flashlight changes.

Vasileios Chasanis et al. have proposed simultaneous detection of abrupt cuts and dissolves in videos using support vector machines [9]. The method uses color histogram features and the χ2 distance. A normalized RGB histogram is chosen: for each frame, a normalized histogram with 256 bins is computed for each of the R, G and B components, giving HR, HG and HB respectively. These three histograms are concatenated into a 768-dimensional vector, which is the final histogram of the frame. Variants of the χ2 distance amplify the difference between two histograms, and dissimilarity is obtained from the inter-frame distance. For detecting a transition, the distance should be less than the minimum length. Classification is performed by a support vector machine, which finds an optimal hyperplane separating the points of the two classes. The extracted features are given as input to the SVM classifier, which categorizes the video sequence into normal sequences, abrupt transitions and gradual transitions. The method gives an effective segmentation and is therefore useful for video indexing and browsing. Its advantage is that no threshold is used for detection; therefore, the algorithm is not sensitive to the content of the video.

Y.-N. Li et al. have proposed a fast shot detection framework employing pre-processing techniques, including thresholding and bisection-based comparisons, to eliminate non-boundary regions [2]. The high detection speed of the framework has three causes. First, a large number of non-boundary frames are discarded before hard cut and GT detection. Second, the types of shot boundaries can be predicted in the pre-processing stage, so that the preserved segments can be examined directly with the right detector. Third, the burden of GT detection is eased, because the duration of each possible GT is available in advance and frame distances need not be calculated multiple times. In simulations and comparisons, a significant speedup is achieved with the proposed framework, while the precision and recall rates remain at a satisfactory level.

Na Lv et al. [12] use mutual information for better detection accuracy and efficiency. In addition, the method employs hierarchical analysis of the video data, both spatially and temporally, leading to reduced computation and improved gradual transition detection accuracy. The algorithm has been tested on public test video data, and the results demonstrate both the efficiency and the accuracy of its shot boundary detection.

Tuanfa Qin et al. have proposed fast shot boundary detection based on a K-step slipped window [10]. This method considers the low-level features and the edit features of the video shot. It first selects candidate shot boundaries using a K-step slipped window and adaptive thresholds, and then performs hard cut detection, flash exclusion and gradual transition detection, in that order. Its experimental results were compared with some existing methods in terms of computational time and precision, and the method was found to have higher precision and lower computational time.
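The candidate-selection stage described for [10] can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes one scalar feature per frame (for example a mean intensity), compares frame i with frame i + K only for i = 0, K, 2K, ..., and uses mean + a times the standard deviation of the sampled differences as the adaptive threshold. The function name `k_step_candidates` and the parameters `k`, `a` and `dist` are hypothetical.

```python
from statistics import mean, pstdev


def k_step_candidates(features, k=5, a=2.0, dist=None):
    """Return (i, i + k) windows whose K-step difference exceeds an adaptive threshold.

    features: one feature value per frame (hypothetical input, e.g. mean intensity).
    Only frames 0, k, 2k, ... are compared with the frame k steps later, so about
    1/k of the frame pairs are evaluated. A window is kept when its difference
    exceeds mean + a * (population standard deviation) of all sampled differences.
    """
    if dist is None:
        def dist(x, y):
            return abs(x - y)
    starts = range(0, len(features) - k, k)
    diffs = [dist(features[i], features[i + k]) for i in starts]
    if not diffs:
        return []
    threshold = mean(diffs) + a * pstdev(diffs)
    return [(i, i + k) for i, d in zip(starts, diffs) if d > threshold]
```

For a step change in the feature at frame 50, `k_step_candidates([0.0] * 50 + [10.0] * 50, k=5)` keeps only the window (45, 50). The speed-up comes from skipping most frame pairs; the later hard-cut, flash and gradual-transition stages would examine only the returned windows.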
The K-step slipped window method [10] thus improves both the efficiency and the precision of gradual transition detection.

An accumulation algorithm for video shot boundary detection has been proposed by T. Lu and P. N. Suganthan [13]. The algorithm takes the differences between consecutive frames and accumulates them; when the accumulated difference exceeds a threshold, a transition is detected. The algorithm introduces a C frame, whose content represents the changes since the beginning of the shot. The C frame is formed from the difference and similarity between the first and second frames, f1 and f2, and is then compared with subsequent frames until the difference exceeds the threshold, at which point a transition is detected. This method can detect all types of shot boundaries, including cuts, fades and dissolves, and it resolves the zooming problem of block matching methods.

Mohanta et al. have proposed a model-based shot boundary detection technique using frame transition parameters [11]. Based on their model, they formulate a frame estimation scheme that uses the previous and the next frames. The transition parameters, along with the estimation error, are determined from global features such as color intensity histograms. Transition parameters are also derived from local features such as the scatter matrix of edge strength and the motion matrix. The transition parameters derived from global and local features, together with the estimation error, describe the variability in the visual content of neighboring frames, so the transition parameters and the corresponding errors constitute the feature vector for shot boundary classification. Hence, neither the color intensity histogram nor the edge scatter and motion matrices are used directly as features in frame classification; rather, the local variation (between frames) of these statistics is used as features for shot detection and classification.
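The frame-estimation idea in [11] can be illustrated with a simplified least-squares model. This sketch is an assumption made for illustration, not the model of Mohanta et al.: it represents each frame by a global feature vector such as a normalized color histogram and fits the current frame as cur ≈ a·prev + b·next. Roughly, a ≈ 1 and b ≈ 0 (or the reverse) with a small error indicates a cut next to the current frame, while a ≈ b ≈ 0.5 appears both inside a static shot and at the midpoint of a linear dissolve, so the distance between the two neighbours must also be considered. The function name `transition_parameters` and the ridge parameter `eps` are hypothetical.

```python
from math import sqrt


def _dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def transition_parameters(prev, cur, nxt, eps=1e-6):
    """Fit cur ~ a * prev + b * nxt by lightly regularized least squares.

    Returns (a, b, error), where error is the Euclidean norm of the residual.
    The ridge term eps * (|prev|^2 + |nxt|^2) keeps the 2x2 normal equations
    solvable when prev and nxt are identical, as inside a static shot; it also
    shrinks a and b very slightly toward zero.
    """
    lam = eps * (_dot(prev, prev) + _dot(nxt, nxt))
    gpp = _dot(prev, prev) + lam
    gnn = _dot(nxt, nxt) + lam
    gpn = _dot(prev, nxt)
    rp, rn = _dot(prev, cur), _dot(nxt, cur)
    det = gpp * gnn - gpn * gpn
    if det == 0:  # both neighbours are zero vectors
        return 0.0, 0.0, sqrt(_dot(cur, cur))
    # Cramer's rule for [[gpp, gpn], [gpn, gnn]] @ [a, b] = [rp, rn]
    a = (rp * gnn - rn * gpn) / det
    b = (gpp * rn - gpn * rp) / det
    residual = [c - a * p - b * n for p, c, n in zip(prev, cur, nxt)]
    return a, b, sqrt(_dot(residual, residual))
```

As the text notes, [11] classifies frames from the variation of such parameters and errors between frames rather than from the histograms themselves; the sketch only shows how one set of parameters could be obtained.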
The feature set of [11] thus incorporates more abstract information about shot transitions, while the size of the feature vector is drastically reduced. As a result, within-shot and between-shot variations are reflected to different degrees. For classification, a neural network trained with back-propagation is employed; it classifies each frame into one of three categories: no change, gradual change and abrupt change. The partial dissimilarity measure and the final shape of the decision boundary evolve to best suit the training data. Thus, the scheme is less dependent on critical issues such as the selection of various thresholds or of the sliding window size. Finally, a simple but effective post-processing step is carried out on the classified output to repair false transitions and misclassification errors, if any. The method is found to be efficient and to work better than existing algorithms on a variety of benchmark videos.

K. I. Koumousis et al. have proposed a new approach to gradual transition detection [14]. The method addresses shot boundary detection for gradual transitions within video sequences by applying statistical tests in conjunction with the Iterative Self-Organizing Data Analysis (ISODATA) classification algorithm to consecutive video frames. A confusion matrix is formed from the classification results in order to calculate the Kappa coefficient, which is then used to identify the transition.

A fast coarse-to-fine video shot segmentation algorithm has been proposed by Liu Liu and Jian-Xun Li [15]. This method can differentiate camera motion, object motion and gradual shot transitions. Based on an improved information entropy theory, the differences between the shots of a video sequence are calculated, and adaptive thresholds are used to select sequences of candidate shots from the video sequence.
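The entropy-based difference and adaptive thresholding used for candidate selection in [15] can be sketched as below. The "improved" information entropy of [15] is not specified in this review, so this sketch substitutes the plain Shannon entropy of each frame's gray-level histogram. Frames are assumed to be flat lists of 8-bit gray values, and the threshold mean + a times the standard deviation is an assumed form of adaptive threshold; the names `gray_entropy` and `entropy_candidates` are hypothetical.

```python
from math import log2
from statistics import mean, pstdev


def gray_entropy(pixels, levels=256):
    """Shannon entropy (in bits) of the gray-level histogram of one frame."""
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    n = len(pixels)
    return -sum((c / n) * log2(c / n) for c in counts if c)


def entropy_candidates(frames, a=2.0):
    """Indices i where the entropy change between frames i and i + 1 is unusually large."""
    h = [gray_entropy(f) for f in frames]
    diffs = [abs(h[i + 1] - h[i]) for i in range(len(h) - 1)]
    if not diffs:
        return []
    threshold = mean(diffs) + a * pstdev(diffs)
    return [i for i, d in enumerate(diffs) if d > threshold]
```

Camera/object motion can change the entropy as much as a gradual transition does, so both may pass this test; this is why [15] follows candidate selection with a motion-edge detection stage.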
In the method of [15], because camera/object motion and gradual shot transitions present similar characteristics, both are selected as candidate shots. A fast motion-edge detection algorithm is then applied to distinguish the gradual shot transitions. Since the algorithm is based on the statistical properties of several characteristics, its computational complexity is effectively reduced compared with single-characteristic detection algorithms. The algorithm effectively reduces both the computational complexity and the false detections caused by camera/object motion.

The mutual information based video shot boundary detection method of Na Lv et al. [12] improves upon the graph cut algorithm.

The comparative analysis of the various shot boundary detection algorithms, in terms of the features used, the types of transition detected and their merits, is shown in Table I.

V. EVALUATION CRITERION

To evaluate the performance of a shot boundary detection algorithm, two metrics, recall and precision, are used. Recall is defined as

$$R = \frac{C}{C + M} = \frac{C}{D} \tag{1}$$

and precision is defined as

$$P = \frac{C}{C + FP} \tag{2}$$

where D is the total number of actual frames in dissolve boundaries, C is the number of dissolve frames correctly detected by the algorithm, M is the number of dissolve frames missed by the algorithm, and FP is the number of false positives detected by the algorithm. Using the F1 measure, the performance of different algorithms can be ranked [4]. F1 combines recall and precision with equal weight; it is the harmonic mean of recall and precision:

$$F_1(R, P) = \frac{2 \times R \times P}{R + P} \tag{3}$$

However, unlike an abrupt transition, which occurs at a single frame, a dissolve transition takes place over a range of frames. It is therefore very difficult to evaluate whether the methods actually identify different types of dissolve transitions in the presence of camera and object motion. An additional evaluation metric is needed that shows how many dissolves are correctly detected; this detection rate is given by

$$DR = \frac{\text{Hit}}{\text{Actual}} \tag{4}$$

where Hit is the number of dissolves correctly detected and Actual is the number of actual dissolves in the video.

Table I. The comparative analysis of various shot boundary detection algorithms

[1] Eigenvalue decomposition and Gaussian transition detection
    Features used: generalized eigenvalue decomposition
    Transitions detected: abrupt changes at hard cuts; semi-Gaussian behavior in gradual transitions
    Merits: increases effectiveness and reduces computational cost; good recall, in the range 92.4-97.2%
[2] Fast shot detection framework employing pre-processing techniques
    Features used: thresholding and bisection-based comparison to discard a large number of non-boundary frames
    Transitions detected: subsequent hard cut and gradual transition detection
    Merits: significant speedup; precision and recall are satisfactory
[6] 3-DWT based motion suppression
    Features used: motion intensity and motion suppression value
    Transitions detected: gradual transitions and motion
    Merits: solves the problem of illumination and noise
[7] B-spline interpolation
    Features used: low-scale and low-resolution images
    Transitions detected: classifies the detected regions into dissolve and fade types by investigating the intra-frame standard deviation and the number of consecutive frames of solid color
    Merits: addresses the lack of a defined framework for dealing with all gradual transitions
[8] DWT-based shot boundary detection using a support vector machine
    Features used: color and edge in different directions from wavelet transform coefficients
    Transitions detected: cut transitions, gradual transitions and normal sequences
    Merits: solves the problem of illumination and noise
[9] Simultaneous detection of abrupt cuts and dissolves using support vector machines
    Features used: color histogram and χ2
    Transitions detected: abrupt cuts and dissolves
    Merits: no threshold used for detection; not sensitive to video content
[10] K-step slipped window
    Features used: low-level features and edit features of the video shot
    Transitions detected: hard cut detection, flash exclusion and gradual transition detection
    Merits: accurate shot detection; reduced amount of calculation
[11] Model-based shot boundary detection using frame transition parameters
    Features used: global features such as the color intensity histogram
    Transitions detected: classifies no change, gradual change and abrupt change
    Merits: less dependent on the selection of thresholds; effective post-processing; reduces false transitions and misclassification
[12] Mutual information based method
    Features used: frame gray variance based method and block color histogram
    Transitions detected: cuts and GTs
    Merits: accurate shot boundary detection
[13] Accumulation algorithm
    Features used: C frames
    Transitions detected: cuts, fades and dissolves
    Merits: effectively handles small horizontal and circular motion and optical zooms; reduces false detections and improves precision and recall
[14] ISODATA classification algorithm
    Features used: Kappa coefficient
    Transitions detected: gradual effects
    Merits: no need to know the number of clusters
[15] Fast coarse-to-fine video shot segmentation algorithm
    Features used: difference between DC images of all I-frames
    Transitions detected: camera motion, object motion and gradual shot transitions
    Merits: no threshold used for detection; not sensitive to video content

VI. EXPERIMENTAL RESULTS

We have selected a test video sequence that contains significant camera motion together with a dissolve transition. The portion of the video considered for analysis consists of about 200 frames with camera motion (frames 16-217) and 75 frames with a dissolve transition (frames 289-363).
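The inter-frame measure applied in this experiment, the χ2 distance between the 768-bin normalized RGB histograms described for [9], can be sketched together with the frame-level recall, precision and F1 of Eqs. (1)-(3). This is a minimal sketch under stated assumptions: frames are lists of (r, g, b) tuples with 8-bit channels, the symmetric form Σ(x − y)²/(x + y) is assumed as the χ2 variant, and a single global threshold is used. All function names are hypothetical.

```python
def rgb_histogram(frame):
    """768-bin vector: 256-bin R, G and B histograms, each normalized to sum to 1."""
    hist = [0.0] * 768
    for r, g, b in frame:
        hist[r] += 1
        hist[256 + g] += 1
        hist[512 + b] += 1
    n = len(frame)
    return [c / n for c in hist]


def chi_square(h1, h2):
    """Symmetric chi-square distance; bins that are empty in both histograms are skipped."""
    return sum((x - y) ** 2 / (x + y) for x, y in zip(h1, h2) if x + y > 0)


def histogram_differences(frames):
    """Distance between each pair of consecutive frames (len(frames) - 1 values)."""
    hists = [rgb_histogram(f) for f in frames]
    return [chi_square(hists[i], hists[i + 1]) for i in range(len(hists) - 1)]


def detect(diffs, threshold):
    """Indices i where a boundary is declared between frame i and frame i + 1."""
    return [i for i, d in enumerate(diffs) if d > threshold]


def recall_precision_f1(detected, actual):
    """Eqs. (1)-(3) on sets of frame indices: C = hits, M = misses, FP = false positives."""
    detected, actual = set(detected), set(actual)
    c = len(detected & actual)
    m = len(actual - detected)
    fp = len(detected - actual)
    r = c / (c + m) if c + m else 0.0
    p = c / (c + fp) if c + fp else 0.0
    f1 = 2 * r * p / (r + p) if r + p else 0.0
    return r, p, f1
```

With a global threshold, every camera-motion frame whose distance exceeds the threshold is counted in FP, which lowers the precision P of Eq. (2) even when every dissolve frame is found.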
Fig.3 shows consecutive frames from the movie Xperia.

[Fig.3 panels: frames 16, 32, 54, 124, 187, 217, 289, 293, 301, 308, 315, 320, 333, 338, 352 and 363]
Fig.3. Consecutive frames from the movie clip Xperia showing dissolve transitions

[Fig.4 plot: histogram difference between frames (y-axis) versus frame index (x-axis)]
Fig.4. Histogram difference between consecutive frames

We have applied the color histogram difference between frames, as described in [9], to the above test video sequence. Fig.4 shows the histogram difference over 370 consecutive frames. Here the actual dissolve transition spans frames 289-363, whereas the camera motion spans frames 16-217. It can be clearly observed from Fig.4 that, because of the fast camera motion, the histogram difference between frames is higher during the motion than during the dissolve transition. Hence, if a global or adaptive threshold is applied to the histogram difference between frames, the algorithm will report more false positives than actual dissolve frames. If a shot boundary algorithm falsely identifies camera motion as a shot boundary, the corresponding key frames and video retrieval will also give false results. Hence, a robust and effective algorithm is required that can either suppress motion with respect to the dissolve transition or differentiate between motion and actual shot transitions.

VII. CONCLUSION

The demand for multimedia data services necessitates the development of techniques to store, navigate and retrieve visual data. Using existing text indexing techniques for image and video indexing is inefficient and complex. Moreover, this approach is not generic and hence is not useful in a wide variety of applications. Consequently, shot boundary detection techniques should be employed to search for desired images and videos in a database. This paper reviews shot boundary detection techniques proposed in the recent literature, and the main contribution of each algorithm has been presented in brief. The main focus of the review is the detection of gradual transitions in the presence of camera and object motion. We believe that the feature extraction techniques reviewed in this paper will provide important clues for designing efficient shot boundary detection algorithms. After reviewing the literature in detail, we conclude that most of the algorithms are unable to differentiate between gradual transitions and motion. In future work, we intend to develop an algorithm that detects gradual transitions without producing false positives for object and camera motion in movie videos.

REFERENCES:
[1] Ali Amiri, Mahmood Fathy, "Video shot boundary detection using generalized eigenvalue decomposition and Gaussian transition detection", Computing and Informatics, vol. 30, pp. 595-619, 2011.
[2] Y.-N. Li, Z.-M. Lu, X.-M. Niu, "Fast video shot boundary detection framework employing pre-processing techniques", IET Image Processing, vol. 3, iss. 3, pp. 121-134, 2009.
[3] R. Bolle, B. Yeo, M. Yeung, "Video query: research directions", IBM Journal of Research and Development, vol. 42, iss. 2, pp. 233-252, 1998.
[4] N. Dimitrova, H.-J. Zhang, B. Shahraray, I. Sezan, T. Huang, A. Zakhor, "Applications of video content analysis and retrieval", IEEE Multimedia, vol. 9, iss. 3, pp. 42-55, 2002.
[5] M. Lew, N. Sebe, P. Gardner, "Video indexing and understanding", in: M. Lew (Ed.), Principles of Visual Information Retrieval, Springer, Berlin, pp. 163-196, 2001.
[6] Yang Xu, Xu De, Guan Tengfei, Wu Aimin, Lang Congyan, "3-DWT based motion suppression for video shot boundary detection", Knowledge-Based Intelligent Information and Engineering Systems, Lecture Notes in Computer Science, vol. 3682, pp. 1204-1209, 2005.
[7] Jeho Nam, Ahmed H. Tewfik, "Detection of gradual transitions in video sequences using B-spline interpolation", IEEE Transactions on Multimedia, vol. 7, iss. 4, pp. 667-678, Aug. 2005.
[8] Jun Li, Youdong Ding, Yunyu Shi, Qingyue Zeng, "DWT-based shot boundary detection using support vector machine", Information Assurance and Security, vol. 1, pp. 435-438, 18-20 Aug. 2009.
[9] Vasileios Chasanis, Aristidis Likas, Nikolaos Galatsanos, "Simultaneous detection of abrupt cuts and dissolves in videos using support vector machines", Pattern Recognition Letters, vol. 30, pp. 55-65, 2009.
[10] Tuanfa Qin, Jiayu Gu, Huiting Chen, Zhenhua Tang, "A fast shot boundary detection based on K-step slipped window", 2nd IEEE International Conference on Network Infrastructure and Digital Content, pp. 190-195, 24-26 Sept. 2010.
[11] Partha Pratim Mohanta, Sanjoy Kumar Saha, Bhabatosh Chanda, "A model-based shot boundary detection technique using frame transition parameters", IEEE Transactions on Multimedia, vol. 14, iss. 1, pp. 223-233, Feb. 2012.
[12] Na Lv, Zhiquan Feng, Jingliang Peng, "Mutual information based video shot boundary detection", Image Analysis and Signal Processing, pp. 1-5, 9-11 Nov. 2012.
[13] T. Lu, P. N. Suganthan, "An accumulation algorithm for video shot boundary detection", Multimedia Tools and Applications, vol. 22, iss. 1, pp. 89-106, Jan. 2004.
[14] K. Koumousis, V. Fotopoulos, A. N. Skodras, "A new approach to gradual video transition detection", Informatics (PCI), pp. 245-249, 5-7 Oct. 2012.
[15] Liu Liu, Jian-Xun Li, "A novel shot segmentation algorithm based on motion edge feature", 2010 Symposium on Photonics and Optoelectronics, pp. 1-5, 19-20 June 2010.