Video Classification Using Normalized Information Distance Khalid Kaabneh Graduate College of Computer Studies Amman Arab University for Graduate Studies Amman-Jordan Abstract— There has been a vast collection of multimedia resources on the net. This has opened an opening for researchers to explore and advance the science in the field of research in storing, handling, and retrieving digital videos. Video classification and segmentation are fundamental steps for efficient accessing; retrieving, browsing and compressing large amount of video data. The basic operation video analysis is to design a system that can accurately and automatically segments video material into shots and scenes. This paper presents a detailed video segmentation technique based on pervious researches which lacks performance and since some of the videos is stored in a compressed form using the Normalized Information Distance (NID) which approximates the value of a theoretical distance between objects using the Kolmogrov Complexity Theory. This technique produced a better results in reference to performance, high recall and precision results . Keywords— Shot-Boundary Detection, Kolmogorov complexity, Normalized Information Distance. Asmaa Abdullah Zeineb Al-Halalemah Amman-Jordan binary strings. The Kolmogorov complexity, K(x) of x is defined as the length of the shortest program that can produce x. The conditional Kolmogorov complexity of x and y, K(x | y), is the length of the shortest program that produce x when y is given as input.[8] The Normalized Information Distance (NID) is an approximation of the Kolmogorov complexity using binary data and standard compression algorithms. The distance calculation between two images using NID is: d(x, y) max{(| c(xy) | - | c(y) |), ((| c(yx) | - | c(x) |)} max{| c(x) |, | c(y) |} …(1) where |c(x)| and |c(y)| is the length of the compressed data for x and y respectively. |c(xy)| is the length of the compressed data for the concatenation of x and y, and similarly for |c(yx)|. Lower values for d(x,y) indicate a smaller “distance” between the two images, indicating some similarity has been detected. The question is whether a small distance in from this calculation will translate into relevance between the two images. I. INTRODUCTION I ndexing and retrieval of digital video is an active research area in computer science . The increasing availability and use of on-line video has led to a demand for efficient and accurate automated video analysis techniques. In videos, a shot is an unbroken sequence of frames from one camera; meanwhile a scene is defined as a collection of one or more adjoining shots that focus on an object or objects of interest. The segmentation of video into scene is far more desirable than simple shot boundary detection [2]. This is because people generally visualize video as a sequence of scenes not of shots, so shots are really a phenomenon peculiar to only video. A number of researches have been devised to implement a segmentation systems based on in reference to image regions and features such as color, size, shape, and others[ ]. Some techniques make use relevance to color histogram comparison or motion tracking. This paper investigates the method known as Normalized Information Distance or (NID). NID approximates the value of a theoretical distance between frames using the theoretical Kolmogorov complexity. Kolmogorov complexity is a noncomputable function that can be used to measure the similarity between objects, data or anything that can be encoded as finite II. RELATED WORK A great deal of researches has been done on content analysis and segmentation of video using different techniques. In this paper, we have based ourselves on a pervious work we have experimented with using different segmentation techniques [a, b] and a work done by other researchers such as: Zhang, Kankanhalli, and Smoliar used a pixel-based difference method, which is one of the easiest ways to detect if two adjacent frames are significantly different [2,3]. Kasturi and Jain used a statistical methods expand on the idea of pixel differences by breaking the images into regions and comparing statistical measures of the pixels in those regions [4]. Zabih et al compared the number and position of edges in successive video frames, allowing for global camera motion by aligning edges between frames [10]. 1 III. OUR TECHNIQUE We detect shot boundaries by calculating the distance calculation between the two images in the types of videos, by the result of calculations we derive how many images and scenes per type of video with detect the value of d(x,y) when the image or scene vary from the next image or scene , also we compared the real calculation with the output from the distance calculation (we discussed the result in result section) , the distance equation is : d(x, y) max{(| c(xy) | -| c(y)|), ((| c(yx) | -| c(x) |)} max{| c(x) |, | c(y)|} are commonly used in the field of information retrieval .Recall is defined as the proportion of shot boundaries correctly identified by the system to the total number of shot boundaries present (correct and missed ). Precision is the proportion of correct shot boundaries identified by the system to the total number of shot boundaries identified by the system. We express recall and precision as: Recall = Correct Scene Changes Correct + Missed Scene Changes Precision = Correct Scene Changes Correct + False Scene Changes Where : | c(xy) | : length of compressed size of concatenated IV. TEST BED Table 2. Experiment results Recall and Precision Video Type Recall Precision News 93.3 78.6 Movies 95.3 100 Cookery program 90 100 Sports 88.9 72.7 Documentary 93.8 85 Comedy/ Drama 100 100 Commercials 100 91.7 To evaluate the results of our research, a database of various compressed video types was used. The video clips were digitized in Bmp format (total of 25442 frames) and resolution of 320*240 pixels. Video Type No.of frames News 4314 Movies 4194 Cookery program 3060 Sports 3045 Documentary 5752 Comedy/ Drama 2249 Commercials 2828 Total 25442 Table 1.Video types used & number of frames in each EXPERIMENTAL RESULT Results( Recall & Precision) 120 100 80 60 40 20 0 Ne ws The following table shows the video types used and the number of frames were analyses Sp or Do ts cu m en Co ta m ry ed y/ Dr am Co a m m er cia ls images y and x . | c(x)| : length of compressed size of image x . | c(y)| : length of compressed size of image y . Ideally, both recall and precision should equal 1. This would indicate that we have identified all existing shot boundaries correctly, without identifying any false boundaries. Now we will describe the results obtained from this experiment, the recall and precision for each video type as in table 2 and figure 1 below : M Co ov ok ie s er y pr og ra m images x and y . | c(yx) | : length of compressed size of concatenated Recall Precision In reporting our experimental results, we use recall and precision to evaluate system performance. Recall and precision 2 Figure 5. Results obtained from experiment (Recall&Precision) . [5] X.U. Cabedo and S. K. Bhattacharjee: Shot detection tools in digital video, in Proceedings Non-linear Model Based Image Analysis 1998, Springer Verlag, pp 121-126, Glasgow, July 1998. [6] V. CONCLUSION Video segmentation has been gaining a great attention since the world has been relaying on the internet for fast accessing of digital multimedia files such as digital videos. Video segmentation is the basic block for video indexing, searching, and compression. We have presented a novel video segmentation algorithm and demonstrated its performance. Better features based on dual comparison measurements further improved the segmentation. Swanberg, D., Shu C.F., and Jain, R., “Knowledge Guided Parsing and Retrieval in Video Databases”, in Storage and Retrieval for Image and Video Databases, Wayne Niblack, Editor, Proc. SPIE 1908, February, 1993, pp. 173-187. [7] Little, T.D.C, Ahanger, G., Folz, R.J., Gibbon, J.F., Reeve, F.W., Schelleng, D.H., and Venkatesh, D., “A Digital On-Demand Video Service Supporting Content-Based Queries”, Proc. ACM Multimedia 93, Anaheim, CA, August, 1993, pp.427-436. [8] R. Zabih, J. Miller, and K. Mai, A feature –based algorithm for detecting and classifying scene breaks, in Proceedings ACM Multimedia 95 , pages 189-200 , November 1993. [9] J. Canny , A computational approach to edge detection ,in IEEE Transaction on Pattern Analysis and Machine Intelligence 8(6),pages 679-698,1986. [10] Y. Gong, C. H. Chuan, and G. Xiaoyi, Image indexing and retrieval based on color histograms, in Multimedia Tools and Applications, volume 2, pages 133-156,1996. REFERENCES [1] H. J. Zhang, A. Kankanhalli and S. W. Smoliar, Automatic partitioning of full-motion video , in Multimedia Systems, volume 1 , pages 10-28, 1993. [2] R. Kasturi, and R. Jain, Dynamic Vision, in Computer Vision: Principles, R. Kasturi, and R. Jain, Editors, IEEE, Computer Society Press, Washington, 1991. [3] H. Ueda, T. Miyatake, and S. Yoshizawa, “IMPACT: An Interactive Natural-motion-picture dedicated Multimedia Authoring System, “in proceedings of CHI, 1991(New Orleans, Louisiana, Apr-May, 1991) ACM, New York, 1991, pp. 343-350. [11] J. Meng, Y. Juan, S.-F. Chang , Scene change detection in an MPEG compressed video sequence, in IS&T/SPIE Symposium Proceedings , volume 2419, February 1995. [12] M. Tague, The pragmatics Information Retrieval experimentation, in Information Retrieval Experiment, Karn Sparck Jones Ed., Buttersworth, pages 59-102, 1981. [13] Arman, F., Hsu, A., and Chiu, M-Y., “Image Processing on Encoded Video Sequences”, Multimedia Systems (1994) Vol.1, No. 5, pp. 211219. [a] K. Kaabneh and H. Al-Bdour. “A New Segmentation Technique and Evaluation.” Mu’tah Lil-Buhuth wadDirasat Journal, 19 (2): 39-51, Jan. 2004. [4] A. Nagasaka and Y. Tanaka, Automatic video indexing and full –video search for object appearances , in Visual Database Systems II , Elsevier Science Publishers , pages 113-117, 1992. 3 [b] K. Kaabneh, O. Alia, A. Suleiman, A. Aburabela, “Video Segmentation via Dual Shot Boundary Detection (DSBD)”, In the proceedings of the 2nd IEEE International Conference On Information & Communication Technologies (ICTTA’06), April 24 - 28, 2006. 4