International Journal of Advanced Intelligence
Volume 2, Number 1, pp.37-55, July, 2010.
© AIA International Advanced Information Institute

A Method for Detecting Subtitle Regions in Videos Using Video Text Candidate Images and Color Segmentation Images

Yoshihide Matsumoto, Tadashi Uemiya, Masami Shishibori and Kenji Kita
Faculty of Engineering, The University of Tokushima
2-1 Minami-josanjima, Tokushima 770-8506, Japan
matsumoto@laboatec.com; uchikosi@helen.ocn.ne.jp; {bori;kita}@is.tokushima-u.ac.jp

Received (January 2010)
Revised (May 2010)

In this paper, a method for detecting text regions in digital videos with telop, such as drama, movie and news programming, is proposed. The typical characteristics of telop are that it does not move, and that its edges are strong. This method takes advantage of these characteristics to produce video text candidate images. Then, this method produces the video text region images from both the video text candidate images and the color segmentation images. The video text region images and the original image are used to identify the color of the telop. Finally, text regions are detected by increasing neighboring pixels of the identified color. The experiment results show that the precision of this method was 80.36% and the recall was 77.55%, whereas the precision of the traditional method was 40.22% with the recall 75.48%. Higher accuracy was achieved by using this new method.

Keywords: Video text candidate image; Color segmentation image; Video text region image; Multimedia information retrieval.

1. Introduction

In recent years, with the spread of the Internet, increased hardware specifications, and the development of imaging devices such as digital cameras and digital video cameras, there are more and more opportunities to accumulate large amounts of video content in personal computers.
It is difficult to search such content efficiently for a required image or scene, so information that clearly describes the content is needed. This information usually includes cut points, camera work, sound, and subtitles. Subtitles often describe the subject being photographed or the topic. They also appear in sync with the video, which makes them noteworthy as strings that reflect the semantic content. One well-known image-handling technology focusing on subtitles is the Informedia project [1,2], where large volumes of image data are processed using images from cut scenes, characters recognized from subtitles, and speech-recognition data. A method has been proposed for matching cooking instructions and cooking images using subtitles and closed captions [3]. A method to index the semantic attributes corresponding to scenes in news programs using closed captions has been proposed [4]. A method has also been proposed for recognizing text residing within a subtitle region [5]. To implement applied methods like these, it is first necessary to detect the temporal and spatial ranges of the subtitles in the image. A highly accurate method for detecting subtitle regions is therefore desired.

Sato et al. [6] have proposed a traditional subtitle detection method, where macro block coding information is used to detect subtitle regions in images compressed as MPEGs. While this method allows for fast processing, its accuracy has not yet reached a practical level. Arai et al. [7] have focused on a feature of subtitles, called edge pairs, to propose another method, where subtitle regions are detected from the spatial distribution and temporal continuity of edge pairs. Although the detection accuracy of this method has reached a practical level, the absence of a learning function may decrease the accuracy as the text fonts change.
Hori et al. [8] have proposed yet another method, where text candidate images are obtained from the logical products of low-dispersion images and immovable-edge images, followed by learning-based detection of subtitle regions. While this method leads to high recall, its precision is low: it tends to detect excessive regions as subtitles, so that the subtitle text gets crushed. Additionally, there has been a proposal to increase the detection accuracy of subtitle regions by first creating text candidate images, and then using a learning-based classifier called the Support Vector Machine (SVM) [9] and a feature point extraction operator called the Harris Interest Operator (Harris operator) [10,11]. Although this method [12] increases precision, it has its own issues: it needs data for learning, and the recall decreases.

This paper proposes a method for detecting subtitle regions with high accuracy by first generating video text candidate images in the same way as in traditional methods [7,8], followed by checking color segmentation images against the original image. In this method, text candidate images are obtained first in the same way as in the traditional method [8], based on the regions where little brightness change occurs between continuous frame images, and on the regions with no changes in edges. The subtitles within the text candidate images obtained this way are detected almost perfectly, but the background tends to be excessively detected at the same time. In other words, the recall is high while the precision is low. As a workaround, the text candidate images are combined with color segmentation images, after which only the color segments that appear to be text are selected, thereby generating text region images with low background noise. The text region images thus obtained have few instances of the background falsely detected as text.
However, because subtitle regions are detected based on color segments, some characters in minute color segments of the subtitle text tend to escape detection. In other words, the precision is high while the recall decreases. In an effort to improve the recall, text color was used, assuming that the color information of the subtitles does not change. Specifically, the color information of the subtitles is determined using multiple text region images generated within continuous frames and the original image. The recall is then improved by increasing neighboring pixels that have similar color information, thereby accurately detecting subtitle regions.

Chapter 2 introduces a traditional method for detecting subtitle regions using video text candidate images. Chapter 3 proposes a method for generating text region images using video text candidate images and color segmentation images, as well as a method for detecting subtitles by automatically setting the color of the subtitle text using text region images and the original image. Chapter 4 provides experiments for assessing the validity of the proposed method, along with the results and discussions. Finally, Chapter 5 presents the conclusion and describes future issues.

2. Overview of Traditional Methods

This chapter introduces a traditional method for detecting subtitle regions using text candidate images. Text candidate images are also used in our proposed method as subtitle region images in their first phase.

2.1. A method for generating video text candidate images using low distributed images and immovable edge images

Hori et al. [8] have proposed a method for generating video text candidate images from the logical products of low distributed images and immovable edge images. First, one low distributed image is created from continuous frame images based on an arbitrary number of brightness images.
If this number is N, the brightness images of N frames are used to obtain the distribution value (variance) of the brightness of each pixel. In this method, we chose brightness images for 4 frames. Pixels whose distribution values are lower than a specified threshold value are assigned a value of 1, and all other pixels are assigned 0, in order to obtain low distributed images. The threshold value is set using discriminant analysis. Static regions such as subtitles have little change in brightness, thus their distribution values are low. More dynamic regions have higher distribution values. Therefore, the resultant low distributed images tend to have most of the subtitles intact.

Similarly, one immovable edge image is created from continuous frame images based on an arbitrary number of brightness images. First, binary edge images, in which edge pixels have the value of 1, are obtained from the brightness images. A wavelet transform is used to detect edges. Then the logical product of the edge images for N frames is obtained. In this method, we chose brightness images for 4 frames. The images obtained by the logical product are called immovable edge images, which have sharp edges on the boundaries with the background. Static pixels are prone to remain here, and so the subtitles tend to remain in a similar way to low distributed images. Low distributed images and immovable edge images are obtained in the flow shown in Fig. 1. The logical product of the low distributed images and the immovable edge images obtained in the above manner generates the video text candidate images.

Fig. 1. An illustration of making a video text candidate image from each video frame.

2.2. A method for detecting subtitles using SVM and the Harris operator

Hiramatsu et al. [12] have proposed a method which suppresses erroneous detection with the use of SVM and the Harris operator.
In the method of Hiramatsu et al., video text candidate images are first generated excluding as much as possible the background parts except for the subtitles. Then the video text candidate images are divided into blocks similar in size to the pre-determined text, as shown in Fig. 2. A brightness histogram for each block is created from the white pixels remaining in that block. Each block is assessed using SVM, labeling subtitle-bearing blocks as positive, and those without subtitles as negative. The Harris operator, which is highly repeatable under image enlargement, is then applied to images determined to be subtitle regions by SVM, to increase the precision. The interest points detected by the Harris operator are abundant in parts with large color variation as well as in edges. Since in many cases subtitle regions are rendered in colors complementary to those of the images around them, it is expected that many interest points will be detected in the vicinity of subtitle regions. Therefore, blocks with positive identification by SVM are detected as subtitle regions if they have many interest points. Subtitle regions may not be recognized if edges of the text reside within blocks. Subtitles are long text strings aligned horizontally. Therefore, to avoid this non-recognition issue, the number of interest points on the right and left side of the region in question is used to determine if that region is a subtitle region.

Fig. 2. An example of histogram data generation.

Fig. 3. An example of detecting the interest points by the Harris operator.

2.3. Issues on traditional methods

Traditional methods eliminate the background using the characteristics of subtitles found in images, and then recognize subtitle regions using SVM with manually prepared positive and negative data, and the interest points. However, such traditional methods have the following issues:

(i) Positive and negative data must be prepared manually so that SVM can learn them.
(ii) Subtitle texts not residing within divided blocks may make subsequent text recognition difficult.

To solve these issues, we focused on techniques for dividing image regions. In the subsequent chapters, we will discuss a subtitle-region detection method based on a technique for dividing image regions.

3. The Proposed Method

This paper proposes a method for detecting subtitle regions based on images that have been processed with color segmentation and video text candidate images. We will call subtitle images generated using video text candidate images and color segmentation images “text region images.” We will first discuss a method for generating text region images using video text candidate images and color segmentation images. After that, we will use the text region images and the original image to automatically set the text color, and discuss the process flow for detecting final subtitle regions.

3.1. Generating text region images using color segmentation images

3.1.1. Introduction to the method

The process flow for detecting subtitles based on a color segmentation image is shown in Fig. 4. First, a video text candidate image is obtained in the same way as in the traditional method [8]. At the same time, an image processed with color segmentation (“a color segmentation image”) and color segmentation image data are obtained (Step 1 of Fig. 4). The color segmentation image data include the region numbers, the size of each region (the total number of pixels), the central coordinate (x, y), the color information of the regions (Luv), and the coordinates that belong to the regions. A video text candidate image is created from four continuous frames, while a color segmentation image is created from the first frame that was used when creating the video text candidate image.
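As an illustration of how the video text candidate image in Step 1 could be built from four consecutive brightness frames (Section 2.1), the following is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the fixed thresholds `var_threshold` and `edge_threshold`, and the simple gradient-based edge detector are all hypothetical: the paper sets the variance threshold by discriminant analysis and detects edges with a wavelet transform, neither of which is reproduced here.

```python
from statistics import pvariance


def low_variance_mask(frames, var_threshold):
    """Return 1 where brightness variance across the frames is below the threshold."""
    h, w = len(frames[0]), len(frames[0][0])
    return [[1 if pvariance([f[y][x] for f in frames]) < var_threshold else 0
             for x in range(w)] for y in range(h)]


def edge_mask(frame, edge_threshold):
    """Stand-in edge detector (central differences); the paper uses a wavelet transform."""
    h, w = len(frame), len(frame[0])
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = frame[y][min(x + 1, w - 1)] - frame[y][max(x - 1, 0)]
            gy = frame[min(y + 1, h - 1)][x] - frame[max(y - 1, 0)][x]
            if abs(gx) + abs(gy) >= edge_threshold:
                mask[y][x] = 1
    return mask


def text_candidate_image(frames, var_threshold=25.0, edge_threshold=100):
    """Logical product of the low-variance image and the immovable edge image."""
    low_var = low_variance_mask(frames, var_threshold)
    edges = [edge_mask(f, edge_threshold) for f in frames]
    h, w = len(low_var), len(low_var[0])
    # A pixel is an immovable edge only if it is an edge in every frame.
    return [[1 if low_var[y][x] and all(e[y][x] for e in edges) else 0
             for x in range(w)] for y in range(h)]


# Four frames with a static bright stripe (columns 2-3) over a flickering background.
frames = [[[v, v, 255, 255, v, v] for _ in range(3)] for v in (0, 60, 0, 60)]
candidate = text_candidate_image(frames)
```

In this toy input the stripe keeps both a low variance and a stable edge, so only its pixels survive the logical product, while the flickering background is rejected by the variance test.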
Then, the video text candidate image and the color segmentation image are used to eliminate noise (Step 2 of Fig. 4). We will process the elimination in two ways: (1) by horizontally scanning the video text candidate image so that only the pixels within the subtitles remain, and (2) by checking the video text candidate image against the color segmentation image data in order to select only the color segments that appear to be subtitles. After this elimination process, we will eliminate the edges of subtitles, because it is common for subtitles to have edges added on (Step 3 of Fig. 4). Specifically, we take advantage of the fact that the bodies and edges of subtitle characters use different colors. We will use the k-means method to classify the colors of the regions that contain the white pixels that are left after the noise elimination process. Finally, we will supplement the text characters (Step 4 of Fig. 4) to improve recall. Specifically, we will search each segmentation region around pixels that remain as part of the subtitles at the end of Step 3, and add the regions that resemble the subtitle region in size and typical color. Below are detailed discussions of each module.

Fig. 4. Outline of detecting text regions by using color segmentation images.

3.1.2. Generating color segmentation images

In this process, the region integration method is used to generate color segmentation images. Region integration is a method for dividing an image into multiple sets (regions) of pixels that have similar feature values and are spatially close, based on such characteristics as the pixel values and the texture. The reason we chose this method is the characteristic of the subtitles. As discussed in the section on low distributed images, the brightness of subtitles varies little, and their color does not change much.
In other words, all subtitles have more or less the same characteristics, which led us to speculate that the color segmentation process might successfully extract subtitle regions. Below are the steps for integrating regions. Fig. 5 and 6 show examples of color segmentation images generated using the region integration method.

Step 1 Search for each pixel by raster scanning, flagging any unlabeled and unclassified pixels and labeling them.
Step 2 Check the eight (8) pixels neighboring the flagged pixels, and assign them the same label as that of the flagged pixels if the pixel value is the same.
Step 3 Repeat Step 2 with the newly labeled pixels as the flagged pixels.
Step 4 If no pixels are labeled in Step 2, repeat Step 1.
Step 5 The process is complete when all pixels have been labeled. Sets (regions) of neighboring pixels with the same pixel value are obtained at this point. Proceed to the next step using the labeled pixels.
Step 6 Obtain the average pixel values among the pixels bearing the same label.
Step 7 Of the neighboring sets of pixels, integrate the two that have the smallest difference in the average pixel values obtained in Step 6.
Step 8 Repeat Steps 6 and 7. To keep all regions from eventually merging into a single set of pixels, a maximum difference in average values should be established for allowing integration.
[End of the steps of the region integration method]

Fig. 5. Original image.

Fig. 6. Color segmentation image.

Fig. 7. Noise elimination by scanning of white pixels.

3.1.3. Noise elimination

The process of noise elimination is two-fold. The first phase starts with horizontal scanning of a video text candidate image as shown in Fig. 7, creating a histogram with a tally of white pixels. The scanning direction depends on the direction of the subtitles. Because white pixels are packed into subtitles, the histogram shows locally high numbers where subtitles are found.
Based on this observation, locations where the histogram numbers show sharp climbs and falls are identified, and only these locations are kept, thus narrowing down subtitle-containing regions.

In the second phase of noise elimination, we take advantage of the characteristics of subtitles, i.e., images processed with color segmentation based on color information are used. Because each character of the subtitle has the same color information, we can predict that the background and the subtitles reside in different regions of a color segmentation image. We can also predict that the subtitle regions are smaller than the background regions. The noise elimination process takes advantage of these characteristics. First, the video text candidate image and the color segmentation image data are checked against each other after the noise eliminating process in Phase 1, and the ratio of white pixels in each region is measured. Next, regions whose ratio of white pixels is higher than the threshold value are made all white, and all other pixels are made black. Because subtitle regions are smaller than background regions, whether the white pixels remaining in a region survive this process depends largely on whether the region is a subtitle region or a background region. Fig. 9 shows an image after noise elimination.

Fig. 8. Noise elimination by ratio of white pixels.

Fig. 9. Image with noise eliminated.

3.1.4. Classification by k-means

Generally, each character of subtitles consists of the edge part and the character itself, each in its own color. After noise elimination, an image may still have both of these parts left. If the edge part is still left, the entire character is crushed, making it difficult to identify the character, especially if it is a complicated character such as kanji.
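The second phase of noise elimination in Section 3.1.3 can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `filter_by_region_ratio` is hypothetical, the color segmentation is assumed to be given as a per-pixel map of region numbers, and the threshold is assumed to be a percentage (the experiments use a value of 50 without stating the unit).

```python
from collections import defaultdict


def filter_by_region_ratio(candidate, labels, ratio_threshold=50.0):
    """Keep whole segmentation regions whose share of white candidate pixels is high.

    candidate: 2-D list of 0/1 values (video text candidate image after Phase 1).
    labels:    2-D list of region numbers from the color segmentation image.
    """
    total = defaultdict(int)  # pixels per region
    white = defaultdict(int)  # white candidate pixels per region
    for cand_row, label_row in zip(candidate, labels):
        for c, region in zip(cand_row, label_row):
            total[region] += 1
            if c:
                white[region] += 1
    keep = {r for r in total if 100.0 * white[r] / total[r] > ratio_threshold}
    # Regions above the threshold become all white; everything else becomes black.
    return [[1 if region in keep else 0 for region in label_row]
            for label_row in labels]


# Region 0 is a large background segment, region 1 a small subtitle-like segment.
labels = [[0, 0, 0, 1, 1],
          [0, 0, 0, 1, 1]]
candidate = [[1, 0, 0, 1, 1],
             [0, 0, 0, 0, 1]]
cleaned = filter_by_region_ratio(candidate, labels)
```

Here the stray white pixel in the large region is discarded because it makes up only one sixth of that region, while the small region is filled completely because three of its four pixels are white.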
The k-means method enables the classification of each pixel in the subtitle characters based on the color information, and it detects only the pixels that belong to the characters. In a video text candidate image with noise eliminated, the colors of the regions to which the remaining white pixels belong are classified using k-means, as shown in Fig. 10. After the classification, only the regions with colors that belong to the class with the most elements are kept. Fig. 11 shows an example of an image after classification by k-means.

3.1.5. Character complementation

As can be seen in Fig. 11, images that have been classified by k-means tend to be high in precision and low in recall, leading to frequent non-detection. We will now focus on the characteristics of each region, and supplement the subtitle region. The part of the region that falls within the 16 x 16 pixel square around the remaining white pixels is searched as shown in Fig. 12. Then the Euclidean distance is calculated with the size of the region, the central coordinate of the region, and the color of the region as the features. If the resultant Euclidean distance is less than the threshold value, that region is added as a subtitle. Fig. 13 shows the image after character complementation, i.e., the video text region image after the application of the method based on color segmentation images.

Fig. 10. Classification of each pixel by k-means.

Fig. 11. Image after classification by k-means.

Fig. 12. An illustration of complementation of the text characters.

Fig. 13. An example of the video text region image after application of the proposed method.

3.2. Method for detecting subtitles by automatically setting the text color

3.2.1. Overview of the method

Text region images generated following the method described in the preceding section tend to miss the minute segmentation regions residing within the subtitle section, lowering the recall. We will now apply a recall-improving technique based on text color (Fig. 14). First, the color information of the subtitles is specified using the multiple text region images generated among continuous frames, and the original picture image (Step 1 of Fig. 14). Then the text characters are supplemented (Step 2 of Fig. 14). The pixels remaining after supplementation are labeled, and regions that are too large are removed (Step 3 of Fig. 14). We will discuss the details of each module.

Fig. 14. Outline of detecting text regions by specifying the text color.

Fig. 15. An illustration of color histogram generation.

3.2.2. Automatically setting the text color

Multiple text region images generated from continuous images and the original picture image of the first frame that was used to generate each text region image are used to specify the range of the subtitle text color. In this experiment, we focused on the pixels remaining in thirty (30) text region images. These pixels are checked against the original picture image to extract the RGB value. Then the 256 RGB gradation levels are compressed into 16 levels to generate a histogram (Fig. 15). The gradation level with the most pixels is determined to be the range of the color of this text.

3.2.3. Character supplementation and labeling

Eight (8) square pixels around the white pixels remaining after Step 1 of Fig. 14 (pixels that have been detected as being within the subtitle region) are searched. Pixels residing within the range determined in Step 1 are made white. Eight square pixels around these new white pixels are similarly searched until no more pixels are added.
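The text color setting of Section 3.2.2 and the supplementation of Section 3.2.3 can be sketched together as below. This is an illustrative sketch under assumptions, not the authors' implementation: the function names are hypothetical, the histogram is assumed to be taken over joint (R, G, B) bins after reducing each channel to 16 levels, and "eight square pixels" is interpreted as the 8-connected neighborhood of each white pixel.

```python
from collections import Counter, deque


def quantize(rgb, levels=16):
    """Compress each 0-255 channel into `levels` gradation levels."""
    step = 256 // levels
    return tuple(channel // step for channel in rgb)


def dominant_text_color(region_masks, originals, levels=16):
    """Return the quantized color bin holding the most text-region pixels."""
    hist = Counter()
    for mask, image in zip(region_masks, originals):
        for mask_row, image_row in zip(mask, image):
            for m, rgb in zip(mask_row, image_row):
                if m:
                    hist[quantize(rgb, levels)] += 1
    return hist.most_common(1)[0][0]  # raises IndexError if no white pixel exists


def grow_text_pixels(mask, image, text_bin, levels=16):
    """Repeatedly add neighbors of white pixels whose color falls in the text bin."""
    h, w = len(mask), len(mask[0])
    grown = [row[:] for row in mask]
    queue = deque((y, x) for y in range(h) for x in range(w) if mask[y][x])
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w and not grown[ny][nx]
                        and quantize(image[ny][nx], levels) == text_bin):
                    grown[ny][nx] = 1
                    queue.append((ny, nx))
    return grown


W, B = (250, 250, 250), (10, 10, 10)
image = [[B, W, W, W, B, W]]
mask = [[0, 1, 0, 0, 0, 0]]  # one pixel already detected as subtitle
text_bin = dominant_text_color([mask], [image])
grown = grow_text_pixels(mask, image, text_bin)
```

In this toy row the growth spreads from the seed to the two adjacent white pixels and stops at the dark pixel, so the isolated white pixel at the far end is not added. Removing oversized connected regions afterwards (Step 3 of Fig. 14) would be a separate labeling pass that is not shown here.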
After characters are supplemented, they are labeled. Labeled regions whose number of connected pixels is at or above the threshold value are removed. Fig. 16 shows an example of the final result after automatically setting the text color.

Fig. 16. An example of a video text region image generated by the proposed method.

4. Assessment

4.1. Method of experiment

We conducted an experiment to confirm the validity of our proposed method. As the experiment data, we used video of a drama that includes subtitles, in full RGB color with a resolution of 352 x 240 and a frame rate of 29.97 fps. As the correct data, we only used the images of the subtitles in the overlay region of this drama. The correct data, text candidate images, and the text region images were checked against one another for each pixel to calculate the precision and recall. 30 images were selected at random from the drama for assessment. The assessment criteria, precision (p) and recall (r), are given by formulae (1) and (2):

precision: $p = \frac{N_d}{N_d + N_f}$ (1)

recall: $r = \frac{N_d}{N_d + N_m}$ (2)

$N_d$: Number of correctly detected pixels
$N_m$: Number of pixels that escaped detection
$N_f$: Number of falsely detected pixels

The precision and recall of the traditional method are shown in Table 1. The unit used in detection is the number of pixels. Detection is deemed correct if white pixels exist where the subtitles within the frame are displayed. It is deemed “escaped detection” if white pixels do not exist there. Detection is deemed false if white pixels exist in frames or locations where there are no subtitles.

Table 1. Evaluation of the traditional method.
Precision   Recall
40.22%      75.48%

The experiment criteria for each method are listed below:

• Method for detecting subtitles using segmentation region images
  – The threshold value used for noise elimination in the second phase is 50.
  – The number of classes for k-means is varied among 2, 3, 4, and 5.
  – The threshold value of the Euclidean distance when characters are supplemented is varied among 10, 20, and 30.
• Method for detecting subtitles by automatically setting the text color
  – Regions are removed by labeling if they include 128 or more pixels that are connected.

4.2. Experiment results

Fig. 17 shows the shift in the detection accuracy when the number of classes used by k-means in the text detection method with region segmentation images changes, and when the threshold value of the Euclidean distance changes in character supplementation. In Fig. 17, K=N Precision represents the precision value when the number of classes under the k-means method is set to N (2, 3, 4, or 5), and K=N Recall represents the recall value when the number of classes is set to N. In addition, eucM represents the detection accuracy when the threshold for the Euclidean distance is set to M (10, 20, or 30) in character supplementation.

The results shown in Fig. 17 indicate that precision is higher than recall in each case, and that the recall remains steady. Changes in the threshold for the Euclidean distance did not affect accuracy significantly. On the other hand, when the number of classes under k-means changed, both precision and recall were affected. When the number of classes was 3, the precision was highest, the decrease in recall was at a minimum, and the balance between these two elements was optimal, producing the best accuracy. Each character in subtitles generally consists of three parts: the background, the edges, and the character body. It is expected that setting the number of classes to three (3) enabled appropriate classification and detection of the character bodies. When the number of classes was set to two (2), the background mixed into the selected class, resulting in lower precision. Higher numbers of classes such as 4 and 5 resulted in big drops in recall, with increased numbers of pixels that escaped detection.
This is because subtitle characters do not consist of exactly the same color, but rather the color varies slightly from character to character. For example, subtitle characters that are seemingly white were found to consist of four (4) smaller parts: mostly white, light gray, gray, and dark gray. Although the mostly white part has more pixels than the dark gray, larger numbers of classes ultimately lower the probability of the mostly white part being detected. We can reason that, as a result, there were more pixels escaping detection and edges were falsely detected, significantly lowering recall.

Fig. 17. Experiment results.

Fig. 18 shows the results of an experiment comparing one of the traditional methods (video text candidate images) and our proposed method (text region images and final result images), using the parameters with which the accuracy was best in the experiment shown in Fig. 17 (the number of classes = 3, and the threshold for the Euclidean distance = 30).

Fig. 18. Experiment results.

Fig. 19. An example of a video text candidate image.

Fig. 20. An example of a video text region image.

Fig. 21. An example of a video text region image that uses a color segmentation image.

Fig. 22. An example of a video text region image with automatic setting of the telop color.

Fig. 18 shows that our proposed method brings about better results in both precision and recall compared to traditional methods. Noise elimination was a factor in the improvement of precision. In rather static video scenes with such objects as a building, many non-subtitle pixels remained in the video text candidate image, lowering precision. It may be argued that our proposed method eliminated the non-subtitle pixels, improving precision. Additionally, the edges of subtitles had a strong tendency to remain in video text candidate images, as Fig.
19 indicates, and in many cases only the edge of a character remained. The fact that our proposed method enabled the supplementation of missing parts of a character as in Fig. 20 may have contributed to the improved precision.

Recall of the method based on region segmentation images did not significantly improve over traditional methods. One likely reason is that color classification under k-means in our proposed method is simply based on the number of elements in each class. In cases where many edge pixels remained, this may have disabled supplementation of subtitles almost completely as Fig. 21 shows, resulting in lower recall. In the future, the location of each element in a class should be taken into consideration to develop an algorithm that enables more accurate selection of the class for the character body.

Furthermore, we found that accuracy was higher in a version of our method where region segmentation images are used and then the text color is automatically set, compared to a version that uses region segmentation images alone. Fig. 21 and 22 are of the same scene. We can see that Fig. 22 shows fewer missing character pixels and more correctly detected pixels than Fig. 21. Subtitles that do not benefit from a method using region segmentation images alone can be improved if text color is considered. Use of text color may be the factor in the improvement in recall.

5. Conclusion

This paper has proposed a method for detecting subtitle regions using region segmentation images. Assessment experiments confirmed that our proposed method has higher detection accuracy than a traditional method using video text candidate images. It is thought that our proposed method has arrived at a practicable level because it can clearly detect the characters from the text regions as shown in Fig. 20 and 22.
However, a problem remains in that many parameters and thresholds must be set to make video text candidate images. Future issues include a review of the ways to automatically set parameters under our proposed method, and improvement of accuracy by eliminating minute non-subtitle regions.

Acknowledgments

This study was financed partly by the Basic Scientific Research Grant (B) (17300036) and the Basic Scientific Research Grant (C) (17500644).

References

1. H.D. Wactlar, A.G. Hauptmann, and M.J. Witbrock, Informedia News-on-Demand: Using Speech Recognition to Create a Digital Video Library, CMU Tech. Rep. CMU-CS-98-109, Carnegie Mellon University, 1998.
2. H.D. Wactlar, M.G. Christel, Y. Gong, and A.G. Hauptmann, Lessons Learned from Building a Terabyte Digital Video Library, IEEE Comput., 32(2), pp. 66-73, 1999.
3. H. Miura, K. Takano, S. Hamada, I. Iide, O. Sakai and H. Tanaka, Video Analysis of the Structure of Food and Cooking Steps with the Corresponding, IEICE Journal, J86-D-II(11), pp. 1647-1656, 2003.
4. I. Iide, S. Hamada, S. Sakai and E. Tanaka, TV News Subtitles for the Analysis of the Semantic Dictionary Attributes, IEICE Journal, J85-D-II(7), pp. 1201-1210, 2002.
5. S. Mori, M. Kurakake, T. Sugimura, T. Shio and A. Suzuki, The Shape of Characters and the Background Characteristics Distinguish Correction Function by Using Dynamic Visual Character Recognition in the Subtitles, IEICE Journal, J83-D-II(7), pp. 1658-1666, 2000.
6. S. Sato, Y. Shinkura, Y. Taniguchi, A. Akutsu, Y. Sotomura and H. Hamada, Subtitles from the MPEG High-speed Video Coding Region of the Detection Method, IEICE Journal, J81-D-II(8), pp. 1847-1855, 1998.
7. K. Arai, H. Kuwano, M. Kurakage, T. Sugimura, The Video Frame Subtitle Display Detection Method, IEICE Journal, D-2, J83-D-2(6), pp. 1477-1486, 2000.
8. O. Hori, U. Mita, Subtitles for Recognition from the Video Division Robust Character Extraction Method, IEICE Journal, D-2, J84-D-2(8), pp. 1800-1808, 2001.
9. http://svmlight.joachims.org: “SVM-Light Support Vector Machine”
10. C. Harris and M. Stephens, A Combined Corner and Edge Detector, Proceedings of the 4th Alvey Vision Conference, pp. 147-151, 1988.
11. V. Gouet and N. Boujemaa, Object-based Queries Using Color Points of Interest, Proceedings of IEEE workshop CBAIVLICBPR, pp. 30-38, 2001.
12. D. Hiramatsu, M. Shishibori and K. Kita, Subtitled Subtitles from the Area of Video Data Detection Method, IEICE Journal Information Systems and Information Industry Association and the Joint Research, IP-07-24 IIS-07-48, 2007.

Yoshihide Matsumoto
He graduated from the Information System Technology & Engineering course of Kochi University of Technology in Mar. 2002. He joined Laboatec in Japan Co., Ltd. in the same year, and became CTO and Master of the Applied IT Lab. in 2008. He entered the doctoral program of the Graduate School of Advanced Technology & Science at the University of Tokushima in Oct. 2006. His publication on the study of multimedia IT systems received the Quasi-Selected Award at the 2003 Japan IBM user symposium.

Tadashi Uemiya
He graduated from Waseda University in Mar. 1968 and worked at Kawasaki Heavy Industry Co. Ltd. from 1968 through 2000. In the same year he transferred to the IT Dep. of Benesse Co. Ltd., from which he retired in 2006. He became a doctoral program student of the Graduate School of Advanced Technology & Science at the University of Tokushima in Oct. 2006. His research interests include IE & IT, innovative IT solutions, and information technology. He has experience in an aero jet engine development project involving five countries, in the development and implementation of CAD/CAM/CAE/CG systems, and in PICS and Web information infrastructure, and was a Senior Member of IEEE.
His experience also includes the first IP public network implementation with MPLS technology in Japan, a joint project of NTT and Cisco Japan, as well as security systems such as ISMS and SRMS and personal information protection. He is now an executive IT consultant.

Masami Shishibori
He graduated from the University of Tokushima in 1991, completed the doctoral program in 1995, and joined the faculty as a research associate, becoming a lecturer in 1997 and an associate professor in 2001. His research interests are multimedia data search and natural language processing. He is a coauthor of Information Retrieval Algorithms (Kyoritsu Shuppan). He received the ISP 45th Natl. Con. Incentive Award. He holds a D.Eng. degree, and is a member of ICIER and NLP.

Kenji Kita
He graduated from Waseda University in 1981, joined Oki Electric Industry Co., Ltd. in 1983, and transferred to ATR Interpreting Telephony Research Laboratories in 1987. He became a lecturer at the University of Tokushima in 1992, an associate professor in 1993, and a professor in 2000. His research interests include natural language processing and information retrieval. He received a 1994 ASJ Technology Award. His publications include Probabilistic Language Models (Tokyo University Press) and Information Retrieval Algorithms (Kyoritsu Shuppan). He holds a D.Eng. degree.