Justin Kwong
Graduate Introduction to Imaging Science
11/9/07

Seam Carving as a Superior Algorithm for Content-Aware Image Resizing

Seam carving is an algorithm developed to resize images while maintaining the size of the content that is important to the viewer. This is sometimes called content-aware resizing or image retargeting. While not flawless, it is superior to other techniques because it is robust and efficient, to the point of being able to retarget images in real time. Its metric of feature importance can also be interchanged, combined, or manually altered while the same resizing technique is used, which makes it extremely flexible.

The variety of digital displays in today’s world makes image resizing an extremely important task. A particular image is likely to be displayed both on a tiny cellular phone screen and on an entire wall by a projector. The task is not as simple as scaling the image by a certain percentage. Inherent in many of these displays is a change in aspect ratio, where the height and width of the display have different scale factors relative to the original image. An analogous problem is trying to change only the height or width of an image inside a digital presentation or paper. The result is an image that looks stretched or squished. Even more relevant is the current state of television displays. Video is being displayed in the old 4:3 and new 16:9 ratios, and televisions can be found with screens built for either ratio. Somehow, the television must accommodate video signals of both aspect ratios. Current implementations of resizing use only techniques like cropping and scaling, with no regard for image content.

The seam carving algorithm is fairly straightforward, with some intricacies in implementation. It begins by converting an image into an energy image using what Avidan and Shamir [2007] call an energy function. The energy they refer to is simply a measure of feature importance at every pixel location.
Any function can be used to do this, and tests have shown that the gradient magnitude $e(I) = \left|\frac{\partial}{\partial x} I\right| + \left|\frac{\partial}{\partial y} I\right|$ works well for many cases [Avidan and Shamir 2007]. Other energy functions tested were the L1- and L2-norm of the gradient, a saliency measure [Itti et al. 1999], the Harris-corners measure [Harris and Stephens 1988], and Histogram of Gradients [Dalal and Triggs 2005].

Figure 1: Image and its gradient magnitude image [Avidan and Shamir video 2007]

Different energy functions have been shown to be effective in some cases and to produce artifacts in others. Because the seam carving algorithm can use any of these functions, it can be fine-tuned for certain tasks, and it can also incorporate future importance metrics that are even more robust than gradient magnitude.

The next logical step is to remove the pixels of lowest energy in order to change the size of the image.

Figure 2: Various methods of pixel removal in order to reduce the width of the original image (a) [Avidan and Shamir 2007]

Figure 2 shows the resulting images for various methods of image size reduction. Image (f) is the case where the pixels of lowest energy are removed in ascending order until the number of pixels left in the image equals the number desired for the resized image. As each pixel is removed, the remaining pixels in its row are shifted to the left. It is evident that this method has no consideration for the structure of the image. The black area to the right of the image shows how many low energy pixels were removed from each row. Image (e) is a slightly improved method in which an equal number of the lowest energy pixels is taken from each row, but this still distorts the image significantly. Image (c) removes the columns with the lowest total energy in ascending order.
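
The gradient-magnitude energy and the column-removal baseline of image (c) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by Avidan and Shamir: the function names `gradient_energy` and `remove_lowest_energy_columns` are hypothetical, the image is assumed to be a grayscale list of rows, and the clamped differences are one assumed discretization of the partial derivatives.

```python
def gradient_energy(img):
    """Approximate e(I) = |dI/dx| + |dI/dy| for a grayscale image.

    img is a list of rows of numbers. Differences are taken between the
    neighbours on either side of each pixel, clamped at the borders, so
    the values are unnormalized and only meaningful relative to each other.
    """
    h, w = len(img), len(img[0])
    energy = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            dy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            energy[y][x] = float(abs(dx) + abs(dy))
    return energy


def remove_lowest_energy_columns(img, n_remove):
    """Drop the n_remove columns with the smallest summed energy.

    The energy is computed once on the input image (an assumption made
    for brevity); ties are broken in favour of the left-most column.
    """
    energy = gradient_energy(img)
    w = len(img[0])
    column_energy = [sum(row[x] for row in energy) for x in range(w)]
    # sorted() is stable, so equal-energy columns keep left-to-right order.
    drop = set(sorted(range(w), key=lambda x: column_energy[x])[:n_remove])
    return [[v for x, v in enumerate(row) if x not in drop] for row in img]


# Example: a flat image with one vertical edge between columns 2 and 3.
image = [[0, 0, 0, 10, 10, 10] for _ in range(3)]
narrower = remove_lowest_energy_columns(image, 2)
```

On this example the two flat columns at the left are dropped and the columns around the edge are kept, which is the behavior the column-removal method is meant to have on simple content.
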
Clear breaks in the structure can be seen where a column was removed. Seam carving (d) uses a clever method of traversing the image, starting at a top pixel (or a left pixel for vertical scaling) and following a path of least energy.

Figure 3: Average pixel energy for various pixel removal methods. [Avidan and Shamir 2007]

Figure 3 shows how the removal of pixels increases the average energy of the remaining pixels. Theoretically, the higher the average energy of the pixels at a certain width, the more important information was retained. However, as shown in the optimal and pixel images of Figure 2, only maintaining the highest energy pixels leaves no consideration for the structure of the image. On the other hand, cropping and removing columns force many of the high energy pixels to be removed. The blue curve shows how seam carving is a middle ground that takes into consideration the energy as well as the structure of the image.

Creating a seam by following a path of least energy is straightforward for digital images. Starting at any top pixel for vertical seams or any left pixel for horizontal seams, the pixel of lowest value among the three adjacent pixels (either below the current pixel or to its right) is chosen as the next step on the path. This is then repeated until the other side of the image is reached.

Figure 4: Image with horizontal and vertical seam highlighted in red. [Avidan and Shamir 2007]

Figure 4 shows how this method avoids breaking up the structures in the image. The vertical seam travels between trees in the tree line and around the rock at the bottom of the image. The horizontal seam moves around the small tree at the bottom right of the image. Using any top pixel or left pixel as the starting point, a seam for every column and row can be calculated.
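
The traversal described above can be sketched as follows for vertical seams. This is a minimal illustration under stated assumptions: `energy` is a list of rows of numbers, all function names are hypothetical, and ties between equally low neighbours are broken toward the left-most candidate, which is an arbitrary choice.

```python
def greedy_vertical_seam(energy, start_x):
    """Trace one top-to-bottom seam starting at column start_x.

    At each row, step to the lowest-energy pixel among the (up to) three
    neighbours directly below the current pixel. Returns one column index
    per row.
    """
    h, w = len(energy), len(energy[0])
    x = start_x
    seam = [x]
    for y in range(1, h):
        candidates = [c for c in (x - 1, x, x + 1) if 0 <= c < w]
        # min() returns the first minimal candidate, so ties go left.
        x = min(candidates, key=lambda c: energy[y][c])
        seam.append(x)
    return seam


def seam_energy(energy, seam):
    """Total energy of a seam: the sum of the pixels on its path."""
    return sum(energy[y][x] for y, x in enumerate(seam))


def lowest_energy_seam(energy):
    """Trace a seam from every top pixel and return the cheapest one."""
    seams = [greedy_vertical_seam(energy, x) for x in range(len(energy[0]))]
    return min(seams, key=lambda s: seam_energy(energy, s))


def remove_vertical_seam(img, seam):
    """Delete one pixel per row; the pixels to its right shift left."""
    return [row[:x] + row[x + 1:] for row, x in zip(img, seam)]


# Example: a diagonal valley of low energy running from column 1 to column 3.
energy = [[5, 1, 5, 5],
          [5, 5, 1, 5],
          [5, 5, 5, 1]]
pixels = [[1, 2, 3, 4],
          [5, 6, 7, 8],
          [9, 10, 11, 12]]
best = lowest_energy_seam(energy)
carved = remove_vertical_seam(pixels, best)
```

Avidan and Shamir [2007] find the seam with dynamic programming, which yields the path of minimum total energy. The greedy walk sketched here follows the simpler description given above and is not guaranteed to find that minimum.
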
Just like the column removal example, the seams can be ordered from lowest to highest energy by summing all the pixels in the path, and each seam can then be removed in ascending order of total energy.

Seams can also be added to the image to increase its dimensions. In this case, a set of the lowest energy seams is found and new seams are inserted next to them.

Figure 5: Example of enlarging an image using seam carving, with (a) being the original image. [Avidan and Shamir 2007]

Image (b) in Figure 5 shows the smear-like artifact created if only the lowest energy seam is used. Therefore, a set of low energy seams is used (highlighted in image (c)). The resulting image (d) looks just as authentic as the original image (a).

Another unique advantage of the seam carving algorithm is its ability to remove features in an image. This is accomplished by adding negative weight to the energy image over the region to be removed. A user can choose this region on the image using some form of graphical user interface.

Figure 6: Example of feature removal. [Avidan and Shamir 2007]

To generate the right image in Figure 6 from the original on the left, negative weight was placed on the woman in the image. In this case, positive weight was added to the man’s pixels in the energy image to ensure that he was not removed. This is shown in the bottom right inset of the original image as red and green highlights. The width of the image was then reduced until every seam containing the woman had been removed.

The field of content-aware digital image resizing is still fairly new; nevertheless, seam carving is not the only attempt at retargeting images. The major task in all these algorithms is identifying what the important features in the image are so that they can be maintained. This mainly psychological criterion has two general approaches: top-down and bottom-up.
The top-down method takes into account the internal state of the viewer, such as his or her motivation at that moment. For example, face-detecting algorithms use people’s desire to identify other people in a scene as an important feature location [Viola and Jones 2001]. Bottom-up methods only take into account internal cues that are general to all perception. A saliency map is an example of this. The saliency algorithm takes into account numerous factors such as color, orientation, motion, and intensity to identify where the focus of the viewer will be in an image.

A few algorithms have been developed that rely on cropping out a region of interest. A method developed by Suh et al. [2003] uses the output of either top-down or bottom-up metrics to locate an important feature, and then creates a smaller image by cropping the feature out.

Figure 7: Image and saliency map, original and cropped [Suh et al. 2003]

This method has been adapted to displaying large images on small mobile devices by Chen et al. [2003]. Liu et al. [2003] extended the method by cropping out multiple important features in the image and displaying them over time. However, all these algorithms are limited by the method of cropping. In Suh’s and Chen’s methods, if the image has multiple interesting features, only one is shown. Even with the added value of Liu’s method, any form of cropping loses information about the background as well as about how things in the image are located relative to each other. Seam carving, on the other hand, maintains the relative positions of features in the image, and by removing seams in order of total energy, the least noticeable and least important background is removed first.

Another method for displaying images on various devices, developed by Jacobs et al. [2003], uses an adaptive grid. Images as well as text are placed in a template.
The algorithm automatically adjusts the template for a certain display size, taking into account the size and ratio of the grids in the template. The major downfall of this paper is that it makes no mention of how images would be resized, for example when the image in a grid is larger than the display. On top of that, it forces the user to do time-consuming preprocessing of their data (separating information into different grids) and to acquire and learn software for generating a template like the one shown in Figure 8.

Figure 8: Template windows document layout [Jacobs et al. 2003]

Seam carving can be applied to any image and is efficient enough to be done in real time. Just as text in a webpage can be dynamically wrapped to the next line when the size of the browser window changes, images on a webpage can be dynamically resized by seam carving [Swieskowski 2007].

Liu and Gleicher [2005; 2006] developed a retargeting method that combines cropping and non-linear image resizing. As in the previously mentioned cropping schemes, the most important feature of the image is found using saliency and left unchanged. The area around that region is then scaled in a non-linear fashion to reduce the image size.

Figure 9: Fisheye-view warping scheme. Panels: Original, Radial/Linear, Radial/Quadratic, Cartesian/Linear, Cartesian/Quadratic. [Liu and Gleicher 2005]

Figure 9 shows the various non-linear scaling methods used for warping the background. While this is an improvement on cropping alone, which totally eliminates the context of the image, the distortion created by the warping of the background is unmistakable and objectionable. In Figure 9, the Eiffel Tower is almost completely unrecognizable and the person’s hands are completely distorted in all the scaling schemes.
In seam carving, an entire area is never scaled or changed at a time. For the particular original image presented in Figure 9, the difference in intensity between the person’s face and the tower on one hand and the night sky on the other would generate seams that travel around those two important features, causing much less distortion.

Setlur et al. [2005] devised a non-photorealistic method of image retargeting similar to Liu and Gleicher’s. In Setlur’s method, the exact outline of the feature of interest is cropped out and that region is filled in with the surrounding background. The background is resized and then the feature is placed back into the image.

Figure 10: Non-photorealistic image retargeting steps. Left to right: original image, feature map, feature identification, feature removal, and filled in background. Bottom: retargeted image. [Setlur et al. 2005]

The obvious drawback of this algorithm is the extreme change in perspective. In Figure 10, the objects that were far away in the sky are retargeted so that they look like giant toys hovering right above the man. In seam carving, the size of the features relative to the background remains unchanged for reasonable resizing. The space between the important features is simply removed. Another drawback of Setlur’s method is that it is computationally heavy. In order to cleanly outline and remove features, combinations of many feature metrics must be used. Seam carving can be limited to one simple metric like gradient magnitude for efficient image processing.

A recent algorithm proposed by Gal et al. [2006] is almost a combination of Liu and Gleicher’s method with Setlur’s method. The algorithm is a formulation of a Laplacian editing technique that allows an image to be warped arbitrarily while maintaining the structure of a specified feature.
The feature can be outlined using a non-rectangular shape and remain unchanged, as in Setlur’s method, and the background can be warped in any fashion, as in Liu and Gleicher’s method.

Figure 11: Comparison of warping techniques. [Gal et al. 2006]

Currently this algorithm requires user-defined features, but it could incorporate the feature selection scheme used by Setlur. This, however, leaves the same problem of large computation times being needed to properly identify and outline the feature. Also, the local constraints of the features cannot always be met, depending on the global warping of the image. Because seam carving discretely removes pixels, it does not affect the intensity of any of the remaining pixels.

Seam carving appears to be the most functional and ready-to-implement retargeting algorithm at present. Its ability to use different energy or feature importance metrics makes it robust, and it can also be tailored to specific tasks. The choice of less computationally heavy metrics and the ease of calculating a seam make it highly efficient as well. The ability to enlarge images without noticeable distortion and the ability to remove features are attributes that other retargeting algorithms do not have. The path followed by each seam has obvious advantages over only being able to work with rectangular regions of interest, as in Liu and Gleicher’s method. Because seam carving removes discrete pixels to resize the image, no distortions or artifacts are generated from having to warp the image. With further testing on various devices and image contents, seam carving could make tasks like surfing the web on a PDA much more practical and common in the near future.

References

Avidan, S., and Shamir, A. 2007. Seam carving for content-aware image resizing. ACM Transactions on Graphics 26, 3.
Chen, L., Xie, X., Fan, X., Ma, W., Zhang, H., and Zhou, H. 2003.
A visual attention model for adapting images on small displays. Multimedia Systems 9, 4, 353–364.
Dalal, N., and Triggs, B. 2005. Histograms of oriented gradients for human detection. In International Conference on Computer Vision & Pattern Recognition, vol. 2, 886–893.
Gal, R., Sorkine, O., and Cohen-Or, D. 2006. Feature-aware texturing. In Eurographics Symposium on Rendering.
Harris, C., and Stephens, M. 1988. A combined corner and edge detector. In Proceedings of the 4th Alvey Vision Conference, 147–151.
Itti, L., Koch, C., and Niebur, E. 1999. A model of saliency based visual attention for rapid scene analysis. PAMI 20, 11, 1254–1259.
Jacobs, C., Li, W., Schrier, E., Bargeron, D., and Salesin, D. 2003. Adaptive grid-based document layout. In Proceedings of ACM SIGGRAPH, 838–847.
Kadir, T. 2001. Saliency, scale and image description. International Journal of Computer Vision 45, 2, 83–105.
Liu, F., and Gleicher, M. 2005. Automatic image retargeting with fisheye-view warping. In ACM UIST, 153–162.
Liu, F., and Gleicher, M. 2006. Video retargeting: automating pan and scan. In ACM International Conference on Multimedia, 241–250.
Liu, H., Xie, X., Ma, W., and Zhang, H. 2003. Automatic browsing of large pictures on mobile devices. In Proceedings of the Eleventh ACM International Conference on Multimedia, 148–155.
Setlur, V., Takagi, S., Raskar, R., Gleicher, M., and Gooch, B. 2005. Automatic image retargeting. In Mobile and Ubiquitous Multimedia (MUM), ACM Press.
Suh, B., Ling, H., Bederson, B. B., and Jacobs, D. W. 2003. Automatic thumbnail cropping and its effectiveness. In UIST ’03: Proceedings of the 16th Annual ACM Symposium on User Interface Software and Technology, ACM Press, New York, NY, USA, 95–104.
Swieskowski, P. 2007. Seam Carving Demo. http://swieskowski.net/carve/. Accessed October 2007.
Viola, P., and Jones, M. 2001. Rapid object detection using a boosted cascade of simple features. In Conference on Computer Vision and Pattern Recognition (CVPR).