Justin Kwong
Graduate Introduction to Imaging Science
11/9/07
Seam Carving as a Superior Algorithm for Content-Aware Image Resizing
Seam carving is an algorithm developed to resize images while maintaining the
size of the content that is important to the viewer. This is sometimes called content-aware
resizing or image retargeting. While not flawless, it is superior to other techniques
because it is robust and efficient, to the point of being able to retarget images in real time.
It also has the ability to interchange, combine and manually alter its metric of feature
importance while still using the same technique for image resizing, making it extremely
flexible.
The variety of digital displays in today’s world makes image resizing an
extremely important task. A particular image is likely to be displayed on both a tiny
cellular phone screen and on an entire wall by a projector. The task is not as simple as
scaling the image by a certain percentage. Inherent in many of these displays is a change in
aspect ratio, where the height and width of the display have different scale factors than the
original image. An analogous problem is trying to change only the height or width of an
image inside a digital presentation or paper. The result is an image that looks stretched or
squished. Even more relevant is the current state of television displays. Video is being
displayed in the old 4:3 and new 16:9 ratios, and televisions can be found with screens built
for either ratio. Somehow, the television must accommodate video signals of both aspect
ratios. Current implementations of resizing use only techniques like cropping and
scaling, with no regard for image content.
The seam carving algorithm is fairly straightforward, with some intricacies in
implementation. It begins by converting an image into an energy image using what
Avidan and Shamir [2007] call an energy function. The energy they refer to
is simply a measure of feature importance at every pixel location. Any function can be
used to do this, and tests have shown that gradient magnitude
\[ e(I) = \left|\frac{\partial I}{\partial x}\right| + \left|\frac{\partial I}{\partial y}\right| \]
works well for many cases [Avidan and Shamir 2007]. Some other energy functions
tested were the L1 and L2 norms of the gradient, a saliency measure [Itti et al. 1999], the
Harris corner measure [Harris and Stephens 1988], and Histogram of Gradients [Dalal and
Triggs 2005].
Figure 1: Image and its gradient magnitude image [Avidan and Shamir video 2007]
Different energy functions have been shown to be effective in some cases and to produce
artifacts in others. Because the seam carving algorithm can use any of these functions, it
can be fine-tuned for certain tasks and can also incorporate future importance
metrics that are even more robust than gradient magnitude.
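As a rough illustration of the gradient-magnitude energy function, the sketch below computes it for a grayscale image with NumPy. The function name and the use of np.gradient (central differences) are assumptions made for this example; the paper does not prescribe a particular derivative filter.

```python
import numpy as np

def gradient_magnitude_energy(image):
    """Return e(I) = |dI/dx| + |dI/dy| for a 2-D grayscale image."""
    img = np.asarray(image, dtype=float)
    # np.gradient returns the derivative along axis 0 (rows, y) first,
    # then along axis 1 (columns, x).
    dy, dx = np.gradient(img)
    return np.abs(dx) + np.abs(dy)

# A flat image with a single vertical edge between columns 2 and 3.
image = np.zeros((4, 6))
image[:, 3:] = 10.0
energy = gradient_magnitude_energy(image)
print(energy)
```

Only the two columns next to the edge receive non-zero energy, so a seam would prefer to pass through the flat regions on either side.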
The next logical step is to remove the pixels of lowest energy in order to change
the size of the image.
Figure 2: Various methods of pixel removal in order to reduce the width of the original
image (a) [Avidan and Shamir 2007]
Figure 2 shows the resulting images using various methods for image size reduction.
Image (f) is the case where the pixels of lowest energy are removed in ascending order
until the total number of pixels left in the image is the number of desired pixels for the
resized image. As each pixel is removed, the remaining pixels are shifted to the left. It is
evident that this method has no consideration for the structure of the image. The black
area to the right of the image shows how many low-energy pixels were removed from each row.
Image (e) is a slightly improved method where an equal number of the lowest-energy
pixels are taken from each row, but this still distorts the image significantly. Image (c)
removes columns with the lowest total energy in ascending order. Clear breaks in the
structure can be seen where a column was removed. Seam carving (d) uses a clever
method of traversing the image starting at a top pixel (or a left pixel for vertical scaling) and
following a path of least energy.
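The two simpler strategies above, column removal (c) and equal per-row pixel removal (e), can be sketched as follows. This is a minimal illustration for 2-D grayscale arrays; the function names are hypothetical and ties are broken by leftmost position, which is an assumption of this example.

```python
import numpy as np

def remove_lowest_columns(image, energy, k):
    """Method (c): drop the k columns with the smallest total energy."""
    column_cost = energy.sum(axis=0)
    drop = np.argsort(column_cost, kind="stable")[:k]
    keep = np.setdiff1d(np.arange(image.shape[1]), drop)
    return image[:, keep]

def remove_lowest_pixels_per_row(image, energy, k):
    """Method (e): drop the k lowest-energy pixels of each row, shifting the rest left."""
    rows, cols = image.shape
    out = np.empty((rows, cols - k), dtype=image.dtype)
    for r in range(rows):
        drop = np.argsort(energy[r], kind="stable")[:k]
        keep = np.setdiff1d(np.arange(cols), drop)
        out[r] = image[r, keep]
    return out

image = np.arange(12).reshape(3, 4)
energy = np.array([[5, 1, 9, 9],
                   [9, 5, 1, 9],
                   [9, 9, 5, 1]])
by_column = remove_lowest_columns(image, energy, 1)
by_row = remove_lowest_pixels_per_row(image, energy, 1)
```

In this toy energy map the low-energy pixels lie on a diagonal, so per-row removal takes a different column in each row while column removal must discard one whole column, including its higher-energy pixels.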
Figure 3: Average Pixel Energy for various pixel removal methods. [Avidan and Shamir
2007]
Figure 3 shows how the removal of pixels increases the average energy of the pixels.
Theoretically, the higher the average energy of the pixels is at a certain width, the more
important information was retained. However, as shown in the optimal and pixel images
of Figure 2, only maintaining the highest-energy pixels leaves no consideration for the
structure of the image. On the other hand, cropping and removing columns force many
of the high-energy pixels to be removed. The blue curve shows how seam carving is a
middle ground that takes into consideration the energy as well as the structure of the
image.
Creating a seam by following a path of least energy is straightforward for digital
images. Starting at any top pixel for vertical seams or left pixel for horizontal seams, the
lowest-energy pixel among the three adjacent pixels in the next row (for vertical seams) or
the next column (for horizontal seams) is chosen as the next step on the path. This is
repeated until the other side of the image is reached.
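The stepping rule just described can be sketched in a few lines. The function name is hypothetical, and the code follows the greedy description given here, which is not guaranteed to find the seam of globally minimal total energy; Avidan and Shamir [2007] compute the optimal seam with dynamic programming instead.

```python
import numpy as np

def greedy_vertical_seam(energy, start_col):
    """Trace a top-to-bottom seam, stepping to the lowest-energy neighbour below."""
    rows, cols = energy.shape
    seam = [start_col]
    for r in range(1, rows):
        c = seam[-1]
        lo, hi = max(c - 1, 0), min(c + 1, cols - 1)
        # Among the (up to) three pixels below, pick the one with least energy.
        seam.append(lo + int(np.argmin(energy[r, lo:hi + 1])))
    return seam

energy = np.array([[1, 4, 3],
                   [5, 2, 6],
                   [7, 8, 1],
                   [2, 9, 9]])
seam = greedy_vertical_seam(energy, start_col=0)
print(seam)  # one column index per row
```

A horizontal seam can be traced with the same function by passing the transposed energy image, energy.T, and reading the result as one row index per column.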
Figure 4: Image with horizontal and vertical seam highlighted in red. [Avidan and Shamir
2007]
Figure 4 shows how this method avoids breaking up the structures in the image. The
vertical seam travels between trees in the tree line and around the rock at the bottom of
the image. The horizontal seam moves around the small tree at the bottom right of the
image. Using any top pixel or left pixel as the starting point, a seam for every column and
row can be calculated. Just like the column removal example, the seams can be ordered
from lowest to highest energy by summing all the pixels in the path, and then each seam
removed in ascending order of total energy.
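A minimal sketch of this ranking and removal step is shown below, assuming greedy seams and a 2-D grayscale image. The function names are hypothetical. Because removing a seam shifts the pixels to its right, this example removes only the cheapest seam; a fuller implementation would likely recompute the energy and seams on the narrower image before each further removal.

```python
import numpy as np

def greedy_vertical_seam(energy, start_col):
    """Trace a top-to-bottom seam, stepping to the lowest-energy neighbour below."""
    rows, cols = energy.shape
    seam = [start_col]
    for r in range(1, rows):
        c = seam[-1]
        lo, hi = max(c - 1, 0), min(c + 1, cols - 1)
        seam.append(lo + int(np.argmin(energy[r, lo:hi + 1])))
    return seam

def rank_seams(energy):
    """Return (total_energy, seam) pairs for every starting column, cheapest first."""
    cols = energy.shape[1]
    seams = [greedy_vertical_seam(energy, c) for c in range(cols)]
    totals = [float(sum(energy[r, c] for r, c in enumerate(s))) for s in seams]
    order = sorted(range(cols), key=lambda i: totals[i])
    return [(totals[i], seams[i]) for i in order]

def remove_vertical_seam(image, seam):
    """Delete one pixel per row, narrowing the image by one column."""
    rows, cols = image.shape
    mask = np.ones((rows, cols), dtype=bool)
    mask[np.arange(rows), seam] = False
    return image[mask].reshape(rows, cols - 1)

energy = np.array([[1, 4, 3],
                   [5, 2, 6],
                   [7, 8, 1],
                   [2, 9, 9]])
image = np.arange(12).reshape(4, 3)
ranked = rank_seams(energy)
cheapest_total, cheapest_seam = ranked[0]
narrower = remove_vertical_seam(image, cheapest_seam)
```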
Seams can also be added to the image to increase its dimensions. In this case, a set
of the lowest-energy seams is found and new seams are inserted next to them.
Figure 5: Example of enlarging an image using seam carving. (a) being the original
image. [Avidan and Shamir 2007]
Image (b) in Figure 5 shows the smear-like artifact created if only the lowest-energy
seam is used repeatedly. Therefore, a set of low-energy seams is used (highlighted in image (c)).
The resulting image (d) looks just as authentic as the original image (a).
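Inserting a single seam might look like the sketch below. The function name is hypothetical, and filling the new pixel with the average of the seam pixel and its right-hand neighbour is an assumption of this example rather than something stated above.

```python
import numpy as np

def insert_vertical_seam(image, seam):
    """Widen a 2-D image by one column, adding a pixel to the right of each seam pixel."""
    img = np.asarray(image, dtype=float)
    rows, cols = img.shape
    out = np.empty((rows, cols + 1))
    for r, c in enumerate(seam):
        right = min(c + 1, cols - 1)
        # Assumed blend: average of the seam pixel and its right neighbour.
        new_pixel = (img[r, c] + img[r, right]) / 2.0
        out[r, :c + 1] = img[r, :c + 1]
        out[r, c + 1] = new_pixel
        out[r, c + 2:] = img[r, c + 1:]
    return out

image = np.array([[0, 10, 20],
                  [0, 10, 20]])
widened = insert_vertical_seam(image, seam=[0, 2])
```

To enlarge by several columns, a set of distinct low-energy seams would be inserted rather than this one seam repeatedly, with the column indices of later seams offset to account for the pixels already added.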
Another unique advantage of the seam carving algorithm is its ability to remove
features from an image. This is accomplished by adding negative weight to the energy image
in the region to be removed. A user can choose this region on the image using some form of
graphical user interface.
Figure 6: Example of feature removal. [Avidan and Shamir 2007]
To generate the right image in Figure 6 from the original on the left, negative weight was
placed on the woman in the image. In this case, positive weight was also added to the man’s
energy pixels to ensure that he was not removed from the image. This is shown in the bottom
right inset of the original image as red and green highlights. The width of the image was
then reduced until every seam containing the woman had been removed.
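The weighting step can be sketched as a simple adjustment of the energy image before seams are computed. The function name, the boolean-mask interface, and the weight value of 1e6 are all assumptions for illustration; in practice the masks would come from the user's markings.

```python
import numpy as np

def apply_user_weights(energy, remove_mask=None, protect_mask=None, weight=1e6):
    """Lower the energy of pixels to remove and raise the energy of pixels to protect."""
    out = np.asarray(energy, dtype=float).copy()
    if remove_mask is not None:
        out[remove_mask] -= weight   # seams are attracted to this region
    if protect_mask is not None:
        out[protect_mask] += weight  # seams avoid this region
    return out

energy = np.ones((3, 4))
remove_mask = np.zeros((3, 4), dtype=bool)
remove_mask[:, 1] = True             # stands in for the marked woman
protect_mask = np.zeros((3, 4), dtype=bool)
protect_mask[:, 2] = True            # stands in for the marked man
weighted = apply_user_weights(energy, remove_mask, protect_mask)
column_cost = weighted.sum(axis=0)
```

With these weights the marked-for-removal column becomes the cheapest path through the image and the protected column the most expensive, so seams pass through the former first.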
The field of content-aware digital image resizing is still fairly new; nevertheless,
seam carving is not the only attempt at retargeting images. The major task in all these
algorithms is identifying what the important features in the image are so that they can be
maintained. This mainly psychological criterion has two general approaches: top-down
and bottom-up. The top-down method takes into account the internal state of the viewer,
such as his/her motivation at that moment. For example, face-detecting algorithms use
people’s desire to identify other people in a scene as an important feature location [Viola
and Jones 2001]. Bottom-up methods only take into account internal cues, general to
all perception. A saliency map is an example of this. The saliency algorithm takes into
account numerous factors like color, orientation, motion, intensity, etc. to identify where
the focus of the viewer will be in an image.
A few algorithms have been developed that rely on cropping out a region of
interest. A method developed by Suh et al. [2003] uses the output of either top down or
bottom-up metrics to locate an important feature, and then creates a smaller image by
cropping the feature out.
Figure 7: Image and saliency map, original and cropped [Suh et al. 2003]
This method has been adapted to displaying large images on small mobile devices by
Chen et al. [2003]. Liu et al. [2003] extended the method by cropping out multiple
important features in the image and displaying them over time. However, all these
algorithms are limited by the method of cropping. In Suh’s and Chen’s methods, if the
image has multiple interesting features, only one is shown. Even with the added value of
Liu’s method, in any form of cropping, information about the background as well as how
things in the image are located relative to each other is lost. Seam carving, on the other
hand, maintains the relative positions of features in the image, and by removing seams in
order of total energy, the least noticeable/important background is removed first.
Another method for displaying images on various devices, developed
by Jacobs et al. [2003], uses an adaptive grid. Images as well as text are placed in
a template. The algorithm automatically adjusts the template for a given display size,
taking into account the size and ratio of the grids in the template. The major shortcoming of
this paper is that it makes no mention of how images would be resized, e.g. what happens if
the image in the grid is larger than the display. On top of that, it forces the user to do
time-consuming preprocessing of their data (separating information into different grids) and
to acquire and learn software for generating a template like the one shown in Figure 8.
Figure 8: Template windows document layout [Jacobs et al. 2003]
Seam carving can be applied to any image and is efficient enough to be done in real time.
Just as text in a webpage can be dynamically wrapped to the next line when changing the
size of the browser window, images on a webpage can be dynamically resized by seam
carving [Swieskowski 2007].
Liu and Gleicher [2005; 2006] developed a retargeting method that combines
cropping and non-linear image resizing. Like the previously mentioned cropping
schemes, the most important feature of the image is found using saliency and left
unchanged. The area around that region is then scaled in a non-linear fashion to reduce
the image size.
[Figure 9 panels: Original, Radial/Linear, Radial/Quadratic, Cartesian/Linear, Cartesian/Quadratic]
Figure 9: Fisheye-view warping scheme [Liu and Gleicher 2005]
Figure 9 shows the various non-linear scaling methods used for warping the background.
While this is an improvement on solely cropping, which totally eliminates the context of
the image, the distortion created by the warping of the background is unmistakable and
objectionable. In Figure 9, the Eiffel Tower is almost completely unrecognizable and the
person’s hands are completely distorted in all the scaling schemes. In seam carving, an
entire area is never scaled or changed at a time. For the particular original image
presented in Figure 9, the difference in intensity of the person’s face and the tower
compared to the night sky would generate seams that traveled around those two important
features, causing much less distortion.
Setlur et al. [2005] devised a non-photorealistic method of image retargeting
similar to Liu and Gleicher’s. In Setlur’s method, the exact outline of the feature of
interest is cut out and that region is filled in with the surrounding background. The
background is resized and then the feature is placed back into the image.
Figure 10: Non-photorealistic image retargeting steps. Left to right: original image,
feature map, feature identification, feature removal, and filled in background. Bottom:
retargeted image. [Setlur et al. 2005]
The obvious drawback of this algorithm is the extreme change in perspective. In Figure
10, the objects that were far away in the sky are retargeted, making them look like giant
toys hovering right above the man. In seam carving, the relative size of the features to the
background remains unchanged for reasonable resizing. The space between the important
features is simply removed. Another drawback of Setlur’s method is that it is
computationally heavy. In order to cleanly outline and remove features, combinations of
many feature metrics must be used. Seam carving can be limited to one simple metric like
gradient magnitude for efficient image processing.
A recent algorithm proposed by Gal et al. [2006] is almost a combination of Liu
and Gleicher’s method with Setlur’s method. The algorithm is a formulation of a
Laplacian editing technique that allows an image to be warped arbitrarily but maintains
the structure of a specified feature. The feature can be outlined using a non-rectangular
shape and remain unchanged like in Setlur’s method, and the background can be warped
in any fashion like Liu and Gleicher’s method.
Figure 11: Comparison of warping techniques. [Gal et al. 2006]
Currently, this algorithm requires user-defined features, but it could incorporate the
feature selection scheme used by Setlur. This, however, leaves the same problem of large
computation times being needed to properly identify and outline the feature. Also, the
local constraints of the features cannot always be met, depending on the global warping of
the image. Because seam carving discretely removes pixels, it does not affect the intensity
of any of the remaining pixels.
Seam carving appears to be the most functional and ready-to-implement
retargeting algorithm at present. Its ability to use different energy or feature importance
metrics makes it robust while also allowing it to be tailored to specific tasks. The choice of
less computationally heavy metrics and the ease of calculating a seam make it highly
efficient as well. The ability to enlarge images without noticeable distortion and to remove
features are attributes that other retargeting algorithms do not have. The path followed by
each seam has obvious advantages over only being able to work with rectangular regions
of interest, as in Liu and Gleicher’s method. Because seam carving removes discrete
pixels to resize the image, no distortions or artifacts are generated from having to warp
the image. With further testing on various devices and image contents, seam carving
could make tasks like surfing the web on a PDA much more practical and common in
the near future.
References
Avidan, S., and Shamir, A. 2007. Seam Carving for Content-Aware Image Resizing.
ACM Transactions on Graphics, v26, n3.
Chen, L., Xie, X., Fan, X., Ma, W., Zhang, H., and Zhou, H. 2003. A visual attention
model for adapting images on small displays. Multimedia Systems 9, 4, 353–364.
Dalal, N., and Triggs, B. 2005. Histograms of oriented gradients for human detection. In
International Conference on Computer Vision & Pattern Recognition, vol. 2, 886–893.
Gal, R., Sorkine, O., and Cohen-Or, D. 2006. Feature-aware texturing. In Eurographics
Symposium on Rendering.
Harris, C., and Stephens, M. 1988. A combined corner and edge detector. In Proceedings
of the 4th Alvey Vision Conference, 147–151.
Itti, L., Koch, C., and Niebur, E. 1999. A model of saliency based visual attention for
rapid scene analysis. PAMI 20, 11, 1254–1259.
Jacobs, C., Li, W., Schrier, E., Bargeron, D., and Salesin, D. 2003. Adaptive grid-based
document layout. In Proceedings of ACM SIGGRAPH, 838–847.
Kadir, T. 2001. Saliency, scale and image description. International Journal of Computer
Vision, v45, n2, 83–105.
Swieskowski, P. 2007. Seam Carving Demo. http://swieskowski.net/carve/. Accessed
October 2007.
Liu, F., and Gleicher, M. 2005. Automatic Image Retargeting with Fisheye-View
Warping. In ACM UIST, 153–162.
Liu, F., and Gleicher, M. 2006. Video-Retargeting: Automating Pan and Scan. In ACM
international conference on Multimedia, 241–250.
Liu, H., Xie, X., Ma, W., and Zhang, H. 2003. Automatic browsing of large pictures on
mobile devices. Proceedings of the eleventh ACM international conference on
Multimedia, 148–155.
Setlur, V., Takagi, S., Raskar, R., Gleicher, M., and Gooch, B. 2005. Automatic Image
Retargeting. In Mobile and Ubiquitous Multimedia (MUM), ACM Press.
Suh, B., Ling, H., Bederson, B. B., and Jacobs, D. W. 2003. Automatic thumbnail
cropping and its effectiveness. In UIST ’03: Proceedings of the 16th annual ACM
symposium on User interface software and technology, ACM Press, New York, NY,
USA, 95–104.
Viola, P., and Jones, M. 2001. Rapid object detection using a boosted cascade of simple
features. In Conference on Computer Vision and Pattern Recognition (CVPR).