SEMANTICS-PRESERVING WARPING FOR STEREO IMAGE RETARGETING 1st Author 2nd Author 3rd Author 1st author's affiliation 1st line of address 2nd line of address 2nd author's affiliation 1st line of address 2nd line of address 3rd author's affiliation 1st line of address 2nd line of address ABSTRACT Due to availability and popularity of stereoscopic displays in the recent years, research into stereo image retargeting is receiving considerable attention. In this paper, we extend the tearable image warping method for stereo image retargeting. Our method retargets both the left and right image of the stereo image pair simultaneously to preserve scene consistency, and minimizes distortion using a global optimization algorithm. It is also able to preserve stereoscopic properties of the resulting stereo image. Experimental results show that our approach can preserve structural details better than stereoscopic seam carving, and protects objects better than stereoscopic traditional warping. Besides, compared to scene warping, our approach can guarantee semantic connectedness. Categories and Subject Descriptors I.4.9 [Image Processing and Computer Vision]: Applications General Terms Algorithms, Experimentation Keywords Stereo image retargeting, tearable warping, stereo image resizing 1. INTRODUCTION With the recent availability of stereoscopic displays, particularly for 3D entertainment, research into stereo image retargeting is receiving considerable attention. Stereo image retargeting aims to resize a stereo image pair to fit various stereoscopic displays such as 3D monitor, 3D television and stereo camera phone. An ideal stereo image retargeting method should be able to (1) ensure scene consistency, (2) preserve geometric/structural details, and (3) preserve stereoscopic properties between the left and right image. More specifically, scene consistency properties [2][7] include zero object distortion, correct scene occlusion and correct semantic connectedness (consistent physical contacts between objects with their environment in the retargeted image). State-of-the-art stereo image retargeting approaches can be categorized into discrete, continuous and hybrid methods. Discrete approaches [8][9] can produce impressive results for stereo images with simple background but due to its discontinuous nature, object distortion and artifacts are unavoidable for images Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Conference’10, Month 1–2, 2010, City, State, Country. Copyright 2010 ACM 1-58113-000-0/00/0010 …$15.00. with complex background. Continuous approaches [3][11] utilize warping to retarget stereoscopic images. Due to their continuous nature, these approaches can preserve semantic connectedness well but object distortion and distortion to structural details are unavoidable in extreme retargeting cases. Lee et al. [6] proposes a hybrid retargeting approach that segments out the objects into object layers, perform warping on the background layer and paste the objects back to the warped background to produce the retargeted image. This approach ensures objects protection and better preserves structural details but does not guarantee semantic connectedness between a segmented object and its background. To better preserve scene consistency while minimizing artifacts and preserving stereoscopic properties, we propose a novel stereo image retargeting technique based on the tearable image warping technique [7]. Conceptually, in tearable image warping, an object’s boundary is divided into tearable and non-tearable segments. Tearable segments correspond to where depth discontinuity occurs, and these segments are allowed to break away from its environment, thus allowing the background warping to be distributed more evenly, leading to reduced distortion of structural details. Non-tearable segments correspond to the object boundary that has actual physical contacts with the environment or other objects. These segments help to preserve semantic connectedness by constraining the object to maintain the real contacts in the 3D world. This approach is able to preserve semantic connectedness very well. However, it is designed for single image and thus, unable to preserve the stereoscopic properties of a retargeted stereo image. The main contribution of this work is to extend the tearable image warping method for stereo image retargeting. The proposed method retargets both the left and right image of the stereo image pair simultaneously to preserve scene consistency, and minimize distortion, through a revised-optimization algorithm. In addition, it successfully maintains consistent stereoscopic properties of the resulting stereo image pair to avoid visual discomfort during viewing. Besides being able to maintain stereoscopic consistency and guarantees semantic connectedness, experimental results show that our approach can preserve structural details better than stereoscopic seam carving, and protect objects better than traditional stereoscopic warping. 2. RELATED WORK Techniques for stereo image retargeting can be categorized into discrete, continuous and hybrid approaches. Discrete methods include seam-carving [2] and patch-based [8] approaches. As with its 2D counterpart, the geometrically consistent seam carving [2] approach for stereo image retargeting produces good results for non-complex images but fails in complex images and extreme retargeting cases. The shift-map [8] based approach characterizes geometric rearrangement of a stereo image pair. It can preserve the foreground objects well but does not consider the preservation of semantics features like ripples, shadow, symmetry, texture, etc. Continuous approaches generally utilize warping to retarget a stereo image pair. Niu et al. [2] extend traditional image warping to stereo images with the objective to preserve prominent objects and 3D structure. Chang et al. [3] warping-based approach aims to avoid diplopia (double vision) by optimizing two stereoscopic constraints; vertical alignment for avoiding vertical artifact and horizontal disparity consistency. However, these warping-based approaches cannot avoid object distortion in cases of extreme image retargeting. In addition, the stereoscopic quality of the retargeted images could be reduced because the whole image is warped in a continuous manner, thus disallowing proper occlusions and depth discontinuity to be created. To address the limitations of the above warping based approaches, Lee et al. [6] proposes scene warping, a hybrid approach that decomposes the input stereo image into several layers according to the depth order and each layer is warped according to its own mesh deformation. The warped layers were then composited together according to depth order to get the retargeted image. This method produces better stereoscopic quality and ensures object protection, but it cannot guarantee the preservation of the semantic connectedness between a foreground object and its background environment, such as shadow and ripples. The tearable image warping method [7] is a hybrid technique that was designed to reduce distortion inherent to traditional warping while preserving scene consistency by simultaneously protecting objects, ensuring correct depth order of objects and maintaining consistent semantics connectedness between objects and their environment. This technique performs well for single image retargeting, even in extreme retargeting cases. However, as it is not designed to handle stereo images, it cannot preserve the stereoscopic properties when applied on stereo image pair, leading to 3D viewing discomfort. 3. OUR APPROACH Our approach extends the tearable warp algorithm [7] for single image retargeting to the stereoscopic domain. The challenge lies in maintaining the stereoscopic properties of the retargeted stereo image pair and ensuring that the object extraction process for both the left and right image is trivial and coherent. As in the scene warping approach [6], in our tearable warp based approach, an input stereo image pair is first separated into the background and object layers. Then, the background layers are warped and the objects are pasted back onto the background layers to produce the retargeted results. The core difference in our approach is that for each object, an object handle that represents the connection between the object and its background is defined. During the warping process, we constrain the object handles to be kept as rigid as possible and in the final image compositing phase, we ensure that the object is pasted onto the warped background to coincide with the object handle. This technique guarantees the preservation of semantic connectedness. In the following sub-sections, we provide a detail description of our algorithm in three main steps; (1) pre-processing, (2) warping and (3) image compositing. 3.1 Pre-processing In the pre-processing phase, given a stereo image pair, we first compute the disparity map using the sum of absolute difference (SAD). The disparity values should be preserved in the retargeted image pair for ensuring comfortable 3D viewing experience. Next, we provide a simple, semi-automatic interface, powered by Grabcut [4] to allow users to select objects on the left image and define their respective handles with just a few clicks. The Figure 1: (top left) Original left image (yellow lines: object handles), (top right) object segments, (bottom left) inpainted left background layer and (bottom right) disparity map. corresponding object segments and object handles in the right image are then automatically inferred from the left image based on the disparity map. Exemplar based inpainting [1] is then used to fill up holes in the background layers. Figure 1 illustrates the disparity map for a given stereo image pair as well as the object segments, their respective object handles and the inpainted background layer for the left image. We can observe that although the inpainting result is imperfect, the artifacts are not shown in the retargeted results illustrated in Figure 3. 3.2 Warping Next, we perform triangle-based warping on the inpainted background layer pair to retarget it to the desired target size. Let L denote the left image and R denote the right image. Given the source left and right triangle meshes, ML and MR and the respective object handles, the warping process is the problem of mapping ML and MR to their target meshes, ML’ and MR’. During the warping process, we aim to preserve the stereoscopic properties and keep the objects handles as rigid as possible. The warping energy attempts to minimize a set of errors that consists of the scale transformation error, the smoothness error and the stereoscopic quality error, subject to a set of constraints. 3.3.1. Scale transformation error For each triangle t ο MLοMR , we constrain the transformation to non-uniform scaling [7][10], denoted by, πΊπ‘ = ( ππ‘π₯ 0 0 π¦ ). ππ‘ (1) The scale transformation error, πΈπ€ is defined as, πΈπ€ = ∑π‘∈ππΏοππ π΄π‘ βπ½π‘ − πΊπ‘ β2πΉ (2) where, π΄π‘ is the area of triangle t, β. β2πΉ is the Frobenius norm, and π½π‘ is a 2x2 Jacobian matrix that represents the linear portion of the affine mapping that maps a triangle to its corresponding triangle in MLοMR . 3.3.2. Smoothing error The smoothness error tries to avoid discontinuity by minimizing the scale difference between neighboring triangles, πΈπ = ∑ π ,π‘∈ππΏοππ π , π‘ πππ ππππππππ‘ where π΄π π‘ = (π΄π + π΄π‘ ) / 2. π΄π π‘ βπΊπ‘ − πΊπ β2πΉ (3) 3.3.3. Stereoscopic quality error To ensure the stereoscopic properties are preserved, we minimize the change in two stereoscopic properties [6]; (1) disparity between input and output stereo mesh pair, and (2) vertical drift between left and right output meshes. Let πΉ = {(πππΏ , πππ )} denote the set of corresponding points in the disparity map where πππΏ and πππ are corresponding points of the input meshes, ML and MR. Let πΉΜ = {(πΜππΏ , πΜππ )} denote the set of corresponding points of the output meshes, ML’ and MR’. The stereoscopic quality error is then defined as πΈπ = ∑ (πΈπ· (πΜππΏ , πΜππ )) + πΈπ (πΜππΏ , πΜππ ) (πΜππΏ,πΜππ )∈πΉΜ not move out of the image boundary. (4) 3.3 Image Compositing where πΈπ· promotes disparity consistency and πΈπ ensures zero vertical drift. πΈπ· (πΜππΏ , πΜππ ) = ((πΜππ [π₯] − πΜππΏ [π₯]) − (πππ [π₯] − πππΏ [π₯]))2 πΈπ (πΜππΏ , πΜππ ) = (πΜππ [π¦] − πΜππΏ [π¦]) Figure 2: Preservation of disparity consistency: (left) Original stereo image, (middle) 20% horizontally reduced stereo image without preserving stereoscopic properties by tearable warping method, and (right) our approach. Yellow boxes highlight inconsistent disparity. 2 (5) (6) where [x] and [y] extracts the x-coordinate and y-coordinate of the particular point. 3.3.5. Total Error The total warping energy function E can be formulated as the weighted sum of the scale transformation error, smoothness error and the stereoscopic quality error, (7) πΈ = πΌπΈπ€ + π½πΈπ + πΎπΈπ where α , β and γ are weights. 3.3.5. Constraints To preserve semantics connectedness, we define the handle shape constraint [7] to rigidly preserve the shape and orientation of the object handles. In addition, the image boundary and object boundary constraints were added to ensure that the original image boundary remains on the boundary and user-defined objects do At this stage, all objects in both images will be scaled to its respective scale factor ππ and inserted to coincide with its corresponding object handle in the warped background image. The sequence of the insertion of an object to the background image follows the depth order acquired from depth map. 3.4 Optimization Details We use the CVX Matlab toolbox to find the solution to the convex quadratic function defined in Eq. (7). Weights for the total energy; α, β and γ are set to 1, 0.5 and 0.1 respectively. The handle shape and image boundary constraints are set as hard constraints while the object boundary is set as inequality constraint. 4. RESULTS AND DISCUSSION The results in this paper were generated on a desktop with Intel i7 CPU 3.40GHz and 12GB memory. Excluding the time taken for inpainting which is performed at the pre-processing stage, our algorithm produces a retargeted result in about 3.5s for a stereo image pair of resolution 640 x 480. From Figure 2, it is obvious that the tearable image warping approach designed for single image [7] failed to preserve the disparity consistency. On the other hand, our algorithm successfully minimized the variance of disparity between the original and retargeted stereo images. Therefore, our results do not cause visual discomfort to the user Figure 3: Extreme Retargeting Results - reducing width / height by 40%. (column 1) Input left image with object handle in yellow, (column 2) our method, (column 3) stereoscopic, traditional warping [10], (column 4) stereoscopic seam carving [9]. (column 5 and 6) 1st row: input left image, 2nd row: our approach, 3rd row: traditional warping [10] and and 4th row: stereoscopic seam carving [9]. Figure 4: Resized image using our method can better preserve the geometric structure of the fence (top) left image of our results (bottom) left image of results of traditional warping. during 3D viewing of the retargeted stereo image. Figure 6: More results - reducing width by 20% (column 1 and 3) input stereo image and left image with object handle marked in yellow and (column 2 and 4) stereo image and left image of our method. We compare our method with stereoscopic traditional nonhomogenous warping [10], stereoscopic seam carving [9] and scene warping [6]. Results in Figure 3 shows that our approach can better preserve foreground objects and structural details than seam carving [9] and traditional warping [10] approaches in extreme retargeting cases. Severe object distortion / compression can be observed for most results of seam carving and traditional warping whereas our results achieve zero object distortion. Figure 4 illustrates that our approach can preserve geometric structures better than non-homogeneous traditional warping. This is not surprising as our approach needs to preserve only the object handle shape, whereas in the traditional warping approach, the whole area that contains the object needs to be preserved, leaving less space for compression during warping. able to preserve the stereoscopic properties in the retargeted images and compares favorably with existing methods. Since the warping process does not involve the foreground objects, our method ensures object protection and produces less severe compression compared to traditional warping methods, particularly in extreme retargeting cases. The core strength of our method is in its ability to guarantee the preservation of semantic connectedness between an object and its background without distorting the objects, while preserving the stereoscopic properties of the stereo image. Drawbacks of our approach include the need for the user to assist in segmenting the objects and potential artifacts due to poor inpainting. To ease user input, our algorithm automatically propagates the object segments in the left image to the right image using disparity information. Next, we compare our results with scene warping [6]. Due to adopting the similar layer-based approach, both scene warping and our approach can avoid object distortion and preserve structural details better than traditional warping and seam carving. In addition, as illustrated in Figure 5, both of these approaches can allow overlapping based on depth order and thus, can support extreme image retargeting. However, our approach should preserve semantic connectedness better than scene warping. Our optimization algorithm constrains the object handle to be as rigid as possible and thus it can guarantee the preservation of the semantic relationship between an object and its background, as shown in the perfect shadow connection in our retargeted results in Figure 6. We could not show the result of scene warping for comparison on preservation of semantic connectedness due to unavailability of their source code. But, based on theoretical analysis, the shadow may not be well-preserved by scene warping as there is no energy or constraint that guarantees the preservation of semantic connectedness. 6. REFERENCES 5. CONCLUSION This paper has successfully extended the tearable image warping method [7] to perform stereoscopic image retargeting. This proposed method retargets both the left and right image of the stereo image pair simultaneously using a revised, global optimization algorithm. Experiments show that our approach is Figure 5: Reducing width by 40%. (left) Original left stereo image, (middle) our method, and (right) scene warping [6]. [1] A. Criminisi, P. Pérez, and K. Toyama, "Region filling and object removal by exemplar-based image inpainting", IEEE Trans. Image Processing, 13, 9 (2004), 1200–1212. [2] A., Mansfield, P. Gehler, L., Van Gool, and C., Rother, Scene carving: Scene consistent image retargeting. In Proceedings of ECCV Conference (Crete, Greece, 5-11 September 2010), 143-156. [3] C.-H. Chang, C.-K. Liang, and Y.-Y. Chuang, "Content-Aware Display Adaptation and Interactive Editing for Stereoscopic Images", IEEE Trans.on Multimedia., 13, 4 (Auguest 2011), 589– 601. [4] C. Rother, V. Kolmogorov, and a Blake, "GrabCut - Interactive foreground extraction using iterated graph cuts", ACM Trans. on Graphics, 23, 3 (August 2004), 309-314. [5] D. Conge, M. Kumar, and R. Miller, "Improved seam carving for image resizing", In Proceedings of theIEEE Workshop on Signal Processing Systems Conference (San Francisco, USA, 6-8 October 2010),345-349. [6] K.-Y. Lee, C.-D. Chung, and Y.-Y. Chuang, "Scene warping: Layer-based stereoscopic image resizing", In Proceedings of theCVPR Conference (Providence, Rhode Island, USA, June 2012), 49–56. [7] L.-K. Wong and K.-L. Low, "Tearable Image Warping for Extreme Image Retargeting,"In Proceedings of theComputer Graphics International Conference (Bournemouth, U.K., June 2012), 1–8. [8] S. Qi and J. Ho, "Shift-Map Based Stereo Image Retargeting", In Proceedings of the ACCV Conference (Daejeon, Korea, November 2012),457–469. [9] T. Basha, Y. Moses, and S. Avidan, "Geometrically consistent stereo seam carving", In Proceedings of theICCV Conference (Barcelona, Spain, August, 2011),1816–1823. [10] Y. Jin, L. Liu, and Q. Wu, "Nonhomogeneous scaling optimization for real time image resizing", Visual Computer, 26, 6–8 (April 2010), 769–778. [11] Y. Niu, W.-C. Feng, and F. Liu. Enabling warping on stereoscopic images. ACM Trans. on Graphics, 31, 6 (November 2012), ArticleNo. 183.