Semi-Automated Joiners
Brian Hook, Kyle Halbach, Mike Treffert

Figure 0: Joiner by David Hockney

Abstract

Joiners (see Figure 0) have rapidly gained popularity ever since artist David Hockney introduced them and coined the term "joiners." A quick search for joiners on an image-sharing site such as Flickr shows how much popularity this artistic style has gained in the digital age. This paper focuses on semi-automating the creation of joiners in order to produce a preliminary joiner. An artist would use this preliminary joiner to decide whether a given ordering of the images is the one they want to use in their actual joiner.

Introduction

The idea of a joiner arose from the fact that a single viewpoint of a scene may not fully capture the scene or represent it as desired. A joiner is an image generated by combining multiple images of the same subject. It is created by layering images taken of the same object or scene from various viewpoints. Because images from different viewpoints are used, the end result can give a sense of the three-dimensional structure of the object or scene. While joiners can be very appealing, the time it takes to create them is not. Even a simple, appealing joiner of fewer than 20 photographs can take 30 minutes to an hour to assemble by hand. With advances in computational image processing, however, this time can be reduced by running a program that generates a joiner for the user. Even though computer-generated joiners can be very appealing, they are often not exactly what the user desires. They are still useful in this setting, though, because they give the user a basic starting point for their own rendition of the joiner.

Motivation

The basic motivation behind joiners is that while there is an easy way to smoothly piece together images taken from a single viewpoint, namely panorama generation, the methods used to generate panoramas cannot be used to piece together photos taken from multiple viewpoints. Joiners set out to solve this problem by combining images from multiple viewpoints. The motivation for this project came from a paper by Lihi Zelnik-Manor and Pietro Perona called "Automating Joiners" [1], in which the authors explain their method of fully automating the joiner-creation process. More about this paper is discussed in the following section. Another motivation for this project was an implementation of Zelnik-Manor and Perona's algorithm by Ekapol Chuangsuwanich [2]. Chuangsuwanich implemented the algorithm laid out by Zelnik-Manor and Perona but added a slight twist by allowing users to select the ordering of the images in the joiner.

"Automating Joiners" Summary

This paper details the algorithm the authors used to automate the process of generating joiners. The algorithm has four main steps: feature detection, global alignment, ordering, and iterative refinement. The first step uses the SIFT feature detector to find corresponding points between the set of input images.
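For concreteness, the following is a minimal MATLAB sketch of this feature-detection step. It uses the VLFeat toolbox (vl_sift and vl_ubcmatch) as one possible SIFT implementation; the toolbox choice and the image file names are our assumptions, and the paper's authors may have used Lowe's original keypoint software [3] instead.

```matlab
% Minimal sketch: detect and match SIFT features between two input images.
% Assumes the VLFeat toolbox is on the MATLAB path; file names are placeholders.
im1 = single(rgb2gray(imread('view1.jpg')));
im2 = single(rgb2gray(imread('view2.jpg')));

[f1, d1] = vl_sift(im1);          % frames (x; y; scale; orientation) and descriptors
[f2, d2] = vl_sift(im2);

matches = vl_ubcmatch(d1, d2);    % putative matches via Lowe's ratio test

% Matched coordinates, one row per correspondence: [x y]
pts1 = f1(1:2, matches(1, :))';
pts2 = f2(1:2, matches(2, :))';
```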
Once the authors have obtained a set of matching feature points between each pair of images, they move on to the second step of the algorithm. This step, global alignment, can be explained in two sub-steps: RANSAC and solving for similarity matrices. The first sub-step runs the RANSAC algorithm to discard feature matches that are false positives between each pair of input images. The second sub-step computes similarity transforms that map the feature points in one image to the corresponding feature points in another image. A similarity transformation can rotate, scale, and translate an image while preserving its shape; using this class of transformation avoids distorting the images in the joiner. In the paper, the authors briefly explain their method of computing the transformations as solving a specific optimization problem. The optimization problem they set up weights each feature point differently, which allows them to give more precedence to specific points when needed.

The third step of the general algorithm orders the images in the output joiner, essentially determining which image goes on top of which others. The paper explains two different methods of layering the images. The first is a Photoshop-like method that treats each image as a layer and orders those layers; an example would be A over B, B over D, and D over C. The second method chooses a layering in each region of overlap, which allows orderings such as A over B, B over C, and C over A. To determine the ordering of the images, the paper describes three approaches, of which only two proved very successful: minimizing the gradient at the boundary between two overlapping images, and minimizing the sum of color differences between the two images. While the paper states that both of these methods work, it notes that minimizing the gradient appears to work better.

The final step of the paper's algorithm is iterative refinement. Its purpose is essentially to fix the boundaries of overlap in the output joiner. The boundaries need fixing because the layering of the images occludes parts of the scene at the boundaries, which can create inconsistencies there; for example, a pole in one image may appear 10 pixels to the left at the boundary between two images. The paper's approach is to weight feature points near these boundaries more heavily and feature points farther from the boundaries less. The algorithm then recomputes the similarity matrices and remaps the images onto the joiner.

Finally, the authors discuss the idea of blending in the joiner. Although blending is not a step in their algorithm, the authors state that it could be applied at the end to hide the seams that still exist after iterative refinement, which has the potential to make a joiner look more realistic.
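The paper gives the optimization in more detail than we summarize here. As a rough illustration of how per-point weights can enter such a fit, the following is a minimal MATLAB sketch of a weighted least-squares estimate of a non-reflective similarity transform; the four-parameter formulation, the variable names, and the way the weights are applied are our own assumptions rather than the paper's code.

```matlab
% Minimal sketch: weighted least-squares fit of a non-reflective similarity
% (x, y) -> (a*x - b*y + tx, b*x + a*y + ty). Each correspondence i has a
% weight w(i); larger weights (for example, points near overlap boundaries
% during refinement) pull the fit toward those points. This is our own
% illustration of the idea, not the paper's implementation.
function T = weighted_similarity(pts1, pts2, w)
    n = size(pts1, 1);                 % pts1, pts2 are n-by-2 matched points
    A   = zeros(2*n, 4);
    rhs = zeros(2*n, 1);
    for i = 1:n
        x = pts1(i, 1);  y = pts1(i, 2);
        sw = sqrt(w(i));               % weights enter least squares as square roots
        A(2*i-1, :) = sw * [x, -y, 1, 0];
        A(2*i,   :) = sw * [y,  x, 0, 1];
        rhs(2*i-1)  = sw * pts2(i, 1);
        rhs(2*i)    = sw * pts2(i, 2);
    end
    p = A \ rhs;                       % p = [a; b; tx; ty]
    T = [p(1), -p(2), p(3);
         p(2),  p(1), p(4);
         0,     0,    1   ];
end
```

With all weights equal this reduces to an ordinary least-squares similarity fit; increasing the weights of points near the overlap boundaries is one way to express the refinement idea described above.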
Other Related Work

A related line of work is the creation of panoramas. In panorama creation, the objective is to piece together multiple images of a scene taken from a single viewpoint in order to recreate a representation of that scene. Since all of the images are taken from a single viewpoint, they can be smoothly pieced together by projecting them onto a common output plane using perspective projection. However, these same ideas do not apply to joiner creation, because joiners allow pictures from multiple viewpoints. This creates the unique case where things that are occluded in one image may not be occluded in another. This is not the case in panoramas, where everything is captured from one viewpoint, so whatever is occluded in one image is occluded in every image, assuming a stationary scene.

Problem Statement

Our goal was to implement an application that semi-automatically generates preliminary joiners from photos taken at multiple viewpoints. The output of this application is not a true joiner but rather a preliminary joiner that gives the artist a general idea of, and starting point for, the joiner they will actually create.

Method

We implemented a hybrid algorithm based on the algorithm in "Automating Joiners" [1]. For the first step, we used the SIFT feature detector [3] to obtain matching feature points between a set of images. For this step we assumed that the images in the source directory were already in order of overlap; that is, image one overlaps with image two, image two overlaps with image three, and so on.

For the second step, we used RANSAC to compute the "best" similarity transformation between consecutive images. On each iteration we selected six random points from the set of matching feature points and used the built-in MATLAB function cp2tform to compute a non-reflective similarity transformation from those six correspondences. We then checked how many of the feature points this transformation mapped correctly; if it was the best similarity transformation seen so far, we kept it, and otherwise we moved on to the next iteration. We ran one hundred iterations of RANSAC, which we judged sufficient to obtain a similarity transformation that properly maps the feature points (a sketch of this loop is shown after Images 1a and 1b below).

For the next step, we selected a reference image and computed the similarity transformation that maps each image to the reference image by composing the pairwise similarity matrices between the current image and the reference image. Once we had these transformations, we mapped each image onto its own output-sized image plane.

For the layering step of the algorithm, we chose not to use an automated method. Instead of minimizing the gradient across the overlap edges, we chose to accept user input specifying the desired layering of the images. The main reason for this choice is that we are not implementing an application that creates a true joiner, but rather a tool an artist can use to see what their joiner might look like with a particular layering. Another reason is that the "best" ordering according to gradient minimization may not have the look the artist is aiming for. For example, if the artist wants one photo in the center that contains writing or a specific object they do not want occluded by any overlapping images, then that image should be the very top layer; the "best" ordering, however, may place an image so that it partially overlaps the text or object. Images 1a and 1b show an example of such a case.

Image 1a: The top layer is the image that contains the text on the wall, so the text is completely readable and none of it is occluded in the joiner.

Image 1b: The text on the wall is a combination of two overlapping images, so the text is not as readable as it is in Image 1a.
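Below is a minimal MATLAB sketch of the RANSAC loop referenced in the Method section above. The overall structure (six random correspondences, cp2tform with a non-reflective similarity, one hundred iterations) follows our description, while the inlier threshold, the error measure, and the function and variable names are illustrative assumptions.

```matlab
% Minimal sketch of the RANSAC step: fit a non-reflective similarity to six
% random correspondences with cp2tform and keep the transform that correctly
% maps the most feature points. The pixel threshold is an assumption.
function best_tform = ransac_similarity(pts1, pts2, iters, thresh)
    n = size(pts1, 1);                      % pts1, pts2 are n-by-2 matched points
    best_tform = [];
    best_inliers = 0;
    for k = 1:iters                         % e.g. iters = 100
        idx = randperm(n);
        idx = idx(1:6);                     % six random correspondences
        tform = cp2tform(pts1(idx, :), pts2(idx, :), 'nonreflective similarity');
        [u, v] = tformfwd(tform, pts1(:, 1), pts1(:, 2));
        err = hypot(u - pts2(:, 1), v - pts2(:, 2));   % reprojection error (pixels)
        inliers = sum(err < thresh);        % e.g. thresh = 3
        if inliers > best_inliers
            best_inliers = inliers;
            best_tform = tform;
        end
    end
end
```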
Limitations

A big limitation of our application is that we do not implement bundle adjustment to determine which images overlap with which. This is what forces us to assume that the input images are ordered in the directory according to their order of overlap. Because of this limitation, the quality of our output is not as good as it would be if we performed bundle adjustment and computed similarity transforms for each image based on all of the images it overlaps with.

Another limitation of our implementation is that we did not implement iterative refinement. During the implementation phase of the project, we made five different attempts at this step, but each of them seemed to make the output worse than it was without the step. Our approach was to select the N feature points closest to the overlap edges in the joiner and use those points to recalculate the similarity transformations. We had to do this instead of weighting the feature points as the paper does because the way we computed similarity transforms did not allow us to weight individual feature points. Our justification for this method is that the feature points along the overlap boundaries are the most important ones, because matching them would fix the discontinuities between images in the joiner.

Results

Even without the iterative refinement step, we were still able to get some decent results. Figures 2, 3, and 4 show good results produced by our application.

Figure 2

Figure 3

Figure 4

Clearly there are still discontinuities along the edges in all of these examples, because there is no iterative refinement to clean them up. Also, since our layering is based on user input instead of computing the "best" layering, there were many failures when the user did not pass in a good ordering. Figures 5 and 6 show examples of such cases.

Figure 5

Figure 6
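The joiners in Figures 2 through 6 were produced by warping each image onto an output-sized canvas and overlaying the warped images in the order supplied by the user. The following is a minimal MATLAB sketch of one way such a compositing loop could be written; the canvas limits, the mask-based overlay, and all variable and function names are assumptions for illustration rather than a transcript of our code.

```matlab
% Minimal sketch: warp each image into a shared output canvas with imtransform
% and composite them in the user-supplied layer order (later entries in the
% order vector end up on top). Names and the mask-based overlay are assumptions.
function joiner = composite_joiner(images, tforms, order, xlimits, ylimits)
    % images : cell array of RGB images
    % tforms : cell array of tform structs mapping each image to the canvas
    % order  : image indices from bottom layer to top layer (chosen by the user)
    out_h = round(ylimits(2) - ylimits(1)) + 1;
    out_w = round(xlimits(2) - xlimits(1)) + 1;
    joiner = zeros(out_h, out_w, 3);
    for k = order(:)'                       % bottom to top
        img = im2double(images{k});
        warped = imtransform(img, tforms{k}, 'XData', xlimits, ...
                             'YData', ylimits, 'Size', [out_h, out_w]);
        % Warp a mask of ones to find which canvas pixels this image covers.
        mask = imtransform(ones(size(img, 1), size(img, 2)), tforms{k}, ...
                           'XData', xlimits, 'YData', ylimits, ...
                           'Size', [out_h, out_w]) > 0.5;
        mask = repmat(mask, [1, 1, 3]);
        joiner(mask) = warped(mask);        % later layers overwrite earlier ones
    end
end
```

Reordering the indices passed in as the layer order is all that is needed to preview a different layering.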
Concluding Remarks

In conclusion, even with the two major limitations described above, our application is still able to generate decent joiners. Since the intent of the application is simply to give the artist an idea of how they should go about creating their joiner, we see this as the desired result. Even the bad results produced by certain orderings are useful, because they tell the user that when they create the actual joiner, they should not layer their images that way.

Although we believe our application meets its purpose, there are several aspects we would like to improve in the future. One is to implement some form of bundle adjustment in order to eliminate the assumption that images are given in order of overlap. This would allow us to compute better joiners, because each image's similarity transform could combine the transforms that map it to all of the other images it overlaps with, in contrast to our current method, which only creates a similarity transform accounting for the mapping from one image to one other image.

Another improvement we would like to make is getting the iterative refinement step to work. This step was excluded from this version of the application only because we were constrained by time. We believe that with enough time we could get this step functioning properly, which would greatly improve our output and allow the application to generate actual joiners instead of just preliminary ones.

A final improvement we hope to make is adding a user interface that is easy and simple to use. Currently everything is implemented as a MATLAB function that takes the ordering of the images as a parameter. Our original intention was to build a web-based user interface, but that became infeasible as we decided to focus on different aspects of our project (i.e., we focused more on photomosaic generation). We feel that a user interface would benefit our application because it would let the user graphically manage the images they are using, easily reorder them, and preview how each image might affect the result.

References

[1] Zelnik-Manor, Lihi and Perona, Pietro. "Automating Joiners." http://www.vision.caltech.edu/lihi/Demos/AutoJoiners.html
[2] Chuangsuwanich, Ekapol. http://www.cs.cmu.edu/afs/andrew/scs/cs/15-463/f07/proj_final/www/echuangs/#Readjustment
[3] Lowe, David. SIFT Feature Detector. http://www.cs.ubc.ca/~lowe/keypoints/