PuzzleSolver
An application of computer vision.

Joe Zeimen
Computer Vision, Colorado School of Mines
Spring 2013

Table of Contents

Introduction
Previous Work
Program Implementation
    Assumptions
    Input
    Output
    PuzzleSolver Program
        Scanning and finding the pieces
        Filtering
        Finding Contours
        Finding the corners
        Extracting Edges
        Normalizing Edges
        Classifying edges
        Comparing Edges
    Assembly Algorithm
        Algorithm Implementation
        Creating the output image
    Runtime
    Testing
    Limits of the program and possible improvements
    Accomplishments
    Obtaining Source Code
References
Appendix
    Appendix A: Solved Puzzles
    Appendix B: Source Code
        main.cpp
        edge.cpp
        edge.h
        piece.cpp
        piece.h
        puzzle.cpp
        puzzle.h
        utils.cpp
        utils.h
        PuzzleDisjointSet.cpp
        PuzzleDisjointSet.h

Introduction

This report outlines the procedure I used to solve jigsaw puzzles by way of computer vision. Essentially, the goal of this project is to take a set of input images of scanned puzzle pieces and produce one output image of the assembled puzzle. Specifically, the goal was to use only the shapes of the pieces, not color information. At a very high level, the algorithm for solving the puzzle works as follows:

1. Obtain a good-quality scan of the pieces.
2. Threshold the images so that the pieces are white and the background is black.
3. Find the boundaries of the pieces and split each boundary into its "sides".
4. Compare the edges.
5. Use the comparison information to find a symbolic representation of the puzzle solution.
6. Use the symbolic solution to produce an output image.

The algorithm above was implemented in C++ using the OpenCV library, and it has successfully solved several jigsaw puzzles.

Previous Work

In "Solving jigsaw puzzles by computer," Wolfson et al. were able to solve jigsaw puzzles using computer vision techniques relying only on the shapes of the pieces. This is called an apictorial method. For this project the authors photographed the puzzle pieces individually and obtained boundary data for each piece. This boundary data was then converted to a smoothed polygonal curve. The curve is then divided among the 4 sides, and these sides are used in a local matching algorithm. The assembly part of the algorithm first solves the frame of the puzzle using the known edge pieces with flat edges. They show that this problem is NP-complete and suggest a heuristic for solving the frame pieces. Once the frame has been solved correctly, the interior pieces are placed in an iterative fashion using the matching scores of two edges [1].
This paper is probably the most useful for my task, since it is an attempt to solve the same problem.

"Jigsaw puzzles with pieces of unknown orientation" is a bit different in that it uses square pieces and color information to try to solve the puzzle. Think of a regular image cut up into squares, and then trying to piece together the original image [2]. While this method does not rely on shape information, it still relies on 4 edges per piece and tries to find the correct place and orientation using a cost function. This is a much more robust algorithm, and it has recently broken records in puzzle assembly. It is a greedy algorithm, so it may be possible to implement it using shapes instead of colors, especially since it appears to rely only on costs between 2 edges.

Extracting the edges from a binary image is straightforward, but choosing only the correct corners from the pieces, which have very irregular outlines, can be another challenge. There are 2 approaches to this problem: one relies on finding corners in a grayscale image, and the other relies on looking at the curvature of the contour. In [3] an algorithm is described that uses the contour to find corners. This method ends up using distances instead of measuring curvature to decide whether a corner exists or not. The corner finding is vital, because it allows the partitioning of the contour into the individual edges, which need to be considered independently.

Once the edges are extracted, the similarity between different contours must also be measured. A highly cited article called "Visual Pattern Recognition by Moment Invariants" presents Hu moments [4]. These are ways to characterize contours; there are 7 in all. Each moment gives a score to a feature of the contour; the closer the difference between two contours' scores is to 0, the more similar they are. This is what OpenCV uses internally.
Some of these moments work even when the contours are of different sizes, which is not desirable in my case, because I will know the size of the contours. Selectively choosing the moments with properties suited to jigsaw puzzle solving may provide reasonable results. Another, much simpler method, the Hausdorff distance [5], looks for the maximum over the points of one contour of the minimum distance to the other contour. An approach like this may be simpler and should be evaluated to see if it can also provide matching scores good enough for the problem. I ended up using a comparison operation very similar to the Hausdorff distance to get good results.

Program Implementation

Assumptions

In order to reduce the complexity of the problem, the following assumptions needed to be made. The puzzle needs to be rectangular and grid-like, meaning that the pieces are in rows and columns. The corners need to be sharp and well defined, with 4 corners meeting together unless they are on an edge. There needs to be only one tab or hole on each edge. Since the algorithm uses shape information, it will not work on pieces that are purposefully made to be the same shape as other pieces.

Input

The input to this algorithm is a set of images where the background is black, the minimum distance between 2 corners, parameters on how much to filter the image to try to remove noise, and finally a threshold value.

Output

The output of the algorithm is an image of the completed puzzle. In the event that the puzzle could not be solved correctly, and the program recognizes it, it will give an error message.

PuzzleSolver Program

Most of the rest of this document describes from beginning to end how the puzzle solver program works, starting with scanning the pieces and ending when the final output image is generated.

Scanning and finding the pieces

To get reasonable images to start with, I used a consumer flatbed scanner to scan the pieces. I did this in a dark room with the scanner lid open.
This produced a very black background to aid in extracting the pieces. After this was done I could use thresholding to find a binary representation of where the pieces are.

With one puzzle I attempted to use color keying: I tried to use the HSV values to isolate only the puzzle pieces in the image. I did not find this as reliable as the thresholding method. Color keying is a bit more complex, and I made the decision to abandon it in order to focus more time on other parts of the algorithm. The thresholding approach does end up giving me workable results for several different puzzles. Color keying would also require the background color to be chosen so as to be different from the colors in the puzzle.

Figure 1. The result of thresholding the left image with a value of 30.

Filtering

The images obtained from the scanner need to be filtered; there is a lot of dust and noise in them. Along the edges of the pieces there are very small paper fibers that stick out, simply due to the nature of the material that the pieces are made of and how they are cut. The two properties I need to balance when filtering are reducing noise to make corner detection easier, while not disturbing the shape so much that edge comparisons suffer. While doing this project I tried 3 different filtering techniques in order to produce better corner finding and edge matching.

Figure 2. An example showing a puzzle piece immediately after being converted to grayscale. Notice the dust and paper fibers sticking out from the piece. (300 dpi, 476x487px)

The first attempt was to use an open and close operation on the binary images of the backs of the puzzle pieces. For this I used a 3x3 disk structuring element. This eliminated the white specks of dust in the binary image, removed all black dots inside of the pieces, and removed some of the noise generated by the thin fibers sticking out. This however did not work for all of the puzzles.
For example, in the puzzle piece depicted in figure 2, using a threshold of 30 and then performing the opening and closing, I get the image in figure 3. This has disconnected most of the fiber poking out, but there is another tiny connected fiber at the "neck" of the tab at the top. I found this approach fairly unreliable in how consistently it worked across different puzzles. I found myself needing to adjust the size of the structuring element for every puzzle I tried to solve, with some never fully working.

Figure 3. After thresholding at 30 and opening and closing using a 3x3 disk structuring element.

The next approach I used was a standard Gaussian blur. The blur was done on the original image, converted to grayscale, and then a thresholding operation was applied. This made the edges less noisy, but did not eliminate large anomalies like the paper fiber poking out. The more blurring that was done, the rounder the sharp corners got, making corner detection harder. When blurred, as the thresholding value went up, the pieces got slightly smaller. This would mean that the holes in the pieces would get bigger and the tabs smaller, resulting in poorer edge comparison results.

Figure 4. Left: a threshold of 30; right: a threshold of 70.
Figure 5. Gaussian blur, 15x15 square, sigma = 3.
Figure 6. Median blur, K = 7.

The images above depict the results of the Gaussian blur. Although it is hard to see in this picture, the puzzle piece on the right in figure 4 is a bit smaller than the one on the left due to its higher threshold.

The most reliable filtering method I use is a median filter, also known as a despeckle filter. A median filter considers the KxK pixel box surrounding a pixel. The median value of all the pixels in that box is used as the value for the pixel in question.
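As a concrete illustration of the idea, a KxK median filter over a grayscale image might be sketched as follows. This is a simplified stand-in for what OpenCV's medianBlur() does internally; the Image struct and function names here are illustrative, not the program's actual code.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical grayscale image: row-major pixels, width * height bytes.
struct Image {
    int width, height;
    std::vector<uint8_t> px;
    uint8_t at(int x, int y) const { return px[y * width + x]; }
};

// KxK median filter: each output pixel is the median of the KxK
// neighborhood around it (coordinates are clamped at the image border).
Image medianFilter(const Image& in, int k) {
    int r = k / 2;
    Image out{in.width, in.height, std::vector<uint8_t>(in.px.size())};
    std::vector<uint8_t> window;
    for (int y = 0; y < in.height; ++y) {
        for (int x = 0; x < in.width; ++x) {
            window.clear();
            for (int dy = -r; dy <= r; ++dy)
                for (int dx = -r; dx <= r; ++dx) {
                    int cx = std::min(std::max(x + dx, 0), in.width - 1);
                    int cy = std::min(std::max(y + dy, 0), in.height - 1);
                    window.push_back(in.at(cx, cy));
                }
            std::nth_element(window.begin(),
                             window.begin() + window.size() / 2, window.end());
            out.px[y * in.width + x] = window[window.size() / 2];
        }
    }
    return out;
}
```

A lone white speck on a black background disappears under this filter, because only a few of the K^2 samples in any window are white, so the median stays black.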
If you have a very speckled image, say a black background with white spots, the dark background is most likely going to be the median value in any given area, so all of the white spots will be replaced with the black value. Using this median filtering technique, the quality of the edge matching results made four out of five of the puzzles solvable. The one that didn't work could still be solved using the opening and closing technique.

Figure 7. Median filtered image followed by a threshold of 30.

Finding Contours

After a quality binary image representation of the pieces is obtained, individual pieces can be found using OpenCV's findContours() function. A contour is simply a representation of a curve. The findContours() function takes in a binary image and, using a border following technique, returns a vector of contours. The algorithm used for OpenCV's implementation is described in [6]. The contours returned are represented as lists of points going in counter-clockwise order around the connected regions. I stayed with this convention throughout the project, so everything that has to do with rotations or outer edges is expressed in a counter-clockwise fashion. This means that if you were walking along the contour, the puzzle piece would be on your left the entire time.

Some extra noise is also eliminated in this step. The size of the contour is an easy way to determine whether it is close to the expected size of a puzzle piece. If it is a lot smaller than the expected size, we can safely ignore the contour, because it clearly does not represent a piece; it is probably just some dust from the scanner. The contour also defines a bounding box surrounding the piece, using the min and max of the x and y values of all of its points. A 15-pixel border is added to these values and a cropped version of the original color image is obtained. Using the contour I can then draw a new version of the piece with a filled-in contour.
This will act as the mask that I use when generating the output image of the solution. It will also be used to find the corners of the piece. This image has the advantage of containing exactly one piece, and not parts of other pieces that may be partially inside the bounding rectangle of the contour. The mask has the same dimensions as the cropped color image. All of this information is stored in the piece class in my C++ code. These images are passed into the constructor of the piece class, which then processes the images and contours further.

Finding the corners

Finding the corners of the puzzle pieces is essential in providing places to split up the contour representing the piece into the 4 edges that make up the piece. It also provides known points that will match with other puzzle pieces, making it easy to create a transformation matrix to place each piece in the final output image. To find these corners I relied on OpenCV's goodFeaturesToTrack() function. I specifically opted for the Harris corner detector. As described in the OpenCV documentation, "for each pixel (x, y) it calculates a 2x2 gradient covariance matrix M(x,y) over a blockSize x blockSize neighborhood," and then computes the following characteristic:

    dst(x, y) = det M(x,y) - k * (tr M(x,y))^2

[7]. goodFeaturesToTrack() can then extract the local maxima of this characteristic and return a list of all of the found corners. To find the correct corners, the properties of puzzle pieces help. The corners are all very roughly the same distance from each other. Given that we can estimate how close two corners can be to each other, we can pass this value in as the minDistance parameter to the function. With this restriction the function will only return the best corners that are at least minDistance away from each other. goodFeaturesToTrack() can also take a quality parameter, qualityLevel, a value between 0 and 1. A value of 1 would only allow the single best corner.
A value of 0.5 would take the strongest corner, multiply its strength by 0.5, and any other corner with a strength higher than that would also be included in the output. It is nearly impossible to pick a quality value that works correctly for every piece. Therefore it is necessary to perform a binary search, adjusting the qualityLevel until exactly 4 corners are found. These corners, although close to the correct locations, are not close enough. OpenCV also has a function, cornerSubPix(), to refine these corners to a much more precise location.

As I have experimented with this algorithm, it can become very hard to find the correct corners when the pieces become small. The roundness of the corner seems to stay about the same for any size of puzzle piece, so as the pieces get smaller, the corners do not get sharper, and eventually some of the holes and tabs are about as sharp as the corners. This is the biggest issue I have found in trying to move to puzzles with more pieces. Poorly cut pieces that have anomalies can also sometimes cause a false corner to be detected, so it is important to make sure those are fixed before they go in the scanner.

Figure 8. The left image was used to find the corners; the right image shows the corners marked on the original color image.

Incorrectly finding the corners may not always result in an unsolvable puzzle, but it can significantly affect the quality of the output image. For example, see figure 9: the incorrect corner affected the other anchor points used in the affine transforms that produce the final output image.

Figure 9. One misclassified corner in the left image, and the resulting solution image on the right.

Extracting Edges

After the outer contour of the piece is obtained and the corners are found, the 4 sides of each piece, which I call edges or sides interchangeably, can be extracted. This is important so that edges can be compared independently of each other.
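Returning briefly to the corner search: the binary search on qualityLevel can be sketched as follows. Here countCorners is a hypothetical stand-in for a call to goodFeaturesToTrack() that reports how many corners a given qualityLevel yields; only the search-until-exactly-4 logic comes from the text.

```cpp
#include <functional>

// Hypothetical detector: returns how many corners a given qualityLevel
// yields. In the real program this would wrap cv::goodFeaturesToTrack().
using CornerCounter = std::function<int(double)>;

// Binary-search qualityLevel until the detector reports exactly 4 corners.
// Returns the quality level found, or -1.0 if the search fails to converge.
double findQualityForFourCorners(const CornerCounter& countCorners) {
    double lo = 0.0, hi = 1.0;
    for (int iter = 0; iter < 64; ++iter) {
        double q = (lo + hi) / 2.0;
        int n = countCorners(q);
        if (n == 4) return q;
        if (n > 4) lo = q;   // too many corners: demand higher quality
        else       hi = q;   // too few corners: relax the threshold
    }
    return -1.0;
}
```

This works because raising qualityLevel can only shrink the set of accepted corners, so the corner count is monotone in q.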
The contour is a vector of points going in counter-clockwise order around the piece. To split it up, the closest point in the contour to each of the corners found in the previous step is used as a beginning and end point. For each corner, the closest point in the contour is found, and the corners are then set to these newer, even more refined values. The beginning and end of the contour is almost always between two corners. Using the std::rotate() function, the contour can be rotated so that its starting point corresponds to a corner. Next, the indices of the other 3 corners are found inside of the contour. These correspond to where the contour can be cut, and they also give the order the corners go in, counter-clockwise. At this point each piece has its individual edges listed in counter-clockwise order, as well as its corners. Edge 0 goes from corner 0 to corner 1, edge 1 goes from corner 1 to corner 2, and so on. Because the edges go in counter-clockwise order, if you walk from the beginning of one edge to the end, the puzzle piece is on your left. These contour pieces are passed into the edge constructor for additional processing.

Normalizing Edges

To aid in classifying edges, and later comparing edges, I create modified versions of the edge contour. For this I line up the endpoints along the y-axis, with the beginning point adjusted to be at the origin. The endpoint will be on the positive y-axis (below the x-axis in image coordinates, where y grows downward). To do this a rigid transform is used. Let a be the start of the edge and b the end of the edge. Then c = b - a is a vector pointing from a to b. The angle the edge needs to be rotated by is theta = arccos(c.y / |c|); if c.x is less than zero, then -theta is used instead. Each point then goes through the following transformation:

    normalized_edge[i] = [ cos(theta)  -sin(theta) ] [ edge[i].x - a.x ]
                         [ sin(theta)   cos(theta) ] [ edge[i].y - a.y ]

Now, to compare this edge with another edge, we will need the normalized edge flipped by 180 degrees. Since I had already written the code to normalize the edge as above, I just reversed the original edge vector and computed a new reversed_normalized_edge. In the reversed_normalized_edge, if you walk from beginning to end, the puzzle piece is now on your right.

Classifying edges

Now that we have a consistent representation for each edge, the edges can be classified as outer edges, tabs, or holes. This is needed so that upon comparison the program can immediately avoid comparing pieces that obviously don't go together, which would be anything other than a tab paired with a hole. To decide if the edge is an outer edge, the length of the contour representing the edge can be found using OpenCV's arcLength() function. This function returns the total length of the contour, which is then compared against the distance between the beginning and end points of the edge. If the contour length is less than 1.3 times the distance between the beginning and end, it is classified as an outer edge. I empirically found this value to work well by changing it and inspecting the classified edges to make sure they were correct. To classify the edge as a tab or a hole, I use the fact that the normalized edge is lined up with the y-axis. The minimum and maximum x values along the entire contour can be compared: if the absolute value of the maximum is greater than the absolute value of the minimum, the edge is classified as a tab; otherwise it is classified as a hole. These types are represented as an enumerated type.

Figure 10. Three edges and the classifications assigned to them; the puzzle piece would be to the right of each one.

Comparing Edges

The purpose of edge comparison is to give a numerical value describing how well 2 edges fit together.
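The classification rules above (arc length versus endpoint distance for outer edges, extreme x values of the normalized edge for tab versus hole) can be sketched roughly as follows. Point2f and arcLength() stand in for their OpenCV counterparts; only the 1.3 ratio and the min/max x rule come from the text, the rest of the scaffolding is illustrative.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Point2f { float x, y; };
enum class EdgeType { OUTER_EDGE, TAB, HOLE };

// Total length of a polyline (a minimal stand-in for cv::arcLength()).
float arcLength(const std::vector<Point2f>& c) {
    float len = 0;
    for (size_t i = 1; i < c.size(); ++i)
        len += std::hypot(c[i].x - c[i - 1].x, c[i].y - c[i - 1].y);
    return len;
}

// Classify a normalized edge: it starts at the origin and ends on the y-axis.
EdgeType classify(const std::vector<Point2f>& edge) {
    float endDist = std::hypot(edge.back().x - edge.front().x,
                               edge.back().y - edge.front().y);
    // Nearly straight contour: an outer (frame) edge.
    if (arcLength(edge) < 1.3f * endDist) return EdgeType::OUTER_EDGE;
    float minX = edge[0].x, maxX = edge[0].x;
    for (const Point2f& p : edge) {
        minX = std::min(minX, p.x);
        maxX = std::max(maxX, p.x);
    }
    // The side that bulges farther from the y-axis decides tab vs. hole.
    return std::abs(maxX) > std::abs(minX) ? EdgeType::TAB : EdgeType::HOLE;
}
```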
This can also be thought of as a cost: incorrectly matched edges should cost more than correctly matched edges. In that scenario, the total cost of the assembled puzzle should be the minimum. OpenCV has an implementation of Hu-moment comparison in its matchShapes() function. I found this to produce unsatisfactory results. My best guess as to why is that some of the Hu moments are size and orientation invariant, and at a high level the pieces look very similar, so not taking size into account could make two incorrectly matched pieces appear more similar than they actually are.

There are quite a few restrictions I have put on the puzzle to make this edge comparison task easier, the biggest being that each piece needs 4 edges, and that 4 corners meet together unless they are part of the frame. This means that the beginning and ending of one edge correspond to the ending and beginning of another edge. The pieces are also scanned in at the exact same resolution, so a pair of correctly matched edges will have the same length.

I constructed my own comparison between the edges that works well enough to solve puzzles. To compare 2 edges, the program takes the normalized edge of edge a and the reversed normalized edge of edge b. Remember that the normalized edge's endpoints are on the y-axis and its beginning is at the origin; the reversed normalized edge is essentially flipped 180 degrees, and its endpoint is at the origin. The program goes through each point in one contour and finds the closest point in the other contour. All of these distances are added up, and the sum is divided by the length of the edge. This is essentially the average distance between the two edges. A perfect match would be zero. As the edges deviate in shape, the value grows.

Figure 11. The 2 edges compared are mapped onto each other. The cost of the left image is 0.4536 and the cost of the right is 0.8658. Both are correct matches.
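A sketch of that closest-point cost, in the spirit of the Hausdorff-style comparison described earlier: for every point of one normalized edge, find the nearest point of the other, sum the distances, and divide by the edge length. The brute-force scan mirrors the text; whether "length of the edge" means arc length or endpoint distance is my assumption (endpoint distance is used here), and Point2f is an illustrative stand-in.

```cpp
#include <cmath>
#include <limits>
#include <vector>

struct Point2f { float x, y; };

// Cost of matching edge a against edge b: for every point of a, find the
// closest point of b, sum those distances, and divide by the edge length
// (approximated here by the straight-line distance between a's endpoints).
// Lower is better; a perfect overlap scores 0.
double matchCost(const std::vector<Point2f>& a, const std::vector<Point2f>& b) {
    double sum = 0;
    for (const Point2f& p : a) {
        double best = std::numeric_limits<double>::max();
        for (const Point2f& q : b)
            best = std::min(best, (double)std::hypot(p.x - q.x, p.y - q.y));
        sum += best;
    }
    double len = std::hypot(a.back().x - a.front().x, a.back().y - a.front().y);
    return sum / len;
}
```

Two identical edges score exactly 0, and the score grows as the contours pull apart, which is the behavior the assembly stage needs.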
Comparing 2 frame edges from the outside of the puzzle will always result in very small costs. To avoid matching two frame edges together, the program returns a very high number as the cost of merging them; this number just needs to be larger than any real match's cost could be. The same is done when two holes or two tabs are compared. This also improves the speed of the algorithm later, because it eliminates some work that would otherwise be done.

When doing a puzzle by hand, the question "do the pieces fit?" feels like it has a very binary yes-or-no answer, so I was hoping for a big jump in cost when comparing pieces that should not go together. The graph below shows the cost of the best 500 edge matches, in order from lowest cost to highest. The 48-piece puzzle is 6 by 8 pieces, so there are only 82 correct matches in total. There is no obvious jump that can be used to separate the correct matches from the incorrect ones. For this 48-piece puzzle, the lowest-cost matches are correct matches. In a 104-piece puzzle, which has 187 possible correct matches, I found that 3 of the 150 lowest-cost matches were incorrect.

[Graph: costs for the 500 lowest-cost matches in the 48-piece puzzle, sorted from lowest to highest; costs range from roughly 0.5 to 2.5 with no clear jump.]

The imperfect results from the comparison algorithm can still usually be used, because the assembly algorithm is robust enough to throw out some logical inconsistencies and produce a correct representation of the puzzle.

Assembly Algorithm

My assembly algorithm comes almost directly from [2]. In that paper the author is concerned with assembling a puzzle of square pieces from a digital image, using color information to compare edges. As in the problem described in this paper, the pieces go together in a similar fashion: each piece has 4 sides, and the pieces are arranged in a rectangular grid.
I am also able to compare each edge against other edges and associate a cost. To understand how the algorithm works, imagine that you have a disassembled puzzle in front of you. Looking at all of the available ways to put 2 pieces together, you would select what you think is the best match. Let one of the pieces be A and the other B. Rotate A so that the edge to be joined with B faces right, rotate B so that the edge to be joined with A faces left, and join the pieces. Now look again at all of the available ways to put two pieces together and find the best match, only looking at 2 pieces at a time. Then join those pieces in the same manner as A and B were joined above. This is repeated until the puzzle is assembled. When joining groups of pieces together, if 2 pieces would completely overlap, that match is rejected and ignored. The same goes for 2 pieces found to be already in the same group: the match is ignored.

The basic outline of the assembly algorithm is as follows:

1. Compare all edges against every other edge; store the results as a list of matches with associated costs.
2. Sort the results from step 1 by cost.
3. Take the next unused lowest-cost edge match and try to put the pieces together. If the pieces overlap each other, or the pieces are already joined, ignore this match.

The algorithm borrows ideas from Kruskal's algorithm for finding the minimum spanning tree. In that algorithm, edges of a graph are added in order from minimum to maximum cost until every node has been added, and an edge is only added to the MST if it does not create a cycle in the graph. One can think of a puzzle piece as a node in a graph, and a match between the edges of pieces as an edge in the graph.

The algorithm also uses a more advanced disjoint-set data structure to keep track of which pieces are in the same group. A disjoint-set data structure supports 3 operations: make_set(), find(), and merge().
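A minimal union-find sketch of these three operations (the real PuzzleDisjointSet additionally tracks per-set location and rotation matrices and can refuse a merge that would overlap pieces):

```cpp
#include <numeric>
#include <vector>

// Minimal disjoint-set (union-find) over integer piece ids.
class DisjointSet {
    std::vector<int> rep;  // rep[i] == i means i is its own representative
public:
    explicit DisjointSet(int n) : rep(n) {
        std::iota(rep.begin(), rep.end(), 0);  // make_set for every piece
    }
    int find(int x) {
        // Follow representatives up to the root, compressing the path.
        while (rep[x] != x) x = rep[x] = rep[rep[x]];
        return x;
    }
    bool merge(int a, int b) {
        int ra = find(a), rb = find(b);
        if (ra == rb) return false;  // already in the same set: ignore match
        rep[rb] = ra;                // ra becomes rb's representative
        return true;
    }
};
```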
make_set() adds an object to the disjoint-set structure and makes it its own set, with a pointer to itself as its representative. merge() takes 2 objects, finds each of their representatives, and sets one as the new representative of the other. find() returns the representative of an object; it does this by recursively following representative pointers until a representative points to itself. If two objects have the same representative, they are said to be in the same set.

In the puzzle assembly algorithm the pieces each start out as their own set, with their representatives set to themselves. When two pieces, say A and B, need to be merged into one set, B's representative might be set to A. Now when you ask B who its representative is, it will return A, and when you ask A the same question, it will also return A; this means they are in the same set. The actual merging is much more complicated, because it will not allow two sets to be merged if doing so would cause overlap, meaning a piece placed directly on top of another piece. It also keeps track of every set and the rotations of the pieces.

The premise behind ignoring matches because of overlap is that, since the algorithm uses the lowest-cost matches first, those matches have a much higher chance of being correct than the later, higher-cost matches. So if a match later in the algorithm causes overlap, the algorithm assumes that match is incorrect, because the pieces merged earlier fit together even better. A merge rarely fails because of overlapping pieces: it never happens in my 24-piece puzzle, and happens 3 times in my 48- and 104-piece puzzles.

Algorithm Implementation

To store each match and its score, I created a struct that holds the indices of both edges and the score. I then compare all of the edges against each other, store all of these structs in a vector, and sort the vector by cost.
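That match-struct-and-sort step might be sketched like this. MatchScore and allMatches are illustrative names; the cost callable stands in for the real edge comparison, and the OpenMP pragmas are simply ignored (the loops run serially) if the code is compiled without -fopenmp.

```cpp
#include <algorithm>
#include <vector>

// Holds one pairwise comparison: which two edges, and how well they fit.
struct MatchScore {
    int edgeA, edgeB;
    double cost;  // lower cost = better fit
};

// Compare every edge against every later edge (N^2/2 pairs, since
// comparing A to B and B to A is the same work), then sort by cost so the
// assembly loop can consume the best matches first. scoreEdges is any
// callable returning the cost of matching two edges by index.
template <typename CostFn>
std::vector<MatchScore> allMatches(int numEdges, CostFn scoreEdges) {
    std::vector<MatchScore> matches;
    #pragma omp parallel for schedule(dynamic)
    for (int a = 0; a < numEdges; ++a) {
        std::vector<MatchScore> local;
        for (int b = a + 1; b < numEdges; ++b)
            local.push_back({a, b, scoreEdges(a, b)});
        #pragma omp critical
        matches.insert(matches.end(), local.begin(), local.end());
    }
    std::sort(matches.begin(), matches.end(),
              [](const MatchScore& x, const MatchScore& y) {
                  return x.cost < y.cost;
              });
    return matches;
}
```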
This allows the algorithm to use the best matching pieces first. This part of the assembly algorithm is the most time intensive: if there are N edges, there are N²/2 comparisons. It is divided by 2 because if edge A is compared against edge B, comparing B against A is not needed. The comparison algorithm is internally N², where N here is the number of points in the contour. This is also trivially parallel; using OpenMP this part can be made multithreaded with just 3 extra lines of code and a compiler flag. The implementation of this can be found in puzzle.cpp. Next, the algorithm iterates over the vector of matches and costs, merging pieces until the puzzle is solved. If there are N pieces in the puzzle, there need to be N-1 successful merges. The merging is all handled in a data structure I wrote called PuzzleDisjointSet. This data structure supports the same operations as a disjoint set, but it is specialized for this specific puzzle domain. Inside the loop, the merge operation is called on the puzzle disjoint set, telling it which pieces to merge and along which edges. If successful it returns true, false otherwise. Unsuccessful merges happen when two pieces would overlap, or when the two pieces are already in the same set. The puzzle disjoint set needs to keep track of the relative location of each piece within a set, and how each piece is rotated so that the correct edges line up. To aid in this I have come up with a convention for how sides are numbered and how rotations are expressed. Starting from the left, each side is numbered in counter-clockwise order starting with 0; so left, down, right, and up are 0, 1, 2, and 3 respectively. Because the problem is restricted to puzzles that go together in a grid-like fashion, with 4 edges each, a piece can be rotated in one of four ways. The rotation is expressed as the number of quarter turns counter-clockwise the piece has been rotated.
I always mod this number by 4 so that the rotation is always between 0 and 3. So when a piece has a rotation of 1, side 0 is now at the bottom, and side 1 is now at the right. To store the relative locations and rotations of the pieces, I simply store integers in an OpenCV integer matrix. Each set has 2 matrices, one for locations and one for rotations. The location matrix has the index number of each piece in its location relative to the other pieces in the set. If a piece does not occupy a grid location, -1 is used as the sentinel value for empty. The rotation matrix has the same dimensions as the location matrix, and each entry in the rotation matrix corresponds to the piece in the same location in the location matrix. The rotation matrix stores values 0 to 3 representing how many quarter turns that piece has been turned relative to its initial starting position. When the puzzle disjoint set is initialized, each piece is represented as its own set: each set has a 1x1 location matrix with the piece's id as its only element, a 1x1 rotation matrix set to 0, and is its own representative. The representative is simply an integer holding the id of the set's representative. To illustrate how pieces are merged, I will use a simple example. Let's imagine that you have found that pieces 5 and 12 need to be merged: side 0 on piece 5 needs to be joined with side 3 on piece 12. Initially the sets for those two pieces would look like this:

Piece 5:  location matrix [5],  rotation matrix [0], representative 5
Piece 12: location matrix [12], rotation matrix [0], representative 12

The convention is to always join pieces vertically in the middle. Piece 5's side 0 is its left side, so it needs to be rotated 2 quarter turns so that side 0 is on the right. To join side 3 of piece 12, that piece needs to be rotated 1 quarter turn so that side 3 faces the left.
The updated data structure would look like this:

Piece 5:  location matrix [5],  rotation matrix [2], representative 5
Piece 12: location matrix [12], rotation matrix [1], representative 12

Next I calculate the size of the matrix needed to store both pieces. In this example it is obvious that it will need 1 row and 2 columns. I also know that piece 12 will go in the second column, so I create 4 new 1x2 matrices: 2 to store locations and 2 to store rotations. The location matrices are initialized to -1 and the rotation matrices to 0. I then copy the contents of the location matrix for piece 5 into one of the new location matrices with an offset of (0,0), followed by copying the contents of the location matrix for piece 12 into the other location matrix with an offset of (0,1). I do the same for the rotation matrices. I now have the following:

New location matrices: [5, -1] and [-1, 12]
New rotation matrices: [2, 0] and [0, 1]

I do not copy them directly into one new matrix, because then I would not be able to tell if there is overlap. When combining them, I iterate over all of the positions in the matrices; if both matrices have a value other than -1 at the same position, there is overlap and the merge stops and fails here. Otherwise, the two sets are joined by combining the matrices. The first piece that was combined becomes the new representative of the second piece, and the actual representative holds the new location and rotation matrices. So after a successful merge the disjoint set data structure will have the following data:

Set 5:    location matrix [5, 12], rotation matrix [2, 1], representative 5
Piece 12: representative 5

Although a set has a member called representative, this does not mean that that id is the true representative; it is simply a pointer to its parent in the tree. To find the actual representative, you need to recursively follow the representative pointers until a representative points to itself, that is, until representative == this.id.
This highest representative contains the locations and rotations of all of the sets below it. If piece 12 above needs to be merged with another piece, the disjoint set data structure will see that it has a representative of 5 and will use the location and rotation matrices stored under 5. After all of the pieces have been placed, you can take any piece id in the disjoint set, find its representative, and you will get the final solution grid. The merging operation must handle much more complicated matrices than this example; I encourage anyone interested to look at the join_sets() function in the PuzzleDisjointSet.cpp file. It has many comments, and you can follow the procedure, which works when merging any pair of sets, however complicated. After the assembly algorithm is complete, a location matrix and a rotation matrix are obtained that describe how each piece fits with the others. In Matlab notation, this is what the matrices look like for a solved 104 piece puzzle:

Locations:
[37, 59, 92, 85, 79, 24, 4, 96, 71, 73, 10, 2, 25;
 30, 81, 34, 18, 42, 86, 89, 103, 70, 38, 20, 74, 94;
 87, 23, 1, 32, 22, 65, 99, 82, 7, 14, 72, 57, 97;
 29, 12, 16, 44, 21, 98, 17, 6, 53, 90, 66, 26, 19;
 31, 8, 39, 95, 91, 88, 69, 102, 11, 101, 49, 13, 55;
 27, 60, 68, 33, 64, 61, 51, 78, 54, 52, 35, 83, 93;
 75, 28, 46, 77, 5, 67, 62, 80, 41, 47, 40, 50, 100;
 76, 15, 43, 84, 0, 36, 63, 58, 9, 45, 56, 48, 3]

Rotations:
[0, 3, 2, 0, 1, 1, 1, 2, 1, 3, 3, 2, 0;
 0, 2, 0, 2, 1, 2, 0, 1, 3, 0, 2, 3, 2;
 1, 2, 0, 0, 0, 0, 2, 1, 1, 3, 1, 1, 1;
 0, 1, 0, 3, 0, 0, 2, 0, 2, 3, 2, 1, 2;
 0, 0, 2, 2, 0, 3, 3, 1, 1, 0, 1, 2, 1;
 0, 0, 2, 2, 0, 0, 2, 0, 2, 0, 1, 1, 3;
 3, 0, 2, 2, 2, 1, 0, 0, 1, 2, 1, 0, 2;
 1, 2, 2, 0, 2, 3, 2, 0, 1, 0, 3, 0, 2]

Creating the output image

Given the relative locations and rotations, the pieces can be placed into an output image using affine transforms.
I tried using rigid transforms, but the resulting image is not as aesthetically pleasing, and generally the pieces are not distorted too much when using an affine transform. To do this I start with the upper left hand corner (0,0) of the location and rotation matrices. Looking at the rotation matrix, I can calculate which corner of the piece goes in the upper left of the image: with a rotation of 0 it will be corner 0, with a rotation of 1 it will be corner 3, etc. Using the above locations and rotations, corner 0 will need to be placed at point (10,10); I use a 10 pixel border so the puzzle doesn't run right up to the edge of the picture. To calculate the affine transform I need 2 more mappings. I can calculate the distance between corner 0 and corner 3; if I let d be this distance, then corner 3 maps to (10+d,10). If e is the distance between corner 0 and corner 1, then corner 1 maps to (10,10+e). OpenCV has a function to calculate the affine transform matrix between the original locations and the desired locations. Then, using this transformation matrix, the original color image of the piece, and the black and white binary image as a mask, the piece can be affine transformed into place in the final output image. This is done using OpenCV's warpAffine() function. For the rest of the pieces, I use the corner locations of the previously placed pieces as the anchor points for the corners of the piece in question. Because of this dependence, the final image can sometimes be a little off: any error is passed along to the next calculation. You can see in the final output images that this usually causes one of the outer edges of the puzzle to be at a slight angle compared to where it should be. If the puzzle was not solved successfully, there are -1 values inside the location matrix representing empty places, and the image saving routine will stop at its first -1 and save only the partial results.
This is because I have not implemented a strategy to handle holes. Creating the output image is a very computationally intensive task, and takes the longest of any part of the program: with a 104 piece puzzle, it takes 60% of the computation time. I think this could be improved if the affine transforms were processed on the graphics card instead of on the CPU.

Runtime

The runtime will always differ depending on the computer the program runs on, so I will give the results that I see on my computer as a realistic estimate. For reference, I run this program on a mid-2011 MacBook Air: 4 GB RAM, 1.7 GHz i5, 2 "hyper-threaded" cores. Initialization is the time to read in the images, find the pieces and edges, etc. Solving the puzzle includes finding the edge costs and assembling the puzzle. Draw output is the time it takes to place each piece in the output image and save it.

Number of pieces      24     48     104
Initialization (s)    2.05   2.20   11.91
Solve puzzle (s)      0.32   0.78   16.77
Draw output (s)       8.95   16.24  43.54
Total time (s)        11.32  19.21  72.22

Testing

To test this program I relied heavily on generating output images for each of the stages; many of the pictures in this paper came directly from those tests. For example, to make sure that the classification of edges was correct, I printed each edge and its classification into an image and saved it to disk. I could then go through and visually tell whether each edge was classified correctly. To test the assembly algorithm, I stepped through the results of each merge while at the same time merging the same pieces of the actual physical puzzle. This was done when initially trying to solve the 24 piece puzzle. For the larger puzzles the program produced correct results, so it was assumed that the assembly algorithm was working as it should. The program can still produce all of these debugging images.
I have simply commented them out, with a description of what each block of debugging code does.

Limits of the program and possible improvements

Aside from the restrictions that I placed on this problem, there are still some improvements that could be made to enhance the quality of the results, notably improving piece finding and corner detection. This program is still a bit unstable when it comes to solving puzzles using the front side of the puzzle, because the puzzles will almost always have some black areas in them. One possible way to solve this problem is to use a different colored background, something very different from any of the colors in the puzzle; any pixel different from that background color could then be classified as a pixel in a puzzle piece. I tried this method, but the puzzles I have contain a wide variety of colors, and it was very difficult to find a piece of paper that was not the same color as some part of the puzzle. I think working on this problem would be a reasonable effort and could even improve the quality of the contours, helping produce even better quality scores between matches of pieces. When trying to go from a 104 piece puzzle to a 300 piece puzzle, the biggest problem I had was successfully finding the corners. The main problem with puzzles with more pieces is that the pieces get much smaller. As the pieces get smaller, the noise of small paper fibers and the sharpness of the corners stay about the same. The smaller pieces have tighter curves for the holes and tabs, so the sharpness of the corners relative to the sharpness of the holes and tabs is even closer; to a corner-detecting algorithm, the tabs look like very good corner candidates. A more advanced corner-detecting algorithm that uses more domain specific features to find the correct corners might be more successful.
Accomplishments

This program has solved front and back scans of 24 and 48 piece puzzles. It has also solved a 104 piece puzzle using just the back side. It can solve any of these puzzles in less than 1.5 minutes on a dual core, 1.7 GHz, i5 processor when compiled with the Intel C++ Compiler and aggressive optimizations. Appendix A shows each of the 5 puzzles it has solved, with an example of one of the input images for each puzzle.

Obtaining Source Code

The source code is included in this document in Appendix B. However, an easier to read and download version is stored in a Git repository online at https://github.com/jzeimen/PuzzleSolver, where you can browse the source code and read my instructions on how to compile and run it. Also included in the repository are all of my input images for several puzzles.

References

[1] H. Wolfson et al., "Solving jigsaw puzzles by computer," Annals of Operations Research, vol. 12, issue 1-4, pp. 51-64, Dec. 1988.
[2] A. C. Gallagher, "Jigsaw puzzles with pieces of unknown orientation," in 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 382-389, 2012.
[3] M. Sarfraz et al., "A new approach to corner detection," in Computer Vision and Graphics, International Conference, Warsaw, Poland, pp. 528-533, 2006.
[4] M. K. Hu, "Visual pattern recognition by moment invariants," IRE Trans. Info. Theory, vol. IT-8, pp. 179-187, 1962.
[5] D. P. Huttenlocher, "Comparing images using the Hausdorff distance," IEEE Trans. Pattern Anal. Mach. Intell., vol. 15, pp. 850-863, Sep. 1993.
[6] S. Suzuki and K. Abe, "Topological structural analysis of digitized binary images by border following," CVGIP, vol. 30, no. 1, pp. 32-46, 1985.
[7] OpenCV. (Apr. 11, 2013). OpenCV 2.4.5.0 documentation.
[Online]. Available: http://docs.opencv.org/2.4.5/modules/refman.html

Appendix

Appendix A: Solved Puzzles

Angry Birds front, 24 pc
Angry Birds back, 24 pc
Toy Story front, 48 pc
Toy Story back, 48 pc
104 pc puzzle back

Appendix B: Source Code

main.cpp

//
//  main.cpp
//  PuzzleSolver
//
//  Created by Joe Zeimen on 4/4/13.
//  Copyright (c) 2013 Joe Zeimen. All rights reserved.
//

#include <iostream>
#include <string.h>
#include "puzzle.h"
#include <cassert>
#include "util.h"
#include "PuzzleDisjointSet.h"
#include <sys/time.h>

//Dont forget final "/" in directory name.
static const std::string input = "/Users/jzeimen/Documents/school/College/Spring2013/ComputerVision/FinalProject/PuzzleSolver/PuzzleSolver/Scans/";
static const std::string output = "/tmp/final/finaloutput.png";

int main(int argc, const char * argv[]) {
    // std::cout << "Starting..." << std::endl;
    timeval time;
    gettimeofday(&time, NULL);
    long millis = (time.tv_sec * 1000) + (time.tv_usec / 1000);
    long inbetween_millis = millis;

    //Toy Story Color: breaks with median filter, needs filter(), 48 pc
    //puzzle puzzle(input+"Toy Story/", 200, 22, false);
    //Toy Story back: works w/ median filter, 48 pc
    //puzzle puzzle(input+"Toy Story back/", 200, 50);
    //Angry Birds color: works with median, or filter, 24 pc
    //puzzle puzzle(input+"Angry Birds/color/",300,30);
    //Angry Birds back: works with median, 24 pc
    //puzzle puzzle(input+"Angry Birds/Scanner Open/",300,30);
    //Horses back, not numbered, 104 pc
    //puzzle puzzle(input+"horses/", 380, 50);
    //Horses back, numbered, 104 pc
    puzzle puzzle(input+"horses numbered/", 380, 50);

    gettimeofday(&time, NULL);
    std::cout << std::endl << "time to initialize:"
              << (((time.tv_sec * 1000) + (time.tv_usec / 1000)) - inbetween_millis)/1000.0 << std::endl;
    inbetween_millis = ((time.tv_sec * 1000) + (time.tv_usec / 1000));

    puzzle.solve();
    gettimeofday(&time, NULL);
    std::cout << std::endl << "time to solve:"
              << (((time.tv_sec * 1000) + (time.tv_usec / 1000)) - inbetween_millis)/1000.0 << std::endl;
    inbetween_millis = ((time.tv_sec * 1000) + (time.tv_usec / 1000));
    puzzle.save_image(output);