“Agricola” Board Game Assist Program
EGGN 512 – Computer Vision
Derek Lang
May 7, 2012

Introduction

German-style board games are a class of board games that usually rely on strategy, deemphasize luck and conflict, and keep all players in play until the end of the game [1]. Typically, the rules are simple to keep the game accessible for players of different ages and nationalities. One notable exception is the game Agricola, which uses 360 cards and 17 different types of tokens (317 tokens in total). The game's concept is farm management in late 17th-century Europe. A player's board represents their farm, and the score is largely based on which tokens are present on that farm. The purpose of this paper is to provide a means of processing an image of a player's end-game board and computing a score. This algorithm should provide a quick means of calculating the score and determining the winner of the game.

Color images were taken with a DSLR camera as JPEGs under the same lighting conditions. 17 images were used (see Fig. 1): 1 for calibrating the board detection and 16 of varying end-game conditions. The images were then rectified using the lens-correction plugin of the Bibble® software. All code was developed in Matlab as a proof of concept.

This paper first discusses the existing literature relevant to this project, then describes in detail the techniques used. The project results are shown next. Finally, a discussion of the results and recommendations for future work are presented, along with reference material. An appendix of all the code developed for this project is included at the end of the paper.

Related Work

Markers and recognizable patterns have been used extensively to determine the pose of objects in relation to a camera. One common example is the concentric contrasting circle (CCC) [2]. A CCC consists of a white circle centered in a black circle.
An intensity threshold can be taken of the image, and correspondence between the centroids of the circles determines a CCC detection. However, multiple black-and-white CCCs require a non-trivial solution to determine the geometry of the points, so colored CCCs are used instead.

Color segmentation is usually considered where the colors and lighting conditions may not be known in advance. One method, by Wang and Watada [3], uses the Karhunen-Loeve transform to extract the most information from a color image into a 2-dimensional form, and applies an Otsu [4] threshold to segment. Another method uses K-means clustering [5] to determine which colors to threshold around. For this project, the colors of the board are known in advance, so a simple threshold can be determined through experimentation.

The SIFT algorithm [6] is a widely used method for creating and matching feature descriptors. In SIFT, a keypoint is detected in a LoG scale-space, and a feature vector is created from the gradient neighborhood around that keypoint. Ke and Sukthankar [7] developed an optimization of the SIFT algorithm that alters its last step (keypoint description): instead of building descriptors from the local image gradients, a precomputed eigenspace is used to project a keypoint's local image gradient, yielding a feature vector that contains only the top n eigenvectors for a given image gradient. Calonder, Lepetit, Strecha, and Fua [8] also developed a feature descriptor, which they called a 'Binary Robust Independent Elementary Feature' (BRIEF). An image patch surrounding a keypoint is described as a binary string of comparisons between different regions of the patch. The descriptor is very fast to compute, and the Hamming distance can be used to match features, which is faster and more efficient than the L2 norm used in other feature descriptors.
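To make the descriptor-matching idea concrete, here is a small Python sketch (illustrative only; the project itself is in Matlab) of matching binary descriptors by Hamming distance, as BRIEF does. The descriptor length and data are hypothetical.

```python
import numpy as np

def hamming_distance(d1, d2):
    """Number of differing bits between two binary descriptors (0/1 arrays)."""
    return int(np.count_nonzero(d1 != d2))

def match_binary(descs_a, descs_b):
    """Nearest-neighbour matching by Hamming distance.
    For each descriptor in descs_a, return (index of closest
    descriptor in descs_b, its Hamming distance)."""
    matches = []
    for da in descs_a:
        dists = [hamming_distance(da, db) for db in descs_b]
        j = int(np.argmin(dists))
        matches.append((j, dists[j]))
    return matches

rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=(3, 256))   # three hypothetical 256-bit descriptors
b = a.copy()
b[0, :5] ^= 1                           # corrupt 5 bits of the first descriptor
matches = match_binary(a, b)
```

Because the distance is a bit count, real implementations pack the bits and use XOR plus a popcount instruction, which is why BRIEF matching is faster than an L2-norm comparison of float vectors.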
Algorithm

The main algorithm can be described in the following steps:
1. Detection of the board
2. Detection of tiles
3. Detection of player pieces: persons, stables, and fences
4. Detection of livestock pieces
5. Detection of crop pieces
6. Detection of pastures
7. Calculation of score

Although there are many different pieces to consider for this project, several factors help to simplify the task. Knowledge of the rules of the game can be used as logic that reduces the number of detections that have to be made and helps check for errors. Once the detections have been made, a score can be calculated using lookup tables.

Board Detection

Fig. 1: Samples of the test images.

The board is detected by placing four color markers similar to CCCs [2] at the four corners of the board. The image is converted to HSI values, and a narrow-band hue threshold of the image is taken for each of the targets. The white marker centers are found by taking a threshold of the intensity of the image. The markers are then found by matching the centroid of the colored circle to the centroid of the white center. Finding the four markers gives a correspondence to the known geometry of the board, which can be used to create a transform that converts the image into an orthonormal perspective (see Fig. 2 for an example of this normalization).

The normalized test image can be further subdivided into 54 regions, categorized as follows: 15 squares where most of the pieces are placed, 38 bordering regions where the fence pieces are placed, and a region at the top of the board where a 'stockpile' of extra pieces for the player is placed. This reduces the complexity of the detection algorithms, as fewer pieces have to be detected at once, and the smaller sub-images are faster to process.

Fig. 2: Test image (left) and orthonormal transformed image (right).

Tile Detection

The tiles that are placed on the board are of four different types: wooden house, clay house, stone house, and field. The SIFT algorithm with a coarse Hough space [9] is used to determine a match between a template and a test region. Since there are 16 different template images for each of the tile types in the game, up to 3,456 SIFT comparisons may be made. A few simplifications are made to increase the likelihood of matching and reduce the number of comparisons:
1. A full transformation between the matching sets of points is not calculated; only the difference between the descriptors is used as a rough transformation.
2. The angle of the transformation should be close to 90 degrees, so transforms outside of this range can be discarded.
3. Any transformation that does not have a minimum number of points is discarded.
4. From the game rules, it is known that a house can only be built of one type of material, so the check for one type of house is skipped if another type is found.
5. If a match is found for one region, that region is not used for subsequent comparisons.

Player Piece Detection

There are three types of pieces that are specific to a player on a board. For simplification, only one player color is considered for this project. The 'person' and 'stable' pieces are detected using the following method: on a region, a hue threshold is performed to isolate the piece from the background. The binary threshold is then morphologically closed and eroded to clean up the detection. From this binary image, the area and the centroid of the isolated piece can be used to determine a match for that piece. Since normalization can distort a piece to 'expand' into the neighboring regions, the centroid of a true detection must lie within a radius of the center of the region (see Fig. 3).

Fig. 3: Player piece detection. RGB image (left) and corresponding hue threshold (right).
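The marker-to-board correspondence described above amounts to estimating a projective transform from four point pairs. A minimal Python/NumPy sketch of that step, using the standard direct linear transform (DLT) formulation (the project's implementation is in Matlab; the coordinates below are hypothetical):

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate the 3x3 projective transform mapping src -> dst
    from four point correspondences, via the DLT linear system."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # the solution is the null vector of A (last right-singular vector)
    _, _, Vt = np.linalg.svd(np.array(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def warp_point(H, pt):
    """Apply the homography to a 2-D point."""
    p = H @ np.array([pt[0], pt[1], 1.0])
    return p[:2] / p[2]

# hypothetical marker centroids found in a skewed photo ...
src = [(10, 12), (410, 30), (395, 300), (25, 280)]
# ... mapped onto the known rectangular board geometry
dst = [(0, 0), (400, 0), (400, 300), (0, 300)]
H = homography_from_points(src, dst)
```

Warping every pixel of the image through H (or, in practice, through its inverse with interpolation) produces the orthonormal board view; the 54 sub-regions can then be cut out at fixed coordinates.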
Note that the centroid for the fence from the neighboring region is much closer to the edge than the true detection.

The 'fence' pieces are also isolated using a hue threshold, but instead of calculating the regional properties for that binary image, a simple percentage is calculated from it. If the ratio of blue pixels to the total region is above a threshold (20%), then there is a match for a 'fence' piece in that region.

Livestock Piece Detection

There are three types of livestock pieces: the sheep (white), the boar (black), and the cattle (brown). Since they are all the same cube shape and occur in the same regions, they are given the same consideration. Only the regions that do not have a field tile are checked, since the game rules preclude livestock from occurring there. Thresholds based on an HSI conversion of the region produce a binary detection: hue and saturation are used for the cattle, and saturation and intensity are used for the sheep and boar. Morphological operations isolate a detection from noise, and the area of the binary image is then compared to a threshold to confirm a detection.

Crop Piece Detection

There are two types of crop pieces: the grain (yellow) piece and the vegetable (orange) piece. The method to detect them is similar to detecting player or livestock pieces: a hue threshold is performed on a region, and the region properties of the resulting 'blobs' are thresholded to confirm detection. Knowledge of the game's rules is used to check for pieces only in the large top region and on top of field tiles. Since the pieces are rotationally invariant discs, a much larger erosion element can be used to isolate each piece (see Fig. 4).

Fig. 4: Crop piece detection. The hue threshold is eroded to isolate pieces that may be touching.

Pasture Detection

Pastures are defined as any squares that are enclosed by fences on all sides. Since these can be of an arbitrary shape and size, a hole-filling algorithm [10] is used.
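The threshold-and-check pattern shared by the person, stable, livestock, and crop detectors can be sketched as follows. This is an illustrative Python version, not the project's Matlab code: the morphological cleanup is omitted, and the hue band, minimum area, and centroid radius are all hypothetical values.

```python
import numpy as np

def detect_piece(hue, hue_lo, hue_hi, min_area, center_radius):
    """Hue-band threshold on one region; accept a detection only if the
    resulting blob is large enough and its centroid lies near the region
    centre (the centroid-radius test used for 'person'/'stable' pieces)."""
    mask = (hue >= hue_lo) & (hue <= hue_hi)
    area = int(mask.sum())
    if area < min_area:
        return False                       # too little of the target hue
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()          # blob centroid
    h, w = hue.shape
    dist = np.hypot(cy - (h - 1) / 2, cx - (w - 1) / 2)
    return bool(dist <= center_radius)     # must sit near the region centre

# hypothetical 20x20 hue map: background hue 0.1, a 6x6 piece of hue 0.6
region = np.full((20, 20), 0.1)
region[7:13, 7:13] = 0.6
```

A piece "expanded" into the region from a neighbor would pass the hue test but fail the centroid-radius test, which is how the false detection in Fig. 3 is rejected.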
A representational 7-by-11-pixel binary image of the board is created, with all the fence detections used to construct outlines of the pastures. Using Matlab's imfill function, the filled representational image is XOR'd with the original representational image to create an image that contains only the detected pastures. It is then a simple matter to count the pastures (see Fig. 5).

Fig. 5: Pasture detection. Representational image showing fences (left) and resulting pastures (right).

Results

Board Detection and Orthonormal Image Creation

The board detection algorithm took 19 seconds on average to run. Marker detection was successful on all but one image (9515.jpg), making it 93.75% accurate. It is believed that glare on the board from a nearby light over-saturated the image, affecting the threshold of the image (see Fig. 6).

Fig. 6: Marker detection miss. Successfully detected marker (left) and unsuccessfully detected marker (right).

Image   Calculated Score   Actual Score   Accuracy   Notes
9512    10                 11             90.91%     Missed sheep piece
9513    17                 18             94.44%     Missed cattle pieces
9514    10                 21             47.62%     SIFT mismatch
9515    0                  22             0.00%      Marker detect failure
9516    32                 33             96.97%     Missed cattle pieces
9517    23                 24             95.83%     Missed cattle pieces
9518    29                 29             100.00%
9519    25                 26             96.15%     Missed stable
9520    28                 28             100.00%
9522    12                 12             100.00%
9523    9                  16             56.25%     Missed fence
9524    19                 23             82.61%     Missed fence
9525    22                 23             95.65%     Missed fence
9526    24                 25             96.00%     Missed fence
9527    49                 50             98.00%     Missed fence
9530    22                 25             88.00%     2 persons stacked
Table 1: Scoring results.

The calculated scores versus the actual scores are shown in Table 1. The method is 83.65% accurate based on these test images. It is interesting to note that there were no false positives in detecting pieces, only false negatives. A SIFT mismatch would count one tile as another type, and that detection would cause checks for the other tile types to be skipped (see Fig. 7).
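The imfill-and-XOR pasture count can be reproduced in plain Python as follows. This is a stand-in for Matlab's imfill (flood-fill the background from the border, then keep what was not reached), with a hypothetical fence layout on the 7-by-11 representational grid.

```python
from collections import deque
import numpy as np

def count_pastures(fences):
    """Count enclosed regions in a binary fence map: flood-fill the empty
    cells reachable from the image border, then label the remaining
    'holes' (equivalent to xor-ing an imfill result with the original)."""
    h, w = fences.shape
    outside = np.zeros((h, w), dtype=bool)
    q = deque((r, c) for r in range(h) for c in range(w)
              if (r in (0, h - 1) or c in (0, w - 1)) and not fences[r, c])
    for r, c in q:
        outside[r, c] = True
    while q:                                   # BFS from the border inward
        r, c = q.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w and not fences[rr, cc] \
                    and not outside[rr, cc]:
                outside[rr, cc] = True
                q.append((rr, cc))
    holes = ~(fences.astype(bool) | outside)   # unreachable interior cells
    seen = np.zeros((h, w), dtype=bool)
    count = 0
    for r in range(h):                         # label connected hole blobs
        for c in range(w):
            if holes[r, c] and not seen[r, c]:
                count += 1
                stack = [(r, c)]
                seen[r, c] = True
                while stack:
                    rr, cc = stack.pop()
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        r2, c2 = rr + dr, cc + dc
                        if 0 <= r2 < h and 0 <= c2 < w \
                                and holes[r2, c2] and not seen[r2, c2]:
                            seen[r2, c2] = True
                            stack.append((r2, c2))
    return count

# hypothetical fence map: one fully closed rectangle of fence cells
fences = np.zeros((7, 11), dtype=bool)
fences[1, 1:5] = True; fences[3, 1:5] = True
fences[1:4, 1] = True; fences[1:4, 4] = True
n = count_pastures(fences)
```

An enclosure with any gap in its fence line connects to the border flood and is correctly not counted as a pasture.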
A false positive tile detection would only occur if a person piece was not covering the center of the tile. It may be possible to avoid this by obscuring the center of the template images, but it is unknown whether this would affect the robustness of the detection.

Fig. 7: False positive tile detection.

A fence piece would be missed if the piece was not completely in the region where it was checked (see Fig. 8). An attempt to mitigate this was to enlarge the regions where fences are checked, but a miss like this could also be attributed to player error in not placing the piece in the correct region.

Fig. 8: Fence miss. Note that the borders of the square above and below the region are visible.

Livestock pieces would be missed if the pieces were touching (see Fig. 9). The threshold detection would treat the pieces as one large 'blob', and only one piece would be counted for multiple pieces. Many remedies were tried, such as checking the area or bounding rectangle of a detection. The problem was exacerbated by occlusion of the pieces and distortion due to the orthonormal transformation, which led to greater variance in the detections.

Fig. 9: Livestock miss. Touching pieces are not isolated from each other in a hue threshold.

Discussion

This method for calculating the score is largely accurate, with usually only one miss accounting for the difference in score; in 9 of the 16 cases, the score was off by only one point. Errors in fence piece registration could be mitigated by further enlarging the region where the detection is checked. There is a condition in the game where a player can stack two person pieces on top of each other; for simplicity, this condition was not considered. The biggest obstacle in correcting the score is accounting for the cases where livestock pieces were touching.
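One of the remedies tried for touching pieces was checking the area of a detection. A minimal sketch of that idea in Python (all areas here are hypothetical, and it assumes a near-top-down view so the per-piece area stays roughly constant):

```python
def estimate_piece_count(blob_area, single_piece_area):
    """Rough count of touching cube pieces merged into one blob:
    quantize the blob's pixel area by the nominal area of one piece.
    A lone piece (or a partial one) still counts as a single piece."""
    return max(1, round(blob_area / single_piece_area))

# e.g. two touching sheep merged into one ~410 px blob,
# with a single sheep covering ~200 px
merged = estimate_piece_count(410, 200)
single = estimate_piece_count(190, 200)
```

With the oblique photos used here, the perspective normalization distorts blob areas enough that this quantization is unreliable, which is why the top-down capture suggested below would help.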
One way to reduce the exacerbating effects of the normalization of the board would be to require the picture to be taken top-down. It would then be much easier to use the area of a detection to determine whether there were multiple pieces in a single 'blob'.

The time to run the algorithm varied between 100 and 200 seconds. The main contributors to this length are the board marker detection and the SIFT tile detection. This could be improved by a number of means. Marker detection relies on comparing the centroids of the colored markers to their white centers, which can result in thousands of region-property comparisons. This could be reduced by using solid markers, as in Hislop's project [11], or by a single marker (similar to an AR target) that could give the pose needed to determine a transformation. The SIFT feature descriptor could be replaced by a more compact alternative, such as a BRIEF [8] descriptor, PCA-SIFT [7], or SURF [12].

Before the development of this algorithm, a computer spreadsheet was created to help in calculating the score. The time to determine the score (for each player) with the spreadsheet is comparable to or shorter than this algorithm's. If this algorithm were optimized, made more robust, and ported to a stand-alone program, it could become the preferable means of determining the score.

References

[1] Web, “German-style board game”, http://en.wikipedia.org/wiki/German-style_board_game, accessed May 7, 2012.
[2] L. Gatrell, W. Hoff, and C. Sklair, “Robust Image Features: Concentric Contrasting Circles and Their Image Extraction,” Proc. of Cooperative Intelligent Robotics in Space, Vol. 1612, SPIE, W. Stoney (ed.), 1991.
[3] C. Wang and J. Watada, “Robust Color Image Segmentation by Karhunen-Loeve Transform based Otsu Multi-thresholding and K-means Clustering”, 2011 Fifth International Conference on Genetic and Evolutionary Computing, pp. 377-380, 2011.
[4] N. Otsu, “A threshold selection from gray level histograms”, IEEE Trans.
Systems, Man and Cybernetics, vol. 9, pp. 62-66, Mar. 1979.
[5] R. Szeliski, Computer Vision: Algorithms and Applications, Springer International, pp. 289-292, 2010.
[6] R. Szeliski, Computer Vision: Algorithms and Applications, Springer International, p. 223, 2010.
[7] Y. Ke and R. Sukthankar, “PCA-SIFT: A More Distinctive Representation for Local Image Descriptors,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), pp. 506-513, 2004.
[8] M. Calonder, V. Lepetit, C. Strecha, and P. Fua, “BRIEF: Binary Robust Independent Elementary Features,” in Proceedings of the European Conference on Computer Vision, 2010.
[9] W. Hoff, “EGGN 512: Computer Vision”, http://egdegrees.mines.edu/course/eggn512/lectures/, Colorado School of Mines, USA, accessed May 7, 2012.
[10] P. Soille, Morphological Image Analysis: Principles and Applications, Springer-Verlag, pp. 173-174, 1999.
[11] E. Hislop, “Augmented Reality for a Board Game”, Colorado School of Mines, W. Hoff (prof.), 2011.
[12] H. Bay, A. Ess, T. Tuytelaars, and L. V. Gool, “SURF: Speeded Up Robust Features”, Computer Vision and Image Understanding 110, pp. 346-359, 2008.

Appendix A: code

function score=scoring(fname)
% score=scoring(fname)
% Agricola board game AR project
% main scoring function.
% reads fname, and hopefully computes the score corresponding to that
% board.
% please make sure that the board has markers, and that the image is
% treated properly.
p=path;
if isempty(strfind(p,'vlfeat'))
    run('..\vlfeat-0.9.14\toolbox\vl_setup.m') % setup vlfeat toolbox
end
clear p
%% read file
% fname=input('enter file name:\n','s');
I=imread(fname);
%% detect board markers and normalise
[Inorm]=board_norm(I);
%crop image for processing.
crops=board_crop(Inorm);
pause(5)
%% Detect tiles
template_struct
crops=sift_test(crops,stone);
crops=sift_test(crops,clay);
crops=sift_test(crops,wood);
crops=sift_test(crops,field);
%% Detect Stables
crops=stable_test(crops);
%% Detect Fences & Pastures
crops=fence_test(crops);
crops=pasture_test(crops);
%% Detect Persons
crops=person_test(crops);
%% Detect Livestock
crops=cow_test2(crops);
crops=sheep_test(crops);
crops=boar_test(crops);
%% Detect Vegetables/Wheat (Todo)
crops=wheat_test(crops);
crops=veg_test(crops);
%% Determine numbers
run score_list
squarelist=[8:2:16 24:2:32 40:2:48];
% arrays
awheat=[crops.wheat];
aveg=[crops.veg];