2013 10th IEEE International Conference on Control and Automation (ICCA), Hangzhou, China, June 12-14, 2013

A Fast and Robust Fingertips Tracking Algorithm for Vision-Based Multi-touch Interaction

Qunqun Xie(1), Guoyuan Liang(1), Cheng Tang(1), and Xinyu Wu(1,2)

Abstract— Finger touch is the most natural way for humans to interact with the external world. In the past five years, the great success of multi-touch trackpads on portable devices has demonstrated the potential of multi-touch technology for vision-based human-computer interaction (HCI) systems. The implementation of multi-touch technology highly depends on accurate and fast fingertip tracking. In this paper we present a fast and robust algorithm for tracking fingertip positions in a stereovision-based 3D multi-touch interaction system. Our method first detects the hand region by a two-step strategy based on a skin color filter as well as depth images. Then a geometry model is built to locate the fingertips. The accuracy and effectiveness of the fingertip tracking algorithm are examined over several video sequences with complicated backgrounds. Experimental results verify that our algorithm can reliably and accurately track the movements of fingertips in real time. The effectiveness of the fingertip tracking algorithm also demonstrates the feasibility of allowing users to interact with computers through finger movements in 3D space over a virtual thin-film-like touch surface set up at a certain distance away from the screen.

(*) The work described in this paper is partly supported by Shenzhen Nanshan District Technical R&D and Innovative Design Fund (KC2012JSYB0050A), Shenzhen Internet Industrial Developing Special Fund (JC201005270368A), and Guangdong Innovative Research Team Program (201001D0104648280).
(1) Q. Xie, G. Liang, C. Tang, and X. Wu are with the Guangdong Provincial Key Laboratory of Robotics and Intelligent System, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. Email: {qq.xie,gy.liang,cheng.tang,xinyu.wu}@siat.ac.cn
(2) X. Wu is also with the Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China.

978-1-4673-4708-2/13/$31.00 ©2013 IEEE

I. INTRODUCTION

Multi-touch technology has seen rapid growth in recent years and is currently one of the research hotspots in Human Computer Interaction (HCI) [1]. The first multi-touch screen based on pressure sensing was designed by Nimish Mehta in 1982 [2]. Since then, various multi-touch devices and technologies have been launched by researchers all over the world [3-7]. Early implementations of multi-touch technology were both complex and expensive until 2005, when Jefferson Han presented an FTIR-based solution that greatly reduced the cost of multi-touch technology [8]. Currently, multi-touch technologies can be classified into two categories: sensor-based and computer-vision-based. Sensor-based technologies integrate different types of sensors into a touchpad and directly receive finger touch as input. Although this works well with various portable devices, the relatively high cost limits its applications to some extent. Recently, the development of computer vision has made it possible to seek inexpensive vision-based solutions which may have good scalability as well as good performance.

It is well known that people usually touch things of interest with their fingers. Hence, in a multi-touch system, the most challenging task is to identify the locations of the fingertips. In recent years fingertip tracking has received great attention from researchers all over the world. Fingertip tracking, in general, is still a tough problem due to the flexible shape and high degrees of freedom (DOF) of the hand. Therefore, people tend to employ extra sensors, typically mechanical or optical, to capture fingertip positions directly [9-11]. With the development of computer vision, however, various vision-based fingertip tracking algorithms have been reported. Some of these approaches utilize the geometric properties of the hand, e.g. curvature, edge or shape, and build a model to locate the fingertips. Dominguez et al. presented a curvature-analysis-based method to track the fingertips using a head-mounted camera system [17]. Hongwei Ying et al. proposed a fingertip detection algorithm that analyzes the edges of fingers after segmentation from depth images captured by a trinocular vision system [13]. Kim et al. described a method based on Active Shape Models (ASM) and an ellipse equation to detect and track fingertips without using skin color [14]. Tony Heap et al. first constructed a 3D deformable Point Distribution Model to track the hand with a single video camera [15]. Other vision-based tracking algorithms make use of image analysis techniques such as template matching or color segmentation to track fingertip movements. The methods described in [16-17] employed circular and elliptical templates for fingertip tracking. Some researchers believe that geometric structure contains the most important information about the fingers, and presented geometry-structure-based fingertip tracking algorithms [18-19]. Most of these approaches, however, are computationally expensive and can only track 2D trajectories of fingertips. More recently, Daniel R. Schlegel et al. built a new vision-based interaction system named AirTouch [20], which is able to track multiple fingertips in 3D space, but the user needs to wear a marked glove.
In this paper, we propose a fast and robust fingertip tracking algorithm based on a geometric structure model of the hand. Compared with existing methods, our algorithm attempts to track the movements of multiple fingertips not only in 2D but also in 3D space, without using any marks. A stereovision system is set up to retrieve depth information of the scene. The algorithm detects the hand region using a skin color filter as well as depth images reconstructed from the stereovision system, and then calculates the position of the palm center. Finally, based on the observation that the geometric structure of human hands is almost identical, the geometric relation between the palm center and the hand contour is used to determine the fingertip locations on the contour. The accuracy and effectiveness of this algorithm are examined over several video sequences with complicated backgrounds. Experimental results verify that our algorithm achieves stable and efficient performance during the tracking process.

We are now building a 3D virtual multi-touch interaction system in our lab. A virtual thin-film-like touch surface is set up at a certain distance in front of the screen. Through a stereovision-based fingertip tracking system, the user can pull, push, spin or twist the virtual surface in 3D space, which may activate different inputs to the computer. The system is expected to greatly improve the richness of user experience.
The rest of the paper is organized as follows: In Section II, we present the efficient hand region localization algorithm based on skin color and depth information. Section III describes the robust and fast approach for fingertip tracking. Section IV gives a brief introduction to the 3D virtual multi-touch interaction system now under development in our lab. Experimental results and discussion are presented in Section V. Finally, conclusions are drawn in Section VI.

II. EFFECTIVE HAND LOCALIZATION

In order to track the movements of fingertips, the hand region should be segmented from the background first. Hand localization, however, is still a challenging problem due to the high degrees of freedom (DOF) and flexible shape of the hand. For the purpose of improving system efficiency, our approach handles the problem in a simpler way. The hand is considered to be the closest skin-colored object in front of the camera. This assumption is acceptable in most human-computer interaction tasks. The hand region is detected by a two-step method. First, a skin-color filter is applied to locate the candidate hand regions. Then the hand is segmented from depth images using depth clipping and a region growing algorithm.

A. Skin Color Filter

Skin color has been proven to be an effective cue for extracting hand and face regions from the background. Basically, skin color detection is to define decision rules and build a skin color classifier. The main difficulty is to find both an appropriate color space and adequate decision rules. In our algorithm, we choose the YCbCr color space for skin-color segmentation [21]. The YCbCr color space separates the color information into a luminance channel (Y) and two chrominance channels (Cb and Cr), and is appropriate for skin-color segmentation. In addition, a parametric model, the Gaussian Mixture Model, is employed to describe the skin-color distribution. For a single Gaussian model, the skin-color probability distribution p(c|skin) is defined as follows [22]:

p(c|skin) = 1 / (2π |Σ_s|^(1/2)) · exp( −(1/2)(c − μ_s)^T Σ_s^(−1) (c − μ_s) )   (1)

where c is a color vector, and μ_s and Σ_s are the model parameters, which can be estimated from training data as follows:

μ_s = (1/n) Σ_{j=1..n} c_j   (2)

Σ_s = 1/(n−1) Σ_{j=1..n} (c_j − μ_s)(c_j − μ_s)^T   (3)

where n is the number of training samples. Finally, the Gaussian Mixture Model is defined by

P(c|skin) = Σ_{i=1..k} λ_i p_i(c|skin)   (4)

where k denotes the number of mixture components and λ_i denotes the weight of each Gaussian component, with Σ_{i=1..k} λ_i = 1. P(c|skin) can be used directly as a measure of how "skin-like" a color is. In this paper we set k to 5, and the Gaussian Mixture Model parameters are estimated by the well-known Expectation Maximization (EM) algorithm. After extracting the skin-like regions from the background, we apply morphological operations on the extracted regions to remove noise. At this stage, some other objects with skin-like color are also extracted as candidates of the hand region. Fig. 1 shows the result of skin-color region detection.

Fig. 1. Detection of skin-color regions. (a) Original hand image; (b) Extracted skin-color regions.

B. Hand Segmentation from Depth Image

As mentioned before, the hand is assumed to be the closest skin-colored object in front of the camera. Therefore, the hand can be identified from all the candidates by a depth clipping technique and a region growing algorithm. Note that the hand region in the depth image is continuously distributed and extends within a limited 3D space, so the points with minimum depth are picked as seeds.
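To make the two-step hand localization concrete, the sketch below combines a skin-color filter with the depth-seeded region growing step described above. It is a minimal illustration only, assuming OpenCV and a scikit-learn GaussianMixture model (skin_gmm) already fitted on (Cb, Cr) skin samples; the log-likelihood threshold and the 150 mm depth band are placeholder values of our own, not parameters reported in the paper.

    import cv2
    import numpy as np

    def localize_hand(frame, depth, skin_gmm, skin_thresh=-12.0, depth_band=150.0):
        """Two-step hand localization: skin-color filtering, then depth-seeded region growing."""
        # Step 1: skin-color filter in YCbCr space (the Y channel is ignored).
        ycrcb = cv2.cvtColor(frame, cv2.COLOR_BGR2YCrCb)
        cb_cr = ycrcb[:, :, [2, 1]].reshape(-1, 2).astype(np.float64)
        log_p = skin_gmm.score_samples(cb_cr).reshape(frame.shape[:2])
        skin = (log_p > skin_thresh).astype(np.uint8)
        # Morphological opening removes small skin-colored noise blobs.
        kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
        skin = cv2.morphologyEx(skin, cv2.MORPH_OPEN, kernel)

        # Step 2: depth clipping and region growing, seeded at the closest skin-colored pixel.
        valid = (depth > 0) & (skin > 0)
        if not valid.any():
            return None
        seed = np.unravel_index(np.argmin(np.where(valid, depth, np.inf)), depth.shape)
        d0 = depth[seed]
        clipped = ((depth > 0) & (depth < d0 + depth_band)).astype(np.uint8)  # depth clipping
        # Region growing: keep the connected component of the clipped mask containing the seed.
        _, labels = cv2.connectedComponents(clipped)
        grown = (labels == labels[seed]).astype(np.uint8)
        # The hand region is the intersection of the skin candidates and the grown depth region.
        return cv2.bitwise_and(skin, grown)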
By applying the region growing algorithm, the hand region is segmented from the background. Fig. 2 illustrates the process of hand segmentation. Unfortunately, the hand region extracted by the above step sometimes contains not only the hand but also a small part of the wrist. In order to get a clean hand segmentation, we follow the work of Mo et al. [18] and divide the wrist and hand by a boundary curve B_w, as shown in Fig. 3. The definition of B_w can be found in [18].

Fig. 2. Hand region extraction. (a) Original hand image; (b) Depth image of (a); (c) Extracted region after skin-color filtering and noise removal; (d) Extracted region after applying depth clipping and the region growing algorithm; (e) Hand region: intersection of (c) and (d).

Fig. 3. The wrist and hand are divided by a boundary curve B_w.

III. FINGERTIP TRACKING

Fingertip tracking is the key part of a multi-touch system. In this section, we describe a fast and robust fingertip tracking algorithm. Different from most existing approaches, which are usually based on template matching, curvature analysis or color marks, our approach utilizes the geometric information of the hand shape to determine the positions of the fingertips. Due to its simplicity and effectiveness, our method is expected to provide faster and more stable fingertip tracking.

A. Palm Center Localization

In order to find the positions of the fingertips, the center and the size of the palm should be estimated first. Most current works apply repeated morphological operations until the object is small enough to indicate the location of the palm center. This is usually time-consuming, and it is difficult to decide when to stop because there is no explicit stopping rule. In general, most current works stop according to the size of the hand region. The size, however, changes with the depth and shape of the hand. In addition, the ambiguous definition of a "small enough" size also leads to ambiguity in the "best" palm center. In this section, we propose a projection-based method for palm center extraction which is simpler and more effective.

We notice that the palm is, more or less, a rectangle-like region. Based on this observation, a projection-based algorithm is proposed to extract the palm region. The basic idea is to project the hand region in all directions. If the projection line goes through only one block of the hand region, this block is a candidate of the palm region and should be preserved. After projections along all directions, the intersection of all candidates forms the final palm region. The algorithm is formulated in Algorithm 1, and a simplified code sketch is given after the algorithm box.

Algorithm 1 Palm Region Extraction
Input: binary image of the hand I_H; projection angle interval Δθ = 45°; a predefined threshold λ
Output: binary image of the palm I_P
1: Initialize I_P with 0
2: Construct the set of projection angles Θ = {θ_i = iΔθ, i = 0, ±1, ..., ±(N−1), N}, N = ⌊180°/Δθ⌋
3: Define a point set L and set it to empty
4: Project the hand image in all directions:
   angle_count = 0; w = width of I_H; h = height of I_H
   FOR each θ_k ∈ Θ
     angle_count = angle_count + 1
     FOR each p_H(i, j) ∈ I_H, 0 ≤ i < w, 0 ≤ j < h
       IF p_H(i, j) ≠ 0
         point_count = 0
         WHILE p_H(i + cos θ_k, j + sin θ_k) ≠ 0
           put p_H(i, j) into line L
           i = i + cos θ_k; j = j + sin θ_k
           point_count = point_count + 1
         END WHILE
         IF point_count ≥ λ
           FOR each point p_L(i_l, j_l) ∈ L, l = 1, 2, ..., point_count
             IF p_P(i_l, j_l) = angle_count − 1, p_P(i_l, j_l) ∈ I_P
               set p_P(i_l, j_l) = angle_count
         END IF
         empty L
       END IF
5: Output the palm image I_P:
   FOR each p_P(i, j) ∈ I_P, 0 ≤ i < w, 0 ≤ j < h
     IF p_P(i, j) ≠ angle_count
       set p_P(i, j) = 0
   return I_P
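The sketch below is a simplified re-implementation of the idea behind Algorithm 1, not the authors' exact procedure: for each projection direction the hand mask is rotated so that the direction becomes horizontal, only sufficiently long runs of foreground pixels are preserved, and the preserved pixels are intersected over all directions. The run-length threshold lam plays the role of λ; its value here is an arbitrary placeholder.

    import cv2
    import numpy as np

    def extract_palm(hand_mask, delta_theta=45, lam=60):
        """Projection-based palm extraction: intersect, over all projection directions,
        the pixels lying on long runs of foreground pixels (cf. Algorithm 1)."""
        h, w = hand_mask.shape
        center = (w / 2.0, h / 2.0)
        palm = (hand_mask > 0).astype(np.uint8)
        # Opposite angles give the same projection line, so 0..180 degrees is sufficient.
        for theta in range(0, 180, delta_theta):
            rot = cv2.getRotationMatrix2D(center, theta, 1.0)
            rotated = cv2.warpAffine(hand_mask, rot, (w, h), flags=cv2.INTER_NEAREST)
            keep = np.zeros_like(rotated)
            for y in range(h):
                row = rotated[y] > 0
                # Locate the start/end of every horizontal run of foreground pixels.
                diff = np.diff(np.concatenate(([0], row.astype(np.int8), [0])))
                starts, ends = np.where(diff == 1)[0], np.where(diff == -1)[0]
                for s, e in zip(starts, ends):
                    if e - s >= lam:            # preserve only sufficiently long runs
                        keep[y, s:e] = 1
            # Rotate the preserved pixels back and intersect with the running estimate.
            inv = cv2.getRotationMatrix2D(center, -theta, 1.0)
            keep = cv2.warpAffine(keep, inv, (w, h), flags=cv2.INTER_NEAREST)
            palm &= keep
        return palm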
Normally it is unnecessary to project the image in all directions, because the extracted region does not change much once the projection angle interval is smaller than a certain value. Here we do the projection every 45° from 0° to ±180°, chosen empirically. Compared with shape-model-based methods, our algorithm is faster and simpler. Fig. 4 shows an example of palm region and palm center extraction. The palm center C_0 is defined as the point in the palm region with maximum distance to the closest palm boundary [18]:

C_0 = arg max_{P ∈ R_palm} { min_{P_B ∈ B} d_2(P, P_B) }   (5)

where P denotes a point inside the palm region R_palm, P_B denotes a point on the boundary B of the palm region, and d_2 is the 2D Euclidean distance between two points. The size of the palm R is defined as the distance between C_0 and the closest boundary point:

R = min_{P ∈ B} d_2(C_0, P)   (6)

Fig. 4. Palm region and palm center extraction. (a) Original hand image; (b) Extracted hand region; (c) Preserved hand region when the projection angle θ = 0°; (d) Preserved hand region when the projection angle θ = 90°; (e) Extracted palm region: intersection of all preserved hand regions as the projection angle varies from 0° to ±180° at an interval of 45°. The red point is the palm center.
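Equations (5) and (6) identify the palm point farthest from the region boundary together with that distance, which can be obtained directly from a Euclidean distance transform of the palm mask. The following minimal sketch uses OpenCV's distance transform as a shorthand; it is not necessarily how the authors implemented it.

    import cv2
    import numpy as np

    def palm_center_and_size(palm_mask):
        """Return C0 (eq. 5), the palm point farthest from the closest boundary,
        and R (eq. 6), that maximum distance, via a Euclidean distance transform."""
        dist = cv2.distanceTransform((palm_mask > 0).astype(np.uint8), cv2.DIST_L2, 5)
        cy, cx = np.unravel_index(np.argmax(dist), dist.shape)
        return (int(cx), int(cy)), float(dist[cy, cx])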
B. Fingertip Localization

The fingertip is considered to be the point on the contour of each finger with maximum distance to the palm center. Here we utilize another two-step method to locate the fingertips: the contour of each finger is extracted first, and then the point on the contour with maximum distance to the palm center is picked as the fingertip.

Finger contours can be regarded as subsets of the hand contour. Here we extract the hand contour following the algorithm proposed by Suzuki and Abe [23]. The contour enclosing the maximum area is considered to be the hand border. Then the distances between the contour points and the palm center C_0 are calculated. If a distance is larger than a predefined threshold, the contour point is put into a candidate set of fingertip points, as formulated in the following equation:

F = {P | d_2(P, C_0) > αR, P ∈ B}   (7)

where P denotes a contour point, d_2(P, C_0) denotes the distance between the palm center C_0 and P, and α is a scale factor, set to 1.2 empirically. F is the candidate set of fingertip points.

Once the candidate set is generated, a simple but effective approach is used to identify the fingertips within it. First, we need to determine the contour points belonging to each finger in the candidate set. This can be fulfilled by tracing all the contour points and comparing them with the points in the candidate set. In order to decrease the computing time, we present a more effective solution here. The idea is to assign an index to each point in the candidate set and then sort the set by this index. A function φ is defined for calculating the index:

θ_{P_f} = φ(P_f, C_0),  P_f ∈ F   (8)

where P_f is a point in the candidate set F and θ_{P_f} is the index of P_f. The function φ has a clear physical meaning: it is the angle of inclination of the line P_f C_0 with respect to the negative x-axis. The candidate set is sorted by θ_{P_f} in ascending order (clockwise). Then the distances between successive points are calculated in order to determine the start and end points of the subsets of the candidate set (the contour points belonging to each finger):

D_{P_i} = d_2(P_i, P_{i+1}),  P_i, P_{i+1} ∈ F   (9)

where D_{P_i} denotes the distance between successive points P_i and P_{i+1}. If D_{P_i} is greater than a predefined threshold δ (set to 2.5 in our system), P_i is considered to be the start or end point of a subset. Therefore, all points in the candidate set can be divided into several subsets which are disconnected on the hand contour, each corresponding to the contour of one finger. Finally, we compute the distance from each point in a subset to the palm center; the point with maximum distance is identified as a fingertip. The process of fingertip localization is illustrated in Fig. 5.

Fig. 5. Fingertip detection. (a) Extracted hand region; (b) Hand contour; (c) Candidate set of fingertip points on the hand contour; (d) Calculation of the index θ_{P_f} for each point in the candidate set. The set is sorted by the index. The indexes are illustrated using a color coding scheme from pure green (the minimum) to pure red (the maximum). The points are sorted in ascending order (clockwise); (e) Extracted fingertips (green points) and the start and end points of each fingertip region (red points).
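The fingertip localization step can be sketched directly from equations (7)-(9). The code below takes the scale factor α = 1.2 and the gap threshold δ = 2.5 from the text; everything else (OpenCV contour extraction, the use of arctan2 as the angle index) is our own simplified rendering rather than the authors' implementation.

    import cv2
    import numpy as np

    def locate_fingertips(hand_mask, c0, r, alpha=1.2, delta=2.5):
        """Fingertip localization following eqs. (7)-(9): filter contour points by distance
        to the palm center, sort them by angle around C0, split into per-finger groups,
        and take the farthest point of each group as a fingertip."""
        contours, _ = cv2.findContours((hand_mask > 0).astype(np.uint8),
                                       cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
        if not contours:
            return []
        # The contour enclosing the maximum area is taken as the hand border.
        border = max(contours, key=cv2.contourArea).reshape(-1, 2).astype(np.float64)
        cx, cy = c0
        dists = np.hypot(border[:, 0] - cx, border[:, 1] - cy)
        cand = border[dists > alpha * r]                    # eq. (7): candidate set F
        if len(cand) == 0:
            return []
        # eq. (8): index each candidate by its angle around C0, then sort
        # (the choice of reference axis does not affect the grouping).
        angles = np.arctan2(cand[:, 1] - cy, cand[:, 0] - cx)
        cand = cand[np.argsort(angles)]
        # eq. (9): a large jump between successive points starts a new finger group.
        gaps = np.hypot(np.diff(cand[:, 0]), np.diff(cand[:, 1]))
        groups = np.split(cand, np.where(gaps > delta)[0] + 1)
        tips = []
        for g in groups:
            d = np.hypot(g[:, 0] - cx, g[:, 1] - cy)
            tips.append(tuple(g[np.argmax(d)]))             # farthest point = fingertip
        return tips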
IV. THE MULTI-TOUCH SYSTEM

A 3D virtual multi-touch interaction system is now under development in our lab. Normal multi-touch systems usually need a specially made touchpad to capture the movements of fingertips. Our vision-based multi-touch system, however, does not require any pressure sensing devices, and the user does not need to touch a surface physically. In this system, a virtual thin-film-like touch surface is laid at a certain distance away from the screen. The movements of the user's fingertips are tracked by the stereovision system in real time. The user can interact with the computer by pulling, pushing, spinning or twisting the virtual elastic surface in 3D space. Schlegel et al. built a similar interaction system named AirTouch [20]; however, a marked glove is required to track the fingertips, and only a single fingertip touch on the 2D virtual screen can be recognized. Our system can identify multi-finger touch without any marks, and allows the user to interact with the computer in 3D space as well. The system is expected to greatly improve the richness of user experience.

Our 3D virtual multi-touch system is developed on the Windows 7 platform. The stereovision system captures depth images at a frame rate of 20 Hz. We apply the TUIO (A Protocol for Table-Top Tangible User Interfaces) protocol to package the multi-touch inputs with timestamps and send them to a TUIO client [24]. TUIO is an open framework which defines a common protocol and APIs for multi-touch surfaces; it allows the transmission of an abstract description of interactive surfaces. In our implementation, the fingertip touch events are sent to a TUIO client through the TUIO server after being handled by the fingertip tracking system. The architecture of the 3D virtual multi-touch system is presented in Fig. 6.

Fig. 6. The architecture of the 3D vision-based virtual multi-touch system.

V. EXPERIMENTS AND DISCUSSION

The stereovision-based fingertip tracking system is developed on a Xeon 3.07 GHz workstation. The application executes at a frame rate of 20 Hz on average, which is fast enough for real-time interaction. The tracking system consists of three modules. The core module is fingertip tracking, which encapsulates the functions of the tracking algorithm and system input/output. The second module is the TUIO server, which is responsible for adding timestamps to the multi-touch input sequences and packaging them according to the TUIO protocol. The third module is the multi-touch client; we use the open source program TUIO Smoke [24] as the TUIO client. The distance from the virtual touch surface to the screen is set to 0.5 m by default during system initialization.

Two experiments are designed to examine the effectiveness and accuracy of the fingertip tracking algorithm. In the first experiment, the user is asked to touch the virtual touch screen with one, two, three and four fingers, respectively. Four video sequences, each lasting 3 seconds, are recorded for testing the fingertip localization algorithm. The extracted positions of the fingertips are sent to the TUIO client through the TUIO server. The tracking results are shown in Fig. 7. In each row, the first frame of the test video sequence, the tracked trajectories of the fingertips on the virtual touch screen, and the responses of the multi-touch client are illustrated from left to right.

Fig. 7. Illustration of the tracking results when touching the virtual screen. From (a) to (d): tracking results for touches with one to four fingers. In each row, from left to right: the first frame of each test video sequence, the tracked trajectories of the fingertips on the virtual touch screen, and the responses of the multi-touch client (different colors represent different fingertip trajectories).

The second experiment aims to test the accuracy of the tracking algorithm. For each test video sequence, we manually identify the positions of the fingertips frame by frame as the ground truth, and then compare them with the positions detected by the tracking system. Altogether 360 frames are processed and the recognition rates are calculated for all sequences. Even in the presence of complicated backgrounds, lighting changes and image noise, the total correct detection rate still reaches 91.1%, as shown in Table I.

TABLE I
THE STATISTICAL RESULTS OF OUR FINGERTIP TRACKING ALGORITHM

Video sequence for test    Number of frames    Recognition rate
Sequence 1                 90                  96.7% (87/90)
Sequence 2                 90                  92.2% (83/90)
Sequence 3                 90                  88.9% (80/90)
Sequence 4                 90                  86.7% (78/90)
Total                      360                 91.1% (328/360)
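The per-sequence recognition rates in Table I follow from a simple per-frame criterion. The paper does not spell out the matching rule, so the sketch below adopts one plausible assumption: a frame counts as correct only if the number of detected fingertips equals the ground truth and every ground-truth fingertip has a detection within a small pixel tolerance (the tolerance value is ours, not the paper's).

    import numpy as np

    def recognition_rate(gt_frames, det_frames, tol=10.0):
        """Fraction of frames whose detections match the ground truth.
        gt_frames, det_frames: lists of (N_i, 2) arrays of fingertip positions per frame."""
        correct = 0
        for gt, det in zip(gt_frames, det_frames):
            gt, det = np.asarray(gt, float), np.asarray(det, float)
            if len(gt) == 0 and len(det) == 0:
                correct += 1
                continue
            if len(det) != len(gt):
                continue
            # Pairwise distances between ground-truth and detected fingertips.
            d = np.linalg.norm(gt[:, None, :] - det[None, :, :], axis=2)
            # Every ground-truth fingertip needs some detection within the tolerance.
            if np.all(d.min(axis=1) <= tol):
                correct += 1
        return correct / max(len(gt_frames), 1)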
VI. CONCLUSIONS AND FUTURE WORKS

In this paper, we have proposed a fast and robust method to track fingertips. Different from existing approaches, this vision-based method needs neither pressure sensing devices nor extra marks for fingertip localization. With the help of a stereovision system, the 3D positions of the fingertips are recovered by an efficient algorithm based on skin color detection and geometric model analysis. The accuracy and effectiveness of the algorithm have been verified by two experiments performed on four video sequences with one to four moving fingers under complex backgrounds, changing lighting and image noise. Although the geometry model used in our algorithm sometimes suffers from inaccuracy in the structural representation of the hand and fingers, it works well in most cases. In fact, the tracking algorithm is efficient enough for real-time HCI tasks.

This algorithm will be integrated into the vision-based 3D virtual multi-touch interaction system now under development in our lab. The system can track the 3D positions of fingertips and recognize finger actions (pull, push, spin, twist, etc.) over a virtual thin-film-like touch surface. It is believed that this technology can greatly improve user experience and has broad prospects in various HCI applications. Future work includes improvement of the geometry model. In addition, a regular web camera is preferable to a stereovision system because of its lower cost, so it would be interesting to improve the algorithm so that it can work with a single camera in the future.

REFERENCES

[1] R. Chang, F. Wang, and P. You, "A survey on the development of multi-touch technology," in Wearable Computing Systems (APWCS), 2010 Asia-Pacific Conference on. IEEE, 2010, pp. 363-366.
[2] N. Mehta, "A flexible machine interface," M.A.Sc. Thesis, Department of Electrical Engineering, University of Toronto, 1982.
[3] J. Han, "Multi-touch interaction wall," in ACM SIGGRAPH 2006 Emerging Technologies. ACM, 2006, p. 25.
[4] S. Hodges, S. Izadi, A. Butler, A. Rrustemi, and B. Buxton, "ThinSight: versatile multi-touch sensing for thin form-factor displays," in Proceedings of the 20th Annual ACM Symposium on User Interface Software and Technology. ACM, 2007, pp. 259-268.
[5] D. Wigdor, C. Forlines, P. Baudisch, J. Barnwell, and C. Shen, "LucidTouch: a see-through mobile device," in Proceedings of the 20th Annual ACM Symposium on User Interface Software and Technology. ACM, 2007, pp. 269-278.
[6] A. Butler, S. Izadi, and S. Hodges, "SideSight: multi-touch interaction around small devices," in Proceedings of the 21st Annual ACM Symposium on User Interface Software and Technology. ACM, 2008, pp. 201-204.
[7] E. Shen, S. Tsai, H. Chu, Y. Hsu, and C. Chen, "Double-side multi-touch input for mobile devices," in Proceedings of the 27th International Conference Extended Abstracts on Human Factors in Computing Systems. ACM, 2009, pp. 4339-4344.
[8] J. Han, "Low-cost multi-touch sensing through frustrated total internal reflection," in Proceedings of the 18th Annual ACM Symposium on User Interface Software and Technology. ACM, 2005, pp. 115-118.
[9] J. Carey, T. Kimberley, S. Lewis, E. Auerbach, L. Dorsey, P. Rundquist, and K. Ugurbil, "Analysis of fMRI and finger tracking training in subjects with chronic stroke," Brain, vol. 125, no. 4, pp. 773-788, 2002.
[10] Á. Cassinelli, S. Perrin, and M. Ishikawa, "Smart laser-scanner for 3D human-machine interface," in CHI '05 Extended Abstracts on Human Factors in Computing Systems. ACM, 2005, pp. 1138-1139.
[11] N. Motamedi, "HD touch: multi-touch and object sensing on a high definition LCD TV," in CHI '08 Extended Abstracts on Human Factors in Computing Systems. ACM, 2008, pp. 3069-3074.
[12] L. Chi, L. Prada Gomez, R. Ryskamp, and S. Mavinkurve, "Wearable heads-up display with integrated finger-tracking input sensor," US Patent 8,203,502, Jun. 19, 2012.
[13] H. Ying, J. Song, X. Ren, and W. Wang, "Fingertip detection and tracking using 2D and 3D information," in Intelligent Control and Automation, 2008. WCICA 2008. 7th World Congress on. IEEE, 2008, pp. 1149-1152.
[14] S. Kim, Y. Park, K. Lim, H. Lee, S. Kim, and S. Lee, "Fingertips detection and tracking based on active shape models and an ellipse," in TENCON 2009 - 2009 IEEE Region 10 Conference. IEEE, 2009, pp. 1-6.
[15] T. Heap and D. Hogg, "Towards 3D hand tracking using a deformable model," in Automatic Face and Gesture Recognition, 1996. Proceedings of the Second International Conference on. IEEE, 1996, pp. 140-145.
[16] Y. Sato, Y. Kobayashi, and H. Koike, "Fast tracking of hands and fingertips in infrared images for augmented desk interface," in Automatic Face and Gesture Recognition, 2000. Proceedings. Fourth IEEE International Conference on. IEEE, 2000, pp. 462-467.
[17] S. Dominguez, T. Keaton, and A. Sayed, "Robust finger tracking for wearable computer interfacing," in Proceedings of the 2001 Workshop on Perceptive User Interfaces. ACM, 2001, pp. 1-5.
[18] Z. Mo and U. Neumann, "Real-time hand pose recognition using low-resolution depth images," in Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, vol. 2. IEEE, 2006, pp. 1499-1505.
[19] H. Koike, Y. Sato, and Y. Kobayashi, "Integrating paper and digital information on EnhancedDesk: a method for realtime finger tracking on an augmented desk system," ACM Transactions on Computer-Human Interaction, vol. 8, no. 4, pp. 307-322, 2001.
[20] D. Schlegel, A. Chen, C. Xiong, J. Delmerico, and J. Corso, "AirTouch: interacting with computer systems at a distance," in Applications of Computer Vision (WACV), 2011 IEEE Workshop on. IEEE, 2011, pp. 1-8.
[21] R. Hsu, M. Abdel-Mottaleb, and A. Jain, "Face detection in color images," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 24, no. 5, pp. 696-706, 2002.
[22] V. Vezhnevets, V. Sazonov, and A. Andreeva, "A survey on pixel-based skin color detection techniques," in Proc. Graphicon, vol. 3. Moscow, Russia, 2003.
[23] S. Suzuki and K. Abe, "Topological structural analysis of digitized binary images by border following," Computer Vision, Graphics, and Image Processing, vol. 30, no. 1, pp. 32-46, 1985.
[24] M. Kaltenbrunner, T. Bovermann, R. Bencina, and E. Costanza, "TUIO: a protocol for table-top tangible user interfaces," in Proc. of the 6th International Workshop on Gesture in Human-Computer Interaction and Simulation, 2005.