I.J. Intelligent Systems and Applications, 2012, 9, 1-14
Published Online August 2012 in MECS (http://www.mecs-press.org/)
DOI: 10.5815/ijisa.2012.09.01
Copyright © 2012 MECS

Difference of the Absolute Differences – A New Method for Motion Detection

Khalid Youssef, Peng-Yung Woo
Department of Electrical Engineering, Northern Illinois University, DeKalb, IL 60115, USA
E-mail: pwoo@niu.edu

Abstract— This article presents a new method that reduces cost and processing time for spatial object motion detection, focusing on the bare-hand motion that mimics computer mouse functions: the user moves the mouse pointer in real-time by the motion of his/her hand without any gloves worn, any object carried, or any key hit. The topic is studied from the viewpoint of computer vision and image processing. The principles of the difference of the absolute differences (DAD) are investigated. A new method based on the DAD principles, conceptually different from all existing approaches to spatial object motion detection, is developed and applied successfully to bare-hand motion. The real-time implementation of bare-hand motion detection demonstrates the accuracy and efficiency of the DAD method.

Index Terms— Bare-Hand Motion, Computer Vision, Difference of the Absolute Differences Method, Human-Machine Interaction, Image Processing, Spatial Object Motion Detection

I. Introduction

Motion detection and further tracking of general spatial articulated objects, including the human body and limbs, is a research area that has been attracting more and more attention. Recent literature reveals many studies in this area, including the wearable target method [1]-[2], the sensor method [3], the electrostatic field method [4], the multiple-camera method [5]-[7] and so on. Hand motion detection has become a key topic in the research area of spatial object motion due to its potential in human-machine interaction.
In this study, a hand gesture is defined as a dynamic movement, i.e., a sequence of hand postures over a short time span, while a hand posture is a static hand pose without any involvement of movement. The hand gesture recognition process is realized by "building up out of a group of hand postures in various ways of composition" [8]. According to this definition, hand motion detection can be considered a hand gesture recognition process.

Hardenberg and Bedard [9] trace the research on vision-based hand gesture recognition and tracking back to 1991. They claim that up to 2001, the time of their publication, no hand motion detection and tracking method had become dominant. Furthermore, it is claimed that most of the proposed systems have problems with changes in lighting conditions and background clutter, and that none of the presented work provides a robust motion detection and tracking method for rapid hand motion. Triesch and Malsburg [10] use wavelets. Though robust results in classifying hand postures against complex backgrounds are claimed, the calculations proposed in their approach cannot be performed in real-time. Ware and Balakrishnan [11] use color segmentation, but the user has to wear special colored gloves. Sato et al. [12] use infrared segmentation with expensive hardware equipment. Segen [13] uses contours, which requires restrictive background conditions. Laptev and Lindeberg [14] use blob models, which require an explicit setup stage before tracking starts. Crowley et al. [15], and O'Hagan and Zelinsky [16] use correlation, but a maximum speed of the hand motion is imposed. Recently, Garg et al. [8] provide a review of the state-of-the-art hand motion detection and tracking studies. None of the existing approaches is believed to solve the problems raised in [9]. Stenger et al.
[17] propose an approach formulated within a Bayesian framework, which is extremely computationally expensive. Bretzner et al. [18], Sanchez-Nietsen et al. [19], and Stenger [20] propose discerning the skin color of the user. The drawback of this approach is mistaken discernment of the skin due to other skin-color-like objects as well as lighting condition changes. Lienhart and Maydt [21], Barczak and Dadgostar [22], Chen et al. [23], and Wang and Wang [24] work on local invariant features. Though methods based on these features seem promising, they are computationally costly, which leads to expensive equipment.

A bare-hand human-machine interaction, essentially composed of bare-hand motion detection and tracking, is important in many areas. The key advantages of bare-hand interaction in areas such as virtual environments, smart surveillance and medical systems are, to mention a few, the elimination of physical contact, the reduction of space occupation, the increase in device durability, the reduction of equipment cost, the expansion of user applications and so on. Several recent publications on bare-hand interaction focus on its use in video games [7], [25]-[29]. Video game giants such as Nintendo, Sony's Playstation, and Microsoft's Xbox are using this technology as the basis of the new trend in video games. With the development of computer technology and applications, the current facilities for human-computer interaction such as the keyboard, mouse, and pen do not meet the increasing demands [8]. Evidently, human hands used directly as an input device, without any gloves worn, any object carried, or any key hit, provide natural human-computer interaction, and in many cases are much more practical than the conventional input devices.
The bare-hand interaction here is essentially composed of bare-hand motion detection as well as virtual clicks that result from hand gestures. In bare-hand motion the user moves the mouse pointer, and in virtual clicks the user performs functions similar to those performed by the right and/or left clicks of a conventional computer mouse, both in real-time by his/her different hand gestures. As is well known, an appealing bare-hand motion detection method must satisfy two main requirements, i.e., the ability to run in real-time and affordability in the sense of less expensive equipment. If these requirements were not satisfied, the method would not be practical or accessible to average users. Some proposed methods satisfy either one of the two requirements; the challenge, however, is to satisfy both. All the existing bare-hand motion detection methods are systematically based on the same approach, that is, to track the hand motion by using consecutive hand postures, with each method trying to achieve hand posture recognition in a different way. A new approach to spatial object motion detection is proposed in this article, and a new method based on this approach is developed. Instead of achieving object posture recognition, the proposed approach detects the motion of the object by determining the motion direction directly, without relying on positions. The developed method encompasses an innovative technique that permits vast enhancements in processing speed, detection accuracy, and performance robustness by transforming 2D image processing into 1D signal processing. It is also extremely cost effective, since it can operate with only one commercial video camera without any additional sensor or other equipment. To overcome the deficiencies in the existing approaches to bare-hand motion detection, the developed method is applied to a system design with satisfactory results.
The organization of the rest of the article is as follows. The new concepts and theory of the developed method are given in Section II. The graphic tool that works with the new concepts is presented in Section III. In Section IV, the system design of a bare-hand motion detection application based on the developed method is described. Experimental results are presented in Section V, followed by conclusions and observations in Section VI.

II. The Method of the Difference of the Absolute Differences

A. The difference of the absolute differences vs. the sum of the absolute differences

The prevailing way of thinking in object motion detection is to determine the positions of an object at different time instances. The position of the object at a certain time instant can be determined from one frame alone, though most of the time two or more consecutive frames are used. Once the positions of the object are determined at two time instances, the motion direction between the two time instances can be obtained. However, in many practical object motion detection applications, the exact position of the object is not a matter of concern. For example, the actual physical position of a regular computer mouse on the table is irrelevant to the operation of the computer; it is the change of the mouse positions that directs the pointer on the computer screen. Thus, the key idea in the developed method is to obtain the motion direction directly without determining the positions of the object. The loss of the exact object position in this method is well compensated for by the enhancements in processing speed, detection accuracy, and performance robustness. The developed method is named the Difference of the Absolute Differences (DAD), and requires three consecutive frames to determine the motion direction of an object along a certain direction. In fact, this is similar to working with the second-order derivative of a function.
In image processing, the method of the Sum of the Absolute Differences (SAD) is widely used for motion detection and video compression [30]-[32]. The SAD method basically performs a pixel-by-pixel comparison between two frames by calculating the absolute difference between pixels. This process is repeated for multiple consecutive frames, after which the absolute differences are summed up to create the final SAD 2D matrix. In a typical motion detection application, a number of consecutive frames are compared to a background frame with each frame divided into a number of smaller blocks, and the exceeding of a set threshold at a certain location of the SAD matrix indicates some motion in the corresponding block of the background frame. Though the SAD method is effective for motion detection, it performs poorly when used in a real-time application such as hand motion detection to determine the position of the hand, because of many factors, namely, the dynamic background, high noise, and other moving objects. Some important information such as the motion direction is lost in the process of creating the SAD matrix, regardless of the number of frames used. Furthermore, for robust position detection, a very high frame rate is needed when using the SAD method, which requires expensive cameras and leads to exhaustive computation, which, in turn, prevents the application from running in real-time on personal computers. The SAD and DAD images of a hand are shown in Fig. 1.

Fig. 1 (a) Ordinary image (b) SAD image (c) DAD image

B. The concept of the difference of the absolute differences

Consider a gray-scaled digital image, where each pixel value varies between 0 (black) and 255 (white). l(t) is the luminance function of a pixel with respect to time.
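Before developing the DAD concept, the SAD comparison described above can be sketched in a few lines. This is an illustrative toy example, not the authors' code: frames are plain lists of integers, the block boundaries and threshold are arbitrary, and, as noted above, the result flags motion per block but yields no direction.

```python
# Toy sketch of block-wise SAD motion detection between a background frame
# and a current frame (all names and values here are illustrative).

def block_sad(frame_a, frame_b, r0, r1, c0, c1):
    """Sum of absolute pixel differences over the block [r0:r1, c0:c1]."""
    return sum(abs(frame_b[r][c] - frame_a[r][c])
               for r in range(r0, r1) for c in range(c0, c1))

background = [[10, 10, 10, 10],
              [10, 10, 10, 10]]
current    = [[10, 10, 90, 90],   # object pixels appear in the right block
              [10, 10, 90, 90]]

THRESHOLD = 100                   # arbitrary, tuned per application
left_moved  = block_sad(background, current, 0, 2, 0, 2) > THRESHOLD
right_moved = block_sad(background, current, 0, 2, 2, 4) > THRESHOLD
```

Only the right-hand block exceeds the threshold here, so motion is localized to that block, but nothing in the SAD values says which way the object moved.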
Assuming the time difference between two consecutive frames is unity, the luminance difference is then effectively the first-order derivative of l(t) between the two frames. Fig. 2 shows the luminance function l(t) of a certain pixel over an interval of 200 frames as well as l'(t), g(t) and f(t), which are defined by (1)-(3), respectively.

Fig. 2 Luminance function l(t) of a pixel and l'(t) = dl/dt, g(t) = |dl/dt|, and f(t) = dg/dt

l'(t) = dl/dt = lim(Δt→0) [l(t+Δt) − l(t)] / Δt    (1)

g(t) = |dl/dt| = lim(Δt→0) |l(t+Δt) − l(t)| / Δt    (2)

f(t) = dg/dt = lim(Δt→0) [g(t+Δt) − g(t)] / Δt    (3)

The function l(t) indicates the luminance of the pixel, where a steady value corresponds to an absence of motion, and an abruptly changing value corresponds to motion activity. Taking the first-order derivative of l(t) locates its peaks and valleys (local maxima and minima), which correspond to the abrupt changes in luminance, i.e., the motion activities. The first-order derivative, however, cannot determine the motion direction, since it is unable to differentiate two possible scenarios that might cause abrupt changes in the luminance of a pixel: an object moving toward and covering the pixel, so that the pixel value now corresponds to the object luminosity, and an object moving away and no longer covering the pixel, so that the pixel value now corresponds to the background luminosity. It can be determined from l'(t) that at the location where a motion activity takes place, there is either an abrupt change from a high luminosity to a low luminosity, or vice versa. Since the relation between the object luminosity and the background luminosity is not constrained, i.e., the background can have either a higher or a lower luminosity than the object, l'(t) cannot determine whether the motion activity corresponds to the first or the second scenario. To overcome this difficulty, f(t) is used.
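With a unit time step between frames, the derivatives in (1)-(3) reduce to finite differences, and the three signals can be sketched for a single pixel as follows (a minimal illustration with made-up luminance values, not the article's data):

```python
# Finite-difference sketch of the per-pixel signals defined in (1)-(3).

def derivative(signal):
    """First-order finite difference of a 1D sequence (unit time step)."""
    return [signal[t + 1] - signal[t] for t in range(len(signal) - 1)]

# Luminance l(t) of one pixel: background (20) briefly covered by an
# object (200), then uncovered again.
l = [20, 20, 200, 200, 20, 20]

l_prime = derivative(l)            # dl/dt: nonzero where motion occurs
g = [abs(v) for v in l_prime]      # |dl/dt|: sign-free activity measure
f = derivative(g)                  # dg/dt: the DAD signal of this pixel
```

Note that l_prime marks both the covering and the uncovering events, but with signs that depend on whether the object is brighter or darker than the background, which is exactly the ambiguity discussed above; f is the signal the method works with.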
Since l'(t) determines the locations of the moving object, taking the derivative of l'(t), which is the second-order derivative of l(t), gives the change in the locations of the moving object. Furthermore, since the locations of the moving object are not related to the sign of l'(t), the derivative of the absolute value of l'(t) is used instead, which is preferred for practical reasons such as keeping the values of l'(t) in the original range of 0 to 255. The luminance functions of 120 pixels in a full row over an interval of 200 frames are shown in Fig. 3. The functions f(t) of the same pixels over the same interval are shown in Fig. 4. In Fig. 3 and Fig. 4, the 2D graphs on the right are the top view of the 3D graphs. A plot of the values of the function f(t) of all pixels in one frame (i.e., at a specific time, or when t is a constant) is shown in Fig. 5, which shows the relation between motion activities in adjacent pixels of one frame. This enables the determination of the motion direction at that instant of time.

Fig. 3 Luminance functions l(t) of pixels in a full row

Fig. 4 Function f(t) of pixels in a full row

Fig. 5 Values of function f(t) of all pixels in one frame

C. The generalization of the concept of the difference of the absolute differences

The concepts above can be generalized to regions composed of more than one row. For example, to approximate the motion of a whole frame in the horizontal direction, the f(t) of each pixel is first calculated, and then summations along the columns in the vertical direction are performed. This is clarified further in Fig.
6, where frame d is obtained by calculating the absolute difference l'(t) between frame b and frame a, while frame e is the absolute difference between frame c and frame b. Finally, frame f is obtained by calculating the difference between frame e and frame d. The pixel values in the frames are displayed in tables under them. Note that in the absolute difference frames d and e, there are two types of pixels. The pixels with a zero value indicate that there has been no change in the corresponding pixels between the two frames, while the pixels with a positive value indicate a change. On the other hand, there are three types of pixels in the DAD frame f, namely, zero, positive and negative. A negative pixel in a DAD frame indicates that the object occupying that position in the previous frame no longer occupies it, i.e., it is moving away from that position. The opposite is true for a positive pixel. Thus, it is concluded that the motion direction is from negative pixels to positive pixels in a DAD frame. By adding all the rows of the DAD frame, i.e., performing summations along the columns in the vertical direction, into one vector as shown in Fig. 7, the sequence "–, +, –, +" indicates that the object moves from left to right in the horizontal direction.

Fig. 7 DAD superposition

D. Mathematical approximation for computation efficiency

In order to increase the computation efficiency, a mathematical approximation can be made to simplify the calculation of the difference of the absolute differences for the consecutive frames as described in Fig. 6 and Fig. 7. As is to be shown, when some reasonable conditions are applied, the rows of the original frames a, b and c can first be summed into vectors, and then the absolute differences and the final difference are calculated in one dimension, rather than performing the calculations in two dimensions first and then the summation. This shuffling of steps enhances the total processing speed.
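Both routes can be sketched on toy 3×3 frames: the full 2D DAD computation followed by column sums, and the 1D shortcut that sums columns first. This is an illustrative sketch (values d and h are arbitrary), not the article's implementation:

```python
# DAD from three consecutive frames, two ways: per-pixel then summed, and
# column-summed first. d is the background value, h the object value.

d, h = 10, 200
A = [[d, d, d], [h, d, d], [h, d, d]]   # object in column 0
B = [[d, d, d], [d, h, d], [d, h, d]]   # object in column 1
C = [[d, d, d], [d, d, h], [d, d, h]]   # object in column 2

rows, cols = 3, 3

# Full 2D route: per-pixel DAD frame, then summation along the columns.
dad = [[abs(C[i][j] - B[i][j]) - abs(B[i][j] - A[i][j])
        for j in range(cols)] for i in range(rows)]
g = [sum(dad[i][j] for i in range(rows)) for j in range(cols)]

# 1D shortcut: column sums first, then the same differences on vectors.
def col_sums(F):
    return [sum(F[i][j] for i in range(rows)) for j in range(cols)]

a, b, c = col_sums(A), col_sums(B), col_sums(C)
g_star = [abs(c[j] - b[j]) - abs(b[j] - a[j]) for j in range(cols)]

# Negative entries mark where the object left, positive where it arrived,
# so the resulting "-, 0, +" pattern reads as left-to-right motion.
```

Under the uniformity assumptions discussed in the next subsection, the two routes produce the same vector, while the shortcut replaces the 2D per-pixel pass with three short 1D passes.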
Two assumptions are made. First, the color of the moving object is assumed to be almost uniform, i.e., pixels representing the object have about the same values. Second, since the size of the object is small relative to that of the different regions of the background, the color of each region of the background is also assumed to be almost uniform. In bare-hand motion detection, the first assumption is evidently always true, while the second is true in most cases. The entries of the image matrices obtained from consecutive frames are mainly composed of two parts that represent the background and the moving object, respectively. For frame rates that are high enough, the background is almost unchanged between two consecutive frames, and thus the pixels, and therefore the matrix entries, representing the background are almost identical. Similarly, the pixels, and therefore the matrix entries, representing the moving object are also almost unchanged, though they undergo a position shift. It follows that the entries of two consecutive matrices are identical everywhere except at the location of the moving object. Mathematically, for the matrix entries of two consecutive images, we must have either bij = aij or bij = a(i+y)(j+x), where aij and bij are the entries of the image matrices A and B, respectively. With the conditions above, the approximation of the difference of the absolute differences for consecutive frames can be achieved. Without loss of generality, this is illustrated mathematically by using 3×3 image matrices, where entries h are the pixels of the moving object, and entries d the pixels of the background. While the object moves from left to right in the horizontal direction, the three consecutive image matrices are, respectively

Fig. 6 Method of the Difference of the Absolute Differences (DAD)

A = [a11 a12 a13; a21 a22 a23; a31 a32 a33] = [d d d; h d d; h d d],    (4)

B = [b11 b12 b13; b21 b22 b23; b31 b32 b33] = [d d d; d h d; d h d],
C = [c11 c12 c13; c21 c22 c23; c31 c32 c33] = [d d d; d d h; d d h],    (5)

so that the differences between corresponding entries of consecutive matrices take only the values

bij − aij, cij − bij ∈ {0, h − d, d − h},    (6)

where the second steps in the equations are based on the two aforementioned assumptions. The DAD operation done in Fig. 6 and the summation done in Fig. 7 finally give the vector entries in Fig. 7 by using (7):

gij = |cij − bij| − |bij − aij|, i.e., [gij] = [0 0 0; −|h−d| 0 |h−d|; −|h−d| 0 |h−d|],

and summing along the columns gives the vector [−2|h−d|, 0, 2|h−d|].    (7)

On the other hand, let us calculate a function g*j as shown in (8). Let ãj, b̃j and c̃j denote the column sums of A, B and C, so that ã = [d+2h, 3d, 3d], b̃ = [3d, d+2h, 3d] and c̃ = [3d, 3d, d+2h]. Then

g*j = |c̃j − b̃j| − |b̃j − ãj| = [0, 2|h−d|, 2|h−d|] − [2|h−d|, 2|h−d|, 0] = [−2|h−d|, 0, 2|h−d|].    (8)

As seen, g*j equals the column sums of gij when the two reasonable assumptions are imposed. This indicates that (8), in which the absolute differences and the final difference are calculated in one dimension so that the total processing speed is enhanced, can be used in place of (7) to obtain the same results as shown in Fig. 7. In practice, even if the second assumption is not completely true, it does not have a noticeable effect on the performance of the system; noise elimination and thresholding techniques further reduce the adverse effect. On some rare severe occasions, e.g., when the object is in transition between two regions with a large brightness difference, the approximation undergoes an elimination effect, and the corresponding motion step is ignored without a noticeable effect on the overall detection performance. Depending on the nature of the application, there is always the option of not using the approximation at the cost of a slower processing speed. It is worth mentioning that even without the approximate calculation (8) and its speed gain, the DAD method is still much faster than the other existing methods.

III. The DAD Diagram and the DAD Plot

A.
Introduction of the DAD diagram and the DAD plot

Fig. 8 DAD diagram and DAD plot

The DAD diagram, developed as a graphic tool, is used to study the behavior of a moving object. It is a graphical illustration that corresponds to (8). The detailed illustration of the basic DAD diagram is shown in Fig. 8. As seen, the DAD diagram is essentially generated from three consecutive frames. In the DAD diagram, the next frame is superimposed on the previous one and lines are extended from the boundaries of the object to obtain plots 1, 2 and 3. The values in plots 1, 2 and 3 are obtained by adding all the rows of frames 1, 2 and 3, respectively. In this basic DAD diagram, a threshold of value one is applied to plots 1, 2 and 3, where the pixel values of the background and the moving object are assumed to be zero and one, respectively. The DAD plot is the difference between plots 5 and 4, which, in turn, are the absolute difference between plots 2 and 1, and that between plots 3 and 2, respectively. The obtained DAD plot permits a 1D study of the motion of an object along a certain axis. Fig. 9 gives some examples of DAD diagrams. As seen, different numbers of overlaps among the three consecutive frames represent different patterns of the object motion, which present different numbers of peaks in their DAD plots.

Fig. 9 Examples of the DAD diagram

The values of the variables defined in a DAD plot characterize the different motion patterns of the object in concern and become the key to accurate motion detection. These variables are defined in Fig. 10, where there is the maximum number of peaks due to two overlaps among the three consecutive frames. Any other motion pattern that has only one overlap or none gives fewer than four peaks.
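The basic DAD plot construction can be sketched numerically: three thresholded (0/1) vectors with the object at shifting positions, combined as |plot3 − plot2| − |plot2 − plot1|, with peaks counted as maximal runs of same-signed nonzero values. This is an illustrative toy, not the article's code; the object width and positions are arbitrary but chosen so that consecutive frames overlap twice, the case with the maximum of four peaks.

```python
# Toy DAD plot from three thresholded 1D object profiles.

def dad_plot(p1, p2, p3):
    return [abs(v3 - v2) - abs(v2 - v1) for v1, v2, v3 in zip(p1, p2, p3)]

def count_peaks(plot):
    """Number of maximal runs of same-signed nonzero values."""
    peaks, prev_sign = 0, 0
    for v in plot:
        sign = (v > 0) - (v < 0)
        if sign != 0 and sign != prev_sign:
            peaks += 1
        prev_sign = sign
    return peaks

W = 12
def frame(start, width=3):
    """Thresholded profile: `width` ones starting at column `start`."""
    return [1 if start <= j < start + width else 0 for j in range(W)]

# Each consecutive pair of frames overlaps in one column (two overlaps).
plot = dad_plot(frame(2), frame(4), frame(6))
n_peaks = count_peaks(plot)
```

The resulting plot has the polarity sequence "–, +, –, +", the four-peak left-to-right pattern discussed below.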
If a DAD plot has more than four peaks, there is more than one object moving along the axis that corresponds to the DAD plot. For a four-peak DAD plot with a threshold applied, there are ten main variables of interest. The variables p1, p2, p3 and p4 give the polarities of their corresponding peaks; the polarity is positive if the peak is above zero and negative if the peak is below zero. The variables w1, w2, w3 and w4 measure the widths of the corresponding peaks. The variable d1 measures the length between the beginning of the signal and the beginning of the first peak. The variable d2 measures the length between the first shift from nonzero to zero that follows the first change in polarity and the second shift from zero to nonzero that precedes the second change in polarity. The variables Wa, Wb and D are three composite variables obtained from combinations of two or more main variables.

B. The use of the DAD plot variables

The variables obtained from the DAD plots are used to find the unique motion patterns of the object in concern, e.g., 1) Wa and Wb should have similar or close values; 2) d2 should be less than Wa or Wb; 3) Wa and Wb should be in a certain range corresponding to the size range of the object in concern. Any signal that does not have such features is usually just noise.

Fig. 10 Variables defined in a DAD plot

A complete presentation of all the possible motion patterns, not including the special cases of complete overlapping between frames, is given in Fig. 11, where column one and column six give valid motion patterns, i.e., motions with no direction change. As seen, for a left-to-right motion there are four possible peak polarity sequences, namely, "-, +", "-, -, +", "-, +, +" and "-, +, -, +", while for a right-to-left motion there are another four, namely, "+, -", "+, -, -", "+, +, -" and "+, -, +, -".
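A polarity-sequence check against the valid patterns just listed can be sketched as follows. This is an illustrative sketch, not the article's implementation; in particular, the extra width and spacing checks needed to disambiguate two-peak plots are omitted for brevity.

```python
# Classify a DAD plot by its peak polarity sequence against the valid
# left-to-right and right-to-left pattern sets listed in the text.

LEFT_TO_RIGHT = {("-", "+"), ("-", "-", "+"), ("-", "+", "+"),
                 ("-", "+", "-", "+")}
RIGHT_TO_LEFT = {("+", "-"), ("+", "-", "-"), ("+", "+", "-"),
                 ("+", "-", "+", "-")}

def polarity_sequence(plot):
    """Polarity of each maximal run of same-signed nonzero values."""
    seq, prev_sign = [], 0
    for v in plot:
        sign = (v > 0) - (v < 0)
        if sign != 0 and sign != prev_sign:
            seq.append("+" if sign > 0 else "-")
        prev_sign = sign
    return tuple(seq)

def classify(plot):
    seq = polarity_sequence(plot)
    if seq in LEFT_TO_RIGHT:
        return "left-to-right"
    if seq in RIGHT_TO_LEFT:
        return "right-to-left"
    return "invalid"

direction = classify([0, -1, -1, 1, 0, -1, 1, 1, 0])   # "-, +, -, +"
```

Any sequence outside the two sets is rejected as a direction change or noise, which mirrors the role of columns 2-5 in Fig. 11.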
Evidently, the right-to-left motion patterns give the peak polarities of the left-to-right motion patterns in reversed order. On the other hand, columns 2, 3, 4 and 5 in Fig. 11 present all the possible motion patterns with direction changes, which are regarded as invalid. Furthermore, in the case of two-peak DAD plots, some of the invalid motion patterns have the same polarity sequence as a valid motion pattern, so in this case the polarity sequence alone cannot differentiate between the valid and invalid motion patterns. Thus, the widths of the peaks as well as the length between the two peaks are used to determine the validity of those motion patterns that are described by two-peak DAD plots. It is seen from the figure that the valid motion patterns described by two-peak DAD plots have the features 1) w1 = w2 = w and 2) d2 ≥ w. The special cases with complete overlapping between frames have unique polarity sequences that separate them from valid motion patterns; they are very unlikely to occur in practice.

Fig. 11 Motion patterns and their DAD plots, not including special cases with complete overlapping between frames

C. The variations of the DAD diagram

There are variations of the DAD diagram. For example, removing the aforementioned threshold preserves more information, and using a multi-segment DAD diagram exposes more details in each motion direction. Two variations of the DAD diagram that can be used for object motion detection are shown in Fig.
12, where the left figure shows a variation in which no threshold is applied and detection is performed along four axes (0°, 90°, 45°, -45°), and the right figure shows another variation in which a frame is divided into sectors and the object motion in each sector is plotted separately for two axes (0°, 90°). Note that in the case of the basic DAD plots shown in Fig. 11, motion patterns can be determined manually. However, for the complex DAD diagram variations, human inspection and determination of motion patterns becomes impossible, and pattern recognition methods are used in practice. One option is to train artificial neural networks to learn the various motion patterns characterized by the DAD plots. The details of this procedure are beyond the scope of this article. An important aspect of the DAD method is that it allows a tradeoff between the processing speed and the accuracy of the object position estimation, whereas the conventional motion detection and tracking methods based on finding the exact position of an object in every frame are special cases of the DAD method in which the accuracy of the object position estimation attains its maximum. In a single-segment DAD diagram, all pixels in one direction are summed to describe the general motion in the perpendicular direction. The object position is estimated as the region enclosed by the lines perpendicular to the coordinates that locate the beginning and the end of the motion activities in the two directions. This is demonstrated by the example shown in Fig. 13-a, where the shaded region gives the estimated object position.

Fig. 13 Object position estimation in a DAD plot

Fig. 12 Variations of the DAD diagram

In a multi-segment DAD diagram, the image is divided into several sectors and the motion in each sector is estimated.
This gives better accuracy of the object position estimation, as shown in Fig. 13-b. In an N×N frame, if N sectors are used in each direction, the accuracy of the object position estimation is at its maximum, since each sector is narrowed down to one pixel. However, other complications arise in this situation, and such a level of accuracy is unnecessary in most cases. Performing an approximation by using a limited number of sectors greatly simplifies the situation and increases the processing speed. Thus, the DAD diagram provides an approach to trading off system complexity against the accuracy of the object position estimation.

IV. Application

A. Single-segment DAD diagram vs. multi-segment DAD diagram

The developed DAD method is successfully applied to a bare-hand mouse application. The basic DAD diagrams as well as their complex variations, which demonstrate an advanced level of the DAD method, are used individually in this practical application. The single-segment DAD diagrams perform poorly in bare-hand motion detection and require a pretreatment of noise reduction and filtering in 2D image processing for an acceptable performance. Though this limits the speed efficiency, the processing speed of the system based on the single-segment DAD diagrams is still faster than those of the existing hand motion detection and tracking methods. On the other hand, if enough segments are used, no 2D image processing is required and all the processing in the system is performed in one dimension. Having multiple segments in each direction greatly increases robustness against noise, which spares complicated noise reduction and filtering techniques. Hence, a much higher processing speed is achieved. Also, more information about what is happening in the separate parts of the image becomes available.
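A toy numerical sketch of the extra information carried by multiple segments (an illustration constructed for this point, not from the article, with made-up 4×4 shapes): two shapes that occupy the same set of columns are indistinguishable in a thresholded whole-frame sum, yet still differ in per-sector sums.

```python
# Two binary shapes with identical thresholded whole-frame column profiles
# but different per-sector profiles.

SQUARE   = [[0, 1, 1, 0],
            [0, 1, 1, 0],
            [0, 1, 1, 0],
            [0, 1, 1, 0]]
TRIANGLE = [[0, 1, 0, 0],
            [0, 1, 1, 0],
            [0, 1, 1, 0],
            [0, 1, 1, 0]]

def thresholded_profile(rows):
    """Column sums of the given rows, thresholded to 0/1."""
    return [1 if sum(col) > 0 else 0 for col in zip(*rows)]

def whole_frame(img):
    return thresholded_profile(img)

def per_sector(img, sectors=4):
    step = len(img) // sectors
    return [thresholded_profile(img[k * step:(k + 1) * step])
            for k in range(sectors)]

same_whole   = whole_frame(SQUARE) == whole_frame(TRIANGLE)   # indistinguishable
same_sectors = per_sector(SQUARE) == per_sector(TRIANGLE)     # still distinct
```

This is the same effect that Fig. 14 demonstrates with a circle and a triangle over 120 rows and four sectors.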
Since it is unnecessary to maintain the shape information in individual segments, thresholding is applied; the shape information can be induced from the motion activity between the segments. This is illustrated graphically by the example in Fig. 14. Two example objects are given in Fig. 14: a circle in Fig. 14-a1 and a triangle in Fig. 14-a2. Fig. 14-b1 and Fig. 14-b2 show the plots resulting from adding all the rows in Fig. 14-a1 and Fig. 14-a2, respectively, without any threshold applied to the sums. As seen, shape information of the objects is preserved in these plots; one can easily tell which plot is generated from the circle and which from the triangle. Fig. 14-c1 and Fig. 14-c2 show the plots resulting from the same operation, but with a threshold applied to the sums. In this case, the two plots look identical; one cannot tell which plot is generated from the circle and which from the triangle. Fig. 14-d1 and Fig. 14-d2 are obtained by dividing the images in Fig. 14-a1 and Fig. 14-a2 into four equal sectors, respectively. Suppose, without loss of generality, that both Fig. 14-a1 and Fig. 14-a2 have 120 rows. Let rows 1-30 be sector 1, rows 31-60 sector 2, rows 61-90 sector 3, and rows 91-120 sector 4. The rows in each sector are then added to obtain four separate segments, and a threshold is applied to the sums. As seen, even though thresholding is applied, shape information is preserved in these plots; one can easily tell which plot is generated from the circle and which from the triangle. This makes it possible to use segments with discrete values as shown in Fig. 12-b, rather than continuous values as shown in Fig. 12-a. It should be noted that each of the two example objects given in Fig. 14 is from one frame, while the image shown in Fig. 12-b is the DAD plots obtained from three consecutive frames. In other words, Fig. 14 serves only to explain the concept of "preservation of shape information" in multi-segment DAD plots. Fig.
14 Preservation of shape information in multiple segments Copyright © 2012 MECS I.J. Intelligent Systems and Applications, 2012, 9, 1-14 Difference of the Absolute Differences – A New Method for Motion Detection 10 The relation between the multi-segment DAD plots and the motion direction cannot be easily observed, since the relation between the variables in the DAD plots and the motion direction becomes rather statistical. This makes the problem a perfect candidate for intelligent pattern recognition algorithms. History information can also be used. In object motion detection in real time, variables from the previous DAD plots can be used in addition to the variables from the most recent DAD plot so as to increase the robustness and accuracy of the object motion detection. B. The DAD method applied to the bare-hand motion detection In the system designed, the bare-hand mouse moves i the horizontal and vertical directions (0°, 90°), where four segments are designed for each direction, similar to the case shown in Fig. 12-b. Experimental results indicate that four is the minimum segment number that allows the removal of 2D processing while keeping a good system performance. The frame rate reached for this four-segment system exceeds by far 30 frames/second, the standard rate of the commercial web-cameras. Higher accuracy can be achieved by using more than four segments for each direction while still keeping a real-time processing speed on average personal computers. However, with designs of more segments, more variables are introduced and therefore designs with more than ten segments for each direction are not recommended, since this greatly complicates the pattern recognition in the sense of making the processing speed unpractical. The designed system receives three consecutive 120 160 frames as its input and produces two 4 120 matrices as its output. 
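The Fig. 14 experiment — thresholded whole-frame sums versus thresholded sector sums — can be reproduced numerically with the sketch below (the specific shapes, sizes and names are mine; the threshold is simply "any nonzero activity"):

```python
import numpy as np

rows, cols = 120, 120
rr, cc = np.mgrid[0:rows, 0:cols]

# Two example objects of equal width: a circle and a triangle.
circle = ((rr - 60) ** 2 + (cc - 60) ** 2 <= 40 ** 2).astype(float)
triangle = ((rr >= 20) & (rr <= 100)
            & (np.abs(cc - 60) <= (rr - 20) / 2)).astype(float)

def full_profile(mask, threshold=0):
    """Add all rows into one column profile and threshold the sums,
    as in Fig. 14-c1/c2."""
    return (mask.sum(axis=0) > threshold).astype(int)

def sector_profiles(mask, sectors=4, threshold=0):
    """Split the rows into equal sectors, add each sector's rows
    separately and threshold the sums, as in Fig. 14-d1/d2."""
    h = mask.shape[0] // sectors
    return np.stack([
        (mask[s * h:(s + 1) * h].sum(axis=0) > threshold).astype(int)
        for s in range(sectors)
    ])

# Thresholding the whole-frame sums destroys the shape information...
flat_circle, flat_triangle = full_profile(circle), full_profile(triangle)
# ...but the four thresholded sector sums still tell the shapes apart.
sect_circle, sect_triangle = sector_profiles(circle), sector_profiles(triangle)
```

Here the two thresholded whole-frame profiles come out identical, while the 4×120 sector profiles differ between the two objects, mirroring the argument made around Fig. 14.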
Matrices $M^1_{120,160}$, $M^2_{120,160}$ and $M^3_{120,160}$ give the three input frames at $t-2$, $t-1$ and $t$, respectively, where 120 and 160 are the numbers of rows and columns. Matrices $OH_{4,120}$ and $OV_{120,4}$ give the outputs. The overall input/output relation is described by equations (9)-(21), where the matrices and their corresponding matrix entries are denoted by upper-case and their corresponding lower-case letters, respectively. Four row-summated vectors $\tilde{H}^{k,s}_{1,160}$ for horizontal motion detection and four column-summated vectors $\tilde{V}^{k,s}_{120,1}$ for vertical motion detection are calculated first for the four separate equal segments indexed by $s$ ($s = 1, 2, 3, 4$) in the two directions of the three incoming frames indexed by $k$ ($k = 1, 2, 3$). The entries of $\tilde{H}^{k,s}_{1,160}$ and $\tilde{V}^{k,s}_{120,1}$ are given by (9) and (10), respectively. The four vectors $\tilde{H}^{k,s}_{1,160}$ for each frame are stacked together to become a matrix $H^k_{4,160}$ whose entries $h^k_{s,j}$ are shown in (11). So do the four vectors $\tilde{V}^{k,s}_{120,1}$, which become a matrix $V^k_{120,4}$ whose entries $v^k_{i,s}$ are shown in (12). (13) and (14) calculate $AH^k_{4,160}$ and $AV^k_{120,4}$, the absolute differences of the row and column sums, respectively, for two consecutive frames. (15) and (17) calculate $DH_{4,160}$ and $DV_{120,4}$, the differences between the absolute differences in the two directions, respectively. (16) performs a resize operation on the matrix obtained in (15) so that the resulting $RH_{4,120}$ matches the size of $DV_{120,4}$ obtained in (17). (18) and (19) perform a moving-window-average low-pass filter operation on $RH_{4,120}$ and $DV_{120,4}$, respectively. Finally, (20) and (21) calculate the outputs $OH_{4,120}$ and $OV_{120,4}$ by performing a threshold operation. The whole procedure is illustrated in the flow diagram of the system shown in Fig. 15. The features extracted from $OH_{4,120}$ and $OV_{120,4}$ are then forwarded as inputs to an artificial neural network (ANN). Since the objective of this article is to present the developed DAD method for object motion detection, the component units in Fig. 15 are not elaborated here.

$\tilde{h}^{k,s}_{1,j} = \sum_{i=30(s-1)+1}^{30s} m^k_{i,j}$,  $s = 1, 2, 3, 4$; $k = 1, 2, 3$  (9)

$\tilde{v}^{k,s}_{i,1} = \sum_{j=40(s-1)+1}^{40s} m^k_{i,j}$,  $s = 1, 2, 3, 4$; $k = 1, 2, 3$  (10)

$h^k_{s,j} = \tilde{h}^{k,s}_{1,j}$,  $s = 1, 2, 3, 4$; $k = 1, 2, 3$  (11)

$v^k_{i,s} = \tilde{v}^{k,s}_{i,1}$,  $s = 1, 2, 3, 4$; $k = 1, 2, 3$  (12)

$AH^k_{4,160} = \left| H^{k+1}_{4,160} - H^k_{4,160} \right|$,  $k = 1, 2$  (13)

$AV^k_{120,4} = \left| V^{k+1}_{120,4} - V^k_{120,4} \right|$,  $k = 1, 2$  (14)

$DH_{4,160} = AH^2_{4,160} - AH^1_{4,160}$  (15)

$RH_{4,120} = \mathrm{decimate}[\mathrm{interpolate}(DH_{4,160}, 3), 4]$  (16)

$DV_{120,4} = AV^2_{120,4} - AV^1_{120,4}$  (17)

$fh_{i,j} = \begin{cases} \left[ \sum_{n=1}^{10} rh_{i,\,j-10+n} \right] / 10 & \text{when } j \ge 10 \\ \left[ \sum_{n=1}^{j} rh_{i,n} \right] / j & \text{when } j < 10 \end{cases}$  (18)

$fv_{i,j} = \begin{cases} \left[ \sum_{n=1}^{10} dv_{i-10+n,\,j} \right] / 10 & \text{when } i \ge 10 \\ \left[ \sum_{n=1}^{i} dv_{n,j} \right] / i & \text{when } i < 10 \end{cases}$  (19)

$oh_{i,j} = \begin{cases} 1 & \text{if } fh_{i,j} \ge \text{threshold} \\ -1 & \text{if } fh_{i,j} \le -\text{threshold} \\ 0 & \text{otherwise} \end{cases}$  (20)

$ov_{i,j} = \begin{cases} 1 & \text{if } fv_{i,j} \ge \text{threshold} \\ -1 & \text{if } fv_{i,j} \le -\text{threshold} \\ 0 & \text{otherwise} \end{cases}$  (21)

Fig. 15 System flow diagram

C. Some observations

In a multi-segment DAD diagram, the DAD plots obtained by the superposition of a number of single vectors are a measure of dimension reduction that greatly reduces the size of the source data while maintaining, to a certain extent, a unique representation for different motion patterns. As seen in the developed system, since four segments are used for each direction, simpler individual segments are obtained. The values of the variables extracted from the DAD plots are the features that allow the ANN to distinguish different motion patterns. When the working frame rate falls in the typical range of commercial webcams (15 frames/second to 30 frames/second), segments with more than two peaks indicate an overlapping of the moving hand in the two consecutive frames, so this extremely slow motion can be safely ignored.
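As an illustration of equations (9)-(21), the complete computation on one triplet of frames can be sketched in NumPy as follows (a minimal sketch; function names are mine, plain linear resampling stands in for the interpolate-by-3/decimate-by-4 resize of (16), and the threshold value is an arbitrary assumption):

```python
import numpy as np

def dad_outputs(m1, m2, m3, segments=4, win=10, thr=1.0):
    """Sketch of the pipeline in (9)-(21): segment sums, absolute
    differences, DAD, resize, moving-average filter, threshold."""
    n_rows, n_cols = m1.shape                       # 120, 160 in the paper
    frames = np.stack([m1, m2, m3]).astype(float)

    # (9)-(12): per-segment row sums H^k (4 x 160) and column sums
    # V^k (120 x 4) for each of the three frames k.
    H = frames.reshape(3, segments, n_rows // segments, n_cols).sum(axis=2)
    V = frames.reshape(3, n_rows, segments, n_cols // segments).sum(axis=3)

    # (13)-(14): absolute differences of consecutive frames' sums.
    AH = np.abs(np.diff(H, axis=0))                 # 2 x 4 x 160
    AV = np.abs(np.diff(V, axis=0))                 # 2 x 120 x 4

    # (15), (17): the differences of the absolute differences.
    DH = AH[1] - AH[0]                              # 4 x 160
    DV = AV[1] - AV[0]                              # 120 x 4

    # (16): resize DH from 4 x 160 to 4 x 120 (plain linear resampling
    # here, standing in for the paper's interpolate/decimate step).
    x_old = np.linspace(0.0, 1.0, n_cols)
    x_new = np.linspace(0.0, 1.0, n_rows)
    RH = np.stack([np.interp(x_new, x_old, row) for row in DH])

    # (18)-(19): causal moving-window average of length `win`.
    def smooth(a):
        c = np.concatenate([[0.0], np.cumsum(a)])
        idx = np.arange(1, a.size + 1)
        lo = np.maximum(idx - win, 0)
        return (c[idx] - c[lo]) / (idx - lo)

    FH = np.apply_along_axis(smooth, 1, RH)         # filter along index j
    FV = np.apply_along_axis(smooth, 0, DV)         # filter along index i

    # (20)-(21): ternary threshold into the outputs OH and OV.
    OH = np.where(FH > thr, 1, np.where(FH < -thr, -1, 0))
    OV = np.where(FV > thr, 1, np.where(FV < -thr, -1, 0))
    return OH, OV

# Usage: a 20x20 bright patch moving right by 4 columns per frame.
def make_frame(shift):
    f = np.zeros((120, 160))
    f[50:70, 40 + shift:60 + shift] = 1.0
    return f

OH, OV = dad_outputs(make_frame(0), make_frame(4), make_frame(8))
```

For this purely horizontal motion, the vertical output stays zero while the horizontal output contains both +1 and -1 entries around the patch edges, which is the signature the later pattern recognition stage works from.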
One segment from a typical four-segment DAD plot is shown in Fig. 16 as an example. In the case of a multi-segment DAD plot, d1 gives information about the location of the motion activity in one segment relative to the other segments. The values of the six variables are extracted from each segment for each new frame obtained from the webcam. The variables from the two previous DAD plots are also used in addition to the variables from the most recent DAD plot to enhance the performance of the ANN. As mentioned already, the values of these variables are used as the inputs of the ANN. This gives a total of 144 features (6 variables × 8 segments × 3 frames) to be sent to the ANN. Since this article focuses on the development of the DAD method for object motion detection, specifically in bare-hand motion detection, the details of the function of the ANN are not discussed here.

Fig. 16 One segment from a typical four-segment DAD plot

V. Experimental Results

The developed object motion detection method applied to the bare-hand mouse is tested on different platforms, and its processing speed is measured. The results are listed in Table 1. The processor speed of each platform is shown as a reference metric, with the understanding that other platform specifications may also influence the processing speed. The variety of the platforms presented in Table 1 gives a good idea of the range of processing speeds to be expected when using the developed method. A number of typical processing speeds of existing similar systems are listed in Table 2 for comparison [2][9][33]. The developed object motion detection method is also tested in different human-computer interaction systems based on hand motion detection and tracking, including bare-hand mouse, navigation, and game control.
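The assembly of the 144 ANN input features (6 variables × 8 segments × 3 frames) described above can be sketched as follows. Since the six per-segment variables are not enumerated in this article, the statistics below are hypothetical stand-ins, and the function names are mine:

```python
import numpy as np

def segment_features(seg):
    """Six illustrative variables per thresholded segment. The article's
    actual six variables (such as its d1 location measure) are not
    enumerated, so these are hypothetical stand-ins."""
    pos = np.flatnonzero(seg == 1)     # columns of positive DAD activity
    neg = np.flatnonzero(seg == -1)    # columns of negative DAD activity
    return np.array([
        pos.size,                      # amount of positive activity
        neg.size,                      # amount of negative activity
        pos[0] if pos.size else -1,    # where positive activity starts
        neg[0] if neg.size else -1,    # where negative activity starts
        pos[-1] if pos.size else -1,   # where positive activity ends
        neg[-1] if neg.size else -1,   # where negative activity ends
    ], dtype=float)

def ann_input(oh_history, ov_history):
    """Stack 6 variables x 8 segments (4 horizontal + 4 vertical) x the
    3 most recent DAD plots into the 144-element ANN input vector."""
    feats = []
    for oh, ov in zip(oh_history, ov_history):
        for seg in oh:                 # 4 horizontal segments (rows of OH)
            feats.append(segment_features(seg))
        for seg in ov.T:               # 4 vertical segments (columns of OV)
            feats.append(segment_features(seg))
    return np.concatenate(feats)       # 6 * 8 * 3 = 144 features

# Usage with dummy thresholded outputs of the sizes used in the system.
oh = np.zeros((4, 120), dtype=int)
oh[1, 30:33], oh[1, 40:43] = 1, -1
ov = np.zeros((120, 4), dtype=int)
features = ann_input([oh, oh, oh], [ov, ov, ov])
```

Whatever the actual six variables are, the point of the arrangement is the same: a fixed-length, low-dimensional summary of the thresholded DAD plots that the ANN can classify.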
The testing is executed in a variety of environments with different backgrounds and lighting conditions. All the tests indicate that the developed object motion detection method fulfills its task requirements in real time, and demonstrates robustness against noise, immunity to background changes, and good motion detection accuracy. A video demonstration is available at the web page accompanying this work [34].

Table 1 Experimental results of the processing speed of the developed system on different platforms

Platform          Processor speed   Multi-segment system speed
Average desktop   2.67 GHz          130 frames/second
Average laptop    1.6 GHz           57 frames/second
Fast desktop      3.2 GHz           195 frames/second

Table 2 Typical processing speeds of the existing similar systems during the years 1994-2009

Reference             Year   Processor speed      Reported speed
Wang & Popovic        2009   2.4 GHz              10 frames/second
Hardenberg & Bedard   2001   Not specified        20-25 frames/second
Rehg & Kanade         1994   Dedicated hardware   10-15 frames/second

VI. Conclusions and Future Work

A new approach to object motion detection, which is based on the DAD method, is proposed in this article. The experimental results give up to 195 frames/second for a bare-hand mouse system based on multi-segment DAD diagrams that can track a moving hand in the horizontal and vertical directions. This frame rate is much higher than the standard frame rates of commercial webcams, which do not exceed 30 frames/second. Although the system built for demonstration is implemented entirely in software, a hardware implementation, whose processing speed is limited only by the frame rate of the camera in use, is perfectly realistic. It is worth emphasizing that the approach presented in this article is applicable to all kinds of object motion detection. With extremely high-speed specialized cameras, processing speeds on the order of hundreds of thousands of frames/second can be reached. Some example applications at such high frame rates are bullet tracking, racing car tracking, missile tracking and so on.

Future work includes 3D object motion detection and tracking. One option is to use two cameras positioned so that one of them detects the moving object in the y-z plane, while the other detects it in the x-z plane. The other option is to use one camera only, with the introduction of some basic variations in the DAD diagram. As noted, the DAD diagrams used in this article serve for object motion detection in a plane parallel to the camera, whereas 3D motion detection also requires information about the depth of the object motion. Different patterns obtained from the resulting DAD plots identify the motion in the depth direction. One example case is shown in Fig. 17, where the resulting DAD plot presents a pattern that uniquely characterizes a motion in the depth direction.

Fig. 17 An example DAD diagram obtained from an object motion detection in the depth direction

References

[1] Wang, R. Y., "Real-Time Hand-Tracking as a User Input Device", ACM Symposium on User Interface Software and Technology (UIST), 2008
[2] Wang, R. Y. & Popovic, J., "Real-Time Hand-Tracking with a Color Glove", ACM Transactions on Graphics, 2009
[3] Milanovic, V. & Lo, W. K., "Fast and High-Precision 3D Tracking and Position Measurement with MEMS Micromirrors", Optical MEMs and Nanophotonics, IEEE/LEOS, 2009
[4] Lee, J. et al., "The 3D Sensor Table for Bare Hand Tracking and Posture Recognition", Lecture Notes in Computer Science, Springer, 2006
[5] Campos, T. E. & Murray, D. W., "Regression-Based Hand Pose Estimation from Multiple Cameras", Conference on Computer Vision and Pattern Recognition (CVPR), 2006
[6] Fujiyoshi, H. et al., "Fast 3D Position Measurement with Two Unsynchronized Cameras", IEEE International Symposium on Computational Intelligence in Robotics and Automation, 2003
[7] Schlattmann, M. et al., "Real-Time Bare-Hands Tracking for 3D Games", IADIS International Conference on Game and Entertainment Technology (GET), 2009
[8] Garg, P., Aggarwal, N. & Sofat, S., "Vision Based Hand Gesture Recognition", Proceedings of World Academy of Science, Engineering and Technology, Vol. 49, pp. 972-977, 2009
[9] Hardenberg, C. V. & Bedard, F., "Bare Hand Human Computer Interaction", Proceedings of the 2001 ACM Workshop on Perceptive User Interfaces, 2001
[10] Triesch, J. & Malsburg, C., "Robust Classification of Hand Postures Against Complex Background", International Conference on Automatic Face and Gesture Recognition, 1996, Killington
[11] Ware, C. & Balakrishnan, R., "Reaching for Objects in VR Displays: Lag and Frame Rate", ACM Transactions on Computer-Human Interaction, 1994
[12] Sato, Y., Kobayashi, Y. & Koike, H., "Fast Tracking of Hands and Fingertips in Infrared Images for Augmented Desk Interface", IEEE International Conference on Automatic Face and Gesture Recognition, 2000
[13] Segen, J., "GestureVR: Vision-Based 3D Hand Interface for Spatial Interaction", ACM Multimedia, 1998, Bristol
[14] Laptev, I. & Lindeberg, T., "Tracking of Multi-State Hand Models Using Particle Filtering and a Hierarchy of Multi-Scale Image Features", Proceedings of the IEEE Workshop on Scale-Space and Morphology, 2001
[15] Crowley, J., Bérard, F. & Coutaz, J., "Finger Tracking as an Input Device for Augmented Reality", Proceedings of the International Workshop on Gesture and Face Recognition, 1995, Zurich
[16] O'Hagan, R. & Zelinsky, A., "Finger Track - A Robust and Real-Time Gesture Interface", Australian Joint Conference on Artificial Intelligence, 1997, Perth
[17] Stenger, B., Thayananthan, A., Torr, P. & Cipolla, R., "Model-Based Hand Tracking Using a Hierarchical Bayesian Filter", IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1372-1384, 2006
[18] Bretzner, L., Laptev, I. & Lindeberg, T., "Hand Gesture Recognition Using Multi-Scale Colour Features, Hierarchical Models and Particle Filtering", Proceedings of the International Conference on Automatic Face and Gesture Recognition, 2002, Washington D.C.
[19] Sánchez-Nielsen, E., Antón-Canalís, L. & Hernández-Tejera, M., "Hand Gesture Recognition for Human Machine Interaction", 12th International Conference on Computer Graphics, Visualization and Computer Vision (WSCG), 2004
[20] Stenger, B., "Template Based Hand Pose Recognition Using Multiple Cues", Computer Vision - ACCV, Vol. 3852, pp. 551-560, Springer, 2006
[21] Lienhart, R. & Maydt, J., "An Extended Set of Haar-Like Features for Rapid Object Detection", Proceedings of the IEEE International Conference on Image Processing, 2002
[22] Barczak, A. L. & Dadgostar, F., "Real-Time Hand Tracking Using a Set of Co-Operative Classifiers Based on Haar-Like Features", Research Letters in the Information and Mathematical Sciences, Vol. 7, pp. 29-42, 2005
[23] Chen, Q., Georganas, N. & Petriu, E., "Real-Time Vision Based Hand Gesture Recognition Using Haar-Like Features", Proceedings of the IEEE International Conference on Instrumentation and Measurement Technology, 2005, Warsaw
[24] Wang, C. C. & Wang, K. C., "Hand Posture Recognition Using Adaboost with SIFT for Human Robot Interaction", Recent Progress in Robotics: Viable Robotic Service to Human, Vol. 370, pp. 317-329, Springer Berlin / Heidelberg, 2009
[25] Kang, H., Lee, C. W. & Jung, K., "Recognition-Based Gesture Spotting in Video Games", Pattern Recognition Letters, pp. 1701-1714, 2004
[26] Lu, P., Chen, Y., Zeng, X. & Wang, Y., "A Vision Based Game Control Method", Computer Vision in Human-Computer Interaction, Vol. 3766, pp. 70-78, Springer Berlin / Heidelberg, 2005
[27] Jong-Hyun, Y., Park, J.-S. & Sung, M. Y., "Vision-Based Bare-Hand Gesture Interface for Interactive Augmented Reality Applications", Entertainment Computing - ICEC 2006, Vol. 4161, pp. 386-389, Springer Berlin / Heidelberg, 2006
[28] Park, H. S., Jung, D. J. & Kim, H. J., "Vision-Based Game Interface Using Human Gesture", Lecture Notes in Computer Science, Springer Berlin / Heidelberg, 2006
[29] Song, P., Yu, H. & Winkler, S., "Vision-Based 3D Finger Interactions for Mixed Reality Games with Physics Simulation", Proceedings of the 7th ACM SIGGRAPH International Conference on Virtual-Reality Continuum and Its Applications in Industry, 2008
[30] Vassiliadis, S., Wong, S., Hakkennes, E., Wong, J. S. & Pechanek, G., "The Sum-Absolute-Difference Motion Estimation Accelerator", Proceedings of the IEEE 24th Euromicro Conference, 1998
[31] Vanne, J., Aho, E., Hamalainen, T. & Kuusilinna, K., "A High-Performance Sum of Absolute Difference Implementation for Motion Estimation", IEEE Transactions on Circuits and Systems for Video Technology, pp. 876-883, 2006
[32] Rehman, S., Young, R., Chatwin, C. & Birch, P., "An FPGA Based Generic Framework for High Speed Sum of Absolute Difference Implementation", European Journal of Scientific Research, pp. 6-29, 2009
[33] Rehg, J. & Kanade, T., "Visual Tracking of High DOF Articulated Structures: an Application to Human Hand Tracking", Third European Conference on Computer Vision, pp. 35-46, Stockholm, 1994
[34] http://sites.google.com/site/kyoussefsite/bhtdemo

Khalid Youssef received his M.S. degree in electrical engineering from Northern Illinois University, IL, USA in 2010. During his Master's education, he did research on robotic control, fuzzy systems, neural networks and image processing, and co-authored international journal articles and international conference papers. Currently, he is a Ph.D. candidate in electrical engineering at the University of California, Los Angeles.

Peng-Yung Woo received the B.S. degree in physics/electrical engineering from Fudan University, Shanghai, China in 1982 and the M.S. degree in electrical engineering from Drexel University, Philadelphia, PA, in 1983. In 1988, he received the Ph.D. degree in systems engineering from the University of Pennsylvania, PA. He joined Northern Illinois University in 1988 and is currently a Professor of Electrical Engineering. His current research interests include intelligent control, robotics, digital signal processing, neural networks, fuzzy systems and digital image processing. He has authored and co-authored over 50 international journal articles and book sections as well as over 60 international conference papers. He is a senior member of the IEEE and a member of the IASTED Standing Technical Committee. He has been a member of many International Program Committees for various IEEE and IASTED International Conferences. He is currently also the Advisory Professor of Tongji University, China.