Assigning Gesture Subtlety using an Accelerometer

Himanshu Sahni and Shray Bansal

Introduction

We consider a gesture subtle if other people do not perceive it as out of the ordinary, so that it does not disrupt the flow of a social event. In the context of wearable computing, subtle hand gestures play an important role: they could be used to perform tasks such as reviewing past notes, checking whether a new message has arrived, or looking up the meaning of a word someone has just said. It is important that such gestures also be socially acceptable and comfortable to use; loud and obtrusive gestures have the obvious flaw of being socially inappropriate and uncomfortable for the user. This problem has been addressed in the literature from an HCI point of view, where small, quiet gestures were found to be the most acceptable and comfortable. We address it from an Artificial Intelligence standpoint: our goal is to design a system that assigns a subtlety score to any given hand gesture, based on a training set of gestures whose subtlety was rated by humans.

Figure 1. The authors take part in a conversation while one of them performs a gesture to control a wearable device.

Results

User Study

Table 1. The first column lists all the gestures in the data set, both training and test, in ascending order of the normalized score each received in the user study (shown in parentheses); next to each gesture name is the reference to the paper in which it was used in a gesture recognition study. The remaining columns contrast the Euclidean distance and Dynamic Time Warping approaches for characterizing distance between feature vectors; each entry is a gesture together with the subtlety score our system predicted for it, and each column is ordered by predicted score.

Gesture Name (user study score) | Euclidean, one NN          | Euclidean, kNN (k = 4)     | Dynamic Time Warping
Tap Phone [6] (0.162)           | Tap Phone (0.162)          | Tap Phone (0.162)          | Tap Phone (0.162)
Fire On [7] (0.312)             | Fire On (0.312)            | Fire On (0.312)            | Fire On (0.312)
Window Close [7] (0.382)        | Draw 'W' (0.598)           | Draw 'W' (0.598)           | Draw 'W' (0.598)
Draw 'W' [6] (0.598)            | Triangle (0.632)           | Triangle (0.632)           | Door Close (0.625)
Triangle [8] (0.632)            | Fire Off (0.657)           | Fire Off (0.657)           | Triangle (0.632)
Fire Off [7] (0.657)            | Window Close (0.685)       | Window Close (0.685)       | Fire Off (0.657)
Door Close [7] (0.678)          | Flick Air (0.688)          | Door Close (0.686)         | Flick Air (0.688)
Flick Air [9] (0.688)           | Window Open (0.694)        | Flick Air (0.688)          | Window Open (0.694)
Window Open [7] (0.694)         | Orient Phone (0.703)       | Window Open (0.694)        | Orient Phone (0.703)
Orient Phone [6] (0.703)        | Swing Phone (0.721)        | Orient Phone (0.703)       | Window Close (0.749)
Multi-Finger Snap [9] (0.776)   | Door Close (0.746)         | Swing Phone (0.722)        | Multi-Finger Snap (0.776)
Throw Money [6] (0.805)         | Multi-Finger Snap (0.776)  | Multi-Finger Snap (0.776)  | Swing Phone (0.787)
Swing Phone [6] (0.885)         | Throw Money (0.805)        | Throw Money (0.805)        | Throw Money (0.805)
Slap Phone [6] (0.891)          | Slap Phone (0.891)         | Slap Phone (0.891)         | Slap Phone (0.891)
Door Open [7] (1.000)           | Door Open (1.000)          | Door Open (1.000)          | Door Open (1.000)

Also compared are the scores assigned by the best implementations of the Euclidean distance and Dynamic Time Warping approaches with the user-generated scores; the histogram below Table 1 shows this comparison.

Figure 2. The effect of increasing the number of neighbors (k) on the sum of absolute errors in the predicted scores. There is a clear minimum around k = 2 to 4.

Analysis

We contrast two approaches for assigning a subtlety score. Broadly, they can be classified as either time-dependent or time-independent, depending on whether dynamic time warping is used to compute the similarity between gesture examples. Both approaches use one-nearest-neighbor and weighted k-nearest-neighbor techniques. In the time-independent approach, we evaluate the Euclidean distance between a query vector and the entire training set. In the one-NN variant, we pick the closest feature vector in the training set and assign the query its subtlety score. In the kNN variant, we weight the scores of the k neighbors by their distance to the query and assign the query a weighted sum of their scores. In each case, a query gesture contains more than one query feature vector, and we take the average of the scores obtained over all the query vectors as the predicted score for that gesture. One challenge in this approach was handling feature vectors of different sizes; we took the smaller of the query or training feature vector and slid it over the other to find the best fit, assuming that the gestures were performed in approximately the same length of time.
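A minimal sketch of this scoring scheme is given below. It is illustrative rather than our exact implementation: it assumes each gesture example is a one-dimensional net-acceleration profile stored as a NumPy array, with a normalized subtlety score attached to every training profile, and it uses inverse-distance weighting for the kNN variant, which the description above does not pin down.

```python
# Illustrative sketch of the scoring scheme described above, not our exact code.
# Assumptions: each gesture example is a 1-D net-acceleration profile (NumPy array)
# and every training profile carries a normalized subtlety score in [0, 1].

import numpy as np


def sliding_euclidean(a, b):
    """Slide the shorter profile over the longer one and return the best-fit
    Euclidean distance (normalized by overlap length); this mirrors the handling
    of different-length feature vectors described above."""
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    n = len(short)
    best = np.inf
    for start in range(len(long_) - n + 1):
        d = np.linalg.norm(long_[start:start + n] - short) / np.sqrt(n)
        best = min(best, d)
    return best


def dtw_distance(a, b):
    """Plain dynamic time warping distance between two profiles, for the
    time-dependent variant (no warping-window constraint in this sketch)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i, j] = step + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]


def predict_subtlety(query_vectors, train_vectors, train_scores, k=4,
                     dist=sliding_euclidean):
    """Predict a subtlety score for one query gesture.

    Each query feature vector receives a distance-weighted sum of the scores of
    its k nearest training vectors (k = 1 reduces to plain one-NN); the gesture's
    predicted score is the average over all of its query vectors."""
    train_scores = np.asarray(train_scores, dtype=float)
    per_vector = []
    for q in query_vectors:
        dists = np.array([dist(q, t) for t in train_vectors])
        nearest = np.argsort(dists)[:k]
        weights = 1.0 / (dists[nearest] + 1e-9)   # closer neighbors count more
        per_vector.append(np.dot(weights, train_scores[nearest]) / weights.sum())
    return float(np.mean(per_vector))
```

Passing dist=dtw_distance gives the time-dependent variant, and setting k to 1 or 4 with either distance yields the configurations compared in Table 1.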
More quantitative comparisons can be made from the error values and from the histogram comparing predicted and user-generated scores. On average, score assignment is better predicted by Euclidean kNN. Dynamic time warping gives good results for 'Swing Phone' (prediction error 0.09), on which the Euclidean approach performs relatively worse (prediction error 0.17).

One of our test examples, 'Window Close', performed poorly across the board. There are two plausible reasons. First, this may point to a weakness in our methodology: 'Window Close' is consistently scored next to 'Window Open' by all the techniques, most likely because the two gestures have very similar net acceleration profiles, so our nearest-neighbor techniques score them similarly. In fact, when our algorithm searched for neighbors of 'Window Close', the instances that appeared most often were of 'Window Open'. Second, the problem may be a lack of data: 'Window Close' was one of the gestures for which we had the least data, and most machine learning algorithms cannot be expected to perform well in that regime.

Conclusions

Given the limited training set, the algorithms perform reasonably well. From Table 1 it is clear that kNN outperforms the other approaches. In terms of qualitative results, Euclidean kNN with k set equal to 4 ranks two of the three test gestures fairly accurately but fails to rank the 'Window Close' gesture properly. It has a total displacement of 5 in the rankings, compared to 11 for the Dynamic Time Warping implementation.

We still need to improve our implementation and try better algorithms. One obvious limitation is relying on a single nearest neighbor, which we can improve by using more neighbor points. We can also improve our handling of time series, since time signals do not behave like ordinary features; scaling in time (the same gesture taking longer when performed by different users) is a severe limitation for nearest-neighbor methods. We could also try more complex algorithms, such as SVMs, to fit our data better.
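The two summary measures used in the Analysis and Conclusions, the sum of absolute prediction errors (plotted against k in Figure 2) and the total rank displacement of the test gestures, can be written compactly. The sketch below is illustrative; the function names and the dictionary layout (scores keyed by gesture name) are assumptions rather than our implementation.

```python
# Evaluation measures referred to above: the sum of absolute errors between
# predicted and user-study subtlety scores (Figure 2) and the total rank
# displacement of the test gestures (Conclusions). The dict-based layout is an
# illustrative assumption.

def sum_absolute_error(predicted, user_scores):
    """Sum of |predicted score - user-study score| over all gestures."""
    return sum(abs(predicted[g] - user_scores[g]) for g in user_scores)


def total_rank_displacement(predicted, user_scores, test_gestures):
    """Total displacement of the test gestures between the ranking induced by
    the predicted scores and the ranking induced by the user-study scores."""
    rank_pred = {g: i for i, g in enumerate(sorted(user_scores, key=predicted.get))}
    rank_user = {g: i for i, g in enumerate(sorted(user_scores, key=user_scores.get))}
    return sum(abs(rank_pred[g] - rank_user[g]) for g in test_gestures)
```

Sweeping k in the kNN scorer and plotting sum_absolute_error against it reproduces the kind of curve summarized in Figure 2, with its minimum around k = 2 to 4.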
Literature cited

[1] Kern, N., Schiele, B., Junker, H., Lukowicz, P., & Tröster, G. (2003). Wearable sensing to annotate meeting recordings. Personal and Ubiquitous Computing.
[2] Lyons, K., Skeels, C., Starner, T., Snoeck, C. M., Wong, B. A., & Ashbrook, D. (2004). Augmenting conversations using dual-purpose speech. Proceedings of the 17th Annual ACM Symposium on User Interface Software and Technology (UIST '04).
[3] Rico, J., & Brewster, S. (2010). Usable gestures for mobile interfaces: evaluating social acceptability. Proceedings of the 28th International Conference on Human Factors in Computing Systems (CHI '10), 887-896.
[4] Montero, C. S., Alexander, J., Subramanian, S., & Marshall, M. T. (2010). Would You Do That? Understanding Social Acceptance of Gestural Interfaces. Proceedings of the 12th International Conference on Human-Computer Interaction with Mobile Devices and Services, 275-278.
[5] Starner, T., Auxier, J., Ashbrook, D., & Gandy, M. (2000). The Gesture Pendant: A Self-illuminating, Wearable, Infrared Computer Vision System for Home Automation Control and Medical Monitoring. IEEE International Symposium on Wearable Computers, 87-94.
[6] Bailly, G., Müller, J., Rohs, M., Wigdor, D., & Kratz, S. (2012). ShoeSense: A New Perspective on Hand Gestures and Wearable Applications. Proceedings of the 30th International Conference on Human Factors in Computing Systems (CHI '12).
[7] Deyle, T., Palinko, S., Poole, E. S., & Starner, T. (2007). Hambone: A Bio-Acoustic Gesture Interface. Proceedings of the 11th IEEE International Symposium on Wearable Computers.
[8] http://funf.org/inabox
[9] Lementec, J., & Bajcsy, P. (2004). Recognition of arm gestures using multiple orientation sensors: gesture classification. Proceedings of the 7th International IEEE Conference on Intelligent Transportation Systems, 965-970.
[10] Ashbrook, D., & Starner, T. (2010). MAGIC: a motion gesture design tool. Proceedings of the 28th International Conference on Human Factors in Computing Systems (CHI '10).