Assigning Gesture Subtlety using an Accelerometer
Himanshu Sahni and Shray Bansal
Introduction
We consider a gesture subtle if other people do not perceive it as out of the ordinary, so that it does not disrupt the flow of a social interaction. In the context of wearable computing, subtle hand gestures play an important role. They could be used to quickly perform tasks such as reviewing past notes, checking whether a new message has arrived, or looking up the meaning of a word someone has just said.

Such gestures must also be socially acceptable and comfortable to use. Loud and obtrusive gestures have the obvious flaw of being socially inappropriate and uncomfortable for the user. This problem has been addressed in the literature from an HCI point of view, where small, quiet gestures were found to be the most acceptable and comfortable. We address it from an Artificial Intelligence standpoint: our goal is to design a system that can assign a subtlety score to any given hand gesture, based on a training set of gestures rated for subtlety by human participants.

Figure 1. The authors partake in a conversation while one of them performs a gesture to control a wearable device.
Materials and methods

We contrast two approaches for assigning a subtlety score. Broadly, they can be classified as time-dependent or time-independent, depending on whether dynamic time warping is used to compute the similarity between gesture examples. Both approaches involve one-nearest-neighbor (1NN) and distance-weighted k-nearest-neighbor (kNN) techniques.

In the time-independent approach, we evaluate the Euclidean distance between a query vector and every feature vector in the training set. In the 1NN variant, we pick the closest feature vector in the training set and assign the query its subtlety score. In the kNN variant, we weight the scores of the k nearest neighbors by their distance to the query and assign the query the weighted sum of their scores. In each case, a query gesture contains more than one query feature vector.
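A minimal sketch of the distance-weighted scoring described above, for a single query feature vector. The function and variable names are ours, and the inverse-distance weighting is an assumption on our part, since the exact weighting scheme is not spelled out here; setting k = 1 recovers the 1NN variant.

    import numpy as np

    def knn_subtlety_score(query, train_vectors, train_scores, k=4):
        """Distance-weighted kNN subtlety score for one query feature vector.

        train_vectors: (n, d) array of training feature vectors
        train_scores:  (n,) array of their user-rated subtlety scores
        """
        # Euclidean distance from the query to every training vector.
        dists = np.linalg.norm(train_vectors - query, axis=1)
        nearest = np.argsort(dists)[:k]
        # Closer neighbors receive larger weights; inverse distance is
        # one common choice (assumed here).
        weights = 1.0 / (dists[nearest] + 1e-9)
        weights /= weights.sum()
        return float(np.dot(weights, train_scores[nearest]))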
We took the average of the scores obtained over all the query vectors as the predicted score for that gesture. One challenge in this approach was handling feature vectors of different lengths. To deal with this, we took the shorter of the query and training feature vectors and slid it over the longer one to find the best fit. Here we assume that the gestures were performed in approximately the same length of time.
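The alignment and averaging steps can be sketched as follows. This is illustrative code under the equal-duration assumption above, with names of our own choosing, not the original implementation.

    import numpy as np

    def sliding_distance(a, b):
        """Best-fit distance between sequences of different lengths:
        slide the shorter over the longer and keep the minimum."""
        shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
        n = len(shorter)
        return min(
            float(np.linalg.norm(shorter - longer[off:off + n]))
            for off in range(len(longer) - n + 1)
        )

    def predict_gesture_score(query_vectors, train_seqs, train_scores):
        """Average the 1NN score over all feature vectors of a query gesture."""
        picked = []
        for q in query_vectors:
            dists = [sliding_distance(q, t) for t in train_seqs]
            picked.append(train_scores[int(np.argmin(dists))])
        return float(np.mean(picked))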
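In the time-dependent approach, dynamic time warping replaces this rigid alignment with an elastic one that tolerates local stretching and compression in time. Below is a minimal textbook DTW sketch over one-dimensional signals (for example, net acceleration magnitude over time); again, this is our illustration rather than the original implementation.

    import numpy as np

    def dtw_distance(a, b):
        """Classic dynamic-programming DTW between two 1-D signals."""
        n, m = len(a), len(b)
        # cost[i, j] = best cumulative cost of aligning a[:i] with b[:j].
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = abs(a[i - 1] - b[j - 1])
                # Extend the cheapest of the three allowed alignments.
                cost[i, j] = d + min(cost[i - 1, j],      # a advances
                                     cost[i, j - 1],      # b advances
                                     cost[i - 1, j - 1])  # both advance
        return float(cost[n, m])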
Results

User Study

Gesture (user score)           | Euclidean One NN           | Euclidean kNN (k = 4)      | Dynamic Time Warping
Tap Phone [6] (0.162)          | Tap Phone (0.162)          | Tap Phone (0.162)          | Tap Phone (0.162)
Fire On [7] (0.312)            | Fire On (0.312)            | Fire On (0.312)            | Fire On (0.312)
Window Close [7] (0.382)       | Draw 'W' (0.598)           | Draw 'W' (0.598)           | Draw 'W' (0.598)
Draw 'W' [6] (0.598)           | Triangle (0.632)           | Triangle (0.632)           | Door Close (0.625)
Triangle [8] (0.632)           | Fire Off (0.657)           | Fire Off (0.657)           | Triangle (0.632)
Fire Off [7] (0.657)           | Window Close (0.685)       | Window Close (0.685)       | Fire Off (0.657)
Door Close [7] (0.678)         | Flick Air (0.688)          | Door Close (0.686)         | Flick Air (0.688)
Flick Air [9] (0.688)          | Window Open (0.694)        | Flick Air (0.688)          | Window Open (0.694)
Window Open [7] (0.694)        | Orient Phone (0.703)       | Window Open (0.694)        | Orient Phone (0.703)
Orient Phone [6] (0.703)       | Swing Phone (0.721)        | Orient Phone (0.703)       | Window Close (0.749)
Multi-Finger Snap [9] (0.776)  | Door Close (0.746)         | Swing Phone (0.722)        | Multi-Finger Snap (0.776)
Throw Money [6] (0.805)        | Multi-Finger Snap (0.776)  | Multi-Finger Snap (0.776)  | Swing Phone (0.787)
Swing Phone [6] (0.885)        | Throw Money (0.805)        | Throw Money (0.805)        | Throw Money (0.805)
Slap Phone [6] (0.891)         | Slap Phone (0.891)         | Slap Phone (0.891)         | Slap Phone (0.891)
Door Open [7] (1.000)          | Door Open (1.000)          | Door Open (1.000)          | Door Open (1.000)

Table 1. The first column lists all the gestures in the data set, both training and test, in ascending order of user-rated subtlety. Next to each gesture name is the reference to the paper in which it was described for a gesture recognition study, and below each gesture is the normalized score it received in the user study. The remaining columns contrast the Euclidean distance and Dynamic Time Warping approaches for characterizing distance between feature vectors; below each gesture name is the subtlety score predicted by our system, and each column lists the gestures in the resulting rank order.

Figure 2. The scores assigned by the best implementations of the Euclidean distance and Dynamic Time Warping approaches, compared with the user-generated scores in a histogram.
Analysis

Given the limited training set, the algorithms perform reasonably well. From Table 1, it is clear that kNN outperforms the other approaches. In terms of qualitative results, Euclidean kNN with k = 4 ranks two of the three test gestures fairly accurately but fails to rank the 'Window Close' gesture properly. It has a total displacement of 5 in the rankings, compared to 11 for the Dynamic Time Warping implementation.

More quantitative comparisons can be made from the error values and the histogram in Figure 2. On average, score assignment is better predicted by Euclidean kNN. Dynamic time warping displays good results for 'Swing Phone' (prediction error 0.09), on which the Euclidean approach performed relatively worse (prediction error 0.17).

One of our test examples, 'Window Close', performed poorly across the board. This might be for two reasons. First, it may point to a weakness in our methodology: 'Window Close' is consistently scored next to 'Window Open' by all the techniques. These two gestures have very similar net acceleration profiles, so our nearest-neighbor techniques score them similarly; in fact, we observed that while our algorithm searched for neighbors of 'Window Close', the instances that appeared most often were of 'Window Open'. Second, we may simply lack data: 'Window Close' was one of the gestures we had the least data on, and most machine learning algorithms cannot be expected to perform well with so few examples.
Figure 3. The effect of increasing the number of neighbors k on the sum of absolute errors in the predicted scores. There is a clear minimum around k = 2 to 4.
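The sweep behind Figure 3 amounts to evaluating the summed absolute error for each candidate k. A small helper sketches the procedure; predict() stands for any scoring function (for instance, the kNN predictor sketched earlier), and all names here are illustrative.

    def sweep_k(predict, test_set, k_values=range(1, 13)):
        """Sum of absolute prediction errors for each k, as in Figure 3.

        predict(gesture, k) should return a predicted subtlety score;
        test_set is an iterable of (gesture, user_score) pairs.
        """
        errors = {k: sum(abs(predict(g, k) - score) for g, score in test_set)
                  for k in k_values}
        # The k with the smallest total error is the best choice; in our
        # data the minimum falls around k = 2 to 4.
        return min(errors, key=errors.get), errors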
Conclusions
We need to improve our implementation and try better algorithms. One obvious limitation is relying on a single nearest neighbor; using more neighbor points improves on this. We can also improve our handling of time series, since time signals do not behave like ordinary features: scaling in time (the same gesture taking longer when performed by different users) is a severe limitation for nearest-neighbor methods. We could also try more complex algorithms, such as SVMs, to fit our data better.
Literature cited

[1] Kern N., Schiele B., Junker H., Lukowicz P., & Troster G. (2003). Wearable sensing to annotate meeting recordings. Personal and Ubiquitous Computing.
[2] Lyons K., Skeels C., Starner T., Snoeck C. M., Wong B. A., & Ashbrook D. (2004). Augmenting conversations using dual-purpose speech. Proceedings of the 17th annual ACM symposium on User interface software and technology (UIST '04).
[3] Rico J. & Brewster S. (2010). Usable gestures for mobile interfaces: evaluating social acceptability. Proceedings of the 28th international conference on Human factors in computing systems (CHI '10), 887-896.
[4] Montero C. S., Alexander J., Subramanian S., & Marshall M. T. (2010). Would You Do That? Understanding Social Acceptance of Gestural Interfaces. Proceedings of the 12th international conference on Human computer interaction with mobile devices and services, 275-278.
[5] Starner T., Auxier J., Ashbrook D., & Gandy M. (2000). The Gesture Pendant: A Self-illuminating, Wearable, Infrared Computer Vision System for Home Automation Control and Medical Monitoring. IEEE International Symposium on Wearable Computers, 87-94.
[6] Bailly G., Müller J., Rohs M., Wigdor D., & Kratz S. (2012). ShoeSense: A New Perspective on Hand Gestures and Wearable Applications. Proceedings of the 30th international conference on Human factors in computing systems (CHI '12).
[7] Deyle T., Palinko S., Poole E. S., & Starner T. (2007). Hambone: A Bio-Acoustic Gesture Interface. Proceedings of the 11th IEEE International Symposium on Wearable Computers.
[8] http://funf.org/inabox
[9] Lementec J. & Bajcsy P. (2004). Recognition of arm gestures using multiple orientation sensors: gesture classification. Proceedings of the 7th International IEEE Conference on Intelligent Transportation Systems, 965-970.
[10] Ashbrook D. & Starner T. (2010). MAGIC: a motion gesture design tool. Proceedings of the 28th international conference on Human factors in computing systems (CHI '10).