Interaction with Pioneer Robot by Voice and Vision

Team 5
陳彥璋 (B95902021), 涂宗瑋 (B95902052), 黃雨喬 (B95902066), 陳冠瑋 (B95902094)

Abstract – Having learned a lot from the Robotics course instructed by Prof. Li-Chen Fu this semester, we wanted to implement some interesting interactions with a real robot. This paper presents our final project, in which we wrote several applications for controlling the Pioneer P3-DX by voice, vision, and gestures.

I. INTRODUCTION

We use the Pioneer P3-DX to perform a series of controls and interactions by voice, vision, and gestures. The Pioneer P3-DX, manufactured by MobileRobots Inc. (ActivMedia Robotics), is a mobile robot that serves as a platform for various research purposes.

To interact with the Pioneer P3-DX, we first install a camera and a microphone on it, which are responsible for its visual and audio perception; for speaking we also need a speaker. Since the Pioneer itself is just a platform for movement, the simplest approach is to put a laptop on it and install all the necessary devices and software on the laptop.

After setting up the peripherals and software, we design the robot's behaviors. First, the robot must perform basic movements such as moving forward or backward and turning left or right. Its actions depend on the conditions we designed: when it receives a command such as "go forward" or "go backward" through voice, it must execute the command immediately. The Pioneer P3-DX we used has three wheels, one caster and two fixed drive wheels that control the direction of the whole robot, so it can rotate in place. On the other hand, a human can also interact with the Pioneer P3-DX through basic gestures. To make the interaction more interesting, we scripted some amusing situations and little games for the user to play with the robot.

In the next section we briefly describe how we control the Pioneer P3-DX and perform obstacle avoidance. In Section III we show how we control the robot by voice and how the robot reacts through the speaker and its motion. In Section IV we develop a simple program for controlling the robot with basic gestures. The conclusion is given in Section V.

II. MOBILE ROBOT CONTROL

In this project, we use the Pioneer P3-DX as our mobile robot platform. The Pioneer P3-DX has eight ultrasonic transducer (sonar) sensors arranged to provide 180-degree forward coverage; the sensors read ranges from 15 cm to approximately 5 m [1].

A. Set up the Pioneer P3-DX

First, we install "ARIA" (Advanced Robot Interface for Applications) [2], a C++ library for mobile robot platforms, on the laptop. Then we install the RS-232 driver, which lets our user programs control and communicate with the Pioneer P3-DX. Fig. 1 shows how we control the Pioneer P3-DX.

Fig. 1. Controlling the Pioneer P3-DX through a laptop over RS-232.

B. Control the robot

After setting up, we start programming the Pioneer P3-DX. We declare an object called "robot" and call the function setVel2(left_wheel_velocity, right_wheel_velocity) to control the Pioneer P3-DX as follows:

Go forward: setVel2(100, 100)
Go backward: setVel2(-100, -100)
Turn left: setVel2(-100, 100)
Turn right: setVel2(100, -100)
Stop: setVel2(0, 0)
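As an illustration of the commands above, the following is a minimal sketch assuming the ARIA 2.x C++ API of [2]: it connects to the robot and issues a few setVel2 calls. The connection class and the sleep durations are our assumptions; only the setVel2 calls themselves come from the report.

    #include "Aria.h"

    int main(int argc, char **argv)
    {
      Aria::init();
      ArSimpleConnector connector(&argc, argv);
      ArRobot robot;

      // Connect over the RS-232 link set up in Section II-A.
      if (!connector.connectRobot(&robot)) {
        ArLog::log(ArLog::Terse, "Could not connect to the Pioneer P3-DX.");
        Aria::exit(1);
      }

      robot.runAsync(true);       // run the robot's processing cycle in a thread
      robot.enableMotors();

      robot.lock();
      robot.setVel2(100, 100);    // go forward
      robot.unlock();
      ArUtil::sleep(2000);

      robot.lock();
      robot.setVel2(-100, 100);   // turn left in place
      robot.unlock();
      ArUtil::sleep(1000);

      robot.lock();
      robot.setVel2(0, 0);        // stop
      robot.unlock();

      Aria::exit(0);
      return 0;
    }

Because the two drive wheels are commanded independently, equal and opposite velocities rotate the robot in place, as noted in the Introduction.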
C. Obstacle avoidance

Having covered the basic techniques of robot movement, we turn to the more advanced and important issue of obstacle avoidance. Although ARIA provides this feature, we decided to implement it without using it. We divide the ultrasonic transducer sensors into three zones, U1, U2, and U3, as shown in Fig. 2.

Fig. 2. The three sonar zones U1, U2, and U3 covering the front of the robot.

Our algorithm is simple. The mobile robot may collide with a barrier when there are obstacles in zone U2. If we detect an obstacle within 1 m in U2, we check whether there are obstacles within 2 m in U3: if not, the robot makes a left turn; otherwise, we check whether there are obstacles within 2 m in U1: if not, it makes a right turn; otherwise, it rotates in place until there is no obstacle within 1 m in U2. Besides, if an obstacle comes very near U1 or U3 while U2 has no barrier within 1 m, we correct the robot's direction with a small turn.

D. Other technique

The ultrasonic transducer sensors on the Pioneer P3-DX deliver sensing data four times per second, so we set a timer in our program that refreshes all data the Pioneer P3-DX receives every 250 ms. In the same timer function, we also read two files written by the voice control program and the visual control program. Communicating by reading and writing files allows our mobile robot to be controlled easily by other techniques.
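The sketch below combines the zone logic of Section II-C with the 250 ms polling of Section II-D. The grouping of sonar indices into U1–U3, the file name, and the command word are our assumptions (the report does not name them); the 1 m and 2 m thresholds and the turn directions come from the algorithm above.

    #include "Aria.h"
    #include <fstream>
    #include <string>

    // Smallest range (mm) reported by a contiguous group of front sonars.
    static int zoneRange(ArRobot &robot, int first, int last)
    {
      int best = 5000;                          // sonar maximum is about 5 m
      for (int i = first; i <= last; ++i)
        if (robot.getSonarRange(i) < best)
          best = robot.getSonarRange(i);
      return best;
    }

    // One control step, meant to be called every 250 ms, e.g.:
    //   while (robot.isRunning()) { controlStep(robot); ArUtil::sleep(250); }
    void controlStep(ArRobot &robot)
    {
      // Assumed grouping of the eight front sonars (cf. Fig. 2):
      // U3 = left (0-2), U2 = front (3-4), U1 = right (5-7).
      int u3 = zoneRange(robot, 0, 2);
      int u2 = zoneRange(robot, 3, 4);
      int u1 = zoneRange(robot, 5, 7);

      robot.lock();
      if (u2 < 1000) {                          // obstacle within 1 m ahead
        if (u3 > 2000)      robot.setVel2(-100, 100);  // left clear: turn left
        else if (u1 > 2000) robot.setVel2(100, -100);  // right clear: turn right
        else                robot.setVel2(-100, 100);  // boxed in: rotate until U2 clears
      } else if (u3 < 300 || u1 < 300) {
        // Obstacle very near one side: small corrective turn away from it.
        robot.setVel2(u3 < 300 ? 110 : 90, u3 < 300 ? 90 : 110);
      } else {
        // Path clear: consume one command written by the voice program
        // ("voice_cmd.txt" is a hypothetical file name).
        std::ifstream cmd("voice_cmd.txt");
        std::string word;
        if (cmd >> word && word == "stop") robot.setVel2(0, 0);
      }
      robot.unlock();
    }

Reading command files inside the same timer step is what lets the voice and vision programs of Sections III and IV steer the robot without linking against the control program.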
III. VOICE CONTROL AND REACTION

While the Pioneer P3-DX gives us a platform on which to implement functions, it has no medium for receiving voice from the outside world. Therefore, we place a notebook on the Pioneer P3-DX, and the two communicate over RS-232; the notebook then provides devices such as a microphone and an amplifier. We discuss two parts in this phase: first, how to use voice to send commands to the robot; second, the reaction from the robot. Note that the reaction includes both the motion of the Pioneer P3-DX and the speech from the amplifier.

A. Voice Control

The most important problem here is how to make the computer understand what we are saying. We use software called mTTS, supplied by the Industrial Technology Research Institute (ITRI) [3]. mTTS has many text corpora and uses a language model to recognize the input speech; Fig. 3 shows a diagram of the mTTS process. mTTS can recognize not only Chinese words but also English words, though its English recognition is not good enough, so our project mainly uses Chinese for both input and output speech. Before anything else, we must define the words we want to recognize and train them into acoustic models. The software can then recognize human speech according to the acoustic models and its recognition algorithm.

The recognizer returns the predefined word closest to the input together with a confidence score. Based on these, the robot decides what to do, or does nothing if the confidence is too low or no word matches. Furthermore, we can require more than one keyword to decide which reaction to take: the more keywords that represent a sentence, the more precisely the recognized speech matches what we want.

B. Reaction

Once the software receives a command, the robot reacts accordingly. For example, for commands such as "go forward 4 meters", "go backward 2 meters", or even "robot, let's go play", it performs different kinds of reactions and speaks different sentences. But how can the robot speak? The Microsoft Speech SDK [4] provides functions for speaking from the computer through the amplifier, but we first have to transform the Chinese words into a form the SDK can recognize. We then design several sentences responding to the commands; when the robot receives a command, it immediately speaks the sentence we defined before (a minimal sketch of this speaking step appears after Section IV).

IV. GENERATE COMMANDS THROUGH VISION

For vision input, we use the feature matching technique to implement image recognition and gesture motion detection. So far our implementation is oriented around SURF (Speeded-Up Robust Features). SURF is a good tool for finding "interesting" points as features in images and describing each feature with a vector of 64 or 128 dimensions; features in different images are matched when the distance between their vectors is small enough.

Because of hardware and computing power limitations, we use only two frames of the time series for feature matching. To amplify the displacement of matched features, the frames are sampled at a fixed time interval. If a large enough proportion of the matched features, say 3/4 of them, shows the same displacement, we assert that the observed object is performing a specific motion, and we later translate the detected motion back into control commands. For image recognition, we compare features directly against static image files in a database. Since SURF is an implementation that improves on the SIFT descriptor, the recognition is scale invariant and rotation invariant. The two recognizers can run simultaneously without interference, and their outputs can be combined to generate more complex commands.

However, some difficulties remain. The first is the sensitivity of the gesture motion detection module. If detection is too sensitive, even a small movement in the detection area is detected, so minor background motion raises false alarms; if it is not sensitive enough, we can hardly give the robot any input command. This problem is also distance related, which is the second difficulty. With a fixed-focus camera, we can only guarantee good detection within a certain distance range. For example, using the camera that comes with the laptop we can do recognition easily at close range, but then many commands become almost useless (such as ordering the robot to come nearby when the user is already next to it). If we lengthen the focus, we lose so much gesture detail that we cannot observe enough useful features.

Another difficulty is the dilemma of whether to keep accepting commands while the robot is moving. Imagine that the user asks the robot to perform a right turn. From the robot's view, the world appears to turn left, so the robot is likely to detect features moving left even when they do not belong to a human, and thus generate another right-turn command. Of course we can always ignore visual input while an action is being performed, but that does not make sense when the robot is executing a wrong command and we want to stop it from doing something dangerous; in a safe environment, however, ignoring input during motion is the more sensible choice.

In addition, limited by the camera, this approach can hardly detect a motion that is too fast: motion blur strongly affects the feature extraction process and interrupts motion detection. A better camera may help solve the problem, and increasing the number of frames compared may also help, at the cost of more computing time.
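To make the two-frame voting scheme of this section concrete, here is a minimal sketch using OpenCV's SURF implementation for illustration. The report does not name a library, so the OpenCV classes, the Hessian threshold, and the 5-pixel dead band are our assumptions; the 3/4 agreement rule and the 64-dimensional descriptors come from the text.

    #include <opencv2/opencv.hpp>
    #include <opencv2/xfeatures2d.hpp>   // SURF lives in the contrib module

    // Decide whether two frames, sampled some time apart, show a consistent
    // horizontal motion: +1 for rightward, -1 for leftward, 0 for none.
    int detectHorizontalMotion(const cv::Mat &prev, const cv::Mat &curr)
    {
      cv::Ptr<cv::xfeatures2d::SURF> surf = cv::xfeatures2d::SURF::create(400.0);

      std::vector<cv::KeyPoint> kp1, kp2;
      cv::Mat d1, d2;
      surf->detectAndCompute(prev, cv::noArray(), kp1, d1);  // 64-d descriptors
      surf->detectAndCompute(curr, cv::noArray(), kp2, d2);
      if (d1.empty() || d2.empty()) return 0;

      // Match features whose descriptor vectors are close.
      cv::BFMatcher matcher(cv::NORM_L2);
      std::vector<cv::DMatch> matches;
      matcher.match(d1, d2, matches);

      // Vote on the sign of each matched feature's horizontal displacement.
      int left = 0, right = 0;
      for (const cv::DMatch &m : matches) {
        float dx = kp2[m.trainIdx].pt.x - kp1[m.queryIdx].pt.x;
        if (dx < -5.0f)      ++left;     // 5 px dead band against jitter
        else if (dx > 5.0f)  ++right;
      }

      // Require 3/4 of the matches to agree, as in the report.
      int total = static_cast<int>(matches.size());
      if (total == 0) return 0;
      if (right * 4 >= total * 3) return +1;
      if (left * 4 >= total * 3)  return -1;
      return 0;
    }

The dead band is one possible way to implement the sensitivity trade-off discussed above: widening it suppresses background jitter at the cost of missing small gestures.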
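Returning to the reaction step of Section III-B, the sketch below speaks one canned reply through the Microsoft Speech API (SAPI), which we take to be the "Microsoft SDK" of [4]. The keyword-to-sentence table is illustrative; the report's actual sentences are in Chinese and would need a Chinese voice installed.

    #include <windows.h>
    #include <sapi.h>
    #include <string>

    // Speak one predefined reply through the default audio device.
    // Returns false for an unknown keyword or a SAPI failure.
    bool speakReply(const std::wstring &keyword)
    {
      std::wstring sentence;
      if (keyword == L"forward")   sentence = L"Moving forward.";
      else if (keyword == L"play") sentence = L"Let's play a game!";
      else                         return false;  // no match: stay quiet

      if (FAILED(::CoInitialize(NULL))) return false;

      ISpVoice *voice = NULL;
      HRESULT hr = ::CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_ALL,
                                      IID_ISpVoice, (void **)&voice);
      if (SUCCEEDED(hr)) {
        voice->Speak(sentence.c_str(), SPF_DEFAULT, NULL);  // blocks until done
        voice->Release();
      }
      ::CoUninitialize();
      return SUCCEEDED(hr);
    }

In the full system, the same dispatch that picks the sentence would also write the motion command to the file polled by the control program's timer (Section II-D).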
V. CONCLUSION

This paper has presented several implementations for interacting with the Pioneer, using voice and gestures as input to control it and making the robot speak sentences so that playing with it is more fun. While working on this final project, we learned how hard it is to put into practice what this course taught us. Although it was not easy, we remain interested in building entertaining interactions with robots.

REFERENCES

[1] http://www.activrobots.com/ROBOTS/p2dx.html
[2] http://robots.mobilerobots.com/wiki/ARIA
[3] http://www.itri.org.tw/index.asp
[4] http://www.microsoft.com/downloads/details.aspx?FamilyId=A55B6B43-E24F-4EA3-A93E-40C0EC4F68E5&displaylang=en

JOB DEFINITION

This project has three major parts; each part has one person in charge, and we completed them together. 陳彥璋 is responsible for the first part, mobile robot control; 涂宗瑋 works on the voice control part; 黃雨喬 is responsible for the vision part; and 陳冠瑋 comes up with ideas for how we interact with the Pioneer and integrates all team members' reports.