Interaction with Pioneer Robot by Voice and Vision
Team 5
陳彥璋(B95902021) 涂宗瑋(b95902052) 黃雨喬(B95902066) 陳冠瑋(B95902094)
Abstract – Since we learned a great deal from the Robotics course instructed by Prof. Li-Chen Fu this semester, we wanted to implement some interesting interactions with a real robot. This paper presents our final project, in which we wrote several applications for controlling the Pioneer P3-DX by voice, vision, and gestures.

I. INTRODUCTION

Here we use the Pioneer P3-DX to perform a series of controls and interactions by voice, vision, and gestures. The Pioneer P3-DX, manufactured by MobileRobots Inc. (ActivMedia Robotics), is a mobile robot that provides a platform for various research purposes.

To interact with the Pioneer P3-DX, we first install a camera and a microphone on it, responsible for its visual and audio perception; for speaking we also need a speaker. Since the Pioneer itself is just a platform for movement, the simplest way to do this is to put a laptop on it and install all the necessary devices and software on the laptop. After setting up all the peripherals and software, we have to design the robot's behaviors. First, we let the robot perform basic movements such as moving forward or backward and turning left or right; the robot's action depends on the different conditions we designed. When it receives a command such as "go forward" or "go backward" through voice, it must execute the command immediately. The Pioneer P3-DX we used has three wheels: one caster wheel and two fixed drive wheels that control the direction of the whole robot, so it can rotate in place. On the other hand, a human can also interact with the Pioneer P3-DX through some basic gestures. To make the interaction more interesting, we devised some amusing scenarios and little games for the user to play with the robot.

In the next section we briefly describe how we control the Pioneer P3-DX and perform obstacle avoidance. In Section III we show how we control the robot by voice and how the robot reacts through the speaker and by other means. In Section IV we develop a simple program for controlling the robot with basic gestures. The conclusion is given in Section V.
II. MOBILE ROBOT CONTROL

In this project, we use the Pioneer P3-DX as our mobile robot platform. The Pioneer P3-DX has eight ultrasonic transducer (sonar) sensors arranged to provide 180-degree forward coverage; the sensors read ranges from 15 cm to approximately 5 m [1].
A. Set up the Pioneer P3-DX

First, we install ARIA (Advanced Robot Interface for Applications) [2], a C++ library for the mobile robot platform, on the laptop. Then we install the RS-232 driver, which lets our user programs control and communicate with the Pioneer P3-DX. Fig. 1 shows how we control the Pioneer P3-DX.

(Fig. 1: how we control the Pioneer P3-DX.)
B. Control the robot

After setting up, we start to program the Pioneer P3-DX. We declare an object called "robot", then call the function setVel2(left_wheel_velocity, right_wheel_velocity) to control the Pioneer P3-DX as follows (a minimal driving sketch is given after the list):

• Go forward: setVel2(100, 100)
• Go backward: setVel2(-100, -100)
• Turn left: setVel2(-100, 100)
• Turn right: setVel2(100, -100)
• Stop: setVel2(0, 0)
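For illustration, a minimal sketch of such a program with ARIA follows. The connection boilerplate (ArSimpleConnector and friends) is our assumption and may differ from the project's actual code; setVel2() takes left and right wheel velocities in mm/sec.

#include "Aria.h"

int main(int argc, char **argv)
{
  Aria::init();
  ArArgumentParser parser(&argc, argv);
  ArSimpleConnector connector(&parser);
  ArRobot robot;

  // Connect to the Pioneer P3-DX over the serial (RS-232) link.
  if (!connector.connectRobot(&robot)) {
    ArLog::log(ArLog::Terse, "Could not connect to the robot.");
    Aria::exit(1);
  }
  robot.runAsync(true);      // process robot I/O in a background thread
  robot.enableMotors();

  robot.lock();
  robot.setVel2(100, 100);   // go forward
  robot.unlock();
  ArUtil::sleep(2000);       // ...for two seconds

  robot.lock();
  robot.setVel2(-100, 100);  // turn left in place
  robot.unlock();
  ArUtil::sleep(1000);

  robot.lock();
  robot.setVel2(0, 0);       // stop
  robot.unlock();

  Aria::shutdown();
  return 0;
}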
C. Obstacle avoidance

Having covered the basics of robot movement, we challenge the more advanced and important issue of obstacle avoidance. Although ARIA already provides this feature, we decided to implement it without using it. We divide the ultrasonic transducer sensors into three groups, U1, U2, and U3, as Fig. 2 shows.

(Fig. 2: grouping of the sonar sensors into areas U1, U2, and U3.)

Our algorithm is simple: the mobile robot may collide with a barrier when there are obstacles in area U2. If we detect an obstacle within 1 m in the U2 area, we check whether there are obstacles within 2 m in the U3 area; if not, the robot makes a left turn. Otherwise, we check whether there are obstacles within 2 m in the U1 area; if not, the robot makes a right turn. Otherwise, the robot rotates in place until there is no obstacle within 1 m in the U2 area. Besides, if there are obstacles very near the U1 or U3 area while U2 has no barrier within 1 m, we correct the robot's direction with a small turn. A sketch of this decision rule follows.
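The following sketch implements the rule above. Our grouping of the eight sonars (0-1 as U1 on the right, 2-5 as U2 in front, 6-7 as U3 on the left) and the 30 cm "very near" threshold are assumptions; adjust them to match Fig. 2. ArRobot::getSonarRange() returns millimeters.

#include "Aria.h"
#include <algorithm>

// One pass of the avoidance rule; call it periodically (e.g., from the timer).
void avoidObstacles(ArRobot &robot)
{
  // Smallest range reported by the sonar group [first, last].
  auto minRange = [&robot](int first, int last) {
    int best = 1000000;
    for (int i = first; i <= last; ++i)
      best = std::min(best, robot.getSonarRange(i)); // range in mm
    return best;
  };

  robot.lock();
  int u1 = minRange(0, 1); // right group (assumed indices)
  int u2 = minRange(2, 5); // front group (assumed indices)
  int u3 = minRange(6, 7); // left group (assumed indices)

  if (u2 < 1000) {                // obstacle within 1 m ahead
    if (u3 >= 2000)
      robot.setVel2(-100, 100);   // left clear within 2 m: turn left
    else if (u1 >= 2000)
      robot.setVel2(100, -100);   // right clear within 2 m: turn right
    else
      robot.setVel2(-100, 100);   // boxed in: rotate until U2 clears
  } else if (u1 < 300 || u3 < 300) {
    // Obstacle very near one side: correct heading with a small turn.
    if (u1 < u3) robot.setVel2(80, 120);  // veer away from the right
    else         robot.setVel2(120, 80);  // veer away from the left
  } else {
    robot.setVel2(100, 100);      // path clear: keep going forward
  }
  robot.unlock();
}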
D. Other techniques

The ultrasonic transducer sensors on the Pioneer P3-DX deliver sensing data four times per second, so we set a timer in our program that refreshes all data received from the Pioneer P3-DX every 250 ms. In the same timer function, we also read two files that are written by the voice control program and the visual control program. Communicating by reading and writing files allows our mobile robot to be controlled easily by other techniques, as sketched below.
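A minimal sketch of that polling loop, assuming the two programs write one-line commands to files named voice_cmd.txt and vision_cmd.txt (hypothetical names):

#include "Aria.h"
#include <fstream>
#include <string>

// Read the latest one-line command written by another program.
static std::string readCommand(const char *path)
{
  std::ifstream in(path);
  std::string cmd;
  if (in) std::getline(in, cmd);
  return cmd;
}

// Poll the shared files every 250 ms, matching the sonar update rate.
void commandLoop(ArRobot &robot)
{
  while (robot.isRunning()) {
    std::string voice  = readCommand("voice_cmd.txt");
    std::string vision = readCommand("vision_cmd.txt");

    robot.lock();
    if      (voice == "forward"  || vision == "forward")  robot.setVel2(100, 100);
    else if (voice == "backward" || vision == "backward") robot.setVel2(-100, -100);
    else if (voice == "stop"     || vision == "stop")     robot.setVel2(0, 0);
    robot.unlock();

    ArUtil::sleep(250); // the sensors report four times per second
  }
}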
III. VOICE CONTROL AND REACTION

While the Pioneer P3-DX gives us a platform on which to implement functions, it has no medium of its own for receiving voice from the outside world. Therefore, we place a notebook on the Pioneer P3-DX, and the two communicate over RS-232. We then have many kinds of devices, such as a microphone and an amplifier, to utilize. We discuss two parts in this phase: the first is how to use voice to send commands to the robot; the second is the reaction from the robot. Note that the reaction includes both the mobile reaction of the Pioneer P3-DX and the speech from the amplifier.
A. Voice Control

The most important topic in this part is how to let the computer understand what we are saying. We use software supplied by the Industrial Technology Research Institute (ITRI) [3] called mTTS. mTTS has many text corpora and uses a language model to recognize the input speech; Fig. 3 shows a diagram of the mTTS process. mTTS can recognize not only Chinese words but also English words (the performance of English recognition is not good enough, so our project mainly uses Chinese for both input and output speech). Before anything else, we must define the words we want to recognize and train them into acoustic models. After that, the software can recognize input speech spoken by a human according to the acoustic models and the recognition algorithm.

The recognizer then returns the predefined word closest to the input together with its accuracy. Based on this, the robot determines what to do, or does nothing if the accuracy is too low or no word matches. Furthermore, we can require more than one keyword to determine which reaction should be taken: the more keywords a sentence must match, the more precisely the recognized speech reflects what we intended. A hypothetical sketch of this dispatch logic follows.
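Since the mTTS programming interface is not shown in this report, the recognition result below is a made-up struct; the sketch only illustrates the keyword-and-accuracy dispatch described above, feeding the shared file from Section II.D.

#include <fstream>
#include <string>
#include <vector>

// Hypothetical recognition result; the real mTTS API will differ.
struct Recognition {
  std::vector<std::string> keywords; // keywords found in the utterance
  double accuracy;                   // 0.0 - 1.0
};

// Map a recognition result to a one-word command for the robot program.
void dispatch(const Recognition &r)
{
  if (r.accuracy < 0.6) return;      // rejection threshold: our assumption

  auto has = [&r](const std::string &k) {
    for (const std::string &w : r.keywords)
      if (w == k) return true;
    return false;
  };

  // Requiring two keywords ("robot" + a direction) cuts false triggers.
  std::string cmd;
  if      (has("robot") && has("forward"))  cmd = "forward";
  else if (has("robot") && has("backward")) cmd = "backward";
  else if (has("robot") && has("stop"))     cmd = "stop";
  else return;                              // no match: do nothing

  std::ofstream("voice_cmd.txt") << cmd;    // picked up by the 250 ms timer
}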
B. Reaction

Once a command is received by the software, the robot reacts according to it. For example, if it receives a command such as "go forward 4 meters", "go backward 2 meters", or even "robot, let's go play", it performs a different kind of reaction and speaks a different sentence. But how can the robot speak? The Microsoft SDK (Software Development Kit) [4] provides the function to speak from the computer through the amplifier, but we first have to transform the Chinese words into a form the SDK can recognize. Then we design several sentences responding to the commands; when the robot receives a command, it immediately speaks what we defined beforehand, as the sketch below shows.
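Assuming the speech interface exposed by the SDK in [4] is SAPI (our reading; the report does not say), a minimal text-to-speech sketch looks like this:

#include <sapi.h> // Microsoft Speech API (SAPI)

// Speak one sentence through the default audio output (the amplifier).
bool speak(const wchar_t *sentence)
{
  if (FAILED(::CoInitialize(NULL))) return false;

  ISpVoice *voice = NULL;
  HRESULT hr = ::CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_ALL,
                                  IID_ISpVoice, (void **)&voice);
  if (SUCCEEDED(hr)) {
    hr = voice->Speak(sentence, SPF_DEFAULT, NULL); // blocks until spoken
    voice->Release();
  }
  ::CoUninitialize();
  return SUCCEEDED(hr);
}

// Example: speak(L"OK, I am going forward four meters.");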
IV. GENERATING COMMANDS THROUGH VISION
For vision input, we use feature matching to implement image recognition and gesture motion detection. So far our implementation is based on SURF (Speeded-Up Robust Features). SURF is a good tool for analyzing the "interesting" points of an image as features, describing each single feature with a vector of 64 or 128 dimensions. Features in different images are matched if the distance between their vectors is small enough; a matching sketch follows.
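The report does not name a vision library; assuming OpenCV's SURF implementation (in the xfeatures2d contrib module), the matching step might look like this:

#include <opencv2/opencv.hpp>
#include <opencv2/xfeatures2d.hpp>
#include <vector>

// Match SURF features between two frames; fills the keypoint lists and
// returns the matches whose descriptor distance is small enough.
std::vector<cv::DMatch> matchFrames(const cv::Mat &prev, const cv::Mat &curr,
                                    std::vector<cv::KeyPoint> &kpPrev,
                                    std::vector<cv::KeyPoint> &kpCurr)
{
  // Hessian threshold 400 is a common default, not a value from the report.
  cv::Ptr<cv::xfeatures2d::SURF> surf = cv::xfeatures2d::SURF::create(400);

  cv::Mat descPrev, descCurr;
  surf->detectAndCompute(prev, cv::noArray(), kpPrev, descPrev);
  surf->detectAndCompute(curr, cv::noArray(), kpCurr, descCurr);

  // Brute-force L2 matching of the 64/128-dimensional descriptors.
  cv::BFMatcher matcher(cv::NORM_L2);
  std::vector<cv::DMatch> matches, good;
  matcher.match(descPrev, descCurr, matches);
  for (const cv::DMatch &m : matches)
    if (m.distance < 0.2f)   // distance threshold: our assumption
      good.push_back(m);
  return good;
}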
Because of hardware and computing-power limitations, we match features between only two frames of the time series. To amplify the displacement of the matched features, the frames are sampled at a specified time interval. If a large enough proportion of the matched features, say 3/4 of them, show the same displacement, we assert that the observed object is performing a specific motion, and we later translate the detected motion back into control commands; a sketch of this vote is given below. In image recognition, on the other hand, there is pure feature comparison against static image files in a database. Since SURF is an implementation that improves on the SIFT descriptor, the recognition is scale invariant and rotation invariant. The two kinds of recognition can be processed simultaneously without interference, and they can be combined to generate more complex commands.
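A sketch of the displacement vote, building on the matchFrames() helper above (the 3/4 proportion comes from the text; the direction bins and dead zone are our assumptions):

#include <opencv2/opencv.hpp>
#include <string>
#include <vector>

// Decide whether the matched features agree on one dominant motion.
// Returns "left", "right", or "" when no direction wins 3/4 of the matches.
std::string detectMotion(const std::vector<cv::KeyPoint> &kpPrev,
                         const std::vector<cv::KeyPoint> &kpCurr,
                         const std::vector<cv::DMatch> &matches)
{
  if (matches.empty()) return "";

  int left = 0, right = 0;
  for (const cv::DMatch &m : matches) {
    float dx = kpCurr[m.trainIdx].pt.x - kpPrev[m.queryIdx].pt.x;
    if (dx < -5.0f)     ++left;   // 5-pixel dead zone: our assumption
    else if (dx > 5.0f) ++right;
  }

  const int needed = (3 * (int)matches.size()) / 4; // the 3/4 proportion
  if (left >= needed)  return "left";
  if (right >= needed) return "right";
  return "";                       // no consistent motion detected
}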
However, there are still some difficulties to overcome. The first is the sensitivity of the gesture motion detection module. If the detection is too sensitive, even a little movement in the detection area will be picked up, so minor background motion produces false alarms; if it is not sensitive enough, we can hardly give any input command. This problem is also distance-related, which is the second difficulty: with a fixed-focus camera, we can guarantee good detection only within some distance range. For example, with the camera that comes with the laptop we can do recognition easily, but then many commands become almost useless (such as ordering the robot to come nearby when the user is already next to it!). If we lengthen the focal length instead, we lose many gesture details and cannot observe enough useful features.

Another difficulty is the dilemma of whether or not to keep receiving commands. Imagine the user asking the robot to perform a right turn. From the robot's view, the world appears to turn left, so it is very likely that the robot detects some features moving left that do not belong to a human, thereby generating another turn-right command. Of course we can always ignore visual input while an action is being performed, but that does not always make sense when the robot is executing a wrong command and we want to stop it from doing something dangerous. In a safe environment, though, that solution is more reasonable.

In addition, limited by the camera, this approach can hardly detect a motion that is too fast, because motion blur strongly affects the feature extraction process and thus interrupts motion detection. A better camera may help solve the problem, and increasing the number of frames compared may also help, at the cost of more computing time.

V. CONCLUSION

This paper has presented our implementation of interacting with the Pioneer using voice and gestures as input to control it. We also have the robot speak sentences so that playing with it is more fun. While working on this final project, we learned how hard it really is to put what this course taught us into practice. Although it is not easy, we remain interested in building fun interactions with robots.
REFERENCES
[1] http://www.activrobots.com/ROBOTS/p2dx.html
[2] http://robots.mobilerobots.com/wiki/ARIA
[3] http://www.itri.org.tw/index.asp
[4] http://www.microsoft.com/downloads/details.aspx?FamilyId=A55B6B43-E24F-4EA3-A93E-40C0EC4F68E5&displaylang=en
JOB DEFINITION
This project has three major parts; each part had a person in charge, and we completed them together. 陳彥璋 was responsible for the first part, mobile robot control; 涂宗瑋 worked on the voice control part; 黃雨喬 was responsible for the vision part; and 陳冠瑋 came up with ideas for how we interact with the Pioneer and integrated all the teammates' reports.