Boletín IIE
octubre-diciembre-2014
Artículo de investigación
Optimal robot navigation for inspection and
surveillance in electric substations
Alberto Reyes Ballesteros1, Ángel Félix2, César González2 and Eduardo Islas Pérez1
Paper originally presented at the CIGRE International Symposium, Paris, August 2012.
Abstract
In this work we use the factored Markov Decision Process (MDP) approach as an alternative for implementing optimal navigation skills under uncertainty for a robot that inspects a substation installation. The main contribution of this work is that the optimization model is approximated using machine learning tools. We tested our technique in a virtual substation where a robot navigates over regions and equipment of different costs. After a rapid, random exploration of the environment, a model can be learned and solved for all scenarios. The decisions obtained in each case are reasonable, guiding the robot toward regions of high reward and away from regions of low reward. Our approach can be implemented on a physical robotic platform and applied to the power industry for surveillance and search purposes.
1 Instituto de Investigaciones Eléctricas
2 ITESM México City Campus
Introduction
The problem of detecting intruders and equipment faults in electric substations has recently been addressed using mobile robots carrying visible-light and infrared thermographic cameras (Rui et al., 2010). The robot moves inside the substation and positions itself to collect images of potentially critical points. The motion architecture of the robot they implemented has two driven wheels and two omni-directional wheels. To perform tasks such as localization and map building, mobile robots estimate their position from sensing and motion using an artificial-landmark subsystem. For the communication system they used wireless communication over an Ethernet channel to connect the robot to a host PC.
One of the most important and challenging components of a mobile robot is the navigation system: the robot must be able to optimize the use of its resources while avoiding risky zones and reaching inspection goals or locations of interest. In general, the solution described above is robust; however, to navigate reliably in a real environment, autonomy and rationality are always required.
An appropriate framework for this type of problem is based on Markov Decision Processes, which have become a standard method for representing uncertainty in decision-theoretic planning. In this approach, the use of factored representations (Boutilier et al., 1999) allows the robot state space, the robot dynamics, and the utility assignment system to be specified very compactly using dynamic Bayesian networks and decision trees. This specification is an approximate factored Markov Decision Process (MDP) model that can be easily solved using traditional techniques (Puterman, 1994) to obtain an optimal policy.
Factored Markov decision processes
A Markov decision process (MDP) (Puterman, 1994) models a sequential decision problem in which a system evolves in time and is controlled by an agent. The system dynamics are governed by a probabilistic transition function F that maps states S and actions A to new states S'. At each time step, the agent receives a reward R that depends on the current state s and the applied action a. Thus, solving an MDP means finding a recommendation strategy, or policy, that maximizes the expected reward over time while dealing with the uncertainty in the effects of an action.
Formally, an MDP is a tuple M = <S, A, F, R>, where S is a finite set of states {s1, ..., sn}; A is a finite set of actions for all states; F: A × S × S → [0, 1] is the state transition function, specified as a probability distribution, where the probability of reaching state s' by performing action a in state s is written F(a, s, s'); and R: S × A → ℝ is the reward function, where R(s, a) is the reward the agent receives if it takes action a in state s.
For the discrete discounted infinite-horizon case with any given discount factor γ, there is a policy π* that is optimal regardless of the starting state and that satisfies the Bellman equation (Puterman, 1994):

V^π(s) = max_a { R(s, a) + γ ∑_{s'∈S} F(a, s, s') V^π(s') }     (1)
Two methods for solving these equations and finding an optimal policy for an
MDP are: (a) dynamic programming and (b) linear programming.
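Of these, dynamic programming (value iteration) is the route used later in this paper. As a rough sketch of the idea, the following fragment runs value iteration on a tiny hand-made chain MDP; the states, transition probabilities, and rewards are illustrative placeholders, not taken from the paper:

```python
# Minimal value iteration sketch for a discounted MDP (dynamic programming).
import numpy as np

def value_iteration(F, R, gamma=0.9, eps=1e-6):
    """F[a, s, s2] = transition probability, R[s, a] = immediate reward."""
    n_actions, n_states, _ = F.shape
    V = np.zeros(n_states)
    while True:
        # Q(s, a) = R(s, a) + gamma * sum_s' F(a, s, s') V(s')  -- equation (1)
        Q = R + gamma * np.einsum("ast,t->sa", F, V)
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < eps:
            return V_new, Q.argmax(axis=1)  # values and a greedy policy
        V = V_new

# Two actions ("stay", "advance") on a 3-state chain; state 2 is the goal.
F = np.zeros((2, 3, 3))
F[0] = np.eye(3)                      # "stay" keeps the state
F[1] = np.array([[0.1, 0.9, 0.0],     # "advance" moves right w.p. 0.9
                 [0.0, 0.1, 0.9],
                 [0.0, 0.0, 1.0]])
R = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]])  # reward 1 in the goal
V, policy = value_iteration(F, R)
print(policy)  # the greedy policy advances toward the goal
```

With γ = 0.9 the goal state's value converges to 1/(1 − γ) = 10, and the greedy policy selects "advance" in the non-goal states.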
In a factored MDP (Boutilier et al., 1999), the set of states is described via a set of random variables X = {X1, ..., Xn}, where each Xi takes on values in some finite domain Dom(Xi). A state s defines a value xi ∈ Dom(Xi) for each variable Xi. The transition model can be exponentially large if it is represented explicitly as matrices; however, the frameworks of dynamic Bayesian networks (DBN) (Dean and Kanazawa, 1989) and decision trees (Quinlan, 1986) give us the tools to describe the transition model and the reward function concisely.
The MDP model can be learned from data gathered during a random exploration of a simulated environment. We assume that the agent can explore the state space and that for each state-action cycle it can receive some immediate reward. Based on this random exploration, the reward and transition functions are induced.
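In tabular form, this induction step amounts to counting; the sketch below is a hedged illustration of the idea (the paper itself learns a decision-tree reward model with Weka and a DBN transition model with Elvira, not frequency tables):

```python
# Hedged sketch: inducing tabular transition and reward models from random
# exploration samples by counting; state and action names are placeholders.
from collections import defaultdict

def learn_model(samples):
    """samples: iterable of (s, a, s_next, r) tuples from random exploration."""
    counts = defaultdict(lambda: defaultdict(int))
    rewards = defaultdict(list)
    for s, a, s_next, r in samples:
        counts[(s, a)][s_next] += 1
        rewards[(s, a)].append(r)
    # F(a, s, s') as relative frequencies; R(s, a) as the sample mean.
    F = {k: {s2: n / sum(v.values()) for s2, n in v.items()}
         for k, v in counts.items()}
    R = {k: sum(v) / len(v) for k, v in rewards.items()}
    return F, R

samples = [("s0", "fwd", "s1", 0.0), ("s0", "fwd", "s1", 0.0),
           ("s0", "fwd", "s0", 0.0), ("s1", "fwd", "goal", 300.0)]
F, R = learn_model(samples)
print(F[("s0", "fwd")])  # relative frequencies over observed successor states
```

The factored representations used in the paper generalize over such counts, which keeps the model compact when the state space is large.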
Optimal robot navigation using factored MDPs
To illustrate our technique, consider the simulated environment shown in figure 1(a). In this setting, goals are represented as dark-colored squares with positive immediate reward (+300), and non-desirable regions as light-colored squares with negative reward (-300). The remaining regions in the navigation area receive a reward value of 0 (black). Rewarded regions are multi-valued, and the number of rewarded squares is also variable. The robot sensor system includes x-y position, angular orientation, and navigation-bounds detection. The possible actions in this experiment were: go forward, clockwise rotation (right turn), counterclockwise rotation (left turn), and the null action. To simulate real motion, 10% Gaussian noise was added to the robot effectors. Following the general algorithm for learning factored models presented in this paper, the first stage in building a decision model is to collect samples from random exploration. Figure 1(b) illustrates the trace of the exploration performed.
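The noisy effector model can be sketched in a few lines; the fragment below simulates a "go forward" action perturbed by 10% Gaussian noise as described above (the step size, state representation, and function names are illustrative assumptions, not taken from the paper):

```python
# Hedged sketch of a simulated "go forward" effector with 10% Gaussian noise.
import math, random

def go_forward(x, y, theta, step=1.0, noise=0.10, rng=random):
    """Advance one step along the heading, perturbed by Gaussian noise."""
    d = step * (1.0 + rng.gauss(0.0, noise))     # 10% noise on the distance
    return x + d * math.cos(theta), y + d * math.sin(theta), theta

random.seed(0)
x, y, theta = go_forward(0.0, 0.0, 0.0)
print(round(x, 3), y, theta)  # x lands near 1.0; y and theta are unchanged
```

Repeatedly applying randomly chosen actions of this kind and logging (state, action, next state, reward) tuples produces exactly the kind of exploration trace shown in figure 1(b).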
In the next step, the reward function is mapped from the data and represented as the decision tree illustrated in figure 3, where it can be observed that the reward depends only on the x-y position. The reward function was obtained using Weka (Witten, 2005). Finally, to complete the decision model, the transition function is induced from the data. The transition function is represented using a 2-step Bayesian network and is induced using Elvira (Elvira Consortium, 2002). Figure 2 shows the transition function for the action Goforward, illustrating how, if the robot is located at position (s0, s0) with orientation s0 and executes that action, then with a joint probability of 0.7 it reaches position (s1, s0) with orientation s0.
The solution of the problem, obtained using the resulting MDP model and the value iteration algorithm (Puterman, 1994), is illustrated in figure 1(a). With the idea of using a topological representation of the environment in upcoming experiments, we expressed the policy found using a topological map (figure 4). The method successfully guides the robot to a likely position with the highest reward. For instance, assuming that the robot has orientation s2 at position (s3, s2), the optimal action commands the robot to turn right until it reaches orientation s1. In this new state, the robot simply goes forward to achieve the goal. A limitation of MDP models is that they do not value the quality of sensor information in a sound way; this motivates the partially observable extension discussed in the conclusions.
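The turn-right-then-forward behavior described above can be sketched as a lookup in a policy table over the discrete states of figure 4. The step model and the table entries below are illustrative assumptions (only the two states needed for this example are encoded, and the motion conventions are placeholders):

```python
# Hedged sketch: executing a learned policy on a discrete topological map.
# States are (x, y, orientation) triples in the s0, s1, ... naming of figure 4.
def run_policy(policy, state, goal, max_steps=10):
    trace = [state]
    while state != goal and len(trace) <= max_steps:
        state = step(state, policy[state])
        trace.append(state)
    return trace

def step(state, action):
    x, y, o = state
    if action == "right":                 # clockwise rotation: s2 -> s1
        return (x, y, "s%d" % ((int(o[1]) - 1) % 4))
    if action == "forward" and o == "s1": # heading s1 advances in +y (assumed)
        return (x, "s%d" % (int(y[1]) + 1), o)
    return state

policy = {("s3", "s2", "s2"): "right",    # turn right until orientation s1
          ("s3", "s2", "s1"): "forward"}  # then go forward to the goal
trace = run_policy(policy, ("s3", "s2", "s2"), goal=("s3", "s3", "s1"))
print(trace[-1])  # ('s3', 's3', 's1')
```

The point is that once the MDP is solved, online control reduces to this kind of constant-time table lookup, with no further planning at execution time.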
Figure 1. a) Continuous navigation space. b) Exploration trace.
Figure 2. Transition model for action 0 (Go forward). If the robot is located at position (s0, s0) with orientation s0, then after executing the action it reaches position (s1, s0) with orientation s0. The joint probability of this transition is 0.7.
Figure 3. Reward model for the simulated navigation area.
Tools for robot simulation and virtual substation design
The Player-Stage tool
Player is a network server for robot control that provides a clean and simple interface to robot sensors
and actuators over the IP network. In this tool, a
client program talks to Player over a TCP socket to
allow reading data from sensors, writing commands
to actuators, and configuring devices on the fly.
Player supports a variety of robot hardware, such as the ActivMedia Pioneer 2 or the RWI platforms; several other robots and many common sensors are supported as well. Player runs on Linux (PC and embedded), Solaris, and *BSD, and it is designed to be language- and platform-independent. The client program can run on any machine that has a network connection to the robot, and it can be written in any language that supports TCP sockets.
Player makes no assumptions about how robot control programs are structured, so a client can be a highly concurrent multi-threaded program or a simple read-think-act loop. Player is also designed to support virtually any number of clients: any client can connect to and read sensor data from (and even write motor commands to) any instance of Player on any robot. Aside from distributed sensing for control, Player can also be used for monitoring experiments. The behavior of the server itself can also be configured on the fly.

Figure 4. Simplified resulting policy on an equivalent discrete topological map. Assuming that the robot has orientation s2 at position (s3, s2), the optimal action commands the robot to turn right until it reaches orientation s1. In this new state, the robot simply goes forward to achieve the goal.
Stage is a robot simulator that provides a virtual world populated by mobile
robots and sensors, along with various objects for the robots to sense and manipulate. Stage is designed to support research into multi-agent autonomous
systems, so it provides fairly simple, computationally cheap models of lots of
devices rather than attempting to emulate any device with great fidelity. Stage
is intended to be just realistic enough that users can move controllers between Stage robots and real robots, while still being fast enough to simulate large populations. Player also contains several useful 'virtual devices', including sensor pre-processing and sensor-integration algorithms that help rapidly build powerful robot controllers. Figure 5 shows a couple of virtual objects modeled using the Stage tool.
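For reference, a Stage simulation of this kind is described by a world file. The fragment below only sketches the general shape of such a file; the model names, sizes, poses, and file name are placeholders, not the actual IIE configuration:

```
# hypothetical substation.world (placeholder values throughout)
window
(
  size [ 700 500 ]        # GUI window in pixels
  scale 20                # pixels per meter
)

# static obstacle standing in for a piece of substation equipment
model
(
  name "transformer"
  size [ 2.0 2.0 2.5 ]
  pose [ 4 3 0 0 ]
)

# differential-drive robot model
position
(
  name "robot"
  drive "diff"
  pose [ 0 0 0 0 ]
)
```

A Player configuration file then binds this world to the server, and clients connect to it exactly as they would to a physical robot.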
A more detailed description of the Player/Stage tool can be found in (Gerkey et al., 2003).
SiDSED: A system for designing electrical distribution substations
At the IIE we are using virtual reality tools to develop three-dimensional environments with applications in the electrical sector (Islas et al., 2004). One of these systems is the creation of virtual electrical substations to facilitate the design process of new substations (Islas et al., 2010). In this paper we propose using virtual substations to test new control and navigation algorithms for virtual and physical robots. These substations are designed using different abstraction levels. The first level consists of building blocks of basic elements, such as transformers, high-voltage circuit breakers, lightning rods, structures, foundations, duct banks, etc. The second level contains building blocks formed by elements of the first level; for instance, the transformer-perch building block is composed of a transformer, a transition perch, foundations, and groundings, and the H structure with disconnect switches is formed by an H structure, disconnect switches, a motor operator, foundations, and groundings. Finally, the third level is formed by building blocks at the highest abstraction level, for example: line bay, transformer bay, control room, edge wall, etc.
Substation design
Figure 6 shows an example of an H-arrangement virtual substation, which is the most commonly used configuration. It was built mainly from elements of the third level of abstraction.
Figure 5. Virtual objects modeled using Stage: an ActivMedia Pioneer 2 virtual model (left) and a virtual 2.5D navigation world (right).
Nowadays we have a library of almost a hundred building blocks in the first level of abstraction, 20 elements in the second level, and 10 in the highest and most complex level of abstraction. Additionally, three substations with different arrangements were developed (H, ring, and main bus). With the building-blocks library we can design many other arrangements as well as generate fully new substation designs for electrical distribution. We can therefore develop different configurations in order to test new algorithms for robot control and navigation.
Experimental results
Figure 6. H Arrangement for an Electrical Substation.
We built a simulated robot immersed in a virtual world using the Player/Stage tool. The MDP controller was developed in Matlab using a special toolbox for dynamic programming. In this implementation, states are physical locations in the substation, and the possible actions are orthogonal movements to the right, left, up, or down. The risky zones and inspection areas are associated with an optimization function that leads the robot along convenient navigation paths.
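A minimal sketch of this state-action-reward layout is given below, assuming a small grid with placeholder cell coordinates for the goals and risky zones (the actual substation layout and the Matlab toolbox are not reproduced here):

```python
# Hedged sketch of the experiment's layout: grid cells as states, orthogonal
# moves as actions, +300 at inspection goals and -300 at risky zones.
GOALS = {(4, 4)}          # inspection areas: reward +300 (placeholder cells)
RISKY = {(2, 2), (2, 3)}  # risky zones: reward -300 (placeholder cells)

def reward(cell):
    if cell in GOALS:
        return 300.0
    if cell in RISKY:
        return -300.0
    return 0.0

def actions(cell, width=5, height=5):
    """Orthogonal moves that stay inside the navigation grid."""
    x, y = cell
    moves = {"right": (x + 1, y), "left": (x - 1, y),
             "up": (x, y + 1), "down": (x, y - 1)}
    return {a: c for a, c in moves.items()
            if 0 <= c[0] < width and 0 <= c[1] < height}

print(reward((4, 4)), reward((2, 2)), sorted(actions((0, 0))))
```

Feeding this reward map and action set into a value iteration solver yields a policy that steers the robot around the risky cells and toward the inspection areas, which is the behavior evaluated in figure 8.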
Figure 7. Virtual scaled environment: 3D substation model (left), 2D robot approaching the high-power transformer (center), 2.5D robot approaching the high-power transformer (right).
We compared this technique with the wander algorithm (similar to the one used in Rui et al., 2010) and noticed that the utility increases progressively and exponentially, becoming steady when the robot reaches the inspection area. The wander algorithm never reaches the maximum utility value (see figure 8).
Conclusions and future work
We plan to implement these features on a Pioneer 2 robot and to extend the MDP models to deal with partially observable environments. In this way, the robot system will be much more effective at reasoning about the uncertainty in its belief state and can employ extended-range sensing actions when deemed necessary. One of the original motivations of this work has been its application to surveillance, search, and monitoring tasks in indoor radiated areas of nuclear power plants. With these techniques it is possible to guide robot navigation while maximizing safety and minimizing both risk and execution time, all in an uncertain world.
We also plan to add surveillance and inspection routines under an optimization framework using a multiagent approach.
Some of the benefits obtained from the use of virtual environments relate mainly to cost savings: the virtual environments serve as tools to test control and navigation algorithms and to verify them before deploying them on real robots.
Figure 8. Utility value function for a robot using an MDP controller and a
wander algorithm.
Bibliography
Rui G., Lei H., Yong S. and Mingrui W. A mobile robot for inspection of substation equipment. In Proceedings of the 1st International Conference on Applied Robotics for the Power Industry (CARPI 2010), Montreal, 2010.
Puterman M. Markov Decision Processes. New York: Wiley, 1994.
Gerkey B., Vaughan R. T. and Howard A. The Player/Stage Project: tools for multi-robot and distributed sensor systems. In Proceedings of the 11th International Conference on Advanced Robotics (ICAR 2003), pages 317-323, Coimbra, Portugal, June 2003.
Boutilier C., Dean T. and Hanks S. Decision-theoretic planning: structural assumptions and computational leverage. Journal of AI Research, vol. 11, pp. 1–94, 1999.
Dean T. and Kanazawa K. A model for reasoning about persistence and causation. Computational
Intelligence, vol. 5, pp. 142–150, 1989.
Quinlan J. R. Induction of decision trees. Machine Learning,
1(1):81–106, 1986.
Darwiche A. and Goldszmidt M. Action networks: a framework for reasoning about actions and change under uncertainty. In Proceedings of the Tenth Conference on Uncertainty in AI (UAI 94), Seattle, WA, USA, 1994, pp. 136-144.
Elvira Consortium. Elvira: an environment for creating and using
probabilistic graphical models. Technical report, U. de Granada,
Spain, 2002.
Islas E., Zabre E. and Pérez M. Evaluación de herramientas de hardware y software para el desarrollo
de aplicaciones de realidad virtual. Boletín IIE, vol. 28, pp. 61-67, Apr-Jun 2004.
Witten I. H. Data Mining: Practical Machine Learning Tools
and Techniques with Java Implementations, 2nd Ed. Morgan
Kaufmann, USA, 2005.
Islas E., Bahena J., Romero J. and Molina M. Design and costs estimation of electrical substations based
on three-dimensional building blocks. 6th International Symposium on Visual Computing, 2010.
ALBERTO REYES BALLESTEROS
[areyes@iie.org.mx]
EDUARDO ISLAS PÉREZ
[eislas@iie.org.mx]
PhD in Computer Science from ITESM Cuernavaca. Master's degree in Artificial Intelligence from LANIA and the Universidad Veracruzana. Mechanical-Electrical Engineer from the Universidad Veracruzana. He joined the Instituto de Investigaciones Eléctricas (IIE) in 1990, in the Engineering Studies Division. In 2009 he completed a postdoctoral stay at the Instituto Superior Técnico of the Technical University of Lisbon, Portugal. His area of expertise is the development of intelligent systems for the energy sector. His main activity focuses on electricity generation processes with renewable and conventional energy sources, and on robotics for the electric sector. He has developed technologies for generation forecasting and operational decision support for application in the electric sector. He currently works on wind generation forecasting and on optimizing energy purchase and sale through artificial intelligence techniques. He is the author of several national and international articles and book chapters, and holds registered copyrights. He has received various national and international distinctions and has taught at several universities at the local and national levels. He has supervised undergraduate and graduate theses at various Mexican and foreign universities. He is a member of the Sistema Nacional de Investigadores (SNI), level 1, and of the Sociedad Mexicana de Inteligencia Artificial (SMIA).
Master of Science in Computer Science with a specialization in Artificial Intelligence from the Faculty of Physics of the Universidad Veracruzana and the Laboratorio de Informática Avanzada, 2000. Industrial Engineer in Electrical Engineering from the Instituto Tecnológico de Pachuca, 1992. In 2000 he carried out a stay at Auburn University in Alabama to develop his Master's thesis.