Boletín IIE, October-December 2014. Research article.

Optimal robot navigation for inspection and surveillance in electric substations

Alberto Reyes Ballesteros(1), Ángel Félix(2), César González(2) and Eduardo Islas Pérez(1)

Paper originally presented at the CIGRE International Symposium, Paris, August 2012.

(1) Instituto de Investigaciones Eléctricas
(2) ITESM Mexico City Campus

Abstract

In this work we use the factored Markov Decision Process (MDP) approach as an alternative for implementing optimal navigation skills under uncertainty, so that a robot can inspect a substation installation. The main contribution of this work is that the optimization model is approximated using machine learning tools. We tested our technique in a virtual substation where a robot navigates over regions and equipment of different costs. After a rapid, random exploration of the environment, a model can be learned and solved for all scenarios. The decisions obtained in each case are reasonable, guiding the robot toward regions of high reward and away from regions of low reward. Our approach can be implemented on a physical robotic platform and applied in the power industry for surveillance and search purposes.

Introduction

The problem of detecting intruders and equipment faults in electric substations has recently been addressed using mobile robots carrying visible-light and infrared thermography cameras (Rui et al., 2010). The robot moves inside the substation and positions itself to collect images of possible critical points. The robot architecture implemented in that work uses a motion system with two driven wheels and two omni-directional wheels. To perform tasks such as localization and map building, the robot estimates its position from sensing and motion using an artificial-landmarks subsystem. For communication, the robot talks to a host PC over a wireless Ethernet channel.

One of the most important and challenging components of a mobile robot is the navigation system: the robot must optimize the use of its resources while avoiding risky zones and reaching inspection goals or locations of interest. In general, the solution described above is robust; however, to navigate reliably in a real environment, autonomy and rationality are always required. An appropriate framework for this type of problem is based on Markov Decision Processes, which have become a standard method for representing uncertainty in decision-theoretic planning.
In this approach, the use of factored representations (Boutilier et al., 1999) allows the robot state space, the robot dynamics, and the utility assignment system to be specified very compactly by means of dynamic Bayesian networks and decision trees. This specification is an approximate factored Markov decision model that can be solved with traditional techniques (Puterman, 1994) to obtain an optimal policy.

Factored Markov decision processes

A Markov decision process (MDP) (Puterman, 1994) models a sequential decision problem in which a system evolves over time and is controlled by an agent. The system dynamics are governed by a probabilistic transition function F that maps states S and actions A to new states S′. At each step, the agent receives a reward R that depends on the current state s and the applied action a. Solving the MDP thus amounts to finding a recommendation strategy, or policy, that maximizes the expected reward over time while dealing with the uncertainty about the effects of each action.

Formally, an MDP is a tuple M = <S, A, F, R>, where:
- S is a finite set of states {s_1, ..., s_n};
- A is a finite set of actions, available in all states;
- F : A × S × S → [0,1] is the state transition function, specified as a probability distribution; the probability of reaching state s′ by performing action a in state s is written F(a, s, s′);
- R : S × A → ℝ is the reward function; R(s, a) is the reward the agent receives if it takes action a in state s.

For the discrete, discounted, infinite-horizon case with discount factor γ, there is a policy π* that is optimal regardless of the starting state and that satisfies the Bellman equation:

V*(s) = max_a { R(s, a) + γ Σ_{s′ ∈ S} F(a, s, s′) V*(s′) }     (1)

Two methods for solving these equations and finding an optimal policy for an MDP are (a) dynamic programming and (b) linear programming.

In a factored MDP (Boutilier et al., 1999), the set of states is described via a set of random variables X = {X_1, ..., X_n}, where each X_i takes values in some finite domain Dom(X_i). A state s defines a value x_i ∈ Dom(X_i) for each variable X_i. The transition model can be exponentially large if represented explicitly as matrices; however, the frameworks of dynamic Bayesian networks (DBNs) (Dean and Kanazawa, 1989) and decision trees (Quinlan, 1986) give us the tools to describe the transition model and the reward function concisely.

The MDP model can be learned from data gathered by random exploration in a simulated environment. We assume that the agent can explore the state space and that in each state-action cycle it can receive some immediate reward. From this random exploration, the reward and transition functions are induced.

Optimal robot navigation using factored MDPs

To illustrate our technique, consider the simulated environment shown in figure 1(a). In this setting, goals are represented as dark-colored squares with positive immediate reward (+300), and undesirable regions as light-colored squares with negative reward (-300). The remaining regions of the navigation area (black) receive a reward of 0. Reward values are multi-valued, and the number of rewarded squares is also variable. The robot sensor system includes x-y position, angular orientation, and detection of the navigation bounds. The possible actions in this experiment are: go forward, clockwise rotation (right turn), counterclockwise rotation (left turn), and the null action.
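To make equation (1) concrete, the following minimal Python sketch runs value iteration on a small grid world with a rewarded and a penalized cell like those of figure 1(a). The grid size, reward placement, and action-noise model are illustrative assumptions rather than the paper's exact configuration, and the state omits the robot's orientation for brevity.

```python
# Minimal value iteration for a grid navigation MDP (equation 1).
# Grid size, reward placement, and action noise are illustrative
# assumptions; the state omits the robot's orientation for brevity.
import itertools

GAMMA = 0.9                                   # discount factor
W, H = 5, 5                                   # navigation grid
ACTIONS = {"up": (0, 1), "down": (0, -1),
           "left": (-1, 0), "right": (1, 0), "null": (0, 0)}
REWARD = {(4, 4): 300.0, (2, 2): -300.0}      # goal cell and risky cell

def transition(s, a):
    """F(a, s, .): the intended move succeeds with p = 0.9, else stay put."""
    x, y = s
    dx, dy = ACTIONS[a]
    nxt = (min(max(x + dx, 0), W - 1), min(max(y + dy, 0), H - 1))
    return {nxt: 0.9, s: 0.1} if nxt != s else {s: 1.0}

states = list(itertools.product(range(W), range(H)))
V = {s: 0.0 for s in states}
for _ in range(200):                          # Bellman backups until stable
    V = {s: max(REWARD.get(s, 0.0) +
                GAMMA * sum(p * V[s2] for s2, p in transition(s, a).items())
                for a in ACTIONS)
         for s in states}

# Greedy policy extraction from the converged value function
policy = {s: max(ACTIONS, key=lambda a, s=s: sum(
              p * V[s2] for s2, p in transition(s, a).items()))
          for s in states}
print(policy[(0, 0)])                         # e.g. 'right' or 'up'
```

In the paper, the same fixed-point computation is run over the factored model induced from exploration data rather than over a hand-coded grid like this one.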
To simulate real motion, 10% Gaussian noise was added to the robot effectors. Following the general algorithm presented in this paper for learning factored models, the first stage in building a decision model is to collect samples by random exploration; figure 1(b) illustrates the trace of the exploration performed.

Figure 1. a) Continuous navigation space. b) Exploration trace.

In the next step, the reward function is mapped from the data and represented as the decision tree illustrated in figure 3, where it can be observed that the reward depends only on the x-y position. The reward function was obtained using Weka (Witten, 2005).

Figure 3. Reward model for the simulated navigation area.

Finally, to complete the decision model, the transition function is induced from the data. It is represented as a two-stage Bayesian network and was induced using Elvira (Elvira Consortium, 2002). Figure 2 shows the transition function for the action Go forward: if the robot is located at position (s0, s0) with orientation s0 and executes Go forward, then with a joint probability of 0.7 it reaches position (s1, s0) with orientation s0.

Figure 2. Transition model for action 0 (Go forward). If the robot is at position (s0, s0) with orientation s0, then after executing the action it reaches position (s1, s0) with orientation s0; the joint probability of this transition is 0.7.

The solution of the resulting MDP model, computed with the value iteration algorithm (Puterman, 1994), is illustrated in figure 1(a). With the idea of using a topological representation of the environment in upcoming experiments, we also expressed the policy found on a topological map (figure 4). The method successfully guides the robot toward a likely position with the highest reward. For instance, assuming that the robot has orientation s2 at position (s3, s2), the optimal action commands the robot to turn right until it reaches orientation s1; in this new state, the robot simply goes forward to achieve the goal. A limitation of MDP models, however, is that they cannot value the quality of sensor information in a sound way.

Figure 4. Simplified resulting policy on an equivalent discrete topological map.
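The model-induction step described in this section can be sketched in a few lines. Here a simple frequency-count estimator stands in for the decision-tree (Weka) and Bayesian-network (Elvira) learners used in the paper, and the sample format (state, action, reward, next state) is an assumed simplification.

```python
# Sketch of inducing an MDP model from random-exploration samples.
# A frequency-count estimator stands in for the decision-tree (Weka)
# and DBN (Elvira) learners used in the paper; the sample format
# (state, action, reward, next_state) is an assumed simplification.
from collections import defaultdict

def induce_model(samples):
    """samples: iterable of (state, action, reward, next_state) tuples."""
    counts = defaultdict(lambda: defaultdict(int))   # (s, a) -> {s': n}
    reward_sum = defaultdict(float)
    reward_n = defaultdict(int)
    for s, a, r, s2 in samples:
        counts[(s, a)][s2] += 1
        reward_sum[(s, a)] += r
        reward_n[(s, a)] += 1
    # Normalize visit counts into transition probabilities F(a, s, s')
    F = {sa: {s2: n / sum(succ.values()) for s2, n in succ.items()}
         for sa, succ in counts.items()}
    # Average the observed rewards into R(s, a)
    R = {sa: reward_sum[sa] / reward_n[sa] for sa in reward_n}
    return F, R

# Example: two noisy exploration samples from the same state-action pair
samples = [((0, 0, "s0"), "forward", 0.0, (1, 0, "s0")),
           ((0, 0, "s0"), "forward", 0.0, (0, 0, "s0"))]
F, R = induce_model(samples)
print(F[((0, 0, "s0"), "forward")])   # {(1,0,'s0'): 0.5, (0,0,'s0'): 0.5}
```

A factored learner additionally exploits the structure of the state variables (position and orientation) to keep this model compact, rather than enumerating every joint state as this flat sketch does.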
Tools for robot simulation and virtual substation design

The Player-Stage tool

Player is a network server for robot control that provides a clean and simple interface to robot sensors and actuators over an IP network. A client program talks to Player over a TCP socket to read data from sensors, write commands to actuators, and configure devices on the fly. Player supports a variety of robot hardware, such as the ActivMedia Pioneer 2 and the RWI platform, and several other robots and many common sensors are supported as well. Player runs on Linux (PC and embedded), Solaris and *BSD, and it is designed to be language- and platform-independent: the client program can run on any machine that has a network connection to the robot and can be written in any language that supports TCP sockets. Player makes no assumptions about how robot control programs should be structured, so a client can be a highly concurrent multi-threaded program or a simple read-think-act loop, and Player is designed to support virtually any number of clients; any client can connect to and read sensor data from (and even write motor commands to) any instance of Player on any robot. Aside from distributed sensing for control, Player can also be used for monitoring experiments, and the behavior of the server itself can be configured on the fly.

Stage is a robot simulator that provides a virtual world populated by mobile robots and sensors, along with various objects for the robots to sense and manipulate. Stage is designed to support research into multi-agent autonomous systems, so it provides fairly simple, computationally cheap models of many devices rather than attempting to emulate any device with great fidelity. It is intended to be just realistic enough that users can move controllers between Stage robots and real robots, while remaining fast enough to simulate large populations. Player also contains several useful "virtual devices", including sensor pre-processing and sensor-integration algorithms, that help users rapidly build powerful robot controllers. Figure 5 shows a couple of virtual objects modeled using the Stage tool. A more detailed description of the Player-Stage tool can be found in Gerkey et al. (2003).

Figure 5. Virtual objects modelled using Stage: an ActivMedia Pioneer 2 virtual model (left) and a virtual 2.5D navigation world (right).

SiDSED: A system for designing electrical distribution substations

At the IIE we are using virtual reality tools to develop three-dimensional environments with applications in the electrical sector (Islas et al., 2004). One of these systems supports the creation of virtual electrical substations to facilitate the design of new substations (Islas et al., 2010). In this paper we propose using virtual substations to test new control and navigation algorithms for virtual and physical robots.

These substations are designed at different abstraction levels. The first level consists of building blocks for basic elements such as transformers, high-voltage circuit breakers, lightning rods, structures, foundations and duct banks. The second level contains building blocks composed of first-level elements; for instance, the transformer-perch building block is composed of a transformer, a transition perch, foundations and groundings, while the H structure with disconnect switches is formed by an H structure, disconnect switches, a motor operator, foundations and groundings. Finally, the third level holds the building blocks at the highest abstraction level, for example: line bay, transformer bay, control room and edge wall.

Substations design

Figure 6 shows an example of a virtual substation with an H arrangement, the most commonly used configuration; it was built mainly from elements of the third abstraction level. Nowadays we have a library of almost a hundred building blocks at the first abstraction level, 20 elements at the second level and 10 at the highest and most complex level. Additionally, three substations with different arrangements have been developed (H, ring and main bus). With this building-block library we can design many other arrangements, as well as generate fully new substation designs for electrical distribution; we can therefore build different configurations on which to test new control and navigation algorithms for robots.

Figure 6. H arrangement for an electrical substation.
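The three abstraction levels described above suggest a simple composite structure, where higher-level blocks are assembled from lower-level ones. The sketch below is purely illustrative: the class and field names are hypothetical and do not reflect SiDSED's actual implementation.

```python
# Hypothetical composite structure for SiDSED-style building blocks.
# Names and fields are illustrative only; they do not reflect the
# actual SiDSED implementation.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Block:
    name: str
    level: int                       # 1 = basic element, 2 = assembly, 3 = bay/room
    parts: List["Block"] = field(default_factory=list)

    def flatten(self) -> List[str]:
        """List every first-level element contained in this block."""
        if not self.parts:
            return [self.name]
        return [n for p in self.parts for n in p.flatten()]

# Level 2: transformer-perch assembly built from level-1 elements
transformer_perch = Block("transformer-perch", 2, [
    Block("transformer", 1), Block("transition perch", 1),
    Block("foundations", 1), Block("groundings", 1)])

# Level 3: a transformer bay reuses level-2 assemblies
transformer_bay = Block("transformer bay", 3, [transformer_perch])
print(transformer_bay.flatten())
```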
Experimental results

We built a simulated robot immersed in a virtual world using the Player-Stage tool. The MDP controller was developed in MATLAB using a special toolbox for dynamic programming. In this implementation, states are physical locations in the substation, and the possible actions are orthogonal movements to the right, left, up or down. The risky zones and inspection areas are associated with an optimization function that leads the robot along convenient navigation paths.

Figure 7. Virtual scaled environment: 3D substation model (left), 2D robot approaching the high-power transformer (center), and 2.5D robot approaching the high-power transformer (right).

We compared this technique with the wander algorithm (similar to the one used in Rui et al., 2010) and observed that with the MDP controller the utility increases rapidly and progressively, holding steady once the robot reaches the inspection area, whereas the wander algorithm never reaches the maximum utility value (see figure 8).

Figure 8. Utility value function for a robot using an MDP controller and a wander algorithm.

Conclusions and future work

We plan to implement these features on a Pioneer 2 robot and to extend the MDP models to deal with partially observable environments. In this way, the robot system will be much more effective at reasoning about the uncertainty in its belief state and will employ extended-range sensing actions when deemed necessary. One of the original motivations of this work has been its application to surveillance, search and monitoring tasks in indoor radiated areas of nuclear power plants. With these techniques it is possible to guide robot navigation while maximizing safety and minimizing risk and execution time, all in an uncertain world. We plan to add surveillance and inspection routines under an optimization framework using a multi-agent approach. The main benefit of using virtual environments is the cost saving obtained by testing control and navigation algorithms virtually, verifying them before downloading them to real robots.

Bibliography

Rui G., Lei H., Yong S. and Mingrui W. A mobile robot for inspection of substation equipment. In Proc. of the 1st International Conference on Applied Robotics for the Power Industry (CARPI 2010), Montreal, 2010.

Puterman M. Markov Decision Processes. New York: Wiley, 1994.

Gerkey B., Vaughan R. T. and Howard A. The Player/Stage project: tools for multi-robot and distributed sensor systems. In Proceedings of the 11th International Conference on Advanced Robotics (ICAR 2003), pages 317-323, Coimbra, Portugal, June 2003.

Boutilier C., Dean T. and Hanks S. Decision-theoretic planning: structural assumptions and computational leverage. Journal of AI Research, vol. 11, pp. 1-94, 1999.

Dean T. and Kanazawa K. A model for reasoning about persistence and causation. Computational Intelligence, vol. 5, pp. 142-150, 1989.

Quinlan J. R. Induction of decision trees. Machine Learning, 1(1):81-106, 1986.

Darwiche A. and Goldszmidt M. Action networks: a framework for reasoning about actions and change under uncertainty. In Proceedings of the Tenth Conference on Uncertainty in AI (UAI 94), Seattle, WA, USA, 1994, pp. 136-144.

Elvira Consortium. Elvira: an environment for creating and using probabilistic graphical models. Technical report, Universidad de Granada, Spain, 2002.
Islas E., Zabre E. and Pérez M. Evaluación de herramientas de hardware y software para el desarrollo de aplicaciones de realidad virtual. Boletín IIE, vol. 28, pp. 61-67, Apr-Jun 2004.

Witten I. H. Data Mining: Practical Machine Learning Tools and Techniques, 2nd Ed. Morgan Kaufmann, USA, 2005.

Islas E., Bahena J., Romero J. and Molina M. Design and costs estimation of electrical substations based on three-dimensional building blocks. In 6th International Symposium on Visual Computing, 2010.

ALBERTO REYES BALLESTEROS [areyes@iie.org.mx]

He holds a PhD in Computer Science from ITESM Cuernavaca, a Master's degree in Artificial Intelligence from LANIA and the Universidad Veracruzana, and a degree in Mechanical and Electrical Engineering from the Universidad Veracruzana. He joined the Instituto de Investigaciones Eléctricas (IIE) in 1990, in the Engineering Studies Division, and in 2009 completed a postdoctoral stay at the Instituto Superior Técnico of the Technical University of Lisbon, Portugal. His specialty is the development of intelligent systems for the energy sector; his main activity focuses on electricity generation processes with renewable and conventional energy sources, and on robotics for the electrical sector. He has developed technologies for generation forecasting and for supporting operational decision making in the electrical sector, and currently works on wind-generation forecasting and on the optimization of energy purchases and sales using artificial intelligence techniques. He is the author of several national and international papers, book chapters and registered copyrights, has received various national and international distinctions, has taught at several universities at the local and national level, and has supervised undergraduate and graduate theses at Mexican and foreign universities. He is a level-1 member of the Sistema Nacional de Investigadores (SNI) and a member of the Sociedad Mexicana de Inteligencia Artificial (SMIA).

EDUARDO ISLAS PÉREZ [eislas@iie.org.mx]

He holds a Master's degree in Computer Science with a specialty in Artificial Intelligence from the Physics Faculty of the Universidad Veracruzana and the Laboratorio Nacional de Informática Avanzada (LANIA), obtained in 2000, and a degree in Industrial Electrical Engineering from the Instituto Tecnológico de Pachuca (1992). In 2000 he carried out a research stay at Auburn University, Alabama, for the development of his Master's thesis.