Barış Akgün
Georgia Institute of Technology
e-mail: barisakgun@gmail.com
http://www.cc.gatech.edu/~bakgun3

Research Statement

Robotics is at an inflection point. On one side, the demand for robotics and automation is increasing. There is persistent interest in automating repetitive, boring and dangerous tasks in fields such as manufacturing, service, medicine, food, logistics, education and defense. In addition, the world population is aging according to the UN, which will result in a shortage of labor, an increased need for elder care and additional economic pressure on the working class. Robotics and related technologies are promising solutions to these issues. On the other side, robots are becoming more accessible. The last five years have seen exponential growth of collaborative robots in areas of manufacturing where extreme payload capacity and precision are not the main concerns. These robots are safer and lower in cost than industrial robots, and the demand for even lower costs, together with competition, will make them more accessible still. The increasing demand for robots and the decreasing cost of robots will converge and result in widespread adoption of robots within the next 20 years. The National Robotics Initiative, announced in 2011, will further accelerate the development and use of robots in the US. Computer scientists and engineers will play a pivotal role in making this robot revolution happen by helping robotic platforms realize their full potential, just as they did in the PC and smartphone revolutions.

The robot revolution will turn more people into robot users. Many challenges will emerge in deploying robots beyond structured factory floors, where they will encounter varied, complex and unpredictable environments. The robots will interact with different end-users who have different needs and preferences. It is difficult to pre-program robots for all the scenarios they will face; as a result, programming robots after deployment will be useful, if not necessary. My research aim is to enable people to program robots to do new things, or to customize their existing actions, through Learning from Demonstration (LfD) interactions. An important aspect of my research that sets me apart from most existing work is that most end-users of these LfD systems will be non-expert teachers who do not have any robotics or machine learning background. My research agenda focuses on empowering these users to program and customize their own robotic systems, and on having these systems learn and adapt by themselves using their acquired knowledge. My research is highly interdisciplinary and lies at the intersection of Human-Robot Interaction (HRI) and Machine Learning, with strong connections to Artificial Intelligence, Machine Perception and Manipulation. I develop algorithms and interactions that enable non-experts to teach new skills to robots, study them empirically with user studies, and use the outcomes to improve, augment and modify existing approaches and to develop novel ones. Most researchers in the field either work on the details of the interactions or focus only on the learning algorithms, without looking at the source of the data.

Previous and Ongoing Work: My dissertation research started by investigating how non-expert teachers teach low-level motor skills to robots (e.g. pouring beverages, opening and closing containers, serving food). Some users provided highly noisy and inconsistent demonstrations, making it difficult to learn with a reasonable amount of data.
To overcome some of the challenges that non-expert teachers face when providing demonstrations to robots, I have introduced keyframes [2] and hybrid trajectory-keyframe demonstrations [1], together with algorithms to learn from them, to the field of LfD. Keyframes are a sparse set of sequential points that constitute the skill. They allow users to show only the important parts of a skill while ignoring noisy, inconsistent and unintended motions, as depicted in Fig. 1.

[Figure 1: A naïve user's keyframe demonstration. Figure 2: Execution of a learned skill. Figure 3: Extracting goal information. Figure 4: A naïve user's teleoperation demonstrations.]

I have tested these approaches through a set of studies with a total of 60 end-users [2, 3] and found that non-expert teachers are able to use keyframes to provide demonstrations, both kinesthetically (Fig. 1) and through teleoperation (Fig. 4). The keyframe approach has led to multiple projects outside my thesis work, including projects in other labs.

During my studies with non-expert teachers, I observed that they concentrate on achieving the goal of a demonstrated skill rather than on providing clean demonstrations of exactly how to achieve it. To leverage this goal-oriented behavior, I have developed an LfD system that learns goals and actions simultaneously [4]. The learned action models are used to execute the skill (Fig. 2) and the goal models are used to monitor the execution. Goal information comes from the robot's external sensors, such as cameras (Fig. 3), and action information comes from its internal sensors, such as encoders. A pilot study with 8 people showed that goal models can differentiate between successful and failed demonstrations. Another study with 8 people showed that the learned goal models monitor executions with a 90% average success rate, even though the learned action models succeed only 66% of the time. Learning actions and goals simultaneously from the same set of demonstrations is a novel approach to learning from demonstration.

To minimize the time users spend teaching and to build more accurate models, robots should improve the learned skills themselves, especially since the learned action models may not always be successful. Building on the monitoring power of the goal models, I have developed an approach that self-improves the learned action models by using the learned goal models [5]. In this approach, the robot executes the skill by sampling from the current action model, monitors the execution with the learned goal model (Fig. 2) and updates the action model based on the monitoring result. This approach lies between policy search methods and reward/value function learning within the broader field of reinforcement learning. I have tested this self-improvement algorithm with 12 non-expert teachers within an interactive learning from demonstration framework [6]. The teachers were able to see the robot's executions during the interaction and provide feedback. This study replicated the results of [5] and also showed that non-expert teacher data can be used to seed self-learning: with self-learning, the robot reached 100% execution success in 5 case studies, starting from 0% in four cases and 60% in one.
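As a concrete illustration of this sample-monitor-update loop, the toy sketch below mimics it in a few lines of Python. The Gaussian action model over a single keyframe, the simulated execution, the distance-threshold goal check and the update rule are all illustrative assumptions of mine, not the actual models in [5]; only the structure of the loop matches the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions for this sketch, not the system in [5]):
# - the "action model" is a Gaussian over a single final keyframe position,
# - the "goal model" is a success classifier over the sensed outcome, here
#   reduced to a distance threshold around a hypothetical target pose.
TARGET = np.array([0.5, -0.2, 0.3])

def execute(keyframe):
    """Pretend execution: the sensed outcome is the keyframe plus sensor noise."""
    return keyframe + rng.normal(scale=0.02, size=3)

def goal_model_success(outcome):
    """Learned-goal stand-in: success if the outcome lands near the target."""
    return np.linalg.norm(outcome - TARGET) < 0.08

# Action model seeded from noisy, slightly biased human demonstrations.
mean = TARGET + np.array([0.08, 0.05, -0.05])
cov = 0.01 * np.eye(3)

successes = 0
for _ in range(300):
    sample = rng.multivariate_normal(mean, cov)  # 1. sample from the action model
    outcome = execute(sample)                    # 2. execute on the "robot"
    if goal_model_success(outcome):              # 3. monitor with the goal model
        mean = 0.8 * mean + 0.2 * sample         # 4. pull the model toward successes
        cov *= 0.95                              #    and shrink exploration
        successes += 1

print(f"{successes} monitored successes; refined keyframe: {np.round(mean, 3)}")
```

In the actual system the action and goal models are learned from full demonstrations (internal and external sensor data, respectively) rather than a single hand-coded keyframe and threshold; the sketch only preserves the shape of the self-improvement loop.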
My work differs from existing approaches in that it requires neither a pre-programmed cost function nor fine-tuning after self-learning. At a high level, it represents a unified LfD and self-learning system; the self-improvement work closed the loop between learning from teachers and learning autonomously.

Future Agenda: My research vision includes the deployment and long-term evaluation of LfD systems that learn from non-expert teachers in realistic scenarios. One example would be to deploy these robots in machine shops and observe how workers use them in machining, assembly and transport tasks; other examples include home kitchens (a domestic scenario) and coffee shops (a service scenario). My five-year plan is to develop my existing framework further to enable such a deployment. I have identified three main components for this research: (1) generalizing the system to more types of skills and environments, (2) exploring new self-improvement approaches and (3) enabling further interaction opportunities.

Generalizing the system: Given the breadth of applications, there is an essentially unlimited variety of skills to learn, so it is necessary to develop generic LfD systems that handle as many types of skills as possible. I will incorporate additional types of demonstrations. I will investigate knowledge transfer between a previously learned skill and a new one, and how to adapt or edit a learned skill to work in a new environment. I will explore motion planning to enable learned skills to be executed in more environments without additional interaction. I will also work on tighter integration of the goal models for error recovery and correction during execution, forming a feedback loop at the level of keyframes (a rough sketch of such a loop is given below). For example, in an assembly task the robot will need to reach for a tool in clutter and then use it. If the robot drops the tool at any point, it will need to pick it back up. To do so, the robot must plan around obstacles, recognize that it has dropped the tool, pick it up and resume its learned action.

Self-improvement: There is a limit to the amount of data that can be obtained from users, considering their capabilities, patience and fatigue. It is therefore necessary to let the robot learn from more data through self-exploration or reinforcement learning. My thesis work has already shown that this is possible and promising. I will extend this work to increase efficiency and to learn a tighter closed-loop system with the action and goal models. I plan to look at gradient-based methods, selective sampling and local sampling in a graphical model. These research directions will warrant theoretical analysis and formalization to arrive at new methods. In general, machine learning guarantees assume access to large amounts of data, which is not practical for robots. New methods will need guarantees that the robot improves with each iteration while exploring safely. This is tricky, since the goal models that guide the exploration are themselves learned from teacher data.

Interaction: There are multiple open questions about interactive learning, transparency and interfaces in the field of embodied LfD. For interactive learning, the hypothesis is that if users know the state of learning during demonstrations, e.g. by seeing the skill that is being learned, they will change their demonstrations, which in turn will result in faster learning and better data.
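Returning briefly to the keyframe-level feedback loop mentioned under generalizing the system, the sketch below illustrates the kind of monitor-and-recover execution I have in mind for the assembly example. The keyframe names, the simulated failure and the hard-coded recovery jump are illustrative assumptions, not parts of an existing implementation.

```python
import random

random.seed(1)

# Hypothetical keyframe sequence for the assembly example.
KEYFRAMES = ["approach_tool", "grasp_tool", "lift_tool", "move_to_part", "use_tool"]

def execute_keyframe(name):
    """Pretend motion execution; the tool is occasionally 'dropped' while moving."""
    dropped = name == "move_to_part" and random.random() < 0.3
    return not dropped

def goal_model_ok(name, executed):
    """Stand-in for a per-keyframe goal check (e.g. is the tool still in the gripper?)."""
    return executed

def run_skill():
    i = 0
    while i < len(KEYFRAMES):
        name = KEYFRAMES[i]
        ok = execute_keyframe(name)
        if goal_model_ok(name, ok):
            i += 1                                # goal met: advance to the next keyframe
        else:
            print(f"failure detected at '{name}': recovering")
            i = KEYFRAMES.index("approach_tool")  # e.g. go back and re-grasp the tool
    print("skill completed")

run_skill()
```

In practice the recovery target would come from planning and from the learned goal models rather than a hard-coded jump; the sketch only shows where the keyframe-level goal check sits in the execution loop.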
There is no direct comparison, in the context of LfD, between learning from all of the data after the demonstrations end and learning incrementally during the demonstrations. In the incremental setting, the robot would repeat the skill after each demonstration to communicate its state of learning. Repeating the skill is an implicit way of communicating this state to the teacher; however, the robot can be more explicit and more transparent, for example by hesitating or slowing down during skill execution. Interactive learning opens other areas to explore, especially for goal learning. For example, users can demonstrate the goal of a skill by performing it themselves, which provides goal data rapidly. Another example is to let users supervise a short session of the robot's self-improvement executions and label the results in order to update both models. Both ideas were employed in [5, 6] but not investigated in detail. Finally, there is additional information that users may want to communicate to the robot beyond the demonstrations, such as granular feedback on executions or hints (e.g. a particular pose for handling a tool, or always going around an object rather than over it). This thread of research will explore ways to create new interaction paradigms, based both on my previous experience and on user behavior newly discovered through experimentation.

Other Topics: Most of the time, the skills that robots learn are part of a larger task, for example assembling a product or preparing a meal. My long-term research goals include learning the entire task, along with its constituent skills, through interaction. Goal models can be leveraged to learn the skill transitions within a task, and interaction can be leveraged to learn and refine the task structure. In my long-term LfD scenario, robots will interact with more objects and learn how to manipulate them, which offers interesting directions for affordance learning. In addition, I am very interested in robot design, particularly of mobile manipulators, for safe and intuitive human-robot interaction. Observations from long-term LfD studies can guide this design; it would be a collaborative effort with a group that has design expertise.

My aim is to build an interdisciplinary research lab that employs students and researchers from a multitude of fields such as computer science, mechanical engineering and electrical engineering. I plan to work with both undergraduate students (more on infrastructure) and graduate students (more on research). The proposed research can be funded with a combination of start-up funds and NSF grants, and it has the potential to attract other funding as well, for example from ONR or industry. The cost of doing robotics research is decreasing, as both sensors and robots are getting cheaper. The proposed research can be accomplished with commercially available robots (e.g. Baxter from Rethink Robotics, UR3 from Universal Robots, a Stanley base with a Kinova arm) and sensors (e.g. 2D cameras from Basler, depth cameras such as the ASUS Xtion and Microsoft Kinect). One mobile manipulator and one stationary robot arm with several sensors in a 30–50 m² room are enough to start, and to sustain 2-4 students for the first 3 years.

I think robots will play an instrumental role in humankind's future. Bridging the gap between users and robots is an important endeavor. My existing and proposed work aims to let people program their robots and keep them functional in their own environments. This will make robots more useful and increase their adoption rate.
I am very enthusiastic about doing robotics research and contributing to the robot revolution.

References

[1] Baris Akgun, Maya Cakmak, Karl Jiang, and Andrea L. Thomaz. Keyframe-based learning from demonstration. International Journal of Social Robotics, 4(4):343–355, 2012.
[2] Baris Akgun, Maya Cakmak, Jae Wook Yoo, and Andrea L. Thomaz. Trajectories and keyframes for kinesthetic teaching: A human-robot interaction perspective. In ACM/IEEE International Conference on Human-Robot Interaction (HRI), pages 391–398, 2012. Best Paper Nominee.
[3] Baris Akgun, Kaushik Subramanian, and Andrea L. Thomaz. Novel interaction strategies for learning from teleoperation. In AAAI Fall Symposia 2012, Robots Learning Interactively from Human Teachers, 2012.
[4] Baris Akgun and Andrea L. Thomaz. Simultaneously learning actions and goals from demonstration. Autonomous Robots, 2015.
[5] Baris Akgun and Andrea L. Thomaz. Using learned goal models to autonomously improve learned action models. In International Conference on Intelligent Robots and Systems (IROS), 2015.
[6] Baris Akgun and Andrea L. Thomaz. Interactive goal learning and exploration: iGoaL-E. In preparation, 2016.