Barış Akgün
Georgia Institute of Technology
e-mail: barisakgun@gmail.com
http://www.cc.gatech.edu/~bakgun3
Research Statement
Robotics is at an inflection point. On one side, the demand for robotics and automation is increasing. There is lasting interest in automating repetitive, boring and dangerous tasks in many fields such as manufacturing, service, medicine, food, logistics, education and defense. In addition, the world population is aging according to the UN, which will result in a shortage of labor, an increased need for elder care and additional economic pressure on the working class. Robotics and related technologies are promising solutions to these issues. On the other side, robots are becoming more accessible. The last five years have seen exponential growth of collaborative robots in areas of manufacturing where extreme payload capacity and extreme precision are not the main concerns. These robots are safer and lower in cost than traditional industrial robots, and competition and the demand for even lower costs will make them more accessible still. Increasing demand and decreasing costs will converge and result in widespread adoption of robots within the next 20 years. The National Robotics Initiative, announced in 2011, will further accelerate the development and use of robots in the US. Computer scientists and engineers will play a pivotal role in making this robot revolution happen by helping robotic platforms realize their full potential, just as they did in the PC and smartphone revolutions.
The robot revolution will result in more people becoming robot users. Many challenges will emerge in deploying robots beyond structured factory floors, where they will encounter varied, complex and unpredictable environments. The robots will interact with different end-users with different needs and preferences. It is difficult to pre-program robots for all the scenarios they will face; as a result, programming robots after deployment will be useful, if not necessary. My research aim is to enable people to program robots to do new things, or to customize their existing actions, through Learning from Demonstration (LfD) interactions. An important aspect of my research that sets it apart from most existing work is that most end-users of these LfD systems will be non-expert teachers who do not have any robotics or machine learning background. My research agenda focuses on empowering these users to program and customize their own robotic systems, and on having these systems learn and adapt on their own using the acquired knowledge.
My research is highly interdisciplinary and lies at the intersection of Human-Robot Interaction (HRI) and Machine Learning, with strong connections to Artificial Intelligence, Machine Perception and Manipulation. I develop algorithms and interactions that enable non-experts to teach new skills to robots, evaluate them empirically through user studies, and use the outcomes to improve, augment or modify existing approaches and to develop novel ones. In contrast, most work in the field either concentrates on the details of the interaction or focuses only on the learning algorithms, without considering the source of the data.
Previous and Ongoing Work: My dissertation research started by investigating how non-expert teachers teach low-level motor skills to robots (e.g. pouring beverages, opening/closing containers, serving food). Some users provided highly noisy and inconsistent demonstrations, making it difficult to learn from a reasonable amount of data. To overcome some of the challenges that non-expert teachers face when providing demonstrations to robots, I introduced keyframes [2] and trajectory-keyframe hybrid demonstrations [1], as well as algorithms to learn from them, to the field of LfD. Keyframes are a sparse set of sequential points that constitute the skill.
[Figure 1: A naïve user's keyframe demonstration. Figure 2: Execution of a learned skill. Figure 3: Extracting goal information. Figure 4: A naïve user's teleoperation demonstrations.]
Keyframes allow users to show only the important parts of a skill while ignoring noisy, inconsistent and unintended motions, as depicted in Fig. 1. I have tested these approaches in a set of studies with a total of 60 end-users [2, 3] and found that non-expert teachers are able to use keyframes to provide demonstrations, both kinesthetically (Fig. 1) and through teleoperation (Fig. 4). The keyframe approach has led to multiple projects beyond my thesis work, including projects from other labs.
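To make the keyframe idea concrete, the sketch below shows one way keyframe demonstrations could be aggregated into a skill model: demonstrations are aligned to a fixed number of keyframes and a Gaussian is fit at each one. The alignment-by-resampling step, the function names and the toy data are illustrative assumptions, not the exact algorithms of [1, 2].

```python
import numpy as np

def resample_keyframes(demo, n_keyframes):
    """Linearly resample a demonstration (a sequence of pose vectors) to a fixed
    number of keyframes so demonstrations of different lengths can be aligned."""
    demo = np.asarray(demo, dtype=float)                  # shape: (demo_length, pose_dim)
    src = np.linspace(0.0, 1.0, len(demo))
    dst = np.linspace(0.0, 1.0, n_keyframes)
    return np.stack([np.interp(dst, src, demo[:, d]) for d in range(demo.shape[1])], axis=1)

def fit_keyframe_model(demos, n_keyframes=5):
    """Fit a Gaussian (mean, covariance) at each aligned keyframe index."""
    aligned = np.stack([resample_keyframes(d, n_keyframes) for d in demos])  # (n_demos, n_kf, dim)
    means = aligned.mean(axis=0)
    covs = [np.cov(aligned[:, k, :], rowvar=False) + 1e-6 * np.eye(aligned.shape[2])
            for k in range(n_keyframes)]
    return means, covs

# Toy example: three noisy keyframe demonstrations of a 2-D reaching motion.
rng = np.random.default_rng(0)
base = np.array([[0.0, 0.0], [0.2, 0.1], [0.4, 0.3], [0.5, 0.5]])
demos = [base + 0.01 * rng.standard_normal(base.shape) for _ in range(3)]
means, covs = fit_keyframe_model(demos, n_keyframes=4)
print(means)  # average keyframe poses; a controller would move through these in sequence
```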
During my studies with non-expert teachers, I observed that they concentrate on achieving the goal of a demonstrated skill rather than on providing clean demonstrations of exactly how to achieve it. To leverage this goal-oriented behavior, I developed an LfD system that learns goals and actions simultaneously [4]. The learned action models are used to execute the skill (Fig. 2) and the goal models are used to monitor this execution. Goal information comes from the robot's external sensors, such as cameras (Fig. 3), and action information comes from the internal sensors, such as encoders. A pilot study with 8 people showed that goal models can differentiate between successful and failed demonstrations. Another study with 8 people showed that the learned models are able to monitor executions with a 90% average success rate, even when the learned action models are less successful, at a 66% success rate. Learning actions and goals simultaneously from the same set of demonstrations is a novel approach to learning from demonstration.
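As a simple illustration of how a learned goal model can monitor an execution, the sketch below fits a Gaussian over end-of-skill features from successful demonstrations (e.g., an object pose seen by a camera) and labels a new execution by thresholding its log-likelihood. The Gaussian form, the feature values and the threshold are assumptions made for illustration; they are not the exact models of [4].

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_goal_model(success_features):
    """Fit a Gaussian over end-of-skill features (from external sensors) observed
    in successful demonstrations."""
    X = np.asarray(success_features, dtype=float)
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
    return multivariate_normal(mean=mean, cov=cov)

def monitor_execution(goal_model, observed_features, log_lik_threshold=-10.0):
    """Declare an execution successful if its observed end state is likely under
    the goal model; the threshold is a tunable assumption."""
    return goal_model.logpdf(observed_features) > log_lik_threshold

# Usage sketch: features from successful demonstrations, then two new observations.
goal_model = fit_goal_model([[0.52, 0.11], [0.49, 0.09], [0.50, 0.10]])
print(monitor_execution(goal_model, [0.51, 0.10]))  # likely True: close to the demonstrated goal
print(monitor_execution(goal_model, [0.90, 0.60]))  # likely False: far from the demonstrated goal
```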
In order to minimize the time users spend teaching and to build more accurate models, robots should improve the learned skills themselves, especially given that the learned action models may not be successful on their own. Building on the monitoring power of the goal models, I developed an approach that self-improves the learned action models by utilizing the learned goal models [5]. In this approach, the robot executes the skill by sampling from the current action model, monitors this execution with the learned goal model (Fig. 2) and updates the action model based on the monitoring result. This approach lies between policy search methods and reward/value function learning within the broader field of reinforcement learning. I tested this self-improvement algorithm with 12 non-expert teachers within an interactive learning from demonstration framework [6]. The teachers were able to see the robot's executions during the interaction and provide feedback. This study replicated the results of [5] and also showed that non-expert teacher data can be used to seed self-learning: across 5 case studies, the robot reached 100% execution success with self-learning, starting from 0% in four cases and 60% in one. My work differs from others in that it requires neither a pre-programmed cost function nor fine-tuning after self-learning. At a high level, it represents a unified LfD and self-learning system; the self-improvement work closed the loop between learning from teachers and self-learning.
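The overall self-improvement loop can be summarized in a few lines of Python. Everything here is a hedged sketch: the action model, robot interface and goal classifier are passed in as placeholder callables, and the toy usage at the end only illustrates the control flow, not my actual experiments.

```python
import numpy as np

def self_improve(sample_execution, execute_and_sense, goal_success, refit, seed_data, n_trials=20):
    """Self-improvement loop: sample from the current action model, execute it,
    label the outcome with the learned goal model, and refit the action model on
    the executions judged successful (seeded with the teacher's demonstrations)."""
    data = list(seed_data)
    model = refit(data)
    for _ in range(n_trials):
        candidate = sample_execution(model)     # e.g., sample keyframes from the action model
        outcome = execute_and_sense(candidate)  # run on the robot, record external sensors
        if goal_success(outcome):               # monitor with the learned goal model
            data.append(candidate)
            model = refit(data)                 # update only on executions judged successful
    return model

# Toy usage: a 1-D "skill" where success means ending up near a target position.
rng = np.random.default_rng(1)
target = 0.5
refit = lambda data: (np.mean(data), max(np.std(data), 0.08))  # action model = (mean, std)
sample_execution = lambda m: rng.normal(m[0], m[1])            # sample a candidate execution
execute_and_sense = lambda x: x                                # identity "robot" for the toy
goal_success = lambda x: abs(x - target) < 0.1                 # stand-in learned goal model
print(self_improve(sample_execution, execute_and_sense, goal_success, refit, seed_data=[0.3, 0.4]))
# The mean of the returned model should drift toward the target as successes accumulate.
```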
Future Agenda: My research vision includes the deployment and long-term evaluation of LfD systems that learn from non-expert teachers in realistic scenarios. An example would be to deploy these robots in machine shops and observe how workers utilize them in machining, assembly and transport tasks. Other examples include home kitchens (a domestic scenario) and coffee shops (a service scenario). My five-year plan is to develop my existing framework further to allow such a deployment. I have identified three main components for this research: (1) generalizing the system to more types of skills and environments, (2) exploring new self-improvement approaches and (3) enabling further interaction opportunities.
Generalizing the system: Considering the breadth of applications, the space of skills to learn is effectively unlimited. It is therefore necessary to develop generic LfD systems that handle as many types of skills as possible. I will incorporate additional types of demonstrations. I will investigate knowledge transfer between a previously learned skill and a new skill, and how to adapt or edit a learned skill to work in a new environment. I will explore motion planning to enable learned skills to be executed in more environments without additional interaction. I will also work on tighter integration of the goal models for error recovery and correction during execution, forming a feedback loop at the level of keyframes. For example, in an assembly task, the robot will need to reach for a tool in clutter and use the tool. If at any point the robot drops the tool, it will need to pick it back up. For this, the robot needs to plan around obstacles, understand that it has dropped the tool, pick it up and then resume its learned action.
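A hypothetical sketch of this keyframe-level feedback loop is shown below: after moving to each keyframe, the goal model checks for the expected intermediate state, and a recovery behavior (e.g., re-grasping a dropped tool) is triggered before retrying. The callables are placeholders for components that would come from the learned models and a motion planner; this is an envisioned design, not an existing implementation.

```python
def execute_with_recovery(keyframes, move_to, check_progress, recover, max_retries=3):
    """Execute a learned skill keyframe by keyframe; if the goal model reports an
    unexpected state (e.g., a dropped tool), run a recovery behavior and retry."""
    for i, keyframe in enumerate(keyframes):
        for _ in range(max_retries + 1):
            move_to(keyframe)        # obstacle-aware motion to the next keyframe
            if check_progress(i):    # goal model confirms the expected intermediate state
                break
            recover()                # e.g., locate and re-grasp the dropped tool
        else:
            raise RuntimeError(f"could not reach the expected state at keyframe {i}")
```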
Self-improvement: There is a limit to the amount of data that can be obtained from users, given their capabilities, patience and fatigue. It is therefore necessary to let the robot learn from more data through self-exploration or reinforcement learning. My thesis work has already shown that this is possible and has considerable potential. I will further this line of work to increase efficiency and to enable learning a tighter closed-loop system with the action and goal models. I plan to look at gradient-based methods, selective sampling and local sampling in a graphical model. These research directions will warrant theoretical analysis and formalization to arrive at new methods. In general, machine learning guarantees assume access to sufficiently large datasets, which is not practical for robots. New methods will need guarantees that the robot improves with each iteration while exploring safely. This is tricky since the goal models, which guide the exploration, are themselves learned from teacher data.
Interaction: There are multiple open questions about interactive learning, transparency and interfaces in the field of embodied LfD. For interactive learning, the hypothesis is that if users know the state of learning during demonstrations, e.g. by seeing the skill that is being learned, they would change their demonstrations, which in turn would result in faster learning and better data. There is no direct comparison of learning from the entire dataset after the demonstrations versus learning incrementally during the demonstrations in the context of LfD. In the incremental case, the robot would repeat the skill after each demonstration to communicate its state of learning. Repeating the skill is an implicit way of communicating this state to the teacher; however, the robot can be more explicit and more transparent, for example by hesitating or slowing down during skill execution. Interactive learning opens up other areas to explore, especially for goal learning. For example, users can demonstrate the goal of a skill by performing it themselves, providing goal data rapidly. Another example is to let the user supervise a short session of the robot's self-improvement executions and label the results in order to update both models. These ideas were employed in [5, 6] but not investigated in detail. Finally, there is additional information that users may want to communicate to the robot beyond demonstrations, such as granular feedback on executions or hints (e.g. a particular pose for handling a tool, or always going around an object instead of above it). This thread of research will look at possible ways to create new interaction paradigms, based both on my previous experience and on user behavior newly discovered through experimentation.
Other Topics: Most of the time, the skills that robots learn are part of a larger task, for example assembling a product or preparing a meal. My long-term research goals include learning the entire task, along with its skills, through interaction. Goal models can be leveraged to learn the skill transitions within a task, and interaction can be leveraged to learn and refine the task structure. In my long-term LfD scenario, robots will interact with more objects and learn how to manipulate them, which offers interesting directions for affordance learning. In addition, I am very interested in robot design, particularly of mobile manipulators, for safe and intuitive human-robot interaction. The observations from long-term LfD studies can guide this design. This would be a collaborative effort with a group that has design expertise.
My aim is to build an interdisciplinary research lab that employs students and researchers from a multitude of fields such as computer science, mechanical engineering and electrical engineering. I plan to work with both undergraduate students (primarily on infrastructure) and graduate students (primarily on research). The proposed research can be funded with a combination of start-up funds and NSF grants. My research has the potential to attract other grants as well, such as from ONR or industry. The cost of doing robotics research is decreasing; sensors and robots are getting cheaper. The proposed research can be accomplished with commercially available robots (e.g. Baxter from Rethink Robotics, UR3 from Universal Robots, or a Stanley base with a Kinova arm) and sensors (e.g. 2D cameras from Basler, and depth cameras such as the ASUS Xtion and Microsoft Kinect). One mobile manipulator and one stationary robot arm with several sensors in a 30-50 m² room are enough to start and to sustain 2-4 students for the first 3 years.
I believe robots will play an instrumental role in humankind's future, and bridging the gap between users and robots is an important endeavor. My existing and proposed work aims to let people program their robots and make them functional in their environments. This will make robots more useful and increase their adoption rate. I am enthusiastic about doing robotics research and contributing to the robot revolution.
References
[1] Baris Akgun, Maya Cakmak, Karl Jiang, and Andrea L. Thomaz. Keyframe-based learning from demonstration. International Journal of Social Robotics, 4(4):343–355, 2012.
[2] Baris Akgun, Maya Cakmak, Jae Wook Yoo, and Andrea L. Thomaz. Trajectories and keyframes for kinesthetic teaching: A human-robot interaction perspective. In ACM/IEEE International Conference on Human-Robot Interaction (HRI), pages 391–398, 2012. Best Paper Nominee.
[3] Baris Akgun, Kaushik Subramanian, and Andrea L. Thomaz. Novel interaction strategies for learning from teleoperation. In AAAI Fall Symposium: Robots Learning Interactively from Human Teachers, 2012.
[4] Baris Akgun and Andrea L. Thomaz. Simultaneously learning actions and goals from demonstration. Autonomous Robots, 2015.
[5] Baris Akgun and Andrea L. Thomaz. Using learned goal models to autonomously improve learned action models. In International Conference on Intelligent Robots and Systems (IROS), 2015.
[6] Baris Akgun and Andrea L. Thomaz. Interactive goal learning and exploration: iGoaL-E. In preparation, 2016.