Robotic Control in Adversarial and Sparse Reward Environments: A
Robust Goal-Conditioned Reinforcement Learning Approach
Xiangkun He1 and Chen Lv1
Abstract— This paper presents a novel robust goal-conditioned reinforcement learning (RGCRL) approach for robotic control in adversarial and sparse reward environments. The proposed method is evaluated on three tasks with adversarial attacks and sparse reward settings. The results show that our scheme can ensure robotic control performance and policy robustness on the adversarial and sparse reward tasks.
I. INTRODUCTION
With deep neural networks (DNNs) as function approximators, reinforcement learning (RL) algorithms have demonstrated their worth in a series of challenging tasks, from
games to robotic control. While existing RL-based robot
control methods have achieved many compelling results, they
mostly assume that the observations are free of uncertainties.
This assumption can hardly hold in real-world scenarios. The
observations of robots may include unexpected perturbations
that naturally arise from inevitable stochastic noise or sensing errors. Control policies that are not robust to such observation uncertainty not only degrade robot performance but can also cause catastrophic failures.
Additionally, a common challenge, especially for robotics,
is sample-efficient learning from sparse rewards, in which an
agent has to find a long sequence of “correct” actions in order
to achieve a desired outcome. Unfortunately, unpredictable perturbations on observations may make this problem even harder.
Hence, we propose a novel RGCRL method for robotic
control in adversarial and sparse reward environments.
1 X. He and C. Lv are with the School of Mechanical and Aerospace Engineering, Nanyang Technological University, Singapore 639798. xiangkun.he@ntu.edu.sg, lyuchen@ntu.edu.sg
Fig. 1. Evaluation results for the different agents on the three adversarial and sparse reward tasks. (a): Reach; (b): Push; (c): Pick-and-Place.
II. METHODOLOGY
The main components of the proposed approach can be summarized as follows: (1) A robust goal-conditioned Markov decision process (RGCMDP) is proposed to model agent behaviors in uncertain and sparse reward environments. (2) A paradigm of mixed adversarial attacks is developed to generate diverse adversarial examples by combining white-box and black-box attacks. (3) A robust goal-conditioned actor-critic algorithm is designed to optimize an agent's control policies while keeping the variations of the policies perturbed by the optimal adversarial attacks within bounds.
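To make components (2) and (3) concrete, the sketch below illustrates one plausible realization in PyTorch: a white-box attack follows the critic's gradient with respect to the observation, a black-box attack draws uniform noise from the same L-infinity budget, and a robustness regularizer penalizes the deviation between actions under clean and attacked observations. The network interfaces, the epsilon budget, and the rule for mixing the two attacks are our own assumptions for illustration, not the paper's exact formulation.

import torch

def white_box_attack(actor, critic, obs, goal, epsilon=0.05):
    # FGSM-style white-box attack: perturb the observation along the
    # direction that most decreases the critic's value of the policy action.
    obs_adv = obs.clone().detach().requires_grad_(True)
    value = critic(obs_adv, goal, actor(obs_adv, goal)).mean()
    value.backward()
    with torch.no_grad():
        return obs - epsilon * obs_adv.grad.sign()

def black_box_attack(obs, epsilon=0.05):
    # Black-box attack: uniform random noise within the same epsilon ball,
    # requiring no access to gradients or network parameters.
    return obs + torch.empty_like(obs).uniform_(-epsilon, epsilon)

def mixed_attack(actor, critic, obs, goal, epsilon=0.05, p_white=0.5):
    # Mix both attack types to diversify adversarial examples
    # (the 50/50 sampling rule is an assumption).
    if torch.rand(1).item() < p_white:
        return white_box_attack(actor, critic, obs, goal, epsilon)
    return black_box_attack(obs, epsilon)

def robustness_regularizer(actor, obs, obs_adv, goal):
    # Penalize the deviation between actions under clean and attacked
    # observations, a surrogate for bounding the policy variation.
    return ((actor(obs, goal) - actor(obs_adv, goal)) ** 2).mean()

In training, such a regularizer would typically be added to the actor loss with a weighting coefficient, so that the policy remains close to its unperturbed behavior under the strongest attack found.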
III. EVALUATION AND CONCLUSIONS
By combining HER [1] with the twin delayed deep deterministic policy gradient (TD3), soft actor-critic (SAC), and robust deep deterministic policy gradient (RDDPG) [2] algorithms, three baselines are implemented to benchmark our approach, denoted TD3-HER, SAC-HER, and RDDPG-HER, respectively.
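The common ingredient of these baselines is hindsight relabeling: stored transitions are replayed with a goal that was actually achieved, so the sparse reward becomes informative for off-policy learning. A minimal sketch of the "final" relabeling strategy follows; the dictionary-based transition layout and the recompute_reward helper are assumptions for illustration.

import copy

def her_relabel_final(episode, recompute_reward):
    # Relabel every transition of an episode with the goal achieved at the
    # end of the episode (the "final" strategy from HER).
    final_goal = episode[-1]["achieved_goal"]
    relabeled = []
    for transition in episode:
        t = copy.deepcopy(transition)
        t["desired_goal"] = final_goal
        # Recompute the reward against the substituted goal, e.g. with a
        # sparse goal reward as sketched in the introduction.
        t["reward"] = recompute_reward(t["achieved_goal"], final_goal)
        relabeled.append(t)
    return relabeled

The relabeled transitions are added to the replay buffer alongside the original ones and trained on with the respective off-policy algorithm.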
The results in Fig. 1 demonstrate that the proposed RGCRL approach is effective and outperforms the three competitive baselines in terms of success rate and policy robustness on the three adversarial and sparse reward tasks.
REFERENCES
[1] M. Andrychowicz, F. Wolski, A. Ray, J. Schneider, R. Fong, P. Welinder, B. McGrew, J. Tobin, O. Pieter Abbeel, and W. Zaremba, "Hindsight experience replay," Advances in Neural Information Processing Systems, vol. 30, 2017.
[2] A. Pattanaik, Z. Tang, S. Liu, G. Bommannan, and G. Chowdhary, "Robust deep reinforcement learning with adversarial attacks," in Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, 2018, pp. 2040–2042.