Research Breakthrough (XIV): Solving Robotic Mobile Manipulation via Reinforcement Learning

Posted by: Tang Jingling | Published: 2025-09-09

Recently, Assistant Professor Yang Tianpei of Professor Gao Yang's research group at the School of Artificial Intelligence Science and Technology, Nanjing University, proposed Causal Information Prioritization (CIP), an algorithm that addresses the low sample efficiency of reinforcement learning (RL). The method explicitly models the causal relationships between states, actions, and rewards, guiding the agent during exploration to focus on information that has a causal effect on the task objective, thereby significantly improving learning efficiency and policy stability.

Fig. 1: Framework for Causal Information-Prioritized Reinforcement Learning

The core idea of CIP has two parts. First, it uses counterfactual data augmentation: synthetic samples are generated by swapping state features that are irrelevant to the reward, which strengthens the learning of key state-reward relationships without additional environment interactions. Second, it introduces a causality-aware empowerment mechanism that prioritizes actions with substantial causal effects on rewards through causal reweighting and mutual information maximization, making exploration more targeted and controllable.
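The first idea, swapping reward-irrelevant state features between samples, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `counterfactual_augment`, the boolean `reward_mask` (assumed here to be given, whereas CIP learns the causal structure), and the batch layout are all hypothetical, and next-state handling is omitted for brevity.

```python
import numpy as np

def counterfactual_augment(states, actions, rewards, reward_mask, rng):
    """Build synthetic (state, action, reward) samples by swapping
    reward-irrelevant state features between transitions.

    states:      array of shape (N, D)
    actions:     array of shape (N, A)
    rewards:     array of shape (N,)
    reward_mask: bool array of shape (D,), True where a state feature is
                 assumed to causally affect the reward (hypothetical input)
    rng:         numpy random Generator
    """
    states = np.asarray(states)
    mask = np.asarray(reward_mask, dtype=bool)

    # Pick a random "donor" transition for every sample in the batch.
    donors = rng.permutation(states.shape[0])

    # Keep reward-relevant features, take irrelevant ones from the donor.
    aug_states = states.copy()
    aug_states[:, ~mask] = states[donors][:, ~mask]

    # Under the assumed mask the reward does not depend on the swapped
    # features, so the original actions and rewards are reused unchanged.
    return aug_states, np.array(actions, copy=True), np.array(rewards, copy=True)


# Illustrative usage with toy data (all values are made up).
rng = np.random.default_rng(0)
states = np.arange(12, dtype=float).reshape(4, 3)
actions = np.zeros((4, 1))
rewards = np.array([1.0, 2.0, 3.0, 4.0])
reward_mask = np.array([True, False, True])  # feature 1 assumed irrelevant
aug_states, aug_actions, aug_rewards = counterfactual_augment(
    states, actions, rewards, reward_mask, rng
)
```

The augmented batch can be added to a replay buffer alongside the real transitions. Because the synthetic samples come from recombining stored data, no extra environment steps are needed, which is the source of the sample-efficiency gain described above.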

Fig. 2: Partial Experimental Task Scenarios

The advantage of this method is that it mitigates the sample inefficiency caused by blind exploration and spurious correlations in traditional reinforcement learning approaches. It performs well in challenging settings such as sparse rewards, high-dimensional states, and pixel observations. Experiments show that CIP achieves the best or near-best performance across 39 tasks, including locomotion control, robotic arm manipulation, and Adroit hand manipulation, demonstrating strong generalization and robustness.

Fig. 3: Comparative Results of Robotic Arm Manipulation Tasks

Table 4: Comparative Experimental Results for Locomotion Tasks

Furthermore, the CIP framework exhibits potential for integration with cutting-edge technologies such as object-centric representation and 3D perception, providing a crucial foundation for future applications in real-world scenarios including robotic manipulation and multi-agent collaboration. This work was published at the ICLR 2025 conference.

Project Homepage: https://sites.google.com/view/rl-cip/

GitHub Link: https://github.com/HYeCao/CIP