Lecture Review|Reinforcement Learning and Its Embodied AI Applications (II)

发布者：汤靖玲发布时间：2026-05-11浏览次数：28

On April 30, Tenure-Track Associate Professor Yang Tianpei of our school invited Associate Researcher Tang Hongyao from the Faculty of Intelligence and Computing at Tianjin University and University-Appointed Associate Professor Ma Yi from Shanxi University to deliver a special academic report in Classroom 125, East District, Nanyong Building. The report focused on cutting-edge topics such as continuous reinforcement learning, embodied foundation models, and the unified evolution of physical intelligence. The session was moderated by Yang Tianpei.

Tang Hongyao, in his talk titled Dynamic Reinforcement Learning under Continuous Change, shared his latest research findings. He pointed out that while current artificial intelligence methods have achieved good performance in well-defined and stable tasks, how to maintain effective and stable learning in scenarios where data distributions, environments, or task objectives continuously change remains a major challenge for AI towards continuous evolution. He introduced the dynamics of reinforcement learning under continuous change, with a focus on several key aspects: the evolution of policy networks in low-dimensional parameter spaces, latent space policy modulation methods, the chaining effects of network perturbation in continual reinforcement learning and corresponding regularization methods, as well as the mechanisms of plasticity loss in reinforcement learning under non-stationary conditions. These insights provide new research perspectives for understanding deep reinforcement learning processes and enhancing the continual learning capabilities of intelligent agents.

Ma Yi delivered an in-depth report on the topic of A Unified Evolutionary Framework for Physical Intelligence Based on Embodied Foundation Models. He introduced Embodied-R1.5, a unified embodied foundation model proposed by his team. This model integrates core capabilities such as embodied cognition, spatial reasoning, task planning, embodied error correction, and precise localization, achieving leading performance on multiple embodied VLM benchmarks and VLA evaluation suites. During his talk, Ma highlighted the model's innovative designs in data construction, reinforcement fine-tuning, multi-task reinforcement learning training strategies, and the Planner-Grounder-Corrector closed-loop autonomous framework. He also demonstrated its application potential in real-world experiments, including zero-shot long-horizon autonomous manipulation across different robot embodiments, tool function understanding, and complex multi-step reasoning.

Faculty and students in attendance actively raised questions on topics such as the stability of continuous reinforcement learning, training paradigms for embodied foundation models, cross-embodiment generalization capabilities, and the deployment of real-world robotic tasks. The two experts drew on their own research findings and application experience to provide in-depth answers, leading to a discussion that was both thorough and highly insightful.