Breaking Imitation Bottlenecks: Reinforced Diffusion Powers Diverse Trajectory Generation

Song, Ziying; Liu, Lin; Pan, Hongyu; Liao, Bencheng; Guo, Mingzhe; Yang, Lei; Zhang, Yongchang; Xu, Shaoqing; Jia, Caiyan; Luo, Yadan


arXiv:2507.04049 (cs)

[Submitted on 5 Jul 2025]

Abstract: Most end-to-end autonomous driving methods rely on imitation learning from single expert demonstrations, often leading to conservative and homogeneous behaviors that limit generalization in complex real-world scenarios. In this work, we propose DIVER, an end-to-end driving framework that integrates reinforcement learning with diffusion-based generation to produce diverse and feasible trajectories. At the core of DIVER lies a reinforced diffusion-based generation mechanism. First, the model conditions on map elements and surrounding agents to generate multiple reference trajectories from a single ground-truth trajectory, alleviating the limitations of imitation learning that arise from relying solely on single expert demonstrations. Second, reinforcement learning is employed to guide the diffusion process, where reward-based supervision enforces safety and diversity constraints on the generated trajectories, thereby enhancing their practicality and generalization capability. Furthermore, to address the limitations of L2-based open-loop metrics in capturing trajectory diversity, we propose a novel Diversity metric to evaluate the diversity of multi-mode predictions. Extensive experiments on the closed-loop NAVSIM and Bench2Drive benchmarks, as well as the open-loop nuScenes dataset, demonstrate that DIVER significantly improves trajectory diversity, effectively addressing the mode collapse problem inherent in imitation learning.
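
The abstract's point about L2-based open-loop metrics can be illustrated with a small sketch. It is not the paper's Diversity metric, whose definition is not given here. It assumes a generic stand-in, the mean pairwise waypoint distance between predicted modes, and uses made-up trajectories; the names `l2_error`, `min_l2_error`, and `mean_pairwise_distance` are hypothetical.

```python
import math
from itertools import combinations

# A trajectory is a list of (x, y) waypoints in metres (illustrative data only).
Trajectory = list[tuple[float, float]]


def l2_error(pred: Trajectory, target: Trajectory) -> float:
    """Mean Euclidean distance between matching waypoints."""
    return sum(math.dist(p, t) for p, t in zip(pred, target)) / len(target)


def min_l2_error(modes: list[Trajectory], gt: Trajectory) -> float:
    """Best-of-N L2 error: only the closest mode to the ground truth counts."""
    return min(l2_error(m, gt) for m in modes)


def mean_pairwise_distance(modes: list[Trajectory]) -> float:
    """Assumed stand-in for a diversity score: average distance between modes."""
    pairs = list(combinations(modes, 2))
    if not pairs:
        return 0.0
    return sum(l2_error(a, b) for a, b in pairs) / len(pairs)


gt = [(0.0, 1.0), (0.0, 2.0), (0.0, 3.0)]       # expert drives straight
left = [(-0.5, 1.0), (-1.0, 2.0), (-1.5, 3.0)]  # drifts left
right = [(0.5, 1.0), (1.0, 2.0), (1.5, 3.0)]    # drifts right

collapsed = [gt, gt, gt]     # three identical modes (mode collapse)
diverse = [gt, left, right]  # three distinct modes

# Both sets contain a perfect match, so the L2-based score cannot tell them apart.
print(min_l2_error(collapsed, gt), min_l2_error(diverse, gt))
# The pairwise spread does separate them.
print(mean_pairwise_distance(collapsed), mean_pairwise_distance(diverse))
```

In this toy case both sets score the same best-of-N L2 error, while only the pairwise spread distinguishes the collapsed set from the diverse one, which is the kind of gap a dedicated diversity measure is meant to expose.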

Comments:	16 pages, 6 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Cite as:	arXiv:2507.04049 [cs.CV]
	(or arXiv:2507.04049v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2507.04049

Submission history

From: Ziying Song
[v1] Sat, 5 Jul 2025 14:19:19 UTC (10,207 KB)
