Multi-Timescale Hierarchical Reinforcement Learning for Unified Behavior and Control of Autonomous Driving

Jin, Guizhe; Li, Zhuoren; Leng, Bo; Yu, Ran; Xiong, Lu

arXiv:2506.23771 (cs)

[Submitted on 30 June 2025]

Authors: Guizhe Jin, Zhuoren Li, Bo Leng, Ran Yu, Lu Xiong

Abstract: Reinforcement learning (RL) is increasingly used in autonomous driving (AD) and shows clear advantages. However, most RL-based AD methods overlook policy structure design. An RL policy that outputs only short-timescale vehicle control commands produces fluctuating driving behavior because the network outputs themselves fluctuate, while one that outputs only long-timescale driving goals cannot achieve unified optimality of driving behavior and control. Therefore, we propose a multi-timescale hierarchical reinforcement learning approach. Our approach adopts a hierarchical policy structure in which high- and low-level RL policies are trained in a unified manner to produce long-timescale motion guidance and short-timescale control commands, respectively. The motion guidance is explicitly represented by hybrid actions to capture multimodal driving behaviors on structured roads and to support incremental low-level extended-state updates. Additionally, a hierarchical safety mechanism is designed to ensure multi-timescale safety. Evaluation in simulator-based and HighD dataset-based multi-lane highway scenarios demonstrates that our approach significantly improves AD performance, effectively increasing driving efficiency, action consistency, and safety.
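
To make the two-timescale structure in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a hybrid action means a discrete lane choice paired with a continuous speed target, that the high level acts once every 10 control steps, and that safety is enforced by a lane mask at the high level and acceleration clipping at the low level. All names (Guidance, HIGH_LEVEL_PERIOD, rollout, the state dictionary) are hypothetical, and both policies are simple stand-ins rather than trained networks.

```python
import random
from dataclasses import dataclass

HIGH_LEVEL_PERIOD = 10  # assumption: one guidance update per 10 control steps


@dataclass(frozen=True)
class Guidance:
    """Assumed hybrid action: discrete lane choice plus continuous speed target."""
    lane_change: int     # -1 = left, 0 = keep lane, +1 = right
    target_speed: float  # m/s


def high_level_policy(state, rng):
    """Stand-in for a trained high-level policy (samples at random here)."""
    return Guidance(rng.choice([-1, 0, 1]), rng.uniform(20.0, 30.0))


def high_level_safety(state, guidance):
    """Long-timescale check: cancel lane changes into a lane that is not free."""
    if state["lane"] + guidance.lane_change not in state["free_lanes"]:
        return Guidance(0, guidance.target_speed)
    return guidance


def low_level_policy(state, guidance):
    """Stand-in for a trained low-level policy: proportional speed tracking.

    The guidance is part of the low-level input (an extended state).
    """
    return 0.5 * (guidance.target_speed - state["speed"])


def low_level_safety(accel, max_accel=3.0):
    """Short-timescale check: clip acceleration to an assumed actuator range."""
    return max(-max_accel, min(max_accel, accel))


def rollout(steps, seed=0, dt=0.1):
    """Run the hierarchy and log (step, lane, guidance, acceleration)."""
    rng = random.Random(seed)
    state = {"lane": 1, "speed": 25.0, "free_lanes": {0, 1}}
    guidance, log = None, []
    for t in range(steps):
        if t % HIGH_LEVEL_PERIOD == 0:
            # The high level refreshes guidance only at its own, slower rate.
            guidance = high_level_safety(state, high_level_policy(state, rng))
            # Simplification: the lane change is applied instantaneously.
            state["lane"] += guidance.lane_change
        # The low level issues a control command at every step.
        accel = low_level_safety(low_level_policy(state, guidance))
        state["speed"] += accel * dt
        log.append((t, state["lane"], guidance, accel))
    return log
```

Calling rollout(30) produces three guidance updates (at steps 0, 10, and 20) and thirty control commands. Holding the guidance fixed between high-level updates is what keeps the behavior target steady while the control command varies from step to step.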

Comments:	8 pages, submitted to IEEE Robotics and Automation Letters
Subjects:	Robotics (cs.RO); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2506.23771 [cs.RO]
	(or arXiv:2506.23771v1 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2506.23771

Submission history

From: Guizhe Jin
[v1] Mon, 30 Jun 2025 12:17:42 UTC (1,501 KB)
