DRAMA-X: A Fine-grained Intent Prediction and Risk Reasoning Benchmark For Driving

Godbole, Mihir; Gao, Xiangbo; Tu, Zhengzhong

Computer Science > Computer Vision and Pattern Recognition

arXiv:2506.17590 (cs)

[Submitted on 21 Jun 2025]

Title: DRAMA-X: A Fine-grained Intent Prediction and Risk Reasoning Benchmark For Driving

Authors: Mihir Godbole, Xiangbo Gao, Zhengzhong Tu

Abstract: Understanding the short-term motion of vulnerable road users (VRUs) like pedestrians and cyclists is critical for safe autonomous driving, especially in urban scenarios with ambiguous or high-risk behaviors. While vision-language models (VLMs) have enabled open-vocabulary perception, their utility for fine-grained intent reasoning remains underexplored. Notably, no existing benchmark evaluates multi-class intent prediction in safety-critical situations. To address this gap, we introduce DRAMA-X, a fine-grained benchmark constructed from the DRAMA dataset via an automated annotation pipeline. DRAMA-X contains 5,686 accident-prone frames labeled with object bounding boxes, a nine-class directional intent taxonomy, binary risk scores, expert-generated action suggestions for the ego vehicle, and descriptive motion summaries. These annotations enable a structured evaluation of four interrelated tasks central to autonomous decision-making: object detection, intent prediction, risk assessment, and action suggestion. As a reference baseline, we propose SGG-Intent, a lightweight, training-free framework that mirrors the ego vehicle's reasoning pipeline. It sequentially generates a scene graph from visual input using VLM-backed detectors, infers intent, assesses risk, and recommends an action using a compositional reasoning stage powered by a large language model. We evaluate a range of recent VLMs, comparing performance across all four DRAMA-X tasks. Our experiments demonstrate that scene-graph-based reasoning enhances intent prediction and risk assessment, especially when contextual cues are explicitly modeled.
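The sequential structure described in the abstract (detect, infer intent, assess risk, suggest an action) might be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: the names `SceneObject`, `FrameResult`, and `run_pipeline`, the bounding-box layout, and the stage signatures are all assumptions, and a plain list of objects stands in for the scene graph. The four stages are passed in as callables so that any detector or language model could be plugged in.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max). The abstract does not specify one.
BBox = Tuple[float, float, float, float]


@dataclass
class SceneObject:
    """One detected road user (hypothetical structure)."""
    label: str                    # e.g. "pedestrian" or "cyclist"
    bbox: BBox
    intent: Optional[str] = None  # one of the nine directional classes (names not given here)


@dataclass
class FrameResult:
    """Outputs for the four tasks on a single frame (hypothetical structure)."""
    objects: List[SceneObject] = field(default_factory=list)
    risky: Optional[bool] = None  # binary risk score
    action: Optional[str] = None  # suggested ego-vehicle action


def run_pipeline(
    image: object,
    detect: Callable[[object], List[SceneObject]],
    infer_intent: Callable[[SceneObject, List[SceneObject]], str],
    assess_risk: Callable[[List[SceneObject]], bool],
    suggest_action: Callable[[List[SceneObject], bool], str],
) -> FrameResult:
    """Run the four stages in order; each stage sees the outputs of the earlier ones."""
    result = FrameResult(objects=detect(image))
    for obj in result.objects:
        # Intent is inferred per object, with the other objects as context.
        obj.intent = infer_intent(obj, result.objects)
    result.risky = assess_risk(result.objects)
    result.action = suggest_action(result.objects, result.risky)
    return result
```

Keeping each stage as a separate callable mirrors the four benchmark tasks, so each stage's output could in principle be scored on its own.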

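Comments:	19 pages, 5 figures, preprint under review. Code available at: https://github.com/taco-group/DRAMA-X
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Cite as:	arXiv:2506.17590 [cs.CV]
	(or arXiv:2506.17590v1 [cs.CV] for this version)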
	https://doi.org/10.48550/arXiv.2506.17590

Submission history

From: Mihir Sunil Godbole
[v1] Sat, 21 Jun 2025 05:01:42 UTC (3,811 KB)
