Understanding Driving Risks using Large Language Models: Toward Elderly Driver Assessment

Yoshihara, Yuki; Jiang, Linjing; Karatas, Nihan; Kanamori, Hitoshi; Harada, Asuka; Tanaka, Takahiro

Computer Science > Computer Vision and Pattern Recognition

arXiv:2507.08367 (cs)

[Submitted on 11 Jul 2025]

Title: Understanding Driving Risks using Large Language Models: Toward Elderly Driver Assessment

Authors: Yuki Yoshihara, Linjing Jiang, Nihan Karatas, Hitoshi Kanamori, Asuka Harada, Takahiro Tanaka

Abstract: This study investigates the potential of a multimodal large language model (LLM), specifically ChatGPT-4o, to perform human-like interpretations of traffic scenes using static dashcam images. Herein, we focus on three judgment tasks relevant to elderly driver assessments: evaluating traffic density, assessing intersection visibility, and recognizing stop signs. These tasks require contextual reasoning rather than simple object detection. Using zero-shot, few-shot, and multi-shot prompting strategies, we evaluated the performance of the model with human annotations serving as the reference standard. Evaluation metrics included precision, recall, and F1-score. Results indicate that prompt design considerably affects performance, with recall for intersection visibility increasing from 21.7% (zero-shot) to 57.0% (multi-shot). For traffic density, agreement increased from 53.5% to 67.6%. In stop-sign detection, the model demonstrated high precision (up to 86.3%) but a lower recall (approximately 76.7%), indicating a conservative response tendency. Output stability analysis revealed that humans and the model faced difficulties interpreting structurally ambiguous scenes. However, the model's explanatory texts corresponded with its predictions, enhancing interpretability. These findings suggest that, with well-designed prompts, LLMs hold promise as supportive tools for scene-level driving risk assessments. Future studies should explore scalability using larger datasets, diverse annotators, and next-generation model architectures for elderly driver assessments.
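
The abstract names precision, recall, and F1-score as the evaluation metrics, with human annotations as the reference standard. As a minimal sketch of how such scores could be computed for a binary task such as stop-sign recognition, the snippet below compares model outputs with human labels. The function name and the toy labels are hypothetical and are not taken from the paper; the authors' actual evaluation code is not described in this listing.

```python
from typing import Sequence


def precision_recall_f1(
    human: Sequence[int], model: Sequence[int]
) -> tuple[float, float, float]:
    """Score binary model outputs against human annotations (1 = positive)."""
    if len(human) != len(model):
        raise ValueError("label sequences must have the same length")
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for h, m in zip(human, model) if h == 1 and m == 1)
    fp = sum(1 for h, m in zip(human, model) if h == 0 and m == 1)
    fn = sum(1 for h, m in zip(human, model) if h == 1 and m == 0)
    # Fall back to 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Hypothetical toy labels for illustration only, not data from the paper.
human_labels = [1, 1, 1, 1, 0, 0, 0, 0]
model_labels = [1, 1, 0, 0, 1, 0, 0, 0]
p, r, f1 = precision_recall_f1(human_labels, model_labels)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

In this toy case the model misses two of the four positives while raising one false alarm, so recall (0.500) falls below precision (0.667). That is the same precision-above-recall pattern the abstract describes as a conservative response tendency.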

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Systems and Control (eess.SY)
Cite as:	arXiv:2507.08367 [cs.CV]
	(or arXiv:2507.08367v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2507.08367

Submission history

From: Yuki Yoshihara
[v1] Fri, 11 Jul 2025 07:28:49 UTC (8,824 KB)
