Vision-Based Localization and LLM-based Navigation for Indoor Environments

Rahimi, Keyan; Haque, Md. Wasiul; Dasgupta, Sagar; Rahman, Mizanur

计算机科学 > 机器学习

arXiv:2508.08120 (cs)

[提交于 2025年8月11日 ]

标题：基于视觉的定位和基于大语言模型的室内环境导航

标题： Vision-Based Localization and LLM-based Navigation for Indoor Environments

Authors:Keyan Rahimi, Md. Wasiul Haque, Sagar Dasgupta, Mizanur Rahman

摘要：室内导航由于缺乏可靠的GPS信号以及大型封闭环境的建筑复杂性，仍然是一个复杂的挑战。本研究提出了一种室内定位和导航方法，该方法结合了基于视觉的定位与基于大语言模型（LLM）的导航。定位系统利用经过两阶段过程微调的ResNet-50卷积神经网络，通过智能手机摄像头输入来识别用户的位置。为了补充定位，导航模块使用一个大语言模型，该模型通过精心设计的系统提示引导，以解释预处理的平面图图像并生成分步指导。在具有重复特征和有限可视性的现实办公室走廊中进行了实验评估，以测试定位的鲁棒性。即使在受限的观看条件和短时查询下，该模型在所有测试航点上都达到了高置信度和96%的准确性。使用ChatGPT在真实建筑平面图上进行的导航测试平均指令准确率为75%，但在零样本推理和推理时间方面存在观察到的限制。这项研究展示了使用现成相机和公开可用的平面图实现可扩展、无需基础设施的室内导航的潜力，特别是在医院、机场和教育机构等资源受限的环境中。

摘要： Indoor navigation remains a complex challenge due to the absence of reliable GPS signals and the architectural intricacies of large enclosed environments. This study presents an indoor localization and navigation approach that integrates vision-based localization with large language model (LLM)-based navigation. The localization system utilizes a ResNet-50 convolutional neural network fine-tuned through a two-stage process to identify the user's position using smartphone camera input. To complement localization, the navigation module employs an LLM, guided by a carefully crafted system prompt, to interpret preprocessed floor plan images and generate step-by-step directions. Experimental evaluation was conducted in a realistic office corridor with repetitive features and limited visibility to test localization robustness. The model achieved high confidence and an accuracy of 96% across all tested waypoints, even under constrained viewing conditions and short-duration queries. Navigation tests using ChatGPT on real building floor maps yielded an average instruction accuracy of 75%, with observed limitations in zero-shot reasoning and inference time. This research demonstrates the potential for scalable, infrastructure-free indoor navigation using off-the-shelf cameras and publicly available floor plans, particularly in resource-constrained settings like hospitals, airports, and educational institutions.

评论：	20页，6图，1表
主题：	机器学习 (cs.LG) ; 人工智能 (cs.AI); 计算机视觉与模式识别 (cs.CV)
引用方式：	arXiv:2508.08120 [cs.LG]
	(或者 arXiv:2508.08120v1 [cs.LG] 对于此版本)
	https://doi.org/10.48550/arXiv.2508.08120

提交历史

来自： Md. Wasiul Haque [查看电子邮件]
[v1] 星期一， 2025 年 8 月 11 日 15:59:09 UTC (2,468 KB)

计算机科学 > 机器学习

标题：基于视觉的定位和基于大语言模型的室内环境导航

标题： Vision-Based Localization and LLM-based Navigation for Indoor Environments

提交历史

获取论文：

参考文献与引用

收藏

文献和引用工具

与本文相关的代码，数据和媒体

演示

推荐器和搜索工具

arXivLabs：与社区合作伙伴的实验项目

计算机科学 > 机器学习

标题： 基于视觉的定位和基于大语言模型的室内环境导航 显示英文标题

标题： Vision-Based Localization and LLM-based Navigation for Indoor Environments

提交历史

获取论文：

参考文献与引用

BibTeX 格式的引用

收藏

文献和引用工具

与本文相关的代码，数据和媒体

演示

推荐器和搜索工具

arXivLabs：与社区合作伙伴的实验项目

标题：基于视觉的定位和基于大语言模型的室内环境导航