Navigating to Objects in the Real World

Gervet, Theophile; Chintala, Soumith; Batra, Dhruv; Malik, Jitendra; Chaplot, Devendra Singh


arXiv:2212.00922 (cs)

[Submitted on 2 Dec 2022]

Title: Navigating to Objects in the Real World

Authors: Theophile Gervet, Soumith Chintala, Dhruv Batra, Jitendra Malik, Devendra Singh Chaplot

Abstract: Semantic navigation is necessary to deploy mobile robots in uncontrolled environments like our homes, schools, and hospitals. Many learning-based approaches have been proposed in response to the lack of semantic understanding of the classical pipeline for spatial navigation, which builds a geometric map using depth sensors and plans to reach point goals. Broadly, end-to-end learning approaches reactively map sensor inputs to actions with deep neural networks, while modular learning approaches enrich the classical pipeline with learning-based semantic sensing and exploration. But learned visual navigation policies have predominantly been evaluated in simulation. How well do different classes of methods work on a robot? We present a large-scale empirical study of semantic visual navigation methods comparing representative methods from classical, modular, and end-to-end learning approaches across six homes with no prior experience, maps, or instrumentation. We find that modular learning works well in the real world, attaining a 90% success rate. In contrast, end-to-end learning does not, dropping from 77% simulation to 23% real-world success rate due to a large image domain gap between simulation and reality. For practitioners, we show that modular learning is a reliable approach to navigate to objects: modularity and abstraction in policy design enable Sim-to-Real transfer. For researchers, we identify two key issues that prevent today's simulators from being reliable evaluation benchmarks - (A) a large Sim-to-Real gap in images and (B) a disconnect between simulation and real-world error modes - and propose concrete steps forward.

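Comments:	39 pages, 19 figures and tables, submitted to Science Robotics
Subjects:	Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2212.00922 [cs.RO]
	(or arXiv:2212.00922v1 [cs.RO] for this version)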
	https://doi.org/10.48550/arXiv.2212.00922

Submission history

From: Theophile Gervet
[v1] Fri, 2 Dec 2022 01:10:47 UTC (34,947 KB)
