Pixels-to-Graph: Real-time Integration of Building Information Models and Scene Graphs for Semantic-Geometric Human-Robot Understanding

Longo, Antonello; Chung, Chanyoung; Palieri, Matteo; Kim, Sung-Kyun; Agha, Ali; Guaragnella, Cataldo; Khattak, Shehryar

Computer Science > Robotics

arXiv:2506.22593 (cs)

[Submitted on 27 June 2025]

Title: Pixels-to-Graph: Real-time Integration of Building Information Models and Scene Graphs for Semantic-Geometric Human-Robot Understanding

Authors: Antonello Longo, Chanyoung Chung, Matteo Palieri, Sung-Kyun Kim, Ali Agha, Cataldo Guaragnella, Shehryar Khattak

Abstract: Autonomous robots are increasingly playing key roles as support platforms for human operators in high-risk, dangerous applications. Accomplishing challenging tasks requires efficient human-robot cooperation and understanding. While robotic planning typically leverages 3D geometric information, human operators are accustomed to a high-level, compact representation of the environment, such as top-down 2D maps representing the Building Information Model (BIM). 3D scene graphs have emerged as a powerful tool to bridge the gap between human-readable 2D BIM and the robot's 3D maps. In this work, we introduce Pixels-to-Graph (Pix2G), a novel lightweight method to generate structured scene graphs from image pixels and LiDAR maps in real time for the autonomous exploration of unknown environments on resource-constrained robot platforms. To satisfy onboard compute constraints, the framework is designed to perform all operations on CPU only. The method's outputs are a de-noised 2D top-down environment map and a structure-segmented 3D point cloud, which are seamlessly connected by a multi-layer graph abstracting information from the object level up to the building level. The proposed method is quantitatively and qualitatively evaluated in real-world experiments performed using the NASA JPL NeBula-Spot legged robot to autonomously explore and map cluttered garage and urban office-like environments in real time.
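As an illustration of the multi-layer graph idea mentioned in the abstract, the following is a minimal sketch of a layered scene graph in which each node links to a parent one layer up, and nodes can be projected to a top-down 2D view. It is not the Pix2G implementation: the layer names ("object", "room", "building"), the `Node` and `SceneGraph` classes, and all method names are assumptions made here for illustration only.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical layer names, ordered bottom to top. The abstract only states
# that the graph spans the object level up to the building level.
LAYERS = ("object", "room", "building")


@dataclass
class Node:
    node_id: str
    layer: str
    # Illustrative 3D position in the map frame (metres).
    position: tuple = (0.0, 0.0, 0.0)


@dataclass
class SceneGraph:
    nodes: dict = field(default_factory=dict)
    # Maps child id -> parent id, linking a node to the next layer up.
    parent: dict = field(default_factory=dict)

    def add_node(self, node: Node, parent_id: Optional[str] = None) -> None:
        if node.layer not in LAYERS:
            raise ValueError(f"unknown layer: {node.layer}")
        if parent_id is not None:
            parent_layer = self.nodes[parent_id].layer
            # In this sketch a parent must sit exactly one layer above its child.
            if LAYERS.index(parent_layer) != LAYERS.index(node.layer) + 1:
                raise ValueError("parent must be one layer above child")
            self.parent[node.node_id] = parent_id
        self.nodes[node.node_id] = node

    def ancestors(self, node_id: str) -> list:
        """Return ids from the node's parent up to the top layer."""
        chain = []
        while node_id in self.parent:
            node_id = self.parent[node_id]
            chain.append(node_id)
        return chain

    def footprint_2d(self, layer: str) -> list:
        """Project all nodes of one layer to top-down (x, y) coordinates."""
        return [n.position[:2] for n in self.nodes.values() if n.layer == layer]


graph = SceneGraph()
graph.add_node(Node("building_0", "building"))
graph.add_node(Node("room_0", "room", (4.0, 2.0, 0.0)), parent_id="building_0")
graph.add_node(Node("chair_0", "object", (4.5, 2.5, 0.4)), parent_id="room_0")
```

Calling `graph.ancestors("chair_0")` walks from the object up through its room to the building, and `graph.footprint_2d("object")` drops the height coordinate to give the kind of top-down view a human operator might read alongside a 2D map.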

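Comments:	Paper accepted to the 2025 IEEE International Conference on Automation Science and Engineering (CASE)
Subjects:	Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2506.22593 [cs.RO]
	(or arXiv:2506.22593v1 [cs.RO] for this version)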
	https://doi.org/10.48550/arXiv.2506.22593

Submission history

From: Antonello Longo
[v1] Fri, 27 Jun 2025 19:23:31 UTC (11,673 KB)
