DreamNav: A Trajectory-Based Imaginative Framework for Zero-Shot Vision-and-Language Navigation

Wang, Yunheng; Fang, Yuetong; Wang, Taowen; Feng, Yixiao; Tan, Yawen; Zhang, Shuning; Liu, Peiran; Ji, Yiding; Xu, Renjing

Computer Science > Robotics

arXiv:2509.11197 (cs)

[Submitted on 14 Sep 2025]

Abstract: Vision-and-Language Navigation in Continuous Environments (VLN-CE), which links language instructions to perception and control in the real world, is a core capability of embodied robots. Recently, large-scale pretrained foundation models have been leveraged as shared priors for perception, reasoning, and action, enabling zero-shot VLN without task-specific training. However, existing zero-shot VLN methods depend on costly perception and passive scene understanding, and they collapse control to point-level choices. As a result, they are expensive to deploy, misaligned in action semantics, and short-sighted in planning. To address these issues, we present DreamNav, which focuses on three aspects: (1) to reduce sensory cost, our EgoView Corrector aligns viewpoints and stabilizes egocentric perception; (2) instead of point-level actions, our Trajectory Predictor favors global trajectory-level planning to better align with instruction semantics; and (3) to enable anticipatory, long-horizon planning, our Imagination Predictor endows the agent with proactive thinking capability. On VLN-CE and real-world tests, DreamNav sets a new zero-shot state of the art (SOTA), outperforming the strongest egocentric baseline with extra information by up to 7.49% in success rate (SR) and 18.15% in success weighted by path length (SPL). To our knowledge, this is the first zero-shot VLN method to unify trajectory-level planning and active imagination while using only egocentric inputs.
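
The abstract reports gains in SR and SPL without defining them. As a minimal sketch, assuming the definitions commonly used in the navigation literature (SR is the fraction of successful episodes; SPL weights each success by the ratio of the shortest-path length to the length the agent actually travelled), the two metrics can be computed as below. The `Episode` class, its field names, and the example numbers are hypothetical illustrations, not code or results from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    """One evaluation episode (hypothetical container, not from the paper)."""
    success: bool         # True if the agent stopped close enough to the goal
    shortest_path: float  # shortest start-to-goal distance in metres, assumed > 0
    agent_path: float     # length of the path the agent actually travelled, in metres


def success_rate(episodes: Sequence[Episode]) -> float:
    """SR: fraction of episodes that ended in success."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(e.success for e in episodes) / len(episodes)


def spl(episodes: Sequence[Episode]) -> float:
    """SPL: mean over episodes of success * shortest / max(travelled, shortest)."""
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for e in episodes:
        if e.success:
            # A failed episode contributes 0; a success is discounted by detour length.
            total += e.shortest_path / max(e.agent_path, e.shortest_path)
    return total / len(episodes)


# Made-up episodes for illustration only.
episodes = [
    Episode(success=True, shortest_path=10.0, agent_path=12.5),
    Episode(success=True, shortest_path=8.0, agent_path=8.0),
    Episode(success=False, shortest_path=6.0, agent_path=9.0),
    Episode(success=True, shortest_path=5.0, agent_path=10.0),
]
print(f"SR  = {success_rate(episodes):.3f}")  # SR  = 0.750
print(f"SPL = {spl(episodes):.3f}")           # SPL = 0.575
```

Because SPL discounts every success by its detour, it can never exceed SR on the same set of episodes, so a larger SPL gain than SR gain suggests shorter paths as well as more successes.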


Subjects:	Robotics (cs.RO); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2509.11197 [cs.RO]
	(or arXiv:2509.11197v1 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2509.11197

Submission history

From: Yunheng Wang
[v1] Sun, 14 Sep 2025 09:54:20 UTC (16,450 KB)
