WorldExplorer: Towards Generating Fully Navigable 3D Scenes

Schneider, Manuel-Andreas; Höllein, Lukas; Nießner, Matthias

arXiv:2506.01799 (cs)

[Submitted on 2 Jun 2025]

Title: WorldExplorer: Towards Generating Fully Navigable 3D Scenes

Authors: Manuel-Andreas Schneider, Lukas Höllein, Matthias Nießner

Abstract: Generating 3D worlds from text is a highly anticipated goal in computer vision. Existing works are limited in the degree of exploration they allow inside a scene, i.e., they produce stretched-out and noisy artifacts when moving beyond central or panoramic perspectives. To this end, we propose WorldExplorer, a novel method based on autoregressive video trajectory generation, which builds fully navigable 3D scenes with consistent visual quality across a wide range of viewpoints. We initialize our scenes by creating multi-view consistent images corresponding to a 360-degree panorama. Then, we expand the scene by leveraging video diffusion models in an iterative scene generation pipeline. Concretely, we generate multiple videos along short, pre-defined trajectories that explore the scene in depth, including motion around objects. Our novel scene memory conditions each video on the most relevant prior views, while a collision-detection mechanism prevents degenerate results, such as moving into objects. Finally, we fuse all generated views into a unified 3D representation via 3D Gaussian Splatting optimization. Compared to prior approaches, WorldExplorer produces high-quality scenes that remain stable under large camera motion, enabling, for the first time, realistic and unrestricted exploration. We believe this marks a significant step toward generating immersive and truly explorable virtual 3D environments.
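
The abstract does not state how the scene memory measures "most relevant" or how collisions are detected. Purely as an illustrative sketch under stated assumptions, and not the authors' implementation, one could rank stored views by camera-pose proximity (distance between camera centers plus the angle between viewing directions) and reject a forward step that would carry the camera into the nearest visible surface. The `View` structure, the distance-plus-angle `relevance` score, the `angle_weight`, `k` and `margin` parameters, and the depth-based `collides` rule are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class View:
    """A previously generated frame reduced to its camera pose (hypothetical structure)."""
    view_id: int
    position: tuple[float, float, float]  # camera center in world coordinates
    forward: tuple[float, float, float]   # viewing direction, must be non-zero


def _normalize(v):
    # Scale a non-zero 3-vector to unit length.
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def relevance(query: View, candidate: View, angle_weight: float = 1.0) -> float:
    """Assumed score, lower means more relevant: Euclidean distance between
    camera centers plus a weighted angle (radians) between viewing directions."""
    dist = math.dist(query.position, candidate.position)
    q, c = _normalize(query.forward), _normalize(candidate.forward)
    cos_angle = max(-1.0, min(1.0, sum(a * b for a, b in zip(q, c))))
    return dist + angle_weight * math.acos(cos_angle)


def select_memory(query: View, memory: list[View], k: int = 2) -> list[int]:
    """Return the ids of the k stored views that score as most relevant to the query pose."""
    ranked = sorted(memory, key=lambda v: relevance(query, v))
    return [v.view_id for v in ranked[:k]]


def collides(step_length: float, depth_ahead: float, margin: float = 0.3) -> bool:
    """Assumed rule: moving `step_length` forward is unsafe if it would bring the
    camera within `margin` of the nearest surface seen straight ahead."""
    return step_length > depth_ahead - margin


# Usage: pick conditioning views for a new camera pose.
memory = [
    View(0, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)),
    View(1, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
    View(2, (2.0, 0.0, 1.0), (0.0, 0.0, 1.0)),
    View(3, (5.0, 0.0, 5.0), (0.0, 0.0, -1.0)),
]
query = View(99, (0.5, 0.0, 0.5), (0.0, 0.0, 1.0))
print(select_memory(query, memory, k=2))  # prints [0, 2]
print(collides(1.0, depth_ahead=1.2))     # prints True
```

Combining position and orientation in one score favors views that are both nearby and looking the same way. View 1 sits as close as view 0 but faces sideways, so it ranks below the farther, forward-facing view 2. How the two terms should actually be weighted is left open here.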

Comments:	Project page: https://the-world-explorer.github.io/, Video: https://youtu.be/c1lBnwJWNmE
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2506.01799 [cs.CV]
	(or arXiv:2506.01799v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2506.01799

Submission history

From: Lukas Höllein
[v1] Mon, 2 Jun 2025 15:41:31 UTC (32,356 KB)
