Generating 360° Video is What You Need For a 3D Scene

Zhang, Zhaoyang; Hold-Geoffroy, Yannick; Hašan, Miloš; Chen, Ziwen; Luan, Fujun; Dorsey, Julie; Hu, Yiwei

Computer Science > Graphics

arXiv:2504.02045 (cs)

[Submitted on 2 Apr 2025 (v1), last revised 25 Sep 2025 (this version, v4)]

Title: Generating 360° Video is What You Need For a 3D Scene

Authors: Zhaoyang Zhang, Yannick Hold-Geoffroy, Miloš Hašan, Ziwen Chen, Fujun Luan, Julie Dorsey, Yiwei Hu

Abstract: Generating 3D scenes is still a challenging task due to the lack of readily available scene data. Most existing methods only produce partial scenes and provide limited navigational freedom. We introduce a practical and scalable solution that uses 360° video as an intermediate scene representation, capturing the full-scene context and ensuring consistent visual content throughout the generation. We propose WorldPrompter, a generative pipeline that synthesizes traversable 3D scenes from text prompts. WorldPrompter incorporates a conditional 360° panoramic video generator, capable of producing a 128-frame video that simulates a person walking through and capturing a virtual environment. The resulting video is then reconstructed as Gaussian splats by a fast feedforward 3D reconstructor, enabling a true walkable experience within the 3D scene. Experiments demonstrate that our panoramic video generation model, trained with a mix of image and video data, achieves convincing spatial and temporal consistency for static scenes. This is validated by an average COLMAP matching rate of 94.6%, allowing for high-quality panoramic Gaussian splat reconstruction and improved navigation throughout the scene. Qualitative and quantitative results also show that it outperforms state-of-the-art 360° video generators and 3D scene generation models.
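As background for the 360° representation mentioned in the abstract, the sketch below shows how a pixel in an equirectangular panorama frame can be mapped to a viewing direction and back. It is only an illustration of the standard equirectangular projection, not code from WorldPrompter. The function names and the axis convention (+y up, +z forward, longitude increasing to the right) are assumptions made for this example.

```python
import math


def equirect_pixel_to_ray(u, v, width, height):
    """Map pixel (u, v) of an equirectangular frame to a unit ray direction.

    Assumed convention (not taken from the paper): longitude covers
    [-pi, pi) from left to right, latitude covers [pi/2, -pi/2] from top
    to bottom, +y is up and +z is the forward direction.
    """
    lon = ((u + 0.5) / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - (v + 0.5) / height) * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


def ray_to_equirect_pixel(x, y, z, width, height):
    """Inverse mapping: project a (not necessarily unit) ray to pixel coordinates."""
    norm = math.sqrt(x * x + y * y + z * z)
    lon = math.atan2(x, z)
    # Clamp to guard against rounding slightly outside [-1, 1].
    lat = math.asin(max(-1.0, min(1.0, y / norm)))
    u = (lon / (2.0 * math.pi) + 0.5) * width - 0.5
    v = (0.5 - lat / math.pi) * height - 0.5
    return (u, v)


if __name__ == "__main__":
    # Hypothetical frame size chosen for illustration only.
    ray = equirect_pixel_to_ray(100, 50, 2048, 1024)
    print(ray, ray_to_equirect_pixel(*ray, 2048, 1024))
```

Because every pixel corresponds to a direction on the full sphere, a single panoramic frame covers the whole surrounding scene, which is the property the abstract relies on when it describes 360° video as capturing the full-scene context.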

Comments:	SIGGRAPH Asia 2025. Project page: https://zhaoyangzh.github.io/projects/worldprompter/
Subjects:	Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2504.02045 [cs.GR]
	(or arXiv:2504.02045v4 [cs.GR] for this version)
	https://doi.org/10.48550/arXiv.2504.02045

Submission history

From: Zhaoyang Zhang
[v1] Wed, 2 Apr 2025 18:04:32 UTC (43,031 KB)
[v2] Mon, 22 Sep 2025 17:49:50 UTC (28,991 KB)
[v3] Tue, 23 Sep 2025 20:29:33 UTC (28,991 KB)
[v4] Thu, 25 Sep 2025 03:04:40 UTC (28,991 KB)
