GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors

Xu, Tian-Xing; Gao, Xiangjun; Hu, Wenbo; Li, Xiaoyu; Zhang, Song-Hai; Shan, Ying

arXiv:2504.01016 (cs)

[Submitted on 1 Apr 2025]

Authors: Tian-Xing Xu, Xiangjun Gao, Wenbo Hu, Xiaoyu Li, Song-Hai Zhang, Ying Shan

Abstract: Despite remarkable advances in video depth estimation, existing methods are inherently limited in geometric fidelity because their predictions are only affine-invariant (defined up to an unknown scale and shift), which limits their applicability to reconstruction and other metrically grounded downstream tasks. We propose GeometryCrafter, a novel framework that recovers high-fidelity, temporally coherent point map sequences from open-world videos, enabling accurate 3D/4D reconstruction, camera parameter estimation, and other depth-based applications. At the core of our approach lies a point map Variational Autoencoder (VAE) that learns a latent space agnostic to video latent distributions for effective point map encoding and decoding. Leveraging this VAE, we train a video diffusion model to model the distribution of point map sequences conditioned on the input videos. Extensive evaluations on diverse datasets demonstrate that GeometryCrafter achieves state-of-the-art 3D accuracy, temporal consistency, and generalization capability.
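The abstract contrasts affine-invariant depth, which is defined only up to an unknown scale and shift, with point maps, from which camera parameters can be recovered. The plain-Python sketch below illustrates both ideas under stated assumptions; it is not GeometryCrafter's method. `align_scale_shift` fits the scale s and shift t that evaluation protocols commonly use to compare affine-invariant predictions with ground truth. `estimate_focal` shows one simple way to recover a focal length from a camera-space point map. All function names are hypothetical. The pinhole camera model, square pixels, and a known principal point (cx, cy) are also assumptions.

```python
from typing import List, Sequence, Tuple

# Hypothetical point map type: rows of (x, y, z) points in the camera frame.
PointMap = List[List[Tuple[float, float, float]]]


def align_scale_shift(pred: Sequence[float], target: Sequence[float]) -> Tuple[float, float]:
    """Least-squares (s, t) minimizing sum((s * p + t - g) ** 2).

    Affine-invariant depth does not determine s and t, so they are fitted
    against ground truth rather than predicted.
    """
    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(target) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, target))
    var = sum((p - mean_p) ** 2 for p in pred)
    s = cov / var
    return s, mean_g - s * mean_p


def unproject(depth: List[List[float]], f: float, cx: float, cy: float) -> PointMap:
    """Lift a depth map to a camera-space point map (assumed pinhole model)."""
    return [
        [((u - cx) * z / f, (v - cy) * z / f, z) for u, z in enumerate(row)]
        for v, row in enumerate(depth)
    ]


def estimate_focal(points: PointMap, cx: float, cy: float) -> float:
    """Least-squares focal length from a point map.

    Assumes square pixels and a known principal point (cx, cy); minimizes
    sum((f * x / z - (u - cx)) ** 2 + (f * y / z - (v - cy)) ** 2).
    """
    num = den = 0.0
    for v, row in enumerate(points):
        for u, (x, y, z) in enumerate(row):
            if z <= 0:  # skip invalid or behind-camera points
                continue
            a, b = x / z, y / z
            num += a * (u - cx) + b * (v - cy)
            den += a * a + b * b
    return num / den


# Usage on synthetic data: a 4x5 depth map and an assumed focal length of 500.
depth = [[2.0 + 0.1 * (u + v) for u in range(5)] for v in range(4)]
pts = unproject(depth, f=500.0, cx=2.0, cy=1.5)
print(round(estimate_focal(pts, cx=2.0, cy=1.5), 6))  # 500.0
print(align_scale_shift([1, 2, 3, 4], [2.5, 4.5, 6.5, 8.5]))  # (2.0, 0.5)
```

Setting the derivative of the reprojection residual with respect to f to zero gives the closed form in `estimate_focal`. Predicted point maps are noisy, so a practical version would likely add validity masks or robust weighting.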

Comments:	Project webpage: https://geometrycrafter.github.io/
Subjects:	Graphics (cs.GR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2504.01016 [cs.GR]
	(or arXiv:2504.01016v1 [cs.GR] for this version)
	https://doi.org/10.48550/arXiv.2504.01016

Submission history

From: Wenbo Hu
[v1] Tue, 1 Apr 2025 17:58:03 UTC (21,284 KB)