SatDreamer360: Geometry Consistent Street-View Video Generation from Satellite Imagery

Ze, Xianghui; Zhu, Beiyi; Song, Zhenbo; Lu, Jianfeng; Shi, Yujiao

arXiv:2506.00600v1 (cs)

[Submitted on 31 May 2025]

Authors: Xianghui Ze, Beiyi Zhu, Zhenbo Song, Jianfeng Lu, Yujiao Shi

Abstract: Generating continuous ground-level video from satellite imagery is a challenging task with significant potential for applications in simulation, autonomous navigation, and digital twin cities. Existing approaches primarily focus on synthesizing individual ground-view images, often relying on auxiliary inputs like height maps or handcrafted projections, and fall short in producing temporally consistent sequences. In this paper, we propose SatDreamer360, a novel framework that generates geometrically and temporally consistent ground-view video from a single satellite image and a predefined trajectory. To bridge the large viewpoint gap, we introduce a compact tri-plane representation that encodes scene geometry directly from the satellite image. A ray-based pixel attention mechanism retrieves view-dependent features from the tri-plane, enabling accurate cross-view correspondence without requiring additional geometric priors. To ensure multi-frame consistency, we propose an epipolar-constrained temporal attention module that aligns features across frames using the known relative poses along the trajectory. To support evaluation, we introduce VIGOR++, a large-scale dataset for cross-view video generation, with dense trajectory annotations and high-quality ground-view sequences. Extensive experiments demonstrate that SatDreamer360 achieves superior performance in fidelity, coherence, and geometric alignment across diverse urban scenes.
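The abstract does not spell out how the epipolar constraint is applied, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a pinhole camera with shared intrinsics (focal length `f`, principal point `cx`, `cy`) and a known relative pose `x2 = R @ x1 + t` between two frames; a panoramic street-view camera would need a different projection model. All function names (`fundamental_from_pose`, `epipolar_mask`, and the helpers) are hypothetical. The resulting boolean mask marks, for each pixel of frame 1, which pixels of frame 2 lie near its epipolar line, which is the kind of mask a temporal attention layer could use to restrict cross-frame attention.

```python
import math

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) applied to v equals t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def inv_intrinsics(f, cx, cy):
    """Inverse of the assumed pinhole intrinsics K = [[f,0,cx],[0,f,cy],[0,0,1]]."""
    return [[1.0 / f, 0.0, -cx / f], [0.0, 1.0 / f, -cy / f], [0.0, 0.0, 1.0]]

def fundamental_from_pose(f, cx, cy, R, t):
    """F = K^-T [t]_x R K^-1 for the relative pose x2 = R x1 + t."""
    k_inv = inv_intrinsics(f, cx, cy)
    essential = matmul(skew(t), R)
    return matmul(matmul(transpose(k_inv), essential), k_inv)

def epipolar_distance(F, p1, p2):
    """Pixel distance from p2 (frame 2) to the epipolar line of p1 (frame 1)."""
    x1, y1 = p1
    x2, y2 = p2
    a, b, c = (row[0] * x1 + row[1] * y1 + row[2] for row in F)
    norm = math.hypot(a, b)
    if norm < 1e-12:  # degenerate line (p1 at the epipole): do not restrict
        return 0.0
    return abs(a * x2 + b * y2 + c) / norm

def epipolar_mask(F, height, width, threshold=2.0):
    """mask[i][j] is True when frame-2 pixel j is within `threshold` pixels
    of the epipolar line of frame-1 pixel i (row-major pixel order)."""
    pixels = [(x, y) for y in range(height) for x in range(width)]
    return [[epipolar_distance(F, p, q) <= threshold for q in pixels]
            for p in pixels]
```

In a masked attention layer, entries where the mask is False would typically have their attention logits set to a large negative value before the softmax, so each query only aggregates features from geometrically plausible locations in the other frame. The dense `(H*W) x (H*W)` mask here is quadratic in the number of pixels and is meant for illustration on small feature maps only.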

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2506.00600 [cs.CV]
	(or arXiv:2506.00600v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2506.00600

Submission history

From: Xianghui Ze
[v1] Sat, 31 May 2025 15:15:54 UTC (13,760 KB)
