Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations

Xu, Xiang; Kong, Lingdong; Wang, Song; Zhou, Chuanwei; Liu, Qingshan

arXiv:2507.05260 (cs)

[Submitted on 7 Jul 2025]

Abstract: LiDAR representation learning aims to extract rich structural and semantic information from large-scale, readily available datasets, reducing reliance on costly human annotations. However, existing LiDAR representation strategies often overlook the inherent spatiotemporal cues in LiDAR sequences, which limits their effectiveness. In this work, we propose LiMA, a novel long-term image-to-LiDAR Memory Aggregation framework that explicitly captures longer-range temporal correlations to enhance LiDAR representation learning. LiMA comprises three key components: 1) a Cross-View Aggregation module that aligns and fuses overlapping regions across neighboring camera views, constructing a more unified and redundancy-free memory bank; 2) a Long-Term Feature Propagation mechanism that efficiently aligns and integrates multi-frame image features, reinforcing temporal coherence during LiDAR representation learning; and 3) a Cross-Sequence Memory Alignment strategy that enforces consistency across driving sequences, improving generalization to unseen environments. LiMA maintains high pretraining efficiency and incurs no additional computational overhead during downstream tasks. Extensive experiments on mainstream LiDAR-based perception benchmarks demonstrate that LiMA significantly improves both LiDAR semantic segmentation and 3D object detection. We hope this work inspires more effective pretraining paradigms for autonomous driving. The code has been made publicly available for future research.
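The abstract gives no implementation details, so the following is only a minimal NumPy sketch of the generic image-to-LiDAR distillation ideas it names; it is not LiMA's method or code (the linked repository is the authoritative source). It assumes pinhole cameras with known intrinsics `K` and LiDAR-to-camera extrinsics `T`, uses plain averaging over cameras as a stand-in for cross-view aggregation, an exponential moving average as a stand-in for a long-term memory, and a cosine distillation loss. All function names (`project_points`, `aggregate_views`, `update_memory`, `cosine_distill_loss`) are hypothetical.

```python
import numpy as np


def project_points(points_xyz, K, T_cam_from_lidar, img_hw):
    """Project LiDAR points (N, 3) into one pinhole camera.

    Returns integer pixel coordinates (N, 2) as (u, v) and a boolean mask (N,)
    that is True for points in front of the camera and inside the image.
    """
    n = points_xyz.shape[0]
    homo = np.hstack([points_xyz, np.ones((n, 1))])       # (N, 4) homogeneous
    cam = (T_cam_from_lidar @ homo.T).T[:, :3]            # (N, 3) camera frame
    in_front = cam[:, 2] > 1e-6
    depth = np.where(in_front, cam[:, 2], 1.0)            # avoid dividing by <= 0
    pix = (K @ (cam / depth[:, None]).T).T                # (N, 3) pixel coords
    u = np.floor(pix[:, 0]).astype(int)
    v = np.floor(pix[:, 1]).astype(int)
    h, w = img_hw
    visible = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    return np.stack([u, v], axis=1), visible


def aggregate_views(points_xyz, feat_maps, Ks, Ts):
    """Hypothetical stand-in for cross-view aggregation: average the pixel
    features of every camera that sees a point, so overlapping views yield a
    single target per point instead of redundant copies."""
    n, c = points_xyz.shape[0], feat_maps[0].shape[-1]
    acc = np.zeros((n, c))
    count = np.zeros(n)
    for feat, K, T in zip(feat_maps, Ks, Ts):
        h, w, _ = feat.shape
        uv, vis = project_points(points_xyz, K, T, (h, w))
        acc[vis] += feat[uv[vis, 1], uv[vis, 0]]          # index as (row=v, col=u)
        count[vis] += 1
    has_feat = count > 0
    acc[has_feat] /= count[has_feat][:, None]
    return acc, has_feat


def update_memory(memory, new_feats, valid, momentum=0.9):
    """Hypothetical long-term memory: exponential moving average of per-point
    targets across frames. Assumes point indices are consistent across frames
    (e.g., points already registered to a common world frame)."""
    out = memory.copy()
    out[valid] = momentum * memory[valid] + (1.0 - momentum) * new_feats[valid]
    return out


def cosine_distill_loss(point_feats, target_feats, mask):
    """Mean (1 - cosine similarity) between point features and image-derived
    targets over points that have a target. Computes the value only (no
    gradients); in training, the image side would typically be frozen."""
    if not mask.any():
        return 0.0
    p = point_feats[mask]
    t = target_feats[mask]
    p = p / (np.linalg.norm(p, axis=1, keepdims=True) + 1e-8)
    t = t / (np.linalg.norm(t, axis=1, keepdims=True) + 1e-8)
    return float(np.mean(1.0 - np.sum(p * t, axis=1)))
```

Averaging is the simplest way to give a point seen by two neighboring cameras one target; a confidence-weighted or learned fusion would be a natural refinement. The moving-average memory sidesteps the hard part of long-horizon propagation, namely aligning features across frames under ego-motion, by assuming that alignment is already done.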

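Comments:	ICCV 2025; 26 pages, 12 figures, 10 tables; code is available at http://github.com/Xiangxu-0103/LiMA
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
Cite as:	arXiv:2507.05260 [cs.CV]
	(or arXiv:2507.05260v1 [cs.CV] for this version)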
	https://doi.org/10.48550/arXiv.2507.05260

Submission history

From: Xiang Xu
[v1] Mon, 7 Jul 2025 17:59:58 UTC (9,560 KB)
