Enhancing Monocular Height Estimation via Weak Supervision from Imperfect Labels

Chen, Sining; Shi, Yilei; Zhu, Xiao Xiang

arXiv:2506.02534 (cs)

[Submitted on 3 June 2025]

Title: Enhancing Monocular Height Estimation via Weak Supervision from Imperfect Labels

Authors: Sining Chen, Yilei Shi, Xiao Xiang Zhu

Abstract: Monocular height estimation is considered the most efficient and cost-effective means of 3D perception in remote sensing, and it has attracted much attention since the emergence of deep learning. Training neural networks requires large amounts of data, yet data with perfect labels are scarce and available only in developed regions. Models trained on such data therefore generalize poorly, which limits the potential of existing methods for large-scale application. We tackle this problem for the first time by introducing data with imperfect labels, that is, labels that are incomplete, inexact, and inaccurate compared to high-quality labels, into the training of pixel-wise height estimation networks. We propose an ensemble-based pipeline compatible with any monocular height estimation network. To address noisy labels, domain shift, and the long-tailed distribution of height values, we design the architecture and loss functions to exploit the information concealed in imperfect labels through weak supervision with balanced soft losses and ordinal constraints. We conduct extensive experiments on two datasets with different resolutions, DFC23 (0.5 to 1 m) and GBH (3 m). The results indicate that the proposed pipeline outperforms baselines by achieving more balanced performance across domains, improving average root mean square error by up to 22.94% on DFC23 and 18.62% on GBH. The efficacy of each design component is validated through ablation studies. Code is available at https://github.com/zhu-xlab/weakim2h.
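The abstract reports gains as relative improvements in average root mean square error (RMSE). As a minimal sketch of how such a figure could be computed, the Python below evaluates RMSE only on labelled pixels (incomplete labels are marked with `None`), averages it over domains, and expresses the gain as a percentage. The function names, the use of `None` for missing labels, and the unweighted mean over domains are assumptions made for illustration; they are not taken from the paper or its repository, and the paper may define the average differently.

```python
import math


def rmse(pred, target):
    """RMSE over pixels that have a label; None marks a missing label (assumed convention)."""
    squared = [(p - t) ** 2 for p, t in zip(pred, target) if t is not None]
    if not squared:
        raise ValueError("no labelled pixels to evaluate")
    return math.sqrt(sum(squared) / len(squared))


def average_rmse(per_domain):
    """Unweighted mean of per-domain RMSE (an assumed way of averaging across domains).

    per_domain maps a domain name to a (predictions, targets) pair of flat lists.
    """
    scores = [rmse(pred, target) for pred, target in per_domain.values()]
    return sum(scores) / len(scores)


def relative_improvement(baseline_rmse, proposed_rmse):
    """Percentage reduction of RMSE relative to the baseline."""
    return 100.0 * (baseline_rmse - proposed_rmse) / baseline_rmse


# Toy heights in metres, invented for illustration only (not results from the paper).
baseline = {
    "domain_a": ([12.0, 3.0, 7.0], [10.0, None, 5.0]),
    "domain_b": ([20.0, 1.0], [16.0, 0.0]),
}
proposed = {
    "domain_a": ([11.0, 3.0, 6.0], [10.0, None, 5.0]),
    "domain_b": ([18.0, 0.5], [16.0, 0.0]),
}
gain = relative_improvement(average_rmse(baseline), average_rmse(proposed))
print(f"relative RMSE improvement: {gain:.2f}%")
```

Averaging per domain before comparing, rather than pooling all pixels, keeps a large domain from dominating the score, which matches the abstract's emphasis on balanced performance across domains.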

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2506.02534 [cs.CV]
	(or arXiv:2506.02534v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2506.02534

Submission history

From: Sining Chen
[v1] Tue, 3 Jun 2025 07:14:16 UTC (2,598 KB)
