High-Frequency Semantics and Geometric Priors for End-to-End Detection Transformers in Challenging UAV Imagery

Peng, Hongxing; Chen, Lide; Zhu, Hui; Chen, Yan


arXiv:2507.00825v2 (cs)

[Submitted on 1 Jul 2025 (v1), last revised 8 Jul 2025 (this version, v2)]




Abstract: Unmanned Aerial Vehicle-based Object Detection (UAV-OD) faces substantial challenges, including small target sizes, high-density distributions, and cluttered backgrounds in UAV imagery. Current algorithms often depend on hand-crafted components such as anchor boxes, which demand fine-tuning and generalize poorly, and Non-Maximum Suppression (NMS), which is threshold-sensitive and prone to misclassifying dense objects. These generic architectures therefore struggle to adapt to aerial imaging characteristics, which limits their performance. Moreover, emerging end-to-end frameworks have yet to effectively mitigate these aerial-specific challenges. To address these issues, we propose HEGS-DETR, a comprehensively enhanced, real-time Detection Transformer framework tailored for UAVs. First, we introduce the High-Frequency Enhanced Semantics Network (HFESNet) as a novel backbone. HFESNet preserves critical high-frequency spatial details to extract robust semantic features, thereby improving discriminative capability for small and occluded targets in complex backgrounds. Second, our Efficient Small Object Pyramid (ESOP) strategy fuses high-resolution feature maps with minimal computational overhead, significantly boosting small object detection. Finally, the proposed Selective Query Recollection (SQR) and Geometry-Aware Positional Encoding (GAPE) modules enhance the stability and localization accuracy of the detector's decoder, effectively optimizing bounding boxes and providing explicit spatial priors for dense scenes. Experiments on the VisDrone dataset demonstrate that HEGS-DETR improves on the baseline by 5.1% in AP50 and 3.8% in AP, while maintaining real-time speed and reducing the parameter count by 4M.
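The abstract's remark that NMS is threshold-sensitive in dense scenes can be illustrated with a small sketch. The code below is a generic greedy NMS written for illustration only; it is not taken from the paper, and the function names `iou` and `greedy_nms`, the box coordinates, and the scores are all hypothetical. It shows that two distinct but tightly packed objects survive or vanish depending solely on the chosen IoU threshold.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold):
    """Return indices of kept boxes, highest score first.

    A box is dropped when its IoU with an already kept box
    exceeds iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical example: two different objects packed closely together.
# Their boxes overlap with IoU = 60 / 140, roughly 0.43.
boxes = [(0, 0, 10, 10), (4, 0, 14, 10)]
scores = [0.9, 0.8]

strict = greedy_nms(boxes, scores, iou_threshold=0.3)  # second object suppressed
loose = greedy_nms(boxes, scores, iou_threshold=0.5)   # both objects kept
```

With the stricter threshold the second, genuinely distinct object is discarded as a duplicate, while a looser threshold keeps it but would also let more true duplicates through. End-to-end detectors in the DETR family avoid this post-processing step by predicting a set of boxes directly.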

Comments:	14 pages, 9 figures, to appear in KBS
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
ACM classes:	I.2.10; I.4.8; I.5.1
Cite as:	arXiv:2507.00825 [cs.CV]
	(or arXiv:2507.00825v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2507.00825

Submission history

From: Lide Chen
[v1] Tue, 1 Jul 2025 14:56:56 UTC (13,425 KB)
[v2] Tue, 8 Jul 2025 01:32:53 UTC (13,376 KB)
