Image Coding for Machines via Feature-Preserving Rate-Distortion Optimization

Fernández-Menduiña, Samuel; Pavez, Eduardo; Ortega, Antonio


arXiv:2504.02216v1 (eess)

[Submitted on 3 Apr 2025]

Abstract: Many images and videos are primarily processed by computer vision algorithms, with only occasional human inspection. When such content must be compressed before processing (e.g., in distributed applications), coding methods must optimize for both visual quality and downstream task performance. We first show that, given the features obtained from the original and decoded images, one way to reduce the effect of compression on a task loss is to perform rate-distortion optimization (RDO) using the distance between features as the distortion metric. However, directly optimizing such a rate-distortion trade-off requires an iterative workflow of encoding, decoding, and feature evaluation for each coding parameter, which is computationally impractical. We address this problem by simplifying the RDO formulation so that the distortion term can be computed within block-based encoders. To do so, we first apply a Taylor expansion to the feature extractor, recasting the feature distance as a quadratic metric defined by the Jacobian matrix of the neural network. We then replace the linearized metric with a block-wise approximation, which we call input-dependent squared error (IDSE). To reduce computational complexity, we approximate IDSE using Jacobian sketches. The resulting loss can be evaluated block-wise in the transform domain and combined with the sum of squared errors (SSE) to address both visual quality and computer vision performance. Simulations with AVC across multiple feature extractors and downstream neural networks show up to 10% bit-rate savings for the same computer vision accuracy compared to SSE-based RDO, with no decoder complexity overhead and only a 7% increase in encoder complexity.
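The chain of approximations summarized above (feature distance, Taylor linearization, block-wise quadratic metric, Jacobian sketch, transform-domain evaluation combined with SSE) can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation, and several pieces are assumptions: the feature extractor is a hypothetical one-layer toy f(x) = tanh(W x) acting on a 1-D signal; the block-wise approximation is taken here to keep only the diagonal blocks of J^T J; the Jacobian sketch is taken to be a Gaussian random projection; the block transform is an orthonormal DCT-II; and the way IDSE is weighted against SSE (`gamma`) and rate (`lam`), along with every function name, is hypothetical.

```python
import math
import random


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def features(x, W):
    """Hypothetical stand-in for a neural feature extractor: f(x) = tanh(W x)."""
    return [math.tanh(t) for t in matvec(W, x)]


def jacobian(x, W):
    """Jacobian of tanh(W x) at x: row i equals (1 - tanh(w_i . x)^2) * w_i."""
    return [[(1.0 - math.tanh(t) ** 2) * w for w in row]
            for row, t in zip(W, matvec(W, x))]


def block_weights(J, start, size):
    """Q_b = J_b^T J_b from the Jacobian columns of block b (assumed block-wise approximation)."""
    cols = range(start, start + size)
    return [[sum(r[i] * r[j] for r in J) for j in cols] for i in cols]


def quad(e, Q):
    """Quadratic form e^T Q e (the per-block IDSE when Q = Q_b)."""
    n = len(e)
    return sum(e[i] * Q[i][j] * e[j] for i in range(n) for j in range(n))


def sketch(J, k, rng):
    """Assumed Gaussian sketch S J with S_ri ~ N(0, 1/k), so E[(S J)^T (S J)] = J^T J."""
    S = [[rng.gauss(0.0, 1.0 / math.sqrt(k)) for _ in J] for _ in range(k)]
    return [[sum(S[r][i] * J[i][c] for i in range(len(J))) for c in range(len(J[0]))]
            for r in range(k)]


def dct_matrix(n):
    """Orthonormal DCT-II matrix T: transform coefficients are c = T e."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
            for k in range(n)]


def to_transform_domain(Q, T):
    """Return T Q T^T; since T is orthonormal, e^T Q e = c^T (T Q T^T) c."""
    n = len(T)
    TQ = [[sum(T[a][i] * Q[i][j] for i in range(n)) for j in range(n)] for a in range(n)]
    return [[sum(TQ[a][j] * T[b][j] for j in range(n)) for b in range(n)] for a in range(n)]


def rd_cost(err, Q, rate, lam, gamma):
    """Hypothetical combined RD cost: SSE + gamma * IDSE + lam * rate."""
    return sum(v * v for v in err) + gamma * quad(err, Q) + lam * rate


def choose_mode(errs, rates, Q, lam, gamma):
    """Index of the candidate (e.g., a coding mode) with the lowest combined cost."""
    costs = [rd_cost(err, Q, r, lam, gamma) for err, r in zip(errs, rates)]
    return min(range(len(costs)), key=lambda i: costs[i])


rng = random.Random(0)
n, m, B = 8, 12, 4  # toy signal length, feature dimension, block size
W = [[rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)] for _ in range(m)]
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
e = [1e-4 * rng.gauss(0.0, 1.0) for _ in range(n)]  # small coding error

# Taylor step: ||f(x + e) - f(x)||^2 is close to e^T J^T J e for small e.
J = jacobian(x, W)
x_hat = [xi + ei for xi, ei in zip(x, e)]
exact = sum((a - b) ** 2 for a, b in zip(features(x_hat, W), features(x, W)))
linear = quad(e, block_weights(J, 0, n))

# Block-wise, sketched weights for the first block, evaluated on DCT coefficients.
T = dct_matrix(B)
Q0 = block_weights(J, 0, B)
Q0_sketch = block_weights(sketch(J, 4, rng), 0, B)
c0 = matvec(T, e[:B])
print(exact, linear, quad(c0, to_transform_domain(Q0_sketch, T)))
```

For a small coding error, `exact` and `linear` agree closely, which is what justifies the Taylor step. The block weights depend on `x` through the Jacobian, which is what makes the error "input-dependent", and because the DCT here is orthonormal, the quadratic form can be evaluated directly on transform coefficients without an inverse transform.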

Subjects:	Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2504.02216 [eess.IV]
	(or arXiv:2504.02216v1 [eess.IV] for this version)
	https://doi.org/10.48550/arXiv.2504.02216

Submission history

From: Samuel Fernández
[v1] Thu, 3 Apr 2025 02:11:26 UTC (6,726 KB)