Learning Latent Representations for Image Translation using Frequency Distributed CycleGAN

Nigam, Shivangi; Behera, Adarsh Prasad; Verma, Shekhar; Nagabhushan, P.

arXiv:2508.03415 (cs)

[Submitted on 5 Aug 2025]

Title: Learning Latent Representations for Image Translation using Frequency Distributed CycleGAN

Authors: Shivangi Nigam, Adarsh Prasad Behera, Shekhar Verma, P. Nagabhushan

Abstract: This paper presents Fd-CycleGAN, an image-to-image (I2I) translation framework that enhances latent representation learning to approximate real data distributions. Building on CycleGAN, our approach integrates Local Neighborhood Encoding (LNE) and frequency-aware supervision to capture fine-grained local pixel semantics while preserving structural coherence from the source domain. We employ distribution-based loss metrics, including KL/JS divergence and log-based similarity measures, to explicitly quantify the alignment between real and generated image distributions in both the spatial and frequency domains. To validate the efficacy of Fd-CycleGAN, we conduct experiments on diverse datasets: Horse2Zebra, Monet2Photo, and a synthetically augmented Strike-off dataset. Compared to baseline CycleGAN and other state-of-the-art methods, our approach demonstrates superior perceptual quality, faster convergence, and improved mode diversity, particularly in low-data regimes. By effectively capturing local and global distribution characteristics, Fd-CycleGAN achieves more visually coherent and semantically consistent translations. Our results suggest that frequency-guided latent learning significantly improves generalization in image translation tasks, with promising applications in document restoration, artistic style transfer, and medical image synthesis. We also provide comparative insights with diffusion-based generative models, highlighting the advantages of our lightweight adversarial approach in terms of training efficiency and qualitative output.
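The abstract names KL/JS divergence measured in both the spatial and frequency domains but does not give the formulation. The following is a minimal illustrative sketch of that general idea, not the authors' loss: it compares two small grayscale images by the Jensen-Shannon divergence of their pixel-intensity histograms and of their log-magnitude spectrum histograms. The function names, the histogram binning, the smoothing constant, and the naive DFT are all assumptions made for this example; LNE and the log-based similarity measures are not reproduced.

```python
import cmath
import math


def histogram(values, bins, lo, hi):
    """Normalized histogram over [lo, hi]; a tiny constant avoids log(0)."""
    counts = [1e-8] * bins  # smoothing constant is an assumption
    width = (hi - lo) / bins
    for v in values:
        idx = max(0, min(int((v - lo) / width), bins - 1))
        counts[idx] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, bounded above by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def dft2_log_magnitudes(img):
    """Log-scaled magnitudes of a naive 2D DFT (only suitable for tiny images)."""
    h, w = len(img), len(img[0])
    mags = []
    for u in range(h):
        for v in range(w):
            s = 0j
            for y in range(h):
                for x in range(w):
                    angle = -2 * math.pi * (u * y / h + v * x / w)
                    s += img[y][x] * cmath.exp(1j * angle)
            mags.append(math.log1p(abs(s)))
    return mags


def spatial_frequency_js(real, fake, bins=8):
    """Return (spatial JS, frequency JS) for two images with pixels in [0, 1]."""
    real_px = [p for row in real for p in row]
    fake_px = [p for row in fake for p in row]
    spatial = js_divergence(
        histogram(real_px, bins, 0.0, 1.0),
        histogram(fake_px, bins, 0.0, 1.0),
    )
    real_f = dft2_log_magnitudes(real)
    fake_f = dft2_log_magnitudes(fake)
    hi = max(max(real_f), max(fake_f)) + 1e-8  # shared range for both histograms
    freq = js_divergence(
        histogram(real_f, bins, 0.0, hi),
        histogram(fake_f, bins, 0.0, hi),
    )
    return spatial, freq


if __name__ == "__main__":
    black = [[0.0] * 4 for _ in range(4)]
    white = [[1.0] * 4 for _ in range(4)]
    print(spatial_frequency_js(black, white))
```

In a training setting one would more plausibly compute the spectrum with a library FFT such as `numpy.fft.fft2` and use a differentiable estimate of the distributions; the pure-Python version here only keeps the example dependency-free.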

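Comments:	This paper is currently under review for publication in an IEEE Transactions. If accepted, the copyright will be transferred to IEEE
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
Cite as:	arXiv:2508.03415 [cs.CV]
	(or arXiv:2508.03415v1 [cs.CV] for this version)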
	https://doi.org/10.48550/arXiv.2508.03415

Submission history

From: Adarsh Prasad Behera
[v1] Tue, 5 Aug 2025 12:59:37 UTC (20,282 KB)
