SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation

Ge, Xingtong; Zhang, Xin; Xu, Tongda; Zhang, Yi; Zhang, Xinjie; Wang, Yan; Zhang, Jun


arXiv:2506.00523 (cs)

[Submitted on 31 May 2025]



Abstract: Distribution Matching Distillation (DMD) has been successfully applied to text-to-image diffusion models such as Stable Diffusion (SD) 1.5. However, vanilla DMD suffers from convergence difficulties on large-scale flow-based text-to-image models such as SD 3.5 and FLUX. In this paper, we first analyze the issues that arise when applying vanilla DMD to large-scale models. Then, to overcome the scalability challenge, we propose implicit distribution alignment (IDA) to regularize the distance between the generator and the fake distribution. Furthermore, we propose intra-segment guidance (ISG) to relocate the timestep importance distribution from the teacher model. With IDA alone, DMD converges for SD 3.5; with both IDA and ISG, DMD converges for SD 3.5 and FLUX.1 dev. Along with other improvements such as scaled-up discriminator models, our final model, dubbed SenseFlow, achieves superior performance in distillation for both diffusion-based text-to-image models such as SDXL and flow-matching models such as SD 3.5 Large and FLUX. The source code will be available at https://github.com/XingtongGe/SenseFlow.

Comments:	Under review
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2506.00523 [cs.CV]
	(or arXiv:2506.00523v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2506.00523

Submission history

From: Xingtong Ge
[v1] Sat, 31 May 2025 11:59:02 UTC (21,841 KB)
