Measuring Semantic Information Production in Generative Diffusion Models

Handke, Florian; Koulischer, Félix; Raya, Gabriel; Ambrogioni, Luca

arXiv:2506.10433 (stat)

[Submitted on 12 Jun 2025]

Abstract: It is well known that semantic and structural features of the generated images emerge at different times during the reverse dynamics of diffusion, a phenomenon that has been connected to physical phase transitions in magnets and other materials. In this paper, we introduce a general information-theoretic approach to measure when these class-semantic "decisions" are made during the generative process. By using an online formula for the optimal Bayesian classifier, we estimate the conditional entropy of the class label given the noisy state. We then determine the time intervals corresponding to the highest information transfer between noisy states and class labels using the time derivative of the conditional entropy. We demonstrate our method on one-dimensional Gaussian mixture models and on DDPM models trained on the CIFAR10 dataset. As expected, we find that the semantic information transfer is highest in the intermediate stages of diffusion and vanishes during the final stages. However, we also find sizable differences between the entropy rate profiles of different classes, suggesting that different "semantic decisions" are located at different intermediate times.
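
The quantities named in the abstract can be illustrated on the one-dimensional Gaussian mixture case. The following is a minimal sketch, not the authors' implementation: it uses the closed-form Bayes posterior of a Gaussian mixture instead of the paper's online formula, and the noise schedule (x_t = sqrt(1 - t) x_0 + sqrt(t) eps), the mixture parameters, the function name `conditional_entropy`, and the sample count are all assumptions chosen for illustration. It estimates the conditional entropy H(y | x_t) by Monte Carlo and differentiates it numerically over time.

```python
import numpy as np

def conditional_entropy(t, means, stds, priors, n_samples=20000, seed=0):
    """Monte Carlo estimate of H(y | x_t) in nats for a 1D Gaussian mixture."""
    rng = np.random.default_rng(seed)
    # Assumed variance-preserving schedule (not taken from the paper).
    alpha, sigma = np.sqrt(1.0 - t), np.sqrt(t)
    # Sample class labels, clean points, then noisy states x_t.
    y = rng.choice(len(priors), size=n_samples, p=priors)
    x0 = rng.normal(means[y], stds[y])
    xt = alpha * x0 + sigma * rng.normal(size=n_samples)
    # Each class-conditional density of x_t is Gaussian with these moments.
    m = alpha * means
    v = alpha**2 * stds**2 + sigma**2
    log_lik = -0.5 * (xt[:, None] - m) ** 2 / v - 0.5 * np.log(2 * np.pi * v)
    # Bayes posterior p(y | x_t), normalized in log space for stability.
    log_post = np.log(priors) + log_lik
    log_post -= np.logaddexp.reduce(log_post, axis=1, keepdims=True)
    post = np.exp(log_post)
    # Average the per-sample posterior entropy over x_t.
    return float(-(post * log_post).sum(axis=1).mean())

# Hypothetical two-class mixture used only for this illustration.
means = np.array([-2.0, 2.0])
stds = np.array([0.5, 0.5])
priors = np.array([0.5, 0.5])

ts = np.linspace(0.01, 0.99, 50)  # forward (noising) time grid
H = np.array([conditional_entropy(t, means, stds, priors) for t in ts])
dH_dt = np.gradient(H, ts)  # finite-difference entropy rate
```

In this sketch t is the forward noising time, so the generative (reverse) process runs from large t to small t and the class uncertainty H decreases along it. Reusing the same seed at every t keeps the curve smooth enough for finite differences. Where the rate `dH_dt` is largest depends on the assumed schedule and mixture parameters.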

Comments: 4 pages, 3 figures, an appendix with derivations and implementation details; accepted at ICLR DeLTa 2025
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as: arXiv:2506.10433 [stat.ML] (or arXiv:2506.10433v1 [stat.ML] for this version)
https://doi.org/10.48550/arXiv.2506.10433

Submission history

From: Florian Handke
[v1] Thu, 12 Jun 2025 07:35:29 UTC (1,852 KB)
