Computer Science > Machine Learning
[Submitted on 11 Aug 2025]
Title: When and how can inexact generative models still sample from the data manifold?
Abstract: A curious phenomenon observed in some dynamical generative models is the following: despite learning errors in the score function or the drift vector field, the generated samples appear to shift \emph{along} the support of the data distribution but not \emph{away} from it. In this work, we investigate this phenomenon of \emph{robustness of the support} by taking a dynamical systems approach on the generating stochastic/deterministic process. Our perturbation analysis of the probability flow reveals that infinitesimal learning errors cause the predicted density to differ from the target density only on the data manifold, for a wide class of generative models. Further, what is the dynamical mechanism that leads to the robustness of the support? We show that the alignment of the top Lyapunov vectors (most sensitive infinitesimal perturbation directions) with the tangent spaces along the boundary of the data manifold leads to robustness, and we prove a sufficient condition on the dynamics of the generating process to achieve this alignment. Moreover, the alignment condition is efficient to compute and, in practice, for robust generative models, automatically leads to accurate estimates of the tangent bundle of the data manifold. Using a finite-time linear perturbation analysis on sample paths as well as probability flows, our work complements and extends existing works on obtaining theoretical guarantees for generative models from stochastic analysis, statistical learning, and uncertainty quantification points of view. Our results apply across different dynamical generative models, such as conditional flow-matching and score-based generative models, and for different target distributions that may or may not satisfy the manifold hypothesis.
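The abstract states that the alignment condition is efficient to compute via a finite-time linear perturbation analysis of the probability flow. Below is a minimal sketch of how such a check could look in practice: propagate a few tangent vectors along the probability-flow ODE with repeated QR re-orthonormalization (the standard Benettin-style scheme for top Lyapunov vectors), then measure their principal-angle alignment with a basis of the data-manifold tangent space. The functions `velocity` and the tangent basis `T` are hypothetical placeholders, not the paper's code; this illustrates the general technique under those assumptions, not the authors' implementation.

```python
import numpy as np

def velocity(x, t):
    # Placeholder probability-flow drift v(x, t); in practice this would be a
    # trained score- or flow-matching model. Here: a simple linear contraction.
    return -x

def jacobian(f, x, t, eps=1e-5):
    """Finite-difference Jacobian of f(., t) at x (illustrative only)."""
    d = x.size
    f0 = f(x, t)
    J = np.zeros((d, d))
    for i in range(d):
        dx = np.zeros(d)
        dx[i] = eps
        J[:, i] = (f(x + dx, t) - f0) / eps
    return J

def finite_time_lyapunov_vectors(x0, t0, t1, steps=200, k=2):
    """Euler-integrate the flow and its tangent map, re-orthonormalizing with QR
    to track the k most sensitive (top Lyapunov) directions at the final point."""
    dt = (t1 - t0) / steps
    x = x0.astype(float).copy()
    Q, _ = np.linalg.qr(np.random.randn(x0.size, k))
    for n in range(steps):
        t = t0 + n * dt
        J = jacobian(velocity, x, t)
        Q = Q + dt * (J @ Q)          # propagate tangent vectors
        Q, _ = np.linalg.qr(Q)        # re-orthonormalize (QR / Gram-Schmidt)
        x = x + dt * velocity(x, t)   # advance the sample along the flow
    return x, Q

def alignment(Q, T):
    """Cosine of the largest principal angle between span(Q) and span(T);
    values near 1 indicate the sensitive directions lie in the tangent space."""
    s = np.linalg.svd(Q.T @ T, compute_uv=False)
    return s.min()

# Toy usage: a 3D sample, with an assumed tangent basis T of the data manifold.
x1, Q = finite_time_lyapunov_vectors(np.array([1.0, -0.5, 0.3]), t0=0.0, t1=1.0)
T, _ = np.linalg.qr(np.random.randn(3, 2))  # hypothetical tangent basis at x1
print("alignment score:", alignment(Q, T))
```

A score close to 1 would indicate that the most sensitive perturbation directions of the generating flow lie within the (estimated) tangent space, which is the alignment mechanism the abstract identifies as the source of support robustness.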