Computer Science > Machine Learning

arXiv:2509.14216 (cs)
[Submitted on 17 Sep 2025]

Title: A Universal Banach--Bregman Framework for Stochastic Iterations: Unifying Stochastic Mirror Descent, Learning and LLM Training


Authors:Johnny R. Zhang (Independent Researcher), Xiaomei Mi (University of Manchester), Gaoyuan Du (Amazon), Qianyi Sun (Microsoft), Shiqi Wang (Meta), Jiaxuan Li (Amazon), Wenhua Zhou (Independent Researcher)
Abstract: Stochastic optimization powers the scalability of modern artificial intelligence, spanning machine learning, deep learning, reinforcement learning, and large language model training. Yet existing theory remains largely confined to Hilbert spaces, relying on inner-product structure and orthogonality. This paradigm fails to capture non-Euclidean settings such as mirror descent on simplices, Bregman proximal methods for sparse learning, natural gradient descent in information geometry, and Kullback--Leibler-regularized language model training. This work introduces a Banach--Bregman framework for stochastic iterations that, unlike Euclidean Hilbert-space methods, embraces general Banach spaces and establishes Bregman geometry as a foundation for next-generation optimization. The framework (i) provides a unified template via Bregman projections and Bregman--Fejér monotonicity, encompassing stochastic approximation, mirror descent, natural gradient, adaptive methods, and mirror-prox; (ii) establishes super-relaxations ($\lambda > 2$) in non-Hilbert settings, enabling flexible geometries and elucidating their acceleration effect; and (iii) delivers convergence theorems ranging from almost-sure boundedness to geometric rates, validated on synthetic and real-world tasks. Empirical studies across machine learning (UCI benchmarks), deep learning (e.g., Transformer training), reinforcement learning (actor--critic), and large language models (WikiText-2 with distilGPT-2) show up to 20% faster convergence, reduced variance, and improved accuracy over classical baselines. These results position Banach--Bregman geometry as a cornerstone unifying optimization theory and practice across core AI paradigms.
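The abstract cites mirror descent on simplices as a canonical non-Euclidean setting that Hilbert-space theory misses. As a purely illustrative sketch (not the paper's algorithm; the toy loss, step size, and iteration count are hypothetical), the snippet below shows stochastic mirror descent on the probability simplex with the negative-entropy mirror map, where the Bregman (KL) projection back onto the simplex reduces to a renormalization:

```python
import numpy as np

def smd_simplex(grad_fn, dim, steps=500, lr=0.1, seed=0):
    """Stochastic mirror descent on the probability simplex using the
    negative-entropy mirror map (KL Bregman divergence). The mirror step
    is the exponentiated-gradient update; the Bregman projection onto the
    simplex is a simple renormalization."""
    rng = np.random.default_rng(seed)
    x = np.full(dim, 1.0 / dim)        # start at the simplex barycenter
    for _ in range(steps):
        g = grad_fn(x, rng)            # noisy (sub)gradient estimate
        x = x * np.exp(-lr * g)        # gradient step in the dual (mirror) space
        x /= x.sum()                   # Bregman (KL) projection back onto the simplex
    return x

# Toy usage: minimize E||x - p||^2 over the simplex with noisy gradients.
p = np.array([0.7, 0.2, 0.1])
noisy_grad = lambda x, rng: 2.0 * (x - p) + 0.05 * rng.standard_normal(x.size)
print(smd_simplex(noisy_grad, dim=3))
```

With the negative-entropy mirror map, the update stays strictly inside the simplex without any Euclidean projection, which is the kind of geometry-aware iteration the paper's Banach--Bregman template is meant to cover.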
Comments: 69 pages, 10 figures. Preprint
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: arXiv:2509.14216 [cs.LG]
  (or arXiv:2509.14216v1 [cs.LG] for this version)
  https://doi.org/10.48550/arXiv.2509.14216
arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Johnny R. Zhang
[v1] Wed, 17 Sep 2025 17:50:59 UTC (6,682 KB)