Computer Science > Machine Learning

arXiv:2509.14216 (cs)
[Submitted on 17 Sep 2025]

Title: A Universal Banach--Bregman Framework for Stochastic Iterations: Unifying Stochastic Mirror Descent, Learning and LLM Training


Authors:Johnny R. Zhang (Independent Researcher), Xiaomei Mi (University of Manchester), Gaoyuan Du (Amazon), Qianyi Sun (Microsoft), Shiqi Wang (Meta), Jiaxuan Li (Amazon), Wenhua Zhou (Independent Researcher)
Abstract: Stochastic optimization powers the scalability of modern artificial intelligence, spanning machine learning, deep learning, reinforcement learning, and large language model training. Yet existing theory remains largely confined to Hilbert spaces, relying on inner-product structure and orthogonality. This paradigm fails to capture non-Euclidean settings such as mirror descent on simplices, Bregman proximal methods for sparse learning, natural gradient descent in information geometry, and Kullback--Leibler-regularized language model training. This work introduces a Banach--Bregman framework for stochastic iterations that, unlike Euclidean Hilbert-space methods, embraces general Banach spaces and establishes Bregman geometry as a foundation for next-generation optimization. The framework (i) provides a unified template via Bregman projections and Bregman--Fejér monotonicity, encompassing stochastic approximation, mirror descent, natural gradient, adaptive methods, and mirror-prox; (ii) establishes super-relaxations ($\lambda > 2$) in non-Hilbert settings, enabling flexible geometries and elucidating their acceleration effect; and (iii) delivers convergence theorems ranging from almost-sure boundedness to geometric rates, validated on synthetic and real-world tasks. Empirical studies across machine learning (UCI benchmarks), deep learning (e.g., Transformer training), reinforcement learning (actor--critic), and large language models (WikiText-2 with distilGPT-2) show up to 20% faster convergence, reduced variance, and improved accuracy over classical baselines. These results position Banach--Bregman geometry as a cornerstone unifying optimization theory and practice across core AI paradigms.
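The abstract cites mirror descent on simplices as a canonical non-Euclidean setting that Hilbert-space theory misses. As a purely illustrative sketch (not the paper's algorithm; the toy loss, step size, and iteration count are hypothetical), the snippet below shows stochastic mirror descent on the probability simplex with the negative-entropy mirror map, where the Bregman (KL) projection back onto the simplex reduces to a renormalization:

```python
import numpy as np

def smd_simplex(grad_fn, dim, steps=500, lr=0.1, seed=0):
    """Stochastic mirror descent on the probability simplex using the
    negative-entropy mirror map (KL Bregman divergence). The mirror step
    is the exponentiated-gradient update; the Bregman projection onto the
    simplex is a simple renormalization."""
    rng = np.random.default_rng(seed)
    x = np.full(dim, 1.0 / dim)        # start at the simplex barycenter
    for _ in range(steps):
        g = grad_fn(x, rng)            # noisy (sub)gradient estimate
        x = x * np.exp(-lr * g)        # gradient step in the dual (mirror) space
        x /= x.sum()                   # Bregman (KL) projection back onto the simplex
    return x

# Toy usage: minimize E||x - p||^2 over the simplex with noisy gradients.
p = np.array([0.7, 0.2, 0.1])
noisy_grad = lambda x, rng: 2.0 * (x - p) + 0.05 * rng.standard_normal(x.size)
print(smd_simplex(noisy_grad, dim=3))
```

With the negative-entropy mirror map, the update stays strictly inside the simplex without any Euclidean projection, which is the kind of geometry-aware iteration the paper's Banach--Bregman template is meant to cover.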
Comments: 69 pages, 10 figures. Preprint
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: arXiv:2509.14216 [cs.LG]
  (or arXiv:2509.14216v1 [cs.LG] for this version)
  https://doi.org/10.48550/arXiv.2509.14216
arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Johnny R. Zhang
[v1] Wed, 17 Sep 2025 17:50:59 UTC (6,682 KB)