A Finite-Time Analysis of TD Learning with Linear Function Approximation without Projections nor Strong Convexity

Lee, Wei-Cheng; Orabona, Francesco


arXiv:2506.01052 (cs)

[Submitted on 1 Jun 2025]

Authors: Wei-Cheng Lee, Francesco Orabona

Abstract: We investigate the finite-time convergence properties of Temporal Difference (TD) learning with linear function approximation, a cornerstone algorithm in reinforcement learning. While prior work has established convergence guarantees, these results typically rely on the assumption that each iterate is projected onto a bounded set or that the learning rate is set according to the unknown strong convexity constant -- conditions that are both artificial and do not match current practice. In this paper, we challenge the necessity of such assumptions and present a refined analysis of TD learning. We show that the simple projection-free variant converges with a rate of $\tilde{\mathcal{O}}\left(\frac{\|\theta^*\|_2^2}{\sqrt{T}}\right)$, even in the presence of Markovian noise. Our analysis reveals a novel self-bounding property of the TD updates and exploits it to guarantee bounded iterates.
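
For orientation, the following is a minimal sketch of the textbook TD(0) update with linear function approximation and no projection step, run on a single trajectory. It is not the authors' code and may differ from the variant they analyze: the function name `td0_linear`, the constant step size $1/\sqrt{T}$, and the two-state chain are assumptions made purely for illustration.

```python
import math
import random


def td0_linear(features, transition, rewards, gamma, num_steps, seed=0):
    """Run plain TD(0) with V(s) ~= theta . features[s]; no projection step.

    Returns the final theta and the largest Euclidean norm any iterate reached.
    """
    rng = random.Random(seed)
    num_states = len(transition)
    theta = [0.0] * len(features[0])
    # Assumed constant step size for this sketch; not taken from the paper.
    step_size = 1.0 / math.sqrt(num_steps)
    max_norm = 0.0
    state = 0
    for _ in range(num_steps):
        # Samples come from one trajectory of the chain, not from i.i.d. draws.
        next_state = rng.choices(range(num_states), weights=transition[state])[0]
        phi, phi_next = features[state], features[next_state]
        value = sum(t * p for t, p in zip(theta, phi))
        next_value = sum(t * p for t, p in zip(theta, phi_next))
        td_error = rewards[state] + gamma * next_value - value
        # Semi-gradient update; theta is never clipped or projected.
        theta = [t + step_size * td_error * p for t, p in zip(theta, phi)]
        max_norm = max(max_norm, math.sqrt(sum(t * t for t in theta)))
        state = next_state
    return theta, max_norm


if __name__ == "__main__":
    # Hypothetical 2-state chain with one-hot features, so the TD fixed point
    # is the true value function (4.5, 5.5) for gamma = 0.9.
    features = [[1.0, 0.0], [0.0, 1.0]]
    transition = [[0.5, 0.5], [0.5, 0.5]]
    rewards = [0.0, 1.0]
    theta, max_norm = td0_linear(features, transition, rewards, 0.9, 100_000)
    print(theta, max_norm)
```

The sketch tracks the largest iterate norm only to make the boundedness question concrete; it does not demonstrate the paper's self-bounding argument or its rate.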

Subjects:	Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Cite as:	arXiv:2506.01052 [cs.LG]
	(or arXiv:2506.01052v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2506.01052

Submission history

From: Wei-Cheng Lee
[v1] Sun, 1 Jun 2025 15:39:00 UTC (21 KB)
