Online Training and Pruning of Deep Reinforcement Learning Networks

Guenter, Valentin Frank Ingmar; Sideris, Athanasios

arXiv:2507.11975 (cs)

[Submitted on 16 Jul 2025]

Title: Online Training and Pruning of Deep Reinforcement Learning Networks

Authors: Valentin Frank Ingmar Guenter, Athanasios Sideris

Abstract: Scaling the deep neural networks (NNs) of reinforcement learning (RL) algorithms has been shown to enhance performance when feature extraction networks are used, but the gain comes at a significant cost in computational and memory complexity. Neural network pruning methods have successfully addressed this challenge in supervised learning; however, their application to RL is underexplored. We propose an approach that integrates simultaneous training and pruning into advanced RL methods, in particular RL algorithms enhanced by the Online Feature Extractor Network (OFENet). Our networks (XiNet) are trained to solve stochastic optimization problems over the RL networks' weights and the parameters of variational Bernoulli distributions for 0/1 random variables (RVs) $\xi$ that scale each unit in the networks. The stochastic problem formulation induces regularization terms that promote convergence of the variational parameters to 0 when a unit contributes little to performance. In this case, the corresponding structure is rendered permanently inactive and pruned from its network. We propose a cost-aware, sparsity-promoting regularization scheme tailored to the DenseNet architecture of OFENets, which expresses the parameter complexity of the involved networks in terms of the parameters of the RVs in these networks. Matching this cost with the regularization terms then selects the many hyperparameters associated with those terms automatically, effectively combining the RL objectives with network compression. We evaluate our method on continuous control benchmarks (MuJoCo) with the Soft Actor-Critic RL agent, demonstrating that OFENets can be pruned considerably with minimal loss in performance. Furthermore, our results confirm that pruning large networks during training produces more efficient and higher-performing RL agents than training smaller networks from scratch.
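The abstract's core mechanism can be illustrated with a minimal sketch. Each unit is multiplied by a 0/1 gate $\xi \sim \mathrm{Bernoulli}(p)$, a unit is pruned once its keep-probability $p$ collapses to 0, and the parameter count of a DenseNet-style network is written in terms of the gate probabilities. This is not the authors' XiNet implementation. The function names (`sample_gates`, `apply_gates`, `dense_weight_count`, `prunable_units`), the pruning threshold, the choice to leave raw input features ungated, the decision to ignore biases, and the assumption that all gates are independent are hypothetical choices made for illustration.

```python
import random
from typing import List, Sequence


def sample_gates(probs: Sequence[float], rng: random.Random) -> List[int]:
    """Draw one 0/1 gate xi_i ~ Bernoulli(p_i) per unit."""
    return [1 if rng.random() < p else 0 for p in probs]


def apply_gates(activations: Sequence[float], gates: Sequence[int]) -> List[float]:
    """Scale each unit's activation by its gate; a 0 gate silences the unit."""
    return [a * g for a, g in zip(activations, gates)]


def dense_weight_count(input_dim: float, layers: Sequence[Sequence[float]]) -> float:
    """Weight count of a DenseNet-style stack (biases ignored in this sketch).

    Layer k sees the raw input concatenated with the outputs of all earlier
    layers, so every active output unit of layer k connects to every active input.
    Passing 0/1 gates gives the realized count. Passing keep-probabilities gives
    the expected count, assuming independent gates so that
    E[xi_j * xi_i] = p_j * p_i for units in different layers.
    """
    total = 0.0
    active_inputs = float(input_dim)  # raw features are not gated (assumption)
    for values in layers:
        active_outputs = sum(values)
        total += active_outputs * active_inputs
        active_inputs += active_outputs  # concatenated onto the next layer's input
    return total


def prunable_units(probs: Sequence[float], threshold: float = 1e-3) -> List[int]:
    """Indices of units whose keep-probability has fallen below the threshold."""
    return [i for i, p in enumerate(probs) if p < threshold]


if __name__ == "__main__":
    # Fully active stack: 4 inputs, two layers of 2 units -> 4*2 + 6*2 weights.
    print(dense_weight_count(4, [[1, 1], [1, 1]]))  # 20.0
    # Expected cost under partially collapsed gate probabilities.
    print(dense_weight_count(4, [[0.8, 0.3, 0.6], [0.5, 0.0004]]))
    print(prunable_units([0.5, 0.0004]))  # [1]
```

The same function returns both the realized and the expected count because the cost is linear in each layer's gates once independence is assumed. One plausible way to make training cost-aware is to add $\lambda$ times this expected count to the loss, so that gradients push the probabilities of low-value units toward 0. The abstract does not specify how the paper weights its regularization terms or matches them to the cost, so that step is left out here.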

Comments:	25 pages, 5 figures, 4 tables
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Cite as:	arXiv:2507.11975 [cs.LG]
	(or arXiv:2507.11975v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2507.11975

Submission history

From: Valentin Frank Ingmar Guenter
[v1] Wed, 16 Jul 2025 07:17:41 UTC (296 KB)