Kevin: Multi-Turn RL for Generating CUDA Kernels

Baronio, Carlo; Marsella, Pietro; Pan, Ben; Guo, Simon; Alberti, Silas

Computer Science > Machine Learning

arXiv:2507.11948 (cs)

[Submitted on 16 Jul 2025]

Title: Kevin: Multi-Turn RL for Generating CUDA Kernels

Authors: Carlo Baronio, Pietro Marsella, Ben Pan, Simon Guo, Silas Alberti

Abstract: Writing GPU kernels is a challenging task and critical for AI systems' efficiency. It is also highly iterative: domain experts write code and improve performance through execution feedback. Moreover, it presents verifiable rewards like correctness and speedup, making it a natural environment to apply Reinforcement Learning (RL). To explicitly incorporate the iterative nature of this process into training, we develop a flexible multi-turn RL recipe that addresses unique challenges encountered in real-world settings, such as learning from long trajectories and effective reward attribution across turns. We present Kevin - K(ernel D)evin, the first model trained with multi-turn RL for CUDA kernel generation and optimization. In our evaluation setup, Kevin shows significant gains over its base model (QwQ-32B), improving correctness of generated kernels (in pure CUDA) from 56% to 82% and mean speedup from 0.53x to 1.10x of baseline (PyTorch Eager), and surpassing frontier models like o4-mini (0.78x). Finally, we study its behavior across test-time scaling axes: we find that scaling serial refinement is more beneficial than parallel sampling. In particular, when given more refinement turns, Kevin shows a higher rate of improvement.

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Performance (cs.PF); Software Engineering (cs.SE)
Cite as:	arXiv:2507.11948 [cs.LG]
	(or arXiv:2507.11948v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2507.11948

Submission history

From: Carlo Baronio
[v1] Wed, 16 Jul 2025 06:33:07 UTC (3,123 KB)
