AlgoTune: Can Language Models Speed Up General-Purpose Numerical Programs?

Press, Ori; Amos, Brandon; Zhao, Haoyu; Wu, Yikai; Ainsworth, Samuel K.; Krupke, Dominik; Kidger, Patrick; Sajed, Touqir; Stellato, Bartolomeo; Park, Jisun; Bosch, Nathanael; Meril, Eli; Steppi, Albert; Zharmagambetov, Arman; Zhang, Fangzhao; Perez-Pineiro, David; Mercurio, Alberto; Zhan, Ni; Abramovich, Talor; Lieret, Kilian; Zhang, Hanlin; Huang, Shirley; Bethge, Matthias; Press, Ofir

arXiv:2507.15887 (cs)

[Submitted on 19 Jul 2025]

Abstract: Despite progress in language model (LM) capabilities, evaluations have thus far focused on models' performance on tasks that humans have previously solved, including in programming (Jimenez et al., 2024) and mathematics (Glazer et al., 2024). We therefore propose testing models' ability to design and implement algorithms in an open-ended benchmark: We task LMs with writing code that efficiently solves computationally challenging problems in computer science, physics, and mathematics. Our AlgoTune benchmark consists of 155 coding tasks collected from domain experts and a framework for validating and timing LM-synthesized solution code, which is compared to reference implementations from popular open-source packages. In addition, we develop a baseline LM agent, AlgoTuner, and evaluate its performance across a suite of frontier models. AlgoTuner achieves an average 1.72x speedup against our reference solvers, which use libraries such as SciPy, sk-learn and CVXPY. However, we find that current models fail to discover algorithmic innovations, instead preferring surface-level optimizations. We hope that AlgoTune catalyzes the development of LM agents exhibiting creative problem solving beyond state-of-the-art human performance.
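The abstract describes a framework that validates LM-synthesized solution code and times it against a reference implementation. The sketch below illustrates that validate-then-time idea under stated assumptions; it is not AlgoTune's actual harness. The names `measure_speedup`, `median_runtime`, `exact_match`, the validator signature, the use of the median over repeated runs, and the toy sorting task are all hypothetical choices made for illustration.

```python
import random
import statistics
import time
from typing import Any, Callable, List

# Hypothetical signature for a task-specific validity check:
# (problem, candidate_output, reference_output) -> bool
Validator = Callable[[Any, Any, Any], bool]


def median_runtime(solver: Callable[[Any], Any], problem: Any, repeats: int) -> float:
    """Median wall-clock time of solver(problem) over `repeats` runs."""
    times: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        solver(problem)
        times.append(time.perf_counter() - start)
    return statistics.median(times)


def measure_speedup(reference, candidate, problem, is_valid: Validator, repeats: int = 5) -> float:
    """Validate the candidate's output, then return reference_time / candidate_time."""
    expected = reference(problem)
    if not is_valid(problem, candidate(problem), expected):
        raise ValueError("candidate produced an invalid solution")
    t_ref = median_runtime(reference, problem, repeats)
    t_cand = median_runtime(candidate, problem, repeats)
    return t_ref / t_cand


# Toy task (hypothetical): sort a list of floats.
def reference_sort(xs: List[float]) -> List[float]:
    """Deliberately slow O(n^2) insertion sort standing in for a reference solver."""
    out = list(xs)
    for i in range(1, len(out)):
        key = out[i]
        j = i - 1
        while j >= 0 and out[j] > key:
            out[j + 1] = out[j]
            j -= 1
        out[j + 1] = key
    return out


def candidate_sort(xs: List[float]) -> List[float]:
    """'Optimized' candidate: delegate to Python's built-in sorted()."""
    return sorted(xs)


def exact_match(problem: Any, got: Any, expected: Any) -> bool:
    """Hypothetical validator: accept only outputs identical to the reference."""
    return got == expected


if __name__ == "__main__":
    random.seed(0)
    data = [random.random() for _ in range(2000)]
    speedup = measure_speedup(reference_sort, candidate_sort, data, exact_match)
    print(f"speedup: {speedup:.1f}x")
```

In this sketch, speedup is reference time divided by candidate time, so values above 1 mean the candidate is faster, and an invalid output is rejected before any timing happens. How AlgoTune aggregates per-task speedups into its reported average is not specified in this abstract.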

Subjects:	Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2507.15887 [cs.SE]
	(or arXiv:2507.15887v1 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2507.15887

Submission history

From: Ori Press
[v1] Sat, 19 Jul 2025 11:23:25 UTC (822 KB)
