Beyond the Hype: Benchmarking LLM-Evolved Heuristics for Bin Packing

Sim, Kevin; Renau, Quentin; Hart, Emma

arXiv:2501.11411 (cs)

[Submitted on 20 Jan 2025]

Abstract: Coupling Large Language Models (LLMs) with Evolutionary Algorithms has recently shown significant promise as a technique to design new heuristics that outperform existing methods, particularly in the field of combinatorial optimisation. An escalating arms race is both rapidly producing new heuristics and improving the efficiency of the processes evolving them. However, driven by the desire to quickly demonstrate the superiority of new approaches, evaluation of the new heuristics produced for a specific domain is often cursory: they are tested on very few datasets, in which the instances all belong to a specific class from the domain, and on few instances per class. Taking bin packing as an example, we conduct what is, to the best of our knowledge, the first rigorous benchmarking study of new LLM-generated heuristics, comparing them to well-known existing heuristics across a large suite of benchmark instances using three performance metrics. For each heuristic, we then evolve new instances won by that heuristic and perform an instance space analysis to understand where in the feature space each heuristic performs well. We show that, in contrast to existing simple heuristics, most of the LLM heuristics do not generalise well when evaluated across a broad range of benchmarks, and suggest that any gains from generating very specialist heuristics that only work in small areas of the instance space need to be weighed carefully against the considerable cost of generating these heuristics.
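The abstract does not name the "well-known existing heuristics" or the three performance metrics used. As an illustration only, the sketch below assumes two classic online bin-packing baselines, First Fit and Best Fit, and scores a packing by the number of bins opened and by Falkenauer's fitness, the mean of (fill/capacity)^k over used bins (commonly with k = 2). The function names `first_fit`, `best_fit` and `falkenauer_fitness` are hypothetical and are not taken from the paper's code.

```python
from typing import List, Sequence

# Illustrative baselines only (hypothetical names); not the paper's implementation.


def first_fit(items: Sequence[float], capacity: float) -> List[float]:
    """Online First Fit: put each item into the first open bin with enough room."""
    bins: List[float] = []  # current fill level of each open bin
    for size in items:
        for i, fill in enumerate(bins):
            if fill + size <= capacity:
                bins[i] = fill + size
                break
        else:
            bins.append(size)  # no open bin fits: open a new one
    return bins


def best_fit(items: Sequence[float], capacity: float) -> List[float]:
    """Online Best Fit: put each item into the feasible bin with the least space left over."""
    bins: List[float] = []
    for size in items:
        best_idx, best_gap = -1, float("inf")
        for i, fill in enumerate(bins):
            gap = capacity - (fill + size)  # free space left if the item goes here
            if 0 <= gap < best_gap:
                best_idx, best_gap = i, gap
        if best_idx == -1:
            bins.append(size)  # no feasible bin: open a new one
        else:
            bins[best_idx] += size
    return bins


def falkenauer_fitness(bins: Sequence[float], capacity: float, k: float = 2.0) -> float:
    """Mean of (fill / capacity) ** k over used bins; 1.0 means every bin is full."""
    if not bins:
        return 0.0
    return sum((fill / capacity) ** k for fill in bins) / len(bins)


if __name__ == "__main__":
    items, capacity = [5, 6, 4, 5], 10
    for name, heuristic in [("First Fit", first_fit), ("Best Fit", best_fit)]:
        packing = heuristic(items, capacity)
        print(name, len(packing), packing, round(falkenauer_fitness(packing, capacity), 3))
```

On the items `[5, 6, 4, 5]` with capacity 10, First Fit opens three bins while Best Fit fills two bins exactly, which shows why comparisons of this kind should be made over many instances and more than one metric rather than on a handful of examples.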

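Comments:	To appear in the 28th International Conference on the Applications of Evolutionary Computation, EvoApplications 2025
Subjects:	Neural and Evolutionary Computing (cs.NE)
Cite as:	arXiv:2501.11411 [cs.NE]
	(or arXiv:2501.11411v1 [cs.NE] for this version)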
	https://doi.org/10.48550/arXiv.2501.11411

Submission history

From: Quentin Renau
[v1] Mon, 20 Jan 2025 11:23:50 UTC (5,041 KB)