How Many Instructions Can LLMs Follow at Once?

Jaroslawicz, Daniel; Whiting, Brendan; Shah, Parth; Maamari, Karime

Computer Science > Artificial Intelligence

arXiv:2507.11538 (cs)

[Submitted on 15 Jul 2025]

Title: How Many Instructions Can LLMs Follow at Once?

Authors: Daniel Jaroslawicz, Brendan Whiting, Parth Shah, Karime Maamari

Abstract: Production-grade LLM systems require robust adherence to dozens or even hundreds of instructions simultaneously. However, the instruction-following capabilities of LLMs at high instruction densities have not yet been characterized, as existing benchmarks only evaluate models on tasks with a single or few instructions. We introduce IFScale, a simple benchmark of 500 keyword-inclusion instructions for a business report writing task to measure how instruction-following performance degrades as instruction density increases. We evaluate 20 state-of-the-art models across seven major providers and find that even the best frontier models only achieve 68% accuracy at the max density of 500 instructions. Our analysis reveals model size and reasoning capability to correlate with 3 distinct performance degradation patterns, bias towards earlier instructions, and distinct categories of instruction-following errors. Our insights can help inform design of instruction-dense prompts in real-world applications and highlight important performance-latency tradeoffs. We open-source the benchmark and all results for further analysis at https://distylai.github.io/IFScale.
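As a rough illustration of what a keyword-inclusion accuracy score could look like, here is a minimal sketch. It assumes each instruction asks for one keyword to appear in the generated report, and that accuracy is the fraction of requested keywords found. The function name `keyword_inclusion_accuracy`, the case-insensitive whole-word matching rule, and the sample data are all hypothetical and are not taken from the IFScale code.

```python
import re


def keyword_inclusion_accuracy(report: str, keywords: list[str]) -> float:
    """Return the fraction of keywords that occur in the report.

    Matching is case-insensitive and on whole words; this rule is an
    assumption made for the sketch, not the benchmark's own definition.
    """
    if not keywords:
        raise ValueError("keywords must be non-empty")
    text = report.lower()
    hits = 0
    for kw in keywords:
        # Escape the keyword so it is matched literally, then anchor on word boundaries.
        pattern = r"\b" + re.escape(kw.lower()) + r"\b"
        if re.search(pattern, text):
            hits += 1
    return hits / len(keywords)


# Hypothetical example: four instructions, two of them satisfied.
report = "Quarterly revenue grew while customer churn declined."
keywords = ["revenue", "churn", "margin", "forecast"]
accuracy = keyword_inclusion_accuracy(report, keywords)
print(f"{accuracy:.2f}")  # prints 0.50
```

Under these assumptions, the 68% figure quoted above would correspond to roughly 340 of 500 requested keywords being present in the output.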

Subjects:	Artificial Intelligence (cs.AI)
Cite as:	arXiv:2507.11538 [cs.AI]
	(or arXiv:2507.11538v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2507.11538

Submission history

From: Daniel Jaroslawicz
[v1] Tue, 15 Jul 2025 17:59:42 UTC (6,761 KB)
