When Many-Shot Prompting Fails: An Empirical Study of LLM Code Translation

Oskooei, Amirkia Rafiei; Cosdan, Kaan Baturalp; Isiktas, Husamettin; Aktas, Mehmet S.

arXiv:2510.16809 (cs)

[Submitted on 19 Oct 2025]

Authors: Amirkia Rafiei Oskooei, Kaan Baturalp Cosdan, Husamettin Isiktas, Mehmet S. Aktas

Abstract: Large Language Models (LLMs) with vast context windows offer new avenues for in-context learning (ICL), where providing many examples ("many-shot" prompting) is often assumed to enhance performance. We investigate this assumption for the complex task of code translation. Through a large-scale empirical study of over 90,000 translations, we systematically evaluate the impact of scaling in-context examples from zero-shot to many-shot configurations of up to 625 examples, with prompts spanning from approximately 100,000 to 800,000 tokens. Our findings reveal a "many-shot paradox": while static similarity metrics may modestly improve with more examples, functional correctness consistently peaks with few-shot prompting (5-25 examples). Providing substantially more examples often degrades this crucial functional performance. This study highlights that for code translation, the quality of a few well-chosen examples outweighs sheer quantity, challenging the universal efficacy of "more is better" for ICL and underscoring the task-dependent nature of optimal prompting strategies. Our results have significant implications for effectively leveraging LLMs in software engineering.

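Subjects:	Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Programming Languages (cs.PL)
MSC classes:	68T50, 68N30, 68W40
ACM classes:	I.2.7; D.2.7; I.2.6
Cite as:	arXiv:2510.16809 [cs.SE]
	(or arXiv:2510.16809v1 [cs.SE] for this version)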
	https://doi.org/10.48550/arXiv.2510.16809

Submission history

From: Amirkia Rafiei Oskooei
[v1] Sun, 19 Oct 2025 12:29:13 UTC (11,725 KB)
