When Many-Shot Prompting Fails: An Empirical Study of LLM Code Translation

Oskooei, Amirkia Rafiei; Cosdan, Kaan Baturalp; Isiktas, Husamettin; Aktas, Mehmet S.

Computer Science > Software Engineering

arXiv:2510.16809 (cs)

[Submitted on 19 Oct 2025]

Title: When Many-Shot Prompting Fails: An Empirical Study of LLM Code Translation

Authors: Amirkia Rafiei Oskooei, Kaan Baturalp Cosdan, Husamettin Isiktas, Mehmet S. Aktas

Abstract: Large Language Models (LLMs) with vast context windows offer new avenues for in-context learning (ICL), where providing many examples ("many-shot" prompting) is often assumed to enhance performance. We investigate this assumption for the complex task of code translation. Through a large-scale empirical study of over 90,000 translations, we systematically evaluate the impact of scaling in-context examples from zero-shot to many-shot configurations of up to 625 examples, with prompts spanning from approximately 100,000 to 800,000 tokens. Our findings reveal a "many-shot paradox": while static similarity metrics may modestly improve with more examples, functional correctness consistently peaks with few-shot prompting (5-25 examples). Providing substantially more examples often degrades this crucial functional performance. This study highlights that for code translation, the quality of a few well-chosen examples outweighs sheer quantity, challenging the universal efficacy of "more is better" for ICL and underscoring the task-dependent nature of optimal prompting strategies. Our results have significant implications for effectively leveraging LLMs in software engineering.
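
As a rough illustration of the setup the abstract describes (scaling the number of in-context examples from zero-shot upward), the sketch below assembles a k-shot code translation prompt from a pool of example pairs. It is a minimal sketch and not the authors' pipeline: the function name `build_prompt`, the prompt layout, the Java-to-Python language pair, and the shot counts used in the usage lines are all assumptions made for illustration.

```python
from typing import List, Tuple

# A translation example: (source-language snippet, target-language snippet).
Example = Tuple[str, str]


def build_prompt(examples: List[Example], query: str, k: int,
                 src_lang: str = "Java", tgt_lang: str = "Python") -> str:
    """Assemble a k-shot translation prompt; k = 0 yields a zero-shot prompt.

    The layout and the default language pair are hypothetical, chosen only
    to illustrate the idea of varying the number of in-context examples.
    """
    if k < 0 or k > len(examples):
        raise ValueError(f"k must be between 0 and {len(examples)}, got {k}")

    # Instruction line shared by every configuration.
    parts = [f"Translate the following {src_lang} code to {tgt_lang}."]

    # The first k example pairs become the in-context demonstrations.
    for src, tgt in examples[:k]:
        parts.append(f"### {src_lang}\n{src}\n### {tgt_lang}\n{tgt}")

    # The query ends with an open target header for the model to complete.
    parts.append(f"### {src_lang}\n{query}\n### {tgt_lang}")
    return "\n\n".join(parts)


# Usage: build prompts at a few illustrative shot counts from a toy pool.
pool = [(f"int f{i}() {{ return {i}; }}", f"def f{i}():\n    return {i}")
        for i in range(30)]
query = "int g() { return 7; }"
prompts = {k: build_prompt(pool, query, k) for k in (0, 5, 25)}
```

Holding the query fixed and varying only `k` is what lets functional correctness be compared across shot counts; in a real study, how the k examples are selected matters as much as how many there are.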

Subjects:	Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Programming Languages (cs.PL)
MSC classes:	68T50, 68N30, 68W40
ACM classes:	I.2.7; D.2.7; I.2.6
Cite as:	arXiv:2510.16809 [cs.SE]
	(or arXiv:2510.16809v1 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2510.16809

Submission history

From: Amirkia Rafiei Oskooei
[v1] Sun, 19 Oct 2025 12:29:13 UTC (11,725 KB)
