Transformers Meet In-Context Learning: A Universal Approximation Theory

Li, Gen; Jiao, Yuchen; Huang, Yu; Wei, Yuting; Chen, Yuxin

arXiv:2506.05200 (cs)

[Submitted on 5 June 2025]

Title: Transformers Meet In-Context Learning: A Universal Approximation Theory

Authors:Gen Li, Yuchen Jiao, Yu Huang, Yuting Wei, Yuxin Chen

Abstract: Modern large language models are capable of in-context learning, the ability to perform new tasks at inference time using only a handful of input-output examples in the prompt, without any fine-tuning or parameter updates. We develop a universal approximation theory to better understand how transformers enable in-context learning. For any class of functions (each representing a distinct task), we demonstrate how to construct a transformer that, without any further weight updates, can perform reliable prediction given only a few in-context examples. In contrast to much of the recent literature that frames transformers as algorithm approximators -- i.e., constructing transformers to emulate the iterations of optimization algorithms as a means to approximate solutions of learning problems -- our work adopts a fundamentally different approach rooted in universal function approximation. This alternative approach offers approximation guarantees that are not constrained by the effectiveness of the optimization algorithms being approximated, thereby extending far beyond convex problems and linear function classes. Our construction sheds light on how transformers can simultaneously learn general-purpose representations and adapt dynamically to in-context examples.

Subjects:	Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
Cite as:	arXiv:2506.05200 [cs.LG]
	(or arXiv:2506.05200v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2506.05200

Submission history

From: Yuchen Jiao
[v1] Thu, 5 Jun 2025 16:12:51 UTC (98 KB)
