Theory of Scaling Laws for In-Context Regression: Depth, Width, Context and Time

Bordelon, Blake; Letey, Mary I.; Pehlevan, Cengiz

arXiv:2510.01098 (stat)

[Submitted on 1 Oct 2025]

Authors: Blake Bordelon, Mary I. Letey, Cengiz Pehlevan

Abstract: We study in-context learning (ICL) of linear regression in a deep linear self-attention model, characterizing how performance depends on various computational and statistical resources (width, depth, number of training steps, batch size, and data per context). In a joint limit where data dimension, context length, and residual stream width scale proportionally, we analyze the limiting asymptotics for three ICL settings: (1) isotropic covariates and tasks (ISO), (2) fixed and structured covariance (FS), and (3) where covariances are randomly rotated and structured (RRS). For ISO and FS settings, we find that depth only aids ICL performance if context length is limited. In contrast, in the RRS setting where covariances change across contexts, increasing the depth leads to significant improvements in ICL, even at infinite context length. This provides a new solvable toy model of neural scaling laws which depends on both width and depth of a transformer and predicts an optimal transformer shape as a function of compute. This toy model enables computation of exact asymptotics for the risk as well as derivation of power laws under source/capacity conditions for the ICL tasks.
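
The ISO claim above (depth helps when context length is limited) can be illustrated with a minimal sketch. This is not the paper's model: plain gradient descent on the in-context least-squares loss stands in for what a depth-L linear attention stack might compute, which is an assumption borrowed from the broader ICL literature. All function names, the noiseless labels, and the values of `dim`, `context_len`, and `step_size` are hypothetical choices made for illustration.

```python
import numpy as np


def sample_iso_context(rng, dim, context_len):
    """Draw one isotropic task: x ~ N(0, I), w ~ N(0, I/dim), noiseless y = x.w."""
    X = rng.standard_normal((context_len, dim))
    w = rng.standard_normal(dim) / np.sqrt(dim)
    return X, X @ w, w


def in_context_gd(X, y, depth, step_size):
    """Run `depth` gradient steps on the context loss, starting from w_hat = 0.

    ASSUMPTION: one gradient step is used as a stand-in for one attention layer.
    """
    w_hat = np.zeros(X.shape[1])
    for _ in range(depth):
        w_hat += step_size * X.T @ (y - X @ w_hat) / X.shape[0]
    return w_hat


def iso_risk(depth, dim=32, context_len=128, step_size=0.5, n_contexts=200, seed=0):
    """Average excess risk ||w_hat - w||^2 over contexts.

    For an isotropic query x ~ N(0, I), this equals the expected squared
    prediction error on the query point.
    """
    rng = np.random.default_rng(seed)  # same seed, so every depth sees the same contexts
    errors = []
    for _ in range(n_contexts):
        X, y, w = sample_iso_context(rng, dim, context_len)
        errors.append(np.sum((in_context_gd(X, y, depth, step_size) - w) ** 2))
    return float(np.mean(errors))


risks = {depth: iso_risk(depth) for depth in (1, 2, 4, 8)}
for depth, risk in risks.items():
    print(f"depth={depth}  risk={risk:.4f}")
```

At this finite context length the empirical covariance differs from the identity, so each extra step removes more of the remaining error. As the context length grows, the empirical covariance approaches the identity and a single step with unit step size already recovers the task vector, which is consistent with the abstract's statement that depth matters in the ISO setting only when context is limited.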

Comments:	29-page preprint
Subjects:	Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (cs.LG)
Cite as:	arXiv:2510.01098 [stat.ML]
	(or arXiv:2510.01098v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2510.01098

Submission history

From: Blake Bordelon
[v1] Wed, 1 Oct 2025 16:45:04 UTC (1,476 KB)
