A universal compression theory: Lottery ticket hypothesis and superpolynomial scaling laws

Wang, Hong-Yi; Luo, Di; Poggio, Tomaso; Chuang, Isaac L.; Ziyin, Liu

arXiv:2510.00504 (stat)

[Submitted on 1 Oct 2025]

Title: A universal compression theory: Lottery ticket hypothesis and superpolynomial scaling laws

Authors: Hong-Yi Wang, Di Luo, Tomaso Poggio, Isaac L. Chuang, Liu Ziyin

Abstract: When training large-scale models, the performance typically scales with the number of parameters and the dataset size according to a slow power law. A fundamental theoretical and practical question is whether comparable performance can be achieved with significantly smaller models and substantially less data. In this work, we provide a positive and constructive answer. We prove that a generic permutation-invariant function of $d$ objects can be asymptotically compressed into a function of $\operatorname{polylog} d$ objects with vanishing error. This theorem yields two key implications: (Ia) a large neural network can be compressed to polylogarithmic width while preserving its learning dynamics; (Ib) a large dataset can be compressed to polylogarithmic size while leaving the loss landscape of the corresponding model unchanged. (Ia) directly establishes a proof of the \textit{dynamical} lottery ticket hypothesis, which states that any ordinary network can be strongly compressed such that the learning dynamics and result remain unchanged. (Ib) shows that a neural scaling law of the form $L\sim d^{-\alpha}$ can be boosted to an arbitrarily fast power law decay, and ultimately to $\exp(-\alpha' \sqrt[m]{d})$.
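
The step from (Ib) to the stretched-exponential law can be made explicit with a short calculation. This is only a sketch under assumptions that the abstract does not state: the compressed dataset size is taken to be exactly $d' = c\,(\log d)^m$ for hypothetical constants $c > 0$ and $m \ge 1$, and the compression error is taken to be negligible relative to the loss.

```latex
% Assumed: original law L \sim d^{-\alpha}; compressed size d' = c (\log d)^m.
% The symbols d', c are introduced here for illustration only.
\begin{align*}
d' = c\,(\log d)^m
  &\;\Longrightarrow\; \log d = (d'/c)^{1/m}, \\
L \sim d^{-\alpha} = \exp(-\alpha \log d)
  &\;\Longrightarrow\; L \sim \exp\!\bigl(-\alpha\,(d'/c)^{1/m}\bigr)
   = \exp\!\bigl(-\alpha' \sqrt[m]{d'}\bigr),
\qquad \alpha' = \alpha\, c^{-1/m}.
\end{align*}
```

Under these assumptions, the loss expressed in terms of the compressed size $d'$ decays faster than any power law. This is consistent with the form $\exp(-\alpha' \sqrt[m]{d})$ quoted in the abstract, if $d$ there is read as the size of the compressed dataset.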

Comments:	Preprint
Subjects:	Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Information Theory (cs.IT); Machine Learning (cs.LG)
Cite as:	arXiv:2510.00504 [stat.ML]
	(or arXiv:2510.00504v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2510.00504

Submission history

From: Liu Ziyin
[v1] Wed, 1 Oct 2025 04:35:23 UTC (3,662 KB)
