Random matrix theory and the loss surfaces of neural networks

Baskerville, Nicholas P

Mathematical Physics

arXiv:2306.02108 (math-ph)

[Submitted on 3 Jun 2023]

Title: Random matrix theory and the loss surfaces of neural networks

Authors: Nicholas P Baskerville

Abstract: Neural network models are one of the most successful approaches to machine learning, enjoying an enormous amount of development and research over recent years and finding concrete real-world applications in almost any conceivable area of science, engineering and modern life in general. The theoretical understanding of neural networks trails significantly behind their practical success and the engineering heuristics that have grown up around them. Random matrix theory provides a rich framework of tools with which aspects of neural network phenomenology can be explored theoretically. In this thesis, we establish significant extensions of prior work using random matrix theory to understand and describe the loss surfaces of large neural networks, particularly generalising to different architectures. Informed by the historical applications of random matrix theory in physics and elsewhere, we establish the presence of local random matrix universality in real neural networks and then utilise this as a modeling assumption to derive powerful and novel results about the Hessians of neural network loss surfaces and their spectra. In addition to these major contributions, we make use of random matrix models for neural network loss surfaces to shed light on modern neural network training approaches and even to derive a novel and effective variant of a popular optimisation algorithm. Overall, this thesis provides important contributions to cement the place of random matrix theory in the theoretical study of modern neural networks, reveals some of the limits of existing approaches and begins the study of an entirely new role for random matrix theory in the theory of deep learning with important experimental discoveries and novel theoretical results based on local random matrix universality.

Comments:	320 pages, PhD thesis
Subjects:	Mathematical Physics (math-ph); Machine Learning (cs.LG); Probability (math.PR)
Cite as:	arXiv:2306.02108 [math-ph]
	(or arXiv:2306.02108v1 [math-ph] for this version)
	https://doi.org/10.48550/arXiv.2306.02108

Submission history

From: Nicholas Baskerville
[v1] Sat, 3 Jun 2023 13:16:17 UTC (10,190 KB)
