
Computer Science > Computation and Language

arXiv:2312.00751 (cs)
[Submitted on 1 Dec 2023]

Title: Mitigating Over-smoothing in Transformers via Regularized Nonlocal Functionals


Authors: Tam Nguyen, Tan M. Nguyen, Richard G. Baraniuk
Abstract: Transformers have achieved remarkable success in a wide range of natural language processing and computer vision applications. However, the representation capacity of a deep transformer model is degraded due to the over-smoothing issue in which the token representations become identical when the model's depth grows. In this work, we show that self-attention layers in transformers minimize a functional which promotes smoothness, thereby causing token uniformity. We then propose a novel regularizer that penalizes the norm of the difference between the smooth output tokens from self-attention and the input tokens to preserve the fidelity of the tokens. Minimizing the resulting regularized energy functional, we derive the Neural Transformer with a Regularized Nonlocal Functional (NeuTRENO), a novel class of transformer models that can mitigate the over-smoothing issue. We empirically demonstrate the advantages of NeuTRENO over the baseline transformers and state-of-the-art methods in reducing the over-smoothing of token representations on various practical tasks, including object classification, image segmentation, and language modeling.
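The abstract describes the mechanism only verbally: standard self-attention produces smoothed token representations, and a penalty on the norm of the difference between those smoothed outputs and the input tokens is added to the energy being minimized, yielding a corrected attention update. The sketch below illustrates one way such a correction could look in PyTorch; the function name, the coefficient lam, and the choice of v_ref as the reference tokens are assumptions made for illustration from the abstract alone, not the paper's exact NeuTRENO formulation.

    import torch
    import torch.nn.functional as F

    def neutreno_style_attention(q, k, v, v_ref, lam=0.6):
        """Illustrative sketch: self-attention plus a fidelity correction term.

        q, k, v : (batch, seq, dim) query/key/value tokens of the current layer
        v_ref   : (batch, seq, dim) reference tokens standing in for the "input tokens"
                  of the regularizer (which tokens serve as the reference is an assumption here)
        lam     : weight of the correction term (hypothetical default)
        """
        d = q.size(-1)
        scores = torch.matmul(q, k.transpose(-2, -1)) / d ** 0.5  # (batch, seq, seq)
        attn = F.softmax(scores, dim=-1)
        smooth = torch.matmul(attn, v)  # standard self-attention output (the "smooth" tokens)
        # Assumed form of the correction: pull the smoothed tokens back toward the
        # reference tokens, counteracting the drift toward identical representations.
        return smooth + lam * (v_ref - smooth)

    # Example usage on random tokens (batch=2, seq=5, dim=8)
    x = torch.randn(2, 5, 8)
    out = neutreno_style_attention(x, x, x, v_ref=x, lam=0.6)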
Comments: 24 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as: arXiv:2312.00751 [cs.CL]
  (or arXiv:2312.00751v1 [cs.CL] for this version)
  https://doi.org/10.48550/arXiv.2312.00751
arXiv-issued DOI via DataCite

Submission history

From: Tam Nguyen [view email]
[v1] Fri, 1 Dec 2023 17:52:47 UTC (2,455 KB)