
Computer Science > Computation and Language

arXiv:2312.00751 (cs)
[Submitted on 1 Dec 2023]

Title: Mitigating Over-smoothing in Transformers via Regularized Nonlocal Functionals


Authors: Tam Nguyen, Tan M. Nguyen, Richard G. Baraniuk
Abstract: Transformers have achieved remarkable success in a wide range of natural language processing and computer vision applications. However, the representation capacity of a deep transformer model is degraded due to the over-smoothing issue in which the token representations become identical when the model's depth grows. In this work, we show that self-attention layers in transformers minimize a functional which promotes smoothness, thereby causing token uniformity. We then propose a novel regularizer that penalizes the norm of the difference between the smooth output tokens from self-attention and the input tokens to preserve the fidelity of the tokens. Minimizing the resulting regularized energy functional, we derive the Neural Transformer with a Regularized Nonlocal Functional (NeuTRENO), a novel class of transformer models that can mitigate the over-smoothing issue. We empirically demonstrate the advantages of NeuTRENO over the baseline transformers and state-of-the-art methods in reducing the over-smoothing of token representations on various practical tasks, including object classification, image segmentation, and language modeling.
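The abstract describes the mechanism only verbally: standard self-attention produces smoothed token representations, and a penalty on the norm of the difference between those smoothed outputs and the input tokens is added to the energy being minimized, yielding a corrected attention update. The sketch below illustrates one way such a correction could look in PyTorch; the function name, the coefficient lam, and the choice of v_ref as the reference tokens are assumptions made for illustration from the abstract alone, not the paper's exact NeuTRENO formulation.

    import torch
    import torch.nn.functional as F

    def neutreno_style_attention(q, k, v, v_ref, lam=0.6):
        """Illustrative sketch: self-attention plus a fidelity correction term.

        q, k, v : (batch, seq, dim) query/key/value tokens of the current layer
        v_ref   : (batch, seq, dim) reference tokens standing in for the "input tokens"
                  of the regularizer (which tokens serve as the reference is an assumption here)
        lam     : weight of the correction term (hypothetical default)
        """
        d = q.size(-1)
        scores = torch.matmul(q, k.transpose(-2, -1)) / d ** 0.5  # (batch, seq, seq)
        attn = F.softmax(scores, dim=-1)
        smooth = torch.matmul(attn, v)  # standard self-attention output (the "smooth" tokens)
        # Assumed form of the correction: pull the smoothed tokens back toward the
        # reference tokens, counteracting the drift toward identical representations.
        return smooth + lam * (v_ref - smooth)

    # Example usage on random tokens (batch=2, seq=5, dim=8)
    x = torch.randn(2, 5, 8)
    out = neutreno_style_attention(x, x, x, v_ref=x, lam=0.6)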
Comments: 24 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as: arXiv:2312.00751 [cs.CL]
  (or arXiv:2312.00751v1 [cs.CL] for this version)
  https://doi.org/10.48550/arXiv.2312.00751
arXiv-issued DOI via DataCite

Submission history

From: Tam Nguyen [view email]
[v1] Fri, 1 Dec 2023 17:52:47 UTC (2,455 KB)