Computer Science > Machine Learning

arXiv:2106.00047 (cs)
[Submitted on 31 May 2021]

Title: Learning and Generalization in RNNs


Authors: Abhishek Panigrahi, Navin Goyal
Abstract: Simple recurrent neural networks (RNNs) and their more advanced cousins, such as LSTMs, have been very successful in sequence modeling. Their theoretical understanding, however, is lacking and has not kept pace with the progress made for feedforward networks, where a reasonably complete understanding has emerged for the special case of highly overparametrized one-hidden-layer networks. In this paper, we make progress towards remedying this situation by proving that RNNs can learn functions of sequences. In contrast to previous work, which could only handle functions of sequences that are sums of functions of the individual tokens in the sequence, we allow general functions. Conceptually and technically, we introduce new ideas that enable us to extract information from the hidden state of the RNN in our proofs, addressing a crucial weakness in previous work. We illustrate our results on some regular language recognition problems.
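To make the setting concrete (this is an illustration of the task class, not the paper's construction or proof technique), the following minimal PyTorch sketch trains a simple RNN on a regular language recognition problem: PARITY, i.e., classifying bit strings by whether they contain an even number of 1s. The architecture, hyperparameters, and the choice of PARITY are illustrative assumptions; reading the label off the final hidden state loosely mirrors the role the hidden state plays in the paper's analysis.

    # Minimal sketch: a vanilla RNN trained to recognize the regular language
    # PARITY (bit strings with an even number of 1s). All hyperparameters are
    # illustrative; PARITY is a known-hard case for RNN training, so convergence
    # depends on sequence length and tuning.
    import torch
    import torch.nn as nn

    def make_batch(batch_size=64, seq_len=20):
        # Random bit strings; label 1.0 if the number of 1s is even, else 0.0.
        x = torch.randint(0, 2, (batch_size, seq_len, 1), dtype=torch.float32)
        y = (x.sum(dim=(1, 2)) % 2 == 0).float()
        return x, y

    class ParityRNN(nn.Module):
        def __init__(self, hidden_size=64):
            super().__init__()
            self.rnn = nn.RNN(input_size=1, hidden_size=hidden_size,
                              batch_first=True)
            self.readout = nn.Linear(hidden_size, 1)

        def forward(self, x):
            _, h = self.rnn(x)                       # final hidden state: (1, B, H)
            return self.readout(h[-1]).squeeze(-1)   # one logit per sequence

    model = ParityRNN()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.BCEWithLogitsLoss()

    for step in range(2000):
        x, y = make_batch()
        loss = loss_fn(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()

    with torch.no_grad():
        x, y = make_batch(1000)
        acc = ((model(x) > 0) == y.bool()).float().mean()
        print(f"test accuracy: {acc:.3f}")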
Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2106.00047 [cs.LG]
  (or arXiv:2106.00047v1 [cs.LG] for this version)
  https://doi.org/10.48550/arXiv.2106.00047
arXiv-issued DOI via DataCite

Submission history

From: Abhishek Panigrahi
[v1] Mon, 31 May 2021 18:27:51 UTC (2,645 KB)
