Quantitative Biology > Quantitative Methods

arXiv:2509.24262 (q-bio)
[Submitted on 29 Sep 2025 (v1), last revised 21 Oct 2025 (this version, v2)]

Title: LAMP-PRo: Label-aware Attention for Multi-label Prediction of DNA- and RNA-binding Proteins using Protein Language Models


Authors: Nimisha Ghosh, Dheeran Sankaran, Rahul Balakrishnan Adhi, Sharath S, Amrut Anand
Abstract: Identifying DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs) is crucial for understanding cell function, molecular interactions, and regulatory functions. Because DBPs and RBPs are highly similar, most existing approaches struggle to differentiate between them, leading to high cross-prediction errors. Identifying proteins that bind to both DNA and RNA (DRBPs) is also challenging. To mitigate these issues, we propose a novel framework, LAMP-PRo, based on a pre-trained protein language model (PLM), attention mechanisms, and multi-label learning. First, a pre-trained PLM such as ESM-2 embeds the protein sequences, followed by a convolutional neural network (CNN). Subsequently, a multi-head self-attention mechanism captures contextual information, while label-aware attention computes class-specific representations by attending to the sequence in a way that is tailored to each label (DBP, RBP, and non-NABP) in a multi-label setup. We also include a novel cross-label attention mechanism to explicitly capture dependencies between DNA- and RNA-binding proteins, enabling more accurate prediction of DRBPs. Finally, a linear layer followed by a sigmoid function produces the final prediction. Extensive experiments comparing LAMP-PRo with existing methods show that the proposed model delivers consistently competitive performance. Furthermore, we provide visualizations to showcase model interpretability, highlighting which parts of the sequence are most relevant for a predicted label. The original datasets are available at http://bliulab.net/iDRBP_MMC and the code is available at https://github.com/NimishaGhosh/LAMP-PRo.
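The label-aware attention and sigmoid output described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the attention formula, so the sketch assumes scaled dot-product scoring with one learned query vector per label, one linear unit per label before the sigmoid, and random numbers in place of real ESM-2/CNN features. All function and variable names are hypothetical, and the cross-label attention step is omitted.

```python
import math
import random

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def label_aware_attention(H, label_queries):
    """H: list of residue feature vectors (length = sequence length, each of dim d).
    label_queries: one query vector of dim d per label (assumed learned).
    Returns (weights, reps): weights[k] is a distribution over sequence
    positions for label k; reps[k] is the attention-weighted sum of H."""
    d = len(H[0])
    weights, reps = [], []
    for q in label_queries:
        scores = [dot(q, h) / math.sqrt(d) for h in H]  # assumed scaled dot product
        alpha = softmax(scores)
        rep = [sum(a * h[j] for a, h in zip(alpha, H)) for j in range(d)]
        weights.append(alpha)
        reps.append(rep)
    return weights, reps

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(reps, W, b):
    # One linear unit per label followed by a sigmoid, so each label gets
    # its own probability (multi-label, not a softmax over labels).
    return [sigmoid(dot(w, r) + bk) for w, r, bk in zip(W, reps, b)]

# Toy run with random stand-ins for the upstream features and parameters.
random.seed(0)
LABELS = ["DBP", "RBP", "non-NABP"]
seq_len, dim = 6, 4
H = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)]
queries = [[random.gauss(0, 1) for _ in range(dim)] for _ in LABELS]
W = [[random.gauss(0, 1) for _ in range(dim)] for _ in LABELS]
b = [0.0 for _ in LABELS]

weights, reps = label_aware_attention(H, queries)
probs = predict(reps, W, b)
for name, p, alpha in zip(LABELS, probs, weights):
    print(name, round(p, 3), [round(a, 2) for a in alpha])
```

Because each label has its own sigmoid output, a protein can score highly on both DBP and RBP at once, which is one plausible way a multi-label setup represents a DRBP. The per-label weights over sequence positions are also the kind of quantity that could be plotted to show which residues matter for a given label.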
Subjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2509.24262 [q-bio.QM]
  (or arXiv:2509.24262v2 [q-bio.QM] for this version)
  https://doi.org/10.48550/arXiv.2509.24262
arXiv-issued DOI via DataCite

Submission history

From: Nimisha Ghosh
[v1] Mon, 29 Sep 2025 04:13:51 UTC (236 KB)
[v2] Tue, 21 Oct 2025 13:08:07 UTC (236 KB)