Twin Peaks: Dual-Head Architecture for Structure-Free Prediction of Protein-Protein Binding Affinity and Mutation Effects

Dey, Supantha; Chowdhury, Ratul


arXiv:2509.22950 (q-bio)

[Submitted on 26 Sep 2025]


Authors: Supantha Dey, Ratul Chowdhury


Abstract: We present a novel dual-head deep learning architecture for protein-protein interaction modeling that enables simultaneous prediction of binding affinity ($\Delta G$) and mutation-induced affinity changes ($\Delta\Delta G$) using only protein sequence information. Our approach offers a significant advancement over existing methods by employing specialized prediction heads that operate on a shared representation network, allowing direct and optimized prediction of both values. To ensure robust generalization, we integrated complementary datasets from SKEMPI v2 and PDBbind with a rigorous protein domain-based splitting strategy that prevents information leakage between training and validation sets. Our architecture combines transformer-based encoders with a novel cross-attention mechanism that processes paired protein sequences directly, without requiring any structural information. The network embeds input sequences using ESM3 representations, then employs a learnable sliced window embedding layer to manage variable-length sequences efficiently. A multi-layer transformer encoder with bidirectional self-attention captures intra-protein patterns, while cross-attention layers enable explicit modeling of interactions between protein pairs. This shared representation network feeds into separate $\Delta G$ and $\Delta\Delta G$ prediction heads, allowing task-specific optimization while leveraging common features. On the validation set, the model achieves a Pearson correlation of 0.485 for $\Delta\Delta G$ while maintaining strong $\Delta G$ predictions (Pearson: 0.638). Whereas existing approaches require protein structure data and binding interface information, our model eliminates these constraints. This provides a critical advantage for the many proteins whose structures are unknown or that are challenging to crystallize, such as viral and intrinsically disordered proteins.
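The abstract mentions a protein domain-based split that keeps training and validation sets free of shared information, but it does not describe the procedure. The following is only a minimal sketch of the general idea of a group-wise split, not the authors' method. It assumes each complex carries a single domain label; the function name `domain_split`, the record layout, and the example labels are all hypothetical.

```python
import random
from collections import defaultdict


def domain_split(records, val_fraction=0.2, seed=0):
    """Split complexes so that no domain label lands in both sets.

    `records` is a list of (complex_id, domain_label) pairs. Both fields are
    hypothetical placeholders, not the paper's actual data schema.
    """
    # Group complex IDs by their domain label.
    by_domain = defaultdict(list)
    for complex_id, domain in records:
        by_domain[domain].append(complex_id)

    # Sort before shuffling so the result depends only on the seed.
    domains = sorted(by_domain)
    random.Random(seed).shuffle(domains)

    target = val_fraction * len(records)
    train, val = [], []
    for domain in domains:
        # Assign whole domains to validation until it reaches the target
        # size; every remaining domain goes to training.
        bucket = val if len(val) < target else train
        bucket.extend(by_domain[domain])
    return train, val


# Toy data: ten complexes spread over five made-up domain labels.
records = [
    ("c1", "kinase"), ("c2", "kinase"), ("c3", "protease"),
    ("c4", "antibody"), ("c5", "antibody"), ("c6", "antibody"),
    ("c7", "lectin"), ("c8", "protease"), ("c9", "cytokine"), ("c10", "lectin"),
]
train_ids, val_ids = domain_split(records)

# Leakage check: the two sets should share no domain label.
domain_of = dict(records)
train_domains = {domain_of[c] for c in train_ids}
val_domains = {domain_of[c] for c in val_ids}
print(len(train_ids), len(val_ids), train_domains & val_domains)
```

Because whole domains are assigned at once, the validation set can overshoot the requested fraction. Real complexes may also span several domains, in which case the grouping key would need to cover every domain a complex touches; that case is not handled here.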

Subjects:	Quantitative Methods (q-bio.QM)
Cite as:	arXiv:2509.22950 [q-bio.QM]
	(or arXiv:2509.22950v1 [q-bio.QM] for this version)
	https://doi.org/10.48550/arXiv.2509.22950

Submission history

From: Supantha Dey
[v1] Fri, 26 Sep 2025 21:32:33 UTC (30 KB)
