GraPPI: A Retrieve-Divide-Solve GraphRAG Framework for Large-scale Protein-protein Interaction Exploration

Li, Ziwen; Chen, Xiang 'Anthony'; Jeon, Youngseung

Quantitative Biology > Quantitative Methods

arXiv:2501.16382 (q-bio)

[Submitted on 24 Jan 2025]

Title: GraPPI: A Retrieve-Divide-Solve GraphRAG Framework for Large-scale Protein-protein Interaction Exploration

Authors: Ziwen Li, Xiang 'Anthony' Chen, Youngseung Jeon

Abstract: Drug discovery (DD) has tremendously contributed to maintaining and improving public health. Hypothesizing that inhibiting protein misfolding can slow disease progression, researchers focus on target identification (Target ID) to find protein structures for drug binding. While Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) frameworks have accelerated drug discovery, integrating models into cohesive workflows remains challenging. We conducted a user study with drug discovery researchers to identify the applicability of LLMs and RAG frameworks in Target ID. We identified two main findings: 1) an LLM should provide multiple Protein-Protein Interactions (PPIs) based on an initial protein and protein candidates that have a therapeutic impact; 2) the model must provide the PPI and relevant explanations for better understanding. Based on these observations, we identified three limitations in previous approaches for Target ID: 1) semantic ambiguity, 2) lack of explainability, and 3) short retrieval units. To address these issues, we propose GraPPI, a large-scale knowledge graph (KG)-based retrieve-divide-solve agent pipeline RAG framework to support large-scale PPI signaling pathway exploration in understanding therapeutic impacts by decomposing the analysis of entire PPI pathways into sub-tasks focused on the analysis of PPI edges.
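The retrieve-divide-solve idea in the abstract (decomposing the analysis of a whole PPI pathway into sub-tasks over individual PPI edges) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy graph `TOY_KG`, the protein names `P_A` to `P_D`, the edge annotations, and every function name below are hypothetical placeholders, and the "solve" step is a plain dictionary lookup standing in for whatever per-edge analysis a real system would perform.

```python
from collections import deque

# Hypothetical toy PPI knowledge graph. Node names and annotations are
# placeholders for illustration only; they are not data from the paper.
TOY_KG = {
    ("P_A", "P_B"): "P_A activates P_B (placeholder annotation)",
    ("P_B", "P_C"): "P_B inhibits P_C (placeholder annotation)",
    ("P_A", "P_D"): "P_A binds P_D (placeholder annotation)",
    ("P_D", "P_C"): "P_D phosphorylates P_C (placeholder annotation)",
}


def neighbors(kg, node):
    """Return the sorted successors of `node` in the directed toy graph."""
    return sorted(dst for (src, dst) in kg if src == node)


def retrieve_paths(kg, start, target, max_hops=3):
    """Retrieve: enumerate simple directed paths from start to target via BFS."""
    paths = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            paths.append(path)
            continue
        if len(path) - 1 >= max_hops:  # hop budget exhausted for this path
            continue
        for nxt in neighbors(kg, path[-1]):
            if nxt not in path:  # keep paths simple (no revisited nodes)
                queue.append(path + [nxt])
    return paths


def divide(path):
    """Divide: split one pathway into its consecutive PPI edges."""
    return list(zip(path, path[1:]))


def solve_edge(kg, edge):
    """Solve: stand-in for per-edge analysis; here only an annotation lookup."""
    return f"{edge[0]} -> {edge[1]}: {kg[edge]}"


def explore(kg, start, target, max_hops=3):
    """Run retrieve, divide, and solve, keyed by a readable pathway string."""
    return {
        " -> ".join(path): [solve_edge(kg, e) for e in divide(path)]
        for path in retrieve_paths(kg, start, target, max_hops)
    }


if __name__ == "__main__":
    for pathway, explanations in explore(TOY_KG, "P_A", "P_C").items():
        print(pathway)
        for line in explanations:
            print("  ", line)
```

Keeping the per-edge step separate from path retrieval means each explanation is tied to exactly one interaction, which is one plausible way to read the abstract's emphasis on explainability and on retrieval units longer than a single edge.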

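Comments:	14 pages; 5 figures. Published as a finding at NAACL 2025
Subjects:	Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2501.16382 [q-bio.QM]
	(or arXiv:2501.16382v1 [q-bio.QM] for this version)
	https://doi.org/10.48550/arXiv.2501.16382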

Submission history

From: Ziwen Li
[v1] Fri, 24 Jan 2025 18:16:53 UTC (1,186 KB)
