Patient-specific Biomolecular Instruction Tuning

Adam, Irsyad; Chen, Zekai; Laub, David; Porwal, Shaun; Pekis, Arda; Brown, Kevin

arXiv:2509.22853 (q-bio)

[Submitted on 26 Sep 2025]

Abstract: Proteomics data is essential to understanding the pathogenesis underlying a disease phenotype. In cancer, analysis of molecular signatures enables precision medicine through the identification of biological processes that drive individualized tumor progression, therapeutic resistance, and clinical heterogeneity. Recent multimodal large language models (LLMs) have shown a remarkable capacity to integrate and reason across heterogeneous data modalities. However, multimodal language modeling for molecular understanding of patient-specific proteomics remains a significant challenge due to two barriers: (1) the lack of instruction-tuning datasets that enable clinical interpretation from proteomics data, and (2) the absence of language modeling architectures designed to capture the rich heterogeneity of molecular data. In this work, we introduce CPTAC-PROTSTRUCT, the first instruction-tuning dataset for molecular understanding of oncology, comprising over 400k open-ended examples derived from individualized proteomic profiles curated from the largest national proteomics cancer study (CPTAC). Additionally, we propose KRONOS (Knowledge Representation of patient Omics Networks in Oncology via Structured tuning), a novel graph-LLM framework that combines molecular interaction topology with proteomics to learn patient-specific graph representations for enhanced clinical reasoning. We show that KRONOS achieves competitive performance across benchmark clinical tasks, including molecular classification, temporal trajectory modeling, and tumor stage prediction from proteomics data. Ultimately, this approach empowers LLMs to understand patient-level pathogenesis, advancing precision medicine through more accurate diagnosis, prognosis, and treatment stratification.

Subjects:	Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
MSC classes:	92C40, 68T07, 62P10
ACM classes:	I.2.7; I.5.1; J.3
Cite as:	arXiv:2509.22853 [q-bio.QM]
	(or arXiv:2509.22853v1 [q-bio.QM] for this version)
	https://doi.org/10.48550/arXiv.2509.22853

Submission history

From: Irsyad Adam
[v1] Fri, 26 Sep 2025 19:05:32 UTC (533 KB)
