arXiv:2509.22853 (q-bio)
[Submitted on 26 Sep 2025]

Title: Patient-specific Biomolecular Instruction Tuning

Authors: Irsyad Adam, Zekai Chen, David Laub, Shaun Porwal, Arda Pekis, Kevin Brown
Abstract: Proteomics data is essential to pathogenic understanding of a disease phenotype. In cancer, analysis of molecular signatures enables precision medicine through the identification of biological processes that drive individualized tumor progression, therapeutic resistance, and clinical heterogeneity. Recent advances in multimodal large language models (LLMs) have shown remarkable capacity to integrate and reason across heterogeneous data modalities. However, performing multi-modal language modeling for molecular understanding of patient-specific proteomics remains a significant challenge due to two barriers: (1) the lack of instruction-tuning datasets that enable clinical interpretation from proteomics data, and (2) the absence of language modeling architectures designed to capture the rich heterogeneity of molecular data. In this work, we introduce CPTAC-PROTSTRUCT, the first instruction tuning dataset for molecular understanding of oncology, comprising over 400k open-ended examples derived from individualized proteomic profiles curated from the largest national proteomics cancer study (CPTAC). Additionally, we propose KRONOS (Knowledge Representation of patient Omics Networks in Oncology via Structured tuning), a novel graph-LLM framework that leverages molecular interaction topology with proteomics to learn patient-specific graph representations for enhanced clinical reasoning. We show that KRONOS achieves competitive performance across benchmark clinical tasks, including molecular classification, temporal trajectory modeling, and tumor stage prediction from proteomics data. Ultimately, this approach empowers LLMs to understand patient-level pathogenesis, advancing precision medicine through more accurate diagnosis, prognosis, and treatment stratification.
Subjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
MSC classes: 92C40, 68T07, 62P10
ACM classes: I.2.7; I.5.1; J.3
Cite as: arXiv:2509.22853 [q-bio.QM]
  (or arXiv:2509.22853v1 [q-bio.QM] for this version)
  https://doi.org/10.48550/arXiv.2509.22853
arXiv-issued DOI via DataCite

Submission history

From: Irsyad Adam
[v1] Fri, 26 Sep 2025 19:05:32 UTC (533 KB)