Few-Label Multimodal Modeling of SNP Variants and ECG Phenotypes Using Large Language Models for Cardiovascular Risk Stratification

Menon, Niranjana Arun; Li, Yulong; Farooq, Iqra; Ahmed, Sara; Awais, Muhammad; Razzak, Imran

arXiv:2510.16536 (q-bio)

[Submitted on 18 Oct 2025]

Abstract: Cardiovascular disease (CVD) risk stratification remains a major challenge due to its multifactorial nature and the limited availability of high-quality labeled datasets. While genomic and electrophysiological data such as SNP variants and ECG phenotypes are increasingly accessible, effectively integrating these modalities in low-label settings is non-trivial. This challenge arises from the scarcity of well-annotated multimodal datasets and the high dimensionality of biological signals, both of which limit the effectiveness of conventional supervised models. To address this, we present a few-label multimodal framework that leverages large language models (LLMs) to combine genetic and electrophysiological information for cardiovascular risk stratification. Our approach incorporates a pseudo-label refinement strategy that adaptively distills high-confidence labels from weakly supervised predictions, enabling robust model fine-tuning with only a small set of ground-truth annotations. To enhance interpretability, we frame the task as a Chain of Thought (CoT) reasoning problem, prompting the model to produce clinically relevant rationales alongside its predictions. Experimental results demonstrate that integrating multimodal inputs, few-label supervision, and CoT reasoning improves robustness and generalizability across diverse patient profiles. Experiments using multimodal SNP variants and ECG-derived features showed performance comparable to that of models trained on the full dataset, underscoring the promise of LLM-based few-label multimodal modeling for advancing personalized cardiovascular care.
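The abstract does not specify how the pseudo-label refinement works. As a rough illustration only, the sketch below assumes a generic confidence-based filter with class-adaptive thresholds, a common semi-supervised learning heuristic and not necessarily the authors' method. The function `refine_pseudo_labels`, its `base_threshold` parameter, and the input probability matrix are hypothetical. A weakly supervised prediction is kept only when its top-class probability clears a threshold that is scaled down for classes the model is, on average, less confident about.

```python
import numpy as np


def refine_pseudo_labels(probs, base_threshold=0.9):
    """Hypothetical class-adaptive confidence filter for pseudo-labels.

    probs: (n_samples, n_classes) class probabilities produced by a weakly
    supervised model on unlabeled patients (assumed input format).
    Returns (indices, labels) for samples that pass their class threshold.
    """
    probs = np.asarray(probs, dtype=float)
    confidence = probs.max(axis=1)      # top-class probability per sample
    predicted = probs.argmax(axis=1)    # top-class index per sample
    n_classes = probs.shape[1]

    # Mean confidence among samples assigned to each class (0.0 if none).
    class_conf = np.array([
        confidence[predicted == c].mean() if np.any(predicted == c) else 0.0
        for c in range(n_classes)
    ])

    # Assumed scheme: scale each class threshold by its relative confidence,
    # so harder classes are not starved of pseudo-labels entirely.
    if class_conf.max() > 0:
        scale = class_conf / class_conf.max()
    else:
        scale = np.ones(n_classes)
    thresholds = base_threshold * scale

    # Keep a sample only if its confidence reaches its predicted class threshold.
    keep = confidence >= thresholds[predicted]
    return np.flatnonzero(keep), predicted[keep]
```

For example, with probabilities [[0.95, 0.05], [0.6, 0.4], [0.2, 0.8], [0.3, 0.7]] and base_threshold=0.75, the class thresholds become about 0.75 and 0.726, so samples 0 and 2 are retained with labels 0 and 1. In this sketch, the retained pairs would then be combined with the small ground-truth set for fine-tuning.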

Subjects:	Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2510.16536 [q-bio.QM]
	(or arXiv:2510.16536v1 [q-bio.QM] for this version)
	https://doi.org/10.48550/arXiv.2510.16536

Submission history

From: Imran Razzak
[v1] Sat, 18 Oct 2025 15:19:35 UTC (1,364 KB)