Towards Applying Large Language Models to Complement Single-Cell Foundation Models

Palayew, Steven; Wang, Bo; Bader, Gary

arXiv:2507.10039 (cs)

[Submitted on 14 Jul 2025]

Abstract: Single-cell foundation models such as scGPT represent a significant advancement in single-cell omics, achieving state-of-the-art performance on various downstream biological tasks. However, these models are inherently limited in that a vast amount of information in biology exists as text, which they are unable to leverage. Several recent works have therefore proposed using LLMs as an alternative to single-cell foundation models, achieving competitive results. However, there is little understanding of what factors drive this performance, and the focus has been strongly on using LLMs as an alternative, rather than a complementary approach, to single-cell foundation models. In this study, we therefore investigate what biological insights contribute toward the performance of LLMs when applied to single-cell data, and introduce scMPT, a model that leverages synergies between scGPT and single-cell representations from LLMs that capture these insights. scMPT demonstrates stronger, more consistent performance than either of its component models, which frequently have large performance gaps between each other across datasets. We also experiment with alternate fusion methods, demonstrating the potential of combining specialized reasoning models with scGPT to improve performance. This study ultimately showcases the potential for LLMs to complement single-cell foundation models and drive improvements in single-cell analysis.

Subjects:	Machine Learning (cs.LG); Genomics (q-bio.GN)
Cite as:	arXiv:2507.10039 [cs.LG]
	(or arXiv:2507.10039v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2507.10039

Submission history

From: Steven Palayew
[v1] Mon, 14 Jul 2025 08:16:58 UTC (476 KB)
