Large Language Models for Bioinformatics

Ruan, Wei; Lyu, Yanjun; Zhang, Jing; Cai, Jiazhang; Shu, Peng; Ge, Yang; Lu, Yao; Gao, Shang; Wang, Yue; Wang, Peilong; Zhao, Lin; Wang, Tao; Liu, Yufang; Fang, Luyang; Liu, Ziyu; Liu, Zhengliang; Li, Yiwei; Wu, Zihao; Chen, Junhao; Jiang, Hanqi; Pan, Yi; Yang, Zhenyuan; Chen, Jingyuan; Liang, Shizhe; Zhang, Wei; Ma, Terry; Dou, Yuan; Zhang, Jianli; Gong, Xinyu; Gan, Qi; Zou, Yusong; Chen, Zebang; Qian, Yuanxin; Yu, Shuo; Lu, Jin; Song, Kenan; Wang, Xianqiao; Sikora, Andrea; Li, Gang; Li, Xiang; Li, Quanzheng; Wang, Yingfeng; Zhang, Lu; Abate, Yohannes; He, Lifang; Zhong, Wenxuan; Liu, Rongjie; Huang, Chao; Liu, Wei; Shen, Ye; Ma, Ping; Zhu, Hongtu; Yan, Yajun; Zhu, Dajiang; Liu, Tianming

arXiv:2501.06271 (q-bio)

[Submitted on 10 Jan 2025]

Title: Large Language Models for Bioinformatics

Abstract: With the rapid advancements in large language model (LLM) technology and the emergence of bioinformatics-specific language models (BioLMs), there is a growing need for a comprehensive analysis of the current landscape, computational characteristics, and diverse applications. This survey aims to address this need by providing a thorough review of BioLMs, focusing on their evolution, classification, and distinguishing features, alongside a detailed examination of training methodologies, datasets, and evaluation frameworks. We explore the wide-ranging applications of BioLMs in critical areas such as disease diagnosis, drug discovery, and vaccine development, highlighting their impact and transformative potential in bioinformatics. We identify key challenges and limitations inherent in BioLMs, including data privacy and security concerns, interpretability issues, biases in training data and model outputs, and domain adaptation complexities. Finally, we highlight emerging trends and future directions, offering valuable insights to guide researchers and clinicians toward advancing BioLMs for increasingly sophisticated biological and clinical applications.

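Comments:	64 pages, 1 figure
Subjects:	Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)
Cite as:	arXiv:2501.06271 [q-bio.QM]
	(or arXiv:2501.06271v1 [q-bio.QM] for this version)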
	https://doi.org/10.48550/arXiv.2501.06271

Submission history

From: Yanjun Lyu
[v1] Fri, 10 Jan 2025 01:43:05 UTC (591 KB)
