Quantitative Biology > Genomics

arXiv:2505.07896 (q-bio)
[Submitted on 12 May 2025]

Title: Bridging Large Language Models and Single-Cell Transcriptomics in Dissecting Selective Motor Neuron Vulnerability


Authors:Douglas Jiang, Zilin Dai, Luxuan Zhang, Qiyi Yu, Haoqi Sun, Feng Tian
Abstract: Understanding cell identity and function from single-cell sequencing data remains a key challenge in computational biology. We present a novel framework that leverages gene-specific textual annotations from the NCBI Gene database to generate biologically contextualized cell embeddings. For each cell in a single-cell RNA sequencing (scRNA-seq) dataset, we rank genes by expression level, retrieve their NCBI Gene descriptions, and embed these descriptions as vectors using large language models (LLMs). The models used include OpenAI text-embedding-ada-002, text-embedding-3-small, and text-embedding-3-large (Jan 2024), as well as the domain-specific models BioBERT and SciBERT. Embeddings are computed via an expression-weighted average across the top N most highly expressed genes in each cell, providing a compact, semantically rich representation. This multimodal strategy bridges structured biological data with state-of-the-art language modeling, enabling more interpretable downstream applications such as cell-type clustering, cell vulnerability dissection, and trajectory inference.
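The expression-weighted embedding step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the gene names, expression values, and 3-dimensional vectors are toy stand-ins for real LLM embeddings of NCBI Gene description text (e.g. from text-embedding-3-large).

```python
import numpy as np

def cell_embedding(expr, gene_embeddings, top_n=3):
    """Expression-weighted average of gene-description embeddings
    over the top-N most highly expressed genes in one cell.

    expr: dict mapping gene symbol -> expression value for a single cell.
    gene_embeddings: dict mapping gene symbol -> 1-D embedding of its
    NCBI Gene description (toy vectors here).
    """
    # rank genes by expression and keep the top N
    top = sorted(expr, key=expr.get, reverse=True)[:top_n]
    # normalize expression values of the selected genes into weights
    weights = np.array([expr[g] for g in top], dtype=float)
    weights /= weights.sum()
    # stack the corresponding description embeddings and average
    vecs = np.stack([gene_embeddings[g] for g in top])
    return weights @ vecs

# toy example: 4 genes with 3-dimensional "description embeddings"
expr = {"CHAT": 9.0, "MNX1": 6.0, "GAPDH": 3.0, "ACTB": 1.0}
emb = {
    "CHAT":  np.array([1.0, 0.0, 0.0]),
    "MNX1":  np.array([0.0, 1.0, 0.0]),
    "GAPDH": np.array([0.0, 0.0, 1.0]),
    "ACTB":  np.array([1.0, 1.0, 1.0]),
}
vec = cell_embedding(expr, emb, top_n=3)  # ACTB is dropped (rank 4)
```

With top_n=3 the weights are 9/18, 6/18, and 3/18, so the cell vector is 0.5·CHAT + (1/3)·MNX1 + (1/6)·GAPDH; the lowest-expressed gene never contributes.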
Subjects: Genomics (q-bio.GN); Artificial Intelligence (cs.AI)
Cite as: arXiv:2505.07896 [q-bio.GN]
  (or arXiv:2505.07896v1 [q-bio.GN] for this version)
  https://doi.org/10.48550/arXiv.2505.07896
arXiv-issued DOI via DataCite

Submission history

From: Feng Tian [view email]
[v1] Mon, 12 May 2025 03:39:33 UTC (2,993 KB)