Genomics

Replacements

Showing new listings for Monday, 29 September 2025

[1] arXiv:2506.11152 (replaced): Title: HEIST: A Graph Foundation Model for Spatial Transcriptomics and Proteomics Data

Hiren Madhu, João Felipe Rocha, Tinglin Huang, Siddharth Viswanath, Smita Krishnaswamy, Rex Ying

Subjects: Genomics (q-bio.GN) ; Machine Learning (cs.LG) ; Cell Behavior (q-bio.CB)

Single-cell transcriptomics and proteomics have become a great source for data-driven insights into biology, enabling the use of advanced deep learning methods to understand cellular heterogeneity and gene expression at the single-cell level. With the advent of spatial-omics data, we have the promise of characterizing cells within their tissue context, as it provides both spatial coordinates and intra-cellular transcriptional or protein counts. Proteomics offers a complementary view by directly measuring proteins, which are the primary effectors of cellular function and key therapeutic targets. However, existing models ignore either the spatial information or the complex genetic and proteomic programs within cells. Thus, they cannot infer how cell-internal regulation adapts to microenvironmental cues. Furthermore, these models often utilize fixed gene vocabularies, hindering their generalizability to unseen genes. In this paper, we introduce HEIST, a hierarchical graph transformer foundation model for spatial transcriptomics and proteomics. HEIST models tissues as hierarchical graphs. The higher-level graph is a spatial cell graph, and each cell, in turn, is represented by its lower-level gene co-expression network graph. HEIST achieves this by performing both intra-level and cross-level message passing to utilize the hierarchy in its embeddings, and can thus generalize to novel data types, including spatial proteomics, without retraining. HEIST is pretrained on 22.3M cells from 124 tissues across 15 organs using spatially-aware contrastive and masked autoencoding objectives. Unsupervised analysis of HEIST embeddings reveals spatially informed subpopulations missed by prior models. Downstream evaluations demonstrate generalizability to proteomics data and state-of-the-art performance in clinical outcome prediction, cell type annotation, and gene imputation across multiple technologies.

[2] arXiv:2507.06113 (replaced): Title: A Statistical Framework for Co-Mediators of Zero-Inflated Single-Cell RNA-Seq Data

Seungjun Ahn, Li Chen, Maaike van Gerwen, Zhigang Li

Comments: 23 pages and 3 figures

Subjects: Methodology (stat.ME) ; Genomics (q-bio.GN) ; Quantitative Methods (q-bio.QM) ; Applications (stat.AP)

Single-cell RNA sequencing (scRNA-seq) has revolutionized the study of cellular heterogeneity, enabling detailed molecular profiling at the individual cell level. However, integrating high-dimensional single-cell data into causal mediation analysis remains challenging due to zero inflation and complex mediator structures. We propose a novel mediation framework leveraging zero-inflated negative binomial models to characterize cell-level mediator distributions and beta regression for zero-inflation proportions. The model can identify both the expression level and the expressed proportion that could mediate disease-leading causal pathways. Extensive simulation studies demonstrate improved power and controlled false discovery rates. We further illustrate the utility of this approach through application to ROSMAP single-cell transcriptomic data, uncovering biologically meaningful mediation effects that enhance understanding of disease mechanisms.
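
For readers unfamiliar with the zero-inflated negative binomial (ZINB) distribution named in the abstract, the following is a minimal sketch of its textbook probability mass function. It is not the authors' implementation: the abstract does not give their parameterization, so the function name `zinb_pmf` and the parameters `mu` (negative binomial mean), `theta` (dispersion) and `pi` (probability of a structural zero) are assumptions chosen for illustration.

```python
import math


def zinb_pmf(k, mu, theta, pi):
    """P(Y = k) for a zero-inflated negative binomial (illustrative names).

    mu    : mean of the negative binomial component (assumed parameterization)
    theta : dispersion (size) of the negative binomial component
    pi    : probability that an observation is a structural zero
    """
    # Negative binomial log-pmf in mean/dispersion form.
    log_nb = (
        math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
        + theta * math.log(theta / (theta + mu))
        + k * math.log(mu / (theta + mu))
    )
    nb = math.exp(log_nb)
    # Mixture: extra point mass at zero with weight pi, NB with weight 1 - pi.
    zero_mass = pi if k == 0 else 0.0
    return zero_mass + (1.0 - pi) * nb


# Usage: half of the zeros are structural, so P(Y = 0) is inflated
# relative to the plain negative binomial with the same mu and theta.
p_zero_plain = zinb_pmf(0, mu=2.0, theta=1.0, pi=0.0)
p_zero_inflated = zinb_pmf(0, mu=2.0, theta=1.0, pi=0.5)
```

Under this parameterization the mean of the mixture is `(1 - pi) * mu`, which is why the expressed proportion and the expression level can be examined as separate quantities.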

