Statistics > Machine Learning

arXiv:1704.02007 (stat)
[Submitted on 6 Apr 2017 ]

Title: DIMM-SC: A Dirichlet mixture model for clustering droplet-based single cell transcriptomic data

Authors: Zhe Sun, Ting Wang, Ke Deng, Xiao-Feng Wang, Robert Lafyatis, Ying Ding, Ming Hu, Wei Chen
Abstract: Motivation: Single cell transcriptome sequencing (scRNA-Seq) has become a revolutionary tool for studying cellular and molecular processes at single cell resolution. Among existing technologies, the recently developed droplet-based platform enables efficient parallel processing of thousands of single cells, with direct counting of transcript copies using Unique Molecular Identifiers (UMIs). Despite these technological advances, statistical methods and computational tools for analyzing droplet-based scRNA-Seq data are still lacking. In particular, model-based approaches for clustering large-scale single cell transcriptomic data remain under-explored.

Methods: We developed DIMM-SC, a Dirichlet Mixture Model for clustering droplet-based Single Cell transcriptomic data. This approach explicitly models UMI count data from scRNA-Seq experiments and characterizes variation across different cell clusters via a Dirichlet mixture prior. An expectation-maximization algorithm is used for parameter inference.

Results: We performed comprehensive simulations to evaluate DIMM-SC and compared it with existing clustering methods such as K-means, CellTree and Seurat. In addition, to benchmark and validate DIMM-SC, we analyzed public scRNA-Seq datasets with known cluster labels, as well as in-house scRNA-Seq datasets from a study of systemic sclerosis for which prior biological knowledge is available. Both the simulation studies and the real data applications demonstrated that, overall, DIMM-SC achieves substantially improved clustering accuracy and much lower clustering variability than other existing clustering methods. More importantly, as a model-based approach, DIMM-SC is able to quantify the clustering uncertainty for each single cell, facilitating rigorous statistical inference and biological interpretation, which are typically unavailable from existing clustering methods.
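To make the per-cell clustering uncertainty mentioned above concrete, here is a minimal sketch of an E-step for a Dirichlet mixture over UMI counts. It is an illustration under stated assumptions, not the authors' implementation. It assumes each cell's counts follow a multinomial whose gene proportions are drawn from a cluster-specific Dirichlet, so the marginal likelihood is Dirichlet-multinomial. The function names, the toy parameters `alphas` and `weights`, and the three-gene `cells` are all hypothetical. The M-step, which would re-estimate the Dirichlet parameters and mixing weights, is omitted because the abstract does not describe it.

```python
import math


def dm_log_kernel(counts, alpha):
    """Log Dirichlet-multinomial likelihood of one cell's counts.

    The multinomial coefficient is dropped: it does not depend on the
    cluster, so it cancels when the posterior is normalized.
    """
    total = sum(counts)
    alpha_sum = sum(alpha)
    out = math.lgamma(alpha_sum) - math.lgamma(alpha_sum + total)
    for x, a in zip(counts, alpha):
        out += math.lgamma(x + a) - math.lgamma(a)
    return out


def posterior_memberships(counts, alphas, weights):
    """E-step for one cell: posterior probability of each cluster."""
    logs = [
        math.log(w) + dm_log_kernel(counts, alpha)
        for alpha, w in zip(alphas, weights)
    ]
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(logs)
    unnormalized = [math.exp(v - top) for v in logs]
    norm = sum(unnormalized)
    return [u / norm for u in unnormalized]


# Hypothetical toy setup: 2 clusters, 3 genes, equal mixing weights.
alphas = [[8.0, 1.0, 1.0], [1.0, 1.0, 8.0]]
weights = [0.5, 0.5]
cells = [[9, 1, 0], [0, 2, 8], [3, 4, 3]]

for cell in cells:
    probs = posterior_memberships(cell, alphas, weights)
    print(cell, [round(p, 4) for p in probs])
```

In this toy setup the first two cells are assigned to one cluster with probability close to 1, while the third cell, whose counts are symmetric with respect to the two clusters, receives probability 0.5 for each. These posterior probabilities are the kind of per-cell uncertainty that a model-based clustering method can report and that a hard-assignment method such as K-means cannot.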
Subjects: Machine Learning (stat.ML) ; Quantitative Methods (q-bio.QM)
Cite as: arXiv:1704.02007 [stat.ML]
  (or arXiv:1704.02007v1 [stat.ML] for this version)
  https://doi.org/10.48550/arXiv.1704.02007
arXiv-issued DOI via DataCite

Submission history

From: Wei Chen
[v1] Thu, 6 Apr 2017 20:01:29 UTC (4,942 KB)