MG-NET: Leveraging Pseudo-Imaging for Multi-Modal Metagenome Analysis

Aakur, Sathyanarayanan N.; Narayanan, Sai; Indla, Vineela; Bagavathi, Arunkumar; Ramnath, Vishalini Laguduva; Ramachandran, Akhilesh

arXiv:2107.09883 (cs)

[Submitted on 21 Jul 2021]

Authors: Sathyanarayanan N. Aakur, Sai Narayanan, Vineela Indla, Arunkumar Bagavathi, Vishalini Laguduva Ramnath, Akhilesh Ramachandran

Abstract: The emergence of novel pathogens and zoonotic diseases such as SARS-CoV-2 has underlined the need for novel diagnosis and intervention pipelines that can learn rapidly from small amounts of labeled data. Combined with technological advances in next-generation sequencing, metagenome-based diagnostic tools hold much promise to revolutionize rapid point-of-care diagnosis. However, there are significant challenges in developing such an approach, chief among which is learning self-supervised representations that can help detect novel pathogen signatures with very little labeled data. This is a particularly difficult task given that closely related pathogens can share more than 90% of their genome structure. In this work, we address these challenges by proposing MG-Net, a self-supervised representation learning framework that leverages multi-modal context using pseudo-imaging data derived from clinical metagenome sequences. We show that the proposed framework can learn robust representations from unlabeled data that can be used for downstream tasks such as metagenome sequence classification with limited access to labeled data. Extensive experiments show that the learned features outperform current baseline metagenome representations, given only 1000 samples per class.

Comments:	To appear in MICCAI 2021
Subjects:	Machine Learning (cs.LG); Genomics (q-bio.GN)
Cite as:	arXiv:2107.09883 [cs.LG]
	(or arXiv:2107.09883v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2107.09883

Submission history

From: Sathyanarayanan Aakur
[v1] Wed, 21 Jul 2021 05:53:01 UTC (776 KB)
