Molecule Generation by Principal Subgraph Mining and Assembling

Kong, Xiangzhe; Huang, Wenbing; Tan, Zhixing; Liu, Yang

Computer Science > Machine Learning

arXiv:2106.15098 (cs)

[Submitted on 29 Jun 2021 (v1), last revised 17 Dec 2022 (this version, v4)]

Abstract: Molecule generation is central to a variety of applications. Recent work has approached the generation task as subgraph prediction and assembling. Nevertheless, these methods usually rely on hand-crafted or external subgraph construction, and the subgraph assembling depends solely on local arrangement. In this paper, we define a novel notion, the principal subgraph, that is closely related to the informative patterns within molecules. Our proposed merge-and-update subgraph extraction method can automatically discover frequent principal subgraphs from the dataset, which previous methods cannot do. Moreover, we develop a two-step subgraph assembling strategy, which first predicts a set of subgraphs in a sequence-wise manner and then assembles all generated subgraphs globally into the final output molecule. Built upon a graph variational auto-encoder, our model is shown to be effective and efficient across several evaluation metrics, compared with state-of-the-art methods on distribution learning and (constrained) property optimization tasks.

Comments:	Accepted by NeurIPS 2022. Oral presentation
Subjects:	Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
Cite as:	arXiv:2106.15098 [cs.LG]
	(or arXiv:2106.15098v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2106.15098

Submission history

From: Xiangzhe Kong
[v1] Tue, 29 Jun 2021 05:26:18 UTC (795 KB)
[v2] Sun, 19 Dec 2021 07:59:46 UTC (769 KB)
[v3] Sat, 1 Oct 2022 02:56:19 UTC (799 KB)
[v4] Sat, 17 Dec 2022 13:44:00 UTC (800 KB)
