Coder as Editor: Code-driven Interpretable Molecular Optimization

Zhu, Wenyu; Li, Chengzhu; Tian, Xiaohe; Wang, Yifan; Jia, Yinjun; Wang, Jianhui; Gao, Bowen; Zhang, Ya-Qin; Ma, Wei-Ying; Lan, Yanyan

arXiv:2510.14455v1 (cs)

[Submitted on 16 Oct 2025]

Abstract: Molecular optimization is a central task in drug discovery that requires precise structural reasoning and domain knowledge. While large language models (LLMs) have shown promise in generating high-level editing intentions in natural language, they often struggle to execute these modifications faithfully, particularly when operating on non-intuitive representations such as SMILES. We introduce MECo, a framework that bridges reasoning and execution by translating editing actions into executable code. MECo reformulates molecular optimization for LLMs as a cascaded framework: it first generates human-interpretable editing intentions from a molecule and a property goal, then translates those intentions into executable structural edits via code generation. Our approach achieves over 98% accuracy in reproducing held-out realistic edits derived from chemical reactions and target-specific compound pairs. On downstream optimization benchmarks spanning physicochemical properties and target activities, MECo improves consistency by 38 to 86 percentage points, to above 90%, and achieves higher success rates than SMILES-based baselines while preserving structural similarity. By aligning intention with execution, MECo enables consistent, controllable and interpretable molecular design, laying the foundation for high-fidelity feedback loops and collaborative human-AI workflows in drug discovery.
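To make the core idea concrete: instead of asking a model to rewrite a SMILES string character by character, an editing intention such as "replace the hydroxyl group with fluorine" can be executed by code acting on an explicit molecular graph, so the edit is either applied exactly as specified or fails visibly. The following is a minimal, stdlib-only toy sketch of that idea under stated assumptions: the `ToyMol` representation (heavy atoms only, no bond orders), the helper names, and the `edit_is_consistent` check are hypothetical and are not MECo's code, interface, or definition of consistency. A practical pipeline would more likely generate code against a cheminformatics toolkit such as RDKit.

```python
from dataclasses import dataclass

# Hypothetical toy representation: heavy atoms only, implicit hydrogens,
# undirected bonds without bond orders, charges, or stereochemistry.
@dataclass(frozen=True)
class ToyMol:
    atoms: tuple[str, ...]
    bonds: frozenset[frozenset[int]]

    def neighbors(self, i: int) -> list[int]:
        """Indices of atoms bonded to atom i."""
        return sorted(j for bond in self.bonds if i in bond for j in bond if j != i)


def find_terminal_oxygens(mol: ToyMol) -> list[int]:
    """Oxygens bonded to exactly one heavy atom, read here as -OH.

    Without bond orders this toy model cannot tell -OH from C=O.
    """
    return [i for i, a in enumerate(mol.atoms) if a == "O" and len(mol.neighbors(i)) == 1]


def replace_atom(mol: ToyMol, idx: int, symbol: str) -> ToyMol:
    """Executable edit: change the element at idx and keep every bond."""
    atoms = list(mol.atoms)
    atoms[idx] = symbol
    return ToyMol(tuple(atoms), mol.bonds)


def edit_is_consistent(before: ToyMol, after: ToyMol, idx: int, symbol: str) -> bool:
    """Hypothetical check: only atom idx changed, it became `symbol`, bonds untouched."""
    if len(before.atoms) != len(after.atoms):
        return False
    changed = [i for i, (a, b) in enumerate(zip(before.atoms, after.atoms)) if a != b]
    return changed == [idx] and after.atoms[idx] == symbol and before.bonds == after.bonds


# Ethanol as heavy atoms C0-C1-O2.
ethanol = ToyMol(("C", "C", "O"), frozenset({frozenset({0, 1}), frozenset({1, 2})}))

# Intention: "replace the hydroxyl group with fluorine", executed as code.
targets = find_terminal_oxygens(ethanol)
fluoroethane = replace_atom(ethanol, targets[0], "F")

print(fluoroethane.atoms)                                          # ('C', 'C', 'F')
print(edit_is_consistent(ethanol, fluoroethane, targets[0], "F"))  # True
```

Because the edit is an explicit operation on the structure, checking that the result matches the stated intention reduces to comparing the before and after graphs, which is much harder to guarantee when an LLM emits a new SMILES string directly.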

Subjects:	Machine Learning (cs.LG); Biomolecules (q-bio.BM)
Cite as:	arXiv:2510.14455 [cs.LG]
	(or arXiv:2510.14455v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2510.14455

Submission history

From: Wenyu Zhu
[v1] Thu, 16 Oct 2025 08:55:06 UTC (1,674 KB)