FaSTA$^*$: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing

Gupta, Advait; Raj, Rishie; Nguyen, Dang; Zhou, Tianyi

arXiv:2506.20911 (cs)

[Submitted on 26 Jun 2025]

Abstract: We develop a cost-efficient neurosymbolic agent to address challenging multi-turn image editing tasks such as "Detect the bench in the image while recoloring it to pink. Also, remove the cat for a clearer view and recolor the wall to yellow." It combines fast, high-level subtask planning by large language models (LLMs) with slow, accurate tool use and a local A$^*$ search per subtask to find a cost-efficient toolpath, i.e., a sequence of calls to AI tools. To save the cost of A$^*$ on similar subtasks, we perform inductive reasoning on previously successful toolpaths via LLMs to continuously extract and refine frequently used subroutines, and we reuse them as new tools for future tasks in an adaptive fast-slow planning scheme: the higher-level subroutines are explored first, and the low-level A$^*$ search is activated only when they fail. The reusable symbolic subroutines considerably reduce exploration cost on the same types of subtasks applied to similar images, yielding a human-like fast-slow toolpath agent, "FaSTA$^*$": LLMs first attempt fast subtask planning followed by rule-based subroutine selection per subtask, which is expected to cover most tasks, while the slow A$^*$ search is triggered only for novel and challenging subtasks. By comparing with recent image editing approaches, we demonstrate that FaSTA$^*$ is significantly more computationally efficient while remaining competitive with the state-of-the-art baseline in terms of success rate.
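The adaptive fast-slow control flow described in the abstract (try stored subroutines first, fall back to A$^*$ only when they fail) can be sketched as below. This is a minimal illustration under loudly stated assumptions, not the authors' implementation: the tool names, costs, and abstract string "states" are hypothetical; `succeeds` is a caller-supplied stand-in for whatever quality check decides whether a subroutine worked on a given image; and the "mining" step simply stores successful A$^*$ paths verbatim, a naive substitute for the LLM-based inductive subroutine extraction the paper describes.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical tool catalogue: name -> (cost, input state, output state).
# States are abstract labels standing in for "what has been done to the image".
Tool = Tuple[float, str, str]
TOOLS: Dict[str, Tool] = {
    "detect": (1.0, "raw", "located"),
    "segment": (2.0, "located", "masked"),
    "sam_segment": (4.0, "raw", "masked"),
    "recolor": (1.5, "masked", "recolored"),
    "inpaint": (3.0, "masked", "removed"),
}
# Hypothetical subroutine library keyed by (start state, goal state).
Library = Dict[Tuple[str, str], List[List[str]]]


def astar_toolpath(start: str, goal: str, tools: Dict[str, Tool],
                   heuristic: Callable[[str], float]) -> Optional[Tuple[float, List[str]]]:
    """Slow path: A* over states; returns (total cost, tool names) or None."""
    frontier = [(heuristic(start), 0.0, start, [])]
    best_g: Dict[str, float] = {start: 0.0}
    while frontier:
        _, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return g, path
        if g > best_g.get(state, float("inf")):
            continue  # stale heap entry; a cheaper route to this state exists
        for name, (cost, src, dst) in tools.items():
            if src != state:
                continue
            new_g = g + cost
            if new_g < best_g.get(dst, float("inf")):
                best_g[dst] = new_g
                heapq.heappush(frontier, (new_g + heuristic(dst), new_g, dst, path + [name]))
    return None


def solve_subtask(start: str, goal: str, library: Library,
                  succeeds: Callable[[List[str]], bool],
                  tools: Dict[str, Tool],
                  heuristic: Callable[[str], float]) -> Tuple[str, List[str]]:
    """Fast-slow dispatch: reuse stored subroutines first, fall back to A*."""
    # Fast path: try previously stored subroutines for this (start, goal) pair.
    for sub in library.get((start, goal), []):
        if succeeds(sub):
            return "fast", sub
    # Slow path: reached only when no stored subroutine succeeds.
    # (The A* result is assumed successful here; it is not re-checked.)
    result = astar_toolpath(start, goal, tools, heuristic)
    if result is None:
        return "failed", []
    _, path = result
    # Naive "mining": remember the successful path verbatim for later reuse.
    bucket = library.setdefault((start, goal), [])
    if path not in bucket:
        bucket.append(path)
    return "slow", path


if __name__ == "__main__":
    lib: Library = {}
    zero = lambda state: 0.0  # trivially admissible heuristic
    print(solve_subtask("raw", "recolored", lib, lambda p: True, TOOLS, zero))
    # -> ('slow', ['detect', 'segment', 'recolor'])
    print(solve_subtask("raw", "recolored", lib, lambda p: True, TOOLS, zero))
    # -> ('fast', ['detect', 'segment', 'recolor'])
```

The first call pays for a search (cost 4.5 via `detect`, `segment`, `recolor`, beating the 5.5 route through `sam_segment`); the second call on the same subtask type is answered from the library without searching. With the zero heuristic the search reduces to Dijkstra's algorithm; a domain-informed lower bound on the remaining tool cost would let A$^*$ prune more of the frontier.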

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2506.20911 [cs.CV]
	(or arXiv:2506.20911v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2506.20911

Submission history

From: Advait Gupta
[v1] Thu, 26 Jun 2025 00:33:43 UTC (41,939 KB)