CRISP-SAM2: SAM2 with Cross-Modal Interaction and Semantic Prompting for Multi-Organ Segmentation

Yu, Xinlei; Wang, Changmiao; Jin, Hui; Elazab, Ahmed; Jia, Gangyong; Wan, Xiang; Zou, Changqing; Ge, Ruiquan

Electrical Engineering and Systems Science > Image and Video Processing

arXiv:2506.23121 (eess)

[Submitted on 29 Jun 2025 (v1), last revised 6 Jul 2025 (this version, v2)]

Abstract: Multi-organ medical segmentation is a crucial component of medical image processing, essential for doctors to make accurate diagnoses and develop effective treatment plans. Despite significant progress in this field, current multi-organ segmentation models often suffer from inaccurate details, dependence on geometric prompts, and loss of spatial information. To address these challenges, we introduce a novel model named CRISP-SAM2, with CRoss-modal Interaction and Semantic Prompting based on SAM2. This model represents a promising approach to multi-organ medical segmentation guided by textual descriptions of organs. Our method begins by converting visual and textual inputs into cross-modal contextualized semantics using a progressive cross-attention interaction mechanism. These semantics are then injected into the image encoder to enhance the detailed understanding of visual information. To eliminate reliance on geometric prompts, we use a semantic prompting strategy that replaces the original prompt encoder and sharpens the perception of challenging targets. In addition, a similarity-sorting self-updating strategy for memory and a mask-refining process are applied to further adapt the model to medical imaging and enhance localized details. Comparative experiments conducted on seven public datasets indicate that CRISP-SAM2 outperforms existing models. Extensive analysis also demonstrates the effectiveness of our method, confirming its superior performance, especially in addressing the limitations mentioned earlier. Our code is available at: https://github.com/YU-deep/CRISP_SAM2.git.
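The abstract names two mechanisms without giving their equations. The following is only a rough illustration under stated assumptions, not the authors' implementation (the official code is at the repository linked in the abstract). First, it assumes one round of the "progressive cross-attention interaction" behaves like scaled dot-product cross-attention in which text tokens attend to image tokens and image tokens then attend to the updated text tokens. Second, it assumes the "similarity-sorting self-updating strategy for memory" ranks stored memory entries by cosine similarity to the current query and keeps the top entries. The function names (`interaction_round`, `update_memory`), the identity Q/K/V projections, and the residual additions are all hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], keys: List[Vector],
                    values: List[Vector]) -> List[Vector]:
    """Scaled dot-product cross-attention with identity Q/K/V projections.
    Assumption: a trained model would use learned projection matrices."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(dim)])
    return outputs


def add(xs: List[Vector], ys: List[Vector]) -> List[Vector]:
    """Element-wise residual addition of two token sequences."""
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]


def interaction_round(text: List[Vector], image: List[Vector]):
    """Hypothetical interaction round: text attends to image, then image
    attends to the updated text; residuals preserve the original tokens."""
    text = add(text, cross_attention(text, image, image))
    image = add(image, cross_attention(image, text, text))
    return text, image


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def update_memory(bank: List[Vector], new_entry: Vector, query: Vector,
                  capacity: int) -> List[Vector]:
    """Hypothetical similarity-sorting update: append the new memory, rank
    all entries by cosine similarity to the current query, keep the top ones."""
    candidates = bank + [new_entry]
    candidates.sort(key=lambda m: cosine(m, query), reverse=True)
    return candidates[:capacity]
```

For example, `update_memory([[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0], query=[1.0, 0.0], capacity=2)` keeps `[1.0, 0.0]` and `[1.0, 1.0]` and evicts the entry orthogonal to the query. Ranking by similarity rather than recency is one plausible reading of the strategy's name. It keeps the memories most relevant to the current slice, which may suit volumetric scans where neighbouring slices are not always the most informative.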

Comments:	Accepted by ACMMM25
Subjects:	Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2506.23121 [eess.IV]
	(or arXiv:2506.23121v2 [eess.IV] for this version)
	https://doi.org/10.48550/arXiv.2506.23121

Submission history

From: Xinlei Yu
[v1] Sun, 29 Jun 2025 07:05:27 UTC (3,211 KB)
[v2] Sun, 6 Jul 2025 02:53:08 UTC (3,211 KB)