RORem: Training a Robust Object Remover with Human-in-the-Loop

Li, Ruibin; Yang, Tao; Guo, Song; Zhang, Lei

arXiv:2501.00740 (cs)

[Submitted on 1 Jan 2025 (v1), last revised 15 Apr 2025 (this version, v3)]

Authors: Ruibin Li, Tao Yang, Song Guo, Lei Zhang

Abstract: Despite significant advances, existing object removal methods still suffer from incomplete removal, incorrect content synthesis, and blurry synthesized regions, resulting in low success rates. These issues stem mainly from the lack of high-quality paired training data and from the self-supervised training paradigm these methods adopt, which forces the model to in-paint the masked regions and creates ambiguity between synthesizing the masked objects and restoring the background. To address these issues, we propose a semi-supervised learning strategy with human-in-the-loop to create high-quality paired training data, aiming to train a Robust Object Remover (RORem). We first collect 60K training pairs from open-source datasets to train an initial object removal model for generating removal samples. We then use human feedback to select a set of high-quality object removal pairs, with which we train a discriminator to automate the subsequent training data generation. By iterating this process for several rounds, we obtain a substantial object removal dataset with over 200K pairs. Fine-tuning the pre-trained Stable Diffusion model on this dataset yields RORem, which demonstrates state-of-the-art object removal performance in terms of both reliability and image quality. In particular, RORem improves the object removal success rate over previous methods by more than 18%. The dataset, source code, and trained model are available at https://github.com/leeruibin/RORem.
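
The iterative data-construction loop described in the abstract can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the authors' implementation: the names `Sample`, `generate_removals`, `human_accepts`, `fit_discriminator`, and `build_dataset` are hypothetical, image pairs are replaced by a single scalar `quality` score, the "discriminator" is reduced to a learned threshold, and all numeric settings are arbitrary.

```python
import random
from dataclasses import dataclass


@dataclass
class Sample:
    image_id: int
    quality: float  # hypothetical stand-in for how clean a removal is, in [0, 1]


def generate_removals(num_samples, dataset_size, rng):
    """Toy 'remover': a larger training set yields better removals on average."""
    skill = min(0.9, 0.4 + dataset_size / 1_000)
    return [
        Sample(i, min(1.0, max(0.0, rng.gauss(skill, 0.2))))
        for i in range(num_samples)
    ]


def human_accepts(sample):
    """Toy annotator: accepts a removal only if it looks clean enough."""
    return sample.quality >= 0.6


def fit_discriminator(labeled):
    """Toy 'discriminator': a threshold between the worst accepted and the
    best rejected sample among the human-labeled examples."""
    accepted = [s.quality for s, ok in labeled if ok]
    rejected = [s.quality for s, ok in labeled if not ok]
    if not accepted or not rejected:
        return 0.6  # fall back to a fixed threshold when one class is empty
    return (min(accepted) + max(rejected)) / 2


def build_dataset(seed_size=60, rounds=3, per_round=200, human_budget=50, seed=0):
    rng = random.Random(seed)
    # Seed pairs (assumed clean) play the role of the initial open-source data.
    dataset = [Sample(i, 1.0) for i in range(seed_size)]
    for _ in range(rounds):
        # 1) The current remover (represented by dataset size) generates candidates.
        candidates = generate_removals(per_round, len(dataset), rng)
        # 2) Humans review only a small budget of the candidates.
        reviewed, unreviewed = candidates[:human_budget], candidates[human_budget:]
        labeled = [(s, human_accepts(s)) for s in reviewed]
        # 3) A discriminator is fitted on the human labels ...
        threshold = fit_discriminator(labeled)
        # 4) ... and filters the remaining candidates automatically.
        dataset += [s for s, ok in labeled if ok]
        dataset += [s for s in unreviewed if s.quality >= threshold]
    return dataset


if __name__ == "__main__":
    print(len(build_dataset()))
```

The point of the sketch is the control flow: human review is spent on a small subset each round, and the fitted filter extends those judgments to the rest, so the dataset can grow well beyond what is manually labeled.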

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2501.00740 [cs.CV]
	(or arXiv:2501.00740v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2501.00740

Submission history

From: Ruibin Li
[v1] Wed, 1 Jan 2025 06:07:02 UTC (5,721 KB)
[v2] Thu, 23 Jan 2025 10:22:58 UTC (5,721 KB)
[v3] Tue, 15 Apr 2025 12:16:15 UTC (5,721 KB)
