Computer Science > Computer Vision and Pattern Recognition

arXiv:2501.00740 (cs)
[Submitted on 1 Jan 2025 (v1) , last revised 15 Apr 2025 (this version, v3)]

Title: RORem: Training a Robust Object Remover with Human-in-the-Loop

Authors: Ruibin Li, Tao Yang, Song Guo, Lei Zhang
Abstract: Despite significant advances, existing object removal methods still suffer from incomplete removal, incorrect content synthesis, and blurry synthesized regions, resulting in low success rates. These issues stem mainly from the lack of high-quality paired training data and from the self-supervised training paradigm these methods adopt, which forces the model to inpaint the masked regions and so creates ambiguity between synthesizing the masked objects and restoring the background. To address these issues, we propose a semi-supervised learning strategy with human-in-the-loop to create high-quality paired training data, with the aim of training a Robust Object Remover (RORem). We first collect 60K training pairs from open-source datasets to train an initial object removal model that generates removal samples. We then use human feedback to select a set of high-quality object removal pairs, with which we train a discriminator to automate the subsequent training data generation. By iterating this process for several rounds, we obtain a substantial object removal dataset with over 200K pairs. Fine-tuning the pre-trained Stable Diffusion model on this dataset yields RORem, which demonstrates state-of-the-art object removal performance in terms of both reliability and image quality. In particular, RORem improves the object removal success rate over previous methods by more than 18%. The dataset, source code, and trained model are available at https://github.com/leeruibin/RORem.
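The data-collection loop described in the abstract (generate removal samples, have humans select good pairs, train a discriminator on those selections, then let the discriminator filter further samples) can be illustrated with a toy sketch. This is not the authors' implementation: every name here (`build_dataset`, `fit_threshold`, `generate`, `human_accepts`) is hypothetical, each "pair" is reduced to an id plus a scalar quality score, the discriminator is a simple threshold rather than a trained network, and retraining of the removal model between rounds is abstracted into the `generate` callable.

```python
import random
from typing import Callable, List, Tuple

# Toy stand-in for an (input image, removal result) pair: (id, quality in [0, 1]).
Pair = Tuple[str, float]


def fit_threshold(labeled: List[Tuple[Pair, bool]]) -> Callable[[Pair], bool]:
    """Toy 'discriminator' (assumption): accept a pair when its quality exceeds
    the midpoint between the mean human-accepted and mean human-rejected quality."""
    good = [pair[1] for pair, ok in labeled if ok]
    bad = [pair[1] for pair, ok in labeled if not ok]
    if not good or not bad:
        # Degenerate case: only one class was seen, so accept all or nothing.
        return lambda pair: bool(good)
    cut = (sum(good) / len(good) + sum(bad) / len(bad)) / 2
    return lambda pair: pair[1] > cut


def build_dataset(
    seed_pairs: List[Pair],
    generate: Callable[[List[Pair]], Pair],
    human_accepts: Callable[[Pair], bool],
    rounds: int = 3,
    per_round: int = 200,
    n_human: int = 20,
) -> List[Pair]:
    """Grow a paired dataset over several rounds of human-in-the-loop filtering."""
    dataset = list(seed_pairs)
    labeled: List[Tuple[Pair, bool]] = []
    for _ in range(rounds):
        # 1. The current removal model proposes candidate pairs.
        candidates = [generate(dataset) for _ in range(per_round)]
        # 2. Humans review only a small subset of the candidates.
        reviewed = [(c, human_accepts(c)) for c in candidates[:n_human]]
        labeled.extend(reviewed)
        # 3. A discriminator is fit on all human labels collected so far.
        discriminator = fit_threshold(labeled)
        # 4. Keep human-approved pairs, plus unreviewed pairs the discriminator accepts.
        dataset.extend(c for c, ok in reviewed if ok)
        dataset.extend(c for c in candidates[n_human:] if discriminator(c))
    return dataset


# Usage with synthetic data (all values are made up for illustration).
rng = random.Random(0)
seed = [(f"seed-{i}", 1.0) for i in range(10)]
generate = lambda data: (f"gen-{rng.random():.6f}", rng.random())
human_accepts = lambda pair: pair[1] > 0.7
result = build_dataset(seed, generate, human_accepts)
print(len(seed), "seed pairs grew to", len(result))
```

The point of the structure is that human effort stays fixed per round (`n_human` reviews) while the discriminator scales the selection to the remaining candidates, which is how a small amount of feedback can yield a much larger filtered dataset.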
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2501.00740 [cs.CV]
  (or arXiv:2501.00740v3 [cs.CV] for this version)
  https://doi.org/10.48550/arXiv.2501.00740
arXiv-issued DOI via DataCite

Submission history

From: Ruibin Li [view email]
[v1] Wed, 1 Jan 2025 06:07:02 UTC (5,721 KB)
[v2] Thu, 23 Jan 2025 10:22:58 UTC (5,721 KB)
[v3] Tue, 15 Apr 2025 12:16:15 UTC (5,721 KB)