Computer Science > Robotics

arXiv:2509.00361 (cs)
[Submitted on 30 Aug 2025]

Title: Generative Visual Foresight Meets Task-Agnostic Pose Estimation in Robotic Table-Top Manipulation


Authors: Chuye Zhang, Xiaoxiong Zhang, Wei Pan, Linfang Zheng, Wei Zhang
Abstract: Robotic manipulation in unstructured environments requires systems that can generalize across diverse tasks while maintaining robust and reliable performance. We introduce GVF-TAPE, a closed-loop framework that combines generative visual foresight with task-agnostic pose estimation to enable scalable robotic manipulation. GVF-TAPE employs a generative video model to predict future RGB-D frames from a single side-view RGB image and a task description, offering visual plans that guide robot actions. A decoupled pose estimation model then extracts end-effector poses from the predicted frames, translating them into executable commands via low-level controllers. By iteratively integrating video foresight and pose estimation in a closed loop, GVF-TAPE achieves real-time, adaptive manipulation across a broad range of tasks. Extensive experiments in both simulation and real-world settings demonstrate that our approach reduces reliance on task-specific action data and generalizes effectively, providing a practical and scalable solution for intelligent robotic systems.
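The closed loop described in the abstract (observe, predict future frames, extract end-effector poses, execute, repeat) can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name below (`run_closed_loop`, `ToyWorld`, `toy_predict_frames`) is hypothetical, the "image" and "pose" are reduced to a single number, and the video model and pose estimator are replaced by trivial stand-ins so the control flow can run on its own.

```python
from typing import Callable, List


def run_closed_loop(
    observe: Callable[[], float],
    predict_frames: Callable[[float, str], List[float]],
    estimate_pose: Callable[[float], float],
    move_to: Callable[[float], None],
    is_done: Callable[[float], bool],
    task: str,
    max_rounds: int = 10,
) -> List[float]:
    """Hypothetical control loop: replan from a fresh observation each round."""
    executed: List[float] = []
    for _ in range(max_rounds):
        image = observe()                  # current camera view (toy: a number)
        if is_done(image):
            break
        # "Visual foresight": predicted future frames for this task.
        for frame in predict_frames(image, task):
            pose = estimate_pose(frame)    # pose read off the predicted frame
            move_to(pose)                  # hand the pose to a low-level controller
            executed.append(pose)
    return executed


class ToyWorld:
    """Assumed 1-D stand-in for robot and camera; position doubles as the image."""

    def __init__(self) -> None:
        self.position = 0.0

    def observe(self) -> float:
        return self.position

    def move_to(self, pose: float) -> None:
        self.position = pose


def toy_predict_frames(image: float, task: str,
                       horizon: int = 3, step: float = 1.0) -> List[float]:
    """Stand-in for the video model: frames that step toward the goal in `task`."""
    goal = float(task)
    frames: List[float] = []
    current = image
    for _ in range(horizon):
        current += max(-step, min(step, goal - current))
        frames.append(current)
    return frames


world = ToyWorld()
poses = run_closed_loop(
    observe=world.observe,
    predict_frames=toy_predict_frames,
    estimate_pose=lambda frame: frame,     # identity: the toy frame is the pose
    move_to=world.move_to,
    is_done=lambda image: abs(image - 5.0) < 1e-9,
    task="5.0",                            # toy "task description": a goal position
)
print(world.position, poses)
```

Because the loop re-observes before each prediction round, a disturbance between rounds would be picked up by the next plan, which is the sense in which such a loop is closed rather than open.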
Comments: 9th Conference on Robot Learning (CoRL 2025), Seoul, Korea
Subjects: Robotics (cs.RO)
Cite as: arXiv:2509.00361 [cs.RO]
  (or arXiv:2509.00361v1 [cs.RO] for this version)
  https://doi.org/10.48550/arXiv.2509.00361
arXiv-issued DOI via DataCite

Submission history

From: Chuye Zhang [view email]
[v1] Sat, 30 Aug 2025 04:53:32 UTC (40,485 KB)