X-Diffusion: Training Diffusion Policies on Cross-Embodiment Human Demonstrations

Pace, Maximus A.; Dan, Prithwish; Ning, Chuanruo; Bhardwaj, Atiksh; Du, Audrey; Duan, Edward W.; Ma, Wei-Chiu; Kedia, Kushal

Computer Science > Robotics

arXiv:2511.04671v1 (cs)

[Submitted on 6 Nov 2025]

Title: X-Diffusion: Training Diffusion Policies on Cross-Embodiment Human Demonstrations

Authors: Maximus A. Pace, Prithwish Dan, Chuanruo Ning, Atiksh Bhardwaj, Audrey Du, Edward W. Duan, Wei-Chiu Ma, Kushal Kedia

Abstract: Human videos can be recorded quickly and at scale, making them an appealing source of training data for robot learning. However, humans and robots differ fundamentally in embodiment, resulting in mismatched action execution. Direct kinematic retargeting of human hand motion can therefore produce actions that are physically infeasible for robots. Despite these low-level differences, human demonstrations provide valuable motion cues about how to manipulate and interact with objects. Our key idea is to exploit the forward diffusion process: as noise is added to actions, low-level execution differences fade while high-level task guidance is preserved. We present X-Diffusion, a principled framework for training diffusion policies that maximally leverages human data without learning dynamically infeasible motions. X-Diffusion first trains a classifier to predict whether a noisy action is executed by a human or robot. Then, a human action is incorporated into policy training only after adding sufficient noise such that the classifier cannot discern its embodiment. Actions consistent with robot execution supervise fine-grained denoising at low noise levels, while mismatched human actions provide only coarse guidance at higher noise levels. Our experiments show that naive co-training under execution mismatches degrades policy performance, while X-Diffusion consistently improves it. Across five manipulation tasks, X-Diffusion achieves a 16% higher average success rate than the best baseline. The project website is available at https://portal-cornell.github.io/X-Diffusion/.
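The noise-gating idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine noise schedule, the 0.6 confidence threshold, the Monte Carlo sample count, and every function name below are assumptions made for illustration. In place of a trained classifier, the sketch uses an analytic stand-in that is exact only for a toy setting where robot actions sit at the origin and human actions sit at a fixed offset `mu`. For each human action it searches for the smallest diffusion step at which the classifier's average P(human) falls to the threshold, then samples training steps only from that step upward.

```python
import numpy as np


def cosine_alpha_bar(num_steps: int = 50) -> np.ndarray:
    """Cumulative signal retention per step (assumed cosine schedule)."""
    steps = np.arange(1, num_steps + 1)
    return np.cos(0.5 * np.pi * steps / num_steps) ** 2


def add_noise(action, t, alpha_bar, rng):
    """Forward diffusion: a_t = sqrt(abar_t) * a_0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(action.shape)
    return np.sqrt(alpha_bar[t]) * action + np.sqrt(1.0 - alpha_bar[t]) * eps


def toy_classifier(mu, alpha_bar):
    """Hypothetical stand-in for a trained embodiment classifier.

    Returns P(human | noisy action, t) for a toy world in which clean
    robot actions are 0 and clean human actions are mu.
    """
    def p_human(noisy_action, t):
        ab = alpha_bar[t]
        logit = (np.sqrt(ab) * (noisy_action @ mu) - 0.5 * ab * (mu @ mu)) / (1.0 - ab)
        return 1.0 / (1.0 + np.exp(-np.clip(logit, -50.0, 50.0)))
    return p_human


def min_indistinguishable_step(action, classifier, alpha_bar, rng,
                               threshold=0.6, num_samples=32):
    """Smallest step index at which mean P(human) <= threshold, else None."""
    for t in range(len(alpha_bar)):
        probs = [classifier(add_noise(action, t, alpha_bar, rng), t)
                 for _ in range(num_samples)]
        if np.mean(probs) <= threshold:
            return t
    return None


def sample_training_step(t_min, num_steps, rng):
    """Sample a training step in [t_min, num_steps).

    Robot actions would use t_min = 0; human actions use their gated t_min.
    """
    return int(rng.integers(t_min, num_steps))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha_bar = cosine_alpha_bar(50)
    mu = np.array([1.0, 1.0])
    clf = toy_classifier(mu, alpha_bar)
    # A human action that looks like a robot action vs. one that does not.
    print("robot-like human action:", min_indistinguishable_step(np.zeros(2), clf, alpha_bar, rng))
    print("mismatched human action:", min_indistinguishable_step(mu, clf, alpha_bar, rng))
```

In this toy setting, a human action that coincides with robot behavior is admitted at every noise level, while a mismatched one is admitted only at higher noise levels, which mirrors the fine-grained versus coarse supervision split described in the abstract. How the actual method conditions the classifier and chooses its threshold is not specified here.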

Subjects:	Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2511.04671 [cs.RO]
	(or arXiv:2511.04671v1 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2511.04671

Submission history

From: Maximus Pace
[v1] Thu, 6 Nov 2025 18:56:30 UTC (26,113 KB)
