Computer Science > Multiagent Systems

arXiv:2504.12714 (cs)
[Submitted on 17 Apr 2025 (v1), last revised 20 Apr 2025 (this version, v2)]

Title: Cross-environment Cooperation Enables Zero-shot Multi-agent Coordination


Authors: Kunal Jha, Wilka Carvalho, Yancheng Liang, Simon S. Du, Max Kleiman-Weiner, Natasha Jaques
Abstract: Zero-shot coordination (ZSC), the ability to adapt to a new partner in a cooperative task, is a critical component of human-compatible AI. While prior work has focused on training agents to cooperate on a single task, these specialized models do not generalize to new tasks, even if they are highly similar. Here, we study how reinforcement learning on a distribution of environments with a single partner enables learning general cooperative skills that support ZSC with many new partners on many new problems. We introduce two Jax-based, procedural generators that create billions of solvable coordination challenges. We develop a new paradigm called Cross-Environment Cooperation (CEC), and show that it outperforms competitive baselines quantitatively and qualitatively when collaborating with real people. Our findings suggest that learning to collaborate across many unique scenarios encourages agents to develop general norms, which prove effective for collaboration with different partners. Together, our results suggest a new route toward designing generalist cooperative agents capable of interacting with humans without requiring human data.
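The abstract describes the CEC recipe only at a high level: self-play with a single partner across a large distribution of procedurally generated environments. As a purely illustrative sketch, and not the authors' implementation, the toy JAX loop below trains one shared policy by sampling a fresh batch of randomly generated coordination problems at every update; all names (make_env, policy_logits, the linear policy, and the matching game itself) are hypothetical stand-ins, not anything from the paper's codebase.

# Illustrative sketch only, in the spirit of Cross-Environment Cooperation (CEC):
# one partner, many procedurally generated environments, a shared self-play policy.
import jax
import jax.numpy as jnp

N_FEATURES, N_ACTIONS, LR = 8, 4, 0.1

def make_env(key):
    # Stand-in "procedural generator": each layout is a random feature vector
    # whose first N_ACTIONS entries implicitly define the correct meeting point.
    feats = jax.random.normal(key, (N_FEATURES,))
    target = jnp.argmax(feats[:N_ACTIONS])
    return feats, target

def policy_logits(params, feats):
    return feats @ params  # linear policy shared by both co-players

def episode_loss(params, key):
    env_key, a_key, b_key = jax.random.split(key, 3)
    feats, target = make_env(env_key)
    logits = policy_logits(params, feats)
    # Both agents sample independently from the same self-play policy.
    act_a = jax.random.categorical(a_key, logits)
    act_b = jax.random.categorical(b_key, logits)
    # Cooperative reward: both must pick this layout's coordination point.
    reward = ((act_a == target) & (act_b == target)).astype(jnp.float32)
    logp = jax.nn.log_softmax(logits)
    # REINFORCE-style surrogate: reinforce both agents' chosen actions.
    return -reward * (logp[act_a] + logp[act_b])

@jax.jit
def update(params, key):
    keys = jax.random.split(key, 256)  # a fresh batch of generated environments
    grads = jax.vmap(jax.grad(episode_loss), in_axes=(None, 0))(params, keys)
    return params - LR * grads.mean(axis=0)

params = jnp.zeros((N_FEATURES, N_ACTIONS))
key = jax.random.PRNGKey(0)
for step in range(500):
    key, sub = jax.random.split(key)
    params = update(params, sub)

The point of the sketch is the structure rather than the toy game: because the environment is resampled at every batch, the policy can only improve by learning rules that hold across layouts, instead of memorizing a single fixed task.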
Comments: Accepted to CogSci 2025; in review for ICML 2025
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2504.12714 [cs.MA]
  (or arXiv:2504.12714v2 [cs.MA] for this version)
  https://doi.org/10.48550/arXiv.2504.12714
arXiv-issued DOI via DataCite

Submission history

From: Kunal Jha
[v1] Thu, 17 Apr 2025 07:41:25 UTC (1,796 KB)
[v2] Sun, 20 Apr 2025 20:10:41 UTC (1,504 KB)