Behavioral Exploration: Learning to Explore via In-Context Adaptation

Wagenmaker, Andrew; Zhou, Zhiyuan; Levine, Sergey

arXiv:2507.09041 (cs)

[Submitted on 11 Jul 2025]

Title: Behavioral Exploration: Learning to Explore via In-Context Adaptation

Authors: Andrew Wagenmaker, Zhiyuan Zhou, Sergey Levine

Abstract: Developing autonomous agents that quickly explore an environment and adapt their behavior online is a canonical challenge in robotics and machine learning. While humans are able to achieve such fast online exploration and adaptation, often acquiring new information and skills in only a handful of interactions, existing algorithmic approaches tend to rely on random exploration and slow, gradient-based behavior updates. How can we endow autonomous agents with capabilities on par with humans? Taking inspiration from recent progress on both in-context learning and large-scale behavioral cloning, in this work we propose behavioral exploration: training agents to internalize what it means to explore and adapt in-context over the space of "expert" behaviors. To achieve this, given access to a dataset of expert demonstrations, we train a long-context generative model to predict expert actions conditioned on a context of past observations and a measure of how "exploratory" the expert's behaviors are relative to this context. This enables the model not only to mimic the behavior of an expert but also, by feeding its past history of interactions into its context, to select expert behaviors different from those previously selected, thereby allowing for fast online adaptation and targeted, "expert-like" exploration. We demonstrate the effectiveness of our method in both simulated locomotion and manipulation settings, as well as on real-world robotic manipulation tasks, illustrating its ability to learn adaptive, exploratory behavior.
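The conditioning signal described in the abstract can be illustrated with a toy sketch. The abstract does not specify how the "exploratory" measure is computed, so everything below is an assumption made for illustration: the function names `novelty` and `build_example`, the nearest-neighbor distance used as the score, and the representation of a demonstration as a list of `(observation, action)` pairs are all hypothetical and are not taken from the paper.

```python
import math
import random


def novelty(candidate_obs, context_obs):
    """Mean distance from each candidate observation to its nearest
    context observation. Hypothetical stand-in for the 'exploratory'
    measure; the abstract does not define the actual measure."""
    return sum(
        min(math.dist(c, h) for h in context_obs) for c in candidate_obs
    ) / len(candidate_obs)


def build_example(demos, rng, context_size=2):
    """Assemble one illustrative training tuple: a context of past
    observations, a novelty score, and the target expert demonstration.
    Each demo is a list of (observation, action) pairs (an assumption)."""
    target = rng.choice(demos)
    others = [d for d in demos if d is not target]
    context_demos = rng.sample(others, k=min(context_size, len(others)))
    context_obs = [obs for demo in context_demos for obs, _ in demo]
    score = novelty([obs for obs, _ in target], context_obs)
    return {"context": context_obs, "score": score, "target": target}


if __name__ == "__main__":
    # Three toy 2-D demonstrations; actions are placeholder strings.
    demo_a = [((0.0, 0.0), "a0"), ((1.0, 0.0), "a1")]
    demo_b = [((0.0, 0.1), "b0"), ((1.0, 0.1), "b1")]
    demo_c = [((5.0, 5.0), "c0"), ((6.0, 5.0), "c1")]
    example = build_example([demo_a, demo_b, demo_c], random.Random(0))
    print(example["score"])
```

Under these assumptions, a model trained on such tuples could be prompted at deployment with its own interaction history as the context and a high score, which asks for expert behavior unlike what it has already tried. Whether the paper's measure resembles this distance-based score is not stated in the abstract.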

Subjects:	Machine Learning (cs.LG); Robotics (cs.RO); Systems and Control (eess.SY)
Cite as:	arXiv:2507.09041 [cs.LG]
	(or arXiv:2507.09041v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2507.09041

Submission history

From: Andrew Wagenmaker
[v1] Fri, 11 Jul 2025 21:36:19 UTC (38,981 KB)
