Pre-training with Synthetic Data Helps Offline Reinforcement Learning

Wang, Zecheng; Wang, Che; Dong, Zixuan; Ross, Keith


arXiv:2310.00771v3 (cs)

[Submitted on 1 Oct 2023 (v1), revised 24 Feb 2024 (this version, v3), latest version 27 May 2024 (v4)]


Authors: Zecheng Wang, Che Wang, Zixuan Dong, Keith Ross

Abstract: Recently, it has been shown that for offline deep reinforcement learning (DRL), pre-training Decision Transformer with a large language corpus can improve downstream performance (Reid et al., 2022). A natural question to ask is whether this performance gain can only be achieved with language pre-training, or can be achieved with simpler pre-training schemes which do not involve language. In this paper, we first show that language is not essential for improved performance, and indeed pre-training with synthetic IID data for a small number of updates can match the performance gains from pre-training with a large language corpus; moreover, pre-training with data generated by a one-step Markov chain can further improve the performance. Inspired by these experimental results, we then consider pre-training Conservative Q-Learning (CQL), a popular offline DRL algorithm, which is Q-learning-based and typically employs a Multi-Layer Perceptron (MLP) backbone. Surprisingly, pre-training with simple synthetic data for a small number of updates can also improve CQL, providing consistent performance improvement on D4RL Gym locomotion datasets. The results of this paper not only illustrate the importance of pre-training for offline DRL but also show that the pre-training data can be synthetic and generated with remarkably simple mechanisms.
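The two kinds of synthetic data named in the abstract, IID data and data generated by a one-step Markov chain, can be illustrated with a minimal sketch. This is not the authors' code: the number of states, the sequence length, the uniform IID distribution, and the way the transition matrix is drawn are all assumptions chosen for illustration, and every function name below is hypothetical.

```python
import random


def make_transition_matrix(num_states, rng):
    """Draw a random row-stochastic matrix (assumed construction).

    Row i holds the distribution of the next state given current state i.
    """
    matrix = []
    for _ in range(num_states):
        weights = [rng.random() for _ in range(num_states)]
        total = sum(weights)
        matrix.append([w / total for w in weights])
    return matrix


def sample_markov_sequence(matrix, length, rng):
    """Sample states where each state depends only on the previous one."""
    num_states = len(matrix)
    state = rng.randrange(num_states)  # uniform initial state (assumption)
    sequence = [state]
    for _ in range(length - 1):
        state = rng.choices(range(num_states), weights=matrix[state])[0]
        sequence.append(state)
    return sequence


def sample_iid_sequence(num_states, length, rng):
    """Sample states independently and uniformly, ignoring all history."""
    return [rng.randrange(num_states) for _ in range(length)]


rng = random.Random(0)
transition = make_transition_matrix(num_states=8, rng=rng)
markov_tokens = sample_markov_sequence(transition, length=32, rng=rng)
iid_tokens = sample_iid_sequence(num_states=8, length=32, rng=rng)
```

In a pre-training setup of the kind the abstract describes, sequences like `markov_tokens` or `iid_tokens` would stand in for a language corpus; how they are fed to the model is outside the scope of this sketch.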


Comments:	37 pages, 17 figures
Subjects:	Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2310.00771 [cs.AI]
	(or arXiv:2310.00771v3 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2310.00771

Submission history

From: Zecheng Wang
[v1] Sun, 1 Oct 2023 19:32:14 UTC (23,156 KB)
[v2] Fri, 6 Oct 2023 03:16:08 UTC (6,787 KB)
[v3] Sat, 24 Feb 2024 13:54:06 UTC (21,353 KB)
[v4] Mon, 27 May 2024 17:16:03 UTC (21,353 KB)
