RL Unplugged is suite of benchmarks for offline reinforcement learning. The RL Unplugged is designed around the following considerations: to facilitate ease of use, we provide the datasets with a unified API which makes it easy for the practitioner to work with all data in the suite once a general pipeline has been established. This is a dataset accompanying the paper RL Unplugged: Benchmarks for Offline Reinforcement Learning.
In this suite of benchmarks, we try to focus on the following problems:
-
High dimensional action spaces, for example the locomotion humanoid domains, we have 56 dimensional actions.
-
High dimensional observations.
-
Partial observability, observations have egocentric vision.
-
Difficulty of exploration, using states of the art algorithms and imitation to generate data for difficult environments.
-
Real world challenges.
The data is available under RL Unplugged GCP bucket.
We are releasing a large and diverse dataset of gameplay following the protocol described by Agarwal et al., 2020, which can be used to evaluate several discrete offline RL algorithms. The dataset is generated by running an online DQN agent and recording transitions from its replay during training with sticky actions Machado et al., 2018. As stated in Agarwal et al., 2020, for each game we use data from five runs with 50 million transitions each. States in each transition include stacks of four frames to be able to do frame-stacking with our baselines. We release datasets for 46 Atari games. For details on how the dataset was generated, please refer to the paper.
These tasks are made up of the corridor locomotion tasks involving the CMU Humanoid, for which prior efforts have either used motion capture data Merel et al., 2019a, Merel et al., 2019b or training from scratch Song et al., 2020. In addition, the DM Locomotion repository contains a set of tasks adapted to be suited to a virtual rodent Merel et al., 2020. We emphasize that the DM Locomotion tasks feature the combination of challenging high-DoF continuous control along with perception from rich egocentric observations. For details on how the dataset was generated, please refer to the paper.
DeepMind Control Suite Tassa et al., 2018 is a set of control tasks implemented in MuJoCo Todorov et al., 2012. We consider a subset of the tasks provided in the suite that cover a wide range of difficulties.
Most of the datasets in this domain are generated using D4PG. For the environments Manipulator insert ball and Manipulator insert peg we use V-MPO Song et al., 2020 to generate the data as D4PG is unable to solve these tasks. We release datasets for 9 control suite tasks. For details on how the dataset was generated, please refer to the paper.
Examples in the dataset represent SARS transitions stored when running a partially online trained agent as described in RWRL.
We release 8 datasets in total -- with no combined challenge and easy combined challenge on the cartpole, walker, quadruped, and humanoid tasks. For details on how the dataset was generated, please refer to the paper.
- Install dependencies:
pip install -r requirements.txt
- (Optional) Setup MuJoCo license key for DM Control environments (instructions).
- (Optional) Install realworldrl_suite.
mkdir -p /tmp/dataset/Asterix
gsutil cp gs://rl_unplugged/atari/Asterix/run_1-00000-of-00100 \
/tmp/dataset/Asterix/run_1-00000-of-00001
python atari_example.py --path=/tmp/dataset --game=Asterix
This copies a single shard from one of the Asterix datasets from GCP to a local folder, and then runs a script that loads a single example and runs a step on the Atari environment.
Please use the following bibtex for citations:
@misc{gulcehre2020rl,
title={RL Unplugged: Benchmarks for Offline Reinforcement Learning},
author={Caglar Gulcehre and Ziyu Wang and Alexander Novikov and Tom Le Paine
and Sergio Gómez Colmenarejo and Konrad Zolna and Rishabh Agarwal and
Josh Merel and Daniel Mankowitz and Cosmin Paduraru and Gabriel
Dulac-Arnold and Jerry Li and Mohammad Norouzi and Matt Hoffman and
Ofir Nachum and George Tucker and Nicolas Heess and Nando deFreitas},
year={2020},
eprint={2006.13888},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
The following table is necessary for this dataset to be indexed by search engines such as Google Dataset Search.
property | value | ||||||
---|---|---|---|---|---|---|---|
name | RL Unplugged |
||||||
url | https://github.com/deepmind/deepmind-research/tree/master/rl_unplugged |
||||||
sameAs | https://github.com/deepmind/deepmind-research/tree/master/rl_unplugged |
||||||
description |
Data accompanying
[RL Unplugged: Benchmarks for Offline Reinforcement Learning]().
|
||||||
provider |
|
||||||
citation | https://identifiers.org/arxiv:2006.13888 |