diff --git a/rl_unplugged/README.md b/rl_unplugged/README.md
index 8b7f9757..0eddae98 100644
--- a/rl_unplugged/README.md
+++ b/rl_unplugged/README.md
@@ -38,6 +38,11 @@ transition include stacks of four frames to be able to do frame-stacking with
 our baselines. We release datasets for 46 Atari games. For details on how the
 dataset was generated, please refer to the paper.
 
+Atari is a standard RL benchmark. We recommend trying offline RL methods on
+Atari if you are interested in comparing your approach to other
+state-of-the-art offline RL methods with discrete actions.
+
+
 ## DeepMind Locomotion Dataset
 
 These tasks are made up of the corridor locomotion tasks involving the CMU
@@ -49,6 +54,10 @@ Locomotion tasks feature the combination of challenging high-DoF continuous
 control along with perception from rich egocentric observations. For details on
 how the dataset was generated, please refer to the paper.
 
+We recommend trying offline RL methods on the DeepMind Locomotion dataset if
+you are interested in a very challenging offline RL dataset with a continuous
+action space.
+
 ## DeepMind Control Suite Dataset
 
 DeepMind Control Suite [Tassa et al., 2018] is a set of control tasks
@@ -61,6 +70,11 @@ environments Manipulator insert ball and Manipulator insert peg we use V-MPO
 We release datasets for 9 control suite tasks. For details on how the dataset
 was generated, please refer to the paper.
 
+DeepMind Control Suite is a traditional continuous action RL benchmark. In
+particular, we recommend testing your approach on DeepMind Control Suite if
+you are interested in comparing against other state-of-the-art offline RL
+methods.
+
 ## Realworld RL Dataset
 
 Examples in the dataset represent SARS transitions stored when running a
@@ -71,6 +85,43 @@ We release 8 datasets in total -- with no combined challenge and easy combined
 challenge on the cartpole, walker, quadruped, and humanoid tasks. For details on
 how the dataset was generated, please refer to the paper.
 
+## DeepMind Lab Dataset
+
+The DeepMind Lab dataset has several levels from the challenging, partially
+observable [DeepMind Lab suite](https://github.com/deepmind/lab). The dataset
+is collected by training distributed R2D2 [Kapturowski et al., 2018] agents
+from scratch on individual tasks. We recorded the experience across all
+actors during entire training runs a few times for every task. The details of
+the dataset generation process are described in [Gulcehre et al., 2021].
+
+We release datasets for five different DeepMind Lab levels: `seekavoid_arena_01`,
+`explore_rewards_few`, `explore_rewards_many`, `rooms_watermaze`,
+`rooms_select_nonmatching_object`. We also release snapshot datasets for the
+`seekavoid_arena_01` level, generated from a trained R2D2 snapshot that was
+evaluated in the environment with different values of epsilon for the
+epsilon-greedy algorithm.
+
+The DeepMind Lab dataset is fairly large-scale. We recommend trying it if you
+are interested in large-scale offline RL models with memory.
+
+## bsuite Dataset
+
+[bsuite](https://github.com/deepmind/bsuite) data was collected by training DQN
+agents with the default setting in [Acme](https://github.com/deepmind/acme) from
+scratch on each of the following three tasks: cartpole, catch, and
+mountain_car.
+
+We converted the originally deterministic environments into stochastic ones by
+randomly replacing the agent action with a uniformly sampled action with a
+probability of {0, 0.1, 0.2, 0.3, 0.4, 0.5}. A probability of 0 corresponds to
+the original environment. The details of the dataset generation process are
+described in [Gulcehre et al., 2021].
+
+bsuite datasets are fairly lightweight, and running experiments on them doesn't
+require much compute. We recommend trying bsuite if you are interested in
+small-scale, easy-to-run offline RL datasets generated by stochastic
+environments where the stochasticity of the environment is easy to control.
+
 ## Running the code
 
 ### Installation
@@ -178,3 +229,5 @@ engines such as Google Dataset Search.
 [Song et al., 2020]: https://arxiv.org/abs/1909.12238
 [Tassa et al., 2018]: https://arxiv.org/abs/1801.00690
 [Todorov et al., 2012]: https://homes.cs.washington.edu/~todorov/papers/TodorovIROS12.pdf
+[Kapturowski et al., 2018]: https://openreview.net/forum?id=r1lyTjAqYX
+[Gulcehre et al., 2021]: https://arxiv.org/abs/2103.09575
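
The bsuite section added in this diff describes making deterministic environments stochastic by replacing the agent's action with a uniformly sampled one at a fixed probability. The sketch below illustrates that idea only; it is not the code used to generate the released datasets. The function name `perturb_action`, its parameters, and the constant `NOISE_PROBABILITIES` are hypothetical, and the sketch assumes a discrete action space indexed from 0.

```python
import random
from typing import Optional

# Noise probabilities listed in the bsuite section of the README.
NOISE_PROBABILITIES = (0.0, 0.1, 0.2, 0.3, 0.4, 0.5)


def perturb_action(action: int, num_actions: int, noise_prob: float,
                   rng: Optional[random.Random] = None) -> int:
    """Hypothetical helper, not part of bsuite or Acme.

    With probability `noise_prob`, returns an action sampled uniformly from
    range(num_actions); otherwise returns `action` unchanged.
    """
    if not 0.0 <= noise_prob <= 1.0:
        raise ValueError("noise_prob must be in [0, 1]")
    if num_actions <= 0:
        raise ValueError("num_actions must be positive")
    rng = rng if rng is not None else random.Random()
    if rng.random() < noise_prob:
        # The replacement is uniform over all actions, so it may coincide
        # with the action the agent originally chose.
        return rng.randrange(num_actions)
    return action
```

With `noise_prob=0.0` the helper always returns the agent's action, which matches the README's statement that a probability of 0 corresponds to the original environment. Passing a seeded `random.Random` instance makes the perturbation reproducible.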