Adding DeepMind Lab and bsuite dataset descriptions to README.
Caglar Gulcehre authored and derpson committed Mar 26, 2021
1 parent ad49bf3 commit 0a46c8e
53 changes: 53 additions & 0 deletions rl_unplugged/README.md
transition include stacks of four frames to be able to do frame-stacking with
our baselines. We release datasets for 46 Atari games. For details on how the
dataset was generated, please refer to the paper.
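
As a rough illustration (not the pipeline used to build the dataset), frame-stacking over such observations can be sketched as below; the 84x84 grayscale shape is an assumption for illustration only:

```python
from collections import deque

import numpy as np


def stack_frames(frames, k=4):
    """Stack the last `k` grayscale frames along the channel axis.

    `frames` is an iterable of (84, 84) observations, oldest first.
    Returns an (84, 84, k) array, repeating the oldest frame when fewer
    than `k` frames are available (e.g. at episode start).
    """
    buf = deque(maxlen=k)
    for f in frames:
        buf.append(f)
    while len(buf) < k:
        buf.appendleft(buf[0])
    return np.stack(buf, axis=-1)


obs = [np.full((84, 84), i, dtype=np.uint8) for i in range(2)]
stacked = stack_frames(obs)  # shape (84, 84, 4)
```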

Atari is a standard RL benchmark. We recommend trying offline RL methods on
Atari if you are interested in comparing your approach to other
state-of-the-art offline RL methods with discrete actions.

## DeepMind Locomotion Dataset

These tasks are made up of the corridor locomotion tasks involving the CMU
Locomotion tasks feature the combination of challenging high-DoF continuous
control along with perception from rich egocentric observations. For details on
how the dataset was generated, please refer to the paper.

We recommend trying offline RL methods on the DeepMind Locomotion dataset if
you are interested in a very challenging offline RL dataset with a continuous
action space.

## DeepMind Control Suite Dataset

DeepMind Control Suite [Tassa et al., 2018] is a set of control tasks
environments Manipulator insert ball and Manipulator insert peg we use V-MPO
We release datasets for 9 control suite tasks. For details on how the dataset
was generated, please refer to the paper.

DeepMind Control Suite is a traditional continuous-action RL benchmark. In
particular, we recommend testing your approach on the DeepMind Control Suite
if you are interested in comparing against other state-of-the-art offline RL
methods.

## Realworld RL Dataset

Examples in the dataset represent SARS transitions stored when running a
We release 8 datasets in total -- with no combined challenge and easy combined
challenge on the cartpole, walker, quadruped, and humanoid tasks. For details on
how the dataset was generated, please refer to the paper.
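
Concretely, a SARS transition pairs a state and action with the resulting reward and successor state. A minimal sketch of such a record follows; the field names are illustrative assumptions, not the actual dataset schema:

```python
from typing import NamedTuple

import numpy as np


class SARSTransition(NamedTuple):
    """One (state, action, reward, next state) example, plus a done flag."""
    observation: np.ndarray
    action: np.ndarray
    reward: float
    next_observation: np.ndarray
    terminal: bool


t = SARSTransition(
    observation=np.zeros(3),
    action=np.array([0.5]),
    reward=1.0,
    next_observation=np.ones(3),
    terminal=False,
)
```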

## DeepMind Lab Dataset

The DeepMind Lab dataset has several levels from the challenging, partially
observable [DeepMind Lab suite](https://github.com/deepmind/lab). The dataset
was collected by training distributed R2D2 agents [Kapturowski et al., 2018]
from scratch on individual tasks. We recorded the experience across all
actors during entire training runs a few times for each task. The details of
the dataset generation process are described in [Gulcehre et al., 2021].

We release datasets for five different DeepMind Lab levels: `seekavoid_arena_01`,
`explore_rewards_few`, `explore_rewards_many`, `rooms_watermaze`,
`rooms_select_nonmatching_object`. For the `seekavoid_arena_01` level, we also
release snapshot datasets generated from a trained R2D2 snapshot with several
values of epsilon for the epsilon-greedy algorithm used when evaluating the
agent in the environment.
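
For intuition, epsilon-greedy action selection over a snapshot's Q-values can be sketched as follows; this is a toy illustration, not the R2D2 agent itself, and the Q-values are made up:

```python
import numpy as np


def epsilon_greedy(q_values, epsilon, rng):
    """With probability `epsilon`, pick a uniformly random action;
    otherwise pick the greedy (arg-max Q) action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


rng = np.random.default_rng(0)
q = np.array([0.1, 0.9, 0.3])
# epsilon=0.0 is fully greedy; larger epsilon yields more exploratory data.
greedy_action = epsilon_greedy(q, epsilon=0.0, rng=rng)
```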

The DeepMind Lab dataset is fairly large-scale. We recommend trying it if you
are interested in large-scale offline RL models with memory.

## bsuite Dataset

[bsuite](https://github.com/deepmind/bsuite) data was collected by training DQN
agents with the default settings in [Acme](https://github.com/deepmind/acme)
from scratch on each of the following three tasks: cartpole, catch, and
mountain_car.

We converted the originally deterministic environments into stochastic ones by
randomly replacing the agent's action with a uniformly sampled action with a
probability of {0, 0.1, 0.2, 0.3, 0.4, 0.5}. A probability of 0 corresponds to
the original environment. The details of the dataset generation process are
described in [Gulcehre et al., 2021].
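
The action-replacement scheme above can be sketched as a thin environment wrapper; the `env` object with a `step` method and the discrete action count are assumptions for illustration:

```python
import random


class RandomActionWrapper:
    """Replaces the agent's action with a uniformly sampled one with
    probability `p`; p=0 recovers the original deterministic environment."""

    def __init__(self, env, num_actions, p):
        self.env = env
        self.num_actions = num_actions
        self.p = p

    def step(self, action):
        if random.random() < self.p:
            # Override the agent's choice with a uniform random action.
            action = random.randrange(self.num_actions)
        return self.env.step(action)
```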

bsuite datasets are fairly lightweight, and running experiments on them does
not require much compute. We recommend trying bsuite if you are interested in
small-scale, easy-to-run offline RL datasets generated by stochastic
environments where the stochasticity of the environment is easy to control.

## Running the code

### Installation
engines such as <a href="https://g.co/datasetsearch">Google Dataset Search</a>.
[Song et al., 2020]: https://arxiv.org/abs/1909.12238
[Tassa et al., 2018]: https://arxiv.org/abs/1801.00690
[Todorov et al., 2012]: https://homes.cs.washington.edu/~todorov/papers/TodorovIROS12.pdf
[Kapturowski et al., 2018]: https://openreview.net/forum?id=r1lyTjAqYX
[Gulcehre et al., 2021]: https://arxiv.org/abs/2103.09575
