Adding DeepMind Lab and bsuite dataset descriptions to README.
Caglar Gulcehre authored and derpson committed Mar 26, 2021
1 parent ad49bf3 commit 0a46c8e
53 changes: 53 additions & 0 deletions rl_unplugged/README.md
transition include stacks of four frames to be able to do frame-stacking with
our baselines. We release datasets for 46 Atari games. For details on how the
dataset was generated, please refer to the paper.
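
As a rough illustration (not the pipeline used to build the dataset), frame-stacking over such observations can be sketched as below; the 84x84 grayscale shape is an assumption for illustration only:

```python
from collections import deque

import numpy as np


def stack_frames(frames, k=4):
    """Stack the last `k` grayscale frames along the channel axis.

    `frames` is an iterable of (84, 84) observations, oldest first.
    Returns an (84, 84, k) array, repeating the oldest frame when fewer
    than `k` frames are available (e.g. at episode start).
    """
    buf = deque(maxlen=k)
    for f in frames:
        buf.append(f)
    while len(buf) < k:
        buf.appendleft(buf[0])
    return np.stack(buf, axis=-1)


obs = [np.full((84, 84), i, dtype=np.uint8) for i in range(2)]
stacked = stack_frames(obs)  # shape (84, 84, 4)
```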

Atari is a standard RL benchmark. We recommend trying offline RL methods on
Atari if you are interested in comparing your approach to other
state-of-the-art offline RL methods with discrete actions.

## DeepMind Locomotion Dataset

These tasks are made up of the corridor locomotion tasks involving the CMU
Locomotion tasks feature the combination of challenging high-DoF continuous
control along with perception from rich egocentric observations. For details on
how the dataset was generated, please refer to the paper.

We recommend trying offline RL methods on the DeepMind Locomotion dataset if
you are interested in a very challenging offline RL dataset with a continuous
action space.

## DeepMind Control Suite Dataset

DeepMind Control Suite [Tassa et al., 2018] is a set of control tasks
environments Manipulator insert ball and Manipulator insert peg we use V-MPO
We release datasets for 9 control suite tasks. For details on how the dataset
was generated, please refer to the paper.

DeepMind Control Suite is a traditional continuous-action RL benchmark. In
particular, we recommend testing your approach on the DeepMind Control Suite
if you are interested in comparing against other state-of-the-art offline RL
methods.

## Realworld RL Dataset

Examples in the dataset represent SARS transitions stored when running a
We release 8 datasets in total -- with no combined challenge and easy combined
challenge on the cartpole, walker, quadruped, and humanoid tasks. For details on
how the dataset was generated, please refer to the paper.
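
Concretely, a SARS transition pairs a state and action with the resulting reward and successor state. A minimal sketch of such a record follows; the field names are illustrative assumptions, not the actual dataset schema:

```python
from typing import NamedTuple

import numpy as np


class SARSTransition(NamedTuple):
    """One (state, action, reward, next state) example, plus a done flag."""
    observation: np.ndarray
    action: np.ndarray
    reward: float
    next_observation: np.ndarray
    terminal: bool


t = SARSTransition(
    observation=np.zeros(3),
    action=np.array([0.5]),
    reward=1.0,
    next_observation=np.ones(3),
    terminal=False,
)
```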

## DeepMind Lab Dataset

The DeepMind Lab dataset has several levels from the challenging, partially
observable [DeepMind Lab suite](https://github.com/deepmind/lab). The dataset
was collected by training distributed R2D2 agents [Kapturowski et al., 2018]
from scratch on individual tasks. We recorded the experience across all
actors during entire training runs a few times for each task. The details of
the dataset generation process are described in [Gulcehre et al., 2021].

We release datasets for five different DeepMind Lab levels: `seekavoid_arena_01`,
`explore_rewards_few`, `explore_rewards_many`, `rooms_watermaze`,
`rooms_select_nonmatching_object`. For the `seekavoid_arena_01` level, we also
release snapshot datasets generated from a trained R2D2 snapshot with several
values of epsilon for the epsilon-greedy algorithm used when evaluating the
agent in the environment.
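
For intuition, epsilon-greedy action selection over a snapshot's Q-values can be sketched as follows; this is a toy illustration, not the R2D2 agent itself, and the Q-values are made up:

```python
import numpy as np


def epsilon_greedy(q_values, epsilon, rng):
    """With probability `epsilon`, pick a uniformly random action;
    otherwise pick the greedy (arg-max Q) action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


rng = np.random.default_rng(0)
q = np.array([0.1, 0.9, 0.3])
# epsilon=0.0 is fully greedy; larger epsilon yields more exploratory data.
greedy_action = epsilon_greedy(q, epsilon=0.0, rng=rng)
```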

The DeepMind Lab dataset is fairly large-scale. We recommend trying it if you
are interested in large-scale offline RL models with memory.

## bsuite Dataset

[bsuite](https://github.com/deepmind/bsuite) data was collected by training DQN
agents with the default settings in [Acme](https://github.com/deepmind/acme)
from scratch on each of the following three tasks: cartpole, catch, and
mountain_car.

We converted the originally deterministic environments into stochastic ones by
randomly replacing the agent's action with a uniformly sampled action with a
probability of {0, 0.1, 0.2, 0.3, 0.4, 0.5}. A probability of 0 corresponds to
the original environment. The details of the dataset generation process are
described in [Gulcehre et al., 2021].
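
The action-replacement scheme above can be sketched as a thin environment wrapper; the `env` object with a `step` method and the discrete action count are assumptions for illustration:

```python
import random


class RandomActionWrapper:
    """Replaces the agent's action with a uniformly sampled one with
    probability `p`; p=0 recovers the original deterministic environment."""

    def __init__(self, env, num_actions, p):
        self.env = env
        self.num_actions = num_actions
        self.p = p

    def step(self, action):
        if random.random() < self.p:
            # Override the agent's choice with a uniform random action.
            action = random.randrange(self.num_actions)
        return self.env.step(action)
```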

bsuite datasets are fairly lightweight, and running experiments on them does
not require much compute. We recommend trying bsuite if you are interested in
small-scale, easy-to-run offline RL datasets generated by stochastic
environments where the stochasticity of the environment is easy to control.

## Running the code

### Installation
engines such as <a href="https://g.co/datasetsearch">Google Dataset Search</a>.
[Song et al., 2020]: https://arxiv.org/abs/1909.12238
[Tassa et al., 2018]: https://arxiv.org/abs/1801.00690
[Todorov et al., 2012]: https://homes.cs.washington.edu/~todorov/papers/TodorovIROS12.pdf
[Kapturowski et al., 2018]: https://openreview.net/forum?id=r1lyTjAqYX
[Gulcehre et al., 2021]: https://arxiv.org/abs/2103.09575
