small updates for dm (Denys88#235)
* a lot of small changes

* two hot almost works

* added deepmind

* readme

* small improvements

* fixed merge

* fixed stand

* fixed

---------

Co-authored-by: Noname <nepalimsa@>
Denys88 authored Apr 23, 2023
1 parent 55aa8a0 commit c0caa2e
Showing 2 changed files with 3 additions and 1 deletion.
README.md: 2 additions & 0 deletions
@@ -292,6 +292,8 @@ Additional environment supported properties and functions
* Added Deepmind Control PPO benchmark.
* Added a few more experimental ways to train value prediction (OneHot and TwoHot encoding with a cross-entropy loss instead of L2); see the sketch after this list.
* The new methods didn't show better results yet, so they cannot be turned on from the yaml files. Once we find an env which trains better with them, the option will be added to the config.
* Added a shaped reward graph to TensorBoard.
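Neither encoding is enabled anywhere yet, but the idea behind them is simple: represent the scalar value target as a distribution over a fixed grid of bins and train the critic head with a cross-entropy loss instead of regressing the scalar with L2. Below is a minimal PyTorch sketch of the two-hot variant; the bin grid, shapes, and all names are illustrative assumptions, not the actual rl_games implementation.

```python
# Hypothetical sketch of two-hot value targets with a cross-entropy loss.
# Not rl_games code: the bin grid, shapes, and names are assumptions.
import torch
import torch.nn.functional as F

def two_hot(values: torch.Tensor, bins: torch.Tensor) -> torch.Tensor:
    """Spread each scalar target over its two neighboring bins."""
    values = values.clamp(bins[0], bins[-1])
    idx = torch.searchsorted(bins, values).clamp(1, len(bins) - 1)
    lo, hi = bins[idx - 1], bins[idx]
    w_hi = (values - lo) / (hi - lo)            # linear weight toward the upper bin
    target = torch.zeros(len(values), len(bins))
    target.scatter_(1, (idx - 1).unsqueeze(1), (1.0 - w_hi).unsqueeze(1))
    target.scatter_(1, idx.unsqueeze(1), w_hi.unsqueeze(1))
    return target

bins = torch.linspace(-10.0, 10.0, 51)          # assumed bin grid
returns = torch.tensor([0.3, 2.7, -4.2])        # scalar value targets
logits = torch.randn(3, 51)                     # critic head: one logit per bin
loss = F.cross_entropy(logits, two_hot(returns, bins))  # soft-label CE
```

The one-hot variant is the degenerate case that puts all the mass on the nearest bin, turning the same loss into plain classification.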


1.6.0

docs/DEEPMIND_ENVPOOL.md: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ poetry run python runner.py --train --file rl_games/configs/dm_control/humanoid_
## Results:

* No tuning. I just ran it on a couple of envs.
* I used 4000 epochs which is ~32M steps for almost all envs except HumanoidRun. But a few millions of stepsa was enough for the most of the envs.
* I used 4000 epochs, which is ~32M steps, for almost all envs except HumanoidRun. But a few million steps were enough for most of the envs.
* Deepmind used pretty strange reward and training rules. A simple reward transformation, log(reward + 1), achieves the best scores faster (see the sketch after this list).
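A minimal sketch of that transformation, with the shaped reward also logged to TensorBoard as mentioned in the changelog; the tag and names are illustrative assumptions, not the actual rl_games code:

```python
# Hypothetical sketch of log(reward + 1) shaping; assumes non-negative
# env rewards, as in dm_control tasks. Names are illustrative.
import numpy as np
from torch.utils.tensorboard import SummaryWriter

def shape_reward(reward: np.ndarray) -> np.ndarray:
    # Compress large rewards so the value targets stay in a narrow range.
    return np.log(reward + 1.0)

writer = SummaryWriter()
raw = np.array([0.0, 1.0, 9.0])
shaped = shape_reward(raw)                      # [0.0, ~0.693, ~2.303]
writer.add_scalar("rewards/shaped_mean", shaped.mean(), global_step=0)
```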

| Env | Rewards |
