zuxinrui/rl_games
Basic RL Algorithms Implementations

Watch the video

How to run configs:

PyTorch

  • python runner.py --train --file rl_games/configs/smac/3m_torch.yaml
  • python runner.py --play --file rl_games/configs/smac/3m_torch.yaml --checkpoint 'nn/3m_cnn'

TensorFlow

  • python runner.py --tf --train --file rl_games/configs/smac/3m_torch.yaml
  • python runner.py --tf --play --file rl_games/configs/smac/3m_torch.yaml --checkpoint 'nn/3m_cnn'
  • tensorboard --logdir runs
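
The commands above can also be assembled programmatically, for example when launching several runs from a script. The following is a minimal sketch that uses only the flags shown in this README (`--train`, `--play`, `--tf`, `--file`, `--checkpoint`); the helper name `build_runner_command` is hypothetical and is not part of the repository.

```python
import shlex


def build_runner_command(config_path, mode="train", checkpoint=None, use_tf=False):
    """Assemble a runner.py command line (hypothetical helper, not in the repo).

    Only flags that appear in this README are used.
    """
    if mode not in ("train", "play"):
        raise ValueError("mode must be 'train' or 'play'")
    cmd = ["python", "runner.py"]
    if use_tf:
        cmd.append("--tf")  # select the TensorFlow implementation
    cmd += [f"--{mode}", "--file", config_path]
    if checkpoint is not None:
        cmd += ["--checkpoint", checkpoint]
    return cmd


config = "rl_games/configs/smac/3m_torch.yaml"
train_cmd = build_runner_command(config)
play_cmd = build_runner_command(config, mode="play", checkpoint="nn/3m_cnn")

# Print the commands in a copy-pasteable shell form.
print(shlex.join(train_cmd))
print(shlex.join(play_cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root, which avoids shell quoting issues.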

Results on some environments:

  • 2m_vs_1z: about 2 minutes to achieve a 100% win rate (WR)
  • corridor: about 2 hours to reach 95+% WR
  • MMM2: 4 hours to reach 90+% WR
  • 6h_vs_8z: 82% WR after 8 hours of training
  • 5m_vs_6m: 72% WR after 8 hours of training

Plots:

FPS in these plots is calculated on a per-environment basis, except for MMM2, where it was scaled by the number of agents (10). To get win rate as a function of environment steps, as in the plots of the QMIX, MAVEN, QTRAN, and Deep Coordination Graphs (https://arxiv.org/pdf/1910.00091.pdf) papers, the FPS numbers under the horizontal axis should be divided by the number of agents in the player's team.

  • 2m_vs_1z
  • 3s5z_vs_3s6z
  • 3s_vs_5z
  • corridor
  • 5m_vs_6m
  • MMM2
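
The axis conversion described under "Plots" is a single division. A minimal sketch, assuming a hypothetical axis reading of 1,000,000 and a team size of 2 for 2m_vs_1z (read off the map name, not stated in this README); the function name `per_agent_steps` is illustrative only.

```python
def per_agent_steps(axis_value, n_agents):
    """Divide a horizontal-axis value from the plots by the team size."""
    if n_agents <= 0:
        raise ValueError("n_agents must be positive")
    return axis_value / n_agents


# Hypothetical reading from the 2m_vs_1z plot; team size assumed to be 2.
axis_reading = 1_000_000
comparable_steps = per_agent_steps(axis_reading, n_agents=2)
print(comparable_steps)
```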

Link to the continuous results

Currently Implemented:

  • DQN
  • Double DQN
  • Dueling DQN
  • Noisy DQN
  • N-Step DQN
  • Categorical
  • Rainbow DQN
  • A2C
  • PPO

TensorFlow implementations of DQN for Atari.

  • Double dueling DQN vs DQN with the same parameters

About 90 minutes to learn with this setup.

  • Different DQN Configurations tests

Light grey is noisy 1-step DDDQN. Noisy 3-step DDDQN was even faster. The best network (configuration 5) needs about 20 minutes to learn on an NVIDIA 1080. Currently the best setup for Pong is a noisy 3-step double dueling network. The different experiments can be found in pong_runs.py. It takes fewer than 200k frames to reach a score > 18. DQN has more optimistic Q-value estimates.
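
The observation that DQN produces more optimistic Q-values is consistent with the standard difference between the DQN and Double DQN bootstrap targets. The sketch below illustrates that textbook difference in plain Python; it is not taken from this repository's code, and the function names and sample Q-values are made up for illustration.

```python
def dqn_target(reward, gamma, q_target_next, done=False):
    """Standard DQN target: the target network both picks and scores the action."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)


def double_dqn_target(reward, gamma, q_online_next, q_target_next, done=False):
    """Double DQN target: the online network picks the action,
    the target network scores it."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]


# Made-up Q-values for the next state (three actions).
q_online_next = [1.0, 2.5, 2.0]
q_target_next = [1.2, 1.8, 3.0]

y_dqn = dqn_target(1.0, 0.99, q_target_next)
y_ddqn = double_dqn_target(1.0, 0.99, q_online_next, q_target_next)

# The Double DQN target can never exceed the DQN target for the same inputs,
# because the target network's value at any action is at most its maximum.
print(y_dqn, y_ddqn, y_ddqn <= y_dqn)
```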

Other Games Results

These results are not stable; they show only the best games. For good average results, you need to train the network for more than 10 million steps. Some games need 50 million steps.

  • Two-step noisy double dueling DQN after 5 million frames:

Watch the video

  • Random lucky game in Space Invaders after less than one hour of learning:

Watch the video

A2C and PPO Results

  • More than 2 hours for Pong to reach a score of 20 with one actor playing.
  • 8 hours for Super Mario level 1

Watch the video

  • PPO with LSTM layers

Watch the video
