This repository contains the implementation of Clipped Action Policy Gradient (CAPG) with PPO and TRPO.
- Chainer v4.1.0
- ChainerRL latest master
- OpenAI Gym v0.9.4 with MuJoCo envs
Use requirements.txt to install dependencies.
pip install -r requirements.txt
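After installing, it can help to confirm that the pinned versions were actually picked up. The following is a minimal sketch, not part of this repository: the helper name `check_versions` is hypothetical, the distribution names `chainer` and `gym` are assumed to match what pip installs, and ChainerRL and MuJoCo are not checked.

```python
from importlib import metadata

# Version pins copied from the dependency list above.
# Assumption: the pip distribution names are "chainer" and "gym".
EXPECTED = {"chainer": "4.1.0", "gym": "0.9.4"}


def check_versions(expected=EXPECTED):
    """Return {name: (wanted, found, matches)} for each pinned package.

    `found` is None when the package is not installed.
    """
    report = {}
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        report[name] = (wanted, found, found == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, found, ok) in check_versions().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: wanted {wanted}, found {found} [{status}]")
```

A mismatch does not necessarily mean the code will fail, but it is the first thing to rule out when results differ from the reported ones.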
# Run PPO with PG and CAPG for 1M steps
python --env Humanoid-v1
python --env Humanoid-v1 --use-clipped-gaussian
# Run TRPO with PG and CAPG for 10M steps
python --env Humanoid-v1 --steps 10000000
python --env Humanoid-v1 --steps 10000000 --use-clipped-gaussian
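The `--use-clipped-gaussian` flag presumably switches the policy's log-probability to the clipped-action form that CAPG is built on. As a rough illustration of that idea, and not the repository's Chainer code, the sketch below computes the log-probability of a clipped action for a one-dimensional Gaussian policy: inside the bounds it is the usual Gaussian log-density, and at or beyond a bound it is the log of the probability mass that clipping moves onto that bound. All function names here are hypothetical.

```python
import math


def normal_log_pdf(x, mean, std):
    """Log-density of N(mean, std^2) at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def normal_log_cdf(x, mean, std):
    """log P(X <= x) for X ~ N(mean, std^2), via erfc."""
    return math.log(0.5 * math.erfc(-(x - mean) / (std * math.sqrt(2.0))))


def normal_log_sf(x, mean, std):
    """log P(X >= x) for X ~ N(mean, std^2), via erfc."""
    return math.log(0.5 * math.erfc((x - mean) / (std * math.sqrt(2.0))))


def clipped_gaussian_log_prob(u, mean, std, low, high):
    """Log-probability of the action obtained by clipping u to [low, high].

    u is the unclipped sample from the Gaussian policy.
    """
    if u <= low:
        # Every sample at or below `low` is executed as `low`.
        return normal_log_cdf(low, mean, std)
    if u >= high:
        # Every sample at or above `high` is executed as `high`.
        return normal_log_sf(high, mean, std)
    return normal_log_pdf(u, mean, std)
```

Differentiating this quantity with respect to the policy parameters, instead of the plain Gaussian log-density, is what distinguishes the clipped estimator from the ordinary policy gradient (PG) in this sketch. A multi-dimensional policy with independent action dimensions would sum the per-dimension terms.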
The figure below shows average returns of training episodes of TRPO with PG and CAPG, both of which are trained for 10M timesteps on Humanoid-v1. See the paper for more results.
author = {Fujita, Yasuhiro and Maeda, Shin-ichi},
booktitle = {ICML},
title = {{Clipped Action Policy Gradient}},
year = {2018}