PG Agents: Policy Gradient Algorithms with TensorFlow

The idea behind pg_agents is to provide an easy-to-understand Python package containing state-of-the-art policy gradient algorithms.

Implemented algorithms

VPG: Vanilla Policy Gradient. Also known as REINFORCE.
TNPG: Truncated Natural Policy Gradient. Reformulates the batch RL problem as a constrained optimization problem and approximately solves it with a truncated conjugate gradient step.
TRPO: Trust Region Policy Optimization. Extension of TNPG that adds a line search to enforce the KL-divergence constraint, making updates more robust.
GAE: Generalized Advantage Estimator. Method to estimate the advantage function from experience; it helps reduce the variance of the gradient estimator.
PPO: Proximal Policy Optimization. Simple but efficient extension of VPG that optimizes a clipped surrogate objective.
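As a rough illustration of the VPG (REINFORCE) estimator, the sketch below computes discounted rewards-to-go and the score-function gradient, sum_t G_t * grad log pi(a_t | s_t), for a tabular softmax policy. It uses NumPy rather than TensorFlow. The function names (`rewards_to_go`, `reinforce_gradient`) and the tabular parameterisation are assumptions for illustration, not the pg_agents API.

```python
import numpy as np

# Hypothetical helpers for illustration; not part of pg_agents.

def rewards_to_go(rewards, gamma=0.99):
    """Discounted return G_t from each timestep to the end of the episode."""
    rtg = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def softmax(logits):
    z = logits - logits.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def reinforce_gradient(theta, states, actions, rewards, gamma=0.99):
    """REINFORCE estimate of grad J for one episode.

    Assumes a tabular policy: theta has shape (n_states, n_actions)
    and pi(a | s) = softmax(theta[s])[a].
    """
    grad = np.zeros_like(theta, dtype=float)
    returns = rewards_to_go(rewards, gamma)
    for s, a, g in zip(states, actions, returns):
        probs = softmax(theta[s])
        # Gradient of log softmax w.r.t. theta[s] is onehot(a) - probs.
        dlogp = -probs
        dlogp[a] += 1.0
        grad[s] += g * dlogp
    return grad
```

For example, `reinforce_gradient(np.zeros((1, 2)), [0], [0], [1.0])` returns `[[0.5, -0.5]]`: the estimate pushes probability toward the action that earned a positive return.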
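The core of TNPG can be sketched as approximately solving F x = g (F the Fisher information matrix of the policy, g the policy gradient) with a small, fixed number of conjugate gradient iterations, then rescaling x so that the quadratic KL approximation 0.5 * s^T F s equals a bound max_kl. This NumPy sketch assumes a user-supplied Fisher-vector product `fvp`; the function names and the `max_kl` default are hypothetical, not taken from pg_agents.

```python
import numpy as np

# Hypothetical sketch of a truncated natural gradient step; not pg_agents code.

def conjugate_gradient(fvp, g, iters=10, tol=1e-10):
    """Approximately solve F x = g using only products F @ v via fvp(v)."""
    x = np.zeros_like(g, dtype=float)
    r = np.array(g, dtype=float)  # residual g - F x (x starts at 0)
    p = r.copy()                  # search direction
    rr = r @ r
    for _ in range(iters):        # truncated: at most `iters` iterations
        Ap = fvp(p)
        alpha = rr / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        if rr_new < tol:
            break
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x


def natural_gradient_step(fvp, g, max_kl=0.01, iters=10):
    """Scale the CG direction so that 0.5 * step^T F step == max_kl."""
    x = conjugate_gradient(fvp, g, iters)
    step_size = np.sqrt(2.0 * max_kl / (x @ fvp(x)))
    return step_size * x
```

Passing only a Fisher-vector product, rather than the full matrix, is what keeps this tractable for neural-network policies: F never has to be formed explicitly.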
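TRPO can be viewed as taking a natural-gradient step as a proposal and backtracking along it until the KL constraint holds and the surrogate objective actually improves. The sketch below assumes two callables evaluated at a parameter vector, `surrogate` (to be maximized) and `kl` (divergence from the old policy); these names and the backtracking constants are illustrative assumptions, not pg_agents code.

```python
import numpy as np

# Hypothetical backtracking line search in the style of TRPO; not pg_agents code.

def backtracking_line_search(theta, full_step, surrogate, kl, max_kl=0.01,
                             backtrack_ratio=0.5, max_backtracks=10):
    """Shrink full_step until the KL bound holds and the surrogate improves."""
    base = surrogate(theta)
    for i in range(max_backtracks):
        frac = backtrack_ratio ** i
        candidate = theta + frac * full_step
        if kl(candidate) <= max_kl and surrogate(candidate) > base:
            return candidate
    # No acceptable step found: keep the old parameters.
    return theta
```

Rejecting steps that violate the constraint or fail to improve the surrogate is what guards against the occasional destructive update a raw natural-gradient step can produce.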
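A minimal NumPy sketch of GAE for a single episode, assuming a critic's value estimates `values` of length `len(rewards) + 1` (the last entry is the bootstrap value, 0 for a terminal state). It accumulates the TD residuals delta_t = r_t + gamma * V(s_{t+1}) - V(s_t) with weights (gamma * lam)^k. The function name and defaults are illustrative, not taken from pg_agents.

```python
import numpy as np

# Hypothetical GAE helper for illustration; not part of pg_agents.

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimates for one episode."""
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    # One-step TD residuals delta_t.
    deltas = rewards + gamma * values[1:] - values[:-1]
    adv = np.zeros_like(deltas)
    running = 0.0
    for t in reversed(range(len(deltas))):
        # A_t = delta_t + gamma * lam * A_{t+1}
        running = deltas[t] + gamma * lam * running
        adv[t] = running
    return adv
```

With `lam=1.0` this reduces to the (bootstrapped) discounted return minus V(s_t), which is unbiased but high-variance; with `lam=0.0` it is the one-step TD residual, which has lower variance but more bias.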
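PPO's clipped surrogate objective fits in a few lines. This NumPy sketch assumes per-sample log-probabilities of the taken actions under the new and old policies plus precomputed advantages (for instance from GAE); `clip_eps=0.2` is a common choice, not necessarily the package's default, and the function name is hypothetical.

```python
import numpy as np

# Hypothetical PPO objective for illustration; not part of pg_agents.

def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A), r = pi_new / pi_old."""
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Elementwise minimum gives a pessimistic bound on the improvement.
    return np.mean(np.minimum(unclipped, clipped))
```

Taking the elementwise minimum means clipping can only remove the incentive to move the probability ratio outside [1 - eps, 1 + eps]; it never makes the objective more optimistic. This is what lets PPO safely take several minibatch gradient steps on the same batch of data.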