The idea behind pg_agents is to provide an easy-to-understand Python package containing state-of-the-art policy gradient algorithms.
- VPG: Vanilla Policy Gradient. Also known as REINFORCE.
- TNPG: Truncated Natural Policy Gradient. Reformulates the batch RL problem as a constrained optimization problem.
- TRPO: Trust Region Policy Optimization. Extension of TNPG that adds a backtracking line search on the trust-region constraint to make updates more robust.
- GAE: Generalized Advantage Estimator. Method to estimate the advantage function from experience. Helps to reduce the variance of the gradient estimator.
- PPO: Proximal Policy Optimization. Simple but efficient extension of VPG.
- Roboschool Ant
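
To make the GAE and PPO entries above more concrete, here is a minimal sketch in plain Python. It is not taken from pg_agents: the function names `gae_advantages` and `ppo_clip_objective`, their parameters, and the default values for `gamma`, `lam`, and `clip_eps` are assumptions chosen for illustration. The first function computes GAE advantages for a single trajectory from rewards and value estimates; the second evaluates the clipped surrogate objective that PPO maximizes.

```python
from typing import List, Sequence


def gae_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    last_value: float = 0.0,
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """Generalized Advantage Estimation for one trajectory (illustrative).

    rewards[t] is the reward received at step t, values[t] is the critic's
    estimate V(s_t), and last_value is V(s_T), used to bootstrap after the
    final step (0.0 if the episode terminated).
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")

    advantages = [0.0] * len(rewards)
    running = 0.0            # accumulates the discounted sum of TD residuals
    next_value = last_value  # V(s_{t+1}) while iterating backwards
    for t in reversed(range(len(rewards))):
        # One-step TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value - values[t]
        # A_t = delta_t + (gamma * lam) * A_{t+1}
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def ppo_clip_objective(
    ratios: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Mean clipped surrogate objective, to be maximized (illustrative).

    ratios[t] is pi_new(a_t | s_t) / pi_old(a_t | s_t).
    """
    if len(ratios) != len(advantages) or len(ratios) == 0:
        raise ValueError("ratios and advantages must be non-empty and equal length")

    total = 0.0
    for ratio, adv in zip(ratios, advantages):
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(ratios)
```

With `lam=1.0` the estimator reduces to the discounted return minus the value baseline (low bias, high variance); with `lam=0.0` it reduces to the one-step TD residual (higher bias, lower variance). Intermediate values trade off between the two, which is where the variance reduction mentioned above comes from.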