
Tutorial: Introduction to Reinforcement Learning with Function Approximation - December 2015 - Rich Sutton

On The Hardness of Reinforcement Learning With Value-Function Approximation

Building Reproducible, Reusable and Robust Deep RL Systems - Joelle Pineau

Reinforcement Learning on Hundreds of Thousands of Cores - Henrique Ponde de Oliveira Pinto - OpenAI - scaling the OpenAI Dota 2 agents

DOTA

  • co-ordination
  • imperfect info

180 years of games per day

100,000 CPUs playing the game

100 GPUs learning

These need to be connected with a controller (Redis)

  • this holds the configs & parameters
  • single source of truth
  • can easily backup to disk

Use Lua scripts inside Redis
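
A minimal sketch of the pattern, assuming the redis-py client and a local Redis server; the key names and the script itself are illustrative, not from the talk:

```python
import redis

# Illustrative Lua script: atomically bump a version counter and store a
# new parameter blob, so workers never read a half-written update.
LUA_PUBLISH = """
local version = redis.call('INCR', KEYS[1])
redis.call('SET', KEYS[2], ARGV[1])
return version
"""

r = redis.Redis(host="localhost", port=6379)
publish = r.register_script(LUA_PUBLISH)  # loads the script, runs via EVALSHA

# Hypothetical key names: one version counter, one blob of serialized weights.
version = publish(keys=["params:version", "params:blob"],
                  args=[b"...serialized weights..."])
print(f"published parameter version {version}")
```

Because the script body runs atomically inside Redis, the store can act as the single source of truth the bullets above describe.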

Tutorial: Introduction to Reinforcement Learning with Function Approximation

1:21:30

What causes instability?

Not learning / sampling

  • DP diverges (w/ function approx)

Not exploration

  • policy evaluation can diverge

Not non-linear functions

  • linear functions can diverge

Risk of divergence occurs when combining

  1. function approximation
  2. bootstrapping
  3. off policy learning

Any two together are OK - all three combined are not (the "deadly triad")
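
A minimal sketch of the classic two-state counterexample (V(s1) = w, V(s2) = 2w, zero reward) showing how off-policy TD(0) with linear function approximation and bootstrapping can diverge; the constants are illustrative:

```python
# Two states with linear values V(s1) = w and V(s2) = 2w, zero reward.
# Off-policy updating lets us apply the TD(0) update only on the
# s1 -> s2 transition; for gamma > 0.5 the weight grows without bound.
gamma, alpha, w = 0.99, 0.1, 1.0

for step in range(100):
    # TD error on s1 -> s2: target = r + gamma * V(s2), estimate = V(s1)
    td_error = 0.0 + gamma * (2 * w) - w
    w += alpha * td_error * 1.0  # gradient of V(s1) w.r.t. w is 1
    if step % 20 == 0:
        print(f"step {step:3d}  w = {w:.3e}")
# w diverges: function approximation + bootstrapping + off-policy
# updates, all three of the triad at once.
```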

Can we remove bootstrapping?

  • bootstrapping is key to computational/data efficiency
  • but it introduces bias
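
A small sketch contrasting the two kinds of target, with made-up episode data: the Monte Carlo return needs the whole episode but is unbiased, while the TD(0) target is available after one step but inherits any error in the current value estimates:

```python
# Monte Carlo target: full return, unbiased, higher variance.
# TD(0) target: bootstraps off the current estimate V_hat, biased while
# V_hat is wrong, but usable after every single step.
gamma = 0.9
rewards = [1.0, 0.0, 2.0]          # one made-up episode starting in s0
V_hat = {"s0": 0.5, "s1": 0.3}     # current (imperfect) value estimates

# Monte Carlo: wait until the episode ends, use the actual return.
G = sum(gamma**t * r for t, r in enumerate(rewards))

# TD(0): after one step, bootstrap from the estimate of the next state.
td_target = rewards[0] + gamma * V_hat["s1"]

print(f"MC target for s0:    {G:.3f}")          # 1 + 0 + 0.81*2 = 2.62
print(f"TD(0) target for s0: {td_target:.3f}")  # 1 + 0.9*0.3 = 1.27
```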

1:28:25