On The Hardness of Reinforcement Learning With Value-Function Approximation
Building Reproducible, Reusable and Robust Deep RL Systems - Joelle Pineau
Reinforcement Learning on Hundreds of Thousands of Cores - Henrique Ponde de Oliveira Pinto - OpenAI - scaling the OpenAI DOTA agents
DOTA
- co-ordination
- imperfect info
180 years of games per day
~100,000 CPU cores playing the game
~100 GPUs doing the learning
These need to be connected via a controller (Redis)
- this holds the configs & parameters
- single source of truth
- can easily be backed up to disk
Use Lua scripts inside Redis
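A minimal sketch of the idea (assuming redis-py; the key names, version-counter scheme, and polling helper are my own illustration, not OpenAI's code): a Lua script makes the parameter write and its version bump a single atomic step inside Redis.

```python
import redis

r = redis.Redis(host="localhost", port=6379)

# Lua executes atomically inside Redis: store the new parameter blob and
# bump the version counter in one step, returning the new version.
# (Key names 'params:latest' / 'params:version' are hypothetical.)
PUBLISH_PARAMS = r.register_script("""
redis.call('SET', KEYS[1], ARGV[1])
return redis.call('INCR', KEYS[2])
""")

def push_params(param_bytes: bytes) -> int:
    """Publish a new parameter snapshot; returns its version number."""
    return PUBLISH_PARAMS(keys=["params:latest", "params:version"],
                          args=[param_bytes])

def fetch_params(last_seen: int):
    """Worker-side poll: return (version, params) if newer, else None."""
    version = int(r.get("params:version") or 0)
    if version > last_seen:
        return version, r.get("params:latest")
    return None
```

With the controller as the single source of truth, rollout workers only ever poll it; no worker-to-worker coordination is needed.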
Tutorial: Introduction to Reinforcement Learning with Function Approximation
1:21:30
What causes instability?
Not learning / sampling
- even DP (exact expected updates, no sampling) diverges with function approx
Not exploration
- even policy evaluation of a fixed policy (no exploration needed) can diverge
Not non-linear functions
- even linear function approximation can diverge
Risk of divergence occurs when combining:
- function approximation
- bootstrapping
- off-policy learning
Any two are OK; all three together can diverge (the "deadly triad"; see the sketch below)
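A minimal sketch of the triad in action (the classic one-parameter "w -> 2w" counterexample; the specific numbers are my own illustration): linear function approximation + bootstrapping + off-policy updates on a single transition make the weight blow up.

```python
# Two states share a single weight w via features 1 and 2, so
# v(s1) = w and v(s2) = 2w. An off-policy behaviour distribution keeps
# sampling only the transition s1 -> s2 (reward 0); semi-gradient TD(0)
# then diverges.
gamma = 0.99  # discount factor
alpha = 0.1   # step size
w = 1.0

for _ in range(50):
    # Semi-gradient TD(0): w += alpha * (r + gamma*v(s2) - v(s1)) * grad_w v(s1)
    td_error = 0.0 + gamma * (2.0 * w) - (1.0 * w)
    w += alpha * td_error * 1.0  # feature of s1 is 1

print(w)  # each update multiplies w by 1 + alpha*(2*gamma - 1) > 1, so w explodes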
Can we remove bootstrapping?
- bootstrapping is key to computational/data efficiency
- but it introduces bias (the target depends on the current estimate; see the sketch below)
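For contrast, a minimal sketch (my own notation) of the two targets: removing bootstrapping means waiting for the full Monte Carlo return.

```python
gamma = 0.99

def td_target(reward, v_next):
    # Bootstrapped one-step target: available immediately and low variance,
    # but biased while the estimate v_next is wrong.
    return reward + gamma * v_next

def mc_target(rewards):
    # Monte Carlo return: unbiased, but needs the whole episode
    # and has higher variance.
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```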
1:28:25