Control Meets Learning
Introduction
• Traditional planning/control plays a key role in finding globally optimal policies, but some systems…
• Difficult/impossible to model from first principles
• Result in very complex models
• Curse of dimensionality in dynamic programming
• Yield poor performance on the real-world system [Abbeel06]
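A tabular dynamic-programming solver stores one value per cell of a gridded state space, so its memory grows exponentially with the number of state dimensions. The sketch below assumes a uniform grid with 20 points per dimension and one 8-byte float per cell. Both numbers are illustrative assumptions. For example, a rigid-body quadrotor model with 12 states (position, velocity, attitude, angular rate) would already need about 4×10^15 cells.

```python
def dp_table_size(points_per_dim: int, state_dims: int) -> int:
    """Number of cells in a uniform grid over a state_dims-dimensional state space."""
    return points_per_dim ** state_dims


def table_memory_bytes(points_per_dim: int, state_dims: int, bytes_per_value: int = 8) -> int:
    """Memory needed to store one value-function entry (float64 by default) per grid cell."""
    return dp_table_size(points_per_dim, state_dims) * bytes_per_value


if __name__ == "__main__":
    # Assumed resolution: 20 grid points per state dimension.
    for dims in (1, 2, 4, 6, 12):
        cells = dp_table_size(20, dims)
        gib = table_memory_bytes(20, dims) / 2**30
        print(f"{dims:2d} states: {cells:.2e} cells, {gib:.2e} GiB")
```

Each extra state dimension multiplies the table by 20, which is why exact DP becomes impractical well before realistic robot state dimensions.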
http://www.nvidia.com/content/events/geoInt2015/LBrown_DL_Image_ClassificationGEOINT.pdf
How (AA, MIT) 2
Control of a Quadrotor with RL
Recent Progress
• Learning-based algorithms are state-of-the-art in many complex robotic perception and motion-planning tasks
• In many cases these techniques outperform traditional model-based techniques
• Perhaps more significantly, they often require far less domain knowledge/expertise
Examples: Learning Quadrupedal Locomotion over Challenging Terrain; Human-level control in Atari games [Mnih15]; BADGR: An Autonomous Self-Supervised Learning-Based Navigation System; Socially Aware Motion Planning with Deep RL; Super-Human Performance in Gran Turismo Sport Using Deep RL
https://www.youtube.com/watch?v=T0A9voXzhng
https://www.youtube.com/watch?v=9j2a1oAHDL8
https://www.youtube.com/watch?v=UtoZEwrDHj4
https://www.youtube.com/watch?v=Zeyv1bN9v4A
https://www.youtube.com/watch?v=CK1szio7PyA
Review article
Article info
Article history: Received 2 April 2017; Accepted 2 April 2017; Available online 26 April 2017
Keywords: Systems & Control; Research challenges; Critical societal challenges
Abstract
Following in the footsteps of the renowned report “Control in an Information Rich World,” Report of the Panel on “Future Directions in Control, Dynamics, and Systems” chaired by Richard Murray (2002), this paper aims to demonstrate that Systems & Control is at the heart of the Information and Communication Technologies to most application domains. As such, Systems & Control should be acknowledged as a priority by funding agencies and supported at the levels necessary to enable technologies addressing critical societal challenges. A second intention of this paper is to present to the industrials and the young research generation, a global picture of the societal and research challenges where the discipline of Systems & Control will play a key role. Throughout, this paper demonstrates the extremely rich, current and future, cross-fertilization between five critical societal challenges and seven key research and innovation Systems & Control scientific challenges. This paper is authored by members of the IFAC Task Road Map Committee, established following the 19th IFAC World Congress in Cape Town. Other experts who authored specific parts are listed below.
© 2017 Elsevier Ltd. All rights reserved.
Contents
Contributions
1. Introduction
1.1. Systems & Control: a rich history
1.2. A glimpse into future and changing paradigms
1.3. Organization of this paper
2. Advances in Systems & Control in the past fifteen years
3. The essential role of Systems & Control in meeting critical societal challenges
3.1. Transportation
3.2. Energy
3.3. Water
3.4. Healthcare
∗ Corresponding author.
E-mail address: lamnabhi@l2s.centralesupelec.fr (F. Lamnabhi-Lagarrigue).
http://dx.doi.org/10.1016/j.arcontrol.2017.04.001
1367-5788/© 2017 Elsevier Ltd. All rights reserved.
• Safe learning
• Lack of safety guarantees can have catastrophic
consequences (e.g., loss of human life);
• How to guarantee/verify safety?
• Multiagent Learning
• Scaling to larger teams
• Architectural design choices
• Simple models are good for finding globally optimal policies using traditional control
• Refine the policy using learning-based control on more complex simulations
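The two bullets describe a coarse-to-fine workflow. Here is a toy sketch of that idea. Every detail is our assumption: a 10-state corridor, a deterministic low-fidelity model, a higher-fidelity simulator in which moves slip with probability 0.2, and the listed hyperparameters. Value iteration computes the globally optimal Q-table on the simple model. Tabular Q-learning then refines that table on the richer simulator instead of starting from zero. This illustrates warm-starting only; it is not the MFRL algorithm used on the car.

```python
import random

N = 10                 # hypothetical corridor: states 0..N-1, goal (terminal) at N-1
ACTIONS = (-1, +1)
GAMMA = 0.95


def step_simple(s, a):
    """Low-fidelity model: deterministic move, reward 1 on reaching the goal."""
    s2 = min(max(s + a, 0), N - 1)
    return s2, (1.0 if s2 == N - 1 else 0.0)


def value_iteration(iters=200):
    """Globally optimal Q-table for the simple model (traditional DP)."""
    Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
    for _ in range(iters):
        for s in range(N - 1):                      # goal state is terminal
            for a in ACTIONS:
                s2, r = step_simple(s, a)
                future = 0.0 if s2 == N - 1 else max(Q[(s2, b)] for b in ACTIONS)
                Q[(s, a)] = r + GAMMA * future
    return Q


def step_high_fidelity(s, a, rng, slip=0.2):
    """Assumed richer simulator: the commanded move fails with probability `slip`."""
    if rng.random() < slip:
        a = 0
    return step_simple(s, a)


def q_learning(Q, episodes=300, alpha=0.1, eps=0.1, seed=0):
    """Refine a warm-started Q-table with epsilon-greedy tabular Q-learning."""
    rng = random.Random(seed)
    for _ in range(episodes):
        s = 0
        for _ in range(100):                        # step limit per episode
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda b: Q[(s, b)])
            s2, r = step_high_fidelity(s, a, rng)
            target = r if s2 == N - 1 else r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
            if s == N - 1:
                break
    return Q


if __name__ == "__main__":
    Q = q_learning(value_iteration())               # warm start from the DP solution
    policy = [max(ACTIONS, key=lambda b: Q[(s, b)]) for s in range(N - 1)]
    print(policy)
```

The design choice is that the expensive, noisy learner starts from a policy that is already optimal for the simple model, so it only has to correct for the modeling gap.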
[Figure: car control architecture. Blocks: Pure Pursuit Control, MFRL, velocity/throttle, PI Wheel Speed Controller. Signals: v_cmd, r_cmd, δ, ψ_cmd, ω_cmd, ω_measured. Actions are sent over a wireless link.]
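Pure pursuit and PI wheel-speed control in the architecture above are standard building blocks. Below is a minimal Python sketch of both. The wheelbase, PI gains, and the first-order wheel model are illustrative assumptions, not values from the car in the slides. Pure pursuit steers along the circular arc through a lookahead point (curvature 2·sin(α)/l_d). The PI loop drives the measured wheel speed to the commanded one.

```python
import math


def pure_pursuit_steering(x, y, yaw, target_x, target_y, wheelbase):
    """Classic pure pursuit: steering angle for the arc through the lookahead point."""
    dx, dy = target_x - x, target_y - y
    lookahead = math.hypot(dx, dy)
    alpha = math.atan2(dy, dx) - yaw      # bearing of target relative to heading
    curvature = 2.0 * math.sin(alpha) / lookahead   # sin() makes angle wrapping unnecessary
    return math.atan(wheelbase * curvature)


class PIController:
    """Discrete proportional-integral controller."""

    def __init__(self, kp, ki, dt):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0

    def update(self, setpoint, measured):
        error = setpoint - measured
        self.integral += error * self.dt
        return self.kp * error + self.ki * self.integral


def simulate_wheel(omega_cmd, steps=2000, dt=0.001, tau=0.05, gain=10.0):
    """Assumed first-order wheel model: tau * d(omega)/dt = -omega + gain * u."""
    pi = PIController(kp=0.5, ki=5.0, dt=dt)
    omega = 0.0
    for _ in range(steps):
        u = pi.update(omega_cmd, omega)               # omega plays the role of omega_measured
        omega += dt * (-omega + gain * u) / tau       # forward-Euler integration
    return omega


if __name__ == "__main__":
    # Lookahead point 1 m ahead and 0.5 m to the left; assumed 0.25 m wheelbase.
    print(pure_pursuit_steering(0.0, 0.0, 0.0, 1.0, 0.5, wheelbase=0.25))
    print(simulate_wheel(omega_cmd=20.0))
```

The integral term removes steady-state error, so after 2 s of simulated time the wheel speed settles on the command.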
[Figure: x-y trajectories (m) from Start, and time histories over t (s) of Vx (m/s), Vy (m/s), and ψ̇ (rad/s), comparing Prior Optimal Control (Sim), Learned Trajectory (Sim), and Learned Trajectory (Real Car).]
• Training with less data (fewer samples) is possible, but how representative is that data of the real world?
• Design of experiments
• Extensions to MFRL ideas beyond Gaussian process models
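The bullet suggests the MFRL work relies on Gaussian process (GP) models. A GP gives both a prediction and an uncertainty from a handful of samples, and that uncertainty can drive design of experiments. Here is a minimal NumPy sketch. The squared-exponential kernel, its hyperparameters, the noise level, and the toy data are assumptions. Sampling where the posterior variance is largest is only one simple experiment-design heuristic.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.5, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_test, noise=1e-2):
    """Posterior mean and variance of a zero-mean GP, via a Cholesky factorization."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))   # K^{-1} y
    K_s = rbf_kernel(x_train, x_test)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf_kernel(x_test, x_test)) - np.sum(v ** 2, axis=0)
    return mean, var


if __name__ == "__main__":
    # Hypothetical "expensive" real-world measurements of an unknown response.
    x_train = np.array([0.0, 0.5, 2.0])
    y_train = np.sin(x_train)
    x_grid = np.linspace(0.0, 3.0, 61)
    mean, var = gp_posterior(x_train, y_train, x_grid)
    # Simple design-of-experiments heuristic: next sample where the model is least certain.
    next_x = x_grid[np.argmax(var)]
    print(f"next experiment at x = {next_x:.2f}")
```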
Real-world adversarial example
• Safety?
• Certification?
Safe & Robust Learning
• NNs in the feedback loop date back to the 1960s and became popular again in the 1990s
• Previously: NNs for control → Now: deep NNs used for perception, planning, control, and modeling
• Motivated a new field of neural network verification in the computer vision/machine learning communities [Goodfellow14], [Eykholt17]
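The fast gradient sign method (FGSM) of Goodfellow et al. is a common way to generate adversarial examples like those cited here: x_adv = x + ε·sign(∇ₓJ(x, y)). The minimal sketch below applies it to a logistic-regression classifier. That model is our simplifying assumption, chosen because its input gradient has the closed form (p − y)·w. The weights and input are made up for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fgsm(x, y, w, b, eps):
    """FGSM for logistic regression with binary cross-entropy loss.

    The loss gradient w.r.t. the input is (p - y) * w; each input component
    moves by at most eps in the direction that increases the loss.
    """
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)


if __name__ == "__main__":
    w, b = np.array([1.0, -2.0, 0.5]), 0.0      # hypothetical trained weights
    x, y = np.array([0.3, -0.2, 0.1]), 1.0      # correctly classified as class 1
    x_adv = fgsm(x, y, w, b, eps=0.3)
    print("clean logit:", w @ x + b, "adversarial logit:", w @ x_adv + b)
```

With an L∞ budget ε, the logit can shift by up to ε·‖w‖₁. This is why a perturbation that looks small in every input component can still flip the decision.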
[Figure labels: Fast, Exact]
• Analysis issues:
• Efficiently and effectively capturing the input uncertainty set
• Computation time (real time) and scalability
• Synthesis issues:
• Are analysis methods fast enough to be embedded in the design process?
• Typically require millions of analysis steps for synthesis
• How much effort should be spent on robustness during training, given the use of a real-time analysis?
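The analysis trade-offs above (capturing the input uncertainty set, speed vs. exactness, Monte Carlo samples, partitioning) can be seen on a small ReLU network. As a hedged illustration we use interval bound propagation (IBP). IBP is one simple, fast, but conservative analysis, not necessarily the method of [Hu20]. Monte Carlo sampling gives an inner (optimistic) estimate of the output range, while IBP gives a guaranteed outer bound. Splitting the input box into cells tightens that bound at the cost of more computation. The network size, random weights, and grid of splits are assumptions.

```python
import itertools
import numpy as np


def forward(x, layers):
    """Evaluate a Linear+ReLU network (no ReLU on the last layer)."""
    for i, (W, b) in enumerate(layers):
        x = W @ x + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x


def ibp_bounds(lo, hi, layers):
    """Interval bound propagation: sound output bounds for inputs in the box [lo, hi]."""
    for i, (W, b) in enumerate(layers):
        center, radius = (hi + lo) / 2.0, (hi - lo) / 2.0
        c, r = W @ center + b, np.abs(W) @ radius
        lo, hi = c - r, c + r
        if i < len(layers) - 1:                  # ReLU is monotone: clamp both ends
            lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)
    return lo, hi


def ibp_partitioned(lo, hi, layers, splits):
    """Split the input box into splits**d cells, bound each cell, and take the union."""
    edges = [np.linspace(l, h, splits + 1) for l, h in zip(lo, hi)]
    out_lo, out_hi = None, None
    for idx in itertools.product(range(splits), repeat=len(lo)):
        cell_lo = np.array([edges[d][k] for d, k in enumerate(idx)])
        cell_hi = np.array([edges[d][k + 1] for d, k in enumerate(idx)])
        l, h = ibp_bounds(cell_lo, cell_hi, layers)
        out_lo = l if out_lo is None else np.minimum(out_lo, l)
        out_hi = h if out_hi is None else np.maximum(out_hi, h)
    return out_lo, out_hi


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 2-16-1 ReLU network with random weights.
    layers = [(rng.normal(size=(16, 2)), rng.normal(size=16)),
              (rng.normal(size=(1, 16)), rng.normal(size=1))]
    lo, hi = np.array([-1.0, -1.0]), np.array([1.0, 1.0])
    outs = np.array([forward(x, layers) for x in rng.uniform(lo, hi, size=(1000, 2))])
    print("MC (inner) range:  ", outs.min(), outs.max())
    print("IBP (outer) bound: ", *ibp_bounds(lo, hi, layers))
    print("IBP, 4x4 partition:", *ibp_partitioned(lo, hi, layers, 4))
```

The number of cells grows as splits**d, which is the same scalability problem the bullets raise for real-time use and for synthesis loops.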
Future of Robust Learning
[Figure: axis labels Usefulness, Difficulty, Time, and N MC Samples; annotations Robust Synthesis and "Dog".]
Robust Deep RL with Partitioning [Hu20]
Connected and autonomous vehicles, Robot soccer, Starcraft, Texas hold'em poker
Multiagent Reinforcement Learning
• Non-stationarity is a core issue in MARL: each agent's environment keeps changing as the other agents learn
• Difficult to converge (e.g., to an equilibrium)
• Challenging to learn an effective policy
• Standard mitigation: Centralized training
with decentralized execution (CTDE)
• Centralized algorithms can have different structures [Yang20]
• Actor-critic methods generalize to mixed cooperative-competitive settings
• Current value-based methods are mostly used for cooperative settings
[Figures: MADDPG [Lowe17]; QMIX [Rashid18]; Q-DPP; Spectrum of cooperative MARL methods [Yang20]]
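Value-based CTDE methods such as QMIX [Rashid18] factor a joint value into per-agent utilities. They constrain the mixing to be monotonic so that decentralized greedy actions stay consistent with the centralized optimum. The sketch below is a simplified illustration of that property, not QMIX itself. It omits the state-conditioned hypernetworks, recurrent agent networks, and training loop, and it uses fixed random mixing weights. The agent count, action count, and hidden size are assumptions.

```python
import itertools
import numpy as np


def monotonic_mix(agent_qs, weights, bias):
    """Mix per-agent utilities into Q_tot so that dQ_tot/dQ_i >= 0.

    Absolute-valued (non-negative) weights plus a monotone activation (ELU here)
    enforce monotonicity; in QMIX the weights come from hypernetworks instead.
    """
    hidden = np.abs(weights[0]) @ agent_qs + bias[0]
    hidden = np.where(hidden > 0, hidden, np.exp(hidden) - 1.0)   # ELU
    return float(np.abs(weights[1]) @ hidden + bias[1])


def decentralized_greedy(q_tables):
    """Decentralized execution: each agent maximizes its own utility."""
    return tuple(int(np.argmax(q)) for q in q_tables)


def centralized_greedy(q_tables, weights, bias):
    """Brute-force argmax of Q_tot over the joint action space."""
    best, best_val = None, -np.inf
    for joint in itertools.product(*(range(len(q)) for q in q_tables)):
        qs = np.array([q[a] for q, a in zip(q_tables, joint)])
        val = monotonic_mix(qs, weights, bias)
        if val > best_val:
            best, best_val = joint, val
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_agents, n_actions, n_hidden = 3, 4, 8
    q_tables = [rng.normal(size=n_actions) for _ in range(n_agents)]
    weights = (rng.normal(size=(n_hidden, n_agents)), rng.normal(size=n_hidden))
    bias = (rng.normal(size=n_hidden), float(rng.normal()))
    print(decentralized_greedy(q_tables), centralized_greedy(q_tables, weights, bias))
```

The brute-force search is exponential in the number of agents; the point of the monotonic constraint is that execution never needs it, since each agent's local argmax already recovers the joint argmax.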
Multiagent Reinforcement Learning
• Centralized training with decentralized execution (CTDE) has achieved strong performance in challenging domains
• Soccer [Liu19], Starcraft [Mahajan19], and Hide and seek [Baker20]
Meta-PG [Al-Shedivat18]
Meta-MAPG [Kim20]
• Adaptive interaction architecture search that can dynamically expand and shrink depending on the scenario
• Thanks to Dr. Kasra Khosoussi, Dr. Kaveh Fathian, Dr. Michael Everett,
Dr. Chuangchuang Sun, Dr. Golnaz Habibi, and Dong-Ki Kim
• Research funded in part by ONR, AFOSR, ARO, ARL, DARPA, Boeing, Ford, Lockheed,
IBM, NGS, and AWS