CSIE 5439 — Deep Reinforcement Learning
Deep Reinforcement Learning
Lecture 6 — Deep Q-Network and Its Variants
National Taiwan University
Department of Computer Science and Information Engineering
Prof. Chun-Yi Lee
1
Outline
• Announcement
• Deep Q-Learning
• The Variants of DQN
2
Announcement
• Assignment 1 deadline has been extended (due on 3/24 (Mon) 23:59)
• Assignment 2 has been released on NTU Cool (due on 4/7 (Mon)
23:59)
• Individual assignment
• Check the discussion forum on NTU Cool before posting your
question, as your question might have already been addressed there
3
Outline
• Announcement
• Deep Q-Learning
• The Variants of DQN
4
Game Playing: The Playground of Reinforcement Learning
Atari-2600: From Pixels to Performance
• Atari-2600: raw image inputs (frames of size 210 × 160 × 3)
• High-dimensional state space
• Leverages the success of deep neural networks (DNNs)
Why are Atari games ideal test environments?
• Discrete, manageable action space (4-18 actions) simplifies the learning
problem
• Clear, immediate reward signals through game scores provide
unambiguous feedback
• Standardized environments enable reliable comparison of different algorithms
5
Recall: Q-Learning
The Foundation of Value-Based RL
• Q-learning with a function approximator Qw parameterized by w
• Choose the next action using the greedy policy for the next state
• Algorithm:
1. Collect data samples {st, at, rt, st+1} by a policy π
2. Calculate the update target: yt = rt + γ max_a Qw(st+1, a)
3. Update parameters: w ← w + α (yt − Qw(st, at)) ∇w Qw(st, at)
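A minimal sketch of this update for a single transition, assuming a small PyTorch network as the approximator Qw (the network shape, learning rate, and function name below are illustrative, not from the lecture):

```python
import torch
import torch.nn as nn

# Illustrative approximator Qw: any differentiable network with one output per action works.
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.SGD(q_net.parameters(), lr=1e-3)   # learning rate alpha
gamma = 0.99

def q_learning_step(s_t, a_t, r_t, s_t1):
    # Step 2: y_t = r_t + gamma * max_a Qw(s_{t+1}, a); the target is treated as a constant.
    with torch.no_grad():
        y_t = r_t + gamma * q_net(s_t1).max()
    # Step 3: minimizing 0.5 * (Qw(s_t, a_t) - y_t)^2 performs
    #         w <- w + alpha * (y_t - Qw(s_t, a_t)) * grad_w Qw(s_t, a_t)
    q_sa = q_net(s_t)[a_t]
    loss = 0.5 * (q_sa - y_t) ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Here s_t and s_t1 are float feature tensors and a_t is the integer index of the action taken by the behavior policy π.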
6
The Data Challenge in Q-Learning
Addressing Correlation in Sequential Decision Making
• First step: Collect data samples {st, at, rt, st+1} by a policy π
{st, at, rt, st+1} → {st+1, at+1, rt+1, st+2} → ... (strongly correlated data)
• Issue: correlated data may not be good for training DNNs
7
The Hidden Challenge of Q-Learning
Breaking Sequential Dependencies in Reinforcement Learning
How Does RL Data Differ from Supervised Learning?
• Sequential dependency: RL samples are temporally correlated
while supervised learning assumes i.i.d. data
• Policy-dependent distribution: The agent's improving policy
continuously shifts the data distribution
• Delayed rewards: Value of actions may not be apparent until
many steps later, unlike immediate labels in supervised learning
8
Deep Q-Network (DQN)
The Pioneering Algorithm Using DNNs in RL
• The breakthrough that changed the field
• First introduced by DeepMind in the paper “Human-level control through
deep reinforcement learning” (Link)
9
The DQN Innovation Toolkit
Key Components That Enabled Success
• Parameterize the Q-function with a DNN
• Enhance data efficiency with an experience replay buffer
• Enhance the performance by introducing two Q-functions
• Qθ is the learning Q-function
• Qθ− is the target Q-function
• Modification of the input state representations
10
Deep Q-Network (DQN)
The Overall Framework of DQN
[Diagram: stacked grayscale images → DQN → action outputs of the game]
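The DNN in this diagram can be sketched roughly as the convolutional network described in the DQN paper; the class name and the input size (a stack of four 84 × 84 grayscale frames) below follow that paper, but treat this as an illustrative sketch rather than the exact course implementation:

```python
import torch.nn as nn

class DQNNetwork(nn.Module):
    """Maps a stack of 4 grayscale 84x84 frames to one Q-value per action."""
    def __init__(self, num_actions):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, num_actions),   # one Q-value per discrete action
        )

    def forward(self, x):                  # x: (batch, 4, 84, 84), values scaled to [0, 1]
        return self.head(self.features(x))
```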
11
The Concept of the Experience Replay Buffer
The Memory of Reinforcement Learning
• How do agents remember and learn from past experiences?
• Regularly storing data into a buffer
Replay Buffer
{st, at, rt, st+1}
12
Store, Replace, and Repeat
Update the Contents of the Experience Replay Buffer
• Managing memory for efficient learning (typically FIFO)
• Regularly storing data into a buffer
[Diagram: when the replay buffer is full, new samples {st, at, rt, st+1} replace the oldest entries]
13
Why Experience Replay Matters
Breaking the Correlation Curse ⚡
Three Key Advantages That Revolutionized Deep RL
• For sampling: For each timestep, sample multiple experience
transitions from the replay buffer (see the sketch below)
• For reducing correlation: Reduce the chances of
correlated data (i.e., the data samples are not sequential)
• For efficiency: Data samples can be used multiple times
(high data-efficiency)
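A minimal replay buffer sketch that matches these slides, using a fixed-capacity FIFO deque; the class and method names are illustrative:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity FIFO store of transitions {st, at, rt, st+1}."""
    def __init__(self, capacity=100_000):
        # deque(maxlen=...) drops the oldest transition once the buffer is full
        self.buffer = deque(maxlen=capacity)

    def store(self, s_t, a_t, r_t, s_t1):
        # In practice a terminal flag is usually stored too; omitted here to match the slides.
        self.buffer.append((s_t, a_t, r_t, s_t1))

    def sample(self, batch_size):
        # Uniform sampling breaks the temporal ordering of the collected data
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

Each timestep appends one transition and each update samples a minibatch, so consecutive, strongly correlated samples are no longer fed to the network back to back, and every transition can be reused many times.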
14
The Concept of the Target Network
Stabilizing the Learning Process
• Training with two Q-functions
• Modify the training target to yi = rt + γ max_a Qθ−(st+1, a)
[Diagram: the agent's learning network Qθ is updated toward the target yi, which is computed with the separate target network Qθ−]
15
Why Two Networks Provide Stability
Update Mechanism of the Target Network
• Periodically update the target network by copying the parameters from the training
network (see the sketch after this slide)
• Hard approach: copy the parameters directly, θ− ← θ
• Soft approach: copy gradually, θ− ← τ θ− + (1 − τ) θ
• Benefit: Stabilize the training target (i.e., the target does not change at every step)
• Objective: Stabilize the training process
• Other key enhancements:
• Frame skipping
• Stacked input frames
16
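A minimal sketch of the two update schemes in PyTorch, assuming q_net holds the learning network Qθ and target_net the target network Qθ− (both names illustrative); note that with the slide's convention τ is close to 1:

```python
import torch

def hard_update(target_net, q_net):
    # Hard approach: theta_minus <- theta, copied all at once every few training steps
    target_net.load_state_dict(q_net.state_dict())

@torch.no_grad()
def soft_update(target_net, q_net, tau=0.995):
    # Soft approach: theta_minus <- tau * theta_minus + (1 - tau) * theta, applied every step
    for p_target, p in zip(target_net.parameters(), q_net.parameters()):
        p_target.mul_(tau).add_((1.0 - tau) * p)
```

With the hard scheme the target yi is frozen between copies; with the soft scheme θ− becomes a slowly moving average of θ. Either way the training target changes slowly, which is the stabilization the slide describes.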
Deep Q-Network (DQN)
The Entire Workflow of the DQN Framework
[Diagram: the agent observes st and rt from the game, selects at, stores the transition {st, at, rt, st+1} in the replay buffer, and samples N transitions {si, ai, ri, si+1} from the buffer for each update]
17
Deep Q-Network (DQN)
The Pseudocode of DQN
• Periodically update the target network: θ− ← θ
• For each timestep, store the data sample {st, at, rt, st+1} collected by the policy π into the
experience replay buffer Z
• Sample N data entries {si, ai, ri, si+1} from the experience replay buffer Z
• Derive the update target and update the parameters:
yi = ri + γ max_a Qθ−(si+1, a)
θ ← θ + α (1/N) Σ_i (yi − Qθ(si, ai)) ∇θ Qθ(si, ai)
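Putting the pieces together, a hedged sketch of one such update step, reusing the illustrative ReplayBuffer, DQNNetwork, and hard_update from the earlier sketches (batch size, discount factor, and the handling of terminal states are simplified to match the slides):

```python
import torch
import torch.nn.functional as F

def dqn_update(q_net, target_net, buffer, optimizer, batch_size=32, gamma=0.99):
    """One DQN parameter update from N transitions sampled out of the replay buffer."""
    batch = buffer.sample(batch_size)
    s, a, r, s_next = zip(*batch)                     # unpack N transitions {si, ai, ri, si+1}
    s      = torch.stack(s)                           # (N, 4, 84, 84) state tensors
    a      = torch.tensor(a, dtype=torch.int64)       # (N,) action indices
    r      = torch.tensor(r, dtype=torch.float32)     # (N,) rewards
    s_next = torch.stack(s_next)

    # yi = ri + gamma * max_a Q_theta_minus(si+1, a), computed with the frozen target network.
    # (Terminal transitions would normally zero out the bootstrap term; omitted as in the slides.)
    with torch.no_grad():
        y = r + gamma * target_net(s_next).max(dim=1).values

    # Q_theta(si, ai) for the actions that were actually taken
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)

    # Up to a constant factor, this gradient step realizes
    # theta <- theta + alpha * (1/N) * sum_i (yi - Q_theta(si, ai)) * grad Q_theta(si, ai)
    loss = F.mse_loss(q_sa, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Calling hard_update(target_net, q_net) periodically between such updates completes the loop: θ− ← θ.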
18
Outline
• Announcement
• Deep Q-Learning
• The Variants of DQN
19
DQN Variants
Multiple Variants for Improving DQN
• Double DQN
• Dueling DQN
• Prioritized Experience Replay for DQN
• Deep Recurrent Q-Network (DRQN)
• Rainbow DQN
20