Deep reinforcement learning
for time series decision making
Ruxandra Stoean
Further bibliography
R. S. Sutton, A. G. Barto, Reinforcement Learning, second edition: An Introduction
(Adaptive Computation and Machine Learning series), 2018
M. Lapan, Deep Reinforcement Learning Hands-On: Apply modern RL methods to
practical problems of chatbots, robotics, discrete optimization, web automation, and
more, 2nd Edition, 2020
L. Graesser, W. L. Keng, Foundations of Deep Reinforcement Learning: Theory and
Practice in Python, 2019
M. Morales, Grokking Deep Reinforcement Learning, 2020
W. B. Powell, Reinforcement Learning and Stochastic Optimization: A Unified
Framework for Sequential Decisions, 2022
Reinforcement learning
A learning paradigm different from
Supervised learning
Associate input to output in labeled data
Unsupervised learning
Find patterns in unlabeled data
Reinforcement learning
An agent starts in an initial state in an environment
Loop until the target is reached
Experience: take actions -> move to the next state
Get a reward from the environment
Maximize the cumulative reward
Balance exploiting and exploring information
Perform several such episodes (similar to epochs in neural networks)
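A minimal sketch of this agent-environment loop, assuming a Gymnasium-style environment ("CartPole-v1" is only an illustrative choice) and a purely random agent:
```python
import gymnasium as gym

env = gym.make("CartPole-v1")
n_episodes = 5                                   # episodes play the role of epochs

for episode in range(n_episodes):
    state, info = env.reset()                    # agent starts in an initial state
    done, total_reward = False, 0.0
    while not done:                              # loop until a terminal state
        action = env.action_space.sample()       # here: purely random (exploring) agent
        state, reward, terminated, truncated, info = env.step(action)
        total_reward += reward                   # cumulative reward to be maximized
        done = terminated or truncated
    print(f"Episode {episode}: cumulative reward = {total_reward}")
```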
Concepts 1/3
Action taken by the agent in the environment
Environment response to the agent
Reward (Value): feedback to reinforce behavior
State: changes for the agent as a consequence of its action
Loop until a terminal state is reached
Reach destination
Obtain a maximal reward
A number of time steps is reached
Game over
Environment
Deterministic
State transition and reward are deterministic functions
The reward for the same action in a given state is always the same
The specific action in the particular state determines the same next state every time
Stochastic
The reward and the transition to a new state after the same action may differ from
a previous encounter
Concepts 2/3
Policy (π): the strategy followed by the agent in its quest
Optimal, when it maximizes the value
Value function
The expected cumulative reward (return) of a state s if the agent follows the policy π
The state-value function for a policy
Q-value (quality-value) function
The value of the long-term gain if the agent, in state s, takes action a and then follows the
policy π
The action-value function for a policy
Temporal difference (TD)
Computes the estimated value of a state for the policy π, based on the reward received by
the agent and the value of the next state
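A minimal sketch of the TD(0) update for the state-value function under a policy; V is assumed to be a dictionary or array of value estimates, and alpha/gamma are the learning rate and discount factor introduced later:
```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    # move V(s) toward the TD target: the reward plus the discounted value of the next state
    td_target = r + gamma * V[s_next]
    V[s] = V[s] + alpha * (td_target - V[s])
    return V
```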
Exploration-Exploitation Dilemma
Exploitation
Take the best learned action, with the maximum expected reward at a given state
Exploration
Take a random action, without taking rewards into account
Trade-off between exploitation and exploration
Exploitation only: the agent gets stuck in a local optimum
Exploration only: it takes a long time to discover all the information
The ε-greedy policy
A random action is selected with probability ε
The greedy (current best) action is selected with probability 1 − ε
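A sketch of ε-greedy action selection over a tabular Q function, assuming Q is a 2D NumPy array indexed by [state, action]:
```python
import numpy as np

def epsilon_greedy(Q, state, n_actions, epsilon=0.1, rng=np.random.default_rng()):
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))   # explore: random action with probability epsilon
    return int(np.argmax(Q[state]))           # exploit: best learned action with probability 1 - epsilon
```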
On- and off-policy approaches
On- versus off-policy
On - SARSA (State-Action-Reward-State-Action)
Employs the ε-greedy policy
To estimate the Q-value, it takes the next action a' in the next state s' using the same strategy
target(s') = R(s, a, s') + γ Q_k(s', a')
Q_{k+1}(s, a) = (1 − α) Q_k(s, a) + α [target(s')]
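A sketch of one SARSA update, assuming Q is a 2D NumPy array and a_next is the action actually chosen in s' by the same ε-greedy behavior policy:
```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    target = r + gamma * Q[s_next, a_next]             # target(s') = R(s, a, s') + gamma * Q_k(s', a')
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target   # Q_{k+1}(s, a)
    return Q
```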
Off – Q-learning
ε-greedy behavior policy
To estimate the Q-value, it uses a greedy (max) target policy that picks the best action (with
the maximum value) in the next state s'
target(s') = R(s, a, s') + γ max_{a'} Q_k(s', a') (Bellman equation)
Q_{k+1}(s, a) = (1 − α) Q_k(s, a) + α [target(s')]
Alternative formulation: Q_{k+1}(s, a) = Q_k(s, a) + α [R(s, a, s') + γ max_{a'} Q_k(s', a') − Q_k(s, a)]
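The corresponding Q-learning update, again assuming Q is a 2D NumPy array; note that the target uses the greedy max over a', not the action the behavior policy will actually take:
```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    target = r + gamma * np.max(Q[s_next])             # Bellman target with the greedy next action
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target
    # equivalent alternative formulation: Q[s, a] += alpha * (target - Q[s, a])
    return Q
```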
Concepts 3/3
Learning rate α
Values in [0,1]
A value of 0 means no learning (estimates never change)
A value close to 1 (e.g. 0.9) means new information largely overrides the old estimate, i.e. very fast learning
The discount factor γ
Also in [0, 1]
Makes rewards further in the future count less than immediate ones
ε-decay
Start with a high ε, then decrease its value to allow fewer random actions
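A sketch of a common multiplicative ε-decay schedule; the starting value, decay rate and lower bound are illustrative choices, not values prescribed by the slides:
```python
epsilon, epsilon_min, epsilon_decay = 1.0, 0.01, 0.995

for episode in range(1000):
    # ... run one episode, selecting actions epsilon-greedily with the current epsilon ...
    epsilon = max(epsilon_min, epsilon * epsilon_decay)   # fewer random actions over time
```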
Q-learning
Model-free RL approach
A trial-and-error algorithm that learns from action outcomes as it moves through the
environment
It does not construct an internal model of the environment
https://www.baeldung.com/cs/reinforcement-learning-neural-network
Tabular (exact) Q-learning Algorithm
Initialize Q_0(s, a) for all states and actions (to 0)
Repeat
Initialize state s
For k = 1, 2, …
Sample an action a according to policy
Execute a and get next state s’
If s’ is terminal
target(s’) = R(s, a, s’) (reward of transition)
Else
target(s') = R(s, a, s') + γ max_{a'} Q_k(s', a')
Update Q_{k+1}(s, a) = (1 − α) Q_k(s, a) + α [target(s')] to be closer to the target
s = s’
Until number of episodes reached
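A runnable sketch of the tabular algorithm above on a Gymnasium environment with discrete states and actions ("FrozenLake-v1" and the hyperparameters are illustrative assumptions):
```python
import gymnasium as gym
import numpy as np

env = gym.make("FrozenLake-v1")
Q = np.zeros((env.observation_space.n, env.action_space.n))   # Q_0(s, a) = 0
alpha, gamma, epsilon, n_episodes = 0.1, 0.99, 0.1, 5000

for episode in range(n_episodes):
    s, _ = env.reset()                                         # initialize state s
    done = False
    while not done:
        # sample an action according to the epsilon-greedy policy
        if np.random.rand() < epsilon:
            a = env.action_space.sample()
        else:
            a = int(np.argmax(Q[s]))
        s_next, r, terminated, truncated, _ = env.step(a)      # execute a, get next state s'
        done = terminated or truncated
        # target(s') is just the reward for a terminal s', otherwise it is bootstrapped
        target = r if terminated else r + gamma * np.max(Q[s_next])
        Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target       # move Q closer to the target
        s = s_next
```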
Example
Q function parameterized by a function approximator
Q values computed by e.g. a neural network (deep learning) -> get the parameters θ of
the Q function; initially random weights
Iterative regression -> fit the Q-values to the computed targets
Optimizing a squared loss function
Problem: non-stationary targets, catastrophic forgetting
1. Q values for a state and action no longer remain stationary as before, because the neural
network generalizes between states
2. Large swings in the state distribution
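A minimal sketch of a Q function parameterized by a small neural network with a squared loss, using PyTorch as an assumed framework; state_dim and n_actions are placeholders for the problem at hand:
```python
import torch
import torch.nn as nn

state_dim, n_actions = 4, 2                      # placeholders, problem dependent
q_net = nn.Sequential(                           # Q_theta: state -> one Q value per action
    nn.Linear(state_dim, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, n_actions),
)                                                # theta starts as random weights
loss_fn = nn.MSELoss()                           # squared loss between Q_theta(s, a) and the target
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
```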
Approximate Q-learning Algorithm
Initialize Q_0(s, a) for all states and actions (to 0)
Repeat
Initialize state s
For k = 1, 2, …
Sample an action a according to policy
Execute a and get next state s’
If s’ is terminal
target(s’) = R(s, a, s’)
Else
target(s') = R(s, a, s') + γ max_{a'} Q_k(s', a')
Gradient update on the function approximator: θ_{k+1} = θ_k − α ∇_θ E_{s'}[(Q_θ(s, a) − target(s'))²] |_{θ=θ_k}
s = s’
Until number of episodes reached (complete passes of the data)
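A sketch of the gradient update above for a single transition; q_net and optimizer are assumed to be a PyTorch Q network and its optimizer, e.g. the ones defined in the previous sketch:
```python
import torch
import torch.nn.functional as F

def q_gradient_step(q_net, optimizer, s, a, r, s_next, terminal, gamma=0.99):
    s = torch.as_tensor(s, dtype=torch.float32)
    s_next = torch.as_tensor(s_next, dtype=torch.float32)
    with torch.no_grad():                          # the target is treated as a constant
        target = torch.tensor(float(r)) if terminal \
                 else r + gamma * q_net(s_next).max()
    q_sa = q_net(s)[a]                             # Q_theta(s, a)
    loss = F.mse_loss(q_sa, target)                # (Q_theta(s, a) - target(s'))^2
    optimizer.zero_grad()
    loss.backward()                                # theta_{k+1} = theta_k - alpha * gradient
    optimizer.step()
    return loss.item()
```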
DQN Algorithm
Transform Q-learning to a supervised learning task
1. Experience replay buffer
Take an action, get the reward, go to the next state, and store each transition in the buffer
The single online learning update is replaced with a batch update: sample a mini-batch of
past transitions from the buffer -> a more stable update
The data distribution is more stationary
Steadier learning
2. Save a copy of the weights, fixed for some time, to compute the target function
(target network), instead of using the current weights: γ max_{a'} Q_k(s', a', θ⁻)
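A sketch of the two DQN ingredients: a replay buffer storing transitions and a target network kept as a frozen copy of the online network. q_net and optimizer are assumed PyTorch objects as in the earlier sketches, states are assumed to be fixed-length float vectors, and the buffer/batch sizes are illustrative:
```python
import copy
import random
from collections import deque
import torch
import torch.nn.functional as F

replay_buffer = deque(maxlen=10_000)             # stores (s, a, r, s_next, terminal) transitions

def make_target_net(q_net):
    return copy.deepcopy(q_net)                  # weights theta^- kept fixed for some time

def dqn_batch_update(q_net, target_net, optimizer, batch_size=32, gamma=0.99):
    if len(replay_buffer) < batch_size:          # wait until enough transitions are stored
        return
    batch = random.sample(replay_buffer, batch_size)       # mini-batch of past transitions
    s, a, r, s_next, terminal = (torch.as_tensor(x, dtype=torch.float32)
                                 for x in zip(*batch))
    with torch.no_grad():                        # target computed with the fixed weights theta^-
        max_next = target_net(s_next).max(dim=1).values
        target = r + gamma * max_next * (1.0 - terminal)
    q_sa = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # periodically: target_net.load_state_dict(q_net.state_dict())
```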
Example: Trading actions in stock time series
Problem
Given a historical stock price time series, decide on the best trading action
BUY
SELL
HOLD
Could also be solved with a recurrent architecture (LSTM, GRU) that estimates the
stock price evolution
Take the estimations and formulate a separate optimization problem to determine
the best trading actions per time step, solved e.g. with evolutionary algorithms
State representation definition
State representation
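A hedged sketch of one common state representation for a trading agent (an assumption for illustration, not necessarily the exact definition used in the lecture code): the state is the sigmoid of consecutive price differences in a sliding window ending at time t:
```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def get_state(prices, t, window_size):
    start = t - window_size + 1
    if start >= 0:
        block = prices[start:t + 1]
    else:                                            # pad the beginning of the series
        block = np.concatenate([np.repeat(prices[0], -start), prices[:t + 1]])
    return sigmoid(np.diff(block)).reshape(1, -1)    # shape (1, window_size - 1)
```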
Portfolio performance
Final plots: transaction history
Final plots: returns across RL episodes
Agent definition
Deep model architecture
Reset, remember transition, take action
Experience replay
Initialize Agent, import data, define actions
Hold, Buy, Sell actions
Logs
RL loop
Predict action from state and execute it
Compute reward
Call experience buffer
In practice, a deque structure is used for the experience memory, which is
larger than the batch on which the model is trained
Updates happen only when the replay buffer length exceeds the batch_size threshold
New memories are pushed in and the oldest ones fall out of the deque (see the sketch below)
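A sketch of this deque-based experience memory and the replay call; the memory and batch sizes are illustrative, and the model fit on the sampled mini-batch is only indicated by a comment:
```python
import random
from collections import deque

memory = deque(maxlen=1000)            # larger than the training batch; oldest items fall out

def remember(state, action, reward, next_state, done):
    memory.append((state, action, reward, next_state, done))

def replay(batch_size=32):
    if len(memory) <= batch_size:      # update only once enough transitions are stored
        return
    minibatch = random.sample(memory, batch_size)
    # ... compute targets and fit the deep Q model on the sampled transitions ...
```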
Save the model at each episode and plot returns across episodes
Evaluation stage on test data
Take the model at episode 10 and try to obtain a portfolio different from 0
Trading actions and their plot
Return by episode
Trading decisions on test data
Further deep RL architectures to avoid
overestimation
DDPG (Deep Deterministic Policy Gradient)
Combines DQN with DPG (Deterministic Policy Gradient)
An actor-critic method (two neural networks)
The actor is a deterministic policy network to determine the action
The critic estimates the Q-value
DDQN (Double DQN)
Two networks: a DQN and a Target Network
The DQN selects the best action (maximum Q-value) for the next state
The target network calculates the estimated Q-value for the selected action (see the sketch below)
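A sketch of the Double DQN target computation, assuming q_net (online DQN) and target_net are PyTorch networks as in the earlier sketches; the online network selects the best next action and the target network evaluates it, which reduces overestimation:
```python
import torch

def ddqn_target(q_net, target_net, r, s_next, terminal, gamma=0.99):
    s_next = torch.as_tensor(s_next, dtype=torch.float32)
    with torch.no_grad():
        best_a = q_net(s_next).argmax(dim=1, keepdim=True)         # selection: online DQN
        q_eval = target_net(s_next).gather(1, best_a).squeeze(1)   # evaluation: target network
    return r + gamma * q_eval * (1.0 - terminal)
```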