Q-learning formula #158

Unamu7simure · 2022-04-22T05:00:41Z

The Q-learning update formula (18.3.10) seems to apply only to non-terminal states.
If S_t is a terminal state (gold or a trap), its Q-table entries should not be updated and should keep their initial values (zeros).
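For reference, here is a standard form of the tabular Q-learning update with the terminal case made explicit. The notation (for example R_{t+1} versus R_t) may differ from the book's (18.3.10). Because an episode ends once a terminal state is reached, no update is ever made from a terminal state, so its Q-values stay at their initial zeros:

$$
Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ y_t - Q(S_t, A_t) \right],
\qquad
y_t = \begin{cases}
R_{t+1} & \text{if } S_{t+1} \text{ is terminal,} \\
R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) & \text{otherwise.}
\end{cases}
$$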
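The code in the `_learn` method of the `Agent` class could be revised as follows:

    # q_val is assumed to hold q_table[s][a], computed earlier in _learn
    if done:
        # transition into a terminal state: no bootstrapping from next_s
        q_target = r
    else:
        q_target = r + self.gamma * np.max(q_table[next_s])
    # the update applies in both cases, so it sits outside the if/else
    q_table[s][a] += self.lr * (q_target - q_val)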
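
A minimal, self-contained sketch of the revised update, assuming the Q-table is a NumPy array indexed as `q_table[state][action]` and that `done` is True when `next_s` is terminal. The function name `q_learning_update` and its parameters are hypothetical, chosen to mirror the snippet above; this is not the repository's actual `Agent._learn`.

```python
import numpy as np


def q_learning_update(q_table, s, a, r, next_s, done, lr=0.1, gamma=0.9):
    """One tabular Q-learning step (hypothetical helper mirroring Agent._learn)."""
    q_val = q_table[s][a]
    if done:
        # next_s is terminal: there is no future return, so do not bootstrap
        q_target = r
    else:
        # non-terminal: bootstrap from the greedy value of the next state
        q_target = r + gamma * np.max(q_table[next_s])
    # move the current estimate a fraction lr toward the target (in place)
    q_table[s][a] += lr * (q_target - q_val)
    return q_table


# Usage: 3 states x 2 actions; state 2 stands in for a terminal state (e.g. gold)
q = np.zeros((3, 2))
q[1] = [0.5, 1.0]  # pretend state 1 already has learned values
q_learning_update(q, s=0, a=1, r=0.0, next_s=1, done=False, lr=0.5, gamma=0.9)
q_learning_update(q, s=1, a=0, r=1.0, next_s=2, done=True, lr=0.5, gamma=0.9)
print(q)
```

The terminal state's row (state 2) is never written to, so it keeps its initial zeros, which is the behavior the issue asks for.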
