PyTorch reimplementation of ReDo (The Dormant Neuron Phenomenon in Deep Reinforcement Learning). The paper establishes the dormant neuron phenomenon: over the course of training a network on nonstationary targets, a significant portion of the neurons in a deep network become dormant, i.e. their activations become minimal compared to the other neurons in the layer. This phenomenon is particularly prevalent in value-based deep reinforcement learning algorithms such as DQN and its variants. As a solution, the authors propose to periodically check for dormant neurons and reinitialize them.
The score of a neuron $i$ in layer $\ell$ is its average absolute activation, normalized by the mean over the layer:

$$
s_i^{\ell} = \frac{\mathbb{E}_{x \in D}\left[\left|h_i^{\ell}(x)\right|\right]}{\frac{1}{H^{\ell}} \sum_{k \in h^{\ell}} \mathbb{E}_{x \in D}\left[\left|h_k^{\ell}(x)\right|\right]}
$$

A neuron $i$ is defined as $\tau$-dormant if $s_i^{\ell} \le \tau$.

Every $F$ steps:
- Check whether a neuron $i$ is $\tau$-dormant.
- If a neuron $i$ is $\tau$-dormant:
  - Re-initialize the input weights and bias of $i$.
  - Set the outgoing weights of $i$ to $0$.
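A minimal PyTorch sketch of this check-and-reset step for a stack of `nn.Linear` layers could look as follows; the helper name, the way activations are passed in, and the use of a freshly initialized `nn.Linear` for re-initialization are assumptions for illustration, not the exact code in this repo.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def redo_linear_stack(layers: list[nn.Linear], activations: list[torch.Tensor], tau: float = 0.1):
    """Apply a ReDo-style reset to a stack of consecutive nn.Linear layers.

    activations[l] holds the post-ReLU outputs of layers[l] for a batch of
    inputs, shape (batch, out_features). Illustrative sketch only.
    """
    for l in range(len(layers) - 1):  # the last layer has no outgoing weights to zero here
        act = activations[l].abs().mean(dim=0)   # E_x[|h_i(x)|] per neuron
        score = act / (act.mean() + 1e-9)        # normalize by the layer-wide mean
        dormant = score <= tau                   # tau-dormant mask, shape (out_features,)

        if dormant.any():
            layer = layers[l]
            # Re-initialize the incoming weights and bias of the dormant neurons
            # by borrowing rows from a freshly initialized layer of the same shape.
            fresh = nn.Linear(layer.in_features, layer.out_features, device=layer.weight.device)
            layer.weight[dormant] = fresh.weight[dormant]
            if layer.bias is not None:
                layer.bias[dormant] = fresh.bias[dormant]

            # Zero the outgoing weights of the dormant neurons, i.e. the
            # corresponding columns of the next layer's weight matrix.
            layers[l + 1].weight[:, dormant] = 0.0
```

In the full training loop, the activations would be gathered over a batch (e.g. via forward hooks on a replay-buffer sample), the check would run every $F$ gradient updates, and convolutional layers would be handled analogously per channel.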
These results were generated using 3 seeds on DemonAttack-v4. Note that I did not use typical DQN hyperparameters, but instead chose a hyperparameter set that exaggerates the dormant neuron phenomenon.
In particular:
- Updates are done every environment step instead of every 4 steps.
- Target network updates every 2000 steps instead of every 8000.
- Fewer random samples before learning starts.
- $\tau=0.1$ instead of $\tau=0.025$.
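Collected in one place, those deviations would look roughly like the config sketch below; the key names are illustrative and do not necessarily match the script's actual argument names.

```python
# Illustrative overrides only; key names are hypothetical and not taken from this repo.
exaggerate_dormancy = dict(
    train_frequency=1,              # gradient update every environment step (instead of every 4)
    target_network_frequency=2000,  # target network update interval (instead of 8000)
    learning_starts=...,            # fewer random samples before learning starts (value not stated above)
    redo_tau=0.1,                   # dormancy threshold (instead of 0.025)
)
```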
I've skipped running 10M or 100M experiments because these are very expensive in terms of compute.
Update 1: Fixed and simplified the for-loop in the ReDo resets.
Update 2: The reset check in the main function was at the wrong level; the re-initializations are now properly done in-place and work.
Update 3: Resetting Adam's moment estimates and step count for the re-initialized weights is crucial for performance. Otherwise, the Adam updates will immediately create dormant neurons again. Preliminary results now look promising.
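As a sketch of what that reset means in code (the state keys follow `torch.optim.Adam`; the helper itself and the per-row masking are assumptions, not this repo's exact implementation):

```python
import torch


@torch.no_grad()
def reset_adam_state(optimizer: torch.optim.Adam, layer: torch.nn.Linear, dormant: torch.Tensor):
    """Zero Adam's moments for the re-initialized rows and reset the step count."""
    for p in (layer.weight, layer.bias):
        state = optimizer.state.get(p, {})
        if "exp_avg" in state:
            state["exp_avg"][dormant] = 0.0     # first moment estimate
            state["exp_avg_sq"][dormant] = 0.0  # second moment estimate
            # Adam keeps one step count per parameter tensor; resetting it restarts
            # the bias correction so the fresh weights don't receive huge initial
            # updates that would push them straight back into dormancy.
            state["step"] = torch.zeros_like(state["step"]) if torch.is_tensor(state["step"]) else 0
```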
Update 4: Fixed the outgoing weight resets, where the mask was generated incorrectly and not applied to the outgoing weights. See this issue. Thanks @SaminYeasar!
Paper:
@inproceedings{sokar2023dormant,
title={The dormant neuron phenomenon in deep reinforcement learning},
author={Sokar, Ghada and Agarwal, Rishabh and Castro, Pablo Samuel and Evci, Utku},
booktitle={International Conference on Machine Learning},
pages={32145--32168},
year={2023},
organization={PMLR}
}
Training code is based on cleanRL:
@article{huang2022cleanrl,
author = {Shengyi Huang and Rousslan Fernand Julien Dossa and Chang Ye and Jeff Braga and Dipam Chakraborty and Kinal Mehta and João G.M. Araújo},
title = {CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms},
journal = {Journal of Machine Learning Research},
year = {2022},
volume = {23},
number = {274},
pages = {1--18},
url = {http://jmlr.org/papers/v23/21-1342.html}
}
Replay buffer and wrappers are from Stable Baselines 3:
@misc{raffin2019stable,
title={Stable baselines3},
author={Raffin, Antonin and Hill, Ashley and Ernestus, Maximilian and Gleave, Adam and Kanervisto, Anssi and Dormann, Noah},
year={2019}
}