Explainability in Deep Reinforcement Learning
Abstract
A large part of the explainable Artificial Intelligence (XAI) literature focuses on feature relevance techniques
that explain the output of a deep neural network (DNN), or on explaining models that ingest image data. However,
assessing how XAI techniques can help understand models beyond classification tasks, e.g. for reinforcement
learning (RL), has not been extensively studied. We review recent works aiming at Explainable
Reinforcement Learning (XRL), a relatively new subfield of Explainable Artificial Intelligence, intended to
be used in general public applications with diverse audiences, which require ethical, responsible and trustable
algorithms. In critical situations where it is essential to justify and explain the agent’s behaviour, better
explainability and interpretability of RL models could help gain scientific insight into the inner workings of
what is still considered a black box. We mainly evaluate studies directly linking explainability to RL, and
split these into two categories according to the way the explanations are generated: transparent algorithms
and post-hoc explainability. We also review the most prominent XAI works through the lens of how they
could potentially inform the further deployment of the latest advances in RL in the demanding present
and future of everyday problems.
Keywords: Reinforcement Learning, Explainable Artificial Intelligence, Machine Learning, Deep Learning,
Responsible Artificial Intelligence, Representation Learning
1. Introduction
During the past decade, Artificial Intelligence (AI), and by extension Machine Learning (ML), have seen
an unprecedented rise in both industry and research. The progressive improvement of computer hardware
combined with the need to process ever larger amounts of data, made these previously underestimated techniques
shine in a new light. Reinforcement Learning (RL) focuses on learning how to map situations to actions,
in order to maximize a numerical reward signal [1]. The learner is not told which actions to take, but instead
must discover which actions are the most rewarding by trying them. Reinforcement learning addresses the
problem of how agents should learn a policy that takes actions to maximize the cumulative reward through
interaction with the environment [2].
Recent progress in Deep Learning (DL) for learning feature representations has significantly impacted
RL, and the combination of the two (known as deep RL) has led to remarkable results in many areas.
Typically, RL is used to solve optimization problems when the system has a very large number of states and
has a complex stochastic structure. Notable examples include training agents to play Atari games based
on raw pixels [3, 4], board games [5, 6], complex real-world robotics problems such as manipulation [7] or
grasping [8] and other real-world applications such as resource management in computer clusters [9], network
traffic signal control [10], chemical reaction optimization [11] or recommendation systems [12].
∗ Corresponding author
Email addresses: aheuillet@enseirb-matmeca.fr (Alexandre Heuillet), fcouthouis@ensc.fr (Fabien Couthouis),
natalia.diaz@ensta-paris.fr (Natalia Díaz-Rodríguez)
1 Equal contribution
Executives | Managers, executive board members, ... | Assess regulatory compliance, understand corporate AI applications, ... | Causality, informativeness, confidence
Regulation | Regulatory entities/agencies | Certify model compliance with the legislation in force, audits, ... | Causality, informativeness, confidence, fairness, privacy awareness
A common way to evaluate an application-level or human-level task is to assess the quality
of the mental model built by the user after seeing the explanation(s). Mental models can be described as
internal representations, built upon experience, which allow one to mentally simulate how something works
in the real world. Hoffman et al. [20] propose to evaluate mental models by 1. asking post-task questions
on the behavior of the agent (such as "How does it work?" or "What does it achieve?") and 2. asking the
participants to make predictions on the agent’s next action. These evaluations are often done using Likert
scales.
We reviewed the state of the art on XRL and summarized it in Table 3. This table presents, for each
paper, the task(s) for which an explanation is provided, the employed RL algorithms (a glossary of which
can be found in the Appendix, Section 6), and the type of explanations provided, i.e. based on images, diagrams
(graphical components such as bar charts, plots or graphs), or text. We also present the level of the provided
explanation (local if it explains only predictions, global if it explains the whole model), and the audience
concerned by the explanation, as discussed in Section 1.1.
Paper | Task | MDP/POMDP | RL algorithm(s) | Explanation type (level) | Audience
Explainable RL Through a Causal Lens [25] | Games (OpenAI benchmark and Starcraft II) | Both | PG, DQN, DDPG, A2C, SARSA | Diagrams, Text (Local) | Experts, Users, Executives
Shapley Q-value: A Local Reward Approach to Solve Global Reward Games [26] | Multiagent (Cooperative Navigation, Prey-and-Predator and Traffic Junction) | POMDP | DDPG | Diagrams (Local) | Experts
Dot-to-Dot: Explainable HRL For Robotic Manipulation [27] | Robotics (grasping) | MDP | DDPG, HER, HRL | Diagrams (Global) | Experts, Developers
Self-Educated Language Agent With HER For Instruction Following [28] | Instruction Following (MiniGrid) | MDP | Textual HER | Text (Local) | Experts, Users, Developers
Commonsense and Semantic-guided Navigation [29] | Room navigation | POMDP | - | Text (Global) | Experts
Boolean Task Algebra [30] | Game (grid) | MDP | DQN | Diagrams | Experts
Visualizing and Understanding Atari [31] | Games (Pong, Breakout, Space Invaders) | MDP | A3C | Images (Global) | Experts, Users, Developers
Interestingness Elements for XRL through Introspection [32, 33] | Arcade game (Frogger) | POMDP | Q-Learning | Images (Local) | Users
Composable DRL for Robotic Manipulation [34] | Robotics (pushing and reaching) | MDP | Soft Q-learning | Diagrams (Local) | Experts
Symbolic-Based Recognition of Contact States for Learning Assembly Skills [35] | Robotic grasping | POMDP | HMM, PAA, K-means | Diagrams (Local) | Experts
Safe Reinforcement Learning with Model Uncertainty Estimates [36] | Collision avoidance | POMDP | Monte Carlo Dropout, bootstrapping | Diagrams (Local) | Experts
explanations of an RL algorithm after its training, such as SHAP (SHapley Additive exPlanations) [15] or
LIME [14] for standard ML models. Reviewed papers are referenced by type of explanation in Figure 2.
Figure 1: Taxonomy of the reviewed literature identified for bringing explainability to RL models. References in orange, purple,
and light blue correspond to XAI techniques using images, text or diagrams, respectively.
variables and encodes them as separate low-dimensional features (generally using a Variational Autoencoder
[55]). It is also possible to make use of this concept, together with lifelong learning, to learn more interpretable
representations on unsupervised classification tasks. In addition, one could argue that learning throughout life
would allow compacting and updating old knowledge with new knowledge while preventing catastrophic forgetting
[56]. Thus, this is a key concept that could lead to more versatile RL agents, able to learn new tasks
without forgetting the previous ones. Information Maximizing Generative Adversarial Networks (InfoGAN)
[57] is another model based on the principles of learning disentangled representations. The noise vector
used in traditional GANs is decomposed into two parts: z: incompressible noise; and c: the latent code
used to target the salient semantic features of the data distribution. The main idea is to feed z and c to
the generator G and to maximize the mutual information between c and G(z, c), in order to ensure that the
information contained in c is preserved during the generation process. As a result, the InfoGAN model is
able to create an interpretable representation via the latent code c (i.e., the values changing according to
shape and features of the input data).
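To make the role of the latent code c more concrete, the sketch below shows only the auxiliary mutual-information term as we understand it: a recognition network Q tries to recover the categorical code c from the generated sample G(z, c), and the resulting cross-entropy (a variational lower bound on the mutual information, up to a constant) is minimized. This is a simplified, hypothetical setup; network sizes, dimensions and hyper-parameters are made up, and the usual generator/discriminator losses are omitted.

```python
# Minimal sketch of InfoGAN's mutual-information term (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

NOISE_DIM, CODE_DIM, DATA_DIM = 16, 10, 64    # z, categorical code c, generated sample

generator = nn.Sequential(                    # G(z, c) -> fake sample
    nn.Linear(NOISE_DIM + CODE_DIM, 128), nn.ReLU(), nn.Linear(128, DATA_DIM))
q_network = nn.Sequential(                    # Q(x) -> logits over the code c
    nn.Linear(DATA_DIM, 128), nn.ReLU(), nn.Linear(128, CODE_DIM))

opt = torch.optim.Adam(list(generator.parameters()) + list(q_network.parameters()), lr=1e-3)

for step in range(100):
    z = torch.randn(32, NOISE_DIM)                       # incompressible noise
    c = torch.randint(0, CODE_DIM, (32,))                # semantic latent code
    c_onehot = F.one_hot(c, CODE_DIM).float()
    fake = generator(torch.cat([z, c_onehot], dim=1))    # G(z, c)
    logits = q_network(fake)                             # try to recover c from G(z, c)
    # Cross-entropy acts as a lower bound on I(c; G(z, c)); minimizing it keeps
    # the information carried by c recoverable from the generated sample.
    mi_loss = F.cross_entropy(logits, c)
    opt.zero_grad()
    mi_loss.backward()
    opt.step()
```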
Some work has been done to learn representations by combining symbolic AI with deep RL in order to
facilitate the use of background knowledge, the exploitation of learnt knowledge, and to improve generalization
[38, 58, 59, 60]. Consequently, it also improves the explainability of the algorithms, while preserving state-of-
the-art performance.
Zambaldi et al. [21] propose making use of Inductive Logic Programming and self-attention to represent
states, actions and policies using first order logic, using a mechanism similar to graph neural networks and
more generally, message passing computations [61, 62, 63, 64]. In these kinds of models, entity-entity relations
are explicitly computed when considering the messages passed between connected nodes of the graph, as
shown in Fig. 2. Self-attention is used here as a method to compute interactions between these different
entities (i.e. relevant pixels in an RGB image for the example from [21]), and thus perform non-local pairwise
relational computations. This technique allows an expert to visualize the attention weights associated
with the agent’s available actions and thus better understand its strategy.
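As a rough illustration of this mechanism (not the exact architecture of [21]), the following numpy sketch computes scaled dot-product self-attention over a small set of entity feature vectors; the resulting attention matrix is the kind of pairwise quantity an expert could visualize as a heat map. Entity count, feature sizes and weights are invented.

```python
# Minimal numpy sketch of pairwise self-attention over entities (illustrative).
import numpy as np

rng = np.random.default_rng(0)
n_entities, d_feat, d_key = 5, 8, 4          # e.g. 5 salient pixels with 8-dim features

entities = rng.normal(size=(n_entities, d_feat))
W_q, W_k, W_v = (rng.normal(size=(d_feat, d_key)) for _ in range(3))

Q, K, V = entities @ W_q, entities @ W_k, entities @ W_v
scores = Q @ K.T / np.sqrt(d_key)            # pairwise entity-entity interactions
attention = np.exp(scores - scores.max(axis=1, keepdims=True))
attention /= attention.sum(axis=1, keepdims=True)     # row-wise softmax

relational_features = attention @ V          # each entity aggregates the others
print(np.round(attention, 2))                # matrix an expert could inspect
```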
Another work that aims to incorporate common sense into the agent, in terms of a symbolic abstraction to
represent the problem, is [22]. This method subdivides the world state representation into many sub-states,
with a degree of associated importance based on how far the object is from the agent. This helps understand
the relevance of the actions taken by the agent by determining which sub-states were chosen.
While reward decompositions help to understand the agent's preferences between several actions,
minimal sufficient explanations are used to help select the most important reward components. Other
works that facilitate the explainability of RL models by using reward-based losses for more interpretable RL
are in [65, 49, 39].
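To illustrate how decomposed Q-values can be turned into an explanation, the sketch below implements our reading of the reward difference explanation (RDX) and the positive minimal sufficient explanation (MSX+) from [24]; the reward components and per-component Q-values are invented for the example.

```python
# Sketch of reward difference explanations (RDX) and minimal sufficient
# explanations (MSX+), following our reading of [24]; numbers are made up.
import numpy as np

components = ["crash", "fuel", "shaping", "landing"]
# Per-component Q-values for two actions in some state s.
q_noop      = np.array([-1.0,  0.0, 2.5, 1.0])
q_fire_main = np.array([-0.5, -0.8, 1.2, 1.1])

# RDX: component-wise difference explaining why one action is preferred over another.
rdx = q_noop - q_fire_main
print(dict(zip(components, np.round(rdx, 2))))

# MSX+: smallest set of positive reasons whose sum outweighs the total
# disadvantage d (sum of magnitudes of the negative reasons).
d = -rdx[rdx < 0].sum()
order = np.argsort(-rdx)                 # largest advantages first
msx, total = [], 0.0
for i in order:
    if rdx[i] <= 0 or total > d:
        break
    msx.append(components[i])
    total += rdx[i]
print("MSX+:", msx)
```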
Figure 3: Left Reward Decompositions for DQN. Right Hybrid Reward Architecture (HRA) at cell (3,4) in Cliffworld. HRA
predicts an extra “gold” reward for actions which do not lead to a terminal state. Reproduced with permission of Zoe Juozapaitis
[24].
Figure 4: Top Minimal Sufficient Explanations (MSX) (fire down engine action vs. do nothing action) for decomposed reward
DQN in Lunar Lander environment near landing site. The shaping rewards dominate decisions. Bottom RDX (noop vs.
fire-main-engine) for HRA in Lunar Lander before a crash. The RDX shows that noop is preferred to avoid penalties such as
fuel cost. Reproduced with permission of Zoe Juozapaitis [24].
In the same vein, Madumal et al. [25] use the way humans understand and represent knowledge through
causal relationships and introduce an action influence model: a causal model which can explain the behaviour
of agents using causal explanations. Structural causal models [66] represent the world using random variables,
some of which might have causal relationships, which can be described thanks to a set of structural equations.
In this work, structural causal models are extended to include actions as part of the causal relationships.
An action influence model is a tuple represented by the state-actions ensemble and the corresponding set of
structural equations. The whole process is divided into 3 phases:
• Defining the qualitative causal relationships of variables as an action influence model.
• Learning the structural equations (as multivariate regression models during the training phase of the
agent).
• Generating explanations, called explanans, by traversing the action influence graph (see Figure 5) from
the root to the leaf reward node.
This kind of model allows encoding cause-effect relations between events (actions and states), as shown
by the graph featured in Figure 5. Thus, it can be used to generate explanations of the agent's behaviour
("why" and "why not" questions), based on knowledge about how actions influence the environment. Their
method was evaluated through a user study showing that, compared to playing the video game without any
explanations and to relevant-variable explanations, this model performs significantly better on 1) task prediction
and 2) explanation goodness. However, trust was not shown to be significantly improved.
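A toy sketch of how an explanation could be extracted from such an action influence graph is given below; the graph, the variable names and the templated sentence are invented for illustration and do not reproduce the exact procedure of [25].

```python
# Toy traversal of an action influence graph to build a "why" explanation
# (illustrative; graph and wording are invented, not the procedure of [25]).
causal_graph = {
    "build_supply_depot": ["supply_depots"],
    "supply_depots": ["army_size"],
    "army_size": ["destroyed_units", "destroyed_buildings"],
    "destroyed_units": ["reward"],
    "destroyed_buildings": ["reward"],
}

def causal_chain(node, graph):
    """Depth-first walk from an action node down to the reward node."""
    chain = [node]
    for child in graph.get(node, []):
        if child == "reward":
            return chain + ["reward"]
        sub = causal_chain(child, graph)
        if sub and sub[-1] == "reward":
            return chain + sub
    return chain

chain = causal_chain("build_supply_depot", causal_graph)
print("Why build_supply_depot? Because it increases "
      + ", which increases ".join(chain[1:-1])
      + ", which increases the expected reward.")
```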
Figure 5: Action influence graph of a Starcraft II agent. The causal chain (explanation) for action As is depicted in bold arrows
and the extracted explanan (subset of causes given the explanation) is shown as darkened nodes. The counterfactual action (why
not Ab ) explanan is shown as grayed node (B). Here, As is the explanandum, the action for which the user needs explanation.
Thus, we can answer the question "Why not build_barrack (Ab )?". Indeed, the explanation provided by the graph in bold
arrows is: "Because it is more desirable to do action build_supply_depot (As ) to have more Supply Depots as the goal is to
have more Destroyed Units (Du ) and Destroyed Buildings (Db )". Reproduced with permission of [25].
The authors of [36] also learn explanations along with the model policy on pedestrian collision avoidance
tasks. In this paper, an ensemble of LSTM networks was trained using Monte Carlo Dropout [67] and
bootstrapping [68] to estimate collision probabilities and thus predict uncertainty estimates to detect novel
observations. The magnitude of those uncertainty estimates was shown to reveal novel obstacles in a variety
of scenarios, indicating that the model knows what it does not know. The result is a collision avoidance policy
that can measure the novelty of an observation (via model uncertainty) and cautiously avoid pedestrians
that exhibit unseen behavior. Measures of model uncertainty can also be used to identify unseen data during
training or testing. Policies evaluated in simulation were demonstrated to be more robust to novel observations
and to take safer actions than an uncertainty-unaware baseline. This work also addresses the problem of safe
reinforcement learning [69], whose goal is to ensure reasonable system performance and/or respect safety
constraints also during the deployment phase.
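As a simplified illustration of how such uncertainty estimates can be obtained (not the authors' exact LSTM ensemble), the sketch below keeps dropout active at inference time and treats the spread of repeated stochastic forward passes as an uncertainty signal; the toy network, input size and novelty threshold are made up.

```python
# Minimal MC-Dropout uncertainty sketch (illustrative; architecture is made up).
import torch
import torch.nn as nn

model = nn.Sequential(                     # toy collision-probability predictor
    nn.Linear(10, 64), nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(64, 1), nn.Sigmoid())

def mc_dropout_predict(x, n_samples=50):
    """Run several stochastic forward passes with dropout kept active."""
    model.train()                          # keeps nn.Dropout sampling masks
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(dim=0), preds.std(dim=0)

obs = torch.randn(1, 10)                   # observation of nearby pedestrians
p_collision, uncertainty = mc_dropout_predict(obs)
if uncertainty.item() > 0.05:              # arbitrary novelty threshold
    print("Novel situation: fall back to a cautious policy.")
```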
Some work has also been done to explain multiagent RL. Wang et al. [26] developed an approach
named Shapley Q-value Deep Deterministic Policy Gradient (SQDDPG) to solve global reward games in
a multiagent context, based on Shapley values and DDPG. The proposed approach relies on distributing
the global reward more efficiently across all agents. They show that integrating Shapley values into DDPG
enables sharing the global reward between all agents according to their contributions: the more an agent
contributes, the more reward it gets. This contrasts with the classical shared reward approach, which
can cause inefficient learning by assigning rewards to an agent that contributed poorly. The experiments
showed that SQDDPG presents a faster convergence rate and fairer credit assignment in comparison with other
algorithms (i.e. IA2C, IDDPG, COMA and MADDPG). This method allows plotting the credit assigned to
each agent, which can explain how the global reward is divided during training and which agent contributed
the most to obtaining it.
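The sketch below gives a generic Monte Carlo estimate of Shapley values for agent credit assignment, with a toy characteristic function standing in for the learned coalition value used by SQDDPG; it illustrates the underlying principle rather than the exact algorithm of [26], and all names and numbers are invented.

```python
# Monte Carlo Shapley-value credit assignment for agents (illustrative only).
import random

random.seed(0)
agents = ["predator_1", "predator_2", "predator_3"]

def coalition_value(coalition):
    """Toy stand-in for the learned value of a coalition of agents."""
    base = {"predator_1": 3.0, "predator_2": 1.0, "predator_3": 0.5}
    bonus = 1.5 if {"predator_1", "predator_2"} <= set(coalition) else 0.0
    return sum(base[a] for a in coalition) + bonus

def shapley(agent, n_permutations=2000):
    """Average marginal contribution of `agent` over random join orders."""
    total = 0.0
    for _ in range(n_permutations):
        order = random.sample(agents, len(agents))
        before = order[:order.index(agent)]
        total += coalition_value(before + [agent]) - coalition_value(before)
    return total / n_permutations

credits = {a: shapley(a) for a in agents}
print({a: round(v, 2) for a, v in credits.items()})   # fair per-agent credit
```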
Figure 6: Credit assignment to each predator for a fixed trajectory in prey and predator task (Multiagent Particles environment
[70]). Left figure: Trajectory sampled by an expert policy. The square represents the initial position whereas the circle
indicates the final position of each agent. The dots on the trajectory indicate each agent’s temporary positions. Right figures:
normalized credit assignments generated by different multiagent RL algorithms according to this trajectory. SQDDPG presents
fairer credit assignments in comparison with other methods. Reproduced with permission of Jianhong Wang [26].
Figure 7: Setup with initial state and goal diagonally opposed on the table. The heat maps show the value of the different areas
(highest in yellow) for the high-level agent to predict a sub-goal. Black squares represent the position of the cube, the red circle
is the end goal. Thus, the low-level agent will have a succession of sub-goals (e.g. multiple actions that the robotic arm must
perform, such as moving or opening its gripper) that will ultimately lead to the achievement of the high-level goal (i.e. grasping
the red ball). Reproduced with permission of [27].
Figure 8: MiniGrid environment [74], where the agent is instructed through a textual string to pick up an object and place
it next to another one. The model learns to represent the achieved goal (e.g. "Pick the purple ball") via language. As this
achieved goal differs from the initial goal ("Pick the red ball"), the goal mapper relabels the episode, and both trajectories are
appended to the replay buffer. Reproduced with permission of M. Seurin [28].
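The following pure-Python sketch illustrates the general hindsight relabeling idea behind this approach (goal substitution and reward recomputation); the textual goals, transition format and reward function are invented and much simpler than the language-based goal mapper of [28].

```python
# Hindsight relabeling sketch (illustrative; goals and rewards are invented).
def reward(achieved_goal, goal):
    return 1.0 if achieved_goal == goal else 0.0

replay_buffer = []

def store_with_hindsight(trajectory, goal):
    """Store the original episode and a relabeled copy whose goal is what the
    agent actually achieved, so a failed episode still provides learning signal."""
    achieved = trajectory[-1]["achieved_goal"]
    for t in trajectory:
        replay_buffer.append({**t, "goal": goal,
                              "reward": reward(t["achieved_goal"], goal)})
        replay_buffer.append({**t, "goal": achieved,
                              "reward": reward(t["achieved_goal"], achieved)})

episode = [{"obs": 0, "action": "pick", "achieved_goal": "purple ball"}]
store_with_hindsight(episode, goal="red ball")
print(replay_buffer)   # one failed transition plus one relabeled successful one
```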
current one. 3) the semantic grounding module used to recognize rooms: it allows the detection of the current
room and incorporates semantic understanding by generating questions about what the agent saw ("Did you
see a bathroom?"). Self-supervision is then used for fine-tuning on unseen environments. Explainability
can be derived from the outputs of all parts of the model: we can get information about which room
is detected by the agent, which rooms are targeted next (sub-goals), which rooms are predicted around
the current room, and which rooms have already been seen by the agent.
An original idea proposed by Tasse et al. [30] consists of making an agent learn basic tasks and then
allowing it to perform new ones by composing the previously learned tasks into a Boolean formula (i.e., with
conjunctions, disjunctions and negations). The main strength of this method is that the agent is able to
perform new tasks without requiring a learning phase. From an XRL point of view, the explainability
comes from the fact that the agent is able to express its actions as Boolean formulas, which are easily readable
by humans.
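A small numpy sketch of this compositional idea is given below, using the element-wise max/min composition of task value functions that, as we understand it, underlies the disjunction and conjunction operators in [30]; the Q-tables are toy numbers rather than trained values.

```python
# Sketch of Boolean task composition over learned Q-tables (illustrative).
# Disjunction ~ element-wise max, conjunction ~ element-wise min, following
# our reading of [30]; the Q-tables below are toy numbers, not trained values.
import numpy as np

n_states, n_actions = 4, 3
rng = np.random.default_rng(1)
q_collect_blue = rng.uniform(0, 1, (n_states, n_actions))    # learned base task 1
q_collect_square = rng.uniform(0, 1, (n_states, n_actions))  # learned base task 2

q_blue_or_square = np.maximum(q_collect_blue, q_collect_square)   # disjunction
q_blue_and_square = np.minimum(q_collect_blue, q_collect_square)  # conjunction

state = 2
print("blue OR square  ->", q_blue_or_square[state].argmax())
print("blue AND square ->", q_blue_and_square[state].argmax())
# The composed task itself reads as a Boolean formula: "blue AND square".
```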
Figure 9: Comparison of Jacobian saliency (left) first introduced by Simonyan et al. [78] to the authors’ perturbation-based
approach (right) in an actor-critic model. Red indicates saliency for the critic; blue is saliency for the actor. Reproduced with
permission of Sam Greydanus [31].
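For reference, a minimal sketch of the Jacobian saliency baseline mentioned in the caption (the gradient of the chosen output with respect to the input pixels, in the spirit of [78]) is given below; the tiny network is a made-up stand-in, not an actual Atari agent or the perturbation-based method of [31].

```python
# Minimal Jacobian (gradient) saliency sketch in the spirit of [78]
# (illustrative; the network is a toy stand-in, not an Atari agent).
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Flatten(), nn.Linear(84 * 84, 128), nn.ReLU(),
                       nn.Linear(128, 6))          # 6 discrete actions

frame = torch.rand(1, 84, 84, requires_grad=True)  # input "pixels"
logits = policy(frame)
chosen_action = logits.argmax(dim=1).item()
logits[0, chosen_action].backward()                # d(chosen logit) / d(pixels)

saliency = frame.grad.abs().squeeze(0)             # |gradient| per pixel
print(saliency.shape, saliency.max().item())       # could be rendered as a heat map
```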
interaction with the environment. This is done using interaction data collected by the agent that is analysed
using statistical methods organized in a three-level introspection analysis: level 0: environment analysis;
level 1: interaction analysis; level 2: meta-analysis. From these interestingness elements, it is then possible
to generate visual explanations (in the form of videos compiling specific highlight situations of interest in the
agent’s behaviour), where the different introspection levels and their interconnections provide contextualized
explanations.
Figure 10: The interestingness framework. The introspection framework analyses interaction data collected by the agent and
identifies interestingness elements of the interaction history. These elements are used by an explanation framework to expose
the agent’s behavior to a human user. Reproduced with permission from Pedro Sequeira [33].
The authors applied their framework to the game Frogger and used it to generate video highlights of
agents that were included in a user study. The latter showed that no summarizing technique among those
used to generate highlight videos is adapted to all types of agents and scenarios. A related result is that
agents having a monotonous, predictable performance will lack the variety of interactions needed by the
interestingness framework to generate pertinent explanations. Finally, and counter-intuitively, highlighting all
the different aspects of an agent’s interactions is not the best course of action, as it may confuse users by
consecutively showing the best and poorest performances of an agent.
2.3. Other concepts aiding XRL
Some of the studies encountered do not fit in the above categories, mainly because they are not directly linked
to RL or do not directly provide explanations; nonetheless, they present interesting concepts that could
contribute to the creation of new XRL methods in the future.
Distillation has also been used to learn tasks that are closely related and whose learning should speed
up the learning of neighbouring tasks, as in the DisCoRL model [97], which helps transfer from simulation to real
settings in navigation and goal-based robotic tasks. We may then be able to further explain each policy
along the training timeline, or each learned skill separately.
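A minimal sketch of the distillation loss typically used in such policy distillation (a KL divergence that makes the student match the teacher's action distribution) is shown below; the networks and data are placeholders, not the DisCoRL pipeline.

```python
# Policy-distillation loss sketch (illustrative; not the DisCoRL pipeline).
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Linear(8, 4)            # frozen teacher policy head (toy)
student = nn.Linear(8, 4)            # student being distilled
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

states = torch.randn(64, 8)          # states collected with the teacher policy
with torch.no_grad():
    teacher_probs = F.softmax(teacher(states), dim=1)

for _ in range(200):
    student_log_probs = F.log_softmax(student(states), dim=1)
    # KL(teacher || student): make the student match the teacher's distribution.
    loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()
```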
3. Discussion
Although explainable deep RL is still an emerging research field, we observed that numerous approaches
have been developed so far, as detailed in Section 2. However, there is no clear-cut method that serves all purposes.
Most of the reviewed XRL methods are specifically designed to fit a particular task, often related to games
or robotics, with no straightforward extension to other real-world RL applications. Furthermore, those
methods cannot be generalized to other tasks or algorithms as they often make specific assumptions (e.g. on
the MDP or environment properties). In fact, in XRL there can be more than one model (as in Actor-Critic
architectures) and different kinds of algorithms (DQN, DDPG, SARSA...), each with its own particularities.
Moreover, there exists a wide variety of environments, each bringing its own constraints. The necessity
to adapt to the considered algorithm and environment means that it is hard to provide a holistic or generic
explainability method. Thus, in our opinion, Shapley value-based methods [15, 26] can be considered an
interesting lead towards this goal. Shapley values could be used to explain the roles taken by agents
when learning a policy to achieve a collaborative task, but also to detect defects in training agents or in the
data fed to the network. In addition, as a post-hoc explainability method, it may be possible to generalize
Shapley value computation to numerous RL environments and models, in the same way it was done with
SHAP [15] for other black-box Deep Learning classifiers or regressors.
Meanwhile, the research community would benefit if more global-oriented approaches, which do not focus
on a particular task or algorithm, were developed in the future, as it has already been done in general XAI,
with for instance LIME [14] or SHAP [15].
Moreover, some promising approaches to bring explainability to RL include representation-learning-related
concepts such as Hindsight Experience Replay, Hierarchical RL and self-attention. However, despite the
ability of those concepts to improve performance and interpretability in a mathematical sense (in particular
representation learning), they somehow lack concrete explanations targeted at end users, as they mostly
address technical domain experts and researchers. This is a key element to further develop in order to allow the
deployment of RL in the real world and to make algorithms more trustable and understandable by the general
public.
The state of the art shows there is still room for progress in better explaining deep RL models
in terms of the preservation of different invariants and other common assumptions of disentangled representation
learning [103, 104].
We reviewed and analyzed different state-of-the-art approaches to RL and how XAI techniques can
elucidate and inform their training, debugging and communication to different stakeholder audiences.
We focused on agent-based RL in this work; however, explainability in RL involving humans (e.g. in
collaborative problem solving [105]) should involve explainability methods to better assess when robots
are able to perform the requested task, and when uncertainty indicates that a task is better relayed to a
human. Equally important is to evaluate and explain other aspects of reinforcement learning, e.g. formally
explaining the role of curriculum learning [106], quality diversity or other human-learning-inspired aspects
of open-ended learning [44, 107, 108]. Thus, more theoretical bases to serve explainable-by-design DRL are
required. The future development of post-hoc XAI techniques should adapt to the requirements to build,
train, and convey DRL models. Furthermore, it is worth noting that all presented methods decompose the final
prediction into additive components attributed to particular features [109]; thus, interactions between
features should be accounted for and included in the elaboration of explanations. Since most presented strategies
to explain RL have mainly considered discrete model interpretations, as advocated in
[110], continuous formulations of the proposed approaches (such as Integrated Gradients [111], which is based on
the Aumann-Shapley value, the continuous extension of the Shapley value used in cost-sharing) should be devised
for RL contexts in the future.
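As an example of such a continuous attribution method, the sketch below implements a basic Integrated Gradients approximation [111] applied to the chosen action of a toy policy network; the network, state and baseline are placeholders, not a trained RL agent.

```python
# Basic Integrated Gradients sketch [111] on a toy policy network (illustrative).
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(6, 32), nn.ReLU(), nn.Linear(32, 3))

def integrated_gradients(x, baseline, target_action, steps=64):
    """Average gradients along the straight path baseline -> x, then scale by
    (x - baseline); attributions sum to roughly f(x) - f(baseline)."""
    alphas = torch.linspace(0.0, 1.0, steps).unsqueeze(1)
    path = baseline + alphas * (x - baseline)          # (steps, n_features)
    path.requires_grad_(True)
    outputs = policy(path)[:, target_action].sum()
    grads = torch.autograd.grad(outputs, path)[0]
    return (x - baseline) * grads.mean(dim=0)

state = torch.tensor([[0.5, -1.2, 0.3, 0.0, 2.0, -0.7]])
baseline = torch.zeros_like(state)
action = policy(state).argmax(dim=1).item()
attributions = integrated_gradients(state, baseline, action)
print(attributions)       # per-feature contribution to the chosen action's score
```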
We believe the reviewed approaches, and future extensions tackling the identified issues, will likely be
critical in the demanding future applications of RL. We advocate for the need to target, in the future,
more diverse audiences (developers, testers, end-users, the general public) not yet approached in the development of
XAI tools. Only in this way will we produce actionable explanations and more comprehensive frameworks for
explainable, trustable and responsible RL that can be deployed in practice.
5. Acknowledgements
We thank Sam Greydanus, Zoe Juozapaitis, Benjamin Beyret, Prashan Madumal, Pedro Sequeira,
Jianhong Wang, Mathieu Seurin and Vinicius Zambaldi for allowing us to use their original images for
illustration purposes. We would also like to thank Frédéric Herbreteau and Adrien Bennetot for their help
and support.
References
[1] R. S. Sutton, A. G. Barto, Reinforcement Learning: An Introduction, 2nd Edition, The MIT Press, 2018.
URL http://incompleteideas.net/book/the-book-2nd.html
[2] Y. Duan, X. Chen, R. Houthooft, J. Schulman, P. Abbeel, Benchmarking deep reinforcement learning for continuous
control (2016). arXiv:1604.06778.
URL https://arxiv.org/pdf/1604.06778.pdf
[3] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, M. Riedmiller, Playing atari with deep
reinforcement learning, arXiv preprint arXiv:1312.5602 (2013).
URL https://arxiv.org/pdf/1312.5602.pdf
[4] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland,
G. Ostrovski, et al., Human-level control through deep reinforcement learning, Nature 518 (7540) (2015) 529.
URL https://storage.googleapis.com/deepmind-data/assets/papers/DeepMindNature14236Paper.pdf
[5] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel,
T. Lillicrap, K. Simonyan, D. Hassabis, Mastering chess and shogi by self-play with a general reinforcement learning
algorithm (2017). arXiv:1712.01815.
URL https://arxiv.org/pdf/1712.01815.pdf
[6] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. R. Baker, M. Lai, A. Bolton,
Y. chen, T. P. Lillicrap, F. Hui, L. Sifre, G. van den Driessche, T. Graepel, D. Hassabis, Mastering the game of go without
human knowledge, Nature 550 (2017) 354–359.
URL https://www.nature.com/articles/nature24270
[7] O. M. Andrychowicz, B. Baker, M. Chociej, R. Józefowicz, B. McGrew, J. Pachocki, A. Petron, M. Plappert, G. Pow-
ell, A. Ray, et al., Learning dexterous in-hand manipulation, The International Journal of Robotics Research (2019).
doi:10.1177/0278364919887447.
URL http://dx.doi.org/10.1177/0278364919887447
[8] D. Kalashnikov, A. Irpan, P. Pastor, J. Ibarz, A. Herzog, E. Jang, D. Quillen, E. Holly, M. Kalakrishnan, V. Vanhoucke,
S. Levine, Qt-opt: Scalable deep reinforcement learning for vision-based robotic manipulation (2018). arXiv:1806.10293.
URL https://arxiv.org/pdf/1806.10293.pdf
[9] H. Mao, M. Alizadeh, I. Menache, S. Kandula, Resource management with deep reinforcement learning (2016) 50–56.
doi:10.1145/3005745.3005750.
URL http://doi.acm.org/10.1145/3005745.3005750
[10] I. Arel, C. Liu, T. Urbanik, A. G. Kohls, Reinforcement learning-based multi-agent system for network traffic signal
control, IET Intelligent Transport Systems 4 (2) (2010) 128–135. doi:10.1049/iet-its.2009.0070.
URL http://web.eecs.utk.edu/~ielhanan/Papers/IET_ITS_2010.pdf
[11] Z. Zhou, X. Li, R. N. Zare, Optimizing chemical reactions with deep reinforcement learning, ACS Central Science 3 (12)
(2017) 1337–1344. PMID: 29296675. doi:10.1021/acscentsci.7b00492.
URL https://doi.org/10.1021/acscentsci.7b00492
[12] G. Zheng, F. Zhang, Z. Zheng, Y. Xiang, N. Yuan, X. Xie, Z. Li, Drn: A deep reinforcement learning framework for news
recommendation (2018) 167–176. doi:10.1145/3178876.3185994.
URL http://www.personal.psu.edu/~gjz5038/paper/www2018_reinforceRec/www2018_reinforceRec.pdf
[13] D. Gunning, D. W. Aha, Darpa’s explainable artificial intelligence program, AI Magazine 40 (2) (2019) 44–58.
URL https://search.proquest.com/openview/df03d4be3ad9847da3e414fb57bc1f10
[14] M. T. Ribeiro, S. Singh, C. Guestrin, "why should i trust you?": Explaining the predictions of any classifier (2016).
arXiv:1602.04938.
[15] S. Lundberg, S.-I. Lee, A unified approach to interpreting model predictions (2017). arXiv:1705.07874.
URL https://arxiv.org/pdf/1705.07874
[16] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, D. Batra, Grad-cam: Visual explanations from deep networks
via gradient-based localization, International Journal of Computer Vision (Oct 2019). doi:10.1007/s11263-019-01228-7.
URL http://dx.doi.org/10.1007/s11263-019-01228-7
[17] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell,
S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse,
M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei,
Language models are few-shot learners (2020). arXiv:2005.14165.
[18] A. B. Arrieta, N. Díaz-Rodríguez, J. D. Ser, A. Bennetot, S. Tabik, A. Barbado, S. García, S. Gil-López, D. Molina,
R. Benjamins, R. Chatila, F. Herrera, Explainable artificial intelligence (xai): Concepts, taxonomies, opportunities and
challenges toward responsible ai (2019). arXiv:1910.10045.
URL https://arxiv.org/pdf/1910.10045.pdf
[19] F. Doshi-Velez, B. Kim, Towards a rigorous science of interpretable machine learning (2017). arXiv:1702.08608.
URL https://arxiv.org/pdf/1702.08608.pdf
[20] R. R. Hoffman, S. T. Mueller, G. Klein, J. Litman, Metrics for explainable ai: Challenges and prospects (2018).
arXiv:1812.04608.
URL https://arxiv.org/pdf/1812.04608
[21] V. Zambaldi, D. Raposo, A. Santoro, V. Bapst, Y. Li, I. Babuschkin, K. Tuyls, D. Reichert, T. Lillicrap, E. Lockhart,
M. Shanahan, V. Langston, R. Pascanu, M. Botvinick, O. Vinyals, P. Battaglia, Relational deep reinforcement learning
(2018). arXiv:1806.01830.
URL https://arxiv.org/pdf/1806.01830
[22] A. d’Avila Garcez, A. Resende Riquetti Dutra, E. Alonso, Towards Symbolic Reinforcement Learning with Common
Sense, arXiv preprint arXiv:1804.08597 (2018).
URL https://ui.adsabs.harvard.edu/abs/2018arXiv180408597D
[23] A. Raffin, A. Hill, K. R. Traoré, T. Lesort, N. D. Rodríguez, D. Filliat, Decoupling feature extraction from policy learning:
assessing benefits of state representation learning in goal based robotics, CoRR abs/1901.08651 (2019). arXiv:1901.08651.
URL http://arxiv.org/abs/1901.08651
[24] Z. Juozapaitis, A. Koul, A. Fern, M. Erwig, F. Doshi-Velez, Explainable reinforcement learning via reward decomposition.
URL http://web.engr.oregonstate.edu/~afern/papers/reward_decomposition__workshop_final.pdf
[25] P. Madumal, T. Miller, L. Sonenberg, F. Vetere, Explainable reinforcement learning through a causal lens (2019).
arXiv:1905.10958.
URL https://arxiv.org/pdf/1905.10958.pdf
[26] J. Wang, Y. Zhang, T.-K. Kim, Y. Gu, Shapley q-value: A local reward approach to solve global reward games (2019).
arXiv:1907.05707.
URL https://arxiv.org/pdf/1907.05707.pdf
[27] B. Beyret, A. Shafti, A. Faisal, Dot-to-dot: Explainable hierarchical reinforcement learning for robotic manipulation (04
2019).
URL https://arxiv.org/pdf/1904.06703.pdf
[28] G. Cideron, M. Seurin, F. Strub, O. Pietquin, Self-educated language agent with hindsight experience replay for instruction
following (2019). arXiv:1910.09451.
URL https://arxiv.org/pdf/1910.09451.pdf
[29] D. Yu, C. Khatri, A. Papangelis, A. Madotto, M. Namazifar, J. Huizinga, A. Ecoffet, H. Zheng, P. Molino, J. Clune,
et al., Commonsense and semantic-guided navigation through language in embodied environment.
URL https://vigilworkshop.github.io/static/papers/49.pdf
[30] G. N. Tasse, S. James, B. Rosman, A boolean task algebra for reinforcement learning (2020). arXiv:2001.01394.
URL https://arxiv.org/pdf/2001.01394.pdf
[31] S. Greydanus, A. Koul, J. Dodge, A. Fern, Visualizing and understanding atari agents (2017). arXiv:1711.00138.
URL https://arxiv.org/pdf/1711.00138
[32] P. Sequeira, E. Yeh, M. T. Gervasio, Interestingness elements for explainable reinforcement learning through introspection
(2019).
URL https://explainablesystems.comp.nus.edu.sg/2019/wp-content/uploads/2019/02/IUI19WS-ExSS2019-1.pdf
[33] P. Sequeira, M. Gervasio, Interestingness elements for explainable reinforcement learning: Understanding agents’ capabilities
and limitations (2019). arXiv:1912.09007.
URL https://arxiv.org/pdf/1912.09007
[34] T. Haarnoja, V. Pong, A. Zhou, M. Dalal, P. Abbeel, S. Levine, Composable deep reinforcement learning for robotic
manipulation, 2018 IEEE International Conference on Robotics and Automation (ICRA) (May 2018). doi:10.1109/icra.
2018.8460756.
URL https://arxiv.org/pdf/1803.06773.pdf
[35] A. Al-Yacoub, Y. Zhao, N. Lohse, M. Goh, P. Kinnell, P. Ferreira, E.-M. Hubbard, Symbolic-based recognition of contact
states for learning assembly skills, Frontiers in Robotics and AI 6 (2019) 99. doi:10.3389/frobt.2019.00099.
URL https://www.frontiersin.org/article/10.3389/frobt.2019.00099
[36] B. Lütjens, M. Everett, J. P. How, Safe reinforcement learning with model uncertainty estimates (2018). arXiv:1810.08700.
URL https://arxiv.org/abs/1810.08700
[37] A. Raffin, A. Hill, R. Traoré, T. Lesort, N. Díaz-Rodríguez, D. Filliat, S-rl toolbox: Environments, datasets and evaluation
metrics for state representation learning (2018). arXiv:1809.09369.
URL https://arxiv.org/abs/1809.09369
[38] M. Garnelo, M. Shanahan, Reconciling deep learning with symbolic artificial intelligence: representing objects and
relations, Current Opinion in Behavioral Sciences 29 (2019) 17–23. doi:10.1016/j.cobeha.2018.12.010.
URL https://www.researchgate.net/publication/336180670_Reconciling_deep_learning_with_symbolic_artificial_intelligence_representing_objects_and_relations
[39] D. Pathak, P. Agrawal, A. A. Efros, T. Darrell, Curiosity-driven exploration by self-supervised prediction (2017).
arXiv:1705.05363.
URL https://arxiv.org/abs/1705.05363
[40] T. Lesort, N. Díaz-Rodríguez, J.-F. Goudou, D. Filliat, State representation learning for control: An overview, Neural
Networks 108 (2018) 379–392. doi:10.1016/j.neunet.2018.07.006.
URL http://dx.doi.org/10.1016/j.neunet.2018.07.006
[41] Y. Bengio, A. Courville, P. Vincent, Representation learning: A review and new perspectives (2012). arXiv:1206.5538.
URL https://arxiv.org/abs/1206.5538
[42] T. Lesort, M. Seurin, X. Li, N. Díaz-Rodríguez, D. Filliat, Deep unsupervised state representation learning with robotic
priors: a robustness analysis, in: 2019 International Joint Conference on Neural Networks (IJCNN), 2019, pp. 1–8.
URL https://hal.archives-ouvertes.fr/hal-02381375/document
[43] R. Traoré, H. Caselles-Dupré, T. Lesort, T. Sun, G. Cai, N. Díaz-Rodríguez, D. Filliat, Discorl: Continual reinforcement
learning via policy distillation (2019). arXiv:1907.05855.
URL https://arxiv.org/abs/1907.05855
[44] S. Doncieux, N. Bredeche, L. L. Goff, B. Girard, A. Coninx, O. Sigaud, M. Khamassi, N. Díaz-Rodríguez, D. Filliat,
T. Hospedales, A. Eiben, R. Duro, Dream architecture: a developmental approach to open-ended learning in robotics
(2020). arXiv:2005.06223.
URL https://arxiv.org/abs/2005.06223
[45] S. Doncieux, D. Filliat, N. Díaz-Rodríguez, T. Hospedales, R. Duro, A. Coninx, D. M. Roijers, B. Girard, N. Perrin,
O. Sigaud, Open-ended learning: A conceptual framework based on representational redescription, Frontiers in Neuro-
robotics 12 (2018) 59. doi:10.3389/fnbot.2018.00059.
URL https://www.frontiersin.org/article/10.3389/fnbot.2018.00059
[46] S. Alvernaz, J. Togelius, Autoencoder-augmented neuroevolution for visual doom playing (2017). arXiv:1707.03902.
URL https://arxiv.org/pdf/1707.03902
[47] C. Finn, X. Y. Tan, Y. Duan, T. Darrell, S. Levine, P. Abbeel, Deep spatial autoencoders for visuomotor learning (2015).
arXiv:1509.06113.
URL https://arxiv.org/pdf/1509.06113
[48] H. van Hoof, N. Chen, M. Karl, P. van der Smagt, J. Peters, Stable reinforcement learning with autoencoders for tactile
and visual data, 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2016) 3928–3934.
URL https://www.ias.informatik.tu-darmstadt.de/uploads/Site/EditPublication/hoof2016IROS.pdf
[49] E. Shelhamer, P. Mahmoudieh, M. Argus, T. Darrell, Loss is its own reward: Self-supervision for reinforcement learning
(2016). arXiv:1612.07307.
URL https://arxiv.org/abs/1612.07307
[50] R. Jonschkowski, O. Brock, Learning state representations with robotic priors, Autonomous Robots (2015) 407–428.
URL https://doi.org/10.1007/s10514-015-9459-7
[51] I. Higgins, D. Amos, D. Pfau, S. Racaniere, L. Matthey, D. Rezende, A. Lerchner, Towards a definition of disentangled
representations (2018). arXiv:1812.02230.
URL https://arxiv.org/abs/1812.02230
[52] A. Achille, T. Eccles, L. Matthey, C. Burgess, N. Watters, A. Lerchner, I. Higgins, Life-long disentangled representation
learning with cross-domain latent homologies, in: S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi,
R. Garnett (Eds.), Advances in Neural Information Processing Systems 31, Curran Associates, Inc., 2018, pp. 9873–9883.
URL http://papers.nips.cc/paper/8193-life-long-disentangled-representation-learning-with-cross-domain-latent-homologies.pdf
[53] A. Achille, S. Soatto, Emergence of invariance and disentanglement in deep representations (2017). arXiv:1706.01350.
URL https://arxiv.org/abs/1706.01350
[54] H. Caselles-Dupré, M. Garcia Ortiz, D. Filliat, Symmetry-based disentangled representation learning requires interaction
with environments, in: H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, R. Garnett (Eds.), Advances
in Neural Information Processing Systems 32, Curran Associates, Inc., 2019, pp. 4606–4615.
URL http://papers.nips.cc/paper/8709-symmetry-based-disentangled-representation-learning-requires-interaction-with-environments.pdf
[55] D. P. Kingma, M. Welling, Auto-encoding variational bayes (2013). arXiv:1312.6114.
URL https://arxiv.org/pdf/1312.6114.pdf
[56] T. Lesort, V. Lomonaco, A. Stoian, D. Maltoni, D. Filliat, N. D. Rodríguez, Continual learning for robotics, ArXiv
abs/1907.00182 (2019).
URL https://arxiv.org/pdf/1907.00182.pdf
[57] X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, P. Abbeel, Infogan: Interpretable representation learning by
information maximizing generative adversarial nets, CoRR abs/1606.03657 (2016). arXiv:1606.03657.
URL http://arxiv.org/abs/1606.03657
[58] A. Garcez, T. Besold, L. De Raedt, P. Földiák, P. Hitzler, T. Icard, K.-U. Kühnberger, L. Lamb, R. Miikkulainen,
D. Silver, Neural-symbolic learning and reasoning: Contributions and challenges, 2015. doi:10.13140/2.1.1779.4243.
URL https://www.researchgate.net/publication/268034411_Neural-Symbolic_Learning_and_Reasoning_Contributions_and_Challenges
[59] A. Santoro, D. Raposo, D. G. T. Barrett, M. Malinowski, R. Pascanu, P. Battaglia, T. Lillicrap, A simple neural network
module for relational reasoning (2017). arXiv:1706.01427.
URL https://arxiv.org/pdf/1706.01427
[60] M. Garnelo, K. Arulkumaran, M. Shanahan, Towards deep symbolic reinforcement learning (2016). arXiv:1609.05518.
URL https://arxiv.org/pdf/1609.05518
[61] M. Denil, S. G. Colmenarejo, S. Cabi, D. Saxton, N. de Freitas, Programmable agents (2017). arXiv:1706.06383.
[62] T. N. Kipf, M. Welling, Semi-supervised classification with graph convolutional networks (2016). arXiv:1609.02907.
[63] P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo,
A. Santoro, R. Faulkner, C. Gulcehre, F. Song, A. Ballard, J. Gilmer, G. Dahl, A. Vaswani, K. Allen, C. Nash, V. Langston,
C. Dyer, N. Heess, D. Wierstra, P. Kohli, M. Botvinick, O. Vinyals, Y. Li, R. Pascanu, Relational inductive biases, deep
learning, and graph networks (2018). arXiv:1806.01261.
[64] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, G. Monfardini, The graph neural network model, IEEE Transactions
on Neural Networks 20 (1) (2009) 61–80.
URL https://persagen.com/files/misc/scarselli2009graph.pdf
[65] A. Zhang, H. Satija, J. Pineau, Decoupling dynamics and reward for transfer learning (2018). arXiv:1804.10689.
URL https://arxiv.org/abs/1804.10689
[66] J. Y. Halpern, J. Pearl, Causes and Explanations: A Structural-Model Approach. Part I: Causes, The British Journal
for the Philosophy of Science 56 (4) (2005) 843–887. doi:10.1093/bjps/axi147.
URL https://doi.org/10.1093/bjps/axi147
[67] Y. Gal, Z. Ghahramani, Dropout as a bayesian approximation: Representing model uncertainty in deep learning (2015).
arXiv:1506.02142.
URL https://arxiv.org/abs/1506.02142
[68] I. Osband, C. Blundell, A. Pritzel, B. V. Roy, Deep exploration via bootstrapped dqn (2016). arXiv:1602.04621.
URL https://arxiv.org/abs/1602.04621
[69] J. García, F. Fernández, A comprehensive survey on safe reinforcement learning, Journal of Machine Learning
Research 16 (42) (2015) 1437–1480.
URL http://jmlr.org/papers/v16/garcia15a.html
[70] R. Lowe, Y. Wu, A. Tamar, J. Harb, P. Abbeel, I. Mordatch, Multi-agent actor-critic for mixed cooperative-competitive
environments, Neural Information Processing Systems (NIPS) (2017).
URL https://arxiv.org/pdf/1706.02275.pdf
[71] H. van Seijen, M. Fatemi, J. Romoff, R. Laroche, T. Barnes, J. Tsang, Hybrid reward architecture for reinforcement
learning (2017). arXiv:1706.04208.
URL https://arxiv.org/pdf/1706.04208
[72] H. Kawano, Hierarchical sub-task decomposition for reinforcement learning of multi-robot delivery mission (2013) 828–
835. doi:10.1109/ICRA.2013.6630669.
URL https://doi.org/10.1109/ICRA.2013.6630669
[73] M. Andrychowicz, F. Wolski, A. Ray, J. Schneider, R. Fong, P. Welinder, B. McGrew, J. Tobin, P. Abbeel, W. Zaremba,
Hindsight experience replay (2017). arXiv:1707.01495.
URL https://arxiv.org/pdf/1707.01495.pdf
[74] M. Chevalier-Boisvert, L. Willems, S. Pal, Minimalistic gridworld environment for openai gym, https://github.com/maximecb/gym-minigrid (2018).
[75] T. N. Mundhenk, B. Y. Chen, G. Friedland, Efficient saliency maps for explainable ai (2019). arXiv:1911.11293.
URL https://arxiv.org/pdf/1911.11293
[76] S. Jain, B. C. Wallace, Attention is not explanation (2019). arXiv:1902.10186.
URL https://arxiv.org/pdf/1902.10186
[77] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley, D. Silver, K. Kavukcuoglu, Asynchronous methods
for deep reinforcement learning (2016). arXiv:1602.01783.
URL https://arxiv.org/pdf/1602.01783
[78] K. Simonyan, A. Vedaldi, A. Zisserman, Deep inside convolutional networks: Visualising image classification models and
saliency maps (2013). arXiv:1312.6034.
URL https://arxiv.org/pdf/1312.6034.pdf
[79] P.-J. Kindermans, S. Hooker, J. Adebayo, M. Alber, K. T. Schütt, S. Dähne, D. Erhan, B. Kim, The (un)reliability of
saliency methods (2017). arXiv:1711.00867.
URL https://arxiv.org/pdf/1711.00867
[80] J. Adebayo, J. Gilmer, M. Muelly, I. Goodfellow, M. Hardt, B. Kim, Sanity checks for saliency maps (2018). arXiv:
1810.03292.
URL https://arxiv.org/pdf/1810.03292
[81] H. Caselles-Dupré, M. Garcia-Ortiz, D. Filliat, Symmetry-based disentangled representation learning requires interaction
with environments (2019). arXiv:1904.00243.
[82] Q. Zhang, S.-C. Zhu, Visual interpretability for deep learning: a survey (2018). arXiv:1802.00614.
URL https://arxiv.org/pdf/1802.00614.pdf
[83] Q. Zhang, R. Cao, Y. N. Wu, S.-C. Zhu, Growing interpretable part graphs on convnets via multi-shot learning (2016).
arXiv:1611.04246.
URL https://arxiv.org/pdf/1611.04246
[84] Q. Zhang, R. Cao, F. Shi, Y. N. Wu, S.-C. Zhu, Interpreting cnn knowledge via an explanatory graph (2017). arXiv:
1708.01785.
URL https://arxiv.org/pdf/1708.01785
[85] T. Wu, W. Sun, X. Li, X. Song, B. Li, Towards interpretable r-cnn by unfolding latent structures (2017). arXiv:1711.05226.
URL https://arxiv.org/pdf/1711.05226v2
[86] N. Díaz-Rodríguez, V. Lomonaco, D. Filliat, D. Maltoni, Don’t forget, there is more than forgetting: new metrics for
continual learning (2018). arXiv:1810.13166.
URL https://arxiv.org/pdf/1810.13166
[87] P. Dhar, R. V. Singh, K.-C. Peng, Z. Wu, R. Chellappa, Learning without memorizing (2018). arXiv:1811.08051.
URL https://arxiv.org/pdf/1811.08051v2.pdf
[88] Z. Li, D. Hoiem, Learning without forgetting (2016). arXiv:1606.09282.
URL https://arxiv.org/pdf/1606.09282
[89] R. Hu, A. Rohrbach, T. Darrell, K. Saenko, Language-conditioned graph networks for relational reasoning (2019).
arXiv:1905.04405.
URL https://arxiv.org/abs/1905.04405
[90] M. Baroni, Linguistic generalization and compositionality in modern artificial neural networks, Philosophical Transactions
of the Royal Society B: Biological Sciences 375 (1791) (2019) 20190307. doi:10.1098/rstb.2019.0307.
URL http://dx.doi.org/10.1098/rstb.2019.0307
[91] T. Pierrot, G. Ligner, S. Reed, O. Sigaud, N. Perrin, A. Laterre, D. Kas, K. Beguir, N. de Freitas, Learning compositional
neural programs with recursive tree search and planning (2019). arXiv:1905.12941.
URL https://arxiv.org/pdf/1905.12941
[92] B. D. Ziebart, A. L. Maas, J. A. Bagnell, A. K. Dey, Maximum entropy inverse reinforcement learning, 2008.
URL https://arxiv.org/pdf/1507.04888.pdf
[93] T. Haarnoja, H. Tang, P. Abbeel, S. Levine, Reinforcement learning with deep energy-based policies (2017). arXiv:
1702.08165.
[94] D. Han, K. Doya, J. Tani, Emergence of hierarchy via reinforcement learning using a multiple timescale stochastic RNN,
CoRR abs/1901.10113 (2019). arXiv:1901.10113.
URL http://arxiv.org/abs/1901.10113
[95] E. Kharitonov, M. Baroni, Emergent language generalization and acquisition speed are not tied to compositionality (2020).
arXiv:2004.03420.
[96] R. Chaabouni, E. Kharitonov, E. Dupoux, M. Baroni, Anti-efficient encoding in emergent communication, in: Advances
in Neural Information Processing Systems, 2019, pp. 6290–6300.
URL https://papers.nips.cc/paper/8859-anti-efficient-encoding-in-emergent-communication.pdf
[97] R. Traoré, H. Caselles-Dupré, T. Lesort, T. Sun, G. Cai, N. Díaz Rodríguez, D. Filliat, Discorl: Continual reinforcement
learning via policy distillation, arXiv preprint arXiv:1907.05855 (2019).
URL https://arxiv.org/pdf/1907.05855
[98] P. Abbeel, A. Ng, Apprenticeship learning via inverse reinforcement learning, Proceedings, Twenty-First International
Conference on Machine Learning, ICML 2004 (09 2004). doi:10.1007/978-0-387-30164-8_417.
[99] P. Christiano, J. Leike, T. B. Brown, M. Martic, S. Legg, D. Amodei, Deep reinforcement learning from human preferences
(2017). arXiv:1706.03741.
URL https://arxiv.org/pdf/1706.03741
[100] J. Kim, S. Moon, A. Rohrbach, T. Darrell, J. Canny, Advisable learning for self-driving vehicles by internalizing
observation-to-action rules, in: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
URL https://openaccess.thecvf.com/content_CVPR_2020/html/Kim_Advisable_Learning_for_Self-Driving_Vehicles_by_Internalizing_Observation-to-Action_Rules_CVPR_2020_paper.html
[101] A. Theodorou, R. H. Wortham, J. J. Bryson, Designing and implementing transparency for real time inspection of
autonomous robots, Connection Science 29 (3) (2017) 230–241. arXiv:https://doi.org/10.1080/09540091.2017.1310182,
doi:10.1080/09540091.2017.1310182.
URL https://doi.org/10.1080/09540091.2017.1310182
[102] M. Matarese, S. Rossi, A. Sciutti, F. Rea, Towards transparency of td-rl robotic systems with a human teacher (2020).
arXiv:2005.05926.
[103] F. Locatello, S. Bauer, M. Lucic, G. Rätsch, S. Gelly, B. Schölkopf, O. Bachem, Challenging common assumptions in the
unsupervised learning of disentangled representations, arXiv preprint arXiv:1811.12359 (2018).
URL https://arxiv.org/pdf/1811.12359.pdf
[104] A. Achille, S. Soatto, A separation principle for control in the age of deep learning, arXiv preprint arXiv:1711.03321
(2017).
[105] A. Bennetot, V. Charisi, N. Díaz-Rodríguez, Should artificial agents ask for help in human-robot collaborative problem-
solving?, arXiv preprint arXiv:2006.00882 (2020).
URL https://arxiv.org/pdf/2006.00882.pdf
[106] R. Portelas, C. Colas, L. Weng, K. Hofmann, P.-Y. Oudeyer, Automatic curriculum learning for deep rl: A short survey,
arXiv preprint arXiv:2003.04664 (2020).
URL https://arxiv.org/pdf/2003.04664
[107] J.-B. Mouret, J. Clune, Illuminating search spaces by mapping elites (2015). arXiv:1504.04909.
URL https://arxiv.org/abs/1504.04909
[108] J. K. Pugh, L. B. Soros, K. O. Stanley, Quality diversity: A new frontier for evolutionary computation, Frontiers in
Robotics and AI 3 (2016) 40.
URL https://www.frontiersin.org/articles/10.3389/frobt.2016.00040/full
[109] M. Staniak, P. Biecek, Explanations of model predictions with live and breakdown packages, arXiv preprint:1804.01955
(2018).
URL https://arxiv.org/pdf/1804.01955.pdf
[110] M. Sundararajan, A. Najmi, The many shapley values for model explanation (2019). arXiv:1908.08474.
URL https://arxiv.org/abs/1908.08474
[111] M. Sundararajan, A. Taly, Q. Yan, Axiomatic attribution for deep networks, in: D. Precup, Y. W. Teh (Eds.), Proceedings
of the 34th International Conference on Machine Learning, Vol. 70 of Proceedings of Machine Learning Research, PMLR,
International Convention Centre, Sydney, Australia, 2017, pp. 3319–3328.
URL http://proceedings.mlr.press/v70/sundararajan17a.html
[112] J. Foerster, G. Farquhar, T. Afouras, N. Nardelli, S. Whiteson, Counterfactual multi-agent policy gradients (2017).
arXiv:1705.08926.
URL https://arxiv.org/pdf/1705.08926.pdf
[113] Y. LeCun, Y. Bengio, et al., Convolutional networks for images, speech, and time series, The handbook of brain theory
and neural networks 3361 (10) (1995) 1995.
URL https://www.researchgate.net/publication/2453996_Convolutional_Networks_for_Images_Speech_and_Time-Series
[114] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, D. Wierstra, Continuous control with deep
reinforcement learning (2015). arXiv:1509.02971.
URL https://arxiv.org/abs/1509.02971
[115] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative
adversarial networks (2014). arXiv:1406.2661.
URL https://arxiv.org/pdf/1406.2661.pdf
[116] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, O. Klimov, Proximal policy optimization algorithms (2017). arXiv:
1707.06347.
URL https://arxiv.org/pdf/1707.06347.pdf
[117] R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for accurate object detection and semantic
segmentation (2013). arXiv:1311.2524.
URL https://arxiv.org/pdf/1311.2524.pdf
[118] G. Rummery, M. Niranjan, On-line q-learning using connectionist systems, Technical Report CUED/F-INFENG/TR 166
(11 1994).
URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.17.2539&rep=rep1&type=pdf
6. Appendix
6.1. Glossary
• A2C: Advantage Actor Critic [77]
• AI: Artificial Intelligence