Active Inference and Robot Control-A Case Study

This document discusses implementing active inference, a framework for perception and action, for controlling a simulated 7 degree-of-freedom robotic arm. The researchers show that under different levels of visual and proprioceptive noise, active inference can accurately control the robot. Analyzing the internal system dynamics provides insight into key aspects of active inference, such as its multimodal nature and the different roles of proprioception and vision. Implementing active inference in robots could help model phenomena like sensory attenuation seen in diseases like Parkinson's.


Active inference and robot control: a case study

Léo Pio-Lopez¹,², Ange Nizard¹, Karl Friston³ and Giovanni Pezzulo²

¹Pascal Institute, Clermont University, Clermont-Ferrand, France
²Institute of Cognitive Sciences and Technologies, National Research Council, Rome, Italy
³The Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, London, UK

LP-L, 0000-0001-8081-1070

Active inference is a general framework for perception and action that is gaining prominence in computational and systems neuroscience but is less known outside these fields. Here, we discuss a proof-of-principle implementation of the active inference scheme for the control of the 7-DoF arm of a (simulated) PR2 robot. By manipulating visual and proprioceptive noise levels, we show under which conditions robot control under the active inference scheme is accurate. Besides accurate control, our analysis of the internal system dynamics (e.g. the dynamics of the hidden states that are inferred during the inference) sheds light on key aspects of the framework, such as the quintessentially multimodal nature of control and the differential roles of proprioception and vision. In the discussion, we consider the potential importance of being able to implement active inference in robots. In particular, we briefly review the opportunities for modelling psychophysiological phenomena such as sensory attenuation and related failures of gain control, of the sort seen in Parkinson's disease. We also consider the fundamental difference between active inference and optimal control formulations, showing that in the former the heavy lifting shifts from solving a dynamical inverse problem to creating deep forward or generative models with dynamics, whose attracting sets prescribe desired behaviours.

Cite this article: Pio-Lopez L, Nizard A, Friston K, Pezzulo G. 2016 Active inference and robot control: a case study. J. R. Soc. Interface 13: 20160616. http://dx.doi.org/10.1098/rsif.2016.0616 (rsif.royalsocietypublishing.org)

Received: 3 August 2016. Accepted: 1 September 2016.

Subject Category: Life Sciences – Engineering interface
Subject Areas: systems biology, biomathematics, computational biology
Keywords: active inference, free energy, robot control
1. Introduction
Active inference has recently acquired significant prominence in computational and systems neuroscience as a general theory of brain and behaviour [1,2]. This framework uses a single principle, surprise (or free-energy) minimization, to explain both perception and action. It has been applied to a variety of domains, including perception–action loops and perceptual learning [3,4]; Bayes-optimal sensorimotor integration and predictive control [5]; action selection [6,7]; and goal-directed behaviour [8–12].

Author for correspondence: Giovanni Pezzulo (e-mail: giovanni.pezzulo@istc.cnr.it).
Active inference starts from the fundaments of self-organization, which suggest that any adaptive agent needs to maintain its biophysical states within limits, thereby maintaining a generalized homeostasis that enables it to resist the second law of thermodynamics [2]. To this aim, an agent's actions and perceptions both need to minimize surprise, that is, a measure of discrepancy between the agent's current and predicted (or desired) states. Crucially, agents cannot minimize surprise directly, but they can minimize an upper bound on surprise, namely the free energy of their beliefs about the causes of sensory input [1,2].

This idea is cast in terms of Bayesian inference: the agent is endowed with priors that describe its desired states and a (hierarchical, generative) model of the world. It uses the model to generate continuous predictions that it tries to fulfil via action; that is to say, the agent actively samples the world to minimize prediction errors so that surprise (or its upper bound, free energy) is suppressed. More formally, this is a process in which beliefs about (hidden or
© 2016 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.
latent) states of the world maximize Bayesian model evidence of observations, while observations are sampled selectively to conform to the model [13,14]. The agent has essentially two ways to reduce surprise: change its beliefs or hypotheses (perception), or change the world (action). For example, if it believes that its arm is raised, but observes that it is not, then it can either change its mind or raise the arm; either way, its prediction comes true (and free energy is minimized). As we will see, in active inference this result can be obtained by endowing a Bayesian filtering scheme with reflex arcs that enable action, such as raising a robotic arm or using it to touch a target. In this example, the agent generates a (proprioceptive) prediction corresponding to the sensation of raising the arm, and reflex arcs fulfil this prediction, effectively raising the hand (and minimizing surprise).

Active inference has clear relevance for robotic motor control. As in optimal motor control [15,16], it relies on optimality principles and (Bayesian) state estimation; however, it has some unique features, such as the fact that it dispenses with inverse models (see the Discussion). Similar to planning-as-inference and KL control [17–22], it uses Bayesian inference, but it is based on the minimization of a free-energy functional that generalizes conventional cost or utility functions. Although the computations underlying Bayesian inference or free-energy minimization are generally hard, they become tractable because active inference uses variational inference, usually under the Laplace assumption, which enables one to summarize beliefs about hidden states with a single quantity (the conditional mean). The resulting (neural) code corresponds to the Laplace code, which is simple and efficient [3].

Despite its success in computational and systems neuroscience, active inference is less known in related domains such as motor control and robotics. For example, it remains unproven that the framework can be adopted in challenging robotic set-ups. In this article, we ask if active inference can be effectively used to control the 7-DoF arm of a PR2 robot (simulated using the Robot Operating System (ROS)). We present a series of robot reaching simulations under various conditions (with or without noise on vision and/or proprioception), in order to test the feasibility of this computational scheme in robotics. Furthermore, by analysing the internal system dynamics (e.g. the dynamics of the hidden states that are inferred during the inference), our study sheds light on key aspects of the framework, such as the quintessentially multimodal nature of control and the relative roles of proprioception and vision. Finally, besides providing a proof of principle for the usage of active inference in robotics, our simulations help to illustrate the differences between this scheme and alternative approaches in computational neuroscience and robotics, such as optimal control, and the significance of these differences from both a technological and a biological perspective.

2. Methods

In this section, we first define, mathematically, the active inference framework (for the continuous case). We then describe its application to robotic control and reaching.

2.1. Active inference formalism

The free-energy term that is optimized (minimized) during action control rests on the tuple (Ω, Ψ, S, A, R, q, p) [23]. A real-valued random variable is denoted by X : Ω × … → ℝ, and x ∈ X denotes a particular value. The tilde notation x̃ = (x, x′, x″, …) corresponds to variables in generalized coordinates of motion [24]; each prime is a temporal derivative. p(X) denotes a probability density.

— Ω is the sample space from which random fluctuations ω ∈ Ω are drawn.
— Hidden states Ψ : Ψ × A × Ω → ℝ. They depend on actions and are part of the dynamics of the world that causes sensory states.
— Sensory states S : Ψ × A × Ω → ℝ. They are the agent's sensations and constitute a probabilistic mapping from action and hidden states.
— Action A : S × R → ℝ. Actions depend on the agent's sensory and internal states.
— Internal states R : R × S × Ω → ℝ. They depend on sensory states and cause actions; they constitute the internal dynamics of the agent.
— A recognition density q(Ψ̃ | μ̃), which encodes the agent's beliefs about the causes Ψ (with internal brain states μ describing those beliefs).
— A generative density p(Ψ̃, s̃ | m), corresponding to the joint probability density over the sensory states s̃ and world states Ψ̃, given the agent's predictive model m.

According to Ashby [25], in order to restrict themselves to a limited number of states, an agent must minimize the dispersion of its sensory and hidden states. The Shannon entropy corresponds to the dispersion of the external states (here S × Ψ). Under ergodic assumptions, this entropy equals the long-term average of a Gibbs energy:

H(S, Ψ) = ⟨G(Ψ̃, s̃ | m)⟩_t,  G = −ln p(Ψ̃, s̃ | m).   (2.1)

One can see that the Gibbs energy is defined in terms of the generative model; ⟨·⟩ denotes the expectation or mean under a density when indicated. However, agents cannot minimize this energy directly, because hidden states are, by definition, unknown. Mathematically, however,

H(S, Ψ) = H(S) + H(Ψ | S) = ⟨−ln p(s̃(t) | m) + H(Ψ | S = s̃(t))⟩_t.   (2.2)

With this latter equation, we observe that minimizing sensory surprise −ln p(s̃(t) | m) minimizes the entropy of the external states, provided that action also minimizes the conditional entropy. In this sense,

a(t) = arg min_a (−ln p(s̃(t) | m))
μ̃(t) = arg min_μ̃ H(Ψ | S = s̃(t)).   (2.3)

Unfortunately, we cannot minimize sensory surprise directly (see equation (2.3)), as this entails a marginalization over hidden states, which is intractable:

−ln p(s̃ | m) = −ln ∫ p(Ψ̃, s̃ | m) dΨ̃.   (2.4)

Happily, there is a solution to this problem that comes from theoretical physics [26] and machine learning [27], called variational free energy, which furnishes an upper bound on surprise. This is a functional of the conditional density, which is minimized by action and internal states, to produce action and perception.
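The key property used here, that variational free energy upper-bounds surprise and equals it when q matches the posterior, can be checked numerically in a minimal one-dimensional Gaussian model. The model below (prior ψ ~ N(0, 1), likelihood s | ψ ~ N(ψ, 1), so that the marginal p(s) = N(0, 2) is available in closed form) and all names are our own illustrative assumptions, not code from the paper:

```python
import math

def free_energy(mu, sigma2, s):
    """F(q) = <G>_q - H[q] for q(psi) = N(mu, sigma2), under the toy
    generative model psi ~ N(0, 1), s | psi ~ N(psi, 1)."""
    # Expected Gibbs energy <-ln p(psi, s)>_q, using E_q[psi^2] = mu^2 + sigma2
    gibbs = 0.5 * ((mu**2 + sigma2) + ((s - mu)**2 + sigma2)) + math.log(2 * math.pi)
    entropy = 0.5 * math.log(2 * math.pi * math.e * sigma2)  # H[q] for a Gaussian
    return gibbs - entropy

def surprise(s):
    """Exact surprise -ln p(s), with the marginal p(s) = N(0, 2)."""
    return 0.5 * s**2 / 2.0 + 0.5 * math.log(2 * math.pi * 2.0)

s = 1.3
# F upper-bounds surprise for any recognition density q ...
for mu, sigma2 in [(0.0, 1.0), (1.0, 0.3), (-0.5, 2.0)]:
    assert free_energy(mu, sigma2, s) >= surprise(s)
# ... and the bound is tight at the analytic posterior q = N(s/2, 1/2).
assert abs(free_energy(s / 2, 0.5, s) - surprise(s)) < 1e-12
```

Minimizing F over (mu, sigma2) therefore recovers the posterior without ever evaluating the intractable marginal directly, which is the point of the variational move in equation (2.5) below.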
The minimization of free energy by action and internal states can now be written as

a(t) = arg min_a F(s̃(t), μ̃(t))
μ̃(t) = arg min_μ̃ F(s̃(t), μ̃)
F(s̃, μ̃) = ⟨G(Ψ̃, s̃ | m)⟩_q − H[q(Ψ̃ | μ̃)]
         = D[q(Ψ̃ | μ̃) ‖ p(Ψ̃ | s̃, m)] − ln p(s̃(a) | m)
         ≥ −ln p(s̃(a) | m).   (2.5)

The term D[· ‖ ·] is the Kullback–Leibler divergence (or cross-entropy) between two densities. The minimizations over a and μ̃ correspond to action and perception, respectively, where the internal states μ̃ parametrize the conditional density q. We need perception in order to use free energy to finesse the (intractable) evaluation of surprise. The Kullback–Leibler term is non-negative, and the free energy is therefore always greater than surprise, as the last inequality shows. When free energy is minimized, it approximates surprise and, as a result, the conditional density q approximates the posterior density over external states:

D[q(Ψ̃ | μ̃) ‖ p(Ψ̃ | s̃, m)] → 0  ⇒  q(Ψ̃ | μ̃) ≈ p(Ψ̃ | s̃, m) and H[q(Ψ̃ | μ̃)] ≈ H(Ψ | S = s̃).   (2.6)

This completes the description of approximate Bayesian inference (active inference) within the variational framework. This free-energy formulation resolves several issues in perception and action control problems, but in the following we focus on action control. According to equation (2.5), free energy can be minimized by action via its effect on hidden states and sensations. In this case, action changes the sensations to match the agent's expectations.

The only outstanding issue is the nature of the generative model used to explain and sample sensations. In continuous-time formulations, the generative model is usually expressed in terms of coupled (stochastic) differential equations. These equations describe the dynamics of the (hidden) states of the world and the ensuing behaviour of an agent [5]. This leads us to a discussion of the agent's generative model.

2.2. The generative model

Active inference generally assumes that the generative model supporting perception and action is nonlinear, dynamic and deep (i.e. hierarchical), of the sort that might be entailed by cortical and subcortical hierarchies in the brain [28]:

s = g(x, ν, a) + ω_s
ẋ = f(x, ν, a) + ω_x
ȧ = −∂_a F(s̃, μ̃)
μ̃̇ = Dμ̃ − ∂_μ̃ F(s̃, μ̃).   (2.7)

The first pair of equations describes real-world states (set in bold in the original typesetting) and the second pair the internal states of the agent (set in italics). Here, s is the sensory input, x corresponds to hidden states, ν to hidden causes of the world and a to action. Intuitively, hidden states and causes are used by the brain as abstract quantities in order to predict sensations. Dynamics over time are linked by hidden states, whereas the hierarchical levels are linked by hidden causes. The tilde notation means that we are using generalized coordinates of motion, i.e. a vector of positions, velocities, accelerations, etc. [5]; s̃, μ̃ and a correspond to sensory input, conditional expectations and action, respectively.

One can observe a coupling between these differential equations: sensory states depend upon action a(t) via the causes (x, ν) and the functions (f, g), while action depends upon sensory states via the internal states μ̃(t). These differential equations are stochastic owing to the random fluctuations (ω_s, ω_x).

A generalized gradient descent on variational free energy is defined in the second pair of equations. This method is termed generalized filtering and rests on conditional expectations to produce a prediction (first) term and an update (second) term based upon free-energy gradients that, as we will see below, can be expressed in terms of prediction errors (this corresponds to the basic form of a Kalman filter). D is a differential matrix operator that acts on generalized motion, and Dμ̃ describes the generalized motion of the conditional expectations. Generalized motion comprises vectors of velocity, acceleration, jerk, etc.

The generative model has the following hierarchical form:

s = g⁽¹⁾(x⁽¹⁾, ν⁽¹⁾) + ω_ν⁽¹⁾
ẋ⁽¹⁾ = f⁽¹⁾(x⁽¹⁾, ν⁽¹⁾) + ω_x⁽¹⁾
⋮
ν⁽ⁱ⁻¹⁾ = g⁽ⁱ⁾(x⁽ⁱ⁾, ν⁽ⁱ⁾) + ω_ν⁽ⁱ⁾
ẋ⁽ⁱ⁾ = f⁽ⁱ⁾(x⁽ⁱ⁾, ν⁽ⁱ⁾) + ω_x⁽ⁱ⁾
⋮   (2.8)

The index i denotes the level of the hierarchy. The functions f⁽ⁱ⁾ and g⁽ⁱ⁾, together with the Gaussian random fluctuations (ω_x, ω_ν) on the motion of hidden states and causes, define a probability density over sensations, causes of the world and hidden states, which constitutes the free energy of posterior or conditional (Bayesian) beliefs about the causes of sensations. Note that the generative model becomes probabilistic because of the random fluctuations: sensory (sensor) noise corresponds to fluctuations at the first level of the hierarchy, while fluctuations at higher levels induce uncertainty about hidden states. The inverses of the covariance matrices of these random fluctuations are called precisions and are denoted by (Π_x⁽ⁱ⁾, Π_ν⁽ⁱ⁾).

2.3. Prediction errors and predictive coding

We can now define prediction errors on the hidden causes and states. These auxiliary variables represent the difference between conditional expectations and their predicted values based on the level above. Using the notation A · B := Aᵀ B:

μ̃̇_x⁽ⁱ⁾ = Dμ̃_x⁽ⁱ⁾ + (∂g̃⁽ⁱ⁾/∂μ̃_x⁽ⁱ⁾) · Π_ν⁽ⁱ⁾ ε̃_ν⁽ⁱ⁾ + (∂f̃⁽ⁱ⁾/∂μ̃_x⁽ⁱ⁾) · Π_x⁽ⁱ⁾ ε̃_x⁽ⁱ⁾ − Dᵀ Π_x⁽ⁱ⁾ ε̃_x⁽ⁱ⁾
μ̃̇_ν⁽ⁱ⁾ = Dμ̃_ν⁽ⁱ⁾ + (∂g̃⁽ⁱ⁾/∂μ̃_ν⁽ⁱ⁾) · Π_ν⁽ⁱ⁾ ε̃_ν⁽ⁱ⁾ + (∂f̃⁽ⁱ⁾/∂μ̃_ν⁽ⁱ⁾) · Π_x⁽ⁱ⁾ ε̃_x⁽ⁱ⁾ − Π_ν⁽ⁱ⁺¹⁾ ε̃_ν⁽ⁱ⁺¹⁾
ε̃_x⁽ⁱ⁾ = Dμ̃_x⁽ⁱ⁾ − f̃⁽ⁱ⁾(μ̃_x⁽ⁱ⁾, μ̃_ν⁽ⁱ⁾)
ε̃_ν⁽ⁱ⁾ = μ̃_ν⁽ⁱ⁻¹⁾ − g̃⁽ⁱ⁾(μ̃_x⁽ⁱ⁾, μ̃_ν⁽ⁱ⁾).   (2.9)

ε̃_ν⁽ⁱ⁾ and ε̃_x⁽ⁱ⁾ correspond to prediction errors on hidden causes and hidden states, respectively. The precisions Π_ν⁽ⁱ⁾ and Π_x⁽ⁱ⁾ weight the prediction errors, so that more precise prediction errors have a greater influence during generalized filtering.

This derivation, based on the model of equation (2.8), enables us to express the gradients of equation (2.7) in terms of prediction errors. Effectively, precise prediction errors update the predictions to provide a Bayes-optimal estimate of hidden states as a continuous function of time, where free energy corresponds to the sum of squared prediction errors (weighted by precision) at each level of the hierarchy. Heuristically, this corresponds to an instantaneous gradient ascent in which prediction errors are assimilated to provide for online inference. For a more detailed explanation of the mathematics underlying this scheme, see [24].
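In a static, single-level special case, the generalized filtering scheme above reduces to a plain gradient descent on free energy, in which the belief update is driven by precision-weighted prediction errors. The following toy (identity sensory mapping g(μ) = μ; all parameter names are our own, not the paper's) is a sketch of that special case:

```python
# One belief mu, one sensation s, one Gaussian prior: the free-energy gradient
# is a balance of two precision-weighted prediction errors.

def infer(s, prior_mean, pi_s, pi_x, lr=0.05, n_steps=2000):
    """Gradient descent of mu on F; pi_s, pi_x are sensory and prior precisions."""
    mu = prior_mean
    for _ in range(n_steps):
        eps_s = s - mu            # sensory prediction error (g(mu) = mu)
        eps_x = mu - prior_mean   # prior prediction error
        mu += lr * (pi_s * eps_s - pi_x * eps_x)  # -dF/dmu
    return mu

# With sensory precision 4 and prior precision 1, the belief settles at the
# precision-weighted average (4 * 2.0 + 1 * 0.0) / 5 = 1.6.
mu = infer(s=2.0, prior_mean=0.0, pi_s=4.0, pi_x=1.0)
assert abs(mu - 1.6) < 1e-6
```

Raising pi_s pulls the estimate towards the data; raising pi_x pulls it towards the prior. This is the same precision weighting that decides, in the robot simulations reported later, how much vision and proprioception each contribute.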

2.4. Action

A motor trajectory (e.g. the trajectory of raising the arm) is produced via classical reflex arcs that suppress proprioceptive prediction errors:

ȧ = −∂_a F = −(∂_a ε̃_ν⁽¹⁾) · Π_ν⁽¹⁾ ε̃_ν⁽¹⁾.   (2.10)

Intuitively, conditional expectations in the generative model drive (top-down) proprioceptive predictions (e.g. the proprioceptive sensation of raising one's own arm), and these predictions are fulfilled by reflex arcs. This is because the only way for an agent to minimize its free energy through action (and suppress proprioceptive prediction errors) is to change proprioceptive signals, i.e. to raise the arm and realize the predicted proprioceptive sensations. According to this scheme, reflex arcs thus produce a motor trajectory (raising the arm) to comply with set points or trajectories prescribed by descending proprioceptive predictions (cf. motor commands). At the neurobiological level, this process is thought to occur at the level of the cranial nerve nuclei and the spinal cord.
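Equation (2.10) can be illustrated with a one-joint toy in which the sensed angle is the state changed by action (so ∂s/∂a = 1 and the "reflex arc" collapses to a one-line gradient step). This is our own minimal sketch, not the PR2 implementation:

```python
# Action suppresses the proprioceptive prediction error between the predicted
# posture mu_desired and the sensed joint angle x.

def act(mu_desired, x0, pi=1.0, dt=0.01, n_steps=5000):
    x = x0  # true joint angle; the agent's prediction stays at mu_desired
    for _ in range(n_steps):
        eps = x - mu_desired   # proprioceptive prediction error
        a = -pi * eps          # action descends the free-energy gradient
        x += dt * a            # the world integrates the action
    return x

# The arm is "pulled" to the predicted posture, e.g. raised to 1.0 rad.
x_final = act(mu_desired=1.0, x0=0.0)
assert abs(x_final - 1.0) < 1e-3
```

Nothing here inverts a model: the prediction is simply held fixed, and acting on the prediction error fulfils it, which is the sense in which active inference dispenses with inverse models.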

Figure 1. PR2 robot simulated using ROS [29].


2.5. Application to robotic arm control and reaching

Having described the general active inference formalism, we now illustrate how it can be used to elicit reaching movements with a robot: the 7-DoF arm of a PR2 robot simulated using ROS [29] (figure 1). Essentially, in our simulations, the robot has to reach a target by moving (i.e. raising) its arm. We will see that the key aspect of this behaviour rests on a multimodal integration of visual and proprioceptive signals [30,31], which play differential, yet interconnected, roles.

In this robotic setting, the hidden states are the angles of the joints (x₁, x₂, …, x₇). The visual input is the position of the end effector, here the arm of the PR2 robot. This location (ν₁, ν₂, ν₃) can be seen as autonomous causal states. We assume that the robot knows the true mapping between the position of its hand Pos and the angles of its joints. In other words, we assume that the robot knows its forward model and can extract the true position of its end effector in three-dimensional coordinates into the visual space:

g(x, ν) = (x, ν, Pos)ᵀ.   (2.11)

If we assume Newtonian dynamics with viscosity κ and elasticity k, then we obtain the subsequent equations of motion that describe the true (physical) evolution of the hidden states, with f(x̃, ν) stacking the right-hand sides:

ẋᵢ = x′ᵢ,  i = 1, …, 7
ẋ′ᵢ = (aᵢ − k₁xᵢ − κ₁x′ᵢ)/m₁,  i = 1, 2, 3
ẋ′ᵢ = (aᵢ − k₂xᵢ − κ₂x′ᵢ)/m₂,  i = 4, …, 7.   (2.12)

The behaviour of the robot arm during its reaching task is specified in terms of the robot's prior beliefs, which constitute its generative model. Here, these beliefs are based upon a basic but efficient feedback controller. In other words, by specifying a particular generative model, we create a robot that thinks it will behave in a particular way: in this instance, we make it think it behaves as an efficient feedback controller, as follows. Within the joint configuration space (thanks to geometrical considerations), the prior control law provides a per-joint angular increment to be applied according to the position of the end effector, allowing its convergence towards the target position. In order to avoid the singular configurations of the PR2 arm, two actions α and β are superposed. The first one is a per-joint action: each joint tries to align the portion of the arm it supports with the target position. The second action is distributed over the shoulder and the elbow, providing the flexion–extension primitive in order to reach or escape the singular configurations of the first action (e.g. a stretched arm).

Let T = (t₁, t₂, t₃) be the target position in the Euclidean space W = ℝ³, Jᵢ = (jᵢ₁, jᵢ₂, jᵢ₃) the position of joint i in W, w = T − J the vector describing the shortest path in W to reach the target, fᵢ = (Tᵢ − J)/‖Tᵢ − J‖ the unit vector linking each joint to the arm's distal extremity, and Posᵢ = (Posᵢ₁, Posᵢ₂, Posᵢ₃) the unit vector collinear to the rotation axis of joint i. Let '·' denote the dot product in W and '×' the cross product. The feedback error to be regulated to zero by the first action of the control law for joint i is

eᵢ = (w × fᵢ) · Posᵢ.   (2.13)

Classically, the first action is designed as a PI controller that ensures

αᵢ = p_p eᵢ + p_i ∫₀^{t₀} eᵢ(t) dt,   (2.14)

where t₀ is the current time and {p_p, p_i} are two positive settings used to adjust the convergence rate. To preclude wind-up phenomena, the absolute value of the integral term is bounded by α_max > 0.

To operate as expected, the second action needs to predict the influence of the 'stretched arm' singularity.
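The per-joint PI action of equation (2.14), with the integral term bounded to preclude wind-up, can be sketched as follows. The gains p_p and p_i and the bound a_max follow the settings quoted in the text; the one-line "plant" closing the loop is a hypothetical stand-in, not the PR2 arm model:

```python
def pi_controller(error_trace, p_p=0.3, p_i=0.01, a_max=0.001, dt=1.0):
    """Return the PI actions alpha_i for a sequence of feedback errors e_i(t)."""
    integral, actions = 0.0, []
    for e in error_trace:
        integral += e * dt
        # anti-windup: keep the integral term |p_i * integral| <= a_max
        integral = max(-a_max / p_i, min(a_max / p_i, integral))
        actions.append(p_p * e + p_i * integral)
    return actions

# Close the loop on a toy joint where the action directly reduces the error.
e, errors = 1.0, []
for _ in range(200):
    errors.append(e)
    e -= pi_controller(errors)[-1]  # hypothetical plant: de/dt = -alpha
assert abs(e) < 0.01  # the feedback error is regulated towards zero
```

The bound on the integral term keeps the accumulated error from dominating the action when the error is large for a long time, which is the wind-up phenomenon the text refers to.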
The singularity's influence is captured by two parameters g_m and g_c, defined as the dot products

g_m = (w/‖w‖) · (f₂/‖f₂‖),  g_c = w · f₂,   (2.15)

where the absolute value of g_c is bounded by g_max > 0. Then, the second action is defined as

β₁ = β₃ = β₅ = β₆ = β₇ = 0
β₂ = g_m g_c p_{p2}  (shoulder)
β₄ = g_m g_c p_{p4}  (elbow),   (2.16)

where p_{p2} and p_{p4} are additional positive settings used to balance the contribution of the two joints (roughly, p_{p2} = p_{p4}/2). Finally, the controller provides the empirical prior

Fᵢ = αᵢ + βᵢ.   (2.17)

In practice, to obtain reasonable behaviour, the controller settings were chosen as: p_p = 0.3, p_i = 0.01, α_max = 0.001, p_{p2} = 2.25, p_{p4} = 5, g_max = 0.1.

Finally, we obtain the following generative model, which has the same form as equation (2.12) but with the empirical priors Fᵢ in place of the actions aᵢ:

ẋᵢ = x′ᵢ,  i = 1, …, 7
ẋ′ᵢ = (Fᵢ − k₁xᵢ − κ₁x′ᵢ)/m₁,  i = 1, 2, 3
ẋ′ᵢ = (Fᵢ − k₂xᵢ − κ₂x′ᵢ)/m₂,  i = 4, …, 7.   (2.18)

Importantly, we see that the generative model has a very different form from the true equations of motion. In other words, the generative model has prior beliefs that render motor behaviour purposeful. It is this enriched generative model that produces goal-directed behaviour, which fulfils the robot's predictions and minimizes surprise. In this instance, the agent believes it is going to move its arm towards the target until it touches it. The distance between the end effector and the target is used as an error that drives the motion, as if the end effector were pulled to the target. The ensuing movement therefore resolves Bernstein's problem, which besets schemes that try to solve the converse problem of pushing the end effector towards the target (an ill-posed problem). This formulation of motor control is related to the equilibrium point hypothesis [32] and the passive motion paradigm [33,34] and, crucially, dispenses with inverse models. Note that no solution of an optimal control problem is required here. This is because the causes of desired behaviour are specified explicitly by the generative or forward model (the arm is pulled to a target), and do not have to be inferred from desired consequences; see the Discussion for a comparison of active inference and optimal control schemes.

3. Results

We tested the model in four scenarios. In all the scenarios, the robot arm started from a fixed starting position and had to reach a desired position in three dimensions with its 7-DoF arm. We simulated various starting and desired positions, but in this illustration we focus on the sample problem illustrated in figure 2, where the start position is at the bottom-centre and the desired position is the green dot.

The four panels of figure 2 exemplify the robot reaching under the four scenarios that we considered. In the first scenario (figure 2a), there was no noise on proprioception and vision. In the second, third and fourth scenarios, proprioception (figure 2b), vision (figure 2c) or both (figure 2d) were noisy, respectively. We used noise with a log-precision of 4.

As illustrated by the figures, in the absence of noise (first scenario), the reaching trajectory is flawless and free of static error (figure 2a). Trajectories become less accurate when either proprioception (second scenario) or vision (third scenario) is noisy, but the arm still reaches the desired target (figure 2b,c). However, when both proprioception and vision are noisy, the arm becomes largely unable to reach the target (figure 2d).

A more direct comparison between the four scenarios is possible if one considers the average of 20 simulations from a common starting point (figure 3). Here, the four colours correspond to the four scenarios: the first scenario (no noise) is blue; the second scenario (noisy proprioception) is black; the third scenario (noisy vision) is red; and the fourth scenario (noisy proprioception and vision) is yellow. To compare the trajectories under the four scenarios quantitatively, we computed the sum of Euclidean distances between the positions of the end effector, at each iteration of the algorithm, under the best trajectory (corresponding to scenario 1) and the other trajectories. We obtained a difference between the normal and noisy-proprioception scenarios of 0.2796; between the normal and noisy-vision scenarios of 0.143; and between the normal and noisy-proprioception-and-vision scenarios of 1.2169.

These differences can be better appreciated if one considers the internal dynamics of the system's hidden states (i.e. the angles of the arm) during the different conditions, as shown in figures 4–7 for the simulations without noise, with noisy proprioception, with noisy vision, and with noisy proprioception and vision, respectively. The hidden states are inferred while the agent optimizes its expectations as described above (see equation (2.12)). In turn, action a(t) is selected based on the hidden states (technically, action is part of the generative process but not of the generative model).

The four panels of figures 4–7 show the conditional predictions and prediction errors during the task. In each figure, the top right panel shows the hidden states, and the grey areas correspond to 90% Bayesian confidence intervals. The figures show that adding noise to proprioception (figure 5) makes the confidence intervals much larger compared with the standard case with no noise (figure 4). Confidence intervals further increase when both proprioception and vision are noisy (figure 7). The top left panel shows the conditional predictions of the sensory signals (coloured lines) and the sensory prediction errors (red). These are errors on the proprioceptive and visual input, and are small in relation to the predictions. The bottom left panel shows the true expectation (dotted line) and the conditional expectation (solid line) about hidden causes. The bottom right panel shows actions (coloured lines) and true causes (dotted lines). In the noisy proprioception scenario (figure 5), one of the hidden states (top right panel) and one action (bottom right panel) rise with time.
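The quantitative comparison used here, the sum of Euclidean distances between the end-effector positions of two trajectories at each iteration, can be sketched as follows. The two trajectories below are hypothetical placeholders, not the simulation data:

```python
import math

def trajectory_distance(traj_a, traj_b):
    """Sum over iterations of the Euclidean distance between paired 3-D points."""
    return sum(math.dist(p, q) for p, q in zip(traj_a, traj_b))

reference = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.5), (1.0, 0.0, 1.0)]  # e.g. scenario 1
perturbed = [(0.0, 0.0, 0.0), (0.6, 0.1, 0.5), (1.0, 0.1, 1.0)]  # e.g. a noisy run
d = trajectory_distance(reference, perturbed)
assert d > 0.0 and trajectory_distance(reference, reference) == 0.0
```

Identical trajectories score zero, and the score grows with any per-iteration deviation, which is why the noisy-proprioception-and-vision scenario yields the largest value.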
Figure 2. Reaching trajectories in three dimensions from a start to a goal location under four scenarios. (a) Scenario 1: reaching in three dimensions with 7 DoF. (b) Scenario 2: reaching in three dimensions with noisy proprioception. (c) Scenario 3: reaching in three dimensions with noisy vision. (d) Scenario 4: reaching in three dimensions with noisy proprioception and vision. The blue trajectory is the mean of 20 trajectories shown in grey.

1.2
4. Discussion
1.1 Our case study shows that the active inference scheme can
control the seven DoFs arm of a simulated PR2 robot—focus-
1.0
ing here on the task of reaching a desired goal location from a
0.9 ( predefined) start location.
z (m)

Our results illustrate that action control is accurate with


0.8
intact proprioception and vision, and only partly impaired if
0.7 noise is added to either of these modalities. The comparison
of the trajectories of figure 2b,c shows that adding noise to pro-
0.6
prioception is more problematic. The analysis of the dynamics
0.5 of internal system variables (figures 4–7) helps us understand-
0 ing the above results, highlighting the differential roles of
–0.2
–0.4 proprioception and vision in this scheme. In the noisy proprio-
y(

–0.6 0.75 0.60


m

0.70 0.65
ception scenario (figure 5), hidden states are significantly more
)

x (m)
uncertain compared with the reference case with no noise
Figure 3. Reaching trajectories from a common starting point. In blue, no (figure 4). Yet, despite the uncertainty about joint angles, the
noise (scenario 1). In black, noise on proprioception (scenario 2). In red, robot can still rely on (intact) vision to infer where the arm is
noise on vision (scenario 3). In yellow, noise on vision and proprioception in space, and thus it is able to reach the target ultimately—
(scenario 4). The trajectories are the mean of 20 simulations from the although it follows largely suboptimal trajectories (in relation
same start and goal (green) locations. to its prior beliefs preferences). Multimodal integration or com-
pensation is impossible if both vision and proprioception are
corresponds to an internal degree of freedom that does not sufficiently degraded (figure 7). In the noisy vision scenario,
have any effect on the trajectory. The figures show some figure 6, noise has some effect on inferred causes but only
slight oscillations after 20 iterations, which are due to the affects hidden states (and ultimately action selection) to a
fact that the arm is moving in the proximity of the target. minor extent.
[Figure 4: four panels plotting internal variables over 40 iterations: (a) prediction and error; (b) hidden states; (c) hidden causes; (d) perturbation and action.]
Figure 4. Dynamics of the model internal variables in the normal case. The conditional predictions and expectations are shown as functions of the iterations. (a) The panel shows the conditional predictions (coloured lines) and the corresponding prediction errors (red lines) based upon the expected states on the upper right. (b) The coloured lines represent the expected hidden states causing sensory predictions. These can be thought of as displacements in the articulatory state space. In this panel and throughout, the grey areas denote 90% Bayesian confidence intervals. (c) The dotted lines represent the true expectation and the solid lines show the conditional expectation of the hidden cause. (d) Action (solid line) and true causes (dotted line).
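To give a concrete—if drastically simplified—sense of the dynamics plotted in figure 4, the following one-dimensional sketch implements the two coupled gradient descents of active inference: perception updates a belief by descending free energy, while action is a reflex that suppresses proprioceptive prediction error. This is a toy under our own assumptions (quadratic free energy, velocity-level action, illustrative gains), not the paper's 7-DoF generalized-filtering implementation:

```python
import numpy as np

def reach(pi_s=1.0, pi_p=4.0, target=1.0, dt=0.05, steps=600, noise=0.0, seed=0):
    """Toy 1D active inference: a single 'joint angle' reaching a target.

    Free energy (up to constants):
        F = pi_s/2 * (y - mu)**2 + pi_p/2 * (mu - target)**2
    where y is the proprioceptive sample, mu the belief (expected hidden
    state) and target the prior belief (the goal).
    """
    rng = np.random.default_rng(seed)
    x = 0.0    # true joint angle (hidden from the agent)
    mu = 0.0   # belief about the joint angle
    for _ in range(steps):
        y = x + noise * rng.standard_normal()  # proprioceptive input
        # Perception: the belief descends the free-energy gradient,
        # balancing sensory evidence against the prior (goal) belief.
        mu += dt * (pi_s * (y - mu) + pi_p * (target - mu))
        # Action (reflex arc): move so as to suppress the proprioceptive
        # prediction error (y - mu); vision plays no direct role here.
        x += dt * pi_s * (mu - y)
    return x, mu
```

With these settings the arm settles on the target (x ≈ mu ≈ 1). The precisions act as gains: lowering pi_s lets the prior dominate the belief, while raising pi_p strengthens the pull of the goal.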

This pattern of results shows that control is quintessentially multimodal and based on both vision and proprioception, and adding noise to either modality can be (partially) compensated for by appealing to the other, more precise dimension. However, proprioception and vision play differential roles in this scheme. Proprioception has a direct effect on hidden states and action selection; this is because action dynamics depend on reflex arcs that suppress proprioceptive (not visual) prediction errors (see §2.4). If the robot has poor proprioceptive information, then it can use multimodal integration and the visual modality to compensate and restore efficient control. However, if both modalities are degraded with noise, then multimodal integration becomes imprecise, and the robot cannot reduce error accurately—at least in the simplified control scheme assumed here, which (on purpose) does not include any additional corrective mechanism. Adding noise to vision is less problematic, given that in the (reaching) task considered here it plays a more ancillary role. Indeed, our reaching task does not pose strong demands on the estimation of hidden causes for accurate control; the situation may be different if one needs, for example, to estimate the pose of a to-be-grasped object.

The above-mentioned results are consistent with a large body of studies showing the importance of proprioception for control tasks. Patients with impaired proprioception can still execute motor tasks such as reaching, grasping and locomotion, but their performance is suboptimal [35–38]. In principle, the scheme proposed here may be used to explain human data under impaired proprioception [4] or other deficits in motor control—or even to help design rehabilitation therapies. In this perspective, an interesting issue that we have not addressed pertains to the attenuation of proprioceptive prediction errors during movement. Heuristically, this sensory attenuation is necessary to allow the prior beliefs of the generative model to supervene over the sensory evidence that movement has not yet occurred (or, in other words, to prevent internal states from encoding the fact that there is no movement). This speaks to a dynamic gain control that mediates the attenuation of the precision of prediction errors during movement. In the example shown above, we simply reduced the precision of ascending proprioceptive prediction errors to enable movement. Had we increased their precision, the ensuing failure of sensory attenuation would have subverted movement; perhaps in a similar way to the poverty of movements seen in Parkinson's disease—a disease that degrades motor performance profoundly [39]. This aspect of precision or gain control suggests that being able to implement active inference in robots will allow us to perform simulated psychophysical experiments to illustrate sensory attenuation and its impairments. Furthermore, it suggests that a robotic model of Parkinson's disease is within reach, providing an interesting opportunity for simulating pathophysiology.

[Figure 5: four panels with the same layout as figure 4, for the noisy proprioception case.]
Figure 5. (a–d) Dynamics of the model internal variables in the noisy proprioception case. The layout of the figure is the same as figure 4. Please see the previous figure legends for details.

Clearly, some of our choices when specifying the generative model are heuristic—or appeal to established notions. For example, adding a derivative term to equation (2.14) could change the dynamics in an interesting way. In general, the framework shown above accommodates questions about alternative models and dynamics through Bayesian model comparison. In principle, we have an objective function (variational free energy) that scores the quality of any generative model entertained by a robot—in relation to its embodied exchange with the environment. This means we could change the generative model and assess the quality of the ensuing behaviour using variational free energy—and select the best generative model in exactly the same way that people characterize experimental data by comparing the evidence for different models in Bayesian model comparison. We hope to explore this in future work.

We next hope to port the scheme to a real robot. This will be particularly interesting, because there are several facets of active inference that are more easily demonstrated in a real-world artefact. These aspects include robustness to exogenous perturbations. For example, the movement trajectory should gracefully recover from any exogenous forces applied to the arm during movement. Furthermore, theoretically, the active inference scheme is also robust to differences between the true motor plant and the various kinematic constants in the generative model. This robustness follows from the fact that the movement is driven by (fictive) forces whose fixed points do not change with exogenous perturbations—or with many parameters of the generative model (or process). Another interesting advantage of real-world implementations will be the opportunity to examine robustness to sensorimotor delays. Although not necessarily a problem from a purely robotics perspective, biological robots suffer non-trivial delays in the signalling of ascending sensory signals and descending motor predictions. In principle, these delays can be absorbed into the generative model—as has been illustrated in the context of oculomotor control [40].

[Figure 6: four panels with the same layout as figure 4, for the noisy vision case.]
Figure 6. (a–d) Dynamics of the model internal variables in the noisy vision case. The layout of the figure is the same as figure 4. Please see the previous figure legends for details.

At present, these

proposals for how the brain copes with sensorimotor delays in oculomotor tracking remain hypothetical. It would be extremely useful to see if they could be tested in a robotics setting.

As noted above, active inference shares many similarities with the passive movement paradigm (PMP, [33,34]). Although, strictly speaking, active inference is a corollary of the free-energy principle, it inherits the philosophy of the PMP in the following sense. Active inference is equipped with a generative model that maps from causes to consequences. In the setting of motor control, the causes are forces that have some desired fixed point or orbit. It is then a simple matter to predict the sensory consequences of those forces—as sensed by proprioception or robotic sensors. These sensory predictions can then be realized through open-loop control (e.g. peripheral servos or reflex arcs), thereby realizing the desired fixed point (cf. the equilibrium point hypothesis [32]). However, unlike the equilibrium point hypothesis, active inference is not open loop. This is because its motor predictions are informed by deep generative models that are sensitive to input from all modalities (including proprioception). The fact that action realizes the (sensory) consequences of (prior) causes explains why there is no need for an inverse model.

Optimal motor control formulations [15,16] are fundamentally different. Briefly, optimal control operates by minimizing some cost function in order to compute motor commands for a robot performing a particular motor task. Optimal control theory requires a mechanism for state estimation as well as two internal models: an inverse and a forward model. This scheme also assumes that the appropriate optimality equation can be solved [41]. In contrast, active inference uses prior beliefs about the movement (in an extrinsic frame of reference) instead of optimal control signals for movements (in an intrinsic frame of reference). In active inference, there is no inverse model or cost function, and the resulting trajectories are Bayes optimal. This contrasts with optimal control, which calls on the inverse model to finesse problems incurred by sensorimotor noise and delays. Inverse models are not required in active inference, because the robot's generative (or forward) model is inverted during inference. Active inference also dispenses with cost functions, as these are replaced by the robot's (prior) beliefs (of note, there is a general duality between control and inference [15,16]). In brief, replacing the cost function with prior beliefs means that minimizing cost corresponds to maximizing the marginal likelihood of a generative model [42–44]. A formal correspondence between cost functions and prior beliefs can be established with the complete class theorem [45,46], according to which there is at least one prior belief and cost function that can produce a Bayes-optimal motor behaviour.

[Figure 7: four panels with the same layout as figure 4, for the case with noise on both modalities.]
Figure 7. (a–d) Dynamics of the model internal variables in the 'all noisy' case. The layout of the figure is the same as figure 4. Please see the previous figure legends for details.

In sum, optimal control formulations start with a desired endpoint (consequence) and
try to reverse-engineer the forces (causes) that produce the desired consequences. It is this construction that poses a difficult inverse problem, with solutions that are not generally robust—and are often problematic in robot control. Active inference finesses this problem by starting with the causes of movement, as opposed to the consequences.

Accordingly, one can see the solutions offered under optimal control as special cases of the solutions available under an (active) inferential scheme. This is because some policies cannot be specified using cost functions but can be described using priors; specifically, this is the case for solenoidal movements, whose cost is equal for every part of the trajectory [47]. This comes from variational calculus, which says that a trajectory or policy has several components: a curl-free component that changes value and a divergence-free component that does not change value. The divergence-free motion can only be specified by a prior and not by a cost function. Discussing the relative benefits of control schemes with or without cost functions and inverse models is beyond the scope of this article. Here, it suffices to say that inverse models are generally hard to learn for robots, and cost functions sometimes need to be defined in an ad hoc manner for robot control tasks. By eluding these constraints, active inference may offer a promising alternative to optimal control schemes. For a more detailed discussion of the links between optimal control and active inference, see [47].

Although active inference resolves many problems that attend optimal control schemes, there is no free lunch. In active inference, all the heavy lifting is done by the generative model—and, in particular, the priors that define desired set-points or orbits. The basic idea is to induce these attractors by specifying appropriate equations of motion within the generative model of the robot. This means that the art of generating realistic and purposeful behaviour reduces to creating equations of motion that have desired attractors. These can be simple fixed-point attractors, as in the example above. They could also be much more complicated, producing quasi-periodic motion (as in walking) or fluent sequences of movements specified by heteroclinic cycles. All the more interesting examples in the theoretical literature to date rest upon some form of itinerant dynamics, inherent in the generative model, that sometimes has deep structure. A nice example of this is the handwriting example in [48], which used Lotka–Volterra equations to specify a sequence of saddle points—producing a series of movements. Simpler examples could use both attracting and repelling fixed points, corresponding to contact points and collision points, respectively, to address more practical issues in robotics. Irrespective of the particular repertoire of attractors implicit in the generative model, the hierarchical aspect of the generative models that underlie active inference enables the composition of movements, sequences of movements and sequences of sequences [49–51]. In other words, provided one can write down (or learn) a deep generative model with itinerant dynamics, there is the possibility of simulating realistic movements that inherit deep temporal structure and context sensitivity.

In conclusion, in this article we have presented a proof-of-concept implementation of robot control using active inference, a biologically motivated scheme that is gaining prominence in computational and systems neuroscience. The results discussed here demonstrate the feasibility of the scheme; having said this, further work is necessary to fully demonstrate how this scheme works in more challenging domains, or whether it has advantages (from both technological and biological viewpoints) over alternative control schemes. Future work will address an implementation of the above scheme on a real robot with the same degrees of freedom as the PR2. Other predictive models could also be developed; the generative model illustrated above is very simple and does not take advantage of the internal degrees of freedom. A key generalization will be integrating planning mechanisms that may allow, for example, the robot to proactively avoid obstacles or collisions during movement—or, more generally, to consider future (predicted) and not only currently sensed contingencies [17,52–56]. Planning mechanisms have been described under the active inference scheme and can solve challenging problems such as the mountain-car problem [5]; they can thus be seamlessly integrated in the model presented here—speaking to the scalability of the active inference scheme. Finally, one reason for using a biologically realistic model such as active inference is that it may be possible to directly map internal dynamics generated by the robot simulator (e.g. of hidden states) to brain signals (e.g. EEG signals reflecting predictions and prediction errors) generated during equivalent action planning or performance.

Data accessibility. All data underlying the findings described in the manuscript can be downloaded from https://github.com/LPioL/activeinference_ROS.

Authors' contribution. L.P.L., A.N., K.F. and G.P. conceived the study and wrote the manuscript. L.P.L. and A.N. performed the simulations. All authors gave final approval for publication.

Competing interests. We declare we have no competing interests.

Funding. This work was supported by the French–Italian University (C2-21). K.F. is supported by the Wellcome Trust (ref. no. 088130/Z/09/Z).

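As a closing illustration of the itinerant dynamics discussed above: the kind of Lotka–Volterra construction used for handwriting in [48] can be sketched in a few lines. The connectivity below (asymmetric lateral inhibition) is our own illustrative choice, not the paper's; it makes the "winning" state visit a heteroclinic sequence of saddle points, and each state could index an attractor (e.g. a via-point) in a movement sequence:

```python
import numpy as np

def winner_sequence(n=3, steps=20000, dt=0.01):
    """Generalized Lotka-Volterra dynamics: dx_i/dt = x_i * (1 - (rho @ x)_i).

    Asymmetric inhibition (the next state is weakly inhibited by the
    current winner, the previous one strongly) yields winnerless
    competition: the dominant state cycles 0 -> 1 -> 2 -> 0 -> ...
    """
    rho = np.full((n, n), 1.0)
    for i in range(n):
        rho[(i + 1) % n, i] = 0.5   # successor can invade the current winner
        rho[(i - 1) % n, i] = 2.0   # predecessor remains suppressed
    x = np.full(n, 0.02)
    x[0] = 0.9                      # start with state 0 dominant
    winners = []
    for _ in range(steps):
        x = x + dt * x * (1.0 - rho @ x)
        x = np.clip(x, 1e-6, None)  # small floor carries the orbit past saddles
        if not winners or int(np.argmax(x)) != winners[-1]:
            winners.append(int(np.argmax(x)))
    return winners
```

Calling winner_sequence() returns the order in which states dominate, cycling repeatedly through 0, 1, 2; in a robotic setting, each switch could trigger the next attractor in a composed movement.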
References
1. Friston K. 2010 The free-energy principle: a unified brain theory? Nat. Rev. Neurosci. 11, 127–138. (doi:10.1038/nrn2787)
2. Friston K, Kilner J, Harrison L. 2006 A free energy principle for the brain. J. Physiol.-Paris 100, 70–87. (doi:10.1016/j.jphysparis.2006.10.001)
3. Friston K, Kiebel S. 2009 Predictive coding under the free-energy principle. Phil. Trans. R. Soc. B 364, 1211–1221. (doi:10.1098/rstb.2008.0300)
4. Friston K et al. 2012 Dopamine, affordance and active inference. PLoS Comput. Biol. 8, e1002327. (doi:10.1371/journal.pcbi.1002327)
5. Friston KJ, Daunizeau J, Kilner J, Kiebel SJ. 2010 Action and behavior: a free-energy formulation. Biol. Cybern. 102, 227–260. (doi:10.1007/s00422-010-0364-z)
6. Friston K, Schwartenbeck P, FitzGerald T, Moutoussis M, Behrens T, Dolan RJ. 2013 The anatomy of choice: active inference and agency. Front. Hum. Neurosci. 7, 598. (doi:10.3389/fnhum.2013.00598)
7. FitzGerald T, Schwartenbeck P, Moutoussis M, Dolan RJ, Friston K. 2015 Active inference, evidence accumulation, and the urn task. Neural Comput. 27, 306–328. (doi:10.1162/NECO_a_00699)
8. Friston KJ, Daunizeau J, Kiebel SJ. 2009 Reinforcement learning or active inference? PLoS ONE 4, e6421. (doi:10.1371/journal.pone.0006421)
9. Friston K, Rigoli F, Ognibene D, Mathys C, FitzGerald T, Pezzulo G. 2015 Active inference and epistemic value. Cogn. Neurosci. 6, 187–214. (doi:10.1080/17588928.2015.1020053)
10. Friston K, FitzGerald T, Rigoli F, Schwartenbeck P, O'Doherty J, Pezzulo G. 2016 Active inference and learning. Neurosci. Biobehav. Rev. 68, 862–879. (doi:10.1016/j.neubiorev.2016.06.022)
11. Pezzulo G, Cartoni E, Rigoli F, Pio-Lopez L, Friston K. 2016 Active inference, epistemic value, and vicarious trial and error. Learn. Mem. 23, 322–338. (doi:10.1101/lm.041780.116)
12. Pezzulo G, Rigoli F, Friston K. 2015 Active inference, homeostatic regulation and adaptive behavioural control. Prog. Neurobiol. 134, 17–35. (doi:10.1016/j.pneurobio.2015.09.001)
13. Dayan P, Hinton GE, Neal RM, Zemel RS. 1995 The Helmholtz machine. Neural Comput. 7, 889–904. (doi:10.1162/neco.1995.7.5.889)
14. Knill DC, Pouget A. 2004 The Bayesian brain: the role of uncertainty in neural coding and computation. Trends Neurosci. 27, 712–719. (doi:10.1016/j.tins.2004.10.007)
15. Todorov E, Jordan MI. 2002 Optimal feedback control as a theory of motor coordination. Nat. Neurosci. 5, 1226–1235. (doi:10.1038/nn963)
16. Todorov E. 2008 General duality between optimal control and estimation. In 47th IEEE Conf. on Decision and Control (CDC), 2008, pp. 4286–4292. IEEE.
17. Botvinick M, Toussaint M. 2012 Planning as inference. Trends Cogn. Sci. 16, 485–488. (doi:10.1016/j.tics.2012.08.006)
18. Donnarumma F, Maisto D, Pezzulo G. 2016 Problem solving as probabilistic inference with subgoaling: explaining human successes and pitfalls in the tower of Hanoi. PLoS Comput. Biol. 12, e1004864. (doi:10.1371/journal.pcbi.1004864)
19. Pezzulo G, Rigoli F. 2011 The value of foresight: how prospection affects decision-making. Front. Neurosci. 5, 79. (doi:10.3389/fnins.2011.00079)
20. Pezzulo G, Rigoli F, Chersi F. 2013 The mixed instrumental controller: using value of information to combine habitual choice and mental simulation. Front. Psychol. 4, 92. (doi:10.3389/fpsyg.2013.00092)
21. Todorov E. 2006 Linearly-solvable Markov decision problems. Adv. Neural Inf. Process Syst. 19, 1369–1376.
22. Maisto D, Donnarumma F, Pezzulo G. 2015 Divide et impera: subgoaling reduces the complexity of probabilistic inference and problem solving. J. R. Soc. Interface 12, 20141335. (doi:10.1098/rsif.2014.1335)
23. Friston K, Adams R, Perrinet L, Breakspear M. 2012 Perceptions as hypotheses: saccades as experiments. Front. Psychol. 3, 151.
24. Friston K, Stephan K, Li B, Daunizeau J. 2010 Generalised filtering. Math. Probl. Eng. 2010, 621670. (doi:10.1155/2010/621670)
25. Ashby WR. 1947 Principles of the self-organizing dynamic system. J. Gen. Psychol. 37, 125–128. (doi:10.1080/00221309.1947.9918144)
26. Feynman RP. 1998 Statistical mechanics: a set of lectures (advanced book classics). Boulder, CO: Westview Press Incorporated.
27. Hinton GE, Van Camp D. 1993 Keeping the neural networks simple by minimizing the description length of the weights. In Proc. of the Sixth Annual Conf. on Computational Learning Theory, pp. 5–13. ACM.
28. Zeki S, Shipp S. 1988 The functional logic of cortical connections. Nature 335, 311–317. (doi:10.1038/335311a0)
29. Quigley M et al. 2009 ROS: an open-source robot operating system. In Proc. Open-Source Software Workshop, Int. Conf. Robotics and Automation, Kobe, Japan, May, vol. 3, p. 5.
30. Körding KP, Wolpert DM. 2004 Bayesian integration in sensorimotor learning. Nature 427, 244–247. (doi:10.1038/nature02169)
31. Diedrichsen J, Verstynen T, Hon A, Zhang Y, Ivry RB. 2007 Illusions of force perception: the role of sensori-motor predictions, visual information, and motor errors. J. Neurophysiol. 97, 3305–3313. (doi:10.1152/jn.01076.2006)
32. Feldman AG, Levin MF. 2009 The equilibrium-point hypothesis—past, present and future. In Progress in motor control, pp. 699–726. Berlin, Germany: Springer.
33. Mussa-Ivaldi F. 1988 Do neurons in the motor cortex encode movement direction? An alternative hypothesis. Neurosci. Lett. 91, 106–111. (doi:10.1016/0304-3940(88)90257-1)
34. Mohan V, Morasso P. 2011 Passive motion paradigm: an alternative to optimal control. Front. Neurorob. 5, 1–28. (doi:10.3389/fnbot.2011.00004)
35. Butler AJ, Fink GR, Dohle C, Wunderlich G, Tellmann L, Seitz RJ, Zilles K, Freund H-J. 2004 Neural mechanisms underlying reaching for remembered targets cued kinesthetically or visually in left or right hemispace. Hum. Brain Mapp. 21, 165–177. (doi:10.1002/hbm.20001)
36. Diener H, Dichgans J, Guschlbauer B, Mau H. 1984 The significance of proprioception on postural stabilization as assessed by ischemia. Brain Res. 296, 103–109. (doi:10.1016/0006-8993(84)90515-8)
37. Dietz V. 2002 Proprioception and locomotor disorders. Nat. Rev. Neurosci. 3, 781–790. (doi:10.1038/nrn939)
38. Sainburg RL, Ghilardi MF, Poizner H, Ghez C. 1995 Control of limb dynamics in normal subjects and patients without proprioception. J. Neurophysiol. 73, 820–835.
39. Konczak J, Corcos DM, Horak F, Poizner H, Shapiro M, Tuite P, Volkmann J, Maschke M. 2009 Proprioception and motor control in Parkinson's disease. J. Motor Behav. 41, 543–552. (doi:10.3200/35-09-002)
40. Perrinet LU, Adams RA, Friston KJ. 2014 Active inference, eye movements and oculomotor delays. Biol. Cybern. 108, 777–801. (doi:10.1007/s00422-014-0620-8)
41. Bellman R. 1952 On the theory of dynamic programming. Proc. Natl Acad. Sci. USA 38, 716–719. (doi:10.1073/pnas.38.8.716)
42. Cooper GF. 2013 A method for using belief networks as influence diagrams. (https://arxiv.org/abs/1304.2346)
43. Shachter RD. 1988 Probabilistic inference and influence diagrams. Oper. Res. 36, 589–604. (doi:10.1287/opre.36.4.589)
44. Pearl J. 2014 Probabilistic reasoning in intelligent systems: networks of plausible inference. San Francisco, CA: Morgan Kaufmann.
45. Brown LD. 1981 A complete class theorem for statistical problems with finite sample spaces. Ann. Stat. 9, 1289–1300. (doi:10.1214/aos/1176345645)
46. Robert CP. 1992 L'analyse statistique bayésienne. Paris, France: Economica.
47. Friston K. 2011 What is optimal about motor control? Neuron 72, 488–498. (doi:10.1016/j.neuron.2011.10.018)
48. Friston K, Mattout J, Kilner J. 2011 Action understanding and active inference. Biol. Cybern. 104, 137–160. (doi:10.1007/s00422-011-0424-z)
49. Kiebel SJ, Von Kriegstein K, Daunizeau J, Friston KJ. 2009 Recognizing sequences of sequences. PLoS Comput. Biol. 5, e1000464. (doi:10.1371/journal.pcbi.1000464)
50. Pezzulo G. 2012 An active inference view of cognitive control. Front. Psychol. 3, 478. (doi:10.3389/fpsyg.2012.00478)
51. Pezzulo G, Donnarumma F, Iodice P, Prevete R, Dindo H. 2015 The role of synergies within generative models of action execution and recognition: a computational perspective. Comment on "Grasping synergies: a motor-control approach to the mirror neuron mechanism" by A. D'Ausilio et al. Phys. Life Rev. 12, 114–117. (doi:10.1016/j.plrev.2015.01.021)
52. Lepora NF, Pezzulo G. 2015 Embodied choice: how action influences perceptual decision making. PLoS Comput. Biol. 11, e1004110. (doi:10.1371/journal.pcbi.1004110)
53. Pezzulo G, van der Meer MA, Lansink CS, Pennartz CMA. 2014 Internally generated sequences in learning and executing goal-directed behavior. Trends Cogn. Sci. 18, 647–657. (doi:10.1016/j.tics.2014.06.011)
54. Pezzulo G, Cisek P. 2016 Navigating the affordance landscape: feedback control as a process model of behavior and cognition. Trends Cogn. Sci. 20, 414–424. (doi:10.1016/j.tics.2016.03.013)
55. Stoianov I, Genovesio A, Pezzulo G. 2016 Prefrontal goal-codes emerge as latent states in probabilistic value learning. J. Cogn. Neurosci. 28, 140–157. (doi:10.1162/jocn_a_00886)
56. Verschure P, Pennartz C, Pezzulo G. 2014 The why, what, where, when and how of goal directed choice: neuronal and computational principles. Phil. Trans. R. Soc. B 369, 20130483. (doi:10.1098/rstb.2013.0483)
