Sampling-Based A* Algorithm for Robot Path-Planning

Article in The International Journal of Robotics Research · November 2014

DOI: 10.1177/0278364914547786
Article

The International Journal of Robotics Research
2014, Vol. 33(13) 1683–1708
© The Author(s) 2014
Reprints and permissions: sagepub.co.uk/journalsPermissions.nav
ijr.sagepub.com

Sampling-based A* algorithm for robot path-planning

Sven Mikael Persson and Inna Sharf

Abstract
This paper presents a generalization of the classic A* algorithm to the domain of sampling-based motion planning. The
root assumptions of the A* algorithm are examined and reformulated in a manner that enables a direct use of the search
strategy as the driving force behind the generation of new samples in a motion graph. Formal analysis is presented to
show probabilistic completeness and convergence of the method. This leads to a highly exploitative method which does
not sacrifice entropy. Many improvements are presented to this versatile method, most notably, an optimal connection
strategy, a bias towards the goal region via an Anytime A* heuristic, and balancing of exploration and exploitation on a
simulated annealing schedule. Empirical results are presented to assess the proposed method both qualitatively and
quantitatively in the context of high-dimensional planning problems. The potential of the proposed methods is apparent, both
in terms of reliability and quality of solutions found.
Keywords
Sampling-based algorithm, path-planning, motion-planning, A*, RRT*, probabilistic completeness, simulated
annealing, robot manipulator, high-dimensional planning, optimization

McGill University, McConnell Engineering Building, Montreal, Quebec, Canada

Corresponding author:
Sven Mikael Persson, McGill University, McConnell Engineering Building, 3480 University Street, Montreal, Quebec, H3A 2A7, Canada.
Email: mikael.persson@mail.mcgill.ca

1. Introduction

Complex path-planning problems typically involve high-dimensional configuration spaces, complex collision-detection geometries, kinematic or dynamic constraints, uncertain or dynamic environments, unconventional distance metrics or cost functions, and other domain-specific considerations. These types of problems are generally very difficult to solve by a direct trajectory optimization, and are often inadequately served by simple navigation heuristics (e.g. potential field approaches). Deterministic algorithms that rely on a discretization of the configuration space to represent all possible moves the robot can make are usually computationally too expensive and dedicate far too much effort, in time and memory, where none is needed to meet the goal of finding a feasible and optimal path. For these reasons, sampling-based motion-planning methods have become a very popular alternative as they probabilistically construct a motion graph to serve the purposes of the planning problem and can thus be controlled to direct resources more effectively.

Sampling-based motion planning started to gather serious traction with the introduction of two families of methods: the rapidly-exploring random tree (RRT) (LaValle, 1998; LaValle and Kuffner, 2001); and the probabilistic roadmap (PRM) (Kavraki et al., 1996; Kavraki and Latombe, 1998). The key development with the introduction of the RRT algorithm is the concept of the Voronoi bias (sometimes called Voronoi pull) which drives the creation of new samples through an expansion of existing samples towards unexplored regions represented by random samples drawn from the overall configuration space (most likely, unexplored regions). The first important modification to the RRT algorithm was the introduction of a bidirectional strategy, i.e. growing two trees, one from the goal and one from the start, and greedily attempting to connect them (Kuffner and LaValle, 2000). Since then, countless other modifications have been introduced, ranging from general to domain-specific.

The PRM algorithm is quite different in nature as it attempts to create a complete representation of the collision-free configuration space through a roadmap of samples that are all connected as one graph (Kavraki et al., 1996). For early overviews and summaries of PRM algorithms, variations, and comparative studies, the reader is directed towards (Geraerts and Overmars, 2002; Sucan and
Kavraki, 2010). The important concepts introduced include mainly the formal analysis of probabilistic completeness and of space connectivity, as well as the more practical concept of monitoring the density of nodes in the motion-graph and leveraging that information to "push" the expansion in useful regions.

This last concept will be critical in the present paper. Another early algorithm that builds on this idea is the expansive spaces tree (EST) (Hsu et al., 1999) which generates a tree via random walks (or expansions) of existing nodes in the tree, which are picked with a probability inversely proportional to the number of nodes in their direct neighborhood. The formal analysis in the present paper will rely heavily on the analysis of the EST algorithm, as presented in the thesis (Hsu, 2000). Few later algorithms directly fall under this category of algorithms, but two notable later developments are: the single-query bi-directional lazy planner (SBL) (Sanchez and Latombe, 2001) which introduces many practical improvements to the EST algorithm; and the Guided-EST algorithm (Phillips et al., 2004) which introduces a more sophisticated density heuristic which incorporates, among other things, the A* cost of the nodes. In that sense, the algorithm presented in this paper arrives at a similar result, but with the critical difference that the density and A* cost combination are derived from a generalization of the A* algorithm itself and are justified by formal analysis.

Another important early development in sampling-based motion planning was the realization that quasi-random sampling could, in general, be sufficient and beneficial to such algorithms (Branicky et al., 2001). By their nature, quasi-random samplers rely on a finite discretization of space, with controlled interval sizes, and generate samples from the finite set, which tends to produce more uniform distributions. In other words, this can be seen as a probabilistic discovery of a finite motion graph, avoiding the problem of representing the complete motion graph in memory or having to traverse it entirely, while benefiting from its limited and uniform density, another important concept in the present paper.

The present paper is part of the trend of bringing useful concepts from deterministic path planning into sampling-based methods. The classic deterministic path-planning method is the A* algorithm (Hart et al., 1968) which relies on a best-first exploration of the motion-graph to find an optimal path from a starting node to a goal node. Two relatively recent modifications to this classic algorithm have been made. First, the Anytime A* algorithm was developed to speed up the initial search for a feasible path and then progressively improve it, thus achieving an anytime behavior (Likhachev et al., 2003). Then, dynamic reconfiguration of the motion-graph was added to create the Anytime Dynamic A* (AD*) algorithm (Likhachev et al., 2005).

Some attempts have been made to incorporate these aforementioned deterministic algorithms more intimately with different kinds of sampling-based algorithms. For example, the strategies for the dynamic reconfiguration of the motion-graph were applied in a straightforward manner to an RRT planner (Ferguson et al., 2006). Similarly, since the PRM algorithm requires a shortest-path algorithm to resolve a query once a sufficiently connected roadmap is available, one can alternate the generation of the motion graph and an anytime dynamic search through it, as in (Berg et al., 2006). Yet another possibility is to use a layered approach such as superimposing a coarse discrete search and a sampling-based planner (Plaku, 2012).

More interestingly and more relevant to the present paper is the concept of expanding the motion graph through the shortest-path algorithm itself. One interesting example is the Flexible Anytime Dynamic PRM (FADPRM) (Belghith et al., 2006) which uses a mixture of the density metric (from PRM) and the AD* heuristic to select nodes to be expanded by random walks. The FADPRM relies on many heuristics and fine-tuned parameters that make its application rather difficult in practice. Another notable example of driving the sampling with a shortest-path heuristic is the Utility-guided RRT (Burns and Brock, 2007). This method is especially relevant to the present paper because it attempts to model the utility of the exploration in a probability theoretic fashion. Building off a prior method by the same authors (Burns and Brock, 2005) which explored the idea of predictive statistical models to learn the topology of the free configuration-space, the Utility-guided RRT uses local predictions of the utility of exploring around a given node and tries to sample along the least-explored directions therefrom. Similarly, the application of equivalence classes as a means to exhaust neighborhoods is another predictive model that can be applied (Gonzalez and Likhachev, 2011). In the present paper, we rely on a similar idea by relying on the expected value of the total path cost (as in the A* algorithm) with the aim of maximizing exploitation by sampling around optimal regions.

Coming back to the classical sampling-based approaches, Karaman and Frazzoli (2011) have presented a very influential paper in which they describe three new algorithms, RRT*, rapidly-exploring random graph (RRG), and PRM*, which are proven to be asymptotically optimal. The RRT* is probably the most widely used sampling-based algorithm today and varies from the RRT mainly in the fact that it keeps track of the accumulated travel cost and performs optimal re-wirings to conserve a record of the optimal path from the root to any other node. A branch-and-bound strategy can also be added to the algorithm (Karaman et al., 2011). The argument behind the branch-and-bound strategy is that nodes that cannot contribute to optimal paths (from start to goal) should be pruned from the motion tree to reduce wasted efforts.

In machine learning and numerical optimization literature in particular, there is a recurring theme, that of exploration versus exploitation. This is the problem of choosing between broadening a search in order to discover all possible solutions versus refining the current best solution(s) (e.g. gradient descent). This balancing act has been less
present in sampling-based motion-planning literature, which has implicitly favored exploration since its inception (Kavraki et al., 1996; LaValle, 1998) and sometimes explicitly (Sucan and Kavraki, 2012), but there are notable exceptions. In many practical implementations of uni-directional RRT methods, the goal region is sampled on a regular basis in a greedy attempt to grow the tree more rapidly towards it. The exploration-exploitation tree (EET) (Rickert et al., 2008) is a method which draws samples from a region around the goal, making incremental expansions towards the obtained sample, and expanding or shrinking the sampling region according to a heuristic based on the success rate of the expansions, thus balancing exploration and exploitation via a larger or smaller sampling region around the goal.

Another interesting method, presented in Vazquez-Otero et al. (2012), uses a dynamic reaction–diffusion process to expand a search for the goal location and then contract while leaving a simulated tension between the start and goal, resulting in an optimal path. This strategy is strangely reminiscent of simulated annealing methods used in numerical optimization and certain machine learning algorithms. The authors did not find any reported use of simulated annealing in a sampling-based motion planner, but it has been shown to be useful when solving a path-planning problem as a general non-linear trajectory-optimization problem. Most recently, the covariant Hamiltonian optimization for motion planning (CHOMP) method presents one state-of-the-art use of simulated annealing in motion planning (Zucker et al., 2013). We mention these methods mainly because one of the central novelties in the present paper is the application of a simulated annealing strategy to balance exploration and exploitation in the proposed sampling-based motion planner.

This paper is organized as follows. Section 2 presents a generalization of the A* algorithm by deconstructing its basal assumptions and casting them in a probability theoretic framework that can be used with local predictive models to drive the expansion of a motion graph. We name this method the sampling-based A* algorithm (SBA*) and provide formal analysis to characterize its convergence. Then, in Section 3, the practical SBA* algorithm is presented with a number of refinements to it, notably, an optimal connection strategy, an anytime heuristic to provide a stronger goal-bias, and the use of simulated annealing to balance exploration and exploitation. Finally, in Section 4, results are presented to characterize the behavior of the proposed algorithm in a cluttered environment, in high-dimensional spaces, and for the practical application motivating this work: motion planning for a seven-degree-of-freedom (7-dof) manipulator to capture a free-floating target.

2. Generalizing the A* algorithm

This section outlines the process of generalization of the A* algorithm (Hart et al., 1968) to a sampling-based path-planning approach. The A* search strategy has a number of advantages, two of which are lacking in current sampling-based approaches, that is, the search is focused on the current most promising path and the first solution found is the optimal one (within the discretization). However, the A* search is an exhaustive search over a finite motion graph (e.g. a "grid"), which is problematic since, by definition, a sampling-based approach involves the construction of the motion-graph through sampling of the configuration space, and can thus be generated ad infinitum. We tackle this problem by examining the underlying rationale behind the A* algorithm, and we generalize that rationale, through local predictive statistical models, so that it is suitable for a sampling-based approach.

2.1 Overview of A*

At the core of the A* search algorithm is a priority queue which chooses the most promising node to visit. Visiting a node entails surveying its neighboring nodes, adding them to the priority queue as needed, and the process is repeated until the goal node is discovered. In other words, nodes of the motion graph start with the label "undiscovered", then become "discovered but not expanded" (or "open"), and finally, become "expanded" (or "closed"). Needless to say, the search strategy is very simple, yet very effective, in fact, optimal under most usual conditions.

Clearly, the algorithm is driven by a measure of how "promising" the visit of a node is to the search. This measure is obtained through an approximation that underestimates the total travel cost when taking a path through a given node. In concrete terms, given a node u, if we accumulate the cost to travel from the start to that node via the shortest path through the discovered portion of the motion-graph, denoting that cost as g(u), and then compute a heuristic value for the remaining cost to the goal, denoting it as h(u), we can obtain a lower-bound approximation of the total cost as

$$f(u) \equiv g(u) + h(u) \tag{1}$$

where the heuristic value h(u) is required to be equal to or less than the actual travel cost along a non-colliding path from u to the goal node. Usually, h(u) is simply the "bird-flight" distance, i.e. the distance between u and the goal node in the configuration space when obstacles are ignored. Once the total travel cost can be approximated, the node with the least total travel cost is considered as the most promising and, thus, the priority queue chooses the minimum element (i.e. a "min-heap"). The optimality of the A* search hinges upon requirements on the heuristic value, most notably, that it does not overestimate the remaining cost (called admissibility) and that it is monotonically decreasing as progress is made towards the goal (called consistency). A "bird-flight" distance in a configuration space with a proper metric automatically satisfies these conditions, and since most path-planning applications, even
in the most esoteric domains, have these properties already for other reasons (e.g. having consistent neighborhoods), the conditions for optimality of the A* search are easily met.

The A* algorithm is well-known to most researchers in the field, and we will not provide further detail on it in this section. The key elements to keep in mind are, first, the idea of choosing the most "promising" nodes and, second, approximating the total travel cost by a combination of the accumulated cost from the start node and the heuristic evaluation of the remaining cost to reach the goal. The latter is simple to do and is available to virtually all application domains. The former, however, will need further dissection to make the A* search applicable as the driver for a sampling-based path-planning approach.

2.2 Generalization of the node value function

The classic formulation of the A* search algorithm (Hart et al., 1968) makes a silent assumption, that is, expanding a node is only useful if there is information to be gained from expanding it. This can seem like a trivial point, but it becomes important in a sampling-based approach. Given that the A* search is normally conducted on a finite motion graph, upon the first examination of a node, one can exhaustively survey its neighborhood, after which there is obviously no more information to be gained from that neighborhood, and the node can be "closed" and never expanded again.

If one were to simply take an A* search algorithm and replace the exhaustive survey of the neighborhood by the generation of samples within the nearby configuration space, the algorithm would make no progress at all, since it would repeatedly choose nodes from the same area and generate more samples in that neighborhood, ad infinitum. We need a measure of the information gain to be expected from generating a sample in a given neighborhood, that is, a measure of how exhaustively searched the neighborhood of a node is. In other words, we must depart from the binary concept of "open" versus "closed" nodes from the classic A* algorithm, and generalize the concept to a more appropriate model of information gain.

In more concrete terms, the A* search algorithm chooses the node $u_i$ to expand next based on the following criterion:

$$u_i = \arg\min_{u} f(u) \quad \text{with } u \in \{\mathrm{OPEN}\} \tag{2}$$

which could be reformulated by defining a new node value function:

$$r(u) \equiv \begin{cases} f(u) & \text{if } u \in \{\mathrm{OPEN}\} \\ \infty & \text{otherwise} \end{cases} \tag{3}$$

$$u_i = \arg\min_{u} r(u) \tag{4}$$

which clearly exposes the binary nature of the expected information gain from expanding a node. In other words, the value of r(u) reflects the expected total travel cost of a new path discovered that goes through node u. The emphasis here is on the fact that the partial path is new, and that it is feasible, both encoded by whether the node u belongs to the OPEN set or not, since a node will not be in the OPEN set if it is unreachable by a collision-free path or has already been expanded.

We define two types of events that can occur during a visitation.

$$\mathrm{NEW}: \text{A new partial path was constructed} \tag{5}$$

$$\mathrm{FREE}: \text{A collision-free segment was discovered} \tag{6}$$

This leads to the following redefinition of r(u):

$$r(u) = P(\mathrm{NEW}, \mathrm{FREE} \mid u)\, f(u) + \left(1 - P(\mathrm{NEW}, \mathrm{FREE} \mid u)\right) \infty \tag{7}$$

$$P(\mathrm{NEW}, \mathrm{FREE} \mid u) = \begin{cases} 1 & \text{if } u \in \{\mathrm{OPEN}\} \\ 0 & \text{otherwise} \end{cases} \tag{8}$$

where $P(\mathrm{NEW}, \mathrm{FREE} \mid u)$ is the probability that, given a node u, a visitation will yield a new and collision-free path, which, in a classic A* algorithm, is encoded by the OPEN set.

Clearly, r(u) is an expected value from a binomial distribution, which we can express as such:

$$r(u) = E_{\mathcal{N}(u)}[f(u)] \tag{9}$$

which is the expected value of f(u) in the neighborhood of u, when rejecting nodes that are not reachable through a collision-free path or have already been explored. This formulation is clearly far more general, and now holds more hope in transposing the A* search method to sampling-based path planning.

For any continuous probability $P(\mathrm{NEW}, \mathrm{FREE} \mid u)$, we obtain the following expression for r(u):

$$r(u) = \frac{f(u)}{P(\mathrm{NEW}, \mathrm{FREE} \mid u)} \tag{10}$$

which, as expected, goes to infinity as the probability of discovering a new, collision-free node in the neighborhood $\mathcal{N}(u)$ goes to zero. In addition, we note that, given the definition of the NEW and FREE events, they are independent events, meaning that we can do the following:

$$P(\mathrm{NEW}, \mathrm{FREE} \mid u) = P(\mathrm{NEW} \mid u)\, P(\mathrm{FREE} \mid u) \tag{11}$$

In a classic A* search algorithm, any node belonging to the OPEN set is given a priori values of 1 for the probabilities $P(\mathrm{NEW} \mid u)$ and $P(\mathrm{FREE} \mid u)$, and given that a visitation is an exhaustive search in the finite neighborhood $\mathcal{N}(u)$, when the node is closed, those probabilities drop to 0. In a sampling-based approach, this open–close transition is progressively achieved as the neighborhood $\mathcal{N}(u)$ gets filled with nodes sampled from the configuration space.
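
As a minimal sketch of equations (3), (10) and (11), the node value can be written as a small Python function. The function names and the numerical probabilities below are illustrative assumptions only; how P(NEW | u) and P(FREE | u) are actually estimated is not specified in this section.

```python
import math

def node_value(f_u, p_new, p_free):
    """Generalized node value r(u) = f(u) / (P(NEW|u) * P(FREE|u)), as in (10)-(11)."""
    p = p_new * p_free
    # The value goes to infinity when nothing new or feasible is expected.
    return math.inf if p <= 0.0 else f_u / p

def classic_node_value(f_u, is_open):
    """Classic A* as the binary special case: probabilities are 1 if OPEN, else 0."""
    p = 1.0 if is_open else 0.0
    return node_value(f_u, p, p)

def choose_best(nodes):
    """arg min of r(u) over a dict {name: (f, P(NEW|u), P(FREE|u))}."""
    return min(nodes, key=lambda name: node_value(*nodes[name]))

# Hypothetical numbers: "a" has the lower f(u) but a nearly exhausted neighborhood.
nodes = {"a": (10.0, 0.1, 1.0), "b": (12.0, 0.9, 1.0)}
print(choose_best(nodes))  # prints b
```

With these made-up numbers, node "a" would win under the classic criterion (2), but its low probability of yielding a new path inflates r(a) to 100, so the less explored node "b" is expanded instead.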
Algorithm 1. Randomized expansion from best sample.

Initialize a graph G = (V, E) with V = {u_init}.
repeat
    u_exp ← CHOOSEBESTSAMPLE(V)               ▷ Pick sample from V
    {u_new, success} ← RANDOMWALK(u_exp)      ▷ Generate sample near u_exp
    if success then
        V ← V ∪ {u_new}
        Connect u_new to G by adding appropriate edges to E.
    end if
until u_new in goal region
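
The loop of Algorithm 1 can be sketched in Python for a two-dimensional unit square. This is only an illustration under stated assumptions, not the authors' implementation: the probability models are hypothetical stand-ins (P(NEW | u) is taken as the inverse of the number of nodes within `radius` of u, and P(FREE | u) as the inverse of one plus the number of failed expansions from u), obstacles are circles, each new node is connected only to the node it was expanded from, and all names and default parameters are invented for the example.

```python
import math
import random

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def is_free(p, obstacles):
    """Point lies in the unit square and outside every circular obstacle (center, r)."""
    if not (0.0 <= p[0] <= 1.0 and 0.0 <= p[1] <= 1.0):
        return False
    return all(dist(p, c) > r for c, r in obstacles)

def segment_free(a, b, obstacles, checks=10):
    """Approximate edge check: test evenly spaced points along the segment."""
    return all(
        is_free((a[0] + (b[0] - a[0]) * t / checks,
                 a[1] + (b[1] - a[1]) * t / checks), obstacles)
        for t in range(checks + 1))

def random_walk(u, radius, obstacles, rng):
    """RANDOMWALK: one uniform sample in the disc around u; success if the edge is free."""
    ang = rng.uniform(0.0, 2.0 * math.pi)
    rad = radius * math.sqrt(rng.random())
    new = (u[0] + rad * math.cos(ang), u[1] + rad * math.sin(ang))
    return new, segment_free(u, new, obstacles)

def randomized_expansion(start, goal, goal_tol=0.1, radius=0.1,
                         obstacles=(), max_iters=5000, seed=0):
    rng = random.Random(seed)
    nodes, parent, g = [start], [None], [0.0]
    crowd, fails = [1], [0]  # crowd[i]: nodes within `radius` of node i, itself included
    for _ in range(max_iters):
        # CHOOSEBESTSAMPLE: arg min of f(u) / (P(NEW|u) P(FREE|u)) with the
        # assumed models P(NEW|u) = 1 / crowd and P(FREE|u) = 1 / (1 + fails).
        i = min(range(len(nodes)),
                key=lambda k: (g[k] + dist(nodes[k], goal)) * crowd[k] * (1 + fails[k]))
        new, ok = random_walk(nodes[i], radius, obstacles, rng)
        if not ok:
            fails[i] += 1
            continue
        near = [k for k, v in enumerate(nodes) if dist(v, new) <= radius]
        for k in near:
            crowd[k] += 1
        nodes.append(new)
        parent.append(i)
        g.append(g[i] + dist(nodes[i], new))
        crowd.append(len(near) + 1)
        fails.append(0)
        if dist(new, goal) <= goal_tol:  # until u_new in goal region
            path, j = [], len(nodes) - 1
            while j is not None:
                path.append(nodes[j])
                j = parent[j]
            return path[::-1]
    return None
```

A call such as `randomized_expansion((0.1, 0.1), (0.9, 0.9))` returns a list of waypoints from the start to the goal region, or `None` if the iteration budget runs out. Obstacles can be passed as, for example, `obstacles=[((0.5, 0.5), 0.2)]`.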
2.3 Idealized sampling-based A* algorithm

For analytical purposes, it is sufficient to assume a simplified algorithm, and therefore, Algorithm 1 presents a general random-expansion algorithm which is simply a re-statement of Algorithm 4.1 of Hsu (2000) with the sampling strategy replaced by the CHOOSEBESTSAMPLE function. The RANDOMWALK function is a simple random steering function that attempts to produce a new sample $u_{new}$ in the neighborhood of $u_{exp}$ with a feasible path to it. Then, the algorithm integrates that new sample into the motion graph. Finally, the iterations stop when a sample is within the goal region.

This idealized algorithm is presented mainly for exposition purposes and to be used in the subsequent formal analysis, while the practical algorithm is presented in Section 3. The key aspect of this idealized algorithm is choosing the "best" sample. From the previous section, it should be apparent how we would choose such a best sample for expansion. Faithfully conforming to the classic A* search algorithm, we would choose the best sample to expand based on the best expected value of r(u), as follows:

$$u_{exp} = \arg\min_{u \in V} r(u) \tag{12}$$

Broadly speaking, this algorithm captures the essential elements of the SBA* algorithm. Given the similarities with the EST algorithm (Hsu et al., 1999), we can consider that the SBA* falls under that family of methods. The similarities between these algorithms run deep enough, in fact, that it is natural to express the formal analysis of the idealized SBA* algorithm in terms of the formal analysis that supports the EST family of algorithms.

2.4 Formal analysis

Given the similarities between EST and SBA*, we start the formal analysis by building upon the work of Hsu (2000) on an idealized EST algorithm, to which we add a few relevant definitions and lemmas to build a proof of probabilistic completeness with a convergence rate of the idealized SBA* algorithm. In this section, we refer to the configuration space as $\mathcal{X}$ and its collision-free subset as $\mathcal{F} \subseteq \mathcal{X}$. Moreover, we use $\mathcal{R}_\ell(u) \subseteq \mathcal{F}$ to denote the region of space locally reachable from a point u, e.g. a hyperball around u or a region constrained by controllability limitations.

The analysis relies on the concept of expansive spaces (Hsu et al., 1999), and we begin with a brief reminder of two fundamental definitions relevant to this concept, as follows.

Definition 1 (β-LOOKOUT). Given a constant $\beta \in (0,1]$, the β-LOOKOUT of a set $S \subseteq \mathcal{F}$ is

$$\beta(S) \equiv \{ u \in S \mid \mu(\mathcal{R}_\ell(u) \setminus S) \geq \beta\, \mu(\mathcal{R}_\ell(S) \setminus S) \} \tag{13}$$

where $\mu(S)$ denotes the volume of a set S and $\beta(S)$ denotes its β-LOOKOUT.

Definition 2 ((α, β)-expansive). Given constants $\alpha, \beta \in (0,1]$, the set $\mathcal{F}$ is (α, β)-expansive if for every point $u \in \mathcal{F}$ and subset $S \subseteq \mathcal{R}_\ell(u)$, we have that $\mu(\beta(S)) \geq \alpha\, \mu(S)$.

In less formal terms, the β-LOOKOUT denotes the region from which a significant portion of new space could be discovered by a random walk or local sampling, while an (α, β)-expansive space has the property of a lower bound on the volume of the lookout region around any arbitrary point. Together, these definitions form the basis for constructing sequentially reachable chains of milestones that explore the space (Hsu, 2000), making them an ideal basis for the formal analysis to follow.

2.4.1 Unbiased sampling. We begin the analysis by considering the case of unbiased sampling, that is, considering uniform exploration of the space. To start the construction of this formal analysis, we present a definition of the local Voronoi regions.

Definition 3 (Local Voronoi region). Given a point $u \in \mathcal{F}$ being the seed of a Voronoi region $\mathcal{V}(u)$, the local Voronoi region $\mathcal{V}_\ell(u)$ is defined as

$$\mathcal{V}_\ell(u) \equiv \mathcal{R}_\ell(u) \cap \mathcal{V}(u) \tag{14}$$

where $\mathcal{R}_\ell(u) \subseteq \mathcal{F}$ is the locally reachable region around u.

In other words, the local Voronoi region is the set of all points that are closer to u than to any other point and that are locally reachable from u. The local Voronoi region of a point set V is then $\mathcal{V}_\ell(V) \equiv \bigcup_{u \in V} \mathcal{V}_\ell(u)$.
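
Definition 3 can be illustrated numerically. The sketch below estimates, by Monte Carlo sampling, the share of the total local Voronoi volume that belongs to each seed. It assumes a two-dimensional Euclidean space without obstacles and discs of equal radius as the locally reachable regions; the function name and parameters are hypothetical and chosen only for this example.

```python
import math
import random

def local_voronoi_fractions(points, radius, n_samples=20000, seed=0):
    """Monte Carlo estimate of mu(V_l(u)) / mu(V_l(V)) for every seed u in `points`."""
    rng = random.Random(seed)
    # Bounding box that contains every reachable disc.
    lo_x = min(p[0] for p in points) - radius
    hi_x = max(p[0] for p in points) + radius
    lo_y = min(p[1] for p in points) - radius
    hi_y = max(p[1] for p in points) + radius
    counts = [0] * len(points)
    for _ in range(n_samples):
        x = (rng.uniform(lo_x, hi_x), rng.uniform(lo_y, hi_y))
        dists = [math.dist(x, p) for p in points]
        k = min(range(len(points)), key=dists.__getitem__)  # seed of the Voronoi cell
        if dists[k] <= radius:  # x is also locally reachable from that seed
            counts[k] += 1
    total = sum(counts)
    return [c / total for c in counts]

# Two crowded seeds and one isolated seed: the isolated one owns the largest share.
fractions = local_voronoi_fractions([(0.0, 0.0), (0.05, 0.0), (1.0, 0.0)], radius=0.2)
print(fractions)
```

Seeds in crowded areas share their reachable discs with their neighbors and therefore receive smaller fractions, which is the density effect that the analysis in this section builds on.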
Lemma 1. Given a metric space $(\mathcal{X}, d)$ and a set V of points $u \in \mathcal{X}$ for which the locally reachable region is $\mathcal{R}_\ell(u) = \{ v \in \mathcal{X} \mid d(u, v) \leq R \}$, the local Voronoi space of V (as per Definition 3) is coincident with the locally reachable space of V, i.e. $\mathcal{V}_\ell(V) = \mathcal{R}_\ell(V)$.

Proof. Since the Voronoi regions span the entire space $\mathcal{X}$, a point can only be excluded from $\mathcal{V}_\ell(V)$ if it is not reachable from the point that is closest to it (the seed of the Voronoi region), and because the locally reachable regions $\mathcal{R}_\ell(u)$ are hyperballs of equal radius, such a point cannot be reachable from any other point in V, and thus, must necessarily be excluded from $\mathcal{R}_\ell(V)$ as well. That is, a point belongs to $\mathcal{V}_\ell(V)$ if and only if it belongs to $\mathcal{R}_\ell(V)$.

It is important to note that Lemma 1 extends to a free configuration space $\mathcal{F}$ over a metric space, as long as $\mathcal{F}$ is compact (cf. fully controllable state space). The proof is only more involved if one has to consider reachable regions $\mathcal{R}_\ell(u)$ that intersect the non-convex boundaries of $\mathcal{F}$. For that reason, we omit this generalization here.

Also, one should note that a consequence of Lemma 1 is that it presents an alternative to computing the volume of $\mathcal{R}_\ell(V)$ by taking the sum of the volumes of the local Voronoi regions:

$$\mu(\mathcal{R}_\ell(V)) = \sum_{v \in V} \mu(\mathcal{V}_\ell(v)) \tag{15}$$

which suggests a sampling strategy based on the fraction $\mu(\mathcal{V}_\ell(u))$ of the total volume $\mu(\mathcal{R}_\ell(V))$ as a basis for generating nearly uniform samples in that region.

Lemma 2. Given a point set V where each point $u \in \mathcal{F}$ has a locally reachable region $\mathcal{R}_\ell(u) \subseteq \mathcal{F}$, then drawing a sample $u'$ uniformly from the reachable neighborhood $\mathcal{R}_\ell(u)$ of a point u chosen with probability:

$$P(u) = \frac{\mu(\mathcal{V}_\ell(u))}{\mu(\mathcal{R}_\ell(V))} \tag{16}$$

yields a sample distribution that is an unbiased approximation of a uniform distribution over $\mathcal{R}_\ell(V)$.

Proof. The integral definition of a uniform probability distribution over $\mathcal{X}$ is

$$\mu(\mathcal{X}) \int_G p(x)\, \mu(dx) = \mu(G) \quad \text{for all } G \subseteq \mathcal{X} \tag{17}$$

which can only be true if the distribution is constant and integrates to 1 over $\mathcal{X}$.

Following the same logic, we can examine the following integral:

$$c(G) = \mu(\mathcal{R}_\ell(V)) \int_G p(u')\, \mu(du') \tag{18}$$

$$c(G) = \mu(\mathcal{R}_\ell(V)) \int_G \sum_{u \in \mathcal{N}(u')} \frac{\mu(\mathcal{V}_\ell(u))}{\mu(\mathcal{R}_\ell(V))\, \mu(\mathcal{R}_\ell(u))}\, \mu(du') \tag{19}$$

$$c(G) = \int_G \sum_{u \in \mathcal{N}(u')} \frac{\mu(\mathcal{V}_\ell(u))}{\mu(\mathcal{R}_\ell(u))}\, \mu(du') \tag{20}$$

If we look at $G \subseteq \mathcal{R}_\ell(V)$ with $N = \{ u \in V \mid u \in G \} \subseteq V$, the integral c(G) reduces to

$$c(G) = \mu(\mathcal{R}_\ell(N)) - \sum_{u \in N} \frac{\mu(\mathcal{V}_\ell(u))}{\mu(\mathcal{R}_\ell(u))}\, \mu(\mathcal{R}_\ell(u) \setminus G) + \sum_{u \in V \setminus N} \frac{\mu(\mathcal{V}_\ell(u))}{\mu(\mathcal{R}_\ell(u))}\, \mu(G \cap \mathcal{R}_\ell(u)) \tag{21}$$

By defining the sets $I = \{ u \in N \mid \mathcal{R}_\ell(u) \setminus G \neq \emptyset \}$ and $O = \{ u \in V \setminus N \mid \mathcal{R}_\ell(u) \cap G \neq \emptyset \}$, the bounds on c(G) can be expressed as

$$\mu(G) - \mu(\mathcal{R}_\ell(I)) \leq c(G) \leq \mu(G) + \mu(\mathcal{R}_\ell(O)) \tag{22}$$

which are expected to be conservative bounds. More importantly, the true integral always lies within a limited interval around the uniform value, with both the upper and lower margins being of equal volume on average, meaning that the adopted sampling strategy approximates uniform sampling and is unbiased.

The significance of Lemma 2 should be clear as it presents the proposed sampling strategy as a viable alternative to uniform sampling in $\mathcal{R}_\ell(V)$, which is difficult in practice (and for analysis). The local deviations of the integral probabilities merely imply that the distribution of samples deviates from a uniform distribution symmetrically within a reachable band around the boundary of G, but is not biased by the density of the existing points in V inside or around G. It is also worth noting that for most choices of G the deviations are limited to a small fraction of $\mu(G)$. Lemma 2 allows us to prove the lemma that follows, which is key to obtaining a convergence rate and thus, probabilistic completeness.

Lemma 3. Given a point set V where each point $u \in \mathcal{F}$ has a locally reachable region $\mathcal{R}_\ell(u) \subseteq \mathcal{F}$, and that $\mathcal{F}$ is (α, β)-expansive for some values of $\alpha, \beta \in (0,1]$ (Hsu, 2000), then drawing a sample $u'$ uniformly from the reachable neighborhood $\mathcal{R}_\ell(u)$ of a point u chosen with probability:

$$P(u) = \frac{\mu(\mathcal{V}_\ell(u))}{\mu(\mathcal{R}_\ell(V))} \tag{23}$$

yields a probability of sampling $u'$ in the β-LOOKOUT of $\mathcal{R}_\ell(V)$ that is at least α.

Proof. Following the proof of Lemma 2, we can choose to take the integrals c over $\mathcal{R}_\ell(V) \setminus \beta(V)$ and its complement, where $\beta(V)$ denotes the β-LOOKOUT of $\mathcal{R}_\ell(V)$, which satisfy the following:

$$c(\beta(V)) = \mu(\mathcal{R}_\ell(V)) - c(\mathcal{R}_\ell(V) \setminus \beta(V)) \tag{24}$$

$$c(\beta(V)) \geq \mu(\mathcal{R}_\ell(V)) - \mu(\mathcal{R}_\ell(V) \setminus \beta(V)) - \mu(\mathcal{R}_\ell(O)) \tag{25}$$

$$c(\beta(V)) \geq \mu(\beta(V)) - \mu(\mathcal{R}_\ell(O)) \tag{26}$$

where $O$ is the set of points from $V$ that lie in the $\beta$-LOOKOUT of $\mathcal{R}_\ell(V)$, which is empty, because, by definition, points in the $\beta$-LOOKOUT must be able to reach beyond $\mathcal{R}_\ell(V)$, which none of the points in $V$ can do. Therefore, $\mu(\mathcal{R}_\ell(O)) = 0$. Furthermore, in a $(\alpha, \beta)$-expansive space, the volume of the $\beta$-LOOKOUT is bounded below by a fraction $\alpha$ of the total volume, meaning that $c(\beta(V)) \geq \alpha\,\mu(\mathcal{R}_\ell(V))$, or, in terms of the integral probability,

$$P(u' \in \beta(V)) = \int_{\beta(V)} p(u')\,\mu(du') = \frac{c(\beta(V))}{\mu(\mathcal{R}_\ell(V))} \geq \alpha \tag{27}$$

which completes the proof.

For analytical purposes, we use Algorithm 1, but substitute, in place of the CHOOSEBESTSAMPLE function, the "Voronoi-sample" strategy described in Lemma 2. Because the sampling strategy is the only relevant difference compared to the idealized EST algorithm, we are able to re-use Theorem 4.3 from Hsu (2000) in its entirety by using Lemma 3 to arrive at the same probabilistic convergence rate, under the same assumptions.

Theorem 4. Let $g > 0$ be the volume of the goal region in $\mathcal{X}$ and $\gamma$ be a constant in $(0,1]$. A sequence $V$ of $n$ nodes generated by Algorithm 1 with the sample $u_{\text{exp}}$ chosen by the strategy described in Lemma 3 contains a node in the goal region with probability at least $1 - \gamma$, if $n \geq (k/\alpha)\ln(2k/\gamma) + (2/g)\ln(2/\gamma)$, where $k = (1/\beta)\ln(2/g)$.

Proof. The proof for this theorem is identical to the proof of Theorem 4.3 of Hsu (2000) since Lemma 3 establishes that the probability of sampling a point in the lookout of $\mathcal{R}_\ell(V)$ is at least $\alpha$ as is required by the ideal sampler in that proof.

Theorem 4 is a significant result, even beyond the algorithms proposed in this paper. The sampling strategy adopted in the analysis can be related to many existing sampling strategies and density metrics. Most notably, the RRT algorithms implicitly generate samples in regions where the Voronoi cells are the largest by volume and, therefore, the bound on the convergence probability in Theorem 4 applies to the RRT-style algorithms as well, which may prove more convenient than existing bounds (LaValle and Kuffner, 2001). Then, many algorithms based on expansion or random walks, such as kinodynamic motion planners, rely on density heuristics (Hsu et al., 1999), and Lemma 2 implies that the sampling strategy adopted here should be the target value that density heuristics should approximate or be compared with.

2.4.2 Sampling heuristic. So far, we have ignored the presence of the heuristic function that prioritizes the search. We now look at the effect it has on the convergence of the algorithm, but first, we must characterize the difficulty of the problem with respect to the heuristic and the sought solution path.

Definition 4 ($\eta$-Entanglement). Given a free metric space $(\mathcal{F}, d_s) \subseteq (\mathcal{X}, d)$ with connected components $\mathcal{F}_1, \ldots, \mathcal{F}_k$, and some constant value $\eta \geq 1$, then the free-space is said to be $\eta$-entangled if, for all $i \in [1,k]$, the total length of the collision-free path between any pair of points in $\mathcal{F}_i$ is no more than an $\eta$ multiple of the "bird-flight" distance between that pair of points, i.e.

$$d_s(u, v) \leq \eta\, d(u, v) \quad \text{for all } u, v \in \mathcal{F}_i \tag{28}$$

where $d_s(u,v)$ is the length of the shortest non-colliding path between $u$ and $v$, and $d(\cdot,\cdot)$ denotes the "bird-flight" distance between two points in $\mathcal{X}$.

Definition 4 characterizes the space in terms of how much more than a bird-flight distance one has to travel to get from any point to any other within a single connected component of the free space. Naturally, one could trivially refine the definition to a single connected component $\mathcal{F}'$ by saying that $\mathcal{F}'$ is $\eta$-entangled, or to a single-query problem by considering only a single pair of points. The key here is that the $\eta$ value gives a bound on how difficult the problem can be compared with a trivial straight-line solution.

Corollary 5. Given a pair of points $(u_{\text{init}}, u_{\text{goal}}) \in \mathcal{F}'$ where $\mathcal{F}'$ is an $\eta$-entangled connected component of $\mathcal{F} \subseteq \mathcal{X}$ for some finite value $\eta \geq 1$, let $\gamma$ be a constant in $(0,1]$. If we have a sequence $V$ of $n$ nodes generated by Algorithm 1 with the following modified expansion probability:

$$P'(u) = \frac{1}{f(u)}\,\frac{\mu(\mathcal{V}_\ell(u))}{\mu(\mathcal{R}_\ell(V))} \tag{29}$$

where $f(u) \equiv d_s(u_{\text{init}}, u) + d(u, u_{\text{goal}})$, then $V$ will contain a node in the goal region $\mathcal{R}_\ell(u_{\text{goal}})$ with probability at least $1 - \gamma$, if $n \geq (k\eta/\alpha)\ln(2k/\gamma) + (2/g)\ln(2/\gamma)$, where $k = (1/\beta)\ln(2/g)$.

Proof. By definition of $\eta$-entanglement, the heuristic $f(u)$ is bounded as

$$f(u) \leq \eta\, d_{\max}, \qquad d_{\max} \equiv \max_{(u, v) \in \mathcal{F}'^2} d(u, v) \tag{30}$$

where the maximum distance value $d_{\max}$ is a constant for a given connected component $\mathcal{F}'$. And because $d_{\max}$ is constant, we can use it to multiply $P'(u)$ without affecting the distribution, giving

$$P''(u) = P'(u)\, d_{\max} = \frac{d_{\max}}{f(u)}\,\frac{\mu(\mathcal{V}_\ell(u))}{\mu(\mathcal{R}_\ell(V))} \geq \frac{1}{\eta}\,\frac{\mu(\mathcal{V}_\ell(u))}{\mu(\mathcal{R}_\ell(V))} \tag{31}$$

which implies, from Lemma 3, that for a sample $u'$, we have a bounded probability that it lies in the lookout of $\mathcal{R}_\ell(V)$:

$$P(u' \in \beta(V)) \geq \frac{\alpha}{\eta} \tag{32}$$

and from this point on, the probability bound $\alpha/\eta$ replaces the probability $\alpha$ in Theorem 4, reaching the probability bound stated, and completing the proof.
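As a rough numerical illustration of the bounds in Theorem 4 and Corollary 5, the sketch below evaluates the sufficient number of nodes for given expansiveness and entanglement constants. The function name `required_nodes` and its argument names are hypothetical, and the goal volume is assumed to be normalized so that it does not exceed 1.

```python
import math

def required_nodes(alpha, beta, goal_volume, gamma, eta=1.0):
    """Sufficient node count n from Theorem 4 (eta = 1) or Corollary 5 (eta > 1).

    alpha, beta : expansiveness constants, each in (0, 1]
    goal_volume : volume g of the goal region (assumed normalized, 0 < g <= 1)
    gamma       : allowed failure probability, in (0, 1]
    eta         : entanglement constant, eta >= 1
    """
    for name, value in (("alpha", alpha), ("beta", beta),
                        ("goal_volume", goal_volume), ("gamma", gamma)):
        if not 0.0 < value <= 1.0:
            raise ValueError(f"{name} must lie in (0, 1]")
    if eta < 1.0:
        raise ValueError("eta must be at least 1")
    k = (1.0 / beta) * math.log(2.0 / goal_volume)
    n = (k * eta / alpha) * math.log(2.0 * k / gamma) \
        + (2.0 / goal_volume) * math.log(2.0 / gamma)
    return math.ceil(n)

# Example: the heuristic bias in a space with eta = 2 raises the worst-case bound.
print(required_nodes(0.5, 0.5, 0.1, 0.1))
print(required_nodes(0.5, 0.5, 0.1, 0.1, eta=2.0))
```

Only the first term of the bound scales with the entanglement constant, so the penalty of the heuristic in the worst case is at most a factor of that constant.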
The value of $f(u)$ distorts the sampling probability distribution in a fashion that is concentric around the straight-line path from $u_{\text{init}}$ to $u_{\text{goal}}$ which will lower the worst-case probability of sampling in the $\beta$-LOOKOUT of $\mathcal{R}_\ell(V)$ if the lookout region is far from that central axis. Nevertheless, as Corollary 5 states, the entanglement of the space or the problem worsens the probabilistic convergence in a $(\alpha, \beta)$-expansive space by a factor $1/\eta$ on the probability $\alpha$.

At this point, a salient question might arise: if the heuristic worsens the convergence rate, why use it? To that, we must remind the reader that the heuristic may worsen the worst-case convergence rate, but will increase the average convergence rate in problems with low entanglement. A formal analysis of the average case is beyond the scope of this article as it would require far more elaborate constructs, but practical experience and formal analysis on discrete search algorithms, such as A*, support this conjecture.

2.4.3 Practical implications. As we emerge from the formal analysis, it is important to outline some of the key practical implications of the analysis and how they carry over to a practical algorithm.

First, the sampling strategy adopted in the analysis relies on the volume of local Voronoi regions around the points of the motion-graph. Clearly, computing those volumes is prohibitively hard and cannot be used in a practical algorithm. However, it is clear that the probability $P(\text{NEW}, \text{FREE} \mid u)$ is a direct analog of those Voronoi volumes. Therefore, to achieve an unbiased sampling within $\mathcal{R}_\ell(V)$, the objective should be to approximate either the volume of the local Voronoi regions or, equivalently, the probability $P(\text{NEW}, \text{FREE} \mid u)$. Intuition also seems to indicate that if we wish to increase the convergence rate, we should use a sampling strategy that is even more biased towards the larger Voronoi volumes, or least explored regions, i.e. inducing a Voronoi bias, as is the hallmark of the RRT sampling strategy.

Furthermore, the sampling strategy of Equation (12) is clearly not identical to the probabilistic sampling adopted in Corollary 5. However, given that each expansion will decrease the $P(\text{NEW}, \text{FREE} \mid u)$ value of its neighbors, there is a natural convection occurring in the ranking of the nodes by their expected value $r(u)$ which provides a similar probabilistic effect at a much cheaper computational cost. And as was previously noted, the introduction of further bias towards the larger local Voronoi volumes is only expected to improve the convergence by increasing the chance of sampling in the lookout of $\mathcal{R}_\ell(V)$, and thus, adopting Equation (12) is likely to be beneficial, rather than detrimental as compared to the idealized sampling strategy.

Finally, Corollary 5 assumes, in the definition of the value function $f(u)$, that the optimal path distance $d_s(u_{\text{init}}, u)$ is available. Computing this value is as hard as solving the single-query problem itself and, therefore, only an approximation can be used in practice. An approximation of this value can be accumulated in the nodes of the motion graph in the same way as in discrete search algorithms. Evidently, this approximation will necessarily over-estimate the actual optimal path distance, and an asymptotic approach of the actual value hinges on the fact that, as the motion-graph grows, its edges can capture the optimal path. Therefore, adopting a connection strategy that will capture the optimal path is critical, but, fortunately, such proven connection strategies have already been developed as part of the RRG and RRT* algorithms (Karaman and Frazzoli, 2011).

In that vein, observing the proofs of asymptotic optimality for RRG/RRT* (Karaman and Frazzoli, 2011), there are clear parallels between the sequences of connected balls and the linking sequences used to prove probabilistic completeness both here and for EST-like algorithms (Hsu, 2000). Moreover, those proofs assume a uniform sampling in the reachable free-space, which we establish in Section 2.4.1 and modify with a bias in Section 2.4.2 with a bounded distortion on the sampling. Together, these observations make a strong argument to support the proposition that the idealized sampling-based A* algorithm is also asymptotically optimal when it employs the optimizing connection strategy of either the RRG or RRT* algorithms.

3. Practical sampling-based A* algorithms

In this section, we present a complete and detailed description of practical SBA* algorithms, moving away from the general and idealized discussions of the previous section. First, we present a predictive model that can approximate the node values $r(u)$ in a convenient and computationally efficient way. Second, we define a basic, concrete form of the SBA* algorithm. Finally, we present a number of important improvements to this basic algorithm.

3.1 Predictive model for node values

In the rich field of information theory, there are many tools that can help us construct a predictive statistical model for the expected value $r(u)$, but probably the most widely used is the Kullback–Leibler (KL) divergence (Kullback and Leibler, 1951), which is especially useful here due to its straight-forward relationship to the concept of surprisal. Given a sample drawn from the configuration-space neighborhood of node $u$, we want to know what is the probability that the sample is surprising (i.e. NEW) with respect to the current set of nodes in the neighborhood of $u$.

To represent the configuration-space neighborhood $\mathcal{N}(u)$, we use $S_u(x)$ as a probability distribution reflecting the sampling region, i.e. $S_u(x)$ gives the probability of the configuration $x$ in the neighborhood of $u$. Let $N_u$ be the set of neighboring samples within the reachable region of $u$, that are currently part of the motion graph or that were once attempted to be added to it (and failed due to a collision or to the connection strategy). Then, we can define $S_{N,u}(x)$ as the probability distribution reflecting the sampling of nodes
from the sampling regions around the nodes in $N_u$, instead of the neighborhood of $u$ directly. This allows us to formulate the probability of a surprisal around node $u$ as

$$P(\text{NEW} \mid u) = 1 - e^{-D_{KL}(S_u \| S_{N,u})} \tag{33}$$

where $D_{KL}(S_u \| S_{N,u})$ is the KL divergence if $S_{N,u}$ is used to represent the sampling region $S_u$ and, thus, $e^{-D_{KL}(S_u \| S_{N,u})}$ is the probability that a sample from $S_u$ is unsurprising to the distribution $S_{N,u}$ (i.e. the sample is in a region covered by $S_{N,u}$).

Similarly, if we define $C_u$ as the set of attempted nodes around $u$ that have been tested and found not to be connectible to $u$ due to a collision (or other limitations in the configuration space), then we get a sampling region representing the non-free configuration space neighborhood of $u$ that we denote as $S_{C,u}$. Then, we get the probability of sampling a reachable node as

$$P(\text{FREE} \mid u) = 1 - e^{-D_{KL}(S_u \| S_{C,u})} \tag{34}$$

which, together with $P(\text{NEW} \mid u)$, provide the theoretical bedrock for constructing a predictive model for the expected value $r(u)$.

Proceeding forward is a simple matter of finding a convenient and sufficiently accurate method to approximate the KL divergence between $u$ and its neighbors. Clearly, this method can depend largely on the application domain, and can vary in sophistication, so, we proceed by demonstrating one particular choice: a method based on a Gaussian distribution for $S_u$ and Gaussian mixtures for $S_{N,u}$ and $S_{C,u}$.

The advantages of Gaussian distributions include their applicability to any metric space, their relationship to many other types of distributions as first-order approximations of them, and the availability of a simple closed-form expression for the KL divergence. This makes the Gaussian distributions suitable to many applications, even when a Gaussian distribution is not used for generating random samples around a given node. In very general terms, a normal distribution, around node $u$, can be formulated as the following probability density function (pdf):

$$S_u(x) = \frac{1}{\left(2\pi\sigma_u^2\right)^{k/2}}\, e^{-\frac{\mathrm{dist}(x,\, p(u))^2}{2\sigma_u^2}} \tag{35}$$

where $p(u)$ is the configuration associated with node $u$, $\sigma_u$ is the standard deviation around $u$ with respect to the distance metric $\mathrm{dist}(\cdot,\cdot)$, and $k$ is the dimensionality of the configuration space. With this definition, we can proceed and state the expression for the KL divergence between two Gaussian distributions:

$$D_{KL}(S_u \| S_v) = \frac{k}{2}\left(\frac{\sigma_u^2}{\sigma_v^2} - 1 - \log\frac{\sigma_u^2}{\sigma_v^2}\right) + \frac{\mathrm{dist}(p(u), p(v))^2}{2\sigma_v^2} \tag{36}$$

If we use the same standard deviation $\sigma$ (defining the size of the sampling region), the expression further reduces to the following:

$$D_{KL}(S_u \| S_v) = \frac{\mathrm{dist}(p(u), p(v))^2}{2\sigma^2} \tag{37}$$

which is only part of the work, since we are really interested in $D_{KL}(S_u \| S_{N,u})$.

We can model the neighborhood distribution $S_{N,u}$ as a mixture of the Gaussian distributions around each of the nodes in $N_u$. There is no closed-form expression for the KL divergence between Gaussian mixtures (or between a Gaussian and a Gaussian mixture), however, there are a number of suitable closed-form approximations. For example, a good approximation of the KL divergence of a Gaussian distribution and a mixture of Gaussians is the following variational approximation (Hershey and Olsen, 2007):

$$D_{KL}(S_u \| S_{N,u}) = -\log\left(\sum_{v \in N_u} w_{u,v}\, e^{-D_{KL}(S_u \| S_v)}\right) \tag{38}$$

where $w_{u,v}$ are the multinomial distribution factors that weight the different Gaussians of the mixture $S_{N,u}$, i.e. we must have $\sum_{v \in N_u} w_{u,v} = 1$. The above can be substituted into Equation (33) to obtain this approximation of the expected surprisal around node $u$:

$$P(\text{NEW} \mid u) = 1 - \sum_{v \in N_u} w_{u,v}\, e^{-D_{KL}(S_u \| S_v)} \tag{39}$$

At this point, we need to determine appropriate values for the weights $w_{u,v}$. One obvious candidate is a uniform distribution, i.e. all weights are equal with value $w_{u,v} = \frac{1}{|N_u|}$, which effectively makes the expected surprisal the average surprisal in the neighborhood $N_u$. However, the average surprisal is in contradiction with the premise of the KL divergence, that is, as a measure of how accurate it would be to model the distribution $S_u$ with the mixture $S_{N,u}$. Clearly, if we want to maximize the accuracy, we would prefer to sample from the Gaussians in $S_{N,u}$ that are more probable in the distribution $S_u$. This naturally leads to setting the weights to be the mean probability in $S_u$, i.e. we can use $w_{u,v} = S_u(p(v))$, leading to

$$P(\text{NEW} \mid u) = 1 - \sum_{v \in N_u} S_u(p(v))\, e^{-D_{KL}(S_u \| S_v)} \tag{40}$$

which also has the advantage, like with the average value, that this surprisal value can be incrementally accumulated in every node without needing to re-examine the neighborhood $N_u$ every time a neighbor is added to it. If we define the density value $d(u)$ as follows:

$$d(u) \equiv \sum_{v \in N_u} S_u(p(v))\, e^{-D_{KL}(S_u \| S_v)} \tag{41}$$
and assuming that during iteration $k$, we add a new node $v$ in the neighborhood of $u$, we can accumulate the density value as

$$d_{k+1}(u) = \left(1 - S_u(p(v))\right) d_k(u) + S_u(p(v))\, e^{-D_{KL}(S_u \| S_v)} \tag{42}$$

where $d_k(u)$ is the density value of $u$ before the addition of node $v$, and $d_{k+1}(u)$ is the density value after adding node $v$. This incremental calculation is extremely convenient from a computational point of view since traversals of neighborhoods of nodes in a graph are relatively expensive operations, even with state-of-the-art cache-optimized data structures, and minimizing their occurrences in the iterations of an algorithm can be of great benefit.

We can proceed in a similar fashion to obtain the probability of sampling an unreachable node in the neighborhood of $u$. This time, we define the constriction value $c(u)$ as follows:

$$c(u) \equiv \sum_{v \in C_u} S_u(p(v))\, e^{-D_{KL}(S_u \| S_{cv})} \tag{43}$$

where we have another distribution, denoted as $S_{cv}$, which defines the region around $v$ where we expect nodes to be unreachable. For example, if we tried to steer from node $u$ to node $v$ and a collision (or other limit) was detected halfway, then we can approximate the unreachable region as being centered around $v$ and extending to a radius equal to half the distance between $u$ and $v$, which we can approximate with a Gaussian distribution to keep things consistent. Since the distributions $S_u$ and $S_{cv}$ are different in their standard deviation, we must use Equation (36) to calculate their KL divergence. Finally, as before, the constriction value can be incrementally computed as new unreachable nodes are found:

$$c_{k+1}(u) = \left(1 - S_u(p(v))\right) c_k(u) + S_u(p(v))\, e^{-D_{KL}(S_u \| S_{cv})} \tag{44}$$

At this point, we can come back to the expression for the expected value of a sample from the configuration space neighborhood of $u$, which we defined as $r(u)$ and compute using Equation (10). We get the following expression for $r(u)$,

$$r(u) = \frac{g(u) + h(u)}{(1 - c(u))(1 - d(u))} \tag{45}$$

which depends on four values associated to each node: the accumulated travel distance or cost $g(u)$, the heuristic distance or cost $h(u)$, the constriction value $c(u)$, and the density value $d(u)$.

3.2 Sampling-based A*

With a key value $r(u)$ that appropriately reflects the expected path discovery around a given vertex $u$, we can drive a sampling-based algorithm using the A* search strategy, and this algorithm is presented in this section. In its essence, the algorithm breaks down to two main components: the search loop and the connection strategy. The former is presented in Algorithm 2 as SBA*-LOOP, while the latter is presented in Algorithms 3 and 4 as
Algorithm 2. The sampling-based A* algorithm loop.
function SBA*-LOOP(Q, V, E)
  Require: Q is a priority queue ordered by minimum r[u] value.
  Require: V is the list of all vertices of the motion-graph.
  Require: E is the list of all undirected edges of the motion-graph.
  Require: Each vertex u has associated values for position p[u], heuristic h[u], accumulated distance g[u], density d[u], constriction c[u], key value r[u] and predecessor pred[u].
  Ensure: There are no more useful vertices to explore, or the termination condition was met.
  Ensure: The pred[u] values trace out the optimal path from any vertex back to the start.
  repeat
    u ← TOP(Q)
    {p_new, success} ← RANDOMWALK(p[u])    ▷ Generate reachable sample near p[u]
    if success then
      {P, S} ← NEARESTNEIGHBORS(p_new, V)    ▷ Get set of neighbors of p_new
      v ← NEWVERTEX(p_new, V)    ▷ Create new vertex at position p_new
      CONNECTPREDECESSORS(v, P, Q, V, E)
      CONNECTSUCCESSORS(v, S, Q, V, E)
    else
      Update c[u] and d[u] using Equations (41) and (43).    ▷ or, record the failure
      REQUEUE(u, Q)
    end if
  until EMPTY(Q) or SHOULDTERMINATE(V, E)
end function
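The bookkeeping behind the "Update" steps can be sketched in a few lines. The following is a minimal illustration of Equations (35), (37), (42) and (45), assuming a Euclidean distance metric and a single shared standard deviation; all function names are hypothetical. The constriction update of Equation (44) has the same form but would need the general divergence of Equation (36), which is not shown here.

```python
import math

def squared_distance(a, b):
    # Assumption: Euclidean metric between configurations given as tuples.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def gaussian_pdf(x, center, sigma):
    # Equation (35): isotropic Gaussian density around `center`.
    k = len(x)
    norm = (2.0 * math.pi * sigma ** 2) ** (k / 2.0)
    return math.exp(-squared_distance(x, center) / (2.0 * sigma ** 2)) / norm

def kl_equal_sigma(p_u, p_v, sigma):
    # Equation (37): KL divergence when both Gaussians share sigma.
    return squared_distance(p_u, p_v) / (2.0 * sigma ** 2)

def accumulate_density(d_u, p_u, p_v, sigma):
    # Equation (42): fold a new neighbor v into the density value of u.
    # Note: the weight is a pdf value, so sigma should be large enough
    # for it to stay within [0, 1].
    w = gaussian_pdf(p_v, p_u, sigma)
    return (1.0 - w) * d_u + w * math.exp(-kl_equal_sigma(p_u, p_v, sigma))

def key_value(g, h, c, d):
    # Equation (45): key value used to order the priority queue.
    return (g + h) / ((1.0 - c) * (1.0 - d))

d_u = accumulate_density(0.0, (0.0, 0.0), (1.0, 0.0), sigma=1.0)
print(d_u, key_value(1.0, 2.0, 0.0, d_u))
```

Because each update touches only the two nodes involved, no traversal of the neighborhood is needed when a neighbor is added.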

Algorithm 3. The sampling-based A* predecessor connection function.
function CONNECTPREDECESSORS(u, P, Q, V, E)
  Require: Q, V, and E are as defined in SBA*-LOOP.
  Require: P is the set of potential predecessors of u in the motion graph.
  Ensure: The predecessors of u are connected, and P contains only viable predecessors.
  for all v ∈ P do
    {success, s_new} ← STEER(p[v], p[u])    ▷ Attempt non-colliding travel
    if success then
      Update d[u] and d[v] using Equation (41).
      e ← NEWEDGE(v, u, s_new, E)    ▷ Create new edge with travel s_new
      if g[u] > g[v] + cost(s_new) then    ▷ Choose predecessor
        pred[u] ← v; g[u] ← g[v] + cost(s_new)
      end if
    else
      Update d[u], c[u], d[v] and c[v] using Equations (41) and (43).
      P ← P \ {v}
    end if
    REQUEUE(v, Q)
  end for
  REQUEUE(u, Q)
end function
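The TOP and REQUEUE operations used by the algorithms require a priority queue whose keys can change. One possible realization, sketched below under the assumption that a binary heap with lazy invalidation is acceptable, is built on Python's standard `heapq` module. The class name `KeyedQueue` and its methods are hypothetical and are not taken from the authors' implementation.

```python
import heapq
import itertools

class KeyedQueue:
    """Min-priority queue supporting re-keying through lazy invalidation."""

    def __init__(self):
        self._heap = []                    # heap of [key, tie_breaker, node]
        self._live = {}                    # node -> its current heap entry
        self._counter = itertools.count()  # unique tie-breaker, avoids comparing nodes

    def requeue(self, node, key):
        # Push a fresh entry; any older entry for the node becomes stale.
        entry = [key, next(self._counter), node]
        self._live[node] = entry
        heapq.heappush(self._heap, entry)

    def top(self):
        # Discard stale entries until the smallest live one is on top.
        while self._heap:
            entry = self._heap[0]
            if self._live.get(entry[2]) is entry:
                return entry[2]
            heapq.heappop(self._heap)
        return None

    def __len__(self):
        return len(self._live)

queue = KeyedQueue()
queue.requeue("a", 3.0)
queue.requeue("b", 1.0)
print(queue.top())       # node with the smallest key
queue.requeue("b", 5.0)  # key of "b" grew after a recorded failure
print(queue.top())
```

Stale entries cost extra memory until they surface, which is a trade-off against the simplicity of not needing a decrease-key operation.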
Algorithm 4. The sampling-based A* successor connection function.
function CONNECTSUCCESSORS(u, S, Q, V, E)
  Require: Q, V, and E are as defined in SBA*-LOOP.
  Require: S is the set of potential successors of u in the motion graph.
  Ensure: The successors of u are connected, and S contains only viable successors.
  I ← {}
  for all v ∈ S do
    {success, s_new} ← STEER(p[u], p[v])    ▷ Attempt non-colliding travel
    if success then
      Update d[u] and d[v] using Equation (41).
      e ← NEWEDGE(u, v, s_new, E)    ▷ Create new edge with travel s_new
      if g[v] > g[u] + cost(s_new) then    ▷ Rewire if necessary
        pred[v] ← u; g[v] ← g[u] + cost(s_new)
        I ← I ∪ {v}    ▷ v has inconsistent successors
      end if
    else
      Update d[u], c[u], d[v] and c[v] using Equations (41) and (43).
      S ← S \ {v}
    end if
    REQUEUE(v, Q)
  end for
  REQUEUE(u, Q)
  repeat    ▷ Propagate changes in successors
    u ← POP(I)
    for all v | pred[v] = u do    ▷ Iterate through successors of u
      g[v] ← g[u] + cost(s(p[u], p[v]))
      I ← I ∪ {v}
      REQUEUE(v, Q)
    end for
  until EMPTY(I)
end function
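The propagation phase at the end of Algorithm 4 can be written without recursion by using an explicit queue. The sketch below illustrates the idea on plain dictionaries; the names `propagate_costs`, `pred`, `g` and `edge_cost` are hypothetical, and the requeueing of each updated node is left to the caller through the returned list.

```python
from collections import deque

def propagate_costs(inconsistent, pred, g, edge_cost):
    """Breadth-first update of accumulated costs below rewired nodes.

    inconsistent : iterable of nodes whose g value was just lowered
    pred         : dict mapping node -> predecessor (None for the root)
    g            : dict mapping node -> accumulated cost, updated in place
    edge_cost    : callable (u, v) -> cost of the stored edge from u to v
    """
    # Invert the predecessor map once so successors are found without scanning.
    children = {}
    for node, parent in pred.items():
        if parent is not None:
            children.setdefault(parent, []).append(node)

    queue = deque(inconsistent)
    updated = []
    while queue:
        u = queue.popleft()
        for v in children.get(u, []):
            g[v] = g[u] + edge_cost(u, v)
            queue.append(v)
            updated.append(v)
    return updated  # nodes whose key changed and should be requeued

pred = {"a": None, "b": "a", "c": "b", "d": "b"}
g = {"a": 0.0, "b": 1.0, "c": 10.0, "d": 10.0}  # "b" was just rewired
print(propagate_costs(["b"], pred, g, lambda u, v: 1.0), g)
```

The explicit queue keeps memory use proportional to the width of the subtree rather than to its depth on the call stack.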
CONNECTPREDECESSORS and CONNECTSUCCESSORS, respectively. In both pseudo-code presentations, many of the implementation details have been omitted for brevity, outlining only the main logic of the algorithm.¹

3.2.1 Search and connection strategies. The main loop simply keeps a priority queue for all of the nodes in the motion-graph with respect to their associated key value r[u], prioritizing the nodes with minimum key values. At each iteration, the best node is obtained from the queue and a sample is drawn from its neighborhood with an attempt to achieve a collision-free path to that sample: this process is performed by an application-specific RANDOMWALK function. If the sampling and connection were unsuccessful, the failure must be recorded in the cumulative values of the density d[u] and constriction c[u] using Equations (41) and (43), respectively, or using a recursive formula if possible, as per the remarks in the previous section. Then, the node's rank in the priority queue is updated since its key value was changed through the recorded failure. If the sampling and connection were successful, then the new sample can be inserted into the motion graph by first obtaining sets of neighbors (predecessors and successors), then creating a new vertex in the motion graph and finally, calling Algorithms 3 and 4 to create the connections. The sets of nearest neighbors can have a size (number of neighbors) and range that are either fixed or dictated by an adaptive strategy such as the so-called star neighborhood (Karaman and Frazzoli, 2011).

The loop finishes when there are no more nodes in the queue or when some specific termination condition is reached (checked with the SHOULDTERMINATE function). In non-optimizing sampling-based motion planners, the natural termination condition is reached when a connection is established between the start and goal locations. However, like most asymptotically optimal planners, the SBA* algorithm does not have a natural stopping criterion. In this case, this is manifested by the queue never becoming empty as samples are consistently requeued to it. One option to cause a natural termination is to impose a threshold on the density d[u] or value r[u] above which the samples are no longer requeued, which will eventually exhaust all valuable samples and empty the queue. This termination method introduces additional parameters to tune and it is unclear what repercussions it would have on the convergence of the algorithm. However, simple termination conditions that are often used in sampling-based algorithms can be easily applied here, such as terminating after a certain number of iterations have passed or when a sufficiently good solution was obtained.

In this nominal version of the sampling-based A* algorithm, an exhaustive and eager connection strategy is used, as seen in Algorithms 3 and 4, which is similar to the connection strategy in the RRG algorithm (Karaman and Frazzoli, 2011); a more economical alternative will be presented in Section 3.3.1. The CONNECTPREDECESSORS routine proceeds to connect a new vertex u to its potential predecessors, and, at the same time, it finds the optimal predecessor in that set. Then, the CONNECTSUCCESSORS routine makes the successor connections going from the vertex u to other neighboring vertices in the motion graph. Both routines rely on a STEER function that attempts to steer between two points and returns a record of the path s_new that links the two points with cost cost(s_new), if successful. In addition, of course, the successors must be recursively traversed to update their accumulated distance g[v] and be re-positioned in the priority queue. The main difference between Algorithms 3 and 4 as compared with the RRG connection routines is the work done to update the density, constriction, and key values, as well as keeping the priority queue consistent with those values.

3.2.2 Implementation remarks. At this point, we make a few general implementation remarks: these will be obvious to any seasoned implementer, but worth mentioning nevertheless. First, our presentation of the algorithms includes explicit mentions of the points at which the priority queue is updated, which is not customary in standard pseudo-code expositions. We aim to show explicitly where those updates are needed since they can represent an important computational cost in an implementation.

Second, the recursive updates of the accumulated distances, after re-wirings (or successor connections) have been performed, are presented as a non-recursive breadth-first traversal (via the inconsistent set I). Here, we want to make it clear that an implementation through actual recursive calls is out of the question in this type of application given the potential great depth of any branch of the optimal motion tree.

Third, when the local planner and distance metric are asymmetric, which is quite common in practice, the set of potential nearest predecessors to a given configuration point is different from the set of potential nearest successors from that point. The nearest-neighbor query for a set of predecessors and successors can, in general, be performed in one operation, thereby making fairly substantial savings in computational effort, especially in terms of making good use of cached memory. On the contrary, if the planner and corresponding distance metric are symmetric, i.e. the path and distance from A to B are exactly the same as the path and distance from B to A, then the algorithm must simply use the same set of neighbors for predecessors and successors. Furthermore, in that case, Algorithms 3 and 4 can be combined into a single routine, allowing some beneficial modifications such as the removal of the second set of steering attempts needed when connecting successors.

Finally, as in most sampling-based motion planners (and even planning on a fixed motion graph), the topology of the motion graph is essentially that of a nearest-neighbor graph, and most operations done (re-wirings) are local as well and, thus, it is recommended to choose a storage strategy that reflects this pattern in the memory layout of the nodes, i.e. nodes that are close to each other in configuration space ought to be close to each other in physical memory. There are some appropriate cache-aware or cache-oblivious data structures that can be used for that purpose (Kasheff, 2004; Chowdhury, 2007; Jamriska et al., 2012). Moreover, an important computational cost in all sampling-based motion planners is the nearest-neighbor queries performed at each iteration. It is thus important, for performance's sake, to have an effective space-partitioning tree to resolve those queries in O(log(N)) time (Fu et al., 2000), which often implies a
cache-friendly data structure to keep the performance from degenerating towards O(N) time due to cache thrashing.

3.3 Refining the sampling-based A*

Thus far, we have presented a very general version of the sampling-based A* algorithm, but it has many shortcomings which can be rectified, especially when simplifying assumptions can be made. In this section, we present a number of improvements to the general algorithm with the aim of mitigating those problems. First, because the general version uses an RRG connection strategy, an obvious improvement is to adopt an RRT* connection strategy, i.e. pruning sub-optimal edges and evaluating collisions in a lazy fashion, that is, under the assumption that full-neighborhood connectivity is not required for the density computations. Second, one shortcoming of the SBA* algorithm is that it is biased towards improving current optimal paths (i.e. exploitation) and not towards finding a connection to the goal region. Given that this shortcoming is also an issue with the A* algorithm, we employ the same solution that is employed in that domain, i.e. the Anytime A* heuristic (Likhachev et al., 2003) is used to drive the growth of the motion-graph more rapidly towards the goal. Finally, a fundamental issue with the exploitation bias is the lack of exploration of the unknown regions of the configuration space, which could potentially yield better paths. To solve this problem, we employ a simulated annealing (Ingber, 1996) approach that balances the SBA* algorithm with a purely exploratory algorithm, that is, RRT*.

3.3.1 Lazy and pruned connections. In this section, we present an alternative to the connection strategy presented in Algorithms 3 and 4 which prunes away sub-optimal edges and delays collision-checking to the point of creation of a new optimal edge. This alternative strategy is certainly more economical in terms of computational time and memory required to store the motion graph, which now becomes a motion tree due to the pruning of sub-optimal edges. However, it is important to note the assumptions that must be made and a trade-off involved in choosing this strategy.

The first necessary assumption is that the distance metric must reflect the actual cost of travel between two nodes (if a collision-free path exists); this assumption is necessary to use the lazy collision checking strategy. In the general SBA* algorithm, a steering routine is always called before the distance value is required, i.e. eagerly and, thus, the general form should be used when a simple and relatively inexpensive distance metric cannot accurately reflect the steering cost-to-go.

It is also necessary to assume that recursive (or incremental) formulations for the density and constriction values are available for the given problem. If we prune sub-optimal edges from the motion-graph, then it implies that nodes will only be connected to their "optimal neighbors", which means that re-computing the density or constriction values by surveying the neighborhood will not accurately reflect the true neighborhood of the node, unless a nearest-neighbor search is conducted, which would be prohibitively expensive relative to keeping a fully connected motion graph. Hence, we recommend the general form if a survey of the neighborhood is required to update the density or constriction values associated with the nodes.

Finally, there is a trade-off involved in checking collisions lazily in the SBA* algorithm. Because the SBA* heuristic uses a constriction value to reflect the likelihood of sampling colliding points in the neighborhood of a point, if collision is only checked for potentially optimal edges of the motion graph, then the constriction will be underestimated in general since it will not capture colliding paths in sub-optimal directions. Fortunately, this does not seem to have a significant impact on the algorithm, and in fact, the SBA* algorithm can still work even without a constriction value. In our opinion, this trade-off is reasonable given the potential benefit that lazy collision checking can have on overall performance.

Mostly for the sake of completeness, we present the connection strategy in the form of Algorithms 5 and 6. These constitute rather straight-forward modifications to the general connection algorithms. Mainly, the distance metric is first computed and tested for giving rise to an optimal edge and, if so, non-colliding steering is attempted and, if successful, a new edge is created to replace the existing incident edge. What is noteworthy, however, are the updates to the density and constriction values. Whether a connection is useful or not (optimal), and whether a connection is possible or not (collision-free), the density value must still be updated to reflect that a new neighboring point has been added to the motion graph and, thus, the update is performed for all potential predecessors and successors. As per the aforementioned trade-off, the constriction value can only be updated once a collision-free travel was attempted and failed, as seen in Algorithms 5 and 6. Clearly, the astute reader will notice that this connection strategy is in effect the same as that used in the classic RRT* algorithm.

3.3.2 Anytime A* as goal bias. As is well known, the A* algorithm is driven to fully explore the regions around the optimal path, but not towards actually finding a feasible path as fast as possible. The SBA* algorithm will behave in a similar fashion, generating samples around the optimal regions to try and enrich those areas (up to a certain density), but there is no bias towards actually reaching the goal region.

In the domain of discrete path-planning, this problem can be solved with the so-called Anytime A* algorithm (Likhachev et al., 2003). The idea with this method is to artificially inflate the heuristic value such that priority for choosing nodes is biased towards those nodes that have a smaller heuristic value, i.e. are closer to the goal region. This method rapidly finds a feasible solution, and as the heuristic values are deflated back to their true values, the
Algorithm 5. The lazy sampling-based A* predecessor connection function.
function LAZYCONNECTPREDECESSORS(u, P, Q, V, E)
Require: P, Q, V, and E are as defined in CONNECTPREDECESSORS.
Ensure: The optimal predecessor of u is connected.
  for all v ∈ P do
    if g[u] > g[v] + dist(p[v], p[u]) then
      {success, s_new} ← STEER(p[v], p[u])    ▷ Attempt non-colliding travel
      if success then
        REMOVEEDGE(pred[u], u, E)    ▷ Prune existing sub-optimal edge.
        e ← NEWEDGE(v, u, s_new, E)    ▷ Create new edge with travel s_new
        pred[u] ← v; g[u] ← g[v] + cost(s_new)
      else
        Update c[u] and c[v] using Equation (44).
      end if
    end if
    REQUEUE(v, Q)
  end for
  REQUEUE(u, Q)
end function
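The order of operations in Algorithm 5 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the tree is held in plain `pred` and `g` dictionaries, the distance metric is Euclidean, `steer` is a hypothetical callback returning `(success, cost)`, the queue update (REQUEUE) is omitted, and the constriction update of Equation (44) is reduced to an optional `on_blocked` callback because its definition lies outside this section.

```python
import math

def dist(a, b):
    """Euclidean distance, assumed here to equal the collision-free travel cost."""
    return math.dist(a, b)

def lazy_connect_predecessors(u, candidates, pos, g, pred, steer, on_blocked=None):
    """Rewire u to a cheaper predecessor, calling steer() only when the
    distance metric says the candidate edge would lower g[u]."""
    steer_calls = 0
    for v in candidates:
        if g[u] > g[v] + dist(pos[v], pos[u]):     # cheap test first (lazy)
            steer_calls += 1
            success, cost = steer(pos[v], pos[u])  # expensive collision check
            if success:
                pred[u] = v                        # replaces (prunes) the old incident edge
                g[u] = g[v] + cost
            elif on_blocked is not None:
                on_blocked(u, v)                   # stand-in for the Equation (44) update
    return steer_calls

# Hypothetical example: u is currently reached through "a" at cost 10.
pos = {"s": (0.0, 0.0), "a": (3.0, 4.0), "b": (3.0, 0.0), "u": (6.0, 0.0)}
g = {"s": 0.0, "a": 5.0, "b": 3.0, "u": 10.0}
pred = {"s": None, "a": "s", "b": "s", "u": "a"}
obstacle_free = lambda p, q: (True, dist(p, q))    # every straight edge is feasible
calls = lazy_connect_predecessors("u", ["a", "b", "s"], pos, g, pred, obstacle_free)
```

In this example only the candidate "b" passes the distance test, so a single steering call is made even though three neighbors are examined, which is the saving the lazy strategy is after.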
Algorithm 6. The lazy sampling-based A* successor connection function.
function LAZYCONNECTSUCCESSORS(u, S, Q, V, E)
Require: S, Q, V, and E are as defined in CONNECTSUCCESSORS.
Ensure: The optimal successors of u are connected.
  I ← {}
  for all v ∈ S do
    if g[v] > g[u] + dist(p[u], p[v]) then
      {success, s_new} ← STEER(p[u], p[v])    ▷ Attempt non-colliding travel
      if success then
        REMOVEEDGE(pred[v], v, E)    ▷ Prune existing sub-optimal edge.
        e ← NEWEDGE(u, v, s_new, E)    ▷ Create new edge with travel s_new
        pred[v] ← u; g[v] ← g[u] + cost(s_new)
        I ← I ∪ {v}    ▷ v has inconsistent successors
      else
        Update c[u] and c[v] using Equation (44).
      end if
    end if
    REQUEUE(v, Q)
  end for
  REQUEUE(u, Q)
  repeat    ▷ Propagate changes in successors
    u ← POP(I)
    for all v | pred[v] = u do    ▷ Iterate through successors of u
      g[v] ← g[u] + cost(s(p[u], p[v]))
      I ← I ∪ {v}
      REQUEUE(v, Q)
    end for
  until EMPTY(I)
end function
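The propagation loop that closes Algorithm 6 can be illustrated with the Python sketch below. It assumes the motion tree is stored as a `pred` map (so there are no cycles) and that a hypothetical `edge_cost(u, v)` returns the cost of the travel stored on the tree edge from u to v. The sketch uses a while-loop so that an empty set of inconsistent nodes is handled without popping, and it leaves out the REQUEUE calls.

```python
from collections import defaultdict, deque

def propagate_costs(changed, g, pred, edge_cost):
    """Recompute the cost-to-come of every descendant of the nodes in `changed`."""
    children = defaultdict(list)          # invert pred[] once: parent -> children
    for v, u in pred.items():
        if u is not None:
            children[u].append(v)
    inconsistent = deque(changed)
    updated = []
    while inconsistent:
        u = inconsistent.popleft()
        for v in children[u]:             # all v with pred[v] == u
            g[v] = g[u] + edge_cost(u, v)
            inconsistent.append(v)        # v's own successors are now stale
            updated.append(v)
    return updated

# Hypothetical chain s -> a -> b -> c with unit edge costs; "a" was just
# rewired so that its cost dropped to 0.5, leaving b and c out of date.
pred = {"s": None, "a": "s", "b": "a", "c": "b", "d": "s"}
g = {"s": 0.0, "a": 0.5, "b": 3.0, "c": 4.0, "d": 1.0}
updated = propagate_costs(["a"], g, pred, lambda u, v: 1.0)
```

Only the subtree below the rewired node is touched; the unrelated branch through "d" keeps its cost.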
product of the algorithm is brought back to being the optimal path, thus, giving it an anytime quality which is useful in many applications.

For the SBA* algorithm, this anytime strategy could prove to be very useful as a bias towards establishing a connection to the goal region. The problem statement is as follows: we seek an optimal path that connects to the goal, that is, we are looking for a solution to the following problem:

$$ \min_{u} \big( g(u) + h(u) \big) \quad \text{with} \quad h(u) = 0 \tag{46} $$

which, in the classic A* heuristic, does not include the equality constraint because an exhaustive search is
guaranteed to find the node that satisfies the constraint if one exists. If we hope to find a solution that satisfies the equality constraint, without requiring an exhaustive search, we must put emphasis on satisfying the constraint, as is usually done in constrained optimization algorithms. For this purpose, we can introduce a Lagrange multiplier $\lambda \in [0, \infty[$:

$$ m(u) \equiv g(u) + h(u) + \lambda h(u) \tag{47} $$

The effect of a large value for the Lagrange multiplier is to drive the search more greedily towards satisfying the constraint, i.e. establishing a connection to the goal region. Once the connection has been established, the Lagrange multiplier can be relaxed down to 0. However, given that a single connection to the goal is probably not useful enough in practice, the relaxation must be progressive.

By going through the same derivation as in Section 3.1, one can obtain the following expected value for a given node u:

$$ r_{any}(u) \equiv E_{N(u)}[m(u)] = \frac{g(u) + h(u)}{(1 - c(u))(1 - d(u))} + \lambda \big( h(u) - h_b \big) \tag{48} $$

where $h_b$ represents an optional goal bias in the sampling strategy which measures how much closer to the goal a new sample is expected to be compared with node u. The above heuristic formula illustrates how inducing a constant bias towards the goal, as is often done in uni-directional sampling-based planning algorithms, may not have a significant impact on the prioritization of nodes. We also caution the reader that a naive application of a goal bias, such as repeatedly attempting to expand directly towards the goal, could have negative impacts on the entropy of the overall search. For the remainder of this paper, we will assume no bias, i.e. $h_b = 0$.

3.3.3 Balancing RRT* and SBA* with simulated annealing. The final improvement to the SBA* algorithm is aimed at counter-balancing its exploitation bias with an exploratory method. The key issue here is in generating new nodes for the motion graph that are driven towards unexplored regions of the configuration space. This is commonly referred to as the Voronoi bias and is the central element of the RRT family of methods (Kuffner and LaValle, 2000). In RRT methods, new nodes are generated by taking a random sample from the configuration space, finding its nearest neighbor in the current motion graph, and expanding that node towards the random sample. This method can be used as a drop-in replacement for the expansion method of the SBA* algorithm, which would essentially yield the RRG or RRT* algorithm, if using the full connection strategy or the pruned connection strategy, respectively. It thus follows that this node generation method can serve to easily introduce exploration into the SBA* algorithm.

The only problem remaining now is to strike a balance between nodes generated with the SBA* strategy and with the RRT* strategy, i.e. between exploitation and exploration. A classic technique used for this purpose in the context of general optimization problems is simulated annealing (Ingber, 1996). We propose to employ this strategy in the context of motion planning, specifically as a means of balancing the two sampling strategies. The choice between exploitation and exploration is determined randomly at each iteration with a probability driven by a temperature value which cools as the algorithm progresses. Initially, at high temperatures, there is a greater chance of choosing exploration, and as cooling takes effect, the focus is shifted towards exploitation.

The classic cooling formula (Ingber, 1996) used in simulated annealing methods yields the following probability of choosing an exploration step, which is the entropy threshold tested in Algorithm 7:

$$ e_{th} = 1 - e^{-T_0 / \log(N)} \tag{49} $$

where $T_0$ is the initial temperature and N is the number of iterations performed. There are, of course, other alternative formulations to drive the cooling, but the above is both theoretically optimal under certain assumptions and works well in practice.

Algorithm 7 shows the complete SBA* algorithm with simulated annealing to schedule the generation of RRT-style nodes. As one can see, it simply chooses to generate RRT nodes or SBA* nodes randomly depending on the simulated annealing schedule, and uses the same connection strategy in either case. This algorithm combines the best of both sampling-based algorithms. The RRT* algorithm has the advantage of rapidly exploring the space, but generating enough nodes to sufficiently refine the solution is often prohibitively expensive. On the other hand, the basic SBA* strategy has the opposite problem: it can leave many regions of the space unexplored, thus missing potential optimal solutions. Through the smooth transition from exploration to exploitation, we can retain the essential benefits of the RRT* algorithm while being able to refine the optimal paths more effectively.

4. Simulation results

In this section, we wish to present as complete a picture as possible to characterize the SBA* algorithm empirically. To this end, we first present a set of results on a simple two-dimensional point robot such that visual representations of the resulting motion graphs are clear. Moreover, we present statistical analysis for environments representative of the intended application and also a brief qualitative assessment of more difficult scenarios. Then, to better understand the behavior of the SBA* algorithm with respect to dimensionality, we plan paths through empty spaces (i.e. obstacle-free) of increasing dimensionality. Finally, we apply the proposed method to a real-world example with a 7-dof manipulator performing a static interception task through a
Algorithm 7. The sampling-based A* algorithm loop with simulated annealing.
function SA-SBA*-LOOP(Q, V, E, T0)
Require: Pre-conditions are the same as SBA*-LOOP.
Require: T0 is the initial temperature value.
Ensure: Post-conditions are the same as SBA*-LOOP.
  repeat
    e_th ← 1 − exp(−T0 / log(N))    ▷ Compute current entropy
    if RANDOM([0, 1]) > e_th then
      u ← TOP(Q)    ▷ Generate SBA* node
      {p_new, success} ← RANDOMWALK(p[u])
    else
      p_rand ← SAMPLE()    ▷ else, Generate RRT* node
      u ← NEARESTPREDECESSOR(p_rand, V)
      {p_new, success} ← STEER(p[u], p_rand)    ▷ Attempt non-colliding travel
    end if
    if success then
      {P, S} ← NEARESTNEIGHBORS(p_new, V)    ▷ Get set of neighbors of p_new
      v ← NEWVERTEX(p_new, V)    ▷ Create new vertex at position p_new
      LAZYCONNECTPREDECESSORS(v, P, Q, V, E)
      LAZYCONNECTSUCCESSORS(v, S, Q, V, E)
    else
      Update c[u] and d[u] using Equations (41) and (43).    ▷ or, Record the failure
      REQUEUE(u, Q)
    end if
  until EMPTY(Q) or SHOULDTERMINATE(V, E)
end function
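To get a feel for the entropy threshold computed at the top of the loop in Algorithm 7, the following Python sketch evaluates it and draws the exploration/exploitation choice. It assumes the threshold has the form 1 − exp(−T0 / log N) with a natural logarithm, and it treats N ≤ 1 as fully exploratory to avoid dividing by zero; both choices are assumptions of this sketch rather than statements from the text.

```python
import math
import random

def entropy_threshold(n_iter, t0):
    """e_th = 1 - exp(-T0 / log(N)); natural log assumed, N <= 1 mapped to 1.0."""
    if n_iter <= 1:
        return 1.0
    return 1.0 - math.exp(-t0 / math.log(n_iter))

def choose_strategy(n_iter, t0, rng=random):
    """Mirror the branch of Algorithm 7: an SBA* step when the draw exceeds e_th."""
    return "SBA*" if rng.random() > entropy_threshold(n_iter, t0) else "RRT*"

def expected_rrt_nodes(n_total, t0):
    """Expected number of RRT*-style iterations among the first n_total iterations."""
    return sum(entropy_threshold(n, t0) for n in range(1, n_total + 1))

# The threshold decays slowly: exploration dominates early and fades later.
for n in (10, 100, 1000):
    print(n, round(entropy_threshold(n, 2.0), 3))
```

Because the decay is logarithmic in N, the schedule never switches exploration off completely; it only makes RRT*-style steps progressively rarer.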
lightly cluttered environment and analyse its performance in Monte Carlo runs.

In all cases, different variants of the proposed SBA* algorithm will be compared with each other and with the RRT* algorithm. The RRT* algorithm is one of the best performing algorithms for these types of scenarios, i.e. static environments, no kinodynamic constraints, and assuming uni-directional planning. We omit comparison with the RRT algorithm as it shares the same sampling method as RRT* and is otherwise superseded by the RRT* optimal connection strategy.

It is important to note that all of the algorithms presented were tested with the same software written by the same implementer, only changing the core algorithmic loop. The implementation is in C++ and is available open-source under GPLv3 as part of the ReaK library.² Nearest-neighbor queries are performed, in all cases, using a dynamic vantage-point tree (DVP-tree) (Fu et al., 2000) implementation which is capable of efficient queries for any distance metric (including asymmetric metrics). In all cases, the vertices of the motion graph are stored within the DVP-tree's data storage, which, in turn, is a quaternary tree laid out in breadth-first order in contiguous memory, providing better locality for efficient use of cache memory.

4.1 Two-dimensional point-robot environments

As a first set of tests, we present a number of results on a simple two-dimensional environment represented by a black-and-white image whose white pixels represent free areas and black pixels represent obstacles. This simple environment is especially useful for presentation of the shape and evolution of the motion graph as generated by the different path-planning algorithms. The comparisons mainly involve the RRT* algorithm as a reference point, the SBA* algorithm and the SBA* algorithm with simulated annealing (SA-SBA*; as per Section 3.3.3). For both SBA* variants, the lazy connection strategy and the anytime heuristics are used, as per Sections 3.3.1 and 3.3.2, respectively. In all experiments, the initial relaxation factor for the anytime heuristic was 5.0, and the initial temperature for the simulated annealing schedule was 2.0, which results in approximately 500 nodes generated by the RRT-style mechanism during the entire initial high-entropy phase of the SA-SBA* algorithm.

The distance metric used here is the Euclidean distance between points, measured in pixels, and the distance is also used as the travel cost along edges. For each sample u, with associated position $p(u) \equiv (x, y)$, the heuristic value is computed as

$$ h(u) \equiv \| p_{goal} - p(u) \|_2 \tag{50} $$

where $p_{goal}$ is the position of the goal pixel and $\| \cdot \|_2$ denotes the Euclidean norm.

4.1.1 Single runs. In this section, we present a qualitative assessment of the SBA* algorithms by presenting the motion-graphs resulting from single runs of the different variants of SBA* and RRT* on example environments. The idea is to illustrate, compare and contrast the
Figure 1. Motion-graphs generated on moderately cluttered environment: (a) SBA* at 500; (b) SBA* at 1000; (c) SBA* at 1500; (d)
SA-SBA* at 500; (e) SA-SBA* at 1000; (f) SA-SBA* at 1500.
characteristics of the different algorithms. More quantitative assessments with Monte Carlo runs follow in the next section.

Figure 1 shows a comparison of the evolution of the SBA* algorithm versus the SA-SBA* algorithm for a moderately cluttered environment with the start position at the bottom-left corner and the goal location at the upper-right corner of the image. As one can see, the anytime SBA* algorithm evolves by pushing strongly for an expansion towards the goal region, but as the area towards the goal region gets increasingly dense, more nodes are generated back towards the starting area to obtain a more uniform density of nodes all along the region around the optimal path. As mentioned earlier, this greedy push to make a connection to the goal region makes the algorithm susceptible to local minima and variance in the results. When exploration through RRT* is scheduled with simulated annealing, one can see from the bottom row in Figure 1 that the initial growth is much more reminiscent of the RRT* algorithm, as expected, but it rapidly adopts an SBA* strategy which helps enrich the optimal path region while leaving the far-reaching regions virtually unaffected. This latter aspect is very desirable because continuing with RRT* in the hopes of optimizing the optimal path wastes a lot of computational capital on far-reaching, sub-optimal regions of the configuration space.

Continuing on the topic of using RRT* entirely, we present Figure 2 which compares the RRT* algorithm and the SA-SBA* algorithm, both with and without branch-and-bound pruning, after an equal number of iterations (1500) on a highly cluttered environment. From Figure 2(a), one can see the aforementioned issue about continuing to generate an RRT* tree in the hopes of refining the solution, that is, the optimal path does not improve significantly since the additional computational capital is spread over the entire motion graph. A standard method to mitigate this problem is to use branch-and-bound pruning, which is the simple act of eliminating nodes from the motion graph that cannot possibly be part of a path that would be shorter than the current best solution. Figure 2(b) shows the result of 1500 iterations of RRT* with branch-and-bound pruning. In this particular run, the final motion graph contains about 400 nodes, including about 50 nodes added since the first pruning pass (after the first solution was found). This highlights a problem with RRT* and branch-and-bound: the pruning strategy wants to limit the search to the optimal region while the sampling strategy is biased towards unexplored regions. This conflict results in a lot of iterations with very little progress in the number of nodes or in total cost of the optimal path, because the overlap between the unexplored areas of the configuration space and the areas that can pass the pruning criteria is very small, leading, in this particular case, to a new-node-to-iteration ratio of less than 10%. This is a considerable waste of effort since rejected samples must still be added to the tree before being rejected, and thus, the computational expense is per iteration, not per successfully added node.

By contrast, the SA-SBA* algorithm is much more graceful in the solution refinement phase. As seen from Figure 2(c), the overall continuation of the SA-SBA*
Figure 2. Motion-graphs generated after 700 iterations on a highly cluttered environment: (a) RRT*; (b) RRT* with BnB; (c) SA-
SBA*; (d) SA-SBA* with BnB.
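The branch-and-bound pruning discussed around Figure 2 can be illustrated with a short Python sketch. It assumes an admissible Euclidean heuristic and discards any node whose optimistic path cost g + h cannot beat the current best solution; the function and variable names are hypothetical, and the sketch ignores the removal of descendants orphaned by the pruning.

```python
import math

def branch_and_bound_prune(nodes, g, pos, goal, best_cost):
    """Keep only nodes that could still lie on a path shorter than best_cost."""
    kept = []
    for u in nodes:
        lower_bound = g[u] + math.dist(pos[u], goal)   # g(u) + h(u), Euclidean h
        if lower_bound < best_cost:
            kept.append(u)
    return kept

# Hypothetical nodes; the current best path to the goal costs 11.
pos = {"a": (1.0, 0.0), "b": (0.0, 8.0), "c": (5.0, 0.0)}
g = {"a": 1.0, "b": 8.0, "c": 5.5}
kept = branch_and_bound_prune(["a", "b", "c"], g, pos,
                              goal=(10.0, 0.0), best_cost=11.0)
```

Node "b" lies far off the corridor between start and goal, so its lower bound exceeds the best cost and it is dropped, whereas "a" and "c" survive. A sampler biased towards unexplored space tends to propose exactly such far-away nodes, which is the conflict described in the text.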
algorithm past the initial solution is much more effective at creating a richer motion-graph along the optimal path, while leaving uninteresting regions unaffected. This generally leads to more progress towards refining the current solution, and thus, an earlier termination of the overall algorithm. With branch-and-bound pruning, an additional benefit of the SA-SBA* algorithm comes to light. As seen from Figure 2(d), when branch-and-bound pruning is applied to the motion-graph, the resulting motion-graph has, in fact, about twice the number of nodes (about 750 in this case) and is much denser around the optimal path. The key difference here is that the pruning criteria and the sampling strategy are in accord with each other, leading, in this particular case, to a new-node-to-iteration ratio above 50%.

4.1.2 Monte Carlo runs. In this section, we compare the three main algorithms, RRT*, SBA* and SA-SBA*, with Monte Carlo runs, bearing more statistical significance. There are three variables we are interested in: the quality of the solutions, the termination rates, and the execution time. For the purpose of exposition, we have gathered the results of 100 Monte Carlo runs on the moderately cluttered environment (see Figure 1) and on the highly cluttered environment (see Figure 2) as a representative set of results. Many more tests have been performed, always showing comparable results.

We begin by assessing the quality of the solutions obtained from the different algorithms. To this end, Figures 3(a) and 3(b) show the average solutions found among 100 runs of the 3 algorithms. The total travel distance is measured in pixels of the image, with the absolute optimal solution being about 301 pixels. The main observation we can make here is that the SBA* and SA-SBA* algorithms significantly out-perform the RRT* algorithm on early solutions (below 1000 nodes). The early solutions produced by the SBA* algorithm are the best of the three methods tested; however, the reliability of the SBA* algorithm is only comparable to or slightly better than the RRT* algorithm (see Figure 4) and it is rather slow at improving the solutions in the later phase. This suggests that the exploitative (or greedy) nature of the algorithm causes it to more often miss the global optimum in favor of alternative sub-optimal paths (i.e. local minima). On the other hand, the SA-SBA* appears to find very good early solutions, with more reliability than either the RRT* or SBA* algorithms, and exhibits an asymptotic behavior that closely matches that of the RRT* algorithm.

To further assess the reliability of the three methods, Figures 4(a) and 4(b) show the number of planners, out of 100, that have found at least one solution after a given number of iterations. What is clear from those results is that the SA-SBA* algorithm is very reliable at finding an initial solution quickly, with a median arrival at about 250 iterations and a success probability above 99% after 600 iterations for both the moderately and highly cluttered environments. By contrast, those figures are more than doubled for the RRT* algorithm. As expected, the SBA* algorithm becomes less reliable as the cluttering in the environment is increased, due to the higher likelihood of
[Figure 3 plots: "Comparison of RRT* and Sampling-based A* Algorithms: Average Solution Distances" on (a) the moderately cluttered and (b) the highly cluttered 2D environment. Vertical axis: average solution distance after N iterations (pixels), 300 to 420. Horizontal axis: number of nodes, 0 to 3000. Curves: RRT*, SA-SBA*, SBA*.]
Figure 3. Monte Carlo results comparing RRT*, SBA* and SA-SBA* on (a) moderately and (b) highly cluttered environments, showing the average cost of the solutions found by 100 planners plotted against the number of iterations.
[Figure 4 plots: "Cumulative Success Probability" on (a) the moderately cluttered and (b) the highly cluttered 2D environment. Vertical axis: success probability after N iterations (%), 0 to 100. Horizontal axis: 0 to 2000. Curves: RRT*, SBA*, SA-SBA*.]
Figure 4. Monte Carlo results comparing RRT*, SBA* and SA-SBA* on (a) moderately and (b) highly cluttered environments, showing the percentage of successful planners versus the number of iterations.
being led astray into dead-ends or unfruitful local minima. The most interesting phenomenon observed here is that the SA-SBA* algorithm is more reliable than both of the other algorithms. One would expect that, since the SA-SBA* algorithm is part RRT* and part SBA* under a simulated annealing schedule, it would exhibit success rates that fall between those of its two constituent algorithms. Clearly, this is a combination that performs better than the sum of its parts. This is, in fact, in accordance with the principles that led to the development of this novel method as explained in Section 3.3.3. By combining an exploratory method with an exploitative method, we solve the main shortcomings of both methods, leaving us with a method that performs better than either of those constituents.

[Figure 5 plot: "Comparison of RRT* and Sampling-based A* Algorithms: Execution-time per iteration on Highly Cluttered 2D Env." Vertical axis: execution time for each iteration (µs), 0 to 300. Horizontal axis: number of nodes, 0 to 3000. Curves: SA-SBA*, RRT*, SBA*.]
Figure 5. Monte Carlo results comparing RRT*, SBA* and SA-SBA* on the highly cluttered environment, showing the average execution time (in µs) required for each iteration plotted against the number of iterations (or number of nodes of the motion graph).

The final variable we are interested in is the actual execution times of the algorithms. Figure 5 presents the average execution time for each additional iteration, measured in microseconds. The first striking observations are three spikes of increasing magnitude at exponential intervals, around the 100th, 350th and 1400th iterations, respectively. This is a classic manifestation of the amortized-constant growth of the contiguous-storage container used to store the vertices of the motion graph. On the platform used for these tests, the memory capacity seems to quadruple
Figure 6. Motion-graphs generated on ‘‘indoor’’ environments such as a bug-trap, an office space, and a symmetric world, all
captured at the moment when start-goal connectivity is achieved: (a) SBA* at 1600; (b) SA-SBA* at 1600; (c) SA-SBA* at 2100; (d)
SA-SBA* at 3000.
whenever it is exhausted, leading to those spikes, which amortize to a constant run-time cost and are thus negligible artifacts from an analytical point of view. The second observation is that in all cases, the execution time per iteration is a logarithmic function of the number of nodes in the motion graph. This is the cost incurred by nearest-neighbor queries performed during each iteration. Thus, as expected, the empirical time complexity of each algorithm is O(N log N) (for N nodes). Finally, we observe that the computational cost of each algorithm is essentially the same, with the SBA* algorithms being marginally better, which is attributable to the fact that only one nearest-neighbor query is required for an SBA* iteration, as opposed to two for an RRT* iteration (one for the expansion and one to find the connection neighborhood).

4.1.3 Indoor environments. Although it is not the intended application, it is interesting to take a brief look at the behavior of the SBA* algorithms in "indoor" environments. By this, we mean typical office environments, which are characterized by poor connectivity (in the sense of ε-goodness or (α, β)-expansiveness), narrow passages and entangled solutions (in the sense of h-entanglement). As is clear from the convergence rates derived in Theorems 4 and 5, these factors have a direct effect on the performance of the idealized SBA* algorithm. It is thus expected that the SBA* algorithms tested will be adversely affected by these difficulties as well.

Figure 6 shows a number of examples of the SBA* and SA-SBA* motion graphs on a number of these difficult spaces, such as the classic "bug-trap" environment, a simple office space, and a classic symmetric environment riddled with narrow corridors and poor connectivity. The numbers of nodes displayed in Figure 6 correspond to the moment when start-goal connectivity is achieved, at 1600, 2100, and 3000, for each environment. The RRT* solved the same problems with 1000, 800, and 1000 nodes, respectively. Even without statistical analysis, it is clear that the number of nodes required for a solution is quite substantial in comparison with an exploratory algorithm such as RRT*, as expected. The algorithms nevertheless make progress and succeed in solving the problems. Another qualitative observation in these scenarios is that narrow passages appear to be less of an obstacle to the planners than the entanglement of the solution paths, which is, again, expected from the direct effect that entanglement has on the probability of progress by the expansions. By rough inspection of the problems used here, we could estimate that the h value for the entanglement of these problems ranges between 1.5 and 3. Overall, these results confirm the idea that the SBA* algorithms are more appropriate for problems that feature low entanglement of the solution paths.
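As a small illustration of how the anytime key of Equation (47) interacts with the Euclidean heuristic of Equation (50) used in the two-dimensional experiments, the Python sketch below ranks two hypothetical nodes with and without inflation. It assumes the key has the form g + h + λh and that the relaxation factor of 5.0 quoted for the experiments plays the role of λ; the node coordinates and costs are invented for illustration only.

```python
import math

def heuristic(p, goal):
    """Euclidean heuristic in pixels, in the spirit of Equation (50)."""
    return math.dist(p, goal)

def anytime_key(g_u, h_u, lam):
    """Inflated key g + h + lam * h; lam = 0 recovers the plain A* key."""
    return g_u + h_u + lam * h_u

goal = (100.0, 100.0)
nodes = {
    "near_start": {"p": (10.0, 10.0), "g": 15.0},   # cheap so far, far from the goal
    "near_goal": {"p": (80.0, 90.0), "g": 140.0},   # costly so far, close to the goal
}

def best(lam):
    """Name of the node with the smallest key, i.e. the next one to expand."""
    return min(nodes, key=lambda n: anytime_key(
        nodes[n]["g"], heuristic(nodes[n]["p"], goal), lam))
```

With λ = 0 the node near the start has the smaller key, so effort goes into refining the early part of the path; with λ = 5 the ordering flips and the node near the goal is expanded first, which is the greedy push towards a first solution that the text describes.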
Comparison of RRT* and Sampling−base A* Algorithms The linear relationship between the required node count
Number of Nodes to First Solution on High−dimensional Spaces
and the dimensionality of the space when running the
4
SBA* algorithm is interesting and deserves additional clari-
Number of Nodes to First Solution
10
fications. For an unbiased, exploratory algorithm such as
RRT*
RRT*, the volume of space to be explored is an exponential
3
function of the dimensionality, and so is the number of
10
nodes needed to explore it. In a heuristically driven algo-
SA−SBA* rithm such as SBA*, the volume of ‘‘useful’’ space to be
explored is limited to a corridor between the start and the
2
10
goal points, in the absence of obstacles, i.e. when the space
SBA*
is h-entangled with a low value of h. The volume of this
4 6 8 10 12 14 16 18 20
Number of Dimensions corridor is proportional to the length of the central path.
The same phenomenon is true of A*, and does not depend
Figure 7. Monte Carlo results comparing RRT*, SBA*, and SA- on the relaxation of the heuristic. This phenomenon was
SBA* on high-dimensional empty spaces, showing the mean
also reported for the Guided-EST (Phillips et al., 2004). It
number of vertices required to find the first solution versus the
number of dimensions of the space.
is important to note, however, that the introduction of
obstacles and, thus, raising the h value that characterizes
the entanglement of the problem will expand the volume to
4.2 High-dimensional empty spaces be explored to find a solution, ultimately requiring a full
Another interesting aspect to investigate is the behavior of exploration, as the RRT* does.
the algorithm as the dimensionality of the problem is
increased. In this section, we want to shed some light on the
dimensionality alone and, thus, we use ‘‘empty’’ spaces as 4.3 7-dof Manipulator
the configuration space, i.e. a high-dimensional Euclidean The target application of the proposed algorithm is an
space (from 3 to 20 dimensions) with no obstacles, uncooperative satellite capture scenario involving a robotic
bounded with a unit hyper-cube and with a start and goal manipulator mounted on a servicing spacecraft. As an
point on opposite corners of the hyper-cube. The heuristic uncontrolled satellite is drifting in the vicinity of the servi-
function is, again, the Euclidean norm between the sample cing spacecraft after an orbital rendezvous, the robotic
and the goal position, as in Equation (50). We limit the size manipulator should be able to autonomously move to cap-
of any segment (edge)
pffiffiffiffi connecting two configuration points ture the satellite while avoiding a self-collision or a colli-
to a length of 0:2 N where N is the number of dimensions sion with the target satellite. In this section, we show that
in the space, i.e. there is always a minimum of 5 segments the SBA* algorithms can be successfully applied to solve
required to travel from the starting point to the goal point in problems in this real scenario for a static target. What
a straight line. makes the SBA* algorithm a suitable candidate in this sce-
Figure 7 shows the mean number of vertices required to nario is that the navigation problem is characterized by a
reach an initial solution by all three algorithms for 100 runs. rather straight-forward path to the target location while
As expected, the RRT* algorithm shows an exponential side-stepping a danger zone around the target’s collision
relationship to the number of dimensions, as per the well- geometry, with possibly narrow corridors to traverse.
known curse of dimensionality. The RRT* algorithm seems to require approximately twice the number of vertices for every additional dimension, i.e. the mean node count is O(g^N) with g = 1.92. By contrast, if we direct our attention to the SBA* algorithms, we see sub-exponential relationships. Indeed, the SBA* results show a linear relationship between the mean node count and the dimensionality of the space, with the best regression giving O(N^r) with r = 1.02. The SA-SBA* results are more difficult to characterize or find a good regression for, but it appears the relationship is quadratic, with a power-function regression giving O(N^r) with r = 2.01, and the quadratic curve fit having the least residual error of all regressions tried. It is reasonable to conclude that the SA-SBA* algorithm achieves a mean node count that is bounded above by the RRT* performance and bounded below by the SBA* performance in the absence of obstacles. Where exactly the SA-SBA* performance lies will naturally depend on the tuning of the simulated annealing schedule (e.g. initial temperature or cooling formula).

Our simulations of the satellite capture task are based on the experimental facility available at the Aerospace Mechatronics Laboratory at McGill University, which uses a neutrally buoyant airship to emulate the target satellite and a robotic manipulator on a 3 m linear track to represent the space manipulator (see Figure 8(a)). The capture target for the end-effector of the manipulator is a single grapple fixture mounted on the airship, which itself is a spherical blimp 6 feet in diameter. The mobile manipulator system has a total of 7 dofs (6-dof manipulator and the linear track), and planning is performed in the 14-dimensional joint state-space, i.e. using the 7 joint positions and 7 joint velocities as the state representation stored in the motion graph and using per-joint cubic-spline interpolations between states. Collision geometries are represented by simple 3-dimensional primitives, and proximity queries (collision detection) are performed with simple closed-form expressions for each pair of primitives. Figure 8(b) shows a three-dimensional rendering of the planning environment.
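The growth rates quoted in this section (a base g for an exponential fit and an exponent r for a power-function fit) are the kind of values that least-squares fits in log space produce. The paper does not state how its regressions were computed, so the following is only a minimal sketch of one common approach. The function names, the list of dimensions, and the synthetic node counts are all assumptions made for illustration; the data are generated from the model forms themselves and are not the paper's measurements.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def fit_exponential(dims, counts):
    """Fit counts ~ c * g**N via log(counts) = log(c) + N*log(g); returns g."""
    _, slope = fit_line(dims, [math.log(c) for c in counts])
    return math.exp(slope)

def fit_power(dims, counts):
    """Fit counts ~ c * N**r via log(counts) = log(c) + r*log(N); returns r."""
    _, slope = fit_line([math.log(n) for n in dims],
                        [math.log(c) for c in counts])
    return slope

# Hypothetical, noise-free data built from the two model forms
# (NOT measured node counts), used only to check the fits.
dims = [2, 3, 4, 5, 6, 7, 8]
exp_counts = [10.0 * 1.92 ** n for n in dims]   # exponential growth in N
pow_counts = [25.0 * n ** 1.02 for n in dims]   # near-linear growth in N

g = fit_exponential(dims, exp_counts)
r = fit_power(dims, pow_counts)
print(f"g = {g:.2f}, r = {r:.2f}")
```

On real, noisy Monte Carlo data one would also compare the residual error of each candidate model, which is how the text selects the quadratic fit for SA-SBA*.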
Figure 8. Picture of the robot-airship facility at the Aerospace Mechatronics Laboratory, McGill University: (a) actual picture; (b)
virtual environment.
To normalize the dimensions of the state space, each component is divided by its maximum allowable rate of change, that is, joint positions are divided by the maximum joint velocities and the joint velocities are divided by the maximum joint accelerations, which results in each state component being represented as the time required to reach that position or velocity from the origin or from rest, respectively. We refer to this representation as the reach-time representation, expressed in seconds. The distance metric used is the L∞-norm over the Euclidean norm of joint-wise position–velocity pairs, which approximates the time required for the most distant joint to be brought to the destination position and velocity. According to the distance metric, which is, again, used also as the travel cost, the heuristic function used for any sample u with associated point p(u) is as follows:

$$h_{cs}(u) \equiv \max_{i=1\ldots7} \left| p_{\mathrm{goal},i} - p_i(u) \right| \tag{51}$$

$$h_{ss}(u) \equiv \max_{i=1\ldots7} \sqrt{\left(p_{\mathrm{goal},i} - p_i(u)\right)^2 + \left(p_{\mathrm{goal},i+7} - p_{i+7}(u)\right)^2} \tag{52}$$

where the subscripts cs and ss denote the configuration-space and state-space planners, respectively. The reach-time representations of the joint positions $p_{1..7}$ and the joint velocities $p_{8..14}$ are defined as

$$p_i \equiv \frac{q_i}{v_{\max,i}}, \quad p_{i+7} \equiv \frac{v_i}{a_{\max,i}}, \quad \text{for } i \in [1,7] \tag{53}$$

where $q_i$ is the joint coordinate i (angular or linear), $v_i$ is the joint velocity i, and $v_{\max}$ and $a_{\max}$ are the maximum joint velocities and accelerations, respectively. The resulting unit of all values in p(u) is seconds. In the planning scenarios, the maximum travel distance for any edge of the graph was set to 1 second (i.e. the radius of local reachable regions).

Figure 9. Static planning scenario on the virtual environment, showing the start and goal configurations.

Figure 9 shows the start and goal configuration of the scenario that will be used to test the proposed algorithms. The scenario places the airship at an offset from the center of the track, leaving just enough room for the manipulator to move over to the opposing side of the track, which it is
required to do because the grapple fixture, and thus the target configuration, is on that side. The main feature of this scenario is that the corridor along which to travel to move from one side of the track to the other is quite narrow and requires the manipulator to be turned to the side (profile) and tilted away from the target to allow a safe (collision-free) passage across.

As an example, Figure 10 shows the resulting motion graphs from running the RRT*, SBA*, and SA-SBA* algorithms for 1000 iterations (1000 nodes). One such solution path is illustrated as a still animation in Figure 10(d), which shows clearly that the manipulator does indeed make considerable motion to avoid colliding with the airship on its way to the capture configuration.

Figure 10. Resulting motion graphs after 1000 iterations on the robot-airship facility, shown as traces of the end-effector positions for (a) RRT*, (b) SBA*, and (c) SA-SBA*, and a still animation of the SA-SBA* solution in (d).

4.3.1 Monte Carlo runs. To further validate the proposed methods in this real scenario, we present the results of Monte Carlo runs of the different algorithms for planning in the configuration space (zeroth order) and in the state space (first order) of this 7 degrees-of-freedom manipulator, amounting to a 7-dimensional space and a 14-dimensional space, respectively. The results presented in this section are accumulated over 1000 runs of the different algorithms. For configuration-space planning, the interpolation (or local planner) is linear and the distance metric used is the L∞-norm of the reach-time representation of the joint positions or angles. For state-space planning, the interpolation is cubic and the distance is taken as the L∞-norm of the joint-wise Euclidean norm of position and velocity, in reach-time representation. In these experiments, the initial relaxation factor for the anytime heuristic was 5.0, and the initial temperature for the simulated annealing schedule was 5.0.

Figures 11(a) and 11(b) show the average solutions found after a given number of iterations of the different algorithms. The results corroborate those obtained in the two-dimensional cases, as do the success rates seen in Figures 11(c) and 11(d). We can see that the SA-SBA* algorithm out-performs the RRT* algorithm on early solutions and then achieves a comparable asymptotic behavior. On the state-space planning (14 dimensions), the difference is the most significant, in favor of the SA-SBA* algorithm. Moreover, the reliability, in terms of success rates, of the SA-SBA* is also comparable to that of the RRT* algorithm in these scenarios. However, the SBA* algorithm does suffer from a significant reliability problem, especially when the dimensionality of the problem is increased to 14 in the state-space planning. Clearly, in very high-dimensional problems, the SBA* algorithm has more difficulty circumventing the obstacles and dead-ends due to the higher number of samples required to sufficiently increase the density values. Nevertheless, it is clearly beneficial to mix SBA* iterations into RRT* iterations, as demonstrated by the SA-SBA* algorithm's performance.

Table 1 presents an additional view of the results of the Monte Carlo runs. First, we have the average first solutions
Table 1. First solution statistics for the robot-airship capture scenario, comparing RRT*, SBA* and SA-SBA*, in a zeroth-order space
(configuration space, 7 dimensions) and a first-order space (state space, 14 dimensions).
Algorithm            Solution cost      Number of nodes     Execution time (s)
                     Mean     SD        Mean     SD

Zeroth-order (7D)                                           for 3000 nodes
RRT*                 3.68     0.36      150      408        4.58
SBA*                 3.66     0.31      289      485        3.17
SA-SBA*              3.67     0.34      152      333        3.99

First-order (14D)                                           for 6000 nodes
RRT*                 5.18     0.46      1685     1385       13.83
SBA*                 5.21     0.61      2750     1710       4.91
SA-SBA*              5.09     0.44      1810     1370       10.30
[Figure 11 plots, with curves for RRT*, SBA*, and SA-SBA*: (a) "Average Solution Distances on 7-dof Manipulator" and (b) "Average Solution Distances on 7-dof Manipulator (state-space)", showing the average solution distance after N iterations (seconds); (c) "Cumulative Success Probability on 7-dof Manipulator" and (d) "Cumulative Success Probability on 7-dof Manipulator (state-space)".]
Figure 11. Monte Carlo results comparing RRT*, SBA*, and SA-SBA* for the robot-airship capture scenario in a zeroth-order space
(position only, 7 dimensions) and a first-order space (state-space, 14 dimensions), showing the average cost of solutions (a), (b) and
the cumulative success rates (c), (d).
found by the planners on the 1000 runs. We can see that all algorithms produce comparable results, with only marginal improvements on the mean value and the standard deviation for the SA-SBA* algorithm. Then, we report the average number of nodes required to find the first solution for each planner. We can observe that the SBA* algorithm is significantly less reliable, but the SA-SBA* recoups that reliability and even achieves a lower standard deviation than the RRT* algorithm. Finally, the table shows the running time, in seconds, required to run 3000 and 6000 iterations for the zeroth-order and first-order planning problems, respectively. By comparing the results, we can see that the SBA* is significantly faster than RRT*, and that the SA-SBA* falls in between, as expected. The significant run-time reductions achieved by SBA* iterations must be due to the fact that it requires only a single nearest-neighbor query as opposed to two for RRT* iterations, since the only other significant computational cost is the collision-checking, which we would expect to incur the same cost in all algorithms tested.
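The reach-time normalization of equation (53) and the heuristics of equations (51) and (52), which underlie the distance metrics used in these Monte Carlo runs, can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, the joint limits and states are made-up values, and a 2-joint toy system stands in for the 7-joint manipulator (the maxima run over however many joints are supplied).

```python
import math

def reach_time(q, v, v_max, a_max):
    """Equation (53): positions over max velocities, then
    velocities over max accelerations; every entry is in seconds."""
    pos = [qi / vm for qi, vm in zip(q, v_max)]
    vel = [vi / am for vi, am in zip(v, a_max)]
    return pos + vel

def h_cs(p, p_goal):
    """Equation (51): L-infinity norm over reach-time joint positions.
    Both arguments hold positions only (configuration space)."""
    return max(abs(g - x) for x, g in zip(p, p_goal))

def h_ss(p, p_goal):
    """Equation (52): L-infinity norm over the per-joint Euclidean norm
    of the (position, velocity) differences. Both arguments are laid out
    as n positions followed by n velocities (state space)."""
    n = len(p) // 2
    return max(
        math.hypot(p_goal[i] - p[i], p_goal[i + n] - p[i + n])
        for i in range(n)
    )

# Hypothetical limits and states for a 2-joint toy system.
v_max = [2.0, 1.0]   # maximum joint velocities
a_max = [4.0, 2.0]   # maximum joint accelerations

p_start = reach_time([1.0, 0.5], [1.0, 0.2], v_max, a_max)
p_goal = reach_time([2.0, 0.5], [0.0, 0.0], v_max, a_max)

cost_cs = h_cs(p_start[:2], p_goal[:2])  # positions only
cost_ss = h_ss(p_start, p_goal)          # positions and velocities
print(cost_cs, cost_ss)
```

Because the state-space heuristic adds a non-negative velocity term under each square root, it is never smaller than the configuration-space heuristic for the same pair of states.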
5. Conclusion

In this paper, we have proposed a new class of sampling-based path-planning algorithms derived from a formal generalization of the classic A* algorithm. The key generalization steps involve casting the value of the nodes of the motion graph in terms of a probabilistic expected value, and then replacing the binary concept of explored versus unexplored nodes with a metric for the density of the region to which the node belongs. Furthermore, salient formal analysis was brought forth that demonstrates the probabilistic completeness and convergence of an idealized version of the proposed algorithm. A convenient, recursive formulation for the density metric was also presented in the form of an approximation of the KL divergence between a Gaussian distribution and a mixture of the Gaussian distributions over neighboring nodes.

Then, a number of practical considerations and improvements were presented that fall in line with existing literature in sampling-based path-planning. A final novelty proposed here was the introduction of a simulated annealing schedule, a technique borrowed from nonlinear optimization and machine learning literature, as a means to balance between exploratory and exploitative steps, which was then applied to combine the RRT* and SBA* algorithms, respectively.

The results presented in this paper support the claims made throughout the development of the proposed methods. First, it was shown that the SBA* algorithms can more effectively explore the region around the optimal path, improving the success rates, the quality of the solutions found and the rate at which they can be improved during the later phase of the algorithms. These aspects are not only desirable from a quality stand-point but also lead to measurable benefits in time and computational capital spent towards finding a solution to a path-planning problem.

Second, it was observed that the interleaving of exploratory iterations (RRT*) with exploitative iterations (SBA*) via a simulated annealing schedule improves the performance, in most aspects, compared with either algorithm in isolation. This suggests, or confirms, that striking a correct balance between exploration and exploitation is the key to a path-planner that performs well in the type of scenarios considered in this paper.

Third, the behavior of the SBA* algorithms in high-dimensional spaces shows not only a graceful degeneration of the algorithms in scenarios in which a simple straight-line interpolation would suffice, but that the average number of nodes required to solve a problem is only linearly dependent on the dimensionality of the space under obstacle-free scenarios. However, the presence of obstacles in the configuration space has a definite adverse effect on the performance of the SBA* algorithm: a problem that seems to vanish when RRT* iterations are scheduled into the SBA* algorithm with, as proposed in this paper, a simulated annealing schedule. This also suggests a possible future line of investigation on the effect of the guiding heuristic function and the depth of the local minima it generates on the performance of the nominal SBA* algorithm, since these factors are known to significantly degrade the performance of the A* algorithm.

Finally, the proposed class of algorithms has demonstrated satisfactory performance on a real scenario of autonomous capture of a large free-floating object with a robotic manipulator, i.e. a high-dimensional space (14 dimensions) with complex collision geometries. The statistical analysis of those real high-dimensional scenarios demonstrates that the qualities previously observed of the proposed methods do, indeed, carry over to those difficult problems.

Acknowledgments

This work was made possible with the support of the Vanier Canada Graduate Scholarship from the National Science and Engineering Research Council of Canada (NSERC), and the support of the Hydro Quebec Doctoral Award as part of the McGill Engineering Doctoral Awards (MEDA). The authors would like to thank the reviewers of this article for providing very constructive feedback and for upholding the high standards of rigor required for productive scientific discourse.

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Notes

1. Square brackets denote an association of precomputed values as opposed to a computation of those values.
2. See https://github.com/mikael-s-persson/ReaK, last accessed on 19 February 2014.