Genetic Programming
Riccardo Poli
Department of Computing and Electronic Systems
University of Essex – UK
rpoli@essex.ac.uk
William B. Langdon
Departments of Biological and Mathematical Sciences
University of Essex – UK
wlangdon@essex.ac.uk
Nicholas F. McPhee
Division of Science and Mathematics
University of Minnesota, Morris – USA
mcphee@morris.umn.edu
with contributions by
John R. Koza
Stanford University – USA
john@johnkoza.com
March 2008
© Riccardo Poli, William B. Langdon, and Nicholas F. McPhee, 2008
For any reuse or distribution, you must make clear to others the licence
terms of this work. Any of these conditions can be waived if you get
permission from the copyright holders. Nothing in this license impairs
or restricts the authors’ rights.
To cite this book, please see the entry for (Poli, Langdon, and McPhee,
2008) in the bibliography.
Acknowledgements
We would like to thank the University of Essex and the University of Min-
nesota, Morris, for their support.
Many thanks to Tyler Hutchison for the use of his cool drawing on the
cover (and elsewhere!), and for finding those scary pinks and greens.
We had the invaluable assistance of many people, and we are very grateful
for their individual and collective efforts, often on very short timelines. Rick
Riolo, Matthew Walker, Christian Gagne, Bob McKay, Giovanni Pazienza,
and Lee Spector all provided useful suggestions based on an early techni-
cal report version. Yossi Borenstein, Caterina Cinel, Ellery Crane, Cecilia
Di Chio, Stephen Dignum, Edgar Galván-López, Keisha Harriott, David
Hunter, Lonny Johnson, Ahmed Kattan, Robert Keller, Andy Korth, Yev-
geniya Kovalchuk, Simon Lucas, Wayne Manselle, Alberto Moraglio, Oliver
Oechsle, Francisco Sepulveda, Elias Tawil, Edward Tsang, William Tozier
and Christian Wagner all contributed to the final proofreading festival.
Their sharp eyes and hard work did much to make the book better; any
remaining errors or omissions are obviously the sole responsibility of the
authors.
We would also like to thank Prof. Xin Yao and the School of Computer
Science of The University of Birmingham and Prof. Bernard Buxton of Uni-
versity College, London, for continuing support, particularly of the genetic
programming bibliography. We also thank Schloss Dagstuhl, where some of
the integration of this book took place.
Most of the tools used in the construction of this book are open source,1
and we are very grateful to all the developers whose efforts have gone into
building those tools over the years.
As mentioned above, this book started life as a chapter. This was
for a forthcoming handbook on computational intelligence2 edited by John
Fulcher and Lakhmi C. Jain. We are grateful to John Fulcher for his useful
comments and edits on that book chapter. We would also like to thank most
warmly John Koza, who co-authored the aforementioned chapter with us
and allowed us to reuse some of his original material in this book.
This book is a summary of nearly two decades of intensive research in
the field of genetic programming, and we obviously owe a great debt to all
the researchers whose hard work, ideas, and interactions ultimately made
this book possible. Their work runs through every page, from an idea made
somewhat clearer by a conversation at a conference, to a specific concept
or diagram. It has been a pleasure to be part of the GP community over
the years, and we greatly appreciate having so much interesting work to
summarise!
March 2008 Riccardo Poli
William B. Langdon
Nicholas Freitag McPhee
1 See the colophon (page 235) for more details.
2 Tentatively entitled Computational Intelligence: A Compendium and to be pub-
lished by Springer in 2008.
What’s in this book
The book is divided up into four parts.
Part I covers the basics of genetic programming (GP). This starts with a
gentle introduction which describes how a population of programs is stored
in the computer so that they can evolve with time. We explain how programs
are represented, how random programs are initially created, and how GP
creates a new generation by mutating the better existing programs or com-
bining pairs of good parent programs to produce offspring programs. This
is followed by a simple explanation of how to apply GP and an illustrative
example of using GP.
In Part II, we describe a variety of alternative representations for pro-
grams and some advanced GP techniques. These include: the evolution of
machine-code and parallel programs, the use of grammars and probability
distributions for the generation of programs, variants of GP which allow the
solution of problems with multiple objectives, many speed-up techniques
and some useful theoretical tools.
Part III provides valuable information for anyone interested in using GP
in practical applications. To illustrate genetic programming’s scope, this
part contains a review of many real-world applications of GP. These in-
clude: curve fitting, data modelling, symbolic regression, image analysis,
signal processing, financial trading, time series prediction, economic mod-
elling, industrial process control, medicine, biology, bioinformatics, hyper-
heuristics, artistic applications, computer games, entertainment, compres-
sion and human-competitive results. This is followed by a series of recom-
mendations and suggestions to obtain the most from a GP system. We then
provide some conclusions.
Part IV completes the book. In addition to a bibliography and an index,
this part includes two appendices that provide many pointers to resources,
further reading and a simple GP implementation in Java.
About the authors
The authors are experts in genetic programming with long and distinguished
track records, and over 50 years of combined experience in both theory and
practice in GP, with collaborations extending over a decade.
Riccardo Poli is a Professor in the Department of Computing and Elec-
tronic Systems at Essex. He started his academic career as an electronic en-
gineer doing a PhD in biomedical image analysis, later becoming an expert
in the field of EC. He has published around 240 refereed papers and a book
(Langdon and Poli, 2002) on the theory and applications of genetic pro-
gramming, evolutionary algorithms, particle swarm optimisation, biomed-
ical engineering, brain-computer interfaces, neural networks, image/signal
processing, biology and psychology. He is a Fellow of the International So-
ciety for Genetic and Evolutionary Computation (2003–), a recipient of the
EvoStar award for outstanding contributions to this field (2007), and an
ACM SIGEVO executive board member (2007–2013). He was co-founder
and co-chair of the European Conference on GP (1998–2000, 2003). He was
general chair (2004), track chair (2002, 2007), business committee member
(2005), and competition chair (2006) of ACM’s Genetic and Evolutionary
Computation Conference, co-chair of the Foundations of Genetic Algorithms
Workshop (2002) and technical chair of the International Workshop on Ant
Colony Optimisation and Swarm Intelligence (2006). He is an associate edi-
tor of Genetic Programming and Evolvable Machines, Evolutionary Compu-
tation and the International Journal of Computational Intelligence Research.
He is an advisory board member of the Journal on Artificial Evolution and
Applications and an editorial board member of Swarm Intelligence. He is a
member of the EPSRC Peer Review College, an EU expert evaluator and a
grant-proposal referee for Irish, Swiss and Italian funding bodies.
W. B. Langdon was research officer for the Central Electricity Research
Laboratories and project manager and technical coordinator for Logica be-
fore becoming a prolific, internationally recognised researcher (working at
UCL, Birmingham, CWI and Essex). He has written two books, edited
six more, and published over 80 papers in international conferences and
journals. He is the resource review editor for Genetic Programming and
Evolvable Machines and a member of the editorial board of Evolutionary
Computation. He has been a co-organiser of eight international conferences
and workshops, and has given nine tutorials at international conferences. He
was elected ISGEC Fellow for his contributions to EC. Dr Langdon has ex-
tensive experience designing and implementing GP systems, and is a leader
in both the empirical and theoretical analysis of evolutionary systems. He
also has broad experience both in industry and academic settings in biomed-
ical engineering, drug design, and bioinformatics.
Nicholas F. McPhee is a Full Professor in Computer Science in the
Division of Science and Mathematics, University of Minnesota, Morris. He
is an associate editor of the Journal on Artificial Evolution and Applica-
tions, an editorial board member of Genetic Programming and Evolvable
Machines, and has served on the program committees for dozens of interna-
tional events. He has extensive expertise in the design of GP systems, and in
the theoretical analysis of their behaviours. His joint work with Poli on the
theoretical analysis of GP (McPhee and Poli, 2001; Poli and McPhee, 2001)
received the best paper award at the 2001 European Conference on Genetic
Programming, and several of his other foundational studies continue to be
widely cited. He has also worked closely with biologists on a number of
projects, building individual-based models to illuminate genetic interactions
and changes in the genotypic and phenotypic diversity of populations.
Contents

1 Introduction
  1.1 Genetic Programming in a Nutshell
  1.2 Getting Started
  1.3 Prerequisites
  1.4 Overview of this Field Guide

I Basics

2 Representation, Initialisation and Operators in Tree-based GP
  2.1 Representation
  2.2 Initialising the Population
  2.3 Selection
  2.4 Recombination and Mutation

13 Troubleshooting GP
  13.1 Is there a Bug in the Code?
  13.2 Can you Trust your Results?
  13.3 There are No Silver Bullets
  13.4 Small Changes can have Big Effects
  13.5 Big Changes can have No Effect
  13.6 Study your Populations
  13.7 Encourage Diversity
  13.8 Embrace Approximation
  13.9 Control Bloat
  13.10 Checkpoint Results
  13.11 Report Well
  13.12 Convince your Customers

14 Conclusions

B TinyGP
  B.1 Overview of TinyGP
  B.2 Input Data Files for TinyGP
  B.3 Source Code
  B.4 Compiling and Running TinyGP

Bibliography

Index
Chapter 1
Introduction
Figure 1.1: The basic control flow for genetic programming, where survival of the fittest is used to find solutions. (The figure shows a loop in which a population of random programs is generated and the programs are run and their quality evaluated, eventually yielding a solution such as (* (SIN (- y x)) (IF (> x 15.43) (+ 2.3787 x) (* (SQRT y) (/ x 7.54)))).)
The best way to begin is obviously by reading this book, so you’re off to
a good start. We included a wide variety of references to help guide people
through at least some of the literature. No single work, however, could claim
to be completely comprehensive. Thus Appendix A reviews a whole host of
books, videos, journals, conferences, and on-line sources (including several
freely available GP systems) that should be of assistance.
We strongly encourage doing GP as well as reading about it; the dy-
namics of evolutionary algorithms are complex, and the experience of trac-
ing through runs is invaluable. In Appendix B we provide the full Java
implementation of Riccardo’s TinyGP system.
1.3 Prerequisites
Although this book has been written with beginners in mind, unavoidably
we had to make some assumptions about the typical background of our
readers. The book assumes some working knowledge of computer science
and computer programming; this is probably an essential prerequisite to get
the most from the book.
We don’t expect that readers will have been exposed to other flavours of
evolutionary algorithms before, although a little background might be useful.
The interested novice can easily find additional information on evolutionary
computation thanks to the plethora of tutorials available on the Internet.
Articles from Wikipedia and the genetic algorithm tutorial produced by
Whitley (1994) should suffice.
3 Available at http://www.cs.bham.ac.uk/~wbl/biblio/
Part I
Basics
Chapter 2
Representation, Initialisation and Operators in Tree-based GP
This chapter introduces the basic tools and terminology used in genetic
programming. In particular, it looks at how trial solutions are represented in
most GP systems (Section 2.1), how one might construct the initial random
population (Section 2.2), and how selection (Section 2.3) as well as crossover
and mutation (Section 2.4) are used to construct new programs.
2.1 Representation
In GP, programs are usually expressed as syntax trees rather than as lines of
code. For example Figure 2.1 shows the tree representation of the program
max(x+x,x+3*y). The variables and constants in the program (x, y and 3)
are leaves of the tree. In GP they are called terminals, whilst the arithmetic
operations (+, * and max) are internal nodes called functions. The sets of
allowed functions and terminals together form the primitive set of a GP
system.
In more advanced forms of GP, programs can be composed of multiple
components (e.g., subroutines). In this case the representation used in GP
is a set of trees (one for each component) grouped together under a special
root node that acts as glue, as illustrated in Figure 2.2. We will call these
(sub)trees branches. The number and type of the branches in a program,
together with certain other features of their structure, form the architecture
of the program. This is discussed in more detail in Section 6.1.
It is common in the GP literature to represent expressions in a prefix no-
tation similar to that used in Lisp or Scheme. For example, max(x+x,x+3*y)
becomes (max (+ x x) (+ x (* 3 y))). This notation often makes it eas-
ier to see the relationship between (sub)expressions and their corresponding
(sub)trees. Therefore, in the following, we will use trees and their corre-
sponding prefix-notation expressions interchangeably.
How one implements GP trees will obviously depend a great deal on
the programming languages and libraries being used. Languages that pro-
vide automatic garbage collection and dynamic lists as fundamental data
types make it easier to implement expression trees and the necessary GP
operations. Most traditional languages used in AI research (e.g., Lisp and
Prolog), many recent languages (e.g., Ruby and Python), and the languages
associated with several scientific programming tools (e.g., MATLAB1 and
Mathematica2 ) have these facilities. In other languages, one may have to
implement lists/trees or use libraries that provide such data structures.
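As a concrete illustration, the sketch below shows one possible pointer-based tree representation in Java (the language of the TinyGP implementation in Appendix B). It is a hypothetical sketch of our own, not taken from any particular GP system; the class and field names are assumptions made for this example.

    import java.util.ArrayList;
    import java.util.List;

    // A minimal pointer-based syntax tree node. Terminals have no children.
    class Node {
        String primitive;                            // e.g. "max", "+", "*", "x", "3"
        List<Node> children = new ArrayList<>();

        Node(String primitive, Node... args) {
            this.primitive = primitive;
            for (Node a : args) children.add(a);
        }

        // Print the tree in prefix notation, e.g. (max (+ x x) (+ x (* 3 y))).
        public String toString() {
            if (children.isEmpty()) return primitive;
            StringBuilder s = new StringBuilder("(").append(primitive);
            for (Node c : children) s.append(" ").append(c);
            return s.append(")").toString();
        }
    }

    class NodeDemo {
        public static void main(String[] args) {
            Node tree = new Node("max",
                new Node("+", new Node("x"), new Node("x")),
                new Node("+", new Node("x"), new Node("*", new Node("3"), new Node("y"))));
            System.out.println(tree);                // (max (+ x x) (+ x (* 3 y)))
        }
    }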
In high performance environments, the tree-based representation of pro-
grams may be too inefficient since it requires the storage and management
of numerous pointers. In some cases, it may be desirable to use GP primi-
tives which accept a variable number of arguments (a quantity we will call
arity). An example is the sequencing instruction progn, which accepts any
number of arguments, executes them one at a time and then returns the value
of its final argument.
Figure 2.1: GP syntax tree representing max(x+x,x+3*y).
(Figure 2.2: a multi-component program: several branches grouped together under a special ROOT node.)
2.2 Initialising the Population

Figure 2.3: Creation of a full tree having maximum depth 2 using the full
initialisation method (t = time).
We will describe two of the simplest (and earliest) methods (the full and grow
methods), and a widely used combination of the two known as Ramped half-
and-half.
In both the full and grow methods, the initial individuals are generated
so that they do not exceed a user specified maximum depth. The depth of
a node is the number of edges that need to be traversed to reach the node
starting from the tree’s root node (which is assumed to be at depth 0). The
depth of a tree is the depth of its deepest leaf (e.g., the tree in Figure 2.1 has
a depth of 3). In the full method (so named because it generates full trees,
i.e. all leaves are at the same depth) nodes are taken at random from the
function set until the maximum tree depth is reached. (Beyond that depth,
only terminals can be chosen.) Figure 2.3 shows a series of snapshots of the
construction of a full tree of depth 2. The children of the * and / nodes
must be leaves, or otherwise the tree would be too deep. Thus, at steps
t = 3, t = 4, t = 6 and t = 7 a terminal must be chosen (x, y, 1 and 0,
respectively).
Although the full method generates trees where all the leaves are at
the same depth, this does not necessarily mean that all initial trees will
have an identical number of nodes (often referred to as the size of a tree)
or the same shape. This only happens, in fact, when all the functions in
the primitive set have an equal arity. Nonetheless, even when mixed-arity
primitive sets are used, the range of program sizes and shapes produced by
the full method may be rather limited. The grow method, on the contrary,
allows for the creation of trees of more varied sizes and shapes. Nodes are
selected from the whole primitive set (i.e., functions and terminals) until
the depth limit is reached. Once the depth limit is reached only terminals
Figure 2.4: Creation of a five node tree using the grow initialisation method
with a maximum depth of 2 (t = time). A terminal is chosen at t = 2,
causing the left branch of the root to be closed at that point even though
the maximum depth had not been reached.
may be chosen (just as in the full method). Figure 2.4 illustrates this
process for the construction of a tree with depth limit 2. Here the first
argument of the + root node happens to be a terminal. This closes off that
branch, preventing it from growing any more before it reaches the depth
limit. The other argument is a function (-), but its arguments are forced
to be terminals to ensure that the resulting tree does not exceed the depth
limit. Pseudocode for a recursive implementation of both the full and grow
methods is given in Algorithm 2.1.
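Algorithm 2.1 itself is not reproduced in this excerpt, but the recursive structure it describes can be sketched as follows. This is our own hypothetical Java illustration (reusing the Node class sketched in Section 2.1); the primitive sets, and the choice of giving every function arity 2, are assumptions made purely to keep the example short.

    import java.util.List;
    import java.util.Random;

    class TreeInit {
        static final Random RNG = new Random();
        // Hypothetical primitive set: all functions here happen to have arity 2.
        static final List<String> FUNCTIONS = List.of("+", "-", "*", "/");
        static final List<String> TERMINALS = List.of("x", "y", "1", "0");

        // Build a random (sub)tree rooted at the given depth.
        // full = true  : only functions until maxDepth, so all leaves sit at maxDepth.
        // full = false : grow, i.e. pick from the whole primitive set at every step.
        static Node generate(int depth, int maxDepth, boolean full) {
            boolean chooseTerminal;
            if (depth >= maxDepth) {
                chooseTerminal = true;                           // depth limit reached
            } else if (full) {
                chooseTerminal = false;
            } else {
                int total = FUNCTIONS.size() + TERMINALS.size();
                chooseTerminal = RNG.nextInt(total) < TERMINALS.size();
            }
            if (chooseTerminal) {
                return new Node(TERMINALS.get(RNG.nextInt(TERMINALS.size())));
            }
            String f = FUNCTIONS.get(RNG.nextInt(FUNCTIONS.size()));
            return new Node(f,
                generate(depth + 1, maxDepth, full),
                generate(depth + 1, maxDepth, full));
        }
    }

Ramped half-and-half is then obtained by calling generate with full set to true for half of the population and false for the other half, while ramping maxDepth over a range of values.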
Because neither the grow nor the full method provides a very wide array of
sizes or shapes on their own, Koza (1992) proposed a combination called
ramped half-and-half. Half the initial population is constructed using full
and half is constructed using grow. This is done using a range of depth limits
(hence the term “ramped”) to help ensure that we generate trees having a
variety of sizes and shapes.
While these methods are easy to implement and use, they often make it
difficult to control the statistical distributions of important properties such
as the sizes and shapes of the generated trees. For example, the sizes and
shapes of the trees generated via the grow method are highly sensitive to the
sizes of the function and terminal sets. If, for example, one has significantly
more terminals than functions, the grow method will almost always generate
very short trees regardless of the depth limit. Similarly, if the number of
functions is considerably greater than the number of terminals, then the
grow method will behave quite similarly to the full method. The arities
of the functions in the primitive set also influence the size and shape of the
trees produced by grow.3 Section 5.1 (page 40) describes other initialisation
mechanisms which address these issues.
The initial population need not be entirely random. If something is
known about likely properties of the desired solution, trees having these
properties can be used to seed the initial population. This, too, will be
described in Section 5.1.
2.3 Selection
As with most evolutionary algorithms, genetic operators in GP are applied
to individuals that are probabilistically selected based on fitness. That is,
better individuals are more likely to have more child programs than inferior
individuals. The most commonly employed method for selecting individuals
in GP is tournament selection, which is discussed below, followed by fitness-
proportionate selection, but any standard evolutionary algorithm selection
mechanism can be used.
3 While these are particular problems for the grow method, they illustrate a general
issue where small (and often apparently inconsequential) changes, such as the addition or
removal of a few functions from the function set, can in fact have significant implications
for the GP system, and potentially introduce important but unintended biases.

In tournament selection a number of individuals are chosen at random
from the population. These are compared with each other and the best of
them is chosen to be the parent. When doing crossover, two parents are
needed and, so, two selection tournaments are made. Note that tourna-
ment selection only looks at which program is better than another. It does
not need to know how much better. This effectively automatically rescales
fitness, so that the selection pressure4 on the population remains constant.
Thus, a single extraordinarily good program cannot immediately swamp the
next generation with its children; if it did, this would lead to a rapid loss
of diversity with potentially disastrous consequences for a run. Conversely,
tournament selection amplifies small differences in fitness to prefer the bet-
ter program even if it is only marginally superior to the other individuals in
a tournament.
An element of noise is inherent in tournament selection due to the ran-
dom selection of candidates for tournaments. So, while preferring the best,
tournament selection does ensure that even average-quality programs have
some chance of having children. Since tournament selection is easy to imple-
ment and provides automatic fitness rescaling, it is commonly used in GP.
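As an illustration of how little machinery tournament selection needs, the following hypothetical Java sketch returns the index of a tournament winner. It assumes that fitness is an error to be minimised (as in the symbolic regression example of Chapter 4); note that only comparisons between fitnesses are used, never their magnitudes.

    import java.util.Random;

    class Tournament {
        // Return the index of the winner of a tournament of the given size.
        // fitness[i] is the error of individual i, so smaller means better.
        static int select(double[] fitness, int tournamentSize, Random rng) {
            int best = rng.nextInt(fitness.length);              // first random entrant
            for (int i = 1; i < tournamentSize; i++) {
                int candidate = rng.nextInt(fitness.length);     // further random entrants
                if (fitness[candidate] < fitness[best]) {
                    best = candidate;                            // only the comparison matters
                }
            }
            return best;
        }
    }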
Considering that selection has been described many times in the evolu-
tionary algorithms literature, we will not provide details of the numerous
other mechanisms that have been proposed. Goldberg (1989), for example,
describes fitness-proportionate selection, stochastic universal sampling and
several others.
4 A system with a strong selection pressure very highly favours the more fit individuals,
while a system with a weak selection pressure isn't so discriminating.
2.4 Recombination and Mutation

Figure 2.5: Example of subtree crossover. Note that the trees on the left
are actually copies of the parents. So, their genetic material can freely be
used without altering the original individuals.
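Subtree crossover, as illustrated in Figure 2.5, replaces a randomly chosen subtree in a copy of one parent with a copy of a randomly chosen subtree of the other parent. The following is a simplified, hypothetical Java sketch (reusing the Node class from Section 2.1); it ignores the common refinements of biasing the choice of crossover points towards functions and of enforcing a depth limit on the offspring.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    class SubtreeCrossover {
        // Deep copy, so the original parents are never altered.
        static Node copy(Node n) {
            Node c = new Node(n.primitive);
            for (Node child : n.children) c.children.add(copy(child));
            return c;
        }

        // Collect every node of a tree (depth first) so a crossover point can be drawn.
        static void collect(Node n, List<Node> out) {
            out.add(n);
            for (Node child : n.children) collect(child, out);
        }

        // Produce one offspring: a copy of parent1 in which one randomly chosen
        // subtree is overwritten by a copy of a randomly chosen subtree of parent2.
        static Node crossover(Node parent1, Node parent2, Random rng) {
            Node offspring = copy(parent1);
            List<Node> points1 = new ArrayList<>();
            collect(offspring, points1);
            List<Node> points2 = new ArrayList<>();
            collect(parent2, points2);
            Node target = points1.get(rng.nextInt(points1.size()));
            Node donor = copy(points2.get(rng.nextInt(points2.size())));
            target.primitive = donor.primitive;                  // replace the node in place
            target.children = donor.children;
            return offspring;
        }
    }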
(Figure: example of subtree mutation. A mutation point is chosen in the parent and the subtree rooted there is replaced with a randomly generated sub-tree to produce the offspring.)
Point mutation, on the other hand, is typically applied on a
per-node basis. That is, each node is considered in turn and, with a certain
probability, it is altered as explained above. This allows multiple nodes to
be mutated independently in one application of point mutation.
The choice of which of the operators described above should be used
to create an offspring is probabilistic. Operators in GP are normally mu-
tually exclusive (unlike other evolutionary algorithms where offspring are
sometimes obtained via a composition of operators). Their probabilities of
application are called operator rates. Typically, crossover is applied with the
highest probability, the crossover rate often being 90% or higher. On the
contrary, the mutation rate is much smaller, typically being in the region of
1%.
When the rates of crossover and mutation add up to a value p which is
less than 100%, an operator called reproduction is also used, with a rate of
1 − p. Reproduction simply involves the selection of an individual based on
fitness and the insertion of a copy of it in the next generation.
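A hedged sketch of how these rates are typically used when breeding a single new individual is given below. It reuses the hypothetical routines sketched earlier in this chapter; the specific rates, the tournament size of 2 and the implementation of subtree mutation as crossover with a newly grown random tree are illustrative choices, not a recommended configuration.

    import java.util.Random;

    class Breeder {
        static final double CROSSOVER_RATE = 0.9;                // e.g. 90% crossover
        static final double MUTATION_RATE = 0.01;                // e.g. 1% mutation
        // The remaining probability (here 9%) is used for reproduction.

        // Create one individual of the next generation.
        static Node breedOne(Node[] population, double[] fitness, Random rng) {
            Node parent = population[Tournament.select(fitness, 2, rng)];
            double r = rng.nextDouble();
            if (r < CROSSOVER_RATE) {
                // Second selection tournament for the second parent.
                Node other = population[Tournament.select(fitness, 2, rng)];
                return SubtreeCrossover.crossover(parent, other, rng);
            } else if (r < CROSSOVER_RATE + MUTATION_RATE) {
                // Subtree mutation, implemented here by crossing the parent
                // with a freshly grown random tree.
                return SubtreeCrossover.crossover(parent, TreeInit.generate(0, 4, false), rng);
            } else {
                return SubtreeCrossover.copy(parent);            // reproduction: plain copy
            }
        }
    }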
Chapter 3
Getting Ready to Run Genetic Programming
• the program’s external inputs. These typically take the form of named
variables (e.g., x, y).
Terminal Set
Kind of Primitive Example(s)
Variables x, y
Constant values 3, 0.45
0-arity functions rand, go left
3.2.1 Closure
For GP to work effectively, most function sets are required to have an impor-
tant property known as closure (Koza, 1992), which can in turn be broken
down into the properties of type consistency and evaluation safety.
Type consistency is required because subtree crossover (as described in
Section 2.4) can mix and join nodes arbitrarily. As a result it is necessary
that any subtree can be used in any of the argument positions for every func-
tion in the function set, because it is always possible that subtree crossover
will generate that combination. It is thus common to require that all the
functions be type consistent, i.e., they all return values of the same type,
and that each of their arguments also have this type. For example +, -, *,
and / can be defined so that they each take two integer arguments and
return an integer. Sometimes type consistency can be weakened somewhat
by providing an automatic conversion mechanism between types. We can,
for example, convert numbers to Booleans by treating all negative values as
false, and non-negative values as true. However, conversion mechanisms can
introduce unexpected biases into the search process, so they should be used
with care.
The type consistency requirement can seem quite limiting but often sim-
ple restructuring of the functions can resolve apparent problems. For exam-
ple, an if function is often defined as taking three arguments: the test, the
value to return if the test evaluates to true and the value to return if the
test evaluates to false. The first of these three arguments is clearly Boolean,
which would suggest that if can’t be used with numeric functions like +.
1 The decision to return the value 1 provides the GP system with a simple way to
generate the constant 1, via an expression of the form (% x x). This combined with a
similar mechanism for generating 0 via (- x x) ensures that GP can easily construct
these two important constants.
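Evaluation safety is usually obtained by replacing the risky primitives with protected versions; the Java sketch below is a minimal illustration. The convention that protected division returns 1 when the denominator is 0 follows the footnote above; the treatment of the logarithm is just one possible convention.

    class ProtectedOps {
        // Protected division "%": returns 1 when the denominator is 0, so that
        // (% x x) always yields the constant 1 and evaluation can never fail.
        static double protectedDiv(double a, double b) {
            return (b == 0.0) ? 1.0 : a / b;
        }

        // One possible protected logarithm, defined for every input.
        static double protectedLog(double a) {
            return (a <= 0.0) ? 0.0 : Math.log(a);
        }
    }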
3.2.2 Sufficiency
There is one more property that primitives sets should have: sufficiency.
Sufficiency means it is possible to express a solution to the problem at hand
using the elements of the primitive set.2 Unfortunately, sufficiency can be
guaranteed only for those problems where theory, or experience with other
methods, tells us that a solution can be obtained by combining the elements
of the primitive set.
As an example of a sufficient primitive set consider {AND, OR, NOT, x1, x2,
..., xN}. It is always sufficient for Boolean induction problems, since it can
produce all Boolean functions of the variables x1, x2, ..., xN. An example
of an insufficient set is {+, -, *, /, x, 0, 1, 2}, which is unable to represent
transcendental functions. The function exp(x), for example, is transcenden-
tal and therefore cannot be expressed as a rational function (basically, a
ratio of polynomials), and so cannot be represented exactly by any combi-
nation of {+, -, *, /, x, 0, 1, 2}. When a primitive set is insufficient, GP
can only develop programs that approximate the desired one. However, in
many cases such an approximation can be very close and good enough for
the user’s purpose. Adding a few unnecessary primitives in an attempt to
ensure sufficiency does not tend to slow down GP overmuch, although there
are cases where it can bias the system in unexpected ways.
2 More formally, the primitive set is sufficient if the set of all the possible recursive
compositions of the primitives includes at least one solution.
3 Functional operations like addition don’t depend on the order in which their argu-
ments are evaluated. The order of side-effecting operations such as moving or turning a
robot, however, is obviously crucial.
(Figure: example interpretation of a syntax tree, showing the value computed at each node when the terminal x has the value -1.)
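Computing fitness requires interpreting each program on the fitness cases, as in the evaluation sketched in the figure above. A minimal recursive interpreter for the binary arithmetic primitives used in this chapter might look like the following hypothetical Java sketch; it reuses the Node class and the protected division from earlier and is not the interpreter used by TinyGP.

    import java.util.Map;

    class Interpreter {
        // Recursively evaluate a tree for the given variable bindings, e.g. {x = -1}.
        // Only binary arithmetic primitives are handled here.
        static double eval(Node n, Map<String, Double> vars) {
            if (n.children.isEmpty()) {                          // terminal: variable or constant
                Double v = vars.get(n.primitive);
                return (v != null) ? v : Double.parseDouble(n.primitive);
            }
            double a = eval(n.children.get(0), vars);
            double b = eval(n.children.get(1), vars);
            switch (n.primitive) {
                case "+": return a + b;
                case "-": return a - b;
                case "*": return a * b;
                case "%": return ProtectedOps.protectedDiv(a, b);
                default:  throw new IllegalArgumentException("Unknown primitive " + n.primitive);
            }
        }
    }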
4 There are, however, GP systems that frequently use much smaller populations. These
typically rely more on mutation than crossover for their primary search mechanism.
5 Training data refers to the test cases used to evaluate the fitness of the evolved
individuals.
Chapter 4
Example Genetic Programming Run
While in theory you can build up large constants using small constants and arithmetic
operators, the performance of your system is likely to improve considerably if you provide
constants of roughly the right magnitude from the beginning. Your choice of genetic
operators can also be important here. If you're finding that your system is struggling to
evolve the right constants, it may be helpful to introduce mutation operators specifically
designed to search the space of constants.
T = {x, ℜ}.
The statement of the problem does not specify which functions may be
employed in the to-be-evolved program. One simple choice for the function
set is the four ordinary arithmetic functions: addition, subtraction, mul-
tiplication and division. Most numeric regression problems will require at
least these operations, sometimes with additional functions such as sin() and
log(). We will use the simple function set
F = {+, -, *, %},
the crossover operation will be used twice (each time generating one indi-
vidual), which corresponds to a crossover rate of 50%, while the mutation
and reproduction operations will each be used to generate one individual.
These are therefore applied with a rate of 25% each. For simplicity, the
architecture-altering operations are not used for this problem.
In the fifth and final step we need to specify a termination condition. A
reasonable termination criterion for this problem is that the run will continue
from generation to generation until the fitness (or error) of some individual
is less than 0.1. In this contrived example, our example run will (atypically)
yield an algebraically perfect solution with a fitness of zero after just one
generation.
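The exact fitness measure is not shown in this excerpt, but for symbolic regression it is typically the sum of the absolute errors over a set of fitness cases. The hypothetical Java sketch below measures the error of a program against the target x² + x + 1 on 21 evenly spaced points in [-1, 1]; the number and placement of the fitness cases are our own choices. A run would then stop as soon as some individual's error fell below 0.1.

    import java.util.Map;

    class ExampleFitness {
        // Error of a program: sum over the fitness cases of |program(x) - (x*x + x + 1)|.
        // Smaller is better; an error below 0.1 satisfies the termination criterion above.
        static double error(Node program) {
            int cases = 21;                                      // 21 points in [-1, 1]
            double sum = 0.0;
            for (int i = 0; i < cases; i++) {
                double x = -1.0 + 2.0 * i / (cases - 1);
                double target = x * x + x + 1.0;
                double output = Interpreter.eval(program, Map.of("x", x));
                sum += Math.abs(output - target);
            }
            return sum;
        }
    }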
4.2.1 Initialisation
GP starts by randomly creating a population of four individual computer
programs. The four programs are shown in Figure 4.1 in the form of trees.
The first randomly constructed program tree (Figure 4.1a) is equivalent
to the expression x+1. The second program (Figure 4.1b) adds the constant
terminal 1 to the result of multiplying x by x and is equivalent to x² + 1. The
third program (Figure 4.1c) adds the constant terminal 2 to the constant
terminal 0 and is equivalent to the constant value 2. The fourth program
(Figure 4.1d) is equivalent to x.
(Figure 4.1: the four randomly created programs of generation 0, equivalent to (a) x + 1, (b) x² + 1, (c) 2 and (d) x.)
Figure 4.2: Graphs of the evolved functions from generation 0. The solid
line in each plot is the target function x² + x + 1, with the dashed line
being the evolved functions from the first generation (see Figure 4.1). The
fitness of each of the four randomly created individuals of generation 0 is
approximately proportional to the area between two curves, with the actual
fitness values being 7.7, 11.0, 17.98 and 28.7 for individuals (a) through (d),
respectively.
(Figure 4.3: the four members of generation 1, equivalent to x + 1, 1, x and x² + x + 1.)
Finally, we use the crossover operation to generate our final two indi-
viduals for the next generation. Because the first and second individuals in
generation 0 are both relatively fit, they are likely to be selected to partic-
ipate in crossover. However, selection can always pick suboptimal individ-
uals. So, let us assume that in our first application of crossover the pair of
selected parents is composed of the above-average tree in Figure 4.1a and
the below-average tree in Figure 4.1d. One point of the first parent, namely
the + function in Figure 4.1a, is randomly picked as the crossover point for
the first parent. One point of the second parent, namely the leftmost termi-
nal x in Figure 4.1d, is randomly picked as the crossover point for the second
parent. The crossover operation is then performed on the two parents. The
offspring (Figure 4.3c) is equivalent to x and is not particularly noteworthy.
Part II
Advanced Genetic Programming
Chapter 5
Alternative Initialisations and Operators in Tree-based GP
range the size and shape of the initial trees within a few generations. As
discussed in Section 11.3.1 (page 101), the excessive sampling of short pro-
grams appears to be an important cause of bloat (the uncontrolled growth
of programs during GP runs, which will be described in more detail in Sec-
tion 11.3, page 101 onwards). It has been shown (Dignum and Poli, 2007)
that when the initial population is created with the size distribution pre-
ferred by crossover (see Section 11.3.1), bloat is more marked. The distri-
bution has a known mathematical formula (it is a Lagrange distribution of
the second kind), but in practice it can be created by simply performing
multiple rounds of crossover on a population created in the traditional way
before the GP run starts. This is known as Lagrange initialisation. These
findings suggest that initialisation methods which tend to produce many
short programs may in fact induce bloat sooner than methods that produce
distributions more skewed towards larger programs.
5.1.3 Seeding
The most common way of starting a GP run from an informed non-random
point is seeding the initial population with an individual which, albeit not
a solution, is thought to be a good starting point. Such a seed may have
been produced by an earlier GP run or perhaps constructed by the user
(Aler, Borrajo, and Isasi, 2002; Holmes, 1995; Hsu and Gustafson, 2001;
Langdon and Nordin, 2000; Langdon and Treleaven, 1997; Westerberg and
Levine, 2001). However, Marek, Smart, and Martin (2002) reported that
hand written programs may not be robust enough to prosper in an evolving
population.
One point to be careful of is that such a seed individual is liable to be
much better than randomly created trees. Thus, its descendants may take
over the population within a few generations. So, under evolution the seeded
population is initially liable to lose diversity rapidly. Furthermore, depend-
ing upon the details of the selection scheme used, a single seed individual
may have some chance of being removed from the population. Both problems
are normally dealt with by filling the whole population with either identical
or mutated copies of the seed. This method creates a low diversity initial
population in a controlled way, thereby avoiding the initial uncontrolled loss
of diversity associated with single seeds. Furthermore, with many copies of
the seed, few selection methods will have much chance of removing all copies
of the seed before they are able to create children. Diversity preserving tech-
niques, such as multi-objective GP (e.g., (Parrott, Li, and Ciesielski, 2005),
(Setzkorn, 2005) and Chapter 9), demes (Langdon, 1998) (see Section 10.3),
fitness sharing (Goldberg, 1989) and the use of multiple seed trees, might
also be good cures for the problems associated with the use of a single seed.
In any case, the diversity of the population should be monitored to ensure
that there is significant mixing of different initial trees.
5.2 GP Mutation
5.2.1 Is Mutation Necessary?
Mutation was used in early experiments in the evolution of programs, e.g.,
in (Bickel and Bickel, 1987; Cramer, 1985; Fujiki and Dickinson, 1987). It
was not, however, used in (Koza, 1992) and (Koza, 1994), as Koza wished to
demonstrate that mutation was not necessary and that GP was not perform-
ing a simple random search. This has significantly influenced the field, and
mutation is often omitted from GP runs. While mutation is not necessary
for GP to solve many problems, O’Reilly (1995) argued that mutation — in
combination with simulated annealing or stochastic iterated hill climbing —
can perform as well as crossover-based GP in some cases. Nowadays, mu-
tation is widely used in GP, especially in modelling applications. Koza also
advises the use of a low level of mutation; see, for example, (Koza, Bennett,
Andre, and Keane, 1996b).
Comparisons of crossover and mutation suggest that including mutation
can be advantageous. Chellapilla (1997b) found that a combination of six
mutation operators performed better than previously published GP work on
four simple problems. Harries and Smith (1997) also found that mutation
based hill climbers outperformed crossover-based GP systems on similar
problems. Luke and Spector (1997) suggested that the situation is complex,
and that the relative performance of crossover and mutation depends on
both the problem and the details of the GP system.
5.3 GP Crossover
During biological sexual reproduction, the genetic material from both
mother and father is combined in such a way that genes in the child are
in approximately the same position as they were in its parents. This is quite
different from traditional tree-based GP crossover, which can move a subtree
to a totally different position in the tree structure.
Crossover operators that tend to preserve the position of genetic ma-
terial are called homologous, and several notions of homologous crossover
have been proposed for GP. It is fairly straightforward to realise homolo-
gous crossover when using linear representations, and homologous operators
are widely used in linear GP (cf. Figure 7.4, page 65) (Defoin Platel, Clergue,
and Collard, 2003; Francone, Conrads, Banzhaf, and Nordin, 1999; Hansen,
2003; Hansen, Lowry, Meservy, and McDonald, 2007; Nordin, Banzhaf, and
Francone, 1999; O’Neill, Ryan, Keijzer, and Cattolico, 2003). Various forms
of homologous crossover have also been proposed for tree-based GP (Col-
let, 2007; Langdon, 2000; Lones, 2003; MacCallum, 2003; Yamamoto and
Tschudin, 2005).
The oldest homologous crossover in tree-based GP is one-point crossover
(Langdon and Poli, 2002; Poli and Langdon, 1997, 1998a). This works by se-
lecting a common crossover point in the parent programs and then swapping
the corresponding subtrees. To allow for the two parents having different
shapes, one-point crossover analyses the two trees from the root nodes and
selects the crossover point only from the parts of the two trees in the common
region (see Figure 5.1). In the common region, the parents have the same
shape.1 The common region is related to homology, in the sense that the
common region represents the result of a matching process between parent
trees. Within the common region between two parent trees, the transfer of
homologous primitives can happen like it does in a linear bit string genetic
algorithm.
Uniform crossover for trees (Poli and Langdon, 1998b) works (in the
common region) like uniform crossover in GAs. That is, the offspring are
created by visiting the nodes in the common region and flipping a coin at
each node to decide whether the corresponding offspring node should be taken
from the first or the second parent.
1 Nodes in the common region need not be identical but they must have the same
arity. That is, they must both be leaves or both be functions with the same number of
inputs.
(Figure 5.1: one-point crossover. The two parents are aligned from their root nodes to identify the common region, a common crossover point is selected within it, and the subtrees below that point are swapped to create the offspring.)
Chapter 6
Modular, Grammatical and Developmental Tree-based GP
taken from parts of fit GP trees. Special mutation operations allowed the
GP population to share code by referring to the same code within the li-
brary. Subsequently, Angeline suggested that the scheme’s advantages lay
in allowing GP individuals to access far more code than they actually “held”
within themselves, rather than principally in developing more modular code.
Rosca and Ballard (1996a) used a similar scheme, but were able to use much
more information from the fitness function to guide the selection of the code
to be inserted into the library and its subsequent use by members of the GP
population. Olsson (1999, 1995) later developed an abstraction operator for
use in his ADATE system, where sub-functions (anonymous lambda expres-
sions) were automatically extracted. Unlike Angeline’s library approach,
Olsson’s modules remained attached to the individual they were extracted
from.
Koza’s automatically defined functions (ADFs) (Koza, 1994) remain the
most widely used method of evolving reusable components and have been
used successfully in a variety of settings. Basic ADFs (covered in Sec-
tion 6.1.1) use a fixed architecture specified in advance by the user. Koza
later extended this using architecture altering operations (Section 6.1.2),
which allow the architecture to evolve along with the programs.
The ADF (Equation 6.2) is simply the squaring function, but by combining
this multiple times in the RPB (Equation 6.1) this individual computes x⁸
in a highly compact fashion.
It is important not to be fooled by a tidy example like this. ADFs
evolved in real applications are typically complex and can be very difficult
to understand. Further, simply including ADFs provides no guarantee of
modular re-use. As is discussed in Chapter 13, there are no silver bullets.
It may be that the RPB never calls an ADF or only calls it once. It is also
common for an ADF to not actually encapsulate any significant logic. For
example, an ADF might be as simple as a single terminal, in which case it
is essentially just providing a new name for that terminal.
In Koza’s approach, each ADF is attached (as a branch) to a specific indi-
vidual in the population. This is in contrast to both Angeline’s and Rosca’s
systems mentioned above, both of which have general pools of modules or
components which are shared across the population. Sometimes recursion
is allowed in ADFs, but this frequently leads to infinite computations. Typ-
ically, recursion is prevented by imposing an order on the ADFs within an
individual and by restricting calls so that ADFi can only call ADFj if i < j.
In the presence of ADFs, recombination operators are typically con-
strained to respect the larger structure. That is, during crossover, a subtree
from ADFi can only be swapped with a subtree from another individual’s
ADFi .
The program’s result-producing branch and its ADFs typically have dif-
ferent function and terminal sets. For example, the terminal set for ADFs
usually includes arguments, such as arg0, arg1. Typically the user must
decide in advance the primitive sets, the number of ADFs and any call re-
strictions to prevent recursion. However, these choices can be evolved using
the architecture-altering operations described in Section 6.1.2.
Koza also proposed other types of automatically evolved program com-
ponents (Koza, Andre, Bennett, and Keane, 1999). Automatically defined
iterations (ADIs), automatically defined loops (ADLs) and automatically
defined recursions (ADRs) provide means to reuse code. Automatically de-
fined stores (ADSs) provide means to reuse the result of executing code.
1. The user may specify in advance the architecture of the overall pro-
gram, i.e., perform an architecture-defining preparatory step in addi-
tion to the five steps itemised in Chapter 3.
Koza and his colleagues have used these architecture altering operations
quite widely in their genetic design work, where they evolve GP trees that
encode a collection of developmental operations that, when interpreted, gen-
erate a complex structure like a circuit or an optical system (see, for example,
Section 12.3, page 118).
The idea of architecture altering operations was extended to the ex-
tremely general Genetic Programming Problem Solver (GPPS), which is
described in detail in (Koza et al., 1999, part 4). This is an open ended
system which combines a small set of basic vector-based primitives with the
architecture altering operations in a way that can, in theory, solve a wide
range of problems with almost no input required from the user other than
the fitness function. The problem is that this open-ended system needs a
very carefully constructed fitness function to guide it to a viable solution, an
enormous amount of computational effort, or both. As a result it is currently
an idea of more conceptual than practical value.
a type system, since these often generate results that are more comprehen-
sible (Haynes, Wainwright, Sen, and Schoenefeld, 1995), (Langdon, 1998,
page 126). Similarly, if there is domain knowledge that strongly suggests a
particular syntactic constraint on the solution, then ignoring that constraint
may make it much harder to find a solution.
We will focus here on three different approaches to constraining the syn-
tax of the evolved expression trees in GP: simple structure enforcement
(Section 6.2.1), strongly typed GP (Section 6.2.2) and grammar-based con-
straints (Section 6.2.3). Finally, we consider the advantages and disadvan-
tages of syntactic and type constraints and their biases (Section 6.2.4).
and all the genetic operators are implemented so as to ensure that they do
not violate the type system’s constraints.
Returning to the if example from Section 3.2.1 (page 21), we might have
an application with both numeric and Boolean terminals (e.g., get speed
and is food ahead). We might then have an if function that takes three
arguments: a test (Boolean), the value to return if the test is true, and
the value to return if the test is false. Assuming that the second and third
values are numbers, then the output of the if is also going to be numeric.
If we choose the test argument as a crossover point in the first parent, then
the subtree (excised from the second parent) to insert must have a Boolean
output. That is, we must find either a function which returns a Boolean or
a Boolean terminal in the other parent tree to be the root of the subtree
which we will insert into the new child. Conversely if we choose either the
second or third argument as a crossover point in the first parent, then the
inserted subtree must be numeric. In all three cases, given that both parents
are type correct, restricting the second crossover point in this way ensures
the child will also be type correct.
This basic approach to types can be extended to more complex type sys-
tems including simple generics (Montana, 1995), multi-level type systems
(Haynes, Schoenefeld, and Wainwright, 1996), fully polymorphic types (Ols-
son, 1994), and polymorphic higher-order type systems (Yu, 2001).
(Figure 6.2: derivation tree for the program y × sin((x + z) × t) with respect to the grammar in Equation (6.3).)
In this sort of system, the grammar is typically used to ensure the initial
population is made up of legal “grammatical” programs. The grammar is
also used to guide the operations of the genetic operators. Thus we need to
keep track not only of the program itself, but also the syntax rules used to
derive it.
What actually is evolved in a grammar-based GP system depends on the
particular system. Whigham (1996), for example, evolved derivation trees,
which effectively are a hierarchical representation of which rewrite rules must
be applied, and in which order, to obtain a particular program. Figure 6.2
shows an example of a derivation tree representing a grammatical program
with respect to the grammar in Equation (6.3). In this system, crossover is
restricted to only swapping subtrees deriving from a common non-terminal
symbol in the grammar. So, for example, a subtree rooted by an E node
could be replaced by another also rooted by an E, while an E-rooted subtree
could not be replaced by an op-rooted one.
The actual program represented by a derivation tree can be obtained by
reading out the leaves of the tree one by one from left to right. For the
derivation tree in Figure 6.2, for example, this produces the program
y × sin((x + z) × t).
However, for efficiency reasons, in an actual implementation it is not con-
venient to extract the program represented by a derivation tree in this way.
tree
→ ⟨ 39 mod 1 = 0, i.e., there is only one option ⟩
E × sin(E × t)
→ ⟨ 7 mod 2 = 1, i.e., choose the second option ⟩
(E op E) × sin(E × t)
→ ⟨ 2 mod 2 = 0, i.e., take the first option ⟩
(var op E) × sin(E × t)
→ ⟨ 83 mod 3 = 2, i.e., pick the third variable ⟩
(z op E) × sin(E × t)
→ ⟨ 66 mod 4 = 2, i.e., take the third operator ⟩
(z × E) × sin(E × t)
...
(z × x) × sin(z × t)
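In the system sketched above (grammatical evolution), the genotype is simply a sequence of integer codons, and each codon chooses a production for the leftmost non-terminal by taking its value modulo the number of available options. The following Java sketch is our own hypothetical illustration of that mapping; the grammar used here is only a guess at the shape of Equation (6.3), which is not reproduced in this excerpt.

    import java.util.List;
    import java.util.Map;

    class GEMapping {
        // A toy grammar standing in for Equation (6.3): each non-terminal maps
        // to its list of alternative productions.
        static final Map<String, List<String>> GRAMMAR = Map.of(
            "<tree>", List.of("<E> * sin(<E> * t)"),
            "<E>",    List.of("<var>", "(<E> <op> <E>)"),
            "<var>",  List.of("x", "y", "z"),
            "<op>",   List.of("+", "-", "*", "/"));

        // Repeatedly expand the leftmost non-terminal, consuming one codon per choice.
        static String map(int[] codons) {
            String expr = "<tree>";
            int next = 0;
            int steps = 0;
            while (expr.contains("<") && steps++ < 1000) {       // bound the number of expansions
                int start = expr.indexOf('<');
                int end = expr.indexOf('>', start);
                String nonTerminal = expr.substring(start, end + 1);
                List<String> options = GRAMMAR.get(nonTerminal);
                // codon mod (number of options) picks the production, as in the derivation above
                String chosen = options.get(codons[next % codons.length] % options.size());
                next++;
                expr = expr.substring(0, start) + chosen + expr.substring(end + 1);
            }
            return expr;
        }
    }

Calling map with codons beginning 39, 7, 2, 83, 66, ... reproduces the first steps of the derivation above; real systems also bound the number of times the codon sequence may be reused (wrapped) so that the mapping is guaranteed to terminate.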
2 By “solution density” we refer to the ratio between the number of acceptable solutions
in a program search space and the size of the search space itself. This is a rough assessment
of how hard a problem is, since it gives an indication of how long random search would
take to explore the program space before finding an acceptable solution.
3 The process is easier to explain with a movie. This can be downloaded from
http://www.genetic-programming.com/gpdevelopment.html.
Structures resulting from developmental processes often have some regularity, which other
methods obtain through the use of ADFs, constraints, types, etc. A dis-
advantage is that, with cellular encoding, individuals require an additional
genotype-to-phenotype decoding step. However, when the fitness function
involves complex calculations with many fitness cases, the relative cost of the
decoding step is often small compared with the rest of the fitness function.
Chapter 7
Linear and Graph Genetic Programming

Until now we have been talking about the evolution of programs expressed
as one or more trees which are evaluated by a suitable interpreter. This is
the original and most widespread type of GP, but there are other types of
GP where programs are represented in different ways. This chapter will look
at linear programs and graph-like (parallel) programs.
7.1 Linear Genetic Programming

7.1.1 Motivations
There are two different reasons for trying linear GP. Firstly, almost all
computer architectures represent computer programs in a linear fashion, as
sequences of instructions stored one after the other in memory.
(Figure: a typical linear GP instruction, made up of an opcode (+, -, *, /), an output register (R0..R7), a first argument register (R0..R7) and a second argument that is either a register (R0..R7) or a constant in the range 0...127.)
Tree nodes are visited in depth-first order. Primitives are only executed, however, when their
arguments have been evaluated. So, the root node is the first node visited but the last executed.
It is common to protect the input registers to prevent the inputs from being overwritten.
Otherwise evolved programs (especially in the early generations) are prone to writing over
their inputs before they've had a chance to use them in any constructive way.
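A linear GP program is then just an array of such instructions operating on a small bank of registers, and interpreting it is a simple loop. The encoding below is a hypothetical Java sketch matching the format suggested by the figure (eight registers, four arithmetic opcodes, constants in 0...127); it is not the representation used by any specific published linear GP system.

    class LinearGP {
        static final int ADD = 0, SUB = 1, MUL = 2, DIV = 3;

        // One register-machine instruction: destination register, opcode, first
        // argument register, and a second argument that is a register or a constant.
        static class Instruction {
            int out;                 // destination register index (R0..R7)
            int op;                  // ADD, SUB, MUL or DIV
            int in1;                 // first argument register index
            boolean useConst;        // if true, in2 is a constant in 0..127
            int in2;                 // register index or constant value

            Instruction(int out, int op, int in1, boolean useConst, int in2) {
                this.out = out; this.op = op; this.in1 = in1;
                this.useConst = useConst; this.in2 = in2;
            }
        }

        // Execute the program on eight registers; the inputs are preloaded into
        // the first registers and, by the convention adopted here, R0 holds the output.
        static double run(Instruction[] program, double[] inputs) {
            double[] r = new double[8];
            System.arraycopy(inputs, 0, r, 0, inputs.length);
            for (Instruction ins : program) {
                double a = r[ins.in1];
                double b = ins.useConst ? ins.in2 : r[ins.in2];
                switch (ins.op) {
                    case ADD: r[ins.out] = a + b; break;
                    case SUB: r[ins.out] = a - b; break;
                    case MUL: r[ins.out] = a * b; break;
                    default:  r[ins.out] = (b == 0.0) ? 1.0 : a / b;   // protected division
                }
            }
            return r[0];
        }
    }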
(Figures: crossover in linear GP, in which offspring are created by exchanging sequences of instructions between two parents.)
Figure 7.5: A sample tree where the same subtree is used twice (a) and
the corresponding graph-based representation of the same program (b). The
graph representation may be more efficient since it makes it possible to avoid
the repeated evaluation of the same subtree.
with nodes representing functions and terminals. Edges represent both con-
trol flow and data flow. The possible efficiency gains obtained by a graph
representation are illustrated in Figure 7.5.
In the simplest form of PDGP edges are directed and unlabelled, in
which case PDGP is a generalisation of standard GP. However, more com-
plex representations can be used, which allow the evolution of programs
(including standard tree-like programs), logic networks, neural networks,
recurrent transition networks and finite state automata. This can be achieved
by extending the representation by associating labels with the edges of the
program graph. In addition to the function and terminal sets, this form of
PDGP requires the definition of a link set. The labels on the links depend
on what is to be evolved. For example, in neural networks, the link labels
are numerical constants for the neural network weights. In a finite state au-
tomaton, the edges are labelled with the input symbols that determine the
FSA’s state transitions. It is even possible for the labels to be automatically
defined edges, which play a role similar to ADFs (Section 6.1.1) by invoking
other PDGP graphs.
In PDGP, programs are manipulated by special crossover and mutation
operators which guarantee the syntactic correctness of the offspring. Each
node occupies a position in a regular grid. The genetic operators act by
moving, copying or randomly generating sub-regions of the grid. For this
reason PDGP search operators are very efficient.
PDGP programs can be executed according to different policies depend-
ing on whether instructions with side effects are used or not. If there are
no side effects, running a PDGP program can be seen as a propagation of
the input values from the bottom to the top of the program’s graph (as in
a feed-forward artificial neural network or data flow parallel computer).
7.2.3 Cartesian GP
In Miller’s Cartesian GP (Miller, 1999; Miller and Smith, 2006), programs
are represented by linear chromosomes containing integers. These are di-
vided into groups of three or four. Each group corresponds to a position in
a 2-D array. One integer in each group defines the primitive (e.g., an AND
gate) at that location in the array. Other integers in the group define the
locations (coordinates) in the genome from which the inputs for that primi-
tive should be drawn. Each primitive does not itself define where its output
is used; this is done by later primitives. A primitive’s output may be used
more than once, or indeed not used at all, depending on the way in which
the other functions’ inputs are specified. Thus, Cartesian GP’s chromo-
somes represent graph-like programs, which is very similar to PDGP. The
main difference between the two systems is that Cartesian GP operators act
at the level of the linear chromosome, while in PDGP they act directly on
the program graph.
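As an illustration of how such an integer chromosome can be decoded, the hypothetical Java sketch below evaluates a Cartesian GP-style genome in which each node is described by a group of three integers (function, first input, second input) and input indices refer either to the program inputs or to the outputs of earlier nodes. Details such as levels-back restrictions and multiple output genes are omitted; nodes whose output is never referenced are simply computed and ignored, which mirrors the unused, neutral nodes of Cartesian GP.

    class CartesianGP {
        static final int ADD = 0, SUB = 1, MUL = 2, DIV = 3;

        // genome: one group of three integers per node (function, first input,
        // second input). outputIndex selects which computed value is returned.
        static double evaluate(int[] genome, double[] inputs, int outputIndex) {
            int numNodes = genome.length / 3;
            double[] values = new double[inputs.length + numNodes];
            System.arraycopy(inputs, 0, values, 0, inputs.length);
            for (int i = 0; i < numNodes; i++) {
                double a = values[genome[3 * i + 1]];
                double b = values[genome[3 * i + 2]];
                double out;
                switch (genome[3 * i]) {
                    case ADD: out = a + b; break;
                    case SUB: out = a - b; break;
                    case MUL: out = a * b; break;
                    default:  out = (b == 0.0) ? 1.0 : a / b;    // protected division
                }
                values[inputs.length + i] = out;                 // available to later nodes
            }
            return values[outputIndex];
        }
    }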
Chapter 8
Probabilistic Genetic Programming
Different EDAs use different models for the probability distribution that
controls the sampling (see (Larrañaga, 2002; Larrañaga and Lozano, 2002)
for more information). For example, population-based incremental learning
(PBIL) (Baluja and Caruana, 1995) and the univariate marginal distribu-
tion algorithm (UMDA) (Mühlenbein and Mahnig, 1999a,b) assume that
each variable is independent of the other variables. Consequently, these al-
gorithms need to store and adjust only a linear array of probabilities, one for
each variable. This works well for problems with weak interactions between
variables. Since no relationship between the variables is stored or learned,
however, PBIL and UMDA may have difficulties solving problems where the
interactions between variables are significant.
Naturally, higher order models are possible. For example, the MIMIC
algorithm of de Bonet, Isbell, and Viola (1997) uses second-order statis-
tics. It is also possible to use flexible models where interactions of dif-
ferent orders are captured. The Bayesian optimisation algorithm (BOA)
(Pelikan, Goldberg, and Cantú-Paz, 1999) uses Bayesian networks to rep-
resent generic sampling distributions, while the extended compact genetic
algorithm (eCGA) (Harik, 1999) clusters genes into groups where the genes
in each group are assumed to be linked but groups are considered inde-
pendent. The sampling distribution is then taken to be the product of the
distributions modelling the groups.
EDAs have been very successful. However, they are often unable to rep-
resent both the overall structure of the distribution and its local details,
typically being more successful at the former. This is because EDAs rep-
resent the sampling distribution using models with an, inevitably, limited
number of degrees of freedom. For example, suppose the optimal sampling
distribution has multiple peaks, corresponding to different local optima, sep-
arated by large unfit areas. Then, an EDA can either decide to represent
only one peak, or to represent all of them together with the unfit areas. If
the EDA chooses the wrong local peak this may lead to it getting stuck and
not finding the global optimum. Conversely, if it takes a wider view, this
leads to wasting many trials sampling irrelevant poor solutions.
Consider, for example, a scenario where there are five binary variables,
x1 , x2 , x3 , x4 and x5 , and two promising regions: one near the string of all
zeros, i.e., (x1 , x2 , x3 , x4 , x5 ) = (0, 0, 0, 0, 0), and the other near the string
of all ones, i.e., (x1 , x2 , x3 , x4 , x5 ) = (1, 1, 1, 1, 1). One option for a (simple)
EDA is to focus on one of the two regions, e.g., setting the variables xi
to 0 with high probability (say, 90%). This, however, fails to explore the
other region, and risks missing the global optimum. The other option is to
maintain both regions as possibilities by setting all the probabilities to 50%,
i.e., each of the variables xi is as likely to be 0 as 1. These probabilities will
generate samples in both of the promising regions. For example, the strings
(0, 0, 0, 0, 0) and (1, 1, 1, 1, 1) will each be generated with a 3.125% probability.
1 The Hamming distance between two strings (whether binary or not) is the number
of positions where the two strings differ.
2 There is a weak form of dependency, in that there can be a primitive in a particular
position only if the primitive just above it is a function. The choice of this parent primitive
does not, however, influence the choice of the child primitive.
local maxima, and a steep valley between those local maxima and the global optimum. They
therefore tend to “trap” populations on the local maxima.
5 In the general case a node can depend on the choices of any of the nodes that have
already been chosen. Since the tree is constructed in a depth-first, left-to-right fashion,
it can depend on any nodes that are its direct ancestors, or any nodes that are to its left
in the tree. In practice, however, EDP only tracked the conditional probability of a node
on its parent.
Figure 8.1: Example of probability tree used for the generation of programs
in PIPE. New program trees are created starting from the root node at the
top and moving through the hierarchy. Each node in an offspring tree is
selected from the left hand side of the corresponding table with probability
given by the right hand side. Each branch of the tree continues to expand
until either the tree of probability tables is exhausted or a leaf (e.g., R) is
selected.
Chapter 9
Multi-objective Genetic Programming
The area of multi-objective GP (MO GP) has been very active in the last
decade. In a multi-objective optimisation (MOO) problem, one optimises
with respect to multiple goals or fitness functions f1 , f2 , .... The task of a
MOO algorithm is to find solutions that are optimal, or at least acceptable,
according to all the criteria simultaneously.
In most cases changing an algorithm from single-objective to multi-
objective requires some alteration in the way selection is performed. This is
how many MO GP systems deal with multiple objectives. However, there
are other options. We review the main techniques in the following sections.
The complexity of evolved solutions is one of the most difficult things
to control in evolutionary systems such as GP, where the size and shape of
the evolved solutions is under the control of evolution. In some cases, for
example, the size of the evolved solutions may grow rapidly, as if evolution
was actively promoting it, without any clear benefit in terms of fitness. We
will provide a detailed discussion of this phenomenon, which is known as bloat,
and a variety of countermeasures for it in Section 11.3. However, in this
chapter we will review work where the size of evolved solutions has been
used as an additional objective in multi-objective GP systems. Of course,
we will also describe work where other objectives were used.
For example, one could use a linear combination of the form f = Σi wi fi , where
the parameters w1 , w2 , . . . are suitable constants. A MOO problem can then
be solved by using any single-objective optimisation technique with f as a
fitness function. This method has been used frequently in GP to control
bloat. By combining program fitness and program size to form a parsimo-
nious fitness function one can evolve solutions that satisfy both objectives
(see Koza (1992); Zhang and Mühlenbein (1993, 1995); Zhang, Ohm, and
Mühlenbein (1997) and Section 11.3.2).
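A minimal sketch of this kind of scalarisation is shown below; the weights,
and the idea of penalising size with a small coefficient, are illustrative
rather than recommended values.

    # Sketch: scalarising multiple objectives with fixed weights, f = sum_i w_i f_i.
    # The weights and the size penalty are illustrative choices.

    def scalarise(objectives, weights):
        return sum(w * f for w, f in zip(weights, objectives))

    # e.g. a parsimonious fitness combining error (to be minimised) and program size
    error, size = 3.2, 47
    fitness = scalarise([error, size], [1.0, 0.01])   # lower is better here
    print(fitness)                                    # -> 3.67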
A semi-linear aggregation of fitness and speed was used in (Langdon
and Poli, 1998b) to improve the performance of GP on the Santa Fe Trail
Ant problem. There, a threshold was used to limit the impact of speed to
avoid providing an excessive bias towards ants that were fast but could not
complete the trail.
A fitness measure which linearly combines two related objectives, the
sum of squared errors and the number of hits (a hit is a fitness case in which
the error falls below a pre-defined threshold), was used in (Langdon, Barrett,
and Buxton, 2003) to predict biochemical interactions in drug discovery.
Zhang and Bhowan (2004) used a MO GP approach for object detection.
Their fitness function was a linear combination of the detection rate (the
percentage of small objects correctly reported), the false alarm rate (the
percentage of non-objects incorrectly reported as objects), and the false
alarm area (the number of false alarm pixels which were not object centres
but were incorrectly reported as object centres).
O’Reilly and Hemberg (2007) used six objectives for the evolution of
L-systems which developed into 3-D surfaces in response to a simulated
environment. The objectives included the size of the surface, its smoothness,
its symmetry, its undulation, the degree of subdivision of the surface, and
the softness of its boundaries.
Koza, Jones, Keane, and Streeter (2004) used 16 different objectives
in the process of designing analogue electrical circuits. In the case of an
amplifier circuit these included: the 10dB initial gain, the supply current, the
offset voltage, the gain ratio, the output swing, the variable load resistance
signal output, etc. These objectives were combined in a complex heuristic
way into a scalar fitness measure. In particular, objectives were divided
into groups and many objectives were treated as penalties that were applied
to the main fitness components only if they were outside certain acceptable
tolerances.
[Figure 9.1: a Pareto front for two objectives, x and y; points 1, 2 and 3 are
labelled as non-dominated, with two further points A and B shown for comparison.]
The main idea in MOO is the notion of Pareto dominance. Given a set
of objectives, a solution is said to Pareto dominate another if the first is not
inferior to the second in all objectives, and, additionally, there is at least one
objective where it is better. This notion can lead to a partial order, where
there is no longer a strict linear ordering of solutions. In Figure 9.1, for
example, individual A dominates (is better than) individual B along the y
axis, but B dominates A along the x axis. Thus there is no simple ordering
between them. The individual marked ‘2’, however, dominates B on both
axes and would thus be considered strictly better than B.
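The dominance test itself is simple to state in code. The sketch below assumes
all objectives are to be maximised, and the point coordinates are invented
purely to match the A, B and ‘2’ relationships just described.

    def dominates(a, b):
        """True if a is no worse than b in every objective and strictly better in one."""
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    # Illustrative coordinates: A beats B on y, B beats A on x,
    # and point 2 is at least as good as B on both axes.
    A, B, point2 = (1.0, 5.0), (4.0, 2.0), (5.0, 3.0)
    print(dominates(A, B))       # False: neither dominates the other
    print(dominates(point2, B))  # True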
In this case the goal of the search algorithm becomes the identification of
a set of solutions which are non-dominated by any others. Ideally, one would
want to find the Pareto front, i.e., the set of all non-dominated solutions in
the search space. However, this is often unrealistic, as the size of the Pareto
front is often limited only by the precision of the problem representation. If
x and y in Figure 9.1 are real-valued, for example, and the Pareto front is
a continuous curve, then it contains an infinite number of points, making a
complete enumeration impossible.
to focus the selection procedure towards specific regions of the Pareto front.
Hinchliffe, Willis, and Tham (1998) applied similar ideas to evolve parsimo-
nious and accurate models of chemical processes using MO GP. Langdon
and Nordin (2000) applied Pareto tournaments to obtain compact solutions
in programmatic image compression, two machine learning benchmark prob-
lems and a consumer profiling task. Nicolotti, Gillet, Fleming, and Green
(2002) used multi-objective GP to evolve quantitative structure–activity re-
lationship models in chemistry; objectives included model fitting, the total
number of terms and the occurrence of non-linear terms.
Ekart and Nemeth (2001) tried to control bloat using a variant of Pareto
tournament selection where an individual is selected if it is not dominated
by a set of randomly chosen individuals. If the test fails, another individual
is picked from the population, until one that is non-dominated is found.
In order to prevent very small individuals from taking over the population
in the early generations of runs, the Pareto criterion was modified so that
solutions that were only slightly bigger were also treated as non-dominated,
provided their fitness was no worse.
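A sketch of this style of tournament is given below (without the size-tolerance
modification); the comparison-set size and the encoding of individuals as
objective tuples are illustrative assumptions.

    import random

    def dominates(a, b):
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    def pareto_tournament(population, objectives, comparison_size=4):
        """Draw candidates until one is not dominated by a small random comparison set."""
        while True:
            candidate = random.choice(population)
            rivals = random.sample(population, min(comparison_size, len(population)))
            if not any(dominates(objectives(r), objectives(candidate)) for r in rivals):
                return candidate

    # Usage: objectives maps an individual to a tuple to be maximised, e.g. (fitness, -size).
    population = [(0.9, -30), (0.8, -10), (0.5, -5), (0.4, -3)]
    print(pareto_tournament(population, objectives=lambda ind: ind))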
Bleuler, Brack, Thiele, and Zitzler (2001) suggested using the well-known
multi-objective optimiser SPEA2 (Zitzler, Laumanns, and Thiele, 2001) to
reduce bloat. de Jong, Watson, and Pollack (2001) and de Jong and Pollack
(2003) proposed using a multi-objective approach to promote diversity and
reduce bloat, stressing that without diversity enhancement (given by modern
MOO methods) searches can easily converge to solutions that are too small
to solve a problem. Tests with even parity and other problems were very
encouraging. Badran and Rockett (2007) argued in favour of using mutation
to prevent the population from collapsing onto single-node individuals when
using a multi-objective GP.
As well as directly fighting bloat, MO GP can also be used to simplify
solution trees. After GP has found a suitable (but large) model, for example,
one can continue the evolutionary process, changing the fitness function to
include a second objective that the model be as small as possible (Langdon,
1998). GP can then trim the trees while ensuring that the simplified program
still fits the training data.
round of comparisons with the rest of the population was used as a tie
breaker. The method successfully evolved queues, lists, and circular lists.
Langdon and Poli (1998b) used Pareto selection with two objectives,
fitness and speed, to improve the performance of GP on the Santa Fe Trail
Ant problem. Ross and Zhu (2004) used MO GP with different variants of
Pareto selection to evolve 2-D textures. The objectives were feature tests
that were used during fitness evaluation to rate how closely a candidate
texture matched visual characteristics of a target texture image. Dimopoulos
(2005) used MO GP to identify the Pareto set for a cell-formation problem
related to the design of a cellular manufacturing production system. The
objectives included the minimisation of total intercell part movement, and
the minimisation of within-cell load variation.
Rossi, Liberali, and Tettamanzi (2001) used MO GP in electronic design
automation to evolve VHDL code. The objectives used were the suitability
of the filter transfer function and the transition activity of digital blocks.
Cordon, Herrera-Viedma, and Luque (2002) used Pareto-dominance-based
GP to learn Boolean queries in information retrieval systems. They used two
objectives: precision (the ratio between the relevant documents retrieved in
response to a query and the total number of documents retrieved) and recall
(the ratio between the relevant documents retrieved and the total number
of documents relevant to the query in the database).
Barlow (2004) used a GP extension of the well-known NSGA-II MOO
algorithm (Deb, Agrawal, Pratap, and Meyarivan, 2000) for the evolution of
autonomous navigation controllers for unmanned aerial vehicles. The con-
trollers’ task was to locate radar stations, and all work was done using simulators. Four
objectives were used: the normalised distance from the emitter, the circling
distance from the emitter, the stability of the flight, and the efficiency of
the flight.
Araujo (2006) used MO GP for the joint solution of the tasks of statistical
parsing and tagging of natural language. Their results suggest that solving
these tasks jointly led to better results than approaching them individually.
Han, Zhou, and Wang (2006) used a MO GP approach for the identi-
fication of chaotic systems where the objectives included chaotic invariants
obtained by chaotic time series analysis, as well as the complexity and per-
formance of the models.
Khan (2006) used MO GP to evolve digital watermarking programs. The
objectives were robustness in the decoding stage, and imperceptibility by the
human visual system. Khan and Mirza (2007) added a third objective aimed
at increasing the strength of the watermark in relation to attacks.
Kotanchek, Smits, and Vladislavleva (2006) compared different flavours
of Pareto-based GP systems in the symbolic regression of industrial data.
Weise and Geihs (2006) used MO GP to evolve protocols in sensor networks.
The goal was to identify one node on a network to act as a communication
relay. The following objectives were used: the number of nodes that know
the designated node after a given amount of time, the size of the protocol
code, its memory requirements, and a transmission count.
Agapitos, Togelius, and Lucas (2007) used MO GP to encourage the
effective use of state variables in the evolution of controllers for toy car
racing. Three different objectives were used: the ratio of the number of
variables used within a program to the number of variables offered for use by
the primitive language, the ratio of the number of variables being set within
the program to the number of variables being accessed, and the average
positional distance between memory setting instructions and corresponding
memory reading instructions.
When two or three objectives need to be simultaneously optimised, the
Pareto front produced by an algorithm is often easy to visualise. When
more than three objectives are optimised, however, it becomes difficult to
directly visualise the set of non-dominated solutions. Valdes and Barton
(2006) proposed using GP to identify similarity mappings between high-
dimensional Pareto fronts and 3-D space, and then use virtual reality to
visualise the result.
which initially guide GP towards solutions that maximise the main objec-
tive. When enough of the population has reached reasonable levels in that
objective, the fitness function is modified so as to guide the population to-
wards the optimisation of a second objective. In principle this process can
be iterated for multiple objectives. Of course, care needs to be taken to
ensure that the functionality reached with a set of previous fitness measures
is not wiped out by the search for the optima of a later fitness function. This
can be avoided by making sure each new fitness function somehow includes
all the previous ones. For example, the fitness based on the new objectives
can be added to the pre-existing objectives with some appropriate scaling
factors.
A similar effect can be achieved via static, but staged, fitness functions.
These are staged in the sense that certain levels of fitness are only made
available to an individual once it has reached a minimum acceptable perfor-
mance on all objectives at the previous level. If each level represents one of
the objectives, individuals are then encouraged to evolve in directions that
ensure that good performance is achieved and retained on all objectives.
Koza et al. (1999) used this strategy when using GP for the evolution of
electronic circuits where many criteria, such as input-output performance,
power consumption, size, etc., must all be taken into account to produce
good circuits. Kalganova and Miller (1999) used Cartesian GP (see Sec-
tion 7.2.3) to design combinational logic circuits. A circuit’s fitness was
given by a value between 0 and 100 representing the percentage of output
bits that were correct. If the circuit was 100% functional, then a further
component was added which represented the number of gates in the graph
that were not involved in the circuit. Since all individuals had the same
number of gates available in the Cartesian GP grid, this could be used to
minimise the number of gates actually used to solve the problem at hand.
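A minimal sketch of this kind of staged fitness is shown below; the circuit
interface (evaluate, gates_used, total_gates) is a hypothetical placeholder
rather than Kalganova and Miller's actual representation.

    # Sketch of the staged circuit fitness described above: the base score is the
    # percentage of correct output bits; only a 100% functional circuit earns a
    # further component counting gates that are left unused.  The circuit
    # interface is a hypothetical placeholder.

    def staged_fitness(circuit, test_cases):
        correct = sum(circuit.evaluate(inputs) == expected
                      for inputs, expected in test_cases)
        functionality = 100.0 * correct / len(test_cases)
        if functionality < 100.0:
            return functionality
        return functionality + (circuit.total_gates - circuit.gates_used)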
modify them in such a way as to ensure a problem’s hard constraints are respected.
The pygmies and civil servants approach proposed in (Ryan, 1994, 1996)
combines the separation typical of Pareto-based approaches with biased
search operators. In this system two lists are built, one where individu-
als are ranked based on fitness and the other where individuals are ranked
based on a linear combination of fitness and size (i.e., a parsimonious fit-
ness function). During crossover, the algorithm draws one parent from the
first list and the other from the second list. This can be seen as a form of
disassortative mating aimed at maintaining diversity in the population. An-
other example of this kind is (Zhang and Rockett, 2005) where crossover
was modified so that an offspring is retained only if it dominates either of
its parents.
Furthermore, as discussed in Sections 5.2 and 11.3.2, there are several
mutation operators with a direct or indirect bias towards smaller programs.
This provides a pressure towards the evolution of more parsimonious solu-
tions throughout a run.
As with the staged fitness functions discussed in the previous section,
it is also possible to activate operators with a known bias towards smaller
programs only when the main objective — say a 100% correct solution — has
been achieved. This was tested in (Pujol, 1999; Pujol and Poli, 1997), where
GP was used to evolve neural networks. After a 100% correct solution was
found, one hidden node of each network in the population was replaced by
a terminal, and the evolution process was resumed. This pruning procedure
was repeated until the specified number of generations had been reached.
Chapter 10
Fast and Distributed Genetic Programming
Users of all artificial intelligence tools are always eager to extend the bound-
aries of their techniques, for example by attacking more and more difficult
problems. In fact, to solve hard problems it may be necessary to push GP
to the limit — populations of millions of programs and/or long runs may be
necessary.
There are a number of techniques to speed up, parallelise and distribute
GP search. We start by looking at ways to reduce the number of fitness
evaluations or increase their effectiveness (Section 10.1) and ways to speed
up their execution (Section 10.2). We then look at the idea of running GP
in parallel (Section 10.3) and point out that faster evaluation is not the
only reason for doing so, as geographic distribution has advantages in its
own right. In Section 10.4 we describe master–slave parallel architectures
(Section 10.4.1), running GP on graphics hardware (Section 10.4.2) and
FPGAs (Section 10.4.3). Section 10.4.4 describes a fast method to exploit
the parallelism available on every computer. Finally, Section 10.5 concludes
this chapter with a brief discussion of distributed, even global, evolution of
programs.
1 Common selection algorithms include roulette wheel selection (Goldberg, 1989) and SUS
(Blickle, 1996; Droste, Jansen, Rudolph, Schwefel, Tinnefeld, and Wegener, 2003), but
for tournament selection a simple rule of thumb is often sufficient. If T is the tour-
nament size, roughly log_T (Pop size) generations are needed for the whole population to
become descendants of a single individual. If, for example, we use binary tournaments
(T = 2), then “take over” will require about ten generations for a population of 1,024.
Alternatively, if we have a population of one million (10^6) and use ten individuals in each
tournament (T = 10), then after about six generations more or less everyone will have
the same great-great-great-great-grandmother.
3 Siegel (1994) proposed a rather different implementation.
used over time, the evolving population saw more of the training data and so
was less liable to overfit a fraction of them. Thirdly, by randomly changing
the fitness function, it became more difficult for evolution to produce an
overspecialised individual which took over the population at the expense of
solutions which were viable on other parts of the training data. Dynamic
subset selection (DSS) appears to have been the most successful of Gather-
cole’s suggested algorithms. It has been incorporated into Discipulus (see
page 63), and was recently used in a large data mining application (Curry,
Lichodzijewski, and Heywood, 2007).
Where each fitness evaluation may take a long time, it may be attrac-
tive to interrupt a long-running program in order to let others run. In GP
systems which allow recursion or contain iterative elements (Brave, 1996;
Langdon, 1998; Wilson and Heywood, 2007; Wong and Leung, 1996) it is
common to enforce a time limit, a limit on the number of instructions ex-
ecuted, or a bound on the number of times a loop is executed. Maxwell
(1994) proposed a solution to the question of what fitness to give to a pro-
gram that has been interrupted. He allowed each program in the population
a quantum of CPU time. When the program used up its quantum it was
check-pointed.4 In Maxwell’s system, programs gained fitness as they ran,
i.e., each time a program correctly processed a fitness case, its fitness was
incremented. Tournament selection was then performed. If all members of
the tournament had used the same number of CPU quanta, then the fitter
program was the winner. If, however, one program had used less CPU than
the others (and had a lower fitness) then it was restarted and run until it
had used as much CPU as the others. Then fitnesses were compared in the
normal way.
Teller (1994) had a similar but slightly simpler approach: every indi-
vidual in the population was run for the same amount of time. When the
allotted time elapsed a program was aborted and an answer extracted from
it, regardless of whether it had terminated or not. Teller called this an “any
time” approach. This suits graph systems like Teller’s PADO (Section 7.2.2)
or linear GP (Section 7.1) where it is easy to designate a register as the
output register. The answer can then be extracted from this register or from
an indexed memory cell at any point (including whilst the program is
running). Other any time approaches include (Spector and Alpern, 1995)
and (Langdon and Poli, 2008).
A simple technique to speed up the evaluation of complex fitness func-
tions is to organise the fitness function into stages of progressively increasing
computational cost. Individuals are evaluated stage by stage. Each stage
contributes to the overall fitness of a program. However, individuals need
4 That is, the program’s state (program counter and stack) is saved so that it can later be restarted from where it was stopped.
Many multi-tasking operating systems do something similar.
only must the answer be stored, but the interpreter needs to know that the
subtree’s inputs are the same too. The common practices of GP come to our
aid here. Usually every tree in the population is run on exactly the same
inputs for each of the fitness cases. Thus, for a cache to work, the interpreter
does not need to know a tree’s inputs in detail, it need only know which of
the fixed set of test cases was used.
A simple means of implementing this type of cache is to store a vector of
values returned by each subtree for each of the test cases. Whenever a sub-
tree is created (i.e., in the initial generation, by crossover or by mutations)
the interpreter is run and the cache of values for its root node is set. Note
this is recursive, so caches can also be calculated for subtrees within it at
the same time. Now, when the interpreter is run and comes to a subtree’s
root node, it will simply retrieve the value it calculated earlier, using the
test case’s number as an index into the cache vector.
If a subtree is created by mutation, then its cache of values will be
initially empty and will have to be calculated. However, this costs no more
than it would without caches.
When code is inserted into an existing tree, be it by mutation or
crossover, the chance that the new code behaves identically to the old code
is normally very small. This means that the caches of every node between
the new code and the root node may be invalid. The simplest solution is
to re-evaluate them all. This may sound expensive, but the caches in all
the other parts of the individual remain valid and can be used when the
cache above them is re-evaluated. Thus, in effect, if the crossed over code is
inserted at depth d, only d nodes need to be evaluated.
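The following is a minimal sketch of this caching scheme; the nested node
representation, the tiny function set and the lazy filling of the cache are
illustrative choices rather than a description of any particular GP system.

    import operator

    # Sketch of per-subtree fitness caching: each node stores a vector of the
    # values it returned on the fixed set of test cases, indexed by test-case
    # number.  Representation and function set are illustrative.

    OPS = {'+': operator.add, '*': operator.mul}

    class Node:
        def __init__(self, op=None, children=(), terminal=None):
            self.op, self.children, self.terminal = op, children, terminal
            self.cache = None            # one value per test case, filled lazily

        def evaluate(self, case_index, case_inputs):
            if self.cache is None:
                self.cache = [None] * len(case_inputs)
            if self.cache[case_index] is None:        # cache miss: compute and store
                if self.terminal is not None:
                    value = case_inputs[case_index][self.terminal]
                else:
                    value = OPS[self.op](*(c.evaluate(case_index, case_inputs)
                                           for c in self.children))
                self.cache[case_index] = value
            return self.cache[case_index]

    # Usage: cases are dictionaries of terminal values, e.g. {'x': 2.0}.
    cases = [{'x': 1.0}, {'x': 2.0}]
    tree = Node('+', (Node(terminal='x'),
                      Node('*', (Node(terminal='x'), Node(terminal='x')))))
    print([tree.evaluate(i, cases) for i in range(len(cases))])   # -> [2.0, 6.0]

In a full system, inserting new code would invalidate only the caches on the
path from the insertion point to the root, which is what keeps the scheme cheap.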
The whole question of monitoring how effective individual caches are,
what their hit-rates are, etc. has been little explored. In practice, impressive
savings have been achieved by simple implementations, with little monitor-
ing and rudimentary garbage collection. Recent analysis (Ciesielski and Li,
2004; Dignum and Poli, 2007; Langdon and Poli, 2002; Poli et al., 2007)
has shown that GP trees tend not to have symmetric shapes, and many
leaves are very close to the root. This provides a theoretical explanation for
why considerable computational saving can be made by using fitness caches.
While it is possible to use hashing schemes to efficiently find common code,
in practice assuming that common code only arises because it was inherited
from the same location (e.g., by crossing over) is sufficient.
As well as the original directed acyclic graph (DAG) implementation
(Handley, 1994), other work includes (Ciesielski and Li, 2004; Keijzer, 1996;
McPhee, Hopper, and Reierson, 1998; Yangiya, 1995). While so far we have
only considered programs where no side effects take place, there are cases
where caching can be extended outside this domain. For example, Langdon
(1998) used fitness caches in evolved trees with side effects by exploiting
syntax rules about where in the code the side-effects could lie.
10.4.1 Master–slave GP
If the objective is purely to speed up runs, we may want our parallel GP to
work exactly the same as it did on a single computer. This is possible, but
to achieve it we have to be very careful to ensure that, even if some parts of
the population are evaluated more quickly, parallelisation does not change
how we apply selection and which GP individual crosses over with which.
Probably the easiest way to implement this is the master–slave model.
In the master–slave model (Oussaidène, Chopard, Pictet, and Tomassini,
1997) breeding, selection, crossover, mutation, etc. occur just as they would
on a single computer and only fitness evaluation is spread across a network
of computers. Each GP individual and its fitness cases are sent across the
network to a different compute node. The central node waits for the compute
nodes to return their individuals’ fitnesses. Since individuals and fitness
values are typically stored in small data structures, this can be quite efficient
since transmission overheads are limited.
The central node is an obvious bottleneck. Also, a slow compute node
or a lengthy fitness case will slow down the whole GP population, since
eventually its result will be needed before moving onto the next generation.
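A minimal sketch of the master–slave division of labour is given below, using
Python's multiprocessing pool as a stand-in for a network of compute nodes; the
fitness function and breeding step are placeholders.

    # Sketch of master-slave fitness evaluation: breeding and selection stay on
    # the master, only fitness evaluation is farmed out to worker processes.
    # multiprocessing is used here as a generic stand-in for networked nodes.

    from multiprocessing import Pool

    def fitness(individual):
        # placeholder: a real system would run the program on its fitness cases
        return sum(individual)

    def evolve_one_generation(population, breed):
        with Pool() as workers:
            scores = workers.map(fitness, population)   # slaves evaluate in parallel
        return breed(population, scores)                # master does selection/variation

    if __name__ == '__main__':
        pop = [[1, 2, 3], [4, 5, 6]]
        print(evolve_one_generation(pop, breed=lambda p, s: p))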
Figure 10.1: nVidia 8800 block diagram. The 128 1360 MHz Stream Processors are arranged in 16 blocks of 8. Blocks
share 16 KB memory (not shown), an 8/1 KB L1 cache, 4 Texture Address units and 8 Texture Filters. The 6×64 bit
bus (dashed) links off-chip RAM at 900 MHz. (Since there are two chips for each of the six off-chip memory banks, the
bus is effectively running at up to 1800 MHz per bank.) There are 6 Raster Operation Partitions. (nVidia, 2007).
that could be compiled for the GPU on the host PC. The compiled pro-
grams were transferred one at a time to a GPU for fitness evaluation. Both
groups obtained impressive speedups by running many test cases in parallel.
Langdon and Banzhaf (2008) and Langdon and Harrison (2008) created
a SIMD interpreter (Juille and Pollack, 1996) using RapidMind’s GNU C++
OpenGL framework to simultaneously run up to a quarter of a million GP
trees on an NVIDIA GPU (see Figure 10.1).5 As discussed in Section 7.1.2,
GP trees can be linearised. This avoids pointers and yields a very compact
data structure; reducing the amount of memory needed in turn facilitates
the use of large populations. To avoid recursive calls in the interpreter,
Langdon used reverse polish notation (RPN), i.e., a post-fix rather than
a pre-fix notation. Only small modifications are needed to crossover and
mutation so that they act directly on the RPN expressions. This means the
same representation is used on both the host and the GPU. Almost a billion
GP primitives can be interpreted by a single graphics card per second. In
both Cartesian and tree-based GP the genetic operations are done by the
host CPU. Wong, Wong, and Fok (2005) showed, for a genetic algorithm,
these too can be done by the GPU.
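A minimal sketch of such a post-fix interpreter is shown below; it runs on the
CPU in plain Python and uses an illustrative primitive set with protected
division, so it only conveys the flavour of the linearised representation, not
the GPU implementation itself.

    # Sketch of a reverse-polish (post-fix) interpreter for a linearised GP tree.
    # On a GPU the same loop would run in lock-step over many trees/test cases.

    def eval_rpn(program, variables):
        stack = []
        for token in program:
            if token in ('+', '-', '*', '/'):
                b, a = stack.pop(), stack.pop()
                if token == '+':   stack.append(a + b)
                elif token == '-': stack.append(a - b)
                elif token == '*': stack.append(a * b)
                else:              stack.append(a / b if b != 0 else 1.0)  # protected division
            elif token in variables:
                stack.append(variables[token])
            else:
                stack.append(float(token))        # numeric constant
        return stack.pop()

    # (x + 2) * x in post-fix order:
    print(eval_rpn(['x', '2', '+', 'x', '*'], {'x': 3.0}))   # -> 15.0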
Although each of the GPU’s processors may be individually quite fast
and the manufacturers claim huge aggregate FLOPS ratings, the GPUs are
optimised for graphics work. In practice, it is hard to keep all the processors
fully loaded. Nevertheless 30 GFLOPS has been achieved (Langdon and
Harrison, 2008). Given the differences in CPU and GPU architectures and
clock speeds, often the speedup from using a GPU rather than the host
CPU is the most useful statistic. This is obviously determined by many
factors, including the relative importance of amount of computation and
size of data. The measured RPN tree speedups were 7.6-fold (Langdon and
Harrison, 2008) and 12.6-fold (Langdon and Banzhaf, 2008).
10.4.3 GP on FPGAs
Field programmable gate arrays (FPGAs) are chips which contain large ar-
rays of simple logic processing units whose functionality and connectivity
can be changed via software in microseconds by simply writing a configu-
ration into a static memory. Once an FPGA is configured it can update
all of its thousands of logic elements in parallel at the clock speed of the
circuit. Although an FPGA’s clock speed is often an order of magnitude
slower than that of a modern CPU, its massive parallelism makes it a very
powerful computational device. Because of this and of their flexibility there
has been significant interest in using FPGAs in GP.
Work has ranged from the use of FPGAs to speed up fitness evaluation
5 Bigger populations, e.g. five million programs (Langdon and Harrison, 2008), are possible.
(Koza, Bennett, Hutchings, Bade, Keane, and Andre, 1997; Seok, Lee, and
Zhang, 2000) to the definition of specialised operators (Martin and Poli,
2002). It is even possible to implement a complete GP on FPGAs, as sug-
gested in (Heywood and Zincir-Heywood, 2000; Martin, 2001, 2002; Sidhu,
Mei, and Prasanna, 1998). A massively parallel GP implementation has also
been proposed by Eklund (2001, 2004) although to date all tests with that
architecture have only been performed in simulation.
10.4.4 Sub-machine-code GP
We are nowadays so used to writing programs using high level sequential
languages that it is very easy to forget that, underneath, computers have a
high degree of parallelism. Internally, CPUs are made up of bit-slices which
make it possible for the CPU to process all of the bits of the operands of an
instruction in one go, in a single clock tick.
Sub-machine-code GP (SMCGP) (Poli and Langdon, 1999) is a technique
to speed up GP and to extend its scope by exploiting the internal parallelism
of sequential CPUs. In Boolean classification problems, SMCGP allows the
parallel evaluation of 32 or 64 (depending on the CPU’s word size) fitness
cases per program execution, thereby providing a significant speed-up. This
has made it possible to solve parity problems with up to 4 million fitness
cases (Poli and Page, 2000). SMCGP has also been applied with success
in binary image classification problems (Adorni, Cagnoni, and Mordonini,
2002; Quintana, Poli, and Claridge, 2003). The technique has also been
extended to process multiple fitness cases per program execution in continu-
ous symbolic regression problems where inputs and outputs are real-valued
numbers (Poli, 1999b).
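The sketch below illustrates the underlying trick on a deliberately tiny scale:
each bit position of an integer holds one fitness case, so one bitwise
expression evaluates all cases at once. The 2-input XOR problem and 4-bit
packing are illustrative; a 64-bit word would hold 64 cases.

    # Sketch of sub-machine-code style evaluation: bit i of each packed integer
    # holds the value of that variable in fitness case i.

    CASES = 4                      # all input combinations of a 2-input problem
    X0 = 0b0101                    # bit i holds x0 in fitness case i
    X1 = 0b0011                    # bit i holds x1 in fitness case i
    TARGET = X0 ^ X1               # target outputs, here XOR

    def fitness(program):
        """program maps packed inputs to packed outputs using bitwise operators."""
        output = program(X0, X1)
        wrong = (output ^ TARGET) & ((1 << CASES) - 1)
        return CASES - bin(wrong).count('1')      # number of correct fitness cases

    print(fitness(lambda a, b: (a | b) & ~(a & b)))   # XOR built from OR/AND/NOT -> 4
    print(fitness(lambda a, b: a & b))                # plain AND -> 1 (only the (0,0) case)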
While many have looked enviously at Koza’s 1000 node Beowulf cluster
(Sterling, 1998) and other supercomputer realisations of GP (Bennett, Koza,
Shipman, and Stiffelman, 1999; Juille and Pollack, 1996), a supercomputer is
often not necessary. Many businesses and research centres leave computers
permanently switched on. During the night their computational resources
tend to be wasted. This computing power can easily and efficiently be
used to execute distributed GP runs overnight. Typically, GP does not
demand a high performance bus to interconnect the compute nodes, and
so existing office Ethernet networks are often sufficient. While parallel GP
systems can be implemented using MPI (Walker, 2001) or PVM (Fernandez,
Sanchez, Tomassini, and Gomez, 1999), the use of such tools is not necessary:
simple Unix commands and port-to-port HTTP are sufficient (Poli, Page,
and Langdon, 1999). The population can be split and stored on modest
computers. With only infrequent interchange of parts of the population
or fitness values little bandwidth is needed. Indeed a global population
spread via the Internet (Chong and Langdon, 1999; Draves, 2006; Klein and
Spector, 2007; Langdon, 2005a), à la seti@home, is perfectly feasible (see
Figure 10.3).
Other parallel GPs include (Cheang, Leung, and Lee, 2006; Folino, Piz-
zuti, and Spezzano, 2003; Gustafson and Burke, 2006; Klein and Spector,
2007; Tanev, Uozumi, and Akhmetov, 2004).
Chapter 11
GP Theory and its Applications
Most of this book is about the mechanics of GP and its practical use for
solving problems. In fact, as will become clear in Chapter 12, GP has
been remarkably successful as a problem-solving and engineering tool. One
might wonder how this is possible, given that GP is a non-deterministic
algorithm, and as a result its behaviour varies from run to run. It is also a
complex adaptive system which sometimes shows intricate and unexpected
behaviours (such as bloat). Thus it is only natural to be interested in GP
from the scientific point of view. That is, we want to understand why GP
can solve problems, how it does so, what goes wrong when it cannot, what
the reasons for certain undesirable behaviours are, what we can do to get rid
of them without introducing new (and perhaps even less desirable) problems,
and so on.
GP is a search technique that explores the space of computer programs.
The search for solutions to a problem starts from a group of points (random
programs) in this search space. Those points that are above average quality
are then used to generate a new generation of points through crossover,
mutation, reproduction and possibly other genetic operations. This process
is repeated over and over again until a stopping criterion is satisfied. If we
could visualise this search, we would often find that initially the population
looks like a cloud of randomly scattered points, but that, generation after
generation, this cloud changes shape and moves in the search space. Because
GP is a stochastic search technique, in different runs we would observe
different trajectories. If we could see regularities, these might provide us
with a deep understanding of how the algorithm is searching the program
space for the solutions, and perhaps help us see why GP is successful in
finding solutions in certain runs and unsuccessful in others. Unfortunately,
Figure 11.1: Proportion of NAND trees that yield each three-input function,
plotted against tree size and three-input Boolean equivalence class. As circuit
size increases the distribution approaches a limit.
11.3 Bloat
The replication accuracy theory (McPhee and Miller, 1995) states that the
success of a GP individual depends on its ability to have offspring that are
functionally similar to the parent. As a consequence, GP evolves towards
(bloated) representations that increase replication accuracy.
The nodes in a GP tree can often be crudely categorised into two classes:
active code and inactive code. Roughly speaking, inactive code is code
that is not executed, or is executed but its output is then discarded. All
remaining code is active code. The removal bias theory (Soule and Foster,
1998a) observes that inactive code in a GP tree tends to be low in the tree,
residing, therefore, in smaller-than-average-size subtrees. Crossover events
excising inactive subtrees produce offspring with the same fitness as their
parents. On average the inserted subtree is bigger than the excised one, so
such offspring are bigger than average while retaining the fitness of their
parent, leading ultimately to growth in the average program size.
Finally, the nature of program search spaces theory (Langdon and Poli,
1997; Langdon, Soule, Poli, and Foster, 1999) predicts that above a certain
size, the distribution of fitnesses does not vary with size. Since there are more
long programs, the number of long programs of a given fitness is greater than
the number of short programs of the same fitness. Over time GP samples
longer and longer programs simply because there are more of them.
The explanations for bloat provided by these three theories are largely qual-
itative. There have, however, been some efforts to mathematically formalise
and verify these theories. For example, Banzhaf and Langdon (2002) defined
an executable model of bloat where only the fitness, the size of active code and
the size of inactive code were represented (i.e., there was no representation of
program structures). Fitnesses of individuals were drawn from a bell-shaped
distribution, while active and inactive code lengths were modified by a size-
unbiased mutation operator. Various interesting effects were reported which
are very similar to corresponding effects found in GP runs. Rosca (2003) pro-
posed a similar, but slightly more sophisticated model which also included
an analogue of crossover. This provided further interesting evidence.
A strength of these executable models is their simplicity. A weakness is
that they suppress or remove many details of the representation and opera-
tors typically used in GP. This makes it difficult to verify if all the phenom-
ena observed in the model have analogues in GP runs, and if all important
behaviours of GP in relation to bloat are captured by the model.
where µ(t + 1) is the mean size of the programs in the population at gen-
eration t + 1, E is the expectation operator, ℓ is a program size, and p(ℓ, t)
is the probability of selecting programs of size ℓ from the population in
generation t.
This equation can be rewritten in terms of the expected change in average
program size as:

    E[µ(t + 1) − µ(t)] = Σℓ ℓ × (p(ℓ, t) − Φ(ℓ, t)),        (11.2)
parents does not depend on the order in which the parents are drawn from the population.
Rather naturally, the first and simplest method to control code growth is the
use of hard limits on the size or depth of the offspring programs generated
by the genetic operators.
Many implementations of this idea (e.g., (Koza, 1992)) apply a genetic
operator and then check whether the offspring is beyond the size or depth
limit. If it isn’t, the offspring enters the population. If, instead, the off-
spring exceeds the limit, one of the parents is returned. Obviously, this
implementation does not allow programs to grow too large. However, there
is a serious problem with this way of applying size limits, or more generally,
constraints to programs: parent programs that are more likely to violate a
constraint will tend to be copied (unaltered) more often than programs that
don’t. That is, the population will tend to be filled up with programs that
nearly infringe the constraint, which is typically not what is desired.
It is well known, for example, that depth thresholds lead to the popu-
lation filling up with very bushy programs where most branches reach the
depth limit (being effectively full trees). By contrast, size limits produce
populations of stringy programs which tend to all approach the size limit.
See (Crane and McPhee, 2005; McPhee, Jarvis, and Crane, 2004) for more
on the impact of size and depth limits, and the differences between them.
The problem can be fixed by not returning parents if the offspring violates
a constraint. This can be realised with two different strategies. Firstly, one
can just return the oversize offspring, but give it a fitness of 0, so that
selection will get rid of it at the next generation. Secondly, one can simply
declare the genetic operation failed, and try again. This can be done in two
alternative ways: a) the same parent or parents are used again, but new
mutation or crossover points are randomly chosen (which can be done up
to a certain number of times before giving up on those parents), or b) new
parents are selected and the genetic operation is attempted again.
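A sketch of strategy (a) is given below; the operator and size interfaces, and
the number of attempts, are illustrative placeholders.

    # Sketch of strategy (a) above: keep the same parents but retry the genetic
    # operation with fresh random crossover/mutation points, giving up after a
    # fixed number of attempts.  operator() and size() are placeholders.

    def apply_with_size_limit(operator, parents, size, max_size, attempts=25):
        for _ in range(attempts):
            offspring = operator(*parents)      # new random points on each call
            if size(offspring) <= max_size:
                return offspring
        return None                             # caller may then pick new parents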
If a limit is used, programs must not be so tightly constrained that they
cannot express any solution to the problem. As a rule of thumb, one should
try to estimate the size of the minimum possible solution (using the terminals
and functions given to GP) and add some percentage (e.g., 50-200%) as a
safety margin. In general, however, it may be hard to heuristically come up
with good limits, so some trial and error may be required. Alternatively,
one can use one of the many techniques that have been proposed to adjust
size limits during runs. These can be both at the level of individuals and the
population. See for example the work by Silva and Almeida (2003); Silva
and Costa (2004, 2005a,b); Silva, Silva, and Costa (2005).
on average the same size as the code it replaces. In Hoist mutation (Kinnear,
1994a) the new subtree is selected from the subtree being removed from the
parent, guaranteeing that the new program will be smaller than its parent.
Shrink mutation (Angeline, 1996) is a special case of subtree mutation where
the randomly chosen subtree is replaced by a randomly chosen terminal.
McPhee and Poli (2002) provide theoretical analysis and empirical evidence
that combinations of subtree crossover and subtree mutation operators can
control bloat in linear GP systems.
Other methods which control bloat by exploiting the bias of the operators
were discussed in Section 9.4.
Anti-Bloat Selection
As clarified by the size evolution equation discussed in the previous section,
in systems with symmetric operators, bloat can only happen if there are
some longer-than-average programs that are fitter than average or some
shorter-than-average programs that are less fit than average, or both. So,
it stands to reason that in order to control bloat one needs to somehow
modulate the selection probabilities of programs based on their size.
As we have discussed in Section 9.2.1, recent methods also include the
use of multi-objective optimisation to control bloat. This typically involves
the use of a modified selection based on the Pareto criterion.
A recent technique, the Tarpeian method (Poli, 2003), controls bloat
by acting directly on the selection probabilities in Equation (11.2). This is
done by setting the fitness of randomly chosen longer-than-average programs
to 0. This prevents them being parents. By changing how frequently this
is done the anti-bloat intensity of Tarpeian control can be modulated. An
advantage of the method is that the programs whose fitness is zeroed are
never executed, thereby speeding up runs.
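A minimal sketch of Tarpeian control along these lines is shown below; the
rate parameter (the anti-bloat intensity) and the interfaces for size and
evaluation are illustrative.

    import random

    # Sketch of the Tarpeian method: with a given probability, a program longer
    # than the current average size is assigned zero fitness without being run.

    def tarpeian_fitness(population, size, evaluate, rate=0.3):
        avg_size = sum(size(p) for p in population) / len(population)
        fitnesses = []
        for program in population:
            if size(program) > avg_size and random.random() < rate:
                fitnesses.append(0.0)        # never executed, so runs also get faster
            else:
                fitnesses.append(evaluate(program))
        return fitnesses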
The well-known parsimony pressure method (Koza, 1992; Zhang and
Mühlenbein, 1993, 1995; Zhang et al., 1997) changes the selection probabili-
ties by subtracting a value based on the size of each program from its fitness.
Bigger programs have more subtracted and, so, have lower fitness and tend
to have fewer children. That is, the new fitness function is f(x) − c × ℓ(x),
where ℓ(x) is the size of program x, f(x) is its original fitness and c is a con-
stant known as the parsimony coefficient.2 Zhang and Mühlenbein (1995)
showed some benefits of adaptively adjusting the coefficient c at each gen-
eration but most implementations actually keep the parsimony coefficient
constant.
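The sketch below illustrates both the constant-coefficient version and the
covariant coefficient c = Cov(ℓ, f)/Var(ℓ) used in the “Local” runs of
Figure 11.2 below; the example numbers are arbitrary.

    # Sketch of parsimony pressure: adjusted fitness f(x) - c * l(x), with either
    # a fixed illustrative coefficient or the covariant choice Cov(l, f) / Var(l).

    def parsimony_fitness(fitnesses, sizes, c=0.01):
        return [f - c * l for f, l in zip(fitnesses, sizes)]

    def covariant_parsimony_coefficient(fitnesses, sizes):
        n = len(sizes)
        mean_l = sum(sizes) / n
        mean_f = sum(fitnesses) / n
        cov_lf = sum((l - mean_l) * (f - mean_f)
                     for l, f in zip(sizes, fitnesses)) / n
        var_l = sum((l - mean_l) ** 2 for l in sizes) / n
        return cov_lf / var_l if var_l > 0 else 0.0

    sizes, fits = [10, 20, 30], [0.5, 0.8, 0.85]
    c = covariant_parsimony_coefficient(fits, sizes)
    print(parsimony_fitness(fits, sizes, c))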
2 While the new fitness is used to guide evolution, one still needs to use the original
Figure 11.2: Plots of the evolution of average size over 500 generations for
multiple runs of the 6-MUX problem with various forms of covariant parsi-
mony pressure. The “Constant” runs had a constant target size of 150. In
the “Sin” runs the target size was sin((generation + 1)/50.0) × 50.0 + 150.
For the “Linear” runs the target size was 150 + generation. The “Limited”
runs used no size control until the size reached 250, then the target was held
at 250. Finally, the “Local” runs used c = Cov(ℓ, f)/Var(ℓ), which allowed
a certain amount of drift but still avoided runaway bloat (see text).
Part III
Practical Genetic Programming
Chapter 12
Applications
goal is to find a function whose output has some desired property, e.g., the
function matches some target values (as in the example given in Section 4.1).
This is generally known as a symbolic regression problem.
Many people are familiar with the notion of regression. Regression means
finding the coefficients of a predefined function such that the function best
fits some data. A problem with regression analysis is that, if the fit is not
good, the experimenter has to keep trying different functions by hand until
a good model for the data is found. Not only is this laborious, but also
the results of the analysis depend very much on the skills and inventiveness
of the experimenter. Furthermore, even expert users tend to have strong
mental biases when choosing functions to fit. For example, in many applica-
tion areas there is a considerable tradition of using only linear or quadratic
models, even when the data might be better fit by a more complex model.
Symbolic regression attempts to go beyond this. It consists of finding
a function that fits the given data points without making any assumptions
about the structure of that function. Since GP makes no such assumption,
it is well suited to this sort of discovery task. Symbolic regression was one
of the earliest applications of GP (Koza, 1992), and continues to be widely
studied (Cai, Pacheco-Vega, Sen, and Yang, 2006; Gustafson, Burke, and
Krasnogor, 2005; Keijzer, 2004; Lew, Spencer, Scarpa, Worden, Rutherford,
and Hemez, 2006).
The steps necessary to solve symbolic regression problems include the five
preparatory steps mentioned in Chapter 2. We practiced them in the exam-
ple in Chapter 4, which was an instance of a symbolic regression problem.
There is an important difference here, however: the data points provided in
Chapter 4 were computed using a simple formula, while in most realistic sit-
uations each point represents the measured values taken by some variables
at a certain time in some dynamic process, in a repetition of an experiment,
and so on. So, the collection of an appropriate set of data points for symbolic
regression is an important and sometimes complex task.
For instance, consider the case of using GP to evolve a soft sensor (Jor-
daan, Kordon, Chiang, and Smits, 2004). The intent is to evolve a function
that will provide a reasonable estimate of what a sensor (in an industrial
production facility) would report, based on data from other actual sensors
in the system. This is typically done in cases where placing an actual sensor
in that location would be difficult or expensive. However, it is necessary to
place at least one instance of such a sensor in a working system in order to
collect the data needed to train and test the GP system. Once the sensor
is placed, one would collect the values reported by that sensor and by all
the other real sensors that are available to the evolved function, at various
times, covering the various conditions under which the evolved system will
be expected to act.
Such experimental data typically come in large tables where numerous
Table 12.1: Samples showing the size and location of Elvis’s finger tip
as apparent to its two eyes, given various right arm actuator set points (4
degrees of freedom). Cf. Figure 12.1. When the data are used for training,
GP is asked to invert the mapping and evolve functions from data collected
by both cameras showing a target location to instructions to give to Elvis’s
four arm motors so that its arm moves to the target.
most symbolic regression fitness functions tend to include summing the er-
rors measured for each record in the data set, as we did in Section 4.2.2.
Usually either the absolute difference or the square of the error is used.
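A minimal sketch of the two error measures is given below; the evolved program
is represented by an ordinary Python function and the data points are invented
for illustration.

    # Sketch of the two usual symbolic-regression fitness measures: the sum of
    # absolute errors and the sum of squared errors over the data set.

    def sum_abs_error(program, data):
        return sum(abs(program(x) - y) for x, y in data)

    def sum_squared_error(program, data):
        return sum((program(x) - y) ** 2 for x, y in data)

    data = [(0.0, 0.0), (1.0, 2.0), (2.0, 6.0)]      # (input, target) pairs
    candidate = lambda x: x * x + x                   # a candidate model
    print(sum_abs_error(candidate, data),
          sum_squared_error(candidate, data))         # -> 0.0 0.0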
The fourth preparatory step typically involves choosing a size for the
population (which is often done initially based on the perceived difficulty of
the problem, and is then refined based on the actual results of preliminary
runs). The user also needs to set the balance between the selection strength
(normally tuned via the tournament size) and the intensity of variation
(which can be varied by modifying the mutation and crossover rates, but
many researchers tend to fix these to standard values).
Figure 12.1: Elvis sitting with its right hand outstretched. The apparent
position and size of a bright red laser attached to its finger tip is recorded
(see Table 12.1). The data are then used to train a GP to move the robot’s
arm to a spot in three dimensions using only its eyes.
must earn the rating of “human competitive” independently of the fact that
it was generated by an automated method.
Koza proposed that an automatically-created result should be considered
“human-competitive” if it satisfies at least one of these eight criteria:
1. The result was patented as an invention in the past, is an improvement
over a patented invention or would qualify today as a patentable new
invention.
2. The result is equal to or better than a result that was accepted as a new
scientific result at the time when it was published in a peer-reviewed
scientific journal.
3. The result is equal to or better than a result that was placed into a
database or archive of results maintained by an internationally recog-
nised panel of scientific experts.
4. The result is publishable in its own right as a new scientific result,
independent of the fact that the result was mechanically created.
5. The result is equal to or better than the most recent human-created
solution to a long-standing problem for which there has been a succes-
sion of increasingly better human-created solutions.
6. The result is equal to or better than a result that was considered an
achievement in its field at the time it was first discovered.
7. The result solves a problem of indisputable difficulty in its field.
8. The result holds its own or wins a regulated competition involving
human contestants (in the form of either live human players or human-
written computer programs).
These criteria are independent of, and at arm’s length from, the fields of
artificial intelligence, machine learning, and GP.
Over the years, dozens of results have passed the human-competitiveness
test. Some pre-2004 human-competitive results include:
• Creation of quantum algorithms, including a better-than-classical al-
gorithm for a database search problem and a solution to an AND/OR
query problem (Spector et al., 1998, 1999).
• Creation of a competitive soccer-playing program for the RoboCup 1997
competition (Luke, 1998).
• Creation of algorithms for the transmembrane segment identification
problem for proteins (Koza, 1994, Sections 18.8 and 18.10) and (Koza
et al., 1999, Sections 16.5 and 17.2).
• Creation of a sorting network for seven items using only 16 steps (Koza
et al., 1999, Sections 21.4.4, 23.6, and 57.8.1).
2004; Dempster and Jones, 2000; Dempster, Payne, Romahi, and Thompson,
2001). Pillay has used GP in social studies and teaching aids in education,
e.g. (Pillay, 2003). As well as trees (Koza, 1990), other types of GP have
been used in finance, e.g. (Nikolaev and Iba, 2002).
Since 1995 the International Conference on Computing in Economics
and Finance (CEF) has been held every year. It regularly attracts GP pa-
pers, many of which are on-line. In 2007 Brabazon and O’Neill established
the European Workshop on Evolutionary Computation in Finance and Eco-
nomics (EvoFIN). EvoFIN is held with EuroGP.
control laws to apply). For example, Fleming’s group in Sheffield used multi-
objective GP (Hinchliffe and Willis, 2003; Rodriguez-Vazquez, Fonseca, and
Fleming, 2004) to reduce the cost of running aircraft jet engines (Arkov,
Evans, Fleming, Hill, Norton, Pratt, Rees, and Rodriguez-Vazquez, 2000;
Evans, Fleming, Hill, Norton, Pratt, Rees, and Rodriguez-Vazquez, 2001).
Alves da Silva and Abrao (2002) surveyed GP and other AI techniques
applied in the electrical power industry.
Kell and his colleagues in Aberystwyth have had great success in applying
GP widely in bioinformatics (see infrared spectra above and (Allen, Davey,
Broadhurst, Heald, Rowland, Oliver, and Kell, 2003; Day, Kell, and Griffith,
2002; Gilbert, Goodacre, Woodward, and Kell, 1997; Goodacre and Gilbert,
1999; Jones, Young, Taylor, Kell, and Rowland, 1998; Kell, 2002a,b,c; Kell,
Darby, and Draper, 2001; Shaw, Winson, Woodward, McGovern, Davey,
Kaderbhai, Broadhurst, Gilbert, Taylor, Timmins, Goodacre, Kell, Alsberg,
and Rowland, 2000; Woodward, Gilbert, and Kell, 1999)). Another very
active group is that of Moore and his colleagues (Moore, Parker, Olsen, and
Aune, 2002; Motsinger, Lee, Mellick, and Ritchie, 2006; Ritchie, Motsinger,
Bush, Coffey, and Moore, 2007; Ritchie, White, Parker, Hahn, and Moore,
2003).
Computational chemistry is widely used in the drug industry. The prop-
erties of simple molecules can be calculated. However, the interactions be-
tween chemicals which might be used as drugs and medicinal targets within
the body are beyond exact calculation. Therefore, there is great interest in
the pharmaceutical industry in approximate in silico models which attempt
to predict either favourable or adverse interactions between proto-drugs and
biochemical molecules. Since these are computational models, they can be
applied very cheaply in advance of the manufacturing of chemicals, to decide
which of the myriad of chemicals might be worth further study. Potentially,
such models can make a huge impact both in terms of money and time
without being anywhere near 100% correct. Machine learning and GP have
both been tried. GP approaches include (Bains, Gilbert, Sviridenko, Gas-
con, Scoffin, Birchall, Harvey, and Caldwell, 2002; Barrett and Langdon,
2006; Buxton, Langdon, and Barrett, 2001; Felton, 2000; Globus, Lawton,
and Wipke, 1998; Goodacre, Vaidyanathan, Dunn, Harrigan, and Kell, 2004;
Harrigan et al., 2004; Hasan, Daugelat, Rao, and Schreiber, 2006; Krasno-
gor, 2004; Si, Wang, Zhang, Hu, and Fan, 2006; Venkatraman, Dalby, and
Yang, 2004; Weaver, 2004).
The use of GP in computer art can be traced back at least to the work
of Sims (Sims, 1991) and Latham.1 Jacob’s work (Jacob, 2000, 2001) pro-
vides many examples. McCormack (2006) considers the recent state of play
in evolutionary art and music. Many recent techniques are described in
(Machado and Romero, 2008).
Evolutionary music (Todd and Werner, 1999) has been dominated by
Jazz (Spector and Alpern, 1994). An exception is Bach (Federman, Spark-
man, and Watt, 1999). Most approaches to evolving music have made at
least some use of interactive evolution (Takagi, 2001) in which the fitness
of programs is provided by users, often via the Internet (Ando, Dahlsted,
Nordahl, and Iba, 2007; Chao and Forrest, 2003). The limitation is al-
most always finding enough people willing to participate (Langdon, 2004).
Costelloe and Ryan (2007) tried to reduce the human burden. Algorithmic
approaches are also possible (Cilibrasi, Vitanyi, and de Wolf, 2004; Inagaki,
2002).
One of the sorrows of AI is that as soon as it works it stops being AI (and
celebrated as such) and becomes computer engineering. For example, the
use of computer-generated images has recently become cost effective and is
widely used in Hollywood. One of the standard state-of-the-art techniques
is the use of Reynolds' swarming "boids" (Reynolds, 1987) to create ani-
mations of large numbers of rapidly moving animals. This was first used in
Cliffhanger (1993) to animate a cloud of bats. Its use is now commonplace
(herds of wildebeest, schooling fish, and even large crowds of people). In
1997 Reynolds was awarded an Oscar.
Since 2003, EvoMUSART (the European Workshop on Evolutionary Mu-
sic and Art) has been held every year along with the EuroGP conference as
part of the EvoStar event.
12.11 Compression
Koza (1992) was the first to use genetic programming to perform compres-
sion. He considered, in particular, the lossy compression of images. The idea
was to treat an image as a function of two variables (the row and column
of each pixel) and to use GP to evolve a function that matches the original
as closely as possible. One can then use the evolved GP tree as a lossy com-
pressed version of the image, since it is possible to obtain the original image
by evaluating the program at each row-column pair of interest. The tech-
nique, which was termed programmatic compression, was tested on one small
synthetic image with good success. Programmatic compression was further
developed and applied to realistic data (images and sounds) by Nordin and
Banzhaf (1996).
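To make the idea concrete, here is a minimal sketch (in Java, the language used for TinyGP in Appendix B) of the decompression side only: an evolved expression is re-evaluated at every row-column pair to rebuild the image. The class and the evolvedExpression method are hypothetical stand-ins, not code from any published system; in a real programmatic compression run the body of evolvedExpression would be whatever expression GP has found for the image at hand.

  // Sketch of programmatic (lossy) decompression: the evolved GP tree is treated
  // as a function f(row, col) and re-evaluated at every pixel of interest.
  public class ProgrammaticDecompress {

    // Hypothetical stand-in for the evolved expression (grey level as a
    // function of the pixel coordinates).
    static double evolvedExpression(double row, double col) {
      return 128 + 127 * Math.sin(0.05 * row) * Math.cos(0.05 * col);
    }

    // Rebuild a height x width image by evaluating the expression at each pixel
    // and clamping the result to the valid grey-level range.
    static int[][] decompress(int width, int height) {
      int[][] image = new int[height][width];
      for (int r = 0; r < height; r++)
        for (int c = 0; c < width; c++) {
          double v = evolvedExpression(r, c);
          image[r][c] = (int) Math.max(0, Math.min(255, Math.round(v)));
        }
      return image;
    }

    public static void main(String[] args) {
      int[][] img = decompress(64, 64);
      System.out.println("Reconstructed pixel (0,0) = " + img[0][0]);
    }
  }

The compression ratio is then simply the size needed to store the evolved expression relative to the size of the raw pixel data, while the reconstruction error determines how lossy the encoding is.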
1 http://www.williamlatham1.com/
with standard compression methods, the time needed for compression being
measured in hours or even days. Acceleration in GP image compression was
achieved in (He, Wang, Zhang, Wang, and Fang, 2005), where an optimal
linear predictive technique was proposed which used a less complex fitness
function.
Recently Kattan and Poli (2008) proposed a GP system called GP-ZIP
for lossless data compression based on the idea of optimally combining well-
known lossless compression algorithms. The file to be compressed was di-
vided into chunks of a predefined length, and GP was asked to find the best
possible compression algorithm for each chunk in such a way as to minimise
the total length of the compressed file. The compression algorithms avail-
able to GP-ZIP included arithmetic coding, Lempel-Ziv-Welch, unbounded
prediction by partial matching, and run length encoding among others. Ex-
perimentation showed that when the file to be compressed is composed of
heterogeneous data fragments (as is the case, for example, in archive files),
GP-ZIP is capable of achieving compression ratios that are significantly su-
perior to those obtained with other compression algorithms.
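The following sketch illustrates the underlying search problem rather than the actual GP-ZIP code: given a fixed chunk size and a small menu of compression algorithms, it computes the total compressed length produced by one candidate per-chunk assignment, which is the quantity an evolutionary search of this kind tries to minimise. The Compressor interface, the two example algorithms (plain storage and DEFLATE via java.util.zip) and all names are illustrative assumptions; as noted above, GP-ZIP itself drew on arithmetic coding, Lempel-Ziv-Welch, prediction by partial matching and run-length encoding.

  import java.util.zip.Deflater;

  // Sketch (not the actual GP-ZIP code): the fitness of a per-chunk assignment
  // of compression algorithms is the total length of the compressed chunks.
  public class ChunkCompressionSketch {

    interface Compressor { int compressedLength(byte[] chunk); }

    // Illustrative algorithm 0: store the chunk uncompressed.
    static final Compressor STORE = chunk -> chunk.length;

    // Illustrative algorithm 1: DEFLATE, via the standard java.util.zip.Deflater.
    static final Compressor DEFLATE = chunk -> {
      Deflater d = new Deflater();
      d.setInput(chunk);
      d.finish();
      byte[] buf = new byte[chunk.length + 64];   // scratch buffer; only the byte count matters
      int n = 0;
      while (!d.finished())
        n += d.deflate(buf);
      d.end();
      return n;
    };

    static final Compressor[] ALGORITHMS = { STORE, DEFLATE };

    // Total compressed size of the file for a given chunk size and a given
    // algorithm choice per chunk: the quantity to be minimised.
    static long totalCompressedLength(byte[] data, int chunkSize, int[] choice) {
      long total = 0;
      for (int i = 0, c = 0; i < data.length; i += chunkSize, c++) {
        int len = Math.min(chunkSize, data.length - i);
        byte[] chunk = new byte[len];
        System.arraycopy(data, i, chunk, 0, len);
        total += ALGORITHMS[choice[c]].compressedLength(chunk);
      }
      return total;
    }

    public static void main(String[] args) {
      byte[] data = new byte[4000];                 // first half compressible zeros,
      for (int i = 2000; i < 4000; i++)             // second half less regular data
        data[i] = (byte) (i * 31 + 7);
      int[] choice = { 1, 1, 0, 1 };                // one candidate assignment for four 1000-byte chunks
      System.out.println("Total bytes = " + totalCompressedLength(data, 1000, choice));
    }
  }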
Chapter 13
Troubleshooting GP
The dynamics of evolutionary algorithms (including GP) are often very com-
plex, and the behaviour of an EA is typically challenging to predict or un-
derstand. As a result it is often difficult to troubleshoot such systems when
they are not performing as expected. While we obviously cannot provide
troubleshooting suggestions that are specific to every GP implementation
and application, we can suggest some general issues to keep in mind. To a
large extent the advice in (Kinnear, 1994b; Koza, 1992; Langdon, 1998) also
remains sound.
3 http://www.graphviz.org/
4 http://www.cs.ucl.ac.uk/staff/W.Langdon/lisp2dot.html
Figure 13.1: Visualisation of the size and shape of the entire population of
1,000 individuals in the final generation of runs using a depth limit of 50 (on
the left) and a size limit of 600 (on the right). The inner circle is at depth
50, and the outer circle is at depth 100. These plots are from (Crane and
McPhee, 2005) and were drawn using the techniques described in (Daida
et al., 2005).
//library.wolfram.com/infocenter/MathSource/5163/.
13.8 Embrace Approximation
In the early days of GP, populations of 1,000 or more could be considered large. However, CPU speeds
and computer memory have increased exponentially over time, so at the time of writing
it is not unusual to see populations of hundreds of thousands or millions of individuals
being used in the solution of hard problems. Research indicates that there are benefits in
splitting populations into demes even for much smaller populations. See Section 10.5.
Programmers know from painful experience, however, that far from proving
immediately fatal, errors can lie hidden for years. Further, not all errors
are created equal. Some are indeed critical and must be dealt with immedi-
ately, while others are rare or largely inconsequential and so never become a
major priority. The worst are arguably the severe bugs that rarely express
themselves, as they can be extremely difficult to pin down yet still have dire
consequences when they appear.
In summary, there is no such thing as a perfect (non-trivial) human-
written program and all such programs include a variety of errors of different
severity and with a different frequency of manifestation.9
This sort of variability is also very common in GP work. It provides the
sort of toehold that evolution can exploit in the early generations of GP
runs. The population of programs just needs to contain a few which move
vaguely in the right direction. Many of their offspring may be totally blind or
have no legs, just so long as a few continue to slime towards the light. Over
generations evolution may hopefully cobble together some useful features
from this initially unpromising ooze. The results, however, are unlikely
to be perfect or pretty. If you as a GP engineer insist on only accepting
solutions that are beautifully symmetric and walk on two legs on day one,
you are likely to be disappointed. As we have argued above, even human-
written programs often only approximate their intended functionality. So,
why should we not accept the same from GP?
If you accept this notion, then it is important to provide your system with
some sort of gradient upon which to act, allowing it to evolve ever better
approximations. It is also important to ensure that your test environment
(usually encapsulated in the fitness function) places appropriate emphasis on
the most important features of the space from a user perspective. Consider a
problem with five test cases, four of which are fairly easy and consequently
less important, with the fifth being crucial and quite difficult. A likely
outcome in such a setting is a population of individuals that can do the four
easier tasks but are unable to make the jump to the fifth. There are several
things you could try: 1) weighting the hard task more heavily, 2) dividing
it up in some way into additional sub-tasks, or 3) changing it from being a
binary condition (meaning that an individual does or does not succeed on the
fifth task) to a continuous condition, so that an individual GP program can
partially succeed on the fifth task. The first of these options is the simplest
to implement. The second two, however, create a smoother gradient for the
evolutionary process to follow, and so may yield better results.
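As a purely hypothetical illustration of how options 1 and 3 might look in code, the sketch below gives the hard fifth case a larger weight and scores it with a continuous error rather than a pass/fail flag. The ErrorMeasure hook, the weight value and all names are invented for the example; the point is only that partial progress on the hard case now changes the fitness, giving evolution a gradient to follow.

  // Hypothetical fitness sketch for five test cases: the first four are scored
  // normally, while the fifth (hard) case is weighted more heavily (option 1)
  // and scored continuously (option 3), so partial success is rewarded.
  public class GradedFitness {

    static final double HARD_CASE_WEIGHT = 4.0;   // invented weight for the hard case

    // error(i) is assumed to run the evolved program on test case i and return a
    // non-negative, continuous error; here it is just an abstract hook.
    interface ErrorMeasure { double error(int testCase); }

    static double fitness(ErrorMeasure candidate) {
      double fit = 0.0;
      for (int i = 0; i < 4; i++)                      // four easy cases, unit weight
        fit += candidate.error(i);
      fit += HARD_CASE_WEIGHT * candidate.error(4);    // hard case: weighted, continuous
      return -fit;                                     // higher (less negative) is better, as in TinyGP (Appendix B)
    }
  }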
In addition to reporting your results, make sure you also discuss their
implications. If, for example, what GP has evolved means the customer can
save money or could improve their process in some way, then this should
be highlighted. Also be careful not to construct excessively complex explanations
for the observations. It is very tempting to say “X is probably due
to Y”, but for this to be believable one should at least have made some
attempt to check if Y is indeed taking place, and whether modulations or
suppression of Y in fact produce modulations and/or suppression of X.
Finally, the most likely outcomes of a text that is badly written or badly
presented are: 1) your readers will misunderstand you, and 2) you will have
fewer readers. Spell checkers can help with typos, but whenever possible
one should ensure a native English speaker has proofread the text.
Chapter 14
Conclusions
Today, decades later, we can see that indeed Turing was right. GP has
started fulfilling his dream by providing us with a systematic method, based
on Darwinian evolution, for getting computers to automatically solve hard
real-life problems. To do so, it simply requires a high-level statement of
what needs to be done and enough computing power.
Turing also understood the need to evaluate objectively the behaviour ex-
hibited by machines, to avoid human biases when assessing their intelligence.
This led him to propose an imitation game, now known as the Turing test for
machine intelligence, whose goals are wonderfully summarised by Samuel’s
position statement quoted in the introduction of this book (page 1). The
eight criteria for human competitiveness we discussed in Section 12.3 are
essentially motivated by the same goals.
At present GP is unable to produce computer programs that would pass
the full Turing test for machine intelligence, and it might not be ready
for this immense task for centuries. Nonetheless, thanks to the constant
improvements in GP technology, in its theoretical foundations and in com-
puting power, GP has been able to solve dozens of difficult problems with
human-competitive results and to provide valuable solutions to many other
problems (see Chapter 12). These are a small step towards fulfilling Turing
and Samuel’s dreams, but they are also early signs of things to come. It is
reasonable to predict that in a few years' time GP will be able to routinely
and competently solve important problems for us, in a variety of application
domains with human-competitive performance. Genetic programming will
then become an essential collaborator for many human activities. This will
be a remarkable step forward towards achieving true human-competitive
machine intelligence.
This field guide is an attempt to chart the terrain of techniques and
applications we have encountered in our journey in the world of genetic
programming. Much is still unmapped and undiscovered. We hope this
book will make it easier for other travellers to start many long and profitable
journeys in this exciting world.
If you have found this book to be useful, please feel free to redistribute it
(see page ii). Should you want to cite this book, please refer to the entry for
(Poli et al., 2008) in the bibliography.
Part IV
In the end we find that Mary does indeed have a little GP...
Appendix A
Resources
The field of GP took off in the early 1990’s, driven in significant part by
the publication of (Koza, 1992). Those early days were characterised by the
exponential growth common in the initial stages of successful technologies.
Many influential papers from that period can be found in the proceedings
of the International Conference on Genetic Algorithms (ICGA-93, ICGA-
95), the IEEE conferences on Evolutionary Computation (EC-1994), and
the Evolutionary Programming conferences. A surprisingly large number
of these are now available on-line, and we’ve included as many URLs as
we could in the bibliography.1 After almost twenty years, GP has matured
and is used in a wondrous array of applications from banking to betting,
from bomb detection to architectural design, from the steel industry to the
environment, from space to biology, and many others (as we have seen in
Chapter 12).
In 1996 it was possible to list almost all the studies and applications of
GP (Langdon, 1996), but today the range is far too great. In this appendix
we will review some of the wide variety of available sources on GP which
should assist readers who wish to explore further. Consulting information
available on the Web is certainly a good way to get quick answers for someone
who wants to know what GP is. These answers, however, will often be too
shallow for someone who really wants to then apply GP to solve practical
problems. People in this position should probably invest some time going
through more detailed accounts; some of the key books in the field include
(Banzhaf, Nordin, Keller, and Francone, 1998a; Koza, 1992; Langdon and
Poli, 2002), and others are listed in Section A.1. Technical papers in the
extensive GP literature may be the next stage. Although this literature is
easily accessible thanks to the complete on-line bibliography (Langdon et al.,
1995-2008), newcomers will often need to be selective in what they read. The
1 Each included URL was tested and was operational at the time of writing.
A.4 GP Implementations
One of the reasons behind the success of GP is that it is easy to implement
one's own version, and implementing a simple GP system from scratch remains
an excellent way to make sure one really understands the mechanics of GP.
In addition to being an exceptionally useful exercise, a system one has built
oneself is often easier to customise for new purposes (e.g., by adding new,
application-specific genetic operators or implementing unusual, knowledge-based
initialisation strategies) than a large GP distribution. All of this, however,
requires reasonable programming skills and the will to thoroughly test the
resulting system until it behaves as expected.
This is actually an extremely tricky issue in highly stochastic systems
such as GP, as we discussed in Section 13.1. The problem is that almost
any system will produce “interesting” behaviour, but it is typically very
hard to test whether it is exhibiting the correct interesting behaviour. It
is remarkably easy for small mistakes to go unnoticed for extended periods
of time (even years).2 It is also easy to incorrectly assume that “minor”
2 Several years ago Nic and some of his students discovered that one of their systems
had been performing addition instead of subtraction for several months due to a copy-
paste error. Fortunately no published results were affected, but it was a very unsettling
experience.
3 http://www.cs.bham.ac.uk/~wbl/biblio/
4 The GP bibliography is a volunteer effort and depends crucially on submissions from
users. Authors are encouraged to check that their GP publications are listed, and send
missing entries to the bibliography’s maintainers.
Figure A.1: Co-authorship connections within GP. Each of the 1,141 dots
indicates an author, and edges link people who have co-authored one or
more GP papers. (To reduce clutter only links to first authors are shown.)
The size of each dot indicates the number of entries. The on-line version is
annotated using JavaScript and contains hyperlinks to authors and their
GP papers. The graph was created by GraphViz twopi, which tries to
place strongly connected people close together. This diagram displays just
the “centrally connected component” (Tomassini et al., 2007) and contains
approximately half of all GP papers. The remaining papers are not linked
by co-authorship to this graph. Several other large components are also
available on-line via the GP Bibliography (Langdon et al., 1995-2008).
Appendix B
TinyGP
1 http://cswww.essex.ac.uk/staff/rpoli/TinyGP/
14. The random number generator can be seeded via the command line.
If this command line parameter is absent, the system uses the current
time to seed the random number generator.
15. The name of the file containing the fitness cases can be passed to
the system via the command line. If the command line parameter is
absent, the system assumes the data are stored in the current directory
in a file called “problem.dat”.
16. If the total error made by the best program goes below 10^-5, TinyGP
prints a message indicating success and stops. If the problem has
not been solved after the maximum number of generations, it prints a
message indicating failure and stops.
1 100 -5 5 63
0.0 0
0.1 0.0998334166468282
0.2 0.198669330795061
0.3 0.29552020666134
....
55 LINES OMITTED
....
5.9 -0.373876664830236
6.0 -0.279415498198926
6.1 -0.182162504272095
6.2 -0.0830894028174964
These fitness cases are sin(x) for x ∈ {0.0, 0.1, 0.2, ..., 6.2}.
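For readers who want to regenerate this dataset, the short sketch below writes a problem.dat file in the format shown above. It assumes, based on the parameter names used in the source code, that the header line gives the number of variables, the number of random constants, their lower and upper bounds, and the number of fitness cases; the class name and the exact number of digits printed for each value are incidental to the example.

  import java.io.FileWriter;
  import java.io.IOException;
  import java.io.PrintWriter;

  // Sketch: regenerate the sin(x) dataset in the format shown above.
  // Assumed header: varnumber randomnumber minrandom maxrandom fitnesscases.
  public class MakeSinData {
    public static void main(String[] args) throws IOException {
      int cases = 63;                                   // x = 0.0, 0.1, ..., 6.2
      try (PrintWriter out = new PrintWriter(new FileWriter("problem.dat"))) {
        out.println("1 100 -5 5 " + cases);
        for (int i = 0; i < cases; i++) {
          double x = i / 10.0;
          out.println(x + " " + Math.sin(x));
        }
      }
    }
  }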
regression GP system. The source code, in C, is 5,906 bytes. The original version included
a compilation script which, with a variety of tricks, created a self-extracting executable
occupying 2,871 bytes (while the actual size of the executable after self-extraction was
4,540 bytes). All optimisations in the code were aimed at bringing the executable size
(as opposed to the source code size) down, the main purpose being to show that, against
popular belief, it is possible to have really tiny and efficient GP systems.
The code reads command line arguments using the standard args array.
Generally the code is quite standard and should be self-explanatory for
anyone who can program in Java, whether or not they have implemented a
GP system before. Therefore very few comments have been provided in the
source code.
The source is provided below.
/*
 * Program:   tiny_gp.java
 *
 * Author:    Riccardo Poli (email: rpoli@essex.ac.uk)
 *
 */

import java.util.*;
import java.io.*;
import java.text.DecimalFormat;

public class tiny_gp {
  double [] fitness;
  char [][] pop;
  static Random rd = new Random();
  static final int
    ADD = 110,
    SUB = 111,
    MUL = 112,
    DIV = 113,
    FSET_START = ADD,
    FSET_END = DIV;
  static double [] x = new double[FSET_START];
  static double minrandom, maxrandom;
  static char [] program;
  static int PC;
  static int varnumber, fitnesscases, randomnumber;
  static double fbestpop = 0.0, favgpop = 0.0;
  static long seed;
  static double avg_len;
  static final int
    MAX_LEN = 10000,
    POPSIZE = 100000,
    DEPTH = 5,
    GENERATIONS = 100,
    TSIZE = 2;
  public static final double
    PMUT_PER_NODE = 0.05,
    CROSSOVER_PROB = 0.9;
  static double [][] targets;

  // ... (part of the listing omitted in this excerpt) ...

        line = in.readLine();
        tokens = new StringTokenizer(line);
        for (j = 0; j <= varnumber; j++) {
          targets[i][j] = Double.parseDouble(tokens.nextToken().trim());
        }
      }
      in.close();
    }
    catch (FileNotFoundException e) {
      System.out.println("ERROR: Please provide a data file");
      System.exit(0);
    }
    catch (Exception e) {
      System.out.println("ERROR: Incorrect data format");
      System.exit(0);
    }
  }

  // Fitness: the negated sum of absolute errors over all fitness cases.
  double fitness_function(char [] Prog) {
    int i = 0, len;
    double result, fit = 0.0;

    len = traverse(Prog, 0);
    for (i = 0; i < fitnesscases; i++) {
      for (int j = 0; j < varnumber; j++)
        x[j] = targets[i][j];
      program = Prog;
      PC = 0;
      result = run();
      fit += Math.abs(result - targets[i][varnumber]);
    }
    return(-fit);
  }

  // Grow a random program (stored as a flat array in prefix notation) into
  // buffer, starting at position pos.
  int grow(char [] buffer, int pos, int max, int depth) {
    char prim = (char) rd.nextInt(2);

    if (pos >= max)
      return(-1);

    if (pos == 0)
      prim = 1;

    if (prim == 0 || depth == 0) {
      prim = (char) rd.nextInt(varnumber + randomnumber);
      buffer[pos] = prim;
      return(pos + 1);
    }
    else {
      prim = (char) (rd.nextInt(FSET_END - FSET_START + 1) + FSET_START);
      switch (prim) {
      case ADD:
      case SUB:
  // ... (part of the listing omitted in this excerpt) ...

    }
    return(best);
  }

  // Negative tournament: return the index of the worst of tsize randomly
  // chosen individuals (used to decide who gets replaced).
  int negative_tournament(double [] fitness, int tsize) {
    int worst = rd.nextInt(POPSIZE), i, competitor;
    double fworst = 1e34;

    for (i = 0; i < tsize; i++) {
      competitor = rd.nextInt(POPSIZE);
      if (fitness[competitor] < fworst) {
        fworst = fitness[competitor];
        worst = competitor;
      }
    }
    return(worst);
  }

  // Subtree crossover on the flat prefix representation.
  char [] crossover(char [] parent1, char [] parent2) {
    int xo1start, xo1end, xo2start, xo2end;
    char [] offspring;
    int len1 = traverse(parent1, 0);
    int len2 = traverse(parent2, 0);
    int lenoff;

    xo1start = rd.nextInt(len1);
    xo1end = traverse(parent1, xo1start);

    xo2start = rd.nextInt(len2);
    xo2end = traverse(parent2, xo2start);

    lenoff = xo1start + (xo2end - xo2start) + (len1 - xo1end);

    offspring = new char[lenoff];

    System.arraycopy(parent1, 0, offspring, 0, xo1start);
    System.arraycopy(parent2, xo2start, offspring, xo1start,
                     (xo2end - xo2start));
    System.arraycopy(parent1, xo1end, offspring,
                     xo1start + (xo2end - xo2start),
                     (len1 - xo1end));

    return(offspring);
  }

  // Point mutation: each position is changed with probability pmut.
  char [] mutation(char [] parent, double pmut) {
    int len = traverse(parent, 0), i;
    int mutsite;
    char [] parentcopy = new char[len];

    System.arraycopy(parent, 0, parentcopy, 0, len);
    for (i = 0; i < len; i++) {
      if (rd.nextDouble() < pmut) {
        mutsite = i;
        if (parentcopy[mutsite] < FSET_START)

  // ... (remainder of the listing omitted in this excerpt) ...
java tiny_gp FILE, where FILE is the dataset file name (which can include
the full path to the file). Finally, the user can specify both the datafile and
a seed for the random number generator on the command line, by giving
the command java tiny_gp SEED FILE, where SEED is an integer.
As an example, we ran TinyGP on the sin(x) dataset described in Section B.2
(which is available at http://cswww.essex.ac.uk/staff/rpoli/TinyGP/sin-data.txt).
The output produced by the program was something like the following:
-- TINY GP (Java version) --
SEED=-1
MAX_LEN=10000
POPSIZE=100000
DEPTH=5
CROSSOVER_PROB=0.9
PMUT_PER_NODE=0.05
MIN_RANDOM=-5.0
MAX_RANDOM=5.0
GENERATIONS=100
TSIZE=2
----------------------------------
Generation=0 Avg Fitness=42.53760218120066 Best Fitness=39.997953686554816 Avg Size=10.9804
Best Individual: (1.589816334458055 / -2.128280559500907)
Generation=1 Avg Fitness=1226.404415960088 Best Fitness=24.441994244449372 Avg Size=10.97024
Best Individual: (((-0.3839867944222206 / -2.2796127162428403) + (-1.8386812853617673 / -1.06553859601892)) - (((4.984026635222818 * (0.17196413319878445 - 0.1294044215655923)) + (X1 - -1.8956001614031734)) * 0.3627020733460027))
...
Figure B.1: Final generation of a TinyGP sample run: best and mean
fitness (top), mean program size (middle) and behaviour of the best-so-far
individual (bottom).
where
a = 2.76609789995
b = 10.240744822
c = 3.9532436939
d = 3.20011637632
e = 12.6508398844
Hardly an obvious approximation for the sine function, but still a very ac-
curate one, at least over the test range.
Bibliography
A.-C. Achilles and P. Ortyl. The Collection of Computer Science Bibliographies, 1995-
2008. URL http://liinwww.ira.uka.de/bibliography/.
A. Agapitos, J. Togelius, and S. M. Lucas. Multiobjective techniques for the use of state
in genetic programming applied to simulated car racing. In D. Srinivasan and L. Wang,
editors, 2007 IEEE Congress on Evolutionary Computation, pages 1562–1569, Singa-
pore, 25-28 September 2007. IEEE Computational Intelligence Society, IEEE Press.
ISBN 1-4244-1340-0. GPBiB
R. Aler, D. Borrajo, and P. Isasi. Using genetic programming to learn and improve
control knowledge. Artificial Intelligence, 141(1-2):29–56, October 2002. URL http:
//scalab.uc3m.es/~dborrajo/papers/aij-evock.ps.gz. GPBiB
L. Araujo. Multiobjective genetic programming for natural language parsing and tag-
ging. In T. P. Runarsson, et al., editors, Parallel Problem Solving from Nature -
PPSN IX, volume 4193 of LNCS, pages 433–442, Reykjavik, Iceland, 9-13 September
2006. Springer-Verlag. ISBN 3-540-38990-3. URL http://ppsn2006.raunvis.hi.is/
proceedings/055.pdf. GPBiB
S. Baluja and R. Caruana. Removing the genetics from the standard genetic algorithm.
In A. Prieditis and S. Russell, editors, Machine Learning: Proceedings of the Twelfth
International Conference, pages 38–46. Morgan Kaufmann Publishers, San Francisco,
CA, 1995.
W. Banzhaf and W. B. Langdon. Some considerations on the reason for bloat. Genetic
Programming and Evolvable Machines, 3(1):81–91, March 2002. ISSN 1389-2576. URL
http://web.cs.mun.ca/~banzhaf/papers/genp_bloat.pdf. GPBiB
W. Banzhaf, F. D. Francone, and P. Nordin. The effect of extensive use of the mutation
operator on generalization in genetic programming using sparse data sets. In H.-M.
Voigt, et al., editors, Parallel Problem Solving from Nature IV, Proceedings of the
International Conference on Evolutionary Computation, volume 1141 of LNCS, pages
300–309, Berlin, Germany, 22-26 September 1996. Springer Verlag. ISBN 3-540-61723-
X. GPBiB
S. Brave and A. S. Wu, editors. Late Breaking Papers at the 1999 Genetic and Evolu-
tionary Computation Conference, Orlando, Florida, USA, 13 July 1999. GPBiB
G. Buason, N. Bergfeldt, and T. Ziemke. Brains, bodies, and beyond: Competitive co-
evolution of robot controllers, morphologies and environments. Genetic Programming
and Evolvable Machines, 6(1):25–51, March 2005. ISSN 1389-2576.
E. K. Burke, M. R. Hyde, and G. Kendall. Evolving bin packing heuristics with genetic
programming. In T. P. Runarsson, et al., editors, Parallel Problem Solving from Nature
- PPSN IX, volume 4193 of LNCS, pages 860–869, Reykjavik, Iceland, 9-13 September
2006. Springer-Verlag. ISBN 3-540-38990-3. URL http://www.cs.nott.ac.uk/~mvh/
ppsn2006.pdf. GPBiB
Genetic and evolutionary computation, volume 2, pages 1559–1565, London, 7-11 July
2007. ACM Press. URL http://www.cs.bham.ac.uk/~wbl/biblio/gecco2007/docs/
p1559.pdf. GPBiB
F. Castillo, A. Kordon, and G. Smits. Robust pareto front genetic programming parame-
ter selection based on design of experiments and industrial data. In R. L. Riolo, et al.,
editors, Genetic Programming Theory and Practice IV, volume 5 of Genetic and Evo-
lutionary Computation, chapter 2, pages –. Springer, Ann Arbor, 11-13 May 2006a.
ISBN 0-387-33375-4. GPBiB
M. Chami and D. Robilliard. Inversion of oceanic constituents in case I and II waters with
genetic programming algorithms. Applied Optics, 41(30):6260–6275, October 2002.
URL http://ao.osa.org/ViewMedia.cfm?id=70258&seq=0. GPBiB
S.-H. Chen, J. Duffy, and C.-H. Yeh. Equilibrium selection via adaptation: Using genetic
programming to model learning in a coordination game. The Electronic Journal of
Evolutionary Modeling and Economic Dynamics, 15 January 2002. ISSN 1298-0137.
GPBiB
S.-H. Chen, H.-S. Wang, and B.-T. Zhang. Forecasting high-frequency financial time series
with evolutionary neural trees: The case of heng-sheng stock index. In H. R. Arabnia,
editor, Proceedings of the International Conference on Artificial Intelligence, IC-AI
’99, volume 2, pages 437–443, Las Vegas, Nevada, USA, 28 June-1 July 1999. CSREA
Press. ISBN 1-892512-17-3. URL http://bi.snu.ac.kr/Publications/Conferences/
International/ICAI99.ps. GPBiB
S.-B. Cho, N. X. Hoai, and Y. Shan, editors. Proceedings of The First Asian-Pacific
Workshop on Genetic Programming, Rydges (lakeside) Hotel, Canberra, Australia, 8
December 2003. ISBN 0-9751724-0-9. GPBiB
J. Clegg, J. A. Walker, and J. F. Miller. A new crossover technique for cartesian genetic
programming. In D. Thierens, et al., editors, GECCO ’07: Proceedings of the 9th
annual conference on Genetic and evolutionary computation, volume 2, pages 1580–
1587, London, 7-11 July 2007. ACM Press. URL http://www.cs.bham.ac.uk/~wbl/
biblio/gecco2007/docs/p1580.pdf. GPBiB
R. J. Collins. Studies in Artificial Evolution. PhD thesis, UCLA, Artificial Life Labora-
tory, Department of Computer Science, University of California, Los Angeles, LA CA
90024, USA, 1992.
F. Corno, E. Sanchez, and G. Squillero. Evolving assembly programs: how games help
microprocessor validation. Evolutionary Computation, IEEE Transactions on, 9(6):
695–706, 2005.
E. F. Crane and N. F. McPhee. The effects of size and depth limits on tree based genetic
programming. In T. Yu, et al., editors, Genetic Programming Theory and Practice III,
volume 9 of Genetic Programming, chapter 15, pages 223–240. Springer, Ann Arbor,
12-14 May 2005. ISBN 0-387-28110-X. GPBiB
R. Crawford-Marks and L. Spector. Size control via size fair genetic operators in the
PushGP genetic programming system. In W. B. Langdon, et al., editors, GECCO
2002: Proceedings of the Genetic and Evolutionary Computation Conference, pages
733–739, New York, 9-13 July 2002. Morgan Kaufmann Publishers. ISBN 1-55860-878-
8. URL http://alum.hampshire.edu/~rpc01/gp234.pdf. GPBiB
C. Darwin. The Origin of Species. John Murray, penguin classics, 1985 edition, 1859.
ISBN 0-14-043205-1.
T. E. Davis and J. C. Principe. A Markov chain framework for the simple genetic algo-
rithm. Evolutionary Computation, 1(3):269–288, 1993.
E. D. de Jong and J. B. Pollack. Multi-objective methods for tree size control. Genetic
Programming and Evolvable Machines, 4(3):211–233, September 2003. ISSN 1389-
2576. URL http://www.cs.uu.nl/~dejong/publications/bloat.ps. GPBiB
M. Defoin Platel, M. Clergue, and P. Collard. Maximum homologous crossover for linear
genetic programming. In C. Ryan, et al., editors, Genetic Programming, Proceed-
ings of EuroGP’2003, volume 2610 of LNCS, pages 194–203, Essex, 14-16 April 2003.
Springer-Verlag. ISBN 3-540-00971-X. URL http://www.i3s.unice.fr/~defoin/
publications/eurogp_03.pdf. GPBiB
M. Dorigo and T. Stützle. Ant Colony Optimization. MIT Press (Bradford Books), 2004.
A. Ekart and S. Z. Nemeth. Selection based on the pareto nondomination criterion for
controlling code growth in genetic programming. Genetic Programming and Evolvable
Machines, 2(1):61–73, March 2001. ISSN 1389-2576. GPBiB
S. E. Eklund. A massively parallel architecture for linear machine code genetic pro-
gramming. In Y. Liu, et al., editors, Evolvable Systems: From Biology to Hard-
ware: Proceedings of 4th International Conference, ICES 2001, volume 2210 of
Lecture Notes in Computer Science, pages 216–224, Tokyo, Japan, October 3-
5 2001. Springer-Verlag. URL http://www.springerlink.com/openurl.asp?genre=
article&issn=0302-9743&volume=2210&spage=216. GPBiB
D. I. Ellis, D. Broadhurst, and R. Goodacre. Rapid and quantitative detection of the mi-
crobial spoilage of beef by fourier transform infrared spectroscopy and machine learn-
ing. Analytica Chimica Acta, 514(2):193–201, 2004. URL http://dbkgroup.org/dave_
files/ACAbeef04.pdf. GPBiB
M. J. Felton. Survival of the fittest in drug design. Modern Drug Discovery, 3(9):49–50,
November/December 2000. ISSN 1532-4486. URL http://pubs.acs.org/subscribe/
journals/mdd/v03/i09/html/felton.html. GPBiB
A. Fraser and T. Weinbrenner. GPC++ Genetic Programming C++ Class Library, 1993-
1997. URL http://www.cs.ucl.ac.uk/staff/W.Langdon/ftp/weinbenner/gp.html.
C. Fujiki and J. Dickinson. Using the genetic algorithm to generate lisp source code
to solve the prisoner’s dilemma. In J. J. Grefenstette, editor, Genetic Algorithms
and their Applications: Proceedings of the second international conference on Genetic
Algorithms, pages 236–240, MIT, Cambridge, MA, USA, 28-31 July 1987. Lawrence
Erlbaum Associates.
A. Fukunaga and A. Stechert. Evolving nonlinear predictive models for lossless image
compression with genetic programming. In J. R. Koza, et al., editors, Genetic Pro-
gramming 1998: Proceedings of the Third Annual Conference, pages 95–102, University
of Wisconsin, Madison, Wisconsin, USA, 22-25 July 1998. Morgan Kaufmann. ISBN
1-55860-548-7. URL http://citeseer.ist.psu.edu/507773.html. GPBiB
A. S. Fukunaga. Evolving local search heuristics for SAT using genetic programming. In
K. Deb, et al., editors, Genetic and Evolutionary Computation – GECCO-2004, Part
II, volume 3103 of Lecture Notes in Computer Science, pages 483–494, Seattle, WA,
USA, 26-30 June 2004. Springer-Verlag. ISBN 3-540-22343-6. URL http://alexf04.
maclisp.org/gecco2004.pdf. GPBiB
P. Funes, E. Sklar, H. Juille, and J. Pollack. Animal-animat coevolution: Using the animal
population as fitness function. In R. Pfeifer, et al., editors, From Animals to Animats 5:
Proceedings of the Fifth International Conference on Simulation of Adaptive Behavior,
pages 525–533, Zurich, Switzerland, August 17-21 1998a. MIT Press. ISBN 0-262-
66144-6. URL http://www.demo.cs.brandeis.edu/papers/tronsab98.pdf. GPBiB
P. Funes, E. Sklar, H. Juille, and J. Pollack. Animal-animat coevolution: Using the animal
population as fitness function. In R. Pfeifer, et al., editors, From Animals to Animats
5: Proceedings of the Fifth International Conference on Simulation of Adaptive Be-
havior., pages 525–533, Zurich, Switzerland, August 17-21 1998b. MIT Press. ISBN
0-262-66144-6. URL http://www.demo.cs.brandeis.edu/papers/tronsab98.html.
C. Gathercole and P. Ross. Dynamic training subset selection for supervised learn-
ing in genetic programming. In Y. Davidor, et al., editors, Parallel Problem Solv-
ing from Nature III, volume 866 of LNCS, pages 312–321, Jerusalem, 9-14 October
1994. Springer-Verlag. ISBN 3-540-58484-6. URL http://citeseer.ist.psu.edu/
gathercole94dynamic.html. GPBiB
C. Gathercole and P. Ross. The MAX problem for genetic programming - highlighting
an adverse interaction between the crossover operator and a restriction on tree depth.
Technical report, Department of Artificial Intelligence, University of Edinburgh, 80
South Bridge, Edinburgh, EH1 1HN, UK, 1995. URL http://citeseer.ist.psu.edu/
gathercole95max.html. GPBiB
C. Gathercole and P. Ross. Tackling the boolean even N parity problem with ge-
netic programming and limited-error fitness. In J. R. Koza, et al., editors, Ge-
netic Programming 1997: Proceedings of the Second Annual Conference, pages 119–
127, Stanford University, CA, USA, 13-16 July 1997. Morgan Kaufmann. URL
http://citeseer.ist.psu.edu/79389.html. GPBiB
on Molecular Nanotechnology, Westin Hotel in Santa Clara, CA, USA, November 12-
15, 1998 1998. URL http://www.foresight.org/Conferences/MNT6/Papers/Globus/
index.html. GPBiB
F. Gruau. Neural Network Synthesis using Cellular Encoding and the Genetic Algorithm.
PhD thesis, Laboratoire de l’Informatique du Parallilisme, Ecole Normale Supirieure de
Lyon, France, 1994. URL ftp://ftp.ens-lyon.fr/pub/LIP/Rapports/PhD/PhD1994/
PhD1994-01-E.ps.Z. GPBiB
F. Gruau and D. Whitley. Adding learning to the cellular development process: a com-
parative study. Evolutionary Computation, 1(3):213–233, 1993. GPBiB
G. Harik. Linkage learning via probabilistic modeling in the ECGA. IlliGAL Report
99010, University of Illinois at Urbana-Champaign, 1999.
K. Harries and P. Smith. Exploring alternative operators and search strategies in genetic
programming. In J. R. Koza, et al., editors, Genetic Programming 1997: Proceedings
of the Second Annual Conference, pages 147–155, Stanford University, CA, USA, 13-
16 July 1997. Morgan Kaufmann. URL http://www.cs.ucl.ac.uk/staff/W.Langdon/
ftp/papers/harries.gp97_paper.ps.gz. GPBiB
A. Hauptman and M. Sipper. Evolution of an efficient search algorithm for the mate-in-N
problem in chess. In M. Ebner, et al., editors, Proceedings of the 10th European Con-
ference on Genetic Programming, volume 4445 of Lecture Notes in Computer Science,
pages 78–89, Valencia, Spain, 11 - 13 April 2007. Springer. ISBN 3-540-71602-5. GPBiB
M. Hinchliffe, M. Willis, and M. Tham. Chemical process sytems modelling using multi-
objective genetic programming. In J. R. Koza, et al., editors, Genetic Programming
1998: Proceedings of the Third Annual Conference, pages 134–139, University of Wis-
consin, Madison, Wisconsin, USA, 22-25 July 1998. Morgan Kaufmann. ISBN 1-55860-
548-7. GPBiB
S.-Y. Ho, C.-H. Hsieh, H.-M. Chen, and H.-L. Huang. Interpretable gene expression
classifier with an accurate and compact fuzzy rule base for microarray data analysis.
Biosystems, 85(3):165–176, September 2006. GPBiB
P. Holmes. The odin genetic programming system. Tech Report RR-95-3, Computer
Studies, Napier University, Craiglockhart, 216 Colinton Road, Edinburgh, EH14 1DJ,
1995. URL http://citeseer.ist.psu.edu/holmes95odin.html. GPBiB
J.-H. Hong and S.-B. Cho. The classification of cancer based on DNA microarray data
that uses diverse ensemble genetic programming. Artificial Intelligence In Medicine,
36(1):43–58, January 2006. GPBiB
D. Howard, S. C. Roberts, and C. Ryan. Pragmatic genetic programming strategy for the
problem of vehicle detection in airborne reconnaissance. Pattern Recognition Letters,
27(11):1275–1288, August 2006. Evolutionary Computer Vision and Image Under-
standing. GPBiB
H. Iba. Random tree generation for genetic programming. In H.-M. Voigt, et al., editors,
Parallel Problem Solving from Nature IV, Proceedings of the International Conference
on Evolutionary Computation, volume 1141 of LNCS, pages 144–153, Berlin, Germany,
22-26 September 1996a. Springer Verlag. ISBN 3-540-61723-X. GPBiB
H. Iba, H. de Garis, and T. Sato. Temporal data processing using genetic programming.
In L. Eshelman, editor, Genetic Algorithms: Proceedings of the Sixth International
Conference (ICGA95), pages 279–286, Pittsburgh, PA, USA, 15-19 July 1995a. Morgan
Kaufmann. ISBN 1-55860-370-0. GPBiB
H. Iba, T. Sato, and H. de Garis. Recombination guidance for numerical genetic pro-
gramming. In 1995 IEEE Conference on Evolutionary Computation, volume 1, pages
97–102, Perth, Australia, 29 November - 1 December 1995b. IEEE Press. GPBiB
C. Jacob. The art of genetic programming. IEEE Intelligent Systems, 15(3):83–84, May-
June 2000. ISSN 1094-7167. URL http://ieeexplore.ieee.org/iel5/5254/18363/
00846288.pdf. GPBiB
N. Jin and E. P. K. Tsang. Co-adaptive strategies for sequential bargaining problems with
discount factors and outside options. In Proceedings of the 2006 IEEE Congress on
Evolutionary Computation, pages 7913–7920, Vancouver, 6-21 July 2006. IEEE Press.
ISBN 0-7803-9487-9. GPBiB
M. Kaboudan. Extended daily exchange rates forecasts using wavelet temporal resolu-
tions. New Mathematics and Natural Computing, 1:79–107, 2005. GPBiB
T. Kalganova and J. Miller. Evolving more efficient digital circuits by allowing circuit
layout evolution and multi-objective fitness. In A. Stoica, et al., editors, The First
NASA/DoD Workshop on Evolvable Hardware, pages 54–63, Pasadena, California,
19-21 July 1999. IEEE Computer Society. ISBN 0-7695-0256-3. GPBiB
D. B. Kell. Defence against the flood. Bioinformatics World, pages 16–18, January/Febru-
ary 2002a. URL http://dbkgroup.org/Papers/biwpp16-18_as_publ.pdf. GPBiB
A. Khan and A. M. Mirza. Genetic perceptual shaping: Utilizing cover image and con-
ceivable attack information during watermark embedding. Information Fusion, 8(4):
354–365, October 2007. ISSN 1566-2535. GPBiB
K. E. Kinnear, Jr. A perspective on the work in this book. In K. E. Kinnear, Jr., editor,
Advances in Genetic Programming, chapter 1, pages 3–19. MIT Press, 1994b. URL
http://cognet.mit.edu/library/books/view?isbn=0262111888. GPBiB
J. R. Koza. Two ways of discovering the size and shape of a computer program to solve a
problem. In L. Eshelman, editor, Genetic Algorithms: Proceedings of the Sixth Interna-
tional Conference (ICGA95), pages 287–294, Pittsburgh, PA, USA, 15-19 July 1995.
Morgan Kaufmann. ISBN 1-55860-370-0. URL http://www.genetic-programming.
com/jkpdf/icga1995.pdf. GPBiB
J. R. Koza, editor. Late Breaking Papers at the Genetic Programming 1996 Conference
Stanford University July 28-31, 1996, Stanford University, CA, USA, 28–31 July 1996.
Stanford Bookstore. ISBN 0-18-201031-7. URL http://www.genetic-programming.
org/gp96latebreaking.html. GPBiB
J. R. Koza, editor. Late Breaking Papers at the 1997 Genetic Programming Confer-
ence, Stanford University, CA, USA, 13–16 July 1997. Stanford Bookstore. ISBN
0-18-206995-8. URL http://www.genetic-programming.org/gp97latebreaking.html.
GPBiB
J. R. Koza, editor. Late Breaking Papers at the 1998 Genetic Programming Conference,
University of Wisconsin, Madison, WI, USA, 22-25 July 1998. Omni Press. GPBiB
J. R. Koza, F. H. Bennett, III, D. Andre, and M. A. Keane. The design of analog circuits
by means of genetic programming. In P. Bentley, editor, Evolutionary Design by
Computers, chapter 16, pages 365–385. Morgan Kaufmann, San Francisco, USA, 1999.
ISBN 1-55860-605-X. URL http://www.genetic-programming.com/jkpdf/edc1999.
pdf. GPBiB
W. B. Langdon. How many good programs are there? How long are they? In K. A.
De Jong, et al., editors, Foundations of Genetic Algorithms VII, pages 183–202,
Torremolinos, Spain, 4-6 September 2002b. Morgan Kaufmann. ISBN 0-12-208155-
2. URL http://www.cs.ucl.ac.uk/staff/W.Langdon/ftp/papers/wbl_foga2002.pdf.
Published 2003. GPBiB
W. B. Langdon and B. F. Buxton. Genetic programming for mining DNA chip data
from cancer patients. Genetic Programming and Evolvable Machines, 5(3):251–257,
September 2004. ISSN 1389-2576. URL http://www.cs.ucl.ac.uk/staff/W.Langdon/
ftp/papers/wbl_dnachip.pdf. GPBiB
W. B. Langdon and R. Poli. Fitness causes bloat. In P. K. Chawdhry, et al., editors, Soft
Computing in Engineering Design and Manufacturing, pages 13–22. Springer-Verlag
London, 23-27 June 1997. ISBN 3-540-76214-0. URL http://www.cs.bham.ac.uk/
~wbl/ftp/papers/WBL.bloat_wsc2.ps.gz. GPBiB
W. B. Langdon and R. Poli. Why ants are hard. In J. R. Koza, et al., editors, Genetic
Programming 1998: Proceedings of the Third Annual Conference, pages 193–201, Uni-
versity of Wisconsin, Madison, Wisconsin, USA, 22-25 July 1998a. Morgan Kaufmann.
ISBN 1-55860-548-7. URL http://www.cs.ucl.ac.uk/staff/W.Langdon/ftp/papers/
WBL.antspace_gp98.pdf. GPBiB
W. B. Langdon and R. Poli. Better trained ants for genetic programming. Technical
Report CSRP-98-12, University of Birmingham, School of Computer Science, April
1998b. URL ftp://ftp.cs.bham.ac.uk/pub/tech-reports/1998/CSRP-98-12.ps.gz.
GPBiB
W. B. Langdon. Genetic Programming and Data Structures. Kluwer, Boston, 1998. ISBN
0-7923-8135-1. URL http://www.cs.ucl.ac.uk/staff/W.Langdon/gpdata. GPBiB
W. B. Langdon. Size fair and homologous tree genetic programming crossovers. Ge-
netic Programming and Evolvable Machines, 1(1/2):95–119, April 2000. ISSN 1389-
2576. URL http://www.cs.ucl.ac.uk/staff/W.Langdon/ftp/papers/WBL_fairxo.
pdf. GPBiB
W. B. Langdon and R. Poli. Evolutionary solo pong players. In D. Corne, et al., editors,
Proceedings of the 2005 IEEE Congress on Evolutionary Computation, volume 3, pages
2621–2628, Edinburgh, UK, 2-5 September 2005. IEEE Press. ISBN 0-7803-9363-
5. URL http://www.cs.ucl.ac.uk/staff/W.Langdon/ftp/papers/pong_cec2005.pdf.
GPBiB
W. B. Langdon and R. Poli. On turing complete T7 and MISC F–4 program fitness
landscapes. In D. V. Arnold, et al., editors, Theory of Evolutionary Algorithms, num-
ber 06061 in Dagstuhl Seminar Proceedings, Dagstuhl, Germany, 5-10 February 2006.
Internationales Begegnungs- und Forschungszentrum fuer Informatik (IBFI), Schloss
Dagstuhl, Germany. URL http://drops.dagstuhl.de/opus/volltexte/2006/595.
<http://drops.dagstuhl.de/opus/volltexte/2006/595> [date of citation: 2006-01-01].
GPBiB
W. B. Langdon, T. Soule, R. Poli, and J. A. Foster. The evolution of size and shape.
In L. Spector, et al., editors, Advances in Genetic Programming 3, chapter 8, pages
163–190. MIT Press, Cambridge, MA, USA, June 1999. ISBN 0-262-19423-6. URL
http://www.cs.bham.ac.uk/~wbl/aigp3/ch08.pdf. GPBiB
R. Linden and A. Bhaya. Evolving fuzzy rules to model gene expression. Biosystems, 88
(1-2):76–91, March 2007. GPBiB
H. Lipson. How to draw a straight line using a GP: Benchmarking evolutionary design
against 19th century kinematic synthesis. In M. Keijzer, editor, Late Breaking Papers
at the 2004 Genetic and Evolutionary Computation Conference, Seattle, Washing-
ton, USA, 26 July 2004. URL http://www.cs.bham.ac.uk/~wbl/biblio/gecco2004/
LBP063.pdf. GPBiB
J. Lohn, G. Hornby, and D. Linden. Evolutionary antenna design for a NASA spacecraft.
In U.-M. O’Reilly, et al., editors, Genetic Programming Theory and Practice II, chap-
ter 18, pages 301–315. Springer, Ann Arbor, 13-15 May 2004. ISBN 0-387-23253-2.
GPBiB
M. Looks, B. Goertzel, and C. Pennachin. Learning computer programs with the bayesian
optimization algorithm. In H.-G. Beyer, et al., editors, GECCO 2005: Proceedings of
the 2005 conference on Genetic and evolutionary computation, volume 1, pages 747–
748, Washington DC, USA, 25-29 June 2005. ACM Press. ISBN 1-59593-010-8. URL
http://www.cs.bham.ac.uk/~wbl/biblio/gecco2005/docs/p747.pdf. GPBiB
J. Louchet, M. Guyon, M.-J. Lesot, and A. Boumaza. Dynamic flies: a new pattern
recognition tool applied to stereo sequence processing. Pattern Recognition Letters, 23
(1-3):335–345, January 2002. GPBiB
S. Luke. Two fast tree-creation algorithms for genetic programming. IEEE Trans-
actions on Evolutionary Computation, 4(3):274–283, September 2000. URL http:
//ieeexplore.ieee.org/iel5/4235/18897/00873237.pdf. GPBiB
P. Machado and J. Romero, editors. The Art of Artificial Evolution. Springer, 2008.
P. Marenbach. Using prior knowledge and obtaining process insight in data based mod-
elling of bioprocesses. System Analysis Modelling Simulation, 31:39–59, 1998. GPBiB
M. C. Martin. Evolving visual sonar: Depth from monocular images. Pattern Recognition
Letters, 27(11):1174–1180, August 2006. URL http://martincmartin.com/papers/
EvolvingVisualSonarPatternRecognitionLetters2006.pdf. Evolutionary Computer
Vision and Image Understanding. GPBiB
S. R. Maxwell, III. Why might some problems be difficult for genetic programming to find
solutions? In J. R. Koza, editor, Late Breaking Papers at the Genetic Programming
1996 Conference Stanford University July 28-31, 1996, pages 125–128, Stanford Uni-
versity, CA, USA, 28–31 July 1996. Stanford Bookstore. ISBN 0-18-201031-7. GPBiB
J. McCormack. New challenges for evolutionary music and art. SIGEvolution, 1(1):5–11,
April 2006. URL http://www.sigevolution.org/2006/01/issue.pdf. GPBiB
N. F. McPhee, A. Jarvis, and E. F. Crane. On the strength of size limits in linear genetic
programming. In K. Deb, et al., editors, Genetic and Evolutionary Computation –
GECCO-2004, Part II, volume 3103 of Lecture Notes in Computer Science, pages
593–604, Seattle, WA, USA, 26-30 June 2004. Springer-Verlag. ISBN 3-540-22343-
6. URL http://link.springer.de/link/service/series/0558/bibs/3103/31030593.
htm. GPBiB
N. F. McPhee and R. Poli. A schema theory analysis of the evolution of size in genetic pro-
gramming with linear representations. In J. F. Miller, et al., editors, Genetic Program-
ming, Proceedings of EuroGP’2001, volume 2038 of LNCS, pages 108–125, Lake Como,
Italy, 18-20 April 2001. Springer-Verlag. ISBN 3-540-41899-7. URL http://cswww.
essex.ac.uk/staff/poli/papers/McPhee-EUROGP2001-ST-Linear-Bloat.pdf. GPBiB
B. Mitavskiy and J. Rowe. Some results about the markov chains associated to GPs and
to general EAs. Theoretical Computer Science, 361(1):72–110, 28 August 2006. GPBiB
H. Mühlenbein and T. Mahnig. FDA – a scalable evolutionary algorithm for the opti-
mization of additively decomposed functions. Evolutionary Computation, 7(4):353–376,
1999b.
C. J. Neely and P. A. Weller. Technical trading rules in the european monetary system.
Journal of International Money and Finance, 18(3):429–458, 1999. URL http://
research.stlouisfed.org/wp/1997/97-015.pdf. GPBiB
C. J. Neely and P. A. Weller. Technical analysis and central bank intervention. Journal
of International Money and Finance, 20(7):949–970, December 2001. URL http:
//research.stlouisfed.org/wp/1997/97-002.pdf. GPBiB
A. E. Nix and M. D. Vose. Modeling genetic algorithms with Markov chains. Annals of
Mathematics and Artificial Intelligence, 5:79–88, 1992.
P. Nordin. A compiling genetic programming system that directly manipulates the ma-
chine code. In K. E. Kinnear, Jr., editor, Advances in Genetic Programming, chap-
ter 14, pages 311–331. MIT Press, 1994. URL http://cognet.mit.edu/library/books/
view?isbn=0262111888. GPBiB
P. Nordin. Evolutionary Program Induction of Binary Machine Code and its Applications.
PhD thesis, der Universitat Dortmund am Fachereich Informatik, 1997. GPBiB
nVidia. NVIDIA CUDA Compute Unified Device Architecture, programming guide. Tech-
nical Report version 0.8, NVIDIA, 12 Feb 2007.
H. Oakley. Two scientific applications of genetic programming: Stack filters and non-
linear equation fitting to chaotic data. In K. E. Kinnear, Jr., editor, Advances in
Genetic Programming, chapter 17, pages 369–389. MIT Press, 1994. URL http://
cognet.mit.edu/library/books/view?isbn=0262111888. GPBiB
M. Oltean and D. Dumitrescu. Evolving TSP heuristics using multi expression program-
ming. In M. Bubak, et al., editors, Computational Science - ICCS 2004: 4th In-
ternational Conference, Part II, volume 3037 of Lecture Notes in Computer Science,
pages 670–673, Krakow, Poland, 6-9 June 2004. Springer-Verlag. ISBN 3-540-22115-
8. URL http://springerlink.metapress.com/openurl.asp?genre=article&issn=
0302-9743&volume=3037&spage=670. GPBiB
S. Openshaw and I. Turton. Building new spatial interaction models using genetic
programming. In T. C. Fogarty, editor, Evolutionary Computing, Lecture Notes
in Computer Science, Leeds, UK, 11-13 April 1994. Springer-Verlag. URL http:
//www.geog.leeds.ac.uk/papers/94-1/94-1.pdf. GPBiB
U.-M. O’Reilly and M. Hemberg. Integrating generative growth and evolutionary com-
putation for form exploration. Genetic Programming and Evolvable Machines, 8(2):
163–186, June 2007. ISSN 1389-2576. Special issue on developmental systems. GPBiB
U.-M. O’Reilly and F. Oppacher. Program search with a hierarchical variable length
representation: Genetic programming, simulated annealing and hill climbing. In
Y. Davidor, et al., editors, Parallel Problem Solving from Nature – PPSN III, number
866 in Lecture Notes in Computer Science, pages 397–406, Jerusalem, 9-14 October
1994a. Springer-Verlag. ISBN 3-540-58484-6. URL http://www.cs.ucl.ac.uk/staff/
W.Langdon/ftp/papers/ppsn-94.ps.gz. GPBiB
U.-M. O’Reilly and F. Oppacher. The troubling aspects of a building block hy-
pothesis for genetic programming. In L. D. Whitley and M. D. Vose, ed-
itors, Foundations of Genetic Algorithms 3, pages 73–88, Estes Park, Col-
orado, USA, 31 July–2 August 1994b. Morgan Kaufmann. ISBN 1-55860-356-5.
URL http://citeseer.ist.psu.edu/cache/papers/cs/163/http:zSzzSzwww.ai.mit.
eduzSzpeoplezSzunamayzSzpaperszSzfoga.pdf/oreilly92troubling.pdf. Published
1995. GPBiB
U.-M. O’Reilly, T. Yu, R. L. Riolo, and B. Worzel, editors. Genetic Programming Theory
and Practice II, volume 8 of Genetic Programming, Ann Arbor, MI, USA, 13-15 May
2004. Springer. ISBN 0-387-23253-2. URL http://www.springeronline.com/sgw/cda/
frontpage/0,11855,5-40356-22-34954683-0,00.html. GPBiB
L. Panait and S. Luke. Alternative bloat control methods. In K. Deb, et al., editors,
Genetic and Evolutionary Computation – GECCO-2004, Part II, volume 3103 of Lec-
ture Notes in Computer Science, pages 630–641, Seattle, WA, USA, 26-30 June 2004.
Springer-Verlag. ISBN 3-540-22343-6. URL http://cs.gmu.edu/~lpanait/papers/
panait04alternative.pdf. GPBiB
R. Poli. Discovery of symbolic, neuro-symbolic and neural networks with parallel dis-
tributed genetic programming. Technical Report CSRP-96-14, University of Birming-
ham, School of Computer Science, August 1996a. URL ftp://ftp.cs.bham.ac.uk/
pub/tech-reports/1996/CSRP-96-14.ps.gz. Presented at 3rd International Confer-
ence on Artificial Neural Networks and Genetic Algorithms, ICANNGA’97. GPBiB
R. Poli. Genetic programming for image analysis. In J. R. Koza, et al., editors, Ge-
netic Programming 1996: Proceedings of the First Annual Conference, pages 363–
368, Stanford University, CA, USA, 28–31 July 1996b. MIT Press. URL http:
//cswww.essex.ac.uk/staff/rpoli/papers/Poli-GP1996.pdf. GPBiB
R. Poli. Parallel distributed genetic programming. In D. Corne, et al., editors, New Ideas
in Optimization, Advanced Topics in Computer Science, chapter 27, pages 403–431.
McGraw-Hill, Maidenhead, Berkshire, England, 1999a. ISBN 0-07-709506-5. URL
http://citeseer.ist.psu.edu/328504.html. GPBiB
R. Poli. Sub-machine-code GP: New results and extensions. In R. Poli, et al., editors,
Genetic Programming, Proceedings of EuroGP’99, volume 1598 of LNCS, pages 65–
82, Goteborg, Sweden, 26-27 May 1999b. Springer-Verlag. ISBN 3-540-65899-8. URL
http://www.cs.essex.ac.uk/staff/poli/papers/Poli-EUROGP1999.pdf. GPBiB
R. Poli. Hyperschema theory for GP with one-point crossover, building blocks, and some
new results in GA theory. In R. Poli, et al., editors, Genetic Programming, Proceedings
of EuroGP’2000, volume 1802 of LNCS, pages 163–180, Edinburgh, 15-16 April 2000a.
Springer-Verlag. ISBN 3-540-67339-3. URL http://www.springerlink.com/openurl.
asp?genre=article&issn=0302-9743&volume=1802&spage=163. GPBiB
R. Poli. Exact schema theorem and effective fitness for GP with one-point crossover. In
D. Whitley, et al., editors, Proceedings of the Genetic and Evolutionary Computation
Conference, pages 469–476, Las Vegas, July 2000b. Morgan Kaufmann.
R. Poli. Exact schema theory for genetic programming and variable-length genetic algo-
rithms with one-point crossover. Genetic Programming and Evolvable Machines, 2(2):
123–163, 2001a.
R. Poli. General schema theory for genetic programming with subtree-swapping crossover.
In Genetic Programming, Proceedings of EuroGP 2001, LNCS, Milan, 18-20 April
2001b. Springer-Verlag.
R. Poli and W. B. Langdon. A new schema theory for genetic programming with one-point
crossover and point mutation. In J. R. Koza, et al., editors, Genetic Programming 1997:
Proceedings of the Second Annual Conference, pages 278–285, Stanford University,
CA, USA, 13-16 July 1997. Morgan Kaufmann. URL http://citeseer.ist.psu.edu/
327495.html. GPBiB
R. Poli and W. B. Langdon. Schema theory for genetic programming with one-point
crossover and point mutation. Evolutionary Computation, 6(3):231–252, 1998a. URL
http://cswww.essex.ac.uk/staff/poli/papers/Poli-ECJ1998.pdf. GPBiB
R. Poli and W. B. Langdon. Efficient Markov chain model of machine code program
execution and halting. In R. L. Riolo, et al., editors, Genetic Programming Theory
and Practice IV, volume 5 of Genetic and Evolutionary Computation, chapter 13.
Springer, Ann Arbor, 11-13 May 2006b. ISBN 0-387-33375-4. URL http://www.cs.
essex.ac.uk/staff/poli/papers/GPTP2006.pdf. GPBiB
R. Poli, W. B. Langdon, and O. Holland. Extending particle swarm optimisation via ge-
netic programming. In M. Keijzer, et al., editors, Proceedings of the 8th European Con-
ference on Genetic Programming, volume 3447 of Lecture Notes in Computer Science,
pages 291–300, Lausanne, Switzerland, 30 March - 1 April 2005. Springer. ISBN 3-540-
25436-6. URL http://www.cs.essex.ac.uk/staff/poli/papers/eurogpPSO2005.pdf.
GPBiB
R. Poli and N. F. McPhee. Exact schema theorems for GP with one-point and stan-
dard crossover operating on linear structures and their application to the study of
the evolution of size. In J. F. Miller, et al., editors, Genetic Programming, Proceed-
ings of EuroGP’2001, volume 2038 of LNCS, pages 126–142, Lake Como, Italy, 18-20
April 2001. Springer-Verlag. ISBN 3-540-41899-7. URL http://www.springerlink.
com/openurl.asp?genre=article&issn=0302-9743&volume=2038&spage=126. GPBiB
R. Poli and N. F. McPhee. General schema theory for genetic programming with subtree-
swapping crossover: Part I. Evolutionary Computation, 11(1):53–66, March 2003a.
URL http://cswww.essex.ac.uk/staff/rpoli/papers/ecj2003partI.pdf. GPBiB
R. Poli and N. F. McPhee. General schema theory for genetic programming with subtree-
swapping crossover: Part II. Evolutionary Computation, 11(2):169–206, June 2003b.
URL http://cswww.essex.ac.uk/staff/rpoli/papers/ecj2003partII.pdf. GPBiB
R. Poli, N. F. McPhee, and J. E. Rowe. Exact schema theory and Markov chain mod-
els for genetic programming and variable-length genetic algorithms with homologous
crossover. Genetic Programming and Evolvable Machines, 5(1):31–70, March 2004.
ISSN 1389-2576. URL http://cswww.essex.ac.uk/staff/rpoli/papers/GPEM2004.
pdf. GPBiB
R. Poli and J. Page. Solving high-order boolean parity problems with smooth uniform
crossover, sub-machine code GP and demes. Genetic Programming and Evolvable
Machines, 1(1/2):37–56, April 2000. ISSN 1389-2576. URL http://citeseer.ist.
psu.edu/335584.html. GPBiB
R. Poli, J. E. Rowe, and N. F. McPhee. Markov chain models for GP and variable-length
GAs with homologous crossover. In L. Spector, et al., editors, Proceedings of the
Genetic and Evolutionary Computation Conference (GECCO-2001), pages 112–119,
San Francisco, California, USA, 7-11 July 2001. Morgan Kaufmann. ISBN 1-55860-
774-9. URL http://www.cs.bham.ac.uk/~wbl/biblio/gecco2001/d01.pdf. GPBiB
J. C. F. Pujol and R. Poli. Evolution of the topology and the weights of neural networks
using genetic programming with a dual representation. Technical Report CSRP-97-7,
University of Birmingham, School of Computer Science, February 1997. URL ftp:
//ftp.cs.bham.ac.uk/pub/tech-reports/1997/CSRP-97-07.ps.gz. GPBiB
B. Punch and D. Zongker. lil-gp Genetic Programming System, 1998. URL http://
garage.cse.msu.edu/software/lil-gp/index.html.
A. Ratle and M. Sebag. Genetic programming and domain knowledge: Beyond the limita-
tions of grammar-guided machine discovery. In M. Schoenauer, et al., editors, Parallel
Problem Solving from Nature - PPSN VI 6th International Conference, volume 1917
of LNCS, pages 211–220, Paris, France, 16-20 September 2000. Springer Verlag. URL
http://www.lri.fr/~sebag/REF/PPSN00.ps. GPBiB
A. Ratle and M. Sebag. Avoiding the bloat with probabilistic grammar-guided genetic
programming. In P. Collet, et al., editors, Artificial Evolution 5th International
Conference, Evolution Artificielle, EA 2001, volume 2310 of LNCS, pages 255–266,
Creusot, France, October 29-31 2001. Springer Verlag. ISBN 3-540-43544-1. URL http:
//link.springer.de/link/service/series/0558/papers/2310/23100255.pdf. GPBiB
R. L. Riolo, T. Soule, and B. Worzel, editors. Genetic Programming Theory and Practice
IV, volume 5 of Genetic and Evolutionary Computation, Ann Arbor, 11-13 May 2007a.
Springer. ISBN 0-387-33375-4. URL http://www.springer.com/west/home/computer/
foundations?SGWID=4-156-22-173660377-0. GPBiB
R. L. Riolo, T. Soule, and B. Worzel, editors. Genetic Programming Theory and Practice
V, Genetic and Evolutionary Computation, Ann Arbor, 17-19 May 2007b. Springer.
GPBiB
D. Rivero, J. R. Rabuñal, J. Dorado, and A. Pazos. Using genetic programming for character
discrimination in damaged documents. In G. R. Raidl, et al., editors, Applications
of Evolutionary Computing, EvoWorkshops2004: EvoBIO, EvoCOMNET, EvoHOT,
EvoIASP, EvoMUSART, EvoSTOC, volume 3005 of LNCS, pages 349–358, Coimbra,
Portugal, 5-7 April 2004. Springer Verlag. ISBN 3-540-21378-3. GPBiB
A. Robinson and L. Spector. Using genetic programming with multiple data types and
automatic modularization to evolve decentralized and coordinated navigation in multi-
agent systems. In E. Cantú-Paz, editor, Late Breaking Papers at the Genetic and Evo-
lutionary Computation Conference (GECCO-2002), pages 391–396, New York, NY,
July 2002. AAAI. GPBiB
C. Ryan. Pygmies and civil servants. In K. E. Kinnear, Jr., editor, Advances in Genetic
Programming, chapter 11, pages 243–263. MIT Press, 1994. URL http://cognet.mit.
edu/library/books/view?isbn=0262111888. GPBiB
A. L. Samuel. AI, where it has been and where it is going. In IJCAI, pages 1152–1157,
1983.
A. Sarafopoulos. Automatic generation of affine IFS and strongly typed genetic pro-
gramming. In R. Poli, et al., editors, Genetic Programming, Proceedings of Eu-
roGP’99, volume 1598 of LNCS, pages 149–160, Goteborg, Sweden, 26-27 May 1999.
Springer-Verlag. ISBN 3-540-65899-8. URL http://www.springerlink.com/openurl.
asp?genre=article&issn=0302-9743&volume=1598&spage=149. GPBiB
K. Sastry and D. E. Goldberg. Probabilistic model building and competent genetic pro-
gramming. In R. L. Riolo and B. Worzel, editors, Genetic Programming Theory and
Practice, chapter 13, pages 205–220. Kluwer, 2003. ISBN 1-4020-7581-2. GPBiB
M. D. Schmidt and H. Lipson. Co-evolving fitness predictors for accelerating and reducing
evaluations. In R. L. Riolo, et al., editors, Genetic Programming Theory and Practice
IV, volume 5 of Genetic and Evolutionary Computation, chapter 17, pages –. Springer,
Ann Arbor, 11-13 May 2006. ISBN 0-387-33375-4. GPBiB
H.-S. Seok, K.-J. Lee, and B.-T. Zhang. An on-line learning method for object-
locating robots using genetic programming on evolvable hardware. In M. Sugisaka
and H. Tanaka, editors, Proceedings of the Fifth International Symposium on Artifi-
cial Life and Robotics, volume 1, pages 321–324, Oita, Japan, 26-28 January 2000. URL
http://bi.snu.ac.kr/Publications/Conferences/International/AROB00.ps. GPBiB
S. C. Shah and A. Kusiak. Data mining and genetic algorithm based gene/SNP selection.
Artificial Intelligence in Medicine, 31(3):183–196, July 2004. URL http://www.icaen.
uiowa.edu/~ankusiak/Journal-papers/Gen_Shital.pdf. GPBiB
E. V. Siegel. Competitively evolving decision trees against fixed training cases for natural
language processing. In K. E. Kinnear, Jr., editor, Advances in Genetic Programming,
chapter 19, pages 409–423. MIT Press, 1994. URL http://www1.cs.columbia.edu/
nlp/papers/1994/siegel_94.pdf. GPBiB
S. Silva and J. Almeida. Dynamic maximum tree depth. In E. Cantú-Paz, et al., editors,
Genetic and Evolutionary Computation – GECCO-2003, volume 2724 of LNCS, pages
1776–1787, Chicago, 12-16 July 2003. Springer-Verlag. ISBN 3-540-40603-4. URL
http://cisuc.dei.uc.pt/ecos/dlfile.php?fn=109_pub_27241776.pdf. GPBiB
S. Silva and E. Costa. Dynamic limits for bloat control: Variations on size and depth. In
K. Deb, et al., editors, Genetic and Evolutionary Computation – GECCO-2004, Part
II, volume 3103 of Lecture Notes in Computer Science, pages 666–677, Seattle, WA,
USA, 26-30 June 2004. Springer-Verlag. ISBN 3-540-22343-6. URL http://cisuc.
dei.uc.pt/ecos/dlfile.php?fn=714_pub_31030666.pdf&idp=714. GPBiB
S. Silva and E. Costa. Comparing tree depth limits and resource-limited GP. In D. Corne,
et al., editors, Proceedings of the 2005 IEEE Congress on Evolutionary Computation,
volume 1, pages 920–927, Edinburgh, UK, 2-5 September 2005a. IEEE Press. ISBN
0-7803-9363-5. GPBiB
K. Sims. Artificial evolution for computer graphics. ACM Computer Graphics, 25(4):319–
328, July 1991. URL http://delivery.acm.org/10.1145/130000/122752/p319-sims.
pdf. SIGGRAPH ’91 Proceedings. GPBiB
W. Smart and M. Zhang. Applying online gradient descent search to genetic program-
ming for object recognition. In J. Hogan, et al., editors, CRPIT ’04: Proceedings of the
second workshop on Australasian information security, Data Mining and Web Intel-
ligence, and Software Internationalisation, volume 32 no. 7, pages 133–138, Dunedin,
New Zealand, January 2004. Australian Computer Society, Inc. ISBN 1-920682-14-7.
URL http://crpit.com/confpapers/CRPITV32Smart.pdf. GPBiB
T. Soule and J. A. Foster. Removal bias: a new cause of code growth in tree based
evolutionary programming. In 1998 IEEE International Conference on Evolutionary
Computation, pages 781–786, Anchorage, Alaska, USA, 5-9 May 1998a. IEEE Press.
URL http://citeseer.ist.psu.edu/313655.html. GPBiB
T. Soule and J. A. Foster. Effects of code growth and parsimony pressure on populations
in genetic programming. Evolutionary Computation, 6(4):293–309, Winter 1998b. URL
http://mitpress.mit.edu/journals/EVCO/Soule.pdf. GPBiB
L. Spector and A. Alpern. Criticism, culture, and the automatic generation of artworks.
In Proceedings of the Twelfth National Conference on Artificial Intelligence, pages 3–8,
Seattle, Washington, USA, 1994. AAAI Press/MIT Press. GPBiB
L. Spector, J. Klein, and M. Keijzer. The push3 execution stack and the evolution
of control. In H.-G. Beyer, et al., editors, GECCO 2005: Proceedings of the 2005
conference on Genetic and evolutionary computation, volume 2, pages 1689–1696,
Washington DC, USA, 25-29 June 2005a. ACM Press. ISBN 1-59593-010-8. URL
http://www.cs.bham.ac.uk/~wbl/biblio/gecco2005/docs/p1689.pdf. GPBiB
J. Stender, editor. Parallel Genetic Algorithms: Theory and Applications. IOS press,
1993.
I. Tanev, T. Uozumi, and D. Akhmetov. Component object based single system image for
dependable implementation of genetic programming on clusters. Cluster Computing
Journal, 7(4):347–356, October 2004. ISSN 1386-7857 (Paper) 1573-7543 (Online).
URL http://www.kluweronline.com/issn/1386-7857. GPBiB
A. Teller. Genetic programming, indexed memory, the halting problem, and other cu-
riosities. In Proceedings of the 7th annual Florida Artificial Intelligence Research
Symposium, pages 270–274, Pensacola, Florida, USA, May 1994. IEEE Press. URL
http://www.cs.cmu.edu/afs/cs/usr/astro/public/papers/Curiosities.ps. GPBiB
A. Teller and D. Andre. Automatically choosing the number of fitness cases: The rational
allocation of trials. In J. R. Koza, et al., editors, Genetic Programming 1997: Pro-
ceedings of the Second Annual Conference, pages 321–328, Stanford University, CA,
USA, 13-16 July 1997. Morgan Kaufmann. URL http://www.cs.cmu.edu/afs/cs/usr/
astro/public/papers/GR.ps. GPBiB
S. Thompson. Type theory and functional programming. Addison Wesley Longman Pub-
lishing Co., Inc., Redwood City, CA, USA, 1991. ISBN 0-201-41667-0.
L. Trujillo and G. Olague. Using evolution to learn how to perform interest point de-
tection. In X. Y. T., et al., editors, ICPR 2006 18th International Conference on
Pattern Recognition, volume 1, pages 211–214. IEEE, 20-24 August 2006a. URL http:
//www.genetic-programming.org/hc2006/Olague-Paper-2-ICPR-2006.pdf. GPBiB
L. Trujillo and G. Olague. Synthesis of interest point detectors through genetic pro-
gramming. In M. Keijzer, et al., editors, GECCO 2006: Proceedings of the 8th an-
nual conference on Genetic and evolutionary computation, volume 1, pages 887–894,
Seattle, Washington, USA, 8-12 July 2006b. ACM Press. ISBN 1-59593-186-4. URL
http://www.cs.bham.ac.uk/~wbl/biblio/gecco2006/docs/p887.pdf. GPBiB
E. P. K. Tsang, S. Markose, and H. Er. Chance discovery in stock index option and future
arbitrage. New Mathematics and Natural Computation, 1(3):435–447, 2005.
E. P. K. Tsang and J. Li. EDDIE for financial forecasting. In S.-H. Chen, editor, Genetic
Algorithms and Genetic Programming in Computational Finance, chapter 7, pages
161–174. Kluwer Academic Press, 2002. ISBN 0-7923-7601-3. URL http://cswww.
essex.ac.uk/CSP/finance/papers/TsangLi-FGP-Chen_CompFinance.pdf. GPBiB
E. P. K. Tsang, J. Li, and J. M. Butler. EDDIE beats the bookies. Software: Practice
and Experience, 28(10):1033–1043, 1998. ISSN 0038-0644. URL http://cswww.essex.
ac.uk/CSP/finance/papers/TsBuLi-Eddie-Software98.pdf. GPBiB
J. J. Valdes and A. J. Barton. Virtual reality visual data mining via neural networks
obtained from multi-objective evolutionary optimization: Application to geophysical
prospecting. In International Joint Conference on Neural Networks, IJCNN’06, pages
4862–4869, Sheraton Vancouver Wall Centre Hotel, Vancouver, BC, Canada, 16-21
July 2006. IEEE. GPBiB
R. L. Walker. Search engine case study: searching the web using ge-
netic programming and MPI. Parallel Computing, 27(1-2):71–89, January
2001. URL http://www.sciencedirect.com/science/article/B6V12-42K5HNX-4/1/
57eb870c72fb7768bb7d824557444b72. GPBiB
P. Walsh and C. Ryan. Paragen: A novel technique for the autoparallelisation of se-
quential programs using genetic programming. In J. R. Koza, et al., editors, Ge-
netic Programming 1996: Proceedings of the First Annual Conference, pages 406–
409, Stanford University, CA, USA, 28–31 July 1996. MIT Press. URL http:
//cognet.mit.edu/library/books/view?isbn=0262611279. GPBiB
M. L. Wong and K. S. Leung. Evolving recursive functions for the even-parity problem
using genetic programming. In P. J. Angeline and K. E. Kinnear, Jr., editors, Advances
in Genetic Programming 2, chapter 11, pages 221–240. MIT Press, Cambridge, MA,
USA, 1996. ISBN 0-262-01158-1. GPBiB
M. L. Wong and K. S. Leung. Data Mining Using Grammar Based Genetic Programming
and Applications, volume 3 of Genetic Programming. Kluwer Academic Publishers,
January 2000. ISBN 0-7923-7746-X. GPBiB
M.-L. Wong, T.-T. Wong, and K.-L. Fok. Parallel evolutionary algorithms on graphics
processing unit. In D. Corne, et al., editors, Proceedings of the 2005 IEEE Congress on
Evolutionary Computation, volume 3, pages 2286–2293, Edinburgh, Scotland, UK, 2-5
September 2005. IEEE Press. ISBN 0-7803-9363-5. URL http://ieeexplore.ieee.
org/servlet/opac?punumber=10417&isvol=3.
H. Xie, M. Zhang, and P. Andreae. Genetic programming for automatic stress detection
in spoken English. In F. Rothlauf, et al., editors, Applications of Evolutionary Com-
puting, EvoWorkshops2006: EvoBIO, EvoCOMNET, EvoHOT, EvoIASP, EvoInter-
action, EvoMUSART, EvoSTOC, volume 3907 of LNCS, pages 460–471, Budapest, 10-
12 April 2006. Springer Verlag. ISBN 3-540-33237-5. URL http://www.springerlink.
com/openurl.asp?genre=article&issn=0302-9743&volume=3907&spage=460. GPBiB
K. Yanai and H. Iba. Program evolution by integrating EDP and GP. In K. Deb, et al.,
editors, Genetic and Evolutionary Computation – GECCO-2004, Part I, volume 3102
of Lecture Notes in Computer Science, pages 774–785, Seattle, WA, USA, 26-30 June
2004. Springer-Verlag. ISBN 3-540-22344-4. URL http://www.iba.k.u-tokyo.ac.jp/
papers/2004/yanaiGECCO2004.pdf. GPBiB
T. Yu. Hierarchical processing for evolving recursive and modular programs using higher
order functions and lambda abstractions. Genetic Programming and Evolvable Ma-
chines, 2(4):345–380, December 2001. ISSN 1389-2576. GPBiB
T. Yu and S.-H. Chen. Using genetic programming with lambda abstraction to find techni-
cal trading rules. In Computing in Economics and Finance, University of Amsterdam,
8-10 July 2004. GPBiB
T. Yu, R. L. Riolo, and B. Worzel, editors. Genetic Programming Theory and Practice
III, volume 9 of Genetic Programming, Ann Arbor, 12-14 May 2005. Springer. ISBN
0-387-28110-X. GPBiB
B.-T. Zhang and D.-Y. Cho. Coevolutionary fitness switching: Learning complex col-
lective behaviors using genetic programming. In L. Spector, et al., editors, Advances
in Genetic Programming 3, chapter 18, pages 425–445. MIT Press, Cambridge, MA,
USA, June 1999. ISBN 0-262-19423-6. URL http://bi.snu.ac.kr/Publications/
Books/aigp3.ps. GPBiB
B.-T. Zhang and H. Mühlenbein. Evolving optimal neural networks using genetic al-
gorithms with Occam’s razor. Complex Systems, 7:199–220, 1993. URL http:
//citeseer.ist.psu.edu/zhang93evolving.html. GPBiB
B.-T. Zhang and H. Mühlenbein. Balancing accuracy and parsimony in genetic pro-
gramming. Evolutionary Computation, 3(1):17–38, 1995. URL http://www.ais.
fraunhofer.de/~muehlen/publications/gmd_as_ga-94_09.ps. GPBiB
M. Zhang and U. Bhowan. Pixel statistics and program size in genetic programming for
object detection. Technical Report CS-TR-04-3, Computer Science, Victoria University
of Wellington, New Zealand, 2004. GPBiB
M. Zhang and W. Smart. Using Gaussian distribution to construct fitness functions in ge-
netic programming for multiclass object classification. Pattern Recognition Letters, 27
(11):1266–1274, August 2006. Evolutionary Computer Vision and Image Understand-
ing. GPBiB
E. Zitzler, M. Laumanns, and L. Thiele. SPEA2: Improving the Strength Pareto Evolu-
tionary Algorithm. Technical Report 103, Computer Engineering and Networks Laboratory
(TIK), ETH Zurich, Gloriastrasse 35, CH-8092 Zurich, Switzerland, 2001. URL
http://citeseer.ist.psu.edu/article/zitzler01spea.html.
Index
search spaces, 99
thesis write up, 139
time series, financial, 123–124
TinyGP, 151–162
Toffoli, distribution of circuits, 99
tournament selection, 14–15, 86
Transputer, 94
trap function, 72
travelling salesman problem (TSP), 127
tree
depth, 12
derivation, 54
editing, 46
size, 12
tree adjoining grammar (TAG), 55
tree-based representation, 10
Tron, 127
Turing complete GP, 64
Turing complete program, theory, 99
Turing test, 117, 142
type
consistency, 21–22, 51
conversion, 51
initialisation, 56
more understandable programs, 51
multiple, 52
single, 51
strong, 52
system
higher-order, 53
multi-level, 53
watermark security
application, 122
multi-objective GP, 79
wavelet lossy compression, 129
world wide GP, 95
Wright’s geographic model of evolution, 88
Colophon
This book was primarily written using the LaTeX document preparation
system, along with BibTeX, pdflatex and makeindex. Most of the editing
was done using the emacs and xemacs editors, along with extensions such
as RefTeX; some was done with TeXShop as well. Most of the data plots
were generated using gnuplot and the R statistics package. Diagrams were
generated with a variety of tools, including the Graphviz package, tgif and
xfig. A whole host of programming and scripting languages were used to
automate various processes in both the initial scientific research and in the
production of this book; they are too numerous to list here, but were crucial
nonetheless. The cover was created with Adobe Photoshop1 and gimp.
Coordinating the work of three busy, opinionated authors is not trivial,
and would have been much more difficult without the use of revision control
systems such as Subversion. Around 500 commits were made in a six month
period, averaging around 10 commits per day in the final weeks. The actual
files were hosted as a project at http://assembla.com; we didn’t realise
until several months into the project that Assembla’s president is in fact
Andy Singleton, who did some cool early work in GP in the mid-90s.
The “reviews” and “summaries” on the back cover were generated
stochastically using the idea of N-grams from linguistics. For the “reviews”
we collected a number of reviews of previous books on GP and EAs, and
tabulated the frequency of different triples of adjacent words. These fre-
quencies of triples in the source text were then used to guide the choices of
words in the generated “reviews”. The only word following the pair “ad”
and “hoc” in our source reviews, for example, was “tweaks”; thus once “ad”
and “hoc” had been chosen, the next word had to be “tweaks”. The pair
“of the”, on the other hand, appears numerous times in our source text, fol-
lowed by words such as “field”, “body”, and “rapidly”. However, “theory”
is the most common successor, and, therefore, the most likely to be cho-
sen to follow “of the” in the generation of new text. The generation of the
“summaries” was similar, but based on the front matter of the book itself.
See (Poli and McPhee, 2008a) for an application of these ideas in genetic
programming.
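
To make the trigram idea concrete, here is a minimal sketch in Python of the
kind of generator described above. It is not the script actually used for the
book; the toy corpus, function names and seed pair are purely illustrative.

    import random
    from collections import defaultdict

    def build_trigram_table(words):
        # Map each adjacent word pair to the list of words observed after it.
        # Repeated successors stay in the list, so frequencies are preserved.
        table = defaultdict(list)
        for w1, w2, w3 in zip(words, words[1:], words[2:]):
            table[(w1, w2)].append(w3)
        return table

    def generate(table, seed_pair, length=30, rng=random):
        # Start from a seed pair and repeatedly sample a successor in
        # proportion to how often it followed that pair in the source text.
        w1, w2 = seed_pair
        out = [w1, w2]
        for _ in range(length - 2):
            successors = table.get((w1, w2))
            if not successors:      # dead end: no triple begins with this pair
                break
            w3 = rng.choice(successors)
            out.append(w3)
            w1, w2 = w2, w3
        return " ".join(out)

    # Toy source text; the real source was a collection of book reviews.
    corpus = ("the theory of the field of the theory of genetic programming "
              "is a body of ad hoc tweaks of the rapidly growing field").split()
    table = build_trigram_table(corpus)
    print(generate(table, ("of", "the")))

Because the successor lists keep duplicates, a continuation that followed a
given pair more often in the source text is proportionally more likely to be
chosen, which is the frequency-guided behaviour described above.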
1 Adobe Photoshop is a registered trademark of Adobe Systems Incorporated